Design study of Software-Implemented Fault-Tolerance (SIFT) computer by Wensley, J. H. et al.
NASA Contractor Report 3011 
Design Study of Software-Implemented
Fault-Tolerance (SIFT) Computer 
J. H. Wensley, J. Goldberg, M. W. Green, 
W. H. Kautz, K. N. Levitt, M. E. Mills, 
R. E. Shostak, P. M. Whiting-O'Keefe, 
and H. M. Zeidler 
CONTRACT NAS1-13792 
JUNE 1982 
NASA 
https://ntrs.nasa.gov/search.jsp?R=19820022093 2020-03-21T08:06:18+00:00Z
SRI International 
Menlo Park, California
Prepared for
Langley Research Center
under Contract NAS1-13792
National Aeronautics
and Space Administration
Scientific and Technical
Information Office
1982 

CONTENTS
LIST OF TABLES
I INTRODUCTION
II SIGNIFICANT RESULTS AND OUTSTANDING PROBLEMS
    A. Significant Research Results
    B. Outstanding Problems
III TECHNICAL PLAN FOR FUTURE DEVELOPMENT OF SIFT
    A. Introduction
    B. The Relevance of Analytic Techniques, Simulation, Emulation,
       Experimental Models, Prototypes, and Flight Model in the
       Development Process
        1. Introduction
        2. Outline of the Plan
        3. Justification of the Plan
    C. Recommendations
        1. Step 1--Current Contract
        2. Step 2. Complete Design of SIFT
        3. Design Review
        4. Step 3
        5. Steps 4 and 5
IV THE SIFT CONCEPT
    A. Introduction
    B. SIFT Performance and Reliability Goals
    C. SIFT System Design
    D. The Design Methodology
    E. Design Features of SIFT
        1. Task Dispatching
        2. Task Communication
        3. Detection and Location of Processor and Bus Failures
    F. The Logical Structure of SIFT
    G. Discussion
V TASK STRUCTURE, ALLOCATION AND SCHEDULING
    A. Introduction
    B. Flight Phase Analysis
    C. Review of Task Characteristics for Flight Phase Assignment
       and Processor-Memory Unit Allocation
    D. Task Allocation and Schedule Generation
    E. Schedule Derivation
    F. Schedule Representation and Notation
    G. Sample Schedule Derivation
    H. Conclusion
VI HARDWARE DESIGN
    A. Bus Interconnection Network
        1. Introduction
        2. Design Alternatives
        3. Parallel Transfer
        4. Bit-Serial Transfer
        5. Byte-Serial Transfer
        6. Comparative Analysis of Cost Measures
        7. Networks with More Than Three Levels
        8. Modularization of the Bus Interconnection Network
        9. Comparison of Delay Times
        10. Fault Tolerance Aspects
        11. Routing
        12. Conclusion
    B. Input/Output Subsystem
        1. Introduction and Summary
        2. Critical Input/Output Units
        3. Noncritical Input Units
        4. Noncritical Actuator Units
    C. SIFT Memory System Design
        1. Introduction and Summary
        2. Memory Hierarchy
        3. Memory Technologies
        4. Special Logical Functions
        5. Fault Tolerance
        6. Performance Specifications
    D. Processors
    E. Power Supply System
VII RELIABILITY ANALYSES
    A. Summary
    B. Motivation
    C. The Reliability Model
    D. Analytical Techniques
    E. Models and Programs
    F. Computational Results and Interpretations
VIII THE HIERARCHICAL DESIGN METHODOLOGY
IX HIERARCHICAL ORGANIZATION OF SIFT
    A. The Hierarchical Methodology Relative to SIFT
    B. Stage 0 of the Methodology for SIFT
    C. Stage 1 as Applied to SIFT
    D. Formal Specification of SIFT
        1. Introduction
        2. The FUNCTIONS Section
        3. The DECLARATIONS Section
        4. The PARAMETERS Section
        5. The DEFINITIONS Section
        6. The EXCEPTIONS Section
APPENDIX
    A MARKOV PROCESSES
ILLUSTRATIONS
III-1   SIFT Development Plan
IV-1    System Configuration
IV-2    Example of Task/Processor Allocation
IV-3    Snapshot of a Sample Schedule
IV-4    Task Schedules Demonstrating the Need for Two Community Buffers
IV-5    Bus Assignments to Enable Single Fault Location
IV-6    Illustrations of Fault-Location Algorithm
V-1     Allocation Algorithm
V-2     Schedule Representation Examples
V-3     Sample Schedule Derivation for the Landing Flight Phase
V-4     Alternate Schedule Representation
V-5     Schedule Derivation Flowchart
VI-1    Interconnection Network
VI-2    Possible Interconnection Schemes
VI-3    Scanner and Switch Functional Block Diagram
VI-4    Scanner and Switch Logic Circuitry
VI-5    Two-Level Network Block Diagram
VI-6    Example of a Three-Level Network
VI-7    Gate Costs for Parallel Transfer
VI-8    Gate Costs for Bit-Serial Transfer b = 4
VI-9    Example of a Five-Level Network
VI-10   Five-Level Network Using 2 X 2 and 3 X 3 S-Units in Alternate Levels
VI-11   Arrangement of Units to Achieve Fault Tolerance
VI-12   Input/Output for Critical Sensors and Actuators
VI-13   Typical Aircraft Alternator/Regulator System
VI-14   Schematic of DC-10 Power System
VI-15   Overvoltage Protection Circuit
VI-16   Connection Between Power Sources and Processors
VI-17   Power Sources and Processors Connection Pattern
VII-1   A Simplified SIFT Model
VII-2   Model I State-Diagram
VII-3   Model I Behavior
VII-4   Model II State-Diagram
VII-5   Model II Behavior
VII-6   Model IV State-Diagram
VII-7   Model IV Behavior
VIII-1  Decomposition in Terms of Function
VIII-2  Illustration of a Functional Hierarchy
VIII-3  Dependency Set
VIII-4  Example of Concepts and Definitions
IX-1    Description of State Changes, Representation Mappings, and
        Implementations in Adjacent Abstract Machines
IX-2    Hierarchical Structure of SIFT
IX-3    Timing Diagram for Two Communicating Processes
TABLES
V-1     System Configuration Analysis
V-2     Adjustments and Modifications to the Initial Task Specifications
V-3     Task Module Properties for Scheduling Assignments
V-4     Allocation of all Tasks Triply Replicated across Five Processors
V-5     Table of Automated Flight Phase Tasks and the Characteristics
        Used to Distribute Them over Processor-Memories
V-6     Allocation Examples--Distributed Assignment of Autoland Phase
        Tasks over Five Processor-Memory Units
VI-1    Summary of Maximum Delay Times DQ,
VI-2    Summary of Costs for Network Realization Using All-Identical Modules
VII-1   Sample Output
VII-2   Failure Rates for Reconfiguration Time of One Second as
        Computed by Model IIA
VII-3   Typical Output from Model II Program
VII-4   Transient Recovery Probability and Transient Error Rates
VIII-1  Module Array
VIII-2  Specification of Module Stack
I INTRODUCTION

This report covers the research carried out by SRI on contract
NAS1-13792 (SRI project 4026) during the period 5 February 1975 to
5 February 1976. The primary goal of the research is to design a
flyable SIFT computer that can demonstrate the feasibility of an
integrated-function, fault-tolerant computer in commercial aviation.

Prior research by SRI on contract NAS1-10920 (SRI project 1406)
[Refs. 1,2]* over the period October 1972 to October 1973 considered
the design of fault-tolerant computer architectures and, in particular:
• The computational and reliability requirements of an advanced
  transonic commercial transport aircraft using fly-by-wire
  techniques with a unified digital computing system.
• The impact of modern digital circuit technology on the design of
  such a computer.
• Candidate architectures for a computer to satisfy the requirements.

One of the architectural concepts conceived in that study was given
the name "SIFT" (Software-Implemented Fault-Tolerance). It showed great
promise of satisfying the extreme reliability requirements of this
application class. The detailed design of a computer based on the SIFT
concept is the primary objective of the study reported here.

The goals of the effort were:
(1) To develop the SIFT design concept to a point at which its
    potential reliability may be evaluated with reasonable accuracy.
(2) To investigate alternate strategies for physical implementation,
    using available or specially designed components.
(3) To prove the correctness of the hardware and software designs.
(4) To model the system and evaluate its effectiveness from a
    fault-tolerance point of view.

To achieve these goals, the research was directed at the critical
aspects of the design, leaving less critical aspects to a later phase
in the research program.

Some of the research results reported here have been previously
discussed in the monthly technical progress reports and in a series of
seven technical memos that have been issued during the course of this
study. In addition, a Technical Plan for the Future Development of SIFT
was issued in November 1975.

* Numbered references are listed at the end of the chapter.
REFERENCES
1. J. H. Wensley, K. N. Levitt, M. W. Green, J. Goldberg, and
   P. G. Neumann, "Design of a Fault-Tolerant Airborne Digital
   Computer," Vol. I, Architecture, Final Report, NASA
   CR-132252, 1973.
2. R. S. Ratner, E. B. Shapiro, H. M. Zeidler, S. E. Wahlstrom,
   C. B. Clark, and J. Goldberg, "Design of a Fault-Tolerant
   Airborne Digital Computer," Vol. II, Computational Requirements
   and Technology, Final Report, NASA CR-132253, 1973.

II SIGNIFICANT RESULTS AND OUTSTANDING PROBLEMS

In this section we summarize the significant research results
achieved in this study, and we identify significant problems that remain.

A. Significant Research Results

The principal objective of the study was to carry out a refinement
of the SIFT concept, thereby reducing uncertainties in the design. The
intent was to prove the feasibility of a design based on the SIFT
concept with an eventual goal of a flyable prototype (or "brassboard"). A
significant result of our current study is that in this process of
refining the design, no radical changes have had to be made. Indeed, the
fundamental SIFT concepts that distinguish it from other fault-tolerant
computer architectures remain, namely:
• All fault-tolerance procedures (error detection, error
  correction, diagnosis, and reconfiguration) carried out
  by software.
• No essential special fault-tolerance--different replication
  possible for different tasks, or at different times
  for the same task.
• Very high reliability achieved without the need for high
  intrinsic reliability of subunits of the system.
• Reconfiguration on the basis of complete processor/memory
  modules or complete busses.
• An ability to use fairly standard units such as processors
  and memories, with an attendant gain in reliability by
  taking advantage of the stability of production processes
  with standard high-volume production.

The development of these concepts leads to a design with the
following characteristics:
• Replicated units do not operate in lock-step mode but are
  only loosely synchronized. The communication between CPUs
  is asynchronous, thereby removing the need for an
  ultra-reliable system clock.
• Agreement between replicated units is verified only at
  the completion of program segments (tasks).
• Faulty units are not necessarily removed but can be either
  ignored or assigned to tasks having no overall effect.
• Transient faults do not necessarily cause permanent removal
  of the faulty units. Furthermore, the looseness of
  synchronization among sets of tasks makes it possible to
  enhance immunity from transients by providing that redundant
  versions of a computation may be done at different
  moments in time.
• The degree of fault-tolerance can be different for different
  tasks being performed and can be different at different times
  for the same task.
• No special hardware is used to carry out fault detection
  or correction.
• Communication between CPUs is minimized so that low-bandwidth
  busses can be used, thereby facilitating physical
  separation of modules in environments where physical
  damage is a hazard.
• The design concept is independent of the way in which
  the units are built; i.e., no specialization of CPU or
  memory design is required for fault tolerance, thereby
  allowing the choice to be based on other properties, e.g.,
  speed, availability.
• The total computing power of the system can be varied by
  using units of different speed or by changing the number
  of units.
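
The practice of verifying agreement only at task completion can be
illustrated with a short sketch. The Python below is a minimal
illustration under stated assumptions, not the SIFT executive: the
function name `vote`, the processor labels, and the rule of reporting
disagreeing processors as suspects are all hypothetical choices made
for this example.

```python
from collections import Counter


def vote(results):
    """Compare the results reported by the replicas of one task.

    `results` maps a (hypothetical) processor label to the value that
    processor produced when the task completed. Returns a pair
    (majority_value, suspects); majority_value is None when no value
    is held by a strict majority of the replicas.
    """
    counts = Counter(results.values())
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(results):
        # No strict majority: nothing can be trusted, everyone is suspect.
        return None, sorted(results)
    # Processors that disagree with the majority are reported, not removed.
    suspects = sorted(p for p, v in results.items() if v != value)
    return value, suspects


# Three replicas of one task; the comparison happens once, at task end.
value, suspects = vote({"P1": 42, "P2": 42, "P3": 17})
print(value, suspects)
```

Because the comparison is made once per task rather than once per
instruction, the replicas in this sketch need only have finished the
task, which is consistent with loose synchronization.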

During the current study all critical units of both hardware and
software have been studied. The following are the key results obtained:
• There is no requirement for a central working memory,
  but there is justification for a back-up, nonvolatile
  memory, e.g., magnetic bubble memory (VI-C).*
• Viable structures for the input/output subsystems have
  been developed (VI-B).
• Trade-off studies of the bus system design have been
  carried out. Consideration has been given to cost,
  component count, delay, bandwidth, reliability, and
  structural simplicity of different bus structures, with
  a conclusion that a two-level structure is preferred (VI-A).
• Adequate protection against external power transients
  affecting the power supplies can be provided, and
  fault-tolerance of the power system can be economically
  achieved (VI-E).
• Optical transmission offers a cost-effective way of
  protecting against induced transients in data paths.
• Satisfactory methods have been devised for allocating
  tasks between processors and for devising and
  representing suitable schedules (V).
• Reliability analyses show that a system employing five
  processors and four busses yields satisfactory reliability,
  with greater replication yielding even better reliability (VII).
• Formal proofs of the reliability properties of the system
  can be carried out in a rigorous manner, thus providing
  assurance of the correctness of the design, and also the
  correctness of the reliability model (VII).
• The design of the software, including the fault-tolerance
  features, can be specified in a formal abstract manner,
  thus enabling the proofs referred to above and assisting
  in transporting the design across different hardware
  implementations (VIII).

* Parenthesized notations indicate the chapter or section of this report
in which the particular result is discussed in more detail.
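
The remark that greater replication yields better reliability can be
illustrated with a simple static calculation. This sketch assumes
independent unit failures, an arbitrary per-unit failure probability,
and no reconfiguration; it is not the Markov model of Chapter VII, and
the function name `majority_survives` and the value 0.1 are purely
illustrative assumptions.

```python
from math import comb


def majority_survives(replicas, p_fail):
    """Probability that a strict majority of `replicas` units is working.

    Assumes each unit fails independently with probability `p_fail`
    (an illustrative assumption, not a figure from the report).
    """
    needed = replicas // 2 + 1
    return sum(
        comb(replicas, k) * (1 - p_fail) ** k * p_fail ** (replicas - k)
        for k in range(needed, replicas + 1)
    )


# Compare one unit, three replicas, and five replicas at the same p_fail.
for n in (1, 3, 5):
    print(n, majority_survives(n, 0.1))
```

Under these assumptions the probability of retaining a working majority
rises with the number of replicas; the analyses in the report
additionally account for reconfiguration and transient faults, which
this sketch omits.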

In carrying out the design study it has been necessary to develop
a methodology for design and analysis. While this methodology has been
aimed directly at the objectives of the current study, it has great
relevance in the wider context of fault-tolerant computer design and,
beyond that, to the design and analysis of computer systems in general.
The principal features of this methodology are:
• Techniques for formal specification of complex computer
  systems.
• Techniques for formal proof of correctness of design.
• Techniques for the analysis of reliability of
  fault-tolerant computer systems.
• Techniques for allocating tasks among processors and for
  designing and representing schedules within processors
  of a multiprogrammed multiprocessor computing system.

These techniques represent a powerful, rigorous methodology of design
and analysis that can have a significant impact on future design efforts.
B. Outstanding Problems 
While our study has considered all the critical design and analysis
issues, there remain some outstanding problems both in the development of
SIFT and in the design of fault-tolerant computers in general. The major
outstanding needs that we see are for:
• Continued refinement of the SIFT design to include all
  design aspects and, in particular, to develop
  cost/performance/reliability trade-offs to enable optimized
  versions of SIFT to be produced.
• More definitive data on the intrinsic reliability of different
  electronic technologies, particularly the newest ones, e.g.,
  CMOS Large-Scale Integrated (LSI) circuits.
• Improved methods for the analysis of coverage, particularly
  of diagnosis techniques.
• More definitive data on the nature and incidence of massive
  transient disturbances, e.g., as caused by lightning strikes.
• A systematic study of the input/output units within an
  aircraft (sensors, actuators, etc.) and of their
  performance and reliability characteristics.

Most of these problems are considered in the technical plan for
future development of SIFT.
III TECHNICAL PLAN FOR FUTURE DEVELOPMENT OF SIFT

A. Introduction

The material presented in this chapter was previously published
in November 1975 as an informal document. It is included here so that
this report can be a self-contained document.

The plan as presented here is that originally submitted.
Subsequent discussions between NASA staff and SRI have resulted in a
recommendation that certain tasks should be delayed from Step 2 to
Step 3. These are the tasks "Test Procedures" and "Aircraft Test
Interface" as shown in Figure III-1.

The plan is presented in detail for the period up to November 1976,
by which time the design is expected to have been completed in sufficient
detail to enable procurement of equipment. The system software (local
and global executives) will have been fully specified to enable program
writing to commence. The plan is presented in more general terms for
the period beyond November 1976. We discuss in this document the
importance of strong interaction with other segments of industry such
as the airlines, airframe manufacturers, avionics manufacturers, and
semiconductor manufacturers, and also with other related research and
development centers, e.g., NASA-Ames STP:AMD project and
NASA Houston Space Shuttle Development.
B. The Relevance of Analytic Techniques, Simulation, Emulation,
Experimental Models, Prototypes, and Flight Model in the
Development Process

1. Introduction

The primary goal of the SRI effort is a flyable SIFT computer
that can demonstrate the feasibility of an integrated-function,
fault-tolerant computer in commercial aviation. Because of the complexity
of such a computer and the importance of the demonstration, it is
desirable to achieve a high level of confidence in the design before a
flight model is built.

For a computer system, such confidence may be achieved through
various means of validation such as human design review, formal analysis
and proof, simulation, emulation, and the testing of physical prototypes.
Furthermore, in any very complex system it is common practice to proceed
with the validation in several steps, each step dealing with different
levels of abstraction and approximation. Validation exercises can be
very expensive and time consuming, so it is desirable to choose a
strategy for validation that will yield a high level of confidence
quickly and with low cost.

The design approach SRI is using for SIFT is unusual in that
both the hardware and the executive software will have precise, formal
specifications. In this section we will present a plan for design
validation that takes advantage of this design approach. We believe the
plan is both effective and economical compared to reasonable alternatives.
2. Outline of the Plan

We propose the following steps:
(1) Abstract specification and proof of software and
    hardware.
(2) Design of programs and logic; investigation of
    nonlogical* design issues.
(3) Validation of the design including construction
    of a prototype computer.
(4) Testing of the prototype and construction and
    certification of an experimental model flight
    computer.
(5) Flight tests.

* By "nonlogical" we mean factors such as packaging, device performance,
fault modes, etc., that are not included explicitly in programs or
logic designs.

FIGURE III-1 SIFT DEVELOPMENT PLAN

Between Steps 2 and 3 we propose that a design review take place.
This should be conducted by both SRI and NASA personnel or their
representatives.

Step 1 will include the executive software and the major system
modules, taken to a level of detail that comprises well-understood
software and hardware functions. Some fault-tolerance functions will be
parameterized to allow some user freedom in the choice of fault-tolerance
policies. Auxiliary software such as diagnostic routines and system
exercisers will not be included in this stage.

Step 2 will result in the complete specification of the computer
system to a level of detail that will be sufficient for equipment
and software procurement. The hardware specifications will include
statements of functional capability, performance parameters, reliability
constraints, interface specifications, and packaging constraints. The
software specifications will also contain functional requirements and
performance parameters and, in addition, will be accompanied by sample
implementation schemes. Certain nonlogical design issues will be
investigated. These include choice of device and interconnection
technologies, packaging and shielding, power supply design, and
specification of peripheral equipment such as displays and storage units.
Information concerning fault modes will be acquired and applied to the
logical design and executive programs.

At this stage a design review should take place. The purpose
of this review is to make a detailed examination of the design, which had
not been possible before a complete design existed. While we are confident
that there will be little need for change in the basic system concepts, we
do see the possibility of changes in some of the parameters of the system.
This review should also examine the assumptions upon which the reliability
analyses were based.
The prototype computer to be procured in Step 3 will be realized
mainly in the same technology that is expected to be used in a
flight-experimental model. The software will be extended to include sample
application programs, in-flight diagnostic programs, and basic checkout
programs. Instrumentation will be provided both in software and hardware,
and an external computer will be programmed to drive the SIFT computer
so as to simulate the aircraft environment. Hardware developments will
include power supplies, clocks and interconnections, and connections with
peripheral units such as bubble memories and aircraft circuits. The
computer will be analyzed in order to guide (1) the final design of
interconnections and packaging and (2) the design of test procedures. As
discussed at the end of the next section, a limited amount of evolution will
be allowed in the prototype. Examples of modifications that might be
planned are (1) change in interconnection technology, e.g., the
introduction of optical couplers and special power-supply circuits, and (2)
replacement of some changeable memories by read-only memories, e.g., for
microprograms or programs.
The flight computer of Step 4 will be ruggedized and shielded
and will be provided with maintenance aids such as handbooks and diagnostic
tools. A full set of application programs will be prepared. The computer
will be certified for experimental aircraft installation. Our objective
will be that the flight model will differ from the prototype to only a
limited degree, and as such can be regarded as an evolution from it.
This view is justified by the fact that the SIFT concept allows for the
use of off-the-shelf units for the processors and memories. We can
foresee changes between the prototype and the flight model in the bus system,
the input/output system, and the scale of the system as a whole and, in
addition, certain technology changes mentioned in the discussion of the
evolution of the prototype itself.
3. Justification of the plan
The proposed plan departs from conventional practice in two major
respects. First, formal system specification (Stage 1) is a much larger
effort in the plan than in conventional practice, which typically specifies
a system discursively. Our specification method is actually a very
high-level form of programming that not only is precise (to some level of
abstraction), but also provides a capsule intuitive view of the system.
The second departure is that the plan does not include a major
component for testing, either by computer simulation or breadboarding.
We believe this omission is justified because we believe that the
specification and proof methodology we are employing will leave very few design
questions unanswered, with the exception of certain nonlogical elements
such as power supply and packaging (we plan to allow as much testing for
these aspects as good engineering practice requires). Extensive testing
would thus constitute a wasteful diversion of time and money. Moreover,
we do not see the need for significant incorporation of innovative
hardware technologies which would need to be tested in a breadboard version
of the system.
Let us consider the usual arguments in favor of testing, and the
reasons why we reject them.
The usual role of tests is to help designers see just what
behavior is produced by the thing they have created. This purpose is
obviated by our specification methodology.
Another argument often tendered in favor of testing is that the
specifications may themselves be deficient, and that in the course of
preparing tests, individuals will think of input conditions and sequences
that may have been overlooked by the designers. The issue is how to
validate or confirm a designer's understanding of the system problem. We
believe that testing may be a weak tool for achieving this purpose, and
that given the present cost and power of testing, other more
human-oriented methods would be more effective. Such methods include:
(1) Good means for expressing, recording and displaying the
design and its documentation.
(2) Careful design review methods, such as redundant design
teams and "walk-through" (discussion of a design with an
outsider).
It is for this reason that we propose a design review at the end
of Step 2, i.e., before procurement of prototype equipment and software.
Yet another argument for testing is that it can expose
assumptions made about primitive system functions. For example, the arithmetic
operations of a particular processor may not support the computational
methods assumed by the application programmer--or even the claims of the
programming manual. This is indeed a significant issue. Computer
simulation will, in general, not be useful to solve this problem. We are
optimistic, however, that the problem can be deferred to the prototype stage
(Stage 3) without creating the need for major redesign.
The  final  argument  for  testing  or  simulation  is  that  formal 
validation  methods  cannot,  in  their  present  state of development,  inform 
the  designer  about  execution  speeds.  Such  information  may  be  necessary 
in  order  to  set  performance  specifications  for  components,  such  as  bus or
memory  bandwidth  and  processor  cycle-time. We believe  that  such  informa- 
tion  can  be  obtained  as  needed  by  special  analyses,  including  the  use  of 
computerized  models.  Such  models  would  be  much  simpler  and  easier  to 
create  and  use  than  general  system  simulations,  emulations,  or  breadboards. 
Fortunately,  the  SIFT  distributed-computer  design  places  very  mild  perfor- 
mance  requirements  on  system  components  for  the  chosen  application  domain, 
so we  believe  that  extensive  performance  analyses  will  not  be  needed. 
Some further remarks about the prototype are necessary. As
described,  the  prototype  will  have  essentially  the  same  logic  and  use  the 
same  device  types  as  the  flight  model.  The  two  will  differ  mainly  in  size, 
completeness  of  program  set,  interconnection  techniques,  packaging,  physi- 
cal  hardening,  and  the  like.  It  might  be  argued  that  the  design  issues 
involved  in  these  various  qualities  might  be  resolved  without  the  necessity 
of  building  an  operating  prototype. It is  perhaps  too  early  in  the  course 
of  the  design  to  be  certain  about  this  issue;  however,  the  justification 
for  a  separate  prototype  does  not  rest on its  support  of  the  design  studies 
implied  by  the  differences  listed. 
Despite  our  confidence  in  the  prospect  for  validation  of  SIFT 
functional  design,  there  are  many  nonlogical  issues  that  may  contain  hid- 
den  implications  on  some  aspects  of  the  system  logic.  These  issues  in- 
clude  transient  fault  effects,  accessibility  and  controllability  for 
diagnosis,  and  postspecification  shortcomings  in  device  performance. 
It would be desirable to uncover these "surprises" as early as
possible. There is therefore a case for introducing physical parameters
into the design of the prototype as early as possible.
To summarize, our view is that the issues that are typically
resolved by simulation and breadboard models can be delayed to the stage
of prototype testing because of the inherent flexibility of the SIFT
concept.
C. Recommendations
We see the development of the SIFT concept as a series of five steps
that can be briefly characterized as follows:
(1) Critical aspects of the design (current contract)
(2) Complete design of a SIFT system (extension to current contract)
(3) Building of a prototype of SIFT
(4) Prototype testing and procurement of a flyable model of SIFT
(5) Flight test and evaluation.
The remainder of this section details the above steps and defines
the work to be accomplished in them. The involvement of different
organizations (SRI, NASA, airframe manufacturers, airlines, semiconductor
manufacturers, etc.) in the work is also discussed.
The technical plan is shown graphically in Figure III-1, with major
groupings of expected results of the present contract under Step 1 (e.g.,
Hardware Design). The plan shows the work steps up to the latter part of
1976 in detail (with each item being defined in Section C2), and less
detail for the steps beyond the end of 1976.
As can be seen from the chart, it is our view and recommendation
that a flyable model of the SIFT computer can and should be implemented
approximately in 1979. All of the preceding tasks have been designed
with that goal in mind. The exact phasing of the many tasks can be a
matter of negotiation, but the relative phasing is considered to be
as indicated by the arrows in the chart.
We foresee that the tasks to be accomplished in Step 2 will require
a level of effort slightly higher than that of the existing contract.
This will be augmented with cooperation from many industry segments, such
as airlines and semiconductor manufacturers. We show the point at which
this cooperation begins in Step 2, but continuing cooperation is implied
in the later steps, though this later cooperation is not specifically
indicated on the chart. During the later phases, this cooperation will
have to become one of active participation as equipment is procured and
the prototype and the flyable model are physically tested. We see that
the total level of effort in Steps 3, 4, and 5 will be significantly
higher than at present and that it will be spread over many organizations.
The exact details of the tasks to be performed in these later steps are
the subject of the procurement plan that will be prepared under Step 2,
at which time we expect to be able to detail the later steps so that a
complete determination can be made of the required funding for the
remainder of the program.
We consider that the major effort of Step 2 should be carried out by
SRI, with active discussion by industry and other research segments. This
combined participation will enable the effort to proceed with continuity
of personnel, with the advantage of maintaining the existing capability
and enthusiasm, while at the same time widening the community of those
who are involved in the total effort.
The participation of other industry segments in Steps 3 to 5 is
expected to increase very significantly. We see that the major role that
SRI should play in these later steps is one of technical leadership and
coordination, as well as being the research arm of the total effort in
resolving issues that are at present unforeseen but become known as work
proceeds. In addition, we see that the scope of the total effort may be
changed with changing circumstances, causing a need to carry out research
that is beyond the scope of that which is currently envisioned.
1. Step 1 - Current Contract
The current contract calls for SRI to design and analyze all
critical aspects of SIFT. The following items are included in this effort:
(1) General system requirements
(2) The definition and conceptual design of SIFT
    (a) Bus and processor memory bus interface
    (b) Processor
    (c) Memory
    (d) Input/output interface
    (e) Software
        (i) Executive program
        (ii) Sample application program
(3) Analysis and assessment of SIFT
    (a) Thoroughness of fault analysis
    (b) Proof of fault-tolerance procedures
    (c) Modeling
(4) Technical plan for further SIFT development (this document)
(5) Reporting and documentation.
SRI has currently completed about 70% of this research, with
every indication that the objectives will be met. Very few new and
significant problems have been uncovered. The only one of great
significance is the matter of massive transients in the event of severe
environmental disturbances, e.g., a lightning strike on the aircraft. The
technical plan for Step 2 as presented below describes the actions that are
proposed for dealing with this problem (see d and h under Step 2,
following).
2. Step 2 - Complete design of SIFT
This subsection defines the tasks that constitute Step 2 in the
technical plan. Each major task or group of tasks is separately defined
in the sections a through j.
a. General - The overall goal is to specify fully a
representative SIFT system to a level of detail satisfactory for procurement of
all hardware and software components. The particular configuration of
SIFT that will be chosen for this purpose will be appropriate for
carrying out a reasonable set of aircraft application tasks chosen after
consultation with both airlines and airframe manufacturers. Throughout this
study, close contact will be maintained with both semiconductor and
avionics manufacturers so as to provide an efficient involvement of them in
the building of a prototype to be carried out in Step 3 of this program.
b. Application Tasks - First implementation of the SIFT system
should include only a subset of the total on-board data processing system
that is projected for jet transports of the 1980-1985 period. This will
make much more economical the task of realistic evaluation of the
fault-tolerant techniques and procedures that are to be incorporated.
One objective will be the selection of those application
tasks that can most readily and effectively be incorporated into the
prototype system. The subset of tasks must be adequate and sufficiently varied
to provide meaningful test results for analysis. In addition, the system
must be planned to meet the constraints imposed by operational, economic,
and procurement factors. The application task selection must therefore
be based in part upon the recommendations, facilities, and equipment
provided by various segments of the airline and data processing communities.
Among those to be consulted in the task selection are the following:
• Airlines--As currently visualized, cooperation of one
or more airlines will be sought to provide a commercial
airline environment to enable generation of a true
operational interpretation of system test results.
Close liaison should be established with airline
personnel at an early date. Operational and economic
aspects of the operations of the airline(s) will most
probably impose limitations on the functions that can
in a practical sense be incorporated into a prototype
test system.
• Research and development agencies--Throughout the past
few years, a considerable amount of effort has been
directed toward the design and test of digital systems
for certain aircraft control and related functions.
Agencies and companies such as NASA, the U.S. Air Force,
and Boeing have been at the forefront of such programs.
Liaison will be established with such organizations in
order to exploit the results of those efforts.
• ARINC--It will be desirable to specify the prototype
system to be consistent with existing and projected
standards insofar as feasible, provided that the basic
goals of the project are not jeopardized. Accordingly,
direct contact will be made with agencies such as ARINC
(Aeronautical Radio, Inc.) to achieve the desired measure
of system standardization and compatibility.
c. Hardware  Design - A complete  specification  of  all  hardware 
components of a  SIFT  system  will  be  prepared  to  a  level of detail  suffi- 
cient  for  procurement  of  a  prototype  from  commercial  vendors. The speci- 
fications must include  all  aspects  including: 
• Functional capability
• Size, speed, and performance parameters
• Reliability constraints
• Interface specifications
• Packaging constraints
To provide  assurance  that  the  specified  system  can  be  im- 
plemented  effectively  and  economically  within  the  projected  time  frame, 
contact  will  be  established  with  vendors  of  computers,  data  communication 
equipment, interfacing hardware, and semiconductor products. In some
cases, standard product lines can possibly be modified slightly to
accommodate the requirements of the prototype system. For example, data
communication elements will be modified to be compatible with the specified
bus  structure of the  prototype  system.  Liaison  will  be  established  with 
appropriate  departments of  such  organizations  to  facilitate  specification 
of any  such  modifications  that may prove  to  be  necessary. 
d. Software  Design - The software of the SIFT will  be  fully 
specified.  For  the  system  software  (Global  and  Local  executives),  the 
formal  specifications  prepared  under  the  current  contract  will  be  aug- 
mented  to  the  level of detail  that  can  be  used  for  procurement.  This 
will  involve  preparing  sample  implementation  schemes  and  providing  firm 
estimates of those variables in the system that are currently
parameterized in the formal specifications.
The set of application  tasks  (item  2b)  will  be  specified 
in a  similar  manner  to  the  system  software. 
The specifications  will  include  full  details of acceptance 
tests to be used for evaluation. The use of programming languages at
different  levels  will  be  specified.  Any  special  measures  to  be  taken  to 
assist  in  validation of the  software  will  be  specified. 
Techniques  should  be  investigated  for  providing  a  scheme 
for  achieving  fault-tolerant  application  software.  Such  studies  should 
include  consideration of the  concept of "Recovery  Blocks"  as  developed 
at  the  University  of  Newcastle,  England,  or  derivatives of it  specially 
adapted  to  the  SIFT  concept. 
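The recovery-block idea mentioned above can be sketched as follows. This
is an illustration only and not part of the SIFT specification: the
function names (recovery_block, primary_sqrt, alternate_sqrt), the use of
a dictionary as program state, and the square-root example are all
assumptions made for the sketch. Each alternate runs on a copy of the
state taken at the recovery point; if its result fails the acceptance
test, its state changes are discarded and the next alternate is tried.

```python
class RecoveryBlockError(Exception):
    """Raised when every alternate fails its acceptance test."""


def recovery_block(alternates, acceptance_test, state):
    """Try each alternate in turn until one passes the acceptance test.

    `state` is only updated with the changes made by the alternate that
    is finally accepted; rejected alternates work on a throwaway copy.
    """
    for alternate in alternates:
        trial_state = dict(state)          # recovery point (shallow copy)
        try:
            result = alternate(trial_state)
        except Exception:
            continue                       # a crash counts as a failed alternate
        if acceptance_test(result):
            state.clear()
            state.update(trial_state)      # commit the accepted alternate's changes
            return result
    raise RecoveryBlockError("all alternates failed the acceptance test")


def primary_sqrt(state):
    # Deliberately faulty primary routine (hypothetical example).
    state["calls"] = state.get("calls", 0) + 1
    return state["x"] / 2.0


def alternate_sqrt(state):
    # Simpler, independently written alternate.
    state["calls"] = state.get("calls", 0) + 1
    return state["x"] ** 0.5


state = {"x": 9.0}
root = recovery_block(
    [primary_sqrt, alternate_sqrt],
    lambda r: abs(r * r - 9.0) < 1e-9,     # acceptance test
    state,
)
```

Here the primary's answer is rejected by the acceptance test, so the
value returned comes from the alternate, and the state records only the
alternate's single call.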
e.  Aircraft  Interface - The  SIFT  system  will  require  digital 
data  transmission  from  a  multiplicity  of  instruments,  sensors,  radio- 
frequency  units,  and  other  peripheral  input  units,  and  transmission  to  a 
multiplicity of actuators  and  display  units.  One of the  tasks  will  there- 
fore  be  to  specify  the  total  data  communication  system.  Some  of  the  sa- 
lient  factors  to  be  considered  are as follows: 
• The basic system--Various techniques for accessing and
transmitting  the  data  are  possible.  For  example,  a 
fully  buffered  system  could  be  devised,  or  a  simple 
polling  system can be  implemented  at  less  cost in hard- 
ware  but  at  greater  cost in time  requirements. 
A  third  alternative  would  be  to  provide  an  interrupt 
system  whereby  the  individual  data  source  notifies  the 
computer  via  an  interrupt  that  new  data  are  available. 
Requirements,  advantages,  and  tradeoffs  of  each  of  the 
candidate  data  communication  techniques  will  be  con- 
sidered  and  recommendations  made  for  each  input  and 
output  variable. 
• Protocol--For each data communication technique to be
used,  specific  procedures  will  be  established  for  proper 
control of  the  individual  communication  function. 
• Synchronization--For some of the prototype system
functions,  relatively  noncritical  "macrosynchronization" 
among  the  various  types  of  units  and  among  replicated 
units of a given  type  will  suffice  for  proper  system 
operation.  In  general,  synchronization of this  type 
can  be  accomplished  by  proper  polling  sequences  and 
the  like.  However,  there  may  be  some  functions  for 
which time  synchronization  will  be more critical,  and 
for  which  special  synchronization  must  be  provided. 
This possibility will be  given  special  attention. 
• Data validity--For a high reliability system, accuracy
of  data  transmission  will  be  extremely  critical. The 
basic  data  communication  system  will  therefore  neces- 
sarily  include  data  verification  as  a  major  considera- 
tion. Use of  various  techniques  such  as  parity,  hash 
totals,  and  check  digits  will  be  considered in the 
system  specification. 
• Data transmission noise--In computer installations such
as those in aircraft  where  various  system  elements  are 
widely  separated,  and  where  (electronic)  data  pulse 
fronts  are  relatively  sharp,  grounding  problems  some- 
times  lead  to  generation  of  "noise"  pulses on the  data 
lines  that can cause  errors in reception  of  the  data. 
In addition,  errors  can  be  caused  by  such  phenomena as 
lightning  strikes on the  aircraft  and  the  consequent 
induced  signals on the  data  lines.  Potential  problems 
of  this  type  will  be  discussed  with  airline,  research, 
and  vendor  personnel, so that  system  specifications  can 
be  prepared with  a  high  assurance  that  problems  of  this 
type  can  be  circumvented. It is possible  that  optical 
data  transmission  via  glass or plastic  fiber  lines  may 
prove  to  be  satisfactory  as a  technique  for  interunit 
transmission  for  some  or  all of the  data in the  on-board 
system. 
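Two of the data-verification techniques named under "Data validity"
above, parity and hash totals, can be illustrated with a short sketch.
The function names, the single even-parity bit appended to each word, and
the 16-bit modulus for the hash total are assumptions chosen for the
example; the report does not specify word formats or which technique
would be used for which variable.

```python
def ones(word):
    """Number of 1 bits in a non-negative integer."""
    return bin(word).count("1")


def add_even_parity(word):
    """Append one parity bit so the framed word has an even number of 1s."""
    return (word << 1) | (ones(word) % 2)


def parity_ok(framed):
    """True when the framed word still has even parity."""
    return ones(framed) % 2 == 0


def hash_total(words, modulus=1 << 16):
    """Sum of all words in a block, reduced modulo a fixed word size."""
    return sum(words) % modulus


def block_ok(words, received_total, modulus=1 << 16):
    """True when the recomputed hash total matches the transmitted one."""
    return hash_total(words, modulus) == received_total


framed = add_even_parity(0b1011)       # sender side
corrupted = framed ^ 0b100             # one bit flipped in transit

block = [100, 200, 300]
total = hash_total(block)              # sent along with the block
damaged_block = [100, 201, 300]        # one word altered in transit
```

A single parity bit detects any odd number of flipped bits in a word but
not an even number, which is one reason several techniques would be
weighed against each other for each variable.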
f. Maintenance  Aspects - Ideally,  any  system--electronic  or 
mechanical--would  throughout  its  full  life  be  free of any  faults  that 
would  require  either  preventive  (scheduled)  or  fault-forced  (immediate) 
maintenance.  The  architecture  of  the  SIFT  system  precludes  the  need  for 
the  latter  type. It is  necessary  to  consider  scheduled  maintenance  pro- 
cedures  that  do  not  include  extensive  disconnection  of  system  components, 
probing of circuit  boards  with  oscilloscopes  and  voltmeter  probes,  etc. 
Airline  experience  has  consistently  indicated  that  additional  system 
faults  are  often  caused  by  such  well-intentioned  but  fault-prone  proce- 
dures  themselves.  Rather,  SRI  considers  that  the  scheduled  maintenance 
should  be a  preflight  "test  and  verification" (T/V) procedure,  whereby 
the various system modules are exercised automatically, e.g., via
prestored  programs,  and  either  approved  as  valid  (verified),  or  flagged 
as faulty. In the  latter  case,  at  least  some  minimal  method  of  auto- 
matic  diagnosis  should  be  incorporated  to  designate,  via  printout  or 
display,  the  faulty  module so that  it  can  be  replaced  with  a  minimum of 
effort and time.*
The  economic  effect  of  different  maintenance  policies on 
airline  operations  should  be  considered.  Analyses  should  be  carried  out 
to  determine  the  conditions  under  which  different  maintenance  policies 
are  advisable;  for  example,  one  policy  may  be  appropriate  for  a  short- 
haul  use  while  a  different  policy  may  be  appropriate  for  long-haul  use. 
g. Reliability  Analyses - As the  design  of  the SIFT system 
becomes  defined  in  progressively  more  detail,  it is necessary  that  the 
reliability  analyses  carried  out  on  the  current  contract  be  updated.  As 
new data become available on fault statistics, it will be necessary to
examine  their  effect  on  system  reliability  and  in  particular  to  determine 
if  changes  are  therefore  required  in  the  design.  The  reliability  analyses 
carried  out  under  the  existing  contract  will  have  to  be  extended  to  in- 
clude  consideration  of  the  various  input/output  units,  including  the  sen- 
sors  and  actuators  of  the  aircraft.  The  analyses  will  also  be  extended 
to  take  into  account  all  the  different  fault-tolerance  procedures  that 
are  possible  within  the  general  framework  of  the  SIFT  concept. 
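As a simple illustration of the kind of calculation such analyses
involve (a textbook case under stated assumptions, not a result of this
study and not the SIFT reliability model): suppose three replicated units
fail independently, each is fault-free with probability r over the
mission, and the voter itself is perfect. A majority vote then gives the
correct result whenever at least two of the three units are fault-free:

```latex
R_{3} = r^{3} + 3r^{2}(1 - r) = 3r^{2} - 2r^{3}
```

For example, r = 0.99 gives R_3 of about 0.9997. The symbols r and R_3
are introduced here only for the illustration. The analyses described
above must go further, since they cover reconfiguration, input/output
units, and fault statistics that this simple expression ignores.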
h. Transient Behavior - As with fault-tolerant computer designs,
a cause for concern is the possibility of a massive transient†
that causes multiple faults and perhaps multiple errors in the system.
The approach to be used in dealing with this possibility consists of
three parts.

* Current plans are to incorporate, in a background mode, continuous
testing of this type for all processor modules during real-time flight
operations. It is further suggested that the status of each processor
be indicated on the flight deck in some simple manner such as (1) green
light--processor operation valid; (2) amber light--momentary (transient)
fault detected; or (3) red light--processor outputs blocked from the
system because of continuing faults.

† Such transients are typically caused either by lightning strikes or by
other disturbances of the electrical system of the aircraft.
First,  it  is  necessary  to  collect  data  relevant  to  the 
problem of transients. Airlines and airframe manufacturers have
significant data relating to this matter. Tests have been made on aerospace
computers  at  NASA  Houston  and  other  sites.  Other  experiments  at SRI and 
elsewhere  have  examined  the  effect  of  large  electromagnetic  fields on 
electronic  equipment. It is  hoped  that  these  data  plus  consultation 
with  experts  in  this  field  will  enable an estimate  to  be  made of the 
effects of such  phenomena on an aircraft  computer.  Some  tests  may  need 
to  be  carried  out  to  answer  specific  questions  on  this  issue. 
Second,  schemes  must  be  developed  to  reduce  the  probability 
of such  massive  transients  in  the  system.  Such  schemes  must  include  as- 
pects  of  shielding,  improved  grounding  systems  and, as just mentioned, 
the  use of optical  data  links  for  the  larger  path-lengths  external  to 
the  computer  system  itself. 
Third,  within  the SIFT system,  techniques  must  be  developed 
for  recovery  from  such  transients.  This  may  involve  software  techniques 
such as an  automatic  restart  capability,  and/or  hardware  techniques  such 
as the  provision of a  highly  protected,  nonvolatile  back-up  memory  to 
store  critical  state  variables  to  assist  recovery  after a transient. 
Recovery  speed  requirements  for  these  variables  may  require a special 
memory  in  addition  to  the  general  system  back-up  memory. 
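The third part, recovery by automatic restart from a protected back-up
memory, can be sketched as follows. This is a minimal illustration under
stated assumptions: the BackupMemory class is an ordinary in-memory
stand-in for the nonvolatile hardware, and the state variables "mode" and
"altitude_ft" are hypothetical examples of critical state, not items
taken from the report.

```python
import copy


class BackupMemory:
    """Stand-in for a protected, nonvolatile back-up memory (hypothetical)."""

    def __init__(self):
        self._saved = None

    def save(self, critical_state):
        # Store a private copy so later corruption of the live state
        # cannot reach the saved image.
        self._saved = copy.deepcopy(critical_state)

    def restore(self):
        if self._saved is None:
            raise RuntimeError("no checkpoint has been saved")
        return copy.deepcopy(self._saved)


def restart_after_transient(backup):
    """Automatic restart: rebuild the live state from the last checkpoint."""
    return backup.restore()


backup = BackupMemory()
live_state = {"mode": "cruise", "altitude_ft": 35000}
backup.save(live_state)

# Simulate a massive transient scrambling the live state.
live_state["mode"] = None
live_state["altitude_ft"] = -1

live_state = restart_after_transient(backup)
```

How often the critical variables are saved bounds how stale the restored
state can be, which is the recovery-speed consideration raised above.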
i. Diagnosis - The viability of the  error  detection  and re- 
covery  strategies  in  SIFT  relies on the  freedom  from  faults of most of 
the  SIFT  units  that  are  performing  computations.  Assuming  a  majority- 
vote  strategy,  there  are  double  failures  that  cannot  be  tolerated. At 
the  beginning  of  a  flight,  it  is  essential  that all units,  or  possibly 
all units except one, are fault-free. We do not advocate the use of
special test equipment to accomplish preflight checkout of the computer.
Instead, the checkout is to be carried out by executing special diagnosing
programs that flex the various system units, e.g., processors, memories,
25 
busses ,  1/0 processo r s .   Dur ing   t he   p re sen t   yea r ' s   r e sea rch ,  w e  a l s o  
i d e n t i f i e d  t h e  n e e d  t o  c a r r y  o u t  p e r i o d i c ,  i n f l i g h t ,  d i a g n o s i s  o f  t h e  
h a r d w a r e  u n i t s  t h a t  are not   f lexed   dur ing   normal   computa t ion .   This   per i -  
o d i c  d i a g n o s i s  r e d u c e s  t h e  p r o b a b i l i t y  o f  m u l t i p l e  f a u l t s  r e m a i n i n g  un- 
d e t e c t e d .  
There has been extensive work on logic circuit diagnosis,
from both a theoretical and a practical viewpoint. Much of the practical
work has been carried out by the semiconductor manufacturers toward testing
LSI chips as they emerge from production. This work is not adequate for
our purposes since it relies on special test equipment (signal generators,
probes, oscilloscopes) and since it does not guarantee complete coverage.
The theoretical work has been concerned with developing diagnosing
sequences that, if applied to a circuit, will determine if it is faulty.
This work is attractive from our viewpoint since it relies on the circuit
interfaces only. (The Computer Science Group of SRI has done extensive
work in this area under commercial and NASA-ERC sponsorship.) However,
the theoretical work is not adequate since it typically assumes that the
only fault mechanism is a gate being stuck at zero or stuck at one. It
is known that LSI circuits exhibit a failure behavior which is
significantly more complex.
Our approach will first involve technical discussions with
semiconductor manufacturers to determine the actual failure behavior.
Preliminary discussions have indicated that the following failure behavior
can be expected:
• All types of single gate failures, including input-output
  shorts, open-outputs, etc.
• Shorts between contiguous gates on a chip. This fault
  assumption precludes the development of test sequences
  that are based entirely on the logic diagram.
• Failures that occur only under maximal gate loading
  conditions. This fault seems to be manifested as
  input gate failures for some of the gates driven by
  the failed gate.
After identifying the failure behavior we will study the
development of the sequences that will reveal the occurrence of the
expected failures. During Step 2, we intend to develop appropriate
techniques; the writing of actual diagnosing programs must await the
procurement of hardware in Step 3.
j. Procurement Plan - Detailed plans will be drawn up for the
tasks to be accomplished in Steps 3 to 5. These plans will incorporate
for each task the following items:
• The specification of work to be accomplished
• The estimated time to accomplish the work
• The estimated cost of carrying out the work
• The qualifications required of organizations that could
  carry out the work
• Conditions of delivery and acceptance criteria
• A tentative list of candidate organizations to carry
  out the work.
The procurement plan will also define the interaction
between the separate tasks. In particular, it will identify the critical
path(s) in the development and the manner in which the procurement plan
is intended to protect the plan as a whole from being jeopardized by
failure to carry out any particular task.
The plan will consider methods that are possible for
contracting this work, for example, the use of subcontracting or the
issuing of independent contracts. The overall management of the
development will be considered and recommendations made as to the way in
which the separate efforts will be coordinated.
3. Design Review
Between Step 2 and Step 3 we anticipate a design review. This
will be carried out by NASA personnel or their representatives in
consultation with the SRI design team. The purpose of this review is to
re-examine the design from the point of view of completeness and correctness
and to check its appropriateness for the application set for which it is
intended. At this point, it will also be possible to review the various
estimates of timescale and funding that will have been prepared in the
procurement  plan in Step 2. Heavy  involvement  with  airlines,  airframe 
manufacturers,  and  avionics  manufacturers will be  desirable in this  review. 
4 .  Step 3 
There are three major objectives of Step 3: hardware procurement
and integration, software procurement and integration, and the
development of a test facility. It is recommended that SRI continue to
play a  central  role in this  development  step  but  that  the  major  develop- 
ment of hardware  and  software  be  carried  by  organizations  specializing 
in  those  fields.  This  way  of  organizing  the  development of  a prototype 
has  been  used  extensively  by  SRI  with  great  success. In one  case,  we 
have  carried  out  the  role  of  system  integrators  for  a  mobile  digital 
packet  radio  network,  with  radio  and  computing  equipment  being  supplied 
by  vendors  in  those  fields. In another  case, SRI is  the  system  inte- 
grator in the  development  of  a  blind  landing  system  for FAA. The  design 
of  SIFT  greatly  facilitates  this  kind of operation  in  that  there  is  a 
high  degree  of  functional  independence  among  the  various  units  of  either 
hardware  or  software. It is  thus  relatively  easy  to  specify  individual 
procurements,  with  the  final  integration  to  be  carried  out  after  delivery. 
Our  design  methodology  for  preparing  formal  specifications  and  for  de- 
fining  the  functional  hierarchy of the  system  also  makes  independent  pro- 
curement  of  parts of the  system a  practical  strategy. 
The  integration of the  various  parts  will  involve  the  building 
of limited  amounts of special  hardware  (e.g.,  the  bus  system),  and  also 
the  writing of limited  amounts of programs (e.g., the  programs  used  for 
prototype tests).
a. Hardware  Procurement - It is planned  that as part  of  Step 2 
we  will  have  already  determined  those  organizations  that  are  qualified  to 
act as suppliers of the  processors  and  memories,  which  represent  the  major 
hardware  components of the  system.  It will be  necessary  in  Step 3 to  pre- 
pare  formal  requests  for  bid  from  these  organizations  for  each of the
hardware  units.  Following  evaluations of these  bids,  purchase  orders or 
development  contracts  will  be  drawn  up  for  procurement  of  equipment. 
We anticipate that these actions will have been taken in the first few
months of Step 3, thus enabling the eventual procurement to be completed
after 9 months into Step 3. The remaining 3 months of Step 3 will be
devoted to the integration of the hardware. This will be greatly
facilitated by the prior development of the special hardware that is necessary
for integrating the whole system. In procuring the major units of
hardware, it is anticipated that, to a large extent, standard off-the-shelf
units can be used with very minor modifications. We anticipate that many
suppliers may be involved; for example, it may be desirable to procure
main processors from an avionics computer manufacturer and input/output
processors from an LSI microprocessor manufacturer.
As the prototype evolves, there will need to be continuing
effort, primarily concerned with the details of the circuit technology
that is used but also involving questions of packaging and
interconnection. We expect that the first version of the prototype will use
conventional technologies in these units but will evolve to become very
similar to the eventual flight model that is planned in Step 4.
b. Software Procurement - The major software procurement can
be broken down into two parts, system software and application software.
The system software will have been fully specified in Steps 1 and 2 and
can be let out for bid using these specifications. The application
software will be specialized to the particular aircraft functions that
are determined in Step 2 and will be greatly influenced by the type of
aircraft that is to be the eventual test vehicle. Considerable gain may
be had by procuring the application software from the same organization
that is selected to supply the major hardware components, particularly if
the latter is an avionics manufacturer.
We see that the organizing of the software procurement can
be achieved in the first 3 months of Step 3, particularly when we take
into account the preliminary actions that will have been taken in
developing the procurement plan of Step 2, for example, the prior selection of
one or more candidate organizations to accomplish the necessary work. It
is expected that the actual procurement of the software will be possible
in a period of 6 months, with the final 3 months of Step 3 devoted to the
integration of the several software components.
c. Development of a Test Facility - In testing the prototype
(Step 4 below) it will be necessary to provide an adequate test
environment. This involves two major components: the connection of the
prototype to simulated input and output units, and the generation of
appropriate test data. We foresee the setting up of a test generation
facility based upon a general-purpose computer suitably programmed to
generate the appropriate test signals.
d. Flight Model Packaging - Also to be included in Step 3 is
the development of packaging techniques for the flight model. In this
task we would expect that the experience of avionics equipment
manufacturers would be directly applicable, and in such case we anticipate that
this task would be a relatively small effort. The major novelty to be
incorporated is the provision for protection against the effects of
electromagnetic disturbances. We also see the possibility of some problems
in incorporating optical coupling between units while maintaining the
integrity of any required shielding.
5 .  Steps  4 and 5 
The major activities of Steps 4 and 5 are the testing of the
prototype and the building and testing of the flight model. The tasks
to be accomplished in these steps are shown in the accompanying chart.
As stated previously, we see that the flight model should be an evolution
from the prototype rather than a completely new design.
We anticipate that the technology of the flight model will be
very closely related to the prototype. One scheme would be to use a
set of processors and memories from a minicomputer manufacturer, which
in the prototype would be constructed using conventional circuit board
techniques, and to use a ruggedized version of the same hardware in the
flight model. This scheme is very attractive in that much of the
supportive design work would not be changed in going from the prototype to
the flight model. It would include the design of special equipment
(busses, interfaces, etc.), the software system (executive and application
programs), and the test procedures and facilities that have been designed
for the prototype.
The approach suggested above might preclude the use of advanced
technology components such as LSI circuitry, but this is considered a
small risk in view of two factors:
• It is unlikely that the flight model would be built using
  radically new technology because of the untried nature of
  such a technology and the lack of data on its reliability.
• Any LSI components that are suitable for the flight model
  will probably be preceded on the market by the same type
  of equipment implemented in a less advanced technology.
In testing the flight model, a suitable research aircraft
environment will be required. We understand that NASA Langley is equipped
with such a facility and anticipate that it can be used for testing.
For this reason we see a strong involvement of NASA personnel in these
steps.

IV  THE  SIFT  CONCEPT 
A .  Introduction 
In recent years, a number of fault-tolerant architectures [Refs. 1-4]
have  been  devised  and  in  some  cases  analyzed  and  implemented. Most  of 
these  architectures  depend  heavily on  special  hardware  structures  to 
achieve  their  fault-tolerance.  While  hardware  mechanisms  are  fast  and 
economical,  they  are  severely  limited  in  the  kinds  of  faults  they  can 
treat.  Also,  such  mechanisms  cannot  be  easily  modified  to  reflect  changes 
in  performance  and  reliability  requirements. 
The SIFT (Software-Implemented Fault-Tolerance) computer [Ref. 5]
is  founded on  a  new  approach  to  fault-tolerant  computing  that  puts  strong 
emphasis  on  the  use  of  software  for  achieving  reliability,  with  correspond- 
ing  de-emphasis  on  special  hardware.  The  software  that is critical  to  the 
reliability  of  the  system  is  designed  in  accordance  with  a  hierarchical 
design methodology [Ref. 6] that permits the stating and proving of
formal properties relating to the system's correctness. A Markov process
model is used to analyze SIFT's reliability as a function of various
error-detection and reconfiguration strategies. The reliability model
is incorporated into SIFT's formal description, permitting the
demonstration that the model indeed reflects the behavior of the system.
The remainder  of  this  chapter  is  concerned  with  the  goals  of  the 
SIFT  system  and  a  narrative  description of its  operation. 
We believe  that  the  SIFT  concept  is  useful  in  many  application  areas 
where  high  reliability  is  at  a  premium.  Although  a  system  might  have 
extensive  redundancy,  if  the  software  or  hardware  mechanisms  that  manage 
the  redundancy  are  incorrect,  the  system  will  still  be  unreliable.  Later 
chapters show  how formal  verification  methods  can  be  used  to  ensure  that 
the  present  system  is  correct. We  have  attempted  to  develop  a  precise 
statement, in terms of a  Markov-like  model,  of  the  behavior  of SIFT in 
the hierarchical decomposition of the SIFT software to facilitate its
verification. We believe this is the first attempt to specify formally
a fault-tolerant system.
We think that it will be possible to verify formally the SIFT
software, because it is relatively simple and because it is highly structured.
Although SIFT exhibits some of the features of a modern operating system,
e.g., task dispatching and (limited) memory management, it is much simpler
than other systems being considered for verification [Refs. 6-7].
B. SIFT Performance and Reliability Goals
SIFT is a general-purpose computer intended for use as the central
computer in advanced commercial aircraft. The computational requirements
[Ref. 8] for the aircraft environment can be summarized as follows:
• The control features can be broken down to about 20 tasks,
  e.g., engine control, stability augmentation, and collision
  avoidance, that must be serviced. The computer is designed
  so that it could service the fastest tasks every 1 msec.
• The reliability requirement is dependent on the task. The
  tasks that are flight critical must exhibit a failure rate
  not exceeding 10^-9 per flight-hour. This high reliability
  cannot be achieved with current hardware technology without
  redundancy.
• The programs and associated data that implement the tasks
  are of moderate size.
• A task might require input data from one or more other tasks
  (typically only a few words). No other type of communication
  exists between tasks.
• Input from aircraft sensors can be accomplished by reading
  multiple copies of sensors, and in some cases the output
  can be delivered to multiple actuators.
C. SIFT System Design
The SIFT computer (Figure IV-1) consists of a number of hardware
modules, each composed of a memory and a processing unit. The individual
processing units within the modules are connected to the corresponding
memory units with wide-bandwidth busses. The intermodule bus organization
(B1, B2, B3) is designed to allow a processor to read from any memory but
not to write into other memory units. The intermodule bus is expected
to have a much lower bandwidth than an intramodule bus because of the
relatively low rate of information flow between tasks.
FIGURE IV-1  SYSTEM CONFIGURATION
(Mi: memory; Pi: processor; Bi: bus)
The input/output system, assumed to be connected to the busses B1,
B2, and B3 as shown in Figure IV-1, consists of all the noncomputing
units, for example, transducers, actuators, and sensors. The part of
the total input-output that is carried out by program, such as formatting
or code conversion, is handled in the same manner as for any other task;
that is, it is replicated in several processors.
All large tasks are broken into a number of subtasks in such a way
that no subtask requires more computing power than can be supplied by
one processor. The tasks are given the designations A, B, C, ...; the
processors are numbered 1, 2, 3, .... Each processor is capable of being
multiprogrammed over a number of tasks, as illustrated in Figure IV-2.
The control of the computing system is carried out by a number of
functions that can be segmented into two classes:
(1) Local Executive: functions that apply to each processor
    (e.g., dispatching,* voting, reporting errors, loading
    new task programs).
(2) Global Executive: functions that are global to the
    system (e.g., allocation and scheduling of work load,
    reconfiguring).
A complete set of the software functions of the Local Executive is
present in each processor; those of the Global Executive are carried out
in a sufficient number of processors to provide the degree of fault
tolerance required. The functions are realized by programs that have the
same task structure as all other programs.
* The bus logic envisioned does not use voting. The number of busses is
  variable. The number 3 is chosen for convenience of discussion.
FIGURE IV-2  EXAMPLE OF TASK/PROCESSOR ALLOCATION
The normal operating mode for a processor carrying out a task is as
follows: Data required for the task are assumed to have been computed by
several processors (possibly including the same ones carrying out the
task). The input data are read from the several processors where copies
exist. A validation is now carried out, typically by a vote among the
several values of each datum. If any of the copies of the input data
are found not to agree, this fact is noted for later processing by the
executive. During the reading of the different versions of a data item,
different busses are used in order to protect against errors in bus
operations. The computation of the task is now carried out; the results
are left in the memory of the module, and note is made (in the module)
of the fact that the task is computed.
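The validation step described above can be sketched in a few lines. This is an illustration only, not the SIFT voter: the function name `vote`, the representation of the copies as a mapping from processor number to value, the use of exact equality between copies, and the handling of the no-majority case are all assumptions introduced here.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Tuple


def vote(copies: Dict[int, Hashable]) -> Tuple[Optional[Hashable], List[int]]:
    """Majority vote over the copies of one datum, keyed by source processor.

    Returns (value, suspects). value is None when no strict majority
    exists. suspects lists the processors whose copy disagreed with the
    majority, to be noted for later processing by the executive.
    """
    if not copies:
        return None, []
    value, n = Counter(copies.values()).most_common(1)[0]
    if 2 * n <= len(copies):
        # Assumed policy: without a strict majority every copy is suspect.
        return None, sorted(copies)
    suspects = sorted(p for p, v in copies.items() if v != value)
    return value, suspects
```

For example, `vote({1: 42, 2: 42, 3: 17})` yields the value 42 and flags processor 3, while the task itself proceeds with the voted value.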
If discrepancies are detected between the several versions of a data
object, diagnosis programs in the global executive determine which unit
is at fault. Reconfiguration is achieved by having the several versions
of the global executive indicate to each local executive which tasks
should be performed and which other processors should replicate the
calculations for each task. All the local executives examine each of the
global executive versions and independently vote on these directions.
That is, each local executive decides which of the reconfiguration
directions it will accept, using a majority rule. A faulty processor might
not heed the directions of the global executive, but, based on the
instructions of the global executive, operative processors will ignore the
faulty processor. Thus the worst impact of a faulty processor is that
it will exert a slight load on the bus system.
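The majority rule applied by each local executive might look roughly like the following. This is a sketch under stated assumptions: the `Directive` type (task name mapped to the set of processors that should run it), the function names, and the choice to return `None` when no majority exists are hypothetical and are not taken from the report.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional

# Assumed representation: each task name maps to the processors that run it.
Directive = Dict[str, FrozenSet[int]]


def accept_directive(versions: List[Directive]) -> Optional[Directive]:
    """Return the directive issued by a strict majority of the
    global-executive versions, or None if there is no such majority."""
    if not versions:
        return None
    # Dicts are unhashable, so each directive is frozen into a comparable key.
    keys = [frozenset(d.items()) for d in versions]
    key, n = Counter(keys).most_common(1)[0]
    if 2 * n <= len(versions):
        return None  # assumed policy: leave the current configuration alone
    return dict(key)


def my_tasks(directive: Directive, me: int) -> List[str]:
    """Tasks that processor `me` should execute under the accepted directive."""
    return sorted(t for t, procs in directive.items() if me in procs)
```

Because every operative local executive runs the same vote on the same inputs, a processor omitted from the accepted directive is simply no longer read from, which matches the observation that a faulty processor can do little more than load the bus.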
D. The  Design  Methodology 
The SIFT design has been specified in accordance with a formal
design methodology that originated with D. Parnas [Refs. 9,10] and has
been extensively developed at SRI [Ref. 6]. The chief reasons for using
such a medium were (1) to impose a discipline on the design process
assuring a clearly-structured, easily modified design; (2) to simplify
verification of the correctness of that design; and (3) to facilitate the
analysis of certain reliability properties. Previous use of the
methodology has been concerned with only the first two of these aims. The SIFT
effort is the first instance of its use in connection with fault-tolerant
design.
The methodology can be viewed as a formalization of Dijkstra's
step-wise refinement concept [Ref. 11]. The central idea is to decompose
the design into a hierarchy of modules. The highest modules in the
hierarchy provide an abstract, global description of the system's capabilities.
Modules at lower levels of the hierarchy serve as building blocks for
implementing the highest-level module. Modules at still lower levels are
building blocks for implementing those at intermediate levels, and so on.
The modules lying near the top of the hierarchy thus tend to be highly
abstract, while those at or near the bottom tend to be more concrete. In
the SIFT design, for example, descriptions of real machine hardware
appear at the bottom level, and a set-theoretic model of the workings of
the system appears near the top.
Each module in the hierarchy is specified in terms of a set of
abstract data structures (called V-functions) plus a set of operations
(called O-functions) that change the values of these structures. At any
given moment, the state of the module is determined by the aggregate of
the values of its V-functions. O-function calls thus cause transitions
from one state to another. The V-functions and O-functions of each
module are specified using a formal language. The specifications describe
what happens when each of the functions of a module is called.
Specifications for O-functions consist of assertions, i.e., logical formulas that
relate the state (values of V-functions) of the module before an O-function
call to the state resulting from the call. Module specifications have
other aspects [Ref. 12] that are discussed in greater detail in
Chapter VIII.
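To make the V-function/O-function distinction concrete, a toy module is sketched below. It is not written in the formal specification language used for SIFT (that language is not shown in this chapter), and the module itself, `ErrorLogModule`, with its functions `error_count` and `report_error`, is a hypothetical example chosen only for illustration.

```python
from typing import Dict


class ErrorLogModule:
    """Hypothetical module counting disagreements reported per processor."""

    def __init__(self) -> None:
        self._count: Dict[int, int] = {}

    # V-function: exposes part of the module state and changes nothing.
    def error_count(self, processor: int) -> int:
        return self._count.get(processor, 0)

    # O-function: causes a transition from one state to another.
    def report_error(self, processor: int) -> None:
        # Snapshot of the V-function values before the call.
        before = {p: self.error_count(p) for p in set(self._count) | {processor}}
        self._count[processor] = before[processor] + 1
        # Assertions relating the state before the call to the state after it.
        assert self.error_count(processor) == before[processor] + 1
        assert all(self.error_count(p) == n
                   for p, n in before.items() if p != processor)
```

In a formal specification the two assertions would be the whole definition of `report_error`; here they are checked at run time against one particular implementation.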
E. Design Features of SIFT
This section is concerned with the more important design decisions
that we have formulated for SIFT.
1. 
In the aircraft application, most computations are iterative.
Thus, tasks are executed on a regular basis with a frequency that is
dependent on the application. Noniterative tasks can also be handled
within this scheme with no apparent difficulty. Dispatching is
accomplished via a fixed schedule that is stored in each processor. Three
periods, or frames, of a possible schedule are depicted in Figure IV-3.
The maximum task iteration rate, or frame rate, is determined by the
frequency of the "clock-ticks," which can be derived from an ultrareliable
system-wide clock, or via a clock associated with the processor in
question. For the latter option, the clocks in the respective processors are
loosely synchronized. The synchronization requirement is that no
processor is to commence iteration n of a task before iteration n-1 has been
completed on all processors executing that task. Thus, the slowest
processor should not slip behind the fastest processor by more than one
frame.
FIGURE IV-3 SNAPSHOT OF A SAMPLE  SCHEDULE 
In the example, tasks A and C are dispatched every frame, and
task B every two frames. Note that task C is not dispatched at the same
relative time in each frame. For the application being considered, this
is an allowable perturbation. Each of these three tasks is intended
to execute to completion during each frame in which it is dispatched,
thus obviating the need for many mechanisms usually associated with
multiprogramming. A task that for some reason does not complete the
iteration by the end of its allotted time is halted in favor of the next
task. In Figure IV-3, A designates an interval in which the processor
is in a noncomputing state, awaiting the next clock-tick.
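The fixed-schedule dispatching described above can be sketched as follows. This is a minimal illustration only, assuming that each task's period is a whole number of frames; the function name `build_schedule` and the dictionary layout are hypothetical and do not come from the SIFT design, and the order of tasks within a frame is arbitrary here.

```python
def build_schedule(periods_in_frames, n_frames):
    """Return, for each frame, the list of tasks dispatched in that frame."""
    schedule = []
    for frame in range(n_frames):
        # A task with a period of k frames is dispatched in every k-th frame.
        due = [task for task, k in periods_in_frames.items() if frame % k == 0]
        schedule.append(due)
    return schedule


# Tasks A and C every frame, task B every two frames, as in the example.
PERIODS = {"A": 1, "B": 2, "C": 1}
frames = build_schedule(PERIODS, 3)
```

Because the table is computed once and then only read, each processor could hold its own copy, which is consistent with the schedule being stored in each processor.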
2. Task Communication
Each task is processed according to the following scheme:
    READ DATA FROM EACH TASK SUPPLYING INPUTS
    COMPUTE
    WRITE DATA TO A BUFFER FOR EACH TASK THAT REQUIRES
    IT AS AN INPUT
We are assuming that any data a task requires for the execution of an
iteration is obtained from output data computed by the previous iteration
of the same and other tasks.
Since SIFT does not allow any processor to write directly into
the memory of another processor, the WRITE DATA operation is accomplished
by using a buffer that resides in the writing task's processor. If B is
to write data for A, B deposits the data in a buffer that can be
subsequently read by A, which may be executing in the same or in other
processors.
The READ DATA operation, say by task A from task B, is
implemented as follows: The data deposited by each version of B, in its own
buffer, is read, and the majority value of the several versions of the
data is computed by each version of A. In order to tolerate bus failures,
each version of B is read via a different bus. The disagreements reported
by the aggregate of processors are used to locate faulty processors and
busses. Some of the data required by a task are obtained from external
sources, which can themselves be viewed as tasks replicated for
reliability enhancement. However, the various instances of a given input
datum are not likely to be identical because of slight differences among
real physical data sources. Such slight disagreements can be prevented
from causing a vote disagreement by providing a mechanism whereby a task
performing a read can specify the precision expected among the various
instances.
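The voting step of READ DATA, including the precision allowance for external data, can be sketched as below. This is an illustrative sketch under stated assumptions, not the SIFT voter: the name `vote`, the `tolerance` parameter, and the choice of returning the first value that a strict majority agrees with are all hypothetical.

```python
def vote(values, tolerance=0.0):
    """Return (majority_value, dissenting_indices).

    A value is accepted when a strict majority of the instances lie within
    `tolerance` of it. If no such value exists, the result is (None, all
    indices), meaning that no majority could be formed.
    """
    n = len(values)
    for candidate in values:
        agree = [i for i, v in enumerate(values) if abs(v - candidate) <= tolerance]
        if len(agree) > n // 2:
            dissent = [i for i in range(n) if i not in agree]
            return candidate, dissent
    return None, list(range(n))


exact = vote([5, 5, 7])                             # computed data, exact match
sensor = vote([10.00, 10.02, 10.01], tolerance=0.05)  # slightly differing sources
```

The dissenting indices correspond to the disagreements that each reading processor would record for later fault location.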
When task A votes on the data computed by several instances of
task B, these data must all be associated with the same iteration of B.
Consider the task timing illustrated in Figure IV-4. Since all instances
of the tasks executing in different processors are not assumed (nor
intended) to be mutually synchronized, and if only one buffer were
provided per processor per writing task, then iteration n of task A in P2
would read data from iteration n-1 from A in P2, but from iteration n
from A in P1. This problem is resolved by providing two buffers in each
processor for each writing task, one which is written into on odd-numbered
iterations and the other on even-numbered iterations.
(Figure IV-4 shows the schedules for P1 and P2 over iterations n-1, n, and
n+1, marking the read and write of task A.)
FIGURE IV-4 TASK SCHEDULES DEMONSTRATING THE NEED
FOR TWO COMMUNITY BUFFERS
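The odd/even buffer arrangement described above can be sketched as follows. This is a minimal sketch: the class name `DoubleBuffer` is hypothetical, and storing the iteration number alongside the data is an added assumption that lets a reader confirm which iteration it has obtained.

```python
class DoubleBuffer:
    """Two slots per writing task: one for even and one for odd iterations."""

    def __init__(self):
        self._slots = [None, None]  # index 0: even iterations, index 1: odd

    def write(self, iteration, data):
        # The slot is selected by the parity of the iteration number.
        self._slots[iteration % 2] = (iteration, data)

    def read(self, iteration):
        """Return the data written on `iteration`, or None if it is absent."""
        slot = self._slots[iteration % 2]
        if slot is not None and slot[0] == iteration:
            return slot[1]
        return None


buf = DoubleBuffer()
buf.write(4, "result of iteration 4")
buf.write(5, "result of iteration 5")   # does not overwrite iteration 4
```

A reader that still needs iteration 4 finds it intact after the writer has gone on to iteration 5, which holds as long as the writer is no more than one iteration ahead.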
3. Detection and Location of Processor and Bus Failures
In this section, we discuss the method whereby the global
executive can determine which processor or bus is faulty, based on the error
reports of each of the processors. For simplicity, we assume
triplication of processors and busses, so that single faults can be tolerated and
located. The generalization to general redundancy is not difficult.
On behalf of a task, a processor will read data from other
processors that, in the absence of faults, should be identical. If one of
the data instances, as read by a processor, is in disagreement, then the
processor will record the identity of the disagreeing processor and the
identity of the bus used. The global executive will examine the
processor-bus discrepancies reported by each of the processors and attempt to
identify the processor(s) and/or bus(ses) that are faulty.
The following four fault types cover all possible single
processor and bus fault occurrences that could lead to erroneous results:
(1) Processor computation and/or voting (PCV)--A processor
    produces erroneous values in computing results
    for tasks and/or in performing a vote and
    deciding which input(s) to the vote is in
    disagreement with the majority.
(2) Bus transmission (BT)--A bus changes the value
    of a word as it is transmitted between processors.
(3) Processor-bus in initiating reading (PBIR)--A
    processor is incapable of initiating a read operation
    via a particular bus.
(4) Processor-bus in depositing data (PBDD)--A
    processor is incapable of depositing data onto a
    particular bus.
To illustrate the fault location algorithm, suppose that on
each iteration, a task reads data from a previous iteration of itself.
The task executes on three processors and uses three distinct busses for
the read operation. For odd iterations, the bus assignment is as in
Figure IV-5a and for even iterations as in Figure IV-5b. The
interpretation of the matrices is as follows: the P1 row of Figure IV-5a
indicates that when P1 reads from P1 it uses bus B1,* when P1 reads from P2
it uses bus B2, and when P1 reads from P3 it uses B3. It is apparent
that for either assignment, the occurrence of any one of the four fault
types leads to an error which is masked by the voting scheme. It remains
to show that a fault can be pinpointed to a processor, a bus, or a
processor-bus connection (type 3 or 4).
(Figure IV-5 consists of two matrices of bus assignments, (a) ODD-ITERATIONS
and (b) EVEN-ITERATIONS, each with rows for the READING PROCESSORS P1, P2,
P3 and columns for the READ-FROM PROCESSORS P1, P2, P3.)
FIGURE IV-5 BUS ASSIGNMENTS TO ENABLE SINGLE FAULT LOCATION
* It is, of course, feasible for a processor in reading from itself to use
the internal processor-bus connection which is of a higher bandwidth than
the inter-processor bus system. However, in this discussion we assume
that the bus system is used for all read data operations. This avoids
the need for a separate fault location algorithm when a task reads data
from a task in its processor.
The fault location algorithm is illustrated in Figure IV-6, for
a fault of each of the four types. In the case of a type 1 failure
involving P1 (P1 computes erroneous results for the task and/or produces
erroneous, and arbitrary, error reports), from the reports of P2 and P3
it is apparent that P1 is faulty and that its error reports should be
ignored. The only possibility for ambiguity is between a type 1 failure
(P1 voting incorrectly) and a type 3 failure (P1 unable to initiate a
read via B1). Both failure types could produce the same error report
(although it is unlikely that for a type 1 failure this would be the case),
in which case the global executive would first suspect a type 3; the
subsequent use of P1 could actually reveal the presence of a type 1 failure.
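The decision rules suggested by the remarks in Figure IV-6 can be sketched as a small classifier. This is an illustrative reading under a single-fault assumption, not the algorithm used by the SIFT global executive: the record format `(reporter, source, bus)`, the function name `locate_fault`, and the ordering of the rules are hypothetical.

```python
def locate_fault(reports):
    """Classify a single fault from disagreement records.

    Each record is (reporter, source, bus): `reporter` found that the data
    read from `source` via `bus` disagreed with the majority.
    Returns (fault_type, processor, bus), or None if there are no reports.
    """
    reports = set(reports)
    if not reports:
        return None
    processors = {r for r, _, _ in reports} | {s for _, s, _ in reports}
    # PCV: at least two other processors accuse p, over more than one bus.
    # Reports made by p itself are ignored, since they may be arbitrary.
    for p in sorted(processors):
        accusers = {(r, b) for r, s, b in reports if s == p and r != p}
        if len({r for r, _ in accusers}) >= 2 and len({b for _, b in accusers}) >= 2:
            return ("PCV", p, None)
    buses = {b for _, _, b in reports}
    if len(buses) == 1:
        bus = next(iter(buses))
        reporters = {r for r, _, _ in reports}
        sources = {s for _, s, _ in reports}
        if len(reporters) == 1:
            # Only one processor reports, always involving the same bus.
            return ("PBIR", next(iter(reporters)), bus)
        if len(sources) == 1:
            # Different processors report the same source and the same bus.
            return ("PBDD", next(iter(sources)), bus)
        # Several processors report several sources, all over one bus.
        return ("BT", None, bus)
    return ("UNKNOWN", None, None)


pcv = locate_fault([("P2", "P1", "B2"), ("P3", "P1", "B3"), ("P1", "P2", "B2")])
bt = locate_fault([("P1", "P1", "B1"), ("P2", "P3", "B1"), ("P3", "P2", "B1")])
pbir = locate_fault([("P1", "P1", "B1"), ("P1", "P3", "B1")])
pbdd = locate_fault([("P1", "P1", "B1"), ("P2", "P1", "B1")])
```

In keeping with the text, a report pattern that fits both type 1 and type 3 is classified here as type 3 (PBIR) first; later reports from the same processor could still lead to a PCV classification.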
A faulty unit should be identified shortly after the detection
of the failure by the voting processors. The global executive will then
instruct all processors to ignore the faulty processor or not to use the
faulty bus. This process is well known as adaptive voting [Refs. 13, 14]
and enables SIFT to tolerate some multiple faults that may occur before
reconfiguration can be completed.
After the identities of the faulty units are known to the
global executive, it initiates a reconfiguration process, so as to utilize
effectively the remaining operative resources. After the reconfiguration
is complete, the allocation of tasks to processors and busses could be
entirely different from that prior to the reconfiguration. As a result
of the reconfiguration, processors might be given new schedules, and
the processor and bus allocation tables in each processor must be
updated.
The formulation of new bus assignments after detection and
location of a bus failure is easily carried out by the global executive.
It is feasible for the global executive to compute in real time new task
allocations and schedules in response to each processor failure.
(Figure IV-6 is a table with the columns FAILURE; ERROR REPORTS for P1-odd,
P1-even, P2-odd, P2-even, P3-odd, and P3-even; and REMARKS.)
FAILURE: PROCESSOR-COMPUTE AND VOTE (P1)
REMARKS: MAJORITY AGREEMENT THAT P1 IS FAULTY, INDEPENDENT OF P1's REPORT
FAILURE: BUS TRANSMISSION (B1)
REMARKS: UNANIMOUS AGREEMENT THAT B1 IS FAULTY
FAILURE: PROCESSOR-BUS, READ INITIATION (P1,B1)
REMARKS: ONLY P1 REPORTS ERRONEOUS RESULTS, ALWAYS INVOLVING B1
FAILURE: PROCESSOR-BUS, DEPOSITING DATA (P1,B1)
REMARKS: TWO DIFFERENT PROCESSORS REPORT ERRONEOUS RESULTS INVOLVING P1 AND B1
FIGURE IV-6 ILLUSTRATIONS OF FAULT-LOCATION ALGORITHM
An alternative approach is to precompute the allocations and schedules
for all possible processor fault occurrences. We are now using this
latter approach, since the storage requirements are small.
A global executive instance could reside in each processor, and
thus assume the entire responsibility for reconfiguring that processor,
based on the error reports of all processors. Besides deriving new
schedules and bus assignments for its processor, each global executive
updates tables and loads new tasks into the processor. However, in order
to minimize the executive computational load on the processors, we have
decomposed the executive tasks into two parts: (1) a global executive
task, residing in at least three processors, which computes new allocations
and schedules for each processor, and (2) a local executive residing in
each processor, which determines what its new configuration should be by
voting on the three global executive instances.
F. The Logical Structure of SIFT
The preceding discussion has summarized the primary operation of
SIFT. In this section, we consider a hierarchical decomposition of the
system.
The logical structure of SIFT consists of a hierarchical layering
of modules, which we designate as system modules, and some programs,
namely the application tasks and the global and local executives, that
utilize the facilities of the external interface of system modules. For
simplicity, we will say that the tasks call the functions of the interface.
Each system module may be considered as an abstract machine that maintains
a state (represented by V-functions) and provides operations (O-functions)
to modify the state. The application tasks and executive may then be
considered as programs that run on the abstract machines. The data
required by the tasks are distributed among the system modules.
Each task, including the global executive, executes in some subset
of the processors. The fault status and fault schedules modules, which
are accessed only by the global executive, appear only in processors
executing the global executive.
It is assumed that the real-machine module describes the ordinary
machine instructions, e.g., add, store. In reality, the tasks and all
modules above the hardware will call these instructions and thus should
be depicted as connecting to the real machine. We will not concern
ourselves with these connections here since they are not essential to the
fault-tolerance, reliability, and scheduling properties of SIFT.
G. Discussion
The SIFT concept embodies a number of ideas whose usefulness extends
beyond the particular application for which the system is designed.
Because conventional, off-the-shelf processing units comprise the bulk of
the hardware, the system can be easily and inexpensively adapted to a
broad range of needs. Moreover, because the degree of reliability achieved
by the system depends on the number of processors used and on scheduling
strategies rather than on built-in aspects of the design, it can be varied
according to performance and cost requirements of the application.
The use of a formal design medium for purposes of specification,
validation, and reliability modeling can be expected to play an important
role in future designs of fault-tolerant computers. While a system might
make extensive use of redundancy, the system will not be reliable unless
the software or hardware mechanisms that manage the redundancy are correct.
Similarly, the formulation and use of elaborate reliability models is of
little value if it cannot be demonstrated that these models actually
reflect the behavior of the system. We believe that SIFT constitutes a
major step in the direction of fault-tolerant systems whose correctness
and reliability can be verified.
REFERENCES
1. A. Avizienis, G. C. Gilley, F. P. Mathur, D. A. Rennels, J. A. Rohr,
and D. K. Rubin, "The STAR (self testing and repairing) computer:
An Investigation of the Theory and Practice of Fault-Tolerant
Computer Design," IEEE Trans., Vol. C-20, No. 11, pp. 1312-1321
(November 1971).
2. A. L. Hopkins, Jr., "A Fault-tolerant Information Processing Concept
for Space Vehicles," IEEE Trans., Vol. C-20, No. 11, pp. 1394-1403
(November 1971).
3. F. P. Maison, "The MECRA: A Self Reconfigurable Computer for Highly
Reliable Process," IEEE Trans., Vol. C-20, No. 11, pp. 1382-1388
(November 1971).
4. "Design of a Modular Digital Computer System," NASA CR-123655,
1972.
5. J. H. Wensley, "SIFT-Software Implemented Fault Tolerance,"
Proceedings of the Fall Joint Computer Conference, Vol. 41, pp. 243-253
(AFIPS Press, Montvale, New Jersey, 1972).
6. L. Robinson, K. N. Levitt, P. G. Neumann, and A. K. Saxena, "A Formal
Methodology for the Design of Operating System Software," in Current
Trends in Programming Methodology, Vol. 1, R. T. Yeh (ed.)
(Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1976).
7. P. G. Neumann, L. Robinson, K. N. Levitt, R. S. Boyer, and A. Saxena,
"A Provably Secure Operating System," final report under Contract
DAAB03-73-C-1454, Stanford Research Institute, Menlo Park, California
(June 1975).
8. R. S. Ratner, et al., "Design of a Fault-Tolerant Airborne Digital
Computer," Volume II--Computational Requirements and Technology,
Final Report. NASA CR-132253, 1973.
9. D. L. Parnas, "Information distribution aspects of design
methodology," in Proc. IFIP Congress 1971 (North-Holland Publishing
Company, Amsterdam, 1972).
10. D. L. Parnas, "A technique for module specification with examples,"
Comm. ACM, 15, 5, pp. 330-336 (May 1972).
11. E. W. Dijkstra, "Notes on structured programming," in Structured
Programming, C.A.R. Hoare (ed.) (Academic Press, New York, New
York, 1972).
12. L. Robinson, "Specification Techniques," Proceedings of the Thirteenth
Design Automation Conference (June 1976).
13. W. H. Pierce, "Adaptive Vote-Takers Improve the Use of Redundancy,"
in Redundancy Techniques for Computing Systems, pp. 229-250 (Spartan
Books, Washington, D.C., 1962).
14. J. Goldberg, K. N. Levitt, and R. A. Short, "Techniques for the
Realization of Ultra-Reliable Spaceborne Computers," Final
Report. NASA CR-80019, 1966.

V TASK STRUCTURE, ALLOCATION AND SCHEDULING
A. Introduction
This section of the report describes effective procedures for the
following major elements in the design of a fault-tolerant computer
system:
• Analysis of tasks to derive applicable flight phase
  descriptions.
• Determination of the numbers and sizes of the redundant
  processor and memory units required for the reliable
  execution of the specified tasks.
• Allocation of tasks to specific processor and memory
  units to achieve some balance of processing and memory
  loads.
• Specification of task scheduling and reconfiguration
  procedures and symbologies in a concise, efficient
  form suitable for implementation.
The analysis considers representative tasks for those aircraft
functions that have been previously described as candidates for control by
SIFT. While this set is more comprehensive than the set of tasks that
will be initially implemented in the prototype, it is important to
develop a design methodology for SIFT that will be applicable up to the desired
conceptual level of technological sophistication. This approach assures
an upward compatible design that will meet the requirements of a smaller
scale prototype while not constraining or invalidating future levels of
system expansion.
This section also considers schedule implementation factors such as
schedule storage, the impact of system degradation, the practical
implications of task criticality class, the change of schedules with flight
phase change, and approaches to deriving new task schedules dynamically
during flight.
B. Flight Phase Analysis
The set of flight-related application tasks previously described in
the SRI report entitled "Design of a Fault Tolerant Airborne Digital
Computer" [Ref. 1] was examined to establish configurations of active
tasks sufficient to support the major flight phases potentially
encountered in normal flight. The intent of this study was to determine the
variations in the task profiles of the various phases. It became
necessary to distinguish between tasks that were actively being executed,
those that were serving a vital backup role (referred to as passive
allocation), and those tasks that performed secondary roles by actively
confirming and augmenting the results of the primary tasks. In some
instances, certain tasks were not required at all for certain phases.
The derived flight-phases (Table V-1) constitute the major operational
modes for which SIFT must provide task allocation and scheduling among
the multiple processors and memories, as discussed in the following
portions of this section.
An anomaly state presenting a "Navigational Failure" is also
specified in Table V-1 as an illustrative example. Although it is represented
as a phase, it is clearly a state that may need to be accommodated during
several of the described phases. The primary change is the shift of the
navigational support system from the VOR/DME and Multiple-DME to the Omega
or satellite equipment as primary support, with reliance on Air Data for
secondary support. Thus, the task schedules could be modified by simple
replacement of the preferred primary and backup navigation systems by
the appropriate secondary set of primary and backup systems.
C. Review of Task Characteristics for Flight Phase Assignment and
Processor-Memory Unit Allocation
The set of flight tasks that were to be considered for initial SIFT
implementation are listed in Tables 3 and 4 of the SRI report entitled
"Design of a Fault Tolerant Airborne Digital Computer" [Ref. 1]. Further
qualification and modification of the characteristics of these tasks have
been made to facilitate schedule development. Values of some properties
Table  V-1 
SYSTEM CONFIGURATION ANALYSIS 
Task 
Code 
A1 
A2 
A3
A4-6 
A7 
A8 
B2 
B3
B4 
B5 
B l  
B8 
B9 
B10
C1
C2
C3
D1
D2 
D3 
D4 
D5 
Application
Attitude  Control 
Flutter  Control 
Load  Control 
Autoland 
Autopilot 
Attitude  Indicator 
Inertial 
VOR/DME & Multiple DME 
OMEGA or  Satellite 
Air  Data  (Navigation) 
Flight  Data 
Airspeed,  Altitude 
Graphic  Display 
Test  Display 
Collision  Avoidance 
Data Comm., A/C
Data Comm., Air/Ground (DABS)
AIDS 
Instrument  Monitor 
System  Monitor 
Life  Support 
Engine  Control 
Flight  Phase 
Climb Initial Missed Navigational
Takeoff Descent
- P 
P 
S 
- 
- 
- - 
- P 
P  P 
P P 
P 
B 
- 
- 
- - 
- P 
P  P 
P 
S S 
P  P 
P  P 
S S 
S S 
S S 
S S 
P 
P  P 
- 
- 
Cruise 
P 
P 
S 
- 
P 
P 
P 
P 
B 
- 
P 
P 
P 
S 
P 
P 
S 
S 
S 
S 
P 
P 
Approach 
P 
- 
S 
- 
P 
P 
P 
P 
B 
- 
P 
P 
P 
P 
P 
P 
P 
S 
S 
S 
S 
P 
Landing 
- 
- 
P 
P 
B 
P 
P 
P 
B 
- 
P 
P 
P 
P 
P 
P 
P 
S 
S 
S 
- 
P 
Approach 
P 
- 
S 
- 
P 
P 
P 
P 
B 
- 
P 
P 
P 
P 
P 
P 
P 
S 
S 
S 
- 
P 
Failure 
P 
- 
S 
- 
P 
P 
P 
- 
P 
B 
B 
P 
- 
S 
P 
P 
S 
S 
S 
S 
P 
P 
Symbols: P: Prime; S: Secondary; B: Backup; -: N/A 
needed to be assigned, and ranges of values required unique assignments
before schedule development could proceed. In addition, the local and
global executives have now been specified to a degree permitting values
with a sufficient level of confidence to be assigned to their description.
Likewise, the flight-phase-change tasks and the reconfiguration tasks can
be specified more accurately. A summary of the modifications and
additions that have been made to the tables is given in Table V-2. The
resulting revised set of task modules and their properties is shown in
Table V-3. These data can now be used to allocate task sets to
processor-memory units and to develop a suitable task scheduling algorithm.
* 
D. Task Allocation and Schedule Generation
Procedures are outlined here for the allocation of the various tasks
to specific (redundant) processor and memory units, and for the
organization of the scheduling of those tasks.
The problems of allocating and scheduling these tasks are considered
in some detail. Assumptions regarding the characteristics of the tasks
include:
• All tasks pertinent to a flight phase are resident in
  main memory during that phase.
• Reconfiguration can occur due to flight phase change,
  processor-memory failure or pilot intervention.
• Software task replication in separate processor-memory
  units is used to achieve fault-tolerance.
• Replicated tasks will be only loosely synchronized.
• The tasks operate on a real-time basis and have
  stringent execution periodicity requirements.
* The alphanumeric task designators are those used in Table V-1.
Table V-2
ADJUSTMENTS AND MODIFICATIONS
TO THE INITIAL TASK SPECIFICATIONS
• Tasks A4, A5, and A6, the Autoland Tasks, were coalesced
  into one task.
• Task B1, the Supervisor Task, was merged with the Local
  Executive, LE.
• Task B4, DME/OMEGA, was combined with Task B3, VOR/DME,
  since they would not run concurrently. Also, B4 was
  assigned the same MIPS as B3.
• Task B5, Air Data, was assigned an iteration rate per
  second of 5 and a MIPS of 0.001.
• The criticality class of Task C2, Data Comm. A/C, was
  set to 1 assuming that the backup is under pilot
  direction.
• Variable task criticality class assignments were fixed
  by assuming the highest value.
• Task D3, System Monitor, was assigned an iteration rate
  per second of 5. Multiple or variable iteration rates
  were chosen to have the highest value.
• Task LE was added to the table for the Local Executive
  and is totally replicated. It has to run as frequently
  as the most frequent module, that is, with an iteration
  rate of 670 per second. The number of instructions per
  iteration was determined to be 50.
• The Global Executive, GE, was also added and was assigned
  an iteration rate of five per second and 200 instructions
  per iteration.
• Infrequent requirements -          Instructions per Iteration*
    Reconfiguration - Global         100
    Reconfiguration - Local          120
    Flight phase change - Global     100
    Flight phase change - Local      120
    Program load                     20 X number of words
* Criticality class = 1.
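The iteration rates, instruction counts, and MIPS figures quoted in Table V-2 can be cross-checked with a short calculation. The sketch below assumes that the MIPS figure for a task is simply its iteration rate multiplied by its instructions per iteration, expressed in millions of instructions per second; the function name `mips` is hypothetical.

```python
def mips(iterations_per_sec, instructions_per_iteration):
    """Processing load in millions of instructions per second."""
    return iterations_per_sec * instructions_per_iteration / 1e6


le_load = mips(670, 50)   # Local Executive: 670 iterations/sec, 50 instructions
ge_load = mips(5, 200)    # Global Executive: 5 iterations/sec, 200 instructions

# Task B5 (Air Data): a rate of 5 per second and 0.001 MIPS imply this
# many instructions per iteration.
b5_instructions = 0.001 * 1e6 / 5
```

Under this assumption the Local Executive load comes to 0.0335 MIPS and the Global Executive load to 0.001 MIPS, while Task B5 works out to 200 instructions per iteration.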
Table V-3 
TASK  MODULE PROPERTIES  FOR  SCHEDULING  ASSIGNMENTS 
Iteration 
Task Rate/Sec. 
A1  20 
A2 250 
* A3 240
* A4,5,6 160 
A7 5 
* A8  30 
B2 25 
* B3 5 
B4 5 
B5 5 
B6 0.2 
B7 5 
* B8 16 
* B9 8 
B10 10 
* C1 670
* C2 5
* C3 4
* D1 4
* D2 5 
* D3 5 
* D4 0.5 
* D5  33 
*LE 670 
* GE 5 
* REC-GE 
* FPC-GE * IRREGULAR 
* FPC-LE 1 
* PROGRAM LOAD 
Period in 
Sec. (x10^3)†
50 
4 
4 (3) 
6.25 (6) 
33.3 (30) 
40 
200 (180)
200 
200 
5000 
200 
125 (120) 
100 
200 
62.5 (60) 
1.5 
200 
250 
250 
200 
200 
2000 
30 
1.5 
200 (180) 
Instructions 
Iteration 
1150 
276 
58 
344 
200 
2567 
1360 
800 
800 
200 
5000 
5600 
562 
4000 
1900 
31 
1200 
250 
500 
2800 
200 
2000 
3606 
50 
200 
100 
120 
100 
120 
20* 
Abbreviations: 
FPC - Flight phase change 
LE, GE - Local and Global  Executives 
REC - Reconf igurat ion 
T - Totally replicated 
- MIPS 
0.023 
0.069 
0.014 
0.055
0.001 
0.077 
0.034 
0.004 
0.004 
0.001 
0.001 
0.028 
0.009 
0.032 
0.019 
0.021 
0.006 
0.001 
0.002 
0.014 
0.001 
0.001 
0.119 
0.034 
0.001 
Criticality 
Class 
1 
1 
3 *  
1 *  
4 
1 *  
2 
4 *  
4 
4 
4 
4 
4 *  
4 *  
4 
4 *  
1 *  
4 *  
5 *  
4 *  
1 *
1 *
1 *
1 * T
1 *
1 *
1 * T
1 *
1 * T
1 * T
Memory 
2075 
92 
60 
1025 
250 
1310 
2250 
300
505
135
315
550 
430 
6250 
9340 
1200 
610 
562 
1300 
1900 
1000 
1000 
1500 
320 
1:oo 
* Most demanding phase, Autoland
† The values in parentheses are period assignments somewhat shorter than the
desired requirements, but representing convenient multiples of the smallest
period (1.5 msec.), and of subsequent higher multiples, for schedule derivation.
Note that in no instance was the period changed by more than 30%.
*Times the  number of words 
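The statement that the parenthesized periods are multiples of the 1.5-msec base period, and that none differs from the desired period by more than 30%, can be checked directly. The sketch below uses the desired and assigned period pairs as they appear in the period column of Table V-3; the names `ADJUSTED` and `check` are hypothetical.

```python
BASE_MSEC = 1.5  # smallest period in the table, in milliseconds

# desired period (msec) -> assigned period (msec), as read from Table V-3
ADJUSTED = {4: 3, 6.25: 6, 33.3: 30, 200: 180, 125: 120, 62.5: 60}


def check(desired, assigned, base=BASE_MSEC):
    """Return (is_multiple_of_base, fractional_shortening)."""
    ratio = assigned / base
    is_multiple = abs(ratio - round(ratio)) < 1e-9
    change = (desired - assigned) / desired
    return is_multiple, change


results = {d: check(d, a) for d, a in ADJUSTED.items()}
all_multiples = all(ok for ok, _ in results.values())
worst_change = max(change for _, change in results.values())
```

For these pairs every assigned period is a whole multiple of 1.5 msec, and the largest shortening is the change from 4 to 3 msec.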
• To achieve balanced task loading in terms of both
  processing and memory requirements.
• To allow enough spare capacity to permit
  reconfiguration with one less memory-processor unit.
• To accommodate supplementary "passive" allocation of
  critical tasks that demand processing within a time
  frame that does not allow for reconfiguration.
• To permit either full task allocation or single
  flight-phase task allocation with reconfiguration to achieve
  flight-phase change.
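The first objective above, balanced loading with replicas held in separate units, can be illustrated with a simple greedy heuristic. This is not the allocation procedure of this report; it is a sketch that balances processing load only (memory is ignored), assumes a replication level of three, and uses hypothetical names throughout. The MIPS figures are those read from Table V-3 for tasks A1, A2, and A8.

```python
def allocate(tasks, n_units):
    """Greedy placement. tasks: {name: (mips, replicas)}.

    Returns (placement, load): placement maps a unit index to the task
    names it hosts, and load lists the summed MIPS per unit.
    """
    load = [0.0] * n_units
    placement = {unit: [] for unit in range(n_units)}
    # Heaviest tasks first, so lighter tasks can even out the load later.
    ordered = sorted(tasks.items(), key=lambda item: item[1][0], reverse=True)
    for name, (mips, replicas) in ordered:
        if replicas > n_units:
            raise ValueError(f"{name}: more replicas than units")
        # The `replicas` least-loaded units; distinct by construction.
        chosen = sorted(range(n_units), key=lambda unit: load[unit])[:replicas]
        for unit in chosen:
            placement[unit].append(name)
            load[unit] += mips
    return placement, load


TASKS = {"A1": (0.023, 3), "A2": (0.069, 3), "A8": (0.077, 3)}
placement, load = allocate(TASKS, n_units=5)
```

Spare capacity for reconfiguration could be examined by calling the same function with one unit fewer and comparing the resulting loads against the capacity of a unit.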
Schedule generation has a similar set of objectives:
• A technique with an unequivocal set of task sequence
  assignment rules.
• A schedule specification that can be efficiently
  stored, applied, and changed.
• A derivation methodology that is flexible and can be
  shown to achieve the required task execution periodicity.
• A representation that is easily interpreted and
  implemented.
Prior to detailed allocation algorithm development, consideration should
be given to the way that task schedules will be implemented. Of
particular interest here are such factors as:
• The resource penalty for loading all normal tasks
  belonging to any flight phase schedules so that
  no reconfiguration will be necessary.
• When a failure or set of failures occurs, the method
  by which new schedules are derived and implemented.
• The extent to which tasks can be replaced by pilot
  control so that schedules may be revised by simply
  eliminating failed tasks.
• The form in which schedules are to be stored and/or the
  way they will be generated on-line.
The resolution of these design factors is dependent on the task
replication scheme, passive allocation techniques, failure state procedures,
and the total number of processor-memory units. If a generalized approach
57 
is  taken  to  addressing  these  factors,  the  factors  might  be  evaluated 
from  several  alternative  approaches  to  determine  the  impact  of  these 
assumptions  on  either  the  required  processor-memory  resources,  the  sim- 
plicity  of  schedule  derivation  and  implementation,  or  the  reconfiguration 
method. For instance, if we triply replicate all tasks and, for all
flight phases simultaneously, allocate them over five processor-memory
units, would the resources of any of these units be exceeded, and what
size of memory unit is required? Likewise, what is the resulting
accumulated distribution of processor and memory resources? These and
other similar questions are examined in this section.
The  problems  associated  with  reconfiguration  resulting  from  system 
failures  are  considered  first  and  may  determine  the  amount  of  spare 
capacity  that  must  be  designed  into  the  system  to  assure  redistribution 
of tasks on a  failed  processor-memory  unit  onto  the  other  operational 
units. A less critical task that is adequately replicated initially may
not  need  to  be  reassigned.  Very  critical  tasks  either  must  be  reassigned 
to  another  processor-memory  unit,  must  have  adequate  backup  via  another 
task  that  is  operational,  or  must have  been  passively  allocated  on  some 
other  processor.  Tasks  that  cannot  tolerate  missed  iterations  and  that 
belong  to a  high-criticality  class  should  be  passively  allocated  to  guar- 
antee  immediate  takeover  of  the  critical  task  processing  from  a  failed 
unit.
One  approach  is  to  take  all  the  tasks  in  any of the  phases  and  dis- 
tribute  them  across  processors.  However,  this  distribution  would  lead 
to  rather  heavily  loaded  processors,  and  one  or  more  additional  processor 
units  would  be  required  to  support it. Such  a  distribution  would,  how- 
ever,  facilitate  reconfiguration  and  flight  phase  change. 
If there  is  only  one  flight  phase  per  allocation  or  if,  at  least, 
not  all  tasks  are  loaded  into  the  system  all  of  the  time,  then  the 
occurrence of a failure requires either accessing stored schedules or
having  a  method  of  automatically  generating  them.  Such  methods  are  con- 
sidered  in  detail  in  the  next  section.  Another  approach  is  to  depend 
upon  pilot  intervention  for  certain  noncritical  support  functions  and  to 
more  than  triply  replicate  the  critical  tasks.  While  these  approaches 
are all viable,  the  current  documents  need  only  demonstrate  the  feasi- 
bility  of  at  least  one  such  allocation  and  scheduling  scheme. 
In order  to  estimate  the  minimum  adequate  number  of  processors  to 
forestall  degradation  because  of  a  single  processor  or  memory  unit loss, 
a  simple  approach  might  be  taken.  First,  all  tasks  above  minimum  criti- 
cality  must  be  at  least  triply  replicated  to  assure  an  adequate  level  of 
confidence.  Three  replications  are  sufficient,  but  four  give  an  addi- 
tional  margin  that  allows  one  processor-memory  to  fail  while  the  module 
continues to execute reliably without reconfiguration. Assuming, however,
that  reconfiguration  is  acceptable  and  necessary,  a  simple  approach  to 
estimate  the  number  of  processors  sufficient  for  the  task  is: 
Let $PT_j$ be the processing time to carry out one iteration of task $j$,
$T_j$ be the iteration period of task $j$,
$MR_j$ be the memory requirement of task $j$;
then the total processor requirement is
$$P = \sum_{\text{all tasks}} \frac{PT_j}{T_j}$$
and the total memory requirement is
$$M = \sum_{\text{all tasks}} MR_j$$
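As a rough illustration of these two totals, the sketch below sums PT/T and MR for the landing-phase tasks. It assumes the processor requirement is the sum of processing time over period (the printed formula is not legible in this copy), and it borrows execution times and periods from Figure V-3 and memory sizes from Table V-5. The `Task` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    exec_us: float    # PT: processing time per iteration (microseconds)
    period_ms: float  # T: iteration period (milliseconds)
    memory_kw: float  # MR: memory requirement (kilowords)


# Landing-phase values: times from Figure V-3, memory from Table V-5.
TASKS = [
    Task("C1", 62, 1.5, 1.20),
    Task("LE", 100, 1.5, 0.32),
    Task("A3", 116, 3.0, 0.06),
    Task("A4", 688, 6.0, 1.02),
    Task("B8", 1124, 60.0, 0.43),
    Task("GE", 400, 180.0, 1.10),
    Task("A8", 5134, 30.0, 1.31),
    Task("B9", 8000, 120.0, 6.25),
    Task("B3", 1600, 180.0, 0.30),
]


def processor_requirement(tasks):
    """Assumed form: sum of PT/T, the fraction of one processor consumed
    by a single (unreplicated) copy of every task."""
    return sum(t.exec_us / (t.period_ms * 1000.0) for t in tasks)


def memory_requirement(tasks):
    """Sum of MR over all tasks, in kilowords (single copy of each)."""
    return sum(t.memory_kw for t in tasks)


if __name__ == "__main__":
    print(f"processor requirement: {processor_requirement(TASKS):.3f}")
    print(f"memory requirement:    {memory_requirement(TASKS):.2f} K")
```

Both totals cover one copy of each task; replication and safety margins would multiply them.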
To illustrate  application  of  possible  allocation  techniques,  two 
approaches  are  taken.  Instead  of  focusing  initially  on  the  flight  phases, 
it was  decided  to  attempt  to  schedule  all  tasks  across  all  processors. 
These  processors  are  assumed  to  have 0.5 MIPS (millions  of  instructions 
per  second)  capacity  and  have  a  20-kiloword  memory. 
First,  all  tasks  were  assigned  triple  replication.  Then  the  number 
of processors  was  calculated: 
$$\text{Number of processors} = \left( \sum_{j=1}^{N} \text{MIPS}_j \times 3 \times \frac{1.2}{0.5} \right) + 1 = 4.8 \approx 5$$
In this example, N = the number of tasks, MIPS_j is the millions of
instructions/second required for task j, 3 is for triple replication, the 0.5
factor is the machine MIPS, and the 1.2 factor provides a safety margin.
When the resulting number is rounded up to achieve an integer number of
processors, the appropriate number is found to be five. Thus, when these
tasks are allocated over five processors, assigning tasks of the highest
criticality class first, and allocating solely on the basis of processor
resource utilization (MIPS) and not on memory, the allocation found in
Table V-4 is obtained. Of particular note is the degree of MIPS balancing
achieved. The totals are consistent to within approximately ±0.2%.*
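The processor-count estimate can be checked numerically. The sketch below is a minimal reading of the formula, assuming the task MIPS values are those listed in Table V-4 and that the trailing "+ 1" in the printed expression is an additive extra unit (`extra_units`, a hypothetical parameter name; the report does not explain that term).

```python
import math

# Task MIPS values as listed in Table V-4.
TASK_MIPS = [
    0.119, 0.077, 0.069, 0.055, 0.034, 0.023, 0.006, 0.002,
    0.032, 0.028, 0.021, 0.019, 0.014, 0.014, 0.009, 0.004,
    0.004, 0.002, 0.001, 0.001, 0.001, 0.001,
]


def processors_needed(task_mips, replication=3, machine_mips=0.5,
                      safety=1.2, extra_units=1):
    """Return (raw estimate, estimate rounded up to whole processors).

    extra_units is an assumption: it models the "+ 1" that appears in
    the printed formula.
    """
    raw = sum(task_mips) * replication * safety / machine_mips + extra_units
    return raw, math.ceil(raw)


if __name__ == "__main__":
    raw, count = processors_needed(TASK_MIPS)
    print(f"raw estimate = {raw:.2f}, processors = {count}")
```

With these inputs the raw estimate is a little under 4.9, which rounds up to the five processors used in the example.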
The following steps define a more refined method for distributing
replicated tasks over a number of processor-memory units while assuring
that some balance of loads is achieved.
• Define the flight phase and tasks to be resident in
  the memory units.
• Tabulate for each task: (1) the task MIPS and the
  fractional processor utilization for the assumed
  processor type, and (2) the task memory requirement
  in thousands of words, and the fraction of total
  memory required (for the size of memory unit to be
  used).
• Determine the required replication per task based on
  the criticality class and whether passive allocation
  will be required.
• Accumulate the sum of the MIPS required for all tasks
  of the worst-case flight mode to determine either:
  (1) the total number of a prespecified processor type
  that will be required; or (2) the processor speed
  required to provide one complete flight-mode processing
  capability per processor (including a reasonable safety
  margin). From an overall reliability viewpoint, it is
  desirable to plan for at least five processors (as
  discussed in Chapter VII).
* However, the balance may not always be this close.
Table V-4
ALLOCATION OF ALL TASKS TRIPLY REPLICATED ACROSS FIVE PROCESSORS
Allocation Sequence:
• Take tasks with criticality class 1-2 and distribute them by maximum
  MIPS; then
• Take tasks with criticality class 3-5 and distribute them by maximum
  MIPS.

Criticality
Class         MIPS     Task (allocated to three of processors 1-5)
1             0.119    D5
1             0.077    A8
1             0.069    A2
1             0.055    A4
2             0.034    B2
1             0.023    A1
1             0.006    C2
1             0.002    D3,4

4             0.032    B9
4             0.028    B7
4             0.021    C1
4             0.019    B10
3             0.014    A3
4             0.014    D2
4             0.009    B8
4             0.004    B3
4             0.004    B4
5             0.002    D1
4             0.001    A7
4             0.001    B6
4             0.001    C3
4             0.001    B5

Total MIPS per processor (1-5):  0.322  0.322  0.321  0.322  0.321
• Accumulate the sum of the memory requirements for all of
  the allocated tasks of the worst-case flight mode (including
  a reasonable safety margin) for each of the processors.
  In addition, memory must be provided for the passive
  allocation of those critical tasks for which, during
  reconfiguration, no missed iterations can be allowed.
• Begin allocation by selecting the unallocated task
  requiring the highest fraction of either processor MIPS
  or memory storage, whichever is greater. In the event
  of a tie, select the one with the highest combined
  processor-memory total. For the given flight phase,
  passive allocation should be based on the memory
  requirement only. That is, the required memory storage
  is considered for passive allocation, along with the
  active modules. However, the execution time for these
  tasks is treated as zero for processor resource utilization.
• Using the determined allocation criterion (either memory
  or processor), assign each selected task in turn to the
  memory or processor, as appropriate, with the most
  available unassigned capacity. Assign tasks to the indicated
  replication level. The only constraint is that a given
  task may be assigned at most once to a given
  processor-memory unit.
• Continue assigning tasks of lower load requirement until
  all tasks have been assigned. Tasks that are totally
  replicated, such as the Local Executive, can be assigned
  at any point during the procedure. For the examples given
  here, these tasks are assigned last.
• Check accumulated capacities on all units to verify that
  none have been exceeded. If any have been exceeded or if
  either resource is badly misbalanced, a reallocation
  should be made to achieve better balance. However, for
  a given task, reallocation can take place only to units
  that have not already been allocated that task.
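The steps above can be sketched as a short greedy procedure. The code below is an illustrative reading, not the report's implementation: the `Task` class, the `allocate` function, the lowest-index tie-break among equally loaded units, and the omission of passive allocation and of the final rebalancing pass are all assumptions. Task data are the Autoland values tabulated in Table V-5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    mips: float
    mem_kw: float
    total: bool = False  # totally replicated: one copy on every unit


# Autoland-phase tasks (MIPS, memory in kilowords) from Table V-5.
TASKS = [
    Task("A3", 0.012, 0.06), Task("A4", 0.055, 1.02), Task("A8", 0.077, 1.31),
    Task("B3", 0.004, 0.30), Task("B8", 0.009, 0.43), Task("B9", 0.032, 6.25),
    Task("C1", 0.021, 1.20), Task("C2", 0.006, 0.61), Task("C3", 0.001, 0.56),
    Task("D1", 0.002, 1.30), Task("D2", 0.014, 1.90), Task("D3", 0.001, 1.00),
    Task("D4", 0.001, 1.00), Task("D5", 0.119, 1.50), Task("GE", 0.001, 1.10),
    Task("LE", 0.034, 0.32, total=True),
]


def allocate(tasks, n_units=5, replication=3, unit_mips=0.5, unit_mem_kw=20.0):
    """Greedy allocation; returns (order, placement, proc_load, mem_load)."""
    proc = [0.0] * n_units  # fraction of each processor in use
    mem = [0.0] * n_units   # fraction of each memory in use
    placement, order = {}, []

    def fractions(t):
        return t.mips / unit_mips, t.mem_kw / unit_mem_kw

    def demand(t):
        fp, fm = fractions(t)
        return (max(fp, fm), fp + fm)  # larger fraction first, then combined

    ordinary = sorted((t for t in tasks if not t.total), key=demand, reverse=True)
    for t in ordinary:
        fp, fm = fractions(t)
        load = mem if fm >= fp else proc  # criterion: whichever is greater
        # Units with the most free capacity on the criterion resource;
        # distinct units guarantee at most one copy per unit.
        units = sorted(range(n_units), key=lambda u: load[u])[:replication]
        for u in units:
            proc[u] += fp
            mem[u] += fm
        placement[t.name] = sorted(units)
        order.append(t.name)

    for t in tasks:  # totally replicated tasks are assigned last
        if t.total:
            fp, fm = fractions(t)
            for u in range(n_units):
                proc[u] += fp
                mem[u] += fm
            placement[t.name] = list(range(n_units))
            order.append(t.name)

    if max(proc) > 1.0 or max(mem) > 1.0:
        raise ValueError("capacity exceeded; a reallocation is required")
    return order, placement, proc, mem


if __name__ == "__main__":
    order, placement, proc, mem = allocate(TASKS)
    print("selection order:", " ".join(order))
    for name in order:
        print(name, [u + 1 for u in placement[name]])
```

The selection order produced by this sketch matches the task order of Table V-6; the unit chosen for each copy can differ from the table wherever several units are equally loaded.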
A flowchart representing these basic steps is given in Figure V-1.
Next, applying this algorithm to the flight phase requiring the most
resources, we encounter more interesting conditions. An examination of
the flight phases in Table V-1 leads to the Landing Phase as the most
demanding of the phases. The critical data for allocation are shown in
Table V-5. The allocation based on these data is shown in Table V-6.
No preference was made for criticality class. Triple replication was
assumed. Also, the five processors used in the previous example were
used here. This is to allow for passive allocation as well as to allow
modules from other phases to be present to facilitate rapid phase change.
[Flowchart not reproduced. Legend: p_j = processing requirement for task j;
m_j = memory requirement for task j; P_i = processing load already
allocated to module i; M_i = memory load already allocated to module i.
Legible box labels: "Allocate task j to module i"; "P_i < 1"; "Exit".]
FIGURE V-1 ALLOCATION ALGORITHM
Table V-5
TABLE OF AUTOMATED FLIGHT PHASE TASKS AND THE
CHARACTERISTICS USED TO DISTRIBUTE THEM OVER PROCESSOR-MEMORIES

                      Fraction of                  Fraction
                      0.5 MIPS        Memory       of 20K
Task        MIPS      Processor       (K)          Memory
A3          0.012     0.024           0.06         0.003
A4          0.055     0.110           1.02         0.051
A8          0.077     0.154           1.31         0.065
B3          0.004     0.008           0.30         0.015
B8          0.009     0.018           0.43         0.021
B9          0.032     0.064           6.25         0.312
C1          0.021     0.042           1.20         0.060
C2          0.006     0.012           0.61         0.030
C3          0.001     0.002           0.56         0.028
D1          0.002     0.004           1.30         0.065
D2          0.014     0.028           1.90         0.095
D3          0.001     0.002           1.00         0.050
D4          0.001     0.002           1.00         0.050
D5          0.119     0.238           1.50         0.075
GE          0.001     0.002           1.10         0.055
LE (T)*     0.034     0.068           0.32         0.016
* T = Replicated totally
Table V-6
ALLOCATION EXAMPLES--DISTRIBUTED ASSIGNMENT OF
AUTOLAND PHASE TASKS OVER FIVE PROCESSOR-MEMORY UNITS

              Accumulated Task MIPS           Accumulated Task Memory (K)
                  per Processor                     per Memory Unit
Task  M/P*    1      2      3      4      5       1      2      3      4      5
B9    M     0.032  0.032  0.032    -      -      6.25   6.25   6.25    -      -
D5    P       -      -    0.151  0.119  0.119     -      -     7.75   1.50   1.50
A8    P     0.109  0.109    -    0.196    -      7.56   7.56    -     2.81    -
A4    P     0.164  0.164    -      -    0.174    8.58   8.58    -      -     2.52
D2    M       -      -    0.165  0.210  0.188     -      -     9.65   4.71   4.42
D1    M       -    0.166    -    0.212  0.190     -     9.88    -     6.01   5.72
C1    M     0.185    -      -    0.233  0.211    9.78    -      -     7.21   6.92
GE    M       -      -    0.166  0.234  0.212     -      -    10.75   8.31   8.02
D3    M     0.186    -      -    0.235  0.213   10.78    -      -     9.31   9.02
D4    M       -    0.167    -    0.236  0.214     -    10.88    -    10.31  10.02
C2    M       -      -    0.172  0.242  0.220     -      -    11.36  10.92  10.63
C3    M     0.187  0.168    -      -    0.221   11.34  11.44    -      -    11.19
A3    P     0.199  0.180  0.184    -      -     11.40  11.50  11.42    -      -
B8    M     0.208    -      -    0.251  0.230   11.83    -      -    11.35  11.62
B3    M       -    0.184  0.188  0.255    -       -    11.80  11.72  11.65    -
LE    M     0.209  0.185  0.189  0.256  0.231   12.15  12.12  12.04  11.97  11.94

* M - Allocation based on fraction of memory requiring largest capacity
  P - Allocation based on fraction of processor requiring largest capacity
Notes: 1. Autoland requires the greatest processing.
       2. Memory assumed -- 20 kilowords.
       3. Processor assumed -- 0.5 MIPS.
In this case, the accumulated MIPS and memory are reported in each
column. The results indicate that the distribution is not well equalized
for the processor (about ±16% deviation) while the memory distribution
is closely balanced (to about ±1%).
Thus, viable allocation schemes have been demonstrated that are
easily implemented and satisfy all stated allocation objectives.
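The quoted balance figures can be reproduced from the final accumulated loads in Table V-6. The sketch below assumes that "deviation" means half the spread between the most and least loaded units, relative to the mean load; the report does not define the measure, so this definition and the function name are assumptions.

```python
def half_range_deviation(loads):
    """Half of (max - min), expressed as a fraction of the mean load."""
    mean = sum(loads) / len(loads)
    return (max(loads) - min(loads)) / (2.0 * mean)


# Final accumulated loads per unit (last row of Table V-6).
FINAL_MIPS = [0.209, 0.185, 0.189, 0.256, 0.231]
FINAL_MEM_KW = [12.15, 12.12, 12.04, 11.97, 11.94]

if __name__ == "__main__":
    print(f"processor deviation: +/-{100 * half_range_deviation(FINAL_MIPS):.1f}%")
    print(f"memory deviation:    +/-{100 * half_range_deviation(FINAL_MEM_KW):.1f}%")
```

Under this definition the processor loads deviate by roughly 16 to 17 percent and the memory loads by a little under 1 percent, in line with the figures quoted above.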
E. Schedule Derivation
Now that some flexible allocation techniques have been described,
the problem of schedule derivation can be addressed. Properties that
are basic to scheduling of tasks in SIFT include:
• Replicated tasks executing on different processor
  units need not be in lock-step synchronization, but
  are only loosely synchronized.
• Tasks may be preempted, but such a procedure can
  make program proving more difficult.
• Fixed sets of tasks represent flight phases and are
  executed with a given periodicity.
• The only mandatory description of a given schedule
  execution is one of flight change or reconfiguration.
In view of these conditions, several assumptions can be made that
somewhat simplify the approach to scheduling.
• Tasks are assumed to be assigned to schedules as single
  units that cannot be preempted. The only exceptions occur
  when tasks execute for a period of time that equals or
  exceeds the shortest task period (equal to the period of
  the Local Executive). These tasks do require preemption
  and will be considered more closely.
• Tasks are executed according to a fixed task sequence in
  which each task is allocated one or several time blocks.
  Except for the clock routine and global executive flight
  failure or phase change processing, no task is interrupt-driven
  or initiated outside of the fixed sequence.
• The schedule is based on the maximum execution time of
  each of the member task modules (with provision for a
  reasonable safety margin).
• The schedule maintains a totally reproducible ordering of
  tasks that assures all tasks of the requisite periodicity.
Furthermore, to  accomplish  the  objectives  laid  out  at  the  beginning  of 
this  section,  a  simple,  compact  method  of  representing  the  schedule  must 
be  available. 
F. Schedule Representation and Notation
A convenient  notation  is  needed  that  allows  representation  of  a 
task  schedule  in  a  concise,  easily  derived,  and  readily  interpreted  form. 
One  such  approach  is  the  adoption  of  a  notation  resembling  that  of  reg- 
ular  expressions.  The  notation  and  interpretation  of  this  formalism  has 
been  modified  and  expanded  to  accommodate  the  SIFT  scheduling  requirements. 
The  following  notational  conventions  have  thus  far  been  adopted: 
Symbol          Interpretation

( )^n, ( )*     The parentheses enclose a sequence of tasks or sets
                thereof,† separated by commas, that are to be executed
                in sequence. The "n" superscript indicates that the
                event sequence defined within the expression is to be
                repeated n times and then terminated. The asterisk
                superscript means that the expression is to be repeated
                from left to right until externally terminated. Note
                that the asterisk may not be used more than once in a
                given expression and, if used, must qualify the
                outermost parentheses. Each task is allocated a time
                slot equal to the maximum time required for normal
                execution (extended to provide a safety margin).

[ ]^i           The square brackets enclose tasks, or expressions
                containing sets of tasks, separated by commas. Only one
                of these elements is to be selected for scheduler
                processing each time this expression is encountered.
                Items are selected sequentially from left to right in
                turn as the expression is encountered, such that for
                n items, one task for each of these n items will have
                been executed after n iterations through this
                expression. Again, cycling wraps around to the first
                element. The superscript indicates that this expression
                is to be repeated i times at that point in the schedule.

Δ integer       The Δ indicates an idle or unallocated processor time
                block where the integer is the available time in
                microseconds. This may be used to execute irregular
                tasks or tasks that are so infrequent or require so
                much processor time that they must be preempted a
                number of times to allow more-frequent tasks to run.

[ ]_j           Subscripts indicate the periodicity in milliseconds
                of a given item in a given scheduling expression.
                These are used for convenience and need not be present.

† Clearly the Task Dispatcher in the Local Executive must run following
each task execution to refer to the schedule in effect and to determine
the next task to be executed. Likewise, the clock routine must be run.
We assume that these small, fixed-length blocks of instruction can be
treated as though they were part of each task rather than being
explicitly included as separate tasks. This approach sacrifices nothing
technically but greatly simplifies the representation.
Examples of the application of such a notation are shown in Figure V-2.
Also shown in this figure are alternative link-connected diagrams and
the long forms of these expressions.
Basically, the convenience of the modified regular-expression formalism
is that it greatly facilitates the derivation of a schedule, as
well as its storage and execution, while the graphical representation
aids in the comprehension of a completed schedule.
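To make the notation concrete, the sketch below expands a schedule expression into its long form. It is an illustration only: `Seq` models a parenthesized sequence (with an optional repeat count), `Alt` models a square-bracket element that yields one item per encounter in round-robin order, and the outer asterisk is modeled by the `cycles` argument. These class and function names are hypothetical, and the repeat superscript on square brackets and the Δ time blocks are not modeled (an idle block can be written as a plain string).

```python
from dataclasses import dataclass


@dataclass
class Seq:
    """( ... )^n : run every item in order, n times."""
    items: list
    repeat: int = 1


@dataclass
class Alt:
    """[ ... ] : run exactly one item per encounter, left to right, wrapping."""
    items: list
    _next: int = 0


def run_once(node):
    """Return the task names executed by one pass through `node`."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, Seq):
        out = []
        for _ in range(node.repeat):
            for item in node.items:
                out.extend(run_once(item))
        return out
    if isinstance(node, Alt):
        choice = node.items[node._next]
        node._next = (node._next + 1) % len(node.items)  # wrap around
        return run_once(choice)
    raise TypeError(f"unsupported schedule element: {node!r}")


def expand(schedule, cycles):
    """Long form of ( schedule )* truncated after `cycles` repetitions."""
    out = []
    for _ in range(cycles):
        out.extend(run_once(schedule))
    return out


if __name__ == "__main__":
    print(expand(Seq(["A", "B", "C"]), 2))
    print(expand(Seq(["A", Alt(["B", "C"])]), 4))
```

An `Alt` keeps its own position counter, which is why a nested bracket advances only when the enclosing expression actually reaches it.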
G. Sample Schedule Derivation
Using the techniques discussed in the previous section, a sample
schedule is now developed using the tasks in the Landing Flight Phase.
This phase received attention since it demands more system resources
than do other operational phases. The data required for scheduling are
the task periodicity and the task module execution time. The step-by-step
development of the sample task schedule is shown in Figure V-3,
and the corresponding alternative representation is given in Figure V-4.
The procedure used for deriving such a schedule is:
Let N = the number of tasks.
For the jth task (j = 1, ..., N), let
    P_j = the periodicity (in milliseconds)
    e_j = the execution time (in microseconds)
The flowchart given in Figure V-5 describes the procedure for deriving
the schedule expression.
[Figure not reproduced. Columns: schedule expression; alternate
(link-connected) representation; interpretation (long form).
Legible entries: (A,B,C)* expands to A,B,C,A,B,C,A,...;
(A,B,A)* expands to A,B,A,A,B,A,...]
FIGURE V-2 SCHEDULE REPRESENTATION EXAMPLES
In representing the schedules, we can use a convenient shorthand
notation in which we define clusters of tasks according to the scheme
illustrated in Figure V-4. The total storage required to represent a
schedule can be greatly reduced, as can be seen by comparing the cluster
representation of Figure V-4 with the longhand expression illustrated in
Figure V-3.
              ASSIGNED
              PERIODICITY     EXECUTION TIME‡
TASKS†        (MILLISEC)      (μSEC)
C1              1.5              62
LE              1.5             100
A3              3.0             116
A4              6.0             688
B8             60.0            1124
GE            180.0             400
----------------------------------------
A8             30.0            5134
B9            120.0            8000
B3            180.0            1600
(This set of tasks will require preemption, and the tasks are logged in
a separate expression in order of increasing period size.)
[EXPRESSION column not reproduced.]
† These tasks are allocated in the order of decreasing iteration rate,
except for those tasks for which the execution time equaled or exceeded
the period for the most frequently executed tasks. These tasks require
preemption and are assigned to an expression separate from that for the
main schedule.
‡ Assumed processing rate of 0.5 MIPS
FIGURE V-3 SAMPLE SCHEDULE DERIVATION FOR THE LANDING FLIGHT PHASE
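The opening steps of the derivation can be illustrated with the Figure V-3 data. The sketch below orders the tasks by period and then by execution time, sets aside the tasks whose execution time equals or exceeds the shortest period (these need preemption), and computes the idle block left in the shortest period. Function names are hypothetical, and the later steps that build the nested expression are not attempted.

```python
# (name, assigned period in msec, execution time in microseconds), Figure V-3.
TASKS = [
    ("C1", 1.5, 62), ("LE", 1.5, 100), ("A3", 3.0, 116),
    ("A4", 6.0, 688), ("B8", 60.0, 1124), ("GE", 180.0, 400),
    ("A8", 30.0, 5134), ("B9", 120.0, 8000), ("B3", 180.0, 1600),
]


def split_and_order(tasks):
    """Return (main, preempt) task lists.

    main:    tasks that fit inside the shortest period, ordered by
             period and then by execution time.
    preempt: tasks whose execution time equals or exceeds the shortest
             period, ordered by increasing period.
    """
    shortest_us = min(period for _, period, _ in tasks) * 1000
    main = sorted((t for t in tasks if t[2] < shortest_us),
                  key=lambda t: (t[1], t[2]))
    preempt = sorted((t for t in tasks if t[2] >= shortest_us),
                     key=lambda t: t[1])
    return main, preempt


def idle_block_us(main):
    """Shortest period minus the execution time of the tasks that run
    in every such period (the first idle block of the schedule)."""
    shortest = main[0][1]
    used = sum(e for _, period, e in main if period == shortest)
    return shortest * 1000 - used


if __name__ == "__main__":
    main, preempt = split_and_order(TASKS)
    print("main schedule:", [name for name, _, _ in main])
    print("preempted:    ", [name for name, _, _ in preempt])
    print("idle block (microseconds):", idle_block_us(main))
```

For these inputs the idle block is 1338 μsec, the value used later in the worst-case preemption analysis.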
[Link-connected diagram not reproduced. Legible node labels include
C1, LE, A3, A4, GE, and Δ1338.]
NOTE: Clusters may be defined in increasingly  greater detail: 
x = [Y,Zl   V = ( G E . A l 2 8 )  
Y = (A4.W) z = [ A I Z I S ~ ~ , U I  
w = [A52829.Vl  U = (88 ,   A94)3  
FIGURE V-4 ALTERNATE  SCHEDULE  REPRESENTATION 
[Flowchart not reproduced (three pages).
Legend: N = number of tasks; j = 1, 2, ..., N indexes the tasks;
P_j = period of task j (msec); e_j = execution time of task j (μsec);
s = number of tasks j, j + 1, ... having the same period P_j.
Legible steps: initialize; order tasks by P_j from shortest to longest
and, within each P_j, from shortest to longest e_j; round P_(j+1) down
to the nearest multiple of P_k; for the shortest period, set
T_1 = P_1 x 10^3 μsec and ΔT = T_1 - Σ e_j over the s tasks of that
period; insert tasks into the ΔT blocks, splitting at the next P_j into
two branches or creating a new forked cycle; store a task that does not
fit at the end of the list for preemptive allocation (may be a bad
rounding selection); collect similar expressions for simplification;
collect blocks of ΔTs as possible and develop the alternate diagram;
append preempted modules at the end of the expression; abort if
inadequate schedule time is available.]
FIGURE V-5 SCHEDULE DERIVATION FLOWCHART
There are two procedures that need further discussion. The first
involves the reason behind choosing successive multiplicity factors
rather than factors based on the "lowest common denominator." It is
indeed true that the latter approach would lead to an acceptable
schedule. However, the intention has been to derive a schedule description
that was concise and simple. With this in mind, examining the initial
stages of schedule development, one writes a simple expression with a
period as big as the shortest of the tasks. One then assigns portions
of the remaining time block to successive tasks. In the cited example
the total expression had a periodicity of twice the smallest period. We
then partition the available time blocks (the Δs) typically into two
subexpressions, only one of which is executed each time the expression
is processed. These subexpressions will typically have their own Δ
time blocks as elements. Note that the period of these elements is a
multiple of the period of its parent expression. Thus as each task is
added to the expression, a decision must be made as to which Δ time
block should be partitioned to accommodate the task. If it is possible
to take advantage of the expression having the largest period already
assigned, by using the Δ time block remaining within that expression,
then the embedded expression hierarchy is greatly simplified by not
proliferating many new disjoint subexpressions with new periodicities.
In addition, this choice of the Δ time block leads to more efficient use
of available Δ time blocks, leaving larger time blocks unfragmented and
whole.
The second procedure deals with the use of the remaining time blocks
to satisfy the needs of tasks requiring preemption. It is assumed that
each of these tasks is run to completion before another is started. Here,
large contiguous time blocks that are maintained intact and belong to
the shorter periods make the preemption task analysis much more
straightforward. This allows use of very regular time intervals to verify
that the period needs can be satisfied for those preempted tasks in the
worst case of all tasks requiring processing simultaneously. For example,
in the sample schedule, if all three preemptive-type tasks came due for
execution at the same time, there are A time blocks of 1338 µsec available
every 3 millisec. This means that by 30 millisec (the period of the most
frequent task), there are 13,380 µsec available. Since all three tasks
require a total execution time of 14,734 µsec, if we used no other A time
blocks in the expression, A8 and B9 would have been executed, and B3 could
be started when A8 came due for execution again. If we then examine the
longest consecutive sequence of tasks that could occur before an A time
block became available, it could be C1, LE, A3, B8, C1, LE, which consumes
a time block of 1564 µsec. Then in the worst case, if this sequence
occurred at the time the three preempted tasks needed execution, a lead
time in scheduling their execution ahead of their required periodicity,
to assure no scheduling problem in this instance, can be calculated. It is
desired to execute A8 prior to the onset of this "no break" situation.
Then it should be scheduled at:

    M = Time No A + Exec Time A8 = 6.7 millisec = 23% of period

Hence, if the period of this task is decreased by 23% for scheduling
purposes, then the period of all three tasks should be satisfied in this
worst-case situation. A similar analysis can be carried out on the other
preempted tasks. Additional consideration of criticality class and
missed iterations may weaken this requirement.
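The capacity check in the example above can be sketched in a few lines. This is only an illustration of the arithmetic quoted in the text; the constant and function names are hypothetical, and the figures are those given for the sample schedule.

```python
# Figures quoted in the text for the sample schedule (all times in microseconds).
BLOCK_USEC = 1338            # length of one free time block
BLOCK_INTERVAL_USEC = 3000   # one such block recurs every 3 millisec
WINDOW_USEC = 30000          # period of the most frequent preempted task
DEMAND_USEC = 14734          # total execution time of the three preempted tasks


def free_capacity(window_usec, interval_usec, block_usec):
    """Free time offered by the regularly recurring blocks within one window."""
    return (window_usec // interval_usec) * block_usec


capacity = free_capacity(WINDOW_USEC, BLOCK_INTERVAL_USEC, BLOCK_USEC)
shortfall = DEMAND_USEC - capacity   # work that must spill into a later block
print(capacity, shortfall)
```

A positive shortfall is what forces the third task (B3 in the example) to start only when the first task comes due again.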
H. Conclusion
In summary, then, methods for determining the required number of
processor memory units have been described, and allocation of tasks can
be readily performed. Furthermore, a schedule derivation method has
been presented that could be performed on-line. However, all normal
flight schedules would be best stored as regular expressions and
invoked as required. This is because the derivation algorithm would
require more execution time and data access than would the retrieval
of stored schedules. While critical tasks will be passively allocated,
there will be instances where it may be necessary to derive a new
schedule. A method has been described that could be readily implemented.
If tasks will revert to pilot control, then they need only
be deactivated during reconfiguration. This enhances the criticality
of the display screens to the system.
As to schedule storage, the regular expression formalism can
clearly be mapped into a compact storable stack of tasks with appropriate
delimiters and flags. This would lead to a way of storing the
schedules that is both efficient and useful.
A simpler, more visual schedule representation has also been
derived that allows for ready comprehension of the task execution as
a function of time and the periodicity of isolated task sequences.
Lastly,  the  schedule  derivation  is  such  that  preemption  is  required 
for  only a  subset  of  the  tasks,  namely  those  with  exceptionally  long 
execution  times  that  encroach  upon  the  time  periods of the  more  fre- 
quent  tasks,  and  the  period  of  these  preempted  tasks  is  verified.  This 
concludes  a  derivation  of  a  suitable  schedule  representation  for  SIFT. 
REFERENCE 
1. R. S. Ratner, E. B. Shapiro, H. M. Zeidler, S. E. Wahlstrom, 
C. B. Clark,  and J. Goldberg,  "Design  of  a  Fault-Tolerant 
Airborne  Digital  Computer," Vol. II--Computational  Requirements 
and  Technology,  Final  Report.  NASA  CR-132253,  1973. 
VI HARDWARE DESIGN 
A. Bus Interconnection Network
1. Introduction 
The  purpose  of  the  bus  interconnection  network in the  SIFT 
computer  is  to  provide  communication  between  each  processor  (main  or I/O) 
and all  memory  units,  except  possibly  the  single  memory  unit  already 
connected  directly  to  that  processor  by  a  high-bandwidth  link.  This 
communication  could  be  established  with  a  separate  connection  between  all 
processor-memory  pairs.  However,  since  only  a  few of the  total  number  of 
possible  communication  paths  would  ever  be  in  use  at  the  same  time,  a 
multilevel  interconnection  network,  similar  to  those  employed  in  telephone 
systems,  should  be  considered in the  hope of achieving  a  net  saving  in 
equipment. A multilevel  realization  may  turn  out  to  have  some  desirable 
fault-tolerance  features  as  well. A two-level  arrangement  having  four  to 
six  intermediate  busses  was  proposed  in  the  original  SIFT  design  concept. 
In this  section,  some  alternative  designs  for  the  interconnec- 
tion  network  are  explored.  Comparisons  are  made  between  a  single-level 
network  of  direct  connections (no busses),  a  two-level  network  (single 
set  of  busses),  and  a  three-level  network  (a  cascade  having  two  separate 
sets  of  busses).  Bit-serial,  byte-serial,  and  all-parallel  data  transfer 
modes  are  evaluated.  The  principal  cost  measures  used  for  these  compari- 
sons of the  several  cases  are: 
g  or G = number  of  equivalent NAND gates,  a  measure  of 
hardware  complexity. 
t or  T = number  of  terminals. 
d or D = number  of  clock  cycles  of  delay  for  a  full  memory 
access. 
Lower-case  letters  apply  to  a  single  module  or  unit,  and  upper-case 
letters  to  the  grand  totals  for  the  entire  network.  Other  important  but 
less  quantitative  criteria  are: 
• The degree to which the final network can be conveniently
  modularized into MSI or LSI semiconductor chips,
  either custom designed or commercially available.
• The complexity of calculations needed to generate
  the routing codes for the two- and three-level networks.
• Algorithms required for checking and diagnosis of the
  interconnection network to achieve the desired degree
  of fault tolerance.
Typically about half of the processors will be I/O microprocessors,
rather than main processors, and their memory units will be correspondingly
smaller. They may not need the full number of address and
data-word  bits,  and  communication  paths  will  probably  not  be  required 
between  each  microprocessor  and  the  other  microprocessor  memories.  How- 
ever,  the  savings  in  time  and  equipment  resulting  from  these  simplifica- 
tions  are  not  expected  to  be  great  and  have  therefore  been  neglected  at 
the  present  stage  of  the  design. 
The  main  results  of  this  analysis  are  expressed  in  Figures 
VI-7 and VI-8, which  are  described  in  detail in the  following  section. 
2. Design Alternatives 
Figure  VI-1  shows  the  interconnection  network  within  its  imme- 
diate  context  in  the  SIFT  computer. Its overall  function  is  to  provide 
bilateral  communication  paths  between  a  set of  p processors  and a  set  of 
p memory  units.  Requests  normally  originate  with  a  processor,  which 
injects  onto  the  forward  connection  the  number  of  the  memory  unit (M) 
with  which  it  wishes to communicate,  bus  routing  information (B) as 
appropriate, and the address (A) within the memory. The return connection
carries a data word (W) from memory, or else a single acknowledgment
digit.

FIGURE VI-1 INTERCONNECTION NETWORK [label: direct high-capacity connection]
The  following  list  gives  the  principal  independent  parameters 
and  their  expected  ranges: 
Parameter                                   Minimum   Typical   Maximum
p  = number of processors                      9        12        18
   = number of memories
b  = number of simultaneous paths needed       3         4         6
     (= number of busses for 2-level case)
nw = number of bits in data word              16        24        32
na = number of bits in memory address         16        20        24
The  number  of  bits  needed  for  memory  selection  and  for  bus  selection  can 
be  derived  from p and b, respectively,  assuming  a  convenient  coding. 
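As a rough illustration of how the selection-line counts follow from p and b, the sketch below compares two possible codings. The function name and the two coding labels are assumptions made for this example; the text itself only says that a convenient coding is assumed.

```python
def select_lines(count, coding):
    """Number of lines needed to select one of `count` units (illustrative)."""
    if coding == "unary":
        return count                      # one dedicated line per unit
    if coding == "binary":
        return (count - 1).bit_length()   # smallest k with 2**k >= count
    raise ValueError("unknown coding: " + coding)


# Typical parameter values from the table: p = 12 memories, b = 4 busses.
print(select_lines(12, "unary"), select_lines(12, "binary"))
print(select_lines(4, "unary"), select_lines(4, "binary"))
```

A unary code costs more lines but needs no decoding at the receptor, while a binary code is more compact.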
Three  possible  interconnection  schemes  are  shown  in  Figure  VI-2; 
switching  unit S in  this  figure is assumed  to  be  capable  of  making  one- 
to-one  connections  between  its  left-hand  terminals  and  its  right-hand 
terminals in  all (or  almost  all) ways--in  the  fashion  of  a  crossbar,  for 
example. The quantity $\sigma$ designates the total number of simple switches
that would be required if each path through every switching unit were
provided by a separate switch. (The subscript designates the number of
levels in the network.) In Figure VI-2(a), for example, we have
$\sigma_1 = p(p - 1)$. (Recall that a network connection from processor k to
memory k is not needed.) The two-level arrangement in Figure VI-2(b)
reflects the assumption that no more than b connections are ever needed
at the same time. The three-level network of Figure VI-2(c), to be
described in detail later, also has the flexibility to provide as few as
b < p paths, at a saving in circuit complexity. Other schemes having
more levels are also possible, but these three alternatives will be seen
to be the most competitive for present purposes.

FIGURE VI-2 POSSIBLE INTERCONNECTION SCHEMES [panel (c): $\sigma_3 = pq(2 + p/s^2)$]
The set-up of the paths in the interconnection network must be
executed in a sequence of steps. First, a processor initiates a request
in the form (B, M, A), where B consists of 0, 1, or 2 bus numbers, depending
on the number of levels; M is the memory number; and A is the address
within that memory (job number and local address). This request is sent
to the first receptor--either a memory unit [Figure VI-2(a)] or a bus
[Figures VI-2(b) and VI-2(c)]. Each output controller C of each receptor
continuously scans all request lines incident upon it whenever it is not
busy holding a connection. When the scanning is successful, the receptor
enters the busy state and closes the connections for the forward transfer
of the address and next bus (if any) and for the reverse transfer of data.
At the next level, the same action is repeated by the succeeding receptor.
At the final level (1, 2, or 3), the address A is handled by the memory
unit itself.
Actual transfer of data (and later release of established paths)
occurs somewhat differently, depending upon the mode of bit communication
through the network. For parallel transfer, the memory address A arrives
at the memory unit coincident with the data request. The return of the
data word W automatically signals completion of the operation. The request
is then removed by the processor, all gates along the path are
opened, and each controller is released from the busy state and resumes
scanning. In the case of serial transfer, receipt of a request at the
memory unit triggers the return of an acknowledgment digit to the source
processor. This processor then spews forth its stream of address digits,
on completion of which the memory unit returns its stream of data-word
digits. Release of the path then follows as in the parallel case. Transfer
to and from I/O processors takes place in an identical but possibly
abbreviated manner, since communication may be needed in only one rather
than both directions.
3. Parallel Transfer
It should  be  clear  from  this  description  that  each  controller 
in  each  receptor  (memory  unit  or  bus)  requires  two  parts,  a  scanner  and 
a  switch.  The  manner  in  which  these  two  parts  function  together  is shown 
in  block  diagram  form  in  Figure  VI-3  and  as  a  logical  circuit  in  Figure 
VI-4,  for  the  single-level  case  and  parallel  mode  of  transfer. 
In the  functioning  of  each  scanner  (Figure  VI-4), a (p - 1)- 
stage  unary  counter  cycles  continuously  as  long  as no request  is  received 
on  one  of  its  memory  select  lines M. The  first  such  request  that  is 
encountered  stops  the  counter,  and  the  counter  state  m  and  busy  signal 
are  passed  on  to  the  switch. 
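The scanner behavior just described can be mimicked with a short simulation. This is a behavioral sketch only, not the gate-level circuit of Figure VI-4; the function name, the `start` argument, and the representation of request lines as a list of booleans are assumptions made for illustration.

```python
def scan(request_lines, start=0):
    """Cycle a counter over the request lines until a raised line is found.

    Returns (position, clocks) for the first request encountered, or None if
    no line is raised during one full cycle of the counter.
    """
    n = len(request_lines)
    for clocks in range(1, n + 1):
        position = (start + clocks) % n   # counter advances one stage per clock
        if request_lines[position]:
            return position, clocks       # counter stops; receptor goes busy
    return None


# A memory-unit scanner for p = 12 watches the p - 1 = 11 other processors.
lines = [False] * 11
lines[3] = True
print(scan(lines, start=2))   # request found on the very next clock
print(scan(lines, start=3))   # counter has just passed the line: full cycle
```

In this model the scanning delay ranges from 1 clock to p - 1 clocks, depending on where the counter happens to be when the request is raised.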
Each switch is a simple two-way multiplexor for connecting one
of the p - 1 processors to the corresponding output lines. Thus it has
$(p - 1)(n_a + n_w)$ left-hand terminals, which connect to the processors,
and $n_a + n_w$ right-hand terminals, which connect to the memory proper.
The gate realization is straightforward. The complex of lines at the
left side of Figure VI-4 corresponds to the nearly complete crossing of
connections within S in Figure VI-2(a). Note that the total of p(p - 1)
M-lines ties directly to the scanners, p - 1 of them to each scanner.
The other lines are paralleled to or from the controllers.
a. One- and Two-Level Networks
The cost measures for a single controller may now be
written down directly for the single-level case. For the scanner we have

$$g_{sc} = 12(p - 1) + (p - 1) + 2 + 6 = 13p - 5 \text{ equivalent gates}$$
$$t_{sc} = 2p \text{ terminals (excluding clocks and power)}$$
$$d_{sc,min} = 1, \quad d_{sc,max} = p - 1 \text{ clocks;}$$

and for the switch, letting $n = n_a + n_w$,

$$g_{sw} = np \text{ gates}$$
$$t_{sw} = (n + 1)p \text{ terminals}$$
$$d_{sw} = 0$$
FIGURE VI-3 SCANNER AND SWITCH FUNCTIONAL BLOCK DIAGRAM [p controllers, each with a scanner and a switch; memory select (M), address (A), and data word (W) cables to and from processors 1 through p]
FIGURE VI-4 SCANNER AND SWITCH LOGIC CIRCUITRY
We have assumed here a cost of 12 gates per stage for the counter, and
6 equivalent gates for the single-digit delay. (These costs are all
approximate, but the final results do not depend critically upon them.)
Wired-OR output gating is assumed for the data-word lines returning to
the processors.
In the two-level case, shown in Figure VI-5, the bus and
memory receptors are the same as in the one-level case except for the
different numbers of inputs and outputs. In particular, the number of
scanned positions increases from p - 1 to p in the first level and reduces
from p - 1 to b in the second level. Thus,

Level 1 (b of these):
$$g_{sc} = 13p + 8, \quad t_{sc} = 2p + 2, \quad d_{sc} = 1 \text{ to } p$$
Level 2 (p of these):
$$g_{sc} = 13b + 8, \quad t_{sc} = 2b + 2, \quad d_{sc} = 1 \text{ to } b$$

The number of bit lines to be switched increases from n to n + p in the
first level, in order to include the routing digits B, but remains at
the value n in the second level. Thus

Level 1 (b of these):
$$g_{sw} = (n + p)(p + 1), \quad t_{sw} = (n + p + 1)(p + 1), \quad d_{sw} = 0$$
Level 2 (p of these):
$$g_{sw} = n(b + 1), \quad t_{sw} = (n + 1)(b + 1), \quad d_{sw} = 0$$
Note  that  the  circuit  arrangements  presented  in  Figures 
VI-3  and  VI-5  constitute  a  simplification  over  that  described  previously 
in  the  Project 1406 Final  Report.  Specifically,  memory-unit  selection 
is  done  here  with  a  unary  rather  than  a  binary  code.  This  choice  presumes 
that  each  processor  requests  each  individual  memory  unit  with  a  separate 
line. The  previous  DATA  REQUEST  line  can  then  be  combined  with  this  line. 
If the memory units were selected with a more compact binary code having,
say, $n_p$ digits, where $n_p < p$, then the number of bit lines to be switched
in the first level would be reduced slightly in the two-level case
[$g_{sw} = (n + n_p)p$], but a more costly scanner must be used
[$g_{sc} \approx 13p + 15 + n_p(p + 11)$]. We conclude that the unary code leads to a
more economical design.
The  grand  totals  may  now  be  calculated.  Summing  over  all 
p memory-unit  controllers,  the  single-level  case  yields 
$$G_1 = p(13p - 5) + (np)p = p^2(n + 13) - 5p$$
$$T_1 = 2p^2 + (n + 1)p^2 = p^2(n + 3)$$
$$D_{1min} = 3, \quad D_{1max} = p + 1;$$

while the two-level case yields

$$G_2 = b(13p + 8) + (n + p)b(p + 1) + p(13b + 8) + n(b + 1)p$$
$$\phantom{G_2} = (n + 8)(2bp + p + b) + bp(p + 11)$$
$$T_2 = b(2p + 2) + b(p + 1)(n + p + 1) + p(2b + 2) + p(n + 1)(b + 1)$$
$$\phantom{T_2} = (n + 3)(2bp + p + b) + bp(p + 1)$$
Note  that  in  both  cases  two  clocks  have  been  added  to  the  total  transfer 
time  for  acceptance of the  memory  address  and  return of the  data  word. 
G and T designate  the  total  number  of  gates  and  terminals,  respectively, 
for  all  controllers,  assuming  one  scanner  module  and  one  switch  module 
in  each  controller. 
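The gate totals for the one- and two-level networks can be checked numerically by summing the per-controller costs. The sketch below assumes the scanner cost of 13 gates per scanned position plus 8, and a switch cost of one gate per bit line per position plus one, as the formulas in this section imply; the function names are hypothetical.

```python
def scanner_gates(positions):
    """g_sc for a scanner with the given number of scanned positions."""
    return 13 * positions + 8


def switch_gates(lines, positions):
    """g_sw for a switch passing `lines` bit lines from `positions` sources."""
    return lines * (positions + 1)


def gates_one_level(p, n):
    # p memory-unit controllers, each scanning the p - 1 other processors
    return p * (scanner_gates(p - 1) + switch_gates(n, p - 1))


def gates_two_level(p, b, n):
    level1 = b * (scanner_gates(p) + switch_gates(n + p, p))   # bus controllers
    level2 = p * (scanner_gates(b) + switch_gates(n, b))       # memory controllers
    return level1 + level2


p, b, n = 12, 4, 44   # typical values from the parameter table
print(gates_one_level(p, n), gates_two_level(p, b, n))
```

For the typical parameter values the two-level total comes out below the one-level total, which agrees with the comparison drawn later in this section.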
FIGURE VI-5 TWO-LEVEL NETWORK BLOCK DIAGRAM [b level-1 controllers and p level-2 controllers, each with a scanner and a switch]
b. The Three-Level Network*
The general form of a three-level interconnection network
was shown in Figure VI-2(c) [Ref. 1]. In the first level, the set of p
network inputs is handled s at a time, by qp/s controllers in p/s groups
of q each. Each has s inputs. The second level has qp/s controllers,
now in q groups of p/s each. Each has p/s inputs. The third level,
antisymmetrical to the first, consists of p controllers in p/s groups of
s each. Each has q inputs. An example for p = 9, s = q = 3, is given
in Figure VI-6.

FIGURE VI-6 EXAMPLE OF A THREE-LEVEL NETWORK [p = 9, s = q = 3]

The parameters q and s should be chosen to optimize the
design; we use here the cost parameters $G_3$, $T_3$, and $D_3$. The values of q
and s measure the richness of interconnectability, through the number b
of simultaneous parallel paths provided, just as in the two-level network.
First, s must be selected in the range $2 \le s \le p/b$ for the network of
Figure VI-2(c) to be meaningful. To achieve b simultaneous paths we need
$q = s$ if $s \le b$ and $q \ge b$ if $s > b$.

*The calculations reported earlier in Technical Memo No. 5 contained an
algebraic error, which affected the numerical results and the comparison
of the three-level case with the others. This error has now been
corrected and the conclusions modified accordingly.
Telephone  switching  theory  provides  several  directly  appli- 
cable definitions  and  results. 
A rearrangeable  network  is  one  that  provides  all  possible 
one-to-one  input-output  connections,  just  as  assumed  for  the  switching 
unit S itself; i.e.,  it  is a permutation  network. In general,  however, 
if  some  of  these  connections  have  already  been  established  along  certain 
paths,  it  may  be  necessary  to  reroute  them  in  order  to  set  up  additional 
connections. A nonblocking network also has full permutation capability,
but  in  this  case  such  rerouting  is  never  necessary,  regardless  of  the 
order  and  the  particular  routing  with  which  prior  connections  are  set  up. 
These  two  classes  of  switching  networks  are  useful  theoretical  models  for 
telephone  switching,  but  are  never  used  in  practice  because of their  high 
cost,  and  because  only  a  small  fraction  of  all  telephones  are  ever in use 
at  the  same  time. 
The parameters of the three-level network are constrained
as follows:
• For a nonblocking network, $q \ge 2s - 1$.
• For a rearrangeable network, $q \ge s$.
• For most telephone networks, $q \ll s$.
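The first two constraints above can be turned into a small classifier for a given choice of q and s. This is a sketch for illustration; the function name and the label "blocking" for the remaining case are assumptions, not terms defined in the text.

```python
def classify(q, s):
    """Classify a three-level network by its q and s parameters."""
    if q >= 2 * s - 1:
        return "nonblocking"      # no rerouting of existing paths ever needed
    if q >= s:
        return "rearrangeable"    # all permutations possible, with rerouting
    return "blocking"             # fewer than s simultaneous paths per group


print(classify(3, 3))   # the p = 9 example with s = q = 3
print(classify(5, 3))
print(classify(3, 4))
```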
For  the  bus  interconnection  network,  it  has  been  assumed 
sufficient  to  have  as  few  as  b < p  simultaneous  connection  paths.  Con- 
sequently,  a  rearrangeable  or  nonblocking  capability is not  needed; 
however,  such  capability  may  be  an  asset  if it can be  achieved  at  a  small 
additional  cost--a  definite  possibility,  since  the  utilization  ratio  b/p 
is larger  here  than  in  telephone  practice. 
Cost parameters for the scanners and switches in the individual
levels may now be readily calculated, just as was done in the two-level
case. For parallel data transfer:*

Level 1 ($qp/s$ of these):
$$g_{sc} = 13s + 8, \quad t_{sc} = 2s + 2, \quad d_{sc} = 1 \text{ to } s$$
$$g_{sw} = (n + p/s + q)(s + 1), \quad t_{sw} = (n + p/s + q + 1)(s + 1), \quad d_{sw} = 0$$

Level 2 ($qp/s$ of these):
$$g_{sc} = 13p/s + 8, \quad t_{sc} = 2p/s + 2, \quad d_{sc} = 1 \text{ to } p/s$$
$$g_{sw} = (n + p/s)(p/s + 1), \quad t_{sw} = (n + p/s + 1)(p/s + 1), \quad d_{sw} = 0$$

Level 3 ($p$ of these):
$$g_{sc} = 13q + 8, \quad t_{sc} = 2q + 2, \quad d_{sc} = 1 \text{ to } q$$
$$g_{sw} = n(q + 1), \quad t_{sw} = (n + 1)(q + 1), \quad d_{sw} = 0$$

*Actually, a scanner having only two positions is somewhat less complex
than the above expressions indicate. The value $g_{sc} = 21$ has been used
in this case instead of the value given by these formulas ($g_{sc} = 34$).
Summation of these individual contributions leads to complex
expressions for $G_3$ and $T_3$. Those for $T_3$ are identical in form and
similar in value to those for $G_3$, differing only in some of the constants.
Consequently, it will be sufficient to deal with $G_3$ only, in optimizing
q and s in terms of p, b, and n.
First, note from the contributing terms that the dependences
on q and s are strictly positive and inverse, respectively, so that
$G_3$ will be least when q is minimized and s is maximized, subject only to
the constraining inequalities cited above. Two cases must be distinguished.
If $p \le b^2$, then we have $q = s \le p/b \le b$. Selection of the
largest possible value of s gives $s = p/b$, yielding

$$G_3 = p^2[p + (2n + 27)b]/b^2 + p(p + 3n + 24) + pb(b + n + 15).$$

On the other hand, if $p > b^2$, then we have $q = b$ and $b < s \le p/b$. The
maximum value of s is the same, yielding

$$G_3 = b^4 + (n + 16)b^3 + 2(n + p + 8)b^2 + 2(n + 13)pb + p(n + 8).$$

The corresponding values of $D_3$ are

$$D_{3min} = 5$$
$$D_{3max} = 2p/b + b + 2 \text{ when } p \le b^2, \qquad D_{3max} = p/b + 2b + 2 \text{ when } p > b^2$$
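The three-level gate total can be checked against the per-level costs by direct summation. The sketch below is an illustration under stated assumptions: it assumes b divides p so that s = p/b is an integer, it ignores the special two-position scanner cost mentioned in the footnote, and its function names are hypothetical.

```python
def choose_q_s(p, b):
    """Pick the largest permitted s and the smallest q giving b paths."""
    s = p // b                    # assumes b divides p exactly
    q = s if s <= b else b        # s <= b is the case p <= b*b
    return q, s


def gates_three_level(p, q, s, n):
    m = p // s                                              # p/s groups
    level1 = q * m * (13 * s + 8 + (n + m + q) * (s + 1))   # qp/s controllers
    level2 = q * m * (13 * m + 8 + (n + m) * (m + 1))       # qp/s controllers
    level3 = p * (13 * q + 8 + n * (q + 1))                 # p controllers
    return level1 + level2 + level3


n = 44
for p, b in [(12, 4), (12, 3)]:    # one case with p <= b*b, one with p > b*b
    q, s = choose_q_s(p, b)
    print(p, b, q, s, gates_three_level(p, q, s, n))
```

Summing level by level in this way is a convenient guard against algebraic slips of the kind mentioned in the footnote to this subsection.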
4. Bit-Serial Transfer
For serial implementation of the controller, the scanner has
the same costs, while the number of bit lines in the switch is reduced
from n to just 3: a single address line, an acknowledgment line, and a
data-word line. However, the time required for actual transfer is now
increased by n. Thus,

$$g_{sw} = 3p, \quad t_{sw} = 4p, \quad d_{sw} = n$$

For all p memory units in the single-level case, then

$$G_1 = p(13p - 5) + 3p^2 = 16p^2 - 5p$$
$$T_1 = 2p^2 + 4p^2 = 6p^2$$
$$D_{1min} = n + 3, \quad D_{1max} = n + p + 1$$

For the two-level case, assuming parallel transmission of memory
selection lines through the bus controllers, but serial transfer of
all addresses and data,

$$G_2 = b(13p + 8) + b(p + 1)(3 + p) + p(13b + 8) + 3p(b + 1)$$
$$\phantom{G_2} = b(p^2 + 33p + 11) + 11p$$
$$D_{2min} = n + 4, \quad D_{2max} = n + p + b + 2$$

For the three-level case, taking $s = p/b$ and either $q = s$ or
$q = b$, as before:

$$G_3 = p^2(p + 33b)/b^2 + p(p + 33) + bp(b + 18) \text{ for } p \le b^2,$$
$$G_3 = b^4 + 19b^3 + 2(p + 11)b^2 + 32pb + 11p \text{ for } p > b^2.$$
$$D_{3min} = n + 5,$$
$$D_{3max} = 2p/b + b + n + 2 \text{ for } p \le b^2,$$
$$D_{3max} = p/b + 2b + n + 2 \text{ for } p > b^2.$$
In these calculations, it is assumed that the circuitry required
for serialization and parallelization of addresses and data is integrated
into the processors and memories, respectively, at no appreciable change
in hardware cost within these units. If this assumption is not justified
for the technology chosen for processor and memory implementation, then
the effective value of $G_R$ will increase over that calculated here.
5. Byte-Serial Transfer
Similar calculations apply if the transfer is effected using
$r = \lceil n_a/B \rceil + \lceil n_w/B \rceil$ successive B-bit bytes. Here, $n \to 2B + 1$ in G and T,
and the delay D increases by r over the value for the parallel mode.
The results for the single-level network become:

$$G_1 = (2B + 14)p^2 - 5p$$
$$T_1 = (2B + 4)p^2$$
$$D_{1min} = r + 3, \quad D_{1max} = r + p + 1$$

For the two-level network:

$$G_2 = bp^2 + (4B + 29)bp + (b + p)(2B + 9)$$
$$T_2 = bp^2 + (4B + 9)bp + (b + p)(2B + 4)$$
$$D_{2min} = r + 4, \quad D_{2max} = r + p + b + 2$$

Finally, for the three-level network:

$$G_3 = p^2[p + (4B + 29)b]/b^2 + p(p + 6B + 27) + pb(b + 2B + 16) \text{ for } p \le b^2,$$
$$G_3 = b^4 + (2B + 17)b^3 + 2(2B + p + 9)b^2 + 2(2B + 14)pb + p(2B + 9) \text{ for } p > b^2.$$
$$D_{3min} = 5 + r,$$
$$D_{3max} = 2p/b + b + r + 2 \text{ for } p \le b^2,$$
$$D_{3max} = p/b + 2b + r + 2 \text{ for } p > b^2.$$
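The three transfer modes differ, as far as gate cost is concerned, only in the number of switched bit lines that stands in for n. The sketch below applies that substitution to the two-level gate formula from the parallel case; the function names and mode labels are hypothetical, and the parameter values are the typical ones used in this section.

```python
def switched_lines(mode, n, B=None):
    """Number of bit lines that replaces n in the gate and terminal formulas."""
    if mode == "parallel":
        return n
    if mode == "bit-serial":
        return 3              # address, acknowledgment, and data-word lines
    if mode == "byte-serial":
        return 2 * B + 1      # substitution n -> 2B + 1
    raise ValueError("unknown mode: " + mode)


def bytes_per_access(na, nw, B):
    """r = ceil(na/B) + ceil(nw/B), using integer arithmetic."""
    return -(-na // B) + -(-nw // B)


def gates_two_level(p, b, lines):
    return (lines + 8) * (2 * b * p + p + b) + b * p * (p + 11)


p, b, na, nw, B = 12, 4, 20, 24, 4
for mode in ("parallel", "byte-serial", "bit-serial"):
    print(mode, gates_two_level(p, b, switched_lines(mode, na + nw, B)))
print("r =", bytes_per_access(na, nw, B))
```

The saving in gates from serial transfer is paid for in delay, which grows by r clocks in the byte-serial mode and by n clocks in the bit-serial mode.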
6. Comparative Analysis of Cost Measures
Figures VI-7(a), (b), and (c) display collectively the magnitude
of G as a function of p, b, n, and R over the ranges of interest of
these parameters, for the parallel mode of data transfer. These curves
are shown for the typical value of $n = n_a + n_w = 44$ bits (solid curves),
with values for the minimum and maximum (n = 32 and n = 56, respectively)
designated by dotted curves for the two-level case. (The others are very
similar.)
All of the formulas for T are so similar in form and relative
value to the corresponding ones for G that there is no need to display
their values separately. It may be safely concluded that the ranges of
optimality of design parameters based on the total number T of scanner
and switch terminals are very nearly the same as those based on the total
number G of gates.

FIGURE VI-7 GATE COSTS FOR PARALLEL TRANSFER [G versus p; panels (a) b = 3, (b) b = 4, (c) b = 6]
Reference  to  these  figures  leads  to  the  following  conclusions 
for  the  parallel  case. 
For  b = 4 and p large  the  two-level  network is preferred  by  a 
wide  margin,  but  as  b  is  increased  or  p  decreased,  both  the  one-  and 
three-level  networks  become  more  competitive  in  terms  of  gate  cost. In 
particular,  the  two-level  network  is  superior  to  the  three-level  over  the 
entire  range  of  parameter  values  of  interest,  and  to  the  one-level  network 
as well over all of this range except when $p \le 2b$; however, this extreme
combination of values is very unlikely to occur in the design of the SIFT
computer. The dependence of the minimum gate cost on all three parameters
p, b, and n is linear and nearly proportional; in fact the approximate
formula

    [...]

holds within 3% over the entire range of interest. This formula allows
one to estimate quickly the effect of a change in one of the design
parameters on the circuit complexity.
Gate costs for the serial mode of transfer are shown in Figure
VI-8 for the case b = 4. Again, the superiority of the two-level
implementation is apparent.
For the typical case: p = 12, b = 4, n = 44, and B = 4, Table
VI-1 lists maximum delay times $D_{Rmax}$ for 1, 2, and 3 levels and for all
three transfer modes. Dependence on p, n, and b is linear (where there
is any dependence at all), so the sensitivity of $D_{max}$ to changes in these
parameters can be readily estimated.
It is clear  that,  within  each  mode,  the  one-  and  three-level 
networks  are  preferred  from  the  standpoint  of  minimizing  the  maximum  delay 
time.  However,  the  differences  are  only  about  25%  to 30% for  the  parallel 
mode,  are  truly  negligible  for  the  bit-serial  mode,  and  are in between  for 
the  byte-serial  mode. 
It may be concluded that the operating speed of the interconnection
network is not critically dependent upon the number of levels in
the network, but (not surprisingly) depends rather critically upon whether
the parallel, byte-serial, or bit-serial mode of transfer is employed.
In the parallel case there is a secondary dependence on the number p of
processors and number b of busses, as expressed in the equations

$$D_{1max} = p + 1, \quad D_{2max} = p + b + 2, \quad D_{3max} \approx 3b + 2$$

FIGURE VI-8 GATE COSTS FOR BIT-SERIAL TRANSFER, b = 4 [G versus p]
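The maximum-delay expressions for the three network depths can be evaluated side by side with a short helper. This is a sketch under stated assumptions: the function name is hypothetical, b is assumed to divide p, and the `extra` argument stands for the additional transfer clocks of the serial modes (0 for parallel, r for byte-serial, n for bit-serial).

```python
def max_delay(levels, p, b, extra=0):
    """Maximum clocks for a full memory access through the network."""
    if levels == 1:
        return p + 1 + extra
    if levels == 2:
        return p + b + 2 + extra
    if levels == 3:
        if p <= b * b:
            return 2 * p // b + b + 2 + extra   # q = s = p/b
        return p // b + 2 * b + 2 + extra       # q = b, s = p/b
    raise ValueError("only 1, 2, or 3 levels are modeled")


p, b, n, r = 12, 4, 44, 11    # typical case; r for B = 4, na = 20, nw = 24
for label, extra in (("parallel", 0), ("byte-serial", r), ("bit-serial", n)):
    print(label, [max_delay(k, p, b, extra) for k in (1, 2, 3)])
```

With these inputs the spread between network depths stays at a few clocks in every mode, while the choice of mode shifts all three values by r or n clocks.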
7. Networks with More Than Three Levels
For sufficiently large p, the costs $G_R$ and $T_R$ can be reduced by
employing an interconnection network having more than three levels. This
can be done by holding q and s at small values (2, 3, or 4) and applying
recursively to each controller in the center level the one-to-three-level
transformation implied by Figures VI-2(a) and VI-2(c). For example, a
five-level network so generated from Figure VI-2(c) for p = 8, s = q = 2
is illustrated in Figure VI-9. The relevant concern here is whether such
an alternative is preferred over the two-level case for the range of
values of p likely to be encountered in the SIFT interconnection network.

FIGURE VI-9 EXAMPLE OF A FIVE-LEVEL NETWORK
The  limiting  situation q = s = 2  may  be  examined  first. In this 
case  each  of  the S-units  has two  inputs  and  two  outputs. The  network  is 
the  well-known  Benes-Waksman  arran  [Ref. 21 shown  in  Figure VI-9 for 
p = 8. This  network  is  known  to  have  complete  permutation  capability 
(b = p), and  for p = 2  has  2u - 1 levels  and p(u - 1/2) S-units. (A 
less  symmetrical  version  has  a  slightly  smaller  number p(u - 1) + 1 of 
S-units, but  would  probably  require  a  more  complex  routing  algorithm.) 
Thus, 
U 
R = 2u - 1, 
OR = 2p(2u - 1) 
Each S-unit requires a pair of route-selection lines from the source
processor. At the ith level, then,

    g_{sw,i} = 3[n + 2(\ell - i)]

Summing over all levels to get the grand total, we obtain, taking
g_{sc} = 21 as before,

    G_\ell = 6 \cdot 2^{u-1} (2u - 1)(n + 2u + 5),   where u = \log_2 p
For p = 16 we have u = 4 and \ell = 7, giving G_7 = 19,152 for n = 44 and
G_7 = 5376 for the serial mode. These values are much greater than the
minimum values plotted in Figures VI-7 and VI-8. The corresponding costs
for p = 8 (Figure VI-9) are G_5 = 6600 and 1680. These values are
indicated by small triangles in Figures VI-7 and VI-8. We may conclude that
the exclusive use of 2 X 2 S-units, using as many levels as needed, gives
rearrangeability but is more costly than all other solutions in the total
number of gates required. The delay D_max would also increase in these
five- and seven-level realizations, of course.
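The gate-cost expression for the 2 X 2 case can be checked numerically. The following is a minimal Python sketch, not part of the original study. It assumes that each level holds p/2 S-units and that each 2 X 2 S-unit comprises two switches and two scanners, an assumption chosen because it reproduces the closed form; the function names are hypothetical.

```python
G_SC = 21  # scanner gate count, the value the text takes "as before"


def levels_for(p: int) -> int:
    """Number of levels l = 2u - 1 for p = 2**u processors."""
    u = p.bit_length() - 1
    if p < 2 or 2 ** u != p:
        raise ValueError("p must be a power of two")
    return 2 * u - 1


def switch_gates(n: int, levels: int, i: int) -> int:
    """Gates in one switch at level i (1-based): 3[n + 2(l - i)]."""
    return 3 * (n + 2 * (levels - i))


def gate_cost_by_levels(p: int, n: int) -> int:
    """Sum over levels, ASSUMING p/2 S-units per level, each with
    two switches and two scanners (one per output)."""
    levels = levels_for(p)
    return sum((p // 2) * 2 * (switch_gates(n, levels, i) + G_SC)
               for i in range(1, levels + 1))


def gate_cost_closed_form(p: int, n: int) -> int:
    """Closed form 6 * 2**(u-1) * (2u - 1) * (n + 2u + 5)."""
    u = p.bit_length() - 1
    return 6 * 2 ** (u - 1) * (2 * u - 1) * (n + 2 * u + 5)


if __name__ == "__main__":
    for p in (8, 16):
        for n in (44, 3):  # parallel word width, then serial mode
            assert gate_cost_by_levels(p, n) == gate_cost_closed_form(p, n)
            print(p, n, gate_cost_closed_form(p, n))
```

Under these assumptions the level-by-level sum and the closed form agree, giving 19,152 gates for p = 16 and 6,600 gates for p = 8 in the parallel case (n = 44), and 1,680 gates for p = 8 in the serial case (n = 3).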
Exclusive use of 3 X 3 S-units does not lead to a fully utilized
network having more than three levels until p = 3^3 = 27. For p = 12,
however, a hybrid five-level network for s = q = 3 can be formed by
realizing each of the three 4 X 4 S-units in the central level in
Figure VI-2(c) in the form of a three-level, six-element subnetwork of
2 X 2 S-units. The final network, shown in Figure VI-10, has a total gate
cost G_5 = 10,860 for n = 44 and G_5 = 2496 for the serial mode. Another
five-level version using 2 X 2 and 3 X 3 S-units in alternate levels has
exactly the same cost. These values are indicated by small circles in
Figures VI-7 and VI-8. Again, these costs are quite high and indicate the
undesirability of increasing the number of levels above three.
8. Modularization of the Bus Interconnection Network
The most complex scanner encountered in the previous discussions
had g_{sc} = 13p + 8 \le 242 gates and t_{sc} = 2p + 2 \le 39 terminals.
Consequently, there would be no difficulty in implementing the scanner as a
separate semiconductor module, should this be desired. The switch, on
the other hand, has a typical complexity given by

    g_{sw} = (n + \gamma)(\delta + 1)
    t_{sw} = (n + \gamma + 1)(\delta + 1)

where \gamma varies from 0 to p and \delta from b or s to p. Except for the
serial mode (n = 3), then, the switch will need to be partitioned into two
or more parts to fit on any but the largest LSI modules. This may be
conveniently done in the present instance by taking advantage of the
iterative form of the circuitry for the n parallel bits. Only the \delta
gate control lines (from the counter in the scanner) need be repeated in
each module, and all switch modules at the same level could be essentially
the same.

FIGURE VI-10 FIVE-LEVEL NETWORK USING 2 X 2 AND 3 X 3 S-UNITS
IN ALTERNATE LEVELS
9. Comparison of Delay Times
The maximum delay times through the interconnection network for
different values of \ell are shown in Table VI-1.

Table VI-1
SUMMARY OF MAXIMUM DELAY TIMES D_max
(p = 12, b = 4, n = 44, \beta = 4)

  \ell   Parallel   Byte-Serial   Bit-Serial
   1        13          24            57
   2        18          29            62
   3        12          23            56

A particularly attractive design alternative for realization
would be one in which all controllers have the same number of inputs and
outputs, so that a common module might then be utilized in all three
levels. This would result in a minimum of wasted gates and terminals,
merely those corresponding to some of the routing selection digits in
levels past the first. With reference to Figure VI-2(c), this condition
requires that s = q = p/s, so that p = q^2 = b^2, for the three-level
network. Hence,

    G_3 = 3b^2 [b^2 + b(n + 14) + (n + 8)]

for the parallel case, and with n replaced by 3 for the serial case. Each
of the 3b^2 modules now requires

    g_3 = g_{sc} + g_{sw} = 2b^2 + b(n + 15) + (n + 8)

gates and

    t_3 = t_{sc} + t_{sw} - p = b^2 + b(n + 5) + (n + 3)

terminals. Results for the only two cases of interest, corresponding to
b = 3 and b = 4, are tabulated in Table VI-2 for parallel transfer (n = 44)
and for serial transfer. The network for b = 3 is shown in Figure VI-6.
It is  immediately  apparent  that  the  modules  are  pin-limited 
rather  than  gate-limited  for  any  modern  semiconductor  technology. A 
serial  controller  might  fit  nicely  on  one  semiconductor  chip,  but  the 
parallel  version  would  still  need  to  be  bit-partitioned  to  make  this 
possible.  Nevertheless,  this  possibility  is  an  attractive  one  from  the 
standpoint  of  implementation,  despite  the  nearly  double  total  gate  cost 
(small squares in Figures VI-7 and VI-8).
If this high gate cost can be afforded, the single-level network
must be reconsidered, since all controllers are also identical in this
case:

    G_1 = p^2 (n + 13) - 5p
    g_1 = p(n + 13) - 5
    t_1 = p(n + 3)
Table VI-2
SUMMARY OF COSTS FOR NETWORK REALIZATION
USING ALL-IDENTICAL MODULES

                      Three-Level Network                  One-Level Network
                 Parallel, n = 44     Bit-Serial      Parallel, n = 44     Bit-Serial
  b   p  No. of   G_3    g_3  t_3   G_3   g_3  t_3     G_1    g_1  t_1   G_1   g_1  t_1
         Modules
  3   9    27    6,345   247  203  1,917   83   39    4,572   508  423  1,251  139   54
  4  16    48   14,400   320  259  4,560  115   54   14,512   907  752  4,016  251   96
The total  number  of  gates  required  is  now  somewhat  less,  but  the  number 
of terminals  (and  gates)  per  module  is  substantially  greater, so more 
modules  would  be  required. 
All of  these  alternatives  are  compared  numerically  in  Table 
VI-2. 
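The entries of Table VI-2 can be regenerated from the cost expressions. The sketch below is illustrative only and is not part of the original study. It reads the three-level formulas as G_3 = 3b^2[b^2 + b(n + 14) + (n + 8)], g_3 = 2b^2 + b(n + 15) + (n + 8), t_3 = b^2 + b(n + 5) + (n + 3), and the one-level formulas as G_1 = p^2(n + 13) - 5p, g_1 = p(n + 13) - 5, t_1 = p(n + 3); the function names are hypothetical.

```python
def three_level(b: int, n: int) -> tuple:
    """Total gates, gates per module, terminals per module (p = b**2)."""
    total = 3 * b**2 * (b**2 + b * (n + 14) + (n + 8))
    gates = 2 * b**2 + b * (n + 15) + (n + 8)
    terminals = b**2 + b * (n + 5) + (n + 3)
    return total, gates, terminals


def one_level(p: int, n: int) -> tuple:
    """Total gates, gates per module, terminals per module."""
    total = p**2 * (n + 13) - 5 * p
    gates = p * (n + 13) - 5
    terminals = p * (n + 3)
    return total, gates, terminals


if __name__ == "__main__":
    for b in (3, 4):
        p = b * b
        for label, n in (("parallel", 44), ("bit-serial", 3)):
            print(f"b={b} p={p} modules={3 * b * b} {label}: "
                  f"three-level={three_level(b, n)} one-level={one_level(p, n)}")
```

For b = 3 and n = 44 this gives a total of 6,345 gates for the three-level network against 4,572 for the one-level network, while the terminals per module rise from 203 to 423, which is the tradeoff described in the text.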
10. Fault  Tolerance  Aspects 
We  follow  the  principle  that  recovery  should  be  possible  from 
all  single  faults  and  from  all  double  and  most  other  multiple  faults  that 
occur  sufficiently  far  apart  in  time to allow a delay  for  the  fault  analy- 
sis  programs  to  identify  the  faulty  unit  and  reconfigure  the  machine  to 
avoid  its  use. 
This  degree  of  fault  tolerance  can  be  achieved  by  defining  and 
isolating  units  in  such  a  way  as  to  limit  the  extent  of  a  "single"  fault. 
Note  that  the  extent  of  a  fault  is  determined  not  only  by  how  the  domain 
of  improper  operation  propagates  electrically,  but  also  by  the  precision 
with  which  the  diagnostic  routines  are  able  to  pinpoint  the  location  of 
the  fault.  For  example,  if  these  routines  are so weak  that  a  fault in A 
can be narrowed down only to the combination (P1,A), then the extent of
the fault must be considered to be (P1,A) and not just (A). With
reference to Figure VI-11, this ability to limit means that any one of the
units P1, P2, P3, A1, A2, or A3 may fail as a result of a single fault, but
that a fault in unit A1, for example, must never cause A2 or A3 to become
inoperative. Similarly, a fault in P1 must never incapacitate P2 or P3.
It is possible, though presumably less likely, for a fault to develop on
an actual connection path--from P1 to A1 for instance--that prevents
communication between these two units. Such a fault might occur without
blocking proper operation of P1 in conjunction with A2 and A3 or of A1
in conjunction with P2 and P3. On the other hand, such a fault might
also disable either P1 or A1 or both. What we do insist, however, is
that if A1 must be declared faulty, then A2 and A3 will not be rendered
inoperative by the same fault, and if P1 is declared faulty, P2 and P3
are not affected.

FIGURE VI-11 ARRANGEMENT OF UNITS
TO ACHIEVE FAULT TOLERANCE
In terms of the connection graph for the units of the entire
computer, this condition is equivalent to requiring that a single fault
may disable (a) any one branch, (b) any one node, (c) any one node with
an incident branch, or (d) any two connected nodes and their incident
branch.
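The four fault extents (a) through (d) can be enumerated mechanically on a small connection graph. The following Python sketch is illustrative only. It assumes a graph in which each of three P units is connected to each of three A units; the actual arrangement of Figure VI-11 may differ, and all names are hypothetical.

```python
from itertools import product

PROCESSORS = ["P1", "P2", "P3"]
A_UNITS = ["A1", "A2", "A3"]
# ASSUMPTION: every P unit has a branch to every A unit.
BRANCHES = set(product(PROCESSORS, A_UNITS))


def single_fault_extents():
    """Yield (failed_nodes, failed_branches) for extents (a) to (d)."""
    for branch in BRANCHES:
        yield set(), {branch}                    # (a) any one branch
    for node in PROCESSORS + A_UNITS:
        yield {node}, set()                      # (b) any one node
        for branch in BRANCHES:
            if node in branch:
                yield {node}, {branch}           # (c) node plus incident branch
    for p, a in BRANCHES:
        yield {p, a}, {(p, a)}                   # (d) two connected nodes and branch


def surviving_pairs(failed_nodes, failed_branches):
    """P-to-A connections still usable after the fault."""
    return {(p, a) for (p, a) in BRANCHES - failed_branches
            if p not in failed_nodes and a not in failed_nodes}


if __name__ == "__main__":
    extents = list(single_fault_extents())
    worst = min(len(surviving_pairs(n, b)) for n, b in extents)
    print(len(extents), "fault extents; fewest surviving connections:", worst)
```

In this assumed graph the most damaging single fault is case (d), which still leaves two P units and two A units fully connected to one another.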
Proper definition of what constitutes a separate unit with
respect to single faults requires only that replicated parallel
subnetworks be regarded as separate units, just as one would expect.
Isolation of these separate units must then proceed according to the
principle that parallel replicated units should never be connected
together in any way.
At the circuit level this condition of isolation requires that
merging lines must be electrically isolated at all points of fan-in and
fan-out of these replicated units. This might be implemented as simply
as employing resistors or diodes for input clamping and for interunit
connections at all points of fan-in and fan-out, in order to prevent
propagation of faults between units. However, optical channels or special
coupling circuits might be preferred, depending upon the nature of the
actual signals and the magnitudes of the fault probabilities.
To prevent the most likely types of double faults (those
confined to single units) from blocking fault-free units from use, an
additional condition on the three-level interconnection network should be
imposed:  that  all  switch  units  should  have  a  fan-in  and  fan-out  of at 
least three. Thus, 
This  requirement  is  satisfied  by  all  of  the  realizations  discussed  pre- 
viously. 
Diagnostic  routines  should  have  no  difficulty in pinpointing 
the  location  of a  fault  to  any  switch,  since  this  unit  is  purely  combina- 
tional  and  is  interposed  in  data  paths  between a  processor  and  a  memory. 
Thus  all  stuck-at  and  short-circuit  faults  will  appear  as  errors in 
address  or  data  transmission  and  can  be  readily  detected  and  located. 
Some of the  switched  lines  carry  routing  and  memory  selection  information, 
but  errors  in  this  information  will  also  show  up  as  data  errors  whenever 
the  wrong  memory  is  selected.  To  provide  more  positive  protection  against 
memory selection errors, it may be desirable to assign replicated files
to different  address  blocks  within  the  various  memory  units. 
The  scanner,  while  a  simpler  unit,  operates  sequentially  and 
must  therefore  be  checked  externally  for  both  "do-nothing"  faults  and 
faults  that  cause  two  otherwise  nonsimultaneous  actions  to  occur  at  the 
same  time.  The  former  type  of  fault  will  presumably  be  detected  by  what- 
ever  overtime  monitor  is  used  for  processor  operations.  The  latter  type 
could  cause  small  delays  if a processor  tries to communicate  with  a  memory 
unit  through  more  than a  single  path  through  the  interconnection  network, 
or  it  could  cause  data  errors  if a  processor  becomes  connected to more 
than a  single  memory  unit.  Both  of  these  types  of  faults  are  readily 
detected.  However,  the  second  may  require  some  special  attention  in  prep- 
aration  of  the  fault  location  routines so that  the  fault can be  properly 
localized. 
11. Routing 
When  the  number of levels  in  the  interconnection  network  is two
or  more,  the  allocation  program  in  the  system  executive  must  contain  a 
routine  for  assigning  busses  or  a  route  for  each  processor-to-memory  access. 
This  routing  information is forwarded  to  the  processor  as  part  of  the 
execution  program;  in effect it  becomes  a  portion of the  memory-select  and 
address  information.  However,  the  assignment  of  access  routes  cannot  be 
made independently of one another. The scheduling of all of the various
tasks  must  be  considered, in order  to  avoid  conflicts  and  delays  that 
would  otherwise  result  from  two  processors  demanding  overlapping  routes. 
Thus,  route  assignment  must  take  into  account  both  the  scheduling  of  tasks 
and  the  detailed  interconnection  possibilities  within  the  interconnection 
network. 
For  the  two-level  network, all bus  controllers  connect sym- 
metrically  to  all  processors  and  also  to  all  memories.  Consequently, 
potential  routing  conflicts  can  be  circumvented  by  simply  avoiding  the 
assignment of the  same  bus  to  two  concurrent  accesses.  If  the  scheduling 
constraints  are  not  too  severe,  such an assignment  might  be  handled  by 
simply  rotating  the  assignment  of  busses  to  processors.  This  allocation 
rule  would  be  used  until  faults  occur. As particular  bus  controllers, 
processor-to-bus  connections,  and  bus-to-memory  connections  are  recognized 
as  potentially  faulty  and  are  taken  from  use,  the  assignment  algorithm 
would  become  more  constrained. It could  still  operate  on  a  "next  avail- 
able"  basis,  or  by  whatever  algorithm  is  used  for  handling  defective 
processors  and  memories. 
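The rotating assignment rule for the two-level network, and its fallback to a "next available" rule once units are taken from use, can be sketched as follows. This is an illustrative Python sketch, not the SIFT allocation program; the function name, the request format, and the representation of faulty busses and faulty connections are all assumptions.

```python
def assign_busses(requests, num_busses, start=0,
                  bad_busses=frozenset(), bad_links=frozenset()):
    """Assign a distinct bus to each concurrent (processor, memory) access.

    requests   : list of (processor, memory) pairs active in one time slot
    start      : rotation offset, advanced by the caller from slot to slot
    bad_busses : bus indices taken from use
    bad_links  : faulty connections, given as (processor, bus) or
                 (bus, memory) pairs
    Returns a dict mapping each served request to its bus; requests that
    cannot be served in this slot are omitted and must wait.
    """
    order = [(start + k) % num_busses for k in range(num_busses)]
    free = [bus for bus in order if bus not in bad_busses]
    assignment = {}
    for proc, mem in requests:
        for bus in free:
            if (proc, bus) in bad_links or (bus, mem) in bad_links:
                continue  # this path is not usable for this access
            assignment[(proc, mem)] = bus
            free.remove(bus)  # never give the same bus to two accesses
            break
    return assignment


if __name__ == "__main__":
    reqs = [("P1", "M1"), ("P2", "M2")]
    print(assign_busses(reqs, 4))                        # fault-free rotation
    print(assign_busses(reqs, 4, start=1, bad_busses={1}))
    print(assign_busses(reqs, 4, bad_links={("P1", 0)}))
```

Because every bus controller connects to all processors and all memories, avoiding a repeated bus within one slot is the only conflict the sketch has to check.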
If the  number of levels is three or more,  the  choices  between 
possible  routes  between  processors  and  memories  are  no  longer  equally 
preferable. As indicated  previously,  these  interdependencies  could  be 
completely  avoided  by  employing  a  nonblocking  interconnection  network. 
This  form  of  network  was  seen  above  to  be  nearly  twice  as  costly  as  a 
rearrangeable  network,  however,  and  its  use  is  probably  not  justified  in 
the  present  application. In view of  the  small  number of connections 
likely  to  be  set  up  simultaneously,  relative  to  the  total  number  of  pro- 
cessors,  routing  conflicts  would  appear  to  be  the  exception  rather  than 
the  rule,  even  if  a  simple  "next  available  path"  assignment  algorithm 
were used. Telephone theory provides sophisticated algorithms aimed at
minimizing the blocking probability for average low-level use of a
switching system [Ref. 1]. In the present case, however, the number of
processors is probably too small to make such elaborate schemes either
necessary or beneficial.
It seems  sufficient,  therefore,  for  the  allocation  subroutine 
to  maintain  and  update a  simple  table,  which  contains  for  each  processor 
(row  of  the  table)  and  memory unit (column) a  list  of  the  paths  that  have 
been  provided  for  in  the  design--normally 3 or  4--and  the  status  of  each 
such  path:  (a)  fault-free  and  available  at  the moment, (b) fault-free 
but  busy, (c) potentially  faulty,  or (d) faulty.  Except  for  the  multiple 
choice  of  routes,  this  table  is  the  same  in  kind  as  must  be  maintained  to 
store  the  status  of  all  units  in  the  SIFT  computer. 
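A table of the kind described might be organized as in the following Python sketch. It is a simplified illustration under stated assumptions: the class and method names are hypothetical, each processor-memory pair is given its own list of paths, and the fact that paths of different pairs may share physical controllers is ignored.

```python
from enum import Enum


class PathStatus(Enum):
    AVAILABLE = "fault-free and available"
    BUSY = "fault-free but busy"
    SUSPECT = "potentially faulty"
    FAULTY = "faulty"


class RouteTable:
    """One row per processor, one column per memory unit."""

    def __init__(self, processors, memories, paths_per_pair=3):
        self.status = {(p, m): [PathStatus.AVAILABLE] * paths_per_pair
                       for p in processors for m in memories}

    def allocate(self, proc, mem):
        """'Next available path' rule: return a path index, or None."""
        entry = self.status[(proc, mem)]
        for index, state in enumerate(entry):
            if state is PathStatus.AVAILABLE:
                entry[index] = PathStatus.BUSY
                return index
        return None

    def release(self, proc, mem, index):
        """Free a path after use; suspect or faulty paths stay marked."""
        entry = self.status[(proc, mem)]
        if entry[index] is PathStatus.BUSY:
            entry[index] = PathStatus.AVAILABLE

    def mark(self, proc, mem, index, state):
        """Record a diagnosis result for one path."""
        self.status[(proc, mem)][index] = state


if __name__ == "__main__":
    table = RouteTable(["P1", "P2"], ["M1", "M2"])
    first = table.allocate("P1", "M1")
    table.mark("P1", "M1", 1, PathStatus.SUSPECT)
    print(first, table.allocate("P1", "M1"))  # second call skips the suspect path
```

A fuller version would also propagate a busy or faulty status to every table entry that shares the affected controller.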
12. Conclusion 
The  foregoing  analysis  indicates  the  superiority  of  a  two-level 
interconnection network over alternatives employing only one level or
three  or  more  levels  for  virtually  all  ranges of parameters  likely  to  be 
encountered in the  SIFT  computer  and  for  all  three  transfer  modes, parallel, 
byte-serial,  and  bit-serial.  In  terms  of  the  various  cost  measures,  the 
two-level  network  is  less  complex  in  total  gate  and  terminal  counts  for 
all  parameter  ranges of interest,  in  most  cases  by a  wide  margin. It has 
the  same  (or a  little  greater)  maximum  access  delay. 
The  bus  interconnection  network  is  readily  decomposed  into  cir- 
cuit  modules,  although  some  sacrifice  in  gate  cost  can be expected  if  all 
of  these  modules  are to be  made  alike. 
These  conclusions  are  not  yet  based  upon a reliability  analysis, 
one  that  will  take  into  account  the  various  fault  probabilities  in  each 
type of unit and especially failures in interunit connections. The final
design choice for the interconnection network will depend to some extent
upon  this  analysis,  as  well  as  upon  the  scheduling  algorithms  and  diag- 
nostic  strategy  adopted  and  upon  system  tradeoffs  involving  speed  re- 
quirements,  relative  hardware  costs,  and  the  parameters  assumed  specified: 
p, b, and  n. 
The  principal  unanswered  question  at  this  stage  of  the  design 
concerns  the  matter  of  how  the  controller  should  be  realized  in  hardware. 
No two  controllers  occupying  similar  replicated  positions  in  the  network 
should share the same chip. However, most of the controllers are
terminal-limited rather than gate-limited, and this poses a problem for
good economy of realization.
B. Input/Output Subsystem
1. Introduction and Summary
The input/output subsystem of the SIFT computer is used to
connect to the aircraft environment. It clearly must satisfy the same kind
of reliability requirements as the remainder of the system. In addition,
it is highly constrained by the character of the devices within the
aircraft; for example, replicated air pressure sensors will not produce
identically the same readings, so the voting on this data will have to
allow for this fact.
The basic scheme for the input/output subsystem is:
• Critical sensors are replicated, and the programs
that require the data read all the versions and
carry out a voting procedure as with any data that
are read.
• Critical actuators must be replicated, each of
them containing sufficient local logic to be able
to read the several versions of the output data
that they require and carry out local voting,
possibly by mechanisms similar to those currently
employed with multiple actuators on aircraft, e.g.,
forced sum voting.
• Noncritical sensors and actuators are not replicated
but are connected to the system in the same way as
critical ones in order to preserve the same fault-
isolation rules on the input/output subsystem as are
used between processing modules.
• The central computing elements of the SIFT system
are isolated from noncritical sensors and actuators.
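Voting on replicated sensors whose readings agree only approximately can be illustrated with a mid-value selection. The Python sketch below is an assumption for illustration; the text does not specify the voting rule. The function name, the tolerance parameter, and the sample pressure values are hypothetical.

```python
from statistics import median


def vote_sensor(readings, tolerance):
    """Mid-value select over replicated readings (an assumed rule).

    Returns the selected value and the indices of readings that differ
    from it by more than `tolerance`, which a fault analysis program
    could then treat as suspect.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    chosen = median(readings)
    suspects = [i for i, value in enumerate(readings)
                if abs(value - chosen) > tolerance]
    return chosen, suspects


if __name__ == "__main__":
    # Three replicated air pressure sensors; the third has failed low.
    value, suspects = vote_sensor([101.2, 101.3, 87.0], tolerance=0.5)
    print(value, suspects)
```

With three readings the median masks any single faulty sensor, and the tolerance keeps small, legitimate disagreements between healthy sensors from being reported as faults.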
The manner by which the above objectives are achieved is
described below, starting with the design of a system for critical sensor
and actuator input/output, followed by a discussion of appropriate
structures for noncritical units. The section concludes with considerations
of the aircraft bus structure and the question of whether the logic of
sensors and actuators should be positioned at the central computer or at
the units themselves.
2. Critical Input/Output Units 
Figure  VI-12  shows  how  critical  input  and  output  units  would 
be  connected  to  the  central SIFT  computer  system. We assume  that  data 
to  and  from  the SIFT  system  flows on a  multiple  bus  system,  which  is con-
nected  to  the  main  bus  system  of  SIFT  via  logic  that  is  realized  by a 
specially programmed microprocessor (marked as P in Figure VI-12).

FIGURE VI-12 INPUT/OUTPUT FOR CRITICAL SENSORS AND ACTUATORS
(block label: MICROPROCESSOR-BASED INPUT/OUTPUT CONTROLLER)
Each microprocessor operates in the same manner as the main
processors of SIFT, except that the tasks that are to be performed are
much smaller and the executive that resides in them is a reduced version
of the LE/GE combination in the central processors. The reductions that
are made are:
• No global executive is present in the microprocessors,
as the functions normally performed by it are either
not necessary or are carried out by the GE of the
central processors.
• The LE contains only the voter, scheduling, and
dispatching functions, together with sufficient of the
global/local interface to enable it to determine its
schedules by reading the central GE tables.
In all other respects the I/O processors operate according to
the same general rules as the central processors. This includes voting
on multiple inputs to achieve error detection and correction,
reconfiguration by change of scheduling tables, and the restriction that a
processor may only read data from other processors and may not write into
the memory of other processors.
Figure VI-12 shows the logical structure but does not indicate
the physical placement of the I/O processors. It is anticipated that some
or all of the I/O processors could be placed close to the sensors and
actuators that must be controlled. Such considerations depend on the
economics and reliability predictions for the various bus system
technologies.
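The restriction that a processor may read from other processors but write only into its own memory, combined with voting on the values read, can be modeled in a few lines. The Python sketch below is a toy model under stated assumptions, not the SIFT bus logic; the class, function, and unit names are hypothetical.

```python
from collections import Counter


class MemoryModule:
    """Memory that any unit may read but only its owner may write."""

    def __init__(self, owner):
        self.owner = owner
        self._cells = {}

    def write(self, requester, address, value):
        if requester != self.owner:
            raise PermissionError(f"{requester} may not write {self.owner}'s memory")
        self._cells[address] = value

    def read(self, requester, address):
        return self._cells.get(address)


def majority(values):
    """Return the value held by more than half of the copies, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None


if __name__ == "__main__":
    io_units = {name: MemoryModule(name) for name in ("IO1", "IO2", "IO3")}
    # Each I/O processor deposits its copy of a reading in its own memory;
    # IO3 is assumed faulty and deposits a wrong value.
    for name, value in (("IO1", 42), ("IO2", 42), ("IO3", 7)):
        io_units[name].write(name, "altitude", value)
    copies = [m.read("CPU1", "altitude") for m in io_units.values()]
    print(majority(copies))
```

Since a faulty unit can corrupt only its own memory, the reader's vote over the replicated copies is enough to mask it.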
3. Noncritical Input Units
In a SIFT system that is carrying out both critical and
noncritical tasks, it is necessary to maintain a separation between the
tasks because the noncritical tasks may not receive as much validation and
verification as the critical tasks and thus may corrupt them. A method of
protection, whereby a program cannot affect the memory outside that
allocated to it, prevents programs for the noncritical tasks from
interfering with the critical ones (see Subsection VI-C-4). The same degree
of protection against the possibility of errors in the hardware is
achieved  by  the  use of a  microprocessor-based  unit  that  connects  to  the 
main  bus  system  and  is  used  to  read  from  the  sensor  and  deposit  the  re- 
sults  of  the  read  operation in  its own memory.  These  results  can  then  be 
read  by  the  main  processors of SIFT.  This  scheme  effectively  isolates 
potentially  unreliable  equipment  from  the  other  units  of  the  system. 
4.  Noncritical Actuator Units 
Noncritical  actuators  can  be  dealt  with in the  same  manner  as 
noncritical  sensors,  by  interposing  a  microprocessor-based  unit  between 
the  busses  of  SIFT  and  the  units  themselves. To some  extent  this  may  not 
be  necessary  if  the  units  themselves  are so connected  into  the  bus  system 
that  they  can  only  read  from  the  SIFT  module  memories.  This  constraint 
on the  operation  can  be  achieved  in  the  bus  control  mechanism,  as  is  the 
case  for  the  connection  of  SIFT  modules  themselves.  The  advantage  of  the 
use  of  a  microprocessor  is  that  attention  can  be  given  to  the  design  of 
its  interconnection  and  the  same  logic  can  be  used  many  times,  whereas if 
the  units  are  themselves  connected,  then  it  is  necessary  to  ensure  that 
the  logic  and  physical  design  preclude  the  propagation  of  errors  or  dam- 
age.  This  validation  would  have  to  be  carried  out  for  each  individual 
type of unit  that  is  connected  to  the  system. 
C. SIFT  Memory  System  Design 
1. Introduction and Summary 
In this  discussion  of  SIFT  memory  system  design,  we  consider 
the  storage  of  programs  and  data  for  high-rate  control  and  display  func- 
tions,  which  are  served  by  the  distributed  memories  of  the  basic  SIFT 
scheme,  and  also  the  low-rate,  high-volume  storage  that  may  be  required 
in  a  practical,  general-purpose  aircraft  computer. 
At  the  present  state  of  development  of  the  SIFT  architecture, 
the  following  questions  about  memory  are  the  most  pertinent: 
• How many levels of memory are needed to accommodate
the range of storage capacities and speeds in the
SIFT prototype?
• What technologies are appropriate to the various
storage functions?
• What special logical functions may be needed within
memory modules to support SIFT processing and
communication modes?
• What fault-tolerance capabilities are appropriate
to the various levels of the memory hierarchy?
• What are the basic performance requirements for SIFT
memories?
These  questions  are  discussed  in  order. 
The  following  conclusions  are  derived  in  the  discussion: 
• Two levels of memory are needed in an Air Transport
SIFT: a set of high-speed random access memories
(RAM) for program execution and a high-capacity block-
access byte-serial store. A large central RAM is not
needed.
• The following fault-tolerance schemes appear attractive:
- Single-error-correction, double-error-detection
codes exist for RAM data channels.
- Software reconfiguration of contiguous blocks of
words in RAM is provided.
- Either arithmetic sum checks or longitudinal parity
checks are made for a byte-serial store.
- Redundant address information (to some degree of
precision) may be usefully appended to each RAM
word.
- The appropriate form of redundancy for the block
access memory needs further study. Dual redundancy
appears satisfactory.
- A section of read-only memory may be employed
usefully in each processor memory.
- A reliable form of nonvolatile writable RAM would
be beneficial, but is not presently available.
Awareness of future developments is desirable.
- The feasibility of marginal checking for
contemporary semiconductor memories should be investigated.
It is potentially of great value in SIFT.
- Memory design should allow for the possible use of
tagged architecture. Tagging of words appears to
have several beneficial uses, e.g., in protecting
data from erasure because of erroneous address
calculation.
• The problem of unflexed fault tolerance circuits is
significant, but solutions are apparent, e.g.,
permanently storing data patterns in memories that can test
error-detection logic.
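The two block checks mentioned for the byte-serial store, an arithmetic sum check and a longitudinal parity check, can be sketched as follows. This Python sketch is illustrative only; the modulus of 256, the byte-wide check word, and the function names are assumptions, not details from the study.

```python
from functools import reduce


def arithmetic_sum_check(block: bytes, modulus: int = 256) -> int:
    """Sum of all bytes in the block, reduced modulo `modulus`."""
    return sum(block) % modulus


def longitudinal_parity(block: bytes) -> int:
    """Bitwise exclusive-OR of all bytes: one parity bit per bit position."""
    return reduce(lambda acc, byte: acc ^ byte, block, 0)


def store_block(block: bytes):
    """Return the block together with both check words."""
    return block, arithmetic_sum_check(block), longitudinal_parity(block)


def verify_block(block: bytes, sum_check: int, parity: int) -> bool:
    """True if the block read back still matches both check words."""
    return (arithmetic_sum_check(block) == sum_check
            and longitudinal_parity(block) == parity)


if __name__ == "__main__":
    data, s, lp = store_block(bytes([0x12, 0x34, 0x56]))
    print(hex(s), hex(lp), verify_block(data, s, lp))
    corrupted = bytes([0x12, 0x35, 0x56])  # one bit flipped in transfer
    print(verify_block(corrupted, s, lp))
```

A single flipped bit always changes the longitudinal parity word, so either check detects it; the two checks differ in which multiple-bit error patterns escape detection.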
2. Memory Hierarchy 
The most pressing concern about the SIFT prototype memory
hierarchy is the need for a central fast memory. Our studies [Ref. 3] have
determined
that  the  high-rate  aircraft  control  programs  may  be  served  adequately  by 
the  memory  modules  associated  with  the  distributed  SIFT  processors. It 
also  appears  necessary  to  have  capability  for  storing  high-volume  data 
with  relatively  low  requirements  for  access  speed.  Data  of  this  type 
include : 
• Copies of all programs, to be used in an extreme
situation when normal reconfiguration fails.
• Infrequently used programs, e.g., for diagnosis.
• Sequences of recent input-output activity, to be
used for system recovery or off-line system analysis.
The volume and rate required for this class of data are not known precisely,
but it appears that (1) transfer of a block of data, with latency on the
order of a few milliseconds, is satisfactory (see Section V), and (2)
capacities on the order of 10^7 bits and data rates of 10^6 bits per second
are reasonable to demand. Such characteristics are provided by current
technologies in the form of block-organized shift registers.
These  considerations  indicate  that  the  SIFT  prototype  should 
provide  for  serial-mode,  block-transfer  storage  as  well  as  multiple, 
fast,  random-access  storage  units  for  the  distributed  processing  of  high- 
rate  programs.  Given  the  disparity  in  speeds  and  sizes of the  two  memory 
types,  it  is  natural  to  consider  the  use  of an intermediate  level  of 
storage,  such  as  a  large-capacity  random-access  memory. 
The  following  uses  for  such  a  memory  are  apparent: 
(a) Direct program execution, one benefit of which would be
an economy in storage, since failures in a processor would
not require abandonment of memory, as in the present
distributed processor design. A second benefit would be a
reduction in the overhead required for the memory mapping
that supports reconfiguration of storage under faults.
(b) Storage of rarely used programs, a benefit of which would
be a reduction in the required capacity of the distributed
processors for programs which, when needed, require rapid
access.
(c) Storage of flight-critical programs for rapid loading of
distributed memories during fault recovery.
(d) Storage of future very large programs that exceed the
individual capacity of present distributed memories. A
benefit would be to simplify reconfiguration procedures,
since the distributed memories would be reserved for small
programs.
Of these points, only (a) requires special architectural provisions,
since the direct execution of programs from a central memory
requires a major increase in bus data rates and/or a redesign of the bus
concept. The major benefits appear to be economic, on the grounds that
in a central memory, failure of a processor does not cause a reduction
in memory; hence a lower amount of memory redundancy is needed. The
economic impact would appear to be small, because processor failure is
expected to contribute little to system failure rate compared to memory
itself.
Point (b), storage of rarely used programs, can be satisfied by
the block-organized storage level, provided that the store is capable of
loading a 1K program block in 12 ms [Ref. 3].
Point (c), storage for rapid loading of critical programs,
appears not to have significant benefit. All the critical programs
surveyed are small and can be transferred between a pair of distributed
memories with sufficient speed to meet recovery requirements. If, in the
future, critical programs are added that are so large as to prohibit
rapid transfer, satisfactory reconfiguration could be achieved by
employing a higher order of replication than is currently expected for
application programs.
Point (d), storage of abnormally large programs, does not
constitute a compelling reason for distinguishing a separate level of memory.
It might require that some of the distributed memories be of larger than
average  size.  This  would  tend  to  constrain  the  flexibility of reconfigu- 
ration,  but  it  does  not  seem  to  be  a  serious  problem. 
We  conclude  that an intermediate  level  of  storage, in the  form 
of  a  large  random  access  memory,  is  not  justified. 
It should be  noted  that  the  arguments  given  are  based on the 
particular  computational  needs  of  the  air  transport  problem  environment 
and  on  currently  feasible  memory  organizations.  The  appropriate  memory 
hierarchy  for  a  SIFT  computer  used in different  problem  areas,  e.g.,  time- 
shared  computing  or  communications  processing,  would  have  to  be  reexamined. 
New memory developments might also have an impact. For example, an
extremely low-cost serial store might be a useful attachment to each
dedicated processor-memory.
3.  Memory Technologies 
The primary impact of memory technology on SIFT architecture is
felt in the following issues:
• The choice of magnetic core or semiconductor storage
  for the processor memories.
• The feasibility of a block-access 10^7-bit secondary
  store.
• The need for fixed or nonvolatile storage.
In the July 1974 study [Ref. 3] it was argued that semiconductor
memories are preferable to magnetic-core memories on the grounds that (1)
they tend to use far fewer circuit connections and manual assembly
operations, (2) the drive circuits operate at lower, hence less stressful power
levels, and (3) low-level sense signals are restricted to the interior of
the devices, hence are more immune to noise. We believe that these
factors still apply, and our following discussions assume the use of
semiconductor memories for the processor memories. We also observe that
magnetic-core memory technology continues to evolve appreciably.
Therefore the comparative value of the two memories should be reviewed
periodically.
Recent  developments  have  established  the  feasibility  of  two 
novel  technologies  for  block-access  stores:  charge-coupled  devices  (CCD) 
and  magnetic  bubble  storage (MBS). Both  are  amenable  to  byte-serial 
shift-register  type  structures  and  are  well  suited  for  block-structuring, 
with block lengths of 10^4 to 10^5 bits. Data rates appear to favor CCD by a
factor of two to three. Contemporary CCD units have a maximum rate of
about 5 Mb/s, while MBS units have a maximum rate of about 2 Mb/s. CCD
appears to be capable of higher speed through design refinements, but
significant increases in MBS speeds may require a breakthrough in
technology.
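As a rough illustration of what the quoted rates mean for block transfers, the following sketch computes the time needed to shift one block out serially at each technology's approximate maximum rate. Block lengths of 10^4 and 10^5 bits are assumed, access latency and control overhead are ignored, and the function name is illustrative only.

```python
# Approximate maximum serial data rates quoted in the text (bits per second).
RATES_BITS_PER_S = {"CCD": 5e6, "MBS": 2e6}


def block_transfer_ms(block_bits, rate_bits_per_s):
    """Time in milliseconds to shift one block out serially at the given rate."""
    return 1000.0 * block_bits / rate_bits_per_s


if __name__ == "__main__":
    for tech, rate in RATES_BITS_PER_S.items():
        for bits in (10**4, 10**5):
            ms = block_transfer_ms(bits, rate)
            print(f"{tech}: {bits} bits in {ms:.0f} ms")
```

Under these assumptions the ratio of transfer times is simply the ratio of the rates, i.e. a factor of 2.5 in favor of CCD.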
An important  function  of  a  block-store  memory  is  to  retain  data 
between  flight  periods. At such  times,  aircraft  power  may  be  off. It is 
therefore  important  to  consider  the  feasibility  of  means  for  preserving 
data  with  power-off in the  two  schemes. MBS appears  to  have  a  significant 
advantage  in  nonvolatility  of  data, in respect  to  both (1) power loss and 
(2) interference  due  to  strong  environmental  signals  such  as  lightning. 
This  advantage  results  from  the  use  of  static  magnetic  biasing  fields 
closely  adjacent  to  the  storage  surface.  The  problem of power  loss  in 
CCD  (and  other  semiconductor  memories)  may  be  mitigated  by  the  use  of 
small  batteries,  and  the  problem  of  interference  may  be  solved  by  careful 
shielding.  The  use of "holding"  batteries  is  common  in  current  semicon- 
ductor  memories,  but  it  is  not  yet  fully  accepted  as  a  solution.  Shield- 
ing  as  a  protection  against  lightning  strikes  also  has  proved  generally 
satisfactory for low-power digital circuits. Power loss remains an
inadequately studied problem.
Based on nonvolatility, MBS would  appear  to  be  the  medium  of 
choice  at  this  time.  The  issue,  however,  may  be  decided on economic 
grounds.  While  the  primary  issue  for  the  air  transport  application  is 
not  cost  but  reliability,  the  two  are  closely  connected,  since  high-volume 
production  has  a  strong  effect on intrinsic  device  reliability. 
With  regard  to  economic  factors,  CCD  has  a  strong  current  ad- 
vantage  over MBS, derived  in  part  from  the  strength  of  the  existing  semi- 
conductor  industrial  technology  base.  CCDs  are  themselves  facing  strong 
competition  from  ordinary  random-access  memory (RAM) technologies.  Appar- 
ently  the  somewhat  higher  intrinsic  cost  of RAMs (due to higher  fabrica- 
tion  and  testing  complexity)  may  be  outweighed  by  their  presently  much 
higher  production  volumes  and  by  the  system  advantages  of  their  lower 
latency  times. 
In  summary,  a  reliable,  low-cost,  block-oriented  mass  store 
appears  feasible  at  this  time.  If  the  magnetic  bubble  technology  proves 
not  to  be  economically  viable,  then  special  effort  should  be  made  to 
assure  the  capability  of  semiconductor  memories  to  withstand  transient 
interference  and  to  have  data  maintained  during  power-off  periods of 
several  days. 
The  problem of data-volatility  also  pertains  in  the  distributed 
RAMs used  for  program  execution.  The  reason  for  concern  is  that a  massive 
power  failure  or  noise  impulse  may  erase  critical  programs  or  data  and 
require  a  time-consuming  reloading of programs  or  recomputation of crit- 
ical  state  data. 
The most critical data, in order, are:
(a) The local executive program
(b) Any copy of the global executive program
(c) Flight-critical state information for high-iteration-rate
    control programs
(d) Flight-critical state information for low-rate control
    programs
(e) Non-flight-critical programs.
Several candidate functional types of random-access memory are
considered below. It is generally feasible to mix memory
types within a single memory unit.
The  most  nonvolatile  memory is a read-only  memory (ROM). For- 
tunately,  a ROM is  entirely  feasible  for  the  local  executive  because  the 
same  program  appears  in  each  memory,  and  it  should  not  be  subject to 
change.  The  use  of  a ROM has  the  important  benefit  that  it  would  permit 
system  initialization  without  the  use  of  external  memory  units. 
The  use  of  ROM  for  the  global  executive  is  more  controversial, 
because  not  every  memory  needs a copy of the  global  executive  and  because 
the  program  may  be  more  subject  to  change.  The  nonvolatility  of  a  ROM 
may  be  attractive  enough  for  the  purpose of rapid  recovery  after  massive 
transient  errors  to  justify  some  extra  copies  of  the  global  executive. 
It should  not  be  necessary  to  have  a  copy  in  each  memory,  merely  enough 
to  cover  the  expected  needs  for  spare  copies.  Furthermore, it may  be 
possible to partition the global executive program into ROM and RAM
portions to achieve some economy in the redundant copies; every memory would
then have a copy of the ROM portion, but only as many RAM-portion copies
would be carried as are expected to be needed.
ROM  technology  is  not  appropriate  for  Items (c), (d) and (e). 
For  critical-state  information,  it  would be attractive  to  be able to  use 
one of the  various  forms of writable  nonvolatile  semiconductor  memory, a 
function  usually  referred  to  as  programmable  read-only  memory (PROM). 
The  so-called MNOS technology,  for  example,  has  the  desirable  character- 
istic  that  information  is  retained  without  applied  power. It has  several 
disadvantages,  such  as  deterioration  with  extensive  use  and  slow  writing 
speed,  which  seem  to  rule it out  for  the  present  application.  Several 
claims  have  been  made  recently  in  various  trade  journals  about  improved 
PROM technologies for RAMs and CCDs. Such developments deserve continued
attention. 
In  summary,  the  use of ROM  for  a  portion of the  SIFT  working 
memories should be assumed. The use of PROM technology would be
beneficial if some reliable form appears that is compatible in timing with the
primary RAM devices.
4.  Special Logical Functions
In this  section  we  discuss  the  possible  need  for  special  logical 
functions  in  memory  units  other  than  for  fault  tolerance. We consider 
both  the  distributed RAM memories  and  the  block-structured  mass  memory. 
In SIFT,  each  processor  handles a mix of tasks,  some of which 
are of high  criticality  while  others  may  have  no  strong  reliability 
requirements. It is  expected  that  the  high-criticality  programs  will  be 
subjected  to  a  variety of verification  procedures  (including  formal  proof) 
to  ensure  their  correctness.  The  tasks  of  low  criticality  may  not  be 
verified  to  the  same  high  degree,  and  thus  it  is  necessary  to  ensure  that 
these latter tasks cannot adversely affect the correct operation of the
high-criticality tasks through the presence of programming errors. A
powerful  mechanism  to  guarantee  the  separation of tasks  is  the  use  of 
"bounds"  checking  on  memory  references.  This  mechanism  allows  a  task 
program  to  write  only  into  the  area  of  memory  that  is  allocated  to  it. 
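The write-bounds mechanism can be sketched as follows. This is a minimal software model, not the SIFT hardware design: the class and method names are hypothetical, and the bounds are assumed to be a single inclusive address range per task. Reads are left unrestricted, since the text constrains only writes.

```python
class BoundsViolation(Exception):
    """Raised when a task tries to write outside its allocated area."""


class BoundedMemory:
    """Toy memory in which each task may write only within its own bounds."""

    def __init__(self, size):
        self.words = [0] * size
        self.bounds = {}  # task name -> (lower, upper), inclusive

    def allocate(self, task, lower, upper):
        self.bounds[task] = (lower, upper)

    def write(self, task, address, value):
        lower, upper = self.bounds[task]
        if not lower <= address <= upper:
            # The offending write is rejected before memory is modified.
            raise BoundsViolation(f"{task} may not write address {address}")
        self.words[address] = value

    def read(self, address):
        return self.words[address]
```

In this model a low-criticality task with a programming error can corrupt only its own area; an out-of-bounds write is rejected before it reaches the area of a high-criticality task.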
Various  known  address-control  mechanisms  should  be  incorporated 
in the processor. It may also be cost-effective to include some
redundancy within the memory. For example, a short data field may be attached
to  each  data  word  that  can  carry  some  identification  to  aid  in  program 
protection.  Such  identification  could  be  either  a  unique  program  label, 
or  perhaps,  a  label  indicating  the  level  of  certification  reached  by  a 
given  program. 
Such  appended  data  fields  are known as tags. The  extensive  use 
of  tags  has  been  advocated  by  several  authors  (Feustal),  e.g.,  for  secu- 
rity  enhancement  and  for  indicating  the  "type"  of  a  datum  (integer,  alpha- 
numeric, etc.). It is  employed  in  at  least  one  line  of  commercial  com- 
puters.  Decisions  about  tagging  properly  belong  to  processor  design. 
For the SIFT RAMs, the concept simply implies an additional word length of
from  three  to  ten  bits.  Logical  operations  on  the  tags  would  be  accom- 
plished  within  the  processors. 
The  second  class  of  memory  function  is  the  block-oriented  bulk 
memory.  This  memory  is not intended  for  direct  program  execution, so it 
need  not  be  associated  with  a  regular  processor.  Nevertheless,  in  order 
to  receive  data  for  recording,  it  must  actively  request it. Some  form 
of  processor  is  therefore  needed  to  provide  this  function,  as  well  as 
such  functions  as  block  address  interpretation  and  output.  Only  a  frac- 
tion  of  the  logical  capability of a  standard  SIFT  processor  would  be 
needed,  but  using  one  is  probably  the  most  cost-effective  approach. 
5. Fault Tolerance 
The  basic  SIFT  concept  assumes  the  use  of  modules  of  standard 
design  for  both  memories  and  processors.  The  primary  mechanism  for  fault 
tolerance  is  reconfiguration  over  modules,  but  the  use  of  some  fault- 
tolerance  mechanisms  within  modules  is  not  excluded,  and  may,  in  fact, 
be  cost  effective.  This  section  discusses  the  use  of  fault  tolerance 
within  SIFT  memories.  The  major  emphasis  is  on  the  primary  distributed 
memories. 
The five relevant issues in memory fault tolerance are:
• Diagnosis
• Error detection
• Error correction
• Reconfiguration
• The "unflexed fault tolerance circuit" problem.
These  issues  will  be  discussed  in  order. 
a. Diagnosis 
The  need  has  been  established  for  in-flight  diagnosis  in 
addition to preflight  diagnosis  for  the  Air  Transport  SIFT.  The  archi- 
tectural  issues  are (1) should  diagnostic  data  have  a  special  memory  port, 
or should it flow via the normal data port, and (2) are any special
logical capabilities needed within the memory devices or subsystems to aid in
diagnosis? 
With  regard  to  the  first  of  these, it is  clear  that  addi- 
tional  ports  would  introduce  sources  of  error  whose  control  might  be  very 
expensive  in  terms  of  added  system  fault-tolerance  mechanisms  and  increased 
complexity of reliability  analysis. All efforts  should  be  made  to  employ 
the  data  paths  and  processor  control  functions  used  for  actual  computa- 
tion. 
The  second  issue  is  difficult to address  in  the  absence  of 
particular  memory  designs. In general  it  is  desirable  to  avoid  special 
mechanisms,  both to reduce  the  number  of  fault  sources,  and  to  avoid 
special,  low-volume  production  runs  that  might  be  required  for  special 
features,  but  two  functions  may  justify  some  added  equipment: (1) marginal 
testing,  and (2) partitioning  of  memory  to  enhance  fault  location. 
We  deem  marginal  testing  to  be  potentially  a  very  valuable 
facility,  especially  because  it  may  help  to  uncover  incipient  faults,  and 
thus  give  time  for  reconfiguration  before  computing  errors  occur. The 
actual  benefit  for  contemporary  memory  systems  needs  to  be  ascertained 
prior to the  statement of engineering  specifications.  One  item  of  con- 
cern  is  that  the  control  of  such  marginal  states  needs  to  be  protected. 
On the one hand, the use of program control introduces new hardware and
software sources of error (in order to limit the damage due to such errors,
marginal  checking  should  be  controlled  independently  in  each  processor). 
On  the  other  hand,  it  would  be  an  unreasonable  burden  on  the  flight  crew 
to  have  the  control  completely  manual.  It  may  be  acceptable  to  use  pro- 
gram  control  together  with  a  crew-visible  indicator to indicate  the  appli- 
cation  of a  marginal  state.  This  would  tend  to  protect  against  a  stuck-on 
marginal  state. * 
The  use  of  internal  logic  to  enhance  fault  location  could 
be  beneficial  if  it  were  desired  to  use  internal  reconfiguration  for  fault 
tolerance  (e.g.,  by  modifying  memory  address-mapping).  Its  benefit  would 
be to accelerate diagnosis by isolating sections of memory. RAMs have
very  uniform  structures,  which  are  amenable to systematic  testing. It is 
therefore  not  clear  that  the  amount  of  the  acceleration  of  diagnosis 
would  justify  the  cost  and  fault-hazard of the  added  logic.  Some  further 
investigation of this  point  would  be  justified.  In  any  event,  since 
memory  diagnosis  is  not  feasible  for  recovery  of  high-criticality  faults, 
its  acceleration  is  a  relatively  minor  issue. 
- 
*The trade-offs discussed here also appear in the more general issue of
run-time  system  diagnostics,  since  any  diagnostic  mode  can  introduce 
some  sources of vulnerability  to  mission-directed  computation. 
b. Error Detection 
Error  detection in SIFT  memories  may  have  several  benefits. 
For  example, 
• As an adjunct to error correction, it can give
  warning of an incipient memory-unit failure.
• It can strengthen the decision of a programmed
  voting check; that is, if one version of an
  input datum disagrees with the one (or more)
  other versions, a memory-error indicator can
  confirm the identification of the faulty data
  source. If there is only one other version
  (i.e., if the redundancy is "dual"), then the
  faulty source may be identified without further
  diagnosis. This would tend to justify increased
  use of dual-mode redundancy.
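The second benefit can be sketched in a few lines. The sketch assumes each version of a datum arrives with a one-bit memory-error indicator from its source memory; the function name and the tuple representation are hypothetical.

```python
def resolve_dual(version_a, version_b):
    """Pick a trusted value from two redundant versions of a datum.

    Each version is a (value, memory_error_flag) pair. Returns the trusted
    value, or None if the disagreement cannot be resolved from the flags.
    """
    value_a, error_a = version_a
    value_b, error_b = version_b
    if value_a == value_b:
        return value_a          # versions agree; no decision needed
    if error_a and not error_b:
        return value_b          # indicator identifies source A as faulty
    if error_b and not error_a:
        return value_a          # indicator identifies source B as faulty
    return None                 # neither or both flagged: further diagnosis
```

Without the indicators, a disagreement between only two versions identifies no faulty source; with them, the common single-fault case is resolved immediately.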
Many  effective  techniques  are  known  for  error  detection in 
memories.  Several  important  examples  are: (1) generalized  parity-check 
codes for words of data, (2) special codes for numerical data, (3)
arithmetic sum-checks for blocks of data, and (4) address tags (for verifying
address-selection  logic  in RAMs). 
Parity-check  codes  are  very  cost-effective  for RAMs, and 
are  widely  used.  The  use  of  up  to  ten  percent  additional  number  of  bits 
is  probably  justified. 
The use of separate codes for numerical data is not justified.
They are inefficient for memories, and ineffective for
LSI-realized processors, especially in a voting and whole-processor
reconfiguration scheme such as SIFT.
The use of arithmetic sum-checks for data blocks appears
attractive for a serial mass memory, since it would tend to catch
shift-control faults (that would cause loss or duplication of characters). The
technique makes no additional contribution over word-parity checks for RAMs.
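A block sum-check of the kind described can be sketched as follows. The 16-bit check word, the function names, and the model of a shift-control fault (one character lost or duplicated, with the block read back at its original length) are assumptions made for illustration.

```python
MODULUS = 1 << 16  # assumed width of the check word


def sum_check(chars):
    """Arithmetic sum of the characters in a block, modulo the check width."""
    return sum(chars) % MODULUS


def write_block(chars):
    """Store a block with its sum-check appended as the last character."""
    return list(chars) + [sum_check(chars)]


def block_ok(received):
    """Recompute the sum over the data and compare with the stored check."""
    *chars, check = received
    return sum_check(chars) == check


def lose_char(stored, i, fill=0):
    """Model a shift fault that drops character i; a filler shifts in."""
    return stored[:i] + stored[i + 1:] + [fill]


def duplicate_char(stored, i):
    """Model a shift fault that repeats character i; the tail is pushed out."""
    return (stored[:i + 1] + stored[i:])[:len(stored)]
```

In both fault models every character keeps valid word parity, so only a check computed over the whole block has a chance of exposing the slip. A simple sum can still miss some cases, for example the loss of a zero character.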
The  use  of  redundant  address  tags on words  is  attractive, 
since  certain  faults  in  some RAM word-selection  logic-circuits  tend  to 
cause  selection  of  single  incorrect  words  (other  faults  may  cause  selec- 
tion  or  partial  selection  of  several  words).  Single-word  selection  errors 
would  not  be  detected  by  parity  checks. The cost  of  detection  would  de- 
pend on the  precision  of  the  redundant  address  information.  This  could 
range  from  one  bit  to  the  full  address.  The  value  of  this  technique 
should  be  estimated in the  context  of  particular  memory  system  designs. 
For  example,  the  word  selection  logic  may  be  such  that  most  failures  give 
either  non-selection  or  multiple-word  selection. 
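The trade-off between tag precision and coverage can be illustrated with a small model. It assumes the tag is simply the low-order bits of the word's own address, stored alongside the data; the tag width, names, and fault model (the selection logic returns a different single word) are hypothetical.

```python
TAG_BITS = 2                      # assumed tag precision
TAG_MASK = (1 << TAG_BITS) - 1


def store(memory, address, data):
    """Write data together with a tag derived from the word's own address."""
    memory[address] = (data, address & TAG_MASK)


def load(memory, requested, selected):
    """Read the word actually selected by (possibly faulty) selection logic.

    Returns (data, ok), where ok is False when the stored tag does not
    match the address that was requested.
    """
    data, tag = memory[selected]
    return data, tag == (requested & TAG_MASK)
```

With a two-bit tag, a fault that selects word 4 when word 5 was requested is detected, but one that selects word 5 when word 13 was requested is not, because the two addresses share the same low-order bits. A full-address tag would close that gap at the cost of a longer word.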
c. Error Correction 
The benefit of error correction for the data of SIFT
memories is that it may be a very cost-effective way to prolong the useful
life of by far the largest portion of SIFT hardware. It appears to be
unsurpassed as cost-effective protection against one or perhaps two
faults per memory unit. Furthermore, it is usually inexpensive to obtain
error detection of one unit more than the amount of error correction.
Thus, given the programmed voting check of SIFT, the occurrence of two
errors in a single-error-correction, double-error-detection memory unit
would still allow indication of which version of a dual-redundant
computation is correct.
While  the  use  of  error  detection  alone  is  potentially  very 
effective  in  SIFT,  as  discussed  previously,  the  added  cost  of  single-error- 
correction  with  double-error-detection  appears  to  be  highly  justified. 
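For readers unfamiliar with single-error-correction, double-error-detection coding, the following sketch shows one standard construction, an extended Hamming code. It is offered as an illustration of the principle, not as the code intended for SIFT memories; the bit-list representation and function names are assumptions.

```python
def encode(data_bits):
    """Encode a list of 0/1 data bits; index 0 holds the overall parity bit."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:   # number of Hamming check bits needed
        r += 1
    n = m + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):       # not a power of two: a data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i                # check bit p covers positions with bit p set
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code[1:]) % 2   # overall parity enables double detection
    return code


def decode(code):
    """Return (data_bits, status); data_bits is None if uncorrectable."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    odd_overall = sum(code) % 2 == 1
    word = list(code)
    if not odd_overall and syndrome == 0:
        status = "ok"
    elif odd_overall and syndrome <= n:
        word[syndrome] ^= 1       # syndrome 0 means the overall bit itself
        status = "corrected"
    else:
        return None, "detected"   # even parity with nonzero syndrome, etc.
    data = [word[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status
```

A single flipped bit is corrected transparently, while a double error yields the "detected" status. That status is the indicator that lets the voting check decide which version of a dual-redundant computation to trust.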
The  degree  of  redundancy  required  for  a  block-access  mass 
memory  will  depend  on  technological  factors  that  are  presently  unknown. 
Considering  the  relatively  low  criticality  of  the  data,  dual  redundancy 
may be satisfactory.  Since  the  memory  will  probably  have a module  real- 
ization,  it  may  be  that  dual  redundancy  might be effectively  applied  over 
storage  blocks  within a memory,  provided  that  the  driving  circuitry  can 
be protected. 
d. Reconfiguration 
The  major  value of memory  reconfiguration  is  to  deal  with 
multiple  faults,  since  error-correction  schemes  using  coding  are  almost 
always  more  cost-effective  for  one  or  two  faults.  Reconfiguration  may 
be  applied  at  the  level  of  a  block  of  words,  a  bit-plane  or  a  memory 
device. 
The  least  expensive  form  is  block  reconfiguration.  In  the 
current  SIFT  design,  software  memory  mapping  tables  are  employed  to  inter- 
pret  addresses  for  all  interprocessor  communication.  These  tables  greatly 
facilitate  program  relocation  among  processor-memory  units. It is a 
trivial step to assign values to the mapping tables so as to bypass any
contiguous block of words in a memory that has been determined to contain
faults. The major cost would seem to be the size of the program (and the
cost of its verification) needed to analyze the fault pattern and to
define the boundaries of the forbidden region.
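Block reconfiguration through a mapping table can be sketched as follows. The block size, the table layout (a list indexed by logical block number), and the function names are assumptions for illustration; the actual SIFT mapping tables are not specified here.

```python
BLOCK_WORDS = 256  # assumed reconfiguration granularity, in words


def build_block_map(num_physical_blocks, faulty_blocks):
    """Map logical block numbers onto the physical blocks that remain usable."""
    return [b for b in range(num_physical_blocks) if b not in faulty_blocks]


def translate(block_map, logical_address):
    """Translate a logical word address through the mapping table."""
    logical_block, offset = divmod(logical_address, BLOCK_WORDS)
    return block_map[logical_block] * BLOCK_WORDS + offset
```

For example, with eight physical blocks and block 1 marked faulty, logical address 300 (block 1, offset 44) translates to physical address 556 (block 2, offset 44). The usable capacity shrinks by one block, and a reference beyond the shortened table fails rather than reaching the forbidden region.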
Block  reconfiguration  is  effective  for  faults  in  memory 
cells  and  for  some  faults  in  word  selection  circuits,  but  it  is  not  effec- 
tive  for  faults  that  affect  an  entire  bit  circuit.  Such  faults  are  well 
covered by error-correcting codes, for one or two bit circuits per
memory. If a larger number of bit-circuit faults must be tolerated, some
form  of  switching  of  logical  bit-planes  may  be  effective.  Such  switch- 
ing  is  very  expensive  and  fault-prone.  Considering  the  numerous  other 
fault  tolerant  mechanisms  available  in  SIFT,  we  deem  it  not  to  be  a  cost 
effective  measure. 
A device-level  reconfiguration  scheme  has  been  described 
[Ref. 3] that is very cost-effective for large numbers of faults. In
order to employ  this  scheme  some  modifications  to  memory  chip  design  are 
needed,  together  with  a  rather  powerful  diagnosis  and  reconfiguration 
program.  The  initial  cost  of  the  scheme  appears  to  be  prohibitive  for 
the  present  application.  Its  greatest  value  is  for  very  long  unattended 
life. 
e. The "Unflexed  Fault  Tolerance  Circuit"  Problem 
The general "unflexed problem" is the problem of
determining that a functional element is operable prior to its use in a real-time
computation. This problem is especially serious for fault-tolerance
mechanisms.  In  a  straightforward  design,  some  failures  of  such  a  mecha- 
nism  will  be  observed  only  when  the  fault  condition it is designed  to 
treat  occurs. An apparent  dilemma  exists in that  if  a  fault  condition  is 
artificially  simulated so as to  test  the  mechanism,  some  action  (such  as 
program  reconfiguration)  may  be  initiated  that  could  harmfully  degrade 
performance;  therefore,  to  avoid  such  degradation,  the  consequent  activity 
must  be  inhibited.  Such  inhibition,  of  course,  may  also  be  a  source  of 
trouble,  and  a  complete  test  must  assure  that  the  inhibition  itself  can 
be  terminated. 
This  problem  applies  to  both  hardware  and  software  mecha- 
nisms. In the  case  of  memories,  the  controlled  flexing  of  error  detection 
and  correction  circuits  is  clearly  desirable.  One  attractive  way  to 
achieve  this  would  be  to  record  permanently  a  set  of  erroneously  encoded 
words in a ROM section of each memory. Such words could be read by a
diagnostic  program  which  would  be  designed  to  interpret  correctly  the 
outputs  of  the  error-detection  or  correction  circuits.  The  problem  of 
correctly  inhibiting  undesired  reconfiguration  would  be  passed  upward  to 
the  executive  program. 
This  scheme  avoids  the  need  for  special  circuitry to defeat 
the  normal  data  encoding  circuits.  Such  defeating  would  be  needed  in 
order  to  simulate  a  defective  memory. 
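The ROM-based flexing scheme can be sketched as follows, using a simple even-parity checker as the circuit under test. The word contents, the parity convention, and the function names are hypothetical; the point is only that the diagnostic program knows in advance which stored words should raise the error indication.

```python
def even_parity_ok(bits):
    """Detector under test: True when the word has even parity (no error)."""
    return sum(bits) % 2 == 0


# Words assumed to be recorded permanently in ROM, each paired with whether
# the detector is expected to flag it. The last bit is the parity bit.
ROM_FLEX_WORDS = (
    ((1, 0, 1, 1, 0, 0, 1, 0, 0), False),  # correctly encoded control word
    ((1, 0, 1, 1, 0, 0, 1, 0, 1), True),   # deliberately wrong parity
    ((0, 0, 0, 0, 0, 0, 0, 0, 1), True),   # deliberately wrong parity
)


def flex_detector(detector):
    """Return True if the detector responds as expected to every ROM word."""
    for bits, expect_error in ROM_FLEX_WORDS:
        flagged = not detector(bits)
        if flagged != expect_error:
            return False
    return True
```

Including a correctly encoded word alongside the erroneous ones lets the diagnostic catch a detector stuck in either state. Suppressing any reconfiguration triggered by the deliberate errors remains the responsibility of the executive, as noted above.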
6. Performance Specifications 
In  this  section  we  summarize  the  performance  required  for  SIFT 
memories. At this  stage  of  SIFT  development,  the  various  requirements 
have  different  degrees  of  certainty.  The  discussion  references  the 
sources of  the  requirements  and  indicates  the  issues  that  must be examined 
in  order  to  achieve  more  precise  values. 
a. Distributed-Processor Memories 
The  following  requirements  pertain  to  the  random-access 
memories  used  locally  to  each  SIFT  processor: 
• Word length
    Data field: 24 bits [Ref. 3]
    Tag field: 4 to 8 bits
      (length to be determined by analysis of higher-
      level mechanisms for correction of faults in
      memory addressing)
    Error-detection/correction field: 9 to 12 bits
      (assume 30 bits for data and tag) (length to
      be determined by the trade-off in cost and
      reliability between memory and coding-decoding
      logic)
• Maximum capacity: [64K-128K] words [Ref. 3]
    Actual values will vary with the application, and
    may be much less than the maximum. The expected
    value required to cover the entire set of air
    transport applications is [24K]. The maximum
    amount assumed will determine the address size
    required for processor design. The value of
    maximum capacity stated is consistent with trends
    in modern minicomputer technology and
    architecture. It is very conservative with respect to
    the full set of computations surveyed in
    Reference 3. It is conceivable that some new
    requirements may develop that greatly increase required
    system memory capacity, e.g., elaborate graphic
    displays. The SIFT architecture can be expanded
    (by at least a factor of three, and perhaps
    higher) by the addition of new processor-memory
    pairs, so as to meet a greatly increased
    requirement.
• Access modes: (1) whole word, read/write, random-
    access (access to subfields at the memory
    interface is an unnecessary feature and would greatly
    complicate error-detection/correction); (2) whole
    word, read-only, random-access (data are preset
    at manufacture). Assembly should allow easy
    change of ROM portion by maintenance personnel.
    Total amount of the ROM portion is expected to
    be less than 20% of the distributed-processor
    memory.
• Special features
    (1) Single error correction with double error
        detection, using encoding and decoding
        logic at the (word) data interface.
    (2) Possible use of programmed marginal
        checking of memory circuits.
• Interfaces
    (1) High-bandwidth interface to the local
        processor.
    (2) The interface to the bus system must
        incorporate a means to protect the memory from
        continuous accesses by a faulty bus.
        Section VI-B describes such an interface, which
        is identical to the means used by a bus to
        protect itself against repeated accesses by
        a faulty processor.
b. Block-Access Memory 
Requirements on the  block-access  memory  are  less  certain 
at  this  time  than  those on the  random-access  memories,  because  the  re- 
trieval  functions  are  less  critical. Also, some  relevant  architectural 
issues  remain  to  be  settled,  such  as  tolerance  to  transient  interference. 
The  key  characteristics  are  data  rate,  delay  in  accessing 
the  beginning  of  a  data  block,  and  capacity.  The  capacity  of  the  memory 
system  will  be  determined  by  numerous  critical  and  noncritical  data 
functions. A range of 10^6 to 10^8 bits appears likely. The data rate and
access delay will depend upon the critical functions. These appear to be
of two classes, i.e., temporary storage of input and output data, and
storage of duplicate copies of critical programs. A maximum recovery
time of 12 milliseconds is assumed. Separate block memories may be needed
for the two classes. The following estimates apply:
• Data rate: for temporary I/O: 10^6 bytes/sec;
    for program access: 0.5 × 10^6 bytes/sec
    (based on transfer of a block of 4K words,
    30 bits/word, in 50 ms).
• Access delay for a random block: less than 1 ms.
• Special features:
    (1) Fault tolerance, probably in the form of
        dual redundancy, with independent control
        of storage and retrieval. Possible use of
        longitudinal error-detecting codes.
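The basis given for the program-access rate can be checked with a short calculation. The sketch assumes that "4K" means 4096 words and that a byte is 8 bits; neither is stated explicitly in the text, and the function name is illustrative.

```python
def payload_rate_bytes_per_s(words, bits_per_word, seconds):
    """Raw payload rate needed to move a block in the given time."""
    return words * bits_per_word / 8 / seconds


# Block of 4K words, 30 bits per word, transferred in 50 ms.
rate = payload_rate_bytes_per_s(4 * 1024, 30, 0.050)
```

Under these assumptions the raw payload rate is about 0.31 × 10^6 bytes/sec, which lies within the 0.5 × 10^6 bytes/sec estimate. The difference may reflect allowance for overhead or rounding, but the text does not say.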
D.  Processors
The essential features that are required in a SIFT processor for
use in an advanced commercial transport are described in this section.
As the fault-tolerance of the complete system is achieved by software
aided by the overall system structure, there is little need for any
special fault-tolerance hardware in the processors themselves. Rather,
the requirements on the processor are confined to a small number of
critical features that enable the software to achieve the fault-tolerance.
The critical issues in the specification of the processors are:
• Interface to the bus system
• Interface to the memory
• Features to assist diagnosis in the processors
• Indirect and indexed addressing
• Interrupt system
• Internal clock
• Memory access bounds checking.
We address these points in the order listed.
The interface to the bus system is the most important issue in the
processor specification. It is this interface that enables the SIFT
system to achieve fault isolation and damage isolation between individual
processor-memory modules. The fault isolation is achieved by the fact
that the processors cannot write data onto other units, and the damage
isolation can be achieved by the use of appropriate circuit design
techniques such as the use of high impedance drive circuits. The
interface must be capable of transmitting a data-read request to the
appropriate bus and of transferring the accessed data back to the requesting
processor. A data-read request contains the following elements:
• Bus designator (max 3 bits)
• Processor designator (max 4 bits)
• Task designator (max 6 bits)
• Offset within task (max 14 bits).
The above estimates for the number of bits assume that the following
maxima are used in the SIFT specification: 8 busses, 16 processors, 64
tasks, and 16K words within a task. They also assume that a simple binary
code is used and that codes for error detection and correction are not
employed. The above control data must be communicated to the bus system.
The maximum of 27 bits of this data can be accommodated by filling a
register two words in length.* This implies the existence of this special
register attached to the conventional output unit of the processor. The
use of such a special register can enable conventional processors to be
used in this application with only the requirement of being able to load
a pair of external registers, as will be possible with all commonly
available processors. Following action by the bus system on the control
data, a word will be transferred back to the processor. This word of data
will be placed in an external register that can be read by the processor
in a conventional manner.
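
The packing of the four designator fields into a two-word register can
be sketched as follows. This is only an illustration: the report gives
the field widths (3, 4, 6, and 14 bits) and the 16-bit word length, but
the field order, the function names, and the choice of which word holds
the high-order bits are assumptions made here.

```python
# Field widths come from the element list above; the field ORDER is assumed.
FIELDS = (("bus", 3), ("processor", 4), ("task", 6), ("offset", 14))
WORD_BITS = 16  # word length assumed in the report's footnote


def pack_read_request(bus, processor, task, offset):
    """Pack the four designators into one 27-bit integer, bus field first."""
    values = {"bus": bus, "processor": processor, "task": task, "offset": offset}
    request = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        request = (request << width) | value
    return request


def to_register_words(request):
    """Split a packed request into the (high, low) words of a two-word register."""
    mask = (1 << WORD_BITS) - 1
    return (request >> WORD_BITS) & mask, request & mask


def unpack_read_request(high, low):
    """Recover the designators from the two register words."""
    request = (high << WORD_BITS) | low
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = request & ((1 << width) - 1)
        request >>= width
    return values


high, low = to_register_words(pack_read_request(bus=1, processor=2, task=3, offset=4))
print(unpack_read_request(high, low))
```

With every field at its maximum the packed value occupies exactly 27
bits, which is why two 16-bit words are sufficient.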
The interface to the memory units would be the conventional interface
as typically supplied with a processor, and no special requirements
arise in the case of the SIFT computer.
The use of special features to assist in the diagnosis of processors
and memories would lead to an opportunity for more powerful diagnosis
techniques. In the SIFT system we intend to base the diagnosis of
equipment on the behavioral characteristics of that equipment and thus we
see little need for special built-in test equipment (BITE) or its use in
built-in testing (BIT).
* Assuming a word length of 16 bits as discussed below.
The reading of data in one computing module by another demands that
indirect addressing be available, because the location of a word in one
memory unit is unknown to the reading unit. This implies that a table
of base addresses for data segments be kept in each memory and that the
access to a word in those memories would use indirect accessing to the
required word. The same comments apply also to the use of indexed access
to data. These remarks are also restrictions upon the specification of
memory units but are included here because the indirection and indexing
into memory is often carried out in the associated processor.
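
A minimal sketch of the base-address table idea is given below. The
class name, the method names, and the use of a Python dictionary for the
table are all hypothetical; the report specifies only that each memory
holds a table of base addresses and that a word is reached indirectly
through it.

```python
class MemoryUnit:
    """Toy memory unit holding a table of base addresses for task data segments."""

    def __init__(self, size):
        self.words = [0] * size
        self.base_table = {}  # task designator -> base address (local to this unit)

    def place_segment(self, task, base):
        """Record where this unit happens to keep the data segment of `task`."""
        self.base_table[task] = base

    def read(self, task, offset):
        """Serve a read request that names only a task and an offset."""
        base = self.base_table[task]      # indirection through the local table
        return self.words[base + offset]  # indexed access within the segment


# Two units keep the same task's data at different physical locations,
# yet a reader uses the same (task, offset) pair for both.
unit_a, unit_b = MemoryUnit(1024), MemoryUnit(1024)
unit_a.place_segment(task=5, base=100)
unit_b.place_segment(task=5, base=300)
unit_a.words[103] = 42
unit_b.words[303] = 42
print(unit_a.read(5, 3), unit_b.read(5, 3))
```

The reading unit never needs to know the physical address, which is the
property the paragraph above asks of indirect addressing.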
The provision of an interrupt system in the processor is required
for some of the fault-tolerance techniques that are part of the SIFT
design. Included in this are the interrupt for time-out when a task has
overrun the time allotted to it and the interrupt on the occurrence of a
clock tick. The latter occurs whenever a new task starts. No external
interrupts are envisioned for the SIFT system. An internal clock is
required to provide for the above-mentioned clock ticks at the start of
each task frame.
An important feature of the processor is the need to provide for
checking of the address in each task calculation to ensure that data of
other tasks is not corrupted. This can be accomplished by the use of
array bounds checking as in conventional data processors.
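
The kind of check meant here can be sketched in a few lines. The
function name, the exception type, and the half-open region convention
are assumptions for illustration; in a real processor the comparison
would be done by hardware or by compiled-in checks on each store.

```python
class BoundsViolation(Exception):
    """Raised when a task addresses memory outside its own region."""


def checked_store(memory, lower, upper, address, value):
    """Store `value` only if lower <= address < upper (the task's own region)."""
    if not lower <= address < upper:
        raise BoundsViolation(f"address {address} outside [{lower}, {upper})")
    memory[address] = value


memory = [0] * 8
checked_store(memory, 2, 4, 3, 9)      # inside the task's region: accepted
try:
    checked_store(memory, 2, 4, 5, 1)  # would overwrite another task's data
except BoundsViolation as err:
    print("rejected:", err)
```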
Within the limits of the above, there are few requirements on the
processor beyond the speed requirement that the processor be capable of
operating at an instruction execution rate of approximately 0.5 MIPS.
This is a relatively modest requirement compared to modern minicomputers,
but is currently beyond that achievable by present-day LSI microcomputers,
although it is expected that by the mid-1980s this speed will be
achievable by such units.
E. Power Supply System
We are concerned in this section with reliability aspects of the
power supply system for SIFT. Two issues are important: protection
against damage propagation, and maintaining adequate power supply to SIFT
in the event of individual power source failures.
The primary power sources used on modern civilian commercial aircraft
vary with each aircraft type. For example, the DC-10 and the 747 both
use a three-phase 400-cycle alternator driven by each jet engine. The
DC-10 has three jet engines and three alternators while the 747 has four.
In both aircraft the power generated by the alternators is rectified,
regulated, and fed to a common 28-volt bus, which in turn supplies current
to the 28-volt battery bank.
The alternators are used in a feedback system in which the regulators
monitor the input voltage (and current) and feed back a proportionate
amount to the alternator field, thus controlling the alternator output
voltage. Figure VI-13 depicts such a single alternator system. Figure VI-14
shows the power sources on a DC-10 aircraft with three primary and two
auxiliary alternators. One of these auxiliary units is driven by a
turbine for use when the aircraft is on the ground and the main engines are
at idle or when engines are being used to taxi the aircraft, as the
varying engine speeds used during taxi operation would cause unacceptable
voltage fluctuations. The second auxiliary unit on this aircraft is
driven by the flow from an external air scoop when the aircraft is in
flight. It is strictly for emergency purposes in the event of failure of
the main alternators.
The 747 aircraft system is similar but has four main engine-driven
alternators and two auxiliary gas-turbine-driven alternators. Each of
these aircraft primary power systems is a good example of parallel
redundancy. Under emergency operation, any one alternator feeding the
battery bank could supply adequate charging current for a long enough
period for safe completion of the flight.
Primary power systems of this nature provide a parallel redundant
system in which the main alternators are all on line and contributing to
the common load. Up to the limit of the storage capacity of the battery
bank, the excess power is being stored. The regulators are typically
adjusted so that each alternator is contributing an approximately equal
amount to the common load. When the battery pack is fully charged, the
regulators furnish less field and the alternator is allowed to idle.
Failure of an alternator can be caused either by failure of the
driving engine or by a failure of the alternator itself. In either case
the removal of that alternator from the system is effected by the series
combination of (1) the rectifier diode and (2) the regulator, before
manual (switching) intervention is effected. The remaining alternators
automatically pick up the extra load which was being carried by the
failed unit in addition to their own previous load.
FIGURE VI-13 TYPICAL AIRCRAFT ALTERNATOR/REGULATOR SYSTEM
(Block labels: comparing amplifier, rectifiers, diode, alternator field,
3-phase 400-cycle alternator.)
FIGURE VI-14 SCHEMATIC OF DC-10 POWER SYSTEM
(Block labels: main jet engine-driven alternators, auxiliary alternator,
emergency alternator, rectifiers, switching, main battery bus, main
batteries, to load switching panel.)
Failure of any single diode by open-circuit condition will not
necessarily cause a system failure. The ripple will increase and the
rectifier output voltage will drop, resulting in increased field excitation
requirement to the alternator. The unit failures are normally detected
and corrected by the flight engineer. A shorted diode would be more
serious and could cause a failure in the unit either by burnout of the
phase winding or damage to the regulator circuit. (A simple fuse could
be inserted in each phase leg to ensure an open circuit instead of a short
and thus prevent component damage.)
One suggested protection device is an overvoltage protection device,
Figure VI-15. This device would be located at the load input. The
circuit has the ability of fast action in the event that a regulator
failure resulted in an overvoltage from the associated alternator. In
that case the device would turn on the SCR, thus blowing the fuse
and removing the offending system. The remaining alternators would be
unaffected except that they would be required to pick up the additional
load.
The availability of the five generating systems plus the main battery
bank provides a parallel redundant system which has proved capable of
failure-free operation for the time periods normally required in
commercial flights. The attention of a flight engineer to monitor the
system and make the necessary switching decisions is mandatory in present
failure situations.
Consider a fault-tolerant computer power system in which the flight
engineer does not play such a part. The use of multiple processors would
allow us to take advantage of the redundancy of both the power sources
and the processors. Figure VI-16 shows such a system applied to the
five power sources available on a DC-10 aircraft, for the case of a SIFT
system with six processors. Figure VI-17 is a simplification in which
the Xs indicate that a connection is made, i.e., that the processor is
obtaining power from the associated source.
FIGURE VI-15 OVERVOLTAGE PROTECTION CIRCUIT
(Diagram labels: fuse, to protected load.)
Assume a failure  of  main  alternator No. 1 in  this  system.  Proces- 
sors A ,  C, D, and F would  each  lose  one of their  three  power  sources. 
Two  sources  would  remain  to  sustain  operation.  The  case  is  similar  for 
loss of alternators 2 or 3. In each instance, two power sources remain.
In multiple failure cases, four processors are obtaining power from the
essential battery bus.
We conclude that a reliable power supply for the SIFT system can be
designed by using two techniques:
• Incorporate protective devices in the local power system
controllers so as to provide protection against damage
propagation.
• Use an interconnection scheme between power sources and
processors so that failure of up to two power sources does
not affect the SIFT system and failure of three power
sources causes failure of at most one processor/memory
module.
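
The second technique can be checked mechanically for any proposed
connection pattern. The sketch below enumerates every combination of
failed sources and counts the processors left without power. The
connection table is hypothetical: it gives each processor three sources
and agrees with the statements above (alternator 1 feeds A, C, D, and F;
four processors use the essential bus), but it is not claimed to be the
pattern of Figure VI-17. The source names and function names are also
assumptions.

```python
from itertools import combinations

SOURCES = ("ALT1", "ALT2", "ALT3", "EMERGENCY", "ESSENTIAL_BUS")

# Hypothetical pattern: three sources per processor, no two processors
# sharing the same set of three.
CONNECTIONS = {
    "A": {"ALT1", "ALT2", "ESSENTIAL_BUS"},
    "B": {"ALT2", "ALT3", "ESSENTIAL_BUS"},
    "C": {"ALT1", "ALT3", "ESSENTIAL_BUS"},
    "D": {"ALT1", "ALT2", "ALT3"},
    "E": {"ALT2", "EMERGENCY", "ESSENTIAL_BUS"},
    "F": {"ALT1", "ALT3", "EMERGENCY"},
}


def unpowered(failed_sources):
    """Processors whose every source is in the failed set."""
    failed = set(failed_sources)
    return [p for p, feeds in CONNECTIONS.items() if feeds <= failed]


def worst_case_losses(n_failed):
    """Largest number of processors lost over all sets of n_failed sources."""
    return max(len(unpowered(combo)) for combo in combinations(SOURCES, n_failed))


for n in (1, 2, 3):
    print(n, "failed sources -> at most", worst_case_losses(n), "processor(s) lost")
```

The property follows from two facts about the table: every processor
has three sources, so two failures always leave at least one, and no two
processors share the same three sources, so three failures can remove at
most one processor.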
FIGURE VI-16 CONNECTION BETWEEN POWER SOURCES AND PROCESSORS
(Diagram labels: power sources, including the emergency source;
processors A through F.)
FIGURE VI-17 POWER SOURCES AND PROCESSORS CONNECTION PATTERN
(Matrix layout: columns are processors A, B, C, D, E, F; rows are power
sources 1, 2, 3, emergency, and essential bus.)
REFERENCES 
1. V. E. Benes, "Mathematical Theory of Connecting Networks and
Telephone Traffic" (Academic Press, New York, 1965).
2. A. Waksman, "A Permutation Network," Journal Assoc. Comp. Mach.,
Volume 15, No. 1, pp. 159-163 (1968).
3. J. Goldberg, K. N. Levitt, and J. H. Wensley, "An Organization for
a Highly Survivable Memory," IEEE Transactions on Computers,
pp. 693-705 (July 1974).
VII RELIABILITY ANALYSES
A. Summary 
Several models of the SIFT system are examined to determine failure
probabilities and failure rates under circumstances where random permanent
faults or transient errors interfere with normal operation. The system
is modeled by a time-homogeneous Markov process, and analytical techniques
are developed for convenient solution of the associated state graphs.
Principal parameters of the analysis are the numbers of working
processor/memory and bus units remaining at a particular time, their
respective failure rates, and the length of time involved in reconfiguring
the system on detection of error. Each model is implemented as a FORTRAN
program from which tabulated values of failure probabilities may be
calculated for any desired values of relevant model parameters. The most
significant conclusions of the study are
• Assuming typical failure rates for electronic components
and an acceptable total system failure rate of $10^{-9}$/hr
(as recommended by the FAA), a SIFT system composed of
five or more processors and four or more busses should
adequately meet the reliability requirements.
• System survival is limited by the "weaker" collection
of the two types of units (processors or busses). That
is, if there are too few processors available at a given
time, it does not improve system reliability to have a
large number of available busses, and conversely.
• For either permanent or random transient faults, system
performance is not significantly degraded by reconfiguration
times an order of magnitude greater than the expected
value of a few milliseconds.
• Uncorrelated transient errors significantly decrease
system reliability only if their rate of occurrence is
comparable with that of permanent failure.
B. Motivation
According to an old anecdote, one electrical instrument manufacturer
during the mid-1930s used to test his products for reliability by kicking
them down a flight of stairs. Apparently the procedure was effective,
since his equipment enjoyed the reputation of extreme ruggedness. The
story illustrates an early and not very scientific application of
destructive testing to reveal weaknesses in design concepts. Destructive
testing is still widely used and is particularly effective in the areas of
mechanical and structural engineering where failure rates may be
accelerated in well-understood ways by applying excessive loads, stresses,
heat, etc. Another way of obtaining information about failure rates is by
life testing many similar units under normal operating conditions. This
policy is the one more usually employed when it is unclear how to
accelerate component failure in a predictable manner.
For the estimation of the reliability of SIFT, neither of the above
methods of testing can be usefully applied to the total system for two
reasons. First, it is not clear how to "abuse" the system in such a way
that a predictably increased rate of failure would occur. Second, the
design-goal failure rate for the total system is so low (~$10^{-9}$/hour)
that actual life testing would be prohibitively slow and expensive.
As an alternative approach, one can decompose the SIFT system into
parts each of which has known reliability properties or is susceptible
to separate reliability analysis.
Then, if the interaction between these parts can be accurately described,
one can make confident predictions about the behavior of the system as a
whole. Except for a choice of analytic techniques, this partitioning
scheme is the essence of a reliability model, and our conclusions will
depend on the accuracy of our knowledge about the behavior of its
component parts. What we hope to learn from the reliability model will be
outlined in the following section.
C. The Reliability Model 
We deal  with  a  physical  system (a collection  of  hardware  and  pro- 
grams)  and  a  complicated  set  of  possible  "events."  For  convenience  one 
can  distinguish  between 
• Normal events; e.g., initiation and termination of scheduled
programs, changes in flight phase, pilot intervention, etc.
• Abnormal events; e.g., hardware failure or transient errors.
The  partitioning of the  system  could  be  carried  out  to  any  level of 
detail,  but  for  purposes  of  the  following  analysis we distinguish  proces- 
sors  (with  their  associated  memories)  and  individual  communication  busses 
as  component  parts  subject  to  separate  reliability  analysis.  For  these 
system-hardware  units  good  reliability  estimates  exist,  based  on  total 
count  of  active  devices  and  much  experience  with  similar  electronic 
equipment. For example, the failure rate of a typical processor in the
SIFT system is estimated with some confidence as about $10^{-4}$ per hour,
while the failure rate of a bus unit is estimated to be $10^{-5}$ per hour
on the basis of a component count of approximately 10% that of a processor.
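
To give a feel for these rates, the sketch below converts a constant
failure rate into the probability that a single unit fails during one
flight. The two rates are the estimates quoted above; the 10-hour flight
length and the function name are assumptions made only for this
illustration.

```python
import math

PROCESSOR_RATE = 1e-4  # failures per hour (estimate quoted in the text)
BUS_RATE = 1e-5        # failures per hour (estimate quoted in the text)
FLIGHT_HOURS = 10.0    # assumed flight length, not taken from the report


def failure_probability(rate_per_hour, hours):
    """Probability that a unit with a constant failure rate fails within `hours`."""
    # -expm1(-x) equals 1 - exp(-x) but remains accurate when x is very small.
    return -math.expm1(-rate_per_hour * hours)


p_processor = failure_probability(PROCESSOR_RATE, FLIGHT_HOURS)
p_bus = failure_probability(BUS_RATE, FLIGHT_HOURS)
print(f"processor: {p_processor:.3e}  bus: {p_bus:.3e}")
```

Under these assumptions a single processor fails in roughly one flight
in a thousand, many orders of magnitude short of the system design goal,
which is why the analysis concentrates on redundant configurations.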
As  part of the  SIFT  system,  programs  may  also  be  partitioned  into 
subsets  for  reliability  analysis.  The  main  interaction  between  programs 
and  hardware  from a reliability  standpoint  concerns  the  duration  and 
criticality of the programs. The role of program criticality is
discussed elsewhere in this report; however, it is clear that short,
rapidly executed programs have less likelihood of being disturbed by
transients or onset of hardware failure. In particular, reliability
estimates turn out to be sensitive to the execution time of a program
necessary for recovery or reconfiguration of the system.
We intend to model the functioning of the SIFT system by a finite
set of distinct "states" with transitions between states occurring in
response to particular events. A state of the system can represent any
condition that seems important to consider in the reliability analysis.
For example, a specific state of the model may represent the combined
conditions that one processor has failed and that a reconfiguration
program is currently being executed on a different processor. Of the events
that cause transitions between states of the model (particularly abnormal
events), many will occur at random times. We want to analyze the model to
determine the probabilities that the system will be found in a designated
state at or before or after a particular time. In particular we are
interested in the probability that the system will have reached the FAIL
state before the mission time has elapsed.
In addition to the estimate of system-failure probability the above
model yields several other useful types of information.
• The probability of reaching conditions (states) that would
require special actions to be taken, for example, the abandonment
of some noncritical tasks or some form of pilot intervention.
• Differential failure rates. That is, the failure rate of the
system after it has been in operation for, say, n hours. This
is important in establishing a policy for equipment maintenance
and replacement.
• Mean time between failures (MTBF) for the system or parts of
it. This is informative if uses are contemplated where no
maintenance is feasible, e.g., an extended mission in space.
D. Analytical Techniques
The most general formulation of the reliability model that we use
consists of a directed graph whose vertices correspond to states of the
system, while edges correspond to possible transitions. In this case
each transition has an associated rate that may be time dependent and
also history dependent. That is, the transition rate of a particular
edge might depend upon the manner in which the attached state-vertex
was reached. To analyze such a system requires the solution of a
system of nonlinear differential equations for which routine analytical
methods do not exist. Consequently, with this type of model, recourse
would have to be made to numerical integration programs, and to the
computation of many particular cases to determine how solutions behave
with respect to different parameters of the model.
Fortunately, we will not need such complete generality in dealing
with the SIFT reliability models. One simplification is that we will
always be able to choose the states of the model in such a way that the
probability of occupying a state is independent of how the state was
reached.  For  example,  if  we  designate  one  state  of  the  model  to  repre- 
sent the condition  that  two  processors  have  failed,  it  should  be  imma- 
terial  which  one  failed  first  (history  independence).  Another  simplifi- 
cation  occurs  because  most of the  transitions  of  our  models  are  in  re- 
sponse  to  "abnormal"  events of stochastic  nature,  like  component  failure, 
whose  failure  rates  are  constant  (time  independence).  Note  that  this 
assumption is approximately  true  for  semiconductor  devices  but  would  not 
be  true  for  say  the  clutch  in  an  automobile  which  wears  out  with  con- 
tinued  use. 
With  both  of  the  foregoing  simplifications,  our  reliability  model 
falls  into  the  category  of  a  finite-state,  continuous-time,  simple 
Markov  process.  For  such  Markov  processes,  there  are  elegant  and 
powerful  methods  of  obtaining  complete  closed-form  solutions.  When 
transition  rates  are  time  dependent  (but  not  history  dependent), as will 
be the case if a transition depends on, say, the fixed execution time of a
program,  then  the  model  corresponds  to  a  semi-Markov  process. 
Here  a  variety of solution  techniques  have  been  suggested  in  the 
literature.  For  our  purpose  it  seems  very  desirable  to  retain  the 
simplicity of the  pure  Markov  model  and  to  have  a  uniform  analytic 
procedure  that  can  be  applied  in  all  cases.  For  this  reason,  the 
method  we  favor  is  to  approximate  the  behavior  of  time-dependent  state- 
transitions  with  a  collection  of  redundant  states  having  constant 
transition-rates,  and  whose  collective  behavior  simulates  the  desired 
time  dependence.  This  artifice  has  been  called  "the  method of stages" 
and  a  discussion  of  its  use  may  be  found  in  Reference 1. For  those 
unfamiliar  with  Markov  processes  a  short  description  with  applications 
to  the  modeling  problem  is  presented in the  appendix  to  this  report. 
The  appendix  also  illustrates  the  method  of  stages  as  applied  to  the 
modeling of  a  fixed-duration  event  and  a  transient  event. 
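
One common form of the method of stages, assumed here for illustration,
replaces a fixed-duration event by a chain of identical stages, each
with a constant transition rate chosen so that the mean total time
equals the fixed duration. The sketch below (function and parameter
names are hypothetical) shows how the completion probability sharpens
toward a step at the fixed duration as the number of stages grows.

```python
import math


def stage_completion_probability(t, duration, stages):
    """P(event finished by time t) when a fixed `duration` is modeled as
    `stages` constant-rate stages in series, each with rate stages/duration."""
    x = stages * t / duration
    term = math.exp(-x)      # probability that zero stages have completed
    still_running = 0.0
    for m in range(stages):  # add P(exactly m stages completed), m < stages
        still_running += term
        term *= x / (m + 1)
    return 1.0 - still_running


for k in (1, 5, 50):
    early = stage_completion_probability(0.5, 1.0, k)  # halfway through
    late = stage_completion_probability(1.5, 1.0, k)   # 50 percent overrun
    print(f"stages={k:2d}  P(done by 0.5T)={early:.4f}  P(done by 1.5T)={late:.4f}")
```

With a single stage the event is simply exponential; with many stages
almost no probability of completion remains before the fixed duration
and almost all of it has arrived shortly after, at the cost of adding
that many redundant states to the model.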
We  wish  to  find  the  most  convenient  and  potentially  useful  way of
obtaining  the  desired  insight  and  the  actual  numerical  results  from  a 
given  SIFT  model.  The  Markov-process  description  of  the  model  yields  a 
system of linear,  first-order,  constant-coefficient  differential  equations 
that  completely  determine  all of the  state-transition  probabilities  as  a 
function of time. In the usual formulation, these relations are called
the Chapman-Kolmogorov differential equations. Since we are more
interested in the possibility of occupation of particular states of the
model, we will consider a slightly different set of differential
equations that relate occupancy probabilities rather than transition
probabilities. The system of equations is
\[ \frac{dP_q(t)}{dt} = \sum_i a_i P_i(t) - P_q(t) \sum_j b_j \qquad (1) \]
where $P_q(t)$ is the probability of occupying state q at time t, and
$a_i$, $b_j$ are the (constant) transition rates associated with edges
entering state q from state i and leaving state q to state j,
respectively (for a derivation of these relations see the appendix).
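
As a hedged illustration of how such occupancy equations behave, the
sketch below integrates them numerically for a small hypothetical model:
two identical units with failure rate r, no repair and no
reconfiguration, so that the states are "2" (both working), "1" (one
working), and "FAIL". The state names, the rate value, and the use of a
fourth-order Runge-Kutta step are assumptions for this example and are
not the procedure used in the report.

```python
import math


def occupancy_derivatives(p, rates):
    """Time derivative of each occupancy probability.

    p     -- dict mapping state name to occupancy probability
    rates -- dict mapping (source, destination) to a constant transition rate
    """
    dp = {state: 0.0 for state in p}
    for (src, dst), rate in rates.items():
        flow = rate * p[src]
        dp[src] -= flow  # probability leaving the source state
        dp[dst] += flow  # probability entering the destination state
    return dp


def rk4_step(p, rates, h):
    """Advance the occupancy probabilities by one Runge-Kutta step of size h."""
    def shifted(base, slope, scale):
        return {s: base[s] + scale * slope[s] for s in base}

    k1 = occupancy_derivatives(p, rates)
    k2 = occupancy_derivatives(shifted(p, k1, h / 2), rates)
    k3 = occupancy_derivatives(shifted(p, k2, h / 2), rates)
    k4 = occupancy_derivatives(shifted(p, k3, h), rates)
    return {s: p[s] + h / 6 * (k1[s] + 2 * k2[s] + 2 * k3[s] + k4[s]) for s in p}


def integrate(p0, rates, t_end, steps):
    """Integrate from time 0 to t_end using `steps` equal steps."""
    p = dict(p0)
    h = t_end / steps
    for _ in range(steps):
        p = rk4_step(p, rates, h)
    return p


r = 1e-4  # assumed failure rate per hour for each unit
rates = {("2", "1"): 2 * r, ("1", "FAIL"): r}
p = integrate({"2": 1.0, "1": 0.0, "FAIL": 0.0}, rates, t_end=1000.0, steps=1000)
print(p["FAIL"], (1 - math.exp(-r * 1000.0)) ** 2)
```

For this particular chain the exact answer is available for comparison:
the probability that both independent units have failed by time t is the
square of the single-unit failure probability.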
As is well known, the solution of a system of equations like the
above set can be carried out by a purely mechanical process. Given an
initial vector of occupation probabilities, say $P_1(0) = 1$ and
$P_q(0) = 0$ for all other states, then each $P_q(t)$ can be expressed
as the sum
\[ P_q(t) = \sum_{i=1}^{n} A_i e^{\lambda_i t} \qquad (2) \]
where n is the number of states in the model, $A_i$ are numerical
coefficients that depend only on the initial probabilities,
$P_1(0) \ldots P_n(0)$, and $\lambda_i$ are the eigenvalues of the matrix
of coefficients of $P_i$ associated with the system of equations given by
(1). Trivial complications occur if all of the eigenvalues are not
distinct, in which case terms of the form
\[ t^{m} e^{\lambda_i t}, \qquad m = 1, \ldots, k-1 \]
where k is the multiplicity of an eigenvalue, need to be included.
14 4 
In principle one can rely on standard routines for finding eigenvalues
and eigenvectors of a finite matrix to solve any particular reliability
model. One important problem, however, is that expressions like that of
equation (2) can lose almost all of their computational accuracy when
very low failure-rates are involved. To illustrate this point consider
the very over-simplified model of a SIFT system shown in Figure VII-1.
FIGURE VII-1 A SIMPLIFIED  SIFT  MODEL 
Here we depict a system that begins in state 3 with three active
processors whose individual failure-rates are r (per hour). This model
shows a rate of "decay" 3r into the situation where two processors
survive, followed by a rate of failure 2r into a state where one
processor is left, followed by a rate of failure r into a failed state F.
Suppose we are interested in the probability of being in state 1, with
one processor still surviving. Using standard numeric techniques as
described above we could obtain the following expression for this
probability,

$$P_1(t) = 3e^{-rt} - 6e^{-2rt} + 3e^{-3rt} \qquad (3)$$

where the above coefficients might not actually be exact due to round-off
errors. Now if we attempt to evaluate this expression for small rt the
answer may be almost meaningless unless high-precision arithmetic is
employed throughout all of the calculations. The reason is that each of
the above exponential terms has a value nearly equal to unity. The value
of $P_1(t)$ for small rt is actually close to $3(rt)^2$, but this fact
might not be evident from an imprecise evaluation of (3), particularly
for very small values of rt where much of the significant behavior of
the SIFT system occurs.
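The loss of accuracy is easy to demonstrate in ordinary double-precision arithmetic. The following sketch is an illustration added here, not part of the report's programs: it compares a direct evaluation of the three-exponential expression with the algebraically identical factored form 3e^{-rt}(1 - e^{-rt})^2, in which 1 - e^{-rt} is computed with `math.expm1` so that no nearly equal quantities are subtracted. The function names and the choice rt = 1E-9 are assumptions.

```python
import math

def p1_naive(rt):
    """Direct evaluation of 3e^{-rt} - 6e^{-2rt} + 3e^{-3rt}."""
    return 3 * math.exp(-rt) - 6 * math.exp(-2 * rt) + 3 * math.exp(-3 * rt)

def p1_stable(rt):
    """Factored form 3 e^{-rt} (1 - e^{-rt})^2, with 1 - e^{-rt} via expm1."""
    q = -math.expm1(-rt)          # 1 - e^{-rt} without cancellation
    return 3 * math.exp(-rt) * q * q

rt = 1e-9
naive = p1_naive(rt)
stable = p1_stable(rt)
leading = 3 * rt * rt             # the small-rt behavior 3(rt)^2

print("naive  :", naive)
print("stable :", stable)
print("3(rt)^2:", leading)
```

The direct evaluation can only return zero or a value on the order of the rounding granularity of numbers near 3 (about 1E-16), whereas the true value is near 3E-18, so its result carries no correct digits; the factored form agrees with 3(rt)^2 to many digits.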
A second  problem  involved  in  using  automatic  numeric  solution  tech- 
niques  of  the  eigenvalue-eigenvector  type  is  that  the  representation  of 
a  solution  in a form  involving  computed  numeric  coefficients  conceals  the 
way that  different  parameters of the  model  interact.  Thus,  it  is  more  dif- 
ficult  to  determine  how  particular  parameters  are  affecting  the  behavior 
of the  model  than  it  would  be  if  explicit  parameterized  expressions  for 
the  state  occupation  probabilities  were  available. 
For these (and other) reasons we prefer to use Laplace Transform
techniques in dealing with the system of equations (1). Using this
approach one can reduce the solution of the system to algebraic
manipulation of polynomials in a transform variable s, that is related to
the probabilities $P_i(t)$ by the transformation

$$\hat{P}_i(s) = \int_0^{\infty} P_i(t)\, e^{-st}\, dt \qquad (4)$$

When the transformation is applied to (1), one obtains a set of linear
equations of the form

$$s\hat{P}_q(s) - P_q(0) = \sum_i a_i \hat{P}_i(s) - \hat{P}_q(s) \sum_j b_j \qquad (5)$$

or

$$(s + B)\,\hat{P}_q(s) - \sum_i a_i \hat{P}_i(s) = P_q(0) \qquad (6)$$

where B stands for the sum of the rates $b_j$ of edges emanating from
state q. The solutions of this system are simple ratios of polynomials
in s which can be inverted by standard methods to obtain the
corresponding time-dependent solutions for each of the state-occupancy
probabilities $P_i(t)$.
Using this method on the diagram of Figure VII-1 one obtains

$$\hat{P}_1(s) = \frac{6r^2}{(s+3r)(s+2r)(s+r)} \qquad (7)$$

from which one can immediately deduce the exact solution already given
in (3). However, the expression (7) can also be seen by inspection to
have approximate value $6r^2/s^3$ for sufficiently large s. So, from
a knowledge that $t^n$ transforms to $n!/s^{n+1}$ one can also conclude
by inspection that $P_1(t)$ behaves like $3(rt)^2$ as $t \to 0$. Several
simple but useful relations of this type are mentioned in the appendix.
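For a simple chain of states with distinct exit rates, the inversion step amounts to a partial-fraction expansion, and the small-t behavior follows from the leading power of 1/s. The sketch below illustrates both for the three-processor chain; it uses exact rational arithmetic so the coefficients come out without round-off. The function names are hypothetical, and the code handles only an unbranched chain with distinct rates, which is an assumption of this example rather than a limitation of the method.

```python
from fractions import Fraction
from math import factorial

def chain_coefficients(exit_rates):
    """Partial-fraction coefficients for the last state of a chain.

    exit_rates[k] is the total exit rate of the k-th state along the
    chain. The transform of the last state's occupancy is
        prod(exit_rates[:-1]) / prod(s + rate for rate in exit_rates),
    which inverts to sum(A_k * exp(-exit_rates[k] * t)).
    All rates must be distinct.
    """
    numerator = Fraction(1)
    for rate in exit_rates[:-1]:
        numerator *= rate
    coefficients = []
    for k, lam_k in enumerate(exit_rates):
        denominator = Fraction(1)
        for j, lam_j in enumerate(exit_rates):
            if j != k:
                denominator *= (lam_j - lam_k)
        coefficients.append(numerator / denominator)
    return coefficients

def small_t_leading_term(exit_rates):
    """Leading behavior c * t**m obtained from numerator / s**(m+1)."""
    m = len(exit_rates) - 1
    numerator = Fraction(1)
    for rate in exit_rates[:-1]:
        numerator *= rate
    return numerator / factorial(m), m

# Rates expressed in units of r: states 3, 2, 1 exit at 3r, 2r, r.
rates = [Fraction(3), Fraction(2), Fraction(1)]
coeffs = chain_coefficients(rates)
lead = small_t_leading_term(rates)
print(coeffs)   # coefficients of exp(-3rt), exp(-2rt), exp(-rt)
print(lead)     # (coefficient, power) of the small-t term in units of rt
```

The coefficients are those of the three-exponential expression for state 1, and the leading term is the 3(rt)^2 behavior read off by inspection in the text.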
E. Models and Programs 
In this  section  we  will  discuss  four  reliability  models of the  SIFT 
system  and  describe  computational  programs  based  on  these  models. A 
primary  assumption  is  that  the  principal  failure  modes of  a  model  corre- 
spond  to  transient or  permanent,faults  occurring  either  in  a  processor/ 
memory  unit or in  a  communication  bus  unit.  The  modeling so far  has  not 
considered  a  further  subdivision of the  system  components,  since  strate- 
gies  for  making  use  of  "partially  failed"  devices  have  not  been  considered 
in  any  detail.  We  also  assume  that  there  is  no  difference  in  criticality 
to  the  system  in  the  failure  of  any  particular  processor  or  bus.  This 
assumption  may  not  be  strictly  true  since  a  processor  that  is  executing 
an executive  program  may  be  more  critical  to  survival  than  one  employed 
in some  inessential  task.  By  choosing  to  ignore  the  latter  possibility 
we  obtain  results  that  are on the  pessimistic or "safe"  side of the  actual 
reliability  situation. 
The  main  parameters of the  models  considered  here  are  the  number  of 
processor/memory units and number of buses available at the start of a
mission, their respective permanent-failure rates, and mission time. Two
other  parameters  of  importance  are  the  expected  time  to  reconfigure  the 
system  after  detection of  a  fault  and  a  measure of the  expected  degree of 
success  in  diagnosing  a  transient  fault  and  restoring  the  system  to  oper- 
ation  without loss of  equipment.  For  each  model  we  assume  that  a  failed 
state has been reached if fewer than two processors or buses survive.
This  would  correspond  to  a  situation  in  which  voting  among  remaining  hard- 
ware  units  would  no  longer  be  meaningful. 
The  models  listed  below  were  selected  with  the  intent of discovering 
the  effects of differing  failure  modes  separately  and  in  combination. 
Model I: Permanent faults, instantaneous reconfiguration.
Model II: Permanent faults, finite reconfiguration time.
Model III: Transient faults alone.
Model IV: Transient and permanent faults with finite
reconfiguration time.
For each of the above models the corresponding Markov-process state-
graph was analyzed using Laplace Transform methods to obtain closed form
solutions (or approximate solutions) for the probability of reaching the
failed state as a function of mission time. These analytical expressions
are used by several small interactive FORTRAN programs to print tables of
failure probabilities for any desired values of the model parameters. A
sample output giving the failure probabilities for a SIFT system composed
from 10 or fewer processors and 7 or fewer buses under Model I
assumptions is shown in Table VII-1. Here the sample output is
parameterized with failure rates of $10^{-4}$/hr and $10^{-5}$/hr for
processor and bus failure rates respectively, while the mission time was
taken as 10 hours. These figures are interactively supplied by the
program user.
Consistency between different models may be checked by running their
corresponding programs with parameter values that cause one case to
degenerate into another. For example Model II, for reconfiguration
time = 0, gives the same results as Model I, and Model IV with 100%
probability of transient recovery gives the same results as Model II.
For convenience, we have also modified the Model II and Model IV
programs to provide differential probabilities of failure, that is, the
probability of failure during the next one-hour period of a system that
started with n processors and m buses, as a function of mission time.
A table of failure rates for the particular reconfiguration time of one
second as computed by Model IIA is shown in Table VII-2. This
information is important if one must make policy decisions concerning
when replacement or maintenance of defective units should occur. For
example, Table VII-2 shows that with the assumed initial configuration
several 10-hour missions could safely be undertaken without
between-flight maintenance. According to the table, after 60 hours of
operation, the expected hourly failure rate is still only
$5 \times 10^{-10}$, which is within the nominal acceptance value of
$10^{-9}$.
Table VII-1
SAMPLE OUTPUT

MODEL I. *
PROCESSOR FAILURE RATE: 1E-4
BUS UNITS FAILURE RATE: 1E-5
MISSION TIME, IN HOURS: 10
FAILURE PROBABILITY TABLE:

PRO/BUS       2          3          4          5          6          7
    2     2.20E-03   2.00E-03   2.00E-03   2.00E-03   2.00E-03   2.00E-03
    3     2.03E-04   3.02E-06   3.00E-06   3.00E-06   3.00E-06   3.00E-06
    4     2.00E-04   3.40E-08   4.00E-09   3.99E-09   3.99E-09   3.99E-09
    5     2.00E-04   3.00E-08   8.99E-12   4.99E-12   4.99E-12   4.99E-12
    6     2.00E-04   3.00E-08   4.01E-12   6.48E-15   5.98E-15   5.98E-15
    7     2.00E-04   3.00E-08   4.00E-12   5.07E-16   7.03E-18   6.97E-18
    8     2.00E-04   3.00E-08   4.00E-12   5.00E-16   6.79E-20   7.97E-21
    9     2.00E-04   3.00E-08   4.00E-12   5.00E-16   6.00E-20   1.60E-23
   10     2.00E-04   3.00E-08   4.00E-12   5.00E-16   6.00E-20   7.01E-24
Table VII-2
FAILURE RATES FOR RECONFIGURATION TIME
OF ONE SECOND AS COMPUTED BY MODEL IIA

MODEL II-A. *
PROCESSOR FAILURE RATE: 1E-4
BUS UNITS FAILURE RATE: 1E-5
NO. PROCESSORS: 5
NO. BUSSES: 4
RECONFIG. TIME, IN SEC: 1
FAILURE PROBABILITY TABLE:

TIME, HR.   FAIL. PROB.   HOURLY RATE
     1       5.59E-11      5.59E-11
     2       1.12E-10      5.59E-11
     4       2.24E-10      5.61E-11
     6       3.37E-10      5.65E-11
     8       4.51E-10      5.72E-11
    10       5.67E-10      5.80E-11
    12       6.87E-10      6.04E-11
    16       9.42E-10      6.60E-11
    20       1.23E-09      7.??E-11
    40       3.75E-09      1.96E-10
    60       1.06E-08      5.10E-10
    80       2.65E-08      1.11E-09
   100       5.81E-08      2.08E-09
   200       7.99E-07      1.5?E-08
   400       1.17E-05      1.13E-07

* DATA EXTRACTED FROM MODEL II.
F. Computational Results and Interpretations
The reliability models referred to above yield quantitative data on
the separate and combined effects of finite reconfiguration time and
imperfect transient recovery on a SIFT system that also suffers
spontaneous permanent faults. Model I depicts the most idealized (and
most optimistic) situation in which only permanent faults are
considered. Perfect reconfiguration strategies and transient recovery
schemes cannot improve the reliability estimates of Model I unless
"salvage" of working parts of the system at a level smaller than a
processor/memory or bus-unit is feasible. We have not considered the
latter possibility in this analysis.
A state-diagram for Model I is shown in Figure VII-2. Starting
with n processors and m busses in the initial state, we assume constant
failure rates of p and q for each of these devices respectively. It is
assumed that a failed state will have been reached only if fewer than
two processors or busses survive after a mission time of duration T.

FIGURE VII-2 MODEL I STATE-DIAGRAM
In the rectangular array of states representing the surviving
numbers of processors and busses respectively, note that the
failure-rates associated with each exit arrow are proportional to the
number of remaining units of the same type. The states designated F
represent failure.
Observe that the model corresponds to one in which processor and
bus failure are considered to be independent and uncorrelated events.
Therefore an analysis of failure probability can be carried out without
"solving" for each state occupancy probability. However, we retain the
generality of the full Markov-process representation because we might
wish in some later analysis to consider some special action to be taken
in one of the particular states of Figure VII-2. In each of the
subsequent models to be discussed, a similar state graph is assumed,
except that the transitions between states are complicated by the
insertion of intermediate states that represent the effects of
reconfiguration delays or transient-fault recovery.
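The independence shortcut can be sketched in a few lines. The code below is an illustration under stated assumptions, not the report's FORTRAN program or its closed-form expressions: it treats each unit as failing independently with a constant rate, declares failure when fewer than two processors or fewer than two buses survive, and combines the two groups as independent events. The function names are hypothetical.

```python
import math

def fewer_than_two_survive(n, rate, hours):
    """Probability that fewer than two of n independent units, each with
    constant failure rate `rate` per hour, still work after `hours`."""
    f = -math.expm1(-rate * hours)       # probability that one unit failed
    # Either all n failed, or exactly one unit survives.
    return f ** n + n * f ** (n - 1) * (1.0 - f)

def model1_failure(n_proc, n_bus, p, q, hours):
    """Failure probability when processors and buses fail independently."""
    a = fewer_than_two_survive(n_proc, p, hours)
    b = fewer_than_two_survive(n_bus, q, hours)
    return a + b - a * b                 # 1 - (1 - a)(1 - b), no cancellation

# Parameter values quoted in the text: p = 1E-4, q = 1E-5, T = 10 hours.
for n_proc, n_bus in [(3, 2), (4, 4), (5, 4)]:
    prob = model1_failure(n_proc, n_bus, 1e-4, 1e-5, 10.0)
    print(n_proc, n_bus, "%.2E" % prob)
```

For the configurations shown, the results agree with the corresponding entries of Table VII-1 to within about one percent. Writing the combination as a + b - ab, and using `math.expm1` for the single-unit failure probability, keeps the very small probabilities from being lost to round-off.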
An analysis of Model I yields closed-form expressions for failure
probabilities parameterized in terms of processor/bus failure rates and
mission time as shown in Table VII-1, where $p = 10^{-4}$, $q = 10^{-5}$
and T = 10 hours. This tabulation shows the intuitively expected result
that survival probability for a system having few processors or few
busses is not significantly improved by having a large number of the
other type of unit available.
To illustrate, referring to Table VII-1, we see that with an initial
configuration employing four processors the system failure probabilities
are essentially identical if four or more busses are available initially.
Similarly if only three busses are initially available then it does no
good to have more than five processors. The system reliability can be
said to be processor-limited or bus-limited. It therefore makes sense
to think of "enough" busses for a given number of processors (and
conversely).
The calculated values of Table VII-1 are typical. In this prototype
case, a SIFT system composed of four processors and four buses is seen
to have a 10-hour mission probability of failure of $4 \times 10^{-9}$.
On a per-hour failure rate basis this would be $4 \times 10^{-10}$,
which is within the FAA acceptable limit of $1 \times 10^{-9}$/hour. The
situation is not improved by adding another bus, which would change the
failure rate to $3.99 \times 10^{-10}$. However, the addition of another
processor (making the count 5 and 4) leads to a failure rate of about
$10^{-12}$, which again is not significantly improved by adding yet more
processors.
For this model, the significant features can be summarized by the
graph of Figure VII-3, which shows system-failure probability plotted
against mission time for various numbers of processors and buses. In
particular, the point corresponding to a mission time of 100 hours and
a failure probability equivalent to $10^{-9}$/hour is seen to be barely
achievable with 6 processors and 4 buses; the need is for nearly 7 and 5
respectively.

FIGURE VII-3 MODEL I BEHAVIOR
(Failure probability versus mission time in hours; P = number of
processors.)
In Model II we consider the effect that a finite reconfiguration
time has in degrading the performance of Model I. Here it is assumed
that upon detecting the presence of an error in a processor or bus some
diagnostic and/or reconfiguration programs of fixed durations must run
before safe system operations can be resumed. If another processor or
bus failure occurs during this short interval of time it could cause
serious problems in recovery. For the purpose of Model II we consider
the latter event to be fatal. Therefore, Model II presents a rather
pessimistic interpretation of failure probabilities due to nearly
simultaneous permanent faults. The state-diagram of Figure VII-2 is also
appropriate to this case if each transition is reinterpreted to contain
a delay-state of fixed duration and a direct entry to the failed state
as shown in Figure VII-4.

FIGURE VII-4 MODEL II STATE-DIAGRAM

The fixed delay states labeled $\tau$ in Figure VII-4 are handled in the
manner described in the appendix to this report. Actually, we obtain
exact closed-form expressions for the various state-occupancy
probabilities just as was done for Model I.
A typical output from the Model II program is shown in Table VII-3.
Here again, we have taken processor and bus failure rates of $10^{-4}$
and $10^{-5}$ as reasonable values with a mission time of 10 hours. The
value of $\tau$, the reconfiguration time, stated as 10 seconds is a
gross overestimate of the probable time for such a procedure, which
might take at the most perhaps 100 ms. We take the larger value of
$\tau$ to illustrate the effects of the finite reconfiguration time on
the results of Model I.
From an inspection of Table VII-3 it can be seen that in each row
and column corresponding to a fixed number of processors (buses), the
system failure probabilities first diminish and then increase with
larger numbers of buses (processors). This result can be understood by
considering our previous assumption that a system failure would occur if
any pair of functional units failed within time $\tau$. Obviously, this
possibility increases with the number of active processor/bus units
involved. However, the distribution of tasks over many processors would
allow voting over the results of several units instead of, say, three,
so that some double failures might in fact be tolerated. This means that
the results of Model II may be considered to be pessimistic. Figure VII-5
shows how failure probability varies with extended mission times with
reconfiguration time as a parameter. The curves represent an initial
configuration of five processors and 4 buses.
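The minimum in failure probability can be reproduced with a rough first-order estimate. The sketch below is not the exact closed-form solution used by the Model II program; it simply adds to an independence-based Model I estimate the probability that a second fault arrives within the reconfiguration time of a first one. Counting only pairs of the same unit type is an assumption made here because it tracks the tabulated values, and all function names are hypothetical.

```python
import math

def fewer_than_two_survive(n, rate, hours):
    """Probability that fewer than two of n independent units survive."""
    f = -math.expm1(-rate * hours)
    return f ** n + n * f ** (n - 1) * (1.0 - f)

def model1_failure(n_proc, n_bus, p, q, hours):
    """Independence-based estimate with instantaneous reconfiguration."""
    a = fewer_than_two_survive(n_proc, p, hours)
    b = fewer_than_two_survive(n_bus, q, hours)
    return a + b - a * b

def model2_estimate(n_proc, n_bus, p, q, hours, tau_seconds):
    """Add a first-order allowance for a second same-type fault arriving
    within tau of the first (assumed fatal, as in the text)."""
    tau = tau_seconds / 3600.0                      # seconds to hours
    # First fault at rate n*rate, second within tau with prob (n-1)*rate*tau.
    pair_rate = (n_proc * (n_proc - 1) * p * p
                 + n_bus * (n_bus - 1) * q * q) * tau
    return model1_failure(n_proc, n_bus, p, q, hours) + pair_rate * hours

# Parameters of the sample run: p = 1E-4, q = 1E-5, T = 10 h, tau = 10 s.
estimates = {n: model2_estimate(n, 4, 1e-4, 1e-5, 10.0, 10.0)
             for n in range(2, 11)}
for n, value in estimates.items():
    print(n, "%.2E" % value)
best_n = min(estimates, key=estimates.get)
print("fewest failures with", best_n, "processors and 4 buses")
```

With four buses the estimate falls as processors are added up to five and rises afterwards, because the pair term grows as n(n-1) while the Model I term shrinks rapidly. For the configurations checked, the estimate stays within about two percent of the four-bus column of Table VII-3.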
The dashed line representing zero reconfiguration time corresponds
to data obtained from Model I. The effects of a finite reconfiguration
time are most apparent for short mission times where failure probability
would have been very low according to the assumptions of Model I.
For very long mission times failure probability is limited by the
numbers of available processors and buses and becomes independent of the
value of $\tau$. In the range of mission times near 10 hours, there is
an increase of failure probability of about one order of magnitude for
each order of magnitude increase in $\tau$. It should be noted that for
about a 10 hour mission and a reconfiguration time of 10 seconds, the
failure probability is about $5 \times 10^{-9}$, or $5 \times 10^{-10}$
on a per-hour basis.
Table VII-3
TYPICAL OUTPUT FROM MODEL II PROGRAM

MODEL II. *
PROCESSOR FAILURE RATE: 1E-4
BUS UNITS FAILURE RATE: 1E-5
MISSION TIME, IN HOURS: 10
RECONFIG. TIME, IN SEC: 10
FAILURE PROBABILITY TABLE:

PRO/BUS       2          3          4          5          6          7
    2     2.20E-03   2.00E-03   2.00E-03   2.00E-03   2.00E-03   2.00E-03
    3     2.03E-04   3.03E-06   3.00E-06   3.00E-06   3.00E-06   3.00E-06
    4     2.00E-04   3.72E-08   7.36E-09   7.38E-09   7.40E-09   7.44E-09
    5     2.00E-04   3.56E-08   5.59E-09   5.61E-09   5.64E-09   5.67E-09
    6     2.00E-04   3.83E-08   8.36E-09   8.38E-09   8.41E-09   8.44E-09
    7     2.00E-04   4.17E-08   1.17E-08   1.17E-08   1.17E-08   1.18E-08
    8     2.00E-04   4.56E-08   1.56E-08   1.56E-08   1.56E-08   1.5?E-08
    9     2.00E-04   5.00E-08   2.00E-08   2.00E-08   2.01E-08   2.01E-08
   10     2.00E-04   5.50E-08   2.50E-08   2.50E-08   2.5?E-08   2.51E-08

* FINITE RECONFIGURATION TIME.
FIGURE VII-5 MODEL II BEHAVIOR
(Failure probability versus mission time in hours, with reconfiguration
time $\tau$ in seconds as a parameter; processors = 5, busses = 4.)
Therefore, 10 seconds would be an acceptable reconfiguration time for
this combination of processors and buses. Actually we have estimated
that reconfiguration should take no more than a few milliseconds, so that
a failure probability about 100 times smaller than the above figure would
be expected in this case.
Model III was designed to measure the effects of imperfect recovery
from transient errors. As such, it does not provide a realistic model of
SIFT since it assumes that no permanent faults occur. The usefulness of
Model III was mainly to have an independent computational check on the
results of Model IV, which includes Model III as the special case where
both processor and bus permanent failure rates have value zero.
Model IV is a superposition of Models I, II, and III. It attempts
to account for the effects of permanent faults (randomly occurring),
finite reconfiguration time and transient recovery. The state transitions
associated with one node of the state diagram are shown in Figure VII-6.

FIGURE VII-6 MODEL IV STATE-DIAGRAM
* Designates probabilities (not rates).
For the typical node of the state-transition graph of Model IV
there are these possibilities:
a. A transient error may occur in a processor or in a bus, each with
   its own average rate (written $r_b$ for a bus), causing a transition
   to one of the two states labeled R in Figure VII-6. Following this
   occurrence we assume that detection and correction succeed with
   probability $p_{tr}^*$, resulting in a return to the original state.
   Otherwise, we conclude that a permanent fault has occurred and
   proceed to a state with one less processor or bus in the active
   system.
b. A permanent fault may occur with average rates p or q (as in
   Model II) causing a transition to one of the states having one less
   processor or bus if reconfiguration is successful. If two permanent
   faults occur within time $\tau$, the FAIL state is entered.
Since Model IV has seven parameters (two permanent failure rates, for
processors and buses respectively; two corresponding transient failure
rates; reconfiguration time; transient recovery probability; and mission
time), it is hard to present a comprehensive picture of how all these
parameters interact. The best way to explore this seven-dimensional
space (nine dimensions if processor and bus counts are included) is to
run the interactive Model IV FORTRAN program using parameter values near
the region of interest. We have already focused attention on one set of
parameter values that seem to be reasonable or typical for the proposed
SIFT system, i.e., $p = 10^{-4}$, $q = 10^{-5}$, T = 10 hours,
$\tau$ = 100 ms, m = 5 processors, n = 4 buses. To see how transient
errors can affect failure probabilities we may assume the above values
and compute Model IV for various values of transient recovery
probability and transient error rates. Results corresponding to one
choice of recovery probability (.9) and transient error rate
($10^{-5}$) are shown in Table VII-4.
A composite graph showing the effect of different assumptions about
recovery rate and transient error rate is shown in Figure VII-7. It is
observed that the effects of transients are rather closely approximated
by simply adding to the processor and bus permanent failure rates a rate
equal to $(1 - p_{tr}^*)$ times the corresponding transient failure rate.
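The approximation just described is simple to apply. The sketch below folds unrecovered transients into an effective permanent rate and then feeds that rate into an independence-based failure estimate that ignores reconfiguration delay. It is an illustration only: the function names are hypothetical, the delay-free estimate is an assumption of this example, and the report's Model IV program solves the full state graph instead.

```python
import math

def effective_rate(permanent, transient, p_recover):
    """Permanent rate plus the share of transients that is not recovered."""
    return permanent + (1.0 - p_recover) * transient

def fewer_than_two_survive(n, rate, hours):
    """Probability that fewer than two of n independent units survive."""
    f = -math.expm1(-rate * hours)
    return f ** n + n * f ** (n - 1) * (1.0 - f)

def failure_estimate(n_proc, n_bus, p, q, hours):
    """Independence-based estimate, reconfiguration delay neglected."""
    a = fewer_than_two_survive(n_proc, p, hours)
    b = fewer_than_two_survive(n_bus, q, hours)
    return a + b - a * b

# Sample-run parameters: permanent rates 1E-4 and 1E-5, transient rates
# 1E-5 for both unit types, recovery probability 0.9, mission 10 hours.
p_eff = effective_rate(1e-4, 1e-5, 0.9)
q_eff = effective_rate(1e-5, 1e-5, 0.9)
small = failure_estimate(2, 2, p_eff, q_eff, 10.0)
medium = failure_estimate(3, 3, p_eff, q_eff, 10.0)
print("%.2E %.2E" % (small, medium))
```

For small configurations, where the reconfiguration delay contributes little, these estimates fall within about one percent of the corresponding entries of Table VII-4. With a recovery probability of 1 the effective rates reduce to the permanent rates, which matches the degenerate case used for consistency checks between models.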
In Figure VII-7 the curve labeled A represents the situation in which
transient recovery is successful 100% of the time, i.e., $p_{tr}^* = 1$.
Curve B represents a case where transient error rates are assumed to be
identical with permanent fault rates and the probability of recovery is
zero. Thus, for any recovery probability between 0 and 1 the failure
probability of the system lies between the curves A and B, and very
close to A for high recovery probabilities. The curve D represents zero
probability of recovery from transients occurring at 10 times the rate of
Table VII-4
TRANSIENT RECOVERY PROBABILITY AND TRANSIENT ERROR RATES

MODEL IV. *
PROCESSOR PERMANENT FAILURE RATE: 1E-4
PROCESSOR TRANSIENT FAILURE RATE: 1E-5
BUS UNITS PERMANENT FAILURE RATE: 1E-5
BUS UNITS TRANSIENT FAILURE RATE: 1E-5
MISSION TIME, HOURS: 10
RECONFIG. TIME, IN SEC: 0.1
RECOVERY PROBABILITY: 0.9
FAILURE PROBABILITY TABLE:

PRO/BUS       2          3          4          5          6          7
    2     2.24E-03   2.02E-03   2.02E-03   2.02E-03   2.02E-03   2.02E-03
    3     2.23E-04   3.09E-06   3.06E-06   3.06E-06   3.06E-06   3.06E-06
    4     2.20E-04   4.04E-08   4.15E-09   4.15E-09   4.15E-09   4.15E-09
    5     2.20E-04   3.64E-08   6.75E-11   6.25E-11   6.28E-11   6.32E-11
    6     2.20E-04   3.64E-08   9.07E-11   8.56E-11   8.59E-11   8.63E-11
    7     2.20E-04   3.64E-08   1.25E-10   1.20E-10   1.20E-10   1.20E-10
    8     2.20E-04   3.65E-08   1.6?E-10   1.59E-10   1.60E-10   1.60E-10
    9     2.20E-04   3.65E-08   2.10E-10   2.04E-10   2.05E-10   2.05E-10
   10     2.20E-04   3.65E-08   2.60E-10   2.55E-10   2.56E-10   2.56E-10

* RECONFIGURATION WITH TRANSIENT RECOVERY.
  (UNCORRELATED BETWEEN DEVICES)
FIGURE VII-7 MODEL IV BEHAVIOR
(Failure probability versus mission time in hours; processors = 4,
busses = 5, reconfiguration time = 100 ms.)
permanent faults. Finally, curve C represents a 70 percent probability
of recovering from transient errors also occurring at 10 times the
permanent failure rate.
We do not presently have reliable data on the expected rates for
transient faults, but the rates covered in Figure VII-7 are probably
higher than those that would occur in practice. Observe that even with
the high rates of $10^{-3}$ and $10^{-4}$ for processor and bus
transient errors, the system failure rate for a 10 hour mission is about
$3 \times 10^{-8}$. This is well within the nominal $10^{-9}$ per hour
failure rate considered satisfactory by the FAA.
REFERENCE
1. D. R. Cox and H. D. Miller, The Theory of Stochastic Processes,
Wiley (1965), pp. 187-189.
VIII THE HIERARCHICAL DESIGN METHODOLOGY
This section was previously issued as Technical Memo No. 6. It is
a tutorial description of the software design methodology that is being
employed in the design of the SIFT computer and other software systems.
This particular approach is an instance and an extension of what has come
to be called "structured programming." It has been developed in its
present form mainly at SRI for the creation of large and complex
programs, including operating systems.
Th i s  new methodology a p p e a r s  t o  have  cons ide rab le  gene ra l i t y ,  bu t  
f o r  t h i s  i n t r o d u c t o r y  d e s c r i p t i o n  o n l y  t h o s e  aspects t h a t  are r e l e v a n t  t o  
t h e  SIFT  computer  design are  covered in  d e t a i l .  Emphasis i s  placed  on 
c o n c e p t s  t h a t  are  e i ther  fundamenta l  bu t  unfami l ia r ,  o r  are e s p e c i a l l y  
c r i t i c a l  i n  the SIFT  computer. 

The new methodology claims several significant advantages over
conventional software design techniques, namely:
   • The costs of program production are reduced.
   • The final program is more amenable to formal proof of
     correctness than a program developed on an ad hoc basis.
   • The reliability of the resultant program is improved.
   • The program is specified in a way that enhances its
     understandability.
   • The program is flexible to future design modifications.
   • The program is created in such a way that it may be
     augmented to have additional special properties--e.g.,
        - Security provisions can be added to prevent unauthorized
          access, leakage or modification of information.
        - The operational reliability can be assessed
          quantitatively.
Most of these features are important design objectives of the SIFT
computer.

Large and difficult problems of any type--basic research, engineering
design, product development, and product organization--are invariably
handled by some sort of decomposition of the large problem into several
smaller ones. Of the many possible decompositions that could be used,
one is normally selected for which three conditions are satisfied:
   (1) Each smaller problem is well defined, so that there
       is no ambiguity about whether a proposed solution is
       really a solution.
   (2) The decomposition is complete--that is, solutions to
       all of the smaller problems will be sufficient to
       solve the large one.
   (3) The smaller problems are easier to solve than the large
       one.
The small problems can be further decomposed according to the same
conditions. This decomposition may be repeated as often as necessary until
only readily resolved questions, tests, measurements or experiments
remain. If the three conditions are satisfied at each step, then one may
be sure that the decomposition process will terminate, and that the
original large problem has a solution when the final residual problems are
solved.

These conditions for the iterability of a decomposition are well
known, but are stated here in this particular way because of their direct
relevance to the creation of complex software systems--a relevance that
has resisted formal treatment during the past two decades of development
of computer programming. These three conditions, when properly restated
in formal terms, are sufficient to insure that large and very intricate
programs may be created by repeated decompositions, as finely as desired,
so as to achieve the advantages listed above.

The concepts that are central to an understanding of this type of
program decomposition will be introduced by extension from two familiar
engineering design problems. These examples utilize decompositions so
natural and familiar that the iterability conditions are not usually dealt
with explicitly.

The first point to be made is that the results of a decomposition
may be abstract rather than concrete.

Consider  the  design  of  a  large  unit of engineering  hardware  such as 
an  automobile  or  a  spacecraft. For an automobile  a  traditional  approach 
is  effective:  the  overall  problem  is  decomposed  into  portions  correspond- 
ing  to  the  various  physical  parts  of  the  vehicle--engine,  steering,  brakes, 
body,  electrical  system,  and so on.  Each of these  portions  can  be  simi- 
larly  decomposed,  and so on.  Specifications  are  written  for  each  portion 
to  permit  the  various  portions  to  be  designed  independently, with the 
assurance  that  they  will  all  fit  together  in  the  final  assembly. 
For  spacecraft,  however,  it  has  proven  more  effective  to  make  the 
initial  decomposition on the  basis  of  function--e.g.,  attitude  control, 
propulsion,  scientific  experiments,  communications,  etc.,  as  suggested  by 
Figure  VIII-1.  Each  such  function  may  but  need  not  correspond  to  a  par- 
ticular  unit  of  hardware. Finer decompositions  are  made in terms  of  func- 
tion  and/or  hardware. A given  unit  of  hardware  may  be  used  to  provide 
several  functions  or  subfunctions. The first  two  iterability  conditions 
are  satisfied  at  each  step  by (1) describing  each  function  or  unit  by  a 
specification  that  prescribes  its  behavior  completely,  without  getting 
involved in how this behavior is implemented; and (2) showing how the
overall  design  specifications  will  be  met if only  the  various  functions 
or  units  meet  their  own  specifications. 
In addition,  if  each  step  involves  a  simplification  of  function 
[Condition (3)], the entire iteration will converge into an effective
design. For example,  the  communication  function  might  be  broken  down 
into  data  acquisition,  data  reduction  and  data  storage,  followed  by  a 
transmitter and an antenna system. Data acquisition could be further
decomposed into sub-functions such as scanning, multiplexing, A/D
conversion, etc.; and so on. As noted, some subfunctions will be executable on
common  subunits of hardware,  such as  an onboard  computer  provided  to  im- 
plement  not  only  data  acquisition  and  reduction,  but  data  storage,  por- 
tions of attitude  control  and  the  scientific  experiments  as  well. 
This  example  should  illustrate  clearly  that  design  decompositions, 
including full specifications, need not be made in terms of hardware, but
that  abstract  nonphysical  concepts  such  as  "function"  can  be  used  as  well. 
FIGURE VIII-1 DECOMPOSITION IN TERMS OF FUNCTION 
A second  illustration  is  provided  by  the  set  of  subroutines  that  are 
commonly  employed  in  the  writing of a  program.  Assuming  that  no  recursion 
is  allowed,  the  "calling"  operation  can  effectively  organize  these  sub- 
routines  into  a  hierarchy  with  the  main  program  (or  programs)  at  the  top 
(Figure VIII-2). This  hierarchy  represents an iterated  decomposition  of 
the  original  program  (problem)  into  a  succession of successively  smaller 
programs. One may even suppose that the simplest subroutines are
themselves decomposed into sequences of instructions taken from a common
programming language at the lowest level. The syntax of conventional
programming normally forces the three iterability conditions to be
satisfied automatically.
FIGURE VIII-2 ILLUSTRATION OF A  FUNCTIONAL  HIERARCHY 
This second example illustrates how a programming hierarchy can be
set up. It is desirable that this hierarchy possess certain features, the
most important of which is the independence of the data handled by a
subroutine from direct manipulation by those higher-level subroutines that
may call the subroutine in question. It would be preferred if the data
structures used by any one subroutine (or procedure) could be intimately
associated with the operations of that subroutine, so that all accesses
and changes to the data must pass through the subroutine itself. While
these kinds of constraints may be incorporated into a conventional
program in any specific instance, what is really needed is a methodology
that automatically controls the allowed ranges of operations on stored
data and programming states when creating programs in general.

This requirement may be satisfied inherently by placing the data
structures to be manipulated by the program in the same decomposition
hierarchy as is used for the subroutines themselves. If this is done
properly, the specifications for a particular function in the hierarchy
can be written to provide strict control over the corresponding elements
of data. As a result, each subroutine will be self-contained and fully
specifiable with respect to both the operations it performs and the
corresponding data structures. Note here that, just as the fundamental
instructions for the hardware correspond to the lowest levels in the
hierarchy, so also are the fundamental elements of data located at this
level. As one moves upward through the hierarchy, data structures composed
from these data elements may become larger and more intricate, just as the
functions performed upon them become more complex and powerful.

A type of program hierarchy will now be described that satisfies
this additional data-structuring requirement in an effective and practical
way.

The fundamental element in the new hierarchy will be designated a
module, a term due to Parnas (who is also responsible for other ideas in
structured programming). A typical program hierarchy has the same form
as that illustrated in Figure VIII-2. Each module, depicted by a circle
in the figure, is related to certain other modules below it by a
dependency relation, which is indicated by an arrow and will be defined
subsequently. The uppermost module(s) represents the user program(s), and
the lowermost modules represent the minimal level of implementation--e.g.,
computer instructions or hardware.

In essence, a module consists of a collection of data structures and
a collection of operations on these data structures. For example, a
module called MATRIX might maintain n × n matrices of real numbers and
functions for inversion, transposition, element change, access, etc. A
module called STACK might be used to push and pop characters on the top of
a "stack" of stored characters.
By  virtue  of  its  data  structures,  a  module  may  be  said  to  possess  a 
storage  state  which  will  change  from  time  to  time  as  operations  are  per- 
formed  in  the  module. To specify  a  module  completely,  it  is  first  con- 
venient  to  define  its  data  structures  by  declarations  of  variables,  pa- 
rameters,  etc., in  a  conventional  way,  plus  a  set  of  value-functions 
called  V-functions.  These  V-functions  collectively  and  completely  de- 
scribe  the  storage  state of the  module,  though  without  presuming  any  par- 
ticular  configuration  of  the  data  elements  in  a  physical  or  other  form. 
For example,  it  is  of  no  concern  at  this  point  whether  the  characters  in 
a STACK  module  are  stored in the  form of a  bidirectional  shift  register, 
an array  with  a  pointer, or as  a  linked  list;  the  only  property  of  inter- 
est  is  that  the  characters  entered  by  push  be  returned  by pop in inverse 
order  of  entry. In this  case,  the  set  of  V-functions  describes  the  set 
of all  past  characters  pushed  in  that  have  not  yet  been  popped. 
For the second part of a module specification, the operations
performable in the module are described by a set of operation-functions or
O-functions. These O-functions are expressed in terms of the effects
they have on the set of V-functions of the module. That is, each
O-function describes in V-function terms how a prior storage state is
transformed into a new storage state. Again, no presumptions are made
here as to how an O-function is to be realized or implemented in terms
of simpler constructs.

To complete the specification of a module, initial values must be
specified for all V-functions. Also, exception conditions may be
indicated for both O- and V-functions--e.g., an input variable is out of
range, allocated storage space is full, or a disallowed operation is
requested. English-language explanatory comments may also be included if
desired, but the specification must be complete without these, of course.
Finally, the module is given a name.

The total specification now satisfies the first iterability
condition, namely, that the module is completely defined as far as its
behavior is concerned.

To satisfy the second iterability condition, each module is
implemented by a set of lower-level modules in the hierarchy. The modules
on which a given module depends for its implementation are said to be in a
dependency relation to it, and are connected to it in Figure VIII-2 by an
intermodule arrow pointing downward from it. Consider a typical module
$M_k$ and all those other modules on which it depends--its dependency
set--as illustrated in Figure VIII-3. Let us suppose that an
implementation of $M_k$ has already been accomplished, so that $M_k$ and
all of the modules in its dependency set have been specified as described
above. Thus, each has its own set of O- and V-functions, initial values,
etc., appropriate to its purpose. To "implement" $M_k$ it is now necessary
to somehow relate its own specification to the specifications of the
modules in its dependency set.

To this end, a correspondence between $M_k$ and its dependency set is
recognized. First, the states of $M_k$ and those of its dependency set are
put into correspondence. This is done by mapping the nonhidden
V-functions in the dependency set onto the set of V-functions of $M_k$.
For example, the states representing the contents of matrices in a MATRIX
module would be expressed in terms of the states representing vectors in
a lower VECTOR module on which it depends. A state mapping is actually
a mapping of data structures. It can be expressed by a set of assertions
or equations, or in simple cases merely as a tabular listing.

FIGURE VIII-3 DEPENDENCY SET
(Labeled region: dependency set of $M_k$.)

In general, certain portions of the total state in the dependency
set may be transparent to $M_k$, in which case the mapping will be
many-to-one rather than one-to-one. The corresponding V-functions are said
to be hidden (HV). For example, all stored characters except the most
recently entered one in a STACK module would normally be transparent to a
calling module above, and would be represented by such a function.
HV-functions in the dependency set do not participate in the mapping onto
$M_k$, but the HV-functions in $M_k$ must all be accounted for by the set
of V-functions in the dependency set.

Second, every O-function and V-function of module $M_k$ must be
implemented as a program (that is, a sequence of calls) in terms of the O-
and V-functions of the modules of its dependency set. Note that these
programs for function implementation are abstract programs, in the sense
that the modules on which they might be run are themselves abstract
machines.

This  particular  definition  of  a  program  hierarchy  of  modules  natur- 
ally  assumes  that  there  is no recursion  in  the  dependency  order.  However, 
it  is  permitted  for  two  or  more  modules  to  depend  upon  the  same  lower- 
level  module.  Consequently,  the  graph  that  describes  the  hierarchy  is  a 
directed  graph  without  cycles. 
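
The no-recursion requirement can be checked mechanically. The following is a minimal sketch in Python, not part of the original methodology: the dictionary layout, the function name `has_cycle`, and the module names USER and VECTOR in the sample hierarchy are assumptions made for illustration. It performs a depth-first search over the dependency relation and reports whether any module depends, directly or indirectly, on itself.

```python
def has_cycle(depends_on):
    """Return True if the dependency relation contains a cycle.

    depends_on maps a module name to the list of lower-level modules
    in its dependency set (a hypothetical encoding of the hierarchy).
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, in progress, finished
    color = {}

    def visit(module):
        color[module] = GREY
        for lower in depends_on.get(module, ()):
            state = color.get(lower, WHITE)
            if state == GREY:         # reached a module still in progress
                return True
            if state == WHITE and visit(lower):
                return True
        color[module] = BLACK
        return False

    return any(color.get(m, WHITE) == WHITE and visit(m)
               for m in list(depends_on))


# Two modules may share a lower-level module without creating a cycle.
hierarchy = {
    "USER": ["STACK", "MATRIX"],
    "STACK": ["ARRAY"],
    "MATRIX": ["VECTOR"],
    "VECTOR": ["ARRAY"],
    "ARRAY": [],
}
recursive = dict(hierarchy, ARRAY=["USER"])
```

Here `has_cycle(hierarchy)` is false even though STACK and VECTOR both depend on ARRAY, while `has_cycle(recursive)` is true because ARRAY has been made to depend on the top-level module.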
The set  of  nonhidden  V-functions of a  module may  also  contain de- 
rived  or  DV-functions.  These  are  redundant  and  are  created  for  conve- 
nience  only.  DV-functions  need  not  participate  in  the  assertions  that 
describe  mapping  correspondences,  but  must  be  defined  directly  or  indi- 
rectly  as  programs,  just  as  for  the  other  accessible  V-functions. 
As a convenience one may also speak of an OV-function as an
inseparable concatenation of an O-function and a V-function. It need not
participate in the mapping, but like O-functions requires a statement of
effects in its specification.

The second condition of iterability, the completeness of
decomposition, is therefore satisfied provided each O- and V-function of
every module can be realized as a program of O- and V-functions taken from
lower-level modules of the hierarchy, where the data structures of the
corresponding modules are related by a complete V-function mapping as
defined above.

The  third  iterability  condition  is  satisfied  in  the  course  of  design, 
provided  each  module  is  implemented  with  other  modules  that  are  less  com- 
plicated  than  itself.  This  condition  is  not  automatically  satisfied  by 
the  design  methodology.  However,  the  methodology so structures  the de- 
sign  that  it  is  easier  for  the  designer  to  maintain  control  over  the  com- 
plexity  of  the  implementations  at  each  level,  compared  to  a  conventional 
design.

These concepts and definitions will now be illustrated by means of
the simple example illustrated in Figure VIII-4. An upper module STACK
is to be implemented by means of a lower module ARRAY. The function of
STACK is to carry out the usual push and pop operations on a conventional
stack--a data structure consisting of a finite ordered list of elements
on which elements may be inserted (pushed) or removed (popped) only at
the top of the list. The data structure in ARRAY is an unordered set of
elements.

MODULE STACK
    V-FUNCTION        O-FUNCTION
    SIZE              PUSH(X)
    STAK(J) [HV]      POP [OV]
    TOP [DV]

MODULE ARRAY
    V-FUNCTION        O-FUNCTION
    CHAR(A)           CHANGE(A,Y)

FIGURE VIII-4 EXAMPLE OF CONCEPTS AND DEFINITIONS

A complete specification of the module ARRAY is listed in Table
VIII-1. First, variables are declared as to their types--that is,
whether they are integer, real, complex, boolean, etc., and parameters
of the array are defined. Next, the single V-function CHAR(A) is
specified. After stating its purpose, an exception condition is given to
cover the case when the independent variable A is out of range. Then,
the initial value of the V-function is indicated for all possible values
of A. Next, the O-function CHANGE(A,Y) is specified, including its
purpose, exception conditions (either A or Y out of range), and its effect
upon the V-function of the module. This module could be directly
realized in a random access memory, in which case the V- and O-functions
would correspond to nondestructive read and simple write operations,
respectively. Note that there is nothing about the specification of ARRAY
that presupposes this particular hardware realization, however.
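
To make the shape of such a specification concrete, here is a minimal sketch in Python of one possible realization of ARRAY. It is illustrative only: the class name `Array`, the exception class `ModuleError`, and the use of a dictionary for storage are assumptions of this sketch, and the specification itself does not presuppose any of them. The V-function CHAR(A) becomes a read-only method and the O-function CHANGE(A,Y) becomes a method whose only effect is on the values returned by CHAR.

```python
class ModuleError(Exception):
    """Signals that a named exception condition of a specification holds."""


class Array:
    """Illustrative realization of module ARRAY (one of many possible)."""

    def __init__(self, amax, chmax):
        self.amax = amax      # parameter AMAX: size of the array
        self.chmax = chmax    # parameter CHMAX: maximum stored value
        # Initial value: CHAR(i) = 0 for every valid index i.
        self._cells = {i: 0 for i in range(1, amax + 1)}

    def char(self, a):
        """V-function CHAR(A): return the A-th element, changing nothing."""
        if a < 1 or a > self.amax:
            raise ModuleError("AOUT")
        return self._cells[a]

    def change(self, a, y):
        """O-function CHANGE(A, Y): effect is CHAR(A) = Y."""
        if a < 1 or a > self.amax:
            raise ModuleError("AOUT")
        if y < 0 or y > self.chmax:
            raise ModuleError("YOUT")
        self._cells[a] = y


arr = Array(amax=4, chmax=9)
arr.change(2, 7)
```

After the call above, `arr.char(2)` returns 7 and every other element still returns its initial value 0. A caller sees only `char` and `change`, so the dictionary could be replaced by any other storage scheme without affecting modules that depend on ARRAY.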

Table VIII-1
MODULE ARRAY

Declaration:  Integer A, Y
Parameters:   AMAX    Size of array
              CHMAX   Maximum value of stored element
V-Function:   CHAR(A)
    Purpose:     To return the A-th element of the array
    Exceptions:  AOUT: A < 1 or A > AMAX
    Initially:   ∀i (0 ≤ i ≤ AMAX) (CHAR(i) = 0)
O-Function:   CHANGE(A, Y)
    Purpose:     To replace the A-th element of the array by Y
    Exceptions:  AOUT: A < 1 or A > AMAX
                 YOUT: Y < 0 or Y > CHMAX
    Effects:     CHAR(A) = Y

The  specification  of  the  module  STACK  shown  in  Table VIII-2 follows 
similar  lines.  The  V-function  SIZE  reflects  the  number  of  entries  that 
are currently stored in the stack. An HV-function STAK(J) represents
the  entire  contents  of  the  stack;  it  is  invisible  and  cannot  be  called  by 
higher-level  modules. A DV-function  TOP  is  derived  from STAK(J). The 
effects of the O-function PUSH(X) are to inject a new element X onto the
top  of  the  stack  and  to  increase  SIZE  by 1. The  OV-function  POP  accom- 
plishes  the  reverse--the  top  element  is  both  returned  and  deleted  from 
the  stack,  and  SIZE  is  reduced by 1. Again,  the  STACK  module  could be 
implemented  in  a  variety  of  ways  and  none  are  assumed  or  precluded  by 
the  specification  given  here. 
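
The behavior that this specification prescribes can be sketched as an executable model. The Python below is an illustration under stated assumptions, not the report's notation: the names `StackSpec` and `StackSpecError` are invented here, and a dictionary stands in for the hidden V-function STAK(J) purely so that the model can run. Each method states its exception conditions first and then its effects, with `old_size` playing the role of the quoted prior value 'SIZE'.

```python
class StackSpecError(Exception):
    """Signals a named exception condition (EMPTY, FULL, XOUT)."""


class StackSpec:
    """Executable model of the STACK specification (illustrative only)."""

    def __init__(self, smax, charmax):
        self.smax = smax          # parameter SMAX
        self.charmax = charmax    # parameter CHARMAX
        self._size = 0            # Initially: SIZE = 0
        self._stak = {}           # hidden STAK(J); absent means undefined

    def size(self):
        """V-function SIZE."""
        return self._size

    def top(self):
        """DV-function TOP = STAK(SIZE)."""
        if self._size == 0:
            raise StackSpecError("EMPTY")
        return self._stak[self._size]

    def push(self, x):
        """O-function PUSH(X): SIZE = 'SIZE' + 1, STAK(SIZE) = X."""
        if self._size == self.smax:
            raise StackSpecError("FULL")
        if x < 0 or x > self.charmax:
            raise StackSpecError("XOUT")
        old_size = self._size
        self._size = old_size + 1
        self._stak[self._size] = x

    def pop(self):
        """OV-function POP: returns STAK('SIZE'), SIZE = 'SIZE' - 1."""
        if self._size == 0:
            raise StackSpecError("EMPTY")
        old_size = self._size
        value = self._stak[old_size]
        self._size = old_size - 1
        return value


spec = StackSpec(smax=2, charmax=9)
spec.push(3)
spec.push(8)
```

Note that the model exposes no method for reading STAK(J) directly, which mirrors its status as a hidden V-function: callers can observe the stack contents only through TOP and POP.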
Implementation  of  STACK  by  means  of  ARRAY  requires  first  that  the 
V-functions  of  the  two  modules  be  placed  in  correspondence. To this  end, 
the  parameters  are  related  first. The nonderived  V-functions of STACK 
are  expressed in terms  of  (nonhidden)  V-functions  of  ARRAY.  Note  that  the 
elements  of STAK and  CHAR  are  placed  in  one-to-one  correspondence,  except 
that  one  extra  element of CHAR  is  reserved  for  SIZE,  which is to be used 
as a  pointer  in  this  implementation. 

Table VIII-2
SPECIFICATION OF MODULE STACK

Module Stack
Declarations:  Integer J, X
Parameters:    SMAX      Maximum size of stack
               CHARMAX   Maximum value of stored element
V-Function:    SIZE
    Purpose:     To return the number of elements currently in
                 the stack
    Exception:   None
    Initially:   SIZE = 0
HV-Function:   STAK(J)
    Purpose:     To represent entire contents of the stack
    Initially:   ∀i (0 < i ≤ SMAX) (STAK(i) = Undefined)
DV-Function:   TOP
    Purpose:     To return top element in the stack
    Derived:     TOP = STAK(SIZE)
    Exceptions:  EMPTY: SIZE = 0
O-Function:    PUSH(X)
    Purpose:     To augment stack with an additional element X
    Exceptions:  FULL: SIZE = SMAX
                 XOUT: X < 0 or X > CHARMAX
    Effects:     SIZE = 'SIZE' + 1
                 STAK(SIZE) = X
OV-Function:   POP
    Purpose:     To return and remove top element of the stack
    Exception:   EMPTY: 'SIZE' = 0
    Effects:     POP = STAK('SIZE')
                 SIZE = 'SIZE' - 1

Mapping
Parameters:    CHMAX = CHARMAX
               CHMAX = SMAX
               AMAX = SMAX + 1
V-Functions:   ∀J (1 ≤ J ≤ AMAX - 1) (STAK(J) = CHAR(J))
               SIZE = CHAR(AMAX)

Initialization
               CHANGE(AMAX, 0)

Implementation
V-Function:    SIZE = CHAR(AMAX)
DV-Function:   TOP = CHAR(CHAR(AMAX))
               EXIT: EMPTY: CHAR(AMAX) = 0
O-Function:    ASSERTION: FULL: CHAR(AMAX) = AMAX - 1
                          XOUT: X < 0 or X > CHMAX
               PUSH(X): CHANGE(AMAX, CHAR(AMAX) + 1)
                        CHANGE(CHAR(AMAX), X)
OV-Function:   ASSERTION: EMPTY: CHAR(AMAX) = 0
               POP: POP = CHAR(CHAR(AMAX))
                    CHANGE(AMAX, CHAR(AMAX) - 1)

Next, ARRAY must  be  initialized  to  conform  to  the  initial  conditions 
of STACK.  Only  the  extra  element  CHAR  need  be  set  to  a  defined  value. 
Finally,  the  implementation  is  described  by  expressing  each  function 
of STACK as a program in terms of the V-function CHAR and the O-function
CHANGE  of  the  module  ARRAY.  Exit  conditions  expressed in the  same  terms 
describe  non-normal  returns  from  the  lower  level  to  the  upper  level. 
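
The implementation just described can be sketched as runnable code. The Python below is a hedged illustration rather than the report's own program: the class names `Array`, `StackOnArray` and `ModuleError` are invented for this sketch, and the choice of CHMAX as the larger of CHARMAX and SMAX is an assumption made here so that the pointer stored in CHAR(AMAX) always fits. Every STACK function is written only in terms of CHAR and CHANGE, and the extra element CHAR(AMAX) holds SIZE.

```python
class ModuleError(Exception):
    """Signals a named exception or exit condition."""


class Array:
    """Lower module ARRAY: V-function CHAR, O-function CHANGE."""

    def __init__(self, amax, chmax):
        self.amax, self.chmax = amax, chmax
        self._cells = [0] * (amax + 1)    # index 0 unused; CHAR(i) = 0

    def char(self, a):
        if a < 1 or a > self.amax:
            raise ModuleError("AOUT")
        return self._cells[a]

    def change(self, a, y):
        if a < 1 or a > self.amax:
            raise ModuleError("AOUT")
        if y < 0 or y > self.chmax:
            raise ModuleError("YOUT")
        self._cells[a] = y


class StackOnArray:
    """Upper module STACK implemented as programs over ARRAY."""

    def __init__(self, smax, charmax):
        self._charmax = charmax
        # Parameter mapping: AMAX = SMAX + 1. CHMAX is an assumption here.
        self._arr = Array(amax=smax + 1, chmax=max(charmax, smax))
        self._arr.change(self._arr.amax, 0)   # Initialization

    def size(self):                           # SIZE = CHAR(AMAX)
        return self._arr.char(self._arr.amax)

    def top(self):                            # TOP = CHAR(CHAR(AMAX))
        amax = self._arr.amax
        if self._arr.char(amax) == 0:
            raise ModuleError("EMPTY")
        return self._arr.char(self._arr.char(amax))

    def push(self, x):
        amax = self._arr.amax
        if self._arr.char(amax) == amax - 1:
            raise ModuleError("FULL")
        if x < 0 or x > self._charmax:
            raise ModuleError("XOUT")
        self._arr.change(amax, self._arr.char(amax) + 1)
        self._arr.change(self._arr.char(amax), x)

    def pop(self):
        amax = self._arr.amax
        if self._arr.char(amax) == 0:
            raise ModuleError("EMPTY")
        value = self._arr.char(self._arr.char(amax))
        self._arr.change(amax, self._arr.char(amax) - 1)
        return value


stack = StackOnArray(smax=3, charmax=9)
stack.push(4)
stack.push(7)
```

After these two pushes the pointer element holds 2, so `stack.size()` is 2 and `stack.top()` is 7. The upper module never touches the array storage directly, which is the data-independence property the hierarchy is meant to enforce.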
Several  properties  of  this  realization  may  be  noted  at  this  point 
as a way of summarizing some important general features of a
hierarchical design:
   (1) Each module specification is essentially independent
       of those of all other modules.
   (2) The effects of O-functions and the definitions of
       DV-functions are expressed in terms of nonderived
       V-functions of the same module, and are implemented
       solely in terms of functions occurring in that
       module's dependency set.
   (3) The implementation employed for a module is not
       visible to those upper modules that may depend upon
       that module for their own implementations.
   (4) The state of a module is determined by the complete
       set of values of all of its nonderived V-functions,
       over all allowed values of their arguments.
   (5) The mapping of V-functions between modules presumes
       explicit relationships between the parameters of
       corresponding modules.
   (6) HV-functions have no exception conditions; O- and
       OV-functions have no initial conditions; and none of the
       V-functions have effects.

With this background, the steps of design according to the
methodology may be outlined as follows. The starting point for the design
consists of a specification of the uppermost module in the hierarchy--a
concise description of what the overall program is to accomplish. If the
hardware on which the final program is to be implemented is prescribed,
a list of the lowest-level elements or functions out of which the system
is to be composed will also be specified. Then:
   (1) The uppermost module function is decomposed into a
       hierarchy of modules, the function of each of which is
       cursorily described in words.
   (2) The function of each module is defined precisely by a
       specification, as explained previously.
   (3) The V-function mappings are worked out between each
       module and those of its dependency set.
   (4) All V-functions and O-functions are implemented as
       programs in the V- and O-functions of modules of
       their corresponding dependency sets.
Like most designs, these four steps are not independent of one another.
They are not executed in a single sequence but are passed through
repeatedly in trial-and-error fashion until all conditions are satisfied
and all cost and quality measures (such as the number of programming
steps, the running time, and amount of storage required) are suitably
optimized.

Steps 2 and 3 above are largely formal, but steps 1 and 4 are more
creative. The first step requires a perspective view of the entire
hierarchical program. In decomposing the various module functions, the
designer must use his past experience to anticipate how they should best
be decomposed, both in terms of the operations performed and the data
structures appropriate to these modules. The fourth step can often
benefit from ingenuity at a more detailed, logical level.

Two idealistic approaches to hierarchical design are fashionable.
In the bottom-up approach the modules are defined in succession from the
bottom of the hierarchy to the top, in such a manner that no module is
created until all modules in its dependency set have been created first.
The user program at the top is created last. In the top-down approach
one starts with the uppermost module and creates lower-level modules in
succession as they are needed, until all of the lowest-level modules are
defined. The bottom-up approach is more of a synthesis, in the sense that
the simplest elements of the system are gradually assembled into more and
more powerful units in order to realize the function at the top. In one
way it is a more orderly progression. The top-down viewpoint is more
analytical, in the sense that functions, data structures and operations
are repeatedly broken down into simpler parts until only primitive
versions remain.
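
In graph terms, a bottom-up creation order is a topological ordering of the dependency relation, and a top-down order is its reverse. The sketch below uses Python's standard `graphlib` module to compute both; the function names and the sample hierarchy (including the module names USER and VECTOR) are assumptions introduced for illustration and do not come from the report.

```python
from graphlib import TopologicalSorter


def bottom_up_order(depends_on):
    """List modules so that each appears after its whole dependency set.

    depends_on maps a module name to the modules it depends on.
    TopologicalSorter treats those as predecessors, so they come first.
    """
    return list(TopologicalSorter(depends_on).static_order())


def top_down_order(depends_on):
    """List modules so that each appears before its dependency set."""
    return list(reversed(bottom_up_order(depends_on)))


hierarchy = {
    "USER": ["STACK", "MATRIX"],
    "STACK": ["ARRAY"],
    "MATRIX": ["VECTOR"],
    "VECTOR": ["ARRAY"],
    "ARRAY": [],
}
up = bottom_up_order(hierarchy)
down = top_down_order(hierarchy)
```

In `up` the module ARRAY precedes STACK and VECTOR, and USER comes last; in `down` the order is reversed. Several valid orderings usually exist, so no particular sequence among unrelated modules should be relied upon.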
Both of these approaches suffer from the same disadvantage: it is normally difficult to define modules at intermediate levels merely from a knowledge of module specifications at the uppermost and lowermost levels of the hierarchy. While the decomposition of the system into a functional hierarchy greatly simplifies the overall design problem, the step of selecting intermediate-level modules still requires a broad view of the manifold possibilities of analysis and synthesis. Such a view is presently best acquired only through experience.
Consequently, the bottom-up and top-down approaches actually mark extremes at the ends of a continuum of possibilities. The practical approach lies in between. The total number of competing design alternatives is reduced by working at all levels of the hierarchy simultaneously, using the top and bottom levels only as starting points.
If a computer program has been designed hierarchically as defined above, formal program proving techniques can be applied appropriately to the module specifications, to the V-function mapping correspondences, and to the O- and V-function implementations, to prove correctness of the program. These techniques are now undergoing refinement in related SRI projects in which the hierarchical design methodology is being applied to other software systems.
This hierarchical approach also creates a design in which it is easier to control security--maintaining control over different users' rights to access and/or change various data elements--including a capability for proving that the desired security is indeed achieved. Finally, a program resulting from the new methodology is amenable to determination of its reliability, under an assumed set of failure probabilities for its lowest-level module functions, and to proving that a required reliability level has actually been achieved in the design.
IX HIERARCHICAL ORGANIZATION OF SIFT
In this section we discuss the design of the SIFT system as a hierarchical layering of abstract machines. First, we briefly discuss the hierarchical design methodology, with respect to our concept of the SIFT requirements and the handling of faults and errors. This methodology comprises five stages of design and implementation. The realization of SIFT is discussed relative to these five stages. The proposal for this project suggested that the operating system be described using flowcharts. This suggestion has not been followed because the five stages provide a description which is easier to understand and also is easier to verify and analyze.
A. The Hierarchical Methodology Relative to SIFT
In Section VII the hierarchical methodology was discussed as a general approach to designing and proving systems. Below, we briefly review the methodology and present some augmentations to handle hardware faults, their impact on all abstract machines, and an approach toward developing a credible reliability assessment of SIFT. The methodology involves the following stages.
Stage 0--Express the problem to be solved in abstract and perhaps imprecise terms. For SIFT this stage entails expressing the intent of the SIFT system with regard to dispatching application tasks and the handling of errors.
Stage 1--Conceive of a set of abstract machines that seem appropriate for solving the problem. Each abstract machine has a state space and operations to change the state. As suggested by Parnas [Ref. 1], we use V-functions to represent the state and O-functions to correspond to operations. The machines are organized hierarchically, i.e., as nodes in a directed acyclic graph. An edge from node A to node B indicates that the machine at node B implements the machine at node A. The machines corresponding to leaf nodes in the graph are called primitive machines since the operation of the entire system is dependent on these machines. In the design process it is useful to view each abstract machine as maintaining a particular type of abstract object for use by an abstract machine directly above it in the hierarchy. Some of the functions of each abstract machine can be called by programs that run on the system. We designate these functions as comprising the system interface. For SIFT the programs that call the interface functions are simply the application tasks. It should be clear that the machine functions that are not part of the interface are not accessible to the application programs.
All of the O- and V-functions of a machine accessible to a higher level machine are called the visible functions of a machine. Some of the V-functions just serve to aid in defining the state and cannot be accessed; these functions are called hidden V-functions. The set of V-functions of a machine that are essential in defining the state space of a machine are called primitive V-functions. Some other V-functions, called derived V-functions, return a value that is dependent on the value of more primitive V-functions. The role of derived V-functions is to provide a mechanism for referring to a collection of states by a single function. These functions are described in more detail in Section IX-D.
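The distinctions among visible, hidden, primitive, and derived V-functions can be illustrated with a small sketch. The counter machine below is hypothetical and is not one of the SIFT machines; Python is used only as convenient notation, with a leading underscore standing in for "not accessible from above."

```python
class CounterMachine:
    """Hypothetical abstract machine used only to illustrate the terminology."""

    def __init__(self):
        # Primitive V-functions: their values define the state space.
        self._count = 0        # made visible through count()
        self._last_delta = 0   # hidden: no accessor is exported

    def count(self):
        """Visible primitive V-function: returns part of the state."""
        return self._count

    def is_empty(self):
        """Visible derived V-function: depends only on a primitive one and
        names a whole collection of states with a single function."""
        return self._count == 0

    def add(self, delta):
        """Visible O-function: changes the state."""
        self._last_delta = delta
        self._count += delta

m = CounterMachine()
m.add(3)
```

Here `count`, `is_empty`, and `add` make up the visible functions, while the value recorded in `_last_delta` helps define the state but cannot be read by a higher level machine.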
In order to formalize the concept of a fault occurrence, we use the mechanism of hidden O-functions. That is, some of the abstract machines will contain the O-function cause-fault, which is not called by any program, but occurs asynchronously with other processing, with a probability dependent upon the hardware and transient fault mechanisms. The object that is "damaged" by the fault is dependent on the particular abstract machine that is subject to failure. In our analysis of SIFT the primitive objects that are subject to failure are processors and busses. This represents a coarse treatment of faults as compared with one in which faults are assumed to affect memory words or processor registers. Our coarse approach is realistic for an LSI implementation and, moreover, does not produce overly pessimistic results. Hidden V-functions are included in certain machines to record the occurrence of faults. These functions are of necessity hidden since an observer of the machine's behavior will not detect the fault occurrence until the machine is appropriately exercised.
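A sketch of this fault model follows, under stated assumptions: the machine, its function names, and the value returned after a fault are all hypothetical. It shows a hidden O-function setting a hidden V-function, with the fault becoming observable only when a visible function that depends on it is exercised.

```python
class ProcessorMachine:
    """Hypothetical machine whose unit of failure is the whole processor."""

    def __init__(self):
        self._failed = False   # hidden V-function: records the fault occurrence
        self._register = 0     # primitive V-function

    def cause_fault(self):
        """Hidden O-function: invoked by no program, only by the fault model."""
        self._failed = True

    def store(self, value):
        """Visible O-function."""
        self._register = value

    def load(self):
        """Visible V-function whose result depends on the hidden fault record."""
        if self._failed:
            return None        # assumed stand-in for an arbitrary wrong result
        return self._register

p = ProcessorMachine()
p.store(7)
before = p.load()
p.cause_fault()      # nothing is observable at this instant
after = p.load()     # the fault shows up only when the machine is exercised
```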
Stage 2--In this stage a formal specification is written for each of the abstract machines. The exact format of the specifications is discussed in the next section, but it suffices for this discussion to say that the most significant part of a specification is the effects section for each O-function, which gives the new values of the V-functions in terms of the values of V-functions immediately prior to a call on the O-function. These effects are written as assertions in a language to be described below. In the case of machines whose specification is intended to portray the results of faults, the effect of the generic O-function cause-fault is to change the values of hidden V-functions that record the fault occurrence. The fault can produce a perceivable error when an O-function is invoked whose specifications are dependent on the above hidden V-functions.
Stage 3--In this stage the states of each non-primitive abstract machine are represented in terms of the states of the lower level machines that comprise its implementation. This representation is a partial mapping from the state-spaces of the lower level machines onto the upper level state-space. It is partial since not every lower level state will have an image in the upper level, and it is onto since each upper level state must be the target of a mapping. Two distinct states S1, S2 in the upper level machine must have distinct images in the lower level machine, otherwise in the implementation the states will not be distinguishable. On the other hand, an upper level state S1 can be represented by more than one state T1, T2, ... in the lower level machine. The meaning of this multiple representation is that at any instant only one of the states Ti actually represents S1, but which state is selected depends on the implementation (stage 4) and possibly external inputs, e.g., faults. If the lower level consists of more than one abstract machine it is convenient to view the aggregate of such lower level machines as a single machine with a state space that is the Cartesian product of the component state spaces.
It is also convenient to carry out the state mapping in two steps. In the first step all of the states in the lower level machine that have images in the upper level are defined. In the second step the upper level target states are selected for each state defined in step 1.
As discussed for stage 2 above, certain states of a module record the occurrence of a fault. In some cases an occurrence of a fault of one machine also results in a fault at a higher level machine. In such cases the fault states, similar to other states, are mapped upward. However, in other cases the occurrence of a fault is masked by the upper level machine. Thus, a fault-free state and its faulty counterpart both map up to the same state in the upper level machine.
Since the states of an abstract machine are defined by values of primitive V-functions, the intermodule representations are written as expressions relating the lower level primitive V-functions and the upper level V-functions. The process of defining the lower level states that map upward involves writing an expression in terms of the primitive V-functions of the lower level machine.
Stage 4--In this stage, the nonprimitive abstract machines are implemented in terms of the machines directly below them in the hierarchy. Since the visible O- and V-functions of a machine are callable, it is these functions that are to be implemented in terms of the visible O- and V-functions of the lower level machines. At present, we write these implementation programs in a simple language which we call an abstract programming language. These programs can serve as a plan for the ultimate implementation programs written in assembly language or perhaps some higher level language. We intend to study the possibility of using some augmentation of an existing high level language as the abstract programming language. That is, the abstract programs could be compiled directly into assembly code.
As discussed above for stage 1, the occurrence of faults is handled by the invocation of the O-function cause-fault by a hidden asynchronous process. If a fault in a lower level machine is to be apparent in the machine directly above, then there must be a cause-fault O-function in the upper level machine. The upper level cause-fault O-function is then "implemented" by a combination of cause-fault O-functions of the lower level machine.
Figure IX-1 depicts the state changes and state representation mappings for upper level machine S which is implemented by machine T. In S the effect of the O-function when the machine is in state S1 is to cause a transfer to S2. The states T1 and T2 of T map up to S1 while T3 and T4 map up to S2. If T is in state T1 then the implementation of the O-function is a program which causes T to undergo numerous state transitions, but only the initial and final states (T1 and T4) map up to S. A fault in T could cause a transition from T1 to T2, but this fault is invisible to S since T1 and its faulty counterpart both map up to S1.
FIGURE IX-1 DESCRIPTION OF STATE CHANGES, REPRESENTATION MAPPINGS, AND IMPLEMENTATIONS IN ADJACENT ABSTRACT MACHINES
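The mapping properties described for Stage 3, using the state names of Figure IX-1, can be checked mechanically. The sketch below is illustrative only: the state T5 is a hypothetical intermediate state added to show that the mapping is partial, and the helper names are assumptions rather than part of the SIFT specification language.

```python
# States of the upper level machine S and the lower level machine T.
# T5 is a hypothetical intermediate state with no image in S.
UPPER_STATES = {"S1", "S2"}
LOWER_STATES = {"T1", "T2", "T3", "T4", "T5"}

# Representation mapping: lower level state -> upper level state.
REPRESENTS = {"T1": "S1", "T2": "S1", "T3": "S2", "T4": "S2"}

def is_partial(mapping, lower_states):
    """True if some lower level state has no image in the upper level."""
    return any(t not in mapping for t in lower_states)

def is_onto(mapping, upper_states):
    """True if every upper level state is the target of the mapping."""
    return set(mapping.values()) == set(upper_states)

def fault_is_masked(mapping, good, faulty):
    """True if a fault-free state and its faulty counterpart map to the
    same upper level state, so the fault is invisible from above."""
    return mapping[good] == mapping[faulty]
```

With these definitions, the transition from T1 to T2 caused by a fault is masked because both states represent S1.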
B. Stage 0 of the Methodology for SIFT
In this stage the intent of SIFT is described in imprecise terms in order to aid in developing a hierarchical organization. This description will also serve in formulating precise assertions about SIFT for purposes of verifying the design. The primary purpose of SIFT is to dispatch application tasks when their service is needed, even if the hardware units fail. As discussed in Section V, two types of application tasks are handled by the SIFT system: scheduled tasks, which are guaranteed to be dispatched at a fixed rate, and priority tasks, each of which is dispatched if its deadline has expired and if its priority is highest of all such tasks whose deadline has expired.
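The dispatching rule for priority tasks can be sketched as a selection over the tasks whose deadline has expired. This is a minimal illustration under assumptions: the record fields, the convention that a larger number means higher priority, and the task names are all hypothetical.

```python
from typing import NamedTuple, Optional

class PriorityTask(NamedTuple):
    name: str
    priority: int      # assumed convention: larger number = higher priority
    deadline: float    # time at which the task's deadline expires

def select_priority_task(tasks, now) -> Optional[PriorityTask]:
    """Return the highest-priority task among those whose deadline has expired."""
    expired = [t for t in tasks if t.deadline <= now]
    if not expired:
        return None
    return max(expired, key=lambda t: t.priority)

tasks = [
    PriorityTask("x", 1, 5.0),
    PriorityTask("y", 3, 8.0),
    PriorityTask("z", 2, 4.0),
]
```

At time 6.0 only x and z have expired, so z is selected even though y has the highest priority overall.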
The basic scheme for all application tasks, for each iteration, is as follows:
    READ DATA FROM EACH TASK SUPPLYING INPUTS
    COMPUTE
    WRITE DATA TO A BUFFER FOR EACH TASK THAT REQUIRES IT AS AN INPUT
If the n-th iteration of task A requires data from the (n-1)st iteration of task A, then A is included in both the input task set and the output task set for A. A particular feature of the selected avionics task set is that a task running at iteration rate f does not read data from a task running at iteration rate f1, f1 > f. Thus a task A need only write data for a less frequently dispatched task B as often as B is dispatched.
When B writes data for A, B deposits the data in a buffer that is shared with A. When A is dispatched its read data operation involves reading the contents of the buffer. For purposes of redundancy, tasks are executed on more than one processor. Thus the read operation for B from A yields a result which is the majority value of the data over all instances of execution of A. If no majority value exists then several policies can be invoked, one of which is to temporarily suspend B's execution. Several issues regarding the vote operation are significant to the design of the SIFT operating system.
    • The application programmer for B should not have to know which processors are running A nor even how many such processors exist at any instant. The SIFT operating system should maintain such information and appropriately process the read input data command.
    • When B's read from A involves a vote over more than one instance of A, it is essential that all such instances produce data for the same iteration (excluding those instances on faulty processors). As we observe below, this is the only synchronization requirement on the execution of instances of tasks.
    • When available, different busses are used for the read over instances of a task, in order to permit the vote mechanism to mask single bus failures, in addition to single processor failures.
    • The primary error detection mechanism is via a disagreement on a vote. If the redundancy is sufficient relative to the number of faulty processors and busses, it is possible to uniquely identify the faulty units.
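The vote operation described above can be sketched as follows. This is an illustration under assumptions, not the SIFT reader/voter itself: the processor identifiers are hypothetical, values are assumed to be hashable, and a dissenting reading only marks a suspect, since the disagreement may stem from either the processor or the bus used for the read.

```python
from collections import Counter

def vote(readings):
    """Majority vote over one value per processor.

    `readings` maps a processor identifier to the value read from that
    processor's instance of the producing task. Returns (value, dissenters),
    or (None, all processors) when no strict majority exists.
    """
    if not readings:
        return None, []
    counts = Counter(readings.values())
    value, n = counts.most_common(1)[0]
    if 2 * n <= len(readings):
        return None, sorted(readings)
    dissenters = sorted(p for p, v in readings.items() if v != value)
    return value, dissenters

result = vote({"p1": 5, "p2": 5, "p3": 9})
```

When no majority exists, the caller must apply one of the policies mentioned in the text, such as temporarily suspending the reading task.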
The units that are individually subject to failure are processors and busses. At a finer grain it will be necessary to consider failures that involve a processor's interaction with a bus. (We have conducted some analyses of this latter failure type and have incorporated mechanisms in the design to accommodate it, but the work is not yet complete.) When a failed bus is detected and identified, the reconfiguration process will modify the system tables such that this bus is avoided in all future read input data operations.
When a failed processor is detected and identified, the reconfiguration process is to reallocate tasks to operative processors such that the operative processors are used in an optimal manner. One simple policy that can be pursued here is to allocate the tasks of the failed processor to a spare processor, if one exists, or else accept reduced redundancy for these tasks. The design does not impose the use of this simple reconfiguration policy, but instead allows for the storage of task allocations to processors as a function of faulty processors. This table could be computed prior to a flight or when a processor failure is detected.
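The simple spare-processor policy mentioned above can be sketched as a function from the current allocation to a new one. The data layout, processor names, and function name are hypothetical; the design described in the text stores allocations in a table rather than prescribing this particular policy.

```python
def reallocate(allocation, failed, spares):
    """Return a new task allocation after processor `failed` is identified.

    allocation: dict mapping processor -> set of task names
    spares: list of operative processors that currently hold no tasks
    """
    # Drop the failed processor; copy the sets so the input is not modified.
    new_alloc = {p: set(ts) for p, ts in allocation.items() if p != failed}
    orphaned = allocation.get(failed, set())
    if spares and orphaned:
        new_alloc[spares[0]] = set(orphaned)   # move the tasks to a spare
    # Otherwise the tasks simply run with reduced redundancy.
    return new_alloc

alloc = {"p1": {"A"}, "p2": {"A"}, "p3": {"A"}}
```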
For any policy of allocating tasks to processors, after a processor failure, the following steps must be carried out:
    • The program code for a task assigned to a processor is loaded into that processor. In SIFT the loading process involves the reading of the program code from other instances of the program, by a special loading program in the processor.
    • The tables of all processors executing reallocated tasks must be updated so that the read input data operations are directed to the appropriate processor. Since all data required by a task is obtained by read input data, once the program code is loaded and the tables are updated, the task is ready to be dispatched.
Some tasks will require service independent of the actions of the reconfiguration process. Thus it is essential that at least one instance of each critical task be dispatched as needed during the reconfiguration process. In order to achieve this continuity of service, the reconfiguration policy changes only one processor's task allocation at any time. In addition, that processor's reallocation is completed before attention is directed to another processor.
This section has presented an informal, but complete, description of the external interface of the SIFT system, and SIFT's mechanisms and feasible policies in handling faults. The next section presents a hierarchical decomposition of SIFT.
C. Stage 1 As Applied to SIFT
The decomposition of the SIFT system as a hierarchy of abstract machines is shown in Figure IX-2. Each of the solid-line boxes corresponds to an abstract machine that maintains abstract data objects, i.e., contains O- and V-functions. Each of the dashed-line boxes is an abstract program that does not contain a state; the programs just contain code to call the functions of lower level abstract machines, and perhaps, constants.
The state of the system is maintained entirely by the abstract machines. In order to simplify the description we show only a single instance of each machine. However, it is understood that each processor in the system provides some of the functions of these abstract machines at its interface. In machine specifications (Section IX-D) we formally handle the situation of multiple instances by incorporating an argument identifying the processor on which the function was called. This formality is strictly for purposes of specification since a task instance can only call functions provided by the processor on which it is running.
FIGURE IX-2 HIERARCHICAL STRUCTURE OF SIFT
[Box labels legible in the figure: Application Tasks, Executive Tasks, Reader/Voter, Dispatcher, Fault Schedules, Fault Status, Circular Lists, Buffer, Memory Addressing, Bus Connection, Hardware; a Clock-Tick input is also labeled.]
A task, when it is dispatched, will acquire information by calling the interface V-functions of the abstract machines and can change the state of the abstract machines, say for purposes of transmitting information to another task, by calling the interface O-functions of the abstract machines.
Below we briefly discuss the abstract machines and two executive tasks; detailed discussions are given in Sections IX-D and IX-E.
Hardware: This machine provides the basic primitive processing instructions (arithmetic, logical, control) associated with each of the processors, in addition to the few instructions associated with the bus system. None of the bus system instructions are visible to the application tasks. The basic machine instructions are visible, with the exception that all main memory references by tasks are processed by the machines above the hardware. These extra levels of indirection ensure that an errant task will not be able to access memory outside of its workspace.
Memory Addressing: A task during its execution requires access to memory in order to read and write local data, and to read program instructions. The memory of concern here is the memory associated with the processor that is executing the task. The memory addressing abstract machine ensures that each task's accesses are within the preset bounds for the task. This level of indirection need not result in inefficient processing of memory instructions if hardware base and bound registers are used.
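The bounds check performed by the memory addressing machine can be sketched in software as a base-and-bound translation. The class and exception names are hypothetical, and a Python list stands in for the processor's memory; in hardware the same check would be done by base and bound registers.

```python
class BoundsViolation(Exception):
    """Raised when a task addresses memory outside its workspace."""

class TaskMemory:
    def __init__(self, memory, base, bound):
        self._memory = memory   # list standing in for the processor's memory
        self._base = base       # first physical address of the workspace
        self._bound = bound     # number of words in the workspace

    def _translate(self, offset):
        """Map a task-relative offset to a physical address, or refuse."""
        if not 0 <= offset < self._bound:
            raise BoundsViolation(offset)
        return self._base + offset

    def read(self, offset):
        return self._memory[self._translate(offset)]

    def write(self, offset, value):
        self._memory[self._translate(offset)] = value

memory = [0] * 16
tm = TaskMemory(memory, base=4, bound=4)
tm.write(1, 9)   # lands at physical address 5
```

An errant write such as `tm.write(4, 1)` raises `BoundsViolation` instead of touching memory outside the task's workspace.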
Bus Connection: This machine provides the mechanism for a processor to connect with another processor by a particular bus. A task can establish a bus connection only indirectly, by performing a read input data operation.
Buffer: When task A is to deliver data to task B, the channel is a buffer in the processor on which A is executing and which can be read by B. Corresponding to each task for which A computes data, there exist two buffers. As we discuss in Section IX-D, two such communication buffers are required since the execution of different instances of tasks is not tightly synchronized. The tasks do not access the buffers directly; the access is via the reader/voter abstract machine. The buffers are, of course, ultimately implemented in terms of areas of real memory that only the buffer machine abstract programs can access. Also, the buffer machine will effect the needed bus connections in order to carry out the interprocessor read operations.
Dispatcher: The dispatcher abstract machine holds, for each processor, a schedule giving the order in which tasks are to be dispatched. For scheduled tasks (see Section V) the schedule is a circular list of task names. During a frame interval successive tasks on the list are dispatched in turn. Each frame contains some spare time in which priority tasks are considered for dispatching. At the beginning of the next frame any priority task in execution is interrupted, and the next scheduled task on the circular list is dispatched. A processor can be given a new schedule after a processor failure or when a change in flight phase occurs.
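The way scheduled tasks are taken from the circular list frame by frame can be sketched as follows. The fixed number of scheduled tasks per frame is an assumption made to keep the example small, and the task names are hypothetical; the handling of priority tasks in the spare time of each frame is not modeled.

```python
from itertools import cycle, islice

def frame_dispatches(schedule, per_frame, frames):
    """Yield, for each frame, the scheduled tasks dispatched in that frame.

    schedule: list of task names treated as a circular list
    per_frame: assumed number of scheduled tasks dispatched per frame
    frames: number of frames to generate
    """
    ring = cycle(schedule)
    for _ in range(frames):
        # Each frame resumes on the circular list where the last one stopped.
        yield list(islice(ring, per_frame))

plan = list(frame_dispatches(["A", "B", "C"], per_frame=2, frames=3))
```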
Circular Lists: This machine provides functions for maintaining and accessing the circular lists that comprise the schedules for scheduled tasks. In this machine the circular lists are stored compactly as regular expressions.
Fault Schedules: As noted in Section IX-B, it is feasible to precompute
schedules for each of the processors that are to be invoked under
all possible processor fault conditions. These schedules are stored in
the fault schedule machine for access as needed. It is likely that each
processor could be preloaded with all of the schedules that it will need
during a flight; the appropriate schedule, for a given state of the
system, could be selected by the global executive.
Fault Status: This machine is used to store the status (operative
or failed) of the processors and buses of the system. As we will note
below, it is used by the global executive in deciding the reconfiguration
state after a fault.
Reader/Voter: This machine serves two purposes: (1) it is called
by a task whenever it writes data into a buffer for another task or reads
data from another task, and (2) it records the occurrence of faults.
With regard to (1), the machine stores the identification of the processors
executing tasks and the buses to be used in reading data. For a read
operation, in which task A reads from a buffer location of task B, the
reader/voter returns the majority value over all instances of task B.
With regard to (2), the reader/voter records the occurrences of processor
and bus faults. A fault becomes manifested as a detected error if it
causes a disagreement in a vote. Multiple faults can cause a task to
fail if they cause one-half or more of the instances of the task to
produce an erroneous result in an output buffer of the task. The
reader/voter records such fatal error occurrences.
Each reader/voter instance, upon detecting an error as a disagreement
among one or more of the voted inputs, attempts to identify the
offending units. It accomplishes this diagnosis by recording the
processor-bus combinations that produced an input in the minority. The global
executive, described below, analyzes the reports of all of the
reader/voter instances, accounting for the possibility of erroneous reports from
a failed processor.
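The voting and diagnosis steps described for the reader/voter can be illustrated with a short sketch. It is a simplification under stated assumptions: the function name vote_read, its return shape, and the representation of inputs as a mapping from (processor, bus) pairs to values are hypothetical and are not taken from the SIFT specification.

```python
from collections import Counter


def vote_read(inputs):
    """Vote over the values read via each (processor, bus) pair.

    Returns (majority_value, dissenting_pairs, fatal). When no value is
    held by more than half of the inputs, the vote is treated as a fatal
    error and no diagnosis is attempted.
    """
    if not inputs:
        raise ValueError("at least one input is required")
    counts = Counter(inputs.values())
    value, votes = counts.most_common(1)[0]
    if 2 * votes <= len(inputs):
        # One-half or more of the instances disagree with any candidate.
        return None, set(), True
    # Record the processor-bus combinations whose input was in the minority.
    dissenting = {pair for pair, v in inputs.items() if v != value}
    return value, dissenting, False


result = vote_read({("p1", 1): 7, ("p2", 2): 7, ("p3", 3): 9})
```

In this sketch a single disagreeing instance is outvoted and reported, whereas a two-way split among two instances is flagged as fatal.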
Above we have briefly described the abstract machines of the SIFT
system; more details are given in Section IX-D relative to the
specifications. The global executive and local executive programs are briefly
discussed below.
Global Executive: The global executive (GE) is simply a task that
manages the system reconfiguration after the detection of a fault. It
has access to the global status of the system and hence can determine
the new system state. The GE should be dispatched often enough to
guarantee a rapid transition to a recovery state. A preliminary analysis
of the effects of reconfiguration time on the system reliability has
determined that the GE should be dispatched as often as the highest-rate
application task. Since the GE is a critical task, it must be executed
redundantly--at least triplicated.
The operation of (each instance of) the GE is as follows. It reads
the error reports of all reader/voter instances, and attempts to identify
the failed processors and/or buses, if any. The GE will correctly
identify the failed units provided that, for each vote, only a minority of the
inputs are in error. If the GE has identified a failed bus, it computes a
new bus assignment, and informs each local executive of this new
assignment by communication through a shared buffer. If a processor has been
deemed to have failed, then the GE communicates with the local executives
that must follow new schedules. In order to maintain the servicing of
tasks during a reconfiguration, only one processor is permitted to be in
the reconfiguration state at any instant. Hence, the GE selects a
processor to follow a new schedule, and then awaits the completion of this
processor's reconfiguration before selecting the next processor. The
processing time for the GE is dependent on the existence of an error and
on the number of reader/voter instances reporting an error. However, the
maximum anticipated processing time should be small enough that
the GE can be considered as a scheduled task to be dispatched every frame.
The abstract implementation of the GE is discussed in Section IX-F.
By virtue of the abstract functions provided in the SIFT interface, the
program is relatively simple and should be amenable to proof.
Local Executive: Each processor contains a local executive (LE),
a task that controls the reconfiguration of the processor as
dictated by the GE. The LE is dispatched every frame and its operation
is simply to read the buffer shared with the GE, in order to determine
if the GE wishes to change the state of the LE's processor. This simple
read operation consumes only a few machine instructions.
However, if the processor is to be significantly reconfigured then
it is probably necessary to suspend the scheduling of tasks until the
reconfiguration is complete. The LE can accomplish this suspension by
calling the dispatcher machine to invoke a schedule which contains only
the LE. Based upon the computation of the GE, the reconfiguration of a
processor could involve: (1) informing the LE that the assignment of
tasks to processors (excluding the LE's processor) is changed, or (2)
informing the LE that the bus assignment for reads is changed, or (3)
informing the LE that it is to change the schedule of its processor. The
reconfigurations associated with (1) and (2) are not time consuming,
merely requiring the updating of the reader/voter assignment tables.
The reconfiguration associated with (3) is significant, however,
since it possibly involves a major change in the schedule of the
processor. If new tasks are to be allocated to the processor then a loader
program, which could be a separate task or just a subprogram of the LE, must
be invoked to store the program code of the new tasks in the processor's
memory. The loading is accomplished for each task by voting on each of
the instances of the task's program code. The LE must assign memory
locations in the processor for the new tasks and must update the tables of
the buffer and memory addressing machines to reflect the new assignment
of tasks. Once the loading and table updating operations are complete,
the LE can invoke the new schedule and cause the processor to resume its
servicing of tasks.
The LE is a more complicated program than the GE, but we feel that
it should still be amenable to formal proof. Some of the complexity can
be handled by decomposing the LE into three subprograms, corresponding
to items (1) through (3) above.
D. Formal Specification of SIFT
1. Introduction
SIFT is specified as a hierarchy of Parnas modules. Each module
is regarded as an abstract machine, having its own data structures
(V-functions) and operations (O-functions).
At any given time, the state of each machine is just a description
of the instantaneous values of its V-functions. The O-functions of a
module are operations that cause the state to change.
The highest  module in the  hierarchy  is an abstract,  global 
description  of  what  the  system  does.  Modules  at  lower  levels  of  the  hier- 
archy  can  be  viewed  as  building  blocks  for  implementing  the  highest- 
level  module.  Modules  at  still  lower  levels  are  building  blocks  for  im- 
plementing  those  at  the  intermediate  levels,  and so on. 
The specifications that follow in subsection IX-D-1 describe
each module independently. By themselves, they say nothing about how the
lower-level modules are actually used to implement those at higher
levels. This information is provided, rather, by mapping functions and
abstract programs. Mapping functions implement the V-functions of a given
module with the V-functions of lower-level modules; abstract programs
implement the O-functions of a given module with programs written in terms
of the O-functions and V-functions of lower-level modules. The mapping
functions and abstract programs for SIFT will have to be specified before
the system can be coded. It should be emphasized, however, that the
properties we wish to prove about the SIFT design depend only on the module
specifications--not on the mapping functions or abstract implementation.
(In the same way, the correctness of a FORTRAN program depends only on
the program itself--not on the compiler.)
Parnas  modules  are  specified  according  to  a  rigorous  syntactic  dis- 
cipline  much  like  a  programming  language.  Each  module  specification  is 
composed  of  several  segments--one  for  declaration  of  type  variables,  one 
for  defining  V-functions  and  O-functions,  and so forth. 
The purpose of the various sections of a module specification might
best be explained in relation to a specific example. Consider, then,
the reader/voter module, which has five sections: DECLARATIONS,
PARAMETERS, DEFINITIONS, EXCEPTIONS, and FUNCTIONS. This last section is
actually the most important, since it declares the V-functions and
O-functions of the module.
2. The FUNCTIONS Section
The first function declared in the FUNCTIONS section of the
reader-voter is the V-function task-set. The function header
VFUN task-set(proc) = st gives the name of the V-function, its formal
argument list (proc), and the result identifier st. The identifiers proc
and st are declared in the DECLARATIONS section to be of type PROC
(processor) and SET-OF TASK (set of tasks), respectively. The V-function
task-set is thus a data structure which is indexed on PROCs and which
stores sets of TASKs. The INITIALLY phrase in the declaration indicates
that at the time the module is initialized, the value of task-set (on
each argument proc) is the single-element set {le}. The intended
interpretation is just that at initialization of the system, each
processor is loaded with only the local executive task.
The next declaration in the FUNCTIONS section introduces the
V-function proc-bus-assignments, which is keyed on two arguments (a PROC
and a TASK) and which stores sets of PAIRs. Unlike the declaration for
task-set, this one has an EXCEPTIONS subsection. Exception conditions
are boolean expressions used to restrict the domain of a V-function, much
as array bounds in an ALGOL array declaration restrict the domain of the
array. While V-functions may in general have an arbitrary number of
exception conditions, proc-bus-assignments has only one:
task-not-in-proc(proc task). Like all other exception conditions,
task-not-in-proc is defined in the EXCEPTIONS section of the module
specification (to be described later). As is evident from the definition,
task-not-in-proc(proc task) is TRUE for a given value of proc and task if
and only if task is not currently a member of task-set(proc). Thus, just
as it is erroneous to try to read an array outside the limits of its
indices, it is erroneous to try to read proc-bus-assignments at
(proc,task) if task ∉ task-set(proc) at that time. More generally, it is
erroneous to try to read any V-function on arguments that violate any of
its exception conditions at that time.
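The way an exception condition restricts the domain of a V-function can be mimicked in an ordinary programming language. The sketch below is only an analogy under stated assumptions: the class ReaderVoterState, the exception type SpecException, and the dictionary representation are hypothetical, and only the names task-set, proc-bus-assignments, and task-not-in-proc come from the specification.

```python
class SpecException(Exception):
    """Raised when a V-function is read outside its permitted domain."""


class ReaderVoterState:
    """Hypothetical in-memory stand-in for two reader-voter V-functions."""

    def __init__(self, procs, le="le"):
        # task-set: PROC -> SET-OF TASK, initially {le} on every processor.
        self.task_set = {p: {le} for p in procs}
        # proc-bus-assignments: (PROC, TASK) -> set of (proc, bus) pairs.
        self._assignments = {}

    def task_not_in_proc(self, proc, task):
        # Exception condition: TRUE iff task is not in task-set(proc).
        return task not in self.task_set[proc]

    def proc_bus_assignments(self, proc, task):
        # Reading outside the domain is an error, as with array bounds.
        if self.task_not_in_proc(proc, task):
            raise SpecException("task-not-in-proc")
        return self._assignments.get((proc, task), set())


state = ReaderVoterState(["p1", "p2"])
ok = state.proc_bus_assignments("p1", "le")
try:
    state.proc_bus_assignments("p1", "nav")
    raised = False
except SpecException:
    raised = True
```

Here the read for the local executive succeeds, while the read for a task that is not in task-set(proc) is rejected.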
Declarations that contain a header, a comment, zero or more
exceptions, and an initialization part account for the great majority of
V-functions in a typical module specification. In addition, there are a
few special kinds of V-functions that are declared somewhat differently:
A DERIVED V-function is one whose value is determined completely
by the values of other V-functions in the module. The V-function
error-detected, for example, is DERIVED. Its value is determined completely by
that of the V-function disagreement-set. More particularly, for a given
value of its argument proc, error-detected has the value true if and only
if the set disagreement-set(proc) is non-empty. Although derived
V-functions are, strictly speaking, superfluous, they often add clarity to
specifications. Note that derived V-functions have no INITIALLY
specification since their initial values are determined by the initial values of
the V-functions from which they derive.
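A DERIVED V-function behaves much like a computed property: it has no storage of its own and is recomputed from other state on every read. The following sketch illustrates this for error-detected; the class name FaultRecord and the dictionary representation are assumptions made for the example.

```python
class FaultRecord:
    """Hypothetical holder for disagreement-set and its derived view."""

    def __init__(self, procs):
        # Primitive V-function disagreement-set(proc), initially empty.
        self.disagreement_set = {p: set() for p in procs}

    def error_detected(self, proc):
        # DERIVED: no stored value and no INITIALLY clause of its own;
        # true iff disagreement-set(proc) is non-empty.
        return len(self.disagreement_set[proc]) > 0


record = FaultRecord(["p1", "p2"])
before = record.error_detected("p1")
record.disagreement_set["p1"].add(("p3", 2))
after = record.error_detected("p1")
```

The initial value of error_detected follows from the initial (empty) disagreement set, which is why no separate initialization is needed.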
A HIDDEN V-function is one that is not intended to be available
to the user of the module. The V-function fault, for example, is HIDDEN,
reflecting the fact that the module does not provide direct access to
information about what processors and/or buses are faulty. Note, however,
that the values of HIDDEN V-functions do impact V-functions that are
visible. For example, the DERIVED V-function read is in part derived from
fault. Apart from the designation HIDDEN, HIDDEN V-functions are
specified in exactly the same way as ordinary V-functions.
A third special kind of V-function, the OV-function, will be
treated later.
In addition to declarations for the storage elements of the
module, the FUNCTIONS section contains declarations for the operators,
or O-functions, that change the values of those elements. As with
V-function declarations, each O-function declaration begins with a header
giving its name and formal argument list. Since O-functions do not store
values but only change the values of V-functions, no "result" identifier
is given. As with V-function declarations, an EXCEPTIONS subsection
may be present, restricting the range of acceptable arguments. Once again,
the definition of each exception condition appears in the EXCEPTIONS
section of the module specification.
The substance of an O-function specification is contained in
its EFFECTS subsection--the section that describes exactly what the
O-function does. More precisely, the EFFECTS section contains a statement
of the relationship between the state of the module (i.e., the values of
its V-functions) before the O-function is called, and the state just
after it is called. The O-function delete-task(proc task) in the
reader-voter is a typical example. This O-function has two effects. It changes
the value of the V-function task-set, deleting task from the set
task-set(proc). It also changes the value of the V-function
proc-bus-assignments, causing it to be undefined for the argument pair (proc,task).
In the specification, quotes are used to distinguish the values of
V-functions before the call from those after the call. Thus,
'task-set(proc)' refers to the state before the call while task-set(proc) refers
to that after the call.
It is quite important to note that the statements in EFFECTS
sections are assertions, i.e., mathematical statements of a relationship
among states. They are not in any way procedural as would be, say,
assignments in some programming language. For example, the assertion
task-set(proc) = 'task-set(proc)' ∪ {task}
could equally well be written
'task-set(proc)' ∪ {task} = task-set(proc)
since the = stands for equality, not for assignment.
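One way to see the difference between an assertion and an assignment is to write the effect as a predicate over a pair of states and check that exchanging the two sides of the equality changes nothing. The sketch below is only illustrative; the function names and the dictionary representation of task-set are assumptions.

```python
def add_task_effect(before, after, proc, task):
    """EFFECTS-style assertion relating the state before and after a call."""
    return after[proc] == before[proc] | {task}


def add_task_effect_flipped(before, after, proc, task):
    # The same assertion with the two sides of "=" exchanged.
    return before[proc] | {task} == after[proc]


# 'task-set' before the call, and two candidate states after the call.
before = {"p1": frozenset({"le"})}
good = {"p1": frozenset({"le", "nav"})}
bad = {"p1": frozenset({"nav"})}

holds = add_task_effect(before, good, "p1", "nav")
holds_flipped = add_task_effect_flipped(before, good, "p1", "nav")
fails = add_task_effect(before, bad, "p1", "nav")
```

Neither predicate modifies either state; each only reports whether the required relationship holds, which is the sense in which EFFECTS statements are non-procedural.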
Just  as  HIDDEN  V-functions  are  used  to  mask  certain  state  in- 
formation,  HIDDEN  O-functions  are  used  to mask certain  changes in state. 
The O-function  cause-fault  is  of  this  type. The cause-fault  operation 
simulates  a  hardware  failure  that  impacts  certain  reads.  Since  this 
operation  is  not  really  available  to  the  SIFT  system  but  is  rather  part 
of  the  internal  affairs  of  the  module,  it  is  made  hidden. 
In addition to V-function and O-function declarations, the
FUNCTIONS section contains declarations for a function which is a
combination of the two--the OV-function. An OV-function may be viewed either
as an O-function that returns a value, or as a V-function whose invocation
produces a side effect. The OV-function vote-read is of this kind since
it both returns a value of word and potentially changes the values of the
V-functions fatal-error and disagreement-set.
3. The DECLARATIONS Section
Part of the specification of any Parnas module is a section
declaring the types of the identifiers (such as formal arguments to V- and
O-functions) used in other parts of the module specification. In most
programming languages, declarations are used for two purposes: to direct
allocation of storage, and to provide for type-checking. In the Parnas
context, storage is associated only with V-functions, which are declared
in the FUNCTIONS section. The DECLARATIONS section of a module
specification concerns itself exclusively with the typing of formal arguments
(including the "result" arguments of V-functions).
In the DECLARATIONS section of the reader-voter, there are
declarations for integer and boolean identifiers much in the style of
ALGOL. Unlike ALGOL, however, the language of module specifications
provides an extensible type facility; that is, the designer of a module
may introduce new, abstract types.
For this purpose, the DECLARATIONS section may include a TYPE
subsection in which types (as opposed to objects of a given type) are
declared. The TYPE subsection of the reader-voter contains declarations
for three new primitive types (PROC, TASK, and MACHINEWORD) and a new
compound type (PAIR). The word "DESIGNATOR" indicates that the new type
is primitive, i.e., is not constructed from existing types. The name
PROC suggests that objects of this type are intended to designate
processors, as indeed they are. From the formal point of view, however,
objects of type PROC have no intrinsic properties other than that they
are distinct from all objects of any other type.
The new type PAIR, on the other hand, is defined in terms of
more primitive types. The type designation STRUCTURE (PROC: proc, integer:
bus) indicates that objects of type PAIR are composed of two parts, one
of which is a PROC and the other of which is an integer. The identifiers
proc and bus in the declaration of PAIR are selectors; given an object p
of type PAIR, p.proc refers to the component of p which is of type PROC,
and p.bus refers to that of type integer.
New types may also be developed using the SET-OF or BAG-OF
constructs. The identifier sp, for example, is declared as a set of objects
each of which is of type PROC. Similarly, votes is declared as a bag
(that is, a kind of set in which a given member may have repeated
instances) of MACHINEWORDs.
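Readers familiar with modern languages may find it helpful to see rough analogues of these type constructs. The mapping below is an informal illustration, not part of the specification language: a frozen dataclass stands in for a DESIGNATOR type and for STRUCTURE, a frozenset for SET-OF, and collections.Counter for BAG-OF. The field name of Proc and the sample values are assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """Primitive (DESIGNATOR) type: only identity matters."""
    name: str


@dataclass(frozen=True)
class Pair:
    """Analogue of STRUCTURE(PROC: proc, integer: bus)."""
    proc: Proc
    bus: int


p = Pair(proc=Proc("p1"), bus=2)
sp = frozenset({Proc("p1"), Proc("p2"), Proc("p1")})  # SET-OF PROC
votes = Counter([17, 17, 42])                         # BAG-OF MACHINEWORD
```

The selectors p.proc and p.bus correspond directly to the dataclass fields, and the bag keeps the repeated member 17 with its multiplicity while the set discards the repeated processor.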
4. The PARAMETERS Section
The PARAMETERS section contains declarations for the constants
of the design. Parameters can be viewed as V-functions whose values are
fixed once and for all at the time the module is implemented. They are
frequently used in exception conditions to designate the maximum or
minimum values that V- or O-function arguments may take. The integer parameter
max-tasks, for example, indicates the maximum number of tasks a processor
can accommodate. Max-tasks appears in the definition of the exception
condition too-many-tasks. Note that the syntax of parameter declarations
mirrors that of declarations in the DECLARATIONS section.
5. The DEFINITIONS Section
Just as assembly language macros save programmers the labor of
writing out repeated instances of a routine, definitional macros allow
the specifier of a module to avoid repeated instances of an expression.
Each definition begins with a MACRO header giving the name of the macro,
a (possibly empty) formal argument list, and a result identifier (used
for type checking).
Two definitional macros are used in the reader-voter module--
majority-opinion(proc task offset) and dissenting-pairs(proc task offset).
As it happens, each definition is used only once--in the EFFECTS section
of the OV-function read-vote. Macros were used in this case not to save
writing, but to make the EFFECTS section easier to read and understand.
6. The EXCEPTIONS Section
Exceptions were described earlier as boolean conditions used to
restrict the intended argument domains of V-functions and O-functions.
Because a single exception frequently applies to several V- and/or
O-functions, all exceptions are defined as macros. The syntax used is
exactly the same as that used for macros in the DEFINITIONS section.
Memory  Addressing 
The  functions  of  the  abstract  machine  instance  in  processor  proc 
are  called  by  abstract  machine  instances  above  memory  addressing  in  the 
hierarchy  and  tasks  executing on proc. These  tasks  will  use  memory ad- 
dressing in order  to  execute  instructions  in  their  programs  and  to  access 
temporary  data  locations. The machine  includes  some  simple  protection 
mechanisms in order  to  prevent  a  task  from  writing  beyond  the  limits  of 
its  address  space. 
Initially the only task known to memory addressing, as indicated by
the value of the V-function task-set, is the local executive (LE). The
V-function mem-area-write defines the memory area allocated to a task for
writing. Initially a fixed area is assigned to the LE; the remaining area
is free as indicated by the value of the V-function area-free. The
O-functions assign-mem-area and make-free respectively allocate memory area
to a task and deallocate the area that was previously assigned to a task.
In order to ensure that an errant, unproved application task does not
deleteriously affect another task or the system, these functions are not
accessible to application tasks. Instead the LE and the buffer abstract
machine will have the major responsibility for managing the memory in its
processor.
The V-function memory is called by a task, or an abstract machine
program, in order to read the contents at a memory location. The
O-function write is called in order to modify the value at a location.
MODULE memory-addressing
DECLARATIONS
    TYPE
        PROC, TASK, WORD = DESIGNATOR
    END-TYPE
    WORD machineword
    boolean b parity
    integer offset length address
    TASK task
    PROC proc
    SET-OF TASK s
END-DECLARATIONS
PARAMETERS
    TASK le (; local executive task)
    SET-OF PROC proc-set (; set of processors)
    SET-OF TASK tasks (; set of valid tasks)
    integer size-le (; number of words occupied by local executive)
    integer mem-size (; total number of words of a single memory)
    integer max-tasks (; maximum allowable number of tasks)
END-PARAMETERS
EXCEPTIONS
    MACRO no-proc(proc) = b
        not proc member-of proc-set
    MACRO not-a-task(task) = b
        not task member-of tasks
    MACRO out-of-bounds(address) = b
        address > mem-size - 1 or address < 0
    MACRO task-not-in-proc(proc task) = b
        not task member-of 'task-set(proc)'
    MACRO not-authorized-write(proc task address) = b
        not( mem-area-write(proc task)[1] <= address and
             address <= mem-area-write(proc task)[1] +
                        mem-area-write(proc task)[2] )
    MACRO too-many-tasks(proc) = b
        cardinality('task-set(proc)') >= max-tasks
    MACRO task-in-proc(proc task) = b
        task member-of 'task-set(proc)'
    MACRO area-not-free(proc base length) = b
        not 'area-free(proc base length)'
END-EXCEPTIONS
FUNCTIONS
    VFUN memory(proc address) = word
        (; memory contents of processor proc at given address)
        EXCEPTIONS
            out-of-bounds(address)
            no-proc(proc)
        END-EXCEPTIONS
        INITIALLY if 0 <= i <= size-le then memory(proc, i) = mem-le(i)
                  else if size-le <= i < mem-size then memory(proc, i) = 0
                  else undefined
    VFUN area-free(proc,base,length) = b
        (; indicates whether or not the locations in proc from
           base to base+length-1 are free)
        EXCEPTIONS
            out-of-bounds(base)
            out-of-bounds(base+length-1)
            no-proc(proc)
        END-EXCEPTIONS
        INITIALLY if base < size-le then area-free(proc,base,i) = false
                  else area-free(proc,base,i) = true
    VFUN mem-area-write(proc,task) = <base,length>
        (; indicates memory range within which task is allowed to write)
        EXCEPTIONS
            no-proc(proc)
            task-not-in-proc(proc, task)
        END-EXCEPTIONS
        INITIALLY if task = le then
                      mem-area-write(proc,le) = <0,size-le - 1>
                  else undefined
    VFUN task-set(proc) = s
        (; indicates set of tasks assigned to processor proc)
        INITIALLY task-set(proc) = {le}
    OFUN write(proc task address word)
        (; writes word in address of processor proc for task task)
        EXCEPTIONS
            task-not-in-proc(proc, task)
            no-proc(proc)
            not-authorized-write(proc task address)
        END-EXCEPTIONS
        EFFECTS
            memory(proc, address) = word
        END-EFFECTS
    OFUN assign-mem-area(proc task base length)
        (; assigns authorized area in proc into which task can write)
        EXCEPTIONS
            task-in-proc(proc task)
            area-not-free(proc base length)
            not-a-task(task)
            too-many-tasks(proc)
        END-EXCEPTIONS
        EFFECTS
            task-set(proc) = 'task-set(proc)' union {task}
            forall i, j (base <= i <= j <= length - 1)
                implies area-free(proc i j) = false
        END-EFFECTS
    OFUN make-free(proc task)
        (; deassigns task from proc, causing memory occupied by task
           to be deallocated)
        EXCEPTIONS
            no-proc(proc)
            task-not-in-proc(proc task)
        END-EXCEPTIONS
        EFFECTS
            let base = 'mem-area-write(proc task)[1]'
            let length = 'mem-area-write(proc task)[2]'
            forall i, j (base <= i <= j <= length - 1)
                implies area-free(i j)
        END-EFFECTS
END-MODULE
The Buffer  
The buffer module facilitates the transfer of computation results
from one processor to another. The module contains a storage area, or
buffer, for each triple <proc, task1, task2> such that:
(1) proc is a processor
(2) task1 is a task currently running in proc
(3) task2 is a task currently running on some processor
(possibly proc) that requires computation results
from task1.
The buffer associated with each triple actually consists of two
separate storage areas: the even buffer and the odd buffer. On even
iterations of the computation for a given task, results are stored in
the even buffer; on odd iterations, results are stored in the odd buffer.
The need for such a scheme arises from the kind of synchronization
situation illustrated in Figure IX-3. The figure shows a few iterations
of computation in processors 1, 2, and 3. The solid horizontal lines
represent intervals during which particular tasks are executed. Now
suppose that Task B requires for its input, on each iteration, the output
of Task A on the previous iteration. Because tasks executing in different
processors are only loosely synchronized, Task B may not be executed
concurrently in processors 2 and 3. More particularly, B may begin in
processor 2 before A completes in processor 1, while B begins in
processor 3 after A completes. If a single storage area is used to hold
the results of A, the programs running B in processors 2 and 3 will read
different input values. Because these inputs are subject to voting, a
difficulty arises.

FIGURE IX-3 TIMING DIAGRAM FOR TWO COMMUNICATING PROCESSES
(Task A and Task B executing in processors 1, 2, and 3 over
iterations 1, 2, and 3.)

The introduction of separate buffers for odd and even iterations
allows programs in different processors to read the same data regardless
of their relative positions in the iteration frame. In the example
situation, the results of the first iteration are placed in the odd buffer
and those of the second iteration are placed in the even buffer. During
the second iteration frame, both processors running B take their input
from the odd buffer.
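
The even/odd scheme described above can be illustrated with a minimal
Python sketch. The class DoubleBuffer and its method names are
hypothetical and are not part of the SIFT specification; the sketch only
assumes that a producer writes into the buffer matching the parity of its
iteration and that a consumer reads the output of the previous iteration.

```python
class DoubleBuffer:
    """Illustrative even/odd buffer pair for one <proc, task1, task2> triple."""

    def __init__(self, size):
        self.even = [0] * size
        self.odd = [0] * size

    def _select(self, iteration):
        # Even iterations use the even buffer, odd iterations the odd buffer.
        return self.even if iteration % 2 == 0 else self.odd

    def write(self, iteration, offset, word):
        # The producer (Task A) deposits the result of `iteration`.
        self._select(iteration)[offset] = word

    def read_previous(self, iteration, offset):
        # The consumer (Task B) running `iteration` wants A's previous output.
        return self._select(iteration - 1)[offset]


buf = DoubleBuffer(size=1)
buf.write(iteration=1, offset=0, word=11)         # A completes iteration 1
early = buf.read_previous(iteration=2, offset=0)  # B starts before A finishes
buf.write(iteration=2, offset=0, word=22)         # A completes iteration 2
late = buf.read_previous(iteration=2, offset=0)   # B starts after A finishes
```

In this sketch both copies of B read the same word even though one starts
before and the other after A completes its second iteration. With a single
storage area, the late reader would have seen the newer value.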
The specifications for the buffer module are largely self-explanatory.
The module's chief function is the OV-function
read(proc1, proc2, bus, task1, task2, parity, offset);
read is called by the program running task2 in proc2 to obtain input from
the program running task1 in proc1. If parity is TRUE, the appropriate
even buffer in proc1 is read; otherwise, the odd buffer is read. The
V-functions connected-r and connected-t model the necessary bus switching.
By convention, bus 0 designates the internal connection of a processor
to its own memory.
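
As a rough illustration of the read call and the bus-0 convention, the
following Python sketch selects a buffer by parity and rejects an illegal
bus number. The dictionary layout, the BadBus exception class, and the
numb_busses argument are assumptions made for this example; only the
argument order and the bad-bus condition follow the specification.

```python
class BadBus(Exception):
    """Raised when the requested bus cannot be used (illustrative)."""


def bad_bus(bus, proc1, proc2, numb_busses):
    # Mirrors the bad-bus macro: out of range, or bus 0 used between
    # two different processors (bus 0 is a processor's own memory path).
    return bus > numb_busses or bus < 0 or (bus == 0 and proc1 != proc2)


def read(buffers, proc1, proc2, bus, task1, task2, parity, offset,
         numb_busses):
    # `buffers` maps (proc, task1, task2) to an (even, odd) pair of lists;
    # this storage layout is an assumption of the sketch.
    if bad_bus(bus, proc1, proc2, numb_busses):
        raise BadBus(bus)
    even, odd = buffers[(proc1, task1, task2)]
    return even[offset] if parity else odd[offset]


buffers = {("p1", "a", "b"): ([10, 20], [30, 40])}
word_even = read(buffers, "p1", "p2", 1, "a", "b", True, 1, numb_busses=3)
word_odd = read(buffers, "p1", "p2", 1, "a", "b", False, 0, numb_busses=3)
```

The sketch omits the out-of-bounds and no-buffer exceptions and does not
model the connected-r and connected-t switching state.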
MODULE buffer 
DECLARATIONS 
TYPE 
PROC = EXTERNAL  DESIGNATOR 
TASK = EXTERNAL  DESIGNATOR 
MACHINEWORD = EXTERNAL  DESIGNATOR 
END-TYPE 
integer offset, length, bus, number
boolean b, parity
PROC proc, proc1, proc2
TASK task1, task2, task
MACHINEWORD word
END-DECLARATIONS 
PARAMETERS 
integer  max-buff  ;maximum  number of buffers  allowed  in  a  processor 
integer  max-buff-size  ;maximum  size of a  buffer 
integer  numb-busses  ;number of busses  in  system 
END-PARAMETERS 
EXCEPTIONS 
MACRO no-buffer(proc,task1,task2) = b
not 'buffs-exist(proc,task1,task2)'
MACRO out-of-bounds(proc,task1,task2,offset) = b
offset >= 'buffs-size(proc,task1,task2)'
MACRO buffer-too-long(length) = b
length > max-buff-size
MACRO bad-bus(bus) = b
bus > numb-busses or bus < 0 or (bus = 0 and not proc1 = proc2)
MACRO same-proc(proc1, proc2) = b
proc1 = proc2
END-EXCEPTIONS
FUNCTIONS 
VFUN connected-r(proc) = bus
(; indicates which bus proc is connected to for receiving data)
HIDDEN
INITIALLY undefined
VFUN connected-t(bus) = proc
(; indicates which proc bus is connected to for transmitting data)
HIDDEN
INITIALLY undefined
VFUN buff-mem-odd(proc,task1,task2,offset) = word
(; stores words in "odd" buffer for transmission from
task1 to task2)
HIDDEN
INITIALLY undefined
VFUN buff-mem-even(proc,task1,task2,offset) = word
(; stores words in "even" buffer for transmission from
task1 to task2)
HIDDEN
INITIALLY undefined
VFUN buffs-exist(proc,task1,task2) = b
(; indicates whether buffers exist in proc for the
transmission of data from task1 to task2)
INITIALLY undefined
VFUN buffs-size(proc,task1,task2) = length
(; indicates size of buffers in proc for transmission of data
from task1 to task2)
EXCEPTIONS
no-buffer(proc,task1,task2)
END-EXCEPTIONS
INITIALLY undefined
OFUN create-buffers(proc,task1,task2,length)
(; establishes buffers of size length in proc in which task1
will deposit data for task2)
EXCEPTIONS
too-many-buffers(proc)
buffer-too-long(length)
END-EXCEPTIONS
EFFECTS
buffs-exist(proc,task1,task2)
buffs-size(proc,task1,task2) = length
forall i (0 <= i <= length) implies
buff-mem-odd(proc,task1,task2) = 0
forall i (0 <= i <= length) implies
buff-mem-even(proc,task1,task2) = 0
END-EFFECTS
OFUN write(proc,task1,task2,parity,offset,word)
(; called by task1 to deposit data into the appropriate buffer
for task2)
EXCEPTIONS
no-buffer(proc,task1,task2)
out-of-bounds(proc,task1,task2,offset)
END-EXCEPTIONS
EFFECTS
if parity then buff-mem-even(proc,task1,task2,offset) = word
else buff-mem-odd(proc,task1,task2,offset) = word
END-EFFECTS
OVFUN read(proc1,proc2,bus,task1,task2,parity,offset) = word
(; called by task2 running in proc2 to receive data from
the appropriate buffer deposited by task1 in proc1)
EXCEPTIONS
no-buffer(proc1,task1,task2)
out-of-bounds(proc1,task1,task2,offset)
bad-bus(bus)
END-EXCEPTIONS
EFFECTS
if not bus = 0 then connected-r(proc2) = bus
and connected-t(bus) = proc1
if parity then word = 'buff-mem-even'(proc1,task1,task2,offset)
else word = 'buff-mem-odd'(proc1,task1,task2,offset)
END-EFFECTS
OFUN delete-buffers(proc,task1,task2)
(; deletes the buffer in proc for the transmission of data
from task1 to task2)
EXCEPTIONS
no-buffer(proc,task1,task2)
END-EXCEPTIONS
EFFECTS
buffs-exist(proc,task1,task2) = false
buffs-size(proc,task1,task2) = undefined
buff-mem-even(proc,task1,task2,offset) = undefined
buff-mem-odd(proc,task1,task2,offset) = undefined
END-EFFECTS
END-FUNCTIONS 
END-MODULE 
Dispatcher
The primary role of the dispatcher is to store task schedules and
dispatch tasks as dictated by the currently applying schedule and by
external events.
The dispatcher responds to the passage of time in determining the
task to be dispatched. There are two clocks pertinent to this machine:
a high speed clock, as represented by the O-function timer, and a slower
clock as represented by the O-function clock-tick. Each of these timing
O-functions is assumed to be called by a separate independent process
that can operate asynchronously with the other system tasks. That is,
these clocks are treated like interrupt signals.
The interval between successive clock-ticks is called a frame. The
fastest tasks are dispatched once every frame; slower tasks are dispatched
every n-th frame, n > 1. As previously noted, the dispatcher handles two
types of tasks: scheduled and priority. The scheduled tasks run to
completion every time they are dispatched. A task calls the O-function
job-complete to indicate that it has completed execution. Each task is
also given a maximum time for execution, as measured by calls on timer.
The V-function max-task-time records the maximum allowed time, and the
V-function time-current-task records the remaining execution time. If a
task overruns, it is undispatched and the next task is dispatched. The
information that the dispatcher needs about scheduled tasks is provided
by the LE calling the O-function add-regularly-scheduled-task. Besides
setting the value for max-task-time, this function also makes known the
initial status of a task (e.g., entry point, initial value of registers)
to the dispatcher. The actual schedule itself is given to the dispatcher
by calling the O-function add-regular-schedule. A schedule is a circular
list of scheduled tasks, with a pointer identified by next-selected-element.
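
The interplay of the circular schedule, job-complete, and timer can be
sketched in Python as follows. This is a simplified illustration under
stated assumptions: it handles regularly scheduled tasks only (no gpt
entries or priority tasks), and the class name Dispatcher and its
attribute names are hypothetical stand-ins for the V-functions named in
the text.

```python
class Dispatcher:
    """Illustrative dispatcher for regularly scheduled tasks only."""

    def __init__(self, schedule, max_task_time):
        self.schedule = list(schedule)            # circular list of tasks
        self.max_task_time = dict(max_task_time)  # per-task time budget
        self.position = 0                         # next-selected element
        self.overrun_tasks = set()
        self._dispatch()

    def _dispatch(self):
        # Select the next task in the circular list and reset its budget.
        self.current_task = self.schedule[self.position]
        self.position = (self.position + 1) % len(self.schedule)
        self.time_current_task = self.max_task_time[self.current_task]

    def job_complete(self):
        # Called by the running task when it finishes normally.
        self._dispatch()

    def timer(self):
        # Called on each tick of the high speed clock.
        self.time_current_task -= 1
        if self.time_current_task == 0:
            # Budget exhausted: log the overrun and move on.
            self.overrun_tasks.add(self.current_task)
            self._dispatch()


d = Dispatcher(["a", "b"], {"a": 2, "b": 3})
d.timer()
d.timer()          # task "a" overruns its budget of 2 and is undispatched
d.job_complete()   # task "b" finishes normally; the list wraps to "a"
```

The sketch keeps a single overrun set, whereas the specification below
records scheduled and priority overruns separately.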
Priority tasks are only eligible to be dispatched when all of the
scheduled tasks have completed their execution in a frame. The parameter
gpt, when it appears in a schedule, indicates that a priority task is to
be dispatched. A clock tick occurring during the execution of a priority
task signifies that the status of the currently executing priority task
is to be saved, and the next scheduled task is to be dispatched. When a
priority task completes its execution, or its execution time exceeds the
value of max-task-time, another priority task is considered for
dispatching. The priority task selected is the task of the highest
priority, as reflected by the value of the V-function priority, such that
the time since its last execution exceeds the desired period for that
task, as reflected by the value of the V-function period-priority.
Information about priority tasks is given to the dispatcher via the
O-function add-priority-task.
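
The selection rule for priority tasks can be sketched in Python. The
sketch follows the definition dispatch-next-priority-task in the
specification below: a task is eligible when its time-to-next-exec is 0,
and smaller priority values indicate higher priority. The dictionary
representation, the function names, and the sample task names are
assumptions of this example; a null task with period 0 is included so
that some task is always eligible.

```python
def select_priority_task(priority, time_to_next_exec, period_priority):
    # Eligible tasks are those whose waiting time has run down to zero.
    eligible = [t for t in priority if time_to_next_exec[t] == 0]
    # Smaller priority value means higher priority.
    chosen = min(eligible, key=lambda t: priority[t])
    # The chosen task must now wait one full period before running again.
    time_to_next_exec[chosen] = period_priority[chosen]
    return chosen


def clock_tick(time_to_next_exec):
    # Each clock tick brings every waiting priority task one tick closer.
    for task, remaining in time_to_next_exec.items():
        if remaining != 0:
            time_to_next_exec[task] = remaining - 1


priority = {"nav": 1, "log": 5, "null": 99}
period = {"nav": 3, "log": 2, "null": 0}
waiting = {"nav": 0, "log": 0, "null": 0}

first = select_priority_task(priority, waiting, period)
second = select_priority_task(priority, waiting, period)
third = select_priority_task(priority, waiting, period)
```

When several eligible tasks share the same priority value, this sketch
picks the first one encountered; the specification leaves that choice
open.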
The dispatcher, using the V-function iter-count, records the number
of iterations completed by each task.
MODULE dispatcher
DECLARATIONS
TYPE
TASK = DESIGNATOR
TIME = DESIGNATOR
MACHINEWORD = DESIGNATOR
END-TYPE
integer posint
boolean b
TIME time, time1, time2
TASK task
MACHINEWORD word
CIRCULAR-LIST task-list
TUPLE-OF MACHINEWORD word-tuple
ONE-OF {regular, priority} kind-of-task
END-DECLARATIONS
PARAMETERS
TASK le (; local executive task)
TASK gpt (; generic priority task-- GPT is the "current-task"
when a specific priority task is to be
scheduled)
TASK null-task (; the empty priority task used to fill in when no
real task is to be scheduled)
TUPLE-OF MACHINEWORD zero-tuple (; tuple consisting of all zero words)
TUPLE-OF MACHINEWORD status-le (; initial status of local executive)
integer pn (; priority level of null-task)
integer max-scheduled-tasks (; maximum number of
regularly scheduled tasks)
integer max-priority-tasks (; maximum number of priority tasks)
integer status-length (; number of machine words
comprising status of any task)
TIME max-time-le (; maximum execution time for local executive)
TIME max-time-null-task (; maximum execution time for null task)
END-PARAMETERS
DEFINITIONS
dispatch-next-regular-task
current-task = currently-selected-element('task-list')
c = advance-selector('task-list')
time-current-task =
'max-task-time(currently-selected-element('task-list'))'
status-current-task = 'initial-status-task(current-task)'
dispatch-interrupted-priority-task
current-task = currently-selected-element('task-list')
c = advance-selector('task-list')
time-current-task = 'time-interrupted-task'
status-current-task = 'status-interrupted-task'
save-status-current-priority-task
time-interrupted-task = 'time-current-task'
status-interrupted-task = 'status-current-task'
dispatch-next-priority-task
let s = {task | task member-of 'priority-task-set' and
'time-to-next-exec(task)' = 0}
exists task1
task1 member-of s
forall task2 task2 member-of s implies
'priority(task1)' <= 'priority(task2)'
current-priority-task = task1
time-current-task = 'max-task-time(task1)'
status-current-task = 'initial-status-task(task1)'
time-to-next-exec(task1) = 'period-priority(task1)'
END-DEFINITIONS
EXCEPTIONS
MACRO no-task(task) = b
not task member-of task-set and
not task member-of priority-task-set
MACRO task-is-gpt(task) = b
task = gpt
MACRO not-a-priority-task(task) = b
not task member-of priority-task-set
MACRO task-already-known(task) = b
task member-of task-set or task member-of priority-task-set
MACRO too-many-regular-tasks = b
cardinality(task-set) >= max-scheduled-tasks
MACRO tuple-wrong-length(word-tuple) = b
not length(word-tuple) = status-length
MACRO too-many-priority-tasks = b
cardinality(priority-task-set) >= max-priority-tasks
MACRO task-not-regular(task) = b
not task member-of task-set
MACRO not-current-task(task) = b
not task = 'current-task'
MACRO null-task(task) = b
task = null-task
MACRO next-task-not-known = b
not next-selected-task('task-list') member-of 'task-set'
MACRO current-task-not-priority = b
not 'current-task' = gpt
MACRO next-task-priority = b
next-selected-element('task-list') = gpt
END-EXCEPTIONS
FUNCTIONS
VFUN task-set = s
(; set of all regular tasks assigned to dispatcher)
EXCEPTIONS
END-EXCEPTIONS
INITIALLY s = {le, gpt}
VFUN task-list = c
(; circular list of regularly-scheduled tasks)
EXCEPTIONS
END-EXCEPTIONS
INITIALLY c = NEW circular-list(le, gpt)
VFUN max-task-time(task) = time
(; maximum time allowed for execution of a scheduled task)
EXCEPTIONS
no-task(task)
task-is-gpt(task)
END-EXCEPTIONS
INITIALLY max-task-time(task) =
if task = le then max-time-le-task
else if task = null-task then max-time-null-task
else undefined
VFUN current-task = task
(; currently dispatched scheduled task)
EXCEPTIONS
END-EXCEPTIONS
INITIALLY current-task = le
VFUN time-current-task = time
(; allowable time remaining for currently dispatched task)
EXCEPTIONS
END-EXCEPTIONS
INITIALLY time = time-le
VFUN status-current-task = word-tuple
(; status (values of program counter and other registers)
of currently dispatched task)
EXCEPTIONS
END-EXCEPTIONS
INITIALLY status-current-task = status-le
VFUN initial-status-task(task) = word-tuple
(; initial status of each task)
EXCEPTIONS
no-task(task)
END-EXCEPTIONS
INITIALLY initial-status-task(le) = status-le
initial-status-task(null-task) = zero-tuple
VFUN status-interrupted-task = word-tuple
(; holds status of interrupted priority job)
EXCEPTIONS
END-EXCEPTIONS
INITIALLY status-interrupted-task = zero-tuple
VFUN current-priority-task = task
(; gives identity of currently executing or
interrupted priority task)
EXCEPTIONS
END-EXCEPTIONS
INITIALLY task = null-task
VFUN time-interrupted-task = time
(; remaining execution time for interrupted priority task)
EXCEPTIONS
END-EXCEPTIONS
INITIALLY time = 0
VFUN overrun-tasks = <s1, s2>
(; indicates tasks which have overrun their allotted times;
s1 is set of scheduled tasks; s2 is set of priority tasks)
EXCEPTIONS
NO-EXCEPTIONS
INITIALLY s1 = {}
s2 = {}
VFUN priority(task) = posint
(; priority level of priority task--smaller values indicate
higher priority)
EXCEPTIONS
not-a-priority-task(task)
END-EXCEPTIONS
INITIALLY priority(null-task) = pn
VFUN period-priority(task) = time
(; minimum scheduling frequency for priority task
measured in clock ticks)
EXCEPTIONS
not-a-priority-task(task)
END-EXCEPTIONS
INITIALLY time = 0
VFUN time-to-next-exec(task) = time
(; minimum allowable time to next dispatching
of priority task task)
EXCEPTIONS
not-a-priority-task(task)
END-EXCEPTIONS
INITIALLY time-to-next-exec(null-task) = 0
VFUN iter-count(task) = posint
(; indicates the number of iterations completed by task)
EXCEPTIONS
no-task(task)
END-EXCEPTIONS
INITIALLY iter-count(le) = 0
iter-count(null-task) = 0
OFUN add-regularly-scheduled-task(task, time, word-tuple)
(; makes new task known to dispatcher--information concerning
maximum execution time and initial status are passed)
EXCEPTIONS
task-already-known(task)
too-many-scheduled-tasks
tuple-wrong-length(word-tuple)
END-EXCEPTIONS
EFFECTS
task-set = 'task-set' union {task}
max-task-time(task) = time
initial-status-task(task) = word-tuple
iter-count(task) = 0
END-EFFECTS
OFUN delete-scheduled-task(task)
(; removes regularly scheduled task task)
EXCEPTIONS
task-not-regular(task)
END-EXCEPTIONS
EFFECTS
task-set = 'task-set' - {task}
max-task-time(task) = undefined
initial-status-task(task) = undefined
iter-count(task) = undefined
END-EFFECTS
OFUN add-regular-schedule(c)
(; give the schedule of regularly scheduled tasks as the
circular list c)
EXCEPTIONS
END-EXCEPTIONS
EFFECTS
task-list = c
END-EFFECTS
OFUN add-priority-task(task, posint, time1, time2, word-tuple)
(; makes a new priority task with priority level posint,
maximum execution time time1, minimum time between
executions time2, and initial status word-tuple
known to the dispatcher)
EXCEPTIONS
task-already-known(task)
too-many-priority-tasks
tuple-wrong-length(word-tuple)
END-EXCEPTIONS
EFFECTS
priority-task-set = 'priority-task-set' union {task}
max-task-time(task) = time1
initial-status-task(task) = word-tuple
iter-count(task) = 0
priority(task) = posint
period-priority(task) = time2
time-to-next-exec(task) = 0
END-EFFECTS
OFUN delete-priority-task(task)
(; removes priority task task)
EXCEPTIONS
not-a-priority-task(task)
END-EXCEPTIONS
EFFECTS
priority-task-set = 'priority-task-set' - {task}
max-task-time(task) = undefined
initial-status-task(task) = undefined
iter-count(task) = undefined
priority(task) = undefined
period-priority(task) = undefined
time-to-next-exec(task) = undefined
END-EFFECTS
OFUN assign-iter-count(task, posint)
(; assigns iteration count posint to task)
EXCEPTIONS
no-task(task)
END-EXCEPTIONS
EFFECTS
iter-count(task) = posint
END-EFFECTS
OFUN job-complete(task)
(; called by currently dispatched task on completing
execution; a new task is then dispatched-- if task is a
priority task, the new task will be as well. if task is
a regularly-scheduled task, the new task may be either
scheduled or priority. we assume that null-task never
calls job-complete.)
EXCEPTIONS
null-task(task)
not-current-task(task)
next-task-not-known
END-EXCEPTIONS
EFFECTS
iter-count(task) = 'iter-count(task)' + 1
if not 'current-task' = gpt
and not next-selected-task('task-list') = gpt
then dispatch-next-regular-task
else if not 'current-task' = gpt and
next-selected-task = gpt
then dispatch-interrupted-priority-task
else dispatch-next-priority-task
END-EFFECTS
OFUN clock-tick
(; signals interruption of priority task and subsequent
dispatching of regularly-scheduled task)
EXCEPTIONS
current-task-not-priority
next-task-priority
next-task-not-known
END-EXCEPTIONS
EFFECTS
forall task task member-of 'priority-task-set' and
not 'time-to-next-exec(task)' = 0
implies
time-to-next-exec(task) =
'time-to-next-exec(task)' - 1
save-status-current-priority-task
dispatch-next-scheduled-task
END-EFFECTS
OFUN timer
(; decrements time remaining for current task, logs
error and dispatches next task if overrun occurs)
EXCEPTIONS
next-task-not-known
END-EXCEPTIONS
EFFECTS
time-current-task = 'time-current-task' - 1
if time-current-task = 0 then
if 'current-task' = gpt then
overrun-tasks[2] =
'overrun-tasks'[2] union {'current-priority-task'}
else overrun-tasks[1] =
'overrun-tasks'[1] union {'current-task'}
if not 'current-task' = gpt and
not next-selected-element('task-list') = gpt then
dispatch-next-scheduled-task
else if not 'current-task' = gpt
and next-selected-element('task-list') = gpt
then
dispatch-interrupted-priority-task
else dispatch-next-priority-task
END-EFFECTS
The Reader-Voter 
The reader-voter  module  provides  the  means  of  comparing  results  of 
different  processors  working on the same task. In the  present  design,  it 
is  the  only  mechanism  for  detecting  failures,  including  those  uncovered 
during  diagnosis. The reader-voter is  therefore  at  the  heart  of  SIFT's 
fault-tolerance  machinery. 
The reader-voter is also the lowest module in the system in which
processor and bus failures are explicitly modeled. The reason is that
this module is conceptually the lowest point at which errors are
introduced. From the point of view of the global executive, if the
reader-voter has not recorded a voting-discrepancy, no fault has
occurred--even if certain busses and processors have in reality failed.
One must bear in mind, of course, that the reader-voter, like all of
SIFT's functions, is actually distributed among the processors of the
system--the global executive compares results obtained by reader-voters
in all modules and is aware of the possibility of a fault affecting some
processor's reader-voter program.
The central  focus  of  the  module  is  the  OV-function  vote-read(proc 
task  offset).  This  function  is  used  by  the  program  associated  with  the 
task  task  running in processor  proc. The effect  of  vote-read  is  to  read 
(via  the  bus  network)  the  contents  of  the  virtual  address  <task,  offset> 
of  every  processor  performing  the  task  task,  and  then  vote on  the results. 
If some  value  receives  a  simple  majority,  that  value  is  returned;  other- 
wise, the  flag  fatal-error  is  set,  and  undefined  is  returned as the  value 
of the  call. In  either  case,  unless  the  vote  is  unanimous,  the  flag 
error-detected  is  set,  and  the  details of the  disagreement  are  logged. 
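
The voting rule described above can be sketched in Python. The function
name vote and its tuple return value are illustrative assumptions; the
sketch covers only the vote itself (strict majority, with flags for
disagreement and for the absence of a majority), not the bus reads or the
logging of dissenting processors. None stands in for the undefined value.

```python
from collections import Counter


def vote(values):
    """Return (result, error_detected, fatal_error) for a non-empty list."""
    counts = Counter(values)
    winner, count = counts.most_common(1)[0]
    # A simple majority means strictly more than half of the votes.
    has_majority = count > len(values) / 2
    # Any disagreement at all is recorded, even if a majority exists.
    error_detected = len(counts) > 1
    if has_majority:
        return winner, error_detected, False
    return None, error_detected, True


unanimous = vote([7, 7, 7])
outvoted = vote([7, 7, 9])
no_majority = vote([1, 2, 3])
```

With three processors, one faulty value is outvoted and flagged. With no
value held by more than half of the voters, the sketch reports a fatal
error and returns None.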
The outcome  of  a  call on vote-read  depends on a  number of factors. 
It clearly  depends on whether  and  which  processors  executing  task  are 
faulty  at  the  time. It similarly  depends on  which busses  used  in  com- 
municating  with  these  processors  are  faulty.  Naturally,  the  outcome  also 
depends on  what  the  polled  values  actually  are,  right  or  wrong. 
The other V- and O-functions in the module are used to model these
considerations. The most important of these are the V-functions
proc-bus-assignments, fault, correct-read, and read.
Proc-bus-assignments is a V-function of two arguments, proc and task.
For each processor and each task executed within that processor, it
stores the information as to what busses are to be used in reading values
from the other processors executing the task. This information is
represented as a set of PAIRS. Each pair has two parts: a proc part
designating a processor, and a bus part designating the bus to be used in
reading from that processor. It might be noted that the set
proc-bus-assignments(proc task) may contain a pair whose processor
component is proc itself--in other words, a processor may wish to read
from itself over a bus. A diagnostic routine, for example, might need
this capability.
The V-function fault is used to keep track of faults in the hardware.
For particular values of its arguments proc1, proc2, bus, task, and
offset, it returns TRUE or FALSE depending on whether or not a fault
exists that impacts a read by proc1, using bus bus, of the virtual
location <task, offset> in proc2. Note that this V-function is general
enough to model a complete breakdown of a proc or bus. If, for example,
bus 1 is severed, fault returns true on all legitimate combinations of
arguments for which bus = 1. Because fault is not intended to be visible
outside the module, it is declared HIDDEN. Another HIDDEN V-function,
correct-read(task offset), returns the value one would expect to find in
the virtual location <task, offset> of a non-faulty processor working on
task. Like fault, correct-read is abstract in the sense that it is not
actually implemented in the SIFT software. It is defined, rather, in
terms of an ideal processor.
The V-function read(proc1, proc2, bus, task, offset) delivers the
result of a read by proc1, over bus bus, of the location <task, offset>
in proc2. If a fault condition exists which affects that read (as
determined by the V-function fault), an undetermined value is returned.
Otherwise, the correct value correct-read(task offset) is returned. Note
that the value of read is completely determined by the values of fault
and correct-read; read is therefore a DERIVED V-function.
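
How read is derived from fault and correct-read can be sketched in
Python. Passing fault and correct_read as callables, the 16-bit word
width, and the use of a random value to stand in for the undetermined
result are all assumptions of this illustration and are not taken from
the specification.

```python
import random

WORD_BITS = 16  # assumed machine word width, for illustration only


def read(proc1, proc2, bus, task, offset, fault, correct_read, rng=random):
    # If a fault affects this particular read, the value is undetermined;
    # an arbitrary word models that here.
    if fault(proc1, proc2, bus, task, offset):
        return rng.getrandbits(WORD_BITS)
    # Otherwise the read delivers what an ideal processor would hold.
    return correct_read(task, offset)


ideal_memory = {("t", 0): 42}


def correct_read(task, offset):
    return ideal_memory[(task, offset)]


def bus_1_severed(proc1, proc2, bus, task, offset):
    # Models a complete breakdown of bus 1: every read over it is faulty.
    return bus == 1


good = read("p1", "p2", 2, "t", 0, bus_1_severed, correct_read)
suspect = read("p1", "p2", 1, "t", 0, bus_1_severed, correct_read,
               random.Random(0))
```

Here good is the ideal value because bus 2 is healthy, while suspect is
an arbitrary word that may or may not happen to equal the ideal value.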
MODULE reader-voter
DECLARATIONS
TYPE
PROC = DESIGNATOR
TASK = DESIGNATOR
MACHINEWORD = DESIGNATOR
PAIR = STRUCTURE(PROC: proc, integer: bus)
END-TYPE
PROC proc, proc1, proc2
SET-OF PROC sp
TASK task, task1, task2
SET-OF TASK st
PAIR pair, pair1
SET-OF PAIR setpairs
MACHINEWORD word, word1
BAG-OF MACHINEWORD votes
boolean b
integer bus, offset
END-DECLARATIONS
PARAMETERS
END-PARAMETERS
DEFINITIONS
MACRO majority-opinion(proc task offset) = word
LET votes =
BAG {word1 | exists pair
pair member-of 'proc-bus-assignments(proc task)'
and word = 'read(proc pair.proc pair.bus task
offset)'}
if exists word1
word1 member-of votes
and multiplicity(word1, votes) >
(1/2)cardinality(votes)
then word = word1
else word = undefined
220 
  MACRO dissenting-pairs(proc task offset) = setpairs
    if majority-opinion(proc task offset) = undefined
    then setpairs = 'proc-bus-assignments(proc task)'
    else
      setpairs =
        {pair | pair member-of 'proc-bus-assignments(proc task)'
           and not 'read(proc pair.proc pair.bus task offset)'
                   = majority-opinion(proc task offset)}
END-DEFINITIONS
EXCEPTIONS 
  MACRO no-proc-bus-assignment(proc1 proc2 bus task) = b
    not exists pair
      pair member-of 'proc-bus-assignments(proc1 task)'
      and pair.proc = proc2
      and pair.bus = bus
MACRO bad-offset(task  offset) = b 
offset < : or  offset > max-offset(task) 
  MACRO not-assigned(task) = b
    not exists proc
      task member-of 'task-set(proc)'
  MACRO bad-assignment(proc task setpairs) = b
    exists pair
      pair member-of setpairs
      and (pair.bus > maxbusses or
           exists pair1
             pair1 member-of setpairs
             and (pair.bus = pair1.bus
                  or pair.proc = pair1.proc))
  MACRO too-many-tasks(proc) = b
    cardinality('task-set(proc)') >= max-tasks
END-EXCEPTIONS
FUNCTIONS
  VFUN task-set(proc) = st
    (; set of tasks assigned to processor proc)
    INITIALLY st = { }
  VFUN proc-bus-assignments(proc task) = setpairs
    (; for each proc and task, yields set of PAIRS: one pair
       for each other processor working on that task; the first
       component of each pair names the processor, the second gives
       the bus assignment for reading from that processor)
    EXCEPTIONS
      task-not-in-proc(proc task)
    END-EXCEPTIONS
    INITIALLY undefined
  VFUN fault(proc1 proc2 bus task offset) = b
    (; indicates whether or not a fault exists that impacts a read
       by proc1 using bus bus of the memory of proc2 at the
       location associated with task, offset)
    HIDDEN
    EXCEPTIONS
      no-proc-bus-assignment(proc1 proc2 bus task)
      bad-offset(task offset)
    END-EXCEPTIONS
    INITIALLY false
  VFUN correct-read(task offset) = word
    (; result that one would expect from a non-faulty module)
    HIDDEN
    EXCEPTIONS
      not-assigned(task)
      bad-offset(task offset)
    END-EXCEPTIONS
    INITIALLY undefined
  VFUN read(proc1 proc2 bus task offset) = word
    (; result of proc1's reading of location associated with task
       and offset in proc2 using bus bus)
    DERIVED
    DERIVATION
      if fault(proc1 proc2 bus task offset) then
        word = undetermined
      else word = correct-read(task offset)
  OVFUN vote-read(proc task offset) = word
    (; returns majority vote on value associated with task and
       offset. if vote is not unanimous, disagreements are logged.
       if no majority exists, returns undefined and sets
       fatal-error flag)
    EXCEPTIONS
      task-not-in-proc(proc task)
      bad-offset(task offset)
    END-EXCEPTIONS
    EFFECTS
      word = majority-opinion(proc task offset)
      if word = undefined then fatal-error(proc) = true
      disagreement-set(proc) =
        dissenting-pairs(proc task offset)
    END-EFFECTS
  VFUN error-detected(proc) = b
    (; flag indicating that disagreement exists)
    DERIVATION
      not disagreement-set(proc) = { }
  VFUN fatal-error(proc) = b
    (; flag that indicates lack of a majority)
    INITIALLY false
  OFUN assign-task(proc task setpairs)
    (; assigns new task to proc; setpairs indicates the other
       processors working on that task and the busses to be used
       in reading from them)
    EXCEPTIONS
      bad-assignment(proc task setpairs)
      too-many-tasks(proc)
    END-EXCEPTIONS
    EFFECTS
      task-set(proc) = 'task-set(proc)' union {task}
      proc-bus-assignments(proc task) = setpairs
    END-EFFECTS
  OFUN delete-task(proc task)
    (; deassigns task from proc)
    EXCEPTIONS
      task-not-in-proc(proc task)
    END-EXCEPTIONS
    EFFECTS
      task-set(proc) = 'task-set(proc)' - {task}
      proc-bus-assignments(proc task) = setpairs
    END-EFFECTS
  OFUN cause-fault(proc1 proc2 bus task offset)
    (; produces a fault that affects reads by proc1 over bus bus
       of location in proc2 associated with task, offset)
    HIDDEN
    EXCEPTIONS
      bad-offset(task offset)
    END-EXCEPTIONS
    EFFECTS
      fault(proc1 proc2 bus task offset) = true
    END-EFFECTS
  OFUN change-correct-read(task offset word)
    (; updates correct-read to give correct result for current
       iteration)
    HIDDEN
    EXCEPTIONS
      not-assigned(task)
      bad-offset(task offset)
    END-EXCEPTIONS
    EFFECTS
      correct-read(task offset) = word
    END-EFFECTS
END-FUNCTIONS
END-MODULE
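The voting logic specified by the majority-opinion and dissenting-pairs macros can be illustrated with a short Python sketch. This is not part of the SIFT specification: the names `majority_opinion`, `dissenting_pairs`, and `UNDEFINED`, and the use of a dictionary keyed by (processor, bus) pairs, are assumptions made purely for illustration.

```python
from collections import Counter

UNDEFINED = None  # hypothetical stand-in for the specification's "undefined"


def majority_opinion(reads):
    """Return the word held by a strict majority of reads, else UNDEFINED.

    `reads` maps each (proc, bus) pair to the word read over that pair,
    playing the role of read(proc pair.proc pair.bus task offset).
    """
    votes = Counter(reads.values())      # the BAG of votes
    total = sum(votes.values())          # cardinality(votes)
    for word, count in votes.items():
        if 2 * count > total:            # multiplicity > (1/2) cardinality
            return word
    return UNDEFINED


def dissenting_pairs(reads):
    """Return the pairs whose read disagrees with the majority opinion.

    When no majority exists, every pair is reported, as in the macro.
    """
    winner = majority_opinion(reads)
    if winner is UNDEFINED:
        return set(reads)
    return {pair for pair, word in reads.items() if word != winner}


reads = {("p2", 1): 7, ("p3", 2): 7, ("p4", 3): 9}
winner = majority_opinion(reads)
dissenters = dissenting_pairs(reads)
```

With three reads of which two agree, the sketch selects the agreeing word and reports only the third pair as dissenting; with an even split it reports every pair, which corresponds to the case in which vote-read sets the fatal-error flag.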
REFERENCE
1. D. L. Parnas, "A Technique for Module Specification with Examples,"
Comm. ACM, Vol. 15, No. 5, pp. 199-218 (May 1972).

APPENDIX A 
MARKOV  PROCESSES 

There is a simple, elegant, and powerful theory for handling
models of the type considered in this report, provided that the
following condition holds: the probability $P_{ij}$ of making any state
transition is independent of the manner in which state $i$ was reached.
(The transition probabilities are history independent.) In this case
the model is said to have the Markov property and to define a Markov
process. Note that abnormal events of the transient or spontaneous
failure type have this character, but failures in equipment that
"wears out" like an automobile do not.
Markov processes can be treated as either discrete-time or
continuous-time processes. In the former case, time is assumed to proceed
in discontinuous "ticks" where one state transition must occur at
each tick. For example, consider the following model for a coin
flipping experiment in which the object is to obtain two "heads" in a
row.
The state diagram is:
[Figure: state diagram with three states, 1, 2, and 3; the surviving edge labels are "Tail", "Tail", and "Either".]
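and the whole process can be described by the transition matrix of $P_{ij}$:
[transition matrix not legible in the source]
Given the probabilities of occupancy of a particular state at time $n$, say
$p^{(n)}$, then $p^{(n+1)}$ is given by $p^{(n)} P$. Or, given $p^{(0)}$, then
$p^{(n)} = p^{(0)} P^n$. For example, with
$$p^{(0)} = (1 \;\; 0 \;\; 0)$$
one easily gets
[successive occupancy vectors not legible in the source]
and so forth. The general solution for $p_i^{(n)}$ is known to be of the form
[expression not legible in the source]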
A closed form expression for the probability of occupancy of each
state may now be easily obtained using the initial probabilities for
$p^{(0)}$, $p^{(1)}$ and $p^{(2)}$. For example, the probability of being
in state 3 after $n$ steps (tosses) is
[expression not legible in the source]
We show the above example in some detail for comparison with the
continuous-time formulation below.
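The discrete-time recurrence $p^{(n+1)} = p^{(n)} P$ can be checked numerically. The sketch below assumes a fair coin, which the text does not state explicitly, and fills in the transition matrix on that assumption; the names `step` and `occupancy` are illustrative only.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Assumed fair-coin transition matrix; row i holds P_ij for leaving state i+1.
P = [
    [half, half, 0],   # state 1: a tail stays in 1, a head moves to 2
    [half, 0, half],   # state 2: a tail returns to 1, a head moves to 3
    [0, 0, 1],         # state 3: absorbing, either outcome stays in 3
]


def step(p, P):
    """Advance one tick: p(n+1) = p(n) P, with p treated as a row vector."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def occupancy(p0, P, n):
    """Return the occupancy vector p(n) = p(0) P^n by repeated stepping."""
    p = list(p0)
    for _ in range(n):
        p = step(p, P)
    return p


p2 = occupancy([1, 0, 0], P, 2)
p3 = occupancy([1, 0, 0], P, 3)
```

Under the fair-coin assumption, the probability of being in state 3 is 1/4 after two tosses and 3/8 after three, and each occupancy vector sums to one.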
A continuous-time Markov process may be derived as a limiting case
of a discrete-time process in which the ticks of time become
infinitesimal; however, we take a slightly different approach. Since we are
less interested in particular transition probabilities than we are in
probabilities of state occupancy, we derive an expression for the
probability of being in a state $q$ at time $t$ via another limiting argument.
For this purpose we temporarily assume that the probability of
transition from any state $i$ to another state $j$ is proportional to the
time spent in state $i$ for sufficiently small times. That is,
$$P_{ij} = \rho\,\Delta t, \qquad \Delta t \to 0,$$
where $\rho$ is a constant with dimensions time$^{-1}$ and value $0 \le \rho < \infty$.
This assumption is necessarily true for uniformly distributed
random stochastic events that occur at an average rate $\rho$ independent
of past history (the Markov property).
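For any state $q$ we can then write, with $\alpha_{ij}$ denoting the rate
constant for the transition from state $i$ to state $j$,
$$P_q(t + \Delta t) = P_q(t)\Bigl(1 - \sum_{i \ne q} \alpha_{qi}\,\Delta t\Bigr) + \sum_{i \ne q} \alpha_{iq}\,P_i(t)\,\Delta t$$
or, taking the limit $\Delta t \to 0$,
$$\frac{dP_q(t)}{dt} = -P_q(t) \sum_{i \ne q} \alpha_{qi} + \sum_{i \ne q} \alpha_{iq}\,P_i(t)$$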
The above system of linear differential equations, together with an
initial vector of state occupation probabilities, say $P(0) = (1, 0, 0, 0, \ldots, 0)$,
completely determines the state of the system for all time.
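To observe the correspondence between this formulation and the
discrete-time case (and also to greatly facilitate solution of the system)
one can take the Laplace transform of each equation. With transform
variable $S$ and $\beta_q = \sum_{i \ne q} \alpha_{qi}$ the above becomes
$$S\,P_q^*(S) - P_q(0) = -\beta_q\,P_q^*(S) + \sum_{i \ne q} \alpha_{iq}\,P_i^*(S)$$
or more neatly as the matrix equation $P^*(S) \times M = P(0)$, as follows.
$$\bigl(P_1^*(S), \ldots, P_n^*(S)\bigr) \times
\begin{pmatrix}
(S + \beta_1) & -\alpha_{12} & -\alpha_{13} & \cdots & -\alpha_{1n} \\
-\alpha_{21} & (S + \beta_2) & -\alpha_{23} & \cdots & -\alpha_{2n} \\
\vdots & & & & \vdots
\end{pmatrix}
= \bigl(P_1(0), \ldots, P_n(0)\bigr)$$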
Observe that:
1. The differential equations above are similar but not identical
   to the Chapman-Kolmogorov equations which describe transition
   probabilities. Our equations describe occupation probabilities.
2. The transform technique moves us into a purely algebraic
   domain where approximations and limits may be made before
   inverting to obtain solutions in the time domain.
3. The general solution of the system of equations is given (as
   in the discrete case) by $P_q(t) = \sum_i a_i e^{\lambda_i t}$ where
   $\lambda_i$ are the roots of the polynomial equation Det $M = 0$.
4. Two limiting cases of behavior are immediately apparent:
   a. As $t \to \infty$, $P_q(t) \to a_1 e^{\lambda_1 t}$ where $\lambda_1$ is
      the numerically largest eigenvalue.
   b. Since $P_q^*(S) = N(S)/D(S)$, a ratio of polynomials, we have
      that as $S \to \infty$, $P_q^*(S) \to A\,S^{-(k+1)}$, $k = 0, 1, 2, \ldots$,
      where $A$ is a constant. Therefore as $t \to 0$,
      $P_q(t) \to \frac{A}{k!}\,t^k$.
The  latter  limit  theorem  is  one  of  several  kinds  of  argument  that  can  be 
used  to  deduce  general  features of a  solution  without  actually  obtaining 
it. 
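Examples:
A. [Figure: state diagram for Example A; from the results below, state 1 decays to state 2 at rate $a$]
$$\operatorname{Det} M = S(S + a), \qquad \text{roots } 0,\ -a$$
$$P_1^*(S) = \frac{S}{\operatorname{Det} M} = \frac{1}{S + a}$$
$$P_2^*(S) = \frac{a}{\operatorname{Det} M} = \frac{a}{S(S + a)} = \frac{1}{S} - \frac{1}{S + a}$$
$$P_1(t) = e^{-at}, \qquad P_2(t) = 1 - e^{-at}$$
$$P_1(t) + P_2(t) \equiv 1.$$
(This is the simple exponential occupation distribution for state 1.)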
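B. [Figure: state diagram for Example B]
$$M = \begin{pmatrix} (S + a) & -c & 0 \\ -a & (S + b + c) & 0 \\ 0 & -b & S \end{pmatrix},
\qquad \operatorname{Det} M = S(S + a)(S + b + c) - acS$$
$$P_3^*(S) = \frac{ab}{S(S + a)(S + b + c) - acS} \to \frac{ab}{S^3} \quad \text{as } S \to \infty$$
$$P_3(t) \approx \tfrac{1}{2}\,ab\,t^2 \quad \text{as } t \to 0$$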
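The closed-form results of Examples A and B can be cross-checked by integrating the occupation-probability equations numerically. The following is a minimal sketch, not a method taken from the report: it uses a fixed-step fourth-order Runge-Kutta scheme, arbitrary illustrative rate values, 0-based state indices, and a transition structure for Example B (1 to 2 at rate a, 2 to 1 at rate c, 2 to 3 at rate b) that is inferred from the expression for Det M.

```python
import math


def derivative(p, rates):
    """Right-hand side of dP_q/dt: outflow from q plus inflow into q.

    `rates` maps (i, j) to the rate constant for the transition i -> j.
    """
    dp = [0.0] * len(p)
    for (i, j), r in rates.items():
        flow = r * p[i]
        dp[i] -= flow
        dp[j] += flow
    return dp


def integrate(p0, rates, t_end, steps=1000):
    """Integrate the occupation equations with classical RK4, fixed step."""
    p = list(p0)
    h = t_end / steps
    for _ in range(steps):
        k1 = derivative(p, rates)
        k2 = derivative([x + 0.5 * h * k for x, k in zip(p, k1)], rates)
        k3 = derivative([x + 0.5 * h * k for x, k in zip(p, k2)], rates)
        k4 = derivative([x + h * k for x, k in zip(p, k3)], rates)
        p = [x + h * (w1 + 2 * w2 + 2 * w3 + w4) / 6
             for x, w1, w2, w3, w4 in zip(p, k1, k2, k3, k4)]
    return p


a, b, c = 2.0, 3.0, 0.5   # arbitrary illustrative rates, not from the report

# Example A: a single transition at rate a.
pA = integrate([1.0, 0.0], {(0, 1): a}, t_end=1.0)

# Example B (assumed structure): short-time behavior of the third state.
t_small = 1e-3
pB = integrate([1.0, 0.0, 0.0],
               {(0, 1): a, (1, 0): c, (1, 2): b}, t_end=t_small)
```

For Example A the integrated values agree with $e^{-at}$ and $1 - e^{-at}$; for Example B the third component at small $t$ is close to $\tfrac{1}{2}abt^2$, as the limit theorem predicts.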
There are opportunities for automation of the solution process. Where
the transition rates are numeric quantities, the explicit solutions are
easily obtained by machine using linear equation solvers and eigenvalue
routines. One can also use algebraic manipulation programs to obtain the
parameterized expressions $P_n^*(S)$. For state graphs without cycles, the
individual $P_i^*(S)$ are easily obtained by a "chain rule."
For example, no consideration of the associated matrix $M$ was required
in the system below.
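C. [Figure: state diagram for Example C. As the equations below imply, state 1 leads to state 2 at rate $a$ and to state 3 at rate $b$, state 3 leads to state 2 at rate $c$, and state 2 leads to state 4 at rate $d$.]
$$P_1^*(S) = \frac{1}{S + a + b}$$
$$P_3^*(S) = \frac{b\,P_1^*(S)}{S + c}$$
$$P_2^*(S) = \frac{a\,P_1^*(S) + c\,P_3^*(S)}{S + d}$$
$$P_4^*(S) = \frac{d\,P_2^*(S)}{S}$$
Furthermore, questions concerning the dependency of $P_2(t)$ on the value
of $a$ can be answered by taking the appropriate partial derivatives in
the $S$ domain. To illustrate, to leading order for large $S$,
$$\frac{\partial}{\partial a} P_2^*(S) \approx \frac{P_1^*(S)}{S + d} = \frac{1}{(S + a + b)(S + d)}$$
or
$$\frac{\partial}{\partial a} P_2(t) \approx t \quad \text{as } t \to 0.$$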
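The chain-rule transforms of Example C lend themselves to a mechanical check: because the four occupation probabilities always sum to one, their transforms must sum to $1/S$. The sketch below evaluates the chain at an arbitrary point using exact rational arithmetic; the function name `chain_rule_transforms` and the numeric values are illustrative assumptions.

```python
from fractions import Fraction


def chain_rule_transforms(S, a, b, c, d):
    """Evaluate (P1*, P2*, P3*, P4*) at S for the acyclic graph of Example C."""
    P1 = 1 / (S + a + b)                 # state 1 empties at total rate a + b
    P3 = b * P1 / (S + c)                # fed from 1 at rate b, leaves at rate c
    P2 = (a * P1 + c * P3) / (S + d)     # fed from 1 and 3, leaves at rate d
    P4 = d * P2 / S                      # absorbing state fed from 2
    return P1, P2, P3, P4


# Arbitrary illustrative values, chosen only to exercise the formulas.
S = Fraction(7, 3)
a, b, c, d = (Fraction(n) for n in (2, 3, 5, 4))

transforms = chain_rule_transforms(S, a, b, c, d)
total = sum(transforms)
```

The identity `total == 1 / S` holds exactly for any positive rates, which gives a quick sanity test when such chains are written down by hand for larger acyclic graphs.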
4. Incorporation of Non-Markov Behavior
The pure Markov model is an excellent approximation to the situation
where all state transitions in the model correspond to stochastic
(usually abnormal) events such as component failure or the onset of a
transient. However, there are three other types of behavior that we
would like to be able to handle, and it seems particularly important to
try to incorporate these events into a pure Markov model so that our
powerful analysis techniques will still be applicable:
a. Processes of fixed duration (such as the running of
   a program) that occur either (1) periodically in time
   or (2) in response to another event.
b. Processes with nonexponential probability density
   functions, such as transients which may have a narrow
   distribution of duration times.
The two types of process in a. above might be called deterministic
in the sense that they involve behavior that is definitely
history-dependent. First consider the following situation:
[Figure: state diagram with states 1, 2, and 3]
Suppose there is a diagnostic check that is very brief in duration and
is executed periodically with a low duty cycle. Starting in state 1 we
have two possibilities. If an error occurs during a period when the
diagnostic check is not running then we will go to state 3. If, on the
other hand, the diagnosis routine happens to be operative when the error
occurs then we find ourselves in state 2 where the error may or may not
be detected. If it is, we return (in this simplified situation) to state
1; otherwise we go to state 3. Now suppose that the diagnostic routine
runs for a period $\tau_1$ and is invoked with period $\tau_2 \gg \tau_1$.
Further suppose that the actual error rate is $\rho$ and that
$\rho \ll 1/\tau_2$ (errors occur less frequently than the diagnostic is
invoked). Under these circumstances it is an excellent approximation to
assume that errors are completely uncorrelated with the initiation of the
diagnostic test. Therefore we may assign to the transition rates $a$ and
$b$ the values
$$a = \rho\,\frac{\tau_2 - \tau_1}{\tau_2}, \qquad b = \rho\,\frac{\tau_1}{\tau_2},$$
where, of course, $a + b = \rho$. Moreover, if the probability that error
detection actually occurs when reaching state 2 is $p$, then the rates
[rate expressions not legible in the source]
are a good approximation to actual transition probabilities. That is,
with $p = 1$ one would expect on the average one transition from 2 to 3
in the time $\tau_1$ and this, by definition, gives the corresponding
transition rate $c$. Generally, the above treatment must be quite
satisfactory so long as $\rho \ll 1/\tau_2 \ll 1/\tau_1$, as will
frequently be the case since typical values for these quantities are
[typical values of $\tau_1$ and $\tau_2$ not legible in the source]
$$\rho \sim 10^{-3}\ \text{hour}^{-1}$$
The second form of deterministic behavior we need to handle
adequately is the case in which fixed delays occur. Consider the following
situation:
[Figure: state diagram with states 1 through 4; one state carries the label "(GOOD)"]
Here we assume that the event of a fault when in state 1 causes the
system to enter a state 2 in which a reconfiguration program is started
up. This program runs for a fixed time $\tau$. If another fault or error
occurs during the running of the program, we enter an unsatisfactory
state 3. Otherwise after time $\tau$ we enter a satisfactory state 4. The
actual occupation probabilities for states 2, 3, and 4 are as shown below:
[Figure: occupation probabilities of states 2, 3, and 4 against time]
An attempt to approximate this situation by assigning a constant
transition rate to the transition from state 2 to state 4 is not a very
good strategy because $P_4$ would then appear as shown below:
[Figure: plot of $P_4$ against time, with the level 1 and the time $\tau$ marked on the axes]
which would assign a relatively large probability to a transition from 2
to 4 during the period when this is not possible.
An artifice for improving the approximation is to replace state 2
by a chain of states, a sort of probabilistic delay line:
[Figure: state diagram in which state 2 is replaced by a chain of $n$ states]
Now the easily computed transform of the probability to be in state 4
is of the form $\frac{1}{S}\left(\frac{\beta}{S + \beta}\right)^n$ and in the
time domain $P_4$ now appears as follows:
[Figure: plot of $P_4$ against time, with the level 1 marked on the vertical axis]
which is a much closer approximation to the desired behavior.
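The effect of the delay-line chain can be seen numerically. The inverse transform of $\frac{1}{S}\left(\frac{\beta}{S+\beta}\right)^n$ is $1 - e^{-\beta t}\sum_{k=0}^{n-1} (\beta t)^k / k!$. The sketch below evaluates this for several chain lengths; the choice $\beta = n/\tau$, which keeps the mean delay equal to $\tau$, is an assumption not stated in the text, and, like the transform above, the sketch ignores the competing transition to state 3.

```python
import math


def p4_delay_line(t, n, tau):
    """Probability of having traversed an n-stage chain by time t.

    Each stage is assumed to have rate beta = n / tau so that the mean
    total delay equals tau (an illustrative assumption).
    """
    beta = n / tau
    x = beta * t
    term, partial = 1.0, 0.0
    for k in range(n):
        partial += term          # accumulates x**k / k!
        term *= x / (k + 1)
    return 1.0 - math.exp(-x) * partial


tau = 1.0
stages = (1, 4, 16, 64)
early = [p4_delay_line(0.5 * tau, n, tau) for n in stages]   # before the delay
late = [p4_delay_line(1.5 * tau, n, tau) for n in stages]    # after the delay
```

As the number of stages grows, the probability assigned to reaching state 4 before $\tau$ falls toward zero and the probability after $\tau$ rises toward one, so the curve approaches the step at $t = \tau$ that a fixed delay would produce.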
General Probability Distribution Function
In general, if we wish to approximate some probability distribution
function (say the behavior of a particular type of transient), it is always
possible to obtain an arbitrarily close approximation by replacing the
state having the odd behavior with a series-parallel combination of
states having real positive transition rates as depicted below:
[Figure: series-parallel combination of states]
Construction  of  these  approximations  is  much  like  the  problem  of 
synthesizing  passive  electrical  networks  having  prescribed  transfer 
characteristics. The "best"  approximation  to  a  given  probability  func- 
tion  for  a  fixed  number  of  states  would  lead  to  the use of  complex 
rather  than  real  transition  rates.  This  is  an  interesting  possibility 
to  consider  in  future  research. To effectively  employ  such  techniques 
we would  need  some  theorems  that  say  when  it  is  safe  to  use  an  approx- 
imation  that  would  lead  to  nonphysical  occupation  probabilities  for 
some of  the  fictitious  states  involved  in  the  synthesis. 
1. Report No.: NASA CR-3011
2. Government Accession No.:
3. Recipient's Catalog No.:
4. Title and Subtitle: DESIGN STUDY OF SOFTWARE-IMPLEMENTED FAULT-TOLERANCE
   (SIFT) COMPUTER
5. Report Date: June 1982
6. Performing Organization Code:
7. Author(s): J. H. Wensley, J. Goldberg, M. W. Green, W. H. Kautz,
   K. N. Levitt, M. E. Mills, R. E. Shostak, P. M. Whiting-O'Keefe,
   and H. M. Zeidler
8. Performing Organization Report No.:
9. Performing Organization Name and Address: SRI International,
   333 Ravenswood Avenue, Menlo Park, California 94025
10. Work Unit No.:
11. Contract or Grant No.: NAS1-13792
12. Sponsoring Agency Name and Address: National Aeronautics and Space
    Administration, Washington, DC 20546
13. Type of Report and Period Covered: Contractor Report
14. Sponsoring Agency Code:
15. Supplementary Notes: Langley Technical Monitor: Nicholas D. Murray.
    Final Report
16. Abstract:
    This paper reports on a continuing effort to design a flyable SIFT
    computer that can demonstrate the feasibility of an integrated function,
    fault-tolerant computer in commercial aviation. The goals of the work
    reported in this paper are:
    (1) to develop the SIFT design concept to a point at which its potential
        reliability may be evaluated with reasonable accuracy;
    (2) to investigate alternate strategies for physical implementation,
        using available or specially designed components;
    (3) to prove the correctness of the hardware and software designs; and
    (4) to model the system and evaluate its effectiveness from a
        fault-tolerance point of view.
17. Key Words (Suggested by Author(s)): SIFT; Ultrareliability; Fault
    tolerant; Multicomputer system; Reconfigurable computer system
18. Distribution Statement: Unclassified - Unlimited. Subject Category 62
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of Pages: 252
22. Price: A12
For sale by the National Technical Information Service, Springfield, Virginia 22161
NASA-Langley, 1982
