A forward view on reliable computers for flight control by Goldberg, J. & Wensley, J. H.
A FORWARD VIEW ON RELIABLE COMPUTERS FOR FLIGHT CONTROL*
Jack Goldberg and John H. Wensley
Stanford Research Institute
We examine the requirements for fault-tolerant computers for flight
control of commercial aircraft and conclude that the reliability
requirements far exceed those typically quoted for space missions. Examination
of circuit technology and alternative computer architectures indicates
that the desired reliability can be achieved with several different computer
structures, though there are obvious advantages to those that are more
economic, more reliable, and, very importantly, more certifiable as to fault
tolerance. Progress in this field is expected to bring about better computer
systems that are more rigorously designed and analyzed even though
computational requirements are expected to increase significantly.
INTRODUCTION 
Current NASA developments in aircraft and aviation systems design require
a great increase in on-board computing. Most of the advanced aircraft
designs (e.g., configuration-controlled vehicles and certain STOL modes)
require extremely reliable computations. NASA must therefore be assured
that it will be possible to build computing systems having the high capacity
and extreme reliability that its current advanced aircraft designs will
require.
The reliability requirements far exceed those typically quoted for space
missions (95% success after five years). This implies that the probability
of error of spaceborne computers is designed to be on the order of 10^-6/hr
for long missions, while the acceptable figure for advanced avionic systems
for the commercial environment is on the order of 10^-9/hr for short missions.
The commercial environment also has different certification requirements,
not only because of the high public demand for safety, but because the users
are more diversified. Thus the hardware and software components of a computer
for commercial avionics must not only satisfy the reliability criteria of
computer designers, but the reliability must be convincingly demonstrated to
aircraft system designers and users. It is well understood that computers
of the needed power will require a large number of components, and that
this number is so large (> 10^[illegible]) and the assured reliability is so
low (< [illegible] failures/hr) that some form of built-in fault tolerance is
essential. Unfortunately, the simpler forms of fault tolerance (e.g.,
error-correcting codes and triple modular redundancy) are inadequate for
computers of the required size.

* This work was supported in part by the National Aeronautics and Space
Administration, Langley, Virginia, Contract NAS1-10920.
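
The space-mission figure quoted above can be checked with a short calculation.
The sketch below is not part of the original analysis: it assumes a constant
failure rate (an exponential model) and continuous operation over five years
of 8760 hours each, and the function name is purely illustrative.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: 365-day year, continuous operation


def hourly_failure_rate(success_probability: float, mission_hours: float) -> float:
    """Constant hourly failure rate that yields the given mission success probability.

    Assumes an exponential model: success = exp(-rate * mission_hours).
    """
    return -math.log(success_probability) / mission_hours


# 95% success after five years, as quoted for space missions.
space_rate = hourly_failure_rate(0.95, 5 * HOURS_PER_YEAR)
print(f"implied space-mission failure rate: {space_rate:.2e} per hour")
```

Under these assumptions the implied rate is on the order of 10^-6/hr, which
agrees with the figure given for spaceborne computers.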
Realization of this inadequacy has given rise to several research and
development efforts in the design of automatically reconfigurable computers.
Some examples of computers carried to a fairly detailed design level are
STAR (JPL) [ref. 1], EXAM (NASA-ERC) [ref. 2], ARMMS (NASA-Marshall) [ref. 3],
and MSC (SAMSO) [ref. 4]. Other recent designs, at a less-detailed level,
include SIFT (NASA-Langley) [refs. 5 and 6] and an unnamed computer, hereafter
called HS (MIT C. S. Draper Laboratory) [refs. 7 through 9]. There has
also been considerable research in techniques for designing redundant logic
networks and memories, for testing arbitrary logic networks, and for modelling
redundant systems. For a discussion of these topics, see reference 10.
These design and technique studies comprise a well-rounded, but relatively
unproven art. They do not yet comprise a base of technological practice
sufficient for the design of computer systems whose reliability can be
specified with a high degree of assurance. This is a consequence of the basic
facts that (1) faults and errors can occur in extremely varied ways, and (2) the
fault-tolerant behavior of an automatically reconfigurable computer can
be extremely complex.
Subsequent sections of this paper examine the computational and reliability
requirements, the technology constraints, and estimates of the likelihood of
achieving the goals.
COMPUTATIONAL REQUIREMENTS 
In this section we consider the computational and reliability requirements
of a representative aircraft computer system. The example we choose
is that of a commercial transonic four-engine aircraft. We assume that
advanced control systems will be required for such functions as flutter
control and attitude control. We further assume that an advanced blind
landing system would be used.
The requirements are reported in detail in reference 6, and are summarized
in table 1. The most critical phase of the flight from a computational
standpoint is during an instrument landing. Those applications involved in that
phase are indicated with a "#". Small tasks that are not required during
that phase do not influence the design of the computer system and therefore
have not been estimated to the same accuracy as the more important tasks.
The column headings of table 1 are defined as follows:
Task--The name given to the application program.
Criticality Class--1. Immediate safety-of-flight impact.
                   2. Eventual safety-of-flight impact.
                   3. Significant change-of-mission impact.
                   4. Operational impact.
                   5. Economic impact.
Table 1
COMPUTING REQUIREMENTS FOR EACH COMPUTATIONAL FUNCTION
Task
Attitude control
Flutter control
Load control
Autoland, horiz.
Autoland, vert.
Autoland, throttle
Autopilot
Elec. att. control
Supervisor
Inertial
VOR/DME
DME, OMEGA
Air data
Kalman filter
Flight data
Air speed, altitude
Graphic display
Text display
Collision avoidance
Data comm, A/C
Data comm, ground
AIDS
Inst. monit.
Syst. monit.
Life support
Engine control
Criticality Class
1 
1 
3,5# 
# 
1# 
# 
4 
1# 
4 
2# 
4 
4 
4 
4 
4 
4# 
4# 
4 
4# 
# 
----------- 
i t era t i ve 
Ins  t . 
1845 
70 
45 
750 
150 
790 
75 
2100 
250 
400 
110 
2 50 
450 
360 
890 
640 
550 
210 
450 
650 
800 
900 
900 
1300 
------- 
------- 
------- 
22 
15 
2 -3 
2-3 
275 I 2-3 
100 
52 0 
15 
150 
50 
105 
25 
65 
100 
70 
5360 
8700 
650 
400 
------- 
------- 
4-5 
? 
? 
0-4 
4-5 
4-5 
4-5 
2-3 
2-3 
2-3 
2-3 
4-5 
1-2 
? 
--------- 
--------- 
Tasks to be run during blind landing, the most critical flight mode, are
marked "#".
Tasks marked "?" exert a negligible load for the parameter in question.
The column headings are defined in the text.
Iteration Rates/Sec--The number of times per second that the calculation
                     must be carried out. When two figures are quoted,
                     they represent two calculations within the same
                     functional task.
Equivalent MIPS--The millions of instructions per second needed to carry out
                 the calculations.
Memory Required--The number of words of memory required for instructions
                 and data.
Missed Iterations--The maximum number of consecutive iterations that can
                   be missed before the application is jeopardized.
In interpreting the table and discussing its implications for computer
architecture, we consider reliability, roll-back delay, main memory requirements,
processor speed, processing variations within a mission, and data rates.
Reliability
We assume that the probability of not successfully carrying out the most
critical computation should be less than 10^-8 per mission. These computations,
corresponding to criticality classes 1 and 2, could cause an aircraft crash
if not carried out or if carried out with gross errors. With this assumed
computation reliability, for a fleet of 1000 aircraft flying four daily
missions, each of five hours without repair between flights within a day,
about one crash due to a computer failure would occur in 100 years. For the
other criticality classes, the assumed reliability is not as stringent (a
typical failure probability is 10^-4), since the failure to carry out these less
critical computations results in only a mission change or an economic loss.
In a system design, it would be beneficial to so allocate redundancy that each
task is carried out with the indicated reliability.
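
The fleet estimate above is simple arithmetic and can be reproduced as follows.
This is a rough sketch rather than the authors' own calculation: it assumes
that each flight counts as one mission, that the per-mission failure
probability sits exactly at the stated bound, and that a year has 365 days;
the function and parameter names are illustrative.

```python
def expected_failures(p_per_mission: float, aircraft: int,
                      missions_per_day: int, years: int,
                      days_per_year: int = 365) -> float:
    """Expected number of critical-computation failures across a fleet."""
    total_missions = aircraft * missions_per_day * days_per_year * years
    return total_missions * p_per_mission


# 1000 aircraft, four missions per day, 100 years, bound taken at its limit.
crashes = expected_failures(1e-8, aircraft=1000, missions_per_day=4, years=100)
print(f"expected crashes in 100 years: {crashes:.2f}")
```

With these inputs the fleet flies about 1.46 x 10^8 missions in a century, so
the expected number of computer-caused crashes is of order one, in line with
the estimate in the text.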
Roll-back
An important parameter of a fault-tolerant computer is the maximum time
interval that the computer can be in a roll-back/reconfiguration mode in
responding to a failure. During this interval some processing of certain
computations may cease, and newly appearing data might be lost. The missed
iterations column of table 1 indicates the number of iterations that can be
ignored in a given computation without adversely affecting the aircraft. In
the worst case (collision avoidance) the system must be "down" for no more
than 1.5 msec. Several other critical computations (flutter control, load
control, autoland) require reconfiguration times nearly as short. For these
computations, it might be necessary to reload programs, which indicates that
the computer might be required to be totally engaged in reconfiguration
following a failure. Fortunately, the computations with large amounts of data,
e.g., display, can tolerate a downtime of approximately 0.5 sec., thus allowing
ample time for the possible reloading of data, interleaved with the more
critical computations.
Memory Requirements 
The application programs for the critical phase require approximately
20K words. This figure is a low estimate for two reasons:
• The difficulty of estimating accurately
• The need for memory space for the executive routines.
Hence we assume a total memory requirement of 24K words. Note that this is a
nonredundant requirement; the demand for fault tolerance will increase this
figure. For architectures relying totally on triplication, this storage
requirement must be tripled to 72K. For architectures utilizing only
single-byte correction (ref. 10) in memory (plus possibly a few extra bytes for
double-byte detection and sparing), the figure is about one-third in excess
of 24K, or about 32K.
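
The two redundant memory figures follow directly from the 24K nonredundant
requirement. The sketch below simply restates that arithmetic; the function
names are illustrative, and the one-third overhead for single-byte correction
is the approximate figure quoted in the text, not a property of any particular
code.

```python
BASE_MEMORY_KWORDS = 24  # nonredundant requirement assumed in the text


def triplicated_memory(base_kwords: float) -> float:
    """Storage needed when every word is held in three copies."""
    return 3 * base_kwords


def coded_memory(base_kwords: float, overhead: float = 1 / 3) -> float:
    """Storage needed with coding, given a fractional overhead (about 1/3 here)."""
    return base_kwords * (1 + overhead)


print(triplicated_memory(BASE_MEMORY_KWORDS))   # triplication
print(round(coded_memory(BASE_MEMORY_KWORDS)))  # single-byte correction
```

The comparison shows why coding is attractive for memory: it buys protection
for roughly a third of the extra storage that triplication would cost.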
Processor Speed 
For the critical phase, the application tasks require 0.386 MIPS (millions
of instructions per second). Once again we must regard this figure as being
low, in part due to inaccuracies, but mostly due to the "wasted" CPU power in
multiprogramming and the processing of executive routines. For these reasons
we assume a processor load of 0.5 MIPS. An important attribute of the
computations is their relative independence. That is, the sharing of functions
and data among the computations does not substantially reduce the overall
memory or processor requirements. Each computation requires access to the
state of the aircraft, but most other data can be considered to be local.
Hence it is quite simple to impose a multiprocessor discipline on the
computations, with almost an arbitrary number of processors.
Under a certain allocation of tasks to processors it is not necessary to
do any task interruption within a processor. That is, a task can be allowed
to run through to completion before initiating another task. Five processors
each of 0.1 MIPS would enable such an allocation. However, near the end of the
useful life of the computer, say if just one or two unfailed processors remain,
it is possible that a high-rate task (flutter control) might be allocated
to the same processor as a low-rate but long task (graphic display). If such
a joint allocation is unavoidable, then interruption of the longer task will
be essential.
Processing Variations Within a Mission 
All applications marked with "#" are required during an instrument landing.
This represents about 60 percent of the total CPU requirement and about 50
percent of the memory requirement. Hence some graceful degradation is
possible because, during the mission, tasks will be naturally deallocated as
they are no longer needed as part of the flight. When a task is no longer
needed, its memory area can be allocated to another task, or a failure in a
memory module can be handled automatically by a memory module with a reduced
requirement. However, we note that the degradation with respect to memory
is not uniform, assuming that all programs and constants are retained in
main memory.* For example, in mid-flight, although not all tasks are being
processed, all programs must be stored reliably in the main memory. Hence the
graceful degradation with regard to main memory is not exploitable until the
last minutes of the flight, and hence is of questionable utility to the
architecture.

* The issue of back-up memory in an aircraft environment is yet to be completely
resolved. Rugged discs can be obtained but their cost per bit is not
significantly less than that for LSI main memories.
Data Rates 
An important measure of computer power required is the load on the bus
structure for transfer of instructions and data. Given a computing load
of 0.5 MIPS, we assume that an instruction will, on average, require 24
bits.* Different instructions require varying amounts of data including the
following cases:
• 0 bits for register-to-register operations
• 8 bits for byte operations, e.g., text display
• 16 bits for integer operations
• 32 bits for floating point operations.
Based on an estimate that the average data required is 16 bits, the total
flow between memory and CPU is 20 Mbits/sec. In some architectures (e.g.,
the JPL STAR), the bus would have to be capable of maintaining this rate.
In the case of the Hopkins scheme, a significant reduction would be achieved
by the use of the local cache on the processors. An additional reduction is
achieved by providing a multi-bus structure or allowing multiple ports into
main memory. In the SIFT system, most of the bus load would be in individual
modules, with only an estimated one percent between modules.
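
The bus-load estimate above is the product of the instruction rate and the
average number of bits moved per instruction. A minimal sketch of that
arithmetic follows; the function name is illustrative, and the inter-module
figure simply applies the one percent estimate quoted for SIFT.

```python
def bus_rate_mbits_per_sec(mips: float, instruction_bits: float,
                           data_bits: float) -> float:
    """Memory-to-CPU traffic in Mbits/sec for a given instruction rate."""
    return mips * (instruction_bits + data_bits)


# 0.5 MIPS, 24-bit average instruction, 16-bit average data item.
total_rate = bus_rate_mbits_per_sec(0.5, 24, 16)

# SIFT: only about one percent of the traffic is estimated to cross modules.
sift_intermodule_rate = 0.01 * total_rate

print(total_rate, sift_intermodule_rate)
```

Seen this way, an architecture that keeps most traffic inside a module needs
an inter-module bus roughly two orders of magnitude slower than one that
routes every fetch over a shared bus.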
TECHNOLOGICAL ADVANCES 
The most important future development in technology is expected to be
the continued improvements in LSI. The cost of LSI circuits will continue
to drop throughout the 1970s, and will result in processor and memory costs
that are low enough so that extensive redundancy of units is practical from
a cost viewpoint. This redundancy can be either by replication or by coding,
the latter being more applicable to memories. It is expected that the cost
of a computer system to carry out all computation within an aircraft will be
comparable with the present cost of existing single-function avionic units
(e.g., inertial navigation).
A second advantage in the use of LSI is the small size of such units,
making it possible to achieve far more efficient shielding from both electric
and magnetic fields, thereby reducing the probability of noise and crosstalk.
It is expected that fault modes of this type (which are manifested as
data-dependent transient faults) will be insignificant within the central units.
However, such faults may still exist in connections to external sensors and
actuators.

* In a 16-bit computer this implies an equal number of single- and double-length
instructions.
With the use of LSI most of the connections at the device and gate level
take place within the semiconductor device, or chip, rather than on a board
or through a connector as in the use of discrete circuits. The number of
soldered and wrapped joints is estimated to be at least an order of magnitude
less than that associated with, say, integrated circuits; there would thus be
a consequent reduction of faults in the connection system.
LSI circuits, though relatively cheap in high-volume production, have a
high development cost. This implies that an efficient design would contain
as small a number of different chip types as possible. This affects
architectural decisions at two levels. At the unit level (memory, bus,
arithmetic unit, control, etc.), there will be strong advantage in using
replication of identical units rather than units designed specifically for
particular functions. At the logic level, the high development cost of
custom-built units makes it more attractive to transfer arbitrary logic to a
form of memory, as in the use of microprogramming.
Replacement and maintenance strategies in a reconfigurable computer are
also influenced by LSI. The large number of gates per chip, together with
the tendency for a chip fault to affect many gates, implies that groups of
registers on the same chip should be replaced together, rather than replacing
small units such as individual registers.
The choice of LSI technologies is between the lower-speed, lower-cost
MOS and the higher-speed and higher-cost bipolar technologies. The total
computing power required among the elements of the several candidate
architectures is such that MOS will be sufficiently fast for memories, buses
and arithmetic units. In addition, the use of a multiprocessor organization
permits the attainment of high computation capacity with slower processors.
The higher speed of bipolar circuits may still be necessary in the control
sections where the microprogram cycle time will typically be an order of
magnitude faster than the instruction cycle time. Recent advances in
technology have tended to bring the two types closer in both speed and cost.
We note that the choice between different LSI technologies, discussed
above, was on the basis of speed and cost. The lower-cost alternative of MOS
is possible because of the higher density within the chip, thereby enabling
the use of fewer chips. This will have the desirable effect of increasing
the inherent reliability due to the reduction in number of chips. LSI memory
systems appear to be potentially more reliable than core or plated wire,
because of the reduced numbers of discrete semiconductor devices and
interconnections. The use of batteries is deemed to be a fully adequate
assurance of non-volatility.
The MTBF for LSI circuits is estimated to be between 10^6 and 10^7 hours.
The requirement to achieve an MTBF of 10^9 hours for the whole system can be
shown to be achievable by several architectures.
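
The gap between chip-level and system-level MTBF can be made concrete with a
short calculation. The sketch below is only illustrative: it assumes a
non-redundant system that fails as soon as any one chip fails, with
independent, identical chips, and the chip count of 500 is a hypothetical
round number rather than a figure from this paper.

```python
def series_mtbf_hours(chip_mtbf_hours: float, chip_count: int) -> float:
    """MTBF of a non-redundant system that fails when any single chip fails.

    Assumes independent chips with identical constant failure rates.
    """
    return chip_mtbf_hours / chip_count


HYPOTHETICAL_CHIP_COUNT = 500  # assumption for illustration only

pessimistic = series_mtbf_hours(1e6, HYPOTHETICAL_CHIP_COUNT)
optimistic = series_mtbf_hours(1e7, HYPOTHETICAL_CHIP_COUNT)
print(f"non-redundant system MTBF: {pessimistic:.0f} to {optimistic:.0f} hours")
```

Even at the optimistic end, such a system falls short of 10^9 hours by several
orders of magnitude, which is why the target can only be met through
redundancy and reconfiguration.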
The use of optical coupling between units can provide great protection
against damage propagation through several units. The architecture must
therefore be more concerned with fault propagation through erroneous data
than through adverse electrical phenomena. The added cost for such protection
is substantial, though not prohibitive, so careful design to achieve
fault-isolation is required.
DESIGN CONSIDERATIONS FOR FAULT-TOLERANT COMPUTER ARCHITECTURES
In the preceding sections we have discussed the requirements for
fault-tolerant aircraft computers, and the impact of new technology on their
design. We now consider some representative computer architectures from
the viewpoint of cost and reliability.
Many possible computer structures exist to satisfy the requirements and
it is not our intent here to survey all existing or possible designs, but
rather to look at a small number of designs in order to compare the use of
different fault-tolerance techniques. We choose three designs: multichannel,
SIFT, and SIFT with coding in memory.
In the multichannel design, a number of identical computers are used
with all computers operating identically on the tasks to be performed. The
computers are operated in a lock-step mode with all data movement being checked
by voters that are connected to the buses. A typical number of channels would
be three, four or five, higher numbers being unnecessary and tending to
complicate the design of the voters.
In the SIFT design, a number of computers are also used but they do not
operate in lock-step mode, and they do not all operate on the same tasks.
Error-detection is achieved by comparison of results of calculations carried
out in several computers, this comparison being by program, not by a hardware
voter. An important characteristic of the design is that the buses connecting
computers are constrained so that each computer cannot write into the memory
of the other computers. This greatly improves fault isolation between computers.
Reconfiguration is also carried out by software in a system executive that is
itself replicated to assure adequate reliability.
In the third design to be considered, the processors operate as in the
SIFT design, but coding is applied to protect against faults in memory.
We now consider each of the above designs. In all cases we assume a
chip failure probability of 10^-6/hr. We use the notation that P[event] =
probability of the event occurring per hour.
We distinguish between the most critical (MC) tasks, where error* probabilities
should be below 10^-9/hr, and the least critical (LC) tasks, where errors
should be below 10^-4/hr. We also distinguish those tasks required for automatic
'blind' landing and other tasks. The landing phase is the most demanding in
terms of computing load. We summarize in table 2 a representative set of
requirements, where M is memory requirements in thousands of words and P is
processor requirements in MIPS.

* In this analysis, we do not distinguish between erroneous outputs to actuators
and null outputs. A more comprehensive analysis would need to make this
distinction.
Table 2
COMPUTATION AND MEMORY REQUIREMENTS

                  Landing              Other
Most Critical     P = 0.29, M = 8.8    P = 0.09, M = 2.2
Least Critical    P = 0.09, M = 6.8    P = 0.05, M = 5.5
We assume that words contain, on the average, 24 information bits. We
further assume that a memory chip contains 4K bits, and that it requires 30
chips/MIPS to realize the CPU.
Case 1: Multichannel
We assume 10% extra memory and processor requirement to handle the
multiprogramming and other executive requirements (interrupt handling, etc.).
The multichannel concept requires enough memory in each channel to hold all
tasks (23.2K + 10% ≈ 26K), and the CPU must handle the heaviest task load
(0.38 + 10% ≈ 0.42 MIPS). Therefore for each channel we have
    26K words = 156 chips
    0.42 MIPS = 13 chips
    Total     ≈ 170 chips
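
The per-channel chip count follows from the stated assumptions of 24-bit
words, 4K-bit memory chips, and 30 chips per MIPS. The sketch below reproduces
that sizing; rounding partial chips up is an assumption made here, and the
function names are illustrative.

```python
import math

WORD_BITS = 24            # average information bits per word (from the text)
CHIP_KBITS = 4            # memory chip capacity in Kbits (from the text)
CPU_CHIPS_PER_MIPS = 30   # chips needed per MIPS of CPU (from the text)


def memory_chips(kwords: float) -> int:
    """Memory chips needed for a store of the given size, rounded up."""
    return math.ceil(kwords * WORD_BITS / CHIP_KBITS)


def cpu_chips(mips: float) -> int:
    """CPU chips needed for the given processing rate, rounded up."""
    return math.ceil(mips * CPU_CHIPS_PER_MIPS)


channel_memory = memory_chips(26)   # 26K words per channel
channel_cpu = cpu_chips(0.42)       # 0.42 MIPS per channel
channel_total = channel_memory + channel_cpu
print(channel_memory, channel_cpu, channel_total)
```

The exact sum comes to 169 chips, which the analysis rounds to about 170 per
channel. Memory dominates the count by more than ten to one, which is the
observation that later motivates coding in memory.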
Assume that the chips in the voter (sufficiently replicated for
reliability) are negligible and consider the probability of error for three-,
four- and five-channel configurations. The results are displayed in table 3.
Case 2: SIFT With Fault Tolerance Achieved by Uniform Replication
For this case, the strategy is to triplicate all tasks, and when faults
occur to reduce the LC tasks to duplicate, then single processors, finally
removing them entirely in the event that resources are drastically reduced.
We assume 20% overhead for executive plus voting routines.*
The memory and processor requirements are as in table 4. The reliability
results are displayed in tables 5 and 6, for a SIFT system decomposed into
four and ten modules, respectively.

* This estimate (of 20%) is not critical in determining the component count,
the cost or the reliability of the design.
Table 3
RELIABILITY ESTIMATES FOR MULTICHANNEL SYSTEM

3 Channel
  Total chips = 510
  P[1 fault]  = 0.51 × 10^-3 ... voting masks error, discard faulty channel
  P[2 faults] = 0.17 × 10^-6 ... system failure

4 Channel
  Total chips = 680
  P[1 fault]  = 0.68 × 10^-3 ... voter removes faulty channel
  P[2 faults] = 0.34 × 10^-6 ... voter masks second fault, discard faulty
                                 channel
  P[3 faults] = 1.2 × 10^-10 ... system failure

5 Channel
  Total chips = 850
  P[1 fault]  = 0.85 × 10^-3 ... voter removes faulty channel
  P[2 faults] = 0.58 × 10^-6 ... voter removes faulty channel
  P[3 faults] = 0.3 × 10^-9  ... voter masks fault, discard faulty channel
  P[4 faults] = 1 × 10^-13   ... system failure

Table 4

                  Landing              Other
Most Critical     P = 0.35, M = 10.4   P = 0.11, M = 2.6
Least Critical    P = 0.11, M = 8.2    P = 0.06, M = 6.6

Total memory requirement = 27.8 ≈ 28K
Maximum CPU requirement = 0.46 MIPS
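
The multichannel estimates of table 3 appear to follow a simple first-order
model, sketched below. The model is an inference from the tabulated numbers
and is not spelled out in the text: each successive fault is taken to
contribute a factor equal to the number of chips in the channels still
operating times the assumed chip failure probability of 10^-6/hr, with about
170 chips per channel. The function name is illustrative.

```python
CHIP_FAILURE_PROBABILITY = 1e-6  # per chip per hour, as assumed in the text
CHIPS_PER_CHANNEL = 170          # approximate per-channel count from Case 1


def fault_sequence_probabilities(channels: int,
                                 chips_per_channel: int = CHIPS_PER_CHANNEL,
                                 rate: float = CHIP_FAILURE_PROBABILITY) -> list:
    """Return [P[1 fault], P[2 faults], ...] until only one channel is left.

    First-order approximation: the k-th fault strikes one of the chips in the
    channels that are still operating after the previous k-1 faults.
    """
    probabilities = []
    running = 1.0
    for operating in range(channels, 1, -1):
        running *= operating * chips_per_channel * rate
        probabilities.append(running)
    return probabilities


for n in (3, 4, 5):
    formatted = ", ".join(f"{p:.2e}" for p in fault_sequence_probabilities(n))
    print(f"{n} channels: {formatted}")
```

The last entry in each list is the system-failure case, since a single
remaining channel can no longer be checked by voting. Under this model each
added channel lowers the system-failure figure by roughly three to four orders
of magnitude, at a cost of about 170 more chips.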
Table 5
RELIABILITY ESTIMATES FOR A 4-MODULE SIFT
Each CPU = (0.46 × 3)/4 = 0.35 MIPS ≈ 10 chips
Total chips = 544
During Landing: Remove LC, MC survive
During Landing: MC/L only survive in DUPLEX
P[3 faults] = 0.6 × 10^-10, System failure

Table 6
RELIABILITY ESTIMATES FOR A 10-MODULE SIFT
Each memory = (28 × 3)/10 = 8.4K = 51 chips
Each CPU = (0.46 × 3)/10 = 0.14 MIPS ≈ 4 chips
(55 chips per module)
During Landing: Fault masked, LC to DUPLEX
During Other: Fault masked, LC/O to DUPLEX, Future LC/L to DUPLEX
P[2 faults] = 0.27 × 10^-6, M = 67.2K, P = 1.12
During Landing: MC fault masked, LC failed
P[3 faults] = 0.19 × 10^[illegible], M = 48.8K, P = 0.98
During Landing: MC fault masked, MC/L to DUPLEX
During Landing: Possibility of system failure
During Other: Possibility of LC failure, future MC/L in DUPLEX
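
The per-module sizing used in tables 5 and 6 can be sketched as follows. The
calculation spreads the triplicated load of table 4 (28K words, 0.46 MIPS)
evenly over the modules and converts it to chips with the earlier assumptions
of 24-bit words, 4K-bit memory chips and 30 chips per MIPS. The rounding
conventions (memory rounded up, CPU rounded to the nearest chip) are
assumptions chosen here because they agree with the legible table entries,
and the function name is illustrative.

```python
import math

WORD_BITS = 24
CHIP_KBITS = 4
CPU_CHIPS_PER_MIPS = 30


def sift_module_chips(modules: int, total_memory_k: float = 28,
                      total_mips: float = 0.46, replication: int = 3):
    """Return (memory_chips, cpu_chips) for one module of a SIFT system."""
    memory_k = total_memory_k * replication / modules
    mips = total_mips * replication / modules
    memory_chips = math.ceil(memory_k * WORD_BITS / CHIP_KBITS)
    cpu_chips = round(mips * CPU_CHIPS_PER_MIPS)
    return memory_chips, cpu_chips


for n in (4, 10):
    mem, cpu = sift_module_chips(n)
    print(f"{n} modules: {mem} memory + {cpu} CPU chips each, "
          f"{n * (mem + cpu)} chips in total")
```

For four modules this gives a system total that agrees with the 544 chips
listed in table 5, and for ten modules it reproduces the 51 memory chips and
4 CPU chips per module of table 6.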
Case 3: SIFT with Coding in Memory
The majority of chips for SIFT in Case 2 are used in the memory. We can
add protection by using an error detecting/correcting code. The analysis
displayed in table 7 is for a single-error-correcting, double-error-detecting
code with an assumption of 25% increase in memory cost. A module failure
requires failure of one chip in the CPU or two chips in the memory. Low
criticality tasks are run in SIMPLEX mode.
Table 7
RELIABILITY ESTIMATES FOR FOUR- AND SIX-MODULE SIFT
WITH CODING IN MEMORY

4 Module
  Memory per module = (13 × 2 + 15)/4 + 25% ≈ 13K = 78 chips
  CPU per module = (0.35 × 2 + 0.11)/4 ≈ 0.2 MIPS = 6 chips
  (84 chips per module)
  Total chips = 332
  P[CPU fault] = 0.6 × 10^-5 per module
  P[single memory fault] = 0.8 × 10^-4
  P[double memory fault] = 0.6 × 10^-8
  P[LC task failure] = 0.6 × 10^-5
  P[reconfiguration] = 0.3 × 10^-3
  P[second module fail] = 0.8 × 10^-7
  P[MC task fail] = 1.3 × 10^-11

6 Module
  Total chips = 348
  P[LC fail] = 0.4 × 10^-5
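
The first three per-module probabilities of table 7 can be approximated from
the chip counts alone. The sketch below is an interpretation, not the authors'
stated method: it assumes independent chip faults at 10^-6/hr, 6 CPU chips and
78 memory chips per module, and treats a double memory fault as a first fault
in any memory chip followed by a second fault in one of the remaining chips.
All names are illustrative.

```python
CHIP_FAILURE_PROBABILITY = 1e-6  # per chip per hour, as assumed in the text
CPU_CHIPS = 6                    # per module, from table 7
MEMORY_CHIPS = 78                # per module, from table 7


def cpu_fault_probability() -> float:
    """Any one CPU chip failing disables the module."""
    return CPU_CHIPS * CHIP_FAILURE_PROBABILITY


def single_memory_fault_probability() -> float:
    """One memory chip failing; the code corrects this, so the module survives."""
    return MEMORY_CHIPS * CHIP_FAILURE_PROBABILITY


def double_memory_fault_probability() -> float:
    """A second memory chip failing after the first; the code cannot correct this."""
    return (single_memory_fault_probability()
            * (MEMORY_CHIPS - 1) * CHIP_FAILURE_PROBABILITY)


print(f"P[CPU fault]           ~ {cpu_fault_probability():.1e}")
print(f"P[single memory fault] ~ {single_memory_fault_probability():.1e}")
print(f"P[double memory fault] ~ {double_memory_fault_probability():.1e}")
```

Read this way, the code reduces the memory's contribution to module failure
from the order of 10^-4 to the order of 10^-8, so that the handful of CPU
chips, rather than the much larger memory, dominates the module failure
probability.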
The above analysis is portrayed in figure 1, which shows the relationship
between the number of chips required and the probability of failure of the most
critical tasks.
We conclude that all these architectures are capable of achieving the
required reliability given sufficient replication. Using triplication, both of
these architectures are capable of achieving a probability of failure in the
region of 10^-6 to 10^-7/hr. Where reliability requirements are more stringent,
as in the case for commercial aircraft, the multichannel approach can only
achieve sufficient reliability at a cost significantly higher than that
achievable by the SIFT architecture. In both cases, the use of coding in memory
can have a significant impact on reliability and cost by handling single-error
correction and double-error detection in memory in a very economic manner.

FIGURE 1  PROBABILITY OF FAILURE OF MOST CRITICAL FUNCTIONS P[MC fail] AGAINST
NUMBER OF CHIPS
(Vertical axis: P[MC fail], from 10^-4 to 10^-13. Horizontal axis: number of
chips, from 0 to 1200. Legend: nMC = n Multichannel; nS = n Module SIFT;
nSC = n Module SIFT + CODING.)
FACTORS INFLUENCING FUTURE COMPUTER ARCHITECTURES
We have discussed in the preceding sections the problems of designing
fault-tolerant computers for advanced avionics requirements. We now examine
the forces that will influence such computer designs in the future,
particularly the period 1980-85. We see three types of influences: changes in
requirements, advances in technology, and maturity in this specialized design
field.
In looking at requirements we expect to see an increase in the computing
load. To a large extent, this will be due to the trend towards aircraft
designs that require substantial real-time control systems for critical
functions. Obvious examples are flutter and attitude control. In addition, the
operational modes of commercial aircraft will change. We would expect to see
more extensive use of automatic blind-landing systems, collision-avoidance
systems, and automatic or semi-automatic route-control systems. In summary,
we see a greater requirement due both to more advanced aircraft designs and to
a wider range of operational modes.
The most significant development of technology in the late 1970s is
expected to be the wide application of large-scale integrated (LSI) technology.
This will have several effects. First, we observe that low-cost production
of LSI circuits relies upon large-volume production, and therefore there will
be a strong incentive to use standard circuits whenever possible. This will
greatly influence the type of acceptable computer architectures. Design
concepts such as those discussed in the preceding section are the types that
will be favored over designs relying upon specialized logic to carry out the
various functions associated with fault tolerance.
The second effect will be that the demonstrable inherent reliability of a
circuit will be available only on large-production-volume devices. This effect
will be another force towards the use of standardized circuits whenever possible.
A third effect of LSI development will be the availability of back-up storage
units based upon electronic (i.e., non-mechanical) technology. Such
developments as bubble or charge-coupled memories potentially can be used to hold data
either for later use, or for re-entry into main memory after a memory fault.
The third significant force that will influence future avionics computers
stems from the increasing maturity in this field. Most designs in the past
were arbitrary designs developed in vacuo, i.e., each design effort did not
rely upon results of other efforts. There was little that could be taken from
one effort to assist another. This is now changing so that the community of
fault-tolerant computer designers can borrow from the results of others. Notable
examples of this expanding technology base are error correcting/detecting codes,
reliable switches, and reliable clocks. We still see deficiencies in the
technology base, but expect that with continued research they will disappear.
The most notable present deficiencies are in the field of reliability modeling
and in the area of certification. At present, reliability modeling as an art
can analyze only very idealized systems and must make strongly simplifying
assumptions (e.g., that faults are independent and permanent). We
expect that reliability modeling techniques will be developed to the point where
more realistic reliability analyses can be carried out. In considering any
fault-tolerant architecture, one is faced with the problem of certification of
the procedures used for achieving reliability. These procedures may be
implemented in either hardware or software, but whichever implementation is used
there is a need to prove that the desired reliability characteristics are
achieved. The present progress in the field of program proving gives us grounds
to believe that formal proofs of fault-tolerant behavior will be possible.
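To make concrete the kind of idealized model referred to above, the following minimal sketch computes the failure probability of a triplicated system under exactly those simplifying assumptions: faults are independent and permanent, each channel has a constant failure rate, and the voter is assumed perfect. The function names and the numerical values are hypothetical and are not taken from the paper's analysis.

```python
import math


def channel_failure_probability(failure_rate_per_hr, hours):
    """Probability that one channel has failed by time `hours`,
    assuming a constant failure rate (exponential model)."""
    # expm1 keeps precision when failure_rate_per_hr * hours is very small
    return -math.expm1(-failure_rate_per_hr * hours)


def two_of_three_failure_probability(failure_rate_per_hr, hours):
    """Probability that at least two of three independent channels have failed,
    i.e. that a perfect majority voter no longer has a correct majority."""
    q = channel_failure_probability(failure_rate_per_hr, hours)
    return 3 * q**2 * (1 - q) + q**3


# Hypothetical illustration: channels with 1e-4 failures/hr over a 1-hour task
print(two_of_three_failure_probability(1e-4, 1.0))  # roughly 3e-8
```

A more realistic analysis would have to account for transient faults, correlated faults, and imperfect voters or switches, none of which this sketch models.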
To summarize, we see a strong trend towards the use of LSI circuitry with
its attendant reduction in the number of devices, thus greatly improving the
intrinsic reliability of computers. In addition, we expect advances in the
theory and practice of designing, analyzing, and certifying fault-tolerant
computers for aircraft control applications.
We see the greatest need for improved techniques in the following areas:
(a) Structures for logic, systems, and software that provide
both high levels of fault tolerance and ease of analysis,
without the penalty of gross inefficiency or too inflexible
a structure.
(b) Economical and accurate methods for verifying the correctness
of system hardware and software with respect to fault tolerance
and proper servicing of application programs.
However, there appears to be no fundamental reason why very reliable
computers cannot be built within reasonable economic constraints. We would
envisage such computers to use more than one technique to achieve adequate
reliability. The main techniques would be replication, coding, and
reconfiguration.
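How replication and reconfiguration can work together may be sketched as follows: replicated modules compute the same task, a majority vote masks a faulty result, and the dissenting module is then replaced by a spare. This is a hypothetical illustration only; the names `vote` and `reconfigure` and the module labels are assumptions and do not describe the mechanism of any particular architecture discussed in the paper.

```python
from collections import Counter


def vote(outputs):
    """Majority vote over replicated results.

    `outputs` maps a module name to the value it computed. Returns the
    majority value and the set of modules that disagreed with it.
    """
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count <= len(outputs) / 2:
        raise ValueError("no majority among replicated outputs")
    dissenters = {name for name, v in outputs.items() if v != value}
    return value, dissenters


def reconfigure(active, dissenters, spares):
    """Drop dissenting modules and bring in spares to restore the replication level.

    Returns the new active list and the remaining spares; inputs are not modified.
    """
    new_active = [m for m in active if m not in dissenters]
    remaining = list(spares)
    while len(new_active) < len(active) and remaining:
        new_active.append(remaining.pop(0))
    return new_active, remaining


value, bad = vote({"A": 5, "B": 5, "C": 7})      # module C has produced a wrong result
print(value, bad)                                # 5 {'C'}
print(reconfigure(["A", "B", "C"], bad, ["D"]))  # (['A', 'B', 'D'], [])
```

In this sketch a single disagreement is enough to retire a module; a practical design would presumably distinguish transient from permanent faults before reconfiguring.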
CONCLUSIONS
In some new aircraft types under development there is a need for
computational resources to handle very critical functions; indeed, the safety of
the aircraft will be dependent on the correct functioning of the computer.
In addition, the combination of high reliability and substantial computational
load needed for future aircraft makes the use of simple redundant computer
configurations impractical.
The present reliability art, together with continually improving technology,
promises substantial improvements within the next five years for those aircraft
applications with only modest computational loads. However, to meet the
larger set of computational requirements that have been suggested, at the
necessary reliability levels, advances in the art of fault-tolerant computer
design will be required.
