Fault-tolerant computer architecture based on INMOS transputer processor by Ortiz, Jorge L.
Fault-Tolerant Computer Architecture
Based on INMOS Transputer Processor
Final Report
NASA/ASEE Summer Faculty Fellowship Program --1987 
Johnson Space Center 
Prepared by: Jorge L. Ortiz, Ph.D.
Academic Rank: Associate Professor
University & Department: Electrical and Computer Engineering Department, University of Puerto Rico, Mayaguez, P.R. 00709-5000
NASA/JSC
Directorate: Engineering and Development
Division: Avionic Systems
Branch: Flight Data Systems
JSC Colleague: Michael M. Thomas
Date: August 7, 1987
Contract Number: NGT 44-001-800
https://ntrs.nasa.gov/search.jsp?R=19880005498 2020-03-20T09:07:55+00:00Z
ABSTRACT 
Redundant processing has been used for several years in
mission flight systems. In these systems, more than one
processor performs the same task at the same time, but only
one processor is in actual use. A fault-tolerant
computer architecture based on the unique features provided
by INMOS Transputers is presented in this report. The
Transputer architecture provides several communication links
that allow data and command communication with other
Transputers without the use of a bus. Additionally, the
Transputer supports parallel processing, which can increase
system speed considerably.
The processor architecture consists of three processors
working in parallel. All processors are kept at the same
operational level, but only one is in real control
of the process. The design allows each Transputer to test
the other two Transputers and report the operating
condition of its neighbor processors. A graphic display has
been developed to help the user identify any
problem.
I. Introduction 
The concept of redundant processing has been used in
space flight for a long time, especially for critical maneuvers
such as landing or launch. In these cases, several
processors work in parallel performing the same
task, but only one of them is in real control of the process.
If something goes wrong with this computer, the system
operator or astronaut can switch the operation to another
processor.
Recently, the C. S. Draper Laboratory designed a fault-tolerant
processor called the Advanced Information Processing
System (AIPS), based on the concept of keeping three
or more processors working redundantly and testing each
other to "vote" on the status of the other processors. In
this fashion, the user has information about system
performance in real time. This system is linked by a data
communication bus called the Inter-Computer Bus (IC) for
communication between processors and other I/O devices.
A fault-tolerant computer architecture based on the
unique features provided by the INMOS Transputer has proved to be
an adequate alternative to this kind of processor. Among the
characteristics that can improve the design of the processor
are the serial communication links, which allow data and
command communication with other Transputers without the
use of a bus, and the capability for parallel processing, which
can increase system speed. Therefore, a Transputer Fault-Tolerant
Processor (TFTP) built on Transputers
could be a faster, more reliable processor.
Discussion and Results 
The first objective of this research was to design a
fault-tolerant processor with a parallel architecture based
on the INMOS Transputer. Two solutions to this problem have been
proposed so far. The first one, presented by Mr.
Dennis Taylor, uses four T414 Transputers, with three of them
working in parallel and the remaining one acting as the
coordinator, as shown in Figure 1. The three parallel processors
perform the same task while the coordinator
compares the results of the operations and reports to the user
if it finds a fault in one of the processors. This
architecture is very efficient and makes it easy to keep control
of the parallel processors, but the possibility exists that
the fault could come from the coordinator itself.
The second architecture, shown in Figure 2, consists of keeping three
or more processors working in parallel. All of them are
kept at the same level, but only one of them performs
the real operation. Each of them keeps testing
the other two processors and reporting
the results of these tests to the operator. The three Transputers are
interfaced to the Transputer Development System (TDS) in an
IBM AT compatible. The design allows each Transputer to
evaluate the other two Transputers and
report the operating condition of its
neighbor Transputers based on that test.
The test consists of sending an
integer constant to a processor; the processor under test
returns the square of that value. This result is
compared with the previously known answer, and a
report is then sent to the host computer. At present, three
Transputers are running in parallel, performing the
indicated test, and showing their reports on the
screen.
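The neighbor test described above can be sketched in a few lines. This is a minimal illustration in Python, not the Occam code used in the project. The test constant, the function names, and the way a software fault is simulated (an off-by-one answer) are all assumptions made for the example; the report does not state them.

```python
from typing import Callable, Dict

# Hypothetical test constant; the report does not say which integer was sent.
TEST_VALUE = 7
EXPECTED = TEST_VALUE * TEST_VALUE  # the previously known answer


def healthy(n: int) -> int:
    """A processor under test that answers correctly."""
    return n * n


def faulty(n: int) -> int:
    """A processor with a simulated software fault (assumed off-by-one)."""
    return n * n + 1


def check_neighbor(respond: Callable[[int], int]) -> str:
    """Send the test value and compare the reply with the known square."""
    return "OK" if respond(TEST_VALUE) == EXPECTED else "FAULT"


def build_reports(processors: Dict[str, Callable[[int], int]]) -> Dict[str, Dict[str, str]]:
    """Each processor tests every other processor and reports their status."""
    return {
        tester: {
            name: check_neighbor(respond)
            for name, respond in processors.items()
            if name != tester
        }
        for tester in processors
    }


reports = build_reports({"T0": faulty, "T1": healthy, "T2": healthy})
for tester, statuses in reports.items():
    print(tester, statuses)
```

With a fault injected in T0, the reports built by T1 and T2 both mark T0 as FAULT, while T0 still reports its two neighbors as OK.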
Figure 1. Fault-Tolerant Processor With a
Coordinator Processor
Figure 2. Transputer Fault-Tolerant Processor
Architecture
(Diagram labels: Transputers T0, T1, and T2; HOST B004; links L0 through L3.)
Figure 3. Program Flow Chart
The three processors execute the same
software sequence at the same time. As shown in Figure 3,
each processor starts by executing the main task, follows it with the
system testing algorithm, and ends the sequence
with the system status report that is sent to the host
processor. The software implementing the sequence was
written in Occam.
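The sequence of Figure 3 (main task, system test, status report) can be illustrated with a short sequential simulation. This is a hedged sketch in Python rather than Occam. The `Replica` class, the counter used as a stand-in main task, the stubbed system test, and the `host_log` list are hypothetical names introduced only for this example. On the real hardware the three Transputers run concurrently; here they are stepped one after another.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    name: str
    state: int = 0

    def main_task(self) -> int:
        # Hypothetical workload: every replica advances the same counter.
        self.state += 1
        return self.state

    def system_test(self, neighbors: List[str]) -> Dict[str, str]:
        # Stand-in for the neighbor test; assumed to find no faults here.
        return {n: "OK" for n in neighbors}


def run_cycle(replicas: List[Replica], in_control: str,
              host_log: List[Tuple[str, Dict[str, str]]]) -> Optional[int]:
    """One pass of the sequence: main task, system test, status report."""
    output = None
    for r in replicas:
        result = r.main_task()
        if r.name == in_control:
            output = result  # only the processor in control drives the process
    for r in replicas:
        neighbors = [o.name for o in replicas if o is not r]
        host_log.append((r.name, r.system_test(neighbors)))  # report to host
    return output


replicas = [Replica("T0"), Replica("T1"), Replica("T2")]
host_log: List[Tuple[str, Dict[str, str]]] = []
outputs = [run_cycle(replicas, "T0", host_log) for _ in range(3)]
```

Because every replica runs the same main task on every cycle, all three hold the same state, so control could be handed to another processor without first bringing it up to date.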
A graphic display was developed to help the user
identify any problem. The system shows
a constantly updated screen detailing the status of each
processor and the results of the tests performed between the
processors. Figure 4 illustrates the screen display
when all processors are on line in normal operation. For example,
processor T0 reports on the screen the status of
T1 and T2 (the other two processors), and since everything is
normal at the moment, these processors are shown as "OK".
One of the faults that the processor can detect is a
software fault, where the processor under test for some reason
does not return the correct answer for a numeric operation. As
can be seen in Figure 5, processors T1 and T2 found that
processor T0 has a software fault, and they display the
occurrence of that fault in the status of processor T0. A
similar software fault is simulated in processor T1, and the
results are shown in Figure 6.
Another fault that can be simulated in this system is
a communication or hardware fault. As shown in Figure
7, a fault has occurred in a link at point a, and Figure 8
shows that, due to this fault, the host processor could not
receive the status report from processor T2. Also,
processors T0 and T1 report that a fault has been
detected on that processor.
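The two fault classes described above differ in what the testing processor observes: a software fault produces a wrong answer, while a link fault produces no answer at all. The sketch below illustrates that distinction in Python, using a queue as a stand-in for a Transputer link and a reply deadline to detect silence. The report does not describe how a missing reply was detected, so the timeout, the test constant, and all function names here are assumptions.

```python
import queue
from typing import Optional, Tuple

TEST_VALUE = 7                      # hypothetical test constant
EXPECTED = TEST_VALUE * TEST_VALUE  # previously known answer
TIMEOUT_S = 0.05                    # hypothetical reply deadline, in seconds


def read_link(link: queue.Queue, timeout_s: float = TIMEOUT_S) -> Optional[int]:
    """Return the next value on the link, or None if nothing arrives in time."""
    try:
        return link.get(timeout=timeout_s)
    except queue.Empty:
        return None


def check_link(link: queue.Queue) -> Tuple[str, str]:
    """Classify the reply to the square test as (status, cause)."""
    reply = read_link(link)
    if reply is None:
        return ("FAULT", "no reply: hardware or communication fault")
    if reply != EXPECTED:
        return ("FAULT", "wrong answer: software fault")
    return ("OK", "")


def host_line(name: str, report_link: queue.Queue) -> str:
    """What the host shows for a processor whose report may never arrive."""
    report = read_link(report_link)
    return f"{name}: NO REPORT" if report is None else f"{name}: {report}"
```

In both cases the screens in the report show only FAULT. The cause string is added here purely to make the two paths visible.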
              FAULT TOLERANT PROCESSOR

                    PROCESSOR T0
                       Report
                   T1 Status: OK
                   T2 Status: OK

     Processor T1                  Processor T2
        Report                        Report
     T0 Status: OK                 T0 Status: OK
     T2 Status: OK                 T1 Status: OK

               press any key to stop

Figure 4. Transputer Fault-Tolerant Processor
in Normal Operation
                     12 12 12
              FAULT TOLERANT PROCESSOR

                    PROCESSOR T0
                       Report
                   T1 Status: OK
                   T2 Status: OK

     Processor T1                  Processor T2
        Report                        Report
     T0 Status: FAULT              T0 Status: FAULT
     T2 Status: OK                 T1 Status: OK

               press any key to stop

Figure 5. Software Fault in Processor T0
                      8 8 8
              FAULT TOLERANT PROCESSOR

                    PROCESSOR T0
                       Report
                   T1 Status: FAULT
                   T2 Status: OK

     Processor T1                  Processor T2
        Report                        Report
     T0 Status: OK                 T0 Status: OK
     T2 Status: OK                 T1 Status: FAULT

               press any key to stop

Figure 6. Software Fault in Processor T1
Figure 7. Fault in a Link at Point a.
                     -16 46 11
              FAULT TOLERANT PROCESSOR

                    PROCESSOR T0
                       Report
                   T1 Status: OK
                   T2 Status: FAULT

     Processor T1                  Processor T2
        Report                       NO REPORT
     T0 Status: OK                 T0 Status: ?????
     T2 Status: FAULT              T1 Status: ?????

               press any key to stop

Figure 8. Hardware or Communication Fault in
Processor T2.
Conclusions 
The Transputer Fault-Tolerant Processor has proved to be
an excellent alternative when a reliable processor is
needed. More research is needed to improve link
communications, link synchronization, and link resetting
after a hardware fault occurs. However, the TFTP is
potentially faster than other fault-tolerant processors because of
the Transputer's parallel processing capacity and the
Occam language, which was designed specifically to facilitate
concurrent processing.
References
[1]. Daniel P. Siewiorek, Robert S. Swarz, "The Theory and
Practice of Reliable System Design," Digital Press, 1982.
[2]. Jacob A. Abraham, et al., "Fault Tolerance Techniques for
Systolic Arrays," IEEE Trans. Computers, July 1987,
pp. 65-74.
[3]. Roger Allan, "Technology Report for Fault-Tolerant
Computing, Software is Finding a Powerful Ally in
Hardware," Electronic Design, October 1985, pp. 111-116.
[4]. L. Beaudet, F. Eshragh, "System-level Diagnostics
Troubleshoot Multiprocessors," Computer Design,
July 1987, pp. 77-82.
[5]. J. Bond, "Parallel-Processing Concepts Finally Come
Together in Real Systems," Computer Design, June 1987,
pp. 51-74.
[6]. Advanced Information Processing System (AIPS),
Demonstration Notes, The Charles Stark Draper
Laboratory, Inc., Cambridge, Massachusetts.
