Fault-free performance validation of fault-tolerant multiprocessors by Feather, Frank E. et al.
NASA Contractor Report 178236 
FAULT-FREE PERFORMANCE VALIDATION 
OF FAULT-TOLERANT MULTIPROCESSORS 
(NASA-CR-178236) FAULT-FREE PERFORMANCE VALIDATION OF FAULT-TOLERANT MULTIPROCESSORS
(Carnegie-Mellon Univ.) CSCL 09B N87-15448 Unclas G3/62 44011
Edward W. Czeck, Frank E. Feather,
Ann Marie Grizzaffi, Zary Z. Segall,
and Daniel P. Siewiorek
CARNEGIE-MELLON UNIVERSITY 
Pittsburgh, Pennsylvania
Grant NAG1-190
January 1987 
National Aeronautics and 
Space Administration 
Langley Research Center 
Hampton, Virginia 23665
https://ntrs.nasa.gov/search.jsp?R=19870008015 2020-03-20T13:00:06+00:00Z
Table of Contents
1. Introduction
2. Background
2.1 Proposed Validation Methodology
2.1.1 Validation Framework
2.1.2 Performance Definitions
2.2 Fault-Tolerant Multiprocessor Structure
2.2.1 FTMP Hardware
2.2.2 FTMP Software
2.2.2.1 Scheduling and Dispatcher Strategy
2.2.2.2 FTMP Fault-Handling Software
2.2.3 FTMP Experimental Environment
2.3 Software Implemented Fault Tolerance Structure
2.3.1 Hardware Configuration
2.3.2 SIFT Software
2.3.3 Experimental Environment
3. Instruction Set, Hardware Level Baseline Experiments
3.1 Clock Read Delay
3.2 High Level Language Instruction Execution Times
3.3 High Level Language Instruction Pair Execution Times
4. Executive Level Baseline Experiments
4.1 Data Block Transfers
4.1.1 Experiment Setup
4.1.2 Results of Block Transfers
4.2 Real Time Task Behavior
4.2.1 FTMP Frame Sizes
4.2.2 SIFT Frame Size
4.2.3 Comparison of FTMP's and SIFT's Frame Sizes
4.2.4 FTMP Frame Stretching
4.2.5 SIFT Frame Stretching
4.3 Real Time Scheduler Overhead
4.3.1 FTMP's Real Time Scheduler Behavior
4.3.1.1 Startup Dispatcher Time
4.3.1.2 IPC 'Kick' Times
4.3.1.3 Intra-Task Group Switching
4.3.1.4 Inter-Task Group Switching
4.3.1.5 FTMP Dispatcher-Scheduler Software Overhead Summary
4.3.2 SIFT's Real Time Scheduler Behavior
4.3.3 Comparison of FTMP and SIFT Scheduler
4.4 Fault-Handling Software Overhead
4.4.1 FTMP Software Overhead for Fault Detection and Isolation
4.4.2 SIFT Software Overhead for Fault Detection and Isolation
4.4.3 Comparison of FTMP and SIFT for Fault-Handling Software Overhead
5. Future Work
6. Conclusions
Appendix A. Instruction Execution Times
Appendix B. Block Transfer Execution Times
References
List of Figures
Figure 2-1: FTMP Structure, Programmers Model
Figure 2-2: FTMP Task Frame Structure
Figure 2-3: FTMP Dispatcher Scheduler Strategy, Showing Two Consecutive Frames
Figure 2-4: FTMP Experimental Environment
Figure 2-5: Representation of a Synthetic Workload Task
Figure 2-6: Block Diagram of SIFT Distributed System
Figure 2-7: A SIFT Processor
Figure 2-8: The SIFT Test Environment
Figure 3-1: Basic Task Algorithm
Figure 3-2: Graph of Instruction Times: SIFT vs. FTMP
Figure 3-3: Procedure Calls vs. Parameters
Figure 3-4: A=1 vs. Consecutive Executions
Figure 4-1: Read and Write Execution Times as a Function of Block Size, 1 Triad
Figure 4-2: Read Execution Times as a Function of Block Size, 1 Triad and 2 Triads
Figure 4-3: Write Execution Times as a Function of Block Size, 1 Triad and 2 Triads
Figure 4-4: Actual Frame Size for R4 Tasks
Figure 4-5: Actual Frame Size for R3 Tasks
Figure 4-6: Program Used For Task Stretching - Voting Case
Figure 4-7: R4 Dispatcher Execution Times, 1, 2, and 3 Triads
Figure 4-8: IPC 'Kicks' Timing Diagram
Figure 4-9: Time and Variation Between the Starts of the Application Tasks on Different Processor Triads. Emulating IPC Kick Times
Figure 4-10: Intra-Task Group Switching Times in Milliseconds
Figure 4-11: Inter-Task Group Switching Times in Milliseconds
List of Tables
Table 2-1: Performance Evaluation Matrix
Table 3-1: SIFT Clock Read Results
Table 3-2: Clock Read Results for SIFT and FTMP
Table 3-3: Instruction Times: SIFT vs. FTMP
Table 3-4: SIFT vs. FTMP for Integer Assign A=1
Table 3-5: SIFT vs. FTMP in Addition Combination
Table 3-6: SIFT: Comparison Between Single Instructions and Combinations
Table 3-7: FTMP: Comparison Between Single Instructions and Combinations
Table 4-1: SIFT: Task Stretching Results
Table 4-2: Performance Estimates for FTMP
Table 4-3: FTMP Software Overhead for Fault-Handling
Table 4-4: SIFT's Fault-Handling Software Overhead [Palumbo and Butler 85]
Table 4-5: SIFT's Vote Times [Palumbo and Butler 85]
Table 4-6: SIFT Software Overhead for Fault-Handling Summary
Table A-1: FTMP Instruction Execution Times: Integer
Table A-2: FTMP Instruction Execution Times: Real
Table A-3: FTMP Instruction Execution Times: Long Integers
Table A-4: FTMP Instruction Execution Times: Boolean
Table A-5: FTMP Instruction Execution Times: Miscellaneous Operators
Table A-6: Raw Data: SIFT Clock Read Experiment
Table A-7: SIFT Instruction Execution Times: Integer and Boolean Data Types
Table A-8: SIFT Instruction Execution Times: Miscellaneous Instructions
Table A-9: SIFT Instruction Execution Times: Instruction Combinations
Table A-10: SIFT: Comparison Instruction Combinations Not Done on FTMP
Table B-1: FTMP Block Transfer Times, Read from System to Local Memory
Table B-2: FTMP Block Transfer Times, Write from Local to System Memory
Abstract 
By the 1990s, aircraft will employ complex computer systems to control flight-critical functions.
Since computer failure would be life threatening, these systems should be experimentally validated before
being given aircraft control.
A validation methodology for testing the performance of fault-tolerant computer systems was
developed and applied to the Fault-Tolerant Multiprocessor (FTMP) at NASA-Langley's AIRLAB facility.
This methodology was claimed to be general enough to apply to any ultrareliable computer system.
The goal of this research was to extend the validation methodology and to demonstrate its
robustness by applying it more extensively to NASA's Fault-Tolerant Multiprocessor System (FTMP)
and to the Software Implemented Fault-Tolerance (SIFT) computer system. Furthermore, the
performance of these two multiprocessors was compared by conducting similar experiments.
An analysis of the results shows that high level language instruction execution times for both SIFT and
FTMP were consistent and predictable, with SIFT having greater throughput. At the operating system
level, FTMP consumes 60% of the throughput for its real-time dispatcher and 5% on fault-handling
tasks. In contrast, SIFT consumes 16% of its throughput for the dispatcher, but consumes 66% in
fault-handling software overhead.
1. Introduction 
Aircraft today employ computers to perform isolated functions. If a computer fails, its tasks are 
assumed by the aircrew without loss of life or cargo. Soon aircraft will require an on-board real-time 
computer to perform flight critical control functions. If such a computer were to fail, the craft would be 
unable to fly. One study by the National Aeronautics and Space Administration (NASA) in its Aircraft 
Energy Efficiency (ACEE) Program required the probability of failure be less than 10^-9 at ten hours.
This specified failure rate translates to less than one failure per million years of operations. Two
multiprocessor systems designed to these specifications, SIFT [Wensley et al. 78] and FTMP [Hopkins et
al. 78], have been delivered to NASA's Avionics Integrated Research Laboratory (AIRLAB).
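The conversion from a per-mission failure probability to "one failure per million years" can be checked with a few lines of arithmetic. The sketch below assumes the requirement is a failure probability of 10^-9 for a ten-hour mission and treats the failure rate as constant; the function name and the constant-rate approximation are illustrative assumptions, not part of the original study.

```python
HOURS_PER_YEAR = 24 * 365.25


def mean_years_between_failures(p_fail, mission_hours):
    """Approximate mean time between failures, in years.

    Assumes a small failure probability and a constant failure rate,
    so rate ~= p_fail / mission_hours (an illustrative simplification).
    """
    failures_per_hour = p_fail / mission_hours
    return 1.0 / failures_per_hour / HOURS_PER_YEAR


# Assumed requirement: probability of failure below 1e-9 at ten hours.
years = mean_years_between_failures(1e-9, 10.0)
print(f"about {years:.3g} years between failures")
```

Under these assumptions the rate is 10^-10 failures per hour, which works out to slightly more than one million years between failures.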
Conventional validation methods, such as life testing, would be impracticable for a system designed 
to these constraints. Studies at NASA were conducted to determine system validation and verification 
methodologies, [NASA 79a, NASA 79b]. Two approaches were chosen: the first involved mathematical 
models and verification, and the second involved experiments to test the functionality, behavior, 
performance, and fault-handling capabilities of the system.
The goal of the experimentation described in this paper is to refine the experimental methodology
through its application to FTMP and SIFT. This report covers the following:
• A background of the experimental methodology and the hardware and software environments
for both SIFT and FTMP.
• The instruction set, hardware level baseline experiments conducted on SIFT and FTMP along
with a comparison of the two systems.
• The executive level baseline experiments conducted on the two systems, and a comparison of
the results where applicable.
• Conclusions and future work for the computer systems and the validation methodology.
2. Background 
2.1 Proposed Validation Methodology
Underlying any methodology, there must be a set of guiding philosophies. Over the last decade, 
C-MU has dedicated over 100 man years of effort in the design, construction and validation of 
multiprocessor systems. A partial list of the experimental guidelines developed during the last decade
includes:
• The experimental validation methodology is successively refined as experiments uncover new
information and the methodology is applied to new multiprocessor systems.
• Experiments are designed to validate behavior that is documented, as well as behavior that is
not documented.
• Experiments are conducted in a systematic manner; since the search is for the unexpected,
there are no shortcuts to thorough testing.
• Experiments should be repeatable.
• The feasibility of performing various experiments is tempered by what is available in the
experimental environment. More sophisticated experiments may have to be postponed until the
experimental environment is provided with more tools.
• A building block approach should be used wherein one variable is changed at a time, so the
cause of unexpected behavior is easy to isolate.
• Testing should take advantage of the structural (abstract) levels used in the design of the
system.
With a fault-tolerant, ultrareliable system, other problems arise that make the validation task difficult.
Some of these problems are [NASA 79b]:
• Life testing is inappropriate, due to the large mean-time-to-failure of the system.
• System design complexity makes it difficult to perform failure effect analysis, instrument and
measure all relevant parameters, and use exhaustive testing approaches, since there are a large
number of states and failure modes possible.
• Large scale integration makes it difficult to access control and observation points and to
determine a confidence level for fault coverage.
2.1.1 Validation Framework 
NASA held several workshops to determine validation procedures. One [NASA 79b] in particular 
produced a detailed outline of a validation procedure. The procedure is based on a building block 
approach. Primitive system activities are characterized first. Once these activities are understood, 
complex experiments involving the interaction of primitive activities, as well as complex activities built 
from the basic primitives, may be conducted. This orderly progression ensures uniform, thorough
coverage and maximizes the ability to locate the cause of unexpected phenomena. The steps in the 
methodology include: 
1. Initial checkout and diagnostics.
2. Programmer's manual validation.
3. Executive routine validation.
4. Multiprocessor interconnect validation.
5. Multiprocessor executive routine validation.
6. Application program verification and performance baseline measurements.
7. Simulation of inaccessible physical failures.
8. Single processor fault insertion.
9. Multiprocessor fault insertion.
10. Single processor executive failure response characterization.
11. Multiprocessor system executive failure response characterization.
12. Application program verification on multiprocessor system.
13. Multiple application program verification on multiprocessor system.
The first six tasks in the list validate the fault-free baseline functions of the system, items seven through
eleven characterize the fault-handling capabilities of the processors, and the last two validate the total
integrated environment of the system. This report presents fault-free baseline performance
measurements.
2.1.2 Performance Definitions
Performance is measured in functions per unit time or the time needed to complete a specific
task [Siewiorek, Bell, and Newell 82]. The notion of performance exists throughout the digital design
hierarchy, from the circuit level (switching times) to the system level (application task time). With this
definition and the validation methodology, a performance evaluation matrix can be created, as depicted in
Table 2-1. The vertical axis is the design hierarchy, while the horizontal axis lists the definitions or
characterizations of performance.
Level | Behavior | Throughput | Utilization | Delay
Application | Correct function in integrated environment. | Application task times: flight control, etc. | Idle time. | Variation caused by shared data, increased load.
Executive, Operating System | Correct operation of scheduler, dispatcher, etc. | Operating system primitive times. | O.S. primitives frequency of use. | Variation caused by hardware and data contention.
Instruction Set, Hardware | Correct operation of interrupts, etc. | Instruction and resource times. | HW resource frequency of usage. | Variation caused by hardware contention.
Table 2-1: Performance Evaluation Matrix
In detail, the entries for the table are described as follows:
• Instruction Set, Hardware Level:
   o Behavior: The operation of hardware primitives, such as interrupt and exception
   handling characteristics.
   o Throughput: The time to execute basic primitives, instruction times, bus access,
   interrupts, etc.
   o Utilization: The frequency and percent usage of the hardware resources.
   o Delay: Delay and variation caused by hardware contention.
• Executive, Operating System Level:
   o Behavior: Validate operation of the executive software.
   o Throughput: The execution time of dispatcher-scheduler, message systems, and other
   O.S. primitives.
   o Utilization: The frequency and percent usage of the executive and operating system level
   resources.
   o Delay: Executive primitive contention and delay due to hardware constraints, and
   common O.S. databases.
• Application Level:
   o Behavior: Actions of the system and application software in the fully integrated
   environment.
   o Throughput: The execution times of application (user) tasks and the total useful work
   accomplished by the system.
   o Utilization: Frequency and percent usage of the combined operating system, hardware
   resources, and application tasks in relation to the total usable time (i.e., total available
   throughput less overhead times).
   o Delay: Variation caused by shared databases, hardware contention, and temporary work
   overloads.
Each element in the matrix is not singular and evaluation measures can overlap. The matrix can be 
used for both fault-free and faulty performance measurements. In general, there is a building block 
approach, starting with baseline experiments and moving up to more complex experiments: the same 
approach referred to in the guiding philosophies of the validation methodology. 
2.2 Fault-Tolerant Multiprocessor Structure 
The Fault-Tolerant Multiprocessor, FTMP, is a hardware redundant multiprocessor system designed 
for use in an ultrareliable avionics environment. The architecture is discussed in [Hopkins et al. 78] and
[Draper 83a]. This section gives an overview describing the hardware structure, the software 
composition, and the experimental environment. 
2.2.1 FTMP Hardware 
Figure 2-1 gives the software appearance of the FTMP system. Each virtual processor is a triad
consisting of three synchronized processors¹ executing the same code independently and conducting a
hardware vote on the results. Each processor contains a local PROM, used to hold frequently used
executive code, and a local RAM to store working stacks, data, less frequently used executive code (such
as self tests), and application task code. Code and data are paged into local memory from global memory
as needed. The system's memory is triplex redundant. Data written into system memory is the voted
result from each processor in the triad. Data read from system memory is the voted result of each
memory module in the memory triad. The system bus is a quintuply redundant serial bus, with three
active lines. Active elements are allowed to transmit on only one line, while the receiving unit votes on
information transmitted on these lines. The error latches are registers used to hold voter disagreements
until subsequent error processing. The I/O ports have system bus addresses and are used to communicate
with the external environment (aircraft actuators and sensors, display terminal, etc.).
¹ For clarity in this report, the term processor refers to a single processor in the system, whereas virtual processor, processor triad,
or triad refers to a synchronized processor triple, working as one processor element of a multiprocessor.
[Figure 2-1 diagram: processor triads, 32K system memory, I/O ports 1 through 10, the real time counter, and the error latches, all attached to the system bus.]
Figure 2-1: FTMP Structure, Programmers Model
The system is configured with active processor, memory and bus elements. On a failure of one of 
these elements, the active element is replaced with a spare; system integrity is maintained through the 
hardware voters. In the case of a processor failure with no spare processor available, the triad is retired 
with the non-failed units becoming spares. The workload of the retired processor triad is continued on 
the remaining triads. During normal processing, the active and spare elements are rotated to allow the 
detection of faults on all elements of the redundant system. 
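The voting and error-latch behavior described above can be illustrated in software. The sketch below is only an analogy: FTMP votes in hardware on serial bus lines, whereas this example votes on whole integer words, and the names `vote3` and `vote_with_latch` are hypothetical.

```python
def vote3(a, b, c):
    """Bitwise 2-of-3 majority of three equal-width integer words."""
    return (a & b) | (a & c) | (b & c)


def vote_with_latch(values):
    """Vote on three copies and report which copies disagreed.

    Returns (voted_value, disagreeing_indices). The index list plays the
    role of an error latch: it records the disagreement for later
    processing while the voted value masks the error.
    """
    voted = vote3(*values)
    disagreeing = [i for i, v in enumerate(values) if v != voted]
    return voted, disagreeing


# Copy 2 has a single flipped bit; the vote masks it and flags it.
result, latch = vote_with_latch((0x1234, 0x1234, 0x1235))
print(hex(result), latch)
```

A real system would use the latched disagreement to decide whether to swap in a spare; here the list is simply returned to the caller.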
2.2.2 FTMP Software 
The software support for FTMP includes the real time operating system and the software tasks
required to maintain system reliability. The following two sections describe the scheduling strategy for 
the operating system and the software tasks required to maintain the fault tolerant characteristics of the 
system. 
2.2.2.1 Scheduling and Dispatcher Strategy 
FTMP was designed as a real time computer system intended to execute tasks at a fixed iteration
rate or frequency to meet hard deadlines. This section covers the Scheduler-Dispatcher strategy used in
FTMP to meet these deadlines. The strategy is presented in two parts: an overview and definition of the
task frame structure, and an example of scheduling as seen by a single triad [Draper 83b].
An FTMP task is a single "program" or thread of execution. Tasks in a real time environment run
at regular intervals called task iteration rates. Not all tasks need to run at the same iteration rate,
or frequency (e.g., the status display does not have to be updated as often as an aircraft's control
surface). Tasks are grouped into common rate groups and are executed within a time period called a
frame. FTMP has three iteration rates:
• R4: the fastest, set at 25 Hz or 40 milliseconds per frame.
• R3: an intermediate rate, set at 12.5 Hz or 80 milliseconds per frame.
• R1: the lowest priority, set at 3.125 Hz or 320 milliseconds per frame.
Figure 2-2 shows the time relations of the frame structure, including the 1:4:8 ratio between the frames. 
A major frame is the complete cycle of eight R4 frames, four R3 frames or one R1 frame. 
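The frame lengths and the composition of a major frame follow directly from the three iteration rates. The sketch below derives them and lists which rate groups begin a new frame at each R4 frame boundary. It assumes all three rate groups are aligned at the start of the major frame; the function names are hypothetical.

```python
RATE_HZ = {"R4": 25.0, "R3": 12.5, "R1": 3.125}


def frame_ms(rate_hz):
    """Frame length in milliseconds for a given iteration rate."""
    return 1000.0 / rate_hz


def groups_starting(r4_index):
    """Rate groups whose frame begins at the given R4 frame index.

    Assumes every rate group starts a frame at index 0 of the major frame.
    """
    r4_len = frame_ms(RATE_HZ["R4"])
    starting = []
    for name, hz in RATE_HZ.items():
        r4_frames_per_frame = round(frame_ms(hz) / r4_len)
        if r4_index % r4_frames_per_frame == 0:
            starting.append(name)
    return starting


for k in range(8):  # one major frame is eight R4 frames
    print(k, groups_starting(k))
```

With these rates the frame lengths are 40, 80, and 320 milliseconds, so one R1 frame spans four R3 frames and eight R4 frames.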
[Figure 2-2 timing diagram: two consecutive major frames, showing R4 frames (25 Hz, 40 millisec.), R3 frames (12.5 Hz, 80 millisec.), and R1 frames (3.125 Hz, 320 millisec.).]
Figure 2-2: FTMP Task Frame Structure
The dispatch strategy for the FTMP frame structure as seen by a single triad proceeds in the following
manner. Figure 2-3 presents a diagram of this process.
1. Assume an initial state where the processor is idling, waiting for an interrupt to start the
frame.
2. A timer interrupt occurs and the interrupt handler starts the R4 dispatcher. This interrupt
occurs at regular intervals to signal the start of the R4 frame.
3. The dispatcher does necessary housekeeping. In particular, FTMP's dispatcher marks the
lower iteration rates to start, does I/O for the tasks, and issues reconfiguration commands.
FTMP marks lower rate groups for execution by stringing together the Processor State
Descriptors, PSDs, of the dispatchers².
4. Once the dispatcher is finished with its housekeeping, it begins work on the first application
task of the highest rate group, R4. The tasks to be executed are located in a task queue data
base.
5. When this task is complete, control is passed back to the dispatcher. The dispatcher finds the
next task to execute and the processor begins work on this task. This process continues until
all the tasks in the rate group are completed.
6. At the completion of all the tasks in the primary rate group, R4, control is passed to either
the next lower rate group dispatcher, a previously interrupted task, or the idle state. In Figure
2-3a control is passed to the R3 dispatcher in the first frame. The control is passed by
transferring control to the next task in the PSD chain.
7. The lower rate group, R3, dispatcher selects an application task from its task queue and starts
execution of the task. The selection and execution of tasks continues until all tasks have been
executed, then control is passed down the PSD chain to the lower priority dispatcher, R1, or
an application task. If all tasks have completed for the frame, the idle process will execute³.
8. At some point in this process a timer interrupt occurs signaling the start of the next R4 frame.
The task executing (R3-Application task 2 in Figure 2-3a) is suspended until the R4 tasks
(and possibly R3 tasks) of this frame complete. The R4 dispatcher begins its execution, does
housekeeping and then works on the R4 application tasks.
9. When all the R4 tasks are completed, control is passed to the previously interrupted task or
pending task, such as the R3-application task 2 shown in Figure 2-3b.
10. This process continues with each major frame.
² This can be thought of as a string of procedure calls or interrupts, where the returning location of any procedure may be changed
dynamically.
³ The idle process is the last task in the PSD chain. It will never complete.
[Figure 2-3 timing diagram, panels a and b: timer interrupts, the R4, R3, and R1 dispatchers, R4 and R3 application tasks, the switches between rate groups, and the idle process, plotted against time.]
Figure 2-3: FTMP Dispatcher Scheduler Strategy, Showing Two Consecutive Frames
A potential problem which arises in this strategy is the hogging of the CPU cycles by the highest rate
group. FTMP handles this situation by delaying or slipping the start of the next frame (R4, R3, or R1).
The stretching of the frame should only occur under abnormal conditions such as a temporary increase in
the workload (e.g., fault isolation).
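The priority ordering of the rate groups within one major frame can be sketched with a small simulation. This is a simplified model under stated assumptions: dispatcher housekeeping time is ignored, each rate group is reduced to a single amount of work per frame, preemption at the timer interrupt is modeled by carrying unfinished work into the next R4 frame, and frame slipping is not modeled. The function name and parameters are hypothetical.

```python
def simulate_major_frame(r4_ms, r3_ms, r1_ms, frame_ms=40.0, frames=8):
    """Return (timeline, pending) for one major frame on a single triad.

    timeline[k] lists (group, milliseconds) slices run in R4 frame k.
    R4 work runs first; leftover time goes to R3, then R1, then idle.
    """
    if r4_ms > frame_ms:
        raise ValueError("R4 work exceeds the frame; slipping is not modeled")
    pending = {"R3": 0.0, "R1": 0.0}
    timeline = []
    for k in range(frames):
        if k % 2 == 0:      # a new R3 frame starts every second R4 frame
            pending["R3"] += r3_ms
        if k % 8 == 0:      # a new R1 frame starts once per major frame
            pending["R1"] += r1_ms
        slices = [("R4", r4_ms)]
        left = frame_ms - r4_ms
        for group in ("R3", "R1"):   # order of the dispatcher chain
            run = min(pending[group], left)
            if run > 0:
                slices.append((group, run))
                pending[group] -= run
                left -= run
        if left > 0:
            slices.append(("idle", left))
        timeline.append(slices)
    return timeline, pending


timeline, pending = simulate_major_frame(r4_ms=10, r3_ms=50, r1_ms=30)
for k, slices in enumerate(timeline):
    print(k, slices)
```

In this example the R3 work does not fit in the time left after R4 in one frame, so it is suspended at the next timer interrupt and resumed after the R4 work of the following frame, which mirrors steps 8 and 9 above.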
2.2.2.2 FTMP Fault-Handling Software
Errors⁴ are detected by the hardware majority voters of the bus interface units. An error is
corrected by voting to maintain system integrity, and the error is marked in the error latches for further
processing. The processing of the error latches is done under software control by the System
Configuration Controller, SCC. This task runs under the lowest iteration rate, R1, and completes the
following during its execution:
• Reads the error latches, tests for reasonability⁵, and compacts the error latches into four
words, one for each bus quintuple, for further processing.
• If no errors were detected, SCC rotates active and shadowing elements (processors, memories,
and buses). The rotating of active and spare elements occurs once every ten seconds. If
the elements are not to be rotated, self tests are executed to expose latent faults in the voters,
bus guardian units, and buses.
• If errors were detected from the error latches, fault isolation occurs. The possible source(s) of
errors are determined and isolated by swapping with shadowing units. For the next four
iterations the program remains in this state to discover if an error reoccurs.
   o If the error reoccurs and the source can be determined from the past error(s), the faulty
   unit is retired and a spare brought on-line.
   o If the error does not reoccur, a transient error routine is entered to assign demerits to all
   possible faulty units. If the total of demerits for a unit crosses a threshold, the unit is
   retired.
⁴ An error is the manifestation of a fault which causes a change in the data, whereas a fault is any deviation from the intended
logic. Faults can be classified as hard, permanent faults, or soft, transient faults; either type of fault may or may not cause an error.
⁵ A receive bus line may be faulty, causing either an error latch to be set or an error in the reading of the error latch or both.
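The transient error handling described in Section 2.2.2.2 (demerits assigned to every suspect unit, with retirement once a threshold is crossed) can be sketched as follows. The threshold value, the demerit weight, the unit names, and the choice to retire at "greater than or equal to" the threshold are all assumptions made for illustration; the report does not give these details here.

```python
from collections import defaultdict


class DemeritTracker:
    """Track demerits per unit and retire units that reach a threshold."""

    def __init__(self, threshold=3):   # threshold value is hypothetical
        self.threshold = threshold
        self.demerits = defaultdict(int)
        self.retired = set()

    def transient_error(self, suspects, weight=1):
        """Charge every suspect unit; return the units retired by this call."""
        newly_retired = []
        for unit in suspects:
            if unit in self.retired:
                continue
            self.demerits[unit] += weight
            if self.demerits[unit] >= self.threshold:
                self.retired.add(unit)
                newly_retired.append(unit)
        return newly_retired


tracker = DemeritTracker(threshold=3)
tracker.transient_error(["P1", "B2"])        # error could be either unit
tracker.transient_error(["P1"])
print(tracker.transient_error(["P1", "M3"]))  # P1 reaches the threshold
```

Charging all suspects lets a unit that is repeatedly implicated accumulate demerits faster than units that appear only once, which is the intent of the scheme described above.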
2.2.3 FTMP Experimental Environment 
Figure 2-4 shows the experimental (test) environment for FTMP. The following steps must be taken
to create and run experimental tasks on FTMP. The application task code is created on the VAX and
shipped to the IBM for compilation. The IBM returns to the VAX a listing of errors and assembly code.
The object code is kept on the IBM for linking at a later time. The system memory tables are modified
to include the application code in the task queue and allocate global memory for the task. The tables are
assembled on the IBM in the same manner as the application task. A link file is then sent to the IBM for
linking together the executive routines, application tasks, and system memory tables. A listing of global
variable locations, task code locations and errors is sent down from the IBM along with the load module
for FTMP. The load module is down-loaded to FTMP via the PDP-11 emulation on the VAX and the
test adapter. The experiment is then debugged using the test adapter. Once the experiment is debugged,
the test adapter is used to set flags and iteration values in the experimental tasks, and to dump data from
FTMP's system memory to the VAX for further analysis.
In an effort to shorten the experimental turnaround time, a synthetic workload generator was
proposed by [Clune 84] and developed by [Feather et al. 85]. A synthetic workload is a set of programs
designed to exercise a computer system to check its performance and behavior under artificial conditions.
A natural workload is an environment where the system does useful work. Some of the advantages to
using a synthetic workload over a natural workload are:
• The synthetic workload is easy to create and debug, whereas a natural workload may have to
be created and its set of inputs defined.
• Experiments are easily repeatable, corresponding to the experimental design philosophies.
• Experiments are easily controlled using the workload parameters.
• The workload can be adapted to other systems for performance comparisons.
Conversely, disadvantages to using a synthetic workload are:
• The system must be dedicated when using a synthetic workload, whereas with a natural
workload data can be collected while useful work is being done.
• The synthetic workload is only an approximation of the natural workload.
[Figure 2-4 diagram: the FTMP system and its I/O interface, MIL-STD-1553 and RS-232 links, the test adapter, the Unibus, the PDP-11 emulation, and a display.]
Figure 2-4: FTMP Experimental Environment
A natural task includes a mixture of the following five actions:
1. Read Sensor data.
2. Read Inter-process Communication (IPC) data.
3. Operate on the Sensor and IPC data.
4. Write Actuator Commands.
5. Write IPC Commands.
The synthetic model of a single natural task for FTMP is illustrated in Figure 2-5. Loops represent the
amount of work each of the five actions is to perform in the task. The controllable parameters are thus
the loop counters. The counters are configured during experimental setup. In FTMP's implementation
the real time clock is read at the start of the task, and at the end of each of the actions. The clock times
are then stored in system memory for transfer to the VAX.
At the application level, there is more than one task on a multiprocessor. The performance, behavior
and interaction of the tasks can be modeled by combining several single synthetic tasks. At the
application level the system's synthetic workload parameters include:
• The number of tasks and their frequency of execution (FTMP's real time frame structure).
• The number of triads executing on FTMP.
• The inclusion or exclusion of system executive tasks, such as Display and Configuration
Controller.
Each task parameter is individually controllable.
Workload-Task () ;
Begin
   Read(P, Q, T, R, S) ;
   Read(Time) ;
   For X = 1 to P do
      Read-Sensor-Input ;         (Read Memory)
   Read(Time) ;
   For X = 1 to Q do
      Read-IPC-Data ;             (Read Memory)
   Read(Time) ;
   For X = 1 to T do
      Execute-Instructions ;      (A = B + C)
   Read(Time) ;
   For X = 1 to R do
      Write-Actuator-Command ;    (Write Memory)
   Read(Time) ;
   For X = 1 to S do
      Write-IPC-Command ;         (Write Memory)
   Read(Time) ;
   Store(Clock-Times) ;
End ;
Figure 2-5: Representation of a Synthetic Workload Task
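For readers who want to experiment with the structure of the synthetic task, the following is a minimal Python sketch of the same five-phase loop. It is not the FTMP implementation: the in-memory list standing in for system memory, the `clock` parameter, and the helper `phase_durations` are assumptions introduced for illustration.

```python
import time


def workload_task(p, q, t, r, s, clock=time.perf_counter):
    """Run the five synthetic phases and return six clock readings."""
    memory = [0] * 64          # stand-in for system memory (hypothetical size)
    a, b, c = 0, 1, 2
    stamps = [clock()]         # clock read at the start of the task

    for i in range(p):         # read sensor input (memory read)
        _ = memory[i % 64]
    stamps.append(clock())

    for i in range(q):         # read IPC data (memory read)
        _ = memory[i % 64]
    stamps.append(clock())

    for _i in range(t):        # execute instructions (A = B + C)
        a = b + c
    stamps.append(clock())

    for i in range(r):         # write actuator command (memory write)
        memory[i % 64] = a
    stamps.append(clock())

    for i in range(s):         # write IPC command (memory write)
        memory[i % 64] = a
    stamps.append(clock())

    return stamps


def phase_durations(stamps):
    """Elapsed time of each phase, from consecutive clock readings."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


print(phase_durations(workload_task(1000, 1000, 1000, 1000, 1000)))
```

The five loop counters are the controllable parameters; setting a counter to zero removes that phase while keeping the clock reads in place, which gives a simple estimate of the clock read overhead itself.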
2.3 Software Implemented Fault Tolerance Structure 
SIFT was designed and built by Bendix Flight Systems Division, under subcontract to SRI
International, and delivered to the AIRLAB in April 1982. This section gives a brief overview of
SIFT's hardware configuration, software, and experimental environment [SRI 84].
2.3.1 Hardware Configuration
The SIFT architecture is made up of a fully distributed configuration of Bendix BDX-930 processors,
with point-to-point communication links between every pair of processors as shown in Figure 2-6.
Although SIFT was designed and built to accommodate eight processors, there are seven in the current
system. Reliability estimations have demonstrated that six are needed to meet the required safety margin of
less than 10^-10 probability of failure per hour [Palumbo and Butler 85]. The seventh processor is used by
the Data Acquisition System described in the next section.
In a fully distributed system, dependency on shared facilities is kept to a minimum. Therefore,
each SIFT processor contains its own main memory, power supply, clock, and I/O channel. A block
diagram of a SIFT processor is given in Figure 2-7. Each processor in the system comprises:
• 16-bit bit-sliced CPU.
• 32K words of static random access memory (RAM) which holds the SIFT executive program,
the application programs, the transaction and data files, and the control stack.
• A broadcast controller for interprocessor communication.
• A 1553A controller used to support external I/O to terminals, sensors, or avionics modules.
• 1K words datafile memory used as a buffer area for the broadcast and 1553 controller.
• 1K words transaction file memory used to hold the destination address of the values in the
datafile to be transmitted.
• A real-time clock driven by a 16 MHz crystal.
[Figure 2-6 diagram: processors 1 through 7, each with its own memory, a broadcast transmitter and broadcast receivers, and a 1553 I/O bus to sensors and actuators.]
Figure 2-6: Block Diagram of SIFT Distributed System
2.3.2 SIFT Software 
To run an experiment on SIFT, the user writes a task in Pascal on the host computer. Once a task 
is written, it is compiled, assembled, and linked with the SIFT operating system. This procedure creates 
an absolute executable image file that can be loaded directly onto the selected SIFT processors. 
Reliability is achieved by replicating the task on more than one processor. The number of processors 
chosen is specified by the user, whose decision is based on the importance of the task. 
Allocation of a task is done through a user-defined Schedule Table. The Schedule Table lists the set
of tasks that will be periodically dispatched, along with task-specific information. It is the user's job to
decide the order in which tasks are executed, the number of processors used for replication, and the data to be
voted. The user must also specify the 'duration' of the task in increments of 1.6 millisecond slots. This
step ensures results are broadcast in time for voting. It also prevents a non-faulty processor from being
configured out of the system because of a task timeout.
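As an illustration of the duration rule above, the following Python sketch rounds an estimated task run time up to a whole number of 1.6 millisecond slots. The function name `slots_needed` and the use of integer microseconds are assumptions made for this example; the actual Schedule Table format is not reproduced in this report.

```python
SLOT_US = 1600  # slot length stated in the text: 1.6 milliseconds


def slots_needed(estimated_us: int, slot_us: int = SLOT_US) -> int:
    """Return the smallest whole number of slots that covers the estimate.

    Integer microseconds are used so that exact multiples of the slot
    length are not disturbed by floating-point rounding.
    """
    if estimated_us <= 0:
        raise ValueError("estimated run time must be positive")
    # Ceiling division on integers.
    return -(-estimated_us // slot_us)
```

Under this rule a task estimated at 2 milliseconds would be declared with a duration of two slots, since one slot would leave it open to a timeout.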
Figure 2-7: A SIFT Processor
After execution of a task, the results from each processor are compared, or "voted" on. If the copies
are not all the same, an error has occurred. These errors are recorded in the processors' memories to assist
the Executive System in determining which processor is faulty. If an error occurred, the Executive 
System masks the fault by ensuring that only the correct or "majority" value is passed onto the next 
task. Fault masking prevents a faulty unit from causing problems in the system, such as corrupting a 
non-faulty processor's memory. If fault masking is not done, a faulty or "malicious" processor could 
create a life threatening situation, such as transmitting an invalid control signal. Once a processor has 
been found faulty, the Executive System reassigns the processor's tasks to another processor, thereby 
configuring it out of the system. 
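The voting and masking step described above can be sketched as a simple majority function. This is a generic majority vote written in Python, not the SIFT Executive's actual code; the name `vote` and the returned list of disagreeing replica indices are illustrative assumptions.

```python
from collections import Counter


def vote(values):
    """Majority-vote over replicated results.

    Returns (majority_value, disagreeing_indices). If no value is held by
    a strict majority of the replicas, majority_value is None and every
    replica index is reported, because no copy can be trusted.
    """
    if not values:
        raise ValueError("at least one replica result is required")
    value, count = Counter(values).most_common(1)[0]
    if count * 2 <= len(values):
        return None, list(range(len(values)))
    disagreeing = [i for i, v in enumerate(values) if v != value]
    return value, disagreeing
```

The majority value is what would be passed on to the next task, while the disagreeing indices correspond to the error records used to decide which processor is faulty.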
2.3.3 Experimental Environment
SIFT provides the experimenter with a user-friendly test environment, promoting experimentation 
through interactive facilities designed to help prepare, exercise, and observe the system’s behavior. From 
a terminal linked to the host computer, a researcher can create and run experiments on SIFT, collect 
data, print out files, and dump data to an on-line printer. Figure 2-8 depicts the test environment as
seen by the user. All communication to and from SIFT is through a VAX-11/750. This host computer is 
solely dedicated to SIFT research. NASA also installed added features to the SIFT environment to 
enhance experimental conditions: a Data Acquisition System (DAS) for improved data collection and a 
global clock for improved measurement conditions. 
Figure 2-8: The SIFT Test Environment
DAS is made up of many integrated programs that receive and analyze data from the SIFT 
processors. These programs are downloaded to the seventh SIFT processor, which can then control data 
collection. Before this system was created, data collection was limited to 4K words of memory. With the 
Data Acquisition System, information is sent from the SIFT processors to a disk capable of holding 50K 
blocks, a total of 12.8 Mwords. DAS requires some initial preparation, but it features an interface 
program that facilitates the task. A preprocessing program is also available which provides the user with 
the ability to manipulate the data straight from the disk, and the ability to specify what data to save for 
later processing. 
The global clock is a 16-bit counter, like the real-time clocks of the SIFT processors, except that it is
an independent measuring device. It features a programmable time-base so the user can specify the
resolution of the clock (e.g., 1 microsecond or 1 millisecond). The processors gain access to the clock via
a read bus. The clock values are available for all processors simultaneously, since there is no arbitration 
for this bus and therefore no contention. The advantage of having a global clock is the assurance of the 
consistency and reliability of the measurements taken by the processors, since clock times come from a 
common external reference. 
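A fixed-width counter eventually wraps around, so elapsed time between two readings is normally computed modulo the counter range. The Python sketch below assumes a 16-bit counter that wraps to zero and at most one wrap between the two reads; the helper names and the `resolution_us` parameter are hypothetical and only illustrate the trade-off between resolution and measurable interval.

```python
COUNTER_BITS = 16            # counter width assumed for this sketch
MODULUS = 1 << COUNTER_BITS  # number of distinct counter values


def elapsed_ticks(start: int, end: int) -> int:
    """Ticks between two readings, tolerating a single wraparound."""
    return (end - start) % MODULUS


def elapsed_us(start: int, end: int, resolution_us: int = 1) -> int:
    """Elapsed time in microseconds for a given tick resolution."""
    return elapsed_ticks(start, end) * resolution_us


def max_interval_us(resolution_us: int = 1) -> int:
    """Longest interval that can be measured without ambiguity."""
    return (MODULUS - 1) * resolution_us
```

With a 1 microsecond time-base the unambiguous range is about 65.5 milliseconds, which is why a coarser resolution would be chosen for longer measurements.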
3. Instruction Set, Hardware Level Baseline 
Experiments 
Instruction set and hardware baseline experiments consist of measuring the characteristics of the
most basic level of the computer visible to the programmer. This includes measuring execution time of
hardware instructions and testing the existence of and time to process interrupts. Interrupt times for
FTMP were covered in [Feather et al. 85]. SIFT's interrupts have not been validated yet.
Figure 3-1 illustrates the typical loop for measuring instruction timings. This task reads the global
clock and stores the value of the starting time in memory. It then enters the loop where it executes the
statement being tested LOOPCOUNT times. After the loop terminates, the global clock is read again
and the ending time is stored. The value of LOOPCOUNT and the number of times the task executes to
collect data are controlled by external variables set by other tasks or by the experimenter through an
interface. This task is set up to test instruction times for all processors. For both SIFT and FTMP, the
null loop itself was measured so that the overhead from its execution could be subtracted from the results
of the other statements.
begin
    data[time] := gclock;
    for i := 1 to LOOPCOUNT do
    begin
        < function to be measured >
    end;
    data[time+1] := gclock;
end
Figure 3-1: Basic Task Algorithm
Since the collection loop reads the clock, an experiment to validate clock consistency is first done, 
followed by measurement of High Level Language (HLL) instructions. HLL instructions were chosen since 
this is the level visible to the user, thus the experimentation process is simplified. Also, the executive 
systems are written in HLL. Efficiency of high-level language depends on compiler technology. Instead of 
statement by statement translations, the compiler may optimize to produce instruction pairs that execute 
faster than the sum of the single instructions. Thus, instruction pairs were also tested to determine the 
effects of compiler optimizations. This section will cover the three experiments mentioned above: 
• Clock read time.
• High level instruction execution time.
• High level instruction pairs.
3.1 Clock Read Delay 
In the clock read experiment, the global clock was tested for consistency. The clock can be used as a 
measuring tool if reading it produces consistent results, or if variations are predictable. To ensure that
future experiments using the global clock are valid, repetitive readings are performed. Should the clock
be found to be inconsistent, steps should be taken to adjust future experiments [Kong 82].
[Clune 84] measured the time to read FTMP's clock from two triads running standalone and
running simultaneously. On FTMP, the processors read from a common global clock located on the 
memory bus. Thus, simultaneous reads of the clock or system bus accesses can cause contention. The 
results of these experiments are summarized in Table 3-2. These experiments found FTMP’s clock to be a 
reliable measuring device with little additional delay in the face of bus contention. The resolution of 
FTMP’s clock is 250 microseconds. 
In the SIFT experiment, a clock read statement was inserted in the basic task and iterated 100 times. 
Using the experimental procedure described, 3000 data points were collected. Analysis of the data shows 
the global clock to be a reliable measuring tool. Clock read results for the three processors used are 
shown in Table 3-1. 
Clock Read Time Delay
Microseconds per 100 Reads, with Overhead
Processor    Min     Max
P1           2855    2856
P2           2855    2856
P3           2857    2860
Table 3-1: SIFT Clock Read Results
As Table 3-1 illustrates, the clock reads differed by only 1 to 3 microseconds within a processor, a 
negligible amount. As for the variation between processors, the maximum difference was five 
microseconds. This is an excellent result considering the SIFT processors are only loosely coupled. The 
difference in clock read time is caused by slightly different processor execution rates. 
Summaries of SIFT and FTMP clock read results are shown in Table 3-2. For both machines, the
global clock proved to be a reliable measuring device where any delays were predictable and negligible. In
comparison to FTMP, however, SIFT's clock can measure events at a finer grain.
Execution Time for SIFT Clock Read:
    100 Iterations of 1 Clock Read = 2856.5 microseconds (With Null Loop Overhead)
    1 Clock Read = 17.7 microseconds (Without Null Loop Overhead)
Execution Time for FTMP Clock Read:¹
    16 Iterations of 5 Clock Reads = 13.99 milliseconds (With Null Loop Overhead)
    1 Clock Read = 172 microseconds (Without Null Loop Overhead)
Table 3-2: Clock Read Results for SIFT and FTMP
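The null loop correction can be checked against the SIFT figures. The Python sketch below assumes that the per-iteration null loop time is the 10.86 microsecond SIFT entry listed for the null loop in Table 3-3, and that the correction is a straight subtraction; the function name is illustrative and the report does not spell out the exact procedure.

```python
def per_statement_us(total_us: float, iterations: int, null_loop_us: float) -> float:
    """Time of one statement after removing the empty-loop overhead.

    total_us     -- measured time for the whole loop, statement included
    iterations   -- number of loop iterations (LOOPCOUNT)
    null_loop_us -- measured time of one iteration of the empty loop
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return (total_us - iterations * null_loop_us) / iterations


# SIFT clock read: 100 iterations took 2856.5 microseconds in total.
sift_clock_read_us = per_statement_us(2856.5, 100, 10.86)
```

Under these assumptions the result is about 17.7 microseconds per clock read, which agrees with the SIFT value reported without null loop overhead.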
3.2 High Level Language Instruction Execution Times 
In the second category of baseline experiments, the execution times of various instructions were
measured. Since we eventually want to compare two systems, efforts were made to ensure that as many
as possible of the applicable instructions tested on FTMP were also measured on SIFT. Table 3-3 presents the
instructions that were measured on FTMP and SIFT. Since the SIFT architecture does not provide
hardware or software support for real or long word (32 bit) data, only integer and boolean data types were
tested on this system.
As an overall comparison, the execution speed of SIFT instructions is listed alongside FTMP's in
Table 3-3 and illustrated in Figure 3-2. Although SIFT requires more time than FTMP in negating
variables, it is faster at all other instructions, including procedure calls. For example, when executing a
boolean 'OR' function, SIFT is 219% faster. This disparity is due to the differences in compilers.
Whereas SIFT's compiler simply loads the variables into two registers and ORs them, FTMP's
compiler tests each variable separately and executes code depending on the outcome of the test (i.e., if the
first variable is true, it jumps without testing the other). The worst case is when both variables are false: it
must test both variables before it can jump. An overall unweighted average (assuming all instructions
tested are equally likely) shows that SIFT is 129% faster than FTMP in executing instructions.
Along with simple instructions, execution times for procedure calls were measured for various
numbers of parameters. To help visualize the results, Figure 3-3 plots procedure calls against number of
parameters. An analysis of Figure 3-3 shows that after some constant overhead, the execution time
increases almost linearly with increasing number of parameters. As a comparison, the results of FTMP's
experiment are plotted on the same graph. Although FTMP's execution time also increases linearly, it
has 474% more initial overhead than SIFT's. Since FTMP is a stack machine, it executes extra
instructions that SIFT does not. It must push the number of parameters on the stack before executing a
return statement. The return statement must then pop this number of parameters so it can adjust the
stack pointer before returning control to the calling program, thereby removing parameters no longer
needed.
¹Average of two, reported in [Clune 84].

Instruction Execution Times: SIFT vs. FTMP
(Microseconds per Instruction)
Pascal Instruction           Description                SIFT     FTMP    Percent Difference
A := 1                       Integer Assign              3.70     4.0        8.1%
A := B                       Integer Variable Assign     4.39     5.5       25.3%
A := B + C                   Integer Addition            6.45    10.0       55.0%
A := B * C                   Integer Multiply           12.57    20.2       60.7%
A := B div C                 Integer Division           20.83    21.7        4.2%
A := -B                      Integer Negate              9.48     7.0      -26.2%
A := B = C                   Integer Compare             8.51    23.2      172.6%
A := B >= C                  Integer Compare             9.70    23.5      142.3%
A := B < C                   Integer Compare             9.45    21.2      124.3%
A := True                    Boolean Assign              3.70     4.0        8.1%
A := B                       Boolean Variable Assign     4.39     5.5       25.3%
A := B or C                  Boolean Or                  6.89    22.0      219.3%
A := B and C                 Boolean And                 6.89    21.1      206.2%
A := NOT B                   Boolean Negate              6.26    10.9       74.12%
NULL                         Null Loop                  10.86    17.7       63.0%
Procall()                    Procedure Call              6.45    37.0      473.6%
Procall(A)                   Procedure Call              7.00    51.7      638.6%
Procall(A,B)                 Procedure Call             15.88    57.5      262.1%
Procall(A,B,C)               Procedure Call             20.27    63.2      211.8%
Procall(A,B,C,D)             Procedure Call             24.39    69.0      182.9%
If GO then A:=1              Conditional, True           6.95     9.0       29.5%
If GO then A:=1              Conditional, False          3.70     5.5       35.1%
If GO then A:=1 Else B:=1    Conditional, True           8.32    13.2       58.7%
If GO then A:=1 Else B:=1    Conditional, False          7.14     9.5       33.0%
Table 3-3: Instruction Times: SIFT vs. FTMP
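The percent differences in Table 3-3 appear to be computed relative to the SIFT time, that is (FTMP - SIFT) / SIFT. The Python sketch below reproduces a few table entries under that assumption and averages the 24 percentages as printed in the table; the function names and the choice of sample rows are illustrative.

```python
def percent_difference(sift_us: float, ftmp_us: float) -> float:
    """Percent by which the FTMP time exceeds the SIFT time."""
    return (ftmp_us - sift_us) / sift_us * 100.0


def unweighted_mean(values):
    """Plain average, treating every instruction as equally likely."""
    return sum(values) / len(values)


# (SIFT, FTMP) times in microseconds for a few rows of Table 3-3.
SAMPLE_ROWS = {
    "Integer Assign": (3.70, 4.0),
    "Integer Negate": (9.48, 7.0),
    "Boolean Or": (6.89, 22.0),
    "Procedure Call, no parameters": (6.45, 37.0),
}

# The 24 percent differences as printed in Table 3-3.
PRINTED_PERCENTAGES = [
    8.1, 25.3, 55.0, 60.7, 4.2, -26.2, 172.6, 142.3, 124.3, 8.1, 25.3,
    219.3, 206.2, 74.12, 63.0, 473.6, 638.6, 262.1, 211.8, 182.9, 29.5,
    35.1, 58.7, 33.0,
]

overall_percent = unweighted_mean(PRINTED_PERCENTAGES)
```

Averaging the printed percentages gives roughly 129%, matching the overall figure quoted in the text, and the negative entry reflects the one instruction where FTMP is faster.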
3.3 High Level Language Instruction Pair Execution Times 
In the third category of baseline experiments, the execution times of instruction combinations were 
measured to determine if the results exceeded the worst case time, the sum for executing each instruction 
alone. This was an important experiment since in the SIFT operating system the user is responsible for 
defining the duration of a task: if instruction combinations take longer than expected, the allocated time 
may prove insufficient and the task will time out. It was also of interest to determine if each system’s 
compiler takes advantage of optimizations. In these experiments, each set of instructions was executed in 
the basic task. For the SIFT experiments, using the standard procedure, 3000 data points were collected.
Figure 3-2: Graph of Instruction Times: SIFT vs. FTMP
(Microseconds per instruction for Integer Assign, Variable Assign, Negate, Add, Or, And, Compare, Multiply, and Divide.)
Two approaches were taken for this experiment. One experiment tested the effect on execution times 
when the consecutive iteration of a single instruction was increased. For this case, the integer assign 
statement A=1 was iterated between 1 and 20 times, inside the basic task loop. Figure 3-4 illustrates the
results for SIFT and FTMP. 
Inspection of Figure 3-4 shows that for SIFT, although execution time increases linearly with the 
number of iterations, the slope reflects signs of compiler optimization. An analysis of the SIFT assembly 
code shows that savings occur because the compiler loads a register with the value 1 the first time and
uses stores to assign 1 to A thereafter. For example, a representation of the assembly code for the case
where A=1 was executed five times in a row is illustrated in Table 3-4 for both FTMP and SIFT.
In comparison, the results of FTMP's experiment show that although FTMP's graph is also linear, 
there is no compiler optimization. This result occurs because FTMP is a stack machine and consecutive 
stores are not done. Consequently, although SIFT and FTMP start off with a similar execution time, by 
the 20th iteration SIFT is done 94% sooner.
Figure 3-3: Procedure Calls vs. Parameters
(Microseconds per instruction against number of parameters, 0 to 4, for SIFT and FTMP.)
In the second set of experiments the execution times of instruction pair and triple combinations were 
measured. The architectural difference between register and stack machines was witnessed when the 
instruction combinations performed on FTMP were applied to SIFT. Table 3-6 and Table 3-7 show the
results of these combinations. Table 3-5 is an example illustrating how each machine handles a 
representative instruction combination. 
Comparing the two systems, SIFT's compiler is able to optimize for cases where FTMP's compiler
cannot, such as register allocation to avoid unnecessary loads and stores. In general, the only
optimization FTMP features is a duplicate store: in some cases, if a variable is going to be used twice it
is duplicated instead of stored and reloaded.
Figure 3-4: A=1 vs. Consecutive Executions
(Microseconds per instruction against number of consecutive iterations, 1 to 20.)
Instruction: A=1 (executed five times in a row)
SIFT              FTMP
Load  1,R1        Push 1
Store R1,A        Pop  A
Store R1,A        Push 1
Store R1,A        Pop  A
Store R1,A        Push 1
Store R1,A        Pop  A
                  Push 1
                  Pop  A
                  Push 1
                  Pop  A
Table 3-4: SIFT vs. FTMP for Integer Assign A=1
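The instruction counts in Table 3-4 follow a simple pattern, sketched below in Python. The model counts instructions only: it assumes one register load followed by one store per assignment on SIFT, and one push plus one pop per assignment on FTMP. It says nothing about the time each instruction takes, and the function names are illustrative.

```python
def register_machine_ops(repeats: int) -> int:
    """SIFT-style code: load the constant once, then one store per assignment."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return 1 + repeats


def stack_machine_ops(repeats: int) -> int:
    """FTMP-style code: one push and one pop for every assignment."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return 2 * repeats
```

For five repetitions the model gives 6 instructions against 10, the same counts as in the table, and the gap widens by one instruction for each further repetition.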
Instructions: B = C + D; E = C + D
SIFT              FTMP
Load  C,R1        Push C
Add   D,R1        Push D
Store R1,B        Add
Store R1,E        Pop  B
                  Push C
                  Push D
                  Add
                  Pop  E
Table 3-5: SIFT vs. FTMP in Addition Combination
Table 3-6: SIFT: Comparison Between Single Instructions and Combinations
Table 3-7: FTMP: Comparison Between Single Instructions and Combinations
4. Executive Level Baseline Experiments 
This section discusses the executive level baseline experiments conducted on FTMP with a 
comparison of similar experiments conducted on SIFT. Executive level baseline experiments include the 
behavior of the executive level software (operating system), along with the throughput, utilization, and 
delay of the operating system software. The experiments in this section include: 
• The time to transfer blocks of data to and from local memory for FTMP.
• The actions of the systems when the task is allowed to overrun its allotted time.
• The task iteration rate for both FTMP and SIFT.
• The software overhead for FTMP's and SIFT's real time dispatcher.⁷
• The overhead for FTMP's and SIFT's fault-handling tasks.
4.1 Data Block Transfers 
On a multiprocessor system where processors access a common resource, there can be contention for 
that resource. Experiments to measure the time to access the resource along with the delay and variance
caused by contention for the resource should be performed. On FTMP the shared resources are the global
memory and the I/O ports. During execution, blocks of data or code are transferred to and from the
shared resources by the triads. The remainder of this section describes experiments to measure data block 
transfer times to and from global memory on FTMP. SIFT is a distributed computer system with no 
shared or global memory. For this reason a comparable experiment cannot be done on SIFT. 
All FTMP tasks require access to shared system memory or other common system resources. Access
to these resources is achieved using System Executive Primitives. Executive primitives are basic functions
used repeatedly by most user and executive tasks. FTMP divides these into four categories [Draper 83b]:
• System Bus Service Routines: These procedures are used to read from and write to devices on
  the system bus (except error latches). The read and write routines involve simplex and voted,
  high and low address space, and non-incremental and incremental transfers. Also included
  under this category are hog bus, release bus, and synchronization primitives.
• Error Latch Service Routines: These are specific bus service routines optimized for the error
  latches. Error latches are registers containing voter disagreement data.
• Timer Routines: These keep track of several interval timers in software. There is only one
  hardware timer per processor.
• Miscellaneous Primitives: These include lock and unlock semaphores, test and set routines, and the IPC
  (Inter-process Communication) Kick function used to start R4 tasks in another processor triad.
⁷SIFT's software overhead was presented in [Palumbo and Butler 85]; their results are summarized here for comparison.
Of these executive primitives two system bus routines were examined. The remaining executive routines 
could be validated in a similar fashion as the bus service routines measured. 
4.1.1 Experiment Setup 
This experiment considered two of the system bus service routines: RD and WRT. The functions are 
voted, low address, incremental read and write. These experiments were set up by placing the function
inside a loop, similar to the high level language instructions. The size of the block transfers was varied
from 1 to 200 words⁸ (1 word = 16 bits). The number of processor triads competing for the system bus
was also varied: one triad running to show execution time with no contention for the bus, and two triads
running to show the effects of bus contention.
4.1.2 Results of Block Transfers
Figures 4-1, 4-2, and 4-3 show the execution times of reads and writes to be linear with respect to the
size of the data block being transferred. On these figures, the horizontal axis indicates the size of the
block transfer in 16 bit words. The vertical axis is the average time to transfer the block of data. The
figures also show an increase in average execution time due to bus contention. This increase is nearly
constant throughout the range of block sizes.
Of interest is the variation within each group of data and between the two groups of data. With one
triad, and thus no bus contention, the measured times were all within two clock ticks.⁹ With two triads competing
for the bus, most of the measured times were within eight clock ticks, with a few outliers up to 20 clock
ticks (5 milliseconds) from the mean. (The number of clock ticks to complete a transfer task, 50 transfers
per task, varied from 31, for a one word write, to 248, for a 200 word write.) This bus contention is
illustrated in Figures 4-2 and 4-3, showing the increase in the 95% confidence intervals.
A final word regarding system bus service routines comes from the FTMP Executive Summary
[Draper 84]. In the summary, the authors noted that the bottleneck caused by system bus access (greater
than 150 microseconds overhead per access, as shown on the graphs, which equals about 15 high level
language instructions) lowered the expected throughput of the system. [Draper 84] showed that one third
of the dispatcher time is spent in overhead for bus service routines. This bottleneck could be reduced by
microcoding of some of the I/O functions.
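The tick counts quoted above can be converted to approximate per-transfer times. The Python sketch below assumes the 250 microsecond FTMP clock tick and 50 transfers per task stated in the text, and spreads the whole task time evenly over the transfers, so any loop or task overhead is included in the result. The function names are illustrative.

```python
TICK_US = 250            # FTMP clock period in microseconds
TRANSFERS_PER_TASK = 50  # transfers executed by each measurement task


def ticks_to_ms(ticks: int) -> float:
    """Convert a number of clock ticks to milliseconds."""
    return ticks * TICK_US / 1000.0


def per_transfer_us(task_ticks: int) -> float:
    """Average time of one transfer, given the ticks used by the whole task."""
    return task_ticks * TICK_US / TRANSFERS_PER_TASK
```

Under these assumptions 31 ticks correspond to about 155 microseconds per one word write and 248 ticks to about 1240 microseconds per 200 word write, which is in line with the overhead of more than 150 microseconds per access noted above.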
⁸A full page, 256 words, could not be transferred because of the size of the working stack for an application task.
⁹A clock tick is one period of the system clock, or the observable resolution of the clock. FTMP's clock period was 250
microseconds.
Figure 4-1: Read and Write Execution Times as a Function of Block Size, 1 Triad
(Average block transfer time in microseconds against size of block transfer in 16 bit words, for writes to and reads from system memory.)
4.2 Real Time Task Behavior 
In real time systems, application tasks are executed at a fixed iteration rate or frequency to meet
hard deadlines. Since periodic task execution is an integral part of the system, experiments must be
devised to validate the basic properties of task iteration rates. First, the basic task iteration rate should
be validated. Next, the behavior of these iteration rates in the presence of errors or "malicious" tasks
should be validated. The types of errors that can occur depend in large part on the scheduling
mechanism of the particular real time system. On SIFT, tasks are dispatched by a fixed schedule, as
discussed in Section 2.3. Tasks are divided into 1.6 millisecond slots, assigned at compile time. Thus, the
only malicious properties are a task broadcasting bad data or a task exceeding its slot allocation.
In contrast, FTMP uses a dynamic scheduling mechanism as discussed in Section 2.2. Tasks are 
grouped by execution rates. All of the tasks in a rate group must execute within a frame. FTMP has 
three frame sizes which define the task grouping. Processors select tasks from the appropriate global task 
queues in each frame group. When all higher frame tasks are exhausted, processors select tasks from the 
next lower rate group. Thus, the effect a malicious task has on individual tasks and frames should be 
tested. 
The remainder of this section discusses the experiments used to measure the basic properties of the 
task iteration rate on FTMP and SIFT. 
Figure 4-2: Read Execution Times as a Function of Block Size, 1 Triad and 2 Triads
(Average block transfer time in microseconds against size of block transfer in 16 bit words. Solid: 1 triad; dashed: 2 triads; vertical lines represent 95% confidence intervals.)
Figure 4-3: Write Execution Times as a Function of Block Size, 1 Triad and 2 Triads
(Same axes and legend as Figure 4-2.)
4.2.1 FTMP Frame Sizes
This experiment involves measuring the true task iteration rates for the two highest task rate groups, 
R4 and R3. This experiment measured the difference between consecutive starts of the first task in the 
rate groups. The results yield the nominal frame size and variation of the frame sizes. The data for these 
experiments are presented in Figures 4-4 and 4-5. 
Figure 4-4 shows the spread of data for the R4 frame size. The frame sizes show a similar grouping 
for the one and two triad case, and for the three triad case a similar grouping but at different locations. 
The R4 pattern determines the pattern for the R3 frame. R3 frames are started every second R4 
frame, hence the sum of two consecutive R4 frame sizes determine the size of the R3 frame. For the one 
and two triad case the observed R4 frame size pattern is 36, 42, 114, 92, ... The first two frame sizes
determine the R3 frame size, about 78 milliseconds, and the second pair determine the next R3 frame size,
about 205 milliseconds. These patterns were observed in the unprocessed data dumps. Similarly for the
three triad case the observed R4 pattern is 40, 65, 105, 40, ... Thus the R3 pattern would be 105, 145, ...
This pattern differs from the observed pattern, a single 125 milliseconds group, but the difference could
be explained by additional overheads occurring in the scheduler.
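The relationship between the two rate groups is simple arithmetic: each R3 frame spans two consecutive R4 frames. The Python sketch below applies that rule to the R4 patterns quoted above; the function name is illustrative, and an unpaired trailing R4 frame is simply ignored.

```python
def r3_frame_sizes(r4_frames_ms):
    """Sum consecutive, non-overlapping pairs of R4 frame sizes.

    Each R3 frame starts on every second R4 frame, so its size is the sum
    of the two R4 frames it covers. A leftover unpaired frame is dropped.
    """
    return [
        r4_frames_ms[i] + r4_frames_ms[i + 1]
        for i in range(0, len(r4_frames_ms) - 1, 2)
    ]


one_or_two_triads = r3_frame_sizes([36, 42, 114, 92])
three_triads = r3_frame_sizes([40, 65, 105, 40])
```

The sums are 78 and 206 milliseconds for the one and two triad pattern, and 105 and 145 milliseconds for the three triad pattern, consistent with the values discussed in the text.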
These patterns are probably caused by the variation in scheduler workload between tasks (i.e.,
scheduling and I/O for R3 and R1 tasks, etc.). Further study of the dispatcher will be necessary to fully
characterize its behavior. 
The long frames are in correlation with the long scheduler times (Section 4.3.1.1, Figure 4-7). The 
scheduler will not schedule the start of the next R4 frame until the present R4 frame is complete and ten 
milliseconds is allowed for lower rate group tasks. The two triad case in Figures 4-7 and 4-4 represents
this behavior best:¹⁰ as shown in Figure 4-7 about one-quarter of the execution times lie near 40
milliseconds, and another quarter around 65 milliseconds. Similarly the R4 frame sizes are grouped at 90
and 115 milliseconds, or 50 milliseconds greater than the corresponding scheduler times. From the frame
size measurements, a problem with the FTMP scheduler meeting its real time constraints can be observed.
4.2.2 SIFT Frame Size
In SIFT, the schedule table dispatches tasks periodically. This period is called a "major frame" and
is divided into 1.6 millisecond time slots [Palumbo and Butler 85]. The purpose of this experiment was to
validate the true size of these slots, which consequently validates the frame size. To obtain these results,
three experiments were conducted.
¹⁰The exact correlation between the data is not present because of different experimental setup, the scheduler behavior, and the
distribution of tasks between triads.
Figure 4-4: Actual Frame Size for R4 Tasks
(Three histograms of percent of total distribution against frame size in milliseconds: one triad executing, 668 data points; two triads executing, 792 data points; three triads executing, 1448 data points.)
Figure 4-5: Actual Frame Size for R3 Tasks
(Three histograms of percent of total distribution against frame size in milliseconds, for one, two, and three triads executing.)
In the first experiment, the time between consecutive tasks was measured by reading the clock upon 
entering each task. This delta-time validated the true time for a slot, 1.6 milliseconds. In the second 
experiment, the same two tasks were consecutively run. However, this time the tasks were allocated a 
duration time of two slots per task. In the last experiment, three tasks were run back to back and the 
time was taken upon entering the first task and entering the second task. These experiments showed that 
the slot sizes were consistently 1.6 milliseconds.
4.2.3 Comparison of FTMP's and SIFT's Frame Sizes
In comparing the frame sizes for both systems, SIFT uses a timer interrupt, occurring at a fixed rate, 
to signal the start of the slot. FTMP also uses a timer interrupt, but the interrupt is software controlled 
and armed by the scheduler to allow for frame slippage or software extension of the frame. Hence SIFT's 
frames are constant in size, whereas FTMP's vary because of slipping caused by large and varying
overhead in the scheduler (Section 4.3.1.1). From these results the advantage of using a fixed timer
interrupt, rather than a software maskable interrupt, for real time control is demonstrated.
The next section discusses performance of FTMP and SIFT's task stretching mechanisms. 
4.2.4 FTMP Frame Stretching 
As discussed in Section 2.2, FTMP has three task rate groups or frame sizes: R4 (25 Hz), R3 (12.5
Hz), and R1 (3.125 Hz). Individual tasks in the frame are given time limits and any task that exceeds its
limit is aborted. All tasks in a rate group should finish before the next frame. FTMP has a mechanism
for handling frame slippage, i.e. tasks in a frame taking longer than scheduled. If all tasks in a frame are
not completed before the end of the frame, the frame is extended, or stretched, to give tasks more time.
[Clune 84] ran two experiments: the first experiment determined the effects of an individual task taking
longer than the frame size. [Clune 84] discovered the start of the next frame will be delayed without limit,
until all tasks in the present frame are completed. However, the task will abort if it exceeds its time
limit. The second experiment determined the effect of a frame trying to execute more tasks than could be
completed in the frame's time limit. In this experiment, [Clune 84] created an infinite task queue¹¹ and
showed the frame will slip forever.
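The nominal frame length of each rate group follows directly from its iteration rate. The Python sketch below converts the three rates given above into periods; the function name and the dictionary are illustrative, and the periods are nominal values only, since frame stretching lengthens actual frames.

```python
def frame_period_ms(rate_hz: float) -> float:
    """Nominal frame length in milliseconds for a given iteration rate."""
    if rate_hz <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / rate_hz


# Iteration rates of the FTMP rate groups as given in the text.
RATE_GROUPS_HZ = {"R4": 25.0, "R3": 12.5, "R1": 3.125}

nominal_periods_ms = {
    name: frame_period_ms(rate) for name, rate in RATE_GROUPS_HZ.items()
}
```

This gives nominal frames of 40, 80, and 320 milliseconds for R4, R3, and R1, which can be set against the measured frame sizes of Section 4.2.1 to see how far frames slip.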
Next, the effect of a processor broadcasting bad data was investigated. On FTMP, processors run 
the same program in lockstep and voting is accomplished in hardware. Therefore, having one of these 
processors run incorrectly to test correct voting and reconfiguration is not feasible. However, [Draper 
83a] reported the results of fault injection experiments which verify that a faulty processor will be 
configured out.
¹¹An infinite task queue was created by dynamically changing pointers with the test adapter.
4.2.5 SIFT Frame Stretching
In the SIFT system, as long as a task does not exceed its time allocation and a processor does not 
broadcast bad data, a processor is considered healthy. If either condition is violated, the processor will be 
tagged 'faulty' and be configured out. The purpose of task stretching was to determine whether a faulty 
processor will be reconfigured out. 
To answer this, two experiments were done. The first experiment explored the conditions for task
timeout, i.e. a task not meeting its time allocation. Processors 1 and 3 were allowed to complete their
task before the deadline, while processor 2 stretched its task beyond the allocated time. For this
experiment, the system behaved as predicted: the straying processor was halted when its task took longer
than the scheduler allowed.
In the second experiment, the system's reaction to the broadcast of bad data was explored. For this
experiment, two methods were used to make a processor appear faulty: (1) a task was stretched, preventing
the broadcast of correct data, and (2) correct data was allowed to be broadcast, but the task was stretched
beyond its allocation. The first method was used to create the broadcasting of bad data. The second
method explored the reaction of the system when a processor became faulty but managed to pass valid
data. Figure 4-6 shows the code used for this experiment.
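begin
  data[time] := gclock;                    { clock reading at the start of the task }
  for i := 1 to LOOPCOUNT do
    if i = WHEN then
      if pid = 2 then
        begin
          stobroadcast(passit, 16#ABCD);   { broadcast the test value, hex ABCD }
          for j := 1 to STRETCH do         { lengthen the task on processor 2 }
            extraloop := j;
        end;
  data[time+1] := gclock;                  { clock reading at the end of the task }
end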
Figure 4-6: Program Used For Task Stretching - Voting Case 
To fully control the two conditions which could trigger a configuration, two variables STRETCH and 
WHEN were introduced. STRETCH controls the amount the task running on processor 2 is lengthened, 
or stretched. WHEN signals processors 1, 2, and 3 to broadcast hex value ABCD, arbitrarily chosen data. 
To test the full effects of task manipulation, STRETCH and WHEN were varied from 1 to 100. 
The results of this experiment were as predicted. As Table 4-1 illustrates, when STRETCH was
increased beyond 7, the process running on P2 was stretched beyond its time constraints. Because its task
did not finish, it was forced to broadcast bad data and was therefore configured out. It was also
configured out when STRETCH was increased beyond 7 but WHEN was small. This proved that
although P2 had the opportunity to pass correct data, it was configured out because its task did not meet
its time allocation.

WHEN   STRETCH   P1 Time   P2 Time     P3 Time   P2 Configured Out
 100       7      2.36     12.31        2.31     No
   1       8      2.36     timed out    2.31     Yes
  10       8      2.36     timed out    2.31     Yes
  50       8      2.38     timed out    2.35     Yes
 100       8      2.36     timed out    2.31     Yes
   1      10      2.36     timed out    2.31     Yes
  10      10      2.36     timed out    2.31     Yes
  50      10      2.38     timed out    2.35     Yes
 100      10      2.36     timed out    2.31     Yes
   1     100      3.77     timed out    3.70     Yes
  10     100      3.78     timed out    3.71     Yes
  50     100      3.82     timed out    3.76     Yes
 100     100      3.16     timed out    3.70     Yes
Table 4-1: SIFT: Task Stretching Results
The results of the task stretching experiments showed that SIFT handles "malicious" processors exactly
as predicted: if a processor's task does not complete within its allocation of 1.6 millisecond time frames,
or if the processor broadcasts bad data, the processor will be configured out of the system.
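The rule summarized above can be sketched in a few lines. This is an illustrative model only, not SIFT's executive code: the function name, its arguments, and the use of `None` for a task that never finished are assumptions made for the example; the 1.6 millisecond figure is the time frame length quoted in the text.

```python
SUBFRAME_MS = 1.6  # length of one time frame, as quoted in the text


def is_configured_out(task_time_ms, allocated_frames, data_ok):
    """Return True if a processor would be tagged faulty under the stated rule.

    task_time_ms     -- measured task time in ms (None models a task that never finished)
    allocated_frames -- hypothetical number of 1.6 ms time frames granted to the task
    data_ok          -- True if the processor broadcast correct data
    """
    allocation_ms = allocated_frames * SUBFRAME_MS
    timed_out = task_time_ms is None or task_time_ms > allocation_ms
    # Either condition alone is enough to configure the processor out.
    return timed_out or not data_ok
```

For example, under these assumptions a task that finishes inside its allocation with good data leaves the processor in the configuration, while a timed-out task removes it even if its data was valid, which is the behavior the second experiment was designed to confirm.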
4.3 Real Time Scheduler Overhead 
The behavior and delay of the executive software are two more elements of the Evaluation Matrix of
Table 2-1. This section covers the Dispatcher-Scheduler software overhead. The dispatcher-scheduler
strategy was described in Section 2.2 for FTMP and in Section 2.3 for SIFT.
4.3.1 FTMP's Real Time Scheduler Behavior
The following experiments measured the software overhead of the dispatcher-scheduler tasks for the 
FTMP operating system: 
• R4 dispatcher start up times,
• IPC 'Kick' times,
• Intra-task group switching times, and
• Inter-task group switching times.
4.3.1.1 Start-up Dispatcher Time 
In the dispatch strategy, a timer interrupt occurs in one triad to signal the start of an R4 frame. 
The interrupt handler initiates the R4 dispatcher; the dispatcher schedules pending R3 and R1 tasks¹²,
kicks the other triads to start R4 tasks, and does necessary I/O. At some point the dispatcher activates¹³
the application task. The behavior and execution time of the dispatcher from the interrupt to the start of 
the first application task was measured. This time is spent scheduling lower priority events, starting 
other processors in the frame, executing reconfiguration commands and doing necessary task I/O. 
The synthetic workload generator provides a means to measure execution times and intertask times
of application tasks, but does not provide a means to measure the behavior of the R4 dispatcher. Hence a
custom task was created. This task is the first R4 application task; it reads both the real time clock at 
the start of its execution and the absolute time of the frame start which is stored in system memory (to 
determine the time of the interrupt). The task was repeated 256 times, dumping the clock values into 
system memory for transfer to the VAX. This experiment was repeated for one, two, and three processor 
triads running. 
Histograms of the results are given in Figure 4-7. As seen from these graphs, the data is grouped
into three regions. The dispatch start up times vary in a cyclic pattern of 65, 14, 16, 67, ... milliseconds
for one triad executing, and 65, 15, 15, 40, ... milliseconds for two and three triads.
This pattern is probably caused by the variation in scheduler workload between tasks (i.e.
scheduling and I/O for R3 and R1 tasks, etc.). Further study of the dispatcher will be necessary to fully
characterize its behavior.
This behavior is unacceptable for a real time system. Assuming the frame start interrupt occurs at a
constant rate, the actual time between consecutive application task starts will vary by plus or minus 50
milliseconds from the intended starts. This may cause missed or late data, allowing the real time system
to miss deadlines. (Data on the variation between frame starts and variation between task starts was
presented in Section 4.2.1.) One half of the time the dispatcher takes a full R4 frame or longer to
execute, clearly an implementation problem.
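The plus or minus 50 millisecond figure follows directly from the cyclic pattern reported above. The sketch below is a hypothetical helper, not code from the experiment. It assumes the frame start interrupt is perfectly periodic, so that the deviation of each interval between application task starts from the nominal period equals the difference between consecutive dispatcher times.

```python
R4_FRAME_MS = 40  # nominal R4 frame size


def start_interval_deviation(dispatch_ms):
    """Deviation (ms) of each task-start interval from the frame period.

    dispatch_ms is one cycle of dispatcher start up times; the pattern is
    treated as repeating, so the last entry is followed by the first.
    """
    n = len(dispatch_ms)
    return [dispatch_ms[(i + 1) % n] - dispatch_ms[i] for i in range(n)]


def fraction_at_least_one_frame(dispatch_ms, frame_ms=R4_FRAME_MS):
    """Fraction of dispatcher runs that take a full frame or longer."""
    return sum(d >= frame_ms for d in dispatch_ms) / len(dispatch_ms)


one_triad = [65, 14, 16, 67]  # cyclic pattern reported for one triad executing
print(start_interval_deviation(one_triad))     # [-51, 2, 51, -2]
print(fraction_at_least_one_frame(one_triad))  # 0.5
```

With the one-triad pattern the intervals swing by about 51 milliseconds in either direction, and two of every four dispatcher runs exceed the 40 millisecond frame, consistent with the statements in the text.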
¹²The dispatcher schedules the start of a minor frame, R3 or R1, by appending the processor state descriptor of the rate group
dispatcher behind the active processor state descriptor, the R4 dispatcher.
¹³The term activate refers to a process where the dispatcher transfers control to another process. At process completion, control
is returned to the dispatcher.
[Figure 4-7 consists of three histograms (One Triad Executing, Two Triads Executing, Three Triads
Executing; 6400 points each) of percent frequency of occurrence against execution time in milliseconds,
drawn on a broken scale.]
Figure 4-7: R4 Dispatcher Execution Times, 1, 2, and 3 Triads
To demonstrate any possible effects of system workload, and to show that the frame start interrupt occurs
at its set time, the experiment was repeated under the following conditions.
1. Decrease the work load to just the single application task being executed. The results show 
the same pattern and spread as the other tests. 
2. Extend the frame size from 40 milliseconds to 250 milliseconds. This allowed all tasks to 
complete in a single R4 frame without slippage. Again the dispatcher behaved similarly. It 
was also noted that the frame starts occurred at 250 msec. with no variation¹⁴ (i.e. the
interrupt mechanism works correctly). 
4.3.1.2 IPC 'Kick' Times
[Figure 4-8 is a timing diagram: after the timer interrupt, the first, second, and third triads each run
the dispatcher followed by their R4 task, plotted against time.]
Figure 4-8: IPC 'Kicks' Timing Diagram
One function of the R4 dispatcher is to 'Kick', through an IPC (Inter-Process Communication)
interrupt, the start of the R4 frame in another triad. A timing diagram of this process is given in Figure
4-8. Again the workload generator does not allow the measurement of the time from the 'Kick' to the
start of the application task, but the timing and behavior can be approximated by measuring the
difference between the starts of the application tasks. The desired time to measure would be from the
IPC kick of the first triad to the time when the R4 dispatcher begins execution, or from the frame start
interrupt to the time when the second and third triads start their R4 dispatcher. An approximation to
this behavior is the time between starts of the application task on different triads, labeled as "Measured
Time" in Figure 4-8.
¹⁴There was variation between frame starts at the 40 millisecond size. This variation is caused by frame slippage; the frame is
extended 10 milliseconds past the last R4 task completion time. If the dispatcher takes 45 milliseconds the next frame cannot be
started at the 40 millisecond mark.
[Figure 4-9 consists of three histograms of percent frequency of occurrence against execution time in
milliseconds, drawn on a broken scale.]
Figure 4-9: Time and Variation Between the Starts of the Application Tasks on
Different Processor Triads, Emulating IPC Kick Times
Histograms for the IPC 'Kicks' are given in Figure 4-9. With two triads executing, the times are
grouped around 1.5, 2.5 and 27 milliseconds. With three triads executing, the first to second triad 'kick'
is centered at 3.0 milliseconds with no outliers beyond 5 milliseconds. However, in the second to third
triad 'kick' there was a large group, about 10%, at 24.5 milliseconds. This behavior is undesirable in
real time systems. The long 'Kick' times allow the possibility of one triad running its first task, finishing,
and then running a second task while the second triad is still idle. This behavior was mentioned in
[Clune 84] and its cause should be investigated. Upon inspection of the dispatcher code, a few
possibilities for this unwanted behavior arise. These possibilities include: a problem with the IPC 'Kick'
mechanism, the possibility of the dispatcher 'hanging' because of a locked or failed bus access, or an
extended execution time during a system reconfiguration. Note that the outliers involve between 10 and
18 percent of the data.
4.3.1.3 Intra-Task Group Switching
At the end of the application task, control is passed back to the dispatcher. The dispatcher activates
the next task in the queue (same rate group) or, if all tasks have been dispatched, the dispatcher returns
control to the previously interrupted or pending task (R3 or R1 dispatcher or application tasks). This
experiment measures the intra-task group switching times including R4 to R4, R3 to R3 and R1 to R1 as
shown in Figure 2-3.
This experiment was run using the synthetic workload as a tool, and the measurements were taken
for one, two, and three triads for all rate groups. The data is summarized in Figure 4-10.
As seen from the data the behavior is regular, with skewing as the number of executing triads
increases. This skewing is probably caused by bus access contention, measured in Section 4.1. The large
spread of the data in the R1 task switching could be explained by the difference in the experimental
acquisition of the data. To measure the switching times, the tasks were ordered by controlling the task
lengths. To have one triad execute two tasks from the same rate group requires the other triad(s)
to execute another task (usually an R4 task, since the R4 frame can be extended by lengthening an R4 task).
This method works well for controlling R4 and R3 task orders, but the R1 tasks were interrupted by the
start of the next R4 frame. Hence the R4 and R3 switching times were measured under easily repeatable
conditions, while for the R1 switching times the conditions were repeatable, though to a lesser extent.¹⁵
In general the intra-task group switching times are predictable within a certain range. The time for task
switching is large in comparison with the desired frame size (40 milliseconds). If a triad needed to execute
three R4 tasks in a 40 millisecond frame, 8 milliseconds or 20% would be spent switching between the tasks.
¹⁵In the R3 and R1 dispatcher there are about 20 bus transactions. Each transaction takes about .2 milliseconds. A delay caused
by contention may cause the bus transaction to take two or more times the uncontended time.
[Figure 4-10 consists of histograms of R4 to R4, R3 to R3, and R1 to R1 task switching times (percent
of total distribution against time from 3 to 8 milliseconds) for one, two, and three triads executing.]
Figure 4-10: Intra-Task Group Switching Times in Milliseconds
4.3.1.4 Inter-Task Group Switching
As previously discussed, when the dispatcher finds all the tasks in its rate group completed or
dispatched, it executes a 'resume' to restart the previously interrupted or pending task. Three cases of
this process were studied (see Figure 2-3):
1. Time and behavior of an R4 task finish time to the start of the R3 task in the same triad. (R4
to R3.)
2. Time and behavior of an R4 task finish time to the start of the R3 task in the R4 responsible
triad.¹⁶ (R4 to R3.)
3. Time and behavior of the R3 task finish time to the start of the R1 task in the same triad.
(R3 to R1.)
¹⁶An R4 responsible triad is the last triad to complete its R4 task. This triad is responsible for the start of the next R4 frame by
arming its timer interrupt.
[Figure 4-11 consists of histograms of R4 to R3 task switching times for the R4 responsible triad, R4 to
R3 task switching times for the non-R4 responsible triad, and R3 to R1 task switching times (percent of
total distribution against time from 5 to 10 milliseconds); one panel is marked Not Applicable.]
Figure 4-11: Inter-Task Group Switching Times in Milliseconds
The behavior of these three processes is summarized in Figure 4-11. As shown in the figure, the
behavior of these interactions is predictable and well behaved. The variation in the data is probably the
result of bus contention and queue semaphores, or the dispatcher executing different sections of code, or
both. The large spread in the R3 to R1 switching is caused by the same factors as for the spread in the
intra-task group switching. In summary, the inter-task group switching times are predictable over a range
of 4.5 to 10.5 milliseconds. The execution times of the dispatcher for the inter-task group switching are
large for the desired frame iteration rate. The R4 frame is 40 milliseconds with task switching taking 7
milliseconds, a significant percentage of the useable time.
4.3.1.5 FTMP Dispatcher-Scheduler Software Overhead Summary
This section estimates the useable throughput of the FTMP system from the dispatcher-scheduler
overheads measured in the previous sections. Table 4-2 gives a breakdown of the available throughput
and overheads observed. In Table 4-2, a major frame is defined as eight R4 frames or one R1 frame.
                                       As Designed           As Running            As Corrected
                                     Time     Percent of   Time     Percent of   Time     Percent of
FTMP Overhead                        (msec.)  Total Time   (msec.)  Total Time   (msec.)  Total Time
Total Time, Three Triads             960.
R4 Dispatcher Time                   384.0    40.0         801.6    40.4         272.6    28.4
  (8 per Frame Triad)
Task Switching Times                 222.9    23.2         222.9    11.2         222.9    23.2
  (R4, R3 and R1 task counts are assumed)
  8 R4 responsible                   (45.5)   (4.7)        (45.5)   (2.3)        (45.5)   (4.7)
  16 Non responsible                 (84.5)   (8.8)        (84.5)   (4.3)        (84.5)   (8.8)
  12 R3 to R1                        (81.7)   (8.5)        (81.7)   (4.1)        (81.7)   (8.5)
  3 R1 to R1                         (11.2)   (1.2)        (11.2)   (0.6)        (11.2)   (1.2)
Fault Tolerant Software              46.5     4.8          46.5     2.3          46.5     4.8
  (SCC time 43.3, READALL time 3.2)
Total Useable Time per Major Frame   306.6    31.9         912.0    45.9         418.0    43.5
Table 4-2: Performance Estimates for FTMP
In Table 4-2, the 'As Designed' columns give the performance estimates assuming a 40 millisecond
R4 frame (320 millisecond major frame with three triads executing), and the software overhead times
measured in this report. The R4 dispatcher execution time used was 16 milliseconds: the upper limit of
acceptable times presented in Figure 4-7. This shows 63% of the available throughput is spent in the
dispatcher, counting both the R4 dispatcher time and the task switching times. The 'As Running' columns
of the table use the actual frame sizes as measured, instead of the 40 millisecond R4 frame size assumption,
and use the true dispatcher times measured in this report (Figure 4-7). The overhead percentages are
lowered but the frames are extended, showing the present behavior of the system. This behavior is
inappropriate for a real time system with a 25 Hz iteration rate.
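The 'As Designed' percentages can be reproduced from figures quoted in this section. The sketch below is an illustrative recomputation, not part of the original analysis; the variable names are assumptions, and the inputs are the 40 millisecond R4 frame, eight R4 frames per major frame, three triads, the 16 millisecond dispatcher time, and the task switching and fault tolerant software times of Table 4-2.

```python
R4_FRAME_MS = 40.0       # designed R4 frame size
FRAMES_PER_MAJOR = 8     # a major frame is eight R4 frames
TRIADS = 3
DISPATCH_MS = 16.0       # R4 dispatcher time used for the 'As Designed' case

total_ms = R4_FRAME_MS * FRAMES_PER_MAJOR * TRIADS        # 960 ms of triad time
dispatcher_ms = DISPATCH_MS * FRAMES_PER_MAJOR * TRIADS   # 384 ms
switching_ms = round(45.5 + 84.5 + 81.7 + 11.2, 1)        # 222.9 ms, from Table 4-2
fault_sw_ms = 46.5                                        # SCC plus READALL, from Table 4-2
useable_ms = round(total_ms - dispatcher_ms - switching_ms - fault_sw_ms, 1)


def percent(part_ms, whole_ms=total_ms):
    """Share of the total triad time, rounded to one decimal place."""
    return round(100.0 * part_ms / whole_ms, 1)


print(percent(dispatcher_ms))                 # R4 dispatcher share
print(percent(switching_ms))                  # task switching share
print(percent(dispatcher_ms + switching_ms))  # combined dispatcher-scheduler share
print(useable_ms, percent(useable_ms))        # useable time and its share
```

Under these inputs the dispatcher and task switching shares sum to roughly 63%, which appears to be how the 63% figure in the text is obtained.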
The FTMP Executive Summary, [Draper 84], noted the large bottleneck caused by the long bus
access times (Section 4.1). The authors state that one third of the R4 dispatcher time is spent doing bus
service routines, and that by microcoding some of the I/O functions the bus service times could be lowered
to one-eighth of their current times. Using this estimate, reducing one third of the dispatcher time by 88%
lowers the R4 dispatcher time to 71% of its current value.¹⁷ The 'As Corrected' columns of Table
4-2 present these results. The performance increase gained by microcoding some I/O functions was
applied only to the R4 dispatcher; if this increase were applied to all functions, a larger performance
increase could be expected.
¹⁷33% of the dispatcher is reduced 88% for a 29% total reduction.
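The 'As Corrected' dispatcher figure follows from the two estimates quoted above. This is a minimal sketch with illustrative names, assuming (as the text does) that one third of the dispatcher time is bus service and that microcoding cuts that part to one-eighth; the 384.0, 222.9, and 46.5 millisecond inputs are the 'As Designed' entries of Table 4-2.

```python
def remaining_fraction(bus_share, bus_time_factor):
    """Fraction of the original dispatcher time left after speeding up the bus part."""
    return (1.0 - bus_share) + bus_share * bus_time_factor


exact = remaining_fraction(1 / 3, 1 / 8)   # about 0.708
rounded = round(exact, 2)                  # 0.71, the 71% quoted in the text

# Apply the rounded factor to the 'As Designed' dispatcher time.
corrected_dispatcher_ms = round(384.0 * rounded, 1)
corrected_useable_ms = round(960.0 - corrected_dispatcher_ms - 222.9 - 46.5, 1)

print(corrected_dispatcher_ms)                        # 272.6
print(corrected_useable_ms)                           # 418.0
print(round(100.0 * corrected_useable_ms / 960.0, 1))  # 43.5
```

Using the rounded 71% factor reproduces the 272.6 and 418.0 millisecond entries of the table; the unrounded factor would give a slightly smaller dispatcher time.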
4.3.2 SIFT's Real Time Scheduler Behavior
The overhead for SIFT's Operating System was presented in [Palumbo and Butler 85]. This section
will consider the overhead involved for the real time dispatcher. The results from [Palumbo and Butler
85]¹⁸ will be generalized to provide a parallelism between FTMP and SIFT.
SIFT's real time dispatcher is fully distributed with no master controller. Application tasks or fault-
handling tasks are dispatched by the local executive in each processor. The local executive is responsible
for the timely dispatching of tasks and the voting of data according to the task and vote schedule. The
dispatching of the tasks is the only part of SIFT's operating system considered under the real time
dispatcher. [Palumbo and Butler 85] determined the dispatching of tasks to be nominally 270 microseconds
per subframe per processor, or 16.9% of a 1.6 millisecond subframe.
4.3.3 Comparison of FTMP and SIFT Scheduler
In comparing SIFT's and FTMP's real time dispatchers, SIFT's dispatcher is more efficient. The
reasons for the large difference in overhead include:
• SIFT's dispatcher uses fixed task tables and application code stored in each processor's local
memory, whereas each FTMP triad needs to process the task queue and application code
stored in global memory, which requires multiple bus accesses.
• The flexibility of FTMP's dispatcher (e.g. same dispatcher and task queues regardless of
configuration) requires more processing than the fixed tables used in SIFT's dispatcher.
• FTMP's dispatcher disables the timer interrupts, hence allowing the dispatcher to fail to return
control to application tasks. Furthermore, the disabling of the timer interrupt allows the
frame to stretch possibly forever. [Clune 84]
• FTMP's dispatcher could be improved by using fixed schedule tables and decreasing the
amount of traffic on the system bus.
• SIFT's dispatch overhead, while small, is a significant percentage of the subframe. If the
subframe size were increased, the dispatch overhead would be reduced but the distribution of
tasks may not efficiently fit into the larger frame size.
¹⁸[Palumbo and Butler 85] presented overhead times for three versions of SIFT's operating system; this report will only refer to
the latest, optimized version of the operating system, version V.
4.4 Fault-Handling Software Overhead
For both fault-tolerant systems, software tasks must be executed to maintain system integrity in the
presence of faults. The following sections measure the software overhead required to attain fault
tolerance for FTMP and SIFT.
4.4.1 FTMP Software Overhead for Fault Detection and Isolation
The final FTMP experiment presented in this report measured the software overhead for fault
detection and isolation. The FTMP software tasks for these include READALL and the System
Configuration Controller, SCC. READALL incrementally reads system memory to assure all copies of the
data in the memory elements are the same. SCC is described in Section 2.2. These tasks are specifically
dedicated to fault detection and isolation, but they do not include all the software overhead for fault
tolerance. Some fault detection commands are done in the R4 dispatcher. These include the execution of
reconfiguration and retire commands. This experiment will not take these overheads into
account. The results for this experiment are presented in Table 4-3.

FTMP's Fault Handling Software Overhead
Task       Execution Time    Range           Percent of System Throughput
SCC        43.4 ± 23.3       3.5 - 153.      4.5%
READALL    3.1 ± 0.9         1.25 - 6.25     0.3%
Table 4-3: FTMP Software Overhead for Fault-Handling
The variation in the READALL times is due to bus contention and different sections of code being 
executed. The SCC times show the spread of data as SCC executes different states. These states include 
fault isolation, shadow swapping, transient fault-handling routines, and self tests. These software 
overheads are a small percentage of total system throughput, 4.8%, in comparison to the real time
scheduling overheads for FTMP.
4.4.2 SIFT Software Overhead for Fault Detection and Isolation
The functions required to maintain system integrity in SIFT include: 
1. Clock synchronization to limit the clock skews between processors. 
2. Error task to broadcast error information to all processors. 
3. Fault isolation task to determine the faulty device. 
4. Reconfiguration task to reconfigure the system upon detection of a fault. 
5. Interactive consistency task to replicate external data to all processors.
6. Voting of processed data to maintain integrity.
The first five tasks are executed at the global executive or major frame level (equivalent to an application
task). The last task, voting, occurs at the local executive level inside a subframe. The analysis of SIFT's
fault-handling software overhead from [Palumbo and Butler 85] will be divided along this break: major
frame vs. subframe.
At the global executive or major frame level, the fault-handling tasks are executed once per major
frame with the exception of the interactive consistency task. The interactive consistency task is executed
once per iteration of the application tasks.¹⁹ All of these tasks take a constant number of subframes to
execute and their overhead is tallied in Table 4-4.
¹⁹When application tasks are executed once per major frame, the major frame is called a single frame. Similarly, when the
application tasks are executed three times in a major frame, the major frame is called a triple frame.
Table 4-4: SIFT's Fault-Handling Software Overhead [Palumbo and Butler 85]
At the local executive or subframe level the fault-handling task is voting. The data voted are
determined from a vote schedule constructed at the same time as the task schedule is created. All data
are voted by each processor to assure consistent data on every processor and to avoid transferring of data
during system reconfiguration.
The overhead involved for voting is a function of the number of words voted, the type of vote (three
or five way), and the number of working processors in the system. The time to vote can be approximated
by a linear function of the number of words voted with a constant bias or overhead determined by the
other parameters. These times are summarized in Table 4-5. In order to estimate the overhead involved
with voting at a major frame level, an average vote of four words (five way) in every second
subframe is assumed, with a resulting overhead of 23.6% per subframe.
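The linear vote time model and the 23.6% estimate can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the two value pairs of Table 4-5 are the initial vote time and the time per word in milliseconds, with the smaller pair belonging to the three way vote and the larger pair to the five way vote.

```python
SUBFRAME_MS = 1.6  # SIFT subframe length

# Assumed reading of Table 4-5: vote type -> (initial time, time per word), in ms.
VOTE_PARAMS_MS = {3: (0.245, 0.079), 5: (0.328, 0.107)}


def vote_time_ms(words, way):
    """Linear model: constant initial time plus a per-word increment."""
    initial, per_word = VOTE_PARAMS_MS[way]
    return initial + per_word * words


def vote_overhead_percent(words, way, every_n_subframes):
    """Average share of a subframe spent voting when one vote occurs every n subframes."""
    return 100.0 * vote_time_ms(words, way) / (SUBFRAME_MS * every_n_subframes)


# Four words, five way vote, in every second subframe.
print(round(vote_time_ms(4, 5), 3))           # 0.756 ms per vote
print(vote_overhead_percent(4, 5, 2))         # about 23.6 percent
```

That this reading reproduces the 23.6% quoted in the text supports, but does not prove, the assumed assignment of the table entries.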
Vote Type     Initial Vote Time    Time per Word Voted
Three way     0.245                0.079
Five way      0.328                0.107
Table 4-5: SIFT's Vote Times [Palumbo and Butler 85]
Combining the results of the global executive and local executive overhead, the total overhead
involved per major frame is summarized in Table 4-6.
[Table 4-6 tabulates the overhead by level (e.g. Global Executive) in two frame columns; the entries
that survive are the Total Software Overhead values, 72.7% and 66.3%.]
Table 4-6: SIFT Software Overhead for Fault-Handling Summary
4.4.3 Comparison of FTMP and SIFT for Fault-Handling Software Overhead 
In comparing FTMP's and SIFT's software overhead for fault-handling, the main difference lies in the
architecture. SIFT does most of the fault-handling functions in software, hence a large software
overhead, whereas FTMP accomplishes most of its fault-handling in hardware. The differences between the
two systems in achieving their fault tolerance, and the large difference in overhead, stem from the
following sources:
• Clock synchronization for SIFT is achieved in software whereas FTMP maintains
synchronization with hardware voting of the clock signals.
• Voting for SIFT is completed in software according to a fixed schedule table; hence a limited
quantity of words are voted. FTMP votes during a system bus access; words communicated
between system memory or system bus devices (I/O ports) and local memory are voted upon.
The number of words voted in FTMP is larger but does not result in additional software
overhead.
• Interactive consistency is the largest overhead in SIFT. Interactive consistency involves the
replication of external data to all processors of the system. The overhead is proportional to
the number of application tasks and the number of words required for the tasks. Hence a
faster processor will reduce the time to achieve data consistency, but the data requirements
would also increase. This overhead cannot be reduced significantly without hardware
support [Palumbo and Butler 85]. Although interactive consistency was not considered part
of FTMP overhead, the function is performed in software with hardware support (voters).
FTMP accomplishes interactive consistency by reading the external data in a simplex mode to
local memory. The data is written to system memory via a voted write. Hence the overhead
is two system bus transfers.
• The error task, fault isolation, and reconfiguration functions in SIFT are similar to the FTMP
system configuration controller.
• [Palumbo and Butler 85] described the overhead for three versions of the operating system,
each an improvement over the previous. Although improvements in performance were
obtained, further reductions in overhead, especially with interactive consistency, do not
appear likely without hardware support.
5. Future Work 
Although much work has been accomplished in refining the experimental methodology by applying it
to FTMP, the methodology still needs to be further verified by additional experiments on FTMP and
SIFT. In particular the following items are some areas in which further characterization of FTMP may
be needed:
• Further characterize the dispatcher to determine the time consuming sections and, if possible,
correct this undesirable behavior. This may be done by characterizing some of the executive
primitives the dispatcher uses.
• Determine the overhead required for system reconfiguration. How much overhead is required
for the dynamic redundancy of FTMP?
• Characterize the throughput versus workload and task distribution. Will the system still meet
its deadlines under increased load?
• Further characterize the software overhead for both the faulty and fault-free behavior. This
includes the times to isolate faults and reconfiguration overhead.
• Validate the system configuration controller. Does the controller handle faults correctly? A
log of failed units should be kept to determine faults within the units or controller problems.
• Explore the fault coverage in the self test routines. How many faults can the self test routines
locate in the bus guardian unit and system buses?
• Explore the behavior of multiple faults. [Draper 83c] showed the fault-handling capabilities of
the system by injecting pin level faults. How will the system behave if two faults occur close
together?
These experiments move the validation and performance measurements for FTMP into the application
level of the performance evaluation matrix, along with exploring faulty behavior of the system.
Although application of the validation methodology on SIFT has thus far proven successful, it is by
no means complete. The following items list some areas for further work:
• Implement a synthetic workload to provide a tool for measuring the performance and
interaction of the SIFT system.
• Determine the action if all processors executing a task exceed the allotted time.
• Characterize the interrupts on SIFT. Are the interrupts implemented on SIFT, and how will
interrupts affect the real time and fault tolerant performance of the system?
• Explore communication bandwidth versus the number of tasks. High performance implies
minimizing overhead, hence a single task structure. Maximizing reliability requires
subdividing a task to increase the frequency of voting.
• Investigate communication versus I/O, as both intertask communication and I/O contend for
the same resources.
• Study broadcast bandwidth on SIFT; load the broadcast system to capacity.
• Characterize the effects of malicious liars, especially on clock skews.
• Explore the minimal time between faults such that reconfiguration is successful.
These experiments for SIFT move the validation and performance measurements into the executive level
of the evaluation matrix, along with exploring faulty behavior of the system.
6. Conclusions 
This report outlined a validation methodology for ultrareliable multiprocessors and applied the
methodology to FTMP and SIFT. The methodology entails a building block approach, starting with
simple baseline experiments and building to more complex experiments. Previous work has been done to
measure the baseline performance, as well as characterize most hardware primitives on FTMP [Clune
84, Feather et al. 85]. This report presents a continuation of the baseline experiments on FTMP and the
start of the validation procedure on SIFT. In particular this report presented:
• Clock read delay for SIFT. The global clock proved to be a reliable measuring device with a
resolution of 28.3 microseconds, five times finer than FTMP's clock.
• High level language instruction execution times for both SIFT and FTMP. The execution
times measured, for both FTMP and SIFT, were consistent and predictable, with SIFT having
a greater throughput.
• System memory read and write times and the variance of the times caused by bus contention
on FTMP. The memory read/write times were a linear function of block size
(400 Kbytes/sec.) with an overhead of approximately 150 microseconds. Bus contention showed a
slight increase in average overhead. SIFT is a fully distributed system with no global or
system memory.
• Dispatcher execution times and overhead. In addition to failing to meet its real time
constraints, the FTMP real time dispatcher consumed approximately 60% of the system
throughput, whereas the SIFT dispatcher consumed 16% of a single subframe.
• Fault-handling software overhead. The software overhead involved in the fault-handling tasks
consumed 5% of system throughput for FTMP. The overhead required for SIFT's fault
tolerance required 66% of the system throughput.
From these experiments and their results the following points can be inferred about the FTMP and
SIFT systems.
• In the present implementation of FTMP there is a large overhead consumed by the real time
dispatcher. About one-third of the dispatcher overhead is caused by the large overhead
involved in the system bus access, which is an implementation problem and not dependent on
the fault tolerant design of FTMP.
• With SIFT a large overhead is involved with the fault-handling tasks, especially voting and
interactive consistency. For FTMP a relatively small software overhead is involved with the
system configuration controller, fault detection and isolation, thus showing the advantage of
hardware voting over software voting to obtain system fault tolerance.
The goal of the validation methodology is to thoroughly test and characterize the performance and 
behavior of an ultra-reliable computer system. The validation methodology presented in Section 2.1 and 
applied throughout this report proved effective in the following areas. 
• The methodology uncovered both system implementation dependencies with the instruction
execution times and behavioral oddities in the dispatcher-scheduler.
• The validation methodology is not machine specific. Although specific experiments were
designed for each machine, the methodology is general enough to apply to both systems
without change.
• The methodology showed system validation can occur without using life testing approaches.
• By applying a building block approach in a systematic manner, the FTMP and SIFT systems
were broken down into manageable levels of experimentation, thus concealing system
complexity from the experimenter.
• Finally, most of the experiments were run at the system level, demonstrating system
validation can be independent of the implementation (LSI or VLSI).
The enumerated items demonstrate the feasibility of the validation methodology by addressing the 
problems encountered with the validation of ultrareliable systems. 
Appendix A. Instruction Execution Times 
This appendix contains the tabulated results of the execution times of all the instructions measured.
The predicted execution times are from [Rockwell Collins 79].
Instruction Execution Times Summary, 16 Bit Integer Operators

| HLL Instruction | Description | Execution time per one loop | Instruction Time | Predicted Time | Percent Difference |
|---|---|---|---|---|---|
| B = 1 | Integer assign 4 bits | 20.2 ± .30 | 4.0 ± .22 | 2.7 | 48.1 |
| B = 17 | Integer assign 8 bits | 22.2 ± .31 | 6.0 ± .22 | 3.6 | 66.7 |
| B = 257 | Integer assign 16 bits | 22.7 ± .30 | 6.5 ± .22 | 4.1 | 58.5 |
| J = 1 | Integer assign, extended reference | 22.4 ± .30 | 6.2 ± .22 | 4.1 | 51.2 |
| D = B | Integer variable assign | 23.2 ± .29 | 5.5 ± .21 | 4.1 | 34.1 |
| B = J | Variable assign, extended reference | 30.2 ± .31 | 12.5 ± .22 | 5.4 | 131. |
| D = -B | Integer negate | 30.7 ± .31 | 7.0 ± .22 | 6.3 | 11.1 |
| D = B + C | Integer addition | 33.7 ± .30 | 10.0 ± .22 | 7.7 | 30.0 |
| D = B * C | Integer multiply | 46.4 ± .29 | 20.2 ± .21 | 12.8 | 57.8 |
| D = B / C | Integer division | 37.9 ± .30 | 21.7 ± .22 | 13.1 | 65.6 |
| D = .N. B | Bit wise negate | 29.7 ± .31 | 6.0 ± .22 | 5.3 | 13.2 |
| D = B .A. C | Bit wise and | 31.2 ± .30 | 15.0 ± .22 | 7.8 | 92.3 |
| D = B .V. C | Bit wise or | 32.7 ± .31 | 15.0 ± .22 | 7.8 | 92.3 |
| D = B .X. C | Bit wise exclusive or | 31.2 ± .30 | 15.0 ± .22 | 7.8 | 92.3 |
| D = B .RS. C | Right shift (1 bit) | 33.7 ± .29 | 16.0 ± .21 | - | - |
| D = B .RS. C | Right shift (2 bits) | 32.4 ± .30 | 16.2 ± .22 | - | - |
| A3 = B EQL C | Integer compare == | 39.4 ± .30 | 23.2 ± .22 | 14.2 | 63.3 |
| A3 = B NEQ C | Integer compare != | 38.7 ± .30 | 21.0 ± .22 | 11.9 | 76.4 |
| A3 = B LES C | Integer compare < | 37.4 ± .30 | 21.2 ± .22 | 12.1 | 75.2 |
| A3 = B GEQ C | Integer compare >= | 41.2 ± .30 | 23.5 ± .22 | 14.4 | 63.2 |

Table A-1: FTMP Instruction Execution Times: Integer
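The Percent Difference column in the FTMP tables appears to express the measured instruction time relative to the predicted time from [Rockwell Collins 79]. The following is a minimal sketch in Python, assuming that definition (it is inferred from the tabulated values, not stated in this appendix); the function name and the selection of rows are illustrative only.

```python
def percent_difference(measured, predicted):
    """Excess of the measured time over the predicted time, as a percentage of the predicted time."""
    return 100.0 * (measured - predicted) / predicted


# (HLL instruction, measured instruction time, predicted time, tabulated percent difference)
# Values copied from Table A-1.
ROWS = [
    ("B = 1", 4.0, 2.7, 48.1),
    ("D = B + C", 10.0, 7.7, 30.0),
    ("D = B * C", 20.2, 12.8, 57.8),
    ("D = B / C", 21.7, 13.1, 65.6),
    ("A3 = B EQL C", 23.2, 14.2, 63.3),
]

for name, measured, predicted, tabulated in ROWS:
    computed = percent_difference(measured, predicted)
    print(f"{name:14s} computed {computed:6.1f}   tabulated {tabulated:6.1f}")
```

Small disagreements in the last digit are to be expected, since the tabulated times are themselves rounded.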
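| HLL Instruction | Description | Execution time per one loop | Instruction Time | Predicted Time | Percent Difference |
|---|---|---|---|---|---|
| A3 = B EQL C | Real compare == | 40.9 ± .29 | 23.2 ± .21 | 14.2 | 63.3 |
| A3 = B NEQ C | Real compare != | 38.7 ± .30 | 21.0 ± .22 | 11.9 | 76.4 |
| A3 = B LES C | Real compare < | 38.9 ± .30 | 21.2 ± .22 | 12.1 | 75.2 |
| A3 = B GEQ C | Real compare >= | 41.2 ± .29 | 23.5 ± .21 | 14.4 | 63.2 |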
Instruction Execution Times Summary, Long 32 Bit Integers

| HLL Instruction | Description | Execution time per one loop | Instruction Time | Predicted Time | Percent Difference |
|---|---|---|---|---|---|
| B = 1 | Long assign | 29.7 ± .28 | 12.0 ± .21 | 5.7 | 110.5 |
| B = 17 | Long assign (8 bits) | 31.9 ± .29 | 14.2 ± .21 | 6.6 | 115.2 |
| B = 257 | Long assign (16 bits) | 32.4 ± .30 | 14.7 ± .21 | 7.1 | 107.0 |
| B = 65537 | Long assign (4 bits) | 29.7 ± .30 | 12.0 ± .22 | 5.7 | 110.5 |
| J = 1 | Extended variable reference assign | 30.4 ± .30 | 12.7 ± .22 | 7.1 | 78.9 |
| B = C | Long variable assign | 32.2 ± .29 | 14.5 ± .21 | 7.8 | 85.9 |
| B = J | Extended variable reference | 33.2 ± .30 | 17.0 ± .22 | 9.3 | 82.8 |
| A = -B | Long negate | 42.2 ± .30 | 18.5 ± .22 | 15.3 | 20.9 |
| A = B + C | Long addition | 48.7 ± .30 | 32.5 ± .22 | 17.9 | 81.5 |
| D = B * C | Long multiply | 64.9 ± .29 | 48.7 ± .22 | 31.7 | 53.6 |
| A = B / C | Long division | 87.4 ± .31 | 71.2 ± .22 | 45.4 | 56.8 |
| A3 = B EQL C | Long compare == | 56.4 ± .29 | 38.7 ± .21 | 17.8 | 117.4 |
| A3 = B NEQ C | Long compare != | 54.2 ± .30 | 36.5 ± .22 | 15.5 | 135.5 |
| A3 = B LES C | Long compare < | 54.2 ± .28 | 36.5 ± .21 | 15.7 | 132.5 |
| A3 = B GEQ C | Long compare >= | 56.4 ± .30 | 38.7 ± .22 | 18.0 | 115.0 |

Table A-3: FTMP Instruction Execution Times: Long Integers
Instruction Execution Times Summary, Boolean Operators

| HLL Instruction | Description | Predicted Time | Percent Difference |
|---|---|---|---|
| A = TRUE | Boolean assign | 2.7 | 48.1 |
| A = B | Boolean variable assign | 4.1 | 34.1 |
| A = NOT B | Boolean negate | 7.9 | 38.0 |
| A = B OR C | Boolean OR = F, 2 tests required | 13.1 | 64.1 |
| A = B OR C | Boolean OR = T on 1st condition | 10.2 | 102.9 |
| A = B OR C | Boolean OR = T on 2nd condition | 15.4 | 53.9 |
| A = B AND C | Boolean AND = T, 2 tests required | 15.4 | 63.6 |
| A = B AND C | Boolean AND = F on 1st condition | 7.9 | 89.9 |
| A = B AND C | Boolean AND = F on 2nd condition | 13.1 | 75.5 |

Table A-4: FTMP Instruction Execution Times: Boolean

Instruction Execution Times Summary, Miscellaneous Operators

| HLL Instruction | Description | Execution time per one loop | Instruction Time | Predicted Time | Percent Difference |
|---|---|---|---|---|---|
| NULL | | | - | 15.7+ | - |
| NULL | | | - | 17.6+ | - |
| Test | Procedure call | | 37.0 ± .22 | 35.8 | 3.4 |
| Test1(B) | Procedure call | | 51.7 ± .21 | 38.0 | 36.0 |
| Test2(B,C) | Procedure call | | 57.5 ± .22 | 40.2 | 43.0 |
| Test3(B,C,D) | Procedure call | | 63.2 ± .22 | 42.4 | 49.0 |
| Test4(B,C,D,E) | Procedure call | | 69.0 ± .22 | 44.6 | 54.7 |
| If A3 then B = 1 | Conditional, True | | 9.0 ± .22 | 7.9 | 13.9 |
| If A3 then B = 1 | Conditional, False | | 5.5 ± .22 | 5.2 | 5.8 |
| If A3 then B = 1 Else C = 1 | Conditional, True | | 13.2 ± .22 | 10.1 | 30.7 |
| If A3 then B = 1 Else C = 1 | Conditional, False | 33.2 ± .32 | 9.5 ± .22 | 7.9 | 20.3 |

Table A-5: FTMP Instruction Execution Times: Miscellaneous Operators
Raw Data: Read Time Clock Delay
Table A-6: Raw Data: SIFT Clock Read Experiment

Instruction Execution Times: Integer and Boolean

| Pascal Instruction | Description | microsecs/100 inst. w/overhead | microsecs per instruction w/overhead | microsecs per instruction w/o overhead |
|---|---|---|---|---|
| A := 1 | Integer Assign | 1456.28 ± .0072 | 14.56 | 3.70 |
| A := B | Integer Variable Assign | 1525.00 ± .0047 | 15.25 | 4.39 |
| A := B + C | Integer Addition | 1731.19 ± .0056 | 17.31 | 6.45 |
| A := B * C | Integer Multiply | 2343.46 ± .0089 | 23.43 | 12.57 |
| A := B div C | Integer Division | 3169.47 ± .0089 | 31.69 | 20.83 |
| A := -B | Integer Negate | 2031.08 ± .0036 | 20.31 | 9.48 |
| A := B = C | Integer Compare | 1937.37 ± .0083 | 19.37 | 8.51 |
| A := B >= C | Integer Compare | 2056.07 ± .0033 | 20.56 | 9.70 |
| A := B < C | Integer Compare | 2031.08 ± .0036 | 20.31 | 9.45 |
| A := True | Boolean Assign | 1456.26 ± .0069 | 14.56 | 3.70 |
| A := B | Boolean Variable Assign | 1526.26 ± .0036 | 15.26 | 4.39 |
| A := B or C | Boolean Or | 1774.96 ± .0035 | 17.75 | 6.89 |
| A := B and C | Boolean And | 1774.94 ± .0037 | 17.75 | 6.89 |
| A := NOT B | Boolean Negate | 1712.43 ± .0088 | 17.12 | 6.26 |

Table A-7: SIFT Instruction Execution Times: Integer and Boolean Data Types
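The last column of Table A-7 appears to be obtained by dividing the time for 100 instructions by 100 and then subtracting a fixed measurement overhead. A short sketch of that arithmetic follows. The overhead of 10.86 microseconds is not quoted in this appendix; it is inferred here from the first row of the table (14.56 - 3.70), and the function name is hypothetical.

```python
# Assumption: fixed per-instruction measurement overhead in microseconds,
# inferred from Table A-7, row "A := 1" (14.56 - 3.70).
LOOP_OVERHEAD_US = 10.86


def per_instruction_times(us_per_100, overhead_us=LOOP_OVERHEAD_US):
    """Return (time with overhead, time without overhead) for one instruction, in microseconds."""
    with_overhead = us_per_100 / 100.0
    return with_overhead, with_overhead - overhead_us


# Measured times for 100 instructions (microseconds), copied from Table A-7.
SAMPLES = {
    "A := B + C": 1731.19,
    "A := B * C": 2343.46,
    "A := B div C": 3169.47,
}

for instruction, total_us in SAMPLES.items():
    with_overhead, without_overhead = per_instruction_times(total_us)
    print(f"{instruction:14s} {with_overhead:6.2f} {without_overhead:6.2f}")
```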
Instruction Execution Times: Miscellaneous
2674.57 ± .0088
3113.23 ± .0085
3525.58 ± .0087
Else B := 1
Table A-8: SIFT Instruction Execution Times: Miscellaneous Instructions

| Pascal Instruction | Description | microsecs/100 inst. w/overhead | microsecs per instruction w/overhead | microsecs per instruction w/o overhead |
|---|---|---|---|---|
| A := 1; A := B + C | Assign & Add Combination | 2074.83 ± .0049 | 20.75 | 9.88 |
| A := 1; A := B * C | Assign & Mult Combination | 2687.12 ± .0043 | 26.87 | 16.01 |
| A := 1; A := B div C | Assign & Div Combination | 3513.13 ± .0077 | 35.13 | 24.27 |
| A := 1; A := B + C; A := B * C | Assign, Add, Mult Combination | 3331.93 ± .0067 | 33.32 | 22.46 |
| A := 1; A := B + C; A := B / C | Assign, Add, Div Combination | 4131.63 ± .0020 | 41.32 | 30.45 |
| A := 1; A := B * C; A := B / C | Assign, Mult, Div Combination | 4564.03 ± .0170 | 45.64 | 34.78 |

Table A-9: SIFT Instruction Execution Times: Instruction Combinations
Comparison of Instruction Combinations

| Pascal Instruction | Description | Time If Done Separately | Time For Combination | Percent Difference Separate vs. Combo |
|---|---|---|---|---|
| A := 1; A := B + C | Assign & Add Combination | 10.15 | 9.88 | 2.7% |
| A := 1; A := B * C | Assign & Mult Combination | 16.27 | 16.01 | 1.6% |
| A := 1; A := B div C | Assign & Div Combination | 24.53 | 24.27 | 1.1% |
| A := 1; A := B + C; A := B * C | Assign, Add, Mult Combination | 22.72 | 22.46 | 1.2% |
| A := 1; A := B + C; A := B / C | Assign, Add, Div Combination | | | |
| A := 1; A := B * C; A := B / C | Assign, Mult, Div Combination | 37.1 | 34.78 | 6.7% |

Table A-10: SIFT: Comparison Instruction Combinations Not Done on FTMP
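The comparison in Table A-10 can be reproduced from the per-instruction times in Table A-7 and the combination times in Table A-9. The sketch below assumes, as inferred from the tabulated figures, that the percent difference is taken relative to the combination time; the dictionary and function names are illustrative and do not come from the report.

```python
# Per-instruction times without overhead (microseconds), from Table A-7.
SINGLE_US = {"assign": 3.70, "add": 6.45, "mult": 12.57, "div": 20.83}

# Combination times without overhead (microseconds), from Table A-9.
COMBINED_US = {
    ("assign", "add"): 9.88,
    ("assign", "mult"): 16.01,
    ("assign", "div"): 24.27,
    ("assign", "add", "mult"): 22.46,
    ("assign", "add", "div"): 30.45,
    ("assign", "mult", "div"): 34.78,
}


def compare(parts):
    """Return (sum of separate times, combination time, percent difference vs. combination)."""
    separate = sum(SINGLE_US[p] for p in parts)
    combined = COMBINED_US[parts]
    return separate, combined, 100.0 * (separate - combined) / combined


for parts in COMBINED_US:
    separate, combined, pct = compare(parts)
    print(f"{', '.join(parts):20s} {separate:6.2f} {combined:6.2f} {pct:4.1f}%")
```

In every case the combination runs slightly faster than the sum of its parts, which is the point the table makes.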
Appendix B. Block Transfer Execution Times 
This appendix contains tabulated results of the block transfer experiment: 
Times are given in microseconds; the ± values are 95% confidence intervals.

| Block Size (words = 16 bits) | Block Transfer Time with 1 Triad | Block Transfer Time with 2 Triads |
|---|---|---|
| 1 | 162.7 ± 1.54 | 170.7 ± 4.02 |
| 2 | 165.7 ± 1.06 | 172.9 ± 4.60 |
| 3 | 168.6 ± 1.38 | 176.3 ± 5.94 |
| 4 | 171.6 ± 1.45 | 176.3 ± 4.33 |
| 5 | 174.7 ± 0.74 | 180.3 ± 3.86 |
| 10 | 189.6 ± 0.82 | 193.3 ± 4.31 |
| 15 | 204.7 ± 0.71 | 208.6 ± 3.71 |
| 20 | 224.7 ± 0.76 | 230.9 ± 3.99 |
| 25 | 235.7 ± 1.08 | 239.3 ± 3.86 |
| 50 | 310.8 ± 1.13 | 316.0 ± 6.12 |
| 100 | 460.7 ± 1.11 | 465.3 ± 6.23 |
| 125 | 535.6 ± 1.02 | 541.0 ± 6.63 |
| 150 | 610.7 ± 1.10 | 621.3 ± 10.3 |
| 175 | 685.6 ± 1.02 | 692.0 ± 8.48 |
| 200 | 760.8 ± 1.13 | 765.1 ± 6.56 |

Table B-1: FTMP Block Transfer Times, Read from System to Local Memory
| Block Size (words = 16 bits) | Block Transfer Time with 1 Triad | Block Transfer Time with 2 Triads |
|---|---|---|
| 1 | 158.0 ± 1.35 | 170.4 ± 4.03 |
| 2 | 163.7 ± 1.35 | 170.6 ± 3.77 |
| 3 | 168.6 ± 1.38 | 178.5 ± 6.22 |
| 4 | 173.7 ± 1.34 | 178.9 ± 3.81 |
| 5 | 178.7 ± 1.35 | 186.2 ± 4.45 |
| 10 | 203.6 ± 1.38 | 207.8 ± 2.76 |
| 15 | 228.6 ± 1.38 | 233.7 ± 3.90 |
| 20 | 258.7 ± 1.36 | 263.1 ± 3.56 |
| 25 | 283.7 ± 1.35 | 290.0 ± 5.81 |
| 50 | 408.6 ± 1.38 | 413.7 ± 4.23 |
| 100 | 658.6 ± 1.38 | 663.6 ± 5.00 |
| 125 | 783.7 ± 1.36 | 790.8 ± 7.63 |
| 150 | 908.8 ± 1.33 | 919.9 ± 10.1 |
| 175 | 1033.8 ± 1.31 | 1039.8 ± 6.04 |
| 200 | 1158.7 ± 1.35 | 1164.2 ± 6.18 |

Table B-2: FTMP Block Transfer Times, Write from Local to System Memory
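The Conclusions describe the transfer time as a linear function of block size. A least-squares line through the single-triad columns of Tables B-1 and B-2 gives one way to check the quoted rate and overhead. This is a sketch only: the fitting method, the function names, and the conversion 1 Kbyte = 1000 bytes are assumptions rather than anything stated in the report, and the times are taken to be in microseconds.

```python
# Block sizes in 16-bit words and single-triad transfer times (microseconds),
# copied from Tables B-1 (read) and B-2 (write).
WORDS = [1, 2, 3, 4, 5, 10, 15, 20, 25, 50, 100, 125, 150, 175, 200]
READ_US = [162.7, 165.7, 168.6, 171.6, 174.7, 189.6, 204.7, 224.7,
           235.7, 310.8, 460.7, 535.6, 610.7, 685.6, 760.8]
WRITE_US = [158.0, 163.7, 168.6, 173.7, 178.7, 203.6, 228.6, 258.7,
            283.7, 408.6, 658.6, 783.7, 908.8, 1033.8, 1158.7]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kbytes_per_sec(us_per_word, bytes_per_word=2):
    """Convert a per-word transfer time into a rate, assuming 1 Kbyte = 1000 bytes."""
    return bytes_per_word / us_per_word * 1e6 / 1e3


for label, times in (("read", READ_US), ("write", WRITE_US)):
    slope, intercept = fit_line(WORDS, times)
    print(f"{label:5s} {slope:5.2f} us/word  overhead {intercept:6.1f} us  "
          f"{kbytes_per_sec(slope):6.0f} Kbytes/sec")
```

With the tabulated data the write fit comes out close to 5 microseconds per word, which corresponds to roughly 400 Kbytes/sec, with a fixed overhead of about 155 microseconds; the read fit is close to 3 microseconds per word.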
References
[Clune 84]
Ed Clune.
Analysis of the Fault Free Behavior of the FTMP Multiprocessor System.
Technical Report CMU-CS-84-130, Carnegie Mellon University, 1984.
[Czeck et al. 85]
Edward W. Czeck, Daniel P. Siewiorek, Zary Z. Segall.
Fault Free Performance Validation of a Fault-Tolerant Multiprocessor: Baseline and Synthetic Workload Measurements.
Technical Report CMU-CS-85-177, Carnegie Mellon University, 1985.
[Draper 83a]
Development and Evaluation of a Fault-Tolerant Multiprocessor Computer, Vol. I, FTMP Principles of Operations.
Charles Stark Draper Laboratories, 1983.
NASA CR 166071.
[Draper 83b]
Development and Evaluation of a Fault-Tolerant Multiprocessor Computer, Vol. II, FTMP Software.
Charles Stark Draper Laboratories, 1983.
NASA CR 166072.
[Draper 83c]
Development and Evaluation of a Fault-Tolerant Multiprocessor Computer, Vol. III, FTMP Test and Evaluation.
Charles Stark Draper Laboratories, 1983.
NASA CR 166073.
[Draper 84]
Development and Evaluation of a Fault-Tolerant Multiprocessor Computer, Vol. IV, FTMP Executive Summary.
Charles Stark Draper Laboratories, 1984.
NASA CR 172286.
[Feather et al. 85]
Frank Feather, Daniel Siewiorek, and Zary Segall.
Validation of a Fault-Tolerant Multiprocessor: Baseline Experiments and Workload Implementation.
Technical Report CMU-CS-85-145, Carnegie Mellon University, July, 1985.
[Ferrari 78]
Domenico Ferrari.
Computer Systems Performance Evaluation.
Prentice-Hall, 1978.
[Green et al. 84]
David F. Green, Jr., Daniel L. Palumbo, and Daniel W. Baltrus.
Software Implemented Fault-Tolerant (SIFT) User's Guide.
NASA-Langley Research Center, 1984.
NASA TM 86289.
[Holt et al. 84]
H.M. Holt, A.O. Lupton, and D.G. Holden.
Flight Critical System Design Guidelines and Validation Methods.
AIAA/AHS/ASEE Aircraft Design Systems and Operating Meeting (AIAA-84-2461), 1984.
[Hopkins et al. 78]
A.L. Hopkins, T.B. Smith, and J.H. Lala.
FTMP - A Highly Reliable Multiprocessor.
In Proceedings of the IEEE, pages 1221-1237. October, 1978.
[Kong 82]
Thomas H. Kong.
Measuring Time for Performance Evaluation of Multiprocessors Systems.
Master's thesis, Carnegie-Mellon University, 1982.
[Lala 85]
Parag K. Lala.
Fault Tolerant & Fault Testable Hardware Design.
Prentice Hall International, 1985.
[NASA 79a]
NASA-Langley Research Center.
Validation Methods for Fault-Tolerant Avionics and Control Systems - Working Group Meeting I, NASA-Langley Research Center, 1979.
NASA Conference Publication 2114.
[NASA 79b]
Research Triangle Institute.
Validation Methods for Fault-Tolerant Avionics and Control Systems - Working Group Meeting II, NASA-Langley Research Center, 1979.
NASA Conference Publication 2130.
[Palumbo 85]
Daniel L. Palumbo.
The SIFT Hardware/Software Systems.
NASA-Langley Research Center, 1985.
NASA TM 87574.
[Palumbo and Butler 85]
Daniel L. Palumbo and Ricky W. Butler.
Measurement of SIFT Operating System Overhead.
NASA-Langley Research Center, 1985.
NASA TM 86322.
[Rockwell Collins 79]
CAPS Instruction Set Description.
Rockwell Collins, 1979.
[Shin and Krishna 84]
Kang G. Shin, C.M. Krishna.
Characterization of Real-Time Computers.
NASA-Langley Research Center, 1984.
NASA CR-3807.
[Siewiorek and Swarz 82]
Daniel P. Siewiorek and Robert S. Swarz.
The Theory and Practice of Reliable System Design.
Digital Press, 1982.
[Siewiorek, Bell, and Newell 82]
Daniel P. Siewiorek, C. Gordon Bell, and Allen Newell.
Computer Structures: Principles and Examples.
McGraw-Hill Book Company, 1982.
[SRI 83]
Investigation, Development, and Evaluation of Performance Proving for Fault-Tolerant Computers.
SRI International, 1983.
NASA CR-166008.
[SRI 84]
Development and Analysis of the Software Implemented Fault-Tolerance (SIFT) Computer.
SRI International, 1984.
NASA CR 172146.
[Toy 78]
W.N. Toy.
Fault-Tolerant Design of Local ESS Processors.
IEEE Trans. on Computers: 1726-1745, October, 1978.
[Walpole and Myers 82]
Ronald E. Walpole and Raymond H. Myers.
Probability and Statistics for Engineers and Scientists.
The Macmillan Company, 1982.
[Wensley et al. 78]
J.H. Wensley, L. Lamport, J. Goldberg, M.W. Green, K.N. Levitt, P.M. Melliar-Smith, R.E. Shostak, and C.B. Weinstock.
SIFT: A Computer for Aircraft Control.
In Proceedings of the IEEE, pages 1240-1255. October, 1978.
Standard Bibliographic Page 
1. Report No.
NASA CR-178236
2. Government Accession No.
3. Recipient's Catalog No.
5. Report Date
January 1987
6. Performing Organization Code
8. Performing Organization Report No.
9. Performing Organization Name and Address
Carnegie-Mellon University
Department of Electrical and Computer Engineering
Pittsburgh, PA 15213
10. Work Unit No.
11. Contract or Grant No.
NAG1-190
12. Sponsoring Agency Name and Address
National Aeronautics and Space Administration
Washington, DC 20546
13. Type of Report and Period Covered
Contractor Report
14. Sponsoring Agency Code
505-66-21-01
15. Supplementary Notes
Langley Technical Monitor: George B. Finelli
16. Abstract
A validation methodology for testing the performance of fault-tolerant
computer systems was developed and applied to the Fault-Tolerant
Multiprocessor (FTMP) at NASA-Langley's AIRLAB facility. This methodology was
claimed to be general enough to apply to any ultrareliable computer system.
The goal of this research was to extend the validation methodology and to
demonstrate the robustness of the validation methodology by its more extensive
application to NASA's Fault-Tolerant Multiprocessor System (FTMP) and to the
Software Implemented Fault-Tolerance (SIFT) Computer System. Furthermore, the
performance of these two multiprocessors was compared by conducting similar
experiments.
An analysis of the results shows high level language instruction
execution times for both SIFT and FTMP were consistent and predictable, with
SIFT having greater throughput. At the operating system level, FTMP consumes
60% of the throughput for its real-time dispatcher and 5% on fault-handling
tasks. In contrast, SIFT consumes 16% of its throughput for the dispatcher,
but consumes 66% in fault-handling software overhead.
17. Key Words (Suggested by Authors(s))
Fault-Tolerant
Validation
Performance Measurement
Multiprocessors
SIFT
FTMP
18. Distribution Statement
Unclassified - Unlimited
Subject Category - 62
19. Security Classif.(of this report)
Unclassified
20. Security Classif.(of this page)
Unclassified
21. No. of Pages
67
22. Price
A04
NASA Langley Form 63 (June 1985)
