Validation of a fault-tolerant multiprocessor: Baseline experiments and workload implementation by Siewiorek, Daniel et al.
Validation of a 
Fault-Tolerant Multiprocessor: 
Baseline Experiments and 
Workload Implementation 
Frank Feather, Daniel Siewiorek, Zary Segall 
22 July 1986 
(NASA-CR-18ll~8) VALIDATION OF A FAULT-TOLERANT MULTIPROCESSOR: BASELINE 
EXPERIMENTS AND WORKLOAD IMPLEMENTATION (Carnegie-Mellon Univ.) 53 p. Avail: NTIS HC A04/MF A01 
Unclas 
Carnegie-Mellon University 
https://ntrs.nasa.gov/search.jsp?R=19870018844 2020-03-20T10:07:10+00:00Z
CMU-CS-85-145 
Validation of a 
Fault-Tolerant Multiprocessor: 
Baseline Experiments and 
Workload Implementation 
Frank Feather, Daniel Siewiorek, Zary Segall 
22 July 1985 
Department of Electrical and Computer Engineering 
and Department of Computer Science 
Carnegie-Mellon University 
Schenley Park 
Pittsburgh, PA 15213 
This research was sponsored by the National Aeronautics and Space Administration, Langley Research 
Center under contract NAG-1-190. The views and conclusions contained in this document are those of 
the authors and should not be interpreted as representing the official policies, either expressed or implied, 
of NASA, the United States Government or Carnegie-Mellon University. 
Table of Contents 
Abstract 
1. Introduction 
2. Background 
2.1. Guidelines to Experiments 
2.2. Proposed Methodology 
2.3. Definition of Performance 
2.4. The FTMP and Experimentation Environment 
2.5. Previews of Experiments 
3. Interrupts 
3.1. Mechanisms 
3.2. Interrupts, System Validation, and Performance 
3.3. Interrupts on FTMP 
3.4. Experimental Results 
4. Workload 
4.1. Definition 
4.2. Advantages of a Synthetic Workload 
4.3. Motivations 
4.4. A Realtime Workload Model 
4.5. Implementation of the Synthetic Workload on FTMP 
4.5.1. User Interfaces 
4.5.2. Implementation: FTMP Tasks and Workload Considerations 
4.5.3. Calibration 
5. Future Work 
6. Conclusion 
I. Test of Select RD/WRT Primitives 
II. Example of Workload Use 
List of Figures 
Figure 2-1: Performance Evaluation Matrix 
Figure 2-2: Software Appearance of FTMP (virtual machine) 
Figure 2-3: Task Control Block Structure 
Figure 2-4: Frame Structure 
Figure 2-5: FTMP Support Environment 
Figure 2-6: Steps to Creating a Program 
Figure 3-1: Summary of FTMP’s Interrupts 
Figure 4-1: General scheme of performance comparisons among n systems [Ferrari 78] 
Figure 4-2: Representation of a Synthetic Workload Task 
Figure 4-3: Workload Model [Clune 84] 
Figure 4-4: FTMP Synthetic Workload Environment 
Figure 4-5: Task Switching Overhead 
Figure 4-6: Task Startup Overhead 
Figure 4-7: Baseline Experiment: Task Switching Overhead 
Figure 4-8: Workload Experiment: Task Switching Overhead 
Figure 4-9: Baseline Experiment: Task Startup Time 
Figure 4-10: Workload Experiment: Task Startup Time 
Figure 4-11: Baseline Experiment Task (AED) 
Figure 4-12: Synthetic Workload Task (AED) 
Figure II-1: Illustration of Workload Tasks 
Figure II-2: Running the FTMP Workload 
Abstract 
In the future, aircraft must employ highly reliable multiprocessors in order to achieve flight safety. 
Such computers must be experimentally validated before they are deployed. This project outlines a 
methodology for validating reliable multiprocessors. The methodology begins with baseline experiments, 
which test a single phenomenon. As experiments progress, tools for performance testing are developed. 
This methodology is used, in part, on the Fault-Tolerant Multiprocessor (FTMP) at NASA-Langley’s 
AIRLAB facility. Experiments were designed to evaluate the fault-free performance of the system. 
This report presents the results of interrupt baseline experiments performed on FTMP. 
Interrupt-causing exception conditions were tested, and several were found to have unimplemented interrupt 
handling software, while one had an unimplemented interrupt vector. A synthetic workload model for 
realtime multiprocessors is then developed as an application level performance analysis tool. Details of 
the workload implementation and calibration are presented. 
Both the experimental methodology and the synthetic workload model are general enough to be 
applicable to reliable multiprocessors besides FTMP. 
1. Introduction 
In the 1990s, aircraft will employ computers that must run correctly and continuously for the aircraft to 
fly. NASA, in its Aircraft Energy Efficiency (ACEE) program, requires that the probability of failure in 
these computers be less than 10^-10 per hour. Such requirements cannot be met with 
standard realtime computers; instead, fault-tolerant computers have been developed to meet these 
requirements. Two such systems are SIFT (Software Implemented Fault-Tolerance) [Wensley 
78], conceived by SRI International and fabricated by Bendix Corporation; and FTMP (Fault-Tolerant 
Multiprocessor) [Hopkins 78], conceived by MIT's Charles Stark Draper Laboratory, Inc. and fabricated 
by Collins. 
These complex systems, which must meet stringent performance requirements, have to be validated (i.e. 
proven functionally correct). However, since a probability of failure of 10^-10 per hour translates to one 
failure per million years of operation, a validation method must be developed to discover flaws in the 
design and implementation before such a system is placed into service. Proving a system correct can take 
place at many stages from mathematical models and theorem proving, also called verification, to 
experimental testing, called validation. Mathematical models of the system are based on simplifying 
assumptions and can be used in conjunction with, but not as a substitute for, actual experimentation. 
Indeed, many of the errors in a system surface during the experimentation and use of the system. Bell 
Telephone [Toy 78] divided the causes of system outages for their fault-tolerant electronic switching 
systems into several categories. The percentage given for each category represents the fraction of total 
down time measured in the field that is attributed to each cause: 
• Hardware Reliability: Actual component failures - 20% 
• Software Deficiencies: Software design errors - 15% 
• Recovery Deficiencies: Inability to detect, isolate, and correctly recover from faults - 35% 
• Procedural Errors: Human error on the part of maintenance personnel or office 
administrators - 30% 
Fault-tolerant techniques directly impact the first category. The latter three categories are all forms of 
design errors. These errors can be reduced by effective system design and validation. 
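As a quick check of the figures quoted above, the following sketch (Python, not part of the original report) converts a failure probability of 10^-10 per hour into years per failure and sums the outage categories that the text attributes to design errors. The function and variable names and the hours-per-year constant are illustrative assumptions, and a constant failure rate is assumed.

```python
HOURS_PER_YEAR = 24 * 365.25  # assumption: average calendar year


def years_per_failure(failure_prob_per_hour: float) -> float:
    """Expected years of operation per failure, assuming a constant failure rate."""
    return 1.0 / failure_prob_per_hour / HOURS_PER_YEAR


# Down-time fractions (percent) quoted in the text from [Toy 78].
outage_percent = {
    "hardware reliability": 20,
    "software deficiencies": 15,
    "recovery deficiencies": 35,
    "procedural errors": 30,
}

# The text treats every category except hardware reliability as a design error.
design_error_percent = sum(
    pct for cause, pct in outage_percent.items() if cause != "hardware reliability"
)

if __name__ == "__main__":
    print(f"{years_per_failure(1e-10):,.0f} years per failure")
    print(f"{design_error_percent}% of down time attributed to design errors")
```

Under these assumptions the result is on the order of one million years per failure, which is why such a requirement cannot be demonstrated by simply running the system and waiting for failures.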
The goal of this research is to develop a methodology for the validation of the fault-free performance of 
fault-tolerant avionic multiprocessors. Initially this methodology will be applied to FTMP, although the 
approach should be general enough to migrate to other fault-tolerant systems like SIFT. 
2. Background 
2.1. Guidelines to Experiments 
Over the last decade, Carnegie-Mellon University has devoted over 100 man-years to the design, 
construction, and validation of multiprocessor systems. Some of the guidelines developed over the last 
decade include: 
• The experimental validation methodology is successively refined as experiments uncover new 
information and/or the methodology is applied to new multiprocessor systems. 
• Design experiments to validate behavior that is documented as well as uncover behavior that 
is not documented. 
• Perform experiments in a systematic manner. Since the search is for the unexpected, there is 
no shortcut to thorough testing. 
• Experiments should be repeatable. 
• The feasibility of performing various experiments is tempered by what is available in the 
experimental environment. More sophisticated experiments may have to be postponed until 
the experimental environment is provided with more tools. 
• A building block approach should be used wherein one variable is changed at a time so that 
causes of unexpected behavior are easy to isolate. 
• Testing should take advantage of the abstract levels used in the design of the system. 
Using these guidelines, we will develop a generalized methodology for testing multiprocessor systems. 
2.2. Proposed Methodology 
Showing that a computing system, as designed, will meet its dependability goals is called validation 
[NASA 79a]. In 1979, NASA held several workshops to determine system validation procedures. One in 
particular [NASA 79b] produced a detailed list of validation categories to evaluate the system in an 
orderly manner. A building block approach was chosen so that confidence in the system would be built 
up in an incremental manner, starting with the understanding and measurement of primitive hardware 
and operating system activities. After primitive activities are characterized, more complex experiments 
are devised to define interactions between primitive activities. This orderly progression ensures uniform 
coverage and makes it easier to locate the cause of an unexpected phenomenon. Steps in the proposed 
methodology included: 
1. Initial Checkout and Diagnostics 
2. Programmer’s Manual Validation 
3. Executive Routine Validation 
4. Multiprocessor Interconnect Validation 
5. Multiprocessor Executive Routine Validation 
6. Application Program Validation and Performance Baseline 
7. Simulation of Inaccessible Physical Failures 
8. Single Processor Fault Insertion 
9. Multiprocessor Fault Insertion 
10. Single Processor Executive Failure Response Characterization 
11. Multiprocessor System Executive Fault Handling Capabilities 
12. Application Program Validation on Multiprocessor 
13. Multiple Application Program Validation on Multiprocessor 
The first six steps validate the fault-free functionality of the system while the next seven validate fault 
handling capabilities. Step 1, initial checkout and diagnostics, is usually done before system delivery, 
while Step 2, manual validation, is ongoing throughout the testing process. Part of this project involved 
updating and clarifying information in FTMP’s manuals [Draper 83a, Draper 83b] with a user’s guide 
[Feather 84]. Of the other fault-free validation steps, Step 4 is considered hardware validation, Steps 3 
and 5 are operating system level validation, and Step 6 is application level validation. This project deals 
with fault-free performance (Steps 2 through 5) and develops an application level tool called the 
synthetic workload to address Step 6. 
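The classification described above can be written down as a small lookup table. The sketch below is illustrative only (Python, not part of the original report): the step names come from the list above, while the dictionary layout and the helper name `phase` are assumptions made for the example.

```python
# Step names as listed in the proposed methodology.
VALIDATION_STEPS = {
    1: "Initial Checkout and Diagnostics",
    2: "Programmer's Manual Validation",
    3: "Executive Routine Validation",
    4: "Multiprocessor Interconnect Validation",
    5: "Multiprocessor Executive Routine Validation",
    6: "Application Program Validation and Performance Baseline",
    7: "Simulation of Inaccessible Physical Failures",
    8: "Single Processor Fault Insertion",
    9: "Multiprocessor Fault Insertion",
    10: "Single Processor Executive Failure Response Characterization",
    11: "Multiprocessor System Executive Fault Handling Capabilities",
    12: "Application Program Validation on Multiprocessor",
    13: "Multiple Application Program Validation on Multiprocessor",
}

# System level the text assigns to fault-free Steps 3 through 6.
STEP_LEVEL = {
    3: "operating system",
    4: "hardware",
    5: "operating system",
    6: "application",
}


def phase(step: int) -> str:
    """Return which half of the methodology a step belongs to."""
    if step not in VALIDATION_STEPS:
        raise ValueError(f"unknown step {step}")
    return "fault-free" if step <= 6 else "fault-handling"


if __name__ == "__main__":
    for number, name in VALIDATION_STEPS.items():
        level = STEP_LEVEL.get(number, "-")
        print(f"{number:2d}  {phase(number):14s}  {level:16s}  {name}")
```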
Ideally, hardware and operating system validation should take place in the development stage of the 
respective levels. For example, as the operating system is written, a set of validation tests is produced. 
Each step of the methodology, like the whole methodology, follows a building block approach. First, 
baseline experiments are conducted. Baseline experiments measure a single phenomenon while all other 
interactions are held constant. These experiments are designed to validate the basic assumptions used in 
the mathematical models as well as validate the assumptions made by the application programmers. 
Once individual phenomena have been characterized, more advanced experiments can be conducted 
which explore the interaction between basic phenomena. 
As stated in the experiment guidelines, the validation procedure is tempered by the available 
experimental environment. This implies that at any one step, more sophisticated experiments may have 
to be postponed while the experimenter moves on to the next step to conduct baseline experiments until 
the advent of more sophisticated experimental tools. Experiments can proceed in parallel if tools are 
available at a higher yet disjoint step. For example, at AIRLAB, fault insertion experiments occur in 
parallel with fault-free validation and performance experiments. 
2.3. Definition of Performance 
Validation experiments test system behavior and establish whether the system works correctly. That is, 
validation experiments test functional correctness. In addition to establishing behavior, performance can 
also be measured. Performance refers to how well a system, assumed to be functionally correct, works. 
Validation and performance are not always dichotomous; in some systems, if performance criteria are not 
met the system is considered to be incorrect. Therefore, validation experiments are usually accompanied 
by performance analysis. For example, testing basic instruction times, besides testing functional 
correctness of hardware instructions, also can be used to estimate total system throughput in terms of 
operations per second. 
Performance measurements can be conducted at many levels, starting with the instruction set, working 
up to the operating system and then the application level. Three parameters which can be measured at 
each level are Throughput, Utilization, and Delay. Initially, the baseline experiments took measurements 
from the instruction set and operating system level. However, these experiments quickly progressed to the 
application level with construction of the synthetic workload. There are several advantages to validation 
at the application level: 
1. This is the level at which real programs (i.e. natural workloads) run. Any meaningful statements 
about computer performance to the application programmer must be based on measurements 
made at this level. 
2. Experiments are much easier to design at the application level. The person validating the 
system at this level does not need hardware and/or operating system expertise. 

Level              | Behavior                  | Throughput              | Utilization                | Delay 
-------------------+---------------------------+-------------------------+----------------------------+-------------------------------- 
Application        | Display, Flight Control   | Subtask Execution Times | Idle Time                  | Write, Read Delay & Variation 
Executive Software | Scheduler, Message System | OS Primitives Times     | OS Primitives Freq. of Use | Primitive Variation, Contention 
Instruction Set    | Instruction, Exceptions   | Instr. & Resource Times | Resource Freq. of Use      | Resource Variation, Contention 

Figure 2-1: Performance Evaluation Matrix 
Figure 2-1 illustrates the system levels and the types of performance experiments that can take place at 
each level. In more detail, the performance measurements are: 
• Throughput: 
  o Instruction Set: Measure the time to access limited resources (e.g. memory, clock) and 
    execute instructions 
  o Operating System: Measure the execution times of the operating system primitives and 
    tasks 
  o Application Software: Measure the execution times of the different subsections of each 
    application task 
• Utilization: 
  o Instruction Set: Frequency and percentage of hardware resource used 
  o Operating System: Frequency of OS primitives use 
  o Application Software: Measure idle time between tasks 
• Delay (and Variation): 
  o Instruction Set: Variation in the access time of resources; amount of contention for 
    resources 
  o Operating System: Variation in execution of primitives due to resource contention 
  o Application Software: Delay (and variation) between a data write and a data read of 
    common data 
In general, baseline experiments are conducted at the instruction set and operating system levels while 
more complex measurements occur at the application level. 
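To make the three parameters concrete, the sketch below shows one way they could be computed from raw timing samples. It is a minimal illustration in Python, not the measurement code used on FTMP; the function names, the millisecond units, and the sample values are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def throughput_per_second(exec_times_ms: Sequence[float]) -> float:
    """Operations per second implied by the mean execution time (ms)."""
    return 1000.0 / mean(exec_times_ms)


def utilization(busy_ms: float, frame_ms: float) -> float:
    """Fraction of a frame spent doing work; the remainder is idle time."""
    return busy_ms / frame_ms


def delay_and_variation(
    write_times_ms: Sequence[float], read_times_ms: Sequence[float]
) -> Tuple[float, float]:
    """Mean delay between each write and its matching read, and its spread."""
    delays = [r - w for w, r in zip(write_times_ms, read_times_ms)]
    return mean(delays), pstdev(delays)


if __name__ == "__main__":
    # Hypothetical samples, chosen only to exercise the functions.
    print(throughput_per_second([2.0, 2.1, 1.9]))
    print(utilization(busy_ms=30.0, frame_ms=40.0))
    print(delay_and_variation([0.0, 10.0, 20.0], [4.0, 15.0, 26.0]))
```

The same three functions apply at any level of Figure 2-1; only the source of the samples changes (instruction timings, OS primitive timings, or application task timings).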
Initially, this project deals with instruction set/executive level baseline experiments (interrupts). 
However, realizing that the most meaningful performance statements come from the application level, an 
application level performance tool called the synthetic workload was developed. Baseline experiments and 
workload implementation were done on the Fault-Tolerant Multiprocessor (FTMP). The next section 
discusses that computer. 
2.4. The FTMP and Experimentation Environment 
The Fault-Tolerant Multiprocessor (FTMP) has been discussed in several papers and manuals [Draper 
83b, Hopkins 78]. This section is a software overview of FTMP from the application programmer's 
perspective. The reader is referred to the references mentioned above for more details. 
Figure 2-2 illustrates the FTMP system. Each processor in this figure actually consists of three 
processors in a fault-tolerant configuration executing in lockstep. This trio of processors is sometimes 
referred to as a processor triad or a virtual processor because the application programmer sees it as a 
single processor. Likewise, memory is in a triad configuration. The FTMP can consist of one, two or 
three processor triads. Each triad has a local memory which is divided into PROM and RAM. The 
PROM contains frequently used executive code and is identical in all processors. Each processor's RAM 
holds local variables and stack, plus application software paged in from global memory. A bus connects 
the triads to global memory, I/O devices, a real-time clock and several latches needed for fault handling. 
The triads execute independently of each other when accessing global memory. If a program running on a 
processor triad uses a global variable, the program must first move the variable from global to local 
memory with a bus service routine. Similarly, the variable is written back to global memory with 
another bus service routine. 
Work on FTMP is performed by tasks. A task is a single thread of execution that runs by itself. Each 
task has a time limit associated with it. If a task does not complete by its allotted time it is aborted and 
another task is started. A task can execute on any processor triad.¹ 

Figure 2-2: Software Appearance of FTMP (virtual machine) 
[Block diagram; legible labels: Processor 1, Processor 2, Processor 3, 8K PROM, Global Memory 32K, 
System Bus, Error Latches, Real Time Clock, I/O Ports] 

In a realtime system a task is run at a regular interval, which defines the task's iteration rate. Not all 
tasks need to run at the same iteration rate. For example, the task that updates the display terminal 
does not have to be executed nearly as often as the task that monitors and adjusts the plane's airspeed. 
Tasks with a common iteration rate are grouped into rate groups and are run within frames. A frame 
defines the execution interval length and is essentially one over the iteration rate. In the time allotted by 
the frame, the working triads must execute all the tasks defined for the frame's iteration rate. Task 
control blocks, which contain all the information necessary to run a task, are in a linked list resident in 
global memory. Individual triads access this global list to select a task to run. When FTMP is in a 
multiple triad configuration, some tasks will execute in parallel. When there are no more tasks left in a 
particular iteration rate group to execute, a triad will either become idle or start executing tasks from a 
lower iteration rate group. Figure 2-3 is an example of a task control block structure arranged by rate 
groups (defined below). The control blocks in this figure are those of the synthetic workload (Section 4). 
The FTMP has three iteration rates which define three different frame sizes. There are separate task 
control block lists - one for each rate group. The frame sizes are: 
• R4, the basic frame size 
• R3, equivalent to 2 R4 frames 
• R1, equivalent to 4 R3 frames; also called the major frame 
Figure 2-4 illustrates the different frames and their execution frequencies. 
FTMP handles the multiple rate groups as follows. At the beginning of an R4 frame, one of the triads, 
called the responsible triad, starts the R4 frame for that triad and signals another triad to start its 
frame. This second triad in turn signals the third triad, if it exists, to start its R4 frame. Each R4 frame 
does not necessarily have the same responsible triad. Every second R4 frame signals the start of an R3 
frame and every eighth R4 frame starts an R1 frame. Once a triad runs out of R4 tasks to execute, the 
triad will begin taking tasks from the R3 task list to execute. Likewise, when a triad runs out of R3 tasks 
it takes tasks from the R1 task list. Execution of a lower task frame group can be suspended in a triad 
by the start of a higher numbered frame group. Suspended tasks are continued once the triad runs 
out of tasks from the higher iteration rate. For example, the beginning of an R4 frame suspends 
execution of R3 and R1 tasks until all tasks in the R4 frame finish. The processor triad that finishes the 
last R4 task in the R4 frame becomes the responsible triad that starts the next R4 frame. 
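The frame relationships described above can be sketched in a few lines. The following Python fragment is an illustration only, not the FTMP dispatcher: the function names are hypothetical, R4 frames are counted from zero, and it is assumed that frame 0 starts all three rate groups.

```python
from typing import Dict, List, Optional


def frames_starting(r4_index: int) -> List[str]:
    """Rate-group frames that begin at the given R4 frame (0-based count)."""
    started = ["R4"]                 # every R4 frame mark starts an R4 frame
    if r4_index % 2 == 0:
        started.append("R3")         # every second R4 frame starts an R3 frame
    if r4_index % 8 == 0:
        started.append("R1")         # every eighth R4 frame starts an R1 frame
    return started


def frame_length_s(iteration_rate_hz: float) -> float:
    """A frame is essentially one over the iteration rate."""
    return 1.0 / iteration_rate_hz


def next_task_list(remaining: Dict[str, int]) -> Optional[str]:
    """Rate group a triad draws its next task from: R4, then R3, then R1."""
    for group in ("R4", "R3", "R1"):
        if remaining.get(group, 0) > 0:
            return group
    return None                      # nothing left to run: the triad idles


if __name__ == "__main__":
    for n in range(9):
        print(n, frames_starting(n))
    print(next_task_list({"R4": 0, "R3": 2, "R1": 1}))
```

Because a new R4 frame always puts tasks back on the R4 list, calling `next_task_list` again at a frame mark returns "R4", which models the suspension of R3 and R1 work until the R4 tasks finish.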
¹ The only exception is a rate 1 task called 'SCC', the system configuration control task; this task is 
systematically run on different processor triads so it can execute self-tests on each triad. There is a bit 
in SCC's Task Control Block, set by SCC, that specifies on which triad the dispatcher should run SCC. 

Figure 2-3: Task Control Block Structure 
[Diagram of the linked task control block lists; legible labels: R4.CONTROL, TIMER, Workload.R42, 
Workload.R43, Workload.R31, Workload.R32, Workload.R33, R1 CONTROL, READALL, Workload.R11, 
Workload.R12, Workload.R13, IDLE 1, IDLE 2, IDLE 3. Annotations: "Starts Workload"; "SCC's position in 
task list may change during execution"] 

Figure 2-4: Frame Structure 
[Timing diagram; legible labels: Major Frames, R4 Frame, R3 Frame, R1 Frame, Frame Marks, 12.5 Hz, 6.25 Hz] 

Several computer systems are involved in creating and running experiments on FTMP, as illustrated in 
Figure 2-5. The steps to creating an experiment and the systems involved include: 
• Create and compile a program task written in a language called Automated Engineering 
Design (AED), which runs on an IBM 4341. 
• The user must map out where the code goes in memory along with the location of stack, local, 
and system variables. 
• The user then modifies OS task tables to include the task in FTMP's task structure, 
reassembling task tables when finished. 
• The experimental task is linked with the rest of the operating system code to create an 
absolute load module. 
• The load module is downline loaded from the IBM 4341 to a VAX-11/750. 
• The load module is downline loaded from the VAX-11/750 to FTMP. 
• The FTMP test adapter (CTA) is used to debug the experimental program. 
• Once the experimental program is correct, the test adapter is used to dump a memory image 
into a file for later analysis. 
Figure 2-6 illustrates the process of creating a program. The experimental loop may take up to two hours 
from the time of compiling a program on the IBM 4341 until it is executed on FTMP. The experimenter 
must have knowledge of several systems including the IBM 4341, the VAX-11/750, and FTMP. The 
experimenter also must be intimately familiar with FTMP's hardware, operating system, and task 
structure. 
Figure 2-5: FTMP Support Environment 
[Block diagram; legible labels: IBM 4341, PROM Programmer, VAX 11/750, Emulation, Seven 1553 
Interfaces, RS232, Display & Monitor, Fault Injector, FTMP] 
In order to shorten this experimental loop and improve experimental efficiency, a synthetic workload 
model for real time avionic systems was proposed [Clune 84]. With an easy to envision model, an 
experimenter can be working with the workload after merely a few hours of reading over the model, 
getting an overview of FTMP and learning VAX/VMS commands; the IBM 4341 is eliminated from the 
experimental loop. 
Figure 2-6: Steps to Creating a Program 
[Flow diagram; legible labels: Program Task, CAPS-6 Assembly, Relocatable Object Modules, OS Modules, 
FTMP] 
2.5. Previews of Experiments 
To date, baseline experiments up to the application level have been performed. Various experiments, 
classified by the level of abstraction presented in Figure 2-1, are shown below. Experiments marked by 
an asterisk (*) have already been performed [Clune 84]. 
1. Instruction Set Level: 
  • Verify the clock as an accurate fundamental measuring device. With the clock 
    calibrated, future performance experiments can be performed with confidence. (*) 
  • Timings of Assembly and High-level language instructions. (*) 
  • Observe and document the existence and the direct effects of interrupts. 
2. Executive Software Level: 
  • Executive primitive and overhead times (*) 
  • Interrupt procedure times 
  • Memory access time 
  • Bus access and contention delays 
  • Fault-tolerant overheads 
3. System and Application Level: 
  • Frame utilization characteristics (*) 
  • Length of the frame of all task iteration rate groups 
  • Fault-tolerant overhead to the application programmer 
4. Development of an application level tool for measuring performance. 
This report covers two experiments on FTMP. First, an experiment was run to test the existence and 
document effects of interrupts on FTMP. The second part of this report discusses the development and 
implementation of the application level tool called the synthetic workload. An experiment to calibrate 
the synthetic workload is also discussed. Once installed, the synthetic workload can be used to run 
application level experiments as well as certain executive level baseline experiments. 
3. Interrupts 
Interrupts can be viewed as signals of unusual events in a processor. These signals can be of simple 
events like arithmetic overflow or of more complex events such as a device becoming ready for input. 
Interrupts can be used for communication between a user process and the supervisor, in which case they 
are called traps. A user process invokes a trap to request a service (I/O, resource request, etc.) that the 
user process could not fulfill directly. Interrupts are also a mechanism for enforcing virtual memory and 
protection schemes. Interrupts notify the processor that a memory reference was to a page not in memory 
(page fault) and the page needs to be brought in, or can halt a program that tries to access memory 
outside its memory space. Finally, interrupts are a mechanism for software reliability. Whereas 
fault-tolerant systems, through redundancy, can catch hardware errors and mask or record them for later 
reconfiguration, interrupts are the mechanism for detecting and recovering from software faults. There 
are four categories of interrupts: 
Intraprocessor: asynchronous events that happen within the processor during the execution of a 
machine instruction. Examples of these events include: zero divide, arithmetic 
overflow, memory access violation, privileged instruction execution, and page fault. 
Intrasystem: interrupts caused by a peripheral such as a disk, timer or terminal. Examples of these 
interrupts include timer reached zero, input received, and output device ready. 
Executive: caused by the current executing program. Executive interrupts are used to make 
requests of the executive (operating system) program. Examples of such requests are 
starting new tasks, allocating hardware resources, communication to other tasks, etc. 
These interrupts are sometimes referred to as traps, supervisor calls (SVC), or 
privileged mode calls. 
Interprocessor: interrupts between two intelligent processors. This type of interrupt can be used to 
implement an interprocess communication (IPC) mechanism between processors. 
This section describes mechanisms used in implementing interrupts, followed by a discussion of interrupts 
on FTMP's CAPS-6 processor. Finally, results of experiments to test the interrupt mechanisms on FTMP 
are presented. 
3.1. Mechanisms 
Generally, interrupts are vectored, that is, the address of the interrupt handling routine is in a special 
memory location. When an interrupt occurs, control is transferred to a routine pointed to by this vector. 
Several devices can be associated with a single interrupt vector, in which case the processor must poll the 
devices to see which caused the interrupt. 
When there are several interrupt vectors, a system will sometimes have interrupt priority nesting. 
Nesting allows higher priority interrupts (e.g. power failure) to interrupt the processing of a low priority 
interrupt routine (e.g. overflow). 
To provide operating system support for protection mechanisms, most computers have, at the very 
minimum, user and supervisor states. Which protection violations are reported is a function of machine 
state. Obviously, interrupts like privileged instruction violation should not occur in supervisor state, 
hence there is an architectural decision of which interrupts are ignored in supervisor state. 
Finally, there is the issue of disabling and masking interrupts. Disabling an interrupt prevents a device 
from sending an interrupt. Thus the interrupt signal is actually turned off. In contrast, masking does 
not prevent the interrupt from occurring, but instead ignores the interrupt until the mask is changed. 
Using this definition, in a priority interrupt scheme, low priority interrupts are masked by a higher 
priority interrupt. Processors generally have a hardware mask field which tells which interrupts to 
ignore. In general, most interrupts (overflow, I/O, etc.) are supervisor maskable, but only intrasystem 
and interprocessor interrupts can be disabled. 
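The distinction between vectoring, masking, and disabling can be illustrated with a small software model. The sketch below is a generic Python illustration under the definitions given above, not a model of the CAPS-6 hardware; the class and method names are hypothetical, and priority nesting is not modeled.

```python
from typing import Callable, Dict, List, Set


class InterruptController:
    """Toy model: a vector table plus mask and disable sets."""

    def __init__(self) -> None:
        self.vectors: Dict[int, Callable[[], None]] = {}  # number -> handler
        self.masked: Set[int] = set()      # ignored until the mask changes
        self.disabled: Set[int] = set()    # signal turned off at the source
        self.pending: List[int] = []       # masked interrupts waiting

    def raise_interrupt(self, number: int) -> None:
        if number in self.disabled:
            return                         # never sent, so nothing is recorded
        if number in self.masked:
            if number not in self.pending:
                self.pending.append(number)  # occurred, but held back
            return
        self.vectors[number]()             # transfer control through the vector

    def set_mask(self, masked: Set[int]) -> None:
        """Install a new mask and deliver interrupts that are now unmasked."""
        self.masked = set(masked)
        waiting, self.pending = self.pending, []
        for number in waiting:
            self.raise_interrupt(number)   # re-queued if still masked


if __name__ == "__main__":
    log: List[str] = []
    ctrl = InterruptController()
    ctrl.vectors[1] = lambda: log.append("overflow")
    ctrl.vectors[2] = lambda: log.append("timer")
    ctrl.masked = {1}
    ctrl.disabled = {2}
    ctrl.raise_interrupt(1)    # held pending
    ctrl.raise_interrupt(2)    # lost
    ctrl.set_mask(set())       # the pending overflow is now delivered
    print(log)
```

In this model a masked interrupt is delivered once the mask is lifted, whereas a disabled interrupt leaves no trace, which is the difference the paragraph above draws.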
Some system responses to an interrupt include: 
• Do nothing. The results are equivalent to masking the interrupt except that the interrupt is 
cleared since it was acknowledged. For example, some applications might wish to be notified 
of an overflow condition yet continue execution. 
• Abort the current job (e.g. divide by 0, memory access violation, etc.). 
• Restart the job or start a job with new software (e.g. N-version programming). This is a 
consideration in a software reliable system. 
• Perform a service (e.g. privilege mode call, page fault). 
• React to an event (e.g. timer interrupt, I/O interrupt, IPC interrupt). 
3.2. Interrupts, System Validation, and Performance 
The steps to evaluating interrupts are similar to the steps taken when evaluating any part of the 
system. First, the existence of the interrupt is tested, thus validating the programmer’s manual. Baseline 
experiments follow which test functional correctness of the interrupt mechanisms (i.e. do interrupt 
masking mechanisms work correctly, are supervisor/user effects of interrupts correct, etc.). Interrupt 
evaluation encompasses both the hardware and the operating system. Interrupts are invoked in hardware, 
but the interrupt handlers are in the operating system. 
Interrupts do affect performance. An add instruction that overflows (thus invoking an interrupt) is 
slower than the equivalent instruction that does not overflow. Likewise, page faults impact performance. 
Therefore, the performance matrix of Figure 2-1 was used: 
• Throughput - How long does it take to process the interrupt? This delay is a function of the 
length of the interrupt handler, the system load, whether the handler is in memory (i.e. does it 
need to be paged in), etc. 
• Utilization - How often are interrupts invoked? Although utilization of processor exception 
interrupts (overflow, privileged mode violation, etc.) is of less interest because they are rare, 
IPC and page fault interrupts occur more frequently. 
• Delay - Variation of interrupt delay between processors. Also, does the effect of interrupts 
cross processor boundaries? 
The following is an excmple of experimental steps for evaluating interrupts: 
1. Test the existence of interrupts (manual validation). 
2. Test interrupt masking mechanisms. Also test which interrupts occur in user versus 
supervisor mode. 
3. Test how long it takes to process each intraprocessor interrupt (overflow, page fault, etc.).
Compare this to interrupt-free execution.
4. What is the overhead of processing intrasystem interrupts (timer, terminal, etc.)? How often
do these interrupts occur?
5. For executive interrupts (traps), evaluate how long it takes to service the trap. Likewise, how
long does a processor take to respond to an IPC interrupt?
6. What is the interrupt rate of page fault and IPC interrupts? For typical instruction
execution, how often do page faults occur?
7. Perform the above tests in both uniprocessor and multiprocessor configurations. 
3.3. Interrupts on FTMP 
The processor elements in FTMP are Collins Avionics CAPS-6 processors modified for fault tolerance.
The CAPS-6 processor has 18 interrupt vectors, stored in the first 18 words of PROM. Vectors 0-7 are
unavailable in the FTMP implementation of the CAPS-6 processor. According to documentation [Draper
83a], interrupts can only occur in user mode; interrupts in supervisor mode are automatically masked.
Actual implementation reveals that interprocess communication (IPC), interval timer, and page fault
interrupts can occur in supervisor mode. Otherwise, for example, the processor would not be able to page
executive code. Interrupts 8-F (hex) are maskable. The CAPS-6 has a bit mapped interrupt mask which is
stored in the Process Status Descriptor (PSD) of each task. This mask is loaded into the hardware
interrupt mask when the task is started. There are no interrupt priority levels in the CAPS-6 processor.
Figure 3-1 summarizes FTMP's interrupts. This table also presents the results of experiments to test 
the effect and existence of these interrupts. 
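As an illustration of the bit mapped interrupt mask described above, the following is a minimal sketch in Python. It is not the CAPS-6 encoding: the bit position chosen for each vector, the convention that a set bit blocks the interrupt, and the names MASKABLE_VECTORS, make_mask and is_delivered are all assumptions made for the example. The maskable range (hex 8 through F) is likewise assumed; Figure 3-1 shows vectors E and F as maskable and vectors 10 through 17 as not maskable.

```python
# Hypothetical model of a bit mapped interrupt mask; not the actual CAPS-6 layout.
MASKABLE_VECTORS = set(range(0x8, 0x10))  # assumption: vectors 8..F (hex) can be masked


def make_mask(*vectors):
    """Build a mask word with one bit set for each vector that should be blocked."""
    mask = 0
    for vector in vectors:
        if vector not in MASKABLE_VECTORS:
            raise ValueError(f"vector {vector:#x} is not maskable")
        mask |= 1 << vector  # assumption: bit position equals vector number
    return mask


def is_delivered(vector, mask):
    """An interrupt is delivered unless it is maskable and its mask bit is set."""
    if vector in MASKABLE_VECTORS and mask & (1 << vector):
        return False
    return True


# Mask that a task's PSD might carry (illustrative values only).
psd_mask = make_mask(0xE, 0xF)
```

With this mask loaded, vector hex E would be held off, while a non-maskable vector such as hex 11 would still be delivered regardless of the mask contents.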
3.4. Experimental Results 
Many of the interrupts do not have an interrupt handler. These are: 
• Arithmetic Overflow
• Write Protection Violation
• Illegal Opcode
• Stack Overflow
• Non-local Search Fault
• Privileged Instruction Violation
• Privileged Mode Call Fault
Instead, a generic routine called "NO.INT.HANDLER" handles all the above interrupts.
"NO.INT.HANDLER" is an infinite while loop that will, of course, hang the system when entered. An
alternative implementation of "NO.INT.HANDLER" is to ignore the interrupt, immediately returning
control to the executing task. The reason for looping forever is debugging: when the system enters
this routine, the system state can be examined to find where the error occurred. Since there is this
potential of hanging the system if one of the above exceptions occurs, all tasks, including application 
tasks, run in privileged mode where exceptions are ignored. 
In addition, there is no interrupt vector for divide exception. A divide by zero in user or privileged 
mode will stall the system. Admittedly, the above hazards are a characteristic of the present, 
experimental system. The original design called for USER/PRIVILEGED mode implementation and 
interrupt handlers. 
Running tasks in privileged mode, while preventing system failure from an unimplemented interrupt, 
does compromise software reliability. In particular, write protection is ignored in privileged mode, so a 
software error can be potentially disastrous (e.g. an R4 task writing into an R3 task's stack area). Likewise,
an overflow or illegal instruction signals software error and the need to stop the task (for task restart or 
n-version programming). These signals are missed in privileged mode execution. 
Even if interrupts were implemented as the original design called for, one may be reluctant to execute 
tasks in USER mode because of its limited power. In particular:
1. A user task cannot use system bus service routines, that is, the user cannot access system
memory. User tasks attempting to access system memory stall the system (the original design 
calls for a write protection violation interrupt). Hence, all variables must be in local memory. 
Since a task might run on any processor triad from one task execution to another, local 
memory variables are not guaranteed to retain values between task iterations. 
2. A user task can save values through use of a task data block. Variables in a task data block 
are copied from system memory into local memory by the dispatcher before the task starts, 
and moved back to system memory when the task ends. Thus, these variables retain their 
value between task iterations. However, changes to data block variables are not reflected in
system memory until the task finishes, which limits the potential for inter-task communication 
to task completion boundaries. 
Interrupt
Number    Maskable
E         Yes
F         Yes
10        no
11        no
12        no
13        no
14        no
15        no
16        no
17        no
*         no

Assignment/Function
unassigned
unassigned
Arithmetic Overflow [1]
IPC interrupt
Interval timer
Write Protection Violation [1]
Page Fault [4]
Test Adapter [4]
Halt Instruction Execution [1]
Illegal Opcode [1]
Stack Overflow [1]
Non-local Search Fault [1, 2]
Privileged instr Fault [1, 2, 4]
pmcall fault [1, 3]
unassigned
unassigned

Mode/Effect
USER/Stalls system
PRIV/No effect
USER/Stalls system
PRIV/Write protection ignored
USER/Stalls system
PRIV/Ignored
USER/Stalls system
PRIV/Ignored
USER/Stalls system
USER/No privileged mode routines
Divide exception [5]: USER or PRIV/Stalls system.

[1] -- No interrupt handler written. If this interrupt occurs, a routine
called "NO.INT.HANDLER" is entered which executes a DO-FOREVER loop.
[2] -- Non-local Search Fault occurs when a routine attempts to access a
variable in its caller's local environment that does not exist. None
of FTMP's software demands non-local searches; instead, the software
uses static local variables to communicate to nested procedures.
[3] -- Pmcall, Privileged mode call, is an instruction that a user process
can use to call supervisor routines. There are no privileged mode
routines on the current version of FTMP.
[4] -- Not tested.
[5] -- There is no interrupt vector for Divide Exception.
Figure 3-1: Summary of FTMP's Interrupts
3. Synchronization between user tasks is very limited (if not impossible) since user tasks cannot 
access system bus routines. The original design of FTMP does provide constraint bits in the 
task tables for task ordering (i.e. do not start a task until specified tasks are finished), but 
these bits are not implemented on the current version of FTMP. 
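The copy-in/copy-out behavior of a task data block (item 2 above) can be sketched as follows. This is only an illustration in Python under stated assumptions: the dictionary standing in for system memory, the function names dispatch and counting_task, and the trace list are all hypothetical and do not correspond to FTMP dispatcher code.

```python
# Hypothetical sketch of task data block handling; all names are illustrative.

def dispatch(task, system_memory, block_keys, trace):
    """Run one task iteration with copy-in/copy-out of its data block."""
    # Copy in: the dispatcher moves the block from system memory to local memory.
    local_block = {key: system_memory[key] for key in block_keys}
    # The task only ever touches its local copy.
    task(local_block)
    # Record what any other task would see in system memory at this point.
    trace.append(dict(system_memory))
    # Copy out: changes become visible only after the task has finished.
    for key in block_keys:
        system_memory[key] = local_block[key]


def counting_task(local_block):
    """A user task that keeps a running count between iterations."""
    local_block["count"] += 1


system_memory = {"count": 0}
trace = []
dispatch(counting_task, system_memory, ["count"], trace)
dispatch(counting_task, system_memory, ["count"], trace)
```

After two iterations the count held in system memory is 2, while each trace entry shows the stale value that system memory still held before the corresponding copy-out. This is the sense in which communication is limited to task completion boundaries.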
The reliability/system capability trade-off of running a task in USER or PRIVILEGED mode is a
dilemma for the FTMP programmer. However, with minor modifications to the original design, some of
the power only available in privileged mode can be made available to a user application task. As an
example, making some of the system bus routines available as traps (see interrupt number hex[15] -
pmcall fault) would give the user controlled access to system memory without compromising the software
reliability of user mode execution.
Since many interrupts are not implemented on FTMP, no performance analysis was performed. The
rest of the report instead concentrates on a tool for application level experiments: the synthetic workload. 
4. Workload 
4.1. Definition 
The workload of a computer is defined as the set of all inputs (programs, data, commands) the system 
receives from its environment. A workload can be classified as natural or synthetic. Natural workloads
accomplish useful work while a synthetic workload models a natural workload. 
There are many types of natural workloads. If the computer is a timesharing system the workload 
would be a user typing commands to the terminal. The workload would also include overhead of loading 
user programs, inputting data, and executing user programs. For control computers the workload is of a
different flavor; the input is in the form of sensor readings that must be processed before they are 
overwritten. The program task that processes the sensor data is also considered part of the control 
computer workload. These tasks are executed at regular intervals. 
The above two situations are examples of natural system workloads. Evaluating the performance of a 
natural workload involves putting measurement code into an existing system and collecting workload 
performance data over a period of time. With the second example, a control system, evaluation would 
involve taking measurements on existing control software to evaluate its performance. Sensor input to 
the control program could be real input from the actual environment (i.e. the computer would be flying
an airplane) or simulated sensor input. In either case, we assume the system and application software 
already exists and the major effort is in setting up the system for evaluation. 
A synthetic workload, like a natural workload, exercises a computer system. But unlike a natural 
workload which at least must have simulated input to "real" application programs, a synthetic workload 
is essentially a "fake" set of application programs (or tasks) that model a natural workload. A
synthetic workload can test a computer without having to develop or install application software.
Characteristically, synthetic workloads are controllable by the experimenter and can be used to analyze
performance by varying parameters in the synthetic workload model. 
4.2. Advantages of A Synthetic Workload 
As inferred from the above discussion, although a synthetic workload does not represent an application 
as well as a natural workload, there are several advantages to synthetic workloads: 
1. Easy to create and debug. A natural workload must be written as well as have a natural or
simulated external environment. If analyzing performance (perhaps for a performance
improvement study), a natural workload would already exist and thus would be preferred.
However, if we are performing a feasibility study where external input, let alone application
software, might not exist for the system, a synthetic workload is an excellent device for
measuring performance. With little effort to create and debug the synthetic workload, we
could answer some feasibility questions such as "Is the computer fast enough for our target
applications?" or "Does the computer have enough capacity for the natural workload we are
modeling?"
2. Easily repeatable. In an earlier section we listed several guidelines for experiments. One of
those guidelines included experimental repeatability. With natural workloads repeating an
experiment would involve recording all the environmental inputs over a measurement period,
as well as output which might have an effect on the input. This is particularly difficult if
output from the system affects the input. The natural workload approach tends to be
cumbersome in terms of storage requirements. A synthetic workload not only simplifies the
environment through a model but also simplifies the interface. The only data that needs to be
recorded for repeat experiments is the workload parameters and the measurement period.
These parameters can set the system to the exact state of the original experiment.
3. Easily controlled by parameters. The workload model is designed to make variation of
parameters easy. With a parametric model, sensitivity to parameter changes can be
systematically explored and bottlenecks discovered.
4. Model many natural workloads. With new computer systems we usually want to study the
feasibility of using the system for many types of applications or natural workloads. Modeling
these applications with a single synthetic workload can yield a good feeling for the
performance of a set of natural workloads.
5. Easily migrated to different systems. Generally the same workload model can be used on
several systems. Thus if we model the same workload on several computer systems it is much
easier to make direct comparisons between systems. Figure 4-1 illustrates this concept. In
this figure, if workload W is a natural workload it is sometimes called a benchmark.
Figure 4-1: General scheme of performance comparisons among n systems [Ferrari 78]
Of course there are disadvantages to using synthetic workloads: 
1. The system must be dedicated while using the synthetic workload. With natural workloads 
data can be collected while useful work is being done. 
2. The synthetic workload is only an approximation of a natural workload. 
4.3. Motivations 
An additional motivation for designing a synthetic workload for FTMP is to simplify the 
experimentation environment (see Figure 2-5). Prior to the use of the synthetic workload, experiments 
were performed by creating a program on an IBM 4341 followed by compilation, assembly and linkage of 
the task. An absolute load module was then downloaded to the support VAX and then to FTMP for 
execution. The entire experimental cycle usually took up to two hours assuming the experiment was 
designed correctly. Analysis was limited to a few parameters in each experiment. To analyze data from 
the experiment the user must provide a data collection program or modify an existing data collection 
program. The original FTMP baseline experiments were conducted in this manner. In order to master 
the experimental loop, the user had to learn about the internal structure of FTMP, including the setting 
up of task tables, the CTA interface program between FTMP and the VAX, and the VAX/VMS 
command language. Because of the time it took to develop experiments, there was substantial motivation
to simplify the experiment loop, even possibly taking the IBM 4341 - the major bottleneck - completely 
out of the experimental loop. 
A synthetic workload relieves the user of these details as well as providing a mechanism for further 
simplifying experimental preparation. Synthetic workload experiments would be run by varying 
parameters in the model. The parameters of the synthetic workload must correspond to meaningful 
variables; otherwise analogies to real workloads would be meaningless. There is, of course, a fine line 
between representativeness and ease of use. 
The next section discusses a realtime workload model. This is followed by the details of the 
implementation of that model on FTMP and the program support for the implementation. Finally, 
several workload experiments are compared to equivalent baseline experiments to calibrate (i.e. test the
representativeness of) the synthetic workload. 
4.4. A Realtime Workload Model 
The goal of any model is to find a simple representation of a system that is not too far removed from 
the natural system. If the model is too complex, deriving conclusions from parameter changes will be 
difficult. Conversely, too simplistic a model would not adequately describe system behavior. 
There are several factors that must be considered when developing a realtime workload model. First is
the task structure of realtime workloads. A task is a single thread of execution. With a realtime system,
a task is run at regular intervals, which defines the iteration rate of that task. Not all tasks need to be
run at the same iteration rate (i.e. a display terminal does not need to be updated nearly as often as the
airplane flap control). Thus a realtime task model should allow for multiple iteration rates. Control
systems demand task completion within the interval defined by the task iteration rate, which is referred
to as a hard deadline. This implies that any implementation of a workload model must collect data from
several task iterations to check if deadlines, and thus iteration rates, are adhered to. A realtime workload
model was presented in [Clune 84]. The following discussion is an overview of that workload model.
For our model, tasks are assumed to be execution entities sharing a common memory. Each task has 
the form: 
• read sensor data
• read interprocess communication (IPC) data
• do work (computations) on the data
• write IPC data
• write sensor data
On FTMP, a task is represented by the program in Figure 4-2. In this case the loops represent data read
in (P and Q), operated on (T), and written out (R and S), with A=B+C considered the typical 
instruction. The communication mechanism between processes on FTMP is main memory. Thus both 
sensor and IPC exchanges are done through memory reads and writes. The value of the realtime clock is 
stored after each iteration for later timing analysis. 
Task_i ();
Begin
    Read (P_i, Q_i, T_i, R_i, S_i);
    Store (Time);
    For X=1 to P_i do
        Read Sensor Input (read memory);
    Store (Time);
    For X=1 to Q_i do
        Read IPC Input (read memory);
    Store (Time);
    For X=1 to T_i do
        Execute Instruction (A = B + C);
    Store (Time);
    For X=1 to R_i do
        Write Sensor Output (write memory);
    Store (Time);
    For X=1 to S_i do
        Write IPC Output (write memory);
    Store (Time);
End;
Figure 4-2: Representation of a Synthetic Workload Task 
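For readers who prefer an executable form, the task of Figure 4-2 can be sketched in Python as shown below. This is an illustration of the task structure only, not the AED source used on FTMP. The dictionary standing in for main memory, its key names, and the injectable clock argument are assumptions made for the example.

```python
import time


def synthetic_task(p, q, t, r, s, memory, clock=time.perf_counter):
    """One iteration of a synthetic workload task; returns the six stored times."""
    stamps = [clock()]                 # Store (Time) after the parameters are read
    for _ in range(p):
        _ = memory["sensor_in"]        # read sensor input (a memory read)
    stamps.append(clock())
    for _ in range(q):
        _ = memory["ipc_in"]           # read IPC input (a memory read)
    stamps.append(clock())
    b, c = 1, 2
    for _ in range(t):
        a = b + c                      # the "typical instruction" A = B + C
    stamps.append(clock())
    for x in range(r):
        memory["sensor_out"] = x       # write sensor output (a memory write)
    stamps.append(clock())
    for x in range(s):
        memory["ipc_out"] = x          # write IPC output (a memory write)
    stamps.append(clock())
    return stamps


memory = {"sensor_in": 0, "ipc_in": 0}
stamps = synthetic_task(p=2, q=1, t=3, r=1, s=2, memory=memory)
```

Differences between consecutive entries of the returned list give the time spent in each of the five phases, which is the kind of timing analysis the stored realtime clock values are meant to support.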
The above task model is sufficient to implement a synthetic workload on FTMP. However, if we want
to more closely approximate a realtime system, a higher level structure is required. 
The next abstraction level above the task is the function. A workload can consist of any number of 
functions, each of which is composed of one or more tasks. The parameters at the function level are: 
• the number of tasks
• frequency of execution of this function. All tasks within the function will have this iteration
rate.
• percentage of total system instructions used by the function
• percentage of total sensor I/O used by the function
• percentage of total IPC I/O used by the function
Tasks are grouped into a function because of parametric similarities (i.e. they perform approximately the
same number of operations and have the same execution rate), rather than functional similarities.
Finally, we define the system level of the model which gives the structure and capability of the overall 
realtime workload. Parameters at this level are: 
• number of instructions (thousands of operations per second)
• total amount of sensor I/O (words per second)
• total amount of IPC (words per second)
• number of functions
• percentage of sensor I/O that is input
• percentage of IPC I/O that is input
Figure 4-3 illustrates the workload model for a realtime system. 
A program, called the workload calculator, takes system and functional level parameters and calculates
iteration numbers that can be used to implement a synthetic workload. This program, developed in
[Clune 84], is discussed in Section 4.5.1.
4.5. Implementation of the Synthetic Workload on FTMP
The goal of the synthetic workload implementation is for a user to be able to use the workload with 
minimal knowledge of the underlying system. The user should only need to know the workload model. 
In addition, the workload should have an easy to use interface. Initially, the discussion of the synthetic 
workload implementation will focus on the user interface. This will be followed by a discussion of the 
details of the actual synthetic workload implementation on FTMP.
4.5.1. User Interfaces 
To the user there are three parts to the synthetic workload: the workload calculator, the workload 
generator, and the workload data analyzer. Each of these programs is invoked at different times in the 
developing and running of a workload experiment. The following is a discussion of these three programs. 
[Diagram of the workload model: functions Fn1, Fn2, ..., FnN, with task parameters such as P_N, Q_N, R_N and S_N]
Figure 4-3: Workload Model [Clune 84]
Workload Calculator:
The workload calculator was developed and implemented in [Clune 84]. This program
converts parameters from the workload model into iteration numbers in a workload
task on FTMP. This program inputs system and functional level parameters and
calculates iteration numbers that are used by the synthetic workload generator. The
system level parameters directly correspond to those parameters presented in the model.
These parameters include total instruction KOPs, total sensor I/O, and total IPC rate.
Functional level parameters also correspond to those presented in the model. Examples
of functional level inputs include the number of tasks per function, the function's
iteration rate and the percent of the total system instructions, the total sensor I/O, and
the total IPC I/O used by each function. This program outputs loop iteration values
for insertion into the synthetic workload tasks (Figure 4-2). The workload calculator
can specify workloads for any control computer that implements the same workload
model.
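The conversion performed by the workload calculator is defined in [Clune 84] and is not reproduced in this report. The Python sketch below shows one plausible way to turn system and function level parameters into per-task loop counts. It is an assumption, not the calculator's algorithm: each loop iteration is taken to execute one operation or move one word, the function's share is split evenly among its tasks, and all class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SystemParams:
    kops: float               # thousands of operations per second
    sensor_io: float          # sensor I/O, words per second
    ipc_io: float             # IPC, words per second
    sensor_input_frac: float  # fraction of sensor I/O that is input
    ipc_input_frac: float     # fraction of IPC I/O that is input


@dataclass
class FunctionParams:
    n_tasks: int
    frequency_hz: float       # iteration rate shared by all tasks of the function
    instr_share: float        # fraction of total system instructions
    sensor_share: float       # fraction of total sensor I/O
    ipc_share: float          # fraction of total IPC I/O


def task_iterations(system, fn):
    """Loop counts (P, Q, T, R, S) for one task of the function, per iteration."""
    runs_per_second = fn.n_tasks * fn.frequency_hz   # task iterations per second
    sensor = system.sensor_io * fn.sensor_share / runs_per_second
    ipc = system.ipc_io * fn.ipc_share / runs_per_second
    work = system.kops * 1000 * fn.instr_share / runs_per_second
    return {
        "P": round(sensor * system.sensor_input_frac),        # sensor reads
        "Q": round(ipc * system.ipc_input_frac),              # IPC reads
        "T": round(work),                                     # A = B + C loop
        "R": round(sensor * (1 - system.sensor_input_frac)),  # sensor writes
        "S": round(ipc * (1 - system.ipc_input_frac)),        # IPC writes
    }


system = SystemParams(kops=100, sensor_io=1000, ipc_io=2000,
                      sensor_input_frac=0.5, ipc_input_frac=0.5)
function = FunctionParams(n_tasks=2, frequency_hz=25,
                          instr_share=0.5, sensor_share=0.5, ipc_share=0.5)
iterations = task_iterations(system, function)
```

The parameter values in the usage lines are invented for illustration. A real calculator would also need to account for the cost of a loop iteration on the target machine, which this sketch ignores.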
Workload Generator: 
This program is the interface between the user and FTMP. The major motivation for 
the program is to separate the details of the workload model from the details of 
installing task level parameters into the FTMP synthetic workload. This program uses 
iteration values supplied by the user (e.g. those supplied by the workload calculator) 
and deposits them into synthetic workload tasks on FTMP by setting up a command 
file. When run, this command file enters CTA, the interface between FTMP and the 
VAX, and selectively writes to FTMP’s memory to set up the workload. The command 
file also sets up the number of tasks to run in each rate group (again defined by the 
calculator), plus configures FTMP for one, two or three processor triads. The workload
generator creates a second command file for collecting timer data from FTMP. The 
user is again quizzed on which timer values to save and the number of iterations to 
observe. These timer dumps are later analyzed by the third component of the 
workload, the data analyzer. 
Data Analyzer:
This program works in conjunction with the workload generator to analyze data dumps
and make histograms of differences between timer values. The user is quizzed on which
timer values to compare and put into histograms.
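The core of what the data analyzer does, forming differences between pairs of timer values and binning them, can be sketched in a few lines of Python. This is not the analyzer program itself; the function names and the text rendering are assumptions made for the example. The conversion factor of 0.25 mSec per clock tick follows from the calibration results, where 4 clock ticks correspond to 1 mSec.

```python
from collections import Counter

TICK_MSEC = 0.25  # 4 clock ticks correspond to 1 mSec


def histogram(first, second):
    """Count how often each difference (second - first, in ticks) occurs."""
    return Counter(b - a for a, b in zip(first, second))


def render(hist):
    """Format the histogram one bin per line, in the style of tick/time tables."""
    lines = []
    for ticks in sorted(hist):
        bar = "*" * hist[ticks]
        lines.append(f"{ticks:3d} ticks ({ticks * TICK_MSEC:.2f} mSec) {bar}")
    return lines


# Illustrative timer dumps: end of one task and start of the next, in ticks.
end_times = [0, 10, 20]
start_times = [12, 23, 32]
hist = histogram(end_times, start_times)
report = render(hist)
```

The timer values in the usage lines are invented. In practice the two input lists would be the timer columns the user selects from a data dump.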
Figure 4-4 illustrates the relationship of the above programs. Each program is user oriented, quizzing
the user about system configuration, workload structure, and timer values desired. Presently, the user is 
responsible for filling in the link between the workload calculator and the workload generator. 
The steps to running an experiment with the synthetic workload are: 
1. Load FTMP with the synthetic workload (need only be done once). 
2. Use the workload calculator to describe the application workload you wish to test. Iteration 
values are stored in a file called RESULT.DAT. 
3. Run the workload generator using data from Step 2 as parameters into the workload model. 
The workload generator will create two command files: one to configure the synthetic
workload on FTMP and a second to collect data from the workload. 
4. Run the first command file to configure FTMP.
5. Run the second command file, storing the data in an output file. Run this command file 
several times until you have the desired amount of data. 
6. Run the data analyzer using an output file from Step 5 as input. The data analyzer outputs
the data in a readable form and creates histograms of that data.
7. Repeat Steps 2 through 6 for each workload experiment.
Once FTMP is initially loaded with the synthetic workload, the elapsed time from running the workload
calculator to output histograms is about 10 minutes. Occasionally, a hardware interface to FTMP may
stall, in which case the experiment loop can be significantly longer.
[Diagram relating the WORKLOAD CALCULATOR (instructions/sec., frequencies, I/O rate, etc.), the
WORKLOAD GENERATOR (task definitions, iterations, etc.; reconfiguration), and the DATA ANALYZER
(raw data, timer dumps; histograms)]
Figure 4-4: FTMP Synthetic Workload Environment
4.5.2. Implementation: FTMP Tasks and Workload Considerations 
The model for a realtime workload task was presented in Figure 4-2. In this task model the values for
the loop iterations are read in from a special area in memory set up by the workload generator before the
workload starts. Timer values are written back to memory at the end of the task.
FTMP has three task rate groups. For initial implementation, there are three workload tasks for each
rate group. Three per group is not a hard limit since there is room in the task tables to potentially
expand to 15 tasks per rate group (except for the R1 rate group - there are 6 special tasks thus limiting
this rate group to 9 workload tasks). The major limit on the number of workload tasks in FTMP is
memory storage for timer values. The number of tasks that actually run in each rate group is set up by 
the workload generator. 
Data collection is done in cycles. A collection cycle starts when the data collection command file 
(created by the workload generator) enables tasks to execute. For a period of time workload tasks write 
timer values to memory. These values are then retrieved from FTMP's memory by the command file for 
later analysis. Once this is done tasks are enabled again to start another data collection cycle. The saved 
data is essentially a snapshot of the computer over a defined execution period. 
To encompass all workload tasks, a collection cycle must include at least one full execution frame of the
lowest frequency rate tasks (R1). Thus, a collection cycle begins at an R1 frame boundary, called a major
frame. A major frame encompasses four R3 frames and eight R4 frames. An additional R4 task
collection was added, making nine R4 collection frames, to record boundary cases such as missed
deadlines. To monitor when to start collection cycles an additional R4 task is present. This task
monitors when a major frame is ready to begin and sets all the workload tasks to start collecting data. It
then removes itself from the R4 task list so as not to interfere with workload tasks while the workload is
executing. A cycle is begun by externally linking in the special R4 task. All of these details of data
collection are transparent to the user since they are set up by a data collection command file created by the
workload generator.
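The frame arithmetic above (one major frame covering four R3 frames and eight R4 frames, with a ninth R4 frame collected for boundary cases) can be made concrete with a small Python sketch. It assumes that R3 and R4 frames are evenly spaced and aligned with the start of the major frame. That alignment and the names used are assumptions for illustration, not a description of the FTMP dispatcher.

```python
R4_FRAMES_PER_MAJOR = 8
R3_FRAMES_PER_MAJOR = 4
# One extra R4 frame is collected to record boundary cases such as missed deadlines.
COLLECTION_R4_FRAMES = R4_FRAMES_PER_MAJOR + 1


def groups_starting(r4_index):
    """Rate groups whose frame begins at the given R4 frame index."""
    groups = ["R4"]
    r4_per_r3 = R4_FRAMES_PER_MAJOR // R3_FRAMES_PER_MAJOR  # 2 R4 frames per R3 frame
    if r4_index % r4_per_r3 == 0:
        groups.append("R3")
    if r4_index % R4_FRAMES_PER_MAJOR == 0:
        groups.append("R1")
    return groups


# Frame starts seen during one collection cycle (nine R4 frames).
cycle = [groups_starting(i) for i in range(COLLECTION_R4_FRAMES)]
```

Under these assumptions the ninth collected R4 frame coincides with the start of the next major frame, which is where a task that overran its deadline in the previous frame would show up.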
The workload has to take into consideration several special tasks running on FTMP. These tasks are: 
1. An R3 task (R31) called "TIME" which updates TIME.NOW, the current time, in memory by
checking RT.CLOCK (the realtime clock) and BASE.TIME. This is considered essential to the
computer performance and is always linked in.
2. The R1 "DISPLAY" task which updates FTMP's display terminal on the status of the system.
This is considered non-essential and can be taken out if the user so chooses (i.e. if a workload
task already models a system display).
3. Two R1 tasks "READALL" and "SCC" which are the fault-tolerant tasks of FTMP. These
two tasks can be considered essential in a fault-tolerant computer such as FTMP for fault
recovery and reconfiguration. However, during fault-free execution they only perform
self-tests. Therefore, the user has an option to take either of these tasks out of the task structure,
which is useful should the user want to investigate the overhead of fault-tolerant tasks.
The workload generator will ask the user which special tasks to include in the workload and links them in 
accordingly. 
Each task has an associated Task Control Block (TCB) which contains information on that task. Task 
Control Blocks are in a linked list common data structure in global memory. Processor triads select tasks 
from this structure when they need a new task to execute. Figure 2-3, presented earlier, illustrates the
TCB data structure and the position of workload and other tasks in that structure. The final three R1
tasks, IDLE1, IDLE2 and IDLE3, are special tasks to record idle time in a major frame on each of the
processor triads. After a processor has completed an R1 task it will select an idle task and hold that task
until other processors have finished their R1 tasks and select an idle task.
Finally, the FTMP R1 task dispatcher can assign R1 tasks to a specific processor if possible. A special
field in the TCB of the task determines which processor (1, 2, or 3) to run the task on with 0 specifying
any processor. "SCC" modifies this field so it can progressively run a battery of self-tests on different
processors. Execution of SCC affects TCB ordering since the dispatcher will postpone execution of this
task until the requested processor becomes available by moving this task down the task list.
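The effect of the processor field on task selection can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the FTMP dispatcher: the task list is a plain Python list of dictionaries, a selected task is removed from the list, and the names ANY_PROCESSOR and select_task are hypothetical.

```python
ANY_PROCESSOR = 0  # TCB field value meaning "run on any processor"


def select_task(task_list, processor):
    """Pick the next task for `processor` (1, 2 or 3) from an ordered task list.

    Tasks bound to a different processor are passed over and stay in the
    list, so they end up behind the task that is actually selected.
    """
    for index, tcb in enumerate(task_list):
        if tcb["processor"] in (ANY_PROCESSOR, processor):
            return task_list.pop(index)
    return None  # nothing runnable here; the processor would pick an idle task


tasks = [
    {"name": "SCC", "processor": 2},       # self-test bound to processor 2
    {"name": "WORKLOAD1", "processor": 0},
]
first_for_p1 = select_task(tasks, processor=1)
```

In this example processor 1 passes over SCC and runs the workload task first, so SCC is effectively postponed until processor 2 asks for work.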
4.5.3. Calibration 
The final step to synthetic workload implementation is calibration. Calibration determines the 
correctness of the workload model. The best calibration experiments are, of course, direct comparisons to 
natural workloads. However, comparisons to dedicated FTMP experiments are acceptable since the goal of
calibration is to show that the workload can produce similar results. 
The calibration experiments chosen for FTMP's synthetic workload are baseline experiments previously 
conducted without the workload generator in [Clune 84]. These experiments provide an opportunity for
comparison. The experiments are:
1. A task switching time experiment. This finds the overhead associated with starting a new
task once a task finishes. This time is found by comparing timer values recorded at the end of
the first task and the beginning of the second task respectively. Figure 4-5 illustrates task
switching overhead.
2. The task startup experiment measures the overhead of starting a task on a processor. This
time is found by comparing timer values taken at the beginning of tasks running on separate
processors. Figure 4-6 illustrates task startup overhead.
Figures 4-7 through 4-10 are the results of four experiments: task switching time, dedicated experiment;
task switching time, workload experiment; task startup overhead, dedicated experiment; and task startup, 
workload experiment. 
[Diagram: two consecutive tasks with the switching overhead marked between them]
Figure 4-5: Task Switching Overhead
[Diagram: tasks on processors P1 and P2 with the task startup interval marked]
Figure 4-6: Task Startup Overhead
Initial comparison is encouraging; both baseline and workload experiments have similar shapes. Both 
task startup experiments reveal similar dual peak curves with fringe data points. In the baseline
experiment, these lone data points revealed that the dispatcher was occasionally late starting a task. The 
synthetic workload exhibits the same behavior. 
Closer inspection of the data reveals that the workload curves of task switching overhead and task 
startup time are displaced 4 and 1.88 clock ticks (1 and 0.47 mSec) respectively from their baseline
experiment counterparts. Thus, overhead exists in the workload that is not present in the baseline 
experiments. The source of this overhead is obvious upon inspection of the AED source code of the 
baseline experiment task (Figure 4-11) and a workload task (Figure 4-12). The baseline experiment was 
designed to measure beginning and end task times. Thus, time is read immediately upon entering and 
just before exiting the task. In contrast, the workload contains both task entry overhead (statements
S1-S4) and task end overhead to save results (statements E1-E4). Because the synthetic workload is an
application level tool, overhead is put outside the inner loops. The workload can still be used for timing
intertask events if we take into account this overhead.
[Histogram: clock ticks and time versus data points]
Average: 12.55 ± 0.042 Ticks (540 data points)
3.13 ± 0.011 mSec
Figure 4-7: Baseline Experiment: Task Switching Overhead
[Histogram: clock ticks and time versus data points]
Average: 16.35 ± 0.068 Ticks (189 data points)
4.09 ± 0.017 mSec
Figure 4-8: Workload Experiment: Task Switching Overhead
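The tick and mSec values quoted in this section are related by a fixed factor: 4 clock ticks correspond to 1 mSec. The short Python sketch below applies that factor to the two displacements and to the workload average. The function name is an assumption made for the example.

```python
TICKS_PER_MSEC = 4  # from the results: 4 clock ticks correspond to 1 mSec


def ticks_to_msec(ticks):
    """Convert a count of clock ticks to milliseconds."""
    return ticks / TICKS_PER_MSEC


conversions = {
    "task switching displacement": ticks_to_msec(4),
    "task startup displacement": ticks_to_msec(1.88),
    "workload task switching average": ticks_to_msec(16.35),
}
for label, msec in conversions.items():
    print(f"{label}: {msec:.2f} mSec")
```

The same conversion applies when reading any of the histograms, whose bins are labeled in both ticks and mSec.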
By summing the execution times of statements S1 through S4 in the workload we can find the workload
task initialization overhead. Execution times of the RD primitive are from a separate experiment
(Appendix I). Execution times of arithmetic operations are taken from [Clune 84]. Execution time of the
"IF" statement is neglected since global memory RD time is substantially larger.
Statement #   Instruction              Execution Time (mSec)
-----------   -----------              ---------------------
S1            RD [1 word]              0.138
S2            IF (EXEC4 GEQ 0) ...     0.0 (for simplifying calculations)
S3                                     0.0
S4            RD [5 words]             0.150
                                       --------
                                       0.288 mSec (Ave.)
Similarly, the workload end overhead is:
Statement #   Instruction              Execution Time (mSec)
-----------   -----------              ---------------------
E1            WRT [12 words]           0.190
                EXEC4*6                0.063
E2            WRT [1 word]             0.164
                3*EXEC4                0.063
E3            EXEC4=EXEC4+1            0.058
E4            WRT [1 word]             0.164
                                       --------
                                       0.702 mSec (Ave.)
[Histogram over 4 to 34 clock ticks (1.00 to 8.50 mSec)]
Average: 7.15 ± 0.198 Ticks (744 data points)
1.79 ± 0.014 mSec
Figure 4-9: Baseline Experiment: Task Startup Time
In the synthetic workload, calculation of task switching must consider task ending overhead of the first
task, and task initialization overhead of the second task. Finally, 0.164 mSec is added since the baseline
experiment must write a timer value to memory (E1) at the end of the task. Taking these into account,
we get
    4.09 mSec - 0.288 mSec - 0.702 mSec + 0.164 mSec = 3.26 mSec (Ave.)
a value within 5 percent of the baseline experiment’s value.
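The correction above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are illustrative and do not come from the FTMP software, and the overhead figures are simply the entries of the two statement tables.

```python
# Overhead figures in mSec, copied from the statement tables above.
INIT_OVERHEAD = [0.138, 0.0, 0.0, 0.150]                    # S1..S4
END_OVERHEAD = [0.190, 0.063, 0.164, 0.063, 0.058, 0.164]   # components of E1..E4
BASELINE_TIMER_WRITE = 0.164  # timer write that the baseline task also performs


def corrected_switch_time(measured_ms):
    """Remove workload entry/exit overhead from a measured switching time."""
    init = sum(INIT_OVERHEAD)   # initialization overhead of the second task
    end = sum(END_OVERHEAD)     # ending overhead of the first task
    return measured_ms - init - end + BASELINE_TIMER_WRITE


corrected = corrected_switch_time(4.09)
print(f"initialization overhead = {sum(INIT_OVERHEAD):.3f} mSec")
print(f"end overhead            = {sum(END_OVERHEAD):.3f} mSec")
print(f"corrected switch time   = {corrected:.2f} mSec")
```

Keeping the overhead terms as lists makes it easy to re-run the correction if a statement's execution time is re-measured.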
Similarly, overhead should be deducted from the task startup time experiment. Since this experiment
compares the first timer values of two workload tasks, task initialization overhead for both tasks should
be deducted. The actual startup time becomes:
    2.26 mSec - 2*0.288 mSec = 1.68 mSec (Ave.)
a value within 10 percent of the baseline experiment’s value.
[Histogram over 5 to 26 clock ticks (1.25 to 6.50 mSec)]
Average: 9.03 ± 0.391 Ticks (306 data points)
2.26 ± 0.098 mSec
Figure 4-10: Workload Experiment: Task Startup Time
The following table summarizes the above results:
                             Baseline            Workload
Experiment                   Experiment Times    Experiment Times
----------                   ----------------    ----------------
Task Switching time          3.13 mSec (Ave.)    4.09 mSec
  Minus workload overhead    --                  3.26 mSec
Task Startup time            1.79 mSec           2.26 mSec
  Minus workload overhead    --                  1.68 mSec
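The "within 5 percent" and "within 10 percent" statements can be verified directly from the table entries. The sketch below is illustrative only; the function and variable names are assumptions, and the inputs are the averages quoted in the table.

```python
def percent_deviation(baseline_ms, corrected_ms):
    """Deviation of the overhead-corrected workload time from the baseline."""
    return 100.0 * abs(corrected_ms - baseline_ms) / baseline_ms


INIT_OVERHEAD_MS = 0.288                       # S1..S4 total from the text
startup_corrected = 2.26 - 2 * INIT_OVERHEAD_MS  # both tasks' initialization removed

rows = [
    ("Task switching", 3.13, 3.26),
    ("Task startup", 1.79, startup_corrected),
]
for name, baseline, corrected in rows:
    dev = percent_deviation(baseline, corrected)
    print(f"{name:15s} baseline {baseline:.2f}  corrected {corrected:.2f}  deviation {dev:.1f}%")
```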
Although these experiments are not application level calibration experiments, they do show that the
synthetic workload is a valid tool for making baseline experiments, as long as workload overhead is
considered in any intertask measurements. If measurements are intratask, the overhead is much smaller
since the clock read time (HREAD) is the only overhead. In conclusion, the workload is a useful tool for
performing experiments on FTMP.
CMU.TEST1 BEGIN
...
DEFINE PROCEDURE TIMETEST1 TOBE
BEGIN
  LONG HOLD, HOLD1;
  INTEGER MEC, RTCNUM, I;
  INTEGER A;
  HREAD(RT.CLOCK,HOLD,2);
  RD(CMU.EXEC,MEC,1);
  IF MEC LEQ 14
  THEN BEGIN
    RD(CMU.RTCNUM,RTCNUM,1);
    FOR I=1 STEP 1 UNTIL RTCNUM
    DO BEGIN
      A=1;
    END;
    A = MEC*6;
    WRT(CMU.TIME(A),HOLD,2);
    HREAD(RT.CLOCK,HOLD1,2);
    WRT(CMU.TIME(A+1),HOLD1,2);                          E1
  END;
  RESUME(0);
END;
END FINI;
Figure 4-11: Baseline Experiment Task (AED)
CMU.TEST BEGIN
...
DEFINE PROCEDURE WRKLOADR41 TOBE
BEGIN
INTEGER X, Y, Z;                      ... NON-STACK LOCALS //
OWN INTEGER A;
OWN INTEGER LOCAL, EXEC4;
OWN LONG ARRAY HOLD(OUT.VALUES);      ... HOLDS TIMER VALUES //
OWN INTEGER ARRAY R41.INPUT(6);       ... INPUT PARAMETERS //
INTEGER P; P $=$ R41.INPUT(0);
INTEGER Q; Q $=$ R41.INPUT(1);
INTEGER T; T $=$ R41.INPUT(2);
INTEGER R; R $=$ R41.INPUT(3);
INTEGER S; S $=$ R41.INPUT(4);
RD(CMU.MEC(0),EXEC4,1);                                  S1
IF (EXEC4 GEQ 0) AND (EXEC4 LES 9) THEN                  S2
BEGIN                                                    S3
RD(R4.INPUT(0),R41.INPUT,5);                             S4
HREAD(RT.CLOCK,HOLD(0),2);
FOR A=1 STEP 1 UNTIL P DO
HREAD(RT.CLOCK,HOLD(1),2);
FOR A=1 STEP 1 UNTIL Q DO
RD(CMU.GLOBAL,LOCAL,1);
HREAD(RT.CLOCK,HOLD(2),2);
FOR A=1 STEP 1 UNTIL T DO
RD(CMU.GLOBAL,LOCAL,1);
X=Y+Z;
HREAD(RT.CLOCK,HOLD(3),2);
WRT(CMU.GLOBAL,LOCAL,1);
HREAD(RT.CLOCK,HOLD(4),2);
FOR A=1 STEP 1 UNTIL S DO
WRT(CMU.GLOBAL,LOCAL,1);
HREAD(RT.CLOCK,HOLD(5),2);
WRT(R41.OUTPUT(EXEC4*6),HOLD,12);                        E1
WRT(RI.ID(3*EXEC4),TRIAD.ID,1);                          E2
EXEC4 = EXEC4 + 1;                                       E3
WRT(CMU.EXEC(0),EXEC4,1);                                E4
FOR A=1 STEP 1 UNTIL R DO
END; ... IF (EXEC4 GEQ 0) AND ... //
RESUME(0);
END;
END FINI;
Figure 4-12: Synthetic Workload Task (AED)
5. Future Work 
Although much work has been done defining the experimental methodology and using it to validate
FTMP, there is still work to be done. First, the methodology should be verified through application to
another system. In particular, the Software Implemented Fault-Tolerant (SIFT) computer at AIRLAB
should have the validation steps applied to it. This computer has constraints similar to FTMP's and
would be an excellent candidate for the validation procedure.
On FTMP, a few remaining baseline experiments should be performed. These include:
• Measure the time to transfer varying blocks of data from global to local memory, varying
  parameters much more than was done in the brief RD/WRT experiments described in
  Appendix I.
• Measure instruction execution time in pairs to see if the result is equivalent to the sum of the
  execution times when the instructions were measured singly.
• Investigate overhead and variation in application software due to the fault-tolerant
  mechanisms of FTMP.
• Find the nominal length of R3 and R1 tasks on FTMP.
• Find context swap time. This time is defined as the amount of time it takes to start up an R3
  task once the dispatcher finishes with R4 tasks.
The latter three experiments can probably be performed with the synthetic workload.
The potential of the synthetic workload has only been superficially demonstrated. The workload should 
be used for performance tests and comparisons, along with application level baseline experiments. Only 
through use will its power be demonstrated. 
Also, the present synthetic workload is a minimal implementation that was used to investigate 
feasibility. Presently, there are only three tasks per rate group. The R4 and R3 rate groups each have 
room for ten more tasks in their task structure, while the R1 rate group has room for seven more tasks. 
The only limiting factor is the amount of global memory available on FTMP to hold timer dumps. More 
compact timer dumps could possibly resolve this problem. Any enhancements will require changing the 
workload generator and data analyzer. 
Finally, in the future it will be desirable to contrast performance versus reliability of fault-tolerant
computers. One idea is to integrate the synthetic workload - a performance measurement tool - with the 
fault-injection experiments. 
6. Conclusion 
This project outlined and refined an experimental methodology for validating the multiprocessor 
avionics computer, FTMP. The methodology emphasizes a building block approach in which tests are 
performed starting at the instruction level, progressing through the operating system level and finally up 
to application level validation. At each level baseline experiments, which test a single phenomenon, were
performed. These were followed by more sophisticated experiments which test interactions between
several baseline phenomena. Finally, the concept of a generalized application level experiment tool,
called the synthetic workload, was developed.
Previous research had developed an outline of the methodology and tested it through the application
level. This research refined that methodology with additional baseline tests. In addition, the synthetic 
workload was implemented as an application level tool. The synthetic workload was then calibrated with 
a baseline experiment to demonstrate the workload's representativeness. 
Although the technique was developed specifically for FTMP, the origin of the technique dates back to
earlier work on multiprocessors at CMU. Thus, the methods used here should be applicable to other
computer systems. Tests on another system will supply information on the robustness of the technique 
along with supplying meaningful comparisons between systems. 
By no means is the methodology complete. Using the synthetic workload for experiments will 
undoubtedly reveal deficiencies in the original methodology. But the existence of this tool will greatly 
improve productivity, allowing researchers to run more experiments and further refine the methodology. 
In general, the methodology has proven to be a sound approach to validating computer systems. 
I. Test of Select RD/WRT Primitives 
On FTMP, most program tasks access the shared system memory with the following bus service 
routines: 
RD(sys.adr,cache.adr,num). This routine transfers num words from system
memory address sys.adr to cache address cache.adr.
WRT(sys.adr,cache.adr,num). This procedure is the same as RD except, of course, the
direction of transfer is reversed.
We wish to find the time these procedures use to access system memory with varying transfer sizes. In
particular, we are interested in the sizes that are used in the workload. The following instructions were
tested:
1. RD(sys,cache,1)
2. WRT(sys,cache,1)
3. RD(sys,cache,5)
4. WRT(sys,cache,12)
Instructions 1 and 2 were each executed in a loop 100 times along with the instruction 'A=1;'. The
other two instructions were executed in a similar loop 50 times². To find loop overhead, a loop just
containing an 'A=1;' instruction was executed both 50 and 100 times. This is the 'NULL' loop³. Times
to execute instructions can be found by subtracting loop overhead from the instruction loop, leaving only
instruction execution time.
The results of the measurements were as follows: 
                       clock ticks          µSec per instruction           Number
                       /loop count     -------------------------------     of data
Instruction            per loop (Ave.)  w/ overhead      w/o overhead      points
-------------------    --------------  -------------    -------------     -------
1) Null                15.7/100         39.3 ± 0.019      0.0               340
2) RD (x,y,num=1)      70.8/100        177.0 ± 0.025    137.7 ± 0.044       220
3) WRT(x,y,num=1)      69.1/100        172.8 ± 0.023    133.5 ± 0.042       260
4) Null                 8.3/50          41.5 ± 0.027      0.0               500
5) RD (x,y,num=5)      38.2/50         191.0 ± 0.025    149.5 ± 0.052       500
6) WRT(x,y,num=12)     46.0/50         230.0 ± 0.018    188.5 ± 0.045       300
The first column is the raw data in clock ticks (1 clock tick = 0.25 mSec). The next column is the time
to execute a single instruction including loop overhead. The third column adjusts the time from the
second by subtracting overhead.
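The reduction from raw loop timings to per-instruction times can be sketched as follows. The function names are illustrative assumptions; only the tick length (0.25 mSec) and the loop timings come from the table above. Note that the table rounds the NULL loop time to 39.3 µSec before subtracting, so the unrounded result for RD of one word differs from the tabulated 137.7 µSec in the second decimal place.

```python
TICK_US = 250.0  # 1 clock tick = 0.25 mSec = 250 microseconds


def per_instruction_us(ticks_per_loop, loop_count):
    """Time for one pass through the loop body, loop overhead included."""
    return ticks_per_loop * TICK_US / loop_count


def without_overhead(instr_ticks, null_ticks, loop_count):
    """Subtract the NULL-loop time measured with the same loop count."""
    return (per_instruction_us(instr_ticks, loop_count)
            - per_instruction_us(null_ticks, loop_count))


rd_1_word = without_overhead(70.8, 15.7, 100)    # RD (x,y,num=1)
rd_5_words = without_overhead(38.2, 8.3, 50)     # RD (x,y,num=5)
wrt_12_words = without_overhead(46.0, 8.3, 50)   # WRT(x,y,num=12)
print(f"RD 1 word:    {rd_1_word:.2f} uSec")
print(f"RD 5 words:   {rd_5_words:.2f} uSec")
print(f"WRT 12 words: {wrt_12_words:.2f} uSec")
```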
² The loop count was reduced to 50 for these calls since many large block transfers could take more time than an R4 process is
allowed.
³ A loop must contain at least one instruction; otherwise the compiler will not accept it. This is why 'A=1' is used as a substitute
for a 'NULL' loop.
II. Example of Workload Use 
This appendix contains an example of the running of the workload generator and data analyzer. An
example of the running of the workload calculator is not presented since that program is discussed in
[Clune 84]. This example starts with the very first step of the user providing information to the
workload generator followed by the loading of FTMP with the synthetic workload. Then, using the two
command files produced by the generator, the FTMP synthetic workload is configured and data collection
is run. Output from the data collection is redirected into a file which is used as input to the workload
data analyzer.
The workload generator basically queries the user on how he/she wants the synthetic workload 
configured. Input parameters to tasks correspond directly to workload parameters in Figure 4-2. The 
workload generator will also ask if the user wants the special R1 tasks (SCC, READALL, and DISPLAY) 
included in the workload. Finally, this program will inquire about data collection including what values 
and how many iterations the user wants from the workload collection. 
The workload data analyzer is more complicated. This program reads in timer values produced by the
collection file generated by the generator and quizzes the user on which timer values to compare. The
initial part of the analyzer is file management. The program skips comments and tables in the data file
to find the start of the workload data. It then quizzes the user on where he/she wants output sent.
Should there be a break due to garbage data, a new collection set, or incomplete data (i.e. CTA stalled in
the middle of a collection and had to be restarted), this program will skip to the next major frame of data
and return to the file management prompt.
Next, the Analyzer gets from the user timer values to compare. The format for specifying timer values 
is: 
<task name> <timer no>
where <task name> ::= READAL, SCC, IDLE[123], R[431][123]
      <timer no>  ::= 0-5 for Rxx tasks.
                      6+ for timer value in another collection frame.
                      0-1 for READALL, SCC and IDLE task.
Figure II-1 illustrates the workload tasks and timer numbers. For Rxx tasks, the user can specify a
number greater than 5 to refer to a timer value in another collection frame, e.g. 6 corresponds to the 0th
timer value in the task iteration immediately after the current iteration. Thus, to find the time between
runs of task R41 we would compare R41 5, the last timer value in task R41, to R41 6, the first timer
value in the next R41 iteration. This is feasible since the timer values for all iterations of a task in a
major frame are stored in a continuous array. The analyzer will try to collect as many data points as
possible in a major frame.
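The timer-number convention can be illustrated with a short sketch. It assumes, as the text suggests, that each Rxx iteration contributes six timer values (numbers 0-5) laid out back to back in one flat array per major frame. All names are hypothetical, and the analyzer's actual data structures are not shown in this report.

```python
TIMERS_PER_ITERATION = 6  # timer numbers 0-5 for Rxx tasks (assumed layout)


def parse_spec(spec):
    """Split a '<task name> <timer no>' specification such as 'R41 6'."""
    task, timer_no = spec.split()
    return task, int(timer_no)


def resolve(timer_no, iteration=0):
    """Map a timer number (possibly above 5) to (iteration, timer index)."""
    iterations_ahead, index = divmod(timer_no, TIMERS_PER_ITERATION)
    return iteration + iterations_ahead, index


def intervals(timers, first, second):
    """Differences timers[second] - timers[first] for every iteration in the
    frame for which both values exist in the flat array."""
    results = []
    for it in range(len(timers) // TIMERS_PER_ITERATION):
        a = it * TIMERS_PER_ITERATION + first
        b = it * TIMERS_PER_ITERATION + second
        if b < len(timers):
            results.append(timers[b] - timers[a])
    return results


# Made-up timer values: four iterations, 100 ticks apart, one tick per timer.
timers = [100 * it + t for it in range(4) for t in range(TIMERS_PER_ITERATION)]
print(parse_spec("R41 6"), resolve(6))
print(intervals(timers, 0, 6))   # start-to-start time between iterations
print(intervals(timers, 0, 5))   # task length within each iteration
```

With four iterations in a frame, comparing timer 0 with timer 6 yields three values, because the last iteration has no successor within the frame.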
[Diagram of workload tasks R41, R42, R43, R31, R32, R33, R11, R12 and R13 with their timer numbers]
Figure II-1: Illustration of Workload Tasks
It is recommended that the reader look at the steps for running the workload presented in Section 4.5.1
while reading through this example. Figure II-2 illustrates the running of the workload. '.COM' files
contain CTA commands for loading FTMP with the synthetic workload (BTRIAD.COM), configuring the
workload (CONFIG.COM), and collecting data from the workload (COLLECT.COM). WRKLD.CAP is
the absolute load module of the synthetic workload. WRKLD.LOG is an output log of workload data
produced through the collection command file (COLLECT.COM). WRKINFO.TXT is an internal file
that communicates workload information from the workload generator to the data analyzer.
Throughout this appendix the user response will be in bold font while italicized phrases are guiding
comments. Space constraints require that the example be minimal. Therefore, data collection is for
eight major frames of data. This is much less than would be included in a normal experiment.
[Block diagram: Workload Generator (WRKLD.EXE), CONFIG.COM, 2TRIADS.COM, Data Analyzer, DATA]
Figure II-2: Running the FTMP Workload
$ RUN WRKLD
Input file [STDIN]: <CR>
Output file [STDOUT]: CONFIG.COM
No. of R1 tasks: 0
No. of R3 tasks: 1
Task R31:
Time limit in ticks (1 tick=0.25 msec) [48 ticks]: <CR>
Input parameters [? or (P Q T R S)]: 0 0 0 0 0
No. of R4 tasks: 2
Task R41:
Time limit in ticks (1 tick=0.25 msec) [24 ticks]: <CR>
Input parameters [? or (P Q T R S)]: 0 0 0 0 0
Task R42:
Time limit in ticks (1 tick=0.25 msec) [24 ticks]: <CR>
Input parameters [? or (P Q T R S)]: 0 0 0 0 0
How many processor triads (1, 2, or 3)? 2
Do you want SCC linked in [Y]? <CR>
Do you want DISPLAY linked in [Y]? <CR>
Do you want READALL linked in [Y]? <CR>
Data for collection
Do you want the data collection loop in a separate file? [n] y
Output file [STDOUT]: COLLECT.COM
Wait time between collections [6 secs]: <CR>
There are 2 R4 tasks.
How many of these tasks do you want data from? [ALL] <CR>
There are 1 R3 tasks.
How many of these tasks do you want data from? [ALL] <CR>
Do you want the ID table dumped? [YES] <CR>
Do you want IDLE, SCC, and READALL values dumped? [YES] <CR>
Loop iterations [25]: 8
$ @BTRIADS.COM
Load FTMP with the synthetic workload.
Output from loading ...
Bit set
THIS PROGRAM STARTS UP 2 PROCESSOR AND MEMORY TRIADS.
MEMBERS OF TRIAD1 ARE LRU’S 0, 1 AND 2.
MEMBERS OF TRIAD2 ARE LRU’S 3, 4 AND 5.
THE MASTER IS LRU "A".
CO0P.CAP LOADED IN MASTER
MASTER ISSUING BUS ENABLE/SELECT COMMANDS.
CLEARING SYSTEM MEMORY TO 0
BEGINNING LOAD OF EXEC MEMORY IMAGE
SYSTEM MEMORY LOAD COMPLETE
LRU’S 6,7,8,9,A,B ARE MARKED FAILED.
TRIAD.ID.TABLE, MRR.TABLE SHOULD BE ALTERED TO CHANGE
THIS CONFIGURATION.
SLOP IS SET TO 40 PER CENT OF R4 PERIOD.
STARTING 2 TRIADS
MASTER MAKING FINAL BUS ASSIGNMENTS
SYSTEM STARTED IN MULTIPROCESSOR MODE.
CONFIGURATION TABLES ARE LOCATED AS FOLLOWS: 
TABLE LOCATION LENGTH 
BUS I " X  SELECT CODE 0 20 12 
C BUS ASSIGNMENTS 0 20 12 
P, R AND T BUS ASSGN 0 38 12 
MEMORY STATUS 0 44 12 
PROCESSOR STATUS 0 50 12 
ERROR LATCHES 1 00 48 
INITIATING TRANSFER OF CLOCK FROM MASTER 
Bit is reset 
DISCONNECTED FROM C BUS 1 
DISCONNECTED FROM C BUS 2 
DISCONNECTED FROM C BUS 3 
DISCONNECTED FROM C BUS 4 
DISCONNECTED FROM C BUS 5 
$ @CONFIG
Output from configuring ...
. Linking in DISPLAY .
. Preparing R1 tasks .
0 R1 tasks .
. Preparing R3 tasks .
1 R3 Tasks .
. Preparing R4 tasks .
2 R4 Tasks .
. Bringing up 2 Processors
. Repairing 0-2 .
. Failing 3-8 .
. Bringing up Processors 3-5
. Linking in IDLE and (optionally)
SCC, DISPLAY and READALL
$ @COLLECT /OUTPUT:WRKLD.LOG
All output going to a file
$ RUN ANAL
Input file [STDIN]: wrkld.log
STARTING COLLECTION .
. TABLES OF INTEREST                                      LRU assignment table and table of workload input
0020 0020 0016 0016 0016 0015 0015 0015 ! 0000 0050      2 processor triads
0020 ! 0000 0058
0000 0000 0000 0000 0000 0000 0000 0000 ! 000F 0000
0000 0000 0000 0000 0000 0000 0000 0000 ! 000F 0008
0000 0000 0000 0000 0000 0000 0000 0000 ! 000F 0010
0000 0000 0000 0000 0000 0000 0000 0000 ! 000F 0018
0000 0000 0000 0000 0000 0000 0000 0000 ! 000F 0020
0000 0000 0000 0000 0000 ! 000F 0028
000A 042F 000A 042E 000A 042D 000A 042D ! 0010 0000
EATING DATA . . .
%Start of new data.
Where do you want new data (S,#,N,L,?): N
New output file [STDOUT]: <CR>
Send output to the terminal
For this running of the workload we will collect data to measure four things:
* The R41 task length. This is calculated by subtracting
  the first timer value in task R41 (R41 0) from the last
  timer value in that task (R41 5).
* The time for the second processor to start its R4 task
  after the first processor started its R4 task. This "task
  startup" time is found by comparing timer values taken
  at the beginning of tasks R41 and R42 (R41 0 and R42 0).
* The effective rate of an R3 task. This is done by comparing
  time at the beginning of each iteration of the first R3 task
  (R31 0 to R31 6). There are four R3 task iterations
  per major frame of data. Thus, three values can be
  collected in a major frame.
* SCC startup time. This is a measure of the time for SCC to
  start after the first R4 task starts. It is found by comparing
  the first timer value in SCC (SCC 0) with the first timer
  reading (R41 0) of the first iteration of task R41.
There are:
2 R4 tasks, 2 are dumped.
1 R3 tasks, 1 are dumped.
0 R1 tasks, 0 are dumped.
The task ID table was dumped.
SCC, READALL and IDLE task values were dumped.
Data point dump 1. Please list highest rate group first.
First timer value (cmd,Q,H,?) [?] > R41 0
Second timer value > R41 5
Name of this data dump: Task R41 length
Data point dump 2. Please list highest rate group first.
First timer value (cmd,Q,H,?) [?] > R41 0
Second timer value > R42 0
Name of this data dump: Task Startup time
Data point dump 3. Please list highest rate group first.
First timer value (cmd,Q,H,?) [?] > R31 0
Second timer value > R31 6
Timer number for 2nd task crosses a frame boundary.
How many collections do you want per dump group? [?] > <CR>
Normal collection values are: 9 (R4) and 4 (R3).
Use a number that is less than default or
you’ll go out of bounds on the data structure.
How many collections do you want per dump group? [?] > 3
Name of this data dump: R3 task rate
Data point dump 4. Please list highest rate group first.
First timer value (cmd,Q,H,?) [?] > R41 0
Second timer value > SCC 0
Which R4 task iteration do you want? [0-8] 0
Name of this data dump: SCC startup time
Data point dump 5. Please list highest rate group first.
First timer value (cmd,Q,H,?) [?] > Q
4   10   403   378
4    6
4   10   635
4    6
4   12   380
4    6
4   11
4    7
4   11
4   11   396    89
4    6
4   11   638
4    6
4   14   380
4    6
4   10
4    7
5   11
4   11   638   294
3    6
4   12   389
4    6
5   11   642
4    6
4   11
5    7
4   10
4   11   638   443
4    6
4   10   381
5    7
3   11   641
4    6
4   10
4    7
5   11
5   11   640   283
3    6
5   11   380
4    6
5   11   638
4    6
4   10
4    7
4   11
5   11   639    84
4    6
4   11   380
4    6
4   11   642
4    6
5   11
4    7
4   11
4   10   637    57
3    6
5   11   382
4    6
4   11   643
4    6
5   11
4    7
4   10
5   11   638   298
4    6
4   16   381
4    6
4   10   642
4    6
4   11
4    7
4   10
>>Task R41 length. 
AVERAGE = 4.125000 (72 Data points)
VAR = 0.223592 (ST. DEV. = 0.472855)
MAX = 5 MIN = 3
Print histogram of Task R41 length [Y]? <CR>
3 ( 4) ****
4 ( 55) *******************************************************
5 ( 13) *************
>>Task Startup time.
AVERAGE = 8.888889 (72 Data points)
VAR = 6.269163 (ST. DEV. = 2.503830)
MAX = 16 MIN = 6
Print histogram of Task Startup time [Y]? <CR>
6 ( 23) ***********************
7 ( 9) *********
8 ( 0)
9 ( 0)
10 ( 11) ***********
11 ( 25) *************************
12 ( 2) **
13 ( 0)
14 ( 1) *
15 ( 0)
16 ( 1) *
>>R3 task rate.
AVERAGE = 533.458333 (24 Data points)
VAR = 12412.259040 (ST. DEV. = 128.110339)
MAX = 643 MIN = 380
Print histogram of R3 task rate [Y]? no
The spread is too large to print
>>SCC startup time.
AVERAGE = 240.750000 (8 Data points)
VAR = 21286.214844 (ST. DEV. = 145.897960)
MAX = 443 MIN = 57
Print histogram of SCC startup time [Y]? no
Merge any of the data sets? no
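The summary statistics printed by the analyzer can be reproduced from the dumped values. The sketch below is not the analyzer's code; it is a minimal Python illustration with assumed function names, applied to the eight SCC startup values that appear in the listing above. The printed VAR matches a sample variance (divisor n - 1), which is what `statistics.variance` computes.

```python
import statistics
from collections import Counter


def summarize(values):
    """Average, sample variance, standard deviation, extremes and count."""
    return {
        "average": statistics.mean(values),
        "var": statistics.variance(values),   # divisor n - 1
        "st_dev": statistics.stdev(values),
        "max": max(values),
        "min": min(values),
        "n": len(values),
    }


def histogram(values):
    """One line per integer value from min to max, in the analyzer's style."""
    counts = Counter(values)
    return [
        f"{v} ({counts[v]:3d}) {'*' * counts[v]}".rstrip()
        for v in range(min(values), max(values) + 1)
    ]


scc_startup_ticks = [378, 89, 294, 443, 283, 84, 57, 298]
stats = summarize(scc_startup_ticks)
print(f"AVERAGE = {stats['average']:.6f} ({stats['n']} Data points)")
print(f"VAR = {stats['var']:.6f} (ST. DEV. = {stats['st_dev']:.6f})")
print(f"MAX = {stats['max']} MIN = {stats['min']}")
```

The last decimal places differ slightly from the transcript, presumably because the original program used lower-precision arithmetic.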
References
[Clune 84]    Ed Clune.
              Analysis of the Fault-Free Behavior of the FTMP Multiprocessor System: Baseline
              Measurements and Synthetic Workload Development.
              Master’s thesis, Carnegie-Mellon University, 1984.
[Draper 83a]  Development and Evaluation of a Fault-Tolerant Multiprocessor (FTMP) Computer,
              Vol I, FTMP Principles of Operations.
              Charles Stark Draper Laboratories, 1983.
              Contract Report (CR) 166071.
[Draper 83b]  Development and Evaluation of a FTMP Computer, Vol II, FTMP Software.
              Charles Stark Draper Laboratories, 1983.
              CR166072.
[Draper 83c]  Development and Evaluation of a FTMP Computer, Vol III, FTMP Test and
              Evaluation.
              Charles Stark Draper Laboratories, 1983.
              CR166073.
[Draper 83d]  Development and Evaluation of a FTMP Computer, Vol IV, FTMP Executive Summary.
              Charles Stark Draper Laboratories, 1983.
[Feather 84]  Frank Feather, Carlos Liceaga.
              FTMP Programmer’s Manual.
              2nd edition, 1984.
[Ferrari 78]  Domenico Ferrari.
              Computer Systems Performance Evaluation.
              Prentice-Hall, 1978.
[Hopkins 78]  Hopkins, A.L., et al.
              FTMP - A Highly Reliable Multiprocessor.
              IEEE Trans. on Computers, October, 1978.
[Kong 82]     Thomas H. Kong.
              Measuring Time for Performance Evaluation of Multiprocessor Systems.
              Master’s thesis, Carnegie-Mellon University, 1982.
[NASA 79a]    NASA-Langley Research Center.
              Validation Methods for Fault-Tolerant Avionics and Control Systems - Working Group
              Meeting I, NASA-Langley Research Center, 1979.
              NASA Conference Publication 2114.
[NASA 79b]    Research Triangle Institute.
              Validation Methods for Fault-Tolerant Avionics and Control Systems - Working Group
              Meeting II, NASA-Langley Research Center, 1979.
              NASA Conference Publication 2130.
[Singh 81]    Ajay Singh.
              Pegasus: A Controllable, Interactive, Workload Generator for Multiprocessors.
              Master’s thesis, Carnegie-Mellon University, 1981.
[Toy 78]      W.N. Toy.
              Fault-Tolerant Design of Local ESS Processors.
              IEEE Trans. on Computers, October, 1978.
[Wensley 78]  Wensley, J.H., et al.
              SIFT: A Computer for Aircraft Control.
              IEEE Trans. on Computers, October, 1978.
