Fault-free validation of a fault-tolerant multiprocessor: Baseline experiments and workload implementation by Segall, Z. et al.
NASA Contractor Report 178075
FAULT-FREE VALIDATION OF A FAULT-TOLERANT
MULTIPROCESSOR: BASELINE EXPERIMENTS AND
WORKLOAD IMPLEMENTATION
Frank Feather, Daniel Siewiorek, and
Zary Segall
CARNEGIE-MELLON UNIVERSITY
Pittsburgh, Pennsylvania
Grant NAG1-190
April 1986
National Aeronautics and
Space Administration
Langley Research Center
Hampton, Virginia 23665
NASA-CR-178075
19860014806
Table of Contents
Abstract
1. Introduction
2. Background
2.1. Guidelines to Experiments
2.2. Proposed Methodology
2.3. Definition of Performance
2.4. The FTMP and Experimentation Environment
2.5. Previews of Experiments
3. Interrupts
3.1. Mechanisms
3.2. Interrupts, System Validation, and Performance
3.3. Interrupts on FTMP
3.4. Experimental Results
4. Workload
4.1. Definition
4.2. Advantages of a Synthetic Workload
4.3. Motivations
4.4. A Realtime Workload Model
4.5. Implementation of the Synthetic Workload on FTMP
4.5.1. User Interfaces
4.5.2. Implementation: FTMP Tasks and Workload Considerations
4.5.3. Calibration
5. Future Work
6. Conclusion
Appendix A. Test of Select RD/WRT Primitives
Appendix B. Example of Workload Use
Appendix C. Installation Notes
Appendix D. FTMP Tasks
List of Figures
Figure 2-1: Performance Evaluation Matrix
Figure 2-2: Software Appearance of FTMP (virtual machine)
Figure 2-3: Task Control Block Structure
Figure 2-4: Frame Structure
Figure 2-5: FTMP Support Environment
Figure 2-6: Steps to Creating a Program
Figure 3-1: Summary of FTMP's Interrupts
Figure 4-1: General scheme of performance comparisons among n systems [Ferrari 78]
Figure 4-2: Representation of a Synthetic Workload Task
Figure 4-3: Workload Model [Clune 84]
Figure 4-4: FTMP Synthetic Workload Environment
Figure 4-5: Task Switching Overhead
Figure 4-6: Task Startup Overhead
Figure 4-7: Baseline Experiment: Task Switching Overhead
Figure 4-8: Workload Experiment: Task Switching Overhead
Figure 4-9: Baseline Experiment: Task Startup Time
Figure 4-10: Workload Experiment: Task Startup Time
Figure 4-11: Baseline Experiment Task (AED)
Figure 4-12: Synthetic Workload Task (AED)
Figure B-1: Illustration of Workload Tasks
Figure B-2: Running the FTMP Workload
List of Tables
Table C-1: Files for Running the Synthetic Workload
Table D-1: FTMP Files for the Synthetic Workload

Abstract
In the future, aircraft employing active control technology must use highly reliable multiprocessors in
order to achieve flight safety. Such computers must be experimentally validated before they are
deployed. This project outlines a methodology for doing fault-free validation of reliable multiprocessors.
The methodology begins with baseline experiments, each of which tests a single phenomenon. As experiments
progress, tools for performance testing are developed.
This report presents the results of interrupt baseline experiments performed on the Fault-Tolerant
Multiprocessor (FTMP) at NASA-Langley's AIRLAB. Interrupt-causing exception conditions were tested,
and several were found to have unimplemented interrupt handling software while one had an
unimplemented interrupt vector. A synthetic workload model for realtime multiprocessors is then
developed as an application level performance analysis tool. Details of the workload implementation and
calibration are presented. Both the experimental methodology and the synthetic workload model are
general enough to be applicable to reliable multiprocessors besides FTMP.
1. Introduction
In the 1990s, aircraft will employ computers that must run correctly and continuously for the aircraft to
fly. NASA, in its Aircraft Energy Efficiency (ACEE) program, requires that the probability of failure in
these computers be less than 10^-10 per hour. Such requirements cannot be met with
standard realtime computers; instead, fault-tolerant computers have been developed to meet them.
Two such systems are SIFT (Software Implemented Fault Tolerance) [Wensley
78], conceived by SRI International and fabricated by Bendix Corporation, and FTMP (Fault-Tolerant
Multiprocessor) [Hopkins 78], conceived by the Charles Stark Draper Laboratory, Inc. and fabricated by
Collins. Engineering prototypes of these two systems have been delivered to the Avionics Integrated
Research Laboratory (AIRLAB) at NASA's Langley Research Center.
These complex systems, which must meet stringent performance requirements, have to be validated (i.e.,
demonstrated to be functionally correct). However, a probability of failure of 10^-10 per hour
translates to one failure per million years of operation, so field experience alone can never demonstrate it;
a validation method must be developed to discover flaws in the design and implementation before such a
system is placed into service. Showing that a system is correct can take place at many stages, from
mathematical models and theorem proving, also called verification, to experimental testing, called validation.
Mathematical models of the system are based on simplifying assumptions and can be used in conjunction
with, but not as a substitute for, actual experimentation. Indeed, many of the errors in a system surface
during the experimentation and use of the system. Bell Telephone [Toy 78] divided the causes of system
outages for their fault-tolerant electronic switching systems into several categories. The percentage given
for each category represents the fraction of total downtime measured in the field attributed to that cause:
• Hardware Reliability: Actual component failures - 20%,
• Software Deficiencies: Software design errors - 15%,
• Recovery Deficiencies: Inability to detect, isolate, and correctly recover from faults - 35%,
• Procedural Errors: Human error on the part of maintenance personnel or office
administrators - 30%.
Fault-tolerant techniques directly address the first category. The latter three categories are all forms of
design errors, which can be reduced by effective system design and validation.
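The figures quoted in this section reduce to simple arithmetic. The Python sketch below, added for illustration and assuming a 365.25-day year, converts the 10^-10 per hour requirement into a mean time to failure in years (treating the failure rate as constant) and totals the [Toy 78] downtime shares to confirm that the three design-error categories account for 80% of field downtime.

```python
# Illustrative arithmetic for the reliability figures in the Introduction.
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year of 8766 hours

failure_rate_per_hour = 1e-10  # ACEE requirement quoted in the text
mean_hours_to_failure = 1.0 / failure_rate_per_hour  # constant failure rate assumed
mean_years_to_failure = mean_hours_to_failure / HOURS_PER_YEAR

# Share of field downtime by cause, in percent [Toy 78].
outage_share = {
    "hardware reliability": 20,
    "software deficiencies": 15,
    "recovery deficiencies": 35,
    "procedural errors": 30,
}
# The text treats every cause except hardware reliability as a design error.
design_error_share = sum(
    pct for cause, pct in outage_share.items() if cause != "hardware reliability"
)

print(f"Mean time to failure: about {mean_years_to_failure / 1e6:.2f} million years")
print(f"Downtime due to design errors: {design_error_share}%")
```

The first line prints about 1.14 million years, matching the "one failure per million years" figure in the text.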
The goal of this research is to develop a methodology for the validation of the fault-free performance of
fault-tolerant avionic multiprocessors. Initially this methodology will be applied to FTMP, although the
approach should be general enough to migrate to other fault-tolerant systems like SIFT.
2. Background
2.1. Guidelines to Experiments
Over the last decade, Carnegie-Mellon University has devoted over 100 man-years to the design,
construction, and validation of multiprocessor systems. Guidelines developed during this work
include:
• The experimental validation methodology is successively refined as experiments uncover new
information and/or the methodology is applied to new multiprocessor systems.
• Design experiments to validate behavior that is documented as well as to uncover behavior that
is not documented.
• Perform experiments in a systematic manner. Since the search is for the unexpected there is
no shortcut to thorough testing.
• Experiments should be repeatable.
• The feasibility of performing various experiments is tempered by what is available in the
experimental environment. More sophisticated experiments may have to be postponed until
the experimental environment is provided with more tools.
• A building block approach should be used wherein one variable is changed at a time so that
causes of unexpected behavior are easy to isolate.
• Testing should take advantage of the levels of abstraction used in the design of the system.
Using these guidelines, we will develop a generalized methodology for testing multiprocessor systems.
2.2. Proposed Methodology
Showing that a computing system, as designed, will meet its dependability goals is called validation
[NASA 79a]. In 1979, NASA held several workshops to determine system validation procedures. One in
particular [NASA 79b] produced a detailed list of validation categories to evaluate the system in an
orderly manner. A building block approach was chosen so that confidence in the system would be built
up incrementally, starting with the understanding and measurement of primitive hardware
and operating system activities. After primitive activities are characterized, more complex experiments
are devised to define interactions between primitive activities. This orderly progression ensures uniform
coverage and makes it easier to locate the cause of an unexpected phenomenon. Steps in the proposed
methodology include:
1. Initial Checkout and Diagnostics
2. Programmer's Manual Validation
3. Executive Routine Validation
4. Multiprocessor Interconnect Validation
5. Multiprocessor Executive Routine Validation
6. Application Program Validation and Performance Baseline
7. Simulation of Inaccessible Physical Failures
8. Single Processor Fault Insertion
9. Multiprocessor Fault Insertion
10. Single Processor Executive Failure Response Characterization
11. Multiprocessor System Executive Fault Handling Capabilities
12. Application Program Validation on Multiprocessor
13. Multiple Application Program Validation on Multiprocessor
The first six steps validate the fault-free functionality of the system, while the next seven validate fault-
handling capabilities. Step 1, initial checkout and diagnostics, is usually done before system delivery,
while Step 2, manual validation, is ongoing throughout the testing process. Part of this project involved
updating and clarifying information in FTMP's manuals [Draper 83a, Draper 83b] with a user's guide
[Feather 84]. Of the other fault-free validation steps, Step 4 is considered hardware validation, Steps 3
and 5 are operating system level validation, and Step 6 is application level validation. This project deals
with fault-free performance (Steps 2 through 5) and develops an application level tool, called the
synthetic workload, to address Step 6.
Ideally, hardware and operating system validation should take place in the development stage of the
respective levels. For example, as the operating system is written, a set of validation tests is produced.
Each step of the methodology, like the whole methodology, follows a building block approach. First,
baseline experiments are conducted. Baseline experiments measure a single phenomenon while all other
interactions are held constant. These experiments are designed to validate the basic assumptions used in
the mathematical models as well as the assumptions made by the application programmers.
Once individual phenomena have been characterized, more advanced experiments can be conducted
which explore the interaction between basic phenomena.
As stated in the experiment guidelines, the validation procedure is tempered by the available
experimental environment. This implies that at any one step, more sophisticated experiments may have
to be postponed until more sophisticated experimental tools become available. Experiments can proceed in
parallel if tools are available at a higher yet disjoint step. For example, at AIRLAB, fault insertion
experiments occur in parallel with fault-free validation and performance experiments.
2.3. Definition of Performance
Validation experiments test system behavior and establish whether the system works correctly. That is,
validation experiments test functional correctness. In addition to establishing behavior, performance can
also be measured. Performance refers to how well a system, assumed to be functionally correct, works.
Validation and performance are not always dichotomous; in some systems, if performance criteria are not
met the system is considered to be incorrect. Therefore, validation experiments are usually accompanied
by performance analysis. For example, besides testing the functional correctness of hardware
instructions, measurements of basic instruction times can be used to estimate total system throughput in
terms of operations per second.
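As a hypothetical illustration of that last point, the sketch below estimates operations per second by weighting per-instruction times with an instruction mix. The instruction classes, times, and mix fractions are invented placeholders, not FTMP measurements; a real estimate would use times measured with a calibrated clock and a mix taken from the target application.

```python
# Sketch: estimate throughput (operations per second) from instruction timings.
# All timings and mix fractions below are hypothetical placeholders.


def estimated_ops_per_second(times_us: dict[str, float], mix: dict[str, float]) -> float:
    """Return operations/second for a weighted instruction mix.

    times_us: average execution time of each instruction class, in microseconds.
    mix:      fraction of executed instructions in each class (must sum to 1).
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    mean_time_us = sum(mix[name] * times_us[name] for name in mix)
    return 1e6 / mean_time_us


times_us = {"load_store": 2.0, "add": 1.0, "multiply": 8.0, "branch": 1.5}
mix = {"load_store": 0.4, "add": 0.3, "multiply": 0.1, "branch": 0.2}

print(f"{estimated_ops_per_second(times_us, mix):,.0f} operations per second")
```

With these placeholder values the weighted mean instruction time is 2.2 microseconds, or roughly 455,000 operations per second.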
Performance measurements can be conducted at many levels, starting with the instruction set, working
up to the operating system and then the application level. Three parameters which can be measured at
each level are Throughput, Utilization, and Delay. Figure 2-1 illustrates the system levels and the types
of performance experiments that can take place at each level. In more detail, the performance
measurements are:
• Throughput:
o Instruction Set: Measure the time to access limited resources (e.g. memory, clock) and
execute instructions
o Operating System: Measure the execution times of the operating system primitives and
tasks
o Application Software: Measure the execution times of the different subsections of each
application task
• Utilization:
o Instruction Set: Frequency and percentage of hardware resources used
o Operating System: Frequency of use of OS primitives
o Application Software: Measure idle time between tasks
• Delay (and Variation):
o Instruction Set: Variation in the access time of resources; amount of contention for
resources
o Operating System: Variation in execution of primitives due to resource contention
o Application Software: Delay (and variation) between a data write and a data read of
common data
In general, baseline experiments are conducted at the instruction set and operating system levels while
more complex measurements occur at the application level.
Level              | Behavior                  | Throughput                     | Utilization                   | Delay
Application        | Display, flight control   | Subtask execution times        | Idle time                     | Write/read delay and variation
Executive Software | Scheduler, message system | OS primitive times             | OS primitive frequency of use | Primitive variation, contention
Instruction Set    | Instructions, exceptions  | Instruction and resource times | Resource frequency of use     | Resource variation, contention
Figure 2-1: Performance Evaluation Matrix
Initially, this project deals with instruction set/executive level baseline experiments (interrupts).
However, because the most meaningful performance statements come from the application level, an
application level performance tool called the synthetic workload was developed. There are several
advantages to validation at the application level:
1. This is the level at which real programs (i.e., natural workloads) run. Any meaningful
statements about computer performance to the application programmer must be based on
measurements made at this level.
2. Experiments are much easier to design at the application level. The person validating the
system at this level does not need hardware and/or operating system expertise.
Baseline experiments and workload implementation were done on the Fault-Tolerant Multiprocessor
(FTMP). The next section discusses that computer.
2.4. The FTMP and Experimentation Environment
The Fault-Tolerant Multiprocessor (FTMP) has been discussed in several papers and manuals [Draper
83b, Hopkins 78]. This section is a software overview of FTMP from the application programmer's
perspective. The reader is referred to the references mentioned above for more details.
Figure 2-2 illustrates the FTMP system. Each processor in this figure actually consists of three
processors in a fault-tolerant configuration executing in lockstep. This trio of processors is sometimes
referred to as a processor triad or a virtual processor because the application programmer sees it as a
single processor. Likewise, memory is in a triad configuration. The FTMP can consist of one, two or
three processor triads. Each triad has a local memory which is divided into PROM and RAM. The
PROM contains frequently used executive code and is identical in all processors. Each processor's RAM
holds local variables and stack, plus application software paged in from global memory. A bus connects
the triads to global memory, I/O devices, a real-time clock and several latches needed for fault handling.
The triads execute independently of each other when accessing global memory. If a program running on a
processor triad uses a global variable, the program must first move the variable from global to local
memory with a bus service routine. Similarly, the variable is written back to global memory with
another bus service routine.
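The copy-in/copy-out discipline described above can be modeled in a few lines. In this sketch a processor triad computes only on copies held in its local memory and calls a bus service routine to move a variable between global and local memory. The class and method names (GlobalMemory, bus_read, bus_write, ProcessorTriad) are hypothetical stand-ins, not FTMP's actual bus service routines.

```python
# Sketch of FTMP-style access to global data (all names are hypothetical).


class GlobalMemory:
    """Shared memory reachable only over the system bus."""

    def __init__(self) -> None:
        self._words: dict[str, int] = {}

    def bus_read(self, name: str) -> int:
        """Stand-in for a bus service routine that copies a word to local memory."""
        return self._words.get(name, 0)

    def bus_write(self, name: str, value: int) -> None:
        """Stand-in for a bus service routine that copies a word back to global memory."""
        self._words[name] = value


class ProcessorTriad:
    """A virtual processor that computes only on variables held in its local RAM."""

    def __init__(self, global_mem: GlobalMemory) -> None:
        self.global_mem = global_mem
        self.local: dict[str, int] = {}

    def increment_global(self, name: str) -> None:
        self.local[name] = self.global_mem.bus_read(name)  # global -> local
        self.local[name] += 1                              # work on the local copy
        self.global_mem.bus_write(name, self.local[name])  # local -> global


shared = GlobalMemory()
triad_a, triad_b = ProcessorTriad(shared), ProcessorTriad(shared)
triad_a.increment_global("frame_count")
triad_b.increment_global("frame_count")
print(shared.bus_read("frame_count"))  # 2
```

Because each triad works on a private copy, two triads interleaving their read and write steps could lose an update; the sketch runs them one after the other and does not model how FTMP serializes such accesses.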
Work on FTMP is performed by tasks. A task is a single thread of execution that runs by itself. Each
task has a time limit associated with it. If a task does not complete by its allotted time it is aborted and
another task is started. A task can execute on any processor triad.¹
¹The only exception is a rate 1 task called "SCC", the system configuration control task; this task is systematically run on
different processor triads so it can execute self-tests on each triad. There is a bit in SCC's Task Control Block, set by SCC, that
specifies on which triad the dispatcher should run SCC.
[Figure: Processors 1, 2, and 3 (each a triad with 8K PROM and 8K RAM) connected by the system bus to 32K global memory, a real-time clock, error latches, and I/O ports.]
Figure 2-2: Software Appearance of FTMP (virtual machine)
In a realtime system, a task is run at regular intervals that define the task's iteration rate. Not all
tasks need to run at the same iteration rate. For example, the task that updates the display terminal
does not have to be executed nearly as often as the task that monitors and adjusts the plane's airspeed.
Tasks with a common iteration rate are grouped into rate groups and are run within frames. A frame
defines the execution interval length, which is the reciprocal of the iteration rate of the tasks
grouped in the frame. In the time allotted by the frame, the working triads must execute all the tasks
defined for the frame's iteration rate. Task control blocks, which contain all the information necessary to
run a task, are kept in a linked list resident in global memory. Individual triads access this global list to select
a task to run. When FTMP is in a multiple triad configuration, some tasks will execute in parallel.
When there are no more tasks left in a particular iteration rate group to execute, a triad will either
become idle or start executing tasks from a lower iteration rate group. Figure 2-3 is an example of a task
control block structure arranged by rate groups (defined below). The control blocks in this figure are
those of the synthetic workload (Section 4).
The FTMP has three iteration rates which define three different frame sizes. There are separate task
control block lists - one for each rate group. The frame sizes are:
• R4, the basic frame size
• R3, equivalent to 2 R4 frames
• R1, equivalent to 4 R3 frames; also called the major frame
Figure 2-4 illustrates the different frames and their execution frequencies.
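Because a frame's length is the reciprocal of its iteration rate, the frame sizes above can be checked numerically. The sketch below takes the 25 Hz R4 rate shown in Figure 2-4 and derives the other rates from the frame ratios listed above.

```python
# Frame lengths derived from the R4 rate and the frame ratios given in the text.
R4_RATE_HZ = 25.0            # basic frame rate, read from Figure 2-4
R3_RATE_HZ = R4_RATE_HZ / 2  # one R3 frame spans 2 R4 frames
R1_RATE_HZ = R3_RATE_HZ / 4  # one R1 (major) frame spans 4 R3 frames

frame_ms = {}
for name, rate_hz in (("R4", R4_RATE_HZ), ("R3", R3_RATE_HZ), ("R1", R1_RATE_HZ)):
    frame_ms[name] = 1000.0 / rate_hz  # execution interval = 1 / iteration rate
    print(f"{name}: {rate_hz:g} Hz, frame length {frame_ms[name]:g} ms")
```

This prints frame lengths of 40 ms, 80 ms, and 320 ms. The derived 3.125 Hz R1 rate agrees with Figure 2-4, and a 320 ms major frame holds eight 40 ms R4 frames.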
FTMP handles the multiple rate groups as follows. At the beginning of an R4 frame, one of the triads,
called the responsible triad, starts the R4 frame for itself and signals another triad to start its
frame. This second triad in turn signals the third triad, if it exists, to start its R4 frame. Each R4 frame
does not necessarily have the same responsible triad. Every second R4 frame signals the start of an R3
frame, and every eighth R4 frame starts an R1 frame. Once a triad runs out of R4 tasks to execute, the
triad begins taking tasks from the R3 task list. Likewise, when a triad runs out of R3 tasks,
it takes tasks from the R1 task list. Execution of a lower-rate frame group can be suspended in a triad
by the start of a higher-rate frame group. Suspended tasks are continued once the triad runs
out of tasks from the higher iteration rate. For example, the beginning of an R4 frame suspends
execution of R3 and R1 tasks until all tasks in the R4 frame finish. The processor triad that finishes the
last R4 task in the R4 frame becomes the responsible triad that starts the next R4 frame.
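The selection rule just described can be sketched as a deliberately simplified, single-triad model: take R4 tasks first, then R3, then R1, and let the start of a new R4 frame refill the R4 list so that lower-rate work waits. The class name and method names are hypothetical, and the task names are borrowed from Figure 2-3. The sketch switches only at task boundaries (FTMP can suspend a running lower-rate task) and does not model task time limits, multiple triads, or the responsible-triad handoff.

```python
from collections import deque
from typing import Optional


class RateGroupDispatcher:
    """Single-triad model: always take work from the highest-rate nonempty list."""

    PRIORITY = ("R4", "R3", "R1")  # highest iteration rate first

    def __init__(self, task_lists: dict[str, list[str]]) -> None:
        self.task_lists = task_lists  # task control block lists, one per rate group
        self.ready = {group: deque() for group in self.PRIORITY}

    def start_frame(self, group: str) -> None:
        """Frame start: make every task in this rate group ready to run."""
        self.ready[group].extend(self.task_lists[group])

    def next_task(self) -> Optional[str]:
        """Pick the next task; lower-rate work waits while higher-rate work remains."""
        for group in self.PRIORITY:
            if self.ready[group]:
                return self.ready[group].popleft()
        return None  # nothing ready: the triad idles


dispatcher = RateGroupDispatcher({
    "R4": ["Workload.R41", "Workload.R42"],
    "R3": ["Workload.R31"],
    "R1": ["Workload.R11"],
})
for group in ("R4", "R3", "R1"):  # first R4 frame of a major frame
    dispatcher.start_frame(group)
order = [dispatcher.next_task() for _ in range(3)]
dispatcher.start_frame("R4")      # next R4 frame begins before the R1 work has run
order += [dispatcher.next_task() for _ in range(4)]
print(order)
```

The printed order shows Workload.R11 running only after both R4 frames' tasks and the R3 task have finished, followed by None when the triad has nothing left to run.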
Several computer systems are involved in creating and running experiments on FTMP as illustrated in
Figure 2-5. The steps to creating an experiment and the systems involved include:
• Create and compile a program task written in the Automated Engineering Design (AED) language,
using the AED system, which runs on an IBM 4381.
[Figure: task control block lists headed R4 CONTROL, R3 CONTROL, and R1 CONTROL for the synthetic workload. Tasks shown include R44 Special Initial (starts the workload), Workload.R41-R43, TIMER, Workload.R31-R33, DISPLAY, SCC, READALL, Workload.R11-R13, and IDLE 1-3. SCC's position in the task list may change during execution.]
Figure 2-3: Task Control Block Structure
[Figure: frame marks on a time line showing R4 frames at 25 Hz, R3 frames at 12.5 Hz, and R1 (major) frames at 3.125 Hz.]
Figure 2-4: Frame Structure
• The user must map out where the code goes in memory along with the location of stack, local,
and system variables.
• The user then modifies OS task tables to include the task in FTMP's task structure,
reassembling task tables when finished.
• The experimental task is linked with the rest of the operating system code to create an
absolute load module.
• The load module is downline loaded from the IBM 4381 to a VAX-11/750.
• The load module is downline loaded from the VAX-11/750 to FTMP.
• The FTMP test adapter (CTA) is used to debug the experimental program.
• Once the experimental program is correct, the test adapter is used to dump a memory image
into a file for later analysis.
Figure 2-6 illustrates the process of creating a program. When the baseline experiments presented in this
paper were run, the experimental loop took up to two hours from the time a program was compiled on the
IBM 4381 until it was executed on FTMP. The current environment has been updated to simplify the
experimentation steps. The experimenter must have knowledge of several systems, including the IBM
4381, the VAX-11/750, and FTMP. The experimenter also must be intimately familiar with FTMP's
hardware, operating system, and task structure.
In order to shorten this experimental loop and improve experimental efficiency, a synthetic workload
model for real time avionic systems was proposed [Clune 84]. With an easy-to-envision model, an
experimenter can be working with the workload after merely a few hours of reading over the model,
getting an overview of FTMP, and learning VAX/VMS commands; the IBM 4381 is eliminated from the
experimental loop.
[Figure: components shown include the VAX-11/750 and IBM 4381 hosts, a PDP emulation system on a UNIBUS, a PROM programmer, the test adapter, a fault injector, a display monitor, and RS232 and 1553 interfaces to FTMP.]
Figure 2-5: FTMP Support Environment
[Figure: stages shown are the AED program task, CAPS-6 assembly, relocatable object modules, OS modules, the VAX, and FTMP.]
Figure 2-6: Steps to Creating a Program
2.5. Previews of Experiments
To date, baseline experiments up to the application level have been performed. Areas of experiments,
classified by the level of abstraction presented in Figure 2-1, are shown below. Experiment sets marked
by an asterisk (*) have already been performed [Clune 84].
1. Instruction Set Level:
• Verify the clock as an accurate fundamental measuring device. With the clock
calibrated, future performance experiments can be performed with confidence. (*)
• Timings of Assembly and High-level language instructions. (*)
• Observe and document the existence and the direct effects of interrupts.
2. Executive Software Level:
• Executive primitive and overhead times (*)
• Interrupt procedure times
• Memory access time
• Bus access and contention delays
• Fault-tolerant overheads
3. System and Application Level:
• Frame utilization characteristics (*)
• Length of the frame of all task iteration rate groups
• Fault-tolerant overhead to the application programmer
• Development of an application level tool for measuring performance.
This report covers two areas of work on FTMP. First, a set of experiments was run to test for the existence
of interrupts on FTMP and to document their effects. The second part of this report discusses the development
and implementation of the application level tool for measuring performance, called the synthetic workload.
A set of experiments to calibrate the synthetic workload is also discussed. Once installed, the synthetic
workload can be used to run application level experiments as well as certain executive level baseline
experiments.
3. Interrupts
Interrupts can be viewed as signals of unusual events in a processor. These signals can indicate simple
events, such as arithmetic overflow, or more complex events, such as a device becoming ready for input.
Interrupts can be used for communication between a user process and the supervisor, in which case they
are called traps. A user process invokes a trap to request a service (I/O, resource request, etc.) that the
user process could not fulfill directly. Interrupts are also a mechanism for enforcing virtual memory and
protection schemes. Interrupts notify the processor that a memory reference was to a page not in memory
(a page fault) and that the page needs to be brought in, or they can halt a program that tries to access
memory outside its memory space. Finally, interrupts are a mechanism for software reliability. Whereas
fault-tolerant systems can, through redundancy, catch hardware errors and mask them or record them for
later reconfiguration, interrupts are the mechanism for detecting and recovering from software faults.
There are four categories of interrupts:
• Intraprocessor: asynchronous events that happen within the processor during the execution of a
machine instruction. Examples of these events include zero divide, arithmetic overflow, memory
access violation, privileged instruction execution, and page fault.
• Intrasystem: interrupts caused by a peripheral such as a disk, timer, or terminal. Examples of these
interrupts include timer reached zero, input received, and output device ready.
• Executive: an interrupt caused by the currently executing program. Executive interrupts are used to
make requests of the executive (operating system) program. Examples of such requests are starting
new tasks, allocating hardware resources, communicating with other tasks, etc. These interrupts are
sometimes referred to as traps, supervisor calls (SVC), or privileged mode calls.
• Interprocessor: interrupts between two intelligent processors. This type of interrupt can be used to
implement an interprocess communication (IPC) mechanism between processors.
This section describes mechanisms used in implementing interrupts, followed by a discussion of interrupts
on FTMP's CAPS-6 processor. Finally, results of experiments to test the interrupt mechanisms on FTMP are
presented.
3.1. Mechanisms
Generally, interrupts are vectored, that is, the address of the interrupt handling routine is in a special
memory location. When an interrupt occurs, control is transferred to a routine pointed to by this vector.
Several devices can be associated with a single interrupt vector, in which case the processor must poll the
devices to see which one caused the interrupt.
When there are several interrupt vectors, a system will sometimes have interrupt priority nesting.
Nesting allows higher priority interrupts (e.g., power failure) to interrupt the processing of a lower
priority interrupt routine (e.g., overflow).
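The mechanisms just described (vectoring, polling on a shared vector, and priority nesting) can be sketched under a few assumptions: a vector table indexed by vector number, a polling handler for devices that share one vector, and nesting in which an interrupt is entered only if it outranks the handler already running. Vector numbers, priorities, and device names here are hypothetical and do not describe any particular processor.

```python
from typing import Callable


class Device:
    """A device that may be requesting service on a shared interrupt vector."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pending = False


def make_polling_handler(devices: list[Device]) -> Callable[[], str]:
    """Handler for a shared vector: poll the devices to find the requester."""
    def handler() -> str:
        for dev in devices:
            if dev.pending:
                dev.pending = False
                return f"serviced {dev.name}"
        return "spurious interrupt"
    return handler


class InterruptController:
    """Vector table plus simple priority nesting (higher number = higher priority)."""

    def __init__(self) -> None:
        self.vectors: dict[int, Callable[[], str]] = {}
        self.priority: dict[int, int] = {}
        self.current_priority = -1  # -1 means no handler is running
        self.log: list[str] = []

    def install(self, vector: int, priority: int, handler: Callable[[], str]) -> None:
        self.vectors[vector] = handler
        self.priority[vector] = priority

    def raise_interrupt(self, vector: int) -> bool:
        """Enter the handler only if it outranks the one now running."""
        if self.priority[vector] <= self.current_priority:
            return False  # held off by an equal- or higher-priority handler
        saved = self.current_priority
        self.current_priority = self.priority[vector]
        try:
            result = self.vectors[vector]()  # transfer control through the vector
        finally:
            self.current_priority = saved
        self.log.append(result)
        return True


ctrl = InterruptController()
disk, terminal = Device("disk"), Device("terminal")
ctrl.install(0x9, priority=2, handler=make_polling_handler([disk, terminal]))
ctrl.install(0x1, priority=7, handler=lambda: "power-fail handled")


def overflow_handler() -> str:
    nested = ctrl.raise_interrupt(0x1)  # a power failure arrives mid-handler
    return f"overflow handled (power-fail nested: {nested})"


ctrl.install(0x8, priority=1, handler=overflow_handler)
terminal.pending = True
ctrl.raise_interrupt(0x9)
ctrl.raise_interrupt(0x8)
print(ctrl.log)
```

The log records the polled terminal service, then the nested power-fail handler completing inside the overflow handler, then the overflow handler itself.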
To provide operating system support for protection mechanisms, most computers have, at the very
minimum, user and supervisor states. Which protection violations are reported is a function of machine
state. Obviously, interrupts like privileged instruction violation should not occur in supervisor state,
so the architect must decide which interrupts are ignored in supervisor state.
Finally, there is the issue of disabling and masking interrupts. Disabling an interrupt prevents a device
from sending an interrupt. Thus, the interrupt signal is actually turned off. Processors might disable an
interrupt to take a device out of service. In contrast, masking does not prevent the interrupt from
occurring, but instead ignores the interrupt until the mask is changed. Using this definition, in a priority
interrupt scheme, low priority interrupts are masked by a higher priority interrupt. Processors generally
have a hardware mask field which tells which interrupts to ignore. In general, most interrupts (overflow,
I/O, etc.) are supervisor maskable, but only intrasystem and interprocessor interrupts can be disabled.
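The distinction between disabling and masking can be modeled with two bit fields. In this hypothetical sketch, a disabled source never posts a request, while a masked request stays pending and is delivered once its mask bit is cleared. The 16-line width, bit positions, and names are illustrative only.

```python
class InterruptLines:
    """Hypothetical 16-line interrupt model with enable and mask bit fields."""

    def __init__(self) -> None:
        self.enabled = 0xFFFF  # bit set: the source may raise its interrupt
        self.mask = 0x0000     # bit set: a pending interrupt is held, not delivered
        self.pending = 0x0000

    def request(self, line: int) -> None:
        if self.enabled & (1 << line):  # disabled sources never signal
            self.pending |= 1 << line

    def deliverable(self) -> list[int]:
        """Lines pending and unmasked; delivering them clears their pending bits."""
        ready = self.pending & ~self.mask
        self.pending &= ~ready
        return [line for line in range(16) if ready & (1 << line)]


lines = InterruptLines()
lines.enabled &= ~(1 << 3)  # disable line 3 (e.g., a device taken out of service)
lines.mask |= 1 << 9        # mask line 9
lines.request(3)
lines.request(9)
print(lines.deliverable())  # []: line 3 never posted, line 9 held by the mask
lines.mask &= ~(1 << 9)     # unmask line 9
print(lines.deliverable())  # [9]: the held request is now delivered
```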
Some system responses to an interrupt include:
• Do nothing. The results are equivalent to masking the interrupt except that the interrupt is
cleared since it was acknowledged. For example, some applications might wish to be notified
of an overflow condition yet continue execution.
• Abort the current job (e.g. divide by 0, memory access violation, etc.).
• Restart the job or start a job with new software (e.g. N-version programming). This is a
consideration in a system with fault-tolerant software.
• Perform a service (e.g., supervisor call or trap, page fault).
• React to an event (e.g. timer interrupt, I/O interrupt, IPC interrupt).
3.2. Interrupts, System Validation, and Performance
The steps to evaluating interrupts are similar to the steps taken when evaluating any part of the
system. First, the existence of the interrupt is tested, thus validating the programmer's manual. Baseline
experiments follow which test functional correctness of the interrupt mechanisms (i.e. do interrupt
masking mechanisms work correctly, are supervisor/user effects of interrupts correct, etc.). Interrupt
evaluation encompasses both the hardware and the operating system. Interrupts are invoked in hardware,
but the interrupt handlers are in the operating system.
Interrupts do affect performance. An add instruction that overflows (thus invoking an interrupt) is
slower than the equivalent instruction that does not overflow. Likewise, page faults impact performance.
Therefore, the performance matrix of Figure 2-1 was used:
• Throughput - How long does it take to process the interrupt? This delay is a function of the
length of the interrupt handler, the system load, whether the handler is in memory (i.e. does it
need to be paged in), etc.
• Utilization - How often are interrupts invoked? Although the utilization of processor exception
interrupts (overflow, privileged mode violation, etc.) is of less interest because they are rare, IPC and
page fault interrupts occur more frequently, so their utilization matters more.
• Delay - Variation of interrupt delay between processors. Also, does the effect of interrupts
cross processor boundaries?
The following is an example of experimental steps for evaluating interrupts:
1. Test the existence of interrupts (software manual verification).
2. Test interrupt masking mechanisms. Also test which interrupts occur in user versus
supervisor mode.
3. Test how long it takes to process each intraprocessor interrupt (overflow, page fault, etc.).
Compare this to interrupt-free execution.
4. What is the overhead of processing intrasystem interrupts (timer, terminal, etc.)? How often
do these interrupts occur?
5. For executive interrupts (traps), evaluate how long it takes to service the trap. Likewise, how
long does a processor take to respond to an IPC interrupt?
6. What is the interrupt rate of page fault and IPC interrupts? For typical instruction
execution, how often do page faults occur?
7. Perform the above tests in both uniprocessor and multiprocessor configurations.
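Step 3 compares interrupt processing time against interrupt-free execution. The harness below sketches that comparison under stated assumptions: it times two callables many times with Python's `time.perf_counter_ns` and reports the difference of their median times, which is less sensitive to occasional outliers than a difference of means. On FTMP the timings would come from the calibrated real-time clock around the instruction under test; here a Python exception raised and caught stands in for a trapped overflow, purely for illustration.

```python
import statistics
import time
from typing import Callable


def median_ns(fn: Callable[[], None], trials: int = 10_000) -> float:
    """Median elapsed time of one call to fn, in nanoseconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)


def overhead_ns(with_event: Callable[[], None], baseline: Callable[[], None],
                trials: int = 10_000) -> float:
    """Estimated cost of the event: median(with_event) - median(baseline)."""
    return median_ns(with_event, trials) - median_ns(baseline, trials)


def add_without_trap() -> None:
    _ = 1 + 1


def add_with_trap() -> None:
    try:
        raise OverflowError  # stand-in for an arithmetic overflow interrupt
    except OverflowError:
        pass  # the "handler" simply returns


print(f"estimated trap overhead: {overhead_ns(add_with_trap, add_without_trap):.0f} ns")
```

The same harness applies to step 5 by substituting a trap or IPC request for the stand-in event.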
3.3. Interrupts on FTMP
The processor elements in FTMP are Collins Avionics CAPS-6 processors modified for fault tolerance.
The CAPS-6 processor has 18 (base 16), that is 24, interrupt vectors, stored in the first 24 words of PROM.
Vectors 0-7 are unavailable in the FTMP implementation of the CAPS-6 processor. According to the
documentation [Draper 83a], interrupts can occur only in user mode; in supervisor mode they are
automatically masked. The actual implementation, however, shows that interprocess communication (IPC),
interval timer, and page fault interrupts can occur in supervisor mode; otherwise, for example, the
processor would not be able to page executive code. Interrupts 8 through F (base 16) are maskable. The
CAPS-6 has a bit-mapped interrupt mask which is stored in the Process Status Descriptor (PSD) of each
task. This mask is loaded into the hardware interrupt mask when the task is started. There are no
interrupt priority levels in the CAPS-6 processor.
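The masking rules above can be sketched as a small Python model. The report does not give the layout of the PSD mask word, so the mapping used here (bit i enables vector 8 + i) is a hypothetical assumption, as is the example mask value. Because the CAPS-6 has no priority levels, the sketch only answers whether a given vector would be delivered.

```python
# Sketch of the CAPS-6 interrupt-mask rules described in Section 3.3.
# ASSUMPTION (hypothetical bit layout): bit i of the PSD mask enables vector 0x8 + i.
MASKABLE = range(0x8, 0x10)       # vectors 8-F (base 16) can be masked
NON_MASKABLE = range(0x10, 0x18)  # vectors 10-17 (base 16) cannot be masked


def deliverable(vector: int, psd_mask: int) -> bool:
    """Return True if `vector` would reach the processor under this task's mask."""
    if vector in NON_MASKABLE:
        return True
    if vector in MASKABLE:
        return ((psd_mask >> (vector - 0x8)) & 1) == 1
    # Vectors 0-7 are unavailable in the FTMP implementation of the CAPS-6.
    raise ValueError(f"vector {vector:#x} is not available on FTMP")


# The mask is copied from the task's PSD into the hardware mask when the task starts.
psd_mask = 0b0001_1000  # hypothetical: enables B (IPC) and C (interval timer) only
for v in (0xA, 0xB, 0xC, 0x11):
    print(f"{v:#x}", deliverable(v, psd_mask))
```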
Figure 3-1 summarizes FTMP's interrupts. This table also presents the results of experiments to test
the effect and existence of these interrupts.
3.4. Experimental Results
Many of the interrupts do not have an interrupt handler. These are:
• Arithmetic Overflow
• Write Protection Violation
• Illegal Opcode
• Stack Overflow
• Non-local Search Fault
• Privileged Instruction Violation
• Privileged Mode Call Fault
Instead, a generic routine called "NO.INT.HANDLER" handles all of the above interrupts.
NO.INT.HANDLER is an infinite loop that hangs the system when it is entered. An alternative
implementation would ignore the interrupt and immediately return control to the executing task. The
routine loops forever as a debugging aid: when the system enters it, the system state can be examined to
find where the error occurred. Because any of the above exceptions could therefore hang the system, all
tasks, including application tasks, run in privileged mode, where these exceptions are ignored.
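The consequence of routing unhandled vectors to NO.INT.HANDLER can be illustrated with a toy dispatcher. This is a hypothetical Python model, not FTMP code: the DO-FOREVER loop is represented by a made-up `SystemHang` exception so that the sketch terminates, and privileged-mode execution is modeled simply as the exception being ignored, as the paragraph above states.

```python
from enum import Enum


class Mode(Enum):
    USER = "USER"
    PRIV = "PRIV"


class SystemHang(Exception):
    """Stands in for the DO-FOREVER loop of NO.INT.HANDLER in this model."""


# Exceptions listed in Section 3.4 as having no dedicated handler.
NO_HANDLER = {
    "Arithmetic Overflow", "Write Protection Violation", "Illegal Opcode",
    "Stack Overflow", "Non-local Search Fault",
    "Privileged Instruction Violation", "Privileged Mode Call Fault",
}


def no_int_handler(name: str) -> None:
    # The real routine spins forever so the system state can be examined;
    # raising an exception lets this sketch terminate instead.
    raise SystemHang(f"{name}: entered NO.INT.HANDLER")


def take_exception(name: str, mode: Mode) -> str:
    """Model the outcome when the running task raises processor exception `name`."""
    if name not in NO_HANDLER:
        return "handled"
    if mode is Mode.USER:
        no_int_handler(name)  # never returns normally
    # Privileged mode: the exception is ignored and the task keeps running.
    return "ignored"


print(take_exception("Arithmetic Overflow", Mode.PRIV))
try:
    take_exception("Arithmetic Overflow", Mode.USER)
except SystemHang as hang:
    print("system hung:", hang)
```

The model makes the trade-off explicit: privileged mode avoids the hang, but the task never learns that the error occurred.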
In addition, there is no interrupt vector for the divide exception, so a divide by zero in either user or
privileged mode will stall the system. Admittedly, these hazards are characteristic of the present,
experimental system; the original design called for a USER/PRIVILEGED mode implementation and
interrupt handlers.
Running tasks in privileged mode, while preventing system failure from an unimplemented interrupt,
does compromise software reliability. In particular, write protection is ignored in privileged mode, so a
software error can be disastrous (e.g., an R4 task writing into an R3 task's stack area). Likewise, an
overflow or illegal instruction signals a software error and the need to stop the task (for task restart or
n-version programming). These signals are missed in privileged mode execution.
Even if interrupts were implemented as the original design called for, one may be reluctant to execute
tasks in USER mode because of its limited power. In particular:
1. A user task cannot use system bus service routines, that is, the user cannot access system
memory. User tasks attempting to access system memory stall the system (the original design
calls for a write protection violation interrupt). Hence, all variables must be in local memory.
Since a task might run on any processor triad from one task execution to another, local
memory variables are not guaranteed to retain values between task iterations.
2. A user task can save values through use of a task data block. Variables in a task data block
are copied from system memory into local memory by the dispatcher before the task starts,
and moved back to system memory when the task ends. Thus, these variables retain their
value between task iterations. However, changes to data block variables are not reflected in
system memory until the task finishes, which limits the potential for inter-task communication
to task completion boundaries.
Interrupt  Maskable  Assignment/Function                Mode/Effect
Number
---------  --------  ---------------------------------  -------------------------------
8          yes       unassigned
9          yes       unassigned
A          yes       Arithmetic Overflow [1]            USER/Stalls system
                                                        PRIV/No effect
B          yes       IPC interrupt
C          yes       Interval timer
D          yes       Write Protection Violation [1]     USER/Stalls system
                                                        PRIV/Write protection ignored
E          yes       Page Fault [4]
F          yes       Test Adapter [4]
10         no        Halt Instruction Execution [1]
11         no        Illegal Opcode [1]                 USER/Stalls system
                                                        PRIV/Ignored
12         no        Stack Overflow [1]                 USER/Stalls system
                                                        PRIV/Ignored
13         no        Non-local Search Fault [1,2,4]
14         no        Privileged Instruction Fault [1]   USER/Stalls system
15         no        Pmcall Fault [1,3]                 USER/No supervisor routines
                                                        to support pmcalls
16         no        unassigned
17         no        unassigned
--         --        Divide Exception [5]               USER or PRIV/Stalls system

[1] No interrupt handler written. If this interrupt occurs, a routine called "NO.INT.HANDLER" is
    entered, which executes a DO-FOREVER loop.
[2] A Non-local Search Fault occurs when a routine attempts to access a variable in its caller's local
    environment that does not exist. None of FTMP's software demands non-local searches; instead, the
    software uses static local variables to communicate with nested procedures.
[3] Pmcall (privileged mode call) is an instruction that a user process can use to call supervisor
    routines. There are no supervisor routines to support this mechanism in the current version of FTMP.
[4] Not tested.
[5] There is no interrupt vector for the Divide Exception.
Figure 3-1: Summary of FTMP's Interrupts
3. Synchronization between user tasks is very limited (if not impossible), since user tasks cannot
access system bus routines. The original design of FTMP does provide constraint bits in the
task tables for task ordering (i.e., do not start a task until specified tasks are finished), but
these bits are not implemented in the current version of FTMP.
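The copy-in/copy-out behavior of task data blocks described in item 2 can be sketched in Python. Everything here is a hypothetical model: system memory is a dictionary keyed by task name, and `dispatch` and `counter_task` are illustrative names, not FTMP routines. The point is that a value survives across iterations only because the dispatcher writes the local copy back when the task ends.

```python
from collections.abc import Callable

# Hypothetical model of FTMP's shared system memory: task name -> task data block.
SystemMemory = dict[str, dict[str, int]]


def dispatch(name: str, body: Callable[[dict[str, int]], None], system: SystemMemory) -> None:
    """Run one iteration of a user-mode task with copy-in/copy-out data block semantics."""
    # Copy the data block from system memory into processor-local memory.
    local = dict(system.get(name, {}))
    body(local)  # the task reads and writes only its local copy
    # Copy back when the task ends; only now do other tasks see the new values.
    system[name] = local


def counter_task(block: dict[str, int]) -> None:
    block["count"] = block.get("count", 0) + 1


memory: SystemMemory = {}
for _ in range(3):  # three iterations, possibly on different processor triads
    dispatch("counter", counter_task, memory)
print(memory["counter"])  # {'count': 3}
```

Any other task reading the block while `body` runs would still see the previous value, which is the restriction to task completion boundaries noted in item 2.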
The reliability/system capability trade-offs of running a task in USER or PRIVILEGED mode present a
dilemma to the FTMP programmer. However, with minor modifications to the original design, some of
the power available only in privileged mode can be made available to a user application task. For
example, making some of the system bus routines available as traps (see interrupt number 15 (hex),
pmcall fault) would give the user controlled access to system memory without compromising the software
reliability of user mode execution.
Since many interrupts are not implemented on FTMP, no performance analysis was performed. The
rest of the report instead concentrates on a tool for application level experiments: the synthetic workload.
4. Workload
4.1. Definition
The workload of a computer is defined as the set of all inputs (programs, data, commands) the system
receives from its environment. A workload can be classified as natural or synthetic. Natural workloads
accomplish useful work while a synthetic workload models a natural workload.
There are many types of natural workloads. If the computer is a timesharing system the workload
would be a user typing commands to the terminal. The workload would also include overhead of loading
user programs, inputting data, and executing user programs. For control computers the workload is of a
different flavor; the input is in the form of sensor readings that must be processed before they are
overwritten. The program task that processes the sensor data is also considered part of the control
computer workload. These tasks are executed at regular intervals.
The above two situations are examples of natural system workloads. Evaluating the performance of a
natural workload involves putting measurement code into an existing system and collecting workload
performance data over a period of time. With the second example, a control system, evaluation would
involve taking measurements on existing control software to evaluate its performance. Sensor input to
the control program could be real input from the actual environment (e.g., the computer would be flying
an airplane) or simulated sensor input. In either case, we assume the system and application software
already exists and the major effort is in setting up the system for evaluation.
A synthetic workload, like a natural workload, exercises a computer system. But unlike a natural
workload, which must at least have simulated input to "real" application programs, a synthetic workload
is essentially a "fake" set of application programs (or tasks) that model a natural workload. A
synthetic workload can test a computer without having to develop or install application software.
Characteristically, synthetic workloads are controllable by the experimenter and can be used to analyze
performance by varying parameters in the synthetic workload model.
4.2. Advantages of a Synthetic Workload
As the above discussion suggests, although a synthetic workload does not represent an application
as well as a natural workload, there are several advantages to synthetic workloads:
1. Easy to create and debug. A natural workload must be written and must also have a natural or
simulated external environment. If we are analyzing performance (perhaps for a performance
improvement study), a natural workload would already exist and thus would be preferred.
However, if we are performing a feasibility study where external input, let alone application
software, might not exist for the system, a synthetic workload is an excellent device for
measuring performance. With little effort to create and debug the synthetic workload, we
could answer feasibility questions such as "Is the computer fast enough for our target
applications?" or "Does the computer have enough capacity for the natural workload we are
modeling?"
2. Easily repeatable. In an earlier section we listed several guidelines for experiments, one of
which was experimental repeatability. With natural workloads, repeating an experiment would
involve recording all the environmental inputs over a measurement period, as well as any output
that might have an effect on the input; this is particularly difficult when output from the
system affects the input. The natural workload approach also tends to be cumbersome in terms of
storage requirements. A synthetic workload not only simplifies the environment through a model
but also simplifies the interface. The only data that needs to be recorded for repeat experiments
is the workload parameters and the measurement period. These parameters can set the system to
the exact state of the original experiment.
3. Easily controlled by parameters. The workload model is designed to make variation of
parameters easy. There is no need to recompile or reload the system as parameters are varied.
With a parametric model, sensitivity to parameter changes can be systematically explored and
bottlenecks discovered.
4. Model many natural workloads. With new computer systems we usually want to study the
feasibility of using the system for many types of applications or natural workloads. Modeling
these applications with a single synthetic workload can give a good indication of the
performance of a set of natural workloads.
5. Easily migrated to different systems. Generally the same workload model can be used on
several systems. Thus if we model the same workload on several computer systems, it is much
easier to make direct comparisons between systems. Figure 4-1 illustrates this concept. In
this figure, if workload W is a natural workload it is sometimes called a benchmark.
[Figure 4-1 diagram: the same workload W applied to System 1 through System n]
Figure 4-1: General scheme of performance comparisons among n systems [Ferrari 78]
Of course, there are disadvantages to using synthetic workloads:
1. The synthetic workload is only an approximation of a natural workload.
2. The system must be dedicated while using the synthetic workload. With natural workloads,
data can be collected while useful work is being done.
4.3. Motivations
An additional motivation for designing a synthetic workload for FTMP is to simplify the
experimentation environment (see Figure 2-5). Prior to the use of the synthetic workload, experiments
were performed by creating a program on an IBM 4381, followed by compilation, assembly, and linkage of
the task. An absolute load module was then downloaded to the support VAX and then to FTMP for
execution. The entire experimental cycle usually took up to two hours, assuming the experiment was
designed correctly. Analysis was limited to a few parameters in each experiment. To analyze data from
the experiment, the user had to provide a data collection program or modify an existing one. The
original FTMP baseline experiments were conducted in this manner. In order to master the experimental
loop, the user had to learn about the internal structure of FTMP, including the setting up of task tables,
the CTA interface program between FTMP and the VAX, and the VAX/VMS command language. Because
of the time it took to develop experiments, there was substantial motivation to simplify the experiment
loop, possibly even taking the IBM 4381, the major bottleneck, completely out of the experimental loop.
A synthetic workload relieves the user of these details and also provides a mechanism for further
simplifying experimental preparation. Synthetic workload experiments would be run by varying
parameters in the model. The parameters of the synthetic workload must correspond to meaningful
variables; otherwise analogies to real workloads would be meaningless. There is, of course, a fine line
between representativeness and ease of use.
The next section discusses a realtime workload model. This is followed by the details of the
implementation of that model on FTMP and the program support for the implementation. Finally,
several workload experiments are compared to equivalent baseline experiments to calibrate (i.e., test the
representativeness of) the synthetic workload.
4.4. A Realtime Workload Model
The goal of any model is to find a simple representation of a system that is not too far removed from
the natural system. If the model is too complex, deriving conclusions from parameter changes will be
difficult. Conversely, too simplistic a model would not adequately describe system behavior.
There are several factors that must be considered when developing a realtime workload model. First is
the task structure of realtime workloads. A task is a single thread of execution. In a realtime system,
a task is run at regular intervals, which define the iteration rate of that task. Not all tasks need to be run at
the same iteration rate (e.g., a display terminal does not need to be updated nearly as often as the
airplane flap control). Thus a realtime task model should allow for multiple iteration rates. Control
systems demand task completion within the interval defined by the task iteration rate, which is referred
to as a hard deadline. This implies that any implementation of a workload model must collect data from
several task iterations to check whether deadlines, and thus iteration rates, are adhered to. A realtime
workload model was presented in [Clune 84]. The following discussion is an overview of that workload model.
For our model, tasks are assumed to be execution entities sharing a common memory. Each task has
the form:
• read sensor data
• read interprocess communication (IPC) data
• do work (computations) on the data
• write IPC data
• write sensor data
On FTMP, a task is represented by the program in Figure 4-2. In this program the loops represent data read
in (P and Q), operated on (T), and written out (R and S), with A=B+C considered the typical
instruction. The communication mechanism between processes on FTMP is main memory, so both
sensor and IPC exchanges are done through memory reads and writes. The value of the realtime clock is
stored at the start of the task and after each loop for later timing analysis.
Task1 ();
Begin
   Read (P1, Q1, T1, R1, S1);
   Store(Time);
   For X=1 to P1 do
      Read Sensor Input (read memory);
   Store(Time);
   For X=1 to Q1 do
      Read IPC Input (read memory);
   Store(Time);
   For X=1 to T1 do
      Execute Instruction (A = B + C);
   Store(Time);
   For X=1 to R1 do
      Write Sensor Output (write memory);
   Store(Time);
   For X=1 to S1 do
      Write IPC Output (write memory);
   Store(Time);
End;
Figure 4-2: Representation of a Synthetic Workload Task
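For readers who want to experiment off-target, the task of Figure 4-2 can be transcribed into Python. This is an illustrative sketch only: the memory arrays are hypothetical stand-ins for FTMP's sensor and IPC areas, and `time.perf_counter_ns()` plays the role of reading FTMP's realtime clock. The loop counts in the usage line are arbitrary example values.

```python
import time

# Hypothetical stand-ins for FTMP's sensor and IPC areas in global memory.
WORDS = 64
sensor_in, ipc_in = [1] * WORDS, [2] * WORDS
sensor_out, ipc_out = [0] * WORDS, [0] * WORDS


def synthetic_task(p: int, q: int, t: int, r: int, s: int) -> list[int]:
    """One iteration of the Figure 4-2 task; returns its six timer values in ns."""
    stamps = [time.perf_counter_ns()]    # Store(Time) on entry
    acc = 0
    for x in range(p):                   # read sensor input
        acc += sensor_in[x % WORDS]
    stamps.append(time.perf_counter_ns())
    for x in range(q):                   # read IPC input
        acc += ipc_in[x % WORDS]
    stamps.append(time.perf_counter_ns())
    b, c = 3, 4
    for _ in range(t):                   # typical instruction A = B + C
        a = b + c
    stamps.append(time.perf_counter_ns())
    for x in range(r):                   # write sensor output
        sensor_out[x % WORDS] = acc
    stamps.append(time.perf_counter_ns())
    for x in range(s):                   # write IPC output
        ipc_out[x % WORDS] = acc
    stamps.append(time.perf_counter_ns())
    return stamps


stamps = synthetic_task(p=80, q=10, t=4000, r=20, s=10)
phases = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
print(dict(zip(["P", "Q", "T", "R", "S"], phases)))
```

Differences between consecutive timer values give the time spent in each phase, which is the kind of quantity the data analyzer later histograms.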
The above task model is sufficient to implement a synthetic workload on FTMP. However, if we want
to more closely approximate a realtime system, a higher level structure is required.
The next abstraction level above the task is the function. A workload can consist of any number of
functions, each of which is composed of one or more tasks. The parameters at the function level are:
• the number of tasks
• frequency of execution of this function. All tasks within the function will have this iteration
rate.
• percentage of total system instructions used by the function
• percentage of total sensor I/O used by the function
• percentage of total IPC I/O used by the function
Tasks are grouped into a function because of parametric similarities (i.e., they perform approximately the
same number of operations and have the same execution rate), rather than functional similarities.
Finally, we define the system level of the model which gives the structure and capability of the overall
realtime workload. Parameters at this level are:
• number of instructions (thousands of operations per second)
• total amount of sensor I/O (words per second)
• total amount of IPC (words per second)
• number of functions
• percentage of sensor I/O that is input
• percentage of IPC I/O that is input
Figure 4-3 illustrates the workload model for a realtime system.
A program, called the workload calculator, takes system and functional level parameters and calculates
iteration numbers that can be used to implement a synthetic workload. This program, developed in
[Clune 84], is discussed in Section 4.5.1.
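The conversion the workload calculator performs can be sketched as follows. This is not the [Clune 84] program: it assumes, as a simplification, that each function's share of the system totals is split evenly across its tasks and its iterations, and the class names, field names, and example parameter values are all hypothetical. Shares are given as fractions (percentage / 100).

```python
from dataclasses import dataclass


@dataclass
class SystemLevel:
    kops: float            # total instructions, thousands of operations per second
    sensor_wps: float      # total sensor I/O, words per second
    ipc_wps: float         # total IPC, words per second
    sensor_in_frac: float  # fraction of sensor I/O that is input
    ipc_in_frac: float     # fraction of IPC I/O that is input


@dataclass
class FunctionLevel:
    tasks: int             # number of tasks in the function
    rate_hz: float         # iteration rate shared by all of the function's tasks
    instr_frac: float      # fraction of total system instructions
    sensor_frac: float     # fraction of total sensor I/O
    ipc_frac: float        # fraction of total IPC I/O


def loop_counts(sys_: SystemLevel, fn: FunctionLevel) -> dict[str, int]:
    """Per-task, per-iteration loop counts P, Q, T, R, S for one function.

    ASSUMPTION: the function's share is split evenly over its tasks and iterations.
    """
    scale = 1.0 / (fn.tasks * fn.rate_hz)  # per-second totals -> per task iteration
    sensor = sys_.sensor_wps * fn.sensor_frac * scale
    ipc = sys_.ipc_wps * fn.ipc_frac * scale
    return {
        "P": round(sensor * sys_.sensor_in_frac),              # sensor words read
        "Q": round(ipc * sys_.ipc_in_frac),                    # IPC words read
        "T": round(sys_.kops * 1000 * fn.instr_frac * scale),  # A = B + C executions
        "R": round(sensor * (1 - sys_.sensor_in_frac)),        # sensor words written
        "S": round(ipc * (1 - sys_.ipc_in_frac)),              # IPC words written
    }


system = SystemLevel(kops=500, sensor_wps=10_000, ipc_wps=4_000,
                     sensor_in_frac=0.8, ipc_in_frac=0.5)
flight_ctl = FunctionLevel(tasks=2, rate_hz=25, instr_frac=0.4,
                           sensor_frac=0.5, ipc_frac=0.25)
print(loop_counts(system, flight_ctl))  # {'P': 80, 'Q': 10, 'T': 4000, 'R': 20, 'S': 10}
```

The resulting counts are exactly the P, Q, T, R, and S values that the task of Figure 4-2 reads in before it starts its loops.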
4.5. Implementation of the Synthetic Workload on FTMP
The goal of the synthetic workload implementation is for a user to be able to use the workload with
minimal knowledge of the underlying system. The user should only need to know the workload model.
In addition, the workload should have an easy-to-use interface. The discussion of the synthetic
workload implementation will first focus on the user interface. This will be followed by a discussion of
the details of the actual synthetic workload implementation on FTMP.
4.5.1. User Interfaces
To the user there are three parts to the synthetic workload: the workload calculator, the workload
generator, and the workload data analyzer. Each of these programs is invoked at different times in the
developing and running of a workload experiment. The following is a discussion of these three programs.
Workload Calculator:
The workload calculator was developed and implemented in [Clune 84]. This program
takes parameters from the system and function levels of the workload model and
converts them into the iteration numbers for FTMP workload tasks that are used by the
synthetic workload generator. The system level parameters directly correspond to those
presented in the model, including total instruction KOPs, total sensor I/O, and total
IPC rate. Functional level parameters also correspond to those presented in the model;
examples include the number of tasks per function, the function's iteration rate, and the
percentage of the total system instructions, the total sensor I/O, and the total IPC I/O
used by each function. The program outputs loop iteration values for insertion into the
synthetic workload tasks (Figure 4-2). The workload calculator can specify workloads
for any control computer that implements the same workload model.
[Figure 4-3 diagram: functions Fn1, Fn2, ..., FnN exchanging data with GLOBAL MEMORY; labels PN, QN, RN, SN]
Figure 4-3: Workload Model [Clune 84]
Workload Generator:
This program is the interface between the user and FTMP. The major motivation for
the program is to separate the details of the workload model from the details of
installing task level parameters into the FTMP synthetic workload. This program uses
iteration values supplied by the user (e.g., those supplied by the workload calculator)
and deposits them into synthetic workload tasks on FTMP by setting up a command
file. When run, this command file enters CTA, the interface between FTMP and the
VAX, and selectively writes to FTMP's memory to set up the workload. The command
file also sets up the number of tasks to run in each rate group (again defined by the
calculator) and configures FTMP for one, two or three processor triads. The workload
generator creates a second command file for collecting timer data from FTMP. The
user is again quizzed on which timer values to save and the number of iterations to
observe. These timer dumps are later analyzed by the third component of the
workload, the data analyzer.
Data Analyzer:
This program works in conjunction with the workload generator to analyze data dumps
and make histograms of differences between timer values. The user is quizzed on which
timer values to compare and put into histograms.
Figure 4-4 illustrates the relationship of the above programs. Each program is user oriented, quizzing
the user about system configuration, workload structure, and timer values desired. Presently, the user is
responsible for filling in the link between the workload calculator and the workload generator.
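The core of the data analyzer, a histogram of timer differences with a summary line, can be sketched in Python. The report does not say how its "±" values are computed; using 1.96 standard errors (a 95% confidence interval) is an assumption, but it reproduces the tick values reported later, e.g. 12.55 ± 0.042 ticks for Figure 4-7 and 16.35 ± 0.068 ticks for Figure 4-8. The 0.25 mSec tick is read off the histogram labels; the function names are hypothetical.

```python
import math
from collections import Counter

TICK_MS = 0.25  # FTMP clock tick, as read from the histogram labels (16 ticks = 4.00 mSec)


def summarize(diffs: list[int]) -> tuple[Counter, float, float]:
    """Histogram of timer differences (ticks), their mean, and a 95% half-width."""
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    half_width = 1.96 * math.sqrt(variance / n)               # 1.96 standard errors
    return Counter(diffs), mean, half_width


def print_report(diffs: list[int], width: int = 40) -> None:
    hist, mean, hw = summarize(diffs)
    peak = max(hist.values())
    for ticks in range(min(hist), max(hist) + 1):
        count = hist[ticks]  # Counter returns 0 for empty bins
        bar = "*" * math.ceil(width * count / peak)
        print(f"{ticks:3d} ticks ({ticks * TICK_MS:.2f} mSec) [{count:3d}] {bar}")
    print(f"Average: {mean:.2f} ± {hw:.3f} Ticks ({len(diffs)} data points)")
    print(f"         {mean * TICK_MS:.2f} ± {hw * TICK_MS:.3f} mSec")


# Task-switching differences with the counts of Figure 4-7.
print_report([12] * 242 + [13] * 298)
```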
The steps to running an experiment with the synthetic workload are:
1. Load FTMP with the synthetic workload (this need only be done once).
2. Use the workload calculator to describe the application workload you wish to test. Iteration
values are stored in a file called RESULT.DAT.
3. Run the workload generator, using the data from Step 2 as parameters for the workload model.
The workload generator will create two command files: one to configure the synthetic
workload on FTMP and a second to collect data from the workload.
4. Run the first command file to configure FTMP.
5. Run the second command file, storing the data in an output file. Run this command file
several times until you have the desired amount of data.
6. Run the data analyzer using an output file from Step 5 as input. The data analyzer outputs
the data in a readable form and creates histograms of that data.
7. Repeat Steps 2 through 6 for each workload experiment.
[Figure 4-4 diagram: instructions/sec., frequencies, I/O rate, etc. -> WORKLOAD CALCULATOR -> task
definitions, iterations, etc. -> WORKLOAD GENERATOR -> reconfiguration commands -> FTMP -> raw data,
timer dumps -> DATA ANALYZER -> data, histograms]
Figure 4-4: FTMP Synthetic Workload Environment
Once FTMP is initially loaded with the synthetic workload, the elapsed time from running the workload
calculator to output histograms is about 10 minutes. Appendix B contains an example of running the
synthetic workload through the above steps.
4.5.2. Implementation: FTMP Tasks and Workload Considerations
The model for a realtime workload task was presented in Figure 4-2. In this task model, the values for
the loop iterations are read in from a special area in memory set up by the workload generator before the
workload starts. Timer values are written back to memory at the end of the task.
FTMP has three task rate groups. In the initial implementation, there are three workload tasks for each
rate group. Three per group is not a hard limit, since there is room in the task tables to expand to 15
tasks per rate group (except for the R1 rate group, which has 6 special tasks, limiting this rate group to
9 workload tasks). The major limit on the number of workload tasks in FTMP is memory storage for
timer values. The number of tasks that actually run in each rate group is set up by the workload
generator.
Data collection is done in cycles. A collection cycle starts when the data collection command file
(created by the workload generator) enables tasks to execute. For a period of time, workload tasks write
timer values to memory. These values are then retrieved from FTMP's memory by the command file for
later analysis. Once this is done, tasks are enabled again to start another data collection cycle. The
saved data is essentially a snapshot of the computer over a defined execution period.
To encompass all workload tasks, a collection cycle must include at least one full execution frame of the
lowest frequency rate tasks (R1). Thus, a collection cycle begins at an R1 frame boundary, called a major
frame. A major frame encompasses four R3 frames and eight R4 frames. An additional R4 collection
frame was added, making nine R4 collection frames, to record boundary cases such as missed
deadlines. To monitor when to start collection cycles, an additional R4 task is present. This task
monitors when a major frame is ready to begin and sets all the workload tasks to start collecting data. It
then removes itself from the R4 task list so as not to interfere with workload tasks while the workload is
executing. A cycle is begun by externally linking in the special R4 task. All of these details of data
collection are transparent to the user, since they are set up by a data collection command file created by
the workload generator.
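One plausible reading of why a ninth R4 frame is recorded can be shown with a hard-deadline check: an R4 iteration that overruns the end of the major frame is only visible if collection continues past that boundary. In this hypothetical Python sketch, each iteration's deadline is the end of its own frame, measured from tick 0 at the major-frame boundary; the frame length and the timer values are invented for illustration and are not FTMP data.

```python
def missed_deadlines(iterations: list[tuple[int, int]], frame_ticks: int) -> list[int]:
    """Indices of task iterations that ended after the end of their own frame.

    iterations: (start_tick, end_tick) per frame, with tick 0 at the major-frame
    boundary where the collection cycle begins.
    frame_ticks: length of one frame of the task's rate group.
    """
    late = []
    for k, (_start, end) in enumerate(iterations):
        if end > (k + 1) * frame_ticks:  # frame k ends at (k + 1) * frame_ticks
            late.append(k)
    return late


# Nine hypothetical R4 iterations; the eighth (index 7) overruns the major frame.
FRAME = 40  # hypothetical R4 frame length in ticks
r4 = [(2, 20), (41, 60), (82, 99), (121, 139), (161, 178),
      (201, 219), (241, 259), (282, 325), (325, 350)]
print(missed_deadlines(r4, FRAME))  # [7]
```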
The workload has to take into consideration several special tasks running on FTMP. These tasks are:
1. An R3 task (R31) called "TIME", which updates TIME.NOW, the current time, in memory by
checking RT.CLOCK (the realtime clock) and BASE.TIME. This is considered essential to the
computer performance and is always linked in.
2. The R1 "DISPLAY" task, which updates FTMP's display terminal on the status of the system.
This is considered non-essential and can be taken out if the user so chooses (e.g., if a workload
task already models a system display).
3. Two R1 tasks, "READALL" and "SCC", which are the fault-tolerant tasks of FTMP. These
two tasks can be considered essential in a fault-tolerant computer such as FTMP for fault
recovery and reconfiguration. However, during fault-free execution they only perform
self-tests. Therefore, the user has the option to take either of these tasks out of the task structure,
which is useful should the user want to investigate the overhead of fault-tolerant tasks.
The workload generator asks the user which special tasks to include in the workload and links them in
accordingly.
Each task has an associated Task Control Block (TCB), which contains information on that task. Task
Control Blocks are kept in a common linked-list data structure in global memory. Processor triads select
tasks from this structure when they need a new task to execute. Figure 2-3, presented earlier, illustrates the
TCB data structure and the position of workload and other tasks in that structure. The final three R1
tasks, IDLE1, IDLE2 and IDLE3, are special workload tasks that record idle time in a major frame on each
of the processor triads. After a processor has completed an R1 task, it will select an idle task and hold
that task until the other processors have finished their R1 tasks and selected an idle task.
Finally, the FTMP R1 task dispatcher can assign R1 tasks to a specific processor if possible. A special
field in the TCB of the task determines which processor (1, 2, or 3) to run the task on, with 0 specifying
any processor. "SCC" modifies this field so that it can progressively run a battery of self-tests on different
processors. Execution of SCC affects TCB ordering, since the dispatcher postpones execution of this
task until the requested processor becomes available by moving the task down the task list.
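The processor field and the linked task list can be sketched as follows. This is a hypothetical, much-simplified model: the `TCB` class holds only the fields used here, and `link` and `select_task` are illustrative names. Pinned tasks are simply skipped, which approximates the postponement described above; the physical relinking of the task further down the list is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TCB:
    """Hypothetical, simplified Task Control Block: only the fields used here."""
    name: str
    processor: int = 0          # 0 = any triad; 1, 2 or 3 = that triad only
    next: TCB | None = None     # link to the next TCB in the task list


def link(*tcbs: TCB) -> TCB:
    """Chain TCBs into a singly linked task list and return its head."""
    for a, b in zip(tcbs, tcbs[1:]):
        a.next = b
    return tcbs[0]


def select_task(head: TCB | None, triad: int) -> TCB | None:
    """Return the first task in list order that triad `triad` may run."""
    node = head
    while node is not None:
        if node.processor in (0, triad):  # unpinned, or pinned to this triad
            return node
        node = node.next                  # pinned elsewhere: postpone it
    return None


tasks = link(TCB("SCC", processor=2), TCB("DISPLAY"), TCB("IDLE1"))
print(select_task(tasks, 1).name, select_task(tasks, 2).name)  # DISPLAY SCC
```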
4.5.3. Calibration
The final step in the synthetic workload implementation is calibration. Calibration determines the
correctness of the workload model. The best calibration experiments are, of course, direct comparisons to
natural workloads. However, comparisons to dedicated FTMP experiments are acceptable, since the goal of
calibration is to show that the workload can produce similar results.
The calibration experiments chosen for FTMP's synthetic workload are baseline experiments previously
conducted without the workload generator in [Clune 84]. These experiments provide an opportunity for
comparison. The experiments are:
1. A task switching time experiment. This finds the overhead associated with starting a new
task once a task finishes. This time is found by comparing timer values recorded at the end of
the first task and the beginning of the second task, respectively. Figure 4-5 illustrates task
switching overhead.
2. A task startup experiment. This experiment measures the overhead of starting a task on a
processor. This time is found by comparing timer values taken at the beginning of tasks
running on separate processors. Figure 4-6 illustrates task startup overhead.
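The two measurements above reduce to simple differences of timer values. The Python sketch below states them precisely under hypothetical assumptions: timer values are in clock ticks, the switching input lists consecutive tasks run by one triad, the startup input pairs start times recorded on two triads in the same frames, and all numbers in the usage lines are made up.

```python
def switching_overheads(runs: list[tuple[int, int]]) -> list[int]:
    """Task switching overhead on one triad, in clock ticks.

    runs: (start_tick, end_tick) of consecutive tasks executed by the same triad.
    Each overhead is the next task's start minus the previous task's end.
    """
    return [nxt_start - prev_end
            for (_, prev_end), (nxt_start, _) in zip(runs, runs[1:])]


def startup_offsets(starts_a: list[int], starts_b: list[int]) -> list[int]:
    """Task startup overhead: start tick on one triad minus start on another, per frame."""
    return [b - a for a, b in zip(starts_a, starts_b)]


print(switching_overheads([(0, 40), (53, 90), (102, 130)]))  # [13, 12]
print(startup_offsets([0, 100], [5, 109]))                   # [5, 9]
```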
Figures 4-7 through 4-10 are the results of four experiments: task switching time, dedicated experiment;
task switching time, workload experiment; task startup overhead, dedicated experiment; and task startup
overhead, workload experiment.
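[Figure 4-5 diagram: on processor P1, Task 1 is followed by Task 2; the gap between them is the switching overhead]
Figure 4-5: Task Switching Overhead
[Figure 4-6 diagram: a task starts on processor P1 and Task 2 starts on processor P2; the offset between the two starts is the task startup overhead]
Figure 4-6: Task Startup Overhead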
Initial comparison is encouraging; both baseline and workload experiments have similar shapes. Both
task startup experiments reveal similar dual-peak curves with fringe data points. In the baseline
experiment, these lone data points revealed that the dispatcher was occasionally late starting a task. The
synthetic workload exhibits the same behavior.
Closer inspection of the data reveals that the workload curves of task switching overhead and task
startup time are displaced 4 and 1.88 clock ticks (1 and .47 mSec), respectively, from their baseline
experiment counterparts. Thus, overhead exists in the workload that is not present in the baseline
experiments. The source of this overhead is obvious upon inspection of the AED source code of the
baseline experiment task (Figure 4-11) and a workload task (Figure 4-12). The baseline experiment was
designed to measure beginning and end task times. Thus, time is read immediately upon entering and
just before exiting the task. In contrast, the workload contains both task entry overhead (statements
S1-S4) and task end overhead to save results (statements E1-E4). Because the synthetic workload is an
application level tool, this overhead is put outside the inner loops. The workload can still be used for
timing intertask events if this overhead is taken into account.

clock     time          data-
ticks                   points
--------  ------------  ------
12 ticks  (3.00 mSec)   [242] ********************************
13 ticks  (3.25 mSec)   [298] ****************************************
Average: 12.55 ± 0.042 Ticks (540 data points)
          3.13 ± 0.011 mSec
Figure 4-7: Baseline Experiment: Task Switching Overhead

clock     time          data-
ticks                   points
--------  ------------  ------
16 ticks  (4.00 mSec)   [122] ****************************************
17 ticks  (4.25 mSec)   [ 67] **********************
Average: 16.35 ± 0.068 Ticks (189 data points)
          4.09 ± 0.017 mSec
Figure 4-8: Workload Experiment: Task Switching Overhead
By summing the execution times of statements S1 through S4 in the workload, we can find the workload
task initialization overhead. Execution times of the RD primitive are from a separate experiment
(Appendix A). Execution times of arithmetic operations are taken from [Clune 84]. Execution time of the
"IF" statement is neglected, since global memory RD time is substantially larger.
Statement #   Instruction            Execution Time (mSec)
-----------   --------------------   ----------------------------------
S1            RD [1 word]            0.138
S2            IF (EXEC4 GEQ 0) ...   0.0 (for simplifying calculations)
S3                                   0.0
S4            RD [5 words]           0.150
Total                                0.288 mSec (Ave.)
Similarly, the workload end overhead is:

Statement #   Instruction            Execution Time (mSec)
-----------   --------------------   ---------------------
E1-E4         WRT [12 words]         0.190
              EXEC4*6                0.063
              WRT [1 word]           0.164
              3*EXEC4                0.063
              EXEC4=EXEC4+1          0.058
              WRT [1 word]           0.164
Total                                0.702 mSec (Ave.)

clock        time          data-
ticks                      points
-----------  ------------  ------
 4 ticks     (1.00 mSec)   [ 24] ***
 5 ticks     (1.25 mSec)   [298] ****************************************
 6 ticks     (1.50 mSec)   [ 48] ******
 7 ticks     (1.75 mSec)   [  2] *
 8 ticks     (2.00 mSec)   [ 29] ****
 9 ticks     (2.25 mSec)   [328] ********************************************
10 ticks     (2.50 mSec)   [  9] *
11 ticks     (2.75 mSec)   [  0]
12 ticks     (3.00 mSec)   [  0]
13 ticks     (3.25 mSec)   [  1] *
14 ticks     (3.50 mSec)   [  0]
15 ticks     (3.75 mSec)   [  0]
16 ticks     (4.00 mSec)   [  0]
17 ticks     (4.25 mSec)   [  0]
18 ticks     (4.50 mSec)   [  1] *
19 ticks     (4.75 mSec)   [  0]
20-30 ticks                [  0]
31 ticks     (7.75 mSec)   [  0]
32 ticks     (8.00 mSec)   [  3] *
33 ticks     (8.25 mSec)   [  0]
34 ticks     (8.50 mSec)   [  1] *
Average: 7.16 ± 0.198 Ticks (744 data points)
         1.79 ± 0.049 mSec
Figure 4-9: Baseline Experiment: Task Startup Time
In the synthetic workload, the calculation of task switching time must account for the task end overhead
of the first task and the task initialization overhead of the second task. Finally, 0.164 mSec is added,
since the baseline experiment must write a timer value to memory (E1) at the end of the task. Taking
these into account, we get
4.09 mSec - 0.288 mSec - 0.702 mSec + 0.164 mSec = 3.26 mSec (Ave.)
a value within 5 percent of the baseline experiment's value.
Similarly, overhead should be deducted from the task startup time experiment. Since this experiment
compares the first timer values of two workload tasks, the task initialization overhead of both tasks should
be deducted. The actual startup time becomes:
2.26 mSec - 2*0.288 mSec = 1.68 mSec (Ave.)
a value within 10 percent of the baseline experiment's value.
clock                 data-
ticks    time         points
-----------------------------
 5 ticks (1.25 mSec)  [ 18] ******
 6 ticks (1.50 mSec)  [ 95] *******************************
 7 ticks (1.75 mSec)  [ 21] *******
 8 ticks (2.00 mSec)  [  0]
 9 ticks (2.25 mSec)  [  2] *
10 ticks (2.50 mSec)  [ 51] *****************
11 ticks (2.75 mSec)  [108] ************************************
12 ticks (3.00 mSec)  [  0]
13 ticks (3.25 mSec)  [  1] *
14 ticks (3.50 mSec)  [  1] *
15 ticks (3.75 mSec)  [  0]
16 ticks (4.00 mSec)  [  0]
17 ticks (4.25 mSec)  [  0]
18 ticks (4.50 mSec)  [  0]
19 ticks (4.75 mSec)  [  0]
20 ticks (5.00 mSec)  [  0]
21 ticks (5.25 mSec)  [  1] *
22 ticks (5.50 mSec)  [  3] *
23 ticks (5.75 mSec)  [  0]
24 ticks (6.00 mSec)  [  1] *
25 ticks (6.25 mSec)  [  2] *
26 ticks (6.50 mSec)  [  2] *
Average: 9.03 ± 0.391 Ticks (306 data points)
         2.26 ± 0.098 mSec
Figure 4-10: Workload Experiment: Task Startup Time
The following table summarizes the above results:
                       Baseline            Workload            Workload Experiment
                       Experiment Times    Experiment Times    minus workload overhead
Task switching time    3.13 mSec (Ave.)    4.09 mSec           3.26 mSec
Task startup time      1.79 mSec           2.26 mSec           1.68 mSec
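The overhead corrections behind the summary table's last column reduce to a few sums. Below is a minimal Python sketch of that arithmetic, using the per-statement times from the S1-S4 and E1-E4 tables. The function and constant names are hypothetical.

```python
# Per-statement overheads in mSec, taken from the S1-S4 and E1-E4 tables.
ENTRY_OVERHEAD = [0.138, 0.0, 0.0, 0.150]               # S1..S4
END_OVERHEAD = [0.190, 0.063, 0.164, 0.063, 0.058, 0.164]  # E1..E4 rows
BASELINE_TIMER_WRITE = 0.164   # the baseline task's own final timer write


def corrected_switch(measured):
    """Task switching: drop the first task's end and the second task's entry overhead."""
    return measured - sum(END_OVERHEAD) - sum(ENTRY_OVERHEAD) + BASELINE_TIMER_WRITE


def corrected_startup(measured):
    """Task startup: both tasks take their first timer reading after S1-S4."""
    return measured - 2 * sum(ENTRY_OVERHEAD)


def percent_off(value, baseline):
    return 100.0 * abs(value - baseline) / baseline


for name, measured, baseline, fix in [
        ("Task switching", 4.09, 3.13, corrected_switch),
        ("Task startup", 2.26, 1.79, corrected_startup)]:
    value = fix(measured)
    print(f"{name}: {value:.2f} mSec, {percent_off(value, baseline):.1f}% from baseline")
```

The results are 3.26 mSec (about 4.3% from the baseline) and 1.68 mSec (about 5.9% from the baseline). These fall inside the 5 and 10 percent bounds quoted in the text.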
CMU.TEST1 BEGIN
DEFINE PROCEDURE TIMETEST1 TOBE
BEGIN
LONG HOLD, HOLD1;
INTEGER EXEC, RTCNUM, I;
INTEGER A;
HREAD(RT.CLOCK,HOLD,2);
RD(CMU.EXEC,EXEC,1);
IF EXEC LEQ 14
THEN BEGIN
   RD(CMU.RTCNUM,RTCNUM,1);
   FOR I=1 STEP 1 UNTIL RTCNUM
   DO BEGIN
      A=1;
   END;
   A = EXEC * 8;
   WRT(CMU.TIME(A),HOLD,2);
   HREAD(RT.CLOCK,HOLD1,2);
   WRT(CMU.TIME(A+1),HOLD1,2);                        E1
END;
RESUME(0);
END;
END FINI;
Figure 4-11: Baseline Experiment Task (AED)
Although these experiments are not application level calibration experiments, they do show that the
synthetic workload is a valid tool for making baseline experiments, as long as workload overhead is
considered in any intertask measurements. If measurements are intratask, the overhead is much smaller,
since the clock read time (HREAD) is the only overhead. In conclusion, the workload is a useful tool for
performing experiments on FTMP.
CMU.TEST BEGIN
DEFINE PROCEDURE WRKLOADR41 TOBE
BEGIN
INTEGER X, Y, Z;                     ... NON-STACK LOCALS //
OWN INTEGER A;
OWN INTEGER LOCAL, EXEC4;
OWN LONG ARRAY HOLD(OUT.VALUES);     ... HOLDS TIMER VALUES //
OWN INTEGER ARRAY R41.INPUT(5);      ... INPUT PARAMETERS //
INTEGER P; P $=$ R41.INPUT(0);
INTEGER Q; Q $=$ R41.INPUT(1);
INTEGER T; T $=$ R41.INPUT(2);
INTEGER R; R $=$ R41.INPUT(3);
INTEGER S; S $=$ R41.INPUT(4);
RD(CMU.EXEC(0),EXEC4,1);                              S1
IF (EXEC4 GEQ 0) AND (EXEC4 LES 9) THEN               S2
BEGIN                                                 S3
   RD(R4.INPUT(0),R41.INPUT,5);                       S4
   HREAD(RT.CLOCK,HOLD(0),2);
   FOR A=1 STEP 1 UNTIL P DO
      RD(CMU.GLOBAL,LOCAL,1);
   HREAD(RT.CLOCK,HOLD(1),2);
   FOR A=1 STEP 1 UNTIL Q DO
      RD(CMU.GLOBAL,LOCAL,1);
   HREAD(RT.CLOCK,HOLD(2),2);
   FOR A=1 STEP 1 UNTIL T DO
      X=Y+Z;
   HREAD(RT.CLOCK,HOLD(3),2);
   FOR A=1 STEP 1 UNTIL R DO
      WRT(CMU.GLOBAL,LOCAL,1);
   HREAD(RT.CLOCK,HOLD(4),2);
   FOR A=1 STEP 1 UNTIL S DO
      WRT(CMU.GLOBAL,LOCAL,1);
   HREAD(RT.CLOCK,HOLD(5),2);
   WRT(R41.OUTPUT(EXEC4*6),HOLD,12);                  E1
   WRT(R4.ID(3*EXEC4),TRIAD.ID,1);                    E2
   EXEC4 = EXEC4 + 1;                                 E3
   WRT(CMU.EXEC(0),EXEC4,1);                          E4
END;                                 ... IF (EXEC4 GEQ 0) AND ... //
RESUME(0);
END;
END FINI;
Figure 4-12: Synthetic Workload Task (AED)
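The control flow of the synthetic workload task in Figure 4-12 can be mirrored in a minimal Python sketch. It is an illustration under stated assumptions, not FTMP code:

- `SharedMemory` is a hypothetical stand-in for global memory.
- `time.perf_counter_ns` stands in for the real-time clock read (HREAD).
- The parameter read (S4) is replaced by passing `WorkloadParams` in directly.
- The triad ID write (E2) is omitted.
- The limit of nine runs matches the analyzer's normal collection value of 9 for R4 tasks (Appendix B).

```python
import time
from dataclasses import dataclass, field


@dataclass
class WorkloadParams:
    p: int  # first loop of 1-word global-memory reads
    q: int  # second loop of 1-word global-memory reads
    t: int  # loop of X=Y+Z arithmetic operations
    r: int  # first loop of 1-word global-memory writes
    s: int  # second loop of 1-word global-memory writes


@dataclass
class SharedMemory:
    """Hypothetical stand-in for FTMP global memory."""
    global_word: int = 0
    exec_count: int = 0
    output: list = field(default_factory=list)


def workload_task(mem, params, max_runs=9, clock=time.perf_counter_ns):
    """One dispatch of a workload task; returns False once its runs are used up."""
    exec4 = mem.exec_count                       # S1: read the execution counter
    if not (0 <= exec4 < max_runs):              # S2: run only while in range
        return False
    hold = [0] * 6                               # six timer values per run
    local, x, y, z = 0, 0, 1, 2
    hold[0] = clock()
    for _ in range(params.p):
        local = mem.global_word
    hold[1] = clock()
    for _ in range(params.q):
        local = mem.global_word
    hold[2] = clock()
    for _ in range(params.t):
        x = y + z
    hold[3] = clock()
    for _ in range(params.r):
        mem.global_word = local
    hold[4] = clock()
    for _ in range(params.s):
        mem.global_word = local
    hold[5] = clock()
    mem.output.append(hold)                      # E1: save this run's timer values
    mem.exec_count = exec4 + 1                   # E3/E4: bump and store the counter
    return True


mem = SharedMemory()
runs = 0
while workload_task(mem, WorkloadParams(p=2, q=2, t=3, r=1, s=1)):
    runs += 1
print(runs, len(mem.output))
```

Keeping the timer reads between the loops, and all bookkeeping outside them, is what confines the overhead to the entry and end statements discussed above.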
5. Future Work
On FTMP, a few remaining baseline experiments should be performed. These include:
• Measure the time to transfer varying blocks of data from global to local memory, varying
parameters much more than was done in the brief RD/WRT experiments described in
Appendix A.
• Measure instruction execution time in pairs to see if the result is equivalent to the sum of the
execution times when the instructions were measured singly.
• Investigate overhead and variation in application software due to the fault-tolerant
mechanisms of FTMP.
• Find the nominal length of R3 and R1 tasks on FTMP.
• Find context swap time. This time is defined as the amount of time it takes to start up an R3
task once the dispatcher finishes with R4 tasks.
The latter three experiments can probably be performed with the synthetic workload.
The potential of the synthetic workload has only been superficially demonstrated. The workload should
be used for performance tests and comparisons, along with application level baseline experiments. Only
through use will its power be demonstrated.
Also, the present synthetic workload is a minimal implementation that was used to investigate
feasibility. Presently, there are only three tasks per rate group. The R4 and R3 rate groups each have
room for ten more tasks in their task structure, while the R1 rate group has room for seven more tasks.
The only limiting factor is the amount of global memory available on FTMP to hold timer dumps. More
compact timer dumps could possibly resolve this problem. Any enhancements will require changing the
workload generator and data analyzer.
Although much work has been done defining the experimental methodology and using it to validate
FTMP, there is still work to be done. First, the methodology should be verified through application to
another system. In particular, the validation steps should be applied to the Software Implemented
Fault-Tolerant (SIFT) computer at AIRLAB. This computer has constraints similar to FTMP's and
would be an excellent candidate for the validation procedure.
Finally, in the future it will be desirable to contrast performance versus reliability of fault-tolerant
computers. One idea is to integrate the synthetic workload, a performance measurement tool, with the
fault-injection experiments.
6. Conclusion
This project outlined and refined an experimental methodology for validating the multiprocessor
avionics computer, FTMP. The methodology emphasizes a building block approach in which tests are
performed starting at the instruction level, progressing through the operating system level and finally up
to application level validation. At each level baseline experiments, which test a single phenomenon, were
performed. These were followed by more sophisticated experiments which test interactions between
several baseline phenomena. Finally, the concept of a generalized application level experiment tool,
called the synthetic workload, was developed.
Previous research had developed an outline of the methodology and tested it through the application
level. This research refined that methodology with additional baseline tests. In addition, the synthetic
workload was implemented as an application level tool. The synthetic workload was then calibrated with
a baseline experiment to demonstrate the workload's representativeness.
Although the technique was developed specifically for FTMP, the origin of the technique dates back to
earlier work on multiprocessors at C-MU. Thus, the methods used here should be applicable to other
computer systems. Tests on another system will supply information on the robustness of the technique
along with supplying meaningful comparisons between systems.
By no means is the methodology complete. Using the synthetic workload for experiments will
undoubtedly reveal deficiencies in the original methodology. But the existence of this tool will greatly
improve productivity, allowing researchers to run more experiments and further refine the methodology.
In general, the methodology has proven to be a sound approach to validating computer systems.
Appendix A. Test of Select RD/WRT Primitives
On FTMP, most program tasks access the shared system memory with the following bus service
routines:
• RD(sys.adr,cache.adr,num). This routine transfers num words from system
memory address sys.adr to cache address cache.adr.
• WRT(sys.adr,cache.adr,num). This procedure is the same as RD except, of course, the
direction of transfer is reversed.
We wish to find the time these procedures use to access system memory with varying transfer sizes. In
particular, we are interested in the sizes that are used in the workload. The following instructions were
tested:
1. RD(sys,cache,1)
2. WRT(sys,cache,1)
3. RD(sys,cache,5)
4. WRT(sys,cache,12)
Instructions 1 and 2 were each executed in a loop 100 times along with the instruction 'A=1;'. The
other two instructions were executed in a similar loop 50 times.² To find loop overhead, a loop just
containing an 'A=1;' instruction was executed both 50 and 100 times. This is the 'NULL' loop.³ Times
to execute instructions can be found by subtracting loop overhead from the instruction loop, leaving only
instruction execution time.
The results of the measurements were as follows:
                      clock ticks        µSec per instruction (Ave.)     Number
                      per loop (Ave.)    ----------------------------    of data
Instruction           /loop count        w/ overhead     w/o overhead    points
1) NULL               15.7/100           39.3 ± 0.019    0.0             340
2) RD(x,y,num=1)      70.8/100           177.0 ± 0.025   137.7 ± 0.044   220
3) WRT(x,y,num=1)     69.1/100           172.8 ± 0.023   133.6 ± 0.042   260
4) NULL               8.3/50             41.6 ± 0.027    0.0             600
5) RD(x,y,num=5)      38.2/50            191.0 ± 0.025   149.6 ± 0.062   600
6) WRT(x,y,num=12)    46.0/50            230.0 ± 0.018   188.6 ± 0.046   300
The first column is the raw data in clock ticks (1 clock tick = 0.25 mSec). The next column is the time
to execute a single instruction including loop overhead. The third column adjusts the time from the
second by subtracting overhead.
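The conversion from the raw tick counts to per-instruction times is short enough to show directly. Below is a minimal Python sketch, assuming only the 0.25 mSec tick stated above. The function names are hypothetical.

```python
TICK_USEC = 250.0   # 1 clock tick = 0.25 mSec = 250 µSec


def usec_per_instruction(ticks_per_loop, loop_count):
    """Average time per loop iteration in µSec, loop overhead included."""
    return ticks_per_loop / loop_count * TICK_USEC


def net_usec(instr_ticks, null_ticks, loop_count):
    """Subtract the NULL ('A=1;') loop measured with the same loop count."""
    return (usec_per_instruction(instr_ticks, loop_count)
            - usec_per_instruction(null_ticks, loop_count))


print(f"RD, 1 word:    {net_usec(70.8, 15.7, 100):.1f} uSec")
print(f"WRT, 12 words: {net_usec(46.0, 8.3, 50):.1f} uSec")
```

The results land within about 0.1 µSec of the tabulated 137.7 and 188.6 µSec. Small last-digit differences are expected, because the tick averages in the table are themselves rounded.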
² The loop count was reduced to 50 for these calls, since many large block transfers could take more time than a
process in its rate group is allowed.
³ A loop must contain at least one instruction; otherwise the compiler will not accept it. This is why 'A=1;' is used as
a substitute for a 'NULL' loop.
Appendix B. Example of Workload Use
This appendix contains an example of the running of the workload generator and data analyzer. An
example of the running of the workload calculator is not presented since that program is discussed in
[Clune 84]. This example starts with the very first step of the user providing information to the
workload generator, followed by the loading of FTMP with the synthetic workload. Then, using the two
command files produced by the generator, the FTMP synthetic workload is configured and data collection
is run. Output from the data collection is redirected into a file which is used as input to the workload
data analyzer.
The workload generator basically queries the user on how he/she wants the synthetic workload
configured. Input parameters to tasks correspond directly to workload parameters in Figure 4-2. The
workload generator will also ask if the user wants the special Rl tasks (SCC, READALL, and DISPLAY)
included in the workload. Finally, this program will inquire about data collection including what values
and how many iterations the user wants from the workload collection.
The workload data analyzer is more complicated. This program reads in the timer values produced by the
collection file generated by the generator and quizzes the user on which timer values to compare. The
initial part of the analyzer is file management. The program skips comments and tables in the data file
to find the start of the workload data. It then quizzes the user on where he/she wants output sent.
Should there be a break due to garbage data, a new collection set, or incomplete data (i.e. CTA stalled in
the middle of a collection and had to be restarted), this program will skip to the next major frame of data
and return to the file management prompt.
Next, the analyzer gets from the user the timer values to compare. The format for specifying timer values
is:
<task name> <timer no>
where <task name> ::= READALL, SCC, IDLE[123], R[431][123]
      <timer no>  ::= 0-5 for Rxx tasks,
                      6+ for a timer value in another collection frame,
                      0-1 for READALL, SCC, and IDLE tasks.
Figure B-1 illustrates the workload tasks and timer numbers. For Rxx tasks, the user can specify a
number greater than 5 to refer to a timer value in another collection frame, e.g. 6 corresponds to the 0th
timer value in the task iteration immediately after the current iteration. Thus, to find the time between
runs of task R41, we would compare R41 5, the last timer value in task R41, to R41 6, the first timer
value in the next R41 iteration. This is feasible since the timer values for all iterations of a task in a
major frame are stored in a contiguous array. The analyzer will try to collect as many data points as
possible in a major frame.
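The rule that timer numbers above 5 roll into the next iteration amounts to index arithmetic on that contiguous array. Below is a minimal Python sketch, assuming one flat list of timer readings per task and major frame. The function names are hypothetical and do not come from the analyzer's C source.

```python
TIMERS_PER_ITERATION = 6   # Rxx tasks record timer values 0-5


def parse_timer_spec(spec):
    """Split a '<task name> <timer no>' answer such as 'R41 6'."""
    task, timer = spec.split()
    return task.upper(), int(timer)


def flat_index(iteration, timer_no, per_iteration=TIMERS_PER_ITERATION):
    """Index into one task's contiguous timer array for a major frame.

    Timer numbers above 5 roll over into later iterations, so
    (iteration, 6) is the same slot as (iteration + 1, 0).
    """
    return iteration * per_iteration + timer_no


def time_between_runs(timers, iteration):
    """R41 6 minus R41 5: end of one run to the start of the next."""
    return timers[flat_index(iteration, 6)] - timers[flat_index(iteration, 5)]


timers = [0, 1, 2, 3, 4, 5, 20, 21, 22, 23, 24, 25]   # two iterations, in ticks
print(parse_timer_spec("R41 6"), time_between_runs(timers, 0))
```

A flat layout like this also shows why the analyzer must limit how many collections it takes per frame. A timer number that rolls past the last stored iteration would index beyond the array.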
Figure B-1: Illustration of Workload Tasks (workload tasks R11-R13, R31-R33, and R41-R43, with timer values numbered 0-5 for the R4 and R3 tasks; annotations mark the R41 task length, the time between R41 task runs, and the R31 frequency)
It is recommended that the reader look at the steps for running the workload presented in Section 4.5.1
while reading through this example. Figure B-2 illustrates the running of the workload. '.COM' files
contain CTA commands for loading FTMP with the synthetic workload (2TRIAD.COM), configuring the
workload (CONFIG.COM), and collecting data from the workload (COLLECT.COM). WRKLD.CAP is
the absolute load module of the synthetic workload. WRKLD.LOG is an output log of workload data
produced through the collection command file (COLLECT.COM). WRKINFO.TXT is an internal file
that communicates workload information from the workload generator to the data analyzer.
Throughout this appendix the user response will be in bold font while italicized phrases are guiding
comments. Space constraints require that the example be minimal. Therefore, data collection is for
eight major frames of data. This is much less than would be included in a normal experiment.
Figure B-2: Running the FTMP Workload (diagram connecting the Workload Generator (WRKLD.EXE), WORKINFO.TXT, the Data Analyzer (ANAL.EXE), DATA, and FTMP with CONFIG.COM, WRKLD.CAP, and 2TRIADS.COM)
$ RUN WRKLD
Input file [STDIN]: <CR>
Output file [STDOUT]: CONFIG.COM
No. of R1 tasks: 0
No. of R3 tasks: 1
Task R31:
Time limit in ticks (1 tick=0.25 msec) [48 ticks]: <CR>
Input parameters [? or (P Q T R S)]: 0 0 0 0 0
No. of R4 tasks: 2
Task R41:
Time limit in ticks (1 tick=0.25 msec) [24 ticks]: <CR>
Input parameters [? or (P Q T R S)]: 0 0 0 0 0
Task R42:
Time limit in ticks (1 tick=0.25 msec) [24 ticks]: <CR>
Input parameters [? or (P Q T R S)]: 0 0 0 0 0
How many processor triads (1, 2, or 3)? 2
Do you want SCC linked in [Y]? <CR>
Do you want DISPLAY linked in [Y]? <CR>
Do you want READALL linked in [Y]? <CR>
Data for collection
Do you want the data collection loop in a separate file? [N] Y
Output file [STDOUT]: COLLECT.COM
Wait time between collections [6 secs]: <CR>
There are 2 R4 tasks.
How many of these tasks do you want data from? [ALL] <CR>
There are 1 R3 tasks.
How many of these tasks do you want data from? [ALL] <CR>
Do you want the ID table dumped? [YES] <CR>
Do you want IDLE, SCC, and READALL values dumped? [YES] <CR>
Loop iterations [25]: 8
$ @2TRIADS.COM          Load FTMP with the synthetic workload.
Output from loading...
Bit set
THIS PROGRAM STARTS UP 2 PROCESSOR AND MEMORY TRIADS.
MEMBERS OF TRIAD1 ARE LRU'S 0, 1 AND 2.
MEMBERS OF TRIAD2 ARE LRU'S 3, 4 AND 5.
THE MASTER IS LRU 'A'.
COOP.CAP LOADED IN MASTER
MASTER ISSUING BUS ENABLE/SELECT COMMANDS.
CLEARING SYSTEM MEMORY TO 0
BEGINNING LOAD OF EXEC MEMORY IMAGE
SYSTEM MEMORY LOAD COMPLETE
LRU'S 6,7,8,9,A,B ARE MARKED FAILED.
TRIAD.ID.TABLE, MRR.TABLE SHOULD BE ALTERED TO CHANGE
THIS CONFIGURATION.
SLOP IS SET TO 40 PER CENT OF R4 PERIOD.
STARTING 2 TRIADS
MASTER MAKING FINAL BUS ASSIGNMENTS
SYSTEM STARTED IN MULTIPROCESSOR MODE.
CONFIGURATION TABLES ARE LOCATED AS FOLLOWS:
TABLE LOCATION LENGTH
BUS INMUX SELECT CODE 0 20 12
C BUS ASSIGNMENTS 0 20 12
P, R AND T BUS ASSGN 0 38 12
MEMORY STATUS 0 44 12
PROCESSOR STATUS 0 50 12
ERROR LATCHES 1 00 48
INITIATING TRANSFER OF CLOCK FROM MASTER
Bit is reset
DISCONNECTED FROM C BUS 1
DISCONNECTED FROM C BUS 2
DISCONNECTED FROM C BUS 3
DISCONNECTED FROM C BUS 4
DISCONNECTED FROM C BUS 5
$ @CONFIG
Output from configuring...
Linking in DISPLAY.
Preparing R1 tasks
0 R1 tasks
Preparing R3 tasks
1 R3 Tasks
Preparing R4 tasks
2 R4 Tasks
Bringing up 2 Processors
Repairing 0-2.
Failing 3-8.
Bringing up Processors 3-5.
Linking in IDLE and (optionally)
SCC, DISPLAY and READALL
$ @COLLECT /OUTPUT:WRKLD.LOG
All output going to a file
$ RUN ANAL
Send output to the terminal
Input file [STDIN]: wrkld.log
STARTING COLLECTION.
Tables of interest: LRU assignment table and
table of workload input
0020 0020 0016 0016 0016 0015 0015 0015 0000 0050     (2 processor
0020 0000 0058                                         triads)
0000 0000 0000 0000 0000 0000 0000 0000 000F 0000
0000 0000 0000 0000 0000 0000 0000 0000 000F 0008
0000 0000 0000 0000 0000 0000 0000 0000 000F 0010
0000 0000 0000 0000 0000 0000 0000 0000 000F 0018
0000 0000 0000 0000 0000 0000 0000 0000 000F 0020
0000 0000 0000 0000 0000 0001 0028
000A 042F 000A 042E 000A 0420 000A 042D 0010 0000
Start of new data.
Where do you want new data (S,#,M,L,?): N
New output file [STDOUT]: <CR>
EATING DATA ...
For this running of the workload we will collect data
to measure four things:
* The R41 task length. This is calculated by subtracting
the first timer value in task R41 (R41 0) from the last
timer value in that task (R41 5).
* The time for the second processor to start its R4 task
after the first processor started its R4 task. This "task
startup" time is found by comparing timer values taken
at the beginning of tasks R41 and R42 (R41 0 and R42 0).
* The effective rate of an R3 task. This is done by comparing
time at the beginning of each iteration of the first R3 task
(R31 0 to R31 6). There are four R3 task iterations
per major frame of data. Thus, three values can be
collected in a major frame.
* SCC startup time. This is a measure of the time for SCC to
start after the first R4 task starts. It is found by comparing
the first timer value in SCC (SCC 0) with the first timer
reading (R41 0) of the first iteration of task R41.
There are:
2 R4 tasks, 2 are dumped.
1 R3 tasks, 1 are dumped.
0 R1 tasks, 0 are dumped.
The task ID table was dumped.
SCC, READALL and IDLE task values were dumped.
Data point dump 1. Please list highest rate group first.
First timer value (cmd,Q,H,?) [?] > R41 0
Second timer value > R41 5
Name of this data dump: Task R41 length
Data point dump 2. Please list highest rate group first.
First timer value (cmd,Q,H,?) [?] > R41 0
Second timer value > R42 0
Name of this data dump: Task Startup time
Data point dump 3. Please list highest rate group first.
First timer value (cmd,Q,H,?) [?] > R31 0
Second timer value > R31 6
Timer number for 2nd task crosses a frame boundary.
How many collections do you want per dump group? [1] > <CR>
Normal collection values are: 9 (R4) and 4 (R3).
Use a number that is less than default or
you'll go out of bounds on the data structure.
How many collections do you want per dump group? [?] > 3
Name of this data dump: R3 task rate
Data point dump 4. Please list highest rate group first.
First timer value (cmd,Q,H,?) [?] > R41 0
Second timer value > SCC 0
Which R4 task iteration do you want? [0-8] 0
Name of this data dump: SCC startup time
Data point dump 5. Please list highest rate group first.
First timer value (cmd,Q,H,?) [?] > Q
4 10 403 378
4 6
4 10 635
4 6
4 12 380
4 6
4 11
4 7
4 11
4 11 396 89
4 6
4 11 638
4 6
4 14 380
4 6
4 10
4 7
5 11
4 11 638 294
3 6
4 12 389
4 6
5 11 642
4 6
4 11
5 7
4 10
4 11 638 443
4 6
4 10 381
5 7
3 11 641
4 6
4 10
4 7
5 11
5 11 640 288
3 6
5 11 380
4 6
5 11 638
4 6
4 10
4 7
4 11
5 11 639 84
4 6
4 11 380
4 6
4 11 642
4 6
5 11
4 7
4 11
4 10 637 67
3 6
5 11 382
4 6
4 11 643
4 6
5 11
4 7
4 10
5 11 638 298
4 6
4 16 381
4 6
4 10 642
4 6
4 11
4 7
4 10
>> Task R41 length.
AVERAGE = 4.125000 (72 Data points)
VAR = 0.223592 (ST. DEV. = 0.472855)
MAX = 5 MIN = 3
Print histogram of Task R41 length [Y]? <CR>
 3 (  4) ****
 4 ( 55) *******************************************************
 5 ( 13) *************
>> Task Startup time.
AVERAGE = 8.888889 (72 Data points)
VAR = 6.269163 (ST. DEV. = 2.503830)
MAX = 16 MIN = 6
Print histogram of Task Startup time [Y]? <CR>
 6 ( 23) ***********************
 7 (  9) *********
 8 (  0)
 9 (  0)
10 ( 11) ***********
11 ( 25) *************************
12 (  2) **
13 (  0)
14 (  1) *
15 (  0)
16 (  1) *
>> R3 task rate.
AVERAGE = 533.458333 (24 Data points)
VAR = 16412.269040 (ST. DEV. = 128.110339)
MAX = 643 MIN = 380 The spread is too large to print
Print histogram of R3 task rate [Y]? NO
>> SCC startup time.
AVERAGE = 240.750000 (8 Data points)
VAR = 21286.214844 (ST. DEV. = 145.897960)
MAX = 443 MIN = 67
Print histogram of SCC startup time [Y]? NO
Merge any of the data sets? NO
$
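The analyzer's summary lines can be reproduced from the dumped values. Recomputing the Task R41 length and Task Startup time statistics suggests that VAR is a sample variance, with divisor n - 1. The Python sketch below is a reconstruction under that assumption, not the analyzer's C source. Its histogram printing is simplified and has no "spread too large" check.

```python
import math
from collections import Counter


def summarize(values):
    """AVERAGE, VAR (sample variance, divisor n - 1), ST. DEV., MAX, MIN."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return {"N": n, "AVERAGE": mean, "VAR": var,
            "ST. DEV.": math.sqrt(var), "MAX": max(values), "MIN": min(values)}


def print_histogram(values):
    """One row per integer value from MIN to MAX, one '*' per data point."""
    counts = Counter(values)
    for v in range(min(values), max(values) + 1):
        print(f"{v:3d} ({counts[v]:3d}) " + "*" * counts[v])


# Task R41 length in ticks: 4 runs of 3, 55 runs of 4, 13 runs of 5
r41_length = [3] * 4 + [4] * 55 + [5] * 13
stats = summarize(r41_length)
print(f"AVERAGE ={stats['AVERAGE']:f} ({stats['N']} Data points)")
print(f"VAR ={stats['VAR']:f} (ST. DEV. ={stats['ST. DEV.']:f})")
print_histogram(r41_length)
```

It prints AVERAGE =4.125000 and VAR =0.223592 (ST. DEV. =0.472855), followed by a three-row histogram.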
Appendix C. Installation Notes
The workload is installed on two systems at AIRLAB: System 10 and System 1. The directory
[EFC.WRKLD] on System 10 contains all workload related files. Since System 10 is the support system
for FTMP, executable files (.EXE) for running the workload calculator, the workload generator, and the
data analyzer are here, along with absolute load modules for setting up FTMP. Also, copies of the AED
workload tasks are on this system, along with C code of the above workload programs.
Installed in the directory [FEF.WRKLD] on System 1 are duplicates of the .C and .EXE versions of the
workload calculator, generator, and analyzer. Since the C compiler is only on System 1, the user must
compile on this system rather than System 10. Table C-1 lists the files associated with the workload.
The files that are essential to running the workload are:
SYNTH2.EXE The workload calculator.
SYNTH.DAT File that must accompany the workload calculator.
WRKLD.EXE The workload generator.
WRKLD.CAP Absolute load module of the synthetic workload.
2TRIADS.COM Command file for loading FTMP with the synthetic workload. This is modified to load
the synthetic workload memory image (WRKLD.CAP) instead of the standard
executive memory image.
ANAL.EXE The workload data analyzer.
It is recommended that the user of the workload copy the above files into his/her own directory and run
them from that directory. This will help keep things organized and prevent crossing of WRKINFO.TXT
files. Above all, stay organized; e.g. put output from FTMP into .LOG files and output from the
analyzer into .OUT files.
File Contents                                          Name on VAX
Absolute load module of synthetic workload             wrkld.cap (10)
Command file for loading FTMP with the workload        2triad.com (10)
Workload Calculator C program code                     synth2.c (10,1)
Workload Calculator executable code                    synth2.exe (10,1)
Data file that must accompany the workload calculator  synth.dat (10,1)
Output file of Workload Calculator                     results.dat
Workload Calculator, older version                     synth.c/synth.exe (10,1)
Common constants used by Generator and Data Analyzer   defines.h (10,1)
Workload Generator                                     wrkld.c/wrkld.exe (10,1)
Data file created by the Generator and used by         wrkinfo.txt
  the Analyzer
Workload Data Analyzer                                 anal.c/anal.exe (10,1)
Binary tree routines used by Data Analyzer             btree.c (10,1)
Command files for compiling the workload               synth2.com (10,1)
  Calculator, Generator, and Data Analyzer,            wrkcomp.com (10,1)
  respectively                                         analcomp.com (10,1)
(10) File installed on System 10 in the directory [EFC.WRKLD]
(1)  File installed on System 1 in the directory [FEF.WRKLD]
Table C-1: Files for Running the Synthetic Workload
Appendix D. FTMP Tasks
This appendix describes the AED code used in the synthetic workload. This code is in the data set
LFMTN.CMU.AED on the Business Data Systems Division's (BDSD) IBM 4381 at Langley Research
Center. Also included is a description of files used to compile and link the workload. For information on
accessing the IBM 4381 and the associated file structure, see [Feather 84]. The data set member name (file
name) is in the header to the file.
Table D-1 summarizes the AED and associated files used in making the FTMP synthetic workload. In
addition, many files are on VAX System 10 in directory [EFC.WRKLD], as indicated by the last column
of the table.
File Contents                           Name on IBM                  Name on VAX 10
Rate 4 Workload Tasks                   LFMTN.CMU.AED(wrkld4)        wrkld4.aed
Rate 3 Workload Tasks                   LFMTN.CMU.AED(wrkld3)        wrkld3.aed
Rate 1 Workload Tasks                   LFMTN.CMU.AED(wrkld1)        wrkld1.aed
Rate 4 "Special" Task to start          LFMTN.CMU.AED(wrkld44)       wrkld44.aed
  the workload
Rate 1 "idle" Tasks                     LFMTN.CMU.AED(wrkldn14)      wrkldn14.aed
SCC modified to record                  LFMTN.CMU.AED(nscc)          nscc.aed (partial listing)
  start & stop time
READALL modified to record              LFMTN.CMU.AED(nreadall)      nreadall.aed (partial listing)
  start & stop time
Table of all of the FTMP and            LFMTN.CMU.ASM(wrktab)        wrktab.asm
  workload global variables
Linker command file for the workload    LFMTN.CMU.LINK(wrkld)
Linker output                           LFMTN.CMU.LINKLIST(wrkld)    wrkld.lnk
Absolute load module of workload        LFMTN.CMU.LOAD(wrkld)        wrkld.cap
Command file for loading FTMP                                        2triad.com
  with the workload
Table D-1: FTMP Files for the Synthetic Workload
The following is a detailed description of all of the files used to make the workload.
LFMTN.CMU.AED(wrkld4)
This file contains the highest rate group (R4) workload tasks. There are three rate
four workload tasks. Parameters to each task correspond to the model parameters
(Figure 4-2) and are read from FTMP's global memory by the task. These parameters
are set in memory by the workload generator (Section 4.5.1). A total of six, two-word
timer values are recorded.
LFMTN.CMU.AED(wrkld44)
This is the fourth R4 task. Although not linked in while the workload is collecting data,
this task is absolutely essential for starting a round of data collection. This task waits
for a global variable to be set to -1 (set by a command file generated by the workload
generator), thus signaling the start of data collection. Once this variable is set, this
task waits for the start of a major frame (Section 2.4), sets all the other tasks'
execution counters, and links itself out of the task structure. Each task's execution
count indicates how many times the task should run in a collection round. At the
conclusion of a collection round, the workload generator command file links this special
R4 task back into the task list for another round of data collection.
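The start-of-round handshake of this special R4 task can be sketched as a small state update. This is a hypothetical Python model, not the AED code in wrkld44. The following are assumptions for illustration:

- the task name "R44",
- resetting each execution counter to 0, so that the in-range test in the workload tasks lets them run again,
- the `at_major_frame_start` flag.

```python
from dataclasses import dataclass, field

START_FLAG = -1   # value the generator's command file writes to start a round


@dataclass
class WorkloadState:
    start_flag: int = 0
    exec_counters: dict = field(default_factory=dict)
    task_list: list = field(default_factory=list)


def starter_task(state, at_major_frame_start, name="R44"):
    """One dispatch of the hypothetical starter task; True when a round starts."""
    if state.start_flag != START_FLAG:
        return False                    # not signalled yet: keep waiting
    if not at_major_frame_start:
        return False                    # wait for the start of a major frame
    for task in state.exec_counters:
        state.exec_counters[task] = 0   # assumption: re-arm each workload task
    state.task_list.remove(name)        # link itself out of the task structure
    return True


def relink_starter(state, name="R44"):
    """End of a round: the command file links the starter task back in."""
    if name not in state.task_list:
        state.task_list.append(name)
```

Waiting for a major-frame boundary before re-arming the counters means every collection round begins in the same phase of the rate-group schedule.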
LFMTN.CMU.AED(wrkld3)
This file contains the R3 workload tasks. Except for the rate group, these tasks are
identical in all respects to the R4 workload tasks.
LFMTN.CMU.AED(wrkld1)
This file contains the R1 synthetic workload tasks. These tasks are very similar to the
R4 and R3 workload tasks.
LFMTN.CMU.AED(wrkldn14)
This file contains special R1 "idle" tasks that help record triad idle time. Once a
processor triad has finished its regular R1 tasks (i.e. R1 workload tasks, SCC,
READALL, and DISPLAY), that triad will execute one of the idle tasks. The triad
will hold onto the task until all other processor triads have finished their R1 tasks and
started an R1 idle task. Thus, these tasks give the user an idea of the amount of
idle time in a major frame. The time from the start of the idle task to the end of the
frame is not a direct measurement of idle time, since R1 task execution can be
interrupted by the start of a Rate 3 or Rate 4 task frame. Also, because SCC's position
in the task list may change depending on processor availability (Section 4.5.2), the SCC
task might execute after that processor's idle time in a major frame. In conclusion,
these tasks are a tool for measuring idle time in a major frame.
LFMTN.CMU.AED(nscc)
This code is the System Configuration Control (SCC) task modified for recording time.
LFMTN.CMU.AED(nreadall)
This file contains the READALL task modified for recording time.
LFMTN.CMU.LINK(wrkld)
This is the linker command file for the workload. It ties all of the miscellaneous files
together for linking into a download module.
LFMTN.CMU.LINKLIST(wrkld)
This file is the output from the linker. It contains a linker cross-reference along with a
memory map of the FTMP workload. There is a copy of this file on the VAX called
wrkld.lnk.
LFMTN.CMU.LOAD(wrkld)
This file contains the absolute load module, minus PROM code, for downloading to the
VAX. A copy of this file should already be on the VAX under wrkld.cap.
LFMTN.CMU.ASM(wrktab)
This is the CAPS-6 assembly file of all the global variables in FTMP. It, of course,
includes workload variables. A copy of this file called wrktab.asm is on VAX
System 10 in directory [EFC.WRKLD].
References
[Clune 84] Ed Clune. Analysis of the Fault-Free Behavior of the FTMP Multiprocessor System: Baseline Measurements and Synthetic Workload Development. Master's thesis, Carnegie-Mellon University, 1984.
[Draper 83a] Development and Evaluation of a Fault-Tolerant Multiprocessor (FTMP) Computer, Vol I, FTMP Principles of Operation. Charles Stark Draper Laboratories, 1983. Contract Report (CR) 166071.
[Draper 83b] Development and Evaluation of a FTMP Computer, Vol II, FTMP Software. Charles Stark Draper Laboratories, 1983. CR 166072.
[Draper 83c] Development and Evaluation of a FTMP Computer, Vol III, FTMP Test and Evaluation. Charles Stark Draper Laboratories, 1983. CR 166073.
[Draper 83d] Development and Evaluation of a FTMP Computer, Vol IV, FTMP Executive Summary. Charles Stark Draper Laboratories, 1983.
[Feather 84] Frank Feather, Carlos Liceaga. FTMP Programmer's Manual. 2nd edition, 1984.
[Ferrari 78] Domenico Ferrari. Computer Systems Performance Evaluation. Prentice-Hall, 1978.
[Hopkins 78] A.L. Hopkins, et al. FTMP - A Highly Reliable Multiprocessor. IEEE Trans. on Computers, October 1978.
[Kong 82] Thomas H. Kong. Measuring Time for Performance Evaluation of Multiprocessor Systems. Master's thesis, Carnegie-Mellon University, 1982.
[NASA 79a] NASA-Langley Research Center. Validation Methods for Fault-Tolerant Avionics and Control Systems - Working Group Meeting I. NASA-Langley Research Center, 1979. NASA Conference Publication 2114.
[NASA 79b] Research Triangle Institute. Validation Methods for Fault-Tolerant Avionics and Control Systems - Working Group Meeting II. NASA-Langley Research Center, 1979. NASA Conference Publication 2130.
[Singh 81] Ajay Singh. Pegasus: A Controllable, Interactive, Workload Generator for Multiprocessors. Master's thesis, Carnegie-Mellon University, 1981.
[Toy 78] W.N. Toy. Fault-Tolerant Design of Local ESS Processors. IEEE Trans. on Computers, October 1978.
[Wensley 78] J.H. Wensley, et al. SIFT: A Computer for Aircraft Control. IEEE Trans. on Computers, October 1978.
1. Report No.
NASA CR-178075
4. Title and Subtitle
Fault-Free Validation of a Fault-Tolerant Multiprocessor: Baseline Experiments and Workload Implementation
5. Report Date
April 1986
7. Author(s)
Frank Feather, Daniel Siewiorek, Zary Segall
9. Performing Organization Name and Address
Carnegie-Mellon University
Pittsburgh, PA 15213
11. Contract or Grant No.
NAG1-190
12. Sponsoring Agency Name and Address
National Aeronautics and Space Administration
Washington, DC 20546
13. Type of Report and Period Covered
Contractor Report
14. Sponsoring Agency Code
505-66-21-01
15. Supplementary Notes
Langley Technical Monitor: George B. Finelli
16. Abstract
In the future, aircraft employing active control technology must use highly
reliable multiprocessors in order to achieve flight safety. Such computers must
be experimentally validated before they are deployed. This project outlines a
methodology for performing fault-free validation of reliable multiprocessors. The
methodology begins with baseline experiments, each of which tests a single
phenomenon. As the experiments progress, tools for performance testing are developed.
This report presents the results of interrupt baseline experiments performed on the
Fault-Tolerant Multiprocessor (FTMP) at NASA-Langley's AIRLAB. Interrupt-causing
exception conditions were tested; several were found to have unimplemented
interrupt-handling software, and one had an unimplemented interrupt vector.
A synthetic workload model for realtime multiprocessors is then developed as an
application-level performance analysis tool. Details of the workload implementation
and calibration are presented. Both the experimental methodology and the
synthetic workload model are general enough to be applicable to reliable
multiprocessors other than FTMP.
17. Key Words (Suggested by Author(s))
Validation
Fault-Tolerant
Multiprocessors
Performance Measurement
Workload
Fault-Free
18. Distribution Statement
Unclassified - Unlimited
Subject Category 62
19. Security Classif. (of this report)
Unclassified
20. Security Classif. (of this page)
Unclassified
21. No. of Pages
58
22. Price
A04
For sale by the National Technical Information Service, Springfield, Virginia 22161
