Measurement of SIFT operating system overhead by Butler, R. W. & Palumbo, D. L.
https://ntrs.nasa.gov/search.jsp?R=19850015026 2020-03-20T18:43:47+00:00Z

NASA Technical Memorandum 86322
Measurement of SIFT
Operating System Overhead
Daniel L. Palumbo and Ricky W. Butler
Langley Research Center
Hampton, Virginia
National Aeronautics
and Space Administration
Scientific and Technical
Information Branch
1985
Use of trade names or names of manufacturers in this report does not constitute an official endorsement
of such products or manufacturers, either expressed or implied, by the National Aeronautics and Space
Administration.
Contents
Summary
Introduction
Hardware Configuration
  Datafile
  Real-Time Clock
Operating System Overview
  Local Executive Data Structures
    Task schedule
    Task table
    Buffer information array
    Buffer table
    Vote schedule and POSTVOTE array
  Global Executive
  SIFT Scheduler
Subsystem Overhead Measurement
  Instrumentation of Operating System
  Vote Overhead
    Version B vote times
    Version R vote times
    Version V vote times
    Error impact on vote time
  Executive Task Overhead
    Reconfiguration overhead
    Interactive Consistency overhead
    Execution times of executive tasks
Discussion of Results
Concluding Remarks
References

Summary

The Software Implemented Fault Tolerance (SIFT) computer system was developed for NASA by SRI International as an experimental vehicle for fault-tolerant systems research. SIFT was delivered to the Langley Avionics Integration Research Laboratory (AIRLAB) in April 1982. Development and testing have continued at the NASA Langley Research Center, and several versions of the operating system have evolved. Each new version represents the different strategies employed to improve the performance of particular functions of the operating system. The three versions discussed in this paper are

Version B   as delivered (baseline)
Version R   improved reconfiguration performance
Version V   improved vote and reconfiguration performance

The SIFT operating system is a fully distributed, real-time executive with no master controller. The operating system is implemented in Pascal and performs the following major functions:

1. Periodic task scheduling and dispatching
2. Data communication and voting
3. Clock synchronization
4. Fault isolation
5. Reconfiguration
6. Interactive consistency

These operating system functions fall into two categories: Local Executive and Global Executive. The Local Executive performs functions local to an individual processor, i.e., (1) and (2) above. The Global Executive is a set of tasks which assume the responsibility of items (3) through (6).

The SIFT operating system utilizes significant resources to achieve fault tolerance in software. This overhead falls in two main categories:

1. The time required to vote the intertask communication variables at the beginning of each subframe
2. The time utilized by the executive tasks

The overhead from the first category, vote overhead, was found to be a linear function of the amount of data to be voted. The following table gives a basic comparison of the vote times (best values) for the versions of SIFT investigated with a six-processor configuration:

Version   Three-way vote time   Five-way vote time
          per buffer, ms        per buffer, ms
B         0.413                 0.412
R         0.302                 0.357
V         0.079                 0.107

The vote times were found to vary with the number of processors in the configuration and the location of the task replicates in the schedule table. These variations were typically on the order of 10 to 25 percent.

The overhead due to the second category, the executive task overhead, is given below.

Version   Overhead, ms   Frame size, ms   Overhead, percent of frame size
B         60.8           83.2             73.2
R         32.0           51.2             62.5
V         28.8           44.8             64.3

The reduced major frame size for Versions R and V arises because the improved vote performance enables some tasks to be scheduled in fewer subframes.

The SIFT computer system requires significant overhead to achieve its fault tolerance. The voting and interactive consistency functions were found to be the primary sources of operating system overhead. Unfortunately, these functions seem to be inherently expensive when implemented in software.

Introduction

The Software Implemented Fault Tolerance (SIFT) computer system was developed for NASA by SRI International as an experimental vehicle for fault-tolerant systems research. The SIFT effort began with broad, in-depth studies stating the reliability and processing requirements for digital computers which would, in the next generation of aircraft, control flight-critical functions. (See refs. 1 and 2.) Detailed design studies were made of fault-tolerant architectures which could meet the required reliability and processing requirements. (See refs. 3 and 4.) Following these studies, SRI International and the Bendix Corporation designed and built the SIFT system, which was delivered to the Langley Avionics Integration Research Laboratory (AIRLAB) in April 1982. The SIFT architecture consists of a fully distributed configuration of Bendix BDX-930 processors with a point-to-point communication link between every pair of processors. (See fig. 1.) Although the design can accommodate up to eight processors,
only six processors are in the current system; reliability estimations have demonstrated that this is adequate to meet the stated goal of a probability of failure of less than 10^-9 for a 10-hour flight.

Figure 1. SIFT system interconnection.

The basic attributes of fault-tolerant computers are

1. Redundant hardware and tasks are used.
2. Errors caused by hardware faults are masked by voting the redundant outputs.
3. To increase reliability, faulty hardware is removed from the system by means of reconfiguration.

Important distinctions between SIFT and other fault-tolerant computers are

1. The functions supporting fault tolerance (e.g., voting) are primarily implemented in software.
2. Different tasks can be replicated to different levels (i.e., a noncritical task may be simplex, whereas more critical tasks can be replicated three-fold or five-fold).
3. The unit of reconfiguration is a complete computer, i.e., processor, memory, and busses.
4. The design is not based on a special central processing unit (CPU) or memory design.
5. The redundant computers are loosely synchronized.

The assignment of tasks to processors in SIFT is predetermined by a task schedule table, which is constructed by the application designer. The SIFT scheduler periodically dispatches tasks according to the task schedule. As processors fail, the hardware complement changes. Therefore, the application designer must define a task schedule for each level of configuration the system may encounter. Reconfiguration in SIFT is essentially accomplished by selecting the appropriate task schedule. The decision to reconfigure is based on error information gathered when the data from replicate tasks are voted.

The synchronization of the computers is fundamental to the correct functioning of the exact match vote algorithm and communication system. Synchronization insures that all replicates of a task receive the same input data and therefore produce the same output data if fault free. Interprocessor communication [illegible in source] signals or rendezvous mechanisms are used. The validity of data is guaranteed by the precedence established in the task schedule and the synchronization of the processors.

The SIFT operating system has two levels of authority. The Local Executive contains procedures which support scheduling, voting, and communications. The Global Executive consists of tasks which cooperate to provide synchronization and redundancy management (fault isolation and reconfiguration). Since the delivery of SIFT, development and testing have continued at the NASA Langley Research Center, and several versions of the operating system have evolved. Each new version represents the different strategies employed to improve the performance of particular functions of the operating system. The three versions discussed in this paper are

Version B   as delivered (baseline)
Version R   improved reconfiguration performance
Version V   improved vote and reconfiguration performance

Version B only ran by disabling the clock interrupt during certain executive tasks. This was unacceptable because the disabling of interrupts resulted in significant delays in the output of the periodic application tasks. The primary problem was the large overhead of the Reconfiguration Task. Therefore, the operating system was redesigned. This new version is referred to as Version R. Version R is able to support reasonable task schedules without disabling the clock interrupts during certain executive tasks. Finally, the vote system was redesigned to improve the vote performance. This version is referred to as Version V. Version V was obtained by enhancement of Version R. Before the results of performance measurements on each version are discussed, an explanation of pertinent hardware and operating system internals is presented.

Hardware Configuration

The SIFT processors are Bendix BDX-930 avionics computers which communicate via a fully connected point-to-point broadcast network. Although the interconnection network and operating system
are able to support up to eight processors, only six processors are currently used in SIFT. Each computer in the system has a 16-bit CPU, 32K words of static random access memory (RAM), 1K datafile memory, 1K transaction file memory, a broadcast controller, a 1553A controller, and a real-time clock. (See fig. 2.) The CPU is constructed from four 2901 bit slice chips in a microprogrammed pipeline architecture and achieves a performance level of 1 million instructions per second. The 1553A controller provides a MIL-STD-1553A bus interface for communication with external aircraft systems.

[Figure 2 diagram blocks: memory, CPU, datafile, transaction file, 1553A controller, and real-time clock.]
Figure 2. SIFT processor block diagram.

Datafile

The datafile is a 1K memory block and serves as buffer area for the broadcast and 1553A controllers. The datafile is partitioned into eight 128-word sections. The first seven sections function as input "mailboxes." The other section serves as an output buffer. Each input data stream from the broadcast network is hardwired to a specific mailbox to maintain communication isolation.

To broadcast a value, the value is first stored in the datafile output section. To start the broadcast, the location of the value is loaded into the transaction pointer register. The broadcast transmitter signals completion after 14.7 μs. This allows for worst-case contention for the receiving datafile.

Real-Time Clock

The real-time clock is a read/write register which produces interrupts at 1.6-ms intervals. The clock is a 16-bit counter that is driven by the 16-MHz crystal in the CPU. The clock is therefore synchronized exactly to the fetch-execute cycle of the CPU. The least significant bit of the clock has a value of 1.6 μs.

Operating System Overview

The SIFT operating system is a fully distributed, real-time executive with no master controller. The operating system is implemented in Pascal and performs the following major functions:

1. Periodic task scheduling and dispatching
2. Data communication and voting
3. Clock synchronization
4. Fault isolation
5. Reconfiguration
6. Interactive consistency

These operating system functions fall into two categories: Local Executive and Global Executive. The Local Executive performs functions local to an individual processor, i.e., (1) and (2) above. The Global Executive is a set of tasks which assume the responsibility of items (3) through (6). The major distinction between the Local and Global Executives is that the Global Executive tasks exchange data and cooperate on a systemwide basis, whereas the Local Executive procedures do not exchange data or cooperate.

The primary purpose of the SIFT operating system is to provide fault tolerance through masking of errors by voting of replicated data from the redundant tasks executing on separate computers. The voting is exact match (i.e., bit-by-bit comparison) and therefore requires that the replicated processes use exactly the same input data and produce their outputs prior to the vote. System coordination is achieved by use of a decentralized clock synchronization algorithm (ref. 4) and a simple communication protocol relying on the synchronization provided by this algorithm. Basically, the communication protocol requires that a data-producing task broadcast its data at some preagreed time T and that the data-receiving tasks wait until at least time T + Maximum broadcast time + Maximum clock skew before reading. Communication within SIFT is therefore critically dependent upon the correct performance of the clock synchronization algorithm. Data generated by a task are available to other tasks only after the termination of the data-producing task. The operating system allows the application system designer to define up to 128 "data buffers" for each processor's mailbox. Each of these "data buffers" consists of one word in the datafile area. An application task broadcasts its output data by calling a Local Executive procedure. The receiving task must be scheduled after termination of the last replicate of the data-producing task. Prior to execution of the receiving task, the Local Executive votes the replicates of the data and places the majority value in a "post vote"
array. To retrieve the voted data, the receiving task calls a Local Executive function. Several operating system data structures must be initialized by the application designer to control the functions which cooperate in performing the communications.

The Local Executive also keeps a count of every vote disagreement and the identity of the nonagreeing processors. This error information is distributed and analyzed by the Global Executive. From this analysis, the Global Executive decides whether a reconfiguration should take place. Although this function is transparent to the application tasks at run-time, the application designer must initialize several data structures to preplan the reconfiguration process.

Local Executive Data Structures

In this section, the functions performed by the Local Executive are described. Because the SIFT operating system is driven by static data structures, the descriptions center around these data structures.

The Local Executive has two main responsibilities: (1) scheduling and dispatching of tasks and (2) data communication and voting. The following data structures are used by the Local Executive:

1. Task schedule
2. Task table
3. Buffer information (BINF) array
4. Buffer table (BT) array
5. Vote schedule
6. POSTVOTE array

The data linkages between these structures are illustrated in figure 3. Only the POSTVOTE array is constructed completely by the operating system. All the other structures require some, if not all, of their information to be entered by the application designer in BDX-930 assembly code.

[Figure 3 diagram blocks: task schedule (list of indices to task vectors), task table (array of task state vectors), task executable code, vote schedule (array of lists of buffer indices), BINF (buffer indices), datafile addresses, POSTVOTE array, and datafile (processor mailboxes).]
Figure 3. Linkages between SIFT data structures.

Task schedule. Task scheduling in SIFT is nonpreemptive and based on precalculated schedule tables. The schedule table defines the set of tasks which will be periodically dispatched. This period is called a major frame and is partitioned into 3.2-ms subframes. Tasks are statically allocated to these subframe slots. Therefore, all task execution times must be less than or equal to 3.2 ms. If an application requires more time, it must be decomposed into a sequence of 3.2-ms tasks. In Version R and Version V, the operating system was generalized to allow a task to utilize any number of 1.6-ms intervals. In all versions, the application designer must allocate the tasks in a preplanned schedule table.

Figure 4 shows a typical task assignment to the subframes for a six-processor configuration. In this schedule, the application designer has defined a major frame containing 32 subframes.

The tasks are named by three-character identifiers (e.g., IC1, IC2, MLT, LAT). The application tasks are scheduled three times in the major frame. This is referred to as a "triple-frame" schedule. All the executive tasks except the Interactive Consistency Tasks are scheduled once per major frame. In a "single-frame" schedule, the application tasks are only scheduled once in the table. In such a schedule, the frame is shorter, but the operating system overhead is proportionately larger, as seen subsequently.

Since the SIFT system is reconfigurable, there must also be schedules for five-, four-, three-, and two-processor configurations (not shown). Although the Local Executive executable code and schedule table are identical on every processor, each executive uses a different section of the schedule table. Each section of the schedule table is identified by the ordered pair (NW,VPN), where the NW field indicates the number of working processors in the configuration, and the VPN field is the number of the virtual processor which uses this section of the schedule. Every physical processor has a virtual processor number assigned to it. After a reconfiguration, this virtual processor number may change. Since any processor may fail, the new virtual processor number cannot be predetermined. Thus, each processor contains all the schedule table sections. The schedule table then consists of a sequence of these sections which are initialized in BDX-930 assembly code.

Task table. The task table contains information specific to each task in the system. The following
Pascal record defines its structure:

TT: ARRAY[TASK] OF RECORD
      CAUSE: (TASKTERM,CLOCKINT,SYSTEMSTART);
      BUFS: INTEGER;
      ERRORS: INTEGER;
      STKPTR: INTEGER;
      STATE: ARRAY[0..128] OF INTEGER;
    END;

Most of these fields are initialized and managed by the operating system. Only the BUFS field and the initial state of the task must be initialized by the applications programmer. The BUFS field points to a list of the buffer numbers in the Buffer information array (described below). This list defines the output variables of the task. The initial state holds the task starting location, terminating routine location, and initial register values. Other fields in the task table record are used by the scheduler. They contain the following information:

CAUSE    the reason for entry into the scheduler
ERRORS   the number of times the task failed to complete
STATE    the state of the task (including registers, restart address, and stack area) upon interrupt
STKPTR   the value of the stack pointer register upon task termination or clock interrupt

                     PROCESSOR
SUBFRAME    1    2    3    4    5    6
    0      IC1  IC1  IC1  ---  ---  ---
    1      IC2  IC2  IC2  IC2  ---  ---
    2      IC3  IC3  IC3  IC3  IC3  IC3
    3      MLT  ---  MLT  MLT  MLT  MLT
    4      ---  GUT  GUT  GUT  GUT  GUT
    5      PIT  PIT  ---  PIT  PIT  PIT
    6      LAT  ---  LAT  LAT  LAT  LAT
    7      ---  ---  ---  ---  ---  ---
    8      ---  ---  ---  ---  ---  ---
    9      IC1  IC1  IC1  ---  ---  ---
   10      IC2  IC2  IC2  IC2  ---  ---
   11      IC3  IC3  IC3  IC3  IC3  IC3
   12      MLT  ---  MLT  MLT  MLT  MLT
   13      ---  GUT  GUT  GUT  GUT  GUT
   14      PIT  PIT  ---  PIT  PIT  PIT
   15      LAT  ---  LAT  LAT  LAT  LAT
   16      ---  ---  ---  ---  ---  ---
   17      ---  ---  ---  ---  ---  ---
   18      IC1  IC1  IC1  ---  ---  ---
   19      IC2  IC2  IC2  IC2  ---  ---
   20      IC3  IC3  IC3  IC3  IC3  IC3
   21      MLT  ---  MLT  MLT  MLT  MLT
   22      ---  GUT  GUT  GUT  GUT  GUT
   23      PIT  PIT  ---  PIT  PIT  PIT
   24      LAT  ---  LAT  LAT  LAT  LAT
   25      ---  ---  ---  ---  ---  ---
   26      ---  ---  ---  ---  ---  ---
   27      ERT  ERT  ERT  ERT  ERT  ERT
   28      FIT  FIT  FIT  FIT  ---  FIT
   29      RET  RET  RET  RET  RET  RET
   30      ---  ---  ---  ---  ---  ---
   31      CLT  CLT  CLT  CLT  CLT  CLT

where
IC1, IC2, IC3 - Interactive Consistency Tasks
MLT, GUT, PIT, LAT - Application Tasks
ERT - Error Task
FIT - Fault Isolation Task
RET - Reconfiguration Task
CLT - Clock Task

Figure 4. Typical task assignment in SIFT.

Buffer information array. Before the Buffer information (BINF) array can be constructed, the application designer must enter in BDX-930 assembly code a list of EQU instructions identifying each buffer "name" with a buffer "number":

ERRER EQU 33
GEREC EQU 34
  :

These buffer names are simply convenient synonyms for the buffer numbers. The BINF array is essentially a memory pool containing lists of buffer names which are pointed to by the BUFS field of the task table. Each list is terminated by a zero field. The details of the assembly code which initializes this structure are not important for this discussion. This list is used in the Reconfiguration Task to rebuild the BT array, described next.

Buffer table. The buffer table, BT, is used by the Local Executive and generated by the Global Executive. The BT array is the central data structure used by the system for redundancy management. This structure maps the logical buffer names to physical datafile locations. Since SIFT uses replicated tasks to achieve fault tolerance, each data value (i.e., buffer) is calculated by several tasks and is broadcast to specific locations in the datafile of each processor. The BT array maintains the datafile locations of all the replicates of the data.

The BT array, shown below, is essentially a function mapping a buffer number into a vector indicating where its replicated values reside.

BT: ARRAY[0..MAXBUFS] OF RECORD
      DBX: INTEGER;
      AD: ARRAY[0..MAXPROCESSOR] OF INTEGER;
    END;

The datafile offset (DBX) must be entered by the application designer. This offset describes the location of the data within the 128-word mailbox. For example, buffer number 10 might have an offset of 8. Two different buffer numbers may be assigned the
same datafile offset (DBX), but the application designer must be careful not to utilize both at the same time--i.e., they must be time multiplexed. The ability to time multiplex datafile locations was sacrificed in Version R to enable an efficient reconfiguration process. The AD array is constructed by the Reconfiguration Task from information in the task schedule, task table, and BINF array and from data from the Fault Isolation Task. If processor i computes the buffer, then AD[i] contains the location of processor i output in the datafile. If processor i does not compute the buffer, then AD[i] = -1. For example, the BT array entry relating to the processor 2 datafile is shown in figure 5. Buffer item 10 is found at an offset of 8 into the processor mailbox and was produced by processors 0, 2, 3, and 5.

DBX:   8
AD[0]: 8
AD[1]: -1
AD[2]: 904
AD[3]: 264
AD[4]: -1
AD[5]: 520
AD[6]: -1
AD[7]: -1

Figure 5. Detailed description of BT[10].

Vote schedule and POSTVOTE array. The vote schedule is constructed in parallel with the task schedule. The vote schedule in figure 6 corresponds to the task schedule of figure 4. Unlike the task schedule, there is only one vote schedule for each level of configuration (i.e., all processors in the configuration use the same vote schedule). The schedule contains a list of items to be voted before the task scheduled for that subframe is executed. The result of this vote is placed in the POSTVOTE array. The restriction of allowing only one vote schedule for each configuration level guarantees that all good processors contain exactly the same data in the POSTVOTE buffers--even if their schedules do not execute tasks which use all the data. Although at first this appears wasteful, it simplifies the reconfiguration process. Since all data are available on every processor during reconfiguration, it is not necessary to transfer data to a processor when its new schedule contains a task it previously had not executed.

SUBFRAME   BUFFERS TO BE VOTED
    0
    1      EXPEC
    2      NDR
    3      LOCK XRESE
    4      QX QY QZ
    5      PSIN PHIN RN QDELY QLATM TIMER
    6      CMDEL QDELZ CMDTH QPITM
    7      CMDAI CMDRU
    8
    9
   10      EXPEC
   11      NDR
   12      LOCK XRESE
   13      QX QY QZ
   14      PSIN PHIN RN QDELY QLATM TIMER
   15      CMDEL QDELZ CMDTH QPITM
   16      CMDAI CMDRU
   17
   18
   19      EXPEC
   20      NDR
   21      LOCK XRESE
   22      QX QY QZ
   23      PSIN PHIN RN QDELY QLATM TIMER
   24      CMDEL QDELZ CMDTH QPITM
   25      CMDAI CMDRU
   26
   27
   28
   29      GEREC GEMEM
   30
   31

Figure 6. Typical SIFT vote schedule.

Global Executive

The Global Executive performs four major functions:

1. Clock synchronization
2. Error report analysis and identification of faulty processors
3. Logical removal of a faulty processor via reconfiguration
4. Interactive consistency

These functions are performed by a set of tasks--the Clock Synchronization Task, the Error Task, the Fault Isolation Task, the Reconfiguration Task, and the Interactive Consistency Task. The details of the synchronization process are not presented here.

When a processor fails, the immediate effect of its errors is masked by the vote function. The voter records the number of errors produced by each processor. Once during each major frame, the Error Task condenses the local error data and broadcasts the information. All processors now have a systemwide record of the errors produced during the
VAR
  ERAD AT 16#4000: ARRAY[1..6,1..MAXSUBFRAME,1..8] OF INTEGER;
  ERADPT: INTEGER;

GLOBAL FUNCTION SCHEDULER(CAUSE:SCHEDCALL; STATE:INTEGER):INTEGER;
VAR TI,I:INTEGER;
BEGIN
  TSKFN := CLOCK;                                (* - PERFORMANCE MONITOR - *)
  TT[TASKID].STKPTR := STATE;
  IF CAUSE<>TASKTERMINATION THEN                 (* --- CLOCK INTERRUPT --- *)
  BEGIN
    IF (TASKID<>NULLT) THEN                      (* TASK DID NOT COMPLETE *)
    BEGIN
      TT[TASKID].ERRORS := TT[TASKID].ERRORS + 1;
      BUILDTASK(TASKID);
    END
    ELSE TT[TASKID].STATUS := CLOCKINTERRUPT;
    IF SFCOUNT >= MAXSUBFRAME THEN               (* START NEW MAJOR FRAME *)
    BEGIN
      SFCOUNT := 0;
      IF FRAMECOUNT >= MAXFRAME THEN FRAMECOUNT := 0
      ELSE FRAMECOUNT := FRAMECOUNT+1;
      GFRAME := GFRAME+1;
      IF ERADPT = 6 THEN ERADPT := 1             (* - PERFORMANCE MONITOR - *)
      ELSE ERADPT := ERADPT + 1;                 (* - PERFORMANCE MONITOR - *)
    END
    ELSE SFCOUNT := SFCOUNT+1;
    TSCHEDULE;                                   (* SELECT NEW TASK *)
    BCLOCK := CLOCK;                             (* - PERFORMANCE MONITOR - *)
    VSCHEDULE;                                   (* PERFORM VOTE *)
    BCLOCK := CLOCK - BCLOCK;                    (* - PERFORMANCE MONITOR - *)
    ERAD[ERADPT,SFCOUNT,1] := GFRAME;            (* - PERFORMANCE MONITOR - *)
    ERAD[ERADPT,SFCOUNT,2] := SFCOUNT;           (* - PERFORMANCE MONITOR - *)
    ERAD[ERADPT,SFCOUNT,3] := TSKFN;             (* - PERFORMANCE MONITOR - *)
    ERAD[ERADPT,SFCOUNT,4] := BCLOCK;            (* - PERFORMANCE MONITOR - *)
    ERAD[ERADPT,SFCOUNT,5] := RCLOCK;            (* - PERFORMANCE MONITOR - *)
    ERAD[ERADPT,SFCOUNT,6] := XCLOCK;            (* - PERFORMANCE MONITOR - *)
    ERAD[ERADPT,SFCOUNT,7] := 0;                 (* - PERFORMANCE MONITOR - *)
    ERAD[ERADPT,SFCOUNT,8] := 0;                 (* - PERFORMANCE MONITOR - *)
  END
  ELSE                                           (* --- TASK TERMINATION --- *)
  BEGIN
    TI := TSKFN - TSKST;                         (* - PERFORMANCE MONITOR - *)
    IF TI > TTIME[TASKID] THEN                   (* - PERFORMANCE MONITOR - *)
      TTIME[TASKID] := TI;                       (* - PERFORMANCE MONITOR - *)
    ERAD[ERADPT,SFCOUNT,7] := TSKST;             (* - PERFORMANCE MONITOR - *)
    ERAD[ERADPT,SFCOUNT,8] := TTIME[TASKID];     (* - PERFORMANCE MONITOR - *)
    TASKID := NULLT;
  END;
  SCHEDULER := TT[TASKID].STKPTR;
  TSKST := CLOCK;                                (* - PERFORMANCE MONITOR - *)
END; (* SCHEDULER *)
Figure 7. SCHEDULER procedure with instrumentation.
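
As an illustration only, the eight words that the instrumented SCHEDULER of figure 7 stores for each subframe might be decoded offline along the following lines. This Python sketch is not taken from the report. It assumes the slot order used in figure 7 (GFRAME, SFCOUNT, TSKFN, BCLOCK, RCLOCK, XCLOCK, TSKST, TTIME), the 16-bit clock with 1.6-μs resolution described in the Real-Time Clock section, and an interpretation in which TSKST minus TSKFN is the total time spent in the SCHEDULER after a clock interrupt. The names SubframeRecord, decode_record, ticks_to_ms, vote_time_ms, and dispatch_overhead_ms are hypothetical.

```python
from dataclasses import dataclass

CLOCK_LSB_US = 1.6        # one clock count is 1.6 microseconds (Real-Time Clock section)
CLOCK_MODULUS = 1 << 16   # the clock is a 16-bit counter, so differences wrap modulo 65536


@dataclass
class SubframeRecord:
    """One ERAD entry, in the slot order written by the SCHEDULER in figure 7."""
    gframe: int   # slot 1: major frame count
    sfcount: int  # slot 2: subframe count
    tskfn: int    # slot 3: clock value on entry to the SCHEDULER
    bclock: int   # slot 4: clock counts spent in the vote
    rclock: int   # slot 5: not used
    xclock: int   # slot 6: not used
    tskst: int    # slot 7: clock value when the dispatched task started
    ttime: int    # slot 8: maximum execution time seen for the task


def decode_record(words):
    """Build a record from the eight raw words of one subframe."""
    if len(words) != 8:
        raise ValueError("an ERAD entry holds exactly eight words")
    return SubframeRecord(*words)


def ticks_to_ms(ticks):
    """Convert a clock difference to milliseconds, tolerating 16-bit wraparound."""
    return (ticks % CLOCK_MODULUS) * CLOCK_LSB_US / 1000.0


def vote_time_ms(rec):
    """Vote time for the subframe (BCLOCK) in milliseconds."""
    return ticks_to_ms(rec.bclock)


def dispatch_overhead_ms(rec):
    """Assumed SCHEDULER time (TSKST - TSKFN) minus the vote time."""
    return ticks_to_ms(rec.tskst - rec.tskfn) - vote_time_ms(rec)


if __name__ == "__main__":
    # Hypothetical raw words, not measured data.
    rec = decode_record([12, 3, 1000, 250, 0, 0, 1419, 900])
    print(vote_time_ms(rec), dispatch_overhead_ms(rec))
```

Slots 7 and 8 are zeroed on a clock interrupt and filled in only when the task terminates, so a decoder built on this idea would need to skip records in which no task termination was recorded.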
past frame. The Fault Isolation Task searches the error data to locate faulty processors. The Fault Isolation Task then broadcasts a value indicating which processors are faulty. This value is voted and used by the Reconfiguration Task to compute the set of "good" processors. The Reconfiguration Task does not physically remove a faulty processor from the configuration. The good processors merely agree to ignore outputs from the faulty processor. Thus, reconfiguration may be accomplished by changes to the internal data structures.

The Reconfiguration Task basically selects a new schedule table and regenerates the buffer table (described previously) to accomplish the logical reconfiguration. Since the algorithm used to determine which processor must be eliminated is decentralized, it is essential that the algorithm is designed so every good processor makes exactly the same decision at exactly the same time. This must be done correctly even in the presence of a malicious processor (i.e., one that sends good data to some processors and erroneous data to others). This is accomplished by use of the "interactive consistency algorithm" developed by SRI International (ref. 4) and discussed in the section "Interactive Consistency overhead."

SIFT Scheduler

The scheduler consists of two major components--the assembly code interrupt handler and the Pascal procedure SCHEDULER. The Pascal procedure is called from assembly code whenever any one of the following three events occurs: (1) system startup, (2) task termination, or (3) clock interrupt. The SCHEDULER has two primary responsibilities after a clock interrupt--vote the data scheduled for the subframe and dispatch the next task according to the information in the schedule table.

Under Version B of the SIFT Operating System, an application designer has to divide a process that takes longer than 3.2 ms into a series of 3.2-ms tasks. An entry must be made in the schedule table for each subframe in which the process would run. To spare the designer the work of partitioning the process and to reduce the size of the schedule table, the structure of the schedule table was changed to allow the designer to define how many 1.6-ms interrupts the task should use.

Subsystem Overhead Measurement

Instrumentation of Operating System

Since the SCHEDULER controls voting and the dispatching of tasks (and therefore the Global Executive), the measurement instrumentation was added to the SCHEDULER. The SCHEDULER with the measurement code is shown in figure 7. An array, ERAD, was added to store the data during a test. ERAD is a three-dimensional array with indices ERADPT, SFCOUNT, and DVI, which differentiate the 6 major frames, 32 subframes, and 8 data items, respectively. To retrieve the performance data, the SIFT processors were allowed to run an arbitrary amount of time (5 to 10 s) and then were halted manually from the host processor. The portion of the processors' memories containing the ERAD array was copied to a disk file and analyzed by offline programs. The eight data items are as follows:

GFRAME    the major frame count (nonrepeating count)
SFCOUNT   the subframe count (0..MAXSUBFRAME)
TSKFN     the time the previous task finished
BCLOCK    the vote time for the subframe
RCLOCK    not used
XCLOCK    not used
TSKST     the time at which the task for the subframe started
TTIME     the maximum task execution time

Figure 8 shows the components of a subframe. All the variance in execution time results from either the voter or the task itself. The time required for dispatching a task was calculated by subtracting the vote time from the total time spent in the SCHEDULER routine. This overhead then includes the time needed for the instrumentation. Figure 8 shows the task schedule overhead to be nominally 270 μs. An analysis of the voter and the Global Executive tasks follows.

[Figure 8 timeline: a subframe begins on a clock tick and consists of schedule task (0.27 ms), perform vote (0.04 to 2.5 ms), application task (0.3 to 2.4 ms), and idle time until the new subframe.]
Figure 8. Components of a subframe.

Vote Overhead

Voting is performed at the beginning of every subframe prior to the execution of the application task. The operating system scans the vote schedule
8
table to determine which data buffers are to be voted. For each such data buffer, the VOTE routine is called. The VOTE routine uses the BT array to retrieve the replicated versions of the data from the datafile. After the data are retrieved, either VOTE3 or VOTE5 is called for three-way voting or five-way voting, respectively, depending on the number of values found. Unfortunately, the time required for voting is affected by the following four factors:

1. VS, the type of vote--three-way or five-way (determined by the number of task replicates which generate data)
2. NV, the number of data buffers to be voted, as indicated in the vote schedule
3. HP, the position of the data-producing tasks in the schedule table
4. NW, the number of working processors in the configuration

The characteristics of the vote time are different for each of the operating system versions. These vote times will be referred to as VB, VR, and VV. Thus, the vote time for each of these versions is effectively a function defined on a four-dimensional space as follows:

VB(VS, NV, HP, NW)
VR(VS, NV, HP, NW)
VV(VS, NV, HP, NW)

Obviously, it is difficult to clearly present the details of an empirical function defined on a four-dimensional space. In this paper, this is attempted by illustrating different planes in the four-dimensional space.

Version B vote times. First, looking only at a six-processor configuration (NW = 6) and assigning the highest task replicate to processor 6 (HP = 6), we obtain the graph in figure 9 for VB(5, NV, 6, 6). Surprisingly, the vote time for the three-way vote, VB(3, NV, 6, 6), is virtually identical to that of the five-way vote. A single five-way vote requires 0.412 ms, whereas a single three-way vote requires 0.413 ms. (See fig. 10.) Although the VOTE3 routine does execute faster, the code which retrieves data from the datafile (see fig. 11) continues until either all eight fields of the BT array are examined or five good values (i.e., not -1) are found.

[Figure 9 plot: vote time (ms) versus number of data values voted (NV), 1 to 6. Labeled points include 0.450, 0.862, 1.27, 1.69, and 2.51 ms.]
Figure 9. Five-way vote times VB(5, NV, 6, 6).

[Figure 10 plot: vote time (ms) versus number of data values voted (NV), 1 to 6. Labeled points include 0.038, 0.451, 0.864, 1.28, and 1.70 ms.]
Figure 10. Three-way vote times for VB(3, NV, x, 6), where x ∈ {3, 4, ..., 8}.

Thus, when there are only three data-producing tasks, the loop does not terminate until "I>MAXPROCESSOR." (MAXPROCESSOR is 7.) This additional looping time is almost exactly equal to the difference between the VOTE3 and VOTE5 execution times. If five-way voting is not required on any tasks, then the loop test can be changed to "UNTIL (J=3) OR (I>MAXPROCESSOR)." With this modification, the system will be referred to as "TRIAD SIFT." The three-way vote in TRIAD SIFT will be indicated by appending an * after the subscript letter denoting the version (e.g., VB*). The VB*(3, NV, 6, 6) vote times for the TRIAD SIFT are compared with
VAR BT: ARRAY[0..MAXBUFFERS] OF RECORD
          DBX: INTEGER;
          AD: ARRAY[0..MAXPROCESSOR] OF INTEGER;
        END;

PROCEDURE VOTE(B: BUFFER; DEFAULT: INTEGER);
VAR I,J,K: INTEGER;
BEGIN
  J := 0; I := 0;
  REPEAT
    K := BT[B].AD[I];              (* DATAFILE ADDRESS OF BUFFER B *)
    IF K >= 0 THEN
      BEGIN
        J := J + 1;
        P[J] := I;                 (* SAVE PROCESSOR NUMBER *)
        V[J] := DATAFILE[K];       (* RETRIEVE DATA FROM DATAFILE *)
      END;
    I := I + 1;
  UNTIL (J=5) OR (I>MAXPROCESSOR); (* UNTIL 5 VALUES FOUND OR SEARCH DONE *)
  (* CALL APPROPRIATE VOTE FUNCTION; i.e. 5-WAY OR 3-WAY *)
END;

Figure 11. Vote procedure in Version B.
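
The behavior of the retrieval loop in figure 11 can be illustrated with a short sketch. The following Python fragment is not part of the SIFT software; it only counts loop iterations under stated assumptions: the AD field has eight entries, index 0 stands for processor 1, and the addresses produced by the hypothetical helper make_ad are arbitrary nonnegative numbers.

```python
MAXPROCESSOR = 7  # highest index of the eight-entry AD field, as in figure 11


def retrieval_iterations(ad, stop_count=5):
    """Count REPEAT-UNTIL iterations of the figure 11 retrieval loop.

    ad         -- eight datafile addresses for one buffer; -1 marks a
                  processor that does not produce the buffer
    stop_count -- 5 for Version B, 3 for the TRIAD SIFT loop test
    """
    found = 0
    i = 0
    while True:
        if ad[i] >= 0:
            found += 1
        i += 1
        if found == stop_count or i > MAXPROCESSOR:
            return i


def make_ad(producers):
    """Hypothetical helper: build an AD field for processors numbered 1..8."""
    return [100 + p if p in producers else -1 for p in range(1, 9)]


five_way = make_ad({1, 2, 3, 4, 6})   # five replicates, HP = 6
three_way = make_ad({1, 2, 3})        # three replicates only

print(retrieval_iterations(five_way))                 # stops at HP
print(retrieval_iterations(three_way))                # scans all eight entries
print(retrieval_iterations(three_way, stop_count=3))  # TRIAD SIFT stops early
```

Under these assumptions the five-way search ends as soon as the fifth value is found, the three-way search always examines all eight entries, and the TRIAD SIFT test ends the three-way search after the third value.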
[Figure 12 plot: vote time (ms) versus number of data values voted (NV), 0 to 6, for VB(3, NV, x, 6) and TRIAD SIFT. Both curves start at 0.038 ms; labeled points include 0.451 ms for VB(3, NV, x, 6) and 0.390, 1.80, and 2.15 ms for TRIAD SIFT.]
Figure 12. Vote times for three-way SIFT and TRIAD SIFT.
the normal SIFT vote times in figure 12. From this graph, the advantage of the above software modification when only three-fold redundancy is needed is clearly seen.

Because the vote time is a linear function of NV, the following formula is valid:

VB(VS, NV, HP, NW) = CB(VS, HP, NW) · NV + B

where CB(VS, HP, NW) is the slope of the line and B is the y-intercept. The notation CB essentially represents the vote time per buffer and will be referred to as the vote cost. The symbol B represents the basic overhead when no buffers are voted. It is independent of the other parameters of VB and always has the value 0.038 ms.

Next, the impact of the position of the tasks in the schedule table on the vote time is illustrated. In a six-processor configuration, there are six ways to assign the five task replicates to the processors. However, the only factor which influences the vote time is the highest numbered processor which is running the task; i.e., in all the following five assignments of task t1, processor 6 is the highest processor executing the task, and thus all five assignments have the same vote time.

        Processor number
 1    2    3    4    5    6
 -    t1   t1   t1   t1   t1
 t1   -    t1   t1   t1   t1
 t1   t1   -    t1   t1   t1
 t1   t1   t1   -    t1   t1
 t1   t1   t1   t1   -    t1

For each of the above task assignments, HP = 6. For the remaining assignment

        Processor number
 1    2    3    4    5    6
 t1   t1   t1   t1   t1   -

the highest processor is 5; thus, HP = 5. The vote costs CB(5, HP, 6), CB(3, HP, 6), and CB*(3, HP, 6) are given as a function of HP in figure 13. (Note that HP is the physical processor number, which is derived from the slot position on the broadcast bus. Therefore, even if the number of working processors is less than eight, HP can be made equal to 8 by placing one of the good processors in slot 8.) The vote times in Version B are independent of NW. As can be seen in figure 13, the vote costs can be dependent on HP and therefore on the task schedule constructed by the application designer. This is not a desirable attribute, since a task which runs within its time limit in one schedule may not have enough time to run if the schedule is slightly modified.

It is noteworthy that the HP dependency arises from the (J=5) test of the data retrieval loop. By simply removing this test from the Boolean expression, this dependency can be eliminated. The vote time would then always be worst case (i.e., CB(5, MAXPROCESSOR, 6)), but the verification process would be simpler.

Version R vote times. In Version R, a minor modification was made to the data retrieval loop code. (See fig. 14.) This modification reduced the vote time in certain cases but introduced the additional complexity that the vote time depends on NW. The data retrieval loop in Version R differs from that in Version B in two significant aspects. First, the loop termination logic now refers to NW rather than MAXPROCESSOR. Second, the variable I refers to virtual processor number rather than physical processor number as in Version B. This is a consequence of the new design. The BT array in Version R is indexed by the number of working processors, NW, and buffer number, B. (See the section on reconfiguration overhead.) The BT array in Version B was indexed by buffer number alone. Thus, whereas the Version B vote times were influenced by the highest physical processor producing data, the Version R vote times are influenced by the highest virtual processor number. Thus in this section, HP will become HVP, which denotes the highest virtual processor which produces the data being voted. In figure 15, the vote costs CR(3, 3, NW), CR(5, 5, NW), and CR(5, 6, 6) are given as a function of the number of working processors, NW. The value of CR(5, 6, 6) is greater than that of CR(5, 5, 6) because an extra BT array access is needed. The three-way vote cost dependency on NW occurs because the BT array search loop terminates when I becomes greater than NW. In figure 16, the effect of HVP is illustrated. The three-way vote cost CR(3, HVP, NW) is seen to be independent of HVP. However, as can be seen in figure 17, the TRIAD SIFT three-way vote cost is dependent on the parameter HVP. In figure 18, TRIAD SIFT is compared with Version R SIFT.

The NW dependency of this version arises from the (I>NW) test in the REPEAT-UNTIL loop. By returning this to the (I>MAXPROCESSOR) test, the NW dependency can be eliminated. Although the (I>NW) test results in a reduced vote time for smaller SIFT configurations (e.g., after several reconfigurations), additional vote overhead complexity
[Figure 13 plot: vote cost versus highest processor in vote (HP), 2 to 8, with curves for CB(5, HP, 6), CB(3, HP, 6), and CB*(3, HP, 6).]
Figure 13. Vote costs for CB(5, HP, 6), CB(3, HP, 6), CB*(3, HP, 6).
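
As a worked illustration of the linear vote-time formula, the following Python sketch evaluates it for the Version B values quoted in this memorandum for six working processors and HP = 6 (0.412 ms per five-way buffer, 0.413 ms per three-way buffer, 0.352 ms per TRIAD SIFT buffer, and B = 0.038 ms). The function and variable names are illustrative only and do not come from the SIFT software.

```python
BASIC_OVERHEAD_MS = 0.038   # B: measured vote time when no buffers are voted

# Vote cost per buffer (ms), Version B, six working processors, HP = 6.
VOTE_COST_MS = {
    "five-way": 0.412,
    "three-way": 0.413,
    "three-way (TRIAD SIFT)": 0.352,
}

SUBFRAME_MS = 3.2


def vote_time_ms(kind, nv):
    """Linear model: vote cost times number of buffers, plus basic overhead."""
    return VOTE_COST_MS[kind] * nv + BASIC_OVERHEAD_MS


for kind in VOTE_COST_MS:
    for nv in (1, 6):
        t = vote_time_ms(kind, nv)
        share = 100 * t / SUBFRAME_MS
        print(f"{kind:24s} NV={nv}: {t:.3f} ms ({share:.0f}% of a subframe)")
```

The model gives 0.450 ms and 2.51 ms for one and six five-way buffers, and 0.390 ms and 2.15 ms for TRIAD SIFT, which agrees with the end points plotted in figures 9 and 12.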
VAR BT: ARRAY[0..MAXPROCESSOR,0..MAXBUFFERS] OF INTEGER;

PROCEDURE VOTE(B: BUFFER; DEFAULT: INTEGER);
VAR I,J,K: INTEGER;
BEGIN
  J := 0; I := 0;
  K := BT[NW,B];                       (* BIT VECTOR OF DATA PRODUCING PROCESSORS *)
  REPEAT
    IF ODD(K) THEN                     (* PROCESSOR I COMPUTES BUFFER *)
      BEGIN
        J := J + 1;
        P[J] := I;                     (* SAVE PROCESSOR NUMBER *)
        V[J] := DATAFILE[VTODF[I]+B];  (* RETRIEVE DATA FROM DATAFILE *)
      END;
    K := K DIV 2;                      (* SHIFT NEXT PROCESSOR BIT TO LSB *)
    I := I + 1;
  UNTIL (J=5) OR (I>NW);               (* UNTIL 5 VALUES FOUND OR SEARCH DONE *)
  (* CALL APPROPRIATE VOTE FUNCTION; i.e. 5-WAY OR 3-WAY *)
END;

Figure 14. Vote procedure in Version R.
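
The bit-vector search of figure 14 can be mimicked in a few lines of Python to show why the three-way vote cost grows with NW while the five-way cost does not. This is a sketch only: the function name is hypothetical, and virtual processors are numbered from 0 to follow the loop variable I in the listing.

```python
def producers_from_bit_vector(k, nw, stop_count=5):
    """Return (virtual processors found, loop iterations) for bit vector k."""
    found = []
    i = 0
    while True:
        if k % 2 == 1:      # ODD(K): virtual processor i produces the buffer
            found.append(i)
        k //= 2             # K DIV 2: shift the next processor bit to the LSB
        i += 1
        if len(found) == stop_count or i > nw:
            return found, i


# Three producers: the search runs until I > NW, so it lengthens with NW.
print(producers_from_bit_vector(0b000111, nw=3))
print(producers_from_bit_vector(0b000111, nw=6))

# Five producers: the search stops at the fifth value regardless of NW.
print(producers_from_bit_vector(0b011111, nw=6))

# Highest producer one position later: one extra loop pass is needed.
print(producers_from_bit_vector(0b101111, nw=6))
```

The last two calls correspond to the difference between CR(5, 5, 6) and CR(5, 6, 6): moving the highest virtual processor up by one costs one additional pass through the loop.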
[Figure 15 plot: vote cost versus number of processors in system (NW), 2 to 6, with curves CR(5, 6, 6), CR(5, 5, NW), and CR(3, 3, NW). Labeled values include 374 and 357 for the five-way curves and 302, 285, 269, and 252 for CR(3, 3, NW).]
Figure 15. Vote costs of optimized system.

[Figure 16 plot: vote cost versus highest virtual processor in vote (HVP), 2 to 6, with curves CR(5, HVP, 6), CR(3, HVP, 6), CR(3, HVP, 5), CR(3, HVP, 4), and CR(3, HVP, 3). Labeled values include 374, 357, 302, 285, 269, and 252.]
Figure 16. Vote costs for CR(5, HVP, 6) and CR(3, HVP, NW).

[Figure 17 plot: vote cost versus number of processors in system (NW), 2 to 6, with one curve per highest virtual processor in vote (HVP). Labeled values include 298, 264, and 247.]
Figure 17. Vote costs for CR*(3, HVP, NW) of TRIAD SIFT.

[Figure 18 plot: vote cost versus highest virtual processor in vote (HVP), 2 to 6, with curves CR(5, HVP, 6), CR(3, HVP, 6), and CR*(3, HVP, 6). Labeled values include 374, 357, 302, 298, 281, 264, and 247.]
Figure 18. Vote costs for CR(5, HVP, 6), CR(3, HVP, 6), CR*(3, HVP, 6).
VAR BT: ARRAY[0..MAXPROCESSOR,0..TASKS] OF INTEGER;

PROCEDURE VOTE(TK: TASK; DEFAULT: INTEGER);
VAR I,J,K: INTEGER;
    B: BUFFER;
BEGIN
  J := 0; I := 0;
  K := BT[NW,TK];            (* BIT VECTOR OF DATA-PRODUCING PROCESSORS *)
  REPEAT
    IF ODD(K) THEN           (* PROCESSOR I EXECUTED TASK TK *)
      BEGIN
        J := J + 1;
        P[J] := I;           (* SAVE VIRTUAL PROCESSOR NUMBER *)
        DF[J] := VTODF[I];   (* SAVE DATAFILE OFFSET *)
      END;
    K := K DIV 2;            (* SHIFT NEXT PROCESSOR BIT TO LSB *)
    I := I + 1;
  UNTIL (J=5) OR (I>NW);
  I := TT[TK].BUFS;          (* RETRIEVE INDEX OF FIRST BUFFER OF TASK *)
  B := BINF[I];              (* RETRIEVE BUFFER NUMBER *)
  WHILE B > 0 DO
    BEGIN
      (* CALL APPROPRIATE VOTE PROCEDURE, i.e. 5-WAY OR 3-WAY *)
      I := I + 1;
      B := BINF[I]           (* RETRIEVE NEXT BUFFER NUMBER *)
    END;
END;

Figure 19. Vote procedure in Version V.
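
The structural change in figure 19, decoding the producer set once per task and then voting each of the task's buffers, can be sketched as follows. This Python fragment is an illustration under assumptions, not the SIFT code: the majority function stands in for the VOTE3 and VOTE5 routines (which are not listed in this memorandum), the buffer list replaces the BINF sentinel scan, and the 128-word mailbox layout is hypothetical.

```python
from collections import Counter


def majority(values):
    """Hypothetical exact-match vote: return the most common value."""
    return Counter(values).most_common(1)[0][0]


def vote_task(task_mask, nw, buffers, vtodf, datafile, stop_count=5):
    """Vote every buffer of one task, decoding the producer set only once."""
    # Data retrieval overhead (the vote bias): one pass over the bit vector.
    offsets = []
    i = 0
    while True:
        if task_mask % 2 == 1:          # virtual processor i ran the task
            offsets.append(vtodf[i])    # save its datafile offset
        task_mask //= 2
        i += 1
        if len(offsets) == stop_count or i > nw:
            break
    # Per-buffer vote cost: fetch the replicates and vote them.
    return {b: majority([datafile[off + b] for off in offsets]) for b in buffers}


# Hypothetical layout: eight mailboxes of 128 words each.
vtodf = [128 * p for p in range(8)]
datafile = [0] * (128 * 8)
for p, value in zip((0, 1, 2), (42, 42, 41)):   # processor 2 disagrees on buffer 5
    datafile[vtodf[p] + 5] = value
for p in (0, 1, 2):
    datafile[vtodf[p] + 9] = 7

print(vote_task(0b000111, nw=6, buffers=[5, 9], vtodf=vtodf, datafile=datafile))
```

The bit-vector loop runs once however many buffers the task owns, which is why the configuration dependency appears only in a fixed per-task term and not in the cost per buffer.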
is also obtained. This overhead complexity significantly increases the effort to validate the system, since one must insure that adequate time is present for all the tasks to run under all possible configurations and all possible schedules that the system may encounter. Perhaps the workload requirements of the degraded configurations will require this reduced overhead (i.e., for small values of NW), since processing power is scarce, but a serious price is paid for this during the validation effort.

Version V vote times. In Version V, the explicit purpose of the operating system modification was to decrease vote costs. An analysis of the VOTE procedure showed that a great deal of time was spent indexing into the BT array for each buffer. After the modifications of Version R, the information in the BT array was reduced to a bit vector representing the set of processors which produces the buffer. Since the set of processors that runs a task which computes a buffer is equivalent to the set of processors that produces the buffer, the approach taken was to modify the BT array so the VOTE procedure could manipulate a task instead of a buffer. (See fig. 19.)

The BT array was modified to be indexed by task rather than by buffer number, and the vote schedule was changed to a list of task names instead of buffer numbers. As seen in figure 20, although Version V pays an initial penalty and actually takes longer when voting one buffer, it shows significant improvement over Version R (and therefore also over Version B) for more than one buffer. In Versions B and R, the basic overhead time B is a constant, but in Version V, all the configuration dependency is in the basic overhead. Thus, the following formula describes the Version V vote overhead VV:

VV = CV(VS) · NV + BV(VS, HVP, NW)

Since BV is no longer a constant overhead, it will be referred to as vote bias. Figure 21 shows this vote bias relationship to VS, HVP, and NW. BV is dependent on NW, as shown by the solid lines BV(5, HVP=NW, NW) and BV(3, x, NW). BV is only dependent on HVP for NW = 6 in a five-way vote. For TRIAD SIFT, BV* is independent of NW and only dependent on HVP, as shown by the dashed lines BV*(3, x, NW). The BV term only contains the data retrieval overhead. The actual time spent voting is represented by CV and is dependent only on the type of vote done. For a three-way vote, CV(3) = 0.079 ms. A five-way vote has a cost, CV(5), of 0.107 ms.

The redesign of the vote system moved the dependencies on NW and HVP to the vote bias. However, as in the other versions, this dependency can be eliminated by removing the (I>NW) and (J=5) tests from the UNTIL loop.

TABLE I. ERROR IMPACT ON THREE-WAY VOTE

Faulty          Increase in
processor(s)    vote time, ms
1               0.039
2                .031
3                .031
1,2              .096

TABLE II. ERROR IMPACT ON FIVE-WAY VOTE

Faulty          Increase in
processor(s)    vote time, ms
1               0.040
2                .032
3                .032
4                .032
5                .032
1,2              .079
1,3              .072
1,4              .072
1,5              .079
2,3              .071
2,4              .063
2,5              .063
3,4              .063
3,5              .061
4,5              .061
1,2,3            .072
1,2,4            .064
1,2,5            .064
1,3,4            .064
1,3,5            .064
1,4,5            .072
2,3,4            .064
2,3,5            .064
2,4,5            .064
3,4,5            .064

Error impact on vote time. The vote time measurements given in the previous sections were made only when the data replicates were identical. Unfortunately, the vote time is increased if some of
[Figure 20 plot: vote time (ms) versus number of data values voted (NV), 0 to 6, for VR(5, NV, 6, 6) and VV(5, NV, 6, 6). Both curves are straight lines; the VR line starts at 0.038 ms and rises to 2.28 ms, and the VV line rises to 0.986 ms.]
Figure 20. Comparison of VV and VR vote times.
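
The crossover between the two versions can be checked with a small calculation. The Python sketch below uses the best-case five-way values reported in this memorandum for a six-processor configuration: 0.357 ms per buffer for Version R, 0.107 ms per buffer for Version V, and a 0.328-ms initial overhead for Version V. It assumes that the 0.038-ms basic overhead measured for Version B also applies to Version R and that the 0.328-ms figure is the vote bias BV. The names are illustrative only.

```python
# Best-case five-way values for a six-processor configuration (ms).
C_R, B_R = 0.357, 0.038   # Version R: vote cost per buffer, basic overhead (assumed)
C_V, B_V = 0.107, 0.328   # Version V: vote cost per buffer, vote bias (assumed)


def vr_ms(nv):
    """Version R vote time for nv buffers under the linear model."""
    return C_R * nv + B_R


def vv_ms(nv):
    """Version V vote time for nv buffers under the linear model."""
    return C_V * nv + B_V


break_even = (B_V - B_R) / (C_R - C_V)   # NV at which the two lines cross

for nv in range(1, 7):
    print(f"NV={nv}: Version R {vr_ms(nv):.3f} ms, Version V {vv_ms(nv):.3f} ms")
print(f"break-even at NV = {break_even:.2f}")
```

With these values the lines cross between one and two buffers, so Version V is slower only when a single buffer is voted.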
[Figure 21 plot: vote bias versus number of processors in system (NW), 2 to 6, with solid lines for the five-way and three-way vote bias BV and dashed lines for the TRIAD SIFT vote bias BV* at each HVP. The plotted values are not legible in this copy.]
Figure 21. Vote bias dependency on NW and HVP in VV.
the data values are erroneous. Furthermore, the increase is slightly dependent on the particular processor which generated the erroneous data because of the structure of the IF_THEN_ELSE statements in the VOTE procedure. The increases in vote time are given in table I and table II. From the tables it is obvious that the application designer must be very careful to insure that the worst-case vote time is accommodated when generating the vote and task schedule tables.

Executive Task Overhead

Reconfiguration overhead. To maintain a high level of reliability, the SIFT Global Executive removes faulty processors from the system, or reconfigures. The process is divided into three tasks. The Error Task transfers the local error data to the Global Executive via an error report. The Fault Isolation Task uses the error report from each processor to locate faulty processors. The Reconfiguration Task determines if a reconfiguration is necessary. During normal operation, each Global Executive task uses one 3.2-ms task slot. If a fault occurs and results in a reconfiguration, the Reconfiguration Task utilizes resources significantly in excess of 3.2 ms. The exact time for reconfiguration depends on the number of working processors, but the worst case was found in Version B to be 35.19 ms or 11 subframes. (See fig. 22.) Since the scheduling is static, it must be based on worst-case performance, and hence 11 subframes must be dedicated to the Reconfiguration Task, even though for the vast proportion of time they are not being utilized. However, in Version B, rather than dedicate 11 subframes, the real-time clock interrupt was disabled during the Reconfiguration Task. This allowed the task to take as much time as necessary, even though it was allocated to only one subframe. This was clearly unacceptable, since a serious disruption of output data would occur during the reconfiguration process. Therefore, the SIFT Reconfiguration Task was redesigned at the NASA Langley Research Center.

[Figure 22 plot: reconfiguration time (ms) versus number of processors before reconfiguration, for Version B and Version R. Version B values include 27.28, 30.58, 31.58, and 35.19 ms; Version R values include 2.17, 2.26, 2.35, and 2.47 ms.]
Figure 22. Reconfiguration times for Versions B and R.

As stated above, the worst-case time for a reconfiguration is equivalent to eleven 3.2-ms subframes under Version B. Most of this time is spent reconstructing the BT array. The reconstruction is needed because the physical to virtual processor mapping changes after every reconfiguration. This mapping is necessary because the voter operates on physical processors, whereas the task schedules (and therefore, the processor data production information) are necessarily built in reference to virtual processors. The Version R design enables the Reconfiguration Task to execute in one 3.2-ms subframe.

The major points of the Version R redesign are

1. A buffer number defines the location of the buffer within the processor mailbox; i.e., each buffer item is assigned a unique mailbox address.
2. Array VTODF maps the virtual processor number to the corresponding physical processor mailbox offset in the datafile.
3. Arrays RTOV and VTOR map real to virtual processor numbers and vice versa, respectively.
4. The BT array is restructured to hold the virtual processor data production information for every configuration level.
5. The voter operates on virtual processors, and hence the ERROR array is represented in terms of virtual processors.
6. The Error Task translates the ERROR array information to physical processor numbers while constructing the error report.

A minor loss in generality comes about from point (1) above. Under Version B, two or more buffers could be assigned to the same mailbox location. This would be necessary if, for example, the system required more than 128 data buffers. This condition is not conceptually restrictive, since the datafile could be enlarged to 32K words (or more) to allow 4000 buffers per processor. The BT array under Version R of the SIFT Operating System now has the form

BT: ARRAY[PROCESSOR,BUFFER] OF INTEGER;

For a configuration level of NW processors and buffer B, BT[NW,B] contains a bit map of the virtual processors that produce buffer B. Under Version R, the BT array is filled during system initialization from task schedule information. The BT then contains valid information for all levels of system configuration. The Reconfiguration Task now builds arrays VTOR, RTOV, and VTODF. During a reconfiguration, since the BT array no longer needs to be rebuilt, the only chore is to select the proper task and vote schedules. The execution time for the Reconfiguration Task was reduced to about 2.5 ms for all levels of initial configuration.

Interactive Consistency overhead. Data values from the external environment are unreplicated and must be transferred to all computers of the system in a consistent manner; i.e., all computers must receive the same value. This could be accomplished by every processor reading the external source independently or by one processor reading the external source and then distributing the obtained value to the rest of the processors. In the first case, each processor might get a different value because of the inherent uncertainty of reading analog data. Hence, a subsequent exchange of the values read, along with a midvalue selection, is required to produce a value which is consistent across all processors. However, if one of the processors is malicious, i.e., sends different values to different processors, then the good processors can still end up with different values. (See fig. 23.) The second method will produce similar erroneous results if the single "input" processor is malicious. Note that although the good processors each decide on a slightly different value, they are both "good" in that the difference is only the slight difference in the redundant external sources. A midvalue selection on the replicated output channels would always result in a "good" output. However, if exact match voting is used to detect and isolate the fault, then serious problems can result, e.g., a good processor can be reconfigured out of the system. Thus, special "interactive consistency" algorithms are essential in fault-tolerant systems in which fault isolation and reconfiguration are performed. (See refs. 5 and 6.) In systems in which fault-masking is performed but no reconfiguration is attempted, such algorithms are unnecessary.

[Figure 23 diagram: sensors, distribution, processors, received values, and midvalue selected. The three sets of received values (100, 200, 103), (100, 99, 103), and (100, 25, 103) yield selected midvalues of 103, 100, and 100.]
Figure 23. Distributing unreplicated data.

The overhead for these special interactive consistency algorithms can be very large. This overhead is especially severe because the failure modes that the algorithms eliminate may be rare events. In the absence of a thorough analysis demonstrating that the probability of these failure modes is negligible, interactive consistency algorithms must be used. In order to accommodate m faulty processors, the total number of processors, n, must be at least 3m + 1. The number of messages required to obtain interactive consistency is on the order of n^(m+1). Although five-way voting, which can deal with two internal faults, is supported in SIFT, the interactive consistency algorithm that was implemented can only handle one malicious fault. The simple flight control applications currently running in SIFT use 63 external sensor values, each of which goes through the interactive consistency algorithm. This requires 11.8 ms when no disagreements occur in the data. With faulty data present, the Interactive Consistency Tasks can utilize up to 13.4 ms. The interactive consistency algorithm consists of the following steps:

1. The source value is input and distributed to the n processors.
2. The received values are exchanged m times.
3. A consistent value is obtained by use of a recursive algorithm. When m = 1, this reduces to determining a majority value.

The following execution times were measured for SIFT:

Step (1)     3.05 ms
Step (2)     2.22 ms
Step (3)     6.57 ms
Total       11.84 ms

Since the Interactive Consistency Tasks must be executed at the data sample rate, a large portion of the available CPU time is consumed, as shown in the table below.

Data sample period,    Utilization,
ms                     percent
100                    11.8
50                     23.7
33                     35.9
25                     47.4

Execution times of executive tasks. The maximum execution times of the SIFT executive tasks were measured and are tabulated below. The dispatch time represents the amount of time utilized
by the operating system prior to dispatching the executive task. This includes the vote time for this
subframe. Therefore, the dispatch overhead is strongly dependent on the vote schedule. It should be
noted that some variables voted during a particular subframe are not necessarily used by that task. The
execution time column gives the time used by the executive task after being dispatched. The total time
column is the sum of the dispatch time and execution time columns. A single frame is a major frame that
contains one iteration of the sample application set.
A triple frame contains three iterations of the application set and therefore requires three iterations of
the Interactive Consistency Tasks. The Global Executive Tasks are dispatched once every major frame.
In particular, the Reconfiguration Task is executed at the major frame rate. Preliminary design studies
recommended a major frame period of 100 ms in order to achieve the reliability requirements of SIFT.
Under Version B, the application set of four tasks utilizes seven 3.2-ms subframes. The executive overhead then is

                           Dispatch   Execution   Total      No. of
                           time,      time,       time,      3.2-ms
Version B subsystem        ms         ms          ms         subframes
Interactive Consistency    1.7        11.8        13.5        5
Error Task                  .3          .3          .6        1
Fault Isolation Task        .3         2.4         2.7        1
Clock Synchronization       .3         2.4         2.7        1
Reconfiguration Task       1.1        34.1        35.2       11
Total                                                        19

19 subframes = 60.8 ms = 73.2% of an 83.2-ms single frame
29 subframes = 92.8 ms = 58.0% of a 160.0-ms triple frame

Under Version R, the application set of four tasks utilizes twelve 1.6-ms subframes. The executive overhead then is

                           Dispatch   Execution   Total      No. of
                           time,      time,       time,      1.6-ms
Version R subsystem        ms         ms          ms         subframes
Interactive Consistency    1.4        11.8        13.2       10
Error Task                  .9          .3         1.2        2
Fault Isolation Task       1.4         2.4         3.8        3
Clock Synchronization       .3         2.4         2.7        2
Reconfiguration Task        .9         2.4         3.3        3
Total                                                        20

20 subframes = 32.0 ms = 62.5% of a 51.2-ms single frame
40 subframes = 64.0 ms = 52.6% of a 121.6-ms triple frame

Under Version V, the application set of four tasks utilizes ten 1.6-ms subframes. The executive overhead then is

                           Dispatch   Execution   Total      No. of
                           time,      time,       time,      1.6-ms
Version V subsystem        ms         ms          ms         subframes
Interactive Consistency    1.3        11.8        13.1       10
Error Task                  .7          .3         1.0        2
Fault Isolation Task        .7         2.4         3.1        2
Clock Synchronization       .3         2.4         2.7        2
Reconfiguration Task        .7         2.4         3.1        2
Total                                                        18

18 subframes = 28.8 ms = 64.3% of a 44.8-ms single frame
38 subframes = 60.8 ms = 55.9% of a 108.8-ms triple frame
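
The frame sizes and overhead percentages quoted with the tables can be reproduced from the subframe counts. The Python sketch below does so under one assumption that is inferred from the text rather than stated as a formula: a triple frame repeats only the application subframes and the Interactive Consistency subframes, while the other executive tasks run once per major frame. The names are illustrative only.

```python
# (subframe length in ms, application subframes per iteration,
#  executive subframes per single frame, of which Interactive Consistency)
VERSIONS = {
    "B": (3.2, 7, 19, 5),
    "R": (1.6, 12, 20, 10),
    "V": (1.6, 10, 18, 10),
}


def frame_overhead(version, iterations):
    """Return (overhead ms, frame ms, overhead percent) for a major frame."""
    subframe_ms, app, executive, ic = VERSIONS[version]
    # Assumption: each extra iteration repeats only the Interactive
    # Consistency subframes; the other executive tasks run once per frame.
    exec_subframes = executive + (iterations - 1) * ic
    total_subframes = exec_subframes + iterations * app
    overhead_ms = exec_subframes * subframe_ms
    frame_ms = total_subframes * subframe_ms
    return overhead_ms, frame_ms, 100 * overhead_ms / frame_ms


for version in VERSIONS:
    for iterations in (1, 3):
        o, f, pct = frame_overhead(version, iterations)
        print(f"Version {version}, {iterations} iteration(s): "
              f"{o:.1f} ms of {f:.1f} ms = {pct:.1f}%")
```

The computed overhead and frame lengths match the quoted values for all three versions. The Version B single-frame ratio evaluates to about 73.1 percent, marginally below the tabulated 73.2 percent.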
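
The interactive consistency figures reported in the section "Interactive Consistency overhead" can also be checked numerically. The Python sketch below is illustrative only: it applies a midvalue selection to the three sets of received values read from figure 23, evaluates the n >= 3m + 1 bound, and recomputes the CPU utilization from the three measured step times. The function names are hypothetical.

```python
def midvalue(values):
    """Midvalue selection: the median of an odd number of readings."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


# Values received by three processors when one source is malicious and
# reports a different value to each of them (as read from figure 23).
received = [(100, 200, 103), (100, 99, 103), (100, 25, 103)]
print([midvalue(r) for r in received])   # the selections disagree


def min_processors(m):
    """Processors needed to tolerate m malicious faults: n >= 3m + 1."""
    return 3 * m + 1


IC_STEP_MS = (3.05, 2.22, 6.57)          # measured steps (1), (2), and (3)


def ic_utilization_percent(sample_period_ms):
    """Share of CPU time taken by one interactive consistency iteration."""
    return 100 * sum(IC_STEP_MS) / sample_period_ms


print(min_processors(1))
for period in (100, 50, 33, 25):
    print(f"{period} ms period: {ic_utilization_percent(period):.1f}%")
```

The midvalue selections differ across processors even though each is individually reasonable, which is the situation that exact-match voting would misread as a fault.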
The overhead improvement in the subsequent versions of the operating system is readily seen in the decrease in the length of a triple frame. The decrease from Version B to Version R (i.e., 160.0 ms to 121.6 ms) is a result of the improved Reconfiguration Task. The further decrease from Version R to Version V (i.e., 121.6 ms to 108.8 ms) resulted from the decrease in vote time and the consequent ability to squeeze several tasks into one less subframe each. The higher percentage overhead of Version V results from the smaller major frame size and not from any increased inefficiency.

The impact of the fault tolerance mechanisms on performance can be seen by comparison with an equivalent simplex system. The overhead of such a simplex system is easily calculated from the available data. Without voting, the dispatch overhead would be about 270 µs, or less than 10 percent of a 3.2-ms subframe. The time needed to execute the four sample application tasks would be approximately four subframes, or 12.8 ms. A communications task, the equivalent of Interactive Consistency Task 1 (IC1), would still be needed and would require at most one 3.2-ms subframe. The executive task overhead would then be 20 percent of a small 16.0-ms major frame or 3.2 percent of a larger 100-ms major frame.

Discussion of Results

The SIFT operating system utilizes significant CPU resources to achieve fault tolerance in software. This overhead falls in two main categories:

1. The time required to vote the intertask communication variables at the beginning of each subframe
2. The time utilized by the executive tasks, especially the Interactive Consistency Tasks

The overhead from the first category, vote overhead, was found to be a linear function of the amount of data to be voted. Unfortunately, for as few as six data buffers, the vote overhead was in excess of 30 percent of a 3.2-ms subframe. The vote times were measured for three versions of the operating system. Also, a slight modification to the VOTE routine was discovered which enables a more efficient three-way vote if the system is run with only three-way voting, i.e., no five-way replicated tasks. This modification is referred to as TRIAD SIFT. The following table gives a basic comparison of the vote times (best values) for all versions of SIFT investigated with a six-processor configuration:

             Three-way vote    Five-way vote
             time per          time per
Version      buffer, ms        buffer, ms
B            0.413             0.412
TRIAD-B       .352
R             .302              .357
TRIAD-R       .247
V            a.079             b.107
TRIAD-V      a.079

a Does not include 0.245-ms initial overhead.
b Does not include 0.328-ms initial overhead.

The vote times were found to vary with the number of processors in the configuration and the location of the task replicates in the schedule table. These variations were typically on the order of 10 to 25 percent.

The overhead due to the second category, the executive task overhead, is given below. The Interactive Consistency Tasks were the major contributors in this category and accounted for 56 percent of the executive task overhead in the optimized Version V. The overhead when a single frame is scheduled is

                                      Overhead,
           Overhead,    Frame size,   percent of
Version    ms           ms            frame size
B          60.8         83.2          73.2
R          32.0         51.2          62.5
V          28.8         44.8          64.3

The reduced major frame size for Versions R and V arises because the improved vote performance enables some tasks to be scheduled in fewer subframes. The overhead for a triple frame is

                                      Overhead,
           Overhead,    Frame size,   percent of
Version    ms           ms            frame size
B          92.8         160.0         58.0
R          64.0         121.6         52.6
V          60.8         108.8         55.9

Concluding Remarks

The Software Implemented Fault Tolerance (SIFT) computer system requires significant overhead to achieve its fault tolerance. Several versions of SIFT--Versions B, R, and V--evolved as improvements were made to reduce this overhead. Version B
is the original delivered version of SIFT. This version only runs by disabling the clock interrupt function while many of the executive tasks are executing. This disabling of interrupts is unacceptable, since it seriously disrupts the cyclic output of the application tasks. To eliminate this problem, the system was redesigned to produce Version R. A drastic reduction in the Reconfiguration Task overhead was obtained, and feasible schedules were constructed without disabling of the interrupts. Finally, a redesign of the vote subsystem resulted in Version V.

The voting and interactive consistency functions were found to be the primary sources of operating system overhead. Unfortunately, these functions seem to be inherently expensive when implemented in software. Several modifications were made to the vote subsystem, but only moderate improvements were obtained. Even in the improved system, with as few as six input variables, the five-way vote time can consume over 30 percent of a 3.2-ms subframe. The Interactive Consistency Tasks require 13.1 ms for every iteration of the applications task set (Version V). The Interactive Consistency Tasks along with the other Global Executive Tasks consume at least 55.9 percent of each major frame. By contrast, a simplex system with no voting or redundancy management would use less than 10 percent of each subframe during scheduling. If a single communications task similar to Interactive Consistency Task 1 (IC1) is utilized, the executive task overhead is reduced to about 20 percent of a single 16.0-ms major frame or 3.2 percent of a 100-ms major frame. The fault-tolerant requirements of the SIFT system produce an overhead at least 3 times that of conventional systems. There appears to be little hope for improvement of these figures without additional hardware support.

The vote time dependency on the schedule table and on the number of working processors in the system is a serious obstacle to validation. One

problem is further compounded because erroneous data also increase the vote time. The marginal increase in performance gained by adding software shortcuts is offset by the increased effort required for validation. By designing the vote system so the vote time is constant and independent of the schedule table or number of working processors, the complexity is reduced and the multiplicity of test modes is eliminated. Of course, the vote time is then always worst case.

NASA Langley Research Center
Hampton, VA 23665
December 10, 1984

References

1. Wensley, J. H.; Levitt, K. N.; Green, M. W.; Goldberg, J.; and Neumann, P. G.: Design of a Fault Tolerant Airborne Digital Computer. Volume I--Architecture. NASA CR-132252, 1973.
2. Ratner, R. S.; Shapiro, E. B.; Zeidler, H. M.; Wahlstrom, S. E.; Clark, C. B.; and Goldberg, J.: Design of a Fault-Tolerant Airborne Digital Computer. Volume II--Computational Requirements and Technology. NASA CR-132253, 1973.
3. Wensley, J. H.; Goldberg, J.; Green, M. W.; Kautz, W. H.; Levitt, K. N.; Mills, M. E.; Shostak, R. E.; Whiting-O'Keefe, P. M.; and Zeidler, H. M.: Design Study of Software-Implemented Fault-Tolerance (SIFT) Computer. NASA CR-3011, 1982.
4. Goldberg, Jack; Kautz, William H.; Melliar-Smith, P. Michael; Green, Milton W.; Levitt, Karl N.; Schwartz, Richard L.; and Weinstock, Charles B.: Development and Analysis of the Software Implemented Fault-Tolerance (SIFT) Computer. NASA CR-172146, 1984.
5. Lamport, Leslie; Shostak, Robert; and Pease, Marshall: The Byzantine Generals Problem. ACM Trans. Program. Languages & Syst., vol. 4, no. 3, July 1982, pp. 382-401.
6. Pease, M.; Shostak, R.; and Lamport, L.: Reaching
must be careful that sufficient processing time is Agreement in the Presence of Faults. J. ACM, vol. 27,
allocated to a task to cover all possible configurations no. 2, Apr. 1980,pp. 228-234.
of SIFT in which the task may run. The validation

1. Report No.: NASA TM-86322
2. Government Accession No.:
3. Recipient's Catalog No.:
4. Title and Subtitle: Measurement of SIFT Operating System Overhead
5. Report Date: April 1985
6. Performing Organization Code: 505-34-13-32
7. Author(s): Daniel L. Palumbo and Ricky W. Butler
8. Performing Organization Report No.: L-15855
9. Performing Organization Name and Address: NASA Langley Research Center, Hampton, VA 23665
10. Work Unit No.:
11. Contract or Grant No.:
12. Sponsoring Agency Name and Address: National Aeronautics and Space Administration, Washington, DC 20546
13. Type of Report and Period Covered: Technical Memorandum
14. Sponsoring Agency Code:
15. Supplementary Notes:
16. Abstract
This paper presents the results of experimentation performed in the Langley Avionics Integration Research
Laboratory (AIRLAB) to measure the overhead of the Software Implemented Fault Tolerance (SIFT)
operating system. During the course of this experimentation, several versions of the operating system
evolved. Each version represents different strategies employed to improve the measured performance. Three
of these versions are analyzed here. The internal data structures of the operating systems are discussed in
sufficient detail to allow the reader to understand the experimental results and appreciate the modifications.
The overhead of the SIFT operating system was found to be of two types--vote overhead and executive task
overhead. Both types of overhead were found to be significant in all versions of the system. Improvements
incorporated at NASA substantially reduced this overhead; even with these improvements, the operating
system consumed well over 50 percent of the available processing time.
17. Key Words (Suggested by Author(s)): Fault tolerance; Performance; Voting; Scheduling; Reconfiguration; Interactive consistency
18. Distribution Statement: Unclassified--Unlimited. Subject Category 62
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of Pages: 26
22. Price: A03
For sale by the National Technical Information Service, Springfield, Virginia 22161
NASA-Langley, 1985

