A task scheduler for a fault-tolerant multiprocessor architecture / by Budidharma, Susatya
Lehigh University
Lehigh Preserve
Theses and Dissertations
1989
Part of the Electrical and Computer Engineering Commons
Recommended Citation
Budidharma, Susatya, "A task scheduler for a fault-tolerant multiprocessor architecture /" (1989). Theses and Dissertations. 4985.
https://preserve.lehigh.edu/etd/4985
A TASK SCHEDULER FOR 
A FAULT-TOLERANT 
MULTIPROCESSOR ARCHITECTURE 
by 
Susatya Budidharma 
A Thesis 
Presented to the Graduate Committee 
of Lehigh University
in candidacy for the degree of 
Master of Science in Electrical Engineering 
Lehigh University 
Bethlehem, Pennsylvania 
1989 
This thesis is accepted and approved in partial fulfillment of the
requirements for the degree of Master of Science in Electrical Engineering.
May 18, 1989
Date
Advisor in Charge
ACKNOWLEDGMENT 
I would like to express my deepest appreciation to my advisor, Dr.
Meghanad D. Wagh, for his time and patience during the development of this
thesis. The completion of this thesis was made possible through his guidance.
I would also like to thank Daniel A. Schwartz and Richard A. Sullivan for
helping me with the PostScript and the Scribe manuscript files, and Monica
A. Newman for providing the copy of the Scribe manual.
I wish to thank my friends, Irwan, Rick, Yosi, Ray, Chandra, Fasil, and
the Edy family for making my stay at Lehigh University a delight. Special
thanks to Irwan for providing the Macintosh, and to Rick for proofreading the
early version of this thesis.
This thesis is dedicated to my parents for all their support, patience,
encouragement, and prayer.
Table of Contents
ACKNOWLEDGMENT
ABSTRACT
1. Introduction
   1.1 Multiprocessor Architectures
   1.2 The NX8086 Architecture
   1.3 Organization of the Thesis
2. Operating System for the NX8086 Architecture
   2.1 Task Definition
   2.2 Scheduling Algorithm
   2.3 Software Fault Tolerance
   2.4 Mutually Exclusive Access
   2.5 System Initialization
   2.6 System Bottlenecks
3. Strategies for Fault Avoidance
   3.1 Simulation Description
   3.2 The Task Scheduling Simulation
      3.2.1 IDLE state
      3.2.2 EXECUTING state
      3.2.3 END_EXECUTING state
   3.3 I/O Simulation
   3.4 Arbitration Logic Simulation
   3.5 Fault Simulation
   3.6 Parameters
   3.7 Simulation Results
4. Initial Set-up of a Parallel Task
   4.1 The FORK-JOIN language constructs
      4.1.1 The Parallel Program Structure using FORK-JOIN Construct
   4.2 The Host-PreProcessor (HPP)
      4.2.1 Input and Output Files of HPP
   4.3 Sample Results
5. Conclusion
   5.1 Discussion
   5.2 Future Extensions
REFERENCES
Appendix A. Grammar Specification for FORK-JOIN language construct.
Appendix B. The Task Scheduler Simulation Specifications
Vita
List of Figures
Figure 1-1: Block diagram of NX8086
Figure 2-1: The data structure of a single node of a task.
Figure 2-2: A task table representation with 6 nodes/subtasks
Figure 2-3: The task scheduler algorithm
Figure 3-1: The precedence graph of input task for the simulation.
Figure 3-2: Dependence of the complete task execution time on the
            complexity of each subtask, t, and number of processors
            when rho = 1 and CO = 3 time units.
Figure 3-3: Dependence of the complete task execution time on the
            complexity of each subtask, t, and number of processors
            when rho = 2 and CO = 3 time units.
Figure 3-4: Dependence of the total execution time on the communication
            overhead, CO, and the number of processors when rho = 1
            and t = 45.
Figure 3-5: Dependence of the task execution time on the execution
            redundancy factor, rho, and the number of processors in
            the presence of 2 processor failures, CO = 3, and t = 45.
Figure 3-6: Dependence of the task execution time on the execution
            redundancy factor, rho, and the number of processors in
            the presence of 3 processor failures, CO = 3, and t = 45.
Figure 3-7: Dependence of the task execution time on the execution
            redundancy factor, rho, and the number of processors in
            the presence of 4 processor failures, CO = 3, and t = 45.
Figure 3-8: Dependence of the task execution time on the number of
            processors when different number of processors fail with
            t = 3, CO = 45, and rho = 2.
Figure 4-1: An example of a precedence graph.
Figure 4-2: FORK-JOIN program representing the same precedence
            information as in Fig 4-1.
Figure 4-3: An example of a more complex precedence graph.
Figure 4-4: FORK-JOIN program representing the same precedence
            information as in Fig 4-3.
List of Tables
Table 1-1: NX8086 Memory Map
Table A-1: Notation used in the productions
ABSTRACT 
This thesis describes the implementation and evaluation of an operating
system and a task scheduler for the NX8086, a fault-tolerant task-flow
parallel computer designed and built at Lehigh University. Taking into account
the major characteristics of the system, namely the absence of any master and
slave relation among the processors and its expandability, the task scheduler
is designed to run independently on each processor. Furthermore, some software
redundancy is added to produce a more reliable system.
A simulator is developed to evaluate the performance of the architecture
in the presence of processor failures. Based on parameters such as the number
of processors, the task duration and the communication overheads, this
simulator allows the system designer to compare and choose the right fault
tolerance strategy for the application.
Chapter 1
Introduction
1.1 Multiprocessor Architectures
The need for computing power has been increasing at an enormously high pace
in recent years. However, single-processor technology has reached a limiting
point and provides only marginal improvement at tremendous cost. This has
focused greater attention on parallel architectures, and in particular on
architectures configured as multiprocessor systems [1].
A multiprocessor system can be defined as a system with multiple processors
communicating through an interconnection network. The interconnection
could be a single common bus, which is simple and less expensive to implement,
or a more complex variation such as a cross-bar switch [2] or a butterfly
network. A multiprocessor system can be further categorized by its memory
structure. It is a centralized memory system if the memory is shared
equally among all the processors. If each processor has its own local memory,
then it is called a distributed memory system. There are also two basic models
for process/task synchronization and communication. In the first, the
processes¹ communicate through shared variables. In the second, process
communication is done through explicit message passing. This thesis
focuses on a multiprocessor architecture that has a centralized memory
structure and adopts the shared memory model for process synchronization.

¹ The names task and process are used interchangeably in this thesis.
One of the main advantages of a multiprocessor system is its ability to
increase processing power by adding more processor modules to the system.
However, there are limitations imposed by both hardware and software
bottlenecks. The hardware bottlenecks are primarily caused by the number of
processors, the memory bandwidth, and the processor-memory interconnections.
On the other hand, the software bottlenecks in a shared memory model
are caused by the maintenance of the tasks or data shared among the
processors. In addition, the processors' accesses to shared memory must be
guaranteed to be mutually exclusive.
The major resource shared by all the processors in the shared memory
model is the ready queue. This queue holds tasks ready for execution. (These
are the tasks whose precedence constraints are satisfied.) Only one idle
processor is allowed to access this queue at any time. There are several
approaches to solve this scheduling problem. One approach is to have a
dedicated processor, called the dispatch processor, handle the task
assignment. As the number of processors increases, the dispatch processor
itself becomes the system bottleneck [3]. Further, if the dispatch processor
fails, the loss can potentially be catastrophic for the system. Thus, from the
reliability point of view, this strategy is very poor. An alternative is to
let all the processors have equal access to the shared resource. Any idle
processor competes for access to the ready queue to get a task to execute. In
this approach all processors run independently, and any processor failure
degrades the system performance gracefully. In other words, the failure of a
processor only affects the execution time but can never cause a catastrophe.
Furthermore, the number of processors in the system is transparent both to
the user and to the task scheduler; hence, the system expandability can be
well maintained.
The central focus of the work reported here is to implement such a task
scheduling system embedded within the operating system, and to study its
performance with and without processor faults. The architecture used for these
studies is the NX8086 multiprocessor system developed at Lehigh University
[4] and described in the next section.
1.2 The NX8086 Architecture 
The NX8086 architecture was designed and fabricated at Lehigh University
around 1987. We call it NX8086 simply because it can be expanded to any
size N and because it currently uses only 8086 processors. However, it should
be mentioned here that the architecture is flexible enough to employ any
compatible processors (those running the same object code) as long as the bus
interface rules are observed.
Unlike conventional parallel architectures, the NX8086 was designed
to get rid of the master and slave relationship between processors. Each
processor can control the system independently through its own copy of the
operating system. The architecture also allows one to exploit parallelism at
the task level.
The overall system block diagram of NX8086 is shown in Figure 1-1. It
has a shared memory module, a control board, and several processor modules,
all interconnected by a shared bus. The current version of NX8086 has four
8086 processors.

[Figure 1-1: Block diagram of NX8086. The memory board (main memory), the
control board (arbitration logic, RS232 I/O, diagnostics and control) and four
processor nodes (Node 0 to Node 3, each an 8086-5 with local memory) share the
system bus; the host PC is attached at the control board.]

The memory module contains the global memory and consists of 64K
bytes of RAM and 8K bytes of ROM. The ROM is used to store the task
scheduler. The RAM is the storage where the task and the data reside. The
control board consists of the arbitration logic, the I/O port, and the
diagnostics and control logic. The arbitration logic controls the shared bus.
It receives the bus requests (BRQ\) from all the processors, prioritizes them
and generates the bus grant (BG\) signal. It also generates the token signal
every time a processor relinquishes the bus with no other processor requesting
it. The I/O port is a high speed serial port (RS232). It can be configured to
communicate with a host machine at rates up to 19200 baud. The diagnostics and
control logic has the clock generator and an LCD display used for hardware
debugging. This logic also handles the system reset and generates the
appropriate signals upon system reset or system power-up. Finally, the
processor modules are the location of the main processing power. Each
processor module has an Intel 8086 microprocessor, a local memory (RAM) of
16K bytes and the bus interface logic.
A complete memory map of the system is provided in Table 1-1.
Logical Address (Hex)   Used (bytes)   Mapped by
00000 ... 3FFFF         16K            Local RAM
40000 ... 7FFFF         n/a            Unmapped
80000 ... 8FFFF         64K            Global RAM
90000 ... EFFFF         n/a            Unmapped
F0000 ... FFFFF         8K             ROM
Table 1-1: NX8086 Memory Map
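The address windows of Table 1-1 can be written down as a small lookup table. The following is only an illustrative sketch: the type and function names (region_t, map_entry_t, nx8086_region) are hypothetical and do not come from the NX8086 operating system; only the address ranges are taken from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Regions of the 20-bit 8086 address space, following Table 1-1.
 * All identifiers here are illustrative assumptions. */
typedef enum {
    REGION_LOCAL_RAM,
    REGION_GLOBAL_RAM,
    REGION_ROM,
    REGION_UNMAPPED
} region_t;

typedef struct {
    uint32_t first;   /* first address of the window (inclusive) */
    uint32_t last;    /* last address of the window (inclusive)  */
    region_t region;
} map_entry_t;

static const map_entry_t nx8086_map[] = {
    { 0x00000u, 0x3FFFFu, REGION_LOCAL_RAM  },
    { 0x40000u, 0x7FFFFu, REGION_UNMAPPED   },
    { 0x80000u, 0x8FFFFu, REGION_GLOBAL_RAM },
    { 0x90000u, 0xEFFFFu, REGION_UNMAPPED   },
    { 0xF0000u, 0xFFFFFu, REGION_ROM        },
};

/* Return the region that a 20-bit physical address falls into. */
region_t nx8086_region(uint32_t addr)
{
    assert(addr <= 0xFFFFFu);   /* the 8086 has 20 address lines */
    for (size_t i = 0; i < sizeof nx8086_map / sizeof nx8086_map[0]; i++) {
        if (addr >= nx8086_map[i].first && addr <= nx8086_map[i].last)
            return nx8086_map[i].region;
    }
    return REGION_UNMAPPED;
}
```

With this table, the physical address 0xFFFF0 that corresponds to the 8086 reset location FFFF:0000 falls inside the ROM window, which is consistent with the loader residing in ROM.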
1.3 Organization of the Thesis
This thesis is concerned with the development of the operating system
for the NX8086 multiprocessor architecture and with the evaluation of the
architecture controlled by this operating system under processor fault
conditions. Chapter two describes the operating system developed. The
expandability of the system is taken into account in the implementation. The
task scheduler, probably the most important component of the operating system
of a task-flow architecture that supports fault tolerance, is also explained
in this chapter. Chapter three discusses the simulation of the scheduler,
including the performance evaluations and some simulation results. Chapter
four describes a preprocessor used by the host to convert a given task into
the format required by the NX8086 operating system of chapter two prior to its
execution. Finally, chapter five summarizes the results obtained and possible
future extensions.
Chapter 2
Operating System for the NX8086 Architecture
To incorporate the fault tolerance ideas within the NX8086 architecture,
an identical operating system is made resident within each processor's local
memory. This operating system is self-sufficient to run the entire system
while providing the required degree of fault tolerance. (In a master-slave
configuration of a multiprocessor system, the master processor is the only one
that runs the operating system.) A single copy of this operating system is
maintained in the EPROM present on the memory module board. At the time of
power-on (or system reset) a chosen processor takes control of the bus and
loads this operating system into each processor's memory.
An earlier version of the operating system for NX8086 was written in
8086 assembly language and was rather rudimentary [4]. As part of the current
project, a complete operating system for NX8086 was developed using the C
language. C was chosen for the efficiency of the generated code and for ease
of modification. The C code is converted to object code before it is written
into the EPROM. It has been verified that the code generated does indeed fit
in the EPROMs provided. However, because of time constraints, we were not able
to verify the correct operation of the architecture under control of this
operating system.
This chapter describes the new operating system for NX8086. It incorporates
the scheduling algorithm, software fault tolerance, system initialization
and system communication with the host.
2.1 Task Definition
NX8086 can be described as a task-flow architecture. A major portion of
the operating system is devoted to task management and task scheduling. In
order to describe the principles of this operating system, we now define a
task as follows.
• A Task is a compound executable code. It consists of one or more
component sequential codes, called Subtasks, which may be executed
in parallel, subject to their interdependency relations, using
shared global data.
In order that the parallelism of the program is exploited effectively by
NX8086, the user must explicitly divide the task into subtasks using the
FORK-JOIN language explained in Chapter 4. Each subtask may be written in any
structured language. When it is compiled, the compiler can be instructed to
separate the global data from the instruction code. In addition to the task's
code, the precedence relations must also be provided. A program called the
Host-PreProcessor (HPP) was developed to help the user prepare the
information needed to build a task. Chapter 4 discusses the HPP program in
detail, along with the description of the input file format and the generated
output files.
The output file from HPP contains the information necessary to build the
task table. It includes the precedence relations information, the beginning
addresses of the subtasks, the size of each subtask's code, and the global
data information. This information is then used by the processor that
downloads the data from the host to global memory to generate a task table.
Fig. 2-2 shows an example representation of a task that contains 6 subtasks.
Fig. 2-1 explains the data structure of a subtask as used by the operating
system. This data structure contains the following information:
• The subtask ID number.
• The number of successor nodes.
• The number of predecessor nodes.
• The pointer to the next node so that all the nodes may be traversed.
• The pointer to the next ready node so that the correct task execution
sequence is maintained.
• The pointer to the successor nodes so that they can be notified when
this subtask is finished.
• The status word to keep the status of the subtask.
• The starting offset code address so that the code may be transferred
to the local memory.
• The length of the node's code (in bytes).
• The counter to keep count of how many processors are executing
this subtask.
2.2 Scheduling Algorithm
The major shared resource in NX8086 is the global memory (as explained in
the discussion of the NX8086 system architecture in Chapter 1). At the
beginning of a task execution a task table is constructed and stored in the
global memory. The main function of the scheduler incorporated within the
operating system is to maintain this table efficiently.
A few assumptions are made in the implementation of the task scheduler:
• No task preemption occurs.
• Local memory is big enough to fit a subtask, and no memory swapping
is called for.
• Global memory is big enough to hold the complete task table, the task
code, and the global data.
Each processor module has its own local memory. A processor loads the subtask
code and the global data into its local memory and executes the subtask to
completion. Therefore there is no subtask preemption and no memory swapping.
The node (subtask) data structure:
    Subtask ID
    # of Successor
    # of Predecessor
    *next_node
    *ready_link
    *saved_regs
    *succ_node_link
    Status
    Start offset addr
    Length
    *global_data_info
    counter
The successor node data structure:
    *successor
    *next_succ_node
The global_data_info data structure:
    Info
    *next_info
Figure 2-1: The data structure of a single node of a task.
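The three record layouts of Figure 2-1 might be declared in C roughly as follows. This is a sketch under assumptions, not the operating system's actual declarations: the field names follow the figure, but the field types, the status values other than EXECUTING and COMPLETED, and the helper count_successors are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* EXECUTING and COMPLETED are named in the text; WAITING and READY are
 * assumed here for subtasks that have not started yet. */
typedef enum { ST_WAITING, ST_READY, ST_EXECUTING, ST_COMPLETED } status_t;

struct node;

/* One entry of a subtask's successor list. */
typedef struct succ_link {
    struct node      *successor;
    struct succ_link *next_succ_node;
} succ_link_t;

/* One entry of a subtask's global data list; the contents of "Info"
 * are not described at this point, so an unsigned word stands in. */
typedef struct global_data_info {
    unsigned                 info;
    struct global_data_info *next_info;
} global_data_info_t;

/* One node (subtask) of the task table. */
typedef struct node {
    unsigned            subtask_id;
    unsigned            n_successors;
    unsigned            n_predecessors;
    struct node        *next_node;        /* traverses every node          */
    struct node        *ready_link;       /* next node in the ready queue  */
    void               *saved_regs;       /* type assumed                  */
    succ_link_t        *succ_node_link;   /* nodes to notify on completion */
    status_t            status;
    unsigned            start_offset;     /* offset of the subtask's code  */
    unsigned            length;           /* code length in bytes          */
    global_data_info_t *global_data_info;
    unsigned            counter;          /* processors executing the node */
} node_t;

/* Walk the successor list and count its entries. */
unsigned count_successors(const node_t *n)
{
    unsigned count = 0;
    assert(n != NULL);
    for (const succ_link_t *s = n->succ_node_link; s != NULL; s = s->next_succ_node)
        count++;
    return count;
}
```

Keeping the ready link inside the node itself is what lets the ready queue live inside the task table without a separate queue allocation.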
A ready queue is used to hold the subtasks that are ready for execution. To
conserve memory, the ready queue is designed to be part of the task table (see
Figure 2-2). An idle processor requests the bus and, when it is granted,
checks this queue. If the queue is not empty, the processor loads the subtask
code and the necessary global data, and starts the execution. It also modifies
the task table within the global memory to remove the subtask from the ready
queue and set its status to EXECUTING. Upon the completion of a subtask, the
processor updates the global data and the task table, sets the subtask status
to COMPLETED, notifies its successors and finally looks for another ready
subtask. If a processor finds that all the subtasks are COMPLETED, then the
processor tries to create a connection to the host computer. After the
connection is established, the processor sends the global data to the host and
requests a whole new task. The algorithm used by the scheduler is shown in
Figure 2-3.

[Figure 2-2: A task table representation with 6 nodes/subtasks, showing the
head node, the ready head, and the links among the six subtask nodes.]
while true do
    while bus not granted do
        wait;
    whilend;
    if no task found then
        connect with the host;
        get a new task;
        create ready link;
    ifend;
    if not empty ready_queue then
        get a subtask;
        release the bus;
        execute the subtask;
        request the bus;
        while bus not granted do
            wait;
        whilend;
        update the global data;
        update the task table;
    else
        if task completed then
            send result to host;
        ifend;
    ifend;
    release the bus;
whilend;
Figure 2-3: The task scheduler algorithm
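The "get a subtask" and "update the task table" steps of Figure 2-3 can be sketched in C as below. This is a simplified illustration rather than the thesis code: successors are held in a small fixed array instead of a linked list, bus arbitration is left out, and all identifiers (node_t, task_table_t, enqueue_ready, get_subtask, finish_subtask) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUCC 4   /* assumed bound, for this sketch only */

typedef enum { WAITING, READY, EXECUTING, COMPLETED } status_t;

typedef struct node {
    int          id;
    int          pending_preds;     /* predecessors not yet COMPLETED */
    int          n_succ;
    struct node *succ[MAX_SUCC];
    struct node *ready_link;        /* next node in the ready queue   */
    status_t     status;
} node_t;

typedef struct {
    node_t *ready_head;
    node_t *ready_tail;
} task_table_t;

/* Append a node to the tail of the ready queue. */
void enqueue_ready(task_table_t *t, node_t *n)
{
    n->status = READY;
    n->ready_link = NULL;
    if (t->ready_tail != NULL)
        t->ready_tail->ready_link = n;
    else
        t->ready_head = n;
    t->ready_tail = n;
}

/* "get a subtask": unlink the head of the ready queue and mark it
 * EXECUTING. Returns NULL when the queue is empty. */
node_t *get_subtask(task_table_t *t)
{
    node_t *n = t->ready_head;
    if (n == NULL)
        return NULL;
    t->ready_head = n->ready_link;
    if (t->ready_head == NULL)
        t->ready_tail = NULL;
    n->ready_link = NULL;
    n->status = EXECUTING;
    return n;
}

/* "update the task table": mark the subtask COMPLETED and release every
 * successor whose last outstanding predecessor this was. */
void finish_subtask(task_table_t *t, node_t *n)
{
    assert(n->status == EXECUTING);
    n->status = COMPLETED;
    for (int i = 0; i < n->n_succ; i++) {
        node_t *s = n->succ[i];
        if (--s->pending_preds == 0)
            enqueue_ready(t, s);
    }
}
```

In the real system both functions would run only while the processor holds the bus, since the task table lives in global memory.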
2.3 Software Fault Tolerance
Redundancy is the key to fault tolerant computing systems. A multiprocessor
system is inherently redundant; this property is called processor
redundancy [5].
The operating system implemented in this thesis provides further redundancy
by allowing more than one processor to execute a subtask. When an idle
processor finds that the ready queue is empty (i.e. no new subtask is cleared
for execution) and yet the task is not completed, it picks one of the running
subtasks and starts the execution of that subtask. When any processor
finishes, it checks the status of the subtask. If the status is not COMPLETED,
then the processor updates the task table and sets the status of the subtask
to COMPLETED. Otherwise the processor ignores the subtask.
This scheme produces a more reliable system; however, it also reduces the
system's performance as the number of processors running the same subtask
increases, so a limit must be enforced. The simulation studies conducted as
part of this project have concluded that allowing a maximum of two processors
to run the same subtask still produces a tolerable decrease in the system
performance, while providing greatly improved fault tolerance. A counter
stored in each subtask information table is used to keep track of the number
of processors executing a particular subtask. The counter is incremented when
a subtask is assigned to a processor.
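The duplication rule and the "first finisher wins" check can be sketched as two small C functions. This is an illustration under assumptions: the task table is flattened to an array, the limit of two processors per subtask is expressed as a constant RHO (named after the redundancy factor rho used in the simulation figures), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RHO 2   /* assumed: at most two processors per subtask */

typedef enum { WAITING, READY, EXECUTING, COMPLETED } status_t;

typedef struct {
    int      id;
    status_t status;
    int      counter;   /* processors currently assigned to this subtask */
} subtask_t;

/* Called by an idle processor that found the ready queue empty: pick a
 * running subtask that may still be duplicated, or return NULL. */
subtask_t *pick_duplicate(subtask_t *tab, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].status == EXECUTING && tab[i].counter < RHO) {
            tab[i].counter++;
            return &tab[i];
        }
    }
    return NULL;
}

/* Called when a processor finishes a subtask. Returns 1 if this result is
 * the first one and should be committed, 0 if it must be ignored. */
int commit_result(subtask_t *s)
{
    assert(s != NULL);
    if (s->status == COMPLETED)
        return 0;
    s->status = COMPLETED;
    return 1;
}
```

Because both functions read and write the shared table, they are meant to be called only while the caller holds the bus, which is what makes the check-then-set in commit_result safe.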
Another problem introduced by this scheme is that a new task may be
loaded into the system while one of the processors is still running an old
subtask. To overcome this problem, each task is given a name. Two consecutive
tasks must be assigned distinct names. At the beginning of an execution each
processor records the current task name and stores it in its local memory.
Upon completion of a subtask, the task name in the global memory is compared
with the one in the local memory. If and only if they are the same, the
processor proceeds with the modification of the global data tables. Otherwise
it ignores the results of the subtask.
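A minimal sketch of the task name check follows. The representation of a task name as an unsigned integer, the single result field, and the names global_mem_t, local_mem_t, begin_execution and commit_if_current are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned task_name;   /* name of the task currently loaded */
    int      result;      /* stands in for the global data     */
} global_mem_t;

typedef struct {
    unsigned task_name;   /* name recorded when execution began */
} local_mem_t;

/* Record the current task name in local memory before executing. */
void begin_execution(local_mem_t *l, const global_mem_t *g)
{
    assert(l != NULL && g != NULL);
    l->task_name = g->task_name;
}

/* Commit a result only if the task has not been replaced meanwhile.
 * Returns 1 when the result was written, 0 when it was discarded. */
int commit_if_current(const local_mem_t *l, global_mem_t *g, int result)
{
    assert(l != NULL && g != NULL);
    if (l->task_name != g->task_name)
        return 0;
    g->result = result;
    return 1;
}
```

A late processor still holding the old name therefore cannot overwrite the data of the new task.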
Another solution to the above dilemma is to modify the hardware so that
any processor can trigger an interrupt of any other processor. This mechanism
may be used as follows. When a processor loads a subtask code into its local
memory for execution, it records its identity against the task in the global
task tables. Thus, rather than recording merely how many processors are
executing the subtask, we now store the identities of all the processors
involved in the execution of the subtask. When any of these processors
finishes execution and is granted the bus, it modifies the global data table,
updates the status of the subtask to COMPLETED and then checks whether any
other processor is executing that same subtask. If any such processor is
found, it is interrupted. The interrupt code in each processor merely aborts
the task under execution and puts the processor in IDLE status.
This strategy was used in the simulations reported in Chapter 3. It was
seen that it improves the total task execution time only marginally unless the
complexity of each subtask is relatively large compared with the communication
overheads. The failure of this strategy to provide better results is
attributed to the additional overheads of storing processor numbers in global
tables, checking the tables to identify other processors duplicating the
subtask, creating interrupt signals and running the interrupt code.
2.4 Mutually Exclusive Access 
The ready queue and the task table stored in the global memory are the
major shared resources in the system. Any processor may request to add a
subtask to or remove a subtask from the ready queue. Mutually exclusive access
to this queue must be maintained at all times. Fortunately, the hardware
guarantees mutually exclusive access because the interconnection network is a
shared bus. To access the ready queue, which is in the global memory, the
processor must first request the bus and wait until it is granted. It must
then release the bus after completion.
As the number of processors increases, more memory bandwidth would be
wasted by excessive bus requests from idle processors. To cope with this
problem, a token signal is provided. An idle processor does not request the
bus; instead it simply stays idle. When a processor releases the bus, a token
signal is generated by the bus arbitration logic. The signal propagates,
starting with the highest priority processor (0), until the first idle
processor is found. The logic at each processor node allows the token to pass
to the next node if the processor is not idle. If the processor is idle, the
token is passed on to its interrupt input. This processor is then interrupted
and given the bus to access the ready queue and check for an available
subtask. If no subtask is found, the bus is released again and the processor
goes back to the idle state.
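The token propagation just described amounts to a priority scan over the processor nodes. The sketch below models it in software for illustration only; in NX8086 this is done by logic at each node, and the function name and the array representation of the idle flags are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* idle[i] is nonzero when processor i is idle; index 0 has the highest
 * priority. Returns the index of the processor that receives the token
 * on its interrupt input, or -1 if the token passes every node. */
int propagate_token(const int idle[], size_t n)
{
    assert(idle != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (idle[i])
            return (int)i;   /* token stops at the first idle node */
    }
    return -1;               /* every processor is busy */
}
```

Since at most one idle processor is woken per bus release, idle processors never compete with busy ones for bus cycles.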
2.5 System Initialization 
When the system is powered up, all the processors are disabled except one,
processor 0. (In the NX8086 architecture, this processor can be identified by
the proper jumper connections.) This processor is given access to the ROM,
located in the memory module. By default, after system reset the 8086 program
counter points to address FFFF:0000 [6]. The first instruction at this
location is a long jump to the starting address of a loader. The loader then
loads the operating system kernel from the ROM into the local memory. It also
sets the interrupt vector (NMI) address to the starting address of the
operating system kernel in the local memory. Due to the transceiver control
logic, the information is simultaneously written into the local memories of
all the processor modules.
There are 8 reserved words in the global memory that must be initialized
by the loader. These reserved words are used to store the following:
1. The address of the task name.
2. The address of the pointer to the first node in the task table.
3. The address of the pointer to the last node in the task table.
4. The address of the pointer to the first node in the ready queue.
5. The address of the pointer to the last node in the ready queue.
6. The address of the alternate pointer to the first node in the ready
queue.
7. The starting address of the global code.
8. The starting address of the global data.
They are initialized to NULL by the loader.
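The eight reserved words can be pictured as a small indexed block that the loader clears. The sketch below assumes 16-bit words and uses hypothetical enumerator and function names; the order follows the numbered list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of each reserved word, in the order of the list above. */
enum reserved_word {
    RW_TASK_NAME,         /* 1. address of the task name               */
    RW_TABLE_FIRST,       /* 2. first node in the task table           */
    RW_TABLE_LAST,        /* 3. last node in the task table            */
    RW_READY_FIRST,       /* 4. first node in the ready queue          */
    RW_READY_LAST,        /* 5. last node in the ready queue           */
    RW_READY_ALT_FIRST,   /* 6. alternate first node in the ready queue */
    RW_GLOBAL_CODE,       /* 7. starting address of the global code    */
    RW_GLOBAL_DATA,       /* 8. starting address of the global data    */
    RW_COUNT
};

/* Clear every reserved word, as the loader does at initialization.
 * A zero word plays the role of NULL in this sketch. */
void init_reserved_words(uint16_t words[], size_t count)
{
    assert(words != NULL && count == RW_COUNT);
    for (size_t i = 0; i < count; i++)
        words[i] = 0;
}
```

A processor that later reads a zero in RW_TABLE_FIRST can conclude that no task has been loaded yet.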
Upon completion of the initialization, processor 0 reverts access back to
itself. The control board then sends an interrupt signal to the other
processors to start the execution of the kernel on each processor's board. At
this point the system is operating normally, and processor 0, as the highest
priority processor, can start communicating with the host to request a task.
2.6 System Bottlenecks 
The optimum speed up for a system with n processors would be n.
However, in practice this speed up cannot be obtained. This is due to the
bottlenecks introduced in a multiprocessor system and the associated
multiprocessor system overheads. The NX8086 architecture is not immune to
this.
A major bottleneck in the NX8086 system is the need to grant mutually
exclusive access to the ready queue. This reduces the system performance as
the number of processors increases. Multiple ready queues may allow parallel
access to the queues. Studies of multiple ready queues, including efficient
scheduling strategies, are reported in [1]. For a single shared bus
architecture such as NX8086, however, having multiple ready queues is
not the solution.
The scheduler itself introduces bottlenecks, which are caused by scheduling
a large number of processors. It also takes some time to search for a
task within the task table, to update the task table and to load the task into
the global memory. These losses can be reduced by a careful design of the data
structure of the task table, so that the searching and updating processes can
be performed efficiently. To reduce the communication overhead encountered in
the data/code transfer, it is recommended to lengthen the average length of
the subtask so that fewer subtasks for a given application move through
memory.
Chapter 3
Strategies for Fault Avoidance
It is well known that a processor normally fails in a mode that disconnects
it from the rest of the circuit. The NX8086 architecture allows recognition of
such dead processors and provides strategies to survive such losses. All
these strategies are based upon clever scheduling of tasks. If a processor
dies while executing a subtask, the architecture can deal with it provided it
employs a task scheduling strategy that allows another processor to continue
the task once a failed processor is recognized. In order to study such
strategies, a simulator for the NX8086 architecture was developed. This
chapter describes this simulator and some of the results obtained from various
fault avoidance strategies.
3.1 Simulation Description
The simulation routines are written in C for ease of modification and
closely resemble the actual task scheduling part of the operating system of
NX8086. The major differences between the simulation and the operating system
are in the routines that directly control the hardware, such as the bus
request and the bus release. No modifications are made to the maintenance
routines of the global table and the ready queue. One of the reasons for using
the C language for the simulation is to get speedy execution and to
approximate the actual run time as closely as possible. This same criterion
also guided the design of much of the simulator code. The code runs on an IBM
PC or compatible and produces a color screen display showing the status of the
machine (for example, of the common bus), of each processor (its status, which
task it is executing, etc.) and of each subtask.
Appendix B shows the organization of the simulator including all the
routines. The simulator program is divided into the following seven modules:
PRINT module, HARDWARE module, CREATE module, QUEUE module, I/O module,
KERNEL module, and MAIN module. The detailed functionality of each of these
modules is also described in Appendix B.
The simulator also allows any number of processor failures. The following
assumptions regarding processor failures are made:
• Only active (non-IDLE) processors can fail.
• At least one processor runs without failure (so that the overall task
is completed, though slowly).
• No processor fails during I/O operations.
During a simulation, processor i can be failed by hitting key i on the keypad,
or the processors can be failed randomly. An internal random generator is used
to randomly select the failing processors.
3.2 The Task Scheduling Simulation 
An array of N records (where N is the number of processors) is used to
simulate the multiple processors of the NX8086 architecture. Each record
represents a processor and consists of the following fields:
ID                  The processor ID, an integer between 0 and N-1, where N is
                    the total number of processors in the system.
state               Variable representing the state of the processor. The state
                    may be one of these: IDLE, EXECUTING, END_EXECUTING,
                    and FAILED.
task_name           The name of the current task being executed in the
                    processor.
start               The time when the current execution began.
time_required       The time required to execute the current subtask.
executing_node_ptr  The pointer to the subtask which is being executed.
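The record described above can be sketched in C roughly as follows. This is only an illustration: the type names proc_state, processor and struct subtask, the field widths, and the helper init_processors are assumptions and are not taken from the simulator source.

```c
#include <stddef.h>

/* Hypothetical type names; the field names follow the list above. */
typedef enum { IDLE, EXECUTING, END_EXECUTING, FAILED } proc_state;

struct subtask;                          /* task-table entry, defined elsewhere */

typedef struct {
    int             id;                  /* processor ID, 0 .. N-1 */
    proc_state      state;               /* current processor state */
    char            task_name[16];       /* name of the current task (width assumed) */
    long            start;               /* time the current execution began */
    long            time_required;       /* time needed by the current subtask */
    struct subtask *executing_node_ptr;  /* subtask being executed, or NULL */
} processor;

/* Put all n processor records into the IDLE state. */
void init_processors(processor *p, int n)
{
    for (int i = 0; i < n; i++) {
        p[i].id = i;
        p[i].state = IDLE;
        p[i].task_name[0] = '\0';
        p[i].start = 0;
        p[i].time_required = 0;
        p[i].executing_node_ptr = NULL;
    }
}
```

An array such as `processor cpu[8]` initialized with `init_processors(cpu, 8)` would then stand for the largest configuration the simulator allows.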
The main task scheduling routine simply looks as follows:
    while true do
        for each processor do
            case processor's state of
                IDLE:
                    do_IDLE;
                EXECUTING:
                    do_EXECUTING;
                END_EXECUTING:
                    do_END_EXECUTING;
                FAILED:
                    do_NOTHING;
            casend;
        forend;
    whilend;
In every iteration each processor's state is checked. A failed processor is
removed from further simulation cycles unless it is brought back to life by
hitting the key corresponding to it again. This is the reason for the
statement do_NOTHING in the above algorithm. Actions corresponding to the
other processor states are elaborated in the following sections.
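One pass of the scheduling loop can be sketched in C as a switch over the processor state. The handlers below are hypothetical stand-ins that only count how often they are called; the real routines are described in the following sections, and the function name scheduler_pass is an assumption.

```c
typedef enum { IDLE, EXECUTING, END_EXECUTING, FAILED } proc_state;

/* Hypothetical stand-ins for the simulator's handlers: they only count calls. */
static int calls[4];
static void do_IDLE(int id)          { (void)id; calls[IDLE]++; }
static void do_EXECUTING(int id)     { (void)id; calls[EXECUTING]++; }
static void do_END_EXECUTING(int id) { (void)id; calls[END_EXECUTING]++; }

/* One iteration of the outer "while true" loop over all n processors. */
void scheduler_pass(const proc_state *state, int n)
{
    for (int id = 0; id < n; id++) {
        switch (state[id]) {
        case IDLE:          do_IDLE(id);          break;
        case EXECUTING:     do_EXECUTING(id);     break;
        case END_EXECUTING: do_END_EXECUTING(id); break;
        case FAILED:        /* do_NOTHING: skipped until revived */ break;
        }
    }
}
```

A FAILED processor costs nothing in this loop, which matches the remark that it simply drops out of the simulation cycles until its key is hit again.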
3.2.1 IDLE state 
If the bus is not busy transferring either code to a processor or results
from a processor to the global memory, all idle processors are checked in
order to see if they want the bus. In particular, if a processor is in the
IDLE state, the routine do_IDLE is called to first check whether the current
processor may be granted the bus. If the bus is granted, the task table and
the ready queue are checked. Otherwise the do_IDLE routine is exited. If the
task table doesn't exist at the time of such a check, then the task tree
creation is started. This creates the task table and the ready queue. Once the
task tree exists (either a newly created or an old one), a subtask execution
can be started if a ready subtask is found. A subtask is said to be in the
ready state if all its preceding subtasks are already finished and all the
data necessary for its execution is available. Before actual execution of the
subtask, however, it must be transferred from the global memory to the local
processor memory. The simulator allows for the initial specification of each
subtask's code size and spends a proportional time to allow for its transfer
over the common bus.
At the beginning of the execution of a subtask, an initialization consisting
of the following steps is carried out.
1. task_name is set to the name of the current task obtained from the
task table.
2. start is set to the current system time.
3. time_required is set to the expected time to execute the subtask
obtained from the task table.
4. executing_node_ptr is set to point to the subtask record in the task
table.
5. state is set to EXECUTING.
Finally the bus is released by calling the bus_rel routine, so that it may be
used by the other IDLE processors.
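The five initialization steps and the final bus release might look as follows in C. This is a sketch under stated assumptions: the subtask record layout, the bus_owner variable and the parameterless bus_rel stand-in are hypothetical and only serve to make the fragment self-contained.

```c
#include <string.h>

typedef enum { IDLE, EXECUTING, END_EXECUTING, FAILED } proc_state;

typedef struct {                 /* hypothetical task-table entry */
    char task_name[16];
    long expected_time;
} subtask;

typedef struct {
    proc_state state;
    char       task_name[16];
    long       start;
    long       time_required;
    subtask   *executing_node_ptr;
} processor;

static int bus_owner = -1;                      /* -1 means the bus is free */
static void bus_rel(void) { bus_owner = -1; }   /* stand-in for the real routine */

/* Steps 1 to 5 of the initialization, followed by the bus release. */
void start_subtask(processor *p, subtask *st, long now)
{
    strncpy(p->task_name, st->task_name, sizeof p->task_name - 1);  /* step 1 */
    p->task_name[sizeof p->task_name - 1] = '\0';
    p->start = now;                                                 /* step 2 */
    p->time_required = st->expected_time;                           /* step 3 */
    p->executing_node_ptr = st;                                     /* step 4 */
    p->state = EXECUTING;                                           /* step 5 */
    bus_rel();
}
```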
3.2.2 EXECUTING state 
When the current processor is in the EXECUTING state, the routine
do_EXECUTING is called. In this routine, the time elapsed from the start of
the subtask to the current system time is compared with the expected execution
time of the running subtask. The simulator allows one to specify the execution
time of each subtask. On completion of this time the processor state is set to
END_EXECUTING.
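The elapsed-time test can be sketched in C as below. The signature of do_EXECUTING and the use of ">=" for the comparison are assumptions; the text only says that the elapsed time is compared with the expected execution time.

```c
typedef enum { IDLE, EXECUTING, END_EXECUTING, FAILED } proc_state;

typedef struct {
    proc_state state;
    long start;          /* time at which the subtask started */
    long time_required;  /* expected execution time of the subtask */
} processor;

/* Move to END_EXECUTING once the expected execution time has elapsed. */
void do_EXECUTING(processor *p, long now)
{
    if (now - p->start >= p->time_required)
        p->state = END_EXECUTING;
}
```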
3.2.3 END_EXECUTING state
When the task execution is finished, the processor is expected to return
the execution results to the global memory so that the data can be used by
successive tasks. Therefore, when a processor enters the END_EXECUTING state,
the routine do_END_EXECUTING is called. It first checks whether or not the
current processor's request for the bus is granted. If the request is not
granted, then the routine bus_req is called. Once the bus request is granted,
the processor begins to update the task table and the ready queue in the
global memory. After completion, the routine bus_rel is called to free up the
bus assignment and the processor state is set to IDLE.
3.3 I/O Simulation
The I/O routines include the simulations of data or code movements from
the host to the global memory, from the global to the local memory, and vice
versa. Each movement is monitored by a counter which counts from 1 to the
number of bytes that need to be moved.
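A byte counter of this kind can be sketched in C as follows. The names transfer, transfer_begin and transfer_tick are hypothetical, and moving exactly one byte per simulation tick is an assumption made only for the illustration.

```c
typedef struct {
    long total;   /* number of bytes to move */
    long count;   /* bytes moved so far */
} transfer;

/* Start a movement of nbytes bytes. */
void transfer_begin(transfer *t, long nbytes)
{
    t->total = nbytes;
    t->count = 0;
}

/* Advance the movement by one byte; returns 1 once all bytes have moved. */
int transfer_tick(transfer *t)
{
    if (t->count < t->total)
        t->count++;
    return t->count >= t->total;
}
```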
3.4 Arbitration Logic Simulation
The arbitration logic is the part of the NX8086 architecture that controls
the bus. It decides the bus allocation to different processors, and also
generates a signal on the token line when any processor relinquishes the bus.
This token goes through the processors in order. Bypassing processors in the
EXECUTING state, it wakes up the first IDLE processor it finds.
There are three routines related to the simulation of the arbitration logic:
bus_req, bus_rel, and arbitrate. The bus_req routine is called whenever a
processor is requesting the bus. It updates BRQ, an array variable that keeps
track of the processor requests for the bus. When a processor requests the
bus, it sets the corresponding element of the array BRQ to 1. This allows the
arbitration logic to assign the bus, when free, to the highest priority
processor requesting it. In the NX8086 hardware, this array is implemented by
priority logic. The bus_rel routine is called when a processor wants to
release the bus, and updates the BRQ array again. The arbitrate routine is
called every time a processor releases the bus. It decides which processor
gets the next bus assignment by setting the variable BG equal to the assigned
processor ID.
If none of the processors is requesting the bus, this routine generates the
token signal. Each processor in the NX8086 hardware can generate a single-bit
IU signal. A processor sets its IU to 1 at the beginning of a subtask
execution and resets it upon completion. The IU signal is used to stop the
propagation of the token signal to the next lower priority processor. The IU
signals for all the processors are simulated by an integer array, IU. Each
element is set or reset by the corresponding processor according to the
processor state. The token signal propagation is simulated by scanning through
the array IU, starting from index 0, until the first 0 value is found. If a 0
value is found, then BG is set to its index and the processor with ID equal to
the index is granted the bus.
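The interplay of BRQ, IU and BG can be sketched in C as follows. The array and variable names come from the text; the constant NPROC, the value NONE, the exact function signatures, and the assumption that a lower processor ID means a higher bus priority are choices made for this illustration only.

```c
#define NPROC 8
#define NONE  (-1)

static int BRQ[NPROC];   /* 1 if processor i is requesting the bus */
static int IU[NPROC];    /* 1 while processor i is executing a subtask */
static int BG = NONE;    /* ID of the processor holding the bus grant */

/* Record a bus request from processor id. */
void bus_req(int id) { BRQ[id] = 1; }

/* Decide the next bus owner; assumes a lower ID means higher priority. */
void arbitrate(void)
{
    BG = NONE;
    for (int i = 0; i < NPROC; i++)        /* pending requests come first */
        if (BRQ[i]) { BG = i; return; }
    for (int i = 0; i < NPROC; i++)        /* otherwise pass the token */
        if (IU[i] == 0) { BG = i; return; }
}

/* Withdraw the request of processor id and hand the bus on. */
void bus_rel(int id) { BRQ[id] = 0; arbitrate(); }
```

With processors 0 and 1 executing (IU set) and requests pending from 3 and 5, the grant goes to 3, then to 5, and finally the token reaches processor 2, the first one whose IU is 0.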
3.5 Fault Simulation 
As explained earlier, the simulator allows the toggling of a processor
between a healthy and a faulty state by hitting the corresponding key on the
keypad. However, for realistic simulations and to estimate the average
performance of fault avoidance strategies, the processors may be failed
randomly during execution.
The faulty processors are chosen randomly at the beginning of the execu-
tion of the simulation. The maximum number of faulty processors must be less 
than the maximum number of processors, because it is assumed that at least 
one processor is running without fault. 
Given an input task which consists of n subtasks with ID numbers from 1
to n, the x processors that are to fail during the execution are determined as
follows. First, x unique random integers between 1 and n are generated. It
is assumed that the processors that start on the subtasks corresponding to
these numbers will fail during the execution. When a processor gets a subtask
from the task table, it checks whether the subtask ID number is in the set of
x random numbers. If the subtask ID does belong to the set, then the ID number
is removed from the set and the processor goes to the FAILED state. This
scheme guarantees that only one active processor fails while executing a
particular subtask. Later it is shown that a fault tolerant strategy can be
developed by concurrently executing the same subtask in up to p processors.
For small p (say 2), if all the p processors fail during execution of a
particular subtask, then the fault tolerance strategy developed here fails and
the architecture hangs up. In the simulations reported here, we ignore more
than one processor failing during execution of the same subtask, on the
argument that the probability of this happening is fairly low and can
therefore be neglected. However, this is a weak point of the simulation and
later studies should focus on more realistic strategies to force processors
into failure mode.
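The selection scheme can be sketched in C as below. The names doomed, choose_failures and should_fail, the bound MAX_SUBTASKS, and the use of a partial Fisher-Yates shuffle with the standard rand() generator are assumptions; the text only requires x unique random integers between 1 and n.

```c
#include <stdlib.h>

#define MAX_SUBTASKS 64

/* doomed[id] == 1: the processor that starts subtask id will fail. */
static int doomed[MAX_SUBTASKS + 1];

/* Mark x distinct subtask IDs in 1..n; requires 0 <= x <= n <= MAX_SUBTASKS. */
void choose_failures(int n, int x, unsigned seed)
{
    int ids[MAX_SUBTASKS];
    srand(seed);
    for (int i = 0; i < n; i++) { ids[i] = i + 1; doomed[i + 1] = 0; }
    for (int i = 0; i < x; i++) {          /* partial Fisher-Yates shuffle */
        int j = i + rand() % (n - i);
        int tmp = ids[i]; ids[i] = ids[j]; ids[j] = tmp;
        doomed[ids[i]] = 1;
    }
}

/* Called when a processor fetches subtask id; returns 1 if it must fail. */
int should_fail(int id)
{
    if (!doomed[id]) return 0;
    doomed[id] = 0;        /* each ID triggers at most one failure */
    return 1;
}
```

Because an ID is cleared as soon as it triggers a failure, a second processor that later picks up the same subtask runs it to completion, which is the single-failure-per-subtask guarantee stated above.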
3.6 Parameters
The simulation allows the user to set five parameters that are considered
important in determining the performance of the NX8086 architecture. These are
the number of processors, the mean execution time for each subtask, t, the
degree of redundancy, rho, the communication overhead, CO, and the number of
faulty processors, fail.
The simulator allows the number of processors to be varied between 1 and 8.
The mean execution time is the time expected for a subtask to finish
execution, and is held constant for all the subtasks. The degree of redundancy
is the maximum number of processors that may execute a particular subtask. The
communication overhead is the length of time required for a processor to load
the code and global data at the beginning of a subtask execution and to update
the global data at the end of the execution.
3.7 Simulation Results
The simulation described in earlier sections was used to evaluate the
performance of the NX8086 architecture under fault using the specified fault
avoidance strategies. The precedence graph used in these simulations is shown
in Fig. 3-1. The task consists of 11 subtasks with a maximum degree of
parallelism of 5.
Graph level    Maximum degree of parallelism
1              2
2              5
3              3
4              1
Figure 3-1: The precedence graph of input task for the simulation.
Two types of experiments were performed to study the architecture in the
presence and in the absence of faults. Simulations performed to evaluate the
performance of the NX8086 multiprocessor architecture in completing the task
of Fig. 3-1 with all processors working normally resulted in the graphs shown
in Figs. 3-2, 3-3, and 3-4. The factor which distinguishes the results of
Fig. 3-2 from those of Fig. 3-3 is rho, the maximum redundancy of execution.
This factor works as follows. When a processor is granted the bus, it looks at
the global task table (described in Chapter ) to find out if there is a
subtask that is ready to execute. If there is none, then it merely duplicates
the execution of a task already in progress. The task chosen for such
duplication is the one that has been running for the longest time amongst all
those that are currently active. rho is the factor that decides how many
processors (at most) will be running the same subtask simultaneously. This
strategy of duplicating the execution helps in keeping the system going if a
processor dies during execution of a subtask.
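The selection rule just described can be sketched in C as follows. The subtask states, the copies field and the function pick_subtask are hypothetical; the sketch only illustrates the policy "ready subtasks first, otherwise duplicate the longest-running active subtask that has fewer than rho copies".

```c
typedef enum { WAITING, READY, RUNNING, DONE } st_state;   /* hypothetical states */

typedef struct {
    st_state state;
    long     start;    /* time the first copy started running */
    int      copies;   /* processors currently executing this subtask */
} subtask;

/* Return the index of the subtask a newly granted processor should take,
   or -1 if it should stay idle. */
int pick_subtask(const subtask *t, int n, int rho)
{
    int oldest = -1;
    for (int i = 0; i < n; i++) {
        if (t[i].state == READY)
            return i;                               /* fresh work comes first */
        if (t[i].state == RUNNING && t[i].copies < rho &&
            (oldest < 0 || t[i].start < t[oldest].start))
            oldest = i;                             /* longest-running candidate */
    }
    return oldest;
}
```

With rho = 1 the second branch can never select anything, so no duplication takes place and an idle processor simply waits.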
Figures 3-2 and 3-3 show the scheduler performance for rho = 1 and rho = 2
respectively. In each graph, the parameter t, the time complexity of each
subtask, was varied from 0 time units to 75 time units. As t increases the
graph shifts upward, because more time is required for each processor to
execute a subtask. The graph corresponding to t = 0 is intended to give a
rough idea of the communication overhead, since this case corresponds to the
processors merely loading the code and global data and then updating the
global data without any execution.
Figure 3-4 shows the effect of increasing the communication overhead from 3
time units up to 15 time units. A constant increment of the communication
overhead produced a constant increment in the task execution time, as
expected.
In Figs. 3-2 to 3-4 the most significant drop in the execution time is
observed when the number of processors increases from 1 to 2. Smaller drops
are noted when the number of processors goes from 2 to 3 and from 4 to 5. The
reason
[Plot: total execution time versus number of processors (1 to 8), one curve
per value of t.]
Figure 3-2: Dependence of the complete task execution time on
the complexity of each subtask, t, and the number of processors
when rho = 1 and CO = 3 time units.
[Plot: total execution time versus number of processors (1 to 8), one curve
per value of t.]
Figure 3-3: Dependence of the complete task execution time on
the complexity of each subtask, t, and the number of processors
when rho = 2 and CO = 3 time units.
[Plot: total execution time versus number of processors (1 to 8), one curve
for each of CO = 3, 6, 9, 12, 15.]
Figure 3-4: Dependence of the total execution time on the communication
overhead, CO, and the number of processors when rho = 1 and
t = 45.
for this is obvious from the precedence graph of the task. As can be seen from
Fig. 3-1, the precedence graph shows a degree of parallelism of at least 2 at
almost all levels, of 4 at two levels and of 5 at one level. (Degree of
parallelism means the number of independent tasks at that level.) Because the
maximum degree of parallelism of the task is 5, a maximum of 5 processors will
be active during any execution, and having more than 5 processors in the
system will not increase the system performance. This is demonstrated by an
almost flat portion of the graph beyond number of processors = 5.
A comparison of Figs. 3-2 and 3-3 shows that increasing the degree of
execution redundancy, rho, from 1 to 2 results in an insignificant decrease in
the system performance (with no processor failures). This is due to the fact
that when rho > 1, additional time is spent to search for and load the
duplicate subtask.
[Plot: total execution time versus number of processors, one curve per value
of rho.]
Figure 3-5: Dependence of the task execution time on the execution
redundancy factor, rho, and the number of processors in the
presence of 2 processor failures, CO=3, and t=45.
[Plot: total execution time versus number of processors, one curve per value
of rho.]
Figure 3-6: Dependence of the task execution time on the execution
redundancy factor, rho, and the number of processors in the
presence of 3 processor failures, CO=3, and t=45.
[Plot: total execution time versus number of processors, one curve per value
of rho.]
Figure 3-7: Dependence of the task execution time on the execution
redundancy factor, rho, and the number of processors in the
presence of 4 processor failures, CO=3, and t=45.
Figures 3-5 to 3-7 show the other phase of the simulation, where failures
were introduced in the system. These figures show the effect of 2, 3, and 4
processor failures respectively on the system performance. For each graph the
degree of redundancy, rho, was varied from 2 to 4. Each point in each graph
represents an average of 10 independent runs performed by random generation of
the required number of faults. The rho = 1 mode of operation is extremely
fault prone in the NX8086 architecture, because if a processor fails while
executing a subtask, other processors have no way of knowing it and they
merely wait (in vain, forever) for that processor to finish the subtask. (Note
that rho = 1 implies no execution duplication.) The fault simulation
experiments therefore avoided rho = 1. The communication overhead and the mean
subtask execution time were fixed. One can see that the resulting three lines
corresponding to different values of rho on each graph almost merge into a
single line. This implies that increasing rho, say to 4 from 2, has no ill
effects. (Recall that in the absence of faults, increase of rho from 1 to 2
severely degraded the system performance.) This increase of rho to 4, however,
makes the NX8086 architecture robust enough to withstand faults in up to 3
processors running the same subtask.
Finally, Figure 3-8 shows the consolidated results of the fault simulations.
As can be seen from the graph, the NX8086 architecture can withstand a
relatively large fraction of processor failures without an appreciable
decrease in performance.
[Plot: total execution time versus number of processors (3 to 8), one curve
for each of fail = 2, 3, 4, 5, 6, 7.]
Figure 3-8: Dependence of the task execution time on the number of
processors when different numbers of processors fail with t=3,
CO=45, and rho=2.
Chapter 4 
Initial Set-up of a Parallel Task 
The architecture of the NX8086 parallel processor is designed in a very
specific way for fault tolerance. In particular, as discussed in Chapter, the
entire code corresponding to a task, and information about its parallelism, is
stored in the global memory. This information is contained in the form of
three segments:
• A task table with the dependencies resolved among its subtasks.
• The task executable code.
• The global data.
A part of this project was concerned with setting up the information
segments so that the NX8086 architecture may function as intended. The tables
required by NX8086 are generated in a host machine. This same host may also be
used to compile (or assemble) the user code written in a higher level (or
assembly) language. (This is convenient for an IBM PC compatible host, since
it is based on an 8088, 8086 or 80286 processor itself and will always
generate appropriate executable machine code for NX8086.) Once the necessary
tables are created, the host transfers them to NX8086 through the serial link
as already explained in Chapter.
In order to be able to generate the tables containing task dependencies,
the user should explicitly specify them using the FORK and JOIN language
constructs described in this chapter. A pre-processor (residing on the host)
is designed to examine the dependency specification and create the final code
needed for the NX8086 architecture.
4.1 The FORK-JOIN language constructs
It has been shown in Chapter 2-1 that the parallelism information within
a task must be explicitly provided by the user. This information includes the
list of the subtasks within a task and the precedence relations among the
subtasks. Graphically, the precedence relations can be represented as a
precedence graph, i.e. a directed acyclic graph whose nodes correspond to
subtasks and whose edges correspond to precedence relations.
Fig. 4-1 shows a typical precedence graph. The precedence information in
the graph can be interpreted as follows.
• Subtasks 4 and 6 can be started as soon as execution begins. 
• Subtask 3 can be executed after subtask 6 finishes. 
• Subtask 5 can be executed after subtask 4 finishes. 
• Subtask 2 can be executed only after subtasks 6 and 5 finish. 
• Subtask 1 can be executed only after subtasks 2, 3, and 4 finish. 
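The precedence relations listed above can be written down as a small table and queried for readiness. The following C sketch is only an illustration: the matrix pred, the done array and the function is_ready are hypothetical names, and the data-availability part of the ready condition is left out.

```c
#define NSUB 6

/* pred[i][j] == 1 when subtask j must finish before subtask i (1-based). */
static const int pred[NSUB + 1][NSUB + 1] = {
    [1] = { [2] = 1, [3] = 1, [4] = 1 },
    [2] = { [5] = 1, [6] = 1 },
    [3] = { [6] = 1 },
    [5] = { [4] = 1 },
};

/* A subtask is ready when it has not run yet and every predecessor is done. */
int is_ready(int s, const int done[NSUB + 1])
{
    if (done[s]) return 0;
    for (int j = 1; j <= NSUB; j++)
        if (pred[s][j] && !done[j]) return 0;
    return 1;
}
```

With nothing finished, only subtasks 4 and 6 are ready; once both are done, subtasks 3 and 5 become ready, and subtask 2 follows after subtask 5.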
[Graph legend: a precedence arrow from X to Y means that Y is executed after X.]
Figure 4-1: An example of a precedence graph.
Even though it is very easy to show the concurrency in a task using a
precedence graph, it is difficult to implement the graph in a programming
language. We show here an alternative way to specify the precedence
constraints through the use of FORK-JOIN language constructs. Such language
constructs can be easily embedded in a higher level structured language (such
as Pascal and C) and can efficiently depict the task concurrency.
The FORK-JOIN language construct used here is an extension of the one
that was introduced by Conway, Dennis and Van Horn as one of the first
language notations for specifying concurrency [7].
The FORK is used to produce two or more concurrent executions in a
program. For example the statements
        S1;
        FORK L1, L2;
        S2;
        .
        .
        .
    L1: S3;
        .
        .
        .
    L2: S4;
are meant to imply that subtasks S2, S3, and S4 can be executed concurrently
and independently.
The JOIN instruction is similarly used to merge two or more independently
executing tasks. It has one parameter, a counter, which has to be initialized
to the number of executions that must be recombined.
4.1.1 The Parallel Program Structure using FORK-JOIN Construct 
A program using FORK-JOIN construct is separated into two parts: 
• the declaration part and 
• the statement part. 
All the subtask names, the counters, and the labels are specified in the
declaration part. The counters must also be initialized with appropriate
values for the JOIN instructions. In the statement part the precedence
relations among subtasks are described. Appendix A specifies the complete
syntax grammar for these language constructs and also gives some restrictions
that must be observed in writing a program using these constructs.
Fig. 4-2 shows the translation of the precedence graph of Fig. 4-1 into a
FORK-JOIN construct. The construct proposed here can represent a fairly
involved parallel scenario rather easily. Fig. 4-3 shows a complex precedence
graph and Fig. 4-4 the associated FORK-JOIN construct.
 1.  TASK AAA;
 2.  LABEL L1, L2, L3;
 3.  COUNTER COUNT1 = 2, COUNT2 = 3;
 4.  SUBTASK DUMMY, S1, S2, S3, S4, S5, S6;
 5.  TASKBEGIN;
 6.      DUMMY;
 7.      FORK L1;
 8.      S4;
 9.      FORK L3;
10.      S5;
11.      GOTO L2;
12.  L1: S6;
13.      FORK L2;
14.      S3;
15.      GOTO L3;
16.  L2: JOIN COUNT1;
17.      S2;
18.  L3: JOIN COUNT2;
19.      S1;
20.  TASKEND;
Figure 4-2: FORK-JOIN program representing the same precedence
information as in Fig 4-1.
Figure 4-3: An example of a more complex precedence graph.
TASK COMPLEX_TASK;
LABEL L1, L2, L3, L4, L5, L6, L7, L8, L9,
      L10, L11, L12, L13, L14;
COUNTER COUNT1 = 2, COUNT2 = 2, COUNT3 = 3,
        COUNT4 = 2, COUNT5 = 2, COUNT6 = 2,
        COUNT7 = 2, COUNT8 = 2, COUNT9 = 4;
SUBTASK DUMMY, S1, S2, S3, S4, S5, S6, S7, S8,
        S9, S10, S11, S12, S13, S14, S15, S16;
TASKBEGIN;
     DUMMY;
     FORK L1, L4;
     S1;
     FORK L2;
     S4;
     FORK L5, L8;
     GOTO L3;
L1:  S2;
     FORK L7;
L2:  JOIN COUNT1;
     S5;
     FORK L5;
L3:  JOIN COUNT2;
     S7;
     S12;
     FORK L12;
     GOTO L13;
L4:  S3;
     FORK L6, L9;
     GOTO L8;
L5:  JOIN COUNT4;
     S9;
     GOTO L11;
L6:  S11;
     GOTO L13;
L7:  S6;
     FORK L9;
L8:  JOIN COUNT3;
     S8;
     FORK L10;
     GOTO L12;
L9:  JOIN COUNT5;
     S10;
     FORK L11;
L10: JOIN COUNT7;
     S14;
     GOTO L13;
L11: JOIN COUNT6;
     S13;
     GOTO L13;
L12: JOIN COUNT8;
     S15;
     GOTO L14;
L13: JOIN COUNT9;
     S16;
L14: TASKEND;
Figure 4-4: FORK-JOIN program representing the same precedence
information as in Fig 4-3.
4.2 The Host-PreProcessor (HPP) 
The Host-PreProcessor (HPP) is a program which runs on the host computer.
Basically the HPP acts as a front-end compiler, which scans through the
input file, checks for illegal symbols, and parses each statement according to
the FORK-JOIN language syntax. As it parses the input file, it also generates
a task table with the dependencies resolved among the subtasks.
In order to resolve all the dependencies, HPP uses an internal stack with
push and pop operations and an information table. The subtask ID numbers are
stored on the stack and the information table contains the correspondence
between a subtask ID and a label number. The stack is maintained in such a
manner that at any time the elements in the stack are all the predecessors of
the subtask that is encountered in the next subtask statement.
Thus, when two or more subtasks are to be executed initially, a DUMMY
subtask must be added as their predecessor subtask. A FORK-JOIN program
should always start with the execution of a single subtask.
The algorithms implemented by HPP to resolve the subtask dependencies are as
follows.
HPP algorithms:
1. if a subtask statement is found, then
       current_subtask := subtask ID;
       if not initial then
           while not empty_stack do
               previous_subtask := pop_stack;
               set current_subtask as the successor
                   subtask of previous_subtask;
           whilend;
       ifend;
       push_stack(current_subtask);
2. if a label is found, then
       current_label := label ID;
       for every matching label ID in the
               information table do
           push_stack(the associated subtask ID);
       forend;
3. if a fork statement is found, then
       current_subtask := pop_stack;
       for every label do
           create pair (current_subtask, current_label);
           insert the pair into the information table;
       forend;
       push_stack(current_subtask);
4. if a goto statement is found, then
       set current_label to the label found in the GOTO statement;
       while not empty_stack do
           current_subtask := pop_stack;
           create pair (current_subtask, current_label);
           insert the pair into the information table;
       whilend;
5. if a join statement is found, then
       if current counter <> number of elements in the stack then
           exit;
       ifend;
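The five rules can be sketched in C over an already tokenized program. This is a minimal sketch, not the HPP source: the statement encoding (stmt, with one label per FORK statement), the array names, the bound MAXN and the function hpp_resolve are all assumptions, and error handling is reduced to a return code. The table fig42 encodes the statement part of the program of Fig. 4-2, with S1 to S6 numbered 1 to 6 and DUMMY numbered 0.

```c
#include <string.h>

enum kind { SUBTASK, LABEL, FORK, GOTO, JOIN };

/* arg: subtask ID, label ID, or counter value, depending on kind. */
typedef struct { enum kind kind; int arg; } stmt;

#define MAXN  32
#define DUMMY 0

static int stack[MAXN], top;                         /* predecessors of next subtask */
static int pair_sub[MAXN], pair_lab[MAXN], npairs;   /* information table */
int succ[MAXN][MAXN];                                /* succ[a][b] == 1: b follows a */

static void add_pair(int sub, int lab)
{
    pair_sub[npairs] = sub;
    pair_lab[npairs] = lab;
    npairs++;
}

/* Returns 0 on success, -1 if a JOIN counter does not match the stack depth. */
int hpp_resolve(const stmt *prog, int n)
{
    top = npairs = 0;
    memset(succ, 0, sizeof succ);
    for (int i = 0; i < n; i++) {
        int a = prog[i].arg;
        switch (prog[i].kind) {
        case SUBTASK:                          /* rule 1 */
            while (top > 0) {
                int prev = stack[--top];
                if (prev != DUMMY) succ[prev][a] = 1;   /* DUMMY is never linked */
            }
            stack[top++] = a;
            break;
        case LABEL:                            /* rule 2 */
            for (int k = 0; k < npairs; k++)
                if (pair_lab[k] == a) stack[top++] = pair_sub[k];
            break;
        case FORK:                             /* rule 3: pop, pair, push back */
            add_pair(stack[top - 1], a);
            break;
        case GOTO:                             /* rule 4 */
            while (top > 0) add_pair(stack[--top], a);
            break;
        case JOIN:                             /* rule 5 */
            if (a != top) return -1;
            break;
        }
    }
    return 0;
}

/* Statement part of Fig. 4-2 (lines 6 to 19). */
static const stmt fig42[] = {
    {SUBTASK, DUMMY}, {FORK, 1}, {SUBTASK, 4}, {FORK, 3}, {SUBTASK, 5}, {GOTO, 2},
    {LABEL, 1}, {SUBTASK, 6}, {FORK, 2}, {SUBTASK, 3}, {GOTO, 3},
    {LABEL, 2}, {JOIN, 2}, {SUBTASK, 2}, {LABEL, 3}, {JOIN, 3}, {SUBTASK, 1},
};
```

Running hpp_resolve on fig42 fills succ with exactly the relations listed for Fig. 4-1: S5 follows S4, S3 follows S6, S2 follows S5 and S6, and S1 follows S2, S3 and S4.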
To clarify how the HPP works, the following explains the execution of the
HPP on the FORK-JOIN program shown in Fig. 4-2.
Lines 1 to 4 are the declaration part of the program. HPP scans through the
declaration part and initializes the symbol table with the label, counter, and
subtask names. The statement part starts with the keyword TASKBEGIN and ends
with TASKEND.
Initially, the stack and the information table are empty. Starting with line
6, a subtask statement is found. Since the task has just begun (initial is
TRUE), the current subtask, DUMMY, is pushed onto the stack and the initial
flag is set to FALSE. The next line is a fork statement. First, a subtask is
popped from the stack and then paired with the label found in the fork
statement. In this case the only pair generated is (DUMMY,L1), and the pair is
stored in the information table. Finally subtask DUMMY is pushed back onto the
stack. So far the information table and the stack look as follows:
information table      stack
(DUMMY,L1)             DUMMY
Line 8 is another subtask statement. The current subtask is set to S4.
Then all the subtasks in the stack are popped and they are assigned as
predecessor subtasks of S4; in this case there is only one, the DUMMY subtask.
The DUMMY subtask is treated differently. It is never assigned to any subtask
as a successor subtask or a predecessor subtask. Therefore the DUMMY subtask
is not assigned as a predecessor of subtask S4. Finally, S4 is pushed onto the
stack. The next statement is a fork statement; the pair generated is (S4,L3),
which is added to the information table, and S4 is pushed back onto the stack.
The subtask statement on line 10 results in S5 being set as the successor of
S4. Up to this point the stack and the information table look as follows:
information table      stack
(DUMMY,L1)             S5
(S4,L3)
Line 11 is a goto statement. The current label is set to L2. Subtask S5 is
popped from the stack and paired with L2, producing (S5,L2), which is then
stored in the information table. In line 12 a label, L1, is found; for every
matching label found in the information table, its associated subtask is
pushed onto the stack. The only pair that matches L1 is (DUMMY,L1); therefore
the DUMMY subtask is pushed onto the stack, leaving the stack and the
information table as follows:
information table      stack
(DUMMY,L1)             DUMMY
(S4,L3)
(S5,L2)
Next, a subtask statement is found. By the same argument as before,
the DUMMY subtask from the stack is ignored and S6 is pushed onto the
stack. The fork statement on line 13 adds the pair (S6,L2) to the information
table and leaves S6 in the stack. The subtask statement in line 13 causes S3 to
be assigned as the successor of S6. Line 14 is a goto statement. The pair
generated is (S3,L3). Next, the label L2 is found. The matching pairs found in
the information table are (S5,L2) and (S6,L2), so subtasks S5 and S6 are
pushed onto the stack. Now the stack and the information table contain:
information table
(DUMMY,L1)
(S4,L3)
(S5,L2)
(S6,L2)
(S3,L3)
stack
S5
S6
When the join statement is parsed, the number of elements in the stack is
compared with the value of the counter variable, COUNT1. From the declaration
part, it can be seen that the value of COUNT1 is declared to be 2. Therefore the
value equals the number of elements in the stack.
The subtask statement in line 17 causes the subtasks S5 and S6 to be
popped from the stack, and the subtask S2 is assigned to them as their
successor subtask. Finally, S2 is pushed onto the stack. The label statement in
line 18 selects the pairs (S4,L3) and (S3,L3) from the information table. Then
the subtasks S4 and S3 are pushed onto the stack. Now the stack and the
information table contain:
information table
(DUMMY,L1)
(S4,L3)
(S5,L2)
(S6,L2)
(S3,L3)
stack
S2
S4
S3
In line 18, a join statement is found. Since the value of COUNT2 agrees
with the number of elements in the stack, the parsing continues. The next
line is a subtask statement. S1 is then assigned as a successor subtask to all
the subtasks in the stack: S2, S4, and S3. Finally, the keyword TASKEND is
found, which ends the HPP.
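The walkthrough above can be condensed into a short sketch. The Python fragment below is an illustration only (HPP itself is written in C). The tuple encoding of statements, the function name build_precedence, the program listing, and the value COUNT2 = 3 are assumptions reconstructed from the walkthrough; they are not the HPP source or the exact listing of the figure.

```python
def build_precedence(statements, counters):
    """Replay the stack / information-table procedure described above.

    statements: list of (label_or_None, kind, argument) tuples (assumed encoding).
    counters:   dict mapping counter names to their declared values.
    Returns a dict mapping each subtask to the list of its successors.
    """
    stack = []          # subtasks waiting to become predecessors
    info_table = []     # (subtask, label) pairs recorded by FORK and GOTO
    successors = {}
    initial = True

    for label, kind, arg in statements:
        if label is not None:
            # Label: push the subtask of every pair that targets this label.
            for subtask, target in info_table:
                if target == label:
                    stack.append(subtask)
        if kind == "SUBTASK":
            successors.setdefault(arg, [])
            if initial:
                initial = False
            else:
                # Everything on the stack precedes the new subtask,
                # except DUMMY, which never appears in the graph.
                while stack:
                    pred = stack.pop()
                    if pred != "DUMMY":
                        successors[pred].append(arg)
            stack.append(arg)
        elif kind == "FORK":
            current = stack.pop()
            for target in arg:
                info_table.append((current, target))
            stack.append(current)   # execution also continues on the next line
        elif kind == "GOTO":
            info_table.append((stack.pop(), arg))
        elif kind == "JOIN":
            if len(stack) != counters[arg]:
                raise ValueError("JOIN %s expects %d predecessors, found %d"
                                 % (arg, counters[arg], len(stack)))
    successors.pop("DUMMY", None)
    return successors


# Hypothetical encoding of the example program followed in the walkthrough.
PROGRAM = [
    (None, "SUBTASK", "DUMMY"),
    (None, "FORK", ["L1"]),
    (None, "SUBTASK", "S4"),
    (None, "FORK", ["L3"]),
    (None, "SUBTASK", "S5"),
    (None, "GOTO", "L2"),
    ("L1", "SUBTASK", "S6"),
    (None, "FORK", ["L2"]),
    (None, "SUBTASK", "S3"),
    (None, "GOTO", "L3"),
    ("L2", "JOIN", "COUNT1"),
    (None, "SUBTASK", "S2"),
    ("L3", "JOIN", "COUNT2"),
    (None, "SUBTASK", "S1"),
]
GRAPH = build_precedence(PROGRAM, {"COUNT1": 2, "COUNT2": 3})
```

Under these assumptions GRAPH records S5 and S1 as successors of S4, S3 and S2 as successors of S6, and S1 as the only successor of S2 and S3, which agrees with the successor and predecessor counts in the sample task table of Section 4.3.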
4.2.1 Input and Output Files of HPP 
HPP is written in C so that it can be modified as needed in the future.
It is invoked by the command HPP with two parameters as follows:
HPP InputFile OutputFile
InputFile is the program file containing the task representation in terms
of FORK-JOIN language constructs, and OutputFile is the output file produced
by the HPP, containing the task information. The output file is not used for
the execution of the task. Besides the FORK-JOIN program, HPP requires
additional files depending upon the task. They are the code files with a ".COD"
extension and the global data information files with a ".DAT" extension.
For every subtask name used in the FORK-JOIN program, such as S1 and
S2, the corresponding code and global data information files need to be provided
and should be named S1.COD, S1.DAT and S2.COD, S2.DAT respectively.
All the code files are merged into one file, called taskname.COD (see footnote 2).
The contents of all the data files are inserted into the output task table.
In the following section an example of a precedence graph with its
corresponding FORK-JOIN program, and the output produced by HPP, is
presented.
4.3 Sample Results 
The FORK-JOIN construct of the precedence graph in Figure 4-2 might
look like the program shown in Figure 4-1. For ease of reference, let us name
this file AAA.FJ. In addition to AAA.FJ, the files required by the HPP are:
• the subtask code files:
S1.COD, S2.COD, S3.COD, S4.COD, S5.COD, and S6.COD.
• and the subtask global data files:
S1.DAT, S2.DAT, S3.DAT, S4.DAT, S5.DAT, and S6.DAT.
Invoking the command:
HPP AAA.FJ AAA.INF
causes the HPP to produce the following files:
1. the task table, named AAA.TBL, which consists of:
AAA              ;task name (see footnote 3).
6                ;the total number of subtasks.
1 0 3 0 349      ;From col 1 to col 5:
2 1 2 349 349    ;subtask ID, number of
3 1 1 692 349    ;successor nodes, number
4 2 0 9db 349    ;of predecessor nodes,
5 1 1 d24 349    ;offset address (HEX),
6 2 0 106d 349   ;and code length (bytes).
1 1 5 1 2 3 2    ;the successor
                 ;nodes ID.
2. AAA.COD, the subtasks' executable codes.
Footnote 2: taskname is replaced by the name of the task.
Footnote 3: Comments are added for clarification purposes only.
These two files, plus the global data file, called AAA.DAT, are the files
used by the NX8086 architecture to actually execute the task.
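To make the layout of the task table concrete, the following Python sketch reads the AAA.TBL contents listed above. It is an illustration under stated assumptions: the function name parse_task_table and the dictionary keys are hypothetical, the explanatory comments are taken to be absent from the real file (footnote 3), and the final line is read as the successor IDs of all subtasks concatenated in subtask order. The code length field is parsed as hexadecimal because consecutive offsets in the listing differ by exactly 349 (hex); that reading is an inference, not a statement from the thesis.

```python
TABLE_TEXT = """\
AAA
6
1 0 3 0 349
2 1 2 349 349
3 1 1 692 349
4 2 0 9db 349
5 1 1 d24 349
6 2 0 106d 349
1 1 5 1 2 3 2
"""


def parse_task_table(text):
    """Split a task table into its name and one record per subtask."""
    tokens = text.split()
    name, count = tokens[0], int(tokens[1])
    pos = 2
    rows = []
    for _ in range(count):
        sid, n_succ, n_pred = (int(t) for t in tokens[pos:pos + 3])
        rows.append({
            "id": sid,
            "n_succ": n_succ,
            "n_pred": n_pred,
            "offset": int(tokens[pos + 3], 16),   # offset address (HEX)
            "length": int(tokens[pos + 4], 16),   # assumed to be hex as well
        })
        pos += 5
    # Remaining tokens: successor IDs, consumed in subtask order.
    flat = [int(t) for t in tokens[pos:]]
    for row in rows:
        row["successors"] = flat[:row["n_succ"]]
        flat = flat[row["n_succ"]:]
    return name, rows


NAME, ROWS = parse_task_table(TABLE_TEXT)
```

With this reading, subtask 4 has successors 5 and 1, subtask 6 has successors 3 and 2, and each code block starts where the previous one ends.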
Chapter 5
Conclusion
5.1 Discussion
In this thesis an operating system for a Task-Flow parallel microprocessor
architecture is implemented and evaluated. The goal specified was to produce a
fault tolerant system taking into account all the given architectural features of
the NX8086. To a large extent, the goal was reached.
Task scheduling was seen as the most important function of the operating
system. Proper scheduling allows one to exploit the parallelism of the NX8086
hardware and also to embed the fault tolerance strategies. Since the processor
modules are independent, the scheduler is designed such that each individual
processor module runs its own scheduler. With this configuration, a processor
failure only degrades the overall system performance rather than causing a
catastrophic system failure. The expandability of the system is also maintained,
since the number of processors is transparent to both the user and the
scheduler.
The shared bus interconnection network guarantees mutually exclusive
access to the global memory. Therefore, only one ready queue is maintained in
the global memory, shared by all the schedulers. The data coherency
problem is also solved without employing a specific language construct such as
semaphores.
A task is represented by a task table stored in the global memory. To help
the user generate a task table, a Host-PreProcessor program is developed.
Simulation of the system under control of the operating system developed here
shows that the failure of individual processors causes only marginal system
degradation.
5.2 Future Extensions 
The major limitation of the current scheduling and fault tolerance scheme
is seen to be the size of the ROM, which is only 8K bytes. The scheduler and the
loader must fit in this small space. Due to this limitation, the scheduler is
designed to be as small as possible, just enough to perform the necessary
jobs. Increasing the ROM size (a hardware change) would allow more flexibility
in the design of the scheduler. In particular, a memory management
scheme may be added to the operating system to overcome the task's size
limitation using memory swapping.
Additional effort needs to be expended to develop a compiler for this
architecture if it is to be of practical use. The compiler must produce not only the
object code of the task, but also the information required to generate the task
table. This compiler should thus use all the ideas already built into the
HPP discussed in Chapter three.
Finally, the simulation reported in Chapter four needs to be improved. In
particular, the random generation within the simulator should be modified to
remove the current presumption that two or more processors running the same
task cannot fail. This would make the simulation even more realistic and would
enable the system designers to evaluate more complex fault tolerance strategies.
REFERENCES 
[1] Lionel M. Ni and Ching-Farn E. Wu, "Design Tradeoffs for Process· 
Scheduling in Shared Memory Multiproce·~sor Systems", IEEE Trans-. 
action on Software Engineering, pp. 327-333~ Vol. 15, 1989. 
[2] Philip H. Enslow Jr., editor, Multiprocessors and Parallel Processing, 
Wiley-Interscience, 197 4. 
[3] A. Gottlieb et al., "The NYU Ultracomputer-Designing a MIMD Shared-
memory Parallel Computer", IEEE Transaction on Computer, pp. 
175-189, Vol. C-32, Feb. 1983. 
. [4] 
[5] 
[6] 
[7] 
John C. Pagano, "A Parallel Microprocessor Architecture for A Task Flow 
Programming Environment", Master's thesis, Lehigh University, 1987. 
T. Basil Smith III et al, The Fault-Tolerant Multiprocessor Computer, 
Noyes Publications, Park Ridge, New Jersey., 1986. 
Walter A. Triebel and Avtar Singh, 16-B!TrMicroprocessors Architecture, 
Sofiware, and Interface Te~hniques, Prenti~e Hall, Englewood Cliffs, NJ 
07632, 1985. 
James L. Peterson and Abraham Silberschatz, Operating System 
Concepts, Addison-Wesley, Computer Science, 1985 . 
.. 
.y 
.· 48 
,. 
{ 
.• 
,. 
' 
Appendix A
Grammar Specification for FORK-JOIN
language construct.
The FORK-JOIN language construct used to express the parallelism of programs
is defined through the grammar rules specified in this appendix. The parser of
this grammar is explained in Chapter.
Symbol            Meaning
=                 is defined to be
|                 alternatively
.                 end of production
[X]               0 or 1 instance of X
{X}               0 or more instances of X
(X | Y)           a grouping; either X or Y
"XYZ"             the terminal symbol XYZ
MetaIdentifier    the non-terminal symbol MetaIdentifier
Table A-1: Notation used in the productions
The Productions:
Process                   = ProcessHeading ";" Block.
ProcessHeading            = "PROCESS" Identifier.
IdentifierList            = Identifier { "," Identifier }.
Identifier                = Letter { Letter | Digit }.
Letter                    = "a" | "b" | ... | "z" | "A" | ... | "Z".
Digit                     = "0" | "1" | ... | "9".
Block                     = InitializationPart
                            "PROCESSBEGIN" ";"
                            StatementPart.
StatementPart             = [CompoundStatement]
                            "PROCESSEND" ";".
CompoundStatement         = Statement { Statement }.
Statement                 = [LabelIdentifier ":"] SimpleStatement.
SimpleStatement           = ( ForkStatement | GotoStatement |
                              JoinStatement | SubprocessStatement ).
ForkStatement             = "FORK" LabelIdentifierList.
LabelIdentifierList       = LabelIdentifier { "," LabelIdentifier } ";".
LabelIdentifier           = Identifier.
GotoStatement             = "GOTO" LabelIdentifier ";".
JoinStatement             = "JOIN" CounterIdentifier ";".
CounterIdentifier         = Identifier.
SubprocessStatement       = SubprocessIdentifier ";".
SubprocessIdentifier      = Identifier.
InitializationPart        = ( LabelInitialization |
                              CounterInitialization |
                              SubprocessInitialization ).
LabelInitialization       = "LABEL" LabelIdentifierList.
CounterInitialization     = "COUNTER" CounterIdentifierList.
CounterIdentifierList     = CounterInitialization
                            { "," CounterInitialization } ";".
CounterInitialization     = CounterIdentifier "=" UnsignedNumber.
CounterIdentifier         = Identifier.
UnsignedNumber            = Digit { Digit }.
SubprocessInitialization  = "SUBPROCESS"
                            SubprocessIdentifierList.
SubprocessIdentifierList  = SubprocessIdentifier
                            { "," SubprocessIdentifier } ";".
SubprocessIdentifier      = Identifier.
Rules:
1. The counters, labels, and subprocesses used in the program must
be declared.
2. When two or more of the processes can be started at the same time,
a DUMMY subprocess must be added to the program as the
predecessor node for all of these processes. (REMEMBER to put it into
the declaration part.)
3. No GOTO to the previous line. 
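Rules 1 and 3 lend themselves to a mechanical check. The Python sketch below is one possible way to test them and is not part of HPP. The function name check_rules and the tuple encoding of statements are hypothetical, and Rule 3 is interpreted here as "a GOTO may not target a label on the same or an earlier line", which is an assumption about the intended meaning.

```python
def check_rules(labels, counters, subprocesses, statements):
    """Return a list of violations of Rules 1 and 3 (empty if none).

    labels, counters, subprocesses: sets of declared names.
    statements: list of (label_or_None, kind, argument) tuples (assumed encoding).
    """
    errors = []
    # Line number on which each label is attached to a statement.
    label_line = {lab: n for n, (lab, _, _) in enumerate(statements, 1) if lab}
    for n, (label, kind, arg) in enumerate(statements, 1):
        if label and label not in labels:
            errors.append("line %d: label %s is not declared" % (n, label))
        if kind == "SUBPROCESS" and arg not in subprocesses:
            errors.append("line %d: subprocess %s is not declared" % (n, arg))
        elif kind == "JOIN" and arg not in counters:
            errors.append("line %d: counter %s is not declared" % (n, arg))
        elif kind in ("FORK", "GOTO"):
            for target in (arg if kind == "FORK" else [arg]):
                if target not in labels:
                    errors.append("line %d: label %s is not declared" % (n, target))
                elif kind == "GOTO" and label_line.get(target, n + 1) <= n:
                    errors.append("line %d: GOTO %s jumps backward" % (n, target))
    return errors


GOOD = [(None, "SUBPROCESS", "S1"), (None, "GOTO", "L1"), ("L1", "SUBPROCESS", "S2")]
BAD = [("L1", "SUBPROCESS", "S1"), (None, "GOTO", "L1"), (None, "JOIN", "C9")]

GOOD_ERRORS = check_rules({"L1"}, {"C1"}, {"S1", "S2"}, GOOD)
BAD_ERRORS = check_rules({"L1"}, {"C1"}, {"S1", "S2"}, BAD)
```

GOOD passes without complaint, while BAD is reported twice: once for the backward GOTO and once for the undeclared counter. Rule 2 concerns the shape of the precedence graph and is not checked by this sketch.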
Appendix B
The Task Scheduler Simulation
Specifications
Task scheduling is a very important part of the operating system of the
NX8086. The fault tolerance properties of the architecture are intimately
related to the task scheduling. The simulator described in this Appendix was used
to simulate task scheduling and produce the performance results given in
Chapter.
The simulator code is modularized for ease of modification and fine tuning
to mimic the performance of the NX8086 architecture. The important modules of
the code are the Print module, Hardware module, Create module, Queue module,
I/O module, Kernel module, and Main module.
Code Specifications 
Module PRINT 
This module consists of the following routines for displaying the execution of
the simulation and its results.
• print_tree_table (y)
  Parameters   y coordinate on the screen.
  Return       None.
  Description  This routine prints the subtask status label starting
               at location (1,y).
• print_node (ptrnode, x, y)
  Parameters   A pointer to a node's record, ptrnode, and screen
               coordinates (x,y).
  Return       None.
  Description  This routine prints a node's record beginning at
               location (x,y).
• print_tree (y)
  Parameters   y coordinate.
  Return       None.
  Description  This routine prints out the task tree.
• print_processors_table (numb_of_proc)
  Parameters   The number of processors, numb_of_proc.
  Return       None.
  Description  This routine prints the processor's status label.
• print_processors (numb_of_proc)
  Parameters   The number of processors, numb_of_proc.
  Return       None.
  Description  This routine prints the status of all the processors.
Module HARDWARE 
In this module three subroutines for simulating the hardware are included.
• bus_req (proc, numb_of_proc)
  Parameters   Processor ID, proc, and the total number of
               processors, numb_of_proc.
  Return       None.
  Description  This routine is called when a processor wants to
               request the BUS. It sets the bit corresponding to the
               processor ID number in the BUS request array, BRQ.
• bus_rel (proc, numb_of_proc)
  Parameters   Processor ID, proc, and the total number of
               processors, numb_of_proc.
  Return       None.
  Description  This routine is called when a processor wants to
               release the BUS. It resets the corresponding bit in
               array BRQ.
• arbitrate (numb_of_proc)
  Parameters   The total number of processors, numb_of_proc.
  Return       None.
  Description  This routine is called every cycle to simulate the
               arbitration logic. By examining the array BRQ, it
               decides which processor should next get the BUS.
Module CREATE
This module consists of subroutines used to set up the task tree.
• create_task_tree (mintime)
  Parameters   The minimum time for subtasks (in seconds),
               mintime.
  Return       Pointer to the first node (subtask), called
               head_node.
  Description  This routine opens the input file, which contains
               the information on subtask dependencies, and then
               creates the task tree.
• free_process_tree ()
  Parameters   None.
  Return       None.
  Description  This routine releases all the memory allocated for
               the task tree.
• create_processor_table (num)
  Parameters   Number of processors, num.
  Return       None.
  Description  This routine creates the processor table.
Module QUEUE
This module consists of subroutines used to manipulate the READY
QUEUE.
• delete_ready_Q (subtask)
  Parameters   The pointer to a subtask which is to be removed
               from the READY QUEUE, subtask.
  Return       TRUE if the task is removed successfully, else
               FALSE.
  Description  This routine updates the status of the specified
               subtask to COMPLETED.
• add_ready_Q (subtask)
  Parameters   The pointer to a subtask which is to be added to the
               READY QUEUE, subtask.
  Return       None.
  Description  This routine inserts a subtask into the READY
               QUEUE and sets the status of the subtask to
               READY.
• create_ready_link (headptr)
  Parameters   The pointer to the first subtask in the task_tree,
               headptr.
  Return       None.
  Description  This routine scans through the task_tree and
               inserts all the subtasks that are ready to be executed
               into the READY QUEUE. This subroutine is only
               called once, immediately after a task_tree is built.
Module IO
This module consists of routines used to simulate input and output from a
processor.
• send_result (proc)
  Parameters   The processor ID, proc.
  Return       None.
  Description  This routine is called when the task execution in a
               processor is completed. It sets the global_end_time
               and finds the time elapsed between the
               global_end_time and the global_start_time to
               obtain the time used by the task.
• getrdy (proc)
  Parameters   The processor ID, proc.
  Return       None.
  Description  This routine is called when a processor is ready to
               receive a task from the host computer. It sets the
               global_start_time.
• recieve (proc, mintime)
  Parameters   The processor ID, proc, and the minimum time,
               mintime.
  Return       The task name and the minimum time required for
               all the subtasks.
  Description  This routine first calls getrdy to get ready for data
               transfer, then it calls create_task_tree to create a
               new task tree, and finally calls create_ready_link
               to build the READY QUEUE.
• movedata (srcseg, srcoff, destseg, destoff, length)
  Parameters   The source (segment and offset) address, srcseg and
               srcoff, the destination (segment and offset) address,
               destseg and destoff, and the length of data to be
               moved (in bytes), length.
  Return       None.
  Description  This routine is used to transfer data or code between
               the global memory and a processor's local memory.
• update_data (proc)
  Parameters   The processor ID, proc.
  Return       None.
  Description  This routine is called when a processor finishes
               executing a subtask.
Module KERNEL
This module contains all the routines used to simulate task scheduling.
• update_table (subtask)
  Parameters   The pointer to a subtask that has finished executing,
               subtask.
  Return       None.
  Description  This routine is called when a subtask is completed.
               It looks for all of the subtask's successor nodes and
               updates their number of predecessors. If the number
               of predecessors of a successor is zero, it also sets the
               status of that successor to READY and puts it
               into the READY QUEUE.
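The bookkeeping performed by update_table can be sketched in a few lines of Python. This is an illustration of the description above, not the simulator's code: the Subtask class, its field names, and the initial status string "WAITING" are assumptions, and the READY QUEUE is modelled with a collections.deque.

```python
from collections import deque


class Subtask:
    """Hypothetical node record for this sketch."""

    def __init__(self, name, n_pred):
        self.name = name
        self.n_pred = n_pred        # predecessors that have not completed yet
        self.successors = []
        self.status = "WAITING"     # assumed name for the not-yet-ready state


def update_table(finished, ready_queue):
    """Release the successors of a completed subtask."""
    for succ in finished.successors:
        succ.n_pred -= 1
        if succ.n_pred == 0:
            succ.status = "READY"
            ready_queue.append(succ)


# S5 and S6 both precede S2, as in the FORK-JOIN example.
s2 = Subtask("S2", 2)
s5 = Subtask("S5", 0)
s6 = Subtask("S6", 0)
s5.successors.append(s2)
s6.successors.append(s2)

ready = deque()
update_table(s5, ready)      # S2 still waits for S6
after_first = len(ready)
update_table(s6, ready)      # last predecessor done, S2 becomes READY
```

S2 enters the queue only after both of its predecessors have completed, which is the behaviour a JOIN with counter value 2 requires.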
• get_a_subtask (proc, procname, FT_degree)
  Parameters   The processor ID, proc, the processor name,
               procname, and the degree of redundancy, FT_degree.
  Return       This routine returns TRUE if a subtask is found,
               else FALSE. It also returns the current task name.
  Description  This routine first checks the READY QUEUE. If
               the queue is not empty then it gets a subtask from
               the head of the queue. If the queue is empty then
               the currently running subtasks are checked until a
               running subtask with counter field less than the
               degree of redundancy is found. If a subtask is
               found then this routine loads the subtask into the
               local memory for execution and returns TRUE.
               Otherwise it returns FALSE.
• task_completed ()
  Parameters   None.
  Return       TRUE or FALSE.
  Description  This routine returns TRUE if a task reaches
               completion, otherwise it returns FALSE.
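The selection policy described for get_a_subtask can be sketched as follows. The sketch is hedged in several ways: subtasks are plain dictionaries, the counter field is assumed to count the processors currently executing the subtask, a subtask taken from the READY QUEUE is assumed to move to a list of running subtasks, and the function returns the chosen subtask (or None) instead of TRUE or FALSE. Loading the code into local memory is omitted.

```python
from collections import deque


def get_a_subtask(ready_queue, running, ft_degree):
    """Pick the next subtask for an idle processor, or None if there is none."""
    if ready_queue:
        # Normal case: take the subtask at the head of the READY QUEUE.
        subtask = ready_queue.popleft()
        subtask["counter"] = 1
        running.append(subtask)
        return subtask
    # Queue empty: run a redundant copy of a subtask that is already running,
    # as long as fewer than ft_degree processors are executing it.
    for subtask in running:
        if subtask["counter"] < ft_degree:
            subtask["counter"] += 1
            return subtask
    return None


ready = deque([{"name": "S4", "counter": 0}, {"name": "S6", "counter": 0}])
running = []
picks = []
for _ in range(5):
    chosen = get_a_subtask(ready, running, ft_degree=2)
    picks.append(chosen["name"] if chosen else None)
```

With a degree of redundancy of 2, the first two requests receive S4 and S6 from the queue, the next two receive redundant copies of them, and the fifth request finds nothing to do.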
• execute_subtask (proc)
  Parameters   The processor ID, proc.
  Return       None.
  Description  This routine is used to simulate the execution of a
               subtask. It calculates the time elapsed between the
               start time of execution (stored in the processor's
               table) and the current time. If the time elapsed
               exceeds the time the subtask is expected to take then
               it sets the processor's state to END_EXECUTING.
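The timing test performed by execute_subtask amounts to a single comparison, sketched below in Python. The dictionary keys start_time, expected_time, and state are hypothetical names for the corresponding fields of the processor's table, and the current time is passed in as an argument so that the example is deterministic.

```python
def execute_subtask(entry, now):
    """Mark the processor as finished once the expected time has elapsed."""
    elapsed = now - entry["start_time"]
    if elapsed > entry["expected_time"]:
        entry["state"] = "END_EXECUTING"


# One processor-table entry: the subtask started at t = 10.0 and needs 2.0 s.
entry = {"state": "EXECUTING", "start_time": 10.0, "expected_time": 2.0}
execute_subtask(entry, now=11.0)
state_midway = entry["state"]     # still running after 1.0 s
execute_subtask(entry, now=12.5)
state_late = entry["state"]       # 2.5 s elapsed, longer than expected
```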
• do_IDLE (P, numb_of_proc, FT_degree)
  Parameters   The processor ID, P, the total number of processors,
               numb_of_proc, and the degree of redundancy,
               FT_degree.
  Return       None.
  Description  This routine is called when the processor's state is
               IDLE. If the processor is granted the BUS, then,
               if there is no task yet, it calls subroutine receive to
               get a task. Finally it calls get_a_subtask to get a
               subtask; if it finds one, it sets the processor's
               state to EXECUTING and releases the BUS.
• do_EXECUTING (P, numb_of_proc)
  Parameters   The processor ID, P, and the total number of
               processors, numb_of_proc.
  Return       None.
  Description  This routine is called when the processor's state is
               EXECUTING. It then calls routine
               execute_subtask.
• do_END_EXECUTING (P, numb_of_proc)
  Parameters   The processor ID, P, and the total number of
               processors, numb_of_proc.
  Return       TRUE or FALSE.
  Description  This routine is called when the processor's state is
               END_EXECUTING. First it calls bus_req to
               request the BUS. When the BUS is granted, the
               current task name is compared with the one stored in
               the processor's table; if the name is the same then
               the processor's table and the task tree are updated.
               It then calls task_completed to check the
               completion of a task. If the task is completed then this
               routine returns TRUE, otherwise FALSE. Finally it
               sets the processor's state to IDLE and releases the
               BUS by calling bus_rel.
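The three state handlers above form a small state machine: IDLE, then EXECUTING, then END_EXECUTING, then IDLE again. The Python sketch below shows only that cycle. It is a simplification under stated assumptions: bus arbitration, fault tolerance, and the task tree are left out, the functions step and schedule and the dictionary fields are hypothetical, and an integer clock stands in for real time.

```python
IDLE, EXECUTING, END_EXECUTING = "IDLE", "EXECUTING", "END_EXECUTING"


def step(proc, clock, ready, done):
    """Advance one processor by one scheduling pass (bus arbitration omitted)."""
    if proc["state"] == IDLE:
        if ready:
            proc["subtask"], proc["duration"] = ready.pop(0)
            proc["start"] = clock
            proc["state"] = EXECUTING
    elif proc["state"] == EXECUTING:
        if clock - proc["start"] > proc["duration"]:
            proc["state"] = END_EXECUTING
    elif proc["state"] == END_EXECUTING:
        done.append(proc["subtask"])
        proc["state"] = IDLE


def schedule(procs, clock, ready, done):
    """Visit every processor once and dispatch on its current state."""
    for proc in procs:
        step(proc, clock, ready, done)


procs = [{"state": IDLE}, {"state": IDLE}]
ready = [("S4", 1), ("S6", 2)]    # (subtask name, expected time in ticks)
done = []
for clock in range(5):
    schedule(procs, clock, ready, done)
```

After five passes both subtasks have gone through the full cycle and both processors are idle again. Because each processor is stepped independently, removing a processor from procs only slows the run down, which is the degradation behaviour the thesis aims for.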
• schedule (numb_of_proc, FT_degree)
  Parameters   The total number of processors, numb_of_proc, and
               the degree of redundancy, FT_degree.
  Return       None.
  Description  This routine iterates from 0 to the maximum
               number of processors and calls routines
               do_EXECUTING, do_END_EXECUTING, and
               do_IDLE depending on each processor's state.
Module MAIN
This module consists of two initialization routines.
• init_kernel ()
  Parameters   None.
  Return       The total number of processors.
  Description  This routine is used to initialize the kernel. It
               inquires the total number of processors, then creates
               the processor's table.
• init_system ()
  Parameters   None.
  Return       None.
  Description  This routine is called once at the beginning of the
               simulation. It asks for the input file that
               contains the information of a task, and the output file
               used to store the performance of the simulation. It
               also initializes the BUS grant (BG) to processor 0.
Vita 
The author was born in Bandung, Indonesia on February 28, 1964, to
Budidharma and Hirawati Budidharma. After graduation from YAHYA Senior
High School, Bandung, Indonesia, in May 1982, he enrolled in the Institute of
Technology of the University of Minnesota, Minneapolis, Minnesota to pursue a
Computer Engineering curriculum. He graduated with a Bachelor of Computer
Science in Spring 1987. In Fall 1987, he began graduate study at Lehigh
University, Bethlehem, Pennsylvania. His areas of interest are operating
systems, distributed systems, logic design, parallel computer architecture, and
computer networks. He received a Who's Who Among International Student
award in Spring 89 and completed his M.S.E.E. degree in June 1989.
