An experimental distributed microprocessor implementation with a shared memory communications and control medium by Mejzak, R. S.
AN EXPERIMENTAL DISTRIBUTED MICROPROCESSOR IMPLEMENTATION
WITH A SHARED MEMORY COMMUNICATIONS AND CONTROL MEDIUM
Richard S. Mejzak
Naval Air Development Center
Warminster, Pennsylvania
An experimental distributed microprocessor subsystem is currently under
development at the Naval Air Development Center as a vehicle to investigate
distributed processing concepts with respect to replacing larger computers
with networks of microprocessors at the subsystem or node level. Major benefits
being exploited include increased performance, flexibility, system availability,
and survivability, obtained by using multiple processing elements with reduced
cost, size, weight, and power consumption.
This paper concentrates on defining the distributed processing concept in
terms of control primitives, variables, and structures, and on their use in
performing a decomposed DFT (Discrete Fourier Transform) application function.
The DFT was chosen as the experimental application for investigating distributed
processing concepts because its highly regular structure is readily decomposed
for concurrent execution. The design assumes that interprocessor communication
is anonymous: every processor can access the entire common database by means of
control primitives, without needing to know which other processors exist. Access
to selected areas within the common database is random, enforced by a hardware
lock, and determined by task and subtask pointers. This allows the number of
processors in the configuration to be varied without any modification to the
control structure. The decomposition of the DFT application function into tasks
and subtasks is also described.
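The anonymous, pointer-driven access scheme described above can be sketched as follows. This is a minimal illustration only, assuming Python threads stand in for the slave processors and a `threading.Lock` stands in for the hardware lock; `SharedMemory`, `next_subtask`, `slave`, and `run_slaves` are hypothetical names, not part of the original design. Each worker sees only the shared subtask pointer, never the other workers, so the same control code runs unchanged whatever the worker count.

```python
import threading


class SharedMemory:
    """Hypothetical common database: a shared subtask pointer plus a result area."""

    def __init__(self, num_subtasks):
        self.lock = threading.Lock()  # stands in for the hardware lock
        self.subtask_pointer = 0      # next subtask not yet claimed by any processor
        self.num_subtasks = num_subtasks
        self.results = [None] * num_subtasks

    def next_subtask(self):
        """Atomically claim the next subtask; return None when all are claimed."""
        with self.lock:
            if self.subtask_pointer >= self.num_subtasks:
                return None
            index = self.subtask_pointer
            self.subtask_pointer += 1
            return index


def slave(shared, work):
    """Claim and process subtasks until none remain; no knowledge of other slaves."""
    while (index := shared.next_subtask()) is not None:
        value = work(index)           # local processing
        with shared.lock:             # write the result back under the lock
            shared.results[index] = value


def run_slaves(num_slaves, num_subtasks, work):
    shared = SharedMemory(num_subtasks)
    workers = [threading.Thread(target=slave, args=(shared, work))
               for _ in range(num_slaves)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return shared.results


def square(i):
    return i * i


if __name__ == "__main__":
    # Same control code with one or five "processors" gives the same result.
    print(run_slaves(1, 8, square) == run_slaves(5, 8, square))  # prints True
```

Because a slave only ever asks the shared pointer for "the next subtask", adding or removing a worker changes how fast the list drains, not the code any worker runs.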
The experimental hardware configuration consists of IMSAI 8080 chassis, which
are independent 8-bit microcomputer units. These chassis are linked together
into a multiple processing system by a shared memory facility. This facility
provides a bus structure that allows up to six microcomputers to be
interconnected, together with polling and arbitration logic ensuring that only
one processor has access to shared memory at any one time. For discussion
purposes, five of the processors are designated as slaves and one as the
master; each slave contains an identical copy of a control executive and of the
application program tasks. In operation, the slave processors cooperate to
compute the DFT while the master provides external input, output, and control
functions. In this implementation, commands to perform a DFT iteration are
issued through the master.
This concept is expected to be tested and demonstrated on a laboratory model
by the end of 1980. Evaluations will concentrate on areas such as performance
comparisons as the number of processors is varied, and bus contention factors
as a function of local processing and common database access times. Future
work will focus on fault-tolerant techniques that can be implemented and
evaluated directly on the baseline laboratory model.
https://ntrs.nasa.gov/search.jsp?R=19810003151 2020-03-21T16:09:47+00:00Z
MOTIVATION
AVIONIC PROCESSING SYSTEMS ARE BECOMING MORE DISTRIBUTED IN
ORDER TO EXPLOIT THE FOLLOWING MAJOR BENEFITS:
- INCREASED SYSTEM-WIDE REAL TIME PERFORMANCE
- EASE OF ADAPTABILITY TO INTEGRATION AND CHANGE
- HIGH SYSTEM AVAILABILITY
- DECREASED SYSTEM VULNERABILITY
BECAUSE OF REDUCED SIZE, WEIGHT, POWER CONSUMPTION AND COST
ADVANTAGES, MICROPROCESSOR TECHNOLOGY WILL IMPACT AVIONIC
PROCESSING SYSTEMS IN THE FOLLOWING AREAS:
- INTERFACE AND HARDWIRED LOGIC REPLACEMENT APPLICATIONS
- REPLACING LARGER COMPUTERS WITH NETWORKS OF SMALLER
COMPUTERS
MICROPROCESSOR TECHNOLOGY AND
DISTRIBUTED PROCESSING
• REASONABLE COST PERMITS EXPERIMENTING WITH CONCEPTS THAT WOULD
OTHERWISE REMAIN PAPER STUDIES
• REDUCED SIZE, WEIGHT, AND POWER PERMIT APPLICATIONS THAT WOULD
OTHERWISE NOT BE FEASIBLE
• LIFE CYCLE COSTS ARE OFTEN MUCH LOWER THAN THOSE OF FORMER SOLUTIONS TO
THE SAME PROBLEM
GLOBAL/LOCAL DISTRIBUTION
[Figure: SUBSYSTEMS 1 THROUGH N ON SUBSYSTEM BUSES, INTERCONNECTED BY A SYSTEM BUS]
APPROACH
• EXPERIMENTAL INVESTIGATION
• LABORATORY MODEL
• OFF-THE-SHELF HARDWARE (MICROPROCESSORS ARE INEXPENSIVE)
- MULTIPLE PROCESSORS
- SHARED MEMORY FACILITY INTERCONNECT
• EXPERIMENTAL CONTROL STRUCTURE
- LOCAL KNOWLEDGE OF THE EXISTENCE OF OTHER PROCESSORS NOT
REQUIRED
- GLOBAL CONTROL AND TASK SCHEDULING VIA HIGHLY RELIABLE
SHARED MEMORY
• EXPERIMENTAL APPLICATION: THE WELL-KNOWN DFT
• DEMONSTRATE CONCEPT FEASIBILITY
• PERFORM TRADE-OFF ANALYSES
• IDENTIFY AND IMPLEMENT FAULT-TOLERANT CONCEPTS
EXPERIMENTAL HARDWARE CONFIGURATION
[Figure: SLAVE PROCESSORS (S) AND A MASTER PROCESSOR (M) WITH LOCAL MEMORIES (LM), INTERCONNECTED THROUGH A SHARED MEMORY]
S = SLAVE PROCESSOR
M = MASTER PROCESSOR
LM = LOCAL MEMORY
SHARED MEMORY FACILITY CONSTRAINTS
• SIX PROCESSORS MAXIMUM
• ROUND-ROBIN POLLING SCHEME
• ONE BYTE ACCESSED PER POLL
• FIXED LOCK-OUT TIME IN FUG BLOCK
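A toy model of the round-robin, one-byte-per-poll constraint shows how multi-byte transfers from different processors interleave. It is a sketch under stated assumptions: processors are numbered 0 to 5 (a hypothetical convention), polling starts at processor 0, and the fixed lock-out time is not modeled; `round_robin_grants` is an illustrative name.

```python
MAX_PROCESSORS = 6  # facility limit: six processors maximum


def round_robin_grants(requests):
    """Simulate round-robin polling in which each poll transfers one byte.

    requests maps a processor number (0-5) to the number of bytes it must access.
    Returns the processor numbers in the order their single-byte accesses are granted.
    """
    for proc, count in requests.items():
        if not 0 <= proc < MAX_PROCESSORS:
            raise ValueError(f"processor {proc} outside 0..{MAX_PROCESSORS - 1}")
        if count < 0:
            raise ValueError("byte counts must be non-negative")
    pending = dict(requests)
    grants = []
    poll = 0
    while any(pending.values()):
        if pending.get(poll, 0) > 0:          # polled processor is waiting: grant one byte
            grants.append(poll)
            pending[poll] -= 1
        poll = (poll + 1) % MAX_PROCESSORS    # then poll the next processor in turn
    return grants


if __name__ == "__main__":
    # Processor 5 needs three bytes, so it is served on three separate polling cycles.
    print(round_robin_grants({0: 2, 3: 1, 5: 3}))  # [0, 3, 5, 0, 5, 5]
```

Since a multi-byte variable is read over several polls, another processor's bytes can land in between, which suggests why a software semaphore is layered on top of the hardware arbitration.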
ASSUMPTIONS
• MASTER PROCESSOR PERFORMS INTERFACE AND DISPLAY FUNCTIONS
• SLAVE PROCESSORS PERFORM APPLICATION FUNCTION CONCURRENTLY
AS DIRECTED BY MASTER PROCESSOR
• LOCAL MEMORY
- EACH SLAVE PROCESSOR CONTAINS IDENTICAL COPY OF PROGRAMS
- CONTROL EXECUTIVE
- APPLICATION TASKS
• SHARED MEMORY
- COMMON TO ALL PROCESSORS
- CONTROL VARIABLES
- APPLICATION DATA
- ACCESSED BY CONTROL PRIMITIVES
- ACCESS RIGHTS ENFORCED BY SEMAPHORES
• VARYING NUMBER OF PROCESSORS DOES NOT AFFECT CONTROL STRUCTURE
TASK STRUCTURE
SYSTEM CONTROL: SEMAPHORES
• ENFORCES ACCESS RIGHTS TO SHARED MEMORY
• USED TO INDICATE CONDITIONS
- SHARED MEMORY BLOCKED
- SHARED MEMORY AVAILABLE
- ITERATION IN PROGRESS
- ITERATION COMPLETED
CONTROL PRIMITIVES
• SEIZE
• RELEASE
CONTROL VARIABLES
• BI/TLP
- FORMAT (ONE BYTE):
  MSB                           LSB
  | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |  <- BIT POSITION
  |  B  I  |          TLP          |
- BI IS A 2-BIT SEMAPHORE THAT INDICATES THE FOLLOWING CONDITIONS:
  B I   CONDITION
  0 0   SHARED MEMORY BLOCKED AND ITERATION COMPLETED
  0 1   SHARED MEMORY BLOCKED AND ITERATION IN PROGRESS
  1 0   SHARED MEMORY AVAILABLE AND ITERATION COMPLETED
  1 1   SHARED MEMORY AVAILABLE AND ITERATION IN PROGRESS
- TLP IS A 6-BIT TASK LIST POINTER THAT CAN POINT TO ANY ONE OF 64 TASKS
• TSs
- FORMAT (ONE BYTE): BITS 7 (MSB) THROUGH 0 (LSB)
- TSs IS AN 8-BIT WORD USED TO ASSOCIATE CORRESPONDING DATA WITH A TASK; IT CAN TAKE ON 256 VALUES
• CTCi
- FORMAT (ONE BYTE): BITS 7 (MSB) THROUGH 0 (LSB)
- CTC IS AN 8-BIT CUMULATIVE TASK COUNTER. ONE CTC IS REQUIRED FOR EACH TYPE OF TASK BEING PERFORMED, I.E., THE NUMBER OF CTCi IS EQUAL TO THE NUMBER OF TASKS POINTED TO BY TLP.
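Packing the two semaphore bits and the 6-bit pointer into one byte can be sketched as below. Assumptions: B occupies bit 7 and I bit 6, following the left-to-right order of the format diagram, and the helper names are hypothetical. Keeping BI and TLP in a single byte means a processor can fetch both in one single-byte shared memory access.

```python
# Hypothetical helpers for the one-byte BI/TLP control variable.
# Assumed layout: bit 7 = B, bit 6 = I, bits 5-0 = TLP.

TLP_MASK = 0b0011_1111  # 6-bit task list pointer, 0..63

CONDITIONS = {
    (0, 0): "SHARED MEMORY BLOCKED AND ITERATION COMPLETED",
    (0, 1): "SHARED MEMORY BLOCKED AND ITERATION IN PROGRESS",
    (1, 0): "SHARED MEMORY AVAILABLE AND ITERATION COMPLETED",
    (1, 1): "SHARED MEMORY AVAILABLE AND ITERATION IN PROGRESS",
}


def pack_bi_tlp(b, i, tlp):
    """Combine the B and I semaphore bits and the task list pointer into one byte."""
    if b not in (0, 1) or i not in (0, 1):
        raise ValueError("B and I are single bits")
    if not 0 <= tlp <= TLP_MASK:
        raise ValueError("TLP must fit in 6 bits (0..63)")
    return (b << 7) | (i << 6) | tlp


def unpack_bi_tlp(byte):
    """Split a BI/TLP byte back into (B, I, TLP)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("BI/TLP is a single byte")
    return (byte >> 7) & 1, (byte >> 6) & 1, byte & TLP_MASK


def describe(byte):
    """Translate a BI/TLP byte into the condition names from the table above."""
    b, i, tlp = unpack_bi_tlp(byte)
    return f"{CONDITIONS[(b, i)]}, TLP = {tlp}"


if __name__ == "__main__":
    print(describe(pack_bi_tlp(0, 1, 2)))
    # SHARED MEMORY BLOCKED AND ITERATION IN PROGRESS, TLP = 2
```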
MASTER PROCESSOR CONTROL PRIMITIVES
• SEIZEm PRIMITIVE
THIS PRIMITIVE IS EXECUTED BY THE MASTER PROCESSOR WHEN ACCESSING SHARED MEMORY.
[Flowchart: SEIZEm → HARDWARE POLLING → LOCKOUT PERIOD = Δt (NO OTHER PROCESSORS CAN ACCESS SHARED MEMORY DURING THIS TIME)]
AFTER SEIZEm, OTHER PROCESSORS CAN ACCESS SHARED MEMORY, BUT NO VARIABLES CAN BE DISTURBED UNTIL THE MASTER PROCESSOR RELEASES SHARED MEMORY TO THE SLAVE PROCESSORS BY MEANS OF THE RELEASEm PRIMITIVE.
• RELEASEm PRIMITIVE
THIS PRIMITIVE IS EXECUTED BY THE MASTER PROCESSOR WHEN RELEASING SHARED MEMORY TO THE SLAVE PROCESSORS FOR PERFORMING AN ITERATION. THE SLAVE PROCESSORS CAN THEN SEIZE SHARED MEMORY.
SLAVE PROCESSOR CONTROL PRIMITIVES
• SEIZEs PRIMITIVE
THIS PRIMITIVE IS EXECUTED BY THE SLAVE PROCESSORS WHEN ACCESSING SHARED MEMORY.
[Flowchart: SEIZEs → HARDWARE POLLING → LOCKOUT PERIOD = Δt (NO OTHER PROCESSORS CAN ACCESS SHARED MEMORY DURING THIS TIME) → FETCH BI/TLP → DECISION BLOCKS WHOSE "NO" BRANCHES LEAD TO A TIME DELAY OR TO INITIALIZE → SET BI = 01 → CONTINUE]
• RELEASEs PRIMITIVE
THIS PRIMITIVE IS EXECUTED BY THE SLAVE PROCESSORS WHEN RELEASING SHARED MEMORY.
[Flowchart: RELEASEs → HARDWARE POLLING AND A DECISION BLOCK → SET BI = 11 → TIME DELAY → ANOTHER PROCESSOR CAN NOW SEIZE SHARED MEMORY]
AFTER SEIZEs, OTHER PROCESSORS CAN ACCESS SHARED MEMORY, BUT NO VARIABLES CAN BE DISTURBED UNTIL THE ACCESSING PROCESSOR RELEASES SHARED MEMORY BY MEANS OF THE RELEASEs PRIMITIVE.
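The two-level locking implied by these primitives (a short hardware lockout, then a BI semaphore that holds off other processors until RELEASE) can be sketched as follows. This is a minimal model under loudly stated assumptions: Python threads stand in for processors, `threading.Lock` for hardware polling and the lockout period Δt, and a sleep for the time delay. SEIZEs setting BI = 01 and RELEASEs setting BI = 11 follow the flowcharts; the master's BI settings, the I = 0 exit test, and all names (`SharedControl`, `seize_s`, and so on) are hypothetical.

```python
import threading
import time

TIME_DELAY = 0.001  # hypothetical retry delay, in seconds


class SharedControl:
    """Hypothetical model of the BI/TLP byte and the hardware lock that guards it.

    The hardware lock is held only briefly (the lockout period); the B bit is what
    keeps other processors from disturbing shared variables between SEIZE and RELEASE.
    """

    def __init__(self):
        self._hw_lock = threading.Lock()    # stands in for hardware polling plus lockout
        self.b, self.i, self.tlp = 1, 0, 0  # available, iteration completed

    def state(self):
        with self._hw_lock:
            return self.b, self.i, self.tlp

    def seize_m(self):
        """SEIZEm: wait for B = 1, then block shared memory (assumed to leave I unchanged)."""
        while True:
            with self._hw_lock:
                if self.b == 1:
                    self.b = 0
                    return
            time.sleep(TIME_DELAY)

    def release_m(self, tlp):
        """RELEASEm: assumed to set BI = 11 so the slaves start an iteration at task tlp."""
        with self._hw_lock:
            self.b, self.i, self.tlp = 1, 1, tlp

    def seize_s(self):
        """SEIZEs: set BI = 01 and return TLP, or return None if the iteration is completed."""
        while True:
            with self._hw_lock:               # fetch BI/TLP during the lockout period
                if self.b == 1:
                    if self.i == 0:           # assumption: no work once I = 0
                        return None
                    self.b = 0                # BI = 01: blocked, iteration in progress
                    return self.tlp
            time.sleep(TIME_DELAY)            # blocked by another processor: wait, retry

    def release_s(self, tlp, iteration_done=False):
        """RELEASEs: set BI = 11 (or 10 if this slave finished the iteration) and store TLP."""
        with self._hw_lock:
            self.b, self.i, self.tlp = 1, (0 if iteration_done else 1), tlp


if __name__ == "__main__":
    control = SharedControl()
    control.seize_m()        # master blocks shared memory (e.g. while loading input data)
    control.release_m(tlp=0)
    shared = {"count": 0}    # a "shared variable" protected only by the B bit

    def slave():
        while (tlp := control.seize_s()) is not None:
            shared["count"] += 1
            control.release_s(tlp, iteration_done=shared["count"] == 30)

    slaves = [threading.Thread(target=slave) for _ in range(5)]
    for s in slaves:
        s.start()
    for s in slaves:
        s.join()
    print(shared["count"], control.state())  # 30 (1, 0, 0)
```

The hardware lock is never held while a slave works on shared variables; exclusion over that longer interval comes entirely from the B bit, matching the note that other processors may still access shared memory but may not disturb its variables.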
TASK EXECUTION CONTROL
ASSUMPTIONS
• EACH TASK CONSISTS OF N SUBTASKS
• ALL SUBTASKS OF A TASK MUST BE COMPLETED BEFORE THE NEXT TASK IS PROCESSED
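A sketch of how a cumulative task counter can enforce the rule that every subtask of a task finishes before the next task starts. Assumptions: TLP advances when CTCi reaches N, a slave that finds no unclaimed subtask simply delays and retries, and the names (`TaskControl`, `claim`, `complete`, `WAIT`) are hypothetical; a single `threading.Lock` stands in for the SEIZE/RELEASE primitives.

```python
import threading
import time

WAIT = object()  # sentinel: current task has no unclaimed subtasks but is not finished


class TaskControl:
    """Hypothetical shared control block: TLP, a subtask pointer, and one CTC per task."""

    def __init__(self, num_tasks, subtasks_per_task):
        if subtasks_per_task < 1:
            raise ValueError("each task needs at least one subtask")
        self.lock = threading.Lock()
        self.num_tasks = num_tasks
        self.n = subtasks_per_task
        self.tlp = 0                   # task list pointer: task currently being processed
        self.next_subtask = 0          # next unclaimed subtask of the current task
        self.ctc = [0] * num_tasks     # cumulative task counter for each task
        self.log = []                  # (task, subtask) in completion order

    def claim(self):
        """Return (task, subtask), WAIT, or None once every task is finished."""
        with self.lock:
            if self.tlp >= self.num_tasks:
                return None
            if self.next_subtask < self.n:
                subtask = self.next_subtask
                self.next_subtask += 1
                return self.tlp, subtask
            return WAIT

    def complete(self, task, subtask):
        """Count a finished subtask; advance TLP only when all N subtasks are done."""
        with self.lock:
            self.log.append((task, subtask))
            self.ctc[task] += 1
            if self.ctc[task] == self.n:
                self.tlp += 1
                self.next_subtask = 0


def worker(control):
    while (job := control.claim()) is not None:
        if job is WAIT:
            time.sleep(0.0005)         # other processors are still finishing this task
            continue
        control.complete(*job)         # real subtask work would happen before this call


def run(num_workers, num_tasks, subtasks_per_task):
    control = TaskControl(num_tasks, subtasks_per_task)
    threads = [threading.Thread(target=worker, args=(control,)) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return control
```

With `run(5, 3, 4)`, all twelve subtasks complete, and the completion log never shows a subtask of a later task ahead of any subtask of an earlier one, because TLP only moves once the earlier task's CTC reaches N.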
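DFT APPLICATION
A DFT CAN BE DEFINED IN THE FOLLOWING MATRIX FORM:
$$G = WF$$
IF WE LET:
• n = 0, 1, 2, ..., N-1 = MATRIX ROW NUMBER AND FREQUENCY STEP
• k = 0, 1, 2, ..., K-1 = MATRIX COLUMN NUMBER AND TIME STEP
• N = K, BUT THE n AND k NOTATIONS ARE MAINTAINED TO DISTINGUISH ROWS FROM COLUMNS
THEN:
• W IS AN N × K MATRIX CONSISTING OF THE TERMS
$$W_{n,k} = e^{-j\frac{2\pi}{N}(nk \bmod N)} = \cos\left[\frac{2\pi}{N}(nk \bmod N)\right] - j\sin\left[\frac{2\pi}{N}(nk \bmod N)\right]$$
• F IS A K × 1 MATRIX REPRESENTING THE FUNCTION $F(t_k)\,T/2\pi K$ OVER THE TIME SPAN T
• G IS AN N × 1 MATRIX WHERE
$$G_n = \frac{T}{2\pi K}\sum_{k=0}^{K-1} W_{n,k}\,F(t_k)$$
IN EXPANDED FORM, G = WF CAN BE WRITTEN AS:
$$\begin{bmatrix} G_0 \\ G_1 \\ \vdots \\ G_{N-1} \end{bmatrix} = \begin{bmatrix} W_{0,0} & W_{0,1} & \cdots & W_{0,K-1} \\ W_{1,0} & W_{1,1} & \cdots & W_{1,K-1} \\ \vdots & \vdots & \ddots & \vdots \\ W_{N-1,0} & W_{N-1,1} & \cdots & W_{N-1,K-1} \end{bmatrix} \begin{bmatrix} F(t_0)\,T/2\pi K \\ F(t_1)\,T/2\pi K \\ \vdots \\ F(t_{K-1})\,T/2\pi K \end{bmatrix}$$
SINCE:
$$G_{n,k}(\text{REAL}) = \cos\left[\frac{2\pi}{N}(nk \bmod N)\right] F(t_k)\,\frac{T}{2\pi K}$$
$$G_{n,k}(\text{IMAGINARY}) = -\sin\left[\frac{2\pi}{N}(nk \bmod N)\right] F(t_k)\,\frac{T}{2\pi K}$$
THE AMPLITUDE/FREQUENCY VALUES CAN BE OBTAINED AS FOLLOWS:
$$|G_n| = \sqrt{\left[\sum_{k=0}^{K-1} G_{n,k}(\text{REAL})\right]^2 + \left[\sum_{k=0}^{K-1} G_{n,k}(\text{IMAGINARY})\right]^2}$$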
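The matrix formulation above maps directly onto a few lines of code. The sketch below is illustrative, not the laboratory implementation: it uses Python's `cmath` in place of 8080 integer arithmetic, takes N = K, and the function names are hypothetical. Reducing nk mod N before forming the exponent reflects the fact that W has only N distinct entries, so a table of N cosine and sine values would suffice.

```python
import cmath
import math


def dft_matrix(n_points):
    """W[n][k] = exp(-j 2π (nk mod N) / N); only N distinct values ever occur."""
    return [[cmath.exp(-2j * math.pi * ((n * k) % n_points) / n_points)
             for k in range(n_points)]
            for n in range(n_points)]


def dft_amplitudes(samples, span_t):
    """Return |G_n| for G = WF, where F holds F(t_k)·T/(2πK) and N = K."""
    k_points = len(samples)
    scale = span_t / (2 * math.pi * k_points)
    amplitudes = []
    for row in dft_matrix(k_points):
        real = sum(w.real * f * scale for w, f in zip(row, samples))  # Σ_k G_{n,k}(REAL)
        imag = sum(w.imag * f * scale for w, f in zip(row, samples))  # Σ_k G_{n,k}(IMAGINARY)
        amplitudes.append(math.hypot(real, imag))
    return amplitudes


if __name__ == "__main__":
    k_points = 8
    samples = [math.cos(2 * math.pi * 2 * k / k_points) for k in range(k_points)]
    # With T = 2πK the scale factor is 1, so a cosine at frequency step 2 gives
    # amplitude K/2 = 4 at n = 2 and at its mirror n = 6, and (nearly) zero elsewhere.
    print([round(a, 6) for a in dft_amplitudes(samples, 2 * math.pi * k_points)])
```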
INPUT/OUTPUT
TASK/SUBTASK COMPLETIONS
SLAVE PROCESSOR   TASK 1   TASK 2   TASK 3
1                 X        X        X
2                 X        X        X
3                 X        X        X
4                 X        X        X
5                 X        X        X
CTC*              X        X        X
* CUMULATIVE TASK COUNTER
[Plot: AMPLITUDE VERSUS FREQUENCY, ω0 THROUGH ωN/2, FOR AN INPUT F(t) OF THE FORM e^(...)·SIN(ω0 t)]
DFT DECOMPOSITION FOR TASK 1
SUBTASK 1 OF N
INPUT
(SHARED MEMORY)
PROCESS
(LOCAL MEMORY p)
OUTPUT
(SHARED MEMORY)
DFT DECOMPOSITION FOR TASK 2 (N = K)
SUBTASK 1 OF K, SUBTASK 2 OF K, ..., SUBTASK K OF K; SUBTASK k+1 HANDLES COLUMN k OF W
INPUT (SHARED MEMORY): $F(t_k)\,T/2\pi K$
PROCESS (LOCAL MEMORY p):
$$G_{n,k} = W_{n,k}\,F(t_k)\,\frac{T}{2\pi K}, \qquad n = 0, 1, \ldots, N-1$$
OUTPUT (SHARED MEMORY): EACH $G_{n,k}$ IS ADDED INTO THE SUMS FOR ROW n
$$R_n = \sum_{k=0}^{K-1} G_{n,k}(\text{REAL}), \qquad I_n = \sum_{k=0}^{K-1} G_{n,k}(\text{IMAGINARY}), \qquad n = 0, 1, \ldots, N-1$$
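Because each Task 2 subtask owns one column of W, subtasks can finish in any order and the row sums R and I (the ΣG_n(REAL) and ΣG_n(IMAGINARY) totals) still come out the same. The sketch below illustrates this with a hypothetical `task2_subtask` function; in the real system each accumulation into shared memory would be bracketed by SEIZEs/RELEASEs, which the sketch omits by running the subtasks sequentially.

```python
import cmath
import math


def task2_subtask(k, f_k, n_points, scale, r_sums, i_sums):
    """Subtask k+1 of K: form column k of W times F(t_k)·T/(2πK) and accumulate R_n, I_n."""
    for n in range(n_points):
        w_nk = cmath.exp(-2j * math.pi * ((n * k) % n_points) / n_points)
        g_nk = w_nk * f_k * scale
        r_sums[n] += g_nk.real      # R_n accumulates G_{n,k}(REAL)
        i_sums[n] += g_nk.imag      # I_n accumulates G_{n,k}(IMAGINARY)


def task2(samples, span_t, order=None):
    """Run all K subtasks in the given order (default 0..K-1) and return (R, I)."""
    k_points = len(samples)
    scale = span_t / (2 * math.pi * k_points)
    r_sums = [0.0] * k_points       # N = K rows
    i_sums = [0.0] * k_points
    for k in (range(k_points) if order is None else order):
        task2_subtask(k, samples[k], k_points, scale, r_sums, i_sums)
    return r_sums, i_sums
```

Running the subtasks in a shuffled order, for example `task2(samples, t, order=[5, 3, 1, 0, 2, 4])`, gives the same R and I (to rounding) as the natural order, which is what lets any free slave pick up any column.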
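DFT DECOMPOSITION FOR TASK 3
SUBTASK 1 OF N
INPUT = OUTPUT OF TASK 2 (SHARED MEMORY): $G_0, G_1, \ldots, G_{N-1}$
PROCESS (LOCAL MEMORY p): $\text{AMP}(\omega_n) = |G_n|$
OUTPUT (SHARED MEMORY): $\text{AMP}(\omega_0), \text{AMP}(\omega_1), \ldots, \text{AMP}(\omega_{N-1})$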
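Each Task 3 subtask needs only its own pair of Task 2 sums, R_n = ΣG_n(REAL) and I_n = ΣG_n(IMAGINARY), from shared memory, so the N amplitude computations are independent. A minimal sketch with hypothetical names follows; the frequency ω_n = 2πn/T is the usual DFT bin frequency and is an assumption, since the text labels n only as the frequency step.

```python
import math


def task3_subtask(n, r_sums, i_sums, span_t):
    """Subtask n+1 of N: return (ω_n, AMP(ω_n)), with AMP(ω_n) = |G_n| = sqrt(R_n² + I_n²).

    ω_n = 2πn/T is an assumed frequency scale for step n.
    """
    return 2 * math.pi * n / span_t, math.hypot(r_sums[n], i_sums[n])


def task3(r_sums, i_sums, span_t):
    return [task3_subtask(n, r_sums, i_sums, span_t) for n in range(len(r_sums))]


if __name__ == "__main__":
    print(task3([3.0, 0.0], [4.0, -2.0], span_t=2 * math.pi))  # [(0.0, 5.0), (1.0, 2.0)]
```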
STATUS
• IMPLEMENTATION
• GCSS SIMULATION
• LABORATORY EVALUATION
• FAULT TOLERANT STUDIES
- PROCESSOR
- SHARED MEMORY
- BUS
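RELIABILITY MODEL
[Diagram: PROCESSORS 1 THROUGH N FORM A 1-OF-N REDUNDANT BLOCK, IN SERIES WITH THE BUS (1 OF 1) AND THE SHARED MEMORY (1 OF 1)]
PROCESSORS (1 OF N):
• TAKE ADVANTAGE OF MULTIPLE PROCESSORS
• OPTIMIZE EXISTING CONTROL STRUCTURE FOR FAULT-TOLERANCE PURPOSES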
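The reliability model can be written as a 1-of-N processor block in series with the bus and the shared memory. The sketch below assumes independent failures and perfect switching between redundant units, neither of which the source states; the reliability values used are hypothetical, and `mem_copies=2` models a duplexed shared memory only in this idealized sense.

```python
def parallel(r_unit, copies):
    """Reliability of a 1-of-n block: it works if at least one independent copy works."""
    return 1.0 - (1.0 - r_unit) ** copies


def system_reliability(r_proc, n_proc, r_bus, r_mem, mem_copies=1):
    """1-of-N processors in series with the bus and the shared memory."""
    return parallel(r_proc, n_proc) * r_bus * parallel(r_mem, mem_copies)


if __name__ == "__main__":
    for n in (1, 2, 5, 50):
        print(n, round(system_reliability(0.9, n, 0.99, 0.99), 4))
    # A duplexed shared memory raises the ceiling that extra processors cannot.
    print(round(system_reliability(0.9, 5, 0.99, 0.99, mem_copies=2), 4))
```

With the bus and shared memory each at 0.99, no number of processors lifts the result above 0.99 × 0.99 = 0.9801, which is the arithmetic behind singling out the bus and shared memory as single-point failures.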
BUS AND SHARED MEMORY (EACH 1 OF 1):
• CURRENTLY SINGLE-POINT FAILURES
• STUDIES TO IDENTIFY FAULT-TOLERANT SCHEMES
• A POSSIBLE IMPLEMENTATION OF A HIGHLY RELIABLE SHARED MEMORY WOULD BE A DUPLEXED CONFIGURATION, EACH COPY WITH SINGLE ERROR CORRECTION AND DOUBLE ERROR DETECTION
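Single error correction with double error detection (SEC-DED) is commonly obtained with an extended Hamming code. The sketch below applies one to an 8-bit shared memory byte. The 13-bit codeword layout (check bits at positions 1, 2, 4, 8 and an overall parity bit at position 0) is the textbook arrangement, chosen here as an assumption since the source does not specify a code; the function names are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)  # codeword positions holding data bits 0..7
PARITY_POSITIONS = (1, 2, 4, 8)               # Hamming check bits; position 0 = overall parity


def encode(byte):
    """Encode one data byte as a 13-bit SEC-DED codeword."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    word = 0
    for bit, pos in enumerate(DATA_POSITIONS):
        word |= ((byte >> bit) & 1) << pos
    for p in PARITY_POSITIONS:
        # check bit p gives even parity over every position whose index has bit p set
        ones = sum((word >> pos) & 1 for pos in range(1, 13) if pos & p)
        word |= (ones % 2) << p
    word |= bin(word).count("1") % 2          # overall parity makes all 13 bits even
    return word


def decode(word):
    """Return (byte, status); status is 'ok', 'corrected', or an uncorrectable error."""
    syndrome = 0
    for pos in range(1, 13):
        if (word >> pos) & 1:
            syndrome ^= pos                   # XOR of set positions is 0 for a valid word
    odd = bin(word & 0x1FFF).count("1") % 2   # overall parity check over all 13 bits
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= 12:
        word ^= 1 << syndrome                 # single error at position `syndrome` (0 = parity bit)
        status = "corrected"
    elif odd:
        return None, "uncorrectable"          # syndrome points outside the codeword
    else:
        return None, "double error"           # even parity but nonzero syndrome
    byte = 0
    for bit, pos in enumerate(DATA_POSITIONS):
        byte |= ((word >> pos) & 1) << bit
    return byte, status
```

Any single flipped bit, including the overall parity bit, is corrected, while any two flipped bits leave the overall parity even with a nonzero syndrome and are reported rather than miscorrected. How the two copies of a duplexed memory would be compared or selected is not specified in the source and is not modeled here.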
