A microprocessor based high speed packet switch for satellite communications by Crist, S. C. & Arozullah, M.
  
 
 
N O T I C E 
 
THIS DOCUMENT HAS BEEN REPRODUCED FROM 
MICROFICHE. ALTHOUGH IT IS RECOGNIZED THAT 
CERTAIN PORTIONS ARE ILLEGIBLE, IT IS BEING RELEASED 
IN THE INTEREST OF MAKING AVAILABLE AS MUCH 
INFORMATION AS POSSIBLE 
https://ntrs.nasa.gov/search.jsp?R=19800019056 2020-03-21T16:49:59+00:00Z
CLARKSON COLLEGE OF TECHNOLOGY
POTSDAM, NEW YORK 13676
A MICROPROCESSOR BASED HIGH SPEED PACKET SWITCH FOR
SATELLITE COMMUNICATIONS
(NASA-CR-163357) A MICROPROCESSOR BASED 	 N80-27557
HIGH SPEED PACKET SWITCH FOR SATELLITE
COMMUNICATIONS Final Report, 15 Apr. 1978 -
30 May 1980 (Clarkson Coll. of Technology) 	 Unclas
347 p HC A15/MF A01	 CSCL 17B G3/32 28062
Prepared for
National Aeronautics & Space Administration
Lewis Research Center
21000 Brookpark Road
Cleveland, Ohio 44135
Final Report
on
Grant No. NSG-3191
James Rotnem - Project Officer
April 15, 1978 - May 30, 1980
Mohammed Arozullah - Principal Investigator
Stephen C. Crist - Co-Investigator
Grant Title: Design of a Microprocessor-Based
High Speed Space Borne Message
Switch.
ABSTRACT
This report is concerned with the design and evaluation of a
microprocessor based high speed space-borne packet switch. Three
designs, namely single, three and multiple processor designs,
are presented. System architectures for these three designs are
presented. Further, the hardware circuits, and software routines
required for implementation of the three and multiple processor
designs are also presented. A bit-slice microprocessor is used.
This processor has been designed and microprogrammed. Maximum
throughput has been calculated for all three designs. Queue
theoretic models for these three designs have been developed and
utilized to obtain analytical expressions for the average waiting
times, overall average response times and average queue sizes.
From these expressions graphs have been obtained showing the
effect on the system performance of a number of design parameters.
TABLE OF CONTENTS
1. INTRODUCTION
   1.1 Problem Definition
   1.2 Approach to the Problem
2. SYSTEM DESIGN CONSIDERATIONS
   2.1 Protocols
   2.2 Packet Construction
   2.3 The Prior Architecture
   2.4 Processor Workload Divisions
   2.5 Resource Contention Among Processors
3. THE THREE PROCESSOR DESIGN
   3.1 System Hardware
      3.1.1 The Input Buffers
      3.1.2 The Input Buffer Polling Circuit
      3.1.3 The Input Switching Network
      3.1.4 The Shift Register Array
      3.1.5 The Output Queue Lists
      3.1.6 The Output Switching Network
      3.1.7 The Output Buffers
      3.1.8 The Output Status Words
      3.1.9 The Empty Shift Register List
   3.2 The Processors
      3.2.1 General Processor Architecture
      3.2.2 The Instruction Execution Unit
      3.2.3 Microprogram Word IEU and System Hardware Control Fields
         3.2.3.1 ALU Source Fields
         3.2.3.2 ALU Function Fields
         3.2.3.3 ALU Destination Fields
         3.2.3.4 Bus Control Fields
         3.2.3.5 System Hardware Control Fields
      3.2.4 The Microprogram Control Unit
      3.2.5 Processor Timing
   3.3 System Software
      3.3.1 The Input Service Routine
      3.3.2 The Routing Service Routine
      3.3.3 The Output Service Routine
4. THE MULTIPLE PROCESSOR DESIGN
   4.1 The System Architecture
   4.2 Shared Resources
      4.2.1 The Shift Register Array
      4.2.2 The Output Queue Lists
      4.2.3 ELIST
         4.2.3.1 Processor-Controlled ELIST
         4.2.3.2 Hardware-Controlled ELIST
   4.3 The Input System
      4.3.1 Architectural Workload Division
         4.3.1.1 Master/Slave Scheduling
         4.3.1.2 Separate Systems
      4.3.2 The Input Processors
      4.3.3 The Input Service Routine
   4.4 The Routing System
      4.4.1 Architectural Workload Division
      4.4.2 Packet Routing Data Ports
      4.4.3 The Packet Sorting Processors
      4.4.4 The Packet Sorting Service Routine
      4.4.5 The Packet Routing Processors
      4.4.6 The Packet Routing Service Routine
   4.5 The Output System
      4.5.1 Architectural Workload Division
      4.5.2 The Output Processors
      4.5.3 The Output Service Routine
5. EVALUATION AND THROUGHPUT ANALYSIS
   5.1 Performance Evaluation
      5.1.1 Throughput Estimation for the Three Processor System
      5.1.2 Throughput Estimation for the Multiple Processor System
   5.2 Evaluation of the Processor
   5.3 Packet Losses
   5.4 Fault Detection and Fault Tolerance
6. QUEUE THEORETIC MODELLING FOR CALCULATION OF THE AVERAGE
   RESPONSE TIMES AND THE AVERAGE QUEUE SIZES
   6.1 Introduction
   6.2 Design Parameters of the Switch
   6.3 The Single Processor Design
      6.3.1 Introduction
      6.3.2 Parameters of the Input Queue
      6.3.3 Parameters of the Output Queue
      6.3.4 Parameters of the Queue for Routing Service
      6.3.5 Expression for the Average Response Time
      6.3.6 The Average Queue Sizes
      6.3.7 Interpretation of the Graphs Showing the Effect of the
            Various Design Parameters on the Performance of the
            Proposed Packet Switch
   6.4 The Three Processor Design
      6.4.1 Introduction
      6.4.2 Expressions for the Waiting Times at the Various Queues
            and the Overall Average Response Time
      6.4.3 Expressions for the Average Queue Sizes
      6.4.4 Interpretation of the Graphs Showing the Effect of the
            Various Design Parameters on the Performance of the
            Proposed Three Processor Packet Switch
   6.5 The Multiple Processor Design
      6.5.1 Introduction
      6.5.2 Analytical Expressions for the Waiting Times at the
            Various Queues and the Overall Average Response Time
      6.5.3 Expressions for the Average Queue Sizes at the Various
            Queues
      6.5.4 Interpretation of the Graphs Showing the Effect of the
            Various Design Parameters on the Performance of the
            Proposed Multiple Processor Packet Switch
   6.6 Conclusions
7.0 Summary
   7.1 Suggestions for Future Work
   7.2 System Throughputs
      7.2.1 Single Processor Packet Switch
      7.2.2 Three Processor Packet Switch
      7.2.3 Multiple Processor Packet Switch
   7.3 Queue Theoretic Results
REFERENCES
APPENDIX A: INPUT SERVICE ROUTINE MICROCODE
APPENDIX B: PROCESSOR-CONTROLLED ELIST
LIST OF FIGURES
Figure
2.1   Single Processor System Architecture
3.1   Three Processor System Architecture
3.2   Input Buffer for One User
3.3   Input Buffer Polling Circuit
3.4   Single Data Path in the Input Switching Network
3.5   Input Data Path Busy Port
3.6   One Location in the Shift Register Array
3.7   Shift Register Polling Circuit
3.8   Output Queue List Data Structure
3.9   Am 29705 Two-Port RAM
3.10  One Output Queue List
3.11  One Data Path in the Output Switching Network
3.12  Output Data Path Busy Port
3.13  One Output Buffer
3.14  One Output Status Word and the Output Buffer Polling Circuit
3.15  The Empty Shift Register List Data Structure
3.16  The ELIST Hardware
3.17  The Processor Architecture
3.18  The IEU for the Input and Output Processors
3.19  The Routing Processor's IEU
3.20  Am 2903 Four-Bit ALU Slice
3.21  Addressing Matrix
3.22  Input Processor IEU µW Control Fields
3.23  Routing Processor IEU µW Control Fields
3.24  Output Processor IEU µW Control Fields
3.25  ALU Control Fields
3.26  Microprogram Control Unit
3.27  Am 2911 Microprogram Sequencer
3.28  MCU µW Control Fields
3.29  Jump Control Logic Functions
3.30  Processor Clock Waveforms
3.31  Input Service Routine Flowchart
3.32  Input Service Routine
3.33  Packet Routing Service Routine Flowchart
3.34  Packet Routing Service Routine
3.35  Output Service Routine Flowchart
3.36  Output Service Routine
4.1   The Multiple Processor System Architecture
4.2   Processor-Controlled ELIST Architecture
4.3   ELIST Data Input Port
4.4   ELIST RAM Structure
4.5   ELIST Data Structure
4.6   ELIST Input Port Hardware Timing Diagram
4.7   ELIST Data Output Port
4.8   ELIST Output Port Hardware Timing Diagram
4.9   Input System Architecture "A"
4.10  Input System Architecture
4.11  Input Processor IEU Microprogram Control Fields
4.12  Input Processor MCU Control Fields and Jump Control Logic Function
4.13  Input Service Routine Flowchart
4.14  Input Service Routine
4.15  System Architecture for a Single Packet Sorting Processor
4.16  System Architecture for a Single Packet Routing Processor
4.17  A Single Packet Routing Data Port
4.18  Packet Routing Data Port Polling Circuit
4.19  Packet Routing Data RAMS
4.20  Packet Routing Data List Data Structure
4.21  Packet Sorting Processor IEU Microprogram Control Fields
4.22  Packet Sorting Processor MCU Control Fields and Jump Control Logic Function
4.23  Packet Sorting Service Routine Flowchart
4.24  Packet Sorting Service Routine
4.25  Packet Routing Processor IEU
4.26  Packet Routing Processor IEU Microprogram Control Fields
4.27  Packet Routing Processor MCU Control Fields and Jump Control Logic Function
4.28  Packet Routing Service Routine Flowchart
4.29  Packet Routing Service Routine
4.30  System Architecture for a Single Output Processor
4.31  Output Processor IEU Microprogram Control Fields
4.32  Output Processor MCU Control Fields and Jump Control Logic Function
4.33  Output Service Routine Flowchart
4.34  Output Service Routine
5.1   System Throughput as a Function of the Number of Processors
5.2   System Throughput as a Function of the Number of Users
6.1   The Queuing Model
6.2   The Modified Queuing Model
6.3   Average Waiting Time Vs. Utilization Factor at Queue 1
6.4   Average Waiting Time Vs. Utilization Factor ρ2 at Queue 2 With ρ1 As A Parameter
6.5   Average Waiting Time Vs. Utilization Factor ρ3 at Queue 3 With ρ1 and ρ2 As Parameters
6.6   Average Waiting Time Vs. Utilization Factor ρ3 With ρ1 and ρ2 As Parameters (Queue 3)
6.7   Average Waiting Time Vs. Utilization Factor at Queue 3 With ρ1 and ρ2 As Parameters
6.8   Average Waiting Time Vs. Utilization Factor ρ3 At Queue 3 With ρ1 and ρ2 As Parameters
6.9   Average Waiting Time Vs. Utilization Factor ρ3 With ρ1 and ρ2 As Parameters
6.10  Average Waiting Time Vs. Utilization Factor ρ3 With ρ1 and ρ2 As Parameters
6.11  Average Waiting Time at Queue 1 Vs. Clock Cycle Time of The Processor (For the Proposed Design, Cycle Time = 120 ns)
6.12  Average Waiting Time at Queue 2 Vs. Clock Cycle Time of The Processor (For the Proposed Design, Cycle Time = 120 ns)
6.13  Average Waiting Time At Queue 3 Vs. Clock Cycle Time of The Processor (For the Proposed Design, Cycle Time = 120 ns)
6.14  Overall Average Response Time Vs. Packet Size B With ρ1, ρ2 and ρ3 As Parameters
6.15  Overall Average Response Time Vs. Destination Functions (λ = 10 packets/sec., B = 1024 bits, Cycle Time = 120 ns, ρ1 = .2325, ρ2 = .2685, ρ3 = .48075, Si = [illegible])
6.16  Overall Average Response Time Vs. Destination Functions (λ = 10 packets/sec., B = 1024 bits, Cycle Time = 120 ns, ρ1 = .2325, ρ2 = .2685, ρ3 = .48075, Si = [illegible])
6.17  Overall Average Response Time Vs. a in Si = [illegible]
6.18  Overall Average Response Time Vs. a in Si = [illegible]
6.19  Overall Average Response Time Vs. a in Si = [illegible]
6.20  Overall Average Response Time Vs. a in Si = [illegible]
6.21  Overall Average Response Time Vs. a in Si = [illegible]
6.22  Overall Average Response Time Vs. a in Si = [illegible]
6.23  Average Queue Size Vs. Utilization Factor at Queue 1
6.24  Average Queue Size Vs. Utilization Factor at Queue 2 With ρ1 As A Parameter
6.25  Average Queue Size Vs. Utilization Factor At Queue 3 With ρ1 and ρ2 As Parameters
6.26  Average Queue Size Vs. Utilization Factor At Queue 3 With ρ1 and ρ2 As Parameters
6.27  Average Queue Size Vs. Utilization Factor At Queue 3 With ρ1 and ρ2 As Parameters
6.28  Average Queue Size Vs. Utilization Factor At Queue 3 With ρ1 and ρ2 As Parameters
6.29  Average Queue Size Vs. Utilization Factor At Queue 3 With ρ1 and ρ2 As Parameters
6.30  The Queueing Model For The Three Processor Design
6.31  Average Waiting Time Vs. Utilization Factor At The Input Queue
6.32  Average Waiting Time Vs. Utilization Factor At The Routing Queue (No Contention)
6.33  Average Waiting Time Vs. Utilization Factor At The Routing Queue (Contention at all times)
6.34  Average Waiting Time Vs. Utilization Factor At The Output Queue (No Contention)
6.35  Average Waiting Time Vs. Utilization Factor At The Output Queue (Contention at all times)
6.36  Average Queue Size Vs. Utilization Factor At The Input And The Routing Queues (No Contention)
6.37  Average Waiting Time Vs. Clock Cycle Time Of The Processor At The Input Queue (For The Proposed Design, Cycle Time = 120 ns)
6.38  Average Waiting Time Vs. Clock Cycle Time Of The Processor At The Routing Queue (No Contention; For The Proposed Design, Cycle Time = 120 ns)
6.39  Average Waiting Time Vs. Clock Cycle Time Of The Processor At The Routing Queue (Contention At All Times; For The Proposed Design, Cycle Time = 120 ns)
6.40  Average Waiting Time Vs. Clock Cycle Time Of The Processor At The Output Queue (No Contention; For The Proposed Design, Cycle Time = 120 ns)
6.41  Average Waiting Time Vs. Clock Cycle Time Of The Processor At The Output Queue (Contention At All Times; For The Proposed Design, Cycle Time = 120 ns)
6.42  Average Waiting Time Vs. Utilization Factor At The Input, Output and Routing Queues
6.43  Overall Average Response Time Vs. Utilization Factors At The Input, Output and Routing Queues
6.44  Average Waiting Time Vs. Clock Cycle Time Of The Processor At The Input Queue
6.45  Average Waiting Time Vs. Clock Cycle Time Of The Processor At The Output Queue
6.46  Average Waiting Time Vs. Clock Cycle Time Of The Processor At The Routing Queue
6.47  Overall Average Response Time Vs. Packet Size B With ρ1, ρ2 and ρ3 As Parameters
6.48  Overall Average Response Time Vs. Destination Functions
6.49  Overall Average Response Time Vs. Destination Functions
6.50  Overall Average Response Time Vs. a in Si = [illegible]
6.51  Overall Average Response Time Vs. a in Si = [illegible]
6.52  Overall Average Response Time Vs. a in Si = [illegible]
6.53  Overall Average Response Time Vs. a in Si = [illegible]
6.54  Overall Average Response Time Vs. a in Si = [illegible]
6.55  Overall Average Response Time Vs. a in Si = [illegible]
6.56  Average Queue Size Vs. Utilization Factor At The Input Queue
6.57  Average Queue Size Vs. Utilization Factor At The Routing Queues
6.58  Average Queue Size Vs. Utilization Factor At The Output Queue
6.59  Average Queue Size At The Output Queue Vs. The Number of Output Lines
6.60  Average Queue Size At The Output Queue Vs. The Number of Output Lines
6.61  Average Queue Size At The Output Queue Vs. The Packet Size
6.62  Average Queue Size At The Output Queue Vs. Packet Size
6.63  Average Queue Size At The Output Queue Vs. The Clock Cycle Time
6.64  Average Queue Size At The Output Queues Vs. The Clock Cycle Time
6.65  The Queueing Model For The Multiple Processor Design
6.66  Average Waiting Time Vs. Packet Arrival Rate At The Input Queue
6.67  Average Waiting Time Vs. Packet Arrival Rate At The Input Queue
6.68  Average Waiting Time Vs. Packet Arrival Rate At The Input Queue
6.69  Average Waiting Time Vs. Packet Arrival Rate At The Output Queue
6.70  Average Waiting Time Vs. Packet Arrival Rate At The Output Queue
6.71  Average Waiting Time Vs. Packet Arrival Rate At The Output Queue
6.72  Average Waiting Time Vs. Packet Arrival Rate At The Routing Queue
6.73  Average Waiting Time Vs. Packet Arrival Rate At The Routing Queue
6.74  Average Waiting Time Vs. Packet Arrival Rate At The Routing Queue
6.75  Average Waiting Time Vs. Packet Arrival Rate At The Sorting Queue
6.76  Average Waiting Time Vs. Packet Arrival Rate At The Sorting Queue
6.77  Average Waiting Time Vs. Packet Arrival Rate At The Sorting Queue
6.78  Overall Average Waiting Time Vs. Packet Size
6.79  Overall Average Waiting Time Vs. Destination Functions
6.80  Average Queue Size Vs. Packet Arrival Rate At The Input Queue
6.81  Average Queue Size Vs. Packet Arrival Rate At The Output Queue
6.82  Average Queue Size Vs. Packet Arrival Rate At The Routing Queue
6.83  Average Queue Size Vs. Packet Arrival Rate At The Sorting Queue
LIST OF TABLES
      Contention Problems in a Shared Flag System
      Hardware Control Signal Codes
      Microprogram Word Bit Divisions
      Software Execution Times for the Three Processor System
      Software Execution Times for the Multiple Processor System
5.3   Throughput for Each Processor Class
1.0 INTRODUCTION
In the past decade packet switching has revolutionized
data communication. In 1960 virtually all interactive data
communication networks used circuit switching, which is the
current technology used in telephone networks [1]. Circuit
switching networks preallocate channel bandwidth for an entire
message. However, since most interactive data traffic occurs
in short bursts, a large portion of the bandwidth is wasted.
Thus, as digital electronics became inexpensive and the need
for more digital data communication networks grew as computer
technology expanded, the redesign of data communication net-
works became economically feasible and desirable. Packet
switching was introduced since it allows for the dynamic allo-
cation of bandwidth, which permits users to share the same
transmission line previously assigned to only one user.
Packet switching has improved the economics of data communication
systems, network reliability and functional flexibility [1].
Packet switching networks divide the users' messages into
small segments, or packets, of data which move through the
network towards their destination. All packets are fixed-
length and serial in structure. Packets consist of a header
and a body. The header, which precedes the body, contains
the routing control information which indicates the packet's
source and destination. In addition, the header also contains
message reconstruction information for use at the destination.
Since a complete message may occupy more than one
packet, each header contains a message number and a packet
sequence number. Thus, any packets arriving in a scrambled
sequence can be rearranged to correctly yield the entire
message received. The body of a packet contains the data
being transmitted. The length of each packet within a network
is fixed for the entire system.
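
The reordering step described above can be sketched as follows. This is only an illustration under stated assumptions, not the report's implementation: the `Packet` class, its field names and the `reassemble` helper are hypothetical, and the body is modelled as bytes rather than as a serial bit stream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Header fields named in the text; names and types are assumptions.
    source: int
    destination: int
    message_number: int
    sequence_number: int
    body: bytes  # fixed-length data field


def reassemble(packets):
    """Group packets by (source, message number) and order each group by
    sequence number, so scrambled arrivals still yield the full message."""
    groups = defaultdict(list)
    for p in packets:
        groups[(p.source, p.message_number)].append(p)
    return {
        key: b"".join(
            p.body for p in sorted(group, key=lambda p: p.sequence_number)
        )
        for key, group in groups.items()
    }


# Three packets of one message arriving out of order.
arrived = [
    Packet(1, 9, 42, 2, b"CC"),
    Packet(1, 9, 42, 0, b"AA"),
    Packet(1, 9, 42, 1, b"BB"),
]
messages = reassemble(arrived)
```

Keying on both the source and the message number is a design choice of this sketch; it keeps messages from different users apart even if they happen to reuse the same message number.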
The routing of these packets is handled by the packet
switches implemented in the network. These special switches
replace the previous circuit switches found in telephone networks
and older data communication networks. The scope of
the work presented in the following chapters consists of the
design and evaluation of these packet switches using micro-
processors to control the switching functions.
1.1 Problem Definition
This report examines the problem of designing and evaluating
multiprocessor-controlled packet switches. (The design
and evaluation of a single processor version is presented in
[2,3].) The work presented in the following chapters investigates
the question of how large a multiprocessor packet switch can
be constructed before the problem of resource contention erodes
the system's performance. The performance of these multiprocessor
designs will be evaluated in terms of their maximum throughput with
respect to the number of users and the number of processors imple-
mented, average delay within the switch, and queue sizes.
These packet switches must be capable of routing packets
among any number of up to several hundred users. In addition,
all designs must allow the use of these packet switches in
communication satellites as well as in networks using only
land lines. The problems of protocols and error-correction
codes are briefly reviewed in this work.
1.2 Approach to the Problem
System design considerations are examined first. These
considerations include protocols, prior work, workload divi-
sions and resource contention among processors. A review of
protocols and their effects on throughput is presented. Using
the information from this investigation of protocols, a decision
is made on how to handle this problem.
After the protocol problem is solved, a review of the
prior single processor design is presented. Using the prior
design as a foundation, the requirements and goals of the
multiprocessor designs are formulated. A review of the prior
design at the functional level allows the workload division
for the three processor design to be made.
Once the workload division is made, the contention pro-
blems relating to the shared resources are investigated. In
this investigation, each shared resource is identified and
its specific contention problems are examined. Various
solutions to these problems are found and presented.
Once all the design considerations that influence the
actual implementation are examined, the system architecture
of the three processor packet switch is designed. The design
of the architecture, its operation and functional requirements
allows the detailed design of the system hardware to be
completed.
Once the system hardware is designed, the processors
and their software requirements are defined and designed in
detail.
After the design of the three processor system is com-
pleted, the same design procedure is repeated for the design
of the multiple processor packet switch.
With both designs complete, an evaluation of each system
is carried out. The evaluation determines the maximum through-
put of both architectures. A queue theoretic model is
developed that facilitates analysis of delay and queue sizes
within the packet switch.
2.0 SYSTEM DESIGN CONSIDERATIONS
The final architectural designs of the multiprocessor-
based packet switches are influenced by several system de-
sign constraints and goals. Some of these are considered in
the single processor architecture [2,3]. Thus, those particular
considerations will be reviewed briefly in this
chapter. The remaining design considerations arose directly
from the use of multiple microprocessors and are discussed
in detail. The review of each design constraint and
design goal helps explain the approach taken in
the development of the new system architectures.
2.1 Protocols
Much attention was given to the analysis of various protocols
and their effects on the packet switch in the previous
work [2,3]. Implementation of a full forward error correction
(FEC) scheme, an End-to-End Automatic-Repeat-Request
(ARQ) scheme, and an Up-Link ARQ scheme were considered. The
results of this research were used to select a protocol scheme
for the multiprocessor architectures.
Since large system throughput is a major goal, any proto-
col which was shown to reduce system throughput was eliminated
from further consideration.
A reduction in throughput was found to be linked to all
protocols requiring the packet switch to maintain special
software. Thus, only protocols which are transparent to the
packet switch will be supported by the multiprocessor
architectures. Two protocols which fulfill this requirement
are the FEC scheme and the End-to-End ARQ scheme.
In addition to improving system throughput, "transparent"
protocols offer the users flexibility. Users can custom tailor
protocols to meet their needs. Transparent protocols could be
changed or altered even after the network is completed and
operational. Also, different protocols could be implemented
between different users in the same network.
2.2 Packet Construction
The packet format consists of a body and a header. Packets
are serial in structure with the header preceding the
body. The body length of a packet is fixed for a given system.
However, the selection of this length is generally made from
a range of 256 bits up to 10240 bits. In order to maximize
the throughput of the multiprocessor-based packet switches
under investigation in this report, the recommended body length
is 10240 bits.
The packet header contains information required to route
the packets to their proper destinations. In addition, the
header also contains special information needed by the destina-
tion. Since entire messages may exceed the length of a single
packet, they must be divided into packet-length segments before
transmission. The last packet of a message will be "padded"
with blank characters to fill unused bits in the packet
should the message not require an integer number of packets.
The special header information is used by the destination
to reconstruct entire messages which have been sent via
several packets. Therefore, should the packets arrive in a
scrambled sequence, the message is still recoverable. This
information and the routing information are arranged in various
fields. These fields contain, in coded form, the packet's
source, destination, message number and sequence number.
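
The segmentation, padding and header fields described in this section can be sketched as below. All specifics are assumptions made for illustration: the field widths in `FIELD_BITS`, the use of an ASCII space as the blank padding character, and the helper names are hypothetical, and a 256-bit body is used only to keep the example small.

```python
BODY_BITS = 256            # the text allows 256 to 10240 bits; small for the demo
BODY_BYTES = BODY_BITS // 8
PAD = b" "                 # "blank" padding character (assumed ASCII space)

# Hypothetical header field widths in bits; the report does not give them here.
FIELD_BITS = {"source": 9, "destination": 9, "message": 8, "sequence": 6}


def pack_header(source, destination, message, sequence):
    """Concatenate the four header fields into one integer, source first."""
    word = 0
    for name, value in (("source", source), ("destination", destination),
                        ("message", message), ("sequence", sequence)):
        width = FIELD_BITS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_header(word):
    """Inverse of pack_header; returns a dict of field values."""
    fields = {}
    for name in reversed(list(FIELD_BITS)):
        width = FIELD_BITS[name]
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


def segment(message_bytes, source, destination, message):
    """Split a message into fixed-length bodies, padding the last one."""
    packets = []
    for seq, start in enumerate(range(0, len(message_bytes), BODY_BYTES)):
        body = message_bytes[start:start + BODY_BYTES].ljust(BODY_BYTES, PAD)
        packets.append((pack_header(source, destination, message, seq), body))
    return packets


# A 70-byte message does not fill a whole number of 32-byte bodies,
# so the final packet is padded.
packets = segment(b"x" * 70, source=3, destination=200, message=5)
```

In this sketch the header is an unprotected integer; the error-correcting code applied to the header is a separate step.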
Since the header information is vital for proper packet
transmission, its protection is a system design requirement.
Thus, the header is protected by an error-correcting code.
The Bose-Chaudhuri-Hocquenghem (BCH) code was chosen for this
task in the original design and is implemented again in the
multiprocessor systems. The cost of this protection is in-
creased hardware and software for both the system users and
the packet switch. However, this increased overhead has been
deemed necessary in order to maintain the integrity of the
network. The header is the only part of the packet which has
error-correction protection that is used directly in conjunc-
tion with the packet switch. Error protection of the packet
body is optional to system users and must be implemented at
the ground stations.
2.3 The Prior Architecture
There are several important design philosophies which
have shaped the architecture of the packet switch. They are
incorporated in the multiprocessor architectures as well as
in the single processor architecture. The following are the
design guidelines used for all architectures:
1) A fixed packet length must be used by the network.
This simplifies hardware and software requirements.
2) All packet transfers through the switch are done
serially. This eliminates any need for Serial-to-
Parallel and Parallel-to-Serial conversions. (Al-
though all packet transfers are done serially, the
processor accesses the header in parallel.)
3) Since all packet transfers are serial, this operation
is to be managed by dedicated hardware. Processor
control of this function would decrease system through-
put due to the comparatively slow speed of software.
In addition, the use of dedicated hardware to perform
this task allows the processor to spend more time
making decisions and controlling other system opera-
tions.
4) The full capacity of the processor must be utilized
to avoid throughput reduction. This goal is achieved
by requiring that the processor never wait for hard-
ware. This requires parallel hardware for certain
functional blocks. These blocks are initiated into
action by the software. This hardware completes its
assigned task automatically without further software
supervision. All architectures permit several simul-
taneous operations to be performed, since the processor
is free to move on to new tasks once the hardware is
activated.
5) To further increase system throughput, the processor
is only allowed to access the header of each packet.
While in the switch, the packet bodies are left un-
touched by the processor. Since the routing informa-
tion needed by the processor is found only in the
header, this design goal is easy to implement.
The final system architecture of the single processor
packet switch is presented in Figure 2.1. This packet switch
handles N users who are allocated one line each. Operation
of the system consists of each user transmitting their packets
to the switch which routes the packets to the proper destina-
tion. The packets arrive at the switch as serial bit streams.
The switch is configured such that any user may communicate
with any other user in the network.
The routing of the users' messages begins with the
buffering of all incoming packets. Each input line is double
buffered. Even with double buffering, the processor service
response time must be short. Buffer overflow will destroy
packets left too long in a buffer. In order to avoid packet
losses, a minimum of processing is done at the input buffers.
As soon as a full buffer is detected, the processor immediately
stores the packet in temporary storage. This storage area is
constructed of shift registers arranged in an array.
Once stored in the shift register array, each packet
receives additional service. Their headers are decoded by
the processor to determine each packet's destination. The
routed packets are assigned to software output queues. Use
of software queues eliminates the need for additional packet
transfers required by hardware queues. Each queue corresponds
to one unique output buffer.
[Fig. 2.1 System architecture of the single processor packet switch]
When an output buffer becomes empty, the processor
accesses the associated queue for the next packet awaiting
transmission. Each queue contains the location of each routed
packet in the array awaiting transmission to that queue's
corresponding output buffer. Using this information, the
processor begins the transfer of the queue's oldest packet to
the proper buffer. Once in the buffer, the packet is then
transmitted onto the network channel under hardware control.
The software required to control the packet switch con-
sists of three routines: the input service routine, the back-
ground service routine and the output service routine.
The input service routine is interrupt driven. Execution
of this routine begins when the Data Available (DAV) line of
an input buffer becomes active and is detected by the input
interrupt polling circuit. Equal priority among all users is
ensured by the sequential scanning of these DAV lines.
The first task of this software is the linking of a free
data path in the input switching network to the full buffer.
Next, the address of an empty shift register is fetched from
the Empty Shift Register List (ELIST). This shift register
is then linked to the full buffer via the data path. Finally,
the processor initiates the packet's transfer into the array.
This routine has the highest priority and is uninterruptible.
The background service routine continually scans the
shift register array in search of packets requiring service.
Upon finding one, the processor fetches the header. The header
is corrected, if necessary, by using error pattern data stored
in a Syndrome Decoder ROM. Next, the packet's destination is
determined. The packet's address in the array is then placed
in the proper output queue list. However, if this list is
empty and its corresponding buffer is also empty, the processor
will load the packet directly into the buffer. The packet's
array address will then be placed in ELIST. This routine has
the lowest priority since it is not interrupt driven.
Like the input routine, the output service routine is
interrupt driven. Detection of an empty output buffer by the
output interrupt polling circuit forces the execution of this
software package. This routine must first check the output
queue associated with the buffer requesting service. If this
list is empty, the service request "flag" for this particular
buffer is reset and the processor exits from this routine.
However, if the queue is not empty, the processor then fetches
the array address of the oldest packet in the queue. Using
this address, the processor then links the proper shift regis-
ter to the empty buffer. This link is established via an
available data path in the output switching network. Once
the link is complete, the data transfer begins. This routine
has the second highest priority.
2.4 Processor Workload Divisions
In most multi-microprocessor systems, the primary design
goal is the identification and separation of all tasks which
are relatively independent [4]. Ideally, this allows each
processor to perform a dedicated task. Thus, each processor
can operate mostly independently of the others. As a result,
very little data needs to be exchanged among processors rela-
tive to the total system data flow.
This design philosophy is implemented in the determination
of the processor workload division for the multiprocessor-
based packet switches. The first step in the implementation
is the identification of each "independent" task. A review
of the single processor design shows that the operation of the
packet switch consists of three major tasks. Each of these
is controlled by an independent software routine. The three
tasks are:
1) Storage of received packets (Input Function)
2) Routing of each received packet (Routing or Background
Function)
3) Transmission of each routed packet (Output Function)
Now that the "independent" tasks have been identified,
the workload division can be made; one processor is assigned
to each of the three tasks. The architecture supports an
Input Processor, a Routing Processor and an Output Processor.
Each processor supervises dedicated hardware, executes custom
software and shares a minimum amount of common resources.
Resource sharing presents many problems and is the next topic.
2.5 Resource Contention Among Processors
In most multiprocessor systems, shared resources are
necessary. Unfortunately, they present many control problems
and may cause reduced throughput. Therefore, they must be
kept to a minimum.
Concern over shared resources arises whenever the possi-
bility of processor contention exists. Contention occurs when
two or more processors simultaneously request access to the
same resource. This is known as a race condition [5]. Con-
tention also occurs when one or more processors request access
to a resource currently in use by another processor.
A system's throughput can be severely reduced by conten-
tion in two ways. Simultaneous access of a resource by two
or more processors will cause havoc in the system. Therefore,
special hardware and/or software is required to schedule re-
source allocation. Only one processor may be granted access
to a particular resource at any given time. This requires
that the other processors be "locked out." Implementation of
any resource locking scheme requiring special system software
will reduce throughput. In addition, processors which become
"locked out" are forced to wait for the busy resource. Pro-
cessor idleness due to contention reduces throughput.
Since increased throughput is the primary goal in the
design of a multiprocessor packet switch, contention must be
minimized. This goal is achieved by first identifying each
shared resource. The following is a list of shared resources
compiled from a review of the system architecture:
1) The shift register array
2) ELIST
3) The output queue lists
4) The output switching network
5) The output buffers
An analysis of contention problems for each of these re-
sources is now needed.
All three processors use the shift register array. Each
shift register must assume one of the following states:
1) Empty
2) Holding an unserviced packet
3) Holding a routed packet
4) Shifting out or in a packet in transit
Empty shift registers with their addresses in ELIST can
only be accessed by the Input Processor. Shift registers con-
taining unserviced packets can only be accessed by the Routing
Processor. The Output Processor can only service shift re-
gisters containing routed packets. Thus, any shift register
in one of these three states is free from contention problems.
However, shift registers containing packets in transit
from the array to the output buffers present a contention
problem. As stated earlier, once a packet transfer is
initiated by a processor, dedicated hardware takes control.
Therefore, the processor is now free to start a new task. In
the case of the Output Processor, the next task is the up-
dating of ELIST with the address of the packet in transit.
ELIST now contains the address of a shift register whose con-
tents are only partially transferred. A resource contention
could occur if the Input Processor uses this location to store
a new packet.
Two solutions to this problem exist. One solution is to
require the Output Processor to temporarily hold the address
of each packet in transit. This scheme needs hardware to sig-
nal the completion of transfers, address storage and additional
control software. Since additional software reduces through-
put, this scheme is not used.
Instead, the scheme used requires the array hardware to
allow the simultaneous transmission of an old packet and the
storage of a new packet at the same location. Although the
shift register array is a shared resource, contention pro-
blems have been avoided.
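The chosen scheme can be pictured with a small model of one array location. The sketch below is an illustration under stated assumptions, not the report's circuit: it assumes that the outgoing and incoming packets are moved by the same shift clock, so that each clock pulse shifts one old bit out while one new bit is shifted in. The class name and the four-bit packet length are hypothetical.

```python
from collections import deque


class ShiftRegister:
    """Serial-in, serial-out register holding one packet (toy length)."""

    def __init__(self, length):
        # maxlen makes append() discard the oldest bit automatically.
        self.bits = deque([0] * length, maxlen=length)

    def clock(self, bit_in):
        """One shift clock pulse: the oldest bit leaves, a new bit enters."""
        bit_out = self.bits[0]
        self.bits.append(bit_in)
        return bit_out


old_packet = [1, 0, 1, 1]
new_packet = [0, 0, 1, 0]

register = ShiftRegister(4)
for bit in old_packet:              # the location first receives the old packet
    register.clock(bit)

# The old packet is transmitted while the new packet is stored
# at the same location.
sent = [register.clock(bit) for bit in new_packet]
```

After the last pulse, `sent` holds the old packet and the register holds the new one, so under this assumption the Input Processor can reuse the location while its previous contents are still leaving.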
The shared resources remaining to be examined all have
one thing in common: Each resource is accessed by the Routing
Processor. However, only the output queue lists are accessed
by this processor under normal operation. The Routing
Processor only requires access to ELIST, the output buffers
and the output switching network when a special event occurs.
This event takes place whenever the Routing Processor finds
a packet destined to an output buffer which is empty and whose
output queue list is also empty. The Routing Processor re-
sponds by transmitting the packet directly.
In order to deal with this one special operation, allo-
cation of many shared resources is required. This increases
the risk of reduced throughput due to contention. In addition,
throughput will be reduced by the system software required to
manage the resource allocations. Therefore, a decision must
be made whether or not to allow the Routing Processor to
transmit packets as done previously by the background routine
in the single processor design.
Since system throughput is at stake, the Routing Processor
must not be permitted to transmit packets. Although a new
scheme must be devised to handle this special event, conten-
tion has been completely eliminated from the output buffer
system and the output switching network. (Specific details
on the new scheme are presented next in the contention analy-
sis of the Output Queue Lists.) These resources are now solely
controlled by the Output Processor. In addition, ELIST is now
only accessed by the Input Processor and the Output Processor.
Each output queue list is associated with one unique
output buffer. These lists contain the addresses of routed
packets in the array awaiting transmission. The Routing Pro-
cessor must access these lists to update them with the
addresses of newly routed packets. Meanwhile, the Output
Processor must access the lists to find the next packet re-
quiring transmission.
Since the Routing Processor always writes to the lists
while the Output Processor always reads from the lists, dual
port RAM's can be used [6]. Dual port RAM's permit two pro-
cessors to access them simultaneously provided at least one
processor performs a read operation. Only one processor is
allowed to perform a write operation at one time.
The data structure of the output queue lists is designed
such that if the processors are accessing the same location
the queue is considered empty. Only after the Routing Pro-
cessor updates the list and moves on to the next location can
the Output Processor read from the once empty list. Thus,
the situation of a concurrent read and write operation at a
single location is avoided. At first glance, the problem of
contention appears to be solved. However, further investiga-
tion is needed to ensure that this is true.
A new output buffer status word with three states, "busy,"
"empty," and "idle" is now used. An output buffer in the busy
state is in the process of receiving a packet from the array,
receiving output processor service or transmitting a packet.
Once a packet is transmitted, the buffer enters the empty
state which indicates the buffer requires service. An output
buffer is placed in the idle state by the Output Processor
when the buffer becomes empty if its queue list is also empty.
When the Routing Processor encounters a packet destined
for an idle buffer, it must first update the buffer's queue
list. Then the processor must change the buffer's status
word to indicate the buffer is empty and requires service
from the Output Processor. This operation replaces the prior
scheme of transmitting packets directly. As stated earlier,
many contention problems are eliminated by this new scheme.
However, a new subtle problem has arisen.
Table 2.1 lists the sequence of events which leads to
the problem. The first line in the table shows that an empty
output buffer is receiving processor service. The buffer's
output queue is currently empty. The Output Processor is pre-
sently accessing this queue. Meanwhile, a packet destined for
this buffer is being routed by the Routing Processor. The
Routing Processor has just read the status word of this buffer
which indicates the buffer is empty. However, just after the
status word was read, the Output Processor updated it to cor-
rectly indicate that the buffer is idle. Line two in the
table now shows the new status word. In addition, the Routing
Processor, acting on incorrect information, has placed the
packet's address into the queue list without updating the
status word. Line three in the table displays the packet's
address residing in the queue list while the status word still
indicates the buffer is idle. Since the Output Processor can
only service buffers in the empty state, this packet is
trapped in the system. This packet will remain trapped until
a new packet arrives for the same destination. The remaining
lines in the table depict the events leading to the recovery
of the trapped packet. Since the recovery time may be quite
long, a solution to this problem must be found.
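The interleaving just described can be replayed step by step. The sketch below is a sequential re-enactment written for illustration only: the `Status` enum, the function name and the shift register addresses "SR7" and "SR9" are hypothetical, and the steps follow the prose rather than the exact rows of Table 2.1.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    BUSY = "busy"
    EMPTY = "empty"
    IDLE = "idle"


def trapped_packet_trace():
    """Replay the unlocked interleaving; return (status, queue) snapshots."""
    status = Status.EMPTY          # empty buffer receiving processor service
    queue = deque()                # its output queue list, currently empty
    trace = []

    # Step 1: the Routing Processor reads the status word ("empty").
    seen_by_router = status
    trace.append((status, list(queue)))

    # Step 2: the Output Processor finds the queue empty and marks the
    # buffer idle, just after the Routing Processor's read.
    if not queue:
        status = Status.IDLE
    trace.append((status, list(queue)))

    # Step 3: the Routing Processor acts on its stale reading. It queues
    # the packet but sees no reason to change the status word.
    queue.append("SR7")
    if seen_by_router is Status.IDLE:
        status = Status.EMPTY
    trace.append((status, list(queue)))      # packet is now trapped

    # Recovery: a later packet for the same destination is routed. This
    # time the status word reads "idle", so it is changed to "empty".
    seen_by_router = status
    queue.append("SR9")
    if seen_by_router is Status.IDLE:
        status = Status.EMPTY
    trace.append((status, list(queue)))
    return trace
```

The third snapshot shows an idle buffer with a non-empty queue, which is the trapped state. The fourth shows that the packet is released only when another packet happens to arrive for the same destination.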
[Table 2.1 Sequence of events leading to a trapped packet and its recovery]
In order to solve this problem, a locking scheme is im-
plemented for the output queue lists and the associated output
buffer status words. Any time one processor gains access to
one of the queue lists, the other processor is locked out from
that particular queue and its associated buffer status word.
Thus, when a processor is granted access to these resources,
the processor is ensured of obtaining correct data. Although
the queue lists are never accessed by more than one processor
at one time, the dual port RAM's are still used since they
simplify other hardware and software requirements. The locking
scheme requires some additional hardware and software. Some
processor idleness may also be encountered. However, although
some system throughput is sacrificed, the packet switch's
integrity has been preserved.
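In software terms, the locking scheme makes each processor's access to a queue list and its status word a single indivisible operation. The sketch below is an analogy using a Python mutex, offered as an assumption-laden illustration: the report implements the lock in hardware, and the class and method names here are hypothetical.

```python
import threading
from collections import deque


class LockedOutputQueue:
    """One output queue list plus its buffer status word, under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.queue = deque()
        self.status = "empty"

    def route(self, address):
        """Routing Processor: queue a packet and wake an idle buffer."""
        with self._lock:
            self.queue.append(address)
            if self.status == "idle":
                self.status = "empty"

    def service(self):
        """Output Processor: service a buffer that is in the empty state."""
        with self._lock:
            if self.status != "empty":
                return None
            if not self.queue:
                self.status = "idle"      # nothing to send: go idle
                return None
            self.status = "busy"
            return self.queue.popleft()   # oldest packet first

    def transmitted(self):
        """Buffer has finished transmitting and requires service again."""
        with self._lock:
            self.status = "empty"
```

Since the status word is read and updated while the lock is held, the Routing Processor can no longer act on a stale reading, and the state "idle with a non-empty queue" cannot arise. The cost is the occasional wait for the lock, which corresponds to the processor idleness mentioned above.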
The ELIST is the last shared resource to be analyzed for
contention problems. ELIST is shared by both the Input Pro-
cessor and the Output Processor. The Input Processor must
access this list to find available empty shift registers in
the array. The Output Processor updates this list with the
addresses of shift registers released by transmitted packets.
As in the case of the Output Queue Lists, one processor always
writes to ELIST while the other always reads from ELIST.
Therefore, ELIST can be built using dual port RAM's. These
RAM's allow a simultaneous read operation and write operation
to take place at different locations without interference.
The data structure of ELIST is designed such that no
simultaneous read and write operations can be performed at
the same location unless the list becomes empty. If the list
becomes empty, the system faces a far graver problem than con-
tention. However, ELIST should never become empty under
normal operation. Thus, another shared resource is spared
from contention problems, since the processors using it are
transparent to one another.
In summary, the output queue lists are the only shared
resources which face contention problems. Details concerning
how this problem is handled are found in 3.1.5, 3.3.2 and
3.3.3.
3.0 THE THREE PROCESSOR DESIGN
The system architecture of the three processor packet
switch is presented in Figure 3.1. As in the single processor
design, this packet switch handles N users who are allocated
one line each. Again, the switch is configured such that any
user may communicate with any other user in the network. Al-
though the workload is divided among three processors, the
function of the switch remains unchanged from the original
design. Thus, a detailed description of the packet switch's
operation is not presented. Instead, this chapter focuses on
the actual hardware, software and processors required to imple-
ment this new architecture.
3.1 System Hardware
Under processor control, the system hardware carries out
the assigned tasks of the packet switch. Since the packet
switch architecture now supports multiple processors, a new
control signal labelling scheme has to be adopted. The
scheme is designed to help eliminate any confusion regarding
the source and destination of each control signal. Table 3.1
contains each control signal code format with an example and an
explanation. Detailed explanations of the circuits and their
operations are presented in the following sections.
[Fig. 3.1 System architecture of the three processor packet switch]
[Table 3.1 Control signal code formats, with examples and explanations]
3.1.1 The Input Buffers
Displayed in Figure 3.2 is the circuitry required by
an input buffer for one user. All received packets remain in
the input buffer until they are transferred into the shift
register array. In order to reduce the possibility of over-
flow, each input channel is double buffered. The two buffers
at each input channel are packet-length shift registers. Buf-
fer select logic determines which buffer is to be linked to
the input channel while the other buffer is linked to the
Input Switching Network. This select logic is driven by a
packet counter. The packet counter monitors the arrival of
packets by counting each bit. Once an entire packet has been
received, the counter rolls over, activating the buffer select
hardware. The select logic then switches the buffer assign-
ments. Concurrently, the counter sets the Data Available
(DAV) flag indicating a full input buffer. This flag is
scanned by the Input Buffer Polling Circuit, which is the
next topic presented.
[Fig. 3.2 Input buffer circuitry for one user]
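The behaviour of one double-buffered input channel can be modelled in a few lines. The sketch below is a behavioural illustration, not a description of the circuit in Figure 3.2: the class, the `drain` method standing in for the transfer into the array, and the `overflow` flag are hypothetical additions used to show why the processor's response time still matters.

```python
class DoubleBufferedInput:
    """Behavioural model of one double-buffered input channel."""

    def __init__(self, packet_bits):
        self.packet_bits = packet_bits
        self.buffers = [[], []]
        self.filling = 0         # buffer currently linked to the input channel
        self.count = 0           # packet counter (counts received bits)
        self.dav = False         # Data Available flag
        self.overflow = False    # hypothetical flag: an unread packet was lost

    def receive_bit(self, bit):
        self.buffers[self.filling].append(bit)
        self.count += 1
        if self.count == self.packet_bits:       # counter rolls over
            self.count = 0
            if self.dav:
                # The other buffer was never emptied and is about to be
                # reused, so the packet it holds is destroyed.
                self.overflow = True
            self.filling ^= 1                    # swap buffer assignments
            self.buffers[self.filling] = []
            self.dav = True

    def drain(self):
        """Transfer the full buffer out and clear the DAV flag."""
        packet = self.buffers[self.filling ^ 1]
        self.dav = False
        return packet
```

While one buffer is being drained, the other keeps receiving bits, so the processor has one full packet time in which to respond. If two packets complete without an intervening `drain`, the model sets `overflow`, which mirrors the packet loss that double buffering reduces but does not eliminate.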
3.1.2 The Input Buffer Polling Circuit
The Input Buffer Polling Circuit appears in Figure 3.3.
This circuit sequentially scans each input buffer's DAV flag
searching for a full buffer. A counter, which cycles through
N values, drives the poller. The counter's output is supplied
to the DAV multiplexer (MUX). Selection of one of the N MUX
inputs is controlled by the counter's value. Each of the MUX
inputs is a DAV signal from an input buffer circuit. The
selected DAV signal is passed onto the Stop Scan flip-flop.
When an active DAV signal is encountered, the Stop Scan flip-
flop is set. Once set, this flip-flop halts the counter.
Simultaneously, it brings the Input Buffer Service Request
(IBSR) line high, informing the Input Processor that a full
buffer has been found.
[Fig. 3.3 The Input Buffer Polling Circuit]
The stabilized counter value represents the address of
that full buffer. This value is sent to the Input Processor
for processing. In addition, the counter's output is supplied
to the Flag Reset Demultiplexer (DEMUX). This DEMUX allows
the processor to send the A-RESET (see Table 3.1) signal to
clear the proper buffer DAV signal. The A-RESET signal also
clears the Stop Scan flip-flop, thus restarting the polling
circuit.
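The scanning behaviour of the polling circuit can be sketched as a small state machine. This is a behavioural model under assumptions, not the circuit of Figure 3.3: the class, the `clock` and `reset` method names and the helper `next_request` are hypothetical, and one call to `clock` stands for one step of the scan counter.

```python
class InputBufferPoller:
    """Behavioural model of the sequential DAV scanner."""

    def __init__(self, n):
        self.dav = [False] * n     # one DAV flag per input buffer
        self.counter = 0           # scan counter, cycles through N values
        self.stop_scan = False     # Stop Scan flip-flop

    def clock(self):
        """One scan step. Returns a buffer address when service is requested."""
        if self.stop_scan:
            return self.counter            # counter is halted on the address
        if self.dav[self.counter]:         # MUX output is an active DAV
            self.stop_scan = True          # halt counter, raise service request
            return self.counter
        self.counter = (self.counter + 1) % len(self.dav)
        return None

    def reset(self):
        """Model of A-RESET: clear the selected DAV flag, restart the scan."""
        self.dav[self.counter] = False
        self.stop_scan = False


def next_request(poller, max_clocks=1000):
    """Clock the poller until it requests service; return the address."""
    for _ in range(max_clocks):
        address = poller.clock()
        if address is not None:
            return address
    return None
```

Because the counter resumes from the address it stopped on rather than from zero, a buffer that has just been serviced is examined last on the next pass, which is how sequential scanning gives every user equal priority.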
3.1.3 The Input Switching Network
The Input Switching Network can provide a programmable
data path between any input buffer and any location in the
shift register array. This network consists of multiple,
programmable data paths permitting the system to handle simul-
taneous packet transfers. A single data path is illustrated
in Figure 3.4.
In order to establish a complete data path in the net-
work, the Input Processor must first place the address of the
input buffer being serviced into Latch A. The contents of
this latch are supplied to the Data Mux and the Input Buffer
Shift Clock Demux. The Data Mux links the selected input
buffer to the switching network. The Shift Clock Demux sup-
plies the shift clock to the selected input buffer. Once this
half of the data path is established, the Input Processor sends
the address of the empty shift register to Latch B. This
latch provides the Data Demux, the Shift Register Array Clock
Demux and the Status Demux with the select lines needed to
complete the data path. The Data Demux links the selected
shift register to the network, completing the actual data
path for the packet transfer. The Shift Register Array Clock
Demux supplies the shift clock to the shift register. The
function of the Status Demux and its associated hardware is ex-
plained in 3.1.4.
[Fig. 3.4 Single Data Path in the Input Switching Network]
When a data path has been completed, the Input Processor
initiates the packet's serial transfer through the data path
by clearing the Stop Transfer flip-flop. This flip-flop halts
the packet transfer once the packet counter rolls over. The
packet counter counts each bit of the packet in transit by
monitoring the shift clock pulses. This scheme permits every
packet transfer into the array to be hardware terminated. In
addition to halting the packet transfers, the Stop Transfer
flip-flop generates the Data Path Busy signal. This signal
indicates the status of the data path, which can be either
idle or busy. Each Data Path Busy signal is sent to hardware
which provides the Input Processor with the address of a free
data path. Figure 3.5 is a diagram of this hardware circuit.
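The set-up and hardware termination of a transfer can be summarised in a short behavioural model. The sketch below rests on several assumptions: the class and method names are hypothetical, buffers and array locations are represented as Python lists of bits, and `free_path` simply returns the lowest-numbered idle path, which is one plausible behaviour for the hardware of Figure 3.5 but is not stated in the text.

```python
class InputDataPath:
    """Behavioural model of one data path in the Input Switching Network."""

    def __init__(self, input_buffers, array, packet_bits):
        self.input_buffers = input_buffers   # list of bit lists, one per buffer
        self.array = array                   # list of bit lists, one per location
        self.packet_bits = packet_bits
        self.latch_a = None                  # selected input buffer address
        self.latch_b = None                  # selected shift register address
        self.counter = 0                     # packet counter
        self.stop_transfer = True            # Stop Transfer flip-flop (set = idle)

    @property
    def busy(self):
        """Data Path Busy signal derived from the Stop Transfer flip-flop."""
        return not self.stop_transfer

    def start(self, buffer_address, register_address):
        """Input Processor: load both latches, then clear Stop Transfer."""
        self.latch_a = buffer_address
        self.latch_b = register_address
        self.counter = 0
        self.array[register_address] = []
        self.stop_transfer = False

    def shift_clock(self):
        """One shift clock pulse; the transfer stops itself after a packet."""
        if self.stop_transfer:
            return
        bit = self.input_buffers[self.latch_a].pop(0)
        self.array[self.latch_b].append(bit)
        self.counter += 1
        if self.counter == self.packet_bits:   # packet counter rolls over
            self.counter = 0
            self.stop_transfer = True


def free_path(paths):
    """Return the index of an idle data path, or None if all are busy."""
    for index, path in enumerate(paths):
        if not path.busy:
            return index
    return None
```

Once `start` has been called, the processor takes no further part in the transfer. The model's `shift_clock` ends the transfer when the counter rolls over, in the same way that the hardware releases the data path without software supervision.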
3.1.4 The Shift Register Array
The function of the Shift Register Array is to provide
temporary storage for received packets which are waiting to
be routed and transmitted. A single location in the array is
shown in Figure 3.6.
[Fig. 3.5 Hardware supplying the address of a free data path]
[Fig. 3.6 A single location in the Shift Register Array]
As packets arrive from the input buffers, they are shifted
into the shift registers in the array. Each location actually
uses three shift registers linked together to form the packet-
length storage area required. As the packets are transferred,
the header arrives first and eventually resides in the Packet
Header Shift Register. Unlike the shift registers which con-
tain the actual packet data bits and the header error protec-
tion bits, this shift register allows parallel accessing of
the header. Since the processor fetches each packet header
and also returns the corrected header to the shift register,
the parallel access feature is a system requirement.
As packets are sent to the array, their headers and header
correction bits are also sent to the Syndrome Generator [3].
Each shift register location has its own Syndrome Generator.
This hardware circuit decodes the header information into a
syndrome. A non-zero syndrome indicates an error in the
header data. The syndrome is available to the Routing Proces-
sor which corrects the header using this error pattern infor-
mation.
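Syndrome decoding by table look-up can be illustrated with a small BCH code. The example below is a sketch under a loud assumption: this section does not give the parameters of the header code, so the (15,7) double-error-correcting BCH code with generator x^8 + x^7 + x^6 + x^4 + 1 is used purely as a stand-in. The function names and the dictionary playing the role of the Syndrome Decoder ROM are hypothetical.

```python
from itertools import combinations

N, K = 15, 7
# Assumed stand-in code: (15,7) BCH, g(x) = x^8 + x^7 + x^6 + x^4 + 1.
G = 0b111010001


def poly_mod(value, gen=G):
    """Remainder of value(x) divided by gen(x), coefficients in GF(2)."""
    gen_len = gen.bit_length()
    while value.bit_length() >= gen_len:
        value ^= gen << (value.bit_length() - gen_len)
    return value


def encode(data):
    """Systematic encoding: 7 data bits followed by 8 check bits."""
    shifted = data << (N - K)
    return shifted | poly_mod(shifted)


def build_syndrome_rom():
    """Map each syndrome to the error pattern (weight 0, 1 or 2) causing it."""
    rom = {0: 0}
    for weight in (1, 2):
        for positions in combinations(range(N), weight):
            error = sum(1 << p for p in positions)
            rom[poly_mod(error)] = error
    return rom


SYNDROME_ROM = build_syndrome_rom()


def correct(word):
    """Compute the syndrome, look up the error pattern and remove it."""
    syndrome = poly_mod(word)          # role of the Syndrome Generator
    error = SYNDROME_ROM.get(syndrome)
    if error is None:
        raise ValueError("uncorrectable error pattern")
    return word ^ error
```

A zero syndrome maps to the all-zero error pattern, so an undamaged header passes through unchanged. In the switch, the syndrome computation is done by hardware while the packet is shifted in, leaving only the look-up and the exclusive-OR to the Routing Processor.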
All packet transfers into the array from the input buf-
fers are hardware terminated. When the Stop Transfer flip-
flop in the Input Switching Network is set, it halts the
packet transfer hardware. In addition, the activated flip-
flop is sent to the input of the Status Demux. This Demux
passes the flip-flop's signal onto the selected shift regis-
ter's Status flip-flop. The activated signal sets the Status
flip-flop, indicating that a packet transfer has been completed
and that this location in the array now contains a packet re-
quiring service. Every array Status flip-flop is scanned by
the Shift Register Polling Circuit, which appears in Figure
3.7. This poller searches for unserviced packets, notifies
the Routing Processor once one is found, and supplies the
array address of the unserviced packet to the processor. A
set Status flip-flop is cleared once the Routing Processor
accesses the Syndrome Generator at that location. Next, the
poller is restarted by the processor.
Previously, the polling of the array was carried out by
the processor. This scheme required additional software and
consumed processor execution time even when empty locations
were scanned [3]. Thus, the proposed use of a hardware poller
increases the Routing Processor's throughput.
3.1.5 The Output Queue Lists
The Output Queue Lists are the software lists contain-
ing the shift register array address of each routed packet
awaiting transmission. Each list contains the addresses of
routed packets destined for that list's associated output
buffer. The Routing Processor always writes to the lists,
adding the addresses of newly routed packets. Meanwhile, the
Output Processor always reads from these lists, fetching the
next packet to be transmitted. The lists are organized in a
First-In-First-Out (FIFO) format, resulting in the transmis-
sion of the oldest packet in the selected list. Figure 3.8
contains the data structure of the Output Queue Lists.
[Fig. 3.7 The Shift Register Polling Circuit]
[Fig. 3.8 Output Queue List Data Structure]
The index pointer or "Input Pointer" (IPTR) used by the
Routing Processor points to the next address to be filled.
Once a location is filled, the Routing Processor updates the
IPTR by incrementing it. When the IPTR reaches the end of
the list, it rolls over, returning to the top of the queue
list. The index pointer or "Output Pointer" (OPTR) used by
the Output Processor points to the next location to be read.
In order to fetch the next address from a list, the Output Pro-
cessor must perform the read operation and then it must incre-
ment the OPTR.
The data structure of the lists is designed such that
when IPTR is equal to OPTR the list is assumed to be empty.
Under special circumstances, this assumption may cause packet
losses. This problem is explored further in Chapter 5.
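The pointer arithmetic of one queue list can be sketched as a circular buffer. The class and method names below are hypothetical, and the list length of four is chosen only to keep the example short. The final part of the sketch shows one circumstance in which the "IPTR equals OPTR means empty" rule could lose packets, namely a list that fills completely. That scenario is offered as an assumption and not as the report's own analysis.

```python
class OutputQueueList:
    """Circular FIFO of shift register addresses with two index pointers."""

    def __init__(self, size):
        self.ram = [None] * size
        self.iptr = 0     # Input Pointer: next location to be filled
        self.optr = 0     # Output Pointer: next location to be read

    def is_empty(self):
        return self.iptr == self.optr

    def write(self, address):
        """Routing Processor: store an address, then increment IPTR."""
        self.ram[self.iptr] = address
        self.iptr = (self.iptr + 1) % len(self.ram)   # roll over at the end

    def read(self):
        """Output Processor: read the oldest address, then increment OPTR."""
        if self.is_empty():
            return None
        address = self.ram[self.optr]
        self.optr = (self.optr + 1) % len(self.ram)
        return address


queue = OutputQueueList(4)
for address in (10, 11, 12):
    queue.write(address)
in_order = [queue.read() for _ in range(3)]     # oldest packet comes out first

# Four writes with no intervening read bring IPTR back around to OPTR,
# so the full list is indistinguishable from an empty one.
for address in (20, 21, 22, 23):
    queue.write(address)
looks_empty = queue.is_empty()
```

Under this rule a list of Q locations can safely hold at most Q - 1 addresses, so the list length would have to exceed the largest number of packets that can wait for one output buffer.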
In the single processor design, the output queue lists
are stored in local RAM and the index pointers are stored in
the processor's register file. However, in the multiprocessor
environment of the new designs, this scheme no longer meets
system demands. Both the Routing Processor and the Output
Processor must access these lists. Therefore, the Output
Queue lists must be stored in RAM's that are available to
both processors. In order to reduce contention problems,
each list is stored in a physically different RAM structure.
This permits the two processors to simultaneously access dif-
ferent lists without interference. Special locking hardware
is required to prevent simultaneous access of one RAM should
the processors fail to access different lists. As mentioned
earlier, the RAM's used are TWO-PORT RAM's. The logic dia-
gram of the AM29705 chips used is presented in Figure 3.9 [6].
Several chips can be arranged to form a RAM structure of the
required width and length.
An additional constraint in the design of the queue lists
is the requirement that the value of each index pointer be
available to hardware test logic. The function of the test
logic is to notify the Output Processor when an output queue
list becomes empty (OPTR = IPTR). Fulfilling this requirement
results in the storing of all queue list index pointers in
hardware counters. Figure 3.10 is the logic diagram of one
Output Queue List structure. The operation of this circuit
is best explained by tracing the procedure followed by the
Routing Processor and the Output Processor as they access a
queue list.
Once the Routing Processor has determined the destination
of a packet, it activates the uP4-B (see Table 3.1) control
line which selects the desired Output Queue List. These con-
trol lines, when activated, enable the selected RAM, the
associated locking circuit, and the IPTR updating circuit.
Next, the processor places the shift register array address
into the Output Queue List Data Port. The Routing Processor
then activates the B-REQUEST lines to request access to the
queue list. This control signal is sent to all the queue
lists, but is enabled only at the queue list selected by the
uP4-B signal.
If the selected queue list is available, the B-REQUEST
signal sets the WRITE ACCESS CONTROL flip-flop. This flip-
flop then activates the READ LOCK-OUT line, which disables
the READ ACCESS CONTROL flip-flop. Disabling this flip-flop
locks out the Output Processor from this list. In addition,
the set WRITE ACCESS CONTROL flip-flop activates the WRITE
signal, which enables the RAM in the write mode. The address
data latched in the Output Queue List Data Port is then strobed
into the RAM location selected by the IPTR. The IPTR counter
has as many unique values as the RAM has locations. The
Routing Processor is informed of a completed write operation
by the STATUS-B signal, which goes low when the WRITE ACCESS
CONTROL flip-flop is set. Upon receiving the active-low
STATUS-B signal, the Routing Processor reads the associated
Output Status Word (OSW). (The function and operation of the
OSW is discussed in 3.1.8.) The reading of the OSW before the
release of the queue list is required since access to the OSW
is also controlled by the queue list lock hardware. Thus,
only one processor at a time can access the queue list and the
associated OSW. Once the OSW read operation is performed,
the Routing Processor generates the B-RELEASE signal. This
signal clears the WRITE ACCESS CONTROL flip-flop. Clearing
this flip-flop frees the list since the READ LOCK-OUT signal
and the WRITE signal are de-activated. In addition, the B-
RELEASE signal activates the IPTR UPDATE signal which incre-
ments the IPTR counter.
If the Output Queue List selected is locked by the Output
Processor, the B-REQUEST line is disabled by the WRITE LOCK-
OUT line. The Routing Processor is informed of its denied
access via the STATUS-B line, which remains high after the
access request. The action taken by the Routing Processor in
this event is discussed in 3.3.2.
The Output Processor must access an Output Queue List
each time it services an empty output buffer. In order to
access the queue list associated with the output buffer being
serviced, the Output Processor must first activate the proper
uP1-C line. The activated uP1-C line enables the selected
RAM, the associated locking circuit and the OPTR updating
circuit. Next, the Output Processor generates the C-REQUEST
signal. If the selected queue list is locked by the Routing
Processor, the C-REQUEST line is disabled by the READ LOCK-
OUT line. The active-low STATUS-C line will remain high after
the request, notifying the Output Processor of its access
denial. The action taken by the Output Processor is discussed
in 3.3.3.
If the requested queue list is available, the enabled C-
REQUEST signal will set the READ ACCESS CONTROL flip-flop.
Setting this flip-flop activates the WRITE LOCK-OUT, READ,
and STATUS-C lines. The activated WRITE LOCK-OUT disables the
B-REQUEST signal, locking the Routing Processor out from this
queue list. Also activated is the READ signal, which places
the RAM in the enabled read mode. In addition, the STATUS-C
line goes low informing the Output Processor that access has
been granted. Once access has been granted, the Output Pro-
cessor checks the EMPTY-C line to determine if the list is
empty. This line is driven by a comparator whose inputs are
the values of IPTR and OPTR. If the two pointers are equal,
the comparator activates the EMPTY-C line.
If the list is empty, the Output Processor releases the
list by activating the C-RELEASE line. The OPTR is not
incremented in this situation since the list is empty. Should
the list not be empty, the Output Processor reads the packet's
address from the location selected by the OPTR. Once the read
operation is complete, the Output Processor activates the C-
RELEASE line freeing the list. In addition, the activated C-
RELEASE signal increments the OPTR.
Should both the Routing Processor and the Output Processor
request the same queue list simultaneously, a Default Circuit
locks out the Routing Processor while granting access to the
Output Processor.
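The locking behaviour traced above can be summarized in a small behavioural model. This is a hedged sketch, not the circuit of Figure 3.10: the class name `QueueListLock` and its methods are hypothetical, the two flip-flops are reduced to booleans, and the status lines are returned as 0 (active-low, access held) or 1.

```python
class QueueListLock:
    """Behavioural model of the access control flip-flops of one queue list."""

    def __init__(self):
        self.write_access = False   # set while the Routing Processor holds the list
        self.read_access = False    # set while the Output Processor holds the list

    def request(self, b_request=False, c_request=False):
        """Apply B-REQUEST and/or C-REQUEST; return (STATUS-B, STATUS-C)."""
        free = not (self.write_access or self.read_access)
        if c_request and free:
            # Default Circuit: on a simultaneous request the Output
            # Processor is granted access, so C-REQUEST is examined first.
            self.read_access = True
        elif b_request and free:
            self.write_access = True
        # Each status line goes low (0) while its flip-flop is set.
        status_b = 0 if self.write_access else 1
        status_c = 0 if self.read_access else 1
        return status_b, status_c

    def release_b(self):
        self.write_access = False   # B-RELEASE clears WRITE ACCESS CONTROL

    def release_c(self):
        self.read_access = False    # C-RELEASE clears READ ACCESS CONTROL


lock = QueueListLock()
simultaneous = lock.request(b_request=True, c_request=True)
```

A processor that sees its status line remain high after a request has been denied access and must retry or take the alternative action described in 3.3.2 and 3.3.3.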
An important point to note about this component is that
although the hardware implementation of the index pointers is
a system requirement, the system is enhanced by this feature.
The first benefit of this scheme is the reduction of software
due to the decrease in index pointer management overhead.
The second benefit is the reduced number of register files
required by the processors since the index pointers are stored
externally. This reduces the processor's complexity. An
additional point about this scheme is that it can be imple-
mented in single processor systems as well as in multiprocessor
systems.
3.1.6 The Output Switching Network
Illustrated in Figure 3.11 is one data path in the
Output Switching Network. As in the Input Switching Network,
the function of this network is to provide the Output Processor
with programmable data paths. These data paths are used to
link shift registers in the array to output buffers. Packet
Fig. 3.11 One Data Path in the Output Switching Network
transfers through the switching network are processor ini-
tiated and hardware terminated. There are multiple data
paths allowing simultaneous packet transfers. The circuitry
required to monitor the status of all data paths in the
switching network is presented in Figure 3.12. This circuit
provides the Output Processor with the address of a free data
path when one is needed.
3.1.7 The Output Buffers
The function of the output buffers is to receive
packets transferred from the shift register array and to then
transmit those packets to the external channel hardware.
Packets arrive at a rate determined by the internal shift
clock. Packets then leave at the rate maintained by the ex-
ternal line clock. The logic diagram for one Output Buffer is
given in Figure 3.13.
The central component of the buffer is a packet-length
shift register where the packets are stored. While the COUNT
FINISHED line is inactive, the packet is shifted into the
shift register by the internal shift clock. Meanwhile, the
INHIBIT XMIT flip-flop remains set, disabling the external
shift clock. Once the COUNT FINISHED line is activated by
the Output Switching Network, the INHIBIT XMIT flip-flop is
cleared. This action enables the external shift clock, which
then begins to shift the packet onto the channel line. The
packet counter monitors the complete transfer. As soon as
the last packet bit is shifted out of the buffer, the counter
rolls over. The carry out line from the counter sets the
INHIBIT XMIT flip-flop and generates the BUFFER EMPTY signal.
The BUFFER EMPTY signal is supplied to the output buffer's
Output Status Word (OSW). The function and operation of the
OSW is presented as the next topic.
3.1.8 The Output Status Words
The Output Status Word (OSW) of an Output Buffer is
hardware circuitry used to monitor and reflect the current
status of the buffer. All OSW's are accessible to both the
Routing Processor and the Output Processor. Each OSW is linked
to an associated Output Queue List. Thus, just as in the case
of the queue lists, only one processor may access a particular
OSW at any given time. This scheme eliminates the possibility
of one processor reading an OSW while the other processor is
altering the same OSW.
Each OSW indicates one of the three states that its
corresponding output buffer is in. The three output buffer
states are Busy, Empty and Idle. An output buffer is in the
Busy state whenever it is receiving a packet, transmitting a
packet or receiving Output Processor service. Output buffers
enter the Empty state when the packets that they were trans-
mitting are completely transferred onto the channel lines.
The Output Processor places an empty output buffer in the Idle
state when the corresponding Output Queue List is also empty.
The hardware implementation of one OSW and the Output Buffer
Polling Circuit used to scan each OSW is illustrated in
Figure 3.14.
The Output Buffer Polling Circuit sequentially scans
each OSW in search of an empty buffer. When a buffer empties,
its support hardware generates a BUFFER EMPTY signal. This
signal sets the OSW's SERVICE REQUEST flip-flop. The acti-
vated SERVICE REQUEST line is eventually found by the poller
as it scans the OSW's. Finding an empty buffer, the poller
signals the Output Processor and supplies the processor with
the address of the empty buffer.
The Output Processor then accesses the Output Queue List
associated with the empty buffer. If the list is empty, the
Output Processor updates the OSW to indicate that the buffer
is in the Idle state. This update is done when the Output
Processor generates the C-IDLE signal. (The proper uP1-C
select signal is still enabled from the queue list access.)
The poller is restarted by the C-RESET signal. If the list is
not empty, the C-SERVICE signal is activated to clear the
SERVICE REQUEST flip-flop. This updates the OSW to indicate
that the buffer now is busy. The poller is restarted by the
C-RESET signal.
Every time the Routing Processor updates an Output Queue
List, it checks the corresponding OSW. If the OSW indicates
that the buffer is not idle, the OSW is left unchanged. How-
ever, if the OSW indicates that the buffer is in the Idle
state, the Routing Processor updates the OSW to indicate that
the buffer is empty. This update is accomplished when the
Routing Processor activates the B-EMPTY signal. (The proper
uP4-B select line is still enabled from the queue list access.)
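The three buffer states and the signals that move an OSW between them can be collected into a small state table. The Python below is a sketch under stated assumptions: the names `BufferState`, `TRANSITIONS` and the helper functions are hypothetical, and the flip-flop level implementation of Figure 3.14 is abstracted into a lookup.

```python
from enum import Enum


class BufferState(Enum):
    BUSY = "Busy"
    EMPTY = "Empty"
    IDLE = "Idle"


# (current state, control signal) -> next state; any other pair is a no-op.
TRANSITIONS = {
    (BufferState.BUSY, "BUFFER EMPTY"): BufferState.EMPTY,  # last bit shifted out
    (BufferState.EMPTY, "C-SERVICE"): BufferState.BUSY,     # a packet is queued
    (BufferState.EMPTY, "C-IDLE"): BufferState.IDLE,        # queue list also empty
    (BufferState.IDLE, "B-EMPTY"): BufferState.EMPTY,       # new entry was queued
}


def apply_signal(state, signal):
    """Return the OSW state after one control signal."""
    return TRANSITIONS.get((state, signal), state)


def output_processor_service(state, queue_list_empty):
    """Output Processor response after the poller reports an empty buffer."""
    return apply_signal(state, "C-IDLE" if queue_list_empty else "C-SERVICE")


def routing_processor_update(state):
    """Routing Processor check of the OSW after it adds a queue list entry."""
    if state is BufferState.IDLE:
        return apply_signal(state, "B-EMPTY")
    return state          # a buffer that is not idle is left unchanged


state = apply_signal(BufferState.BUSY, "BUFFER EMPTY")
state = output_processor_service(state, queue_list_empty=True)
state = routing_processor_update(state)
```

In this walk-through the buffer empties, is parked in the Idle state because nothing is queued, and is returned to the Empty state once the Routing Processor queues a packet for it.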
3.1.9 The Empty Shift Register List
The Empty Shift Register List (ELIST) contains the
array addresses of every empty shift register in the array.
This list is read by the Input Processor and written to by
the Output Processor. Figure 3.15 shows the data structure
used to maintain the list. The index pointer (EPTRO) used
by the Input Processor points to the next shift register
address to be fetched. Once the address data is fetched, the
Input Processor increments EPTRO. The index pointer (EPTR1)
used by the Output Processor points to the last location
updated with the address of a freed shift register. The Out-
put Processor must first increment EPTR1 and then perform the
write operation.
This data structure is designed such that under normal
operation, a Read and a Write operation will not take place at
the same location. Thus, both processors can simultaneously
access the list without interference.
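The read-then-increment and increment-then-write conventions of the two ELIST pointers can be illustrated as follows. This is a minimal sketch: the class and attribute names are hypothetical, and the initial pointer values (a list that starts out holding every shift register address) are an assumption not stated in the text.

```python
class EmptyShiftRegisterList:
    """Software model of the ELIST and its two index pointers."""

    def __init__(self, num_registers=8):
        # Assumed start-up condition: every shift register is free.
        self.ram = list(range(num_registers))
        self.read_ptr = 0                    # Input Processor: next address to fetch
        self.write_ptr = num_registers - 1   # Output Processor: last location written

    def fetch(self):
        # Input Processor: read first, then increment its pointer.
        addr = self.ram[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.ram)
        return addr

    def free(self, addr):
        # Output Processor: increment its pointer first, then write.
        self.write_ptr = (self.write_ptr + 1) % len(self.ram)
        self.ram[self.write_ptr] = addr


elist = EmptyShiftRegisterList(num_registers=4)
a = elist.fetch()            # register 0 taken by an incoming packet
b = elist.fetch()            # register 1 taken by an incoming packet
elist.free(b)                # register 1 released; written at location 0
remaining = [elist.fetch(), elist.fetch(), elist.fetch()]
```

Because only previously fetched addresses are ever written back, the write location trails the read location in this model, which is why the two processors do not touch the same location under normal operation.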
Illustrated in Figure 3.16 is the hardware circuit re-
quired to implement the ELIST. As in the Output Queue List
system, the pointers are implemented in hardware and the RAM
is a 2-port RAM. Although this is not necessary, since
neither processor requires access to the other's pointer, it
does reduce software overhead. Since this increases through-
put, this scheme is proposed over the previous scheme of
storing the pointers in the register file. Hardware index
pointers could also be used in a single processor
Fig. 3.15 The Empty Shift Register List Data Structure
Fig. 3.16 The ELIST Hardware
system by using an up/down counter to hold a single index
pointer. The ELIST data structure which supports the single
index pointer is found in [2] and [3].
3.2 The Processors
As stated earlier, there exist three classes of processors
in this implementation of a packet switch. Although each pro-
cessor's function is quite different, the actual processor used
in each class is constructed around a similar architecture.
The custom software executed by each processor and the blocks
of unique support hardware are the two elements which give each
class of processor its distinct character. As in the single
processor design, the processors are built using the Advanced
Micro Devices (AMD) 2900 family of bit-sliced processing com-
ponents. The design considerations which led to the selection
of these components are discussed in [2] and [3].
3.2.1 General Processor Architecture
The architecture of all classes of processors is
comprised of two functional blocks: The Microprogram Control
Unit (MCU) and the Instruction Execution Unit (IEU). The
Routing Processor contains one additional functional block:
The Syndrome Read Only Memory (ROM) which contains the header
error correction information in a lookup table format.
Figure 3.17 contains the block diagram of the processor archi-
tecture.
Fig. 3.17 The Processor Architecture
3.2.2 The Instruction Execution Unit
The Instruction Execution Unit (IEU) of the Input
Processor and the Output Processor is presented in Figure
3.18. Figure 3.19 shows the IEU of the Routing Processor.
Both versions of the IEU incorporate the AMD 2903 four-bit
ALU slices. Shown in Figure 3.20 is the block diagram of the
AMD 2903 ALU chip. Cascading these chips in parallel will
provide the required width of the processor word. The AMD
2903 has been selected over the AMD 2901 ALU because the 2903
architecture supports two Direct Data Inputs. The use of the
second data input allows the data from the polling circuits
to be directly supplied to the ALU. This reduces software
overhead since a typical two instruction read operation is no
longer required. Instead, the data is sent directly to the
ALU during the execution of a single instruction. Since this
scheme is implemented for each class of processors, a total
of three memory cycles has been saved, improving throughput.
All the arithmetic and logical operations required for
address generation and data manipulation are carried out by
the IEU. Inputs to the ALU are supplied by five different
sources: The Input Bus (IBUS), the Microprogram Word (uW),
Scratchpad 1, Scratchpad 2, and the polling circuit. The IBUS
provides a data path from all external memory and data ports
to the ALU. Immediate operands in the Control Memory are
supplied to the ALU via the uW input. Scratchpads 1 and 2
are two file registers located in the ALU's internal RAM.
Their addresses are supplied by an external circuit which can
Fig. 3.18 The IEU for the Input and Output Processors
Fig. 3.19 The Routing Processor's IEU
Fig. 3.20 AM 2903 Four-Bit ALU Slice
provide the hardwired address of the selected register, when
needed. Data from the polling circuit provide the processor
with the address of the device requesting service.
Once the ALU inputs are processed, they leave the ALU
via the Output Bus (OBUS). This bus is sent to system hard-
ware, the Address Latch, the Address Decoder and the Data Bus
Decoder (Input Processor and Output Processor only). The
Address Latch holds address data stable during read and write
operations. The Address Decoder generates device select lines.
When used in conjunction with the Data Bus Decoder, the Address
Decoder forms an Addressing Matrix which can activate single
bit control lines [3]. This matrix is illustrated in
Figure 3.21.
3.2.3 Microprogram Word IEU and System Hardware Control
Fields
IEU hardware and blocks of system hardware receive
control signals from various fields within the Microprogram
Word (uW). Along with control signals, the ALU can receive
operands from the microprogram word. Control signals from
the uW are also sent directly to system hardware blocks.
These signals do not require processing by the IEU. There-
fore, while the processor performs one task, the uW control
signals can activate components of the system hardware. This
hardware can either assist the processor in completing its
task or independently perform a different task. This
scheme permits concurrent operations to be carried on within
the packet switch.
Fig. 3.21 Addressing Matrix
(Courtesy of James Burnell)
Presented in Figures 3.22, 3.23 and 3.24 are the segments
of the uW which are required to control the IEU and the system
hardware. In addition, these figures and Figure 3.25 contain
tables used to microprogram the packet switch.
3.2.3.1 ALU Source Fields
The AMD 2903 ALU chip provides the ALU with two
operand inputs labeled R and S. A 2-1 mux supplies the R oper-
and input with data from either the A output from the internal
register file or the external A-Direct-Data (DA) input. Since
no class of processors utilizes the A register file, the DA
input is permanently selected. External to the ALU, a 2-1 mux
selects either the uW operand data or the data held in the
IBUS Latch, and supplies the selected source to the DA input.
This mux is controlled by the R SOURCE field in the uW.
The ALU's S input has three sources: The B output of the
internal register file, the B-Direct-Data (DB) input and the
internal Q register. Addresses for the B register file are
supplied to the AM2903 via the external B SOURCE mux. This
mux has the hardwired addresses for each scratchpad register
used as its inputs. The B ADDRESS field in uW controls this
mux. Data supplied to the DB input arrives from the pro-
cessor's polling circuit. Both the B register file output
and the DB input are tristated. Tristate bus control is
essential since both inputs share the same internal data bus.
This data bus forms one of the two inputs to an internal 2-1
mux. The other mux input is the output from the Q register.
63
n
A!
O M
w
^
M N o
ti j
a	 M J
W d
Q
IQ
2
^S Wz
=
M lw m ^
in
p^GiN AL PAGE aM
Q c•1W ll
s e
^ °
Ot POOR. QU ALI'CY1
w
100 U)
an	 1
J
/
Q
m
cd to
ww,,
i
Z
O
OD N
W
^
w
to
^.{
m
J ? <r t4 0
W
p
W
^.{ H OV
LAr
^
`
^ H
V V
HQ
fp
O
J
Q Z N
C.0
^r
O
c
CSI
G
~
G
fA
N
Z
N
V
LA
Q
co
^
Q
^ 3 3 T >H >H
Ali
a
a
^	 ^
v	 V H Vi Q t9; t-I 4 ^L
CHm
.o Ly d
c[
I 0
I Q	 o'
QDW ^ :< ^ Q x ^
¢
v
^.
N
^.
r f
W o
w W
O^
•,-
H ^1 ^ '0 r{ ^ ^ o
U
cw^^^^^^ xin H
Q1 6v  ^ n
0
t
H
o
64
W^ e We
m m ^
3
10 r
I^
^ W
J '^Q ^
H
M
IOla
w
lw H
CO
m
CV
qL
m
0
Q^
I
.4
r
N
m
ac
r
ZO
N
.e
d^
o~
M
CO
M
M
M
h1
M
M
1
M
M
b
r6
Cd
O
I
V
00JWM
c
W
W
c
Q
w-
OO
1
x
W
V
AW^
W
w
.1.1
C
u°
3
H
w
O
N
N
GuJ
O
1a
a
C
a°
M
N
M
d+
W
^	 r
A o
O	
:
to ^ s
o^
v W
2
J ^ _
•
O W Q rc
t
C.2 o
W LA Or o x
cIC
^
3
'2
3^
Q W H H
^o x e. x
uj
65
r
V`1
b
tl'
M,
01
M
h
M
1
n'1
N("1
M
In
m
0
u
M
u
vH
E
W
G
t-
M
n
v.
W
CL
4
OQ
(AV
w
O
14
R
O
u
W
H
b
OMM
.m
wIwQ,
tz
W
V
a
a
a0
qwN
W1
0%
.,4
w
K
M
m
to
1
s
t6
r6
1
14,
O^
1
1
.o
1
I. j
to
J
W V
J
s 
VA F-
^' ui N
©Cf Jv
i( A
to
O
W ,^S
^ W
o.
t^
N
it	 40
2S
4 W
• L
its,
 V
C QCD
J
¢ C
'A
^ H
V
Z
a ►°
J ^'Q
w
d ^
	W 	 U
	
tC	 11 '^
9
	
O	 ^•
N
J GL^
Q
^ O
Q ^
t ^ C
tE
v
w ^
H O
r
w
cD
3 V
a c
N ^^
N
m
{yj V W
^''Vootu
%
nn..
^^
rA7
,^
d
o
V)
Q 3 w m
Ham` ^. ♦-1 A. xx
I Qw e 4 e-1
TABLE 1. ALU OPERAND SOURCES
EA   I0   OEB    ALU Operand R    ALU Operand S
L    L    L      RAM Output A     RAM Output B
L    L    H      RAM Output A     DB0-3
L    H    X      RAM Output A     Q Register
H    L    L      DA0-3            RAM Output B
H    L    H      DA0-3            DB0-3
H    H    X      DA0-3            Q Register
L = LOW    H = HIGH    X = Don't Care

TABLE 2. ALU FUNCTIONS
I4   I3   I2   I1    Hex Code    ALU Functions
L    L    L    L     0           I0 = L: Special Functions
                                 I0 = H: Fi = HIGH
L    L    L    H     1           F = S minus R minus 1 plus Cn
L    L    H    L     2           F = R minus S minus 1 plus Cn
L    L    H    H     3           F = R plus S plus Cn
L    H    L    L     4           F = S plus Cn
L    H    L    H     5           F = NOT(S) plus Cn
L    H    H    L     6           F = R plus Cn
L    H    H    H     7           F = NOT(R) plus Cn
H    L    L    L     8           Fi = LOW
H    L    L    H     9           Fi = NOT(Ri) AND Si
H    L    H    L     A           Fi = Ri EXCLUSIVE NOR Si
H    L    H    H     B           Fi = Ri EXCLUSIVE OR Si
H    H    L    L     C           Fi = Ri AND Si
H    H    L    H     D           Fi = Ri NOR Si
H    H    H    L     E           Fi = Ri NAND Si
H    H    H    H     F           Fi = Ri OR Si
L = LOW    H = HIGH    i = 0 to 3
Fig. 3.25 ALU Control Fields
Selection of the S input is made by the uW S SOURCE control
field. This uW field controls the 2-1 mux and the tristate
logic.
3.2.3.2 ALU Function Fields
The selection of an ALU arithmetic function or
logical operation is determined by the uW ALU Function
field.
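As an illustration of how a four-bit function code selects the ALU operation, the sketch below models one four-bit slice in Python. It assumes the function codes of the Am2903 ALU function table (Table 2 of Fig. 3.25); the function name `alu_slice` is hypothetical, code 0 (special functions) is omitted, and the subtractions are formed with the one's complement of the subtracted operand plus the carry-in Cn.

```python
MASK = 0xF  # one slice is four bits wide


def alu_slice(code, r, s, cn=0):
    """Return the 4-bit result F for function codes 1 through F (hex)."""
    not_r = ~r & MASK
    not_s = ~s & MASK
    results = {
        0x1: s + not_r + cn,      # S minus R minus 1 plus Cn
        0x2: r + not_s + cn,      # R minus S minus 1 plus Cn
        0x3: r + s + cn,          # R plus S plus Cn
        0x4: s + cn,              # S plus Cn
        0x5: not_s + cn,          # NOT(S) plus Cn
        0x6: r + cn,              # R plus Cn
        0x7: not_r + cn,          # NOT(R) plus Cn
        0x8: 0,                   # LOW
        0x9: not_r & s,           # NOT(R) AND S
        0xA: ~(r ^ s),            # R EXCLUSIVE NOR S
        0xB: r ^ s,               # R EXCLUSIVE OR S
        0xC: r & s,               # R AND S
        0xD: ~(r | s),            # R NOR S
        0xE: ~(r & s),            # R NAND S
        0xF: r | s,               # R OR S
    }
    return results[code] & MASK   # the carry out of the slice is discarded here


total = alu_slice(0x3, 5, 6)              # 5 + 6
difference = alu_slice(0x2, 9, 4, cn=1)   # 9 - 4, with Cn = 1 supplying the +1
```

With Cn = 1 the "minus 1" term cancels, so code 2 yields a true subtraction R minus S; with Cn = 0 it yields R minus S minus 1.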
3.2.3.3 ALU Destination Fields
Internally, the 2903's ALU output is sent to both
the register file's DATA IN input and the Q register (via the
Q shifter). The ALU's output is also available to the Output
Bus (OBUS) via an internal tri-state buffer. The ALU Destina-
tion field can direct the ALU output to any or all of these
locations.
3.2.3.4 Bus Control Fields
In order to hold address data stable, the OBUS is
supplied to two address latches: The Address Latch and the
ROM Address Latch (Routing Processor only). These latches are
enabled by the Bus Latch uW field in conjunction with the
Phase 2 clock (see 3.2.5).
The various uW Read and Write fields control data trans-
fers between the processors and external hardware.
3.2.3.5 System Hardware Control Fields
The System Hardware Control Fields consist of vari-
ous control bits used to activate system hardware operations.
These signals are sent directly to the hardware since they do
not control IEU operations. However, they usually act in con-
junction with the processor, often helping to speed up pro-
cessor tasks. They also may direct hardware operations which
carry out independent tasks. Thus, use of these special con-
trol bits has improved system throughput.
3.2.4 The Microprogram Control Unit
The function of the Microprogram Control Unit is
twofold: It must control the execution of the processor's
software and it must supply the microprogram's control signals
to the IEU and the system hardware. A diagram of this unit
is given in Figure 3.26. The MCU consists of an AMD 2911
microprogram sequencer, jump control logic (implemented by a
Programmable Logic Array (PLA)) [3], a pipeline register
and the microprogram memory. A block diagram of the AMD 2911
chip is presented in Figure 3.27. This device generates the
uprogram counter value used to control the execution sequence
of the processor's microprogram. Next address selection pro-
vides the MCU with one of the two possible next addresses.
Either the uprogram counter or the address in the Jump Address
field of the uW is supplied to the address lines of the
microprogram memory. The PLA Jump Control Logic determines
this selection. Inputs to the PLA Jump Control Logic come
from various system status signals and the Next Address Select
uW field. Figure 3.28 contains the Next Address Select field
and the Jump Address field. The Jump Control Logic Function
for each class of processor is given in Figure 3.29.
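The next-address decision made by the PLA can be expressed as a small function. The sketch below follows the Output Processor rows of Figure 3.29; the function name and argument names are hypothetical, and the returned tuple is the sequencer control pattern (FE, Cn, S1, S0) listed in that figure.

```python
def output_jump_control(n1, n0, j, service_c=0, status_c=0, empty_c=0):
    """Return (FE, Cn, S1, S0) for the Output Processor jump control logic."""
    if (n1, n0) == (0, 0):
        jump = (j == 1)            # J = 1 requests an unconditional jump
    elif (n1, n0) == (0, 1):
        jump = (service_c == 0)    # jump on SERVICE-C = 0
    elif (n1, n0) == (1, 0):
        jump = (status_c == 1)     # jump on STATUS-C = 1
    else:
        jump = (empty_c == 0)      # jump on EMPTY-C = 0
    # Figure 3.29 lists 1 0 1 1 when the Jump Address field is the
    # address source and 1 1 0 0 when uPC + 1 is the address source.
    return (1, 0, 1, 1) if jump else (1, 1, 0, 0)


sequential = output_jump_control(0, 0, 0)
denied_access = output_jump_control(1, 0, 0, status_c=1)
```

For example, `denied_access` models the case in which STATUS-C stays high after a C-REQUEST, so the microprogram branches to the address held in the Jump Address field.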
Fig. 3.26 Microprogram Control Unit
Fig. 3.27 Am 2911 Microprogram Sequencer
Mnemonic      Next Address Select                  Jump Address Fields

INPUT         Control Bits 38-39                   Bits 40-43
PROCESSOR     N1 N0     Next Address               Jump Address
              0  0      uPC + 1                    JA3, JA2, JA1, JA0
              0  1      Unconditional Jump
              1  X      Jump on IBSR-A = 0

ROUTING       Bits 42-44                           Bits 45-48
PROCESSOR     N1 N0 J   Next Address               Jump Address
              0  0  0   uPC + 1                    JA3, JA2, JA1, JA0
              0  0  1   Unconditional Jump
              0  1  X   Jump on NEW-B = 0
              1  0  X   Jump on STATUS-B = 1
              1  1  X   Jump on IDLE-B = 1

OUTPUT        Bits 42-44                           Bits 45-49
PROCESSOR     N1 N0 J   Next Address               Jump Address
              0  0  0   uPC + 1                    JA4, JA3, JA2, JA1, JA0
              0  0  1   Unconditional Jump
              0  1  X   Jump on SERVICE-C = 0
              1  0  X   Jump on STATUS-C = 1
              1  1  X   Jump on EMPTY-C = 0
Fig. 3.28 MCU uW Control Fields
INPUTS                      OUTPUTS
ISSR-A   N1   N0            FE   Cn   S1   S0     ADDRESS SOURCE
X        0    0             1    1    0    0      uPC + 1
X        0    1             1    0    1    1      Jump Address
0        1    X             1    0    1    1      Jump Address
1        1    X             1    1    0    0      uPC + 1
Input Processor Jump Control Logic Function

INPUTS                                     OUTPUTS
NEW-B   STATUS-B   IDLE-B   N1   N0   J    FE   Cn   S1   S0     ADDRESS SOURCE
X       X          X        0    0    0    1    1    0    0      uPC + 1
X       X          X        0    0    1    1    0    1    1      Jump Address
0       X          X        0    1    X    1    0    1    1      Jump Address
1       X          X        0    1    X    1    1    0    0      uPC + 1
X       0          X        1    0    X    1    1    0    0      uPC + 1
X       1          X        1    0    X    1    0    1    1      Jump Address
X       X          0        1    1    X    1    1    0    0      uPC + 1
X       X          1        1    1    X    1    0    1    1      Jump Address
Background Processor Jump Control Logic Function

INPUTS                                          OUTPUTS
SERVICE-C   STATUS-C   EMPTY-C   N1   N0   J    FE   Cn   S1   S0     ADDRESS SOURCE
X           X          X         0    0    0    1    1    0    0      uPC + 1
X           X          X         0    0    1    1    0    1    1      Jump Address
0           X          X         0    1    X    1    0    1    1      Jump Address
1           X          X         0    1    X    1    1    0    0      uPC + 1
X           0          X         1    0    X    1    1    0    0      uPC + 1
X           1          X         1    0    X    1    0    1    1      Jump Address
X           X          0         1    1    X    1    0    1    1      Jump Address
X           X          1         1    1    X    1    1    0    0      uPC + 1
Output Processor Jump Control Logic Function
Fig. 3.29 Jump Control Logic Functions
The output of the AMD 2911 can be unconditionally reset
to zero. This allows the system to initialize program execu-
tion whenever required. Table 3.2 contains the uW widths for
each class of processors. Since some bits in the uW remain a
constant logic value, they can be hardwired. This reduces
the actual Microprogram Memory (ROM) widths.
3.2.5 Processor Timing
A two-phase clock drives the processors. This clock
controls the timing of internal and external data transfers.
Figure 3.30 presents the waveforms and significant timing
events. Phase 1 latches the internal data of the 2903 ALU
and the 2911 microprogram sequencer. Phase 2 is required to
stabilize data in the IEU hardware that is external to the
ALU. In addition, this clock phase is used to latch data and
address information required for external data transfers to
I/O ports and memory. Each clock cycle has a period of 120
nanoseconds which yields a maximum clock frequency of 8.33 MHz
[3].
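The relation between the 120 nanosecond period and the quoted clock frequency is a one-line calculation, shown here in Python for convenience. The helper `execution_time_ns` is a hypothetical addition and assumes one microinstruction per clock cycle, which the text does not state explicitly.

```python
CLOCK_PERIOD_NS = 120.0  # processor clock period stated in the text


def clock_frequency_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def execution_time_ns(num_cycles, period_ns=CLOCK_PERIOD_NS):
    """Duration of a routine that takes num_cycles clock cycles (assumed model)."""
    return num_cycles * period_ns


frequency = clock_frequency_mhz(CLOCK_PERIOD_NS)   # roughly 8.33 MHz
```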
3.3 The System Software
Each class of processor executes a unique software rou-
tine. The three different routines are: The Input Service
Routine, the Routing Service Routine and the Output Service
Routine. A detailed explanation of each routine's function
is presented next.
Table 3.2 Microprogram Word Bit Divisions

1  a) Current instruction is latched into pipeline register.
   b) Data is clocked into Q register.
2  a) A and B latches internal to 2903 are open.
   b) IBUS latch is open.
   c) READ line is low during this time in a read operation.
3  a) A and B latches, IBUS latch are closed.
   b) ALU output is stable.
   c) WE is low if storing into register file.
4  Address is latched on this edge during an
   address generation operation.
5  If Write microprogram word bit is high, WRITE
   goes low during this pulse.
Fig. 3.30 Processor Clock Waveforms
(Courtesy of James Burnell)
3.3.1 The Input Service Routine
The Input Service Routine is executed by the Input
Processor. Shown in Figure 3.31 is the flowchart of this
software routine. This routine is sense-loop driven. The
Input Processor loops on a status bit, which is controlled by
the Input Polling Circuit. When the poller finds a full input
buffer, it updates the sense-loop status bit. Once the
processor leaves the loop, it fetches the address of the full
input buffer. This address is supplied by the polling circuit.
Next, the processor clears the buffer's DAV flag and
restarts the poller. Restarting the poller before service
is complete allows the poller to find the next full buffer
before the processor returns to the sense loop. This scheme
reduces processor idleness due to poller scan time.
The Input Processor then fetches the address of a free
data path in the Input Switching Network. This address is
supplied by the Input Data Path Status Port. The ELIST is
accessed next. Using this list, the Input Processor fetches
the address of a free shift register in the array. After obtaining
the address stored in the location selected by the
EPTR0 index pointer, the Input Processor increments EPTR0.
EPTR0 now points to the next empty shift register address
stored in ELIST. Using the three addresses mentioned above,
the Input Processor links the full input buffer to the empty
shift register via the free data path. Upon completion of the
link, the Input Processor initiates the packet's transfer into
the array. The Input Processor then returns to the sense loop.
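The service sequence described above can be summarized in a short Python sketch. It is only an illustration under stated assumptions: the names `Elist` and `input_service`, the plain lists standing in for the Input Polling Port and the Input Data Path Status Port, and the wrap-around of the index pointer are hypothetical and are not taken from the report.

```python
class Elist:
    """Circular list of empty shift register addresses (illustrative model)."""

    def __init__(self, addresses):
        self.slots = list(addresses)
        self.eptr0 = 0  # read index used by the Input Processor

    def fetch_free_register(self):
        # Read the entry selected by EPTR0, then increment EPTR0.
        # Wrap-around at the end of the list is assumed here.
        address = self.slots[self.eptr0]
        self.eptr0 = (self.eptr0 + 1) % len(self.slots)
        return address


def input_service(full_buffers, free_paths, elist):
    """Service one full input buffer; return the link set up, or None.

    full_buffers and free_paths are plain lists standing in for the
    Input Polling Port and the Input Data Path Status Port (assumed).
    """
    if not full_buffers:                 # sense loop: nothing to service yet
        return None
    buffer_addr = full_buffers.pop(0)    # address supplied by the poller
    # The DAV flag would be cleared and the poller restarted here, so the
    # poller can search for the next full buffer while service continues.
    path = free_paths.pop(0)             # free data path
    register = elist.fetch_free_register()
    return (buffer_addr, path, register)  # link: buffer -> path -> register


elist = Elist([7, 3, 9])
link = input_service([2, 5], [0, 1], elist)
print(link)  # (2, 0, 7)
```

The returned tuple corresponds to the three addresses the Input Processor uses to link the full buffer to the empty shift register via the free data path.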
Fig. 3.31 Input Service Routine Flowchart
A listing of this routine is given in Figure 3.32. Since
the instruction set of each processor is custom tailored, no
standard computer language exists to describe the packet
switch's software. In order to document the software, a simple
format is used to code each line of software:
<1st operand> <operation> <2nd operand> → <destination>.
In the listing, instructions performing a single task are
grouped together, followed by a comment explaining their function.
Concurrent task execution is noted by ";". In addition, the µP
Address Code is listed next to the instruction which generates
that particular control signal. This is done to help explain how
the software interfaces with the system hardware. Each line of
code listed requires 120 nanoseconds of execution time.
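Because every line of code takes one 120 nanosecond cycle, the execution time of one pass through a routine follows directly from the number of lines executed. The sketch below shows the arithmetic; the function names and the line count in the example are hypothetical, and time spent spinning in a sense loop or on a lock is not included.

```python
CYCLE_NS = 120  # execution time of one line of code, from the text


def routine_time_ns(lines_executed):
    """Execution time of a pass that runs the given number of lines."""
    return lines_executed * CYCLE_NS


def max_services_per_second(lines_executed):
    """Upper bound on services per second if passes run back to back."""
    return 1e9 / routine_time_ns(lines_executed)


# Hypothetical example: a pass that executes 10 lines of code.
print(routine_time_ns(10))  # 1200
```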
3.3.2 The Routing Service Routine
Execution of the Routing Service Routine is carried
out by the Routing Processor. The flowchart of this software
routine is illustrated in Figure 3.33. This routine is sense
loop driven. The processor loops waiting for the Shift
Register Polling Circuit to indicate that a newly arrived
packet has been found in the array. When a new packet is
found, the Routing Processor leaves the loop and fetches the
packet's array address from the poller. Using this address,
the Routing Processor fetches the packet's syndrome from the
Syndrome Generator. This syndrome is latched into the address
input of the Syndrome Decoder ROM. Simultaneously, the shift
INPUT: If IBSR-A = 0, JMP to INPUT
        *Is there an input buffer requesting service?
         NO: Loop @ INPUT.
Input Polling Port → Q
        *YES: Input the address of the buffer requesting service.
(ELIST)@EPTR0 → Scratch 1; Reset Poller
        *Find a free shift register, clear IBSR-A and restart poller.
Input Data Path Status Port Address → Address Latch (µP1-A)
Data Path Busy Status Port → Q; Update EPTR0
        *Find a free data path and increment EPTR0.
Scratch 2 + Data Path Latch A Base Address → Address Latch (µP2-A)
Q → Data Path Mux Select Latch A(D)
        *Link the input buffer to the data path.
Scratch 2 + Data Path Latch B Base Address → Address Latch (µP3-A)
Scratch 1 → Data Path Demux Select Latch B(D)
        *Link the empty shift register to the data path.
Data Path Transmit Control Address → Address Latch
Scratch 2 → Data Bus Decoder (µP4-A); Jump to INPUT
        *Start data transfer and return to the sense loop.
Fig. 3.32 Input Service Routine
Fig. 3.33 Packet Routing Service Routine Flowchart
register status flag is cleared and the poller is restarted.
The output from the Syndrome Decoder ROM is exclusive-ORed
with the packet's header. This operation yields the corrected
header, which is stored back into the shift register.
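The syndrome lookup and exclusive-OR correction can be illustrated with a small Python sketch. The parameters of the report's BCH code are not given in this section, so the sketch substitutes a toy 7-bit single-error-correcting code in which the syndrome equals the position of the faulty bit; the names `syndrome`, `DECODER_ROM` and `correct_header` are hypothetical.

```python
def syndrome(word):
    """Syndrome of a 7-bit word: XOR of the 1-based positions of its 1 bits."""
    s = 0
    for position in range(1, 8):
        if word & (1 << (position - 1)):
            s ^= position
    return s


# Decoder ROM model: the syndrome addresses the ROM, which returns the
# error word (a single 1 at the faulty bit, or 0 when there is no error).
DECODER_ROM = [0] + [1 << (s - 1) for s in range(1, 8)]


def correct_header(received):
    """Exclusive-OR the received header with the error word from the ROM."""
    return received ^ DECODER_ROM[syndrome(received)]


codeword = 0b0000111              # valid: positions 1, 2 and 3 XOR to zero
corrupted = codeword ^ 0b0010000  # flip the bit at position 5
print(correct_header(corrupted) == codeword)  # True
```

As in the text, the processor itself does no decoding arithmetic: the table lookup yields the error word and a single exclusive-OR restores the header.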
Using the corrected header, the Routing Processor deter-
mines the packet's destination. In order to route the packet,
the Routing Processor must place the packet's array address
into the proper Output Queue List. As stated earlier, these
lists are shared resources which have contention problems.
Thus, they are regulated by hardware locks which permit access
to only one processor at a time. Therefore, before accessing
any list, the Routing Processor must request access. In order
to minimize the time spent accessing these resources, the
shift register address data is first placed into the Output
Queue List Data Port. The Routing Processor then requests
access to the selected queue list. If access is granted, the
data in the data port is automatically strobed into the queue
list at the location specified by the IPTR. This scheme per-
mits the Routing Processor to move on to the next task rather
than writing to the list.
Once the shift register address is placed into the proper
queue list, the Routing Processor checks the associated OSW.
If the OSW indicates that the corresponding buffer is not in
the Idle state, the Routing Processor releases the Output
Queue List. The signal generated to release the queue list
also activates the IPTR update circuitry, which automatically
increments the IPTR counter. After releasing the queue list,
the Routing Processor returns to the sense loop.
Should the OSW indicate that the output buffer is in the
Idle state, the Routing Processor updates the OSW. After this
update, the OSW indicates that the buffer is now in the Empty
state, waiting for Output Processor service. The Routing
Processor then releases the queue list and returns to the
sense loop.
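The routing-side queue list access can be sketched in Python as follows. This is a minimal model under stated assumptions: the class `OutputQueueList`, its method names, the string state values, and the wrap-around of IPTR are all hypothetical, and only the Routing Processor's use of the lock is modeled.

```python
IDLE, EMPTY, BUSY = "IDLE", "EMPTY", "BUSY"


class OutputQueueList:
    """One Output Queue List with its hardware lock, data port and OSW."""

    def __init__(self, size):
        self.slots = [None] * size
        self.iptr = 0          # insertion index, incremented on release
        self.locked = False    # hardware lock
        self.data_port = None  # Output Queue List Data Port
        self.osw = IDLE        # status word of the associated output buffer

    def request(self):
        """Try to take the lock; on success the port data is strobed in."""
        if self.locked:
            return False
        self.locked = True
        self.slots[self.iptr] = self.data_port
        return True

    def release(self):
        """Release the lock; the same signal increments IPTR (wrap assumed)."""
        self.iptr = (self.iptr + 1) % len(self.slots)
        self.locked = False


def route_packet(queue, register_address):
    queue.data_port = register_address  # load the data port first
    while not queue.request():          # spin until access is granted
        pass
    if queue.osw == IDLE:               # idle buffer now has work waiting
        queue.osw = EMPTY
    queue.release()


q = OutputQueueList(size=4)
route_packet(q, 12)
print(q.slots[0], q.osw, q.iptr)  # 12 EMPTY 1
```

Loading the data port before requesting the lock mirrors the text: once access is granted, the write happens without further processor involvement.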
If the selected Output Queue List is not available, the
Routing Processor loops, repeating its request for service. This loop is called
a SPIN LOCK since the processor spins on the hardware lock
while waiting for the busy resource to be freed [5]. There
exists an alternative locking scheme called the SUSPEND LOCK.
This alternative scheme requires the processor to suspend the
current task which needs the busy resource [5]. This task
is temporarily put aside as the processor moves on to a new
task. Implementation of this scheme was considered, but was
abandoned. Several reasons led to the abandonment of the
Suspend Lock:
1) The additional hardware and software required to sus-
pend and resume jobs.
2) The next task selected may also require the busy
resource.
3) The time wasted idling in the spin lock is far shorter
than the time required to suspend and resume the exe-
cution of a job.
4) The possibility that no new task existed, resulting
in wasted time as the processor suspended the only
job available.
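The spin lock idea can be illustrated in software with a non-blocking lock request that is simply repeated until it succeeds. The sketch below uses Python's `threading.Lock` as a stand-in for one hardware queue list lock; the worker function and the shared list are hypothetical, and the threads only play the role of processors contending for the same resource.

```python
import threading

lock = threading.Lock()   # stands in for one hardware queue list lock
shared_list = []
spin_counts = []


def spin_and_append(value, repeats):
    spins = 0
    for _ in range(repeats):
        # Spin lock: keep re-requesting until access is granted.
        while not lock.acquire(blocking=False):
            spins += 1
        try:
            shared_list.append(value)
        finally:
            lock.release()
    spin_counts.append(spins)


workers = [threading.Thread(target=spin_and_append, args=(i, 100))
           for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(shared_list))  # 400
```

No task state is saved or restored while waiting, which is the property the reasons listed above rely on.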
Thus, the Spin Lock is used in both the Routing Service
Routine and the Output Service Routine. A listing of the
Routing Service Routine is contained in Figure 3.34.
3.3.3 The Output Service Routine
The Output Service Routine is executed by the Output
Processor. Figure 3.35 contains the flowchart of this rou-
tine. This routine is sense loop driven. The Output Proces-
sor remains in the loop until the Output Buffer Polling cir-
cuit locates an empty output buffer. When the poller finds
an empty buffer, it notifies the Output Processor by changing
the sense loop status bit. Once the processor leaves the
loop, it fetches the buffer's address from the poller. Using
this address, the Output Processor selects the corresponding
Output Queue List. Access to the queue list is then requested.
As in the Routing Service Routine, a spin lock is implemented
for queue list accesses. The Output Processor must spin on
any activated queue list lock. Once access is granted, the
Output Processor checks to see if the selected queue list is
empty. If the queue list is empty, the Output Processor up-
dates the buffer's OSW to indicate that the buffer is now in
the Idle state. Then the Output Processor releases the queue
list, restarts the poller and returns to the sense loop.
If the selected queue list is not empty, the Output
Processor fetches the oldest packet address in the list. The
associated OSW is changed to indicate that the output buffer
is in the Busy state. After updating the OSW, the Output
Processor releases the queue list and restarts the poller.
START: If NEW-B = 0, Jmp to START
        *Is there a shift register requesting service?
         NO: Loop @ START.
SRS Polling Port → Scratch 1
        *YES: Input the address of the shift register.
Syndrome Generator Base Address + Scratch 1 → Address Latch (µP1-B)
Syndrome(R) → Decoder ROM Address Latch; Reset Poller
        *Fetch header syndrome and send it to the Decoder ROM.
         Clear NEW-B and restart the poller.
Decoder ROM Address → Address Latch (µP2-B)
(Decoder ROM)@Syndrome(R) → Q
        *Fetch error word from ROM.
Header Base Address + Scratch 1 → Address Latch (µP3-B)
ALU EXOR Q → Scratch 2, Header Port(R)
        *Correct the header. Store it back into the S.R. Array and
         into Scratch 2.
Scratch 2 AND Destination Mask → Q
        *Determine packet destination.
Q + Output Queue List Base Address → Address Latch (µP4-B)
        *Select the queue list and the OSW of the destination
         output buffer.
Scratch 1 → Output Queue List Data Port
        *Place the packet's S.R. Array address into Queue List Data
         Port.
REQUEST: Request access to Queue List (N)
        *Request access to the Output Queue List selected. If
         access is granted, the data in the port is automatically
         written into the queue list.
If STATUS-B = 1, Jmp to REQUEST
        *If access is not granted, loop @ REQUEST. Proceed
         otherwise.
If OSW(N) = NOT IDLE, Jmp to END
        *Is output buffer idle?
Set OSW(N)=EMPTY; Release Output Queue List; Jmp to START
        *YES: Update OSW, release queue list and return to
         the sense loop.
END: Release Output Queue List; Jmp to START
        *NO: Release queue list and return to the sense loop.
Fig. 3.34 Packet Routing Service Routine
Fig. 3.35 Output Service Routine Flowchart
Next, the address of a free data path in the Output Switching
Network is fetched from the Data Path Busy Port. The Output
Processor links the shift register containing the packet
awaiting transmission to the empty output buffer via the free
data path. Once the data link is established, the Output
Processor initiates the packet transfer and increments the
ELIST index pointer EPTR1. EPTR1 now points to an unfilled
location in ELIST. After this update, the address of the
shift register containing the packet being transmitted is
placed into ELIST at the location specified by EPTR1. The
Output Processor then returns to the sense loop.
A listing of this routine is given in Figure 3.36.
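The decision made by the Output Processor on each pass can be sketched in Python. This is an illustrative model only: the function `output_service`, the `deque` used for the queue list, the plain list used for ELIST, and the string state values are assumptions, and the queue list lock and the poller are omitted.

```python
from collections import deque

IDLE, EMPTY, BUSY = "IDLE", "EMPTY", "BUSY"


def output_service(queue_list, elist, free_paths, buffer_addr):
    """One pass of the output routine for an empty output buffer.

    queue_list: deque of shift register addresses, oldest first.
    elist:      list receiving shift register addresses that are freed.
    Returns (new OSW state, link or None).
    """
    if not queue_list:
        return IDLE, None              # nothing queued: buffer goes Idle
    register = queue_list.popleft()    # oldest packet address
    path = free_paths.pop(0)           # free data path
    link = (register, path, buffer_addr)
    # The transfer is started here; the register address then returns to ELIST.
    elist.append(register)
    return BUSY, link


queue = deque([12, 4])
elist = []
state, link = output_service(queue, elist, [0, 1], buffer_addr=3)
print(state, link, elist)  # BUSY (12, 0, 3) [12]
```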
OUTPUT: If SERVICE-C = 0, Jmp to OUTPUT
        *Is there an output buffer requesting service?
         NO: Loop @ OUTPUT.
Output Polling Port → Q
        *YES: Input the address of the buffer requesting service.
Q + Output Queue List Base Address → Address Latch (µP1-C)
REQUEST: Request Queue List (N)
        *Select the buffer's Output Queue List and OSW. Then
         request access.
If STATUS-C = 1, Jmp to REQUEST
        *Was access granted?
         NO: Request access again.
If EMPTY-C = 0, Jmp to IDLE
        *YES: Determine if the list is empty. List Empty: Jump
         to IDLE.
(Output Queue List(N))@OPTR(N) → Scratch 1; Set OSW=BUSY;
Release Output Queue List; Reset Poller
        *List Not Empty: Input the S.R.# which contains the
         packet to be transmitted. Then update the OSW, restart
         the Poller and release the queue list.
Output Data Path Status Port Address → Address Latch (µP2-C)
Data Path Busy Status Port → Scratch 2
        *Find a free data path.
Scratch 2 + Data Path Latch A Base Address → Address Latch (µP3-C)
Scratch 1 → Data Path MUX Select Latch A(D)
        *Link the shift register to the data path.
Scratch 2 + Data Path Latch B Base Address → Address Latch (µP4-C)
Q → Data Path Demux Select Latch B(D)
        *Link the output buffer to the data path.
Data Path Transmit Control Address → Address Latch
Scratch 2 → Data Bus Decoder (µP5-C); Update EPTR1
        *Start packet transfer and increment EPTR1.
Scratch 1 → (ELIST)@EPTR1; Jmp to OUTPUT
        *Place S.R.# in the Empty S.R. List and return to the
         top of the program.
IDLE: Set OSW=IDLE; Release Output Queue List; Reset Poller;
Jmp to OUTPUT
        *Update OSW, release queue list, restart poller and
         return to the top of the program.
Fig. 3.36 Output Service Routine
4.0 THE MULTIPLE PROCESSOR DESIGN
With the three processor design complete, the next logi-
cal step in the expansion of the system is to include multiple
processors in each processor class. The major incentive be-
hind this idea is to increase the system throughput through
the use of a multiprocessor architecture. However, two major
problems must be overcome before this goal can be achieved.
The two problems are contention and throughput-limiting functions.
The solutions to these problems are presented as
topics in this chapter since they shape the final system
architecture. Also included in this chapter are the system
architecture, the processors, the hardware and software required
for implementation, and the design trade-offs made. Many
hardware components used in this design are exactly the same
as those used in the three processor design and, therefore,
are not presented in much detail. This chapter begins with
an overview of the system architecture and its operation.
4.1 The System Architecture
The system architecture is shown in Figure 4.1. This new
architecture is controlled by four classes of processors. The
new class of processors and the system requirements that caused
the additional workload division are discussed in Section 4.4. In
order to examine the duties of each class of processors, a
packet's transfer through the packet switch is traced.
The first function of the switch is to receive and to
store each incoming packet. When a packet arrives, it is
temporarily stored in an input buffer. An input buffer
containing a newly received packet requests processor service.
Dedicated hardware pollers sequentially scan their assigned
group of input buffers searching for full buffers. One group
of input buffers is assigned to one Input Processor. Upon
finding a full buffer, a polling circuit signals the Input
Processor it is serving. Immediately, this processor establishes
a data link between the full buffer and the Shift
Register Array. In order to set up this link, the processor
must first find an available data path in the processor's
dedicated Input Switching Network. Next, the processor must
find an empty location in the Shift Register Array. Once the
address of an empty location is fetched from the Empty Shift
Register List (ELIST), the processor completes the data link.
The processor then initiates the packet's serial transfer into
the array. As in the previous systems, this transfer is hard-
ware monitored and terminated, allowing the processor to move
on to a new task.
The second function of the switch is to sort each packet
in the array into groups of packets that are destined for the
same group of ground stations. Each unique group of stations
is serviced by one unique Routing Processor. Shift registers
containing newly arrived packets signal for Packet Sorting
Processor service. Dedicated hardware pollers scan their
assigned group of shift registers for new packets. Once a
polling circuit locates a new packet, the Sorting Processor
it is serving is notified. This processor fetches the packet's
header and corrects it. As in all previous systems, the header
is protected by the BCH error-correcting code. The packet's
destination is then read from the header. Using this information,
the Sorting Processor sends the packet's destination
information and array address to an input/output port associated
with the packet's destination. Each different I/O
port belongs to one unique Packet Routing Processor. Any
Sorting Processor may access any I/O port.
The Packet Routing Processors carry out the switch's
third function, which is the updating of the Output Queue
Lists with the addresses of sorted packets. Once an I/O port
is found to contain valid packet routing data, the I/O port
polling circuit signals the Routing Processor it serves. The
Routing Processor responds by fetching the packet's destina-
tion information. Using this information, the processor
determines to which ground station the packet is destined.
Packets leave for a ground station via an output buffer which
corresponds to that ground station. Each output buffer is
assigned to only a single ground station. In order to route
a packet to a particular ground station, the Routing Processor
must assign the packet to the software output queue list which
corresponds to the proper output buffer. This assignment is
made by fetching the packet's array address from the I/O port
and placing it into the proper queue list. Each Routing
Processor controls a unique group of output queue lists. A
packet is considered routed once its array address is placed
into one of the N queue lists.
The fourth and final function of the switch is to transmit
the routed packets to their final destinations. This job
belongs to the Output Processors. When an output buffer
empties due to a completed packet transmission, the buffer
requests processor service. Dedicated hardware pollers
sequentially scan their own group of output buffers in search
of empty buffers. When an empty buffer is found by a polling
circuit, the Output Processor served by this poller is in-
formed. The processor then accesses the output queue list
belonging to the empty buffer. The address of the oldest
packet waiting for transmission to this destination is fetched
from the queue list. Next, the processor finds a free data
path in its dedicated Output Switching Network. A link is
established between the shift register containing the packet
to be transferred and the empty buffer via the free data path.
Once this link is complete, the packet transfer is initiated
by the processor. Automatic hardware controls this serial
packet transfer. As soon as an output buffer is loaded, the
packet is automatically transmitted to the ground station by
hardware external to the packet switch. While the internal
hardware transfer takes place, the Output Processor updates
ELIST by placing the packet's array address into ELIST.
If an output queue list is empty when its associated
output buffer becomes empty, the Output Processor must place
the buffer in the "idle" state. An idle buffer will remain
idle until a new packet arrives for that buffer. The Routing
Processor will assign the new packet to the empty queue list.
Next, the Routing Processor must change the buffer's status
to indicate that the buffer is empty and requires service from
the Output Processor servicing that particular buffer.
4.2 Shared Resources
In the three processor design, contention problems be-
tween the different classes of processors are discussed in
depth. A workable solution is found and implemented for each
shared resource. In this multiple processor design, new con-
tention problems arise. Since there can be more than one
processor in each processor class, contention may occur between
processors of the same class. The contention problems of these
resources can be solved with design changes within the sub-
system they serve. These design changes may affect the archi-
tecture of that subsystem, but they do not affect the other
packet switch functions. Thus, the resource allocation schemes
required by these shared resources are discussed in the sec-
tions which describe each subsystem of the switch.
However, there are several resources which are shared
by two or more classes of processors. The design of these
"Multi-Access Resources" and the formation of their alloca-
tion schemes may affect the architecture of two or more packet
switch subsystems. Thus, these resources must be considered
before the entire architecture of the packet switch can be
designed. A review of the three processor design reveals
that there are three resources which will become Multi-Access
Resources in the multiple processor system. These resources
are:
1. The Shift Register Array
2. The Output Queue Lists
3. ELIST
Now identified, each of these resources must be investigated
and redesigned if necessary.
4.2.1 The Shift Register Array
Each Input (Output) Processor's switching network
may be linked to any location in the Shift Register Array.
However, no two Input (Output) Switching Networks will ever
access the same location simultaneously. This is due to the
fact that two or more Input (Output) Processors can never
fetch the same address for a particular array location from
ELIST (an Output Queue List) simultaneously as they service
packets. As mentioned earlier, the array is capable of re-
ceiving a new packet while concurrently transmitting the older
packet from the same location. Thus, no contention problems
will arise between the Input and Output processors even if
they access the same location concurrently. However, unless
only one Sorting Processor is allowed to access a single lo-
cation at one time, contention problems will arise. These
problems can be eliminated by the assignment of groups of
locations to one Sorting Processor. Since packets may be
stored in the array with an uneven distribution, the locations
assigned to each Sorting Processor should be interleaved.
This prevents the Sorting Processors from being forced to
carry disproportionate workloads due to uneven packet storage.
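The benefit of interleaving can be seen in a small Python comparison. The report does not specify the interleaving pattern, so the modulo rule below is an assumption, as are the function names, the array size, the processor count, and the set of occupied locations.

```python
from collections import Counter


def block_owner(location, array_size, processors):
    """Contiguous assignment: each processor owns one block of locations."""
    return location // (array_size // processors)


def interleaved_owner(location, processors):
    """Interleaved assignment: consecutive locations go to different processors."""
    return location % processors


ARRAY_SIZE, PROCESSORS = 16, 4
occupied = range(0, 6)  # packets clustered in the first six locations

block_load = Counter(block_owner(loc, ARRAY_SIZE, PROCESSORS) for loc in occupied)
interleaved_load = Counter(interleaved_owner(loc, PROCESSORS) for loc in occupied)
print(sorted(block_load.items()))        # [(0, 4), (1, 2)]
print(sorted(interleaved_load.items()))  # [(0, 2), (1, 2), (2, 1), (3, 1)]
```

With contiguous blocks, two processors carry all six packets while the other two sit idle; with interleaving, every processor receives work and the heaviest load drops from four packets to two.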
4.2.2 The Output Queue Lists
In the three processor design, the Output Queue
Lists are not completely free from contention problems. They
are shared between the Routing Processor and the Output Processor.
In the multiple processor system, the lists are needed
by the multiple processors in both the Routing and the Output
classes of processors. This requirement adds new contention
problems for these already contention-plagued resources. In
order to keep the amount of processor contention from increas-
ing, a restriction regarding processor access to these lists
must be made. Only one Routing Processor and only one Output
Processor will be allowed to share a list. This requirement
changes the workload of the Routing Processor used in the
multiple processor packet switch.
In the three processor design, the Routing Processor
services the entire Shift Register Array and all of the N
output queue lists. A packet in any Shift Register Array
location can require routing to any output buffer. A packet
is considered routed only after its array address is placed
into the proper queue list.
As described earlier, the Shift Register Array is now
divided into groups of locations, each of which is serviced
by a unique processor. This architecture, using the previous
Routing Processor structure, would require that all the
Routing Processors be allowed to access any of the N queue
lists. Since this requirement is in conflict with the pre-
vious design decision that limited one Routing Processor to
a list, a new architecture is needed.
The new architecture will force a division of the Routing
Processor's workload. This workload division requires the
implementation of a new class of processors which is needed
to carry out some of the tasks formerly assigned to the Routing
Processor. The new class of processor is the Sorting Processor.
Each Sorting Processor is assigned to a group of
shift register array locations. They are allowed to send
routing data to any Routing Processor. Each Routing Processor
is assigned to a unique group of Output Queue Lists. These
two classes of processors are linked by a contention-free
hardware interface. Details concerning the actual implementa-
tion of this interface and the new processors are presented
in section 4.4.
4.2.3 ELIST
The Empty Shift Register List (ELIST) is accessed
by every Input Processor and every Output Processor as well.
The previous ELIST structure cannot handle this requirement.
Since only one Input Processor and only one Output Processor
can access ELIST without interference, a new ELIST allocation
scheme is needed to provide the multiple processors with
contention-free access.
The first scheme considered is the division of ELIST
into smaller lists. Each list would then be assigned to one
Input Processor and to one Output Processor. However, in
order for this scheme to work properly, the workload must be
distributed evenly among the Input Processors and also among
the Output Processors. An example of how an uneven packet
distribution can cause this scheme to fail is easily illustrated.
Assume each user is transmitting packets at his maximum
allowable rate. Assume even further that most of the packets
sent are destined to only one or two users that are serviced
by the same Output Processor. After a short time, all but
one of the Input Processors will have depleted their supply
of array addresses. Only the Input Processor that shares the
same ELIST with the busy Output Processor will continue to
receive new array addresses. This case illustrates the need
to supply ELIST data to each Input Processor through the use
of a data distribution scheme. In addition, this
example clearly demonstrates that ELIST must remain as a
single resource that is shared through the use of an allocation
scheme. The idea of an ELIST data distribution system
is the foundation on which two ELIST implementations are
based. One design is based around an Elist Support Processor
while the other design uses only automatic hardware. These
two designs are discussed in detail below.
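The failure of the divided-ELIST scheme under skewed traffic can be reproduced with a short Python simulation. It is a sketch under simplifying assumptions that are not in the report: Input Processors receive packets in strict rotation, each packet is transmitted (and its address freed) immediately, and the function name and parameter values are hypothetical.

```python
def simulate_split_elist(pairs, addresses_per_list, packets, busy_output=0):
    """Return the free-address count of each sub-list after the traffic.

    Sub-list i is shared by Input Processor i and Output Processor i.
    Input Processors take turns receiving packets; every packet is
    destined to a user served by Output Processor `busy_output`.
    """
    free = [addresses_per_list] * pairs
    dropped = 0
    for n in range(packets):
        source = n % pairs          # Input Processor receiving this packet
        if free[source] == 0:
            dropped += 1            # no array address left for this input
            continue
        free[source] -= 1           # address taken from the input's sub-list
        free[busy_output] += 1      # freed address returns via the busy output
    return free, dropped


free, dropped = simulate_split_elist(pairs=4, addresses_per_list=8, packets=64)
print(free, dropped)  # [32, 0, 0, 0] 24
```

All addresses migrate to the sub-list shared with the busy Output Processor, and the other three Input Processors run out of array addresses even though the array itself is empty.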
4.2.3.1 Processor-Controlled ELIST
Since there are no constraints regulating the use of
support processors, the use of a processor to coordinate the
operation of the ELIST data distribution system is a logical
choice. The processor controlled ELIST system architecture
is presented in Figure 4.2.
The operation of this ELIST data distribution system is
straightforward. Each Output Processor sends its ELIST data
to a dedicated I/O port. These ports support the common
DAV/DAC handshaking protocol. A dedicated poller scans these
ports in search of a full port. When the poller finds a full
port (identified as full by its activated DAV flag), it signals
the Elist Support Processor. The support processor then
fetches the data and sets the DAC flag. The Elist Support
Processor then checks to see if any Input Processor-linked
I/O port requires data. Each of these I/O ports is assigned
to one Input Processor. Again, the DAV/DAC flag handshaking
is used, and a dedicated hardware poller scans
these ports. If the poller has located an empty port (signalled
by an activated DAC flag), the Elist Support Processor
sends the ELIST data directly to the empty port. If no I/O
port is empty, the data is stored into the ELIST RAM. If an
Input Processor's I/O port empties before the Elist Support
Processor has received data from an Output Processor, the
support processor fetches the data from the RAM and then sends it to
the empty port.
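The distribution policy described above (deliver directly to an empty port if one exists, otherwise store in the RAM, and refill an emptied port from the RAM) can be sketched in Python. The class and method names are hypothetical, `None` is used to mark an empty port in place of the DAV/DAC flags, and the pollers are reduced to a simple search.

```python
from collections import deque


class ElistSupportProcessor:
    """Illustrative model of the processor-controlled ELIST distribution."""

    def __init__(self, input_ports):
        self.ram = deque()                       # ELIST RAM (backlog of addresses)
        self.input_ports = [None] * input_ports  # None marks an empty port

    def _empty_port(self):
        for index, value in enumerate(self.input_ports):
            if value is None:
                return index
        return None

    def accept_from_output(self, address):
        """Handle one address fetched from a full Output Processor port."""
        port = self._empty_port()
        if port is None:
            self.ram.append(address)            # no empty port: store in RAM
        else:
            self.input_ports[port] = address    # send directly to the empty port

    def input_port_read(self, index):
        """An Input Processor takes its address; refill the port from RAM."""
        address = self.input_ports[index]
        self.input_ports[index] = self.ram.popleft() if self.ram else None
        return address


esp = ElistSupportProcessor(input_ports=2)
for addr in (10, 11, 12):
    esp.accept_from_output(addr)
print(esp.input_ports, list(esp.ram))  # [10, 11] [12]
```

Every address passes through the one support processor, which is why this arrangement bounds the throughput of the whole switch.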
Since this ELIST data distribution system is controlled
by a processor, it can serve the Input Processors and the
Output Processors only as fast as the Elist Support Processor
executes its task. The Elist Support Processor can support
any packet switch throughput up to 3 Mega-packets per second
(see Appendix). This ELIST structure is a throughput-limiting
function. Therefore, adding processors to the
other four classes will never increase the system throughput
beyond the upper bound of 3 Mega-packets per second. Thus,
this system is replaced by a hardware-controlled data distribution
system, which is the next topic of discussion.
Although the processor-controlled system is not used in
this particular architecture, the processor architecture, the
interface hardware and the software required for implementa-
tion are located in the Appendix. This material is presented
because the processor-controlled ELIST scheme is less complex
than the hardware-controlled ELIST and it can offer the user
some degree of flexibility in that the processor software can
be custom tailored. Thus, the processor-based ELIST is the
recommended implementation for packet switches operating below
3 Mega-packets per second.
4.2.3.2 Hardware-Controlled ELIST
Since hardware is faster than software,
a completely hardware-controlled ELIST will serve the packet
switch at the fastest rate possible. This design removes the
previous throughput limitations encountered in the Elist
Support Processor-based design.
ELIST interfaces to the Input Processors and the Output
Processors through input/output ports. A dedicated port is
assigned to each processor accessing the list. In order to
explain how the system services the two classes of processors
(Input and Output), the operation of the data storage func-
tion is described first.
Figure 4.3 contains the ELIST Data Input Port architecture.
When an Output Processor has a new array address for
ELIST, it checks the port's Data Accepted (DAC) flag. If the
previous data was fetched by the ELIST hardware, this flag
is set. A set DAC flag allows the Output Processor to load
its port with the new data. Once the data is loaded into the
port, the processor sets the Data Available (DAV) flag. If
the DAC flag is not set, the Output Processor must wait until
this flag gets set.
The ELIST storage hardware is controlled by a polling
circuit which is driven by a counter. All the ELIST Data
Input Port DAV flags are sent to the ELIST DAV MUX. The Out-
put of this MUX generates the FULL PORT signal which reflects
the status of the DAV flag selected. Selection of the DAV
flags is controlled by the value of the counter. The value
of this counter is also supplied to the ELIST Input Enable
Demux. The activated Demux output enables the handshaking
logic and the tri-stated port output of the addressed I/O
port.
If the addressed I/O port's DAV flag is set, the FULL
PORT signal becomes activated. This activated signal sets
the STORE DATA Flip-Flop. Once set, this flip-flop halts the
poller's counter. Simultaneously, the flip-flop activates
the one-shot that generates the active-low WRITE signal.
While the WRITE signal is activated, the two-port ELIST RAM
is enabled in the write mode. The data from the enabled I/O
port is then strobed into the RAM.
The ELIST RAM Structure is shown in Figure 4.4. The
ELIST Data Structure is presented in Figure 4.5. In this
data structure, both the read and write operations are per-
formed before the index pointers are updated. Both pointers
are updated by being incremented. Once the bottom of the
list is encountered, they roll over and return to the top of
the list.
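The pointer behaviour described above can be sketched in software. The Python model below is only an illustration: the class name `ElistRam`, the list size and the method names are assumptions and are not part of the hardware design. It shows pointers that are incremented after each access and that roll over at the bottom of the list.

```python
class ElistRam:
    """Toy model of the ELIST RAM with separate read and write index pointers."""

    def __init__(self, size):
        self.ram = [None] * size   # one entry per list location
        self.write_ptr = 0         # hypothetical stand-in for the write index pointer
        self.read_ptr = 0          # hypothetical stand-in for the read index pointer

    def write(self, sr_address):
        # The write is performed first, then the pointer is incremented.
        self.ram[self.write_ptr] = sr_address
        self.write_ptr = (self.write_ptr + 1) % len(self.ram)  # roll over at the bottom

    def read(self):
        # The read is performed first, then the pointer is incremented.
        value = self.ram[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.ram)
        return value


elist = ElistRam(size=4)
for addr in (7, 3, 9, 1):
    elist.write(addr)
print(elist.write_ptr)                    # the pointer has returned to the top
print([elist.read() for _ in range(4)])   # addresses come back in arrival order
```

The modulo operation stands in for the natural overflow of a hardware counter whose length matches the list size.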
Once the write operation is complete, the low-active
WRITE signal goes high. The leading edge of this low-to-high
transition fires the one-shot which activates the DATA
RECEIVED signal. This activated signal clears the DAV flag
and sets the DAC flag belonging to the enabled I/O port. The
clearing of the DAV flag clears the STORE DATA flip-flop. The
reset flip-flop activates the UPDATED signal, which increments
the counter which serves as the write index pointer (EPTRO).
In addition, the reset flip-flop enables the poller to restart.
Figure 4.6 contains the timing diagram and the significant
events for this entire operation.
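The storage sequence just described (poll a port, store its data, update the flags, advance the write pointer) can be mimicked by a short Python model. This is a sketch under stated assumptions, not the hardware itself: the names `InputPort` and `poll_once` are hypothetical, one-shots and flip-flops are collapsed into plain assignments, and it is assumed that loading a port clears its DAC flag.

```python
class InputPort:
    """One ELIST Data Input Port with its two handshaking flags."""

    def __init__(self):
        self.data = None
        self.dav = False   # Data Available: set by the Output Processor
        self.dac = True    # Data Accepted: set by the ELIST hardware

    def load(self, sr_address):
        """Output Processor side. Returns False if the processor must wait."""
        if not self.dac:
            return False
        self.data = sr_address
        self.dac = False   # assumption: loading new data clears DAC
        self.dav = True
        return True


def poll_once(ports, counter, ram, write_ptr):
    """One step of the poller. Returns the updated (counter, write_ptr)."""
    port = ports[counter]
    if port.dav:                                # FULL PORT is active
        ram[write_ptr] = port.data              # WRITE strobes the data into the RAM
        port.dav, port.dac = False, True        # DATA RECEIVED updates the flags
        write_ptr = (write_ptr + 1) % len(ram)  # UPDATED increments the write pointer
    counter = (counter + 1) % len(ports)        # the poller moves to the next port
    return counter, write_ptr


ports = [InputPort() for _ in range(3)]
ram, counter, wp = [None] * 8, 0, 0
ports[1].load(42)
ports[2].load(17)
for _ in range(3):
    counter, wp = poll_once(ports, counter, ram, wp)
print(ram[:wp])
```

In the hardware the poller is halted while the write takes place; in this sequential model that halt is implicit, because the counter only advances after the store has finished.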
The second function of the ELIST data distribution system
is to supply each Input Processor with the address data stored
in the ELIST when required. Figure 4.7 contains the ELIST Data
Output Port system which carries out this task. The primary
function of this system is to keep each ELIST Data Output Port
filled with valid data. If the ports are kept full, no Input
Processor will be forced to wait for data.
As with the ELIST Data Input Ports, each Data Output Port
is assigned to one processor. Each port has its own handshaking
flags. When an Input Processor needs data from the
ELIST, it checks the I/O port's DAV flag. If this flag is
set, valid data is held in the port and the Input Processor
fetches this data immediately. If the DAV flag is not set,
the processor must wait for service.
[Figure 4.4: ELIST RAM structure]
Fig. 4.5 ELIST Data Structure
[Figure 4.6: timing diagram of the ELIST data storage operation]
[Figure 4.7: ELIST Data Output Port system]
Once data is fetched from an I/O port, the Input Processor
accessing this port sets the DAC flag. Every I/O
port's DAC flag is sent to the ELIST DAC MUX. The output
of this MUX generates the EMPTY PORT signal. Selection of
the DAC flag is controlled by the counter which drives the
MUX. The value of this counter is also sent to the LOAD
Demux and to the DATA SENT Demux. The activated Demux outputs
enable the handshaking logic and the data loading circuitry
of the addressed I/O port.
If the selected I/O port's DAC flag is set, the EMPTY
PORT signal is activated. This signal then sets the SEND DATA
flip-flop. Once set, this flip-flop halts the poller and
activates the one-shot that produces the active-low LOAD signal.
While the LOAD signal is active, the data on the ELIST
Output Data Bus is strobed into the enabled I/O port. The
data on the ELIST Output Data Bus is supplied from the RAM
location selected by the read index pointer.
Once the data transfer has been finished, the active-low
LOAD signal goes high. The leading edge of this low-to-high
transition activates the one-shot that generates the DATA SENT
signal. The DATA SENT signal then clears the enabled port's
DAC flag and sets its DAV flag. The reset DAC flag clears
the SEND DATA flip-flop, enabling the poller to restart its
scanning. Simultaneously, the DATA SENT signal goes low,
resulting in the updating of the read index pointer. This
update increments the hardware counter which serves as the
read index pointer. A timing diagram of the complete ELIST
output function, along with the significant timing events,
is presented in Figure 4.8.
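The output side can be modelled in the same spirit. The function below is a hedged sketch: `refill_ports` and its argument names are hypothetical, one scan of all ports stands in for the free-running poller, and it is assumed that the list always holds a valid entry at the read pointer, since the text does not describe what happens when the list runs dry.

```python
def refill_ports(dac_flags, dav_flags, port_data, ram, read_ptr):
    """Scan every Data Output Port once and reload each empty one from the list.

    Returns the updated read index pointer.
    """
    for i in range(len(port_data)):
        if dac_flags[i]:                              # EMPTY PORT: data was accepted
            port_data[i] = ram[read_ptr]              # LOAD strobes data into the port
            dac_flags[i], dav_flags[i] = False, True  # DATA SENT updates the flags
            read_ptr = (read_ptr + 1) % len(ram)      # read pointer is incremented
    return read_ptr


ram = [10, 11, 12, 13]
dac = [True, False, True]      # ports 0 and 2 have been emptied by their processors
dav = [False, True, False]
data = [None, 99, None]
rp = refill_ports(dac, dav, data, ram, read_ptr=0)
print(data, rp)
```

Ports whose DAC flag is clear are skipped, which is how the hardware keeps every port full without overwriting data that has not yet been fetched.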
4.3 The Input System
The Input System consists of the Input Buffers, the
Input Processors, the Input Switching Networks and the Input
Polling Circuits. This system interfaces to the Shift Register
Array and the ELIST Data Distribution System. The architectural
organization of this system is presented below.
However, before the architecture can be designed and
explained, the contention problem related to the Input
Switching Network must be solved. As in the previous designs,
the Input Switching Network provides programmable data paths
from the input buffers to the Shift Register Array. In order
to provide the address of an available data path in the network,
the status of each path is monitored by the hardware in
the Data Path Busy Port. This port is accessed by the Input
Processor.
If a single Input Switching Network is used in the
multiple processor system, access to the status port must be
granted to only one Input Processor at a time. Since several
Input Processors will require access to this resource, a
resource allocation scheduling scheme is needed. This scheme
will require new hardware and additional software. The additional
software will reduce the throughput of the Input Processors.
Throughput may be reduced even further if the Input
Processors are forced to wait for the resource whenever it is
busy. This contention problem needed to be solved. The solution
implemented in this design eliminates contention completely
by allocating a dedicated Input Switching Network to
each Input Processor.
[Figure 4.8: timing diagram of the ELIST output function]
4.3.1 Architectural Workload Division
In the three processor design, the workload is
divided into three relatively independent tasks. This scheme
works quite well, in that each processor can carry out its
assigned task without interference from the other processors.
However, in the multiple processor design, the workload of
the packet switch must be sub-divided within the three func-
tions. Processors in the same class must share the workload
within the function assigned to that processor class. There-
fore, if the proper architecture is not implemented, a pro-
cessor may be faced with interference from the other processors
in its own class.
The processors controlling the Input System can be
organized using one of two techniques: Master/Slave Scheduling
or Separate Systems [5]. The Master/Slave Scheduling scheme
is organized such that one processor maintains the status of
all the "Slave Processors" and the uncompleted tasks. This
"Master Processor" schedules the work for each of the Slave
Processors.
The Separate Systems scheme is organized such that each
processor carries out its assigned tasks in parallel with the
other processors. The assignment of tasks for each processor
is fixed by the system architecture. There is no dynamic
allocation of processors to tasks, as in the Master/Slave
Scheduling System. In addition, each processor is assigned
dedicated memory and dedicated I/O devices.
These two schemes are the foundation on which two architectures
for the Input System are based. Each of the two
architectures is presented below. Also included are the
design considerations which led to the selection of the
Separate Systems scheme.
4.3.1.1 Master/Slave Scheduling
One possible implementation of the Input System
using the Master/Slave Scheduling scheme is presented in
Figure 4.9. This figure contains a block diagram of the
Input System Architecture A.
[Figure 4.9: block diagram of Input System Architecture A]
Input System Architecture A uses a hardware poller to
locate full input buffers. Once the poller finds a full buffer,
which is indicated via an Input Status Word (ISW), it
stops and signals the Job Scheduling Processor (Master Processor).
The Job Scheduling Processor inputs the address of
the full buffer from the poller. The Job Scheduling Processor
then updates the ISW to indicate "partial service" and restarts
the poller. Next, the Job Scheduling Processor fetches
the address generated by the priority encoder. This encoder
is driven by the Busy flip-flops that indicate the current
status of each Slave Processor. The encoder supplies the
address of the free Slave Processor which has the highest
assigned priority. Using this address, the Job Scheduling
Processor assigns the task of servicing the full buffer to
the free Slave Processor. This Slave Processor sets its Busy
flip-flop and begins the task of inputting the packet to the
Shift Register Array.
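The role of the priority encoder in this scheme can be illustrated with a few lines of Python. This is a minimal sketch, assuming that a lower processor index means a higher assigned priority; the function names `priority_encode` and `assign` are hypothetical, and the ISW update and the poller restart are left out.

```python
def priority_encode(busy):
    """Return the index of the highest-priority free Slave Processor, or None.

    Assumption: index 0 has the highest priority.
    """
    for index, is_busy in enumerate(busy):
        if not is_busy:
            return index
    return None


def assign(busy, full_buffer):
    """Job Scheduling Processor step: hand a full buffer to a free slave."""
    slave = priority_encode(busy)
    if slave is None:
        return None              # every Slave Processor is busy
    busy[slave] = True           # the chosen slave sets its Busy flip-flop
    return slave, full_buffer


busy = [True, False, False]
print(assign(busy, full_buffer=12))
print(busy)
```

Because the lowest-priority slaves are chosen last, they stay free under light load, which is the property the text relies on when it suggests running background functions on them.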
The main advantage of this scheme is that the workload
is shared by all the available Slave Processors, regardless
of the distribution of the incoming packets. Since the Slave
Processors are assigned to tasks (incoming packets) and not
to the input buffers, all the processors will be utilized
even if only one or two channels are heavily loaded. An additional
advantage of this system is that under lightly loaded
conditions, the low priority Slave Processors will be free.
These low priority processors could be programmed to execute
background functions. Service for the input buffers could
then be interrupt driven.
There exist two disadvantages in Architecture A. The
first disadvantage is a reliability problem. If the Job
Scheduling Processor fails, the entire packet switch becomes
inoperative. One possible solution to this problem is the
implementation of additional Job Scheduling Processors that
are assigned their own dedicated Slave Processors. Another
possible solution is to have a Slave Processor replace the
Job Scheduling Processor in the event of a failure. Both of
these schemes will add complexity to the hardware and/or
software.
The second disadvantage is the amount of hardware required
to allow any Slave Processor to serve any input buffer.
This architecture could require thousands of control lines
for just one control signal. An example of this problem is
given below for a typical system:
N = 100 users (100 input buffers required)
# of Slave Processors = 4 processors
# of Input Switching Network Data Paths = 10 paths/processor
# of DATA IN lines = 1 line/user/data path
(100 users) x (4 processors) x (10 paths/processor) x (1 line/user/
data path) = 4000 DATA IN lines.
As a result of this finding, a new multiprocessor architecture
for the Input System is proposed. The new architecture
is discussed below.
4.3.1.2 Separate Systems
The Input System Architecture is presented in
Figure 4.10. Each Input Processor controls a complete input
system. Each of these systems operates independently of the
others. The size of these systems is determined by the number
of input buffers assigned to each system. Once a group
of buffers is assigned to an Input Processor, it remains
fixed to that processor. Therefore, in this scheme,
buffers are assigned to processors, while the reverse is never
true.
[Figure 4.10: Input System Architecture]
In order to compare the hardware complexity of this
scheme to the complexity of the Master/Slave Scheduling scheme,
the previous example using the DATA IN lines will be continued:
N = 100 users (100 input buffers required)
# of Input Processors = 4 processors
# of Input Switching Network Data Paths = 10 paths/processor
# of DATA IN lines = 1 line/1 user/1 data path
(25 users/separate system/processor) x (4 separate systems) x (10
data paths/processor) x (1 DATA IN line/1 user/1 data path)
= 1000 DATA IN lines
The Master/Slave Scheduling scheme required 4000 DATA IN
lines. Since the DATA IN line is only one of two Input
Switching Network signals that require 1 line per 1 user per
1 data path, the Separate Systems scheme is clearly less complex.
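The two line counts can be checked with a few lines of arithmetic. The helper below is only an illustration; the function name `data_in_lines` and its parameter names are assumptions introduced here. It multiplies the number of users each processor must reach by the number of processors, the data paths per processor, and the lines per user per data path.

```python
def data_in_lines(users_per_processor, processors, paths_per_processor,
                  lines_per_user_path=1):
    """Total DATA IN lines needed by the Input Switching Networks."""
    return (users_per_processor * processors
            * paths_per_processor * lines_per_user_path)


# Master/Slave Scheduling: each of the 4 Slave Processors must reach all 100 buffers.
master_slave = data_in_lines(users_per_processor=100, processors=4,
                             paths_per_processor=10)

# Separate Systems: the 100 buffers are split into 4 fixed groups of 25.
separate = data_in_lines(users_per_processor=25, processors=4,
                         paths_per_processor=10)

print(master_slave, separate)   # 4000 1000
```

For a fixed number of users, the Separate Systems count does not grow with the number of processors, whereas the Master/Slave count grows in proportion to it.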
The one major drawback with this architecture is that
idle or lightly loaded processors cannot be assigned to
heavily loaded channels if those channels are under the control
of another processor. Thus, some Input Processors may
become heavily loaded while the other Input Processors remain
idle or under-utilized. However, this architecture is considered
to be the best compromise since it is not as complex
as Architecture A. Therefore, this is the architecture that
was chosen for the actual implementation.
Since this architecture merely divides the workload by
means of buffer assignments, no major hardware changes are
required. The Input Buffers, the Input Switching Networks
and the polling circuits are identical to those used in the
three processor design. Therefore, these system components
are not presented in this chapter (see section 3.1 for a review
of these components).
4.3.2 The Input Processors
The Instruction Execution Units and the Microprogram
Control Units for the Input Processors are the same as those
used in the three processor design (see section 3.2 for a
review). However, since ELIST has been redesigned to meet new
requirements, the Input Processors' Microprogram Word is different.
Figure 4.11 contains the IEU Control Fields in the
Microprogram Word. Figure 4.12 contains the MCU Control
Fields and the Jump Control Logic Function for the Input
Processor. Again, their functions are similar to those in
the three processor design as discussed in 3.2.4.
[Figure 4.11: IEU Control Fields in the Input Processor Microprogram Word]
Fig. 4.12 Input Processor MCU Control Fields
and Jump Control Logic Function
4.3.3 The Input Software Routine
The Input Service Routine is sense-loop driven. The
Input Processor remains in the loop until the Input Polling
Circuit locates a full input buffer. Once a full buffer is
found, the processor leaves the loop and fetches the address
of the buffer from the poller. Next, the Input Processor
fetches the address of a free data path in its dedicated Input
Switching Network. Simultaneously, the processor clears the
buffer's service request flag and restarts the poller. The
Input Processor then checks the DAV flag at its ELIST Data
Port. If this flag is not set, the processor loops until the
flag becomes set. When the flag is set, the Input Processor
fetches the address of an empty shift register from the port
and sets the DAC flag.
Using these three addresses, the Input Processor links
the full input buffer to the empty shift register via the
free data path. Once this link is established, the processor
initiates the data transfer and returns to the loop. A flow
chart of this software routine is presented in Figure 4.13.
A listing of this program is given in Figure 4.14.
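The control flow of the routine can be paraphrased in Python. The sketch below is not the microprogram: the `state` dictionary and its keys are hypothetical stand-ins for the polling port, the Data Path Busy Port and the ELIST Data Port, and it is assumed that a free data path always exists and that fetching from the ELIST port leaves it empty.

```python
def input_service_step(state):
    """One pass of the routine. Returns (buffer, path, shift_register) or None.

    A real Input Processor spins in its sense loop and in the DAV wait loop;
    this model checks both conditions up front so that it never blocks.
    """
    if state["full_buffer"] is None:      # no input buffer is requesting service
        return None
    if not state["elist_dav"]:            # no shift register address in the port yet
        return None
    buffer_addr = state["full_buffer"]    # address supplied by the poller
    path = state["free_paths"].pop(0)     # assumption: a free data path always exists
    state["full_buffer"] = None           # clear the request flag, restart the poller
    shift_register = state["elist_data"]  # fetch the empty shift register address
    state["elist_dav"] = False            # assumption: the fetch leaves the port empty
    state["elist_dac"] = True             # tell the ELIST hardware the data was taken
    return buffer_addr, path, shift_register


state = {"full_buffer": 5, "free_paths": [2, 7], "elist_dav": True,
         "elist_dac": False, "elist_data": 40}
link = input_service_step(state)
print(link)   # the three addresses used to link buffer, data path and shift register
```

The returned triple corresponds to the three addresses the processor needs before it starts the data transfer.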
Fig. 4.13 Input Service Routine Flowchart
INPUT: If IBSR-A = 0, JMP to INPUT
           *Is there an input buffer requesting service?
            NO: Loop @ INPUT.
       Input Polling Port → Q
           *YES: Input the buffer's address.
       Input Data Path Status Port Address → Address Latch (µP1-A);
       Reset Poller
       Data Path Busy Status Port → Scratch 1
           *Find a free data path, clear IBSR-A and restart
            the poller.
       ELIST Data Port Address → Address Latch (µP2-A)
WAIT:  If ELIST DAV = 0, JMP to WAIT
       ELIST Data Port → Scratch 2; Send a DAC
           *When the data becomes available, input the shift
            register number from the ELIST port.
       Scratch 1 + Data Path Latch A Base Address → Address Latch (µP3-A)
       Q → Data Path MUX Select Latch A (D)
           *Link the buffer to the data path.
       Scratch 1 + Data Path Latch B Base Address → Address Latch (µP4-A)
       Scratch 2 → Data Path DeMUX Select Latch B (D)
           *Link the empty shift register to the data path.
       Data Transmit Control Address → Address Latch
       Scratch 1 → Data Bus Decoder (M1-A); JMP to INPUT
           *Start data transfer and return to sense loop.
Figure 4.14 Input Service Routine
4.4 The Routing System
The Routing System consists of the Shift Register Array,
the Sorting Processors, the Shift Register Array polling circuit,
the Routing Processors, and the Packet Routing Data I/O
ports. The Routing System interfaces to the Input System,
the Output Queue Lists and the Output System. The architectural
organization of this system is presented below.
4.4.1 Architectural Workload Division
As discussed in section 4.2, the Routing function
as defined in the three processor architecture can no longer
meet the requirements of the multiple processor design. The
principal requirement is that each Output Queue List be
accessed by only one Routing Processor. This constraint is
satisfied by dividing the Routing function into two smaller
tasks. Each task, the sorting of packets and the routing of
packets, is assigned to one class of processors. The Packet
Sorting Processor is assigned the task of sorting each packet
in the array. The sorting function requires that a packet's
destination and its location in the array be sent to the
proper Packet Routing Processor. The Packet Routing Processor
then uses this information to route the packet by placing the
packet's array address into the proper output queue list.
The system architecture for a single Packet Sorting Processor
is presented in Figure 4.15. Figure 4.16 illustrates the
system architecture for a single Routing Processor.
Implementation of this scheme does not require the redesign
of the Shift Register Array, the Shift Register Array
Polling Circuit, or the Output Queue Lists. Therefore,
these components are not discussed in this chapter (see
section 3.1 for a review of this hardware). However, the
processors and their software routines are different from
those in the previous design. In addition, a new component,
the Packet Routing Data Port, is implemented in this architecture.
Therefore, these topics are discussed. The Packet
Routing Data Ports are presented below as the first topic.
[Figure 4.15: system architecture for a single Packet Sorting Processor]
[Figure 4.16: system architecture for a single Routing Processor]
4.4.2 Packet Routing Data Ports
The Packet Routing Data Ports provide the necessary
interface between the Packet Sorting Processor and the Packet
Routing Processor. Each Packet Routing Processor is assigned
its own Packet Routing Data Port. Any Packet Sorting Processor
can send data to any Packet Routing Data Port. Associated
with every Packet Routing Data Port is a dedicated RAM
which is external to the Routing Processor it serves. The
function of these ports is to accept the routing information
from the Packet Sorting Processors, to store the routing information
in the external RAM and to provide this data to the
Packet Routing Processor when needed.
There exists an alternative to using the external RAM
for storage, but it is considered too costly to implement.
The alternate scheme requires that whenever a Sorting Processor
places data into a Routing Processor's data port, the Routing
Processor is to be notified by an interrupt. This interrupt
signal is activated by the Sorting Processor. The Routing
Processor responds to the interrupt by suspending the Packet
Routing Routine in order to fetch the data from the full port.
Once fetched, this data is stored in an internal software queue.
This scheme is considered too costly because:
1) Additional processor hardware will be required to
handle the interrupts.
2) The additional required software overhead will
increase execution times and reduce throughput.
These are the reasons the hardware stack scheme is implemented.
The operation of a Packet Routing Data Port can be best
explained by tracing the procedure that a Packet Sorting Processor
follows to send data to a port. Once a Packet Sorting
Processor determines which port is to receive the data, it
checks the associated DAC flag. If this flag is not set, the
processor waits for it to be set. When the flag is set, the
Packet Sorting Processor sends the packet's destination information
to the Packet Destination Data Latch. Next, the
Packet Sorting Processor sends the packet's shift register
array address to the Packet Array Address Data Latch. Once
both latches are loaded, the Packet Sorting Processor sets
the DAV flag which automatically clears the DAC flag. A
single Packet Routing Data Port is illustrated in Figure 4.17.
Every DAV flag is scanned by a hardware polling circuit.
This polling circuit is presented in Figure 4.18. When an
activated DAV flag is found by the poller, the STORE DATA
flip-flop is set. The set flip-flop halts the poller and
activates the one-shot that generates the low-active WRITE
signal. The outputs of the two data latches associated with
the active DAV flag are enabled. The enabled output of the
Packet Destination Data Latch is sent to the Packet Destination
Data RAM and the output of the Packet Array Address Data
Latch is sent to the Packet Array Address Data RAM. Both of
these RAM's are enabled in the write mode by the activated
WRITE signal. The WRITE signal is held activated until the
data is strobed into the RAM's. This data is stored into
the two RAM's at locations which have the same address since
both RAM's share a single index pointer. The architecture
of the Packet Routing Data RAM's is presented in Figure 4.19.
Once the write operation is complete, the active-low
WRITE signal goes high. The low-to-high transition of this
signal activates the one-shot that generates the Flag UPDATE
signal. The Flag UPDATE signal clears the DAV flag and sets
the DAC flag. The clearing of the DAV flag resets the STORE
DATA flip-flop. The reset flip-flop activates the INDEX UPDATE
signal, which increments the hardware counter that serves
as an index pointer. The data structure for the Packet
Routing Data List is given in Figure 4.20. Both the read and
the write operations take place before the index pointers are
incremented. When the two index pointers are equal, the list
is assumed to be empty.
Fig. 4.17 A Single Packet Routing Data Port
[Figure 4.18: Packet Routing Data Port polling circuit]
[Figure 4.19: architecture of the Packet Routing Data RAM's]
When the Packet Routing Processor needs to fetch data
from the RAM's, it first selects the Packet Destination Data
RAM and then fetches the data. Next, the Packet Routing
Processor selects the Packet Array Address RAM and fetches
the data. The processor increments the index pointer once
both read operations are finished. Two-port RAM's are used
to allow a simultaneous read by the processor and write by
the hardware. Since the list is assumed to be empty when the
index pointers are equal, a read and a write operation will
never occur at the same location.
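The two-RAM list with its shared write pointer and the "equal pointers means empty" rule can be modelled as follows. This is a minimal sketch: the class name `RoutingDataList` and its method names are hypothetical, the pointers are assumed to wrap around at the end of the RAM, and the case of the hardware overrunning unread entries, which the text does not discuss, is not handled.

```python
class RoutingDataList:
    """Toy model of the Packet Routing Data List held in two parallel RAM's."""

    def __init__(self, size):
        self.dest_ram = [None] * size   # Packet Destination Data RAM
        self.addr_ram = [None] * size   # Packet Array Address Data RAM
        self.write_ptr = 0              # incremented by the port hardware
        self.read_ptr = 0               # incremented by the Packet Routing Processor

    def hardware_write(self, destination, array_address):
        # Both RAM's are written at the same address, then the pointer moves.
        self.dest_ram[self.write_ptr] = destination
        self.addr_ram[self.write_ptr] = array_address
        self.write_ptr = (self.write_ptr + 1) % len(self.dest_ram)

    def is_empty(self):
        return self.read_ptr == self.write_ptr

    def processor_read(self):
        if self.is_empty():
            return None                 # the processor would stay in its sense loop
        destination = self.dest_ram[self.read_ptr]    # first read
        array_address = self.addr_ram[self.read_ptr]  # second read
        self.read_ptr = (self.read_ptr + 1) % len(self.dest_ram)
        return destination, array_address


routing_list = RoutingDataList(size=8)
routing_list.hardware_write(destination=3, array_address=21)
print(routing_list.processor_read())
print(routing_list.is_empty())
```

Since the processor only reads when the pointers differ, its read location is never the location the hardware is about to write, which is the property the text uses to justify simultaneous access to the two-port RAM's.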
Fig. 4.20 Packet Routing Data List Data Structure
4.4.3 The Packet Sorting Processors
The Instruction Execution Units and the Microprogram
Control Units of the Packet Sorting Processors are similar to
those used in the three processor design for the Routing Processor
(see section 3.2). The IEU control fields in the Microprogram
Word for the Sorting Processors are presented in
Figure 4.21. Figure 4.22 contains the MCU control fields and
the Jump Control Logic Function for the Packet Sorting Processors.
[Figure 4.21: IEU control fields in the Packet Sorting Processor Microprogram Word]
Fig. 4.22 Packet Sorting Processor MCU Control
Fields and Jump Control Logic Function
4.4.4 The Packet Sorting Service Routine
The Packet Sorting Service Routine is sense-loop
driven. While the Shift Register Array Polling Circuit
searches for unserviced packets, the Packet Sorting Processor
loops on the test bit. Once an unserviced packet is found by
the polling circuit, the Packet Sorting Processor exits from
the loop. The processor fetches the address of the packet
from the halted poller. The packet's syndrome is fetched
and sent as an address to the Syndrome Decoder ROM. Concurrently,
the packet's service request flag is cleared and
the poller is restarted. The ROM output is fetched and
exclusive-ORed with the fetched packet header. The corrected
header is stored back into the array. Using the corrected
header information, the Packet Sorting Processor
determines the packet's destination. The destination information
is then used to determine which Packet Routing Processor
is to receive the packet's routing data. This is accomplished
by sending the destination data to the Sorting Processor's
Address Decoder. This decoder will generate the address of
the Packet Routing Data Port associated with the destination
of the sorted packet. Since a Routing Processor may route
packets destined for different ground stations, the different
destination codes of these packets must generate the address
of this Routing Processor's port. Since the different codes
will enable different address decoder lines, an encoding
scheme is needed. Wire-ANDing the different address lines
that must enable the same port will provide the system with
the single port address lines needed. The only constraint
associated with this scheme is the requirement that the address
decoders have low-active, open-collector outputs.
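The wire-AND encoding can be illustrated with a small logic model. The sketch below is an assumption-laden example, not the actual decoder wiring: `decoder_outputs`, `port_select_lines` and the mapping `port_of_code` are hypothetical names, and the mapping of four destination codes onto two ports is invented for the illustration. A logical 0 means "active", matching the low-active outputs required by the text.

```python
def decoder_outputs(code, n_lines):
    """Active-low one-of-n decoder: only the selected line is 0."""
    return [0 if line == code else 1 for line in range(n_lines)]


def port_select_lines(code, n_lines, port_of_code):
    """Wire-AND the decoder lines that belong to the same port.

    Returns a dict of active-low port select signals.
    """
    outputs = decoder_outputs(code, n_lines)
    selects = {}
    for line, port in port_of_code.items():
        # Tying open-collector outputs together yields the AND of their levels:
        # the shared wire is pulled low as soon as any one of them goes low.
        selects[port] = selects.get(port, 1) & outputs[line]
    return selects


# Hypothetical assignment: codes 0 and 1 belong to port "A", codes 2 and 3 to "B".
port_of_code = {0: "A", 1: "A", 2: "B", 3: "B"}
print(port_select_lines(1, 4, port_of_code))
print(port_select_lines(2, 4, port_of_code))
```

With active-low signals, the AND of the tied lines behaves as an OR of the "selected" conditions, so any destination code assigned to a Routing Processor enables that processor's port.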
Once the proper Packet Routing Data Port is addressed,
the Sorting Processor checks to determine if the port is
empty. If the port still contains valid data, the Packet
Sorting Processor waits for the port to be emptied by the
automatic port hardware. When the port is empty, the Packet
Sorting Processor first sends the packet's destination data
to the port. The processor then sends the packet's array
address, sets the port's DAV flag and returns to the sense
loop. Figure 4.23 contains the flow chart for this routine.
A listing of this program is supplied in Figure 4.24.
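The data manipulation performed by the routine (syndrome lookup, header correction, destination extraction and port selection) can be sketched as follows. Everything concrete in this example is an assumption made for illustration: the function name `sort_packet`, the 8-bit header, the contents of the decoder ROM, the 2-bit destination mask and the port mapping are all hypothetical, and the polling and handshaking steps are omitted.

```python
def sort_packet(header, syndrome, decoder_rom, destination_mask, port_of_destination):
    """Correct a packet header and pick the Packet Routing Data Port for it."""
    error_word = decoder_rom[syndrome]          # ROM is addressed by the syndrome
    corrected = header ^ error_word             # exclusive-OR flips the bits in error
    destination = corrected & destination_mask  # isolate the destination field
    port = port_of_destination[destination]     # several codes may share one port
    return corrected, destination, port


# Hypothetical ROM contents: syndrome 0 means "no error", syndrome 3 flags bit 2.
decoder_rom = {0: 0b00000000, 3: 0b00000100}
port_of_destination = {0: 0, 1: 0, 2: 1, 3: 1}

corrected, destination, port = sort_packet(
    header=0b10110110, syndrome=3, decoder_rom=decoder_rom,
    destination_mask=0b00000011, port_of_destination=port_of_destination)
print(bin(corrected), destination, port)
```

The table lookup plays the part of the Syndrome Decoder ROM: each syndrome value selects an error word whose set bits mark the header bits to be inverted.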
Fig. 4.23 Packet Sorting Service Routine Flowchart
START: If NEW-D = 0, JMP to START
           *Is there a shift register requiring service?
            NO: Loop @ START.
       SRS Polling Port → Scratch 1
           *YES: Input the address of the shift register.
       Syndrome Generator Base Address + Scratch 1 → Address Latch (µP1-D)
       Syndrome (R) → Decoder ROM Address Latch; Reset Poller
           *Fetch header syndrome and send it to the Decoder ROM.
            Clear NEW-D and restart the poller.
       Decoder ROM Address → Address Latch (µP2-D)
       (Decoder ROM) @ Syndrome (R) → Q
           *Fetch error word from ROM.
       Header Base Address + Scratch 1 → Address Latch (µP3-D)
       ALU EXOR Q → Scratch 2, Header Port (R)
           *Correct the header. Store it into the S.R. Array and
            into Scratch 2.
       Scratch 2 AND Destination Mask → Q
           *Determine packet destination.
       Q + Packet Routing Processor Base Address → Address Latch (µP4-D)
LOOP:  If DAC-D = 0, JMP to LOOP
       Q → selected Packet Routing Destination Data Port
       Scratch 1 → selected Packet Routing Shift Register A Data
       Port; set DAV flag; JMP to START.
           *Select the proper Packet Routing Processor's Data
            Port. Send the packet's destination data. Then
            send the packet's S.R. array address. Set the port's
            DAV flag and return to the top of the program.
Figure 4.24 Packet Sorting Service Routine
4.4.5 The Packet Routing Processors
The Microprogram Control Units of the Packet Routing
Processors are similar to the one used in the three processor
design for the Routing Processor (see section 3.2). However,
the Instruction Execution Units (IEU) of the Packet Routing
Processors are redesigned to handle the Packet Routing Data
Ports. Since the Packet Routing Processors need no polling
circuits, the Direct Data (DB) bus is used to supply the processors
with the Packet Routing Data. This scheme saves
execution cycles since the processors are not required to
generate the addresses of the external data RAM's. A single
control signal from the processor's microprogram word controls
the DB SOURCE MUX, which selects either the Packet Destination
Data RAM or the Packet Array Address Data RAM. The output
from the MUX is tri-stated because the DB data bus is internally
shared by the output of the ALU's register file. The
redesigned IEU used by the Packet Routing Processors is displayed
in Figure 4.25. The IEU control fields in the Microprogram
Word for the Packet Routing Processors are given in
Figure 4.26. The MCU control fields in the Microprogram Word
and the Jump Control Logic Function are presented in
Figure 4.27.
4.4.6 The Packet Routing Service Routine
The Packet Routing Service Routine is sense-loop
driven. The Packet Routing Processor loops, testing the
status bit which informs the rrocessor when packet routing
data is available. When a packet's routing data is available,
the Packet Routing Processor leaves the loop. The processor
then fetches the packet's destination information. Next, the
packet's array address is fetched. Using the destination data,
the Packet Routing Processor selects the proper output queue
list. Concurrently, the Processor's index pointer for the
Packet Routing Data List is incremented. The packet's array
address is loaded into the Queue List Data Port by the pro-
cessor. The Processor then requests access to the queue list.
Requests for access are generated until the Packet Routing
Processor is allowed to access the queue list. Once access
is granted, the hardware automatically strobes the array
148
Z,
A" at
FAA
Fig. 4.25 Packet Routing Processor IFU
149
d 7
war ;i
iN
i tu
r
cc
H
w
M
h
.t
a
n
ICW ^
a
i
W	 1
M	 a a1	 N 1'
g ^
W
M
M
^	 z
co
^	 2
to
1
t.
^ ¢ N
^ ^ H
2h	 O
ee
r
0
W ^
M
1
1^• a
N Q Z
~ o ^
^ OH
N
Q O
. a
M•
E R
Ili a ^, 8'a
Id
u
w ^ ^
i H4 ^
FA
er
von c^
J w w
y
Y
Q H .+ u
OAO
do
II ^ q4 er
I,a c-i	 .-1 V4 .. w
150
Fig. 4.27 Packet Routing Processor MCU Control Fields
and Jump Control Logic Function
address data from the port into the queue list RAM. Meanwhile,
the Packet Routing Processor checks the status of the queue
list's corresponding output buffer. If the buffer is in the
Idle state, the processor updates the buffer's status to the
Empty state, releases the queue list and returns to the sense
loop. However, if the buffer is not in the Idle state, the
processor simply releases the queue list and returns to the
loop. The flow chart for this software routine is shown in
Figure 4.28. A listing of this program is given in Figure 4.29.
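The decision made at the end of this routine can be modelled in software. The following Python sketch is a behavioural illustration only and is not the microprogram of Figure 4.29. The names `OutputChannel` and `route_packet`, and the use of a `deque` for the queue list, are assumptions made for the example; the hardware arbitration loop for queue list access is not modelled.

```python
from collections import deque
from dataclasses import dataclass, field

IDLE, EMPTY, BUSY = "IDLE", "EMPTY", "BUSY"  # output status word (OSW) states


@dataclass
class OutputChannel:
    """One output buffer together with its output queue list (hypothetical model)."""
    osw: str = IDLE
    queue_list: deque = field(default_factory=deque)


def route_packet(channels, destination, array_address):
    """Enqueue a packet's array address on the destination's queue list.

    The OSW is moved from IDLE to EMPTY only when the buffer was idle;
    in every other state it is left unchanged.
    """
    channel = channels[destination]
    channel.queue_list.append(array_address)  # port contents stored in the list
    if channel.osw == IDLE:
        channel.osw = EMPTY  # buffer now needs service from an Output Processor
    return channel.osw


channels = [OutputChannel() for _ in range(4)]
first = route_packet(channels, destination=2, array_address=17)
channels[2].osw = BUSY  # pretend an Output Processor has started a transfer
second = route_packet(channels, destination=2, array_address=23)
```

In this sketch the first call finds the buffer idle and marks it Empty, while the second call only appends to the queue list because the buffer is already being serviced.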
4.5 The Output System
The Output System consists of the Output Buffers, the
Output Processors, the Output Switching Networks, and the Out-
put Polling Circuits. Interfacing to this system are the
Output Queue Lists, the Shift Register array and the ELIST
Data Distribution System. The architectural organization of
this system is presented below.
4.5.1 Architectural Workload Division
The two major system constraints that influence the
architectural organization of this system are:
1) Only one Output Processor must control an output
buffer. Each output buffer must be assigned to only
one Output Processor in order to eliminate resource
contention.
2) Only one Output Processor can have access to an out-
put queue list.
Fig. 4.28 Packet Routing Service Routine Flowchart
TEST:     If ROUTE-B = 1, JMP to TEST
              *Are there any packets requesting routing?
               NO: Loop @ TEST
          Destination Data RAM → Scratch 1
          Shift Register Address RAM → Scratch 2
              *YES: Input the packet's destination and array address.
          Scratch 1 + Output Queue List Base Address → Address Latch (µp1-B);
          update packet data pointer
              *Select the Output Queue List and OSW of the destination
               buffer.
          Scratch 2 → Output Queue List Data Port (N)
              *Send the packet's array address to the Output Queue List
               Data Port.
REQUEST:  Request Queue List (N)
              *Request access to the Output Queue List selected. If access
               is granted, the data from the Port is automatically stored.
          If STATUS-B = 1, JMP to REQUEST
              *If access is not granted, Loop @ REQUEST. Proceed otherwise.
          If OSW = NOT IDLE, JMP to END
              *Is the output buffer idle?
          Set OSW=EMPTY; Release Output Queue List (N); JMP to TEST
              *YES: Update OSW, release Queue List and return to the top
               of the routine.
END:      Release Output Queue List (N); JMP to TEST
              *NO: Release queue list and return to the top of the routine.
Fig. 4.29 Packet Routing Service Routine
The Separate Systems Scheme as discussed in section 4.3.1 is
considered the best technique to use in organizing the Output
System in order to fulfill the above requirements. The system
architecture of a single Output Processor in the Separate
System Scheme appears in Figure 4.30. Each output processor
is assigned to a fixed number of output buffers. In addition,
each processor is assigned a dedicated Output Switching
Network, a dedicated Output Polling Circuit and is allowed
access to the Output Queue Lists that correspond to the
assigned output buffers. Since the implementation of this
architecture did not require the redesigning of the Output
Buffers, the Output Polling Circuits or the Output Switching
Networks, these hardware blocks are not discussed in detail
in this chapter (see section 3.1 for a review). The Output
Processors and their software are discussed below.
4.5.2 The Output Processors
Both the Instruction Execution Units and the Micro-
program Control Units used by the Output Processors are
similar to those used by the Output Processors in the three
processor architecture (see section 3.2). Shown in Figure
4.31 are the IEU control fields of the Output Processors'
Microprogram Word. Figure 4.32 displays the MCU control
fields of the Microprogram Word and also the Jump Control
Logic Function for this class of processor.
4.5.3 The Output Service Routine
Like all the software routines, the Output Service
Routine is also sense-loop driven. The Output Processor
Fig. 4.32 Output Processor MCU Control Fields
and Jump Control Logic Function
leaves the loop once its polling circuit locates an empty output
buffer. After leaving the loop, the processor fetches the
address of the buffer from the halted poller. Using this
information, the Output Processor selects the buffer's corre-
sponding output queue list. A request for access to this
list is generated by the processor until access is granted.
When access is granted, the Output Processor determines if
the queue list is empty. If the processor finds the list
empty, the processor changes the buffer's status from the
Empty state to the Idle state. Concurrently, the processor
releases the queue list, restarts the poller and returns to
the loop.
However, if the queue list accessed is not empty, the
Output Processor fetches the address of the packet to be
transmitted. Simultaneously, the output buffer's status is
changed from the Empty state to the Busy state, the queue
list is released and the poller is restarted. After the Out-
put Processor has completed all these tasks, it finds a free
data path in its dedicated Output Switching Network. This
data path is linked to the shift register containing the
packet to be transmitted. The Output Processor then links
the empty buffer to the data path. Once the path is complete,
the processor initiates the packet's transfer into the output
buffer. While this transfer is taking place, the Output
Processor checks the status of its ELIST Data Distribution
I/O port. If the Data Accepted (DAC) flag is not set, the
processor loops until it becomes set. Once the Output
Processor finds the flag set, it sends the array address of
the freed shift register. After loading the I/O port, the
processor sets the port's DAV flag and returns to the sense
loop. Figure 4.33 contains the flow chart for this routine
and the listing of this program appears in Figure 4.34.
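The two branches of this routine (list empty, list not empty) can be illustrated with a small behavioural model. The Python sketch below is not the microprogram of Figure 4.34; the function name `service_output_buffer`, the dictionary used to hold the status word and the `deque` objects standing in for the queue list and ELIST are all assumptions made for the example. Path selection, the transfer itself and the DAC/DAV handshake are hardware actions and are not modelled.

```python
from collections import deque

IDLE, EMPTY, BUSY = "IDLE", "EMPTY", "BUSY"  # output status word (OSW) states


def service_output_buffer(osw, queue_list, elist):
    """Service one buffer that the poller found in the EMPTY state.

    osw        -- dict holding the buffer's status under the key "state"
    queue_list -- deque of array addresses awaiting transmission
    elist      -- deque of free shift register addresses
    Returns the array address transmitted, or None if the list was empty.
    """
    if not queue_list:
        osw["state"] = IDLE  # nothing to send: Empty -> Idle
        return None
    array_address = queue_list.popleft()
    osw["state"] = BUSY      # packet transfer started: Empty -> Busy
    elist.append(array_address)  # freed shift register returned to ELIST
    return array_address


osw = {"state": EMPTY}
queue_list = deque([17, 23])  # addresses queued by the routing stage
elist = deque()
sent = service_output_buffer(osw, queue_list, elist)
state_after_send = osw["state"]

idle_osw = {"state": EMPTY}
nothing = service_output_buffer(idle_osw, deque(), elist)
```

The first call transmits the oldest queued address and marks the buffer Busy; the second call finds an empty list and returns the buffer to the Idle state.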
Fig. 4.33 Output Service Routine Flowchart
OUTPUT:   If SERVICE-C = 0, JMP to OUTPUT
              *Is there an output buffer requesting service?
               NO: Loop @ OUTPUT
          Output Polling Port → Q
              *YES: Input the address of the buffer.
          Q + Queue List Base Address → Address Latch (µp1-C)
REQUEST:  Request Output Queue List (N)
              *Select the buffer's Output Queue List and OSW. Then
               request access.
          If STATUS-C = 1, JMP to REQUEST
              *Was access granted?
               NO: Request access again.
          If EMPTY-C = 0, JMP to IDLE
              *YES: Determine if the list is empty.
               List Empty: Branch to IDLE
          [Output Queue List (N)] @ OPTR (N) → Scratch 1; Set
          OSW=BUSY; Release Output Queue List; Reset Poller
              *LIST NOT EMPTY: Input the SAA which contains the
               packet to be transmitted. Then update the OSW, restart
               the poller and release the queue list.
          Output Path Status Port Address → Address Latch (µp2-C)
          Data Path Busy Status Port → Scratch 2
              *Find a free data path.
          Scratch 2 + Data Path Latch A Base Address → Address Latch (µp3-C)
          Scratch 1 → Data Path MUX select Latch A(D)
              *Link the shift register to the data path.
          Scratch 2 + Data Path Latch B Base Address → Address Latch (µp4-C)
          Q → Data Path DeMUX select Latch B(D)
              *Link the output buffer to the data path.
          Data Path Transmit Control Base Address → Address Latch
          Scratch 2 → Data Bus Decoder (M1-C)
              *START Packet transfer.
          ELIST Data Port Address → Address Latch (µp5-C)
LOOP:     If DAC-C = 0, JMP to LOOP
          Scratch 1 → ELIST Data Port
              *Send the empty S.R.# to the ELIST data port when the
               port is empty.
          Send a DAV; JMP to OUTPUT
              *Send a DAV to the port and return to the top of the
               program.
IDLE:     Set OSW=IDLE; Release Output Queue List; Reset Poller;
          JMP to OUTPUT
              *Update OSW, release queue list, restart poller and
               return to the top of the program.
Fig. 4.34 Output Service Routine
5.0 EVALUATION AND THROUGHPUT ANALYSIS
The evaluations of the two packet switch architectures
are presented in this chapter. The evaluation of the packet
switch's performance is in terms of throughput. This evalua-
tion is based on the software execution times. In the multiple
processor architecture, additional parameters affect the sys-
tem throughput. Therefore, equations relating the number of
processors and the number of users to the system throughput
are presented.
5.1 Performance Evaluation
In order to compute the maximum system throughput, two
assumptions must be made. Both assumptions hold true for the
two architectures. The first assumption is that the system
is heavily loaded such that all output queues contain at least
one packet awaiting transmission. The second assumption arises
from the fact that processors never wait for internal hardware
and that each system is virtually free from resource conten-
tion. Thus, each processor is assumed to be busy 100% of the
time under heavily loaded conditions. Therefore, a processor
can process one packet in the amount of time required to exe-
cute the assigned software routine completely without inter-
ruption. Using these assumptions, an estimation of throughput
for each multiprocessor architecture is presented below.
5.1.1 Throughput Estimation for the Three Processor System
In order to estimate the system throughput, equations
and relationships are developed. In these calculations, system
parameters are introduced. These parameters are:
1) $t_{P1}$ = Input Service Routine execution time
2) $t_{P2}$ = Routing Service Routine execution time
3) $t_{P3}$ = Output Service Routine execution time
4) $R$ = Bit Rate per User
5) $N$ = Number of Users
6) $B$ = Number of Bits per Packet
7) $F_P$ = System Throughput in Packets per Second
8) $F_B$ = System Throughput in Bits per Second
A processor can process one packet in the amount of time
required to execute the assigned software routine. Since
each packet must be serviced by all three routines, the pro-
cessor with the longest execution time will determine the
maximum system throughput. The software execution time for
each processor is listed in Table 5.1. A processor clock
cycle of 120 nanoseconds is assumed. Table 5.1 shows the
number of instruction cycles required and the time taken.
Some routines have several execution times listed. The
different values illustrate the effects of resource
contention, the state of the output queue lists and
the state of the output buffers.
Routine                                          Cycles   Time (µs)
Normal Operation (No Memory Contention):
  Input Service Routine                            11       1.32
  Output Service Routine
    (a) Transmit Packet                            16       1.92
    (b) Empty Queue                                 7       0.84
  Packet Routing Service Routine
    (a) Enqueue Packet                             15       1.80
    (b) Enqueue Packet and Update OSW              15       1.80
Worst Case Due to Memory Contention:
  Input Service Routine                            11       1.32
  Output Service Routine
    (a) Empty Queue                                 0
    (b) Transmit Packet                            18       2.16
  Packet Routing Service Routine
    (a) Enqueue Packet (Default)                   19       2.28
    (b) Enqueue Packet and Update OSW (Default)    19       2.28
    (c) Enqueue Packet                             17       2.04
    (d) Enqueue Packet and Update OSW              17       2.04
Table 5.1 Software Execution Times for the Three Processor
System
As stated earlier, the packet switch's maximum throughput
is achieved when the processors are busy 100% of the time and
when no output queue lists are empty. Therefore, in order to
determine the maximum throughput, the slowest execution time
must be selected from one of the following values:
1) The execution time for the Input Service Routine
under normal operating conditions.
2) The execution time for the Packet Routing Routine
when it enqueues a packet under normal operating
conditions.
3) The execution time for the Output Service Routine
when it transmits a packet under normal operating
conditions.
Selecting and comparing the above values from Table 5.1,
the execution time for the Output Processor is found to be
the largest of the three values. Therefore, the three processor
system has a maximum throughput which is limited by:
$F_P \le 1/t_{P3}$    (5.1)
The system throughput in terms of bit rate is found by
multiplying the maximum packet throughput by the packet bit
length:
$F_B = B \times F_P \le B/t_{P3}$    (5.2)
The system throughput in terms of bit rate is related to the
number of users by:
$F_B = N \times R$    (5.3)
This can be expressed as:
$N \times R \le B/t_{P3}$    (5.4)
or,
$t_{P3} \le B/(N \times R)$    (5.5)
In a heavily loaded system free from resource contention,
the Output Processor services one packet every 1.92 microseconds.
Therefore, the maximum packet throughput is:
$F_P \le 1/(1.92\ \mu\text{s}) = 520{,}833$ packets/second    (5.6)
If a packet length of 10,240 bits/packet is used,
the maximum system bit rate is:
$F_B = 10{,}240 \times F_P \approx 5.3 \times 10^{9}$ bits/second    (5.7)
An important point to note is that the system is designed
such that the processing time of each packet is independent
of the packet size. Therefore, an increase in the
packet length will increase the system bit rate proportionally.
However, due to the two internal serial transfers, a packet's
delay is affected by the packet's size. An additional drawback
of overly large packet sizes is that a significant portion
of a user's throughput is wasted when short messages are
transmitted. Therefore, the system's throughput in terms of
a bit rate may be quite large while the actual information
rate could be small. All these points also hold true for the
multiple processor architecture.
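The arithmetic of equations 5.1, 5.2, 5.6 and 5.7 can be checked with a few lines of Python. This is a minimal sketch, assuming the normal-operation cycle counts of Table 5.1 and the 120 ns clock cycle stated above; the function name `max_throughput` and the dictionary keys are illustrative choices, not names used in the report.

```python
CLOCK_NS = 120  # processor clock cycle assumed in the report

# Normal-operation cycle counts taken from Table 5.1
CYCLES = {"input": 11, "routing": 15, "output": 16}


def max_throughput(cycles_by_routine, packet_bits, clock_ns=CLOCK_NS):
    """Return (bottleneck routine, packets/s, bits/s) for the three routines.

    The routine with the most cycles limits the throughput (eq. 5.1),
    and the bit rate is the packet rate times the packet length (eq. 5.2).
    """
    bottleneck = max(cycles_by_routine, key=cycles_by_routine.get)
    t_seconds = cycles_by_routine[bottleneck] * clock_ns * 1e-9
    packets_per_s = 1.0 / t_seconds
    return bottleneck, packets_per_s, packets_per_s * packet_bits


routine, f_p, f_b = max_throughput(CYCLES, packet_bits=10_240)
print(routine, round(f_p), f"{f_b:.2e}")
```

Doubling `packet_bits` doubles the bit rate while leaving the packet rate unchanged, which is the proportionality noted in the paragraph above.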
5.1.2 Throughput Estimation for the Multiple Processor
System
The maximum throughput in bits/second of the multiple
processor packet switch varies depending on the values of two
parameters. These parameters are the packet size and the num-
ber of processors implemented. In this section, the relation-
ship between the throughput and the number of processors is
presented. In order to evaluate this packet switch, new para-
meters are needed. These new parameters are:
1) $c_1$ = Number of Processors in the Input Processor Class
2) $c_2$ = Number of Processors in the Packet Sorting Processor Class
3) $c_3$ = Number of Processors in the Packet Routing Processor Class
4) $c_4$ = Number of Processors in the Output Processor Class
5) $C$ = Total Number of Processors
6) $t_{P1}$ = Input Service Routine execution time
7) $t_{P2}$ = Packet Sorting Routine execution time
8) $t_{P3}$ = Packet Routing Routine execution time
9) $t_{P4}$ = Output Routine execution time
10) $F_{Pc1}$ = Input Processor Class throughput in packets
per second
11) $F_{Pc2}$ = Packet Sorting Processor Class throughput in
packets per second
12) $F_{Pc3}$ = Packet Routing Processor Class throughput in
packets per second
13) $F_{Pc4}$ = Output Processor Class throughput in packets
per second
The maximum throughput of the switch is limited by the
maximum throughput of the class of processors which has the
smallest maximum throughput. The throughput of each class
of processor depends on the software execution times and the
number of processors assigned to each class. Therefore, the
throughput for each processor class is:
$F_{Pci} \le (1/t_{Pi})\,c_i, \quad 1 \le i \le 4$    (5.8)
In order to use this equation in the performance evaluation
of the multiple processor packet switch, the software
execution times must be known. Table 5.2 contains the software
execution times for each class of processor. Various
values are listed since the execution times of some routines
vary depending on the current state of the system. As stated
earlier, the packet switch's maximum throughput is achieved
when the processors are busy 100% of the time and when no output
queue list is empty. Therefore, the execution times used in
this throughput estimation are:
1) The execution time of the Input Routine when data
from ELIST is available immediately.
2) The execution time of the Packet Sorting Service
Routine when the Packet Routing Data Port's DAC flag
is set.
3) The execution time of the Packet Routing Service
Routine when it enqueues a packet under normal opera-
ting conditions without updating an OSW.
4) The execution time of the Output Service Routine when
it transmits a packet under normal operating conditions.
Using the data from Table 5.2 in equation 5.8, a table
listing the throughputs as a function of the number of processors
is constructed. Table 5.3 contains this data compiled
from the evaluation. A graph displaying the relationship be-
tween the number of processors and the upper bound on the
system throughput is presented in Figure 5.1. This graph is
plotted using the data contained in Table 5.3.
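The upper bounds of equation 5.8 can be tabulated directly from the normal-operation times of Table 5.2. The Python sketch below recomputes such bounds under that assumption; it does not reproduce the original entries of Table 5.3, and the names `T_US`, `class_throughput_mpps` and `throughput_table` are illustrative.

```python
# Normal-operation execution times in microseconds (Table 5.2)
T_US = {"input": 1.56, "sorting": 1.56, "routing": 1.08, "output": 2.28}


def class_throughput_mpps(routine, processors):
    """Upper bound of eq. 5.8 in millions of packets per second.

    Dividing a processor count by a time in microseconds yields
    packets per microsecond, i.e. millions of packets per second.
    """
    return processors / T_US[routine]


def throughput_table(max_processors):
    """Rows of (c, bound for each class) for c = 1..max_processors."""
    return [
        (c, *(round(class_throughput_mpps(r, c), 2) for r in T_US))
        for c in range(1, max_processors + 1)
    ]


for row in throughput_table(4):
    print(row)
```

Each row lists the processor count followed by the bounds for the Input, Packet Sorting, Packet Routing and Output classes, so the smallest entry for a chosen set of counts gives the bound on the switch as a whole.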
Routine                                          Cycles   Time (µs)
Normal Operation (No Memory Contention):
  Input Service Routine                            13       1.56
  Packet Sorting Service Routine                   13       1.56
  Packet Routing Service Routine
    (a) Enqueue Packet                              9       1.08
    (b) Enqueue Packet and Update OSW               9       1.08
  Output Service Routine
    (a) Transmit Packet                            19       2.28
    (b) Empty Queue                                 7       0.84
Worst Case Due to Memory Contention:
  Input Service Routine                            13       1.56
  Packet Sorting Service Routine                   13       1.56
  Packet Routing Service Routine
    (a) Enqueue Packet (Default)                   13       1.56
    (b) Enqueue Packet and Update OSW (Default)    13       1.56
    (c) Enqueue Packet                             11       1.32
    (d) Enqueue Packet and Update OSW              11       1.32
  Output Service Routine
    (a) Transmit Packet                            23       2.76
    (b) Empty Queue                                11       1.32
Table 5.2 Software Execution Times for the Multiple Processor
System
A specific example is presented below to illustrate how
the number of processors required for a desired throughput is
determined:
Packet Length
$B = 10{,}240$ bits per packet
Desired Throughput
$F_B \le 30 \times 10^{9}$ bits per second
$F_B/B = F_P \le 3.0 \times 10^{6}$ packets per second (3 MPS)
The processor assignments are determined using equation
5.8.
Number of Input Processors
$3\ \text{MPS} \le F_{Pc1} \le (1/t_{P1})\,c_1$
$c_1 \ge (3 \times 10^{6}\ \text{packets/sec})(1.56 \times 10^{-6}\ \text{seconds/packet/processor})$
$c_1 \ge 4.68$ processors.
Since $c_1$ must be an integer value, $c_1 \ge 5$ processors.
Number of Packet Sorting Processors
$3\ \text{MPS} \le F_{Pc2} \le (1/t_{P2})\,c_2$
$c_2 \ge (3 \times 10^{6}\ \text{packets/sec})(1.56 \times 10^{-6}\ \text{seconds/packet/processor})$
$c_2 \ge 4.68$ processors
$c_2 \ge 5$ processors.
Number of Packet Routing Processors
$3\ \text{MPS} \le F_{Pc3} \le (1/t_{P3})\,c_3$
$c_3 \ge (3 \times 10^{6}\ \text{packets/sec})(1.08 \times 10^{-6}\ \text{seconds/packet/processor})$
$c_3 \ge 3.24$ processors
$c_3 \ge 4$ processors.
Number of Output Processors
$3\ \text{MPS} \le F_{Pc4} \le (1/t_{P4})\,c_4$
$c_4 \ge (3 \times 10^{6}\ \text{packets/sec})(2.28 \times 10^{-6}\ \text{seconds/packet/processor})$
$c_4 \ge 6.84$ processors
$c_4 \ge 7$ processors.
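The same processor counts follow from rounding up the product of the desired packet rate and each routine's execution time. A minimal Python sketch, assuming the normal-operation times of Table 5.2; the function name `processors_required` is illustrative.

```python
import math

# Normal-operation execution times in microseconds (Table 5.2)
T_US = {"input": 1.56, "sorting": 1.56, "routing": 1.08, "output": 2.28}


def processors_required(target_mpps):
    """Smallest integer c_i with c_i / t_Pi >= target, for each class.

    target_mpps is the desired throughput in millions of packets per
    second. Products that land exactly on an integer may round up by one
    because of floating-point error; none do for the values used here.
    """
    return {name: math.ceil(target_mpps * t_us) for name, t_us in T_US.items()}


counts = processors_required(3.0)
total = sum(counts.values())
print(counts, total)
```

Summing the four class counts gives the total number of processors for the example.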
There is an important point to note regarding the
system throughput. As mentioned earlier, the system throughput
depends on the packet size and the number of processors
implemented. The important point of this relationship is
that the number of processors that can be implemented is
limited by the number of users. Each user is considered to
have one input and one output buffer. If one ground station
user is allocated two sets of buffers, he is viewed as two
distinct users by the switch. The number of users limits the
throughput because the number of Input, Packet Routing and
Output Processors can never exceed the number of users. This
limitation arises since each user's workload cannot be effi-
ciently divided among more than one processor of the same class.
Therefore, the maximum attainable packet throughput for a fixed
number of users is achieved when one processor from each class
listed above is assigned to one user. As seen in Table 5.2,
the Output function requires the longest execution time of
the three classes listed above. As a result, this function
limits the system's maximum attainable packet throughput as
given by
$F_P \le (1/t_{P4})\,N$    (5.9)
This equation, which expresses the relationship between
the maximum throughput and the number of users, is plotted in
the graph of Figure 5.2. The importance of this relationship
is illustrated in the example given below.
Desired System Features:
N = 5 users
B = 10,240 bits per packet
$F_B = 30 \times 10^{9}$ bits per second
System Performance Evaluation using Equation 5.9:
$F_P \le (1\ \text{packet}/2.28\ \mu\text{s}) \times 5$
$F_P \le 2.19 \times 10^{6}$ packets per second
$F_B = B \times F_P \le 22.5 \times 10^{9}$ bits per second
As seen by the results above, the system performance
falls short of the desired goals. The system designer has
three options available:
1) Build the system and reduce each user's throughput to
meet the lower performance rating.
2) Increase the packet length. This solution faces the
problems described in section 5.1.1.
3) Assign the ground station users additional sets of
buffers so that the packet switch serves more than
five users. This solution allows additional processors
to be implemented, which will increase the
system's throughput rating.
The purpose of the above example is not so much to ex-
plain how to solve performance problems as to stress the
importance of the last relationship presented in equation 5.9.
Without this relationship, one would determine the number of
processors required by referencing Figure 5.1. This obtained
value may be impossible to implement due to the user/processor
limitations.
A final point regarding the maximum obtainable throughput
of the multiple processor system is that Equation 5.9 has
a finite upper bound which is not solely limited by the number
of users. As stated earlier, service for each packet requires
a read and a write operation at ELIST. Therefore, ELIST will
limit the maximum packet throughput of the packet switch.
Using the hardware technology currently available, ELIST is
designed to provide and accept address data approximately
every 100 nanoseconds. This fact limits the system maximum
attainable packet throughput as given by
$F_P \le (1/t_{P4})\,N \le 1/(100 \times 10^{-9}\ \text{s})$
$F_P \le (1/t_{P4})\,N \le 10 \times 10^{6}$ packets/second    (5.10)
A system using a packet length of 10,240 bits will have
a maximum bit rate limited by
$F_B = B \times F_P \le (10{,}240\ \text{bits/packet}) \times (10 \times 10^{6}\ \text{packets/second})$
$F_B \le 102.4 \times 10^{9}$ bits/second.    (5.11)
As new and faster hardware and processor technology
becomes available, the overall performance of this packet
switch will improve.
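Equations 5.9 and 5.10 together bound the packet rate by the smaller of the user-limited value and the ELIST-limited value. The Python sketch below combines them, assuming the 2.28 microsecond transmit time of Table 5.2 and the 100 nanosecond ELIST interval quoted above; the function names are illustrative.

```python
T_P4_US = 2.28        # Output Service Routine, transmit packet (Table 5.2)
ELIST_CYCLE_US = 0.1  # ELIST access interval of about 100 ns


def max_packet_rate_mpps(users):
    """Bound of eqs. 5.9 and 5.10 in millions of packets per second."""
    return min(users / T_P4_US, 1.0 / ELIST_CYCLE_US)


def bit_rate_gbps(users, packet_bits=10_240):
    """Corresponding bit-rate bound in units of 10^9 bits per second."""
    return max_packet_rate_mpps(users) * packet_bits / 1000.0


five_users = max_packet_rate_mpps(5)
many_users = max_packet_rate_mpps(100)
print(round(five_users, 2), many_users, round(bit_rate_gbps(100), 1))
```

With these figures the user-limited term governs for 22 users or fewer, and the ELIST term takes over from 23 users upward.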
5.2 Evaluation of the Processor
Implementation of the packet switch may require the con-
struction of a customized processor chip. Therefore, a review
of the characteristics of the AMD 2903 ALU will provide the
system designer with an insight into the design of a processor
which is better tailored for this particular application. This
review begins with the available features of the AMD 2903 ALU
and ends with the features not provided by this chip that
would enhance processor performance.
The AMD 2903 ALU provides ample arithmetic and logical operations
for the packet switch. In fact, the number of operations can
be reduced to save hardware complexity. The only functions
required are the addition operation, the logical AND and the
logical OR. The on-chip register file is ideal for holding
scratchpad variables. In both multiprocessor designs, the
full capacity of this file is never used. Therefore, this
component could be reduced in size without degrading system
performance. The single Q Register, which provides a work
area for some operations, was quite adequate. The provided
ZERO flag went unused and could be eliminated from the custom
designed processor.
There are several features the AMD 2903 ALU architecture
does not support. These features would make the processor
better suited for this particular application. They are:
1) Internal tristate control of the DB Direct Data Input
Bus. This bus is not currently tristate because this
bus is bidirectional. This allows data to enter the
ALU from external hardware as well as allowing data
from the register file to be sent directly to external
hardware. Since direct transmission of data from the
register file to external hardware is not required,
this bus could be tristated internally to save ex-
ternal hardware. A possible alternative would be to
increase the size of the internal select MUX. In this
scheme, the DB input bus would no longer need to share
the internal data bus with the register file.
2) Additional Direct Data Inputs. These inputs save
execution cycles since the processor does not need to
generate a device's address before a read operation
can be performed. These inputs can be used whenever
the processor is required to access a single unique
system device. Since the Data Path Busy Status Ports
are unique system devices, this feature would reduce
the software execution times for the Input and Output
Processors in both architectures. This scheme may
require larger internal Select MUXs and more select
control signals. However, there does exist one way
to increase the number of direct data inputs without
increasing the Select MUX size or the number of con-
trol lines. As mentioned earlier, only a small por-
tion of the register file is used. In fact, the A-
Register File is never used. Therefore, this component
could be removed and its input to the Select MUX could
be replaced with a direct data input. This particular
184
feature would increase system throughput directly and
should be considered an important design criterion.
3) Internal data bus latches. This feature would provide
for the stabilization of ALU data inputs without the
use of external latches.
All these features are recommended for any processor
custom designed for the packet switches.
5.3 Packet Losses
If the throughput rating of the packet switch is exceeded,
packets will be lost even when there are no hardware or soft-
ware failures in the system. However, an important point to
make concerning these packet losses is that the system will
always recover at some point in time. In both architectures,
packets can be lost due to overflow in three components.
Overflow can take place in an additional component of the
multiple processor system. The components which are suscep-
tible to overflow are:
1) The Input Buffers
2) The Output Queue Lists
3) ELIST
4) The Packet Routing Data Ports' queues.
Even with double buffering, an input buffer will over-
flow if its user exceeds his allotted channel capacity. The
oldest of the two packets residing in the input buffer will
be lost as the new packet is shifted into the buffer.
If any output queue list becomes full, the packet switch
will encounter serious problems. When a queue list becomes
full, the two index pointers will be equal in value. This is
the same situation for an empty list. When the two pointers
are equal, the Output Processor assumes the list is empty and
does not access the list until new data is placed into the
queue list. Therefore, the list remains full until new data
is placed into the list, overwriting valid data. Only after
overflow has occurred can the Output Processor access the
list. Two serious problems arise from this overflow condition.
The first problem is that once overflow takes place in the
queue, no less than the entire list of original data will be
lost. The second problem is a result of the first problem.
As stated earlier, the data stored in the Output Queue Lists
are the array addresses of routed packets. Therefore, if these
addresses are lost, the routed packets will never be trans-
mitted and they will remain in the Shift Register Array inde-
finitely. Since they are never transmitted, their array
addresses will never be returned to ELIST. This fact could
cause ELIST to become empty. An empty ELIST and the asso-
ciated problems of this situation are discussed next.
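The full/empty ambiguity described above is a property of any circular list that keeps two index pointers and no element count. The Python sketch below illustrates it under that assumption; the class `QueueList` and the pointer names `iptr` and `optr` are hypothetical and do not describe the actual queue list hardware.

```python
class QueueList:
    """Circular list with two index pointers and no element count (assumed model)."""

    def __init__(self, size):
        self.slots = [None] * size
        self.iptr = 0  # next slot written by the Packet Routing Processor
        self.optr = 0  # next slot read by the Output Processor

    def looks_empty(self):
        # Equal pointers are the only emptiness test available.
        return self.iptr == self.optr

    def enqueue(self, array_address):
        self.slots[self.iptr] = array_address
        self.iptr = (self.iptr + 1) % len(self.slots)

    def dequeue(self):
        if self.looks_empty():
            return None
        value = self.slots[self.optr]
        self.optr = (self.optr + 1) % len(self.slots)
        return value


q = QueueList(4)
for address in (10, 11, 12, 13):    # fill every slot
    q.enqueue(address)
full_looks_empty = q.looks_empty()  # pointers have wrapped and are equal
q.enqueue(14)                       # overflow: overwrites address 10
drained = []
while (item := q.dequeue()) is not None:
    drained.append(item)
```

After the overflow only the newest address can be read back; the four original addresses are unreachable, which corresponds to the loss of the entire list described in the text.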
If ELIST becomes empty and a new packet arrives at the
input, the oldest packet in the shift register array will be
lost as the new packet is stored in its place. Packets will
continue to be lost until the Output Processors return enough
array addresses to ensure that the next shift register address
fetched by an Input Processor is valid data. ELIST will
become empty when the system users exceed the packet switch's
throughput rating.
In the multiple processor design, if a Packet Destination
Data list becomes full, the system will face problems similar
to those caused by a full Output Queue list. This is due to
the fact that both lists share the same data structure. Again,
packets will be trapped in the Shift Register Array because
the data lost during overflow is needed for routing. If a
packet is never routed, it can never leave the array. There
is no way to re-sort these packets, which means the lost
routing information can never be recovered. As with a full
Output Queue list, the entire list of original data will be
overwritten before the system can recover.
Packet losses reduce the actual throughput of a system
since users must retransmit all packets lost in transmission.
Since a large and effective throughput is the primary goal of
this work, care must be taken to ensure against packet losses.
The system designer must research the queuing problems of the
switch before deciding on the size of the Shift Register Array
and all the various queue lists. If the packet switch is
built with an insufficient amount of array locations and/or
queue lengths for its throughput rating, packet losses will
be inevitable. In addition, part of the responsibility of
ensuring against packet losses belongs to the users themselves.
They must not exceed the channel capacities assigned to them.
187
IV
5.4 Fault Detection end Fault Tolcrance
Since the packet switches presented in this work are
part of a proposed communication satellite network, fault
detection and fault tolerance are desirable features. Once
the satellite is placed into orbit, maintenance and repair
work will be quite expensive or impossible. Therefore, if
the packet switch could handle its own maintenance problems,
the useful life of the satellite would be extended.
The failure of some components will cause an entire
channel to fail. An example of such a component is an input
buffer. If an input buffer fails, the channel it serves will
also fail. Some component failures will cause intermittent
packet losses. An example of this type of failure would occur
if one location in the Shift Register Array failed. Only the
packets stored in this location would be lost or corrupted.
Both of these types of failures will degrade system perform-
ance but the packet switch can still operate. However, there
are certain component failures which will cause the entire
packet switch to fail. These components should be either
fault tolerant through the use of redundant circuitry or self-
diagnostic. The self-diagnostic components should be able to
hand over their tasks to a spare component upon detection of
a fault. The components which fall into this category for
the three processor system are:
1) The Input Processor
2) The Routing Processor
3) The Output Processor
4) All the polling circuits
5) Both Data Path Busy Status Ports
6) ELIST
The components which can cause a channel loss in the
three processor design due to a failure are:
1) Input Buffers
2) Output Queue Lists
3) Output Status Words
4) Output Buffers
The components which can cause intermittent packet losses
in the three processor design due to a failure are:
1) Data paths in the Input Switching Network
2) Shift Register Array locations
3) Data paths in the Output Switching Network
In the multiple processor design, the only system com-
ponent that may cause the entire packet switch to fail, should
it fail, is the ELIST. Single or multiple channel failures
could result if one of the following fails:
1) Input Buffers
2) Input Polling Circuits
3) Input Processors
4) Data Path Busy Status Ports
5) Packet Destination Data Ports
6) Packet Routing Processors
7) Output Queue Lists
8) Output Processors
9) Output Status Words
10) Output Polling Circuits
11) Output Buffers
As noted above, if the Data Path Busy Status Port of an
Input or Output Switching Network fails, the loss of some
channels will occur as a result. However, if only a single
Input (Output) Switching Network is used by the switch (as in
the case of the three processor system), a status port failure
will result in the failure of the entire packet switch. Thus,
system reliability and elimination of resource contention are
achieved with multiple Switching Networks.
The components which can cause packet losses in the
multiple processor design due to a failure are:
1) Data paths in the Input Switching Network
2) Shift Register Array locations
3) Shift Register Polling Circuits
4) Packet Sorting Processors
5) Data paths in the Output Switching Network
Now that the impact of each component failure is identified,
the system designer can decide what level of fault
detection and fault tolerance is needed for each component.
6.0 QUEUE THEORETIC MODELLING FOR CALCULATION OF THE AVERAGE
RESPONSE TIMES AND THE AVERAGE QUEUE SIZES
6.1 Introduction
In this section queue theoretic analysis and evaluation
of the proposed designs are presented. Analytical relationships
between the average response times and the design parameters
of the switch are obtained. These expressions are to be used
to evaluate the performance of the three designs of the switch
for various values of these parameters. Also, the average queue
sizes in the shift register array are obtained. This queue
size gives an idea as to the required size of these shift
register arrays in the various designs.
6.2 Design Parameters of the Switch
The average response time of the switch and the average
size of the shift register array depend on a number of
parameters. The more important of these are:
1) f - clock cycle time of the microprocessor - This
speed determines the time taken by the processor to
serve a packet at the various stages of its service.
2) tp1 - duration of the input interrupt service routine.
3) tp2 - duration of the output buffer interrupt service
routine for packets.
4) tp3 - duration of the routing service routine.
5) tp4 - duration of the sorting service routine.
6) R - bit rate/user.
7) N - number of input lines connected to the switch.
8) B - number of bits/packet.
9) γi = destination function - this function determines
the fraction of the total number of arriving packets
going to individual output lines.
10) Si = output line speed - this speed determines the
time required to transmit a packet to a particular
destination. Different lines may have different
speeds.
11) Fp = system packet rate in packets/sec.
12) FB = system throughput in bits/sec.
13) M = number of output lines.
14) K = number of packet size storage locations in the
shift register array.
15) τi = time taken for unsuccessful polling of one line
at the i-th queue.
16) λ = overall average arrival rate (packets/sec.).
17) τ = time needed to shift one bit internally.
18) Nj, j=1,2,3,4 = number of processors at the input,
output, routing and sorting service points respectively.
6.3 The Single Processor Design
6.3.1 Introduction
It appears from the proposed single processor
architecture and operation of the switch that queues build
up in the switch as shown in Figure 6.1. In this queueing
model packets queue for service by the processor in three
places. Firstly, the arriving packets queue for inputting
into the shift register array. Secondly, these packets await
the routing service which includes header analysis, error
analysis, generation of ACK's and NACK's, and separating the
packets into software queues. Finally, these packets queue
for outputting. The routing service is to be performed
by the processor whereas the inputting and the outputting
functions involve service by a polling circuit in addition
to that by the processor. Also, the inputting function has
the highest priority, the outputting function has the second
highest priority and the routing service has the lowest
priority. This priority assignment is assumed as the incoming
packets have to be attended to upon their arrival, otherwise
they will be lost. Also, the output lines, being slower than
the switch itself, cause a bottleneck in the system. Hence,
whenever an output line is free to transmit messages, it
should be serviced as quickly as possible. Thus, the outputting
process is given the second highest priority.
The packets change priority class after receiving service
and the whole system can be modelled as a single server (the
processor) serving customers of three levels of priority as
shown in Figure 6.2. The packets of various priorities queue
separately for service. The average time spent by a packet
in the switch (average response time) is the sum of the
waiting times and the service times at the three queues. Next,
expressions are derived for the average waiting times, the
overall average response time, and the average queue sizes at
the various queues.
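6.3.2 Parameters of the Input Queue (highest priority)
(a) The Arrival Process
The arrival process at this queue is the
sum of the arrivals on all the input lines. It is assumed
that the arrival on the i-th input line is Poisson with average
rate λ1i. Then the overall arrival at the input queue is
Poisson with arrival rate
$$\lambda_1 = \sum_{i=1}^{N} \lambda_{1i} = \lambda \tag{6.1}$$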
(b) Service Time
The service time at this queue consists of
polling time to locate the packet, transfer setting up time
and the actual transfer time. However, the processor is
free to service other lines as soon as a transfer is set up
and also there are sufficient number of transfer paths
available so that the actual process of transfer of any
packet does not cause any delay in servicing any other packets.
Thus, for the purpose of calculating the average waiting time
for packets in this queue, we consider the service time
$$T_1 = \text{polling time} + \text{setting-up time} = t_1 + t_{p1} \tag{6.2}$$
where tp1 is a constant.
We need the mean and the second moment of T1 and, hence,
those of t1. If there are N input lines, polled equally,
then a particular packet may be polled immediately or it may
have to wait until N-1 other lines are polled, and the
probability of starting the scan at any one particular line is 1/N.
Thus, the average number of lines polled before the particular
one is polled is
$$\sum_{i=0}^{N-1} \frac{i}{N} = \frac{N-1}{2} \tag{6.3}$$
and the average time spent for unsuccessful polling is
((N-1)/2) τ1, where τ1 is the time taken for unsuccessful poll of
one line. Also, the mean square value of the polling time is
$$\sum_{i=0}^{N-1} \frac{(i\tau_1)^2}{N} = \frac{(N-1)(2N-1)\tau_1^2}{6} \tag{6.4}$$
Hence, the average service time is
$$E[T_1] = \frac{N-1}{2}\tau_1 + t_{p1} \tag{6.5}$$
and the mean square value of T1 is
$$E[T_1^2] = \frac{(N-1)(2N-1)\tau_1^2}{6} + t_{p1}^2 \tag{6.6}$$
(c) Utilization Factor
$$\rho_1 = \lambda_1 E[T_1] = \Big(\sum_{i=1}^{N}\lambda_{1i}\Big)\Big[\frac{N-1}{2}\tau_1 + t_{p1}\Big] \tag{6.7}$$
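The polling-time moments and the utilization factor of the input queue can be checked numerically. The sketch below evaluates the closed forms (6.5) and (6.6), compares the polling part with a direct evaluation of the sums in (6.3) and (6.4), and forms the utilization as in (6.7). The function names and the sample values are illustrative assumptions only.

```python
def polling_moments(n_lines, tau, t_p):
    """Return (E[T], E[T^2]) from the closed forms (6.5) and (6.6).

    n_lines: number of polled lines (N), tau: time for one unsuccessful
    poll, t_p: constant processor setting-up time.
    """
    mean = (n_lines - 1) / 2 * tau + t_p
    second = (n_lines - 1) * (2 * n_lines - 1) * tau ** 2 / 6 + t_p ** 2
    return mean, second


def brute_force_polling(n_lines, tau):
    """Evaluate the sums in (6.3) and (6.4) term by term."""
    mean_polls = sum(i for i in range(n_lines)) / n_lines
    mean_square_time = sum((i * tau) ** 2 for i in range(n_lines)) / n_lines
    return mean_polls, mean_square_time


def utilization(arrival_rate, mean_service):
    """Utilization factor: arrival rate times mean service time, as in (6.7)."""
    return arrival_rate * mean_service
```

For example, with N = 10, τ1 = 2 and tp1 = 3 (arbitrary time units), `polling_moments(10, 2.0, 3.0)` gives a mean of 12 and a second moment of 123, and the brute-force sums reproduce the polling contributions 4.5 polls and 114.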
6.3.3 Parameters of the Output Queue (Second highest
priority)
(a) The Arrival Process
This queue, in fact, consists of M separate
queues, one for each output line. A packet from this queue
is serviced when the corresponding output buffer is empty. An
empty output buffer produces an interrupt that is recognized
by a polling circuit, and is serviced by the processor if there
is a packet to be transmitted in the corresponding output queue.
If there is no packet in the corresponding output queue, then
this interrupt is disabled until a packet is available.
The time spent in this queue is calculated in two stages.
Firstly, the time spent in waiting for and being serviced by
the processor and secondly, the time spent in transferring and
transmission of packets from the shift register array to the
output lines.
All the packets in all the output queues and the packets
in the input queue affect the time spent by any packet waiting
in any of the output queues for the processor. However, the
time for transferring and transmission of a packet depends only
on the speed of the corresponding output line because the pro-
cessor can attend to other packets as soon as a transaction has
been set up. Hence, to find the waiting time, we shall consider
all transactions in the output queues to form one queue. It
should be noted that it is the interrupts by the output buffers
that are serviced by the processor. However, the interrupts
are serviced only if there is a transaction available for
transfer in the corresponding output queue. Thus, we are
assuming that the arrival of the interrupts follows the same
distribution as the arrival of the packets to the output
queues. This arrival process is, in fact, non-Poisson. However,
we shall assume it to be Poisson with the understanding
that the results obtained are the worst case ones. The arrival
rate is λ2 = λ1 = λ.
(b) Service Time
The relevant service time for calculating the
waiting time is
$$T_2 = \text{polling time} + \text{setting time} = t_2 + t_{p2} \tag{6.8}$$
where tp2 is a constant.
The transfer time is not included here because it does not
affect the waiting time for service by the processor. Following
the arguments given in connection with the polling time for the
input queue, it can be shown that the average service time is
$$E[T_2] = \frac{M-1}{2}\tau_2 + t_{p2} \tag{6.9}$$
and
$$E[T_2^2] = \frac{(M-1)(2M-1)\tau_2^2}{6} + t_{p2}^2 \tag{6.10}$$
(c) The Utilization Factor
The utilization factor connected with the
service by the processor, for this queue is
$$\rho_2 = \lambda_2 E[T_2] \tag{6.11}$$
6.3.4 Parameters of the Queue for Routing Service
(third highest priority)
(a) The Arrival Process
The arrival process is not exactly Poisson.
However, for the purpose of this analysis, it is assumed to
be Poisson with the understanding that the results obtained
are the worst case ones. The arrival rate is λ3 = λ1 = λ.
(b) Service Time
$$T_3 = \text{polling time} + \text{processing time} = t_3 + t_{p3} \tag{6.12}$$
where tp3 is a constant. Following the arguments given in
connection with the input queue, it can be shown that
$$E[T_3] = \frac{K-1}{2}\tau_3 + t_{p3} \tag{6.13}$$
and
$$E[T_3^2] = \frac{(K-1)(2K-1)\tau_3^2}{6} + t_{p3}^2 \tag{6.14}$$
where K is the number of storage locations (in packets) in
the shift register array and τ3 is the time spent in unsuccessful
polling of a storage location.
(c) Utilization Factor
The utilization factor for this queue is
$$\rho_3 = \lambda_3 E[T_3] \tag{6.15}$$
6.3.5 Expression for the Average Response Time
Equations derived in the previous section are now
used to obtain expressions for the response time of the switch.
The queues use random dispatching (polling) and pre-emptive
queueing disciplines and do not give preference to packets with
shorter service times. As this dispatching discipline is
independent of service time, the mean waiting times are the same
as those for Head-of-Line service discipline. However, we take
the polling function into account by adding the average time
due to unsuccessful polling to the actual processing time by the
processor. Then the average waiting time at the queue with the
j-th priority is [7,8]
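$$E[t_{wj}] = \frac{1}{1-\sum_{i=1}^{j-1}\rho_i}\left[E[T_j]\sum_{i=1}^{j-1}\rho_i + \frac{\sum_{i=1}^{j}\lambda_i E[T_i^2]}{2\left[1-\sum_{i=1}^{j}\rho_i\right]}\right], \quad j = 1, 2, 3. \tag{6.16}$$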
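Equation (6.16) is straightforward to evaluate once the arrival rates and the first two moments of the service times are known. The following sketch assumes the pre-emptive form of (6.16) as written above, with priority classes listed from highest to lowest. The function name and argument layout are hypothetical.

```python
def priority_waiting_times(lams, mean_t, second_t):
    """Average waiting time per priority class, following (6.16).

    lams[j], mean_t[j], second_t[j] are the arrival rate, E[T] and E[T^2]
    of the class with priority j+1 (index 0 is the highest priority).
    """
    rhos = [lam * m for lam, m in zip(lams, mean_t)]
    waits = []
    for j in range(len(lams)):
        sigma_prev = sum(rhos[:j])       # load of all higher-priority classes
        sigma_j = sum(rhos[:j + 1])      # load up to and including class j
        if sigma_j >= 1:
            raise ValueError("cumulative utilization must stay below unity")
        residual = sum(lam * s for lam, s in zip(lams[:j + 1], second_t[:j + 1]))
        wait = (mean_t[j] * sigma_prev
                + residual / (2 * (1 - sigma_j))) / (1 - sigma_prev)
        waits.append(wait)
    return waits
```

For a single class the expression reduces to the familiar single-server result λE[T²]/(2(1-ρ)); with more classes the denominators show why the cumulative utilization must remain below unity.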
The average of the total time spent by a packet in the input
queue (highest priority) (time spent in waiting, being serviced
by the processor and being transferred to the shift
register array from the input buffers) is
$$E[t_{q1}] = E[t_{w1}] + E[T_1] + E[T_{t1}] \tag{6.17}$$
where Tt1 is the transfer time at queue 1. The average
of the total time spent by a packet in the output queue (second
highest priority) is calculated in the following way:
(a) Arrival Process
This queue consists of M separate queues and the
waiting time is different in the different queues as the waiting
time in a queue depends on the arrival process and the speed of
the corresponding output line. The arrival to each of the queues
is assumed to be Poisson. However, the arrival rate may be
different for different queues. The arrival rate to the i-th
component queue of this second priority queue is
$$\lambda_{2i} = \gamma_i \lambda_2 = \gamma_i \lambda \tag{6.18}$$
where γi is specified by the destination function such that
a fraction γi of the total arrivals at this second priority
output queue go to its i-th component queue, and
$$\sum_{i=1}^{M} \gamma_i = 1 \tag{6.19}$$
Hence, the average service time at the i-th component
queue is
$$E[T_{2i}] = E[t_{w2}] + t_{p2} + E[T_{t2i}] = E[t_{w2}] + t_{p2} + T_{t2i} \tag{6.20}$$
where Tt2i, the transfer time, is a constant and tp2, the setting
up time, is also a constant. E[Tt2i] = average transfer time
from the shift register array to the output buffer + average
transmission time over the i-th output line
where Si is the transmission speed in bits/sec of the i-th
output line. The utilization factor at the i-th component
queue is
$$\rho_{2i} = \lambda_{2i} E[T_{2i}]. \tag{6.22}$$
Also
$$E[T_{2i}^2] = \frac{B^2}{S_i^2} + t_{p2}^2 \tag{6.23}$$
neglecting the cross multiplication terms and E[tw2^2] as
small. Then, the average time spent in waiting at the i-th
component queue of the second priority queue is
$$E[t_{w2i}] = \frac{\lambda_{2i} E[T_{2i}^2]}{2(1-\rho_{2i})} \tag{6.24}$$
Thus, the average total time spent in the i-th component
queue of the second priority queue is
$$E[t_{q2i}] = E[T_{2i}] + \frac{\lambda_{2i} E[T_{2i}^2]}{2(1-\rho_{2i})} \tag{6.25}$$
The overall average time spent in waiting and in service at
the second priority queue is
$$E[t_{q2}] = \sum_{i=1}^{M} \frac{\lambda_{2i} E[t_{q2i}]}{\lambda_2} = \sum_{i=1}^{M} \gamma_i E[t_{q2i}] \tag{6.26}$$
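The chain from (6.18) to (6.26) can be followed step by step in code. The sketch below is only an illustration: it assumes that the transfer time of a packet on line i is approximately B/Si (the form that appears in the overall response-time expression), and the function and parameter names are hypothetical.

```python
def output_queue_time(lam, gammas, speeds, packet_bits, t_p2, e_tw2):
    """Overall average time at the output (second priority) queue, per (6.26).

    lam: total arrival rate (packets/sec), gammas: destination function,
    speeds: output line speeds (bits/sec), packet_bits: packet size B,
    t_p2: setting-up time, e_tw2: average wait for the processor, E[tw2].
    """
    total = 0.0
    for gamma, speed in zip(gammas, speeds):
        lam_2i = gamma * lam                               # (6.18)
        mean = e_tw2 + t_p2 + packet_bits / speed          # (6.20), Tt2i ~ B/Si assumed
        second = (packet_bits / speed) ** 2 + t_p2 ** 2    # (6.23)
        rho_2i = lam_2i * mean                             # (6.22)
        if rho_2i >= 1:
            raise ValueError("output line utilization must be below unity")
        t_q2i = mean + lam_2i * second / (2 * (1 - rho_2i))  # (6.25)
        total += gamma * t_q2i                             # (6.26)
    return total
```

The guard on each line's utilization mirrors the requirement that the arrival rate be chosen so that no output line is loaded to unity or beyond.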
The total average time spent by a packet in the queue for
routing service (the third highest priority queue) is
$$E[t_{q3}] = E[t_{w3}] + E[T_3] \tag{6.27}$$
Thus, the overall average response time - the average total
time spent by a packet in the switch - is
$$E[t_q] = E[t_{q1}] + E[t_{q2}] + E[t_{q3}] \tag{6.28}$$
Putting back the expressions for the relevant quantities in
equation (6.28) we get the overall average response time
$$\begin{aligned}
E[t_q] &= E[t_{q1}] + E[t_{q2}] + E[t_{q3}] \\
&= E[t_{w1}] + t_{p1} + E[T_{t1}] + \sum_{i=1}^{M} \gamma_i \left[ E[t_{w2}] + t_{p2} + \frac{B}{S_i} + \frac{\gamma_i \left(\sum_{j=1}^{N} \lambda_{1j}\right)\left(t_{p2}^2 + \frac{B^2}{S_i^2}\right)}{2\left(1 - \gamma_i \lambda \left(E[t_{w2}] + t_{p2} + \frac{B}{S_i}\right)\right)} \right] \\
&\quad + E[t_{w3}] + t_{p3} + \frac{K-1}{2}\tau_3
\end{aligned} \tag{6.29}$$
neglecting E[tw2^2] compared to tp2^2 + B^2/Si^2, where E(twj),
j=1,2,3 are given by equation (6.16).
Equations (6.16) and (6.29) show the relationship of
the average response time for the packets to the various
design parameters of the switch, namely, the total arrival
rate λ, the number of input lines N, the size of storage at
the shift register array K, the number of output lines M,
packet size B, transmission rates of the output lines Si, the
processor times tp1, tp2 and tp3, the times τ1, τ2, τ3 needed
for unsuccessful polling of a packet at the first, second and
third priority queues respectively, and γi, the destination
function. This relationship can be used to study the effect
of variation in any of these parameters on the average response
time. In this respect, it is useful to draw graphs showing
the variation in the average response time as some or all of
these parameters are varied. Graphs of this type are presented
in Figures 6.3 - 6.22. Further explanation of these graphs is
presented in section 6.3.7.
6.3.6 The Average Queue sizes
For this pre-emptive resume queue one can also obtain
average queue sizes. The average number of packets waiting in
the j-th queue is [7,8]
$$E[w_j] = \frac{1}{1-\sum_{i=1}^{j-1}\rho_i}\left[\rho_j\sum_{i=1}^{j-1}\rho_i + \frac{\lambda_j\sum_{i=1}^{j}\lambda_i E[T_i^2]}{2\left[1-\sum_{i=1}^{j}\rho_i\right]}\right] \tag{6.30}$$
where λ1 = λ2 = λ3 = Σ λ1i (i = 1, ..., N) = λ; ρ1, ρ2 and ρ3 are given
by equations (6.7), (6.11) and (6.15) respectively, and E[T1^2],
E[T2^2] and E[T3^2] are given by equations (6.6), (6.10) and (6.14)
respectively. We are specifically interested in the queue size
in the shift register array. This shift register array stores
the packets that are waiting for the output function and the
routing function. Hence, the required average queue size
is E[w2] + E[w3]. A number of graphs showing the variation in
E[wj], j=1,2,3 have been obtained from equation (6.30). These
graphs are shown in Figures 6.23 - 6.29. Further explanation
of these graphs is presented in section 6.3.7. These graphs
show the average queue sizes. However, we may be interested
in finding the queue size necessary for a given utilization factor
and probability of overflow. These results can be used to obtain
an approximate answer to this question. If the utilization
factor is about .6 and the probability of overflow is 10^-3, then
the required buffer size is approximately ten times the average
buffer occupancy. For smaller utilization factors, the required
buffer size is smaller still [9].
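The sizing argument above can be turned into a short calculation. The sketch below applies Little's formula to obtain average queue sizes from average waiting times and then applies the factor-of-ten rule of thumb quoted above, which is stated only for a utilization of about .6 and an overflow probability of 10^-3. The function names are hypothetical; the sample numbers are the arrival rate and waiting times quoted in section 6.3.7 for a clock cycle time of about 120 ns.

```python
import math


def average_queue_sizes(arrival_rate, waits):
    """Little's formula: average number waiting = arrival rate x average wait."""
    return [arrival_rate * w for w in waits]


def shift_register_slots(e_w2, e_w3, factor=10.0):
    """Packet-sized locations suggested by the rule of thumb.

    The array holds packets awaiting output and routing, so its average
    occupancy is E[w2] + E[w3]; `factor` is the assumed safety multiple.
    """
    return math.ceil(factor * (e_w2 + e_w3))


# Arrival rate of 8x10^4 packets/sec; waits of 250 ns, 1.7 us and 11.5 us.
sizes = average_queue_sizes(8e4, [250e-9, 1.7e-6, 11.5e-6])
slots = shift_register_slots(sizes[1], sizes[2])
```

With these inputs the average occupancy of the array is a little over one packet, so the rule of thumb suggests on the order of eleven locations; a different utilization or overflow target would call for a different factor.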
6.3.7 Interpretation of the Graphs Showing the Effect of
Various Design Parameters on the Performance of the
Proposed Packet Switch
A number of graphs showing the effect of the various
design parameters on the average waiting times and the average
queue sizes at the three queues and the overall average response
time are presented in Figures 6.3 through 6.29.
(a) The Average Waiting Times at the Three Queues
The effects of λ, N, M, K and the other design parameters on
the average waiting times at the three queues are shown in
Figures 6.3 through 6.13.
Average waiting time at queue 1 vs. λ, N, τ1 and tp1.
Figure 6.3 shows the effect of ρ1, the utilization factor,
on E(tw1), the average waiting time at queue 1. E(tw1) increases
as ρ1 increases and becomes very large as ρ1 approaches 1. The
effect of λ, N and tp1 on E(tw1) can also be obtained from this
graph by calculating the corresponding ρ1 using equations (6.1)
through (6.7) and using this value of ρ1 in Figure 6.3.
Average waiting time at queue 2 vs. λ, N, τ1, τ2, tp1, tp2 and M.
The effect of ρ2, the utilization factor, on E(tw2), the
average waiting time at queue 2, is shown in Figure 6.4. Because
the packets at the input queue (queue #1) have priority
over those at the output queue (queue #2), E(tw2) depends
on both ρ1 and ρ2. The family of graphs in Figure 6.4 shows the
effect of ρ2 on E(tw2) for a number of values of ρ1. It should
be noted that ρ1 has a dominant effect on E(tw2) and for values
of ρ1 close to 1, E(tw2) increases rapidly. This indicates that
when the input queue is heavily loaded, the processor does not
have much time for the second queue. It is also observed from
equations (6.8) through (6.11) that ρ2 is related to the number
of input lines N, the arrival rate λ, the polling time τ2, the
processor setting up time tp2 and the number of output lines M.
Hence, the effect of any of these parameters on E(tw2) can be
obtained from Figure 6.4 by using the corresponding values of
ρ2 and ρ1. It can be seen from Equation (6.16) that E(tw2)
contains a term 1/(1 - ρ1 - ρ2). Hence, if ρ1 + ρ2 approaches 1, then
E(tw2) increases rapidly. Also, if ρ1 + ρ2 > 1, then E(tw2)
may become negative. Thus, to have a reasonable value of
E(tw2), ρ1 + ρ2 should be less than unity.
Average waiting time at queue 3 vs. λ, N, τ1, τ2, tp1, tp2, K and M.
Figures 6.5 through 6.10 present the effect of ρ3, the
utilization factor, on E(tw3), the average waiting time at queue
3, for a number of values of ρ1, ρ2 and K, the number of
packet-size storage units in the shift register array. Figures 6.5
through 6.7 show the effect of K on E(tw3) for the same values of
ρ1, ρ2 and ρ3. It is seen from these graphs that for any given
values of ρ1, ρ2 and ρ3 (e.g., ρ1 = .156, ρ2 = .18 and ρ3 = .343),
E(tw3) is smaller for K = 10 than for both K = 5 and K = 20.
This indicates that for a given data arrival rate and processor
speed, there is an optimum value of K that produces minimum
E(tw3). For values of K below this optimum value E(tw3) increases
as there may not be sufficient storage space available. Hence,
the processor cannot immediately set up a transfer from the input
buffer to the shift register array and thus the processor has to
spend more than the usual time for servicing each incoming input
packet which, in turn, increases the delay in servicing the
shift register array. This points to a possible tie-up situation
and, hence, sufficient storage should be provided to avoid this
breakdown of the process. On the other hand, as K increases,
E(tw3) increases simply because more time is spent in polling
these storage units.
Figures 6.7 through 6.10 show the effect of ρ1 on E(tw3)
for given values of ρ2, ρ3 and K. These figures show that as
ρ1 increases (with the same values of ρ2, ρ3 and K), E(tw3)
increases very rapidly, indicating a dominating effect of ρ1
on E(tw3). This is because if the input queue is utilized
heavily, then the processor does not get time to serve the
second and the third queues, giving rise to higher delay at
these latter queues.
It should be pointed out that E(tw3) involves a term
1/(1 - ρ1 - ρ2 - ρ3) (cf. equation (6.16)), and, hence, as ρ1 + ρ2 +
ρ3 approaches unity, E(tw3) increases rapidly and if ρ1 + ρ2 +
ρ3 > 1, then E(tw3) may be negative. Hence, ρ1 + ρ2 + ρ3
should be kept less than unity.
Average waiting times vs. clock cycle time of processor.
One of the objectives of this work has been to find out
the effect of the speed of the microprocessor on the performance
of the packet switch. For this purpose, graphs have been
obtained showing the effect of f, the processor clock cycle time,
on E(tw1), E(tw2) and E(tw3) as shown in Figures 6.11, 6.12 and
6.13 respectively.
Seven values of the clock cycle time, namely 0, 25 ns, 50 ns,
75 ns, 100 ns, 125 ns and 150 ns have been considered. It is
seen from these graphs that the clock cycle time has a prominent
effect on the waiting times. An arrival rate of λ = 8x10^4
packets/sec has been used in generating these graphs and the
corresponding values of ρ1, ρ2 and ρ3 as obtained from equations
(6.7), (6.11) and (6.15) respectively are also shown on these
graphs. For the AMD 2900 bit slice microprocessor used in the
present design, the clock cycle time is approximately 120 ns.
The corresponding values of E(tw1), E(tw2) and E(tw3) are 250 ns,
1.7 μs and 11.5 μs respectively.
In the future as more powerful microprocessors (with smaller
clock cycle times) become available, the corresponding waiting
times at the various queues can be obtained from these graphs.
Other arrival rates also can be used in obtaining similar graphs
provided that the corresponding ρ1 + ρ2 + ρ3 remains less than
unity.
(b) The Overall Average Response Time
The effect of the various parameters on E(tq), the
overall average response time, is shown in Figures 6.14 through
6.22.
Figure 6.14 shows the effect of the packet size B on the
overall average response time E(tq). Four graphs, each
corresponding to a different set of (ρ1, ρ2, ρ3), are shown. It is
seen that in each case the overall average response time
increases at the same moderate rate as B goes from 1000 bits
to 10,000 bits. This is a very useful result, because the
throughput of the switch increases directly as B, whereas the
corresponding response time increases at a much slower rate.
Thus, the throughput can be increased considerably without
suffering a severe penalty in response time. It is to be noted
that ρ1, ρ2 and ρ3 do not depend on B. It is the shifting times
that depend on B. Hence, the response time for a given B can be
reduced by employing faster hardware for shifting of data.
Overall average response time vs. destination function γi.
Figures 6.15 and 6.16 show the effect of destination
functions on the overall average response time E(tq). In Figure
6.15, all output lines are assumed to have equal capacities.
Also, five different sets of destination functions have been
used. The destination function sets 1 and 2 represent random
distribution of data to the various output lines. Set 3
represents uniform distribution of data to the output lines. The
fourth set is such that half of all the data go to the output
line number 1. The output lines 2, 3, 4 and 5 receive only ten
percent of the data each. The rest of the lines receive only
two percent of the data. This is a biased destination function.
The fifth set again represents a biased destination function
with the output line number 2 receiving fifty percent of the
data. The capacities of all output lines are the same. It is
observed from Figure 6.15 that the overall average response
time is minimum for the uniform destination function. Also,
for the biased destination functions, the response times are
considerably higher than that for the uniform destination function
case. The input arrival rate is chosen such that the utilization
factor for each of the output lines is less than unity.
For Figure 6.16 the same sets of destination functions
and same values of other parameters are used except that in
this case the capacities of the output lines are given by
Si = 5λBγi. Here, the capacity of each output line is
proportional to the amount of data destined for it. Because of this,
the response time remains constant for all the destination
functions.
Overall average response time vs. output line speeds Si.
Figures 6.17 through 6.22 show the variation of the
overall average response time due to changes in the capacities of
the output lines. Three types of capacity assignments are
considered: uniform, proportional and square root. In the
uniform capacity assignment, the capacities of all the output
lines are the same (Si = λBa/M). In the proportional assignment,
each output line is given capacity proportional to the traffic
on it (Si = λBγia). In the square root capacity assignment,
every line is assigned minimum capacity equal to the traffic
expected on this line. Additional capacities are then assigned
to each line in proportion to the square root of the traffic
expected on that line. Figures 6.17 through 6.19 show the
response time for uniform destination functions (γi = .1 for
all i). With this destination function, identical response
times are obtained for all three types of capacity assignments
as shown in Figures 6.17 through 6.19. This is so because
with this destination function all three capacity assignments
result in the same capacity values for the output lines. In the
case when a = 1, i.e., the capacity assignment is equal to the
average traffic on a line, the response time is undefined as
one or more terms in equation (6.29) may be negative. It
is observed from these graphs that the response time decreases
as a increases, the decrease being sharper initially and more
sluggish for a > 5. Thus, after certain values of a, increasing
the line capacities may not reduce the response time
correspondingly. That means a point of diminishing return sets in.
These general comments apply to Figures 6.20 through 6.22
also. However, for these cases, the destination function is a
biased one and, hence, the response time does not have
exactly the same value for the three different capacity assignment
strategies.
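The three capacity assignment strategies described above can be compared with a short calculation. The sketch below assumes that all three strategies share the same total capacity, a times the total offered traffic λB, and that the square root strategy distributes the excess over the minimum in proportion to the square root of each line's traffic. The function name and this common-total normalization are assumptions made for illustration.

```python
import math


def capacity_assignments(lam, packet_bits, gammas, a):
    """Return (uniform, proportional, square_root) line capacities in bits/sec.

    lam: total arrival rate (packets/sec), packet_bits: packet size B,
    gammas: destination function (fractions summing to one),
    a: ratio of total capacity to total offered traffic (a > 1 assumed).
    """
    traffic = [lam * packet_bits * g for g in gammas]   # offered bits/sec per line
    total = a * sum(traffic)
    m = len(gammas)

    uniform = [total / m] * m
    proportional = [a * t for t in traffic]

    # Square root rule: minimum equal to the traffic, excess shared by sqrt(traffic).
    excess = total - sum(traffic)
    root_sum = sum(math.sqrt(t) for t in traffic)
    square_root = [t + excess * math.sqrt(t) / root_sum for t in traffic]
    return uniform, proportional, square_root
```

With a uniform destination function (γi = .1 for ten lines) the three lists coincide, which is consistent with the identical response times reported for that case; with a biased destination function they differ.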
(c) Effect of the Various Design Parameters
on the Average Queue Sizes
The number of packets waiting at the various
queues for various design parameters is shown in Figures 6.23
through 6.29.
Average queue sizes vs. λ, N, M, K, τ1, τ2, τ3, tp1, tp2 and tp3.
Figure 6.23 shows the variation in the average queue size
E(w1) with ρ1, the utilization factor at queue 1. This curve
is similar to that for E(tw1). This follows from Little's
formula which states that the average queue size = average
arrival rate x average time spent in the system. As ρ1
approaches unity, the queue size increases rapidly. However,
as queue 1 has the highest priority, the queue size is
rather small for ρ1 < .90.
Figure 6.24 shows the average queue size E(w2) as a
function of ρ2, the utilization factor at queue 2, for a number
of values of ρ1. For reasonable results ρ1 + ρ2 should be less
than unity. It is also seen from this figure that ρ1 has a
dominant effect on E(w2).
Figures 6.25 through 6.29 show how E(w3), the queue size
at the third queue, changes with ρ1, ρ2, ρ3 and K. Figures 6.25
and 6.26 show E(w3) for K = 10 and K = 50 respectively for given
values of ρ1, ρ2 and ρ3. It is seen that the expected queue
size E(w3) is somewhat higher for K = 50 than for K = 10. This
is due to the additional polling time necessary for finding the
stored packets. It appears that K = 10 is reasonable for ρ1 = .1.
However, it is seen from Figures 6.26 through 6.29 that E(w3)
increases rather quickly as ρ1 increases. Hence, for higher
values of ρ1, a larger value of K should be used and the
corresponding queue size be determined. For the purpose of this
report, K = 50 is used and the corresponding E(w3) are shown.
If a higher value of ρ1 is intended to be used, then a K larger
than 50 has to be used.
6.4 The Three Processor Design
6.4.1 Introduction
It appears from the proposed three processor architecture
and operation of the switch that queues build up in
the switch as shown in Figure 6.30. In this queueing model,
packets queue for service by the processors in three places.
Firstly, the arriving packets queue for inputting into the
shift register array. Secondly, these packets await the
routing service which includes header analysis, error analy-
sis, and separating the packets into software output queues.
Finally, these packets queue for outputting. All the packet
switch functions involve service by polling circuits in addi-
tion to processor service.
The average time spent by a packet in the switch (average
response time) is the sum of the waiting times and the service
times at the three queues. Next, expressions are derived for
the average waiting times, the average response times, and the
average queue sizes at the various queues.
6.4.2 Expressions for the Waiting Times at the Various
Queues and the Overall Average Response Time
The assumptions made for the queueing model for the
single processor design are also assumed here. Also, the
analytical developments used in section 6.3 are valid
here except that in the three processor design, each processor
is performing only one function. Hence, the average waiting
time at each queue depends on the corresponding utilization
factor only. Thus, the average waiting times at the routing
and the output queues depend on ρ3 and ρ2 respectively and
not on the other ρ's.
Following definitions and analytical developments
similar to those for the single processor design (cf. section
6.3), it can be shown that for the three processor design, the
overall average response time E(tq) is

$$
\begin{aligned}
E[t_q] &= E[t_{q1}] + E[t_{q2}] + E[t_{q3}] \\
&= E[t_{w1}] + t_{p1} + E[T_{t1}]
 + \sum_{i=1}^{M} \gamma_i \left[ E[t_{w2}] + t_{p2} + \frac{B}{S_i}
 + \frac{\gamma_i \left( \sum_{j=1}^{N} \lambda_j \right) \left( t_{p2} + \frac{B}{S_i} \right)^2}
        {2 \left( 1 - \gamma_i \lambda \left( E(t_{w2}) + t_{p2} + \frac{B}{S_i} \right) \right)} \right] \\
&\quad + E[t_{w3}] + t_{p3}
\end{aligned}
\tag{6.31}
$$

neglecting E[tw2] compared to tp2 + B/Si in the numerator of the
last term in brackets, where E(twj), j=1,2,3, the average waiting
times at the j-th queue, are given by [7,8]

$$
E(t_{wj}) = \frac{\lambda E[T_j^2]}{2(1 - \rho_j)}, \qquad j = 1,2,3
\tag{6.32}
$$

where E[Tj^2], j=1,2,3 and ρj, j=1,2,3 are given by equations
(6.6), (6.10), (6.14) and (6.7), (6.11), (6.15) respectively.
The difference between equations (6.16) and (6.32)
should be noted. Also, the polling times T1, T2 and T3
are assumed to be negligible, as polling at all three queues
is done by hardware in this case.
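As a rough numerical illustration of equation (6.32), the following Python sketch evaluates the average waiting time at one queue. It assumes a constant service time, so that E[Tj] = tpj and E[Tj^2] = tpj^2 with the polling time neglected; the function name and the sample numbers are illustrative assumptions and are not taken from the report.

```python
def mean_waiting_time(lam, mean_service, second_moment):
    """Average waiting time per equation (6.32): lam * E[T^2] / (2 * (1 - rho))."""
    rho = lam * mean_service  # utilization factor of this queue
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization factor must lie in [0, 1)")
    return lam * second_moment / (2.0 * (1.0 - rho))


# Illustrative values (assumed): 5e5 packets/sec, constant 1 microsecond service.
t_p = 1.0e-6
print(mean_waiting_time(5.0e5, t_p, t_p ** 2))
```

The waiting time grows without bound as the utilization factor approaches unity, which is why the sketch rejects values of ρ outside [0, 1).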
Equations (6.31) and (6.32) show the relationship of the
average response time for the packets to the various design
parameters of the switch, namely, the total arrival rate λ,
the number of input lines N, the number of output lines M,
packet size B, transmission rates of the output lines Si, the
processor times tp1, tp2 and tp3, and γi, the destination
function. This relationship can be used to study the effect
of variation in any of these parameters on the average response
time. In this respect, it is useful to draw graphs showing
the variation in the average response time as some or all of
these parameters are varied. Some graphs of this type are
presented in Figures 6.31 - 6.55.
6.4.3 Expressions for the Average Queue Sizes at the
Various Queues
Following the developments in section 6.3.6 for
the average queue sizes for the single processor design, it
can be shown that for the three processor design the average
number of packets waiting at the j-th queue [7,8] is

$$
E[W_j] = \rho_j + \frac{\lambda^2 E[T_j^2]}{2(1 - \rho_j)}, \qquad j = 1,2,3
\tag{6.33}
$$

where E[Tj^2], j=1,2,3 and ρj, j=1,2,3 are given by equations
(6.6), (6.10), (6.14) and (6.7), (6.11), (6.15) respectively.
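The relation between equation (6.33) and Little's formula can be checked numerically. The sketch below is a minimal illustration, assuming a constant service time (so E[T^2] = E[T]^2); the function names and sample values are hypothetical and not part of the report.

```python
def mean_queue_size(lam, mean_service, second_moment):
    """Average queue size per equation (6.33)."""
    rho = lam * mean_service
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization factor must lie in [0, 1)")
    return rho + lam ** 2 * second_moment / (2.0 * (1.0 - rho))


def queue_size_from_little(lam, mean_service, second_moment):
    """Little's formula: arrival rate x (waiting time + service time)."""
    rho = lam * mean_service
    waiting = lam * second_moment / (2.0 * (1.0 - rho))  # equation (6.32)
    return lam * (waiting + mean_service)


# Illustrative values (assumed): rho = 0.5 with a constant service time.
lam, t_p = 5.0e5, 1.0e-6
print(mean_queue_size(lam, t_p, t_p ** 2))
print(queue_size_from_little(lam, t_p, t_p ** 2))
```

Both functions return the same value, which is the sense in which the queue size curves resemble the waiting time curves.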
We are specifically interested in the queue size in
the shift register array. This shift register array stores
the packets that are waiting for the output function and the
routing function. Hence, the required average queue size is
E[W2] + E[W3]. A number of graphs showing the variation in
E(Wj), j=1,2,3 have been obtained from equation (6.33). These
graphs are shown in Figures 6.56-6.64. Further explanation
of these graphs is presented in section 6.4.4. These graphs
show the average queue sizes. However, we may be interested
in finding the queue size necessary for a given utilization
factor and probability of overflow. These results can be used to
obtain an approximate answer to this question. If the
utilization factor is about .6 and the probability of overflow
is 10^-3, then the required buffer size is approximately ten
times the average buffer occupancy. For smaller utilization
factors, the required buffer size is smaller still [9].
6.4.4 Interpretation of the Graphs Showing the Effect
of the Various Design Parameters on the Performance
of the Proposed Three Processor Packet Switch
(a) Effect of Contention on the Waiting Times and Queue
Sizes at the Various Queues
In the three processor design, the problem of
contention among the processors for using common resources has
been resolved as much as possible. However, possible contention
over the use of the output queue lists by the routing and the
output processors could not be totally removed. It appears
from Table 5.1 that the durations of the routing service
routines and the output service routine increase by two cycle
times each in the presence of contention over those in the
absence of contention. Early on we wanted to find out the
effect of contention on the average waiting times and the
average queue sizes at the three queues. The graphs in Figures
6.31 through 6.41 show the effect of contention on the average
waiting times and the average queue sizes. An examination and
comparison of the corresponding graphs with and without
contention show that the effect of contention on the average
waiting times and the average queue sizes at the routing and
output queues is negligible. The input queue, of course, is not
affected by contention. For this evaluation, two possible
situations have been considered: no contention and contention
at all times. The corresponding results give the lower and upper
bounds on the effect of contention. Results for other degrees of
contention lie in between these two limits.
(b) The Waiting Times at the Various Queues
Figure 6.42 shows the effect of the utilization
factors ρ1, ρ2 and ρ3 on the corresponding average waiting
times. The average waiting times increase as the corresponding
ρ increases. For values of ρ beyond .8, the waiting times
become very high, and they go to infinity for ρ equal to unity.
Actual values of these waiting times for a given value of ρ
differ due to the difference in the values of E(T1^2), E(T2^2)
and E(T3^2), which arises from the difference in the values of
tp1, tp2 and tp3 as noted on Figure 6.42. Figure 6.43 shows
similar effects on the average response times at the three
queues.
Average waiting times vs. clock cycle time of processor.
One of the objectives of this work has been to find out
the effect of the speed of the microprocessors on the performance
of the packet switch. For this purpose, graphs have been
obtained showing the effect of the processor clock cycle time
on E(tw1), E(tw2) and E(tw3), as shown in Figures 6.44, 6.45 and
6.46 respectively.
Eleven values of the clock cycle time have been considered.
It is seen from these graphs that the clock cycle time has a
prominent effect on the waiting times. An arrival rate of
λ = 8x10^4 packets/sec has been used in generating these graphs,
and the corresponding values of ρ1, ρ2 and ρ3, as obtained from
equations (6.7), (6.11) and (6.15) respectively, are shown on
these graphs. For the AMD 2900 bit slice microprocessor used in
the present design, the clock cycle time is approximately
120 ns. The corresponding values of E(tw1), E(tw2) and E(tw3)
are 80 ns, 150 ns and 166 ns respectively.
In the future, as more powerful microprocessors (with
smaller clock cycle times) become available, the corresponding
waiting times at the various queues can be obtained from these
graphs. Other arrival rates also can be used in obtaining
similar graphs, provided that the corresponding ρ's remain less
than unity.
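The dependence of the waiting time on the clock cycle time can be sketched as follows. This is only an illustration under stated assumptions: the service routine is taken to last a fixed number of clock cycles (15 is an assumed figure, not the value from Table 5.1), the service time is treated as constant, and the eleven cycle times are chosen arbitrarily between 20 ns and 220 ns. It is not expected to reproduce the values read from Figures 6.44 - 6.46.

```python
def waiting_time_vs_cycle(lam, cycles_per_service, cycle_times):
    """Return (cycle time, rho, E(tw)) for each clock cycle time, per equation (6.32)."""
    results = []
    for tau in cycle_times:
        t_p = cycles_per_service * tau   # service time grows with the cycle time
        rho = lam * t_p                  # utilization factor
        if rho >= 1.0:
            results.append((tau, rho, float("inf")))
        else:
            results.append((tau, rho, lam * t_p ** 2 / (2.0 * (1.0 - rho))))
    return results


LAM = 8.0e4      # packets/sec, the arrival rate quoted in the text
CYCLES = 15      # assumed number of clock cycles per service routine
taus = [n * 20e-9 for n in range(1, 12)]  # eleven assumed cycle times

for tau, rho, wait in waiting_time_vs_cycle(LAM, CYCLES, taus):
    print(f"{tau * 1e9:6.0f} ns   rho = {rho:.3f}   E(tw) = {wait * 1e9:8.1f} ns")
```

Because the service time enters the numerator squared, halving the cycle time reduces the waiting time by more than a factor of four.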
(c) The Overall Average Response Time
The effect of the various parameters on E(tq), the
overall average response time, is shown in Figures 6.47 through
6.55.
Overall average response time vs. packet size B
Figure 6.47 shows the effect of the packet size B on
the overall average response time E(tq). Four graphs, each
corresponding to a different set of (ρ1, ρ2, ρ3), are shown.
It is seen that in each case the overall average response time
increases at the same moderate rate as B goes from 1000 bits to
10,000 bits. This is a very useful result because the
throughput of the switch increases directly as B, whereas the
corresponding response time increases at a much slower rate.
Thus the throughput can be increased considerably without
suffering a severe penalty in response time. It is to be noted
that ρ1, ρ2 and ρ3 do not depend on B; it is the shifting times
that depend on B. Hence, the response time for a given B can be
reduced by employing faster hardware for shifting of data.
Overall average response time vs. destination function γi.
Figures 6.48 and 6.49 show the effect of destination
functions on the overall average response time E(tq). In Figure
6.48 all output lines are assumed to have equal capacities.
Also, five different sets of destination functions have been
used. The destination function sets 1 and 2 represent random
distribution of data to the various output lines. Set 3
represents uniform distribution of data to the output lines. The
fourth set is such that half of all the data go to the output
line number 1. The output lines 2, 3, 4 and 5 receive only
ten percent of the data each. The rest of the lines receive
only two percent of the data each. This is a biased destination
function. The fifth set again represents a biased destination
function, with the output line number 2 receiving fifty percent
of the data. It is observed from Figure 6.48 that the overall
average response time is minimum for the uniform destination
function. Also, for the biased destination functions the
response time is considerably higher than that for the uniform
destination function case. The input arrival rate is chosen
such that the utilization factor for each of the output lines
is less than unity.
For Figure 6.49 the same sets of destination functions
and the same values of the other parameters are used, except that
in this case the capacities of the output lines are given by
Si = 5λBγi. Here the capacity of each output line is
proportional to the amount of data destined for it. Because
of this, the response time remains constant for all the
destination functions.
Overall average response time vs. output line speeds Si
Figures 6.50 through 6.55 show the variation of the overall
average response time due to changes in the capacities of the
output lines. Three types of capacity assignments are
considered: uniform, proportional and square root. In the
uniform capacity assignment the capacities of all the output
lines are the same (Si = λBα/M). In the proportional assignment
each output line is given capacity proportional to the traffic
expected on it (Si = λBγiα). In the square root capacity
assignment every line is assigned a minimum capacity equal to
the traffic expected on this line. Additional capacities
are then assigned to each line in proportion to the square
root of the traffic expected on that line. Figures 6.50
through 6.52 show the response time for uniform destination
functions (γi = .1 for all i). With this destination function
identical response times are obtained for all three types
of capacity assignments, as shown in Figures 6.50 through 6.52.
This is so because with this destination function all three
capacity assignments result in the same capacity values for
the output lines. In the case α = 1, i.e., when the capacity
assigned is equal to the average traffic on a line, the
response time is undefined, as one or more terms in
equation 6.29 may be negative. Hence the values of response
time for 2 < α < 10 are shown in these graphs. It is
observed from these graphs that the response time decreases
as α increases, the decrease being sharper initially and
more sluggish for α > 5. Thus beyond a certain value of α,
increasing the line capacities may not reduce the response
time correspondingly; that is, a point of diminishing
returns sets in.
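The three capacity assignments described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the computation used for Figures 6.50 - 6.55: the uniform rule is taken as Si = λBα/M, the total capacity in the square root rule is assumed to be αλB (so that the spare capacity is (α - 1)λB), and the function names and sample parameters are hypothetical.

```python
import math


def uniform_capacity(lam, B, gammas, alpha):
    """Every output line receives the same capacity, alpha * lam * B / M."""
    M = len(gammas)
    return [alpha * lam * B / M for _ in gammas]


def proportional_capacity(lam, B, gammas, alpha):
    """Each line receives alpha times the traffic expected on it."""
    return [alpha * lam * B * g for g in gammas]


def square_root_capacity(lam, B, gammas, alpha):
    """Each line receives its expected traffic plus a share of the spare
    capacity proportional to the square root of that traffic."""
    total_traffic = lam * B
    spare = (alpha - 1.0) * total_traffic  # assumed: total capacity is alpha*lam*B
    norm = sum(math.sqrt(g) for g in gammas)
    return [total_traffic * g + spare * math.sqrt(g) / norm for g in gammas]


lam, B, alpha = 8.0e4, 10240, 5.0          # illustrative values (assumed)
uniform_dest = [0.1] * 10                  # gamma_i = .1 for all i
biased_dest = [0.5, 0.1, 0.1, 0.1, 0.1, 0.02, 0.02, 0.02, 0.02, 0.02]

for rule in (uniform_capacity, proportional_capacity, square_root_capacity):
    print(rule.__name__, rule(lam, B, uniform_dest, alpha)[:2],
          rule(lam, B, biased_dest, alpha)[:2])
```

With the uniform destination function the three rules give identical line capacities, which is consistent with the identical response times noted for Figures 6.50 through 6.52; with the biased function they differ.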
These general comments apply to Figures 6.53 through 6.55
also. However, for these cases the destination function is
a biased one and hence the response time does not have the
exact same value for the three different capacity assignment
strategies.
(d) The Effect of the Various Design Parameters
on the Average Queue Sizes
The number of packets waiting at the various
queues for various design parameters is shown in Figures
6.56 through 6.64.
Figure 6.56 shows the variation in the average queue
size E(w1) with ρ1, the utilization factor at queue 1.
This curve is similar to that for E(tw1). This
follows from Little's formula, which states that the average
queue size = average arrival rate x average time spent in
the system. As ρ1 approaches unity the queue size increases
rapidly. However, the queue size is rather small for ρ < .9.
Similar comments also apply to Figures 6.57 and 6.58, which
show the variation of average queue sizes at the routing and
the output queues respectively.
Figures 6.59 and 6.60 show the effect of varying M, the
number of output lines, on the average queue sizes at the
output queue for two proportional capacity assignments to
these lines. It follows from these graphs that even with
proportional capacity assignment the output queue size
increases with the number of output lines. This increase
is mainly due to the work involved in demultiplexing data to
so many lines, which may or may not be ready to receive data.
It is seen from Figures 6.61 and 6.62 that the average
queue size at the output queue does not increase much with
increase in the packet size. This is an encouraging result,
as the throughput can be increased by increasing packet size
without making the corresponding storage requirements too
high.
Figures 6.63 and 6.64 show that the queue size at the
output queue cannot be decreased much by using faster
processors. This is mainly because at the output queue the
major part of the service time is due to shifting time, and
many packets wait for the output buffers to be available
rather than for service by the processor itself. It also
appears from a comparison of Figures 6.63 and 6.64 that
increasing the capacities of the output lines makes the
queue size go down considerably.
6.5 The Multiple Processor Design
6.5.1 Introduction
In the multiple processor architecture queues
build up in the switch as shown in Figure 6.65. In this
queueing model every packet queues for service by appropriate
processors in four places. Firstly, an incoming packet queues
for service by one of the input processors for inputting
into the shift register array. Secondly, this packet awaits
service by one of the sorting processors, which assigns it to
one of the routing processors. The routing processor services
it by putting it into one of the output queues. Lastly, this
packet is serviced by one of the output processors. Each of
these services involves service by appropriate processors and
polling circuits. However, the hardware polling times are
negligible.
It is assumed that at every stage of the service, e.g.
at the input service, the total number of packets arriving
there for service is equally divided among the processors
performing that function. This assumption is physically
reasonable, as it ensures that all the processors are
equally busy. Operation of the multiple processor design
indicates that at each stage of service there are a number of
single server queues in parallel.
The average time spent by a packet in the switch
(average response time) is the sum of the waiting times and
the service times at the four queues.
Analytical expressions are derived next for the average
waiting times, the average response times and the average queue
sizes at the various queues.
6.5.2 Analytical Expressions for the Waiting Times
at the Various Queues and the Overall Average
Response Time
Assumptions made for the queueing model for the
single processor design are assumed here. Also, the
analytical developments used in section 6.3 are valid here
except that
i) there is no interdependence among the functions, as each
function is performed by a number of processors dedicated
to this function.
ii) the packet arrival rate to each processor assigned to
the j-th function is λj/Nj, where Nj is the number of
processors performing this function and λj is the
overall packet arrival rate for this service. In
normal operation λj = λ for j=1,2,3,4.
It should be noted that
j=1 → input function
j=2 → output function
j=3 → routing function
and j=4 → sorting function
Following analytical developments similar to those for
the single and three processor designs, it can be shown that
the average waiting times at the j-th queue are given by [7,8]

$$
E(t_{wj}) = \frac{\lambda E[T_j^2]}{2(1 - \rho_j)}
          = \frac{\lambda t_{pj}^2}{2 N_j \left( 1 - \frac{\lambda}{N_j} t_{pj} \right)},
\qquad j = 1,2,3,4
\tag{6.34}
$$

where

$$
E[T_j^2] = \text{mean square value of the service time} = t_{pj}^2
\tag{6.35}
$$

and

$$
\rho_j = \frac{\lambda_j}{N_j} E[T_j] = \frac{\lambda}{N_j} t_{pj}
\tag{6.36}
$$

(neglecting the polling times).
For the queueing analysis the following values of the
service times have been used.

tp1 = 15φ
tp2 = 19φ     (6.37)
tp3 = 9φ
tp4 = 13φ

These values differ slightly from the values shown in Table 5.2.
The values in Table 5.2 are the final refined values obtained
after the queueing models had been developed using the earlier
estimates of these quantities. However, the queueing results
will not be much different using the values in Table 5.2.
The average response times at these queues are

$$
E(t_{q1}) = E(t_{w1}) + t_{p1} + T_{t1} = E(t_{w1}) + 15\varphi + 2\,\text{ns} \times B
\tag{6.38}
$$

$$
E(t_{q2}) = \sum_{i=1}^{M} \gamma_i \left[ E(t_{w2}) + t_{p2} + \frac{B}{S_i}
 + \frac{\gamma_i \lambda \left( \frac{B}{S_i} + t_{p2} \right)^2}
        {2 \left( 1 - \gamma_i \lambda \left( E(t_{w2}) + t_{p2} + \frac{B}{S_i} \right) \right)} \right]
\tag{6.39}
$$

$$
E(t_{q3}) = E(t_{w3}) + t_{p3} = E(t_{w3}) + 9\varphi
\tag{6.40}
$$

$$
E(t_{q4}) = E(t_{w4}) + t_{p4} = E(t_{w4}) + 13\varphi
\tag{6.41}
$$

E(tq2) is obtained by following the development in section 6.3.5.
The overall average response time of the switch is

$$
E(t_q) = \sum_{j=1}^{4} E(t_{qj})
\tag{6.42}
$$

where E(tqj), j=1,2,3,4 are given by equations (6.38) through
(6.41) respectively.
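A compact way to see how the four stage response times combine is sketched below. It is only an illustration: the function names, the dictionary layout and all numerical values are assumptions, and the waiting times are supplied by the caller rather than computed from equation (6.34).

```python
def output_stage_response(lam, gammas, capacities, B, tw2, tp2):
    """Average response time at the output stage, following the bracketed
    sum over output lines in the output-stage response time expression."""
    total = 0.0
    for g, S in zip(gammas, capacities):
        service = tp2 + B / S                       # processor time + shifting time
        denom = 2.0 * (1.0 - g * lam * (tw2 + service))
        if denom <= 0.0:
            raise ValueError("an output line is overloaded")
        total += g * (tw2 + service + g * lam * service ** 2 / denom)
    return total


def overall_response(tw, tp, input_shift_time, eq2):
    """Sum of the four stage response times; tw and tp are dicts keyed by
    stage number (1 input, 3 routing, 4 sorting); eq2 is the output stage."""
    eq1 = tw[1] + tp[1] + input_shift_time
    eq3 = tw[3] + tp[3]
    eq4 = tw[4] + tp[4]
    return eq1 + eq2 + eq3 + eq4


# Illustrative numbers (all assumed): ten equally loaded lines, alpha = 5.
lam, B = 8.0e4, 10240
gammas = [0.1] * 10
caps = [5.0 * lam * B / 10] * 10
eq2 = output_stage_response(lam, gammas, caps, B, tw2=0.0, tp2=19 * 120e-9)
print(eq2)
```

The guard on the denominator reflects the requirement that the utilization of every output line stay below unity.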
Equations (6.34) through (6.42) show the relationship of
the average waiting and response times for the packets to the
various design parameters of the switch, namely, the total
arrival rate λ, the number of input lines N, the number of
output lines M, packet size B, transmission rates of the output
lines Si, the processor times tp1, tp2, tp3 and tp4, the
destination function γi, and Nj, the number of processors at
the various queues. These relationships can be used to study
the effect of variation in any of these parameters on the
performance of the switch. In this respect, it is useful to
draw graphs showing the variation in the average waiting and
response times as some or all of these parameters are varied.
Some graphs of this type are presented in Figures 6.66 - 6.79.
The aim here is to see how the waiting times and response
times vary as the number of processors at every service stage
is varied. Hence these figures show families of graphs with the
number of processors as a parameter. The effect of variation of
the other parameters should be similar to that shown for the
single and three processor designs.
6.5.3 Expressions for the Average Queue Sizes at the
Various Queues
Following the developments in section 6.4.3 for
the average queue sizes for the three processor design, it
can be shown that for the multiple processor design the
average number of packets waiting at the j-th queue [7,8] is

$$
E[W_j] = \rho_j + \frac{\lambda^2 E[T_j^2]}{2(1 - \rho_j)}
       = \frac{\lambda}{N_j} t_{pj}
       + \frac{\left( \frac{\lambda}{N_j} \right)^2 t_{pj}^2}
              {2 \left( 1 - \frac{\lambda}{N_j} t_{pj} \right)},
\qquad j = 1,2,3,4.
\tag{6.43}
$$
We are specifically interested in the queue size in
the shift register array. This shift register array stores
the packets that are waiting for the output, sorting and
routing functions. Hence, the required average queue size is
E[W2] + E[W3] + E[W4]. A number of graphs showing the variation
in E(Wj), j=1,2,3,4 have been obtained from equation (6.43).
These graphs are shown in Figures 6.80 - 6.83. Further
explanation of these graphs is presented in section 6.5.4.
These graphs show the average queue sizes. However, we may
be interested in finding the queue size necessary for a given
utilization factor and probability of overflow. These results
can be used to obtain an approximate answer to this question.
If the utilization factor is about .6 and the probability of
overflow is 10^-3, then the required buffer size is approximately
ten times the average buffer occupancy. For smaller utilization
factors, the required buffer size is smaller still [9].
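The queue size expression for the multiple processor design and the tenfold buffer rule of thumb quoted above can be combined in a short sketch. The function names and sample values are assumptions for illustration; the factor of ten is the approximate figure the text gives for a utilization factor of about .6 and an overflow probability of 10^-3, and is not a general result.

```python
def mean_queue_size_multi(lam, t_p, n_proc):
    """Average queue size at one processor of a stage: rho + rho^2 / (2(1 - rho)),
    where rho is the per-processor utilization (lam / n_proc) * t_p."""
    rho = lam * t_p / n_proc
    if rho >= 1.0:
        return float("inf")
    return rho + rho ** 2 / (2.0 * (1.0 - rho))


def buffer_estimate(avg_occupancy, factor=10.0):
    """Rule of thumb from the text: about ten times the average occupancy
    at a utilization near .6 and an overflow probability of 10^-3."""
    return factor * avg_occupancy


# Illustrative values (assumed): 6e5 packets/sec, 1 microsecond service time.
for n in (1, 2, 3, 4):
    avg = mean_queue_size_multi(6.0e5, 1.0e-6, n)
    print(n, avg, buffer_estimate(avg))
```

The average queue size falls as processors are added, so the buffer estimate derived from it falls as well.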
6.5.4 Interpretation of the Graphs Showing the Effect
of the Various Design Parameters on the Performance
of the Proposed Multiple Processor Packet Switch
The major aim of the analysis is to see how the
average waiting times at the various queues vary for a given
overall arrival rate as the number of processors at these queues
is varied. Figures 6.66 - 6.68 show the effect of varying
the number of input processors on the average waiting time at
the input queue. These graphs also show the effect on the average
waiting time of varying the overall packet arrival rate for a
given number of input processors. These three figures differ
in the maximum value of λ, the packet arrival rate, that is
allowed. Maximum packet arrival rates of 2x10^6, 2x10^7 and
5x10^7 packets/sec have been used in Figures 6.66 - 6.68
respectively. The rationale for using these three maximum
values of λ is the following. For λmax = 2x10^6 one can
observe clearly how the average waiting time varies for a
single input processor. However, the effect is not at all
clear for higher numbers of input processors. The
use of λmax = 2x10^7 and 5x10^7 shows the effect on the average
waiting time of varying the number of input processors.
For the same reason three values of λmax have also been
used for the sorting, the routing and the output queues.
Figures 6.69 - 6.71 show the effect of varying the
overall packet arrival rate on the average waiting time
at the output queue. These graphs also show the effect
on the average waiting time of varying the number of output
processors. Similar results are shown in Figures 6.72 - 6.74
and Figures 6.75 - 6.77 for the routing and the sorting
queues respectively.
Figure 6.78 shows the effect of varying B, the packet
size, on the overall average response time for a fixed number
of processors. The overall average response time increases
slightly as B increases.
Figure 6.79 shows the effect of destination functions on
the overall average response time E(tq). In this figure all
output lines are assumed to have equal capacities. Also,
five different sets of destination functions have been used.
The destination function sets 1 and 2 represent random
distribution of data to the various output lines. Set 3
represents uniform distribution of data to the output lines.
The fourth set is such that half of all the data go to the
output line number 1. The output lines 2, 3, 4 and 5 receive
only ten percent of the data each. The rest of the lines
receive only two percent of the data each. This is a biased
destination function. The fifth set again represents a
biased destination function, with the output line number 2
receiving fifty percent of the data.
It is observed from Figure 6.79 that in the case of the
multiple processor design E(tq) is almost constant for
all the sets of destination functions. One explanation
is that in the multiple processor design any output
line with more packets destined for it may be provided
with a dedicated processor. Also, since the output lines are
slower than the processor, no large queue will build up. Of
course, the capacity of the output lines should be high enough
to absorb the packets destined for them. It is to be noted
that in Figure 6.79, the ratio of maximum packet arrival rate
to the line capacity is .5 M AB/AB8. 8. Thus the channel
capacity is large enough to handle the packet arrival rates
for even the line with destination function of .5.
Finally, Figures 6.80 - 6.83 show the effect of variation
in the number of processors on the average queue sizes for a
given packet arrival rate at the input, output, routing and
sorting queues respectively. These figures also show
the effect on the average queue sizes of varying the packet
arrival rate for a fixed number of processors at the
corresponding queues. It should be noted that the queue sizes
decrease as the number of processors increases at the various
queues.
6.6 Conclusions
Queue theoretic models have been developed for all
three proposed architectures. Graphs showing the average
waiting times, the overall average response times and average
queue sizes as functions of various design parameters have
been obtained. It is observed from these graphs that in most
cases the average waiting times and the average queue sizes
are reasonable. The overall response times and the queue
sizes are much smaller in the three processor case than in
the single processor case. These quantities are further
reduced in the multiple processor case, although not
proportionately.
The main incentive for using multiple processors is to
increase the throughput. However, the response times and the
queue sizes (the storage requirement) are also reduced in the
process. Thus it seems that the multiple processor design is
the one to be used.
7.0 Summary
7.1 Suggestions for Future Work
Several suggestions for future work in the area of
processor-controlled packet switches are presented in [2].
An additional feature which is possible in the multiprocessor
architectures is the transmission of system status data to
each user. This scheme would require an additional processor
to monitor the system status. This
processor could monitor the status of ELIST, the Output Queue
Lists, and important system hardware. If this processor
discovered a hardware failure, a nearly empty ELIST or a nearly
filled queue list, it could generate a packet-length message
that would inform the users of the system problems. This
processor would be required to inform the Output Processor to
send a system status data packet to each user. Using the
received status information, users could re-route messages
around nonfunctioning channels, reduce their overall
throughput, or reduce their throughput to a specific user to
avoid packet losses.
Any system enhancements will be paid for in terms of
throughput and/or the number of required processors.
7.2 System Throughput
All three packet switch architectures are capable of handling
large system throughputs as shown in the following examples.
7.2.1 Single Processor Packet Switch
Fp < 1.5 x 10^5 packets/sec.
Using a packet length of 10,240 bits, the maximum bit
rate for the system is
FB = Fp x B < (1.5 x 10^5) x (10,240) bits/sec.
FB < 1.5 x 10^9 bits/sec.
7.2.2 Three Processor Packet Switch
Fp < 5.21 x 10^5 packets/sec.
Using the packet length of 10,240 bits, the bit rate for
this system is
FB = Fp x B < (5.21 x 10^5) x (10,240) bits/second
FB < 5.33 x 10^9 bits/second
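The arithmetic behind these two bounds is a single multiplication, FB = Fp x B. The short sketch below reproduces it; the function name is a hypothetical one chosen for illustration, while the packet rates and the packet length are the values quoted in the text.

```python
def max_bit_rate(packet_rate, packet_bits):
    """Upper bound on the bit rate: packets/sec times bits per packet."""
    return packet_rate * packet_bits


B = 10_240  # packet length in bits

single = max_bit_rate(1.5e5, B)    # single processor packet switch
three = max_bit_rate(5.21e5, B)    # three processor packet switch

print(f"single processor: {single:.3e} bits/sec")
print(f"three processor:  {three:.3e} bits/sec")
```

The ratio of the two bounds equals the ratio of the packet rates, since the packet length is the same in both cases.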
7.2.3 Multiple Processor Packet Switch
Example System
FB < 30 x 10^9 bits/second
N = 10 users
B = 10,240 bits
Packet Throughput Requirement for this system
Fp = FB/B < (30 x 10^9/10,240) packets/second
Fp < 2.93 x 10^6 packets/second
Since the throughput is limited by Equation 5.9, which
states that the packet throughput is limited by the number of
users, the value calculated above may not be obtainable for this
system. An evaluation must be made. Since
Fp < 2.93 x 10^6 < (1/tp4)N = 4.39 x 10^6
is true, the proposed system can be built to handle the desired
bit rate. By using Equation 5.8 for each class of processors,
the total number of processors required for this system is
determined. Twenty-one processors are needed: five Input
Processors, five Sorting Processors, four Routing Processors and
seven Output Processors. This system using twenty-one
processors will provide a bit rate of 30 x 10^9 bits/second. As
shown above in the evaluation using Equation 5.9, this throughput
is not the maximum obtainable bit rate. Thus, if additional
processors were implemented, a larger throughput could be
provided to the ten users.
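The feasibility check for the example system can be retraced with a few lines of Python. This is a sketch of the arithmetic only: the limit of 4.39 x 10^6 packets/second is copied from the text rather than recomputed from Equation 5.9, the processor counts are those stated above rather than derived from Equation 5.8, and the variable names are illustrative.

```python
def required_packet_rate(bit_rate, packet_bits):
    """Packet throughput needed to carry a given bit rate."""
    return bit_rate / packet_bits


BIT_RATE = 30e9          # desired bit rate, bits/second
PACKET_BITS = 10_240     # packet length, bits
USER_LIMIT = 4.39e6      # (1/tp4) * N for N = 10 users, as quoted in the text

fp = required_packet_rate(BIT_RATE, PACKET_BITS)
feasible = fp < USER_LIMIT

# Processor counts as stated in the text (not derived here).
processors = {"input": 5, "sorting": 5, "routing": 4, "output": 7}

print(f"required packet rate: {fp:.3e} packets/second")
print("within the limit set by the number of users:", feasible)
print("total processors:", sum(processors.values()))
```

The margin between the required rate and the quoted limit is what leaves room for a larger throughput if processors are added.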
The cost of achieving these large throughputs is paid for
in terms of the number of processors required, the width of
the Microprogram ROM and the special purpose hardware and
software required to deal with contention problems. The major
trade-off in both designs is that a reduction in the software
executions is paid for in hardware complexity. Two prime
examples of this type of trade-off are the use of hardware
pollers and the large number of microprogram control bits,
which enable the execution of concurrent tasks.
7.3 Queue Theoretic Results
Queue theoretic models have been developed for all
three proposed architectures. Graphs showing the average
waiting times, the overall average response times and average
queue sizes as functions of various design parameters have
been obtained. It is observed from these graphs that in most
cases the average waiting times and the average queue sizes
are reasonable. The overall response times and the queue
sizes are much smaller in the three processor case than in
the single processor case. These quantities are further
reduced in the multiple processor case, although not
proportionately.
The main incentive for using multiple processors is to
increase the throughput. However, the response times and the
queue sizes (the storage requirement) are also reduced in the
process. Thus it seems that the multiple processor design is
the one to be used.
The major contribution of this work to the area of digital
communications is the design of efficient multiprocessor packet
switches which can provide large throughputs, special functions
and flexibility not available in non-programmable systems. The
overall performance of these packet switches will improve as
faster hardware and processors become available.
REFERENCES
1. Roberts, Lawrence G., "The Evolution of Packet Switching,"
Proceedings of the IEEE, Vol. 66, No. 11, pp. 1307-1312,
November 1978.
2. "Design of a Microprocessor Based High Speed Space Borne
Message Switch," Annual Report to NASA on Grant No. NSG-
3191, Clarkson College of Technology, Potsdam, N.Y.,
April 1979.
3. Burnell, James F., "The Design of a Microprocessor-Based
High Speed Packet Switch," M.E. Thesis, Clarkson College
of Technology, Potsdam, N.Y., August 1979.
4. Russo, Paul M., "Interprocessor Communication for Multi-
Microcomputer Systems," IEEE Computer Magazine, pp. 6
76, April 1977.
5. Madnick, Stuart E. and Donovan, John J., Operating Systems,
McGraw-Hill, Inc., New York, N.Y., 1974.
6. Advanced Micro Devices Inc., "The AM 2900 Family Data
Book," 1978.
7. Kleinrock, L., Queueing Systems, Vol. 2, John Wiley &
Sons, New York, 1975, Ch. 3.
8. Martin, James, Systems Analysis for Data Transmission,
Prentice Hall, New York, 1972, Ch. 31.
9. Schwartz, M., Computer-Communication Network Design and
Analysis, Prentice Hall, New York, 1977, Ch. 7.
Fig. 6.1. The Queuing Model
Legend:
O1	 The input queue with
priority 1.
O2	 The output queue with
priority 2.
O3	 The queue for background
service with priority 3.
Fig. 6.2. The Modified Queuing
Model.
[Figures on pages 238 through 318 are illegible in the microfiche reproduction.]
APPENDIX A
INPUT SERVICE ROUTINE MICROCODE
[Microcode listing on pages 320 through 323 is illegible in the microfiche reproduction.]
APPENDIX B
PROCESSOR-CONTROLLED ELIST
[Figures on pages 325 through 327 are illegible in the microfiche reproduction.]
Fig. B4. ELIST Support Processor
Fig. B5. ELIST Service Routine Flowchart
ELIST:  If DAV = 0, JMP to DAC
            *Is there a Shift Register number to input? NO: Jump to DAC
             Routine.
        (Data Port)@Input Polling Circuit -> Q
            *YES: Input the data from the selected port.
        If DAC = 0, JMP to STORE; SEND A DAC; RELEASE Input Port Poller
            *Send a DAC to the input port and clear the Input Port Poller.
             Meanwhile, check to see if any output port requires new data.
             If none do, jump to the STORE Routine.
        ELIST OUTPUT PORT BASE ADDRESS -> ADDRESS LATCH
        Q -> Selected Output Port
            *If a port requires data, enable it onto the data bus and send
             the data.
        SEND A DAV; RELEASE Output Port Poller; JMP to ELIST
DAC:    If DAC = 0, JMP to ELIST
            *Is there an output port requesting service? NO: JMP to ELIST
             Routine.
        ELIST BASE ADDRESS -> ADDRESS LATCH
        (ELIST)@EPTR -> Q
            *YES: Fetch a S.R.# from ELIST
        ELIST OUTPUT PORT BASE ADDRESS -> ADDRESS LATCH; Decrement EPTR
        Q -> Selected Output Port
            *SEND DATA and update EPTR
        SEND A DAV; RELEASE Output Port Poller; JMP to ELIST
STORE:  ELIST BASE ADDRESS -> ADDRESS LATCH; Increment EPTR
        Q -> (ELIST)@EPTR; JMP to ELIST
            *STORE the data in ELIST
Fig. B6. ELIST SERVICE ROUTINE
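The service loop of Fig. B6 can be paraphrased in software. The sketch below is a behavioral model in Python, not the microcode itself. `ElistProcessor` and `service` are hypothetical names, ELIST is modeled as a Python list used as a stack, and one cycle per microinstruction line of the listing is assumed.

```python
class ElistProcessor:
    """Toy model of the ELIST support processor's service loop.

    ELIST is treated as a stack of free shift-register (S.R.) numbers held
    in RAM, with EPTR pointing at the top entry. Cycle counts assume one
    cycle per microinstruction line of the listing (an assumption).
    """

    def __init__(self, free_numbers=()):
        self.elist = list(free_numbers)  # EPTR corresponds to len(elist) - 1

    def service(self, incoming=None, output_requesting=False):
        """Run one pass of the loop.

        incoming: S.R. number offered by an input port (DAV set), or None.
        output_requesting: True if an output port wants an S.R. number.
        Returns (number sent to the output port or None, cycles used).
        """
        cycles = 1                      # test DAV
        if incoming is not None:
            q = incoming                # (Data Port) -> Q
            cycles += 2                 # read port; test DAC and acknowledge
            if output_requesting:
                cycles += 3             # latch address; Q -> port; send DAV
                return q, cycles        # passed straight through, RAM untouched
            self.elist.append(q)        # STORE: increment EPTR, Q -> (ELIST)@EPTR
            cycles += 2
            return None, cycles
        cycles += 1                     # DAC routine: test DAC
        if not output_requesting:
            return None, cycles         # nothing available, nothing required
        # The listing shows no underflow check, so ELIST is assumed non-empty.
        q = self.elist.pop()            # (ELIST)@EPTR -> Q, then decrement EPTR
        cycles += 5
        return q, cycles


proc = ElistProcessor(free_numbers=[3, 7])
results = [
    proc.service(incoming=5, output_requesting=True),  # number passed through
    proc.service(output_requesting=True),              # number fetched from RAM
    proc.service(incoming=9),                          # number stored in RAM
    proc.service(),                                    # idle pass
]
```

Under these assumptions the four paths take 6, 7, 5, and 2 cycles respectively.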
ELIST Service Routine

    Case                                        Cycles    Time
    a) S.R.# available as well as requested       6       0.72 µSec
    b) S.R.# required from RAM                    7       0.84 µSec
    c) S.R.# stored in RAM                        5       0.60 µSec
    d) No data available or required              2       0.24 µSec

Fig. B7. Software Execution Times
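The cycle counts and times in Fig. B7 imply a microinstruction cycle of 0.12 µSec (0.72 µSec / 6 cycles). The short check below reproduces the four times from that cycle time. The worst-case pass rate at the end is a derived figure that the report does not state, and the dictionary keys are labels chosen for this sketch.

```python
# Cycle counts per path, as listed in Fig. B7.
CYCLES = {
    "a: available and requested": 6,
    "b: required from RAM": 7,
    "c: stored in RAM": 5,
    "d: nothing to do": 2,
}

# Cycle time implied by the table: 0.72 uSec / 6 cycles = 120 ns.
CYCLE_TIME_NS = 120

# Execution time of each path in nanoseconds (integers avoid rounding noise).
times_ns = {case: n * CYCLE_TIME_NS for case, n in CYCLES.items()}

# Derived, not from the report: passes per second if every pass took the
# longest path (7 cycles).
worst_case_ns = max(times_ns.values())
worst_case_passes_per_sec = 1_000_000_000 // worst_case_ns

for case, t in times_ns.items():
    print(f"{case}: {t / 1000:.2f} uSec")
```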
