Study of spaceborne multiprocessing - Phase 2 Volume 2 - Technical description  Final report by Koczela, L. J.
NASA
CONTRACTOR
REPORT
NASA CR-1159
STUDY OF SPACEBORNE
MULTIPROCESSING -- PHASE II
Volume II -- Technical Description
by L. J. Koczela
Prepared by
NORTH AMERICAN ROCKWELL CORPORATION
Anaheim, Calif.
for Electronics Research Center
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION • WASHINGTON, D. C. • SEPTEMBER 1968
https://ntrs.nasa.gov/search.jsp?R=19680026088 2020-03-12T06:50:31+00:00Z
NASA CR- 1159
STUDY OF SPACEBORNE MULTIPROCESSING -- PHASE II
Volume II -- Technical Description
By L. J. Koczela
Distribution of this report is provided in the interest of
information exchange. Responsibility for the contents
resides in the author or organization that prepared it.
Issued by Originator as Report No. C6-1476.22/33
Prepared under Contract No. NAS 12-108 by
NORTH AMERICAN ROCKWELL CORPORATION
Anaheim, Calif.
for Electronics Research Center
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION
For sale by the Clearinghouse for Federal Scientific and Technical Information
Springfield, Virginia 22151 - CFSTI price $3.00
FOREWORD
This final report describes the results of the Phase II portion of a study con-
ducted under NASA contract NAS 12-108, "Spaceborne Multiprocessor Study". It was
performed by Autonetics, a division of the Aerospace and Systems Group of the North
American Rockwell Corporation. The work was administered under the direction of
the National Aeronautics and Space Administration, Electronics Research Center,
Computer Research Laboratories, Cambridge, Massachusetts. The NASA project
manager was Mr. G. Y. Wang.
The contract participants during this phase and their primary contributions are
listed below:
L. J. Koczela - Principal Investigator, Requirements, Parallelism,
    Input/Output, Failure Detection and Reconfiguration, Cell and
    Group Switch Design, Communication Bus, Reliability Simulation,
    RF Communications
P. N. Bogue - Group Architecture, Executive Design, Example Program,
    RF Communications
G. J. Burnett - Computer Structures, Cell Design
T. T. Carter - Reliability Simulation
H. W. Copenhagen - Reliability Simulation
J. A. Luisi - Technology
J. R. Macy - RF Communications
W. E. Meyer - RF Communications
D. M. Motley - RF Communications
This report is being published as Autonetics document number C6-1476.22/33.
CONTENTS
Foreword
Summary
1. Introduction
2. Requirements
    2.1 Missions
    2.2 Failure Detection and Reconfiguration Time
    2.3 Computational Requirements
3. Technology
    3.1 LSI Today
    3.2 LSI in Two Years
    3.3 LSI in Ten Years
4. Parallelism
    4.1 Introduction
    4.2 Assignment and Sequencing Parallel Operations
    4.3 Application of Parallelism Studies
    4.4 Results of Parallelism Studies
5. Computer Structures
    5.1 Development of Computer System Architecture
    5.2 Operation of the Distributed Array Memory and Processor System
    5.3 Communication Structures
6. Group Architecture
    6.1 Introduction
    6.2 Cell States
    6.3 Cell Identification
    6.4 Source of Instructions
    6.5 Sources of Addresses
    6.6 Sources of Data
    6.7 Execution of Instructions
    6.8 Neighbor to Neighbor Communications
    6.9 Additional Topics
7. Input/Output
    7.1 Input/Output Operation
    7.2 I/O Mechanization
8. Failure Detection and Reconfiguration
    8.1 Failure Detection Methods
    8.2 Reconfiguration
    8.3 Backup Equipment Assurance
    8.4 External Status Reports
9. Cell and Group Switch Design
    9.1 Processor Features of the Cell
    9.2 Functional Description of the Cell
    9.3 Group Switch
10. Communication Bus
    10.1 Inter-Cell Bus
    10.2 Inter-Cell/Inter-Group Communications
    10.3 Mechanization of Inter-Cell Communication Bus Commands and Control Words
    10.4 Mechanization of Inter-Cell Communication Bus Operations in a Cell
11. Executive Design
    11.1 Introduction
    11.2 Group Executive
    11.3 System Executive
    11.4 Cell Executive
12. Reliability Simulation
    12.1 Monte Carlo Method
    12.2 Simulation Models
    12.3 Simulation Results
    12.4 Summary and Conclusions
13. Summary and Recommendations
Appendix A. Example Program
    A.1 Introduction
    A.2 Example 1
    A.3 Example 2
Appendix B. Error Analysis of Monte Carlo Program
Appendix C. Monte Carlo Simulation Results
Appendix D. Communications
    D.1 Introduction
    D.2 Communication Requirements
    D.3 RF Circuit Technology
    D.4 RF Communication Systems
    D.5 Computer Organizational Considerations
    D.6 Conclusion
Appendix E. Error Control in the Communications System
Appendix F. Alternative Distributed Organizations
References
Bibliography
ILLUSTRATIONS
Figure
1-1 Distributed Processor Organization
2-1 Missions
2-2 Experiment Location Tree
2-3 Scientific Experiment Computational Functions
2-4 Interplanetary Lander Mission Storage Requirements
2-5 Interplanetary Lander Mission Speed Requirements
2-6 Interplanetary Flyby Mission Storage Requirements
2-7 Interplanetary Flyby Mission Speed Requirements
2-8 Computer Storage Requirements - Manned-Mars Lander Mission
2-9 Computer Speed Requirements - Manned-Mars Lander Mission
3-1 SOS Complementary MOS Memory Array
3-2 Packaging Concept
3-3 Layout of Distributed Processor
4-1 Sequential Steps in Computation
4-2 Applied Parallelism in the Computation
4-3 Applied and Natural Parallelism in the Computation
4-4 Sequential Solution of Computational Problem
4-5 Utilization of Applied Parallelism in the Computational Problem
4-6 Utilization of Applied and Natural Parallelism in the Computational Problem
4-7 Assignment and Sequencing of Tasks in the Applied and Natural Parallelism
4-8 Degree of Parallelism Utilized
4-9 Degree of Parallelism vs. Computation Reduction Ratio
4-10 Degree of Parallelism vs. Storage Required Per Cell
4-11 Computational Problem Assignment Among Groups
4-12 Degree of Parallelism vs. Storage Required Per Cell Using Groups of Cells
4-13 Degree of Parallelism vs. Number of Groups
4-14 Assignment of Tasks
4-15 Storage Graph/Cell
4-16 Applied Parallelism Speed Curve
4-17 Applied Parallelism Speed Curve
4-18 Natural Parallelism Speed Curve
4-19 Natural Parallelism Storage Curve
5-1 Structure of the Cob Web Array (From Ref. 24)
5-2 Holland Machine
5-3 Solomon Machine
5-4 General Cell Block Diagram
5-5 Distributed Processor Organization
5-6 Full Intercommunication Distributed Processor
5-7 Arbitor Communication Structure
6-1 Bus Operation Example
7-1 I/O Structure
7-2 Intergroup Bus I/O Scheme
7-3 Two Methods of Intergroup Bus I/O
7-4 Intercell Bus I/O
7-5 Selected I/O Approach
7-6 I/O Using Neighbor Lines
7-7 Communications Lines to a Cell
7-8 Request I/O System
8-1 Active Redundant Test Cells Within a Cell Group
8-2 Output Switching of Critical Conditioners
9-1 12-Bit Instruction Word
9-2 Use of Two B/T Bits
9-3 14-Bit Instruction Word
9-4 Use of Three B/T Bits
9-5 16-Bit Instruction Words
9-6 18-Bit Instruction Word
9-7 Byte Instruction Word
9-8 General Cell Block Diagram
9-9 Detailed Cell Block Diagram
9-10 Real Time Clock
9-11 Neighbor-to-Neighbor Communication
9-12 Basic Memory Cell Utilizing Complementary MOS Transistors Without Selection or Readout Provisions
9-13 Logical Operation of a Coincident Select Memory Cell
9-14 Organization of a Coincident Select Bit Plane Array
9-15 Connection of 6 Arrays to Form a 512 Word, 16 Bit, Memory
9-16 Group Switch Block Diagram
11-1 Link List for RTC Requests
11-2 Periodic Control of System Resources by Real Time Clock
11-3 Processing a Cell's Request
11-4 Cell Poll Executive Routine
11-5 Executive Flow Chart
11-6 Preliminary Time Schedule, Inter-Cell Bus
11-7 Constructing the Bus Schedule
11-8 Shifting the Bus Schedule
11-10 Splitting the Periodic Use of the Bus
11-11 Cell Assignment Procedure
11-12 Software Test and Reconfiguration Routines
11-13 Machine Error Routine
11-14 Inter-Cell Bus I/O Routine
11-15 Inter-Cell Bus I/O Complete Routine
11-16 Cell Executive Real Time Clock Routine
12-1 Block Diagram of Monte Carlo Simulation
12-2 Mars Orbital Configuration
12-3 Non-Critical Phase Configuration
12-4 Critical Phase Configuration
12-5 Example of Computer Print Out
12-6 Average Availability vs Number of Spare Cells Per Group
12-7 Average Availability vs Cell MTBF Varying Number of Spare Cells
12-8 Average Availability vs Cell MTBF Varying Number of Inter-Group Busses
12-9 Average Availability vs Cell MTBF Varying Number of Inter-Cell Busses
12-10 Average Availability vs MTBF OFF/MTBF ON
12-11 Average Availability vs % of Failures Affecting Busses
12-12 Percent Unavailability vs Mission Time for N=0, K=0, Y=0
12-13 Percent Unavailability vs Mission Time for N=0, K=1, Y=0
12-14 Percent Unavailability vs Mission Time for N=1, K=0, Y=0
12-15 Percent Unavailability vs Mission Time for N=1, K=1, Y=0
12-16 Probability of Success vs MTBF OFF/MTBF ON
12-17 Cell and Group Switch Failures vs Cell MTBF
12-18 Cells with External Connections Failing vs Cell MTBF
12-19 Inter-Cell and Inter-Group Bus Failures vs Cell MTBF
A-1 Navigation Program Flow Diagram
A-2 Example 1, Inter-Cell Bus Requirements
A-3 Example 2, Program Allocation to Cells
A-4 Example 2, Inter-Cell Bus Usage
A-5 Example 2, Timing
B-1 Error vs Ps in Monte Carlo Results
D-1 Dielectric Effects on Size
D-2 Solid-State RF Power Oscillators
D-3 Signal/Crosstalk Ratio Plot
D-4 Intra-Computer Communication System Simplified Block Diagram (Single Channel)
D-5 Probability of Error as a Function of Data, Threshold Level and Signal-to-Noise Ratio
D-6 Theoretical Performance of ON-OFF Amplitude Modulation with Envelope Detection in White Gaussian Noise
D-7 Simple Time Multiplex
D-8 Simple Frequency Multiplex
D-9 Detailed Block Diagram of Simple Frequency Multiplex
D-10 Full Frequency-Simple Time Multiplex
D-11 Construction of New Cell
D-12 Time Frequency Spectrum
D-13 Multiple Dependent Program Sources
F-1 Decentralized DAMP System
TABLES
Table
2-1 Mars Planetary Mission Computation Requirements
3-1 Integrated Equipment Components - 1966
3-2 Extrapolations to Distributed Processor
4-1 Applied Parallelism Results
4-2 Natural Parallelism Results
6-1 Cell States
6-2 Instruction Categories
6-3 Summary of Instruction Execution
6-4 CC Instructions
6-5 GC Instructions
6-6 GC-CC Instruction Execution
6-7 GC Formats
6-8 CC Transmitted Instructions
10-1 Communication Bus Operations
10-2 Communication Bus Commands
10-3 Inter-Cell Bus Group Switch Commands
10-4 Intergroup Bus Group Switch Commands
11-1 Control Program Data
11-2 Hardware Resources Table
11-3 Software Requirements
11-4 Program Requirements
12-1 Mission Phases
12-2 Simulation Results
A-1 Example 1 Storage Estimates
A-2 Relative Times Used in Programming Examples
A-3 Communication Sequence Timing
A-4 Subroutine Timing Requirements
A-5 Program for 1.2.4.1 (Ref 2)
A-6 Program for 1.2.4.2 Through 1.2.4.5 (Ref 2)
A-7 Controller Cell Program to Generate Sequence One
A-8 First Sequence Received by CX, CY, CZ
A-9 Neighbor to Neighbor Communication
A-10 Second Sequence Received by CX, CY, CZ
A-11 Third Sequence Received by CX, CY, CZ
A-12 Fourth Sequence Received by CX, CY, CZ
A-13 Example 2 Storage Estimates
SUMMARY
This final report presents the results of a research study of an advanced multi-
processor computer organization for future space missions. The organization is
entitled the "Distributed Array Memory and Processor" computer. A manned Mars
lander mission was selected as a representative mission for application and the
computer requirements were developed. The organization developed utilizes an array
of cells capable of a high degree of computational parallelism. The feasibility of such
cells is dependent upon future LSI technology. Research was carried out in the
hardware design ...
1. INTRODUCTION
The purpose of this study was to investigate an advanced multiprocessor com-
puter organizational concept. This organization is called the "Distributed Array
Memory and Processor" and will be referred to as the "distributed processor. " The
distributed processor evolved from the Phase 1 portion of this study (Reference 1);
the purpose of the Phase 1 portion was to evaluate various multiprocessor organiza-
tions for future space missions.
As a base for the study, computer requirements for a future manned space
mission were used; in particular, a manned Mars lander mission in the 1980's was
used. The basic requirements for this mission were developed during Phase 1. These
requirements were updated for this phase of the study and are presented in Section 2
of this report. This mission covers a broad spectrum of requirements (long duration,
widely varying computational loads, and high reliability demands); therefore, using it
as a base will result in applicability to many other missions in the same period such
as extended earth orbital space stations.
In addition to defining the requirements, it was necessary to define the technology
to be considered as state-of-the-art for the time period of interest before proceeding
with an investigation of the organization. Although the time period of the mission con-
sidered is in the 1980's, it is necessary to have the technology available to hardware
designers considerably ahead of when the missions are actually flown. Therefore,
technology was projected out 10 years from the present. In particular, MOS(1) LSI
technology was considered and it was projected that large wafers containing 100,000 or
more active devices would be available.
The organizational concept exhibits the capability for computational parallelism.
Section 4 contains a discussion of parallelism within computations and presents the
results of an evaluation of the space mission computations to determine the amount
of parallelism contained therein.
(1)MOS -- Metal-Oxide-Semiconductor
Figure 1-1 contains a block diagram of the organizational concept. The organ-
ization is seen to consist of a number of identical cells interconnected in a particular
manner. Each cell consists of a general purpose processor section and a small
amount of memory (512 16-bit words) on a single MOS wafer. The cells are divided
into groups (4 groups of 20 cells each are considered for the spaceborne application)
and these groups are connected by an intergroup bus for communication. Within each
group the cells communicate with each other by an intercell bus and by neighbor
communication lines. Each group will have one cell designated as a controller cell;
the remaining cells can be operated independently or dependently of the controller
cell. This organization is thereby capable of simultaneously taking advantage of
applied (global control) parallelism and natural (local control) parallelism within com-
putations. The organization is seen to be somewhat technologically dependent in that
it is desirable to have each cell manufactured on a single wafer, thereby resulting
in having one identical wafer throughout the entire organization. However, this is
not a rigid restriction since more than one wafer could be used to construct a cell.
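The sizing figures quoted above (4 groups of 20 cells, 512 16-bit words per cell, one controller cell per group) can be tallied in a few lines. The following is only an illustrative sketch: the class name `ArrayConfig` and its field names are hypothetical and do not come from the report, and the totals count every cell without distinguishing active cells from spares.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayConfig:
    """Sizing parameters quoted in the text for the spaceborne application."""
    groups: int = 4              # groups joined by the intergroup bus
    cells_per_group: int = 20    # cells sharing one intercell bus
    words_per_cell: int = 512    # memory words on each cell
    bits_per_word: int = 16      # word length in bits

    @property
    def total_cells(self) -> int:
        return self.groups * self.cells_per_group

    @property
    def controller_cells(self) -> int:
        # One cell in each group is designated as the controller cell.
        return self.groups

    @property
    def total_words(self) -> int:
        return self.total_cells * self.words_per_cell

    @property
    def total_bits(self) -> int:
        return self.total_words * self.bits_per_word


config = ArrayConfig()
print(config.total_cells, config.controller_cells,
      config.total_words, config.total_bits)
```

Under these figures the full array holds 80 cells and 40,960 words of distributed memory, which gives a rough sense of scale when the mission storage estimates of Section 2 are read.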
The organization offers considerable advantages in application to future manned
space missions: (a) efficiently meeting the widely varying computational loads of
different phases of a mission, (b) efficiently mechanizing the diverse requirements of
various subsystems of a mission such as a command module and a lander module,
(c) an overall net reduction in power due to the ability to turn modules on and off,
(d) increase in reliability, given that failure rates of dormant equipment are lower
than operating equipment, and (e) enhancement of probability of mission success and
availability due to reconfiguration around failures at a low module level. The latter
advantage is most prominent for this organization since it offers many levels of grace-
ful degradation by reconfiguration around individual cell failures.
The organization was subjected to a detailed analysis; in particular, the following
topics were considered: (a) the development of this distributed organization, (b) the
development of the architecture of the organization, (c) input/output methods, (d) fail-
ure detection and reconfiguration, and (e) software and executive programming
methods.
As a result of this analysis, a preliminary design of the cell was conducted.
In addition, a detailed design of the communication system (the inter-cell bus) was
accomplished.
In addition, a reliability simulation using Monte Carlo techniques was conducted
for the manned Mars lander mission. Statistics for probability of success and avail-
ability were thereby generated.
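To illustrate what a Monte Carlo estimate of probability of success involves, the sketch below simulates a single group of cells with spares. It is not the simulation model of Section 12: it assumes exponentially distributed cell lifetimes, treats every cell (active or spare) as powered for the whole mission, ignores bus and group switch failures, and uses a hypothetical function name and hypothetical parameter values throughout.

```python
import math
import random


def estimate_success_probability(active_cells, spare_cells, cell_mtbf_hours,
                                 mission_hours, trials=10_000, seed=0):
    """Fraction of simulated missions that end with enough working cells.

    Assumption (not from the report): each cell fails independently with an
    exponential lifetime, so it survives the mission with probability
    exp(-mission_hours / cell_mtbf_hours).
    """
    rng = random.Random(seed)
    p_survive = math.exp(-mission_hours / cell_mtbf_hours)
    total_cells = active_cells + spare_cells
    successes = 0
    for _ in range(trials):
        # Count the cells still working at the end of this simulated mission.
        survivors = sum(1 for _ in range(total_cells)
                        if rng.random() < p_survive)
        # Reconfiguration around failures succeeds while enough cells remain.
        if survivors >= active_cells:
            successes += 1
    return successes / trials


# Hypothetical numbers chosen only to exercise the function.
print(estimate_success_probability(16, 4, 50_000.0, 10_000.0, trials=2_000))
```

Repeating the call with different values of `spare_cells` shows how an estimate of this kind responds to the number of spares per group, which is one of the parameters the report's own simulation varies.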
[Figure 1-1. Distributed Processor Organization. Block labels: neighbor communication, inter-cell busses, cells, group switch, inter-group bus, I/O connection panel, input/output conditioners, devices.]
2. REQUIREMENTS
The previous Phase 1 portion of the study included a detailed investigation of
the computer requirements for a manned Mars lander mission in the late 1970's -
early 1980's time frame. To be compatible with the greater flexibility and later time
period of probable application of the distributed processor, the requirements were
re-evaluated to include applicability to more missions and to move the time frame out
to span the 1980's. In the first paragraph of this section, the requirements for various
missions will be evaluated and updated for the manned Mars mission. In addition,
further considerations on the requirements of failure detection and reconfiguration
will be presented in this section.
2.1 MISSIONS
Manned and unmanned space missions shall be discussed in this paragraph;
within these two classifications several types of missions shall be presented as shown
in Figure 2-1.
2.1.1 Manned Missions
2.1.1.1 Orbital Space Stations
These applications may be expected to begin with post Apollo applications and
continue indefinitely. It is expected that the first stations would be semipermanent
with a lifetime of approximately 1 to 5 years; eventually the space stations would be
permanent installations. They will require the servicing of a large crew (9 to 12).
On board repair capability due to both on board maintenance facilities and resupply
from earth may be expected. Space stations with an extended lifetime may be
resupplied periodically or possibly on demand by shuttle vehicles. This is particularly
evident given that crews will be interchanged after some length of duty.
This mission/application requires computer system availability as the primary
consideration of reliability. Two portions of the mission may be considered to have
somewhat of a critical nature associated with the computations: (a) rendezvous and
(b) orbital plane changes. However, the criticality of these computations does not
appear as severe as those associated with re-entry or midcourse corrections
encountered on other missions since failures here may not have as much of a direct
effect on crew survival.
Computational Functions: Some of the primary functions to be accomplished in
this mission are inventories of earth resources, astronomical observations, weather
observations, advanced communication services, and other scientific experiments; in
addition the function of a planetary assembly and launch base may also be required.
[Figure 2-1. Missions]
The computational functions that will be required in this mission are:
(1) Orbital Coast functions, (2) Orbital Maneuvering, and (3) Orbital Experiments.
A further listing of functions required in each of these three phases is given below.
1. Orbital coast
a. Orbital navigation
(1) Rectify osculating orbit
(2) Update osculating orbit
(3) Compute perturbations and derivatives
(4) Update for Runge-Kutta integration
(5) Update state vector estimate
(6) Correct state vector
b. Orbit determination
(1) Noise covariance matrix
(2) Output matrix
(3) System description matrix
(4) A priori statistics
(5) Filter computation
(6) Compute correction
c. IMU aline and bias
(1) Initialize matrices
(2) Propagation and noise covariance
(3) Measurement vector and output matrix
(4) Optimum estimate and control vector
d. Landmark tracker operation
(1) Selection executive routine
(2) Pointing signals
(3) Acquisition and tracking
(4) Observational residuals
(5) aM computation
e. Attitude reference
(1) Star selection executive routine
(2) Tracker pointing
(3) Acquisition and tracking
(4) Kalman filter star data
(5) Compute $C_I^B$ from gimbal angles
(6) Compute $C_I^L$ matrix
(7) Compute $C_B^L$ matrix
(8) Generate attitude control signal
f. Status monitoring
(1) Self test operations
(2) Performance monitoring
g. Telecommunications
(1) Data compression and processing
(2) Data formatting
(3) Command processing
(4) Transmission instrumentation pointing commands
h. Data display
Estimated requirements
Storage: 12,500 words
Speed: 60,000 operations/second
2. Orbital experiments
a. Weather observations
b. Advanced communication networks
c. Experiment on demand (There will be many experiments onboard;
however, generally, only one particular experiment will be conducted
at any one time. )
d. Continuous scientific experiments
There will probably be a considerable amount of data compression done
here. Also there will be some data reduction for the experiments. A few major
experiments such as in the MOL mission may be expected to require considerable
computer mechanization.
Estimated requirements
Storage: 15,000 words
Speed: 150,000 operations/second
3. Orbital maneuvering
a. IMU mechanization
b. Navigation computation
c. Steering
d. Set up orbit injection
Some of f, g, and h (status monitoring, telecomm, display) under
Orbital Coast will also be required here.
Estimated requirements
Storage: 6,500 words
Speed: 30,000 operations/second
It is expected that items (1) and (2), orbital coast and orbital experiments,
will be conducted simultaneously; during (3), orbital maneuvering, a small
portion of (2), the orbital experiments, may be required. Therefore the maximum
requirement occurs during the orbital coast operation when full experimentation
is being used.
Estimated total requirements
Storage: 27,500 words
Speed: 210,000 operations/second
It should be noted that it is also possible for the vehicle in this mission to
be unmanned at times and possibly still be required to perform some functions
such as scientific experiments.
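The total above is the sum of the estimates for the two functions that run simultaneously. A minimal sketch of that arithmetic follows; the dictionary layout and the name `combined_requirement` are illustrative choices, not part of the report, while the numbers are the estimates listed in this section.

```python
# (storage in words, speed in operations/second), as estimated in this section
ORBITAL_STATION_FUNCTIONS = {
    "orbital_coast": (12_500, 60_000),
    "orbital_experiments": (15_000, 150_000),
    "orbital_maneuvering": (6_500, 30_000),
}


def combined_requirement(active):
    """Sum storage and speed over the functions that run at the same time."""
    storage = sum(ORBITAL_STATION_FUNCTIONS[name][0] for name in active)
    speed = sum(ORBITAL_STATION_FUNCTIONS[name][1] for name in active)
    return storage, speed


# Orbital coast and full experimentation are conducted simultaneously.
peak_storage, peak_speed = combined_requirement(
    ["orbital_coast", "orbital_experiments"])
print(peak_storage, peak_speed)
```

The maneuvering case is not computed here because the text does not quantify the "small portion" of the experiments that may run alongside it.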
2.1.1.2 Shuttle Vehicles
Logistics support is essential to long duration space stations both for resupply
of expendables and transfer of crews. The duration of the mission depends on the
length of stay at the space station. Typical missions may be expected to last on the
order of 1 to 3 days. During the active portions of the mission no onboard repair can
be made. However, the mid-portion of the mission, or the time after rendezvous, may
be utilized for repair if necessary.
In general, all of the navigation and guidance computations of this mission may
be considered as critical. Probability of success will be the primary reliability
consideration in this mission.
Computational functions: the computational functions may be broken down into
seven basic phases: Prelaunch, Boost, Orbital, Orbital Maneuver, Rendezvous,
De-orbit, and Re-entry.
The subfunctions required during each of these phases are tabulated below.
1. Prelaunch
a. Accelerometer Calibration
b. IMU Aline and Bias
c. Status Monitoring and Checkout
d. Data Display
2. Boost
a. IMU mechanization
b. Navigation computation
c. Status monitoring
d. Data display
e. Telecommunications
3. Orbital
a. Orbital navigation
b. Orbit determination
c. IMU aline and bias
d. Landmark tracker operation
e. Attitude reference
f. Status monitoring
g. Data display
h. Telecommunications
4. Orbital maneuver
a. IMU mechanization
b. Navigation computation
c. Steering
d. Set up rendezvous injection
e. Set up midcourse maneuver
f. Status monitoring
g. Data display
h. Telecommunications
5. Rendezvous
a. Sensor pointing
b. Navigation computation
c. IMU mechanization
d. Steering
e. Status monitoring
f. Data display
g. Telecommunications
6. De-orbit
a. Set up de-orbit
b. Steering
c. Navigation computation
d. IMU mechanization
e. Status monitoring
f. Data display
g. Telecommunications
7. Re-entry
a. Navigation computation
b. IMU mechanization
c. Steering
d. Status monitoring
e. Data display
f. Telecommunications
The first five phases will probably be loaded into the computer initially; then
the last two phases, De-orbit and Re-entry, will be loaded into the computer when
needed. This then results in the following computer requirements:
Phases 1 through 5
Storage: 19,500 words
Speed: 85,000 operations/second
Phases 6 and 7
Storage: 11,500 words
Speed: 75,000 operations/second
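Because the two sets of phases are loaded at different times, the computer need only be sized for the larger of the two loads rather than their sum. The sketch below works through that comparison under the assumption that the second load replaces the first in memory; the text says only that the last two phases are loaded when needed, and the names used here are hypothetical.

```python
# Estimates for the two program loads of the shuttle vehicle mission
SHUTTLE_LOADS = {
    "phases_1_to_5": {"storage_words": 19_500, "ops_per_second": 85_000},
    "phases_6_and_7": {"storage_words": 11_500, "ops_per_second": 75_000},
}


def sizing_with_reload(loads):
    """Size for the largest single load (assumes loads are never co-resident)."""
    storage = max(load["storage_words"] for load in loads.values())
    speed = max(load["ops_per_second"] for load in loads.values())
    return storage, speed


def storage_if_all_resident(loads):
    """Storage needed if every phase were kept in memory at once."""
    return sum(load["storage_words"] for load in loads.values())


reload_storage, reload_speed = sizing_with_reload(SHUTTLE_LOADS)
resident_storage = storage_if_all_resident(SHUTTLE_LOADS)
print(reload_storage, reload_speed, resident_storage)
```

On these estimates, loading De-orbit and Re-entry only when needed keeps the storage requirement at 19,500 words instead of 31,000.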
2.1.1.3 Extended Lunar Exploration
The computational functions required during this mission are very similar to
those required in the interplanetary lander mission. The primary difference in this
mission is the exclusion of certain functions such as spin up, de-spin, and aero-
braking maneuvers required for the command module in the interplanetary mission.
Computational functions associated with the lander module are expected to be nearly
identical for both missions. The lunar lander may be expected to have more equipment
on board for experimentation; however, it will also have the capability for a
much higher data transmission rate to earth, primarily due to the shorter communication
distance. Therefore, the total computational requirement is expected to be
similar: more experiments are involved, but less data processing (data compression,
etc.) need be performed for each experiment at the lunar
base. There will be a significant amount of information collected from satellite
probes, roving vehicles, etc. The lunar bases are expected to be semi-permanent
or permanent in nature with periodic resupply. The capability for on site repair is
expected to exist due to the extended nature of the mission.
2.1.1.4 Interplanetary Lander
As a prelude to most interplanetary lander missions, one or more flyby missions
will be conducted to gain preliminary data on important parameters such as atmos-
pheric parameters, radiation environment, etc. The early missions will be to the
planets Mars and Venus. A thorough discussion and presentation of the computer
requirements for this mission was given in the first quarterly report of the Phase 1
portion of this study. (1) A brief summary of some of the requirements along with new
information and changes in the requirements due to the extended time period of
application (the 1980's) is given here.
The computer functions fall into four major categories:
1. Vehicle guidance and control
2. Telecommunications
3. Scientific experiments
4. System checkout and monitor
(1)Reference 2
This is not intended to imply that the vehicle contains equipment with four independent
or separate computational functions. For example, the system checkout and monitor
function services all the equipment interfacing with the computer system, and the
vehicle guidance and control function and the scientific experiments function
interchange computed results primarily derived from navigation and guidance equipment.
A list of potential instrumentation for scientific experiments was given in the
Phase 1 final report. (1) As more data became available (References 3 and 4) it was
possible to update and add to the initial scientific experiments function. Also, since
the time period of application is beyond that considered in Phase 1, it is likely that
a considerably larger scientific experiment payload will be used. The following is a
revised tabulation of scientific experiments for interplanetary and Mars orbital
phases. This list contains only experiments that require or have the possibility of
computer usage. Examples of experiments that do not require the computer are
maintenance with special tools, collection and observation of soil samples, etc.
2.1.1.4.1 Interplanetary Experiments. These experiments are all conducted
in the command module and will utilize computational facilities on board this vehicle.
The experiments may be listed in four basic classes:
1. Investigation of interplanetary bodies: comets and asteroids in close
approach with the vehicle will be studied.
2. Analysis of the interplanetary medium: various environmental properties
need to be monitored, such as neutral gas, charged particles, neutrons,
electromagnetic radiation, meteoroids, magnetic fields, etc.
3. Observations of solar phenomena: various observations will be made to
determine properties of the photosphere, chromosphere, corona, and
magnetic moment and fields.
4. Analysis of the aeromagnetosphere: measurements will be made to
determine a magnetic field map and magnetically trapped energetic
charged particle belts.
A tabulation of some of the specific items to be obtained from these classes of
experiments is as follows:
Meteoroid (flux, mass, density, velocity, directional, and spatial distribution)
Magnetic field (mapping)
High-resolution photographs of the Sun
Coronal Photography
Sequential White Light Coronal Photography
Coronal spectra
X-ray scans of solar disk and corona
(1)Reference 1
Large-scale coronal photos
Lyman-Alpha spectra of flares
Stellar photometry
Nebular photos in Lyman-Alpha
Multi-color mapping of the Milky Way
Map entire sky in the 1 to 10 μ IR
X-ray mapping of the entire sky
U.V. photometry of strongly reddened stars
U.V. polarimetry of strongly reddened stars
U.V. photometry of galaxies and radio stars
Galaxy photography
Zodiacal light photometry
Solar high-energy protons
Galactic protons
Solar high-energy electrons, Alphas
Solar wind protons
2.1.1.4.2 Mars Orbital Experiments. The scientific experiments to be conducted
in the Mars orbital phase of the mission may be broken down into three basic
classes:
1. Analysis of the topography and surface composition of Mars: Measure-
ments will be made to determine the amount of energy absorbed and
reflected by the planet in different regions of the electro-magnetic
spectrum, the composition of various areas of the planet, the existence
and distribution of plant life, the topography of the planet, geological
and geophysical investigations, etc.
2. Determination of the periods and gravitational properties of Mars: The
gravitational field and rotation of the planet will be determined and various
properties of the satellites of the planet will be determined.
3. Analysis of the martian atmospheric structure and composition: Measure-
ments must be made to determine the molecular and isotopic density
distributions, the atmospheric density, the pressure, the temperature,
etc.
In addition, some of the four basic classes of experiments listed under Interplanetary
will also be applicable during the Mars orbital phase, such as magnetic field
measurements, charged particle measurements, etc.
To accomplish the desired experimentation objectives the performance of the
scientific experiments may be divided into those centering around two separate areas:
1. Mars orbital
2. Mars surface
More specifically, this could be listed as those experiments centered around (1) the
command module which remains in Mars orbit and (2) the excursion module which
descends to the surface of the planet and then returns to rendezvous with the command
module. A further breakdown as to the location of experiments within these two
centers is given in Figure 2-2.
The command module is initially in orbit with the excursion module. There are
a number of experiments that will be conducted directly from the command module.
In addition, there will be several probes launched from the command module; these
may be divided into orbital and lander probes. Orbital probes will be used to attain
orbits widely different from that of the command module, such as highly elliptical
orbits. These probes will monitor environments and perform reconnaissance functions
in areas out of reach of the command module. One or more lander probes will
be launched from the command module; these may be either hard- or soft-landing
probes. It is also possible for probes to be launched to the Martian moons, Phobos
and Deimos, in addition to the Martian surface.
In all likelihood these probes will not contain their own computational capability.
The scientific and engineering data that is collected will be transmitted back
to the command module for data processing, analysis, and storage. The specific
scientific experiments to be conducted by these probes will not be listed here since
the only interface the computer system has with them is via the transmission link
and the data processing on the data received from the probe experiments is expected
to be quite similar to that on the experiments conducted in the command module.
The excursion module is launched from the command module at some time after
the command module probes have been launched. A base will be set up for
surface activities. The scientific experiment activities that will be conducted outside
the base will include those on (1) the MOLAB (a roving manned vehicle), (2) automatic
stations (unmanned fixed stations), (3) rockets and balloons (launched from the base
primarily to study the atmosphere). Computation capability will exist at the base and
more than likely on the MOLAB. The other scientific experiment facilities will
transmit their data back to the base for data processing, analysis, and storage. The
base, in turn, will transmit reduced and formatted experiment data back to the com-
mand module. A listing of surface experiments is given in this section.
Some of the specific items expected to be derived from the experiments are
listed. The experiments are divided into two areas as noted before: (1) those center-
ing around the Mars orbital activities and (2) those centering around the Mars surface
activities.
MISSION VEHICLE
    COMMAND MODULE
        PROBES
    EXCURSION MODULE
        BASE
Figure 2-2. Experiment Location Tree
Mars orbital:
1. Orbital parameters such as: semimajor axis, period of revolution,
eccentricity of orbit, orbital inclination to the ecliptic, mean daily
motion, longitude of the node, longitude of the perihelion, mass,
axis inclination, perihelion, aphelion, mean-orbital velocity, length
of year, etc.
2. Radiation belts
3. Ionosphere and radio wave behavior
4. Photographic mapping
5. IR mapping
6. Radar mapping
7. Wind velocity
8. Cloud photography
9. Solar spectral absorption
10. Magnetic field
11. Solar plasma
12. Charged particles
13. Topside ionosonde
14. Electron density
15. Ion composition
16. Radio noise monitor
17. Gust distribution
18. Search for satellites
19. Determination of the parameters of the Martian moons, Phobos and Deimos,
such as size, shape, surface features, orbital parameters, etc.
Mars surface:
1. Topographic mapping
2. Seismic station short-period range
3. Seismic station long-period range
4. Short-range seismic reflection and refraction
5. Long-range seismic refraction
6. Gravimetry
7. Magnetic field
8. Magnetic permeance
9. Resistivity
10. Temperature
11. Heat flow
12. Beta Gamma mapping
13. Gamma mapping
14. Neutron flux
15. Alpha scattering
16. Gamma scattering
17. Neutron scattering
18. X-ray diffraction
19. IR spectrophotometry
20. Emission spectroscopy
21. X-ray fluorescence
22. Neutron activation
23. Mass spectrometry
24. Gas chromatography
25. Thermal analysis
26. UV absorption spectrophotometry
27. Atmospheric pressure
28. Atmospheric density
29. Wind velocity
30. Total incident sunlight
31. Cloud photography
32. Solar spectral absorption
33. Atmospheric composition
34. Water vapor
35. Oxygen
36. Ozone
37. Lander ionosonde
38. Radio noise monitor
39. Riometer
2.1.1.4.3 Computational Functions for the Scientific Experiments: There are
basically four computational functions required to mechanize the scientific experiments.
These are:
1. Data processing
2. Sequencing and scheduling
3. Pointing and control
4. Data storage and retrieval
Figure 2-3 shows these four basic functions and the next sublevel in these
functions.
2.1.1.4.4 Data Processing. The data processing function consists primarily
of data compression, data handling, and data reduction and analysis. The Phase 1
final report contains a detailed discussion of computational algorithms that may be
applied in data compression. Two main types of data compression were cited in
that report, namely (1) compression by an encoding or curve-fitting method whereby
the data may be reconstructed after compression, and (2) compression by computing some
statistical properties of the data (such as mean and variance). Examples of the first
type of compression are:
1. Debiasing
2. Difference coding
3. Zero order polynomial predictor
4. Zero order polynomial interpolator
5. First order polynomial predictor
6. First order polynomial interpolator
7. Orthogonal polynomial series
Examples of the second type of compression are:
1. Quantiles representation
2. Representation by moments
As mentioned above, detailed computational algorithms were given in the Phase I
final report for the data compression methods. Data compression will be used
wherever applicable on scientific experiment data to reduce (1) the amount of data
to be transmitted back to earth, thereby reducing the communication bandwidth
requirements, and (2) the total data storage requirements.
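As an illustration of the two types of compression, the following minimal Python sketch shows a zero order polynomial predictor (first type) and a representation by moments (second type). It is not taken from the Phase I final report; the function names, the tolerance parameter, and the sample readings are assumptions made for illustration only.

```python
from statistics import mean, pvariance


def zero_order_predictor(samples, tolerance):
    """First type: keep a sample only when it departs from the last kept
    value by more than `tolerance` (an assumed parameter).
    Returns a list of (index, value) pairs."""
    kept = []
    last = None
    for index, value in enumerate(samples):
        if last is None or abs(value - last) > tolerance:
            kept.append((index, value))
            last = value
    return kept


def reconstruct(kept, length):
    """Rebuild the series by holding each kept value until the next one."""
    by_index = dict(kept)
    series = []
    current = None
    for index in range(length):
        if index in by_index:
            current = by_index[index]
        series.append(current)
    return series


def moments(samples):
    """Second type: replace the samples by their mean and variance."""
    return mean(samples), pvariance(samples)


readings = [10.0, 10.1, 10.2, 12.0, 12.1, 9.0]  # hypothetical sensor data
kept = zero_order_predictor(readings, tolerance=0.5)
rebuilt = reconstruct(kept, len(readings))
```

In this sketch the reconstructed series never differs from the original by more than the chosen tolerance, whereas the moments representation discards the individual samples entirely; this is the distinction drawn above between the two types.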
SCIENTIFIC EXPERIMENTS FUNCTIONS
    DATA PROCESSING
        DATA COMPRESSION
        DATA HANDLING
    SEQUENCING AND SCHEDULING
        FIXED
        ADAPTIVE
    POINTING AND CONTROL
        POINTING AND STABILIZATION
        PREVENTION OF EXCESSIVE OVERLAP
    DATA STORAGE AND RETRIEVAL
        ASSEMBLE AND FORMAT DATA FOR STORAGE
        RETRIEVE REQUESTED DATA
Figure 2-3. Scientific Experiment Computational Functions
The data handling function primarily consists of sampling experimental data
and routing it to the proper area for processing, display, transmission, or storage.
This is principally an executive control-type function with tables that are cycled
through periodically. It should be noted that these tables can be both fixed and
variable, since some may require changes on the basis of previous experimental
results, changes in the mission, etc.
Data reduction and analysis functionally is the complete reduction of measured
data into a useful and meaningful result. With man on board this function becomes
important since there is now an adaptive element in the loop controlling the sensors.
Based on the results from experiments the astronaut may decide to perform more
extensive experiments on an item of particular interest or even to possibly reduce or
eliminate certain experiments. This capability is very important when one considers
that there is a considerable time delay in communicating data to a ground facility for
data reduction and analysis. The communication time delay may preclude the
adaptation of an experiment to changes in conditions which may exist for a very short
time.
Two examples of data reduction and analysis were given in the Phase 1 final
report, namely (1) composition analysis of a soil sample and (2) human performance
evaluation. The particular computational algorithms will of course depend on the
particular experiment involved.
2.1.1.4.5 Sequencing and Scheduling. These computational functions are mainly
executive control type of operations. Experiments may be sequenced or scheduled
based on time and position information and other events that occur in the mission. The
sequencing and scheduling could be fixed or adaptive. The computer is particularly
useful when an adaptive role is needed in sequencing and scheduling. The procedures
followed could vary considerably based on the outcome of previous experiments,
changes in mission plans, etc. Computational functions required to mechanize this
consist primarily of tables containing conditional logic sequences.
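A table of conditional logic sequences of the kind described above might be sketched as follows. This is only an illustrative assumption about how such a table could look; the experiment names, state variables, and thresholds are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Rule:
    """One table entry: an experiment and the condition that schedules it."""
    experiment: str
    condition: Callable[[State], bool]


def due_experiments(table: List[Rule], state: State) -> List[str]:
    """Cycle through the table once and list the experiments whose
    conditions hold for the current mission state."""
    return [rule.experiment for rule in table if rule.condition(state)]


# Hypothetical table: position-based, time-based, and adaptive entries.
TABLE = [
    Rule("cloud_photography", lambda s: s["altitude_km"] < 500.0),
    Rule("magnetic_field_scan", lambda s: s["mission_time_s"] % 3600 < 60),
    Rule("extended_soil_analysis", lambda s: s["last_soil_result"] > 0.8),
]

state = {"altitude_km": 300.0, "mission_time_s": 7230.0, "last_soil_result": 0.2}
scheduled = due_experiments(TABLE, state)
```

The third entry stands for the adaptive case: it is scheduled only when the outcome of a previous experiment (here the assumed value `last_soil_result`) exceeds a threshold, so the sequence changes without altering the executive routine itself.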
2.1.1.4.6 Pointing and Control. Some of the experiment sensors will require
pointing and control to obtain the desired information in an optimum manner. The
more obvious of these are the telescopic systems, including the TV, infrared, etc.
Computational functions here will require the computer to accept angular position and
rate signals from the sensors and, based on inertial navigation information, compute
appropriate pointing commands.
The computer can also be used to control the pointing of scanning sensors to
prevent excessive overlap between frames. This involves the coordination of the
pointing mechanization with navigational data. This is one example of an interface
between two major computational functions, namely navigation and guidance and
scientific experiments. Another application of the computer in a real time control
loop is in the performance of image motion compensation. The functions required
here involve logic and arithmetic manipulations similar to those required in the
navigation and guidance routines.
2.1.1.4.7 Data Storage and Retrieval. There will be voluminous quantities of data
from a wide variety of sources and in a broadly different spectrum of formats. The computer can
assist in the data storage and retrieval function by assembling and formatting the data for storage.
This will involve placing appropriate headers with blocks of data noting important
or relevant mission parameters such as time, position, calibration data, etc. Also
the computer can assign storage locations depending on the rate and quantities of
data coming in from the total scientific experiments package.
It is expected that the astronaut will require the capability to retrieve data at
will from the central data storage facility. He will desire access to data taken from
experiments. This data may be called up in different categories; some examples
are by experiment number, by time period, by sequence number (position in a file or
experiment), or simply by storage location. The astronaut should have the capability
of interrogating the central storage system through simple commands and have it
respond by selecting and displaying in an optimally useful way the data relevant to
the inquiry.
An example inquiry may be the request by the astronaut to review all temperature
measurements made by a sensor over a certain period of time. The system should
therefore scan the memory for the relevant data, compile the data with respect to
chronological order, and display the results in a form for ease of reading and inter-
pretation. It is conceivable that the requests may also be for data which will require
computation on certain measured data. For example, instead of temperature alone
the astronaut may desire the heat transfer rates at certain places in the vehicle
over a given period of time. The computer system should be flexible enough to respond to
a wide variety of interrogation requests, thereby relieving the astronaut of manual
manipulation of data.
In addition to accessing data taken during the mission it is expected that
significant quantities of prestored scientific and operational data will be desired in
the computer system. The prestored data can be used for a direct comparison with
the measured data and generally to assist in the initial interpretation of the measured
data.
2.1.1.4.8 Requirements. One new function has been introduced in addition
to those presented in the Phase 1 final report, namely the data storage and retrieval
function. As can be seen from the discussion above, this function can essentially use
associative features in searching the memory in retrieval of requested data. There
are two primary methods of implementing the associative features required. One way
is to have an associative memory, the operation of which is well known and need not
be discussed here; the second is to construct pseudo associative programs.
The software approach is particularly attractive since it naturally eliminates the
need for the associative memory. Basically the software approach requires the
placing of data in areas which are located by one or more identifying headers or keys.
When a request for retrieval of data is made, the keys are compared and once a
match is achieved that particular block of memory identified by the matching key is
searched until the desired data is found. What this method requires is the ability
to locate the data according to some fixed number of keys. The data storage and
retrieval function for the scientific experiments is expected to be readily implemented
by the pseudo associative programming approach.
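The pseudo associative approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mechanization proposed in the study: the class name, the choice of the experiment name as the key, and the record layout (time, value) are all hypothetical.

```python
from collections import defaultdict


class KeyedStore:
    """Data blocks located by an identifying key (here, an experiment name)."""

    def __init__(self):
        # One block of (time_s, value) records per key.
        self._blocks = defaultdict(list)

    def store(self, experiment, time_s, value):
        """Place a record in the block identified by the key."""
        self._blocks[experiment].append((time_s, value))

    def retrieve(self, experiment, start_s, end_s):
        """Match the key, then search only that block for records in the
        requested time period, returned in chronological order."""
        block = self._blocks.get(experiment, [])
        hits = [(t, v) for t, v in block if start_s <= t <= end_s]
        return sorted(hits)


store = KeyedStore()
store.store("temperature", 30, 295.0)
store.store("temperature", 10, 290.5)
store.store("temperature", 50, 300.0)
store.store("pressure", 20, 6.1)
result = store.retrieve("temperature", 0, 40)
```

The usage lines correspond to the example inquiry given earlier, a review of all temperature measurements over a certain period of time: only the block under the matching key is searched, which is what removes the need for an associative memory.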
The requirements in the Phase 1 final report were based on a nearer-term
time period, and consequently the resultant data transmission bit rate was considerably
lower (20,000 bits/sec) than that which will be realized in the time period of interest
here. The capabilities presently predicted for the 1980's appear to be in the range
of 10^6 bits per second (Reference 5).
The fact that a higher information transmission rate will be available coupled
with the inclusion of a larger quantity of more sophisticated scientific experiment
instrumentation will result in significantly larger computational requirements
for the scientific experiment function. An example of more sophisticated
instrumentation is higher resolution video data from better reconnaissance equipment
(10^8 to 10^9 bits per frame).
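To give a feel for these magnitudes, the short Python sketch below computes how long a single uncompressed video frame would occupy the transmission link. It reads the figures in the text as 10^6 bits per second for the projected rate and 10^8 to 10^9 bits per frame; the function and constant names are illustrative assumptions.

```python
PHASE_1_RATE_BPS = 20_000          # bits/sec, rate assumed in the Phase 1 report
PROJECTED_RATE_BPS = 10**6         # bits/sec, rate predicted for the 1980's
FRAME_SIZES_BITS = (10**8, 10**9)  # bits per frame, range quoted in the text


def frame_time_s(frame_bits, rate_bps):
    """Seconds needed to transmit one uncompressed frame at the given rate."""
    return frame_bits / rate_bps


for frame_bits in FRAME_SIZES_BITS:
    for rate_bps in (PHASE_1_RATE_BPS, PROJECTED_RATE_BPS):
        seconds = frame_time_s(frame_bits, rate_bps)
        print(f"{frame_bits:.0e} bits at {rate_bps} bits/sec: {seconds:.0f} s")
```

Even at the higher rate a single frame takes on the order of 100 to 1000 seconds to send uncompressed, which suggests why on-board data compression and reduction remain significant for such data.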
In addition to the increased requirements of the data processing, sequencing
and scheduling, and pointing and control functions there are the requirements to be
added by the new function, data storage and retrieval. An estimation of the increased
requirements for the scientific experiment function is given below. It should be noted
that the significant factor in affecting the speed requirements is the percentage of
data that will be subject to data processing (data compression or data reduction) prior
to data transmission. With the assumption of the higher transmission rate (10^6 bits/sec),
it was assumed that 10 percent of the experiment data was subject to data
processing such as compression or reduction prior to transmission.
Mars Orbital Phase
Speed: 1,325,000 short ops/sec.
Storage: 9,200 words
Non Mars Orbital Phases
Speed: 235,000 short ops/sec
Storage: 6,500 words
The remaining three major computational functions, navigation and guidance,
telecommunications, and status monitoring, are also expected to increase in terms of
computational requirements. However, the increase is expected to be considerably
less than that of the scientific experiment function. Therefore, since the effects of
these increases will be quite a small percentage, the computer requirements will
simply be updated on the basis of the new scientific experiment requirements. A
tabulation of the requirements as a function of Mars mission phase is given in
Table 2-1.
It should be noted that the same discussion as given in Paragraph 2.9 of the
Phase 1 final report on computer requirements applies here. Basically the require-
ments assume a conventionally organized 18-bit-word-length computer, with multiple
accumulators, indexing features, and a basic instruction repertoire.
Table 2-1. Mars Planetary Mission Computation Requirements

                                Storage         Speed
     Mission Phase              (words)     (short ops/sec)
 1.  Atm ascent                   2,224           4,812
 2.  Earth orbital               10,303         110,802
 3.  Trans Mars Inj               6,161          57,554
 4.  Trans Mars coast            17,886         314,860
 5.  Trajectory corr             12,799         379,454
 6.  Spin up                     12,180         308,550
 7.  Spin cruise                 18,476         318,660
 8.  De-spin                     12,180         308,550
 9.  Mars appr corr              13,919         379,454
10.  Aerobraking                 14,520         287,550
11.  Mars Orbit Inj              12,449         285,454
12.  Mars orbital                24,633       1,449,402
13.  Trans earth Inj             12,869         301,254
14.  Trans earth coast           17,886         314,860
15.  Trajectory corr             12,799         379,454
16.  Spin-up                     12,180         308,550
17.  Spin cruise                 18,476         318,660
18.  De-spin                     12,180         308,550
19.  Earth appr corr              5,964         199,554
20.  Earth re-entry               8,620          69,010

2.1.1.5 Interplanetary Flyby
The interplanetary flybys will be conducted prior to the lander missions and may
contain similar scientific instrumentation in a number of areas. Computational
functions similar to those required for the interplanetary lander mission will also be
required here. The primary difference between the missions is that there
will not be a planetary orbital phase. During the orbital phase for the lander missions,
the scientific experiments data is operated on in real time and transmitted back to
earth; for the flyby mission there will only be a very short time during which data
close to the planet will be obtained. The collected data will be stored on board and
then transmitted back to earth during the cruise mode.
There are two approaches to data processing for this data: one is to process the
data in real time before it is stored and the other is to store it in real time and simply
process it later as it is being transmitted. The first approach will reduce the total
storage required at the expense of requiring very high processing speed capability.
In all likelihood a combination of the two approaches will be used; that is, for data
which is expected to have a very high compression ratio it is likely that real time
processing will be applied. The computer requirements for this mission will be similar
to those for the lander mission. Of course the requirements for certain phases such
as planetary orbit injection and aerobraking are eliminated here. The requirements
associated with the data processing function for the scientific experiments are expected
to differ somewhat; there will be a lower processing speed requirement during the
planetary phase due to the fact that much of the data are simply being stored, and the
processing speed requirement will be greater during the trans-earth cruise phase
when the stored data are being processed for data transmission.
2.1.2 Unmanned Missions
2.1.2.1 Earth Orbital
Unmanned earth orbital missions may be broken down into three basic classes:
extensive scientific lab satellites, advanced reconnaissance satellites, and scientific
applications satellites. In general, the computer requirements may be characterized
by several features: long lifetime satellites will require operation over several years;
the computational requirements are relatively constant rather than varying as a function
of time; and, as with all unmanned missions, the capability for on board
repair by human action is not available.
The computational requirements are expected to be relatively low in these
missions. Some of the basic functions that the computer may be required to perform
are:
1. Attitude control
2. Status monitoring and checkout
3. Position determination
4. Data processing and control for experiments
5. Command decoding
2.1.2.2 Interplanetary
Two types of unmanned interplanetary missions will be involved: flyby and
lander. Both missions have the characteristic that the capability of human action for
onboard repair is not available. The computer requirements for the lander mission
are very similar to those for the manned lander up to and including the planetary
orbital phase (spin up and de-spin phases are not required here); the unmanned lander
mission terminates in the orbital portion, however. The unmanned lander mission
will, however, have slightly lower computational requirements due to the elimination
of certain functions, such as: display, biomedical and life support, status monitoring,
and information storage and retrieval.
The computer requirements for the unmanned flyby are similar to those for the
manned mission. However, the unmanned flyby does not require the earth re-entry
phase and, therefore, terminates in the trans-earth cruise phase. Spin-up and de-spin
also are not required here. In addition, the same comments hold true for lower computational
requirements due to the elimination of certain functions required in the
manned mission.
2.2 FAILURE DETECTION AND RECONFIGURATION TIME
The most critical periods during the manned Mars lander mission occur during
the atmospheric entry phases (Mars entry and Earth re-entry). The critical nature
of the computations during these phases may be appreciated when one considers that
the computer, as part of its navigation and guidance routine, is computing attitude
commands to guide the vehicle along some narrow corridor at very high speeds. A
malfunction in the computer could cause the issuance of erroneous commands which
could force the vehicle towards a temperature or acceleration limit and result in
disaster to the space mission.
The entry corridor defines the limits of the incoming velocity vector such that
an atmospheric braking maneuver can be accomplished. Entry corridors are generally
determined by establishing overshoot and undershoot limits. The overshoot limit
is defined as the skip out condition and the undershoot limit is defined as a load factor
limit for either structural or biological requirements. Several study reports were
investigated to determine what is currently being used as entry corridors and
velocities (References 6 and 7). The worst case conditions (generally considered for
an 0.5 L/D vehicle) are on the order of 50,000 to 60,000 feet/second entry velocities
and 70,000 to 100,000 feet entry corridors. Currently even more stringent require-
ments are foreseen for further out time period missions (Reference 8); entry velocities
on the order of 75,000 feet/second and corridors of 25,000 feet are predicted. These
predictions will be used here in this consideration of failure detection and reconfiguration
time.
The determination of the maximum allowable time a failure can exist in the entry
guidance system is required in order to determine the necessary failure detection and
computer reconfiguration time. The determination of this time requires the assump-
tion of some nominal position of the vehicle in the entry limits and an analysis of the
dynamics of the vehicle if an error occurs (such as a hard over attitude command).
Such an analysis is beyond the scope of this study. However, based on some simple
assumptions a worst case time may be determined. If it is assumed that the vehicle
is in the center of the corridor (12,500 feet from a boundary limit) and an erroneous
command forcing it towards the corridor limit occurs, a worst case time to the limit
may be found by assuming the vehicle velocity vector is pointed instantaneously toward
the corridor limit. This time then is 166 milliseconds to reach the limit; following
through with the same assumption a correction may be applied up to this time to
correct the erroneous command. For the purposes of this study a time limit of
100 milliseconds will be set as the failure detection and reconfiguration time. Recon-
figuration is defined here as having the critical computational program with all the
necessary values of the variables mechanized and being performed correctly in a
computational facility after a detected failure. The time defined includes both failure
detection and reconfiguration.
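The worst case figure quoted above follows directly from the stated assumptions, as the short Python sketch below shows. The constants are taken from the text (a 25,000 foot corridor, a 75,000 feet/second entry velocity, and a 100 millisecond limit); the function and variable names are illustrative assumptions, and the straight-line motion toward the boundary is the simplification the text itself adopts.

```python
CORRIDOR_WIDTH_FT = 25_000.0        # predicted entry corridor (Reference 8)
ENTRY_VELOCITY_FT_PER_S = 75_000.0  # predicted entry velocity (Reference 8)
RECONFIG_LIMIT_MS = 100.0           # limit set for this study


def time_to_limit_ms(corridor_width_ft, velocity_ft_per_s):
    """Milliseconds to reach a corridor boundary from the corridor center,
    assuming the velocity vector points straight at the boundary."""
    distance_ft = corridor_width_ft / 2.0  # 12,500 ft from center to limit
    return 1000.0 * distance_ft / velocity_ft_per_s


worst_case_ms = time_to_limit_ms(CORRIDOR_WIDTH_FT, ENTRY_VELOCITY_FT_PER_S)
margin_ms = worst_case_ms - RECONFIG_LIMIT_MS
print(f"time to limit: {worst_case_ms:.1f} ms, margin: {margin_ms:.1f} ms")
```

The computed value is about 166.7 milliseconds, which the text gives as 166 milliseconds; the 100 millisecond limit therefore leaves roughly 67 milliseconds of margin under these simplified assumptions.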
2.3 COMPUTATIONAL REQUIREMENTS
The requirements (speed and storage) for the interplanetary missions are shown
in Figures 2-4 through 2-7. For purposes of assigning a specific time scale on each
figure, Mars planetary missions were considered. These figures basically contain
three mission phases: trans Mars, Mars (Mars Orbital or Flyby), and trans Earth;
this has been done to clarify the requirements and differences between missions. It
should also be noted that portions of the requirements are not continuous but required
periodically, such as 1/2 hour every 5 hours, etc.; this also has not been shown, so as
to point out the basic differences between the missions.
The requirements for the lander mission (manned and unmanned) are shown in
Figures 2-4 and 2-5. The unmanned mission is expected to have slightly lower requirements,
as discussed previously, primarily due to elimination of certain functions
associated with manned missions (display, biomedical, etc.) and also because it
will probably precede the manned mission by several years, thereby utilizing less
sophisticated technology (lower transmission rates, fewer experiments, etc.). It
should also be noted that the manned mission has a fixed Mars orbital time whereas
the unmanned does not.
Flyby mission requirements are shown in Figures 2-6 and 2-7 for manned and
unmanned missions. The requirements show a rise for a short period of time (approx.
1 hour) at the period of flyby and then subside to a level somewhat higher than that of
the trans Mars phase. This is due to the fact that the data collected during flyby
will be processed during its transmission to earth. For the same reasons as given
above the requirements for the unmanned mission are slightly lower than the manned
mission. The manned mission has a fixed duration, terminating at Earth re-entry
whereas the unmanned mission continues on indefinitely.
Comparing the lander and flyby missions, the requirements during the trans
Mars phase are the same for each mission. During the Mars vicinity phase, however,
the flyby mission has a much lower requirement in terms of speed and storage increase
over the trans Mars phase. This is due to the fact that there will not be real time
data processing to any significant extent on data being collected. Only a small portion
of the data may be processed and the remainder simply stored. The exact amount
of processing done here is not firm and therefore the requirements during this phase
are only postulated. The stored data collected during flyby is processed and transmitted
to Earth during the initial portion of the trans Earth phase. This accounts for the
increase in requirements during this phase over that for the lander mission (data will
be processed and transmitted in real time during the Mars orbital phase for the
lander mission).
Figures 2-8 and 2-9 represent updated versions of the speed and storage requirements
given in Paragraph 2.9 of the Phase 1 final report. This is primarily based on
the increased scientific experiment requirements as discussed in Paragraph 2.2
of this report.
[Figures 2-4 through 2-7: storage (words) and speed (short operations/second) versus mission time for the lander and flyby missions]
[Figures 2-8 and 2-9: storage (words) and speed (operations/second) by mission phase]
3. TECHNOLOGY
The objective of this section is to indicate the means for mechanizing a novel
machine-organization, the distributed processor, within the constraints that are part
of the microelectronics revolution. The distributed processor will be introduced
and developed in later sections of this report, together with detailed design descriptions.
A summary of this description is presented in this section to place
the technology discussion in perspective.
The distributed processor is a computer organization made up of a number of
cells, each cell fabricated by large-scale-integration (LSI) on a single substrate. A
cell has both processing logic and memory and operates as a small computer. For
example, a cell may contain a memory of 512 words of 18 bits each and a small parallel-by-bit
processor having approximately 5,000 active devices (see Figure 5-4).
The cells are organized into groups; each group may contain on the order of 20 cells.
For the requirements defined in Section 2, the computing system may contain about
four groups as shown in Figure 5-5. These numbers represent preliminary estimates
to describe the nature of the distributed processor and the numbers are subject to
revision as further detailed design of the distributed processor is undertaken.
However, this does establish design goals for the technology.
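The preliminary estimates above imply the following gross totals, worked out in a short Python sketch. The per-cell and per-group figures come from the text and the peak storage figure from Table 2-1; the variable names are assumptions, and the comparison ignores any cell memory that might be reserved for executive or redundancy purposes, so it is only a rough consistency check.

```python
WORDS_PER_CELL = 512            # memory words per cell
BITS_PER_WORD = 18              # word length
DEVICES_PER_PROCESSOR = 5_000   # approximate active devices per cell processor
CELLS_PER_GROUP = 20            # "on the order of 20 cells"
GROUPS = 4                      # "about four groups"
PEAK_STORAGE_WORDS = 24_633     # Mars orbital phase, Table 2-1

cells = CELLS_PER_GROUP * GROUPS
total_words = cells * WORDS_PER_CELL
total_bits = total_words * BITS_PER_WORD
processor_devices = cells * DEVICES_PER_PROCESSOR

print(f"cells: {cells}")
print(f"total memory: {total_words} words ({total_bits} bits)")
print(f"processor active devices: {processor_devices}")
print(f"covers peak storage requirement: {total_words >= PEAK_STORAGE_WORDS}")
```

Under these estimates the system would hold about 80 cells and roughly 41,000 words of memory, which exceeds the largest storage requirement listed in Table 2-1; how much of that memory would actually be available to mission programs is not stated here.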
This section also describes current work on LSI, and some of the immediate
research-and-development goals. From this base, trends are identified which can
be extrapolated to predict the capabilities of technology for mechanizing aerospace
computing hardware within the next ten years. This speculation is a necessary part
of the dialogue between system-organization designers and hardware-fabricators in
the pursuit of advanced systems. The conceptual approach must exploit the expected
progress in the fabrication-technology without indulging in wishful-thinking. This
section points out the anticipated developments that will make it possible to construct
a distributed-processor-computer in the 1980's. Given the undeniable trend toward
more complexity, as exemplified by LSI, the concept of the distributed processor was
conceived and is being advanced as a candidate organization which effectively trades
increased hardware complexity per wafer for an advantage in reliability, expandability,
and power dissipation.
The following discussion of technology is presented in three parts: (1) A review
is given of the contemporary technologies for LSI describing the state-of-the-art today;
(2) The problems and goals of present R&D programs are outlined to anticipate the
progress toward LSI within 2 years; and (3) A projection is made to predict the achievements
in LSI within ten years. Special attention is given to those recognized problems
most difficult to solve and needing particular attention to realize an effective solution.
3.1 LSI TODAY
There are two device-oriented technologies that lead the development in LSI
today. Complete functions are being mechanized by integrated networks using: (1) bipolar
transistors or (2) metal-oxide-semiconductor field-effect transistors (MOS-FET).
Both of these competing approaches to LSI take advantage of the planar-fabrication
process in bulk silicon wafers to produce monolithic structures. Each technology
offers unique characteristics which can be exploited to achieve digital signal processing.
These characteristics are summarized in Reference 9. For any given size of the
active devices, bipolar transistors offer a basic speed advantage over MOS-FETs for
two reasons: (1) Bipolar transistors can deliver charge faster to change the voltages
across capacities that are associated with the interconnections in a large-scale-integrated
circuit; (2) The effective signal-swing, ΔV, is smaller for the operation of a bipolar
transistor. The speed advantage for bipolar over MOS transistors is about 10X (Reference 9).
However, the process for making bipolar transistors is harder to control
and the transistors are larger in area compared to fabrication of MOS-FETs. The
MOS-FET approach has advantages of 2X in process simplicity and 5X less area compared
to the fabrication of the same function using bipolar transistors (Reference 9).
These two technologies do compete because system/circuit designers can scheme to
trade increased complexity for speed to achieve an overall desired function. There
are both advantages and disadvantages associated with complexity, but with the use of
batch-fabrication, the cost of complexity is no longer the principal factor to be considered.
In particular, aerospace computing hardware is not evaluated in terms of
maximizing throughput (speed) or in terms of the number of computations per dollar.
Instead, the characteristics of low power, high reliability, and small physical size
are important factors. As attainable complexity increases, the advantages of MOS-FETs
over bipolar transistors become increasingly important to the system/circuit designers.
Table 3-1 is a comparison of some results achieved in LSI by each of the two
technologies as summarized in Reference 10. Also included are entries describing
integrated circuits manufactured by Autonetics which use p-channel, enhancement-mode
MOS-FETs in four-phase dynamic logic networks.
The basic feature of the distributed processor organization is the integration of
memory with processing logic on a single substrate. Assuming that the storage and
control for a single bit of memory requires about 10 devices, each cell of the processor
would then require about 100,000 devices to mechanize (N = 512 x 18 x 10 + 5,000
= 97,160 devices). Using the data from Table 3-1, the required area and power dissipation
can be estimated for a cell mechanized today by the two leading technologies.
These estimates are presented in Table 3-2. The arrays chosen from Table 3-1 are
somewhat representative of complex logic although the Honeywell Memory probably
has a more regular and simple structure than the Autonetics DDA.
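The device count and the area entries of Table 3-2 can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function and constant names are illustrative assumptions, while the numbers (512 words, 18 bits, 10 devices per bit, 5,000 logic devices, and the packing densities of 14,000 and 60,000 devices per square inch) are those quoted in the text and in Table 3-2.

```python
# Device-count and area estimate for one cell (names are illustrative).
WORDS = 512             # memory words per cell
BITS_PER_WORD = 18      # bits per word
DEVICES_PER_BIT = 10    # storage plus control devices per bit (text's assumption)
LOGIC_DEVICES = 5_000   # devices in the processing logic


def cell_device_count():
    """Total devices needed to mechanize one cell."""
    return WORDS * BITS_PER_WORD * DEVICES_PER_BIT + LOGIC_DEVICES


def required_area_in2(devices, density_per_in2):
    """Substrate area in square inches at a given packing density."""
    return devices / density_per_in2


n_devices = cell_device_count()
bipolar_area = required_area_in2(n_devices, 14_000)  # bipolar density, Table 3-2
mos_area = required_area_in2(n_devices, 60_000)      # MOS density, Table 3-2
print(n_devices, bipolar_area, mos_area)
```

Both areas come out close to the 7.0 and 1.6 square inches listed in Table 3-2. The power entries are not reproduced here because the per-device power figures of Table 3-1 are needed for them.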
It is clear from the estimates in Table 3-2 that stand-by power dissipation is
the dominant limiting factor which makes the fabrication of 10^5 devices on a single
substrate unfeasible today. It is also clear that at this level of complexity the advantages
of MOS-FETs over bipolar transistors are impressive.
There are other batch-fabrication technologies being developed that might provide
low-standby power and, therefore, should be considered as a means for mechanizing
the proposed distributed-processor. Three of these are: (1) Cryoelectric thin film
devices; (2) Optically coupled logic and memory devices; and (3) Heteroepitaxy of
thin semiconductor films in which active devices are fabricated.
Cryoelectric technology has been promoted as a means for realizing very large
memory subsystems in which the overhead-like expense and power consumption of the
refrigeration equipment can be spread over a very large number of bits, e.g., 10^9 bits
(see Reference 11). However, one of the principal motivations for developing the concept
of the distributed processor is to improve computer reliability. The use of
cryoelectric technology would require that the reliability of the refrigeration
equipment be much superior to that of the distributed-processor-cells, a claim which
has never been suggested. The power consumption and size of such a system would
also be very large.

[Table 3-1, the comparison of LSI results by technology, is not legible in this copy.]

Table 3-2. Extrapolations to Distributed Processor

Figure of merit                       Bipolar*                    MOS**
Required area for 10^5 devices        7.0 in^2 @ 14,000/in^2      1.6 in^2 @ 60,000/in^2
Power dissipation for 10^5 devices    250 watts - 25 ns speed     6.5 watts - 250 kHz clock

*Honeywell Memory used as representative (see Table 3-1)
**Autonetics DDA used as representative (see Table 3-1)
Optical computing elements have been described in the literature (Reference 12),
and their use for inter-cell data-transmission has been considered. The use of optical
coupling from radiating storage and logic devices to sensing devices should be faster
than achievable by wired interconnection but the efficiency of the coupling, in terms of
the energy required to transfer the data, probably will be less. The most complicating
feature in the use of LSI with optical devices is the additional constraint of physical
registration and alignment of devices to achieve the desired optical paths. The LSI
technique of discretionary wiring, to enhance yield, is more difficult to use because
the physical location of electrically tested circuits is no longer arbitrary. The develop-
ment of materials, devices and LSI techniques for optoelectronic computing systems
may lead to dramatic advances in computing performance. While the potential for
such breakthroughs is acknowledged, the use of optoelectronic technology does not
appear to offer predictable advantages for the proposed distributed-processor.
The growth of thin-semiconductor-films, in which active devices can be fabricated,
is receiving considerable attention because of the potential for LSI. A particular
example is the growth (heteroepitaxy) of single-crystal silicon-on-sapphire (SOS) by
chemical-vapor deposition. Since this technique was developed by Autonetics research
chemists (Reference 13), Autonetics engineers have taken a particular interest in this
new candidate for LSI. The basis of this interest is the possibility of creating large
networks of devices having unusually small stray capacity, i.e., high speed potential.
There are two reasons for the small stray capacity:
1. Interconnecting lines are made on top of the insulating sapphire surface.
Conductor-to-substrate capacity is eliminated.
2. The P-N junctions for devices are formed perpendicular to the substrate.
Therefore, the junction area is directly proportional to the thickness of
the film, which is typically one micron. Junction capacity is extremely
small, a few femto-farads (femto = 10^-15).
Using SOS, three kinds of useful computing devices have been fabricated: diodes,
p-channel MOS-FETs, and n-channel MOS-FETs. Combinations of all three have been
fabricated successfully on the same substrate so that complementary devices can be
used to realize memory and logic functions. Based on these developments, two LSI
projects have been attempted at Autonetics:
1. Project 1 involves the design and fabrication of large diode-arrays. Perfect
96 x 70 arrays have been delivered to the Massachusetts Institute of Technology
for a read-only memory application.
2. Project 2 involves the design and fabrication of large complementary flip-flop
arrays for high-speed, very low-power read/write memory applications.
Yield of the R-S flip-flops in a 22 x 24 flip-flop array is low at present, but
the characteristics of working flip-flops are encouraging. The description of
this memory array and some measured characteristics are summarized as follows:
Size:             17 devices per flip-flop (6 p-channel MOS-FETs, 6 n-channel MOS-FETs,
                  and 5 diodes)
                  25 x 25 mils per flip-flop
                  22 x 24 flip-flops per array
                  The array includes power distribution and read-out electronics.
Packing Density:  22,000 devices/in^2 on a 1 in. diameter sapphire substrate
Standby Power:    60 nanowatts per flip-flop at room temperature
                  (512 x 18 flip-flops would dissipate less than 1 milliwatt)
Speed:            Read operation - 200 nanoseconds
                  Write operation - 300 nanoseconds
The development of the complementary SOS technology is important to the
realization of the proposed distributed processor. The combination of high speed and
very low standby power is unique to the complementary circuit and, with SOS, the
mechanization is very attractive. The major cause for the low-yield at present is
the critical nature of the doping-process step which converts the silicon to form the
channels in half of the MOS-FETs. Experimental work is in progress on new process
steps and device structures to simplify or eliminate this critical step in the fabrication
of complementary integrated circuits. It should be noted that the dimensions of the
individual devices on SOS are about three times larger than comparable dimensions of
devices built in monolithic form. This is an indication of the current state of the
development and does not constitute a fundamental limit due to photolithography or
imperfections in the silicon film. Work on devices with smaller topologies is planned.
Figure 3-1 is a photograph of the complementary flip-flop memory array.
3.2 LSI IN TWO YEARS
Research and development programs are in progress to solve the recognized
problems in LSI. This work can be divided into three categories:
1. System/circuit design through mask-set fabrication
2. Materials, devices, process improvements and packaging
3. Testing and customizing
This part of Section 3 will describe the nature of the problems under study
and predict the state of the development in two years.
3.2.1 Research Category 1 - Computer Aided Design
When circuits are assembled from discrete components, it is possible to confirm
the validity of a circuit-design or adjust parameters by building a bread-board model.
A valid comparison can be made between actual performance and predicted performance.
Two basic complications frustrate this approach in LSI. First, LSI makes it
possible to build systems which were impractical to build with discrete components.
The word "large" in LSI really adds a distinctive new feature to the design problem.
It must be recognized that LSI will produce complex circuits of non-linear elements
which are many times more difficult to analyze, in terms of device models and
equivalent circuits, than any previously analyzed circuits. This fact could be ignored
if it were not for the second complication: the system/circuit designer cannot avoid the
detailed selection of the topology of the integrated circuits. The relationships of device
sizes and layouts to design-performance-goals could be used by systems/circuit
designers for performance-optimization if the design process were augmented with
machine-aids. However, the optimization of dynamic-performance through a man-machine
interchange (e.g., conversational-mode computing) is a huge undertaking.
LSI leads to smaller packages, but to simulate the complex inner workings will require
the solution of many simultaneous non-linear differential equations which describe the
network-dynamics. As a result, it can be seen that smallness in size does not make
the system problems any easier to solve. In fact the problems are more difficult if
only for the reason that integration excludes the flexibility afforded by the manipulation
of separable subsystems.
Current R&D work on computer-aided design is directed toward the creation of
new design tools for the automatic handling of design details as described above. The
objectives are to permit the designer to exercise simulated alternative designs, to
select optimum device characteristics and to create the tooling (optical masks) for the
fabrication of systems using LSI. Further, computer-aided design development
promises to relieve the designer of the greatly increased, tedious, book-keeping chores
in which the human is most prone to error. In two years, these design tools should
develop to permit the dynamic simulation of MOS-FET networks with about 100 nodes.
The quality of the device-models will still be fairly crude and the man-machine interaction
will still be painfully awkward, but estimation of dynamic performance and
optimization of topology should be possible in 100-node networks. However, the automatic
layout and generation of art work (masks) for LSI will not be possible within the
next two years. Human pattern-recognition-skill will outperform machines for these
tasks in the foreseeable future.
3.2.2 Research Category 2 - Materials, Devices, and Processes
Significant advantages have been attributed to complementary MOS-FET circuits
on sapphire for the fabrication of the proposed distributed processor. The anticipated
development of this technology, CMOS by SOS, is judged to be of special interest for
the purposes of this report.
The Autonetics project on the CMOS by SOS memory array should produce a
complete working memory module with about 10,000 devices integrated on a single
substrate. This accomplishment will result from three developments:
1. Improvements in the quality of the single-crystal silicon film which is
grown on the sapphire will permit predictable and repeatable diffusion
operations.
2. An alternative method will be found to replace the critical process step
which is used now to convert (dope) the silicon when forming the channels
in one-half of the complementary devices.
3. Discretionary wiring will be used to enhance the yield by combining functionally
tested flip-flops into custom-wired sub-arrays, e.g., 16 x 16 working
flip-flops from a total of 22 x 24 flip-flops.
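The benefit of discretionary wiring in the third item can be illustrated with a simple yield model. The sketch below is an assumption-laden illustration, not a method taken from the report: it treats each flip-flop as working independently with a hypothetical probability `p`, and ignores any wiring or placement constraints. It computes the chance that at least 16 x 16 = 256 good flip-flops are available among the 22 x 24 = 528 fabricated.

```python
from math import comb


def prob_at_least(n, k, p):
    """Probability that at least k of n independent units work,
    each unit working with probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


FABRICATED = 22 * 24   # flip-flops on the array (from the text)
NEEDED = 16 * 16       # working flip-flops to be wired into the sub-array

# Hypothetical per-flip-flop yields, chosen only for illustration.
for p in (0.4, 0.5, 0.6):
    print(p, prob_at_least(FABRICATED, NEEDED, p))
```

Under these assumptions the array yield rises very sharply once the per-flip-flop yield passes the ratio 256/528, which is the effect discretionary wiring is intended to exploit.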
It should be noted that CMOS by SOS is not a one-company development any
longer. In particular, RCA is reporting significant progress under a USAF-sponsored
program in this same technology (References 14 and 17).
There are other process developments in LSI which will contribute to the eventual
mechanization of the proposed distributed processor. For example, the experimental
work at Texas Instruments on multilayer-interconnects and their pioneering work on
automated discretionary wiring are of particular interest (Reference 10). Both of these
developments are keys to the successful fabrication of very large, high-density arrays
of devices such as those in the proposed distributed processor. Texas Instruments has
made substantial investments in capital equipment, e.g., automatic CRT generation
of masks and automatic laser-position-sensing plotting machines, to support their
judgement on the potential of discretionary wiring and automation in LSI.
Packaging of CMOS by SOS wafers will be developed to increase the volume-
density of devices to form more complex computing subsystems. For example, it
would be possible to realize one cell of the proposed distributed processor in a cubic
inch within two or three years by stacking 18 of the previously described Autonetics
CMOS memory wafers with a processing wafer to form a physical structure resembling
that shown in Figure 3-2. This packaging concept is under development at Autonetics
for the Diode-Array Memory program. Interconnections from wafer to printed-circuits
are shown as conventional ball-bonds but a better method should be developed. Two
candidates are:
1. An adaptation of the beam-lead concept, developed by Bell Telephone
Laboratories.
2. The combination of ultrasonic techniques with flip-wafer to execute simultane-
ous bonding of the metalization pattern on the wafer to external wiring patterns.
3.2.3 Research Category 3 - Testing and Reliability
One of the most sobering aspects of LSI involves the matter of establishing the
testing program and the criteria for acceptance of a processed wafer. It should be
noted that a computer cannot be tested completely because it can exist in too many
internal states to permit the complete exercise of the machine. The fact that a
machine might exist on a single substrate does not change the astronomical number of
tests that would be required for complete testing. In addition, integration tends to
add a number of unplanned-for internal states, characterized by unique worst-case
combinations of regular internal states and signals, whose existence would be difficult
to predict. Finally, the option to exercise separable subsystems is lost by the act of
integration. Under these conditions, the question of what constitutes a failure-in-the-field
is difficult to answer. It is apparent that a designer must prepare carefully to
be able to define the test-for-acceptance at the end of the fabrication process. This
preparation will involve three areas:
1. Simulation will assist in the search for worst-case conditions before array-designs
are cast.
2. Discretionary wiring will help to restore the advantage of being able to perform
functional tests on smaller subsystems on the wafer using external-to-the-wafer
instrumentation.
3. Built-in-test-equipment will be included to provide integrated instrumentation
to assist in the testing of large integrated systems.
In two years, a combination of special probing/testing machines and built-in-test-equipment
will be used to inject test-signals at operating speed, to monitor the responses,
and to perform the data-management required during acceptance-testing. In this same
period the development trends in LSI technology will become clearer as experience
with largeness is gained and new problems are recognized. These trends will further
influence the machine-organization designer as he attempts to exploit the technology.
For example, the ease of building shift-registers in integrated form might lead to
conceptual approaches which trade memory capacity for access speed. Another strong
possibility is to incorporate error-detection and error-correction coding into the hard-
ware so that reliability may be improved using data-redundancy, as described in
Reference 15. Reliability improvement through hardware redundancy has also been
described in very great detail in Reference 16. The distributed processor is another
candidate organization which suggests an approach to effect a trade of hardware for
reliability. However, on the question of predicting the reliability of complex integrated
arrays, very little is known. The claim that batch-fabricated interconnections on a
wafer must improve matters is generally accepted; however, as already discussed, the distinction
between production faults and in-the-field failures is blurring due to the difficulties in
defining acceptance tests for the complex products of LSI. Nonetheless, it is possible
to predict that integrating complete complex functions on a wafer, such as a cell from
the distributed processor, will probably provide a good increase in system reliability
due to a decreased number of interconnections from each wafer and a decreased
number of packages.
3.3 LSI IN TEN YEARS
The feasibility of building the distributed processor described in this report by
1978 requires the development of a number of disciplines in parallel, e.g., computer-aided
design, materials, devices, processes, automatic testing and data-management.
The states of development for these LSI fundamentals have been described in
previous parts of this section. In the following paragraphs of Section 3, a closer
examination is made of the distributed processor so that the progress required to
achieve this advanced computing hardware can be appreciated. The question of feasibility
in ten years can then be resolved into specific judgments concerning the reality
of specific predictions.
Figure 3-3 is an illustration suggesting a gross layout using SOS for a cell from
the proposed distributed processor. The memory section of this diagram includes
provision for both data redundancy (double-error-detection, single-error-correction
using 8 extra bits for the coding) and for discretionary wiring (within each memory
subarray, 768 flip-flops exist from which 512 will be selected by discretionary wiring).
It is not the intention to specify a firm commitment to using data redundancy
and in particular 8 extra bits for purposes of achieving higher yields. However, these
numbers are used to illustrate a possible method to approach higher yields and will be
used in the sample calculations for a cell.
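The text takes 8 extra bits as a working figure for the coding. As a hedged cross-check, the sketch below computes the minimum number of check bits that a standard Hamming code with one added overall-parity bit needs for single-error correction and double-error detection. The function name is an illustrative assumption, and the report does not state which code it has in mind.

```python
def sec_ded_check_bits(data_bits):
    """Minimum check bits for a Hamming SEC code plus one overall-parity
    bit (SEC-DED). The Hamming condition is 2**r >= data_bits + r + 1."""
    r = 1
    while 2**r < data_bits + r + 1:
        r += 1
    return r + 1  # extra parity bit provides double-error detection


WORD_BITS = 18  # data bits per word, from the text
print(sec_ded_check_bits(WORD_BITS))
```

For an 18-bit word this minimum is smaller than the 8 bits assumed in the text, so the text's figure leaves some margin over the bare Hamming requirement.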
The processor section includes sufficient area so that more than one copy of the
processor can be fabricated. Discretionary wiring can then be used to choose a good
processor and thus increase the wafer yield. The first obvious question concerns the
density of devices required to integrate such a cell on a single substrate. The highest
densities will be within the memory arrays. The number of devices that must fit into
such a subarray is computed as follows:
$$N = 10\ \frac{\text{devices}}{\text{flip-flop}} \times 768\ \frac{\text{flip-flops}}{\text{subarray}} = 7680\ \frac{\text{devices}}{\text{subarray}}$$
The allotted area on Figure 3-3 for each subarray is computed as:
$$A = \frac{D^2}{72}\ \text{inches}^2$$
where D is the diameter of the sapphire wafer.
The growth of larger sapphire crystals can be anticipated to provide wafers which
are 1.5 inches in diameter (compared to 1 inch maximum diameter now). Thus the
required density within a memory subarray will be:
$$\text{Density} = \frac{7680\ \text{devices}}{(2.25/72)\ \text{inches}^2} \approx 245{,}000\ \text{devices/in}^2$$
Compared to the present Autonetics CMOS memory project, this is an increase of
about 10X in density. It was noted earlier that a factor of 3X decrease in linear dimensions
could be expected within the CMOS by SOS array of static flip-flops without pushing the
present state-of-the-art in photolithography. Therefore, it does appear feasible to
achieve an increase of (3X)^2, which approximates the required factor of 10X to realize
the densest part of the cell. If more memory capacity were desirable, it might be
possible to trade access time and power to gain the higher density found in dynamic
shift registers. This option is available to the machine-organization designers to
increase the cell memory as the concept of the distributed processor is developed.
(An explicit tradeoff of static versus dynamic shift register memories is not in order
here; however, the shift register memory will be practical to build if its power dissi-
pation can be sizably decreased from that of today).
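The density argument above can be reproduced numerically. This is a minimal sketch with illustrative variable names; the inputs (10 devices per flip-flop, 768 flip-flops per subarray, a 1.5-inch wafer, a subarray area of D^2/72, and the present 22,000 devices per square inch) are the figures given in the text.

```python
DEVICES_PER_FLIP_FLOP = 10
FLIP_FLOPS_PER_SUBARRAY = 768
WAFER_DIAMETER_IN = 1.5      # anticipated sapphire wafer diameter, inches
PRESENT_DENSITY = 22_000     # devices/in^2, present Autonetics CMOS array

devices_per_subarray = DEVICES_PER_FLIP_FLOP * FLIP_FLOPS_PER_SUBARRAY
subarray_area = WAFER_DIAMETER_IN**2 / 72            # in^2, A = D^2 / 72
required_density = devices_per_subarray / subarray_area

density_ratio = required_density / PRESENT_DENSITY   # needed improvement
gain_from_shrink = 3**2                               # 3X linear shrink -> 9X in area

print(devices_per_subarray, required_density, density_ratio, gain_from_shrink)
```

The required improvement comes out slightly above 10X, while a 3X linear shrink alone gives 9X, which is why the text speaks of the shrink approximating, rather than meeting, the required factor.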
The second basic question concerns the power dissipation of the static memory.
Total power dissipation is composed of two parts: (1) standby power, and (2) dynamic
power. The basic advantage of CMOS by SOS is that low standby power can be achieved
simultaneously with high speed performance within the memory subarrays. This
standby power is computed using measured data from Autonetics CMOS static flip-flops
as follows:
$$P_S = 24\ \text{subarrays} \times 512\ \frac{\text{flip-flops}}{\text{subarray}} \times 60\ \frac{\text{nanowatts}}{\text{flip-flop}} \approx 740\ \text{microwatts per cell at room temperature}$$
The standby power for an entire distributed processor should be less than 0.1 watts
(1 mW/cell x 20 cells/group x 4 groups < 0.1 W). Recalling that the duty cycle on memory
cells is very low, the basic advantage of CMOS by SOS is impressive.
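The standby-power estimate is a straightforward product, sketched below with illustrative names. All inputs are the text's figures: 24 subarrays per cell, 512 selected flip-flops per subarray, 60 nanowatts per flip-flop, and 20 cells in each of 4 groups.

```python
SUBARRAYS_PER_CELL = 24
FLIP_FLOPS_PER_SUBARRAY = 512
STANDBY_W_PER_FLIP_FLOP = 60e-9   # 60 nanowatts, measured at room temperature
CELLS_PER_GROUP = 20
GROUPS = 4

# Standby power of the memory in one cell, in watts.
cell_standby_w = (SUBARRAYS_PER_CELL * FLIP_FLOPS_PER_SUBARRAY
                  * STANDBY_W_PER_FLIP_FLOP)

# Standby power of the whole distributed processor, in watts.
system_standby_w = cell_standby_w * CELLS_PER_GROUP * GROUPS

print(cell_standby_w * 1e6, "microwatts per cell")
print(system_standby_w, "watts for the whole processor")
```

The unrounded per-cell value is a little under the 740 microwatts quoted, and the system total stays below the 0.1 watt bound even before rounding the cell figure up to 1 milliwatt.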
Dynamic power, consumed by charging and discharging the nodes of the network
during operation, is more difficult to estimate. The dynamic power associated with
any single node of the cell would be computed as:
$$P_D(\text{1 node}) = C_N V^2 f\ \text{watts}$$
Assuming an average of 10,000 active nodes in the whole distributed processor,
each having an average of 5 pf capacity, a supply voltage of 10 volts, and an operating
frequency of 2 MHz (easily obtainable using CMOS), the dynamic power is:
$$P_D = (10^4) \times (5 \times 10^{-12}) \times (10^2) \times (2 \times 10^6) = 10\ \text{watts}$$
The dynamic power completely dominates the standby power at room temperature.
The total operating power, 10 watts, is divided among some 80 cell wafers so that there
does not appear to be a power dissipation (heating) problem using CMOS by SOS.
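The dynamic-power figure and its division among the cell wafers can be checked in the same way. The sketch below uses an illustrative function name; the node count, capacitance, supply voltage, frequency, and the count of 80 cell wafers are the values assumed in the text.

```python
def dynamic_power_w(nodes, capacitance_f, supply_v, frequency_hz):
    """Dynamic power of `nodes` switching nodes, each dissipating C * V^2 * f."""
    return nodes * capacitance_f * supply_v**2 * frequency_hz


total_w = dynamic_power_w(nodes=10_000, capacitance_f=5e-12,
                          supply_v=10.0, frequency_hz=2e6)
per_wafer_w = total_w / 80   # spread over some 80 cell wafers

print(total_w, "watts total")
print(per_wafer_w, "watts per cell wafer")
```

Because the power scales with the square of the supply voltage, halving the voltage in this model would cut the dynamic power by a factor of four.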
The capacity to produce, in 1978, CMOS by SOS integrated cells of about 200,000
devices for the processor including redundant processor arrays, is more a question of
management and company goals than of technical limitations. Progress in materials
and processing techniques, e.g., ion-implantations, multiple heteroepitaxy, multilayer
interconnects, must be matched by progress in the following contributing fields within
the next ten years:
1. Computer-aided design including advanced I/O equipments should permit the
man and machine to function in the conversational mode. Dynamic simulation
of an entire memory subarray (7680 devices) or the processor (5,000 devices)
must be possible. The simulation programs should permit the detailed
selection of topology of the networks including device dimensions, physical
location, and interconnection patterns to optimize performance.
2. The use of computers will be extended to the automatic production of the tooling,
i.e., the layout and fabrication of the photo-mask-sets. At the level of
complexity represented by a cell from the proposed distributed processor,
the quantity of detail in such a mask set is too great for human generation.
3. The use of computers will be extended to the control of the acceptance testing
including dynamic functional testing and data management followed by the
automatic generation of masks for discretionary wiring. Retest of the custom-
wired packaged cells must also be under computer control. The access to
various subsystems on the wafer will make particular use of multilayered
wiring and built-in test signal generators.
The importance of all these parallel developments in LSI is appreciated by the
management throughout the industry as evidenced by their direction of R&D funds.
Therefore, it is predicted that the technology will exist in 10 years to build the
distributed processor.
4. PARALLELISM
4.1 INTRODUCTION
This section of the report covers parallelism within the computations. Definitions
of parallelism will be given and a discussion of the methods and procedures used to
evaluate parallelism within a set of computations will be presented. Parallelism will
be investigated in order to provide a basis for evaluating various parallel or distributed
computer organizations. In particular, the amount of parallelism that may be utilized
will be evaluated and the methods of making use of parallelism or assigning parallel
operations shall be investigated. Finally, results derived from the application
of parallelism studies to the spaceborne computational problem shall be presented.
Two types of parallelism, applied and natural, may be defined as follows: Applied
parallelism: The property of a set of computations that enables a number of groups of
identical operations within the set to be processed simultaneously on distinct or the
same data bases. Natural parallelism: The property of a set of computations that
enables a number of groups of operations within the set to be processed simultaneously
and independently on distinct or the same data bases.
It may be noted from these definitions that applied parallelism is a special case
of natural parallelism, since the naturally parallel operations could be groups of
identical operations. The distinction between the two types is introduced since it has
important implications with regard to computer organization. The two types of
parallelism will be illustrated with the simple example in Figure 4-1.
The example is the computation of the expression a/x + b/x + cy = Z. Figure 4-1
illustrates how the expression would be computed in a sequential manner. The numbers
above each circle indicate the time that might be required to compute each term on a
sequential computer (S). If applied parallelism were capable of being taken advantage
of, the computation on such a machine would take place as illustrated in Figure 4-2.
The term A/S in this figure is the ratio of the time required on the machine with the
capability for executing applied parallelism to the time required on the sequential
machine. It should also be noted that a degree of applied parallelism of 2 is utilized
in Figure 4-2 (this occurred during the parallel computation of a/x and b/x).

Figure 4-1. Sequential Steps in Computation

Figure 4-2. Applied Parallelism in the Computation

If the capability for taking advantage of natural parallelism is now introduced,
then the computation may take the form illustrated in Figure 4-3. It should be noted
that the term cy may now be computed in parallel with a/x and b/x. The total
computation illustrated by Figure 4-3 may be classified as utilizing natural parallelism;
however, it should be noted that a subset of this may actually be classified as applied
parallelism as indicated by the dashed lines. Therefore, the computation may be said
to consist of applied and natural parallelism; this combination may be referred to as
total parallelism. Utilization of total parallelism results in the reduction ratio T/S
in Figure 4-3. With this introduction to the notion of parallelism, some of the
problems in attempting to analyze a set of computations to determine the utilization
of parallelism will be given next.

Figure 4-3. Applied and Natural Parallelism in the Computation
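The three ways of computing a/x + b/x + cy can be compared with a small computation graph. The sketch below is illustrative only: the operation times (4 for a divide, 2 for a multiply, 1 for an add) are hypothetical units chosen for the example and are not the values shown in the figures, and the function names are assumptions.

```python
# name: (operation kind, hypothetical duration, operations it depends on)
OPS = {
    "a/x":  ("div", 4, []),
    "b/x":  ("div", 4, []),
    "c*y":  ("mul", 2, []),
    "sum1": ("add", 1, ["a/x", "b/x"]),
    "Z":    ("add", 1, ["sum1", "c*y"]),
}


def sequential_time(ops):
    """Time on a sequential machine: one operation after another."""
    return sum(duration for _, duration, _ in ops.values())


def staged_time(ops, stages):
    """Time when each stage runs its operations simultaneously."""
    return sum(max(ops[name][1] for name in stage) for stage in stages)


def critical_path_time(ops):
    """Time with unlimited parallel units: the longest dependency chain."""
    finish = {}

    def done(name):
        if name not in finish:
            _, duration, deps = ops[name]
            finish[name] = duration + max((done(d) for d in deps), default=0)
        return finish[name]

    return max(done(name) for name in ops)


S = sequential_time(OPS)
# Applied parallelism: only the two identical divides run together.
A = staged_time(OPS, [["a/x", "b/x"], ["c*y"], ["sum1"], ["Z"]])
# Total parallelism: c*y also overlaps the divides.
T = critical_path_time(OPS)
print("A/S =", A / S, " T/S =", T / S)
```

With these hypothetical times the applied-parallelism ratio A/S is 2/3 and the total-parallelism ratio T/S is 1/2; other time assignments would give other ratios.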
4.2 ASSIGNMENT AND SEQUENCING PARALLEL OPERATIONS
It can be seen from the above simple example that an important part of any
parallelism analysis is the formation of computation graphs to study the degree of
parallelism utilized and the inherent reductions in computation time. Unfortunately
the graphs are quite complex to construct for problems of practical interest. It will
be shown below that the problems encountered in forming the graphs are analogous
to those studied in assembly line and job shop scheduling theory. This particular
topic will be treated first before discussing the overall problem of analyzing
parallelism in the spaceborne computational problem.
An important consideration in the analysis of parallelism is the degree of
parallelism utilized and the computation time reduction or "effectiveness of the
parallelism". In determining this effectiveness of parallelism one approach that may
be used is to assume a certain degree of parallelism and then determine the minimum
computation time by re-ordering the computation graph to make use of the parallelism.
Following this procedure a curve of the degree of parallelism vs. computation reduction
ratio is obtained. It should be noted that this problem can be approached another way
to arrive at the same curve; this alternate approach is to assume a certain computation
reduction ratio and solve for the minimum degree of parallelism required to achieve
this reduction ratio.
These two approaches of obtaining the "parallelism" curve are analogous to
problems proposed and studied in the area of operations research. The first approach,
determination of minimum computation time given a degree of parallelism, is analogous
to the problem encountered in job shop scheduling or assembly line balancing, namely:
given m jobs or tasks to be performed by n men or assembly stations, determine the
sequence or assignment of the jobs or tasks that completes them in the shortest time.
The second approach is analogous to another job shop scheduling or assembly
line balancing problem, namely: given m jobs to be performed, where the total time spent
by any one man or at any one assembly station is not to exceed T, what is the sequence
or ordering of the jobs that minimizes n, the number of men or assembly stations required?
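Both approaches can be sketched for the simplified case of unit-time jobs. The code below is a hedged illustration rather than the procedure used in the study: it uses a level-priority list-scheduling heuristic, which is not guaranteed to give the true minimum time for arbitrary graphs, and the job graph `SUCC` is a hypothetical example. `schedule_length` follows the first approach (time for a given degree of parallelism) and `min_processors` follows the second (smallest degree of parallelism meeting a time limit).

```python
def levels(succ):
    """Level of a job: number of jobs on the longest chain from it to the end."""
    memo = {}

    def level(job):
        if job not in memo:
            memo[job] = 1 + max((level(s) for s in succ[job]), default=0)
        return memo[job]

    for job in succ:
        level(job)
    return memo


def schedule_length(succ, n):
    """Time steps used by level-priority list scheduling on n processors."""
    lvl = levels(succ)
    preds_left = {job: 0 for job in succ}
    for job in succ:
        for s in succ[job]:
            preds_left[s] += 1
    ready = [job for job in succ if preds_left[job] == 0]
    steps = 0
    while ready:
        ready.sort(key=lambda job: -lvl[job])        # highest level first
        running, ready = ready[:n], ready[n:]
        steps += 1
        for job in running:                          # release successors
            for s in succ[job]:
                preds_left[s] -= 1
                if preds_left[s] == 0:
                    ready.append(s)
    return steps


def min_processors(succ, time_limit):
    """Smallest n whose heuristic schedule fits within time_limit, else None."""
    for n in range(1, len(succ) + 1):
        if schedule_length(succ, n) <= time_limit:
            return n
    return None


# Hypothetical job graph: job -> jobs that must wait for it.
SUCC = {"A": ["E"], "B": ["E"], "C": ["F"], "D": ["F"],
        "E": ["G"], "F": ["G"], "G": []}

curve = {n: schedule_length(SUCC, n) for n in range(1, 5)}
print(curve)
```

Tabulating `schedule_length` against `n` gives one point of the parallelism curve per degree of parallelism; dividing each time by the one-processor time gives the reduction ratio.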
Considerable research has been carried out in this field of operations research
and several references were reviewed to determine if any of it could be extended to
the parallelism investigation carried out in this study; some significant points from
the references are discussed below. An extensive list of references was found in a
fairly recent report: "The Automatic Assignment and Sequencing of Computations on
Parallel Processor Systems" by D. F. Martin at UCLA (Reference 18).
The following problem is considered in Reference 19: n jobs with partial or total
ordering restrictions are to be performed. If all jobs must be completed by time T,
arrange a schedule that completes them with the minimum number of men; or, if n men
are available, arrange a schedule that completes all jobs at the earliest time. It is
assumed that all jobs require equal time to complete by any man and that any man can
do any job, with the capability of immediately starting another after finishing one.
Under these assumptions a lower bound for the number of men is found. A brief
description of how this lower bound can be found will be given here.
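If

$$ y < \frac{1}{y^* + c} \sum_{j=1}^{y^*} p(\alpha + 1 - j) \qquad (1) $$

is true, then it is impossible to complete all jobs with y men in $\alpha + c$ units of time,
where
y = number of men
$p(\alpha + 1 - j)$ = number of nodes on the graph with $\alpha_i = \alpha + 1 - j$
$\alpha_i = X_i + 1$, where $X_i$ is the length of the longest path from the node $N_i$ to
the final node in the graph (it is assumed that each branch takes an
equal unit of time, 1, to complete)
$\alpha$ = maximum $X_i$ over the entire graph
c = a non-negative integer
y* = a positive integer such that

$$ \max_{y} \left[ \frac{1}{y + c} \sum_{j=1}^{y} p(\alpha + 1 - j) \right] = \frac{1}{y^* + c} \sum_{j=1}^{y^*} p(\alpha + 1 - j) \qquad (2) $$

(the index y under the maximum in Equation 2 is a dummy variable ranging over the
positive integers).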
Basically, the lower bound (y) is found by labelling all the nodes of the graph with
their appropriate $p(\alpha_i)$, determining c from the allotted time, T, to complete the tasks
(c corresponds to a time interval that may be added on to the shortest possible time to
complete the tasks, so that the allotted time, T, is c units of time greater than the
minimum), and then determining y* for Equation 1 by solving Equation 2.
Unfortunately, in using this algorithm to find the upper bound on the minimum
number of men, the ordering restriction that the graph be a tree* must be imposed.
With this restriction the minimum is given by m, where m is the integer satisfying:

$$ m - 1 < \frac{1}{y^* + c} \sum_{j=1}^{y^*} p(\alpha + 1 - j) \le m \qquad (3) $$

and where the same definitions apply.
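As a rough illustration of Equations 1 through 3, the following Python sketch labels the nodes of a small tree, counts the nodes at each label value, and evaluates the bound. The function names and the example graph are hypothetical. The sketch also assumes that $\alpha$ is taken as the largest node label $\alpha_i$, so that $p(\alpha)$ counts the nodes farthest from the final node; this is an assumption about the intended definition rather than a statement from Reference 19.

```python
import math
from collections import Counter
from fractions import Fraction


def node_labels(successors):
    """Return alpha_i = X_i + 1, where X_i is the longest path (in branches) to a final node."""
    longest = {}

    def path_length(node):
        if node not in longest:
            longest[node] = max((1 + path_length(s) for s in successors[node]), default=0)
        return longest[node]

    return {node: path_length(node) + 1 for node in successors}


def minimum_men(successors, c=0):
    """Evaluate the right-hand side of Equation 2 and return the integer m of Equation 3."""
    labels = node_labels(successors)
    alpha = max(labels.values())          # assumption: alpha is the largest label
    p = Counter(labels.values())          # p[k] = number of nodes with label k
    best = Fraction(0)
    running_sum = 0
    for y in range(1, alpha + 1):         # labels below 1 do not exist, so stop at alpha
        running_sum += p[alpha + 1 - y]
        best = max(best, Fraction(running_sum, y + c))
    return math.ceil(best)                # m - 1 < bound <= m


# Hypothetical tree: a, b, c feed d; d and e feed the final node f.
tree = {"a": ["d"], "b": ["d"], "c": ["d"], "d": ["f"], "e": ["f"], "f": []}
print(minimum_men(tree, c=0))   # men needed to finish in the shortest possible time
print(minimum_men(tree, c=1))   # men needed when one extra time unit is allowed
```

For this example the shortest possible schedule takes three time units; allowing one additional unit (c = 1) reduces the number of men required.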
*A tree may be defined as the ordering restriction that the directed graph has no nodes
which contain more than one successor node.
It is also proven that for these assumptions a very simple algorithm may be
applied to determine the sequencing procedure for the jobs. This algorithm states that
if the number of jobs that may be started at any point in the graph is greater than the
number of men available, then the jobs assigned are those with the longest path through
the graph; in case of a tie the choice is arbitrary. This method is simply that of
starting the longest job first or as soon as possible.
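The sequencing rule just described can be sketched as follows for unit-time jobs. This is a minimal illustration under the stated assumptions (equal job times, tree ordering); the function name, the tie-breaking by job name, and the example graph are hypothetical choices made only so that the output is deterministic.

```python
def longest_path_first(successors, men):
    """Assign unit-time jobs step by step, preferring jobs with the longest path ahead."""
    level = {}

    def node_level(node):
        # Number of nodes on the longest path from this node to a final node.
        if node not in level:
            level[node] = 1 + max((node_level(s) for s in successors[node]), default=0)
        return level[node]

    for node in successors:
        node_level(node)

    # Count unfinished predecessors of every job.
    waiting = {node: 0 for node in successors}
    for node in successors:
        for s in successors[node]:
            waiting[s] += 1

    ready = [node for node in successors if waiting[node] == 0]
    schedule = []
    while ready:
        # Longest remaining path first; ties are arbitrary (broken here by name).
        ready.sort(key=lambda node: (-level[node], node))
        started, ready = ready[:men], ready[men:]
        schedule.append(started)
        for node in started:
            for s in successors[node]:
                waiting[s] -= 1
                if waiting[s] == 0:
                    ready.append(s)
    return schedule


# Hypothetical tree: a, b, c feed d; d and e feed the final node f.
tree = {"a": ["d"], "b": ["d"], "c": ["d"], "d": ["f"], "e": ["f"], "f": []}
for step, jobs in enumerate(longest_path_first(tree, men=2), start=1):
    print(step, jobs)
```

Each inner list holds the jobs started in one unit of time, so the length of the returned list is the completion time for the given number of men.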
The restrictions that each job requires an equal amount of time and that the
ordering of the graph be a tree are severe in terms of attempting to extend this work
to parallelism studies of computations. It may be possible to reduce computational
graphs to the micro-operation level where each job takes an equal amount of time;
however, for problems of reasonable complexity, this may become a monumental
task. The restriction of a tree limits the type of graphs that may be analyzed;
unfortunately most of the computations involved here do not form trees (this is
because the intermediate results are often used at many points within a computation).
Nevertheless this work has been pointed out here since it represents the only
purely analytical approach found in the literature to solving the assignment and sequencing
problem. If the first restriction, namely equal lengths of time, can be reckoned with,
it does offer a lower bound for any computational graph.
The following problem is investigated in Reference 20: given n jobs, where the sum of
the execution times at any one station is not to exceed some T, what is the ordering of
the jobs that minimizes the number of stations? The approach taken for solution here is
by means of dynamic programming. A recursion type approach is proposed. Basically
what is proposed is to vary the assignment of the last job that is executed and determine
the assignment over all feasible cases that results in the minimum cost (cost here is
the number of execution stations). Then recursively the remaining assignments are
determined to minimize the total cost. This method will yield an optimum solution.
However, for all but modest size problems the program will become extremely lengthy.
For large problems approximate solutions are proposed by testing subsets of all
possible sequences thereby yielding a suboptimal solution.
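To make the recursion concrete, a small dynamic-programming sketch is given below. It is not the procedure of Reference 20 itself, only an illustration in the same spirit: the state is the set of jobs already assigned, and the value kept for each state is the pair (stations used, time used at the last station). The function name, the job data, and the precedence table are hypothetical. The table has one entry per subset of jobs, which shows why the method becomes lengthy for all but modest problem sizes.

```python
def minimum_stations(times, predecessors, limit):
    """Return the fewest stations such that no station's total time exceeds `limit`."""
    jobs = list(times)
    index = {job: i for i, job in enumerate(jobs)}
    pred_mask = []
    for job in jobs:
        mask = 0
        for p in predecessors.get(job, ()):
            mask |= 1 << index[p]
        pred_mask.append(mask)

    n = len(jobs)
    unreachable = (n + 1, 0)
    best = [unreachable] * (1 << n)
    # No stations opened yet; a "full" dummy station forces the first job to open one.
    best[0] = (0, limit)
    for done in range(1 << n):
        if best[done] == unreachable:
            continue
        stations, load = best[done]
        for i, job in enumerate(jobs):
            bit = 1 << i
            if done & bit or pred_mask[i] & ~done:
                continue  # already assigned, or a predecessor is not yet assigned
            t = times[job]
            if t > limit:
                raise ValueError(f"job {job!r} alone exceeds the station limit")
            candidate = (stations, load + t) if load + t <= limit else (stations + 1, t)
            if candidate < best[done | bit]:
                best[done | bit] = candidate
    return best[(1 << n) - 1][0]


# Hypothetical job times (all positive) and precedence restrictions.
times = {"a": 3, "b": 4, "c": 2, "d": 5, "e": 2}
predecessors = {"c": ["a"], "d": ["b"], "e": ["c", "d"]}
print(minimum_stations(times, predecessors, limit=7))
```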
An a priori assignment and sequencing approach to assigning jobs on a computer
in order to minimize overall computation time is given in Reference 21. The problem
is stated for a set of autonomous processing units where the jobs take different times
depending on which unit they are assigned to; however, the situation where a job may be
assigned to only one particular unit is considered here. The graph of the computation
flow is used to derive a "precedence" matrix which is then used to derive an "assign-
ment" matrix. This matrix contains information as to what must precede each task,
its time to complete, and also the number of tasks that must follow it (called the
precedence number).
Several rules are defined for assigning the tasks; the objective being to minimize
the overall computation time. One rule given for assigning tasks in the case where
more than one task can be assigned to a given processor is to assign the task with the
highest precedence number. Another case is considered for this situation where the
precedence numbers are the same; a very simple situation is examined and some
special rules are stated. It is noted that when more tasks and processors are involved,
the solution becomes very complex and in fact it is stated that no general solution
exists for the general case of n tasks and m processors.
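The precedence number and the first assignment rule can be illustrated with a short sketch. The data layout (a successor list per task) and the function names are assumptions made for illustration; Reference 21 works from precedence and assignment matrices, which are not reproduced here.

```python
def precedence_numbers(successors):
    """Precedence number of a task = number of tasks that must follow it."""
    followers = {}

    def all_followers(task):
        if task not in followers:
            found = set()
            for s in successors[task]:
                found.add(s)
                found |= all_followers(s)
            followers[task] = found
        return followers[task]

    return {task: len(all_followers(task)) for task in successors}


def choose_task(candidates, numbers):
    """Among tasks that could be assigned to a processor, take the highest precedence number."""
    return max(candidates, key=lambda task: numbers[task])


# Hypothetical computation flow graph.
flow = {"A": ["C"], "B": ["C", "D"], "C": ["E"], "D": [], "E": []}
numbers = precedence_numbers(flow)
print(numbers)
print(choose_task(["A", "B"], numbers))
```

When two candidates have the same precedence number this sketch simply returns the first one; as noted above, the reference gives special rules only for a very simple situation of that kind.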
Evaluation of these references and others shows that there are no general
solutions to the optimum sequencing problem that are applicable to the study of
parallelism here. The only solutions to the problem are generally very lengthy
computational programs considering all permutations of the solution space to find a
minimum; analytical procedures are non-existent for general solutions. It should
also be mentioned here that there is another field in the assignment of computations:
the scheduling problem, where jobs are scheduled on one or more processors
to achieve a minimum cost. What is generally assumed here is that there is a cost
function, c(t), associated with each job, and a schedule is to be determined which will
minimize the total cost. The methods used here are generally applicable to and used
for time sharing or multiprogramming applications.
Some work has been done on several specific mathematical algorithms in
Reference 22 to determine parallelism and computation reduction ratio. The type of
algorithms investigated were matrix multiply, solving ordinary differential equations,
etc. This work may be useful in analyzing the problems here when these types of
specific mathematical operations are encountered in the computations to be analyzed.
4.3 APPLICATION OF PARALLELISM STUDIES
The foregoing has presented some general concepts of parallelism and their
relationships to the problems involved in job shop scheduling and assembly line
balancing. In this section the application of parallelism studies to the spaceborne
computational problem shall be investigated. The theoretical approach to the
application of parallelism studies shall first be presented; this approach is too complex
to apply to the problem at hand. Therefore, at the end of this section an approach that
will be used with simplifying assumptions shall be presented.
Two types of parallelism shall be considered: applied and natural. The com-
putations as executed on a sequential computer for the spaceborne computational
problem are shown in Figure 4-4. These computations are shown as a sequential
solution of tasks 1 through n. The computations are cyclic, in that when task n is
completed, task 1 is once again initiated. These tasks are subtasks of the four major
computational tasks, namely:
1. Navigation and Guidance
2. Telecommunications
3. Scientific Experiments
4. Status Monitoring and Checkout
The subtasks may be items such as "compute body to inertial transformation matrix,"
"compress video experiment data," etc. These subtasks are either periodic, in that
they require a certain repetition rate, or asynchronous, in that they are required
upon request, as background, etc. As might be expected, the four major tasks consist
of a variety of subtasks in terms of size and repetition rates of solution. Some of the
tasks shown in Figure 4-4 are identical due to the required periodicity of certain tasks.
For example, task 4 may be required to be computed once per second; assuming
tasks 1 through n are cycled through once per second, task 4 is unique and is represented by
only one task in this figure. Now assume task 5 is required to be computed twice per
second; then this task would appear twice in Figure 4-4, for example as task 5 and,
say, task n.
Figure 4-4. Sequential Solution of Computational Problem
4.3.1 Applied Parallelism
If the computational facility has the capability for execution of applied parallelism,
the execution of the set of tasks shown in Figure 4-4 may now be depicted as shown in
Figure 4-5. Referring back to Figure 4-4, the sequential solution of a task is depicted
by the sequential set of steps shown in task 4. The application of applied parallelism
is shown in Figure 4-5 by the formation of an applied parallel graph shown in the
solution of tasks 1 and 2. As implied by the definition of applied parallelism, the
vertices of these graphs that are executed simultaneously (indicated by vertices lying
in the same horizontal plane) consist of an identical computational step (add,
transfer, etc., is the level considered here). The degree of parallelism utilized varies
as a function of time or vertical position in each graph. The degree of parallelism is
given by the number of vertices in a particular horizontal plane.
The considerations of parallelism in Figure 4-5 are within tasks only. It is
possible to consider applied parallelism between tasks also. For example, the graphs
for tasks 1 and 2 in Figure 4-5 may have several horizontal planes that execute the
same applied parallel operations and as a result may be combined horizontally; this
combining cannot be carried out if precedence relationships exist, however. Likewise
many of the graphs for the remaining tasks may have horizontal planes that could also
be combined. The application of applied parallelism to be considered in this study will
be within a task only or between tasks only when the tasks have identical graphs so
that they may be combined directly. This limitation is introduced since the considera-
tion of combining applied parallelism between tasks and the timing of the remaining non-
TASK I
TASK 2
TASK 3
Figure 4-5.
TASK n
Utilization of Applied Parallelism in the Computational Problem
4-8
appliedcomputationsbetweenthesetasks becomesavery complexproblem. Indeedit
may notbe practical to programthe tasks in sucha mannersince executivecosts in
attemptingto scheduletasks to take advantageof all theappliedparallelism between
tasks mayprove extremely high. Someconsiderationwill be givento this problem
whensoftwarestudies are conductedon the distributed organizationlater in the report.
The computational tasks will be studied as described above and a curve will be
derived indicating the degree of applied parallelism vs. the computation reduction
ratio. The computation reduction ratio is basically the speed advantage gained by
using applied parallelism, or how much faster the problem will run. It is determined
by dividing the time it takes to execute the problem on a sequential machine
(Figure 4-4) by the time it takes on a machine with applied parallelism (Figure 4-5).
The curve is obtained by varying the degree of applied parallelism available when
constructing the solution as shown in Figure 4-5. This curve gives a measure of the
effectiveness or efficiency of applied parallelism, e.g., if a reduction ratio of a were
obtained with a degree of parallelism of a, then one could say the effectiveness was
100 percent, or that the applied parallelism was utilized throughout the solution of
the problem.
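The definition above may be written compactly as follows. The symbols $R$, $E$, $d$, $T_{\mathrm{seq}}$ and $T_{\mathrm{par}}$ are introduced here only for illustration and are not notation used elsewhere in the report: $d$ is the degree of applied parallelism available, $T_{\mathrm{seq}}$ the execution time on a sequential machine, and $T_{\mathrm{par}}(d)$ the execution time with degree $d$.

$$ R(d) = \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}(d)}, \qquad E(d) = \frac{R(d)}{d} $$

An effectiveness of 100 percent then corresponds to $R(d) = d$, i.e. $E(d) = 1$.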
4.3.2 Natural Parallelism
The next step in investigating parallelism within the spaceborne computational
problem is to consider natural parallelism. If natural parallelism is assumed available
on the computational facility, the solution of the computations may take the form shown
in Figure 4-6. In this figure the tasks have utilized applied parallelism as in
Figure 4-5 and then natural parallelism when possible.
Figure 4-6. Utilization of Applied and Natural Parallelism in the Computational Problem
This utilization of natural parallelism occurs both within a task and between tasks.
Consider task 6, for example: some of the graph of this task may exhibit applied
parallelism and some of it natural parallelism. It is seen in Figure 4-6 that task 5
may be solved in parallel with tasks 6, 7, and 8, and also tasks 9 and 10 and all the
other tasks indicated by the dashed lines. However, note that tasks 6, 7, and 8 contain
certain precedence relationships and must be solved in the sequence as indicated.
Solving the tasks in this manner, one can obtain the minimum computation time or
maximum computation reduction ratio by finding the tasks or set of tasks which take the
longest to compute in Figure 4-6, for example, say, tasks 9 and 10. Of course this
assumes that whatever degree of parallelism is required to achieve this is available.
The problem that one is interested in, however, is to find this minimum computation
time using the minimum degree of parallelism to achieve it. This then requires
assigning and sequencing the computations in an optimum manner. It should be noted
that this must be done within a task and also between tasks; as might be expected,
this is not an easy problem to solve due to the large number of permutations involved.
As an example consider the situation depicted in Figure 4-7. In task n is shown
the degree of applied and natural parallelism required as a function of time through
this task; the remainder of the tasks also have similar graphs associated with them.
It should be remembered that these graphs of the degree of parallelism are obtained
by an optimum assignment and sequencing analysis within each of the tasks.
Figure 4-7. Assignment and Sequencing of Tasks in the Applied and Natural Parallelism
The problem of optimum assignment and sequencing between the tasks involves arranging
the tasks both vertically and horizontally while observing precedence relationships
between certain tasks. As may be expected this involves many permutations. To
further complicate the problem it is also desired to utilize as much applied
parallelism as possible. This requires trying to match up identical applied operations
between tasks where possible and also trying to sequence the tasks so that as much
applied parallelism is being utilized at any one time as is possible. The total problem
has two curves similar to that given for task n, and it is desired to have as much applied
parallelism as is possible utilized and the minimum amount of natural parallelism
utilized to achieve the maximum computation reduction ratio. (It should be noted that
the natural parallelism curve also includes the applied parallel curve by definition.)
In general the final result desired is to consider the degree of parallelism
required throughout the total problem, as shown in Figure 4-8, and minimize the
maximum point on this graph. Each permutation in assigning and sequencing the tasks
will result in a new graph in Figure 4-8 and the objective is to determine the optimum
permutation.
4.3.3 Limiting the Amount of Parallelism
The above discussion was centered about determining the minimum computation
time or maximum reduction ratio. As noted above some task or set of tasks will
require the longest to compute, for example 9 and 10, in Figure 4-6. Then with the
above objective as a goal the problem involved a horizontal and vertical permutation
around these longest tasks. The next consideration involved in investigating parallelism
is to limit the degree of parallelism available (lower than that degree obtained above)
and determine the maximum computation reduction ratio with this degree.
Figure 4-8. Degree of Parallelism Utilized (plotted against the sequence through the problem)
The procedure now is the same as above; however, since there is now a lower degree of
parallelism considered, additional tasks must be vertically combined with tasks 9 and
10 (the longest tasks). Essentially, as the degree of parallelism is reduced, the graph
of the set of computations as shown in Figure 4-6 will be stretched out vertically and
reduced horizontally. It should also be noted that as the degree of parallelism is
reduced the permutations involved in determining the optimum computation reduction
ratio become more complex. This is due to the fact that not only must the permutations
between the tasks horizontally and vertically be considered but also the permutations
within each task must be considered at the same time.
The result of the above analysis would be a curve as shown in Figure 4-9. This
curve shows the degree of parallelism required vs. the computation reduction ratio.
The curve shows the computation reduction ratio from 1 with a degree of parallelism
of 1 (equivalent to solution with a single sequential computer as in Figure 4-1) to the
maximum reduction ratio and the minimum degree of parallelism required to achieve it.
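The shape of such a curve can be illustrated with a small sketch that limits the number of tasks allowed to run at once and records the resulting computation time. It uses a simple heuristic (start the ready task with the longest remaining chain first), not the optimum permutation search described above, and it treats each task as occupying one degree of parallelism. The task names loosely follow Figure 4-6, but the durations, the function name, and the priority rule are all hypothetical.

```python
import heapq


def computation_time(durations, successors, degree):
    """Heuristic completion time when at most `degree` tasks may run simultaneously."""
    chain = {}

    def remaining_chain(task):
        # Duration of the task plus the longest chain of tasks that must follow it.
        if task not in chain:
            chain[task] = durations[task] + max(
                (remaining_chain(s) for s in successors.get(task, ())), default=0)
        return chain[task]

    waiting = {task: 0 for task in durations}
    for task in durations:
        for s in successors.get(task, ()):
            waiting[s] += 1

    ready = [(-remaining_chain(t), t) for t in durations if waiting[t] == 0]
    heapq.heapify(ready)
    running = []          # heap of (finish time, task)
    now = 0
    while ready or running:
        while ready and len(running) < degree:
            _, task = heapq.heappop(ready)
            heapq.heappush(running, (now + durations[task], task))
        now, finished = heapq.heappop(running)
        for s in successors.get(finished, ()):
            waiting[s] -= 1
            if waiting[s] == 0:
                heapq.heappush(ready, (-remaining_chain(s), s))
    return now


# Hypothetical durations: T5 is independent, T6 -> T7 -> T8 and T9 -> T10 are chains.
durations = {"T5": 4, "T6": 2, "T7": 2, "T8": 2, "T9": 5, "T10": 4}
successors = {"T6": ["T7"], "T7": ["T8"], "T9": ["T10"]}
sequential = sum(durations.values())
for degree in (1, 2, 3, 4):
    t = computation_time(durations, successors, degree)
    print(degree, t, round(sequential / t, 2))
```

In this example the reduction ratio stops improving once the longest chain (T9 followed by T10) determines the computation time, which corresponds to the maximum point of the curve.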
4.3.4 Computer Organization Relations
Relating the above effort to computer organization, it is possible to think of
computational cells executing the above parallel operations, where one cell is required
for each degree of parallelism. The computation reduction ratio essentially dictates
at what speed the cells must be capable of operating. (Details of organizations such as
this are given in Section 5.) The above, however, assumes as much storage available
per cell as required.
Figure 4-9. Degree of Parallelism vs. Computation Reduction Ratio (shown with the 100% utilization curve)
After having gone through the above analysis it is possible to
consider the programming of the tasks on the cells, and a curve of storage required per
cell as a function of the degree of parallelism (or number of cells) would be obtained
as shown in Figure 4-10. Given the curves in Figures 4-9 and 4-10, it is possible to
select a degree of parallelism (or number of cells) and determine the required speed
and storage capability of each cell.
It is possible to group together tasks or portions of the total computational
problem as shown in Figure 4-11; this figure is identical to Figure 4-4 except that
tasks are now assigned as parts of groups as shown. The purpose of introducing this
concept of groups is that it offers the possibility of making use of more applied
parallelism; for example, if two groups were used, two different types of applied
parallel computation could be executed simultaneously if one considers the two groups
as assigned to two independent computation facilities. In considering groups, one will
obtain a set of curves for degree of parallelism vs. storage required per cell as shown
in Figure 4-12. This figure is identical to Figure 4-10 except a new parameter, the
number of groups, has been added. The degree of parallelism in Figure 4-12 refers
to the total parallelism (or total number of cells) required within the groups (e.g.,
2 groups with a degree of 10 each result in a total parallelism of 20). Only an equal
amount of parallelism for each group is considered here since this is perhaps the most
practical from a computer organizational viewpoint as discussed in Section 5.
Figure 4-12 is not intended to imply the exact shape of the curves or which curves lie
on top of each other. Actually the curve for 2 or more groups could lie below that for
1 group, etc. The determination of this would require the plotting of curves as shown
in Figure 4-13, which shows the total number of cells vs. the number of groups
(assuming a given computation reduction ratio and storage/cell). Curve 1 in
Figure 4-13 shows the situation where a minimum is achieved with more than 1 group;
this can occur if there is a significant amount of applied parallelism in the
computations. This results since it is assumed that in distinct groups applied parallel
operations can be executed independently and simultaneously of each other whereas in
one group only one type of applied parallel operation can be executed at any one time
by definition.
Figure 4-10. Degree of Parallelism vs. Storage Required Per Cell
Figure 4-11. Computational Problem Assignment Among Groups
Figure 4-12. Degree of Parallelism vs. Storage Required Per Cell Using Groups of Cells (curves for 1 group, 2 groups, and N groups)
Figure 4-13. Degree of Parallelism vs. Number of Groups (curves 1, 2, and 3, for a given computation reduction ratio and storage/cell)
Curve 2 shows the situation where for a low number of groups the number of
cells is approximately the same. In curve 3 the number of cells increases rapidly
as more groups are used; this would occur if the computations contained little applied
parallelism and there was considerable inefficiency in splitting the tasks among
groups (e.g., one group used 27 cells and the other required 29, etc.). Given a set of
curves as in Figure 4-13 one would attempt to pick a point which made efficient use of
the number of cells.
4.3.5 An Alternative Approach
The entire discussion above has been oriented from a computation reduction ratio
or speed viewpoint. That is with a certain degree of parallelism the maximum com-
putation reduction ratio has been found and associated with each of these points is a
storage requirement per cell if one considers a cell required for each degree of
parallelism. It is possible to approach this problem from the other viewpoint: assume
a certain degree of parallelism or number of cells available, arrange the computations
so as to minimize the storage per cell, and associated with each point will be a speed
requirement per cell.
It is obvious that the two approaches can yield completely different results. The
latter approach will not be gone into in detail here since the general problems associated
with it have been brought out above in the first approach. It is possible to take both
approaches simultaneously and set up criteria which yield a suboptimum approach
in terms of speed and storage. This would be similar to finding the optimums by each
approach and backing off each somewhat to determine the effect on the other parameter
and selecting a suitable point.
4.3.6 A Practical Approach to Parallelism Studies
The theoretical approach to studying parallelism within the computations has
been presented above. Unfortunately the approach is too complicated, as pointed out
above, to be used with a reasonable degree of success. To apply it, however, certain
simplifying assumptions will be employed. The applied parallelism investigation as
outlined above will be followed and a curve of degree of applied parallelism vs.
computation reduction ratio will be obtained. A number of tasks will be analyzed to
determine curves such as Figure 4-9 for each. Suitable points will be selected for
each task to represent the degree of parallelism. For tasks not analyzed or for which
no information is available, curves will be hypothesized. As mentioned above one task
or set of tasks will take the longest to compute and therefore determine the minimum
overall computation time. The tasks will then be laid out as in Figure 4-14. In this
figure task m takes the longest to compute, T, and defines the maximum computation
reduction ratio. The remaining tasks are assigned as shown. The assignment can be
based on following an algorithm as shown in the lower portion of Figure 4-14. Essen-
tially this assigns the tasks in order of increasing or decreasing degree of parallelism
as indicated by the arrows. The time allotted is now varied and the assignment
procedure repeated to obtain the degree of parallelism vs. computation reduction ratio
curve for the total problem. It should be noted that the above assumes a certain
constant degree of parallelism is required for each task throughout its execution. This
is of course a gross simplifying assumption since as pointed out previously this degree
of parallelism varies as a function of time through the task. The storage required
per cell will be determined by combining storage graphs of the type indicated in
Figure 4-15 for each task. The graphs will be combined for all the tasks assigned to
any one vertical column (e. g. task i, j and k in Figure 4-14). This procedure may
require the shifting of tasks somewhat if the storage requirements per cell are very
unbalanced between vertical columns (e. g. task k may require being exchanged with
two or more tasks which total up to the same degree of parallelism to smooth out the
storage requirements). Following this very oversimplified procedure, curves as
shown in Figures 4-9 and 4-10 will be derived.
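One possible reading of this assignment procedure is sketched below. Each task is given a constant degree of parallelism and a computation time, tasks are taken in order of decreasing degree, and each is placed in the first vertical column whose accumulated time still fits within the time allotted. The first-fit rule, the function names, and the task data are assumptions made for illustration; Figure 4-14 itself is not recoverable from the text, so this should not be taken as the exact algorithm shown there.

```python
def assign_columns(tasks, allotted):
    """tasks: list of (name, time, degree). Returns a list of vertical columns."""
    columns = []
    # Decreasing degree, so the first task in a column sets that column's width.
    for name, time, degree in sorted(tasks, key=lambda task: -task[2]):
        if time > allotted:
            raise ValueError(f"task {name!r} does not fit in the time allotted")
        for column in columns:
            if column["time"] + time <= allotted:
                column["tasks"].append(name)
                column["time"] += time
                break
        else:
            columns.append({"tasks": [name], "time": time, "degree": degree})
    return columns


def total_degree(columns):
    """Degree of parallelism needed = sum of the column widths."""
    return sum(column["degree"] for column in columns)


# Hypothetical tasks; task m is the longest and sets the minimum time allotted.
tasks = [("m", 10, 4), ("i", 4, 6), ("j", 3, 5), ("k", 3, 2), ("l", 6, 3)]
for allotted in (10, 20):
    layout = assign_columns(tasks, allotted)
    print(allotted, total_degree(layout), [c["tasks"] for c in layout])
```

Repeating the assignment for several values of the time allotted gives one point of the degree of parallelism vs. computation reduction ratio curve per value, as described above.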
Figure 4-14. Assignment of Tasks (tasks laid out within the time allotted)
Figure 4-15. Storage Graph/Cell (plotted against cell number)
4.4 RESULTS OF PARALLELISM STUDIES
The prior sections of this chapter contained a discussion of parallelism in
general and methods for analyzing computations to determine the amount of parallelism
within them. The computations for the manned Mars lander mission were
analyzed for parallelism. The results of this analysis are presented in this section.
Each of the computational tasks as defined in the requirements in the Phase I study
(Reference 2) was investigated and the results are summarized in Tables 4-1 and
4-2.
Figures 4-16 and 4-17 contain the applied parallelism speed curve; they show
the computation reduction ratio vs the degree of applied parallelism available in the
computation system. The 100 percent utilization curve in the figures is the 1-to-1
curve, i.e., for a degree of parallelism of 2 the computation reduction ratio would
be 2, for 5 it would be 5, etc. The actual curve is seen to deviate slowly from the
1-to-1 curve at first and then reaches an asymptotic reduction ratio value of 13.66
for higher degrees of parallelism. The knee of the curve occurs at approximately
a degree of 15; beyond this degree the curve deviates sharply from the 1-to-1 curve.
Figures 4-18 and 4-19 contain the computation reduction ratio and storage
required per cell (assuming one degree of parallelism results in one cell), respectively,
vs the degree of natural parallelism available in the computation system. It should be
recalled that natural parallelism includes applied parallelism by definition. The
vertical scale in Figure 4-18 is the same as in Figure 4-16; however, note that the
horizontal scale is considerably larger. The computation reduction ratio curve does not
have as sharp a knee as in the applied parallelism case. However, it appears that
somewhere in the range of 40 to 60 in degree of parallelism the curve starts to deviate
rapidly from the 1-to-1 curve. The storage curve is drawn on log-log paper to bring
out the deviations from the 1-to-1 curve. It can be seen that the curve begins to
deviate rapidly from the 1-to-1 curve in the vicinity of a degree of parallelism of
80 to 150.
Table 4-1. Applied Parallelism Results
Degree of Applied Parallelism     Computation Reduction Ratio
            1331                            13.66
             800                            13.66
             300                            13.65
             100                            13.47
              50                            12.58
              15                             8.75
               5                             3.98
               2                             1.895
               1                             1
Table 4-2. Natural Parallelism Results
Degree of Parallelism     Computation Reduction Ratio     Storage (Words/Cell)
        1342                         42.4
         300                         42.4                          200
         100                         42.4                          350
          50                         28.2                          575
          15                         12.15                       1,800
           5                          4.6                        5,250
           2                          1.96                      12,700
           1                          1                         24,633
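The deviation from the 1-to-1 curve can be expressed as a utilization figure, the computation reduction ratio divided by the degree of parallelism. The short sketch below computes this from the reduction ratios listed in Tables 4-1 and 4-2; the variable and function names are illustrative only, and the utilization column is a derived quantity that does not appear in the tables.

```python
# (degree of parallelism, computation reduction ratio) pairs from Tables 4-1 and 4-2.
applied = [(1331, 13.66), (800, 13.66), (300, 13.65), (100, 13.47), (50, 12.58),
           (15, 8.75), (5, 3.98), (2, 1.895), (1, 1.0)]
natural = [(1342, 42.4), (300, 42.4), (100, 42.4), (50, 28.2),
           (15, 12.15), (5, 4.6), (2, 1.96), (1, 1.0)]


def utilization(points):
    """Fraction of the 1-to-1 (100 percent utilization) curve achieved at each degree."""
    return {degree: ratio / degree for degree, ratio in points}


applied_utilization = utilization(applied)
natural_utilization = utilization(natural)
for degree in (2, 5, 15, 50, 100):
    print(degree,
          f"applied {applied_utilization[degree]:.1%}",
          f"natural {natural_utilization[degree]:.1%}")
```

At a degree of 15 the applied parallelism case achieves a little under 60 percent of the 1-to-1 curve, while the natural parallelism case is still above 80 percent, which is consistent with the knee locations described above.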
The above curves give an indication of the efficiency or utilization of parallelism.
They may also be used in determining the speed-storage characteristics of the cells.
It should be recalled from the technology section of this report that a storage capability
of approximately 512 words per cell was considered to be achievable. If this storage
is available per cell, then, referring to Figure 4-19, one can see that approximately
58 cells are needed (assuming a degree of parallelism equals one cell). Translating
this into speed requirements one can see from Figure 4-18 that a computation reduc-
tion ratio of approximately 31 results with 58 cells. Since the speed requirement for
a single computer is approximately 1,450,000 short operations/sec (the parallelism
investigations were carried out for the Mars orbital phase which has the maximum
speed and storage requirement), the speed requirement per cell is therefore approxi-
mately 25,000 short operations (ADD, SUB, etc. ) per second. Since these require-
ments did not include overhead functions (such as the executives) one may estimate
the number of cells required at approximately 80 with a storage capability of 512 words
each and requiring a speed capability of 25,000 operations per second. One can now
see that the cells will most likely be storage restricted rather than speed restricted
since the cells should be capable of more than 25,000 short operations per second.
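The arithmetic behind these estimates can be checked with a few lines. The numbers are those quoted above; the variable names are illustrative. The 25,000 figure equals the single-computer requirement divided by the number of cells (58). The text does not state explicitly whether the divisor intended is the number of cells or the computation reduction ratio of 31, so the sketch evaluates both for comparison.

```python
single_computer_rate = 1_450_000   # short operations per second, Mars orbital phase
cells = 58                         # from Figure 4-19 at about 512 words per cell
reduction_ratio = 31               # from Figure 4-18 at 58 cells

rate_per_cell_by_cells = single_computer_rate / cells
rate_per_cell_by_ratio = single_computer_rate / reduction_ratio

print(f"divided by number of cells:  {rate_per_cell_by_cells:,.0f} operations/sec")
print(f"divided by reduction ratio:  {rate_per_cell_by_ratio:,.0f} operations/sec")
```

Either value is modest for a single cell, so the conclusion that the cells are likely to be storage restricted rather than speed restricted does not depend on which divisor is used.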
Figure 4-16. Applied Parallelism Speed Curve (horizontal axis: computation reduction ratio)
Figure 4-17. Applied Parallelism Speed Curve (logarithmic vertical scale; horizontal axis: computation reduction ratio; shown with the 100% utilization curve)
Figure 4-18. Natural Parallelism Speed Curve (horizontal axis: computation reduction ratio; shown with the 100% utilization curve)
Figure 4-19. Natural Parallelism Storage Curve (horizontal axis: storage, words/cell; log-log scales)
5. COMPUTER STRUCTURES
5.1 DEVELOPMENT OF COMPUTER SYSTEM ARCHITECTURE
5.1.1 Introduction
This section discusses a number of approaches to construction of general
purpose computer systems. It is included in order to put into perspective the Distri-
buted Processor System under study in this contract. (Hereafter the Distributed
Processor will be referred to as DAMP for Distributed Array Memory and
Processor.)
The various system approaches are presented in an order that is consistent
with the development of the technology necessary to implement each structure. This
order will be seen to be almost chronological; however, there will be some deviation
since a number of interesting and practical computer organizations have been
conceived both a number of years after and before the technology was available to
practically implement them. It should be noted that each organization was not
developed to explicitly use the technology but rather was developed to attempt to meet
a certain set of requirements (cost, high speed, reliability etc. ) in a more efficient
manner by taking advantage of new hardware technology. In fact, in many cases,
initial organizational considerations of a system may have not even considered
technology, but instead were totally concerned with a theoretical structure suitable
for meeting certain requirements.
The discussion of computer organizations is pointed toward general purpose
structures since the requirements for the missions discussed in Section 2 dictate a
machine capable of carrying out a wide variety of computational tasks. As a result,
associative memory structures will not be discussed here since it is generally felt that
associative machines are not practical for general purpose computers. (See Reference 23.)
There are also such a large number of these structures suggested in the
literature for solution of certain types of problems that it would not be practical to
present a synopsis here. However, this lack of discussion does not necessarily
eliminate the possibility of using associative memories for carrying out certain
specific functions within a more generalized G.P. computer. On the other hand, the
Solomon and Illiac machines are discussed even though they are also recognized to
be practical only for execution of a relatively limited set of computational tasks.
These structures are nonetheless of interest since they are powerful distributed
processing structures that can be practically implemented with near-term LSI technology
and because they provide good examples of the use of global control (a single instruction
stream with multiple data streams).
5.1.2 Single Computer
In the early stages of computer development machines were generally built
from a variety of digital circuits with varying fan-ins and fan-outs, such as AND
gates, OR gates, inverters, etc. The development of the transistor enabled machines
to be efficiently built with a very few basic circuits, for example, simply NAND gates.
This eased the design and fabrication process considerably; and as a result, much
less expensive machines could be constructed. With the more recent development of
integrated circuits the construction of computers from a very small number of types
of circuits has become widespread. This has not only vastly decreased the price
of computers, but has also simplified reliability programs, sparing, and maintenance.
The above developments have, of course, also been reflected in more economical and
reliable memory circuitry.
The economy possible with transistor and IC computers enabled multiple computer
and multiprocessor structures to be proposed and constructed for applications
with requirements that were difficult to meet with a single computer. For example,
large scientific computing centers with requirements for greater processing power
than available from a single computer are able to use multiple computers or
multiprocessors (time-sharing systems, etc.). These structures have also found use in
aerospace applications where greater on-line reliability than that offered by a single
computer was required.
5.1.3 Cellular Arrays
The development over the past number of years of batch fabricated complex
integrated circuits has helped foster a good deal of research. Of particular interest
here is the increase in research activity pointed toward development of a single logic
element that could be used with a fixed connection pattern to implement a wide
variety of combinational (and sequential) circuitry. This type of work can be seen to
be the logical extension of the early use of IC's to minimize circuit types since
structures built from these IC circuits required complicated interconnections. Com-
binational circuitry of the above type has been called cellular arrays. The cells or
the single logic elements in the arrays can either be fixed or variable (using cut-
points as in Reference 24); and the interconnection structure can also be fixed or
variable. (It can be variable in the sense that cells can be preset so that they do not
use certain available connections.) A survey of this microcellular research is
given in Reference 25.
Reference 24 by Minnick et al. at SRI provides a good example of research
being carried out on cellular arrays. It is concerned with the organization, use, and
design of arrays constructed from "cutpoint" cells. These cells are constructed
on part of an IC chip and are provided with cutpoints (switches or flip-flops) that
can be used to specialize the operation of the cell to one of six simple logic functions
or to an R-S flip-flop. The usefulness of these cells within a variety of fixed and
variable connection structures is then discussed and evaluated. The structure of one
representative array organization is shown in Figure 5-1. This organization can vary
its connection structure by disconnecting various input leads from any cells. These
structures, used to implement the logic in a computer, could offer a number of advantages
through good utilization of IC technology. For example, a number of cells including
an interconnection structure could be fabricated on one IC chip and then mounted in a
package. If this were practical it would clearly provide very inexpensive and reliable
logic for the processing section of a computer. (It would be reliable since only a small
number of external connections would be required.) However, there are a number of
problems with the approach that may make it impractical: (1) in order to use the
fabrication and packaging structure described above, an efficient algorithm to avoid
faulty cells must be developed, and the present algorithms require extensive duplication
of hardware; (2) the number of cells required to implement a given switching function
is much larger than the number required with non-cellular logic; (3) at present,
effective logical design techniques do not exist; and (4) much more effort is required
simply to be able to design a computer with the circuitry. Solutions to the above
problems are necessary to make Minnick's arrays, and in fact cellular arrays in
general, at all practical for computer design and fabrication. An even greater
drawback to these structures is the fact that the onset of LSI technology in the future
will make it practical to construct a complete processor on a chip (or on just a few chips
in the near future). As a result, a single computer, a multiprocessing system, or a
distributed processing system using a single chip for a processor may well preclude
the use of cellular arrays to construct processors. However, these techniques, if
efficiently developed, may find some use in increasing yields of processor chips by
using discretionary wiring or some other faulty cell avoidance algorithm.
Figure 5-1. Structure of the Cob Web Array (From Ref. 24)
5.1.4 Distributed Processing Arrays
In the past and at present a considerable amount of thought has been devoted to
developing arrays of processors capable of carrying out a number of computations in
parallel. For the most part these structures have been suggested to solve various
types of problems, such as associative processors for information storage and
retrieval, Holland machines for direct control of highly parallel systems (e.g.,
pattern recognition, etc.), and Solomon machines for solution of linear systems,
ordinary and partial differential equations, matrix operations, etc. Some work has
also been pointed toward the use of these machines on a set of general purpose
problems; however, although they would be capable of solution of these problems,
the array structures are quite inefficient in this case when compared to conventional
computing systems. An additional reason for interest in these distributed processors
is that they could take good advantage of the present and developing LSI technology.
Each processor could be constructed from a single chip and the chips could be
packaged on a standard board with a fixed connection structure. Two distributed
processing structures will be briefly discussed below.
5.1.4.1 Holland Machine
The Holland machine has been discussed in an earlier report (Reference 1) and
also in the literature (References 22 and 26). It consists of an array of cells (see
Figure 5-2), each of which has a small amount of storage and is capable of executing
a small number of arithmetic and control instructions. This structure, like
Minnick's logic elements, would take good advantage of IC technology by implementing
each cell as a chip and by packaging the chips on a board containing the fixed standard
interconnection structure. The real advantage here over Minnick's structure is
that both the processing and memory are included in this regular structure. The cells
operate under local control and as a result can execute a number of different instruction
streams in parallel. Any cell can act as a controller cell, an accumulator, or a
storage unit. Paths to operands are then built from the controller cell to storage cells
and then back to a cell designated as an accumulator to carry out the operation. This
basic iterative machine structure, investigated in some depth at the University of
Michigan (see Reference 22), was found to be impractical because the need to build
paths greatly complicates the programming and also makes it difficult to
reconfigure after failures. These problems were compounded by the fact that
very little memory was available to each cell and that only a small instruction set
was available.
Although this machine was not considered practical, the discussion of parallelism
and the usefulness of local control for executing parallel computations is of interest.
In particular the ability to locally control computations enables a large number of
distinct problems as well as parallel parts of a single problem to be executed simultan-
eously. Although the Holland machine was investigated for problems in the fields of
pattern recognition, games, simulation, adaptation, etc., this local control feature
can be seen to be of value for executing a set of general purpose problems on a distri-
buted processor machine without severe programming problems. Each separate
problem can be executed simultaneously if the computing power is available; however,
the efficiency of hardware utilization would not be as high as for a fast single computer
since some problems or groups of problems will be found to be sequential in nature.
Nonetheless, it will be shown below that the use of a distributed processing machine
capable of local control is worthwhile under certain sets of requirements (e.g., high
reliability, high speed on certain problems).
5.1.4.2 Solomon and Illiac Machines
The Solomon machine was developed in order to obtain a high solution rate on
certain classes of problems by applying parallel processing. The class of operations
involves "sets of variables which permit simultaneous independent operation on each
individual variable within the set". The high solution rate is achieved by utilizing
a central control unit to control the operation of a large number of processing elements
operating on separate variables (global control). This machine, shown in Figure 5-3,
is also discussed in Reference 1 and in the literature (References 27 and 28). Each
processing element has two thousand bits of memory available to it along with some
processing ability. The local control is limited to simple decisions as to whether or
not to receive the present control word from the central control unit. This structure
simplifies the processing element, which of course is important since there may be a
few hundred of these in the system. It is also organized to take good advantage of
present LSI technology. (The connection structure is regular and the processing
elements and memory can be constructed from a small number of common LSI chips;
however, separate types of chips must be used in the central control unit, the processing
elements, and the memories. Note that the Holland machine, if practical, could
have used just one chip type for everything.)
Figure 5-2. Holland Machine
[Figure 5-3 labels: central control, storage, branching levels.]
Figure 5-3. Solomon Machine
The Solomon structure, as mentioned earlier, was developed to efficiently solve
the class of problems characterized by a single instruction stream simultaneously
operating on multiple data streams (global control). This means that the structure
would be relatively inefficient (and speed limited) on a set of general purpose
computations since global type operations would only occur a portion of the time. For
example, whenever sets of sequential calculations must be performed all but one of
the cells are idle. (Even with parallel computations many cells are idle since each
calculation may not involve enough data streams to use all the available processing
elements simultaneously.) This limitation of processing capability for general purpose
computations is lessened to some extent in the Illiac machine by dividing the
processing elements up into separate groups each with its own control unit. The
Illiac is an advanced version of the Solomon machine and as a result its operation and
purpose for development (application area) is basically the same as for the Solomon
machine. The main difference is that the grouping of processing elements enables the
Illiac structure to obtain higher hardware utilization since all the groups can operate
simultaneously on separate tasks or on the same task. It should again be noted that
the limitations of the Solomon and Illiac machines on general purpose computations
show the problems of using only global control for execution of these computations;
however, this is not intended as a criticism of these machines since they were not
developed to efficiently execute a wide variety of computations.
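The utilization penalty described above can be illustrated with a small calculation.
The sketch below is not taken from the report: the function name
`global_control_utilization` and the example workload are hypothetical. It assumes
that each step of a single instruction stream keeps busy only as many processing
elements as it has independent data streams, so a sequential step uses one element.

```python
def global_control_utilization(step_widths, num_elements):
    """Fraction of element-steps doing useful work under one instruction stream.

    step_widths:  number of independent data streams available at each step
                  (1 means a purely sequential step).
    num_elements: processing elements driven by the central control unit.
    """
    if num_elements <= 0:
        raise ValueError("num_elements must be positive")
    if not step_widths:
        return 0.0
    # A step can never occupy more elements than exist in the array.
    busy = sum(min(width, num_elements) for width in step_widths)
    return busy / (len(step_widths) * num_elements)


# Hypothetical workload: 3 sequential steps, then 2 steps with 100 data streams.
mixed = [1, 1, 1, 100, 100]
print(global_control_utilization(mixed, 100))  # 0.406
```

Under these assumptions a workload that is fully parallel in only two of its five
steps leaves the array about 60 percent idle, which is the effect the grouping of
elements in the Illiac is intended to reduce.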
The real interest here in these array structures is to introduce and investigate
the value of global control for execution of certain parallel computations. It was
stated above that the global control of a large number of processing elements enabled
high solution rates on appropriate problems. These high rates are obtainable since
a single central control minimizes the overhead associated with obtaining instructions
and sequencing computations on the processing elements. (As mentioned earlier, this
structure also saves a significant amount of control hardware in the Solomon processing
elements.) In addition to saving overhead, the use of global control can save a good
quantity of memory since only one instruction stream must be stored to operate many
processors. With a local control structure the same instruction stream would have
to be stored in a number of places in order to be accessible to a large number of
processing elements. (This would be necessary since a central subroutine memory
would have low accessibility because it would need to be used by many processors.)
Because of these advantages of global control, it would be useful to add global control
capability to a local control distributed processor if the hardware did not become unduly
complex. Such a structure is discussed in the next section.
5.1.5 Distributed Array Memory and Processor (DAMP) System
The DAMP System was developed to provide general purpose computing with very
high reliability for long duration space missions. In addition it was desirable for the
system to make good use of future circuit and memory technology, to dissipate a
small amount of power, and to be easily expandable or contractable so that it could be
flexibly applied to a variety of missions. The high reliability requirement can actually
be interpreted as a need to continue executing a critical subset of the computations in
spite of hardware failures (see Paragraph 2.2). In other words the computations
can be "gracefully degraded." The DAMP system meets this goal by dividing its
memory up into a number of modules (approximately 512 16-bit words used per module)
and then integrating each memory module with a processor to make an independently
operating "cell." These cells can then individually fail without bringing down the
whole computing system. In addition they can be turned on and off to match the require-
ments from phase to phase. This will substantially decrease the power dissipation of
the system and also increase the reliability (assuming off line or dormant reliability
is higher than on line or operating reliability). The same DAMP structure consisting
of many cells can meet the expandability requirement by adding a small or large
number of cells to match a new set of requirements. The system also makes good use
of future circuit and memory technology since each cell can be fabricated on a single
LSI (MOS) wafer in the 1980 time frame (see Section 3 on technology). In particular,
the regular pattern of the memory in a cell will enable discretionary wiring or encoding
techniques to be used to obtain reasonable yields on large complex wafers (or cells).
In addition the cells are used in a regular fixed connection pattern so that they can be
easily packaged on a board. This discussion of the general features of the DAMP
System is expanded below.
The organization of a cell is shown in Figure 5-4. Other possible cell
organizations could use additional chips in order to give each cell more memory. For
example separate wafers containing 512 to 1024 words of memory and no processing
could be added to the cell shown in Figure 5-4. Further investigation and programming
effort is necessary to determine if the presently chosen single wafer cell is memory
limited and thus in need of additional memory wafers. Clearly the single wafer cell
is preferable if the system is not memory limited, since it requires somewhat fewer
connections, offers higher processing rates on certain problems, and offers
a higher degree of graceful degradation (more memory per cell would mean fewer
cells in the system).
The cells, as previously mentioned, are organized into groups that have a regular
connection structure that results in simplified packaging. The system grouping and
connection structure is shown in Figure 5-5. Appendix F presents a discussion of
the alternative organizations that were also considered, thereby leading to the structure
shown in this figure.
In light of the above discussion of the cell and group structure the ability of the
system to gracefully degrade can be better understood. A few spares will be placed
in each group so that a few cells can fail and simply be replaced by the spares without
affecting computation. After these initial failures, the computation power of
a group would simply degrade in small increments with each additional cell failure.
Note that this enables the critical subset of computations to be continued in spite of a
large number of failures. It should also be noted here that a failure of an inter-cell
bus would cause a whole group to fail, but the system is designed such that this
failure mode has a very low probability of occurrence. (The bus contains no hardware
besides lines and hardwired or deposited connections.)
[Figure 5-4: cell organization diagram.]
[Figure 5-5 labels: neighbor communication, group switch, inter-cell busses,
inter-group bus, I/O connection panel, cells, input/output, conditioners, devices.]
Figure 5-5. Distributed Processor Organization
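The graceful degradation behavior described above can be sketched as a simple
counting model. This is an illustration only, not the report's reliability analysis:
the function name `working_cells` and the group size used in the example are
hypothetical, and inter-cell bus failures are deliberately left out of the model.

```python
def working_cells(cells_needed, spares, failures):
    """Cells still available for computation in one group after cell failures.

    The first `spares` failures are absorbed by spare cells; each further
    failure removes one working cell. Bus failures are not modeled.
    """
    if min(cells_needed, spares, failures) < 0:
        raise ValueError("arguments must be non-negative")
    lost = max(0, failures - spares)
    return max(0, cells_needed - lost)


# Hypothetical group: 16 working cells plus 2 spares.
for failures in (0, 2, 3, 10):
    print(failures, working_cells(16, 2, failures))
# 0 16
# 2 16
# 3 15
# 10 8
```

In this model capacity is unchanged until the spares are exhausted and then falls by
one cell per failure, so a critical subset of the computations can keep running as
long as enough cells remain to hold it.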
The interesting point to note about the DAMP System is that it is able to meet
the reliability, expandability, etc., goals as described above with only a small addition
of hardware to the memory of a conventional computation system. (In fact, depending
on the particular set of general purpose computations, discussed below, this may run
from slightly more to even less total hardware than would be needed in a multiple
computer system to do the same tasks.) To understand this statement, consider the
point brought out in Reference 1 that in the time frame of interest the main memory
of most computation systems could be equally well constructed from semiconductor
wafers or from magnetic arrays; as a result, the DAMP System can be considered
to be simply a semiconductor memory that has a small amount of each wafer devoted
to processing (the processor will take up less than 1/10 of each wafer containing
processing and 512 16-bit memory words, as discussed in the technology section).
This means that the high reliability, flexibility, etc., gained with this system has
cost a very small hardware increase. In fact, for some computation mixes the
DAMP System may use less hardware than the equivalent multiple computer. For
example, the DAMP System would use less hardware if the computations were somewhat
specialized so that they contained a good amount of applied parallelism and/or small
high rate programs that are naturally parallel. This point will be made clearer by the
discussion of local and global control in the next paragraphs.
The DAMP System provides reasonable efficiency on general purpose computations
by taking advantage of both local and global control.(1) In addition, each cell has
a reasonably large instruction set and memory; as a result, relatively large subtasks
can be executed by a cell. This large capacity significantly eases the programming
and communication problems typically encountered with distributed processing
organizations. The use of local control (global control is described in the next
paragraph) enables each cell to execute any computation simultaneously with other
computations as long as there is no sequential dependence of this computation on others
being processed or about to be processed (in other words, no precedence relations
amongst the parallel computations). This latter restriction provides almost no
conflicts for execution of separate tasks simultaneously, such as the navigation and
guidance task and the status monitoring task. However, for those subtasks or tasks that
are sequential in nature, the above restriction means that only one cell can be used at any
one time to execute these computations. (Other independent tasks or subtasks can
still be in execution, however.) For the latter reason, during certain periods the
DAMP System can be operating in an inefficient manner when compared to a
uniprocessor capable of executing the same set of computations.(2)

(1) A uniprocessor would provide the most efficiency on a set of general purpose
computations, if it could be made fast enough, since less overhead is required; however,
this type of system would clearly offer much lower reliability. In addition, if the
computations had some specialization as noted in the last paragraph, the DAMP System
would offer both higher efficiency and reliability.
(2) This local operation can be thought of as a simple extension of a multiprocessor (or
multiple computer) system. The same restrictions clearly apply here too, but they
are not as pronounced since there are a smaller number of processors and memories.

The processing section of a cell will be described in detail in Section 9; however, it will be
stated here
that each cell contains a processor capable of executing a reasonably large instruction
set in a parallel-by-bit fashion with clock speeds as high as a few MHz. This means, as
mentioned earlier, that the operation of a cell may be limited by the amount of memory
available within itself. The only exception is for computations that require
small programs that are executed at high repetition rates. One method used to
alleviate the possible memory restriction involves letting one cell use another cell's
memory via the communication buses. This enables the cells with small high-rate
programs to share their memory with other cells that are memory restricted. A
second method of alleviating the memory limitations is by providing the system with
the ability to use global control where it is applicable. (As mentioned earlier, global
control is characterized by the use of a single instruction stream on multiple data
streams and is therefore useful in problems that require the same set of operations
on different pieces of data, e.g., matrix manipulations, etc.) The DAMP System takes
advantage of global control by using one cell to hold the instruction stream and to send
this stream over the communication bus to the other cells in a group executing the
same problem. In this manner, redundant storage or sequential redundant access
of the same program is avoided. The use of global control as discussed here thus
enables a number of processors to be used with only a minimal amount of memory.
A third method of decreasing the amount of storage required per cell is to provide
less buffering in the DAMP's memories for I/O data. This would increase the
required I/O rates but that is acceptable since extra processing is available. In
addition to the above, one of the groups that operates as the central executive will
provide a storage area for common variables and constants if appropriate.
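The use of global control within a group, as described above, can be sketched in a
few lines. Everything named here is an assumption made for illustration: the `Cell`
class, the three-opcode instruction set, and `run_global_program` are hypothetical
and do not represent the DAMP instruction set or bus protocol. The sketch only shows
one stored instruction stream being applied by several cells to their own data.

```python
import operator

# Tiny hypothetical instruction set: each instruction is (opcode, constant operand).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


class Cell:
    """A working cell that holds only its own data word (an accumulator)."""

    def __init__(self, value):
        self.acc = value

    def execute(self, opcode, operand):
        self.acc = OPS[opcode](self.acc, operand)


def run_global_program(program, cells):
    """Controller broadcasts each instruction once; every listening cell applies it.

    Returns the number of instruction words stored, which is one copy of the
    program regardless of how many cells execute it.
    """
    for opcode, operand in program:      # one transfer on the inter-cell bus
        for cell in cells:               # all listening cells execute it
            cell.execute(opcode, operand)
    return len(program)


cells = [Cell(v) for v in (1, 2, 3, 4)]
stored = run_global_program([("mul", 3), ("add", 1)], cells)
print([c.acc for c in cells], stored)  # [4, 7, 10, 13] 2
```

With local control each of the four cells would need its own copy of the two-word
program; here only the controller stores it, which is the memory saving the text
attributes to global control.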
One of the primary advantages of this organization, as previously discussed,
is its ability to withstand many failures and still continue executing at least some
computations. Another way to look at the system is to say that it is designed to be
tolerant of cell failures. It is tolerant in the sense that not only can a number
of cells fail and simply be replaced by spares connected into a group but also cell
failure beyond the number of spares available only causes small degradations of the
system. A little reflection will show that this type of operation is significant since
actual on-line failures (caused by overstresses, etc.) can be tolerated along with a
reasonable number of on-line failures caused by use of originally defective parts.
This failure distinction, also made in the technology section, is a fine point, but
it is important, since it has been observed that a great many failures in complex
IC's are actually caused by manufacturing defects that were not detected prior to
placing the system on-line. In fact, many of these failures only occur under a very
specific set of circumstances and so may not cause on-line failures for thousands
of hours. These failure modes can generally be removed by feeding back information
to the production line, but it is a long and expensive process to obtain very reliable
components. Of even more importance is the fact that as circuitry reaches greater
and greater complexity on a single chip, the practicality of giving a chip a 100 percent
test on the ground before inclusion in a system decreases immensely. As a result,
the ability of the computation system to tolerate a number of latent manufacturing
defects will gain more significance in the future. It should also be noted that this
ability to tolerate cell failures does not exist to the same extent in a global structure
such as the Solomon machine. A single failure in the main control unit or memory
of such a structure could bring the whole system down. (Clearly, making these central
units redundant is costly in terms of hardware when compared to a structure like the
DAMP System.)
The main anticipated difficulty with the DAMP System is making it relatively
easy to program. The tasks must be split up into subtasks such that when programmed they
will fit into a single cell. This splitting should also give as much freedom as possible
in scheduling the tasks on cells. In other words, constraints on executing certain
computations in a sequential fashion should exist, as far as possible, only within the
subtasks to be executed on an individual cell. The allotted subtasks must then be assigned
to the minimum number of cells. This assignment problem is similar to the problem
of determining the minimum number of cells necessary to implement a system for
a given set of requirements. This latter problem was discussed in Section 4 on
parallelism; the discussion in that section indicated that an optimum assignment is
difficult to achieve.
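Because an optimum assignment is difficult to achieve, a heuristic is a natural
starting point. The sketch below uses first-fit decreasing, a standard bin-packing
heuristic that is not described in the report, and considers memory only. The
function name, the subtask names, and their sizes are hypothetical; the 512-word
capacity follows the approximate cell memory size quoted earlier in this section.

```python
def first_fit_decreasing(subtask_words, cell_capacity=512):
    """Pack subtasks (name -> memory words) into cells using first-fit decreasing.

    Returns a list of cells, each a list of subtask names. This is a heuristic:
    it does not guarantee the minimum number of cells, and it ignores
    execution time and precedence constraints.
    """
    cells = []  # each entry: [remaining_words, [names]]
    # Largest subtasks first; ties broken by name for repeatable output.
    ordered = sorted(subtask_words.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, words in ordered:
        if words > cell_capacity:
            raise ValueError(f"subtask {name!r} does not fit in one cell")
        for cell in cells:
            if cell[0] >= words:         # first cell with enough room
                cell[0] -= words
                cell[1].append(name)
                break
        else:
            cells.append([cell_capacity - words, [name]])
    return [names for _, names in cells]


# Hypothetical subtask sizes in words.
sizes = {"nav": 400, "guidance": 300, "status": 200, "telemetry": 100, "io": 60}
print(first_fit_decreasing(sizes))
# [['nav', 'telemetry'], ['guidance', 'status'], ['io']]
```

A practical assignment would also have to account for processing time per cell and
for the communication between subtasks, which this memory-only sketch leaves out.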
An additional problem exists once an assignment of programs to cells is made.
The subtasks must be scheduled so that all of them can be accomplished in the required
period of time. This requires allocating separate periods to the
global control programs and avoiding conflicts on the intercommunication busses.
Clearly the scheduling operation could also feed back on the assignment of subtasks to
cells such that a new assignment is necessary.
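One small part of this scheduling problem, detecting conflicts on a single
inter-cell bus, can be sketched as an interval-overlap check. The representation of
a transfer as a (name, start, duration) tuple, the function name `bus_conflicts`,
and the example schedule are all hypothetical; the report does not specify a bus
scheduling method.

```python
def bus_conflicts(transfers):
    """Return pairs of transfer names whose intervals on one bus overlap.

    transfers: list of (name, start, duration) in arbitrary time units.
    A transfer occupies the bus over the half-open interval [start, start + duration).
    """
    ordered = sorted(transfers, key=lambda t: (t[1], t[0]))
    conflicts = []
    for i, (name_a, start_a, dur_a) in enumerate(ordered):
        end_a = start_a + dur_a
        for name_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                break                    # later transfers start even later
            conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical schedule for one group's inter-cell bus.
schedule = [
    ("global_program", 0, 10),
    ("nav_to_filter", 8, 4),
    ("status_report", 12, 2),
]
print(bus_conflicts(schedule))  # [('global_program', 'nav_to_filter')]
```

A non-empty result would indicate that the schedule, or possibly the assignment of
subtasks to cells, has to be revised.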
From the foregoing discussions it can be seen that actually programming a
subtask for a cell would not be much more difficult than programming the same subtask
for a single computer system; however, if the programmer is required to subdivide
tasks, assign them to cells, and set up the scheduling, the programming job
will be relatively difficult. Therefore some important areas of development are
software packages to automatically subdivide programs into tasks that can be executed
by a single cell, to handle the automatic assignment of subtasks to cells, and
to schedule automatically the global control programs and intercommunication on
the busses. Section 11 will present some preliminary considerations of these topics.
It should be noted here that there are other possible approaches to obtaining
very high reliability that have been and could be investigated. One approach is to
use throughout the system encoded information that contains enough redundancy to
correct multiple failures. However, this type of system requires considerable
redundancy to withstand many failures; it will also totally fail after a fixed maximum
number of failures have occurred. This maximum would be considerably less (for
any practical system) than the number of failures that the DAMP System can tolerate.
A second possible approach would be to develop a multiprocessing system with a
number of processors and with memory modules that are subdivided into small
individual blocks with separate sets of read/write and addressing circuitry. This
type of system could gracefully degrade by losing part of a memory module or a
processor with each failure.
5.2 OPERATION OF THE DISTRIBUTED ARRAY MEMORY AND PROCESSOR
SYSTEM
The last section described the development of a number of computer
organizations. The last such organization, the DAMP system, was shown to be well
suited to advanced space missions due to its high reliability, easy fabrication and
expandability, and ability to be turned on and off in small increments. The chosen
form of the DAMP system is described in this section.
The distributed processor structure consists of cells. Cells are interconnected
to form a group and a number of groups are interconnected to form the complete
structure as shown in Figure 5-5. A cell may be thought of as a small conventional
computer that has a relatively small memory. Parallel operation may exist between
cells in a group and between the groups in the system. The structure thereby appears
as a highly parallel computational facility. For the purpose of this study, the groups
will be considered as being all located physically in one central location so that the
organization appears as one large parallel system. (An alternative physically distributed
system is discussed in Appendix F.)
The computationaltasks that are presently required to be carried out are
divided or assigned to various groups and finally to various cells in the groups. Groups
of cells may carry out a complete task (such as the navigation functions in a space-
craft), a number of small tasks, or even part of a large function or program. One of
the primary considerations used in dividing programs among the groups is to limit the
inter-communication between groups as much as possible. A second consideration is
to enable as much global control as possible to be used in order to take advantage of
applied parallelism and thus maximize hardware utilization. This point was discussed
in Section 4 on parallelism. As will be explained below, each group will be capable of
executing a global program(1) by a number of cells in the group; this capability allows
the system to take advantage of applied parallelism in the computations assigned to the
group. It should be noted that at any one time, only one global program may be in
execution in the group; however, since there are a number of groups in the system and
the groups operate independently of each other, a maximum of one global program per
group can be in simultaneous execution in the organization at any one time. Therefore,
one of the considerations in dividing programs among the groups is to assign them so
that as much applied parallelism in the total computational task is being utilized as
possible.
The tasks assigned to a group are subdivided or assigned to individual cells
within the group for actual computational mechanization. The actual task or tasks
assigned to a cell will, in general be a part of a larger task. Therefore, there may
exist, in a group, a certain amount of sequential relationships or precedence restric-
tions on the execution of the tasks in cells that must be taken into account when assign-
ing tasks. An example of a sub-task that might be assigned to an individual cell is to
compute position and velocity from navigation sensor input data. This sub-task may
be part of a larger navigational task assigned to a group. The results of this
individual cell's sub-task may be used to initiate the start of another sub-task that
might be in another cell, for example, update Kalman filter data for the state vector
computation.
In addition to the groups of cells in Figure 5-5, the bulk memory shown is used
only for loading programs and data since the cells themselves provide all the main
system memory. The conditioners and sensors are shown to represent the peripheral
equipment and controllers connected to the computation system.
The cells in a group and in the entire system are all identical in terms of
hardware. However, at any particular time some cells will be functionally operated
differently from other cells. Each group will have one cell designated as a controller
cell. The remaining cells in the group will be designated as working cells. The cell
that is the controller cell will be responsible for controlling the inter-cell communication
bus and providing the executive control of the group's operation. All the cells in a
group are connected to the inter-cell communication bus. This bus will be 1/2 word
parallel and serves as the primary means of communication among the cells in a group.
The inter-cell bus will be used for communicating data between cells and also for the
transmission of global instructions and commands to cells. The cell that has the
unique function of the controller cell will control the use of the inter-cell bus and will
provide all the global instructions and commands.
(1) A global program is simply a program executed by global control of a number of
cells.
The working cells in a group can either accept and execute global instructions
that are sent on the inter-cell bus or they may fetch and execute instructions from their
own memory. In fact, some cells may be using the global instructions from the
inter-cell bus while others are using instructions from their own individual memory.
Since all the cells are physically identical, this operation will be described by stating
that the cells are capable of existing in different states functionally. In this way, both
local and global control may be carried out simultaneously within a group. As a result,
both the natural and applied parallelism inherent in a group's tasks may be efficiently
carried out. As was noted above, the controller cell will control the inter-cell bus.
The functions to be carried out on the inter-cell bus include communication of data
between cells, the sending of global instructions and commands, and the sending of
commands to individual cells. As mentioned above, only cells operating under global
control will be using global instructions and commands from the inter-cell bus. How-
ever, all the cells in the group will be responsive to commands sent to individual cells
via the inter-cell bus regardless of whether the cell is operating under local or global
control (these commands are addressed to one particular individual cell). It may,
therefore, be seen that the controller cell can command one or more cells operating
under global control simultaneously while cells operating under local control must be
commanded or talked to on an individual basis.
Global control implies that one or more cells are using this type of control. The
controller cell provides all control information sent on the inter-cell bus. It may
therefore be seen that cells that are being operated in a global control mode are highly
dependent on the controller cell in that they are receiving instructions from it and are
also responsive to its global commands. However, the cell operated in a local control
mode is considerably more independent of the controller cell in that it fetches its
instructions from its own memory and must be talked to or commanded individually.
It can be seen that the controller cell has the overall control of the cells in a group and
the degree of this control can be varied depending on the functional use of each cell.
The above discussion is intended to serve only as a brief introduction to the operation
of a group of cells; a considerably more detailed explanation will be given in Chapter 6.
A simplified block diagram of a cell is shown in Figure 5-4. Each cell is
planned, as mentioned earlier, to be a single MOS/SOS chip, although, depending on
the cell memory requirements for certain applications, more than one chip could be
used. In fact, initial prototype systems could be constructed using more than one
chip to make up a cell (see Section 3 on Technology). The cell consists of a memory
and arithmetic and control section in a similar manner to conventional computers.
In addition, the cell will contain logic for inter-cell bus communications and for its
own identification; this is the part of the cell that differs significantly from conventional
computers.
All storage within a cell is directly addressable and is divided into storage
registers and a number of control registers. The storage registers hold the programs
and data to be used by this cell and possibly other cells. The control registers are
the general processor registers used in the execution of instructions. Instructions,
whether they come from the cell's own memory or from the inter-cell bus, are
decoded and executed in a manner similar to conventional computers.
In addition to the inter-cell bus communications path, another means of
communication is provided: the neighbor communications. Each cell contains
a single serial connection to each of its 4 adjacent neighbors (north, east, south,
and west), with wraparound connections at the edges of the array of cells. This
communication means is primarily included to facilitate the global programming of a
group where small amounts of data are required to be communicated between cells;
e. g., a matrix multiply utilizing applied parallelism.
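The wraparound neighbor connections can be illustrated with a short sketch. The report does not state the shape of the cell array, so the 4 x 5 layout for a 20-cell group and the function name `neighbors` are assumptions made only for this example.

```python
def neighbors(row, col, rows, cols):
    """Return the positions of the north, east, south, and west neighbors
    of the cell at (row, col), wrapping around at the array edges."""
    return {
        "north": ((row - 1) % rows, col),
        "east": (row, (col + 1) % cols),
        "south": ((row + 1) % rows, col),
        "west": (row, (col - 1) % cols),
    }


# Assumed layout: a 20-cell group arranged as 4 rows by 5 columns.
corner = neighbors(0, 0, rows=4, cols=5)
print(corner)
```

With wraparound, a corner cell has the same number of neighbors as an interior cell, so a global program can use the same neighbor transfer in every cell.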
Storage in the cells of the distributed processing system is volatile
semi-conductor memory; however, investigation of this feature showed that it should
cause no problems. The primary power supply would be backed up and could be fed
into the distributed processor on separate, isolated lines. These lines could then be
connected to the computation system through separate voltage regulators with diode
isolation. If one of the supplies were to go out of tolerance, the other would simply
take over. In this manner, the power supply could be made more reliable than the
computation system hardware. Another simple possibility would be to use presently
available small nickel-cadmium batteries to provide backup power for the computation
system. This would be practical since a DAMP System as discussed above would only
dissipate a few watts of power.
The above discussion briefly describes the mechanization of a cell and how the
cells can operate. The cell that operates as the controller cell will either execute or
transmit instructions contained in its own memory; instructions identified as global
instructions will simply be sent on the inter-cell bus for execution in cells operating
under global control. In addition to the functions outlined previously, the controller
cell will contain various executive routines to control the operation of the group and
the execution of the assigned tasks to the group.
The operation of a group was briefly outlined above; it was seen that the cells
were capable of operating under either local control or global control. The organiza-
tion consists of a number of groups interconnected by an inter-group communication
bus as shown in Figure 5-5. Each of the groups can operate independently of, and
simultaneously with, the other groups. Therefore, this introduces another level of parallelism
in the organization in that it is possible to simultaneously have more than one group in
operation where each group may be utilizing local and global control to carry out its
tasks. Each group is connected to the inter-group bus by a group switch as shown in
Figure 5-5. The group switch will be controlled by the controller cell of the group.
One of the groups will contain additional functions in its controller cell that
enable it to operate as the system executive group. This will primarily involve
coordinating and controlling the communications that take place on the inter-group bus.
Since all the cells are identical, the system executive can be located in any one or
more groups. It is responsible for the following tasks: (1) Controlling the inter-group
bus, (2) handling communication with the bulk storage unit, (3) handling data communi-
cation from group to group, (4) sending out commands to load the system, and (5)
allocating I/O time on the inter-group bus. The system executive will be discussed in
Section 11 along with all the other software and executive aspects of the organization.
There are many factors in determining the number of cells in a group and the
number of groups to be used. One of the influences on the number of cells in a group
is the capacity requirements of the various independent system functions. For
example, a task with very little parallelism could require many cells for its execution
due to the need for a great deal of instruction and data storage. Another important
influence on the group size is the degree of applied and natural parallelism within a
task. These points are also discussed in Section 4 on parallelism.
From the requirements given in Section 2 and the parallelism studies of
Section 4, it has been determined to use 4 groups of 20 cells each (512 words/cell). It
should be noted that the system could be designed using a variable number of cells per
group; however, there is at least some impetus toward using fixed size groups.
Such a structure is more symmetrical and as a result more capable of many levels of
graceful degradation in small increments. For example, in a structure with 3 groups
of 30, 15, and 15 cells, a failure of the bus in the group of 30 would eliminate one-half
of the system computation power. Admittedly this will be an unlikely failure mode, but
the situation is certainly not ideal.
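The graceful-degradation argument above reduces to simple arithmetic, sketched below. The group sizes are the ones quoted in the text; the function name `fraction_lost` is hypothetical, and the sketch assumes that a bus failure removes every cell of the affected group and nothing else.

```python
def fraction_lost(group_sizes, failed_group):
    """Fraction of all cells lost when the bus of one group fails
    (assumes the whole group becomes unusable)."""
    return group_sizes[failed_group] / sum(group_sizes)


uneven = [30, 15, 15]      # example structure from the text
even = [20, 20, 20, 20]    # chosen structure: 4 groups of 20 cells

worst_uneven = max(fraction_lost(uneven, i) for i in range(len(uneven)))
worst_even = max(fraction_lost(even, i) for i in range(len(even)))
print(worst_uneven, worst_even)
```

Under these assumptions the worst single bus failure costs one-half of the cells in the uneven structure and one-quarter in the fixed-size structure.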
Finally, it should be noted that input/output connections are provided directly to
each cell and will be connected to the appropriate sensors and conditioners (some may
not be used if the cell gets I/O over the bus). In addition, I/O devices may also be
connected directly to the communication busses in the system. This section has
served to introduce the general concepts of the DAMP System and its general operation.
The remaining sections of this report will present detailed descriptions of features and
operation of the system. The next section on Group Architecture will clarify the
detailed operation of the group and many points presented here.
5.3 COMMUNICATION STRUCTURES
There are a large number of possible schemes for communication amongst the
cells (and/or groups) in a distributed processor. For example, the Holland machine
shown in Figure 5-2 is structured so that each cell can communicate directly with
its four neighbors. This scheme is satisfactory as long as the communication
amongst cells can be highly localized. However, if one cell must communicate
instructions or data to another cell that is not located physically near, path building
problems would generally result so that the communication delays may be intolerably
long. In particular the DAMP System's use of global control requires some long
communication paths; as a result, a simple neighbor communication scheme is not
sufficient. There are other more complex neighbor communication schemes, such
as the cobweb array connections shown in Figure 5-1, but these schemes still suffer
from path building problems and are therefore not applicable to the DAMP System.
The logical extension of the neighbor communication schemes would be a full inter-
communication structure similar to that used in a multiprocessor system. Such a
structure, shown in Figure 5-6, would provide no communication delays since each
cell can communicate directly with any other cell. However, for a group of reasonable
size (20 cells or so), the number of connections and extra circuitry required make the
system impractical. This is especially true since the high communication rates
obtainable with this system are not required. The Solomon system uses neighbor
communication for data and a bus from the central control unit to all the cells for
the global instructions. This type of communication is well suited to the needs of a
global structure with a central control unit. However, a global-local structure where
any cell can act as the controller (the DAMP System) requires a communication system
of somewhat greater complexity than that of the Solomon machine. Two alternative
structures are discussed.
Figure 5-6. Full Intercommunication Distributed Processor
(Only five cells shown in this example.)
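To make the connection-count argument against the full intercommunication structure concrete, the following sketch compares the number of cell-to-cell links with the number of bus taps on a common bus. It counts one link per pair of cells and one tap per cell, which is an assumption of this example; each link or tap may itself consist of several lines, which the count ignores.

```python
def full_interconnect_links(n_cells):
    """Point-to-point links needed so every cell reaches every other cell
    directly: one link per unordered pair of cells."""
    return n_cells * (n_cells - 1) // 2


def common_bus_taps(n_cells):
    """Connections needed when every cell attaches once to a shared bus."""
    return n_cells


for n in (5, 20):
    print(n, full_interconnect_links(n), common_bus_taps(n))
```

For the five cells drawn in Figure 5-6 this gives 10 links, while a 20-cell group would need 190 links against 20 bus taps.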
Presently the most promising communication structure for the DAMP System is
that shown earlier in Figure 5-5. This structure uses a common central bus in each
group with each cell connected directly to the bus through a driver/receiver circuit.
The bus and connections are byte parallel. The driver/receiver circuits in each cell
will be designed such that a single component failure in these circuits will not short the
bus directly to ground or to the supply. This will make the probability very small that
a driver/receiver will fail and short out a complete group. It should be noted that this
system does not require redundant driver/receiver circuits in the sense that a second
circuit should be available within a cell if the first one fails. The only requirement is
that a failure of the cell's driver/receiver does not short the common bus. Each cell
will also carry out self checks such that if a failure is detected a flip-flop will be set
that will inhibit driving the common bus. In addition, each cell will have a connection
to each of its four neighbor cells. This will be a single serial line since its use will
be at a relatively low rate (Section 6 will present reasons for including neighbor
communications).
The foregoing communication structure between cells within a group would be all that is
required if only one group of cells was used in the DAMP System. However, for
reliability purposes a second central bus would have to be added thus doubling the
number of connections in the system. The approach taken for the DAMP computer is
to subdivide the system into groups where each group contains one inter-cell bus
connected as described above. This approach requires fewer total connections in the
system, provides for more failures to occur before bringing the system down, and
enables global programs to be flexibly scheduled so that the minimum number of cells
will be required to execute a given set of requirements. (Dividing the cells into groups
also significantly eases the problem of assigning tasks to cells.) The latter point is
discussed in Section 4 on parallelism.
A group switch is used in order to connect together the buses within each group
and still insure that all groups can simultaneously communicate within themselves.
Each switch, shown in Figure 5-5, is connected to its inter-cell bus in the same
manner as a cell; it is in turn connected to the inter-group bus that provides
communication between all group switches. This switch then looks like another cell to
the group controller cell since it must be monitored for communication needs and must
be sent words that go to other groups. The group switch has the following tasks: it
honors requests from other groups for communication to its group and it receives
information from its group and places it on the inter-group bus.
A little reflection will show that the group switch is a single failure point for a
group since if it fails a group can no longer communicate with some I/O devices, other
groups, or the bulk storage unit. As a result the switch must contain some redundancy
and must have a failure rate significantly lower than that of a cell. This low failure
rate will be achieved by using redundancy within the chip that carries out the group
switch function and by using redundant group switches in each group if necessary. It
should be noted that this low failure rate should be relatively easy to achieve since
the complexity of the group switch chip is significantly less than that of the cell wafer.
It should also be noted that the central inter-group bus will have to be made redundant
since a single failure of this bus would bring down the system.
The alternative to the chosen communication structure is that shown in
Figure 5-7. This structure uses a new element, the arbitor, to handle communication
between cells and between groups. The arbitor is connected to each cell on two-way
byte parallel lines and to the arbitors of other groups on the same type of lines via a
central bus. The arbitor simply carries out a round-robin (circular) scan of requests
for communication from each cell or from another arbitor. When a request is found
up, the arbitor reads a header word and connects the requesting cell or arbitor to
the requested cell (or central bus). The simplest system would only allow one set of
connections at a time, but would prepare for the next set of connections while two cells
are communicating. (A more complex arbitor providing for more than one simultan-
eous cell to cell communication link could be used if this increased communication
rate was necessary; however, such an arbitor would have somewhat lower reliability
due to greater complexity.) The main advantage of the arbitor structure is that it
would decrease the overhead associated with obtaining a communication link between
cells. This is true since in the chosen structure with a group switch, the controller
cell must sequentially monitor each cell for communication requests. This process
requires words to be sent between each cell and the controller; therefore, if one cell
were to make two requests in a row while no other cell was requesting service, a
sizable delay would occur between obtaining the first and second requests (all the
other cells in the group would have to be monitored in between). In this case the
arbitor method of scanning request lines will be considerably faster. Of course, how
much time this saves depends on the communication rates necessary on the buses.
Figure 5-7. Arbitor Communication Structure (labels: cell, group, central bus)
Another possible advantage of the arbitor is that additional hardware
could also enable it to ignore requests from cells with failed communication lines or
drivers. The real disadvantage of the arbitor structure is that it is a single failure
point for a group. It will clearly be fairly complex (a few thousand FET's) and have
a large number of external connections. As a result, its reliability will be signifi-
cantly lower than that of the simple common bus used in the chosen communication
structure. (This bus is also a single failure point within a group.) Since the arbitor
also handles the task of the group switch it should be compared to it too. Again, the
arbitor is much more complex and has more connections. Of even more importance
is the fact that although the group switch could be connected redundantly into the group
bus if necessary to increase its reliability, it is relatively impractical to use two or
more arbitors in each group since that would at least double all the connections in the
system.
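The round-robin scan and the "two requests in a row" case described above can be sketched as follows. This is an illustrative model only: the function name `round_robin_scan`, the 20-cell group, and the choice of cell 7 are assumptions, and the sketch counts scan steps without assigning a time to each step. In the arbitor structure a step is the examination of a request line; in the chosen bus structure a step requires words to be exchanged between the controller cell and the cell being monitored, which is why the same number of steps costs more there.

```python
def round_robin_scan(requests, start):
    """Scan request flags circularly beginning at index `start`.

    Returns (index of the first raised request, number of positions
    examined), or (None, number of positions) if nothing is raised."""
    n = len(requests)
    for offset in range(n):
        idx = (start + offset) % n
        if requests[idx]:
            return idx, offset + 1
    return None, n


# Assumed case: 20 cells, only cell 7 requesting. It has just been served,
# so the scan resumes at cell 8 and must pass every other cell first.
requests = [False] * 20
requests[7] = True
granted, examined = round_robin_scan(requests, start=8)
print(granted, examined)
```

In this case the scan examines all 20 positions before returning to cell 7, which is the delay between the first and second requests noted in the text.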
The preceding discussion points out the reasons behind the chosen communica-
tion structure for the DAMP System (Figure 5-5). The important point behind the
choice was that the bus structure with a group switch offered considerably higher
reliability; however, calculations were not made to obtain the actual relative
reliability of the arbitor scheme.
6 GROUP ARCHITECTURE
6.1 INTRODUCTION
Many different computer systems have been studied during the course of this
study. A description of these systems was given in Section 5 of this report. The
distributed array memory and processor system was found to be the most useful for
the general computations needed on future spaceflights. This distributed system,
shown in Figure 1-1, requires a unique architecture to make a capable and reliable
system.
Architecture means the combining of software and hardware features to make a
balanced useful system that will meet the requirements set upon the computing system.
Some of the considerations, such as memory size and approximate processor capa-
bility, are based upon the ground rule to build a cell upon a single wafer. This section
describing the group architecture will describe the features desirable to unify the cells
into a working group.
The distributed processor system consists of groups, which are made up of
cells. Because the cells in a group are connected by neighbor communication lines
and the controller cell can send global instructions to cells, the group is the funda-
mental unit of the computing system. The software studies to date indicate the
compiler must be aware of the cell memory contents, the cell bus loading, and the
controller cell capabilities when compiling programs. For these reasons the
architectural studies were applied to the group.
The features and characteristics of a group are described here. All these
features may not be needed; future studies are needed to determine the useful
features to be retained and the features of little value to be discarded.
6.2 CELL STATES
A fundamental ground rule has been to make all cells of identical hardware.
Although all the cells are identical in hardware, a cell always exists functionally in
one of seven different and mutually exclusive states. These states are listed in
Table 6-1.
A permanently failed cell is placed in state 1 by a combination of software and
hardware controls. These cells will not be used again.
State 2 is the power saving state for cells that are not needed presently. If
standby power is applied to the identification registers and the cell bus gates, the main
power to this cell may be turned on by the controller cell and the cell switched to another
state. The controller cell then may reload the cell's memory. If all the power has
been switched off, a special restart procedure, using the neighbor communication
lines, must be used.
Table 6-1. Cell States
1. Permanently failed - power off
2. Shutdown - power saving state
3. Independent
4. Dependent under global control (Global State)
5. Dependent under local control
6. Dependent in wait state
7. Controller cell
Independent cells are functionally similar to a conventional computer. These
cells fetch all instructions and operands from their memories. The cell that is in the
independent state stays in this state until the controller cell sends a command on the
inter-cell bus commanding this cell to change states. The independent cells can
process problems that are not amenable to global processing.
Dependent cells respond to global instructions and global level commands sent
out from the controller cell. A dependent cell exists in one of the states 4, 5 or 6.
Which of the three depends upon the level of instructions being sent from the controller
cell and the cell's level register contents. The concept of levels is described
in the next section under cell identification.
The concept of having both independent cells and dependent cells in a computer
system is an important concept developed in this study. Other studies of similar com-
puter systems require all cells to be independent or all dependent. With this improved
system, problems may be solved more efficiently by using both independent and
dependent cells as was noted previously.
A dependent cell in the global state (also called the active state) is receiving
instructions from the controller cell via the inter-cell bus.
As mentioned above the instructions being sent from the controller cell to depen-
dent cells are identified as being at a certain level. A dependent cell that is not at the
proper level to receive global instructions can idle and not execute instructions. This
is the wait state. Therefore, if the controller cell is servicing certain dependent
cells, other dependent cells may wait their turn for service.
A dependent cell, instead of waiting for the controller cell to send the instructions
for its level, may fetch and execute instructions from its own memory. This is the
local control state of a dependent cell. This state may appear to be quite similar to the
independent state of a cell in that both operate in a local control mode. The basic
difference between them is that the dependent local control state implies that the cell is
dependent, and dependent cells respond to certain global commands on the inter-cell bus.
Therefore, the dependent local cells are under more control of the controller cell than
the independent cells are. This will be clarified later in this chapter. This state has
advantages when one wants to use a cell partly for global programs and partly for
local programs.
The capability of dependent cells to use local control means that the cell bus is
not wasted on sending instructions when the instructions could be better stored in the
cell's memory. With this feature, the cells can efficiently use local programs to
correct for bad data and handle exceptional conditions. The cell can enter the dependent
local control state, do some processing, and later inform the controller cell of the
situation. It should be kept in mind that even though the cell may use local control in
state 5, it is still a dependent cell. This means that it is basically under control of
the controller cell and will respond to certain global commands.
The seventh state of a cell is the controller state. In this state, a cell controls
the cell bus and may issue global instructions. The fundamental ground rule of making
all the cells of the same hardware allows any cell to become a controller cell. This
gives the advantage that the controller cell functions may be switched among several
cells. Thus there is no requirement that all the executive and controller programs
fit in one cell. At any time, there is only one cell in the controller cell state in a
group. The reason is that the controller cell controls the cell bus, and two cells cannot
be allowed to issue conflicting commands. Software and hardware interlocks will be
used to ensure only one cell is in the controller cell state in a group.
The group switch, shown in Figure 1-1, is part of the group although it is not a
cell. The group switch, like a cell, has an ID register. The group switch responds to
control words containing the proper ID bits and will perform the operation given in the
control word (CW). Thus the controller cell will operate the group switch.
6.3 CELL IDENTIFICATION
The distributed processor computer system has two methods for identifying the
individual cells. One method identifies the cells by a level. A cell will be identified
as being at one of eight levels. The other method results in each cell being given an
identifier, also known as the cell address. Thus a cell has two "names", a common
first name (level) and a unique last name (identifier). This concept of having two
names is important when discussing the dependent and independent cells.
Independent cells use only one name, their identifier or cell address. The level
(or first name) is not used and, although present in a level register, has no meaning
unless the cell later assumes a dependent state.
Dependent cells use both names. The controller cell can send out information
using a first name (level number) to all the dependent cells; all the dependent cells at
this level will respond. If a last name (cell address) is sent, only the cell with this
name will respond since each cell has a unique last name regardless of state. Thus
it can be seen that the controller cell must communicate with independent cells
individually, but may communicate with dependent cells individually or with more than
one at the same time, since the dependent cells respond to levels and more than one
cell may have the same level.
The global instructions that are sent by the controller cell on the inter-cell
bus to dependent cells always follow the sending of a name. The cells that responded
to the name will receive the instructions. For example, assume a system with
7 cells as follows:
First Name   Last Name   Dependent   Independent
JOE          SCOTT       x
BOB          ROSE        x
HELEN        TRUMP                   x
BOB          MILLER      x
BOB          JOHNSON     x
JOE          SMITH       x
HELEN        DAVIS       x
As an example assume the controller cell sends the following instruction groups.
The results are explained below.
JOE: Load X, Store Y, BOB: Load A, Add X, Subtract B, Store Y,
HELEN: Load A, Store Y, ROSE: Add M, Add N ....
Two cells (JOE) will Load X, Store Y. Three cells (BOB)
will execute the next four instructions. One cell will execute
the next two instructions. (The cell TRUMP is an independent
cell and does not respond to first names.) The name ROSE
is a last name, thus only one cell will execute the last group
of instructions.
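The name example above can be traced with a small simulation. This is a minimal sketch, not the report's mechanization: the `Cell` class, the `send` function, and the explicit `kind` argument (standing in for the type of control word that says whether a level or a cell address follows) are assumptions made for illustration. The cell names and instruction groups are those of the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    level: str        # "first name", shared by several cells
    address: str      # "last name", unique to one cell
    dependent: bool   # independent cells ignore first names
    executed: list = field(default_factory=list)


def send(cells, kind, name, instructions):
    """Deliver a group of instructions to every cell selected by `name`.

    kind == "level":   all dependent cells at that level respond.
    kind == "address": only the cell with that address responds,
                       regardless of its state."""
    for cell in cells:
        if kind == "address":
            selected = cell.address == name
        else:
            selected = cell.dependent and cell.level == name
        if selected:
            cell.executed.extend(instructions)


cells = [
    Cell("JOE", "SCOTT", True), Cell("BOB", "ROSE", True),
    Cell("HELEN", "TRUMP", False), Cell("BOB", "MILLER", True),
    Cell("BOB", "JOHNSON", True), Cell("JOE", "SMITH", True),
    Cell("HELEN", "DAVIS", True),
]

send(cells, "level", "JOE", ["Load X", "Store Y"])
send(cells, "level", "BOB", ["Load A", "Add X", "Subtract B", "Store Y"])
send(cells, "level", "HELEN", ["Load A", "Store Y"])
send(cells, "address", "ROSE", ["Add M", "Add N"])

counts = {c.address: len(c.executed) for c in cells}
print(counts)
```

The counts reproduce the explanation in the text: two JOE cells run two instructions, three BOB cells run four, only DAVIS responds to HELEN because TRUMP is independent, and ROSE alone runs the last group in addition to its BOB instructions.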
6.4 SOURCE OF INSTRUCTIONS
The traditional computer has instructions stored in a memory which is always
available to the processor. The processor controls the instruction fetch sequence
by using the program counter. In most modern machines, the instructions are
located in a random access core memory, and the program counter is incremented
to fetch sequential instructions. A jump is performed by loading the program counter
with the address of the next desired instruction.
Independent Cells:
The independent cell receives all of its instructions from the cell's memory,
like the traditional computer. The program counter is used to control the fetch of
instructions.
Dependent Under Global Control:
The dependent cell in the global state gets its instructions from the controller
cell. These cells receive the instructions from the inter-cell bus and then execute
them. The controller cell precedes the instructions with a name. The name (level
number) is contained in a control word sent on the inter-cell bus. This control word
is a prefix to a group of instructions. This prefix is the level of all instructions
until a new level prefix is sent or another control instruction is sent. Thus, a group
of global instructions is defined as that information contained between control words
on the inter-cell bus when a certain type of control word precedes the group of
instructions. The group of global instructions is variable in length. It will be
mentioned here that the cell will execute the instructions in a manner quite similar to
independent cells, the primary difference being that the source of the instructions is
the inter-cell bus and not the cell's own memory. The particular details of the
instruction execution for each state of a cell will be given later in this chapter.
A detailed discussion of the inter-cell bus operation will be given later in this
report. However, in order to facilitate the discussion, a brief description will be
given here. The bus is half-word (8 bits) parallel. In addition, one extra line is used
and designated a control line; this line is used to define control words or commands on
the bus. The commands or control words utilize levels or addresses (first or last
names) to identify which cell or cells the command or control word is intended for.
Every dependent cell compares the level prefix sent by the controller cell to the
level register contents contained in the cell. If the prefix and the level register con-
tents are different, the cell ignores all the instructions, data, etc. sent by the con-
troller cell, until a new level prefix (or other control word) is put on the bus by the
controller cell.
An example is given in Figure 6-1. Note that every cell is required to examine
every control word, but will not perform the control word operation if the cell is at a
different level (only the dependent cells examine the level), or has the wrong ID (cell
address).
[Figure: bus timeline divided into segments numbered 1 through 7]
Figure 6-1. Bus Operation Example
Control Byte (CB): All cells will examine this byte. If the cell(s) matches
this CB (either by level or cell address), the cell(s) will receive the
control word.
Control Word (CW): This word includes the CB, and defines an operation
to be performed by the cell. Often the CW consists of only a CB.
Data: In this example, it shall be assumed that the Control Word
specified that instructions are contained here (note that the number of
instructions can be variable).
When segment 1 in the example occurs, all the cells in the system will examine
the CB. Assume the CB is a type that specifies a level. Thus, all the dependent cells
at this level will be ready to receive the CW (segment 2) and are automatically placed
in the dependent active (global) state. These cells in the global state will receive the
CW (segment 2) and will receive the instructions and/or data following (segment 3).
No other cells will receive any instructions or data (segment 3) from the bus until the
next CB occurs (segment 4 in the example).
When segment 4 comes on the bus, all the cells will again examine the control
byte. In the example, it shall be assumed the CB specifies a different level. The
following actions will occur:
In dependent cells that were active (global state) the CB at a new level will set
these cells to the dependent wait state.
In dependent cells that were not active and are at the new level indicated in the
new CB (segment 4), these cells will become active and will receive the data
(instructions) following (segment 5). All other cells are left unchanged.
Thus, it can be seen that many sequences of global instructions may be sent to
many sets of cells at very low overhead cost to switch between sets. The low over-
head is advantageous when many cells are at each level and short sequences of instruc-
tions are to be transmitted to each. Also the bus is used very efficiently. This is
primarily possible because of having the dependent local and dependent wait state in
addition to the dependent global state. Since dependent cells respond to level com-
mands, it is possible to switch many cells quickly and efficiently from one dependent
state to another dependent state.
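The level-switching behavior described above can be illustrated with a minimal sketch. It is not taken from the report: the class name, the state labels, and the representation of bus segments as ("CB", level) or ("DATA", item) pairs are assumptions made only for illustration.

```python
from dataclasses import dataclass, field

GLOBAL, WAIT, LOCAL = "global", "wait", "local"  # illustrative state labels


@dataclass
class DependentCell:
    level: int                      # contents of the cell's level register
    state: str = WAIT
    received: list = field(default_factory=list)

    def on_level_control_byte(self, cb_level):
        if cb_level == self.level:
            self.state = GLOBAL     # cells at the named level become active
        elif self.state == GLOBAL:
            self.state = WAIT       # active cells at another level go to wait
        # all other cells are left unchanged

    def on_data(self, item):
        if self.state == GLOBAL:    # only active (global) cells receive data
            self.received.append(item)


def run_bus(cells, segments):
    """Present every bus segment to every cell, in order."""
    for kind, value in segments:
        for cell in cells:
            if kind == "CB":
                cell.on_level_control_byte(value)
            else:
                cell.on_data(value)


cells = [
    DependentCell(level=1),
    DependentCell(level=1),
    DependentCell(level=2),
    DependentCell(level=3, state=LOCAL),
]
segments = [("CB", 1), ("DATA", "instr-a"), ("DATA", "instr-b"),
            ("CB", 2), ("DATA", "instr-c")]
run_bus(cells, segments)
```

After the run, the two level-1 cells hold the first two instructions and are in the wait state, the level-2 cell holds the third instruction and is active, and the level-3 cell is untouched. Switching between sets of cells costs only one control byte on the bus.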
To summarize, a dependent active (global) cell is a cell that is receiving
global instructions and data. By definition, a global cell is at the same level as the
global instructions being sent on the bus. Actually, the level is in the prefix CB;
there is no level transmitted with each instruction. The term "global instructions
level" will refer to this prefix, although the term is not exactly correct.
Dependent Under Local Control:
The dependent cell not receiving instructions from the inter-cell bus may fetch
instructions from its own memory. This cell is in the dependent local control state.
The fetch of instructions is identical to that of an independent cell, the difference
for this cell being that it is constantly examining the bus for a command at its level,
as noted in the above example.
Controller Cell:
The controller cell always fetches instructions from its own memory. The
instructions destined to be executed by the dependent global cells are not executed by
the controller cell. All other instructions are executed by the controller cell and are
not sent to the global cells. This is explained further in the section describing the
controller cell instruction execution (Section 6.7).
6.5 SOURCES OF ADDRESSES
Computer technology has developed over the years many ways of specifying
a memory address. The early machines had the operand address given in the instruction.
Later, index registers were used to modify the instruction address and memory
banks were used to save instruction bits. The traditional computer had several ways
of determining the final (or effective) address that was used to address memory.
The cells in the distributed processor computer also have several ways to specify
an address. All the ways will be described here, although some are not used by cells
in certain states.
The address may be specified by adding the instruction displacement and the
bank register (also known as a base register). The bank register is 9 bits long since
a cell will have 512 words of storage. The displacement is obtained from the instruction.
Since the displacement is less than 9 bits, a bank register will always be added
to the displacement when a full (9 bit) address is required.
        |         Bank          |
      + | 00...0 |    Disp.     |
        -------------------------
        |  calculated address   |

The sum is called here the calculated address. If an index register is specified,
it is also added.

        |         Bank          |
      + |    Index Register     |
      + | 00...0 |    Disp.     |
        -------------------------
        |  calculated address   |

These two calculated addresses use the registers located in the cell.
Independent cells will obtain all the parameters that make up the address from
the cell itself. Dependent global cells will obtain the displacement from the instruction
that was sent on the inter-cell bus; the bank and index registers are always from
the cell itself.
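The address calculation can be sketched as follows. This is an illustration only: the function name and argument names are hypothetical, and the truncation of the sum to nine bits follows the later statement in Section 6.7.1 that only the low-order nine bits are used to address a 512-word memory.

```python
MEMORY_WORDS = 512                 # cell memory size stated in the report
ADDRESS_MASK = MEMORY_WORDS - 1    # keeps the low-order nine bits


def calculated_address(bank, displacement, index=None):
    """Add bank register, displacement and (if specified) an index register."""
    total = bank + displacement
    if index is not None:
        total += index
    return total & ADDRESS_MASK    # high-order bits of the sum are ignored


plain = calculated_address(bank=0x100, displacement=0x1F)
indexed = calculated_address(bank=0x100, displacement=0x1F, index=5)
wrapped = calculated_address(bank=0x1F0, displacement=0x20)
```

For a dependent global cell, only the displacement argument would come from the inter-cell bus; the bank and index values are the cell's own, so the same instruction can address a different word in each cell.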
In addition to the calculated address, a new concept of a given address is used.
A given address is an address that is used instead of the calculated address. This will
be explained below.
A dependent global cell recognizes a given address by a special control
instruction received on the inter-cell bus. This special instruction is called a GC
instruction. The GC instruction is sent from the controller cell to signal the dependent
global cells that a given address, in addition to an instruction, is to be sent on the cell
bus. This particular GC instruction is called a format instruction. It will be seen
later that there are several types of format instructions, the particular one under
discussion here is called a format-given address. The sequence is as follows:
Contents of Cell Bus (in order of time)     Length
GC Format (given address)                   ( 8 bits)
Instruction                                 (16 bits)
Given address                               (16 bits)
(subsequent instructions)                   (16 bits)
Section 10 contains a detailed description of the inter-cell bus, however it will
be mentioned here that the bus is 8 bits in parallel (1/2 word). The global cell
normally expects to receive 16-bit instructions. However, this normal sequence is
altered by a format instruction. This instruction is a control byte and tells the
dependent global cells that something new has been added. In this case, that a 16-bit
address follows the next 16-bit instruction. The global cell will execute the instruc-
tion using the given address instead of the calculated address. Thus the controller
cell may send an address to all the global cells instead of having the cells calculate
the address.
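A receiving cell's handling of this sequence can be sketched as below. The sketch assumes things the report does not specify: each bus transfer is modeled as a (control line, byte) pair, the format byte value 0xA0 is invented, and 16-bit words are assumed to arrive high byte first. It also assumes a well-formed stream.

```python
FORMAT_GIVEN_ADDRESS = 0xA0  # hypothetical control-byte value, not from the report


def read_word(transfers, i):
    """Join two consecutive 8-bit data transfers into one 16-bit word
    (high byte first is an assumption)."""
    (_, high), (_, low) = transfers[i], transfers[i + 1]
    return (high << 8) | low


def decode(transfers):
    """Return (instruction, given_address or None) pairs from a bus stream."""
    decoded = []
    expect_given_address = False
    i = 0
    while i < len(transfers):
        control_line, byte = transfers[i]
        if control_line:
            # A control byte alters how the next instruction is read.
            expect_given_address = (byte == FORMAT_GIVEN_ADDRESS)
            i += 1
            continue
        instruction = read_word(transfers, i)
        i += 2
        given_address = None
        if expect_given_address:
            given_address = read_word(transfers, i)  # follows the instruction
            i += 2
            expect_given_address = False             # applies to one instruction
        decoded.append((instruction, given_address))
    return decoded


stream = [(1, FORMAT_GIVEN_ADDRESS),
          (0, 0x12), (0, 0x34),      # instruction
          (0, 0x00), (0, 0x40),      # given address
          (0, 0x56), (0, 0x78)]      # subsequent, unmodified instruction
result = decode(stream)
```

The first instruction is paired with the given address 0x0040; the second carries no given address and would use the calculated address in the normal manner.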
The independent cells (and the dependent cells under local control) may use the
GC instruction. In this case, the format instruction is located in the cell's memory.
After the format instruction is executed, the processor knows the type of data
contained in the following memory locations. An example is given below.
Location   Contents                                    Length
START      GC Format Instruction - address is given    16 bits
+1         Instruction                                 16 bits
+2         Given Address                               16 bits
           (subsequent instructions)                   16 bits
Here, the instruction at START +1 is executed using the given address instead
of a calculated address. The use of the GC format instruction is called instruction
modification. The modification is usually not a change in the operation of the instruc-
tion, but rather a respecification of the address. As seen from the above examples this
modification can be done in any cell that executes an instruction.
6.6 SOURCES OF DATA
The cell in the distributed processor computer system can obtain data from
many sources. Some sources are available to all cells regardless of their state;
others are available only to cells in a particular state.
All cells have access to data stored in their memory. The present cell concept
has no division of memory into data areas, read-only areas, etc. Thus any location
in a cell is available to the processor.
All cells may obtain data from their neighbors. The neighbor to neighbor data
transfer is independent of the cell state, and is described separately in Sections 6.8
and 9 of this report.
Cells may receive data from outside the group via the cells' I/O line. This
system is described in the section on Input/Output Operation (Section 7.1).
Cells may receive data from outside the group or from other cells in the group
via the inter-cell bus. This system of a cell communicating directly with another cell
via the inter-cell bus is described in the section on the communication bus operation
(Section 10).
Some special cases of how a dependent global cell can receive data over the
inter-cell bus will be given below. Dependent global cells may receive data from
the controller cell. A GC format instruction may be used to indicate to these cells
that data is being transmitted in addition to instructions. The GC format instruction
is used in a similar manner to that described in the preceding section, Sources of
Addresses. In the preceding section, the format instruction was called, given address.
The format instructions described below will deal with data and will be identified
differently as will be explained below.
A 16-bit data word may be sent to dependent global cells by sending the following
sequence on the inter-cell bus. The GC format instruction is called a D16 format.
Contents of Cell Bus (in order of time)     Length
GC Format - 16 bit data follows              8 bits
Instruction                                 16 bits
Data                                        16 bits
(subsequent instructions)                   16 bits
This format instruction indicates an instruction is followed by a data word of
16 bits. The instruction is executed by the cell. The operand used, however, will
be the data word received from the inter-cell bus and not the data word usually
fetched from memory. More details concerning the operands and data are given in
the section on instruction execution (Section 6.7). Having data sent by the controller
cell means that the individual cells do not each have to store constants. To have
20 cells all store pi, e, and other constants is an inefficient use of cell memory,
whereas the controller cell has to store each constant only once and send it out when it
is needed. The constants are sent at the time they are used; thus they need not be
saved in the individual cells' memory.
A 32-bit data word may be sent to dependent global cells. The GC format
instruction is now called a D32 format:
Contents of Cell Bus (in order of time)     Length
GC Format - 32 bit data follows              8 bits
Instruction                                 16 bits
Data                                        32 bits
(subsequent instructions)                   16 bits
This situation is similar to the D16 format. The instruction will use the 32-bit data
word in performing its operation.
It is possible to have the dependent global cells use as data, the displacement
field in the instruction. Naturally, the magnitude that may be sent depends upon the
length of the displacement field in the instruction. This is accomplished by using
another format instruction called the I (Immediate) format. The I format is especially
useful when loading registers with small values. The I format is received by a
dependent global cell as shown below:
Contents of Cell Bus (in order of time)     Length
GC Format - Immediate                        8 bits
Instruction                                 16 bits
(subsequent instructions)                   16 bits
As an example, if the instruction is a Load Index Register 3, the displacement field of
the instruction, preceded by zeros, will be loaded into index register 3.
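The choice of operand under the D16, D32 and I formats can be sketched as follows. The function and its arguments are hypothetical. The 8-bit displacement width is an assumption (the report says only that the displacement is shorter than 9 bits), as is the high-word-first order of a 32-bit data word.

```python
DISP_BITS = 8                         # assumed width of the displacement field
DISP_MASK = (1 << DISP_BITS) - 1


def select_operand(fmt, instruction, memory, address, bus_words=()):
    """Return the operand a load-type instruction would use.

    fmt is None for an unmodified instruction, or "D16", "D32" or "I".
    """
    if fmt is None:
        return memory[address]                      # normal fetch from cell memory
    if fmt == "D16":
        return bus_words[0] & 0xFFFF                # data word sent on the bus
    if fmt == "D32":
        high, low = bus_words[0] & 0xFFFF, bus_words[1] & 0xFFFF
        return (high << 16) | low                   # word order is an assumption
    if fmt == "I":
        return instruction & DISP_MASK              # displacement, zero-extended
    raise ValueError(f"unsupported format: {fmt}")


memory = [0] * 512
memory[5] = 111
from_memory = select_operand(None, 0x1203, memory, 5)
from_d16 = select_operand("D16", 0x1203, memory, 5, bus_words=(0xBEEF,))
from_d32 = select_operand("D32", 0x1203, memory, 5, bus_words=(0x1234, 0x5678))
immediate = select_operand("I", 0x3A07, memory, 5)
```

In the immediate case no memory reference and no extra bus word is needed, which is why the format suits loading registers with small values.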
The last GC format instruction that concerns data is the DS format. This is a
very special format whose usefulness is yet to be entirely determined. It was designed
to rapidly move data from the controller cell to a number of cells or a cell. The
sequence received by a dependent cell is as follows:
Instruction and Data (in order of time)     Length
GC format - DS                               8 bits
Instruction                                 16 bits
address                                     16 bits
data word 1                                 16 bits
data word 2                                 16 bits
data word 3                                 16 bits
...
data word N                                 16 bits
GC format - End of DS                        8 bits
The dependent global cells will receive the DS format instruction. The instruction
following will be executed using the given address and the first data word. The given
address will be incremented by one, and the instruction will be repeated using the
second data word. The operation will continue until the GC format byte, End of DS,
is received instead of a data word at N+1. The DS is seen to be useful if the instruction
is a store or compare to memory type of instruction.
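A minimal sketch of the DS sequence applied to a store-type instruction is given below. The END_DS marker object, the function name, and the modeling of the stream as a Python iterable are assumptions for illustration; the wrap of the address to the memory size reflects the nine-bit addressing of a 512-word cell.

```python
END_DS = object()   # stands in for the "End of DS" GC format byte (hypothetical)


def ds_store(memory, given_address, stream):
    """Store successive data words at successive addresses until End of DS.

    Returns the number of data words stored.
    """
    address = given_address
    stored = 0
    for item in stream:
        if item is END_DS:
            break                                   # End of DS ends the sequence
        memory[address % len(memory)] = item & 0xFFFF
        address += 1                                # address incremented per word
        stored += 1
    return stored


memory = [0] * 512
count = ds_store(memory, 0x100, [1, 2, 3, END_DS, 99])
```

The instruction and address are sent once; each further data word costs only 16 bits on the bus. The word after END_DS in the example is not stored.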
The above description of instruction modifiers (GC format instructions) to allow
the controller cell to send data to the dependent cells applies to dependent global cells.
The instruction modifiers may also be used by cells in other states. Of course, the
format instructions must then be stored in the cell's memory, and are not sent over
the intercell bus. Section 6.7 describing the instruction execution should be consulted
for more details. An example of how one of the above GC format instructions can be
stored in a cell's memory and used to modify instructions is given below:
Location Contents Length
START GC Format-D16 data follows 16 bits
+1 Instruction - Load Acc 1 16 bits
+2 data word 1 16 bits
+3 Instruction - Load Acc 2 16 bits
+4 (subsequent instructions) 16 bits
The instruction at location START is a CC instruction that will generate a GC Format -
D16 and indicates that the next instruction is followed by a word of data. The load
accumulator 1 instruction will load accumulator 1 not with the contents of the memory
location specified by the calculated address but with data word 1 located at START +2.
It is seen that the GC format instruction is used here to respecify the location of the
data to be loaded into the accumulator. The instruction at START +3, because it is
unmodified, is executed in a normal manner.
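The in-memory case above can be traced with a small sketch. The program representation (tuples for instructions, bare integers for data words) and the instruction names are hypothetical and chosen only to mirror the START through START +3 listing.

```python
def run(program, data_memory):
    """Execute a short program held in a cell's own memory.

    A ("GC_FORMAT", "D16") entry marks the next instruction as being
    followed by an in-line 16-bit data word.
    """
    accumulators = {}
    pc = 0
    d16_pending = False
    while pc < len(program):
        word = program[pc]
        if word == ("GC_FORMAT", "D16"):
            d16_pending = True
            pc += 1
        elif word[0] == "LOAD_ACC":
            _, register, address = word
            if d16_pending:
                accumulators[register] = program[pc + 1]     # in-line data word
                pc += 2                                      # skip the data word
                d16_pending = False
            else:
                accumulators[register] = data_memory[address]  # normal fetch
                pc += 1
        else:
            raise ValueError(f"unknown entry at {pc}: {word!r}")
    return accumulators


program = [
    ("GC_FORMAT", "D16"),     # START
    ("LOAD_ACC", 1, 10),      # START +1, modified
    0x1234,                   # START +2, data word 1
    ("LOAD_ACC", 2, 10),      # START +3, unmodified
]
result = run(program, data_memory={10: 0x00FF})
```

Accumulator 1 receives the in-line data word, while accumulator 2 is loaded from the addressed memory location in the normal manner.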
Most instructions may be modified by a GC format instruction. This will be
explained in the next section, where Table 6-3 will be presented that gives a list of all
the instruction types and how they are affected by modification.
6.7 EXECUTION OF INSTRUCTIONS
The instruction execution in the distributed processor computer system is a
complex subject. The execution depends upon the state of the cells and upon where the
addresses, instructions, and data are located. This section does not contain the
detailed instruction list or details on the operations in a cell to execute the
instructions (see Section 9); the intent of this section is to present the general concepts on
how the different types of instructions are treated. Instead of first presenting the
detailed processor section of the cells, it will be stated that it contains the program
counter, instruction decoding logic, adder and several registers as in a conventional
machine. The registers are accumulators, index registers, and base registers.
The registers may be located in an addressable section of the cell's memory, or they
may not be addressable by the programmer. The cell also contains a cell address
register and a level register for purposes of cell identification as discussed in
Section 6.3.
Instead of treating each instruction, the instruction list was divided into several
general categories as shown in Table 6-2. The instructions in a category are executed
quite similarly in most states, however the differences will be pointed out in this
section. An explanation of the execution for each of the categories will be given below
for each state of a cell. Table 6-3 contains a summary of the execution of instructions.
Two instruction categories have been defined that are quite unique, namely (1)
CC (Controller Cell Instructions) and (2) GC (Global Control Instructions). Actually
both of these categories are mechanized with only one operation code in the processor
by using an operation code extension scheme. However, they will be considered
separately for purposes of clarity. Table 6-4 summarizes the types of CC instruc-
tions and Table 6-5 summarizes the types of GC instructions. The term instruction
is used freely here and may not be exactly correct. The CC instructions are conventional
type instructions in that they are stored in the cell's memory that executes
them (this will normally be the controller cell). However, GC instructions are sent
over the inter-cell bus to the cells by the controller cell; the controller cell can send
out these GC instructions only by executing a CC instruction. This area of discussion
is quite complex and the explanation of instruction execution in this section should
serve to clarify it. In addition, the types of CC and GC instructions will be explained
in this section (some of the GC - format instructions have already been discussed in
this chapter).
A summary of the GC and CC instruction execution is given in Table 6-6 for
each of the cell states. This table will be explained in detail in the remainder of
this chapter. It should be noted that in Table 6-6 the communications bus
I/O commands, C, are listed as GC instructions although they are not listed in Table 6-5.
This is due to the fact that they are sent by the controller cell by the execution of a
CC instruction therein in much the same manner as those of Table 6-5 are. They
are not truly GC instructions but rather inter-cell bus I/O commands. They are not
discussed in this chapter; chapter 10 contains the detailed mechanization of these
commands and the operation of the inter-cell bus.
Table 6-2. Instruction Categories
 1. LR     Load Register from a memory location.
 2. STR    Store Register into a memory location.
 3. OPR    An operation is performed between a register and a memory
           location contents; the results are in a register.
 4. RR     An operation is performed between one register and another
           register.
 5. R      Single register operation, such as shift.
 6. EXEC   Execute an instruction in a memory location.
 7. COMP   Compare the contents of a memory location (or register) with
           a register. The results of the comparison are saved in the
           COMPARISON flip-flops.
 8. SKIP   Test the contents of a memory location (or register) with a
           register or implied value. The result is true or false.
 9. JUMP   A new sequence of instructions is begun. The jump may be
           combined with a test to make a conditional jump.
10. CC     Controller Cell instructions.
11. GC     Global Control instructions. These instructions control the
           states and levels of all cells and dependent cell execution of
           global instructions.
12. IO     Input-Output instructions. These instructions initiate and
           control I/O operations.
[Table 6-3. Summary of instruction execution; the table body is illegible in this copy]
Table 6-4. CC Instructions
1. Control inter-cell bus I/O logic
2. Generate GC and Bus I/O commands to be sent over the inter-cell bus
3. Transmit mode, single
4. Transmit mode, all
5. Execute mode, single
6. Execute mode, all
7. Do not change mode
   7.1 No Operation
   7.2 Format (F)
       7.2.1 A       - Given Address
       7.2.2 D16     - 16 bit Data Word
       7.2.3 D32     - 32 bit Data Word
       7.2.4 A, D16  - Given Address and 16 bit Data
       7.2.5 A, D32  - Given Address and 32 bit Data
       7.2.6 I       - Immediate
       7.2.7 DS      - Given Address and Data Stream
   7.3 State and Level Control (SIL)
       7.3.1 Controller Cell State - Set Level
       7.3.2 Independent State - Set Level
       7.3.3 Independent State - No Level Change
       7.3.4 Dependent Wait State - Set Level
       7.3.5 Dependent Wait State - No Level Change
Table 6-5. GC Instructions
1. Format (F)
   A             Given address follows the next instruction
   D16           16 bit data word follows the next instruction
   D32           32 bit data word follows the next instruction
   A, D16        Both a 16 bit data word and given address follow the
                 next instruction; the address comes first
   A, D32        Same as A, D16 only the data word is 32 bits long
   I             The displacement field of the instruction is the data
   DS            A number of 16 bit words preceded by a given address
                 follow the next instruction
   END DS        Indicates the end of DS data words
2. State Control of Dependent Cells on the basis of levels (SL)
   Level, G      All dependent cells at this level go to global state;
                 instructions follow
   Level, L      All dependent cells at this level go to local control
   Level, W      All dependent cells at this level go to wait state
   Level, R      All dependent cells at this level reply on inter-cell
                 bus with a constant
   Level, R, DG  All dependent global cells at this level reply on
                 inter-cell bus with a constant
   Level, IND    All dependent cells at this level go to the independent state
3. State and level control of Individual Cells (SIL)
   IND, Level    The cell is made independent and level register set
                 to the value specified
   IND           The cell is made independent with no change in the
                 level register
   DG, Level     The cell is set to the dependent global state and the
                 level register set to the value specified
   DG            The cell is set to the dependent global state with no
                 change in the level register
   DW, Level     The cell is set to the dependent wait state and the level
                 register set to the value specified
   DW            The cell is set to the dependent wait state with no change
                 in the level register
   DL, Level     The cell is set to the dependent under local control state
                 and the level register set to the value specified
   DL            The cell is set to the dependent under local control state
                 with no change in the level register
   CC            The cell is made the controller cell
Table 6-6. GC-CC Instruction Execution
State   Fetched from Own     Rcv'd from        Sent Over
        Memory (CC Instr)    Bus (GC Instr)    Bus (GC Instr)
CC      ALL(1)               X                 F, SIL, SL, C
IND     F, SIL(2)            SIL, C            X
DL      F, SIL(3)            SIL, SL, C        X
DG      X                    F, SIL, SL, C     X
DW      X                    SIL, SL, C        X
Legend:
X:    Not Applicable
F:    Format
SIL:  State Control, Level Control, or both of Individual Cells
SL:   State Control of Dependent Cells on the basis of Level
C:    Communication Bus I/O Commands
(1):  Except 7.3.2 thru 7.3.5 of Table 6-4
(2):  For Independent Cells only a Level Control CC and no State Control
      is allowed from its own Memory (7.3.2 in Table 6-4)
(3):  For Dependent Local Cells only State Control to the Dependent
      Wait or Independent State is allowed from its own Memory
      (7.3.2 thru 7.3.5 in Table 6-4 are allowed)
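The "received from bus" column of Table 6-6 can be read as a simple lookup, sketched below. The dictionary and function names are hypothetical; the entries are copied from the table, with "X" (not applicable) represented as an empty set.

```python
# Categories of GC instruction each cell state accepts from the inter-cell bus,
# per the "Rcv'd from Bus" column of Table 6-6.
RECEIVED_FROM_BUS = {
    "CC": set(),
    "IND": {"SIL", "C"},
    "DL": {"SIL", "SL", "C"},
    "DG": {"F", "SIL", "SL", "C"},
    "DW": {"SIL", "SL", "C"},
}


def accepts(state, category):
    """True if a cell in this state acts on a GC instruction of this category."""
    return category in RECEIVED_FROM_BUS[state]


global_takes_format = accepts("DG", "F")
wait_takes_format = accepts("DW", "F")
independent_takes_level_control = accepts("IND", "SL")
```

Only dependent global cells act on format (F) instructions from the bus, and independent cells ignore state control by level (SL), consistent with only dependent cells examining the level.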
6.7.1 Dependent Global Cell
A dependent cell may receive instructions, data, and commands from the inter-
cell bus. The global cell, or active cell, is receiving instructions and executing them
as they are received; the level prefix placed before the instructions by the controller
cell is the same as the contents of the level register in the global cell.
Although the global cell receives instructions from the intercell bus, the
registers, addresses, and data are usually from the cell's memory. Thus several
global cells will receive the same instruction, but all may use different addresses
and process different data. The exceptions are indicated by the use of a GC Format
modifier byte preceding the instruction. This concept was explained in the previous
sections.
A description of the execution of each instruction category is given below.
1. LR instructions are all instructions that fetch an operand from a memory
location and load the contents into a register. The address of the memory location
is calculated by adding the base register, index register (if one is specified) and the
displacement from the instruction. Only the displacement is received on the inter-
cell bus. The low-order nine bits are used to address memory; the remaining seven
high-order bits are ignored. Of course, if the memory in a cell were greater than
512 words, more bits would be used. The operand is fetched from this cell's memory
and placed in the specified register.
The LR instructions may be modified with a GC format byte. This byte, when
transmitted by the controller cell just before the LR instruction, modifies the address
or the source of the operand. The A (address) modification forces the cell to use the
given address instead of the calculated address. The D (data) modification forces the
cell to load the register with the data word sent on the inter-cell bus. The I (immedi-
ate) modification will load the register with the displacement field of the instruction.
The DS modifier is invalid.
No matter what the source of the address, the address always specifies a word
in the cell's memory. Of course, the A and D modifications can not both be used
with LR instructions. The controller cell may use a D modification to send the same
data to all cells.
2. STR instructions store registers into memory. The address is calculated,
and the contents of the specified register are placed in the addressed memory location.
A GC format byte, when received just before a STR instruction, will modify the
instruction. If an A modification is used, the given address is used instead of the
calculated address. A data word, either 16 or 32 bits (D16 or D32), may be specified.
In this case, the register contents are ignored and are not used. The data word from
the cell bus is placed in the specified memory location. Thus words may be placed
directly in a cell's memory without changing register contents. Both A and D may be
given. In this case, the controller cell sends out both the address in which the data
is to be placed, and the data to be stored. This serendipitous result is used by the
controller cell to start up cells that have had their memory cleared for some reason,
such as a reconfiguration.
The DS modification, when used with a store instruction, is similar to using a
GC format with an A and D. The difference is the instruction and address are not
repeated with each data word, they are sent to the cell but once. Each time a data
word is sent to the cell, the data is stored and the address is incremented. An End
DS GC format byte will end the sequence.
The I modification can not be used with store register (STR) instructions,
because the displacement field is needed for an address.
3. OPR Instructions. The address is calculated in the normal manner, using the
base (and perhaps an index register), along with the displacement in the instruction.
The contents of the memory location specified by the address are obtained, and are
used as operand 1. Operand 2 is always obtained from a register. The instruction
specifies what operation is to be performed with the two operands. The results are
always placed in a register or registers.
The following modifications are allowed.
A    A given address may be specified, which will be used instead of the
     calculated address.
D    A data word is sent on the bus, which becomes operand 1. No address is
     used. Depending upon the operation, the data word may be either 16 or
     32 bits in length.
I    The displacement field from the instruction sent over the cell bus becomes
     operand 1. No address is used.
DS   The DS modification may be used; however, the address is not used.
     Because the same operation is performed with each word of data, this
     DS modification may not be very useful.
4. RR instructions operate exactly as in the independent state. No modifications
are possible. Because both registers are in the same cell, no transmission of data
by the cell bus is required. Only the instruction itself is sent on the cell bus.
5. R instructions are the same in both dependent and independent cells. No
modifications are possible with register (R) instructions.
6. EXECUTE instructions can be sent from the controller cell to the dependent
global cells. In this way every global cell can execute a different instruction. The
address is calculated, the contents of the specified memory location are obtained and
executed as an instruction. The fetched instruction may be any legal instruction for
a dependent cell. The address (A) modification is allowed, in this case, the address
of the memory location would be sent by the controller cell. No other modifications
may be used.
7. COMPare instructions are executed much differently in dependent cells than in
traditional computers. In the traditional computer, a comparison is made between
two values; one is located in a register. The comparison results set some flip-flops.
In some computers, a separate instruction tests the flip-flops and jumps or otherwise
modifies the program counter. In other machines the same instruction actually
modifies the program counter. Sometimes, of course, the instruction does not modify
the program counter, depending upon the results of the comparison.
The dependent cells in the global state do not use the program counter, thus
another means of using the comparison results is needed. The concept adopted here
is to change the level register instead.
The instruction is received from the inter-cell bus. The address is calculated
and the specified word from the cell's memory is fetched. This word (operand 1)
is compared with the contents of a register (operand 2). The results of the comparison
will set a pair of flip-flops to one of 4 states. These will probably be overflow,
greater, equal, less than.
Another instruction will test the state of these flip-flops and take some action.
Sometimes, the same instruction may compare, set the flip-flops and take some action.
The action to be taken may be one of the following. How many are mechanized would
require further study.
a. Continue at this level.
b. Increment level register by 1.
c. Increment level register by 2.
d. Decrement level register by 1.
e. Decrement level register by 2.
Any level register change will always discontinue the reception of instructions
from the inter-cell bus.
The conditions above are several that could be used. A compare instruction
could state:
IF flip-flops are 00 or 01 or 10
THEN increment level register by 1,
ELSE continue at this level.
Many other combinations are possible. Software studies would be required to
determine how many of them are useful.
It is seen that, because the compare instruction uses data that may be different
in each dependent global cell, some cells may change levels and thus discontinue
receiving global instructions. In these cells, data processing will be continued at
a later time when the controller cell sends out a GC for the new level.
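The way a single compare instruction separates the global cells can be sketched as follows. The sketch is illustrative only: cells are plain dictionaries, the overflow outcome is omitted, and the set of outcomes that trigger a level change is passed in as a parameter because the report leaves the choice of conditions open.

```python
def compare(memory_word, register_word):
    """Outcome of comparing operand 1 (memory) with operand 2 (register)."""
    if memory_word > register_word:
        return "greater"
    if memory_word == register_word:
        return "equal"
    return "less"


def apply_compare(cells, address, change_level_on, delta=1):
    """Apply one global compare to every cell; return names still receiving."""
    still_receiving = []
    for cell in cells:
        outcome = compare(cell["memory"][address], cell["register"])
        if outcome in change_level_on:
            cell["level"] += delta       # level register changed...
            cell["receiving"] = False    # ...so reception is discontinued
        else:
            still_receiving.append(cell["name"])
    return still_receiving


cells = [
    {"name": "a", "level": 1, "receiving": True, "register": 3, "memory": {0: 5}},
    {"name": "b", "level": 1, "receiving": True, "register": 3, "memory": {0: 3}},
    {"name": "c", "level": 1, "receiving": True, "register": 3, "memory": {0: 1}},
]
remaining = apply_compare(cells, address=0, change_level_on={"greater"})
```

Cell "a" moves to level 2 and stops receiving, while "b" and "c" continue at level 1. Cell "a" resumes processing only when the controller cell later sends a GC for level 2.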
Some compare instructions may use several words of memory, or perhaps
several words from the inter-cell bus as a DS modification. The setting of the
compare flip-flops and the subsequent action is the same as an instruction that uses
only two operands.
The modification possibilities have not all been explored. Some possibilities
are given here.
A The given address is used instead of the calculated address.
D The data sent on the cell bus is used instead of a register operand.
I The displacement in the instruction is used instead of a register operand.
DS The words of memory starting at the given address are compared with the
data words sent on the data bus. If the comparison changes the level
register, the reception of data words is discontinued. If no level change
is made, the flip-flops are left at their last state.
8. SKIP instructions are really test and skip. A test is made between two operands,
the result of this test is always True or False. The true state will always increment
the level register by 1. The reception of instructions from the inter-cell bus will be
discontinued immediately and the cell is placed in the dependent wait state. The cell
will remain in this state until a GC command is sent indicating instructions of the new
level are being sent on the cell bus. The false state will not change the level register;
reception of instructions will continue. If an address is required, it is calculated
and the operand is fetched from memory. One operand is usually from a register.
Some modification possibilities are:
A A given address is sent on the intercell bus.
D The data word sent on the intercell bus is used instead of the register
operand.
I The displacement in the instruction is used as an operand instead of
using a register operand.
DS The words of memory starting at the given address are tested against the
words sent on the inter-cell bus. If any test is true, the level is changed
and data reception is discontinued. If every test is false, the GC format
command which indicates the end of the data string will be received; the
level register is not changed.
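The DS form of the SKIP instruction can be sketched as below. The function name, the dictionary representation of a cell, and the choice of "not equal" as the test are assumptions for illustration; reaching the end of the word list stands in for receipt of the End of DS format byte.

```python
def skip_ds(cell, given_address, bus_words, test):
    """Test successive memory words against successive bus words.

    Returns the offset of the first true test, or None if every test is
    false and the end of the data string is reached.
    """
    for offset, bus_word in enumerate(bus_words):
        if test(cell["memory"][given_address + offset], bus_word):
            cell["level"] += 1           # a true test always increments the level
            cell["state"] = "wait"       # reception is discontinued
            return offset
    return None                          # End of DS reached; level unchanged


def not_equal(memory_word, bus_word):
    return memory_word != bus_word


mismatch_cell = {"level": 1, "state": "global", "memory": [7, 8, 9, 10]}
match_cell = {"level": 1, "state": "global", "memory": [7, 8, 0, 10]}
bus_words = [7, 8, 0, 10]
first = skip_ds(mismatch_cell, 0, bus_words, not_equal)
second = skip_ds(match_cell, 0, bus_words, not_equal)
```

With this test, the controller cell can check a block of memory in many cells against a reference string; cells that differ drop to the next level at the first mismatch, and cells that agree stay at the current level.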
9. JUMP instructions are very seldom sent to dependent active cells because they
have no meaning. A Jump instruction will always be preceded by a GC format,
because a JUMP, by itself, is never sent over the inter-cell bus. The GC format will
indicate a special operation is to be performed. One operation is to load the program
counter with a value. The program counter is not incremented or used as long as the
cell is in the global state.
10. CC instructions are not sent over the inter-cell bus.
11. GC instructions are received by the dependent global cells. Dependent global
cells may receive any of the GC commands: format, state and level control of individual
cells, and state control via levels.
The GC format instructions have been described in the preceding sections on
addresses and data. These GC instructions describe what is to follow on the cell bus.
Eight categories are possible (Table 6-7).
The other GC instructions that may be received control the levels and states.
These instructions may control the state of cells on the basis of levels or may control
the state and/or level of an individual cell.
The GC instructions for state control via levels are given below:
GC level, G
All dependent cells at this level go to the active global state. Instructions for
this level normally follow. If the cell is in the dependent global state when this
instruction is received, two cases occur: the levels match or don't match. If the
levels match, the cell continues in the dependent global state and this is essentially
treated as a no-op. If the cell's level is different, the cell will go to the dependent
wait state; essentially a new set of cells is then made dependent global.
GC level, L
All dependent cells at this level go to the dependent under local control state.
Dependent global cells whose level matches will change state, the others will remain
dependent global. If the cell goes to the dependent local state, the next instruction
for this cell will be taken from a fixed location in the cell's memory.
GC level, W
All dependent cells at this level go to the dependent wait state. The procedure
is the same as above except no next instruction is fetched for the dependent global
cells that change state here.
GC level, IND
All dependent cells at this level go to the independent state. This instruction
will set all the dependent cells at this level to the independent state; the cell or cells
will begin by initializing the program counter from a predetermined location in the
cell's memory.
The four GC instructions above will force all dependent cells at the given
level to change state. Of course, independent cells are not changed, neither are
cells that are at a different level from the level number in the GC instruction sent
over the inter-cell bus.
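The effect of the four level-addressed GC instructions on a single cell can be summarized in a short sketch. The function name `apply_gc_level`, the `State` enumeration and the string codes are hypothetical, chosen only for illustration; the controller cell state is left out.

```python
from enum import Enum


class State(Enum):
    GLOBAL = "dependent global"
    LOCAL = "dependent local"
    WAIT = "dependent wait"
    INDEPENDENT = "independent"


# State commanded by each level-addressed GC instruction.
TARGET = {
    "G": State.GLOBAL,
    "L": State.LOCAL,
    "W": State.WAIT,
    "IND": State.INDEPENDENT,
}


def apply_gc_level(state: State, cell_level: int, cmd_level: int, code: str) -> State:
    """Return the cell's state after receiving 'GC level, code'."""
    if state is State.INDEPENDENT:
        # Independent cells do not respond to level-based commands.
        return state
    if cell_level == cmd_level:
        # Every dependent cell at the commanded level changes state.
        return TARGET[code]
    if code == "G" and state is State.GLOBAL:
        # A new set of cells becomes dependent global; this one waits.
        return State.WAIT
    return state
```

Under these assumptions a dependent global cell at level 2 that sees "GC 3, G" drops to the wait state, while the same cell seeing "GC 3, L" remains dependent global.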
Another instruction is the GC reply.
GC level, R
All dependent cells at this level will respond to this GC instruction. The con-
troller cell will now allow the dependent cells to transmit on the cell bus. All cells
which responded will now return a constant number to the controller cell. Because
all cells are setting the cell bus lines to the same value, the hardware design is
simple.
This instruction is used by the controller cell to determine if one or more
dependent cells are at a certain level. The cells may switch levels dependent upon
the value of the data processed by a cell. Thus some cells may or may not be at a
specific level. To enable the controller cell to quickly and at low overhead determine
if there are any cells at a given level, this reply instruction is included. If no cells
are at this level, no cell will send back a constant to the controller cell, and the con-
troller cell will not receive the constant reply number. The controller then may not
need to send this level program. If at least one cell replies, the controller cell
will note this and send out the program to process this level of cells. There is no
way for the controller cell to know how many cells are receiving the instructions at
a level without interrogating each one individually. With the reply instruction, the
controller cell knows only that there is at least one cell at this level.
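The reply mechanism can be illustrated with a small sketch. It assumes that simultaneous replies combine on the bus as a logical OR and uses an arbitrary constant; both are assumptions made for illustration, since the report states only that all replying cells drive the bus lines to the same value. The names `REPLY_CONSTANT`, `bus_value` and `any_cell_at_level` are hypothetical.

```python
REPLY_CONSTANT = 0b10101010  # hypothetical value; the report does not give one


def bus_value(cells, level):
    """Combine the replies of all cells to 'GC level, R'.

    cells is an iterable of (is_dependent, cell_level) pairs.
    Replies are OR-ed together (an assumed bus behavior), so any
    number of replying cells yields the same value on the bus.
    """
    bus = 0
    for is_dependent, cell_level in cells:
        if is_dependent and cell_level == level:
            bus |= REPLY_CONSTANT
    return bus


def any_cell_at_level(cells, level):
    """What the controller learns: at least one cell replied, or none did."""
    return bus_value(cells, level) == REPLY_CONSTANT
```

Because one reply and many replies produce the same bus value, the controller cell cannot count the cells at a level from this instruction alone, which matches the limitation noted above.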
Table 6-7. GC Formats

Cell Bus
Bits 5-7    Description
   1        D16 (16-bit data word)
   2        D32 (32-bit data word)
   3        A (Given address)
   4        A and D32
   5        A and D16
   6        I (Immediate)
   7        DS (Data is being sent)
   8        Instructions follow (end of DS)

In addition, another GC reply instruction is:
GC level, R-DG
All dependent global cells at this level will reply as in the above instruction.
However, dependent local and wait cells will not respond. This instruction is
particularly useful to determine if any dependent cells are still in the global state.
The GC instructions for state and/or level control of individual cells are
given below.
These instructions were listed in Table 6-5; the general form is:
GC State, Level
or GC State, No Level Change
The GC command is sent over the inter-cell bus by the controller cell. This command
is addressed to an individual cell regardless of state, and only the addressed
cell will pick up the command and execute it. The command is executed similarly in
all cells regardless of their initial state. The addressed cell will change state as
indicated by the command and also change its level as indicated by the command. If
the cell is already at the indicated state and/or level, no change occurs. Therefore,
it is possible for example to keep a cell at the same state and change its level.
The procedures for changing states depend on the initial state and the final
commanded state in the cell. Since this section is concerned with the execution of
instructions in dependent global cells, only this initial state will be considered.
To enter the dependent local state the cell will fetch its next instruction from a
location in the cell's own memory given by a fixed address. Entering the dependent
wait state will result in the cell entering an idle mode and suspension of reception
of instructions over the inter-cell bus until a GC instruction is received forcing it to
again change states.
If an independent state is indicated, the cell will place the contents of a location
in the cell's own memory, given by a fixed address, into the program counter. The
cell will then begin fetching instructions from its own memory under control of the
program counter. This location may simply hold the address saved when the cell was
last independent and told to go to the dependent global state, an address placed there
by the controller cell, etc.; there are numerous possibilities for restarting
the cell in an independent state.
The controller cell state is entered by having the cell fetch its next instruc-
tion from a location in its own memory given by a fixed address. One of the first
things the new controller cell will do is to change the old controller cell into the
independent state.
12. IO instructions are described in the section on input-output (see Section 7).
Dependent global cells will usually not execute IO instructions, since the inter-cell
bus is being used for global instructions. Dependent global cells will normally be
switched to local control to execute IO instructions.
6.7.2 Dependent Cells - Local Control State
The dependent cell can have its level register at a different value than the
instruction and data level that is being sent over the cell bus. These dependent cells
that are not active and not receiving global instructions may (1) idle or (2) execute
instructions from their own memory. The first case is called the wait state; the
second is local control and is discussed in this section.
The execution of instructions from the local memory will continue until one
of the following events occurs:
1. A CC instruction is executed that puts the dependent cell in the wait state.
2. A GC instruction is received from the inter-cell bus that specifies this
level and causes a change of state.
3. A GC instruction is received on the inter-cell bus specifying this cell's
address and causing a change of state.
In the second case, the global instructions will always be used whenever they
are identified as being at the same level as the level register. The programmer is
responsible to be sure that the local control program is at a completed state before
global instructions at this level are sent from the controller cell.
Because the instruction execution is similar to the dependent global cell, only
the differences will be noted below. Note that any GC format instruction modifiers
must be stored in the cell's memory, preceding the modified instruction. The
sequence of instructions is controlled by the program counter.
LR       The instruction is fetched from the cell's memory; the register is
         loaded from the memory location specified by the effective address.
         The calculated address is used unless a CC format modifier instruction
         is present. The D, A, or I modifier may be used.
STR      The instruction is fetched from the cell's memory; the register is
         stored in the memory location specified. The calculated address is
         used unless a CC format modifier is present. The D or A modifier
         may be used.
OPR      The operation is performed between the memory location contents and
         the register. The CC format modifier can specify an A, D, I, or
         DS modification.
RR       These are executed exactly the same way in all cells. No modifications
         are possible.
R        The register instructions are executed the same way in all cells.
EXECUTE  Execute instructions are executed as in an independent cell. The
         address is calculated and the contents of the specified memory
         location are executed as an instruction. The fetched instruction
         may be any legal instruction for a dependent local control cell. A
         CC format modifier may specify a given address.
COMPARE  Compare instructions are executed in the same way as in a dependent
         global cell. The comparison is made in the same way, only the
         instruction and all data are obtained from this cell. The flip-flops
         are set in the same way.
         The level register is either changed or will remain the same. If
         the register is changed, the cell will automatically go to the wait
         state. If the register is unchanged, the cell will continue to execute
         instructions in the local control state.
         The compare may be modified as given in the dependent global cell
         description.
SKIP     These instructions are executed exactly as in a dependent global cell.
         The results of the test, if true, will increment the level register and
         force the cell into the wait state. The false state will not change the
         level register and thus the program will continue and fetch the next
         instruction. The skips may be modified, as described in the dependent
         global cell description.
JUMP     These instructions will usually be executed as in an independent cell.
         The new value of the program counter is calculated and replaces the
         present program counter value. Conditional jumps are also possible.
         One jump will take place depending upon the comparison flip-flop
         setting.
CC A very restricted part of these instructions may be executed by the
dependent local cells. This includes the Format modification (7.2 in
Table 6-4) and part of the state and level control instructions (7.3 in
Table 6-4). It should be kept in mind that these CC instructions will
be located in this cell's memory. The following state and level control
instructions may be executed.
Ind, Level
Ind, No level change
Dep Wait, Level
Dep Wait, No level change
Any other CC instructions (1 through 6 in Table 6-4) will not be executed
in this state.
The use of the format instructions has been described in the
previous sections.
The use of any of the state and level control instructions will
result in a change of state and possibly a level change. The procedures
for entering the Independent or Dependent wait state are the same as
those presented for entering from the dependent global state.
GC Format GC instructions sent over the inter-cell bus are ignored by
dependent local cells.
The GC instructions that control state on the basis of level and
those that control state or level on the basis of cell address may be
executed by dependent local cells.
The GC instructions for state control via levels will be executed
only if the level register matches that of the instruction.
GC level, W
GC level, DG
The execution of the above two instructions will result in the cell
assuming the new state. Going to the global state means that the
cell is now ready to accept instructions via the inter-cell bus.
GC level, IND
GC level, R
The above two instructions are executed as in the dependent global
cells (see Section 6.7.1).
The GC instructions for state and/or level control of individual
cells are handled quite similarly to that described for dependent global
cells; the differences will be brought out in the discussion below.
To go to the dependent global state, the present instruction will
be completed, the program counter saved, and the hardware accumulator
stored in its memory location. Then the state is set to dependent
global and the cell is ready to receive instructions over the inter-cell
bus.
The dependent wait state will simply result in the cell entering
an idle mode as before. The comments as to the present instruction,
program counter and accumulator in the above paragraph apply here
also and to every change of state for the dependent local cell.
The same procedure is followed to enter the independent and
controller cell states as given for the dependent global cell.
IO These instructions are executed as in an independent cell. They are
described in Section 7.
6.7.3 Dependent Cell - Wait State
This cell is not executing instructions. The program counter is not being
incremented.
The dependent cell in the wait state is always examining the cell bus. When a
GC instruction that controls states via levels is received which has the same level
number as the contents of the cell's level register, the cell will switch to the com-
manded state. A GC instruction may also address this cell individually to change
state and/or level. The procedures followed to change state are identical to those
described previously.
6.7.4 Independent Cell
The cell operating in the independent state is described below. The independent
cell operation is very similar to the traditional computer operation. These cells
fetch all instructions and operands from the cell's own memory. The instruction
fetched is located at the address contained in the program counter.
The independent cell cannot set its ID register. The level register, although it
is not used by an independent cell, may be set to any value via a CC instruction.
An independent cell will respond to GC commands received on the bus that
specify this cell address (last name). Independent cells do not respond to commands
based on level. The cell that is in the independent state must stay in this state until
the controller cell sends a command on the inter-cell bus with a cell address equal
to the contents of the cell's ID register to change states. Thus each independent
cell must be addressed individually. The independent cell concept is an important
part of the distributed processor system. Other similar computer systems require
all cells to be independent or all dependent. It should be noted that the capability
for having independent cells switch themselves to dependent states was not included.
The reason for this is that it is felt to be of little use since a dependent local cell
acts very much like an independent cell and programs that may want to change states
and be performed locally may be done in a dependent local cell. This approach
simplifies the executive procedures in the controller cell.
LR instructions are all instructions that fetch an operand from a memory
location and load the contents into a register. The address of the memory location
is calculated by adding the base register, index register (if one is specified) and the
displacement from the instruction. A CC format modifier instruction may be used;
however, the address is always in the same cell. The D, A and I modifications may
be used.
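The address calculation for an LR instruction, together with the A (given address) modifier, can be sketched as below. The function name and parameter names are hypothetical, and the sketch ignores word width and any address wraparound, which are not specified here.

```python
from typing import Optional


def effective_address(base: int,
                      displacement: int,
                      index: Optional[int] = None,
                      given_address: Optional[int] = None) -> int:
    """Return the memory address used by an LR instruction.

    With an A format modifier, the given address replaces the
    calculated one. Otherwise the address is the sum of the base
    register, the index register (if one is specified) and the
    displacement from the instruction.
    """
    if given_address is not None:
        return given_address
    return base + (index if index is not None else 0) + displacement
```

For example, a base register of 0x100, an index register of 8 and a displacement of 4 give the address 0x10C; with an A modifier supplying 0x200, the address used is 0x200 regardless of the registers.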
STR instructions are the reverse of the LR; the register contents are stored
in the given memory location. An A or D format modifier may be used.
OPR instructions are similar to LR, only the present contents of the register
are combined with the memory location contents according to the operation code.
The results of the operation are placed in the register. OPR instructions include
add, subtract, multiply, divide, AND, OR, etc. An A, D, I or DS modifier may be
used.
RR instructions are all instructions that use two registers, and place the results
in one register. Add accumulator 1 to accumulator 2 is an example. No memory
operations are required (unless the registers are stored in main memory). No
modifiers may be used.
R instructions are all single register operations, such as shift, complement
accumulator, etc. No modifiers may be used.
EXEC instruction is the traditional computer execute instruction. The specified
memory location contents are treated as an instruction; this fetched instruction is
executed. The independent cell can only execute instructions located within its own
memory. A CC format modifier may be used to specify a given address.
COMP instructions compare two values. One is located in a register, or is
understood (such as zero); the other is located in another register or in the cell's
memory. The result of a comparison will set the flip-flops to a certain state, which
will be one of 4 states. An equal comparison, for example, may set the flip-flops to
00, greater to 01, and less than to 10. The compare may be modified as in the
previous descriptions; A, D, I or DS modifications are possible.
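The example flip-flop encoding quoted above can be written out as a short sketch. The function name is hypothetical, the encoding is only the example given in the text, and "greater" is assumed here to mean that the register value exceeds the other operand. The fourth state is not specified and is not modeled.

```python
def compare_flip_flops(register_value: int, operand: int) -> int:
    """Return the two comparison flip-flop bits for a COMP instruction.

    Uses the example encoding from the text:
    equal -> 00, greater -> 01, less than -> 10.
    """
    if register_value == operand:
        return 0b00
    if register_value > operand:  # assumed direction of "greater"
        return 0b01
    return 0b10
```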
SKIP instructions are always a test and conditional skip. The value tested may
be in a register or in memory. The result of a test is always true or false. The A,
D, I or DS modifiers may be used.
The independent cell will modify the program counter contents based upon the
results of a skip test. If the test results are true, P + 2 replaces the contents of the
program counter P. If the test results are false, P + 1 replaces the contents of the
program counter P. In other words, the following instruction is skipped if the test
results are true.
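The program counter update for a skip in an independent cell reduces to the following sketch (the function name is hypothetical):

```python
def next_pc_after_skip(pc: int, test_true: bool) -> int:
    """Return the program counter after a test-and-skip instruction.

    True: P + 2, so the following instruction is skipped.
    False: P + 1, so the following instruction is executed.
    """
    return pc + 2 if test_true else pc + 1
```

This contrasts with the dependent cells, where a true test changes the level register and state instead of the program counter.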
JUMP instructions are either conditional or unconditional. Additional operations
may take place in addition to the jump, such as storing the program counter in an
index register.
The jump is implemented in an independent cell by replacing the contents of the
program counter with a new value. This new value is the location in the cell's
memory where the next instruction to be executed is located. The calculation of this
new value is dependent upon the type of instruction; however, in all cases, a new value
replaces the old value. Conditional jump instructions make a test, usually on some
register. If the test results are true, the program counter contents are replaced.
If the test results are false, the program counter is handled in a normal manner, i.e.,
the program counter is incremented by one. The A, D, I or DS modifiers may be used.
CC A very restricted part of these instructions may be executed in the independent
cell. This includes the format modification (7.2 in Table 6-4) and one of the
state and level control instructions (7.3 in Table 6-4). The only state and level
control instruction that may be executed is to change the level in the independent
cell:
IND, Level
Any other CC instructions will not be executed by an independent cell.
The use of the format instructions has been described in the previous
section. The instruction to change level simply results in the level register
being set to the new value specified.
GC The only GC instructions on the inter-cell bus that will be executed by an
independent cell are those that control state and/or level on the basis of cell address.
If the cell address in the GC command matches the cell's ID register, the
command will be executed. The procedure to change states is identical to that
given previously for the dependent local cell; namely, the present instruction is
completed, the accumulator stored in its memory location, and the program
counter saved in the location used for interrupts. Then the state is changed as
described previously. In this case the procedure in going to the dependent local
state is similar to that of going to the independent state for the dependent local
cell, namely, the next instruction is fetched from a location given by a fixed
address.
IO Input-Output instructions are executed normally, as described in Section 7.2
of this report. In fact, the Input Output operation is the same for all cells
except those that are failed (obviously cannot perform I/O) and those in the power
saving state (there are no memory words, the memory is shut off). In all
other states the I/O is the same.
6.7.5 Controller Cell
The controller cell is the most difficult to describe. This cell has the charac-
teristics of the independent cell and of a storage bank. The controller cell controls
the inter-cell bus. The bus is used for local communication between cells and global
control and communications. The local communication operation between cells is
described in the section on the communication bus (Section 10).
As far as instruction execution is concerned, the controller cell can operate in
basically two modes: the transmit mode and the execute mode. That is, the controller
cell can either send out instructions over the inter-cell bus or execute them internally.
The controller cell supplies instructions and data to the dependent cells. The
controller cell will first be described in the transmit mode. In this mode the con-
troller cell will be transmitting instructions to the global cells; the instructions are
executed only in the dependent global cells and not in the controller cell.
The transmit mode is entered by executing a particular controller cell instruction
(see Table 6-4). This instruction will be discussed below when a description of
the CC instructions is given. Essentially this instruction causes the controller cell
to place the subsequent memory words on the inter-cell bus. The program counter
controls the fetch of instructions.
Most fetched instructions that are transmitted are NOT executed by the con-
troller cell. The transmit mode causes non-execution (by the controller cell) of
the instruction categories shown in Table 6-8. An appropriate delay between trans-
missions is made so the dependent global cells will have time to execute the instruc-
tions before the next instruction is sent.
JUMP instructions are not sent out unless they are preceded by a GC format
modifier instruction. JUMP instructions are normally executed by the controller cell. The
program counter is usually modified to start a new sequence of instructions as in an
independent cell. Conditional jumps are executed using data from the controller cell.
The address and registers used (if required) are always from the controller cell.
Table 6-8. CC Transmitted Instructions

LR      Load Register
STR     Store Register
OPR     Operate on Register
RR      Register to Register
R       Register
EXEC    Execute
COMP    Compare
SKIP    Test and skip

In the execute mode, the controller cell fetches and executes instructions like
an independent cell. All the instructions except the CC and GC instructions are
executed identically to the independent cell. Therefore, only the execution of the CC
and GC instructions will be discussed below.
CC instructions are listed in Table 6-4. There are seven basic types of CC
instructions.
1. Control inter-cell bus I/O logic - Instructions of this type control the logic
   associated with the inter-cell bus communications section of a cell. Some
   instructions of this type are: prepare to input from the bus, prepare to
   output over the bus, etc.
2. Generate GC and Bus I/O commands - CC instructions of this type when
   executed by the controller cell will result in the formation and transmission
   of control words on the inter-cell bus. All of the GC commands listed in
   Table 6-5 are generated by this instruction. This includes the format,
   state control via levels, and state and/or level control of individual cell
   commands. In addition, the commands to control local communications
   between the cells are generated by this instruction. Section 10 contains
   a detailed discussion on how the GC and Bus I/O commands are formed
   from this instruction.
3. Transmit Mode, Single - This instruction forces the controller cell to
   enter the transmit mode until one instruction (including any GC modifiers,
   given address, etc.) has been transmitted and then return to the previous
   mode.
4. Transmit Mode, All - Same as above except the transmit mode is retained
   until one of the two CC instructions given below is executed.
5. Execute Mode, Single - This instruction forces the controller cell to
   execute the instruction (including any format modifiers) that follows in
   the controller cell. Then the controller cell is to revert to the previous
   mode.
6. Execute Mode, All - Same as above except the execute mode is retained
   until one of the CC instructions, 3 or 4 above, is executed.
7. Do Not Change Mode - These instructions are used to control the execution
   of instructions in the controller cell and to change its level. The present
   mode of the controller cell will be retained.
   a. No Operation
   b. Format - These are the format modifiers previously described
      (A, D, I, DS). They are used in the controller cell to modify the
      next instruction it executes (not transmits). These are not sent over
      the inter-cell bus. Format modifiers sent over the bus are generated
      by the CC instruction of type 2 described above.
   c. State and Level Control - The only one of these instructions (Table 6-4)
      that the controller cell may execute is "Controller Cell State - Set
      Level." This instruction may be used by the controller cell to change
      its level register.
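The mode-switching CC instructions (types 3 through 6) can be pictured as a small state tracker. The sketch below is only one possible reading; the class `ControllerMode` and its method names are hypothetical, and it does not model the case of one "Single" instruction being issued while another is still pending.

```python
class ControllerMode:
    """Track whether the controller cell transmits or executes instructions."""

    def __init__(self, mode: str = "execute") -> None:
        self.mode = mode          # "transmit" or "execute"
        self._revert_to = None    # mode to restore after a Single instruction

    def cc(self, target: str, single: bool) -> None:
        """Execute a 'Transmit Mode' or 'Execute Mode' CC instruction.

        single=True models the Single variants (types 3 and 5);
        single=False models the All variants (types 4 and 6).
        """
        self._revert_to = self.mode if single else None
        self.mode = target

    def instruction_done(self) -> None:
        """Call after one instruction (with its modifiers) has been handled."""
        if self._revert_to is not None:
            self.mode = self._revert_to
            self._revert_to = None
```

With this model, "Transmit Mode, Single" issued from the execute mode sends one instruction and then returns to executing, while "Transmit Mode, All" keeps transmitting until an Execute Mode instruction is executed.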
GC instructions are sent over the inter-cell bus by the controller cell. Therefore,
the controller cell will not normally be receiving and executing any GC commands
from the bus. The one exception occurs when the controller cell is to change
state. This is accomplished by the controller cell executing a CC instruction of
type 2 above. The particular GC command generated by the execution of this instruc-
tion and sent over the inter-cell bus will be a command for a particular cell to change
its state to the controller cell state. The original controller cell will now be waiting
for a command over the inter-cell bus to tell it to change states. A timer is set
so that this command is received in a certain maximum allowable time. The command
will of course be a GC command sent by the new controller cell. If this command is
not received in this maximum allowable time, the original controller cell will inter-
rupt and enter a software routine to diagnose the no response condition.
One point should be made regarding the execution of global programs. It
is possible to be sending out a global program at a certain level and have only part
of the cells in a group at that level executing this global program. Recall that a
cell will be executing the global program if it is in the dependent global state. This
situation could occur in several cases:
1. The cell is independent - its level has no meaning and may therefore be
the same as the level of instructions on the inter-cell bus.
2. The cell is dependent local or dependent wait - a cell may be dependent
local and initially at a different level than that of the instructions on the bus,
then the cell may change level and/or state under its own control and be at
the same level as the instructions on the bus. However, the cell will not
switch to the dependent global state under this condition.
3. Only one cell is made Dependent Global - It is possible for the controller
cell to send out a GC command addressed to an individual cell and place it
in the dependent global state. This is the DG or DG, Level command in
Table 6-5. The level register may or may not be changed by this command.
In any case, if the controller cell sends out instructions over the inter-cell
bus following this command, the cell will execute them and assume they
are at the same level as its level register. Other cells may be at the same
level as this cell, in fact they may even be dependent local or dependent
wait; however, they will not enter the dependent global state since a GC
to change state of an individual cell was used.
6.8 NEIGHBOR TO NEIGHBOR COMMUNICATIONS
This section will discuss the reasons for including neighbor communications in
the distributed processor system. Neighbor-to-neighbor communication is a means
for one cell to communicate with a restricted number of neighboring cells. These
communications paths are separate from the inter-cell bus. Because the bus is
always needed for global instructions (applied parallelism), the question arises:
should special communication paths be provided to neighboring cells, or should all
communication go via the inter-cell bus?
The advantages of neighbor-to-neighbor communication come from the problems
of using the bus and the fact that certain computations may be placed with a geometric
relationship (matrix and vector manipulations, etc.) that may efficiently utilize
neighbor communications. The inter-cell bus requires control as to which cell gets
to use the bus, how long the cell may use the bus, and how control is to be passed
from one cell to another. A controller cell must have means of controlling transmis-
sion priorities. Even if some of this control can be done by hardware, the fact
remains that there will always be an overhead in software and storage. To address
any word in a group requires 14 bits; thus to send a 16-bit data word requires 14 bits
of storage to hold the destination address. Thus for a word of data, a word of
address is required. Adding words of program to control the transmission will
require additional words of storage.
The time delay in transmitting via the inter-cell bus will depend upon the bus
design and how long it takes to establish control. If the bus is 8 data bits wide, for
example, about 6 clock times are required to transmit a single 16 bit word. When
many words are transmitted in a single message, the clock times per word will
approach 2 (one clock time per 8 bits). A detailed discussion of how communication
is carried out on the bus may be found in Section 10.
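The two figures quoted above are consistent with a simple model in which each message carries a fixed control overhead plus one clock time per 8 bits of data. The 4-clock overhead used below is inferred from those figures (about 6 clocks for one word, approaching 2 per word for long messages) and is an assumption, not a value stated in this section; the function names are hypothetical.

```python
def clocks_for_message(words: int,
                       bus_width_bits: int = 8,
                       word_bits: int = 16,
                       overhead_clocks: int = 4) -> int:
    """Estimate the clock times needed to send a message of `words` data words.

    overhead_clocks is an assumed fixed cost per message.
    """
    clocks_per_word = -(-word_bits // bus_width_bits)  # ceiling division
    return overhead_clocks + words * clocks_per_word


def clocks_per_word(words: int) -> float:
    """Average clock times per word for a message of `words` words."""
    return clocks_for_message(words) / words
```

Under this model a single-word message costs 6 clock times, a 10-word message costs 2.4 clock times per word, and a 100-word message costs 2.04, which shows why short transfers between neighbors are the expensive case on the bus.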
The time delay in transmitting is seen to be a problem, especially when few
words are transmitted. A more serious problem is the delay in obtaining control of
the inter-cell bus. The time delay between the time a cell needs the inter-cell bus
and the time a cell actually gets control of the bus is called the request delay. The
longest delay will occur when all the other cells have transmissions to make and the
requesting cell must wait its turn. The shortest time can be almost zero. Somewhere
between these two times will be the average request time delay. This delay is
variable, unknown, and could be very large. For any program where the timing is
critical, the unknown times could make the programming difficult. In many cases
where data has to be passed to neighboring cells the timing is critical, such as in the
navigation and guidance routines. It should be noted that the more applied parallelism
that is being utilized, the longer the request delay, since this may mean that more
cells are involved in the neighbor communications. If they all had to be serviced over
the inter-cell bus, the average request delay could be very large.
It is of course possible to do without neighbor communications providing one
does not run into a time problem and can complete the computations in the allotted
time. However, as noted above, it would prove very inefficient and difficult to use
the intercell bus where small amounts of data have to be passed between neighbors
as in the use of applied parallelism.
To improve the system, neighbor-to-neighbor communication is needed. The
system implemented must provide the following advantages:
1. Low overhead (no more than one instruction to execute a data transfer)
2. High reliability
3. Known times to implement a transfer
4. Adaptable to reconfiguration
The disadvantages to including neighbor communications are two: (1) additional
connections are required to each cell, and (2) reconfiguration is more difficult. The
number of connections required per cell increases by four; while this is not a large
increase, it does of course provide more failure points. The most serious disadvantage
appears to be that of reconfiguration since now the organization is spatially
oriented. One approach to this problem is to require the programs to be set up
using small independent sets of cells so that one may pack the sets of programs around
a number of failed cells in a group. This problem requires further investigation to
determine how effectively it can be solved.
While a firm answer as to whether or not neighbor communications are required
cannot be given at this time, the advantages in including it appear to outweigh the
disadvantages that may be encountered. In addition, inclusion of the neighbor com-
munications provides an organization with increased capabilities (notably speed) so
that it may be applicable to a wider variety of applications (for example a high speed
video data reduction problem). Therefore, for purposes of this design neighbor
communication will be included.
6.9 ADDITIONAL TOPICS
The concepts described in this section result in a powerful system with many
options on how operations can be performed. Further study may show how features
can be changed or added to improve the system. A number of alternate concepts are
given in this section to show some other ideas that were considered.
The given address type of format modifier, as described above, was used with-
out modification by the receiving cell. An alternate idea, which may aid in the soft-
ware, is to have the cell always add a base register to the given address before it is
used. Because each cell can have its base register set to a different value, the place-
ment of programs and data in the cells may be made easier by using a base register
with all the given addresses.
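The base register alternative described above can be sketched as follows. This is only an illustration: the 512 word memory size is taken from the 9 bit address field discussed in Section 7.2, while the wrap-around on overflow and the cell names are assumptions of the sketch.

```python
MEMORY_WORDS = 512  # cell memory size implied by a 9 bit address


def effective_address(base_register: int, given_address: int) -> int:
    """Add the cell's base register to a given address.

    Wrapping modulo the memory size is an assumption of this sketch.
    """
    return (base_register + given_address) % MEMORY_WORDS


# The same given address lands in a different place in each receiving cell.
bases = {"cell_a": 0, "cell_b": 128, "cell_c": 300}  # hypothetical base values
placements = {cell: effective_address(base, 40) for cell, base in bases.items()}
print(placements)
```

Because each cell relocates the address itself, one transmitted instruction can refer to differently placed data in every cell.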
The use of a special instruction (GC) modifier was selected here because it
used the least amount of time on the inter-cell bus. Another way to modify the
instruction execution is to reserve a bit or so in each instruction (or memory word)
to indicate its length and any special address modifications, and how much data was
attached. The advantage is that each instruction carries its own length code, etc.
The disadvantage is that each instruction must be made longer. Using the GC modi-
fier makes only the modified instructions longer, but they are very much longer than
they would be in the other method. The present decision was to make the unmodified
instructions as short as possible, even though this made the modified ones long. The
net result should be a storage saving, especially in the independent cells.
The two modes of a controller cell, transmit and execute, are only one way of
selecting whether controller cell words are instructions to be executed in the controller
cell or in the dependent global cells.
Another method is to place extra bits on each memory word, telling what the
word is. This method requires many extra memory bits in all cells.
Another method is to use two program counters. One program counter controls
the fetch of instructions to be executed by the controller cell, the other program
counter controls the fetch of instructions to be transmitted to the dependent global
cells. This two program counter idea allows the modes to be discarded. The resulting
system is now much more elegant and powerful. However, the processor now
requires much more hardware.
In addition, another method would be to store in the controller cell, as input/output
data for the inter-cell bus, all the instructions and data that are to be placed
on the bus. This approach is very inefficient since the controller cell must identify
control words by setting the control line in the bus; it has no means of distinguishing
between instructions and data since they are all stored as I/O data. Of course one
approach as noted above would be to have extra bits stored with each word identifying
it as instructions or data with the resultant penalty of increasing the number of bits
used for storage. Another method here would be to store the count of the number of
words between the control words and use these counts to identify the control words.
This approach requires storing a number of counts, additional control hardware and
makes program modification difficult.
The above are only some alternatives in the design of the group architecture.
Many other possibilities undoubtedly exist. The chosen architecture is believed the
most efficient and flexible compromise among many alternatives and possibilities.
However, further investigation into some alternatives could result in modification
and improvements to the present design.
7. INPUT/OUTPUT
7.1 INPUT/OUTPUT OPERATION
This section of the report discusses the operation of the input/output system in
the DAMP Computer. Input/output is handled by the hierarchical structure shown in
Figure 7-1. This figure shows that the interface to the computer consists of serial
and parallel digital lines. The conditioners C1 through CN each have a number of
sensors connected to them. The sensors provide a variety of signals to the condi-
tioners and the conditioners, in turn, accept these signals and provide a standard
digital interface to the computer. Some devices contain their own conditioner circuitry
and are connected directly into the computer; these devices will generally be connected
in a full word parallel format. The bulk storage memory unit will be one of these
devices; other parallel devices may include items such as buffers for video sensor
data, etc.
The I/O structure described above was chosen over a completely centralized
I/O structure which would absorb the conditioners into the computer and have the
sensors interface directly with the computer for the reasons given below:
1. A completely centralized I/O structure is generally used to gain a more
efficient hardware utilization by the consolidation of common signal conditioning
functions. In this computer, reconfiguration is possible around a
number of failures (down to the cell level). Since some of the I/O signals
are connected directly into the cells as will be explained later, reconfiguration
around a number of failures now makes a completely centralized I/O
structure inefficient. This is due to the fact that all the cells would be
required to have the capability for interfacing directly with any of the
sensors. This approach would result in a large amount of redundant hardware
and not provide an overall hardware savings.
2. The conditioner structure is easily able to adapt to a change in sensors,
addition of sensors, or improvements in the sensor design. All that is
necessary is to add a conditioner or replace one that is already there;
whereas in the completely centralized I/O structure there is a need to
redesign the cells and replace the entire computer with new chips.
3. The conditioner I/O structure also provides ease of adapting the computer
system to various vehicles between missions and within missions such as a
command module and a lander module of a Mars Lander Mission. These
vehicles will have significantly different sensors. As a result the conditioner
structure will provide the ability to use exactly the same basic computer
with only the need to change the appropriate conditioners in each
vehicle.
Many of the techniques that will be used in the Mars Lander Mission for handling
guidance and control, status monitoring, and scientific data have been established;
however, there will certainly be many new developments. As a result the sensors to
be used in such a mission are presently not well defined, especially in the area of
scientific experiments. This of course means that the conditioners also cannot be
well defined since their primary task is to generate control sequences, carry out
analog to digital conversion, etc., for the sensors. However, certain general properties
of the programs necessary to operate upon and handle the data from the sensors
can be defined. These properties, typical of a wide range of spacecraft programs,
will be used to obtain a first approximation to the operation of the I/O system.
Figure 7-1. I/O Structure
The method in which the I/O ties into the block called the computer in Figure 7-1
will now be developed. A discussion of the possible methods of handling the I/O
internally will be given first and then further details will be given on the selected
method.
7.1.1 With the Intergroup Bus
Figure 7-2 shows the structure of the computer using an intergroup input/output
scheme. The intergroup bus actually consists of two redundant half word parallel
busses. Groups in the computer organization use these busses for communications
amongst themselves; therefore, the I/O devices attached to the busses will appear
functionally as groups as far as the computer organization is concerned. The conditioners
will transmit serial data to the computer; therefore, a conversion from serial to
parallel is required before the data gets on the intergroup bus. This is accomplished in
the blocks labeled I/O Cell in Figure 7-2. Two of these I/O Cells are shown in
this figure, one connected to each of the intergroup busses; this connection provides the
reconfiguration flexibility required. Conditioners handling data from critical sensors are
duplicated and connected redundantly into each of the I/O Cells. Therefore, if a
failure occurs in the bus, I/O cell, or conditioner, the other conditioner, I/O Cell, and
bus path can be used as a backup.
Figure 7-2. Intergroup Bus I/O Scheme
As noted above, sensors supplying critical data are connected to both I/O
Cells; the non-critical sensors such as experiment data will be connected to only one
of the I/O Cells. Two alternatives are present here: (1) to have all the non-critical
sensors connected to one I/O Cell, or (2) to divide the non-critical sensors between
the two I/O Cells; these two schemes are shown in Figure 7-3. It should be kept in
mind that Figure 7-3 shows only the sensors connected in a serial manner through
conditioners to the I/O cells; there are also the parallel sensors which connect
directly onto the bus. Each of the two schemes results in a different operation of the
I/O system in the computer.
The first scheme where all the non-critical sensors are connected to one I/O
cell results in using Bus 1 primarily and Bus 2 serves only as a backup in case of
failure of Bus 1, I/O Cell 1, etc. This is due to the fact that the only I/O connections
to Bus 2 are redundant connections from the critical sensors. With the second scheme
both busses must be used since the non-critical sensors are divided between the two
busses. The first scheme requires a higher communication rate capability on the bus
when compared to the second scheme. However, note that in case of failure and
reconfiguration the first scheme will be able to offer a full reconfiguration since
either bus can handle the total I/O requirement. The second scheme, besides reducing
the communication rates required on the bus, introduces a new executive method of
handling the I/O. Since the I/O control and data are now divided onto two separate
busses, the executive in charge of I/O has the additional task of scheduling and
interleaving I/O into the groups and avoiding conflicts of simultaneous I/O on the two
busses into one group. Of course the scheme with two busses also offers more flexibility
in handling the I/O from an executive viewpoint since it is possible to handle
periodic high priority I/O on one bus and background type of I/O on the other bus, etc.
Figure 7-3. Two Methods of Intergroup Bus I/O
It should be noted here that the I/O cells are identical to the cells used in the
groups in the distributed processor. The connections to the bus are now directly
made through the cell and not a group switch as is the case in the groups. The serial
connection to the conditioners is mechanized with the neighbor communication line.
It should be noted that it is possible to connect each of the four neighbor lines of a cell
to sets of conditioners; this could be done to increase the communication rate capability
to the conditioners (serial lines) if this became a bottleneck problem.
The cell contains processing and storage ability and can therefore function very
well as an I/O processor since it has general purpose computer capability. Memory
in the cell can be used to store programs for inputting/outputting data; some of these
programs may be permanent and some may be loaded into the I/O cell by the executive
group. The memory may also be used to store data, thereby acting as a buffer device.
7.1.2 With the Intercell Bus
In Figure 7-4 the I/O concept introduced above is extended to the intercell bus.
The I/O scheme is quite similar to that given above and only the differences will be
pointed out here. Only one bus is required in the group for the intercell bus, whereas
two are required for the intergroup bus; therefore, only one I/O cell is shown in the
figure. The sensors connected to the I/O cell may be critical sensors, non-critical
sensors, or a combination of the two. Redundant inputs of critical sensors would be
connected to another group, thereby providing the reconfiguration capability required
if a group failed. The I/O cell is physically the same chip as the other cells in the
group. It is connected on the cell bus just as the other cells are; however, its
neighbor communication lines are not used with neighboring cells but for communication
to the I/O conditioners. The same comments apply with regard to using all four
neighbor lines for I/O conditioner connections.
This approach has a communication advantage over the previous one in that the
intergroup bus now is not tied up with all the I/O data since the data is now fed
directly into the group where it is being used. It also does not require the overall or
intergroup bus executive to be concerned with handling and scheduling the I/O; it is
simply handled by the executive directly in a group. The disadvantage with the
approach just presented is that the I/O cells in Figure 7-4 are somewhat specialized in
comparison to the other cells in the group since the neighbor lines of the I/O cell are
not connected to neighboring cells. This then makes reconfiguration difficult if the
I/O Cell in the group should fail. With this approach it is also difficult to handle data
which may be required by any or all the groups, e.g., data from the bulk storage
device. If it is not possible to connect such devices to more than one group then a
group may be required to transmit data from such devices to other groups via the
inter-group bus. This obviously can place quite a burden on an intercell bus.
The logical extension of the two approaches thus far discussed would be a combination
of the two: connections to the cell bus as presented above and connections to the
group bus similar to that discussed previously. There are many possibilities open
here. One scheme may be to connect noncritical sensors and one of the inputs from
critical sensors directly to groups and the other input from critical sensors and
devices such as the bulk store directly to the intergroup bus. An alternative may be
to connect the critical sensors directly to different groups; there are numerous possibilities
here, each with certain advantages and disadvantages.
Figure 7-4. Intercell Bus I/O
7.1.3 With the Cells Directly and the Communication Busses
Another scheme considered and the one selected is to provide connections
directly to the cells in the groups for I/O and also connections to all busses in the
system. This I/O approach is shown in Figure 7-5.
It should be recalled that with the previous I/O approaches of communicating to
the busses, a cell had to be provided for connecting the serial conditioners into the
system (I/O Cell). This cell was identical to the other cells in the system. However,
it was specialized in that it was not connected in the regular pattern (array) of cells
in the groups. Therefore, since the I/O cell was specialized it could not be replaced
by the other cells in case of failure. The selected approach requires a separate lead
to be brought out from each of the cells in the groups to an I/O connection panel; this
is similar to another neighbor communication lead. Every cell therefore has the
capability for handling I/O. Connections from the busses are brought out to the panel
also. This scheme requires no additional or specialized I/O hardware in the system.
All that is required is to provide an additional connection from each of the cells in the
system (the I/O hardware can principally be thought of as connections here). Since
the I/O is handled by cells directly in the group, reconfiguration around failures in
terms of I/O is relatively straightforward. This is due to the fact that any of the
cells can handle I/O functions. It may be necessary to unplug-plug connectors on
the I/O panel. However, a group will not necessarily then be lost due to the failure
of an I/O cell.
Figure 7-5. Selected I/O Approach
Another possibility along these lines that was given some thought was that of
using one of the four neighbor leads for I/O as shown in Figure 7-6. In this approach
connections are made to half of the neighbor lines and brought out to an I/O panel as
in Figure 7-5. This eliminates the extra connection to each cell as described in
Figure 7-5.
Some of the neighbor lines are now shared between neighboring cells and I/O
conditioners. The problem that arises in doing this is that of avoiding conflicts of
usage on the common line. If only a single line is used (bidirectional channel) for
neighbor communications, then an additional connection must be added to the cells
to serve as a request/acknowledge line for use of the common line between I/O and
neighbors. This is required since a request/acknowledge approach must be used --
otherwise, communications over the common line would be random and meaningless.
It is impossible to use only one line with a request/acknowledge approach, since the
request signal from one source would naturally interfere with any established com-
munications by the other source sharing the line. It should be noted that since it is
now required to add the request line, one might just as well use this extra connection
for a separate I/O line as in Figure 7-5.
If two unidirectional lines are used to mechanize the neighbor communications,
the situation is somewhat different. A request/acknowledge type of approach is still
required since there still would be possibilities of simultaneous usage of the communi-
cation lines (although not as probable as with one line). With a request/acknowledge
approach and unidirectional lines it is possible not to require any additional lines
since the scheme can be mechanized on the two unidirectional neighbor lines. This
scheme would require using one of the lines for notification purposes to inhibit a
request from the non using source. If two requests occurred simultaneously the
proper acknowledge signal can be inhibited.
It is thus seen that if bidirectional lines are used for the neighbor communication
lines, no advantage results in tying the I/O directly into the neighbor lines. However,
if unidirectional lines are used, this approach could save two connections per cell.
Further explanation of the selected approach shown in Figure 7-5 will be given
below. Each cell contains communication lines as shown in Figure 7-7. The I/O
line shown in this figure is similar to an additional neighbor line added on to a cell.
Each of the I/O lines are brought out to the I/O panel as shown in Figure 7-5. This
results in an increase in the number of connections in the system, which, of course,
degrades the reliability of the system. However, the increase in terms of
connections actually used is not as great as the number provided (approximately 80), since
only a small portion of the I/O connections to the cells will be used (approximately 10),
as will be pointed out below. Therefore, this scheme actually results in only a small
number of extra connections in the system.
As shown in Figure 7-5, a number of conditioners, C1 through CN, are connected
to a cell's (Ci) I/O line. This connection hierarchy is the same as described in the
preceding approaches. Connections are brought out from both the intercell busses
and the intergroup bus to the I/O panel. These are parallel connections and the type
of devices that will be connected here are, e.g., the bulk storage unit, special buffers
for scientific experiment data, high rate experiment sensors with large quantities of
data, etc.
Figure 7-6. I/O Using Neighbor Lines
As mentioned previously, only a small number of the cell I/O connections will
be utilized. There are a number of reasons for taking this approach. If one tended
to use many of the cell I/O lines then many cells would be associated with particular
conditioners or sensors. This places a severe restriction on reconfiguration. Con-
sider reconfiguration due to phase changes for example from midcourse cruise to
midcourse velocity correction; if the sensors are closely associated with particular
cells one is now constrained as to where new programs may be placed in the system.
It would be undesirable to have the I/O information coming into many different cells
which may not even need this information and then have it placed on the intercell
bus to cells that require it. It may be necessary to unplug-plug sensors to effect a
reconfiguration in this manner.
Another disadvantage with this approach is that the reconfiguration of the system
around cell failures is difficult. While the probability of a single cell failing may be
quite low, there are many cells in the system (approximately 80). Therefore, by using
I/O connections to many cells the probability of having a failure now associated with
I/O signals is increased. If a cell fails that is being used with an I/O connection and
the conditioner is connected only to this one cell, then this conditioner has to be
unplugged-plugged to effect reconfiguration. The reconfiguration cannot be handled
by software alone.
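The effect of the number of I/O-connected cells on the chance of an I/O-related failure can be illustrated with a short calculation. The sketch assumes independent cell failures with equal probability, and the per-cell probability used is a hypothetical value, not a figure from this study.

```python
def prob_io_failure(cell_failure_prob: float, io_cells: int) -> float:
    """Probability that at least one I/O-connected cell fails.

    Assumes independent failures with the same probability for every cell.
    """
    return 1.0 - (1.0 - cell_failure_prob) ** io_cells


p = 0.01  # hypothetical per-cell failure probability
for n in (10, 80):  # roughly the used and the provided connection counts
    print(n, round(prob_io_failure(p, n), 3))
```

Under these assumptions, connecting conditioners to all of the approximately 80 cells makes an I/O-related failure several times more likely than connecting to approximately 10.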
Figure 7-7. Communications Lines to a Cell
The advantage in using more connections is that more of the I/O can now be
brought into cells that will use the data directly, thereby reducing the amount of I/O
that has to be handled over the busses or the neighbor lines. In order to provide the
reconfiguration flexibility, the approach of limiting the number of I/O connections to
cells was taken.
One or more cells per group will be connected to I/O conditioners as in Fig-
ure 7-5 (typically 2 or 3 cells). The I/O control in the group may be handled in a
number of ways. One approach is to have the I/O handled by both the individual cells
and the cells connected to the conditioners. The individual cells will have small I/O
routines for calling and accepting I/O data from the cells acting as I/O cells. The
I/O cells will contain routines for servicing requests from other cells and for generat-
ing requests to other cells; they may also contain autonomous routines such as for I/O
of periodic sensor data. The I/O cells will also have some memory available for
buffering of I/O data.
Other possibilities include having the controller cell in the group supervise the
I/O programs and/or also contain a good portion of I/O routines within itself. This
may be useful for I/O data intended for more than one cell. It is also possible for
the I/O cell to use its neighbor lines to pass I/O data to its four neighbors, thereby
eliminating the use of the bus for certain I/O data; this could prove useful for pre-
cisely periodic sensor data. The intent here is to present the general concept of the
I/O system; further definition of the requirements and in particular the sensors is
needed before an exact approach is designed.
As mentioned previously, connections are provided to the busses. Devices
connected to the intergroup bus such as the bulk storage unit will effectively function
as groups. The system executive will control the communication (I/O) between such
devices and the groups in the system. Devices connected to the intercell bus will
effectively function as cells; the controller cell in the group may control I/O to such
devices and/or the individual cells may also provide requests for I/O action to such
devices.
One other point should be mentioned here: critical sensors will be connected
to I/O cells in two different groups. This will provide for the required reconfigura-
tion capability if the intercell bus or I/O cell that a critical conditioner is connected
to should fail.
7.2 I/O MECHANIZATION
A preliminary design of the I/O system was completed to get an estimate of
the I/O mechanization requirements. This section presents a summary of this
investigation.
The I/O cells are to carry out their communications (both in and out) with the
conditioners over a single line connected between the I/O cell and all conditioners
that are to communicate with the cell. The I/O cell has control over this line and
operates under control of an internally stored program.
I/O is performed by executing the I/O instruction in the processor section.
This instruction will be presented in Section 9 where the entire instruction set will
be discussed. However, it will be introduced here so that the mechanization of the
I/O operations can be clarified. It will be stated here that the basic instruction
length is 16 bits with 6 bits used for the operation code. The I/O instruction uses a
long instruction format; this means that two 16 bit words are used to form the
instruction (a 32 bit instruction). The contents of the I/O instruction are shown
below:
bits: |  6 |    1     |    9    |      7      |     9      |
      | Op | I/O Code | Address | Cond/Device | Word Count |
The first word of this instruction identifies the operation as either input or output and
also contains the address of the location in the cell's memory where the first word
will either be output from or inputted to. The second word identifies the conditioner
and device that the I/O is to be carried out with. Seven bits are provided here for
identification of the conditioner/devices. This provides the capability of handling up
to 128 devices per I/O cell. If this proves too few devices, one may possibly use one
of the word count bits for additional conditioner/device identification. A preliminary
feeling for the breakdown is to provide for 8 conditioners and 16 devices per I/O cell.
Nine bits are provided for the word count. This results in the capability of inputting/
outputting up to 512 words (full memory) with one instruction.
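The field layout of the I/O instruction can be illustrated by packing and unpacking the two 16 bit words. The sketch below is not the report's mechanization: the left-to-right field order within each word, the 3 + 4 bit split of the conditioner/device field (8 conditioners, 16 devices each), the operation code value, and the convention that a count of 512 is stored as 0 are all assumptions.

```python
from typing import NamedTuple, Tuple


class IOInstruction(NamedTuple):
    op: int           # 6 bit operation code
    io_code: int      # 1 bit: input or output
    address: int      # 9 bit address of the first word in cell memory
    conditioner: int  # 3 bits (assumed split of the 7 bit cond/device field)
    device: int       # 4 bits (assumed split of the 7 bit cond/device field)
    word_count: int   # 1..512 words


def encode(instr: IOInstruction) -> Tuple[int, int]:
    """Pack the instruction into two 16 bit words, fields left to right (assumed)."""
    if not (0 <= instr.op < 64 and instr.io_code in (0, 1) and 0 <= instr.address < 512):
        raise ValueError("first-word field out of range")
    if not (0 <= instr.conditioner < 8 and 0 <= instr.device < 16):
        raise ValueError("conditioner/device out of range")
    if not 1 <= instr.word_count <= 512:
        raise ValueError("word count must be 1..512")
    word1 = (instr.op << 10) | (instr.io_code << 9) | instr.address
    cond_dev = (instr.conditioner << 4) | instr.device
    # Assumption: a count of 512 is stored as 0 in the 9 bit field.
    word2 = (cond_dev << 9) | (instr.word_count % 512)
    return word1, word2


def decode(word1: int, word2: int) -> IOInstruction:
    """Recover the fields from the two instruction words."""
    cond_dev = (word2 >> 9) & 0x7F
    count = word2 & 0x1FF
    return IOInstruction(
        op=(word1 >> 10) & 0x3F,
        io_code=(word1 >> 9) & 0x1,
        address=word1 & 0x1FF,
        conditioner=cond_dev >> 4,
        device=cond_dev & 0xF,
        word_count=count or 512,
    )


IO_OPCODE = 0b101010  # hypothetical operation code value
instr = IOInstruction(IO_OPCODE, 1, 0x100, conditioner=5, device=9, word_count=512)
w1, w2 = encode(instr)
print(f"{w1:016b} {w2:016b}")
```

Borrowing one word count bit for device identification, as the text suggests, would change only the shift and mask constants in the second word.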
The sequence of events occurring in executing the I/O instruction will now be
explained. A combination of hardware and software will be used and the relative
usage of each to accomplish the instruction may be varied; the description given is
for preliminary design purposes only.
The following sequence of operations is carried out (by hardware) to execute the I/O
instruction:
- Place Address in a certain index/bank register
- Place proper bits in buffer register
- Shift out buffer register over I/O line
- Place word count in a certain location in memory
- Transfer to the input or output (depending on rd/write bit) software routine
The address of the input/output routine could be hardwired or previously setup
in a certain index/bank register. As noted above hardware and/or software may be
used to execute the I/O operation described above. A preliminary feeling is that the
above may be carried out most efficiently by hardware.
The buffer register contains the following information in it when it is to be
shifted out:
bits: |  1   |      1       |      7      |   8   |    1     |
      | Sync | Control/Data | Cond/Device | Spare | Rd/Write |
The buffer register is seen to be 18 bits in length. This is due to the fact that two
control bits must be added to utilize only one line in the I/O mechanization; these
are the sync and control/data bits. The sync bit is always a one and is used to
synchronize the conditioners. The control/data bit identifies the following 16 bits
as either a control word or a data word. The control word shown for the conditioners
uses 7 bits for the conditioner/device identification and one bit for identification of
the operation as input or output; 8 bits are not used (actually the
word count may be placed in here if this is deemed useful to the conditioners).
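A minimal sketch of how the 18 bit buffer register contents might be formed and shifted out is given below. The bit polarities (control/data = 1 for a control word, rd/write = 1 for write) and the sync-bit-first shift order are assumptions of the sketch; the text fixes only the field widths and that the sync bit is a one.

```python
SYNC = 1  # the sync bit is always a one


def control_frame(cond_device: int, write: bool) -> int:
    """Build the 18 bit control frame: sync, control/data, cond/device, spare, rd/write."""
    if not 0 <= cond_device < 128:
        raise ValueError("cond/device is a 7 bit field")
    # 16 bit control word: 7 bit cond/device, 8 spare bits left as zero, rd/write.
    control_word = (cond_device << 9) | int(write)
    return (SYNC << 17) | (1 << 16) | control_word  # control/data = 1 (assumed polarity)


def data_frame(word: int) -> int:
    """Build an 18 bit data frame around a 16 bit data word."""
    if not 0 <= word < (1 << 16):
        raise ValueError("data word must fit in 16 bits")
    return (SYNC << 17) | word  # control/data = 0 marks data (assumed polarity)


def shift_out(frame: int) -> list:
    """Serialize a frame onto the single I/O line, sync bit first (assumed order)."""
    return [(frame >> i) & 1 for i in range(17, -1, -1)]


bits = shift_out(control_frame(cond_device=0b0101001, write=True))
print("".join(map(str, bits)))
```

If the word count were placed in the spare field, it would occupy the eight zero bits between the cond/device field and the rd/write bit.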
The conditioners are clocked with the I/O cell and utilize the sync bit to set a
counter. Upon counting to 18 the conditioners reset and are ready for the next trans-
mitted word. Upon detection of a word identified as a control word, each conditioner
will examine the conditioner/device bits to determine if the control word is for this
particular conditioner. If it is then this conditioner will lock on to use the I/O line
for receiving or transmitting the data that follows.
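The conditioner behavior described above (a counter set by the sync bit, a reset at 18, and a lock onto the line when the control word matches) can be sketched as a small receiver model. It covers only the case where the I/O cell outputs data to the conditioner, and it reuses the assumed conventions of a 3 bit conditioner number in the top of the cond/device field and control/data = 1 for a control word; the class and attribute names are illustrative.

```python
class Conditioner:
    """Receiver-side sketch: a bit counter started by the sync bit, reset at 18."""

    def __init__(self, conditioner_id: int):
        self.conditioner_id = conditioner_id  # 3 bit id (assumed 3 + 4 split)
        self.count = 0
        self.shift = 0
        self.locked = False  # True while this conditioner owns the I/O line
        self.received = []   # data words accepted while locked

    def clock(self, bit: int) -> None:
        """Accept one bit from the I/O line on each clock time."""
        if self.count == 0:
            if bit != 1:
                return       # idle line: keep waiting for a sync bit
            self.shift = 0
        self.shift = (self.shift << 1) | bit
        self.count += 1
        if self.count == 18:  # full frame received: reset for the next word
            self.count = 0
            self._handle(self.shift)

    def _handle(self, frame: int) -> None:
        is_control = (frame >> 16) & 1  # assumed polarity: 1 = control word
        word = frame & 0xFFFF
        if is_control:
            self.locked = (word >> 13) == self.conditioner_id
        elif self.locked:
            self.received.append(word)


def frame_bits(frame: int) -> list:
    return [(frame >> i) & 1 for i in range(17, -1, -1)]


control = (1 << 17) | (1 << 16) | (0b0100011 << 9)  # conditioner 2, device 3
data = (1 << 17) | 0x1234
c2, c5 = Conditioner(2), Conditioner(5)
for bit in [0, 0] + frame_bits(control) + frame_bits(data):
    c2.clock(bit)
    c5.clock(bit)
print(c2.locked, c2.received, c5.locked, c5.received)
```

Every conditioner on the line sees every frame; only the addressed one locks on and takes the data words that follow.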
A preliminary description of the output routine will be given below. This routine
will be described as a software approach. It should be kept in mind that hardware may
be substituted to achieve a faster execution if this is found to be necessary.
Output routine:
- Load Accum. with first word (address specified by previously set up index/
bank reg.)
- Transfer Accum. to Buffer Register (with this the buffer is automatically
started shifting out)
- Load Accum. with word count
- Subtract 1
- Test and transfer on 0
  - If not 0: Modify index/bank reg. for 1st instruction, then jump to some
    location back in program
  - If = 0: Jump to master I/O routine
To get back to the output routine (when the buffer register is shifted out), an interrupt
is sent from a counter associated with the buffer register; this transfers the program
to the output routine (location of this routine could be a hardwired location or previous-
ly loaded in an index/bank register).
Input Routine:
- Transfer Buffer to Accumulator
- Store Accum. in first word (location set up by index/bank reg.)
- Load Accum. with word count
- Subtract 1
- Test and transfer on 0
  - If not 0: Modify index/bank register used in 2nd instr., then jump to some
    location back in program
  - If = 0: Jump to master I/O routine
To get back to the input routine (when the buffer register is full again), an interrupt
is sent from the same counter associated with the buffer register (the I/O instruction
sets a mode flip flop to determine if the input or output routine is to be entered).
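The interrupt-driven input and output routines can be sketched together, with the mode flip flop selecting which routine each counter interrupt enters. This is a simplified model, not the preliminary design itself: the class and method names are illustrative, storing the decremented word count back to memory is an assumption (the routine listings do not show it), and the first entry to the output routine, which in the text comes from the I/O instruction, is modeled as an ordinary interrupt call.

```python
class IOCell:
    """Sketch of the interrupt-driven I/O routines; names are illustrative."""

    def __init__(self, memory, address, word_count, mode):
        self.memory = memory
        self.index = address          # index/bank register loaded by the I/O instruction
        self.word_count = word_count  # word count held in a memory location
        self.mode = mode              # mode flip flop: "in" or "out"
        self.buffer = 0               # buffer register (16 data bits shown)
        self.done = False             # set on the jump to the master I/O routine

    def _count_down(self):
        accum = self.word_count - 1   # load Accum. with word count, subtract 1
        self.word_count = accum       # storing the count back is an assumption
        if accum == 0:
            self.done = True          # jump to master I/O routine
        else:
            self.index += 1           # modify index/bank register for the next word
            # control then returns to the interrupted program

    def output_routine(self):
        self.buffer = self.memory[self.index]  # load Accum., transfer to buffer
        self._count_down()
        return self.buffer                     # word now shifting out on the I/O line

    def input_routine(self):
        self.memory[self.index] = self.buffer  # buffer to Accum., store in memory
        self._count_down()

    def interrupt(self, word_from_line=None):
        """Counter interrupt: buffer shifted out (output) or full again (input)."""
        if self.mode == "out":
            return self.output_routine()
        self.buffer = word_from_line
        self.input_routine()
        return None


memory = [0] * 16
memory[4:7] = [11, 22, 33]
out_cell = IOCell(memory, address=4, word_count=3, mode="out")
sent = []
while not out_cell.done:
    sent.append(out_cell.interrupt())

in_cell = IOCell(memory, address=10, word_count=3, mode="in")
for word in sent:
    in_cell.interrupt(word)
print(sent, memory[10:13])
```

Between interrupts the cell is free to run other work, which is the interleaving question the text leaves for further study.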
The above description results in an I/O system completely under control of the
I/O cell. The conditioners cannot interrupt or request I/O operation independently
from the I/O cell. To facilitate requests from devices for I/O operations (for
example a request from the astronaut's panel or a buffer holding experimental data),
it is possible to insert in each (or possibly only several) conditioners a request
register that may be sampled periodically by an input operation by the I/O cell.
Then the I/O cell may decide how to handle the requests, if any.
Another possibility is to add a separate request line as shown in Figure 7-8.
This line passes serially through the conditioners. Periodically, the I/O cell sends
out a request pulse. If no requests are present in the conditioner, the connection
will be successively completed to the last conditioner where the request line will be
grounded. This, then, signifies no request is present. The first conditioner with a
request that receives the pulse will not complete the circuit. The I/O cell recogniz-
ing this knows a request is present but not from which conditioner. The conditioner
with the request that received the pulse will now send a control word to the I/O cell
indicating what I/O action is desired with the proper identification.
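The behavior of the serial request line may be sketched as follows. This is a hedged illustration, not part of the design: the `Conditioner` class, its fields, and the form of the control word are assumptions, and the request is assumed to be cleared once the control word has been sent.

```python
# Illustrative sketch; the Conditioner class and its fields are hypothetical.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conditioner:
    ident: int
    request: Optional[str] = None   # pending I/O action, or None


def send_request_pulse(chain: List[Conditioner]) -> Optional[dict]:
    """Pass a pulse down the chain; return the control word of the first requester.

    Returns None when the pulse reaches the grounded end of the line,
    which signifies that no request is present.
    """
    for cond in chain:
        if cond.request is not None:
            # This conditioner does not complete the circuit; it answers
            # with a control word carrying its identification.
            word = {"conditioner": cond.ident, "action": cond.request}
            cond.request = None     # assumption: request cleared once reported
            return word
    return None
```

Note that the position of a conditioner along the line fixes its priority: a conditioner nearer the I/O cell is always served first.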
The advantage with this scheme is that the conditioners may be sampled very
quickly and easily (minimum software in the I/O cell) by the I/O cell to handle
requests by the devices for I/O operation. Its disadvantage is that it costs an extra
connection. Note however that if a failure occurred in this connection one could
always revert back to the original approach without this scheme.
For purposes of this design the request line scheme was not incorporated.
Further definition of the requirements is necessary if it is to be justified. It was
seen that software routines were proposed for actually inputting and outputting data.
Further study is required to determine if some or all of the functions of these
routines should be handled by hardware. This depends on what operations can
actually be interleaved with the I/O operation.
Figure 7-8. Request I/O System (I/O cell connected to the conditioners by the I/O line and the request line)
8. FAILURE DETECTION AND RECONFIGURATION
The ability to detect failures and reconfigure the distributed processor around
failures is required in order to meet the probability of mission success and availability
goals. This section of the report presents the design of this capability.
Some of the basic guidelines forming a framework for this section are given
below. The time from the occurrence of an error to the time the system is reconfig-
ured and functioning properly should not exceed 0.1 seconds for critical mission
phases. Similarly, the time from the occurrence of an error to the time it is detected
should not exceed 0.1 seconds for critical maneuver phases. This time applies to the
critical mission phases and can be much longer for the non-critical mission phases
such as transplanetary coast. Crew participation can be considered for the following
functions: (a) reconfiguration during non-critical mission phases, (b) turn-on and
requests for checkout of idle standby equipment, and (c) replacement of failed equipment
with a spare, verification of the repair, and insertion of the equipment back
into the system.
8.1 FAILURE DETECTION METHODS
There are many approaches to detecting the failures that may occur in the system.
This section will discuss the approaches considered and select the most promising one.
Two important guidelines should be stated before proceeding with the discussion: (a) it
is desired to detect intermittent type of errors and (b) it is desired to detect and iso-
late failures to the cell level.
There are two basic approaches that may be considered: hardware and software.
The selected approach consists of a combination of hardware and software methods.
Hardware methods will be used where a large percentage of errors are detected for
a reasonable increase in hardware complexity. Software methods will be used primarily
where it is difficult or inefficient to implement hardware detection and also to supple-
ment hardware detection where a small amount of software checks a large percentage
of hardware. It should be noted that the desire to detect intermittent errors results in
relying more heavily on hardware methods. Intermittent errors may or may not be
detected by software methods when they occur. As time increases from the occurrence
of an intermittent error, the probability that it will be detected increases. However,
this probability may not increase fast enough to guarantee detection and reconfiguration
during a critical mission phase. A discussion of the selected approach and others con-
sidered will be given below.
The first approach to be considered involves a "floating test-cell" concept. In
this approach, one cell in each group contains a test program. In operation, all other
cells perform the operational problem while this cell tests itself. After a prescribed
number of program steps, the testing task of the successfully tested cell is exchanged
with the operating task of another cell within the group; this cell then tests itself during
the next program sequence. Thus, the testing is multiplexed, with all cells eventually
executing the self-test. A drawback of this method is the storage limitation of an indi-
vidual cell. It is questionable whether a very comprehensive self check program can
be placed in one cell. However, it should be possible to use a neighbor cell as additional
storage to hold the program if this becomes a problem. The most serious drawback,
however, is that the geometric properties of the array of cells are disrupted.
Neighbor communications assignments would change as the operational programs
move and secondary connections would be required for I/O signals since I/O programs
would have to be reassigned as cells performing I/O are tested.
A second approach uses active redundant cells in a multiplexed manner. Refer-
ring to Figure 8-1, a cell group containing 20 operational cells and 5 test cells is
depicted. The operational data flow paths between cells are not shown, only those
for testing are shown. Cell T1 is responsible for testing cells C1, C4, C5, and C17;
T2 for testing cells C2, C6, C7, and C11; etc. Periodically, during normal operation,
four test time slots are reserved so that each test cell may check its adjacent cells.
All test cells act in parallel, testing one operational cell at each test time slot. The
test action at any test time slot consists of checking the results of the previous test,
and if OK, setting up the test of the next cell by loading its contents into the test cell.
Upon resumption of the operational program, the test cell performs the same opera-
tional calculations as the cell under test. As an example, assume that T1 is now
checking C5 and is to test C1 next. Operationally T1 is performing the same problem
as C5. At testing time the results generated by T1 are compared with those gener-
ated by C5. This may readily be performed by global control of the testing cells.
If disagreement exists the error is reported to the controller cell. If no disagreement
occurs, T1 is loaded with the contents of C1 and will redundantly perform C1's
calculations during the next operational cycle.
Some of the pitfalls of this approach are as follows: First, there is a loss of
flexibility of operational communication paths between cells since one of the four
paths is used only for testing. Second, the ability to provide the test cell the same
inputs as the cell under test during the active redundant test phase may pose a severe
programming constraint. Third, all the neighbor communication paths are not
checked. Fourth, errors in circuitry peculiar to the execution of a particular instruc-
tion are detected only if that instruction is being executed while the cell is tested.
Fifth, the symmetry and efficiency of test cell utilization is geometry dependent. For
example, in Figure 8-1 each test cell uses all four of its neighbor communication
paths and 5 test cells test 20 operational cells. In a 4 x 4 test cell matrix the symme-
tric approach is to have 4 test cells checking 12 operational cells with each test cell
checking 3 operational cells. Finally, the controller cell would be required to con-
tain a program for global control of the test cells and to move data between the
operational cell tested and the test cell for disagreement detection.
A third approach reduces the amount of redundant hardware by employing
time-redundancy. By this method a sequence of program steps is performed by the
group, the tasks of the cells within the group are interchanged, the program steps
are repeated, and the results of the two executions are compared. The obvious dis-
advantage of this approach is the reduction in operating speed by a factor greater than
two. It also will require some overhead in storage of the controller cell to globally
control the interchange of programs, the interchange of redundantly computed data
and the programs to detect disagreement.
A fourth approach uses a central self test program that will be executed period-
ically by the controller cell and sent out to the other cells in the group for simultaneous
execution. This program may be fairly lengthy for a comprehensive check and may be
contained in a neighboring cell due to storage limitations in the controller cell. The
test routine would be sent from the controller cell; it would be executed by all the cells
in a global manner and also by the controller cell itself.
Figure 8-1. Active Redundant Test Cells Within a Cell Group
The cells of the group would execute a test problem in the following manner.
Each cell would perform the same instruction at the same time, transmit computed
results to each of its four neighbors, check the data received from each of its four
neighbors against its own computed result, and report to the controller cell over the
inter-cell bus. Assuming that only one failure will occur at any one time, a cell's
failure will generally result in each of its four neighbors identifying it as a failed
cell and possibly it will identify all of its neighbors as having failed. The controller
cell will decode the failure reports and perform further tests if necessary to isolate
the failed cell.
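The neighbor comparison and the decoding performed by the controller cell can be illustrated with a short Python sketch. The rectangular array without wrap-around connections, the function names, and the decoding rule (a cell is suspected when every one of its neighbors reports disagreement with it) are assumptions made for illustration; the report does not specify the decoding logic.

```python
# Illustrative sketch under a single-failure assumption; all names are hypothetical.

def neighbors(cell, rows, cols):
    """Up to four neighbors of a cell in a rows x cols array (no wrap-around)."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols]


def run_test(results, rows, cols):
    """Each cell compares its own computed result against each neighbor's."""
    reports = {}
    for cell, value in results.items():
        reports[cell] = [n for n in neighbors(cell, rows, cols)
                         if results[n] != value]
    return reports


def isolate(reports, rows, cols):
    """Controller-side decoding: cells reported as failed by all their neighbors."""
    suspects = []
    for cell in reports:
        nbrs = neighbors(cell, rows, cols)
        if nbrs and all(cell in reports[n] for n in nbrs):
            suspects.append(cell)
    return suspects
```

With one bad cell, that cell disagrees with all of its neighbors, while each good neighbor is accused only by the bad cell, so the rule singles out the bad cell. It does not cover a failed cell that sends different values to different neighbors; that case would fall under the "further tests" mentioned above.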
Finally, the last approach is to incorporate fault detection hardware into each
cell. No special testing mode would be required. Checks would be performed continually,
along with the operational problem. Parity checking, special error detecting
codes, and feed-back checks on decoders are typical of the type of fault detection hardware
that would be incorporated. The disadvantage with this approach is the amount
of hardware that must be incorporated to achieve a high probability of detection.
It should be pointed out here that the approach of placing self test programs
that would permanently reside in each cell was not considered. This is due to the
relatively small amount of storage available in each cell.
The selected approach is a combination of the last two approaches discussed
above. Hardware detection methods would be used where they are fairly efficient
in terms of the amount of hardware checked compared to the amount required for
checking. A failure detected by hardware would set an error flip-flop which is
sampled periodically by the controller cell. Software detection methods would be
used to supplement the hardware methods to achieve a very thorough test and a high
probability of error detection. The self test programs are sent out from the control-
ler cell as noted above. All of the other cells in the group are in the dependent
global state when these programs are executed. In general a neighbor cell will detect
a faulty cell since neighbor responses are checked as part of the program. Note
that this also checks out the neighbor communication paths. Part of the program
may be to see if the error FF has been set. If it has, then this cell will not trans-
mit a certain response to its neighboring cell. Detection of an error will cause a
cell to change its global level. The controller cell can test to see if any cells have
changed level at the end of the self test program. If none have changed the group
resumes its normal operational function. If an error has occurred, the controller
cell may simply ask the suspected faulty cell for the contents of its accumulator which
can contain a predetermined value only upon successfully passing the self test program.
It should be noted that this approach will also detect errors in the controller cell
since it can also execute the same instructions. However, the controller cell does not
have another "controller cell" to report to. To resolve this problem, the approach
taken was to require the controller cell to report to the group switches.
The group switches will contain BITE timing circuitry. This timing circuitry
simply consists of an out of tolerance voltage detector. The setting and resetting of
a flip-flop at a certain rate will produce a voltage of a certain value. If the voltage
is too high or too low this will cause an error signal to be generated. The timing
circuitry flip-flop can only be set by a predetermined fixed code word. This is the
code word that the controller cell will contain and send to the group switches only after
it successfully completes the self test program (note that if the hardware error
detection FF is set the software self test program will not be completed).
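The function of the timing circuitry resembles what would now be called a watchdog. The sketch below is a digital stand-in for the analog voltage detector described above, given only to illustrate the logic; the class name, the code word value, and the period and tolerance parameters are all assumptions.

```python
# Digital stand-in for the out-of-tolerance voltage detector; values are hypothetical.

class TimingMonitor:
    def __init__(self, code_word, period, tolerance, start=0):
        self.code_word = code_word   # fixed word sent only after a passed self test
        self.period = period         # expected time between code words
        self.tolerance = tolerance   # allowed deviation from the period
        self.last_set = start
        self.error = False

    def receive(self, word, now):
        """Called when the controller cell sends a word to the group switch."""
        if word != self.code_word:
            return                   # a wrong word cannot set the flip-flop
        interval = now - self.last_set
        if abs(interval - self.period) > self.tolerance:
            self.error = True        # rate too high or too low
        self.last_set = now

    def check(self, now):
        """Sampled by the switch logic; also flags a report that never arrived."""
        if now - self.last_set > self.period + self.tolerance:
            self.error = True
        return self.error
```

A controller cell that stops reporting, reports the wrong word, or reports at the wrong rate all lead to the same error indication in this model.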
The timing circuitry has been included primarily for the purpose of reconfigura-
tion during critical mission phases as will be explained later. The executive group
can also check on another group's status by interrogating the controller cell response
register in the group switches. This approach was selected over that requiring the
executive group to interrogate the controller cell directly. While the latter approach
may save some hardware in the group switches, it makes reconfiguration more com-
plex. It should be noted that the controller cell response word is used to indicate
failures of the controller cell only. Failures in the working cells of the group are of
primary concern to the controller cell in that group. However, periodically the exec-
utive group may interrogate the controller cells of the other groups to determine
what is the status of the individual group resources.
Present organizational concepts call for using two inter-group busses and two
group switches per group to reduce the possibility of single failures bringing down
the entire system. This results in having the controller cell response word reported
to two group switches. The executive group can then check the group switches to
determine if a failure is in the group switch or the controller cell. Conflicting reports
from the group switches would indicate a failure in one of the group switches.
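The way the executive group might interpret the two reports can be summarized in a small Python sketch. The function name and the returned strings are hypothetical, and reading two agreeing failure reports as a controller cell failure is an inference from the text rather than a stated rule.

```python
# Illustrative decision table; names and wording of the results are assumptions.

def diagnose(switch_a_ok: bool, switch_b_ok: bool) -> str:
    """Interpret the controller cell response as seen by the two group switches."""
    if switch_a_ok and switch_b_ok:
        return "group healthy"
    if not switch_a_ok and not switch_b_ok:
        # Both switches agree that the response word is missing or wrong.
        return "controller cell failure suspected"
    # Conflicting reports point at one of the group switches.
    return "group switch failure suspected"
```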
The executive group contains the highest level executive in the system. In
all likelihood a backup minimal executive will be contained in another group.
This backup executive can monitor the executive group for failures just as the exec-
utive group monitors other groups.
The group switches will contain logic to shut off power to the cells in their
group if a failure is indicated by the controller cell response word. The reason
for including this function is to prevent a group from malfunctioning in a mode that
will tie up or bring down an inter-group bus. An example of this situation is where
the executive group fails in a mode such that it continually outputs erroneous infor-
mation over the inter-group bus.
8.2 RECONFIGURATION
This section discusses the task of reconfiguring the distributed processor after
a failure has been detected and reported to the space crew. Inherently this system
results in a high probability of mission success and availability because an individual
group can continue to operate in the presence of a failed cell(s), and because the
system can continue to operate in the presence of a failed group(s). There exists
a status zone between full available computing power and mission failure for which
additional failures may result only in the elimination of lowest priority computing tasks
and not in mission failure. This zone can be termed one of degraded performance.
The mission phases were presented in Section 2 of this report and for purposes
of reconfiguration analysis may be classified as consisting of three basic types:
(1) Non-Critical, (2) Critical, and (3) Mars Orbital. The organizational configurations
for each phase are listed below:
1. Non-Critical:  3 groups on, 18 cells on per group
2. Critical:      4 groups on, 16 cells on per group
3. Mars Orbital:  4 groups on, 18 cells on per group.
These configurations are based on the requirements in Section 2 plus an esti-
mated 30% for overhead functions. The reconfiguration plans will be based on the
phase in which the failure occurs.
8.2.1 Non-Critical Phase
Reconfiguration for this phase as well as the other two phases will depend on
the type of failure. The types of failures may be classified as follows:
1. Working cell fails
a. In a manner not affecting other cells
b. In a manner affecting the other cells, either bringing down the
inter-cell bus or contaminating information in other cells
2. Controller cell fails
a. In a manner not affecting other cells
b. In a manner affecting other cells
3. Group switch fails
a. In a manner affecting only internal operation
b. Bringing down one of the two busses connected to it
4. Executive Group fails
Either by the controller cell in this group failing; by a working cell
failing, and affecting other cells; or by a group switch failing and bringing
down the intercell bus of this group.
5. Conditioner/Sensor fails
The configuration during this phase is to have three groups turned on. The
computational functions are distributed among the groups. One of the groups will
contain the overall system executive.
Working Cell Failures
Reconfiguration around a working cell failure will be considered first. Assum-
ing the failure has been detected and does not affect another cell, the controller cell
will be informed of the failure and will go into its reconfiguration executive routine.
It will scan its cell assignment and status tables to determine what the failed cell
was assigned to do.
The controller cell does not know at this point whether the failed cell could
have contaminated data in other cells of the group. This could happen if the cell
failed and transmitted data to another cell. Therefore, the controller cell will
assume that any data transmissions that took place from the failed cell since its last
self check routine resulted in the transmission of erroneous data. As part of the
reconfiguration routine the controller cell will determine which cells the failed cell
has communicated to and inform these cells to ignore these data or any computed
results based on these data (if this is not possible to mechanize, it may be required
to reinitialize the entire group).
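One way the controller cell could determine which cells to inform is sketched below in Python. The transmission log, its (time, source, destination) record format, and the function name are assumptions; the report does not say how transmissions are recorded. Following results onward to further cells is one reading of "any computed results based on these data".

```python
# Illustrative sketch; the log format and function name are hypothetical.

def contaminated_cells(log, failed, last_check_time):
    """Return cells that may hold data derived from the failed cell.

    log is a list of (time, source, destination) transmissions. Any
    transmission from the failed cell after its last self check is assumed
    erroneous, and anything a tainted cell sends afterwards is tainted too.
    """
    tainted = {failed}
    for t, src, dst in sorted(log):          # process in time order
        if t > last_check_time and src in tainted:
            tainted.add(dst)
    tainted.discard(failed)
    return tainted
```

If the resulting set covers most of the group, reinitializing the entire group, as noted above, may be the simpler course.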
Continuing through the reconfiguration routine the controller cell determines if
there are any spare cells available in this group. If there are any spare cells available,
the controller cell will enter its full capability assignment routine. This will
involve assigning the failed cell's program (reloaded from the bulk storage unit) to
either a spare cell or moving cell assignments around to maintain certain geometri-
cal relationships between assignments (this will occur when neighbor communication
capabilities are being used).
If there are no spare cells left in this group the controller cell can take one of
two courses of action: (1) it can ask the executive group if this group's functions or
part of the functions can be reassigned to another group or (2) it can reduce or elimi-
nate some of the functions being performed in this group according to some degrada-
tion procedure. Obviously the executive group must maintain tables of active/spare
cells in the various groups and also tables of a certain level of functions assigned
within the groups with degradation procedures. It should be noted that degradation
may take place in two forms: (1) lowest priority functions may be completely elimi-
nated from the computations or (2) the computational rate may be reduced on some
functions, i.e., the programs may be loaded in periodically from the bulk storage
unit and essentially time share storage in the computer with other low priority
programs.
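The choices open to the controller cell can be laid out as a small decision routine. The Python sketch below is illustrative only: the function name, the task list format, and in particular the order in which the two no-spare options are tried are assumptions, since the report leaves that choice open.

```python
# Illustrative sketch; names and the order of preference are assumptions.

def reconfigure(failed_task, spare_cells, executive_can_reassign, tasks):
    """Pick a course of action after a working cell failure.

    tasks is a list of (name, priority) pairs for the group, where a larger
    number means a higher priority.
    """
    if spare_cells:
        # Full capability assignment: reload the program into a spare cell.
        return ("assign_to_spare", spare_cells[0])
    if executive_can_reassign:
        # Course (1): ask the executive group to move the function elsewhere.
        return ("reassign_to_other_group", failed_task)
    # Course (2): degrade by eliminating the lowest priority function.
    lowest = min(tasks, key=lambda t: t[1])
    return ("drop_lowest_priority", lowest[0])
```

The second form of degradation, reducing the computational rate by time sharing storage, would be a further branch and is not modeled here.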
The reconfiguration executive programs are discussed in the software section
on the distributed processor executive routines (section 11).
A working cell failure that affects other cells in the group will be discussed
next. If it fails in a manner bringing down the inter-cell bus, the controller cell will
not be able to respond to the group switch with its status word. The group switch
will then indicate a failure in this group. The executive group will attempt to isolate
the failure following the procedure outlined below for a controller cell failure. This
type of failure will cause the entire group to be rendered useless (unless the bus is
not affected when power is turned off to the individual cell).
If the working cell contaminates information in the controller cell then the con-
troller cell will indicate a failure when it enters the self test mode and the group
switch will indicate a failure in this group. How this failure is treated will be
discussed below under controller cell failures.
Controller Cell Failures
Failures of the controller cell whether or not they affect other cells will be
treated in the same manner. The group switches will not receive the proper response
word and this will be detected by the executive group when it samples the group
switches. The executive upon detecting the failure has no further information as to
what the specific failure in the group is. It must assume that the entire group is
down. An example of an entire group being brought down is a failure in the controller
cell that results in erroneous global commands to all the cells in the group. As a result
the cells can be in totally false states or levels.
The executive group therefore has to assume that it has no capability to access
the cells in the group. Communications with the failed group will be initiated by an
external source, similar to that used to initially start up each group. Upon detection
of the failure, a light on the display panel will be lit indicating which group has
failed. A new cell in the group will then be initialized as the controller cell. This is
accomplished by initializing the cell to the controller state via this cell's I/O line
following the normal start up procedures. It should be noted that this may require
the astronaut to plug in a new connection to the computer I/O panel. This can be circumvented
if an alternate connection is initially provided that would be used whenever
a group failure is indicated. In any case a new cell is attempted to be started up as
a controller cell by having the executive group load in a self test routine to be exercised
by this cell. This routine could be contained in the executive group or called
in from the bulk storage unit.
If the new controller cell is functioning properly it will issue the proper response
word to the group switches as is required in normal operation. The executive group
will check the group switches for the proper response word. If the cell is functioning
properly thus far, the executive group may attempt to communicate with the new
controller cell to further check out the cell. When the cell is finally determined to be
functioning properly, the executive group will command the controller cell to load up
and check out the remaining cells in the group. It will also give the controller cell a
list of cells that were known to have failed prior to the group failure so that the controller
cell can avoid testing cells already known to have failed.
The controller cell will then begin turning on and naming cells via the neighbor
lines. It will check out one cell at a time until all cells, exclusive of those listed by
the executive group, have been tested and their status determined. After a fixed
period of time the executive group will interrogate the controller cell to determine
what the failure was. The executive group working with the controller cell will now
reconfigure the group. The reconfiguration routine will consist of one similar to that
described for the working cell failure where the programs are attempted to be reinstated
in the group and secondly in other groups if they cannot be efficiently mechanized
in this group.
It may be possible that the failed cell that caused the group to indicate a failure
could have transmitted erroneous data to another group. The procedure that will be
followed is exactly identical to that followed by the controller cell when a working
cell fails. Namely the executive group will inform other groups to ignore data
received from the failed group or computed results based on such data since the time
the last self check was run by the group that failed (the executive group will always
interrogate the controller cell in a group as to the time of its last self check prior to
allowing it to transmit data on the inter-group bus).
It should be noted that the failed cell may bring down the bus when it is turned
on. Should this occur the controller cell will turn it off via the neighbor communication
lines. This should be accomplished in time for the controller cell to continue
its reports to the group switch.
If the initial cell identified as the new controller cell fails to respond properly
to the group switches, another cell will be attempted to be initialized as the controller
cell. This procedure would be followed until some number of cells in the group have
been attempted to be started up by the executive group. If it is not possible to get the
cells started, the executive group will declare the group as having completely failed.
This situation can exist if a cell fails in a mode permanently bringing down the bus.
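The retry procedure described above can be sketched in Python as follows. The function and parameter names are hypothetical; `responds_properly` stands in for the whole sequence of initializing a cell, loading the self test routine, and checking the group switches for the proper response word. Skipping cells already known to have failed is an assumption carried over from the check-out list mentioned earlier.

```python
# Illustrative sketch; names and the attempt limit are assumptions.

def restart_group(candidates, known_failed, responds_properly, max_attempts):
    """Try candidate cells as the new controller cell.

    Returns the first cell that responds properly, or None if the limit is
    reached, in which case the group is declared completely failed.
    """
    attempts = 0
    for cell in candidates:
        if cell in known_failed:
            continue                 # do not spend an attempt on a known bad cell
        if attempts == max_attempts:
            break
        attempts += 1
        if responds_properly(cell):
            return cell
    return None
```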
This procedure will require the executive group to output the results of testing
the group to a display panel and inform the astronaut of the necessity to initialize new
cells as controller cells.
If the group is declared as having totally failed, the executive group will go
into a specific assignment program to reinstate the program of the failed group.
This will involve first checking to see if any spare groups are available in the system
(the non-critical phases will initially have 1 spare group since only 3 groups are
required here and 4 in the most heavily loaded phase). If a spare group is available,
it will be started up using the normal start up procedure. However, if no spare
groups are available an assignment program will be entered that will allocate the
programs to the other groups in the system. This will probably result in a degraded
mode of performance. The assignment program will be quite similar to that used
when a working cell fails except that it must contain specific information on all the
tasks currently assigned to the system.
Group Switch Failures
If one of the group switches fails and does not affect any of the busses, it will
simply not be used. As long as spare group switches are available the group can
continue functioning. The executives in all the groups must be informed of the failed
group switch so that they may update their communication routines to not use this
group switch for communications with this group.
A group switch may fail in a manner bringing down the intercell bus of its
group. The executive group will detect this failure when it samples the group switches
for the controller cell response word. This failure then appears identical to that
treated above for the controller cell failure. An identical procedure will therefore
be followed to start up the failed group. One group switch will be turned on first to
check out the cells in the group. Assuming this is the failed group switch, it will be
impossible to get the group initialized with a controller cell. This group switch will
then be turned off and another one, if available, turned on and the procedure repeated.
If this group switch is functioning properly and the failed group switch does not affect
the intercell bus when it is turned off, the group will be initialized. The executive
group has then isolated the failed group switch and proceeds as discussed in the
preceding paragraph. If the failed group switch affects the intercell bus when it is
in an off state, the entire group will be declared faulty.
A failure of a group switch that brings down an inter-group bus will be detected
by the executive group when it tries to communicate to other groups over this inter-group
bus. The group switches connected to this inter-group bus will then be turned
off one at a time until the bus is operative once again. The failed group switch will
then be kept off and the executive group will reconfigure around the failure as described
for the previous two cases above. If the failure results in bringing down the
inter-group bus when the failed group switch is in an off state, the inter-group bus
will simply not be used.
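The one-at-a-time isolation can be sketched as a simple search. In the Python illustration below, the function names are hypothetical, and `bus_operative_with_off` stands in for turning a single switch off, trying the bus, and turning the switch back on if the bus is still down.

```python
# Illustrative sketch; names are assumptions, not from the report.

def isolate_bus_fault(switches, bus_operative_with_off):
    """Turn group switches off one at a time until the inter-group bus works.

    Returns the switch to be kept off, or None if the bus stays down with
    every switch tried, in which case the bus is simply not used.
    """
    for sw in switches:
        if bus_operative_with_off(sw):
            return sw
    return None
```

The search is linear in the number of group switches on the bus, which is small in the configurations considered here (at most four groups).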
Executive Group Failure
A failure in the executive group of the type described above under "controller
cell failures" will result in the executive group's group switches reporting a failure.
First it will be assumed that there is a minimal backup executive in the system contained
in some other group. This backup executive monitors the executive group for
a failure. Upon detecting the failure, it will output a signal to the display panel informing
the astronaut of the failure. The astronaut can check this report against the
go/no-go signal from the executive group's group switches. Assuming the two reports
agree, the astronaut will proceed to reconfigure around the failure of the executive
group by having the minimal backup executive enter its reconfiguration routine (which
is what the backup executive primarily consists of). This routine has no alternative
but to assume the executive group could have contaminated data and control in any of
the groups in the system. (The minimal backup executive cannot be altered; therefore
any failures external to it should not affect it.) Therefore it assumes the
entire system must be reinitialized.
The minimal backup executive will now load in from the bulk storage unit a new
executive group routine into the group in which it is residing, preserving a record of
where the executive group resided when it failed. The new executive group will first
check to see if any spare groups are available. If a spare group is available, the
system is reloaded as is done when initially starting up the system. After the system
is started up, the new executive group will inform the astronaut that the previously
failed executive group should be checked out in order to isolate the failure and possibly
return the group to a spare status. The procedure for checking the failed group is
exactly identical to that described above for a controller cell failure (working group
failure). The routine for assisting in checking out the failed group is of course a
normal part of the executive group functions.
However, if the situation occurs that a spare group was not available, the new
executive group would proceed to check out the failed group prior to loading up any
other groups. In this manner it would know whether the failed group can be placed
back in service. If it has completely failed, then the new executive group would proceed
to its degraded mode reconfiguration routine as previously described above.
If a minimal backup executive is not contained in the system then the astronaut
must reinitialize the system just as if the system were being first turned on. The
primary difference here is of course that he would not use the failed group in reconfiguring.
He would load up a new group as the executive group and give it the location
of the failed executive group so that the same procedures as outlined above may
be followed by the new executive group.
It may not be very difficult for the astronaut to essentially act as the minimal
backup executive. However, the minimal backup executive should be relatively simple
and would probably fit into a single cell.
In any case, a failure of the executive group may be considered as a hard core
type of failure. It essentially requires the system to be completely reinitialized as
from a cold start.
Sensor/Conditioner Failure
It is possible for sensor/conditioners to be connected to the cells, the inter-
cell busses, and the inter-group busses. Assuming the failure has been detected
and isolated the system would continue operating in a mode not using I/O associated
with the failed device until a new one is plugged in to the I/O panel replacing the
failed device.
A failure may occur in a sensor/conditioner bringing down the part of the system
it is connected to. For example, bringing down the inter-cell bus, causing a group
failure indication. In such a case, if the group cannot be started up following the
procedure discussed above for controller cell failures, the sensor/conditioners
would be disconnected by the astronaut one at a time thereby isolating the failure.
Similarly the same procedure would be followed for failures on inter-group busses
and the cells themselves.
8.2.2 Mars Orbital Phase
The configuration during this phase requires 4 groups of 18 cells each. One
group will be designated as the executive group with the system computational func-
tions divided up among the groups.
Working Cell Failures
The same procedure is followed as in the non-critical phase. There will be a
minimal Navigation and Guidance function being performed in another group. If a
cell fails which is concerned with the full Navigation and Guidance computation, the
minimal program will take over to prevent complete loss of data until the full naviga-
tion and guidance programs are reassigned. A complete loss of data will result in a
reinitialization of navigation and guidance parameters which could take approximately
a half hour. This could be detrimental say in correlating video data with position
information. While this is not a critical type of backup function, it is one that can
enhance the system availability.
Controller Cell Failures
The same procedure as outlined for the non-critical phase is resorted to. Note
that if the failure is in the group performing the full Navigation and Guidance function,
part of these functions would be loaded into the group with the minimal backup Naviga-
tion and Guidance function. Also note that this phase requires the full computing capa-
bility; therefore there may not necessarily be spare groups to resort to in case a
group is declared faulty.
Group Switch Failures
Same as non-critical phase.
Executive Group Failures
Same as non-critical phase. Note however that there is less likelihood of having
a spare group in this phase.
Sensor/Conditioner Failures
Same as non-critical phase.
8.2.3 Critical Phase
The configuration during this phase consists of 4 groups turned on. Groups 1
through 3 are considered the primary system; one of these groups will be identified
as the primary executive group. Group 4 will be considered the secondary system.
It will contain its own executive and the critical mission computations. Note that
this also implies that there are redundant connections to the critical sensor/
conditioners.
Working Cell Failures
The cell that failed or the cells that it affects can be involved with critical or
non-critical computations. Failures are detected as before in the other two phases.
If the failure is involved in non-critical computations, the controller cell will deter-
mine if any spare cells are available in the group to assign the failed computations to.
They would be loaded in from the bulk storage unit. There would be no reassigning
of programs in the group to effect a certain geometrical reconfiguration of the programs
as may be done in the other two phases. If no spare cells are available, the
failed cell's computations would simply be suspended. To reassign programs to other
groups would probably require plugging/unplugging sensor/conditioners to effect a
reconfiguration, and it is doubtful this would be done in a critical phase for
non-critical computations.
A failure associated with critical computations will result in the controller cell
issuing a faulty response word to the group switches which will indicate group failure.
The group switches will output the group failure signals to switches that control the
steering of output signals to the critical sensors as shown in Figure 8-2. This will
result in switching over control to the secondary system as far as the critical sensors
are concerned. The executive group in the meantime will have also noted the failure.
Figure 8-2. Output Switching of Critical Conditioners
(Block diagram; labels: critical system outputs, output switch, critical conditioner of
the primary group, critical conditioner of the secondary group, group logic levels
indicating the on-line group.)
At this point it does nothing. The controller cell will attempt to reconfigure within
the group so that the critical computation can be reinstated and serve as an additional
backup if the remaining system failed.
The executive group will therefore wait a fixed amount of time to determine if
the group has been reconfigured internally. If the group has been reconfigured, it
must then be restored to operational status by providing it various computational
parameters. There are several ways this can be done: either by using the last known
good set of parameters of the failed system which have been continually stored in
another cell or group during operation, or by obtaining the latest values from the
system which did not fail.
If the group has not been able to reconfigure itself, the failure will be treated
as discussed under controller cell failure given below.
Note that the above discussion applies to the primary and secondary systems
except that in the secondary case no actual output switching takes place. It should
also be noted that the executive group reconfiguration routines are simpler than those
in the prior two phases.
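
The switchover sequence described above for a failure in critical computations may be sketched as follows. This is only an illustrative model under stated assumptions: the names `OutputSwitch` and `handle_critical_failure`, the action strings, and the flag `reconfigured_in_time` are hypothetical and do not appear in the design.

```python
from dataclasses import dataclass


@dataclass
class OutputSwitch:
    """Hypothetical model of the output switch of Figure 8-2."""
    online: str = "primary"

    def group_failure(self, failed_system: str) -> None:
        # A group failure signal from the on-line primary system steers the
        # critical outputs to the secondary system. A failure in the
        # secondary system causes no actual output switching.
        if failed_system == "primary" and self.online == "primary":
            self.online = "secondary"


def handle_critical_failure(switch, failed_system, reconfigured_in_time):
    """Return the ordered list of actions taken (names are assumptions)."""
    actions = ["controller cell issues faulty response word"]
    before = switch.online
    switch.group_failure(failed_system)
    if switch.online != before:
        actions.append(f"outputs switched to {switch.online}")
    # The executive group notes the failure but only waits at first.
    actions.append("executive group waits a fixed time")
    if reconfigured_in_time:
        actions.append("restore parameters; group serves as backup")
    else:
        actions.append("treat as controller cell failure")
    return actions
```

In this sketch the only difference between a primary and a secondary failure is whether the output switch changes state, which mirrors the note above.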
Controller Cell Failures
As above the group switches will control the output switches to switch control
to the operating system for the critical sensors if the failure involves a group with
critical computations. The executive group will follow a procedure similar to that
given for the prior two phases as far as attempting to check out and isolate the failures
in the failed group. Unless alternate connections were provided for in each group, so
that the astronaut could quickly initiate such action upon a signal from the executive
group, it is doubtful that the astronaut would plug in the necessary connections,
although it could be done.
If the failed group were reinitialized, it could be loaded up with critical com-
putations to serve as a backup for further failures. In the meantime, prior to
checking out the failed group, the executive group will reload one of the other groups
with a set of the critical computations from the bulk storage unit if the failed group
contained critical computations. In this way another group could serve as a backup
to further failures in critical computations. Note that this capability would require
redundant I/O connections to groups unless the astronaut plugged/unplugged the con-
nections during the critical phase (this may not be practical for him to do).
Group Switch Failures
Same discussion as in prior two phases applies here. The controller cell fail-
ure procedures outlined above will be followed here in attempting to reinstate the
failed group.
Executive Group Failure
If the executive group's group switches indicate a failure and a minimal backup
executive exists as discussed in the prior two phases, the backup executive can
follow the same procedure as discussed previously. It would probably reload the
executive group function in its own group (assuming it initially contains no critical
computations) and attempt to bring up the failed group as is normally done by the
executive group function. Once again the comments about connections to the groups
and requiring the astronaut to make connections apply here. Note that the backup
executive can only be located in one specific group since the other two non-failed
groups would be associated with critical computations.
If a backup executive were not provided the above would not be possible except
by astronaut participation. However, the two groups assigned the critical computa-
tions would continue operating normally.
Sensor/Conditioner Failures
Failures associated with non-critical functions result in suspension of the
associated computations. However, if they affect a critical function, the controller
cell would simply indicate a group failure and result in switching over control to the
operating system. Unless redundant connections were provided, the astronaut would
have to plug/unplug connectors to reinstate another backup system for critical
functions.
It should be noted that in the above discussion for the critical phase, failures
in one group affecting other groups are not mentioned. The reason for this is that
the programs in the groups are expected to be well isolated and contain check routines
to validate any data received from other groups. They will also have routines to
validate requests for any control changes or locations that data may be read into. In
this manner each group essentially locks out other groups except for certain predetermined
functions which will not affect this group if another group failed that it is communicating
with. The group switches will also contain logic to inhibit their group
from getting on the inter-group bus should they fail to get the proper response word
from the controller cell. In this manner, it is very unlikely that one group's failure
will affect the remainder of the system.
It may be seen that the organization contains many possibilities for sustaining
multiple failures during a critical phase. This is particularly true if a number of
extra I/O connections are provided to the system as discussed above.
8.3 BACKUP EQUIPMENT ASSURANCE
There may exist at any time in the system a number of spare cells in each
group. These cells will be periodically tested along with the other cells in the group to determine their status.
A spare group will exist during the non-critical phase. This group will be
turned on periodically by the executive group and told to run the standard self check
routines. Therefore, the executive group will be informed of its status for possible
use in reconfiguration. This status will also be required prior to entering the critical
or the Mars orbital phase.
8.4 EXTERNAL STATUS REPORTS
The executive group is the primary central source for status reports. It will
issue reports to the display panel that a group has failed and a new controller cell
must be initialized. It will also indicate the number of failed and operable cells in
each group. Each group will also have a connection from its group switches to
indicate a failure in a group. This report could be compared with the executive
group's report for any discrepancies in the system. The executive group will also
issue status reports on the group switches in the system and on the inter-group
busses.
The cell that the backup executive is in will also have a connection to the display
panel (via the cell's I/O line) to indicate any failures in the executive group.
9. CELL AND GROUP SWITCH DESIGN
This section presents a detailed description of the cell. First the features are
presented and secondly some detailed aspects of the design are discussed. Finally
some design considerations of the group switch are given in 9.3.
9.1 PROCESSOR FEATURES OF THE CELL
This section will describe the general features of the processor section of each
cell. In particular, the word length, accumulators, index-bank registers, and the
instruction word format will be discussed. Since the requirements to be designed to
are not very firm at this point in time, some of the alternatives discussed cannot be
explicitly chosen as optimum. In addition, some general considerations with regards
to the instruction set will be given.
9.1.1 Accumulators and Index Registers
This sub-section will discuss the use of multiple accumulators and multiple
index registers in terms of their ability to save storage. Since it is not necessary
for the processor to operate at high speed, as many of the processor registers as
possible will be stored in the memory. This enables many registers to be used
for the processor at a small increase in system complexity or lowering of
wafer yields. This is the case since registers in the memory can be easily
fabricated with high yields by using discretionary wiring or similar techniques.
On the other hand, the registers constructed in the processor are generally part
of lower yield complex logic. (If this difference in ease of fabrication is eliminated
in the future the multiple accumulators and index registers can be included in the
processor section of the chip. The instruction execution time would then be
decreased.) Because of the above points there will be just one accumulator in the
processor section of a cell and any additional accumulators will be accessed from a
specified area of the memory. (The chosen processor uses four accumulators.)
The index-bank registers will also be contained in a fixed area of memory.
Accumulators
The use of more than one accumulator can save a significant amount of
execution speed and, of more importance for this application, storage. The storage
savings comes about since intermediate results do not have to be stored in the memory
or in hot storage. As intermediate results are obtained they are simply left in the
accumulator in use and another accumulator is brought into use. In this way it is not
necessary to store the first accumulator while further operations are carried out
before the intermediate value is again needed. The accumulator to be used in any
operation is simply specified in the op-code. As a result when the instruction is to
be executed the proper accumulator is pulled out of the memory and exchanged with the
accumulator in the hardware location or if the hardware accumulator is the one
specified no exchange is necessary. This process clearly takes a longer time than if
the accumulators were in the processor hardware itself; however, for the processor of
interest the main interest is in saving storage and as a result the processor hardware
that would be devoted to multiple accumulator registers can be eliminated or used in
other fashions, e.g., for complex macro instruction control, etc.
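
The exchange between the single hardware accumulator and the accumulators held in memory may be illustrated by the following sketch. It is a toy model only; the class `Cell`, its attribute names, and the exchange counter are assumptions introduced for illustration and are not part of the cell design.

```python
class Cell:
    """Toy model: one hardware accumulator, the others in a memory area."""

    def __init__(self, n_acc=4):
        self.hw_acc = 0               # contents of the hardware accumulator
        self.hw_id = 0                # logical accumulator currently in hardware
        self.acc_area = [0] * n_acc   # fixed memory area for the accumulators
        self.exchanges = 0            # number of memory exchanges performed

    def select(self, acc_id):
        """Bring the accumulator named in the op-code into hardware."""
        if acc_id == self.hw_id:
            return                    # already in hardware: no exchange
        self.acc_area[self.hw_id] = self.hw_acc   # store the current contents
        self.hw_acc = self.acc_area[acc_id]       # fetch the one requested
        self.hw_id = acc_id
        self.exchanges += 1

    def add(self, acc_id, operand):
        """Add an operand to the specified accumulator."""
        self.select(acc_id)
        self.hw_acc += operand


cell = Cell()
cell.add(0, 5)   # intermediate result is left in accumulator 0
cell.add(1, 7)   # further work proceeds in accumulator 1
cell.add(0, 1)   # accumulator 0 is exchanged back in when needed again
```

The program never issues an explicit store for the intermediate value; the cost appears instead as the two exchanges counted by `cell.exchanges`, which is the time penalty noted above.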
The usefulness for memory savings of a second, third, or more accumulators
for intermediate storage must be evaluated for any given application. However, an
evaluation that was carried out (Ref. 29) showed that for guidance and navigation in
an avionics system addition of a second accumulator reduces the instruction count
by as much as 8 percent and the inclusion of six or more accumulators brings this
reduction to as much as 12 percent. How much the saving is for scientific experiments,
telecommunications, etc. has not been determined, but it is clear that the
use of at least 2 accumulators is a very valuable asset for saving storage. Since
these accumulators are in memory, their only hardware cost is additional control
circuitry. It can be seen that the majority of the advantage of multiple accumulators
is accrued with the addition of the second accumulator; as a result each processor
cell will have at least two accumulators. The inclusion of additional accumulators
depends on both the availability of instruction bits to specify to which accumulator a
particular op-code is being applied and the relative usefulness of additional accumulators
versus additional index registers. These points are made clearer by the
discussion on each possible word size given later.
Indexing
Full word length index and bank registers are used. Therefore, there is no
real distinction between the two since they both accomplish address generation and
address modification in the same manner. They will be referred to as index, bank,
or index/bank registers throughout this report.
Indexing in each processor of the cells will be carried out by memory index
registers. For an instruction that requires banking or indexing, the proper index/
bank register or registers are accessed from the memory, loaded into the memory
buffer register, and added to the memory address register (the memory address
register holds the address displacement obtained from the initial instruction word).
From this it can be seen that an instruction that needs to be banked and indexed
before picking up the operand would require four memory cycles, including the
memory cycle to pick up the instruction itself (accumulator may be in memory).
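
The count of four memory cycles can be traced with the short sketch below. It is a simplified model: the function name `fetch_operand`, the argument names, and the convention that the instruction word holds only the address displacement are assumptions made here for illustration.

```python
def fetch_operand(memory, instr_addr, bank_reg_addr=None, index_reg_addr=None):
    """Return (operand, memory_cycles) for one instruction.

    memory[instr_addr] is assumed to hold only the address displacement;
    the op-code field is left out of this toy model.
    """
    cycles = 1                       # cycle to pick up the instruction itself
    mar = memory[instr_addr]         # address register holds the displacement
    for reg_addr in (bank_reg_addr, index_reg_addr):
        if reg_addr is not None:
            mbr = memory[reg_addr]   # index/bank register read into the buffer
            mar += mbr               # and added to the address register
            cycles += 1
    operand = memory[mar]            # final cycle picks up the operand
    cycles += 1
    return operand, cycles


memory = [0] * 64
memory[0] = 3     # displacement from the instruction word
memory[1] = 32    # bank register held in memory
memory[2] = 4     # index register held in memory
memory[39] = 99   # operand at 32 + 4 + 3
result = fetch_operand(memory, 0, bank_reg_addr=1, index_reg_addr=2)
```

With both banking and indexing the sketch uses four cycles; with neither it uses two. Any exchange of an accumulator held in memory would add to these counts.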
The advantage of indexing in terms of memory saving has been discussed in
many places. Two such discussions are given in References 29 and 1. From these
references and from an investigation of the requirements, it can be seen that the
inclusion of at least three index registers along with one or more bank registers will
provide significant storage savings (20 percent or more); however, the addition of
more than three index registers provides significantly less additional storage savings. As a
result, at least three index registers will be included in the processor of each cell;
the use of additional index registers depends on the availability of instruction word bits
and on a comparison of the value of these additional registers to the value of addi-
tional accumulators. The index registers could also be used for temporary storage;
however, use of the index registers in this manner would provide no memory saving
unless register to register instructions were included (instructions carrying out
basic operations such as add from one index register to another index register).
The reason for this is that the index register used as temporary storage cannot be
addressed directly by bits in the op code if an operation is to be carried out between
the temporary storage and a memory operand. This is clear since there are index
register bits in the instruction word that must specify the indexing of the address for
the memory operand. Therefore, only operations that address two index registers
used as temporary storage and then add or subtract them, etc., could provide
an instruction saving over the use of memory addressing for intermediate results.
On the other hand note that if accumulators are used for temporary storage, they
may easily be addressed by additional bits in the op-code portion of the instruction
word since these registers will not be used for indexing a memory address. Investigation
of the usefulness of register to register operations versus the usefulness of
providing accumulators that may be used for temporary storage has shown that the
register to register operations find very little usage in comparison to the accumulators.
As a result using accumulators instead of index registers for temporary storage provides
a much greater storage savings. (Reference 1 discussed the use of hardware
index registers for temporary storage in order to provide increased execution speed
by using these registers for temporary storage. Note that this increased speed cannot
be obtained for the processor specified above since the temporary storage or
index registers are located in the memory and must be accessed with a memory
cycle in a similar fashion to any other operand held in the memory.)
Indirect Addressing
Investigations of the usefulness of indirect addressing have been carried out and
are discussed in reference 1. It was found that indirect addressing in a machine with
a number of index registers had a limited usefulness; however, when it was used, it
provided some storage savings. (The primary use was for sub-routine linkage. ) As
a result of this limited usage it is not recommended that an indirect bit be added to
the instruction word, but that instead where applicable, certain instructions may use
only indirect addressing or may have the facility to use indirect addressing if desired.
9.1.2 Word Size
The desire to save storage (to use the least number of bits in the memory
possible) gives a reason for considering small word sizes. If a small instruction
word can include enough features, it may provide enough flexibility such that it would
require only a slight increase in the number of words for instructions in the memory,
over that for a larger instruction word. This may then result in a smaller number of
bits in each cell's memory for instructions. Larger instruction words are generally
used to offer more flexibility and increased processing speed. The increased proces-
sing speed is not required here, but the amount of storage saved by increased instruc-
tion word flexibility must be investigated. It is also necessary to determine the amount
of extra data words that would be necessary with a small word size. A small word
would require increased double precision and possibly some triple precision opera-
tions and would result in smaller byte sizes for storage of multiple bytes per word.
Twelve-Bit Word
A 12-bit instruction and data word was first investigated to see if it would offer
enough flexibility for a savings in the number of bits in the memory over a larger
word. The chosen 12-bit instruction word is shown in Figure 9-1. Five op-code bits
should be sufficient to offer a reasonably large and flexible instruction repertoire
including the ability to use 2 accumulators. More than 32 instructions can be made
available by the use of op-code extension on instructions that do not require a memory
address. Two uses of the tag bits, B/T, are shown in Figure 9-2. Figure 9-2a would
have a bank register, B, contained in the memory added to the address bits in the
instruction word for every memory access. In addition, one of three memory index
registers can also be added to the address. Figure 9-2b proposes using the index/
banking bits to specify one of four index/bank registers. Since this scheme does not
enable multiple indexing to be carried out, it will require a few more instructions
for execution of loops. However, it does not have the disadvantage of the scheme
specified in 9-2a, of requiring that the index register contents be changed any time
the bank register contents are changed. Nonetheless, the scheme in Figure 9-2a
seems to be somewhat more flexible and would be chosen for a 12-bit word.

    bits:    1 - 5        6 - 7        8 - 12
    field:   op code      B/T          Address Displacement
    width:   5            2            5

    B/T = index/banking bits

Figure 9-1. 12-bit Instruction Word

    a.  B/T   registers added        b.  B/T   register used
        0 0   B                          0 0   T1
        0 1   B + T1                     0 1   T2
        1 0   B + T2                     1 0   T3
        1 1   B + T3                     1 1   T4

Figure 9-2. Use of Two B/T Bits
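
The two uses of the B/T bits may be compared with the following sketch. The function names are hypothetical, and the sketch assumes that bit 1 of the instruction word is the most significant bit, which the text does not state.

```python
def decode_12bit(word):
    """Split a 12-bit instruction word into (op code, B/T, displacement)."""
    op_code = (word >> 7) & 0x1F   # bits 1-5
    bt = (word >> 5) & 0x3         # bits 6-7
    disp = word & 0x1F             # bits 8-12
    return op_code, bt, disp


def effective_address_9_2a(bt, disp, B, T):
    """Scheme of Figure 9-2a: bank B is always added, plus an optional index.

    T is the sequence (T1, T2, T3).
    """
    addr = B + disp
    if bt != 0b00:
        addr += T[bt - 1]          # 01 -> T1, 10 -> T2, 11 -> T3
    return addr


def effective_address_9_2b(bt, disp, T):
    """Scheme of Figure 9-2b: exactly one of four registers is used.

    T is the sequence (T1, T2, T3, T4).
    """
    return T[bt] + disp            # 00 -> T1, 01 -> T2, 10 -> T3, 11 -> T4


op_code, bt, disp = decode_12bit(0b101011000011)
addr_a = effective_address_9_2a(bt, disp, B=64, T=(10, 20, 30))
addr_b = effective_address_9_2b(bt, disp, T=(100, 200, 300, 400))
```

Under the first scheme a change of B moves every indexed address, which is why the index register contents must be adjusted whenever the bank register is changed.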
The last part of the instruction word provides 5 bits for specification of an
address within a bank. Clearly this is only a 32 word bank; as a result it is probably
too short to hold the majority of the programs and/or their associated data. This
means that a reasonably large number of load bank commands would have to be
inserted into the instruction stream both for jumping to separate parts of a program
that is located in a number of banks and for picking up data for separate banks. A
complete investigation of the programs is necessary to determine the percent
increase in storage due to these load bank commands; however, an investigation
(reference 30) showed that for navigation and guidance programs a 32 word bank could
cause substantial increases in the amount of memory required. This short bank is
especially inefficient in a 12 bit word where the index/banking scheme is relatively
inflexible.
There are a number of additional problems with a 12-bit word. One of these
would be the inability of a memory word to contain the complete address for any word
in a group. A group may typically contain on the order of 20 cells of 512 words per
cell. This would amount to at least 10,000 words per group, whereas a 12-bit word
can only address 4,000 words. As a result, a 12-bit word would require two locations
to hold this address and the second location would only need two of the 12 bits for
additional addressing. This could amount to a substantial inefficiency.
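
The addressing figures quoted above can be checked with a few lines of arithmetic. The variable names are chosen here for illustration; the numbers of cells and words are the typical values given in the text.

```python
# Typical values from the text: about 20 cells per group, 512 words per cell.
CELLS_PER_GROUP = 20
WORDS_PER_CELL = 512

words_per_group = CELLS_PER_GROUP * WORDS_PER_CELL   # 10,240 words
addressable_by_12_bits = 2 ** 12                     # 4,096 words

# Smallest number of address bits that can name every word in the group.
bits_needed = (words_per_group - 1).bit_length()     # 14 bits
spill_bits = bits_needed - 12                        # 2 bits in a second word
unused_bits = 2 * 12 - bits_needed                   # 10 of the 24 bits unused
```

A two-word address therefore leaves 10 of its 24 bits unused, which is the inefficiency referred to.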
In addition, the use of a 12-bit word would certainly require triple precision
operations to be carried out in the navigation and guidance routines. This could cause
an inefficiency of bits in the data word and would also require the addition of triple
precision software. It also appears from the requirements that the half word size or
byte size of 6 bits would be somewhat small for the needs of many of the scientific
experiments and other operations that will use byte manipulations. In particular, a
seven to eight bit byte would probably be necessary to offer sufficient flexibility. A
precise answer to the size of the byte required for efficient utilization of data storage
is difficult to provide; however, 7 or 8 bits would certainly offer more flexibility to
meet the requirements for byte manipulation when they are explicitly specified.
It should be noted when discussing byte manipulation that the number of bytes
per word should be a power of two for the most ease of operating on bytes. (The
most flexible byte manipulation is when the number of bits in the word is a power of
two. ) If this is the case bytes can be manipulated and obtained from data words by
simple shifts of the addresses of the bytes. In addition, indexing with respect to
words or with respect to bytes can be done simply by adding or by shifting and adding
the specified address to an index register.
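
The shift operations referred to may be sketched as follows, assuming a byte address that counts bytes from the start of a list of words. The function names are hypothetical.

```python
def split_byte_address(byte_addr, bytes_per_word):
    """Split a byte address into (word offset, byte number within the word).

    bytes_per_word must be a power of two so that a shift and a mask suffice.
    """
    shift = bytes_per_word.bit_length() - 1
    if 1 << shift != bytes_per_word:
        raise ValueError("bytes_per_word must be a power of two")
    return byte_addr >> shift, byte_addr & (bytes_per_word - 1)


def indexed_byte_address(index_reg, byte_addr, bytes_per_word):
    """Index with respect to bytes: shift the address, then add the register."""
    word_offset, byte_no = split_byte_address(byte_addr, bytes_per_word)
    return index_reg + word_offset, byte_no


location = indexed_byte_address(100, 5, bytes_per_word=2)
```

With three bytes per word the same split would require a division and a remainder rather than a shift and a mask.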
The addition of a number of half word or byte instructions to the instruction
repertoire, such that bytes are directly accessed and operated on and then replaced
in memory without affecting the remainder of a memory word, may save a consider-
able amount of storage (e. g., in the scientific experiments in a spaceborne applica-
tion). The addresses of these instructions could handle the additional length required
due to byte specification since they would be relative to an initial address of a word in
a list. (The initial address would be held in an index register.) Therefore, since a
considerable amount of byte manipulation is expected in at least the scientific experiments,
only word lengths that are a power-of-two multiple of some useful byte
length, such as 6, 7 or 8 bits, will be considered, i.e., 12, 14, and 16 bit word
lengths. Longer words that are a power-of-two multiple of a byte and that
would hold two instructions per word could also be considered from the above standpoint
(24 bit words, 28 bit words or 32 bit words). This may result in problems in
trying to pack data into the word for efficient byte manipulation. However, the most
important point is that for the requirements considered for the given space missions,
the majority of the words tend to be 12 to 16 bits and as a result a processor with a
considerably longer word length can result in inefficiencies of storing data. From
the above it can then be seen that there appear to be no real gains from a long word
and there are some losses.
Because of the considerations given earlier a 12-bit instruction word has been
eliminated as a possibility. It does not seem to provide sufficient flexibility to save
storage over a longer word length. In fact, it appears that it would require consid-
erably more storage primarily due to the addressing and short bank problems.
14-Bit Instruction Word
Two examples of the 14 bit instruction word are given in Figure 9-3. Figure
9-3a shows 6 bits used for the op-code. This should easily be sufficient to offer
instructions to take advantage of the multiple accumulators (as many as four would be
practical), to provide byte manipulation instructions, and even to provide some complex
macros. Two bits are provided for index/banking using the same scheme as
discussed in relation to Figure 9-2. The address section of the instruction word
provides six bits for a 64 word bank. This bank should be sufficiently long for some
flexibility since full length bank/index registers are used. However, since only a
maximum of 4 index/bank registers are available from the B/T bits, this 64 word bank
could provide some inefficiencies.
The addition of more index/bank registers could alleviate much of the possible
addressing problem due to the 64 word bank. Such a scheme is shown in the instruction
word of Figure 9-3b. Here the op-code bits have been decreased from 6 to 5, but
the B/T bits have been increased to three to offer the usage schemes shown in Figure
9-4. There are clearly other possibilities of using 3 B/T bits, but those shown in
Figure 9-4 offer the most flexibility. The advantage of the scheme in Figure 9-4a
over that shown in Figure 9-4b is simply the availability of more total registers that
can be used for index-banking purposes. However, Figure 9-4b offers 5 registers and
a considerable amount of flexibility in terms of multiple indexing. In particular the
index registers do not have to be adjusted every time a bank register is changed.
(This is the case in the scheme in Figure 9-4a.) Because of this multiple indexing
flexibility and the possibility that it may save some storage, the scheme in Figure 9-4b
would be chosen. The instruction word in Figure 9-3b also has six address bits for a
64 word bank.
An explicit decision to choose between the instruction words shown in Figure 9-3a
and 9-3b cannot be made until an investigation into the use of various op-codes is carried
out. After this a relative evaluation of the use of an additional bit for a 6-bit op-code
or for a three bit B/T specification can be evaluated so that either Figure 9-3a or 9-3b
can be chosen for a 14 bit instruction word. The question of course would center
around which scheme provided the most storage savings; however, there are additional
considerations, such as, the ease of programming, etc.
    bits:    1 - 6        7 - 8        9 - 14
    field:   op code      B/T          Address Displacement
    width:   6            2            6

    9-3a

    bits:    1 - 5        6 - 8        9 - 14
    field:   op code      B/T          Address Displacement
    width:   5            3            6

    9-3b

Figure 9-3. 14-bit Instruction Word

    a.  B/T     registers added      b.  B/T     registers added
        000     B                        000     B1
        001     B + T1                   001     B1 + T1
        010     B + T2                   010     B1 + T2
        ...     ...                      011     B1 + T3
                                         100     B2
                                         101     B2 + T1
                                         110     B2 + T2
        111     B + T7                   111     B2 + T3

Figure 9-4. Use of Three B/T Bits
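
The scheme of Figure 9-4b, applied to the instruction word of Figure 9-3b, may be sketched as follows. The function names are hypothetical, bit 1 is assumed to be the most significant bit, and the reading of the three B/T bits as one bank-select bit followed by two index-select bits is an interpretation of the figure rather than a statement from the text.

```python
def decode_14bit_3bt(word):
    """Split a 14-bit word of Figure 9-3b into (op code, B/T, displacement)."""
    op_code = (word >> 9) & 0x1F   # bits 1-5
    bt = (word >> 6) & 0x7         # bits 6-8
    disp = word & 0x3F             # bits 9-14
    return op_code, bt, disp


def effective_address_9_4b(bt, disp, B, T):
    """Form the address under Figure 9-4b.

    B is the pair (B1, B2); T is the sequence (T1, T2, T3).
    """
    bank = B[bt >> 2]              # first bit: 0 selects B1, 1 selects B2
    index_sel = bt & 0b11          # last two bits: 00 none, 01 T1, 10 T2, 11 T3
    addr = bank + disp
    if index_sel:
        addr += T[index_sel - 1]
    return addr


word = (17 << 9) | (0b101 << 6) | 5
op_code, bt, disp = decode_14bit_3bt(word)
addr = effective_address_9_4b(bt, disp, B=(0, 64), T=(1, 2, 3))
```

Because each instruction names its own bank register, a program can move between two banks without rewriting the index registers, which is the flexibility cited above in favor of this scheme.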
A 14-bit word may be an efficient choice for the computation system in the space
missions under consideration, however, it does have some inefficiency problems due
to its relatively short length. For example, double precision operations of 28 bits will
not be sufficient for some navigation and guidance systems. As a result, triple pre-
cision will be required, but a triple precision word containing 42 bits will offer greater
accuracy than that required and consequently will waste some amount of data storage
area. In addition, the use of triple precision will require triple precision software to
be added. There is also some question as to whether a seven bit byte will be sufficient
for byte manipulations in the scientific experiments.
There is, therefore, some push toward a 16-bit instruction word due to addi-
tional flexibility in the instruction word, the use of 8 bit bytes, and more flexibility in
bit manipulation. The latter point will be made clear by a renewed consideration of
the discussion of byte manipulation given earlier. If a considerable amount of bit
manipulation is necessary in the computations (in other words manipulation of bytes
that can vary in length from one bit to eight or more bits), the use of an instruction
word with a number of bits that is a power of two would be useful. These varying
length bytes can then be packed into words and can be accessed by instructions by
simple shifts of the address in a fashion similar to that for the half word bytes dis-
cussed earlier. All that is necessary is to place the address of the first word in a
list in an index register and then to address all varying length bytes relative to this
initial address by using an address in the instruction word that represents the bit
number in the list. For example, a 16-bit word would simply require that the bit
address be shifted four positions to the right and then indexed with the initial word
address in the list. This would give a word address of the word that contained the
required bits. The four bits that were shifted right would be saved and used to choose
the particular starting bit in the chosen word. (The bit address can also be indexed
if desired. )
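
For a 16-bit word the address computation just described reduces to a shift and a mask, as the sketch below shows. The names `locate_bit` and `list_base` are hypothetical.

```python
WORD_BITS = 16
SHIFT = 4   # log2(16): the number of positions the bit address is shifted


def locate_bit(bit_addr, list_base):
    """Return (word address, starting bit within that word).

    bit_addr is the bit number within the list; list_base is the address of
    the first word of the list, as would be held in an index register.
    """
    word_addr = list_base + (bit_addr >> SHIFT)   # shift right four, then index
    start_bit = bit_addr & (WORD_BITS - 1)        # the four bits shifted out
    return word_addr, start_bit


where = locate_bit(37, list_base=200)
```

Bit 37 of the list thus falls in the third word, at bit position 5 of that word.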
Clearly, if a fairly large amount of bit manipulation was necessary, a word
size that is a power of two could provide substantial storage savings in that simple
instructions could easily be included to automatically carry out the bit manipulations.
For example, a single instruction to load the accumulator with some desired byte
would simply pick up the address bits from the instruction word, shift them four posi-
tions to the right, save them, index the shifted address with the register specifying
the initial address in the word, pickup the word, and then left adjust it to the specified
starting bit location given in the instruction. Additional instructions to add another set
of bits to the ones that were loaded could also be implemented. These byte manipula-
tion instructions would take a reasonable amount of time; however, they would only
require a very small number of instructions for very complex manipulations.
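As a rough illustration of the single load-byte instruction described above, the following Python sketch models memory as a list of 16-bit words. The names `load_byte`, `memory`, and `base`, the zero-based bit numbering from the most significant bit, the explicit `length` argument, and the final right adjustment of the result are all assumptions made for illustration. The sketch also assumes that a byte does not cross a word boundary.

```python
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1

def load_byte(memory, base, bit_address, length):
    """Return the `length`-bit byte that starts at `bit_address` in a packed list."""
    word_offset = bit_address >> 4      # shift the address four positions right
    start_bit = bit_address & 0xF       # saved low four bits: starting bit in the word
    if start_bit + length > WORD_BITS:
        raise ValueError("byte crosses a word boundary (not handled in this sketch)")
    word = memory[base + word_offset]   # pick up the indexed word
    left_adjusted = (word << start_bit) & MASK       # left adjust to the starting bit
    return left_adjusted >> (WORD_BITS - length)     # keep only the top `length` bits
```

With `memory = [0xFFFF, 0x1234]`, the call `load_byte(memory, 0, 20, 8)` picks up the second word, discards its first four bits, and returns the next eight bits.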
The actual amount of byte manipulation required in the programs would have to
be extensively investigated before the above discussion could be used as a strong
reason for choosing a 16-bit word over a 14-bit word. However, there are also a
number of additional points to be considered when trying to choose a 16 or 14 bit
instruction word for the cell processor and memory. A word size increase from 14
to 16 bits would increase the number of memory bits 14 percent if the same number of
total words were required. Therefore, in order for the 16-bit word to be more effi-
cient in terms of storage savings than the 14 bit word the additional features and
flexibility gained with 16 bits would have to make up 14 percent or more of the memory.
The factors contributing to a decrease in the number of memory words with a 16-bit
word are the following: fewer instructions would be required due to the ability for
additional indexing or longer banks and/or additional op-codes, there would be no
requirement for triple precision operations for data storage with a 16-bit word, and
there may be more flexibility of byte storage as discussed above. A precise comparison
of the storage usage of a 14- and 16-bit word cannot be carried out until the require-
ments are explicitly specified in the future. However, a rough evaluation seems to
show that there would not be a sizable memory difference between the two approaches
since the additional features possible with a 16-bit word would off-set the increase in
bits per word by a reduction in words required. As a result, a 16-bit word will be
chosen for the present design of the distributed processor cell since it will offer
additional flexibility in terms of meeting a variety of requirements while providing
somewhat greater programming ease. In the future, when a distributed processor
is to be designed explicitly for a specified mission or set of missions and the
requirements can be clearly specified, the precise trade-off can be carried out to decide
between a 14- and 16-bit word. The features used in a 16-bit word are given in the
next paragraph.
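The 14 percent figure and the corresponding break-even point can be checked with a short calculation. The sketch below is illustrative only: the function names and the sample word counts are assumptions, and the model considers nothing but total memory bits.

```python
def extra_bits_fraction(old_bits=14, new_bits=16):
    """Fractional growth in memory bits if the number of words stays the same."""
    return new_bits / old_bits - 1.0

def break_even_word_reduction(old_bits=14, new_bits=16):
    """Fraction of words that must be saved so that total bits do not increase."""
    return 1.0 - old_bits / new_bits

def total_bits(words, bits_per_word):
    """Total memory bits for a given word count and word size."""
    return words * bits_per_word
```

Under this simple model, the move from 14 to 16 bits adds about 14.3 percent to the memory bits for the same number of words, and the 16-bit design breaks even when it needs 12.5 percent fewer words. For instance, 875 words of 16 bits occupy the same 14,000 bits as 1,000 words of 14 bits.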
16-Bit Word
Two useful instruction word formats for a 16 bit word are shown in Figure 9-5.
Clearly, there are other formats that are possible, but the two chosen appear to be
the most applicable to the space mission requirements. One other variation that
would use seven op-code bits and a lesser number of address or B/T bits may be of
some interest. The additional op-code bit could be used to take full advantage of
multiple accumulators (four or more) and to provide an extensive set of macros to try
to save storage. However, a preliminary evaluation of the instruction set and macros
seems to indicate that a six bit op-code would be sufficient; as a result, only the two
6-bit op-code formats are shown in Figure 9-5. The instruction word in Figure 9-5a
uses 6 op-code bits, three bits for index banking (either of the schemes shown in
Figure 9-4 could be chosen), and a 7-bit bank. The seven bit bank should provide very
few inefficiencies in terms of requiring load bank commands for jumping from pro-
gram to program or from one data bank to another. The instruction word in Figure
9-5b uses a 6-bit banked address and uses the additional bit to obtain 4 B/T bits.
These bits can then be used for two bank registers and any one of seven index regis-
ters. This gives a powerful banking indexing and multiple indexing capability, such
that a 64 word bank may cause very few inefficiencies. Further evaluation is
certainly necessary to obtain an accurate tradeoff between these two 16 bit instruction
word formats. However the indication thus far is that the 7 bit bank will be more
important than 4 B/T bits. Therefore the instruction word shown in Figure 9-5a will
be chosen. It should also be pointed out here that a number of instructions will be
added to the op-code list in order to provide a reasonably flexible byte manipulation
capability in each cell. As mentioned earlier, these will not require extensive
processor hardware when a 16 bit instruction word is used (a power of 2).
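The two 16-bit formats of Figure 9-5 can be illustrated by a pair of field extractors. This is a minimal sketch, assuming that bit 1 of the figure is the most significant bit of the word; the function names and dictionary keys are hypothetical, and treating an index field of zero as "no index register" in format (b) is also an assumption.

```python
def decode_format_a(word):
    """Figure 9-5a: 6-bit op code, 3 B/T bits, 7-bit address displacement."""
    return {
        "op_code": (word >> 10) & 0x3F,   # bits 1-6
        "bt": (word >> 7) & 0x7,          # bits 7-9
        "displacement": word & 0x7F,      # bits 10-16: 128-word bank
    }

def decode_format_b(word):
    """Figure 9-5b: 6-bit op code, 4 B/T bits, 6-bit address displacement."""
    return {
        "op_code": (word >> 10) & 0x3F,      # bits 1-6
        "bank": 1 + ((word >> 9) & 0x1),     # bit 7: 0 selects B1, 1 selects B2
        "index": (word >> 6) & 0x7,          # bits 8-10: 001..111 select T1..T7
        "displacement": word & 0x3F,         # bits 11-16: 64-word bank
    }
```

The sketch makes the trade-off visible: format (a) spends its extra bit on the displacement, doubling the bank from 64 to 128 words, while format (b) spends it on register selection.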
18-Bit Word
An eighteen bit instruction word with the format shown in Figure 9-6 could also
be considered. However the addition of a longer bank and more op codes would not
save enough memory bits to warrant the increase from 16 to 18 bits. This can be
realized since the 16 bit word has very little memory restriction from either lack of
op codes or bank size. The eighteen bit word would allow bigger bytes, but it pro-
vides a less flexible word size for bit manipulation (word not a power of 2). The
primary advantage of this larger word would be in terms of speed increases both due
to slightly fewer double precision operations and to a larger and more flexible
instruction set. (The instruction set for the 16-bit word will not contain a number of
instructions aimed at speed savings.) Therefore, since storage is of primary concern
in the distributed processor, it is felt that an 18 bit word is not necessary.
9.1.3 Control Hardware
The control hardware in the cell's processor section could be implemented with
MOS gating or with a microprogrammed control unit. The primary advantages of a
microprogrammed control unit are: ease of changing the instruction set by replacing
the unit (e.g., replacing a diode array fixed memory wafer with another wafer), ease
of design and implementation of the instruction set, and a relatively easy unit to
check out. The latter advantage may be realized since instead of complicated gating
signals and combinations of signals spread throughout the processor the micropro-
grammed unit can be considered to be a black box with a finite fixed set of inputs giving
a finite fixed set of outputs; therefore, it can be checked out by sequencing through a
set of inputs that checks each memory location.
The distributed processor integrates memory and processing on a single wafer;
so a primary consideration in constructing the control unit is its usage of wafer area
and its effect on the yield of the processor section of the cell. A control unit
constructed from MOS gating could take good advantage of redundant logic terms and
could also be spread throughout the processor section of the wafer thus providing
efficient gate utilization and short interconnection lines. (Note that the micropro-
grammed control unit would require a considerable number of long control lines. )
Both of the above points would enable the gating control unit to use considerably less
area than the microprogrammed unit. Therefore even though the microprogrammed
control unit offers the advantages stated above a MOS gating network will be chosen
for the processor section control unit in order to minimize control area. This
decision can certainly be changed in the future, particularly if it is evaluated that the
instruction set may be changed often in the distributed processor in order to apply the
system to a variety of missions. (In order to change the instruction set with MOS
gating the processor area of the wafer mask must be remade; however if a diode
array fixed memory control unit is included in the processor a new one-zero pattern
must simply be encoded into the mask.)

Bits      1-6        7-9      10-16
Field     op code    B/T      Address Displacement
Width     6          3        7
5a.

Bits      1-6        7-10     11-16
Field     op code    B/T      Address Displacement
Width     6          4        6
Bit 7 = 0 - B1
Bit 7 = 1 - B2
Bits 8-10 = 000
            001 - T1
            010 - T2
            ...
            111 - T7
5b.

Figure 9-5. 16-bit Instruction Words

Bits      1-7        8-11     12-18
Field     op code    B/T      Address Displacement
Width     7          4        7

Figure 9-6. 18-bit Instruction Word
9.1.4 MACRO Instructions
INTRODUCTION
The DAMP system is basically an array of cells consisting primarily of storage
with a small amount of each cell devoted to a processor (arithmetic and logical). An
important consideration is what can be added to the processor section that can result
in requiring less storage and a net reduction in total hardware required in the cell.
Macro instructions (MACROS) were one such feature considered and this section will
give a brief discussion of Macros in general and several specific types investigated.
A considerable amount of additional effort in this area would be necessary in
order to choose a set of Macros to add to the set of common instructions (add, sub,
transfer, etc.). In particular, the amount of hardware necessary to implement a
Macro should be traded-off against the amount of storage necessary to implement it
as a subroutine using common instructions. In addition, it should be determined how
much each macro would be used, since including it in the instruction repertoire
requires including it in all cells. This would clearly only be worthwhile if the Macro
was used often and it required a relatively small amount of hardware compared to the
amount needed for common instruction storage necessary to implement the same
operation. The above trade-off is biased against including most macros that may be
suggested; in fact, the situation is even worse when it is considered that many
functions that are candidates for macros, such as sine or cosine, could be implemented
as a common subroutine in a single cell devoted to receiving parameters from other
cells and sending back sines, cosines, etc. as required. As a result storage for
certain routines may only be required in a small number of cells.
CORDIC ALGORITHM
The Cordic Algorithm is adequately described in the literature as a useful
means of generating sines, cosines, and other trigonometric or hyperbolic functions
(see ref. 31 and 32). Some consideration was given to including the hardware necessary
for an efficient use of the algorithm in the processor sections of the cell. The
algorithm could of course be implemented by programming with a normal instruction
set; however this would require more instructions than the typical series solution
implemented. (The series solution for sine and cosine simultaneously requires on the
order of 30 instructions depending on the machine). As a result consideration was
given to making the necessary additions to the general purpose hardware in the cell.
(The cordic hardware does not provide sufficient flexibility to replace the GP hard-
ware; however it may enable some instructions to be deleted from the normal
instruction set. )
The hardware used to implement the algorithm typically involves three registers
capable of being shifted, two adders connected to two of the registers so that cross
addition of two of the three registers can occur, a third adder for the third register,
gating hardware to enable a variable pick off from two of the three registers (this
enables 2^(-j), j = 0, 1, 2, ..., n, times the contents of a register to be picked up for cross
addition), and control circuitry. In addition to the above, for an n bit word, n angle
constants must be stored either directly in fixed hardware or in the memory. The
above registers can be simply made available from the normal processor registers
(accumulators); however it would be necessary to add additional connections, adders,
gating, control circuitry, and possibly the n constants. (If the constants are not in
fixed hardware they must be accessed from memory via stored instructions or by
control circuitry. ) In any case, implementation of the cordic algorithm, including
the stored constants, would require a few thousand FET's in addition to that required
for the general purpose hardware. Even with this hardware the algorithm still
requires a number of instructions at least for initialization and storing the result.
The hardware implementation of the algorithm would offer increased computation
speeds for trigonometric functions, but this is not what is needed in the distributed
processor system.
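For readers unfamiliar with the algorithm, the register and adder structure described above corresponds to the conventional rotation-mode CORDIC iteration, sketched below in Python. This is an illustration under stated assumptions, not the variant given in refs. 31 and 32: floating-point arithmetic stands in for the fixed-point shift registers, the function name `cordic_sin_cos` and the iteration count are hypothetical, and the input angle is assumed to lie within plus or minus pi/2 radians.

```python
import math

def cordic_sin_cos(angle, n=16):
    """Return (sin, cos) of `angle` in radians using n CORDIC iterations."""
    # The n stored angle constants, one per iteration: arctan(2^-j).
    angles = [math.atan(2.0 ** -j) for j in range(n)]
    # Each iteration stretches the vector; pre-scale x so the result has unit length.
    gain = 1.0
    for j in range(n):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * j))
    x, y, z = 1.0 / gain, 0.0, angle      # the three registers
    for j in range(n):
        d = 1.0 if z >= 0.0 else -1.0     # rotate toward a zero residual angle
        # Cross addition: each register receives 2^-j times the other one.
        x, y = x - d * y * 2.0 ** -j, y + d * x * 2.0 ** -j
        z -= d * angles[j]                # third adder updates the angle register
    return y, x
```

Each pass through the loop uses only shifts (the factors of 2^-j), additions, and one stored constant, which is why the hardware reduces to shiftable registers, adders, and a small table.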
An accurate comparison can not be made of the above hardware that would be
required in every cell with the number of instruction locations in the whole machine
that would be required to execute the same functions. However, the applications for
which the cordic algorithm would be useful, navigation problems involving sines,
cosines, coordinate transformations etc., represent only a small percentage of the
requirements for the space missions under consideration. As a result trigonometric
routines would only be required in a small percentage of the cells. In fact use of
separate cells for subroutine storage and execution as mentioned in the introduction
would reduce to an even smaller number the percentage of cells storing trigonometric
routines. From the above discussion it can then be realized that increasing the com-
plexity of each processor by the addition of cordic hardware would bring a small
return in additional available memory. As a result it is considered to be not worth-
while to implement this algorithm. (A relatively large increase in processor com-
plexity of course makes the processor more difficult to fabricate and can result in
lowered wafer yields; as a result, complex macros should not be included unless they
save a good amount of storage. )
DDA
The cell could be made into a DDA-GP structure. The DDA portion of the
machine could be used for generation of trigonometric and hyperbolic functions as
with the cordic hardware; however this DDA-GP organization would not be worthwhile
for the same reasons as for the cordic algorithm. Quite a large number of FET's
would be needed since at least two complete DDA's would be required including
programming flexibility for the interconnections. In addition, the DDA implementation
would require a number of memory words for initialization and storing the result.
GENERAL MACRO SET
Macros are being considered for the DAMP System primarily from the stand-
point of saving storage. As a result, the present investigation of macros is pointed
toward those that would replace a number of common instructions (decrease the
number of bits associated with instructions so that fewer instructions can be used) and
that would be used in a reasonably large number of computations. Actually, limited
usage of macros would be acceptable if they did not require very much additional
hardware in the processor.
The following paragraphs discuss a few basic types of macros in order to point
out the type of macro that is the most fruitful to investigate for storage savings. The
first type presented are basic instructions (add, etc.) that operate on non-ordered
lists of data (i. e., each data word must be individually addressed). These macros
save some bits, but it is felt they will not be used very often in the applications of
interest here. A second type of macro would again carry out basic instructions but on
ordered lists of data. These macros are shown to be of relatively small value because
they only replace loops that generally contain a very small number of instructions.
The third class of macros are characterized by complex instructions on any type of
data. These macros are shown to offer the best possibilities for storage savings; as
a result, investigations of macros should emphasize this latter type.
It should be emphasized that when trying to save memory, the number of bits
used by a macro instruction is of importance. There are a number of types of
macros that can be investigated for memory bit savings. One example is multi-
operand macros that individually address a number of operands (operation on non-
ordered data that must be addressed from random locations) and carry out a basic
logical or arithmetic operation on all of them. These macros require sufficient bits
to address each operand; as a result the only memory savings is due to the saving of
the op code bits that would be necessary to individually access and combine the
operands. This saving could be large if enough operands could be combined at once.
However, for macros that would be used sufficiently it seems that only a very few
operands could be combined at once as described above and as a result the bit savings
would be typically small.
Many operations with a single basic arithmetic or logical operation are typically
carried out with data that is ordered into some type of a list in memory so that inher-
ent operand addressing is possible. An example of this type of addressing is the
processing of a list of information with an instruction loop that uses index registers
to hold and update the operand addresses. (Note that the list does not have to be
simply sequential. It could use every other memory location, etc. ) The inclusion
of a repeat mode in a processor enables loops, as described above, that contain a
single instruction to be executed very quickly since the count of the times through the
loop and the termination of the loop are automatically handled. However the use of
a repeat mode or of a macro to initialize the appropriate index registers and execute
an operation on a list of ordered data can save very little storage since the loops used
to carry out the same operation require only a few initialization words for the appro-
priate index registers plus the basic loop (one operation plus index handling instruc-
tions). As a result a macro to carry out a basic operation on an ordered list would
save very few bits (only the op code bits for the index initialization and handling
instructions). The inclusion of such macros or even a repeat mode is then not worth-
while in the DAMP System unless an increase in the speed of execution is needed.
Another type of macro that could use basic arithmetic or logical instructions and
forms of inherent operand addressing to save storage bits involves using a push down
stack in the processor. Data could be processed and placed in the stack, and when
appropriate a macro could be issued that executes a basic arithmetic or logical oper-
ation on the top members of the stack. The number of words to be combined would
be the only parameter required in addition to the op code (no addresses). This macro
would then save the loop initializations and index handling that would be necessary if
the same instruction was executed on the list of data by a loop. The stack may also
find other uses in the programs; however, it is not clear at this time that such a
stack would find any real usefulness.
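The stack macro described above might behave roughly as in the following Python sketch. The function name `stack_macro`, the use of a Python list as the push down stack, and the choice to push the combined result back onto the stack are assumptions for illustration; the report does not specify where the result is placed.

```python
import operator

def stack_macro(stack, op, count):
    """Combine the top `count` stack members with `op` and push the result.

    Only the op code (`op`) and the number of words (`count`) are specified;
    no operand addresses are needed.
    """
    result = stack.pop()
    for _ in range(count - 1):
        result = op(result, stack.pop())
    stack.append(result)   # assumed destination of the result
    return result
```

For example, with `stack = [10, 1, 2, 3]`, the call `stack_macro(stack, operator.add, 3)` leaves `[10, 6]` on the stack. The equivalent loop would need index initialization and index handling instructions in addition to the add itself.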
The class of macros characterized by a single instruction that replaces a
quantity of basic instructions offers a good opportunity for memory bit savings; how-
ever, macros of this type that will be used fairly often and do not take an unreasonably
large amount of hardware are difficult to find. Useful macros of this type could
operate on very few operands or could use some form of inherent operand addressing
to operate on lists of operands. In fact use of the cordic algorithm to generate
trigonometric functions could be considered an example of such a complex macro.
This macro could call out sine, for example, and then give the angle to a hardware
unit set up to return the answer; however, this macro was shown to be impractical
from a hardware standpoint since it provided high speed sine generation but saved
very little storage and was not used very often. Exactly the same arguments eliminate
consideration of special function generators using diode arrays, for example. One
possible useful instruction of the complex macro type is the vector dot product. It
would address the first element in each vector and place the addresses in index
registers. The instruction would also specify the number of elements in the vector
and place this value in a third index register. The elements of the vectors would be
stored in the memory in sequence following the first element. The instruction would
then address all elements by simply incrementing the index registers, multiplying
corresponding components, and adding to the previous sum (stored in the second
upper accumulator and the lower accumulator) until the third index register reaches
zero and the operation is terminated. This macro would then save the bits necessary
to specify the initializations and the sequence of operations within the loop that could
be used for the calculation. The main usefulness of this instruction would be in
executing matrix multiplies; however, it may also find use as a substitute for a sum
of products multiply. The difference in this latter case is that two memory locations
would be multiplied instead of one of the upper accumulators and a memory location.
This is acceptable except that more bits are required than for specification of a simple
sum of products multiply; therefore the vector dot product instruction would be an
improvement in this case only if the use of the sum of products instruction generally
requires the accumulator to be loaded first. (This is probably not generally the case. )
In order to get a good evaluation of the value of this macro its use in a matrix multiply
needs to be compared to a loop implementation of a matrix multiply. The amount of
usage of the instruction should also be evaluated. Other macros of the above complex
type need to be investigated. Two possibilities are a full matrix multiply and complex
operations on a stack.
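The behavior of the vector dot product instruction described above can be sketched as the loop it would replace. This is a minimal Python model, assuming memory is a list of integers; the function name `vector_dot_product` and the variable names are hypothetical, and a single Python integer stands in for the double-length sum held in the accumulators.

```python
def vector_dot_product(memory, addr_a, addr_b, count):
    """Multiply corresponding elements of two sequentially stored vectors and sum."""
    index_a, index_b = addr_a, addr_b   # index registers holding the element addresses
    remaining = count                   # third index register: elements left to process
    total = 0                           # running sum of products
    while remaining != 0:               # terminate when the third register reaches zero
        total += memory[index_a] * memory[index_b]
        index_a += 1                    # step to the next element of each vector
        index_b += 1
        remaining -= 1
    return total
```

With `memory = [1, 2, 3, 4, 5, 6]`, the call `vector_dot_product(memory, 0, 3, 3)` forms 1*4 + 2*5 + 3*6. The storage saved by the macro is the set of initialization and loop-control instructions that this loop makes explicit.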
9.1.5 Byte Manipulation Instructions
Byte manipulation instructions were investigated and several were included in the
instruction set that will be presented in the next section. The byte manipulation
instructions were included to enable the DAMP system to flexibly manage variable
length bytes for requirements such as the scientific experiments. These same
instructions can be used for character instructions.
The instruction word format for byte instructions is shown in Figure 9-7. In
order to set up this word a few assumptions were made about the operation on byte
data. It was assumed that the bytes to be operated on are generally located in a list
such that they could readily be addressed relative to the first word containing the
first byte in the list. It was also assumed that one particular byte length was
operated on for a number of instructions such that it would be efficient to use a
separate instruction to set up the byte length in a separate register. (This separate register
will be referred to as the LENR register.) Of course, if the length was changed very
often the overhead of setting up the LENR register by separate instruction would be
impractically high; however if one length is used for a number of instructions, as
assumed, the scheme specified above will be efficient since the length bits will not
have to be included in each instruction. (The length bits are used for compare
instructions, for automatic right adjustment of bytes, for storing in memory without
affecting other bits in the word, etc.)
An explanation of Figure 9-7 follows. The op code bits will give a basic byte
instruction by using the op code extension bits in addition to bits one to six. The
address will be relative to index/bank register B1 for all byte instructions. This
register will be set up with some position in the list of bytes and it will be assumed
that bit position one is the starting bit position of the first byte in this word. The
address portion of the instruction word will then simply give the starting bit number
in the list of bytes of the desired byte. In order to understand this addressing note
that the address portion of the instruction word is divided up into a word address and
a bit address. The bit address gives the starting bit position of the desired byte in
its memory word. These bits are placed into a register for instruction control. The
remaining address bits, the word address, are placed right adjusted into a memory
address register and added to B1 in order to obtain the memory word location to be
accessed for a particular instruction. If the instruction is indexed, index/bank
register T1 and the full seven bit address given in the instruction word are added
together so that the indexing can be relative to bits. This addition is carried out by
first loading the full address into the accumulator, then adding T1, and then placing
the least significant four bits in a register and the remaining bits right adjusted into
the memory address register. The memory address register will then of course be
added to B1 in order to get the proper word address.
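The address calculation for byte instructions can be summarized in a short sketch. The following Python function is illustrative only: it assumes that bit 1 of the instruction word is the most significant bit, that the tag occupies bit 9, and that the bit address is zero-based; the function and variable names are hypothetical.

```python
def byte_operand_location(instruction, b1, t1):
    """Return (memory word address, starting bit) for a byte instruction."""
    tag = (instruction >> 7) & 0x1     # bit 9: "1" means index with T1
    address = instruction & 0x7F       # full seven bit address (bits 10-16)
    if tag:
        address += t1                  # indexing is relative to bits
    bit_address = address & 0xF        # least significant four bits
    word_address = address >> 4        # remaining bits, right adjusted (MAR)
    return b1 + word_address, bit_address   # MAR is then added to B1
```

For an unindexed instruction with word address 3 and bit address 5, and B1 = 200, the operand is in word 203 starting at bit 5. If the same instruction is indexed with T1 = 12, the bit address carries into the word address and the operand moves to word 204, bit 1.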
The instruction word in Figure 9-7 allows for eight word lists without having to
change B1 or T1 (the word address is three bits). If it is found that the predominant
use of byte instructions is in loops it may be worthwhile to change the instruction
word to allow for more indexing. For example the tag could be increased to two bits
and the word address decreased to two bits. A number of byte instructions will be
given in the instruction list presented in Section 9.2.
Bits      1-6        7-8            9      10-12           13-16
Field     op code    op code ext    Tag    Word Address    Bit Address
Width     6          2              1      3               4
(Word Address and Bit Address together form the ADDRESS.)
Tag: "0" - no indexing
     "1" - index with T1
Figure 9-7. Byte Instruction Word
9.2 FUNCTIONAL DESCRIPTION OF THE CELL
This section contains a functional description of the cell hardware. A block
diagram of the cell will be presented first and then several areas of the cell's opera-
tion will be explained in more detail.
9.2.1 Cell Block Diagram
Figure 9-8 contains a general block diagram of the cell. As can be seen the
cell contains hardware that can be broken down into six general areas:
1. Memory
2. Identification
3. Arithmetic and Control
4. Inter-Cell Bus Communications
5. I/O
6. Timing
A detailed block diagram of the cell is shown in Figure 9-9. This figure con-
tains all the main registers in a cell. There is of course a lot of logical gating not
shown. However, the drawing will serve to present some insight into the detailed
operation of a cell. The registers shown also have a numeral in the right hand corner.
This indicates the length (bits) of the register and gives some indication as to its
complexity. Each of the blocks in the figure will be explained below:
U Hardware Upper Accumulator: The hardware upper
accumulator is used to hold one of the memory upper
accumulators in any operation that uses them.
L Lower Accumulator: The lower accumulator is used
primarily in multiply, divide, and double precision
operations to hold the lower half of a data word. This
accumulator and U have a one-bit extension onto their
sixteen bits in order to hold the overflow carries which
may be generated in the multiply operation.
U1, U2, U3, U4 Memory Upper Accumulators: The memory upper
accumulators are the primary arithmetic and logical
registers. They are also used to hold and manipulate
data in shift and register operations.
P Program Counter: This register is used to sequence the
flow of control in the processor. It is not only used to
access instructions but also to provide memory addresses
for interrupt status word storage. It must therefore be
connected both to the ALTU and to the memory interface
lines.
[Figure 9-8. General block diagram of the cell]

Figure 9-9. Detailed Cell Block Diagram
MAR Memory Address Register: This register holds the memory
address for operand memory cycles. It is loaded with the
address displacement from the instruction word, B1 or B2
is added to it and, if indicated, one of the index (T) registers
is also added to it. This register is necessary since the B
and T registers are in memory and must therefore enter the
processor through MB; as a result MB cannot be used to
hold the operand addresses.
MB Memory Buffer: The memory buffer receives data and
instructions from the memory, sends data to the memory,
holds the divisor in divide operations, and the multiplicand
in multiply operations. It also holds one of the operands in
all other arithmetic and logical operations with the memory.
In addition to the above tasks since the MB receives all
instructions it keeps many of these bits for the instruction
decoding and operation. For example, it holds the B bit for
address generation, the register to be shifted in a shift
operation and one of the registers to be operated on in
register operations.
B1, B2 "B" Memory Index/Bank Registers: These registers hold
both index and bank values for address calculation and
looping control. One of these two registers, indicated by
the B bit, is added to the address displacement for all
operand address calculations.
T1 to T3 "T" Memory Index/Bank Registers: These registers have
the same functions as the B registers. The only difference
is that operand addresses can be generated without adding
any T register to B plus the address displacement. (Tag 00
specified no indexing with the T registers.)
ALTU Adder, Logical, and Transfer Unit: This unit contains all
the circuitry for carrying out arithmetic and logical operations
including comparisons. It also provides for transfers amongst
all the hardware registers and detection of overflows.
IR Instruction Register: The instruction register holds the
six bit op code throughout the instruction execution.
TR Tag Register: The tag register holds the T bits of the
instructions. It is necessary so that B plus the address
displacement can be generated, stored in MAR, and then
added to T prior to an operand cycle. This register also
holds one of the register addresses in register operations.
It also holds a two bit op code extension for byte operations.
SCR Shift Count Register: This register holds the shift count for
shift commands, and for setting up bits in byte manipulation
operations. It is counted down to zero by one count for each
shift. The register can be loaded from the ALTU in addition
to the MB since shift counts may be indexed prior to being
loaded into SCR for execution. It also provides a temporary
storage area for certain control information during several
inter-cell bus communications operations.
LENR Length Register: This register is used for byte instructions
to specify the length of the byte that is being used. It is also
used to hold the op code extension bits in shift and register
instructions.
HA Hardware Accumulator: This register identifies which of the
4 accumulators is presently in hardware. Every instruction
that specifies an accumulator must compare these 2 bits to
that in the instruction. If the accumulator specified in the
instruction is in memory, the hardware accumulator must be
stored in memory and the other accumulator brought into the
hardware position (the next Section 9.2.2 will discuss this in
more detail).
CFF Comparison Flip-Flops: These flip-flops hold the results of
a comparison; greater than, less than, or equal. They may
be tested by an instruction for a conditional jump, skip, etc.
O Overflow Flip-Flop: This flip-flop is set if an overflow
results in the ALTU during an arithmetic operation. Like
the CFF, it may be tested by an instruction.
F Failure: This flip-flop is set by the hardware failure
detection circuitry. It can be sampled by an instruction
during software self test.
IMR Interrupt Mask Register: This register is set by the
programmer to mask off any interrupts that are not to be
allowed.
CCM Controller Cell Mode: This register holds the mode of a
cell in the controller cell state. It has no meaning in cells
in any other state. It will be set to one of the four transmit/
execute modes and will control the execution of instructions
in the controller cell.
Cell ID: This register holds the cell address or identification. It is
used in the Bus I/O control section to decode commands over
the Bus.
State: This register holds the current state of a cell. It will be in
one of 5 states:
000 INDEPENDENT
001 DEPENDENT GLOBAL
010 DEPENDENT LOCAL
011 DEPENDENT WAIT
100 CONTROLLER CELL
Level: This register specifies one of eight levels that the cell may
use as a means of identification during dependent state
operation.
BCR Bus Communication Register: This register receives the
present byte on the inter-cell bus. It is used by the Bus
I/O Control to decode commands and control the bus
communications.
Bus I/O: This register holds the identification of the present operation
being carried out on the inter-cell bus (input, output, etc).
IC Busy INTER-CELL BUS BUSY: This flip-flop is used to determine
if the bus is presently being used.
Timer 1, 2: These two timers are used to time certain operations over
the inter-cell bus. Expiration of the timers during these
operations results in an interrupt being issued.
ION R I/O and Neighbor Register: This is a serial/parallel in and
out buffer register used for communications between I/O
devices and one of the four neighbor cells. It is 18 bits
long due to the need for sync and control bits in addition to
the data bits (16 bits).
Timer 3: This timer is set during certain I/O operations such as
requesting data from a neighboring cell. Its expiration will
cause an interrupt.
Counter: This is a counter to count the serial in/out of the ION R.
It will issue interrupts during I/O operations.
I/O: This flip-flop is set to either the input or output state during
an I/O instruction execution. It controls the type of interrupt
generated by the COUNTER mentioned directly above.
FU1, 2, 3, 4 ACCUMULATOR FLAGS: These flags are set to one of five
states: N, E, S, W, or X. The first four indicate the accumulator
holds data for a neighboring cell and the last one
indicates no neighbor data. There is one flag register for
each accumulator.
9-23
RTC EXT Real Time Clock Extension: This is a simple counter driven
by the clock.
RTC Real Time Clock: This register is a simple counter driven
by the RTC EXT. It is set and read by instructions and
counts down to zero.
BTC Bit Time Counter: This counter is driven by the clock and
generates two pulses (counts to two). It is used to control
the instruction execution and drive the instruction decoding
and control generation logic.
MC Mode Counter: This counter is driven by the BTC and
generates eight control signals. It drives the instruction
decoding and control generation logic.
It should be mentioned here that the length of the registers indicated in
Figure 9-9 does not include any bits for fault detection. At least one bit for parity
will be added on to several registers such as U, L, IONR, and BCR. Other registers
such as HA, CFF, O, etc. may be combined for generating parity and, therefore,
will not necessarily have an extra bit for each of these registers.
9.2.2 Accumulator Mechanization
A somewhat more detailed discussion of the accumulators is necessary in order
to understand their use. There are a number of methods of handling the four
accumulators in each cell. One method would assign tag bit combinations to each
accumulator such that U1 is the hardware accumulator and U2, U3, and U4 are the
memory accumulators. Each instruction would then specify one of these accumulators
and cause its contents to be exchanged with those of the hardware accumulator, U1,
for execution of the operation. At the conclusion of the operation the hardware
accumulator would keep its current value so that any further operations on this
value would be carried out by specifying U1. This scheme uses the minimum amount
of processor hardware for handling the accumulators, but it makes it difficult for
the programmer to keep track of information that was initially or subsequently
stored in one of the accumulators. It would also be necessary at the ends of a
branch to restore the accumulators to some specified ordering of information that
is consistent between the branches. Because of these two disadvantages, this
scheme was not chosen.
A second scheme would return the named accumulator to its original location
at the completion of each operation. This means that unless the hardware accumulator,
U1, was the one in operation, a final exchange of the present value of U1 and the
original value of U1 in one of the memory accumulators would have to be made. The
main disadvantage of this approach is that it requires two extra memory cycles for
any accumulator operation that does not use U1. This extra time could be considerable:
when an accumulator is brought into execution it is generally used for a few
instructions, and each of those operations would require two extra memory cycles.
For this reason this approach was not chosen.
A third approach is also possible. Four sets of tag bits are held in
the processor to specify the tag associated with the hardware or memory accumulator
positions. (In this scheme U1, U2, U3, and U4 can be in any of three memory
positions or the hardware position.) Each instruction operating with an accumulator
simply specifies the tag bits of the accumulator that it would like to use. This
accumulator is then found by an automatic comparison to the accumulator tag bits,
HA1, MA2, MA3, and MA4. The specified accumulator is then loaded into the
processor accumulator position (if it is not already there) and the accumulator tag
bits are updated to reflect the present locations of U1, U2, U3, and U4. (When the
machine is started, these tag bits must be loaded into HA1, MA2, MA3, and MA4 in
any order.) This scheme requires slightly more processor hardware than the other
schemes mentioned, but it has the advantage of leaving the last accumulator referenced
in the hardware accumulator position. Therefore only the first reference to a new
accumulator requires an extra memory cycle (the exchange of accumulators can be
carried out in one memory cycle). It should be noted that the accumulator tag bits in
the processor must be stored after an interrupt in order to enable proper restarting of
an interrupted program.
A fourth approach similar to the last scheme is described below. Four locations
in memory, as shown in Figure 9-9, are used to hold U1 to U4. Whatever accumulator
is referenced by the instruction word tag bits is placed in the hardware accumulator
position and the present contents of the hardware accumulator are returned to their
proper memory position. Two control bits HA1 are necessary so that the accumulator
presently in hardware can be specified and compared to the tag bits in the instruction.
Clearly, if the present hardware accumulator is specified no memory access is
required. This scheme then accomplishes the same operation as the last scheme
(it leaves the last referenced accumulator in the hardware position) but uses slightly
less processor hardware for control and one more memory location. It also requires
only one additional memory cycle when an accumulator from memory is specified,
since the present accumulator value should be able to be replaced in its proper memory
position and the new accumulator picked up all in one memory cycle. At the same time
the HA bits will be updated to the new accumulator tag. This scheme was selected
over the third approach described above because it should actually require less total
hardware. It requires less control and register hardware in the processing
section because only the HA tag bits are needed. More important, though, is that fewer
bits will have to be stored upon an interrupt, which may actually make up for the extra
memory location used for the accumulator.
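The selected (fourth) scheme can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name, the method name, and the cycle counter are hypothetical and not part of the report, and the exchange is modeled as costing exactly one extra memory cycle, as the text expects.

```python
class AccumulatorBank:
    """Sketch of the fourth scheme: four memory homes for U1..U4, one
    hardware accumulator, and a 2-bit HA tag naming the one in hardware."""

    def __init__(self):
        self.memory = [0, 0, 0, 0]  # home locations of U1..U4 (index 0..3)
        self.hardware = 0           # contents of the hardware accumulator
        self.ha = 0                 # HA tag: which accumulator is in hardware
        self.extra_cycles = 0       # extra memory cycles spent on exchanges

    def select(self, tag):
        """Bring accumulator `tag` (0..3) into hardware; return extra cycles."""
        if tag == self.ha:
            return 0                          # already in hardware, no access
        self.memory[self.ha] = self.hardware  # old accumulator goes home
        self.hardware = self.memory[tag]      # new accumulator is picked up
        self.ha = tag                         # HA bits updated to the new tag
        self.extra_cycles += 1                # modeled as one memory cycle
        return 1


bank = AccumulatorBank()
bank.hardware = 5          # U1 holds 5 while in hardware
first = bank.select(2)     # first reference to U3 costs one extra cycle
second = bank.select(2)    # later references to U3 cost nothing
```

Only the HA tag has to be saved on an interrupt in this model, which is the property the text cites in favor of the scheme.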
9.2.3 Instruction Set
This section gives the instruction set selected for the distributed processor.
Since 6 bits are used for the op code in the instruction word, there are a total of
64 op codes that may be used for the instructions. However, this does not limit the
set to 64 instructions since many op codes will use op code extension thereby mechan-
izing many instructions from one basic op code.
There are four accumulators, and referencing each accumulator for operations
such as add, subtract, multiply, etc. requires one op code for each reference.
Therefore, four op codes are required to provide an add instruction for each
accumulator. There are not enough op codes to provide full accumulator specification
for every instruction that specifies an accumulator. Therefore, the approach taken was
to provide full accumulator capability only for the more important instructions, such
as add and subtract. Other instructions will be limited to fewer accumulators or to
the hardware accumulator only.
The operation times listed below with the instruction set are given in terms of
memory cycles. This leaves open the specification of the memory cycle time and
the processor bit time. However, if one wishes to assign values for preliminary
performance estimates, a value of 1 μsec for memory cycle time and 0.5 μsec
for processor bit time may be used.
The following is a list of abbreviations used in the specification of the
instruction set:
m: Address Displacement
M: Address After Banking and Indexing (if specified)
(M): Contents of Addressed Memory Position
->: Replaces
U: Any of the Upper Accumulators
U1, U2, U3, U4: Upper Accumulators
U123: Upper Accumulator U1, U2 or U3
U12: Upper Accumulator U1 or U2
U12,34: Upper Accumulators U1 and U2 or U3 and U4
Uh: Hardware Upper Accumulator
Um: Any of the Memory Upper Accumulators
L: Lower Accumulator
Tn: Any of Bank/Index Registers Specified by the T Bits
B: Bank/Index Register One or Two
R: Any of the Bank/Index Registers or Accumulators
MB: Memory Buffer Register
MAR: Memory Address Register
RTC: Real Time Clock
P: Program Counter
LUn: n Position Left Shift of U
RUn: n Position Right Shift of U
LcUn: n Position Left Cycle of U
RcUn: n Position Right Cycle of U
Before giving the instruction list, an example of the execution of an add
instruction will be given to clarify the use of memory cycles.
INSTRUCTION: ADD
(M) + U -> U
This is executed as follows with no indexing and with U located in the hardware
location:
Instruction access: m -> MAR, (P) -> MB             1 memory cycle
Bank access: B + MAR -> MAR                         1 memory cycle
Operand access and execution: (M) + U -> U          1 memory cycle
Total                                               3 memory cycles
It is executed as follows with indexing and with U located in one of the memory
accumulator positions:
Instruction access: m -> MAR, (P) -> MB             1 memory cycle
Bank access: B + MAR -> MAR                         1 memory cycle
Index access: Tn + MAR -> MAR                       1 memory cycle
Accumulator access: Uh -> Um (old accumulator
put in its location), Um -> Uh (new accumulator
picked up)                                          1 memory cycle
Operand access and execution: (M) + U -> U          1 memory cycle
Total                                               5 memory cycles
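The two cases above follow one counting rule, which can be sketched as below. The function name and its parameters are hypothetical; the 1 microsecond memory cycle is the preliminary estimate suggested earlier in this section, not a specification.

```python
def add_memory_cycles(indexed, accumulator_in_memory):
    """Memory cycles for an ADD, following the worked example in the text."""
    cycles = 3                 # instruction access + bank access + operand access
    if indexed:
        cycles += 1            # index access: Tn + MAR -> MAR
    if accumulator_in_memory:
        cycles += 1            # accumulator exchange: Uh -> Um, Um -> Uh
    return cycles


MEMORY_CYCLE_USEC = 1.0        # preliminary estimate only

fastest = add_memory_cycles(indexed=False, accumulator_in_memory=False)
slowest = add_memory_cycles(indexed=True, accumulator_in_memory=True)
slowest_usec = slowest * MEMORY_CYCLE_USEC
```

The same two increments account for the "3 (4) mc" entries in the instruction list: the parenthesized figure adds the accumulator exchange, and indexing adds one more cycle on top of either figure.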
OP Code: 1-4
Instruction: ADD, 3 (4) mc (memory cycles) (1)
Operation: (M) + U -> U
Comments: Since the memory contents are loaded into the memory buffer
register, the actual add is between the MB and the accumulator. Note that
any of the 4 accumulators can be used.
OP Code: 5-8
Instruction: SUBTRACT, 3 (4) mc
Operation: (M) - U -> U
OP Code: 9-12
Instruction: MULTIPLY, ~7 (8) mc
Operation: (M) x U -> U, L
Comments: A one bit at a time multiply is used since speed is not critical.
OP Code: 13-15
Instruction: DIVIDE, ~7 (8) mc
Operation: U123, L ÷ (M) -> U123 (quotient), L (remainder)
Comments: This time is for a straight one bit at a time divide. Note that
this instruction is limited to 3 accumulators. It also uses a U, L dividend
so that double precision divide software is possible to implement. For
single precision divides the L register can be zeroed, if desired.
OP Code: 16-17
Instruction: AND, 3 (4) mc
Operation: (M) . U12 -> U12
Comments: Note that only two of the accumulators can be used for the
logical and.
(1) One additional memory cycle should be added to this and any other instructions
that are to be indexed (specified by a T bit). The execution time in parentheses
represents the execution time if the accumulator is in memory and not in the
hardware location.
OP Code: 18-19
Instruction: OR, 3 (4) mc
OP Code: 20
Instruction: EXCLUSIVE OR, 3 (4) mc
Comments: This instruction is limited to the contents of the hardware
accumulator. Note that this type of limitation still enables any accumulator
to be used. It is simply necessary to load the desired accumulator into the
hardware accumulator position by an earlier instruction.
OP Code: 21
Instruction: SUM OF PRODUCTS MULTIPLY, 7 mc
Operation: (M) x Uh + U1, L -> Uh, L
Comments: Note that this is a full length sum of products multiply. This is
not possible with only one accumulator.
OP Code: 22
Instruction: SHIFT SINGLE PRECISION
NOTE: The shift instructions, op code 22 and 23,
use an op code extension format shown below:
bits:    1-6       7-8    9    10-11     12-16
         OP CODE   ACC    T    OP EXT    COUNT
where ACC: One of the 4 accumulators, U
T: A zero specifies no indexing, a
one specifies indexing the shift
with index register T1
OP EXT: Two bits to provide 4 instructions,
they are held in the LENR register.
COUNT: The amount of the shift up to 32 positions
(32 are needed to handle double precision
instructions given in op code 23).
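As an illustration of the op code extension format, the sketch below packs and unpacks a 16-bit shift instruction word. It assumes the layout is a 6-bit op code (bits 1-6), 2-bit ACC (bits 7-8), 1-bit T (bit 9), 2-bit OP EXT (bits 10-11), and 5-bit COUNT (bits 12-16), and that bit 1 is the most significant bit. The bit-numbering direction and the function names are assumptions, not taken from the report.

```python
def encode_shift(op_code, acc, t, op_ext, count):
    """Pack the assumed shift-instruction fields into a 16-bit word."""
    return (((op_code & 0x3F) << 10) | ((acc & 0x3) << 8) | ((t & 0x1) << 7)
            | ((op_ext & 0x3) << 5) | (count & 0x1F))


def decode_shift(word):
    """Unpack a 16-bit shift instruction word into its assumed fields."""
    word &= 0xFFFF
    return {
        "op_code": (word >> 10) & 0x3F,  # bits 1-6
        "acc": (word >> 8) & 0x3,        # bits 7-8: one of 4 accumulators
        "t": (word >> 7) & 0x1,          # bit 9: index the shift with T1
        "op_ext": (word >> 5) & 0x3,     # bits 10-11: selects 22a..22d
        "count": word & 0x1F,            # bits 12-16: raw shift count field
    }


word = encode_shift(op_code=22, acc=2, t=0, op_ext=1, count=12)
fields = decode_shift(word)
```

The sketch treats COUNT as a raw 5-bit value; how a shift of 32 positions would be represented in that field is not stated here and is left open.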
OP Code: 22a
Instruction: Short Right Shift, 1 (2) mc + n-1 bit times (1)
Operation: RUn -> U
Comments: It should be noted that on all noncyclic right shifts the sign
is spread whereas noncyclic left shifts insert zeros.
OP Code: 22b
Instruction: Short Left Shift
Operation: LUn -> U
OP Code: 22c
Instruction: Short Logical Right Shift
Operation: RUn -> U
Comments: '0's are inserted at the left.
OP Code: 22d
Instruction: Short Logical Right Cycle
Operation: RcUn -> U
OP Code: 23
Instruction: SHIFT DOUBLE PRECISION
OP Code: 23a
Instruction: Long Right Shift
Operation: RU,Ln -> U, L
Comments: One of the four accumulators and the lower accumulator are
shifted n positions.
OP Code: 23b
Instruction: Long Left Shift
Operation: LU,Ln -> U, L
OP Code: 23c
Instruction: Long Logical Right Shift
Operation: RU,Ln -> U, L
Comments: With '0's inserted on the left.
OP Code: 23d
Instruction: Long Logical Right Cycle
Operation: RcU,Ln -> U, L
OP Code: 24-25
Instruction: DOUBLE PRECISION ADD, 5 (6) mc
Operation: (M) + U12,34 -> U12,34
Comments: Two sequential memory locations are added to accumulators 1
and 2 or 3 and 4.
OP Code: 26-27
Instruction: DOUBLE PRECISION SUBTRACT, 5 (6) mc
Operation: (M) - U12,34 -> U12,34
OP Code: 28-31
Instruction: LOAD ACCUMULATOR, 3 (4) mc
Operation: (M) -> U
(1) Same execution time for all shift instructions given by OP Codes 22 and 23.
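The fill rules stated for the single precision shifts (sign spread on noncyclic right shifts, zeros inserted on left shifts and logical right shifts, bits recirculated on a cycle) can be sketched on a 16-bit word as follows. The function names are hypothetical, and the sketch models only the result of each shift, not the bit-serial timing.

```python
WORD_BITS = 16
MASK = 0xFFFF
SIGN = 0x8000


def short_right_shift(u, n):
    """Noncyclic right shift: the sign bit is spread into vacated positions."""
    u &= MASK
    sign = u & SIGN
    for _ in range(n):
        u = (u >> 1) | sign
    return u


def short_left_shift(u, n):
    """Noncyclic left shift: zeros are inserted at the right."""
    return (u << n) & MASK


def short_logical_right_shift(u, n):
    """Logical right shift: zeros are inserted at the left."""
    return (u & MASK) >> n


def short_logical_right_cycle(u, n):
    """Right cycle: bits shifted out on the right re-enter on the left."""
    u &= MASK
    n %= WORD_BITS
    return ((u >> n) | (u << (WORD_BITS - n))) & MASK


spread = short_right_shift(0x8000, 3)          # sign spread over 3 positions
zeroed = short_logical_right_shift(0x8000, 3)  # zeros inserted instead
```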
OP Code: 32-35
Instruction: STORE ACCUMULATOR, 3 (4) mc
Operation: U -> (M)
OP Code: 36-37
Instruction: DOUBLE PRECISION LOAD ACCUMULATORS, 5 (6) mc
Operation: (M) -> U12,34
Comments: Two sequential memory locations are loaded into accumulators 1
and 2 or 3 and 4.
OP Code: 38-39
Instruction: DOUBLE PRECISION STORE ACCUMULATORS, 5 (6) mc
Operation: U12,34 -> (M)
OP Code: 40-43
Instruction: COMPARE, 3 (4) mc
Operation: U > (M), CFF = G
U = (M), CFF = E
U < (M), CFF = L
Comments: This instruction compares any one of the accumulators to
memory; it results in the comparison flip-flops (CFF) being
set to one of three states.
OP Code: 44
Instruction: INCREMENT AND REPLACE, 3 mc
Operation: (M) + Uh -> M
OP Code: 45
Instruction: EXECUTE
Operation: (M) -> MAR
Comments: This instruction sets up the address of the next instruction
in the memory address register. This register is then used to
pick up the next instruction. The program counter is not
incremented for this instruction. It will then be incremented and
used to continue the normal instruction flow after the instruction
addressed by the execute (unless this instruction is a jump).
The execute is then just a one instruction jump. However, it
should be noted that it is not considered completed until the
next instruction is completed. (It is not interruptable until this
time.)
OP Code: 46
Instruction: LOAD ADDRESS, 3 mc
Operation: m + B + T -> Uh
OP Code: 47
Instruction: JUMP, 2 mc
Operation: M -> P
OP Code: 48
Instruction: JUMP AND SET INDEX, 3 mc
Operation: P -> Tn
(m+B) -> P
Comments: This instruction is for subroutine linkage. Note that the
loading of P is indirected. The instruction must be preceded by
a load bank, if the subroutine starting location is not contained
in the program's working storage. Note that the command
also stores the return address in an index-bank register. A
JUMP instruction indexed by the same index-bank register will
return back to the original program exit plus the increment (m).
This instruction is also convenient if it is desired to have the
calling sequence (subroutine parameter values) in the program
bank after the instruction. (For example the periodic programs
may obtain the I/O variables and place them here.) In this case
the loaded index register can also be used by the subroutine to
get at the calling sequence.
OP Code: 49
Instruction: JUMP ON COMPARE FF, 2 mc
Operation: If any condition TRUE then: B + m -> P
Comments: The T bits of the instruction hold the conditions (greater than,
less than, Equal, or Overflow) that are to be compared to the
CFF.
OP Code: 50
Instruction: LOAD B/T INDEX/BANK REGISTERS
Comments: Two instructions are derived from this Op Code.
OP Code: 50a
Instruction: LOAD B REGISTERS, 4 mc
Operation: (M) -> B
Comments: This instruction is executed if the T bits = 0. M is given by
B+m, where B is either B1 or B2.
OP Code: 50b
Instruction: LOAD T REGISTERS, 4 mc
Operation: (M) -> T
Comments: This instruction is executed if T ≠ 0. M is given by B+m and
T is specified by the T bits.
OP Code: 51
Instruction: STORE B/T INDEX/BANK REGISTERS
Comments: Two instructions are derived from this Op Code.
OP Code: 51a
Instruction: STORE B REGISTERS, 3 mc
Operation: B -> (M)
Comments: This instruction is executed if T = 0. M is given by B+m.
OP Code: 51b
Instruction: STORE T REGISTERS, 4 mc
Operation: T -> (M)
Comments: This instruction is executed when T ≠ 0. M is given by B+m.
OP Code: 52
Instruction: LOAD/INCREMENT B/T REGISTERS
Comments: Three instructions are derived from this Op Code.
OP Code: 52a
Instruction: LOAD T IMMEDIATE, 2 mc
Operation: m -> T
Comments: This instruction is executed if T ≠ 0. Also B = 0, otherwise
52b will be executed.
OP Code: 52b
Instruction: INCREMENT T IMMEDIATE, 3 mc
Operation: m + T -> T
Comments: This instruction is executed if T ≠ 0 and B = 1.
OP Code: 52c
Instruction: INCREMENT B IMMEDIATE, 3 mc
Operation: m + B -> B
Comments: This instruction is executed if T = 0.
OP Code: 53
Comments: This Op Code is used to derive two instructions.
OP Code: 53a
Instruction: LOAD B IMMEDIATE, 2 mc
Operation: m -> B
Comments: This instruction is executed if bit 8 (one of the T bits) is 0.
This leaves 8 bits for m.
OP Code: 53b
Instruction: LOAD STATUS, 3 mc
Operation: (M) -> L, P, HA, CFF, O, CCM
Comments: This instruction is executed if Bit 8 = 1. A direct address
given by bit 7 and bit 9 through 16 (9 bits - full address) is used
to form M and access the first of two sequential locations that
contain the initializing information for the L, P, HA, CFF, O,
CCM registers. These two locations will be the place an
interrupting program stores this information that was automatically
stored upon an interrupt.
OP Code: 54
Instruction: INCREMENT B/T TEST AND JUMP, 2 mc
Comments: This instruction is explained by showing the contents of the
instruction and B/T register.
bits:            6     3     1      6
INSTRUCTION:   | OP | B/T | INCR | JUMP |
bits:            7       9
B/T REGISTER:  | MOD | ADDRESS |
It should be recalled that the B/T registers are in memory (16 bits) and
only 9 bits are required for the full address. Therefore, the excess 7 bits
are used in mechanizing this instruction.
This instruction fetches the designated B or T register (B if T = 0) as indicated
by the B/T bits in the instruction. It increments it (+1) as indicated by the increment
bit in the instruction (both the Mod and the ADDRESS in the B/T register are incre-
mented, but the MOD is always counted down by 1). The amount of increments is
specified by MOD in the B/T register (up to 128). If the MOD equals 0 then the JUMP
takes place. The JUMP (+32) is with respect to the P register.
Note that if a T register is being used, the B bit in the instruction can extend
the increment value to +2.
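One step of this instruction can be modeled as below. This is a sketch under several assumptions that the text does not settle: MOD occupies the upper 7 bits of the 16-bit B/T register word and ADDRESS the lower 9, the zero test is made after MOD has been counted down, and the program counter advances by 1 when no jump occurs. All function names are hypothetical.

```python
def pack_bt(mod, address):
    """Build a 16-bit B/T register word from a 7-bit MOD and 9-bit ADDRESS."""
    return ((mod & 0x7F) << 9) | (address & 0x1FF)


def unpack_bt(reg):
    """Return (MOD, ADDRESS) from a 16-bit B/T register word."""
    return (reg >> 9) & 0x7F, reg & 0x1FF


def increment_test_jump(reg, p, incr, jump):
    """One execution: step ADDRESS by incr, count MOD down, jump when MOD is 0.

    `jump` is the signed displacement relative to the P register.
    Returns (new register word, new program counter).
    """
    mod, address = unpack_bt(reg)
    address = (address + incr) & 0x1FF   # ADDRESS field is incremented
    mod = (mod - 1) & 0x7F               # MOD is always counted down by 1
    new_p = p + jump if mod == 0 else p + 1
    return pack_bt(mod, address), new_p


reg = pack_bt(2, 100)
reg, p = increment_test_jump(reg, p=10, incr=1, jump=-4)   # MOD 2 -> 1, no jump
reg, p = increment_test_jump(reg, p=p, incr=1, jump=-4)    # MOD 1 -> 0, jump
```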
OP Code: 55
Comments: This Op Code is used to derive 4 B/T register instructions.
OP Code: 55a
Instruction: INCREMENT B/T COMPARE IMMEDIATE AND SKIP, 2 mc
bits:            6     3     7
INSTRUCTION:   | OP | B/T | COMP |
bits:            2      2      3       9
B/T REGISTER:  | code | COMP | INCR | ADDRESS |
The B/T bits in the instruction specify which B/T register is to be fetched. The
first two bits in the B/T register are then decoded to derive which one of the four
instructions given by OP Code 55 is to be performed. The B/T register (the address)
is incremented (+1, 2, 4, 8) as specified by the INCR bits in the B/T register. The
COMP bits in the instruction and the B/T register are combined to get a full 9 bit
comparison value that is compared to the address in the B/T register. If the
comparison is not equal, the program counter is incremented by 2; if they compare,
it is incremented by 1.
OP Code: 55b
Instruction: INCREMENT T, COMPARE TO A VARIABLE AND SKIP, 4 mc
bits:            6     3       7
INSTRUCTION:   | OP | B/T | ADDR. DISPL. |
bits:            2      2       3       9
T REGISTER:    | code | SPARE | INCR | ADDRESS |
The B/T bits are used as follows. The T bits cannot = 0 in this instruction;
they specify which T register is to be fetched. The first two bits in the T register
identify this particular instruction (55b). Then the B bit in the instruction is used to
fetch the variable (located by B + m) using the address displacement field in the
instruction as done in conventional instructions. The address in the T register is
incremented (+1, 2, 4, 8) as specified by the INCR bits and compared to the variable.
If they do not compare, the program counter is incremented by 2; if they compare,
it is incremented by 1.
OP Code: 55c
Instruction: INCREMENT B/T AND JUMP ON 0, 2 mc
bits:            6     3      7
INSTRUCTION:   | OP | B/T | JMP ADDR |
bits:            2       2         3       9
B/T REGISTER:  | code | JMP ADDR | INCR | ADDRESS |
The B/T bits specify which register to fetch. The first two bits identify this
Op Code (55c). The B/T register (address) is incremented (+1, 2, 4, 8) as specified
by the INCR in the B/T register. The address is then tested against 0; if it is 0,
then a full address is generated from the 7 bits in the instruction and the 2 bits in the
B/T register to be used as a jump (replaces the program counter). If it is not 0 then
the program counter is simply incremented by 1.
OP Code: 55d
Instruction: INCREMENT B/T AND JUMP ON +, 2 mc
bits: 6 3 7
bits: 2 2 3 9
This is identical to 55c except the jump takes place if the address is +.
OP Code: 56
Instruction: LOAD BYTE
Comments: This op code is used to form 4 different types of instructions
for loading bytes. See Section 9.1.5 for a discussion
of the byte instructions and the byte instruction word format.
OP Code: 56a
Instruction: CLEAR U AND PLACE BYTE IN HIGH ORDER END,
3 mc + (NR + NL + 1) bit times (2)
Operation: (M) -> Uh
UhBL -> Uh (1)
(1) This transfer and all others involving UhB contain at least the following
operations. (UhB represents the desired byte in its unshifted position in the
hardware accumulator with all other bits zero. UhBL (UhBR) is the same accumulator
only left adjusted (right adjusted).)
1. The bits including the leftmost of UhB are right shifted into L.
2. The accumulator is zeroed.
3. Only the byte is shifted back into U. (The number of bits to be shifted is
controlled by length in LENR.) The byte will be right adjusted at this point.
4. The byte will now be shifted to its original position or right or left adjusted.
(2) This same execution time applies to all the memory addressing byte instructions.
NR represents the number of bit positions shifted right into L. NL represents the
number of bit positions shifted left back into Uh. The one is added for the zeroing
of Uh.
OP Code
56b
56c
56d
57
57a
Instruction
CLEAR U AND
PLACE BYTE IN
LOW ORDER END
CLEAR U AND
PLACE BYTE IN
UNSHIFTED
PLACE BYTE IN
UNSHIFTED
STORE BYTE
STORE BYTE IN
HIGH ORDER END
Operation
(m---uh
%BR----%
(m----%
UhB----Uh
,, ----.,T (3)
_hS _1
(M) ----_U h
UhB---_U h
U h V U1----U h
(or)
UhB_---_U1
(m---u h
UhsH----Uh
uhv uf--u h
Uh------ (M)
Comments
Remaining bits are left in
accumulator.
4 store byte instructions are
derived from this Op Code.
The rest of the memory location
remains the same. UhS H is the
same as UhS except the byte is
left in the higher order end of
the accumulator.
(3)This transfer is executed as follows:
1. The bits including the leftmost of the desired byte are right shifted
into L.
2. L is then left shifted the byte length, but zeros are inserted into the
lower end of U. The remaining bits are shifted back into U. (This
has simply zeroed the desired byte position in Uh).
3. U h then replaces U 1.
OP Code
57b
57c
57d
58
58a
58b
Instruction
STORE BYTE IN
LOW ORDER END
STORE BYTE
UNSHIFTED
LOAD LENGTH
REGISTER
COMPARE BYTE
ALGE BRAIC BYTE
COMPARISON
LOGICAL BYTE
COMPARISON
Operation
Vhsff----vl
(m_u h
VhsL_Vh
UhVU1--%
u h-------(M)
UhB----_U 1
(M_----uh
vhs----vh
VhVV_---vh
M13: I_----'LE NR
(MB) > UhB,
CFF = G
(MB) = UhB,
CFF = E
(MB) < UhB,
CFF = L
Comments
Ch----(M)
The four bits are used from
bits 13 through 16 of the
instruction.
This OP Code can be used to
derive 4 compare instructions.
However only 3 have been
included thus far.
Note the comparison will assume
that the same position in the
memory word and the accumulator
are to be compared. MB
represents the proper memory
byte. For this comparison the
first bit is considered to be a
sign bit.
Same as above except no sign
bit is considered.
OP Code
58c
59
Instruction
COMPARE BYTE
ABSOLUTE
ACCUMULATOR-
ACCUMULATOR
AND NEIGHBOR
INSTR UCT IONS
Operation
I(%)1> oc =o
Comments
Many instructions are
derived from this Op code.
They are listed at the end
of the instruction list.
OP Code: 60
Instruction: REGISTER-REGISTER INSTRUCTIONS
Comments: This Op code is used to derive many instructions for operating
on the registers. They are listed at the end of the instruction list.
OP Code: 61
Instruction: SKIP INSTRUCTIONS - ACCUMULATORS
Comments: Many conditional skips are derived from this Op code.
They are listed at the end of the instruction list.
OP Code: 62
Instruction: CC INSTRUCTIONS
Comments: A variety of CC instructions are derived from this Op
code and are listed at the end of the instruction list.
OP Code: 63
Instruction: LONG INSTRUCTION FORMAT (32 bits) -1 (Input/Output)
Comments: This instruction and the one that follows use two
words for their mechanization. The format of this long instruction
is shown for OP Code 63. This particular long instruction
is used for I/O and is explained in Section 7.2.
BITS:           6     1       9           7             9
INSTRUCTION:  | OP | I/O | Address | COND/DEVICE | WORD COUNT |
OP Code: 64
Instruction: LONG INSTRUCTION -2
Comments: The instructions derived from this format are given
at the end of the instruction list.
It was seen that some of the instructions used OP Code extension to derive
many instructions from one OP Code. Instructions 59, 60, 61, 62, and 64 will be
presented separately below.
OP CODE 59 Accumulator - Accumulator and Neighbor Instructions
CASE 1
BITS: 6 1 5 2 2
R2 is always an accumulator in this cell.
R1 is either an accumulator in this cell or is one of the
4 neighbor cells. If OP2 is even, then this cell accumulator
is used; if OP2 is odd, then a neighbor is used, except for
some cases as shown below.
OP2 is
0-1 Load
2 Exchange
3 Double Precision Exchange
4-5 Add
6-7 Subtract
8-9 Multiply, 32 bit result
10-11 Divide
12-13 AND
14-15 OR
16-17 XOR
18-19 Logical Add
20-21 Compare, Arithmetic
22-23 Compare, Logical
24-25 Load Byte
26-27 Multiply, 16 Bit Result
28-29 Load Double Precision
30 Set Flag, Accumulator R2 to neighbor R1
31 Double Precision Add
From CASE 2 Format (below)
1-2 Double Precision Subtract
3 Double Precision Compare
CASE 2 Single Register (see also OP Code 60)
BITS: 6 1 5 4
R2 is a register in this cell:
4 Accumulators
3 Index Registers
2 Bank Registers
1 Level Register
2 Pairs of Double Precision Accumulators
1 Program Counter
1 State and Level Register
1 L Register
1 Real Time Clock
OP2 is
0 1's Complement
1 2's Complement
2 Absolute Value
3 Negative Value
4 Clear to Zero
5 Set to all Ones
6 Jump to Address
7 Store P
8-15 Spare
16 Skip if R2 = N
17 Skip if R2 ≠ N
18 Skip if R2 ≥ N
19 Skip if R2 > N
20 Skip if R2 < N
21 Skip if R2 ≤ N
22 Skip if R2 all ones
23 Skip if R2 even
24 Skip if R2 odd
25-28 Spare
OP CODE 60 Register - Register Instructions
BITS: 6 2 4 4
R2 is any Register
R1 is any Register
4 Accumulators
3 Index Registers
2 Bank Registers
1 Level Register
1 Program Counter
1 State and Level Register
2 Pairs of DP Accumulators
1 L Register
1 Real Time Clock
OP2 is
0 Load
1 Exchange
2 Add
3 Compare
OP CODE 61 Skip Instructions - Accumulators
BITS: 6 3 3 4
R2 is one of the following
3 Index Registers
4 Accumulators
2 Bank Registers
1 Level Register
R1 is one of the following
4 Accumulators
3 Index Registers
T is type of Skip. Skip if
0 R1 = R2
1 R1 ≠ R2
2 R1 ≥ R2
3 R1 < R2
4 R1 > R2
5 R1 ≤ R2
6 Overflow (accumulators), I/O tests, etc.
7 Spare
OP CODE 62 CC INSTRUCTIONS
bits: 6 3 4 3
This instruction is used to control the modes of the controller cell and to
generate the commands for the inter-cell bus operation. For certain instructions it
uses an extended format. It was explained in Section 6.
C Controller Cell Field
000 Do not change mode
001 Control bus I/O logic
010 Bus I/O (Generate GC Commands) (Sent over inter-cell Bus)
011 Bus I/O (Generate GC Commands) (Sent over inter-cell Bus)
100 Transmit mode, single
101 Transmit mode, all
110 Execute mode, single
111 Execute mode, all
If C = 000, Do not change mode, then the following instructions may be
formed:
IF OP = 000
No Operation
IF OP = 001
Format Instructions:
L Instruction
000 A
001 D16
010 D32
011 A, D16
100 A, D32
101 I
110 DS
IF OP = 010
State and level control instructions:
L Instruction
000 Go to Controller Cell State, Set Level
001 Go to Independent State, Set Level
010 Go to Independent State, No Level Change
011 Go to Dependent Wait State, Set Level
100 Go to Dependent Wait State, No Level Change
IF C = 001, Control Bus I/O Logic
There is room for many instructions here. Thus far,
two have been defined.
a. Prepare to input over the bus
b. Prepare to output over the bus
IF C = 010 or 011, Generate commands to be sent over the
inter-cell bus
There are eight basic commands x0 - x7 generated here.
They will be defined below. Section 10 discusses in great detail how
they are formed from the CC instruction. All these
commands are sent out over the bus and Section 10 describes
their functional usage.
L Command
000 x0 - Global Mode Command
a) Format Instructions
x0 - 1 A Format
x0 - 2 D16 Format
x0 - 3 D32 Format
x0 - 4 A, D16 Format
x0 - 5 A, D32 Format
x0 - 6 I Format
x0 - 7 DS Format
x0 - 8 End DS Format
b) State Control via Levels
x0 - 9 Go to Global State
x0 - 10 Go to Dependent Local State
001 x1 - Global Mode Command
x1 - 1 Go to Independent State
x1 - 2 All Dependent Cells at this Level Reply
x1 - 3 All Dependent Global Cells at this Level Reply
x1 - 4 Go to Dependent Wait State
010 x2 - Report Communication Request Status
011 x3 - Input
100 x4 - Output
101 x5 - Report Status Word
110 x6 - Reconfigure
111 x7 - Extended Format
a) State and Level Control of Cells
x7 - 1 Go to Independent State, Set Level
x7 - 2 Go to Independent State, No Level Change
x7 - 3 Go to Dependent Global State, Set Level
x7 - 4 Go to Dependent Global State, No Level Change
x7 - 5 Go to Dependent Wait State, Set Level
x7 - 6 Go to Dependent Wait State, No Level Change
x7 - 7 Go to Dependent Local State, Set Level
x7 - 8 Go to Dependent Local State, No Level Change
x7 - 9 Go to Controller Cell State
b) Output
x7 - 10 Output
IF C = 100 through 111 then transmit or execute mode is
entered as indicated.
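The decoding of the controller cell field can be summarized in a small lookup, sketched below. It assumes the C codes 000 through 111 map to the listed modes in order, and that for C = 010 or 011 the L codes 000 through 111 select commands x0 through x7 in order. The table and function names are hypothetical.

```python
# Assumed mapping of the 3-bit C field to controller cell actions.
CC_FIELD = {
    0b000: "do not change mode",
    0b001: "control bus I/O logic",
    0b010: "generate bus command",
    0b011: "generate bus command",
    0b100: "transmit mode, single",
    0b101: "transmit mode, all",
    0b110: "execute mode, single",
    0b111: "execute mode, all",
}

# Assumed mapping of the 3-bit L field to the eight basic bus commands.
BUS_COMMANDS = {
    0b000: "x0 Global Mode Command",
    0b001: "x1 Global Mode Command",
    0b010: "x2 Report Communication Request Status",
    0b011: "x3 Input",
    0b100: "x4 Output",
    0b101: "x5 Report Status Word",
    0b110: "x6 Reconfigure",
    0b111: "x7 Extended Format",
}


def describe_cc(c, l):
    """Describe the action selected by the C field (and L, for bus commands)."""
    action = CC_FIELD[c & 0b111]
    if (c & 0b111) in (0b010, 0b011):
        return action + ": " + BUS_COMMANDS[l & 0b111]
    return action


example = describe_cc(0b010, 0b011)
```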
OP CODE 64 - Long Instruction- 2
This instruction uses 2 words (32 bits) for its mechanization. Several possible
instructions that will be included are given below. There is room for several more
and future study should investigate other useful instructions. This instruction was
primarily designed for the bank/index instruction given below. However, this did not
quite use up all the bits. Therefore, it was possible to include many instructions
under this one O:P code.
1. INCREMENT Bor T TEST AND JUMP
Bits:
Instruction:
B/T:
Jump Address:
INCR:
Compare:
OP EXT:
TRANSFER MEMORY - MEMORY
Bits: 6 3 7 3 7 3
t
6 3 9 2 9 3
_ Jump Address ] _CR [ Compare lOP EXT 1
Specifies the B or T index/bank register to be
operated upon.
A full length address that replaces P if the
comparison is equal after incrementing.
The amount the register is to be incremented (_1, 2).
A full 9 bits for comparison with the B or T register
after it is incremented.
Indicates this particular instruction.
3
OP EXT I
9-47
Operation:
B/T1:
ml:
Spare:
(MI)----_ (M 2)
B/T register specification to form M 1.
Address displacement for M 1.
3 spare bits are left for further OP code ext.
3. EXCHANGE MEMORY - MEMORY
OPERATION: (MI)------(M 2)
This instruction has the same format as (2) above, it can use one of the
codes from the 3 spare bits for its specification.
4. COMPARE TOLERANCE AND SKIP
OPERATION: U + L _ (M) >-U-L
This instruction has the same form as (2) above. It can use one of the
codes from the 3 spare bits for its specification. M will be formed from B/T 1
and m I while L is located by B/T 2 and m 2. If (IV[)is out of tolerance then
the program counter is incremented by 2 rather than 1.
5. COMPARE TOLERANCE AND JUMP
Same as (4) above except B/T 2 and m 2 are used to form a jump address.
L will have to be loaded with the tolerance previously. If (M) is out of tolerance
then the jump address replaces the program counter.
6. SET REAL TIME CLOCK
Bits: 6 3 1 9 10 3
Instruction: [OP64 [B/T[ IIm ISpare[ OPEXT
Specifies whether the B/T bits should be used or ignored
and m used directly to load the real time clock.
This instruction loads the real time clock with a value located by using
m and B/T to form an address and fetch the value to be loaded or by using m as
a direct load.
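As a rough illustration of how a 32-bit format such as that of instruction (1) might be packed, the sketch below uses the field widths 6, 3, 9, 2, 9, and 3. It assumes the fields are packed most-significant-first in the order OP, B/T, Jump Address, INCR, Compare, OP EXT; that ordering, the helper names, and the reading of "OP 64" as an octal value are assumptions made for illustration and are not specified in this report.

```python
# Field widths for INCREMENT B OR T TEST AND JUMP: 6 + 3 + 9 + 2 + 9 + 3 = 32 bits.
# Assumption: fields are packed most-significant-first in the order listed.
FIELDS = [("op", 6), ("bt", 3), ("jump", 9), ("incr", 2), ("compare", 9), ("op_ext", 3)]


def pack(values):
    """Pack a dict of field values into one 32-bit instruction word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a 32-bit instruction word back into its fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


# Hypothetical example values; 0o64 assumes "OP 64" is written in octal.
example = {"op": 0o64, "bt": 5, "jump": 0o777, "incr": 1, "compare": 300, "op_ext": 0}
word = pack(example)
print(f"{word:032b}")
```

The same table-driven approach would cover the other formats under this OP code by swapping in a different field list.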
9.2.4 Control
The control section of the processor receives a number of lines from the
memory and from various parts of the processor which it then uses to set control
flip-flops or to generate sequences of control signals that get sent throughout the
processor and to the memory.
The comparison flip-flops represent conditions, greater than (G), less than
(L), and equal (E), generated from a comparison carried out in the ALTU. After the
comparison, these flip-flops are set or reset by the ALTU as appropriate. The con-
trol unit then uses these flip-flops to control future processor actions during a JMC
(jump on compare FF) instruction. The overflow (O) flip-flop is set or reset by the
ALTU after arithmetic overflows. It can then be used to cause a jump during
execution of a JMC.
When an interrupt occurs, the control unit generates a store status sequence
after the present instruction is completed. This sequence will be explained below
under interrupts (Section 9.2.6). Any of the interrupts can be masked by the
interrupt mask register (IMR).
The section of the control unit labeled instruction decoding and control
generation (IDCG) has the task of sending out sequences of control signals. These sequences
are generated by decoding and combining all input control flip-flop lines, memory
signals, timing information, and MB, IR, TR, SCR, and LENR register contents.
The signals are then sent out on a number of lines to the remainder of the processor
and to the memory.
The state register and the CC mode register play an important role in the
control unit's operation. The state register will be used to interpret instructions in
ways dependent on the state of the cell. The general functions here were described
in Section 6 under group architecture when the instruction execution was discussed.
For example, jump instructions may modify the program counter in independent cells
and the level register in dependent cells. If the cell is in the dependent global state,
the program counter is not used and the control unit will use the instructions received
over the bus. The format instructions (Given Address, etc. ) modify some of the
normal sequences of control that are generated in the IDCG.
If the cell is in the controller cell state, the CC mode flip-flops will be sampled
by the IDCG for every instruction. These flip-flops determine whether the instruction
is to be executed internally or sent out over the inter-cell bus. Several hard-wired
addresses are also contained in the IDCG unit. These are used for the interrupts
and for certain types of interrupts due to inter-cell bus communications [illegible in source]
this cell. It should again be noted that Section 6 contains a general discussion on
many of the functions that the IDCG unit will control or generate.
9.2.5
As shown in Figure 9-9, each cell contains its own clock. For preliminary
design purposes a clock frequency of 2 MHz has been assumed. This number may
certainly be varied depending on the technology without affecting the cell design. The
clock drives two counters as shown in Figure 9-9: the RTC EXT and the bit time
counter (BTC).
A real time clock as shown in Figure 9-10 will be used in each cell. The lower
23 bits of this clock are hardware registers, while the upper 16 bits, or more if
desired, are in the memory. The hardware portion is divided into two sections, a
16 bit clock register (RTC) and a 7 bit clock extension register (EXT). The RTC
register can be set and read by instructions. Once it is set it will count down to
zero and send out an interrupt. The EXT register cannot be read or set, but it can
be initialized to zero in order to set up precise timing at the beginning of a
computation phase. Unlike the RTC register, this clock and the memory clock count up.
Figure 9-10. Real Time Clock (16-bit memory RTC, maximum value about 3.2 days; 16-bit RTC, maximum value 4.2 sec, 64 µs per count; 7-bit EXT, 0.5 µs per count)
The time weighting of the RTC register depends on the variation from phase to
phase in the rates of the highest rate programs. This variation has not been accur-
ately established at this time; however a reasonable estimate for the RTC register is
that shown in Figure 9-10. This weighting was chosen to allow for fairly low fre-
quency periodic programs, up to 4.2 sec, while not being forced to waste too much
time in order to handle higher frequency programs. A maximum of 64 µsec may
have to be wasted, but the majority of this time is actually well spent in storing the
registers of the interrupted program. Further detailed discussion on using the RTC
to interrupt and schedule periodic programs may be found in Ref. 1.
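The weightings quoted for the real time clock follow from the assumed 2 MHz clock and the register lengths. The short calculation below is only a check of that arithmetic; the variable names are illustrative and it assumes the EXT register counts once per clock period.

```python
# Check of the real time clock weightings, assuming the 2 MHz clock
# drives the 7-bit EXT register directly (one count per clock period).
CLOCK_HZ = 2_000_000
EXT_BITS, RTC_BITS, MEM_BITS = 7, 16, 16

ext_lsb = 1 / CLOCK_HZ                # seconds per EXT count (0.5 microsecond)
rtc_lsb = ext_lsb * 2**EXT_BITS       # seconds per RTC count (64 microseconds)
rtc_span = rtc_lsb * 2**RTC_BITS      # full range of the 16-bit RTC register
mem_span = rtc_span * 2**MEM_BITS     # full range including the 16-bit memory clock

print(f"EXT count : {ext_lsb * 1e6:.1f} microseconds")
print(f"RTC count : {rtc_lsb * 1e6:.1f} microseconds")
print(f"RTC range : {rtc_span:.2f} seconds")
print(f"MEM range : {mem_span / 86400:.2f} days")
```

The results agree with the 64 µsec, 4.2 sec, and roughly 3.2 day values shown for the clock.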
The bit time counter (BTC) will be two bits and is incremented by the clock.
(This assumes a memory cycle will take 2 processor clock times). It will in turn
increment the mode counter (MC) that is used to keep track of the phases of instruc-
tions longer than a memory cycle. The mode counter has been set at 4 bits so that
it will provide at least 8 counts for long instructions, such as divide and provide room
for growth up to 16 if complicated instructions are later added to the list. Therefore,
it will be assumed that the BTC sends two lines to the control section of the processor
and the MC eight lines to the control section.
9.2.6 Interrupts
There are several types of interrupts that may occur in the system:
1. Real time clock counts down to 0
2. Software or machine error
3. I/O Buffer register operation complete
4. Inter-cell bus I/O complete
5. Timer interrupt due to no neighbor response
6. Timer interrupt due to no response to a GC X 2 command sent out
7. Timer interrupt due to time interval between words received over the
inter-cell bus exceeding some value.
Any of the above interrupts may be masked out by the Interrupt Mask Register
(IMR). This register may be set by the programmer. Some of these interrupts were
discussed in previous sections of the report. The remaining ones will be discussed
in Sections 10 and 11.
The general actions that occur upon the occurrence of an interrupt will be
presented below.
The present instruction will be completed and certain processor registers
saved before giving control to the interrupting routine.
1. Complete present instruction
2. Pick-up hard wired interrupt address
3. Store L register
4. Store processor status (see status word below)
5. Store U back in its memory location
6. Store Bo register
The address of where the processor registers will be saved is given by a hard
wired address in the processor. This will be picked up and placed in the MAR. The
L register will be saved in the first location. Then the MAR will be incremented and
the status word stored. The status word is shown below:
bits:          9     2    1     2       2
Status Word: [ P | HA | O | CFF | CC MODE ]
P: Program Counter
HA: Hardware Accumulator ID bits
O: Overflow Flip-Flop
CFF: Comparison Flip-Flops
CC MODE: The CC Mode Register
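The five fields of the status word (9, 2, 1, 2, and 2 bits) fill one 16-bit memory word. A minimal sketch of saving and restoring such a word is given below; it assumes P occupies the most significant bits and the remaining fields follow in the order listed, which this report does not state explicitly. The function names are illustrative only.

```python
# Status word fields and widths; 9 + 2 + 1 + 2 + 2 = 16 bits.
# Assumption: P is in the most significant bits, CC MODE in the least.
STATUS_FIELDS = [("P", 9), ("HA", 2), ("O", 1), ("CFF", 2), ("CC_MODE", 2)]


def pack_status(fields):
    """Build the 16-bit status word stored on an interrupt."""
    word = 0
    for name, width in STATUS_FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_status(word):
    """Recover the fields, as a LOAD STATUS style restore would need to."""
    fields = {}
    for name, width in reversed(STATUS_FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


# Hypothetical register contents at the moment of an interrupt.
saved = pack_status({"P": 0o123, "HA": 2, "O": 1, "CFF": 1, "CC_MODE": 3})
restored = unpack_status(saved)
```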
The MAR is incremented again and the Bo index/bank register is moved from
its regular memory location to the location specified by MAR. The MAR is incremented
again and placed in P and the memory location normally used for Bo.
This final location provides a jump to the executive routine for processing the
interrupt. It also loads the Bo register with a bank address for the executive program
to use to address operands. Depending on what action is to follow the interrupt,
the executive may have to store the remainder of the processor registers
(U1, U2, U3, etc.); however, it can easily do this by instruction since the program
counter and Bo have been saved and Bo is then able to be used for addressing
storage.
A different executive interrupt routine will be entered depending on the particular
interrupt. This is accomplished by incrementing the program counter an appropriate
number of times before using the final location as a "jump to" address. Thus, in the
above example, where the P register was not incremented before the jump, the interrupt
could have been an RTC interrupt. For a machine error interrupt, the P register might
be incremented once before jumping, and so on for the other interrupts.
The priority scheme among the interrupts has not been defined yet. Further
study would dictate the relative priority among the interrupts.
To restart the interrupted program, the reverse procedure is essentially
followed. The registers that were saved by the interrupting program are restored and
then a LOAD STATUS instruction is executed (OP Code 53b). This instruction takes
the full 9 bit address from the instruction word and places it in the MAR. This
address will specify where L and the status word were saved by the interrupting
program. First L will be loaded, then MAR will be incremented by 1 and the status
word picked up. The status word will reload P and the other registers described
previously. The original interrupted program is then ready to continue. It should
be noted that it is not necessary to restore B since this can be done with a regular
load Bo instruction. This is possible since the LOAD STATUS instruction does not
use any B/T registers.
9.2.7 Neighbor - Neighbor Communications
Some of the hardware associated with implementing neighbor communications
was discussed previously.
The functional operation is described here. One cell will pass a word of data
to a neighbor cell. The sending cell, called here C0, will be executing a program
(P0). The receiving cell (C1) will be executing a program (P1) that requires the word
of data contained in C0.
The program P0 will place the word in one of its accumulator registers and set
a flag associated with that register. This flag will indicate to whom the data is to be
sent. Thus the flag can be set to 5 states: N, S, E, W, or X. The X setting is the
normal state of the flag and indicates that the register contents are for the use of
cell C0 only. Unless an instruction is executed to change the flag state, the registers
can only be used by cell C0; the flag will always be X. If the flag is set by an
instruction to N, the North cell is to receive the register contents. By setting the flag to
S, E, or W, the South, East, or West neighbor can receive the data.
Subsequently, the program in cell C1 needs the data from cell C0. Program P1
will execute an instruction with the following format.
[Instruction format diagram: OP CODE, OP 2, and register fields R 1 and R 2]
OP 2 - Is the operation to be performed, and is any of the usual register-
       to-register operations (add, subtract, load, etc.).
R 2 -  Is the accumulator register into which the results will be loaded.
       This accumulator may also furnish an operand for some
       instructions.
R 1 -  Is the relative location of the neighbor cell (N, S, E, W) which
       will have a register with the proper flag.
The execution of this instruction will cause cell C1 to request from C0 a word of data.
If C0 has a register with the proper flag, C0 will send C1 the data word. Each cell
contains a buffer register, ION (serial line used between cells); thus the transfer will
not destroy any accumulator contents. After the transfer is complete, cell C1 will
perform the operation specified by the OP Code, using the transferred word as one
of the operands.
After a data word has been transferred the register flag in C0 is reset to X.
By executing an instruction, the cell can test its flags. In this way, a cell can verify
that the neighbor cell called for and was sent the data. This flag test may provide a
means of detecting a malfunctioning cell.
Several special cases are outlined below. The descriptions below may be
changed as the system is studied further. Actual programming of some sample
problems may show ways of improving the system.
If two accumulator registers are set to the same flag value, the lower numbered
accumulator will be transferred first. Only one datum will be transferred at any one
time for any single instruction execution.
C0 may attempt to alter the accumulator contents before a neighbor cell has
requested and received its data word from cell C0. The control circuits in C0 can
be set to: (1) wait until C1 receives the data, (2) interrupt to some special error
routine, or (3) ignore the flag and use the accumulator anyway. Which options are
implemented requires further study and programming examples.
The hardware needed to implement this neighbor-to-neighbor communication
scheme consists of a buffer register, accumulator flag registers, and control circuitry.
Figure 9-11 shows a block diagram of the hardware. Upon execution of an instruction
flagging an accumulator, the hardware will set the flag flip-flops associated with the
accumulator to the proper state. The control circuitry will be set to a state to
expect a request.
Upon execution of an instruction requesting a word from a neighbor, the control
in cell C1 will select the proper line and send a request to that cell. If cell C0 does
not have any flag set for this neighbor, the request will be rejected. (Actually no
response will be received. When C1 sends out the request it also sets a timer which
allows for a maximum delay in receiving the data. If the timer runs out, an interrupt
occurs in C1 and is handled by software.) If cell C0 has an accumulator flagged, the
accumulator contents will be transferred in parallel to the buffer register in C0.
The accumulator flag is then set to X. A bit-by-bit serial transfer will move the bits
from C0 to the buffer register in C1. The filled buffer register in C1 will be used as
an operand.
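The request and transfer sequence described above can be sketched in software as follows. This is a behavioral illustration only, not a description of the hardware: the class and function names, the choice of four accumulators, and the use of a Python exception to stand in for the timer interrupt are all assumptions. The sending cell flags an accumulator with the direction of the intended receiver; the receiver names the relative location of the sender, so the matching flag is the opposite direction.

```python
# Behavioral sketch of neighbor-to-neighbor transfer (names are hypothetical).
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


class Cell:
    def __init__(self, n_acc=4):          # accumulator count is an assumption
        self.acc = [0] * n_acc
        self.flag = ["X"] * n_acc         # X: contents are for this cell only
        self.buffer = None                # buffer register used for transfers

    def flag_accumulator(self, index, value, direction):
        """Place a word in an accumulator and flag it for a neighbor."""
        self.acc[index] = value
        self.flag[index] = direction

    def serve_request(self, requester_side):
        """Answer a request arriving from the neighbor on requester_side."""
        for i, f in enumerate(self.flag):  # lowest numbered accumulator first
            if f == requester_side:
                self.buffer = self.acc[i]  # parallel copy into the buffer
                self.flag[i] = "X"         # flag is reset after the transfer
                return self.buffer
        return None                        # no flag set: no response is sent


def request_word(receiver, sender, r1):
    """Receiver asks the neighbor at relative location r1 for a word."""
    word = sender.serve_request(OPPOSITE[r1])
    if word is None:
        # Stands in for the timer interrupt raised in the requesting cell.
        raise TimeoutError("no response from neighbor")
    receiver.buffer = word                 # serial transfer into C1's buffer
    return word


c0, c1 = Cell(), Cell()                    # c1 is assumed to lie north of c0
c0.flag_accumulator(2, 7, "N")
c0.flag_accumulator(0, 5, "N")
first = request_word(c1, c0, "S")          # c1 names its South neighbor
```

Testing the flags in the sending cell afterwards shows which words have been collected, which corresponds to the flag test mentioned earlier.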
One aspect of the control circuit should be mentioned here. A single line
between any pair of cells gives the highest reliability because of the least number
of connections. The only problem comes about when two cells attempt to request
data from each other at the same time. One method of control is to have the cells
with an even address make requests at one time, and cells with an odd address make
requests during alternate times. The address assignment is done by the controller
cell via the inter-cell bus. By assigning cell addresses in a checkerboard pattern,
every evenly addressed cell will have four odd addressed neighbors.
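One way the controller cell might generate such a checkerboard assignment is sketched below. The function names and the 4 by 5 array shape are assumptions chosen for illustration; the only property taken from the text is that neighboring cells must have addresses of opposite parity.

```python
# Sketch of a checkerboard address assignment (array shape is hypothetical).
def checkerboard_addresses(rows, cols):
    """Give even addresses to one color of the checkerboard, odd to the other."""
    next_addr = [0, 1]                    # next unused even and odd address
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            parity = (r + c) % 2          # color of this square
            row.append(next_addr[parity])
            next_addr[parity] += 2
        grid.append(row)
    return grid


def neighbors_have_opposite_parity(grid):
    """Check every N/S/E/W pair of adjacent cells."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    if grid[r][c] % 2 == grid[rr][cc] % 2:
                        return False
    return True


grid = checkerboard_addresses(4, 5)       # 20 cells, as in a typical group
for row in grid:
    print(row)
```

With this assignment, even addressed cells and odd addressed cells can make their requests in alternate time slots without two neighbors ever requesting at the same time.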
The scheme described here requires one instruction to send out a word, and
one instruction in the receiving cell to pick up the data and operate on it. A total of
only one additional instruction compared to a sequential program is therefore
required.
9.2.8 Inter-Cell Bus Communications
Basically nine lines enter the bus communications section via the BCR. One is
used for control and the remainder for 1/2 word data (8 bits). The Bus I/O Control
Section will examine every byte on the bus that has the control line set to the one state.
It will then determine if this byte is for this cell. This will be accomplished by
examining the bits in the byte and determining what type of command this is, and if the
cell address (ID) or level register match it (which one is used depends on the
command). The Bus I/O Control Section will then issue the proper control signals to
execute commands for the cell from the bus. Section 10 will present a detailed
discussion of the bus operation, the bus commands, and detailed timing and logical
operations to mechanize the bus commands. However, a general discussion of the bus
section is in order here to place it in proper perspective to the remainder of the cell
hardware.
Some of the commands from the bus result in a pseudo interrupt. That is, they
halt the execution of the present instruction (essentially freeze the IDCG and the
arithmetic and control units) and provide access to the memory to carry out data
transfers. Once the data transfers are complete, the cell is allowed to complete the
instruction. This is relatively simple to accomplish since the arithmetic and control
registers are not altered during such an interrupt.
Other commands result in: the state or level registers being changed, certain
global control instructions (such as the format) etc. Certain commands will also tell
the cell that global instructions will be arriving over the bus. The Bus I/O Control
then has the responsibility of generating the proper control signals to the IDCG unit
and sending it the received instructions.
This section of the cell also contains a Bus I/O register that essentially holds
the I/O commands for bus operation so that the Bus I/O control section may generate
the proper control signals, such as the memory write signals for inputting a string of
bytes from the bus into the memory.
Two timers are also shown in this section of the cell. One timer is used by a
cell in the controller cell state to time the response from other cells to certain
commands. This prevents the controller cell from being hung up waiting for a response
from a faulty cell. If the timer exceeds its maximum value, an interrupt is generated.
Similarly another timer is used to time the interval between bytes received over the
bus. If the interval exceeds some value an interrupt will be issued if the bus I/O
operation has not yet been completed. This has several functions, one of which is to
prevent a cell from being hung up waiting for a cell that fails in the middle of a
communication between the two cells. The above points will be clarified when
Section 10 is referred to. The important point to note here is that the bus I/O control
has a very high priority. This approach was taken to provide fast communications
over the bus and to not make the controller cell have to wait for cells to complete
instructions or programs they are executing before it gets communications control of
the cell.
9.2.9 Memory
Each cell contains a 512 word (16 bits per word) memory as shown in
Figure 9-9. Part of the memory (9 locations) is used for processor registers,
namely the accumulators and B/T registers. The memory receives addresses and
data and read/write control signals from the processor. All information read out of
the memory is sent to the MB register.
A description of a basic memory cell for storage of a bit of information and
how the basic memory cells are used to form the memory will be given below.
Figure 9-12 shows the schematic of a conventional complementary MOS bistable circuit
without any provisions for reading or writing. Referring to the schematic of
Figure 9-12 note that Q1 and Q2 are never both on simultaneously except for a very
short transient time during a change in logic states. Since one of these two MOS
transistors is always off in a standby mode, the only current drawn from the supply
voltage is the leakage current of the "off" transistor. In one logic state, Q1 and Q4
are on, Q2 and Q3 are off. In the other logic state, Q1 and Q4 are off, Q2 and Q3
are on. There are two leakage current paths in this basic memory cell, one through
Q1 and Q4 and the other through Q3 and Q2. Addition of read and write circuitry will
add at least one more leakage current path, making a minimum of three per memory
cell.
A block diagram of a possible memory cell design is shown in Figure 9-13
along with a truth table.
The memory cells may be combined into a bit plane array as shown in
Figure 9-14. Each bit plane array consists of 512 bits of memory cells shown in
Figure 9-13. The bit plane arrays may now be combined into a 512 word -16 bit/word
memory as shown in Figure 9-15. This is accomplished by stacking the arrays in
parallel. Each array is inputted common row select, column select, read, and write
signals. One bit of the data word is fed into each array. It should be noted here that
this is the place where yields on the wafers may be increased by manufacturing more
than 16 arrays on the wafer and selecting which of the arrays to wire together to form
the memory shown in Figure 9-15. Finally, to form the entire block labeled
"memory" in Figure 9-9, a 9 bit address register and a small amount of control logic
would be added to Figure 9-15.
Figure 9-12. Basic Memory Cell Utilizing Complementary MOS Transistors
Without Selection or Readout Provisions
Figure 9-13. Logical Operation of a Coincident Select Memory Cell
(Coincident select NDRO RS memory cell with one-set input Sj1, zero-set input Sj0,
read command Rj, write command Wi, and bit output mj.)
FUNCTION OF ij CELL   Rj  Wi  Sj1  Sj0
NOT SELECTED          0   0   0    0
NOT SELECTED          0   0   0    1
NOT SELECTED          0   0   1    0
NOT SELECTED          0   0   1    1
NO CHANGE             0   1   0    0
WRITE "0"             0   1   0    1
WRITE "1"             0   1   1    0
NOT ALLOWED           0   1   1    1
NOT SELECTED          1   0   0    0
NOT SELECTED          1   0   0    1
NOT SELECTED          1   0   1    0
NOT SELECTED          1   0   1    1
READ                  1   1   0    0
NOT ALLOWED           1   1   0    1
NOT ALLOWED           1   1   1    0
NOT ALLOWED           1   1   1    1
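The truth table of Figure 9-13 reduces to a few simple rules, which the sketch below expresses as a function. It is a restatement of the sixteen table rows only; the function name is illustrative, the argument order follows the column order Rj, Wi, one-set input, zero-set input, and no claim is made about how the cell circuitry realizes these rules.

```python
# Restatement of the Figure 9-13 truth table (function name is hypothetical).
def cell_function(r, w, s1, s0):
    """Return the function of cell ij for read, write, one-set, zero-set inputs."""
    if w == 0:
        return "NOT SELECTED"             # all rows with Wi = 0
    if r == 0:                            # Wi = 1, Rj = 0: write rows
        return {
            (0, 0): "NO CHANGE",
            (0, 1): 'WRITE "0"',
            (1, 0): 'WRITE "1"',
            (1, 1): "NOT ALLOWED",
        }[(s1, s0)]
    # Wi = 1, Rj = 1: a read is valid only with both set inputs at zero
    return "READ" if (s1, s0) == (0, 0) else "NOT ALLOWED"


for r in (0, 1):
    for w in (0, 1):
        for s1 in (0, 1):
            for s0 in (0, 1):
                print(r, w, s1, s0, cell_function(r, w, s1, s0))
```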
The memory will read and write in two bit times. The first bit time will be used
to set up the address decoding and selection circuitry. The second bit time will be
used to either write in the data or read out data from the selection cells. It should be
noted that the data will be actually written into the selected cells somewhat before the
end of bit time 2. However, the data read out of the memory will only be strobed into
the processor section at the end of bit time 2. Therefore, the actual write cycle takes
somewhat less time to complete than the read cycle. The numbers used in the processor
section for bit times were 500 nanoseconds, resulting in a memory cycle of
1 µsec. These are conservative numbers and should be readily attainable. The actual
figures in the time period of application of the DAMP computer may result in typical
memory cycle times of 250-400 nanoseconds.
This section has presented only a preliminary design of the memory. Further
study may dictate better ways of organizing the basic memory cells. In particular,
discretionary wiring techniques to achieve reasonable yields will play an important
role in determining how to organize the basic memory cells. It should be noted that
Ref. 1 contains further detailed analysis of a complementary MOS memory similar to
the one presented here.
9.3 GROUP SWITCH
This section will present some preliminary design considerations of the group
switches. Referring to Figure 1-1, one can see the relation of the group switches to
the cells and the busses. The group switch controls the information transfer between
the inter-cell busses and the inter-group busses. Essentially the group switches look
like another cell to the group that is the system executive (also referred to as the
[line illegible in source]
the communications over the inter-group bus. This is quite similar to the controller
cell in the group controlling the communications over the inter-cell bus.
The main difference between the control of the two busses is that the controller
cell receives an instantaneous response to its commands that it sends out to the other
cells in the group whereas the executive group does not receive an instantaneous
response to its commands sent over the inter-group bus to other groups. This results
in different communication system mechanizations for the two busses. This subject
will not be gone into in this section but will be treated in detail in Section 10 where
both busses and their mechanizations will be discussed. However, a block diagram of
the group switch and a functional description of its registers and circuitry will be
given below. It will be seen that the group switch is relatively simple in terms of
hardware complexity. This should offer the possibility of employing redundancy
techniques to design a highly reliable group switch. A block diagram of the group
switch is shown in Figure 9-16.
9-61
inn m m/ m _lllm mira u u / i m / m mml_ |
i
i
i
I
I
I
i
I
I
I
I
I
li
I
I
I
I
I
I
INTER-GROU P
BUS
CONTROL
BITE
TIMING
CIRCUITRY
INTER-CELL
BUS
CONTROL
I
I
I
I
I
Lain m m
BCR
k
T Ir
BCR
._.Jk ,k.
m m
I
BUSY m
I
GROUP REQUEST REGISTER
5 9 1 9
I GROUP i CELLIADDRESS[I/O[COUNT I
8i
I
I
I
I
I
I
OUTPUT KEG. m
I
I
I
I
I
I
mm_ A
_, _,
i INTER-CELLBUS I
Figure 9-16. Group Switch Block Diagram
9-62
A functional description of the hardware in the group switch will be given below:
BCR: Buffer Communication Register - Two half word long
     registers that receive the present byte over the bus they
     are connected to.
GROUP SWITCH ID: A 5 bit register to hold the identification or address of the
     group switch. This register is used to decide which
     commands are addressed to the group switch. It has a
     serial load line for initialization purposes. Note that this
     provides the capability for having up to 32 group switches
     or 16 groups (2 group switches used per group) in the
     system.
I-G BUSY: Inter-Group Bus Busy Flip-Flop - This flip-flop is set when the
     EXECUTIVE GROUP COMMAND REGISTER is loaded in a
     group. It is reset when an "end of transmission" command
     is sent over the inter-group bus. It may be sampled by
     command over the inter-cell bus.
EXECUTIVE GROUP COMMAND REGISTER: This register holds a command received
     over the inter-group bus that was addressed to this group switch. It may
     be sampled by command over the inter-cell bus. It will be
     loaded with a command for this group identifying the cell,
     address and number of words to be input or output over the
     inter-group bus. This register will automatically be reset
     when it is sampled over the inter-cell bus.
GROUP REQUEST REGISTER: This register holds a request received over the
     inter-cell bus by a command addressed to the group switch to load
     this register. It may be sampled by command over the
     inter-group bus. It will hold the requested I/O over the
     inter-group bus.
O: Output Register - This register is set by a command
     received over the inter-group bus. It may be sampled by
     a command over the inter-cell bus. It is used to
     synchronize the start of communications between two
     groups. This register is automatically reset when it is
     sampled by command over the inter-cell bus.
BITE TIMING CIRCUITRY: This circuitry consists of a flip-flop that is alternately
     set-reset by a command from the inter-cell bus. The rate
     at which it is set will determine the voltage generated across
     a circuit. Two out-of-tolerance level detectors across this
     circuit will check for a high and low value of voltage. If
     the voltage is out of tolerance a failure signal is issued.
INTER-CELL AND INTER-GROUP BUS CONTROL: These blocks contain all the control
     circuitry to decode information received over the busses and execute
     commands addressed to the group switch. This control
     circuitry will also generate inter-group bus commands
     from certain inter-cell bus commands. It is responsible
     for opening up the communication path between the two
     busses.
10. COMMUNICATION BUS
The inter-cell and inter-group bus communications will be described and
mechanized in this chapter.
10.1 INTER-CELL BUS
This section will discuss the operation of the inter-cell bus. A functional
description of the bus will first be given followed by a detailed mechanization of the
operations that take place on the bus. The bus is under complete control of the cell
in the group designated as the controller cell. The number of lines required in the
bus depends heavily on the communication rates required. Determination of this
number is not possible at this stage of the machine design. However, for preliminary
design purposes an 8-bit parallel bus shall be assumed. This then provides for 1/2
word bytes to be transmitted over the bus (16-bit word length in the cells).
The inter-cell bus communications may be divided into two classes: (1) communication
between cells in a group and (2) communications between the controller cell
and the group switches of the group. The latter type of inter-cell bus use is primarily
concerned with accomplishing intergroup communications. Therefore, it will be
treated in the following section (10.2) where the intergroup bus is discussed. This
section will treat the inter-cell bus operations that take place in the group between
the cells only. When the "bus" is referred to in this section, it will imply the inter-
cell bus.
Communication on the bus takes place in basically two types of modes, local
and global. Local use is basically communication between two cells with control set
up on the basis of cell address identification and not dealing with the control of levels
or states of cells. Global use implies communication of the controller cell with one
or more of the other cells with control set up on the basis of cell address identifica-
tion, levels, or states for the purpose of global control of the cells.
The bus may be used for both instructions and data. Local use of the bus is
basically for passing data amongst cells and amongst cells and I/O devices connected
to the bus (see I/O section for a discussion of the I/O operation). Global use of the
bus may be for instructions and/or data. In any case the controller cell sets up and
controls all information flow over the bus. Software routines are set up in the con-
troller cell to control the operation of the bus. There are basically two types of
operations in these routines: fixed periodic and background. Fixed periodic operations
on the bus are those that must take place at predetermined intervals. These
may be either local or global type of communications. Examples of such operations
are a cell requiring updating of navigation and guidance parameters computed in
another cell every second, a set of global operations comprising a periodic program
that must be computed ten times a second, etc. Background type of operations on
the bus are those that take place in the absence of any fixed periodic operations; in
other words operations fitted in between the fixed periodic operations. The executive
section (Section 11) should be referred to for a discussion of the software routines
for bus operation.
A total of ten lines are used for the inter-cell bus. One line is used to denote
control or data and is designated the control line. Another line is used for parity;
future studies may dictate the need for more lines for error control, however for the
present only one line shall be assumed. The remaining eight lines are used for
control or data words. The lines are bidirectional so that a cell may receive or
transmit over the same line; this is accomplished by the use of driver/receiver cir-
cuits at the interface with the lines in each cell (see Ref. 33 for a discussion of
these circuits).
The use of the eight lines used for control/data words, in particular for control
purposes, will be described below. There are basically two types of control words
used: Local and Global. Local words are distinguished by requiring a particular cell
identification or address to be specified while global words require the specification
of an identification address or certain global levels or modes. The control words are
decoded by all the cells and only the appropriate cells partake in or accomplish the
desired communication on the inter-cell bus.
All words over the bus are composed of eight-bit bytes whether they are control
or data words. The control words are identified as such by use of the control line:
the first byte of any control word is identified by the control line set to a one (control)
state. The control line will return to a zero after the first byte of a control word.
Some control words require more than one byte, this is accomplished by use of
special control word formats as will be explained below. The format of the first
byte of a control word is shown below:
lines:   C           1 2 3      4 5 6 7 8                        P
       [ Cont/Data | Command | Cell Address or Global Control | Parity ]
Three lines are used for the command thereby providing eight possible command
states and five lines for the cell address. This provides for addressing up to 32
devices or cells. The group may consist of typically 20 cells for the manned Mars
mission considered; this would then leave room for 10 I/O devices to be connected
to the bus (2 addresses are required for the two group switches connected to each
intercell bus). It also provides for expansion so that more cells may also be added.
If 32 addresses are not sufficient for addressing cells and I/O devices on the bus, it
is relatively simple to provide for expansion beyond 32 by using an address extension
scheme. This involves using some address, say address 32, to signify that the next
byte contains more address bits. This of course has to be designed into the hard-
ware, however it is relatively simple to implement and can provide for a high degree
of expandibility both in terms of cells and I/O devices connected to the intercell bus.
Of course it is possible to provide this expandibility in the cells or the I/O devices
only, for example providing up to 31 cells and group switches and using address 32
for an address extension scheme in the I/O devices only, etc.
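The first-byte layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: line 1 is taken as the most significant bit of the byte, the parity sense (odd) is assumed because the text does not specify it, and all function names are hypothetical.

```python
from typing import Tuple

def pack_first_byte(command: int, address: int) -> int:
    """Place the 3-bit command on lines 1-3 and the 5-bit address on lines 4-8.

    Assumption: line 1 is the most significant bit of the byte.
    """
    if not 0 <= command < 8:
        raise ValueError("command must fit in three lines")
    if not 0 <= address < 32:
        raise ValueError("address must fit in five lines")
    return (command << 5) | address

def unpack_first_byte(byte: int) -> Tuple[int, int]:
    """Return (command, address) from the eight control/data lines."""
    return (byte >> 5) & 0b111, byte & 0b11111

def parity_bit(byte: int) -> int:
    """Value for the parity line. Odd parity is an assumption, not from the report."""
    return 1 - (bin(byte & 0xFF).count("1") % 2)

def addresses_left_for_io(cells: int = 20, group_switches: int = 2,
                          total: int = 32) -> int:
    """Address budget from the text: 32 addresses less cells and group switches."""
    return total - cells - group_switches
```

With the figures quoted in the text (20 cells, 2 group switches), `addresses_left_for_io()` gives the 10 addresses left for I/O devices.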
Table 10-1 lists the types of communication required to be carried out on
the inter-cell bus: (this list includes both local and global operations). It should be
noted that I/O devices are also included in the term cell used in Table 10-1.
Table 10-1. Communication Bus Operations
     1. Controller cell to send words directly to a cell under controller cell's
        command.
     2. Controller cell to receive words directly from a cell under controller
        cell's command.
     3. Controller cell to scan bus usage requests from individual cells and
        establish communication between two cells based upon requests.
     4. Controller cell commanding communication between two cells.
     5. Controller cell issuing a command to a cell specifying some changes to
        the internal control state.
     6. Controller cell issuing global commands to one or more cells or a cell
        issuing an end of transmission command.
The operations listed above are mechanized by various control words. Each
control word is composed of one or more bytes, the first byte having the format
given above. The first byte will be identified as a command; a list of commands is
given below in Table 10-2.
Table 10-2. Communication Bus Commands
     Command No.   Code   Description
     X0            000    Global mode command
     X1            001    Global mode command
     X2            010    Report communication request status
     X3            011    Input
     X4            100    Output
     X5            101    Report Status Word
     X6            110    Control Reconfiguration
     X7            111    Extended command format
Commands X0, X1:  These commands are used for global operations; the
                  remaining 5 lines are not used for cell address purposes,
                  but will be used for various global control functions.
Command X2:       This command requests a response from a given cell
                  as to the status of its requests for use of the
                  communication bus.
Command X3:       This command tells the cell to input the next set of
                  data words on the bus. (A set of data words is defined
                  as the words on the bus in between control words.)
Command X4:       This command tells a cell to output a set of data words
                  on the bus.
Command X5:       This command requests a cell to send to the controller
                  a status word representing certain control states in the
                  cell.
Command X6:       This command forces a cell to perform some change to
                  the internal control state (e.g., turn on/off, etc.).
Command X7:       This command uses an extended format. It requires
                  the second control word byte to identify what the
                  command consists of; this then provides for more than the
                  eight basic commands listed here. This command will
                  also be used to change cells from local control modes
                  to global control modes as will be explained later in
                  this section.
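As an illustration of Table 10-2, the eight command codes can be written as an enumeration and the first byte of a control word classified by it. This is a sketch only: the class and function names are hypothetical, and line 1 is assumed to be the most significant bit of the byte.

```python
from enum import IntEnum
from typing import Tuple

class Command(IntEnum):
    X0 = 0b000  # global mode command
    X1 = 0b001  # global mode command
    X2 = 0b010  # report communication request status
    X3 = 0b011  # input
    X4 = 0b100  # output
    X5 = 0b101  # report status word
    X6 = 0b110  # control reconfiguration
    X7 = 0b111  # extended command format

# X0 and X1 use the remaining five lines for global control, not for an address.
GLOBAL_COMMANDS = frozenset({Command.X0, Command.X1})

def classify(first_byte: int) -> Tuple[Command, str, int]:
    """Split a command byte into (command, kind, remaining five lines)."""
    command = Command((first_byte >> 5) & 0b111)
    low5 = first_byte & 0b11111
    kind = "global" if command in GLOBAL_COMMANDS else "addressed"
    return command, kind, low5
```

For an "addressed" command the third element is the cell address; for a "global" command it holds the global control bits.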
10.1.1 Local Operations
The mechanization of the required communication operations on the bus will now
be discussed. A description of the control words will be given for each operation.
The first five operations listed in Table 10-1 may be identified as local operations;
these will be given below.
1. Controller to send words to a cell:
     Byte   Sender   Receiver   Control line   Line 1   Contents
     1      C0       ALL        1              0        X3 command (011) and address of cell C1
     2      C0       C1         0              0        Address
     3      C0       C1         0              0        Address (remainder) and count
     4      C0       C1         0              1        Count (remainder)
     data   C0       C1         -              -        Data words
The first byte of the control word is identified by a one on the control line. As
mentioned previously the first byte contains the command, input, and the address
of the cell, C1. A 9-bit address is sent in the second byte and part of the third
byte to identify the first location in the cell to be input to. The number of words
to be input to the cell is specified by the count, which is sent in the remaining
part of the third byte and part of the fourth byte. The use of only three bytes in
the control word provides for a count of up to 32 words while four bytes provide
for a count of up to 512 words. The cell determines when the control word is
complete by the 0 to 1 transition on line number 1 as shown above; this scheme
provides for the capability of a variable length control word format and saves
the transmission of one byte when less than 32 words are to be input. The
following definitions will be used to identify the cells: C0, the controller cell;
C1 and C2, the cells used in the communication process.
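The variable-length scheme can be sketched as an encoder and decoder. The field widths follow the text (line 1 reserved as the end marker, 9-bit address, 5 count bits in byte 3, 4 more in byte 4). Several details are assumptions made for illustration: line 1 is the most significant bit, byte 3 carries the low-order count bits, the count is stored directly (so the short form is used for counts below 32), and the function names are hypothetical.

```python
from typing import List, Tuple

X3_INPUT = 0b011
X4_OUTPUT = 0b100
LINE_1 = 0x80  # assumed: line 1 is the most significant bit of the byte

def encode_transfer(command: int, cell: int, address: int,
                    count: int) -> List[Tuple[int, int]]:
    """Return the control word as (control_line, byte) pairs."""
    if command not in (X3_INPUT, X4_OUTPUT):
        raise ValueError("only X3 and X4 use this format")
    if not (0 <= cell < 32 and 0 <= address < 512 and 0 <= count < 512):
        raise ValueError("field out of range")
    word = [(1, (command << 5) | cell),           # command byte, control line = 1
            (0, (address >> 2) & 0x7F)]           # upper 7 address bits, line 1 = 0
    byte3 = ((address & 0b11) << 5) | (count & 0b11111)
    if count < 32:
        word.append((0, byte3 | LINE_1))          # 0-to-1 transition ends the word
    else:
        word.append((0, byte3))
        word.append((0, LINE_1 | ((count >> 5) << 3)))  # upper 4 count bits, 3 spare
    return word

def decode_transfer(word: List[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Return (command, cell, address, count), stopping at the line 1 marker."""
    control, first = word[0]
    if control != 1:
        raise ValueError("first byte must have the control line set")
    body = []
    for _, byte in word[1:]:
        body.append(byte)
        if byte & LINE_1:                         # end of the control word
            break
    if len(body) not in (2, 3) or not body[-1] & LINE_1:
        raise ValueError("malformed control word")
    address = ((body[0] & 0x7F) << 2) | ((body[1] >> 5) & 0b11)
    count = body[1] & 0b11111
    if len(body) == 3:
        count |= ((body[2] >> 3) & 0b1111) << 5
    return first >> 5, first & 0b11111, address, count
```

A request for 20 words therefore occupies three bytes on the bus, while a request for 500 words occupies four.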
2. Controller to request words from a cell:
     Byte   Sender   Receiver   Control line   Line 1   Contents
     1      C0       ALL        1              1        X4 command (100) and address of cell C1
     2      C0       C1         0              0        Address
     3      C0       C1         0              0        Address (remainder) and count
     4      C0       C1         0              1        Count (remainder)
     data   C1       C0         -              -        Data words
The same discussion as for 1 above applies here except that the control
word applies to outputting data from cell C 1 now.
3. Controller to scan a cell for communication requests - Cell, C1, to request
words from cell C2.
     Byte    Sender   Receiver   Contents
     1       C0       ALL        X2 command and address of cell C1
     2-5     C1       C0         Response: operation (I/O), cell C2, address, count
     6       C0       ALL        X3 (input) command and address of cell C1
     7       C0       ALL        X4 (output) command and address of cell C2
     8-10    C0       C2         Address and count (0-1 transition on line 1 in byte 10)
     data    C2       C1         Data words
     last    C2       ALL        X0, end of transmission (all zeroes)
The controller cell outputs the first byte of the control word sequence,
X2, which asks a particular cell, C1, if it has a request for service on the bus.
Cell C1 responds by outputting a response word as shown above. In this
particular case the first byte of the response identifies the desired operation
(input for this case), the cell, C2, with which communication is desired, and part of
the address in cell C2. The address and number of words are completed in
the remaining three bytes. The address in this case specifies the location in
cell C2 from which the words are desired. If a cell has no request the response
will consist of two bytes of all zeroes. It should be noted that cell C2 could also
be the controller cell. The control word response could actually be formed in
three bytes; however, using four bytes results in an easier mechanization within
C1 and does not require more time on the bus, as will be seen later in this section.
The controller cell accepts the response from the cell and examines the
request. It will determine whether the full word count requested can be
accommodated on the communication bus. The controller cell then outputs the sixth
byte of the control word, which is an input command to the cell, C1, that
requested the words. Next the controller cell outputs the remaining control
word bytes telling cell C2 to output a certain number of words (may be reduced
below that requested by C1) starting at a location specified by the address. The
same comments apply as before with regard to varying the length of the X4
command should less than 32 words be desired to be communicated.
It should be noted that the X3 command, input, to cell C1 given by byte 6
is not executed until byte 10 has been sent. This is due to the fact that the X3
command is a variable length command and requires a 0 to 1 transition of line 1
after transmission of the first byte (byte 6) of the control word comprising this
command to signify the complete transmission of the command. This logical
function is utilized here so that cell C1 is told to input; however, it will not pick
up words on the bus until byte ten has been transmitted (this 0-1 transition will
enable the receiver circuits in cell C1 and simultaneously enable the driver
circuits in cell C2). It should also be noted that the byte 7 command is actually
not examined by cell C1 since it has a command which is in the process of being
set up (normally a 1 on the control line forces all cells to examine the command
to determine if it is for them). Also note that the number of words that will be
received by cell C1 may be different (less) from that requested by it; the cell
keeps track of the difference, if any, between the number of words requested and
actually received. If any difference exists it will take the appropriate action on
its next request to the controller cell. Finally, the last byte is an X0 command
with all zeroes sent out by cell C2. This command is used to reset the bus
busy flip-flop. This will take place in all cells; however, it only has meaning
in the controller cell. This will signify that the communication is complete.
(If cell C2 is the controller cell, this byte will not be sent out.)
4. Controller to scan a cell for communication requests - Cell, C1, to
request to send words to cell C2.
     Byte    Sender   Receiver   Contents
     1       C0       ALL        X2 command and address of cell C1
     2-5     C1       C0         Response: operation (I/O), cell C2, address, count
     6       C0       ALL        X3 (input) command and address of cell C2
     7-9     C0       C2         Address and count (line 1 not in a 1 state in byte 9)
     10      C0       ALL        X7 command and address of cell C1
     11-12   C0       C1         X7 extension (output - word count), count
     data    C1       C2         Data words
     last    C1       ALL        X0, end of transmission (all zeroes)
The mechanization of this operation is very similar to that described
above for the input request by a cell. In fact the first five bytes are identical
except that an output is indicated in the second byte. The cell, C2, to whom
words are to be sent is told to input by bytes 6 through 9; note that byte 9 does
not have line 1 in a 1 state. This prevents cell C2 from picking up the next
three bytes as data words. The requesting cell, C1, is told to output by means
of an X7 extension command (output - word count); this is used since all the
controller can tell cell C1 is how many words it may output, since it does not
know the address they will come from (if it did it could use an X4 command).
The X7 command is a variable length command and therefore uses a 0 to 1
transition on line 1 after the first byte to signify its completion; note that this
transition is also used to complete the X3 command given previously to cell C2.
In addition, cell C1 will send an end of transmission command as the last byte.
5. Controller Cell to tell one cell, C1, to output to another cell, C2.
     Byte    Sender   Receiver   Contents
     1       C0       ALL        X3 (input) command and address of cell C2
     2-4     C0       C2         Address and count (0-1 transition on line 1 inhibited in byte 4)
     5       C0       ALL        X4 (output) command and address of cell C1
     6-8     C0       C1         Address and count (0-1 transition on line 1 in byte 8)
     data    C1       C2         Data words
     last    C1       ALL        X0, end of transmission (all zeroes)
This operation requires an output command as described previously to be
given to the cell C 1 and an input command as described previously to cell
C2. The only difference is that a 0-1 transition on line 1 is inhibited in
byte 4 so that cell, C2, may not start receiving data until byte 8 has been
transmitted.
6. Controller Cell to tell one cell, C1, to input from another cell, C2. (Same
as above.)
7. Controller Cell to command a cell, C1, to reconfigure some control state.
     Byte    Sender   Receiver   Contents
     1       C0       ALL        X6 command (110) and address of cell C1
   or
     1       C0       ALL        X7 command (111) and address of cell C1
     2       C0       C1         X7 extension
This operation is simply a one byte command if the control change is
represented by X6; two bytes are necessary to identify other changes by
using X7.
8. Controller to command a cell, C1, to report its status word.
     Byte    Sender   Receiver   Contents
     1       C0       ALL        X5 command (101) and address of cell C1
     next    C1       C0         Status word
This command forces cell, C1, to output a status word to the
controller.
It should be noted that there are a number of alternatives in deriving the formats
presented above. For example, it is possible to use only one byte for the X3, input
command, since the first word transmitted could contain address and word count
information. However, this provides no transmission time saving on the bus and
contaminates the data being sent and received with control information. It is also
possible to not use the variable command scheme requiring the 0-1 transition on line 1
for certain commands (X3, X4 and X7). It would thereby provide for up to 128 words
with three bytes for the X3 and X4 commands. To go beyond 128 words, one could
use a word count of 128 to signify that the next byte is to be used as an extended word
count; this could result in some savings on the bus. However, it would not be possible
to use the 0-1 transition logically to delay the start of certain X3 and X4 commands as
was explained above. To accomplish this would require additional logical circuitry
and/or transmission of additional bytes, so there may not really be any savings in
eliminating this logical function.
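The word-count limits quoted for the two schemes follow from simple bit accounting, which the short Python sketch below reproduces. The 9-bit address and 9-bit count widths are taken from the text; the function name and the idea of capping the count at 9 bits (leaving any further bits spare) are illustrative assumptions.

```python
ADDRESS_BITS = 9     # address width given in the text
COUNT_BITS = 9       # widest count field given in the text
LINES_PER_BYTE = 8

def max_word_count(bytes_after_command: int, line_1_reserved: bool) -> int:
    """Number of distinct word counts that fit after the command byte.

    When line 1 is reserved as the end-of-word marker, only seven lines of
    each following byte can carry address or count bits.
    """
    usable_lines = LINES_PER_BYTE - (1 if line_1_reserved else 0)
    free_bits = bytes_after_command * usable_lines - ADDRESS_BITS
    if free_bits < 0:
        raise ValueError("not enough bytes to hold the address")
    return 2 ** min(free_bits, COUNT_BITS)
```

Here `max_word_count(2, True)` is 32 and `max_word_count(3, True)` is 512 for the variable-length scheme, while `max_word_count(2, False)` is 128 for the fixed three-byte alternative.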
10.1.2 Global Operations
This section will present a description of global control of the inter-cell bus;
these operations are classified as number 6 in Table 10-1. The same format is used
as presented in the introductory section (10.1). It was pointed out there that commands
X0, X1 and X7 are used for global control. Commands X0 and X1 do not specify a
5 bit cell address identification but use the bits for control purposes; these commands
will be one 8-bit byte long. Command X7 will utilize a 5 bit cell address identification.
As mentioned previously it uses command extension and is variable in length to offer
the possibility of many control commands within X7.
1. Format GC Instructions (X0)
One byte is used here and the format is shown below:
     lines:   C     1 2 3     4 5     6 7 8     P
              1     0 0 0      Y        L       Parity
     Y = 11   indicates format type GC
     L = 000  indicates end of DS data words
         001  indicates A - given address
         010  indicates D16 - 16 bit data word
         011  indicates A and D16
         100  indicates D32 - 32 bit data word
         101  indicates A and D32
         110  indicates I - Immediate data
         111  indicates DS - Data is being sent (A is also sent here)
The functional explanation and usage of this and the remaining global operations
may be found in the chapter on group architecture (Section 6); reference should also be
made to the instruction set (Section 9) for further clarification of these operations.
2. Level Control GC Instructions (Class 1) - (X0)
The same one-byte format as given above is used here.
     L is always the level number to be compared to the contents of the
     level register in the dependent cell.
     Y = 01  G - indicates all cells at this level go to global state;
             instructions for this level follow
         10  L - all cells at this level go to local control state
3. End of Transmission Command - (X0)
     lines:   C     1 2 3     4 5     6 7 8     P
              1     0 0 0     0 0     0 0 0     Parity
This command is used to reset the bus busy flip-flop in every cell.
Actually it will only have meaning in the controller cell since it is the only
cell controlling the bus use. It is the only command that may be generated
by any cell other than a controller cell.
4. Level Control GC Instructions (Class 2) - (X1)
One byte is used here and the format is shown below:
     [Format diagram: one byte on lines C, 1 through 8 and P, carrying the
     X1 code and the fields L and Y.]
     L is always the level number.
     Y = 00  R - all dependent cells at this level reply with a constant,
             to be sent on the inter-cell bus.
         01  R, DG - all dependent global cells at this level reply with a
             constant.
         10  IND - all cells at this level go to the independent state.
         11  W - all cells at this level go to the wait state.
5. Individual Cell GC Instructions - (X7)
Recall that the X7 extended format command is used here; the format is
shown below:
     [Format diagram: two bytes. Byte 1 has the control line at 1 and carries
     X7 (111) on lines 1 through 3 and CELL on lines 4 through 8. Byte 2 has
     the control line at 0 and carries X7EXT and L.]
The three variable fields have the following meanings:
     CELL:    The cell address that is compared to the cell's ID register.
     L:       The level number that is to be loaded into the cell's level
              register.
     X7EXT:   The operation to be performed by the receiving cell.
     When X7EXT = 1000  IND, L   Go to independent state, set level
                  1001  IND      Go to independent state, no level change
                  1010  G, L     Go to dependent global state, set level
                  1011  G        Go to dependent global state, no level change
                  1100  W, L     Go to dependent wait state, set level
                  1101  W        Go to dependent wait state, no level change
                  1110  L, L     Go to dependent local control state, set level
                  1111  L        Go to dependent local control state, no level change
                  0111  CC       Go to the controller cell state.
     The remaining X7EXT codes are used for other
     operations as given in Section 10.1.1.
The use of the instructions presented in this section was given in Section 6 and
reference should be made to clarify their use in global operation of the bus.
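The one-byte X0 global commands can be illustrated with a small decoder. This is a sketch under stated assumptions: the Y field is taken to occupy lines 4 and 5 and the L field lines 6 through 8, line 1 is treated as the most significant bit, and the function and table names are hypothetical. The X1 byte is not decoded here because its field positions are not given explicitly.

```python
from typing import Optional, Tuple

# Meaning of L when Y = 11 (format type GC), as listed in the text.
FORMAT_TYPES = {
    0b000: "end of DS data words",
    0b001: "A",
    0b010: "D16",
    0b011: "A and D16",
    0b100: "D32",
    0b101: "A and D32",
    0b110: "I",
    0b111: "DS",
}

def decode_x0(byte: int) -> Tuple[str, Optional[object]]:
    """Interpret an X0 command byte as (kind, argument)."""
    if (byte >> 5) & 0b111 != 0b000:
        raise ValueError("not an X0 command")
    y = (byte >> 3) & 0b11      # assumed position: lines 4-5
    level = byte & 0b111        # assumed position: lines 6-8
    if y == 0b11:
        return "format", FORMAT_TYPES[level]
    if y == 0b01:
        return "global", level  # cells at this level go to global state
    if y == 0b10:
        return "local", level   # cells at this level go to local control state
    if level != 0:
        raise ValueError("Y = 00 is only defined with L = 000 in the text")
    return "end of transmission", None
```

An all-zero byte decodes as the end of transmission command, which is the only X0 form a non-controller cell may send.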
10.2 INTER-CELL/INTER-GROUP COMMUNICATIONS
This section will describe how the intergroup communications are carried out.
This requires a description of how the group switch uses the inter-cell bus and how it
uses the intergroup bus.
Communications over the intergroup bus are somewhat different from that
described in Section 10.1 for the communications within a group. The difference is
that the cells in a group are under immediate control of the controller cell and immed-
iately respond to commands over the inter-cell bus. However, the executive group
that is in charge of controlling the intergroup bus use does not have the groups (rep-
resented by the group switches) under its immediate control. The groups periodically
sample the group switches to see if any commands have been placed in the "Executive
Group Command Register" by the executive group. (Any group can do this; however
only the group acting as the executive group will be issuing such commands. ) The
groups then respond to these commands. It can be seen that there will be some delay
in getting a response to these commands. This delay is primarily dependent upon the
rate at which the group switch is sampled by the controller cell of a group for these
commands. Once the controller cell picks up these commands, it will immediately
respond by commanding the proper cells in its group to begin communications.
The groups will place requests for intergroup use in the group switch (in the
GROUP REQUEST REGISTER). These requests are periodically sampled by the
executive group. It is based upon these requests and any individual requests of the
executive group, that commands will be placed in the group switches for intergroup
bus control.
It is expected that the sampling of the group switches will be a fairly high rate
periodic program. Therefore, the delay in response should be minimized by proper
programming.
The following is a list of the desired communication functions over the intergroup
bus:
1. Executive group to scan groups for requests, one group to request to send
words to another group (could be the executive group).
2. Executive group to scan groups for requests, one group to request words
from another group (could be executive group).
3. Executive group to request to send words to another group directly.
4. Executive group to request words from another group directly.
5. Executive group to command another group switch to turn off power.
To accomplish the desired communications functions listed above various
commands are required over the inter-cell bus and the intergroup bus. These commands
are given in Table 10-3. It should be noted that the group switch contains an ID
(address register) just as a cell does. It will therefore respond to commands
addressed to it just as a cell does. However, how it interprets a command may be
different from how a cell does. Each command will use the same format as that given
previously, namely 3 bits for the X (command) code and 5 bits for the address.
Table 10-3. Inter-Cell Bus Group Switch Commands
     Command   Code   Description
     X0        000    End of transmission command
     X2        010    Request "Executive Group Command" register
     X3        011    Send out a load "Executive Group Command" register
                      command
     X4        100    Load "Group Request" register
     X5        101    Request "0" and "I-G Busy" registers
     X6        110    Initialize "Group Request" register
     X7        111    Extended command format
     X71       111    Send out a request "Group Request" register command
     X72       111    Set BITE TIMING FF
     X73       111    Send out a load "0" register command
     X74       111    Send out a power off command
Each of the commands will be explained below:
X0:   This command signifies that the transmission is complete.
      It will reset the "I-G Busy" flip-flop signifying that the
      intergroup bus is now free. It will also be sent out on the intergroup
      bus by the group switch. This will also reset the "I-G Busy"
      flip-flop in the other group switches. It should be noted that
      this is the only "global" command valid in a group switch
      (X1 command codes are not sampled). The 5 bits for the
      address portion of this command are always all "0's."
X2:   This command tells the group switch to send the data
      contained in the "Executive Group Command" register on to the
      inter-cell bus. This results in 4 bytes sent on the bus and
      the register being set to 0.
X3:   Reception of this command forces the group switch to send
      out a command (to be described below in Table 10-4) over
      the intergroup bus to load the "Executive Group Command"
      register in another group switch. The X3 command received
      over the inter-cell bus will be followed by 5 data bytes. These
      bytes are comprised of Group 1 (5 bits), Cell (5 bits), Address
      (9 bits), Count (9 bits), I/O (1 bit), Group 2 (5 bits). These
      bytes are all sent out to the other group by the group switch.
      The group switch identifies which group they will be sent to
      by "Group 2".
X4:   This command tells the group switch to load the "Group
      Request" register with the next 4 data bytes on the inter-cell
      bus. The bytes are comprised of Group (5 bits), Cell (5 bits),
      Address (9 bits), Count (9 bits), and I/O (1 bit).
X5:   This command requests the group switch to send out a byte on
      the inter-cell bus. Two bits are of interest in the byte:
      (1) the "0" register (an output go command) and (2) the I-G
      Busy register (signifies if the intergroup bus is tied up with a
      transmission).
X6:   This command will set the group request registers in the
      group switch to all "0's."
X7:   This command requires the reception of another byte and the
      subsequent decoding of the first two bits of that byte to
      determine what the actual command is. This then provides for 4
      total commands from the X7 command code.
X71:  This command tells the group switch to output on the
      intergroup bus a command to another group switch requesting a
      "Group Request" register. The group switch will then let
      the response through itself onto the inter-cell bus. The second
      byte contains: Code (2 bits), Group (5 bits) and Spare (1 bit).
      The "group" will be used to tell the group switch what group
      switch it should send out the intergroup bus command to.
X72:  This command results in the BITE timing flip-flop being
      set/reset. It simply uses the Code (2 bits) in the second byte
      for its identification. (The remaining 6 bits could be used as
      a check code by the group switch.)
X73:  This command tells the group switch to send out on the
      intergroup bus a command to tell another group switch to set its
      "0" register to a 1. This signifies that the group can now
      output over the intergroup bus (the inputting group is ready).
      The second byte contains Code (2 bits), Group (5 bits), and 1
      spare bit. The group bits identify what group switch to send
      the command to.
X74:  This command tells the group switch to output a command
      over the intergroup bus to tell another group switch to turn
      off power. The second byte contains: Code (2 bits), Group
      (5 bits) and 1 spare bit.
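The four data bytes that follow an X4 command carry 29 bits of fields, which can be illustrated with a pack and unpack pair. The field names and widths come from the text; the packing order (fields in the order listed, most significant bit first, with the three unused bits at the end set to zero) is an assumption, and the function names are hypothetical.

```python
from typing import Dict

# (name, width in bits) in the order the text lists them; 29 bits in total.
REQUEST_FIELDS = (("group", 5), ("cell", 5), ("address", 9),
                  ("count", 9), ("io", 1))

def pack_group_request(fields: Dict[str, int]) -> bytes:
    """Pack the "Group Request" register contents into 4 bytes."""
    word = 0
    used = 0
    for name, width in REQUEST_FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
        used += width
    word <<= 32 - used          # assumed: spare bits trail the fields
    return word.to_bytes(4, "big")

def unpack_group_request(data: bytes) -> Dict[str, int]:
    """Recover the fields from the 4 data bytes."""
    if len(data) != 4:
        raise ValueError("expected 4 bytes")
    word = int.from_bytes(data, "big")
    shift = 32
    fields = {}
    for name, width in REQUEST_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```

The same layout with a trailing Group 2 field (5 more bits) would need the 5 data bytes quoted for the X3 command.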
Table 10-4. Intergroup Bus Group Switch Commands
     Command   Code   Description
     X0        000    End of transmission command
     X3        011    Load "Executive Group Command" register
     X4        100    Load "0" register
     X5        101    Turn off power
     X7        111    Send out "Group Request" register over the intergroup
                      bus
These commands will be explained below (the same command format is used:
3 bits for X and 5 bits for the group switch ID).
X0:   This command causes all I-G Busy flip-flops in the group
      switches to be reset.
X3:   This command tells the group switch to load its Executive
      Group Command register with the next 4 bytes on the
      intergroup bus. The bytes will contain: Group (5 bits), Cell
      (5 bits), Address (9 bits), Count (9 bits) and I/O (1 bit).
X4:   This command tells the group switch to set its "0" register
      to a one. It is one byte long.
X5:   This command tells the group switch to turn off its power.
      It is one byte long.
X7:   This command tells the group switch to output its "Group
      Request" register over the intergroup bus.
Now that the commands have been presented, the mechanization of an intergroup
communication operation may be discussed. The operation will be for the executive
group (EG) to scan a group for a request and the group (G1) to request words from
another group (G2). The complete operation will be described starting from when
the requesting group (G1) places its request in its group switch.
     Group   Inter-Cell    Inter-Group   Description
             Bus Command   Bus Command
     G1      X4            -             The group switch of G1 is loaded with
                                         the request.
     EG      X71           X7            The group switch of EG fetches the
                                         request from the group switch of G1.
                                         This request will pass right onto the
                                         inter-cell bus of the EG where it will
                                         be examined.
     EG      X3            X3            The EG, if it approves the request,
     EG      X3            X3            will command the group switches of
                                         G1 and G2 to be loaded with an input
                                         and output command, respectively.
                                         Simultaneously, when the group switch
                                         of EG sends out an X3 intergroup bus
                                         command, it sets the I-G Busy
                                         flip-flop.
     G2      X2            -             G2 requests the executive command
                                         register contents to be sent on the
                                         inter-cell bus. Internally, G2 will
                                         send out the proper inter-cell bus
                                         commands to its cells to prepare to
                                         output (as described in Section 10.1
                                         previously).
     G2      X5            -             G2 will be continually sampling the
                                         "0" register in its group switch to
                                         determine if G1 has said it is ready
                                         to input. Then it will issue the
                                         necessary inter-cell bus command
                                         to its cells to tell them to start
                                         outputting. This results in a direct
                                         transmission path from a cell in G2
                                         to a cell in G1.
     G1      X2            -             G1 requests the executive command
                                         register contents to be sent over the
                                         inter-cell bus. Internally G1 will
                                         send out the necessary inter-cell bus
                                         commands to a cell to tell it to input.
     G1      X73           X4            G1 then immediately tells its group
                                         switch to tell the group switch in G2
                                         to set its "0" register. The
                                         communications begin as soon as G2 has
                                         sampled the set "0" register as
                                         mentioned above.
     G2      X0            X0            The cell that was transmitting in G2
                                         will send out as its last byte an X0,
                                         end of transmission command. This
                                         command will reset all the bus busy
                                         flip-flops that were set due to this
                                         transmission (I-C Busy in G1 and G2
                                         and I-G Busy in group switch of EG).
The commands for the intermediate operations carried out by the controller
cells in G1 and G2 have been left out for clarity. They are identical to those presented
previously in Section 10.1 for communications among cells.
It should be noted that the other types of intergroup communications presented
earlier in this section are mechanized quite similarly to the operation presented above
and will therefore not be presented in detail.
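The operation walked through above can be traced with a toy model of the group switch registers. This is a rough sketch only: the class, its attribute names, and the representation of a request as a dictionary are hypothetical, polling delays are ignored, and the data transfer itself is not modelled. It records which inter-cell and intergroup commands appear at each step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class GroupSwitch:
    name: str
    group_request: Optional[Dict] = None   # "Group Request" register
    exec_command: Optional[Dict] = None    # "Executive Group Command" register
    o_register: int = 0                    # "0" register (output go)
    ig_busy: bool = False                  # "I-G Busy" flip-flop

# (group, inter-cell bus command, intergroup bus command or None)
Step = Tuple[str, str, Optional[str]]

def request_words(eg: GroupSwitch, g1: GroupSwitch, g2: GroupSwitch,
                  request: Dict) -> List[Step]:
    trace: List[Step] = []
    g1.group_request = dict(request)               # G1 loads its own switch
    trace.append(("G1", "X4", None))
    fetched = g1.group_request                     # EG fetches the request
    trace.append(("EG", "X71", "X7"))
    g1.exec_command = dict(fetched, io="input")    # EG approves: G1 will input
    eg.ig_busy = True                              # set when EG sends X3
    trace.append(("EG", "X3", "X3"))
    g2.exec_command = dict(fetched, io="output")   # and G2 will output
    trace.append(("EG", "X3", "X3"))
    g2.exec_command = None                         # X2 read clears the register
    trace.append(("G2", "X2", None))
    trace.append(("G2", "X5", None))               # G2 polls its "0" register
    g1.exec_command = None
    trace.append(("G1", "X2", None))
    g2.o_register = 1                              # G1 signals it is ready
    trace.append(("G1", "X73", "X4"))
    for switch in (eg, g1, g2):                    # end of transmission
        switch.ig_busy = False
    trace.append(("G2", "X0", "X0"))
    return trace
```

Running `request_words` on three fresh switches yields nine steps whose command columns match the table above.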
10.3 MECHANIZATION OF INTER-CELL COMMUNICATION BUS COMMANDS
AND CONTROL WORDS
A functional description of the commands and operations on the bus was given
in the previous two sections. The control words that are required for the operations
that will take place over the bus were introduced and the contents of each of the bytes
comprising a control word were given. This section will present a detailed descrip-
tion of how the control words are formed or mechanized. It should be recalled that
each of the operations was mechanized by one or more control words. These control
words were identified by commands X 0 thru X 7 (recall that the first byte of a control
word is identified as a command byte). Each of the control words may be identified
by the command that it uses. Therefore, the term mechanization of the commands
or control words will be used interchangeably below. The controller cell will be the
only cell given the capability of mechanizing the commands or control words except
for the X0, end of transmission command. (Each cell is identical, however, only the
controller cell is capable of using certain logical functions in the cell. )
The commands for operation of the bus are generated by the controller cell when
it executes a CC instruction. Referring back to Section 9, one can see the various
codes used in the CC instruction to derive these commands. A description of how the
commands are actually formed and sent out over the bus will be given below. The
operations that take place in the controller cell during each clock time in the forma-
tion of these commands will be given. Note however that only the operations inherent
in forming the commands are given. Other operations inherent in the processor's
internal operation such as decoding the CC instructions, transferring the op code to
the instruction register, etc. are not given. These were omitted to illustrate only
the fundamental operations used in forming the commands.
A list of some abbreviations used throughout this section is given below:
     CC:             The Controller Cell instruction to generate a Global
                     Command described in Section 9 as op code 62.
     SCR:            Shift Count Register
     MB:             Memory Buffer Register
     BCR:            Buffer Communications Register
     IC Bus, Bus:    The Inter Cell Communication Bus
     P:              Program Counter
     U, A:           Upper Hardware Accumulator
     L:              Lower Accumulator
10.3.1 X0 and X1 Commands
X 0 is generated based on the following conditions in the CC instruction:
C field = 01x
and L field = 000
note: x means this bit can be a 1 or 0
X 1 is generated based on the following conditions in the CC instruction:
C field = 01x
and L field = 001
     Clock Time    Operations
     1             a. CC instruction --> MB
     2             b. Selected bits from MB --> BCR
                      [Diagram: fields of the CC instruction mapped onto the
                      X0 command byte: control bit (1 bit), X0 = 000 (3 bits),
                      Y (2 bits), L (3 bits).]
                      L is transferred directly.
                      Y is formed directly from OP and the first bit
                      of C.
                      X1: identical to X0 except L will now contain
                      001, X1 = 001.
     3             c. BCR is sent out on the IC bus
10.3.2 X2, X5, X6 Commands
X2, X5, and X6 are generated based on the following conditions in the CC
instruction:
C field = 01x
and L field = (010 + 101 + 110)
     Clock Time    Operations
     1             a. CC instruction --> MB
     2             b. Selected bits from MB --> BCR
                      [Diagram: CC instruction fields (Op Code, C, OP, L)
                      mapped onto the X2 command byte: control bit (1 bit),
                      X2 = 010 (3 bits), Cell ID (5 bits).]
                      X2 is formed from L.
                      Cell ID is formed from OP and the last bit of C.
                      X5: Same as X2 except L will now be 101.
                      X6: Same as X2 except L will now be 110.
     3             c. BCR is sent out on the IC Bus
Generation of an X2 command will also result in the Cell ID being stored in the
Shift Count Register. The reason for this will be explained in Section 10.4.
10.3.3 X3, X4 Commands
X3 and X4 commands are generated based on the following conditions in the
CC instruction:
C field = 010
and L field = (011 + 100)
The CC instruction uses an extended format to mechanize these commands.
Below is a diagram showing how the bits in the control words using commands X3 and
X4 are formed from the CC instruction words.
Condition 1: 1st 2 bits of OP = 00
     [Diagram: the three CC instruction words (word 1: Op, C, OP, L;
     word 2: Spare, Cell, Address; word 3: Spare, Count) mapped onto the
     four bytes of the X3 control word. Field widths in bits, each byte
     beginning with its control-line bit:
     byte 1: 1, 3, 5; byte 2: 1, 1, 7; byte 3: 1, 1, 2, 5; byte 4: 1, 1, 4, 3.]
X4 is formed the same way except L will now contain X4 = 100
The extended format (3 words) for the CC instruction is called for whenever:
C field = 01x
and L field = (011 + 100)
and OP field = 00xx
     Clock Time    Operations
     1             a. CC instruction Word 1 --> MB
     2             b. L from CC 1 --> BCR
                   c. P + 1 --> P
     3             d. P --> Memory (read)
     4             e. CC instruction Word 2 --> MB
                   f. P + 1 --> P
     5             g. Cell ID from CC 2 --> BCR
                   h. BCR --> IC Bus
                   i. Address 2 from CC 2 --> Saved
     6             j. Address 1 from CC 2 --> BCR
                   k. BCR --> IC Bus
                   l. P --> Memory (read)
     7             m. CC 3 --> MB
                   n. P + 1 --> P
     8             o. Address 2 --> BCR
                   p. Count 1 from CC 3 --> BCR
                   q. BCR --> IC Bus
                   r. P --> Memory (read)
                   s. Count 2 from CC 3 --> Saved
     9             t. Count 2 --> BCR
                   u. Next instruction --> MB
     12            v. BCR --> IC Bus
Note in the above timing description that the last byte on the bus is sent at clock
time 12 instead of at clock time 9 when it is ready. The reason for this is in easing
the mechanization within the receiving cell as will be pointed out in Section 10.4.
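The byte formation for the Condition 1 X3 command can be sketched in code. The following is a minimal illustration only: it assumes the four bytes use the field widths given in the bit row of the diagram (1 3 5, 1 1 7, 1 1 2 5, 1 1 4 3), that the address and count are 9 bits each with the high-order part sent first, that the second bit of bytes 2 to 4 is the line that marks the last command byte, and that the final 3 bits are spare. The function names are hypothetical.

```python
X3_CODE = 0b011  # L field value that selects X3

def pack_x3(cell_id, address, count):
    """Pack an X3 command into four 9-bit bus transfers (assumed layout)."""
    if not (0 <= cell_id < 32 and 0 <= address < 512 and 0 <= count < 512):
        raise ValueError("field out of range")
    # Byte 1: control bit = 1, X3 code (3 bits), Cell ID (5 bits)
    byte1 = (1 << 8) | (X3_CODE << 5) | cell_id
    # Byte 2: control bit = 0, last-byte flag = 0, high 7 address bits
    byte2 = address >> 2
    # Byte 3: control = 0, flag = 0, low 2 address bits, high 5 count bits
    byte3 = ((address & 0b11) << 5) | (count >> 4)
    # Byte 4: control = 0, flag = 1 (last command byte), low 4 count bits, 3 spare
    byte4 = (1 << 7) | ((count & 0b1111) << 3)
    return [byte1, byte2, byte3, byte4]

def unpack_x3(transfers):
    """Recover (cell_id, address, count) from the four transfers."""
    b1, b2, b3, b4 = transfers
    cell_id = b1 & 0b11111
    address = ((b2 & 0x7F) << 2) | ((b3 >> 5) & 0b11)
    count = ((b3 & 0b11111) << 4) | ((b4 >> 3) & 0b1111)
    return cell_id, address, count
```

Packing followed by unpacking returns the original fields, which is a quick way to check that the assumed widths add up to 9 lines per transfer.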
Condition 2: 1st 2 bits of OP = 01
[Diagram: CC instruction Words 1 to 4 mapped to X3 command Bytes 1 to 6]
The above condition results in 4 words being called for in the extended CC
format. This condition is used when an X3 inter-cell bus group switch command is
to be generated. The timing is similar to that given above, except slightly
longer because more bytes are output on the bus.
Condition 3: 1st 2 bits of OP = 10
[Diagram: CC instruction Words 1 to 3 mapped to the X4 command bytes]
As in condition 2 this X4 is generated for an inter-cell bus group switch
command. However, it only requires 3 CC words for its mechanization.
An X3 command can also be generated under another condition:
C field = 011
and L field = 011
Clock Time   Operations
1            a. CC instruction → MB
2            b. Selected bits from MB → BCR
                SCR register → BCR
3            c. BCR → IC Bus
[Diagram: X3 byte = control bit (1), X3 = 011, Cell ID; bits: 1 3 5; the Cell ID is taken from the SCR]
This X3 command is simply one byte long and therefore does not use the
extended CC instruction format of the previous X3 command. It also requires that
the shift count register (SCR) be loaded with the Cell ID prior to executing this CC
instruction. Recall that execution of the CC instruction forming an X2 command
loads the SCR with the Cell ID.
10.3.4 X7 Commands
X7 commands use an extension format to derive many commands from the basic
X7 code. These commands will be generated based on the following conditions in the
CC instruction:
Condition 1: C field = 010
             and L field = 111
or
Condition 2: C field = 011
             and L field = 111
An X7 command will use an extended word CC format; in particular, two words will be
used to form the CC instruction.
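The field conditions given in Sections 10.3.1 through 10.3.4 can be collected into a single decision routine. The sketch below is illustrative only: it assumes 3-bit C and L fields and a 4-bit OP field whose "1st 2 bits" are the two high-order bits, and it assumes that L = 000 selects X0 (suggested by the X0 = 000 code but not restated here). The function name is hypothetical; combinations the text does not describe raise an error.

```python
def classify_cc(c_field, l_field, op_field=0):
    """Return (X command name, number of CC instruction words)."""
    if c_field not in (0b010, 0b011):            # "C field = 01x"
        raise ValueError("C field does not select an X command")
    if l_field == 0b000:
        return ("X0", 1)                         # assumption, see lead-in
    if l_field == 0b001:
        return ("X1", 1)
    if l_field in (0b010, 0b101, 0b110):
        return ("X" + str(l_field), 1)           # X2, X5, X6
    if l_field == 0b111:
        return ("X7", 2)                         # Condition 1 or 2, two words
    # Remaining L values are 011 (X3) and 100 (X4).
    name = "X" + str(l_field)
    if c_field == 0b011:
        if name == "X3":
            return ("X3", 1)                     # one-byte X3, Cell ID from SCR
        raise ValueError("no X4 form is described for C field = 011")
    top2 = (op_field >> 2) & 0b11                # 1st 2 bits of OP
    if top2 == 0b00:
        return (name, 3)                         # Condition 1: extended format
    if top2 == 0b01 and name == "X3":
        return ("X3", 4)                         # Condition 2: group switch
    if top2 == 0b10 and name == "X4":
        return ("X4", 3)                         # Condition 3: group switch
    raise ValueError("combination not described in the text")
```

For example, `classify_cc(0b010, 0b011, 0b0100)` selects the four-word X3 group switch form, while `classify_cc(0b011, 0b011)` selects the one-byte X3.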
Below is an illustration showing how the X7 command is formed for the
first condition. This particular X7 extension illustrated is an output command to a
particular cell with a specified count of the number of words to be output.
[Diagram: CC instruction Words 1 and 2 (fields: Op Code, C, OP, L, Spare, X7 EXT, Count) mapped to X7 command Bytes 1 to 3; the Cell ID is taken from the SCR]
X7 bits: 1 3 5 | 1 1 4 3 | 1 1 6 1
The X7 extension command illustrated above requires 3 bytes to be sent out on the
IC bus. The "Cell ID" is formed from the Shift Count Register. It is stored in this
register by the prior formation of an X2 command. The "count" is taken from the
2nd word of the CC instruction and used in the 2nd and 3rd bytes.
Timing for Condition 1
Operations (in sequence)
a. CC instruction Word 1 → MB
b. L from CC1 → BCR
c. SCR register → BCR
d. P + 1 → P
e. P → Memory (read)
f. CC instruction Word 2 → MB
g. BCR → IC Bus
h. X7 EXT from CC2 → BCR
i. Part of count from CC2 → BCR
j. P + 1 → P
k. BCR → IC Bus
l. Part of count from CC2 → BCR
m. P → Memory (read)
n. Next instruction → MB
o. BCR → IC Bus
Next, an illustration of how the X7 command is formed when the second condition
occurs is given below. This particular forming of the X7 bytes is followed if
the 1st bit of the 2nd CC word is a 1. If it is 0, a slightly different forming is
followed for the 2nd byte. The first case is for X7 commands to cells and the latter
for X7 commands to group switches.
[Diagram: CC instruction Words 1 and 2 (fields: Op Code, C, OP, L; 1, Spare, L, Cell ID) mapped to X7 command Bytes 1 and 2]
X7 bits: 1 3 5 | 1 1 4 3
The timing for condition 2 will be given below:
Clock Time   Operation
1            a. CC instruction Word 1 → MB
2            b. L from CC1 → BCR
             c. OP from CC1 → Saved
             d. P + 1 → P
3            e. P → Memory (read)
4            f. CC instruction Word 2 → MB
5            g. Cell ID → BCR
             h. BCR → IC Bus
             i. P + 1 → P
6            j. P → Memory (read)
             k. L from CC2 → BCR
             l. OP → BCR
             m. BCR → IC Bus
7            n. Next instruction → MB
10.4 MECHANIZATION OF INTER-CELL COMMUNICATION BUS OPERATIONS IN
A CELL
This section will discuss the mechanization, within a cell, of the bus communi-
cation operations (the operations upon the generation or reception of X commands by a
cell). These operations were presented in Section 10.1, where a description of the
control word bytes was given. A detailed description of the logical properties
required in the cell to carry out the operations will be given below. Each of the
operations will be presented in the same order as presented in Section 10.1.
10.4.1 Controller Cell, C0, to Send Data to a Cell, C1
Two basic approaches were considered to mechanize this operation as well as
all the others, namely software and hardware approaches. The approach selected
was to use a hardware mechanization. The primary reason behind this choice is that
software is very inefficient with respect to the time it takes to mechanize. For
example, in this operation, the basic function to be performed once the command from
the bus has been decoded, is to simply store a set of words in memory that arrive via
the bus. These words may arrive at a rate up to 1 word every 2 clock cycles since
the bus is a 1/2 word in parallel. However, a software routine would require a three
instruction loop:
1. Test, Increment Index Register by 1, Jump on 0
2. STORE
3. JUMP
This loop would take at a minimum 7 memory cycles to execute. Assuming 2 clock
cycles per memory cycle, this results in 14 clock cycles for each word to be stored.
Therefore, this would result in very inefficient use of the bus.
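The comparison above is simple arithmetic and can be checked directly. The short sketch below uses only the figures quoted in the text (7 memory cycles per loop pass, 2 clock cycles per memory cycle, and a bus capable of 1 word every 2 clock cycles); the constant and function names are illustrative.

```python
SOFTWARE_LOOP_MEMORY_CYCLES = 7   # minimum for the three-instruction loop
CLOCKS_PER_MEMORY_CYCLE = 2       # assumption stated in the text
BUS_CLOCKS_PER_WORD = 2           # half-word-parallel bus: 2 transfers per word

def software_clocks_per_word():
    """Clock cycles the software loop needs to store one word."""
    return SOFTWARE_LOOP_MEMORY_CYCLES * CLOCKS_PER_MEMORY_CYCLE

def bus_utilization(clocks_per_word):
    """Fraction of the peak bus rate achieved at a given store rate."""
    return BUS_CLOCKS_PER_WORD / clocks_per_word

if __name__ == "__main__":
    sw = software_clocks_per_word()
    print(sw, round(bus_utilization(sw), 3))
```

Under these figures the software loop needs 14 clock cycles per word, so it could use at most one seventh of the available bus rate, whereas a mechanization that keeps pace with the bus uses all of it.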
The hardware approach will now be described. Reference should be made to
Section 10.1 for a description of the control
word bytes. The mechanization will be described for the two cells, C0 and C1. First
the mechanization within the controller cell, CO, will be described. The controller
cell will execute the following instructions:
1. Double Precision Load A and L
2. CC (X3)
3. CC (Output)
The first instruction will load A and L with the address and count of the number of
words to be sent out. Instruction 2 is the CC command to generate the format X3 as
described in Section 9 of this report. This particular CC instruction requires
3 words for its mechanization as discussed previously. It will result in the 4 X3
command bytes being sent out over the bus. The last instruction is the CC command
that tells the controller cell to start outputting data on the bus. This instruction
simply exchanges the accumulator (previously loaded with the address) and the
program counter and sets the output mode flip-flop. The controller cell will now
simply read data from its memory and output it over the bus. Each time the program
counter is incremented, the count contained in L will be decremented by 1 and checked
against 0. When the count equals 0 the output mode flip-flop will be reset and the
accumulator will be transferred to the program counter.
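The output-mode behavior just described can be modeled in a few lines. The sketch below is a behavioral illustration, not the hardware design: it assumes 16-bit words sent as two 8-bit halves with the high-order half first, models memory as a Python dict, and uses hypothetical names throughout.

```python
HALF_BITS = 8
HALF_MASK = (1 << HALF_BITS) - 1

def run_output_mode(memory, a, l, p):
    """Model CC (Output): a = start address, l = word count, p = program counter.

    Returns (bytes placed on the bus, restored program counter).
    """
    if l <= 0:
        raise ValueError("count must be at least 1")
    a, p = p, a                  # exchange accumulator and program counter
    output_mode = True           # output mode flip-flop set
    bus = []
    while output_mode:
        word = memory[p]         # P -> Memory (read), word -> MB
        bus.append((word >> HALF_BITS) & HALF_MASK)   # first half -> BCR -> Bus
        bus.append(word & HALF_MASK)                  # second half -> BCR -> Bus
        p += 1                   # P + 1 -> P
        l -= 1                   # L - 1 -> L, checked against 0
        if l == 0:
            output_mode = False  # flip-flop reset
            p = a                # accumulator -> program counter
    return bus, p
```

With `memory = {100: 0x1234, 101: 0xABCD}`, calling `run_output_mode(memory, a=100, l=2, p=7)` places four bytes on the bus and returns the program counter to 7, so the interrupted instruction stream resumes where it left off.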
A listing of the sequence of operations, starting from the time the first byte
generated by the second instruction (X3 commands) is output over the bus, is given
below.
Operations in Controller Cell C0
Clock Time   Operation
1            X3 Byte 1 → Bus
2            X3 Byte 2 → Bus
3            -
4            X3 Byte 3 → Bus
5            Next instruction (CC Output) → MB
6            U (Accumulator) ↔ P (Program Counter)
             P → Memory (read)
7            1st data word → MB
             L (Lower Accumulator) - 1 → L
             P + 1 → P
8            X3 Byte 4 → Bus
             1st Word (1/2) → BCR (Buffer Communication Register)
             P → Memory (read)
9            BCR → Bus
             1st Word (1/2) → BCR
             P + 1 → P
             L - 1 → L
             2nd Word → MB
10           BCR → Bus
             2nd Word (1/2) → BCR
             P → Memory (read)
11           BCR → Bus
             2nd Word (1/2) → BCR
             P + 1 → P
             L - 1 → L
             3rd Word → MB
12           etc. until L = 0, then A → P
The mechanization within the receiving cell, C1, will be discussed below. The
first byte received by C1 will contain the X3 command code and the cell's, C1,
identification code. Cell, C1, will recognize that the command is for itself and will
decode the command and issue an interrupt. The bus communications will be handled
on an interrupt basis with the interrupt halting the execution of the present instruction.
This approach was taken since the cell could be in the process of executing a long
instruction such as multiply or divide and it would be inefficient to tie the bus up
waiting for instructions to be completed.
The interrupt will result in the program counter, accumulator, memory buffer
and possibly some control flip-flops being stored in fixed locations in memory.
Basically the functions that will occur in mechanizing the bus operation are:
Place address in P
Place count in A
Assemble bytes from the bus in MB
Use P as memory address to store MB
Decrement A by 1, test for 0
Increment P by 1
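The functions listed above amount to a short store loop. The following sketch models it at the word level, assuming 16-bit words received as two 8-bit bus bytes with the high-order half first; memory is modeled as a dict and all names are illustrative.

```python
def receive_words(memory, address, count, bus_bytes):
    """Store `count` words arriving as pairs of bus bytes, starting at `address`.

    Returns the final value of P (one past the last word stored).
    """
    p = address                  # place address in P
    a = count                    # place count in A
    incoming = iter(bus_bytes)
    while a != 0:
        first = next(incoming)   # first half word, via BCR
        second = next(incoming)  # second half word, via BCR
        mb = (first << 8) | second   # assemble bytes from the bus in MB
        memory[p] = mb           # use P as memory address to store MB
        a -= 1                   # decrement A by 1, test for 0
        p += 1                   # increment P by 1
    return p
```

For instance, receiving the bytes `[0x12, 0x34, 0xAB, 0xCD]` with address 200 and count 2 stores 0x1234 at 200 and 0xABCD at 201.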
The sequence of operations that will occur in the cell, starting from the time
the first byte is received over the bus, are listed below. It should be kept in mind
that the buffer communication register (BCR) will be mechanized with master slave
type of flip-flops. Therefore it is possible to load byte i + 1 into BCR while byte i
is being read from it in the same clock time.
Operations in Receiving Cell, C1
Clock   Operation
Command bytes received by the cell:
1       Byte 1 → BCR, Decoded, Interrupt
2       Byte 2 → BCR, Processor Stopped, Fixed address
        sent to memory (write)
3       P → Memory (P is stored)
4       BCR → part of P, Byte 3 → BCR,
        Fixed address sent to memory (write)
5       Part of BCR → part of P, MB → Memory
6       Fixed address sent to memory (write)
7       A → Memory (A is stored)
        Part of BCR → A
8       Byte 4 → BCR
Data received:
9       BCR → A
        Data byte 1 → BCR
10      BCR → MB, Data byte 2 → BCR,
        P (Address) → Memory (write)
11      BCR → MB, MB → Memory,
        Data byte 3 → BCR, A - 1 → A, P + 1 → P
12      BCR → MB, Data byte 4 → BCR, P → Memory
13      BCR → MB, MB → Memory, Data byte 5 → BCR,
        A - 1 → A, P + 1 → P, etc.
        until A = 0, then reload stored registers and start
        processor
As noted the above operations are all performed solely by hardware. Referring
back to Section 10.1, it can be seen that the 1st line in the byte from the bus is used
for control purposes. Following the first command byte (identified by the C (control
line) being a 'one'), the first line in the byte will be a '0' and will be a 'one' only when
the command sequence is complete and the next bytes on the bus represent data.
Therefore this 0 to 1 transition will be used to set the communication bus control
logic to accept information on the bus as data. The sequence of operations described
above will then be clocked based on the occurrence of data bytes received via the bus.
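The framing rule described above (control line marks the first command byte, and a '1' on the first line marks the last command byte) can be expressed as a small state machine. The sketch below is an assumption-laden illustration: each bus transfer is modeled as a pair (control line, 8-bit value), the "first line" is taken to be the high-order bit of the 8-bit value, and single-byte commands are not handled. The function name is hypothetical.

```python
def split_command_and_data(transfers):
    """Separate a multi-byte command sequence from the data bytes that follow.

    transfers: iterable of (control_line, byte) pairs, byte in 0..255.
    Returns (command_bytes, data_bytes).
    """
    command, data = [], []
    state = "idle"
    for control_line, byte in transfers:
        if state == "idle":
            if control_line == 1:        # first command byte
                command.append(byte)
                state = "command"
        elif state == "command":
            command.append(byte)
            if (byte >> 7) & 1:          # line 1 goes 0 -> 1: last command byte
                state = "data"
        else:
            data.append(byte)            # everything after is accepted as data
    return command, data
```

Given a four-byte command whose fourth byte has the first line set, followed by two more transfers, the routine returns four command bytes and two data bytes.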
It may be noted above that command byte 4 is received at clock 8. If this com-
mand byte were received earlier, extra hardware would be required to store informa-
tion from the command bytes. This is the case since MB cannot be used until A has
been stored. Therefore MB and A would both have to be stored before information
could be stored in them. This is why P is stored first and used immediately to store
information (address) from the command bytes. As can be seen from the mechaniza-
tion within the controller cell, C0, the fastest it can output the first data byte is at
clock 9. Therefore, there is no penalty in requiring it to output command byte 4 at
clock 8; in fact this simplifies the mechanization in cell C1.
10.4.2 Controller Cell, C0, to Request Data From C1
The mechanization within the controller cell will be described first. The con-
troller cell will execute the following instructions:
1. Double Precision Load A and L
2. CC (X4)
3. CC (Input)
The operations are quite similar to that described for the previous operation, 10.4.1;
A and L are loaded with the address and count of the words that will be read out of
the controller cell's memory. Instruction 2 is a CC instruction that generates 4 X4
bytes to be output over the bus as described in Section 10.3.3. Finally the last
instruction is a CC instruction that sets the input mode flip-flop and exchanges A and
P. The controller cell is now ready to input words from the bus. Each time a word
is stored in memory, the program counter is incremented by 1 and L decremented by
1 and checked against 0. When the count is 0 the contents of A are transferred back
to P and the controller cell proceeds with its program. It should be noted that the
cell will contain a simple one shot type of timer that is set whenever the CC (input)
instruction is executed. The purpose of this timer is to allow a certain maximum
time for a cell to respond with data. If the cell fails to respond within the maximum
time interval an interrupt will be issued that will be processed by a software routine
in the controller cell (generally this will indicate a failed cell, C1). In addition,
the controller cell contains another simple one-shot type of timer that times the time
interval between words on the bus. If the time interval exceeds that allowed by the
timer, an interrupt is issued. This prevents the cell C1 from transmitting only a
part of the words and hanging up the controller cell by having it keep on waiting for
the remainder of the words. Of course, once the count in L is zero this timer is
inhibited.
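The two one-shot timers can be illustrated with a small clock-by-clock model. This is a sketch under stated assumptions: time is counted in clock ticks, the limits are arbitrary parameters, and the class and function names are hypothetical. The first timer bounds the wait for the first word; the second bounds the gap between words and is inhibited once the count is satisfied.

```python
class OneShotTimer:
    """Counts down once after set(); tick() returns True on expiry."""

    def __init__(self, limit):
        self.limit = limit
        self.remaining = None            # None means not running

    def set(self):
        self.remaining = self.limit

    def inhibit(self):
        self.remaining = None

    def tick(self):
        if self.remaining is None:
            return False
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = None
            return True                  # would raise an interrupt
        return False


def input_words(arrivals, count, response_limit, gap_limit):
    """arrivals: one entry per clock, a word or None. Returns (words, status)."""
    timer = OneShotTimer(response_limit)
    timer.set()                          # set when CC (Input) is executed
    words = []
    for item in arrivals:
        if item is not None:
            words.append(item)
            if len(words) == count:
                timer.inhibit()          # count reached zero
                return words, "complete"
            timer = OneShotTimer(gap_limit)
            timer.set()                  # restart as inter-word timer
        elif timer.tick():
            status = "response timeout" if not words else "gap timeout"
            return words, status
    return words, "pending"
```

A cell that never answers produces a "response timeout", while a cell that sends only part of the requested words produces a "gap timeout"; either would be handled by a software routine in the controller cell.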
A listing of the sequence of operations, starting from the time the first byte
generated by the second instruction (X4) is output over the bus is given below.
Operations in Controller Cell C0
Clock   Operation
1       X4 Byte 1 → Bus
2       X4 Byte 2 → Bus
3       -
4       X4 Byte 3 → Bus
5       Next instruction (CC Input) → MB
6-7     A ↔ P
        Timer Set
8       X4 Byte 4 → Bus
9-10    -
11      Word 1₁ → BCR
        P → Memory (write)
12      BCR → MB
        Word 1₂ → BCR
13      BCR → MB
        MB → Memory (Word 1 stored)
        Word 2₁ → BCR
        A - 1 → A
        P + 1 → P
14      BCR → MB
        Word 2₂ → BCR
        P → Memory (write)
15      BCR → MB
        Word 3₁ → BCR
        MB → Memory (Word 2 stored)
        A - 1 → A
        P + 1 → P
The mechanization within the sending cell, C1, will be discussed below. The first
byte received by C1 will contain the X4 command code and the identification code
of cell C1. Cell C1 will recognize that the command is for itself and will decode the
command and issue an interrupt. The same process as described for the previous
operation, 10.4.1, will be followed on the interrupt. The primary difference in the
mechanization given below is that words will be read out of C1's memory instead of
into it as in 10.4.1.
Operations in Sending Cell, C1
Clock   Operation
Command bytes received by C1:
1       Byte 1 → BCR, Decoded, Interrupt
2       Byte 2 → BCR, Processor Stopped,
        Fixed Address → Memory (write)
3       P → Memory (P is stored)
4       BCR → Part of P
        Byte 3 → BCR
        Fixed Address → Memory (write)
5       Part of BCR → part of P
        MB → Memory
6       Fixed Address → Memory (write)
7       A → Memory (A is stored)
        Part of BCR → A
8       Byte 4 → BCR
        P → Memory (read)
9       BCR → A
        Word 1 → MB
        P + 1 → P
10      Word 1₁ → BCR
        P → Memory (read)
        A - 1 → A
Data on the bus:
11      BCR → Bus
        Word 1₂ → BCR
        Word 2 → MB
        P + 1 → P
12      Etc. until A = 0,
        then restart processor,
        reload P, A, and MB registers
10.4.3 Controller Cell, C0, to Scan a Cell for Requests - Cell, C1, to Request
Words from Cell, C2
This operation is initiated by the controller cell executing a CC (X2) instruction.
This X2 command forces cell C1 to respond with a request word. Upon executing
the CC (X2) instruction the controller cell automatically enters the idle state. It
will also set a simple one-shot timer (previously introduced in Section 10.4.2) so
that the response from cell C1 is received within some maximum time limit. If it
is not received within this time limit an interrupt will occur as previously described.
The response word will be placed in the A and L registers under hardware con-
trol, and once this is accomplished the controller cell will be released from the idle
state and allowed to execute its next instruction. This instruction is the start of a
software routine to process the request response word from cell C1. This
routine will check what request is being made, if any (Input or Output), what cell and
location communication is desired with, and the count of the number of words to be
communicated. It will check for valid requests and may possibly reduce the number
of words desired to be communicated. At the end of the software routine the proper
CC instructions will be loaded with the necessary information so as to initiate the
requested communications.
In the operation under consideration here cell C1 is requesting to input words.
It will therefore receive an X3 (input) command from the controller cell if its
requested communications will be honored. If its request is not honored, the con-
troller cell will send it an X7 command that signifies request not honored. This X7
command results in an interrupt in cell C1 that forces it to enter its communication
executive routine. Once cell C1 has been told to input, it is ready to accept data.
However, the number of words it receives may be different from the number it
requested. This situation is handled by the simple one-shot timer presented in
Section 10.4.2. This timer allows a maximum time between words received over
the bus. If this time is exceeded, an interrupt occurs that sends cell C1 to a software
routine that will note the count of the number of words remaining to be received
indicated in the accumulator. This routine will use this information to update its
communication bus executive tables. If the count requested is received, the accumu-
lator will count down to zero and this will restart the processor section of cell C1
so that it may complete its interrupted instruction.
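The decisions made by the controller cell's software routine can be outlined as follows. This is a hedged sketch, not the routine of Section 11: the `Request` record, its field names, the validity test, and the tuple form of the planned commands are all hypothetical. Only the command choices follow the text: an input request is answered with X3 to the requester and X4 to the source cell (as in this section), an output request with X3 to the destination cell and X7 to the requester (as in 10.4.4), and a refused request with an X7 "request not honored".

```python
from dataclasses import dataclass

@dataclass
class Request:
    kind: str      # "input", "output", or "none" (hypothetical encoding)
    cell: int      # cell the requester wants to communicate with
    address: int   # location in that cell
    count: int     # number of words requested

def plan_commands(req, requester, valid_cells, max_count):
    """Return the list of X commands the controller cell would issue."""
    if req.kind == "none":
        return []
    if req.kind not in ("input", "output"):
        return [("X7", requester, "request not honored")]
    if req.cell not in valid_cells or req.count <= 0:
        return [("X7", requester, "request not honored")]
    count = min(req.count, max_count)    # the count may be reduced
    if req.kind == "input":
        return [("X3", requester),
                ("X4", req.cell, req.address, count)]
    return [("X3", req.cell, req.address, count),
            ("X7", requester, "output", count)]
```

Because the count may be reduced, the requesting cell cannot assume it will receive every word it asked for, which is why the inter-word timer described above is needed.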
For further details on the software routines discussed above, both in the con-
troller cell and working cell, C1, the reader is referred to Section 11 of this report.
In addition this section will show how these routines tie in with the other executive
functions within a cell. A detailed mechanization of the hardware aspects of the
operation will be given below.
Operations in Controller Cell, C0
Clock   Operation
1       CC (X2) instruction → MB
2       X2 Command → BCR
        Cell ID → SCR
        One-Shot Timer 1 → Set
3       BCR → Bus
4-8     Idle
9       Response Byte 1 → BCR
        P + 1 → P
10      BCR → A
        Response Byte 2 → BCR
        P → Memory (read)
11      BCR → A
        Response Byte 3 → BCR
        Next Instruction → MB
12      BCR → L
        Response Byte 4 → BCR
        (For details on the next instruction and the
        remainder of the software routine to process
        the response word see Section 11.2. It is
        assumed that this routine will take j clock
        times to complete.)
13      BCR → L
j+10    CC (X3) instruction → MB
j+11    X3 Command → BCR
        P + 1 → P
j+12    BCR → Bus
        P → Memory (read)
j+13    CC (X4) instruction → MB
j+17    X4 Byte 1 → Bus
j+18    X4 Byte 2 → Bus
j+20    X4 Byte 3 → Bus
j+24    X4 Byte 4 → Bus
        (See Section 10.3.3 for details on how these
        bytes are formed)
Operations in Requesting Cell, C1
Clock   Operation
3       X2 Command → BCR, Decoded, Interrupt
4       Fixed Address → Memory (write)
5       MB → Memory (stored)
6       Fixed Address → Memory (read)
7       Response Word 1 → MB
8       Response Word 1₁ → BCR
        Fixed Address → Memory (read)
9       BCR → Bus
        Response Word 1₂ → BCR
        Response Word 2 → MB
10      BCR → Bus
        Response Word 2₁ → BCR
11      BCR → Bus
        Response Word 2₂ → BCR
12      BCR → Bus
13      9 bit count from Response Word 2
        saved in BCR
        Fixed Address → Memory (write)
14      A → Memory
15      Fixed Address → Memory (write)
        BCR (Count) → A
16      P → Memory (stored)
17      Fixed Address → Memory (read)
18      Address where to store data → MB
19      MB → P
20      Idle until command from C0
j+12    X3 Command → BCR, Decoded, Logic
        set to input over the bus
Data received from C2:
j+28    Word 1₁ → BCR
j+29    BCR → MB
        Word 1₂ → BCR
        P → Memory (write)
j+30    BCR → MB
        MB → Memory (Word 1 stored)
        Word 2₁ → BCR
        A - 1 → A
        P + 1 → P
j+31    BCR → MB
        Word 2₂ → BCR
        P → Memory (write)
j+32    BCR → MB
        MB → Memory (Word 2 stored)
        Word 3₁ → BCR
        A - 1 → A
        P + 1 → P
j+33    Etc. until A = 0 or the one-shot timer that times the
        interval between words exceeds some maxi-
        mum time; then, if A ≠ 0, interrupt.
It should be noted that the logic to input data in C1 is not enabled until the last X4
byte to C2 has been sent out by C0 since this byte has a '1' on line 1. (Recall that
this logical property is required to enable the bus communication logic for X3, X4
and X7 commands.)
Operations in Sending Cell C2
Clock    Operation
(See 10.4.2 for these operations, since they are identical)
j + 28   Word 1₁ → Bus
j + 29   Word 1₂ → Bus
ETC.
10.4.4 Controller Cell, C0, to Scan a Cell for Requests - Cell, C1, to Request
to Send Words to Cell, C2
This operation is very similar to that described above in 10.4.3. The primary
difference is that, upon examining the response word from C1, the controller cell
will send out different X commands. Cell C1 will of course be following a procedure
for outputting data, while cell C2 will follow the procedure presented in 10.4.1 to
input data.
Operations in Controller Cell
Clock   Operation
1       CC (X2) instruction → MB
2       X2 Command → BCR
        Cell ID → SCR
        One-Shot Timer 1 → Set
3       BCR → Bus
4-8     Idle
9       Response Byte 1 → BCR
        P + 1 → P
10      BCR → A
        Response Byte 2 → BCR
        P → Memory (read)
11      BCR → A
        Response Byte 3 → BCR
        Next Instruction → MB
12      BCR → L
        Response Byte 4 → BCR
        (For details on the next instruction and the
        remainder of the software routine to process
        the response word see Section 11.2. It is
        assumed that this routine will take k clock
        times to complete. This routine will, of
        course, branch differently than that in the
        previous operation 10.4.3, since the response
        word now contains an output request.)
13      BCR → L
k+10    CC (X3) instruction → MB
k+14    X3 Byte 1 → Bus
k+15    X3 Byte 2 → Bus
k+17    X3 Byte 3 → Bus
k+18    CC (X7) instruction → MB
k+21    X3 Byte 4 → Bus
        (See Section 10.3.3 for detailed mechaniza-
        tion of the CC (X3) instructions)
k+22    X7 Byte 1 → Bus
k+23    X7 Byte 2 → Bus
k+24    X7 Byte 3 → Bus
Operations in Requesting Cell, C1
Clock   Operation
3-20    The operations for clock times 3-20 are
        identical to those given in 10.4.3, except that at
        clock time 13 the count is not saved in
        BCR and at clock time 15 BCR is not trans-
        ferred to A
k+22    X7 Byte 1 → BCR, Decoded
k+23    X7 Byte 2 → BCR, Output Operation
        Decoded
        Part of BCR → A and MB
k+24    X7 Byte 3 → BCR
        Fixed Address → Memory (write)
k+25    Part of BCR → A and MB
        MB → Memory (count is stored)
k+26    P → Memory (read)
k+27    Word 1 → MB
        P + 1 → P
k+28    Word 1₁ → BCR
        P → Memory (read)
        A - 1 → A
k+29    BCR → Bus (1st data byte out)
        Word 1₂ → BCR
        Word 2 → MB
        P + 1 → P
k+30    Etc. until A = 0, then complete interrupted
        instruction
Operations in Receiving Cell C2
Clock   Operation
k+14    X3 Command → BCR, Decoded, Interrupt
k+15    X3 Byte 2 → BCR
        Etc. (the same procedure is followed as
        in 10.4.1). At clock time k+22 the cell is
        ready to accept data; however, the first
        data byte will be received at clock time
        k+29
10.4.5 Controller Cell to Tell One Cell, C1, to Output to Another Cell, C2
This operation is simply a combination of the X3 and X4 commands described
in 10.4.1 and 10.4.2. Therefore the description here will be minimal.
Operations in Controller Cell
Clock   Operation
1       CC (X3) instruction → MB
5       X3 Byte 1 → Bus
6       X3 Byte 2 → Bus
8       X3 Byte 3 → Bus
9       CC (X4) instruction → MB
12      X3 Byte 4 → Bus
13      X4 Byte 1 → Bus
14      X4 Byte 2 → Bus
16      X4 Byte 3 → Bus
20      X4 Byte 4 → Bus
Operations in Receiving Cell C2
Clock   Operation
5       X3 → BCR, Decoded, Interrupt
13      Ready to Accept Data
Operations in Sending Cell C1
Clock   Operation
13      X4 → BCR, Decoded, Interrupt
23      1st Data Byte on the Bus
It should be recalled here that the receiving cell C2 uses the 0 to 1 transition on
line 1 in the 4th command byte of X4 to enable its data inputting circuitry.
10.4.6 Controller Cell to Tell One Cell, C1, to Input from Another Cell, C2
This operation is mechanized identically to that above.
10.4.7 Controller Cell to Tell Another Cell, C1, to Reconfigure Some Internal
Control State
Case 1: CC (X6)
Operations in Controller Cell
Clock   Operation
1       CC (X6) instruction → MB
2       X6 → BCR
3       BCR → Bus
Operations in Cell C1
Clock   Operation
3       X6 → BCR, Decoded, Interrupt
4       Change Control State
Case 2: CC (X7)
Operations in Controller Cell
Operation (in sequence)
CC (X7) Instruction → MB
X7 Byte 1 → Bus
X7 Byte 2 → Bus
Operations in Cell C1
Operation (in sequence)
X7 → BCR, Decoded, Interrupt
X7 Byte 2 → BCR, Decoded
Change Control State
10.4.8 Controller Cell to Request Status Word From Cell C1
Operations in Controller Cell
Clock   Operation
1       CC (X5) instruction → MB
2       X5 → BCR
        One-Shot Timer 1 → Set
3       BCR → Bus
4-7     Idle
8       Status Word → BCR
        (A software routine will process the
        Status Word)
Operations in Cell C1
Clock   Operation
3       X5 → BCR, Decoded, Interrupt
4       Fixed Address → Memory (write)
5       MB → Memory (Saved)
6       Fixed Address → Memory (read)
7       Status Word → MB
8       MB → BCR
        BCR → Bus
It should be noted here that the one-shot timer introduced in Section 10.4.3
is again used here to prevent a no response condition from hanging up the controller
cell.
10.4.9 Global Operations - Format GC Instructions
The previously described operations were all local operations. The operations
that follow will now be classified as Global. The first of these to be described is
the Format GC Instruction.
Operations in Controller Cell
Clock   Operation
1       CC (X0) Instruction → MB
2       X0 → BCR
3       BCR → Bus
Operations in the Cells
Clock   Operation
3       a) X0 → BCR
        b) If Y = 11 and cell ≠ dependent global
           state, then ignore the command
        c) X0 decoded and proper GC Format
           Signal generated and sent to processor
           control logic
10.4.10 Level Control GC Instructions (X0)
Operations in Controller Cell
Clock   Operation
1       CC (X0) Instruction → MB
2       X0 → BCR
3       BCR → Bus
Operations in the Cells
Clock   Operation
3       a) X0 → BCR
        b) If Y = 01 + 10, and cell = any dependent
           state, and L = Level Register in the cell,
           then decode the command
        c) X0 decoded and change of state issued
           to the cell
10.4.11 Level Control GC Instructions (X1)
Same as 10.4.10 except an X1 code is used and Y does not have to be checked.
10.4.12 Individual Cell GC Instructions (X7)
Operations in Controller Cell
Operation (in sequence)
CC (X7) Instruction Word 1 → MB
X7 Byte 1 → Bus
X7 Byte 2 → Bus
(See Section 10.3.4 for details on X7
mechanization)
Operations in Receiving Cell C1
Operation (in sequence)
X7 → BCR, Decoded
X7 Byte 2 → BCR
X7 EXT is decoded and the proper change of
state is performed (see 10.1.2 for these
operations); if indicated by X7 EXT, L is
used to set the level register
It should be noted in the above operation that the X7 is not decoded as a GC
command until X7 EXT is received.
The following procedures are used to change the states. From the independent
state to the dependent local state the following is done. The program counter is
saved in the same location as used for interrupts and the present instruction com-
pleted, then the hardware accumulator is stored in its memory location. The next
instruction will then be taken from a fixed location in memory. From the independent
to the dependent wait or global state the same is done except that no next instruction
is fetched. It should be noted that it will be the responsibility of the program that
follows to save the necessary accumulators and index/bank registers in this cell;
essentially this may be considered an interrupting program.
To go from a dependent state to the independent state, the program counter is
restored from the same location that it is automatically stored in above and the cell
is restarted independently.
To enter the controller cell state, a fixed location will be used to fetch the
next instruction. If the cell was in an independent state or dependent local state
prior to this, the program counter and hardware accumulator would be saved as
done above and the present instruction completed.
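The state-change procedures above can be summarized as a small state machine. The sketch below is illustrative only: the fixed memory locations, the state names as strings, and the `Cell` class are hypothetical, completion of the present instruction is not modeled, and transitions between two dependent states are treated like the described cases without support from the text.

```python
from dataclasses import dataclass, field

# Hypothetical fixed locations; the source does not give addresses.
PC_SAVE, ACC_SAVE, LOCAL_ENTRY, CONTROLLER_ENTRY = 0, 1, 2, 3
DEPENDENT = ("dependent local", "dependent wait", "dependent global")

@dataclass
class Cell:
    state: str = "independent"
    pc: int = 0
    acc: int = 0
    memory: dict = field(default_factory=dict)

    def _save(self):
        self.memory[PC_SAVE] = self.pc     # same location as used for interrupts
        self.memory[ACC_SAVE] = self.acc   # hardware accumulator

    def change_state(self, new_state):
        old = self.state
        if new_state in DEPENDENT:
            if old == "independent":
                self._save()
            if new_state == "dependent local":
                self.pc = LOCAL_ENTRY      # next instruction from a fixed location
            # dependent wait / global: no next instruction is fetched
        elif new_state == "independent":
            self.pc = self.memory[PC_SAVE] # restore and restart independently
        elif new_state == "controller":
            if old in ("independent", "dependent local"):
                self._save()
            self.pc = CONTROLLER_ENTRY     # fixed location for next instruction
        else:
            raise ValueError("unknown state: " + new_state)
        self.state = new_state
```

A cell running at program counter 40 that is moved to the dependent local state and later returned to the independent state resumes at 40, which mirrors the save and restore through the interrupt location described above.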
10.4.13 End of Transmission (X0)
This command is generated by the bus I/O control hardware in a cell upon the
completion of an X4, output, command. It is the only command that can be sent from
a cell that is not a controller cell. The command simply consists of the control line
going to the one state and all 0's on the 8 byte data lines. It is thus very simple to
generate in the control section. This command will result in the IC Busy register
(Inter-Cell bus busy FF) being reset.
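Detection of this command is a one-line test, sketched below. The class and method names are hypothetical, and the 8 data lines are modeled as an integer in the range 0 to 255.

```python
class InterCellBusMonitor:
    """Tracks the IC Busy flip-flop as seen by bus control logic."""

    def __init__(self):
        self.busy = False

    def start_transfer(self):
        self.busy = True                 # set when a transfer is initiated

    def observe(self, control_line, data_lines):
        """Return True if this transfer is an End of Transmission command."""
        if control_line == 1 and data_lines == 0:
            self.busy = False            # IC Busy register reset
            return True
        return False
```

Any other transfer, including data bytes that happen to be all zeros with the control line at 0, leaves the busy flip-flop unchanged.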
11. EXECUTIVE DESIGN
11.1 INTRODUCTION
The distributed processor has many levels of control as described previously.
This results in many levels of control (executive action) in the software to program
the computer. The executive programs may be considered at three levels: (1) the
system executive, (2) the group executive, and (3) the cell executive.
The group executive will be discussed first in this section. Basically this
executive controls a group of cells. Each group must have allocated one cell as a
controller cell; this cell will act as the group executive. Since each cell in a group
is identical any cell can perform as a controller cell. The group executive functions
will reside in one or more cells in a group (more than one as required due to
storage limitations of an individual cell) and one of these cells will be acting as a
controller cell at any one time. Although the different programs that comprise the
group executive may be in different cells, this set of cells (which always includes the
present cell that is in the controller cell state) is called the group executive.
The cell executive is that program that resides in an individual cell and controls
its performance. As may be expected, this program is considerably simpler and
smaller than the group executive. Each cell will have its own executive and in gen-
eral they will all be different although containing many basic similarities.
One of the group executives will have in addition to the group executive func-
tions, functions that control the operation of the overall computer system. These
functions will be called the system executive and this group will be said to contain
the system executive.
11.2 GROUP EXECUTIVE
The group executive has many functions. Basically, it must control the opera-
tion of the cells in the group and the inter-cell communication bus, it must recon-
figure the group resources due to changes in processing requirements and due to
failures, and finally it must interface with other groups. The functions may be
broken down into six subfunctions:
1. Control system resources
2. Furnish global programs to dependent cells
3. Allocate system resources
4. Test system hardware and software, and respond to any malfunctions
5. Reconfigure system upon
a. Normal phase change
b. Malfunction
6. Interface with other groups, including the system executive
Each of these subfunctions will be discussed below.
11.2.1 Control System Resources
The group executive has many more resources to control than the usual
executive of traditional computer systems. In some ways this job is easier in the
distributed computer system, in some ways more difficult. This section of the
group executive is concerned with the controller cell interface with the operational
cells as the cells require services of the executive. No malfunctions are considered
here, nor is reconfiguration; these problems are discussed in Sections 8 and 11.2.4.
The computational tasks will be performed in the operational cells. These
cells may require executive services such as getting I/O data over the inter-cell
bus, getting intermediate computational data from other operational cells, setting
clocks, etc. The group executive will provide such services under the subfunction
of "control system resources." These services will be provided in two modes:
periodic and background. Periodic services are those required at fixed predetermined
intervals of time, while background services are those fitted in between the
periodic according to some priority scheme. Each of these two modes of service
will be described below. The services will also be provided in two ways: by command
or request. Command services are those completely controlled by the controller
cell, e.g. the controller cell commanding one cell to send another cell certain
data. Request services are those requested by the operational cells, e.g. an operational
cell requesting data from another cell.
It should be noted here that the group executive function of control system
resources is actually controlled or formed by another group executive function
namely that of "allocate system resources" which will be discussed later (sec-
tion 11.2.3). The function of controlling the system resources may also be considered
as one of controlling the control and data flow over the inter-cell bus; this will be
made clearer in the discussion below.
The periodic mode will be described first. The periodic programs and the
times they require service will be listed in a table as shown in Table 11-1. These
table entries are described below. It should be kept in mind when reading this
section that the actual programs are contained in the operational cells and are
executed therein. The controller cell is only performing certain executive functions.
The next interrupt time is the time at which the cell is expected to request
service, or will need service. Unlike a traditional computer system, each cell
contains a clock. The clocks will all be synchronized.
The period for interrupt time is the time interval between requests. This table
entry may be changed by the controller cell or the problem cells as required.
Table 11-1. Control Program Data
Next Interrupt Time
Period for Interrupt Times
Status of Task
Task Identification
Valid Request List
The task status tells the controller cell whether the task is active, or inactive
and not requiring attention, or has been suspended for a more important higher
priority task.
A task may make certain requests of the group executive. These include input
request, output request, data transfer to another cell, global program data from
controller cell, etc. Each task is expected to make only certain types of requests.
There are task requests that are not valid, and the valid request list may be used to
prevent these from taking place. Certain cells, for example, may never be written
into, or over-layed with new programs.
The table entries can be time ordered or placed in a linked list as shown in
Figure 11-1, where each request contains a link to the next time. With the latter
method, inserting new requests and deleting requests that are no longer needed is
simplified. Although this method may not be used in actual practice, it will be
assumed here in this section.
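The link list arrangement can be illustrated with a minimal Python sketch. The class and field names (RtcRequest, RtcList, next_time and so on) are assumptions made for illustration and do not come from the report; the sketch only shows why insertion and deletion reduce to changing links.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RtcRequest:
    """One block of the link list; field names are illustrative, not from the report."""
    next_time: int                        # next interrupt time, in clock ticks
    period: int                           # period for interrupt times
    task_id: int                          # task identification
    status: str = "active"                # active, inactive, or suspended
    link: Optional["RtcRequest"] = None   # link to the next block


class RtcList:
    """Singly linked list kept in order of next interrupt time."""

    def __init__(self) -> None:
        self.start: Optional[RtcRequest] = None   # the START link

    def insert(self, req: RtcRequest) -> None:
        """Link req in so that the list stays time ordered."""
        if self.start is None or req.next_time < self.start.next_time:
            req.link, self.start = self.start, req
            return
        node = self.start
        while node.link is not None and node.link.next_time <= req.next_time:
            node = node.link
        req.link, node.link = node.link, req

    def delete(self, task_id: int) -> bool:
        """Unlink the block for task_id; return False if it is not listed."""
        prev, node = None, self.start
        while node is not None:
            if node.task_id == task_id:
                if prev is None:
                    self.start = node.link
                else:
                    prev.link = node.link
                node.link = None
                return True
            prev, node = node, node.link
        return False

    def times(self) -> List[Tuple[int, int]]:
        """Return (next_time, task_id) pairs in list order."""
        out, node = [], self.start
        while node is not None:
            out.append((node.next_time, node.task_id))
            node = node.link
        return out
```

Only one or two links change on each insertion or deletion; no table entries have to be moved.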
A real time clock (RTC) in the processor section of the controller cell will be
used to time the interrupts to provide periodic servicing of the tasks listed in
Table 11-1. The flow diagram in Figure 11-2 shows how the RTC interrupt is
processed in the controller cell.
When the clock becomes zero, the interrupt hardware saves the program
counter and any other status information as required in the processor. The instruc-
tion at the dedicated memory location for the RTC interrupt is then fetched. This is
done in block 1.
Block 2, using the start link address, gets the next clock time at which an
interrupt is desired. The link is changed to put the top item (which is being processed
now) in a new, different list. The Controller Cell (CC) interrupted program counter
is saved in this list. The item for which the clock is being set now becomes the top
item. The remaining list items remain the same. Block 4 will now allow a RTC
interrupt. The link list has been updated so now the interrupts can be stacked.
Block 5 may be omitted if the nature of the service to be performed is known to
the CC, such as the next part of a global instruction sequence. This flow chart
assumes a worst case in that the cell address must be fetched so that the CC can send
this cell a command to report its communication request status. (A detailed
discussion of commands sent and received on the inter-cell bus is given in Section 10;
this chapter simply discusses how some of the commands are used and not how they
are mechanized.)
[Link list diagram: START points to the first block; each block contains the next
interrupt time, the link to the next block, the period for times, the task ID and
status, and the validity checks.]
Figure 11-1. Link List for RTC Requests
[Flow chart, entered on the RTC interrupt. Blocks in order:
1. Save processor status (done by hardware)
2. Get next time for clock, load into clock
3. Update time service list
4. Allow RTC interrupt
5. Get cell address for this service from the software requirements table
6. Is priority of service greater than priority of the interrupted program?
7. Make up CC instruction for X2 GC command to this cell; execute this instruction
8. Examine the cell ID and address requested
9. Is this request a greater priority than the CC interrupted program?
10. (No) Save the request and place it in the job stack; return to the CC interrupted routine
11. (No I/O is requested) Return to interrupted program in the CC
12. (Yes) Place the CC interrupted routine on the job stack; process the requested service]
Figure 11-2. Periodic Control of System Resources by Real Time Clock
The Task Identification in Table 11-1 is an identifier that points to a table in
the CC (Table 11-3). This table contains, for this Task Identification, the priority
and the cell address to which the report communication request GC command
(hereafter called X2) should be sent. The priority will be compared to the CC interrupted
program priority. The cell address, obtained from this table, is used to make up
a CC instruction to send the X2 command to the cell given by the address. The CC
instruction will then be executed in the CC (Block 7). The receiving cell will return
a response request word. How the cell forms and sends the request word is
described in the cell executive (Section 11.4). The request word will indicate whether
input or output over the bus is required, and will contain an address and a word
count. The group executive will examine the requested cell ID, address, and word
count (Block 8). Knowing the priority and what type of request may be expected by
examining the control program data table (Table 11-1), the group executive will
decide whether this cell has a higher priority than the interrupted program in the
CC and whether the request is valid.
The "Next Job Stack" will contain the list of jobs that need to be done. The
concept is used here to explain how the jobs are listed. The group executive can
select which job is to be continued with and which is to be held in the stack until its
priority has become larger than all the other jobs. Here, in the RTC routine, there
are two jobs to consider: the requested service by the responding cell and the CC
program that was interrupted. The relative priorities of these two jobs are known to
the group executive and thus a decision can be made as to which must be processed
first.
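The decision between the two jobs can be sketched as follows. This is a minimal Python illustration, assuming that a larger number means a higher priority and that the job stack is held as a priority queue; the names JobStack and on_rtc_response are hypothetical and are not taken from the report.

```python
import heapq
import itertools


class JobStack:
    """Holds jobs that are waiting; the highest priority job is taken out first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # keeps equal priorities in arrival order

    def push(self, priority, name):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._order), name))

    def pop(self):
        neg_priority, _, name = heapq.heappop(self._heap)
        return -neg_priority, name

    def __len__(self):
        return len(self._heap)


def on_rtc_response(stack, interrupted, requested):
    """Choose between the interrupted CC program and the requested service.

    Both jobs are (priority, name) pairs. The job returned is processed now;
    the other one is held on the job stack.
    """
    if requested[0] > interrupted[0]:
        stack.push(*interrupted)      # hold the interrupted CC routine
        return requested
    stack.push(*requested)            # save the request for later
    return interrupted
```

A request of equal priority is held rather than processed in this sketch; the report does not state how ties are resolved.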
Blocks 8, 9 and 12 are described in more detail in the flow chart in Figure 11-3.
The CC first examines the response request word received from the cell. Some of
the possible requests are: Input, Output, Global Program, No request, etc. Further
software study would be required to define the full spectrum of the requests.
However, most requests may simply be defined as either input or output. For
example, if the cell desired a global program be sent to it, it could request input
service from the CC at a certain address in the CC. The CC can identify this from
Table 11-1 as a global program request. The CC checks the priority and validity
of the request.
A request will be processed by the CC by forming the proper CC instructions
in the CC and executing them. These CC instructions may be to send out input/output
GC commands over the bus (referred to as X3 and X4 commands in Figure 11-3).
The GC commands on the bus will then initiate the desired service to be accomplished.
Section 10 contains a detailed description of the operations on the bus and the mech-
anization of the commands sent on it. The intent in this section is to show how these
commands interface with the executives and not their detailed implementation.
It should be noted here that as part of the validity check on the request, the
CC will determine if the request can be completed in time before the next interrupt
occurs. It will do this by examining the word count and the time to the next interrupt.
[Flow chart. Blocks in order:
Does accumulator = 0? (Yes: no I/O requested)
Is count greater than zero?
Get priority from CPD (Table 3-1)
Compare with priority of interrupted program. Greater? (branch shown: No)
Check cell ID and address against valid request list (Table 3-1) (branches: valid, not valid)
Make up CC instruction by placing cell address and word count in the proper CC
instruction location, and execute the CC instruction to send out the desired
GC commands (X3, X4) on the bus]
Figure 11-3. Processing a Cell's Request
Since the time for transmissions over the inter-cell bus is known, the CC will
determine if the desired word count can be accommodated. If it cannot, the CC can either
reduce the word count or treat this as an invalid request. It should be noted that this
occurrence should be very rare for the periodic service.
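The word count check described above can be written as a short calculation. The sketch below is illustrative only: the function name, the use of integer clock ticks, and the fixed per-word transmission time are assumptions, not details given in the report.

```python
def fit_word_count(word_count, now, next_interrupt, word_time,
                   overhead=0, allow_reduce=True):
    """Return the word count the CC may grant before the next interrupt.

    All times are integer clock ticks. word_time is the assumed time to send
    one word over the inter-cell bus; overhead is the assumed time used by the
    GC commands themselves. A result of 0 means the request is treated as
    invalid for now.
    """
    available = next_interrupt - now - overhead
    if available <= 0:
        return 0
    max_words = available // word_time
    if word_count <= max_words:
        return word_count                 # the whole request fits
    return max_words if allow_reduce else 0
```

For example, with 100 ticks left and 5 ticks per word, a request for 30 words would be reduced to 20 words, or rejected if reduction is not allowed.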
The background mode of service will be described next. This mode of service
is handled by periodically entering the "Cell Poll Executive" routine shown in Fig-
ure 11-4. The purpose of this routine is to poll or request response for service
words from the cells according to some priority scheme. The CC then examines the
requests and initiates service as previously described for the periodic services.
This routine will be entered periodically. However, its length will be unknown
since the number of requests that will be received is unknown and cannot be
anticipated. It will also be interrupted whenever the periodic service routine needs to be
entered as determined by the RTC. Therefore, this routine can be visualized as
service being filled in around the periodic service; essentially this is background
service. In actual use the CC will be spending most of its execution time on the
periodic and background service routines; one can visualize the CC then as
periodically servicing certain cells or tasks and then, in the spare or "dead" time, servicing
the other cells or tasks.
As shown in Figure 11-4, the request poll program starts with the task that has
the highest priority. The CC forms a CC instruction and executes it to send out a
GC (X2) command that will request a response word from a particular cell. After the
response word is received it is examined to determine the "status of the request. "
This is performed very much the same as described previously under the periodic
service: the type of request is examined, cell address and word count are checked.
The validity of the request is checked and the appropriate action taken similarly to
the previous description.
A problem that will occur in this routine much more frequently than in the
periodic service routine is that of completing the requests in time before the RTC
interrupt. If the request is for input/output communications over the inter-cell bus,
the CC will examine the word count requested. It will determine if the desired com-
munications can be accomplished in the time interval before the next RTC interrupt
occurs. Actually, this interrupt will not bother the communications, since it affects
only the controller cell (unless the controller cell is one of the parties to the com-
munication). However, the RTC will need the intercell bus (usually) to send a GC
command and receive the response. Thus I/O should not be started if it cannot be
completed before the RTC runs down.
If there is not time, the executive will either reduce the count so that there is
time, or will decide to not allow the I/O to be performed. If the latter course of
action is chosen, special action needs to be taken since the response has been fetched
from the cell. The cell has set itself up for I/O transmission, and is expecting a
GC command and data. One way is to return a GC command (input or output) with a
count of zero. Thus the cell executive will note the count and try again later.
Another approach is to use a special GC command to cause the cell to return to its
status before the GC X2 was received on the inter-cell bus. Either approach can be
used.
[Flow chart. Blocks in order:
Start at highest priority task
Execute CC instruction to send out a GC (X2) command
Status of request? (branches: none, I/O request, global program)
I/O request: Is there time to perform request before RTC will interrupt? (Yes: perform; No)
Global program: Set up cells, send out global program
Get next task
Poll complete: set up for next pass through list]
Figure 11-4. Cell Poll Executive Routine
Upon servicing this request, the next task according to priority will be polled.
The rate at which each cell is interrogated depends upon the priority of the tasks
and how the priority relates to the inter-cell bus communications requirements.
The priority scheme is not defined here, nor can it be defined until the system
requirements, I/O conditioner specifications, scientific experiments as well as the
other tasks have all been defined. The only assumption made here is that the
priority will require some cells to be interrogated more frequently than others. At
the end of the list, the program is re-initialized for another pass through the list of
tasks.
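One pass of the cell poll routine can be sketched as below. This is a simplified Python illustration under several assumptions: the poll callable stands in for the GC X2 command and its response word, times are integer ticks, the time left before the RTC interrupt is tracked locally, and an I/O request that cannot be granted is answered with a zero-count command (one of the two approaches described in the text). None of the names come from the report.

```python
def poll_pass(tasks, poll, time_left, word_time):
    """Run one pass through the task list in priority order.

    tasks: list of (priority, cell) pairs; a larger number is a higher priority.
    poll(cell): returns (kind, word_count), kind being "none", "io" or "global".
    time_left: ticks remaining before the RTC interrupt.
    word_time: assumed ticks needed to move one word over the inter-cell bus.
    """
    actions = []
    remaining = time_left
    for _, cell in sorted(tasks, key=lambda t: -t[0]):
        kind, count = poll(cell)
        if kind == "none":
            continue
        if kind == "global":
            actions.append((cell, "send global program", 0))
            continue
        granted = min(count, remaining // word_time)
        if granted == 0:
            # The cell is already set up for I/O, so it is told to try again later.
            actions.append((cell, "zero-count command", 0))
            continue
        label = "perform I/O" if granted == count else "perform reduced I/O"
        actions.append((cell, label, granted))
        remaining -= granted * word_time
    return actions
```

In the real system the pass would be suspended by the RTC interrupt and resumed afterward; the sketch stops granting I/O once the time is used up.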
It should be noted here that for both modes of service, periodic and background,
the CC will not be able to use the bus during the communications time for the request
being serviced. That is, if Cell 1 wants data from Cell 2, during the time that
Cell 2 is transmitting the data to Cell 1, the CC can not interfere by trying to use
the bus. During this time the CC can do other tasks such as various internal house-
keeping executives. (Of course, if the CC is one of the parties to the communica-
tions, then it is tied up communicating and cannot be doing any other tasks. ) The
only difficulty in doing these other internal tasks is that the time to accomplish the
communications will be quite random. However, for any given request the CC will
know the count of the number of words to be communicated. It will therefore know
the approximate time to communicate and may thereby select which one, if any, of
the internal tasks to take on. The CC may either keep on doing an internal task
until it receives an "IC Busy" free interrupt (recall that the transmitting cell sends
out a global command at the end of transmission that resets the IC Busy FF; this
can then be used to generate an interrupt) or it may simply set its clock to interrupt
itself after a specified interval of time. The "IC Busy" free interrupt will be used
quite frequently during the background mode of service; the interrupt can cause the
CC to then poll the next cell for a request.
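The selection of an internal task from the known word count can be illustrated with a small sketch. The policy shown (take the longest internal task that still fits in the estimated transfer time) is an assumption chosen for illustration, as are the function name and the task names in the example.

```python
def pick_internal_task(word_count, word_time, internal_tasks):
    """Pick an internal CC task to run while the bus is busy.

    word_count * word_time approximates how long the transfer will occupy the
    bus. internal_tasks maps a task name to its estimated run time in the same
    units. Returns the longest task that fits, or None if nothing fits.
    """
    budget = word_count * word_time
    fitting = {name: t for name, t in internal_tasks.items() if t <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)
```

With a 10-word transfer at 5 ticks per word, a 50-tick task would be chosen over a 30-tick task, and an 80-tick task would be left for later.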
11.2.2 Furnish Global Programs to Dependent Cells
The cell or cells that contain the group executive contain all the global
instructions that are to be executed by the dependent cells. Because these instructions are
sent on the inter-cell bus and the controller cell controls the inter-cell bus, the
controller cell sends out all the global programs. These global programs may be
simply sent out by the controller cell or they may be sent out in response to a request
either by the periodic or background service routines.
At the time a global program is to be sent out, the controller cell will transfer
to the start of the global program. This program will usually begin by addressing
the individual cells that will be used and setting each one to the desired level. Once
this is done, the rest of the global program is fetched from the controller cell
memory and sent to the dependent cells. The controller cell transmit and execute
mode instructions are used to control the sequence of instructions sent out from the
controller cell. During this time, the inter-cell bus can not be used for input/output.
If time is needed for input/output, the controller cell must discontinue the global
program before any input/output instructions can be executed.
The instructions used by the controller cell to change modes and control the
global program are described in section 6.
11.2.3 Allocate System Resources
The group is made up of hardware resources, software resources and time.
The efficient use of these resources is required to perform all the tasks required.
This efficiency can become important as the space flight continues and hardware
malfunctions occur and the hardware fails. The remaining resources must perform
efficiently to ensure mission success.
Before the executive scheduler program can begin to allocate system resources
and schedule the use of these resources, there must be made available to this program
a list of the hardware resources in the system and a list of what tasks must be per-
formed by the group. Table 11-2 shows a hardware resources list. The resources
are listed with their status. The status will contain such information as power off,
malfunctioning, failed connections, etc. The tasks are all the tasks that require this
hardware resource.
Another set of data required is the software requirements. Table 11-3 gives
these requirements. This table is changed by the system executive as the needs of
the computer system change. Each task that is performed has a priority and a list
of major programs that make up this task.
Table 11-2. Hardware Resources Table
Resource Status Tasks using this Resource
Cell 01
Cell 02
I/O Conditioner 01
I/O Connection to
I/O Conditioner
Inter-Cell Bus
Group Bus Switch 1
Group Bus Switch 2
3
2
2, 56
3
7, 23, 9
Table 11-3. Software Requirements
1. Task Identification
2. Priority
3. Programs Required by this Task
Table 11-4 gives detailed program requirements. Usually a copy of this data
will be kept in the bulk storage unit and called when required. This table is impor-
tant, since it links the software to the hardware. Table 11-4 will be discussed in
detail below.
Program, Task Identification
The program number, program, task identification or task number, identifies
the program to the system and is also used in the hardware resources table.
Number of Cells Required
The number of cells required is the normal cell requirements. However, there
exist programs whose cell requirements can vary. Some, such as data reduction
programs with long global programs, can vary from one to as many cells as there
are available in the group. For these special cases the number of cells may be
varied by the executive scheduler to accommodate other requirements and restraints.
For most programs, there is a number of cells that makes good use of the
storage and does not use too much inter-cell bus time. Fewer cells may require
more I/O or inter-cell bus time; more cells may require more neighbor communication
time, executive time and memory. If there are no problems the normal cell count
will be used. The executive scheduler will use this program, as compiled,
throughout the entire mission. If problems arise, such as (1) all the spare cells in a
group being used, (2) the inter-group bus failing, or (3) a group failing, then the
system executive may
recompile some programs to use less of the system resources. Note that the system
executive may wish to recompile certain critical programs to reduce their reliance
upon resources that present a single point failure. For example, if one inter-group
bus fails, the other is now a single point failure. Following the time of the inter-
group bus failure, the system executive will direct one of the groups to recompile the
critical programs to bypass the inter-group bus or to prepare to use alternate paths
(such as through I/O conditioners or the manual consoles). Thus if the second (or
last inter-group bus, if there are more than two) fails, the critical programs are
prepared for this failure. Conversely, if an I/O conditioner should fail, these same
critical programs may be recompiled to make more use of the inter-group bus and/or
use critical data from alternate I/O conditioners.
Table 11-4. Program Requirements
1. Program, Task Identification
2. Number of cells required
3. Word Count for Each Cell
4. Priority, Etc.
5. Periodic or on Request
6. Time Period (if Periodic)
7. Input Requirements
8. Output Requirements
9. Neighbor Requirements
10. Cell Bus Usage
11. Starting Location
12. Restart Location
13. Other Programs Required by this Program, Such as Subroutines
The distributed computer system has the properties for providing many paths
and alternate ways to accomplish tasks. Thus any rules that can be set down can be
broken under certain conditions, especially when failures occur. In present,
nondistributed computer systems, there is a memory, a processor and I/O
conditioners. A program is in memory and instructions are executed periodically.
In the distributed processor, there is no processor penalty for more rapid execution
or more repetitive execution. Note that this does not apply to the I/O or inter-cell
bus system.
The distributed processor can, by recompiling the programs, change:
1. Storage
2. Number of cells used
3. Inter-cell bus usage
4. Inter-group bus usage
5. Neighbor to neighbor usage
6. Execution time
Thus many variables are available to mold the system to meet certain objectives.
Word Count
The word count allows the executive scheduler program to best fit the programs
in the group. This parameter is of particular importance when several programs
must share a cell.
Priority, Etc.
This parameter depends upon the use of the program. A subroutine which is
used by a highly critical program will have the same high priority, whereas the
same subroutine, when used only by a low priority program, will have a low priority.
The priority is very application dependent. At the time the system is firmly defined,
the priority numbers may be assigned to the programs. The executive scheduler,
using its knowledge of the system requirements at a particular point in time, may
change the priorities to best ensure mission success.
Periodic or Request
This refers to the fact that some programs are required to be performed at
certain intervals of time. Other programs are executed only upon the request and
demand of another program. Periodic programs, on the other hand, are executed
on demand of the real time clock.
The periodic programs are divided into two categories, the absolutely periodic
and the average periodic programs. The absolutely periodic programs must be
started at periodic intervals, e.g., 0, 20, 40, 60, 80, ...
The average periodic programs may be varied in starting time, but the
average starting times are constrained to be within some limits. For example, the
sequence 0, 25, 45, 60, 80 is acceptable. The delay from 20 to 25 may have been
caused by a higher priority program interrupt.
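The distinction between the two categories can be made concrete with a small check. In the sketch below, the limit on an average periodic program is modeled as a maximum allowed delay after each nominal start time; this particular form of the limit, and the function name, are assumptions, since the report does not define the limits.

```python
def classify_starts(starts, period, max_delay):
    """Classify a sequence of start times against a nominal period.

    The nominal start times are starts[0], starts[0] + period, and so on.
    max_delay is the assumed limit on how late any single start may be.
    """
    nominal = [starts[0] + k * period for k in range(len(starts))]
    delays = [actual - planned for actual, planned in zip(starts, nominal)]
    if all(d == 0 for d in delays):
        return "absolutely periodic"
    if all(0 <= d <= max_delay for d in delays):
        return "average periodic"
    return "out of limits"
```

With a period of 20 and a permitted delay of 5, the sequence 0, 25, 45, 60, 80 is classed as average periodic: the second and third starts are 5 late and the later starts are back on the nominal times.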
Input Requirements, Output Requirements
All required interfaces to the outside world are defined. The conditioners that
are needed, the data rate, the identification of the data are all listed. The cell bus
usage is not required here, as the cell bus usage is handled in another table entry.
Special input requirements, such as special sensor commands that must be
output before the processing can begin are given in this table entry. Again this
data is very application oriented.
Neighbor Requirements
Any spatial requirements imposed by neighbor cell communication are given
here. It is expected that all the tasks that will not fit in a single cell will use a
small number of cells, working together, to form a SET of cells. This set would be
fitted with the other sets that make up the group. This fitting is a geometric
relationship, fitting the sets of cells among the good cells of the group.
Cell Bus Usage
The executive scheduler must know the inter-cell bus usage requirements of the
programs. These table entries give the inter-cell bus usage requirements of the
program. This problem is discussed in considerable detail in the latter part of this
section where the executive scheduler is discussed.
Start Location
The start location is the location where the program is to begin execution. If
this is a dependent global program, the initial locations will usually be the controller
cell setting the cells to the proper level.
Restart Location
The restart location is for those programs that have initialization procedures
in the program. The restart location, or locations, are used when the calling pro-
grams have different initialization requirements.
Other Programs
Other required programs are listed for the reconfiguration programs and
the schedules.
The executive scheduler will have the hardware resources table, the program
requirements table, and the software requirements table at its disposal. The soft-
ware requirements table, made up by the system executive, will contain a list of
the absolutely essential programs, the high priority programs and the low priority
programs.
Figure 11-5 contains a flow chart of the executive scheduler and will be dis-
cussed in detail in the remainder of this section.
The executive scheduler will take each task and fetch the Program Requirements
table (PR table, Table 11-4) for this task from the bulk storage unit. The
PR table entry for this task may contain an "other programs required" entry that calls
out other programs that are needed. The executive scheduler program will add these
to the list of required programs. The duplicates are then eliminated. Storage and
the inter-cell bus requirements are added up. The I/O requirements are listed
as I/O conditioners and I/O connections. After all the required (essential) tasks
are processed, the storage and other hardware resources that are available (not
failed) are compared and examined. A problem exists if the total storage is more
than the total group capacity. Actually the storage available will be less than the
total words of storage because of the overhead and some wasted space in some cells.
Thus, if the simple sum of all memory storage requirements is greater than all the
cell memories, there is no way to fit all the programs in. A similar line of reasoning
holds for the inter-cell bus and I/O requirements. The expectation of the
resources being a problem at this point in the executive scheduler is small. Later in
the scheduled program there may be a cause for problems if the demand for service
is too great.
[Flow chart, entered at SCHEDULE. Blocks in order:
Collect all essential programs from task table
Sum all cell requirements. Are there enough?
Sum all inter-cell bus requirements. Sufficient resources?
Sum all I/O requirements. Sufficient resources?
(No, from the checks above) Inform system executive essential programs cannot be performed
(Yes) Make up the tentative cell bus usage schedule
Get highest priority periodic program
Sum all requirements (cell, inter-cell bus and I/O). Sufficient resources?
(Yes) Add inter-cell bus requirements to schedule
(No) Check bulk storage unit for alternate programs and/or other I/O sources. Will alternate fit?
More programs to schedule?
Assign the I/O cells
Assign the I/O programs
Assign the remaining programs starting with the largest
Make up the bus usage schedule and controller cell schedule
Data to bulk store
Transfer control to program loader]
Figure 11-5. Executive Flow Chart
The next block in the flow diagram examines the inter-cell bus
requirements and makes up a tentative bus schedule. A major cycle time is
selected and the time requirements, rate and deviation of each task are noted.
The software test and controller cell to system executive times are included.
The result is:
[Time line from start of major cycle time to end of major cycle time, with blocks
in order: software test, CC - system executive, task 1, task 2]
Figure 11-6. Preliminary Time Schedule, Inter-Cell Bus
Note that there is no attempt in Figure 11-6 to order the tasks; they are just placed
in approximate order and duration. The software test, for example, is usually done
last, not first. The contrast to the final order will be evident later on when
Figure 11-10 is discussed. The exact periodic timing and fitting of times will also be done
later.
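The resource checks made at the start of the executive scheduler (collecting the required programs, eliminating duplicates, and comparing the summed requirements with what is available) can be sketched as follows. The dictionary layout of the PR table and the key names (words, bus_time, io, other_programs) are assumptions for illustration; the report only lists the kinds of entries the table holds.

```python
def collect_programs(task_programs, pr_table):
    """Expand 'other programs required' entries and eliminate duplicates."""
    needed, todo = [], list(task_programs)
    while todo:
        name = todo.pop(0)
        if name in needed:
            continue                      # duplicate, already listed
        needed.append(name)
        todo.extend(pr_table[name].get("other_programs", []))
    return needed


def check_resources(needed, pr_table, storage_words, bus_time, conditioners):
    """Compare summed requirements against the available (not failed) resources.

    storage_words should already exclude overhead and wasted space.
    Returns a dict of pass or fail flags, one per resource class.
    """
    words = sum(pr_table[p]["words"] for p in needed)
    bus = sum(pr_table[p]["bus_time"] for p in needed)
    io = set().union(*(pr_table[p].get("io", set()) for p in needed))
    return {
        "storage": words <= storage_words,
        "inter-cell bus": bus <= bus_time,
        "I/O": io <= set(conditioners),
    }
```

If any flag is false for the essential programs, the flow chart calls for the system executive to be informed that the essential programs cannot be performed.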
Having gotten to this point the executive scheduler will begin to make up the
final bus schedule. The highest priority task is selected and all the requirements
for this task are listed; the storage is totaled and the number of cells required is
noted. For example, if the controller cell requires 2-1/2 cells, one-half a cell is
available for other use; 2-1/2 cells are removed from the group resource list. The
inter-cell bus requirements (which for the controller cell will be the software test
and communication with the system executive and bulk storage unit) are noted. The
preliminary schedule (Figure 11-6) is used, the exact times are set and the durations
are noted. This may push aside and cause some other tasks to be delayed. This task
is now assigned time slots and if the program requires an exact periodic timing,
these times will be fixed. Any special I/O requirements to any conditioners are
noted.
The next highest priority program is obtained from the list of essential tasks
and the storage requirements obtained. Cells are allocated to contain this task. The
inter-cell bus requirements are obtained and put in the same spots on the preliminary
schedule. However, this time the exact times are used; the tasks are placed
next to one another (see Figure 11-7). If the period of each task is an integer multiple
of a common submultiple, there will never be any time in which two tasks require the bus at
the same time. The use of a major cycle time will also prevent the occurrence
of two periodic tasks requiring the bus at the same time.
Figure 11-7. Constructing the Bus Schedule
By using a common submultiple for all essential periodic programs, the scheduling
is simpler. One possible sequence is shown below; every periodic task is executed
at one of the following rates - no other rates are used.
Times per second:
1/16, 1/8, 1/4, 1/2, 1, 2, 4, 8, 16, 32, 64, 128
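The arithmetic behind this choice of rates can be checked directly. The sketch below converts each rate to a period measured in ticks, taking the tick to be the shortest period (1/128 second); that choice of tick, and the variable names, are assumptions made for illustration.

```python
from fractions import Fraction

# The rates listed above, in executions per second.
RATES = [Fraction(1, 16), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)] + \
        [Fraction(2 ** k) for k in range(8)]          # 1, 2, 4, ..., 128

TICK = 1 / max(RATES)                                 # 1/128 second, the shortest period

# Period of each rate, expressed as a whole number of ticks.
periods = [int((1 / rate) / TICK) for rate in RATES]

# The longest period is a candidate major cycle; every other period divides it.
major_cycle = max(periods)
all_divide = all(major_cycle % p == 0 for p in periods)
```

Because every period divides the longest one, the pattern of bus usage repeats exactly once per major cycle, so the schedule only has to be constructed for one major cycle.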
Sometimes a task to be added to the schedule will require a block of time that is
longer than the available time. First, a trial will be made to shift previous tasks
to leave enough room as shown in Figure 11-8.
[Bus time line showing tasks T2 and T3]
Figure 11-8. Shifting the Bus Schedule
Note that T3 is too long. By delaying the start of T3, BUT NOT THE PERIOD, the
task can be made to fit (Figure 11-9).
Figure 11-9. Fitting Periodic Tasks in the Bus Schedule
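Delaying the start of a task without changing its period amounts to searching for a start offset that avoids the slots already taken. A minimal sketch follows, assuming that time is divided into integer slots, that every period divides the major cycle, and that the first offset that fits is accepted; the function names are hypothetical.

```python
def busy_slots(start, period, duration, major_cycle):
    """Bus slots used during one major cycle by a periodic task."""
    slots = set()
    for t0 in range(start, major_cycle, period):
        slots.update((t0 + k) % major_cycle for k in range(duration))
    return slots


def fit_task(scheduled, period, duration, major_cycle):
    """Find a start delay for a new periodic task; the period is not changed.

    scheduled: list of (start, period, duration) for tasks already placed.
    Returns the smallest start offset that collides with nothing, or None if
    no offset works (the task would then have to be split).
    """
    used = set()
    for start, p, d in scheduled:
        used |= busy_slots(start, p, d, major_cycle)
    for start in range(period):
        if not (busy_slots(start, period, duration, major_cycle) & used):
            return start
    return None
```

For example, with a 16-slot major cycle, one task using slot 0 every 4 slots and another using 2 slots from slot 1 every 8 slots, a new task needing 3 slots every 8 slots fits when its start is delayed to slot 5. A task needing 4 slots every 8 slots does not fit at any offset.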
If this attempt is not successful, the scheduler program will split the program into
sections. Note the tasks that are split are always of a lower priority than the tasks
that are already scheduled. Thus the highest priority tasks will not be interrupted
by a lower priority task (Figure 11-10).
Figure 11-10. Splitting the Periodic Use of the Bus
The periodic and approximately periodic programs will require accurate time
placement. The exact cell bus usage times will be scheduled, and this time list will be
used by the group executive when the object programs are being executed. For other
programs, the request poll (see Section 11.2.1) in the group executive will be used.
This poll program will start I/O over the inter-cell bus only when the periodic
programs, which are already scheduled, are not using the inter-cell bus. However,
some time unused by the periodic programs must be allocated so the remaining
programs can use the inter-cell bus. If all the inter-cell bus time is reserved for
periodic programs, there will be no time left for the polled programs. The latter
programs must be given some spare bus time. The preliminary time schedule made
up initially will ensure that sufficient time is available.
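One way to picture the spare bus time left for the polled programs is to compute the gaps in a major cycle that the periodic reservations leave open. The sketch below is an assumption-laden illustration (integer ticks, half-open intervals, hypothetical function names), not the report's scheduler.

```python
def idle_gaps(reserved, major_cycle):
    """Gaps left in one major cycle by the reserved (begin, end) intervals."""
    gaps, cursor = [], 0
    for begin, end in sorted(reserved):
        if begin > cursor:
            gaps.append((cursor, begin))   # bus is free between reservations
        cursor = max(cursor, end)
    if cursor < major_cycle:
        gaps.append((cursor, major_cycle))
    return gaps


def spare_fraction(reserved, major_cycle):
    """Fraction of the major cycle that remains for polled programs."""
    idle = sum(end - begin for begin, end in idle_gaps(reserved, major_cycle))
    return idle / major_cycle
```

A scheduler could compare `spare_fraction` against whatever margin the preliminary schedule set aside and refuse to add a periodic reservation that would drive it below that margin.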
The cell assignments are made next. An example will be given to illustrate
the procedure. It is assumed the compiler has calculated the word count needed for
each task, the number of cells and neighbor lines. Figure 11-11 contains an example
and shows the present step in the assignment procedure as shaded area, and past
assignments as cross-hatched areas. An explanation of each assignment step is
given below:
1. All tasks that require I/O conditioner connections (via an I/O line) are
done first, Figure 11-11(1). These cells have special hardware connections
to the conditioners that will not be changed unless a failure occurs.
2. All tasks using these I/O cells are assigned to cells that have a neighbor
line connection to these I/O cells.
3. The task with the largest connected set of cells is assigned to the smallest
set of cells in which it will fit. The executive scheduler program will try
to leave the resulting area so there are no isolated squares. Thus
Figure 11-11(3a) is preferred to 11-11(3b).
4. Since long closed-end connected cells are more difficult to fit than
rectangular or open-end connected cell sets, the former will be fitted next.
The term closed end means that the first cell is connected to the second,
the second to the third, the third to the fourth, etc., and the last is
connected to the first. Also included here are all the sets of cells with
complicated neighbor connections.
5. Finally, all the single or pairs of cells are fitted into the remaining
areas.
Figure 11-11. Cell Assignment Procedure (panels 1, 2, 3a, 3b, 4, and 5)
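The ordering implied by the five steps above can be sketched as a sort key. The sketch only decides the order in which tasks are offered to the placement routine; it does not attempt the geometric fitting shown in Figure 11-11. The `Task` fields and the intermediate rank 4.5 for remaining rectangular or open-end sets are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cells: int                   # number of cells the task needs
    shape: str = "rect"          # "rect", "open", "closed", or "complex"
    io_line: bool = False        # needs a cell wired to an I/O conditioner
    uses_io_cell: bool = False   # needs a neighbor line to an I/O cell


def assignment_order(tasks):
    """Order tasks by the five steps; ties go to the larger cell set."""
    largest = max((t.cells for t in tasks if not (t.io_line or t.uses_io_cell)),
                  default=0)

    def step(task):
        if task.io_line:
            return 1
        if task.uses_io_cell:
            return 2
        if task.cells <= 2:
            return 5
        if task.cells == largest:
            return 3
        if task.shape in ("closed", "complex"):
            return 4
        return 4.5   # assumption: remaining rectangular or open-end sets

    return sorted(tasks, key=lambda t: (step(t), -t.cells))


tasks = [
    Task("pair", 2),
    Task("ring", 6, shape="closed"),
    Task("sensor", 1, io_line=True),
    Task("filter", 3, uses_io_cell=True),
    Task("nav", 9),
    Task("block", 4),
]
order = [t.name for t in assignment_order(tasks)]
```

For this example list the resulting order is sensor, filter, nav, ring, block, pair.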
Several trials may be required to get all the cells to fit within the group.
By noting what cell geometry is required and the present spatial relationship, the
allocation procedure can begin again, this time attempting to make room for the
remaining cells. The algorithm to be followed to attempt to reassign the cells has
not been defined at this time. Section 4 of this report discusses the problems
associated with optimum assignment procedures for parallel computations. It was seen
that this is a very complicated problem. Nevertheless, it is assumed that some
algorithm will be available to the executive scheduler for reassignment purposes.
If the tasks cannot be all assigned to the group, there are many alternatives
that can be resorted to. Some of these are:
1. Increase the efficiency of the existing tasks so more will fit in the
capacities of the group. Compiler optimization techniques will be improved
over what is being accomplished presently.
2. Reduce the rate at which tasks are executed. If a task is not required all
the time, execute it only when requested.
3. Swap programs in and out of the bulk storage, thus sharing the group
storage capability between tasks.
4. Schedule the use of resources, such as subroutines, I/O routines, etc.,
instead of repeating these routines wherever they are needed.
5. Increase or decrease the amount of applied parallelism to reduce the
time and/or storage requirements. (This would generally be done in the
reassignment algorithms.)
If there are problems, there will be a program or programs that will not fit.
The programs of lowest priority may be deleted, or they may be brought in from the
bulk storage unit when required. In this way, several programs may time share
several cells, there being one program using the cells at one time. Or the lowest
priority programs may be eliminated entirely. In all cases of eliminated programs
the system executive will be notified because there is always the possibility that
another group has room for these deleted programs.
It should be noted here that there may exist several assignment algorithms
in the bulk storage unit and the executive scheduler may call the desired one depend-
ing upon the phase of the mission and/or the reason for using the algorithm. For
example, an optimum assignment algorithm may be used in the heavily loaded
Mars orbital phase either due to attempting to initially schedule the tasks or due to
a failure. However, during the critical mission phases, if a failure occurred a
suboptimal algorithm may be resorted to for reasons of quickly accomplishing a
reconfiguration.
Having completed the assignment routine, the executive scheduler can assign
the actual cell addresses, make up the inter-cell bus I/O commands, etc. The
neighbor commands are all relative, so unless the scheduler had to change a relative
position, there will be no need to change the compiler generated coding.
11.2.4 Test the System Hardware and Software
Section 8 of this report describes many approaches to the software test prob-
lem. The fourth approach, described in Section 8.1 is discussed here.
The software test program begins in Figure 11-12 by using a presently unused
cell level and setting a single specific cell to the dependent state and to an unused
level. The controller cell will ask for and expect a response. The cell is set to
another level. The accumulator, etc., are moved to a temporary storage area.
This procedure gives some confidence that the cell's state and cell address register
are functioning correctly. This procedure is repeated for all the other cells. The
controller cell now transfers control to the cell that contains the test program.
This new cell, now the controller cell, will send out the software test program for
all the other cells.
Notice that the given address and data concept introduced in Section 6 is now
extremely important. The processor and memory can be tested without having each
cell save many constants, addresses, etc. In fact, each cell will need to move very
few registers out to make room for the software test program. Some registers to
be moved include the program counter, some index registers, and status registers.
The operational program contained in the cell is assumed to be stopped at some point
where the accumulator contents may be destroyed by the software self-test program;
if not, the accumulator will need to be saved.
Figure 11-12. Software Test and Reconfiguration Routines (Sheet 1 of 2)
(Flow chart blocks, in order of appearance: software test; set up each cell to
receive test program, save any critical registers; switch test cell to be controller
cell; send out test program to all cells; any failure reports?; have cells
communicate to neighbors; any failures?; return control to executive controller,
set group switches, continue; find which cell and isolate, set up message for
system executive and inform system executive; more testing to do?)
Figure 11-12. Software Test and Reconfiguration Routines (Sheet 2 of 2)
(Flow chart blocks, in order of appearance: spare cells available?; can background
programs be stored in bulk store?; can lowest priority task be eliminated?;
request system executive to eliminate tasks; schedule tasks for the group.)
The software testing program should require less than 500 words. Because
there are no words required in the receiving cells, all the constants, etc., are sent
as data (D16 or D32) over the inter-cell bus. The last commands would make the
cells transmit to their neighbors their final test results. Each cell, testing its
results against the neighbors, will report to the controller these 'good - no good'
results.
By using the level response command, the controller cell running the tests can
obtain a report of (1) no failure reports or (2) one or more failure reports. An
individual cell by cell poll can determine which cell or cells had sent in the malfunc-
tion report.
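The two-stage reporting described above (one level response for the whole group, followed by a cell-by-cell poll only when needed) can be sketched as follows. The callables `level_response` and `poll_cell` are hypothetical stand-ins for the bus commands; their names and signatures are assumptions for illustration only.

```python
def find_failed_cells(cells, level_response, poll_cell):
    """Return the addresses of the cells that sent a malfunction report.

    level_response() -> True if one or more cells on the test level
    reported a failure; poll_cell(addr) -> True if that cell did.
    """
    if not level_response():
        return []                                        # (1) no failure reports
    return [addr for addr in cells if poll_cell(addr)]   # (2) poll each cell


# Simulated group of four cells in which cells 1 and 3 report a failure.
reports = {0: False, 1: True, 2: False, 3: True}
failed = find_failed_cells(
    sorted(reports),
    level_response=lambda: any(reports.values()),
    poll_cell=lambda addr: reports[addr],
)
```

The point of the design is that the common case, in which no cell reports a failure, costs a single exchange on the bus rather than one exchange per cell.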
At this point, either all the cells test good or there is a failure report or
reports. Of course, if all is well, the test controller cell will return control back to
the original controller cell. The original controller cell will now send the proper
response words to the group switches. (See Section 8 for a discussion of the operation
of the failure detection system.)
The test controller cell, upon receiving a failure report, will isolate the cell
and continue testing the other cells. The failure report will be sent to the system
executive and the original controller cell. This completes the test controller cell
routine, the diagnosis having been completed.
11.2.5 Reconfigure the Group
Reconfiguration is required under two conditions. First, there are the normal
changes due to phase changes in the mission; e.g., new programs are required during
the coast phase which are different from the requirements of the Mars orbital phase.
Secondly, reconfiguration may be required upon a failure or malfunction of a cell,
group, bus, etc.
To perform reconfiguration due to phase changes, there must exist a means
of changing the programs by overwriting the programs that are not needed with the
new programs that are required. The reconfiguration during normal phase changes
has the advantage that the procedures can be simulated on ground based computers
to ensure that the software will smoothly change from one phase to another. Thus
the interactions between one program and another can be noted and any adverse
effects can be corrected.
If the group is not performing periodic programs and the group programs that
are to be retained do not keep a data history, the entire group can be loaded using
the resources tables. For many groups this cannot be done. One means of loading
the group is to monitor the program operation. As the cells complete all their
programs, a cell can be loaded with a new program and set to inactive. As tasks
are completed, they are removed (if they are not needed or must be moved) and the
cells are loaded with the new program. As soon as the new periodic programs are
loaded, the old periodic programs are made to average periodic so as not to disturb
the new time schedule. An excellent time to change the periodic programs is to
reload cells instead of performing the software tests. After the periodic programs
are loaded and are performing correctly, the background programs can be started.
The group is now reconfigured with the new programs.
Reconfiguration because of a hardware failure is a more serious and difficult
problem. Some simulation of failures and the reconfiguration programming can be
done on ground based computers before flight begins. However, there are too many
possible combinations of critical programs and hardware configurations to check all
possible problems and solve each one. There are some that can be simulated
easily, such as a group failure or an inter-group bus failure. There are two solutions
to the problem: (1) a general algorithm to reconfigure any hardware-software
system (S1) into another hardware-software system (S2), or (2) set up the first
system, S1, in a form that can tolerate as many varieties of failures as possible and
still perform the critical functions.
The first solution is very complex; it is the most sophisticated and is the goal
of the failure reconfiguration considerations. The objective is to go from any state
S1 to another state S2 which will maximize the amount of computer resources that
can be used for background programs. In other words, the most efficient assignment
of the programs is desired. Some considerations of the general algorithm were
given in the section on parallelism (Section 4) in this report. It was seen there that
there are many possible solutions to performing tasks in parallel and in general one
has to resort to a suboptimal algorithm to assign tasks. In general each assignment
will vary the cell storage requirement, speed requirements, inter-cell bus usage
requirements, etc. This subject is very complex and deserves much more considera-
tion before any conclusions can be drawn as to the merits of optimal assignment
algorithms.
The second solution is to ignore efficiency and configure the system into a very
reliable, fault tolerant one. In this approach background and non-critical programs
may be executed at lower rates and lower response times expected when reconfigura-
tion occurs with no opportunity to seek out an efficient utilization of the system
resources.
The first page of Figure 11-12 presented the general procedures for the software
test. The second page of this figure contains the flow chart that is entered if a failure
occurs; this is part of the reconfiguration routine. The result of a failure is a change
in the hardware resources table. A comparison between the present and new tables
will first be made to determine if a simple solution is available, such as starting up a
spare cell or two. If this cannot be accomplished, an attempt may be made to
eliminate background or low priority tasks. If these attempts fail, the system executive will
be called upon to reschedule and allocate the tasks throughout the entire computer
system. In general the procedures followed throughout this process are quite similar
to those presented in the previous section for the executive scheduler (Section 11.2.4).
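The escalation described in this paragraph can be sketched as a cascade that stops at the first step able to cover the lost cells. The sketch is an illustration only: it reduces each resource to a cell count, and it assumes that cells freed by an earlier step count toward later steps, which the report does not state.

```python
def reconfigure(cells_lost, spare_cells, background_cells, low_priority_cells):
    """Pick the first recovery step that frees enough cells.

    All arguments are cell counts; the return value is a short action label.
    """
    if spare_cells >= cells_lost:
        return "start spare cells"
    needed = cells_lost - spare_cells
    if background_cells >= needed:
        return "move background programs to bulk store"
    needed -= background_cells
    if low_priority_cells >= needed:
        return "eliminate lowest priority tasks"
    return "request system executive to reschedule"
```

For example, losing one cell with two spares available is resolved locally, while losing five cells with no spares and only two recoverable cells is passed to the system executive.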
During non-critical phases, time is available to reconfigure the system and in
general the first solution, that of attempting to obtain an optimum assignment, will be
followed. However, during the critical phases, time is generally not available
to follow an optimum procedure for reconfiguration (0.1 second is allowed to
reconfigure the system). In this case the second solution will be followed; this will involve
switching over the output signals to a redundant set of critical programs being
performed in the computer system.
11.2.6 Interface With System Executive
The overall computer system control is performed by the system executive.
To ensure reliability and ensure the mission will be successful despite a system
executive failure, the software design approach has been to minimize the system
executive functions and distribute its functions where possible. In this way the
group will continue to function if the system executive has failed for some reason.
The group executive has some interface with the system executive. As stated above,
this interface has been designed to be minimal. The functions are outlined here.
1. Task assignments. The tasks each group is to perform will be issued
by the system executive. The system executive will have more information
concerning the status of the groups and which groups have spare
capacity. When a group cannot perform a task for some reason, the
system executive can be informed. The system executive may be able to
re-assign the task to a group that has spare capacity sufficient to do the
task.
2. Back-up Data. Critical tasks during the critical phases will be duplicated
in separate groups. If one group fails, the remaining group will continue
to operate. Usually it is advantageous to have one task check the results
of the other tasks. This data will be sent from one group to the other
groups that are concerned. Thus various groups can check each other,
and all will have the latest data (some groups may not perform a periodic
task as frequently as others). Back-up programs might not be performed
unless the primary task fails. This back-up program should contain
current data used and generated by the prime program. The back-up
program, should it be needed, is ready to perform using the current data.
3. I/O Conditioners. Many I/O conditioners are connected to two groups.
Depending upon the hardware design, the system executive may be called
upon to resolve conflicts that arise when the two groups each attempt to
control the I/O conditioner. The system executive may require one group
to not use a particular I/O conditioner, or to time share the device with
the other group.
4. Data Transmission. The system executive will allocate time slots to
groups so the groups may transfer data over the inter-group bus. The
group executive will monitor its group switches as part of its periodic RTC
service routine. It will pick up requests from other groups here. In
addition, the group executive will place any requests it has for communications
with another group in its group switches. The system executive will
control the transfer of these requests between the groups.
11.3 SYSTEM EXECUTIVE
In addition to the group executive functions presented in the previous section,
one of the group executives will also contain additional functions giving it the
capability of a system executive. The system executive is designed to coordinate the
groups and not really to control them. By this means, the system reliability is
increased.
Some of the functions of the system executive were introduced in Section 11.2.6
where the normal group executive interface with the system executive was discussed.
This section will discuss those functions plus others that reside in the system execu-
tive (hereafter referred to as SE).
11.3.1 Task Assignments
The SE will contain a complete up-to-date list of the total hardware resources
in the computer system. It will be responsible for deriving the software require-
ments table (Table 11-3) and sending it to each cell. This table essentially assigns
the tasks throughout the computer system. The SE will retain a copy of this table
for each group. This table is formed in the SE by using an executive scheduler
similar to that described in the previous section. The prime difference here is that
the executive function is more of a gross assignment routine rather than a detailed
scheduler routine.
11.3.2 Control Inter-group Bus
The SE is responsible for control of the inter-group bus in much the same
manner as the group executive is responsible for control of the inter-cell bus. The
SE interfaces with the group switches and essentially this is similar to the SE acting
like the group executive and the group switches looking like cells. The SE will follow
a routine that is quite similar to the control system resources routine described in
Section 11.2.1. It will essentially sample the group switches periodically by a RTC
interrupt routine and also in a background manner by a poll routine. The prime
difference between this routine and that of the group executive is that more of the
service on the inter-group bus is expected to be background in nature.
The primary difference between the SE function here and the group executive
is the sequence of commands that must be sent on the bus to accomplish the
requested services. Section 10.2 discusses the inter-group bus operations and how
they take place.
11.3.3 Reconfiguration
The SE will be forced to handle many cases of reconfiguration: phase changes,
faulty assignment of tasks, and failures. The most difficult reconfigurations are
due to the latter two cases; both are handled in a similar manner. The SE will enter
its executive scheduler routine (obviously more complex than that described for the
group executive) and proceed to re-assign tasks among the groups.
Some of the other SE functions will be mentioned here. The SE will be responsible
for monitoring the group switches to check for a Go/No-Go signal. This signal
informs the SE of the status of a group. The general functions of the SE if a No-Go
signal is received were outlined in Section 8 of this report.
The SE will be responsible for communicating with the group executives to keep
track of the hardware resources and the status of assigned tasks (completed, in
process, etc.).
The SE will also be responsible for containing initialization routines to start up
the groups. There will be basically two types of initialization routines: the cold start
and the start up after a failure. The first case occurs when a group is inactive, such
as in the pre-Mars orbital phases, and is required in a new phase such as Mars
orbital. The latter case occurs when a group failure results, as indicated by the
group switches.
Some of the general concepts in the initialization of a group were introduced in
Section 8. Assuming a cold start, one of the cells in the group that will contain the
SE will have a connection via its I/O line to the display/control panel. This cell will
be initialized by first sending it information to initialize its address (ID) and state
(set to the CC state) register. Then the cell may be loaded from the bulk storage
unit with the initialization program.
This program will first begin checking out the cells in the group one at a time.
(The cell's ID and state registers will be initialized via neighbor lines.) Having
determined the hardware resources for this group, the program will load in the
system executive from the bulk storage unit. The SE will then notify the astronauts
via the display panel that another group is to be started up. This results in a cell in
another group being initialized in the same manner via its I/O line as done for the
SE group. The SE will then control the loading of this cell in the group from the
bulk storage unit. The cell will follow a similar procedure in checking out and ini-
tializing the cells in its group. However, all loading will be done under control of
the SE group via the inter-group bus. When the group has been completely checked
out the group executive of this group will inform the SE of the status of the hardware
resources (any failed cells).
Following this report the SE will proceed to the next group in the same manner
until all the groups have been initialized and checked out. The SE will then have a
complete list of the hardware resources available. It will then proceed to assign the
tasks to the groups and send each group its assignment table. The group executives
will then control the final loading of the operational programs into the groups.
Start up after failures, in particular group failures, was generally discussed
in Section 8 and is quite similar in procedure to that given above. Section 8
should be referred to for some of the specific formalities for starting up after
failures.
11.4 CELL EXECUTIVE
Each cell in a group will require a cell executive routine. This routine will
vary in size and complexity from cell to cell depending on the specific tasks to be
performed in each cell. In any case the cell executive (hereafter called CE) must
be a small, minimal program. The reason is that it is repeated so many times in
the group and the system would be very inefficient if each cell required for example
hundreds of words for an executive.
A way of minimizing CE requirements is to organize programs into sets of
cells and have the bulk of the executive functions in one cell of this set and very
minimal executive functions in the other cells of the set. This was done in an
example program that will be presented in Appendix A of this report. In the example
program 3 cells were in the set for the particular program and only 1 of the cells
had most of the executive functions such as: all interface with the controller cell, all
error checking of inter-cell bus transmissions, and all data error and program
exception processing.
The minimum cell executive requirements will be given below; every cell will
be required to contain these functions. The storage requirements for the minimum
executive are:
Location 0    Save location for MB register
Location 1    Save location for P
Location 2    Save location for L
Location 3    GC X2 response request
Location 4    GC X2 response request
Locations 0-2 are required in case the controller cell forces the cell to do some
function via a command addressed to this cell, such as during software testing.
These locations allow the program that was interrupted to be resumed.
Locations 3 and 4 are required to respond to a GC command sent over the
inter-cell bus by the controller cell that requests a communications request word
back. These locations are required even if the cell has no requests to be made.
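The five dedicated locations can be pictured as a small fixed memory map. The sketch below models cell memory as a Python list and the registers as a dictionary; the names `Loc`, `save_context`, `restore_context`, and `pending_request` are hypothetical, and treating an all-zero pair of X2 words as "no request" is an assumption drawn from the later discussion of zeroing the X2 response word.

```python
from enum import IntEnum


class Loc(IntEnum):
    SAVE_MB = 0      # save location for MB register
    SAVE_P = 1       # save location for P
    SAVE_L = 2       # save location for L
    X2_WORD_1 = 3    # first GC X2 response request word
    X2_WORD_2 = 4    # second GC X2 response request word


def save_context(memory, regs):
    """Store MB, P and L so the interrupted program can be resumed."""
    memory[Loc.SAVE_MB] = regs["MB"]
    memory[Loc.SAVE_P] = regs["P"]
    memory[Loc.SAVE_L] = regs["L"]


def restore_context(memory, regs):
    """Reload MB, P and L from their dedicated save locations."""
    regs["MB"] = memory[Loc.SAVE_MB]
    regs["P"] = memory[Loc.SAVE_P]
    regs["L"] = memory[Loc.SAVE_L]


def pending_request(memory):
    """Return the two X2 words, or None when both are zero (no request)."""
    words = (memory[Loc.X2_WORD_1], memory[Loc.X2_WORD_2])
    return words if any(words) else None
```

Even a cell with nothing to request still answers the GC command from these two locations; in this sketch that answer is simply a pair of zero words.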
Most cell executives will require a little more in the way of a program to take
care of the cell processing problems. The programmer may allow some of these
problems to cause interrupts, which are discussed below. The following
are possible interrupts in a cell:
1. Machine errors
2. Inter-cell bus I/O interrupt
3. Illegal data such as attempted division by zero
4. Illegal operation such as an invalid operation code, no neighbor response,
etc.
5. Real time clock goes to zero
Any of the above interrupts may be masked off. Each interrupt will be briefly
discussed below to provide some idea of what is required of the cell executive to
provide servicing the interrupts.
Machine error interrupts are handled by a simple routine. An example of such
a routine is given in Figure 11-13. Block 1 is the normal hardware saving and changing
of the program counter. In Block 2, the two locations that hold the GC X2 response
request words are loaded with the request. The request will be for an output of a
fixed number of words to the controller cell as seen by the description of the X2
storage area. The location following the second X2 word is loaded with the address
in this cell where the words will be output from. The cell will then mask off all
interrupts and wait for the controller cell to send this cell a GC command requesting
the X2 response request words.
One of the prime functions of the cell executive will be handling the inter-cell
bus communications. The inter-cell bus I/O routine is shown in Figure 11-14. As
may be seen, part of the functions are accomplished by hardware and part by software.
When I/O over the inter-cell bus is desired, the two dedicated locations that store
the X2 response request words are loaded with the desired communication information.
The program will then return to its problem or execute a wait instruction if
the problem cannot proceed until the communication is accomplished (this would
normally be inputting some data from another cell).
When the controller cell sends this cell a GC X2 command (described in
Section 11.2.1), the cell will respond as shown by the hardware blocks. The detailed
actions that occur to implement the functions described in these blocks were given in
Section 10 of this report. However, Figure 11-14 shows the general functions and
the interaction between the software and hardware. The interrupt when the GC X2
command is received is not really a true interrupt. It simply halts the execution of
the present instruction and forces the hardware actions to be completed. It will
essentially release the processor to complete the present instruction and continue
with the halted program in the block "Wait for CC reply." When the CC reply is
received a similar halt interrupt occurs until the communication is completed. The
communication is completed when: the count in the accumulator equals 0, the time
interval between data words coming in from the inter-cell bus exceeds some value,
or the transmitting cell sends out an end of transmission command. Then an I/O
complete interrupt occurs. The block "process interrupt" is expanded upon in
Figure 11-15.
Figure 11-13. Machine Error Routine
(Flow chart blocks, in order of appearance: machine error; store P counter, change
P counter (performed by hardware); load X2 storage area with output request to CC;
mask off all interrupts and wait for CC. X2 storage area fields as labeled: OUT, CC,
ADDRESS; ADDRESS, COUNT; address of output words.)
Figure 11-14. Inter-Cell Bus I/O Routine
(Software blocks: some program; store into X2 request word the request for an I/O
operation; return to program or execute a wait instruction; process interrupt;
return to instruction where the program was interrupted or following wait
instruction. Hardware blocks: wait for CC to request X2 response words; interrupt,
GC X2 is received; send X2 response words to CC, save count if input requested;
set up address for this cell memory; wait for CC reply; store or output the data;
interrupt upon I/O completion.)
The interrupt will enter block 1 in Figure 11-15 by executing the instruction in
the memory location dedicated to processing this interrupt. This interrupt will be
entered when the accumulator is counted down to 0, the time interval between data
words has exceeded that allowed by a hardware timer or a command "end of trans-
mission" is received and decoded. If there are errors such as parity, then the
interrupt will occur early, i.e. before the count has reached 0 or the timer has
signaled end of data.
The count is checked to see if it is 0. This is first accomplished by seeing if
the accumulator equals 0. If the operation was to input words and the accumulator
is 0 then all the words expected were received. (Note that the controller cell could
tell another cell to communicate less data than requested by the requesting cell as
explained in Section 11.2.1.) For an input operation, checking for 0 in the
accumulator is all that is needed, since the cell will save the count it requested in the
accumulator and count it down each time a data word is received. If it is not 0, then
the address in the program counter is compared to the initial address (given in the
location following the second X2 response word) and the difference noted. This will
indicate how many words were received and the cell can then form a new X2 response
word to try to get the remainder of the data on the next GC X2 received from the
controller cell.
If an output operation was being performed, the procedure is slightly different.
In this case the count that is placed in the accumulator is received from the controller
cell. This should be zero at the end of transmission; otherwise an error exists.
Therefore the program counter will be compared with the initial address given in the
same location as discussed above and the difference noted. This difference will be
compared to the count in the X2 response word location and any discrepancies noted.
If they compare, all the words desired were output. If not, then a new X2 response
word is formed as above.
If all the words were transferred, Block 2 is entered and the X2 response word
location is set to zeros so this request is not repeated. The branch to the requestor
or a return to the interrupted location may then be
made. The X2 response word being zero may be tested by other software routines
to verify the data transfer was completed and the interrupt processing routine was
entered and executed correctly. Block 4 will try to get the data again if there were
errors. Since the X2 is still in the same location, the next time the controller cell
polls the cells, the same request will be made. If some words were transferred and
not others, the address would be modified and the request made only for the data
words that are missing. Block 6 will be entered if the retry was not successful.
A machine (hardware) error will be assumed. The next transmission to the controller
cell will request the machine error routine.
Figure 11-15. Inter-Cell Bus I/O Complete Routine
(Flow chart blocks, in order of appearance: inter-cell bus I/O complete interrupt;
test for any errors, is count = 0?; set X2 response word to zeros; branch to
requestor; trouble; has a retry been made? (test retry flag); set retry flag, wait
for controller cell to send GC X2 again, allow I/O interrupt; assume machine error,
branch to machine error interrupt.)
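The decision structure of the I/O complete routine can be sketched as a single function over a small state record. The dictionary keys (`x2`, `transferred`, `error`, `retry`) and the returned labels are hypothetical names chosen for the illustration; the X2 request is modeled simply as an (address, count) pair, which is an assumption about its contents.

```python
def io_complete(state):
    """One pass of the I/O complete routine.

    state["x2"] is the outstanding (address, count) request,
    state["transferred"] is the number of words actually moved,
    state["error"] flags a detected error such as parity, and
    state["retry"] records whether a retry has already been made.
    """
    address, count = state["x2"]
    done = state["transferred"]
    if not state["error"] and done == count:
        state["x2"] = (0, 0)        # zero the request so it is not repeated
        state["retry"] = False
        return "branch to requestor"
    if state["retry"]:
        return "machine error"      # the retry also failed
    state["retry"] = True
    if not state["error"]:
        # Partial transfer: request only the words that are missing.
        state["x2"] = (address + done, count - done)
    return "wait for GC X2"


state = {"x2": (100, 8), "transferred": 5, "error": False, "retry": False}
first = io_complete(state)          # five of eight words arrived
request_after_first = state["x2"]
state["transferred"] = 3            # the retry delivers the remaining words
second = io_complete(state)
```

In this run the first pass leaves a request for three words starting at address 105, and the second pass clears the request and branches to the requestor. When an error is flagged, the request is left unchanged so that the same request is made at the next poll.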
Finally, for another program within the cell to request some I/O on the
inter-cell bus, the program will test the X2 location. If it is zero, the program will load
the X2 request locations with the desired address and count. The branch address is
stored in a location known to the interrupt processing routine.
Interrupts three and four are concerned with the operation of the program. In
correctly operating software, these errors may not exist. They can be handled like
the machine error, sharing most of the instructions. Thus only three additional
instructions and a data word will be needed.
Sometimes the data values may vary radically and overflows may be expected.
In this case, the cell should contain its own interrupt processing routine. Alternatively,
the cell could send its problem to a neighbor cell or even to the controller cell. The
time and space requirements all play a role in defining how the software should be
developed.
The routine to process the data will be similar to present programming for
large computers. Essentially the programmer tells the routine what to do whenever
an error occurs. Thus on an overflow error, the programmer may cause the inter-
rupt routine to modify shift instructions and repeat the operation.
Illegal operations are usually assumed to be caused by a software error.
Usually these are treated as a machine error, i.e., the controller cell error-correcting
routine is called to attempt to correct the bad program. The flow diagram in
Figure 11-13 will perform this operation.
The last interrupt, the real time clock going to zero, is either expected and the
clock is set correctly, or the clock is not used and the interrupt is always masked
off. When the clock is wanted, the flow diagram in Figure 11-16 shows a real time
interrupting routine. Block 1 of Figure 11-16 stores the program counter and
branches to the dedicated memory location for the real time clock.
Block 2 resets the clock, calculating the time interval from the present contents
of the RTC, the time the clock was set and the desired interval to the next interrupt
time. The branch address is updated so the next requestor may be branched to.
If there are more than 2 times, a pointer to indicate which is the next time for
the clock needs to be updated. The complexity of the list depends upon the system
software requirements. A scheduling of "random" time intervals will require a push-
down list technique or similar software programming technique.
In all cases, the interrupt has been processed, the time for the clock when the
next RTC interrupt occurs is ready. The branch to the proper routine that needs
processing at this time is made. If the RTC interrupt has not yet been allowed, it
is allowed at this time.
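The list handling described above can be sketched as follows. This is a minimal illustration in Python that swaps in a binary heap (heapq) for the push-down list mentioned in the text; the class name RtcScheduler and its methods are hypothetical, and times are plain numbers in arbitrary clock units.

```python
import heapq


class RtcScheduler:
    """Hypothetical sketch of a real time clock request list."""

    def __init__(self):
        self._pending = []  # min-heap of (due_time, sequence, requestor)
        self._seq = 0       # tie-breaker so equal times keep request order

    def request(self, due_time, requestor):
        """Queue a requestor to be branched to at due_time."""
        heapq.heappush(self._pending, (due_time, self._seq, requestor))
        self._seq += 1

    def on_interrupt(self, now):
        """Return (requestor, next_interval) for the interrupt that just fired.

        next_interval is the value to load into the clock, or None when no
        further request is queued (the interrupt can then stay masked).
        """
        _, _, requestor = heapq.heappop(self._pending)
        if self._pending:
            next_interval = max(self._pending[0][0] - now, 0)
        else:
            next_interval = None
        return requestor, next_interval
```

With only two fixed times a single pointer is enough, as the text notes; an ordered structure of this kind is only needed once "random" intervals are scheduled.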
[Figure 11-16 flow chart. Boxes, in order: Real time clock interrupt; Get next requestor address and update list and times; Get next time to interrupt and store in clock; Allow interrupt for RTC and transfer to this requestor.]
Figure 11-16. Cell Executive Real Time Clock Routine
The cell executive is seen to be as small as possible. Only the common,
frequently used routines are in the memory. Error correcting procedures and
complicated testing programs that will be used infrequently are all kept in the con-
troller cell. The programmer will make trade-offs between the amount of executive
required in the cell and the speed requirements. Obviously, it is slower to ask the
controller cell to perform tasks than to do the same task by having the program in
the same cell.
One point that should be brought out here is that there will exist two types of
cells that will require quite different executive routines: the operational
cells and the I/O cells. Operational cells are those actually performing the tasks,
and I/O cells have the connections to the sensor/conditioners via their I/O lines.
The I/O cells will be primarily concerned with inputting/outputting information over
the inter-cell bus and the I/O line.
Basically both cells will contain the same fundamental executive ideas intro-
duced in this section. However, the I/O cell will contain considerably larger lists
of tables and data handling routines. Its principal program in fact will be that of
communicating over the inter-cell bus whereas in an operational cell this is more
of an executive routine to handle the inter-cell bus communications.
12. RELIABILITY SIMULATION
12.1 MONTE CARLO METHOD
12.1.1 Introduction
The distributed processor organization was subjected to a reliability simulation
for a manned Mars mission application. A Monte Carlo reliability analysis program
was used for the simulation. This program generates reliability and availability
data based on simulating the organization.
Monte Carlo techniques of analysis refer to the simulation of random variables
in a process by the generation of random numbers or sequences. For reliability
analysis, the random event which is simulated is component or subsystem failure.
It has been found that electronic equipment exhibits random failures whose times
to failure have an exponential distribution. That is, the probability of failure as a
function of time, $P_f(t)$, is exponentially distributed:
    $P_f(t) = 1 - e^{-\lambda t}$, where $\lambda$ is the expected failure rate.
This equation is interpreted as meaning: given that the equipment is currently failure
free, the probability that a failure will have occurred by some later time $t$ is given by
$P_f(t) = 1 - e^{-\lambda t}$.
It can be shown that the expected time to failure is $\lambda^{-1}$. The probability density
function for $P_f(t)$ is $\lambda e^{-\lambda t}$. The expected value of time, $E(t)$, is then found by
    $E(t) = \int_0^\infty \lambda t \, e^{-\lambda t} \, dt = \frac{1}{\lambda}$
$E(t)$ is then defined as the mean time to failure, MTTF.
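The step from the density function to the mean time to failure can be filled in with one integration by parts. The working below is a standard derivation added for illustration, with the failure rate written as $\lambda$:

```latex
\begin{aligned}
E(t) &= \int_0^\infty t \,\lambda e^{-\lambda t}\, dt \\
     &= \Bigl[-t\, e^{-\lambda t}\Bigr]_0^\infty + \int_0^\infty e^{-\lambda t}\, dt \\
     &= 0 + \Bigl[-\tfrac{1}{\lambda}\, e^{-\lambda t}\Bigr]_0^\infty \\
     &= \frac{1}{\lambda} = \mathrm{MTTF}
\end{aligned}
```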
12.1.2 Operation of Monte Carlo Program
The Monte Carlo program solves the probability of failure equation in reverse.
It generates a random number which is evenly distributed between zero and one. It
sets this equal to the probability of failure and solves for time to failure.
    $P_f(t) = 1 - e^{-\lambda t}$
    $t = -\frac{\ln[1 - P_f(t)]}{\lambda}$
or
    $t = -\ln[1 - P_f(t)] \cdot \mathrm{MTTF}$
This equation is solved for each piece of equipment in the system. The $\lambda$ or MTTF
which is used in solving for t depends on whether the equipment is turned
on or off. It has been found that electronic equipment is susceptible to failure even
while it is sitting idle. This idle failure rate is distributed exponentially also and the
failure rate is approximately proportional to the active failure rate. For this analysis
it is assumed that the idle failure rate is directly proportional to the active failure
rate, so that $P_f$ for an idle piece of equipment is:
    $P_f(t) = 1 - e^{-\alpha \lambda t}$
where $\alpha$ is the constant of proportionality.
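The inversion described above is what is now usually called inverse transform sampling. A minimal sketch in Python follows; the function names and the idle_factor parameter (the constant of proportionality between idle and active failure rates) are hypothetical labels, not names from the original program.

```python
import math
import random


def time_to_failure(mttf, u, idle_factor=1.0):
    """Solve Pf(t) = 1 - exp(-t / mttf) for t, given a uniform draw u in [0, 1).

    idle_factor scales the failure rate (idle rate = idle_factor * active rate),
    so the effective MTTF of idle equipment is mttf / idle_factor.
    """
    return -math.log(1.0 - u) * mttf / idle_factor


def sample_time_to_failure(mttf, rng, idle_factor=1.0):
    """Draw one random time to failure using the generator rng."""
    return time_to_failure(mttf, rng.random(), idle_factor)
```

With an MTBF-OFF/MTBF-ON ratio of 2.0, idle equipment would be sampled with idle_factor = 0.5, which doubles every sampled time to failure for the same random draw.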
A block diagram of the Monte Carlo program is given in Figure 12-1. The program
first generates the time to failure for each piece of equipment. Equipment is
divided into two basic types, prime and secondary. Prime equipment is that required
to keep the computer operating, while secondary equipment is that required to replace
prime equipment in the event of failure. Prime equipment will have its statistics
generated by an active failure rate while secondary equipment may be in the active
or idle failure rate mode. Equipment may also change between active and idle modes
depending on which phase of the mission is being considered. This will become
clearer when the model for each phase of the mission is presented below.
Continuing with the operation of the program, it then checks if any of the times
to failure of prime equipment are less than or equal to the length of the first phase of
the mission. If there is a failure, another random number is generated and compared
with a probability of detection of the failure. If the random number is lower than the
probability of detection, the failure is recorded as a detected failure. That equipment
is then replaced (in the simulation) and the program continues. If the random number
is higher than the probability of detection the failure is recorded as an undetected fail-
ure. An undetected failure can result in a mission failure if it occurs during a critical
phase. Undetected failures that occur in other phases are simply disregarded here and
assumed to be detected. This is a simplification of the actual case, namely that the
probability of detection will approach 1.0 as time increases from the occurrence of
the undetected failure. The effects of this during mission phases that are non critical
will be to lower the computer system availability. The error in assuming the unde-
tected failure is immediately detected will be trivial since these mission phases are on
the order of thousands of hours and the undetected failure should be detected in less
than approximately 10 hours.
The failed prime equipment will be replaced only if a suitable secondary piece of
equipment is available. This requires generating a time to failure for the secondary
equipment and also a detection probability if this time to failure is less than the length
of the first phase of the mission.
The program then changes to the next phase of the mission and follows the same
procedure for the computer configuration for the new phase. Of course if a mission
failure occurs in a critical phase the program is terminated for this mission.
The program collects the statistics for each phase. Critical mission phases are
primarily concerned with mission failure (probability of success) and the program will
record these failures. Other phases of the mission are concerned with availability of
the computer system. For these phases the percentage of the prime equipment that is
operating will be recorded.
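The per-phase procedure can be sketched as follows. This is a simplified illustration in Python, not the original program: the function names are hypothetical, each prime unit is allowed to fail at most once per phase, a replacement is assumed to survive the rest of the phase, and a spare is treated as usable only if its own idle time to failure exceeds the phase length.

```python
import math
import random


def draw_ttf(mttf, rng):
    """Exponentially distributed time to failure with the given mean."""
    return -math.log(1.0 - rng.random()) * mttf


def simulate_phase(n_prime, n_spare, mttf, hours, p_detect, rng, idle_factor=1.0):
    """Simulate one mission phase; return (prime units operating, undetected failures)."""
    # Spares idle at a reduced failure rate: idle MTTF = mttf / idle_factor.
    spares_left = sum(
        draw_ttf(mttf / idle_factor, rng) > hours for _ in range(n_spare)
    )
    operating, undetected = n_prime, 0
    for _ in range(n_prime):
        if draw_ttf(mttf, rng) > hours:
            continue                  # this unit survived the phase
        if rng.random() >= p_detect:
            undetected += 1           # random number above PD: undetected failure
            operating -= 1
        elif spares_left > 0:
            spares_left -= 1          # detected and replaced by a spare
        else:
            operating -= 1            # detected, but no spare is available
    return operating, undetected
```

Repeating this over every phase and over many missions, and averaging the operating fraction, gives the availability statistic; counting critical phases that end with too few operating units gives the probability of success.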
[Figure 12-1. Block diagram of the Monte Carlo program.]
When all the phases of the mission have been completed, the process is again
repeated. By repeating this process, statistics are collected for probability of success
and availability. As more and more runs are made the error in the results decreases;
Appendix B gives curves of the error in Monte Carlo simulations.
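How the error shrinks with the number of runs can be illustrated with the usual binomial standard error of an estimated proportion. This is a general statistical result used here as an assumption; it is not necessarily the quantity plotted in Appendix B, and the function name is hypothetical.

```python
import math


def standard_error(p_hat, runs):
    """Standard error of a proportion p_hat estimated from independent runs."""
    return math.sqrt(p_hat * (1.0 - p_hat) / runs)
```

For an estimated probability of 0.967 from 1000 runs this gives roughly 0.0056, and quadrupling the number of runs halves the error.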
12.2 SIMULATION MODELS
This section will present the actual models used in the Monte Carlo program.
The mission was divided into five basic phases from the twenty given in reference 1.
This was done to reduce the running time of the program. These phases are given
below:
Table 12-1. Mission Phases

Phase            Hours      Mission Time
                            (hours from t0)
Non Critical     2962.8     2962.8
Critical         1          2963.8
Mars Orbital     974.3      3938.1
Non Critical     6242.6     10,180.7
Critical         0.2        10,180.9
The first Non Critical phase consists of the first 9 mission phases listed in Section 2,
namely Atmospheric Ascent through Mars Approach Correction; the first critical phase
consists of phases 10 and 11, namely Mars Aerobraking and Mars Orbit Injection; the
Mars Orbital phase is simply phase 12; the last non critical phase consists of phases
13 through 19, namely Trans Earth Injection through Earth Approach Correction; and the last
critical phase is phase 20, Earth Re-Entry. Each of the three basic types of phases,
Non Critical, Critical, and Mars Orbital, results in a different computer configuration,
as will be presented below.
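The cumulative mission times in Table 12-1 are running sums of the phase durations, which can be checked with a short sketch. The Python below is illustrative only; the names PHASES and end_times are hypothetical.

```python
from itertools import accumulate

# (phase type, duration in hours), in mission order, from Table 12-1
PHASES = [
    ("Non Critical", 2962.8),
    ("Critical", 1.0),
    ("Mars Orbital", 974.3),
    ("Non Critical", 6242.6),
    ("Critical", 0.2),
]

# Mission time at the end of each phase, rounded to the table's precision
end_times = [round(t, 1) for t in accumulate(hours for _, hours in PHASES)]
```

These end times are the instants at which the program examines the computer configuration and accumulates its statistics.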
12.2.1 Mars Orbital Phase
The configuration for this phase consists of 4 groups turned on with 18 cells
required per group as shown below in Figure 12-2. Spare cells, group switches and
cell busses are provided for in each group. Spare groups are also provided for.
One identical failure rate will be assumed for each of the cells and cells which are
in groups that are off will have the failure rate of the cells in groups that are on,
multiplied by a factor ($\lambda_{ON}/\lambda_{OFF}$). If a group switch or a cell bus is not available in
a group then this group is considered as having totally failed. Assuming a group
switch and a cell bus are operating in a group, the number of cells that are operating
in a group is the primary statistic of interest. This will give an indication of the
availability of the computer system when all the groups are considered. The group bus is
quite important to efficient operation of the groups. However, a group could still
continue to function in the absence of a group bus. Therefore, if no group bus is
available the total percentage of cells operating will simply be reduced by some
fraction. This effectively degrades the availability of the computer in the absence of
a group bus as would be the case.
[Figure 12-2 diagram. ON: Group 1 (required cells) through Group 4 (same as Group 1), joined by the group bus. OFF (spares): Spare Group 1 (same as Group 1), with group bus.]
Figure 12-2. Mars Orbital Configuration
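The availability statistic for this configuration can be sketched as follows. This is a hypothetical Python illustration: the function name, the tuple layout, the cap at the required number of cells, and the value of no_bus_fraction (standing in for the unspecified "some fraction") are all assumptions.

```python
def cell_availability(groups, required_per_group=18, group_bus_ok=True,
                      no_bus_fraction=0.9):
    """Fraction of required cells available across the groups that are on.

    groups: list of (operating_cells, cell_bus_ok, group_switch_ok) tuples.
    """
    available = 0
    for operating_cells, cell_bus_ok, group_switch_ok in groups:
        if cell_bus_ok and group_switch_ok:
            available += min(operating_cells, required_per_group)
        # otherwise the whole group counts as failed and contributes nothing
    availability = available / (required_per_group * len(groups))
    # Without a group bus the result is reduced by an assumed fraction.
    return availability if group_bus_ok else availability * no_bus_fraction
```

A group that loses its cell bus or group switch therefore costs a full quarter of the availability in the four-group Mars orbital configuration.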
12.2.2 Non-Critical Phase
The configuration for this phase is identical to that for the Mars orbital phase
except that only three groups will be required as shown in Figure 12-3. It should be
noted that in this phase group 4 is considered a spare group. Availability is the
prime concern here and the same comments as for the Mars-Orbital Phase apply
here.
12.2.3 Critical Phase
The critical phase is primarily concerned with mission success or failure.
This is determined by whether or not a critical navigation and guidance program can
be successfully run on the computer and correct outputs sent to the critical sensors.
The critical phases are primarily the planetary entry phases. It is during these
phases that failure of the computer may result in destruction of the vehicle and a
resultant mission failure. There are other phases during which one requires
navigation and guidance computations that may be considered very important such
as midcourse maneuver phases. However, failures here may simply result in the
full mission not being accomplished due to the expenditure of too much fuel, etc.
These effects can not be classified as availability or probability of success. However,
they do affect the probability that the total information desired from the complete
mission will be obtained. This is a systems analysis type of effort that is relatively
important; however, it is beyond the present scope of the study. (The availability
determined in this study will also have effects on this probability of achieving the
total information desired.)
The distributed processor offers many internal paths around which it may be
reconfigured due to failures as pointed out in Section 8. Therefore, the configura-
tions to determine whether or not the computer has failed in a critical phase are very
complex. Three configurations have been defined for a critical phase for this simula-
tion as shown in Figure 12-4. The first configuration is the one used if a group bus is
available during the critical phase. Spare cells can replace failed cells in the groups
and likewise for the other components. It should be noted that no spare groups are
applicable in this phase. This is due to the fact that one would generally not have time
to load up and get these groups running during the relatively short critical phase.
Another point to be noted is that three of the operating cells in each group, say cells 1,
2 and 3, have been selected as non replaceable in case of failure by a spare cell.
This is due to the fact that these cells may require I/O connections etc. to the outside
world and it is assumed these would not be altered during the critical phase. A group
is defined as operating if and only if it has 16 operating cells, a cell bus, and a group
switch. Only one group is required to be operating during the critical phase for a
successful critical phase (the critical navigation and guidance function should fit into
one group). To take advantage of all the sparing outlined above, the group bus is
needed.
[Figure 12-3 diagram. ON: Group 1 (required cells, spare cells), Group 2 and Group 3 (same as Group 1), joined by the group bus. OFF (spares): Group 4 and Spare Group 1 (same as Group 1), with group bus.]
Figure 12-3. Non-Critical Phase Configuration
[Figure 12-4 diagram. Panels: A) Group bus operating; B) Group bus fails during critical phase; C) Group bus fails before critical phase.]
Figure 12-4. Critical Phase Configuration
If a group bus is not available a new configuration is assumed. Two cases can
occur here. The bus fails during the critical phase or before it. If a bus is not
available during the phase and was available before the phase was entered (degenera-
tion of configuration a in Figure 12-4) then the configuration shown in Figure 12-4-b is
assumed. As can be seen from this figure, one of the two groups needs to be working
for a successful critical phase. However, note that no replacement of failed cells is
possible by spare cells in a group.
Finally, if before entering a critical phase, no group bus is available, the con-
figuration in Figure 12-4-C would be assumed. This is similar to Figure 12-4-b in
that only one group is needed to have a successful critical phase and no replacement
of failed cells is possible. However in this case 4 groups are used. This implies
that all 4 groups are loaded with the same programs and the outputs switched.
It should again be noted that undetected failures are also considered in the
critical phases. An undetected failure in the group that is being used for output to
the critical sensors will result in immediate failure of this critical phase. However,
an undetected failure in other groups will not result in a failure of the critical phase
unless this group is brought in to replace the group outputting to the critical sensors.
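The success criterion for a critical phase can be sketched as a predicate over the groups in use. This Python illustration is an assumption-laden simplification: the names are hypothetical, a failure of cell 1, 2 or 3 is assumed to fail its group, undetected failures are not modeled, and the caller supplies whichever groups the configuration of Figure 12-4 makes available.

```python
from dataclasses import dataclass


@dataclass
class Group:
    operating_cells: int   # required cells still working
    spare_cells: int       # working spare cells in the group
    io_cells_ok: bool      # cells 1, 2 and 3 (not replaceable) all working
    cell_bus_ok: bool
    group_switch_ok: bool


def group_operating(g, sparing_allowed, cells_needed=16):
    """A group operates only with enough cells, a cell bus and a group switch."""
    cells = g.operating_cells + (g.spare_cells if sparing_allowed else 0)
    return (g.io_cells_ok and cells >= cells_needed
            and g.cell_bus_ok and g.group_switch_ok)


def critical_phase_ok(groups, group_bus_ok):
    """One operating group is enough for a successful critical phase."""
    # With the group bus up (configuration A) spare cells may replace failed
    # cells; without it (configurations B and C) no replacement is possible.
    return any(group_operating(g, sparing_allowed=group_bus_ok) for g in groups)
```

The sketch shows why the group bus matters: a group one cell short succeeds when sparing is allowed but fails without it, unless another fully working group is loaded with the same programs.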
12.3 SIMULATION RESULTS
The results of the Monte Carlo simulation will be presented in this section.
Numerous cases were simulated varying parameters such as, cell and group switch
failure rate, number of spare cells per group, number of spare inter-group busses
and group switches, number of spare inter-cell busses, on/off failure rate ratio, and
percent of failures affecting the busses. A listing and definition of the parameters
that could be varied in the Monte Carlo program is given below:
MTBF    Each cell and each group switch failure rate
S       Number of spare cells in each group (number of cells
        exceeding the 18 required)
M       Number of spare groups
K       Number of spare inter-group busses (also the
        number of spare group switches per group)
V       Number of spare inter-cell busses per group
PD      Probability of detecting a failure
PB      Probability that a cell or group switch failure results
        in a failure of a bus
o/o     MTBF ON/MTBF OFF ratio
A total of 23 different cases were simulated. An example of the print out of one
case is shown in Figure 12-5. Examination of this figure will give more understanding
as to the operation of the program. The first two columns contain the phase and time
at which the statistics are accumulated. The program starts at t = 0 hours and
examines the computer configuration at t = 2962.8 hours, 2963.8 hours, etc. This
procedure was described in Section 12.1. The next column contains the cell availability.
This is the average number of cells available (good) in the computer system
(Number Available/Number actually required to meet the requirements of the particular
phase) for the time interval just evaluated. For example, at t = 8500 hours, a
cell availability of 0.98911 is the average availability from t = 6500 to t = 8500 hours.
Cell availability is not of interest during a critical phase and is not calculated at
phases CR-2 and CR-9. In its place probability of success is calculated. The conditions
on probability of success were defined in Section 12.2.
[Figure 12-5. Example print out of one simulated case.]
In [...] failures were printed out. Cells 1,
2, and 3 were defined as having external connections and were therefore considered
apart from the others in a group. The average number of these cells failing during
each time interval is printed out in column 5. It should be pointed out here that a
total of 1000 runs were made for each case and the number printed out in this column
is the average of the 1000 runs. For example, for the first time interval, t = 0 hours
to t = 2962.8 hours, a total of 159 of these cells failed. Therefore, the average
number of cells 1, 2, and 3 failing is 0.159. It should be noted that this is the total
number of cells 1, 2, and 3 in the computer system; each group contains 1 cell 1,
1 cell 2, and 1 cell 3.
Columns 6 and 7 give the total number of cell and group switch failures
respectively. Columns 8 and 9 contain the statistics for the inter-cell (IC) and inter-
group (IG) bus failures respectively. Undetected failures are considered in the
critical phases and the last column gives the total number of these occurring.
The results from the simulation are summarized below in Table 12-2. The
print out for each of the cases may be found in appendix C.
It should be noted that the statistics for cell, group switch, and bus failures are
tabulated for the entire mission and not each phase as in the print outs. Also
availability is the average over the total mission time. A word of caution should be
mentioned when reading the table: errors are inherent in Monte Carlo simulations and
the curve in Appendix B should be referred to. This will give one an indication of how
far to extrapolate the results.
Effects of N - Number of Spare Cells Per Group
Figure 12-6 shows a curve of the average availability over the entire mission
as a function of the number of spare cells in each group. The conditions as shown are
for an MTBF of 200,000 hours for each cell and group switch, 1 group bus, and 1 cell
bus per group. The average availability is 97.25% with no spares and rises to 99.6%
with two spares per group.
It should be noted that for this and all the curves in this section the following
conditions hold unless otherwise specified: M (Spare groups) = 0, MTBF-OFF/
MTBF-ON = 2.0, PB (probability of a failure affecting a bus) = 0.5 percent.
[Table 12-2. Summary of simulation results.]
Figure 12-6. Average Availability vs Number of Spare
Cells Per Group
Effects of MTBF
The effects of cell and group switch MTBF (on) on average availability are
shown in Figure 12-7. These effects are shown parametrically as a function of the
number of spare cells per group. The conditions on the curves are no spare busses
in the system. The curve for N=2 is dotted between 75,000 hours and 200,000 hours
since no point was actually obtained at 75,000 hours. As may be expected, MTBF
has a significant effect on availability and is most pronounced for values up to
200,000 - 300,000 hours. In addition, the improvements due to adding spare cells
are most pronounced when adding the first spare cell.
Effects of K - Inter-Group Busses
Average availability as a function of MTBF is shown in Figure 12-8. Curves
are shown for K=0 and K=1 with both N=0 and N=1. In both cases a slight improvement
in availability results by having one spare intergroup bus.
Effects of Y - Spare Inter-Cell Busses
Figure 12-9 shows the average availability as a function of MTBF when the
number of inter-cell busses per group is varied. It can be seen that only a slight
improvement in availability results by adding a spare inter-cell bus.
Effects of ON/OFF Failure Rate Ratio
The MTBF-OFF/MTBF-ON ratio was varied from 1 to 5 and the effects on average
availability are shown in Figure 12-10 for two values of MTBF-ON. As may be seen, the
availability did not vary significantly. One of the factors that probably produced such a
result is that three out of the four groups are always on and the fourth one is on during
the Mars Orbital phase; therefore, most of the equipment is always on. However, it
will be shown that this ratio has an important effect on probability of success.
Effects of PB - Percent of Failures Affecting Busses
The percent of failures of a cell or group switch bringing down a bus was varied
from 0.5 to 2.0 percent and the effects on average availability observed. The changes
in availability were insignificant. It is felt that somewhat higher values of PB would
have to be used before degradations in availability result.
Availability as a Function of Mission Time
It should be recalled that availability is a function of time during the mission
and the curves presented previously only indicate the average availability. Figures
12-12 through 12-15 show the variation with time for four different conditions:
(1) N=0, K=0, Y=0, (2) N=0, K=1, Y=0, (3) N=1, K=0, Y=0, (4) N=1, K=1, Y=0. In
addition the curves for three values of MTBF are shown for each condition. It should
be noted that the curves are plotted as straight line segments since the values repre-
sent average availability over fixed time intervals of the mission.
Figure 12-7. Average Availability vs Cell MTBF Varying
Number of Spare Cells
Figure 12-8. Average Availability vs Cell MTBF Varying
Number of Inter-Group Busses
Figure 12-9. Average Availability vs Cell MTBF Varying
Number of Inter-Cell Busses
Figure 12-10. Average Availability vs MTBF-OFF/MTBF-ON
Figure 12-11. Average Availability vs % of Failures Affecting Busses
Figure 12-12. Percent Unavailability vs Mission Time for
N=0, K=0, Y=0
Figure 12-13. Percent Unavailability vs Mission Time for
N=0, K=1, Y=0
Figure 12-14. Percent Unavailability vs Mission Time for
N=1, K=0, Y=0
Figure 12-15. Percent Unavailability vs Mission Time for
N=1, K=1, Y=0
Both percent availability and percent unavailability are indicated in the figures.
Comparing Figure 12-12 (N=0, K=0, Y=0) with Figure 12-15 (N=1, K=1, Y=0) one can
see the improvements as a function of time by adding spares. Note also that in many
of the curves the availability at the end of the Mars orbital phase is lower than at the
start of the trans earth phase. This is due to the lower requirements in the trans
earth phase.
Probability of Success
It should be recalled that 1000 runs were made for each case and therefore the
results can only be extrapolated to the third decimal point. In all the cases run with
MTBF's of 200,000 and 500,000 hours no mission failures resulted. Therefore a
flat curve at Ps = 1.0 results for these values. With MTBF's of 75,000 hours there
were a number of mission failures and the values for Ps ranged from 0.967 to 1.000.
Figure 12-16 shows the effects of ON/OFF failure rate ratio on Ps. As may be
seen, the only effects were with an ON MTBF of 75,000 hours. The high probability
of success of this system is due to the many logical paths available for reconfiguration
as discussed in Section 12.2. It should be recalled that availability was not significantly
affected by ON/OFF failure rate ratios; however, Ps is affected at lower values
of MTBF (on) as can be seen in Figure 12-16. The reason behind this appears to be
that only 1 group is necessary for a successful critical phase and
that one group will be off during a good part of the mission, which means that it will have
considerably fewer failures as the ON/OFF failure rate ratio is increased.
One other point should be mentioned with regard to Ps. The probability of
detection was 95 percent for all cases run. One case, case 13, had an undetected
failure in a critical phase in one of the 1000 runs. Also one of the runs resulted in a
mission failure during that critical phase. It appears that this mission failure was due
to the undetected failure since the failure occurred with an intergroup bus available as
may be seen by investigating the print out for case 13 in Appendix C. Due to the low
number of undetected failures it was felt unnecessary to vary the probability of
detection parameter.
Number of Cell, Group Switch, and Bus Failures
The average numbers of failures of specific components are plotted in
Figures 12-17 through 12-19. Figure 12-17 shows the average number of cell failures
as a function of MTBF. One may expect from 8.1 to 1.3 failures per mission as
MTBF varies from 75,000 hours to 500,000 hours; this is for the entire computer
system. Also shown in this figure is the average number of group switch failures per
mission with 0 and 1 spare group switch per group. In all cases one can expect
less than 1 of these failing.
Cells 1, 2 and 3 in each group were arbitrarily assigned as having external
connections (e.g., used with I/O conditioners). The average number of these types of
cells failing over the mission is shown in Figure 12-18. This gives an indication of
the number of times a connection may have to be changed due to a failure, providing
no spare connections exist. This number varied from 1.43 to 0.22 with cell MTBF.
Figure 12-16. Probability of Success vs MTBF-OFF/MTBF-ON
Figure 12-17. Cell and Group Switch Failures vs Cell MTBF
Figure 12-18. Cells with External Connections Failing vs Cell MTBF
Figure 12-19. Inter-Cell and Inter-Group Bus Failures
vs Cell MTBF
Bus failures were determined as a percent of cell and group switch failures.
The curves in Figure 12-19 show the average number of bus failures assuming
PB = 0.5 percent. In general the numbers here are fairly low.
12.4 SUMMARY AND CONCLUSIONS
More accuracy could be obtained in the results by having made more runs for
each case. However, the results appear accurate enough to derive some general
conclusions. Availability is influenced very heavily by the number of spare cells and
the MTBF. Allotting 1 or 2 spare cells per group results in a fairly high average
availability. MTBF's of 200,000 hours or higher provide very good results. A
200,000 hour MTBF is equivalent to a 0.5 percent/1000 hour failure rate, which appears
a conservative estimate for a single MOS-SOS LSI wafer in the future. Nevertheless,
even with conservative estimates, very good results are obtained.
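The equivalence between MTBF and a failure rate in percent per 1000 hours is a one-line conversion, shown here as a small Python sketch (the function name is hypothetical):

```python
def failure_rate_percent_per_1000h(mtbf_hours):
    """Convert an MTBF in hours to a failure rate in percent per 1000 hours."""
    failures_per_hour = 1.0 / mtbf_hours
    return failures_per_hour * 1000.0 * 100.0
```

A 200,000 hour MTBF gives 0.5 percent per 1000 hours; the same conversion applies to the 75,000 and 500,000 hour cases that were simulated.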
Spare group switches had a slight effect on improving availability. However,
spare cell busses had very little effect. The conclusion is that 1 inter-cell bus is all
that would be recommended.
Probability of success appears to be very high with this system. However, if
the cell MTBF's are quite low, one can obtain a significant improvement in Ps if the
MTBF-OFF/MTBF-ON ratio is high. In addition, probability of detection does not
appear to be a very significant parameter here.
Finally, the number of cell failures does not appear very high. In fact one can
expect to have to change a connection less than once during the entire mission due to
a failure in the computer.
Considering the above results the recommended organization would be one with
2 spare cells per group and 1 spare group switch (2 inter-group busses) per group.
For the manned Mars mission considered, this results in 4 groups of 20 cells per
group.
13. SUMMARY AND RECOMMENDATIONS
A summary of the work accomplished during this phase of the study along with
areas requiring further investigation is given below. Computer requirements were
defined for what is considered as a representative future manned space mission, the
Mars Lander Mission. Some significant points about the requirements are the widely
varying computer requirements in terms of speed and storage from phase to phase
and the critical nature of the computations during certain phases such as atmospheric
entry. The requirements have a large influence on the computer design and there are
many areas which could not be covered completely, primarily due to a lack of data.
Some of the more important areas are: (1) the interface between the computer and the
sensors, where information is lacking as to the nature of the signals expected; (2) precision
requirements for various computations, which need to be firmly defined, along with the
question of whether floating point is needed for some navigation and guidance functions;
(3) the structure of a reliable bulk storage unit for the time period
of interest, which should be investigated; and (4) the investigation and design of ultra
reliable switches, which should be carried out. Any future requirements studies should
take into account the above points.
It was shown that future LSI technology is difficult to extrapolate. However, it
is felt that the technology predicted should be attainable in 10 years. Some important
areas of development for LSI techniques were outlined in section 3.
Parallelism within computations is an important area of consideration for highly
parallel computer structures such as the distributed processor. In particular,
methods for measuring the effectiveness of parallel structures need to be developed
and most important of all, the procedures for optimally assigning parallel computations
need to be developed.
The organization and its architecture were developed. There are many areas
where alternate decisions could have been taken as pointed out in numerous places in
chapters 5 and 6, resulting in a slightly different or radically different architecture.
Section 6.9 contains a number of such trade-offs.
The design of the cell and group switch may be considered as "preliminary
functional." The next logical step in the design would be the detailed logical layout
including logic equations and timing diagrams. Such a design was beyond the scope
of the study; however, the design carried out showed that no major difficulties should
be encountered in a detailed design effort.
Software and executive methods were investigated in considerable depth.
However, this area is quite complex for this organization and considerable further
study is required in this topic. Some of the more important areas are the reconfigura-
tion executive routines and the assignment executive routines.
The reliability simulations showed that the organization exhibits very impressive
probability of success and availability features. A most important result is that even
with a conservative estimate of the cell (1 wafer) MTBF, 200,000 hours, the reliability
results are very good.
Finally, since the technology being considered is considerably away in terms of
time, an approach using microwave circuits on the LSI wafer and communicating
between cells at RF frequencies was considered. (This is presented in Appendix D.)
It was concluded that this approach may very well prove feasible in the time period of
interest; however, to make the design perhaps somewhat more near term this
approach was not selected. It was shown that with an RF communication system, the
organization could be considerably enhanced in terms of reliability and performance.
APPENDIX A. EXAMPLE PROGRAM
A.1 INTRODUCTION
The distributed processor system, being a new concept of computer design,
requires some new software to make the best use of the hardware. The example pro-
gram that is described in this section will show how a typical program is coded and
some of the considerations used in coding the problem. Two examples of coding the
problem are shown, the second uses global programming, the first does not.
One major part of the software is the compiler that takes the equations and
statements from the programmer and converts them into instructions, cell assign-
ments, input/output commands, etc. This compiler should appear to the programmer
similar to present advanced languages, such as PL/I or JOVIAL. There will be added
statements for the programmer to control the amount of applied parallelism, cell
assignments and intercell bus usage. This compiler for the distributed processor
system will require development. In the past 10 years the computer industry has
progressed from absolute octal and simple assemblers to the powerful languages of
today. There is no reason to believe that compiler techniques cannot be developed to
compile programs for the distributed processor system. For example, the following
statements can be added to the language and used by the programmer to control the
compilation.
. Data Description. To aid the compiler in finding all the places where
parallel processing could be used, the programmer could specify how the
data is to be arranged. An example is a vector notation X(3). The compiler
could place the three components of X in one cell, or put one component in
each of three cells.
. Schedule. The programmer may wish to force data to be processed in
parallel, or conversely, he may wish an array to be processed serially by
a single cell.
There are several areas where more work needs to be done; some of these are
given here.
. Program interactions. This occurs whenever several programs share, in
time, a cell or the use of a subroutine. Several programs may request,
simultaneously, the use of a common subroutine. The programmer needs
tools to force the compiler to generate programs with the desired
interactions.
. Data allocation. Data may be in several cells or may be in one cell. Data
may be placed in one cell and distributed to several cells as needed. The
programmer needs ways of controlling how the data is to be arranged in the
computer.
. Data Dependent Problems. An example illustrates this problem best.
Assume the problem is to invert an N x N matrix. When N = 3, a 3 x 3
matrix can easily be inverted within a single cell. To invert a 30 x 30
matrix (N = 30) requires a different program and many cells. Applied
parallelism can be used to do the inversion in a reasonable time.
These problem areas are being studied by many people. By the time the manned
missions to Mars are ready to fly, these problems should be solved. References 34
thru 36 indicate work that is being done now.
The example chosen for this program is the navigation equations given on
pages 50 - 54 of the first quarterly report of Phase I of this study, reference 2. These
equations are not so complex as to require many words of explanation, but are of
sufficient complexity to illustrate the techniques used to program the distributed
processor.
There is some opportunity to use applied parallelism. Many of the vectors have
three components that may be processed in parallel. Example 2 will show how applied
parallelism can be used. Because of the nature of the navigation problem applied
parallelism does not give a dramatic time reduction. A data reduction program is a
better example of how applied parallelism reduces time. However only the
navigation problem was coded due to time limitations.
The distributed processor has many ways of performing a task. The method
selected for this example is not the only way the computer system can be programmed,
nor is it expected to represent the best way. The method was selected to best illustrate
the software.
The general flow diagram is given in Figure A-1. This diagram and the details of
the program will be described in the following sections.
Block 1 is the interface with the controller cell. An assumption made was that
the controller cell will contain the current values of the navigation parameters. A
Kalman filter would process the various parameters, making the best estimate by using
data from the various sensors. The output is the best estimate of the present state,
(the state vector). This state vector would be saved in the controller cell and sent to
the navigation program. The controller cell will send the time and state vector to the
navigation program via the inter-cell bus.
The navigation cell will use the time for a check, as each cell has its own clock.
Thus the time parameter is for checking the clocks of the two sets of cells. The navi-
gation cell will keep a copy of the last values it sent to the controller cell. The naviga-
tion cell will compare these past values with the ones sent from the controller cell.
The values should be in very close agreement. If all is acceptable and the data checks,
the navigation calculations will begin with block 2.
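The block 1 check described above can be sketched as follows. This is only an illustration of the comparison logic: the names `NavMessage` and `block1_accept` and the two tolerance values are assumptions introduced here and do not come from the report.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NavMessage:
    """Time and six-parameter state vector exchanged with the controller cell."""
    time: float
    state: Sequence[float]


def block1_accept(received: NavMessage, last_sent: NavMessage,
                  local_time: float,
                  time_tol: float = 1e-3, state_tol: float = 1e-6) -> bool:
    """Return True if the clocks agree and the state is close to the last one sent.

    time_tol and state_tol are hypothetical thresholds, not values from the report.
    """
    if abs(received.time - local_time) > time_tol:
        return False  # the controller clock and the local clock disagree
    if len(received.state) != len(last_sent.state):
        return False  # malformed message
    # Relative comparison of each parameter against the copy kept locally.
    return all(abs(r - s) <= state_tol * max(1.0, abs(s))
               for r, s in zip(received.state, last_sent.state))
```

When the check passes, the navigation cell would proceed to block 2; what it should do on failure is not specified in the text and is left out of the sketch.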
[Flow diagram, in order: START; receive data from controller cell, check time clock;
rectify osculating orbit; set I = 0; update osculating orbit, compute perturbations;
set I = I + 1; does I = 4? (YES: continue); update state vector estimate; send data to
controller cell, set time clock; wait for elapsed time, then go to START.]
Figure A-1. Navigation Program Flow Diagram
One difference between this computer system and a conventional computer
should be noted here. The purpose of block 1 is to use a more accurate state vector
value and to start the calculations at the stated time period. However, the cell can
function without this block. Each cell contains a clock, so the navigation cell can be
interrupted as required to set the proper time period. The past values calculated
will be very close to those best estimates from the filter. Thus, if the controller has
a reconfiguration or a more important task, the navigation cell can still function and
begin its calculations. If the controller should fail to inform the navigation cell that
it should begin calculating, perhaps because of a malfunction, the navigation cell's
clock will interrupt and the navigation cell will begin anyway. Thus the system is
more reliable because it is not depending upon a single processor and a single clock
for all its control.
Block 2 rectifies the osculating orbit, if required. As will be seen, there is
much processor capability remaining unused, so performing the rectification each
time will not cost any additional processor time. However, the inter-cell bus may be
needed more. No attempt was made to modify any equations, they were coded as they
were given.
Block 3 calculates perturbations and evaluates the derivatives. This is done
four times. Block 5 does the numerical integration and updates the state vector.
Block 6 sends back the updated state vector parameters to the controller cell.
Block 7 is a wait until it is time to go back and start again.
This navigation program will be described in more detail. Example 1 does not
use any global control; it will be described first using the neighbor cell for subroutines
and not using the inter-cell bus except when necessary. Then a variation of the
example is studied, using the inter-cell bus for passing necessary subroutine data.
The second example shows how global programming, using three dependent cells, can
be used to solve the problem. Initialization is not considered in the examples and
would be done earlier in the mission by some other program.
A.2 EXAMPLE 1
The example program and storage estimates are given in Table A-1. In cell 1 is
placed the controller cell interface subroutines and two parts of the navigation program
that did not fit in cell 2. Cell 2 contains most of the operational navigation program.
These two cells make up a self-contained set that is nearly independent of the rest of
the group.
The word counts in Table A-1 were obtained by trial programming the equations.
All data is assumed to be 32 bits in length, thus most operations were done in double
precision (32 bits). It was felt the only exceptions were so few in number as not to
affect the results or any conclusions.
Table A-1. Example 1 Storage Estimates

                                                          Words
Cell 1
    Navigation Program Executive, I/O, Error checking       102
    Rectify Osculating Orbit                                 56
    Update State Vector                                      31
    Temporary Storage                                        40
    Subroutine Storage for Square Root, Sin, Cos, Exp.      228
    Total Cell 1                                            457
Cell 2
    Block 3
        Calculate Delta E                                    73
        Update Osculating Orbit                              72
        Compute Perturbations and Evaluate Derivatives      164
    Block 4
        Test I, Update Delta                                 17
    Permanent Data Storage and Constants                     98
    Total Cell 2                                            424
Total Storage                                               881

To obtain the inter-cell bus usage, an estimate of the execution time is required.
Table A-2 lists an estimate of time required to perform different operations. This set
of assumed times makes no claim of absolute accuracy, but is only an estimate of the
relative times that can be obtained with a reasonable quantity of hardware.
The inter-cell bus usage can now be estimated by using the times given in
Table A-2. The functions as illustrated in Figure A-2 were estimated to require the
following times. These times include both hardware and software overhead.

    Controller cell send X3 command      20 cycles
    Receiving cell set-up time           18 cycles
    Send first word of data               6 cycles
    Send next 13 words                   26 cycles
    Total transmit time                  70 cycles

Note that only the top half of figure A-2 is applicable here. The lower half
applies to the variation of example 1 that will be discussed later (subroutines not in
neighbor cell).
Table A-2. Relative Times Used in Programming Examples

    Time to transfer one byte on the inter-cell bus            1 cycle
    Time for one memory cycle                                  2 cycles
    Load or store an accumulator                               6 cycles
    Add or subtract, single precision                          6 cycles
    Add or subtract, double precision                         10 cycles
    Multiply, double precision                                42 cycles
    Perform SIN/COS, square root, double precision           600 cycles
    Perform SIN and COS with one call, double precision     1000 cycles
    Perform exponential, double precision                   1000 cycles

The fourteen data words are the six state vector parameters and the time, each a
double precision quantity occupying two words. The
inter-cell bus is not needed again until the computed parameters are to be returned to
the controller cell. This is assumed to require the same time as sending the data to
the navigation cell, and assumes that the controller cell will request the navigation
cell to return its computed parameters. The navigation program therefore requires
140 cycles of inter-cell bus time (70 cycles in each direction).
The program execution time is estimated below.

    Block 1
        Start                                16 cycles
        Input start parameters, check       214 cycles
    Block 2                                2200 cycles
    Blocks 3 and 4                        51600 cycles
    Block 5                                1000 cycles
    Block 6                                  90 cycles
    Block 7                          remaining time
    Total                                 55120 cycles

There are, depending upon the hardware capabilities, between 10^6 and 10^7 cycles
per second of real time. Thus between 5.5 percent and 0.55 percent of the cycles
available to the cell in a second are being used. The cell is in block 7 (idle) during
the remaining time as far as this navigation program is concerned. (Of course, during
the idle time the cell could be used for global operations from another program, etc.)
The inter-cell bus usage for this task is only 0.014 percent to 0.0014 percent.
The navigation problem, as programmed here, has used more storage to decrease
dependence upon the inter-cell bus.
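The utilization figures above follow directly from the cycle estimates. The short sketch below repeats the arithmetic; it assumes, as the percentages in the text imply, that the navigation program runs once per second, and the names `BLOCK_CYCLES` and `percent_used` are introduced here for illustration only.

```python
# Cycle estimates per block, taken from the execution time list in the text.
BLOCK_CYCLES = {
    "block 1, start": 16,
    "block 1, input start parameters and check": 214,
    "block 2": 2200,
    "blocks 3 and 4": 51600,
    "block 5": 1000,
    "block 6": 90,
}
BUS_CYCLES_ONE_WAY = 70  # total transmit time for the fourteen data words


def percent_used(cycles: int, cycles_per_second: int) -> float:
    """Percentage of one second's cycles consumed (assumes one run per second)."""
    return 100.0 * cycles / cycles_per_second


total_cycles = sum(BLOCK_CYCLES.values())
bus_cycles = 2 * BUS_CYCLES_ONE_WAY  # data in, results back

for rate in (10**6, 10**7):
    print(rate, percent_used(total_cycles, rate), percent_used(bus_cycles, rate))
```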
Taking another look at the problem, some storage could be saved by using
mathematical subroutines located in another cell. Again, there are many choices
that can be made. The square root, because it is used so much, could be left in cell 1.
The other subroutines could be moved into a storage area in cell 1 whenever they were
needed. Or the variable could be transmitted on the cell bus to a subroutine cell, and
the subroutine cell will return the desired function. The latter case was examined in
more detail.
For one cell to communicate to another requires a sequence of commands to and
from the controller cell and the communicating cells. This sequence is described in
section 6 of this report.
The estimated time required to send a double precision word to a subroutine was
calculated as follows. C1 is the navigation cell set, CC the controller cell, and CS the
subroutine cell as shown in Table A-3.
Table A-3. Communication Sequence Timing

    Sender   Receiver   Command/Data                           Cycles required
    CC       C1         X2 - Report communication requests            2
    C1       CC         I/O is desired                                8
    CC       CS         X3 - Prepare to input data                   30
    CC       C1         X7 - Output data                             10
    C1       CS         Data words transmission time                  4
                        Total cycles required                        54

To return the data from the subroutine cell to the navigation cell will require the
same amount of time. Thus 108 cycles are required for each subroutine call. This is
neglecting the time the sending cell waits for the controller cell to ask if it has any I/O.
This time to poll the cells is variable, and is a function of the rest of the tasks con-
tained in the group. This time will be ignored in this discussion, not because it is
unimportant but because there is no way to analyze the amount of time used by other
tasks in this group. When they are all known this delay can be estimated.
There are 42 subroutine calls in the navigation program, which require the
times on the bus given in Table A-3. Table A-4 shows the time for execution of the
subroutines using this new approach.
Table A-4. Subroutine Timing Requirements

                     Execution Time        Times Required    Total Time
    Subroutine       Including Bus Time    By the Program    Including Bus Time
    Square Root            708                  18              12744 cycles
    SIN                    708                   4               2832
    COS                    708                   4               2832
    SIN and COS           1112 *                12              13344
    EXP                   1108                   4               4432
    Total                                       42              36184

*Both sine and cosine are returned, requiring 4 more cycles than if only one
function is returned.
This time is added to the previous execution time and the previous inter-cell bus
usage. The previous execution time estimate was 55120. Adding the additional inter-
cell bus time requirements gives a total of 59660 cycles required to calculate the new
state vector. It is seen that the total execution time required in the cell is not
increased greatly.
The inter-cell bus usage will increase as shown in part of Figure A-2. The
addition of 4540 cycles to 140 cycles gives a total of 4680 cycles. If there are
10^6 cycles available in a second of real time, 0.47 percent of the inter-cell bus is used
by the navigation program. The saving of subroutine storage in the navigation cell set
thus requires increased inter-cell bus time: the storage requirement in cell 1 falls
from 457 words to 239 words, at the expense of increased bus usage.
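The entries of Table A-4 and the bus usage quoted above can be reproduced with a few lines of arithmetic. In the sketch below the execution times come from Table A-2 and the 54-cycle one-way transfer from Table A-3; the 4540 added bus cycles are taken directly from the text rather than derived, and all Python names are illustrative assumptions.

```python
BUS_CYCLES_PER_CALL = 2 * 54  # send the argument, then return the result

# name: (execution cycles from Table A-2, extra bus cycles, calls in the program)
SUBROUTINES = {
    "square root": (600, 0, 18),
    "sin": (600, 0, 4),
    "cos": (600, 0, 4),
    "sin and cos": (1000, 4, 12),  # two results returned, 4 more cycles
    "exp": (1000, 0, 4),
}


def cycles_per_call(exec_cycles: int, extra_bus: int) -> int:
    """Execution time of one remote subroutine call, including bus time."""
    return exec_cycles + BUS_CYCLES_PER_CALL + extra_bus


row_totals = {name: cycles_per_call(e, x) * n
              for name, (e, x, n) in SUBROUTINES.items()}
total_calls = sum(n for _, _, n in SUBROUTINES.values())
total_subroutine_cycles = sum(row_totals.values())

ADDED_BUS_CYCLES = 4540          # figure quoted in the text
bus_total = 140 + ADDED_BUS_CYCLES
bus_percent = 100.0 * bus_total / 10**6   # at 10^6 cycles per second
```

Under these figures the trade is 218 words of cell 1 storage (457 down to 239) against a rise in bus usage from 140 to 4680 cycles per execution.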
These two variations of example 1 show how the distributed processor system
gives great flexibility in the use of the system resources. System availability in event
of a malfunction is one important reason for having a flexible system. The programmer
can make trade-offs to give the resultant system the best chance of meeting system
objectives.
A.3 EXAMPLE 2
The navigation program was programmed a second time, but this time global
programming was used to show the effects of global programming on the software.
Example 1 used conventional coding. Example 2 will show how global programming is
used. This section will go into much more detail on the coding. Because most of the
navigation vectors contain three components, three cells are used to process the data.
Usually X is calculated in cell CX, Y in cell CY, and Z in cell CZ. Cell CY is the
center and main processing cell. Neighbor lines connect CY to CX and CZ, as shown
in Figure A-3.
[Block diagram: the controller cell (start parameters, end parameters, global program)
is connected by the inter-cell bus to cells CX, CY, and CZ. Cell CX calculates the X
components of vectors and holds the subroutines; cell CY holds the main program and
calculates the Y components of vectors; cell CZ calculates the Z components of vectors.
Neighbor lines connect CY to CX and to CZ.]
Figure A-3. Example 2, Program Allocation to Cells
The equations were analyzed for places where applied parallelism could be used
effectively. The resulting dependent program sequences, as they appear on the inter-
cell bus, are shown in Figure A-4. Tables A-5 and A-6 are the program that generates
the global sequences. The make-up of the sequences is given in Tables A-8 and A-10
through A-12.
The example shown uses an exponential subroutine in order to calculate the
atmospheric drag. As in example 1, the subroutines may be in various places through-
out the system and may be used in various ways. The method proposed here was to
have the exponential, square root, and sin/cos in the CX cell. The exponential could also
be sent from the controller cell, or the subroutine cell. Or the data could be sent to
the subroutine cell and the subroutine cell can return the function. Figure A-4 assumes
the exponential is located in CX and the data is sent over the neighbor line to CX.
It should be noted that scaling and rounding have been omitted in all these examples.
The equations in 1.2.4.2.1 (reference 2), calculating the eccentric anomaly, use only
one variable and there are few opportunities for natural or applied parallelism.
One possibility is to put sin in CX and cos in CZ, and calculate the sin and cos of the
estimated eccentric anomaly simultaneously. In this example applied parallelism was
not used.
A look at Tables A-5 and A-6 shows many details, most of which are of interest
only to professional programmers. A few items will be mentioned here because they
are of interest in studying the distributed computer system.
Table A-5 shows a program to solve the equations given in section 1.2.4.1 of
reference 2. These equations rectify the osculating orbit. Usually the orbit is
rectified whenever the deviation vector becomes large. This example assumes that the
rectification takes place; thus this is a worst case example.
The first two lines shown in Table A-5 transmit the components of RV and RDV to
the proper cells.
The first global sequence is given next. The BEGIN line tells the compiler that
this next block is to be made into a global program. Cells CX, CY and CZ are all to
receive the global instructions. The compiler will compile a program to be placed in
the controller cell. This program will be called whenever this sequence is required.
The next 4 statements will be made into a global program. This program, shown in
Table A-7, will be executed by the controller cell. The output of this program is
shown in Table A-8.
Looking first at Table A-7, the first three instructions are controller cell
instructions and are executed by the controller cell. Instruction 1 will cause cell CX
to be addressed, set to level L, and set to the dependent wait state (DW). Level L is
any one of the 8 levels. Any level may be used, just so all three cells are set to the
same level.
Instructions 2 and 3 will perform the same operation on cells CY and CZ. Thus
three X7 control words will be sent out over the inter-cell bus.
Table A-5. Program for 1.2.4.1 (Ref 2)

    Line
    Number   Statement
     1       TRANSMIT RVXT0, RDVXT0 FROM CY TO CX
     2       TRANSMIT RVZT0, RDVZT0 FROM CY TO CZ
     3       BEGIN GLOBAL SEQUENCE 1 TO CX, CY, CZ
     4       RV2 = RVT0 * RVT0
     5       RDV2 = RVDT0 * RVDT0
     6       DELTA = 0
     7       DELTAD = 0
     8       END GLOBAL SEQUENCE 1
     9       TRANSMIT RV2X, RDV2X FROM CX TO CY
    10       TRANSMIT RV2Z, RDV2Z FROM CZ TO CY
    11       V02 = RDV2X + RDV2Y + RDV2Z
    12       R0 = SQRT(RV2X + RV2Y + RV2Z)
    13       A = UG * R0/(2 * UG - R0 * V02)
    14       ECOSE0 = 1 - (R0/A)
    15       ESINE0 = V02/SQRT(UG * A)
    16       N = SQRT(UG)/A ** 1.5

Table A-6. Program for 1.2.4.2 Through 1.2.4.5 (Ref 2)

    Line
    Number   Statement
     1       FP = 1 - (A/R0) * (1 - COS(DE))
     2       GP = DT - (DE - SIN(DE))/N
     3       TRANSMIT FP, GP FROM CY TO CX, CZ
     4       BEGIN GLOBAL SEQUENCE 2 TO CX, CY, CZ
     5       ROSCT = FP * RT0 + GP * RDT0
     6       RM2 = ROSCT * ROSCT
     7       END GLOBAL SEQUENCE 2
     8       TRANSMIT RM2 FROM CX TO CY
     9       TRANSMIT RM2 FROM CZ TO CY
    10       RM = SQRT(RM2X + RM2Y + RM2Z)
    11       FV = SQRT(UG * A) * SIN(DE)/(R0 * RM)
    12       GV = 1 - (A/RM) * (1 - COS(DE))
    13       TRANSMIT FV, GV FROM CY TO CX, CZ
    14       BEGIN GLOBAL SEQUENCE 3 TO CX, CY, CZ
    15       RDOSCT = FV * ROSCT0 + GV * RDOSCT0
    16       RVT = ROSCT + DELTAT
    17       RDVT = RDOSCT + DELTADT
    18       Q = (DELTAT * (DELTAT - 2 * RVT))/(RVT * RVT)
    19       END GLOBAL SEQUENCE 3
    20       TRANSMIT Q FROM CX TO CY
    21       TRANSMIT Q FROM CZ TO CY
    22       Q = QX + QY + QZ
    23       FQ = (Q * (3 + Q + Q * Q))/(1 + (1 + Q) ** 1.5)
    24       TRANSMIT RVT * RVT, RDVT * RDVT FROM CX TO CY
    25       TRANSMIT RVT * RVT, RDVT * RDVT FROM CZ TO CY
    26       RV2 = RVTX * RVTX + RVTY * RVTY + RVTZ * RVTZ
    27       HE = SQRT(RV2)
    28       LA = SQRT(RDVTX * RDVTX + RDVTY * RDVTY + RDVTZ * RDVTZ)
    29       LB = GAMMA * LA * EXP(-BETA * HE)
    30       LC = RM2 * ROSCT
    31       L1 = J * RE2/RV2
    32       L2 = 5 * Z2/RV2
    33       TRANSMIT L1, L2, LB, LC FROM CY TO CX, CZ
    34       BEGIN GLOBAL SEQUENCE 4 TO CX, CY, CZ
    35       DEFINE L3 = 1 IN CX, CY
    36       DEFINE L3 = 3 IN CZ
    37       A2 = L1 * RV * (L3 - L2)
    38       A1 = LB * RDT
    39       AD = A1 + A2
    40       KV(I) = DELTADT
    41       KA(I) = AD - UG * (FQ * R + DELTAT)/LC
    42       IF I < 3 DO
    43       DTNHV = DTNV + (H/2) * KV(I)
    44       DTNHA = DTNA + (H/2) * KA(I)
    45       END DO
    46       IF I = 4 DO
    47       CV1 = (H/6) * (KV(1) + 2 * KV(2) + 2 * KV(3) + KV(4))
    48       CA1 = (H/6) * (KA(1) + 2 * KA(2) + 2 * KA(3) + KA(4))
    49       RTN = ROSCT + CV1
    50       RDTN = RDOSCT + CA1
    51       END DO
    52       END

Table A-7. Controller Cell Program to Generate Sequence One

    Line     Instruction
    Number   Operation     Operands
     1       CC            CX, LEVEL L, DW
     2       CC            CY, LEVEL L, DW
     3       CC            CZ, LEVEL L, DW
     4       CC            Transmit Mode, Level L
     5       LOAD DP       U1, RV
     6       MPY DP        U1, U1
     7       STORE DP      U1, RV2
     8       LOAD DP       U1, RDV
     9       MPY DP        U1, U1
    10       STORE DP      U1, RDV2
    11       SUB DP        U1, U1
    12       STORE DP      U1, DELTAT0
    13       STORE DP      U1, DELTADT0
    14       CC            Execute Mode, Level L, DLC

Table A-8. First Sequence Received by CX, CY, CZ

    Line
    Number   Operation     Operands
     1       PREFIX        LEVEL L
     2       LOAD DP       U1, RV
     3       MPY DP        U1, U1
     4       STORE DP      U1, RV2
     5       LOAD DP       U1, RDV
     6       MPY DP        U1, U1
     7       STORE DP      U1, RDV2
     8       SUB DP        U1, U1
     9       STORE DP      U1, DELTAT0
    10       STORE DP      U1, DELTADT0
    11       GC            Go to Dependent Local Control State

Instruction 4 sets the controller to the transmit mode and an X0 control byte at
level L is sent out. Three dependent cells are at level L and the three cells will
respond and switch to the global state. The controller cell will go into the transmit
mode, not executing instructions until line 14 is fetched from memory. Lines 5 thru
13 are sent to the dependent cells. When the line 14 instruction is fetched, because
it is a CC instruction, the instruction is executed only by the controller cell. The
controller cell is set to the execute mode. An X0 control byte is sent over the global
bus at level L, setting all three cells to the Dependent Local Control (DLC) state. The
controller cell will now do some other task; the three cells will each execute their
local instructions to solve the navigation problem.
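The state changes just described can be summarized as a small state machine. The following is a minimal sketch and not the report's logic design: the class and method names are hypothetical, the initial "local control" state is assumed, and the control bytes are reduced to plain method calls.

```python
from enum import Enum


class State(Enum):
    LOCAL = "local control"            # assumed starting state
    DW = "dependent wait"
    GLOBAL = "global"
    DLC = "dependent local control"


class DependentCell:
    def __init__(self, name):
        self.name = name
        self.state = State.LOCAL
        self.level = None
        self.received = []             # global instructions taken from the bus

    def set_dependent_wait(self, level):
        """Effect of the controller's lines 1-3: address the cell, set level, go to DW."""
        self.level = level
        self.state = State.DW

    def on_prefix(self, level):
        """Control byte at a level: only waiting cells at that level respond."""
        if self.state is State.DW and self.level == level:
            self.state = State.GLOBAL

    def on_instruction(self, instruction):
        if self.state is State.GLOBAL:
            self.received.append(instruction)

    def on_go_local(self, level):
        if self.state is State.GLOBAL and self.level == level:
            self.state = State.DLC


def run_global_sequence(all_cells, targets, level, instructions):
    """Controller side of Table A-7, reduced to its effect on the cells."""
    for cell in targets:
        cell.set_dependent_wait(level)     # lines 1-3
    for cell in all_cells:
        cell.on_prefix(level)              # line 4, transmit mode
    for instruction in instructions:       # lines 5-13 broadcast on the bus
        for cell in all_cells:
            cell.on_instruction(instruction)
    for cell in all_cells:
        cell.on_go_local(level)            # line 14, execute mode
```

A cell that was not set to the dependent wait state at the chosen level ignores the whole sequence, which is the property that lets one bus carry sequences for different sets of cells.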
Table A-8 shows the instructions received by each cell. Line 1 is the X0 control
byte sent by the instruction on line 4 of Table A-7. The cells are all set to the global
state and are ready to receive instructions from the inter-cell bus. Lines 2-4 form
the square of RV. The DP means Double Precision, or a 32 bit data word is to be
used. U1 and U2 are considered one 32 bit accumulator. The LOAD DP loads U1 and
U2 with the double precision data word RV. The next instruction will multiply U1 and
U2 by U1 and U2, giving a 64 bit product in accumulators U1, U2, U3, U4. In this
example, only 32 bits are saved and stored in RV2. Normally, shift instructions
would be used to scale the data. The scaling has been omitted in this example.
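The accumulator usage in lines 2-4 can be illustrated with a small emulation. This sketch makes several assumptions not stated in the report: values are treated as unsigned integers, U1 holds the most significant word, and the STORE DP of U1 is taken to save the U1-U2 pair. Scaling, sign handling, and rounding are omitted, as in the text.

```python
MASK16 = 0xFFFF


def split_words(value, count):
    """Split an unsigned integer into `count` 16-bit words, most significant first."""
    return [(value >> (16 * (count - 1 - i))) & MASK16 for i in range(count)]


def join_words(words):
    """Inverse of split_words."""
    value = 0
    for word in words:
        value = (value << 16) | (word & MASK16)
    return value


def load_dp(acc, value):
    """LOAD DP U1, X: a 32-bit data word fills the pair U1, U2."""
    acc["U1"], acc["U2"] = split_words(value & 0xFFFFFFFF, 2)


def mpy_dp_self(acc):
    """MPY DP U1, U1: the 64-bit product occupies U1, U2, U3, U4."""
    operand = join_words([acc["U1"], acc["U2"]])
    acc["U1"], acc["U2"], acc["U3"], acc["U4"] = split_words(operand * operand, 4)


def store_dp(acc):
    """STORE DP U1, X: only 32 of the 64 product bits (here U1, U2) are saved."""
    return join_words([acc["U1"], acc["U2"]])
```

The example makes visible why shift instructions would normally follow the multiply: without scaling, the pair that is stored may not contain the significant bits of the product.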
Instructions 5-7 square RDV. The next instruction subtracts the accumulator pair U1-U2
from itself, setting it to zero. This double precision zero is stored by instructions 9 and 10. An
alternate method would be to send the zeros from the controller cell as a D32 data
word. In this case, Table A-7, lines 11 - 13, would be changed as follows:

    Line   Instruction   Operands
    11     GC            D32
    12     STORE DP      U1, DELTAT0
    13     Data          DO
    14     GC            D32
    15     STORE DP      U1, DELTADT0
    16     Data          DO

This sequence is clearly inefficient when a zero can be generated so easily in
the cell by using a register - to - register instruction.
The last instruction received is one that sets the three cells to the dependent
local control state. As seen in Table A-5, the cells' next operation is to transmit the
resultant squares to CY.
Cell CY will compute R0 and V02 from the components received from the neighbor
cells. The machine coding for this neighbor-to-neighbor transfer is given in Table A-9.
Table A-9. Neighbor to Neighbor Communication

    Line
    Number   Cell CX               Cell CY               Cell CZ
    1        FLAG U1, CY           LOAD DP U3, U1        FLAG U1, CY
    2        FLAG U2, CY           LOAD U1, CX           FLAG U2, CY
    3        LOAD DP U1, RVTOX     LOAD U2, CX           LOAD DP U1, RVTOZ
    4        FLAG U1, CY           ADD DP U3, U1         FLAG U1, CY
    5        FLAG U2, CY           LOAD U1, CZ           FLAG U2, CY
    6                              LOAD U2, CZ
    7                              ADD DP U3, U1
    8                              STORE DP U3, V02

Table A-10. Second Sequence Received by CX, CY, CZ

    Line
    Number   Operation     Operands
     1       PREFIX        LEVEL L
     2       LOAD DP       U1, FP
     3       MPY DP        U1, R
     4       STORE DP      U1, ROSC
     5       LOAD DP       U1, RD
     6       MPY DP        U1, U1
     7       ADD DP        U1, ROSC
     8       STORE DP      U1, ROSC
     9       MPY DP        U1, U1
    10       GC            Go to Dependent Local Control State

Table A-11. Third Sequence Received by CX, CY, CZ

    Line
    Number   Operation     Operands
     1       PREFIX        LEVEL L
     2       LOAD DP       U1, RDOSCT0
     3       MPY DP        U1, GV
     4       STORE DP      U1, RDOSCT
     5       LOAD DP       U1, ROSCT0
     6       MPY DP        U1, FV
     7       ADD DP        U1, RDOSCT
     8       STORE DP      U1, RDOSCT
     9       ADD DP        U1, DELTAD
    10       STORE DP      U1, RDVT
    11       MPY DP        U1, U1
    12       STORE DP      U1, TEMP1
    13       LOAD DP       U1, ROSCT
    14       ADD DP        U1, DELTAT
    15       STORE DP      U1, RVT
    16       MPY DP        U1, U1
    17       STORE DP      U1, TEMP4
    18       LOAD DP       U1, RVT
    19       CMPL          U1
    20       CMPL          U2
    21       SHIFT
    22       ADD DP        U1, DELTAT
    23       MPY DP        U1, DELTAT
    24       DIVIDE DP     U1, TEMP4
    25       STORE DP      U1, Q
    26       GC            Go to Dependent Local Control State

Table A-12. Fourth Sequence Received by CX, CY, CZ

    Line
    Number   Operation     Operands
     1       PREFIX        LEVEL L
     2       LOAD DP       U1, L3
     3       SUB DP        U1, L2
     4       MPY DP        U1, L1
     5       MPY DP        U1, RV
     6       STORE DP      U1, A2
     7       LOAD DP       U1, LB
     8       MPY DP        U1, RDT
     9       STORE DP      U1, A1
    10       ADD DP        U1, A2
    11       STORE DP      U1, AD
    12       LOAD DP       U1, DELTADT
    13       STORE DP      U1, KV(I)
    14       LOAD DP       U3, FQ
    15       MPY DP        U3, R
    16       ADD DP        U3, DELTAT
    17       MPY DP        U3, UG
    18       LOAD DP       U1, AD
    19       SUB DP        U1, U3
    20       DIVIDE DP     U1, LC
    21       STORE DP      U1, KA(I)
    22       LOAD DP       U1, H
    23       SHIFT         1
    24       MPY DP        U1, KV(I)
    25       ADD DP        U1, DTNV
    26       STORE DP      U1, DTNHV
    27       LOAD DP       U1, H
    28       SHIFT
    29       MPY DP        U1, KA(I)
    30       ADD DP        U1, DTNA
    31       STORE DP      U1, DTNHA
    32       LOAD DP       U1, H
    33       DIVIDE DP     U1, SIX
    34       STORE DP      U1, TEMP
    35       LOAD DP       U1, KV + 1
    36       SHIFT
    37       LOAD DP       U3, KV + 2
    38       SHIFT
    39       ADD DP        U1, U3
    40       ADD DP        U1, KV + 3
    41       ADD DP        U1, KV
    42       MPY DP        U1, TEMP
    43       STORE DP      U1, CV1
    44       ADD DP        U1, ROSCT
    45       STORE DP      U1, RTN
    46       LOAD DP       U1, KA + 1
    47       SHIFT
    48       LOAD DP       U3, KA + 2
    49       SHIFT
    50       ADD DP        U1, U3
    51       ADD DP        U1, KA + 3
    52       ADD DP        U1, KA
    53       MPY DP        U1, TEMP
    54       STORE DP      U1, CA1
    55       ADD DP        U1, RDOSCT
    56       STORE DP      U1, RDTN
    57       GC            Go to Dependent Local Control State

Notes for Table A-12
Instructions 22 through 31 are sent out only when I is less than 4.
Instructions 32 through 57 are sent out only when I is equal to 4.
Cell CX sets the neighbor flag for its accumulator U1 to indicate the contents of
this accumulator are to be sent over the neighbor line to cell CY. Cell CX is expecting
cell CY to request a word of data. When this request comes, accumulator U1 has
been set for cell CY. Since the product which is to be transferred was just computed,
there is no need to load the accumulator. At the same time, cell CY is moving its
product to the other pair of accumulators, U3 and U4. Now (line 2) U1 is loaded with
the first accumulator that has the flag for cell CY. Thus CX will transfer its accumu-
lator U1 to CY, which will place the contents in its U1. When CY executes the line 2
instruction, the accumulator U2 contents will be transferred from CX to CY.
While CY is transferring U2, CX will attempt to execute its instruction on line 3.
However, U2 continues to contain a flag, therefore this instruction execution will be
delayed until CY executes its line 4 instruction.
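The flag handshake described above can be modeled in a few lines. This is a simplified sketch under stated assumptions: the `Cell` class, its method names, and the rule that a neighbor load takes the first flagged accumulator are illustrative readings of the text, not the cell's actual logic.

```python
class Cell:
    def __init__(self, name, **initial):
        self.name = name
        self.acc = {"U1": 0, "U2": 0, "U3": 0, "U4": 0}
        self.acc.update(initial)
        self.flags = {}  # accumulator name -> destination cell, in flagging order

    def flag(self, acc_name, dest):
        """FLAG Un, dest: mark an accumulator as waiting to go to a neighbor."""
        self.flags[acc_name] = dest

    def can_overwrite(self, *acc_names):
        """A local load must wait while any target accumulator is still flagged."""
        return not any(name in self.flags for name in acc_names)

    def load_from(self, acc_name, source):
        """LOAD Un, source: take the first accumulator the source flagged for us."""
        for src_acc, dest in source.flags.items():
            if dest == self.name:
                self.acc[acc_name] = source.acc[src_acc]
                del source.flags[src_acc]  # transfer done, flag cleared
                return True
        return False  # nothing flagged yet; the requester would wait
```

In this model the delay of CX's line 3 instruction corresponds to `can_overwrite("U1", "U2")` staying false until CY has fetched both words.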
The remaining instructions operate in a similar manner. Thus the CZ components
are moved to CY and added. Because the data is double precision (32 bits), the
program is longer than if the data were single precision (16 bits). In the latter case,
the addition can be done directly, and the instructions to load the CY accumulators are not
required.
Returning to Table A-5, the line 12 instruction takes the square root of the sum
of the squares. In this example, the square root subroutine is assumed to be located
in one of the neighbor cells. The data is transferred to the neighbor and the square
root is computed and returned. The remaining statements are similar.
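For readers more familiar with conventional languages, the statements of Table A-5 can be sketched as an ordinary function. The formulas are transcribed as printed in the table; the function name `rectify`, the list-based stand-in for the three cells, and the test values are assumptions made for illustration.

```python
import math


def rectify(rv, rdv, ug):
    """Sketch of Table A-5. rv and rdv are 3-component vectors at T0, ug a constant.

    The list comprehensions stand in for global sequence 1, where CX, CY and CZ
    each square their own component; the remaining lines run serially in CY.
    """
    # Lines 3-8: global sequence 1, one component per cell.
    rv2 = [c * c for c in rv]
    rdv2 = [c * c for c in rdv]
    delta = [0.0, 0.0, 0.0]
    deltad = [0.0, 0.0, 0.0]

    # Lines 9-16: CY gathers the squares from CX and CZ and finishes the job.
    v02 = sum(rdv2)
    r0 = math.sqrt(sum(rv2))
    a = ug * r0 / (2 * ug - r0 * v02)
    ecose0 = 1 - (r0 / a)
    esine0 = v02 / math.sqrt(ug * a)   # transcribed as printed, line 15
    n = math.sqrt(ug) / a ** 1.5
    return {"V02": v02, "R0": r0, "A": a, "ECOSE0": ecose0,
            "ESINE0": esine0, "N": n, "DELTA": delta, "DELTAD": deltad}
```

Only the two squaring statements benefit from the three cells; everything after line 8 depends on sums across components, which is why the text notes that the time reduction for this problem is modest.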
The equations in section 1.2.4.2.1 of reference 2 are not described here. The
programming is all contained in cell CY and is conventional.
Table A-6 contains the program for the remaining sections. The data description,
data equivalence and similar statements have been deleted. The statements are
not written in any particular language, since the purpose of these examples is to show
how a problem may be implemented and not to develop a compiler or assembler
language. The programs to obtain the starting parameters and return the ending
parameters to the controller cell have also been omitted. The other global programs
have been written in detail and are given in Tables A-10 through A-12. These are similar
to the sequence one program, Table A-8. One comment that should be made is that K(I),
as given in the equations, is a 6 element vector. To make the program easier to
follow, the vector was split into two three-component vectors, KV(I) and KA(I). The
resultant program is equivalent.
Storage estimates were made for the example 2 program. These estimates are
given in Table A-13. Cell CX contains instructions, data storage and the subroutines.
The instructions are all required to move data into and out of the cell over the
neighbor lines and to transfer from the local control to wait state. There are 10
instructions included to set the cell up to receive the first sequence of global instruc-
tions. Some words are required to save intermediate values as they are calculated
and to save data as it comes from cell CY. This data storage is temporary and may
be used by other tasks when the navigation program is completed. Only cell CY is
assumed to save permanent data that may be required between executions of the
navigation program.
Table A-13. Example 2 Storage Estimates

Cell CX
    Instructions                                         60 words
    Data Storage, temporary                              58
    Subroutines                                         228
    Total CX                                            346 words

Cell CZ
    Instructions                                         60 words
    Data Storage, temporary                              58
    Total CZ                                            118 words

Cell CY
    Navigation Program Executive, I/O,
      and error checking                                102 words
    Instructions                                        207
    Program to calculate delta E                         73
    Program for block 4                                  17
    Data Storage                                         98
    Total CY                                            497 words

Controller Cell
    Instruction Storage                                  91 words
    Total Controller Cell                                91 words

Total Storage                                          1052 words
Cell CZ is like CX, except that the subroutine package is not required. Cell CY
contains the remaining programs, the executive and the I/O interface with the controller
cell. Its instructions include the same 60 instructions as cells CX and CZ. The
remaining instructions are those required to solve the equations in Tables A-5 and A-6
that are not part of the global program sequences. The program to calculate delta E
(1.2.4.2.1) is here, as well as block 4. The data storage accounts for the remaining words.
The controller cell will require about 90 words to hold the global program.
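The storage figures of Table A-13 can be cross-checked with a few lines of Python. This is only an illustrative sketch: the word counts are copied from the table, while the dictionary layout and the function name are assumptions introduced here.

```python
# Word counts copied from Table A-13; the data layout is illustrative only.
storage_words = {
    "CX": {"instructions": 60, "temporary data": 58, "subroutines": 228},
    "CZ": {"instructions": 60, "temporary data": 58},
    "CY": {
        "executive, I/O and error checking": 102,
        "instructions": 207,
        "delta E program": 73,
        "block 4 program": 17,
        "data storage": 98,
    },
    "controller": {"instruction storage": 91},
}


def cell_totals(table):
    """Sum the word counts of every item stored in each cell."""
    return {cell: sum(items.values()) for cell, items in table.items()}


totals = cell_totals(storage_words)
grand_total = sum(totals.values())
print(totals, grand_total)
```

The per-cell sums and the grand total reproduce the totals listed in the table.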
Figure A-5 shows the general time requirements for example 2. Each line
represents 4000 cycles. The blocks on the lines show the approximate inter-cell bus
time required. A little over 52000 cycles are required to complete the navigation
program, or 5.2 percent of the available time, assuming 10^6 cycles per second
of real time. The inter-cell bus requirements come to 0.55 percent of the available time,
again assuming 10^6 cycles per second.
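The percentages quoted above follow from simple ratios. The sketch below reproduces them, assuming (as the 5.2 percent figure implies) that the navigation program is executed once per second of real time; the function names are illustrative and not part of the report.

```python
CYCLES_PER_SECOND = 10**6  # cycle rate assumed in the text


def percent_of_available_time(cycles, cycles_per_second=CYCLES_PER_SECOND):
    """Share of one second of real time consumed by the given cycle count."""
    return 100.0 * cycles / cycles_per_second


def cycles_from_percent(percent, cycles_per_second=CYCLES_PER_SECOND):
    """Inverse relation: cycle count corresponding to a percentage of one second."""
    return percent / 100.0 * cycles_per_second


navigation_percent = percent_of_available_time(52_000)
bus_cycles = cycles_from_percent(0.55)
print(navigation_percent, bus_cycles)
```

Under the same assumption, the 0.55 percent bus figure corresponds to roughly 5500 cycles of inter-cell bus time per execution.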
Comparing example 1 with example 2, a cost of storage and increased inter-cell
bus time requirements was paid to reduce the execution time. For a critical program
such as navigation, the example 1 approach appears to have definite advantages. There
is no speed problem and two cells should be better than three. The two examples show
how the distributed processor computer system gives the programmer the flexibility
of changing the programs to best meet the system objectives. For a problem that
requires much data handling, the converse will be true; the mechanization with the
global programming will require much less time and less storage.
APPENDIX B. ERROR ANALYSIS OF MONTE CARLO PROGRAM
The Monte Carlo program used in the reliability simulation is described in
Section 12. This appendix will present an analysis of the accuracy and confidence one
can expect from Monte Carlo techniques.
The number of runs, N, necessary to achieve a given accuracy and confidence in
the results can be found from the following equation:

$$N \ge \frac{p(1-p)\,K^2(\alpha)}{\epsilon^2}$$

where p is the probability to be determined, $K(\alpha)$ is a confidence function which will
be discussed below, and $\epsilon$ is the allowable error.
This equation is derived below.
The runs generated by the Monte Carlo program are essentially independent
Bernoulli trials. Since Bernoulli trials obey the binomial probability law, it is
desired to find the statistics which describe binomially distributed probabilities (see
Reference 37). In order to simplify this task without sacrificing accuracy, the normal
approximation to the binomial distribution is applied. This states that:

$$P\left[\,|N(f_n - p)| \le h\sqrt{Np(1-p)}\,\right] = 2\Phi(h) - 1$$

where N is the number of trials, $f_n$ is the relative frequency of success of an event
with probability p of success on each trial, h is the allowable error in units of
$\sqrt{Np(1-p)}$, and

$$\Phi(h) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{h} e^{-y^2/2}\,dy$$

This can be rewritten as

$$P\left[\frac{|f_n - p|}{\sigma} \le h\right] = 2\Phi(h) - 1$$

where $\sigma$ is the standard deviation of the results of Bernoulli trials,

$$\sigma = \sqrt{\frac{p(1-p)}{N}}$$

This equation states that the probability that the ratio of the difference between the
simulated probability and the actual probability, $(f_n - p)$, and the standard deviation,
$\sigma$, of the Bernoulli trials is less than or equal to some constant h is equal to
twice the positive normal distribution of h minus one. If $h = \epsilon/\sigma$, where $\epsilon$ is the
allowable error between the simulated and actual probabilities, the previous equation
becomes:

$$P\left[\,|f_n - p| \le \epsilon\,\right] = 2\Phi\left(\frac{\epsilon}{\sqrt{p(1-p)/N}}\right) - 1$$

If $K(\alpha)$ is set as the solution to

$$2\Phi(K(\alpha)) - 1 = \frac{1}{\sqrt{2\pi}} \int_{-K(\alpha)}^{K(\alpha)} e^{-y^2/2}\,dy = \alpha$$

where $\alpha$ is the confidence level which is desired, then

$$P\left[\,|f_n - p| \le \epsilon\,\right] \ge \alpha$$

which will be satisfied if

$$N \ge \frac{K^2(\alpha)\,p(1-p)}{\epsilon^2}$$
$K(\alpha)$ can be found from a table of the normal curve of error. If the value of $\alpha/2$ is
found in the area column, the corresponding value in the t column is $K(\alpha)$. Note that
for the worst case

$$N \ge \frac{K^2(\alpha)}{4\epsilon^2} \quad \text{when } p = 1/2$$

For p greater or less than 1/2 the required number of trials (runs) decreases. This means
that if nothing is known of the probability that is being simulated the worst case must be
used, but if the probability can first be estimated the number of runs can be reduced.
This equation can also be used inversely; that is, if a number of runs have been
made, an error limit can be established for a given confidence level:

$$\epsilon^2 \ge \frac{K^2(\alpha)\,p(1-p)}{N}, \quad \text{i.e.,} \quad \epsilon \ge K(\alpha)\sqrt{\frac{p(1-p)}{N}}$$

Figure B-1 shows the error as a function of p for various numbers of runs at the 95% and 75% confidence levels.
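The run-count relation and its inverse can be evaluated numerically instead of reading K from a printed table. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the function names are illustrative, and the inverse normal CDF stands in for the table lookup described above.

```python
import math
from statistics import NormalDist


def confidence_factor(alpha):
    """K(alpha): the value K that satisfies 2*Phi(K) - 1 = alpha."""
    return NormalDist().inv_cdf((1.0 + alpha) / 2.0)


def runs_required(p, eps, alpha):
    """Smallest integer N with N >= K^2(alpha) * p * (1 - p) / eps^2."""
    k = confidence_factor(alpha)
    return math.ceil(k * k * p * (1.0 - p) / (eps * eps))


def error_bound(p, n_runs, alpha):
    """Inverse use: error limit eps = K(alpha) * sqrt(p * (1 - p) / N)."""
    return confidence_factor(alpha) * math.sqrt(p * (1.0 - p) / n_runs)


worst_case_runs = runs_required(0.5, 0.01, 0.95)
print(worst_case_runs, error_bound(0.5, worst_case_runs, 0.95))
```

As the text notes, an estimate of p away from 1/2 lowers the run count; for example, `runs_required(0.9, 0.01, 0.95)` is smaller than the worst-case value.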
Figure B-1. Error vs Ps in Monte Carlo Results
APPENDIX C. MONTE CARLO SIMULATION RESULTS
This appendix contains the actual print outs from the Monte Carlo simulation
discussed in Section 12 of the report. A description of the program along with an
explanation of the print out may be found in that section.
(Monte Carlo simulation printouts, pages C-1 through C-23; the tabulated values are not legible in this copy.)
APPENDIX D. COMMUNICATIONS
D. 1 INTRODUCTION
This section of the report discusses the communications within the distributed
processor. Primarily this is the communications between the cells and between the
groups. Presently the organization is being studied and designed on the basis of using
conventional DC transmission of binary signals at the computer clock rate (approx.
2 MHz) and providing for the transmission of a halfword (8 bits) in parallel (actually
10 lines may be required when control and parity are considered).
The approaches to communications may be considered as DC or RF. RF
approaches modulate a high frequency carrier that is then demodulated to receive the
transmitted information. Within the DC approaches, the primary variables are the
number of lines provided (the number of bits in parallel) and the clock rate or bit rate
of transmission. The RF approaches have many variables that yield many
communications systems. Four systems were defined and considered in this section.
1. Simple Time Multiplex: Data is transmitted serially from a cell with only
one cell transmitting at a time.
2. Simple Frequency Multiplex: Data is transmitted in parallel over one line
from a cell with only one cell transmitting at a time.
3. Full Frequency - Simple Time Multiplex: Data is transmitted serially from
a cell with all cells transmitting simultaneously over one line.
4. Full Frequency - Frequency Multiplex: Data is transmitted in parallel over
one line from a cell with all cells transmitting simultaneously over this
one line.
These RF approaches will be explained in further depth in this section. It
should be pointed out that RF approaches to the communication system are particularly
attractive since data may be multiplexed by modulating carriers at various frequencies.
This offers the possibility of using only one line to each cell for communication
purposes. This minimization of the number of connections to each cell is of prime
importance since the number of connections has a great effect on reliability. The
advantages and effects on the computer organization will be explained in this section.
In addition, it is desired to fabricate the RF circuitry on the same wafer on which the
cell is mechanized. RF circuit technology will also be examined and discussed in this
section.
D. 2 COMMUNICATION REQUIREMENTS
The distributed processor organization uses approximately 20 cells per group and
two group switches per group. The basic internal computer word length is 16 bits. As
far as communications is concerned, parity and a control bit may add two more bits to
every "word" communicated from cell to cell. Therefore, the basic word length from
the communications standpoint will be 18 bits. Each cell is to be an individual wafer so
that the main communication problem is that of communicating between cells (wafers)
in a group (probably on one board).
The data rates have not been established firmly at this time. However, they are
not expected to be severe and a nominal figure of 10 megabits per second will be
assumed (total rate per group).
Reliability is the prime requirement; therefore, this requires that the com-
munication system use as few leads to each cell as possible (primary importance) and
require a minimum amount of circuitry for its mechanization (secondary importance).
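The word-length and data-rate figures above imply a nominal word transfer rate per group. The short sketch below works this out; the constant names are illustrative, and it assumes the full 10 megabits per second is spent on 18-bit communication words.

```python
DATA_BITS = 16               # basic internal computer word length
OVERHEAD_BITS = 2            # parity bit plus control bit
GROUP_RATE_BPS = 10_000_000  # nominal total rate per group assumed in the text

word_bits = DATA_BITS + OVERHEAD_BITS
words_per_second = GROUP_RATE_BPS / word_bits
payload_fraction = DATA_BITS / word_bits
print(word_bits, round(words_per_second), round(payload_fraction, 3))
```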
D. 3 RF CIRCUIT TECHNOLOGY
This section will present a brief discussion of RF circuits that may be fabricated
on the same wafer as used for the digital devices to mechanize a cell. The feasibility
of using an internal RF link to communicate from one section of the computer to another
is within the scope of present technologies. All of the integrated circuit RF components
necessary to construct an internal communications system of this type have been
operated in the laboratory. The fabrication of these components is also compatible
with the thin film techniques which are used in the construction of high speed digital
integrated circuits. As in the lower frequency circuits the size of the integrated cir-
cuit is controlled by the passive circuit elements. At the higher RF frequencies
circuit elements are developed from lengths of transmission line which perform the
electrical function of capacitors and inductors. The physical circuit size will be
related directly to the wavelength of the RF energy propagated on a microstrip trans-
mission line. In order to illustrate the size reduction of these circuits possible with
various substrate materials, Figure D-1 shows both the reduction of the linear
dimensions and the reduction of area that results at a given frequency for various
values of substrate dielectric constant. As can be seen from this chart, the materials
presently in use and under consideration for use in integrated circuits have dielectric
constants in the vicinity of 10. This would result in linear dimensions which are
38 percent and equivalent areas which are 15 percent of those when the dielectric
constant of the transmission line is unity (air line). It should also be noted that if the
system is operated at a higher frequency the linear reduction in circuit size will be
directly proportional to the frequency increase. Both of these factors would determine
the total area per wafer occupied by the RF communication link circuits.
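The size scaling described above can be sketched numerically. The example assumes the usual transmission line relation in which linear dimensions scale as one over the square root of an effective dielectric constant, with area scaling as the square of that factor. The effective constant used here is back-computed from the 38 percent figure quoted in the text; it is lower than the substrate value of 10 because part of the microstrip field travels in air. It is an assumption for illustration, not a value given in the report.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def linear_fraction(eps_eff):
    """Linear dimension relative to an air line (dielectric constant 1)."""
    return 1.0 / math.sqrt(eps_eff)


def area_fraction(eps_eff):
    """Area relative to an air line: the square of the linear fraction."""
    return linear_fraction(eps_eff) ** 2


def eps_eff_from_linear_fraction(fraction):
    """Effective dielectric constant implied by a given linear reduction."""
    return 1.0 / fraction ** 2


def quarter_wave_mm(freq_hz, eps_eff):
    """Quarter of the guided wavelength, in millimeters."""
    wavelength_m = SPEED_OF_LIGHT_M_PER_S / (freq_hz * math.sqrt(eps_eff))
    return 1000.0 * wavelength_m / 4.0


eps_eff = eps_eff_from_linear_fraction(0.38)  # assumed effective value
print(eps_eff, area_fraction(eps_eff), quarter_wave_mm(10e9, eps_eff))
```

With this assumed effective constant the area fraction comes out close to the 15 percent quoted, and a quarter-wave section at 10 GHz is a few millimeters long. Doubling the frequency halves that length.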
The frequency of operation for an RF computer communication system will be
limited at the upper end by the RF power output of available devices. At the
lower end, the size of the passive components will increase and the area required
for communication purposes will increase; the amount of area
available for the RF circuits will therefore determine the lower frequency. A plot of
frequency vs. power output of a group of Solid State RF power generators utilizing the
bulk effects in semiconductors is shown in Figure D-2. These devices could be ideally
suited for use in the RF computer communication system because of their small size
and simplicity. A number of these devices have been fabricated in thin film con-
figuration and are presently being studied to enhance their power generating capability.
The data which has been collected in Figure D-2 represents the performance of many
solid state bulk effect devices operating in different modes and, for comparison, also
presents the most recent upper limit for transistor operation. This plot compares
both the cw and pulse modes of performance that have been achieved for the Gunn,
LSA (Limited Space Charge Accumulation), and IMPATT (Impact Avalanche Transit
Time) devices. For further discussion on these types of devices the reader is
referred to reference 38.

Figure D-1. Dielectric Effects on Size (fractional reduction in linear dimension and in area vs. relative dielectric constant)

Figure D-2. Solid-State RF Power Oscillators (power output vs. frequency in GHz; cw and pulse data for Gunn, LSA, IMPATT and transistor devices, 1967)
One example of the type of circuit compatibility that is available is the operation
of the Gunn device, whose frequency is dependent on sample thickness. To obtain
RF power at a frequency of 10 GHz the sample thickness required is approximately
10 microns. These thicknesses, especially at the higher frequencies, are of the same
order of magnitude as those which are controlled by thin film processing techniques.
The data for the LSA mode of operation as of August 1967 shows that the operation of
these devices is feasible up to the region of 90 GHz. The power output levels are
greater than those of the Gunn mode oscillators. Since these LSA devices may be used
in conjunction with microstrip resonators, the choice of the type of device at the upper
RF frequencies would be an LSA mode oscillator for the transmitter source. Since
these devices are just in their initial experimental stages, rapid advances in power
output, frequency range, and efficiency are expected in the next few years.
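The dependence of Gunn oscillation frequency on sample thickness can be illustrated with the transit-time relation, frequency = drift velocity / thickness. This relation and the drift velocity of about 10^5 m/s (a typical textbook figure for gallium arsenide) are assumptions introduced here for illustration; the report states only the 10 GHz and 10 micron pairing.

```python
ASSUMED_DRIFT_VELOCITY_M_PER_S = 1.0e5  # typical GaAs value; not from the report


def transit_time_frequency_hz(thickness_m, velocity=ASSUMED_DRIFT_VELOCITY_M_PER_S):
    """Approximate transit-time oscillation frequency for a given sample thickness."""
    return velocity / thickness_m


def thickness_for_frequency_m(freq_hz, velocity=ASSUMED_DRIFT_VELOCITY_M_PER_S):
    """Sample thickness that gives the requested transit-time frequency."""
    return velocity / freq_hz


thickness_10ghz_microns = thickness_for_frequency_m(10e9) * 1e6
print(thickness_10ghz_microns)
```

Under these assumptions a 10 GHz device needs a sample about 10 microns thick, in line with the figure quoted above, and higher frequencies call for proportionally thinner samples.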
The power requirement on the transmitting device will be modest. One of the
reasons for this low transmitter power output requirement is the use of the
highly sensitive detecting devices now available in integrated form. One of the most
suitable candidates for this detection function is the Schottky barrier diode. These
diodes are compatible with integrated construction techniques and have shown
performance which is superior to standard diodes at frequencies up to 90 GHz. The use
of this type of detector would provide compatibility with the present solid state
active devices and passive components at the upper RF frequencies. Another reason
for a low transmitter power is the short distance (less than 12 inches) that the signals
will need to be transmitted.
D. 4 RF COMMUNICATION SYSTEMS
D. 4.1 Analysis
D. 4.1.1 Systems Considerations
This section will present an analysis of the performance of a general RF
communication system that utilizes frequency multiplexing of binary data channels. The
results here will apply to specific systems presented in Section D. 4.2. In the design
of a frequency multiplexed communication system, it is common practice to divide a
band of frequencies into a number of equally spaced data channels. Each channel has
a bandpass filter whose general function is the prevention of interference or crosstalk
from signals of other channels as well as the reduction of noise.
In most communication systems, bandwidth limitation is the primary consideration.
However, in this intracomputer communication study the tuned circuit components
are the limiting factors.
Because the total communication system must be capable of being mounted on an
integrated circuit wafer, the overall size of the circuit components, such as the quarter
wave sections used for the single tuned circuits, must be on the order of a few
millimeters. This requires that frequencies in the range of 10 GHz or higher be
utilized. However, the maximum realizable tuned circuit Q that can be obtained for
resonant elements at these frequencies is about 50. For frequencies on the order of
10 GHz to 20 GHz, a Q of 50 results in 3 db bandwidths between 200 and 400 MHz, which
is considerably greater than the signal bandwidth (for purposes of this analysis,
assumed to be 4 MHz). For this reason, the system design is controlled by circuit
limitations rather than data transfer rates.
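The bandwidth figures above follow from the standard relation between the loaded Q of a tuned circuit and its 3 db bandwidth, bandwidth = center frequency / Q. A minimal sketch, with illustrative function and variable names:

```python
def bandwidth_3db_hz(center_hz, q):
    """3 db bandwidth of a single tuned circuit with quality factor q."""
    return center_hz / q


SIGNAL_BANDWIDTH_HZ = 4e6  # signal bandwidth assumed in the analysis
Q_MAX = 50                 # maximum realizable Q quoted in the text

for center_hz in (10e9, 20e9):
    bw = bandwidth_3db_hz(center_hz, Q_MAX)
    print(center_hz / 1e9, "GHz:", bw / 1e6, "MHz,", bw / SIGNAL_BANDWIDTH_HZ, "x signal bandwidth")
```

Even at 10 GHz the circuit bandwidth is 50 times the assumed signal bandwidth, which is why the circuit limitations rather than the data rate set the channel spacing.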
Shown in Figure D-3 is a plot of the signal to crosstalk ratio as a function of the
frequency spacing of the channels and the number of tuned circuits.
The total crosstalk in one channel is the RMS sum of its response to all of the
transmitted tones. Although the crosstalk is usually affected by the spectrum of the
transmitted signal, the ratio of the 3 db circuit bandwidth to the data spectrum width is
sufficiently large in this instance that this effect can be neglected.
The curves in Figure D-3 were plotted assuming a carrier frequency of 10 GHz.
From this figure it can be seen that a signal to crosstalk ratio of about 20 db requires
at least two tuned circuits and a channel spacing of 400 MHz. The overall
system bandwidth for 20 channels would then be approximately 20 x 400 MHz or
8 GHz. Also because resonant elements are used for the tuned circuits it is important
to limit the overall system bandwidth to less than one half octave to avoid problems
with harmonics. This then requires the 20 frequency multiplexed channels to operate
between approximately 16 and 24 GHz which is within the current state of the
technology.
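The crosstalk estimate can be approximated with a simple model. The sketch below is not the calculation behind Figure D-3. It assumes the textbook high-Q response of a single tuned circuit, identical isolated cascaded circuits, equal-power tones in every channel, and a power (RMS) sum of the interfering tones. All function names are illustrative.

```python
import math


def single_tuned_power_gain(offset_hz, center_hz, q):
    """Power response of one single tuned circuit at an offset from resonance
    (high-Q approximation)."""
    x = 2.0 * q * offset_hz / center_hz
    return 1.0 / (1.0 + x * x)


def signal_to_crosstalk_db(spacing_hz, center_hz, q, n_circuits, n_channels, victim):
    """Ratio of wanted tone power to the power sum of all other tones
    as seen through the victim channel's cascaded filter."""
    crosstalk = 0.0
    for channel in range(n_channels):
        if channel == victim:
            continue
        offset = (channel - victim) * spacing_hz
        crosstalk += single_tuned_power_gain(offset, center_hz, q) ** n_circuits
    return -10.0 * math.log10(crosstalk)


two_circuits_db = signal_to_crosstalk_db(400e6, 10e9, 50, 2, 20, 10)
one_circuit_db = signal_to_crosstalk_db(400e6, 10e9, 50, 1, 20, 10)
print(round(two_circuits_db, 1), round(one_circuit_db, 1))
```

Under these assumptions, two tuned circuits at a 400 MHz spacing give a ratio slightly above 20 db, while a single tuned circuit falls well short. This is consistent with the requirement stated above.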
Consideration of more than approximately 20 channels is given in one of the RF
systems to be presented in the next section. This system assigns one channel to each
bit in each and every cell. Such a system would require a bandwidth of approximately
70 GHz and an operating frequency range which would extend from 140 to 210 GHz.
Such a system does not appear to be technically feasible. Therefore only systems that
limit the number of channels to approximately 20 will be considered.
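The two frequency ranges quoted above can be reproduced from the channel plan. The sketch assumes that the bandwidth limit is applied as an upper band edge equal to 1.5 times the lower edge, which is the ratio shared by both quoted ranges (16 to 24 GHz and 140 to 210 GHz). This reading of the "one half octave" limit is an assumption introduced here, as are the function names.

```python
def system_bandwidth_ghz(n_channels, spacing_mhz):
    """Total bandwidth occupied by equally spaced channels."""
    return n_channels * spacing_mhz / 1000.0


def band_edges_ghz(bandwidth_ghz, edge_ratio=1.5):
    """Lower and upper band edges when upper = edge_ratio * lower (assumed rule)."""
    low = bandwidth_ghz / (edge_ratio - 1.0)
    return low, low + bandwidth_ghz


twenty_channel_bw = system_bandwidth_ghz(20, 400)
print(twenty_channel_bw, band_edges_ghz(twenty_channel_bw))
print(band_edges_ghz(70.0))
```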
D. 4.1.2 ON-OFF Amplitude Modulation Performance Analysis
On-off modulation was chosen as the intra-computer communication technique
because of its simplicity in application, its amenability to microminiaturization at
operating frequencies in the range of 16 GHz and because an optimum modulation
technique is not essential in a system where the signal-to-noise ratio can be main-
tained at relatively high levels.
Shown in Figure D-4 is a simplified block diagram of an RF communication
system. At the output of the envelope detector shown in the block diagram, it is
possible to define an envelope function ($\rho$) where:

$$\rho^2 = (E_s + x)^2 + y^2$$

where $E_s$ is the signal voltage when the transmitter is turned on by the data; x and y
are assumed to be independent gaussian variables where:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)$$

$$p(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{y^2}{2\sigma^2}\right)$$

are the density functions of these variables and where $\sigma$ is defined by:

$$\sigma^2 = 2\int_0^{f_c} \omega(f + f_c)\,df$$

where $\omega(f + f_c)$ is the spectral density of the noise and $f_c$ is the equivalent lowpass
cutoff frequency of the two cascaded single tuned circuits shown in Figure D-4.
The density function of the envelope when the transmitter is on can be shown
to be:

$$p_{s,n}(\rho) = \frac{\rho}{\sigma^2} \exp\left(-\frac{\rho^2 + E_s^2}{2\sigma^2}\right) I_0\left(\frac{E_s\rho}{\sigma^2}\right)$$

where $I_0(E_s\rho/\sigma^2)$ is the zero-order Bessel function of imaginary argument.
When the transmitter is turned off, $E_s = 0$ and the probability density function
of the envelope without signal is simply:

$$p_n(\rho) = \frac{\rho}{\sigma^2} \exp\left(-\frac{\rho^2}{2\sigma^2}\right)$$

The probability distribution function of the envelope without the transmitter
signal present is:

$$P_n(\rho) = \int_0^{\rho} p_n(\rho')\,d\rho'$$

which is the so-called Rayleigh distribution which applies to the envelope of
narrow-band gaussian noise. The distribution function $P_{s,n}(\rho, E_s)$ representing the
envelope of the sum of a sine wave and gaussian noise is given by:

$$P_{s,n}(\rho, E_s) = 1 - Q\left(\frac{E_s}{\sigma}, \frac{\rho}{\sigma}\right)$$

in terms of the Q function where:

$$Q(a, b) = \exp(-a^2/2) \int_b^{\infty} x \exp(-x^2/2)\, I_0(ax)\,dx$$

where:

$$I_0(ax) = \frac{1}{\pi} \int_{-1}^{1} (1 - t^2)^{-1/2} \exp(axt)\,dt$$

Numerical integration can be used to evaluate the integral to the accuracy desired.
$P_{s,n}(\rho, E_s)$ is the Rice distribution, which degenerates to the Rayleigh distribution
when $E_s$ is zero and approximates a gaussian distribution when $E_s \gg \sigma$.
The probability of error of the threshold detector which follows the envelope
detector can now be calculated. The threshold is set at $kE_s$ so that a mark is
detected if the detector output $\rho$ is greater than $kE_s$ and a space is detected if $\rho$ is
less than $kE_s$. If a space signal is sent, that is, if the transmitter is not on, the signal
is erroneously read as a mark if $\rho$ exceeds $kE_s$. Hence the probability of this error
is given by:

$$P(M/S) = 1 - P_n(kE_s) = \exp\left(-\frac{k^2 E_s^2}{2\sigma^2}\right)$$

If a mark is sent, the envelope is that of the noise plus the transmitted signal, and an
error is made if $\rho$ is less than $kE_s$. The probability of this error is given by:

$$P(S/M) = P_{s,n}(kE_s, E_s)$$

When the signal-to-noise ratio is relatively large the following approximation
holds:

$$P(S/M) \cong \frac{1}{2(1-k)} \sqrt{\frac{k}{\pi\gamma}}\, \exp\left[-\gamma(1-k)^2\right] \left[1 - \frac{3 + 6k - k^2}{16k(1-k)^2\gamma} + \cdots\right]$$

where $\gamma = E_s^2/2\sigma^2$.
In contrast to the case of coherent detection, the probabilities of error are not
equal when the threshold is set at Es/2. Shown in Figure D-5 is a graph illustrating
the effect of signal-to-noise and threshold variation on the error rate. Although the
optimum threshold does vary somewhat as shown in Figure D-5 it does not change
sufficiently to cause more than about a 1 db degradation in the overall performance of
the system. Shown in Figure D-6 is the performance curve of the system under
consideration showing the error rate that would be obtained as a function of the signal-
to-noise ratio at the input to the decision device. Also shown are the error rate
curves that would be obtained for k = 0.5.
Although these performance results were obtained by assuming that the disturbing
influence was white gaussian noise, it in fact will most likely be adjacent channel
crosstalk, which will consist primarily of the two adjacent channel tones that will not
be completely filtered by the channel filter. Therefore, it will not be gaussian and
somewhat better performance than that indicated by Figure D-5 can be obtained.
From the analysis in the previous section (D. 4.1.1), it was seen that a
signal-to-crosstalk ratio of 20 db could be obtained, which based on Figure D-6 and the
above comments should result in approximately errorless performance for the
intracomputer communication link when considering the effects of crosstalk alone.
Other noise sources such as logic noise, etc., have not been considered here, the
intent of this section primarily being an investigation of the system operation.
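The two error probabilities of the threshold detector can be evaluated directly by numerical integration of the Rice density, as the analysis suggests. The sketch below uses only the Python standard library. The power-series Bessel routine, the Simpson's rule integrator, the choice of sigma = 1, and all function names are assumptions made for illustration. Here gamma denotes the ratio $E_s^2/2\sigma^2$ for the transmitter-on condition.

```python
import math


def bessel_i0(x, max_terms=400):
    """Zero-order modified Bessel function, summed from its power series."""
    total, term = 1.0, 1.0
    q = (x * x) / 4.0
    for k in range(1, max_terms):
        term *= q / (k * k)
        total += term
        if term < 1e-17 * total:
            break
    return total


def rice_pdf(rho, es, sigma):
    """Envelope density for a sine wave of amplitude es plus gaussian noise."""
    s2 = sigma * sigma
    return (rho / s2) * math.exp(-(rho * rho + es * es) / (2.0 * s2)) * bessel_i0(es * rho / s2)


def rice_cdf(limit, es, sigma, steps=2000):
    """Simpson's rule integral of rice_pdf from 0 to limit (steps must be even)."""
    h = limit / steps
    total = rice_pdf(0.0, es, sigma) + rice_pdf(limit, es, sigma)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * rice_pdf(i * h, es, sigma)
    return total * h / 3.0


def p_mark_given_space(k, gamma):
    """P(M/S): noise alone exceeds the threshold k * Es."""
    return math.exp(-k * k * gamma)


def p_space_given_mark(k, gamma):
    """P(S/M): signal plus noise falls below the threshold k * Es."""
    sigma = 1.0
    es = math.sqrt(2.0 * gamma) * sigma
    return rice_cdf(k * es, es, sigma)


k, gamma = 0.5, 20.0
print(p_mark_given_space(k, gamma), p_space_given_mark(k, gamma))
```

With the threshold at half the signal amplitude the two error probabilities differ noticeably, which illustrates why the optimum slicing level lies somewhat above $E_s/2$.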
D. 4.2 Description of RF Communications Systems
D. 4.2.1 Introduction
There are two principal methods of combining several (binary) channels on a
single physical bus. One is frequency multiplexing and the other is time multiplexing.
Frequency multiplexing utilizes several carrier frequencies which are independ-
ently modulated by parallel data sources. At the receiving station these frequencies
are separated by filters or other means and the modulation detected in order to
recover the data. For the systems under consideration, only simple on-off modulation
will be considered. Then the receiving station simply has to detect the presence or
absence of each of the tones.
In time multiplex the transmission time is divided up into time slots and each
of the multiplexed channels has a time slot assigned to it. The time slots can
correspond to serial bits transmitted from a cell or to the sequence of transmitting
times assigned to cells or a combination of these two.
Control and supervision of the cells in the communications subsystem is a
difficult question involving the processor organization to some extent. Certain trade
offs are involved which cannot be evaluated here. For example, most of the supervision
could be concentrated in the group's control cell or most of the supervision could
be distributed among the communicating cells. There is a software vs hardware
trade off implicit in this dichotomy which requires detailed descriptions of each of the
technically feasible systems before the trade offs can be fully specified. This problem
is not considered further here except to illustrate systems operation.
[Figure D-5. Probability of Error as a Function of Data, Threshold Level and
Signal-to-Noise Ratio. Horizontal axis: K = slicer threshold (0.5 Es to 0.56 Es);
vertical axis: probability of error; curves P(S/M) and P(M/S).]
[Figure D-6. Theoretical Performance of ON-OFF Amplitude Modulation with Envelope
Detection in White Gaussian Noise. Horizontal axis: average signal power/noise power
(db), 9 to 17; vertical axis: error rate at the optimum slicing level.]
In this section block diagrams of four possible RF communications systems are
discussed. These are: 1. Simple time multiplex; 2. Simple frequency multiplex;
3. Full frequency - simple time multiplex; and, 4. Full frequency - frequency
multiplex.
D. 4.2.2 RF System 1 - Simple Time Multiplex
In RF System 1, the words are transmitted serially with only one cell transmitting
at a time. (The words are time multiplexed; the cells time share the bus.)
Figure D-7 shows a functional block diagram of a simple time multiplex system.
In this system an oscillator is fed through a modulator (an on-off gate) which is con-
trolled by the output of a parallel to serial converter fed by the data register. The
parallel to serial converter converts the data word into a serial bit stream which
turns the modulator on and off controlling the output of the oscillator. This sequence
of carrier pulses is transmitted on the bus to the receiving cells.
Each receiving cell is activated (for reception) by enabling a gate feeding a
filter. The filter may not be essential for this system unless switching transients,
etc., are severe. The filter feeds a detector which produces an output whenever the
oscillator's tone is present on the bus. The output of the detector is fed to a serial to
parallel converter which reconstructs the parallel computer word.
This system is almost as simple as DC serial word transmission; it is feasible
if the average and peak data transmission demands of the system are not too high for
the technology.
It may be possible to speed up the transmission of the serial data words through
the use of a fast rise time device in the modulator. In radar work it has been pos-
sible to modulate gigahertz carriers using 100 megahertz modulating signals. It thus
appears feasible to modulate the output of an oscillator operating in the 10-50 gigahertz
range at better than 18 x 2 = 36 megahertz rates. If this is possible, then a computer
with a basic clock rate of 2 megahertz could transmit 18 bit words serially in each
clock period. This implies that simple time multiplexing should impose no delay
restrictions on the cells. That is, when the transmitting cell is allotted its time slot
for transmission, the transmitting cell could come up with data words for trans-
mission as fast as it can and the time multiplex system would transmit them to the
receiving cells.
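
The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration and not part of the original study: the function names are hypothetical, and the sketch assumes one on-off carrier pulse per bit with no framing or guard time between words.

```python
def required_serial_bit_rate_mhz(bits_per_word: int, clock_rate_mhz: float) -> float:
    """Serial bit rate (megabits per second) needed to send one word per clock period."""
    return bits_per_word * clock_rate_mhz


def words_per_clock_period(modulation_rate_mhz: float,
                           bits_per_word: int,
                           clock_rate_mhz: float) -> float:
    """Number of word transmissions that fit in one clock period at a given modulation rate."""
    return modulation_rate_mhz / required_serial_bit_rate_mhz(bits_per_word, clock_rate_mhz)


if __name__ == "__main__":
    # Figures quoted in the text: 18 bit words and a 2 megahertz basic clock rate.
    print(required_serial_bit_rate_mhz(18, 2.0))
    # 100 megahertz is the radar modulating rate cited in the text.
    print(words_per_clock_period(100.0, 18, 2.0))
```

Under these assumptions a modulator running at the cited 100 megahertz rate would have margin over the 36 megahertz requirement, which is the basis for the statement that simple time multiplexing should impose no delay restrictions on the cells.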
D. 4.2.3 RF System 2 - Simple Frequency Multiplex
Figure D-8 is a functional block diagram of a scheme for frequency multiplexing
the 18 bits in a word so that they can be transmitted in parallel from one cell to another.
In this scheme the outputs of 18 oscillators operating at 18 different frequencies are
controlled by gates operated by the data bits in the 18-bit register. (The gates are the
"modulators.") It should be noted that a combination with time multiplexing could be
considered here also. For example, 9 oscillators could be used to transmit half a
word at a time, etc. Thus, the output of oscillator 1 is fed through to the bus whenever
d1 is a 1, and is not fed through to the bus when d1 is a 0. Similarly, the output of
oscillator 2 is fed through when d2 is a 1 and is not fed through when d2 is a 0,
and similarly for the outputs of the other oscillators. Thus, a transmitted word
appears on the bus as a sub-set of the 18 oscillator frequencies. In this kind of a
system a transmitting cell can transmit to one or more receiving cells. Each of these
receiving cells, in fact each of the cells in the system, would have 18 filters
corresponding to the 18 frequencies of the oscillators. The outputs of these filters would
be fed to threshold detectors to determine the presence or absence of energy at those
frequencies. Thus, if d1 were zero, frequency 1 would not be present on the bus and
the output of detector 1 would be a zero. Similarly, if d2 were a 1 the output of
oscillator 2, f2, would be present on the bus and the output of detector 2 would be a 1.
Similarly for the rest of the detectors.
[Figure D-8. Simple Frequency Multiplex. Transmitting cell: oscillators No. 1 through
No. 18; bus; receiving cell(s): filters No. 1 through No. 18 feeding detectors No. 1
through No. 18.]
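
The mapping between data bits and tones described above can be sketched in a few lines. This is an illustrative model only: the function names, the choice of d1 as the least significant bit, and the normalized detector threshold of 0.5 are assumptions made for the example and are not specified in the text.

```python
from typing import Dict, Set

WORD_BITS = 18  # one oscillator, filter and detector per bit of the word


def word_to_tones(word: int) -> Set[int]:
    """Return the oscillator numbers (1..18) gated onto the bus for this word.

    Oscillator k is gated on when data bit dk is a 1 (d1 is taken as the
    least significant bit, which is an assumption of this sketch).
    """
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word does not fit in 18 bits")
    return {k for k in range(1, WORD_BITS + 1) if (word >> (k - 1)) & 1}


def tones_to_word(detector_levels: Dict[int, float], threshold: float = 0.5) -> int:
    """Rebuild the word from filter output levels using threshold detectors."""
    word = 0
    for k in range(1, WORD_BITS + 1):
        # Detector k outputs a 1 only if energy above threshold is seen at fk.
        if detector_levels.get(k, 0.0) > threshold:
            word |= 1 << (k - 1)
    return word


if __name__ == "__main__":
    sent = 0b101  # d1 = 1, d2 = 0, d3 = 1, all other bits 0
    tones = word_to_tones(sent)
    levels = {k: 1.0 for k in tones}  # ideal, noise-free filter outputs
    print(sorted(tones), tones_to_word(levels) == sent)
```

Because every receiving cell carries the same 18 filters, the same decoding applies to each cell that is switched on to the bus, which is what allows one transmitting cell to reach several receivers at once.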
This system is basically a switching system in which one transmitting cell at a
time is switched on to the bus and the receiving cells are simultaneously switched on
to the bus. The current transmitting cell would transmit its information to the
receiving cell. The control cell could monitor the transmission and when it detected
an end of transmission it could then activate the next transmitting cell, deactivate all
the previously receiving cells and activate a new set of receiving cells. The new
transmitting cell would then transmit its information, etc. This process would be
repeated to take care of the transmission requirements of the system.
There are various drawbacks to this scheme aside from technological problems.
First of all, those cells desiring to transmit must wait in line to get on the bus. This
can result in lost time in the cells as well as potential queuing problems. Second, the
control of this system requires that a previous set of receiving cells be deactivated
and a new set of receiving cells be activated for each transmission period. Unless a
separate supervisory bus is provided, this uses up transmission time and again raises
the overhead of the system.
Technological problems may rule this system out; i. e., there are definite prob-
lems of interference between several oscillators operating simultaneously on a single
chip, and there are also problems in connection with loading the bus with a varying
number of filters.
The loading problem can be solved in a straightforward manner as illustrated
in the detailed example of this system given below. Interference between oscillators
operating simultaneously on a single chip is a more difficult problem. Some device,
either passive or active, is required to isolate the oscillators from one another. A
simple filter may be adequate, but if not, an isolator would very likely have to be
developed. At present there is not an adequate isolator; however there may be one by
the time the system is ready for development.
A detailed block diagram of the Simple Frequency Multiplex System will be given
below. This system is presented in detail since it is expected to be the most demanding
as far as the RF circuit mechanization is concerned. The discussion below may
be extended to the other three systems in this section. The block diagram in
Figure D-9 shows in detail the concept of a twenty channel simple frequency multiplexed
communications system in the transmit-receive and terminated mode. A cell in the
terminated mode is simply selected not to receive any of the transmitted information.
The transmitter section consists of small LSA devices which are tuned at equally
spaced frequencies f1 - f20. Each of the outputs to the main line is controlled by
switching diodes whose state is determined by the processor logic. The main drive
for all of the transmitter oscillators is provided by a separate line which provides the
necessary power at the proper intervals to generate the RF energy in the solid state
oscillators.
[Figure D-9. Detailed Block Diagram of Simple Frequency Multiplex. Cell A: transmit
mode (transmitter power pulse applied). Cell B: receive mode (no power pulse,
transmitter off, receiver on). Cell C: terminated mode (no power pulse, transmitter
off, receiver off).]
One of the primary considerations of system operation is transmitter oscillator
power. The following assumptions were made in calculating the transmitter power.
The maximum transmission line length between transmitter and receiver will be less
than 6 inches. The power will be fanned out to no more than twenty stations. The
receiver sensitivity will be that of a typical video detector, -50 dbm. A filter loss
of 2 db will be experienced. The required minimum signal to noise ratio at the
detector output will be 10 db. A receiver input attenuator pad of 6 db will be used as
a simple method of reducing any mismatch reflections. When all of these factors are
taken into account the required transmitter power for each frequency channel is
0.05 mw. The total output power of all 20 transmitters in a cell will be 1 mw. For a
twenty cell group the total DC input power requirement will be only 660 mw if a device
efficiency of 3 percent is assumed.
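
The totals quoted above can be checked with a short calculation. The sketch below takes the 0.05 mw per-channel figure as given rather than rederiving it from the link assumptions, and it assumes that the group figure counts all twenty cells with every transmitter producing its full output. The function names are hypothetical.

```python
import math


def mw_to_dbm(power_mw: float) -> float:
    """Convert a power in milliwatts to dbm."""
    return 10.0 * math.log10(power_mw)


def group_power_budget(per_channel_mw: float = 0.05,
                       channels_per_cell: int = 20,
                       cells_per_group: int = 20,
                       device_efficiency: float = 0.03):
    """Return (RF power per cell, RF power per group, DC input per group) in mw."""
    cell_rf_mw = per_channel_mw * channels_per_cell
    group_rf_mw = cell_rf_mw * cells_per_group
    # DC input needed to produce the RF output at the stated device efficiency.
    group_dc_mw = group_rf_mw / device_efficiency
    return cell_rf_mw, group_rf_mw, group_dc_mw


if __name__ == "__main__":
    cell_rf, group_rf, group_dc = group_power_budget()
    print(f"per channel: {mw_to_dbm(0.05):.1f} dbm")
    print(f"cell RF: {cell_rf:.2f} mw, group RF: {group_rf:.1f} mw, group DC: {group_dc:.0f} mw")
```

With the stated 3 percent efficiency this gives about 667 mw of DC input for the group, which agrees with the 660 mw quoted in the text to within rounding.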
The components which will take up the largest area of the chip are the filters
which are required to separate the frequency channels at the receiver. The second
largest components may be the power dividing networks. The assumption is made
that a two section filter is required in each channel and that the power division may
be included by using component design which doubles the functions per area. The
required separation between transmission lines to provide isolation is included in this
calculation. The area per filter section was determined from an experimental filter
constructed at 9 GHz on an alumina substrate (ε = 9). At this frequency the total area
required for the twenty filters in the receiver is 1.6 square inches. If the frequency
were doubled, operation centered about 20 GHz, the area would be reduced to
0.4 square inches. This is probably a minimum area since a detailed design will
require other functions. The area may be kept close to this by special multi-purpose
circuits. For example the tuned circuits required for the LSA transmit mode operation
may be the same tuned circuits as used for the receiver section. These would
then be shared by a diode switching scheme. The Schottky barrier diodes in the
circuit should tolerate the transmitter RF power well since the minimum burn out
level is in the 10 mw region. These diodes have very high frequency properties but
unlike most microwave diodes can tolerate a high power level. Not only can these
diodes be used as detectors but they can be made to function as a low power high
speed switch. This leads to the possibility of using a more sophisticated switching
scheme in which a combination series parallel transmission arrangement might be
utilized.
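
The filter area figures quoted in this discussion (1.6 square inches at 9 GHz, falling to 0.4 square inches when the frequency is doubled) are consistent with area varying inversely as the square of frequency. The sketch below assumes that scaling, i.e., that the linear dimensions of the filter sections scale with wavelength. This is an assumption of the example and is not stated explicitly in the text, and the function name is hypothetical.

```python
def scaled_filter_area(area_sq_in: float, f_ref_ghz: float, f_new_ghz: float) -> float:
    """Scale a filter area from a reference frequency to a new frequency.

    Assumes linear dimensions scale with wavelength, so area goes as 1/f**2.
    """
    return area_sq_in * (f_ref_ghz / f_new_ghz) ** 2


if __name__ == "__main__":
    # Twenty filters at 9 GHz occupy 1.6 square inches according to the text.
    print(scaled_filter_area(1.6, 9.0, 18.0))
```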
D. 4.2.4 RF System 3 - Full Frequency - Simple Time Multiplex
The simple time multiplexing scheme described in section D. 4.2.2 can be
combined with frequency multiplexing into a serial-word time multiplex, with cells
frequency multiplexed as shown in Figure D-10. One possible way of obtaining time-
frequency multiplex would be to have each of the 20 cells assigned a particular
individual frequency. Its oscillator would then be tuned to that frequency. Then, to
set up a broadcast network, the receiving cells would be actuated by gating on the
appropriate filter. A particular configuration is shown as an example in Figure D-10
in which there are three transmitting cells which are assumed to be cells 1, 2, and 3,
transmitting on frequencies f1, f2, and f3. Each of these cells is received by several
cells, shown at the right of the figure. Cell 1 is transmitting to cells 4 through 7.
Similarly cells 8 through 15 are receiving data from cell 2 on frequency f2 and their
filters for f2 will be turned on. Finally, cells 16 through 19 would be receiving from
cell 3 on frequency f3 and their filters for f3 would be turned on. Cell 20 is the control
cell. In this particular configuration three cells are then transmitting to the other
cells in the group. It is clear that many other arrangements are possible with the
maximum number of transmitting cells being ten, in which case there would be ten
receiving cells. At the expense of additional RF hardware, one could have the cells
simultaneously transmit and receive. There could then be a maximum of 20 transmitting
and 20 receiving simultaneously. The frequency multiplex scheme which
allows simultaneous transmission from 10 transmitting cells to 10 receiving cells is
called "full frequency - simple time multiplex."
The tuning of the oscillators mentioned above can be achieved through the use of
the Gunn effect. Devices based on this effect oscillate at a frequency which is
dependent on an input voltage. If it is not possible to tune the oscillators to the
precision required, then as many as 20 fixed frequency oscillators and 20 filters
would be required on each chip.
It is simpler in terms of supervision for each of the 20 cells to have its own
individual frequency, as specified above. In this way, when a broadcast network is
to be set up the only thing that must be done is to gate on the appropriate filter in each
of the receiving cells.
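
Setting up broadcast networks in this way amounts to recording, for each receiving cell, which transmitting cell's frequency it has gated on. The sketch below models the configuration described for Figure D-10. The function names and the dictionary representation are hypothetical, and the rule that a cell either transmits or receives (but not both) reflects the basic scheme without the additional RF hardware.

```python
from typing import Dict, Iterable


def set_up_networks(assignments: Dict[int, Iterable[int]]) -> Dict[int, int]:
    """Map each receiving cell to the frequency (transmitting cell number) it gates on.

    `assignments` maps a transmitting cell to the cells that should receive it.
    Cell k is assumed to transmit on its own individual frequency fk.
    """
    tuned: Dict[int, int] = {}
    for tx_cell, receivers in assignments.items():
        for rx_cell in receivers:
            if rx_cell in assignments:
                raise ValueError(f"cell {rx_cell} cannot transmit and receive at once")
            if rx_cell in tuned:
                raise ValueError(f"cell {rx_cell} already has a filter gated on")
            tuned[rx_cell] = tx_cell
    return tuned


def deliver(tuned: Dict[int, int], transmissions: Dict[int, int]) -> Dict[int, int]:
    """Return the word seen by each receiving cell, given words sent per transmitting cell."""
    return {rx: transmissions[freq] for rx, freq in tuned.items() if freq in transmissions}


# Configuration described for Figure D-10: cell 1 to cells 4-7, cell 2 to
# cells 8-15, cell 3 to cells 16-19. Cell 20 is the control cell.
FIGURE_D10_EXAMPLE = {1: range(4, 8), 2: range(8, 16), 3: range(16, 20)}

if __name__ == "__main__":
    tuned = set_up_networks(FIGURE_D10_EXAMPLE)
    print(deliver(tuned, {1: 0o123, 2: 0o456, 3: 0o707}))
```

No action is needed at the transmitting cells in this model, which mirrors the point made above that supervision reduces to gating on one filter per receiving cell.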
To emphasize the total capability of full frequency multiplex, the example of
Figure D-10 had all 20 cells in operation simultaneously. This is not necessary.
Some of the cells could be in the process of computing and could be neither transmitting
nor receiving at a particular time.
D. 4.2.5 RF System 4 - Full Frequency - Frequency Multiplex
The simple frequency multiplex scheme described in section D. 4.2.3 can be
expanded into a full frequency - frequency multiplex scheme. This can be done by
assigning each of the 20 cells its own set of 18 carrier frequencies. These would be
the frequencies of the oscillators on the chips. To set up a net, 18 filters correspond-
ing to the oscillator frequencies of the transmitting cell would be connected from each
of the receiving cells in the net to the common bus. This would require either tunable
filters -- which do not appear feasible -- or a total of 360 filters on each chip in
addition to its set of 18 oscillators. Clearly 360 filters plus 18 oscillators is too
complex compared to the other schemes. It is possible to cut down the number of
filters by using tunable oscillators. Even so, full frequency multiplexing between
cells with the 18 bits of a word transmitted in parallel would require at least 10 x 18 or
180 frequency multiplex channels. This is a large number of channels to fit in a bandwidth
that is practicable for the projected state of the technology.
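
The hardware counts quoted in this section follow directly from the group size and word length. The sketch below simply reproduces that arithmetic; the function names are hypothetical.

```python
def fixed_filters_per_chip(cells_in_group: int, bits_per_word: int) -> int:
    """Filters each chip needs if every cell has its own fixed set of carriers."""
    return cells_in_group * bits_per_word


def parallel_channels(simultaneous_transmitters: int, bits_per_word: int) -> int:
    """Frequency channels needed when each transmitter sends a whole word in parallel."""
    return simultaneous_transmitters * bits_per_word


if __name__ == "__main__":
    print(fixed_filters_per_chip(20, 18))  # 20 cells, 18 carriers per cell
    print(parallel_channels(10, 18))       # 10 cells transmitting at once
```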
Expansion of frequency multiplexing in this fashion compounds the problems
mentioned in section D. 4.2.3. In addition the problem of how many channels can be
multiplexed in a given bandwidth becomes critical and as pointed out in section D. 4.1.1
may not be technically feasible.
D. 4.2.6 Discussion of Systems
Various other combinations of time and frequency multiplexing can be considered
for use in the distributed processor. However, the systems described can be viewed
as bounding systems in that the first two are the simplest and the last two are the most
complex that should be considered. The basic choice appears to be between simple
time multiplex, simple frequency multiplex, and full frequency-simple time multiplex.
The system which has the most flexibility and highest capability of these three is
the full frequency-simple time multiplex. This will be made clearer in section D. 5
where the effects on the computer architecture are discussed. The basic difference
between the first two systems and the last one is in the area of supervisory control.
The first two time multiplex the cells and are quite similar in supervision to the
present system which uses DC transmission. The last system can simplify certain
aspects of supervision while making others quite complicated. Overall there may be
an increase in supervisory complexity in going to the full frequency - simple time
multiplex system. With regard to technology, the second system, namely the simple
frequency multiplex system, would require the most complex mechanization. It also
appears that simple frequency multiplex would put the biggest burden on the technology,
i. e., interference between oscillators, variable loading due to filters, etc.
D. 5 COMPUTER ORGANIZATIONAL CONSIDERATIONS
The previous discussion in this section has presented various RF communication
systems. The last two systems discussed in section D. 4.2, namely the Full
Frequency - Simple Time Multiplex and the Full Frequency - Frequency Multiplex
have the most significant effect on the computer organization or computer architecture.
A discussion of the impact of a Full Frequency system will be given below. For purposes
of this discussion the Full Frequency - Simple Time Multiplex System will be
considered; however, the discussion is equally applicable to the Full Frequency -
Frequency Multiplex System.
The Full Frequency system may be thought of as a full intercommunication
system. There is a communication path between each and every group, and between
each and every cell in a group.
The effect on the inter-group communications does not result in a great impact
on the computer system design. Whether the groups share one or several inter-group
busses or whether each group has a connection to every other group does not
change the functional usage or the basic capabilities of the groups, other than an
increased response time whenever communication is desired between groups. However,
the result is significant where the inter-cell communication is concerned. The
resultant computer system has greatly increased capability and can perform functions
that are performed only with difficulty with the present DC time shared inter-cell bus
system.
Presently the distributed computer system will be limited in capability primarily
by its inter-cell communication system. In section 6 of this report, it was seen that
the neighbor to neighbor communication system was justified, among several reasons,
because of the difficulty in time sharing the inter-cell bus. The first step in improving
the system is to increase the capacity of the inter-cell bus. This means either
increasing the transmission (clock) rate, adding more wires, or frequency multiplexing
the bits. The clock rate reaches a limit based on the hardware capabilities, and
the extra connections required for more wires decrease the reliability. The RF
schemes therefore appear particularly attractive in increasing the communication
system capability. The construction of a cell using RF communications is shown in
Figure D-11.
The first big advantage, over and above the increased communication capability
available, is the reduced number of connections. There are no neighbor lines, no
I/O lines, etc. In fact, there are only three basic lines required: DC power, ground, and
a connection to the transmission line. A fourth connection might be needed for a cold
(power off) start, but would have no other use. Each cell transmits on a particular
frequency; the time-frequency spectrum of the communication system is shown in
Figure D-12.
Consider now the effects of having a full intercommunication capability.
No controller cell action is required to set up communications. In the present system,
with only one cell bus shared by all, the controller cell was required to resolve conflicts
and schedule the bus. In the new system, any cell can talk to any other cell.
The only time a controller is required is when one cell wants to interrupt another cell
that is already busy communicating with a third cell. I/O can go directly to the cell
or cells in which it will be used; the controller cell is not needed. Several cells may
receive the same data by having several cells receiving on the same frequency.
Many dependent programs may take place simultaneously. The present computer
system, because the dependent cell programs must go out on the cell bus, has only one
controller cell that sends out dependent programs. The new system may have many
cells send out instructions to many other cells, as shown in Figure D-13.
Cells C5, C8 and C19 are all tuned to the frequency used by cell C1, namely f1.
Cell C20, transmitting on a different frequency, f20, can send a dependent cell program
to C2 and C15 simultaneously and not conflict with cell C1. Thus the controller cell
no longer has the problems concerned with the dependent cells. This may also reduce
the need for a large number of groups. Groups would be required for increasing
reliability and for a shortage of frequency allocation in one group.
Cells communicate when they are ready, and not when they are polled by the
controller cell. This provides a great increase in capability in reconfiguration, and
decreases the response time to failures and high priority interrupts. A cell may
receive its inputs, do calculations, and send the outputs to another cell without
requiring the controller cell. A faster response to an emergency condition is achieved.
Reconfiguration is considerably simpler. There are no geometry restraints due to
neighbor lines.
It is seen that this system has greatly increased capabilities, and the software
problems concerned with scheduling resources are simpler. The controller cell,
which was concerned with trivial but time consuming problems of scheduling the
intercell bus, has been transformed into a true executive. A true executive does not
take part in the day-to-day operations, but initiates a program, gets it started,
monitors the program's operation, and concludes the program when its priority
becomes low, or its reason for existing has expired. These are the true executive
functions. The input/output of normal data, instruction execution, and normal output are
all done without the controller cell having any essential part of the operation. The
controller cell will only monitor the normal operations; thus, the controller cell may
fail and the other cells may continue their normal operations with no changes. Changing
the controller cell to an executive cell will increase reliability, since the whole group
will not necessarily fail if the controller cell fails.
[Figure D-11. Construction of New Cell. Blocks: processor registers, processor memory,
control section, adder and arithmetic section, RF transmitter, filter and isolator,
RF receivers. External connections: DC power, ground, transmission line.]
[Figure D-12. Time Frequency Spectrum. Frequency axis: cell 1 through cell N,
I/O controller 1 through I/O controller N, group switch. Horizontal axis: time.]
[Figure D-13. Multiple Dependent Program Sources.]
The controller cell now becomes an executive cell with the following functions:
1. Monitor normal operations
2. Allocate system resources whenever
a. a conflict arises; e.g., two cells wish the same I/O controller,
b. a change in priority or mission occurs,
c. a failure occurs.
3. Monitor and communicate with the group that is the system executive.
This reduction in controller cell functions also reduces the storage
imbalance between the controller cell and working cells. In the present
system, the controller cell was expected to require considerably more
storage than the other cells.
It should be noted that the I/O is not explained in great detail here. The amount
of I/O, the frequency allocation, etc., will depend upon the capacity required. For
example, the bulk storage unit may have several frequencies, as many as there are
groups, so it may communicate to all the groups simultaneously.
Each I/O controller can have one or more frequencies allocated. Low data rate
devices can be multiplexed into one controller. High data rate devices can use a unique
frequency. The I/O controllers, since they do not normally communicate with each
other but only communicate with cells, may be controlled by a cell to ensure there are
not two I/O controllers on the same frequency. Note that this time sharing of a fre-
quency allocation cannot be used with the cells, except in a degenerate case. The
cells are all independent and may communicate with any other cell. Thus to force a
time sharing of frequencies forces a special cell to allocate time and frequencies, and
the system degenerates into the present system. Time sharing may be used only if
some cells communicate with only one cell, in which case the receiver cell may allocate
frequencies and time; reconfiguration then becomes more difficult.
The most capable cell is the cell that can transmit and receive simultaneously.
This cell makes the best usage of the frequencies. A cell that cannot transmit while
it is receiving will always leave one frequency unused while it receives, e.g., its own
transmit frequency. Thus half the frequencies may not be used.
A disadvantage of this system is the problem of providing a high priority
interrupt. If the executive cell wishes to interrupt a cell that is busy communicating
with another cell, special hardware on the receiver circuits may be required to let the
cell know another cell wants to begin communicating with it.
D.6 CONCLUSION
This section has outlined various types of RF communications systems that
could be applied to enhance the communication system of the distributed processor.
It has been shown that these could have dramatic effects on the computer architecture.
In addition, considerable research has been conducted in solid state RF circuits and in
the time frame of the distributed processor technology (1975) these devices should be
capable of being incorporated into the cell wafer. Definitive comparisons of these
systems both between themselves and with the present DC system cannot be made at
this time. To do this the technology needs to be investigated further and the effects
on computer operation, namely supervision, need to be studied in depth.
APPENDIX E. ERROR CONTROL IN THE COMMUNICATIONS SYSTEM
This appendix analyzes the error control problems associated with transmitting
data from one cell to another within the multiprocessor. This analysis is particularly
relevant to the RF Communication Systems. For high reliability of critical data,
i.e., data that cannot be discarded, error correction is required. Error detection with
retransmission of messages within which an error has been detected is the solution
that will be considered here since it is assumed the system will have ample time for
retransmissions. It will be assumed here that the S/N ratios at the receiving cell's
input will be high enough that errors from this source can be neglected. The
remaining sources of significance will be self-noise from the multiprocessor and
crosstalk between the binary channels of the communications system. These can be
reduced to very low levels through proper design.
Although the errors that do occur during transmission may be systematic,
i. e., have patterns that occur more often than others, there is no way of determining
this except through experiment with an actual system. Therefore in the derivations
that follow it is assumed that errors occur independently and at random. It is also
assumed that the probability of receiving a "1" when a "0" is transmitted is the same
as for the opposite type of error. These assumptions are those which define the
binary symmetric channel which is characterized by p, the probability of a bit being
received in error. The communications system then is composed of a number of
binary symmetric channels.
Messages which are transmitted from cell to cell are made up of 18 bit words.
The number of words in a message could be fixed or it could vary from one trans-
mission to the next. Therefore the formulas derived here will be for messages of
N words.
The first step in deriving formulas expressing the performance of a
retransmission error control system is to find Pw, the probability of an undetected
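error in an 18 bit word. Each 18 bit word contains one parity check bit checking all
the bits in the word. Pw then is the probability of getting a nonzero even number of
errors in the received word. Thus, letting q = 1-p:
$$P_w = \sum_{\substack{i\ \mathrm{even} \\ i \ge 2}} \binom{18}{i} p^i q^{18-i} = \sum_{\substack{i\ \mathrm{even} \\ i \ge 0}} \binom{18}{i} p^i q^{18-i} - q^{18}$$
$$= \tfrac{1}{2}\left[(q+p)^{18} + (q-p)^{18}\right] - q^{18}$$
$$= \tfrac{1}{2}\left[1 + (1-2p)^{18}\right] - (1-p)^{18}$$
$$\approx 153p^2$$
This approximation holds when p is small with respect to 1/18.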
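
The reason only an even number of errors escapes detection can be illustrated with a small parity sketch. The text does not say whether the parity bit gives even or odd parity, nor where it sits in the word. The sketch assumes even parity with the parity bit in the lowest position, and the function names are hypothetical.

```python
WORD_BITS = 18  # 17 data bits plus one parity check bit


def add_even_parity(data: int) -> int:
    """Append a parity bit so that the 18 bit word holds an even number of 1s."""
    if not 0 <= data < (1 << (WORD_BITS - 1)):
        raise ValueError("data does not fit in 17 bits")
    parity = bin(data).count("1") & 1
    return (data << 1) | parity


def parity_ok(word: int) -> bool:
    """Receiver check: the word is accepted when its count of 1s is even."""
    return bin(word).count("1") % 2 == 0


def flip_bits(word: int, positions) -> int:
    """Model channel errors by inverting the bits at the given positions."""
    for pos in positions:
        word ^= 1 << pos
    return word


if __name__ == "__main__":
    sent = add_even_parity(0b10110)
    print(parity_ok(sent))                     # error-free word is accepted
    print(parity_ok(flip_bits(sent, [3])))     # one error is detected
    print(parity_ok(flip_bits(sent, [3, 7])))  # two errors pass undetected
```

Any odd number of bit errors changes the parity and is detected, while any even number restores it, which is why Pw sums only the even-error terms.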
The next step is to derive PN, the probability of an undetected error in a message
consisting of N words. This is:
$$P_N = 1 - (1 - P_w)^N \approx N P_w \quad \text{for } P_w \ll 1/N$$
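
The quality of the two approximations can be checked numerically. The sketch below evaluates the exact binomial sum for Pw, the closed form, and the message probability PN. The function names and the sample values of p and N are chosen for illustration only.

```python
import math


def p_undetected_word(p: float, bits: int = 18) -> float:
    """Exact probability of a nonzero even number of bit errors in one word."""
    q = 1.0 - p
    return sum(math.comb(bits, i) * p**i * q**(bits - i) for i in range(2, bits + 1, 2))


def p_undetected_word_closed(p: float, bits: int = 18) -> float:
    """Closed form: 1/2 [1 + (1 - 2p)^bits] - (1 - p)^bits."""
    return 0.5 * (1.0 + (1.0 - 2.0 * p) ** bits) - (1.0 - p) ** bits


def p_undetected_message(p: float, n_words: int, bits: int = 18) -> float:
    """Exact PN = 1 - (1 - Pw)^N, computed in a numerically stable way."""
    pw = p_undetected_word(p, bits)
    return -math.expm1(n_words * math.log1p(-pw))


if __name__ == "__main__":
    p, n_words = 1e-3, 100  # sample values, not taken from the text
    pw = p_undetected_word(p)
    print(pw, p_undetected_word_closed(p), 153 * p**2)
    print(p_undetected_message(p, n_words), n_words * pw)
```

For these sample values the approximation 153p^2 is within a few percent of the exact word probability, and N Pw is within about one percent of PN, as expected when p is small with respect to 1/18 and Pw is small with respect to 1/N.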
This parameter is the probability of a message containing errors being accepted at a
receiving cell as being correct.
When the transmitted data is not critical, it can be marked or discarded
whenever an error is detected. When retransmission is used, there is the probability
that an undetected error will occur in the retransmitted message. This can increase
the expected probability of error, but the increase is still insignificant for the
communications system. This can be seen as follows:
Let PRN(n) be the probability of an undetected error on the nth transmission of
a given N word message. That is, errors have been detected in n - 1 transmissions
but not in the nth. An error is detected in an 18 bit word only when an odd number of
errors occur. Designate the probability of this event by QDW. Then:
$$Q_{DW} = \sum_{i\ \mathrm{odd}} \binom{18}{i} p^i q^{18-i} \approx 18p$$
The probability, QDN, of detecting an error in an N word message is the probability
of detecting errors in any one of the N words. Approximately:
$$Q_{DN} \approx 18Np$$
Then the probability, PRN(n), of detecting errors in n - 1 transmissions and not
detecting errors on the nth transmission is:
$$P_{RN}(n) = (18Np)^{n-1} N P_w = (18Np)^{n-1} \cdot 153Np^2$$
The expectation of PRN(n) over all values of n is:
$$P_{RN} = \sum_{n=1}^{\infty} (18Np)^{n-1} \cdot 153Np^2 = 153Np^2 \sum_{n=0}^{\infty} (18Np)^n = \frac{153Np^2}{1 - 18Np} \approx 153Np^2 \quad \text{for } 18Np \ll 1$$
The expectation of the transmission time lost due to retransmissions is an
important parameter. This is:
$$E(T_n) = \sum_{n=1}^{\infty} P_{RN}(n)\, T_n$$
where Tn is the time required for n transmissions. Substituting for PRN(n) from
the above, and setting Tn = nT, where T is the time required for a single
transmission:
$$E(T_n) = \sum_{n=1}^{\infty} (18Np)^{n-1} \cdot 153Np^2 \cdot nT = 153Np^2 T \sum_{n=0}^{\infty} (n+1)(18Np)^n$$
$$= \frac{153Np^2 T}{(1 - 18Np)^2} \approx 153Np^2 T \quad \text{for } 18Np \ll 1$$
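
The two closed forms can be checked against direct summation of the series. The sketch below uses the approximations adopted in the text (QDN = 18Np and Pw = 153p^2) and the identity that the sum over n from 1 of n x^(n-1) equals 1/(1 - x)^2 for |x| < 1. The function names and sample values are hypothetical.

```python
def p_rn(n: int, p: float, n_words: int) -> float:
    """PRN(n): errors detected on n - 1 transmissions, undetected error on the nth."""
    return (18 * n_words * p) ** (n - 1) * 153 * n_words * p**2


def p_rn_total(p: float, n_words: int) -> float:
    """Closed form of the sum of PRN(n) over all n (geometric series)."""
    x = 18 * n_words * p
    if not 0 <= x < 1:
        raise ValueError("series converges only for 18Np < 1")
    return 153 * n_words * p**2 / (1 - x)


def expected_time(p: float, n_words: int, t_single: float = 1.0) -> float:
    """Closed form of the sum of PRN(n) * n * T over all n."""
    x = 18 * n_words * p
    if not 0 <= x < 1:
        raise ValueError("series converges only for 18Np < 1")
    return 153 * n_words * p**2 * t_single / (1 - x) ** 2


if __name__ == "__main__":
    p, n_words = 1e-4, 10  # sample values, not taken from the text
    series = sum(p_rn(n, p, n_words) for n in range(1, 60))
    timed = sum(p_rn(n, p, n_words) * n for n in range(1, 60))
    print(series, p_rn_total(p, n_words))
    print(timed, expected_time(p, n_words))
```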
It can be concluded from the above formulas that there will seldom be a
retransmission, and almost never more than one retransmission. The probability of
undetected errors is extremely small since p is small relative to the other pertinent
system parameters.
APPENDIX F. ALTERNATIVE DISTRIBUTED ORGANIZATIONS
There are a number of alternative distributed logic systems to the chosen
DAMP System described here. Three such systems, the Holland machine, Solomon
machine, and Illiac were discussed earlier. These structures were shown to have
interesting features but they were not practical for a spaceborne computer designed
for high reliability on general purpose computations. Alternative forms of the
DAMP System could use the same basic cell structure but different quantities of
memory or different communication links. The addition of more memory to each
cell in the DAMP Systerh has been mentioned earlier. This would be necessary if
the Cells were found to be memory limited. The addition of memory could be carried
out by adding to each single wafer cell additional wafers containing 5!2 to 1024 words
of storage and no processing. This would of course increase the number of wafer
types to two along with increasing the number of interconnections in the system. A
system composed of the new larger cells (two or more wafers) would now contain a
significantly smaller number of total cells.
Another consideration for a distributed processing system is to take the
suggested DAMP System and decentralize it such that each set of sensors (e.g., the
navigation sensors or the telecommunication sensors) has its own group of cells
located physically with the sensors. In fact, the group of cells could be packaged on
the same board as one of the sensors in each set. The group of cells would handle
all processing and control associated with the set of sensors. However, there would
still be a need for a central communication group to handle the transmission of
information between sensors and to provide some executive functions. For example,
results of scientific experiments would have to be sent to the telecommunications
group. The resulting decentralized DAMP System is shown in Figure F-1. It
should be realized that the distances from the sensors and groups of cells to the
communication units may be quite long; as a result, the communication rates cannot
be high. However, this latter restriction should not be too severe if the sensors
are grouped so that only a small amount of intercommunication between groups is
necessary.
Figure F-1. Decentralized DAMP System (diagram labels: bulk storage unit; byte parallel
inter-group bus; sensor groups I, II, ..., N, each with a processing DAMP group)
This decentralized system could be considered as a way to achieve greater
reliability from the system hardware. Locating the groups of cells physically close
to the sensors cuts down the probability of failure of the communication lines between
the processors and the sensors. It also decreases the power dissipation of the
system and somewhat simplifies the real time control task of the processor. The
latter improvement is realized due to the lack of long delays in receiving information
from the sensors. However, the advantages of this organization may be outweighed
by its disadvantages in relation to the chosen DAMP System. The need for central
communication groups requires long communication lines from the sensors to a
central point and as a result essentially cancels the reliability gained by decentralizing
the computer. This can be understood by realizing that the central DAMP System
used conditioners located near the sensors to minimize the communication lines to
the central computer. However, not only may the decentralized structure not require
conditioners, but also some of the groups could operate autonomously of the central
executive once loaded from the bulk storage unit. Another possible disadvantage is
the fact that each group in the decentralized system would require at least some
amount of its own executive and common data storage areas. As a result, some
increase in total equipment would be necessary. In addition, two central executive
groups should be used to guarantee an available executive group. Of even more
importance than these points is the fact that there would no longer be a relatively
large and powerful central computing facility to carry out the background programs.
The distribution of these programs may be somewhat difficult and inefficient due to
the long communication delays between parts of a large program that could be located
in two separate groups. It is also clear that the system is not as flexible as the chosen
DAMP System, and as a result its ability to gracefully degrade is somewhat limited.
(The groups now have explicit functions rather than being available to carry out any
combination of tasks.) However, this latter point should not be as severe a limitation
as the earlier points, since the reliability of a group of cells with a few spares
connected on the bus should be considerably greater than the reliability of a complicated
sensor.
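The effect of a few spare cells on group reliability can be illustrated with a standard k-out-of-n calculation. The Python sketch below is not part of the study: it assumes independent cell failures, that any spare can replace any failed cell, and that the bus and the switching of spares are perfect. The function name and all numerical values are hypothetical.

```python
from math import comb


def group_reliability(cell_reliability: float, cells_needed: int, spares: int) -> float:
    """Probability that at least cells_needed of (cells_needed + spares) cells work.

    Assumes independent, identically reliable cells and ideal spare switching.
    """
    total = cells_needed + spares
    r = cell_reliability
    return sum(
        comb(total, k) * r**k * (1 - r) ** (total - k)
        for k in range(cells_needed, total + 1)
    )


# Hypothetical group: 8 working cells required, per-cell reliability 0.95.
for spares in range(4):
    print(spares, "spares ->", group_reliability(0.95, 8, spares))
```

Under these assumptions each added spare raises the group reliability, with diminishing returns; a real design would also have to account for failures of the bus and of the reassignment mechanism.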
The preceding discussion shows the difficulty in choosing between a decentralized and a
centralized DAMP System as the best structure for the given requirements. A
considerable amount of future design and development of the sensors will be necessary
before a well founded decision can be made. In the meantime, for this study the
centralized system was investigated because it was felt that this structure would be
more natural for development of software and investigation of programming and
system features. In addition, both have much in common, so that the vast majority
of the developmental work on the centralized structure will be equally applicable to
the decentralized structure.
