Digital avionics design and reliability analyzer by unknown
MCR-81-515 NASA CR-181641
(NASA-CR-181641) DIGITAL AVIONICS DESIGN AND RELIABILITY ANALYZER
(Martin Marietta Corp.) 153 p CSCL 09B
N88-23472
Unclas
G3/62 0142951
DIGITAL AVIONICS DESIGN AND
RELIABILITY ANALYZER
NASA LaRC NASI-15780
February 1981
Approved:
Edward C. Stanke, II
Program Manager
https://ntrs.nasa.gov/search.jsp?R=19880014088 2020-03-20T06:19:15+00:00Z
TABLE OF CONTENTS
1.0 Introduction
2.0 Applicable Documents
2.1 Reference Documents
2.2 Standards
2.3 Other
3.0 System Functional/Operational Description
3.1 Introduction
3.2 Usage Phases
3.3 Test Design Phase
3.4 Test Execution Phase
3.5 Data Reduction/Analysis Phase
4.0 System Specification
4.1 General System Configuration
4.2 Hardware Configuration
4.3 Software Configuration
Appendix A Hardware Composition Trade Study
Appendix B Microprogrammable Computer Trade Study
Attachment I Interim Technical Report
List of Figures
Figures 3-1, 3-2, 3-3, 3-4, 3-5, 3-6, 4-1, 4-2
List of Tables
Tables A-1, B-1, B-2
1.0 INTRODUCTION
This document contains the description and specifications for a digital
avionics design and reliability analyzer. It is the result of the study done
by Martin Marietta concerning the use of emulation for investigating
reliability and fault-tolerance issues for proposed highly reliable commercial
digital avionics systems. The study was contracted by the NASA Langley
Research Center because of the coming technology in commercial aircraft, which
largely precludes traditional approaches to certification.
Airframes for the 1990's are designed to be much more fuel efficient than
current designs, but this fuel efficiency is bought at a price of less
stability. To maintain safe flight, very reliable avionics computers are
envisioned to allow the necessary quick reaction times and continuous
monitoring of flight parameters.
The computers are designed to break down so rarely (less than once in a
human lifetime) that conventional bench and field tests cannot certify their
reliability. The Federal Aviation Administration is in the process of
adopting new certification procedures that emphasize mathematical models and
simulations of the system over actual tests. To put the effort in
perspective, the computers will be predicted to break down less often than the
wings are expected to fall off planes in flight. The new avionics computers
must be significantly more reliable than today's avionics computers. They
must function unattended, despite hardware or software failures, for at least
a 10-hour flight. This super-reliability will be gained through redundant
hardware and software. Faults that occur will be counteracted
automatically by hardware and/or software algorithms. As these highly fuel
efficient aircraft would fly 100 percent of the time in critically stable
conditions, control of the aircraft must be maintained concurrently with the
fault detection and correction process. Further, any faults occurring during
the recognition and correction of a previous fault must be handled as well.
The hardware/software configuration described in this document is referred
to as the Digital Avionics Design and Reliability Analyzer. Its basic
function is to provide for the simulation and emulation of the various
fault-tolerant digital avionic computer designs that are developed. It has
been established that hardware emulation at the gate-level will be utilized.
The primary benefit of emulation to reliability analysis is the fact that it
provides the capability to model a system at a very detailed level. This
means that rather than basing reliability analyses on manufacturer's supplied
data, or on expected probability distributions of failures of parts to
determine the response of a system, detailed models of a system may now be
employed on an experimental basis and system responses to faults observed
rather than predicted. Emulation allows the direct insertion of faults into
the system, rather than waiting for actual hardware failures to occur. This
allows for controlled and accelerated testing of system reaction to hardware
failures.
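As a minimal illustration of direct fault insertion, the following Python sketch evaluates a small gate table and forces one gate output to a fixed value. The three-gate circuit, the signal names, and the `evaluate` function are hypothetical and are not taken from the NASA Langley gate level algorithm; the sketch only shows how a system response to an inserted stuck-at fault can be observed rather than predicted.

```python
# Hypothetical gate table: gate name -> (gate function, input signal names).
# Gates are listed in evaluation order (inputs before the gates that use them).
GATES = {
    "n1": (lambda a, b: a & b, ("a", "b")),    # AND
    "n2": (lambda a, b: a | b, ("n1", "c")),   # OR
    "out": (lambda a: a ^ 1, ("n2",)),         # INVERT
}


def evaluate(inputs, stuck_at=None):
    """Evaluate every gate once; stuck_at maps a gate name to a forced output."""
    stuck_at = stuck_at or {}
    values = dict(inputs)
    for name, (function, sources) in GATES.items():
        computed = function(*(values[s] for s in sources))
        # An inserted fault overrides whatever the gate would have produced.
        values[name] = stuck_at.get(name, computed)
    return values


stimulus = {"a": 1, "b": 1, "c": 0}
fault_free = evaluate(stimulus)
faulty = evaluate(stimulus, stuck_at={"n1": 0})  # n1 stuck at zero
print(fault_free["out"], faulty["out"])
```

With this stimulus the fault-free output is 0 and the stuck-at-zero fault on `n1` changes it to 1, so the effect of the fault is visible at the circuit output.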
This report has two primary sections. Section 3 is a description of the
functions of the system. This is intended to provide a perspective of the
system for the specification which follows in Section 4. Section 4 contains
the more definitive hardware and software requirements necessary to achieve
the goals and functions given in Section 3.
There are two Appendices and one Attachment. Appendix A is the trade
study which leads to the decision to specify a two machine system, including
an emulation computer connected to a general purpose computer. Appendix B is
an evaluation of potential computers to serve as the emulation computer.
Attachment I is the previously delivered Interim Technical Report. This
report details the feasibility study and describes in some detail the NASA
Langley gate level algorithm which provided the basis for most of the
performance figures required in the specification.
2.0 APPLICABLE DOCUMENTS
2.1 Reference Documents
1) Feasibility Study Report, Digital Avionics Design and Reliability
Analyzer, November 1979.
2) Interim Technical Report, Digital Avionics Design and Reliability
Analyzer, February 1980.
3) System Design Progress Report, Digital Avionics Design and
Reliability Analyzer, July 1980.
2.2 Standards
1) Electronic Industries Association Standard RS-449
2) Federal Standard 1031
3) Electronic Industries Association Standard RS-232-C
4) American National Standards Institute X3.9-1966
5) Federal Information Processing Standards Publication 1
6) Federal Information Processing Standards Publication 2
7) Federal Information Processing Standards Publication 3-1
8) Federal Information Processing Standards Publication 25
9) Federal Information Processing Standards Publication 16
10) Federal Information Processing Standards Publication 17
11) Federal Information Processing Standards Publication 18
2.3 Other
To be furnished by the Government
3.0 SYSTEM FUNCTIONAL/OPERATIONAL DESCRIPTION
3.1 Introduction
This section is intended to provide an overall description of what the
system (including the analyst) must do without regard to the elements;
hardware, software or manual procedures, which allow it to be done. The
emphasis in this section is on the logical functions required for the digital
avionics design and reliability analyzer. To express these functions, we use
structured analysis tools and notation (see footnote 1). The notation which will be used
throughout this section is based on three elements: data flow diagrams,
mini-specifications, and the data dictionary.
3.1.1 Data Flow Diagrams
Data Flow Diagrams (DFD) are used to present the system pictorially, thus
reducing the amount of narrative needed. A DFD is a network representation of
a system. The system may be automated, manual, or mixed. The DFD portrays
the system in terms of its component functional pieces with all interfaces
among the components indicated. A DFD does not represent the flow of control
or the order of processing. Numbers used on the diagrams are for
identification purposes only. Data Flow Diagrams are made up of four basic
elements:
1) Data flows, represented by named vectors, are pipelines through which
packets of information of known composition flow.
2) Processes, represented by bubbles, are transformations of incoming
data flow(s) into outgoing data flow(s). Each process bubble needs a
descriptive name.
3) Data stores, represented by two straight horizontal lines, are
temporary repositories of data and may consist of tapes, discs, card
sets, index files, data bases, or even someone's memory.
4) Data sources and sinks, represented by boxes, are persons,
organizations, or other entities lying outside the context of a
system, that are net originators or receivers of system data. A
source box exists only to provide commentary about the system's
connection to the outside world.
Data Flow Diagrams are expressed in levels. The first level, called the
Context Diagram is labeled Diagram 0 and portrays an overall picture of the
system with subsystems shown. These subsystems are labeled 1 through N. The
subsystems are broken down in separate DFDs and further described. The
components of the first subsystem are labeled 1.1, 1.2, 1.3, etc. When a
subsystem has been decomposed to as simple a form as necessary, it is called a
functional primitive.
1. Tom DeMarco, Structured Analysis and System Specification. New York:
Yourdon, 1978.
There are many advantages to using leveled Data Flow Diagrams. They allow
a top-down approach to analysis. By reading the top few levels one can get
the big picture, or one can begin with the abstract and go to the detailed and
narrow in on particular areas of interest. Each page is a complete
presentation of the area of work allocated to it. All diagrams can be
restricted to 8 1/2 x 11 inch paper.
3.1.2 Mini-Specification
The second part of the system functional definition consists of the
Mini-Specifications which are concise descriptions of the bottom-level bubbles
(functional primitives). Each Mini-Spec describes rules governing
transformation of data flows arriving at the associated primitive into data
flows leaving it.
3.1.3 Data Dictionary
To augment the Data Flow Diagram, there is an entity called the Data
Dictionary. This contains rigorous definitions of all Data Flow Diagram
elements such as data flows, components of data flows, files, and processes.
These definitions relate all data elements through sequence, selection, or
iteration.
3.1.4 The structured analysis information in this section is augmented as
necessary by textual material to highlight important points.
3.2 Usage Phases
The digital avionics design and reliability analyzer is intended to
support three primary uses:
1) Reliability analyses
2) Failure effects analyses
3) Conventional performance analyses
Regardless of their differences, each of these has several characteristics
in common with the others. Primary among these commonalities is the fact that
each involves data gathering which is facilitated by the technology of
emulation. As shown in Figure 3-1, there are 3 basic phases of each use.
These phases are:
1) Test design
2) Test execution
3) Data reduction/analysis
[Figure 3-1 Facility Use Phases: Start, Test Design, Test Execution, Data Reduction/Analysis, Stop]
These phases are shown in a different form in the Context Data Flow
diagram given in Figure 3-2. In this diagram, the results of each phase are
shown. Test design encompasses processes 1 and 3, test execution is process 2
and data reduction is process 4. Model building, process 1, is an inherent
part of test design and so is not considered a separate phase in itself.
3.3 Test Design Phase
The modeling part of the Test Design Phase is shown graphically in Figure
3-3 and described in the process descriptions. One key concept which needs
highlighting is the division of a system into functional blocks. This
partitioning is necessary due to the time constraints of emulating at the gate
level. Based on the results of the feasibility study (see Attachment I), it
is impossible to emulate the gate structure of the entire system under test.
Thus the mixed mode concept is used: the system is simulated at a functional
level until a fault is inserted, at which time the functional simulation of the
affected block is replaced with a gate level emulation of that block.
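The mixed mode concept can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Block` class, the 4-bit incrementer, and the `(bit index, stuck value)` fault encoding are hypothetical and do not come from the feasibility study. A block runs its fast functional model until a fault is inserted, after which every step goes through its slower gate level model.

```python
class Block:
    """A functional block with a functional model and a gate level model."""

    def __init__(self, name, functional, gate_level):
        self.name = name
        self.functional = functional
        self.gate_level = gate_level
        self.fault = None  # no fault inserted yet

    def insert_fault(self, fault):
        # From now on this block is emulated at the gate level.
        self.fault = fault

    def step(self, x):
        if self.fault is None:
            return self.functional(x)
        return self.gate_level(x, self.fault)


def increment_functional(x):
    """Functional model: 4-bit increment with wraparound."""
    return (x + 1) & 0xF


def increment_gate_level(x, fault):
    """Gate level model: ripple increment built from XOR and AND gates.

    fault is (bit index, stuck value) applied to one sum output.
    """
    carry, result = 1, 0
    for i in range(4):
        bit = (x >> i) & 1
        total = bit ^ carry      # XOR gate produces the sum bit
        carry = bit & carry      # AND gate produces the carry
        if i == fault[0]:
            total = fault[1]     # stuck-at fault on this sum output
        result |= total << i
    return result


block = Block("incrementer", increment_functional, increment_gate_level)
before = block.step(3)
block.insert_fault((2, 0))       # sum bit 2 stuck at zero
after = block.step(3)
print(before, after)
```

Only the block that receives the fault pays the cost of gate level evaluation; other blocks in a larger model would keep using their functional models.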
Following Figure 3-3, Figure 3-4, and Figure 3-5 are mini-specifications
describing each process shown in these figures.
One other concept not shown explicitly concerns the redundant computations
which occur in a fault tolerant computing system. In a model, there is no
necessity of actually performing redundant operations until one of the
redundant paths errs (due to the introduction of a fault). This concept
arises also during the Test Execution.
3.4 Test Execution Phase
The Test Execution Phase is shown in Figure 3-5. As noted in 3.3, the
actual execution uses a combination of functional level simulation and gate
level emulation of the machine under test.
PROCESS: 1.1, Subdivide System
;PROCESS SPECIFICATION
IF model-information CONTAINS "gate-level-model-needed" THEN
    CREATE functional-block USING system-block-diagrams
    DEFINE system-boundry USING functional-block
    DEFINE internal-interfaces USING (functional-block AND
        system-block-diagrams)
ELSE
    DEFINE internal-interfaces USING system-block-diagrams
ENDIF
DEFINE external-interfaces USING system-block-diagrams
ENDPROCESS
PROCESS: 1.2, Produce Gate Level Model
;PROCESS SPECIFICATION
IF model-information CONTAINS "gate-level-model-needed" THEN
    FOR EACH functional-block IN system-boundries DO
        CREATE block-gate-model USING system-logic-diagrams
        TRANSLATE block-gate-model TO block-gate-table
    ENDFOR
    ASSEMBLE gate-level-model FROM block-gate-tables
ENDIF
ENDPROCESS
PROCESS: 1.3, Produce Functional Model
;PROCESS SPECIFICATION
FOR EACH functional-block IN system-boundries DO
    CREATE (block-functional-model AND interface-behavior-model) USING
        (system-functional-description AND internal-interfaces)
    TRANSLATE (block-functional-model AND interface-behavior-model) TO
        (functional-level-simulation-code AND functional-level-symbol-table)
ENDFOR
ASSEMBLE functional-level-model FROM (functional-level-simulation-code AND
    functional-level-symbol-table)
CREATE code-generation-description USING system-functional-description
ENDPROCESS
PROCESS: 1.4, Define Model Specifics
;PROCESS SPECIFICATION
IF model-type-needed CONTAINS "gate-level-model-needed" THEN
    SET model-information TO "gate-level-model-needed" +
        "model-subdivision-needed"
ELSE
    SET model-information TO "monolithic-model-needed"
ENDIF
ENDPROCESS
PROCESS: 1.5, Produce Model Interconnection
;PROCESS SPECIFICATION
FOR EACH system-boundry DO
    DEFINE boundry-information USING (system-block-diagrams AND
        system-logic-diagrams)
ENDFOR
ENDPROCESS
PROCESS: 1.6, Produce Loadable Software
;PROCESS SPECIFICATION
FOR EACH test-software DO
    TRANSLATE test-software TO (machine-object-code AND symbol-table)
        USING code-generation-description
ENDFOR
ASSEMBLE loadable-software FROM (machine-object-code AND symbol-table)
ENDPROCESS
PROCESS: 1.7, Produce Environmental Model
;PROCESS SPECIFICATION
CREATE environmental-model-description USING
    system-environmental-description
TRANSLATE environmental-model-description TO executable-environmental-model
ENDPROCESS
PROCESS: 3.1, Determine Data to be Collected
;PROCESS SPECIFICATION
IF type-result-needed = reliability-number THEN
    DETERMINE confidence-level-desired
    CALCULATE number-of-samples-necessary FROM confidence-level-desired
    DETERMINE type-data-necessary /* for statistical reduction */
    DETERMINE (type-of-failure-desired AND desired-failure-distribution)
ELSE
    IF type-result-needed = failure-effects-analysis THEN
        DETERMINE number-of-samples-necessary
        DETERMINE type-data-necessary FROM failure-mode-of-interest
        FOR EACH number-of-samples-necessary DO
            DETERMINE type-of-failure-desired
        ENDFOR
    ELSE
        IF type-result-needed = performance-characteristic THEN
            DETERMINE type-data-necessary /* for specific characteristics */
        ENDIF
    ENDIF
ENDIF
ENDPROCESS
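Process 3.1 does not state how number-of-samples-necessary is calculated from confidence-level-desired. One common possibility, shown here purely as an assumption, is the zero-failure (success run) bound: if n independent fault-insertion samples all succeed, then with confidence C the per-sample failure probability is below p when n >= ln(1 - C) / ln(1 - p). The function name `samples_needed` and the parameter `max_failure_prob` are hypothetical and do not appear in the specification.

```python
import math


def samples_needed(confidence, max_failure_prob):
    """Smallest n such that n failure-free samples demonstrate, at the given
    confidence, a per-sample failure probability below max_failure_prob.

    Assumes independent samples and zero observed failures.
    """
    if not (0.0 < confidence < 1.0 and 0.0 < max_failure_prob < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_failure_prob))


print(samples_needed(0.95, 0.01))  # 95 percent confidence, p below 0.01
```

Under these assumptions, demonstrating a failure probability below 0.01 at 95 percent confidence takes 299 failure-free samples. The required count grows quickly as the probability bound tightens, which is consistent with the emphasis on accelerated testing through emulation.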
PROCESS: 3.2, Define Model Characteristics
;PROCESS SPECIFICATION
IF type-data-necessary IN data-desired CONTAINS "gate-performance" THEN
    SET model-type-needed TO "functional-model-needed" +
        "gate-level-model-needed"
ELSE
    SET model-type-needed TO "functional-model-needed"
ENDIF
ENDPROCESS
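For readers less familiar with the mini-specification notation, Processes 3.2 and 1.4 can be read as two small functions chained together. The Python below is a direct transcription under one assumption: the "+" composition of string items is represented as a Python set. The function names are hypothetical.

```python
def define_model_characteristics(type_data_necessary):
    """Process 3.2: choose which model types are needed for the desired data."""
    model_type_needed = {"functional-model-needed"}
    if "gate-performance" in type_data_necessary:
        model_type_needed.add("gate-level-model-needed")
    return model_type_needed


def define_model_specifics(model_type_needed):
    """Process 1.4: derive model-information from model-type-needed."""
    if "gate-level-model-needed" in model_type_needed:
        return {"gate-level-model-needed", "model-subdivision-needed"}
    return {"monolithic-model-needed"}


# Gate performance data forces a subdivided, gate level model.
gate_case = define_model_specifics(
    define_model_characteristics({"gate-performance"}))
# Functional data alone allows a single monolithic model.
functional_case = define_model_specifics(
    define_model_characteristics({"functional-element-performance"}))
print(sorted(gate_case), sorted(functional_case))
```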
PROCESS: 3.3, Determine Instrumentation Points
;PROCESS SPECIFICATION
DETERMINE instrumentation-points IN (executable-environmental-model AND
    functional-level-model) USING
    type-data-necessary IN data-desired
IF type-data-necessary IN data-desired CONTAINS "gate-performance" THEN
    DETERMINE instrumentation-points IN gate-level-model
ENDIF
ENDPROCESS
PROCESS: 3.4, Define Data Recording Directives
;PROCESS SPECIFICATION
FOR EACH instrumentation-point DO
    DEFINE data-recording-directives USING data-desired
ENDFOR
ENDPROCESS
PROCESS: 3.5, Define Test Sequence
;PROCESS SPECIFICATION
IF type-result-needed = reliability-number THEN
    FOR EACH number-of-samples-necessary DO
        DETERMINE faults-to-be-inserted FROM (desired-failure-distribution AND
            specific-system-portion-of-interest)
    ENDFOR
ELSE
    IF type-result-needed = failure-effects-analysis THEN
        DETERMINE faults-to-be-inserted FROM (type-of-failure-desired AND
            specific-system-portion-of-interest)
    ENDIF
ENDIF
DETERMINE environmental-model-directives /* for desired test */
ENDPROCESS
PROCESS: 2.1, Load Software
;PROCESS SPECIFICATION
ASSEMBLE configured-system FROM (executable-environmental-model +
    (functional-level-model + (gate-level-model +
    (boundry-information + loadable-software))))
ENDPROCESS
PROCESS: 2.2, Instrument System
;PROCESS SPECIFICATION
FOR EACH data-recording-directive DO
    IF data-recording-point = "symbol" THEN
        FIND insertion-point IN symbol-table
    ELSE
        IF data-recording-point = "target-memory-location" THEN
            FIND insertion-point USING functional-level-symbol-table
        ENDIF
    ENDIF
    CHANGE machine-object-code TO "trap"
    OUTPUT instrumented-system
ENDFOR
ENDPROCESS
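Process 2.2 can be pictured with the following Python sketch. It rests on several assumptions not stated in the specification: machine-object-code is modeled as a list of instruction strings, both symbol tables are plain dictionaries mapping a key to a code location, and an unrecognized data-recording-point raises an error. The function name `instrument_system` is hypothetical. A real implementation would also need to preserve the displaced instruction, which this sketch omits.

```python
TRAP = "trap"


def instrument_system(machine_object_code, symbol_table,
                      functional_level_symbol_table, directives):
    """Return a copy of the code with a trap placed at each recording point.

    directives is a list of (data-recording-point type, key) pairs.
    """
    instrumented = list(machine_object_code)  # leave the original untouched
    for point_type, key in directives:
        if point_type == "symbol":
            insertion_point = symbol_table[key]
        elif point_type == "target-memory-location":
            insertion_point = functional_level_symbol_table[key]
        else:
            raise ValueError(f"unknown data-recording-point: {point_type!r}")
        instrumented[insertion_point] = TRAP
    return instrumented


code = ["load", "add", "store", "jump"]
result = instrument_system(
    code,
    symbol_table={"loop": 1},
    functional_level_symbol_table={0x40: 2},
    directives=[("symbol", "loop"), ("target-memory-location", 0x40)],
)
print(result)
```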
PROCESS: 2.3, Run Test
;PROCESS SPECIFICATION
IF test-directive CONTAINS "load-saved-state" THEN
    RETRIEVE saved-state FROM saved-state-data-base
ENDIF
FOR EACH test-directive DO
    /* execute test directive */
ENDFOR
ENDPROCESS
3.5 Data Reduction/Analysis Phase
The Data Reduction/Analysis Phase is shown in Figure 3-6. For failure
effects analysis or conventional performance analysis, this phase consists
mostly of grouping and analyzing collected data to determine actions, trends,
etc. For reliability analysis, this phase consists of data reduction and
statistical analysis, followed by the use of the results in a reliability
model of the system.
3.6 Data Dictionary
The Data Dictionary follows Figure 3-6 and defines all terms used in the
data flow diagrams as well as the mini-specs. For the Data Dictionary, the
following symbols indicate:
1) =          is composed of
2) ( )        optional item
3) [a | b]    alternative items
4) n{ }m      iterations of, with optional lower (n) and upper (m) limits
5) +          and
PROCESS: 4.1, Determine Analysis to Perform
;PROCESS SPECIFICATION
IF type-result-needed CONTAINS reliability-number THEN
    OUTPUT reliability-results-needed
ELSE
    IF type-result-needed CONTAINS failure-effects-analysis THEN
        OUTPUT failure-effects-needed
    ELSE
        IF type-result-needed CONTAINS performance-characteristic THEN
            OUTPUT performance-characteristics-desired
        ENDIF
    ENDIF
ENDIF
ENDPROCESS
PROCESS: 4.2, Reduce Execution Data
;PROCESS SPECIFICATION
ASSEMBLE execution-data USING performance-characteristics-desired
EXTRACT performance-measure
ENDPROCESS
PROCESS: 4.3, Analyze Effects of Inserted Faults
;PROCESS SPECIFICATION
FOR EACH faults-to-be-inserted IN execution-data DO
    DETERMINE (effect-of-fault AND propogation-of-fault) USING execution-data
    OUTPUT failure-effects-result
ENDFOR
ENDPROCESS
PROCESS: 4.4, Reduce and Group Execution Data
;PROCESS SPECIFICATION
ASSEMBLE execution-data USING specific-system-portion-of-interest
CHANGE execution-data TO composite-execution-data
ENDPROCESS
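Process 4.3 above leaves open how propogation-of-fault is determined from execution-data. One possible approach, offered only as an assumption, is to compare the recorded gate behavior of a faulty run against a fault-free run of the same test and report every record that differs. In the Python sketch below, the dictionary layout keyed by (machine-cycle, block-number, gate-number) and the function name `propagation_of_fault` are hypothetical.

```python
def propagation_of_fault(fault_free, faulty):
    """List the (machine-cycle, block-number, gate-number) records whose gate
    value in the faulty run differs from the fault-free run, in time order."""
    return sorted(
        record for record, value in faulty.items()
        if fault_free.get(record) != value
    )


# Gate values use the data dictionary's gate-value alternatives.
fault_free_run = {(0, 1, 1): "1", (0, 1, 2): "0", (1, 1, 2): "1"}
faulty_run = {(0, 1, 1): "0", (0, 1, 2): "0", (1, 1, 2): "0"}

affected = propagation_of_fault(fault_free_run, faulty_run)
print(affected)
```

In this small example the fault appears at gate 1 of block 1 in cycle 0 and reaches gate 2 of the same block one cycle later.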
PROCESS: 4.5, Run Reliability Analysis
;PROCESS SPECIFICATION
/* Run reliability model using composite-execution-data */
CALCULATE predicted-reliability-number USING (composite-execution-data +
    confidence-level-desired)
ENDPROCESS
DATA DICTIONARY
actuator-description = SELF_DEFINING /* description of actuators */
analysis-result = [performance-measure | reliability-number |
    failure-effects-result]
block-functional-model = SELF_DEFINING /* description of the behavior of
    each functional block */
block-gate-model = SELF_DEFINING /* machine readable version of system logic
    diagrams broken into functional
    blocks */
block-gate-table = {gate-info}
block-gate-tables = {block-gate-table}
block-number = SELF_DEFINING /* id of the block this gate is in */
boundry-information = SELF_DEFINING /* list of inputs and outputs to system */
code-generation-description = op-code-information + instruction-formats
composite-execution-data = {execution-data}
confidence-level = number
confidence-level-desired = percentage
configured-system = executable-environmental-model + gate-level-model +
    functional-level-model + boundry-information +
    loadable-software
current-gate-value = ["0" | "1" | "undefined" | "tri-state"]
data-desired = [number-of-samples-necessary + type-data-necessary +
    confidence-level-desired + type-of-failure-desired +
    {desired-failure-distribution} | type-data-necessary] +
    specific-system-portion-of-interest
data-recording-directive = data-recording-point + data-to-be-gathered +
    [time-interval | time | system-significant-event] +
    output-device + output-format
data-recording-directives = {data-recording-directive}
data-recording-point = ["symbol" | "target memory location"]
data-recording-points = {data-recording-point}
data-to-be-gathered = SELF_DEFINING /* this item left unspecified since it
    could be a wide range of possibilities,
    ranging from modeled items to actual
    items in the modeling machine */
desired-failure-distribution = probability-distribution
desired-performance-information = SELF_DEFINING /* this is the performance
    characteristic which we need to ascertain.
    Since the possibilities are numerous, this
    definition is not constrained. */
duration-of-fault = number
effect-of-fault = [stuck-at-fault | transient-fault]
environmental-model-description = {sensor-description} + {actuator-description}
    + {output-device-description} +
    {interconnection-description}
environmental-model-directive = initial-value + range-limits
environmental-model-directives = {environmental-model-directive}
environmental-model-performance = time + sensor-state
environmental-simulation-code = machine-object-code
environmental-symbol-table = symbol-table
event-identifier = ["sensor out of bounds" | "machine parameter out of bounds" |
    system-significant-event]
executable-environmental-model = environmental-simulation-code +
    environmental-symbol-table
execution-data = 1{run-id + [reliability-sample-data | performance-sample-data
    | failure-effects-sample-data]}
execution-time = number
external-input = SELF_DEFINING /* this is an input to the system from the
    outside world. No restrictions are placed on
    its form or contents */
external-interfaces = {external-input} + {external-output}
external-output = SELF_DEFINING /* this is an output to the outside world.
    No restrictions are placed on its form or
    content. */
failure-effects-analysis = "failure effects needed" +
    specific-system-portion-of-interest + type-of-failure-desired
failure-effects-needed = specific-system-portion-of-interest +
    type-of-failure-desired
failure-effects-result = {faults-to-be-inserted + propogation-of-fault}
failure-effects-sample-data = 1{initial-state-data + 1{faults-to-be-inserted +
    {gate-behavior-data}}}
failure-mode-of-interest = failure-effects-analysis + faults-to-be-inserted
fault-insertion = "fault inserted" + faults-to-be-inserted
faults-to-be-inserted = location-of-fault + time-of-fault + effect-of-fault
    + duration-of-fault
functional-block = subsystem + internal-interfaces
functional-blocks = {functional-block}
functional-element-performance = SELF_DEFINING /* performance measures of some
    portion of the system. This
    item is so variable, it is not
    specified in detail */
functional-level-model = functional-level-simulation-code +
    functional-level-symbol-table
functional-level-simulation-code = machine-object-code
functional-level-symbol-table = symbol-table
gate-behavior = last-gate-value + current-gate-value
gate-behavior-data = gate-id + machine-cycle + gate-behavior
gate-id = block-number + gate-number
gate-info = gate-state-info + {gate-output}
gate-interconnection-table = {gate-id + {gate-id}}
gate-level-model = {block-gate-table} + {gate-interconnection-table} +
    gate-symbol-table
gate-number = SELF_DEFINING /* id of this gate within its block */
gate-output = SELF_DEFINING /* pointer to one of this gate's outputs */
gate-performance = {gate-behavior-data}
gate-state-info = gate-type + gate-value
gate-symbol-table = symbol-table
gate-type = ["AND" | "OR" | "NAND" | "NOR" | "INVERT" | "XOR" | "FLIP-FLOP"]
gate-value = ["0" | "1" | "undefined" | "tri-state"]
initial-state-data = time + {external-input} + {external-output} +
    {sensor-state} + {internal-state}
initial-value = SELF_DEFINING
insertion-point = machine-object-code-location
insertion-points = {insertion-point}
instruction-formats = SELF_DEFINING /* information concerning addressing modes,
    bit patterns, etc. as needed by the
    code generator */
instrumentation-point = machine-object-code-location
instrumentation-points = {instrumentation-point}
instrumented-system = {machine-object-code} + {data-recording-points} +
    {instrumentation-points}
interconnection-description = SELF_DEFINING /* description of how sensors,
    actuators, output devices are connected
    to the test system */
interface-behavior-model = SELF_DEFINING /* list of interconnections between
    functional blocks */
internal-interfaces = SELF_DEFINING /* connections between blocks */
internal-state = {machine-state}
interrupt = SELF_DEFINING
last-gate-value = ["0" | "1" | "undefined" | "tri-state"]
loadable-software = {machine-object-code + symbol-table}
location-of-fault = gate-id
lower-limit = SELF_DEFINING
machine-cycle = SELF_DEFINING /* id of the current machine cycle */
machine-object-code = SELF_DEFINING
machine-object-code-location = number
machine-state = SELF_DEFINING /* this is the state of the computer, including
    registers, memory, mode and any other
    parameters necessary to describe the current
    status of the machine itself */
model-information = ["gate-level-model-needed" | "model-subdivision-needed"
    | "monolithic-model-needed"]
model-type-needed = "functional-model-needed" + ("gate-level-model-needed")
number = SELF_DEFINING
number-of-samples-necessary = number
op-code-information = SELF_DEFINING /* information concerning op codes as
    needed by the code generator */
output-device = ["disk" | "tape" | "console" | "line printer"]
output-device-description = SELF_DEFINING /* description of any other system
    output devices */
output-format = ["decimal" | "octal" | "hexadecimal" | "binary" |
    "unformatted"]
percentage = number
performance-characteristic = "performance information needed" +
    desired-performance-information
performance-characteristics-desired = specific-system-portion-of-interest
performance-measure = SELF_DEFINING /* this will depend on the type of measure
    desired; this is highly variable so
    no enumeration is given here */
performance-sample-data = 1{initial-state-data + {significant-event-data}}
predicted-reliability-number = number + confidence-level
probability-distribution = SELF_DEFINING
propogation-of-fault = {gate-behavior-data}
range-limits = upper-limit + lower-limit
reliability-number = "reliability number needed" + confidence-level-desired
reliability-results-needed = confidence-level-desired
reliability-sample-data = 1{sample-number + initial-state-data +
    {significant-event-data}}
run-id = number
sample-number = number
saved-state = [configured-system | instrumented-system] + time +
    {external-interfaces} + {sensor-state} + {internal-state}
saved-state-data-base = {saved-state}
sensor-description = SELF_DEFINING /* description of what sensor is and how it
    behaves. May be text */
sensor-state = SELF_DEFINING /* this is the current state of the sensor as
    defined by some parameters such as orientation,
    or by its output values */
significant-event-data = time + {external-input} + {external-output} +
    {sensor-state} + {internal-state} + event-identifier
specific-system-portion-of-interest = functional-block
stuck-at-fault = [stuck-at-one-fault | stuck-at-zero-fault |
    stuck-at-indeterminate-fault]
stuck-at-indeterminate-fault = SELF_DEFINING
stuck-at-one-fault = SELF_DEFINING
stuck-at-zero-fault = SELF_DEFINING
subsystem = SELF_DEFINING /* any reasonable chunk of the system which can be
    isolated as an identifiable piece */
symbol-table = insertion-points + instrumentation-points
    /* + a bunch of other stuff */
system-block-diagram = SELF_DEFINING /* block diagram of the system of interest
    showing major components and their
    interfaces */
system-block-diagrams = {system-block-diagram}
system-boundries = {system-boundry}
system-boundry = {functional-block} + {internal-interfaces} +
    external-interfaces
system-environmental-description = SELF_DEFINING /* description of the behavior
    of the system external
    environment including all
    input and output */
system-functional-description = SELF_DEFINING /* description of the functional
    level behavior of the system,
    including instruction fetch and
    decode of the computer(s) */
system-logic-diagram = SELF_DEFINING
system-logic-diagrams = {system-logic-diagram}
system-model = executable-environmental-model + functional-level-model +
    (gate-level-model) + boundry-information
system-significant-event = [interrupt | trap | fault-insertion]
test-conduct-directive = initial-state-data + execution-time + sample-number
test-directive = {faults-to-be-inserted} + {environmental-model-directive}
    + {test-conduct-directive}
test-directives = {test-directive}
test-software = SELF_DEFINING /* source software for the system under test */
test-system-definition = system-environmental-description +
(system-logic-diagrams) + system-functional-description +
test-software + system-block-diagrams
time = number
time-interval = number
time-of-fault = number
transient-fault = SELF_DEFINING
trap = SELF_DEFINING /* this is the occurrence of a system trap inserted for
the purposes of recording data or some such reason */
trap-insertion = SELF_DEFINING
type-data-necessary = (gate-performance) + (functional-element-performance) +
(environmental-model-performance)
type-of-failure-desired = [stuck-at-fault | transient-fault]
type-result-needed = [performance-characteristic | failure-mode-of-interest
| reliability-number] + specific-system-portion-of-interest
upper-limit = SELF_DEFINING
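As an illustration of how the dictionary entries compose, the following sketch maps the test-directive, test-conduct-directive, and type-of-failure-desired entries onto Python types. It assumes the usual structured-analysis reading of the notation (+ for sequence, braces for repetition, brackets for selection, parentheses for optional items), which is not restated in this section. All field types, units, and class names are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class FailureType(Enum):
    """type-of-failure-desired = [stuck-at-fault | transient-fault]"""
    STUCK_AT = "stuck-at-fault"
    TRANSIENT = "transient-fault"


@dataclass
class TestConductDirective:
    """test-conduct-directive = initial-state-data + execution-time + sample-number"""
    initial_state_data: bytes   # representation is an assumption
    execution_time: float       # units (seconds) are an assumption
    sample_number: int


@dataclass
class TestDirective:
    """test-directive: three repeated groups, modeled here as lists."""
    faults_to_be_inserted: List[str] = field(default_factory=list)
    environmental_model_directives: List[str] = field(default_factory=list)
    test_conduct_directives: List[TestConductDirective] = field(default_factory=list)


# One directive with a single fault and a single conduct directive.
directive = TestDirective(
    faults_to_be_inserted=[FailureType.STUCK_AT.value],
    test_conduct_directives=[TestConductDirective(b"\x00", 0.1, 1)],
)
print(directive)
```

Modeling the repeated groups as lists keeps an empty repetition (no faults inserted, for example) representable without a special case.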
4.0 System Specification
4.1 General System Configuration
This specification describes the requirements for the digital avionics design
and reliability analyzer. This facility consists of two major hardware
items, as shown in Figure 4-1: a general purpose computer providing user support
and interface, simulation, and numerous other pieces of software; and an
emulation computer to provide either gate level emulation or general
instruction level hardware emulation. These two computers are interfaced for
synchronization and data transfer. The software for the facility is shown in
Figure 4-2. Of the five major components, only a small part of the general
purpose support software, the model building software, and the test execution
software runs on the emulation computer. The major portion of the software
runs on the general purpose computer.
Figure 4-1 System Hardware Components (legible labels: Digitizing Board,
Graphic Workstation, Operator Console, Drives (2), Drives (2), Electrostatic
Plotter, Line Printer, Emulation Computer)
4.2 Hardware Configuration
The system shall consist of two cooperating machines connected via an
interface. These machines shall be:
1) A general purpose computer providing user interface; software support
such as editors, assemblers, compilers, simulation support; and
analysis support.
2) An emulation computer supporting emulations ranging from gate level
to instruction level.
4.2.1 General Purpose Machine
4.2.1.1 Central Processor
4.2.1.1.1 The system shall have a real-time clock (interval timer) for use by
the operating system.
4.2.1.1.2 Machine hardware instructions shall include integer, single and
double precision floating point, packed-decimal, character string
manipulation, bit shifting and rotating, and logical instructions.
4.2.1.1.3 Hardware fault detection shall be provided, i.e., detection of
divide by zero, exponent underflow, and exponent overflow.
4.2.1.1.4 The system shall detect a power failure or fluctuation and have the
capacity to provide for an orderly system shutdown. Upon re-establishment of
stable power, automatic restart of the system must be provided for. This
requirement may be met by battery back-up to maintain proposed MOS (metal
oxide semiconductor) memory allowing for operator notification and
intervention. The system must be maintained for a long enough period to
permit any necessary steps to be accomplished to allow for restart of the
system and user programs.
4.2.1.1.5 The architecture of the system shall be based on a computer with
effective addressing, register size, and integer arithmetic of at least
sixteen (16) bits.
4.2.1.1.6 The general purpose computer shall have the speed and power
necessary to execute the environmental model and the functional level model
specified in 4.3.2.1 in the normal operating mode, cooperating with each
other, at a slowdown of not more than 3000 times real time.
4.2.1.2 Memory
4.2.1.2.1 The memory requirements stated are in terms of bytes. A byte is
defined as the alphanumeric character oriented unit of measure composed of at
least eight (8) bits. Manufacturers whose internal architecture is such that
they normally operate with less than 8 bit bytes must adjust their bytes or
words of memory proposed to reflect the 8 bit requirement. Memory single word
size must be at least sixteen (16) bits available to user programs.
4.2.1.2.2 The initial configuration must be a minimum of one-half (1/2)
million bytes of main memory. The system architecture shall not preclude a
single user program from utilizing the full complement of main memory beyond
the residency requirement of the operating system and related software.
Both hardware and software shall support two (2) million bytes of physical
memory for expansion purposes.
4.2.1.2.3 Areas or regions of memory shall be memory protected to facilitate
the protection of the operating system and individual user programs. This
requirement may be met by any combination of hardware and/or software features.
4.2.1.2.4 Single bit fault correction and multiple bit fault detection shall
be provided.
All detected memory faults shall be logged by the system. This log shall be
accessible by either a vendor, customer engineer, and/or government personnel.
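The specification does not name an error-correcting code. One common scheme that corrects any single-bit fault and detects any double-bit fault is an extended Hamming (SECDED) code. The sketch below illustrates the idea on 4 data bits, together with a simple fault log as called for in 4.2.1.2.4. The function names, the 4-bit word size, and the log format are assumptions made for the example; a real memory would apply the same principle to full-width words in hardware.

```python
from typing import List, Optional, Tuple

FAULT_LOG: List[Tuple[int, str]] = []  # (address, status) for every detected fault


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming word.

    Index 0 is the overall parity bit; indexes 1, 2, 4 are Hamming parity
    bits; indexes 3, 5, 6, 7 carry the data bits.
    """
    word = [0] * 8
    word[3], word[5], word[6], word[7] = [(nibble >> i) & 1 for i in range(4)]
    word[1] = word[3] ^ word[5] ^ word[7]
    word[2] = word[3] ^ word[6] ^ word[7]
    word[4] = word[5] ^ word[6] ^ word[7]
    word[0] = sum(word[1:]) & 1  # make the parity of the whole word even
    return word


def decode(word: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for position in range(1, 8):
        if word[position]:
            syndrome ^= position      # XOR of the positions of all set bits
    parity_odd = sum(word) & 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        word[syndrome] ^= 1           # single fault: syndrome is its position
        status = "corrected"
    else:
        return "uncorrectable", None  # even parity, nonzero syndrome: two faults
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble


def read(address: int, word: List[int]) -> Optional[int]:
    """Decode a stored word and log any detected fault."""
    status, nibble = decode(word)
    if status != "ok":
        FAULT_LOG.append((address, status))
    return nibble
```

A word with one flipped bit is returned corrected and logged; a word with two flipped bits returns no data and is logged as uncorrectable.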
4.2.1.2.5 The rationale for the one-half (1/2) million bytes of main memory
is as follows:
1) Traditionally, interactive graphics systems tend to be complex and to
require significant amounts of memory to operate effectively. The
interpretive graphic subsystem is only a small portion of the total
system and will undoubtedly have to operate concurrently with other
tasks from time to time. Even if it operates by itself, it is quite
conceivable that, once in a production mode, multiple digitizing
stations will be required.
2) As detailed system design has not been completed, it is difficult to
predict with any accuracy the ultimate memory requirement of the test
execution software. The following items will need to be memory
resident for the test execution and in total will be significant in
terms of memory required:
a. Fault tolerant target machine object code.
b. Compiled hardware description code for the target machine.
c. Actuators/sensors values and associated bound limits.
d. Fault data being introduced.
The fault tolerant target machines will be complex in terms of having
redundant hardware components and significant associated control/management
software.
3) With anticipated run times of test execution software to be in terms
of hours or days, throughput cannot be significantly degraded due to
excessive page thrashing and/or overlay roll-in and roll-out. The
requirement for physical memory to be expandable to two (2) million
bytes is to keep execution times within reason as the fault tolerant
systems under study become more complex.
4.2.1.2.6 Memory shall be dynamically allocated with the ability
to support at least four (4) interactive devices concurrently at installation
time and expandable to eight (8). A minimum of two (2) batch jobs must run
concurrently with the interactive users. The environment is to be that of
true multiprogramming, i.e., a fixed partition foreground/background
environment specifically shall not be permitted.
4.2.1.3 Disk Storage
4.2.1.3.1 Five-hundred (500) million 8 bit bytes of removable and
interchangeable formatted disk storage shall be available to the users of the
system. Disk storage required for system software is in addition to this
requirement. This disk space for the system shall be expandable by a factor
of two (2).
4.2.1.3.2 Average access time, including latency and seek time, shall be 55
milliseconds or faster. The transfer rate shall not be less than 800,000
8-bit bytes per second.
4.2.1.3.3 The vendor shall provide an initial complete set of recording
media, as well as a complete backup set, both containing no more than 0.01%
unacceptable sectors per unit.
4.2.1.3.4 A minimum of two (2) physical drives are required.
4.2.1.3.5 The five-hundred (500) million 8-bit bytes of removable and
interchangeable formatted disk storage for the user is considered justified
for the following reasons:
1) Disk I/O spooling area for local print output. It is anticipated
that the various report products shall be maintained on disk for
several work days during their review, the rationale being to save
computer run time in the event that additional copies are required
for further study and distribution.
2) Provision for multiple files of gate level logic diagrams with
associated legend. These files represent the various portions of the
target fault-tolerant computer system under evaluation. Different
portions of the target fault-tolerant computer system will be at
different stages of the capturing and editing of gate level logic
diagrams via the interactive graphics subsystem.
3) Program source code library.
4) Program object code library.
5) Multiple gate queueing structure tables in emulator computer
compatible format.
6) Multiple fault data files.
7) Multiple files of the initial conditions and bound limits of the
avionic actuators and sensors of the fault tolerant systems under
study.
8) Multiple hardware configuration descriptions defining various fault
tolerant system options.
9) Library of procedure files and parameter files.
10) Data files associated with mathematical and statistical analysis
routines.
11) As the mechanical/electronic nature of disk drives requires frequent
maintenance, the requirement of two physical spindles was specified
to allow some work to continue when one drive is unavailable.
Admittedly, this capability will require careful organization of the
disk files.
12) Further, utilizing a large capacity disk drive allows achievement of
economy of scale. For example, a calculation revealed that, for one
vendor, going from a medium to a large capacity drive resulted in a
162% increase in capacity for a 76% increase in cost.
13) Disk space is required for the recording of data during the execution
of test software (4.3.4). The approach taken of recording
information only when out of limits conditions occur (4.3.4.4.3) is a
compromise over what the run data recording requirements could be.
The calculation provided here is an example of what the run storage
requirements would be if recording of data were to be done for each
target machine simulated/emulated cycle.
Assumptions:
6,000 gates
15% gate state changes per cycle
1 µs target machine cycle time
1,000 samples per run.
Calculations:
900 (15% of 6,000) gate state changes per cycle
900,000,000 gate changes per sample (this implies 1,000,000 cycles per sample)
900,000,000,000 gate changes per run
With 3 gate changes recorded per
32 bit word, 300,000,000,000 words of disk required.
1,200,000,000,000 bytes of disk required
or if on magnetic tape
With 4,000 character tape blocks
300,000,000 blocks at 3 inches of tape each is 900,000,000
inches of tape
900,000,000 ÷ 28,800 = 31,250 2400 foot reels
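The run data recording estimate above can be checked with a short script. The sketch below reproduces the arithmetic under the stated assumptions. The figure of 1,000,000 cycles per sample is not stated in the text; it is inferred from 900 changes per cycle and 900,000,000 changes per sample. All names are illustrative.

```python
GATES = 6_000
CHANGE_PERCENT = 15              # percent of gates changing state per cycle
CYCLES_PER_SAMPLE = 1_000_000    # inferred, see the note above
SAMPLES_PER_RUN = 1_000
CHANGES_PER_WORD = 3             # gate changes packed into one 32 bit word
BYTES_PER_WORD = 4
TAPE_BLOCK_BYTES = 4_000         # one character per byte
INCHES_PER_BLOCK = 3
REEL_INCHES = 2_400 * 12         # a 2400 foot reel


def run_storage() -> dict:
    """Storage needed if every gate change of every cycle were recorded."""
    changes_per_cycle = GATES * CHANGE_PERCENT // 100
    changes_per_run = changes_per_cycle * CYCLES_PER_SAMPLE * SAMPLES_PER_RUN
    words = changes_per_run // CHANGES_PER_WORD
    disk_bytes = words * BYTES_PER_WORD
    tape_blocks = disk_bytes // TAPE_BLOCK_BYTES
    tape_inches = tape_blocks * INCHES_PER_BLOCK
    return {
        "changes_per_cycle": changes_per_cycle,
        "changes_per_run": changes_per_run,
        "words": words,
        "disk_bytes": disk_bytes,
        "tape_blocks": tape_blocks,
        "tape_inches": tape_inches,
        "reels": tape_inches // REEL_INCHES,
    }


if __name__ == "__main__":
    for name, value in run_storage().items():
        print(f"{name:18s} {value:>20,d}")
```

The result, roughly 1.2 trillion bytes or 31,250 reels of tape per run, is what motivates recording only out-of-limits conditions.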
4.2.1.4 I/O Devices
4.2.1.4.1 Tape Drives
4.2.1.4.1.1 Two (2) read/write nine track 1600 CPI phase encoded tape drives
with a read/write speed of not less than 75 IPS and a peak transfer rate of
not less than 120,000 bytes per second shall be provided.
4.2.1.4.1.2 The tape units shall provide a read-after-write check feature.
4.2.1.4.1.3 The tape units shall handle up to 2400 foot reel size.
4.2.1.4.1.4 The tape units shall be of the vacuum chamber type. Mechanical
feed arms are not permitted.
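The density, speed, and transfer rate figures in 4.2.1.4.1.1 are mutually consistent, as the following sketch shows. It assumes one byte per character frame across the nine tracks (eight data bits plus parity) and ignores inter-record gaps, so the reel time is a lower bound. The function names are illustrative.

```python
def peak_transfer_rate(density_cpi: int, speed_ips: int) -> int:
    """Peak bytes per second: characters per inch times inches per second."""
    return density_cpi * speed_ips


def reel_pass_seconds(reel_feet: int, speed_ips: int) -> float:
    """Time to move a full reel past the heads at constant speed."""
    return reel_feet * 12 / speed_ips


if __name__ == "__main__":
    print(peak_transfer_rate(1600, 75), "bytes per second peak")
    print(reel_pass_seconds(2400, 75), "seconds per 2400 foot reel")
```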
4.2.1.4.2 Line Printer
4.2.1.4.2.1 One (1) impact type printer shall be provided. The printer shall
have no fewer than 132 print positions. The ASCII character set of 95
characters shall be employed. The proposed printer shall be able to line
space at 6 and 8 lines per inch, vertically. The printer must provide
standard horizontal spacing of ten characters to the inch.
4.2.1.4.2.2 The throughput requirement is minimally 600 lines per minute when
printing full 132 character lines consisting of the 95 printable character set.
4.2.1.4.2.3 The system shall be upgradable to a configuration of two (2)
printers meeting these specifications.
4.2.1.4.3 Operator Console
4.2.1.4.3.1 The system operating console shall provide for hard copy output.
The console must be of rugged construction capable of withstanding heavy use,
i.e., continuous use during operating hours. This requirement would not
preclude a printing unit operating as a slave to a CRT operator console.
4.2.1.4.4 Telecommunications Hardware
Telecommunication hardware shall be provided to handle data exchange and its
associated line disciplines between local terminals and the host computing
system. Attached terminals will be used for time sharing, inquiry/response,
local graphics and local plotting. The system shall be able to handle half
and full duplex lines concurrently. Circuit disciplines in general shall
include at the minimum start/stop half duplex and full duplex asynchronous
transmission.
4.2.1.4.4.1 All telecommunications hardware supplied by the vendor shall
conform to the Electronic Industries Association Standard RS-449. The
government has adopted RS-449 as Federal Standard 1031, which became mandatory
for all procurements by federal agencies starting June 1, 1980. EIA Standards
RS-449, RS-422, and RS-423 are intended to gradually replace RS-232-C.
Telecommunication hardware which conforms to the new standards shall be
provided. The vendor's proposal must state how existing terminals which
conform to RS-232-C will be accommodated.
4.2.1.4.4.2 The telecommunications hardware/software shall support the
Teletype (TTY) start-stop asynchronous communications. The
emulators/simulators are intended for use with various ADP vendors, so the
proposed emulation/simulation shall not be specifically designed for any
particular vendor.
4.2.1.4.4.3 The initial four (4) communications ports (See A. 2.2.5) will be
utilized for some combination of alphanumeric CRT(s), graphic display,
digitizer board, and electrostatic plotter. The proposed system must be
upgradable to eight (8) communication ports.
4.2.1.4.4.4 One (1) interactive user CRT terminal is to be provided by the
vendor. The unit will be locally attached to the CPU operating at the speed
of 1200 BPS or faster. The physical connection will not exceed the industry
standard of fifty feet. This unit will be utilized by the software program
specified under 4.3. The following are minimum specifications to be met:
1) 80 character line width
2) 24 vertical lines
3) Fill-in-the-form capability with the form stored in the background
and variable information entered in the foreground. Once the form
has been loaded to the CRT memory, it will be utilized for a series
of transactions without need for retransmission from the computer.
Only the variable information is to be transmitted to the computer
during the data entry process.
4) Normal and reverse video
5) Double intensity
6) Blinking
7) Underlining
4.2.1.4.5 General Purpose Computer/Emulation Computer Interface
4.2.1.4.5.1 The contractor shall provide any necessary hardware to
interconnect the general purpose computer to the emulation computer. (Related
software is specified under 4.3).
4.2.1.4.5.2 Data transfers between the two computers will consist of the
following:
4.2.1.4.5.2.1 For reliability analysis data gathering, the transfers will
include:
1) At the start of the run, the general purpose computer will load the
emulation computer control memory with the gate level emulation
algorithm and the emulation computer primary memory with the gate
tables for the specific system portion of interest.
2) During the run at each fault insertion time, the gate tables within
the emulation computer primary memory will be updated by the general
purpose computer to reflect the inserted fault.
3) During the run, for each machine cycle, the inputs to the block being
emulated at the gate level will be transferred from the general
purpose machine to the emulation machine and the outputs of the block
will be transferred back from the emulation machine to the general
purpose machine. The quantity of data transferred depends on the
degree of interconnection between the emulated block and the rest of
the system.
4.2.1.4.5.2.2 For failure effects analysis, the data transfers will be the
same as specified in reliability analysis data gathering. In addition, at the
end of each cycle, the new state of each changed gate may potentially be
transferred from the emulation computer back to the general purpose computer.
4.2.1.4.5.2.3 For standard emulation purposes, data transfers will be as
follows:
1) At the start of the run, the emulation machine control and primary
memory will be loaded with the appropriate software by the general
purpose machine.
2) During the run, input and output data from the environmental
simulation to the emulated machine and back will be transferred at
appropriate times.
3) Additional data concerning the state of items in the emulated machine
may potentially be transferred back to the general purpose machine
for performance evaluation purposes.
4.2.1.4.5.2.4 The speed of the computer-computer interface for data transfer
shall be sufficiently fast so that the predominant amount of time in the
reliability analysis data gathering experiments will be time for
simulation/emulation of the system rather than for data transfer.
4.2.1.5 Interactive Graphics Subsystem
4.2.1.5.1 The contractor shall provide the necessary hardware to capture and
validate gate level logic diagrams. (Related software is specified under 4.3)
4.2.1.5.2 The graphics workstation is to be made up of the following
components:
1) Digitizing Board with cross-hair cursor. The digitizing surface must
be large enough to handle logic diagrams up to standard size E (34" X
44"). The gantry style digitizer is preferred, but is not
mandatory. The logic symbols are not to be digitized in detail. The
symbol type is to be selected from a menu and the symbol position is
to be recorded via the cursor. With this approach, a digitizer board
with minimal accuracy, resolution, and repeatability may be
utilized. A resolution of 100 points per inch is adequate. The
working surface shall have both tilt and height control.
2) Graphics CRT with alphanumeric keyboard. The minimum screen size
shall be 19". This requirement may be met by a single raster scan
type graphic CRT with the capability of a reference drawing being
flashed onto the screen from which a zoom-in area may be selected.
The requirement may also be met via two storage tube type graphic
CRT's. A reference drawing would be displayed on one CRT while
zoom-in areas are displayed on the second CRT.
3) Electrostatic plotter with a roll paper width of 36". The unit shall
have a resolution of 100 points per inch. The electrostatic plotter
will be used primarily for quick turnaround images for validating
plots against the original input document, i.e., gate level logic
diagrams.
4.2.2 Emulation Computer
4.2.2.1 CPU Architecture - The emulation computer shall be user
microprogrammable. The microcode shall provide control over primitive
functions within the machine (e.g. connection of registers to busses, ALU
operations, etc) and shall provide the capability for parallel operations
within a microword.
4.2.2.1.1 Microcode containing the NASA Langley gate-level algorithm or
similar algorithm must be programmed into the emulator. Due to the stringent
speed requirements for processing such an algorithm, the microcode must
perform multiple operations in parallel.
4.2.2.1.2 Each gate being processed is described by a gate information word
of eight or more bits. This word is also the address to which control is
transferred in micro store, thus micro store must be sufficient to handle all
locations addressed.
4.2.2.2 Memory
4.2.2.2.1 Microprogram Memory - Sufficient microprogram memory shall be
provided to accommodate a table-driven gate level emulation algorithm.
Requirements of the algorithm are detailed in paragraph 4.3.4.7. As a
minimum, at least 1K words of microprogram memory shall be provided.
4.2.2.2.2 Primary Memory - Sufficient primary memory shall be provided to
contain the gate level tables required by the emulation algorithm. These
tables shall accommodate at least 5000 gates with an average gate fan-out of
2. As a minimum, at least 32K words of primary memory shall be provided.
4.2.2.3 The selection of the micro code to be executed shall be via a
"vector" type mechanism. That is, some combination of bits in a word
containing gate status shall provide the address of the microinstruction to be
executed. Such a mechanism precludes the necessity of testing individual bits
to determine the action to take for a particular gate.
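The "vector" mechanism of 4.2.2.3 can be pictured as a jump table indexed directly by the gate information word. The sketch below is an illustration only: the bit layout (bits 0-2 for gate type, bits 3-4 for a fault code) is a hypothetical encoding chosen for the example and is not the layout of the NASA Langley algorithm, and Python functions stand in for microcode entry points.

```python
from typing import Callable, List, Optional

AND, OR, NAND, NOR, XOR, NOT = range(6)          # hypothetical gate type codes
FAULT_NONE, FAULT_SA0, FAULT_SA1 = 0, 1, 2       # hypothetical fault codes

LOGIC = {
    AND: lambda ins: int(all(ins)),
    OR: lambda ins: int(any(ins)),
    NAND: lambda ins: int(not all(ins)),
    NOR: lambda ins: int(not any(ins)),
    XOR: lambda ins: sum(ins) & 1,
    NOT: lambda ins: 1 - ins[0],
}


def make_word(gate_type: int, fault: int = FAULT_NONE) -> int:
    """Pack gate type and fault code into one 8-bit gate information word."""
    return gate_type | (fault << 3)


def build_dispatch() -> List[Optional[Callable[[List[int]], int]]]:
    """Build a 256-entry table: word value -> routine, like micro store addresses."""
    table: List[Optional[Callable[[List[int]], int]]] = [None] * 256
    for gate_type, logic in LOGIC.items():
        table[make_word(gate_type)] = logic
        table[make_word(gate_type, FAULT_SA0)] = lambda ins: 0   # stuck-at-zero
        table[make_word(gate_type, FAULT_SA1)] = lambda ins: 1   # stuck-at-one
    return table


DISPATCH = build_dispatch()


def evaluate(word: int, inputs: List[int]) -> int:
    """One indexed jump selects the routine; no individual bits are tested."""
    return DISPATCH[word](inputs)
```

Because the word itself is the index, inserting a fault amounts to rewriting one table word for the affected gate, which is consistent with the gate table updates described in 4.2.1.4.5.2.1.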
4.2.2.4 The emulation computer shall also be useful for instruction level
emulation of digital devices. The characteristics of the machine shall be
such that it will accommodate such emulation.
4.2.2.5 The emulation computer shall be interfaced to the general purpose
computer for data transfer and for software level synchronization of
cooperating, parallel simulations and emulations in the two machines. Data
transfers expected are defined in 4.2.1.4.5.
4.2.2.6 The emulation computer shall have the speed and power necessary to
execute the gate level emulation, in the normal operating mode, for 6000 gates
for 0.1 seconds of emulated time in 5 minutes or less of real time. The cycle
time of the emulated system for this timing figure shall be 1 microsecond, the
average gate fanout shall be 2; and in any one cycle, 5% of the gates will
change value, on the average.
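The timing requirement of 4.2.2.6 can be turned into a rough processing budget. The sketch below derives the number of emulated cycles, the number of gate changes to be processed, and the resulting slowdown factor, which works out to the same factor of 3000 allowed for the general purpose computer in 4.2.1.1.6. Treating each gate change as causing one update per fan-out connection is an assumption about the workload, not a statement from the specification.

```python
GATES = 6_000
EMULATED_SECONDS = 0.1
CYCLE_TIME_SECONDS = 1e-6
CHANGE_FRACTION = 0.05        # 5% of the gates change value per cycle
AVERAGE_FANOUT = 2
REAL_TIME_LIMIT_SECONDS = 5 * 60


def emulation_budget() -> dict:
    """Derive per-run and per-second workload figures from the requirement."""
    cycles = round(EMULATED_SECONDS / CYCLE_TIME_SECONDS)
    changes_per_cycle = round(GATES * CHANGE_FRACTION)
    gate_changes = cycles * changes_per_cycle
    return {
        "cycles": cycles,
        "changes_per_cycle": changes_per_cycle,
        "gate_changes": gate_changes,
        "fanout_updates": gate_changes * AVERAGE_FANOUT,   # workload assumption
        "min_changes_per_second": gate_changes / REAL_TIME_LIMIT_SECONDS,
        "max_seconds_per_cycle": REAL_TIME_LIMIT_SECONDS / cycles,
        "slowdown": REAL_TIME_LIMIT_SECONDS / EMULATED_SECONDS,
    }


if __name__ == "__main__":
    for name, value in emulation_budget().items():
        print(f"{name:24s} {value}")
```

Under these figures the emulator has about 3 milliseconds of real time per emulated cycle, in which it must process some 300 gate changes.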
4.3 Software Configuration
The software consists of five major pieces. These pieces are:
1) General purpose support software
2) Model building software
3) Test generation software
4) Test execution software
5) Analysis software
The software, with the exception of some of the test execution software, some
of the general purpose support software, and some of the model building
software, shall run on the general purpose machine.
4.3.1 General Purpose Support Software
4.3.1.1 General Purpose Machine Operating System
4.3.1.1.1 The system shall feature a single, fully implemented operating
system that integrates all the hardware and software that comprise the
system. The operating system shall be generally available in the
marketplace. More specifically, all features and capabilities shall have been
publicly and formally announced and operational prior to the offer submission
deadline.
4.3.1.1.2 It is anticipated that the primary mode of operation will be a
single operator performing a single task. Examples would be a single graphic
station capturing a logic diagram or a simulation/emulation job running alone
in the system. However, the architecture of the system shall not preclude the
concurrent processing of a simulation/emulation run with the digitization
process. Nor should the architecture preclude the addition of a second
graphic work station in the future to operate concurrently with the original
graphic work station.
Allocation of resources to tasks shall be performed as automatically as
possible. All the software items specified throughout this document shall be
able to operate concurrently with any and all others, except for restrictions
such as momentary unavailability of an equipment resource.
4.3.1.1.3 The operating system shall provide a dynamic environment. That is,
memory management shall be done in such a manner that all concurrent running
jobs in total may require more memory than what is physically available. The
addition of more physical memory would improve the system's performance. This
capability shall be provided without requiring the programming staff to define
overlays.
4.3.1.1.4 The operating system shall have the ability to produce and retain
in mass storage for later processing, resource utilization data pertinent to
each task performed. The resource data produced shall include most of the
following by user account/charge:
1) Number of lines printed
2) Central processing unit usage
3) Input/output usage
4) Remote terminal connect time or traffic statistics
5) Actual memory used
6) Amount of mass storage used
Simplified measuring units such as the aggregate of the items above, shall be
reversible to the individual component level.
4.3.1.1.5 The operating system shall operate the following basic job origin
tasks concurrently:
1) Interactive
2) Local batch
4.3.1.1.6 The operating system shall provide for I/O spooling. Spooling of
local print output shall be provided for. Spooling is defined here as
providing a temporary file that will act as a buffer for spontaneous input or
output of data and thereby reduce impacts to executing programs waiting for
I/O services. Direct, i.e., non-spooled, I/O shall also be available for
time-critical transmissions.
4.3.1.1.7 Terminal users of the system shall be able to communicate with the
system operator via terminals and vice versa via the operator control
console. This is required because terminal users may or may not reside in the
same room as the system.
4.3.1.1.8 Interactive batch job submittal from time-sharing devices shall be
provided. The user shall be permitted to save files on either magnetic tape
or mass storage disk.
4.3.1.1.9 A terminal user shall be able to determine the status of a batch job.
4.3.1.1.10 The system response time to an interactive user system command
shall not exceed an average of two (2) seconds. The absolute maximum response
time shall not exceed thirty (30) seconds.
4.3.1.1.11 The system must provide for a job control language that allows the
user to override system defaults and parameters pertinent to job management,
job scheduling and data management. This provision shall provide control over
job priorities; job termination options; programmatic steps within a job
stream; job dispatching and execution, etc.
4.3.1.1.12 Security and system authorization. The system must limit access
to any and all installation resources, including files and data contained
therein. This facility will only allow processes to those users that are
pre-defined as authorized for access. Read and write permits must be features
within the data authorization scheme.
4.3.1.1.13 The operating system shall be considered to be state-of-the-art.
That is, the operating system shall have been designed, developed and
implemented to support an environment of concurrent interactive and batch
jobs. The system being specified in this document is to be utilized in the
evaluation and testing of fault tolerant airborne avionic computers of the
future. When consideration is given to this fact and the fact that the
environment is one of new technological development, it is prudent and
reasonable that only the best available resources and tools should be made
available for the project.
4.3.1.2 User Oriented System Software Components
4.3.1.2.1 A file editor shall be provided with the following minimum
capabilities:
1) With the exception of binary object files and files written by
FORTRAN as unformatted, be able to manipulate any and all files used
by the system.
2) Must be available interactively and optionally be available through
the batch mode.
3) Must contain, as a minimum, the following, or equivalent capabilities:
1. Replace String
2. Change line
3. Delete
4. Print/List
5. Search (forward and backward)
6. Insert
7. Add
4) Must provide the user with the view that his whole file is
immediately available to him, that is, he must not have to
specifically fill and empty the current edit buffer.
4.3.1.3 General Purpose Programming Languages
4.3.1.3.1 A FORTRAN compiler that minimally meets the ANSI X3.9-1966
specifications shall be provided. The delivered compiler must be stable and
thoroughly debugged.
4.3.1.3.2 An ASSEMBLER or hardware level compiler shall be provided which
possesses features not available in the high level programming languages
required under 4.3.1.3.1 and 4.3.2.2. Bit and character level manipulation,
privileged instructions, register referencing, and branching based on hardware
conditions shall be provided.
4.3.2.1 The relationships of the various models shall be as follows:
4.3.2.1.1 The environmental simulation shall execute in the general purpose
machine and shall simulate the effects of the systems, sensors and activators
which interface to the digital avionics computer(s). This would include such
items as attitude and rate sensors, attitude control activators, etc. This
model shall produce the identical effect as if the avionics computer(s) were
connected to actual devices in a real system.
4.3.2.1.2 Functional level machine simulation shall provide a model of the
behavior of the avionics computer(s). This simulation shall be at the
instruction level of the computer such that actual software may be executed by
the simulation with results identical to the real avionics computer(s). This
model shall interact with the environmental model, reacting to the inputs
provided by that model and producing the appropriate outputs to that model.
This model shall also interact with the gate level emulation(s) which are
active by providing the appropriate inputs and receiving the outputs of the
gate level emulation(s).
4.3.2.1.3 The gate level emulation shall provide a model of the behavior of a
portion of the digital avionics computer(s) at the gate level. It shall
interact with the functional level simulation by receiving inputs from that
simulation and by providing appropriate outputs to the simulation. This model
will correctly propagate inserted faults to its outputs.
4.3.2.1.4 The instruction level machine emulation is intended to interact
only with an environmental simulation. This model shall provide the
capability to emulate, on the emulation machine, a complement of digital
hardware at the instruction level. This is intended for gathering data
concerning performance and not for failure effects or reliability evaluation.
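The relationships in 4.3.2.1.1 through 4.3.2.1.3 can be illustrated with a toy closed loop: an environmental model supplies a sensor value, a functional level model computes a command, and one block of that computation is delegated to a gate level model into which a stuck-at fault may be inserted. Everything in the sketch (the class names, the one-dimensional "attitude", the two-gate control block) is a hypothetical stand-in chosen to keep the example small.

```python
from typing import Optional, Tuple


class EnvironmentModel:
    """Toy environment: an attitude error that actuator commands change."""

    def __init__(self, attitude: int) -> None:
        self.attitude = attitude

    def sense(self) -> int:
        return self.attitude

    def actuate(self, command: int) -> None:
        self.attitude += command


class GateLevelBlock:
    """Two AND gates; the drive_down output may be stuck at 0 or 1."""

    def __init__(self, stuck_drive_down: Optional[int] = None) -> None:
        self.stuck_drive_down = stuck_drive_down

    def step(self, nonzero: int, negative: int) -> Tuple[int, int]:
        drive_up = nonzero & negative
        drive_down = nonzero & (1 - negative)
        if self.stuck_drive_down is not None:
            drive_down = self.stuck_drive_down   # inserted fault overrides logic
        return drive_up, drive_down


class FunctionalModel:
    """Functional level computer model that hands one block to gate level."""

    def __init__(self, block: GateLevelBlock) -> None:
        self.block = block

    def step(self, sensor: int) -> int:
        up, down = self.block.step(int(sensor != 0), int(sensor < 0))
        return up - down


def run(initial_attitude: int, cycles: int,
        stuck_drive_down: Optional[int] = None) -> int:
    """Run the coupled models and return the final attitude."""
    env = EnvironmentModel(initial_attitude)
    computer = FunctionalModel(GateLevelBlock(stuck_drive_down))
    for _ in range(cycles):
        env.actuate(computer.step(env.sense()))
    return env.attitude
```

With no fault the loop drives the attitude error to zero; with the output stuck at zero the error is never corrected, which is the kind of propagated failure effect the gate level emulation is meant to expose.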
4.3.2.2 Model Description Translators
4.3.2.2.1 Environmental Model Translator
4.3.2.2.1.1 A translator shall be provided which will translate an
environmental description into executable simulation code on the general
purpose machine. The translator shall accommodate descriptions of outputs,
inputs, limits, etc., for sensors, activators and items external to the
avionics computer(s). The translator shall accommodate tagging items of
interest for later checking on limits during the simulation execution.
4.3.2.2.1.2 The environmental model execution code shall interface to and
provide input and output for the functional level computer simulator.
4.3.2.2.2 Functional Model Translator
4.3.2.2.2.1 A translator shall be provided to translate a functional or
instruction level description of one or more digital avionics computers into
executable simulation code on the general purpose machine.
4.3.2.2.2.2 The functional model translator shall use a hardware description
language which allows expression of the structure and behavior of digital
systems.
4.3.2.2.2.2.1 The hardware description language shall provide for expression
of timing and synchronization, both between internal elements and between the
system being described and the external environment.
4.3.2.2.2.2.2 The hardware description language shall provide for description
of the interface between the system being described and the external
environment in terms of inputs and outputs. This description shall provide
the tie to the executable code for the environmental model so the two models
will work together.
4.3.2.2.2.2.3 The hardware description language shall allow the description
of the system in terms of independent functional blocks and the interfaces
between those blocks. The translator shall produce code which allows the
replacement of the code for a functional block with something else which will
provide the same inputs and accept the same outputs without modifying the
model itself. This shall provide the link between the functional simulation
model and the gate level emulation model.
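The block-replacement requirement of paragraph 4.3.2.2.2.2.3 can be pictured with a minimal Python sketch. Every name in it (Block, FunctionalAdder, GateLevelAdder, Model) is hypothetical and chosen only for illustration; the specification does not prescribe an implementation language or interface. The sketch assumes each functional block exposes a single step operation over named inputs and outputs, so that a functional-level version and a gate-level version are interchangeable.

```python
from typing import Dict, Protocol


class Block(Protocol):
    """Hypothetical interface: every block maps named inputs to named outputs."""

    def step(self, inputs: Dict[str, int]) -> Dict[str, int]: ...


class FunctionalAdder:
    """Functional-level model of a 1-bit full adder (behavior only)."""

    def step(self, inputs: Dict[str, int]) -> Dict[str, int]:
        total = inputs["a"] + inputs["b"] + inputs["cin"]
        return {"sum": total & 1, "cout": total >> 1}


class GateLevelAdder:
    """Gate-level stand-in for the same block, built from XOR/AND/OR gates."""

    def step(self, inputs: Dict[str, int]) -> Dict[str, int]:
        a, b, cin = inputs["a"], inputs["b"], inputs["cin"]
        x1 = a ^ b
        return {"sum": x1 ^ cin, "cout": (a & b) | (x1 & cin)}


class Model:
    """Holds blocks by name; a block can be swapped without touching the model."""

    def __init__(self) -> None:
        self.blocks: Dict[str, Block] = {}

    def replace(self, name: str, block: Block) -> None:
        self.blocks[name] = block

    def run(self, name: str, inputs: Dict[str, int]) -> Dict[str, int]:
        return self.blocks[name].step(inputs)
```

Because the model refers to a block only by name and interface, replacing FunctionalAdder with GateLevelAdder changes nothing else in the model, which is the property the paragraph asks the translator output to preserve.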
4.3.2.2.3 Gate Level Model Translator
4.3.2.2.3.1 The gate level model translator shall translate system logic
diagrams to the gate level tables needed by the gate level emulation algorithm.
4.3.2.2.3.2 The gate level model translator system shall include all
necessary interactive graphics software to operate on the general purpose
computer for capturing and validating logic diagrams. Standard logic symbols
shall be used for:
1) Inverter
2) AND gate
3) OR gate
4) XOR (exclusive OR) gate
5) NAND (not AND) gate
6) NOR (not OR) gate
7) RS flip-flops
8) T flip-flops
9) D flip-flops
10) JK flip-flops
11) other logic devices.
The software capability to capture legend and associate the legend with each
gate or device symbol shall be provided. The software shall provide for
multiple logic diagram sheets for a single function to be emulated. That is,
a single logic diagram up to 34" X 44" in size will not always represent an
entire function to be emulated as a complete unit. Yet each separate sheet
must be stored on disk as a subunit for output on the electrostatic plotter
for the validation process. Off diagram linkages to other sheets must be
provided for. A single "E" size drawing 34" X 44" will represent
approximately 1500 gates.
4.3.2.2.3.3 The translator shall translate the logic diagrams captured via
the graphics system to produce the gate level tables required by the gate
level emulation algorithm on the emulation computer. Each functional block,
corresponding to the functional blocks of the functional level simulation
model, shall be in a separate table, identifiable with the corresponding
functional level block.
4.3.2.2.3.4 A language translator shall also be provided which allows
description of gates and their interconnections in a purely textual manner.
The output of this translator shall be identical to and compatible with the
graphics input translator.
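As an illustration of the textual route in 4.3.2.2.3.4, the following Python sketch translates a purely textual gate description into a table. The line syntax ("output = GATE(input, ...)"), the set of gate names, and the table layout are all assumptions made for this example; the specification does not define the description language or the table format.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical line format: "<output> = <GATE>(<input>, <input>, ...)"
_LINE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*\(([^)]*)\)\s*$")
_KNOWN = {"NOT", "AND", "OR", "XOR", "NAND", "NOR"}


def parse_netlist(text: str) -> Dict[str, Tuple[str, List[str]]]:
    """Translate a textual gate description into a table keyed by output name."""
    table: Dict[str, Tuple[str, List[str]]] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        out, gate, args = match.group(1), match.group(2).upper(), match.group(3)
        if gate not in _KNOWN:
            raise ValueError(f"line {lineno}: unknown gate type {gate!r}")
        if out in table:
            raise ValueError(f"line {lineno}: {out!r} is driven twice")
        inputs = [name.strip() for name in args.split(",") if name.strip()]
        table[out] = (gate, inputs)
    return table
```

A graphics-based front end could emit the same table structure, which is one way to meet the requirement that both translators produce identical, compatible output.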
4.3.2.2.4 Instruction Level Emulation Translator
4.3.2.2.4.1 The instruction level emulation translator shall provide the same
functions as the functional model translator specified in 4.3.2.2.2 except
that the executable code to which the description is translated shall be code
for the emulation computer rather than the general purpose computer.
4.3.2.2.4.2 The instruction level emulation translator shall use a hardware
description language but there is no requirement for partitioning into
functional blocks. The translator shall provide the interfaces to the
environmental model simulation running in the general purpose computer.
4.3.2.2.4.3 The instruction level emulation translator shall provide a code
generation description output which may be input to a retargetable software
translator such as a compiler or assembler which will translate the software
to drive the described system. The description shall provide instruction
formats, machine code descriptions and any other data necessary.
4.3.2.3 Link Software. Any software necessary to link the various models and
allow them to communicate shall be provided.
4.3.2.4 Model Debug Packages
4.3.2.4.1 Debug packages shall be provided for each type of model.
4.3.2.4.2 The debug packages shall support interactive control and display of
actual system parameters and modeled system components.
4.3.2.4.2.1 The debug packages shall support control of each model including
as a minimum:
1) Start
2) Stop
3) Single Step
4) Trace
5) Breakpoints (minimum of 16)
6) Item value change trace
7) Continue after break
8) Interactive modification of values
4.3.2.4.2.2 The debug package shall support display of both actual and
modeled systems items including as a minimum:
1) memory
2) registers
3) emulated gates
4) external inputs and outputs (environmental simulation)
5) processor state
6) time
4.3.2.4.2.2.1 The items to be displayed shall be specifiable by the operator
including display device and format.
4.3.2.4.2.2.2 Items shall be capable of being tagged for display in response
to system events such as:
1) breakpoint
2) interrupt
3) user command
4) trace
5) single step
4.3.3 Test Generation Software
Software for developing test scenarios and fault insertion shall be provided.
4.3.3.1 Test Scenario Software
4.3.3.1.1 The test scenario software shall include the capability to specify
a sequence of runs, perhaps with differing parameters, which will subsequently
be run automatically by the system.
4.3.3.1.2 The test scenario software shall include the capability to specify
initial values of all external inputs, simulated devices and internal state,
including time, of the test system. This shall include the capability to load
a configured system which has been previously stored on a storage device.
4.3.3.1.3 The test scenario software shall include the capability to specify
specific data to be collected, the format of the data and the event in
response to which the data shall be recorded.
4.3.3.2 Fault Insertion Generation
4.3.3.2.1 The fault insertion generation software shall provide the
capability for either automated or manual generation of faults to be inserted
in the gate level emulation.
4.3.3.2.2 The fault insertion generation software shall produce the following
information concerning each fault to be inserted:
1) Gate identifier to receive the fault
2) The simulation/emulation run time at which the fault is to be applied
in terms of sample number and fraction of time within the sample.
3) The duration of the fault
4) The fault state that is to be introduced, i.e., steady zero state,
steady one state, intermittent zero state, intermittent one state,
or alternating between the zero and one state.
4.3.3.2.3 The manual fault generation software shall allow the analyst to
specify all of the factors for each fault as given in 4.3.3.2.2.
4.3.3.2.4 The automated fault generation software shall allow the analyst to
specify the following:
1) Number of faults to be generated
2) Specific system portion of interest or probability distribution of
faults across the system
3) Probability distribution of faults over time for each sample
4) Probability distribution of type and duration of faults.
4.3.3.2.4.1 The automated fault generation software shall produce the data
specified in 4.3.3.2.2 through the use of random number generators to provide
the desired distributions.
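A minimal Python sketch of automated fault generation along the lines of 4.3.3.2.2 through 4.3.3.2.4.1 is shown below. The record fields follow the four items listed in 4.3.3.2.2; everything else is an assumption made for illustration, including the names (Fault, generate_faults), the uniform choice of gate and sample, the exponential duration, and the use of seconds as the duration unit. An actual system would substitute the distributions supplied by the analyst.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence

FAULT_STATES = (
    "steady_zero", "steady_one",
    "intermittent_zero", "intermittent_one", "alternating",
)


@dataclass(frozen=True)
class Fault:
    gate_id: int      # gate to receive the fault
    sample: int       # sample number in which the fault is applied
    fraction: float   # fraction of the sample elapsed at insertion, 0 <= f < 1
    duration: float   # fault duration (unit assumed here: seconds)
    state: str        # one of FAULT_STATES


def generate_faults(
    count: int,
    gate_ids: Sequence[int],
    samples: int,
    state_weights: Sequence[float],
    mean_duration: float,
    seed: int = 0,
) -> List[Fault]:
    """Draw `count` faults; the distributions used here are illustrative only."""
    rng = random.Random(seed)  # seeded so a fault list can be regenerated
    faults = []
    for _ in range(count):
        faults.append(Fault(
            gate_id=rng.choice(gate_ids),                    # uniform over gates of interest
            sample=rng.randrange(samples),                   # uniform over samples
            fraction=rng.random(),                           # uniform within the sample
            duration=rng.expovariate(1.0 / mean_duration),   # exponential duration
            state=rng.choices(FAULT_STATES, weights=state_weights)[0],
        ))
    return faults
```

Seeding the generator is a design choice of this sketch: it lets a long run be repeated with exactly the same fault list, which helps when a result has to be reproduced.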
4.3.3.2.5 The fault generation software shall produce the information
specified in 4.3.3.2.2 in such a way that the simulation execution system will
use it to insert faults at the specified time in the specified sample in the
gate level emulation.
4.3.3.3 Test Driver Software Generation
4.3.3.3.1 The capability shall be provided for translating software for the
target (emulated/simulated) machine on the general purpose computer. The
translated software shall be used to drive the test system during the
execution phase.
4.3.3.3.2 It is highly desirable that the translation process be entirely
automated, taking as input the source code and a description of the machine
for which code is to be generated and then producing object code for that
machine. Translators which operate in this mode are often referred to as meta
assemblers or meta compilers.
4.3.3.3.3 It is highly desirable to have a meta compiler delivered to satisfy
this requirement. However, given the current state-of-the-art, a meta
compiler with retargetable code generator is not available. As a minimum a
meta assembler is required.
4.3.3.3.3.1 The meta assembler will have as one input a description of the
instruction formats, operation codes and addressing modes of the machine for
which code is to be generated. This input shall be produced from the hardware
description language specified in 4.3.2.2.2, augmented as necessary for this
particular task.
4.3.3.3.3.2 If a meta compiler is proposed, it shall have the same
requirements as specified for the meta assembler in 4.3.3.3.3.1.
4.3.4 Test Execution Software
Software for executing the given test shall be provided. This software shall
provide for the control of all the simulation and emulation models during a
test.
4.3.4.1 The execution software shall provide for coordination of timing
between the various models so that they are all synchronized in relation to
simulated time.
4.3.4.2 The execution software shall also coordinate the execution of the
simulations and emulations in the following manner for tests in which fault
insertion is used.
4.3.4.2.1 The execution software shall execute the functional level model and
the environmental model until the time at which a fault is to be inserted.
4.3.4.2.2 When the fault is to be inserted, the execution software shall
cause the execution of the functional block in which the fault is inserted to
switch from the functional level simulation to the gate level emulation with
the inserted fault. The balance of the simulated system, without the faults,
will continue at the functional level.
4.3.4.3 The execution software shall provide for the data recording specified
under the test scenario software in reaction to the events specified.
4.3.4.4 For reliability analysis, the following data collection shall be
provided.
4.3.4.4.1 At the start of a sample, sample number, initial conditions
(external inputs and outputs and internal system state) and all other
pertinent information shall be recorded.
4.3.4.4.2 At the time of insertion of a fault, the sample number, system
inputs and outputs and internal system state, and all the information
concerning the fault shall be recorded.
4.3.4.4.3 At any time during the run, whenever any of the inputs or outputs
exceed the limits specified under the test scenario software, the sample
number, simulated time, inputs, outputs, internal system state, and value out
of limits shall be recorded.
4.3.4.4.4 As the test execution software may very well run for hours or even
days, it is absolutely mandatory that automatic check-point restart capability
be provided.
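One way to picture the check-point restart requirement of 4.3.4.4.4 is the Python sketch below. It is an illustration under stated assumptions, not the specified design: the state is assumed to fit in a single dictionary, the function names and the checkpoint interval are hypothetical, and Python's pickle module stands in for whatever storage format an implementation would use.

```python
import os
import pickle
import tempfile
from typing import Any, Callable, Dict


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    """Write the state to a temporary file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp, path)  # rename last, so a crash never leaves a half-written file


def load_checkpoint(path: str) -> Dict[str, Any]:
    with open(path, "rb") as handle:
        return pickle.load(handle)


def run_with_checkpoints(
    path: str,
    total_samples: int,
    run_sample: Callable[[int, Dict[str, Any]], None],
    every: int = 100,
) -> Dict[str, Any]:
    """Run samples in order, resuming from the last checkpoint if one exists."""
    state = load_checkpoint(path) if os.path.exists(path) else {"next_sample": 0}
    while state["next_sample"] < total_samples:
        run_sample(state["next_sample"], state)
        state["next_sample"] += 1
        if state["next_sample"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

After an interruption, calling run_with_checkpoints again with the same path resumes at the first sample not covered by the last saved checkpoint, so at most `every` samples are repeated.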
4.3.4.5 The execution software shall operate without user intervention but
shall allow the user to stop the execution and save the system state for later
reload.
4.3.4.6 The execution software shall provide support for all of the model
debug packages specified in 4.3.2.4. It shall allow execution of the total
system in debug mode.
4.3.4.7 The execution software shall include the gate level emulation
algorithm.
4.3.4.7.1 The gate level emulation algorithm shall be table driven, using the
gate tables produced by the translator specified in paragraph 4.3.2.2.3.
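The Python sketch below shows, in general terms, what "table driven" can mean for gate level emulation. It is a generic event-driven evaluator written for illustration and is not the NASA Langley algorithm referenced elsewhere in this document. The table layout, the names (GateTable, build_fanout, settle), and the restriction to combinational logic without feedback are all assumptions of the sketch. A steady zero or steady one fault is modeled by forcing a gate's output.

```python
from typing import Dict, List, Optional, Tuple

# Gate table: one row per signal, (gate type, indices of input signals).
# "IN" rows are primary inputs and have no input indices.
GateTable = List[Tuple[str, Tuple[int, ...]]]

_EVAL = {
    "NOT": lambda v: 1 - v[0],
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "XOR": lambda v: sum(v) & 1,
    "NAND": lambda v: 1 - int(all(v)),
    "NOR": lambda v: 1 - int(any(v)),
}


def build_fanout(table: GateTable) -> List[List[int]]:
    """For each signal, list the gates that read it."""
    fanout: List[List[int]] = [[] for _ in table]
    for gate, (_, inputs) in enumerate(table):
        for src in inputs:
            fanout[src].append(gate)
    return fanout


def settle(
    table: GateTable,
    state: List[int],
    changed: List[int],
    stuck: Optional[Dict[int, int]] = None,
) -> int:
    """Propagate changes from `changed` signals; return gate evaluations done.

    `stuck` maps a gate index to a forced output (a steady zero or one fault).
    Assumes combinational logic without feedback so that propagation terminates.
    """
    stuck = stuck or {}
    fanout = build_fanout(table)
    work = [g for s in changed for g in fanout[s]]
    evaluations = 0
    while work:
        gate = work.pop()
        kind, inputs = table[gate]
        value = stuck.get(gate, _EVAL[kind]([state[i] for i in inputs]))
        evaluations += 1
        if value != state[gate]:
            state[gate] = value
            work.extend(fanout[gate])  # only gates fed by a changed signal are revisited
    return evaluations
```

Only gates downstream of a changed signal are evaluated, so the work per emulated cycle depends on how many gates change and on their fan-out rather than on the total gate count.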
4.3.4.7.2 The gate level emulation algorithm shall be able to emulate at
least a 6000 gate system.
4.3.4.7.2.1 The maximum slow down factor for the algorithm operating on a
6000 gate system, assuming an average gate fan out of 2, 5% of gates changing
value in any one emulated machine cycle, shall be 3000 times slower than real
time. This timing should be based on a 1 microsecond emulated machine cycle
time.
4.3.4.7.2.1.1 This requirement is based on the results of the feasibility
study (Attachment I).
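The figures in 4.3.4.7.2.1 imply a rough time budget per gate evaluation, worked through in the Python sketch below. The function name is hypothetical, and the calculation rests on one assumption that the paragraph does not state: that each gate changing value forces exactly one evaluation of every gate it drives.

```python
def per_evaluation_budget_us(
    gates: int = 6000,
    fraction_changing: float = 0.05,
    fan_out: int = 2,
    slow_down: int = 3000,
    cycle_us: float = 1.0,
) -> float:
    """Host time available per gate evaluation, in microseconds.

    Assumes every changing gate forces one evaluation of each gate it drives.
    """
    host_time_per_cycle_us = slow_down * cycle_us   # 3000 us, i.e. 3 ms per emulated cycle
    changing = gates * fraction_changing            # 300 gates change per cycle
    evaluations = changing * fan_out                # 600 evaluations per cycle
    return host_time_per_cycle_us / evaluations
```

With the default values this works out to about 5 microseconds of host time per gate evaluation, which gives a feel for the microcode speed a candidate emulation computer would need.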
4.3.5 Analysis Software
Software to support the reduction of the data gathered during test execution
shall be provided. This software shall provide for data reduction,
statistical analyses and reliability modeling.
4.3.5.1 The data reduction software shall allow the analyst to group common
data and reduce it to necessary components, using the data recorded during the
test execution phase. It shall allow the extraction of items deemed important
for a particular use on an individual basis by the analyst.
4.3.5.2 The statistical analysis software shall provide the capability to
calculate statistical parameters from the reduced data produced above. The
following capabilities shall be provided at a minimum:
1) Matrix manipulation
Real Matrices
Complex Matrices
Eigen values, Eigen vectors
2) Ordinary differential equations
3) Regression analysis
4) Time series analysis
5) Variance analysis
6) Interpolation
7) Numerical integration
8) Differentiations
9) Polynomial manipulations
4.3.5.3 The reliability modeling software shall provide the capability to
develop parameterized, unified reliability models of the system of interest.
4.3.5.3.1 The reliability model shall be capable of using the statistical
data produced from actual test execution as an input in place of predicted or
expected parameters.
4.3.5.3.2 The reliability model software shall support development of models
for fault tolerant, multiple processor systems.
APPENDIX A
Hardware Composition Trade Study
Table of Contents
I. Introduction
II. Methodology
III. Evaluation
IV. Final Recommendation
I. Introduction
This trade study was done to determine the best approach to the hosting of
the various pieces of software needed by the digital avionics design and
reliability analyzer. The trade study was designed to answer the question:
"should the facility be based on an emulator-only system or emulator/support
machine system?". The emulator/support system envisions an emulation machine
connected to a general purpose computer. The general purpose computer supports
most of the software, with the emulator supporting only actual emulations. In
the emulator-only system, the emulator must support everything.
II. Methodology
To perform the trade study the following 5 criteria were established:
1) Operation speed
2) User interface
3) Difficulty of use
4) Cost of implementation
5) Size of facility needed
These criteria were then rank ordered in order of importance (as shown in
the list above) and assigned weights of 5 to 1 with 5 being the most important
(operation speed).
Each alternative was then evaluated for its satisfaction of each criterion
on a scale of 1 to 10 with 10 being most satisfactory and 1 the least. We
then multiplied the satisfaction by the criterion weighting to obtain the
weighted ranking. Weighted rankings for each criterion were then added to
give a total for each alternative, with the higher score reflecting the "best"
choice. The results are shown in Table A-1.
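The weighted ranking described above can be sketched in a few lines of Python. The criterion weights follow the rank order given in the text (5 for operation speed down to 1 for size of facility); the dictionary keys and the function name are hypothetical, and no scores from the study are embedded in the code.

```python
from typing import Dict

# Weights 5..1 in the rank order given in the text.
WEIGHTS: Dict[str, int] = {
    "operation_speed": 5,
    "user_interface": 4,
    "difficulty_of_use": 3,
    "cost_of_implementation": 2,
    "size_of_facility": 1,
}


def weighted_total(scores: Dict[str, int]) -> int:
    """Sum of (satisfaction score x criterion weight) over all criteria."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five criteria")
    if not all(1 <= s <= 10 for s in scores.values()):
        raise ValueError("scores must lie on the 1 to 10 scale")
    return sum(WEIGHTS[name] * score for name, score in scores.items())
```

Because operation speed carries the largest weight, a one-point difference on that criterion moves the total by 5, whereas a one-point difference on size of facility moves it by only 1.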
Table A-1 Hardware Composition Trade Study
[Table body illegible in source; rows: Emulator Only, Emulator/Support.]
III. Evaluation
1) Operation Speed - The feasibility study indicated that the primary
limiting factor for the avionics design and reliability analyzer in
the gate level emulation mode is the speed of the gate level
emulation. In the emulator/support case, the support computer
removes the burden for support of environmental simulation etc., from
the emulator. Thus this combination rates 6. A more parallel system
could get a higher score. The emulator only system rates a 3.
2) User Interface - The user interface is one of the key items in the
use of the facility. If this interface is poor, there will be a
reluctance to use the system. General purpose support machines have
the user interface as one of their most visible portions and hence,
modern operating systems have attempted to provide for a flexible
interface. Emulators, on the other hand have a much narrower
applicability and hence less attention is paid to such "mundane"
factors.
3) Difficulty of Use - This relates not only to the user interface, but
also to the operating system backing it up. The emulator/support
combination can use the general purpose operating system to hide the
tedious details of interaction with the emulation machine which is
attached to it. In the emulator-only case, the user usually has to
explicitly deal with the details of the emulation machine.
4) Cost of Implementation - The emulator/support system represents an
increase in hardware cost over the emulator-only system. Comparable
software needs to be developed in both cases, with the exception of
the additional driver software necessitated by the emulator/support
interface.
5) Size of Facility Needed - There is no clear indicator that either
choice represents a better possibility here.
IV. Final Recommendation
Based on the established criteria, the emulator/support system is
recommended.
APPENDIX B
Microprogrammable Computer Trade Study
Table of Contents
I. Introduction
II. Microprogrammable Computer Architecture
III. Requirements
IV. Computer Search
V. Computer Performance Analysis
VI. Conclusion
I. Introduction
The following trade study was done to determine which microprogrammable
computers would best serve as the emulator portion of the digital avionics
design and reliability analyzer. First a search was done to find all
available user-microprogrammable machines. These were then analyzed to
determine which ones met the requirements for implementing the NASA Langley
gate-level emulation algorithm. The machines which met the requirements were
then compared concerning performance and price. A select few were recommended
as candidates for the emulator portion of the digital avionics design and
reliability analyzer.
II. Microprogrammable Computer Architecture
A microprogrammable computer is one whose microcode can be changed by the
user. Microcode, which is stored in control store, consists of
microinstructions which control the primitive operations of the computer. A
complex operation performed by a computer can be represented as a sequence of
microoperations. There are three types of microcode: vertical,
allowing one operation per instruction; diagonal, allowing one or more; and
horizontal, allowing many operations per instruction, thus increasing
processing speed.
III. Requirements
There are a number of requirements that must be met by a user-
microprogrammable computer in order to implement the NASA Langley gate-level
algorithm. These requirements are based on a feasibility study implementation
of the algorithm using the Nanodata QM/1 computer.
1. The microcode controlling the machine shall be user-programmable
through software.
2. The microcode shall provide for parallel operations within a single
microword.
3. Control shall be directed from main store via a gate info word of at
least 8 bits to micro store using a vector mechanism. The gate info
word contains the address of the location in micro store to which
control is directed.
4. Control store shall contain at least one thousand words.
5. Main memory shall be sufficient to handle the algorithm, at least 32
thousand words.
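The five requirements lend themselves to a simple screening check, sketched below in Python. The Machine record and the function name are hypothetical, and the machine data used with it would have to come from the manufacturers' documentation. The sketch reads "one thousand" and "32 thousand" literally as 1000 and 32000; an implementation could equally use 1024 and 32768.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical record of the properties the five requirements test."""
    name: str
    user_microprogrammable: bool   # requirement 1
    parallel_microword: bool       # requirement 2
    vector_bits: int               # requirement 3: gate info word width, 0 if no vector mechanism
    control_store_words: int       # requirement 4
    main_memory_words: int         # requirement 5


def unmet_requirements(m: Machine) -> list:
    """Return the numbers of the requirements the machine fails (empty if none)."""
    checks = {
        1: m.user_microprogrammable,
        2: m.parallel_microword,
        3: m.vector_bits >= 8,
        4: m.control_store_words >= 1000,
        5: m.main_memory_words >= 32000,
    }
    return [number for number, passed in checks.items() if not passed]
```

Returning the list of failed requirement numbers, rather than a single pass/fail flag, keeps the reason for rejecting a machine visible in the result.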
IV. Computer Search
Various references were investigated in order to find names of all
companies manufacturing minicomputers which are microprogrammed. The
following sources were used: Auerbach Publishers, Inc., Data Pro Information
Services, Electronic Buyers' Guide 1980, NASA Recon Data Base (remote
console), Defense Technical Information Center, and the Lockheed DIALOG data
base. This search resulted in the list of companies shown in Table B-1. Each
company was then contacted and asked which, if any, of their minicomputers
were user microprogrammable and could function as emulators. The list of
computers shown in Table B-2 resulted from these inquiries.
Table B-1. Computer Manufacturers Surveyed
MANUFACTURER                   COMMENT
1) Burroughs                   See Table B-2
2) Cado Systems                Word length limited to 8 bits
3) Control Data Corp.          See Table B-2
4) Data General                See Table B-2
5) Digital Equipment Corp.     See Table B-2
6) Digital Scientific          See Table B-2
7) Hewlett Packard             See Table B-2
8) Honeywell                   See Table B-2
9) Nanodata Corp.              See Table B-2
10) Northrop Data Sys.         Nothing user microprogrammable
11) Microdata                  Nothing user microprogrammable
12) Ohio Scientific            Nothing user microprogrammable
13) Perkin-Elmer               See Table B-2
14) Prime Computer Inc.        Only limited information available
15) Rolm                       Machine too small; C.S. too small
16) Sperry Univac              See Table B-2
17) Systems Engineering Lab    See Table B-2
V. Computer Performance Analyses
Each computer was then examined to determine whether or not it would meet
the requirements determined by the study done on the QM/1 using the NASA
Langley Research Center gate-level emulation algorithm. All machines met both
the 32K main store requirement as well as the 1K control store requirement.
The following analyses discuss the operation of each machine in relation to
Requirements 1, 2, and 3.
1.0 Burroughs B1800 or B1900 Series
In this computer cache memory (2K words) is used as control store; it is
possible to store all of the microcode in the cache memory. A pipelined
processor permits fetching, decoding, and executing microinstructions to be
performed separately and concurrently thus compensating for the limited
capability of the 16-bit microcode. Memory addressing at the hardware and
microcode level is accomplished through a 24-bit field address register that
can directly address 16,777,215 bits as though they were a continuous string.
Up to 24 bits can be processed in one operation taking 167 ns. Optional port
interchange enables independent rather than processor-dependent access to main
store by such devices as the multi-line data communications control. The
18-bit A register contains the absolute "S" or Main Memory address of the
microinstruction to be executed.
This machine would be a suitable candidate.
2.0 Control Data Cyber 18
The CDC Cyber 18 was designed to emulate the CDC 1700 Series. The
microprocessor contains 2K to 4K of 32-bit user programmable microcode. One
type of micro memory consists of 512 words of read/write memory and/or 1K
words of read only memory; the other type contains 2K of read/write memory.
Each 32-bit microinstruction is divided into five main sections each
performing a different operation in parallel with the others. The
microprocessor controls the machine at all times. The process of decoding a
macroword in main store determines the address of the micro routine which is
called.
The read/write random access memory (RAM) can either be loaded from an
external device or data can be written into micro memory under control of the
micro program.
Since this machine does have sufficient control store of parallel
microcode and uses a vector mechanism to transfer control from main store to
micro store it would be a candidate.
3.0 Data General Eclipse
The control store of the Data General Eclipse contains 2K 56-bit words of
parallel microcode. Each microinstruction is divided into 15 micro fields
which can be grouped according to the purpose they serve. A word in control
store is addressed by the 12-bit output of the state change logic which is
determined by the contents of the True Address bus or the False address
field. In order to start main memory the CPU places an address on the logical
address (LA) bus and issues a start signal to memory. Only the module
containing the memory location addressed responds to the signal.
microprograms in symbolic form and assemble them to produce a binary object
file. The microloader is then used to load the object files.
This machine contains the vector mechanism to address the microcode and
has flexible, parallel microcode so it would be a candidate.
4.0 DEC - VAX 11/750
The VAX 11/750 contains 6K of 80-bit microcode. A single microinstruction
can perform many operations in parallel. The VAX 11/750 was designed as an
emulator for the VAX architecture and contains 1K of user control store.
Emulation starts with one micro-order called the BUT/IRDI. This signals the
beginning of the next VAX machine instruction. In the micro-code which
emulates each VAX instruction, this micro-order is present in the last
microinstruction. Access to the user control store is by the opcode called
"FC" in the VAX instruction stream. This opcode results in a branch to a
location in user control store. From this point on, user microcode has
control of the micromachine. Control can then be returned to the VAX
emulation by means of the BUT/IRDI micro-order.
There are a number of features which support user microprogramming: the
data path which includes 18 general purpose 32-bit scratch pad registers, 8 of
which have ports to both the RBUS and the MBUS; the super rotator, which
allows very efficient (in hardware) bit picking operations; and a flexible
ALU. The microsequencer supports general microprogramming in three important
ways; conditional branching, loop control, and subroutine control. The VAX
11/750 has six independent flag bits, four of which are always available for
user microprogramming and two of which are conditionally available. There is
a 5-bit step counter which can be initialized to any arbitrary value
(0 <= X <= 30). For subroutine control, a 16-deep microstack is available for
nested subroutine calls.
This computer does have the required horizontal microcode as well as an
opcode resulting in a branch to user control store so would be a good
candidate.
5.0 DEC - VAX 11/780
The VAX 11/780 contains 1K of 96-bit user control store which is available
primarily for augmenting the speed and power of the basic machine. It is,
however, possible to access 4K of ROM containing the operation and sequencing
of the central processing unit. The architecture and operation of the VAX
11/780 is similar to the VAX 11/750 as far as the requirements of this
contract are concerned.
This machine would be a good candidate.
6.0 Digital Scientific META 4
This machine is designed to be an adjunct processor to a main CPU. One
possible application is as an I/O processor. The microcode instruction set is
very structured, 32 bits long. Typical predefined instructions include load
from control store, move register to register, etc. Control store size is
limited. Microcode can also read from "main store" via a request, wait
protocol.
Microcode operation is not started via a vectored operation and in
general, this "microprogrammable" machine is typical of a mini computer
without microprogrammability.
This computer will not provide the capabilities necessary for our
purposes. Microcode execution is not started via an opcode type operation,
necessitating bit decoding in the implementation of the algorithm. The
machine does not have ready access to the larger main store which would be
necessary to hold gate tables in the algorithm. Finally, the instruction set
looks like a mini computer instruction set and is not flexible enough to do
[remainder of sentence illegible in source].
7.0 Hewlett Packard 1000 E/F Series
The HP 1000 E/F Series has 50K of user addressable 24-bit microcode in
control store with access to 12 scratch pad registers. There are four word
types of microcode with up to five micro-orders each. Each micro-order
defines one or more operations to be performed by the computer.
The control processor, part of the CPU, is always in control of the
computer, and the base set microroutines cause the read operations to occur
for all instructions and data from main memory. All 16-bit instructions are
placed in the Instruction Register (IR) and decoded. The process of decoding
the IR bits determines which control memory address (which microprogram) is
called by the instruction received from main memory. Control memory module
selection is determined by the value of bits 8 through 4 in the Instruction
Register. These bits help determine the address of branches in the control
memory base set Primary Mapping Table, which in turn directs a branch to the
desired module.
There is a micro programming support software package consisting of the
following:
• RTE Microassembler Program
• RTE Microassembler Cross-Reference Generator Program
• RTE Microdebug Editor Program
• RTE Microdebug Editor Subroutine
• RTE Driver DVR36
• WCS I/O Utility Routine WLOAD
• PROM Tape Generator Program
The microcode may be loaded into writable control store (WCS) modules or
may be permanently fused in programmable read-only memory (PROM) chips.
This machine contains horizontal microcode as well as the necessary vector
mechanism so would be a good candidate.
8.0 Honeywell Level 6
The Honeywell Level 6 contains up to 2K 64-bit words in its writable
control store. Each 64-bit word is divided into four 16-bit segments each of
which can be loaded with a separate instruction. Thus, one word may perform
four parallel operations. Control is transferred from the CPU to the writable
control store by causing the CPU to issue a megabus cycle (I/O write)
addressed to the WCS. This operation is performed by the native firmware
whenever the first word of an instruction lies in the range 0080 hexadecimal
through 00BF hexadecimal (64 bits). The location to which control is
transferred is one of the first 16 locations in the WCS; the specific location
is identified by the least significant hexadecimal digit of the instruction
word.
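The vectoring rule described above can be written out as a short Python sketch. The function and constant names are hypothetical; the range and the use of the least significant hexadecimal digit are taken from the text. Note that the range 0080 through 00BF spans 64 instruction-word values, which map onto the 16 entry locations four to one.

```python
from typing import Optional

WCS_LOW, WCS_HIGH = 0x0080, 0x00BF  # instruction-word range quoted in the text


def wcs_entry(first_word: int) -> Optional[int]:
    """Return the WCS entry location (0-15) for an instruction word, else None."""
    if not WCS_LOW <= first_word <= WCS_HIGH:
        return None  # outside the range: no transfer of control to the WCS
    return first_word & 0xF  # least significant hexadecimal digit
```

For the gate-level algorithm, this kind of mechanism is what Requirement 3 calls a vector: a value fetched from main store selects the microcode entry point directly, without bit-by-bit decoding.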
There is a WCS assembler available to assemble firmware routines as well
as a loader to load the assembled routines into the WCS. A microcode analyzer
is available to selectively display pertinent CPU and WCS information for
debugging microprograms.
Due to the horizontal microcode and the vectoring effect transferring
control from the CPU to the microcode, this machine would be a candidate.
9.0 Nanodata QM/1
The Nanodata QM/1 is unique in that it is specifically designed to emulate
other computers. There are two levels of microprogramming with the lower
level called nanoprogramming. The top level microprogram is an 18-bit
vertical microcode having many of the characteristics of an assembly
language. The lowest level microcode is a 360-bit horizontal word (144 bits
of which are active at any one time) which interprets the higher level
microcode. The identification of the nanoword which interprets a given
microinstruction is determined by 7 bits in the 18-bit microword itself and a
3 bit page indicator in a CPU store register, giving a total of 10 bits of
address to cover the 1024 words of nanostore.
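The nanostore addressing arithmetic can be checked with a small Python sketch: a 3-bit page and a 7-bit field together give 10 bits, and 2 to the 10th power is 1024. The function name is hypothetical, and placing the page in the high-order bits is an assumption of this sketch; the text does not say how the two fields are combined.

```python
def nanostore_address(page: int, opcode_field: int) -> int:
    """Combine a 3-bit page and a 7-bit microword field into a 10-bit address.

    Placing the page in the high-order bits is an assumption of this sketch.
    """
    if not 0 <= page < (1 << 3):
        raise ValueError("page must fit in 3 bits")
    if not 0 <= opcode_field < (1 << 7):
        raise ValueError("field must fit in 7 bits")
    return (page << 7) | opcode_field
```

Every combination of page and field yields a distinct address, so the 8 pages of 128 entries cover all 1024 nanostore words exactly once.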
The control store limit is 40K words. For the Langley algorithm, the
algorithm would be coded in nanocode, using control store to provide the
vector into the proper nanoword and to hold the gate state information. Based
on its architecture and the actual implementation of the Langley algorithm for
the QM/1 under the feasibility study, the QM/1 is a suitable candidate.
10.0 Perkin Elmer 3320
The Perkin Elmer 3320 contains 2K 32-bit words of writable control store.
WCS is addressable through ROM location counter (RLC). There are four
assembly level instructions which enable the user to write into WCS, read from
WCS, and transfer control to WCS resident microcode. Unfortunately the 2K
words of the WCS serve as a supplement to the fixed control store; the user
cannot delete or modify user level instructions or machine features located in
the ROM control store. If an operation does not exist in ROM, it cannot be
used in WCS. A new emulator cannot be created in WCS; the user can only add
to the existing one. For this reason, this machine would not be a suitable
candidate.
11.0 Sperry Univac V77-800
The Sperry Univac contains 2K of 48-bit microcode in writable control
store (WCS) with space for 1K 48-bit ROM storage. Each microword executes
multiple operations. The WCS acts as an extension of the processor control
store.
The WCS contains a decoder control store, a central control store (CCS),
and an I/O control store. The decoder control store consists of two 16-word
by 16-bit memory arrays with associated logic that decodes main memory
instructions into a 9-bit address which is applied to the CCS. Addressing for
the 64-bit microinstruction is provided by the 9-bit address from either the
processor, decoder control store, or subroutine stack.
The microcode is input as a series of source statements via a terminal or
card reader using the operating system VORTEX II or SUMMIT. The
Microassembler, MIDAS, is then used to transform these statements to object
code. The object code is then loaded into WCS using the microutility, MIUTIL.
This machine does contain horizontal microcode as well as the necessary
vector mechanism to control store and would be a viable candidate.
The architecture of the V77-600 is the same except that there are 4K 64-bit
words of WCS. Thus this machine would also be a candidate.
12.0 Systems Engineering Laboratories 32/70 Series
The SEL 32 Series contains _K 64 bit high speed Random Access Memory (RAM)
as a physical extension of Control Store (CROM). The microinstructions
contained in WCS allow parallel operations within the execution timing of a
single instruction.
The writable control store (WCS) may be used as a CROM extension in the
host computer, or it may be used with the Development Support System (DSS),
residing in the DSS Test Stand. The CROM takes an instruction from Main
Memory and stores it in a 32-bit internal register (I1). An appropriate
microprogram is executed and the contents of register I1 are moved to register
I0 (a 32-bit register). The CROM entry point is determined by a decode of the
contents of register I0. The CROM contains a series of read only memories
(ROMs) which contain the decode and vector tables within CROM to the
microprogrammed routines that operate the computer.
Entry into the WCS from software is accomplished using the JUMP WCS
Macro-Assembler instruction. This instruction allows the user to jump to any
of the first 64 locations in WCS where vector addresses (in microcode) are
stored, which address routines within the WCS.
The writing of WCS is accomplished using the WRITE WCS Macro-Instruction.
The reading of WCS is accomplished using the READ WCS Macro-Instruction.
Since this machine does have horizontal microcode and does have the vector
mechanism from main memory to control store it would be a suitable candidate.
In addition to the analyses that were done to determine whether each
computer met the requirements, an algorithm was used to rank the computers
with respect to those characteristics necessary to the solution of the
gate-level algorithm (see footnote 1). The following equation was examined then altered to
better fit the algorithm requirements:
\[
P = \frac{10^{12}\,\dfrac{[(L-7)(T)(WF)]^{i}}{[32{,}000\,(36-7)]^{i}}}{t_c + t_{I/O}}
\]
where
P = the computing power in bits per second
L = the word length in bits
T = the total number of words in memory
WF = 1 for fixed word length memory
     2 for variable word length memory
t_c = the time in microseconds for the CPU to perform one million
      operations
t_{I/O} = the time the CPU sits idle waiting for I/O to take place
i = .5

1 Knight, Kenneth E.: Changes in Computer Performance, Datamation, vol. 12,
no. 9, pp. 40-54, September 1966.
The above equation was altered to include only those parameters relevant to
the implementation of the NASA LRC gate-level emulation:
\[
P' = \frac{10^{12}\,\dfrac{[(L-7)(1)(1)]^{1/2}}{[(32{,}000)(36-7)]^{1/2}}}{2(CS) + (M)}
\]
where
CS = control store cycle time in microseconds
M = main memory cycle time in microseconds
P' = measure of the bits processed based on a weighted average cycle time
A weighted average of the control store cycle time and the memory cycle time
was chosen as the control store is accessed more frequently than the main
memory so its access time should carry more weight in analyses of the overall
performance.
The measurement P' is not meant to be a direct measurement of the power of
each machine but more of a relative measurement of performance to aid in
choosing the computer which best fills the requirements of this contract.
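As a rough illustration of how the relative measure behaves, the sketch below evaluates P' under the assumption that the altered equation has the form 10^12 [(L-7)(1)(1)]^(1/2) / [(32,000)(36-7)]^(1/2) divided by the weighted cycle time 2(CS) + (M). The function name and the two sample machines are hypothetical and are not taken from the study's tables.

```python
import math


def p_prime(word_length_bits: float, cs_cycle_us: float, mem_cycle_us: float) -> float:
    """Relative performance measure P' (assumed form of the altered equation)."""
    numerator = 1e12 * math.sqrt((word_length_bits - 7) * 1 * 1)
    normalizer = math.sqrt(32_000 * (36 - 7))
    # Control store cycle time is weighted twice as heavily as main memory.
    weighted_cycle = 2 * cs_cycle_us + mem_cycle_us
    return numerator / normalizer / weighted_cycle


# Two hypothetical 32-bit machines that differ only in control store speed.
fast_cs = p_prime(32, cs_cycle_us=0.1, mem_cycle_us=0.6)
slow_cs = p_prime(32, cs_cycle_us=0.3, mem_cycle_us=0.6)
```

Because only ratios between machines matter for ranking, the constant factors cancel and the machine with the faster control store rates 1.5 times higher in this example.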
The value for P' for each computer was then scaled to fall between 1 and
100 in order to more easily rank the performances. These numbers appear in
Table B-2 under the heading "Performance Rating". The value for the
performance rating was then divided by the cost of the CPU with minimum memory
(at least 32K words) plus control store to give a value for performance per
dollar. These values were then scaled to fall between 1 and 100 to give each
candidate a "Performance per Cost Rating".
The prices quoted in Table B-2 represent only the price of the CPU with a
minimum of memory plus the control store. They do not reflect the price of
interfaces, consoles, printers, etc. They should be used only as a general
basis for cost comparison.
VI. Conclusion
The machines which have been recommended as final candidates were chosen
more on a basis of performance than cost due to the stringent requirements for
supporting the gate-level algorithm.
The performance of the QM/I is far superior to any of the other machines
studied. There are a number of machines that compete for second place such as
the DEC VAX 11/750, DEC VAX 11/780, Honeywell Level 6 Model 43, Systems
Engineering Lab 32 Series, Data General Eclipse, Sperry Univac V77-600, and
Sperry Univac V77-800. Comparing the two VAX machines, one would eliminate
the VAX 11/780 on the basis of cost. The Univac V77-800 could be eliminated for
the same reason. Any of the following machines would be good second choices:
1) DEC VAX 11/750
2) Honeywell Level 6 Model 43
3) SEL 32 Series
4) Univac V77-600
5) Data General
The QM/I far outperforms those machines in second place and would be the
recommended choice for the emulator portion of the Digital Avionics Design and
Reliability Analyzer.
Attachment 1
Interim Technical Report
Table of Contents
I. INTRODUCTION
II. SUMMARY
III. BASIC ALGORITHM DESCRIPTION WITH PRELIMINARY TIMING ESTIMATE
IV. ADDITIONAL FEATURES OF THE ALGORITHM
V. IMPLEMENTATION OF THE ALGORITHM
VI. TIMING RESULTS
VII. CONCLUSIONS
VIII. REFERENCES
Appendix A. UNIFORMITY OF GATE TREATMENT
Appendix B. DERIVATION OF EQUATIONS
Appendix C. NANOCODE FOR BEST CASE TIMING ESTIMATE
I. INTRODUCTION
This interim technical report details the results of Martin
Marietta's implementation on the Nanodata QM/I of an algorithm for the
emulation of digital devices at the gate level. The implementation is
intended to prove the feasibility of using emulation technology for
data collection in support of reliability studies of fault tolerant
digital avionics equipment. From the high level point of view, it
is clear that this feasibility depends primarily on the adequacy of the
speed improvements emulation seems to offer over simulation. That is
to say, the most useful measure of the feasibility is the time required
to perform a "sufficient" number of experimental runs to give statistical
significance to the results obtained.
The specific algorithm which we implemented was developed by the
NASA Langley Research Center. The algorithm has two significant factors
inherent in its use. First, it doesn't require examination of every gate
in the system and second, it allows treatment of every gate in the same
manner, regardless of gate type. The algorithm is described in detail
in Sections III, IV, V and Appendix A.
To provide a basis for the actual timing figures, Section III
provides a discussion of the basic operations involved in the algorithm
and the basic timing considerations in the QM/I to try to determine a
"best case" slow-down factor for the algorithm (i.e., an indication of
the best we can do in terms of speed). This section contains a brief,
high level overview of the philosophy of the algorithm and provides
an introduction to the more complex discussion in Section IV.
Section V gives details of our implementation including considerations
of memory requirements and system size. The timing results are detailed
in Section VI and include graphs to determine predicted performance
for systems of varying sizes. Section VII presents our conclusions
based on the implementation and timing studies. Appendix A contains
the rationale and basis for uniformity of gate treatment, Appendix B
contains the derivation of the timing equations used for projection,
and Appendix C contains the nanocode for the "best case" timing
analysis.
II. SUMMARY
This report details the results and conclusions of Martin Marietta's
implementation of the NASA Langley Research Center's gate level emulation
algorithm on the Nanodata QM/I computer. The implementation was done
to determine the applicability of emulation technology to reliability
analysis of digital avionics systems. This determination has focused
primarily on the speed aspects of the emulation and the time necessary
to run a sufficient number of sample cases to provide significant results.
The slow-down factor of the emulation is based on four primary
considerations:
1. The size of the system in gates;
2. The average percentage of gates changing value in a
   machine cycle;
3. The average fan-out of the gates in the system;
4. The machine cycle time of the system under study.
Using a system size of 2000 gates, and assuming 5% of the gates change
value, with an average gate fan-out of 2.0, and a machine cycle time
of 1µs, the actual slow down factor based on the implementation was
found to be 1200:1. This is compared to a best possible slow-down
of 600:1. The 1200:1 figure means that 10,000 samples of 0.1 seconds
real time per sample would take 17.2 days of emulation processing,
a span which is entirely reasonable relative to the kinds of numbers
seen in previous studies (i.e., [3]).
The largest control-store resident system possible under the same
constraints (6000 gates) also exhibits a reasonable slow-down factor
of 3500:1. However, in attempting to extend the emulation capability
beyond 6000 gates, we found that the processing time is overshadowed
by the time it takes to load data into control-store, and hence this
mode of operation is not feasible.
The basic conclusion of the report is that gate level emulation
of systems up to 6000 gates is feasible within the constraints imposed
by the architecture of the QM/I.
III. BASIC ALGORITHM DESCRIPTION WITH PRELIMINARY TIMING ESTIMATE
In our discussions with NASA Langley Research Center personnel,
we have been given several estimates of slow-down factor expected by
them in the QM/I implementation of their algorithm. These have ranged
from a low of 300:1 to a higher range of 500-600:1. From our
implementation of the algorithm, these figures seemed very optimistic. The
following discussion is an attempt to define a possible, reasonable lower
bound on the slow-down factor, taking into account QM/I and nanocode
realities as well as the operations necessary because of the algorithm.
For the analysis which follows, we assume that the reader is
familiar with the basics of the QM/I and its nanocode. We further
assume that the reader is familiar with the algorithm implemented
(described briefly below). Please note that in these assumptions we
do not require a working knowledge of either item, only a familiarity
to the extent that allows an understanding of the terms involved. For
example, most of our discussion will be based on the basic time cycle
in the QM/I, the T-period (80 ns). It is sufficient for the
reader to realize what a T-period is and what it means in relation to
execution of a nanoword.
The NASA LRC algorithm is conceptually straightforward. There is
one item requiring acceptance on the reader's part. The algorithm as
defined allows all gates of any type (AND, OR, NAND, NOR, XOR, etc.) to
be treated identically after initialization of the value of the gate
and a quantity called CNT (count) which relates to the number of inputs.
The algorithm has a major and a minor loop as shown in Figure III-1.
,,;"
i•-_ _-_
• 4
<J
• ,2. _-_
I
b-lniUalize __
_._e For
NextCycle
T
ReadNextX-GateVector I
t"
! IOutput Gate (Z) .For This X
DoesZ GateValueChange_NoAsA ResultOf NewXValue?/
"-QueueThisZ Gate
_X_r_
l_'_x_ssln_I A Future
"_c_.
_,
Figure II1-1 ALGORITHMOVERVIEW
.T
;- OutputForThisX ?
k_
7
L_
_r_ ,_.
The driver on the major loop is a queue consisting of those gates whose
value has changed. The minor loop consists of examining each gate
which is connected to the output of the changed gate to see if the
change affects the value of the output gate. For example, we will use
the 3 gates shown in Figure III-2. Suppose the value of gate A has
changed (the mechanism for this is described later). The algorithm
then specifies that gates B and C must be examined to determine if the
change in the value of A will cause a change in the value of either
B or C. This is determined by updating the CNT quantity for the output
gate (B or C) based on the new value of the input (A) and determining
if CNT, by virtue of that update, transitions into or out of zero (see
Appendix A for an explanation of this mechanism). Transition of CNT
for the output gate into or out of zero indicates that the
output gate (B or C) changes value. If the value of the output gate (B
or C) does not change because of the input (A) value change, no further
action is necessary. If the value of the output gate (B or C) does
change, the gate is added to the next cycle's changed-value queue (as a
future x-gate) to allow examination of the effects on its outputs.
Figure III-2
For brevity in our discussion, we will term the changed-value
queue and its processing "the x queue" and "x processing". We will
similarly term the output gate processing to be "z processing". The
action of the algorithm then causes a z whose value changes to become
an x for the next "cycle" (for its outputs to be examined). In this
case, a "cycle" represents the propagation of the signal through one
logic level; and a "machine cycle" would be completed when the x queue
becomes empty (i.e., when the logic circuit has reacted to changed
inputs and the circuit has settled to quiescent values).
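The x and z processing loops can be sketched in a few lines. The CNT convention used here is an assumption made for illustration, since the actual mechanism is defined in Appendix A: CNT is taken to be the number of inputs currently at the gate's controlling value (0 for AND/NAND, 1 for OR/NOR), so the output changes exactly when CNT moves into or out of zero. The names Gate, ctrl, and settle are hypothetical, and XOR-type gates are not covered by this simplified convention.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    value: int    # current output value, 0 or 1
    ctrl: int     # assumed controlling input value: 0 for AND/NAND, 1 for OR/NOR
    cnt: int      # number of inputs currently equal to ctrl
    outputs: list = field(default_factory=list)   # names of the gates this gate drives


def settle(gates, changed):
    """Run x/z processing until the x queue is empty; return gates that changed."""
    x_queue = [(name, gates[name].value) for name in changed]
    history = []
    while x_queue:                                 # major loop: one pass per logic level
        pending = {}                               # z gates that become x gates next cycle
        for x_name, x_value in x_queue:
            for z_name in gates[x_name].outputs:   # minor loop: each output of this x
                z = gates[z_name]
                was_zero = z.cnt == 0
                z.cnt += 1 if x_value == z.ctrl else -1
                if was_zero != (z.cnt == 0):       # CNT moved into or out of zero
                    z.value ^= 1                   # non-null case: z changes value
                    if z_name in pending:
                        del pending[z_name]        # changed back within the same cycle
                    else:
                        pending[z_name] = z.value
                # otherwise this is the null case and nothing more is done
        history.extend(pending)
        x_queue = list(pending.items())
    return history


# Gate A drives gates B and C, as in the three-gate example of the text.
gates = {
    "A": Gate(value=0, ctrl=0, cnt=0, outputs=["B", "C"]),
    "B": Gate(value=0, ctrl=0, cnt=1),   # AND of A (currently 0) and a constant 1
    "C": Gate(value=1, ctrl=1, cnt=1),   # OR of A (currently 0) and a constant 1
}
gates["A"].value = 1                     # the value of gate A changes
changed = settle(gates, ["A"])
```

In this run B is the non-null case (its CNT reaches zero, so it changes value and is queued) and C is the null case (its CNT goes from 1 to 2, so nothing further happens).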
There are several benefits derived from the algorithm. For x
processing, only those gates whose values change need to be examined.
Further, only the outputs of those gates must be known. This contrasts
with the typical "brute force" algorithm which requires examination of
each gate and subsequent examination of all of its inputs to determine
its value.
For the analysis below, the only way a gate is put on the x queue
is by virtue of its having been examined during z processing and found
to have changed value. We will term this the non-null case of z processing.
Thus we have the following relationship:
# x gates processed = # non-null z gates processed               (1)
This ignores the mechanism for starting the cycle, so we will cover
that later. The null z process is the other case for z's and represents
basically no operation (i.e., no action necessary since the
gate doesn't change value).
To provide some quantitative values for our discussion we need to
make some assumptions concerning the system being modeled at the gate
level. For example, we need to fix the size of the system. This is due
to the fact that the slow-down factor is directly proportional to the system
size (actually the proportionality is based on the number of gates whose
values change, but this is related to system size). Gate processing
must proceed sequentially, while the modeled machine cycle
time is fixed. Therefore, we based our analysis on the following
assumptions:
1. The system under consideration contains 2000 gates;
2. Only 5% of the gates will change value in any machine
   cycle (x processing);
3. The average gate fan-out is 2 (there are 2 z's per each x);
4. The basic machine cycle time of the emulated system is 0.1µs.
For discussion purposes, any further reference to cycle, cycle time or
machine cycle time means the time for the logic circuit in the modeled
machine to react completely to a change in input. This represents the real
time against which the algorithm is measured.
A second set of assumptions is necessary for this analysis. This
second set relates to the data structure upon which the algorithm
is built. For this discussion we will assume the following structure
(shown pictorially in Figure III-3).
1. Each gate is characterized by a gate info word. This word
   contains the gate value, CNT and various other information.
2. The x queue is a linked list with the link word following the
   gate info word in memory. The link word contains the address
   of the next gate info word in the queue. We will call the
   link word "LINK".
3. The address of the first gate info word in the queue is
   kept in a local store register designated MLINK.
4. The output gate addresses are kept in a separate section
   of control store in consecutive order. That is, the
   address of the gate info word for output 2 of gate y follows
   immediately the address of the gate info word for output 1
   of gate y.
5. The address of the output list for each gate is contained in
   a word following the queue link word (LINK). We will call
   this word "CLINK."
[Figure III-3. Gate Queueing Structure: control-store layout in which each gate occupies a gate info word followed by the address of the next gate info word in the queue (LINK) and the address of its output list (CLINK); the output list holds the addresses of the first and second outputs of each gate at consecutive addresses n, n+1, n+2, n+3.]
From the diagram of Figure III-3, we see the gate whose info word is
located at address m+9 has two outputs. The first output is the gate
whose info word is located at address m+15 and the second is the gate
whose info word is located at address m+6. This is found by following
the CLINK to address n to address m+15 and then following address n+1
to address m+6. The x queue in that figure goes from the gate at
address m+18 (due to MLINK) to m+12 (LINK) to m+9 (LINK) to m (LINK).
The data structure just presented represents, in a somewhat simplified
manner, the data structure used in our actual implementation. The
specifics of the implemented algorithm are covered in more detail
in Section V.
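The layout just described can be mimicked with a flat address-to-word mapping. This is only a sketch of the simplified structure: the base addresses M and N, the END marker for the tail of the queue, and the use of a negated entry to flag the last output (standing in for a sign bit) are all assumptions introduced for illustration.

```python
M, N = 100, 200        # hypothetical base addresses standing in for m and n
END = None             # hypothetical end-of-queue marker; the text does not specify one

store = {}             # address -> word; stands in for QM/I control store


def add_gate(addr, info, link=END, clink=None):
    store[addr] = info         # gate info word (value, CNT, other information)
    store[addr + 1] = link     # LINK: address of the next gate info word in the x queue
    store[addr + 2] = clink    # CLINK: address of this gate's output list


def walk_queue(mlink):
    """Follow MLINK and then each LINK word; return the queued gate addresses."""
    order, addr = [], mlink
    while addr is not END:
        order.append(addr)
        addr = store[addr + 1]
    return order


def outputs_of(addr):
    """Follow CLINK and step through consecutive output-list words."""
    found, ptr = [], store[addr + 2]
    while True:
        word = store[ptr]
        found.append(abs(word))
        if word < 0:           # negated entry marks the last output (assumed flag)
            return found
        ptr += 1


# The x queue of the figure: MLINK -> m+18 -> m+12 -> m+9 -> m
add_gate(M + 18, "info", link=M + 12)
add_gate(M + 12, "info", link=M + 9)
add_gate(M + 9, "info", link=M, clink=N)
add_gate(M, "info")
store[N] = M + 15              # first output of the gate at m+9
store[N + 1] = -(M + 6)        # second (last) output of the gate at m+9
MLINK = M + 18
```

Calling walk_queue(MLINK) visits m+18, m+12, m+9 and m in that order, and outputs_of(M + 9) yields the gates at m+15 and m+6, matching the walk-through in the text.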
Given these assumptions concerning system size, the definition
of the algorithm and the structure of the data in QM/I control store,
we can now begin to analyze the potential slow-down factors based
on those assumptions. One further assumption which is inherent
in the following analysis is that the decisions concerning
the actions to be taken in z processing are made in a highly parallel
fashion using the QM/I microinstruction execution feature [1:66] and
local store register R31. Using this feature, testing of bits in the
gate info word is done quickly in a highly parallel fashion requiring
a very small amount of time.
For the first cut estimate of slow-down, we will ignore the time
necessary for x processing and concentrate on what is required for z
processing. However, to determine the number of gates examined in
z processing, we must use system assumptions 1 and 2 to determine the
number of x's processed and then multiply that by the 2 from system
assumption 3 to give us the total number of output gates examined
(i.e., 2 outputs per x = total z's). Therefore we have the following:
# x gates processed = system size X % system changing
                    = 2000 gates X 5%
                    = 100 gates                                  (2)
# z gates processed = # x gates processed X fan-out of x gates
                    = 100 gates X 2
                    = 200 gates                                  (3)
Now using equation (1), we find the number of non-null z cases.
Equation (1) stated
# x gates processed = # non-null z gates processed               (1)
Therefore, we have
# non-null z gates processed = # x gates processed
                             = 100 gates                         (4)
This means that of the 200 z gates we have, half are the non-null
case and the other half are the null case. Now that we have fixed
the amount of processing to be done, we need to get an estimate of
the time necessary to do each case. We will consider the null case
first. The shortest nanoword in the QM/I which does not branch to
itself looks like [1:58]:
Tn:    READ NS (not stretched)
Tn+1:  GATE NS (not stretched)
or
Tn:    STRETCH, READ NS, GATE NS
We need to explain our notation in the above two examples. The Tx to the
left indicates the T-step (not T-period, which is fixed, but T-step, which may be
either 1 or 2 T-periods long). In the first case, the T-steps are not
stretched, which means they are each 1 T-period long. In the second,
the T-step is stretched, indicating it is 2 T-periods long. For those
more familiar with nanocode notation, this can be shown thus:
X . . .   READ NS
. X . .   GATE NS
or
S . . .   READ NS, GATE NS.
The net result is that the null z processing requires, in the best
possible case, 2 T-periods.
For the non-null case, we need to do more in the nanoword than
simply branch out. Let us assume that we can do all necessary processing
in one nanoword. To determine the length of that nanoword let us
examine the length of a set of nanowords. The set of MULTI nanowords,
consisting of 124 words, represents 669 T-periods. This works out to:
T-periods/nanoword = 669 T-periods/124 nanowords
                   = 5.39 T-periods/nanoword                     (5)
This result fits in well with intuition in which we realize that the
case where none of the T-steps in a nanoword are stretched is relatively
rare, and that in most words observed, at least one and occasionally
2 of the T-steps are stretched. Thus, without considering the exact
operations to be performed, we will use a 5.4 T-periods/nanoword figure
for the non-null z processing nanoword.
Now we have the information necessary to calculate the z processing
and the absolute best case slow down factor. The time required is
given by
null z processing = # null z gates X 2T/gate
                  = 100 X 2T
                  = 200T;
non-null z processing = # non-null z gates X 5.4T/gate
                      = 100 X 5.4T
                      = 540T;
total z processing = null z processing + non-null z processing
                   = 200T + 540T
                   = 740T.
To translate this to understandable terms, we use the 80ns/T-period
conversion to get:
time = 740T X 80ns/T
     = 59200ns = 59.2µs.
For our given machine cycle time of 0.1µs (system assumption 4),
the slow-down factor is given by:
slow-down factor = actual time / machine time
                 = 59.2µs / 0.1µs
                 = 592:1
This 600:1 factor is close to the NASA LRC expected slow-down in
their 500-600:1 estimate. What is significant about this figure is
that it bears out our judgment that 600:1 is an optimistic
figure: this slow-down factor is based solely
on z processing, does not include x processing at all, and in addition,
does not include most of the processing necessary for z's.
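The first-cut arithmetic above can be reproduced with a short sketch. The function and parameter names are hypothetical; the defaults are the system assumptions and per-gate costs quoted in the text (2T for a null z, 5.4T for a non-null z, 80 ns per T-period).

```python
T_PERIOD_NS = 80                       # QM/I T-period
AVG_T_PER_NANOWORD = 669 / 124         # about 5.4 T-periods per MULTI nanoword


def first_cut_slow_down(gates=2000, fraction_changing=0.05, fan_out=2,
                        machine_cycle_us=0.1, null_z_t=2.0, non_null_z_t=5.4):
    """Return (total T-periods, emulation time in microseconds, slow-down factor)."""
    x_gates = round(gates * fraction_changing)     # equation (2)
    z_gates = x_gates * fan_out                    # equation (3)
    non_null_z = x_gates                           # equations (1) and (4)
    null_z = z_gates - non_null_z
    total_t = null_z * null_z_t + non_null_z * non_null_z_t
    emulation_us = total_t * T_PERIOD_NS / 1000
    return total_t, emulation_us, emulation_us / machine_cycle_us


total_t, emulation_us, factor = first_cut_slow_down()
```

With the default assumptions this gives 740 T-periods, 59.2 microseconds, and a slow-down of about 592:1, counting z processing only.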
Let us look further into the x processing. This processing
must consist minimally of:
1) Reading in the x gate info word from control store;
2) Reading in the address of the output list (CLINK) from control
   store;
3) Reading in the address of the first output gate (z);
4) Reading in the gate info word of the first output gate (z);
5) Reading in the address of the second output gate (z);
6) Reading in the gate info word of the second output gate (z).
Steps 3-6 depend on there being 2 outputs per gate. Referring back to
Figure III-3, and using the gate whose info word is at m+9 as the x
gate, step 1 reads the info word from address m+9 into a local store
register. Step 2 reads the CLINK word at address m+11 into a local
store register. This register contains the address n. Step 3 reads
the contents of address n into another local store register. This
register now contains the address (m+15) of the first output gate.
Step 4 reads the gate info word for the gate at location m+15 into a
register. Step 5 reads the contents of location n+1 (m+6) into a
register. Finally step 6 reads the gate info word for the second output
into a register. Thus the minimal x processing consists of steps 1-6.
Now to estimate the timing on this, assuming best possible case, we
will consider the time necessary to do the 6 reads. We assume address
formation takes no time. Based on the timing constraints for control
store [1:36], if we set up for the control store read in Tn (assuming
all T-steps are non-stretched and correspond to 1 T-period), the READ
CS cannot legally occur until Tn+2. In the best case, we can also set
up for a new read of control store in Tn+2, which produces the timing
sequence below:
T1    set up for read of x gate info word
T2
T3    read x gate info word, set up for read of CLINK
T4
T5    read CLINK, set up for read of first output address
T6
T7    read first output address, set up for read of first z
      gate info word
T8
T9    read first z gate info word, set up for read of second
      z address
T10
T11   read second z address, set up for read of second z
      info word
T12
T13   read second z info word.
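The timing sequence above follows a simple pattern that can be checked mechanically. The sketch below assumes, as the text does for the best case, that the first set-up occurs in T1, that each read occurs two T-periods after its set-up, and that the set-up for the next read shares the T-period of the previous read. The function name is hypothetical.

```python
def pipelined_read_schedule(n_reads: int, setup_to_read: int = 2) -> list:
    """Return the T-period in which each of n dependent control-store reads occurs."""
    schedule = []
    t = 1                          # T1: set up for the first read
    for _ in range(n_reads):
        t += setup_to_read         # the read itself occurs two T-periods later
        schedule.append(t)         # the next set-up happens in this same T-period
    return schedule


reads = pipelined_read_schedule(6)   # the six reads of minimal x processing
```

The six reads land in T3, T5, T7, T9, T11 and T13, which is where the 13 T-period figure per x comes from.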
Thus, the best case for x processing is 13 T-periods per x. Now let us
examine the z processing. In the non-null case, we used one nanoword
to do the setting of bits, etc., necessary in processing a z. We now
need to add in the time necessary to link the gate into the MLINK,
LINK queue. A measure of this task can be gleaned from the MULTI
instruction ENQ. ENQ is an enqueue instruction designed for creating
linked lists. It takes 27 T-periods [2:60]. We could possibly do better
by using an ST (store) instruction of MLINK into the new gate's LINK
and then an MVR (move register) of the new gate's address into MLINK.
This approach requires 7T for the store and 5T for the MVR [2:54-55]
for a total of 12T. We might further assume that custom nanocode could
speed this up by 1/3 for a time expenditure of 8T. (8T is very close
to the time necessary for this operation in the actual implementation.)
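The two-step alternative to ENQ amounts to inserting the new gate at the head of the linked list. A minimal sketch follows, with a hypothetical XQueue class standing in for the MLINK register and the LINK words; the empty-queue marker (None) is an assumption. The cost figures are the ones quoted in the text.

```python
class XQueue:
    """Head-inserted linked list; mlink plays the role of the MLINK register."""

    def __init__(self):
        self.mlink = None          # hypothetical empty-queue marker
        self.link = {}             # gate address -> LINK word (next gate address)

    def enqueue(self, gate_addr):
        self.link[gate_addr] = self.mlink   # ST: store MLINK into the new gate's LINK
        self.mlink = gate_addr              # MVR: move the new gate's address into MLINK

    def addresses(self):
        out, addr = [], self.mlink
        while addr is not None:
            out.append(addr)
            addr = self.link[addr]
        return out


ST_T, MVR_T = 7, 5                 # T-periods quoted for ST and MVR
stock_cost = ST_T + MVR_T          # 12T using the existing instructions
custom_cost = stock_cost * 2 // 3  # 8T if custom nanocode saves one third

q = XQueue()
q.enqueue(100)
q.enqueue(109)
```

Because each new gate is linked in at the head, the most recently queued gate (109 here) is the first one reached from MLINK.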
Based on these new numbers, we can calculate a new lower bound on
the slow-down factor:
x processing time = # x gates processed X 13T/x gate
                  = 100 gates X 13T/gate
                  = 1300T;
null z processing time = # null z gates X 2T/gate
                       = 100 gates X 2T/gate
                       = 200T;
non-null z processing time = # non-null z gates X (5.4T/gate + 8T/gate)
                           = 100 gates X 13.4T/gate
                           = 1340T;
total processing time = 1300T + 200T + 1340T
                      = 2840T.
This translates to:
time = 2840T X 80ns/T
     = 227200ns = 227.2µs
slow-down factor = 227.2µs / 0.1µs
                 = 2272:1 slow down (for a 0.1µs machine cycle)
This figure is much more realistic than the 600:1 figure obtained
before, but it is important to note that our inclusion of T-periods for
processing in this analysis does not begin to approach what is necessary
in the actual algorithm.
We propose to iterate through the calculations one final time,
developing motivations for additions to the time estimates we have
presented and ultimately defining a realistic best case estimate
of the time required for performance of the algorithm.
To begin the analysis for this last iteration, we will modify
somewhat the allocation of timing between the x and z processing. By
this, we mean that the stepping down the output list and reading of the
output gate info word is not really a function of x processing but
belongs more properly in z processing. We will shift it into z processing
for one primary reason. The x processing pipelined read of
the outputs in the last analysis is not practical and really cannot
be done in that fashion. The practical implementation is: read
one output; process the output; then loop back and read the next
output. So, in the first step, we have taken 10 T-periods out of the
x processing. (Time for read of CLINK and each of the output addresses
and info words.) At this point x processing consists of reading
only the x gate info word and requires 3 T-periods per gate.
As you will remember, the 3T estimate assumed that address
formation took no time. In actual fact, if we assume that the address
is in a local store register, address formation only takes the time
necessary to set up the busses to use that as the control-store address.
This adds 1 T-period. Thus to read the gate info word for an x requires
4 T-periods.
The next thing we need to do for x processing is to use the QM/I
micro-instruction execution capability to do a multi-way branch based
on the data in the info word. Since we want to branch on more than
the 7 bits available in the QM/I local store register 31 C-field, we
need some extra processing to set up the proper address. This processing,
plus the multi-way branch itself, requires 5 additional T-periods.
(The minimum nanocode segments are given in Appendix C.)
Thus the basic x processing set up takes 9 T-periods. Based on our
implementation, the actual x processing takes from 2 T-periods (for
a gate not properly queued; i.e., no action necessary) to 9 T-periods
for a gate requiring more complex processing. The time for the most
standard processing (gate normally queued) is 5 T-periods. Thus for each
x: set up, multi-way branch and x processing takes 9T + 5T = 14T.
The only remaining step is to set up for processing of z's for each
x and the set up (address formation) for processing the next x in the
queue.
The set up for z processing consists of calculating the address
of the CLINK word for this gate (gate info word address + 2) then
reading in CLINK to get the address of the output list. This processing
takes 6T. The end of x processing for the current gate consists of
setting up for the next x. This involves calculating the address of
LINK and then reading the value of LINK. This takes 6 T-periods.
Thus, the total x processing is given by:
Xtotal = Xset up + Xbranch + Xproc + Xz set up + Xnext
       = 4T + 5T + 5T + 6T + 6T
Xtotal = 26T                                                     (6)
Z processing consists of setting up for the processing of the current
output, doing the actual processing, and doing the preliminary set up
for processing the next output. The first part consists of reading the
address of the gate info word for this output and then reading the gate
info word itself. This is then followed by the multi-way branch
(similar to the x processing multi-way branch). This operation takes 12T
(see Appendix C).
Actual z processing takes 2T for the null case, and from 5T to 18T for
the non-null case. To this 5-18T we need to add the time necessary to add
this z to the queue. This time is 6T. Thus, for non-null z processing,
using the most standard case of 7T for processing plus 6T for the queue
addition, we need 13T. So, for z processing itself we have:
null z processing = 2T;
non-null z processing = 7T + 6T = 13T.
The final set up for next z is essentially included in the set up
for this z. The only thing that is not done is the testing if this
is the last output. We will assume the sign bit in the last output
is set to 1. The time required to do this test is 4T if it is the last
gate and 8T if it is not. (We will use 6T for our figures based on
an average fan-out of 2.) Thus, the total z time is:
Ztotal = Zset up + Zprocess + Znext
       = 12T + 2T + 6T    (null z)
       = 20T              (null z)                               (7)
       = 12T + 13T + 6T   (non-null z)
       = 31T              (non-null z).                          (8)
We now have all the figures necessary to calculate best case slow-down.
As an aside, please note that the nanocode given in Appendix C will
not work if put together. The most striking example of the reason for
this is the processing to determine if this z was the last in the output
list. Remember we assumed the sign bit was set. This means that when
we read the gate info word, we would have to clear all sign bits before
the read. This is not accounted for in the nanocode of this example.
There is also no provision for testing the last x in the queue. But
as a best case timing estimate, these figures define the range of
numbers involved. So, the calculation of slow-down factor looks like:
    x processing time          = # x gates processed X 26T
                               = 100 gates X 26T
                               = 2600T;
    null z processing time     = # null z gates X 20T
                               = 100 gates X 20T
                               = 2000T;
    non-null z processing time = # non-null z gates X 31T
                               = 100 gates X 31T
                               = 3100T.
This translates to:
    time = (2600T + 2000T + 3100T) X 80ns/T
         = 7700T X 80ns/T
         = 616000ns = 616 µs.
Thus for a 0.1 µs cycle machine, the slow-down factor is 6160:1.
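
The arithmetic above can be reproduced with a short sketch. The T-period counts and the 80 ns T-period come from the text; the function name `best_case_slow_down` and its parameters are illustrative assumptions, not part of the original nanocode.

```python
# T-period counts taken from equations (6) through (8) in the text.
X_TOTAL_T = 4 + 5 + 5 + 6 + 6        # 26T per x gate
Z_NULL_T = 12 + 2 + 6                # 20T per null z gate
Z_NON_NULL_T = 12 + 13 + 6           # 31T per non-null z gate
T_PERIOD_NS = 80                     # nanoseconds per T-period


def best_case_slow_down(x_gates, null_z, non_null_z, emulated_cycle_ns):
    """Return (emulation time in ns, slow-down factor) for one emulated cycle.

    Hypothetical helper: it only sums the best-case figures quoted above.
    """
    total_t = (x_gates * X_TOTAL_T
               + null_z * Z_NULL_T
               + non_null_z * Z_NON_NULL_T)
    time_ns = total_t * T_PERIOD_NS
    return time_ns, time_ns / emulated_cycle_ns


# 100 gates of each kind on a 0.1 µs (100 ns) machine, as in the text.
time_ns, factor = best_case_slow_down(100, 100, 100, emulated_cycle_ns=100)
print(time_ns, factor)  # 616000 6160.0
```

Passing `emulated_cycle_ns=1000` instead gives the figure for a 1 µs machine, one tenth of the factor above.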
In summary, it is obvious that there are several parameters
which determine the slow-down factor for a given case. The parameters
are:
    1. system size in total gates;
    2. number of gates in system which change value during a
       cycle (average). This may be expressed as a percentage
       of system size;
    3. average fan-out per gate;
    4. cycle time of the emulated system.
For our analysis here we assumed that:
    1. system size = 2000 gates;
    2. percentage of gates changing = 100 gates = 5%;
    3. average fan-out = 2;
    4. cycle time of the emulated system = 0.1 µs.
In the following sections, we present some details of the algorithm
we implemented and the results of our timing studies. Since those
timing studies address a 1 µs cycle time machine (10 times slower than
the machine we assumed here) we can recalculate the slow-down for our
idealized implementation. It then becomes 616:1. Remember that this
does not take into account all of the necessary actions. It is thus
reasonable to expect the best case slow-down factor to be 600-800:1
for a 1 µs machine and 6000-8000:1 for a 0.1 µs machine. In
summary, for the slower machine, with a smaller system (2000 gates), the
LRC estimate of 500-600:1 is quite optimistic but still a reasonable figure.
IV. ADDITIONAL FEATURES OF THE ALGORITHM
The algorithm introduced in the previous section includes additional
elements which allow it to handle the types of situations expected in
real-world applications. Aside from handling all types of gates in
the same manner, and being able to quickly process gates without having
to review every input of every gate, this approach takes into account
the possibility for double queueing. If two gates share the same
output gate and both change value such that the output gate should
change, they will both independently queue the same gate for processing.
If this happens within a single logic level, it is quite possible that
the common output gate should in fact not be queued at all, since as a
result of both inputs its value should remain the same. To handle
this sort of case, the NASA LRC algorithm includes processing which prevents
unnecessary queueing, as well as a second set of flags (V2 and A2)
and a second queue linkage word (LINK 2) which are used as a means of
remembering the necessary data for processing an additional queueing
if in fact one is required. This latter situation arises when double
queueing is spread over two propagation cycles (two logic levels).
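
A minimal sketch of the single-level case may help. It assumes the CNT rule of Appendix A and the A1 and A3 flags of Table IV-1, and it deliberately omits the second queue (V2, A2, A4, LINK 2). The `Gate` class, the function names, and the use of an integer cycle counter in place of the alternating C flag are illustrative assumptions, not the original implementation.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    cnt: int          # CNT counter (see Appendix A)
    v1: int           # current output value
    a1: int = 0       # "properly queued" flag
    a3: int = -1      # propagation cycle in which the gate was linked


def input_changed(gate, rising, cycle, queue):
    """Apply one input change to `gate` during propagation cycle `cycle`."""
    old = gate.cnt
    gate.cnt += 1 if rising else -1
    if old != 0 and gate.cnt != 0:
        return                      # null case: output unchanged
    gate.v1 ^= 1                    # CNT moved into or out of zero
    if gate.a3 == cycle:
        gate.a1 ^= 1                # already linked this cycle: toggle the flag
    else:
        gate.a1, gate.a3 = 1, cycle
        queue.append(gate)          # link the gate into the x queue once


def gates_to_process(queue):
    """Names of gates that still need x processing after the level is done."""
    return [g.name for g in queue if g.a1 == 1]


# 2-input AND with inputs (1, 0): CNT = -1, V = 0.
g = Gate("and2", cnt=-1, v1=0)
queue = []
input_changed(g, rising=True, cycle=0, queue=queue)    # second input rises
input_changed(g, rising=False, cycle=0, queue=queue)   # first input falls
print(g.v1, g.a1, gates_to_process(queue))  # 0 0 []
```

The gate stays linked in the list but carries A1 = 0, which is one way to read "dequeued without actually dequeueing" in this simplified setting.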
An additional feature of this approach is that flip-flop devices
are treated as ordinary gates with some additional special case
considerations. The flag FF is used to indicate a flip-flop device and
enables the algorithm to handle such a device effectively. Involved
in this process is the flag T, which indicates a trigger input for a
flip-flop, requiring a slight variation in treatment.
A list of the variables involved in this algorithm and their usage
is provided in Table IV-1, and a section of the algorithm dealing with
double-queueing is detailed in Figure IV-1. It is entered only if, during
the normal processing of an output z-gate, that gate's CNT value
transitioned into or out of zero (thus indicating that this z gate should be
queued for future x gate processing). The variables of most concern here
are: 1) the "properly queued" flag A1, which indicates that a gate is
queued for x processing, and when needed for double queueing enables
a gate to be "dequeued" without actually dequeueing the gate itself;
2) the "cycle queued" flag A3, which remembers in which propagation
cycle (C) this gate was queued for x processing (propagation cycles
indicated by C are equivalent to x queue processing cycles, and each
represents the processing of one logic level of the system); 3) the current
value of the gate V1; and 4) the linkage variables LINK 1 and MLINK
used in the linked-list x queue whose ties between gates define the
course of any given processing cycle.
The variables concerned with the queueing of a gate onto the
second x queue include V2, A2, A4, and LINK 2. Although we implemented
the algorithm as given to us, we feel there are some functional
discrepancies involved in the manner in which these variables are used.
The concept presented here, however, is of more importance than the
details of its design in the algorithm. The need being addressed
here concerns the queueing of gates a second time. (This situation
arises when double queueing occurs over two consecutive propagation
cycles, as discussed briefly above.) The idea is to remember what
the value of the gate is at the time of the second queueing, in
order to correctly process the gate when its first processing cycle
begins; and to queue the gate properly for a second processing. The
area of the flow in Figure IV-1 encircled with dashed lines attempts
to accomplish these goals. The flag A4 indicates that the second queue
is employed for this gate; A2 is later to become the A1 of the second
processing cycle, and indicates that proper queueing has occurred; and
V2 remembers the current newly changed value of the gate. (The second
queue linkage involving LINK 2 and MLINK in the given algorithm does
not work properly when integrated with the normal linkage system using
LINK 1.)
These additional features do cause extra overhead in the
execution of the algorithm, but they enable the algorithm to emulate
a wider range of real-world systems and to accommodate all the currently
foreseeable events which occur in gate level emulation.
TABLE IV-1: Definition Of Variables.

Variable   Definition / Usage
--------   ------------------
A1         "Properly Queued" Flag: Indicates Gate Queued For
           X Processing.
A2         A1 For Second Queue.
A3         "Cycle Queued" Flag: Indicates Value Of C When Gate
           Queued For X Processing.
A4         Flag Indicating Gate Queued Onto Second Queue.
V1         Current Output Value Of This Gate.
V2         Gate Value For Processing Of Second Queue.
FF         Flag Indicating A Flip-Flop Device.
T          Flag Indicating A Flip-Flop Trigger.
CLINK      Pointer Word Containing The Address Of The Output
           List For Each X Gate.
LINK 1     Linked-List Linkage Word Pointing To The Next Gate In
           The Queue (Zero If Last Word In The Queue).
LINK 2     LINK 1 For Second Queue.
MLINK      Pointer To First Gate Of The Next Queue (Each X-Queue
           Cycle Starts A New Queue), And Zero If End Of Machine
           Cycle.
C          Propagation Cycle (X-Queue Cycle) Indicator (Alternates
           Value For Each Logic Level Processed).
[Figure IV-1: Flow Of Double Queueing Logic]
V. IMPLEMENTATION OF THE ALGORITHM
In implementing the NASA LRC algorithm previously described, we
have used the unique micro-instruction decoding capabilities of the QM-1
as a means of efficiently handling all of the individual flag conditions
which arise in the course of normal processing. It is important to
recognize in this algorithm the inherent dependence upon individual
flag-bits and the large amount of processing necessary to handle them
properly. Conventional coding methodology requires these flags to be
tested and manipulated individually (which can be quite burdensome).
A great deal of speed and flexibility can be gained by combining all
of these flag-bits into one n-bit computer word, and subsequently using
this word as the address of a specific routine in memory written to
handle the exact bit pattern found in that arrangement of flag-bits.
Thus we associate one word of data with each gate in the system, and
we arrange that word so that each bit is dedicated for use as a specific
flag. Then when the value of a flag-bit is needed to be known in order
that some action may be taken, rather than reading each bit and testing
for one or zero, the entire set of flag-bits is taken together as a
"condition set" and used as the absolute address in nanostore of the
routine which performs the exact actions necessary under the conditions
specified by the flag-bits. In addition, when processing of that gate
changes one of these flags, the appropriate bit of that gate's "info
word" is changed to reflect the latest condition of the flag. This is
very fast and very effective, but it does require a great deal of memory
(in this case, nanostore). For our use, however, this drawback is far
outweighed by the execution speed and flexibility gained.
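
The trade-off described above can be illustrated with a small sketch that contrasts a table lookup on the whole flag word with bit-by-bit testing. The 3-bit layout (FF, T, "x or z") and all function names here are assumptions chosen for illustration; the real implementation branches to nanostore routines rather than calling Python functions.

```python
# Hypothetical 3-bit flag word: bit 2 = FF, bit 1 = T, bit 0 = "x or z".
def make_handler(pattern):
    def handler(gate_word):
        # A real routine would set or reset flags for exactly this pattern.
        return f"routine for flags {pattern:03b}"
    return handler


# One dedicated routine per possible bit pattern (memory traded for speed).
DISPATCH = [make_handler(p) for p in range(8)]


def process(gate_word):
    """Use the whole "condition set" as an index in a single step."""
    flags = gate_word & 0b111
    return DISPATCH[flags](gate_word)


def process_bit_by_bit(gate_word):
    """Conventional alternative: test each flag individually."""
    ff = (gate_word >> 2) & 1
    t = (gate_word >> 1) & 1
    xz = gate_word & 1
    return f"routine for flags {ff}{t}{xz}"


print(process(0b101))  # routine for flags 101
```

With n flag-bits the table needs 2^n entries, which is the memory cost the text refers to.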
Figure V-1 gives an overview of the logic used to implement the NASA
LRC algorithm. The boxes containing an asterisk (*) or asterisks (**) include
the type of processing described above. Before detailing these however,
it is first appropriate to examine the lay-out of the memory tables
used and the reasoning behind their structure.

[Figure V-1: Overview Of Implementation Logic]

Figure V-2 shows the structure of the "blocks" of gate data which
reside in main-store and are loaded one at a time into control-store
for processing. (Note that for an all control-store resident system
no loading of blocks nor pre-x-cycle z-gate processing is needed.)
In order to handle the most general case of system size and design,
we designed our emulation to handle systems larger than 6000 gates
(which is the largest system under this design to be totally control-store
resident). This is accomplished through the use of the block
structure and the means to keep track of interblock connections as
follows. For each gate there are four 18-bit words which are reserved
for dedicated use. The first word is the "gate info word" containing
the flag-bits (detailed in Figure V-2) and the CNT counter for this
gate. It is the lower three bits of this word, concatenated with the
upper seven bits of the same word, which comprise the 10-bit nanostore
address used for branching to the various processing routines as described
earlier. It is of importance to point out here that the three
bits FF, T, and the "z or x" flag, are used as a "nanostore page address"
and therefore must be placed into F-register FIDX prior to branching
to any routine. This enables the use of the micro-instruction decoding
facility of the QM-1, which concatenates the 3-bit page address found
in FIDX with the 7-bit address found in the C-field of local store
register 31, to form a 10-bit absolute nanostore address. Thus the
decoding of all flag-bits for each gate can be done "instantly" by
placing the gate info word into R31, and the B-field of R31 into FIDX,
and then invoking the micro-instruction decoding facility. This causes
a branch to a dedicated routine which sets/resets the flag-bits as
necessary for its exact input conditions, and then returns to a common
continuation location in the main processing routine.

[Figure V-2: Block Structure In Memory. Note: FF = 1 for flip-flop
"gate"; T = 1 for flip-flop trigger; X or Z = 0 for Z and 1 for X.]

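
The address formation described above can be sketched in a few lines. The text states that the lower three bits and the upper seven bits of the 18-bit gate info word are concatenated into a 10-bit address; placing the 3-bit page address in the high-order position is an assumption made here, as is the function name.

```python
WORD_BITS = 18  # word size stated in the text


def nanostore_address(gate_info_word):
    """Form a 10-bit address from an 18-bit gate info word (illustrative)."""
    page = gate_info_word & 0b111                          # lower three bits
    offset = (gate_info_word >> (WORD_BITS - 7)) & 0x7F    # upper seven bits
    # Assumption: the page address occupies the high-order three bits.
    return (page << 7) | offset


# Upper seven bits 1010101 (85), lower three bits 101 (5).
word = (0b1010101 << 11) | 0b101
print(nanostore_address(word))  # 725
```

Bits 3 through 10 of the word do not take part in the address, so changing them leaves the selected routine unchanged.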
The second and third words of the gate block structure are labeled
LINK 1 and LINK 2. These are used as linkage address words in the linked-
list x-queue structure which controls the flow of x processing. (LINK 2
is used only for secondary queueing as discussed in section IV.) They
contain the block number and gate number of the next gate in the queue.
The fourth word is labeled CLINK and is the relative address of
the start of the list of output gates (z-gates) associated with this
x-gate. The number of output gates for each x-gate will vary from
gate to gate of course, but for sizing considerations in this study
we have averaged the fan-out factor at 2.0 output gates/x-gate. Thus
we need 6 words per gate in each block (the four words described
above and one word for each output gate). This gives an average size
of 36000 words/block for a block size of 6000 gates.
Because the blocks will generally vary in size, there is an additional
table in main-store dedicated to locating blocks in memory. It is
indexed by block number and contains the absolute main-store address
of the start of each block and each block's size (as the number of
18-bit words in the block). This table is also shown pictorially in
Figure V-2, and is particularly useful in aiding the process of block
loading (from main-store into control-store and back again).
There is one more table of interest in this design. This is a
control-store resident "free core pool" table which is dedicated for
use as a linked-list queue. When, during normal x-gate processing, an
output gate (z-gate) comes up for processing which does not reside in
the same block as the x-gate, then it becomes necessary to queue such
z-gates for future processing (when the appropriate block is loaded
into control-store). This free core pool queue is used to queue these
z-gates for "pre-x-cycle processing", and is termed the "z queue".
It is depicted in Figure V-3.
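
As a rough illustration of such a queue, the sketch below keeps two-word entries (a key word and a link word) in a flat array, chains unused entries into a free list, and marks the end of the queue with -1 as in Figure V-3. The class name `ZQueue`, the tuple used for the block/gate key, and the ordered insertion are assumptions for illustration; the sign bit that carries V1x in the real queue element is omitted.

```python
END = -1  # link value marking the last element of a list


class ZQueue:
    """Two-word entries in a flat array: [key, link]. Hypothetical layout."""

    def __init__(self, entries):
        self.pool = [0] * (2 * entries)
        # Chain all entries into a free list through their link words.
        for i in range(entries):
            self.pool[2 * i + 1] = 2 * (i + 1) if i + 1 < entries else END
        self.free = 0 if entries else END
        self.head = END

    def add(self, block, gate):
        """Insert (block, gate), keeping the queue ordered by block then gate."""
        if self.free == END:
            raise MemoryError("free core pool exhausted")
        node, self.free = self.free, self.pool[self.free + 1]
        self.pool[node] = (block, gate)
        prev, cur = END, self.head
        while cur != END and self.pool[cur] <= (block, gate):
            prev, cur = cur, self.pool[cur + 1]
        self.pool[node + 1] = cur
        if prev == END:
            self.head = node
        else:
            self.pool[prev + 1] = node

    def pop(self):
        """Remove and return the first (block, gate), or None if empty."""
        if self.head == END:
            return None
        node = self.head
        self.head = self.pool[node + 1]
        key = self.pool[node]
        # Return the entry to the free list.
        self.pool[node + 1], self.free = self.free, node
        return key


q = ZQueue(4)
q.add(2, 5)
q.add(1, 7)
q.add(2, 1)
drained = [q.pop() for _ in range(4)]
print(drained)  # [(1, 7), (2, 1), (2, 5), None]
```

Keeping the entries ordered by block means the head of the queue always names the next block to load.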
Now, returning to Figure V-1, the first action is to process
(pre-x-cycle process) the z-gates waiting in the z queue. So if the
queue is non-empty (and since the gates are ordered sequentially by
block number/gate number), the next z-gate in the queue dictates what
block should be loaded from main-store. When the loading is completed,
those z-gates in the queue which reside in this block are processed
as normal z-gates in the following manner: The CNT counter for each
z-gate is updated according to the value of the x-gate for which the
z-gate is an output. (If V1x = 1, CNTz is incremented, and if
V1x = 0, CNTz is decremented. For z-gates whose x-gate is in some
other block, V1x is passed as the sign-bit of the queue element
itself, as shown in Figure V-2, similar to the "end-of-CLINK" flag-bit K.)
If this CNT transitions into or out of the value zero, then the
z-gate becomes "non-null" and a branch is taken using the micro-
instruction decoding facility to the appropriate z-processing routine.
Upon return from this routine, the z-gate has been queued into the
x-queue (using LINK 1, etc.) for future x-processing. Those z-gates
whose CNT does not transition into or out of zero need no other processing,
so they (and the returned non-null z-gates) are simply restored into
the control-store block for future reference. This is the procedure for
normal z-processing and occurs in Figure V-1 in the boxes marked with an (*).

[Figure V-3: Output Gate (Z) Queuing. Each queue element is a member of
a two-word-entry linked list, where the link word is an absolute CS free
core pool address indicating the next element in the queue. The last
element in the queue contains -1 in the link word. Local-store register
zero (R0) contains the absolute CS free core pool address of the latest
queue element.]

Following the pre-x-cycle z-processing, we begin normal x-processing.
This includes the two loops and the same basic logic shown in Figure
III-1, with the actual x-gate processing (shown as the box marked with (**)
in Figure V-1) accomplished via the micro-instruction decoding facility.
Note an additional difference exists here, in that z-gates to be
processed for a given x-gate must be checked first to see if they
reside in the current block in control-store. If they do not, they
are added into the z queue for future processing. And if they are in
this block they are processed as described above for normal z-gate
processing.
The implementation of this algorithm has been coded in nanocode,
and the timing studies discussed in section VI and Appendix B for non-
control-store resident systems are entirely based upon this implementation.
For systems which are completely control-store resident (system size
of 6000 gates or less), the timing studies presented are based upon the normal x
and z processing loops of this implementation, skipping the sections
of code dealing with z queueing. This is reasonably accurate as an
estimate of resident system timing considerations. However, the linkage
words and CLINK "addresses" in memory are still in "block number/gate
number" form, and hence incur additional unneeded overhead for absolute
address computation, etc. If this code were optimized to be a strictly
control-store resident emulation (instead of the generalized "handle all
cases" emulation now coded), the efficiency of processing could be signifi-
cantly increased, and perhaps a 20%-30% timing improvement realized.
VI. TIMING RESULTS
In order to gain an understanding as to the applicability of the
algorithm described in the previous sections, we derived two equations
which enable the generalized projection of the timing factors involved
for various kinds of systems. (The derivations of these relations and
examples of their use are presented in Appendix B.) As an initial,
"most simple" case, we examined systems which reside totally in control-
store (and therefore need no data storage external to control-store nor
the associated loading and linkage software). The amount of time
necessary to emulate a single machine cycle is given in T-periods by
the relation:
    Tresident = (68 + 35F)x + 29y
where F is the fan-out factor defined to be the number of output gates
per gate processed (or the number of z gates per x gate); x is the number
of gates changing value in the system; and y is the number of queue
processing cycles needed to emulate the data propagation associated with
a complete machine cycle. This can be thought of as the number of logic
levels in the system.
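
The relation can be evaluated directly, as in the sketch below. The 80 ns T-period is the value used in section III; the function names are illustrative, and the value y = 10 is only a placeholder guess, since the number of logic levels behind the report's own figures is not stated in this section.

```python
T_PERIOD_NS = 80   # T-period length used in section III


def resident_cycle_t(fan_out, gates_changing, logic_levels):
    """T-periods to emulate one machine cycle: (68 + 35F)x + 29y."""
    return (68 + 35 * fan_out) * gates_changing + 29 * logic_levels


def slow_down(t_periods, real_cycle_ns):
    """Ratio of emulation time to the real machine cycle time."""
    return t_periods * T_PERIOD_NS / real_cycle_ns


# 6000 gates with 5% changing gives x = 300; y = 10 is an assumed value.
t = resident_cycle_t(fan_out=2.0, gates_changing=300, logic_levels=10)
print(t, slow_down(t, real_cycle_ns=1000))
```

Because x multiplies the fan-out term, the number of changing gates dominates the result for any plausible number of logic levels.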
Figure VI-1 displays this equation plotted for F = 2.0 outputs/
gate, for the value of x ranging through 5%, 10%, 15%, and 20% of the
system size. The datum of most interest here is that a single cycle for
6000 gates is emulated at a slow-down factor of 3,492:1 for a real
machine cycle of 1 µs, with 5% of the system changing. Notice the effect
of changing F from 2.0 to 3.0 outputs/gate in Figure VI-2. A single
cycle for 6000 gates at 5% changing now results in a slow-down factor
of 4,322:1 for a 1 µs machine cycle. This change from a fan-out factor
of 2.0 to 3.0 outputs/gate generally results in a 20% increase in slow
down factor. The data points for these curves are given in Table VI-1.

[Figure VI-1: slow-down factor and cycle time in ms, F = 2.0]
[Figure VI-2: slow-down factor and cycle time in ms, F = 3.0]

We can gain a more comprehensive understanding of how these curves
relate to one another by translating them into "sample time" (i.e., the amount
of time needed to emulate 0.1 second of real time execution on a 1 µs machine).
Figure VI-3 shows the curves for F = 2.0 and 3.0 both for 5% of the
system changing and for 20% of the system changing. Notice that a
single sample for a 6000 gate system at 5% changing and F = 2.0
requires 5.82 minutes, where the same sample at 20% changing requires
22.38 minutes. (Further data points are given in Table VI-2.) We
can also see that changing F from 2.0 to 3.0 increases the sample
time by the same 20% seen above.
It is perhaps most useful to view this data from an experimental
point of view, and to see how long it would take to run, for example,
10,000 samples of 0.1 second real time each. Figure VI-4 displays this
information. For a 6000 gate system with a machine cycle time of 1 µs,
with 5% of the system changing, it takes approximately 40 days or about
1 1/3 months to run a 10,000 sample test with F = 2.0. The data points
for these curves are given in Table VI-3.
[Figure VI-3: sample time in minutes]
[Figure VI-4: time in days and (approximate) time in months]

In addition to the above described "most simple" case, we expanded
our study to include larger systems whose size necessitates system
residence in main-store with "blocks" of gates being loaded into control-
store for processing. The equation for these systems in generalized
form is as follows:
    Tnon-resident = (# blocks in system) X {(44) X (4 + F) X (block size)
                    + 84 X (z prequeued) + 94
                    + (z queued) X [72 + (25) X (n - 1)]
                    + (68 + 35F)x + 29y}
where z prequeued, z queued, and n are variables associated with the
processing of output gates which reside in blocks other than that of
their input gates (see Appendix B for exact details). In order to
gain an appreciation for the meaning of this relation and the processing
involved in such an emulation system, we will consider a system with
only two blocks, each block sized at the maximum available control-store,
6000 gates. Thus we have a 12000 gate system. Assuming 5% changing
and F = 2.0, as above, we find that a single cycle on a 1 µs machine
takes 260,579,630ns. This is a slow down factor of 260,580:1. A
single 0.1 second real time sample would take 7.238 hours, so a 10,000
sample test would require 8.26 years! As you can see this is not a
feasible approach. The reason these numbers are so high is that main-
store accessing is extremely slow. It requires 22 T-periods/word to
transfer from main-store into control-store, which means to load a
single 6000 gate block (with 6 words of data required for each gate) it
takes (22) X (6000) X (6) = 792,000 T-periods = 63.36ms. Hence this
overhead becomes quite prohibitive.
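
The figures in this paragraph can be checked with the sketch below. The 22 T-periods per word, the 80 ns T-period, and the 260,579,630 ns cycle time come from the text; the function name and the use of a 365-day year are assumptions. The factor of 44 in the non-resident equation appears consistent with 22 T-periods per word in each direction (loading a block and writing it back), although that reading is an inference.

```python
T_PERIOD_NS = 80
T_PER_WORD = 22          # main-store to control-store transfer, per word


def block_load_t(gates, fan_out):
    """T-periods to load one block: four fixed words plus the output list."""
    words_per_gate = 4 + fan_out
    return T_PER_WORD * gates * words_per_gate


load_t = block_load_t(6000, 2)
load_ms = load_t * T_PERIOD_NS / 1e6
print(load_t, load_ms)   # 792000 63.36

cycle_ns = 260_579_630                            # figure quoted in the text
sample_hours = cycle_ns * 1e-9 * 100_000 / 3600   # 0.1 s holds 100,000 cycles of 1 µs
test_years = sample_hours * 10_000 / 24 / 365     # assumes a 365-day year
print(round(sample_hours, 3), round(test_years, 2))  # 7.238 8.26
```

Loading and restoring two such blocks alone accounts for roughly 253 ms of the 260.6 ms cycle, which is why the gate processing itself is a minor part of the non-resident cost.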
[Table VI-1: data points for the slow-down curves of Figures VI-1 and VI-2]
[Table VI-2: data points for the sample time curves of Figure VI-3]
[Table VI-3: data points for the 10,000 sample test time curves of Figure VI-4]

VII. CONCLUSIONS
As we stated in the introduction, the intent of our implementation
was to prove the feasibility of using gate level emulation technology
in support of data collection for reliability studies of fault tolerant
digital avionics equipment. It is clear that to support statistical
measures, the key potential problem is the time necessary to execute
a sample run on the gate level emulation. If this time is too long,
the task of running a statistically significant number of samples
becomes overwhelming. We have therefore focused our feasibility
determination on the execution speed of gate level emulation.
As shown in Appendix B, our QM-1 implementation results in a
1200:1 slow-down factor for a 2000 gate, control-store resident system
within the constraints given in that appendix. This datum is also shown
in section VI in the graphs although it is not explicitly noted since
the 6000 gate example given there is the limiting case. This 1200:1
slow-down compares very favorably with the best possible case for
slow-down shown in section III of 600:1, since the implementation
contains features which cause additional overhead for address calculation
and system partitioning into blocks. The maximum resident system of
6000 gates also falls in the reasonable range of 3500:1 slow down.
On the other hand, the partitioned system case, of which the 12000
gate, 260,500:1 slow-down is an example, is clearly not feasible for
any reasonable number of samples. Based on this, we conclude that the
gate level emulations should be restricted to control-store resident
subsystems, which for the QM-1 works out to a maximum of about 6000 gates. We do not
feel this is overly restrictive considering that we have seen gate level
simulations of current technology micro-processors which fall in the
range of 2000 gates. Thus 6000 gates can represent a fairly substantial
subsystem. Furthermore, by restricting ourselves to completely resident
emulations, further economies in the implemented algorithm can be
achieved as mentioned in Section V. We estimate that we can achieve
about a 20-30% improvement in speed.
The primary conclusion we can make based on the implementation for
the QM-1 is that gate level emulation is feasible and provides
the speed necessary for statistical studies of reliability.
Although our implementation was based on the QM-1 architecture,
the restrictions of that machine do impose a limit
on what is achievable. Examples are the 6000 gate limitation and the
additional overhead necessary to decode ten bits rather than the seven
that the QM-1 is set up for. Three possibilities come to mind in terms
of providing the emulation support capability for the final facility.
The first of these is to consider making hardware modifications to the
QM-1. This could include expansion of the maximum permissible control-
store size or the addition of a bus to connect main-store and control-
store directly. Secondly, other micro-programmable machines may be
more amenable to the application. And finally, the possibility of
building a special purpose, gate level emulation machine should be
considered. Such a machine might be readily assembled from 2900 series
chips. All three of these possibilities will be considered in the second
phase of the contract.
VIII. REFERENCES
1. QM-1 Hardware Level User's Manual, Nanodata Corporation,
   March 1976.
2. MULTI Micromachine Description, Revision 1, Nanodata
   Corporation, March 11, 1976.
3. Digital Avionics Design and Reliability Analyzer,
   Feasibility Study Report, MCR-79-663, Martin-Marietta
   Aerospace Corporation, November 1979.
APPENDIX A
UNIFORMITY OF GATE TREATMENT
One of the primary benefits of the NASA LRC gate level emulation
algorithm is the concept of gate processing independent of the function
of the gate itself. What this means in practical terms is that the
algorithm does not need to keep track of the gate type and can handle
ANDs, ORs, NANDs, and inverters all in exactly the same fashion. This
appendix is intended to provide a brief description of how this is
possible by discussing a few examples to illustrate the processing
done and decisions made.
In order to process the gates, two values are required. The first
represents the current value of the gate. We will call this V. The
second quantity relates to the number of inputs of the gate. We will
call this value CNT (for count). This number is the key to the
processing and the distinction as to type of gate is characterized by the
initial values assigned to CNT and V.
In operation, whenever an input to a gate changes value (from 0 to
1 or vice versa), the quantity CNT is operated on. For the 0 to 1 change,
CNT is incremented by 1. For the 1 to 0 change, CNT is decremented by 1.
The gate whose CNT is being updated will change value whenever CNT
transitions either into or out of 0. That is, if either the old value
of CNT is zero (before increment or decrement) or the new value of CNT
is zero (after the increment or decrement), then the value V is changed
(0 to 1 or 1 to 0 depending on current value). A few examples will
best illustrate this.
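
Before turning to the examples, the update rule itself can be written as a short sketch. The function name `apply_input_change` is an illustrative choice; the starting state CNT = -3, V = 0 is the 3 input AND gate initialization derived in Example 1.

```python
def apply_input_change(cnt, v, rising):
    """Update (CNT, V) for one input change; rising means 0 -> 1."""
    new_cnt = cnt + 1 if rising else cnt - 1
    if cnt == 0 or new_cnt == 0:      # CNT moved into or out of zero
        v ^= 1
    return new_cnt, v


# 3 input AND gate, all inputs 0: CNT = -3, V = 0.
state = (-3, 0)
trace = []
for rising in (True, True, True, False):
    state = apply_input_change(*state, rising)
    trace.append(state)
print(trace)  # [(-2, 0), (-1, 0), (0, 1), (-1, 0)]
```

The output rises only when the third input rises (CNT reaches 0) and falls again as soon as any input drops.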
Example 1: 3 input AND gate
    V = A·B·C
The action of the 3 input AND gate is best described
by the following state diagram.
[State diagram: four states ordered by the number of inputs equal to 0
(3, 2, 1, 0); V = 0 in the three leftmost states and V = 1 in the
rightmost state.]
Note that the left to right arcs represent an input going from 0 to 1
while the right to left arcs represent an input going from 1 to 0.
When we get to the rightmost state, all inputs are 1 and hence the output
V is 1. In all the other states, at least one input is 0 and the
output V is 0. Now suppose we let CNT = 0 for the case where all
inputs = 1. The state diagram with CNT values in place of number of
inputs = 0 is:
[State diagram: the same four states labeled CNT = -3, -2, -1, 0; a
dotted line separates CNT = -1 (V = 0) from CNT = 0 (V = 1).]
Note that the arcs on this diagram represent exactly the same as on
the previous diagram; i.e., left to right is an input going from 0 to 1
and right to left is an input going from 1 to 0. The transition of
CNT into and out of 0 occurs across the dotted line and value V does
indeed change when we cross this line. From this diagram, we see that,
to initialize CNT and V for an n-input AND gate, we first assume all
inputs are 0. We then set CNT = -number of inputs and V = 0. After
initialization, we can blindly follow the specified processing and
the proper gate output value will be produced.
Example 2 : 2 input NAND gate
The NAND gate is a simple extension to the AND. The only difference is the
value of V. V will be 1 to the left of the dotted line and 0 to the
right. Thus a 2 input NAND gate state diagram looks like:
[State diagram: states CNT = -2, -1, 0 from left to right. Value V = 1
to the left of the dotted line; Value V = 0 to the right.]
Initialization conditions are CNT = -number of inputs and V = 1.
Example 3 : 4 input OR gate
V = A + B + C + D
The state diagram for the OR gate looks like:
[State diagram: five states ordered left to right by the number of
inputs equal to 1 (0 through 4). Value V = 0 in the leftmost state;
Value V = 1 in all other states.]
To isolate the CNT = 0 node in this case, we need to isolate the state
to the left of the dotted line. Thus the CNT state diagram looks
like:
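[State diagram: the same states labeled CNT = 0, 1, 2, 3, 4. Left to
right arcs increment CNT; right to left arcs decrement CNT. Value V = 0
to the left of the dotted line; Value V = 1 to the right.]
Thus the initial conditions (all inputs 0) for an OR gate are: CNT = 0,
V = 0. The extension to a NOR is made simply by using CNT = 0, V = 1
for initial conditions.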
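Example 4 : 2 input XOR gate
The state diagram for the XOR gate is:
[State diagram: three states with V = 0, V = 1, V = 0 from left to
right, separated by two dotted lines.]
The CNT state diagram becomes:
[State diagram: the same three states labeled with CNT values; V changes
across each dotted line.]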
This again is a simple case for the general algorithm. Initial
values for the two input case are CNT = -1, V = 0. This is also the
case for the general "odd" number of inputs type gate (i.e., 1 is produced
for an odd number of inputs = 1). The only difference is that instead
of signed arithmetic, modulo arithmetic is used (-1 mod 2 = 1).
Based on these examples, the initial conditions (assuming all
inputs 0) for the most common gates are given in Table A-1. The
inverter can be handled as either a one input NAND or a one input NOR.

                                  Initial Value
Gate Type                         CNT               V
AND                               -number inputs    0
NAND                              -number inputs    1
OR                                0                 0
NOR                               0                 1
INVERT (as one input NAND)        -1                1
INVERT (as one input NOR)         0                 1
XOR                               -1                0
NXOR                              -1                1

                          Table A-1
Once the initialization has been done, the processing of each
gate is exactly the same, regardless of type. In addition, the concept
is flexible enough to be able to handle more non-standard type gates
(e.g., the odd number of input counter which could be used for parity
generation).
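As a check on Table A-1, the following Python sketch initializes each gate type from the table, applies the single common update rule, and compares the result with the ordinary Boolean definition of the gate. It is a minimal illustration under stated assumptions: the names initial_state, step, settle, REFERENCE and check_all are hypothetical, and the XOR and NXOR counters are kept modulo 2 as the text suggests for the "odd" type gate.

```python
from itertools import product


def initial_state(gate, n):
    """Return (CNT, V, modulus) for an n-input gate with all inputs 0."""
    table = {
        "AND":  (-n, 0, None),
        "NAND": (-n, 1, None),
        "OR":   (0, 0, None),
        "NOR":  (0, 1, None),
        "XOR":  (-1 % 2, 0, 2),   # modulo arithmetic: -1 mod 2 = 1
        "NXOR": (-1 % 2, 1, 2),
    }
    return table[gate]


def step(cnt, v, mod, rose):
    """One input change; identical processing for every gate type."""
    new_cnt = cnt + (1 if rose else -1)
    if mod is not None:
        new_cnt %= mod
    if cnt == 0 or new_cnt == 0:   # transition into or out of zero
        v = 1 - v
    return new_cnt, v


def settle(gate, inputs):
    """Start from all inputs 0, raise the inputs that are 1, return V."""
    cnt, v, mod = initial_state(gate, len(inputs))
    for bit in inputs:
        if bit:
            cnt, v = step(cnt, v, mod, rose=True)
    return v


# Conventional definitions used only for comparison.
REFERENCE = {
    "AND":  lambda b: int(all(b)),
    "NAND": lambda b: int(not all(b)),
    "OR":   lambda b: int(any(b)),
    "NOR":  lambda b: int(not any(b)),
    "XOR":  lambda b: sum(b) % 2,
    "NXOR": lambda b: 1 - sum(b) % 2,
}


def check_all(n):
    """True if every gate type matches its definition for all n-bit inputs."""
    return all(settle(g, bits) == REFERENCE[g](bits)
               for g in REFERENCE
               for bits in product((0, 1), repeat=n))
```

Calling check_all(3), for example, exercises all eight input patterns for each of the six gate types. The inverter rows of the table correspond to settle("NAND", (a,)) and settle("NOR", (a,)).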
APPENDIX B
DERIVATION OF EQUATIONS
This appendix explains the basis for the system "emulation time"
equations used in this report. Two basic equations are derived herein:
the first for systems residing totally in control-store (e.g., system
size ≤ 6000 gates), and the second an approximation for larger systems
which necessarily have only part of the system in control-store and the
remainder in main-store. This appendix first explains the former of
these and then gives some examples. Following this is the derivation
of the second equation and then an example of its usage.
It should be understood that these equations are derived from the
actual implementation of the algorithm described in section V.
The given timing considerations are simply the sum of the individual
T-periods involved in executing the algorithm. (Thus when it is stated
that x-processing takes 46 T-periods, this comes from examining the
code itself. Recall that one T-period is the basic unit of time for
nanocode, and is defined to be 80ns).
I. DERIVATION OF THE EQUATION FOR CONTROL-STORE RESIDENT SYSTEMS
Given: 1) x-processing requires 46 T-periods/x and 29 T-periods/
          logic level in the system (using 5 T-periods for the
          most common case processing routine);
       2) z-processing requires:
          62 T-periods/non-null-z
          and 35 T-periods/null-z
Where non-null-z's are those output gates whose counter (CNT)
transitions into or out of zero as it is processed during normal
z-processing. This results in that z gate being queued as an x-gate for
processing in a future cycle. Null-z's are z-gates whose counter
does not transition into or out of zero and hence cause no further
action to be taken. In addition the last z processed for each x
takes 5 T-periods less than the other z's, hence: -5 T-periods/x.
Adding these together we get
T_resident = (46 - 5)x + 62 × (# non-null z's) + 35 × (# null z's) + 29y   (1)
with y = # logic levels in the system.
Now consider that
1) the fan-out factor F = (total # z's) / (total # x's) = # z's/x
and
2) the number of non-null z's is equal to the number of x's, since
   each x comes from a non-null z.
Hence:
   F = z/x
so
   (# null z's) + x = total # z's
   (# null z's) + x = xF
   # null z's = xF - x = x(F - 1).   (2)
Substituting equation (2) into equation (1) we get:
T = (46 - 5)x + 62x + 35[x(F - 1)] + 29y
  = 41x + 62x + 35xF - 35x + 29y
  = 68x + 35xF + 29y
Therefore T_resident = (68 + 35F)x + 29y.   (3)
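The algebra leading to equation (3) can be checked numerically. The short Python sketch below evaluates equation (1) with equation (2) substituted and compares it with the closed form; the function names are illustrative only.

```python
import math


def t_resident_from_parts(x, f, y):
    """Equation (1) with equation (2) substituted, in T-periods."""
    non_null_z = x            # each queued x comes from a non-null z
    null_z = x * (f - 1)      # equation (2)
    return (46 - 5) * x + 62 * non_null_z + 35 * null_z + 29 * y


def t_resident(x, f, y):
    """Closed form, equation (3), in T-periods."""
    return (68 + 35 * f) * x + 29 * y


# The two forms agree for arbitrary x, F and y.
for x, f, y in [(100, 2.0, 45), (300, 2.0, 77), (10, 3.5, 4)]:
    assert math.isclose(t_resident_from_parts(x, f, y), t_resident(x, f, y))
```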
Equation (3) is the general form of the control-store resident
system equation. Now for the following examples let us assume that 5%
of the system changes at any given time. Thus
x = (.05) X (system size) (3.1)
Additionally, in order to estimate the value y, the number of logic
levels (or x-queue cycles), we need to make an assumption concerning the
system itself. To facilitate this assumption, we will deal with an
intuitive concept called system shape. We assume the system is in
general rectangular when the logic levels are plotted across the top
of the diagram and the gates per level down the side. For example,
the RS flip-flop below is square (2 logic levels with 2 gates/level).
[Figure: RS flip-flop drawn with logic levels 1 and 2 across the top
and two gates per level.]
Now, we recognize that, for larger systems, the general shape
will be a non-square rectangle with the long side in the vertical
direction.
[Figure: rectangle with its long side vertical.]
However, the algorithm will translate feedback (as in the flip-flop
shown before) as additional logic levels. Because of this, the
general shape of the system, as seen by the algorithm, will become more
square. Therefore we will assume for the following examples that the
system is square so that the number of logic levels (y) = number of
gates per logic level = sqrt(system size).   (3.2)
Combining (3), (3.1), and (3.2) with a fan-out factor of
F = # z's/x = 2.0 we get:
T_resident = (68 + 35(2.0))(.05)(system size) + 29 sqrt(system size)
Hence:
T_resident = (6.9)(system size) + 29 sqrt(system size)   (3.3)
Example I: System Size = 2000 gates:
T_resident = (6.9)(2000) + 29 sqrt(2000)
           = 13800 + 29(44.72)
           = 15097 T-periods = 1,207,760 ns
Or a slow down factor = 1207.8:1
for a 1 µs machine cycle. (12,077.6:1 for a 0.1 µs machine)
Example II: System Size = 6000 gates:
T_resident = (6.9)(6000) + 29 sqrt(6000)
           = 41400 + 29(77.46)
           = 43646 T-periods = 3,491,680 ns
Or a slow down factor = 3491.7:1
for a 1 µs machine cycle. (34,916.8:1 for a 0.1 µs machine)
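The two resident-system examples can be reproduced with a short Python sketch. It assumes, as the text does, a square system, 5% of the gates changing, F = 2.0, and an 80 ns T-period; the function and constant names are illustrative only.

```python
import math

T_PERIOD_NS = 80  # one T-period, as defined for the nanocode


def t_resident_square(system_size, fan_out=2.0, activity=0.05):
    """Equation (3) with x from (3.1) and y from (3.2), in T-periods."""
    x = activity * system_size      # (3.1) gates changing per cycle
    y = math.sqrt(system_size)      # (3.2) logic levels of a square system
    return (68 + 35 * fan_out) * x + 29 * y


def slow_down(t_periods, machine_cycle_ns):
    """Emulation time divided by the emulated machine's cycle time."""
    return t_periods * T_PERIOD_NS / machine_cycle_ns


for size in (2000, 6000):
    t = t_resident_square(size)
    print(size, round(t), round(slow_down(t, 1000), 1))
```

For 2000 and 6000 gates this gives about 15097 and 43646 T-periods, matching Examples I and II.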
II. DERIVATION OF THE EQUATION FOR NON-CONTROL-STORE-RESIDENT SYSTEMS
In order to accommodate large systems whose size prohibits
residence of the entire system in control-store, in this emulation
"blocks" of gates (large tables of gate data) are loaded into
control-store one at a time for processing, while the remainder of
the system being emulated resides in main-store. When a block is
loaded, the first processing necessary is that needed for z gates
residing in this block which were queued by previous x gates in other
blocks. This uses an additional queue, dedicated to this situation,
and so we have termed this initial z processing "prequeue" processing.
So we define "z prequeued" as the number of z gates processed in this
prequeue phase. Similarly, "z queued" is the number of z gates queued
during the normal processing of each block onto this dedicated z gate
queue.
Now, given that:
1) it takes 22 T-periods/word to transfer 18-bit words
   from main-store into control-store (and vice versa),
   with an overhead of 28 T-periods per block;
2) prequeue processing of z's takes:
   84 × (z prequeued) + 66;
3) there are 4 words/gate in the memory tables plus one
   word for each output gate (z). Assume a fan-out factor
   # z's/x = F. Then 4 + F words are needed in memory
   per gate.
So to begin with, equation (4) below accounts for the loading of the
new block from main-store into control-store, and the processing of
z's which were queued by some previous block. All of this occurs
prior to the normal x and z processing:
T_pre = 44 × (4 + F) × (block size) + 84 × (z prequeued) + 94.   (4)
Equation (4) will be in effect for each block as it is loaded, and includes
the restoration of each block to main-store.
In addition we must consider the processing of gates in the block
while it is in control-store using equation (3).
Furthermore, subroutine ZQUEADD is used to queue z gates that
reside in blocks other than the current block in core (onto the dedicated
z queue). In this process each element in the queue is compared with
the z being placed onto the queue to ensure sequential ordering of
the queue (by block number and gate number). This searching takes
25 T-periods for each queue element searched which does not yield the
position for the new z. Hence if there are m elements in the queue
and n of these are searched for each z being added (including the
element which reveals the location for the new z), we must add
25 × (n - 1) T-periods for each z placed onto the queue. Additional time
is needed as well, but the searching accrues most of the queueing time.
Thus for each block:
T_zqueadd = (z queued) × [72 + 25 × (n - 1)]   (5)
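The ordered insertion and its cost can be sketched in Python as follows. This is not the ZQUEADD nanocode; it is a minimal model that assumes a linear scan from the head of the queue and charges 25 T-periods for every element examined that does not give the insertion position, plus the fixed 72 T-periods of equation (5). The function name zqueadd_cost and the tuple representation of a queue entry are hypothetical, as is the treatment of an insertion at the tail (all elements are counted as misses).

```python
def zqueadd_cost(queue, block, gate):
    """Insert (block, gate) into a queue sorted by block, then gate.

    Returns the modeled cost in T-periods: 72 + 25 per element searched
    that does not yield the position for the new entry.
    """
    key = (block, gate)
    misses = 0
    position = len(queue)            # default: append at the tail
    for i, element in enumerate(queue):
        if element >= key:           # this element reveals the position
            position = i
            break
        misses += 1
    queue.insert(position, key)
    return 72 + 25 * misses


queue = []
costs = [zqueadd_cost(queue, b, g) for b, g in [(2, 5), (1, 3), (2, 7), (2, 6)]]
print(queue)   # entries ordered by block number, then gate number
print(costs)
```

When the position is found at the n-th element examined, the cost is 72 + 25 × (n - 1), as in equation (5).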
Hence if we combine equations (3), (4), and (5) we arrive at what
seems to be a reasonable approximation equation for systems of more
than 6000 gates:
T_resident = (68 + 35F)x + 29y   (3)
T_pre      = 44 × (4 + F) × (block size) + 84 × (z prequeued) + 94   (4)
T_zqueadd  = (z queued) × [72 + 25 × (n - 1)]   (5)
T_overall  = (T_resident + T_pre + T_zqueadd) × (# blocks loaded)
(where T_resident is interpreted such that x and y are associated
with block size instead of system size).
Substituting we get:
T = (# blocks in system) × {44 × (4 + F) × (block size)
    + 84 × (z prequeued) + 94 + (z queued) × [72 + 25 × (n - 1)]
    + (68 + 35F)x + 29y}   (6)
Equation (6) is the general form of our equation. Now for the
following example, we will make some assumptions concerning the system
under consideration. First we need some way to approximate, as closely
as possible, the number of output gates (z's) which are prequeued for
a block by other blocks, and as well, the number which are queued by
each block. If we assume a square system (as discussed previously for
the resident system configuration) then the number of outputs for a
block = sqrt(block size). Additionally if 5% of those outputs are changing,
then we can assume 5% of sqrt(block size) as a reasonable estimate for the
number of z's to be prequeued for and/or queued by a given block.
Thus: z prequeued = z queued = .05 sqrt(block size)   (7)
Similarly we can say that the number of logic levels in the system
= y = sqrt(block size)   (8)
(based upon a square system configuration). Now we must gain an
understanding of n, the number of elements in the queue which are
searched in order to add each z queued into the queue. We know from
(7) above that for each block, 5% of sqrt(block size) is the number of
gates queued by that block. Thus
(# blocks in system) × (.05 sqrt(block size))
gives the maximum length of the z-processing queue at any time. Now
we further assume that on the average, 50% of the queue needs to be
searched for any given z.
Hence: n = ½ × (# blocks in the system) × (.05 sqrt(block size))   (9)
Combining equations (6), (7), (8), and (9) we get:
T = (# blocks in system) × [44 × (4 + F) × (block size)
    + 84 × (.05 sqrt(block size)) + 94 + (.05 sqrt(block size))
    × {72 + 25 × [½ × (# blocks in system)
    × (.05 sqrt(block size)) - 1]} + (68 + 35F)x + 29 sqrt(block size)].
Combining terms gives:
T = (# blocks in system) × {94 + 44 × (4 + F) × (block size)
    + (.05 sqrt(block size)) × [156 + 12.50 × (# blocks in system)
    × (.05 sqrt(block size)) - 25] + (68 + 35F)x + 29 sqrt(block size)}
  = (# blocks in system) × [94 + 44 × (4 + F) × (block size)
    + 7.8 sqrt(block size) + 0.03125 × (# blocks in system)
    × (block size) - 1.25 sqrt(block size) + (68 + 35F)x
    + 29 sqrt(block size)].
So T = (# blocks in system) × {94 + (block size) × [44 × (4 + F)
    + 0.03125 × (# blocks in system)] + 35.55 sqrt(block size)
    + (68 + 35F)x}.   (10)
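The collection of terms into equation (10) can be verified numerically. The Python sketch below evaluates equation (6) with the estimates (7), (8) and (9) substituted and compares it with equation (10); the function names t_eq6 and t_eq10 are illustrative only.

```python
import math


def t_eq6(blocks, block_size, f, x):
    """Equation (6) with estimates (7), (8), (9) substituted, in T-periods."""
    root = math.sqrt(block_size)
    z_prequeued = z_queued = 0.05 * root       # (7)
    y = root                                   # (8)
    n = 0.5 * blocks * 0.05 * root             # (9)
    per_block = (44 * (4 + f) * block_size
                 + 84 * z_prequeued + 94
                 + z_queued * (72 + 25 * (n - 1))
                 + (68 + 35 * f) * x + 29 * y)
    return blocks * per_block


def t_eq10(blocks, block_size, f, x):
    """Equation (10), in T-periods."""
    root = math.sqrt(block_size)
    return blocks * (94
                     + block_size * (44 * (4 + f) + 0.03125 * blocks)
                     + 35.55 * root
                     + (68 + 35 * f) * x)


for blocks, size, f, x in [(2, 6000, 2.0, 300), (5, 4000, 3.0, 120)]:
    assert math.isclose(t_eq6(blocks, size, f, x), t_eq10(blocks, size, f, x))
```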
Equation (10) is valid for all x and F in all systems meeting our
initial assumptions. But in general we wish to use this as an aid
in determining if this type of system is feasible. Hence let us
further assume that the number of gates changing in the system at
any given time is 5%. Thus x = 5% of the block size. Furthermore,
assume a fan-out factor of two output gates per gate so F = # z's/x
= 2.0. Substituting these into equation (10) we get:
T = (# blocks in system) × {94 + (block size) × [44 × (4 + 2.0)
    + 0.03125 × (# blocks in system) + .05 × (68 + 35 × 2.0)]
    + 35.55 sqrt(block size)}.
In simplifying this equation, for the purpose of understanding the
nature of this relation and its applicability to real systems, we assume
the most simple case in which block processing occurs sequentially
without interblock feedback. This means that the total system size
= (# blocks in the system) × (block size), and hence we can substitute
system size into the equation for the term which contains this product.
Realizing that this is not the general case, it is understood that
for more complex systems, the time T will be greater than that which
is given in this equation.
Thus by combining terms and substituting system size appropriately
we get:
T = (# blocks in system) × [94 + 0.03125 × (system size)
    + 270.9 × (block size) + 35.55 sqrt(block size)].   (11)
Equation (11) gives T (in T-periods/cycle) for simple case systems
with F = 2.0 and 5% of the system changing. An example of its use
follows.
Example of Non-Resident System:
System size = 12000 gates.
Block size = 6000 gates.
(hence # blocks = 2)
T = 2[94 + (0.03125)(12000) + (270.9)(6000) + (35.55)(77.46)]
  = 2(94 + 375 + 1,625,400 + 2753.7)
  = 2(1,628,622.7) = 3,257,245.4 T-periods
  = 260,579,630 ns
Or a slow down factor of 260,580:1 for a 1 µs cycle machine.
(2,605,796:1 for a 0.1 µs machine)
Compare this to the 3491.7:1 slow down for a resident system of
6000 gates. (34,917:1 for a 0.1 µs system)
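The non-resident example can be reproduced with the following Python sketch of equation (11). It assumes the simple case described above (F = 2.0, 5% activity, system size an exact multiple of block size) and an 80 ns T-period; the function names are illustrative only.

```python
import math

T_PERIOD_NS = 80  # one T-period, as defined for the nanocode


def t_non_resident(system_size, block_size):
    """Equation (11), in T-periods per cycle."""
    blocks = system_size / block_size
    return blocks * (94
                     + 0.03125 * system_size
                     + 270.9 * block_size
                     + 35.55 * math.sqrt(block_size))


def slow_down(t_periods, machine_cycle_ns):
    """Emulation time divided by the emulated machine's cycle time."""
    return t_periods * T_PERIOD_NS / machine_cycle_ns


t = t_non_resident(12000, 6000)
print(round(t, 1), round(slow_down(t, 1000)))
```

The result agrees with the hand calculation to within the rounding of sqrt(6000) used there, and gives the same slow down factor of about 260,580:1 for a 1 µs machine.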
APPENDIX C
NANOCODE FOR BEST CASE TIMING ESTIMATE
This appendix contains the "minimum" nanocode to implement the
processing necessary for the algorithm as defined in section III. The
structure shown in Figure III-3 is assumed and the following Local
Store (LS) register assignments are also assumed:
LS register    contents
x              address of current x gate info word
y              address of address of current z gate info word
z              address of current z gate info word
w              MLINK
a              constant integer 2
b              scratch
The x-gate processing (read of info word) and multi-way branch assumes:
- gate info word address in LS register x.

XPROC:  ....  BRANCH(N. + 1)
              KA = x
              KB = 31.
        X...  KA -> FCIA, KB -> FCOD      Set up to read info word into R31.
        .X..                              CS bus wait.
        ..S.  READ CS(CIA), GATE CS,      R31 <- gate info word.
              READ NS, GATE NS
        ....
        X...  B -> FIDX                   Set up top 3 bits of NS address.
        .X..  INCF -> FIDX                A 1 in bit 0 signifies x processing.
        ..X.  LOAD NPC(CS)                Set up NPC for branch based on FIDX
                                          and top 7 bits of gate info word.
        ...S  READ NS, GATE NS            Branch through micro-op-code.

Timing: 4T + 5T = 9T
Read of CLINK for this x assumes:
- gate info word address is in LS register x
- constant 2 is in LS register a
- scratch register is LS register b
- gate info word address for next output (z gate) is in LS register z

        ....  BRANCH (ZPROC)
              KALC = ADD
              KA = x
              KB = a
              KX = b
              KT = y
        X...  KB -> FAIR, KA -> FAIL,     Set up to get b = x+m (z = x+2)
              KX -> FAOD
        .S..  GATE ALU, KT -> FCOD,       Register b <- x+2 (addr CLINK into
              KX -> FCIA                  b). Set up to read CLINK value
                                          into y.
        ..X.                              CS bus wait.
        ...X  READ CS(CIA), GATE CS,      y <- CLINK (address output gate
              READ NS, GATE NS            list).

Timing: 6T
This then proceeds to Z processing (ZPROC).
Set up for next x-gate assumes:
- current gate info word address is in LS register x
- scratch register is LS register b

        ....  BRANCH (XPROC)
              KALC = INCR LEFT
              KA = x
              KB = b
        X...  KA -> FAIL, KB -> FAOD,     Set up to increment address to
              SET CIH                     get addr of LINK.
        .S..  GATE ALU, KB -> FCIA,       Scratch register b <- x+1. Now
              KA -> FCOD                  set up to read that LINK.
        ..X.                              CS bus wait.
        ...S  READ CS(CIA), GATE CS,      x <- address next gate info word
              READ NS, GATE NS            (LINK of current word).
                                          Then go to XPROC.

Timing: 6T
Z-gate processing set up and branch assumes:
- address of the address of this z gate info word is in LS
  register y
- address of this z gate info word in LS register z

ZPROC:  ....  BRANCH(N. + 1)
              KALC = INCR LEFT
              KA = y
              KB = z
              KX = 31.
        X...  KB -> FCOD, KA -> FCIA,     Set up to read gate info word
              KA -> FAIL, SET CIH         for this gate; and to increment
                                          the address to point to next
                                          gate address.
        .X..  KA -> FAOD                  Want to write the new address back.
        ..S.  READ CS(CIA), GATE CS,      Register z <- addr gate info
              GATE ALU, READ NS           word. y <- y+1 (next gate addr).
        ...X  KB -> FCIA, KX -> FCOD,     Set up to read gate info word
              GATE NS                     into R31.
        ....  BRANCH(N. + 1)
        X...                              CS bus wait.
        .S..  READ CS(CIA), GATE CS,      R31 <- gate info word for this z.
              READ NS
        ..X.  B -> FIDX, GATE NS          Set up top 3 bits of NS address.
        ....
        X...  LOAD NPC(CS)                Set up multi way z branch.
        .S..  READ NS, GATE NS            And go.

Timing annotations: 4T, 3T; 12T
Addition of z-gate to link queue assumes:
- this z gate info word address is in LS register z
- scratch register is LS register b
- MLINK is in LS register w

     :  ....  BRANCH (N. + 1)
              KALC = INCR LEFT
              KA = z
              KB = b
              KX = w
              KS = PASS LEFT
        X...  KA -> FAIL, KB -> FAOD,     Set up to find addr of LINK
              SET CIH                     for this z.
        .S..  GATE ALU, KB -> FCIA,       Scratch register b <- addr LINK.
              KA -> FCID,                 Now set to write MLINK into this
              KA -> FAIL                  LINK, and to set MLINK to address
                                          of this z info word.
        ..X.  G(G KS); G -> KALC,         Change ALU to PASS LEFT this gate
              KX -> FAOD                  info word addr to MLINK.
        ...S  WRITE CS(CIA),              Set LINK to MLINK. Set MLINK to
              GATE ALU,                   addr of this gate info word;
              READ NS, GATE NS            and continue.

Timing: 6T
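The effect of the link-queue nanocode is a constant-time insertion at the head of a linked list: the LINK word of the gate takes the old MLINK, and MLINK then points at the gate. The Python sketch below models this under stated assumptions: control-store is represented as a dictionary, the LINK word is taken to sit at the gate info word address plus one (as the INCR LEFT step in the listings suggests), and None stands in for an empty queue. The function names are hypothetical.

```python
def queue_z_gate(memory, mlink, z_addr):
    """Link the gate info word at z_addr onto the head of the queue.

    memory maps control-store addresses to words. Returns the new MLINK.
    """
    memory[z_addr + 1] = mlink   # LINK <- old MLINK
    return z_addr                # MLINK <- address of this gate info word


def walk_queue(memory, mlink):
    """Return the queued gate info word addresses, head first."""
    order = []
    while mlink is not None:
        order.append(mlink)
        mlink = memory[mlink + 1]
    return order


memory = {}
mlink = None                      # assumption: None marks an empty queue
for addr in (100, 200, 300):
    mlink = queue_z_gate(memory, mlink, addr)
print(walk_queue(memory, mlink))  # prints [300, 200, 100]
```

Because no search is involved, the cost does not depend on queue length, which is consistent with the fixed 6T annotation on the listing.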
For each x-gate, the last z output test assumes:
- LS register contains address of this gate info word
- bit 17 is set for the last z in list

        ....  BRANCH (XPROC)
              ALT BRANCH
              KALC = PASS LEFT
              KT = SIGN
              KA = z
        X...  KA -> FAIL                  Test sign bit.
        .X..  LOAD NPC (SEQ)              ALT. BRANCH to next word.
        ..S.  READ NS, GATE NS(T)         Branch to XPROC for sign = 0.
        ...S  READ NS, GATE NS            Otherwise continue to N.+1.
        ....  BRANCH (ZPROC)
        S...  READ NS, GATE NS

Timing annotations: 4/6T, 2T, 4/8T
National Aeronautics and Space Administration
Report Documentation Page
1. Report No.: NASA CR-181641
4. Title and Subtitle: Digital Avionics Design and Reliability Analyzer
5. Report Date: February 1981
9. Performing Organization Name and Address:
   Martin Marietta
   Denver, CO 80201
10. Work Unit No.: 505-66-21-03
11. Contract or Grant No.: NAS1-15780
12. Sponsoring Agency Name and Address:
    National Aeronautics and Space Administration
    Langley Research Center
    Hampton, VA 23665-5225
13. Type of Report and Period Covered: Contractor Report
15. Supplementary Notes:
    Technical Monitor: Gerard E. Migneault
    Langley Research Center
16. Abstract:
    This document contains the description and specifications for a digital
    avionics design and reliability analyzer. Its basic function is to provide
    for the simulation and emulation of the various fault-tolerant digital avionic
    computer designs that are developed. It has been established that hardware
    emulation at the gate-level will be utilized. The primary benefit of
    emulation to reliability analysis is the fact that it provides the capability
    to model a system at a very detailed level. Emulation allows the direct
    insertion of faults into the system, rather than waiting for actual hardware
    failures to occur. This allows for controlled and accelerated testing of
    system reaction to hardware failures. There is a trade study which leads to
    the decision to specify a two-machine system, including an emulation computer
    connected to a general purpose computer. There is also an evaluation of
    potential computers to serve as the emulation computer.
17. Key Words (Suggested by Author(s)):
    Reliability Analysis
    Digital Emulation
18. Distribution Statement:
    Unclassified-Unlimited
    Subject Category 62
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of pages: 153
NASA FORM 1626 OCT 86
