

















TIua documznX ka& been appiovzd ^oi paotcc te-
£eaae and taJU.; -ct* <ia>Vu.bixtLon -U untanctzd.

Built-in Self-Test for an Airborne Diaital Computer
by
Edward Oliver Bierman
Major, United States Marine Corps
B.S., United States Military Academy, 1960
Submitted in partial fulfillment of the
requirements for the degree of





This thesis describes the desiqn of a built-in self-test capability
for a military airborne digital computer. The supportive investigation
of program constraints and their effects on the example test desiqn is
intended to give broad perspective to the general self-test design
problem. Alternate procedures for achieving the goal of airborne
detection and isolation of a certain class of failures to the modular
level are surveyed. A specific test design is evolved illustratinq the
unique mix of program-oriented, periodic techniques, and added hardware,
continuous techniques best suited to the example development program.










II. PROBLEM DEFINITION AND DESIGN OBJECTIVES — 9
A. BROAD GOALS - — 9
B. PROGRAM CONSTRAINTS --- 9
1. Cost of BIT - 9
2. The Parent Computer 11
C. BIT DESIGN OBJECTIVES — 16
III. THE NATURE OF FAILURE 18
IV. TEST PROCEDURE ALTERNATIVES 26
A. GENERAL CONSIDERATIONS -- 26
B. PROCEDURE CATEGORIES 28
1. Normal vs. Marqinal 28
2. Software vs. Hardware 29
3. Continuous vs. Periodic 33
4. Deterministic vs. Non-Deterministic 34
5. Combinational vs. Sequential 34
C. ALTERNATIVES 35
1 . General 35
2. Coding 36
3. Diaqnostic Partitioninq 38
4. Program Hierarchy Testinq 40
5. Software Exercise, Hardware Detection 40
6. The Black-Box Approach 41
7. Concurrent Hardware Checking 44
8. Replication and ComDarison --- - 45
9 Probabalistic Method
V. THE EXAMPLE DESIGN -
47
A. THE TEST CONCEPT -
47
B. THE PROCESSOR UNIT
- 51
C. THE CONTROL UNIT
56
D. THE CORE MEMORY UNIT
62




F. SEQUENCE OF TESTING 70









APPENDIX A PSEUDO-RANDOM NUMBER GENERATOR
-
APPENDIX B FIGURES -
BIBLIOGRAPHY —
-









1. The 2A NAFI Module 78
2. General Processor Module 79
3. Carries - - 80
4. Processor Module Checking Circuitry 81
5. Modular Relationships 82
6. Control Unit Organization 83
7. The Partitioned Control Unit 84
8. ROM Storage Module Details 85
9. Standard Memory Unit Layout 86
10. Sense/Inhibit Related to Core Stack 87
11. Decode Module --- 88
12. Sense/Inhibit Module — 89
13. Scratch Pad Test Procedure 90
ACKNOWLEDGEMENT
The author wishes to acknowledge his considerable debt to the
Advanced Projects Laboratory, Data Systems Division, Aerospace Group,
Hughes Aircraft Company, Culver City, California, in the oreparation
of this thesis. In particular, Mr. Charles A. Pullen of the Advanced
Computer Systems Department, whose competent guidance and constructive
suggestions throughout the major oortion of the supporting Droject
were so helpful, should be cited. In addition, the assistance of
Mr. Charles B. Ames, Mr. Noel A. Boehmer, Mr. Donald Achterberg,
Mr. Charles Disparte, Mr. John A. Schiro, and Mr. Lynn Anderson, all
of Hughes Aircraft Company, is also gratefully acknowledged.
The role played by Professor Mitchell L. Cotton of the Naval
Postgraduate School in the writing of this thesis should not go
unmentioned. His assistance and advice in defining the investigation,
watchina its direction, and maintaining its persnective throughout
is sincerely anoreciated.
I. INTRODUCTION
Maintenance and repair of faulty electronic equipment have always
been the less glamorous companions of design and oneration. Indeed,
the subjects were often broached only after design concepts were
formed and specific circuitry develooed. The evolution of increasinaly
complex electronic systems, such as digital computers, has forced
qreater and earlier consideration of the problems of locatinq failures
and correcting them. A digital computer which automatically tests
itself for proper operation and which provides valuable information to
facilitate maintenance and repair has become yery attractive for mili-
tary and space systems applications. This thesis reports the results
of an investigation to provide such automatic self-checking for a
digital computer system.
A project which considerably supported the investigation was accom-
plished at Hughes Aircraft Company, Culver City, California, during an
industrial experience tour. The project goal of designing a built-in
self-test (BIT) capability for an advanced airborne digital computer
system for military application was more fully realized because BIT
was accepted as a principal desiqn consideration early in the architec-
tural design procedure. The specific desiqn developed will be used
as an example; however, the test procedures will be recoqnized as
being more generally applicable to the class of diqital computers for
which the assumptions and constraints applied herein can be validated.
Only one of many possible solutions to the fault detection and isolation
problem will be presented. The choice made should not be construed to
reflect official policy at Hughes Aircraft Company.
Some qeneral comments at the outset should olace this investigation
in proper perspective and temper expectation with praqmatism. The inves-
tigation has as its central focus the specific BIT desiqn develoned;
however, it is intended to consider the broader systems desiqn options
available, thereby showing the example design in better oersnective.
As Sellers, Hsiao and Bearnson [Ref. 43] so aptly observe, one should
initially set reasonable design objectives relative to the thoroughness
of test, recognizing that exhaustive automatic test is an almost
unattainable practical goal. As Dart of a computer development nro-
qram, the BIT desiqn is subject to the larqer nroqram objectives and
constraints. The first part of this investiqatton will define the test
design problem in more specific terms. Subject to practical limitations,
a reasonable set of test objectives will be developed. Once objectives
have been focused, alternatives for implementation will be considered
and a test concept evolved. Specific test procedures will be oresented
for automatically testing the digital computer. Finally, the results
obtained will be critically evaluated in liqht of the desiqn objectives,
and further related work will be suqqested.
II. PROBLEM DEFINITION AND DESIGN OBJECTIVES
A. BROAD GOALS
Given the framework of a digital computer in a military avionics
application, one can identify three broad goals for a self-test
capability:
1. To decrease the cost of ownership by reducinq maintenance cost/-
time and increasing system availability.
2. To indicate to the pilot in flight the level of system operational
capability available to him.
3. To provide limited assistance throuqh self-test in prototype
design and checkout.
Any information relative to the existence and location of failure will
reduce the time spent (and hence cost) to reoair the comouter and
therefore increase the aircraft's availability for operational ournoses.
Airborne indications of system degradation through failure allow the
pilot to make timely and informed choices of alternatives to ontimize
the probability of successful mission completion. Lastly, self-test
during computer development assists the engineer to more quickly iden-
tify and correct design and hardware faults. In short, BIT is designed




. Cost of BIT
In a very real sense, the dominatinq factor effecting BIT desiqn
problem definition is cost. Cost has several facets. The cost of BIT
is considered to be part of the overall computer nroqram price taq.
Required performance criteria for the completed comnuter system are
specified by the sponsoring government agency to the aerospace industry.
A participating company must strive to reduce its proposed system's
cost while meeting or exceedinq specifications to remain competitive.
So within the overall program development and production cost, the
contribution of BIT must be justified and minimized. Since the broad
goal of increased cost-effectiveness has been identified for BIT,
justification includes critical assessment of the added cost to the
computer program of providing a self-test capability to ensure that a
compensatory benefit in reduced cost of ownership will be realized.
Sources of added cost for BIT include but are not limited to
the followinq:
1. The checking hardware itself
2. Additional power required
3. Greater capacity logic to provide for the added checking
hardware; e.q., drivers with greater fanout
4. Additional data lines to provide for test hardware and
procedures
5. Storage capacity required for BIT routines and data
6. Design, programming, and development costs
Other "costs", often translated into dollar values, include the
penalties (if any) attached to increased size and weiqht of an airborne
computer provided with BIT capability. For an air superiority fiahter
application, these penalties are severe.
Hughes Aircraft Co. uses internally qenerated weiqhting factors
of $500/lb and $5000/ft3 for added hardware. To illustrate usinq
these typical penalties, two computers are compared:
1) A 0.5 ft3, 25 lb computer costinq $50k
2) A 0.4 ft3 , 20 lb computer costinq $52. 5k
The penalties added to computer (1) are:
0.1 ft 3 x $5000/ft3 = $ 500 for volume
5 lb x $500/lb = $2500 for weiqht
Total = S3000 penalty
Computer (2), thouqh ostensibly costinq more, is $500 less expensive
after penalties are applied.
10
The benefits of BIT can also be reduced to monetary terms by
operations analysis techniques. Projected maintenance experience, snare
parts costs, inventory levels, and the effects of maintenance concents
can all be qiven dollar values. However, the relative weight that
increased system operational availability receives is more subjective.
In a space system, for example, there is a very hiah premium on avail-
ability; in a military airborne system, availability is important but
not as critical
.
The result on the overall cost of ownership for the military
system is that, while the penalties for providina BIT are quite clear,
the benefits are harder to evaluate and therefore less visible. Even
when a clear long-term reduction in cost of ownership can be expected,
insufficient available funding may force procurement of a less expen-
sive option without a BIT capability. The effect on BIT desian is to
place emphasis on minimizing the more visible nenalties, reducinn them
to an acceptable fixed percentage of the system cost without a BIT
2
capability.
2. The Parent Computer
The nature of the computer for which the self-test capability
is to be provided certainly has a large influence on the BIT desian
objectives. For the example design, the characteristics for the
parent computer evolved from the original specifications and the
subsequent company policy decisions. The parent computer was to:
2
Estimates in the literature ranqe from 3% cost increase for BIT
for a commercial machine to over 300% for a triplicated space
system computer. A figure of 10% fell in the general area of
acceptability at Hughes Aircraft Company for this project.
I I
1. Have a military avionics application
2. Be modular
3. Have flexible word length
4. Be non-redundant
5. Be repaired on the ground, not in the air
6. Have minimal storage capacity
7. Suffer no operational degradation because of BIT
8. Be developed on short schedule at low risk
Each of these characteristics will be more thoroughly discussed.
A military avionics application implies that size and weiqht
are to be minimized consistent with the cost penalties discussed earlier.
It also imDlies high speed, real-time comDutation. The more rigid
military specifications concerning operating temperatures, humidity,
shock resistance and other severe environmental factors affect the
quality of components used and the packaging of these components at
all levels.
The comnuter was to be of modular construction, the term module
referring to a standardized plug-in circuit card with a given surface
area and number of pin connectors. The Naval Avionics Facility
Indianapolis (NAFI) has developed a series of modules designed to be
acceptable as the basic building blocks for many military applications
[Ref. 10]. The basic "NAFI module" chosen for use in the parent com-
puter (with some modifications) was the "2A" size whose important fea-
tures relative to BIT design are dimensions of roughly five (5) inches
in length and two (2) inches in height (both sides may be used for
mounting hardware) and 30 pins in the two bottom connectors. Figure 1,
derived from Ref. 10, depicts the 2A NAFI module. The module's surface
area and number of pins place limitations on (1) the amount of hardware
i
12
which will physically fit on the module (heat dissioation is a related
problem), and (2) on the number of external, intermodular electrical
paths available. The level of solid state technology of the implementing
circuitry determines whether the area or pin limitation dominates. For
example, circuitry consisting of discrete comnonents (senarate trans-
istors, capacitors, resistors) tends to impose an area limitation because
the relatively large size of individual components limits the number
which can be accommodated in the fixed area, before the available pin
connectors are exhausted. At the other extreme, circuitry imolemented
using large scale integration (LSI) technology, in which perhaos 1000
or more gates are placed on a single silicon chip [Ref. 48], requires
little mounting surface area. The number of external connections
needed, however, can be large. Hence, in the latter case a pin limi-
tation exists. In between these extremes fall the integrated circuit
(IC) and medium scale integration (MSI) technological levels which may
be area or pin limited for specific modules. The size of the modular
partition chosen for the parent computer and the predominantly IC/MSI
technology utilized will be seen to have a significant effect on BIT
design.
Partitioning of the parent computer was not otherwise speci-
fied, except that the computer's basic design was to be readily
adaptable for differing word length applications (specifically,
multiples of eight bits, up to a 32-bit word lenqth) without major
redesign of the original modules. The expected initial application
of the parent computer specified a 24-bit word lenqth; this word lenqth
will be used in the example design.
I I
The parent computer was to be essentially non-redundant; that
is, no general replication of hardware at any level was intended. This
constraint arose from cost considerations. Penalties in the additional
hardware cost, increased size and weight associated with redundancy
were deemed unacceptable. Additionally, the mean time between failures
(MTBF) of the computer tends to be several times higher than the MTBF
3
of the equipment which the computer serves; e.g., a radar.
A closely related characteristic dictated ground repair of
failures. No automatic reconfiguration under failure or fault-masking
was intended, since such self-repair qenerally requires some redun-
dancy. Airborne personnel to effect maintenance would not be available
in the type aircraft for which application was projected. Access,
removal of shielding, and dust-free repair would be difficult airborne.
Built-in test was therefore restricted to detection and isolation of
faults, and was not intended to include a self-repair capability.
The requirement for minimal storaqe capacity was again related
to cost. Random access storage such as core memory is expensive in
hardware, size, weight, and power requirements. No peripheral bulk
storaqe devices such as drum, disc, or taoe were to be available. The
effect of these characteristics of the parent computer on the design
of BIT is significant. The dedication of memory bit locations to
storage of error detecting codes, such as narity or residue, is elim-
inated from consideration because of the attendant reduction in word
Reference 34 shows MTBF's in the 100's of hours for the F-111A
weapons system avionics equipments. MTBF's for airborne computers,
as shown by marketinq brochures, are typically in the 1000's of hours
14
lenath available to the flinht nroqram. Increased word lenqth is
unacceptable because of the greater storage requirement and hiaher cost.
Coding is a widely used technique for detecting data transfer errors.
The storage of software self-test programs and data in core-memory is
also virtually eliminated from the list of often-used test tools. The
core memory, then, is reserved for the flight program and for operational
use with negligible capacity available for BIT use.
Any self-test capability is not allowed to degrade the real-time
operational efficiency of the computer in speed or availability. The
effect of this requirement is to prohibit the insertion of test hard-
ware in operational propaqation paths because of the delays thereby
introduced. Additionally, any sequential, oroqram-oriented test routines
would have to be exercised on a time-shared basis with onqoinq tactical
onerations in available short blocks of "idle" time. Such routines would
therefore have to be interruptable without destroyino test efficacy
so that the machine could be returned to operational computation immed-
iately, whenever required.
The overall computer program called for a short development
schedule with low risk to the company. These constraints dictate the
use of existing techniques and designs wherever feasible. No completely
new technoloqy could be developed within schedule requirements. Off-
the-shelf hardware components would be primarily used because of the
risks attendant in meetinq a short schedule with comDonents potentially
available from outside suppliers at production time but still under
development durinq comnuter desiqn.
15
C. BIT DESIGN OBJECTIVES
With the aforementioned broad goals set forth and the constraints
imposed on self-test design by the nature of the larqer program more
clearly defined, realistic BIT design objectives can be developed. The
maintenance problem would be most significantly assisted if faults
could be isolated to the plug-in card, or modular level. Sub-modular
fault isolation, while desirable from the standpoint of higher echelon
maintenance, does not contribute any more significantly to increased
aircraft availability since the faulty module must be removed in either
case. Conformal coating for environmental orotection applied to cir-
cuitry within the module makes removal of sub-modular components a
difficult and specialized task inanorooriate at the immediate squadron
(1st echelon) level. Higher level isolation would require renlacement
of large and more expensive units of the computer. Stockinq of spare
parts at the module level seems reasonable for the squadron shoo both
in the inventory costs involved and the volumes required. Of course,
commonality among modules reduces the different types to be stocked and
is desirable. These heuristic arguments can be quantized, but the views
presented should suffice to intuitively support the decision to set
fault isolation to the modular level as a BIT desiqn objective.
Since no airborne repair, manual or automatic, is required,
reporting of faults detected within specific modules completes the
self-test task. A compatible design objective, supporting the second
and third broad goals related to pilot notification of failure and aid
to prototype development, is to rapidly indicate the soecific modular
16
location of existing faults to a central location for immediate use by
the pilot and later use by maintenance personnel upon mission termin-
ation.
The BIT design objectives and major constraints can now be summa-
rized. The BIT design should automatically detect failures in the
computer and isolate them to the modular level airborne. The modular
location of such failures should be rapidlv reported to a central loca-
tion. The design should be minimized as to cost, require negligible
core memory storage, utilize no coding techniques requirinq storaqe
capacity, and inflict no operational deqradation on the computer's soeed
and availability. All this should be accomplished on short schedule
and at low risk. While these objectives and constraints for a self-
test desiqn are imposing, they are not atypical of the requirements
of a military airborne system. Just what constitutes the failure to
be detected can now be examined.
17
III. THE NATURE OF FAILURE
Since the objective of "fault detection" has been set, its meaninq
should be explained. This section will consider what constitutes a
fault and will define several related terms. The literature is replete
with descriptive terms such as catastrophic , intermittent, solid,
transient, burst, marginal, multiple, insioient, minor, and qross, applied
4
to fault and the related terms failure, error and malfunction. The
terms "fault," "failure," and "malfunction" will be used synonymously
to mean a physical defect in equipment which causes that equipment to
perform in an unsatisfactory manner. The substandard performance
usually resulting from a fault will be termed an "error." Another way
of statinq this is to say that an error is an incorrect result. The
terms "solid" and "intermittent" will be used to characterize the dura-
tion of the error, and by inference, the failure causinq the error.
A solid error will refer to an error which results from a failure which
persists; a solid error will consistently recur under the same equip-
ment conditions. An intermittent error will be one which is of short or
transient duration and is non-persistent; that is, an intermittent
error does not consistently recur given the same conditions. The terms
"catastrophic" and "transient" are often used to describe these two
categories of error, but they will not qenerally be used herein. The
idea of deqrees of failure is introduced by such terms as marqinal,
4
A qood discussion of some typical terminoloqy surroundinq "failure"
i
r
, found in Ref. 24.
13
sinqle or multiple, minor or qross. The term "marainal" will be re-
served to describe a category of testinq. The terms "sinale" and
"multiple" will refer to one failure or error, and to more than one
failure or error, respectively.
Erroneous results can arise from sources other than equipment
failure. Proqramminq inaccuracies and human operator mistakes will
not be considered to be error within the scope of this investigation.
Equipment failure leadinq to erroneous results represents the class of
faults to be detected by the desiqn test techniques. Inaccurate intra-
computer data transmission, faults in loqic, failures in core memory,
and failed test circuitry are representative of faults within this
class of interest.
Certain types of equipment, generally termed "hard-core," serve
the entire computer and must operate properly If the computer is to
function at all. Examples of such equipment are main power supplies,
clockinn circuitry, coolinq equipment and other mechanical components
such as electromaqnetic interference shieldinq. Faults in this hard-
core equipment have been effectively identified by voltaqe/temperature
sensing devices which continuously compare performance to preset toler-
ances, and similar well-known techniques [Ref. 46]. Faults in the types
of hard-core equipment described above will not be considered to be part
of the BIT detection and isolation task as defined herein. The main
thrust of this investigation will treat the less adequately resolved
problems of identifvinq and locatinq all possible failures in the loqic
19
circuitry, storage, data transmission paths, checking hardware and
other equipment which is not hard-core in the previous sense of provid-
5
ing "housekeepina" and utility services.
Faults are usually identified by detecting the resultant errors.
If a fault does not produce erroneous results, its existence is of
little immediate consequence. For example, a shorted transistor always
causina an output to be in the low voltaae level (the zero of positive
logic having the binary logical states one and zero) does not become
significant until the hiah voltage level represents the proper output
value. In other words, a stuck-at-zero failure is not important until
the proper result should be a logical one. Conversely, as previously
mentioned, all errors are not the result of eauipment failure (e.g.,
operator mistakes), but some of these appear to be the result of eguip-
ment failure. Eauipment failure modes should be examined to identify
those of interest to the test design.
Assuming transistor building blocks (discrete, IC, MSI, or LSI
technoloay) for the example computer Ionic (vice cryoaenics or some
other technoloay), some of the possible failure modes are:
1. Inputs or outputs stuck at the hiah or low voltage levels
(stuck-at-one, stuck-at-zero). Inputs stuck above the hiah
level or below the low level, a possible condition in some
computers, have the same effect.
2. Inputs or outputs stuck at an indeterminate, intermediate
level between the hiah and low voltage levels. Indeterminate
voltage levels miaht sometimes be interpreted as a one, and
sometimes as a zero.
5
The term "hard-core" will later also be applied to some eauipment
within this nroun subject to test, but in a different sense.
20
3. Deteriorated comDonent response to inputs or weakened drive
capability of outputs.
The first failure mode is the one of greatest interest for the
subject test design because such persistent failures result in solid
errors susceptible to detection and isolation.
The second failure mode, inputs or outputs stuck at an indeterminate
voltage level, might lead to no error if nroDerly interpreted, inter-
mittent error if interpreted differently at different times, or solid
error if consistently misinterpreted. An assumption which is often
made in deriving a diagnostic scheme is to disallow the second failure
mode. Another way of stating this is to assume that logic fails to
one of the two logic levels, one or zero, and not to some intermediate
level. The assumption can be validated by setting a voltaqe threshold
above which results will be interpreted as one logical state, and below
which results will be interpreted as the other logical state. The
assumption of disallowing the second failure mode will be made for the
test design.
The third failure mode could result in solid or intermittent
errors depending on the consistency of the erroneous results and the
duration. For example, a weak driving capacity of an outnut feeding
several subsequent inputs (fan out) could result in some inDuts receiv-
ing a zero and others a one. This would be a solid error if the same
Tor example, see Ref. 31.
This assumption is occasionally not made. For examnle, one scheme
which relies on circuitry which fails to a NULL state intermediate be-
tween one and zero is described by Connolly and Schmitt [Ref. 8]. The
assumption of failure to one or zero is far more common.
.'1
inputs always received the same signal under the given conditions. An
intermittent error would result if, for example, a driven input received
a logical one in one instance and a zero in another for the same drivinq
output value. The third failure mode is considered part of the test
problem. It will be discussed again under the topic of marginal testing
in Section IV.
Intermittent errors should be discussed more fully, as they are
sometimes part of the test problem and sometimes not. Some physical
causes of intermittent errors are:
1. Dirty connectors - a small smudge of oil or dirt on a oin might
be sufficient to intermittently block the low current levels
typically found in intermodular lines. Vibration can provide
slight shifts in the contact surfaces sufficient to make or
break contact.
2. Temporary overheating of hardware regions - when not persistent,
such transient environmental conditions can cause intermittent
erroneous results.
3. Loose connections or particles between circuits or within
hardware packages - vibration can cause ooen and closed circuit
conditions intermittently.
4. Unusual electromagnetic interference (EMI) or coupling-spikes
coupled into the circuitry from outside, or appearing through
the power supply can cause changes tn state resultinq
in erroneous performance.
5. Drifting characteristics - aqing or deterioratinq components
or changing environmental conditions can cause varying and
inconsistent performance changes in circuitry.
While the above list is certainly not complete, it does serve to
illustrate the many sources of intermittent error, and to suggest the
difficulty of detecting and isolating the causes of such errors. Those
causes not representing hardware failure, such as dirtv connectors or
unusual EMI, can cause erroneous results which falsely indite fault-
free circuitry (which, when faulty, exhibits the same symptoms). Such
causes of faulty performance are important because even one state
22
change affectinq a logical decision within the machine can produce
catastrophic results. While intermittent errors caused by other than
hardware failure have been excluded from the test oroblem, test oro-
cedures must endeavor to ensure they are, in fact, excluded. A proce-
dure which signals hardware failure when none exists not only reduces
the level of confidence accorded error signals, but also increases
cost, in direct opposition to BIT objectives, by causing fault-free
circuitry to be replaced.
The degree, or extent, of failure is also important to test design.
Single failures are inherently easier to detect and isolate than
multiple failures; the detection problem is smaller. Additionally,
multiple solid failures can have the property of occasionally masking
each other, qiving the appearance of intermittent single failure. To
reduce the test problem to reasonable limits, the assumption that there
exists at most a single failure in a computer to be tested is often
made. The validity of the "single failure assumption" will be examined
relative to the example BIT design as a possible means of reducing the
quantity of added hardware required to give sufficient test effectiveness
within acceptable program bounds.
The components used in modern military/space systems are designed
to have high individual component reliability. Low power silicon
transistors in the Raytheon equipment used in Apollo and Polaris Dro-
_5
grams, for example, were found to have a failure rate of 1.4 x 10
failures/1000 hours [Ref. 40]. If mul ticomponent nackaqes such as IC's
are used, the interconnections between comnonents on the same silicon
chip are more reliable than in the discrete component case. Overall
23
equipment reliability can therefore be expected to go ud through the
use of integrated circuits [Ref. 29]. Figures provided from a variety
of aerospace suppliers 1964 to 1966 show failure rates for integrated
-5 -4
circuits from 7 x 10 failures/1000 hours to 5.2 x 10 failures/
1000 hours [Ref. 19]. Brauer has reported integrated circuit failure
— ft 3
rates varying from 7 x 10" failures/1000 hours to 6 x 10" v failures/
1000 hours [Ref. 4]. Infant mortality failures and adolescent failures,
usually occurring during burn-in and testing at the factory, exceed
the exponential failures (constant failure rate) more common in an
operationally deployed unit. This partially accounts for the diversity
in the cited failure rates, and emphasizes the need to know failure
rate sources and conditions for proper interpretation. The noint to
be made is that even the most pessimistic of the cited figures shows
that a long operating life can be expected from modern components.
The MTBF of a computer considers all the different component
failure rates in addition to connection reliabilities and workmanship
flaws in assigning a commonly used overall reliability figure of merit.
The MTBF of the digital airborne computer can be expected to be in the
o
1000's of hours. With system MTBF's of this order of maqnitude, the
probability of experiencing one failure in a short time interval is
very small. Experiencing two or more failures in the same short time
interval is highly improbable. It then seems reasonable that one incurs
a very small risk of undetected error if one designs test techniques
o
'The Autonetics D26J airborne computer with an estimated MTBF of
18,000 hrs; the Litton LC-728, 4,250 hrs; the Raytheon R-ll, 3,500 hrs;
the CDC 5400, 2,500 hrs are examples from marketing brochures.
24
assuming single failure, as long as testing is done at least periodically
at short intervals. This intuitive approach is used, as more exact
calculations are dependent on actual failure rates, numbers and types of
components, specified confidence levels and assumed distributions. The
single failure assumption seems to be justified for the example design,
and will be made. Restated, the assumption asserts that the computer
is constructed of highly reliable individual components so that essen-
tially simultaneous failure of more than one component is so improbable
that it can be reasonably nenlected. The assumption is further
justified economically by program limitations in that testing for
multiple failures reguires more added hardware at an unacceptable cost
penalty.
The foregoing examination of the nature of failure has led to some
assumptions and conclusions relative to BIT design. First of all,
logic will be assumed to fail to one of its two logic states, and not
to some intermediate level. Solid failures will be of major interest;
however, any failure leading to erroneous results is part of the detec-
tion and isolation problem. Intermittent errors will be especially
difficult to detect and isolate. Those erroneous results caused by
non-hardware sources are important in that care must be taken to avoid
condemning fault-free hardware as their source. Finally, the single
error assumption will be made because little risk of undetected error
is thereby incurred, and it presents the most reasonable annroach from
an economic standpoint. Now the possible test procedures available
to meet the BIT objectives can be considered.
25
IV. TEST PROCEDURE ALTERNATIVES
A. GENERAL CONSIDERATIONS
Comparison forms the basis of all test procedures. A norm aqainst
which comparison can be made must be available, either a priori or as
a result of some generating process. The computer then produces a
result which is suspect until verified against the norm. The variety
of procedures available for testing a computer have this comparative
process in common.
Since thorough testing for all possible errors within the test area
of interest is the objective, the different levels at which testing can
be conducted should be identified. The computer can be functionally
exercised by directing it to perform the operations for which it was
designed on a variety of operands. The thoroughness of test can be
evaluated by asking how many of the possible machine states are thereby
verified. The totality of the possible combinations of inputs and
outputs of the machine's logic circuits form the set of machine states.
A gross functional check performed by exercising the computer's instruc-
tion set on a few operands can be seen to be less efficient and complete
in verifying proper operation of all circuitry than comprehensive
application of the set of inputs with comparison of resulting outputs
aqainst the set of unfailed machine output states. The one test method
is superficial while the other is unnecessarily exhaustive. Each has
been termed "100% testing" by industrial marketeers. The percentage
of testinq for this investigation will refer to the percentage of
possible errors for which checking has been performed. The former method
26
mentioned above would Probably vield a low nercentane while the latter
would represent testinq in excess of 100%. The closer to the Ionic
level that testinn is directed, the more thorounh testinn becomes. Se-
lective testino at the Ionic level can be most efficient in identifvinn
all the failures of interest.
Not onlv must test procedures check for all possible failures of
interest, thev must also take care to avoid sianallina error when none
exists, as alluded to in Section III in the case of non-hardware-caused
intermittent error. Testinn which is not thorounh leads to invalidation
of the sinnle failure assumntion since some failures can no undetected.
On the other hand, inappropriate error sinnals "cryinn wolf" can cause
the pilot to take unnecessary abnormal action detrimental to mission
completion. A significant advantane to testinn conducted in the air-
borne environment is that not all errors identified airborne would be
found if nround testinn procedures were used instead. Consequently,
nround maintenance personnel must have a hinh denree of confidence in
airborne error indications since nround verification may be impossible.
If a throwawav maintenance concent is in effect, nood modules minht be
discarded because of inaccurate test results.
Detection of error is only one nart of the test problem. Isolation
of the causative failure is the other. Test procedures differ in their
abilitv to provide fault isolation. Earlv test Procedures were desinned
to nroduce isolation to the sinnle component level (if isolation was
provided at all) since machines were constructed with discrete technol-
ooy. The mul ti component nackane of the hinher level technolonies has
made unnecessary such fine resolution procedures. For the example BIT
27
design, the modular level is the level of interest. One is not con-
cerned where within the module a failure is located; whether or not the
modular package as an entity is faulty is of primary interest. With
these general comments as a background, the various ways of categorizing
test procedures can be explored.
B. PROCEDURE CATEGORIES
1 . Normal vs. Marginal
Diagnosis of existing solid errors should be the first order of
business for any test procedure. Prediction of possible future failures
would be a desirable supplement to the preceding tests to locate exist-
ing errors. The former testing will be termed "normal" testing while
the latter is called "marginal" testing. Normal testing will be the
type pursued in the example test design. However, marginal testing
conducted in conjunction with normal testing is generally valuable in
furthering test objectives.
Intermittent errors cause one of the biggest problems to the
test designer. However, an intermittent failure causing inconsistent
results can often be forced to become a solid failure with a resul-
tant solid error manifestation through marginal testing techniques
[Ref. 7]. Marginal testing tends to worsen the third failure mode
discussed in Section III by further weakening already deteriorated
components until they become solid failures of the more easily diag-
nosed first failure mode. Marginal testing consists of overstressinq
components through the application of abnormal conditions to cause the
weak ones to fail prematurely during test instead of later durinq
normal operations. Stressing, for example, can consist of over or
28
under biasinq transistors by a certain percentage of rated values. The
danger of marginal testing is that existing intermittent failure can be
masked by a rash of new failures should stressing be done carelessly
or to needless extremes [ Ref. 3]. When done carefully, however, mar-
ginal testing, in effect, oredicts future failures by forcina them to
occur at non-critical times. It also serves to identify and rid the
machine of bothersome intermittent failure, thusly increasing the
degree of confidence accorded to airborne test results.
Marginal testing is generally not appropriate airborne because
of the time and extra equipment necessary to accomplish it. The
accomplishment of marginal testing on the ground depends uoon the
maintenance concept. If periodic maintenance on the around supplements
airborne built-in testing, marginal testing should be part of this
periodic procedure. In the example design, where no airborne repair is
done, marginal testing can be accomplished whenever the computer is
removed from the aircraft for repair of a solid failure identified by
BIT.
2. Software vs. Hardware
Software testing refers to program-oriented, sequential lv
executed, periodic testing. The computer is directed by a program to
accomplish a series of operations on supplied data. The results of
these operations are then interpreted to provide diagnostic informa-
tion. Since software testing is program-oriented, the level of testing
(and therefore, to a certain extent, the efficiency of testinq) is
determined by the level of the orogramminn language used. The lower the
29
order of the programming language, the closer to the component level
operations can be specified. Assembly language or its equivalent is
most frequently used.
A programmed test routine is sequentially executed, one instruc-
tion after the next. The length of the program in number of instructions,
the cycle time of the storage device containing the program, and the
execution times of the instructions affect the time duration of the
test. Test results can usually only be determined after a sequence of
instructions has been executed and a result determined. This result
is then compared against some previously calculated correct result to
see if error has occurred. The same sequence of instructions might
then be repeated with a different set of data and a different exDected
result. Comparison against the norm can take Dlace automatically
under program control after short sequences have been executed, or
later upon examination of a printed output.
Procedures for software testing differ widely. The detection
and isolation functions can be accomplished concurrently or separately.
In the separate case, an "executive" routine might be run periodically
to determine in a qross sense whether or not the computer were exhibit-
ing abnormal behavior. Once such behavior were sensed, a more detailed
"diagnostic" routine might be run to determine the more exact location
of the failure causing the error. Because of the limitations of the
proqramminq language in closely manipulating suspicious components,
results might localize the failure to a region of the machine. Techni-
cians would then locate the failure by hand probinq. Such procedures
tend to be inefficient, marginally effective, and always time-
consuming.
30
The characteristics of software testinq can be evaluated with
regard to the BIT objectives. A definite advantage is that software
testing requires little added hardware (other than storage) to accomplish
the checking function. Isolation after detection is difficult because
of the periodic nature of testing. The test program typically occupies
core memory (unless slower peripherals are available for temporary
storage) and reguires significant runninq time if many different test
data are to be used in an attempt to make testing more comprehensive.
Some functional degradation would occur when time is scarce, even
when the test program is run on a periodic basis, because testing must
share available time with the operational flight proqram execution.
On the other hand, the shorter the test program and the lonqer the inter-
val between tests, the greater the danger of using erroneous results
of undetected failure and downgrading test efficacy by invalidating the
single failure assumption. Test results are only known after several
operations have been executed. This presupposes that the machine has
not failed to the extent that it cannot execute instructions and give
results necessary to locate the failure. Intermittent failure would
tend not to be detected by software testing, eliminating the problem
of signalling error and indicating failure when none exists. On
balance, software testing did not look generally attractive for the
example design.
Hardware testing refers to checking accomplished by added
circuitry. Such testing is characterized by simultaneous detection
and isolation usually at the logic level, rapidly available results,
and minimal degradation of operational capability. In general, the
U
added checking hardware generates a basis for comparison with concurrent-
ly generated flight program results, and actually accomplishes the com-
parison at the logic level. Operation at the logic level provides
excellent fault isolation capability. Results of the comparison are
known essentially immediately. If a fault exists, it can be located and
appropriate action taken prior to contamination of other data, or
utilization of erroneous results. Hardware testing differs from soft-
ware testinq in that it checks the correct operation of the circuit
being tested, but does not verify the correctness of the data being
operated upon. The effect is that each circuit in a chain must be so
checked if resultant data is to be certified. Further discussion of
concurrent testing, characteristic of hardware testing, will be pre-
sented in the next subsection.
By virtue of consisting of fewer components, checking circuitry
is inherently more reliable as a whole than the loqic it checks. How-
ever, the components themselves are just as subject to failure as the
components they test. To provide a high confidence of valid testing,
therefore, one must consider the added test hardware itself as a ooten-
tial source of failure. Such hardware then becomes hard-core in the
sense that its proper functioning must be verified before testinq
commences. Unlike the hard-core housekeeping and service hardware
previously mentioned, checking hardware was considered to be part
of the test problem.
Hardware testing offered many benefits making it attractive
as a means of meeting the example desiqn objectives within oroqram
constraints. Its obvious disadvantaqe relative to software test was
32
the much higher cost oenalty incurred as a result of the exnense of
added hardware. A combination of hardware test to orovide efficient
test performance and software test to reduce expense offered a nossible
tradeoff for the example design.
3. Continuous vs. Periodic
Testing can be classified by its duration as either continuous
or periodic. Continuous testing must also be concurrent (the results
of test may be somewhat time-skewed) since ongoing operational comou-
tations occur simultaneously. Continuous testing is characteristic of
hardware test. The effectively immediate failure detection provided
by continuous testing tends to identify intermittent errors, where
periodic testing does not. The single failure assumption is justified
since failures are detected as soon as they occur. Operations can be
halted upon occurrence of an error and the machine state at time of
halt preserved. The process of "retry" or "restart" then attempts the
last operation again to see if the same error recurs. Recurrence
indicates a solid error and failure is flagged. Non-recurrence denotes
an intermittent error, in which case the second correct attempt is used
and operation continued. By noting the recurrence rate of intermittent
error under the same conditions, intermittent hardware failure can often
be distinguished from one-shot external sources. Hard-core house-
keeping and service hardware is generally continuously tested.
Periodic test refers to checking conducted at specific intervals,
such as software testing. The testing then time-shares with operational
computation. Results are only determined after a number of seguential
steps have been accomplished. Preservation of machine status for retry
33
when Deriodic testing detects an error is generally not practical.
However, if the periodicity of test is sufficiently brief, error halt
can occur shortly after failure, minimizing the cumulative effect of
error on Dost-failure computation. The single failure assumDtion is
still valid if the period between tests is short. Intermittent errors
will not be detected by periodic testing until they become solid. Even
in a continuously tested machine, hard-core checking circuitry is more
reasonably tested Deriodically.
The unigue nature of the added checking hardware providing
continuous concurrent testinq to the different logic circuits of the
machine results in high cost. A tradeoff in favor of a periodic,
interruptable test procedure exercised at freguent intervals appeared
attractive for the example design.
4. Deterministic vs. Non-Deterministic
A deterministic test yields a definite answer to the question
of whether or not an error exists. A non-deterministic test yields
results which are interpreted statistically against an expected dis-
tribution to determine the probability of the existence of error. The
terms are more often applied in relation to software testinq procedures
since hardware testing is always deterministic. Non-deterministic
testing was not attractive for the examnle BIT design because of the
requirement for a high degree of confidence in test results. Sta-
tistical technigues were, however, found useful in selecting initial-
izing data.
5. Combinatorial vs. Sequential
Seshu and Freeman [Ref. 45] classify the organization of
testing into two different categories, combinatorial and sequential.
34
A combinatorial testing procedure involves apDlication of a fixed set
of inputs to the machine with the outnut results beinq analyzed to
identify failures. As an examole, non-deterministic testinq is combina-
torial. A sequential procedure has no fixed set of tests which are
aoplied. The result of the first test sequence determines which test
sequence will be used next. Sequential testinq is more efficient since
selection leads to fewer tests. These two categories should not be
confused with the often used classification of loqic as combinatorial
(combinational) or sequential. Combinatorial and sequential testina




The previous section presented several categories which can be
used to describe test procedures. In practice, the specific proce-
dures presented in the literature tend to fall simultaneously into
several of the categories previously mentioned; all are a blend of
alternate anproaches having favorable characteristics relative to their
intended aopl ications. The discussion of soecific alternatives re-
quires a further cataloging effort, difficult because of the diversity
of approaches to test and because of the aforementioned overlapping of
categories. The discussion presented is not intended to be comprehen-
sive; it is meant to demonstrate the diversity existinq in the test
field and to introduce some techniques which proved useful in devel-
oping the specific blend of approaches best meetinq the requirements of
the example desiqn.
$5
Since most of the test alternatives identified have been
Dresented in the literature, the discussions are usually short, rapidly
settling to a single level of interest. Some discuss the systems
approach, giving overall techniques for testing the computer's different
major units. Others have develoDed schemes for determining the ootimal
test sequences for checking one unit of the computer (e.g., the arith-
metic unit, or the memory). Such schemes examine the states of the
elements comprising the unit under consideration, the elements being
identified as either fault-free or failed, and develop tests to yield
the final diagnostic results on the entire unit. Still other techniques
examine the states of the inputs and outputs of a sinqle logic element,
or block of elements (e.g., an AND gate or a multiplier block), with
the goal of locating a failed element. The presentation of alterna-
tives below will generally move from the system level to the loqic-
block level; however, the tyDing is loosely defined and often diffi-
cult.
2. Coding
A large variety of schemes and a significant body of theory have
been developed in the literature relative to coding test techniques.
Generally, coding represents a succinct way of supplying redundant
information to provide a norm for comparison. Codes can be used to
detect and correct single or multiple errors. The program constraints
imposed on the example BIT design eliminate from consideration error-
correcting codes and those reguiring core memory storage. For this
reason, only parity was considered potentially annlicable for the
examnle design. Its nature and possible use will be discussed next.
36
Parity is the simplest error-detectinq code consistina of one
redundant bit of information, making the sum of the information bits




















^ I. 1 mod
2
The correct narity value for a data word is known a priori. Upon
completion of an operation, the correct parity of the result is known
and is qenerally attached to the result as an additional bit. The
actual parity is then calculated and compared to the exnected parity to
determine whether or not error has occurred.
Parity has the capability of detecting odd numbers of errors,
and therefore provides protection beyond the sinale error assumed. In
the absence of the single error assumption, the risk of undetected multi-









resulting from operations, the binary value a. of the ith bit nosition
can have one of two states relative to failure (failure states): it
is either correct or erroneous. The probability of undetected error P „J ue
is just the sum of the probabilities of multiple even errors. Assuminn
an instantaneous probability of error p in bit location i and independence
$7
between bit locations, the probability of k simultaneous errors is
just p . Accounting for all combinations of ways k errors can occur
in an n-bit word length (n even), the instantaneous probability of unde-
tected error can be expressed as
P
ue
" (> 2 0"P) n - 2 + (>4 (1-P) n " 4 + ..< +
(>n (l-p)°
= E (". )p
2k (l-p) n
" 2k
where m = n/2
k=l
6K
For n = 24 and p = 10"
3
P -4
ue ~ 2,7 x 10 , or .027%, a \/ery low risk.
Parity can be useful in both software and hardware test pro-
cedures. It is often used to detect single errors in data transmissions
For the example design its potential use was as a hardware test where
the correct parity was automatically present, or generated by the cir-
cuitry to be checked. A hardware parity generator and comparator could
then be added to provide error indication. An example application
might be to a feedback shift register which always generates a number
with odd parity to which a parity generator and comparator could be
added to verify proper operation. The generation and use of parity for
comparison was only acceptable for the examnle desiqn where core memory
storage of parity bits was not required.
3. Diagnostic Partitioning
The general technigue of diagnostic partitioning divides the
computer into smaller entities, each of which can then be tested
separately. Forbes, Rutherford, and Steiglitz [Ref. 13] present such
a technigue in which the computer is partitioned into "diagnostic
38
subsystems," each having certain caoabil ities. The subsystem essen-
tially is able to apply stimuli, sequentially execute a series of oper-
ations, receive and process inputs, and communicate diagnostic results
of test to the outside world. The subsystems can then alternately
diagnose each other. A sequence for system diagnosis at the subsystem
level is developed. Their technique of partitioning a machine into
essentially autonomous sections was found to be applicable in the exam-
ple design. The test technique involves a periodic, software test with
fault isolation provided by the order of operations. An interestinq
feature is the microprogramming of the test routine to provide closer
manipulation of the logic for the reasons previously described in
Section IV-A.
The concept of diagnostic partitioning can be applied to a
partitionable machine in a "bootstrap" fashion. One subsection is
considered to be hard-core, and it is checked by hardware means,
manually, or by software. An example of software test would be execu-
tion of a small number of operations requiring only the hard-core sub-
section to implement. Uoon verification of the hard-core subsection,
one then uses it to check the next subsection. The two checked sub-
sections can then be used to check the next and so forth. This repre-
sents a type of sequential testing (vice combinatorial) at the
subsystem level. Manning [Refs. 31 and 32] describes a modification
of such a technique. The difficulty with diagnostic nartitioninq is
that the architectural designs of many computers do not facilitate
partitioning.
39
4. Program Hierarchy Testing
A system technique related to diagnostic partitioning examines
the functional capabilities of the computer. A hierarchy of distinct
software programs is used to functionally partition the machine, in
contrast to the physical sectioning associated with the diagnostic
partitioning of the previous section. A high level proaram oeriodically
functionally tests the computer by exercising short routines using the
machine instructions to grossly check the computer for proper oper-
ation. Examples of functional checks might be adding, multiplying or
shifting. Such "executive programs" are not intended to be comprehen-
sive or isolating; they detect errors in functions by comDaring results
obtained to Dreviously stored expected results. Once an error has been
identified, a "diagnostic routine" tailored to the type of functional
error detected is executed to provide the isolation required for repair.
While not comprehensive, such a technique allows frequent running of
the short executive routine, while calling on the longer diaqnostic
routine only when error is sensed. Cohen and Whitaker [Ref. 7] describe
such a procedure developed at Sylvania. Bashkow, Friets, and Karson
[Ref. 3] divide the diagnostic process by hierarchy into a command
checkout phase, used to assure that the machine is "breathinq" (no
gross malfunctions exist), and "executive", "testing", and "diag-
nostic" phases to give more detailed checking at lower levels. The
diagnostic programs used are microprogrammed to Drovide failure
resolving capability.
5. Software Exercise, Hardware Detection
An interesting combination of testing technigues uses software
routines to exercise the computer periodically and added hardware
40
circuitry to detect errors. The hardware provides the level of detec-
tion resolution required. Software routines need only thorouqhly exer-
cise the machine, with no attention to order of execution for isolation
being necessary. Fred Lee [ Ref. 27] describes such a procedure in
which the machine's operations are broken down into sequences of events,
recognizable as pulses occurring in a specific order. The correct
sequence is provided for the test routine and is compared aqainst the
actual sequence. Hardware monitoring devices provide the comparative
function with non-coincidence signalling specific error. With an 18.2%
increase in transistor count for test purposes, Lee claims 100% confi-
dence in the device. This procedure is also described by Sellers,
Hsiao and Bearnson [Ref. 43] under the title of "sequential loqic
latch checking." While Lee's procedure was not used, the idea of
software exercising and hardware detection was of use for the examDle
design.
6. The Black-Box Approach
The black-box approach refers to the process of setting the
inputs of a network and observing the resultant outputs, useful in-
formation thereby being derived without internal access to the net-
work. A most extensive body of literature reports on varyinq
schemes to obtain optimal, minimal sets of inputs to diagnose all
possible errors internal to the network. With the growing use of
mul ticomponent packages inaccessible internally (IC, MSI, and LSI
technology), this test area has received renewed attention. Eldred
[Ref. 12], in one of the earlier papers treating the black-box
approach, discussed the derivation of minimal tests for a simple
II
network of discrete components by evaluating the inDUt conditions which
should cause the network output to be "activated" or "inhibited."
Results deviating from this norm indicated failure. Armstrong [ Ref. 1]
presented a procedure based on "path sensitizing" in which a given
internal fault is selected and its effect is traced to the outout for
given input conditions. The procedure continues until all faults have
been treated and the significant input and outout patterns derived.
The "truth table" or fault dictionary technique is similar in that a
table of the expected outputs for given inputs and specified internal
failures is derived. Comoarison of combinatorial test results to the
fault dictionary determines if an error has occurred, and where.
The derivation procedure for a large block of logic can be
tedious, even when computer aid is used. The requirements for memory
can easily exceed availability in the analysis of large networks. Such
difficulties have led to the development of simplifying methods for
automating the analysis of large networks. There is wide agreement in
the literature that the derivation of minimal input tests for a large
block of logic must be automated.
Sellers, Hsiao and Bearnson [Ref. 42] develooed an algebraic
technique based on Boolean difference to facilitate learninq the effect
of a chanqe in state of a chosen input on the network output. The
procedure involves logically Exclusive-ORing the Boolean output func-
tion, expressed in terms of the inputs, with the same function having
the chosen input inverted. If the Boolean output function is
r i X -, , Xp, ..., X-j» • • • i X J














..., x., ... , x
n
) Y F ( x i > x 2'
where the chosen input of interest x. is inverted in the second ex-
pression and V represents the Exclusive-OR ooerator. The Boolean
difference yields the input conditions for which the outout will change
state, given the chosen input state change.
Roth [Ref. 41] with his calculus of D-cubes exoands on the
above method, but with a more graphical technique to solve the some-
times formidable problem of accomplishing algebraic operations such
as V for complex functions. He first expresses the truth table of each
element of the network in a succinct form and then gives rules for
intersecting the tables of the individual elements to form the table
describing the entire network.
The usefulness of such technigues is reported by Galey, Norby
and Roth [Ref. 14] in an earlier version of Roth's later technique.
Four eight-bit input tests were automatically derived, the results
of which would indicate whether any one of 102 possible internal
failures had occurred (but not which one). This illustrates the con-
cept of testing an internally inaccessible network for failure
without interest in which specific component has failed.
43
An interesting contrast is offered by Maling and Allen [Ref. 30]
who test a network for failure with the purpose of identifying the
specific failed component. For each n-input component of the logic
net, 2
n
represents the number of different input combinations. Only n
+ 1 of these are necessary to show that each input in turn can control
the output and that the output can take either state. For a net of k
such components where the ith comDonent has n. inputs, they state that
the number of configurations C of the n + 1 required inputs per compon-
ent is
k
C = k + En.
1=1
n
This number also represents the maximum number of tests required to
thoroughly check the circuit with component isolation. The lower
bound is determined if each test is efficient enough to eliminate half
the components from further consideration. The minimum number of tests
T . is thenmm
T . = 1 + |log Cmm ' 3 2
where indicates next hiqher integer. From experience, they state
that the number of tests required is usually approximately equal to the
number of components.
7. Non-Dupl icative Hardware Checking
Checking by adding hardware which does not duplicate the cir-
cuitry being checked provides the benefits of hardware test without
the cost of duplication. Rao [Ref. 39] describes a method for checking
arithmetic-type operations in a processor throuqh the use of residue
coding generated and employed by added hardware without storage to
44
identify errors but not to locate them. The residue code was used to
provide a high level of multiple-error checking capability not required
in the example design. The 1000 gate processor required 400 added gates
to check it, or a 40% increase in cost which would be unacceotably high
for the example design. Sellers, Hsiao and Bearnson have comniled a
comprehensive volume [Ref. 43] on error detecting logic, which is the
only one of its kind identified by the author. The cited reference
is an excellent source of non-dupl icative hardware checking schemes.
The use of non-duol icative hardware schemes anneared attractive for
the example design, particularly for the hard-core circuitry included
in the test problem.
8. Replication and Comparison
When other schemes do not provide adequate checking, one can
replicate circuitry, operate the replicated portions in parallel and
compare the results, with any non-coincidence indicating error. While
the technique is expensive (and unacceptable for the example design)
when employed on a large scale, it often presents the only technique
by which isolated small blocks of circuitry, or hiqhly irregular cir-
cuitry can be thoroughly checked. For the examole desiqn, duplication
of small sections was very useful. The replicate and comoare conceDt
is often applied when high reliability requirements force the use of
redundant hardware on a large scale. Switching to the unfailed dunli-
cate offers continued operation while the failed portion is renaired.
Automatic repair is not appropriate to this investigation, yet it
proceeds naturally from some of the methods found useful and there-
fore represents a good topic for further related investigation
45
An interesting contrast is offered by Maling and Allen [Ref. 30]
who test a network for failure with the purpose of identifying the
specific failed component. For each n-input component of the logic
net, 2 represents the number of different input combinations. Only n
+ 1 of these are necessary to show that each input in turn can control
the output and that the output can take either state. For a net of k
such components where the ith component has n. inputs, they state that
the number of configurations C of the n + 1 required inputs per compon-
ent is
k
C = k + En.
i=l
n
This number also represents the maximum number of tests required to
thoroughly check the circuit with component isolation. The lower
bound is determined if each test is efficient enough to eliminate half
the components from further consideration. The minimum number of tests
T . is then
mi n
T . = 1 + |log CImm ' y 2 '
where indicates next hiqher integer. From experience, they state
that the number of tests required is usually anproximately equal to the
number of components.
7. Non-Dupl icative Hardware Checking
Checking by adding hardware which does not duplicate the cir-
cuitry being checked provides the benefits of hardware test without
the cost of duplication. Rao [Ref. 39] describes a method for checking
arithmetic-type ODerations in a processor through the use of residue
coding generated and employed by added hardware without storage to
44
identify errors but not to locate them. The residue code was used to
provide a high level of multiple-error checking capability not reguired
in the example design. The 1000 gate processor reguired 400 added gates
to check it, or a 40% increase in cost which would be unacceDtably high
for the example design. Sellers, Hsiao and Bearnson have comniled a
comprehensive volume [Ref. 43] on error detecting logic, which is the
only one of its kind identified by the author. The cited reference
is an excellent source of non-dupl icative hardware checking schemes.
The use of non-duDl icative hardware schemes anneared attractive for
the example design, particularly for the hard-core circuitry included
in the test problem.
8. Replication and Comparison
When other schemes do not provide adeguate checking, one can
replicate circuitry, operate the replicated portions in parallel and
compare the results, with any non-coincidence indicating error. While
the technigue is expensive (and unacceptable for the example design)
when employed on a large scale, it often presents the only technigue
by which isolated small blocks of circuitry, or highly irregular cir-
cuitry can be thoroughly checked. For the example design, duplication
of small sections was very useful. The replicate and compare concent
is often applied when high reliability reguirements force the use of
redundant hardware on a larqe scale. Switching to the unfailed dunli-
cate offers continued operation while the failed portion is renaired.
Automatic repair is not appropriate to this investigation, yet it
proceeds naturally from some of the methods found useful and there-
fore represents a good topic for further related investigation
45
by others. Duplication and comnarison, recognized as one of the most
effective test techniques, formed the basis for a unique application
in the example design of the diagnostic partitioning scheme described
earlier.
9. Probabalistic Method
A non-deterministic method which is periodic and combinatorial
is presented by Merwin [Ref. 33]. A block of combinatorial logic
(vice sequential logic having feedback paths, not to be confused with
combinatorial test) having many inputs is tested by first establishing
the exnected distribution of output values. Each of the possible
combinations of input values is considered equally likely. The output
pattern resulting from each input pattern is derived. The statistical
appearance of a qiven logical value at each specific outout of the output
set can then be determined. For example, if there are 16 possible input
combinations (four inputs) and three outputs, output number two may
have the value logical one for eight of the input combinations. The
logical value one would then be expected 8/16 or 1/2 of the time at
output number two. Merwin attaches a random number generator to the
inputs and tabulates the incidence of appearance of the logical value
one at each of the outputs. Deviation of the actual ratios from the
expected ratios may signify an error. If output two took the value
logical one only 1/16 of the the time instead of the expected 1/2 of
the time, error would be likely. Decision criteria can be established
using statistical procedures. The random number qenerator as a source
of random bit patterns was useful in the example design.
46
V. THE EXAMPLE DESIGN
A. THE TEST CONCEPT
The parent computer was divided into units:
1. The processor unit - containina arithmetic loaic and
aeneral purpose reaisters.
2. The control unit - to provide control sianals for direction
of operations in the processor unit.
3. The core memory unit - to provide storane of the flinht
proqram and temporary data.
4. The input/output (I/O) unit - to provide interface between
the computer and the equipment it serves.
The I/O unit will not be considered in the present investigation.
The proposed instruction set for the computer (to be termed the
macro-instruction set) provided for an extensive half-word/ half-
reqister addresstna and manipulation capability. Processina was to
be possible on 24-btt words (full-word operations), on the riant or
left 12-bits of the 24-bit word separately (separate half-word opera-
tions) . or on the riant and left 12-bits of the 24-bit word simultaneously
(parallel half-word operations). With little added hardware and desinn
effort, it appeared possible to confinure the niohlv reaular Ionic
of the processor unit into two autonomous halves, each nossessina multi-
functional capabilities. This diaanostic partitionina in effect
provided a duplex redundant processor unit without the exnense of
duplicatina the hardware. This technique will be termed "split
duplication."
With the proposed hiah speed of the parent computer, sufficient
time was available when the machine was not performino its basic
47
operational mission (idle time) to time-share a periodically exercised
test procedure without imposing any functional degradation. This would
be particularly true if the test procedure, once initiated, could be
interrupted to return the computer to operational computation without
destroying test efficacy. Idle time was to be available eyery few
seconds, validating the single failure assumption through short Derio-
dicity of test. The lower cost advantage of periodic, program-oriented
testing could be thereby enjoyed.
Two modes of operation were identified. In "normal" mode operation,
denoting mission operational computations, both halves of the comouter
would be used together, making full -word, separate half-word, or
parallel half-word operations possible. In "test" mode, denotinq idle-
time test exercising, only parallel half-word operations would be
possible. During test mode, the autonomous processor halves would be
loaded with identical half-word bit oatterns. Identical parallel ODer-
ations would then be executed on the like data independently. Comparison
of the results would then be accomDlished with non-coincidence of the
two halves indicating error. The advantages of the superior dunlication
and comnare method could be enjoyed without the cost disadvantage of
duplicated hardware.
The source of data words with which to initialize the two Drocessor
halves during test mode remained to be resolved since core storane was
not acceptable. The possibility of usinq an inexpensive hardware
pseudo-random number generator, similar to the one used in Merwin's
probabilistic method, appeared to be an attractive ootion which was
compatible with the concent of interruntable test while requiring no
48
core storage. Random patterns would more nearly simulate inputs used
during normal mode operation. An argument can be made for "worst-case"
testing in which a small number of unusual bit patterns not nomallv
encountered in normal mode operation are used to stress the machine in
a worst-case manner. Such stressing appeared to be more aDpropriate
for marginal testing on the ground when such worst-case patterns might
be expected to hasten impending failure. Additionally, no "end-of-
test" point needed to be identified since the machine was to revert to
test mode at any time not required for normal mode operation. Finally,
the storage required for worst-case bit patterns obviated their further
consideration.
The use of a pseudo-random number generator allowed the core memory
unit to be disconnected from the processor unit during test mode, and
made possible the core memory unit's separate checking either concur-
rently, prior to, or subsequent to nrocessor unit test. The control
unit, however, was required in test mode to supply the control signals
to direct the parallel half-word operations. Testing of the control
unit itself, and the location and execution of the exercising test
routine still needed resolution.
The issuance of accurate control signals by the control unit to the
processor unit is a prerequisite to correct computation. The control
unit was to be microprogrammed using a read-only-memory (ROM) as the
storage device. The control signals apnronriate for executing the
macro-instruction set were to be hard-wired in the form of short
routines of the lower order micro-instructions. The hard-wirinq
consisted of arrays of transistors implemented on a small number of
49
silicon chips, the whole comprising the ROM. The remainder of the
control unit consisted of the selection and sequencing circuitry required
to assure issuance of the proper sianals in a timely manner.
Because of the standard packaged arrays available with which to
implement the ROM (the low risk nature of the program dictated use of
off-the-shelf hardware), sufficient unusued storage capacity beyond the
requirements for the microprogrammed control siqnals was present to allow
storage of a microprogrammed test routine. Careful , efficient micro-
programming of the test sequences promised a much shorter test routine
requiring significantly less ROM storage than the comparable core
memory storage needed for an equivalent routine programmed using the
macro-instruction set. The inherent advantage of the lower order
micro-instruction set relative to thorough exercise of the computer at
the logic level is enjoyed by such a scheme. An additional significant
advantage for an interruptable, time-shared test routine is the much
9
shorter cycle time of the read-only-memory compared to the core memory.
Note should be made here that test mode exercise of the processor unit
could be accomplished entirely independent of the core memory unit.
Since the control unit was to issue the control sianals directing
the test routine, it became hard-core hardware whose proper function-
ing had to be continuously assured. Hardware techniques for continuous,
concurrent testing of the control unit were therefore essential to the
i\ typical core memory cycle time is 2 psec while a typical ROM
cycle time is 200 nsec, 10 times faster.
50
concurrent testing of the control unit were therefore essential to the
test concept. As will become evident when the control unit BIT desiqn
is discussed, the hiqhly irreqular nature of control unit circuitry
tends to necessitate hardware test techniques in any case.
With the test concept developed, the more detailed BIT desiqn of
each unit can now be examined.
B. THE PROCESSOR UNIT
With the exception of the power supply, considered to be hard-
core servicing hardware excluded from the test problem, no hard-core
hardware requiring continuous test was to be located in the processor
unit. The split duplication, periodic technique of testinq the pro-
cessor unit could be expected to thorouqhly check its operation.
The contents of the general processor module resultinq from
partitioning the processor unit are shown in Fiqure 2. Figure 3a shows
the 24-bit data oath divided into four-bit qroups, with the double line
denotinq the left and riqht half-word division. Two four-bit qrouns
L. and R., are physically located on the same module, Droviding eight
bits of the 24-bit wide data path. The remaining groups are likewise
associated on separate modules, a total of three identical modules
(see Figure 2) being necessary to implement a 24-bit path. Modifi-
cation of word length in eight bit increments is possible, in conson-
ance with the objective of flexibility of word lenqth. For examnle,
addition of a fourth identical module would easily convert the pro-
cessor to a 32-bit path width.
Emphasis should be placed on the fact that the description above
refers to a data path, and not to a sinqle reqister or a sinqle
51
functional circuit. The amount of hardware Implemented on the 2A NAFI
module is dictated by Its area and pin limitations, discussed in Section
II-B-2. The entire processor unit can then be thouaht of as consisting
of a series of three-module sets, the modules within each set being
identical. The total number of modules 1n the processor unit would be
a multiple of three.
Providlnq Isolation to the modular level has only been briefly
discussed so far. Identical half-word bit patterns are used to initial-
ize the processor circuitry beinq tested. While the computer is in
test mode, these bit patterns undergo parallel operations concurrently
in the autonomous halves. The results of such operations should there-
fore be identical at each point In the data path . Any difference
indicates that a fault exists. Non-coincidence is signalled by a
hardware comparator placed 1n each module to compare the autonomous
halves' results. The required fault detection and isolation are hence
achieved by the placement of the comparators in the data path at the
modular level. Comparison takes place continuously durinq test mode at
each clock pulse, so interruption to return to normal mode operation
has no effect on test efficacy.
The decentralized power supply located 1n each module consisted
of the final step of renulatlon required to provide the power level or
levels necessary in the module. The decodino of the control slqnals
was also accomplished in the associated module. Decode could thereby
be checked bv the same technique as other processor hardware, elimina-
tinq the necessity for the more difficult, costly continuous checkinq
of decode circuitry located 1n the control unit. Any failure in
52
the power supDly serving the module or in the decoding function would
occur i_n the module. By tying the hard-core checking circuitry for
testing the local power supply (not treated herein) into the processor
module checking circuitry, a single error signal could be issued from
the module in case of failure. For the examole design, the reason for
failure within the module did not need to be identified; only isolation
to the modular level was required. If a centralized power supnly
provided fine power regulation and if decode were located outside the
module served, precautions would be necessary to insure that failures
in these functions did not cause failure within the module to be errone-
ously sionalled. Confidence in the error signal once issued is
increased by the decentralizing scheme described.
In test mode, only parallel half-word operations are accomplished.
In normal mode, however, full-word and separate half-word operations
are also utilized. Differences in the execution of operations in the
two modes had to be identified to ensure that test procedures thoroughly
exercised the circuitry, and that test hardware did not deqrade normal
mode operation. The carry forward found in adders, shift registers and
counters in the processor unit was the major such difference.
Figures 3b and 3c show the carries associated with parallel half-
word and with full-word operations, respectively. In the case of
parallel half-word operations, the carries between adjacent four-bit
groups in the two halves are identical. For example, the carries from
L, to L
?
and from R, to R
?
are the same. Since the L, and R, qrouns are
located in the same module, the carries from the most significant ends
53
of L-. and R, are identical when no fault exists. These carries can
then be compared, with non-coincidence indicating a failure in that
module .
One difficulty arises during test mode parallel half-word ooerations
when an error in a carry is detected; e.g., the carry from L-, to L
?
differs from the one from R, to R«. Error is signalled in the current
module. The differing carries, however, cause the bit contents of L
?
and R« in the next module to differ, and because they don ' t comDare,
error is also signalled in the next module. This difficulty can be
resolved by inhibiting the error signal in module i+1 when an error
signal is issued from module i preceding it.
Another difficulty arises because during full-word operations in
normal mode, the bit contents of the groups L. and R. in the same module
may differ with no faults existing. Likewise, the carries propagated
from L. to L.,,, and from R. to R.,-,, may also differ. The error
i l+l l l+l J
signal due to non-coincidence must only be allowed in test mode, in
which any non-coincidence is the result of failure. A test-enable
signal can be applied to checking circuitry in test mode.
It was also desirable to eliminate any gating from the inter-modular
carry paths to avoid prooagation delays. Figure 4a shows the checker
circuitry added to each module. Figures 4b and 4c show Dossible Ionic
implementations of the desired truth tables for the carry checker and
error-inhibit respectively. Figure 5 shows the relationships between
two adjacent modules.
Note should be made that the error inhibition in the case of the
first difficulty discussed does not allow two adjacent modules to signal
54
error during the same periodic test iteration. Both carries from one
module to another are also assumed not to fail simultaneously, in which
case the comparison check would be passed in spite of existing failures.
Both of these cases are highly improbable and represent cart of the unde-
tected failure risk accepted under the single-failure assumption for
built-in-test. In the case of simultaneous failures in adjacent modules,
only one is signalled. However, upon checkout after repair or replace-
ment of the signalling module, the second module would then immediately
indicate failure.
The fault-detecting circuitry described thus far does not distinguish
between faults occurring in the module and faults occurring in the data
transfer paths between that module and the previous one. Circuitry to
provide such isolation could be added, and would consist of another
comparator if the additional cost were acceptable. The problem of de-
termining if the fault exists in the module or in the data transfer
paths between that module and the preceding one would have to be accom-
plished by ground maintenance personnel unless the additional comnarator
were incorporated.
The pseudo-random number generator has been very briefly treated.
Such a device is capable of providing long seguences of data words.
A 12-bit generator was reguired for the example design. Golomb [Ref. 15]
describes the design of a simDle linear feedback shift register reguirinq
very little hardware. An example generator which adeguately fulfills the
test requirements under consideration is included as Appendix A. The
12
maximum length sequence of 2 " different patterns was obtained by
implementing a modulo two irreducible polynomial found in Peterson
55
[Ref. 36] and adding the nonlinearity of the important all-zero case.
The patterns so obtained met Golomb's tests of randomness in each bit
location. A self-checking Dseudo-random number generator design
using more hardware is illustrated by Sellers, Hsiao, and Bearnson
[Ref. 43] under the title of "unit distance code parity checked counters.
C. THE CONTROL UNIT
The control unit was the least regular of the units to be self-
tested. Additionally, it was hard-core, requiring continuous test to
validate the control signals issued to the processor from the ROM. The
solit duplication test concept of neriodically exercising the processor
unit during idle time presupposed a fault-free control unit able to
issue appropriate control signals to direct test exercises whenever
such idle time became available. Continuous testing of the control
unit with added checking hardware would assure its fault-free avail-
ability by signalling its unavailability upon occurrence of a failure.
Partitioning the control unit to provide modular isolation of failure
while minimizing the requirement for added hardware is the subject of
this section. Since the control unit was the only unit requirinq
continuous test, it should be recognized that a large portion of the
overall hardware penalty for providing BIT to the comnuter as a whole
was to be paid in the control unit.
Testing the control unit consisted of the followinq steps:
1. Testing the ROM for correct word content
2. Testing proper accessing of the ROM
3. Testing proper sequencing of accesses
4. Testing the checking hardware, which was also subject to
failure.
56
Testing the checking hardware was a problem common to all the units,
and it will be treated in Section E below. Figure 6 shows the non-
partitioned control unit organization for the parent comouter. Figure 7
illustrates the general modular partitioning and hardware added for
checking, which is described below.
Testing the ROM for proper word content will be examined first.
The control signals used to properly execute the flight program (and
the test routine) are stored in the ROM in the form of hard-wired bit
patterns called microwords. The contents of the microword can change
under failure, having a catastrophic effect on the control unit's ability
to issue proper signals and consequently on the computer's ability to
execute the fliqht urogram. The ROM, exclusive of addressing hardware,
will be assumed to be implemented in segments of 256 eight-bit words,
shown in Figure 8, although this implementation is not critical to
the test procedures described. The ROM microword length will be
assumed to be 48 bits, also not critical. The ROM is then implemented
in six segments, as illustrated in Figure 8.
Three fields of the microword format (see Figure 8) have test
significance:
1. Parity field (P) - one bit dedicated to parity of the entire
microword in which it is located.
2. Next address field (NA) - eight bits containing the next
address in the microprogram sequence (the next microword to
be executed). This field was necessary even without BIT.
Depending on the method of microprogramming, a microword may include
several micro-instructions to control simultaneous onerations in
the processor and elsewhere. A microprogram is executed one microword
at a time.
57
3. Current address field (CA) - eight bits containinq the address
of the microword in which it is located.
The six segments comprising the ROM are checked for correct word
content by narity. The addressing circuitrv accesses only one micro-
word at a time. Parity is generated on the microword issued to the
43-bit hold register. This generated narity is then coirmared to the
proper parity stored in the parity field of the microword. Note should
be made that the hold register and the ROM sense amnlifiers are also
checked by this Drocedure. The functions of parity generation and
comoarison are combined in the parity checker shown in Fiaure 7.
The Dartitioning indicated shows all addressing and decoding cir-
cuitry in a module separate from the ROM storaae seaments, sense
amplifiers and hold register. Divorcing the circuitry functionally
related to addressing in this manner allows fault isolation to the
modular level. This technique eliminates the ambiguity as to the mod-
ular location of failure when a nortion of the addressing function is
implemented in the same module as the ROM storage seqments (a qood
example is the address decode, often provided on the same MSI chin
as the storaae devices).
The single-failure assumption made for the examnle design contends
that the probability of multiple simul taneous failures in systems
composed of comnonents having inherent high comnonent reliability is
so small that practical test design need not consider it. This assump-
tion was justified for discrete comnonents and even for IC's, but
with the advent of MSI and LSI with their numerous closely-oacked
comnonents, it must be reconsidered. In the context of the present
subject, one must consider the higher nrobabilitv of multinle failure
58
caused, for examnle, by a cracked silicon chin where several adjacent
components would be simultaneously affected. Odd parity, for instance,
will not detect multiple even failures. The use of parity for ROM
content checkinq anoears to be justified by the fact that multiDle
failures would tend to affact more than one microword (to continue the
example, a chin crack probably would not lie straight alonq the line
of devices implementing a single microword). While one ROM access
might not catch an even number of failures in one microword, very few
subsequent accesses to different microwords would be necessary before
a single or multiDle odd failure would be detected and signalled. So,
while the single failure assumption can be questioned for an MSI ROM
implementation, the use of parity can still be justified.
Testing the addressing functions of the ROM is accomplished by
comparing the current address field (CA) of the microword with the steo
counter contents. The step counter (or a second register if timinq
reguires the step counter to change prior to the issuance of the micro-
word beinq accessed) contains the address of the microword to which
access is beinq attempted. The eiqht bits of the CA field contain the
address actuallv accessed. Comparison of the two indicates whether an
addressinq failure has occurred. The sten counter, decode and drivers
are implemented on the same module. Non-comparison of the CA field
and the step counter therefore siqnals an error in this address-func-
tion module. If the parity check in the ROM storaqe module fails,
indicatinq incorrect microword content or a failed hold reaister, the
error siqnal from the address function module is inhibited since the
contents of the CA field being used for comparison are now susnect.
59
ProDer sequencing of accesses to the ROM is the most difficult
check to accomplish. A description of the sequencing nrocess in gen-
eral terms qives insight to the oroblem. The microprogram contained
in the ROM consists, in effect, of a series of "subroutines" in a
lower level language (the micro-instruction set), one "subroutine"
for each of the macro-instructions used to write the fliaht nroaram
stored in the core memory. The flight proaram instruction word's
operation code field, representing the macro-instruction, is analo-
gously used as the "call" statement for its "subroutine". Since
the same micro-instructions may be used in different mix to imDlement
different macro-instructions, the number of micro-instructions is, in
general, smaller than the number of macro-instructions.
Given a new flight program instruction word to be executed, the
first access to the ROM is dictated by the ooeration code field of
the instruction. This operation code is decoded as a selection of one
microword in the ROM. Subsequent accesses to the ROM until the "sub-
routine" started by the operation code "call statement" is comDleted
are dictated by the NA field of the microword itself. At the end
of the sequence, the microword indicates that the sequence is comnlete
and a new flight program instruction word is fetched by the FETCH
CONTROL. Under certain conditions (such as reneats and branches),
the repeat counter and condition code register dictate that the NA
field be ignored and that the step counter (ROM address register) be
incremented or decremented to indicate the next ROM address to be
accessed. There are, then, several different sources of the next
ROM address to be accessed:
60
1. The operations code field of the oroqram instruction word
found in the instruction register (U,, IL, IL in Figure 6)
dictates the initial access to the ROM in executina a given
program instruction word.
2. The NA field of the microword just accessed indicates the next
ROM address to be accessed excent that:
3. The repeat counter and condition code reqister can dictate
direct modification of the step counter to yield the next ROM
address to be accessed, in which case the NA field of the last
microword accessed is ignored.
The SEQUENCE CONTROL selects the orooer source of the next ROM
address to be accessed. It modifies the steD counter as required by
the repeat counter or condition code register, and selects the proDer
field (U-, , U
?
, or U^) from the instruction reqister deoendent on
whether half or full -word instructions are beinq executed. When the
NA field is selected as the source of the next address, its contents
could be held in a separate reqister until they could be comDared with
the CA field of the microword actually accessed to see if a nrooer
accessinq had occurred. However, because of the possible other sources
of the next address, it anneared that the nroner functioninq of the
SEOUENCE CONTROL, FETCH CONTROL, REPEAT COUNTER, and CONDITION CODE
REGISTER could only be assured by duplication, narallel operation, and
comparison for identical results. Only in this way did adequate con -
tinuous checkinq of the proper sequencinq to accesses seem feasible.
While the duplication and comparison test method should be reserved
for last consideration, as indicated in Section IV-C-3, its application
to the small logic sections described here appeared to be required to
provide continuous checkinq. Controls which are duplicated and compared
can be placed in any module as lonq as the dunlex circuitrv and com-
parator are in the same module. Partitioninq of this duplicated
61
circuitry was therefore dependent on 2A NAFI module limitations only.
The portion of Figure 7 labeled SEQUENCE CONTROL MODULE, then, could be
broken into several modules with isolation of faults to the modular
level still provided.
D. THE CORE MEMORY UNIT
Modification of an existing design to meet the requirements for
a 24-bit word length, 8K core memory for the parent computer was con-
sidered. The use of an already developed memory design appeared
favorable in light of the short schedule and low risk nature of the
program. Although the final choice of memory type and size was depend-
ent on changing requirements and therefore not firm, the example
design will consider modifications of the basic design shown in Figure 9
to provide a BIT capability with fault detection and isolation to the
modular level as the goal. The memory to be modified, termed the
"standard memory unit" (SMU), was a 3D, coincident current, 32-bit
word lenath, random access, 4K core memory. The example used serves well
to demonstrate the factors involved in memory test.
Reference 35 briefly summarizes the standard techniques for func-
tionally exercisina a core memory. The functional exercisers listed
below check for proper operation of the memory as a black-box without
examining specific internal circuits. The standard functional exer-
cisers are:
1. Check-sum - checks proper memory loading. This check can be
accomplished using the flight proaram and constants stored in
the computer for the mission.
62
2. One's discrimination - checks memory's ability to read and
write ones coreectly. Memory buffer registers, sense amnlifiers,
the core array, and drivina circuits are checked by this test.
3. Zero's discrimination - checks the memory's ability to read
and write zeros correctly. The driving circuits are checked
by this test, as well as the sense amplifiers' sensitivity to
noise.
4. Addressing - checks whether or not each memory location can be
correctly accessed. In addition to those circuits tested by
the discrimination tests, the memory selection logic, decoders
and drivers are checked.
5. Checkerboard and Inverted Checkerboard - these tests produce
worst case noise conditions unon half-readv, which results in
maximum inhibit noise whenever a zero is written. The inhibit
noise from a cycle where zero was written can cause an error
during the read portion of the next cycle.
The discrimination and checkerboard tests are aimed at discoverinq
marginal conditions, and were not considered anproDriate for airborne
testing. They would certainly be appropriate as part of pre- or oost-
flight checkout on the ground, as discussed earlier in relation to
marginal testing in general. The check-sum and addressina tests, more
suited to discovering existing solid failure, apoeared to be anoro-
priate for in-flight application.
The five tests enumerated above are Drogram-oriented, Deriodically
exercised tests. Test technigues which reguire added hardware include
codinq and separate checking circuitry for each circuit tyne. Coding,
principally parity, is popular for checking memories, but this tech-
nigue fell outside the program constraints for the examnle desion.
Technigues for adding specialized circuitry to test the memory are
described in Ch. 14 of Ref. 43. The additional expense of the cir-
cuitry and complete memory reconfiguration appeared inanprooriate for
the design modification intended.
63
Modification of the modular partitioninp of the SMI) appeared
necessary to facilitate the test desiqn if isolation to the modular
level were to be accomoltshed. While packaged employing standard NAFI
modules, the SMU did not use the 2A size, but rather the 1A and IB
sizes. The standard memory unit was implemented with the equivalent
of 152 1A NAFI modules. As evident in Figure 9, partitioning was done
by circuitry type; e.g., there are 16 IB size sense/inhibit modules,
one 1A address register module, and so forth. Several modules, of
different types, are involved in one memory access; an address reg-
ister module, address decoder module, timing control and timing
modules, and sense/inhibit modules are all involved in one access.
It is difficult to determine airborne in which module the fault lies
once one is detected by a functional test alone. A unique way of
applyinq functional tests and some added hardware were required to
accomplish the modular isolation capability required.
Sixteen IB NAFI modules were used in the SMU to implement the
sense/inhibit functions for the 32 memory planes (32-bit word lenqth).
This represents circuitry for two planes (bit locations) per IB
module. An estimate (based on area limitation because of an essen-
tially discrete component implementation of sense/inhibit circuitry,
and allowinq for added checkinq hardware) indicated that eight 2A
11
The number in the NAFI size designator refers to the horizontal
dimension of area (width), while the letter refers to the thickness.
1A is the smallest basic size, havlnq unit standard width and unit
standard thickness. The 2A module is twice as wide as the 1A and
hence has twice the area, and the IB is twice as thick as the 1A
[Ref. 10].
64
modules should amDly suffice to imDlement the sense/inhibit functions
for a 24-bit word length. Three planes were to be served Der 2A module.
It was envisioned that bit locations served by a module would be ad-
jacent. Figure 10 illustrates the scheme.
The implementation of the decode function for the SMU reguired two
modules dedicated to X select and two to Y select, each module serving
the entire core stack. An approach to partitioning which initially
aDpeared attractive was to partition the decode logic so that the X
and Y decode serving a smaller block of the core stack would be Dlaced
in the same module. However, partitioning the decode in effect doubles
the logic reguired for every partitioning (e.g., placing the X and Y
decode for one guarter of the core stack in one module would, for the
entire core stack, entail guadrunlicating the loaic). Duplication and
comparison reguired only twice as much decode logic, and this method
was chosen. For example, the circuitry on one of the two X decode IB
modules is duplicated, the duplex hardware being placed in the same
2A module. Figure 11 shows a decode module. Four 2A modules were
reguired for decode in the example design.
The address register also reguired duplication for separate test
by the duplication and comparison technigue. Checking of power supplies,
transient protection, temperature tracking voltage sensors, timinq, and
associated regulators have been excluded from consideration, as thev
are hard-core housekeeping and service functions. The major areas
subject to failure during flight are the decoding, sense/inhibit and
select lines, cores, drivers, and amplifiers associated with accessing
the memory, which are checked by the procedures described herein.
65
Fault isolation to the functional module level plus the core stack
is provided by the test procedures described below. Faults occurring
in the sense/inhibit functions are isolated to a single sense/inhibit
module and core stack combination. Faults occurring in the addressinq
function are isolated to a single address register module or decode
module. Faults occurring in the core stack are isolated to the core
stack only if all tests can be conducted. No airborne discrimination
between a single sense/inhibit module and the core stack aDDeared feasi-
ble if the sense/inhibit test failed because later tests could not
then be confidently conducted. Such discrimination is easily
accomolished on the ground. While a hiaher degree of isolation would
be preferable, the level provided airborne closely focuses the efforts
of maintenance Dersonnel and greatly reduces the time/cost of mainte-
nance. Sub core-stack isolation would probably not be useful since the
core stack must be treated as an entity by maintenance personnel.
Testing of the sense/inhibit functions should precede testinq of
the decode function to insure that the latter tests are valid when
conducted. The sense/inhibit functions serve the entire core stack;
that is, a single sense amplifier & a sinale inhibit driver serve the
same bit location in all the 8K words of the core stack. Each access
to the core memory exercises all the sense/inhibit circuitry since
all the bit locations of the word are involved. Solid failures result
in a stuck-at-one or stuck-at-zero condition in a bit location. To
isolate such fault manifestations to the sense/inhibit module or the
core stack serving the bit location, one must first detect the fault
and then relate it to the proper module. The test consists of attempt-
ing to access a core memory location which contains a nreviously
66
stored constant, A location containing all one's tests for the stuck-
at-zero condition. Another location containinq all zero's tests for
the stuck-at-one condition. Two core memorv locations are therefore
dedicated for test use, one containinq all one's and the other all
zero's. A second set of such tests usinq the same cells should be
performed to verify the restore oneration; however, discrimination
between failures in the sense/inhibit module and the core stack would
still not be nrovided because of the nossibility of a broken sense line
(which also looks like a sense amplifier stuck-at-zero) . Relatinq the
failure to a specific sense/inhibit module is accomplished by checkinq
hardware added to each module. Assuming eiqht sense/inhibit modules
with three-bit locations served per module (24-bit word), one adds a
three-bit reqister to each module (that is, in effect, a partitioned
output buffer register for the core memory). A three-bit comnarator
(XOR) senses the failed condition when the three-bit locations are not
identical. For example, stuck-at-one failure in the fourth bit loca-
tion would be detected by accessinq the memory location containinq
all zero's. The three-bit register of the second sense/inhibit module
(servinq the second three-bit groun of the 24-bit word) would read
10 0, producing an error signal from the XOR circuit on the module.
Figure 12 shows the configuration of the sense/inhibit module.
The exercising procedure for the decode function and the core
stack consists of check-summing over sections of memory. The core
memory contains the stored Program and constants (unalterable part of
memory) which cannot change durinq fliqhts, and a small section
(scratch pad) reserved for storage of data which can chanqe in-fliqht.
«./
Scratch Dad test will be discussed separately. Check-summinq is
accomplished by cumulatively adding the contents of all the cells
of the unalterable Dart of memory modulo 24, the final sum accruinq in
1
2
the accumulator. The exoected check-sum (ECS) for the unalterable
Dart of memory has been Dreviously calculated externally and stored in
the memory as a constant. Coincidence of the calculated sum and the
ECS (subtraction is often used to give an expected zero result) indi-
cates not only that the program stored in that part of memory is intact,
but also that the accessing process has been properly accomplished.
Sequential access to each cell of the segment is attemnted durinq
calculation of the sum; the sum will check with the ECS only if every
access has been properly executed. The accessing process thorouqhly
exercises the core stack and its associated decode modules. Isolation
of faults to the decode module (by its internal comnarator) or to the
core stack (by an incorrect check-sum) is thereby orovided without
seoarate addressing tests, modification of cell contents, or storaqe
of any test results. The ECS can be stored at the end of the unalter-
able Dart of memory. The core memory can also contain the memory test
nroqram for check-summinq, at the Drice of a few cells of core stor-
age. The memory test program can also be microDrogrammed in the ROM
with other test sequences, and this alternative is nreferable if
sufficient ROM snace is available. It has been imnlicit throuqhout
12
Different schemes of handlinq the carry out of the most signifi-
cant olace (e.g., addition to the least significant bit location)
reduce the nrobability of obtaining a proper check sum when failure
exists to a negligiblv low value.
68
the foreqoinq test Drocedure description that the control and Drocessor
units have been tested prior to memory checking so that they can be
validly used to calculate the check-sums and do comparisons.
The scratch pad is tested last, and it must be treated somewhat
differently, since its contents can chanae durinq the mission. Conse-
quently, an ECS could not be calculated and stored earlier for comoari-
son. In addition, there will be some data stored in scratch Dad which
cannot be destroyed during test mode; e.q., positional data. The same
check-sum test technique can, however, still be annlied if a small
block of scratch pad cells (block A cells in Fiqure 13) can be altered
durinq test. A like-sized block of stored pronram cells in the unalter-
able part of memory (block B cells in Fiaure 13) is identified and its
ECS externally calculated and stored as a constant nrior to flight.
Figure 13 illustrates the checkinq procedure for a IK scratch pad. 256
words of the scratch pad can be altered (block A cells). The sequence
of steos to test the IK scratch oad is listed below:
1. Write contents of block B cells into block A.
2. Check-sum block A and compare to previously stored ECS.
3. Write unalterable scratch pad data of block C cells into block
A for temporary storage (block A cells and associated decode
modules have been verified by steos 1 and 2).
4. Write contents of block B cells into block C.
5. Check-sum block C and compare to previously stored ECS.
6. Restore data temporarily stored in block A into block C.
7. Continue the procedure with blocks D and E to complete scratch
pad test.
Note should be made that the size of block A can be quite small,
if necessary, with resultinq increase in the number of data shuffles
required to complete scratch pad test. Alternate techniques to test
69
the scratch Dad include coding, addition of more hardware, or oerhans
acceptance of an untested scratch pad in consonance with reasonable
test objectives discussed earlier.
E. TESTING THE CHECKING HARDWARE
The checking hardware represents hard-core circuitrv whose oroner
functioning must be assured before test results are considered valid.
The failure of checking circuitrv can lead to the very undesirable
indication of error when none exists, or failure to flaq existing
error. To provide assurance that checkinq hardware is fault-free, one
can
1. Provide redundant circuitry with reliability an order of
maqnitude higher than the circuitry it checks.
2. Provide some earlier periodic check to verify nrooer
oneration before test commences.
3. Verify only during periodic maintenance periods.
The first alternative tends to be too exoensive, at least doubling
the hardware cost of providing built-in test. The third alternative
reduces confidence in the test results to an unacceptably low level.
A periodic gross functional check of the checkinq circuitry is probably
most feasible, but at the expense of a few words of core storaqe.
Test bit patterns stored in core-memory can be used to initialize the
circuitry so that the left and riqht half-words will differ. Error
therefore should be indicated. Identical half-word natterns can be
introduced, in which no error should be siqnalled. Such tests can be
made part of the periodic test sequence orecedinq test of the rest of
the computer. While it is reconnized that comprehensive test has not been
achieved, one can be assured of a hi qh deqree of confidence in the
checking circuitrv for minimal cost and effort.
70
F. SEOUENCE OF TESTING
The sequence in which testing should be conducted for the oarent
computer has been indicated in the separate sections. A summary is
useful to gain better perspective. For those portions periodically
tested, the priority should be:
1. Preflight marginal checks.
2. The checking circuitry (gross functional check).
3. The processor unit.
4. The core memory.
a. Sense/inhibit function
b. The core stack (check-sum)
c. Scratch pad
Those portions tested continuously include:
1. Hard-core housekeeping and service functions (power supplies,
clock, and so forth)
2. The control unit
3. Core memory (partially)
a. Address register
b. Decode function
G. PROCESSING OF ERROR SIGNALS
Some general comments should be made relative to the handling of
error signals once issued. If the goal of providinq a separate error
signal from each module of the computer is achieved, a larne number
of sources will be reporting. The reports must be interpreted and
processed to achieve the desired test goals.
First, the signal lines should be made "fail-safe"; that is, a
voltage should be present on each line except when it is reporting
failure. In this way, the line itself is checked since the absence
71
of a voltage will lead to investigation of the cause. The problem of
errors Dropaqating from module to module, qivinq several false error
signals in addition to the accurate error sianal , has been resolved
locally in the modules by error-inhibit Drecautions, as in the general
processor module checking circuitry. An error signal transmission path
should be provided separate from other computer output ^aths, and
by the most direct route to allow signals to be communicated under a
failed condition. The problem of signal interpretation remains to be
resolved.
A reasonable number of 128 modules with separate error lines will
be assumed. By the single failure assumntion, only one of the 128 lines
will siqnal error at one time. With 80 pins limiting the 2A NAFI
module, two separate error processinq modules would be necessary to
accommodate the required error inputs. Sixty-four error lines would
then input to each module, well within the 80 nin limitation. Encodinq
circuitry in each module would encode the error source into binary
code, each error line havinq a unique binary number identifyinq it.
Seven outDut lines, then, would be necessary from each module, six to
encode one of 64, and one to indicate which module was sendinq the en-
coded error messaqe qiving a resolving power of one in 128. A total
of 71 inDut and output lines for each module, nlus required power
supply and timinq inputs, apDears reasonable relative to nin limita-
tions. The encoded messaqe would then be routed bv direct means to a
central buffer reqister where the messane of error location would be
preserved by some recording means for later use by maintenance Der-
sonnel . The message could also be used to turn off the central nower
7?
source to avoid the use of contaminated comnutations . The nilot would
be notified of error in accordance with test goals. Care would have
to be taken to ensure that failures in checkinq hardware, detected
durinq nre-test neriodic check, did not initiate comnuter shutdown.
In such cases, notification to the pilot that the error checkinq
capability of the computer had failed would allow him to continue
its use knowledqeable of the attendant risk.
73
VI. DESIGN EVALUATION
Any test design is subject to the unique limitations imposed
by the parent design program, and the example used was no exception.
The design oresented achieves the reasonable objectives established
for it in almost all instances:
1. A thorough self-test capability is provided for the parent
computer in the airborne environment with a high confidence
level for the test results. The risk of undetected error is
kept negl igibly low.
2. The test design represents a unique series of tradeoffs,
optimizing the test performance per dollar for the short
schedule, low risk program. Maximum advantaqe was taken of
proposed architectural characteristics for the machine.
The hardware-software split duplication technique and the
proposed modification of an existing memory design illustrate
this.
3. Partitioning of the computer was achieved usinq the specified
NAFI 2A module. Detection and isolation of the most important
classes of faults to this modular level is automatically
provided. This capability was achieved while allowing for
flexible word length with minimal basic design changes. In
the highly regular nrocessor and memory units, the number of
different module types was kept favorably low.
4. Redundancy was not qenerallv used. The caoabilitv of signif-
icant test performance is nrovided for considerably less
than duplication of hardware.
5. The test design required verv few cells of core storaqe,
such requirements being limited to a few constants and possibly
a memory test routine of short lenqth. A simole pseudo-
random number qenerator to nrovide test but natterns was
substituted for a large number of stored constants. The
coding techniques used required no core storage, leaving
maximum word length available for operational use. Dissoci-
ating the core memory from the processor and control units
simplified the overall test problem.
6. Operational degradation was minimized throuqhout. An inter-
runtable microprogrammed routine using idle time and executed
at read only memory cycle speed provides valid test without
infringing on operational availability.
74
Assiqnment of a specific fiqure of merit to the test desiqn
must await choice of specific hardware, and the important micro-
programming of the test routine upon which much of the potential
test performance is predicated.
Various figures of merit can be assigned to a test design. Davis
[Ref. 9] developed a formula to assign a figure of merit to his
residue code arithmetic unit test scheme. Other finures relating to
cost, such as the 10% added hardware figure mentioned earlier, or in
more absolute terms the cost of BIT oer gate tested have been assigned.
The ultimate justification for a self-test capability is its measured
performance in detecting errors. A high confidence level that a
high percentage of potential failure sources have been checked seems
to the author to be the best figure of merit.
Evaluation of a self-test caoabilitv can be accomnl ished in
several ways. One technique which allows such evaluation is simula-
tion, during which faults can be artificially duplicated to verify
expected test response. Once the computer is built, actual faults can
be injected and the response measured. Failure history for a nroduc-
tion machine can also helD in evaluating test efficiency. A full-
scale simulation of the parent comnuter with self-test circuitry was
envisioned.
The example design promises to provide significantly more test
capability per dollar than previous designs for similar comouters. Its
potential beneficial effect on overall cost of ownershin makes the
self-test capability provided by the desiqn a very attractive feature.
Reconnition of this fact should certainly result in nreater future
emphasis on the relatively new field of built-in self-test design.
75
VII. SUGGESTED FURTHER INVESTIGATION
The subject of derivation of an optimal test routine usinq the
micro-instruction set is an interesting one for future work. Many
techniques, some briefly presented herein, sugqest ways in which the
states of a block of loqic can be identified and related to the micro-
instructions. Additionally, special instructions for test use only
can be formulated, as needed. Computer-aided desiqn fits well in this
cateqory.
Once error siqnals from each module can be provided, the subject
of automatic reconfiguration for continued operation after failure can
be addressed. Ideally, the error signal from a "bad" module would be
used to turn off the bad module and switch in a substitutinq module.
For example, in the processor, the three identical modules of a set
could be joined by a fourth identical module to be used in the event
of failure. The ability to add such a reconfiguration capability in
modular form might prove to be an attractive option available at extra
cost dependent on the computer's intended use.
The ability of a computer to continue to operate after failure in
a degraded mode usinq its remaininq unfailed circuitry miqht be inves-
tigated. For example, limited operations miqht continue at a slower
speed for high priority tasks related to aircraft survival (e.q.,
electronic countermeasures and naviqation.)
Lastly, the effects of continued technoloqical advance on test
desiqn and self-repair offer fruitful subjects for further investi-
nation.
76
APPENDIX A - PSEUDO-RANDOM NUMBER GENERATOR
The pseudo- random number qenerator shown below aenerates the
12
maximum length sequence of 2 different 12-bit binary patterns. The
numbers so produced are random in each bit position. The aenerator







+ x + 1
as a linear feedback shift reqister. A different pattern is produced
at each clock pulse. The nonlinearity of the all -zero case is added
by the 11 - input NAND qate (which, of course, can be implemented as





































































































C,= Carry from L. to L.
+ 1













































































































































































































































































v x C <u




























































































40 Equiv 1A NAFI Modules















































































Note: Numbers refer to












1. Armstrong, D. B., "On Findinq a Nearly Minimal Set of Fault
Detection Tests for Combinational Logic Nets," IEEE
Transactions on Electronic Computers , v. EC-15, o. 66-73,
February 1966.
2. Ball, Michael and Hardie, Fred, "Self-Repair 1n a TMR Computer,"
Computer Design , v. 8, p. 54-57, February 1969.
3. Bashkow, T. R., Friets, J., and Karson, A., "A Proarammlnq
System for Detection and D1aanos1s of Machine Malfunctions,"
IEEE Transactions on Electronic Computers , v. EC-12,
p. 10-17", February 1963.
4. Brauer, J. B., "The Numbers Game -Who Wins?" paper presented
at Annual Symposium on Reliability, Washinaton, D. C,
10 January 1967.
5. Carter, W. C, and others, "Design of Serviceability Features
for the IBM System 360," TBM Journal , v. 8, p. 115-126,
April 1964.
6. Chanq, Herbert Y., "An Algorithm For Selecting an Optimum Set
of Diagnostic Tests," IEEE Transactions on Electronic
Computers , v. EC-14, p. 706-71 1 , October 1965.
7. Cohen, J. J. and Whitaker, L. A., "Improved Techniques in
Diagnostic Programming," The Sylvania Technologist , v. 13,
p. 90-96, July 1960.
8. Connolly, J. B. and Schmidt, W. G., "Failure Erasure Circuitry;
A Duplicative Technique of Failure - Maskinq Systems,"
IEEE Transactions on Electronic Computers , v. EC-16, p. 82-85,
February 1967.
9. Davis, R. A., "A Checkinq Arithmetic Unit," AFIPS Proceedings
Fall Joint Computer Conference , 1965, v. 27, part 1 , p. 705-713
10. Department of the Navy NAVWEPS OD 30355 (3D Revision), Standard
Hardware Program Data Handbook , v. 1, table 4-2, figure 4-3,
1 August 1968.
11. Downinq, R. W. , Nowak, J. S., and Tuomenoksa, L. S., "No. 1 ESS
Maintenance Plan," Bell System Technical Journal , v. 43,
p. 1961 - 2019, September 1964.
12. Eldred, R. D., "Test Routines Based on Symbolic Loqic Statements,"
Journal of ACM, v. 6, p. 33-36, January 1959.
91
13. Forbes, R. E., Rutherford, D. H., and Stieplitz, C. B., "A Self-
Diagnosable Computer," AFIPS Proceedings Fall Joint Computer
Conference , 1965, v. 27, nart 1, n. 1073-1086.
14. Galey, J. M., Norby, R. E., and Roth, J. P., "Techniques for the
Diagnosis of Switching Circuit Failures," IEEE Transactions
on Communications and Electronics , v. 83, p. 509-514,
September 1964.
15. Golomb, S. W. , Shift Register Sequences , Holden-Day, 1967.
16. Hackl , F. J. and Shirk, R. W. , "An Integrated Approach to
Automated Computer Maintenance," Proceedings Annual IEEE
Symposium on Switching Circuits and Logical Design, 6th,
Ann Arbor, Michigan, 1965, p. 280-302.
17. Hamming, R. W. , "Error Detecting and Error Connecting Codes,"
The Bell System Technical Journal , v. 26, p. 147-160, April 1950,
18. Happ, W. A., "Combinatorial Analysis of Multi -Terminal Devices,"
IEEE Transactions on Systems Science and Cybernetics , v. SSC-3,
p. 2V-27, June 1967.
'
19. Hausrath, D. A. and Fleming, D. C, "Reducing Failure Rate
Prediction Uncertainties," Proceedings Annual Symposium on
Reliability
,
Boston, Massachusetts, 1968, v. 1, o. 226-235.
20. Hennie, F. C, "Fault Detecting Experiments for Sequential
Circuits," Proceedings, Annual IEEE Symposium on Switching
Circuits and Logical Design
,
5th, Princeton, N. J., October
1964, p. 95-110.
21. Joseph, E. C, "Impact of Large Scale Integration on Aerospace
Computers," IEEE Transacti ons on El ectroni c Computers
,
v. EC-16, p. 558-561, October 1967.
22. Kautz, William H., "Fault Testing and Diagnosis in Combinational
Digital Circuits," IEEE Transactions on Computers , v. C-17,
p. 352-366, April 1968.
23. Kirkman, R. A., "Failure Concepts in Reliability Theory," IEEE
Transactions on Reliability
,
v. R-12, p. 1-10, December 1963.
24. Kirkman, R. A., "Failure Prediction in Electronic Systems,"
IEEE Transactions on Aerospace and Electronic Systems,
v. AES-2, p. 700-707, November 1966.
25. Kohavi , Zvi and Lavallee, Pierre, "Design of Diagnosable
Sequential Machines," AFIPS Proceedings Spring Joint Computer
Conference , 1967, v. 30, p. 713-718.
'
26. LaMacchia, S. E., "Diagnosis in Automatic Checkout," IRE
Transactions on Military Electronics, v. 6, p. 302-309,
July 1962.
92
27. Lee, Fred, "An Automatic Self-Checkinn and Fault-Locatlna Method,"
IRE Transactions on Electronic Computers
,
v. EC-11 , n. 649-654,
October 1962.
28. Lowrie, R. W. , "Hiah Reliability Computers usinq Duplex Redundancy,"
Electronic Industries
, p. 116-128, August 1963.
29. Lynn, D. K. , Meyer, C. S., and Hamilton, D. J., eds., Analysis and_
Design of Integrated Circuits
, p. 6, McGraw-Hill, 1967.
30. Malinq, K. and Allen, E. L., "A Computer Organization and
Programming System for Automated Maintenance," IEEE Transactions
on Electronic Computers , v. EC-12, p. 887-895, December 1963.
31. Manning, Eric, "On Computer Self-Diagnosis Part I —Experimental
Study of a Processor," IEEE Transactions on Electronic
Computers , v. EC-15, p. 873-881, December 1966.
32. Manning, Eric, "On Computer Self-Diaqnosis Part II — Generalizations
and Design Principles," IEEE Transactions on Electronic
Computers , v. EC-15, p. 882-890, December 1966.
33. Merwin, R. E., "Processor Control Checkup System," IBM Technical
Disclosure Bulletin
,
v. 8, p. 58-59, June 1965.
34. Minner, E. S. and Romero, H. A., "Reliability Testing of F-111A
Avionics System," Proceedings Annual Symposium on Reliability
,
Boston, Massachusetts, 1968, v. 1, p. 567-575.
35. National Aeronautics and Space Administration Report CR-65254,
AES-EPO Study Program Final Report , by IBM Corp., Federal
Systems Division, Oswego, N. Y., v. 2, p. 135-166, 31 December
1965.
36. Peterson, W. W., Error-Correcting Codes , MIT Press and Wiley, 1961.
37. Poage, J. F. and McClusky, E. J. Jr., "Derivation of Optimum Test
Sequences for Sequential Machines," Proceedings, Annual IEEE




Pri nceton, N. J., October 1964, p. 121-132.
38. Preparata, F. P., Metze, Gemot, and Chien, R. T., "On the
Connection Assignment Problem of Diagnosable Systems," IEEE
Transactions on Electronic Computers , v. EC-16, p. 848-854,
December 1967.
39. Rao, T. R. N., "Error-Checking Logic for Arithmetic-Type Operations
of a Processor," IEEE Transactions on Computers , v. C-17,
p. 845-849, September 1968.
40. Reference Data for Radio Engineers
,
5th ed., p. 40-20, Howard W.
Sams, 1968.
93
41. Roth, J. P., "Diaqnosis of Automata Failures: A Calculus
and A Method," IBM Journal , v. 10, n. 273-291, July 1966.
42. Sellers, F.F., Jr., Hsiao, M.Y., and Bearnson, L.W.,
"Analyzing Errors with Boolean Difference," IEEE
Transactions on Computers , v. C-17, n. 676-683,
Julv 1968.
43. Sellers, F.F., Jr., Hsiao, M.Y., and Bearnson, L.W.,
Error Detectinq Logic for Diqital Computers, McGraw-
Hill, 1968.
'
44. Seshu, Sundaram, "On An Improved Diagnosis Program,"
IEEE Transactions on Electronic Comnuters , v. EC-14,
p. 76-79, February 1965.
45. Seshu, S., and Freeman, D.N., "The Diagnosis of Asynchronous
Sequential Switching Systems," IRE Transactions on
Electronic Computers , v. EC-11, p. 459-465, August 1962.
46. Terris, I. and Melkanoff, M.A., "Investigation and Simulation
of A Self-Repairing Digital Computer," Proceedings Annual
IEEE Symposium on Switching Circuits and Logical Design
,
6th, Ann Arbor, Michigan, 1965, d. 264-278.
47. Tsiang, S.H. and Ulrich, W., "Automatic Trouble Diagnosis of
Complex Loqic Circuits," Bell System Technical Journal
,
v. 41, d. 1177-1200, July 1962/
48. Weber, Samuel, "LSI: the Technologies Converge," Electronics
,
v. 38, o. 124-129, 20 February 1967.
49. Weitzenfeld, A.S. and HapD, W.W., "Combinatorial Techniques for
Fault Identification in Multi terminal Networks," IEEE
Transactions on Reliability
,




1. Defense Documentation Center 20
Cameron Station
Alexandria, Virqinia 22314
2. Library, Code 0212 2
Naval Postqraduate School
Monterey, California 93940
3. Associate Professor M. L. Cotton, Code 52Cc 1
Department of Electrical Enqfneerinq
Naval Postqraduate School
Monterey, California 93940
4. Major Edward 0. Bierman, USMC 1
14915 Daytona Court
Woodbridqe, Virqinia 22191
5. Mr. Charles A. Pullen 1
Mail Station 6/E-102
Huqhes Aircraft Company
Culver City, California 90230
6. Commandant of the Marine Corps (Code A03C) 1
Headquarters, U.S. Marine Corps
Washington, D.C. 20380
7. James Carson Breckinridqe Library 1





DOCUMENT CONTROL DATA -R&D
s»*i or it v « tBWmifit at ion of title, body of abstrat t And indexing annotfttirtn mu*t be entered when the overall report Is c la&tuftedj
OHiuiNATiNC activity ( Corporate author)
Naval Postqraduate School
Monterey, California 93940




Built-in Self-Test for an Airborne Digital Computer
« DESCRIPTIVE NOTES (Type ol report and.inclueive dates)
Master's Thes1s_^ June 1969
i au thoriS) (First nmm*. middle Initial, last name)
Edward Oliver Bierman
I REPOR T DATE
June 1969
la. TOTAL NO. OF PACES
97
76. NO OF RE FS
49
la CONTRACT OR CRANT NO
6. PROJEC T NO
0*. ORIGINATOR'S REPORT NUMBERISI
96. OTHER REPORT NOIS) (Any other numbers that may be assigned
thle report)
10 DISTRIBUTION STATEMENT
Distribution of this document is unlimited




This thesis describes the design of a built-in self-test capability for
a military airborne digital computer. The supportive Investigation of program
constraints and their effects on the example test design is Intended to give
broad perspective to the general self-test design problem. Alternate procedures
for achieving the goal of airborne detection and Isolation of a certain class of
failures to the modular level are surveyed. A specific test design 1s evolved
illustrating the unique mix of program-oriented, periodic techniques, and added
hardware, continuous techniques best suited to the example development program.
The test design is evaluated and further work is suggested.
DD. Fr:..1473
S/N 01 01 -807-681 1
(PAGE 1)







DD , F.°"-.,1473 back,
98 Security Classification





Built-in self-test for an airborne digit
3 2768 001 03646
DUDLEY KNOX LIBRARY
