Research on failure free systems with supplemental information Final report by unknown
RESEARCH ON FAILURE FREE SYSTEMS 
WITH SUPPLEMENTAL INFORMATION 
December, 1963 
Distribution of this report is provided in the interest of information 
exchange and should not be construed as endorsement by NASA of 
the material presented. Responsibility for the contents resides 
in the author or organization that prepared it. 
AVAILABLE TO GOVERNMENT AGENCIES ONLY 
Prepared under Contract No. NASw-572 by 
THE WESTINGHOUSE ELECTRIC CORPORATION 
Baltimore, Maryland 
for 
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION 
https://ntrs.nasa.gov/search.jsp?R=19660017908 2020-03-16T19:11:49+00:00Z
LIST OF CODE LETTERS AND COMPANY NAMES 
Company A--Westinghouse Electric Corporation 
Company B--Signetics Corporation 
Company C--Curtis Instruments, Inc. 
Company D--Fairchild Camera & Instrument Corporation 
Company E--Sylvania Electric Products, Inc. 
Company F--Siliconix Incorporated 
Company G--Texas Instruments Incorporated 
Company H--Motorola, Inc. 
Company I--Amp, Inc. 
LIST OF CODE NUMBERS AND TRADE NAMES 
Device 1--Memistor 
Device 2--Amp-Mad 
TABLE OF CONTENTS 
PURPOSE 
SUMMARY 
CONCLUSIONS AND RECOMMENDATIONS 
Appendix 1 - Design and Testing of Redundant Systems 
Appendix 2 - Reliability of Imperfect Redundant Systems 
Appendix 3 - A Survey of Components for Adaptive Restoring Circuits 
Appendix 4 - Transor Analysis 
Appendix 5 - Comparison of Dynamic and Threshold Restorers 
Appendix 6 - Self Repair Techniques 
PURPOSE 
This final report is prepared in accordance with the requirements of Contract NASw-572, 
"Research on Failure Free Systems", between the National Aeronautics and Space 
Administration and the Westinghouse Electric Corporation (reference WGD-38521). The 
research reported herein has the general objective of advancing the state of the art 
in the design of highly reliable electronic systems associated with the national 
space effort. The design objectives studied are those which permit the proper 
operation of systems to be relatively independent of the effects of individual component or 
module failures within the systems. The scope of this objective includes the use of the more 
conventional techniques of multiple-line, majority-voted redundancy, as well as the study of 
self-repair and advanced voting techniques. The research has been divided into the following 
major tasks: 
TASK 1: IMPLEMENTATION 
TASK 2: ADVANCED VOTING TECHNIQUES 
TASK 3: SELF REPAIR TECHNIQUES 
SUMMARY 
TASK 1 - IMPLEMENTATION. 
This portion of the study is concerned with developing suitable circuits, systems, and 
testing techniques for use with currently available redundancy techniques. The circuit and 
system design is expected to be suitable for general use in spaceborne or ground support 
equipment, free from extremely detrimental failure modes, and compatible with whatever 
testing techniques are to be applied. The testing techniques are expected to be suitable for 
a wide variety of applications. They are, therefore, similarly varied according to the 
purpose of the testing, the system configuration involved, and the information which is available 
for the test. The testing of redundant systems represents a unique problem, since individual 
component or module failures do not indicate their occurrence by affecting the system per-
formance. The various purposes for testing are indicated by the following types of diagnostic 
tests which have been considered: 
The verification that all signal-processing elements are working properly, or 
additionally that the voters are capable of transmitting a correct signal, or 
further, that all signal processors and voters work properly under all possible 
design conditions. This may be further extended to include the verification that 
any additional hardware which is added for the testing is also capable of proper 
operation. This range of test requirements is also encountered when the purpose 
of the tests is not only to detect any failures, but to locate these failures to facil-
itate repair or replacement in redundant systems where repair is desired, or 
systematic maintenance is used. 
Another type of testing is referred to as "statistical measure of quality", which obtains 
a limited amount of information concerning the failure pattern existing within the system to 
estimate the reliability of the system. Many different types of tests can be used to obtain 
this information, depending on the confidence required for the reliability estimate, the cost 
of obtaining the information, and the type of analysis which will be applied to that information. 
Much of the preliminary work necessary for the determination of suitable circuitry for 
redundant systems has been described in an earlier Company A report, "Failure Effects 
in Redundant Systems" [1]. The report describes in detail the effects of catastrophic component 
failures which were induced into a laboratory model of a portion of a typical redundant 
system. Potentially serious detrimental failures which might occur are discussed. A major 
portion of the report is concerned with the random failure simulations and their results. 
Briefly, a computer program generates random failure lists using available reliability data 
for each part. Each failure list includes all component failures which might have occurred 
in a typical system which had been operated for the specified time interval, and therefore 
simulates the actual testing of such a system. The indicated failures are induced into the 
system, which is then tested to determine whether it is capable of performing all of its 
design functions, or if it has failed. This actual test result can be compared with the analytical 
result which would have been obtained with the same group of failures, to test the validity of 
the assumptions used for the analytical result. These tests showed that the most common 
analytical model is excessively pessimistic for a well-designed system. For these tests it 
predicted more system failures than actually occurred by a ratio of more than 2:1. The 
reasons for this departure, and more accurate analytical models, are discussed. A new 
technique is described which permits the reliability of a redundant system to be estimated by the 
product of exponentials, using the failure rates of the components or modules involved. 
Finally, several circuit design considerations are discussed. 
1. A. R. Helland, W. C. Mann, "Failure Effects in Redundant Systems", Westinghouse 
Report EE 3351, March 1963. 
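The generation of random failure lists described above can be sketched as follows. This is only an illustration and not the program used in the earlier report: it assumes a constant failure rate for each part (so that the probability of failure within t hours is 1 - exp(-rate * t)), and the part names and rates shown are hypothetical.

```python
import math
import random

def failure_list(failure_rates, hours, rng):
    """Return the names of the parts that fail within `hours`.

    failure_rates maps a part name to an assumed constant failure rate
    (failures per hour). Each part fails independently with probability
    1 - exp(-rate * hours).
    """
    failed = []
    for part, rate in sorted(failure_rates.items()):
        p_fail = 1.0 - math.exp(-rate * hours)
        if rng.random() < p_fail:
            failed.append(part)
    return failed

# Hypothetical parts and failure rates, for illustration only.
rates = {"Q1": 2e-6, "D1": 5e-7, "R1": 1e-7}
rng = random.Random(1963)  # fixed seed so the lists can be regenerated
lists = [failure_list(rates, 10_000, rng) for _ in range(5)]
```

Each list would then be induced into the laboratory model (or applied to the analytical model) and the outcome recorded as system success or failure.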
The results of implementation studies as part of the research on failure free systems 
have been previously published in special technical reports. Two major areas of interest 
are discussed in Special Technical Report No. 3, "Circuits and Circuit Testing for 
Spaceborne Redundant Digital Systems". The entire report is reproduced as Appendix 1 of this 
final report. The first portion of the report is concerned with efficient initial design and 
contains a discussion of several possible circuit implementations. The latter portion is con-
cerned with the diagnostic testing of a multiple line, majority logic redundant system. Sev-
eral techniques are described for detecting and locating failures within an operating redundant 
system to greatly increase reliability. The report is summarized below. 
Section I contains a discussion of the general problems concerned with the design and 
testing of redundant systems. These problems include the most appropriate choice of circuit 
implementation, special design requirements, and the realization of high system reliability 
with available circuits. 
Section II contains a discussion of the possible use of magnetics to reduce the total 
power consumption and provide non-volatile storage in redundant spaceborne systems. 
Magnetics appear to be most useful for applications requiring memory associated directly with 
simple forms of logic, or for non-volatile data storage when the data is altered at very slow 
rates, but are not recommended for general logic use. 
Section III contains descriptions and comparisons of types of semiconductor circuits 
suitable for use in redundant systems. Since integrated circuits offer many important ad-
vantages for redundant systems, they are chosen as a basis for system design with semicon-
ductor circuitry. Since custom design of integrated circuits is not especially practical for 
low volume operation, the circuit design problem includes the choice of the most suitable 
type of available circuits. Integrated Diode-Transistor Logic elements were chosen as the 
most appropriate for general use. A majority voter restoring element, which is not subject 
to the detrimental failure modes found to be characteristic of conventional elements, is de-
signed using positive logic D-TL NAND elements. 
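For orientation, a three-input majority function can be written entirely in terms of NAND elements. The sketch below shows only the textbook two-level NAND form; it is an assumption for illustration and is not claimed to be the voter circuit of Appendix 1, which is designed to avoid particular detrimental failure modes.

```python
from itertools import product

def nand(*inputs):
    """Positive-logic NAND: output is 0 only when every input is 1."""
    return int(not all(inputs))

def majority_voter(a, b, c):
    """Two-level NAND realization of the majority function ab + bc + ac."""
    return nand(nand(a, b), nand(b, c), nand(a, c))

# Truth table: the output is 1 when at least two of the three inputs are 1.
table = {bits: majority_voter(*bits) for bits in product((0, 1), repeat=3)}
```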
The discussion of Section IV is concerned with the testing of redundant systems. Various 
solutions to the problem of failure detection within a redundant system are discussed in this 
section; some are more suitable for simple failure detection, others also provide information 
concerning the location of any failures. The failure detection tests alone are expected to be 
most suitable for initial acceptance and verification tests to indicate that all parts are work-
ing. The combined detection and location techniques are most applicable to systems where 
additional information is required to facilitate repair or replacement of individual parts of 
the system. 
It is shown that failure location and maintenance of a redundant system does not require 
the test equipment and operator skill which are usually required to maintain a conventional 
non-redundant system. Techniques are described which permit a redundant system to be 
systematically maintained to provide much higher operational reliability than possible with-
out maintenance. It is shown that a major portion of the maintenance may be performed dur-
ing normal system operation. 
The partial testing of imperfect redundant systems to estimate future reliability is 
discussed in part two of Special Technical Report No. 4, "Transor Decision Functions and 
Statistical Measure of Quality". The second part of the report is reproduced as Appendix 2 of 
this final report. 
The objective of this portion of the study has been to develop a test philosophy from 
which a good statistical estimate of the probability of mission success could be made from a 
limited amount of test data. Several possibilities have been formulated. The failure masking 
characteristics of redundant systems prohibit the use of simple test programs which merely 
determine the performance capability of the system at the time of test. Such programs 
cannot differentiate between systems containing many component failures, with correspondingly 
many stages vulnerable to succeeding failures, and systems containing few component failures, with few vulnerable 
stages. Because the probability of mission success after the time of test is heavily influenced 
by the component failure pattern existing at the time of test, a test program must be devised 
from which mission reliability can be predicted with a reasonably high degree of confidence. 
The general complexity and microminiature size of modern systems generally preclude the 
possibility of testing each signal processor in each stage. 
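The dependence of mission reliability on the failure pattern at the time of test can be illustrated with a small calculation. The sketch assumes an order-three system with perfect voters and independent signal processors, each surviving the mission with the same probability p; the stage count and the value of p are hypothetical. Both systems below would pass a simple performance test, since every stage still has a working majority.

```python
def stage_reliability(p, failed):
    """Probability that an order-three stage still has a working majority
    at the end of the mission, given `failed` processors already failed."""
    if failed == 0:
        return 3 * p**2 - 2 * p**3  # at least two of three survive
    if failed == 1:
        return p**2                 # both remaining processors must survive
    return 0.0                      # majority already lost

def mission_reliability(p, failures_per_stage):
    """Product of the stage reliabilities (stages assumed independent)."""
    r = 1.0
    for failed in failures_per_stage:
        r *= stage_reliability(p, failed)
    return r

p = 0.99                                                 # hypothetical value
clean = mission_reliability(p, [0] * 20)                 # no failures at test
degraded = mission_reliability(p, [1] * 10 + [0] * 10)   # ten vulnerable stages
```

Under these assumptions the degraded system has a markedly lower mission reliability than the failure free one, although the two are indistinguishable by a test of performance alone.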
In the proposed extension of this study the various philosophies will be considered in 
more detail, and an effort will be made to evaluate the usefulness of each one with the pur-
pose of determining which of the candidate philosophies provides the most accurate estimate 
of probability of mission success for a fixed cost of testing. 
TASK 2 - ADVANCED VOTING TECHNIQUES 
This study is concerned with advancing the state of the art in developing new restoring 
circuits for use in redundant systems. Several advanced voting techniques have been studied 
as part of the research on failure free systems. The results of the Adaline-Neuron study 
and the initial results of the Transor study have been previously published as special techni-
cal reports. Further study of Transor and a new dynamic restorer (the Hamming Distance 
Restoring Circuit) has been conducted, but the results have not been previously published. 
These results are, however, contained in Appendix 5 of this report. 
The results of the study of the Adaline - Neuron adaptive voter with continuously variable 
input weighting have been previously published as Special Technical Report Number 1, "A 
Survey of Adaptive Components for Use in Failure Free Systems". It is reproduced as Appen-
dix 3 of this report. Briefly, it concludes that suitable analog memory devices are not cur-
rently available for use in this class of adaptive voters, although the mercury cell integrator 
with photoelectric readout is apparently the most suitable technique. 
Since the Adaline-Neuron adaptive voter requires an analog memory for each input, 
the selection of a suitable input device is important to realize a practical adaptive voter. 
Several types of analog memory devices were surveyed in order to evaluate their suitability 
for use in implementing an adaptive voter for redundant systems. It is desirable that the 
devices be simple, reliable, relatively linear, and store the analog variable weighting for a 
relatively long time. It was found that most of the available devices which have been de-
veloped for pattern recognition or learning machines are too complex, unstable, or unreliable 
for use in adaptive voters. 
The survey covered the Device 1 plated resistor, the solion iodine ion cell, the mercury 
cell integrator (with either capacitive or photoconductive readout), the MAD magnetic 
integrator, the orthogonal core integrator, the second harmonic magnetic integrator, and 
the magnetostrictive integrator. The 
mercury cell integrator with photoconductive readout appears to be the most suitable device 
among those which were surveyed. It incorporates an electroplating technique for providing 
the continuously variable input weighting for adaptive voters, with relatively good stability, 
reversibility, and permanent storage. Since it is a four terminal device with electrical cur-
rent as the input and electrical resistance as the output, it is relatively simple and generally 
compatible with conventional circuitry. It is, however, currently in a relatively early state 
of its development as a device for general use. It appeared that any detailed circuit design 
for adaptive voters should not be undertaken before the expected progress in the development 
of more effective cells is accomplished. 
The proposed continuation of the development of this class of adaptive voters includes 
monitoring the state of the art in the development of more effective devices, followed by the 
design and breadboard construction of at least one Adaline-Neuron adaptive restorer, or pre-
ferably a small redundant subsystem using these restorers, in order to demonstrate their 
effectiveness in redundant systems. 
The objective of the Transor study portion of the research was to evaluate the Transor 
Restoring Circuit for possible use as a replacement for threshold voters in redundant systems. 
In the process of performing this evaluation, another dynamic restorer, the Hamming 
Distance Restoring Circuit, was invented. The study was extended to include an evaluation of 
both circuits. 
The initial portion of this study has been reported in part one of Special Technical 
Report No. 4, "Transor Decision Functions and Statistical Measure of Quality", which is 
reproduced as Appendix 4 of this final report. In that report, analytical reliability expressions 
for systems using Transor restorers are obtained for the case when signal processors are 
restrained by certain failure mode assumptions. An appendix to that report shows how the 
probability of occurrence of various failure modes might be computed. The results of later 
portions of this work are presented in Appendix 5 of this final report. In these results, gen-
eral reliability expressions for the Transor and the Hamming Distance Restoring Circuit are 
obtained which are relatively free of restrictive assumptions. A computer simulation 
program which was developed for use in the evaluation is described, and some results obtained 
from the program are discussed. Finally, the conclusion is drawn that the Hamming 
Distance Restoring Circuit is always superior to the Transor but that it is as good as or better 
than the threshold voter only in certain failure mode environments. 
TASK 3 - SELF REPAIR TECHNIQUES. 
This study is concerned with the development of new, more efficient means for employ-
ing redundant equipment. Using these techniques, a system may be designed to absorb more 
internal failures without system failure than is possible with the same amount of fixed, 
multiple-line redundant equipment. The results of this study have been previously published 
as Special Technical Report No. 2, "Self Repair Techniques for Failure Free Systems". 
The report is reproduced as Appendix 6 of this final report. 
As a part of the effort to develop hyper-reliable systems, Company A has devised a 
class of techniques for using redundant blocks of circuitry more effectively than has been 
done previously. The systems using these techniques are similar to the familiar multiple-line, 
majority-voted redundant systems, except that blocks of circuitry are allowed to shift around 
as component failures leave certain subsystem functions more vulnerable than others to 
succeeding failures. The object of this phase of the study has been to devise several general patterns 
in which systems could be organized to absorb relatively large numbers of internal failures 
without system failure and to develop a means for evaluating the effectiveness of the various 
patterns for performing this function. 
Three broad classes of organization patterns have been developed, and several specific 
patterns within each class have been examined. A versatile computer simulation program 
has been written from which approximate reliability vs. time curves and a variety of other 
pertinent information about each pattern can be directly obtained. Both the patterns which 
have been developed and the computer program are described in detail in Appendix 6. 
A three-part program has been proposed for future study in this area. In the first part, 
the computer simulation program will be used as an evaluation tool for establishing a set of 
rules for designing optimum or near-optimum self-repairing systems. The rules will be 
primarily concerned with the organizational patterns to be used and with the maximum allowable 
ratio of repair circuitry complexity to signal processor complexity. Secondly, an 
implementation study has been proposed to determine effective means for implementing the organization 
patterns which have been and will be devised. Finally, an appropriate study vehicle will be 
selected and designed in sufficient detail that a breadboard model could be constructed 
from the specifications produced. Such a vehicle design is required in order to verify the 
usefulness of both the organizational pattern theories and the implementation techniques 
which are being developed. 
CONCLUSIONS AND RECOMMENDATIONS 
TASK 1 - IMPLEMENTATION 
1. Design of Redundant Systems 
Redundancy is a powerful tool for achieving extended reliability, but effective design 
is required to achieve the reliability goals with a minimum of additional complexity. Although 
magnetic logic is often cited as having several advantages applicable to spaceborne computers, 
the use of magnetic logic is limited to special applications. Magnetic logic is not particularly 
suited for general logic use in redundant systems, due to the lack of steady output signals, 
low speed capability, high peak power requirements, and the complexity required for general 
logic functions. It appears that no proven magnetic restoring element exists which is suitable 
for general use in redundant systems. Magnetic logic does, however, offer non-volatile 
storage and very low average power for slow speed operation. Magnetic devices appear to 
be suited to special applications where certain logic functions, such as transfer and OR, 
are intermixed with the memory function, and very low speed capability is acceptable. It is 
useful for low speed shift registers, counters, and timers which consume negligible stand-
by power. 
Integrated semiconductor circuitry offers many desirable characteristics for use 
in redundant spaceborne systems, including small size, reduced weight and power consump-
tion and high frequency capability. A comparison of the currently available integrated logic 
elements indicates that diode-transistor logic (D-TL) is the most suitable for general logic 
use in redundant spaceborne systems. A majority voting restorer, designed using inter-
connected NAND elements, has been described which is not subject to the detrimental failures 
of more conventional restoring elements . 
2. Testing of Redundant Systems 
It is a characteristic of redundant systems that they offer a high reliability for a 
period of time after the initially failure free condition, and that the system reliability decreases 
rapidly when internal failures are present. It is therefore important to insure that no initial 
failures exist in a redundant system to obtain maximum system reliability. Since an initially 
failure free, order three system can withstand any single failure, as well as a relatively 
large number of randomly scattered failures, it offers very high reliability for the period 
of time when the probability of individual failures is low. Techniques are described which 
permit even higher reliability by the use of systematic maintenance of a redundant system. 
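The behavior described in this paragraph can be illustrated with the standard expressions for a single order-three stage. The sketch assumes a perfect voter, independent signal processors, and a constant failure rate $\lambda$ for each processor; these assumptions and symbols are introduced here for illustration and are not taken from the appendices.

```latex
\begin{align*}
R_1(t) &= e^{-\lambda t}
  && \text{(one signal processor)} \\
R_3(t) &= 3R_1^2 - 2R_1^3 = 3e^{-2\lambda t} - 2e^{-3\lambda t}
  && \text{(at least two of three working)} \\
1 - R_3(t) &= 3q^2 - 2q^3 \approx 3(\lambda t)^2,
  \quad q = 1 - R_1
  && \text{(initially failure free, } \lambda t \ll 1\text{)} \\
R_{3 \mid 1\ \mathrm{failed}}(t) &= R_1^2 = e^{-2\lambda t},
  \quad 1 - R_{3 \mid 1\ \mathrm{failed}} \approx 2\lambda t
  && \text{(one processor failed at } t = 0\text{)}
\end{align*}
```

Under these assumptions the unreliability of an initially failure free stage grows only as the square of $\lambda t$, while a stage that starts with one failure loses reliability linearly in $\lambda t$, which is why the absence of initial failures matters so much.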
It has been shown that a relatively simple technique called singular rank testing 
may be used to determine that all of the replicated signal processors in a redundant system 
are working properly, and that the majority voters are sufficiently failure free to insure that 
the system is not vulnerable to single failures. The system is monitored to determine if each 
individual rank is able to perform all system functions correctly, in a manner similar to the 
verification of a non-redundant system. This testing places no restrictions on system size 
or configuration. A somewhat more complicated testing procedure, referred to as interwoven 
rank testing, has been described which will completely test all voters to insure that they will 
make correct decisions for all possible input combinations. 
Although a redundant system is more complex than its conventional counterpart, 
failure location within a working system does not require the operator skill and simulation 
equipment usually required to locate failures in a non-redundant system. Since a working 
redundant system always has at least one correct signal available at each stage in the system, 
these correct signals may be used as a basis of comparison. A difference detector on the 
signal processor outputs to restorers may be used to indicate either permanent or sporadic 
failures among these signal processors. The failure location techniques described may be 
performed during normal operation, since they do not jeopardize system operations. 
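A difference detector of this kind can be sketched in a few lines. The comparison against the majority value of the replicated outputs is an assumption made for illustration (the detailed techniques are those of Appendix 1), and the sample data are hypothetical.

```python
def majority(bits):
    """Majority value of an odd number of binary signals."""
    return int(sum(bits) > len(bits) / 2)

def count_disagreements(samples):
    """Count, for each replicated line, how often it differed from the
    majority of the lines at the same stage.

    samples is a list of tuples, one tuple of replicated signal processor
    outputs per observation time. Lines that never disagree are omitted.
    """
    counts = {}
    for outputs in samples:
        voted = majority(outputs)
        for line, bit in enumerate(outputs):
            if bit != voted:
                counts[line] = counts.get(line, 0) + 1
    return counts

# Hypothetical observations of one order-three stage during normal operation.
samples = [(0, 0, 0), (1, 1, 0), (1, 1, 0), (0, 0, 0), (1, 1, 0)]
flagged = count_disagreements(samples)
```

Here line 2 disagrees whenever the correct output is 1, which is consistent with a permanent failure of that line to the 0 state; an occasional isolated disagreement would instead suggest a sporadic failure. Because only outputs already present in the system are observed, the check does not disturb normal operation.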
3. Reliability of Imperfect Redundant Systems 
The mission reliability of an operating redundant system which contains internal 
failures depends strongly on the number and location of initial circuit failures, as well as 
the failure rates of the circuits which make up the system. 
One very important task is the design of simple and efficient tests to be performed 
at the beginning of a mission. These tests are needed to obtain the information required for 
the reliability estimates. A maximum amount of information is desired from a minimum 
number of tests. The work which has been done will provide a basis for future efforts in 
this area. 
Several tests are proposed that may be made just before a mission is to begin to 
determine, at least approximately, the mission reliability without complete information on 
the state of the system. Procedures are proposed for using the results of the tests to 
estimate the mission reliability with varying degrees of accuracy. A procedure for making 
the decision on the usability of the system without estimating the mission reliability is also 
presented. 
Although a basis for future study has been provided, the details of these procedures 
are still to be worked out and the accuracy of their results is still uncertain. It is 
recommended that efforts be made to develop an appropriate measure for comparing the techniques 
so that they may be evaluated relative to a common scale. 
TASK 2 - ADVANCED VOTING TECHNIQUES 
1. Components for Adaptive Restorers 
A survey has been conducted of several devices which are potentially suitable for 
use in the Adaline-Neuron adaptive voter. The survey concludes that none of the suggested 
devices were sufficiently developed to justify the immediate circuit implementation of an 
adaptive voter. 
In general, magnetic devices do not appear to be suitable for use in adaptive voters, 
due to their environmental sensitivity and the complexity required for useful operation. Similarly, 
electro-chemical devices do not appear to have sufficient simplicity, stability and 
compatibility with electronic circuitry to justify their use in adaptive voters. 
The mercury cell integrator with photoelectric readout appears in principle to offer 
the most attractive approach because of its simplicity, stability and general compatibility 
with conventional circuitry. Since the output is essentially a variable resistance 
proportional to the integral of the control input current, the device offers the possibility 
of providing a simple interface with standard circuitry. The mercury cell integrator 
is, however, still in a rather primitive state of development. It is recommended 
that detailed circuit design not be undertaken until further device development is 
completed. Present effort on the design of an adaptive voter should be restricted to 
monitoring the state of the art in device development, with detailed circuit design 
to begin when more suitable devices become available. 
2. Threshold and Dynamic Restorers 
The majority voting class of threshold restorers is the most commonly used 
class of restorers in present technology. Because the majority voter requires a majority of correct 
inputs to provide a correct output, its error-correcting capability is limited. Since many 
circuit failures result in steady-state outputs, restorers which detect only changes in input 
states offer the capability of deriving a correct output with less than a majority of working 
inputs. Restorers which detect changes in input states are referred to as dynamic restoring 
circuits. 
The mission of this part of the Failure Free Systems Study has been to evaluate 
the potential usefulness of one proposed dynamic restoring circuit implementation, the Transor. 
The results of section IV have shown that there are certain environments in which Transor 
can be used to advantage in improving system reliability. For example, the maximum error 
restoring capability of Transor is shown to be R-1 failures of R redundant lines in an 
environment free from transitional failures. This is a significant improvement over the majority 
threshold restoring capability under the same conditions. There is need for caution, however, 
for in environments where symmetrical transitional errors are possible, error correlation 
may make Transor performance inferior to threshold. 
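The size of this improvement can be seen by checking how many steady-state line failures a majority voter tolerates. The sketch below exhaustively tests an order-R majority voter against every stuck-at pattern on k failed lines; it models only the threshold voter, not the Transor itself, and the function names are hypothetical. The R-1 figure is the one quoted above for Transor.

```python
from itertools import product

def majority(bits):
    """Majority value of an odd number of binary signals."""
    return int(sum(bits) > len(bits) / 2)

def majority_survives(r, k):
    """True if an order-r majority voter is correct for both signal values
    and every stuck-at-0 / stuck-at-1 pattern on k failed lines."""
    for signal in (0, 1):
        for stuck in product((0, 1), repeat=k):
            lines = list(stuck) + [signal] * (r - k)
            if majority(lines) != signal:
                return False
    return True

def majority_tolerance(r):
    """Largest number of stuck lines the majority voter always masks."""
    k = 0
    while k + 1 <= r and majority_survives(r, k + 1):
        k += 1
    return k

for r in (3, 5, 7):
    # Second figure is the R-1 capability quoted in the text for Transor.
    print(r, majority_tolerance(r), r - 1)
```

For odd R the majority voter masks (R-1)/2 steady-state failures in the worst case, against the R-1 quoted for Transor in an environment free from transitional failures.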
During the course of the study of Transor Restoring Circuits, a new class of 
restoring circuits was conceived. This class, called "Hamming Distance Restoring Circuits" 
is similar to Transor in many ways. It was compared with Transor analytically and by 
simulation. From the results obtained by manipulating the analytical reliability expressions 
for the Transor and Hamming Distance Restoring Circuits, it may be concluded that the 
output of a Hamming Distance Circuit is more reliable than that of the Transor in order-five 
redundant systems. This conclusion holds for any ratio of steady-state to transient error 
probability or any asymmetry (tendency toward "ones" or "zeros") of error probabilities. 
From comparison of the simulation curves, it may be concluded that the threshold 
circuit is more reliable than either of the dynamic restoring circuits until the ratio of the 
probability of steady-state errors to the probability of transient errors exceeds approximately 
seven to one. Above this ratio, the dynamic restoring circuit outputs are more reliable. 
Further comparison reveals that the difference in the reliability curves tends to stabilize or 
slightly decrease as the ratio becomes much larger than 7:1. The stabilizing effect is more 
pronounced as the order of redundancy is increased from five to seven. 
Also, it may be concluded that in the early life, high reliability region with 
approximately a seven to one probability ratio, an order five system using Hamming Distance 
Restorers may be as reliable as an order seven system using threshold voters. 
Since the improvement available from Transor is limited, and since the Hamming 
Distance Restorer is normally superior, further study of the Transor is not justified. 
TASK 3 - SELF REPAIR TECHNIQUES 
Before self-repairing systems can be implemented, many feasible switching strategies 
must be considered in an effort to determine the most effective manner to manipulate the 
redundant or "spare" blocks. The extreme complexity of the reliability expressions associated 
with these strategies has resulted in the use of a computer simulation program for comparing 
the effectiveness of the strategies. The present program includes subroutines for three 
classes of switching strategies. Each class subroutine contains a great deal of flexibility, 
thereby including many individual strategies. This method facilitates easy comparison 
between members of a class. This comparison allows immediate elimination of many 
possible strategies which are obviously uneconomical. For example, the flattening out of the 
Percent of System Failed versus Spare Mobility curves indicates that none of the strategies 
on the flat part of the curves can be optimum strategies. 
From the results of the simulation program, curves for Percent of Systems Failed 
versus Spare Mobility have been plotted for the Gamma Class Strategies. These curves have 
been referenced to that of a multiple-line majority voted system because this particular 
technique has been the most effective of the passive, failure masking, circuit level redundancy 
techniques. In all cases these curves show not only that great gains can be realized over the 
multiple-line redundant configuration, but that by far the greatest part of these gains are 
realized for the first few moves allowed to the spare function blocks. Beyond the range of 
relatively limited mobility, little or no gain in the average number of failures absorbed is 
realized by the additional mobility allowed to the spares. This is an encouraging result 
since the great majority of the gain due to self-repair can be retained without the use of an 
exorbitant amount of switching circuitry. 
All of the computer simulation results have been based on the assumption that the 
switching circuitry was perfectly reliable. There is a need to determine the range of allowable 
failure rates which can be associated with each strategy for it to be of maximum effectiveness. 
These ranges should be studied as a function of the failure rates of the associated signal 
processor blocks. As a result, information specifying the optimum switching strategy 
corresponding to a given signal processor failure rate should be available before actual system 
designs are begun. 
It has become obvious that many of the spare function blocks do not experience as many 
switching operations as they are capable of performing. When all spares are assigned mobility, 
those which use their mobility extend the life of the system substantially. However, in many 
cases when system failure has occurred, there are many spares remaining which have not 
been used to any great extent. In order to try to capitalize on this phenomenon, a class of 
strategies should be investigated which would assign different mobilities to the spares in a 
stage. 
The curves show a very definite gain in reliability for the self-repair strategies over 
multiple-line redundant systems. The curves for the Beta Class strategies show an increase 
in reliability for each increase in "repair" capability. Strategy Beta-3 yields the highest 
reliability but even strategy Beta-1 shows a significant gain over the multiple-line system. 
The reliability curves for the Gamma Class show essentially the same result with respect to 
the multiple-line case. However, investigation of the curves shows that increasing the "repair" 
capability produces gains for the first few increases, after which the magnitude of the gain 
diminishes. These curves tend to bear out the conclusions drawn from the Percent of Systems 
Failed versus Spare Mobility curves, which flattened out after a certain mobility was reached. 
The gains illustrated here must be considered as ideal because the switching circuitry for 
self-repair is here assumed to be perfectly reliable. More realistically, the gains obtainable 
will be a function of the switching circuit complexity and will not be as great as shown here. 
Although little has been said about the physical switching techniques to be employed, it 
has been tacitly assumed that the failure detection and replacement circuitry would be 
combined as much as possible. It has been suggested that these two phases of the repair 
function might profitably be separated and made almost completely independent from a 
circuit viewpoint. This is another area which should be given careful attention. 
None of the strategies considered so far have permitted spares to return to previous 
locations. It is possible that removal of this restriction might add to the failure absorption 
capability of a system. This area certainly should be explored further. 
The Alpha class strategies have not been thoroughly investigated to determine the 
optimum degree of spare overlap (i. e., two sets of spares serving some of the same 
functional region). The information from this investigation should influence the design of new 
strategy classes as well as indicating the optimum strategy for the Alpha class. 
In general, investigations to date have shown that self-repair techniques can be much 
more powerful than presently available redundancy techniques. Further studies are expected 
to show effective ways to apply the techniques to real equipment needs. 
Appendix 1 
DESIGN AND TESTING OF REDUNDANT SYSTEMS 
by 
H. Brinker 
A. R. Helland 
September 1963 
ABSTRACT 
This report describes the results of the study on the implementation 
of majority logic redundancy. Most of the work concerns 
spaceborne systems, but some portions are more applicable to ground 
support equipment. The report is concerned with the initial design 
of the system as well as the testing of redundant systems. 
The possible use of magnetic logic to reduce the total power 
consumption and provide non-volatile storage is discussed. Magnetics 
seems to be most useful for non-volatile memory and simple forms of 
logic where the data rate is very low. Various types of semiconductor 
logic are described and compared for use in redundant systems. Integrated 
Diode-Transistor Logic elements are chosen as the most suitable 
for general use. 
Several methods of testing redundant systems are discussed and 
described in the section on detection and location of failures. Various 
solutions to the failure detection problem are discussed in this section. 
Some are more suitable for simple failure detection; others also provide 
information concerning the location of any failures. It is shown that 
maintenance of a redundant system greatly increases system reliability 
and reduces the test equipment and operator skill which are usually 
required to maintain a conventional system. Techniques are described 
which permit a major portion of the maintenance to be performed during 
normal system operation. 
TABLE OF CONTENTS 
I. INTRODUCTION 
II. MAGNETIC LOGIC 
A. Introduction 
B. Dynamic Storage and Sequential Logic 
C. Hybrid Devices 
D. All-Magnetic Logic 
E. Summary and Conclusions 
III. SEMICONDUCTOR LOGIC 
A. Introduction 
B. Classification of Basic Types of Logic 
C. Comparison of Logic Types 
D. Description of Logic Types 
E. Logic Selection 
F. Majority Voter Design 
IV. FAILURE TESTING OF REDUNDANT SYSTEMS 
A. Introduction 
B. Singular Rank Testing 
C. Interwoven Rank Testing 
D. Circuit Implementations 
V. SUMMARY & CONCLUSIONS 
LIST OF FIGURES 
Figure  Title 
1  OR Gate 
2  Negation 
3  Block Diagram, AND Function 
4  SRI MAD Shift Register 
5  Device 2 Flux States 
6  Device 2 Shift Register 
7  R-TL Resistor-Transistor Logic (+NOR) 
8  DC-TL Direct Coupled-Transistor Logic (+NOR) 
9  R-DC-TL Resistor-Direct Coupled-Transistor Logic 
10  NS-DC-TL Non-Saturated-Direct Coupled-Transistor Logic 
11  D-TL Diode-Transistor Logic (+NAND) 
12  NS-D-TL Non-Saturated-Diode-Transistor Logic 
13  T-TL Transistor-Transistor Logic 
14  Speed-Power Performance 
15  Majority Element with Input Isolation 
16  Reliability of Conventional vs. Redundant Systems 
17  Singular Rank Testing 
18  Interwoven Rank Testing 
19  Interwoven Rank Testing 
20  Signal Processor Output Control 
21  Difference Detector 
I. Introduction 
Past studies of redundancy techniques and consideration of the 
basic characteristics of some redundancy techniques have yielded interesting 
insights and problems. Many of these considerations are in 
the area of engineering method. Others concern the design of redundant 
systems with high reliability and other desirable characteristics. This 
section is intended to review some of these considerations and to preview 
some of the thoughts behind the discussion in later sections. 
The report itself deals primarily with some of the problems which 
are encountered in designing and testing useful redundant digital systems. 
Some of these problems are at least comparable to non-redundant design; 
others are rather unique to redundant systems. Possible solutions for 
these problems, as well as more detailed problem descriptions, are contained 
in appropriate sections of the report. 
Circuit and system design must reflect the fact that redundancy 
is only a tool to realize reliability. The proper use of redundancy is 
often a more efficient and powerful technique to realize a reliability 
requirement than are the more conventional techniques such as conservative 
design or component selection. Redundancy is, however, most powerful when 
used in conjunction with techniques that increase basic reliability. 
It is important to recognize that a redundant system is expected 
to operate with relatively large numbers of random failures. Since conventional 
systems usually fail when any of their parts fail, it is relatively 
unimportant what effects these failures have, except when repair is desired. 
Circuits for redundant systems, however, must be designed so that the 
effects of individual component failures are minimized, and usually limited 
to the circuits in which the failure occurs. This does not imply, however, 
that redundancy includes "useless" parts. Each part of the system must 
contribute to the assurance that the system will perform all of its functions 
properly. 
The use of redundancy will alter the characteristics and performance 
of the system. Redundancy will usually increase design complexity, power 
requirements and dissipation, signal propagation time, size and weight, 
number of interconnections, and initial cost. Redundancy, therefore, 
emphasizes the need for continuing development of low-power circuitry, 
micro-miniaturization, and interconnection techniques. The type of circuitry 
which is used to implement a redundant system must be carefully chosen to 
meet the system requirements without incurring excessive costs. Whenever 
there is a need for high reliability, the circuitry should be chosen to 
have a high basic reliability, low sensitivity to parameter variations, and 
low power dissipation to minimize temperature stress. In addition, specific 
systems have special requirements which must be considered in the system 
design as well as the choice and design of the circuitry. For example, the 
total available power is often severely limited for spaceborne equipment, 
although the processing rate is usually quite low. It is usually desirable 
to provide some means of testing to verify that all parts of the redundant 
system are working to insure that all of the reliability initially designed 
into the system is available for the duration of the mission. The system 
and the circuitry therefore must be designed so that accurate and meaningful 
tests may be applied to verify that the parts are working. When 
extended lifetime is desired and repair is possible, a redundant system may 
be systematically repaired to greatly increase the expected time between 
system failures. If a system is completely repaired prior to each mission 
in which it is used, it will exhibit the high mission reliability characteristic 
for each mission. Such systems must be designed so that complete, 
efficient tests may be periodically applied to these systems which will 
verify that all the parts are working properly, or that will facilitate 
maintenance procedures which will return the system to the initially perfect 
condition. It is important for this type of maintenance that all failures 
be detectable, otherwise these undetectable failures will tend to accumulate. 
These accumulated failures will eventually tend to dominate the system 
behavior by causing additional system failures. 
Many failures may be detected as they occur in a redundant system. 
These may be repaired while the system is in operation to obtain a very low 
system failure rate compared to the failure rate for the parts of the 
system. Periodic maintenance must be performed in addition to the continuous 
monitor and repair described above to detect those failures which cannot be 
detected during regular operation of the system. 
Systems which will be maintained must therefore be designed both 
with the capability for detecting all failures and facilitating the maintenance 
and repair procedures. With proper design, many of these failure 
detection, maintenance and repair procedures may be accomplished during 
operation of the system. 
The following sections of this report will discuss the problems 
associated with circuit design, choice of the type of circuitry, failure 
detection, and maintenance of redundant systems. This report describes the 
results of the study of these problems and possible solutions. The results 
are summarized in the Summary and Conclusions section of this report. 
II. Magnetic Logic 
A. Introduction 
The past decade has witnessed the development of a variety of magnetic 
devices suitable for performing storage and logic in digital computers. 
Perhaps the most important application of magnetics to digital 
technology has been provided by the development of large capacity, random 
access memory systems composed of ferrite cores. Advances in techniques 
for performing logic have received some attention, but to date magnetic 
logic does not appear to be widely accepted as a superior replacement for 
the conventional transistorized counterpart. This general reluctance to 
utilize the special attributes of magnetic logic is often justified by 
several difficulties inherent to the device characteristics and system 
configuration. 
Much of the magnetic logic research has been motivated by the 
potential ability of magnetic devices to provide higher reliability at 
lower cost while consuming negligible standby power. These attributes are 
understandably important in any large electronic system, especially in space 
applications where reliability must be high and available power is invariably 
low. To evaluate the potential ability of magnetic logic schemes to 
provide these advantages, a discussion of some of the more promising approaches 
appears to be in order. An all-inclusive survey and treatment of the 
myriad of suggested approaches could easily fill a book.* It appeared 
reasonable therefore to restrict the detailed discussion to the more popular 
approaches and to provide references for others. Of particular interest 
are those devices which utilize magnetic components which are either 
commercially available or in an advanced state of development. 
* Meyerhoff, A. J., ed., Digital Applications of Magnetic Devices, 
New York: John Wiley and Sons, Inc., 1960. 
B. Dynamic Storage and Sequential Logic 
The state of a magnetic device is determined by the direction of 
remanent flux. Information stored is not directly accessible and a clock 
or read pulse must be used to determine the state. The read process in 
most schemes also destroys the information which was stored. An output 
signal is available only for that portion of the read cycle during which 
dynamic flux change is in progress, and thus level output and asynchronous 
operation are not obtainable. The ripple-carry binary counter, the parallel 
adder, and many familiar digital configurations are not directly amenable to 
magnetic implementation. In contrast, the powerful combinational logic 
approach utilized in conventional computers consists of a cascade of compatible 
logic modules which form complex functions simultaneously during 
the interim between clock pulses. In a magnetic logic machine using 
dynamic logic this is not possible, and operations involving OR, AND, 
transfer, buffering, negation and delay require several clock periods to 
generate a particular function. This step by step process usually consumes 
considerable time which may be further extended if the magnetic logic 
modules are limited in fan-in and fan-out and thus require additional operations. 
C. Hybrid Devices 
The principle involved in using square loop material to store a 
remanent flux has been known for some time. With the development of small 
toroidal structures employing sintered ceramic ferrites and ferromagnetic 
tape materials, magnetic devices began to demonstrate practical utility. 
The magnetic shift register has received the most attention, primarily because 
of its general utility and simple configuration, and has been the 
subject of much of the magnetic literature. Although the shift register plays 
an important part in most digital systems, several additional devices are 
required in order to provide the variety of logical operations required by typical 
computer systems. 
The task of performing general logic requires circuitry capable of 
being arranged to perform any Boolean output function of a set of input 
variables. In order to provide this operation a complex function is usually 
formed by using logic modules to perform OR, AND, negation, storage, delay, 
etc. If gates are to be connected in various configurations the devices 
used must provide a clearly identifiable "1" and "0" state, unilateral 
information transfer and the capability for fan-in and fan-out. To meet 
these requirements with magnetic devices has not been an easy task. 
A major difficulty which impeded rapid development of devices to 
meet these requirements has been the inherent bilateral nature of simple 
magnetic structures. In the early devices this was largely overcome by 
combining diodes with simple toroids to achieve unilateral information 
flow. Obvious limitations in impedance levels, fan-in and fan-out 
drive capabilities necessitated in many cases the further inclusion of 
resistors for tailoring impedance levels, capacitors for temporary storage 
and transistors for power gain. Although this hybrid logic approach led 
to the development of a number of clever magnetic devices, the potential 
of achieving high reliability at low cost is seriously challenged by the 
requirement for using non-magnetic components and the more complex wiring 
and system organization which becomes necessary. An excellent survey 
of a wide variety of hybrid devices has been provided by Haynes (ref. 1). One 
such approach, parallel transfer core-diode logic, will be used as a vehicle 
for describing the principles of dynamic logic and to indicate the operation 
of a typical practical device. 
Shown in figure 1 is the OR gate, the simplest of logical functions 
which may be implemented with magnetic cores and diodes. The ○ and □ 
notations denote cores of the same rank, i.e. threaded by a series con-
nected, current driven clock line. The two phase clock system effects 
readout and transfer of data by driving the core to the "0" state. If 
a core was previously in the "1" state the clock, in driving the core to 
the "0" state, causes the core to switch and provides an output sufficient 
to drive the next core to the "1" state. If a core was previously in the 
"0" state a negligibly small output occurs when the clock drive is applied. 
Diodes are shown to prevent output loading when a core is being set. 
Additional components such as resistors for tailoring impedance levels and 
diodes to prevent reverse data transfer may be required in a practical design. 
It should be noted also that the core output windings must contain more 
turns than core inputs in order to allow a transmitting core to set a 
receiving core, which also tends to prevent reverse data transfer. 
Figure 1 OR Gate (schematic: inputs X and Y, output X + Y, clock lines CLOCK A and CLOCK B) 
Operation is initiated by reading inputs X and Y into the input 
cores. The phase A clock then transmits the state of each of the input 
cores into a dual winding storage core. If the storage core was set by 
any of the transmitting input cores, a readout signal is generated when the 
storage core is reset by the phase B clock. 
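The two-phase operation just described can be traced with a small behavioral model. This is a minimal sketch that treats each core as a single bit and ignores winding ratios, diodes, and drive tolerances; the function name is hypothetical.

```python
def or_gate_cycle(x: int, y: int) -> int:
    """One two-phase cycle of the core-diode OR gate; returns the output bit."""
    input_cores = [x, y]  # inputs X and Y have been read into the input cores
    storage_core = 0      # dual-winding storage core starts in the "0" state

    # Phase A: the clock drives every input core to "0". A core that held
    # "1" switches, and its output pulse sets the storage core.
    for i, state in enumerate(input_cores):
        if state == 1:
            storage_core = 1
        input_cores[i] = 0

    # Phase B: the clock resets the storage core. A readout pulse appears
    # only if the storage core had been set during phase A.
    output = 1 if storage_core == 1 else 0
    storage_core = 0
    return output
```

Calling the function for the four input combinations reproduces the OR truth table, with the result available only after both clock phases have elapsed.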
The AND function is not as easily implemented unless a coincident 
current threshold technique is employed to set the storage core. This 
technique does not appear to be sufficiently reliable, however, due to the 
associated threshold and drive tolerances normally encountered in a typical 
system. A more conventional system employs the principle of logical 
negation in combination with the OR gate to provide the AND function. 
For example, consider the negation arrangement of figure 2. 
Figure 2 Negation (schematic: dummy core ("1" generator), X input core, inhibit core, CLOCK A line) 
The upper core is used as a "1" generator which, in the absence of an input 
from the X core, causes the inhibit core to be set by the phase A clock. 
The phase B clock will then generate an output whenever the X signal is 
absent, and this output thus represents the negation of the input. When both the "1" 
generator and X input signal appear simultaneously at the inhibit windings 
they effectively cancel each other and the inhibit core remains in the "0" 
state. The phase B clock in driving the inhibit core to the "0" state 
will not generate an output signal for this case. 
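The inhibit arrangement can be modeled in the same behavioral style. This is a sketch under the assumption that the "1" generator and X drives are equal and opposite at the inhibit windings, so that they cancel exactly; the function name is hypothetical.

```python
def negation_cycle(x: int) -> int:
    """One two-phase cycle of the inhibit-core negation; returns NOT x."""
    one_generator = 1  # the dummy core supplies a "1" on every cycle

    # Phase A: the dummy core and the X core are both read out into the
    # inhibit windings. Equal and opposite drives cancel (assumed ideal),
    # so the inhibit core is set only when the X signal is absent.
    net_drive = one_generator - x
    inhibit_core = 1 if net_drive > 0 else 0

    # Phase B: the clock drives the inhibit core to "0"; an output pulse
    # appears only if the core had been set.
    output = inhibit_core
    inhibit_core = 0
    return output
```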
The principle by which the AND function may be performed is based 
on the well known logic relation $\overline{\overline{X} + \overline{Y}} = XY$. A block diagram of a typical 
AND gate scheme is shown in figure 3. 
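Figure 3 Block Diagram, AND Function (X and Y each pass through a NEG. block; the negated signals are combined in an OR block; a final NEG. block yields $\overline{\overline{X} + \overline{Y}} = X \cdot Y$) 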
Since each of the logic modules requires two clock periods and each operation 
is performed in sequence, the output signal is seen to appear six clock 
periods after the inputs were applied. If the resultant output of the 
AND function is to be further combined with other AND-OR operations it 
becomes evident that the total number of clock periods required may become 
prohibitive. 
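The six-period figure follows from chaining three two-period modules. The sketch below composes negation, OR, and negation while counting clock periods; it is a bookkeeping illustration, not a circuit model, and the helper names are hypothetical.

```python
from typing import Tuple

CLOCKS_PER_MODULE = 2  # each logic module needs a phase A and a phase B period


def neg(x: int) -> int:
    """Logical negation of a single bit."""
    return 1 - x


def or_(a: int, b: int) -> int:
    """Logical OR of two bits."""
    return a | b


def and_via_negation(x: int, y: int) -> Tuple[int, int]:
    """Return (X AND Y, clock periods used), built as NOT(NOT X OR NOT Y)."""
    clocks = 0
    not_x, not_y = neg(x), neg(y)  # the two input negations run side by side
    clocks += CLOCKS_PER_MODULE
    combined = or_(not_x, not_y)   # OR of the negated inputs
    clocks += CLOCKS_PER_MODULE
    result = neg(combined)         # final negation yields the AND
    clocks += CLOCKS_PER_MODULE
    return result, clocks
```

Every input combination returns the AND of the inputs together with a count of six clock periods, which is the delay stated above.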
In view of the system complexity and speed limitations suggested by 
the simple example described, magnetic logic is seen to introduce problems 
of system organization which are alien to conventional DC level logic. 
As far as cost and reliability are concerned, the prospect of winding cores 
with several turns and the large number of cores and connections required 
do not appear to provide a significant cost advantage. In the hybrid 
approach the use of additional components such as diodes and resistors 
appears to seriously negate the basic reliability inherent to the magnetic 
material. These difficulties notwithstanding, several companies are 
active in the manufacture of magnetic logic modules. The major emphasis 
has been placed on the usefulness of the magnetic shift register to provide 
cost, size and power advantages over the conventional approach. Magnetic 
shift registers employing the hybrid approach have been successfully applied 
to a wide range of airborne equipment. Sequential programmers, counters 
and timers operating at low clock rates represent the majority of applications. 
When operating at shifting rates higher than 10 kc, however, the 
advantage that the magnetic shift register has in consuming negligible 
standby power is obscured by a power requirement which is often greater than 
that of the solid state counterpart. A leading supplier of hybrid magnetic logic 
modules and shift registers is currently marketing a 10 bit shift register 
which requires a maximum average power of 0.4 watts to operate at 10 kc 
and 3.7 watts at 750 kc. Since it appears reasonable to assume that these 
power requirements are reflected also to general logic systems, the application 
of hybrid magnetic logic to power-limited environments is limited 
to systems whose shift rate is very low. 
D. All-Magnetic Logic 
The obvious limitations of the hybrid approaches in reliability and 
cost have to some extent motivated an effort to develop systems using only 
magnetic material and connecting wire. Several novel approaches were 
developed which made use of magnetic device geometry to achieve coupling 
isolation, flux gain and unilateral information flow. One of 
these devices is the Multi-Aperture Device (MAD) (refs. 2, 3), a three 
aperture ferrite structure similar to the Transfluxor (ref. 4). Input-output 
isolation is possible because the flux stored around the minor output aperture 
may be sensed non-destructively without affecting stored flux about 
the input aperture. 
Shown in figure 4 is a typical MAD shift register developed at 
Stanford Research Institute. 
Figure 4 S.R.I. MAD Shift Register (schematic: O and E elements with ADV. O→E, CLEAR O, ADV. E→O, and CLEAR E lines) 
An advance current is applied to the parallel connection of output and 
input aperture windings in order to effect information transfer from the 
transmitting core to the receiving core. In accordance with the state of 
the flux stored around the transmitting aperture and the resultant magnetic 
threshold thereby established, the advance current will divide between the 
input and output windings. If the transmitting aperture is in the "0" 
or cleared state the advance current will divide equally, thus not exceeding 
the magnetic threshold of either aperture. If a "1" were stored, the output 
aperture with its lower threshold is swamped by the advance current and the 
transmitter switches flux locally about its output aperture with low values 
of current. By voltage or impedance steering the majority of advance current 
will flow through the receiver input aperture, causing it to exceed its 
setting threshold and be set. In time, as the flux switching is completed, 
both currents will return to their nominally equal values. 
Since the read-out and transfer process is nondestructive to the 
state of the core, a clear line threading the major aperture is required 
to return the core to the reset condition. In order to provide information 
flow from left to right a basic four clock cycle is required with the 
following sequence: ..., ADV. O→E, CL. O, ADV. E→O, CL. E, ... The 
ADV. O→E pulse switches flux locally about the output aperture of the O 
element and causes the E element to be set. The CL. O pulse then clears 
the O element and in so doing switches flux through the output winding. 
This results in a loop current flow that negatively sets the E element 
receiver without affecting the flux state about the output aperture of the 
E element. Note that neither the ADV. O→E nor CL. O pulse causes any 
flux to be switched in the output leg of the E element thus eliminating 
the need for a diode to prevent backward data transfer. In this manner 
unilateral data transfer is possible using only MAD devices and conducting 
wire. 
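The four clock cycle can be followed with a simple bit-level model of a chain of alternating O and E elements. This is a minimal sketch that represents each element by one stored bit and leaves out the flux and threshold behavior, including the negative setting of the receiver during the clear pulse; the function name is hypothetical.

```python
from typing import List


def mad_shift_one_bit_time(cores: List[int]) -> List[int]:
    """Apply ADV O->E, CL O, ADV E->O, CL E to a chain O, E, O, E, ...

    Index 0, 2, 4, ... are O elements; index 1, 3, 5, ... are E elements.
    """
    cores = list(cores)
    n = len(cores)
    for start in (0, 1):  # 0: O elements transmit; 1: E elements transmit
        # Advance: nondestructive transfer from each transmitter to the
        # element on its right.
        for i in range(start, n - 1, 2):
            if cores[i] == 1:
                cores[i + 1] = 1
        # Clear: return every transmitter to the reset condition.
        for i in range(start, n, 2):
            cores[i] = 0
    return cores
```

Starting from [1, 0, 0, 0], one full cycle gives [0, 0, 1, 0]: the bit has moved from the first O element to the next O element, two cores per bit.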
Thus far the discussion has been devoted to techniques for achieving 
unilateral data transfer with the S.R.I.-MAD approach. The problem of 
achieving reasonable flux gain and fan-out is one which could not be solved 
in a practical sense with the simple transfer scheme previously discussed. 
H. D. Crane has done much of the work in arousing interest in the all-magnetic 
MAD approach. In a paper (ref. 5) describing the design of a moderate 
sized computing system using S.R.I.-MAD devices, however, the basic transfer 
gate had to be seriously modified in order to operate in the system. 
Problems inherent to the flux threshold relationship between receiving 
and transmitting apertures, flux gain, fan-out, as well as flux decay and 
build-up in circulating loops made such modifications necessary. As a 
consequence the revised gate module required flux doubling and clipping 
operations in addition to the previously described clear and advance cycles. 
The complexity involved in the resultant device implementation appears to 
be a serious encumbrance. The system chosen to demonstrate the ability 
of all-magnetic devices took the form of a decimal arithmetic unit with 
the ability of performing addition, subtraction, and multiplication. The 
system was made exclusively of modules which perform either the two input 
OR function or the two input OR with negation (NOR). 
Rather than describe the complex details of the S.R.I.-MAD logic 
gates it appears more reasonable to present an alternate 
approach to the design of MAD devices developed by Company I. In this 
approach a priming operation is performed to reverse the flux stored about 
the transmitting aperture prior to readout. The readout process in this 
case is destructive and resets the core. The priming operation provides 
an adequate flux level which, when reversed by the clear or transfer 
operation, delivers an output pulse to set the next core through its 
major aperture. Since data flow is from minor aperture to major aperture 
and since the state of a core is not disturbed by reverse currents 
flowing through a minor aperture, the possibility of reverse data flow 
is prevented. 
The flux conditions present for the various states of a typical 
MAD element of this type (referred to as Device 2) are shown in figure 5. 
Figure 5 Device 2 Flux States (panels: a) reset or cleared state; b) set state; c) set core after priming; d) reset core after priming) 
In the cleared state (figure 5a) the core is saturated in the clockwise 
direction by a previously generated advance current which threads the major 
aperture. Upon application of an input signal threading the inner portion 
of the major aperture, the flux nearest the major aperture is reversed, thus 
providing the set condition shown in figure 5b. This read-in operation does 
not affect the flux linking the output aperture and thus a diode is not 
required to block data transfer to receiving cores. In order to obtain 
an output from a properly set core it is necessary to provide a prime 
current as shown in figure 5c to reverse the flux stored about the output 
aperture. Priming current is of a lower magnitude than the advance current 
and because of its slow rate of change is not sufficient to cause the core 
linked by the output winding to be disturbed. Once a core has been set and 
primed, the application of an advance current causes a flux reversal about 
the output aperture. This, in turn, provides an induced voltage of sufficient 
magnitude to drive the next core to the set condition. If the core 
was initially in the reset condition it will remain in this condition after 
priming (figure 5d). For this case, the application of the advance current 
does not provide a flux reversal and thus no output occurs. 
Device 2 elements may be connected in a variety of shift register configurations 
including parallel input-parallel output, parallel input-serial 
output, serial input-serial output, etc. Such shift registers take the form 
of 2 core-per-bit arrays and require a two clock system in combination with 
a priming source. A typical serial input-serial output shift register 
section is shown in figure 6. 
Figure 6 Device 2 Shift Register (schematic: INPUT, PRIME, ADV O→E, and ADV E→O lines) 
The propagation of a "1" from left to right proceeds by activating clock 
and prime signals in the following sequence: ... PRIME, ADV O→E, PRIME, 
ADV E→O, PRIME, ADV O→E, ... AMP-MAD shift registers require relatively 
high values of pulse current for performing advance, prime and set operations. 
Nominal operating level for the advance current is 2 to 3 amperes 
in a typical design. Prime and set pulse currents are lower, being 100 ma 
and 250 ma respectively. Because of the requirement for slow priming and 
in order to keep average power dissipation at reasonable levels, these 
shift registers are limited to repetition rates of 10 Kc. A typical driver, 
which utilizes a capacitive storage-discharge scheme and dual Shockley 
diodes for triggering the advance currents, requires an average power of 
5.3 watts to drive a 10 bit shift register at 10 Kc. A 10 bit shift register 
with its associated driver requires a package occupying approximately 9 
cubic inches. 
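The prime and advance sequence can be illustrated with a three-state model of each core. This is a minimal sketch, assuming that priming affects only cores in the set state and that an advance pulse resets every core it drives; the state names and function names are hypothetical.

```python
from typing import List

RESET, SET, PRIMED = "reset", "set", "primed"


def prime(cores: List[str]) -> List[str]:
    """Priming reverses the output-aperture flux of set cores only."""
    return [PRIMED if c == SET else c for c in cores]


def advance(cores: List[str], start: int) -> List[str]:
    """Drive the transmitters at indices start, start + 2, ... (destructive readout)."""
    cores = list(cores)
    for i in range(start, len(cores), 2):
        # Only a set and primed core produces an output pulse, which sets
        # the next core; a reset core produces no output.
        if cores[i] == PRIMED and i + 1 < len(cores):
            cores[i + 1] = SET
        cores[i] = RESET  # the advance current clears the transmitter
    return cores


def device2_shift_one_bit_time(cores: List[str]) -> List[str]:
    """Apply PRIME, ADV O->E, PRIME, ADV E->O to a chain O, E, O, E, ..."""
    cores = advance(prime(cores), 0)  # PRIME, then ADV O->E
    cores = advance(prime(cores), 1)  # PRIME, then ADV E->O
    return cores
```

In contrast to the S.R.I. scheme, no separate clear pulse appears in the sequence because the readout itself resets the transmitting core.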
The implementation of general logic operations using MAD devices is
not easily accomplished, due to the difficulty of achieving logical inversion
and reasonable fan-out without an imposing complexity. The treatment of
much of the general logic capabilities of MAD devices is reported in rather
implicit terms by the current literature. The OR function may be provided
relatively simply by threading additional windings about the input aperture
if care is taken in preventing reverse information transfer. The negation
operation may be achieved by extending the current inhibiting and "one"
generator technique described in the hybrid approach to the MAD topology.
Perhaps the most difficult problem which faces the all-magnetic logic
designer is that of providing fan-out. This arises from the fact that all
the power which is used to provide inputs to receiving cores comes from
the clock source. Power gain in the ordinary sense is not available except
in those hybrid schemes which use transistors to provide regeneration.
A MAD device with a reliable fan-out of two is sufficient, however, to
allow the performance of general logical operations requiring much greater
fan-out. This may be accomplished by utilizing additional clock pulses to
sequentially transfer data in a "tree" wiring arrangement until the
original single core data is available simultaneously in several cores. As
far as fan-out is concerned, it appears that the hybrid approach using
transistors provides an important advantage over the all-magnetic
techniques which necessarily require considerable device and system complexity
to achieve the same result.
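The cost of the "tree" arrangement can be estimated with a short Python sketch. It assumes a fully populated tree in which every core drives the same number of receiving cores and each tree level costs one additional transfer step; the function name and this counting convention are illustrative assumptions, not figures from the report.

```python
def fanout_tree(required_fanout, per_core_fanout=2):
    """Return (levels, cores) for a fully populated fan-out tree.

    levels: number of sequential transfer steps below the source core.
    cores:  number of cores below the source core, final receivers included.
    Hypothetical helper; a partially populated tree would need fewer cores.
    """
    levels = 0
    available = 1   # copies of the data available at the current level
    cores = 0
    while available < required_fanout:
        available *= per_core_fanout
        cores += available
        levels += 1
    return levels, cores


print(fanout_tree(8))  # prints (3, 14)
```

Under these assumptions a fan-out of eight built from fan-out-of-two devices takes three extra transfer steps and fourteen cores, which illustrates the device and system complexity referred to above.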
E. Summary and Conclusions
The foregoing description of magnetic logic has not attempted to 
describe the variety of possible approaches. The techniques for accomp-
lishing general logical operations have been implicit, reflecting the treat-
ment of the current literature. Examples from two general classes of 
magnetic devices have been described to provide a basic understanding of 
the techniques involved. If the approaches described may be regarded as 
typical, then some conclusions about their utility may reasonably be expected
to apply in a general sense. 
Information regarding transfer and shifting operations is covered in
considerable detail by current literature, but the treatment of general 
magnetic logic schemes has been seriously neglected. This suggests the 
degree of difficulty which has been encountered in the design of practical 
devices. Complex clock programming and device configurations are necessary to 
achieve operations which conventional designers have come to consider as 
trivial. In general, magnetic devices do not display a natural ability
for performing logic. The primary attribute of magnetic devices is that
of non-volatile storage, the ability of a core to remain in a particular 
state indefinitely without further application of energy. This feature is 
an important consideration in power limited environments such as space 
vehicles where the standby power between clock pulses may be made to approach 
negligible values. If the clock processing rate exceeds approximately 10 Kc 
however, the average power required often exceeds that of a conventional 
transistorized counterpart. This limits the application of magnetic shift 
registers, timers, etc. to equipment with low clock rates. 
Recent advances in low power microminiaturized devices are seriously 
challenging the magnetic attribute of zero standby power while providing 
higher speed, smaller size and the greater utility of combinational DC 
logic. NASA's Lewis Research Center is sponsoring much of the work in this 
important area. Operating speeds of several newly developed circuits are 
approaching 100 Kc at power levels in the microwatt range. A complete 
logic system with a power consumption of 10 microwatts per stage is
anticipated for space application using micropower logic circuits. With the
basic reliability of microminiaturized devices constantly improving by 
virtue of an industry-wide effort, the role of magnetic logic appears to 
be fading. 
Another advantage claimed for magnetic devices is the reliability in-
herent in the use of magnetic material and connecting wire. It is assumed 
here that magnetic parameters affected by temperature have been compensated 
for by proper design and that clock current amplitude and rise time are 
within the limits of proper operation. Under these conditions the basic 
mechanism of magnetic storage and switching appears devoid of any known 
failure mode. This reliability is however obscured by the large number 
of connections required by the device configuration and the complexity
inherent to the system organization. The reliability of a magnetic system 
depends upon the connective paths and the clock pulse drivers. 
Simplicity and low cost are often claimed as virtues of magnetic
devices because of the simplicity and cost of the basic cores utilized.
It should be noted however that the task of providing several turns about
the various apertures and connecting cores in a configuration to perform
the basic logical operations of AND, OR and negation is not generally
amenable to automated assembly. The extensive amount of hand wiring and
soldering appears to represent an item of considerable cost.
The physical size of magnetic devices is generally one or two
orders of magnitude larger than their microminiaturized counterparts. 
Advances in thin film magnetic logic hold some promise for a significant 
size reduction, but developments in this area have not been extensively 
reported to date. 
The flexibility of magnetic devices is seen to be severely limited
by the dynamic logic approach and the difficulty of achieving reliable
fan-out in the absence of active devices. The flexibility of conventional
DC logic systems is evidently superior because of the power gain and the
inherent signal level standardization.
After considering the attributes of magnetic devices for performing
general logic, the popular core techniques do not appear to provide an
evident superiority in power consumption, reliability, simplicity, cost,
size and flexibility over the conventional solid state circuit approach.
Indeed, the requirements of performing the logical operations characteristic
of digital computers appear to be at variance with the capabilities of
magnetic logic. The applications which are best suited to magnetic
implementation are those in which the operations to be performed are not clearly
separated into "logic" and "memory". A strong case can be made for
magnetic circuits applied to the performance of integrated storage and transfer
operations required by a variety of digital processing functions. Most
appropriate are the low speed operations inherent in input-output,
interface and peripheral equipment. Typical applications include shift registers,
programmers, timers, sequencers, etc. where the magnetic modules perform
entire functions rather than discrete operations of storage and logic.
In these special applications where speed is low, the advantages in
simplicity, reliability, cost and power to be gained through the use of magnetic
circuits should not be neglected. In general applications, however, the
presently developed magnetic circuits do not appear satisfactory due to the
several problems inherent in their use.
III. Semiconductor Logic
A. Introduction 
In contrast with the numerous disadvantages and the general un-
availability of magnetic logic devices, conventional semiconductor logic 
has been used widely. Logic modules are commercially available for con-
struction of general logic systems. Integrated semiconductor circuits 
offer an order of magnitude reduction in size compared to magnetic logic 
modules; they do not require high voltage or high peak power pulses.
They operate at frequencies many times greater than comparable magnetic 
logic requiring the same average power, and provide the convenience of 
steady voltage outputs. 
Integrated semiconductor circuits offer a significant size and 
power reduction compared to discrete component semiconductor circuits. 
The rapid acceptance of integrated and semiconductor logic elements attests 
to the advantages of their use. Therefore, integrated circuits have been 
chosen as more suitable for spaceborne digital applications than the
discrete component circuitry. The circuit design problem is then translated
to the problem of the choice of suitable types of circuitry and logic. 
A variety of such elements is available with predictable characteristics 
for a wide range of operating environments. The selection by the Air 
Force of integrated circuitry for use in the improved Minuteman is a 
significant factor in the availability of reliable integrated circuits and 
appropriate reliability data. There is also a large amount of government
and industry effort devoted to research and development of new and improved
integrated circuits.
The low weight and power consumption of integrated circuits offer
an important compensation for the increase in the number of circuits required
for redundant design of spaceborne equipment. It is expected that advances
in integrated circuit technology will allow more complex circuits to be
included within a single package to further decrease size and weight.
Integrated circuits also offer significantly improved reliability performance;
it is expected that the reliability of a single chip containing an entire
function can be shown to approach that of a single discrete transistor.
The low power consumption characteristic also tends to increase reliability
by reducing temperature stress. The significant reduction in the number of
interconnections is also an important factor in reliability improvement.
Most integrated logic modules are available in the form of a universal
gate function (NAND or NOR). These logic elements are quite appropriate
for the construction of the restoring function required for a multiple line
majority voted redundant system. Several types of logic available for the
universal gate function have been studied. Each basic type is described
below; those commonly available are compared for suitability for use in
spaceborne redundant systems. One of these is chosen as particularly
suitable.
B. Classification of Basic Types of Logic 
It appears that most of the common types of transistor logic (TL)
may be classified according to three basic coupling schemes used for the
universal gate function. They are described below.
I. Linear impedance coupling to an input transistor may be used
to form R-TL, as shown in figure 7. This type of logic is generally not
available in integrated circuit form.
Figure 7 R-TL Resistor-Transistor Logic (+NOR)
II. Direct coupling to a multiple output transistor array (DC-TL)
may be used as shown in figure 8. It is commonly used in the more practical
modified forms, such as R-DC-TL (type II-A) shown in figure 9. An impedance
is inserted in each input line to improve operational characteristics.
Although this type of logic is sometimes referred to as resistor
coupled-transistor logic, its operation is not the same as R-TL, described above.
Figure 8 DC-TL Direct Coupled-Transistor Logic (+NOR)
Figure 9 R-DC-TL Resistor-Direct Coupled-Transistor Logic
Type II-B coupling involves current switching and output buffering
to prevent saturation of the input transistors. This type of logic is
sometimes referred to as emitter coupled-transistor logic (EC-TL) or current
mode-transistor logic (CM-TL). One type of non-saturated-direct
coupled-transistor logic (NS-DC-TL), which uses an emitter-follower output buffer,
is shown in figure 10.
Figure 10 NS-DC-TL Non-Saturated-Direct Coupled-Transistor Logic
III. Diode coupling uses non-linear input summing to form the 
logical AND or OR function. The most common form of D-TL is shown in 
figure 11, which performs the positive logic NAND (AND-NOT) function.
Saturation of the output transistor may be prevented by limiting the
minimum saturation voltage, as shown in figure 12. This results in a more 
constant "zero" output voltage, and diverts excess base current to improve 
transient response. 
Figure 11 D-TL Diode-Transistor Logic (+NAND)
Figure 12 NS-D-TL Non-Saturated-Diode-Transistor Logic
Type III-A coupling, shown in figure 13, is a variation referred to
as T-TL which uses transistor coupling to obtain improved response.
Logic operation is equivalent to D-TL when inverse transistor gain (α_r)
is low; coupling transistor action removes stored charge during turn-off,
and generally permits the elimination of the output transistor base bias
resistor.
Figure 13 T-TL Transistor-Transistor Logic
C. Comparison of Logic Types 
A comparison of the types of circuits described above is shown in
the table below for five types which are commercially available. They are
arranged in the table in increasing order of the number of equivalent
components required for a 3-input universal gate function. A larger number
of components generally increases fabrication complexity and increases
power dissipation. The general characteristics of these logic
configurations are discussed and compared in the paragraphs following the
table.
The isolation and speed-power rankings for the three saturated
logic types were obtained from "The Changing Prospective in Microcircuits",
Electronic Design, February 15, 1963, p. 56. This article describes the
result of a study of different types of logic for single substances
conducted by PSI. The author observes that no one logic type is superior to
all others for every application, but rather that the characteristics of
each type must be considered according to the particular over-all system
requirements.
The isolation ranking is a qualitative measure of the
input loading, the isolation between inputs, noise immunity, and variation
of input loading with parameter changes, internal failures, and output
loading. Logic types with the highest isolation are ranked first;
those with lower isolation are ranked in increasing order. The
non-saturated logic types are inserted into the original ranking by a
comparison of their general characteristics with those of the three saturated
logic types.
The speed-power ranking is a quantitative measure of the product
of propagation delay and power dissipation of the different logic types
when similar components and techniques are used in fabrication. This
characteristic varies considerably according to the design and technology
used for the construction of actual circuits. Logic types with the lowest
power-speed product are ranked first; those with higher power-speed
products are ranked in increasing order. The non-saturating logic types
are inserted into the ranking order indicated according to available data.
TABLE I COMPARATIVE RANKING OF AVAILABLE LOGIC TYPES
NAME       Function for   Type of    Number of    Speed-Power   Isolation
           + Logic        Coupling   Components   Ranking       Ranking
T-TL       NAND           III-A      3            1             4
D-TL       NAND           III        5            3             2
NS-D-TL    NAND           III        6            2             3
R-DC-TL    NOR            II-A       7            5             5
NS-DC-TL   NOR            II-B       9            4             1
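The observation that no one logic type is superior in every respect can be checked against Table I with a short Python sketch. The dominance test used here is an illustrative device added for this purpose, not a method from the report, and it assumes that a lower value is better in each of the three numeric columns.

```python
# Table I rows: (number of components, speed-power ranking, isolation ranking).
TABLE_I = {
    "T-TL":     (3, 1, 4),
    "D-TL":     (5, 3, 2),
    "NS-D-TL":  (6, 2, 3),
    "R-DC-TL":  (7, 5, 5),
    "NS-DC-TL": (9, 4, 1),
}


def dominates(a, b):
    """True if row a is no worse than row b everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(table):
    """Names of logic types that no other type beats on every column."""
    return [name for name, row in table.items()
            if not any(dominates(other, row)
                       for other_name, other in table.items()
                       if other_name != name)]


print(non_dominated(TABLE_I))
# prints ['T-TL', 'D-TL', 'NS-D-TL', 'NS-DC-TL']
```

On these rankings only R-DC-TL is bettered in every column by another type (T-TL). The other four each lead in at least one respect, so the choice among them depends on the system requirements.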
D. Description of Logic Types 
Resistor-transistor logic (R-TL) is a basic scheme for providing
the NOR function for NPN positive logic. The resistors are used for linear
input summing into the output transistor, which is normally biased off
unless at least one input is present. The bias may be increased to provide
either the inverse majority or the NAND output. The addition of speed-up
capacitors to the input resistors, although significantly improving transient
response, is not sufficient to reduce the power-speed product to that
available with other types of logic. The bilateral interconnection may create
interaction problems between inputs; performance of the device is sensitive
to variations of the input resistors, biasing, and transistor gain. The
difficulty of fabricating an integrated resistor-capacitor combination for
each input further decreases the suitability of this type of logic.
Direct coupled-transistor logic (DC-TL) is a theoretically simple
method of performing the NOR function for NPN positive logic. Inputs are
applied directly to transistor bases; the common collector is the output.
Actual operation, however, is limited by the high sensitivity to parameter
variations, input current "hogging" and low input impedance which limits
fan-in and fan-out, and the low noise margin. These severe limitations
have resulted in the actual use of a modified version (R-DC-TL) which includes
a low impedance resistor-capacitor combination on each input to reduce the
sensitivity to noise, parameter variations, and current "hogging". This
modification increases power dissipation, propagation delay, and fabrication
complexity. Since the fan-out capability of most NPN positive logic NOR
schemes is derived from the output collector resistor, the power
dissipation must be increased to allow fan-out capability regardless of
whether the fan-out is used or not.
The basic DC-TL scheme may be modified to provide non-saturated 
input logic (NS-DC-TL). The common emitter resistor reduces the problems
of input current "hogging", and increases input impedance so that this
type of logic offers high input isolation. Various methods may be used 
to provide outputs; both the OR and NOR may be provided conveniently. 
Good matching of components and close tolerance on a special reference 
voltage supply are required. The clocking function may be obtained by 
controlling the negative voltage supply by gating or a sinusoidal voltage. 
A two phase clock is required for flip-flop functions more complex than 
simple storage. An additional transistor, which shares a common collector 
with other input transistors, is required for each input. The voltage
difference between the "1" and "0" level is usually very small, resulting 
in reduced DC stability and noise margin. NS-DC-TL offers high speed oper-
ation at the expense of high power dissipation. 
Diode-transistor logic (D-TL) is probably the most popular type of 
integrated circuit logic, due to its similarity to discrete component 
circuitry and the excellent operating characteristics. D-TL circuitry 
operates with wide parameter variations to minimize the possibility of 
malfunction due to drift failure. Actual failure testing has shown that
redundant D-TL is not sensitive to most catastrophic failures. D-TL is
most commonly available as NPN positive logic NAND integrated circuits.
The newer versions of commercially available D-TL circuits offer about the
lowest power-speed product available for circuits operating at moderate
speeds and with good noise margins. Consideration of integrated circuit
characteristics has significantly reduced the number of individual
isolated components compared to the number of discrete components required
for an equivalent circuit. The entire input diode array, as well as one
level-shifting diode, may be constructed as one multiple-emitter transistor.
Each additional input merely requires an additional emitter connection.
Transistor-transistor logic (T-TL) is a simplified variation of
D-TL employing transistor coupling directly to the base of the output
transistor. The elimination of one coupling diode reduces the noise margin
and voltage swing to about the equivalent of DC-TL. Input isolation is
similar to D-TL, except that inverse gain of the coupling transistor allows
some "hogging" of input current. The inverse gain cannot be reduced without
increasing the offset voltage of the coupling transistor*; increased
offset voltage, in turn, decreases DC stability and noise margin. Increased
speed at low power levels is possible because the coupling transistor
removes stored charge from the output transistor to reduce turn-off time.
* The offset voltage varies as ln(1/α_r), where α_r is the inverse gain of
the coupling transistor.
The output inverter of D-TL may be designed to prevent saturation
to reduce excess drive and stored-charge effects. This may be accomplished
by limiting the minimum "0" output voltage by a base to collector clamp
to prevent saturation of the output transistor, as shown above for
non-saturated diode-transistor logic (NS-D-TL). The increased "0" output
voltage will, however, be more constant with increases in output loading,
if sufficient gain is available. Logic operation is equivalent to D-TL
with increased speed and lower power dissipation under comparable
conditions. Additional gain may be easily obtained for D-TL by
substituting an emitter follower for the final level shifting diode.
The speed-power performance of some of the commonly available
logic elements is shown in figure 14. This figure
shows the advertised performance characteristics of different logic types
available from different suppliers.
Figure 14 Speed-Power Performance (horizontal axis: average power
dissipation, P, in mW; plotted elements: R-DC-TL NOR, D-TL NAND and
NS-DC-TL NOR/OR from Companies A, B, D, E and G, with improved versions
marked)
The wide variation of performance characteristics for 
different suppliers of the same logic types is due to several causes: 
differences of circuit parameter design, lack of standard test conditions
(temperature, fan-out, voltages, etc.), as well as the rapidly improving
technology in this field. Two recently announced improved versions of
previous elements (Company A D-TL and Company D R-DC-TL) are indicated
in the figure. The rapid rate at which improvements have been made in
the field of integrated circuits makes it impractical to make an arbitrary
decision to use only one logic element for all future spaceborne redundant
systems. General characteristics, as well as the specific requirements
of redundant systems, may be used to make recommendations, however,
based on available information. The general characteristics discussed
below may be used as a guide to the choice of circuits, even though
exact requirements may vary.
Since systematic redundancy is most efficient and powerful when
the basic elements are highly reliable, the realization of high system
reliability with minimum weight and power penalties requires circuitry with
high basic reliability. High circuit reliability, especially for extended
periods of time, is usually realized when the circuit configuration is such
that proper operation is not excessively sensitive to parameter variation
or environmental extremes. High speed performance does not appear to be
a particular requirement for most spaceborne systems; low power dissipation
is a much more desirable characteristic. Available power (and total
energy) is often limited on space missions; the additional circuitry
required to reduce the probability of system failure will further emphasize
this problem. The power required by individual circuits must be held to
a minimum to keep total power within available limits. The reliability
performance of most integrated circuits depends on the temperature stress.
The use of low power circuitry is an important factor in reducing the
temperature stress, which, in turn, improves the basic reliability and
performance characteristics of the individual elements.
Although T-TL offers high speed at low power levels, its 
sensitivity to parameter variation, noise, and input current "hogging" 
has reduced the general suitability of T-TL. This sensitivity appears to
be a major disadvantage because the individual circuits in a redundant
spaceborne system are required to operate reliably despite severe
environmental variations and the occurrence of failures within the system. Since
inverse transistor action can limit the input voltage signal, failures 
within the circuit or on the output may affect the inputs. This transfer 
of failure effects to inputs would be a serious disadvantage in redundant 
systems, where the effect of failures must be minimized. 
DC-TL appears to be even more sensitive to parameter variations
and failure effects, except for the various modifications which are used
to reduce this problem. Positive NOR logic appears to be particularly
vulnerable to output failures resulting in failure of input signals. This
occurs because the transistor turn-on current is obtained from inputs; any
input must be able to provide sufficient drive to cause the output to be
"0" for proper operation. Fan-out capability is obtained by providing
each output with the ability to drive several inputs. If actual failures
may cause all of the inputs to a circuit to be overloaded, then any other
circuits receiving any of these inputs are also effectively failed.
Additional fan-out capability is usually reflected in increased power
consumption, which, in turn, increases reliability problems.
In contrast, the turn-on current for positive NAND logic is obtained
within each logic element. This drive current is diverted to a low
impedance input whenever any input is "0". Fan-out capability is provided
by the output transistor gain, and may be increased without significantly
increased power requirements. Since drive current is provided by each
circuit, rather than by inputs, failures within a NAND circuit usually
do not affect proper operation of inputs. The back-to-back diode coupling
also offers good isolation characteristics. Actual failure testing
has verified that failure effects in D-TL are usually limited to the
circuit in which the failure occurs.
Limited testing for both the transient effect of
high gamma radiation and the permanent effect of integrated neutron flux
has shown that D-TL integrated circuits are more resistant to radiation
(reference 6) than forms of DC-TL. The transient effects of high gamma
radiation appear
to be primarily due to the leakage of the collector isolation diode. DC-TL
is more susceptible because the larger number of common-collector
transistors used creates a larger junction area. DC-TL was seriously affected at
gamma levels of 10^6 to 10^7 R/sec, while one company's D-TL withstood an
order of magnitude increase. The same company's D-TL also showed more
resistance to integrated neutron flux, but no microcircuits showed damage
at ordinarily expected dosages. At a flux dose of 2.8 x 10^14 neutrons/cm^2
(equivalent to about 100 years of continuous exposure in the Van Allen belts),
one company's elements failed, another showed waveshape deterioration, while
another microcircuit brand and discrete component D-TL showed no noticeable
effects.
E. Logic Selection 
Integrated D-TL circuitry appears to be the most appropriate type
of logic for general use in redundant logic systems for spacecraft missions.
It has been chosen for the general advantages of features described above,
and particularly for its suitability for use in redundant spaceborne equip-
ment, which requires both high immunity to noise and parameter variation, 
as well as reasonably low power dissipation. These requirements are 
generally not available in the various forms of DC-TL. Although T-TL logic 
is equivalent to D-TL, currently available elements are too sensitive to 
input current "hogging" to be suitable for use in redundant systems. 
D-TL is known to have high noise immunity, good input-to-output
isolation, good compatibility with other circuitry and relatively low power
consumption. D-TL is particularly insensitive to drift failures; failure
testing has shown that the effect of most catastrophic failures is not
especially harmful in redundant logic networks. The speed capability of
available integrated D-TL circuits appears to exceed the requirements of
most spaceborne systems. Some of this excess speed capability may be
traded for lower power requirements by reducing the power supply voltages.
Power dissipation could be further reduced by a redesign of present D-TL
circuits to use higher resistance values. High resistance is a difficult
problem in present circuits, since the characteristically low resistivity
of diffused resistors requires a large area for high resistance
values. The use of thin film resistors and capacitors on the silicon block
in which the semiconductors are diffused, as planned by Westinghouse for
the near future, would permit circuit design for significantly lower power
dissipation without the large areas and narrow strip layout required for
totally diffused circuitry. Such single-chip hybrid circuits are not
presently available for general logic use.
It is expected that the positive logic NAND function will be
used, since this permits logic design of functions as the sum of products,
which is convenient for reduction and simplification by familiar methods.
The NAND circuits shown are particularly versatile, since the collector
outputs may be connected together to form AND-OR-NOT logic functions
directly. R-S flip-flops may be formed by interconnected NAND elements;
formation of more complex functions such as a compatible counter element
requires a large number of NAND elements and a two-phase clock. The majority
voter is not a commercially available element, but it is easily constructed
from NAND elements.
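The sum-of-products and common-collector constructions mentioned above can be sketched at the logic level in Python. This is an illustration of the conventional two-level NAND majority voter, not of the isolated-input element of figure 15; the function names are hypothetical, and tying collectors together is modeled simply as the AND of the individual gate outputs.

```python
from itertools import product


def nand(*inputs):
    """Positive logic NAND of any number of 0/1 inputs."""
    return 0 if all(inputs) else 1


def majority_nand(a, b, c):
    # Sum of products AB + BC + AC realized as two levels of NAND gates.
    return nand(nand(a, b), nand(b, c), nand(a, c))


def and_or_not(*product_terms):
    # Collectors tied together: the common output is high only when every
    # gate output is high, which gives NOT(term1 OR term2 OR ...).
    return int(all(nand(*term) for term in product_terms))


# Exhaustive check of the three-input majority function.
for a, b, c in product((0, 1), repeat=3):
    assert majority_nand(a, b, c) == int(a + b + c >= 2)

print(and_or_not((1, 1), (0, 1)))  # prints 0, since the first product term is true
```

Feeding the AND-OR-NOT output through one more inverting gate would give the same majority function, which is one way the common-output connection can save gates.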
F. Majority Voter Design 
Failure testing has shown that particular care must be used for 
the design of restoring elements so that failures on one input to the
restorer do not cause failures on other inputs, and the failures in the 
restoring elements do not cause failure of a majority of inputs. This 
testing has shown that a conventional majority element (whether constructed 
as the minimum discrete component circuit, or of interconnected NOR or NAND 
elements) may experience failures which either cause immediate failure of 
the entire set of restorers, or which would cause the same result if a 
single input error occurs. If such effects are overlooked, the system
reliability may be seriously degraded. Shown in figure 15 is a three 
input majority element using NAND elements which cannot cause an entire 
set of restorers to fail due to any single failure.
Figure 15 Majority Element with Input Isolation (inputs A, B, C; output
MAJ (A,B,C))
The NAND implementation shown utilizes common output logic so that 
the voter requires only two more gates than conventional majority voters, 
and retains a two element input to output propagation delay. NOR implemen-
tation, however, would require a total of eight gates and four element 
input to output propagation delay to obtain input isolation for NPN positive 
logic. It is expected that the isolated input majority element shown will 
be more reliable in normal operation (all inputs alike) than a more conven-
tional configuration, since very few single failure modes can cause the 
output to disagree with the inputs when all inputs are identical.
If higher orders of redundancy are used, then each input is
provided with isolation gates. Since component redundancy is not used to 
protect against single failures, a simple test consisting of monitoring 
the logic output while applying all combinations of logic inputs will 
completely test the operation of the circuit. A custom-packaged majority 
voter would significantly reduce the size and weight of a redundant system 
when compared to one using individual packages. The packaging of this 
majority voter is of particular importance because it is used repetitively 
in a redundant system. 
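The simple test described above, monitoring the output while applying all combinations of inputs, can be sketched in Python with single faults injected one at a time. The sketch uses the conventional four-gate NAND voter rather than the isolated-input element of figure 15, and it models a failure as a gate output stuck at "0" or "1"; the gate names and the fault model are assumptions made for illustration.

```python
from itertools import product

GATES = ("g_ab", "g_bc", "g_ac", "out")   # hypothetical gate names


def voter(a, b, c, stuck=None):
    """Four-gate NAND majority voter; stuck=(gate, value) forces one output."""
    def nand(name, *inputs):
        if stuck is not None and stuck[0] == name:
            return stuck[1]
        return 0 if all(inputs) else 1

    g_ab = nand("g_ab", a, b)
    g_bc = nand("g_bc", b, c)
    g_ac = nand("g_ac", a, c)
    return nand("out", g_ab, g_bc, g_ac)


def exhaustive_test(circuit):
    """Input combinations for which the circuit disagrees with the majority."""
    return [(a, b, c) for a, b, c in product((0, 1), repeat=3)
            if circuit(a, b, c) != int(a + b + c >= 2)]


def undetected_faults():
    """Single stuck-at faults that the exhaustive test fails to expose."""
    faults = [(gate, value) for gate in GATES for value in (0, 1)]
    return [f for f in faults
            if not exhaustive_test(lambda a, b, c, f=f: voter(a, b, c, stuck=f))]


print(exhaustive_test(voter))   # prints []
print(undetected_faults())      # prints []
```

In this model every single stuck gate output produces a wrong vote for at least one of the eight input combinations, so none is masked from the test. That is the property the text relies on when component redundancy is not used inside the voter.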
IV. Failure Testing of Redundant Systems 
A. Introduction
1. Characteristics of Redundant Systems
The outstanding attribute of a redundant system is that of
providing high reliability for a longer period of time than the
non-redundant counterpart. Typical reliability curves depicting this
relationship for a simple system are shown in figure 16. It is assumed here
that both
systems begin operation with all circuits, subsystems, wiring, etc. in a
failure free condition.
[Figure: reliability plotted against operating time for a redundant system
and a conventional system, with the MTBF of the conventional system marked
on the time axis]
Figure 16 Reliability of Conventional vs. Redundant Systems
The statistical relationship between reliability and operating
time is derived by assuming that failures occur at a constant rate and are
inherently random and independent. After some period of operation without
maintenance, the reliability of a typical multiple line, majority voted
redundant system falls below that of the non-redundant version. This
behavior is normal since the greater number of components subject to
statistical failure eventually causes the majority voters to have
incorrect outputs. The initially flat portion of the redundant system
reliability curve is the characteristic which is exploited to provide high
mission reliability.
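The shape of the two curves can be illustrated numerically. The sketch below rests on simplifying assumptions that are not taken from this report: the redundant system is a single order-three stage with perfect voters, each rank has the same constant failure rate as the conventional system, and failures are independent. Under those assumptions the redundant reliability is 3R^2 - 2R^3, where R is the reliability of one rank, and the curves cross where R = 0.5.

```python
import math

def simplex_reliability(t: float, mtbf: float) -> float:
    """Reliability of the conventional system with a constant failure rate."""
    return math.exp(-t / mtbf)

def tmr_reliability(t: float, mtbf: float) -> float:
    """Reliability of one order-three majority-voted stage (perfect voters assumed)."""
    r = simplex_reliability(t, mtbf)
    return 3 * r**2 - 2 * r**3

def crossover_time(mtbf: float) -> float:
    """Operating time at which both curves meet (R = 0.5 under the assumptions)."""
    return mtbf * math.log(2)
```

Early in the mission the redundant curve stays close to unity; beyond the crossover, at roughly 0.69 of the conventional MTBF in this simplified model, the redundant system is the less reliable of the two.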
Since current spaceborne equipment is unattended after mission
commencement, it is important to assure that the equipment is in perfect
working order "before launch". It may not always be practical to completely
test each part of a redundant system after final assembly and installation
into a space vehicle, and thus the term "before launch" includes diagnostic 
testing before final assembly. It will be shown that a redundant system 
may be conveniently diagnosed for the presence of failures after final 
assembly and installation in a space vehicle. This may be accomplished 
during the pre-launch test period when the vehicle is about to begin its 
mission. Essentially the technique employed is that of removing the failure 
masking effects of redundancy and testing the replicated systems separately. 
The function of these tests is initially to detect the occurrence 
of a failure and secondly to determine its location. The tests would be
useful in deciding whether the equipment should be finally assembled and
installed into the space vehicle or if the equipment is free of failures 
and ready for launch. The goal here is to assure that all of the initial 
failure protection which has been designed into the system is available. 
In a non-redundant system the best one can do is to test the system 
and then hope that no failures occur. The statistical nature of failure 
occurrence, however, offers little assurance that a failure will not occur
just after mission commencement. This occurrence often precipitates total 
mission failure in a non-redundant system. The redundant counterpart is 
obviously better suited to tolerate random failures. Further, a typical 
order three redundant system which has been diagnosed to be free of failures 
prior to mission commencement is not vulnerable to single failures and thus 
offers a high degree of assurance of mission success. 
Further tests would be utilized to isolate and locate the failure. 
The goal here is to effect repair and thus return the system to perfect
working order. Since this may consume considerable time and involve special 
repair or replacement facilities, a duplicate system, which has been found 
free from failure, may be required to expedite scheduled installation into 
the space vehicle. 
For redundant systems which receive maintenance the purpose of 
diagnostic testing is again to detect and locate failures. The goal,
however, is to return the system to perfect working order and thus assure
the highest possible reliability during the entire operational life of the
equipment. In order for periodic maintenance to be effective it follows
that the period between maintenance checks should be sufficiently short so
that the
reliability for the maintenance period is high. The probability of operation 
repeatedly traverses the initially flat portion of the redundant reliability 
curve. 
The general problem of diagnostic testing is to provide suitable 
test facilities and methods which are effective in determining whether a 
failure has occurred, and to determine its location. In a redundant system 
the implementation of test facilities entails many considerations, ranging 
from basic system configuration to the details of circuit design. In a 
conventional non-redundant system, test provisions are all too often given 
only token consideration. Although the test features provided may be
ineffective or inconvenient, the diagnosis, failure location and repair of
the equipment is often made possible through the ingenuity of an experienced
technician. A redundant system similarly encumbered imposes a much more
difficult task. Thus the need for integrating system configuration and test 
facilities in the initial design stages becomes extremely important. 
2. Testing of Conventional Systems 
The detection of a failure in a redundant system
represents a problem which is alien to the test philosophy of conventional
systems. In a non-redundant system the effect of a failure is rather 
dramatic and is usually evidenced by either partial or total system failure, 
or obvious changes in operational behavior. This simplifies the problem of 
detecting an error, but is small consolation to the user who loses the 
service of a system without warning, perhaps at some crucial moment. Total 
system failure usually indicates the failure of a major function, such as 
a power supply or clock generator. Changes in operational behavior and
partial failures normally provide symptoms which, when analyzed, are valuable
in converging on the failure location. In a redundant system the effect of
a non-critical failure is not evidenced by any change in system behavior.
This means that the effect of a failure does not provide gross symptoms
which may be used to indicate its occurrence or determine its location.
The solution to this unique problem is suggested through several avenues of 
approach which represent diagnostic routines and implementation schemes 
unique to redundant systems. 
Before considering the unique demands which a redundant system 
imposes on the required test facilities, it is useful to consider some
approaches which are applicable to digital systems in general. These 
general approaches include waveshape monitoring and the application of 
various stresses to enhance the chance of detecting present or potential 
failures. The combination of general approaches with the specific
approaches to be suggested appears to offer a more inclusive repertoire of
techniques from which to choose. 
In a conventional system a failure of some circuit or sub-system 
normally provides an indication of its occurrence by the resultant changes
in operational behavior. These are usually designated as catastrophic 
failures. Degraded components which are not sufficiently marginal to cause 
circuit failure are more difficult to detect because there is no indication 
of a change in system behavior. Often, however, a degraded component may
be detected at the circuit test point level by changes in normal wave-shape.
At the component level the degradation may be considered as a failure. At 
the circuit level this condition represents an impending failure. Under-
standably it is important to detect and repair impending failures since it 
is very likely that the circuit will soon fail. This is one of the more
important aspects of periodic maintenance of non-redundant systems. Often 
the system may be operated normally and the various test points monitored 
to detect marginal voltages, wave shapes or rise times. This represents 
a very time consuming procedure and is severely limited in effectiveness 
by the number of test points which are provided. Many marginal components
are then essentially undetectable. 
Another problem often arises when a failure in circuit
operation becomes sporadic. In this case the system may operate normally
for most of the time, making the location of the fault a difficult task.
As so often happens, just as maintenance personnel are in the process of
converging on the fault location, the fault disappears and the system
operates normally. The problem here is that the fault is not present long
enough to allow an adequate diagnosis of the difficulty.
A more powerful approach for locating impending and sporadic fail-
ures involves the application of stress to the system. This will often 
precipitate a circuit failure by subjecting components to a condition which 
magnifies any degradation. Consider now the two general classes of approaches 
for imposing system stress--environmental and electrical. Environmental
stress may be typically sub-divided into temperature, humidity, pressure,
vibration, shock, radiation, etc. The application of one or a combination
of these environmental stresses is seen to present three main problems:
1) the size, complexity and cost of the facilities required, 2) the
difficulty of performing measurements in an alien and often dangerous
environment, and 3) the possibility of subjecting components to unnecessary
stresses and thus causing unwarranted damage or destruction.
Temperature stress is perhaps the most popular approach because of 
its utility in causing parameter changes in resistance, capacitance, leakage, 
gain, threshold, etc. A second advantage is the small amount of additional 
facilities which are required. Often, temperature stress may be conven-
iently applied by controlling the system cooling to increase or decrease 
operational temperature. Component variations caused by temperature stress 
often make circuit operation marginal when such changes are beyond the
normal specified design limits. Thus the degradation of a component which
has become only slightly marginal at normal operating temperature, and is
indicative of impending failure, may be magnified by temperature stress to
precipitate circuit failure. This method is often used, for example, in
testing transistors for leakage current degradation at elevated
temperatures. In a system test the increased leakage current of degraded
transistors causes circuits to become sufficiently marginal to effect
circuit failure.
The remaining types of environmental stress are difficult to impose
on a system without test facilities of vast complexity. For this reason
they are not readily amenable to system testing but find greater utility
at the component or sub-system level. A case in point is the development
of highly reliable components, i.e., by carefully controlled production 
followed by extensive testing under a variety of environmental and elec-
trical conditions. 
Electrical stress is a more convenient method for detecting 
marginal components and impending failures. A convenient method for
stressing an entire system simultaneously is that of marginal voltage testing.
In this approach the system power supply voltages are varied to combinations 
of maximum and minimum levels for which the circuits were designed. When 
all defective components, modules or sub-systems have been detected and 
replaced the system power supplies are returned to their nominal values.
Marginal voltage testing is often combined with simulation routines and
static and dynamic measuring techniques to provide an inclusive test program. 
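The marginal voltage procedure amounts to stepping every supply through the combinations of its minimum and maximum design levels and exercising the system at each combination. A minimal sketch follows; the supply names, voltage limits and the `system_passes` callback are hypothetical placeholders for whatever test routine is actually applied.

```python
from itertools import product

def marginal_voltage_corners(supplies: dict) -> list:
    """List every combination of (min, max) levels, one level per supply.

    supplies maps a supply name to its (minimum, maximum) design level.
    """
    names = list(supplies)
    return [dict(zip(names, levels))
            for levels in product(*(supplies[name] for name in names))]

def run_margin_test(supplies: dict, system_passes) -> list:
    """Return the supply combinations at which the system test fails."""
    return [corner for corner in marginal_voltage_corners(supplies)
            if not system_passes(corner)]
```

With two supplies there are four combinations to exercise, with three supplies eight, and so on; any combination returned by `run_margin_test` points to a marginal component or module to be replaced before the supplies are returned to nominal.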
Simulation programs provide a form of electrical stress which is 
seen to exercise the variety of operational functions which a system may be 
required to perform under actual operating conditions. Often however, a 
simulation technique may subject the system to operational speeds which are
not encountered in normal system operation. This might be accomplished by
varying the frequency of system clock generators to either increase or
decrease the speed of operation. In a spaceborne sequencer, for example,
it may be necessary to speed up the occurrence of time events by several
orders of magnitude in order to test all functions in some reasonable test
period. In other applications increasing the speed of operations to the
maximum design limit is often useful for magnifying the effect of marginal
components. For example, this technique is seen to be useful in determining
degradation in capacitive coupling circuits.
A reduction in operating speed does not usually subject the system
to stress but is useful in ascertaining that some normally fast sequence
of operations is being performed correctly. Here, the reduction of clock
rate is utilized to allow the operation sequence to be conveniently monitored.
The general approaches discussed are primarily useful in precipitating
static failures which are impending or sporadic. DC failures and
catastrophic failures are usually immediately apparent from the manner in
which the system behaves. When only a portion of the system fails in the
static state it often provides symptoms which may be used in diagnosing the
location of the failure. If a failure occurs near the "front end" of a
system, the majority of outputs will usually become static. In this case
the symptoms are not sufficiently explicit to allow an adequate diagnosis.
Simulation equipment then becomes useful in determining the failure location.
This is accomplished by applying suitable signals at the various subsystem
inputs and monitoring outputs for the presence of the correct response.
3. Failure Detection in Redundant Systems 
The problem of detecting a failure in a redundant system is 
usually more difficult than in the conventional counterpart, because the
effects of non-critical failures do not provide gross symptoms of their
occurrence. This difficulty in diagnosing a failure is amply compensated
by the vast improvement in reliability which a redundant system provides.
Since a conventional system normally provides little indication
of an impending failure, the only available resort by which the system
quality may be diagnosed is by the application of stress. It is, however, an
inconclusive test of the system's ability to perform reliably. In a
redundant system the application of stress to components and circuits for the
purpose of detecting impending failures is not of significant value because
the effects of individual failures are masked by the system configuration.
Although redundant systems are able to tolerate failures without causing
total system failure, it is often desirable to diagnose the system to detect
any internal failures. It will be shown that the application of conditions
which reduce the ability of a redundant system to withstand internal
failure acts like stress by modifying the configuration so that the failure
masking effects are removed. In this manner, failures which are present
will be indicated by the behavior of the system. The following paragraphs
will describe techniques for detecting and locating failures in redundant
systems.
An order-three, multiple-line, majority-voted redundant shift
register system will be used to demonstrate basic approaches. This is done
for ease of explanation and is not intended to suggest that the approaches
may not be extended directly to more general system configurations, or to
higher-order redundant systems. It may be noted that the testing of
redundant systems will involve a hierarchy of tests involved with first
testing the signal processing parts, then the testing of the restoring
elements, and finally the testing of the hardware added for the initial
testing function itself. The extent and complexity of this hierarchy will
depend on the confidence which is required of the tests and the degree of
automation desired. It appears impossible, however, that perfectly reliable
operation can ever be expected from any hierarchy of imperfect equipment
monitoring other equipment. Although these testing methods are intended
to make a significant contribution to the techniques available for testing
redundant equipment, it is expected that further work in this area will
result in further improvements. The accuracy and complexity of the tests
should be balanced to obtain efficient system operation.
Often, the problem of failure detection is directly connected
with the requirement for determining the location to facilitate
maintenance repairs. Therefore, some of the more complete testing methods
will include combined detection and location. Although failure location
techniques are usually more complex than the basic failure detection
techniques, they often include complete failure detection capability in
order to locate all failures which might exist in a redundant system.
Failure location techniques also provide effective methods to detect and
locate failures in the failure detection and location circuitry itself.
Basic failure detection will probably be most useful as a
verification technique to indicate that at least a major portion of a
redundant system is failure free. This will assure that the failure
protection which has been designed into a redundant system is available to
prevent system failure. Simple failure detection techniques are also
expected to be a preliminary technique which will indicate if any failures
are present in a maintained redundant system, so that further corrective
action may be undertaken. It is important that all failures be detectable
in a maintained redundant system, so that failures are not allowed to
accumulate and degrade system reliability.
4. Failure Location in Redundant Systems 
If a failure is known to exist in a redundant system, it is
often desirable to obtain further information concerning the location of 
the failure. This is generally required so that the module containing the 
failure may be repaired or replaced. Although it is very desirable to be
able to detect any failure to permit maintenance, it is only necessary to 
locate failures to within the smallest replaceable module. Therefore, the 
requirements of failure detection depend strongly on the contents of the 
smallest replaceable module. If entire subsystems are contained in a module, 
then each subsystem could be provided with independent failure detection 
hardware. This would be sufficient to locate failures within the replace-
able module. It is possible that the requirement for test points at each 
replaceable module to permit failure location may in turn determine the 
practical size and contents of the module. If the test points and con-
nections occupy a large space compared to the basic module, then the volume 
efficiency is rather poor, and a larger replaceable module might be more 
practical. 
If repairs are expected to be made while the system remains in
operation, then the module which contains the failure must not include the
remaining replications of that function. This is necessary to permit the
system to operate while the module containing the failure is removed.
If the entire module is to be replaced when it contains a failure, then the
failure location technique must be sufficiently accurate to determine which
module contains the failure. This module may then be replaced without 
interruption of normal system operation. Maintained redundant systems 
which are continuously monitored and repaired require a combined failure 
detection and location technique which may be applied without altering the
operational characteristics of the system. It will be shown that relatively 
complete testing may be accomplished during system operation. This is
possible because the most frequent and harmful failures usually cause signal
disagreements at the inputs to the voters. These signals may then be 
compared, either automatically or with the use of test points, to detect 
and locate these failures. Certain system configurations are amenable to 
controls which allow complete failure detection and location with access only
to the signals at the inputs to the voters. More generally applicable 
techniques require access both to the voter inputs and outputs. These tech-
niques, as well as the implementation circuitry required, are described in 
the following paragraphs. 
5. Signal Comparison in Maintained Systems
The location of a failure in a conventional system requires 
that a handbook be provided to indicate the correct wave shape and binary 
sequence to be expected at each location. This is in addition to sim-
ulation equipment which may be required to place portions of the system 
into dynamic operation. The redundant system masks the effect of
individual failures and thereby makes the task of detecting their occurrence
more difficult. It will be shown, however, that the masking effects of a
redundant configuration may be conveniently removed by controlling the
outputs of the signal processors. This is essentially a gross system
approach whereby the occurrence of a failure is indicated by forcing the
system to assume various vulnerable configurations. If the system is
allowed to either operate normally, or in some configuration for which
all operations are performed correctly, the detection and location of
failures may be conveniently accomplished by examining replicated elements
for signal disagreement.
In many respects, the location of failures in a redundant
system is a much easier task than in the conventional system counterpart.
This is because an improper signal may be determined by comparison with
its replicated versions. If a redundant system is operating correctly
in an overall system sense, then the correct signal of each monitored
element is available at least at a majority of associated test points.
This is seen to eliminate the tedious task of monitoring elaborate wave
shapes and sequences. Maintenance personnel are then presented with a
system which, in principle, contains an integral handbook of normal
signals to be expected at the various locations. The system may be permitted
to operate normally, without simulation equipment, performing operations
whose binary sequence at any single location is so complex that one could
not hope to describe them adequately in any handbook. This suggests the
possibility that maintenance personnel need not be completely familiar
with the detailed operation of the system.
The determination of an error could be provided by a difference
detector in combination with a suitable indicator. A technician would be
required only to monitor the various test points in some prescribed sequence
until arriving at the location of a signal disagreement. He would not be
required to possess any special knowledge of what constitutes a correct or
incorrect wave shape, binary sequence or repetition rate. Also, most
difference detector devices which might be employed will signal any large
departure from normal signals, and may include memory to indicate the location
of transient or sporadic failures. From this we may conclude that the
training requirements for maintenance personnel may be appreciably reduced,
thus providing redundant systems with a distinct maintenance cost advantage
over the more conventional counterpart. This attribute alone might become
a significant factor in evaluating the total utility of a redundant system
which is periodically maintained.
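The monitoring procedure just described, stepping through test points in a prescribed sequence until a disagreement among replicated signals is found, can be sketched as follows. The data layout (a list of named test points, each carrying one sampled bit per rank) and the function name are assumptions made for illustration; an order-three system with at most one faulty rank per test point is assumed.

```python
def locate_disagreement(test_points):
    """Scan test points in order and report the first signal disagreement.

    test_points is a list of (name, (a, b, c)) pairs, where a, b and c are
    the bits sampled on ranks A, B and C at that location. Returns
    (name, rank) for the first rank that differs from the majority of its
    replicas, or None when every location agrees.
    """
    for name, signals in test_points:
        majority = 1 if sum(signals) >= 2 else 0
        for rank, value in zip("ABC", signals):
            if value != majority:
                return name, rank
    return None
```

No reference waveform is supplied to the routine: the replicated signals themselves serve as the "integral handbook" against which each rank is compared.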
In order to reduce the total system failure rate, periodic
maintenance must be conducted at a sufficiently short interval so that
individual failures are not so probable that system reliability is appreciably
degraded. In addition, if system failure occurs it might be necessary to
employ simulation equipment to place portions of the system back into
operation. The advantage of not requiring simulation equipment to locate
individual failures is an important feature of a maintained redundant system.
Thus the function of periodic maintenance is not only to assure high system
reliability during the life of the equipment, but also to eliminate the
requirement for simulation equipment to locate failures.
Thus far in our discussion of maintained redundant systems, it has
been implied that the signal comparison equipment is usually externally
applied to the appropriate test points in much the same manner as an
oscilloscope or voltmeter is used in a conventional system. As indicated
previously, it may be undesirable to provide these test points at every
signal processor and voter output in the system. This may be due to the
lack of access to the signals, the physical size of the test points in
comparison to the circuitry being monitored, or the signal loading caused
by test point leads. In some applications it may therefore be desirable
to provide error detection and display as an integral part of the system.
Integral signal comparators may be desirable, for example, in a maintained
redundant system which is continuously monitored during operation and in
which each failure is repaired as soon as it is detected. This maintenance
philosophy allows a much higher system reliability than available with
periodic maintenance. With proper design it appears feasible to remove and
replace defective modules without disturbing the operation of the system.
Since signal comparators will indicate only when signal
disagreement occurs during the normal system operation, more extensive tests
are required to detect and locate such failures as might occur in signal
processors which are not to be used for some modes of system operation, some
of the failures in voters, and failures that might occur in the control and
signal comparison circuitry. This suggests a maintenance philosophy of
continuous monitoring combined with periodic complete testing as follows:
Signal processor outputs are continuously monitored during the operation of
the system for the indication of the more frequent and harmful failures which
cause incorrect signals. These failures are located and may be repaired
without interrupting normal system operation. Periodically the normal
operation of the system is shut down to allow the system to be completely
exercised and the otherwise undetectable failures to be located and repaired.
In contrast, the periodically maintained system is allowed to accumulate
failures, even though they may be easily detectable, until the end of a
scheduled maintenance period. Continuous monitoring and repairing is
therefore a very powerful technique for detecting and repairing most failures
as they occur, without seriously impairing the ability of the system to
operate continuously while individual failures are repaired.
B. Singular Rank Testing
1. Detection of Signal Processor Failures
An obvious method for detecting failures in a typical redundant
system is to separate and reconnect the replicated parts to create
individual, independent systems. Each system may then be separately diagnosed
for the presence of failures in the conventional manner. This would require
that the basic system be provided with a large number of special switching
circuits which accomplish a separation. Such an approach is somewhat
impractical because of the expense, complexity and reliability degradation
which the additional circuitry and wiring would impose. As will be shown,
a much simpler means is available to provide a pseudo-separation of
replicated systems without requiring an elaborate switching mechanization.
As an example, consider the simple redundant configuration shown
in figure 17. Each of the complete replications of the non-redundant system
is hereafter referred to as a rank of the system. Each rank normally
consists of the components of the non-redundant equivalent system, separated
by the majority-voting restorers. Each of the signal processing elements
(indicated by blocks) within the same rank is designated with the same
capital letter; each of the majority voting restorers (indicated by circles)
within the same rank is designated with the same lower case letter.
[Figure: inputs A, B and C feeding three ranks of signal processors
(blocks) separated by majority voters (circles), with rank control
settings A = 0,1; B = 1,0; C = N]
Figure 17 Singular Rank Testing
The corresponding replications of the same signal processors are
hereafter referred to as being on the same file of the system. Each element
in the file normally performs the same function, and is designated with the
same number. Each signal processor file corresponds to an individual function
of the non-redundant system. If a signal processor file has a restoring file
associated with it, the restoring file may be assigned the same number.
It will be assumed that the order of redundancy is uniform 
throughout the portion of the system which is being tested and that the 
only interconnections between ranks occur at the inputs to restorers. 
Singular rank testing will assume that there are no restrictions on system
size, configuration, or uniformity of direction of signal flow. These 
characteristics are chosen to be compatible with current redundancy synthesis 
techniques. 
Suppose that the control lines shown in figure 17 provide a
means of causing each output of the rank signal processors to assume
either the "1" state, the "0" state or "N" (normal operation). In effect,
the outputs of the A and B rank blocks have been forced to assume definite
DC failure states. The mechanization to accomplish this is described in
part D of this section, and will be shown to entail only slight modification
to the normal circuitry. Consider the effect of causing all the A and B
rank signal processors to assume static complementary states, allowing
the C rank signal processors to operate normally, and allowing the system
to operate with its normal inputs. Under the condition that all A and B
blocks are in complementary states, the input to each voter consists of
"1", "0" and the output of the preceding C rank signal processor. This
means that the dynamic signal predominates and appears at the output of
the voters. If all voters operate correctly, the system is equivalent to a
non-redundant system, and may be completely exercised in the same manner
as the non-redundant system to verify that all signal processing blocks in
rank C are functioning correctly. This test should also yield identical
results if the complementary states of the A and B rank blocks are
reversed. If an incorrect final output results for both tests it indicates
that at least one failure is present in the C signal processors, the c
voters or combinations of both. If only one test is successful, then a
failure is evidently present in one or more of the c voters.
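The effect of forcing ranks A and B to complementary states can be shown with a small simulation. The model below is a deliberately simplified sketch and not the shift register system of this report: each signal processor is assumed to be a one-bit inverter, the voters are assumed ideal, a rank control is assumed to override even a failed processor, and a processor failure is modelled as a stuck output. All names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Ideal three input majority voter."""
    return 1 if a + b + c >= 2 else 0

def run_chain(x, n_files, control, stuck=None):
    """Propagate input bit x through n_files triplicated stages.

    control maps each rank to 0, 1 or "N" (normal operation).
    stuck maps (rank, file) to the stuck output of a failed processor.
    """
    stuck = stuck or {}
    lines = {rank: x for rank in "ABC"}
    for f in range(n_files):
        outs = {}
        for rank in "ABC":
            out = 1 - lines[rank]              # normal processing (inverter)
            out = stuck.get((rank, f), out)    # injected processor failure
            if control[rank] != "N":
                out = control[rank]            # rank control forces the output
            outs[rank] = out
        voted = majority(outs["A"], outs["B"], outs["C"])
        lines = {rank: voted for rank in "ABC"}
    return lines["C"]

def rank_c_test(n_files, stuck=None):
    """Run both complementary settings of A and B; return pass/fail for each."""
    results = []
    for a_state, b_state in ((1, 0), (0, 1)):
        control = {"A": a_state, "B": b_state, "C": "N"}
        ok = all(run_chain(x, n_files, control, stuck) == (x if n_files % 2 == 0 else 1 - x)
                 for x in (0, 1))
        results.append(ok)
    return results
```

In normal operation (all ranks set to "N") a single stuck processor in rank C is masked by the voters; with A and B forced to complementary states the same failure reaches the final output and both tests fail.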
Success of either of the above tests is sufficient to verify that 
all C rank signal processors are failure free. It should be noted that the 
presence of a correct output for both complimentary test c~nditions does 
not verify with certainty that the c voters are failure free. This i s be-
cause each voter was subjected to less than the maximum possible number of 
input signal combinations. Consider the various combinations of input signals 
and the correct response of a three input majority voter in the table be-
low. states 1 and 2 represent the case when A="l", B="O", and C="N"; states 
3 and 4 represent the case when the static signals on A and B are reversed. 
All signals are the same for states 5 and 6. states 7 and 8 occur when 
C disagrees with the other two inputs. 
State No.   A   B   C   Output 
   1)       1   0   1     1 
   2)       1   0   0     0 
   3)       0   1   1     1 
   4)       0   1   0     0 
   5)       0   0   0     0 
   6)       1   1   1     1 
   7)       1   1   0     1 
   8)       0   0   1     0 
Only the first four of the eight combinations were verified by the test 
conditions described. States 5 and 6 are trivial, however, since they 
contain the combinational states of 2, 4 and 1, 3 respectively. If a 
majority voter makes a "1" output decision for inputs consisting of two 
"1"s and a "0", it will make the same decision for an input of three "1"s. 
Similarly, if a majority voter makes a "0" output decision for inputs 
consisting of two "0"s and a "1", it will make the same decision for an input 
of three "0"s. From this it appears reasonable to assume that if the 
majority voter operates correctly for the first four states it will operate 
correctly for states 5 and 6. Thus the combinations which have not been 
tested and hence explicitly verified are states 7 and 8. 
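The coverage argument above can be checked by enumeration. The following is a minimal sketch in Python, not part of the original report; the names majority and rank_c_test_inputs are illustrative assumptions. It lists the voter inputs applied while rank C is exercised with both complementary settings of A and B, and reports the combinations that remain untested.

```python
from itertools import product


def majority(a, b, c):
    """Correct response of a three input majority voter."""
    return int(a + b + c >= 2)


def rank_c_test_inputs():
    """Voter inputs (A, B, C) seen while rank C runs normally and
    A and B are held in complementary static states (both polarities)."""
    return {(a, 1 - a, c) for a in (0, 1) for c in (0, 1)}


tested = rank_c_test_inputs()
all_inputs = set(product((0, 1), repeat=3))

# Inputs never applied by the rank C test.
untested = sorted(all_inputs - tested)

# Drop the trivial cases in which all three inputs agree (states 5 and 6).
unverified = [abc for abc in untested if len(set(abc)) > 1]

for abc in unverified:
    print(abc, "->", majority(*abc))
```

Under these assumptions the two unverified combinations are those in which C disagrees with both A and B, which correspond to states 7 and 8 of the table.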
The tests conducted thus far have verified that all C rank blocks 
operate correctly and that the voters operate correctly for six of the eight 
possible input signal conditions. The A and B ranks may be similarly tested 
with the result that the correct operation of all signal processing blocks 
may be verified. This test philosophy is seen to be an approach for 
isolating each rank of a multiple line configuration and thus determining the 
presence of any failures which would jeopardize the ability of the system 
to mask out future failures. The ranks are not operated simultaneously and 
independently; rather, one rank at a time is effectively removed from 
the multiple line configuration and separately diagnosed for the presence 
of failures. 
The success of all of these tests has verified the proper operation 
of all signal processors. These tests have not completely verified the 
condition of the voters, as was described by the example of the C rank tests. 
However, the following voter input-output operation has been verified with 
certainty: All voters will make correct decisions if the input from the 
rank in which the voter is located agrees with at least one of the other 
inputs. 
The condition which has not been verified is whether 
a voter will make a correct decision when the input from the rank in which 
the voter is located is in disagreement with the majority of the remaining 
inputs (both remaining inputs for order three redundancy). It should be 
noted, however, that the complete set of singular rank tests will result in 
the application of all possible combinations of inputs to the voters. These 
tests are therefore sufficient to verify that any undetectable voter failures 
cannot combine with further single failures to cause an order three system 
to fail. 
There are, however, a very limited number of component failures which 
can occur in the majority voter which cannot be detected with singular rank 
testing. These involve the failure of two of the input diodes for the three 
input D-TL voter. If the voter has a conventional minimum design, singular 
rank testing will indicate if either of these diodes is shorted. Due to 
the additional input isolation, the occurrence of these input diode shorts 
cannot be detected in the isolated input voter which has been shown in figure 
15. If either of these undetectable diode shorts has occurred in the isolated 
input voter, the result is that the voter output is a "1" whenever the input 
from the rank in which the voter is located is a "1". The majority function 
is performed for all other inputs. The occurrence of either one of these 
diodes being open cannot be detected for either the minimal design or the 
isolated input voters. The result of this condition is that the output 
of the isolated input voter is "0" whenever the input from the rank in 
which the voter is located is a "0"; if the input to a minimal design voter 
is a "1", the voter output is a "1". If one of the diodes shorts and the 
other opens, then the voter output is controlled by the input from the rank 
in which the voter is located, although the diode short could be detected if 
the minimal design voter is used. Therefore the existence of undetectable 
failures cannot introduce additional errors, but may cause signal processor 
errors to propagate through the restorers. 
The above analysis has shown that the occurrence of undetectable 
failures tends to cause the output of the voter to be dominated by the 
signal from the rank in which it is located. In the worst possible case 
(complete dominance caused by one diode open and the other diode short 
in every voter in every restoring file when these failures are undetectable), 
the restorers have been effectively replaced by conductive paths from the 
output signal processor in the previous file to the input of each following 
signal processor in the same rank. The result is equivalent to eliminating 
the restoring file completely (except that the reliability of the 
signal processors is reduced by the additional voter circuitry). Although 
it is extremely improbable that such conditions would predominate in a 
system recently constructed from completely tested parts, the system becomes 
more vulnerable to further failures if they are allowed to accumulate. 
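The behavior of these undetectable failures can be illustrated with simple Boolean models. The sketch below is an assumption-laden illustration in Python, not the report's circuit analysis: the functions shorted, opened and dominated are hypothetical models of a rank c isolated input voter that reproduce only the input-output behavior described above. It shows that each faulty voter passes the basic singular rank test, yet decides incorrectly when its own rank disagrees with the other two.

```python
from itertools import product


def majority(a, b, c):
    """Failure free three input majority voter."""
    return int(a + b + c >= 2)


# Hypothetical Boolean models of a rank c voter (own-rank input is c).
def shorted(a, b, c):
    """Diode short: output is "1" whenever the own-rank input is "1"."""
    return c | majority(a, b, c)


def opened(a, b, c):
    """Diode open: output is "0" whenever the own-rank input is "0"."""
    return c & majority(a, b, c)


def dominated(a, b, c):
    """One short and one open: output follows the own-rank input."""
    return c


def passes_singular_rank_test(voter):
    """The c voter is observed only while rank C is dynamic and
    A and B are held in complementary static states."""
    return all(voter(a, 1 - a, c) == majority(a, 1 - a, c)
               for a in (0, 1) for c in (0, 1))


def wrong_decisions(voter):
    """All input combinations for which the voter disagrees with majority."""
    return [abc for abc in product((0, 1), repeat=3)
            if voter(*abc) != majority(*abc)]


for model in (shorted, opened, dominated):
    print(model.__name__, passes_singular_rank_test(model), wrong_decisions(model))
```

In this model every wrong decision simply repeats the own-rank input, which is consistent with the statement that such failures introduce no new errors but can let a signal processor error pass through the restorer.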
2. Detection and Location of Voter Failures 
It may be desirable to have some means for detecting the 
presence of any failures within the system. One example in which some 
method of complete testing is desirable is a maintained system which is 
expected to operate reliably for extended periods of time. If such a method 
is convenient, signal comparison may be combined with singular rank testing 
to detect and locate all voter failures. Since the combined singular rank 
tests result in the application of all possible inputs to the voter, the 
outputs of all voters in a restoring file may be compared for agreement while 
the inputs are applied. All voters are failure free if no output 
disagreements occur while all combinations of input signals are applied. 
Since the only purpose of reversing the complementary states of the 
two ranks not being tested in an order three system was to gain additional 
information concerning the voters, voter comparison testing eliminates the 
need for interchanging the complementary states associated with each rank 
test. This requires, however, that a systematic method be used to assure 
that the complete set of tests results in the application of all possible 
combinations of inputs to the voters, except the trivial cases when all 
inputs are the same. This condition will be met if the following rule is 
followed during singular rank testing: As each of the ranks is completely 
exercised as an individual non-redundant system, the pair of 
complementary DC states of the remaining two ranks is chosen so 
that the state of each rank does not duplicate the DC state it held during any 
previous testing of the other ranks. Since the choice of which pair of 
complementary DC states for the testing of the first rank is arbitrary, 
either of two alternate sequences may be used for the complementary DC 
states; the states in one sequence are the complements of those in the other. 
Thus it may be shown that only three tests (one for each rank) are required 
for complete singular rank testing with signal comparison. If each test is 
successful in demonstrating that the system will perform the entire set of 
functions for which it was designed, all signal processors are verified to 
be failure free and the voters are capable of transmitting a correct dynamic 
signal for some of the possible input states. If, in addition, all voters 
make the same decision while the proper sequence of controls is applied 
during the above tests, the voters are verified to be failure free. 
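The rule for choosing the complementary DC states can be checked by brute force. The following Python sketch is illustrative only; the test order C, A, B and the names candidate_test, valid_sequences and voter_inputs are assumptions made for the example. It searches all polarity choices for three tests, keeps those in which no rank repeats a DC state, and confirms that the surviving sequences apply every voter input combination except the trivial ones.

```python
from itertools import product

RANKS = ("A", "B", "C")

# The six voter input combinations in which the inputs are not all the same.
NONTRIVIAL = {abc for abc in product((0, 1), repeat=3) if len(set(abc)) > 1}


def candidate_test(under_test, polarity):
    """Control states for one test: the rank under test is "N" (normal),
    the two remaining ranks are held in complementary DC states."""
    first, second = [r for r in RANKS if r != under_test]
    return {under_test: "N", first: polarity, second: 1 - polarity}


def valid_sequences(order=("C", "A", "B")):
    """Polarity choices in which no rank repeats a DC state across tests."""
    sequences = []
    for polarities in product((0, 1), repeat=len(order)):
        tests = [candidate_test(r, p) for r, p in zip(order, polarities)]
        repeats = False
        for rank in RANKS:
            states = [t[rank] for t in tests if t[rank] != "N"]
            if len(states) != len(set(states)):
                repeats = True
        if not repeats:
            sequences.append(tests)
    return sequences


def voter_inputs(tests):
    """Voter input combinations (A, B, C) applied over a set of tests,
    with the dynamic rank taking both binary values."""
    combos = set()
    for t in tests:
        for x in (0, 1):
            combos.add(tuple(x if t[r] == "N" else t[r] for r in RANKS))
    return combos


for tests in valid_sequences():
    print(tests, voter_inputs(tests) == NONTRIVIAL)
```

With this test order the search yields exactly two sequences, each the complement of the other, in agreement with the statement that the first choice is arbitrary.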
3. Detection and Location of Control and Comparator Failures 
The basic concepts of singular rank testing may be extended 
to verifying that the controls used for singular rank testing are operating 
correctly. Rather than allowing each rank to operate individually, each 
rank is individually controlled by the singular rank testing controls. If 
the controls are working properly, a signal comparison on the output of 
each signal processing file should indicate a disagreement whenever the 
dynamic signal on the remaining ranks is in disagreement with the DC state 
of the rank being controlled. In the case where difference detectors are 
used on the output of all signal processor files, this testing will also test 
these difference detectors. The detectors should indicate a difference at 
each signal processor file whenever the signal on the controlled rank 
disagrees with the dynamic signals. If the signal comparison of the signal 
processors is accomplished while complementary DC states are applied to 
each pair of ranks, as described above, all possible input combinations 
involving disagreements are applied, and the difference detectors should 
give a continuous indication. If signal disagreements are noted for each 
signal processing file while all of the ranks are being controlled (either 
individually, in pairs, or for all possible input combinations involving 
disagreements, but not when the entire system is allowed to operate without 
signal processor failures) then the associated singular rank control 
circuitry is verified to be failure free. 
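A small behavioral model may help to show what the control check looks for. This Python sketch is a simplified assumption, not the report's circuitry: one rank of a file is forced to a DC state, the other two carry the same dynamic signal, and the difference detector is expected to indicate exactly when the forced state disagrees with the dynamic signal. The functions working_control and ineffective_control are hypothetical.

```python
def difference(outputs):
    """Difference detector: 1 if the file outputs are not all identical."""
    return int(len(set(outputs)) > 1)


def control_verified(controlled_output):
    """Check one controlled rank against two failure free dynamic ranks.

    controlled_output(forced, dynamic) models the signal processor output
    in the controlled rank for a requested DC state and a dynamic signal.
    """
    for forced in (0, 1):
        for dynamic in (0, 1):
            file_outputs = (controlled_output(forced, dynamic), dynamic, dynamic)
            expected = int(forced != dynamic)
            if difference(file_outputs) != expected:
                return False
    return True


def working_control(forced, dynamic):
    """The control line forces the requested DC state."""
    return forced


def ineffective_control(forced, dynamic):
    """Failed control: the processor keeps following the dynamic signal."""
    return dynamic


print(control_verified(working_control))
print(control_verified(ineffective_control))
```

An ineffective control never produces the expected disagreement, so the missing indication reveals a failure in either the control or the detector.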
4. Summary 
It may be concluded that singular rank testing techniques are 
a very powerful tool for verifying that a redundant system does not contain 
internal failures. This testing would be valuable for use in acceptance 
tests which verify that all the reliability designed into a redundant system 
is available, or as the failure testing for continuously monitored and 
repaired systems with periodic complete verification, or in a system which 
is only periodically diagnosed to determine if any repairs are needed. The 
basic singular rank testing is a simple and effective method to allow a 
redundant system to be tested as if it were a non-redundant system to verify 
that all signal processors are operating correctly, and that the restorers 
will introduce no additional errors. This is equivalent to verifying that 
an order three system is not vulnerable to single failures. Basic singular 
rank testing techniques may be combined with signal comparison to detect and 
locate failures which may exist in the signal processors, the restorers, the 
control equipment, and any signal processor difference detectors. 
Failure detection and location are often directly associated 
problems; failure location techniques are also effective failure detection 
techniques when they are available. It is expected that basic singular 
rank testing will be used as an effective and efficient technique for 
verifying that a redundant system is nearly failure free for regularly scheduled 
maintenance, or for relatively simple acceptance tests. The more complete 
detection and location techniques are expected to be used for the more 
thorough maintenance checks where any failures would be repaired, or for 
complete final tests after assembly. Signal comparison on all signal 
processor outputs may be used to continuously monitor and locate most failures 
in a continuously maintained system. These tests can be designed as part of 
almost any majority voted, multiple line system with a uniform order of 
redundancy throughout the portion being tested. No special signal 
simulation equipment is required, except the normally required inputs. The 
equipment required for the tests is described in more detail in part D of 
this section. 
C. Interwoven Rank Testing 
1. Complete Failure Detection 
In some systems it may be desirable to completely diagnose a 
redundant system without the use of the signal comparison and failure 
location technique described above. In some cases, it is possible to 
perform this diagnosis without the requirement for any of the test points 
necessary for signal comparison. One such technique, which will be described 
in the following paragraphs, is referred to as interwoven rank testing. 
It represents an extension of the singular rank testing, since the signal 
paths are interwoven between the ranks to form an equivalent non-redundant 
system in which the signal is switched from one rank to another at the 
restoring files. This is possible only if the system configuration has a 
sufficient degree of regularity. The example will assume that the system has 
restorers on the output of every signal processing file, and that these files 
may be assigned odd and even numbers in such a manner that odd files receive 
inputs only from even files, and likewise that even files receive inputs 
only from odd files. These restrictions are in addition to the assumptions 
on which singular rank testing is based. It will also be shown that the 
controls used for failure detection may be used to locate voter failures 
without requiring test points or difference detectors on the output of the 
voters. Comparison of signal processor outputs is sufficient to continually 
monitor signal processors and locate all voter failures. 
Shown in figures 18 and 19 are six replications of the previously 
discussed configuration, with the exception that the two control lines for 
each rank individually determine the state of the odd and even numbered 
signal processors. If the two control lines for each rank were connected, 
the system would be identical to the one used in describing singular 
rank testing. Consider that the control lines and associated signal 
processors are placed in the following states: AO="0", AE="1", BO="N", BE="0", 
CO="1", CE="N", as shown in figure 18a. If an input signal is applied to 
the first file of signal processors, the signal flow will take the path 
shown by the arrows. This is because the two remaining signal processors 
in each file have been placed in complementary static states. If all signal 
processors and voters in the path operate correctly the final output of the 
Nth processor (NO) will be the correct output signal. Reversing the states 
of control lines AO, AE, BE, CO should also provide the same result since 
this causes the pairs of signal processors in each file to assume the 
opposite complementary condition. The system may be completely exercised 
as a non-redundant system for either of the above DC states. 
Figure 18 Interwoven Rank Testing (panels 18a, 18b, 18c) 
Figure 19 Interwoven Rank Testing (panels 19a, 19b, 19c) 
Consider now the various combinations of input signals to which the 
1c voter was subjected as a result of the above tests. An examination 
of figure 18a reveals that these combinations are as follows: 
State No.   A   B   C   Output 
   3)       0   1   1     1 
   8)       0   0   1     0 
   7)       1   1   0     1 
   2)       1   0   0     0 
Note that the tests have verified that the voter operated correctly for the 
two signal states which could not be confirmed by the basic singular rank 
tests. This was the uncertain condition that a voter will make a correct 
decision when the signal processor preceding it in the same rank is in 
disagreement with the other two signal processors. Thus far our tests have 
verified the above uncertain condition for all odd numbered c rank voters, 
as well as all even numbered b rank voters. A total of four different input 
states have been verified for each of these voters. The remaining voters 
in these ranks may be similarly verified by the test conditions shown in 
figure 18b. The a rank voters are verified by the arrangement shown in 
figure 18c and figure 19a. This is seen to be a mirror image extension of the 
B-C rank tests. 
At this point in the tests, the correct operation of all signal 
processors has been verified. The various input signal combinations to 
which the voters were subjected are tabulated as follows: 
Rank a voters     Rank b voters     Rank c voters 
 A   B   C         A   B   C         A   B   C 
 0   1   1         0   1   1         0   1   1 
 0   0   1         0   0   1         0   0   1 
 1   1   0         1   1   0         1   1   0 
 1   0   0         1   0   0         1   0   0 
                   1   0   1 
                   0   1   0 
Note that the b rank voters have been verified for six of the eight possible 
signal combinations while the a and c ranks were examined for only four. 
Since the signal condition of all "1"s or all "0"s was previously shown to 
be trivial, it is evident that the b rank voters have been completely tested 
for proper operation under all combinations of input signals. The reason 
that only the b rank voters have been completely verified and not the a or 
c rank voters is that the b rank voters provided a common 
signal path in the tests involving the c rank voters and the a rank voters. 
The a and c rank voters may be completely verified by the tests shown in 
figures 19b and 19c. This is seen to cause the dynamic signal path to be 
interwoven between the a and c ranks. 
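The voter coverage obtained from the interwoven tests can be tallied with a short enumeration. The Python sketch below is a simplified model under stated assumptions, not a reproduction of figures 18 and 19: each test is described by the rank left dynamic in the odd files, the rank left dynamic in the even files, and a polarity for the static complementary states. Which static rank is "0" and which is "1" in a given panel is an assumption; since both polarities are enumerated, the totals do not depend on it. A voter is counted as observed only when the processor it feeds is the dynamic one.

```python
from itertools import permutations, product

RANKS = ("A", "B", "C")
NONTRIVIAL = {abc for abc in product((0, 1), repeat=3) if len(set(abc)) > 1}


def file_outputs(active, polarity):
    """Output triples (A, B, C) of one file: the active rank is dynamic,
    the two remaining ranks are static and complementary."""
    first, second = [r for r in RANKS if r != active]
    combos = set()
    for x in (0, 1):
        state = {active: x, first: polarity, second: 1 - polarity}
        combos.add(tuple(state[r] for r in RANKS))
    return combos


def coverage(tests):
    """Map (voter rank, parity of the file it follows) to observed inputs."""
    seen = {}
    for odd_active, even_active, polarity in tests:
        # Voters after odd files feed even files; only the voter in the
        # rank that is dynamic in the even files is observable.
        seen.setdefault((even_active, "odd"), set()).update(
            file_outputs(odd_active, polarity))
        # Voters after even files feed odd files.
        seen.setdefault((odd_active, "even"), set()).update(
            file_outputs(even_active, polarity))
    return seen


# Twelve conditions: six ordered rank pairs, each with two polarities.
ALL_TESTS = [(o, e, p) for o, e in permutations(RANKS, 2) for p in (0, 1)]

# The four rank pairings corresponding to figures 18a, 18b, 18c and 19a.
FIRST_FOUR = [(o, e, p)
              for o, e in (("B", "C"), ("C", "B"), ("A", "B"), ("B", "A"))
              for p in (0, 1)]

partial = coverage(FIRST_FOUR)
full = coverage(ALL_TESTS)
for key in sorted(full):
    print(key, len(partial.get(key, set())), len(full[key]))
```

In this model the first four pairings give six combinations for the b rank voters and four for the a and c rank voters, and all twelve conditions give six for every voter, matching the tabulation above.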
Interwoven rank testing may therefore be used as an all inclusive 
procedure for detecting any failures of voters or signal processors without 
requiring access to any test points within the system. The system is reduced 
to sets of equivalent non-redundant systems by appropriate controls. It is 
then completely exercised and tested to determine if all functions are 
performed correctly. The success of all tests verifies that all signal 
processors and voters are failure free. If any of the tests result in an 
incorrect output, then some failure is present in the system. The detection 
of a failure gives very little information concerning its location within 
the system. 
Although interwoven rank testing does not require access to 
test points within the system, it is a more elaborate approach which requires 
a degree of regularity in the system configuration as well as the 
establishment of twelve separate test conditions for an order three system, instead 
of the three required for singular rank testing and voter signal comparison. 
The system should be completely exercised for each of these tests to verify 
that the system is failure free if all tests are successful. 
2. Failure Detection and Location for Maintenance 
The alternate file controls described above may be used to 
detect and locate failures during normal system operation. Signal com-
parators are required only on the output of every signal processing file. 
If a difference detector is integrally connected with each pro-
cessor file, then the correct operation of the signal processors may be 
continuously monitored for maintenance purposes. If only test points are 
available, they may be periodically tested for signal disagreement. Any 
disagreement on the output of a signal processor will indicate that there 
is a failure in that signal processor or the voter which precedes it. This 
failure may be repaired during system operation if the other replicated 
signal processor and voters in that file continue to operate correctly. If 
a module consists of one signal processor and the voter which provides its 
input, then repair is accomplished by replacing that module. This procedure 
is useful for detecting and locating failures which cause errors, but is 
not sufficient for determining the location of some failures within the 
voters. If all signal processors are failure free, the voter portion of 
the modules may be completely tested by imposing various combinations of 
signals at the voter inputs and examining the associated signal processor 
outputs for signal disagreement. To locate all possible voter failures, it 
is necessary to provide a means of examining signal processor outputs while 
subjecting the associated voters to the various combinations of input signals. 
This may be accomplished by controlling separately the odd and even files of 
the system or sub-system under test, as described in the previous paragraphs 
and illustrated in figure 18. For example, suppose that the odd files are 
allowed to operate normally and that each one of the three signal processors 
in the even files are in turn placed in each of the static DC states. The 
outputs of the odd files are monitored for signal disagreement during each 
of the successive tests. Any disagreement on the output of an odd file 
signal processor will indicate that there is a failure in the voter which 
provides the input to that processor. Similarly, the outputs of the even 
files are monitored for each of the successive tests. Signal disagreement 
should be indicated whenever the control signal disagrees with the correct 
signal on the other processors in that file. If this indication does not 
occur, then either the control to that file is not effective, or there is a 
failure in the difference detector. The above testing is then repeated with 
the role of the odd and even files interchanged, each successive test 
examining the signal processors for disagreement. With proper design, any 
failures in the voters, the difference detectors, or the control hardware 
may be repaired while the system is in operation. Removal or disablement 
of one replicated voter or processor will not seriously jeopardize system 
reliability if the remaining replications of voters and processors continue 
to operate correctly. 
D. Circuit Implementations 
1. Control Circuitry 
Consider now the mechanization for controlling the output 
of several signal processors with a single control line. A typical signal 
processor output is shown in figure 20. The circuitry shown is seen to be 
in the usual form of D-TL NAND gates. The base return resistor RB may be 
connected to the emitter ground return if the associated transistor is 
representative of the low leakage silicon devices found in integrated 
circuitry. Since this resistor is normally connected to ground by a discrete 
connective path, it is a relatively simple matter to provide RB with a 
separate external connection. 
Figure 20 Signal Processor Output Control 
Suppose further that RB is chosen to be equal to or less than RA. If RB 
is connected to ground potential the circuitry will operate normally. If RB 
is connected to the +E supply, QO will conduct and saturate regardless of 
the signals present on the inputs 1, 2, - - - N. This is seen to be the 
condition where the control line potential forces the signal processor 
output to assume the "0" state. If the control line is connected to an equal 
potential of opposite polarity (-E), transistor QO will be cut off, thus 
causing the output to assume the "1" state regardless of the signals present on 
inputs 1, 2, - - - N. The method described to implement the required control 
function is one of several possible approaches. It is an approach which 
represents a simple modification to existing circuitry and requires only 
a single control line which is grounded in normal operation. 
Another alternative requires control of both the base return line 
and the emitter ground line, but does not restrict the value of the base 
return resistor, RB, and does not require a negative voltage supply. The 
same method described above is used to cause the "0" output, i.e., to 
connect the control line to a voltage which is sufficiently positive to cause 
the output to saturate. For most circuits, +E will be of sufficient 
magnitude for this purpose. To effect a "1" output, the emitter ground line 
may be removed, so that the output cannot be a low impedance to ground, 
regardless of input signals. This approach may be particularly useful when 
it would be undesirable to reduce RB to less than RA, or in circuits where the 
base input diode, DB, is replaced by an emitter follower to increase base 
current drive. This approach places little restriction on circuit 
configuration or values and the test power supplies, but requires two 
separate control lines, both of which are grounded in normal operation. 
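The logical effect of the two control methods can be summarized in a behavioral sketch. The Python below is an idealized logic-level model, not a circuit simulation; the function names and the string labels "GND", "+E" and "-E" are assumptions chosen for the example, and the priority given to an open emitter line in the second method is likewise assumed.

```python
def nand(inputs):
    """Normal D-TL NAND function of the signal processor output stage."""
    return int(not all(inputs))


def single_line_output(inputs, control="GND"):
    """First method: the base return resistor RB goes to one control line."""
    if control == "GND":      # normal operation, state "N"
        return nand(inputs)
    if control == "+E":       # output transistor saturated, forced "0"
        return 0
    if control == "-E":       # output transistor cut off, forced "1"
        return 1
    raise ValueError("control must be 'GND', '+E' or '-E'")


def two_line_output(inputs, base_return="GND", emitter_grounded=True):
    """Second method: base return line and emitter ground line are
    controlled separately; no negative supply is needed."""
    if not emitter_grounded:  # output cannot be pulled low, forced "1"
        return 1
    if base_return == "+E":   # output transistor saturated, forced "0"
        return 0
    return nand(inputs)       # both lines grounded, state "N"


signals = (1, 1, 0)
print(single_line_output(signals), single_line_output(signals, "+E"),
      single_line_output(signals, "-E"))
print(two_line_output(signals), two_line_output(signals, base_return="+E"),
      two_line_output(signals, emitter_grounded=False))
```

Either model gives the three states "N", "0" and "1" needed by the rank tests; the difference lies only in the number of control lines and the supplies required.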
2. Difference Detector Circuit 
Shown in figure 21 is a typical discrete component difference 
detector which may be utilized in the foregoing tests. The output level 
is a logical "0" only if all inputs are identical. Any disagreement of 
input signals will cause the first transistor to conduct and thus cause 
the second transistor to assume the "1" state (cut off). The circuit is 
seen to perform the functional operation of "exclusive OR" for two inputs. 
Figure 21 Difference Detector 
The output of the difference detector may be used to trigger a 
flip-flop in order that any momentary disagreement of input signals may be 
displayed. This would be useful in detecting any sporadic errors which might 
otherwise remain unnoticed. As previously mentioned, the difference 
detectors might be combined with suitable indicators and packaged as an 
integral part of the system circuitry. This would eliminate any loading 
effects due to the use of test leads and external test equipment in 
monitoring test points. In addition this would provide maintenance personnel with 
a simultaneous display of the condition of the system and the location of 
faulty modules. 
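The combination of a difference detector and a flip-flop can be sketched at the logic level. The class below is a hypothetical Python model, not the circuit of figure 21: the detector output is "0" only when all inputs agree, and a latch bit records any momentary disagreement until it is reset.

```python
class LatchedDifferenceDetector:
    """Difference detector followed by a flip-flop that holds any
    disagreement for later display (illustrative model only)."""

    def __init__(self):
        self.latched = 0

    @staticmethod
    def difference(inputs):
        """Logical "0" only if all inputs are identical."""
        return int(len(set(inputs)) > 1)

    def sample(self, inputs):
        """Return the instantaneous detector output and set the latch
        if a disagreement is present."""
        d = self.difference(inputs)
        self.latched |= d
        return d

    def reset(self):
        """Clear the indicator, for example after a repair."""
        self.latched = 0


detector = LatchedDifferenceDetector()
readings = [detector.sample(s) for s in [(1, 1, 1), (1, 0, 1), (0, 0, 0)]]
print(readings, detector.latched)
```

The latch stays set after the inputs return to agreement, which is the property that makes sporadic errors visible to maintenance personnel.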
V. Summary and Conclusions 
1. General 
It has been shown that the special features of a redundant 
configuration impose unique requirements on the design of functional 
circuitry and the facilities required for test. Redundancy is a powerful 
tool for achieving extended reliability, but it should not be encumbered 
with circuitry which is inherently unreliable or contains particular failure 
modes which prevent the associated system configuration from operating 
independently. An appreciation of this philosophy allows the achievement 
of reliability goals with a minimum of additional complexity. Effective 
circuit design is required to obtain the desired balance between complexity 
and reliability in redundant systems. 
2. Magnetic Logic 
Although magnetic logic is often cited as having several 
features particularly applicable to spaceborne computers, the 
disadvantages of magnetic logic strictly limit its usefulness in general logic 
systems, and particularly for redundant spaceborne systems. Some basic 
disadvantages are listed below: 
1. Lack of compatible steady output signals. 
2. Excessive power consumption for speeds 
comparable to low-power microcircuitry. 
3. Extensive peripheral equipment, including 
high current drivers. 
4. Limited fan-out and gain characteristics. 
5. High peak power requirements. 
6. Indeterminate reliability performance due to 
extensive hand wiring with fine wire and numerous 
connections, as well as unavailability of accurate 
reliability data. 
7. Complexity required for general logic functions. 
8. Lack of suitable restoring element for use in 
redundant systems. 
Magnetic logic does, however, offer non-volatile storage and 
reduced average power for low computing speeds. Magnetic devices appear 
to be suited to special applications where certain logic functions, such 
as transfer and OR, are intermixed with the memory function, and very low 
speed capability is acceptable. 
3. Integrated Semiconductor Logic 
Integrated semiconductor circuitry offers many character-
istics which are desirable for circuits to be used in redundant space-
borne systems. Some general features of integrated semiconductor logic 
when compared to other commonly available logic systems are: 
1. Significantly reduced size, weight, and power consumption. 
2. Availability of general logic elements, as well as 
special purpose circuits. 
3. Predictable operating characteristics over wide 
environmental variations. 
4. Availability of accurate reliability data. 
5. Extensive research and development for new integrated 
circuits. 
6. High frequency capability. 
7. Compatibility with synthesis and testing techniques 
for redundant systems. 
A comparison of the currently available integrated logic elements 
indicates that diode-transistor logic (D-TL) is the most suitable for use 
in redundant spaceborne systems. D-TL offers excellent operating 
characteristics, such as easily distinguished "1" and "0" states resulting in 
high DC stability and compatible output signals, high noise immunity, 
self contained drive current, allowable parameter tolerances, input 
isolation, and other characteristics which permit efficient redundant design. 
D-TL frequency capability exceeds the requirements of most spaceborne 
systems, and D-TL requires relatively low power, so that total power dissipation 
and temperature stress are minimized. 
A majority voting restorer, designed using interconnected NAND 
elements, has been described which is not subject to the detrimental 
failures of conventional majority voters. 
4. Failure Testing 
It is a characteristic of redundant systems that they offer a 
high reliability for a period of time after the initially failure free 
condition, and that the system reliability decreases rapidly when internal 
failures are present. It is therefore important to insure that no initial 
failures exist in a redundant system to obtain maximum system reliability. 
This reliability may be required for a single time interval without further 
maintenance, such as for spaceborne systems, or it may be required for 
repeated time intervals, where the system is restored to the initially 
perfect condition prior to each interval. The latter method may be used 
to obtain high mission reliability by maintaining a redundant system 
which is used repetitively, such as the ground support and launch 
equipment used prior to and during each mission. Since an initially failure 
free order three system can withstand any single failure, as well as a 
relatively large number of randomly scattered failures, it offers high 
reliability for the period of time when the probability of individual 
failures is low. Techniques are described which permit even higher 
reliability by combining periodic maintenance with continuous maintenance of a 
redundant system. 
It has been shown that a relatively simple test referred to as 
singular rank testing may be used to determine that all of the replicated 
signal processors are working properly. If the signal processor fails 
whenever any of its parts fail, success of the singular rank tests will 
verify that all signal processors are failure free. Success of singular 
rank testing will also verify that the majority voters are sufficiently 
failure free to insure that the system is not vulnerable to single failures. 
Singular rank testing effect ively isolates each rank of the replicated non-
1-87 
red'lndant sys t em b:r for cine each r ernainine pair of replica ted ra nks to 
have static complementary binary 01J.tP1J.t s . System o'ltP'.lt i s monitored t o 
de t ermine if each i ndivid'.lal r ank i s able t o p erform all system f '.l.Tlctions 
correctly , i n a manner similar to t he verification of a non-red1mdant sys-
tem. Si ne'.1lar rank testing is expected to be t he most efficient and effective 
method for diagnosing eq'llpment which has been recently assembled from com-
pletely tested" mod1J.les, since the probability t hat t he f ew 'mdetectable 
fail1Jl'es might have occ1J!'red since complete testing is very low. 
A somewha t more complicated testing proced1J.re, referred to as inter-
woven rank testing , has been described which will completely test all voters 
to ins'lre t hat they will ~~ke correct decisions f or all possible inP1J.t 
combinations. It has been s hown that the fail1J!'e detection proced1J!'es may 
b e accomplished by controlling one or more normally gro'mded common lines 
for each of t he replicated ranks of the system, witho1J.t altering the logic 
design or incl'.lding any additional hardware except to provide access to 
t hese lines. Sing1J.lar rank testing places no restrictions on system size 
or confi~lration . 
The characteristics of redundant systems have been shown to introduce
unique properties to the problem of failure location and faulty module
replacement. Although a redundant system is more complex than its conventional
counterpart, failure location within an operating system does not
require the operator skill and simulation equipment usually required to
locate failures in a non-redundant system. Since an operating redundant
system always has at least one correct signal available at every point in
the system, these correct signals may be used as a basis of comparison to
other versions of the nominally identical signal. A difference detector
on the signal processor outputs to restorers may be used to indicate
failures among these signal processors. If the detector includes memory,
it will also detect and locate transient or sporadic failures. These same
difference detectors may be used for the somewhat more difficult task of
locating those failures in the voters which do not cause errors when all
voter inputs are identical, as well as verification that the test controls
are actually capable of proper operation. The method which has been
described uses the same types of control as singular and interwoven rank
testing, and does not jeopardize system operation if all signal processors
are operating correctly.
BIBLIOGRAPHY
1. J. L. Haynes, "Logic Circuits Using Square-Loop Magnetic Devices:
A Survey," IRE Trans. on Elec. Computers, Vol. EC-10, No. 2 (June 1961).
2. H. D. Crane, "A High Speed Logic System Using Magnetic Elements and
Connecting Wire Only," Proc. IRE, Vol. 47, pp. 63-73 (Jan. 1959).
3. D. R. Bennion and H. D. Crane, "Design and Analysis of MAD Transfer
Circuitry," Proc. 1959 Western Joint Computer Conf., San Francisco,
Calif., pp. 21-36 (March 1959).
4. J. A. Rajchman, "The Transfluxor," Proc. IRE, Vol. 44, pp. 321-332
(March 1956).
5. H. D. Crane, "Design of an All-Magnetic Computing System," IRE Trans.
on Elec. Computers, Vol. EC-10, No. 2 (June 1961).
6. "Aviation Week and Space Technology," Aug. 19, 1963, pp. 93-103.
7. A. R. Helland and W. C. Mann, "Failure Effects in Redundant Systems,"
Westinghouse Report EE-3351 (March 1963).
8. Report No. NADC-EL-6319, Micro-Notes No. 3, "Information on Micro
Electronics for Navy Avionics Equipment" (June 1963).
Appendix 2 
RELIABILITY OF IMPERFECT REDUNDANT SYSTEMS 
by 
R. S. Bray 
P. A. Jensen 
C. G. Masters 
September 1963 
TABLE OF CONTENTS
I. INTRODUCTION
II. MISSION RELIABILITY
III. PROCEDURES FOR ESTIMATING THE SYSTEM RELIABILITY
    A. Estimation of the Expected Value of Mission Reliability with only
       the Information that the System is Operating at t1
    B. Estimation of the Expected Value of Mission Reliability with
       Tests at t1 Helping to Establish the Circuit Failure Rates
    C. Improvement of the Estimate Through Failure State Tests
    D. Determining the Mission Reliability of Large Systems
    E. Using Tests to Determine Both the Failure States of the System
       and Failure Rates of the Circuits at t1
IV. TEST OF THE HYPOTHESIS THAT MISSION RELIABILITY IS GREATER
    THAN A REQUIRED VALUE
V. CONCLUSIONS AND RECOMMENDATIONS
I. INTRODUCTION 
The problem of the pre-launch testing of spaceborne electronic systems is becoming
more severe as the systems increase in complexity while decreasing in physical size. The
testing problem will soon become much worse as systems are made redundant and in-flight
tests are used to determine the successive actions of deep space probes. Tests can no
longer be made adequately on the basis of a strict "working" or "failed" criterion because a 
redundant system may contain many internal failures and still be operating at the time of 
test. Such a system might easily have a much lower probability of successfully completing 
a mission than a functionally identical non-redundant system. 
In addition, the large number of subsystems in a complex redundant network will make
complete check-out (i.e., tests of each subsystem) virtually impossible. Consequently, a
new method must be devised which will permit a statistical estimate to be made of the
probability of mission success (reliability). This estimate must be based on the results of a
limited amount of testing and should be as accurate as possible.
II. MISSION RELIABILITY 
The problem may be stated more specifically as follows. A test of a redundant machine
will be made at some time t1. (It is expected that some failures will be found in the equipment,
and the object of the test is merely to determine the number and pattern of the failures in the
system.) From the test data, the probability that the redundant system under test will operate
successfully throughout a mission which begins at time t1 and ends at time t2, given
that the system is operating at t1, is estimated. This probability is defined as the mission
reliability (R) and is a function of the system organization, the state of the system at t1, the
failure rates of the parts of the system, the starting time (t1) of the mission, and the mission's
duration, t2 - t1. At some time t0, which is earlier than both t1 and t2, all circuits in the
system are assumed perfect. As time progresses they are assumed to fail in a random manner with a
constant failure rate. At t1, when the system is ready to begin the mission, the system must
be in one of a finite number of possible failure states. The failure states are determined by
the number and location of failed circuits in the system. For example, consider the
multiple-line redundant network of figure Q-1. A restoring circuit, indicated by a circle, will
make a correct decision if at least two of its inputs are correct.
Figure Q-1. A Two Stage Example of a Redundant System (Stage A followed by Stage B)
Assume, for simplicity of explanation, that the restoring circuits of this system are
perfectly reliable and that only signal processing circuits, indicated by rectangles, can fail.
The possible failure states of this system are listed in columns 2 and 3 of Table 1.
TABLE 1

   1          2             3                      4                                 5
Failure   Number of     Number of
 State   Failures in   Failures in            R_i(t2) * **                     P_i(t1) ***
           Stage A       Stage B

   1          0             0        [p_m^3 + 3 p_m^2 (1-p_m)]^2        [p^3] [p^3]
   2          0             1        [p_m^3 + 3 p_m^2 (1-p_m)] p_m^2    [p^3] [3 p^2 (1-p)]
   3          0             2        0                                  [p^3] [3 p (1-p)^2]
   4          0             3        0                                  [p^3] [(1-p)^3]
   5          1             0        p_m^2 [p_m^3 + 3 p_m^2 (1-p_m)]    [3 p^2 (1-p)] [p^3]
   6          1             1        p_m^4                              [3 p^2 (1-p)] [3 p^2 (1-p)]
   7          1             2        0                                  [3 p^2 (1-p)] [3 p (1-p)^2]
   8          1             3        0                                  [3 p^2 (1-p)] [(1-p)^3]
   9          2             0        0                                  [3 p (1-p)^2] [p^3]
  10          2             1        0                                  [3 p (1-p)^2] [3 p^2 (1-p)]
  11          2             2        0                                  [3 p (1-p)^2] [3 p (1-p)^2]
  12          2             3        0                                  [3 p (1-p)^2] [(1-p)^3]
  13          3             0        0                                  [(1-p)^3] [p^3]
  14          3             1        0                                  [(1-p)^3] [3 p^2 (1-p)]
  15          3             2        0                                  [(1-p)^3] [3 p (1-p)^2]
  16          3             3        0                                  [(1-p)^3] [(1-p)^3]

*   R_i(t2) is the probability of correct system operation at time t2 given that the ith
    failure state exists at t1.
**  All the p_m's in this column are probabilities that a circuit is successful at t2, given
    it was successful at t1.
*** All the p's in this column are probabilities that a circuit is successful at t1, given
    it was successful at t0.
For each of the failure states of Table 1, the reliability of the system can be calculated
at t2. This is done as follows: If the failure rate, λ, of a circuit is constant and known, the
probability that a circuit is successful at t2, given it is successful at t1, is the exponential

$$p_m = e^{-\lambda (t_2 - t_1)} \tag{1}$$

For the system to be successful at the end of the mission, two or three circuits in each
stage must be successful. The probability that the system meets this requirement depends
on the failure state of the system at t1 and the value of p_m. For instance, for failure states
3, 4, 7, 8 and 9-16, the probability of correct system operation must be zero because there
are too many failures at t1. Because R_i is defined as this probability, given the system is
in the ith state at t1:

$$R_i = 0 \quad \text{for } i = 3, 4, 7, 8, 9, \ldots, 16$$

For failure state 1, the reliability is the probability that two or three circuits in each
stage are successful at t2. Thus:

$$R_1 = \left[ p_m^3 + 3 p_m^2 (1 - p_m) \right]^2$$

The reliability of the system for other failure states is shown in column 4 of Table 1.
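The enumeration behind Table 1 can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (two stages of three circuits, perfect restorers, failed circuits stay failed); the function names and the sample values p = p_m = 0.9 are chosen here for illustration and are not part of the original report.

```python
from itertools import product
from math import comb

def stage_state_prob(k, p):
    """Probability that exactly k of a stage's 3 circuits have failed by t1."""
    return comb(3, k) * p ** (3 - k) * (1 - p) ** k

def stage_mission_rel(k, pm):
    """Probability that at least 2 circuits of a stage work at t2,
    given that k circuits were already failed at t1."""
    if k == 0:
        return pm ** 3 + 3 * pm ** 2 * (1 - pm)
    if k == 1:
        return pm ** 2      # both remaining circuits must last the mission
    return 0.0              # the stage has already lost its majority

def failure_state_table(p, pm):
    """Rows (state, failures in A, failures in B, R_i, P_i) in Table 1 order."""
    rows = []
    for i, (a, b) in enumerate(product(range(4), repeat=2), start=1):
        r_i = stage_mission_rel(a, pm) * stage_mission_rel(b, pm)
        p_i = stage_state_prob(a, p) * stage_state_prob(b, p)
        rows.append((i, a, b, r_i, p_i))
    return rows

# Illustrative values only (assumed, not prescribed by the procedure).
table = failure_state_table(p=0.9, pm=0.9)
for row in table:
    print("%2d  A=%d  B=%d  R_i=%.4f  P_i=%.6f" % row)
```

Only states 1, 2, 5 and 6 come out with a nonzero R_i, and the sixteen P_i sum to one, which is a quick consistency check on the table.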
Column 5 of Table 1 lists the probabilities that the particular failure states will be
present at t1. The factor p in this column is the probability of success of a circuit at t1
given the circuit was successful at t0. These probabilities will find use in later discussions.
Two things must be known if the mission reliability of the system is to be determined
with 100% confidence: the failure state of the system and the failure rates of the circuits
(needed to calculate p_m). For large systems both these factors may be very difficult or
impossible to determine exactly. To find the failure state of a system, the failure state of 
each stage must be known. This may require a considerable amount of testing, probably a 
test of all circuits in the system. The failure rates of the circuits can only be determined 
exactly with a test of an infinite number of circuits all operating under the same environments 
as the circuits in the system. Of course, with limited testing allowed at t1 it is improbable 
that the exact failure state of the system can be found. Estimates and their accuracy are the 
subject of the remainder of this report. 
III. PROCEDURES FOR ESTIMATING THE SYSTEM RELIABILITY 
In the study of this problem, several ways have been proposed to estimate a system's 
mission reliability with varying degrees of accuracy and varying levels of confidence. Four 
of these are described below. 
A. ESTIMATION OF THE EXPECTED VALUE OF MISSION RELIABILITY WITH ONLY
THE INFORMATION THAT THE SYSTEM IS OPERATING AT t1
Using the design failure rates* one can estimate the mission reliability with only the
information that the system is operating successfully at t1. This is done using the equations
representing the reliability of the system at time t given only that all circuits are operating
successfully at time 0. The system reliability R(t) can be written as the probability of
successful operation from time 0 to time t. The reliability of the system of figure Q-1 is:

$$R(t) = \left[ p^3 + 3 p^2 (1 - p) \right]^2, \quad \text{where } p = p(t) = e^{-\lambda t} \tag{2}$$

A plot of R(t) for the redundant system of figure Q-1 is shown in figure Q-2a.

* The design failure rates are those assigned to the circuits during the design of the system.
They are generally derived from controlled life testing of components similar to those
used in the circuits or from field tests of similar components.
Figure Q-2. Reliability vs Time for a Redundant System (reliability, 0 to 1.00, versus time
in hundreds of hours).
A) With No Test at t1.
B) With a Test Determining the Success of the System at t1.
If one tests the system at a time t1 and finds it to be working successfully, this
information can be used to adjust the system reliability for times greater than t1 to take
account of the condition of success at t1. A curve must now be determined which gives the
reliability of the system given successful operation at t1. This is expressed as R[t | R(t1)].
For t < t1, the reliability must be unity, because it is assumed that once a system fails
it stays failed. Then:

$$R[t \mid R(t_1)] = 1, \quad t < t_1 \tag{3}$$

For t > t1, the reliability is:

$$R[t \mid R(t_1)] = \frac{R(t)}{R(t_1)}, \quad t > t_1 \tag{4}$$

This is derived from the definition of conditional probability,

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

since a system that is operating at t > t1 must also have been operating at t1.
A plot of equations (3) and (4) is shown in figure Q-2b for a particular t1 and the system
shown in figure Q-1.
Using equation (4) the mission reliability can be written:

$$R(t_2, t_1) = \frac{R(t_2)}{R(t_1)} \tag{5}$$

Thus, the mission reliability can be determined simply by using the reliability equations of
the system and the design failure rates of the circuits of the system.
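As a rough numerical illustration of the conditional reliability just described, the following Python sketch evaluates the system reliability of the two stage example and the mission reliability as the ratio R(t2)/R(t1). The failure rate and the times t1 and t2 are assumed values picked only for the example, and the function names are hypothetical.

```python
from math import exp

def system_reliability(t, lam):
    """Two order-three stages with perfect restorers, all circuits good at time 0."""
    p = exp(-lam * t)
    return (p ** 3 + 3 * p ** 2 * (1 - p)) ** 2

def conditional_reliability(t, t1, lam):
    """Reliability at time t given that the system was found working at t1."""
    if t <= t1:
        return 1.0      # a system working at t1 cannot have failed earlier
    return system_reliability(t, lam) / system_reliability(t1, lam)

lam = 1.0e-3            # assumed circuit failure rate per hour (illustrative)
t1, t2 = 200.0, 500.0   # assumed test time and end of mission, in hours
mission = conditional_reliability(t2, t1, lam)
print(round(system_reliability(t2, lam), 4), round(mission, 4))
```

The mission reliability is always at least as large as the unconditional R(t2), because the systems that had already failed by t1 are excluded from the population.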
The question now arises, of what value is this result? First, assuming the failure
rates used in the calculation of R are perfect, if a large number of systems were constructed
and run until t1, approximately R(t1) x 100% of them would be working. If all systems that
had failed at t1 were then thrown away and the test continued until t2, approximately
R(t2, t1) x 100% of the population of systems working at t1 would be working at t2.
No information was given for this estimate about the failure state of the system at t1,
except that the system was in one of the failure states for which the system is successful.
For the example, these are states 1, 2, 5 and 6. This limited information about the failure
state makes it necessary to approximate the mission reliability by an expected value given
that the system is in one of the four successful failure states. The approximation has a
considerable effect on the accuracy of the estimate which is described in detail in Section III-C
of this report.
B. ESTIMATION OF THE EXPECTED VALUE OF MISSION RELIABILITY WITH TESTS
AT t1 HELPING TO ESTABLISH THE CIRCUIT FAILURE RATES
Another problem which threatens the validity of the R calculated by this method is the 
uncertainty of the failure rates of the components of the system. The failure rates used in 
design are derived from a variety of sources and are almost surely not exactly accurate for 
any operational system. A realistic way to use design failure rates is to assign confidence 
limits to their values. With these one can say with a certain confidence that the failure 
rates of his parts are within a region determined by his confidence limits. This data is often 
available with design failure rates. Using the two extremes of failure rates, upper and lower 
confidence limits can be calculated for the mission reliability. The statement can then be 
made with a certain confidence that the mission reliability is within the interval of its con-
fidence limits. It is instructive to point out that if the failure rates of all parts are perfectly 
known, there is 100% confidence in the calculated value of mission reliability. If, however, 
the failure rates are uncertain, as is always the case, confidence limits should be indicated 
for the mission reliability which reflect the uncertainty of the failure rates. 
Estimation of the mission reliability of the system using the failure rates used in design 
has one serious failing. These failure rates often do not accurately describe the actual com-
ponents. The design failure rates may have been determined under different environmental 
conditions than those of the system in use, or components in the system may have been subjected
to different manufacturing conditions than those used to derive the design failure rates. 
These and other factors might cause the circuits in the system to have different failure 
rates than those predicted in original design. Tests performed at t1 can be used to deter-
mine if the actual failure rates are indeed different from design failure rates. If they are 
different the tests will be used to estimate the actual failure rate. 
The first task is to test the null hypothesis that the actual average failure rates are 
the same as those used in design. To do this, the system must be split into groups of 
circuits with each group comprised of circuits of identical design. Using the design failure 
rates, the number of failures that can be expected in each group at t1 is calculated. 
The expected number of circuits still operating is p_j n, where p_j = e^{-λ_j t1}* and n is
the number of circuits in the group; the expected number of failures is therefore (1 - p_j) n.
About this expected value one can construct an interval specifying the number of failures he
is willing to observe at t1 and still accept the hypothesis that the actual failure rate is that
used in design.
The next step in the procedure is to test the circuits. If possible, all circuits are
tested** and the numbers of failures recorded. If the number of failures at t1 in n samples
is within this interval the design failure rate is used to calculate the mission reliability. If
the number of failures is not within the interval a new failure rate is calculated using the
observed data at t1. The mean of this new failure rate is λ_0 and is determined from the
equation

$$\lambda_0 = -\frac{\ln (x / n)}{t_1}$$

where x is taken here to be the number of the n tested circuits found operating at t1, so
that x/n estimates e^{-λ_0 t1}.
Confidence limits are placed on this calculated rate and the extremes of the confidence
interval are used to calculate confidence limits on the estimates of the mission reliability of
the system.
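The decision rule of this procedure can be sketched as follows. The text does not say how the acceptance interval is constructed, so the sketch assumes an equal-tailed binomial interval at a 95% level; that choice, the function names, and the sample numbers are all assumptions made for illustration. The re-estimated rate uses the surviving fraction of the tested circuits.

```python
from math import comb, exp, log

def binom_pmf(k, n, q):
    """Probability of exactly k failures among n circuits, each failing with probability q."""
    return comb(n, k) * q ** k * (1 - q) ** (n - k)

def acceptance_interval(n, lam_design, t1, alpha=0.05):
    """Failure counts consistent with the design rate (assumed equal-tailed interval)."""
    q = 1.0 - exp(-lam_design * t1)     # probability a circuit has failed by t1
    cdf, lo, hi = 0.0, None, None
    for k in range(n + 1):
        cdf += binom_pmf(k, n, q)
        if lo is None and cdf > alpha / 2:
            lo = k
        if cdf >= 1 - alpha / 2:
            hi = k
            break
    return lo, hi

def observed_failure_rate(failures, n, t1):
    """Rate estimated from the surviving fraction: -ln(survivors / n) / t1."""
    survivors = n - failures
    if survivors == 0:
        raise ValueError("no survivors: the rate estimate is unbounded")
    return -log(survivors / n) / t1

def failure_rate_for_mission(failures, n, lam_design, t1, alpha=0.05):
    """Keep the design rate if the count is inside the interval, else re-estimate."""
    lo, hi = acceptance_interval(n, lam_design, t1, alpha)
    if lo <= failures <= hi:
        return lam_design
    return observed_failure_rate(failures, n, t1)

# Hypothetical group: 100 identical circuits tested at t1 = 100 hours.
n, lam_design, t1 = 100, 1.0e-3, 100.0
lo, hi = acceptance_interval(n, lam_design, t1)
print(lo, hi)
print(failure_rate_for_mission(10, n, lam_design, t1))   # count inside the interval
print(failure_rate_for_mission(30, n, lam_design, t1))   # count outside the interval
```

With about ten failures expected under the design rate, a count of 10 leaves the design rate in force, while a count of 30 falls well outside the interval and triggers the re-estimate.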
The question immediately arises, "Why test the null hypothesis at all if test data is to 
be accepted in preference to the design failure rates?" This is done because under the con-
dition that the null hypothesis is met, the correspondence of the two sources of failure rate 
estimates would result in a higher confidence in the final estimate than either source alone 
can provide. When the null is rejected and the test data alone is used, the confidence in the 
estimate is reduced. 
C. IMPROVEMENT OF THE ESTIMATE THROUGH FAILURE STATE TESTS 
In this reliability estimation procedure a more accurate estimate is obtained by testing 
at t1 to determine the failure state of the system. If the failure state were known exactly and
the failure rates of the circuits were accurate, the mission reliability of the system could be
calculated with no equivocation. Thorough testing at t1 could determine exactly the failure
state of the system, but since thorough testing is not of interest in this study the failure state
will be known imperfectly. One will have a number of alternatives each with a certain
probability given the results of the tests.
* λ_j = design failure rate of the jth type of circuit.
** Note, if the system is too large to permit complete testing, a random sample of each 
type of circuit is taken and the number of failures observed in the sample is used to 
estimate the actual failure rates. 
Consider again the example of figure Q-1. Each stage of the system has four failure
states: zero, one, two, or three failed circuits. If no information is available at t1, not
even that the system is operating, every stage may be in any one of these states. Thus there
are 4^2 = 16 possible failure states of the system. They have been listed in column 1 of Table 1.
Associated with the ith failure state is a probability P_i, which is the probability that the
system is in this state at t1 given that all circuits were successful at t0. Thus, with no
information at t1 on the condition of the system, the probability that the system is in the
state in which no circuits have failed is

$$P_1 = p^6$$

The factor p is the probability of success of a circuit at t1. The probability of the
failure state in which one circuit is failed in Stage B is

$$P_2 = 3 p^5 (1 - p)$$

The probabilities of occurrence of the states given no information on the condition of the
system at t1 are listed in column 5 of Table 1.
Associated with each of the failure states is a reliability of the system at t2 given that
the system is in the failure state at t1. This is written as R_i(t2) and is shown for each state
in column 4 of Table 1.
The reliability of the system is written as the sum over all i of the product of the
probability of the ith failure state and the mission reliability given that the system is in the
ith state at t1. Thus:

$$R = \sum_{\text{all } i} P_i R_i$$
If tests are made at t1 that give some information on the condition of the system, the
number of possible failure states is markedly reduced, and the reliability estimate available
at t1 is much more accurate. For instance if one tests the system of figure Q-1 and finds it
functioning correctly at t1, each stage must have no more than one circuit failure. Thus,
only four states are possible after this test. These are states 1, 2, 5 and 6. The probability
that the system is in a particular state must be adjusted to account for the known condition
that the system functions at t1. Thus, for the example the probability of being in state 1 with
no failures is:

$$P'_1 = \frac{P_1}{\sum_{i = 1, 2, 5, 6} P_i} \tag{6}$$

The denominator in equation (6) is the probability that the system is in one of the four
possible states.
In general, a test to establish the failure state will leave only a set of possible failure
states. Assume the test determines the state of the system to such an extent that the only
possible failure states are included in the set I. If P'_i is the probability of being in the ith
failure state given the results of the tests, then:

$$P'_i = 0 \quad \text{for } i \notin I$$

Or if a state is not in the set I its probability is zero.
If a state is possible then:

$$P'_i = \frac{P_i}{\sum_{\text{all } j \in I} P_j} \quad \text{for } i \in I \tag{7}$$

The mission reliability for a particular failure state, R_i, does not change, hence the
mission reliability given the results of the test can be written in general as:

$$R = \sum_{\text{all } i \in I} \left[ \frac{P_i}{\sum_{\text{all } j \in I} P_j} \right] R_i \tag{8}$$

For the example

$$R = \frac{1}{P_1 + P_2 + P_5 + P_6} \left[ P_1 R_1 + P_2 R_2 + P_5 R_5 + P_6 R_6 \right] \tag{9}$$
More extensive tests at t1 will further reduce the number of failure states which can
exist. For instance if a test reveals that at least one circuit in the network is failed, the
failure state which has no errors is eliminated, changing considerably the expected mission
reliability. For this example P'_1 = 0, and states 2, 5 and 6 are the only members of the set I.
To illustrate the value of testing to determine the failure state at t1, consider the
example. The probability that a circuit operates until t1 is p(t1) = 0.9 and the probability
it lasts until t2, given it was successful at t1, is p_m(t2) = 0.9. The system is that shown
in figure Q-1 and the restoring circuits are assumed perfectly reliable. Say that in reality
one circuit is failed in one stage and the circuits in the other stage are all successful, but
this information is unknown to the tester. This is the information to be gained at t1 through
the tests. Table 2 lists the reliability one would predict with different amounts of
information about the condition of the system at t1. The wide variation in the result indicates
the importance of testing at t1.
This section does not propose the detailed procedures for testing a system at t1. It
should, however, indicate the importance of making these tests and the calculations required
to utilize the information gained from the test to estimate the system reliability.
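The four predictions of Table 2 can be approximated by direct enumeration of the sixteen failure states, restricting the sum to the states each test result allows. The sketch below is an illustration with hypothetical function names; it uses p = p_m = 0.9 as in the example. Its results agree with the tabulated values to within about 0.002, the small differences presumably being rounding in the original calculation.

```python
from itertools import product
from math import comb

def stage_state_prob(k, p):
    """Probability that exactly k of a stage's 3 circuits have failed by t1."""
    return comb(3, k) * p ** (3 - k) * (1 - p) ** k

def stage_mission_rel(k, pm):
    """Probability a stage still has a working majority at t2, given k failed at t1."""
    if k == 0:
        return pm ** 3 + 3 * pm ** 2 * (1 - pm)
    return pm ** 2 if k == 1 else 0.0

def mission_reliability(p, pm, allowed):
    """Expected mission reliability over the states (a, b) permitted by the test."""
    num = den = 0.0
    for a, b in product(range(4), repeat=2):
        if not allowed(a, b):
            continue
        p_i = stage_state_prob(a, p) * stage_state_prob(b, p)
        r_i = stage_mission_rel(a, pm) * stage_mission_rel(b, pm)
        num += p_i * r_i
        den += p_i
    return num / den

def working(a, b):
    """The system works at t1 only if each stage has at most one failed circuit."""
    return a <= 1 and b <= 1

cases = [
    ("no information at t1", lambda a, b: True),
    ("system working at t1", working),
    ("working, at least one circuit failed", lambda a, b: working(a, b) and a + b >= 1),
    ("exactly one circuit failed", lambda a, b: a + b == 1),
]
results = [(name, mission_reliability(0.9, 0.9, rule)) for name, rule in cases]
for name, value in results:
    print("%-40s %.3f" % (name, value))
```

Each test result is expressed as a predicate on the pair of stage failure counts, which keeps the conditioning step (numerator and denominator over the same set of states) in one place.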
TABLE 2

     Test Results at the                   Predicted System       Corresponding
     Mission's Start (t1)                  Mission Reliability    Risk of Failure

1.   No information at t1, not even              0.821                 0.179
     that the system is working.
2.   Tests show that the system is               0.867                 0.133
     working at t1.
3.   Tests show that the system is               0.770                 0.230
     working but that at least one
     circuit is failed.
4.   Tests show that exactly one                 0.788                 0.212
     circuit in the system is failed.

D. DETERMINING THE MISSION RELIABILITY OF LARGE SYSTEMS
The example of the last section is a small two stage system. One might well ask if it 
is feasible to enumerate all of the possible failure states of a large system for the
determination of the mission reliability. Indeed with no information at t1 on whether or not an
n stage system is operating correctly, there are 4^n possible failure states of the system. As n
increases, the number of possible failure states increases exponentially.
The purpose of the tests at t1 is to eliminate large numbers of these states in the manner 
shown for the example and hence obtain a better estimate of the mission reliability. The use 
of equation (8) provides this estimate but it requires, in its present form, separate considera-
tion of each failure state. This is impractical for all but the smallest systems. 
This problem is circumvented by first putting the mission reliability equation in a
more general form. The mission reliability of the system given the results of the test at t1
is a conditional probability which can be written:

$$R = \frac{\text{Prob. (Test results at } t_1 \text{ and successful system operation at } t_2)}{\text{Prob. (Test results at } t_1)} \tag{10}$$

Equation (8) is a representation of this equation for small systems.
The form equation (10) takes depends on the characteristics of the system under study 
and the type of test to which it is subject at t1. For example, consider an n stage
order-three multiple-line system which has perfect voters. For simplicity assume all the stages
are identical with equally reliable circuits. For illustrative purposes assume the stages are
arranged in a chain as in figure Q-3.
Figure Q-3. Chain of n-Multiple-Line Stages 
The first type of test to which the system of figure Q-3 is subjected is a simple test to
determine its operability. Is the system failed or successful at t1? Given the system is
successful at t1 the mission reliability will now be determined.
Because the system is working at t1, each stage must be in one of two states, either
three circuits successful or two circuits successful and one failed. Then the system may be
in any one of 2^n possible states. Using equation (8) to evaluate the mission reliability would
be a rather tedious and time consuming process if n were a sufficiently large value since both
the numerator and denominator of this equation have 2^n terms. However, because of the
independence of the stages of the multiple line system, it isn't necessary to carry out this
operation. The probability that each stage is successful at t1 is independent of the condition
of all other stages and can be written:

$$\left[ p^3 + 3 p^2 (1 - p) \right] \tag{11}$$

Since they are all identical the probability that all the stages are successful at t1 is:

$$\left[ p^3 + 3 p^2 (1 - p) \right]^n \tag{12}$$
This term is the probability that the system is in a successful failure state at t1 and is 
the denominator for equation (10) when the test consists only of determining the operability of 
the system. 
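The probability that a single stage is operating at t2 can be written:

$$p^3 \left[ p_m^3 + 3 p_m^2 (1 - p_m) \right] + 3 p^2 (1 - p) p_m^2 \tag{13}$$

Since the stages are independent the probability that the system is operating at t2 is:

$$\left\{ p^3 \left[ p_m^3 + 3 p_m^2 (1 - p_m) \right] + 3 p^2 (1 - p) p_m^2 \right\}^n \tag{14}$$

This term is equivalent to the numerator of equation (10). Using the terms (12) and (14)
the mission reliability can be determined for this system. Given that the system is successful
at t1 the probability that the system is successful at t2 is:

$$R = \left\{ \frac{p^3 \left[ p_m^3 + 3 p_m^2 (1 - p_m) \right] + 3 p^2 (1 - p) p_m^2}{p^3 + 3 p^2 (1 - p)} \right\}^n \tag{15}$$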
Note that for this determination of the mission reliability the separate failure states have not 
been enumerated. The calculation of mission reliability for this system has been a relatively 
simple procedure. 
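The saving can be checked numerically for small n. The sketch below, with hypothetical function names, writes the per-stage probabilities out from the definitions in the text (a stage works if at least two of its three circuits work) and compares the product form with a brute-force sum over the 2^n states in which every stage has at most one failed circuit. The values of p, p_m and n are arbitrary illustrations.

```python
from itertools import product

def chain_mission_reliability(n, p, pm):
    """Product form for n identical, independent order-three stages."""
    ok_t1 = p ** 3 + 3 * p ** 2 * (1 - p)
    ok_t2 = p ** 3 * (pm ** 3 + 3 * pm ** 2 * (1 - pm)) + 3 * p ** 2 * (1 - p) * pm ** 2
    return (ok_t2 / ok_t1) ** n

def chain_by_enumeration(n, p, pm):
    """Same quantity by summing over the 2**n states with 0 or 1 failure per stage."""
    prob = {0: p ** 3, 1: 3 * p ** 2 * (1 - p)}                   # stage state at t1
    rel = {0: pm ** 3 + 3 * pm ** 2 * (1 - pm), 1: pm ** 2}       # stage survival to t2
    num = den = 0.0
    for state in product((0, 1), repeat=n):
        p_i = r_i = 1.0
        for k in state:
            p_i *= prob[k]
            r_i *= rel[k]
        num += p_i * r_i
        den += p_i
    return num / den

for n in (1, 2, 5, 10):
    closed = chain_mission_reliability(n, 0.9, 0.9)
    brute = chain_by_enumeration(n, 0.9, 0.9)
    print(n, round(closed, 6), round(brute, 6))
```

The product form costs a fixed number of operations regardless of n, while the enumeration doubles with every added stage, which is the practical point of the section.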
Other tests at t1 will result in different forms for the mission reliability equation (10).
For instance assume the system of figure Q-3 is subjected to a different test. This test
subdivides the system into three nonredundant ranks as shown in figure Q-4.
Each rank will be tested individually. If a rank fails it can be inferred that one or more
circuits in the rank are failed. If a rank is successful it can be inferred that all circuits in
the rank are successful.
At t1 the information is given that the system is operating correctly and that 0, 1, 2 or
3 of the ranks have failed. Now equations must be developed that determine the mission
reliability of the system given the results of the test at t1.
Figure Q-4. System Divided Into Three Nonredundant Ranks
The numerators and denominators of the mission reliability equation for the various
test results are shown in Table 3.

TABLE 3

Test Result   Prob. (Test Result at t1)                       Prob. (Test Result at t1 and                                                     Mission
(Ranks                                                        Successful System Operation at t2)                                               Reliability
Failed)

    0         Y0 = [p^3]^n                                    Q0 = [p^3 (p_m^3 + 3 p_m^2 (1-p_m))]^n                                           Q0/Y0
    1         Y1 = [p^2 (1-p) + p^3]^n - Y0                   Q1 = [p^2 (1-p) p_m^2 + p^3 (p_m^3 + 3 p_m^2 (1-p_m))]^n - Q0                    Q1/Y1
    2         Y2 = [2 p^2 (1-p) + p^3]^n - Y0 - 2 Y1          Q2 = [2 p^2 (1-p) p_m^2 + p^3 (p_m^3 + 3 p_m^2 (1-p_m))]^n - Q0 - 2 Q1           Q2/Y2
    3         Y3 = [3 p^2 (1-p) + p^3]^n - Y0 - 3 Y1 - 3 Y2   Q3 = [3 p^2 (1-p) p_m^2 + p^3 (p_m^3 + 3 p_m^2 (1-p_m))]^n - Q0 - 3 Q1 - 3 Q2    Q3/Y3
Compared to enumerating all the failed states possible with the particular results of a
test, these equations are relatively simple. If the assumption that all circuits are equally
reliable is removed, the equations for mission reliability are very similar to these except
that instead of raising a single term to the power n as in these equations, a product of n
factors will be taken. This should be a simple matter on a computer.
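A short Python sketch of the Table 3 calculation follows. It reads each Y_k and Q_k as referring to k particular ranks having failed, which is an interpretation inferred from the subtraction pattern in the table rather than something the text states outright; the function name and the values n = 10, p = p_m = 0.9 are likewise assumed for illustration.

```python
from math import comb

def rank_test_terms(n, p, pm):
    """Return lists Y, Q with Y[k], Q[k] for k = 0..3 particular failed ranks."""
    s = pm ** 3 + 3 * pm ** 2 * (1 - pm)    # a fully good stage survives the mission
    Y, Q = [], []
    for k in range(4):
        # Every stage has all circuits good, or one failure located in one of the k ranks.
        y = (k * p ** 2 * (1 - p) + p ** 3) ** n
        q = (k * p ** 2 * (1 - p) * pm ** 2 + p ** 3 * s) ** n
        # Remove the outcomes in which only a proper subset of the k ranks failed.
        for j in range(k):
            y -= comb(k, j) * Y[j]
            q -= comb(k, j) * Q[j]
        Y.append(y)
        Q.append(q)
    return Y, Q

Y, Q = rank_test_terms(n=10, p=0.9, pm=0.9)
for k in range(4):
    print(k, "ranks failed: mission reliability", round(Q[k] / Y[k], 4))
```

For a system with fewer stages than failed ranks the corresponding Y_k is zero (that test result cannot occur while the system is working), so the ratio should only be formed when Y_k is positive.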
If the restriction that the restoring circuits be perfectly reliable is removed, the
mission reliability equation will not be changed significantly unless the stages are
interconnected in such a manner that they are no longer independent. The techniques used to
calculate system reliability in this section are invalid if the stages are not independent.
Techniques have been developed to determine the reliability of such systems* and these must
be used in determining the mission reliability.
The equation describing the mission reliability for a system will depend on both the
tests performed at t1 and the characteristics of the system. These factors will surely be
known prior to the test, so equations can be developed to evaluate the mission reliability
which take into account the possible failure states of the system without exhaustive enumeration.
E. USING TESTS TO DETERMINE BOTH THE FAILURE STATE OF THE SYSTEM AND
FAILURE RATES OF THE CIRCUITS AT t1
In technique C, tests were made at t1 to determine the possible failure states of the
system. In technique B tests were made to establish the actual failure rate of the circuits of
the system. It should be possible to design tests which give information regarding both these
parameters.
The tests will establish the failure rates of the circuits of the system at t1 and use these
in carrying out the reliability calculations described for technique C. It takes little
imagination to see that in the course of tests to determine the failure rate a great deal will
be learned about the failure state of the system. For instance, as soon as one failure is found
the probability that the system is in the state with no circuit failures drops to zero, probably
decreasing the mission reliability appreciably.
The details of this technique have not been developed, but generally it proposes to use
the tests at t1 to indicate both these parameters and thereby increase markedly the accuracy
of the mission reliability estimate.
* Jensen, P. A., W. C. Mann and M. R. Cosgrove, "The Synthesis of Redundant Multiple-
Line Networks", First Annual Report Contract NONR 3842 (00), May 1, 1963. 
IV. TEST OF THE HYPOTHESIS THAT THE MISSION RELIABILITY IS 
GREATER THAN A REQUIRED VALUE 
This method is separated from the others because it does not explicitly estimate the
reliability of a system. Instead it finds, through measurements at the beginning of the
mission, the probability that the system will not meet a given mission reliability specification.
The user of the system must specify the minimum mission reliability. He must also
specify the maximum chance he is willing to take that the system does not meet this goal when
his tests indicate that it will. It is assumed that the system is not acceptable if the probability
that it does not meet the reliability specification is above the given value, and is acceptable
otherwise.
The first step in this procedure is to determine the failure rates that the circuits of
the system must have to just meet the mission reliability goal. These failure rates are
called the maximum failure rates, λ_m. For a system in which many circuits have the same
failure rate this does not seem to be too imposing a problem. For example, consider a system
where all circuits have the same failure rate. If the starting time and duration of the mission
are known, the mission reliability can be expressed as a function of the failure rate, λ, alone.
Equation (5) can then be set equal to the required mission reliability and solved for the failure
rate. A cut-and-try method may be required for the solution.
The maximum failure rate is a function of both the starting time, t1, and the duration,
t2 - t1, of the mission. However, if the duration of the mission is known, it is possible to
plot a curve of mission starting time against the maximum failure rate.
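A cut-and-try solution of this kind can be sketched as a bisection search. Equation (5) is not reproduced here; the reliability function below is only a stand-in (n triplicated stages with perfect restorers, all circuits good at the start, and the dependence on t1 ignored), and both function names are hypothetical. The only property the search relies on is that mission reliability decreases as the failure rate increases.

```python
from math import exp


def stand_in_mission_reliability(lam, duration, stages):
    """Stand-in for Equation (5): NOT the report's equation.

    Each stage succeeds if at least 2 of its 3 circuits survive the mission.
    """
    r = exp(-lam * duration)              # single-circuit survival probability
    return (3 * r ** 2 - 2 * r ** 3) ** stages


def max_failure_rate(required, reliability, lo=0.0, hi=1.0, tol=1e-12):
    """Largest failure rate whose mission reliability still meets `required`.

    Assumes reliability(lam) is monotonically decreasing in lam and that
    reliability(lo) >= required > reliability(hi).
    """
    if reliability(lo) < required or reliability(hi) >= required:
        raise ValueError("required reliability is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reliability(mid) >= required:
            lo = mid                      # still acceptable: try a larger rate
        else:
            hi = mid                      # too unreliable: try a smaller rate
    return lo


# Example with made-up numbers: 20 stages, mission duration 10 time units.
lam_m = max_failure_rate(0.99, lambda lam: stand_in_mission_reliability(lam, 10.0, 20))
```

Repeating the search for several starting times, with the true mission reliability equation substituted for the stand-in, would give the points of the curve of starting time against maximum failure rate.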
Once the maximum failure rate is known it only remains to determine if the actual
failure rate of the circuits of the system is less than or equal to this value. This will be
determined by testing n of the circuits at t1 and counting the number of failed circuits. Call the
number of failed circuits X1. With this data and by using the maximum failure rate, an upper
bound on the probability that the true failure rate is greater than the maximum failure rate can
be determined.
If the fact that a majority of the circuits in a stage must be operative at t1 is neglected,
the success of a circuit in the system may be considered a Bernoulli trial with probability of
success e^(-λt). The probability distribution of the total number of circuit failures in M
circuits is then binomial. This distribution or the associated density function can be plotted
for any number of samples. One such plot appears in Figure Q-5.
The probability distribution of the number of failures at time t1 can be plotted using the
calculated maximum failure rate.
Figure Q-5. Sample Distribution (probability P plotted against number of failures x)
Some maximum number of failures Y will be chosen such that there is a probability δ
that the number of failed circuits observed at t1, X1, will be less than Y if the failure rate of
the circuits is λ_m. The quantity δ is determined from the binomial distribution:

    \delta = \sum_{h=0}^{Y-1} \binom{n}{h} \left(e^{-\lambda_m t_1}\right)^{n-h} \left(1 - e^{-\lambda_m t_1}\right)^{h}        (16)

For failure rates greater than λ_m the probability that fewer than Y failures occur must be
less than δ. So if X1 is less than Y, the statement can be made with confidence 1 - δ that
the actual failure rate is less than the maximum failure rate. It can then be stated
with confidence 1 - δ that the reliability of the system is greater than the minimum
reliability specified by the user.
This method leads to the statement that, with confidence (1 - δ), the
probability that the system will succeed is R. The information used to compute R might be
used to compute the expected time to system failure instead. The object of the test would
then be to confirm or reject the hypothesis that the expected life would exceed the mission
time with a confidence (1 - δ). This modification has not been carefully examined but it
appears to reduce the number of probabilistic statements from two to one.
This procedure again uses no information on the failure state of the system except that 
the system is successful at the beginning of the mission. The effect of this on the accuracy 
of the results has already been discussed in Section IIIC. 
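The acceptance test built on equation (16) can be sketched numerically. This is an illustration only: it assumes n circuits are tested, each operating at t1 with probability exp(-λ_m t1), and it includes the binomial coefficient in each term of the sum. The function names and the example numbers are hypothetical.

```python
from math import comb, exp


def prob_fewer_than(y, n, lam, t1):
    """Probability that fewer than y of n tested circuits have failed by t1."""
    r = exp(-lam * t1)                    # probability one circuit is still good
    return sum(comb(n, h) * r ** (n - h) * (1 - r) ** h for h in range(y))


def largest_threshold(n, lam_m, t1, delta_max):
    """Largest Y whose delta (equation 16) does not exceed delta_max; 0 if none."""
    y = 0
    while y < n + 1 and prob_fewer_than(y + 1, n, lam_m, t1) <= delta_max:
        y += 1
    return y


def accept(x1, n, lam_m, t1, delta_max):
    """Accept the system if the observed failure count x1 is below the threshold Y."""
    return x1 < largest_threshold(n, lam_m, t1, delta_max)


# Example with made-up numbers: 50 circuits tested, lam_m = 0.01, t1 = 10,
# and at most a 5 percent chance of accepting a system that is at the limit.
Y = largest_threshold(50, 0.01, 10.0, 0.05)
delta = prob_fewer_than(Y, 50, 0.01, 10.0)
```

A threshold of zero means that no observed failure count, not even zero failures, would justify acceptance at the requested confidence with that sample size.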
V. CONCLUSIONS AND RECOMMENDATIONS 
It is the nature of a redundant system to withstand a number of internal failures and
still perform its function successfully. This is an extremely desirable property for increasing
life or providing high reliability, but it makes it unreasonable to base the decision
whether or not to carry out a mission with the system only on the fact that the system is
operating at the beginning of the mission.
This decision should be based on the probability that the system will complete the
mission successfully. There are two major factors affecting the probability which are
imperfectly known at the beginning of the mission. First, the number and location of initial
circuit failures has a very significant effect on the probability that the system will operate
throughout the mission. Second, the mission reliability depends heavily on the failure rates
of the circuits which make up the system. There is little accurate information concerning
either of these factors when it is time to make the decision.
The report proposes that certain tests be made just before the mission is to begin to
determine, at least approximately, these unknowns. It proposes some procedures for using
the results of the tests to estimate the mission reliability with varying degrees of accuracy.
A procedure for making the decision on the usability of the system without estimating the
mission reliability is also presented.
It should be noted that the details of these procedures are still to be worked out and
the accuracy of their results is still uncertain. The work here reported will provide the
basis for future studies on the subject.
No attempt has been made to evaluate the relative usefulness of these procedures. It 
is recommended that efforts be made to develop an appropriate measure for comparing the 
techniques so that they may be evaluated relative to a common scale. 
One very important area of study neglected by this report is the design of simple and
efficient tests to be performed at the beginning of the mission to obtain the information
required for the reliability estimates. As much information as possible must be gained from
a minimum number of tests. A small amount of basic work has been done in this area, and
it will be the subject of future efforts.
Appendix 3 
A SURVEY OF COMPONENTS FOR ADAPTIVE RESTORING CIRCUITS 
by 
H. Brinker 
TABLE OF CONTENTS
Introduction
1. Electrochemical Devices
a. Device 1
b. Solion
c. Mercury Cell
2. Magnetic Devices
a. MAD Integrator
b. Orthogonal Core Integrator
c. Second Harmonic Integrator
d. Magnetostrictive Integrator
3. Conclusion
References
LIST OF FIGURES
Figure 1 Comparison of Adaptive and Majority Voting Techniques
Figure 2 Adaptive Voter
Figure 3 Device 1 Cell
Figure 4 Device 1 Integrator
Figure 5a Solion Tetrode and Output Characteristics
Figure 5b Solion Tetrode Connected as an Integrator
Figure 6 Mercury Cell Integrator (Capacitive Readout)
Figure 7 Multiple Aperture Device (MAD)
Figure 8 MAD Integrator
Figure 9 Orthogonal Core
Figure 10 Second Harmonic Integrator
Figure 11 Magnetostrictive Integrator
Introduction 
The Adaline neuron [1] is an adaptive logic device which may be trained
to recognize certain classes of input patterns. The device output is a
binary signal which classifies particular combinations of input signals
into two categories. An output decision is determined by a threshold
element whose input is the linear sum of the products of each input and
its associated variable weight. During adaption the weights are
appropriately changed in order to make the output decision agree with the
desired response. By following a simple set of rules after each application
of input signal combinations the device is caused to converge to an optimum
state for properly categorizing the set of input patterns.
Although training rules for a single layer system have been formulated
by Widrow [1], new adaptive theory is required if systems of two or more
cascaded layers are to be properly trained to perform complex functions of
adaptive behavior and pattern recognition. The question of whether such
devices may be connected in complex arrays and demonstrate brain-like
behavior has generated considerable interest. Such applications appear to
be philosophical and subject to considerable controversy. The primary
concern of the present study is the usefulness of the Adaline
neuron approach in implementing the adaptive voting elements of a redundant
system.
The chart of Figure 1 shows how adaptive voters may extend the relia-
bility of a conventional redundant system, allowing a system using 9 replicas 
to outperform a conventional system using 35 replicas of each function. 
The Adaline neuron has received considerable quantitative study in
application to pattern recognition. When modified as shown in Figure 2,
and applied as an adaptive voter, the training rules become quite simple
since the desired output is determined by a voting of the weighted inputs.
Initially, all weights (gains) are made equal. The decision element will
then provide an output in accordance with the states of the majority of
binary, replicated input signals. If input errors are independent and
random the adaptive voter, by progressively adjusting its weights to assign
high weights to reliable inputs and low weights to failed or unreliable
inputs, may derive correct information from a small minority of correct inputs.
Figure 1 Comparison of Adaptive and Majority Voting Techniques (curves plotted against time on a vertical scale from 0 to 1.0; one curve is labeled "9 input adaptive voter")
Figure 2 Adaptive Voter (inputs 1 through n with weights W1, W2, ..., an adaption control, and a binary output)
In this manner the effect of errors caused by input failures may be negated,
allowing a correct decision to be made under a high probability of input
signal failure. The simple, fixed majority voter will make output decision
errors when more than half of the inputs fail or are in error. The adaptive
voter, by masking out input errors as they occur, may tolerate failures until
only two correct inputs out of the original group are present.
In order to provide automatic adaption it is necessary to continuously
compare the output decision with each binary input and to incrementally
decrease or increase each input weight according to whether agreement or
disagreement exists. Assuming that input errors or failures occur randomly
and that the automatic adaptive process can negate an unreliable input
before other failures occur, the adaptive voter offers the possibility of
realizing system reliability of unprecedented excellence.
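The adaption rule described above can be sketched in a few lines. This is a minimal illustration, not the circuit of Figure 2: it assumes inputs and output take the values +1 and -1, weights are clamped to the range 0 to 1, and the weight changes by a fixed step each cycle. The function names, the step size, and the clamping limits are all assumptions made for the example.

```python
def weighted_vote(inputs, weights):
    """Threshold decision on the weighted sum of +1/-1 inputs."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else -1


def adaptive_vote_step(inputs, weights, step=0.25, w_min=0.0, w_max=1.0):
    """Vote, then raise the weight of each agreeing input and lower the rest."""
    output = weighted_vote(inputs, weights)
    new_weights = [
        min(w_max, max(w_min, w + (step if x == output else -step)))
        for w, x in zip(weights, inputs)
    ]
    return output, new_weights


# Example: 9 replicated inputs; one more input fails (reads the complement
# of the true value) every 20 cycles until only two good inputs remain.
weights = [1.0] * 9
failed = set()
outputs_correct = True
for cycle in range(140):
    if cycle % 20 == 0 and len(failed) < 7:
        failed.add(len(failed))
    truth = 1 if cycle % 2 == 0 else -1
    inputs = [-truth if i in failed else truth for i in range(9)]
    output, weights = adaptive_vote_step(inputs, weights)
    outputs_correct = outputs_correct and (output == truth)
```

In this run each failed input is weighted down to zero before the next failure arrives, which is the condition the text places on the adaptive process. A fixed majority vote over the same nine inputs would be wrong once five of them had failed.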
Inherent in the basic design of an adaptive voter is the requirement for
a variable weighted device which performs integration and displays relatively
permanent memory. These special characteristics have stimulated considerable
effort toward the development of suitable adaptive components. Devices which
display variable weight with memory generally utilize phenomena involving atomic
translation or rotation. The following represents a survey of the more
promising techniques which have been suggested by researchers. The first three
devices described exploit electrochemical effects while the remaining devices
utilize magnetic domain phenomena.
1. Electro-Chemical Devices 
a. Device 1 
Device 1 [3], an electrolytic device developed at Stanford University
by Widrow, is an electronically adjustable resistor with a rate-of-change of
resistance controlled by application of d-c current in a third electrode.
It consists of a sealed plating cell containing an electrolytic bath, a
resistive substrate upon which metal is deposited and a metal source
electrode. A typical configuration indicating the placement of electrodes and
electrolyte in a small plastic enclosure is shown in Figure 3. Two leads
are attached to the substrate and resistance between these leads can be
reversibly controlled by passing plating current into a third electrode.
The conductance of the device is changed and stored by plating or stripping
metal from the substrate by means of the integral of the plating current.
Conductance is sensed nondestructively by applying a low voltage a-c signal
and measuring the resultant current flow.
Normal d-c drop between source and substrate is typically 0.2
volts at a plating current of 0.2 ma. The substrate resistance changes
from 30 ohms to 2 ohms in 10 seconds with this magnitude of plating current.
The a-c sensing voltage applied is usually 0.1 volts RMS. A typical
implementation of Device 1 with associated transformer coupled sensing and
d-c plating circuitry is shown in Figure 4.
Although Device 1 models are commercially available at a cost of approximately
$50 per cell, their application in a practical system is somewhat cumbersome.
Transformer coupled circuits are usually required in order to
present a balanced load to the plating current source, and to provide the
low voltage drop across the substrate. The substrate resistance is usually
less than 100 ohms and the a-c voltage drop must be kept below 3/4 volt in
order to prevent the formation of gas in the cell. Some difficulty has been
reported in keeping the substrate material free of dimensional imperfections,
which in turn cause nonlinear plating effects to take place. Long term
stability is apparently affected by chemical reactions taking place between
plating material and electrolyte. To date Device 1 models are available
in sample quantities and it is difficult to predict ultimate large scale
production costs, repeatability and reliability.
Figure 3 Device 1 Cell (container filled with plating solution; rhodium coated plating surface)
Figure 4 Device 1 Integrator
b. Solion
The solion is a fluid-state device which functions by controlling
and monitoring a reversible electrochemical "redox" reaction. The term
redox refers to a chemical reaction in which oxidation and reduction occur 
simultaneously. The redox system used in solions consists of two electrodes 
immersed in an electrolyte containing both the oxidized and reduced species 
of an ion. The system is completely reversible in that oxidation can occur 
at either electrode while an equivalent amount of the same element is reduced 
at the opposite electrode. Iodine is the reacting element most commonly used. 
A simplified drawing of a solion tetrode and its output characteristics 
is shown in Figure 5a. The tetrode has a platinum electrode at each end of a 
glass tube and two perforated platinum electrodes separating the tube into 
three compartments. The reservoir, containing the input electrode, is the 
largest compartment. The integral compartment, containing the common elec-
trode, is made very small so an equilibrium distribution of the iodine may 
be quickly reached. The compartment between the shield and readout electrodes
serves to separate the two electrodes. The output characteristics of
a solion tetrode are similar to those of a vacuum tube pentode, and show a
transconductance of 40,000 micromhos at an output current of 500 microamperes. 
A solion tetrode connected as an integrator is shown in Figure 5b. 
By controlling the charge transferred between the two input electrodes,
a change in conductivity proportional to the integral of the input current
may be obtained between the output electrodes. In this manner the device
may be utilized as an integrator, providing an output current proportional
to the integral of the input current. Because of the concentration
potential, the input impedance of the solion tetrode is in the order of 1000
ohms and therefore a relatively high impedance signal source is required
in order to avoid integration errors. At constant temperature, the
stability of solions is reported to be within 1% over a period of several
days.
Figure 5a Solion Tetrode and Output Characteristics (output characteristics plotted against ED in volts)
Figure 5b Solion Tetrode Connected as an Integrator (electrodes: I input, S shield, R readout, C common; input signal from a current source; 0.7 V supply)
A practical problem in the use of solion tetrodes arises from the
requirement of providing an isolated battery potential between input and
shield electrodes to prevent iodine diffusion between the reservoir and
integral compartments. Primary application for the solion tetrode to date
has been demonstrated as a low level d-c amplifier with a time constant of
20 seconds. Because of the inherent practical problems of precision
design, isolated supply voltages and discharging effects of parallel outputs,
the solion appears to offer little promise as a practical adaptive component.
c. Mercury Cell 
Another novel approach to variable gain with memory is achieved by
use of a mercury cell integrator, an electrochemical device which provides
visual and electrical readout of the integral of an applied current. The
integrating element consists of a capillary tube filled with two columns
(electrodes) of mercury separated by a gap of aqueous electrolyte of
metallic salt. Two different methods have been used to provide electrical
readout. The first method, called capacitive readout, is shown functionally in
Figure 6. The d-c input signal electroplates mercury across the gap at a
rate which is a direct function of the input signal amplitude, thus causing
the gap or bubble of electrolyte to move. The outside of the capillary is
covered by a vapor-deposited conductive sheath. The mercury electrodes and
sheath, separated by a thin glass wall, provide a capacitance of approximately
20 pF. In application, an a-c signal is connected across the electrodes and
superimposed on the d-c input signal. The a-c signal will divide in
accordance with the capacitance existing between the upper mercury column and
sheath, and the capacitance between sheath and lower grounded column of
mercury. The excitation signal provides a signal at the sheath which is
a direct function of the length of the ungrounded electrode. An auxiliary
amplifier and detector in turn provide a proportional d-c signal of proper
level to operate other related devices.
Figure 6 Mercury Cell Integrator (Capacitive Readout)
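The capacitive readout can be illustrated as a simple capacitive voltage divider. The sketch below is an idealization under stated assumptions: the capacitance from each mercury column to the sheath is taken as proportional to the length of that column under the sheath, the electrolyte gap and the loading of the amplifier are ignored, and the function name and the capacitance-per-unit-length parameter are hypothetical.

```python
def sheath_voltage(v_excitation, upper_length, lower_length, c_per_length=1.0):
    """A-c voltage at the sheath for an idealized two-capacitor divider.

    The excitation is applied to the upper (ungrounded) column; the lower
    column is grounded. The sheath sits between the two capacitances.
    """
    c_upper = c_per_length * upper_length   # upper column to sheath
    c_lower = c_per_length * lower_length   # sheath to grounded lower column
    # Series divider: the sheath voltage is the drop across c_lower.
    return v_excitation * c_upper / (c_upper + c_lower)


# Example: as the bubble moves and the upper column lengthens from 1 to 3
# (arbitrary length units, total length 4), the readout rises in proportion.
readings = [sheath_voltage(1.0, upper, 4.0 - upper) for upper in (1.0, 2.0, 3.0)]
```

In this idealization the capacitance per unit length cancels, so the sheath signal depends only on the fraction of the total column length that belongs to the ungrounded electrode.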
The device provides reversible integration, relatively stable
memory, direct visual readout and a linearity better than 0.1 percent.
Input control current is limited to +5 ma d-c. The integration time from
minimum to maximum output signal is approximately 100 minutes at maximum
control current. This time is ultimately limited by the maximum voltage
which may be dropped across the electrolyte without causing the formation
of gas.
A typical capacitive readout integrator now commercially available
is approximately 0.5 cu. in. but prices range around $130 per unit. Although
displaying excellent stability and predictable operation, such devices will
require considerable price reduction before application becomes practical.
The integration time, although relatively long, may not present a serious
limitation for systems which display slow adaptive behavior, as would be the
case in adaptive voting elements.
Another technique for sensing the position of the bubble utilizes 
a light source and a photo-conductor whose resistance is inversely propor-
tional to the amount of light passed by the transparent electrolyte. As 
the bubble moves out of line with the light source and photo-conductor 
target area the light becomes progressively blocked by the mercury columns, 
causing the photo-conductor resistance to increase. This technique allows 
faster integration because the bubble need only be displaced by its own 
height to effect a change from maximum to minimum light intensity at the 
photo-conductor. A typical photoelectric integrator commercially available 
occupies 1 cu. inch and requires 300 milliwatts to power an integral incandescent
lamp. Output resistance varies over the range from 25K ohms to
350K ohms. Quantity prices are expected to fall below $15 per unit thus
providing a reasonably inexpensive adaptive component. The use of an in-
candescent lamp for the light source imposes a serious life and reliability 
problem. The use of a more reliable light source and a substantial size 
reduction will be necessary before application becomes practical. 
2. Magnetic Devices 
Various techniques have been suggested for providing variable gain and
non-destructive readout with magnetic devices. The phenomenon utilized in
such devices is the ability of magnetic materials to store a
remanent flux which is sensed in a non-destructive manner. Suggested
devices provide the capability for a partial switching of magnetic domains
under a volt-second impulse as the basic incrementing source. Suitable
magnetic materials include ferrites and tape wound cores which are
characterized by a square hysteresis curve. Most of the devices to be described
utilize the same basic type of incrementing technique and differ primarily
in the manner by which the stored flux is sensed.
a. MAD Integrator 
A diagram of a typical multi-aperture device [7] is shown in Figure 7.
In this device flux can be switched around the minor aperture by means of an
a-c drive winding without disturbing the flux linking and stored around the
main aperture. Initially the flux around the main aperture is set to cause
saturation in either a clockwise or counterclockwise direction. A momentary
reversal of the magnetizing force driving the main aperture will cause a
partial reversal of the flux. The amount of flux reversal is determined by
the magnitude and duration of the drive and the value of the hold current.
The purpose of the hold winding is to retain a portion of the core saturated
in the original direction of magnetization and thereby assure partial
switching of the flux. The amount of flux alternately switched around
the small aperture is then proportional to the flux which has been switched
around the main aperture. The output voltage will consist of a signal
whose voltage integral is proportional to the amount of flux trapped in
the common area between the two flux paths. Several cycles of carrier
drive may be required before this condition stabilizes. Care must be
taken to limit the carrier drive to values less than the magnetizing force
required to disturb the remanent flux around the main aperture.
The extent to which the remanent flux is incremented is usually
controlled by means of a smaller core of like magnetic material. The
smaller core provides the appropriate amount of volt-second drive to
increment the storage core in equal steps at various settings of remanent
flux. Brain [8] has indicated that it is essential that incrementing should
always occur at a constant reference phase with respect to the carrier
drive unless carrier drive is removed. If this is not done the size of
the incremental flux change will be dependent on the vector sum of the
switching and carrier signals. A typical scheme for realizing integrator
operation is shown in Figure 8.
Figure 7 Multiple Aperture Device (MAD) (showing the adapt winding and the hold winding)
The physical requirement of providing a number of hand wound turns
about the various apertures dictates to a large extent the cost of the
device. Large driving currents, a moderate amount of timing during
incrementing and relatively low output signal amplitude necessitate peripheral
circuitry of considerable complexity. The resultant degradation in the
basic reliability of the approach then becomes an imposing problem.
Figure 8 MAD Integrator (circuit diagram; labels include a 75 kc/s drive, 2-turn windings, and the read-out winding)
b. Orthogonal Core Integrator 
The magnitude and direction of a stored flux may be sensed by applying
a magnetic field orthogonally to the direction of stored flux [9]. This
causes the remanent flux vector to rotate, generating a voltage proportional
to its rate of change and hence its magnitude. The application of a read
or sensing field at right angles to the stored or written flux minimizes the
interaction of the sense drive on the stored flux magnetic path. At the
termination of the read drive the flux vector returns to its original
preferred orientation by virtue of domain elasticity. A typical orthogonal
core configuration is shown in Figure 9. The flux level stored in the core
is altered by pulsing the output winding in a manner similar to the
incrementing techniques previously discussed. The output signal consists of either
positive or negative pulses depending upon the direction of the stored
flux, with an amplitude proportional to the magnitude of the remanent flux.
Practical problems similar to those associated with the multi-aperture
device previously discussed again make physical implementation cumbersome.
c. Second Harmonic Integrator [10]
Nondestructive readout of remanent flux may be obtained by reducing
the sensing drive to a value insufficient to cause irreversible switching.
Since magnetic cores are generally non-linear the output voltage will
contain harmonics of the drive current. In particular, the even harmonic
voltage for certain core materials is found to be proportional to the net
remanent flux level. The second-harmonic generator shown in Figure 10
consists of a pair of tape wound cores driven from an r-f sinusoidal
power source. The output winding is arranged so that the fundamental
component of drive voltage cancels out, leaving a second harmonic distortion
voltage proportional to the remanent flux in the cores.
Figure 9 Orthogonal Core (ferrite core with drive winding and sense and adapt winding)
By passing a direct current through the output winding the remanent
flux level may be altered. Due to an interaction between the d-c adapt
current and the r-f drive, the rate of change of the remanent flux with
respect to the adapt current is constant and reversible. Tape-wound cores
have been found to provide the best performance and, because of their higher
permeability, require fewer turns. Typical associated driving, sensing and
timing circuitry tends to be rather elaborate, however. The cancellation of
the fundamental driving frequency is difficult to achieve in practice, thus
making the desired output signal appear against a background of noise. This
low level signal must in turn be amplified in order to provide a signal
compatible with the associated solid state circuitry which it must ultimately
control. Clearly a separately switched driving source for each pair of
cores is required in order to provide the individual binary signal inputs
whose weights are to be altered. Since the sinusoidal drive currents tend
to be in the order of 10 to 100 or more milliamperes, the driving and peripheral
circuitry is necessarily elaborate.
d. Magnetostrictive Integrator 
The direction and magnitude of the net remanent flux in a
magnetostrictive core may be sensed if the core is excited mechanically [11]. Figure
11 shows a simplified scheme for implementing a magnetostrictive storage
system using an ultrasonic delay line to excite several magnetostrictive
toroids. The driving source for the sonic delay line is a piezoelectric
transducer. Input to each of the toroids is provided by means of narrow width
pulses through a separate write coil wound concentrically with the read
coil. If the frequency and rms amplitude of the stress wave are maintained
at constant value, the open circuit output of the read coil is
approximately proportional to the flux stored in the individual toroids. Although
this effect has been demonstrated experimentally by Nagy [11] and others, the
basic peculiarities of magnetic domain behavior, especially under the
influence of mechanical excitation, are only crudely understood.
Figure 10 Second Harmonic Integrator (showing the output voltage terminals and the adapt current)
The experimental systems fabricated to date are rather large owing
to the structural requirements of acoustical devices and the associated
electronic circuitry necessary to provide proper tuning, current driving
and voltage amplification. At best considerable experimental work is
necessary to show that magnetostrictive storage offers any real advantage
over more conventional electro-magnetic approaches. Indeed, the sensing
of remanent flux by acoustical means rather than by non-destructive,
electrical drive appears to inject an unwarranted interface complexity.
Figure 11 Magnetostrictive Integrator (showing the piezoelectric transducer and the lossy termination)
3. Conclusion
As a result of the foregoing survey it became apparent that none of the 
suggested adaptive devices were sufficiently developed to justify the selec-
tion of a practical approach for immediate circuit implementation of an 
adaptive voter. An explicit evaluation was not attempted owing to the 
superficial treatment of the various devices by academic researchers. 
The magnetic devices with their known sensitivity to temperature stress 
appear to offer the least hope for providing analog memory with long term 
stability. The requirement for providing carefully controlled incrementing 
with relatively large drive currents coupled with the small output signals 
and associated amplification appears to dictate an imposing amount of 
peripheral circuitry. The degradation in reliability as a result of this 
complexity represents a liability which makes practical application doubtful 
for redundant systems. 
References 
1) B. Widrow and M. E. Hoff, "Adaptive Switching Circuits," Technical
Report No. 1553-1, Stanford Electronics Laboratories, June 1960.
2) B. Widrow, "Adaptive Sampled-Data Systems - A Statistical Theory
of Adaption," 1959 WESCON Convention Record, Part 4.
3) B. Widrow, "An Adaptive 'Adaline' Neuron Using Chemical Memistors,"
Technical Report No. 1553-2, Stanford Electronics Laboratories,
October 1960.
4) "An Introduction to Solions," Texas Research and Electronic Corp.,
Dallas, June 1961.
5) "D-C Amplifier Uses Fluid-State Tetrode," Electronic Products
Magazine, October 1962.
6) "Capacitive Readout Integrator," Technical Brochure, Curtis
Instruments, Inc., Mount Kisco, New York.
7) J. A. Rajchman and A. W. Lo, "The Transfluxor," Proceedings of the
I.R.E., March 1956.
8) A. E. Brain, "The Simulation of Neural Elements by Electrical Networks
based on Multi-Aperture Magnetic Cores," Proceedings of the
I.R.E., January 1961.
9) J. K. Hawkins and C. J. Munsey, "A Magnetic Integrator for the Perceptron
Program," Annual Summary Report, Publication No. U-603, Aeronutronics,
Newport Beach, Calif., July 30, 1960.
10) H. S. Crafts, "A Magnetic Variable Gain Component for Adaptive Networks,"
SEL-62-147, Technical Report 1851-2, Stanford Electronics
Laboratories, December 1962.
11) G. Nagy, "Analogue Memory Mechanisms for Neural Nets," Cognitive
Systems Research Program, Contract No. NONR 401(40), Report No. 3,
Cornell University, Ithaca, New York, August 31, 1962.
TRANSOR ANALYSIS 
by 
R. S. Bray 
P. A. Jensen 
C. G. Masters 
September 1963 
Appendix 4 
TABLE OF CONTENTS 
I. INTRODUCTION 
II. RESTORING CIRCUIT MODELS 
    A. The Transor Decision Function 
    B. The Threshold Decision Function 
III. FAILURE MODES 
    A. Transor Restoring Circuit Vulnerability 
    B. Threshold Restoring Circuit Vulnerability 
IV. RELIABILITY ANALYSIS 
    A. Transor Reliability Defined 
    B. Output Modes Defined 
    C. Upper Bound on Transor Reliability 
    D. Transor Reliability for Strictly Asymmetric Failure Modes 
    E. Transor Reliability for Mutually Exclusive Output Failure Modes 
    F. Transor Reliability for Symmetrical Environment 
V. CONCLUSION 
BIBLIOGRAPHY 
I. INTRODUCTION 
In recent years many novel schemes have been proposed to improve digital system 
reliability through the use of "redundant" equipment. Several of these, patterned after 
a concept of Von Neumann,¹ require a "restoring organ," "restorer" or "voter" to be placed 
after each set of redundant signal processors which perform a particular subsystem 
function. A restoring organ receives an input from each member of the associated set of 
processors. From these nominally identical input signals, the restoring organ produces 
an estimate of the correct subsystem output based on one or more specified decision 
criteria. It should be noted that the restorer does not perform any data processing function 
but acts as an error correcting transmission channel connecting two signal processors. 
It has been shown in the literature² that the theoretically most efficient restoring 
organ is one that is capable of adapting itself to changes in the reliability of inputs. 
Specifically, for threshold type organs it has been shown that the optimum use of n unreliable 
versions of the same signal could be achieved by dynamically weighting each input in 
accordance with its relative reliability. Inputs which have a past history of being more reliable 
are given the heavier vote weights, and the unreliable inputs the lighter vote weights. 
The ideal restoring organ would sense the unreliable inputs and decide on the optimal vote 
weights. By efficiently tailoring the restoring organ to its ever-changing environment, 
significant improvement could be achieved over the presently popular majority restoring 
circuits. 
In studying adaptive restoring organs, Company A has shown³ that circuit 
implementation of adaptive restoring organs for the specific requirements of redundant 
spaceborne systems is not yet practical. The complex circuitry required under the present 
"state of the art" to perform the adaptive function results in machines too cumbersome and 
unreliable to compete with less sophisticated redundant systems. This does not mean, 
though, that the present restoring organs used in redundancy techniques are adequate and 
cannot be improved upon. 
The purpose of this study is to investigate a new restoring organ proposed by 
Company A, called the Transor.⁴ A characteristic of many failed subsystems is their tendency 
to have steady-state outputs as their dominant failure mode. In Transor, steady-state 
outputs are automatically deweighted by detecting only changes in states rather than the 
absolute states themselves. In an environment where the probability of steady-state 
failure is relatively high, a restoring organ which ignores its steady-state inputs can derive 
a correct output with less than a majority of working inputs. 
1, 2, 3, 4 See Bibliography 
The salient characteristics of the Transor restoring organ are best shown by contrasting 
them to the corresponding characteristics of a majority restoring organ. The majority 
organ was chosen as a reference base because of its similarity in function to the Transor 
and because it is presently the most widely used restoring organ. 
II. RESTORING CIRCUIT MODELS 
A. THE TRANSOR DECISION FUNCTION 
To be consistent with the terminology adopted by one group of investigators, the 
term "restoring circuit" will be used to denote one functional unit of a restoring organ or 
restorer. A very general block diagram of a Transor restoring circuit having binary inputs 
($x_1, x_2, \dots, x_R$) and an output $z$ is shown in figure T-1. 
Figure T-1. Transor Restoring Circuit (block labels: sum change detector, output memory) 
Some of the salient characteristics of a Transor Restoring circuit are noted below: 
1) It has memory 
2) It operates only on the number of changes in the states of 
individual inputs between two adjacent bit times, (t - 1) and 
( t ). 
3) It is a binary voting element with a binary output. 
4) It has two thresholds, not necessarily of the same magnitude, 
which combine with the states of the input at (t - 1) and (t) 
to determine the element output. 
The functional relationship describing the Transor decision function can be stated as 
follows: 
z 
(1) 
The number of binary Ones appearing on its inputs during each bit time is summed and 
compared with the number present during the previous time period. If the change is positive 
and equal to or greater than a given threshold $T_1$, the output $z$ is forced to a binary One. 
If the change is negative and equal to or greater in magnitude than a second threshold, $T_0$, 
the output is forced to a binary Zero. If neither threshold is reached, the output does not 
change from its previous state. This operation may be summarized by the following decision 
rule statements. 
$$\sum_{i=1}^{R} x_i(t) - \sum_{i=1}^{R} x_i(t-1) \ge T_1 \;\rightarrow\; z(t) = 1 \tag{2}$$
$$\sum_{i=1}^{R} x_i(t) - \sum_{i=1}^{R} x_i(t-1) \le -T_0 \;\rightarrow\; z(t) = 0 \tag{3}$$
$$-T_0 < \sum_{i=1}^{R} x_i(t) - \sum_{i=1}^{R} x_i(t-1) < T_1 \;\rightarrow\; z(t) = z(t-1) \tag{4}$$
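The decision rules (2) through (4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the circuit described in the report: the function names `transor_step` and `run_transor`, and the five-input example with two inputs stuck at Zero, are assumptions introduced here.

```python
def transor_step(prev_inputs, curr_inputs, prev_output, t1, t0):
    """Apply decision rules (2)-(4) for one bit time.

    prev_inputs / curr_inputs: sequences of 0/1 input states at (t - 1) and (t).
    prev_output: z(t - 1).  t1, t0: the One and Zero thresholds.
    """
    change = sum(curr_inputs) - sum(prev_inputs)
    if change >= t1:       # rule (2): enough net rising transitions
        return 1
    if change <= -t0:      # rule (3): enough net falling transitions
        return 0
    return prev_output     # rule (4): neither threshold reached, hold z(t - 1)


def run_transor(input_history, t1, t0, initial_output=0):
    """Run the rule over a list of input vectors, one vector per bit time."""
    outputs = []
    z = initial_output
    for prev, curr in zip(input_history, input_history[1:]):
        z = transor_step(prev, curr, z, t1, t0)
        outputs.append(z)
    return outputs


# Hypothetical example: five inputs, x4 and x5 stuck at Zero.
history = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],   # three working inputs rise
    [1, 1, 1, 0, 0],   # no change, so the output is held
    [0, 0, 0, 0, 0],   # three working inputs fall
]
print(run_transor(history, t1=2, t0=2))  # prints [1, 1, 0]
```

The stuck inputs contribute nothing to the change count, which is the sense in which steady-state failures are deweighted.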
B. THE THRESHOLD DECISION FUNCTION 
The threshold model* consists of a black box having a certain number of binary inputs 
($x_1, x_2, \dots, x_R$) and an output $z$. At any bit time ($t$) the state of the output line $z$ is a 
function of the state of the input lines and the threshold $T$. A general relationship similar 
to equation (1), but describing the threshold decision function, may be delineated by the 
following expression. 
$$z(t) = f\left[x_1(t), x_2(t), \dots, x_R(t);\; T\right] \tag{5}$$
If the output, $z$, can assume either a Zero or One state, the threshold restoring circuit 
makes a decision to force its output to the One state under the following decision rule: 
$$\sum_{i=1}^{R} x_i(t) \ge T \;\rightarrow\; z(t) = 1 \tag{6}$$
and to the Zero state when 
$$\sum_{i=1}^{R} x_i(t) < T \;\rightarrow\; z(t) = 0 \tag{7}$$
* The majority gate is a threshold model with $T = (R+1)/2$, where $R$ is the number of inputs. 
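For comparison with the Transor rule, decision rules (6) and (7) reduce to a single comparison. The sketch below is illustrative only; the names `threshold_step` and `majority_threshold` are assumptions, and the majority threshold is taken for odd R as in the footnote.

```python
def threshold_step(inputs, threshold):
    """Rules (6) and (7): output One when at least `threshold` inputs are One."""
    return 1 if sum(inputs) >= threshold else 0


def majority_threshold(num_inputs):
    """T = (R + 1) / 2 for an odd number of inputs R."""
    if num_inputs % 2 == 0:
        raise ValueError("the (R + 1) / 2 majority threshold assumes odd R")
    return (num_inputs + 1) // 2


# Three-input majority gate: two Ones out of three force the output to One.
print(threshold_step([1, 1, 0], majority_threshold(3)))  # prints 1
```

Unlike the Transor sketch, this function takes no previous inputs or previous output, which reflects the absence of memory in the threshold model.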
III. FAILURE MODES 
A. TRANSOR RESTORING CIRCUIT VULNERABILITY 
Before the reliability of any Transor network can be expressed in a meaningful 
mathematical form, the failure modes of the individual subsystems appearing at the 
Transor's inputs must be explicitly stated. 
A characteristic of Transor is its ability to differentiate between transitional and 
steady-state failures. This property creates failure modes different from those of 
threshold decision. Specifically, a signal processor is assumed either to be working 
correctly or failed into one of the following modes: 
1. The transitional mode, in which extra Ones and/or extra Zeros 
appear at the output, and 
2. The steady-state mode, in which the output permanently remains 
in a single state. 
A transition (figure T-2) is defined as the rise or fall of a pulse during its switching 
time. The restoring circuit executes a decision by vector summing the change 
in input pulses on the R redundant lines during the vote interval, and a decision is made 
according to the decision rules (2) through (4). 
Figure T-2. Transition Intervals 
The term "extra One" implies a One has appeared on a signal processor's output when it 
should have been a Zero. By going to the wrong state a signal processor creates a wrong 
transition which is voted by the Transor. Wrong transitions can occur through diode 
failures in the gating section of diode-transistor type signal processors. These failures 
sporadically generate "extra" Ones or "extra" Zeros as a function of the information at the 
gate's inputs. To illustrate, consider a three input Transor voting on the output of a 
network of redundant AND gates. The state of the binary inputs may be represented by the 
state vector $S_i(t)$ below. 
$$S_i(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix}$$
In figure T-3 a diode is assumed to have opened in branch (1) of two of the gates causing 
those branches to appear as Ones. An erroneous One will appear at the gate's output 
whenever a correct Zero appears on those inputs and correct Ones appear on the remainder 
of the inputs. However, if all the input diodes open or an output element fails , the gating 
function will be destroyed, and the output will assume a steady-state. A method for 
determining the probability that a signal processor will fail into either of these two modes is 
discussed in Appendix 1. 
Figure T-3. Generation of Wrong Transitions in Redundant AND Gates 
Because transitions are vector quantities their occurrence in the wrong direction 
may threaten Transor performance in three ways: 
1. Wrong transitions cancelling correct transitions. 
2. Wrong transitions occurring while the correct inputs remain in 
the same state (a series of Ones or Zeros). During this time 
the correct inputs have lost their voting power. 
3. Wrong transitions temporarily simulating steady-state failures. 
Wrong transitions produced by "extra Ones" and/or "extra Zeros" over a sequence of bit 
times can result in "error correlation" and create a variety of failure modes, subject to 
the nominally correct input states to the Transor for the considered sequence. 
Figure T-4 shows this more clearly when state vectors are used to represent the inputs 
to a five-input Transor. Inputs $x_1$ and $x_2$ are assumed to have failed and to be capable of 
randomly producing wrong transitions in either direction, i.e., extra Ones or Zeros. 
No inputs are assumed failed to a steady-state. For definiteness all inputs at time (t) 
may be assumed correct. In the following bit times (proceeding to the right) several 
failure patterns are possible for each nominally correct input state. At (t + 1) the states 
(2), (3), (4), and (5) are considered among the possible states (four other possible states 
including (1) have been omitted as repetitious). Observe that sequence (1) - (2) 
is the most damaging because only the wrong transitions have any voting power. For a 
threshold set as low as two this would result in a wrong decision. The sequence (1) - (5) 
represents a possibility in which both erroneous inputs have temporarily "stuck" in one 
state, simulating a temporary steady-state. The sequences (1) - (3) and (1) - (4) are 
the most likely possibilities, in which one of the failed inputs is temporarily correct. In 
the next bit time (t + 2) transitions to the possible states (3), (4), (5), (6) and (7) are 
considered (again repetitions are omitted). Shown here are the cancellation effects 
caused by the introduction of errors on the previous bit time, demonstrating the "error 
correlation" inherent in Transor. The sequence (2) - (5) is the most damaging because 
any threshold greater than one would have resulted in a wrong decision. Observe the 
tradeoff conflict created by the necessity for setting the threshold at a value greater than 
two in the sequence (1) - (2) and the same threshold at a value less than two in the 
sequence (2) - (5) in the following bit time. Clearly there must exist an optimum 
threshold. Inclusion in figure T-4 of transitions from states (4) and (5) would have 
produced no new failure modes since they are but the duals of (2) and (3). 
Figure T-4. Possible Sequences of Input States for a Five Input 
Transor Over Two Bit Times 
B. THRESHOLD RESTORING CIRCUIT VULNERABILITY 
A threshold restoring circuit makes a decision at time (t) by summing the number of 
binary Ones appearing momentarily at its inputs. The decision is independent of the input 
state at time (t - 1). By virtue of decision rule (6), if the number of errors appearing on 
the restorer's inputs equals or exceeds the threshold $T$ the restorer makes the wrong output 
decision. As opposed to Transor, the threshold device cannot differentiate between pure 
wrong transitions and steady-state failures, so that both failure modes may be lumped 
together. To illustrate, consider a three-input threshold restoring circuit whose threshold 
is set at two ($T = 2$). For definiteness assume that $x_1$ and $x_2$ at time (t) are in error and in 
the same state and $x_3$ is correct as indicated below. 
$$\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} = \begin{bmatrix} \bar{x} \\ \bar{x} \\ x \end{bmatrix}, \qquad z(t) = \bar{x}$$
Under this condition a wrong decision will be made. This may be considered a "worst case" 
failure mode because the alternate situation is possible where $x_1$ and $x_2$ have failed into 
opposite steady states. 
$$\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ x \end{bmatrix}, \qquad z(t) = x$$
In this case the errors nullify each other and the restoring circuit's output will follow 
the single correct input ($x_3$). In most reliability analyses the "worst case" is assumed, and 
any two failures in a set of restoring circuit inputs are assumed to cause system failure. 
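The two situations above, two failed inputs in the same wrong state and two failed inputs in opposite states, can be checked by direct enumeration. The following is a small sketch under the stated three-input, T = 2 setting; the helper names are hypothetical.

```python
def threshold_vote(inputs, threshold=2):
    """Three-input threshold restoring circuit with T = 2 (majority)."""
    return 1 if sum(inputs) >= threshold else 0


def two_failure_cases(correct_value):
    """Return (same_state_ok, opposite_states_ok) for x3 carrying correct_value."""
    wrong_value = 1 - correct_value
    # Worst case: x1 and x2 are both in error and in the same state.
    same_state = threshold_vote([wrong_value, wrong_value, correct_value])
    # Alternate case: x1 and x2 have failed into opposite steady states.
    opposite_states = threshold_vote([1, 0, correct_value])
    return same_state == correct_value, opposite_states == correct_value


for value in (0, 1):
    print(value, two_failure_cases(value))
```

For either value of the correct input the first entry is False and the second is True, matching the worst-case and error-nullifying situations described in the text.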
IV. RELIABILITY ANALYSIS 
A. RELIABILITY DEFINED 
In keeping with the usual concept of reliability, the reliability of a Transor restoring 
circuit will be defined as the probability that it never makes a wrong decision during its 
mission time. For analysis purposes the Transor itself is assumed perfectly reliable, 
i.e., a wrong decision is never made through component failure within the Transor itself. 
In part III it was shown that errors appearing on the Transor inputs in a particular bit 
time could be correlated with errors that appeared on adjacent bit times to produce unique 
failure modes. Two of these were: 
(1) Cancellation effects 
(2) Simulated steady-state 
In the following discussion it will be shown how these failure modes may be "built in" 
to reliability models by using multinomial expansions. Analytical models formulated in this 
manner may be easily compared with models for threshold reliability. 
B. OUTPUT MODES DEFINED 
Any output of a binary signal processor can be classified into one of six mutually 
exclusive classes over the element's mission time. These are: 
1) Correct 
2) Continuous Zero state 
3) Continuous One state 
4) Extra Ones but no extra Zeros 
5) Extra Zeros but no extra Ones 
6) Both extra Ones and Zeros. 
Moreover, the output of a system composed of binary signal processors may be defined 
by the six mutually exclusive classes above. Each of these classes will be assigned the 
following probability measures in conformance with the Transor decision rules. 
1) $p$; the probability that the output is correct. 
2&3) $q_s$; the probability that the output is either a continuous Zero or a 
continuous One. 
4) $q_1$; the probability that the output generates extra Ones, but not extra Zeros. 
5) $q_0$; the probability that the output generates extra Zeros, but not extra Ones. 
6) $q_{10}$; the probability that the output generates both extra Ones and Zeros randomly. 
Note that the measure $q_s$ is the result of the union of classes (2) and (3). The transitional 
probabilities $q_1$, $q_0$ and $q_{10}$ are defined to represent only the probabilities that a particular 
set of components, whose failure will cause wrong transitions to be generated randomly, 
will fail. 
C. UPPER BOUND ON TRANSOR RELIABILITY 
An upper bound on reliability is easily obtained by excluding all but steady-state failures 
from the environment. If $\beta$ is a random variable denoting the number of correct 
transitions (or working inputs) and $\gamma$ the number of inputs failed to a steady-state, a probability 
density function may be defined over the sample space as 
$$\frac{R!}{\beta!\,\gamma!}\; p^{\beta} q_s^{\gamma}, \qquad \beta + \gamma = R$$
Since Transor ignores steady-state failures the only criterion for a correct decision 
is that 
$$\beta \ge T_0$$
The corresponding limits on $\gamma$ are 
$$\gamma \le R - T \tag{8}$$
where $T_0 = T$. The reliability is 
$$R_{U.B.} = \sum_{\beta = T}^{R} \frac{R!}{\beta!\,(R-\beta)!}\; p^{\beta} q_s^{R-\beta} \tag{9}$$
In an environment capable of producing only steady-state failures, the maximum 
reliability and error correction capability is obtained by setting $T = 1$. This is the optimum 
threshold. From equation (8) we see that Transor can correct at best $R - 1$ failures in 
an order $R$ redundant system. 
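A short numerical sketch of the upper bound may help fix ideas. It assumes the binomial form of equation (9) with $p + q_s = 1$, and it evaluates the majority case by setting $T = (R+1)/2$ in the same sum, which corresponds to the worst-case threshold assumption of part III. The function names and the value p = 0.9 are illustrative assumptions, not figures from the report.

```python
from math import comb


def transor_upper_bound(r, t, p):
    """Probability that at least t of r inputs are working, with q_s = 1 - p."""
    q_s = 1.0 - p
    return sum(comb(r, b) * p**b * q_s**(r - b) for b in range(t, r + 1))


def majority_reliability(r, p):
    """Same sum with the majority threshold T = (R + 1) / 2 (odd R assumed)."""
    return transor_upper_bound(r, (r + 1) // 2, p)


# Hypothetical example: order-3 redundancy, each input working with p = 0.9.
print(transor_upper_bound(3, 1, 0.9))   # Transor with T = 1, about 0.999
print(majority_reliability(3, 0.9))     # majority voting, about 0.972
```

With T = 1 the sum excludes only the single outcome in which all R inputs have failed, which is the "R - 1 failures" capability noted above.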
D. TRANSOR RELIABILITY FOR STRICTLY ASYMMETRIC FAILURE MODES 
Excluding class (6) and either class (4) or (5) from the mutually exclusive ways an 
environment can fail limits transitional failure modes to states (2), (3), (4) and (5) in 
figure T-4. Of these the sequence (1) - (2) is the "worst case". For definiteness let it be 
assumed that Transor inputs may produce only extra Zeros and steady-state failures. Let 
$\alpha_0$ be a random variable denoting the number of wrong transitions to the Zero state. 
The density function on this sample space is 
$$\frac{R!}{\beta!\,\gamma!\,\alpha_0!}\; p^{\beta} q_s^{\gamma} q_0^{\alpha_0}$$
A wrong decision will be made unless 
$$\alpha_0 \le T_0 - 1$$
Since it is necessary that 
$$\beta \ge T_0$$
the limits on $\gamma$ must be 
$$\gamma \le R - T_0 - \alpha_0$$
The reliability is 
$$R = \sum_{\alpha_0 = 0}^{T_0 - 1} \; \sum_{\gamma = 0}^{R - T_0 - \alpha_0} \frac{R!}{(R - \alpha_0 - \gamma)!\,\gamma!\,\alpha_0!}\; p^{R - \alpha_0 - \gamma}\, q_0^{\alpha_0}\, q_s^{\gamma} \tag{10}$$
E. TRANSOR RELIABILITY FOR MUTUALLY EXCLUSIVE OUTPUT FAILURE MODES 
The scope of the environment considered in part D can be broadened to include both 
the mutually exclusive classes (4) and (5). Each input may be failed to either steady-state, 
extra Ones or extra Zeros (but not both). The failure modes (figure T-5) may be represented 
in a manner similar to figure T-4; inputs $x_1$ and $x_2$ are assumed failed in one of the four 
mutually exclusive ways listed above. 
The sample space may be described by the density function 
$$\frac{R!}{\beta!\,\gamma!\,\alpha_0!\,\alpha_1!}\; p^{\beta} q_s^{\gamma} q_0^{\alpha_0} q_1^{\alpha_1}$$
Figure T-5. Possible Sequences for a Five-Input Transor with Mutually 
Exclusive Output Failure Modes 
The sequence (1) - (2) in figure T-5 implies that a Transor will make a wrong 
decision unless 
$$\alpha_0 \le T_0 - 1 \tag{11}$$
and its dual 
$$\alpha_1 \le T_1 - 1 \tag{12}$$
From the sequences (1) - (3) and (1) - (4) respectively 
$$\beta + \alpha_1 \ge T_1 \tag{13}$$
$$\beta + \alpha_0 \ge T_0 \tag{14}$$
for a correct decision. However, examination of the sequences (3) - (4) and (4) - (3) 
shows that inequalities (13) and (14) do not represent "worst cases". "Error correlation" 
between the bit times (t + 1) and (t + 2) has produced a temporary steady-state. A correct 
decision will be made only if 
$$\beta \ge T_0 \tag{15}$$
$$\beta \ge T_1 \tag{16}$$
From (15) and (16) 
$$\gamma \le (R - T_0) - \alpha_1 - \alpha_0 \tag{17}$$
$$\gamma \le (R - T_1) - \alpha_1 - \alpha_0 \tag{18}$$
Of these last two inequalities the number of allowable steady-state failures, $\gamma$, will be 
governed by the highest threshold, $T_0$ or $T_1$. 
The reliability will take the form 
$$R = \sum_{\alpha_0 = 0}^{T_0 - 1} \; \sum_{\alpha_1 = 0}^{T_1 - 1} \; \sum_{\gamma = 0}^{R - T_0 - \alpha_0 - \alpha_1} \frac{R!}{(R - \alpha_0 - \alpha_1 - \gamma)!\,\alpha_0!\,\alpha_1!\,\gamma!}\; p^{R - \alpha_0 - \alpha_1 - \gamma}\, q_0^{\alpha_0}\, q_1^{\alpha_1}\, q_s^{\gamma} \tag{19}$$
where $T_0$ is assumed $> T_1$. 
F. TRANSOR RELIABILITY FOR A SYMMETRICAL ENVIRONMENT 
A symmetrical environment utilizing Transor decision will be defined as the mutually 
exclusive classes (1), (2), (3) and (6). Wrong transitions may occur in both directions and at 
random. Therefore $\alpha_0 = \alpha_1 = \alpha$ and $T_0 = T_1 = T$. The density function on this sample 
space may be written as 
$$\frac{R!}{\beta!\,\gamma!\,\alpha!}\; p^{\beta} q_s^{\gamma} q_{10}^{\alpha}$$
From figure T-4 it can be seen that a wrong decision will be made unless 
$$\alpha \le T - 1 \tag{20}$$
and 
$$\beta - \alpha \ge T \tag{21}$$
From (21) 
$$\gamma \le R - T - 2\alpha \tag{22}$$
Therefore the reliability for the symmetrical environment is 
$$R = \sum_{\alpha = 0}^{T - 1} \; \sum_{\gamma = 0}^{R - T - 2\alpha} \frac{R!}{(R - \alpha - \gamma)!\,\alpha!\,\gamma!}\; p^{R - \alpha - \gamma}\, q_{10}^{\alpha}\, q_s^{\gamma} \tag{23}$$
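The multinomial sums of parts D and F can be evaluated numerically. The sketch below assumes the multinomial form of the summands in equations (10) and (23), with $\beta = R - \alpha - \gamma$; the function names and the example probabilities are illustrative assumptions rather than values from the report.

```python
from math import factorial


def multinomial(n, *ks):
    """n! / (k1! k2! ...), or 0 when the counts are invalid or do not sum to n."""
    if any(k < 0 for k in ks) or sum(ks) != n:
        return 0
    coeff = factorial(n)
    for k in ks:
        coeff //= factorial(k)
    return coeff


def transor_asymmetric(r, t0, p, q_s, q_0):
    """Equation (10): only extra Zeros and steady-state failures are possible."""
    total = 0.0
    for a in range(t0):                      # alpha_0 <= T_0 - 1
        for g in range(r - t0 - a + 1):      # gamma <= R - T_0 - alpha_0
            b = r - a - g                    # remaining inputs are working
            total += multinomial(r, b, a, g) * p**b * q_0**a * q_s**g
    return total


def transor_symmetric(r, t, p, q_s, q_10):
    """Equation (23): wrong transitions may occur in both directions."""
    total = 0.0
    for a in range(t):                       # alpha <= T - 1
        for g in range(r - t - 2 * a + 1):   # gamma <= R - T - 2 alpha
            b = r - a - g
            total += multinomial(r, b, a, g) * p**b * q_10**a * q_s**g
    return total


# Hypothetical five-input example with p = 0.90, q_s = 0.06, q_10 = 0.04.
for threshold in (1, 2, 3):
    print(threshold, transor_symmetric(5, threshold, 0.90, 0.06, 0.04))
```

Sweeping the threshold in this way is one simple means of exploring the tradeoff between steady-state and transitional failures; when the transitional probability is set to zero both functions reduce to the upper bound of part C.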
V. CONCLUSION 
The dynamic characteristics of the Transor decision function make this type of restoring 
circuit unique to the present art. The mission of this part of the Failure Free Systems 
Study has been to evaluate the potential usefulness of the Transor as a restoring circuit. 
Primarily because it is most commonly used in present redundant equipment, the 
threshold type restoring circuit has been chosen as the reference point for the evaluation. It has 
been hypothesized that, if it can be shown that the Transor failure masking capability 
compares favorably to that of the threshold restoring circuit, further development, including the 
construction of a breadboard model, should be justified. 
The results of section IV have shown that there are certain environments in which 
Transor can be used to advantage in improving system reliability. For example, the 
maximum error restoring capability of Transor is shown to be R - 1 failures of R redundant 
lines in an environment free from transitional failures. This is a significant improvement 
over the majority threshold restoring capability under the same conditions. There is need 
for caution, however, for in environments where symmetrical transitional errors are 
possible, error correlation may make Transor performance inferior to threshold. From the 
reliability models, a tradeoff may be determined in terms of the output error probabilities 
of the environment. 
The work done up to this point represents only a first step in Transor decision study. 
Work yet to be done includes: (1) a general Transor reliability model incorporating all the 
possible failure modes and (2) a decision rule for determining an optimum threshold. 
In addition to continuing the analytical effort described in this report, a computer 
simulation program is being written to aid in the task (1) effort. This will be a relatively simple 
but versatile program designed to accommodate any set of restricting assumptions including 
those made in the four models derived in this report. The results of this report have shown 
that a solution to task (2) would be desirable because of the tradeoffs between different failure 
modes. If the error probabilities of the signal processor outputs are known in the design 
stage, maximum reliability can be bought for zero additional cost by a judicious choice of 
the thresholds. 
VI. APPENDIX 
Determination of the Reliability Parameters $p$, $q_s$, $q_0$, $q_1$, $q_{10}$ in a Signal 
Processor. 
In section IV it was shown that reliability models could be formulated in terms of the 
output error probabilities of a set of redundant signal processors. This section describes 
a method for determining these probabilities. 
Consider a set X* which has for its members the n components of a signal processor. 
Each member (component) has two possible states: 
$x_i$; the i-th member is working. 
$\bar{x}_i$; the i-th member has failed. 
Let each component have a reliability 
$$P(x_i) = e^{-\lambda_i t}$$
and a probability of failure 
$$P(\bar{x}_i) = 1 - e^{-\lambda_i t}$$
The probability measure on the sample space of X may be partitioned into the canonical 
form 
$$1 = P(x_1 \cap x_2 \cap \dots \cap x_n) + P(\bar{x}_1 \cap x_2 \cap x_3 \cap \dots \cap x_n) + \dots + P(\bar{x}_1 \cap \bar{x}_2 \cap \dots \cap \bar{x}_n) \tag{24}$$
Briefly, the method requires determining the correspondence between groups of the terms 
in (24) and the individual terms in 
$$1 = p + q_s + q_0 + q_1 + q_{10} \tag{25}$$
Obviously the parameter $p$, the probability that the signal processor output is correct, is 
$$p = P(x_1 \cap x_2 \cap \dots \cap x_n)$$
The remaining $2^n - 1$ terms in (24) are mapped into the four remaining parameters in (25) by 
partitioning the set X into subsets whose members are defined by those components whose 
failure will result in one of the four mutually exclusive events described in part IV. 
Specifically let 
$X_s$ be the set whose failure results in either a steady-state Zero or One. 
$X_1$ be the set whose failure results in extra Ones. 
$X_0$ be the set whose failure results in extra Zeros. 
$X_{10}$ be the set whose failure results in extra Ones and Zeros. 
* Summary of all the notation to be used is included on the last page of this appendix. 
Since each component may fail by shorting or opening, these two modes will determine 
membership in one or more of the above sets. If the probability of a component shorting 
given that it has failed, $P(x_i^s \mid \bar{x}_i)$, is $\rho_i$, then the joint probability of $x_i$ failing and 
shorting is 
$$P(\bar{x}_i \cap x_i^s) = P(x_i^s) = \rho_i \left(1 - e^{-\lambda_i t}\right)$$
Let the probability of an $x_i$ opening given that it has failed be $P(x_i^o \mid \bar{x}_i)$. 
Then 
$$P(x_i^s \mid \bar{x}_i) + P(x_i^o \mid \bar{x}_i) = 1$$
and 
$$P(x_i^o \mid \bar{x}_i) = 1 - \rho_i$$
Also, since for each $x_i$ the events working, shorted or opened are mutually exclusive, the 
probability of a component not shorting is 
$$P(\bar{x}_i^s) = P(x_i \cup x_i^o) = 1 - \rho_i \left(1 - e^{-\lambda_i t}\right)$$
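The three mutually exclusive component states (working, shorted, opened) can be tabulated directly from a failure rate and a short fraction. The sketch below is illustrative; the function name, the dictionary keys and the numerical failure rate are assumptions, not data from the report.

```python
from math import exp


def component_state_probabilities(failure_rate, t, short_fraction):
    """Probabilities of working, shorted and opened for one component.

    failure_rate: lambda_i.  t: mission time in the same time units.
    short_fraction: rho_i, the probability of a short given a failure.
    """
    p_fail = 1.0 - exp(-failure_rate * t)
    return {
        "working": 1.0 - p_fail,
        "shorted": short_fraction * p_fail,
        "opened": (1.0 - short_fraction) * p_fail,
    }


# Hypothetical diode: lambda = 1e-6 per hour, 1000 hour mission, rho = 0.3.
states = component_state_probabilities(1e-6, 1000.0, 0.3)
not_shorted = states["working"] + states["opened"]
print(states, not_shorted)
```

The `not_shorted` value equals $1 - \rho_i (1 - e^{-\lambda_i t})$, the probability of a component not shorting given above.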
To illustrate the technique a NAND gate will be analyzed using the test results contained in 
an earlier report.⁵ 
Figure AT-1. NAND Gate (component labels: R4, R5, R8, C9, CR2, CR3, T7; supplies +12, +6, -12; output) 
The pertinent results are included below. 
1. AND gate input diodes; CR1, CR2, CR3 
A. OPEN - Any open circuit input is equivalent to a logical "one" on that input; it 
cannot inhibit the AND gate. 
B. SHORT - A shorted diode will not affect the ability to perform the AND function if 
that input has low impedance to ground in the "zero" state and high impedance to a 
positive voltage in the "one" state. The line with a shorted diode is no longer 
isolated from other inputs; that line is shorted to the AND gate output and may, 
therefore, be an incorrect "zero". 
2. AND gate resistor; R4 
A. OPEN - The AND gate has no voltage available to drive current into the transistor 
base, so the NAND gate output remains a "one". 
B. SHORT- This will cause a low impedance path from the +12 volt power supply 
through the input diodes to all of the inputs to the gate. If any of these inputs 
are from NAND gate transistors which are conducting, that input will also be a 
low impedance to ground. A low impedance path then exists from the power 
supply to ground, and a high current will flow through the diode and transistor 
according to the magnitude of the impedance of the power supply and components 
involved. In the tests observed, this current was not sufficient to damage the 
transistor or diode and did not blow the fuse on the power supply. However, if 
any inputs are from flip-flops, the clamp diode will turn on when the voltage 
exceeds the clamp voltage. A low impedance path then exists from the +12 volt 
power supply through the shorted AND gate resistor and the input diode, and 
may seriously overload the clamp voltage supply, depending on how the clamp 
voltage is derived. In the tests observed, this current was sufficient to cause 
both the input diode and clamp diode to short and the clamp voltage to rise 
toward +12 volts. 
3. Input resistor - capacitor; R5, C9 
A. Resistor SHORT- The transistor base voltage will be the AND gate output. 
This will normally cause the transistor to conduct, so that the output will 
be "zero" for any logic input. 
B. Resistor OPEN- This will cause the transistor to be off, so that the output will 
be a "one" for all logic inputs. 
C. OPEN C9 - This does not adversely affect operation, unless the switching time is 
critical, in which case NAND gate turn-on time was increased from 65 nanoseconds 
with C9 to 80 nanoseconds without C9; turn-off time was increased from 25 to 45 
nanoseconds in one approximate measurement with a constant load on the output 
of the circuit. The turn-on time was measured as the time from the input going 
positive above +1.6 v. until the output goes to +1.6 v. from the "one" state. The 
turn-off time was measured as the time from the input going negative below +2.4 v. 
until the output goes to +2.4 v. from the "zero" state. 
4. Base bias resistor, R6 
A. OPEN - This will normally cause the transistor to conduct, so that the output will 
be "zero" for any logic input, except that when the AND gate voltage is going 
negative from the "one" state, this voltage change is coupled across C9 and will 
turn the transistor off until the transient effect has ended. 
B. SHORT - The short of the base resistor may cause damage to the output transistor, 
since -12 volts on the base exceeds the maximum rating of 5 volts for $V_{EBO}$. The 
output voltage will depend on the failure mode, if any, of the transistor. In three 
multiple failure tests that included short of the base bias resistor in a NAND gate, 
two transistors shorted base to collector, which resulted in a -12 volt output; 
one transistor shorted collector to emitter, which resulted in a "zero" output. 
The -12 volt output had no significantly different effect on the following circuitry 
than a normal "zero" output. 
5. Collector (output) resistor, R8. 
A. OPEN - The removal of the output resistor does not affect the logical operation 
of the circuit, since any loads are also to positive voltage sources. The output 
rise time will be somewhat slower but the output will turn off faster because the 
output voltage in the "one" state is lower and the load current is less. 
B. SHORT - The output voltage will be +6 volts; the current in the transistor will be 
high if the transistor is conducting. This current was not sufficient to cause 
permanent damage to the transistor in the observed tests. 
6. Transistor, T7 
The transistor may fail into any of several possible modes, but the circuit output 
will usually be a "one" unless a low impedance path exists from the output to ground, 
such as when the collector is shorted to emitter, or if the transistor is otherwise 
forced to remain conducting from collector to emitter. 
From the test results the component failures may be categorized (below) into their 
effects on the NAND gate's output. 
I Components Causing Failure into Steady State "1" 
1) R4 open 
2) R5 open 
3) T7 (most modes result in a "1") 
II Components Causing Failures into Steady State "0" 
1) R5 short 
2) R6 short 
3) R6 open 
4) CR1 and CR2 and CR3 open (together) 
III Component Failures that will Produce Transitional Extra "Ones" 
1) CR1 or CR2 or CR3 open 
2) CR1 and CR2 open 
3) CR1 and CR3 open 
4) CR2 and CR3 open 
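From the three categories above may be formed the mutually exclusive sets 

Set $X_s$                                   Probability of $X_s(i) = P[X_s(i)]$ 
$X_s(1)$: $x_4^o$                           $(1 - \rho_4)(1 - e^{-\lambda_4 t})$ 
$X_s(2)$: $\bar{x}_5$                       $1 - e^{-\lambda_5 t}$ 
$X_s(3)$: $\bar{x}_6$                       $1 - e^{-\lambda_6 t}$ 
$X_s(4)$: $\bar{x}_7$                       $1 - e^{-\lambda_7 t}$ 
$X_s(5)$: $x_1^o \cap x_2^o \cap x_3^o$     $[(1 - \rho)(1 - e^{-\lambda t})]^3$ 

The probability of a steady-state failure is 
$$q_s = \sum_{i=1}^{5} \;\cdots\; \sum_{i \ne j}^{5} \;\cdots$$

Set $X_0$                                   Probability of $X_0(i) = P[X_0(i)]$ 
$X_0(1)$:   $3\,(1 - \rho)(1 - e^{-\lambda t})\, e^{-2\lambda t} \left[1 - (1 - \rho_4)(1 - e^{-\lambda_4 t})\right] e^{-(\lambda_5 + \lambda_6 + \lambda_7)t}$ 
$X_0(2)$:   $3\,[(1 - \rho)(1 - e^{-\lambda t})]^2\, e^{-\lambda t} \left[1 - (1 - \rho_4)(1 - e^{-\lambda_4 t})\right] e^{-(\lambda_5 + \lambda_6 + \lambda_7)t}$ 

The probability of an extra zero is 
$$q_0 = \sum_{i=1}^{2} P[X_0(i)]$$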
Observe from the set $X_0$ that transitional errors will be caused by fewer than three of
the input diodes failing through opening. In actuality the probability of a wrong transition
for the member $X_0(1)$ in the set $X_0$ is the joint probability:
$$
\begin{aligned}
&P(\text{$i$-th diode open} \cap \text{"0" on the $i$-th input} \cap \text{$n-1$ diodes working} \cap \text{"1's" on the $n-1$ diodes} \cap \text{no steady-state failures}) \\
&\quad = P(\text{$i$-th diode open}) \cdot P(\text{$n-1$ diodes working}) \cdot P(\text{"0" on $i$-th input} \cap \text{1's on $n-1$ inputs} \mid \text{$i$-th diode open} \cap \text{$n-1$ working}) \cdot P(\text{no steady-state failure})
\end{aligned}
$$
The third term in the joint probability expression is the conditional probability expressing
the fact that a wrong transition is a function of the information appearing at the gate inputs
in any bit time. For all practical purposes this term may be set equal to unity due to the
tremendous speed at which information is processed and the resulting short time between
occurrence of all possible input states. This same reasoning may be applied to the other
member $X_0(2)$.
Note that a NAND gate possesses an asymmetric environment because there are no
failure modes that can result in the exclusive classes $X_1$ or $X_{10}$.
Thus the reliability of a Transor voting on the output of a network of redundant NAND gates 
can be defined by equation (10) in part IV. 
The following notation was used in this appendix. 
1) $x_i$: the event that the $i$-th component is working correctly.
2) $\bar{x}_i$: the event that the $i$-th component has failed.
3) $P(x_i)$: the probability of the defined event (1)
4) $P(\bar{x}_i) = 1 - P(x_i)$
5) $x_i^{s}$: the event that the $i$-th component has shorted
6) $x_i^{o}$: the event that the $i$-th component has opened, because the probability
   space of each component is the logical union $x_i \cup (\bar{x}_i \cap x_i^{s}) \cup (\bar{x}_i \cap x_i^{o})$
7) $P(x_i^{s})$: the probability of (5)
8) $P(x_i^{o})$: the probability of (6) $= 1 - P(x_i) - P(x_i^{s})$
9) $\bar{x}_i^{s}$: the event that the $i$-th component has not shorted
10) $\bar{x}_i^{o}$: the event that the $i$-th component has not opened
11) $P(\bar{x}_i^{s})$: the probability of (9)
12) $P(\bar{x}_i^{o})$: the probability of (10)
13) $P(x_i^{s} \mid \bar{x}_i)$: the probability of the $i$-th component shorting given that it has
    failed $= p$
14) $P(x_i^{o} \mid \bar{x}_i)$: the probability of the $i$-th component opening given that it has
    failed $= 1 - p$
BIBLIOGRAPHY 
1) J. von Neumann, "Probabilistic Logics and the Synthesis of Reliable
Organisms from Unreliable Components," in Automata Studies, Ed.
C. E. Shannon and J. McCarthy, Princeton University Press, 1956.
2) W. H. Pierce, "Adaptive Vote-Takers Improve the Use of Redundancy,"
Redundancy Techniques for Computing Systems, Ed. R. H. Wilcox and
W. C. Mann, Spartan Books, 1962. July 17, 1961.
3) "A Survey of Adaptive Components for Use in Failure Free Systems,"
Special Technical Report No. 1, NASw-572, Aug. 1963.
4) W. C. Mann, "Restorative Processes for Redundant Computing Systems,"
Redundancy Techniques for Computing Systems, Ed. R. H. Wilcox and
W. C. Mann, Spartan Books, 1962.
5) A. R. Helland, W. C. Mann, "Failure Effects in Redundant Systems,"
Report No. EE-3351, Westinghouse Electronics Division, 1963.
Appendix 5 
COMPARISON OF DYNAMIC AND THRESHOLD RESTORERS 
by 
C. G. Masters 
R. S. Bray 
December 1963 
TABLE OF CONTENTS
Section
I. INTRODUCTION
II. DESCRIPTION OF DYNAMIC RESTORING CIRCUITS
    A. Review of the Transor Decision Function
    B. Description of the Hamming Distance Restoring Function
    C. Comparison of Transor and the Hamming Distance Restoring Circuit
III. REVIEW OF THE ANALYTICAL EFFORTS
    A. Signal Processor Assumptions
    B. Classification of Failure Effects
    C. Class Probability Measure
    D. Analytical Models
        1. Multinomial Model for a Dynamic Restoring Circuit
        2. The Transor Model
        3. The Hamming Distance Restoring Circuit Model
        4. The Threshold Restoring Circuit Model
    E. Threshold Parameters as a Bound on Dynamic Parameters
    F. A Comparison of Transor and the Hamming Distance Restoring Circuit
IV. SIMULATION PROGRAM
V. DISCUSSION OF RESULTS
    A. Simulation Results
    B. Curves Discussion
VI. CONCLUSIONS
LIST OF FIGURES
Figure
1. Block Diagram of the Transor
2. Block Diagram of the Hamming Distance Restoring Circuit
3. Possible Five-Input Sequences for Two Failures
4. Typical Histogram
5. Approximation to Reliability Curve
6. Transor Order 5 Redundancy
7. Comparison of Transor and Hamming Distance
8. Comparison of Threshold and Hamming Distance
9. Comparison of Order 7 Threshold and Order 5 Hamming Distance
I. INTRODUCTION
The basic function of a restoring circuit is discussed in Part One of Special Technical
Report No. 4, which is contained in Appendix 4 of this report. The Transor is described in
that report as a device which is potentially useful for performing the restoring function.
Because it is sensitive only to changes in the states of its inputs, a restoring circuit of this
type appears to have advantages over the common threshold voter in environments where
most failures result in steady-state inputs to the restorers. Of course, such a circuit should
be inferior to the threshold voter when failures result in transient errors.
The original goal of this study was the determination of the ratio of probability of
steady-state errors to probability of transient errors for which any decrease in the ratio will
make the use of the threshold voter advantageous compared to the Transor. In the process of
performing the study, a new dynamic restoring circuit has been developed which has obvious
advantages over the Transor for certain input failure pattern conditions. The invention of the
Hamming Distance Restoring Circuit caused a shift in the primary goal to include evaluation
of both it and the Transor relative to each other, as well as to the threshold voter.
Section II of this report includes a brief review of the Transor and describes the Hamming
Distance Restoring Circuit. Section III reviews the analytical techniques which have
been used in searching for tools to evaluate the two restorers. Section IV describes the
computer simulation program which was used in the evaluation. Sections V and VI contain the
results which have been obtained and the conclusions which can be drawn from these results.
II. DESCRIPTION OF DYNAMIC RESTORING CIRCUITS
A. REVIEW OF THE TRANSOR FUNCTION
The Transor is described in detail in Appendix 4. A brief review of the Transor function
is given here to ease the discussion of the Hamming Distance function and to facilitate a
rough comparison of the salient features of each.
A block diagram of the Transor Restoring Circuit with binary inputs ($x_1, x_2, \ldots, x_R$)
is shown in figure 1. The functional relationship between the output Z, the inputs, and the
thresholds $T_0$ and $T_1$ is expressed in general as
(1) 
The specific function summarized by this relationship may be described as follows.
The number of binary "ones" appearing on the Transor inputs during each bit time (t) is
summed and compared with the number present during the previous time period (t-1). If
the change is positive and greater than a given threshold $T_1$, then the output Z is forced to a
binary "one". If the change is negative and greater in magnitude than a second threshold,
$T_0$, the output is forced to a binary "zero". If neither threshold is exceeded, the output does
not change from its previous state. This operation may be completely specified by the
following decision rule statements:
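$$
\begin{aligned}
\sum_{i=1}^{R} x_i^{(t)} - \sum_{i=1}^{R} x_i^{(t-1)} \ge T_1 \;&\rightarrow\; Z^{(t)} = 1 \\
\sum_{i=1}^{R} x_i^{(t)} - \sum_{i=1}^{R} x_i^{(t-1)} \le -T_0 \;&\rightarrow\; Z^{(t)} = 0 \\
-T_0 < \sum_{i=1}^{R} x_i^{(t)} - \sum_{i=1}^{R} x_i^{(t-1)} < T_1 \;&\rightarrow\; Z^{(t)} = Z^{(t-1)}
\end{aligned}
$$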
Figure 1. Block Diagram of the Transor (sum change detector, output memory)
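The decision rule statements above can be sketched in a few lines of Python. This is only an illustration, not the circuit itself: the function name `transor_step` and its argument names are hypothetical, and a threshold is assumed to be met when the change in the count of "ones" equals it, which agrees with the $T - 1$ bounds used in the analytical models of Section III.

```python
def transor_step(prev_inputs, inputs, prev_output, t0, t1):
    """One bit time of the Transor rule (illustrative sketch).

    prev_inputs, inputs: sequences of 0/1 values on the R input lines
    prev_output: the output Z at the previous bit time
    t0, t1: thresholds for forcing a "zero" and a "one"
    """
    # Net change in the number of "ones" between bit times t-1 and t.
    change = sum(inputs) - sum(prev_inputs)
    if change >= t1:       # enough lines moved 0 -> 1
        return 1
    if change <= -t0:      # enough lines moved 1 -> 0
        return 0
    return prev_output     # neither threshold reached: hold the old state
```

With five inputs and both thresholds set to 2, a move from all "zeros" to three "ones" forces the output to "one", while equal numbers of opposite transitions leave the output unchanged.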
B. DESCRIPTION OF THE HAMMING DISTANCE RESTORING CIRCUIT DECISION
FUNCTION
A block diagram for a Hamming Distance Restoring Circuit with binary inputs
($x_1, x_2, \ldots, x_R$) is shown in figure 2. The functional relationship between the output Z,
the inputs, and the threshold T can be expressed in a form similar to that of Transor:
$$Z^{(t)} = f\left[Z^{(t-1)};\ x_1^{(t)} - x_1^{(t-1)};\ x_2^{(t)} - x_2^{(t-1)};\ \ldots;\ x_R^{(t)} - x_R^{(t-1)};\ T\right]$$
Figure 2. Block Diagram of the Hamming Distance Restoring Circuit (state change detectors, threshold, output memory)
Again, this relationship summarizes a rather complicated function. In the same manner
as the Transor, the output of the Hamming Distance Restoring Circuit tends to remain in
the $Z^{(t-1)}$ state unless the number of state changes on its inputs exceeds some threshold. In
the latter case, however, the direction of state changes is not considered and output state
change decisions are made without any consideration of the absolute states of the inputs.
Thus, the output at time t, $Z^{(t)}$, is always dependent upon $Z^{(t-1)}$ and the Hamming distance
between the two input vectors $(x_1, x_2, \ldots, x_R)^{(t)}$ and $(x_1, x_2, \ldots, x_R)^{(t-1)}$. This relationship
is completely specified by the following rule statements*:
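$$
\begin{aligned}
T \le \sum_{i=1}^{R} \left| x_i^{(t)} - x_i^{(t-1)} \right| \;&\rightarrow\; Z^{(t)} = \bar{Z}^{(t-1)} \\
T > \sum_{i=1}^{R} \left| x_i^{(t)} - x_i^{(t-1)} \right| \;&\rightarrow\; Z^{(t)} = Z^{(t-1)}
\end{aligned}
$$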
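A minimal Python sketch of the Hamming Distance rule follows, under the reading that the output is complemented whenever the number of input state changes reaches the threshold T and is otherwise held. The function name `hamming_step` is hypothetical and chosen only for this illustration.

```python
def hamming_step(prev_inputs, inputs, prev_output, threshold):
    """One bit time of the Hamming Distance restoring rule (illustrative sketch)."""
    # Hamming distance between the input vectors at bit times t-1 and t:
    # the number of lines that changed state, regardless of direction.
    distance = sum(a != b for a, b in zip(prev_inputs, inputs))
    if distance >= threshold:
        return 1 - prev_output   # enough changes: complement the stored output
    return prev_output           # otherwise hold the previous state
```

Because only the count of changes is used, the absolute input states never enter the decision, which is the property discussed in part C of this section.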
C. COMPARISON OF TRANSOR AND THE HAMMING DISTANCE RESTORING CIRCUIT
The outstanding characteristic of the Hamming Distance Restoring Circuit which
differentiates it from the Transor is that it ignores information about the absolute state of its
inputs. This characteristic can be used to advantage because the input from a signal
processor producing both erroneous "ones" and "zeros" cannot cancel the influence of a working
processor input as it can in the Transor case. This may be illustrated by considering the
following input pattern for two bit times. Suppose that input 3 is failed to a steady-state
"zero", that inputs 1 and 2 represent the correct information, and that inputs 4 and 5 are
producing both extra "ones" and "zeros" at these bit times.

INPUTS                          $x_i^{(t-1)}$    $x_i^{(t)}$
1 (correct)                         0              1
2 (correct)                         0              1
3 (failed)                          0              0
4 (incorrect)                       1              0
5 (incorrect)                       1              0

OUTPUTS                           (t-1)           (t)
Threshold (majority, T = 3)         0              0
Transor                             0              0
Hamming Distance Restorer           0              1

* The function $\sum_{i=1}^{R} \left| x_i^{(t)} - x_i^{(t-1)} \right|$ is a measure of the difference between vectors $x^{(t)}$
and $x^{(t-1)}$ which applies frequently in information theory. The conception of this measure
is credited to R. W. Hamming of Bell Telephone Laboratories.
Actually, the states indicated by inputs 4 and 5 need not necessarily occur as a result
of component failures. For example, if no provision is made for synchronization,
corresponding elements of a redundant binary counter may become permanently out of phase as the
result of either noise or the initially random states due to application of power. For this
example, the net change in the number of "ones" is zero, but the total number of state changes
is four. It cannot be said from this one example that the Hamming Distance Restorer can
always withstand more input failures, but grounds for further consideration have certainly been
established.
It should be noted at this point that ignoring the absolute state of the inputs provides the
major advantage of the Hamming Distance Restorer, but it is also a disadvantage. Because the
output Z is not directly related to the absolute states of the inputs, the output state must be
set to the correct initial state before operation is begun or it has only a chance, perhaps 50%,
of being correct. If it is not initially correct, $Z^{(t)}$ will always be in the state opposite to the
correct one. Transor, on the other hand, will converge to the correct value after a small
number of bit times because of its dependence on the direction of state changes.
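The two points made in this part, the five-input example and the initial-state sensitivity, can be replayed with a short Python sketch. The helper names (`majority`, `transor`, `hamming`, `run`) are hypothetical, the dynamic thresholds are assumed to be 2 (the example does not state them), and the fault-free test signal is invented purely for illustration.

```python
def majority(inputs, threshold):
    """Threshold voter: output is 1 when at least `threshold` inputs are 1."""
    return 1 if sum(inputs) >= threshold else 0

def transor(prev_inputs, inputs, prev_output, t0, t1):
    change = sum(inputs) - sum(prev_inputs)
    if change >= t1:
        return 1
    if change <= -t0:
        return 0
    return prev_output

def hamming(prev_inputs, inputs, prev_output, t):
    changes = sum(a != b for a, b in zip(prev_inputs, inputs))
    return 1 - prev_output if changes >= t else prev_output

# The two bit times of the five-input example (inputs 1..5).
before = [0, 0, 0, 1, 1]
after = [1, 1, 0, 0, 0]
example = {
    "threshold": majority(after, 3),
    "transor": transor(before, after, 0, 2, 2),
    "hamming": hamming(before, after, 0, 2),
}

def run(restorer, signal, initial_output, redundancy=5):
    """Feed a fault-free redundant signal through a dynamic restorer."""
    outputs = []
    previous = [signal[0]] * redundancy
    output = initial_output
    for bit in signal[1:]:
        current = [bit] * redundancy
        output = restorer(previous, current, output)
        outputs.append(output)
        previous = current
    return outputs

# Start both dynamic restorers in the WRONG state on a fault-free signal.
signal = [0, 1, 0, 0, 1, 1, 0]
wrong_start = 1 - signal[0]
transor_out = run(lambda p, c, z: transor(p, c, z, 2, 2), signal, wrong_start)
hamming_out = run(lambda p, c, z: hamming(p, c, z, 2), signal, wrong_start)
```

In this sketch only the Hamming Distance rule produces the correct "one" at time (t) in the example, while on the fault-free signal the Transor output matches the signal from the first transition onward and the Hamming output stays complemented throughout.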
The remaining sections of this report will describe the efforts which have been made
to evaluate both Transor and the Hamming Distance Restoring Circuits. These evaluations
are referenced to the commonly used threshold voter. The results of the evaluations are
discussed in Section V. The conditions under which one of the dynamic restoring circuits
might be more powerful than the threshold voter are established.
III. REVIEW OF THE ANALYTICAL EFFORTS
A. SIGNAL PROCESSOR ASSUMPTIONS
To clarify the description of the analysis of the various restoring circuits, it seems
advisable to summarize the assumptions which have been made concerning the signal
processors which provide inputs to the restoring circuits. Each processor is assumed to be
composed of a set of components, all of which must work properly in order for the processor
output to be correct. It is assumed that the i-th component of the set has a probability
of failure during the differential interval $\Delta t$ which is proportional to the interval length.
This probability can be expressed as $\lambda_i \Delta t$. This implies that the reliability (the
probability that the i-th component does not fail during a time interval, t) is given by the expression
$$R_i(t) = e^{-\lambda_i t} \tag{3}$$
Because correct operation of all components is required for correct processor operation,
and assuming independence of failures between signal processors, the reliability of a
processor composed of N components is equal to the product of the component reliabilities.
Therefore:
$$R_s = \prod_{i=1}^{N} R_i = \prod_{i=1}^{N} e^{-\lambda_i t} = e^{-\left(\sum_{i=1}^{N} \lambda_i\right) t} \tag{4}$$
Similarly, if the set of components is partitioned into M subsets and a reliability is
computed for the j-th subset, the processor reliability would be the product of subset reliabilities.
Mathematically, this is expressed as
$$R_s = \prod_{j=1}^{M} R_j \tag{5}$$
and
$$R_j = \prod_{i=1}^{n_j} R_i = e^{-\left(\sum_{i=1}^{n_j} \lambda_i\right) t} \tag{6}$$
where $n_j$ is the number of components in the j-th subset and $\lambda_i$ is the failure rate of the i-th
component of the subset. The subsets into which the components are partitioned correspond
to the class of processor output errors which failure of the component will cause. The
classification of errors is discussed in this section.
If all failure modes of a component caused only errors of one class, the assumption
could be made that each component was completely associated with one of the class subsets.
In general, this is not true. For example, if the output transistor of a binary signal
processor is shorted (emitter to collector), the output would probably become permanently fixed at
the "zero" level. If, however, the transistor is open circuited, the output of the processor
would probably become permanently fixed at the "one" level. Because subsets are established
by classification of output error types, the above transistor cannot be uniquely associated with
any subset. To make an association, some artificial method must be used to assign to each
subset only that "portion" of a component which will cause that particular class of output
error. Although the components cannot be physically divided in the required manner, they can
be analytically split by multiplying the total failure rate of the component by the conditional
probability of the occurrence of each possible failure mode. This procedure produces a
number which can be considered the failure rate of a smaller component or subcomponent whose
failure results in only one of the possible classes of output errors.
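The bookkeeping described here, exponential component reliabilities, their product over subsets, and the analytical split of one component into subcomponents, can be illustrated with a small Python sketch. All numerical values (failure rates, mode probabilities, mission time) and all names are hypothetical and chosen only to make the arithmetic visible.

```python
import math

def component_reliability(rate, t):
    """Equation (3): probability that a component with failure rate `rate` survives time t."""
    return math.exp(-rate * t)

def processor_reliability(rates, t):
    """Equation (4): all components must work, so the rates add in the exponent."""
    return math.exp(-sum(rates) * t)

def split_by_mode(rate, mode_probabilities):
    """Split one failure rate into subcomponent rates, one per failure mode."""
    return {mode: rate * prob for mode, prob in mode_probabilities.items()}

# Hypothetical output transistor: total rate (per hour) and conditional
# probabilities of each failure mode given that the transistor has failed.
transistor_rate = 2.0e-6
sub_rates = split_by_mode(transistor_rate, {"steady_zero": 0.6, "steady_one": 0.4})

# Hypothetical error-class subsets of one processor (rates per hour).
subsets = {
    "steady_zero": [sub_rates["steady_zero"], 1.0e-6],
    "steady_one": [sub_rates["steady_one"], 0.5e-6],
}
t = 1000.0
all_rates = [rate for rates in subsets.values() for rate in rates]

# Equations (5) and (6): the product of subset reliabilities equals the
# reliability computed from all component rates at once.
product_of_subsets = math.prod(processor_reliability(rates, t) for rates in subsets.values())
whole_processor = processor_reliability(all_rates, t)
```

The subcomponent rates sum back to the original component rate, so the split changes only how failures are attributed to error classes and not the overall processor reliability.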
It should be noted at this point that the failure probabilities of the smaller subcomponents
described above are not independent of the operational state of all other similar components,
as are the original circuit components. This may be illustrated by referring to the previous
example. If the transistor in the example were split into two subcomponents representing
the short and open failure modes, and one of the subcomponents had failed, the other
subcomponent could not also fail. The occurrence of a double failure of subcomponents associated with
a single physical component, however, is normally a relatively improbable event in
comparison to the other system-failure producing events in associated circuits. For this reason,
this dependence effect has been ignored in all the models developed during this study.
B. CLASSIFICATION OF FAILURE EFFECTS 
In the initial phase of this study, which is reported in Appendix 4, it was shown that
the ability of dynamic restorers to differentiate between inputs working correctly and those
failed to a steady state could generate failure modes different from those of threshold
decision. There are, specifically, four modes which threaten the operation of dynamic restoring
circuits.
1) Wrong transitions cancelling correct transitions. (A sufficient number leave
   a net number of correct signals insufficient to span the set threshold.)
2) Wrong transitions occurring while the correct inputs remain in the same state
   (a series of extra "ones" or "zeros"). During this time the nominally correct
   inputs have lost their voting power so that, if enough wrong transitions occur
   at one time, they will span the threshold and result in a wrong decision.
3) Wrong transitions temporarily simulating steady-state failures. Wrong
   transitions can combine on adjacent bit times in a manner to produce a steady-state
   effect.
4) Steady-state failures. Enough steady-state failures would leave insufficient
   correct signals to span the threshold.
To illustrate, consider figure 3 where state vectors are used to represent the five
inputs to Transor. Inputs $x_1$ and $x_2$ are assumed to have failed and to be capable of error. For
definiteness all inputs at time (t) may be assumed correct. In the following bit times
(proceeding to the right) several failure patterns are possible for each nominally correct input
state. The cancellation mode (1) is clearly shown in the sequence (2) - (5) where extra
"zeros" have appeared at time (t+1). By virtue of the Transor decision rules, an error
will be made at (t+2) unless $T_0 = 1$ since the net result of the summation over (t+1) and (t+2)
is minus one. Of course, it is also possible for errors to cancel each other as in sequences
(3) - (4) and (3) - (7).
Figure 3. Possible Five-Input Sequences for Two Failures (state vectors at times t, t+1, and t+2)
The second failure mode (2) is shown in sequences (1) - (2) and (1) - (3) and the third
mode (3) by sequence (3) - (6). The result is the same in the third mode whether the errors
are caused by wrong transitions or steady-state errors.
Any output of a binary signal processor can be classified into one of six mutually
exclusive classes over an arbitrary time interval. These are:
1) Correct
2) Continuous Zero-State
3) Continuous One-State
4) Extra "ones" but no "zeros"
5) Extra "zeros" but no "ones"
6) Both extra "ones" and "zeros"
This classification is necessary because the failure modes caused by wrong transitions
have no parallel in threshold voters. A realistic comparison cannot be made on the basis of
each output simply failing or working. For example, the sixth output mode listed above
results in the cancellation effect (1) mentioned earlier. Likewise, output modes (4), (5), and
(6) result in the second and third failure modes listed in part A.
C. CLASS PROBABILITY MEASURE 
Each of the six mutually exclusive classes must be assigned a separate probability
measure. Let these be:
1) $p$: the probability that the output is correct
2) $q_{\gamma_0}$: the probability that the output is a continuous "zero"
3) $q_{\gamma_1}$: the probability that the output is a continuous "one"
4) $q_{\alpha_0}$: the probability that the output generates extra "zeros"
5) $q_{\alpha_1}$: the probability that the output generates extra "ones"
6) $q_{\alpha_{10}}$: the probability that the output generates both extra "ones" and "zeros"
   randomly.
The q's above are related to the reliability of the component subsets
through the simple relationship
$$r_j = 1 - q_j \quad \text{where } j = \gamma_0,\ \gamma_1,\ \alpha_0,\ \alpha_1,\ \alpha_{10} \tag{7}$$
and
$$r = r_{\gamma_0} \cdot r_{\gamma_1} \cdot r_{\alpha_1} \cdot r_{\alpha_0} \cdot r_{\alpha_{10}} \tag{8}$$
Thus, the q's refer to the probabilities that one or more failures will occur within a
particular set of subcomponents and cause the related output error.
D. ANALYTICAL MODELS 
In a multiple-line redundant system, it is assumed that each input to a restoring circuit 
is derived independently , and each input , over an arbitrary time interval , can be defined by 
one of six mutually exclusive operational classes. A physical system, defined in this man-
ner , suggests a multinomial distribution as its possible analytical model because the R re-
dundant lines can be considered analogous to R repeated trials of an event with more than 
two possible outcomes. 
1. The Multinomial Model for a Dynamic Restoring Circuit 
Let the number of outputs failed to a particular mode be represented by a random
variable. Specifically, let
$\gamma$ = the number of outputs failed to the steady state
$\alpha_0$ = the number of outputs generating extra "zeros"
$\alpha_1$ = the number of outputs generating extra "ones"
$\alpha_{10}$ = the number of outputs generating both extra "ones" and "zeros" randomly
Hence, the number of outputs that are continuously correct is
$$R - \alpha_{10} - \alpha_1 - \alpha_0 - \gamma$$
We see that the analytical model for a dynamic voter may be delineated by a subset of points
in a four dimensional sample space. These points correspond to possible operating states
of the system. Associated with each sample point is a probability defined by the density
function
$$\phi(\alpha_{10}, \alpha_1, \alpha_0, \gamma) = \binom{R}{R - \alpha_{10} - \alpha_1 - \alpha_0 - \gamma,\ \alpha_{10},\ \alpha_1,\ \alpha_0,\ \gamma}^{*}\ p^{\,R - \alpha_{10} - \alpha_1 - \alpha_0 - \gamma}\ q_{\alpha_{10}}^{\alpha_{10}}\ q_{\alpha_1}^{\alpha_1}\ q_{\alpha_0}^{\alpha_0}\ q_{\gamma}^{\gamma} \tag{9}$$
where
$$p + q_{\alpha_{10}} + q_{\alpha_1} + q_{\alpha_0} + q_{\gamma} = 1 \tag{10}$$
Thus, the reliability of a dynamic restoring circuit will be
$$R(t) = \sum_{\text{all } \phi \subset \Pi} \phi(\alpha_{10}, \alpha_1, \alpha_0, \gamma) \tag{11}$$
where $\Pi$ is the subset of sample points whose outcomes result in a continuously correct
decision by the circuit.
* The symbol $\binom{N}{x_1, x_2, \ldots, x_m}$, where $\sum_{i=1}^{m} x_i = N$, represents the mathematical function
$$\frac{N!}{\prod_{i=1}^{m} x_i!}$$
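As a numerical illustration of the multinomial model, the following Python sketch evaluates the density function at a sample point and sums it over a subset of points. The function names and the probability values are hypothetical; the membership test for the subset is passed in as a callable so that any set of decision-rule inequalities can be tried.

```python
from math import factorial
from itertools import product

def multinomial_coeff(counts):
    """N! / (x_1! x_2! ... x_m!) for counts summing to N."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)   # each partial quotient is an exact integer
    return result

def phi(a10, a1, a0, g, R, p, q10, q1, q0, qg):
    """Density at the sample point (a10, a1, a0, g) for R redundant lines."""
    correct = R - a10 - a1 - a0 - g
    if correct < 0:
        return 0.0
    coeff = multinomial_coeff([correct, a10, a1, a0, g])
    return coeff * p**correct * q10**a10 * q1**a1 * q0**a0 * qg**g

def reliability(R, probs, in_subset):
    """Sum of phi over the sample points accepted by `in_subset`.

    probs = (p, q10, q1, q0, qg); in_subset(a10, a1, a0, g) -> bool
    """
    total = 0.0
    for a10, a1, a0, g in product(range(R + 1), repeat=4):
        if a10 + a1 + a0 + g <= R and in_subset(a10, a1, a0, g):
            total += phi(a10, a1, a0, g, R, *probs)
    return total

# Hypothetical class probabilities that sum to one.
probs = (0.90, 0.01, 0.02, 0.03, 0.04)
whole_space = reliability(5, probs, lambda a10, a1, a0, g: True)
no_failures = reliability(5, probs, lambda a10, a1, a0, g: a10 + a1 + a0 + g == 0)
```

Summing over the whole sample space returns one (to rounding), and restricting the subset to the single all-correct point returns $p^R$, which gives two quick sanity checks on the density.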
2. The Transor Model 
For the Transor, membership in the subset $\Pi$ may be determined by the intersection
of the following set of linear inequalities derived from the Transor decision rules.
$$
\begin{aligned}
\alpha_{10} + \alpha_1 &\le T_1 - 1 \\
\alpha_{10} + \alpha_0 &\le T_0 - 1 \\
2\alpha_{10} + \alpha_1 + \alpha_0 + \gamma &\le T'
\end{aligned}
$$
where $\gamma = \gamma_1 + \gamma_0$ and $T' = R - T_0$ or $R - T_1$, whichever is smaller. Thus
(13) 
3. The Hamming Distance Restoring Circuit Model 
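The decision rules for the Hamming Distance Restoring Circuit described earlier
in the report determine the following set of linear inequalities:
$$
\begin{aligned}
\alpha_{10} + \alpha_1 &\le T - 1 \\
\alpha_{10} + \alpha_0 &\le T - 1 \\
\alpha_{10} + \alpha_1 + \alpha_0 + \gamma &\le R - T
\end{aligned}
$$
Removal of the cancellation effect accounts for the absence of the factor of two (2) in the last
inequality, thus making the Hamming circuit less sensitive to failures causing both extra "one"
and "zero" transitions. From these decision rules, the reliability of the circuit can be
written as
$$R_H(t) = \sum_{\alpha_{10}=0}^{T-1}\ \sum_{\alpha_1=0}^{T-1-\alpha_{10}}\ \sum_{\alpha_0=0}^{T-1-\alpha_{10}}\ \sum_{\gamma=0}^{R-T-\alpha_{10}-\alpha_1-\alpha_0} \phi(\alpha_{10}, \alpha_1, \alpha_0, \gamma) \tag{14}$$
For R=5 and T=2
$$
\begin{aligned}
R_H(t) ={}& p^5 + 5p^4(1-p) + 10p^3 q_{\gamma}^2 + 20p^3 q_{\alpha_1} q_{\gamma} + 20p^3 q_{\alpha_0} q_{\gamma} + 20p^3 q_{\alpha_1} q_{\alpha_0} + 20p^3 q_{\alpha_{10}} q_{\gamma} \\
& + 10p^2 q_{\gamma}^3 + 30p^2 q_{\alpha_{10}} q_{\gamma}^2 + 30p^2 q_{\alpha_1} q_{\gamma}^2 + 30p^2 q_{\alpha_0} q_{\gamma}^2 + 60p^2 q_{\alpha_1} q_{\alpha_0} q_{\gamma}
\end{aligned}
\tag{15}
$$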
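The expansion for R=5 and T=2 can be cross-checked by brute force: enumerate every sample point, keep those that satisfy the three Hamming inequalities, and add up their multinomial probabilities. The Python sketch below does this with hypothetical class probabilities (chosen to sum to one) and compares the result with the twelve-term polynomial; the function names are illustrative only.

```python
from math import factorial
from itertools import product

def phi(counts, probs):
    """Multinomial probability of one sample point; counts and probs are aligned tuples."""
    coeff = factorial(sum(counts))
    value = 1.0
    for c, q in zip(counts, probs):
        coeff //= factorial(c)
        value *= q ** c
    return coeff * value

def hamming_reliability(R, T, p, q10, q1, q0, qg):
    """Sum phi over the points allowed by the Hamming Distance inequalities."""
    total = 0.0
    for a10, a1, a0, g in product(range(R + 1), repeat=4):
        failed = a10 + a1 + a0 + g
        if failed > R:
            continue
        if a10 + a1 <= T - 1 and a10 + a0 <= T - 1 and failed <= R - T:
            total += phi((R - failed, a10, a1, a0, g), (p, q10, q1, q0, qg))
    return total

def closed_form_15(p, q10, q1, q0, qg):
    """The twelve-term expansion for R=5, T=2."""
    return (p**5 + 5 * p**4 * (1 - p)
            + 10 * p**3 * qg**2 + 20 * p**3 * q1 * qg + 20 * p**3 * q0 * qg
            + 20 * p**3 * q1 * q0 + 20 * p**3 * q10 * qg
            + 10 * p**2 * qg**3 + 30 * p**2 * q10 * qg**2
            + 30 * p**2 * q1 * qg**2 + 30 * p**2 * q0 * qg**2
            + 60 * p**2 * q1 * q0 * qg)

# Hypothetical class probabilities: p, q_alpha10, q_alpha1, q_alpha0, q_gamma.
probs = (0.90, 0.01, 0.02, 0.03, 0.04)
enumerated = hamming_reliability(5, 2, *probs)
expanded = closed_form_15(*probs)
```

The agreement of the two numbers relies on the class probabilities summing to one, since the single-failure term of the expansion is written with $(1-p)$.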
4. The Threshold Restoring Circuit Model 
In system reliability analysis using majority threshold voters, it is customary to
assume that the failure of a majority of inputs, regardless of their mode, will result in a
wrong decision. Although this common assumption was used in Special Technical Report No.
4, it is not strictly correct because a threshold voter may tolerate as many as R-1 failed
inputs and still function correctly. A more rigorous approach, using the results of section
IIIB, can be found by letting:
1) $\theta_{10}$ be a random variable denoting the number of wrong "ones"
   and "zeros"
2) $\Psi_1$ be a random variable denoting the number of wrong "ones" only
3) $\Psi_0$ be a random variable denoting the number of wrong "zeros" only
Thus, we see that the parameters defined for the threshold voter are related to
the dynamic restorer by:
$$
\begin{aligned}
\Psi_1 &= \alpha_1 + \gamma_1 \\
\Psi_0 &= \alpha_0 + \gamma_0 \\
\theta_{10} &= \alpha_{10} + \chi
\end{aligned}
$$
where $\chi$ is a dummy variable which accounts for the case in which a signal processor has
experienced two failures causing opposite steady-state errors. Because it is impossible to
say which of the two failures will control the output for a general case, the worst case
condition is assumed and in the models both are assumed to exist simultaneously. By virtue of
threshold decision rules the subset $\Pi$ may be defined by
$$
\begin{aligned}
\theta_{10} + \Psi_1 &\le T - 1 \\
\theta_{10} + \Psi_0 &\le R - T
\end{aligned}
$$
The reliability of the threshold voter is, then
where $T'' = T-1$ or $R-T$, whichever is smaller.
For example, if R=5 and T=3 we have
$$R_{Th}(t) = p^5 + 5p^4(1-p) + 10p^3(1-p)^2 + 30p^2\, q_{\Psi_1}\, q_{\Psi_0}^2 + 30p^2\, q_{\Psi_1}^2\, q_{\Psi_0} \tag{17}$$
E. THRESHOLD PARAMETERS AS A BOUND ON DYNAMIC PARAMETERS 
It was shown that the terms in the analytical models corresponded to probability measures
associated with specific members of the subset $\Pi$ within the sample space. Criteria
for membership in $\Pi$ were determined by the intersection of a set of linear inequalities
determined from a decision rule.
It will now be shown that a dynamic restoring circuit cannot be as effective as the
threshold voter when the optimum threshold T for the threshold voter is (R + 1)/2 and the
optimum threshold for a dynamic voter is $\ge$ (R + 1)/2. It has been shown that when
$q_{\Psi_1} \approx q_{\Psi_0}$ (defined earlier) within a certain range, the optimum threshold for a threshold
voter is (R + 1)/2. The decision rules for the threshold voter now become, using the relations
previously described in the threshold restoring circuit model:
$$\theta_{10} + \alpha_1 + \gamma_1 \le \frac{R-1}{2} \tag{18}$$
$$\theta_{10} + \alpha_0 + \gamma_0 \le \frac{R-1}{2} \tag{19}$$
Assume also that the ratio of $q_{\gamma}/(q_{\alpha_{10}} + q_{\alpha_1} + q_{\alpha_0})$ is such that the optimum
dynamic restoring circuit threshold is also (R + 1)/2; hence, the decision rules for the dynamic
circuit become
$$\alpha_{10} + \alpha_1 \le \frac{R-1}{2} \tag{20}$$
$$\alpha_{10} + \alpha_0 \le \frac{R-1}{2} \tag{21}$$
$$\alpha_{10} + \alpha_1 + \alpha_0 + \gamma_1 + \gamma_0 \le \frac{R-1}{2} \tag{22}$$
where $\gamma = \gamma_1 + \gamma_0$ and $\alpha_{10} \subset \theta_{10}$. Let all the terms generated by inequalities (18)
and (19) form the set $\Pi_{Th}$ and those by (20), (21), and (22) the set $\Pi_H$. The proof consists
of simply showing that $\Pi_H \subset \Pi_{Th}$. Clearly each random variable considered one at a time
will form the non-empty subsets of the form:
$$\sum_{i=1}^{\frac{R-1}{2}} \binom{R}{R-i}\, p^{R-i}\, (q_k)^i$$
where $k = \theta_{10}, \alpha_{10}, \alpha_1, \alpha_0, \gamma_1, \gamma_0$. $\Pi_H \subset \Pi_{Th}$ by virtue of the fact that
$\alpha_{10} \subset \theta_{10}$. The proof becomes even more obvious when we consider the non-empty subsets
formed by combinations of random variables taken two at a time. Choosing one variable
from inequality (18) and one from (19) will generate non-empty subsets of the form
$$\sum_{i=1}^{\frac{R-1}{2}} \sum_{j=1}^{\frac{R-1}{2}} \binom{R}{R-i-j,\ i,\ j}\, p^{R-i-j}\, (q_k)^i\, (q_l)^j \quad \text{for } k \ne l \tag{23}$$
where $k, l = \theta_{10}, \alpha_{10}, \alpha_1, \alpha_0, \gamma_1, \gamma_0$. Choosing two terms from (6) will form
non-empty subsets of the form
(24)
Now $\Pi_H \subset \Pi_{Th}$ because the number of terms generated by (23) is $\left(\frac{R-1}{2}\right)^2$ and the
number in (24) is
3 + 4 + ••• + R+I 
-2-
and for all R ~ 5 R-3 
-2-
(R;I)2 ~ L M 
M= I 
M =3 
(25) 
(26) 
Likewise, the same reasoning may be applied to combinations of random variables taken
three at a time. Thus, it has been shown that if the dynamic restorer is to show superior
performance it can only do so when its optimum threshold is reached at values less than
(R + 1)/2.
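The containment argued in this part can also be checked by direct enumeration. The Python sketch below lists every sample point $(\alpha_{10}, \alpha_1, \alpha_0, \gamma_1, \gamma_0)$, applies inequalities (20) to (22) for the dynamic circuit and (18) and (19) for the threshold voter, and tests whether the first set lies inside the second. As a simplifying assumption the dummy variable $\chi$ is taken to be zero, so that $\theta_{10} = \alpha_{10}$; the function names are hypothetical.

```python
from itertools import product

def in_dynamic(a10, a1, a0, g1, g0, R):
    """Inequalities (20), (21), (22) with both thresholds at (R + 1)/2."""
    half = (R - 1) // 2
    return (a10 + a1 <= half and a10 + a0 <= half
            and a10 + a1 + a0 + g1 + g0 <= half)

def in_threshold(a10, a1, a0, g1, g0, R):
    """Inequalities (18), (19), taking theta_10 = alpha_10 (chi assumed zero)."""
    half = (R - 1) // 2
    return a10 + a1 + g1 <= half and a10 + a0 + g0 <= half

def compare(R):
    """Return (is_subset, size of dynamic set, size of threshold set) for odd R."""
    points = [pt for pt in product(range(R + 1), repeat=5) if sum(pt) <= R]
    dynamic = {pt for pt in points if in_dynamic(*pt, R)}
    threshold = {pt for pt in points if in_threshold(*pt, R)}
    return dynamic <= threshold, len(dynamic), len(threshold)

results = {R: compare(R) for R in (3, 5, 7)}
```

For each odd order tried, every point tolerated by the dynamic circuit is also tolerated by the threshold voter, and the threshold voter tolerates additional points, which is the sense in which the threshold parameters bound the dynamic ones.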
F. A COMPARISON OF TRANSOR AND THE HAMMING DISTANCE RESTORING CIRCUIT 
In previous discussions, it has been noted that the Transor is controlled by two thresholds
as opposed to the single threshold of the Hamming Distance Restoring Circuit. It
might be argued that the utility of two thresholds, not necessarily set at the same level,
would present an added advantage in a highly asymmetrical environment, i.e., one in which
either "one" or "zero" errors are more likely. That this is not the case will be shown in
the following discussion.
In an earlier Westinghouse report¹ it was shown that in an asymmetrical environment,
a great increase in threshold voter performance could be had by using thresholds less than
or greater than (R + 1)/2 according to a criterion developed in that report. Since dynamic
restoring circuits cannot distinguish between outputs failed to a continuous "one" and those failed
to a continuous "zero", they cannot take advantage of the asymmetry in steady-state errors.
This leaves for consideration only asymmetrical transitional errors.
¹ P. A. Jensen, "Decision Making in Redundant Systems", Report No. EE-2599,
December 1961.
The results of the previous section have shown that for a dynamic restoring circuit to
show improvement over a threshold voter, the optimum dynamic restoring circuit threshold
must be reached at a value less than (R + 1)/2.
If it is assumed that the optimum value of threshold for the Hamming Distance Restoring
Circuit is reached at a value $T_{opt}$, where $T_{opt}$ is less than (R + 1)/2 and R=5, the following
possibilities exist for the Transor thresholds.
1) $T_0 = T_1 = T_{opt}$
2) $T_0 \ne T_1 = T_{opt}$
3) $T_1 \ne T_0 = T_{opt}$
The first case is trivial. If all thresholds are equal, then the $\Pi_T$ formed from the
Transor criteria is clearly a subset of $\Pi_H$, i.e., $\Pi_T \subset \Pi_H$ by virtue of the factor
$2\alpha_{10}$ in the Transor inequality.
In case (2) $T_0$ can either be greater or less than $T_1$. If $T_0 < T_1$ then $(R - T_1) < (R - T_0)$
and is the controlling factor. But since $(R - T_1) = (R - T_{opt})$, $\Pi_T \subset \Pi_H$. If $T_0 > T_1 =
T_{opt}$, for example, $T_0 = T_1 + 1 = T_{opt} + 1$, then $(R - T_0)$ is the controlling factor. But $(R - T_0)
= (R - T_{opt} - 1)$ so that, effectively, while the number of terms containing transient
probabilities has been increased, the number of terms containing steady-state probabilities has
been decreased by the same number, and since $q_{\gamma} \gg q_{\alpha_0}$ the reliability of the Transor
will never be as good as that of the Hamming Distance restoring circuit. The same reasoning
may be applied to case (3).
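The case analysis above can be illustrated by enumerating the sample points admitted by each circuit. The Python sketch below uses R=5 and $T_{opt}=2$ and builds the Transor and Hamming Distance sets from the inequalities of the two analytical models; the specific unequal threshold pairs tried ($T_0 = 1$ and $T_0 = 3$ with $T_1 = 2$) and the function names are assumptions made only for this illustration.

```python
from itertools import product

def in_hamming(a10, a1, a0, g, R, T):
    """Hamming Distance model inequalities."""
    return (a10 + a1 <= T - 1 and a10 + a0 <= T - 1
            and a10 + a1 + a0 + g <= R - T)

def in_transor(a10, a1, a0, g, R, T0, T1):
    """Transor model inequalities, with T' the smaller of R - T0 and R - T1."""
    t_prime = min(R - T0, R - T1)
    return (a10 + a1 <= T1 - 1 and a10 + a0 <= T0 - 1
            and 2 * a10 + a1 + a0 + g <= t_prime)

def regions(R, T, T0, T1):
    """Return the (Transor, Hamming) sets of tolerated sample points."""
    points = [pt for pt in product(range(R + 1), repeat=4) if sum(pt) <= R]
    transor_set = {pt for pt in points if in_transor(*pt, R, T0, T1)}
    hamming_set = {pt for pt in points if in_hamming(*pt, R, T)}
    return transor_set, hamming_set

equal_t, equal_h = regions(5, 2, 2, 2)   # case 1: T0 = T1 = Topt
lower_t, lower_h = regions(5, 2, 1, 2)   # case 2 with T0 < T1 = Topt
upper_t, upper_h = regions(5, 2, 3, 2)   # case 2 with T0 > T1 = Topt

# Points the Transor gains and loses relative to Hamming when T0 > T1.
gained = upper_t - upper_h
lost = upper_h - upper_t
```

With equal thresholds, or with $T_0$ below $T_1$, the Transor set lies inside the Hamming set. With $T_0$ above $T_1$ the Transor admits some additional points involving extra "zeros" but gives up points that contain more steady-state failures, which is the trade described in the text.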
IV. SIMULATION PROGRAM 
The success of the computer simulation program in evaluating self-repairing systems 
encouraged the use of a similar program for use as an analytical tool in this phase of the 
failure free systems study. Such a computer program has been written and has provided a 
variety of interesting results. Insights into the Transor circuit's most vulnerable areas 
were gained through this program. One of the results was the development of the Hamming 
Distance Restoring Circuit. The development of the system failure criteria statements for 
the program contributed to the development of the general decision rules which have been de-
fined for Transor, Hamming Distance, and Threshold restorers. The program was used to 
find the ratio of steady-state to transient error probabilities for which the dynamic restoring 
circuits were at least as effective as the Threshold voter in deriving correct system outputs. 
Finally, the program provided a check for the analytical models when numerical examples 
were considered. 
Computer simulation programs are commonly used to analyze the performance of de-
terministic systems which are so large and complex that a mathematical model would be 
unwieldy or of probabilistic systems which are difficult to model, or when specialized infor-
mation is desired. The Dynamic Restoring Circuit Evaluator (DRCE) program fell into this 
last category. 
The computer program which has been written for this study retains all of the basiC . 
philosophy of the program previously developed for the evaluation of self-repairing systems. * 
Some portions of the self-repair program were used directly in the DRCE program, but the 
sections of this latter program which concerned system operational state (i. e. , working or 
failed) are much simpler than those of the self-repair program. These simplifications were 
possible because of the reduced size and the non- adaptive nature of this simulation problem. 
*This program is described in Appendix 6. 
In this simulation program, the range of numbers between zero and unity is divided into 
intervals, and each interval is assigned to one of the subcomponents of the system. In a 
system containing (s) subcomponents, the range is divided into (s) intervals, each assigned to 
a different subcomponent. This procedure guarantees that every number in the range is 
associated with exactly one subcomponent and that every subcomponent is assigned an interval 
in the range. By judiciously specifying the lengths of the intervals, random numbers from a 
population uniformly distributed between zero and unity can be used to simulate naturally 
occurring random subcomponent failures within the system. To do this, the length of each 
subcomponent's interval is made equal to the conditional probability of failure of that 
subcomponent given that a failure exists somewhere within the system. This probability is 
given by the expression 
$$P_j = \frac{\lambda_j}{R \sum_{i=1}^{M} \lambda_i}$$ 
where $\lambda_j$ is the failure rate of the j-th subcomponent, M is the number of 
subcomponents in a single processor and R is the order of redundancy (i.e., the number of 
signal processors in a stage). A component failure is simulated by determining a time to 
failure* and then locating the subcomponent to be designated failed by associating a random 
number with a particular interval of numbers. Having done this, the type of signal processor 
output error is automatically specified, and the effect of this error on system operation 
can be found. 
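
The interval assignment described above can be sketched in a few lines of Python. This is only an illustration of the selection step, not the DRCE program itself: the function names, the ordering of the intervals (processor by processor) and the choice not to renormalize the intervals after a failure are all assumptions made for the example.

```python
import bisect
from itertools import accumulate

def build_intervals(failure_rates, redundancy):
    """Return the upper edges of the sub-intervals of [0, 1).

    failure_rates: the M subcomponent failure rates of one signal processor.
    redundancy:    R, the number of identical signal processors.
    Each of the R*M subcomponents receives an interval of length
    lambda_j / (R * sum(lambda_i)), so the lengths sum to one.
    """
    total = redundancy * sum(failure_rates)
    lengths = [rate / total for _ in range(redundancy) for rate in failure_rates]
    edges = list(accumulate(lengths))
    edges[-1] = 1.0  # guard against floating-point round-off in the last edge
    return edges

def select_failed(edges, u, num_subcomponents):
    """Map a uniform number u in [0, 1) to (processor, subcomponent) indices."""
    k = bisect.bisect_right(edges, u)      # index of the interval containing u
    return divmod(k, num_subcomponents)    # assumed layout: processor-major
```

For example, with two subcomponents of rates 1 and 3 and two processors, the interval edges are 0.125, 0.5, 0.625 and 1.0, and a random number of 0.3 designates the second subcomponent of the first processor.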
As the first step, a system is set up with no initial failures. The above process is be-
gun and continued repetitively until the system under consideration no longer meets one or 
more operational criteria. At this point, the total system operating time is computed as the 
sum of the times between component failures. This entire procedure is now repeated many 
times (usually 100), and data concerning number of failures withstood and system operating 
times are recorded. From this data various curves are plotted, and system response to 
various failure patterns is observed. 
* The method used to determine the time between succeeding failures is identical to that 
used in the self-repairing systems simulation. That method is described on pages 10 and 
11 of Appendix 6. 
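
The repeated-run procedure can be illustrated with the following minimal Python sketch. It is not the original program. Several simplifications are assumed and are hypothetical: each of the five signal processors is treated as a single component, the time between failures is drawn from an exponential distribution with a constant total rate, and the failure criterion (`majority_lost`) is a plain majority rule that ignores the distinction between transient and steady-state errors.

```python
import random

def simulate_run(num_components, total_rate, has_failed, rng):
    """Simulate one system from no failures until has_failed(failed) is true.

    Returns (operating_time, failures_to_system_failure).
    """
    working = list(range(num_components))
    failed = set()
    elapsed = 0.0
    while working:
        # Assumed: exponentially distributed time between successive failures.
        elapsed += rng.expovariate(total_rate)
        # Assumed: every working component is equally likely to fail next.
        failed.add(working.pop(rng.randrange(len(working))))
        if has_failed(failed):
            break
    return elapsed, len(failed)

def majority_lost(failed, order=5):
    """Hypothetical operational criterion: a majority of processors is in error."""
    return len(failed) > order // 2

def simulate_many(runs=100, seed=1963):
    """Repeat the simulation (100 runs, as in the report) and collect the data."""
    rng = random.Random(seed)
    return [simulate_run(5, 1.0, majority_lost, rng) for _ in range(runs)]
```

The list returned by `simulate_many()` holds one (operating time, number of failures) pair per simulated system, which is the raw material for the curves discussed in the next section.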
5-20 
V. DISCUSSION OF RESULTS 
A. SIMULATION RESULTS 
Before proceeding with a discussion of the results, a brief description of how compara-
tive reliability versus time curves were obtained is required. For each system simulation, 
the computer print-out includes a number which indicates the total operating time of the sys-
tem before the occurrence of a critical failure pattern caused loss of system function. These 
numbers are ordered and split into groups so that a histogram of percent of systems failed 
versus time can be formed. A typical histogram is shown in figure 4. From this histogram 
an approximate reliability vs. time curve can be easily constructed by starting a line at 
unity (100%) on the ordinate and zero (0.0) on the abscissa or time axis and proceeding hori-
zontally to the right until the time corresponding to the first spike on the histogram is 
reached. At this point the line is dropped vertically by the arithmetic magnitude of the spike, 
then continued to the right again until the next spike is reached. Continued repetition of this 
procedure produces a curve such as that shown in figure 5. 
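
The construction just described amounts to a descending staircase. A minimal Python sketch follows; the function name is hypothetical, and it is assumed that the operating times passed in have already been grouped, so that every distinct time value corresponds to one spike of the histogram.

```python
from collections import Counter

def reliability_curve(operating_times):
    """Return [(time, fraction_operating), ...] for the descending staircase.

    The curve starts at unity at time zero and drops, at each failure time,
    by the fraction of systems that failed at that time.
    """
    n = len(operating_times)
    spikes = Counter(operating_times)   # height of each histogram spike
    curve = [(0.0, 1.0)]
    surviving = n
    for t in sorted(spikes):
        surviving -= spikes[t]
        curve.append((t, surviving / n))
    return curve
```

For four systems with operating times 2, 1, 2 and 4, the curve passes through (0, 1.0), (1, 0.75), (2, 0.25) and (4, 0.0).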
The question that immediately arises is "How many system simulations must be run in 
order for a curve constructed in this manner to be smooth enough to provide a meaningful 
approximation to the true system reliability curve?" Because the question of "What is smooth 
enough?" cannot be precisely stated without a series of opinionated assumptions, a simpler, 
much less rigorous method of evaluation was used. The number of runs was arbitrarily set 
at 100 and a curve was plotted for a particular Transor voted system. This was compared to 
a series of points computed from the analytical reliability expression for the same system 
subject to the same failure rates. The curve and points are shown in figure 6. The 
correspondence of the curve and the set of points was close enough that no increase in the 
number of simulated runs was considered necessary. This relatively low number of runs 
had the distinct advantage of requiring a computer running time of only about 30 seconds, 
including compilation time, while producing acceptable results. 
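
The report settles the number of runs empirically. As a rough supplementary check, which is not part of the original study, the sampling scatter of a single point on a simulated curve can be estimated from the binomial standard error, assuming the runs are independent. The function name below is hypothetical.

```python
import math

def standard_error(reliability, runs):
    """Standard error of a reliability estimate obtained from independent runs."""
    return math.sqrt(reliability * (1.0 - reliability) / runs)
```

With 100 runs the scatter is largest where the true reliability is 0.5, at about 0.05, and falls to about 0.03 where the reliability is 0.9. This is consistent in magnitude with a curve that tracks analytical points closely but not exactly.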
One more detail must be pointed out before the curves can be completely understood. 
The primary interest in the study was the effect of changes in the ratio of the probabilities 
of steady-state to transient errors. For this reason, the total failure rate of the signal 
processors was held constant for all simulations. This means that not only the general shape 
of the reliability curves can be meaningfully compared, but also their locations relative to 
the time axis. Holding the total failure rate constant in no way restricts the generality of 
the results because a change in this rate would simply cause a linear shift of the curves along 
the time axis. 
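
The last statement can be made explicit under the assumption, used throughout the simulation, that all failure rates are constant in time. If every failure rate is multiplied by the same factor $c$, the time to system failure $T$ is replaced by $T/c$, so that the reliability curve $R_c(t)$ of the modified system is related to the original curve $R(t)$ by

$$R_c(t) = P\left(\frac{T}{c} > t\right) = P(T > ct) = R(ct).$$

The symbol $c$ is introduced here for illustration only. On a time axis normalized to the total failure rate the two curves coincide, which is why curves obtained at one fixed total rate lose no generality.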
Figure 4. Typical Histogram (percent of system failures during each interval versus time intervals) 
Figure 5. Approximation to Reliability Curve (percent of systems operating versus time) 
Figure 6. Transor Order 5 Redundancy (fraction of systems operating at time (t) versus time) 
B. CURVES DISCUSSION 
The first Transor simulations showed that in the region where the Transor was competitive 
with the threshold voter, the optimum T0 and T1 were both equal to two for an order five 
system. The discovery that this relationship held even under highly asymmetric failure 
probability conditions stimulated the development of the Hamming Distance Restoring Circuit. 
It has since been shown analytically (see Section III) that the Hamming Distance Circuit 
always dominates the Transor for order five redundancy applications. This result correlates 
with the simulation comparison for the same configuration, subject to the same failure mode 
conditions. An example of the simulation results is shown in figure 7. 
In comparing the curves for the Hamming Distance Restoring Circuit and those for the 
threshold voter, it has been found that the latter tends to produce a more reliable output for 
steady-state to transient error probability ratios below approximately seven to one (7:1) and 
the Hamming Distance Restoring Circuit a slightly more reliable output above that ratio. This 
ratio cannot be exactly determined because certain worst case assumptions have been made in 
establishing system operational rules for both circuits. These assumptions are slightly more 
detrimental to one than the other and may not be precisely realistic in either case. This is 
demonstrated by the combination of points and curves shown in figure 8. In this figure, the 
Hamming Distance curve appears to be slightly better than the threshold simulation curve in 
the high reliability region of the curves and worse in the long life region. For this plot of 
the threshold curve, the assumption was made that the first steady-state error to occur in any 
processor assumed permanent control of the output of that processor and any future transient 
or steady-state errors in that processor were ignored. The points in that same figure were 
plotted from a theoretical analysis in which it was assumed that the most detrimental 
steady-state error which had occurred always controlled the outputs. This worst case 
assumption does not affect the Hamming Distance curve but it heavily influences the threshold 
curve. Under this assumption, the Hamming Distance Restoring Circuit clearly dominates over a 
large section of the curve. 
Figure 7. Comparison of Transor and Hamming Distance (fraction of systems operating at time (t) versus time) 
Figure 8. Comparison of Threshold and Hamming Distance (fraction of systems operating at time (t) versus time) 
It is interesting to observe the changes which occur in the reliability curves of the 
restoring circuits as the ratio of steady-state to transient error probabilities is increased. 
The fact that, as this ratio is increased, the Hamming Distance curve and the threshold curve 
get closer together until they cross indicates that one or both of the curves are shifting in 
response to the change. The first possibility seems to be the case. The points on the 
threshold curve tend to remain fixed. (NOTE: a slight shift to the right may be observed. This 
is caused by a reduction in the Pa 10 as the ratio increases.) The Hamming Distance curve 
is sensitive to changes in the ratio and shifts rapidly enough to the right to overtake the 
threshold curve. At approximately the ratio at which this occurs, the Hamming Distance curve 
rapidly becomes less sensitive to changes in the ratio. As the ratio continues to be increased, 
the curve stabilizes and finally begins to fall slowly back to the left, thus indicating that an 
optimum ratio exists in the region near (7:1). This phenomenon appears to be caused by the 
discrete nature of the threshold which controls the Hamming Distance decision rules. As the 
seven to one (7:1) ratio is greatly exceeded, the threshold of the Hamming Distance circuit 
should be reduced to (1) if additional improvement in the reliability curve is to be expected. 
This threshold reduction, however, would make the circuit vulnerable to single transient 
errors. Despite the probable improvement in the overall reliability curve, this sensitivity to 
single failures is generally considered undesirable. For this reason, no effort was made to 
simulate systems with this threshold. 
In figure 9, a comparison is made between an order five Hamming Distance curve and 
an order seven threshold curve at a ratio of seven to one (7:1). It can be observed that in 
the high reliability region, the curves are almost indistinguishable. This implies that under 
these ratio conditions, an order five Hamming Distance restorer system might be as useful 
as an order seven threshold voter system. This would allow an obvious saving in redundant 
equipment. 
Figure 9. Comparison of Order 7 Threshold and Order 5 Hamming Distance (fraction of systems operating at time (t) versus time) 
VI. CONCLUSIONS 
From the results obtained by manipulating the analytical reliability expressions for 
the Transor and Hamming Distance Restoring Circuits, it may be concluded that the output of 
the Hamming Distance Circuit is more reliable than that of the Transor in order five redundant 
systems. This conclusion holds for any ratio of steady-state to transient error probability 
and any asymmetry (tendency toward "ones" or "zeros") of error probabilities. 
From comparison of the simulation curves, it may be concluded that the threshold circuit 
is more reliable than either of the dynamic restoring circuits until the ratio of the 
probability of steady-state errors to the probability of transient errors exceeds approximately 
seven to one. Above this ratio, the dynamic restoring circuit outputs are more reliable. 
Further comparison reveals that the difference between the reliability curves tends to 
stabilize or slightly decrease as the ratio becomes much larger than 7:1. The stabilizing 
effect is more pronounced as the order of redundancy is increased from five to seven. 
Finally, it may be concluded that in the short life, high reliability region with 
approximately a seven to one probability ratio, an order five system using Hamming Distance 
Restorers may be as reliable as an order seven system using threshold voters. 
5-29 
SELF REPAIR TECHNIQUES 
by 
M. R. Cosgrove 
C. G. Masters 
September 1963 
Appendix 6 
ABSTRACT 
This report describes the initial step in the design of an optimal self-repairing 
system. The report contains a description of the several classes of "repair" strategies 
under consideration and the computer simulation program which is used to determine the 
performance of the systems for each strategy. 
The computer simulation program determines the performance of a particular strategy 
by injecting random failures throughout the system and simulating system reaction according 
to the "repair" pattern of the strategy in question. The program prints out system performance 
in terms of: 
1. total time to failure 
2. average time to failure 
3. number of failures to system failure 
4. number of switches affected. 
The results for the two classes of strategies for which curves were drawn show 
that with the addition of a minimal amount of self-repair capability, the reliability of the 
system can be substantially increased over that of a comparable system using fixed 
redundancy alone for failure protection. 
6-ii 
TABLE OF CONTENTS 
                                                                             Page 
ABSTRACT                                                                       ii 
I.   INTRODUCTION                                                               1 
II.  STRATEGY DESCRIPTION                                                       5 
     A. Basic Assumptions                                                       5 
     B. Basic Strategy Classes Considered to Date                               5 
III. THE COMPUTER SIMULATION PROGRAM                                            9 
     A. The Reason a Simulation Program was Used                                9 
     B. How the Program Works                                                   9 
     C. Sample Format                                                          12 
     D. Production Format                                                      13 
IV.  RESULTS                                                                   15 
     A. Failures Withstood (as percent of system) vs. Spare Mobility           15 
     B. Reliability vs. Time Curves                                            17 
V.   SUMMARY AND CONCLUSIONS                                                   25 
VI.  FUTURE STUDIES                                                            27 
VII. APPENDIX                                                                  29 
6-iii 
6-iv 
LIST OF ILLUSTRATIONS 
Figure                                                                       Page 
1    Multiple-line Redundant System                                             2 
2    Multiple-line Redundant System with Self-Repair Capability                 2 
3    Probability Distribution of a Component Failure                           10 
4    Simulation Matrix                                                         12 
5    Average Number of Failures Withstood (as Percent of Gamma 1 
     Systems) Versus Number of Moves per Spare                                 16 
6    Average Number of Failures Withstood (as Percent of Beta Systems) 
     Versus Number of Spares per Block                                         18 
7    Minimum Number of Failures (as Percent of Gamma 1 Systems) 
     Versus Number of Moves per Spare                                          19 
8    Minimum Number of Failures (as Percent of Beta Systems) 
     Versus Number of Spares per Block                                         20 
9    Percent of Systems Operating (Beta Class) Versus Time                     22 
10   Percent of Systems Operating (Gamma Class 1) Versus Time                  23 
I - INTRODUCTION 
In an effort to increase the reliability of complex electronic systems, several methods 
have been proposed for using "redundant" equipment to provide failure protection within these 
systems. Two of the most useful types of redundancy techniques are multiple-line, majority 
voted logic and multiple component grouping schemes. Although both techniques are very 
effective, a large percentage of the "redundant" equipment is not efficiently used, i. e., the 
system fails with much of the "redundant" equipment still functioning. This undesirable 
feature is inherent in systems of this type because random failures do not tend to distribute 
evenly throughout the system. Instead, they almost invariably tend to group and cause a 
critical failure pattern to occur in one subsystem area before many failures have occurred 
in the remainder of the system. The most drastic example of this is the failure of an order 
three, multiple-line, majority voted system upon the occurrence of two successive failures 
in the same stage with no other failures in the remaining stages. 
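
The tendency of random failures to group can be illustrated with a small Monte Carlo sketch in Python. It is not taken from the report: the function name is hypothetical, every block is assumed equally likely to fail, and no self-repair is modeled. Blocks of an order three, multiple-line system are failed in random order until some stage has lost two of its three blocks.

```python
import random

def failures_until_system_failure(num_stages, rng):
    """Count random block failures until one stage has two failed blocks."""
    blocks = [(stage, line) for stage in range(num_stages) for line in range(3)]
    rng.shuffle(blocks)                      # random order of block failures
    failed_per_stage = [0] * num_stages
    for count, (stage, _line) in enumerate(blocks, start=1):
        failed_per_stage[stage] += 1
        if failed_per_stage[stage] == 2:     # majority of the stage is lost
            return count
    raise AssertionError("unreachable: some stage must reach two failures")

def average_failures(num_stages, runs=1000, seed=1):
    """Average number of failures at system failure over many random runs."""
    rng = random.Random(seed)
    total = sum(failures_until_system_failure(num_stages, rng) for _ in range(runs))
    return total / runs
```

The count can never exceed the number of stages plus one, since by then some stage must hold two failures, while the system contains three times as many blocks as stages. At least roughly two thirds of the equipment is therefore still working when the system fails, which is the inefficiency the self-repair strategies are meant to reduce.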
Company A has devised a new solution to the failure protection problem which exploits 
most of the desirable features of the multiple-line, majority-voted schemes, but is not as 
sensitive to critical failure patterns as the more standard techniques. This solution is in the 
form of a set of strategies for allowing the reorganization of the systems in response to 
failure patterns which may develop. The systems which employ these strategies are called 
self-repairing systems. 
The general approach of the self-repair strategies can be described through the use 
of an example. Figure 1 shows a block diagram of an order three, multiple-line system. 
Figure 2 shows the same system after some self-repair capability has been added. It is 
assumed that all blocks in the system are functionally identical, such as the multivibrators 
in a shift register, and are interconnected by switching and voting circuits. If two blocks in 
the same column fail and the blocks on either side of this column are still operating, the 
self-repair switching mechanism senses this condition and shifts the required additional 
working blocks to the failed column. The failed block can now be eliminated or "voted out. " 
This procedure decreases the remaining protection provided the adjacent columns, but it 
prevents system failure at a critical point and thus extends the life of the system. As 
additional blocks fail, other blocks are switched into the failed columns. The choice of 
which block shall be brought in to aid the vulnerable column is determined by the particular 
strategy in use. 
6-1 
Figure 1. Multiple-line Redundant System (signal processing blocks) 
Figure 2. Multiple-line Redundant System with Self-repair Capability 
The unique feature of these strategies is that the switching circuitry can be completely 
distributed rather than "lumped" into a central controller. As a result, most failures in 
the switching circuitry are equivalent to signal processor (block) failures and are elimi-
nated in the normal manner. This means that individual failures in the switching circuitry 
do not cause the loss of the entire self-repair capability. 
Before a "hardware" design of self-repairing systems can begin, the full range of 
feasible switching strategies must be examined, and from these an optimum strategy or set 
of near optimum strategies must be selected. The majority of this report is concerned with 
6-2 
a description of some of the more promiSing strategies and with the computer program 
which is being used to simulate the failure response of systems which employ these 
strategies. 
There are a great number of possible strategies which may be investigated, many of 
which are quite similar to one another. The strategies being considered are arranged in 
groups called classes, the individual members of which are special cases of the general class. 
This allows the investigation and programming of a few classes of strategies rather than 
many individual strategies. This facilitates comparison of strategies within a class as well 
as adding a certain degree of generality to the analysis. 
Before proceeding to the description of specific strategies or classes of strategies, 
the properties a self-repairing system should have must be noted and the basic assumptions 
stated. A short list of the general desirable properties is compiled below. 
a. Self-repairing systems should be more reliable than ordinary 
redundant systems of identical function capability and cost. 
b. The switching strategy used should make optimum use of the 
redundant function blocks for a fixed amount of switching 
complexity. 
c. Instantaneous failure masking must be provided for system 
applications which cannot withstand a temporary loss of data. 
An example of this is the key-stream generator used in secure 
communication channels. 
d. The strategy must be suitable for implementation by a distributed 
(non-centralized) switching network. 
6-3 
II - STRATEGY DESCRIPTION 
A. BASIC ASSUMPTIONS 
Almost all large computing and control systems are formed by interconnecting a 
relatively small number of different types of basic circuit blocks. As a result, the com-
ponents of these systems can be split up into homogeneous groups of functionally similar or 
identical blocks. It is assumed, therefore, that such groups can be formed and that self-
repair strategies can be applied within each group. Note: The members of any group are not 
required to be physically or functionally adjacent but may be located in scattered sections of 
the overall system. 
It is also assumed that at least two blocks must be performing the same nominal function 
before a failure can be detected, and at least two correctly operating blocks must be perform-
ing the same function before a third (failed) block can be eliminated from this function. 
If at least three blocks are performing a function and one of them fails, the elimination 
process is assumed to be instantaneous, and the failure is assumed to be completely masked. 
If, however, only two blocks are performing the function and one fails, a third block must be 
switched to that location to eliminate the failure. This process is not assumed to be in-
stantaneous and errors appear in the system temporarily. As a result, systems using the 
basic order-three redundancy with self-repair (as will be described in the Beta and Gamma 
Class strategies of this report) must be capable of withstanding temporary data loss without 
mission failure. If this assumption is not true, a higher order of redundancy must be used 
as in the Alpha class strategies or higher-order versions of the Beta and Gamma classes. 
If, because of particular failure and response patterns, single blocks are left to 
perform particular functions, it is assumed that the system continues to operate with one or 
more stages existing in the non-redundant state either until one of these blocks fails or until 
another critical failure pattern occurs elsewhere in the system. 
Finally, it is assumed that a stage shown pictorially at one end of a system is, in 
reality, adjacent to the opposite end and enjoys the same repair facilities as stages shown in 
the center of the system. 
B. BASIC STRATEGY CLASSES CONSIDERED TO DATE 
The following few paragraphs will indicate the general principles of each of the three 
strategy classes which have been simulated thus far. Detailed examples of each class are 
shown in the Appendix, and the reader will probably need to refer to these for detailed 
consideration of the following descriptions. 
6-5 
1. Alpha (α) Class 
Systems employing the α class strategies are basically multiple-line redundant 
(usually order three) systems which are equipped with sets of spares. These spares are 
additional function blocks which can be automatically used to replace failed blocks. In 
general, spares cannot economically be given enough mobility to allow a single spare to be 
capable of replacing each operational block in the entire system. Instead, individual spares 
are usually given restricted capability and may replace only blocks in a single row* or 
portion of a row. A large number of strategies, each belonging to the α class, can be 
generated by varying (a) the total number of spares available for a fixed system size, (b) 
the mobility of each spare, and (c) the pattern in which the spares' repair capabilities overlap. 
If it is assumed that spares will immediately replace failed blocks regardless of 
whether it is the first failure in a function column or not, complete failure masking is 
achieved. The threshold vote technique will continue to absorb failures after the spares 
complement is exhausted until a majority of unrepairable failures have occurred at a 
particular function. At this point the system will fail since both the self repair capability 
and the network redundancy have been exhausted. 
2. Beta (β) Class 
Beta Class strategies do not utilize inactive spare blocks as does Class α. 
With no failures, the system operates as an ordinary multiple-line redundant system. When 
a critical failure, i.e., one which would cause failure of a multiple-line redundant system, 
occurs, the failed block is removed from the system and replaced by a properly functioning 
block from an immediately adjacent function. The individual strategies in this class differ 
from one another primarily in the number of spares which they can draw from the rest of 
the system. 
Because failures are replaced by function blocks only from the adjacent functions, 
there is a smaller amount of switching circuitry involved with Class β than with other classes 
of self-repair strategies. This advantage is partially offset, however, by the one drawback 
inherent in this class of strategies: these systems are more vulnerable to failures 
which are grouped in one area of the system than are the more flexible strategies. 
The three strategies of this type which have been simulated are described in the 
Appendix. These particular strategies do not usually allow blocks to move a second time 
after an initial repair has been made. This restriction has been made for a variety of 
reasons, but other strategies are being considered which will release this restriction. In 
addition, strategies having increased spare mobility will be considered in future studies. 
* For example the top line or row of signal processor in Figure 1. 
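
The Beta Class repair step can be illustrated by the following Python sketch. It is not one of the three strategies described in the Appendix. The function name, the order in which neighbors are tried, and the rule that a donor stage must keep two working blocks are assumptions made for the example. Stages are treated as a ring, in line with the assumption that the two ends of the system are adjacent.

```python
def apply_failure(working, stage):
    """Fail one working block in `stage` and attempt a Beta-style repair.

    working: list giving the number of correctly operating blocks in each
             stage (three per stage initially); it is modified in place.
    Returns True if the system is still operating, False if the critical
    failure could not be repaired and the system is judged as failed.
    """
    working[stage] -= 1
    if working[stage] >= 2:
        return True                       # failure is voted out and masked
    # Critical failure: only one good block remains in this stage.
    n = len(working)
    for donor in ((stage - 1) % n, (stage + 1) % n):   # ends wrap around
        if working[donor] == 3:           # hypothetical rule: donor keeps two
            working[donor] -= 1
            working[stage] += 1
            return True
    return False
```

Because only a stage with all three of its blocks may donate, a block that has been moved once is never moved again, which mirrors the restriction mentioned for the simulated Beta strategies. In a four-stage system, repeated failures in one stage are absorbed twice by its two neighbors, after which the next critical failure in that stage cannot be repaired.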
6-6 
3. Gamma (γ) Class 
The Gamma (γ) Class of self-repair strategies contains much more variety 
than either Class α or Class β. The class is characterized by a shifting of the spare 
blocks in one direction to alleviate the critical condition caused by the failed function 
blocks. Unlike the strategies of Class β, it is possible for a spare to move several times 
in response to failures. When a critical failure occurs, one of the function blocks adjacent 
to the failure will replace it, leaving a void. This void, if it creates a vulnerable situation, 
i.e., one block per function stage, will be filled by the function block immediately adjacent 
to it in the opposite direction from the original failure. The next failure to occur in the same 
stage as the original failure causes another shift of the function block now adjacent to the 
failure. This may be a function block which has already shifted in response to a failure. As 
long as spares are available, they will continue to shift laterally to replace failed blocks or 
to fill voids. 
Since the spare function blocks are allowed much more mobility in this class of 
strategies, more failures can be corrected. However, the amount of switching circuitry 
necessary to implement the strategies is a monotonically non-decreasing function of the 
mobility of the spares. This creates problems of implementation which limit the usefulness 
of high spares mobility. 
The individual members of Class γ strategies differ primarily in the amount of 
mobility allowed to the function blocks. This, in turn, affects the failure absorption 
capabilities of the strategies. Again, the individual strategies are described in more detail in 
the Appendix. 
6-7 
III. THE COMPUTER SIMULATION PROGRAM 
A. THE REASON A SIMULATION PROGRAM WAS USED 
Although the reorganization features of self-repairing systems improve the failure 
absorption capability of redundant networks, these features drastically affect the analytical 
reliability expressions developed for multiple-line, majority-voted systems. Not only does 
a slight amount of reorganization capability greatly complicate the expressions, but each 
modification of each strategy class appears to require a different solution. Extensive efforts 
to model some of the simpler self-repairing systems have been unsuccessful. Because of 
this, efforts to write exact reliability expressions have been dropped, and a general computer 
simulation program has been written to facilitate a Monte Carlo approach to the reliability 
analysis. This program can be used to simulate a broad range of strategies, and it provides 
data about the actual switching patterns which tend to occur in a system. This latter infor-
mation could not be easily determined from reliability expressions even if they were avail-
able. A plot of reliability versus time can be obtained directly from the program results 
with no more additional input information than would be required by calculations made using 
analytical expressions. 
B. HOW THE PROGRAM WORKS 
1. The General Program Philosophy 
A redundant system of the desired order of redundancy and number of functions 
is set up in matrix form. The strategy class is then selected from a group of sub-programs 
and input data which specifies the particular strategy to be tested is read in. Through the 
use of a series of random numbers, individual blocks are designated as failed, and the 
switching strategy responds to each failure until the system fails to pass the operational 
criteria. A second series of exponentially distributed random numbers determines the time 
between each simulated failure, and the sum of these is the time to system failure. Once 
the system fails, the pertinent data is recorded, and the computer resets and begins to 
generate two new sets of random numbers. Continued repetition of this process provides 
the compilation of data mentioned in part A of this section. The following paragraphs 
indicate specifically how the various portions of the program work and the form of the 
printout. 
2. The Failure Selection Program 
A simple procedure for randomly selecting the failed function blocks has been 
set up. Each block is assumed to have an exponentially decaying reliability $R_i(t) = e^{-\lambda_i t}$, where
$\lambda_i$ is a constant failure rate. It has been shown that the conditional probability that a failure
has occurred in the i-th block, given that a failure has occurred in the system, is equal to the
constant

$$\frac{\lambda_i}{\sum_{j=1}^{N} \lambda_j}$$
If the interval between zero and one is split into N subintervals, each proportional
to the associated conditional probability, a set of random numbers uniformly distributed
between zero and one can be used to determine which blocks fail, with the correct conditional
probability of picking any one block. In this particular computer program, the random number
specifies the block to be failed. The system then responds to eliminate the failed block. If
the response is possible, i.e., a spare block is available to make the repair, a new random
number is chosen and the procedure repeats. If no spare is available, the system is judged
as failed.
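The subinterval selection described above can be sketched as follows. This is a minimal illustration in Python, not the original program; the function names `build_limits` and `select_failed_block` and the example failure rates are assumptions made for the sketch.

```python
import bisect
import itertools
import random


def build_limits(failure_rates):
    """Return the upper limit of each block's subinterval of (0, 1).

    Each subinterval has width lambda_i / sum(lambda), the conditional
    probability that block i is the one that failed.
    """
    total = sum(failure_rates)
    limits = list(itertools.accumulate(rate / total for rate in failure_rates))
    limits[-1] = 1.0  # guard against floating-point round-off in the last limit
    return limits


def select_failed_block(limits, u):
    """Return the index of the block whose subinterval contains u (0 <= u < 1)."""
    return min(bisect.bisect_right(limits, u), len(limits) - 1)


# Hypothetical example: three blocks, the third twice as failure-prone.
limits = build_limits([1.0, 1.0, 2.0])          # [0.25, 0.5, 1.0]
failed = select_failed_block(limits, random.random())
```

The printed "random number range which describes failure of the block" in the sample format corresponds to consecutive entries of `limits` in this sketch.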
3. Time Determination 
For each of the simulated failed blocks selected above, a time to failure for the 
block is also determined. A. M. Mood¹ has shown that random numbers taken from the
uniform distribution can be transformed into any desired continuous distribution by letting

$$f(y) = 1, \qquad 0 < y < 1$$

$$y = G(x)$$

where $G(x)$ is the cumulative distribution of $x$.
This relationship is shown graphically in figure 3. 
[Figure 3 plots Y (uniformly distributed random numbers, 0.0 to 1.0) against X (random numbers distributed as G(X)); a sample value Y1 maps to X1.]
Figure 3. Probability Distribution of a Component Failure 
1 Mood, A. M., Introduction to the Theory of Statistics, McGraw-Hill Book Co., Inc., 1950.
y is a single-valued function of x and vice versa. For each y chosen from a uniform
distribution, a unique value of x is determined.
The G(x) function which is of particular interest here is $G(t) = 1 - R(t) = 1 - e^{-\lambda t}$.
This is the distribution function associated with the probability that the first failure has
occurred within a system. This curve is shown in figure 3.
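Solving $y = 1 - e^{-\lambda t}$ for $t$ gives $t = -\ln(1 - y)/\lambda$, which is the transformation a program would apply to each uniform random number. The following is a minimal sketch in Python; the function name `exponential_from_uniform` and the example rate are illustrative assumptions, not taken from the original program.

```python
import math
import random


def exponential_from_uniform(y, rate):
    """Invert y = G(t) = 1 - exp(-rate * t) to obtain the time t.

    y must be a uniform random number with 0 <= y < 1, and rate is the
    constant failure rate lambda (assumed positive).
    """
    if not 0.0 <= y < 1.0:
        raise ValueError("y must satisfy 0 <= y < 1")
    return -math.log(1.0 - y) / rate


# Hypothetical example: a failure rate of 0.01 per hour.
rng = random.Random(0)
time_to_failure = exponential_from_uniform(rng.random(), 0.01)
```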
For the first function block failure, a random number is chosen from a uniform 
population and transformed to a corresponding number from the exponential distribution. 
This latter number is the time from system start to the first failure. To calculate the time 
to the second failure, the $\lambda$ associated with the first failed block should be subtracted from
the sum of the $\lambda$'s and the procedure repeated. The new number thus obtained would be the time from
the occurrence of the first failure to the occurrence of the second failure. When the system 
fails, the sum of these individual failure times will determine the total system operating 
time. 
In the present program, the above procedure is slightly modified to make computations
easier. Instead of decreasing the sum of the $\lambda$'s after each failure, this sum is left the
same and blocks are allowed to fail more than once. When a block fails for the second time,
no action is taken other than to add the time to this failure to the system operating time.
This modified procedure would not be acceptable if the times between subsystem failures
were of interest, but since total system operating time is the only factor to be considered,
the results are almost identical to those which would be obtained in the more straightforward
approach.
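The two timing procedures can be compared with a short sketch. This is an illustration under stated assumptions, not the original program: the stopping rule used here (the system fails once a fixed number of distinct blocks have failed) is a stand-in for the strategy-dependent failure criterion, and the names `exact_time`, `modified_time`, and `mean_time` are hypothetical.

```python
import random


def exact_time(rates, failures_to_kill, rng):
    """Straightforward procedure: drop each failed block's rate from the sum."""
    if failures_to_kill > len(rates):
        raise ValueError("cannot fail more blocks than exist")
    alive = list(range(len(rates)))
    elapsed = 0.0
    for _ in range(failures_to_kill):
        weights = [rates[i] for i in alive]
        elapsed += rng.expovariate(sum(weights))      # time to next failure
        alive.remove(rng.choices(alive, weights=weights)[0])
    return elapsed


def modified_time(rates, failures_to_kill, rng):
    """Modified procedure: the rate sum stays fixed and blocks may fail again.

    A repeat failure changes nothing except that its time is still added.
    """
    if failures_to_kill > len(rates):
        raise ValueError("cannot fail more blocks than exist")
    blocks = list(range(len(rates)))
    total = sum(rates)
    failed = set()
    elapsed = 0.0
    while len(failed) < failures_to_kill:
        elapsed += rng.expovariate(total)
        failed.add(rng.choices(blocks, weights=rates)[0])
    return elapsed


def mean_time(procedure, rates, failures_to_kill, runs, seed):
    """Average system operating time over a number of simulated runs."""
    rng = random.Random(seed)
    return sum(procedure(rates, failures_to_kill, rng) for _ in range(runs)) / runs
```

For six blocks of unit failure rate and a system that fails at the third distinct block failure, both procedures should give an average operating time near 1/6 + 1/5 + 1/4, which is one way to check the claim that total operating time is essentially unaffected by the simplification.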
4. The System Reactions 
It is obvious that many specific reactions are different for different strategies,
but the general manner in which the program performs the various shifts and the type of
"bookkeeping" involved can be briefly described. Figure 4 schematically illustrates the
form in which the computer "views" the system to be simulated. The height of the "basic array"
is set by the original order of redundancy, the width by the number of stages, and the depth
by the number of data words associated with each block. The "failed block array" is a
two-dimensional array into which the data words for failed blocks are shifted as the failures
occur. The only indication to the computer that a block has failed is the shifting of these
data words into this latter array.
When a set of data words is moved into this array, the computer examines the 
remainder of the system and makes any necessary response. This is done by shifting the 
data words associated with the appropriate spare blocks from their original locations into 
the locations specified by the particular switching strategy being considered. 
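The bookkeeping described above might be sketched as follows. This is only an illustration: the class name `SimulationMatrix`, its methods, and the contents of the data words (a home location and a move count) are assumptions, since the report does not list the actual data words.

```python
class SimulationMatrix:
    """Basic array of M rows (order of redundancy) by N columns (stages).

    Each occupied location holds the data words of one function block,
    represented here by a small dict. An empty location holds None.
    """

    def __init__(self, order, stages):
        self.basic = [
            [{"home": (row, stage), "moves": 0} for stage in range(stages)]
            for row in range(order)
        ]
        self.failed = []  # failed block array: data words of failed blocks

    def fail(self, row, stage):
        """Shift a block's data words into the failed block array."""
        words = self.basic[row][stage]
        if words is None:
            return None  # location already empty; nothing to shift
        self.basic[row][stage] = None
        self.failed.append(words)
        return words

    def move_spare(self, src, dst):
        """Shift a spare's data words from src into the empty location dst."""
        (src_row, src_stage), (dst_row, dst_stage) = src, dst
        words = self.basic[src_row][src_stage]
        if words is None or self.basic[dst_row][dst_stage] is not None:
            raise ValueError("source must be occupied and destination empty")
        words["moves"] += 1
        self.basic[dst_row][dst_stage] = words
        self.basic[src_row][src_stage] = None


# Hypothetical example: order-three redundancy, five stages.
matrix = SimulationMatrix(3, 5)
matrix.fail(1, 3)                  # a block in the fourth stage fails
matrix.move_spare((1, 2), (1, 3))  # a neighbouring block is shifted in
```

Which spare moves, and where, is decided by the strategy subroutine; the sketch only shows the shifting of data words that records each failure and response.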
Figure 4. Simulation Matrix
[The figure shows the basic array, with height M (order of redundancy) and width N (number of stages), beside the failed block array; shading indicates empty locations.]
C. SAMPLE FORMAT 
A check must be made to determine whether the computer simulation program is 
operating correctly, i.e., selecting the correct function block for failure according to the
random number set, responding properly to failures according to the particular strategy, 
and failing at the proper time and under the proper conditions. In order to accomplish this, 
a sample format has been developed. This sample format prints out the following informa-
tion: 
1. * The function block designations and the random number range
which describes failure of the block. 
2. * A list of failures which occur with all the information associated 
with the failure such as: 
a. The random number which was selected 
b. The location of the failed block 
c. The amount of time from the previous failure to the time 
of failure of the block in question 
d. The cumulative time from the beginning of system operation. 
3. The average time between failures. 
* This information is printed out for each failure until the system fails. 
When a critical failure of a function block occurs, an operating spare is switched into 
the vacant position by assigning random number limits of the spare block to the failure 
location. This permits checking of the switching pattern to determine if the simulation 
program is working, since an incorrect switching operation will place the random number 
limit designation in the wrong position. This event can be detected when the incorrectly 
switched function block fails and the position specified by the random number does not 
correspond to that printed out in the sample format. 
To check a strategy, several runs are made using different random number sequences.
The sample format prints out all the above information for each case. From this information 
a determination can be made as to whether the simulation is following the rules for the parti-
cular strategy. 
In addition to performing the function of checking the simulation program, the sample 
format provides another valuable service. By observing the vicissitudes of the system with 
respect to the switching patterns which develop, information can be gained about changes in 
the strategy which might profitably be used to implement more efficient system operation or 
more economical switching circuitry implementation. This is the manner in which Class γ2
was derived from Class γ1.
D. PRODUCTION FORMAT 
A typical production run of the computer program simulates system operation for one 
hundred randomly selected failure patterns. Up to the present time, all runs have included 
one hundred patterns simply because relatively good estimates of the average system para-
meters such as total time to fail, number of failures withstood, etc. are obtained without 
requiring excessive amounts of computer time. 
The production format directly provides the following information for each of the one 
hundred cases: 
1. Average time between function block failures 
2. Total time to system failure 
3. Total number of function block failures before each system failure 
(including multiple failures of the same block) 
4. Net number of failed function blocks at time of system failure 
5. Total number of switching moves experienced by each system 
6. Total number of moves made by each spare function block. 
In addition to printing out columns of numbers covering the first five items on the
list above, most of the data is compiled into bar graphs. Each of these graphs reflects the
performance of the set of one hundred runs with respect to a particular parameter. On the
graphs, either discrete points (e.g., net number of failures) or interval terminal points
(for continuous parameters such as time) are plotted on the abscissa. The height of the bar
above each point or interval shows the number of spares or system simulations which are
described by these positions on the abscissa. The program includes a normalization routine
for each graph which is used to compute the average, the variance, and the standard deviation
associated with each graph.
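The statistics computed from each bar graph can be sketched as below for a discrete parameter. The names `bar_graph` and `graph_statistics` are illustrative, and dividing by the number of runs (rather than one less) in the variance is an assumption; the report does not say which form the normalization routine used.

```python
import math
from collections import Counter


def bar_graph(values):
    """Return sorted (abscissa point, bar height) pairs for discrete values."""
    return sorted(Counter(values).items())


def graph_statistics(bars):
    """Return (average, variance, standard deviation) of a bar graph."""
    total = sum(height for _, height in bars)
    average = sum(point * height for point, height in bars) / total
    variance = sum(height * (point - average) ** 2 for point, height in bars) / total
    return average, variance, math.sqrt(variance)


# Hypothetical example: net number of failed blocks in eight runs.
bars = bar_graph([2, 4, 4, 4, 5, 5, 7, 9])
average, variance, deviation = graph_statistics(bars)  # 5.0, 4.0, 2.0
```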
IV. RESULTS 
The strategies discussed here (and any new ones which may be invented) must be compared
and contrasted to determine their usefulness in increasing the reliability of electronic
systems. The primary goal of this comparison is the determination of which strategy provides
the greatest net increase in system reliability. Because it appears that the switching
circuitry associated with spare blocks increases as the mobility of these blocks increases 
and because the failure protection effectiveness of added flexibility is non-linear, it cannot 
be simply assumed that the best strategy is the one with the greatest spare block mobility. 
The best way to compare these strategies would be to completely design functionally 
identical systems using each strategy; get the best available estimates of the failure rates 
of all the parts; feed this into the computer program and, in the manner described below, 
plot the reliability versus time curves. The comparison would merely require that one 
directly observe which strategy has the highest reliability curve. This approach would require
a detailed system design for all strategies. To avoid wasting time on strategies which
can be shown to be inferior to others with much less detailed input data, several less exact 
comparisons can be made. These comparisons, which are described below, are the ones 
which are being made at this point in the study. 
A. FAILURES WITHSTOOD (AS PERCENT OF SYSTEM) VS. SPARE MOBILITY
An important consideration in the comparison of systems is the number of failures 
which can be withstood without system failure. In order to compare strategies with one 
another where the variable is the number of moves allowed per spare, the number of 
failures withstood is an important and meaningful criterion. To further compare systems 
of different sizes on a common base the curves plotted for these systems are expressed in 
terms of average percent of total system failed versus spare mobility. In figure 5 curves 
are plotted for three systems of different sizes (24, 48, and 96 stages) employing strategy γ1.
They are plots of average percent of failures versus number of moves per spare.
These curves provide very useful and interesting results. They are characterized by
a sharp rise, a knee, and a rapid leveling off. The knee occurs at a small number of moves
per spare compared to complete (total system) spare mobility. According to this graph, a
great increase in the number of failures withstood by a system is effected by increasing the spares'
mobility up to a point. The increase then diminishes, and a point is reached beyond
which little or no increase in number of failures withstood accompanies an increase in
mobility. The characteristic exhibited by these curves illustrates that great increases can
be attained in system performance by the introduction of self-repair Class γ1 with
relatively little mobility. The addition of more mobility adds little to the effectiveness of
the technique. This indicates that the most gain is attained with a small degree of mobility;
therefore, the most efficient operation of the technique can probably be accomplished with
relatively little switching circuitry.
Figure 5. Average Number of Failures Withstood (as Percent of Gamma 1 Systems)
Versus Number of Moves Per Spare
[Curves are shown for 24, 48, and 96 functions per system, with an inset blow-up of curve #2.]
Plots have also been made for the percent of system failed vs. number of spares per
function block for the β class strategies. These plots are illustrated in figure 6. The
curves in figure 6 are plots of the Average Number of Failures Sustained versus Number of
Spares per Function Block. The results show substantial gains over the multiple-line case
for each increase in spare mobility. These curves are restricted to low mobilities because
the Beta class draws spares to replace failures only from the immediately
surrounding area.
Since an important consideration is the worst failure patterns, a plot is shown of the
lowest number of failures sustained before system failure vs. mobility for the Gamma
Class strategies (see figure 7). These curves agree very closely with those of figure 5,
thereby substantiating the conclusion even for the worst case.
Figure 8 shows the Minimum Percentage of Failures Sustained versus Number of
Spares per Function Block for the three different length β Class systems. These curves,
like those for class Gamma, show a gain over the multiple-line system for each advance in
mobility.
B. RELIABILITY VS. TIME CURVES 
The reliability of a system as a function of time is the probability (P) that the system 
will be operating correctly at that time, or, out of a given sample, s, P x s of these will be 
operating correctly. From the production run printout of the computer program, it is 
possible to plot the percentage of the systems which are operating versus total operating 
time. This plot closely approximates the reliability curve associated with a particular 
strategy. The plots made here represent one minus the cumulative sum of the bars of the
graph for number of systems failed versus time. For each interval of time in which failures 
occur a step function is subtracted from the curve corresponding to the number of systems 
which failed in that interval. This process produces a curve which is a series of discrete 
steps, starting at 1 and going to 0 as time increases. Smoothing out this curve would result 
in a curve which is identical in form to the standard s-shaped reliability versus time curve 
which is common to redundant systems. 
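The construction of the stepped curve from the production run data can be sketched as follows. This is a minimal illustration; the function name `reliability_curve`, the fixed interval width, and the example failure times are assumptions made for the sketch.

```python
def reliability_curve(failure_times, interval):
    """Return (time, fraction of systems still operating) at each interval end.

    The bars count the systems that failed in each time interval; the curve
    is one minus the cumulative sum of the bars, normalized by the number
    of simulated systems.
    """
    runs = len(failure_times)
    n_bars = int(max(failure_times) // interval) + 1
    bars = [0] * n_bars
    for t in failure_times:
        bars[int(t // interval)] += 1

    curve = [(0.0, 1.0)]  # every system is operating at time zero
    failed_so_far = 0
    for k, height in enumerate(bars, start=1):
        failed_so_far += height
        curve.append((k * interval, 1.0 - failed_so_far / runs))
    return curve


# Hypothetical example: four simulated systems, 100-hour intervals.
curve = reliability_curve([50.0, 150.0, 150.0, 250.0], 100.0)
# [(0.0, 1.0), (100.0, 0.75), (200.0, 0.25), (300.0, 0.0)]
```

With one hundred runs, each step of the curve has a height that is a multiple of 0.01, so the plot is a coarse approximation to the smooth reliability curve.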
As was mentioned in the introduction to this section, this type of curve would be an
excellent comparative tool if accurate estimates of the switching circuit failure rates could be
made using completed system designs. Because the designs are not yet available, the usefulness
of these curves is restricted to that of investigating which strategies are best under
certain limiting failure rate conditions. Even under these conditions, the reliability versus
time curves are very useful because they provide a universal means of comparing all strategies
in all classes.
Figure 6. Average Number of Failures Withstood (as Percent of Beta Systems)
Versus Number of Spares per Block
Figure 7. Minimum Number of Failures (as Percent of Gamma 1 Systems)
Versus Number of Moves Per Spare
Figure 8. Minimum Number of Failures (as Percent of Beta Systems)
Versus Number of Spares Per Block
[Figures 6 through 8 each show curves for 24, 48, and 96 functions per system.]
Examples of these curves for the Beta and Gamma Class strategies are shown in 
figures 9 and 10. The following comments indicate some of the significant features of
these curves. 
1. Beta Class Reliability Curves 
The reliability curves for the three members of the class are shown in figure 9. 
The curve for an order-three, multiple-line redundant system is also shown. These curves 
show a significant gain in reliability of all three strategies of the Beta Class over the re-
dundant case. The effective gain will not be as great in reality because perfect switching has 
been assumed in plotting the curves. 
With the limited amount of switching allowed to strategy β1, an increase in
MTBSF of approximately 100% results. As more switching capability is allowed to the
system, the reliability continues to increase, showing that strategy β3 provides a significant
increase in reliability over either β1 or β2 and a very significant increase over the
multiple-line redundant case.
2. Gamma Class Reliability Curves 
Figure 10 illustrates the reliability curves for four gamma class strategies.
Illustrated are the limiting cases, 1 move per spare and 23 moves per spare,* as well as a
multiple-line redundant system. Two strategies of intermediate mobility are also shown.
These curves, again, show that the introduction of a minimal amount of switching
capability, 1 move per spare, causes a significant gain in reliability and operating time over
the redundant system. It is also obvious that the first few increases in the mobility
of the spares induce further noticeable gains in reliability over the one move per spare case.
As additional mobility is granted to the system, the reliability gained begins to diminish.
This is illustrated by the fact that as much gain in reliability is attained by increasing
mobility from one to three moves per spare as is gained by going from three to twenty-three
moves per spare. This also reflects the flattening effect observed in the curves of Percent
of Failures Sustained versus Mobility of the System, wherein the additional mobility after a
certain point bought no additional gain in reliability.
* 24 Function System 
Figure 9. Percent of Systems Operating (Beta Class) Versus Time
Figure 10. Percent of Systems Operating (Gamma Class 1) Versus Time
V. SUMMARY AND CONCLUSION 
Before self-repairing systems can be implemented, many feasible switching strategies
must be considered in an effort to determine the most effective manner to manipulate the
redundant or "spare" blocks. The extreme complexity of the reliability expressions associated 
with these strategies has resulted in the use of a computer simulation program for comparing 
the effectiveness of the strategies. Rather than proceeding to write separate programs for 
each strategy, a more general program has been written which employs a small number of 
subroutines, each of which describes an entire class of strategies. Input data determines 
which class subroutine is being used and which strategy in a particular class is being simulated.
Although this generalized program is a great improvement over writing an individual program
for each strategy, it still requires additional programming each time a new
class subroutine is added. At this time, the change to a more general program, whose simulation
strategy can be completely determined from input data, does not seem to merit the programming
time which would be required.
The present program includes subroutines for three classes of switching strategies. 
Each class subroutine contains a great deal of flexibility, thereby including many individual 
strategies. This method facilitates easy comparison between members of a class. This 
comparison allows immediate elimination of many possible strategies as obviously uneconomical.
For example, the flattening out of the Percent of System Failed versus Spare Mobility
curves (figures 5 through 8) indicates that the strategies on the flat part of the curves
cannot be optimum strategies.
From the results of the simulation program, curves for Percent of Systems Failed
versus Spares Mobility have been plotted for the Gamma Class strategies. These curves
have been referenced to that of a multiple-line majority voted system because this particular
technique has been the most effective of the passive, failure masking, circuit level redundancy
techniques. In all cases these curves show not only that great gains can be realized over the
multiple-line redundant scheme but that by far the greatest part of these gains is realized
for the first few moves allowed to the spare function blocks. Beyond the range of relatively 
limited mobility, little or no gain in the average number of failures absorbed is realized by 
the additional mobility allowed to the spares. This is an encouraging result since the great 
majority of the gain due to self-repair can be retained without the use of an exorbitant amount 
of switching circuitry. 
In the β and γ classes of self-repair strategies the degree of failure masking is the
same as that for a multiple-line redundant system of the same order of redundancy. This
is due to the fact that no "repair" is made until an ambiguity is present on the output of a
stage. This event corresponds to redundant system failure, which activates the switching
mechanism, and the "repair" is effected. However, until the failure is "repaired" no
failure masking is present, and incorrect information may be transmitted to the next stage.
The α class strategies provide additional failure masking because repairs can be
initiated by the first occurrence of a failure in any stage. However, because this class implies
a higher order of redundancy it cannot be compared to order-three multiple-line
redundancy as the β and γ classes have been.
The curves of figures 9 and 10 show a very definite gain in reliability for the self-repair 
strategies over multiple-line redundant systems. The curves for the Beta Class strategies 
show an increase in reliability for each increase in "repair" capability. Strategy β3 yields
the highest reliability, but even strategy β1 shows a significant gain over the multiple-line
system. The reliability curves for the Gamma Class show essentially the same result with
respect to the multiple-line case. However, investigation of the curves shows that increasing
the "repair" capability produces gains for the first few increases after which the magnitude 
of the gain diminishes. These curves tend to bear out the conclusions drawn from Percent 
System Failed versus Spares' Mobility curves which flattened out after a certain mobility 
was reached. The gains illustrated here must be considered as ideal because the switching 
circuitry for self-repair is here assumed to be perfectly reliable. More realistically, the 
gains obtainable will be a function of the switching circuitry complexity and will not be as 
great as shown here. 
VI. FUTURE STUDIES 
All of the computer simulation results discussed in this report have been based on 
the assumption that the switching circuitry was perfectly reliable. Efforts are now being 
made to determine the range of allowable failure rates which can be associated with each
strategy for it to be of maximum effectiveness. These ranges are to be studied as a function 
of the failure rates of the associated signal processor blocks. As a result, before actual 
system designs are begun, information specifying the optimum switching strategy corresponding
to a given signal processor failure rate should be available.
From the sample and production simulation run printouts it has become obvious that 
many of the spare function blocks do not experience as many switching operations as they 
have the capability for. When all spares are assigned a uniform mobility some reach their 
limit and, in doing so substantially extend the life of the system. However, in many cases 
when system failure has occurred, there are many spares remaining which have not been
used to any great extent. In order to capitalize on this phenomenon a class of strategies, γ2,
is being developed which will assign different mobilities to the spares in a stage. Class γ2
will be simulated by a new sub-routine which is being written for the computer program. 
When data is available comparisons will be made between this and the other classes. 
Additional classes will be simulated in a similar manner as they are developed. 
None of the strategies considered so far have permitted spares to return to previous 
locations. It is possible that removal of this restriction might add to the failure absorption 
capability of a system. This area certainly should be explored in this study series.
Although little has been said about the physical switching techniques to be employed, 
it has been tacitly assumed that the failure detection and replacement circuitry would be
combined as much as possible. It has been suggested that these two phases of the repair
function might profitably be separated and made almost completely independent from a circuit 
viewpoint. This is another area which should be given careful attention. 
The Alpha class strategies have not been thoroughly investigated to determine the 
optimum degree of spare overlap (i. e., two sets of spares serving some of the same 
functional region). The information from this investigation should influence the design of 
new strategy classes as well as indicating the optimum strategy for the Alpha class. 
VII. APPENDIX 
A. CLASS α
Illustrated in figure A-1 is an α class strategy wherein each spare can "repair"
failures in one row and either of two stages. Spare "1" can "repair" stages 1 or 2; 
"2" can "repair" 3 or 4, etc. Each spare can repair failures only in its own rows. This 
can be expanded such that, for example, three spares can each repair function blocks in any 
of ten stages or, in general, r spares for n stages. Overlapping of spares capability may 
help guard against "lumped" failures.
Many different strategies and system repair capabilities can be developed by simply
varying r and n or by overlapping possible individual spare "repair" ranges.
Figure A-l. Alpha Class Self-Repair 
B. CLASS β
There are presently three specific strategies of β Class. The major difference
between these strategies is the number of spare function blocks which can replace a given 
failure. 
1. Class β1 (Figure A-2)
Class β1 allows only one "spare" for a given failure response. For example,
function block "H" is given capability as a spare for stage #4. Figure A-2a shows the
system before failures occur. When one function block, J, in stage #4 fails, no switching
results other than the elimination of the failure (see figure A-2b). When the second failure,
say K, occurs in stage #4, function block "H" will move into stage #4 (see figure A-2c)
and resolve the ambiguity caused by the failure. After the failed block has been eliminated,
block "H" remains in stage #4.
6-29 
o. 
STAGE NO . 2 3 4 5 
SYSTEM BEFORE FAILURE 
Figure A- 2a. Beta Class Self-Repair 
OPERATION OF CLASS B1 STRATEGY 
b. 0 0 0 ~ 
[I] IT] 0 0 G 
@] 0 IT] [g 0 
STAGE NO . 2 3 4 5 
FAILED FUNCTION BLOCKS 
FIRST FAILURE - NO RESPONSE 
Figure A-2b. First Failure 
c. 
STAGE NO. 2 5 
SECOND FAILURE - 15_1 RESPONSE FAILED FUNCTION BLO CKS 
Figure A-2c. Second Failure Response 
6-30 
It is possible that one function block will remain working alone without system
failure. For example, if function block "G" failed before "K", function block "I" will
carry the load for stage 3 after "H" switches, until it fails (see figure A-3). System failures
occur when a lone operating function block in a stage fails or when no spare is available to resolve
an ambiguity. Failure of this system could occur when function blocks "E" and "G" have failed
and failure of block "H" or "I" occurs (figure A-4), since for this strategy, block "E" is the
only spare capable of "repairing" a failure in stage #3.
Figure A-3. Third Failure Response
Figure A-4. Catastrophic Failure Sequence (no spare available)
2. Strategy β2 (Figure A-5)
Strategy β2 is similar to β1, but it allows one additional function block to replace
failures in a given stage. In strategy β2, function block "M", in addition to "H", is
given the capability of replacing failed blocks in stage #4. Strategies β1 and β2 operate
identically through the first two failures. When the third failure in stage #4 occurs, block
"M", if still operative, will switch into stage #4 in the same fashion as did function block
"H" in Class β1. This move is labeled "2nd response" in figure A-5. System failure in
strategy β2 occurs in the same manner and under the same conditions as in strategy β1.
Figure A-5. Beta 2 & 3 Strategy (2nd and 3rd responses)
3. Strategy β3 (Figure A-5)
Strategy β3 extends the scheme one step further. Here, a third function
block is allowed to move in addition to the two responses allowed to strategy β2. In this
strategy the ability is imparted to function block "G" in stage 3 to replace failed blocks in
stage #4. This is the 3rd response shown in Figure A-5. Again, failure occurs in the
identical fashion to the other two strategies.
C. GAMMA (y) CLASS 
Gamma Class is divided into two parts: Class y l' where all spare function blocks have 
the same mobility, and Class y 2 where one spare in each stage has a greater mobility than 
the other. 
1. Class γ1 (Figure A-6)
As in Beta Class strategies, the first failure in a stage of a Gamma Class system
evokes no response from the system. The second failure creates an ambiguity on the output
of the stage. This activates the switching mechanism to switch block "H" into stage 4, thereby
dissolving the ambiguity. (See Figure A-6b.) The second failed block is now identified and
switched out of the system. Block "H" remains in stage 4 to detect subsequent errors. If
another failure occurs in stage 4, for example block "L", block "G" from stage 3 will switch
into stage 4 in the same manner as did block "H". This leaves no error detecting capability
in stage 3. To overcome this, block "E" from stage 2 switches into stage 3 to fill the void
created by the switch of block "G". (See Figure A-6c.)
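The sequence just described can be traced with a small counting model. This is a sketch under stated assumptions, not the report's own procedure: each stage is assumed to start with three function blocks, a stage is assumed to need two operative blocks to detect errors, the stages are assumed to form a ring, and the function name `apply_failure` is hypothetical.

```python
def apply_failure(counts, stage):
    """Fail one block in `stage` (1-based) and cascade spares from preceding stages.

    counts[i] is the number of operative blocks in stage i + 1 (assumed model).
    Returns the list of (from_stage, to_stage) switches, or None on system failure.
    """
    n = len(counts)
    i = stage - 1
    counts[i] -= 1
    if counts[i] == 0:
        return None  # the lone remaining block in the stage has failed
    moves = []
    for _ in range(n - 1):  # each preceding stage responds at most once
        if counts[i] >= 2:
            break  # this stage can still detect errors; nothing more to do
        donor = (i - 1) % n  # preceding stage (ring topology is an assumption)
        if counts[donor] < 2:
            if not moves:
                return None  # ambiguity in the failed stage cannot be resolved
            break  # a void remains, but the system keeps operating
        counts[donor] -= 1
        counts[i] += 1
        moves.append((donor + 1, i + 1))
        i = donor  # the donor stage may now need its own void filled
    return moves
```

Applied three times to stage 4 of a five-stage system, the sketch gives no response, then one switch from stage 3, then a switch from stage 3 followed by a fill from stage 2, matching the order of events in the text.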
Figure A-6a. Gamma 1 Strategy - First Failure (first failure, no response)
Figure A-6b. Second Failure Response (second failure in stage No. 4)
Figure A-6c. Third Failure Response
Figure A-6d. Single Block Operation
Now if block "D" in stage 2 should fail, a spare function block, "B", from stage 1
will switch to stage 2 and the failed block "D" will be switched out of the system.
(See Figure A-6d.) As additional failures are sustained, this process continues until a limit
is reached. The end of this process can be reached in one of two ways:
1) A limit can be set for the mobility of a particular function block.
In this case, once a function block has reached its limit, it can no longer act as a spare for
failures in the stage following it. If a critical failure occurs and all possible spares have
failed or reached their limits, the system fails. Voids which cannot be filled because spares
have reached their limits remain as voids, but the system continues to operate until the
remaining function block fails. This limit sequence is illustrated in Figure A-7a. Block "A"
has a mobility of 3, and after a given failure pattern the system appears as in Figure A-7a.
Block "A" has reached its limit. Upon the occurrence of a critical failure in stage #4, block
"A" cannot act as a spare for this stage. The ambiguity remains on the output of stage 1 and
the system is considered failed. However, if the critical failure occurred in stage 2 rather
than stage 4, block "M", since it has not reached its limit, would switch into stage 2 and
resolve the ambiguity. This leaves a void in stage 1. Function block "G" cannot switch into
stage 1; hence, the void remains and the system works properly as long as the remaining block
in stage 1 does not fail.
Figure A-7a. Function Block Limit
2) Another failure mechanism can exist for Class γ. It becomes effective when
the system has sustained so many failures that the number of remaining spares is equal to
the number of stages. When an additional failure occurs, each spare function block will
respond once: the initial one will resolve the ambiguity and the others will fill the
successive voids which appear in the immediately preceding stages. Since there is now one
less spare than there are stages, a void must remain somewhere in the system. If the next
failure is in the stage which contains the void, or in the stage for which the void would
have been a spare, the system goes down. For example, referring to Figure A-7b, if function
block "G" fails, block "D" will switch into stage #4 to correct for the failure. Block "A"
will fill the void for block "D", block "M" for "A", and block "H" for block "M". The process
stops here. There is a void in stage 5. Now a failure in stage 1 or stage 5 will cause system
failure. Class γ1 allows uniform mobility to each spare function block in the system.
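The marginal case can be worked through numerically. The sketch below assumes one spare per stage and a ring of stages numbered from 1 (both read from the example rather than stated as rules in the text); the function name `marginal_cascade` is hypothetical.

```python
def marginal_cascade(num_stages, failed_stage):
    """Trace the cascade when exactly one spare remains in every stage.

    Assumes stages are numbered 1..num_stages around a ring, so the stage
    preceding stage 1 is stage num_stages.
    Returns (donor_stages, void_stage, fatal_stages).
    """
    donors = []
    stage = failed_stage
    for _ in range(num_stages - 1):  # every other stage responds exactly once
        stage = (stage - 2) % num_stages + 1  # step to the preceding stage
        donors.append(stage)
    void_stage = donors[-1]  # the last donor cannot be refilled
    # A further failure is fatal in the void stage itself, or in the stage
    # that would have drawn its spare from the void stage.
    fatal_stages = {void_stage, void_stage % num_stages + 1}
    return donors, void_stage, fatal_stages
```

With five stages and a failure in stage 4, the donors are stages 3, 2, 1 and 5 in that order, the void is left in stage 5, and a further failure in stage 1 or stage 5 brings the system down, in agreement with the example above.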
Figure A-7b. Marginal Operation
Many different strategies are contained under the heading of Class γ1. These
differ primarily in the limit assigned to the mobility of the spare function blocks. A
particular strategy may be identified by specifying "n" in the statement "n moves per spare."
The value of n prescribes where a given function block will reach its limit and therefore
controls the differences between the various strategies of Class γ1.
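One way to picture the "n moves per spare" parameter is as a move budget carried by each spare. The sketch below is illustrative only: the class name, its fields, the five-stage ring, and the rule that each switch consumes exactly one move are assumptions, not definitions taken from the report.

```python
from dataclasses import dataclass


@dataclass
class MobileSpare:
    """A spare function block with a move budget (hypothetical model)."""

    name: str
    stage: int       # current stage, numbered from 1
    moves_left: int  # the "n" in "n moves per spare"

    def can_move(self):
        """A spare that has used all n moves has reached its limit."""
        return self.moves_left > 0

    def move_forward(self, num_stages=5):
        """Switch into the following stage, consuming one move (assumed rule)."""
        if not self.can_move():
            raise ValueError(f"block {self.name} has reached its mobility limit")
        self.stage = self.stage % num_stages + 1  # ring of stages is an assumption
        self.moves_left -= 1
```

Under these assumptions a spare with n = 3 can follow failures through three successive stages and then stays where it is, so a smaller n makes the system reach its limit condition sooner.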
2. Class γ2
Unlike the Gamma 1 Class, which assigns the same mobility to all spare function
blocks, Gamma 2 Class allows the two spare function blocks to differ from one another in
mobility. Figure A-8 will assist in the description of the switching processes which occur
for strategy Gamma 2. The members of the top row are assigned a mobility of 3, those of the
middle row, a mobility of 2.
The first failure in a stage will evoke no response aside from the elimination of
the failed block from the system. Upon failure of the second function block in a stage (stage 4),
the spare will be drawn from the next stage (stage 3). Block "G", which has the greater mobility,
will switch from stage 3 to stage 4. (See Figure A-8a.) This is the only switch which will
occur. Since there are two function blocks remaining in stage 3, the void created by the
switch will not be filled. The next failure occurring in stage 4 will require another spare
to be switched into the stage. This spare is drawn from the next stage which has a spare with
high mobility and which is within range to supply the need, i.e., block "D" from stage 2 will
switch into stage 4. (See Figure A-8b.) This leaves another void which is not filled and which
need not be filled.
Figure A-8a. Gamma 2 Strategy - First Failure
Figure A-8b. Gamma 2 Strategy
In the system described in Figure A-8, the next failure in stage 4 cannot
draw the high mobility spare "A", because it is out of range for stage 4. In this case the lower
mobility spare from stage 3, spare "H", is used. This leaves a void in stage 3 which must be
filled, since there is only one remaining operating function block in that stage. This void
is filled as though it were a failure; if a high mobility spare is available it will be switched,
i.e., function block "A" will switch to stage 3. (See Figure A-8c.) This process continues
until either a failure occurs and no spare is available or a lone remaining function block in a
stage fails. System failure occurs at this point.
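The choice of spare in Class γ2 can be sketched as a selection rule. This is a hedged illustration: the `reach` values (2 stages forward for a high mobility spare, 1 for a low mobility spare) are assumptions chosen to reproduce the example in the text, where "A" in stage 1 is out of range for stage 4. Only the spares named in the text are listed, and the names `Spare` and `pick_spare` are hypothetical.

```python
from typing import NamedTuple


class Spare(NamedTuple):
    name: str
    stage: int  # stage the spare currently sits in
    reach: int  # how many stages forward it may serve (assumed values)


def pick_spare(spares, needy_stage):
    """Pick a spare for `needy_stage`, or return None if none is in range.

    Prefers a high mobility (larger reach) spare; among those, the nearest one.
    """
    in_range = [s for s in spares if 1 <= needy_stage - s.stage <= s.reach]
    if not in_range:
        return None
    return min(in_range, key=lambda s: (-s.reach, needy_stage - s.stage))


# Spares named in the text; reach 2 = high mobility, reach 1 = low mobility.
spares = [Spare("A", 1, 2), Spare("D", 2, 2), Spare("G", 3, 2), Spare("H", 3, 1)]
order = []
for _ in range(3):  # three successive calls for a spare in stage 4
    chosen = pick_spare(spares, 4)
    order.append(chosen.name)
    spares.remove(chosen)
fill = pick_spare(spares, 3)  # the void left in stage 3 is then filled
```

Under these assumptions the spares drawn into stage 4 are "G", "D" and "H" in that order, after which "A" fills the void in stage 3, as in the walk-through above.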
Figure A-8c. Gamma 3 Strategy - Third Failure Response
NASA-Langley, 1964 CR-105
