AES-EPO study program, volume II  Final study report by unknown
AES-EPO STUDY PROGRAM
FINAL STUDY REPORT
Volume II
Federal Systems Division, Electronics Systems Center, Owego, New York
https://ntrs.nasa.gov/search.jsp?R=19660011715 2020-03-16T22:35:05+00:00Z
AES-EPO STUDY PROGRAM
Final Study Report
Volume II
ORIGINATED: AES-EPO Staff
CLASSIFICATION AND
CONTENTS APPROVAL:
IBM NUMBER: 65-562-012
CONTRACT NUMBER: NAS 9-4570
Prepared for the
MANNED SPACECRAFT CENTER
National Aeronautics and Space Administration
Houston, Texas
Electronics Systems Center, Owego, New York
31 December 1965
FOREWORD
A computer concepts study was conducted at the IBM Electronic
Systems Center at Owego, New York, under IBM contract NAS9-4570, for
the Manned Spacecraft Center, Houston, Texas. The objective of the study
was to investigate possible solutions to long term and time critical reliability
problems as they affect the Apollo Command Module guidance and control
computer in its application to the AES mission. Volume I of this final report
presents a summary of the work performed during the study, and Volume II
presents detailed technical descriptions of the various investigations.
TABLE OF CONTENTS
Section
1.0 PACKAGING
1.1 Limiting Exposure
1.2 Connector Sealing
1.3 Contact Considerations
1.4 Replaceability
1.5 Module Size
2.0 MACHINE ORGANIZATION
2.1 TMR Characteristics
2.2 Trade-off Criteria
2.3 Basic Subsystem Configuration
2.4 Oscillator
2.5 Memory
2.6 Power Supplies and Distribution
2.7 Grounding
2.8 TMR/Simplex Mode
2.9 Reorganized Subsystem
2.10 Transient Protection
3.0 ERROR DETECTION AND DIAGNOSIS
3.1 Approach
3.2 Disagreement Detectors
3.3 Switching
3.4 Crew Requirements
3.5 Programming Requirements
4.0 FABRICATION AND TEST
4.1 Equipment Mockup
4.2 Exploratory Tests
4.3 Environmental Simulation Equipment
4.4 Evaluation Tests
4.5 Test Results
LIST OF ILLUSTRATIONS
Figure
1 Unit Packaging Approach
2 Computer Casting
3 Channel Packaging Approach
4 Cell Packaging Approach
5 Initial Leakage Rate
6 Seal Deterioration with Use
7 Connector Sealing Technique (Modified Saturn-V Connector)
8 Phase I Test Model
9 Change in Contact Resistance Versus Time (Gold and Gold Alloy)
10 Porosity Versus Thickness for Gold Plating
11 Constriction Resistance Versus Load
12 Contact Resistance Versus Alloy Gold Content
13 Interconnections per Circuit Versus Circuits per Page
14 Connections Versus Logic Blocks
15 Interconnections per Circuit Versus Circuits per Page
16 Voters per Page Versus Circuits per Page
17 Circuits per Machine Versus Circuits per Page
18 Channel Packaging
19 TMR Voting
20 TMR Versus Simplex Reliability
21 Channel Switching
22 Module Switching
23 Saturn-V Voter
24 Saturn-V Guidance Computer
25 Data Adapter Block Diagram
26 Oscillator Synchronization
27 Transient Filter
28 Clock Generator
29 Biased Oscillators
30 Gated Oscillators
31 Duplex Memory
32 TMR Memory
33 Triplex-Duplex Power System
34 Interrelated Power System
35 Grounding System
36 Regulated DC Return Ground Planes
37 Module Ground Planes
38 Discrete Input Circuit
39 Discrete Output Circuit
40 Pulse Input Circuit
41 Pulse Output Circuit
42 Generalized Reliability Curves
43 Reliability Comparison (TMR and TMR/Simplex)
44 Four-Module Partitioning of the Saturn-V Computer
45 AES Computer Flow Diagram
46 AES Data Adapter Flow Diagram
47 AES Computer Subsystem Package
48 Voting and Disagreement Detection
49 Module Switching
50 Saturn-V System Simulator Flow Diagram
51 Methods of Error Detection
52 TMR/Simplex Operation
53 Logic Voter
54 Detection Distribution in a Diagnostic Program
55 Apollo Computer - AES
56 Completed Mockup
57 Representative Module for Phase II Testing
58 Phase I Test Model
59 Environmental Test Chamber
60 Functional Diagram - Test Chamber
61 Chamber During Test
62 Test Fixture
63 Phase II Module Failures (>25 Millivolts) - Module No. 211
64 Phase II Module Failures (>25 Millivolts) - Module No. 212
65 Phase II Module Failures (>25 Millivolts) - Module No. 213
66 Phase II Module Failures (>25 Millivolts) - Module No. 214
67 Phase II Module Failures (>25 Millivolts) - Module No. 215
68 Phase II Module Failures (>25 Millivolts) - Module No. 216
69 Phase II Module Failures (>25 Millivolts) - Module No. 230
70 Phase II Module Failures (>25 Millivolts) - Module No. 231
71 Phase II Module Failures (>25 Millivolts) - Module No. 232
72 Summary of Failures
73 Nine Modules Under Test - 9 Days
74 Nine Modules Under Test - 27 Days
75 Seven Modules - 27 Days
76 Individual Module - 32 Days
77 Individual Module - 32 Days
78 Individual Module - 32 Days
79 Individual Module - 32 Days
80 Individual Module - 32 Days
81 Individual Module - 32 Days
82 Individual Module - 32 Days
83 Individual Module - 32 Days
84 Module 212 - 32 Days
85 Module 212 Enlargement - 32 Days
86 Module 212 Female Connector - 32 Days
87 Module Pin Discoloration - 32 Days
88 Individual Module - 57 Days
89 Individual Module - 57 Days
90 Individual Module - 57 Days
91 Individual Module - 57 Days
92 Individual Module - 57 Days
93 Individual Module - 57 Days
94 Individual Module - 57 Days
LIST OF TABLES
Table
1 Physical Comparisons of Packaging Approaches
2 Contact Resistance (Average of Several Field Sites)
3 Reliability States for TMR Modules
4 TMR Computer Characteristics
5 Data Adapter Characteristics
6 Address Groups
7 Reliability Estimates (Basic System)
8 Regulated DC Power per Section
9 Power System Component Count
10 Computer Sizing
11 TMR Computer Characteristics (AES)
12 Data Adapter Modules
13 Reliability Estimates (AES System - TMR Mode)
14 Reliability Estimates (AES System - TMR/Simplex Mode)
15 List of Available Spares
16 On-board Spares - 100-Percent Duty Cycle
17 On-board Spares - 50-Percent Duty Cycle, Non-op Failure Rate > 0
18 On-board Spares - 25-Percent Duty Cycle, Non-op Failure Rate > 0
19 On-board Spares - 50-Percent Duty Cycle, Non-op Failure Rate = 0
20 On-board Spares - 25-Percent Duty Cycle, Non-op Failure Rate = 0
21 Reliability Improvement Due to Sparing
22 AES System Reliability - Re-entry Phase
23 AES System Mission Reliability
24 AES System Reliability - Switchable Spare Mode
25 AES System Reliability - Total Mission
26 Disagreement Patterns
27 Diagnostic Listings
28 Signals, Logic, and Voters
29 Additional Disagreement Detectors
30 Distribution of Detectors
31 Error Signal Propagation
32 Typical Print-out from Redundant Computer Simulation
33 Typical Redundant Computer Simulation
34 Symptom - Failure Correlation
35 Computer Symptoms
36 Test Equipment Listing
37 Test Schedule
1.0 PACKAGING
The packaging scheme of the Saturn V computer and Apollo backup
data adapter was examined to determine its applicability to inflight
maintenance, sparing, and module and channel switching in the high
humidity-zero gravity AES environment. Consideration was then given
to other packaging techniques which would improve operation in this
environment without resorting to hermetic sealing. Techniques that
required maintenance tools which could not be used by a suited
astronaut were not considered in this study.
The study approach to the packaging problem was first to examine
methods for sealing the replaceable modules within the computer and
data adapter frame to provide gross protection against the high
humidity-zero gravity environment, and then to examine methods for
providing additional environmental protection at the replaceable
module level, especially at the connectors. Since any sealing method
is imperfect, especially over long periods of time, a study was made
of available contact materials and connection techniques to ensure
proper operation of the module and cable connectors in the presence
of the contaminants which manage to penetrate the equipment sealing
features. Finally, consideration was given to size and connection
constraints on the replaceable module which would affect the physical
organization of the computer and data adapter.
1.1 Limiting Exposure
A repackaging study was performed to determine the feasibility
of sealing the computer and data adapter circuits for operation and
maintenance in the high humidity-zero gravity AES environment. The
approach investigated to provide operation in the adverse environment
was gasket sealing with a slight overpressurization of the units
maintained by periodic repressurization. Maintenance of the circuits
would be allowed by resealing, purging, and repressurization after
repair.
Almost perfect purging of gases and water vapor and very short
purging time periods are feasible by venting the sealed equipment to
space. The installation of heaters within the computer and data adapter
to operate during periods when a cover is removed and during purging
would provide additional assurance against the accumulation of
contaminant gases and vapors. The circulation of hot fluid through the
coolant system of the unit being purged or repaired would provide
protection equivalent to heaters, as would the application of external
infrared heaters or dry air blowers.
Treatment of the internal surfaces of the computer (including
electrical connections) with hydrophobic film would tend to prevent the
condensation of moisture on critical areas, and where moisture did
condense it would tend to form into droplets instead of thin films. The
entire interior of the computer could be treated with any of several
available silicone sprays on assembly, and solid silicone compounds
could be applied during maintenance in the zero gravity environment
where sprays would be impractical.
Closed loop pressure systems were considered but will not be
emphasized in the study. The use of freon in a closed loop system (or
even in a static system) offers several advantages including moisture
repellent characteristics.
Three approaches to packaging the computer for operation and
maintenance in the AES environment were examined and compared.
These three approaches differ in the level of packaging to which the
sealing and purging techniques are applied. Although the preliminary
investigations were related only to the computer, the concepts apply
as well to the data adapter.
In the first approach the entire computer is sealed as a unit as
shown in Figure 1. Upon removing one of its two covers to replace a
failed module, the entire computer is exposed to the adverse AES
environment. When the cover is replaced after repair, the free moisture
and contaminants trapped in the computer are purged and the computer
is repressurized to somewhat greater than cabin pressure with dry gas.
In the second and third approaches an attempt is made to limit
the degree of exposure during a maintenance action by sealing various
portions of the computer independently. Since a triple modular
redundant (TMR) computer consists of essentially three individual
computers, each channel can be separately sealed so that one third or
less of the computer is exposed each time a repair is attempted. The
casting of the Saturn V computer is designed such that each logic
channel is partitioned into effectively five cells as shown in Figure 2.
If each cell is separately sealed, then a fifteenth or less of the
computer is exposed each time a repair is attempted. These second and
third packaging approaches are illustrated in Figures 3 and 4,
respectively.
Figure 1. Unit Packaging Approach
The computer covers can be sealed by means of special gaskets
or modified O-rings. Gasket-sealed electronic units have been produced
at IBM to provide leakage rates as low as 10^-7 cubic centimeters per
second per inch of linear seal length for ASQ-38, Gemini, and other
applications. The larger covers, especially those of the first approach
above, would preclude leakage rates as low as this so that periodic
repressurization of the computer and data adapter would be required.
Figure 5 shows how the residual gas pressure of the proposed designs
would decrease with mission time.
An examination of Figure 5 reveals the following information:
1) The unit-seal approach affords the lowest leak rate (because
its linear seal length to volume ratio is the smallest). For
the same reason, the leakage rates from the memory mod-
ules are less than the leakage rates from the logic modules
for both the cell-seal and channel-seal designs.
Figure 2. Computer Casting
2) If the assumption is made that a module replacement
occurs on the average of once a week during the 90-day
mission (10 logic failures and 3 memory failures per
mission), then the time period for which the seal is effective
(defined as the period during which the residual pressure
exceeds 50 percent of the initial overpressure) is determined
by the component failure rate of the computer rather
than by the leakage rates for both the unit-seal and
channel-seal designs. These factors are tabulated from Figure 5
as follows:
                      Time Period (days) to
Sealing Level         Failure      50% Pressure
Unit Logic            4 1/2        90
Unit Memory           15           90
Channel Logic         15           25
Channel Memory        45           90
Figure 3. Channel Packaging Approach
3) For the same assumptions, the time period for which the
seal is effective for the cell-seal design is determined by
leakage rate rather than component failure rate. The
following values were taken from Figure 5.
                      Time Period (days) to
Sealing Level         Failure      50% Pressure
Cell Logic            67           16
Cell Memory           90           55
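
The comparison made in items 2 and 3 can be restated as a small calculation: for each cover, the seal is limited by whichever comes first, the next expected component failure (which forces the cover open) or the decay of the overpressure to 50 percent. The sketch below is illustrative only. It uses the values tabulated above from Figure 5, reading the Unit Logic failure entry as 4 1/2 days; the names `SEAL_DATA` and `limiting_factor` are hypothetical and do not come from the report.

```python
# Days to the next expected failure and days to 50% of initial overpressure,
# as tabulated in the text from Figure 5. The 4.5 entry is the "4 1/2" value.
SEAL_DATA = {
    ("unit", "logic"): (4.5, 90),
    ("unit", "memory"): (15, 90),
    ("channel", "logic"): (15, 25),
    ("channel", "memory"): (45, 90),
    ("cell", "logic"): (67, 16),
    ("cell", "memory"): (90, 55),
}


def limiting_factor(days_to_failure, days_to_half_pressure):
    """Return which event comes first and after how many days."""
    if days_to_failure < days_to_half_pressure:
        return ("component failure", days_to_failure)
    return ("leakage", days_to_half_pressure)


def summarize(data=SEAL_DATA):
    """Map each (sealing level, cover) pair to its limiting factor."""
    return {key: limiting_factor(*days) for key, days in data.items()}


if __name__ == "__main__":
    for (level, cover), (cause, days) in summarize().items():
        print(f"{level:8s}{cover:8s}limited by {cause} at {days} days")
```

Under these tabulated values every unit-seal and channel-seal entry is limited by component failure, and both cell-seal entries are limited by leakage, which matches the statements in items 2 and 3.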
Figure 4. Cell Packaging Approach
No attempt was made in the leakage rate computations to account
for the probable differences in efficiency of the various sealing ap-
proaches. Experience at IBM has indicated that the leakage rate of a
large seal (such as an entire computer cover) compared to that of a
small seal (such as an individual cell cover) is greater than that pre-
dicted simply by a difference in the linear length of the seal. This
effect is due to the greater difficulty in maintaining tolerances, parallel
sealing surfaces, and uniform sealing pressure with larger units. If
this effect were taken into account, the curves of differential pressure
versus mission time would be grouped closer together.
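
The dependence of pressure decay on the ratio of seal length to enclosed volume can be illustrated with a simple first-order model. This is only a sketch under stated assumptions and is not the computation used for Figure 5: it assumes that leak flow is proportional to both the seal length and the remaining overpressure, so that the overpressure decays exponentially. The leak coefficient and the seal lengths below are hypothetical; only the 1500 and 25 cubic inch volumes are taken from the text.

```python
import math


def days_to_pressure_fraction(volume_in3, seal_length_in, leak_coeff, fraction=0.5):
    """Days for the overpressure to fall to `fraction` of its initial value.

    Assumed model (not from the report): dP/dt = -(k * L / V) * P, where
    k is a leak coefficient in cubic inches per day per inch of seal,
    L is the linear seal length in inches, and V is the sealed volume.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    decay_rate = leak_coeff * seal_length_in / volume_in3  # per day
    return -math.log(fraction) / decay_rate


# Hypothetical leak coefficient and seal lengths, chosen only for illustration.
K = 0.1
unit_days = days_to_pressure_fraction(1500.0, 100.0, K)  # whole-computer cover
cell_days = days_to_pressure_fraction(25.0, 20.0, K)     # single-cell cover

if __name__ == "__main__":
    print(f"unit seal: {unit_days:.1f} days to 50% overpressure")
    print(f"cell seal: {cell_days:.1f} days to 50% overpressure")
```

In this model the hold time scales with volume divided by seal length, so the large unit enclosure holds pressure longer than a small cell. A seal-efficiency penalty for large covers, as described above, would correspond to a larger leak coefficient for the unit case and would bring the two results closer together.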
Figure 5. Initial Leakage Rate
In addition, other maintenance features favor the cell-seal
design. An average of about 9 screws would have to be removed every
time a failed module is replaced in the cell-seal design compared to
23 screws for the channel-seal and 32 screws for the unit-seal designs.
The exposed volume per repair is about as follows:
                      Volume Exposed (cubic inches)
Cover Removed         Unit       Channel    Cell
Logic                 1500       125        25
Memory                1500       300        150
Gasket seals have a tendency to deteriorate with usage. Each
time a seal is broken and then resealed, the effectiveness of the seal
is compromised. Figure 6 is an estimate, based on IBM experience,
of the effectiveness of a gasket seal versus use in the AES mission.
Repackaging the computer into independently sealed cells provides
an improvement in ease of maintenance and in exposure to the
AES environment during repair. The average number of screws which
must be removed to replace a failed module is less than a third of
that for unit sealing and less than a half of that for channel sealing.
The amount of circuitry exposed during a repair is less than a twentieth
that of unit sealing and a fifth that of channel sealing.
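
The ratios quoted above can be checked directly against the screw counts and exposed volumes given earlier in this section. The following sketch is a plain arithmetic check; the dictionary and function names are illustrative, and the Memory/Unit volume is read as 1500 cubic inches.

```python
# Screws removed per module replacement, from the text.
SCREWS = {"unit": 32, "channel": 23, "cell": 9}

# Exposed volume per repair in cubic inches, from the tabulation in the text.
EXPOSED_VOLUME = {
    "logic": {"unit": 1500, "channel": 125, "cell": 25},
    "memory": {"unit": 1500, "channel": 300, "cell": 150},
}


def cell_ratio(values):
    """Ratio of the cell-seal value to the unit-seal and channel-seal values."""
    return {
        level: values["cell"] / values[level]
        for level in ("unit", "channel")
    }


screw_ratio = cell_ratio(SCREWS)
logic_volume_ratio = cell_ratio(EXPOSED_VOLUME["logic"])

if __name__ == "__main__":
    print("screws, cell vs unit:    %.2f" % screw_ratio["unit"])
    print("screws, cell vs channel: %.2f" % screw_ratio["channel"])
    print("logic volume, cell vs unit:    %.3f" % logic_volume_ratio["unit"])
    print("logic volume, cell vs channel: %.3f" % logic_volume_ratio["channel"])
```

The twentieth and fifth figures for exposed circuitry correspond to the logic-cover volumes; the memory-cover volumes give larger ratios (150 to 1500 and 150 to 300).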
The advantages of cell sealing over the other two approaches are
obtained at the cost of somewhat increased size and weight, as
indicated in Table 1. Since the advantages appear to outweigh the
disadvantages, the cell or honeycomb approach would be recommended
over the other two approaches.
1.2 Connector Sealing
Even if fairly efficient protection of the replaceable modules is
achieved by sealing the modules within the computer and data adapter
frame and by limiting the exposure of the modules to the high humidity-
zero gravity environment, some free moisture and contaminants will
collect eventually at the module connectors. Whatever packaging tech-
niques are selected for AES applications, the problem of sealing the
intermodule and interequipment connectors against the high humidity-
zero gravity environment will exist.
Figure 6. Seal Deterioration with Use (horizontal axis: Number of Reseals)
Design analysis and exploratory testing of methods of sealing the
connector of a replaceable module resulted in the selection of a gasket-
silicone gel technique for the representative module to be demonstrated
according to the study test plan. As shown in Figure 7, a silicone rubber
gasket was cemented to the face of the female connector. The female
connector was then loaded with a silicone gel. The male pins on the
replaceable module were also coated with silicone. Since the slotted holes
in the gasket are somewhat smaller than the male pins, a wiping action
occurs both on insertion and on removal of the replaceable module. This
wiping action serves to remove moisture from the male pins on insertion
and to retain the silicone gel in the female receptacle upon removal of
the replaceable module. This concept was used in the Phase II testing.
TABLE 1 -- Physical Comparisons of Packaging Approaches
                                  Deviation from Saturn V (percent)
Physical Characteristics          Unit      Channel   Cell
Volume and Mounting Area          + 6       + 13      + 17
Weight                            + 1       + 3       + 5
Screws/Maintenance Action         Ref.      - 25      - 75
Exposed Volume/Maintenance        Ref.      - 65      - 90
Leakage Rate -- Initial           Ref.      + 75      - 25
Leakage Rate -- End of Mission    Ref.      + 30      - 80
Phase I testing included investigations of gasket seals on the
interface between the male and female connectors, sealing of the
connectors with various greases, and combinations of gaskets and greases.
The technique showing the most promise is sketched in Figure 8. A
male and female Saturn-V page connector were wired and sealed with
epoxy on their rear surfaces. A silicone rubber gasket was glued to
the face of the female connector with Dow-Corning A9-4000. The
female cap was removed and DC-3 silicone grease packed inside the
connector. The pins of the male connector were also saturated with
DC-3 silicone grease. Contact measurements before and after
application of silicone grease indicated that the grease had no measurable
effects on the contact resistance between male and female connections.
Leakage resistance checks between adjacent pins were made under the
following conditions:
1) Initial leakage resistance of mated test model -- 500,000
megohms
2) Immersed mated connector in fresh water for 15 seconds
and shook off excess water -- 2,000 to 10,000 megohms,
erratic
3) Unmated connector, dried male for 20 seconds at 125
degrees Fahrenheit, remated -- 140,000 megohms
4) Unmated connector, immersed both halves in fresh water
for 15 seconds, shook off excess water, remated -- 70,000
megohms
5) Unmated connector and remated -- 5,000 megohms
6) Unmated connector and remated -- 70,000 megohms
7) Unmated connector and remated -- 85,000 megohms
8) Unmated connector and remated -- 60,000 megohms
9) Unmated and remated connector under fresh water -- reading
erratic
10) Unmated connector and shocked water off male connector on
desk top, remated -- 50,000 megohms
11) Unmated connector and remated -- 10,000 megohms
Figure 7. Connector Sealing Technique (Modified Saturn-V Connector)
Figure 8. Phase I Test Model
Although the preceding readings appeared erratic, they were very
encouraging from the following viewpoints:
1) The lowest leakage resistances were still in the thousands
of megohms.
2) The surface between the cap and connector of the female
and the screw holes in the female presented sources for
leakage which were sealed during Phase II tests.
3) The large holes in the female gasket which allow
penetration of the male pins also allowed penetration of excess
moisture.
The same test was essentially repeated in a salt water solution with
no significant changes.
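
The first viewpoint above can be checked against the fresh-water readings listed for conditions 1 through 11. The sketch below simply tabulates those readings; condition 9 is left out because only "reading erratic" was recorded for it, and the names used are illustrative rather than taken from the report.

```python
# Leakage resistance readings in megohms, keyed by test condition number.
# Each value is a (low, high) range; a single reading has low == high.
# Condition 9 is omitted because no numeric value was recorded.
READINGS_MEGOHMS = {
    1: (500_000, 500_000),
    2: (2_000, 10_000),
    3: (140_000, 140_000),
    4: (70_000, 70_000),
    5: (5_000, 5_000),
    6: (70_000, 70_000),
    7: (85_000, 85_000),
    8: (60_000, 60_000),
    10: (50_000, 50_000),
    11: (10_000, 10_000),
}


def lowest_reading(readings):
    """Smallest recorded leakage resistance, in megohms."""
    return min(low for low, _high in readings.values())


def conditions_below(readings, threshold):
    """Condition numbers whose lowest reading falls below the threshold."""
    return sorted(c for c, (low, _high) in readings.items() if low < threshold)


if __name__ == "__main__":
    print("lowest reading (megohms):", lowest_reading(READINGS_MEGOHMS))
    print("conditions below 10,000 megohms:",
          conditions_below(READINGS_MEGOHMS, 10_000))
```

Only conditions 2 and 5 fall below 10,000 megohms, and the minimum of 2,000 megohms is still in the thousands of megohms, as the text states.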
Another material used to seal the female connector was Dow
Corning Sylgard 51 dielectric gel. When cured, it develops into a soft,
transparent, jelly-like mass having good self-healing qualities.
Different consistencies were used, but none were found to be satisfactory.
All had a tendency to be drawn out when the connector was unmated.
Optimum sealing would be achieved by sealing the female
connector with a gasket without the need for impregnating with silicone
gels. A most promising concept is that of a membrane which has
self-sealing qualities when unmated. Special tools would be required
to fabricate this type of gasket.
Under any conditions, a new development program would have to
be initiated for a connector for this application. The new connector
would be a molded one-piece connector with a self-contained gasket
containing approximately 200 self-sealing holes.
1.3 Contact Considerations
An investigation was performed to determine the best contact
material for use in the AES connector application. Work performed in
IBM's field test program was included in this investigation.
The data of Table 2 present partial results of a continuing IBM
environmental field test program to determine certain properties of
electrical contact materials. The tests were performed for a period
of over a year at several test sites representative of a variety of
environmental conditions. Relative humidity at these sites varied from
around 10 to over 80 percent. Measured airborne contaminants
included various amounts of NO2, HF, NH3, SO2, O3, Cl2, and H2S.
Ambient temperatures varied over a range from 60° to 100° F.
Table 2 shows a summary of the original values of contact resistance
for the contact materials tested and the values after 1 year of
environmental exposure. The values in the table were obtained by averaging
the data from several test sites.
Copper alloys and the other high resistance materials listed in
Table 2 would not be considered for use as contact material
in AES applications. Some of the data showed practically infinite
resistance after 1 year of exposure to the most adverse environments.
Although not indicated in the table, caution should be exercised in the
use of palladium or palladium alloys containing copper where high
concentrations of organic materials may be encountered. Because of the
catalytic nature of palladium, polymers may form in a high organic
atmosphere.
TABLE 2 -- Contact Resistance (Average of Several Field Sites)
Resistance
(ohms)           Original                      After 1 Year
> 100            --                            Aluminum, Nickel-Silver,
                                               Beryllium Copper, Copper,
                                               Phosphor Bronze, Brass,
                                               Nickel, Silver-Cadmium Oxide
0.01 to 100      Aluminum, Nickel-Silver,      Red Gold, Green Gold,
                 Phosphor Bronze, Brass,       Silver
                 Nickel
0.001 to 0.01    Beryllium Copper, Copper,     Tin, Tin-Lead, Rhodium,
                 Red Gold, Green Gold          Platinum, Platinum-Iridium
< 0.001          Silver, Silver-Cadmium        Gold
                 Oxide, Tin, Tin-Lead,
                 Rhodium, Platinum,
                 Platinum-Iridium, Gold
The films that form on the surface of silver tend to be highly
resistive, coherent, and tenacious if the major ingredient is silver
sulfide. However, if appreciable amounts of silver chloride are
present, the film may be nonadherent and of moderately low electrical
resistance. Although the average contact resistance after 1 year was
indicated in Table 2 as 0.01 to 100 ohms, the measured values at the
various sites varied over the range of 0.001 to over 100 ohms.
Although silver would be very applicable in controlled environments,
it will not be considered further in this study.
Samples of 10, 14, and 18 carat green-gold alloys (simple solid
solutions of gold and silver) showed characteristics similar to silver in
wide variations of contact resistance with environment. Although the
variations were less extreme because the films were thinner in
proportion to the gold content, caution should be used in the use of green
gold alloys for AES applications. Red gold alloys (solid solutions of
gold and copper) compared generally with green gold and the same
conclusions apply.
The oxide films of tin, lead, and other soft metals tend to be
coherent, self-limiting, and thin. These films are easily penetrated
under pressure as the soft-bulk material yields. Despite the apparent
attractions of tin, lead, indium, etc., their use is not recommended for
low-load separable-contact applications, especially in sliding situations
where wear debris can build up.
Gold and gold alloys would seem to be best applicable as contact
materials for AES connectors. The excellent behavior of gold and gold
alloy with exposure time is shown in Figure 9. The higher platinum
content alloy (6-percent platinum, 25-percent silver, 69-percent gold)
is probably preferred although more test data is required for a firm
decision. Visible films do exist on exposed gold surfaces but consist
of absorbed material rather than tarnish products. SMS gold and
24-carat gold showed similar properties during the tests.
Since gold and low resistance gold alloys are soft materials,
their primary use as contact materials to date has been in the form of
gold plating over base materials such as nickel or copper. The major
problem in this application has been the porosity of the plating, which
leaves the porous areas of the basic materials exposed to
contamination. Resulting degradation of the contact surface can occur in two
different ways.
When the base material exposed under the pores in the gold
plating is attacked by contaminants such as sulfur, chlorine, and nitrous
oxide, the resulting sulfides, chlorides, and nitrates will migrate
through the gold pores and spread out over the surface of the contact,
forming high resistance films. Although these creeping films may be
controlled by means of surface lubricants or choice of base material,
the primary solution will probably be controlled porosity.
The second phenomenon which porous plating invites is
electrolytic activity at the base material in the presence of moisture and
active atmospheric contaminants. Flaking of the gold plating itself
can result as well as formation of resistant films by the migration of
corrosion products to the contact surface.
Both types of contact degradation dictate that the porosity of the
gold plating be minimized. The obvious approach is to increase the
thickness of the plating. However, Figure 10 shows that although the
porosity does decrease as plating thickness increases, the curve levels
off at the higher thicknesses. This curve represents an average of data
at IBM and elsewhere and is plotted relative to the plating porosity at
50 mils plating thickness as standard (or unity). Because of practical
considerations, plating thicknesses over 150 mils are not being
considered at IBM.
Figure 9. Change in Contact Resistance Versus Time
(Gold and Gold Alloy)
[Plot not reproduced. Horizontal axis: exposure time (months), 0 to 12. Curves: gold alloy (5% platinum); gold alloy (6% platinum); SMS and 24-carat gold.]
Figure 10. Porosity Versus Thickness for Gold Plating
[Plot not reproduced. Horizontal axis: plating thickness (mils), 0 to 200. Vertical axis scale: 0 to 3.0.]
However, experience has shown that other considerations such as
production and testing methods tend to predominate over actual plating
thickness in determining porosity at the higher thicknesses. For ex-
ample, an increase in thickness from 50 to 150 mils was obtained in one
case by increasing current density rather than increasing processing
time, and the resulting thick plating had higher porosity than the thin
plate. The degree to which porosity can be minimized is very sensitive
to the cleaning and handling techniques used on the base material.
Porosity of the final product can be controlled by adequate testing
of the plating to screen out those samples exceeding a predetermined
limit. For critical applications such as AES, the limit can be set much
more stringently than the limits set for commercial grade platings.
Very sensitive electrographic testing methods are used at IBM in which
an electrolyte-saturated filter pad is wrapped around each contact and
a small current made to flow from the contact through the pad. After a
fixed time period, a reagent applied to the pad indicates the porosity by
color test.
Protective contact coatings have been produced by welding gold
foil onto the base contact material rather than by plating. Although this
method has essentially eliminated the porosity problem associated with
plating methods, it has not proven suitable for commercial applications
because of the inherent cost of the process and because of the difficulty
in producing uniform coating thickness. This technique should not be
overlooked, however, for low-quantity, highly critical applications such
as AES.
Multilayer platings are generally more expensive than single
layer platings of the same total thickness and are therefore not favored
for high production commercial applications. However, when a contact
surface is built up from several layers of gold plating, there will exist
some misalignment of the porosity of the individual layers and a result-
ing decrease in effective porosity over that of a thick coating deposited
as a single layer.
Even if the problem of porosity is solved by pursuing those meth-
ods which have been rejected for commercial applications because of
cost or of the difficulties in mass production, a problem may still exist
in the form of diffusion, which is the migration of base contact material
or impurities through the plating material itself rather than through the
pores. The transferred materials form resistive films on the surface
of the gold plating in the same manner as those caused by porosity.
Some commercial applications use an intermediate barrier layer of
nickel between a base material of copper and the gold plating to retard
diffusion. Rhodium was found to exhibit the highest retardation capability
as a barrier layer but was considered too expensive, too difficult to
process, and too brittle for commercial use.
The choice of plating and base contact materials must also be
made after consideration of the possibility of galvanic corrosion. To
prevent galvanic action, metals in relatively close position in the
electromotive series should be chosen for plating and base materials.
It appears that the choice of nickel as the base material with gold
as a thick-film surface material is best suited for AES applications in
view of all the preceding considerations. The thick film should be built
up by several successive electrodeposits or, preferably, by welding
gold foil on the nickel contact pins. Very stringent process and testing
methods must be applied, which will surely result in very high rejection
rates. Although the resulting connector costs will be considerably
higher than the costs of presently employed commercial connectors,
the pin corrosion problem would be minimized for the AES applications,
and the costs seem to be justified for low quantity, highly critical
usages.
The resistance of an electrical contact is made up of two components:
1) constriction resistance due to the convergence of current
flow lines to points of contact and 2) film resistance due to impedance
of electron flow by the surface films.
Constriction resistance varies primarily with the resistivity of
the contact material, the contact load, and the contact geometry.
Figure 11 shows the resistance of several contact materials
measured with a 1/8-inch diameter, spherically shaped gold
probe tip. The curves indicate the manner in which constriction
resistance varies with contact load for given contact geometry and
material. Note that some of the curves cross, suggesting that the
choice of contact material for a given connector design may be made
on the basis of the contact load of that connector. However, consider-
ation must first be given to film resistance.
Contact films are the result of reaction of the contact material
with one or more contaminants in the environment or absorption of
impurities by the contact surface. In general, all metals except pure
gold will form reaction films. Gold alloys will form reactant films in
proportion to the amount of alloying material as indicated in Figure 12.
The curves were derived from IBM field test data. Gold alloys may be
required in AES applications to provide sufficient hardness and wear
resistance.
Figure 11. Constriction Resistance Versus Load
[Plot not reproduced.]
Figure 12. Contact Resistance Versus Alloy Gold Content
[Plot not reproduced. Horizontal axis: gold content (carats) of alloy, 10 to 24. Test conditions: contact load 3.5 ounces; contact geometry 1/8-inch diameter gold tipped probe. Curves: start of test (0 months); 3 months.]
1.4 Replaceability
Inflight maintenance, for prolonged missions, requires the care-
ful analysis of certain aspects of packaging and accessibility require-
ments of man-rated electronic equipment. One of these aspects is
replaceability.
Replaceability is defined as the proper restoration or substitution
of like modules in a system or unit in the minimum period of time without
the use of any special tools. In the AES computer-adapter packaging
configuration, it was decided that all the modules were to be designed
as pluggable units. Each would have a connector on its mating face.
The AES equipment is basically divided into five major modules: logic
circuits, memories, power supplies, filters, and interface drive circuits.
All these modules will be designed as hermetically sealed units.
Similar units, filled with a nontoxic gas, have been extensively used
in the IBM designed B-52 bombing-navigation equipment.
All the logic circuits, such as timing, control register, and arithmetic
logic, will be mounted on a page similar to the ones used in the
Saturn-V Launch Vehicle Digital Computer. The page will be slightly
larger, measuring 4 inches square, and will be mounted on a 200-pin
connector. This assembly will then be covered with a can and joined
hermetically with a tear strip. The tear strip has been successfully
used in the B-52 aircraft electronic equipment. The tear strip concept
lends itself to manufacturing checkout, testing, and depot repair.
The sealed can could be made out of either stainless steel or
aluminum. Before a recommendation is given, trade-offs would be
made of heat transmission, weight, corrosion, and cost. The canned
pages will be packed with silicone grease, RTV, or Sylgard to prevent
moisture penetration. The hermetically sealed cans will be made with
guides on each side to assist in aligning the connector and to assist in
heat transmission.
Once aligned the page will be mated by a screw action. The screw
will be made a part of the page assembly. Its purpose will be to firmly
set the connector and lock it into place by means of a camming action.
No additional covers will be used.
The power supplies and filter will be approximately 4-inch cubes.
Power supplies will be hermetically sealed in either stainless steel or
aluminum cans. Modules will be made self-aligning. They will be
secured into place either by a screw-cam affair similar to that of the logic
pages or by a ball-type camming arrangement. End item
design will depend on size, weight, and accessibility in the vehicle.
Three double density memories will be required. The memories
will be approximately 6 x 5-1/2 x 4-inches in size and will be mated in
position using a scheme similar to that used for the power supplies.
Repairability will be the prime factor in deciding a packaging concept.
This is due to the high cost of the memories.
The present design has all modules mounted on the top face of a
chassis. Integral cooling, similar to that used for the Saturn-V LVDC,
will be used. The bottom face of the chassis will contain the intercon-
nection wiring. The bottom section will be hermetically sealed. The
estimated weight of the entire package is 69 pounds.
The present packaging configuration, discussed in the preceding
paragraph, may be slightly altered when the vehicle mounting and instal-
lation requirements are more completely defined. At present, the
package will be installed face up, rack-and-panel style. Other pack-
aging configurations can be used, i.e., a page type package hinged on
one side or a spindle package mounted in a merry-go-round style.
All these form factors will be fully considered in the final design when
requirements are fully defined.
1.5 Module Size
Every consideration was given to designing the computer as com-
pactly as possible without sacrificing the maintainability and reliability
of the equipment. The present data processing equipment consists of
two pieces of hardware, a computer unit and a data adapter unit. Every
attempt in this program was made to combine the equipment into one
unit and still have a small, light unit.
The AES computer and data adapter consist of memories, power
supplies, filter, logic pages, and drive circuits. Every effort was
exerted to combine circuits for minimum package density. Different
machine packaging trade-offs were considered. It is envisioned that
the computer will be fabricated with integrated circuits; however, it
can be made applicable to the present ULD circuit family.
One significant parameter which is frequently used in machine
packaging trade-offs is the ratio of required module (page) intercon-
nections per circuit (ULD, flatpack, etc.). Figure 13 shows the re-
sults of available data on this parameter for Saturn-V and for the
average of other technologies including integrated circuits. Both
curves show a general trend that the ratio decreases rapidly as the
packing density increases up to about 50 circuits per module and then
tends to level off.
Figure 13. Interconnections per Circuit Versus
Circuits per Page
[Plot not reproduced. Horizontal axis: circuits per page, 0 to 200. Vertical axis scale: 0 to 3.0. Curves: Saturn-V technology (TMR); average of other technologies.]
The total number of pluggable interconnections of a machine can
now be found for any packing density (circuits per page) by multiplying
the total number of machine circuits by the interconnection per circuit
value for that packing density.
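The multiplication described above can be sketched in a few lines of Python. This is only an illustration: the function name, the machine size, and the ratio values are hypothetical placeholders and are not readings from Figure 13.

```python
def total_pluggable_interconnections(total_circuits: int,
                                     interconnections_per_circuit: float) -> float:
    """Total pluggable interconnections for one packing density.

    total_circuits: number of circuits in the machine.
    interconnections_per_circuit: ratio read from a curve such as Figure 13
        at the packing density (circuits per page) of interest.
    """
    return total_circuits * interconnections_per_circuit


# Hypothetical values for illustration only (not taken from the report).
machine_circuits = 1800
ratio_by_density = {25: 2.0, 50: 1.0, 100: 0.8}  # circuits per page -> ratio

for density, ratio in ratio_by_density.items():
    total = total_pluggable_interconnections(machine_circuits, ratio)
    print(f"{density} circuits per page: {total:.0f} pluggable interconnections")
```

With real curve readings substituted for the placeholder ratios, the same loop would show how quickly the interconnection count falls as packing density rises.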
The curves of Figure 13 indicate that consideration for the total
number of pluggable interconnections places a lower limit on the packing
density. That is, interconnections limit the module size to some
minimum number of circuits per module. The vertical displacement
of the Saturn-V curve from the average is due to the reliability requirement
of the TMR machine as well as to the difference between Saturn
technology and integrated circuit technology. In general, redundancy
techniques should increase the number of circuits per page to provide
an area sufficiently large to accommodate a reasonable block
of logic.
The curves in Figure 13 were found to be similar to recently
published curves by Meade and Geller¹ and by Keyes². They showed
the relation which has been found to hold using SLT technology between
the number of connecting pins in a part of a computer such as a card,
board, or chassis and the number of logic blocks contained in the part.
The curve is reproduced as the solid line in Figure 14. Two observa-
tions were made.
The first observation is illustrated by the dotted line in Figure
14. The dotted line represents the relation between the surface and volume of a sphere,
$S = \pi^{1/3} 6^{2/3} V^{2/3}$, when an interconnection is regarded as a unit of
surface and a logic block as a unit of volume. These relations are
almost identical. It is as though the logic blocks were closely packed
into a sphere and connections made to those blocks that were on the
surface. It appears that computer interconnections have an essentially
three-dimensional character.
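The sphere relation can be evaluated directly. The following Python sketch is illustrative only: the function name is hypothetical, and the logic block counts are simply the decade points on the horizontal axis of Figure 14. It also checks the closed form against the usual expressions for the surface and volume of a sphere.

```python
import math


def sphere_surface_from_volume(volume: float) -> float:
    """Surface area of a sphere of the given volume.

    Implements S = pi^(1/3) * 6^(2/3) * V^(2/3), with a logic block taken
    as a unit of volume and an interconnection as a unit of surface.
    """
    return math.pi ** (1 / 3) * 6 ** (2 / 3) * volume ** (2 / 3)


# Consistency check against S = 4*pi*r^2 and V = (4/3)*pi*r^3 for r = 2.
r = 2.0
assert math.isclose(sphere_surface_from_volume(4 / 3 * math.pi * r ** 3),
                    4 * math.pi * r ** 2)

# Predicted connections at the decade points of the Figure 14 axis.
for logic_blocks in (10, 100, 1000):
    connections = sphere_surface_from_volume(logic_blocks)
    print(f"{logic_blocks} logic blocks: about {connections:.0f} connections")
```

Because the exponent is 2/3, each tenfold increase in logic blocks raises the predicted connection count by a factor of about 4.6 rather than 10.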
The second observation is based on the fact that the human eye
contains about $10^8$ photoreceptors. It is connected to the brain by an
optic nerve which contains about $10^6$ fibers. The point ($10^6$ connections,
$10^8$ logic blocks) falls on the extrapolated curve of Meade and
Geller. It seems quite reasonable to regard a nerve fiber as a connection.
The relation of a photoreceptor to a logic block is less clear.
It may be, however, that the amount of data processing which takes
place in the eye is about what a computer designer would have put
there.
¹ R. M. Meade and H. Geller, "Solid State Design," 6 (7), 21 (July 1965).
² Robert W. Keyes, "On the Relation Between Number of Connecting
Pins and Number of Logic Blocks," 28 July 1965.
Figure 14. Connections Versus Logic Blocks
[Plot not reproduced. Horizontal axis: logic blocks, 1 to 1000. Vertical axis: connections, 10 to 1000.]
Taking this information and interfacing it with Figure 13 results
in Figure 15. The dotted curve shows the ratio of surface to volume
plotted on Figure 14, and it falls in between Saturn-V technology and
the average of other technologies. It also shows that the ratio decreases
rapidly as the package density increases up to about 50 circuits per
module and then tends to level off.
Figure 15. Interconnections per Circuit Versus
Circuits per Page
[Plot not reproduced. Horizontal axis: circuits per page, 0 to 200. Curves: Saturn-V technology (TMR); ratio of surface and volume of sphere; average of other technologies.]
The placement of voters in a TMR machine is an important con-
sideration in machine organization. Maximum machine reliability is
theoretically obtained with an organization in which the voting level is
such that the reliability of the voter is equal to the reliability of the
logic being voted upon. However, in practical aerospace machines,
the fan-in and fan-out complexity makes voter placement according to
this simple rule far from obvious, especially with restricted packing
densities.
A normalized curve of voters per machine versus packing density
is given in Figure 16. Although this curve was derived from Saturn
data, it should apply generally to other advanced technologies. The
curve was constructed according to the rule that voters would be placed
on all intermodule signals.
The minimum packing density was taken as the inverter level,
point 1 on Figure 16. That is, each inverter is packaged individually
as a replaceable module, along with associated AND's and OR's, and
the number of voters is equal to the number of inverters in the machine.
The ordinate at this point is about 0.75, the ratio of inverter circuits to
total circuits in the machine. As the size of the replaceable module is
increased to include more inverters, the number of voters required
decreases relatively slowly at first because most inverter outputs fan
out to several [illegible]. As [illegible] modules are absorbed
into larger modules, the number of inverters feeding out of the module
decreases rapidly, and the curve decreases accordingly. Then, in the
density region of about 0.05 to 0.10 circuits per page (that is, each
replaceable module contains from 5 to 10 percent of the total machine
logic), the curve flattens out as the organization tends toward "isolated"
functional modules. As the machine organization progresses from ten
towards one module per machine, the curve linearly approaches point 2
where the number of voters required has reduced to the logic interface
(memory and input-output).
One of the costs of organizing a machine into individual replace-
able modules is the increased circuitry required. This increase is due
primarily to the additional drivers and decreased circuit-sharing im-
posed by the modular design. The effect is small except for organiza-
tions in which the machine is broken into ten or more modules, as
shown in Figure 17.
Figure 17 is a normalized curve of circuits per machine versus
circuits per page derived from Saturn-V computer design data. The
conclusion from this curve, if it represents the general case and not
just Saturn, is that the AES computer should be organized into ten or
fewer modules.
29
1966011715-038
0,8"
0.7-
0°_ o
0.5-
SATURN-V Computel Organization
.__
U
0.4 "-
0 _
0
t I--
0.3- _6
=
U
0.2- _
0.;
0
I I I 1
0 0.2 0.4 0.6 0.8 1.0
Circuits Per Page (Abscissais normalized with respectto the total numberof circuits
in the modularizedmachine.)
F _ure 16. Voters per Page Versus Circuits per Page
3O
1966011715-039
Figure 17. Circuits per Machine Versus Circuits per Page
[Plot not reproduced. Curve: Saturn-V computer organization. Horizontal axis: circuits per page, 0 to 1.0. Vertical axis: circuits per machine, 1.0 to 2.2.]
* "Circuits per Machine" scale is the ratio of number of circuits (ULDs,
flat packs, etc.) in the modularized machine normalized with respect
to the number of circuits in the unmodularized machine.
** "Circuits per Page" scale is the ratio of number of circuits per
replaceable module normalized with respect to the number of circuits
in the modularized machine.
A TMR machine organization in which all three channels are routed
through the same physical replaceable modules has been found to result
in the minimum interconnection requirements. This is shown in Figure
18, which represents output voters for module 1 of a TMR computer. If
the individual channels of module 1 are packaged on separate physical
pages, then the cross-channel communication is external to the page
and two inputs and two outputs are required per channel, as shown, or
a total of 12 interconnections are required in all. If, on the other hand,
the three channels are routed through the same page, then the cross-
channel communication signals are internal to the page and only one
output is required per channel, or a total of three interconnections
in all.
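The two interconnection counts quoted above follow from simple arithmetic, sketched here in Python. The function name is hypothetical; the per-channel input and output counts are those stated in the text for the output voters of one module.

```python
def voter_interconnections(channels: int,
                           inputs_per_channel: int,
                           outputs_per_channel: int) -> int:
    """Pluggable interconnections needed for the output voters of one module."""
    return channels * (inputs_per_channel + outputs_per_channel)


# Channels on separate pages: cross-channel signals leave and enter each page,
# so each channel needs two inputs and two outputs.
separate_pages = voter_interconnections(channels=3,
                                        inputs_per_channel=2,
                                        outputs_per_channel=2)

# All three channels on one page: cross-channel signals stay on the page,
# so each channel needs only its one output.
shared_page = voter_interconnections(channels=3,
                                     inputs_per_channel=0,
                                     outputs_per_channel=1)

print(separate_pages, shared_page)  # 12 3
```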
Figure 18. Channel Packaging
[Diagram not reproduced. Modules 1 and 2, each with channels 1, 2, and 3.]
The reduction in interconnections which can be realized on the
computer level by packaging all three channels on the same physical
page has been found to be as high as 25 percent over those required by
packaging the channels individually. The feasibility of packaging partial
trios (more than one but less than three channels per page) has been
investigated and found to offer little advantage over either single-
channel or triple-channel packaging.
A preliminary machine organization for the TMR computer
assumed ten modules (about the knee of each of the curves of the
figures) of about the same size. The resulting module size was about
40 circuits of logic, which increases to 60 as the required drivers and
voters are added. Packaging all three channels on the same page results
in a packaging density of 180 circuits per page and 160 input-output
terminals.
Since the latter number is not compatible with a 98-pin connector,
a higher capacity connector must be designed or a replaceable module
design containing more than one connector per page must be derived.
The first solution is feasible but will result in an appreciable increase
per machine in the total number of circuits, voters, and interconnections.
The second solution seems feasible at this time after discussing
the problem with connector manufacturers. The use of lubricants also
assists in fabricating connectors with 200 pins. The third solution is
feasible; nevertheless, it does increase alignment problems.
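The page sizing figures quoted above can be checked with a short Python sketch. The variable names are hypothetical. The connector count at the end is an added illustration that assumes identical 98-pin connectors with every pin available for signals; the report does not state it.

```python
import math

logic_circuits_per_module = 40         # logic alone, per channel
circuits_with_drivers_and_voters = 60  # after drivers and voters are added
channels_per_page = 3                  # all three TMR channels on one page

circuits_per_page = circuits_with_drivers_and_voters * channels_per_page

io_terminals_per_page = 160            # figure quoted in the text
connector_pins = 98                    # Saturn-V page connector capacity

# Does a single Saturn-V style connector carry the page?
fits_single_connector = io_terminals_per_page <= connector_pins

# Assumption for illustration: identical 98-pin connectors, all pins usable.
connectors_needed = math.ceil(io_terminals_per_page / connector_pins)

print(circuits_per_page, fits_single_connector, connectors_needed)  # 180 False 2
```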
Since the interconnection limitation imposed by the 98-pin
capacity of the Saturn-V page connector appeared to be a very severe
constraint on the machine reorganization configuration, a quick survey
of the connection lubrication state-of-the-art was made to determine if
lubrication techniques might allow higher capacity connectors to be used.
The results of a continuing study at IBM of the characteristics of thin
film lubricants to reduce contact wear were reviewed. Consideration
was also given to the possible use of lubricants as protective coatings
for contact surfaces in adverse environments.
A lubrication study by the U.S. Army Electronics Laboratories and
Stanford Research Institute resulted in the recommendation of
octadecylamine-hydrochloride (ODA-HCl) for use with gold contact surfaces.
Tests at IBM have verified the excellent properties of this
lubricant. Octadecylamine-hydrochloride lubricant forms a very stable
and tenacious film on gold surfaces. These properties are probably
due to physical absorption of the lubricant by the gold surface, and
perhaps also due to electrostatic attraction between the lubricant and
the gold. The thin film does not affect the electrical resistance of the
gold contact while decreasing its coefficient of friction up to 75 percent.
The films are stable with time, contaminants, and hard vacuum. The
film maintained its lubricating properties and its low electrical resistance
characteristics after prolonged exposure of several weeks to
atmospheres containing sulphur dioxide, hydrogen sulfide, and water
vapor.
Since the test results on octadecylamine-hydrochloride were so
consistently encouraging, it is recommended for AES connector lubri-
cation applications, even if small capacity connectors are used. The
decrease in insertion forces which it apparently affords, however,
would seem to indicate that page connectors with capacities of at least
150 to 200 pins may well be feasible.
2.0 MACHINE ORGANIZATION
The machine organization of the Saturn-V computer and the
Apollo backup data adapter was examined to determine its applicability
to the critical phases of the mission as well as its applicability to inflight
maintenance during the noncritical phases of the mission. Considerable
study effort was then expended on modifying those areas of
the machine organization representing serious constraints on the
mission capabilities of the computer and data adapter. The reorganized
version of the computer system included major changes in the
oscillator, memories, power supplies, power and timing distributions,
and internal grounding. A TMR/simplex mode was developed which
incorporates certain automatic switching features and provides appreciable
increase in reliability over the basic TMR mode. A portion of
the organization study was directed at reducing the susceptibility of
the computer system to externally generated voltage transients.
2.1 TMR Characteristics
One of the primary objectives of this study was to determine the
feasibility of a triple modular redundancy (TMR) configuration as a solution
to the short-term reliability problem in AES missions. TMR is a
form of redundancy incorporating two-out-of-three voting as shown in
Figure 19. Even if one module fails (dotted lines), the outputs of all
three voters are correct. The TMR organization therefore possesses
the unique characteristic that component failures can be tolerated and
their disruptive effects on system performance masked automatically
by voting without the need for error detection, diagnosis, and repair.
This error masking occurs without interruption of the operational
program, a characteristic found in few other forms of redundancy.
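The masking property described above can be demonstrated with a minimal Python sketch. It models each channel output as a single logic level (0 or 1) and the voter as a two-out-of-three majority function; the function names are hypothetical and the sketch says nothing about the actual voter circuit.

```python
def vote(a: int, b: int, c: int) -> int:
    """Two-out-of-three majority of three logic levels (each 0 or 1)."""
    return (a & b) | (a & c) | (b & c)


def masks_every_single_failure() -> bool:
    """Check that any one failed channel is outvoted, for either logic value."""
    for correct in (0, 1):
        for failed_channel in range(3):
            for wrong_value in (0, 1):
                channels = [correct, correct, correct]
                channels[failed_channel] = wrong_value
                if vote(*channels) != correct:
                    return False
    return True


print(masks_every_single_failure())  # True
```

The exhaustive loop covers both stuck-at-zero and stuck-at-one failures on each of the three channels; in every case the two good channels determine the voter output.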
The reliability models for the TMR portions of the computer
configurations examined during this study were based on the following
analysis of the reliability states of a TMR module. A TMR module
is defined as a section of the instrumentation isolated from other sec-
tions by voters. In the computer configurations of this study, this
reliability module corresponds generally to the physical modules since
most of the voters were used at the physical interfaces.
Figure 19. TMR Voting
[Diagram not reproduced. Modules 1, 2, and 3.]
2.1.1 Reliability States
The primary reliability states of a TMR module are shown in
Table 3. State 1 represents the condition of all channels operating.
State 2 represents the condition of two channels operating and one
failed. The coefficient 3 indicates that there are three ways the module
can be in State 2: channel 1 or channel 2 or channel 3 failed. State
3 represents the condition of one channel operating and two failed, and
State 4 represents the condition of all three channels failed.
States 1 and 2 are operating states and state 4 is a failed state.
State 3, however, can be operating or failed depending on whether the
failures in the two failed channels are in the opposite or in the same
logic direction, respectively. If one channel is failed to a logic "zero"
and the second channel failed to a logic "one", for example, the third
channel dominates the voting and the system will continue to operate
correctly.
TABLE 3 -- Reliability States for TMR Modules

State   Operating                     Failed
1       $R_c^3$
2       $3 R_c^2 (1 - R_c)$
3       $3 P(o) R_c (1 - R_c)^2$      $3 P(s) R_c (1 - R_c)^2$
4                                     $(1 - R_c)^3$

State 1 - All modules operating
      2 - One module failed
      3 - Two modules failed
      4 - Three modules failed
P(o) - Probability that the two failures are in the
       opposite logic direction
P(s) - Probability that the two failures are in the
       same logic direction
$R_c$ - Reliability of one channel of the TMR module.
2.1.2 Basic TMR Reliability
The basic reliability of a TMR module is derived by adding the
probabilities of the operating states. From Table 3, the reliability
of the TMR module is:
$R_{TM} = R_c^3 + 3 R_c^2 (1 - R_c) + 3 P(o) R_c (1 - R_c)^2$
where $R_c$ is the reliability of one channel of the module and P(o) is the
conditional probability that, if two failures occur, they occur in
opposite logic directions and their votes therefore cancel.
Module reliability is plotted against channel (or simplex) reliability
in Figure 20 for three values of P(o). Note that TMR redundancy
actually provides less reliability than simplex if the reliability
of the simplex channel is less than 0.5 and if it is assumed that all
related logic failures occur in the same logical direction. This
conservative assumption is usually made in estimating the reliability of
actual TMR systems such as the Saturn-V computer and data adapter.
The assumption of equal probability of failure to "one" or "zero"
states would be applicable to a system constructed of symmetrical
double-line-transfer logic, but such systems rarely exist in practice
since circuit minimization requirements normally dictate an appreciable
amount of unsymmetrical single-line-transfer logic.
Figure 20. TMR Versus Simplex Reliability
[Plot not reproduced. Legend: $R_{TM} = R_c^3 + 3 R_c^2 (1 - R_c) + 3 P(o) R_c (1 - R_c)^2$, where $R_{TM}$ is the TMR module reliability, $R_c$ is the simplex (channel) module reliability, and P(o) is the probability that two failures will occur in opposite logical directions.]
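The behavior plotted in Figure 20 can be reproduced from the reliability expression of Section 2.1.2. The Python sketch below is illustrative; the function name is hypothetical, and the sample values of channel reliability were chosen only to bracket the 0.5 crossover mentioned in the text.

```python
def tmr_reliability(rc: float, p_opposite: float = 0.0) -> float:
    """TMR module reliability: sum of the operating-state probabilities.

    rc: reliability of one channel (simplex reliability).
    p_opposite: P(o), the probability that two failures fall in opposite
        logic directions. The default of 0.0 is the conservative assumption
        that all failures occur in the same logical direction.
    """
    return (rc ** 3
            + 3 * rc ** 2 * (1 - rc)
            + 3 * p_opposite * rc * (1 - rc) ** 2)


# Compare TMR with simplex under the conservative assumption P(o) = 0.
for rc in (0.4, 0.5, 0.9):
    rtm = tmr_reliability(rc)
    if abs(rtm - rc) < 1e-12:
        verdict = "equal to simplex"
    elif rtm > rc:
        verdict = "better than simplex"
    else:
        verdict = "worse than simplex"
    print(f"Rc = {rc}: R_TM = {rtm:.4f} ({verdict})")
```

With P(o) = 0 the expression reduces to $3 R_c^2 - 2 R_c^3$, which equals $R_c$ at exactly 0.5; below that point the redundant module is less reliable than a single channel, as the text notes.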
2.1.3 Intermittent Masking
In the case of intermittent failures in a TMR module, the effects
of the failure are voted out as in the case of solid failures, but when
the period of the intermittent ends, the module automatically recovers
its original reliability state. This automatic recovery is a unique
characteristic of TMR system organizations and a few isolated subsystem
units such as the duplex Saturn-V memories. Flight failures
on present and past programs tend to be mostly intermittent in nature,
probably because of the much greater efficiency of present
checkout methods for solid faults than for intermittents, resulting in
most of the solid faults being corrected before flight. In fact, as the
level of testing progresses from preassembly screening of modules
to preflight checkout of computer systems, the ratio of intermittent
to solid failures found during test apparently increases monotonically
with time and usage.
Data relating to failures detected in past computers from acceptance
testing through end-use indicate that over 30 percent of the
expected AES computer failures would be masked and would not degrade
reliability. Although some rough calculations of the increase
in reliability estimates due to consideration of the failure mechanisms
of intermittents in TMR organizations were made in the preproposal
study, this item was not pursued further during the study. All failures
were assumed solid, and the reliability estimates are therefore
conservative.
2.1.4 Channel/Module Switching
Channel and module switching capabilities are provided in the
Saturn-V computer and force the computer to operate in a simplex
mode. Channel switching forces the computer to operate on any one
of the three simplex channels while module switching allows mixed
channel operation. In either case, the operating channel is selected
by setting one voter input to a logical zero and a second voter input to
a logical one so that their "votes" cancel and the third input determines
the voter output. Channel switching is shown in Figure 21 and module
switching is shown in Figure 22. The heavy lines indicate the selected
data paths in both figures.
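The selection mechanism described above, in which two voter inputs are forced to opposite logic levels so that the third decides, can be illustrated with a minimal Python sketch. The voter is modeled as a plain two-out-of-three majority function and the function names are hypothetical; the sketch does not represent the current-summing circuit itself.

```python
def vote(a: int, b: int, c: int) -> int:
    """Two-out-of-three majority of three logic levels (each 0 or 1)."""
    return (a & b) | (a & c) | (b & c)


def select_channel(channel_values: list, selected: int) -> int:
    """Voter output when the two unselected inputs are forced to 0 and 1.

    channel_values: the three channel outputs (each 0 or 1).
    selected: index (0, 1, or 2) of the channel chosen to operate.
    """
    inputs = list(channel_values)
    first_other, second_other = [i for i in range(3) if i != selected]
    inputs[first_other] = 0   # forced logical zero
    inputs[second_other] = 1  # forced logical one; the two "votes" cancel
    return vote(*inputs)


# The selected channel decides the output regardless of the other two.
print(select_channel([1, 0, 0], selected=0))  # 1
print(select_channel([0, 1, 1], selected=0))  # 0
```

Which of the two unselected inputs is forced to zero and which to one does not matter, since the majority function is symmetric in its inputs.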
The Saturn-V voter is shown logically in Figure 23. Normally
inputs A1, A2, and A3 and outputs CH1, CH2, and CH3 are all alike
(all zeros or all ones). Points A, B, and C (inputs to the logic elements
driving the voter) are connected to +6 volts, and points D, E,
and F are connected to +12 volts. To select channel 1 (CH1), inputs
A2 and A3 must be set to a logical one, input E to a zero, and input F
to a one. Inputs E and F could be reversed. Outputs CH1, CH2, and
CH3 now correspond to input A1 because the thresholds of the current
summers are set for two units of current; A3 supplies one unit, A2
supplies none, and the state of A1 therefore determines whether the
current threshold of the voter is reached or not.
Figure 21. Channel Switching
[Diagram not reproduced. Modules 1, 2, and 3 with forced zero inputs and signal paths.]
Figure 22. Module Switching
[Diagram not reproduced. Modules 1, 2, and 3 with signal paths.]
2.2 Trade-off Criteria
Any system optimization effort will involve trade-offs among
mutually conflicting parameters or criteria. The machine organization
trade-offs of this study involved considerations of reliability, error
detection and fault isolation, module replaceability and sparing, machine
size and complexity, and susceptibility to transients. The last criterion
dictated reinstrumentation of simplex and duplex components of the
Saturn-V computer and Apollo backup data adapter to TMR (with special
consideration for memory protection) and did not conflict with the
other trade-off criteria.
Figure 23. Saturn-V Voter
[Diagram not reproduced. Three voters, each with a current summer, producing outputs CH1, CH2, and CH3.]
Reliability, maintenance, and size criteria were mutually con-
flicting. Reliability maximization dictated that the computer system
be partitioned into small modules at a logic level where the reliability
of the circuits being voted is equal to the reliability of the voter. The
requirement for automatic failure isolation, however, dictated that the
computer system be partitioned into functional modules (such as
arithmetic or timing). The minimization of circuits and interconnections
could be achieved only with a completely unmodularized computer
system.
After some consideration of these conflicting requirements, it
was decided to ignore reliability as a trade-off criterion until
partitioning of the machine was completed on the basis of
maintainability and size trade-offs, and then to test the reconfigured
machine to determine whether the reliability requirements of the
AES-EPO mission could be satisfied with that configuration. In
addition, preliminary examination of the relationship between machine
size (number of components or circuits) and modularization level
showed that relatively little increase in size occurred as the machine
was partitioned into larger numbers of modules up to about ten, beyond
which the size increased very rapidly. The optimization studies of
machine partitioning therefore were based on maintainability criteria
alone (since these criteria dictated less than ten modules).
Error detection and fault isolation dictated a functional partition-
ing of the computer system which also satisfied the module replace-
ability and sparing requirements.
2.3 Basic Subsystem Configuration
The basic system upon which the study was based consisted of
the TMR Saturn-V computer and a redundant version of the Apollo
backup data adapter. This basic system was examined to determine
to what extent it could meet the functional and availability requirements
of the 90-day AES-EPO mission and where redesign was necessary.
Special attention was given to the reliability and failure isolation
capabilities of the basic computer and data adapter.
2.3.1 Saturn V Computer Description
The computer information flow is illustrated in Figure 24. This
simplified block diagram depicts the major data flow paths and
associated register level logic. The timing logic and input/output
(I/O) section are not shown.
Figure 24. Saturn-V Guidance Computer (block diagram: memory buffer
and transfer register, parity generator, memory address register and
address decode, memory modules, add-subtract and multiply-divide
elements, delay-line registers, and timing of 14 bit times by 3 phase
times)
The computer is a serial, fixed-point, stored-program, general-
purpose machine which processes data using "two's complement"
arithmetic. Two's complement arithmetic obviates the recomplementation
cycle required when using "sign plus magnitude" arithmetic.
Special algorithms have been developed and implemented for
multiplication and division of two's complement numbers. Multiplication
is done four bits at a time and division two bits at a time.
A random-access magnetic core memory is used as the computer
storage unit. A serial data rate of 512 kilobits per second is
maintained by operating the memory units in a "serial-by-byte,
parallel-by-bit" operating mode. This allows the memory to work
with a serial arithmetic unit. The parallel read-write word length of
14 bits includes one parity bit to allow checking of the memory
operations.
Storage external to the memory is located predominantly in the
shift register area. High reliability in this area is achieved by using
glass delay lines for arithmetic registers and counters.
Each instruction is comprised of a 4-bit operation code and a
9-bit operand address. The 9-bit address allows 512 locations to be
directly addressed. The total memory is divided into sectors of 256
words, and contains a residual memory of 256 words. The 9-bit
address specifies a location in either the previously selected sector
(data sector latches) or in the residual memory. If the operand
address bit (A9) is a binary "0", then the data will come from the sector
specified by the sector register; if A9 is a "1", the data will come from
residual memory.
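The addressing rule above can be illustrated with a short Python sketch. The function name and return format are hypothetical. The sketch assumes that A9 is the most significant of the nine address bits and that the remaining eight bits form the offset within a 256-word region; the text does not state the bit order, although the reserved residual locations cited later in this description (octal 400, 776, and 777) are consistent with that reading.

```python
SECTOR_WORDS = 256  # words per sector; residual memory is the same size


def decode_operand_address(address, data_sector):
    """Split a 9-bit operand address into (region, sector, offset).

    Assumes A9 is the high-order bit of the 9-bit address (not stated
    in the text). `data_sector` stands for the sector register contents.
    """
    if not 0 <= address < 2 * SECTOR_WORDS:
        raise ValueError("operand address must fit in 9 bits")
    a9 = (address >> 8) & 1
    offset = address & 0xFF
    if a9 == 0:
        return ("sector", data_sector, offset)
    return ("residual", None, offset)
```

For example, under these assumptions octal address 377 is the last word of the selected sector, and octal address 400 is the first word of residual memory regardless of the sector register contents.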
Instructions are addressed from an 8-bit instruction counter
augmented by a 4-bit instruction sector register. Sector memory
selection is changed by special instructions which change the contents
of the sector register. Sector size is large enough so that this is not
a frequent operation.
Data words consist of 26 bits. Instruction words consist of 13
bits and are stored in memory two instructions per data word. Hence,
instructions are described as being stored in syllable 1 or syllable 2
of a memory word. Two additional bits are used in the memory to
provide parity checking for each of the two syllables.
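The word layout described above can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, the placement of the operation code in the high-order bits and of the parity bit in the low-order position are assumptions, and odd parity is used because the memory description in this section refers to an odd parity bit.

```python
def odd_parity_bit(value):
    """Return the bit that makes the total number of ones odd."""
    return 0 if bin(value).count("1") % 2 == 1 else 1


def encode_instruction(opcode, operand_address):
    """Pack a 4-bit op code and a 9-bit address into a 13-bit syllable.
    Placing the op code in the high-order bits is an assumption."""
    if not (0 <= opcode < 16 and 0 <= operand_address < 512):
        raise ValueError("op code is 4 bits, operand address is 9 bits")
    return (opcode << 9) | operand_address


def to_memory_syllable(syllable13):
    """Append a parity bit, giving the 14-bit word held in memory."""
    if not 0 <= syllable13 < (1 << 13):
        raise ValueError("syllable must fit in 13 bits")
    return (syllable13 << 1) | odd_parity_bit(syllable13)


def parity_ok(word14):
    """True if the stored 14-bit word contains an odd number of ones."""
    return bin(word14).count("1") % 2 == 1


def split_data_word(word26):
    """Split a 26-bit data word into two 14-bit memory syllables.
    Which half corresponds to syllable 1 is an assumption."""
    high = (word26 >> 13) & 0x1FFF
    low = word26 & 0x1FFF
    return to_memory_syllable(high), to_memory_syllable(low)
```

A single flipped bit in a stored syllable changes the count of ones from odd to even, which is what the parity check detects.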
The computer is programmed by means of single-address
instructions. Each instruction specifies an operation and an operand
address. Instructions are addressed sequentially from the memory
under control of the instruction counter. Each time the instruction
counter is used, it is incremented by one to develop the address of
the next instruction. After the instruction is read from memory and
parity checked, the operation code is sent from the transfer register
to the OP code register, a static register which stores the
operation code for the duration of the execution cycle.
The operand address portion of the instruction is transferred in
parallel (9 bits) from the transfer register (TR) to the memory address
register. The TR is then cleared.
If the operation code requires reading the memory, the contents
of the operand address are read 14 bits at a time (including parity)
from the memory into the buffer register where a parity check is
made. Data bits are then sent in parallel to the TR. This information
is then serially transferred to the arithmetic section of the computer.
If the operation code is a store (STO), the contents of the accumulator
are transferred serially into the TR and stored in two 14-bit bytes. A
parity bit is generated for each byte.
Upon completion of the arithmetic operation, the contents of the
instruction counter are transferred serially into the TR. This infor-
mation is then transferred in parallel (just as the operand address had
previously been transferred) into the memory address register. The
TR is then cleared and the next instruction is read, thus completing
one computer cycle.
The data word is read from the memory address specified by
the memory address register and from the sector specified by the
sector register. Data from the memory go directly to the arithmetic
section of the computer where it is operated on as directed by the OP
code.
The arithmetic section contains an add-subtract element, a
multiply-divide element, and storage registers for the operands.
Registers are required for the accumulator, product, quotient,
multiplicand, multiplier, positive remainder, and negative remainder.
The add-subtract and the multiply-divide elements operate independently
of each other. Therefore, they can be programmed to operate
concurrently if desired; i.e., the add-subtract element can do several short
operations while the multiply-divide element is in operation.
No dividend register is shown in Figure 24 because it is
considered to be the first remainder. The divisor is read from the
accumulator during the first cycle time and can be regenerated from the
two remainders on subsequent cycles. As indicated, both multiply and
divide require more time for execution than the rest of the computer
operations. A special counter is used to keep track of the
multiply-divide progress and stop the operation when completed. The
product-quotient (PQ) register has been assigned an address and is
addressable from the operand address of any instruction. The answer will
remain in the PQ register until another multiply-divide is initiated.
A limited program interrupt feature is provided to aid the I/O
processing. An external signal can interrupt the computer program
and cause a transfer to a subprogram. Interrupt occurs when the
instruction in progress is completed. The interrupt forces a HOP
constant to be retrieved from the reserved residual memory location
(octal address 400). The constant designates the start of the
subprogram. The instruction counter, sector and module registers, and
syllable latch can be stored in a reserved residual memory location by
programming a STO 776 or STO 777 as the first instruction in the
subprogram. Automatic storage of the accumulator and product-quotient
registers is not provided; this must be accomplished by the
subprogram. Protection against multiple interrupts and interrupts during
MPY, DIV, HOP, and EXM operations is provided.
The interrupt signal may be generated by a timed source. The
rate at which it is generated is controlled by changing the magnitude
of a number which is being continually summed. When the summed
number reaches a predetermined value, the interrupt signal is
generated. This is accomplished in the data adapter.
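The rate-control scheme just described behaves like a simple accumulator. The following Python sketch shows the idea under stated assumptions: the function name, the notion of a "tick", and the choice to carry the excess forward after each interrupt (rather than clearing the sum) are all illustrative and are not taken from the report.

```python
def interrupt_ticks(increment, threshold, ticks):
    """Return the tick numbers at which the timed interrupt would fire.

    A running sum grows by `increment` on every tick. When the sum
    reaches `threshold`, an interrupt is raised. Carrying the excess
    over to the next period is an assumption of this sketch.
    """
    fired = []
    total = 0
    for tick in range(1, ticks + 1):
        total += increment
        if total >= threshold:
            fired.append(tick)
            total -= threshold
    return fired
```

Doubling the number being summed halves the interval between interrupts, which is the sense in which the magnitude of the number controls the interrupt rate.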
The main program can be resumed by addressing the contents
of residual memory word 776 or 777 with a HOP instruction, after
restoring the accumulator and PQ register to their pre-interrupt
values.
Certain discrete input signals are allowed to cause interrupt.
These are useful in causing the I/O subprogram to give immediate
attention to an input or output operation.
The memory for the Saturn-V Guidance Computer uses
conventional toroidal cores in a unique self-correcting duplex system.
The memory unit consists of up to eight identical 4096-word memory
modules which may be operated in simplex for increased storage
capability or in duplex pairs for high reliability. The basic computer
program can be loaded into the instruction and constants sectors of
the memory at electronic speeds on the ground or just prior to launch.
Thereafter, the information content of constants and data can be
electrically altered but only under control of the computer program.
The self-correcting duplex system uses an odd parity bit with
detection schemes for malfunction indication and correction. In
conjunction with this scheme, error-detection circuitry is also used for
memory drive current monitoring. Unlike conventional toroid
random-access memories, the self-correcting extension of the basic duplex
approach permits regeneration of correct information after transients
or intermittent failures. Otherwise destructive read-out of the
memory could result.
The basic configuration consists of a pair of memories
providing storage for 8192 14-bit memory words for duplex operation, or
16,384 14-bit memory words for simplex operation. Each of the
simplex memories includes independent peripheral instrumentation
consisting of timing, control, address drivers, inhibit drivers, sense
amplifiers, error-detection circuitry, and I/O connections to facilitate
failure isolation. Computer functions which are common to these
simplex units consist of the following:
• Memory address register outputs
• Memory transfer register input-output
• Store gate command
• Read gate command
• Syllable control gates.
The computer functions, which are separate for each simplex
memory, consist of synchronizing gates which provide the serial data
rate of 512 kilobits per second. This data rate is required by the
computer to generate a start memory unit command at 128 kilobits
per second. These gates also provide the selection of multiple
simplex memory units for storage flexibility and permit partial or total
duplex operation throughout the mission profile to extend the
mean-time-before-failure for long mission times. Each of the simplex
units can operate independently of the others or in a duplex manner.
The memory modules are divided into two groups: one group
consisting of even numbered modules (0-6) and the other consisting of
odd numbered modules (1-7). There is a buffer register associated
with each group which is set by the selected modules.
For duplex operation, each memory is under control of
independent buffer registers when both memories are operating without
failure. Both memories are simultaneously read and updated, 14
bits in parallel. A single cycle is required for reading instructions
(13 bits plus 1 parity bit per instruction word). Two memory cycles
are required for reading and updating data (26 bits plus 2 parity bits).
The parallel outputs of the memory buffer registers are serialized at
a 512-kilobit rate by the memory transfer register under control of
the memory select logic. Initially, only one buffer register output
is used, but both buffer register outputs are simultaneously parity
checked in parallel. When an error is detected in the memory being
used, operation immediately transfers to the other memory. Both
memories are then regenerated by the buffer register of the "good"
memory, thus correcting transient errors. After the parity-checking
and error-detection circuits have verified that the erroneous memory
has been corrected, operation returns to the condition where each
memory is under control of its own buffer register. Operation is not
transferred to the previously erroneous memory until the "good"
memory develops its first error. Consequently, instantaneous
switching from one memory output to another permits uninterrupted
computer operation until simultaneous failures at the same storage
location in both memories cause complete system failure.
Proper operation of the memory system during read cycles is
indicated by each 14-bit word containing an odd number of one bits and
a logical "1" output of the error-detecting circuitry. If either or both
of these conditions are violated, operation is transferred to the other
memory. During regenerate or store cycles, since parity checking
cannot be performed, failure detection is accomplished by the
error-detection circuitry only and by parity detection during subsequent
read cycles. Intermittent addressing of memory between normal cycles is
also detected by the error-detecting circuitry producing a logical "1"
output.
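The read path of the self-correcting duplex memory can be modeled in a few lines of Python. This is a behavioral sketch only: the class and method names are hypothetical, the drive-current error-detection circuitry is omitted, and only the odd-parity check on each 14-bit word is modeled.

```python
def parity_ok(word14):
    """True if the 14-bit word contains an odd number of ones."""
    return bin(word14).count("1") % 2 == 1


class DuplexMemory:
    """Two identical memories; one supplies the output until it errs."""

    def __init__(self, size):
        # Word value 1 has odd parity, so it serves as a valid filler.
        self.banks = [[1] * size, [1] * size]
        self.active = 0  # index of the memory whose output is in use

    def store(self, address, word14):
        """Store cycle: both memories are updated together."""
        for bank in self.banks:
            bank[address] = word14

    def read(self, address):
        """Read cycle with parity check, switchover, and regeneration."""
        word = self.banks[self.active][address]
        if parity_ok(word):
            return word
        other = 1 - self.active
        good = self.banks[other][address]
        if not parity_ok(good):
            raise RuntimeError("both memories failed at the same location")
        self.active = other        # operation transfers to the other memory
        self.store(address, good)  # both memories regenerated from the good one
        return good
```

After a switchover the model keeps using the "good" memory, in line with the statement that operation does not return to the previously erroneous memory until the good one develops an error of its own.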
A summary of the computer characteristics is given in Table 4.
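As a cross-check on the Table 4 clock and speed figures, the short calculation below relates the clock, the information rate, and the operation times. It assumes that one computer cycle consists of 3 phase times of 14 bit times each, as the timing labels of Figure 24 suggest; that the multiply and divide times correspond to 4 and 8 such cycles is an inference from the numbers, not a statement in the report.

```python
CLOCK_HZ = 2.048e6         # clock frequency from Table 4
BIT_RATE = 512e3           # information rate from Table 4, bits per second
BIT_TIMES_PER_PHASE = 14   # assumed from the Figure 24 timing labels
PHASES_PER_CYCLE = 3       # assumed from the Figure 24 timing labels

# Clock periods available for each serial bit.
clocks_per_bit = CLOCK_HZ / BIT_RATE

# Duration of one computer cycle in microseconds.
cycle_us = BIT_TIMES_PER_PHASE * PHASES_PER_CYCLE / BIT_RATE * 1e6

add_us = cycle_us           # one cycle (Table 4 lists 82 microseconds)
multiply_us = 4 * cycle_us  # inferred: four cycles (Table 4 lists 328)
divide_us = 8 * cycle_us    # inferred: eight cycles (Table 4 lists 656)

print(clocks_per_bit, round(add_us), round(multiply_us), round(divide_us))
```

Under these assumptions the computed values round to the add, multiply, and divide times listed in Table 4.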
2.3.2 Apollo Backup Data Adapter Description
The Apollo data adapter contains a high speed I/O processor and
the input and output circuitry and logic necessary to connect the
central processor unit and the I/O processor with the rest of the guidance
and navigation equipment. It also contains a data exchange register
(DER), which buffers and translates data flowing between the central
processor and I/O processor or between the central processor and
external subsystem interfaces. In general, the central processor
directly controls discrete inputs and outputs in the data adapter. The
I/O processor stores and generates pulse train inputs and outputs and
provides the necessary communication link between these signals and
the central processor. It also provides discrete outputs for controlling
the spacecraft reaction control system (RCS) jets.

TABLE 4 - TMR Computer Characteristics
Function           Description
Type               General purpose, stored program, serial, fixed
                   point binary.
Clock              2.048 Mc clock, 512 kilobits per second
                   information rate.
Speed              Add-subtract and multiply-divide simultaneously.
                   Add       82 µs
                   Multiply  328 µs
                   Divide    656 µs
Memory Type        Toroidal magnetic core, random access.
Storage Capacity   Interconnections provide for using up to eight
                   memory modules having 4096 28-bit words.
Input/Output       External; computer-programmed I/O control.
                   External interrupt provided.
Component Count    40,000 silicon semiconductors and cermet
                   resistors.
Reliability        0.996 probability of success for 250 hours; TMR
                   logic and duplex memories employed.
Packaging          73 electronic page assemblies.
Weight             78.5 pounds.
Volume (Swept)     2.37 cubic feet.
Power              142 watts.
The data adapter is also required to:
• Accept and process interrupt signals to the central
processor and I/O processor from within the data adapter and
from other spacecraft subsystems
• Provide continuous timing signals for spacecraft
subsystems
• Convert Lunar Excursion Module (LEM) hand controller
signals from analog-to-digital form
• Provide regulated d-c power to both the central processor
and data adapter
• Provide regulated d-c excitation power for various space-
craft discrete inputs.
A block diagram of the data adapter is shown in Figure 25.
All functions within the data adapter are controlled directly by
the central processor and the data adapter timing is synchronous with
that of the central processor timing. The primary data adapter func-
tions are listed in Table 5.
The address generator decodes the nine operand address lines
from the central processor upon receipt of a central processor PIO in-
struction. The decoded address selects the register, I/O processor
memory location, or other circuitry in the data adapter that is to send
data to or receive data from the central processor.
Addresses are divided into four basic groups as determined by the
operand address bits A7 and A8. These groups are defined in Table 6.
The data adapter contains a register addressed by the central
processor for setting discretes which control internal functions. This
register is designed so that the state of any output may be changed
without momentarily or permanently affecting the state of any other
unrelated outputs.
TABLE 5 -- Data Adapter Characteristics
Item      Function          Description
Inputs    Discrete          73
          Pulsed            33 (Serial and Incremental)
Outputs   Discrete          68
          Variable Pulsed   43 (Serial, Incremental, Discrete)
          Fixed Pulsed      10
Modules   Output Counter    11 (Including Registers and Control),
                            Gyro and Radar Counter Logic
          Input Counter     11 Counters, Multiplexer, Hand
                            Control Logic, Boot Strap Loader
          Time Counter      9 Counters, Pulse Timing
          Data Flow         Data Exchange Register, Logic,
                            Multiplexer
          Control           4 Discrete Output Registers, Address
                            Decode, Controls
          Processor         Load Register, Down Link Register
                            and Control, Interrupt Register
          Input/Output      Simplex Drivers
TABLE 6 -- Address Groups
Group   A7   A8   Data Transfer Functions
1       1    0    Central processor accumulator to data
                  adapter registers
2       0    0    Central processor accumulator to I/O
                  processor memory
3       1    1    Data adapter inputs to central
                  processor accumulator
4       0    1    I/O processor memory to central
                  processor accumulator
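The grouping in Table 6 amounts to a two-bit decode, which can be sketched in Python as follows. The names are hypothetical, and the extraction of A7 and A8 from a 9-bit operand address assumes that A1 is the low-order bit, which the report does not state.

```python
# (A7, A8) -> (group number, data transfer function), copied from Table 6.
ADDRESS_GROUPS = {
    (1, 0): (1, "central processor accumulator to data adapter registers"),
    (0, 0): (2, "central processor accumulator to I/O processor memory"),
    (1, 1): (3, "data adapter inputs to central processor accumulator"),
    (0, 1): (4, "I/O processor memory to central processor accumulator"),
}


def group_from_bits(a7, a8):
    """Look up the Table 6 address group for the given A7 and A8 bits."""
    return ADDRESS_GROUPS[(a7, a8)]


def group_from_address(address):
    """Decode the group from a 9-bit PIO operand address.
    Assumes A1 is the least significant bit (an assumption)."""
    a7 = (address >> 6) & 1
    a8 = (address >> 7) & 1
    return group_from_bits(a7, a8)
```

Read this way, A8 selects the direction of transfer (0 for data leaving the accumulator, 1 for data entering it) and A7 distinguishes data adapter registers and inputs from I/O processor memory.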
Figure 25. Data Adapter Block Diagram
The data adapter contains three output registers controlled by
the central processor and ranging from 7 to 15 bits capacity. These
registers are used to supply miscellaneous discrete signals to systems
in the spacecraft and also for internal control within the data adapter.
These registers are designed so that the state of any output may be
changed under computer control without momentarily or permanently
affecting the state of the other outputs.
To load a one in any bit position of one of these registers, the
set address is used. The central processor data word must contain
zeros in all bit positions except those to be set, which must contain
ones. To load a zero in any bit position, the reset address must be
used. The computer data word must contain zeros in all bit positions
except those to be reset, which must contain ones.
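The set and reset addressing convention can be expressed as two bit-mask operations. The Python sketch below is a minimal model with hypothetical class and method names; it shows why a single output can be changed without disturbing the others.

```python
class DiscreteOutputRegister:
    """Model of an output register loaded through set and reset addresses."""

    def __init__(self, width):
        self.mask = (1 << width) - 1
        self.state = 0

    def load_via_set_address(self, data_word):
        """Ones in the data word set the matching bits; zeros leave
        the other bits unchanged."""
        self.state |= data_word & self.mask

    def load_via_reset_address(self, data_word):
        """Ones in the data word reset the matching bits; zeros leave
        the other bits unchanged."""
        self.state &= ~data_word & self.mask
```

Because a zero in the data word never changes a bit in either operation, the program does not need to read back or remember the current register contents before changing one output.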
The data adapter contains a 15-bit output register addressed by
the central processor which is used to control display matrix relays
in the display and keyboard (DSKY). At the start of each PIO load
operation for this register, all bits are reset to the zero state auto-
matically, allowing a new word to be loaded immediately thereafter.
The data adapter is capable of accepting 67 discrete inputs from
spacecraft subsystems and four internally generated discrete signals.
Except for the internally generated discretes, no storage is provided
for discretes within the data adapter. Groups of these discretes are
treated as words by the central processor. Each channel is addressed
and read into the central processor by a PIO operation.
The data adapter contains a 10-bit register for storing signals
required to interrupt the central processor. Some of these signals
are generated internally; the others are caused by critical discrete
inputs.
Upon receipt of one of these signals, it is stored in the register.
Register outputs are "OR'ed" together so that any input signal will
cause the central processor interrupt signal to be turned on. This sig-
nal will cause an interruption to occur when the instruction in process
is completed.
At the start of an interrupt subroutine, the central processor will
read the contents of the interrupt register. It will then process, in the
order of highest assigned priority, any interrupt subroutines called
for by the presence of ones in the interrupt word. Upon completion of
an interrupt subroutine, the central processor will address the
interrupt register and reset the register position causing the interrupt just
processed, as explained in the following paragraph. In the case of bit
position 10 (switch closure) and keyboard inputs, further interrupts
will not be recognized until the input stimulus has cycled down and back
up after the original request. This prevents repeated processing of
interrupt subroutines whose input signals last longer than the
subroutines.
Certain interrupt sources may be inhibited or "trapped" under
program control by the central processor. These are signals which
would cause interrupts at undesired times during a mission if no
protection against this were provided. To trap an interrupt, the Set
Central Processor Interrupt Trap Address is used with a one in its
related accumulator data bit position. Zeros are placed in the other
positions of the data word. To remove the trap or to reset an
interrupt register bit position after processing it, the Reset Central
Processor Interrupt Trap Address is used. Accumulator data for
untrapping or resetting is the same as for trapping.
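The interrupt register behavior described above can be sketched as a small Python model. The class and method names are hypothetical. The sketch assumes that a trapped source is simply prevented from being stored in the register, and that the reset address clears both the trap and the stored bit for every position marked with a one; the report does not spell out either detail.

```python
class InterruptRegister:
    """Model of the 10-bit central processor interrupt register."""

    def __init__(self, width=10):
        self.mask = (1 << width) - 1
        self.pending = 0   # stored interrupt signals
        self.trapped = 0   # sources inhibited under program control

    def latch(self, bit_position):
        """Store an incoming signal (bit positions count from 1).
        A trapped source is ignored (assumed behavior)."""
        flag = 1 << (bit_position - 1)
        if not self.trapped & flag:
            self.pending |= flag & self.mask

    def interrupt_line(self):
        """OR of all register outputs: true if any bit is stored."""
        return self.pending != 0

    def set_trap(self, data_word):
        """Set Interrupt Trap Address: ones mark the sources to inhibit."""
        self.trapped |= data_word & self.mask

    def reset(self, data_word):
        """Reset Interrupt Trap Address: ones remove the trap and
        clear the stored bit for those positions (assumed to do both)."""
        self.trapped &= ~data_word & self.mask
        self.pending &= ~data_word & self.mask
```

The same data word format serves for trapping, untrapping, and resetting, as the text notes; only the address used differs.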
The data adapter contains a 15-bit multipurpose shift register
called the data exchange register (DER). This register performs the
following functions:
• Accepts serial 512-kc/s data from the central processor
accumulator and buffers and transfers it in parallel to the
I/O processor memory drivers, the internal control
register, or discrete output registers.
• Accepts parallel data from the I/O processor memory
sense amplifiers, discrete input channels, or the central
processor interrupt register and transfers it serially at
512 kc/s to the central processor accumulator.
The central processor accumulator data transferred to or from
the DER is referenced to the sign and 14 high-order bit positions in
the accumulator.
Some of the vehicle subsystems require timing signals from the
Apollo backup computer. These signals are derived from the data
adapter timing generator and are required continuously during all
phases of the mission. Therefore, they must be available during the
standby mode of operation.
Certain signals are continuously monitored for malfunctions.
Upon detecting a malfunction, the computer system generates a
discrete signal to turn on a warning light either on the DSKY or the caution
and warning electronics panel. Some of these signals are generated
under program control; others are generated automatically by special
alarm detection circuits.
The data adapter can accept and generate pulse train inputs and
outputs for other spacecraft subsystems. It generates and stores the
current value of real time required for navigation and control functions.
It also generates program controlled, timed interrupt signals which
are flags for branching into different subroutines. These functions are
performed by counters and associated logic. Thirty-two counters are
required.
There are 15 input counters serving various input functions.
Input pulses occur at varying and unpredictable intervals in several of
the channels and are therefore buffered, one pulse per channel, to
await sampling in the processor. The buffer positions are continuously
sampled consecutively at 1-clock-pulse intervals to determine whether
an input is present. A 5-bit grey code counter is used to step from
input to input simultaneously with normal program cycles in the
processor. Upon detecting an input pulse requiring storage, the grey code
counter is stopped, and its value is transferred into the five low-order
positions of the processor operand address register. The high-order
bits are forced to zeros. This forms the memory address for that
input counter. A memory-steal cycle is then executed incrementing or
shifting the pulse into its memory location. The input buffer for that
pulse is then reset to await the arrival of another input pulse.
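The scanning counter can be illustrated with the standard reflected binary Gray code, in which successive states differ in exactly one bit. The report says only that a "grey code" counter is used; the particular code, and the function names below, are assumptions made for illustration.

```python
def gray_encode(n):
    """Reflected binary Gray code of the integer n (assumed code)."""
    return n ^ (n >> 1)


def gray_sequence(bits):
    """All states of a Gray code counter with the given number of bits."""
    return [gray_encode(n) for n in range(1 << bits)]


def counter_memory_address(gray_value):
    """Form the counter's memory address: the counter value fills the
    five low-order address positions and the high-order bits are zero."""
    return gray_value & 0b11111
```

A single-bit change between consecutive states means the address lines never pass through a spurious intermediate value while the counter steps from one input buffer to the next, which is a common reason for choosing such a code.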
There are 17 output counters serving various internal timing
and output functions. When output pulses from any counter location are
required, they occur at a fixed rate of 3200 pps. This means that a
pulse occurs every 312 microseconds from the counter involved. It is
convenient to service these counters using the same grey code address
counter used for T8 and the input counters. For these latter counters,
the grey code counter steps through 16 states and repeats. This
operation is continuous except that every 312 microseconds an internally
generated timing pulse occurs causing the address counter to extend
its count from 16 to 32 to service the 16 other output counters. Each
output counter has a buffer latch which is set or reset during the
previous memory-steal cycle associated with the counter. It is also set by
the central processor PIO operation used to initially load data in the
output counter memory location. The added 16 counts from the address
counter are used to sample the state of these buffer latches in the same
manner as for the input pulses. The address counter also forms the
memory address for memory-steal operations as described for the
input counters. Any buffer latches that are set when addressed require
output pulses to be generated (one per channel per address pulse).
When all 16 output counters have been sampled and processed as
required, the address counter will again process the inputs repeatedly
until interrupted by another 3200-pps pulse. This operation is repeated
every 312 microseconds until the counter memory locations have been
cleared, thus freeing them for loading more output data.
The downlink implementation provides for transmitting a one-bit
identifier (word order bit) followed by 32 information bits. This
information group will be two 15-bit accumulator words, each followed by a
separate parity bit.
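The downlink format adds up to 33 bits: one word order bit plus two groups of 15 data bits and 1 parity bit. The Python sketch below assembles such a frame. The function names are hypothetical, and two details are assumptions not given in the report: the parity sense (odd parity is used here) and the transmission order of the data bits (high-order bit first).

```python
def odd_parity_bit(value):
    """Return the bit that makes the total number of ones odd
    (the parity sense for the downlink is an assumption)."""
    return 0 if bin(value).count("1") % 2 == 1 else 1


def downlink_frame(word_order_bit, first_word, second_word):
    """Build the 33-bit downlink group as a list of bits.

    Layout: word order bit, then each 15-bit word followed by its own
    parity bit. High-order bit first is an assumption of this sketch.
    """
    bits = [word_order_bit & 1]
    for word in (first_word, second_word):
        if not 0 <= word < (1 << 15):
            raise ValueError("downlink words are 15 bits")
        bits.extend((word >> i) & 1 for i in range(14, -1, -1))
        bits.append(odd_parity_bit(word))
    return bits
```

Each 16-bit group (word plus parity) can then be checked independently at the receiving end.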
The downlink output is controlled by the number of
synchronization pulses received from the pulse code modulator (PCM) telemetry
and the manner in which the processor is programmed. Therefore,
the output can be one word, one word redundant, or two words,
depending upon how the downlink registers are loaded and the number of data
synchronization pulses gated into the data adapter. If only one word is
sent out, the second load PIO operation is still required to advance the
output data from the load register to the shift register. This PIO word
will contain all zeros.
The downlink operation sequence is initiated by an input pulse
(downlink end) which causes a processor program interrupt. During
the interrupt subroutine, the processor performs a PIO operation to set
the downlink word order bit to a one or zero as required. The
processor then loads the first downlink word, which goes into the load
register. This operation is followed by another PIO to transfer the first
downlink word to the shift register and load the second downlink word,
if required, into the load register. This completes the processor
interrupt subroutine. The data remains in storage registers until the pulse
coded modulation (PCM) telemetry sends the next series of control and
synchronization pulses required to send the data to the PCM equipment.
When the first 15-bit word has been transferred out of the shift
register, a five-stage counter in the downlink timing and control logic causes
a parity bit to be generated for that word. It also causes the second
data word to be transferred from the load register to the shift register.
When this second word has been transferred to the PCM telemetry, its
parity bit is also generated and sent out.
There are two downlink data word transfer rates: 51.2 kpps and
1.6 kpps. The rate used is determined by the rate of the synchroniza-
tion pulses received from the PCM equipment.
The data adapter contains the system power supplies. These
supplies furnish normal power for the central processor and data
adapter and standby mode power when normal computer operations are
not required. Input power to the supplies is obtained from duplex
redundant 28-volt d-c vehicle power sources. Power sequencing and
input power transient protection are provided to prevent loss of data
stored in the central processor memories.
In certain phases of the mission there may be relatively long
periods when it will not be necessary for the Apollo computer to per-
form any calculations. The only use for the computer during these
periods is to keep track of real time and provide continuous timing
signals for other vehicle subsystems. Since these are only minor
functions, a standby power supply is provided to furnish power only for the
logic associated with these functions.
During standby operation the oscillator and clock logic of the
central processor is used to drive a group of clock drivers which are
associated only with the standby logic.
Real time is derived from the pulse timing logic, the lowest fre-
quency of which is 800 cps. This 800-cps signal is used as an input to
counter T2. The overflow of this counter is used to increment counter
T1. The two counters form the low- and high-order bits of real time,
respectively. It is, therefore, possible to record time to
$(1.25 \times 10^{-3})(2^{30})$ seconds, or approximately 15 1/2 days.
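The range quoted above can be checked with a short calculation. The following is a minimal sketch; the constant and function names are illustrative, and only the 800-cps rate and the 2^30 count capacity are taken from the text.

```python
# Illustrative names; only the 800-cps rate and the 2**30 capacity come from the text.
CLOCK_HZ = 800            # lowest pulse-timing frequency, input to counter T2
TOTAL_COUNTS = 2 ** 30    # combined capacity of counters T2 and T1

def real_time_span_seconds(clock_hz=CLOCK_HZ, counts=TOTAL_COUNTS):
    """Longest interval the counter pair can record before it wraps around."""
    return counts / clock_hz

resolution_s = 1 / CLOCK_HZ                 # one count = 1.25 ms
span_s = real_time_span_seconds()
span_days = span_s / 86_400                 # 86,400 seconds per day
print(f"resolution = {resolution_s * 1e3:.2f} ms, span = {span_days:.2f} days")
```

The result is about 15.5 days, in agreement with the figure given above.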
Time recording during standby requires the operation of the pulse
timing logic, counter T1, counter T2, add/subtract logic, and proces-
sor memory. Memory must operate because the contents of counters
T1 and T2 are stored in memory locations.
Standby power is established by request of the operator. An in-
terlock system was designed to prevent the operator from inadvertently
requesting the standby mode. The first step required to initiate standby
is to insert the proper key code. When the central processor recognizes
this code, it prepares itself and the discrete outputs for shutdown.
After this is completed, the central processor sets the discrete output
in channel 13, bit 10. This signal is an enable signal which energizes
the standby switch on the DSKY. Depressing the standby switch in con-
junction with the enable signal causes the power supply to start turning
off main power.
2.3.3 Reliability Models
A reliability model for the basic computer subsystem consisting
of the Saturn-V computer and a redundant version of the Apollo backup
data adapter was derived and used as a basic reference with which to
compare the reliability characteristics of the reorganized computer
and data adapter.
The model includes "equivalent time" factors which modify real
time to include the effects of environmental severity factors and non-
operating failure rates. The equivalent time period of the most critical
mission phase is defined as:
$$T_c = \max_{i = 1, \ldots, m} (T_i K_i),$$
where $T_i$ is the actual time period of the ith phase, m is the number of
critical mission phases, and $K_i$ is the environmental severity factor
for the ith phase. The simplex reliability model for the most critical
mission phase is defined as
$$R_c = \exp(-\lambda_{op} T_c),$$
where $\lambda_{op}$ is the operating failure rate of the equipment. The
equivalent operating time and standby time are generated for each of the three
assumed duty cycles from the expressions
$$T_{E,op} = \sum_{i=1}^{n} T_{op,i} K_i$$
$$T_{E,\text{non-op}} = \sum_{i=1}^{n} T_{\text{non-op},i} K_i,$$
where $T_{E,op}$ is the equivalent mission operating time, $T_{E,\text{non-op}}$ is
the equivalent mission standby time, $T_{op,i}$ and $T_{\text{non-op},i}$ are the actual
operating and standby times, $K_i$ is the environmental severity factor for
the ith phase, and n is the number of mission phases. The simplex
reliability model for the total mission is defined as:
$$R_m = \exp[-(\lambda_{op} T_{E,op} + \lambda_{\text{non-op}} T_{E,\text{non-op}})],$$
where $\lambda_{\text{non-op}}$ is the standby failure rate of the equipment.
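The equivalent-time model can be illustrated with a short sketch. The function names, the mission profile, and the failure rates below are all hypothetical; only the form of the expressions is taken from the text.

```python
import math

def critical_equivalent_time(phases):
    """T_c: largest T_i * K_i over the critical phases, given (T_i, K_i) pairs."""
    return max(t * k for t, k in phases)

def equivalent_time(phases):
    """T_E: sum of T_i * K_i over all mission phases, given (T_i, K_i) pairs."""
    return sum(t * k for t, k in phases)

def simplex_reliability(lam_op, t_e_op, lam_non_op=0.0, t_e_non_op=0.0):
    """R = exp[-(lam_op * T_E,op + lam_non_op * T_E,non-op)]."""
    return math.exp(-(lam_op * t_e_op + lam_non_op * t_e_non_op))

# Hypothetical profile: (hours, severity factor) for each phase.
operating = [(10.0, 5.0), (200.0, 1.0)]
standby = [(0.0, 5.0), (800.0, 1.0)]
LAM_OP, LAM_NON_OP = 50e-6, 5e-6   # hypothetical failure rates per hour

t_c = critical_equivalent_time(operating[:1])   # first phase taken as critical
r_critical = simplex_reliability(LAM_OP, t_c)
r_mission = simplex_reliability(LAM_OP, equivalent_time(operating),
                                LAM_NON_OP, equivalent_time(standby))
print(f"T_c = {t_c} h, R_c = {r_critical:.6f}, R_m = {r_mission:.4f}")
```

Setting the standby failure rate to zero in this sketch corresponds to the zero standby failure rate cases considered in the reliability estimates.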
The computer and data adapter operate effectively in series for
reliability computations so the model for the computer subsystem is
simply
$$R_{ss} = R_{co} R_{da},$$
where $R_{co}$ is the computer reliability and $R_{da}$ is the data adapter
reliability.
The computer is composed of a simplex oscillator, TMR logic,
and duplex memories. These three elements operate effectively in
series so the reliability model for the computer is
$$R_{co} = R_{osc} R_{log} R_{mem}.$$
The oscillator for the Saturn-V computer is simplex so the reliability
model for this device is
$$R_{osc} = \exp(-\lambda_{osc} T).$$
The reliability of the computer logic (including timing) can be
expressed mathematically as
$$R_{log} = (R_{tg})^3 R_{TMR} + 3 (R_{tg})^2 (1 - R_{tg}) (R_{sim})^2,$$
where $R_{tg}$ is the reliability of the simplex timing generator, $R_{TMR}$ is
the reliability of the TMR logic, and $R_{sim}$ is the reliability of one
channel of the TMR logic. The reliability model for the TMR logic was
defined as
$$R_{trio} = 3 (R_{mod})^2 - 2 (R_{mod})^3,$$
where $R_{mod}$ is the reliability of each simplex module of a TMR module
trio. For p independent trios
$$R_{TMR} = \prod_{i=1}^{p} (R_{trio})_i.$$
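The trio and logic models can be exercised with a brief sketch. The function names and numerical inputs are hypothetical. The factor of 3 on the second term of the logic model is an assumption here, taken to count the three ways in which exactly one timing generator can fail.

```python
import math

def trio_reliability(r_mod):
    """Two-out-of-three majority: R_trio = 3 R_mod^2 - 2 R_mod^3."""
    return 3 * r_mod**2 - 2 * r_mod**3

def tmr_reliability(r_mods):
    """Product of trio reliabilities over p independent trios."""
    return math.prod(trio_reliability(r) for r in r_mods)

def logic_reliability(r_tg, r_tmr, r_sim):
    """All three timing generators good, or exactly one failed (factor 3 assumed)
    with both remaining logic channels good."""
    return r_tg**3 * r_tmr + 3 * r_tg**2 * (1 - r_tg) * r_sim**2

# Hypothetical module reliabilities for a machine split into four trios.
modules = [0.999, 0.9995, 0.998, 0.9999]
r_tmr = tmr_reliability(modules)
r_sim = math.prod(modules)          # one simplex channel: all modules in series
print(f"R_TMR = {r_tmr:.8f}, R_log = {logic_reliability(0.99999, r_tmr, r_sim):.8f}")
```

A trio improves on a single module only when the module reliability exceeds 0.5, which is easily satisfied for the module reliabilities of interest here.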
The reliability of the duplex memory is represented by the model
$$R_{mem} = (R_{sm})^2,$$
where $R_{sm}$ is the reliability of each of the simplex memories. The
reliability of a simplex memory can be expressed mathematically as
$$R_{sm} = R_M + P_{ND} P_P (1 - R_M) R_M + P_D (1 - R_M) R_M,$$
where $R_M$ is the probability that the originally selected simplex
memory works for the entire mission, $P_{ND}$ is the probability that a
failure in the selected memory will not be detected by the error sensing
circuitry, $P_P$ is the probability that these undetected errors will be
detected by parity, and $P_D$ is the probability that a failure in the
selected memory will be detected by the error sensing circuitry.
The data adapter is composed primarily of the power supply and
the logic, and the reliability model is the serial combination
$$R_{da} = R_{ps} R_{dal},$$
where Rps is the reliability of the power supply and Rdal is the relia-
bility of the data adapter logic. The power supply model assumed for
the AES configuration is based on the Saturn-V duplex power supply
which is composed of six individual duplex supplies using duplex error
amplifiers. The reliability model for the AES power supply system is
then
$$R_{ps} = \prod_{i=1}^{p} R_{d_i},$$
where $R_{d_i}$ is the reliability of the ith duplex supply. $R_d$ can be
expressed mathematically as
$$R_d = R_s^2 R_I^2 + 2 R_s^2 R_I (1 - R_I) + 2 R_s R_I P_{sfl} (R_I + P_{ifl}),$$
where $R_s$ is the reliability of a simplex supply with duplex error
amplifier, $R_I$ is the reliability of a simplex isolation circuit, $P_{sfl}$ is the
probability that the simplex supply will fail low, and $P_{ifl}$ is the probability
that the isolation circuit will fail low. The reliability of the simplex
supply is
$$R_s = R_c R_a^2 + 2 R_c R_a P_{afl},$$
where $R_c$ is the reliability of the simplex DC/DC converter, $R_a$ is the
simplex reliability of the duplex error amplifiers, and $P_{afl}$ is the
probability that an error amplifier will fail low. For supplies not containing
an isolation circuit
$$R_d = R_s^2 + 2 R_s P_{sfl}.$$
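The power supply expressions can be collected into a short sketch. The function names and all numerical inputs are hypothetical. With a perfect isolation circuit (reliability 1, fail-low probability 0), the duplex expression reduces to the form given for supplies without an isolation circuit, which the sketch uses as a consistency check.

```python
import math

def simplex_supply(r_c, r_a, p_afl):
    """Converter in series with duplex error amplifiers:
    both amplifiers good, or one good and the other failed low."""
    return r_c * r_a**2 + 2 * r_c * r_a * p_afl

def duplex_supply(r_s, p_sfl, r_i=1.0, p_ifl=0.0):
    """Duplex supply with isolation circuits; the defaults model no isolation circuit."""
    return (r_s**2 * r_i**2
            + 2 * r_s**2 * r_i * (1 - r_i)
            + 2 * r_s * r_i * p_sfl * (r_i + p_ifl))

def supply_system(duplex_reliabilities):
    """Series combination of the individual duplex supplies."""
    return math.prod(duplex_reliabilities)

# Hypothetical inputs, chosen only to exercise the expressions.
r_s = simplex_supply(r_c=0.9995, r_a=0.9998, p_afl=0.0001)
with_isolation = duplex_supply(r_s, p_sfl=0.0004, r_i=0.9999, p_ifl=0.00005)
without_isolation = duplex_supply(r_s, p_sfl=0.0004)
r_ps = supply_system([with_isolation] * 4 + [without_isolation] * 2)
print(f"R_s = {r_s:.6f}, R_ps = {r_ps:.6f}")
```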
The simplex Apollo data adapter may be arranged into p functional
modules and each of these assumed to be TMR for the AES configura-
tion. The reliability of each TMR module is, as in the case of the
computer TMR logic,
$$R_{trio} = 3 (R_{mod})^2 - 2 (R_{mod})^3,$$
and the reliability of the data adapter logic is
$$R_{dal} = \prod_{i=1}^{p} (R_{trio})_i.$$
2.3.4 Reliability Estimates
Simulations were performed to derive reliability estimates for
the basic subsystem configuration based on the reliability models of
the previous section and using the component failure rates, environ-
mental stress factors and mission profile.
The reliability estimates are summarized in Table 7 for the most
critical phase and mission reliabilities. The latter was estimated for
duty cycles of 100, 50, and 25 percent and for zero and nonzero stand-
by failure rates. The mission reliability figure represents the basic
long-term reliability of the equipment with no sparing. The sparing
requirements to raise these figures to the required 0.9994 mission
reliability were not determined for the basic system because the
predicted system reliability for the most critical phase (0.999921) fell
below the system requirement of 0.999999. Although memory appeared
to be the limiting factor, the TMR memories proposed for the
reconfigured computer provide higher reliabilities and the TMR/simplex
mode further improves both memory and logic reliabilities. However,
since the reliability estimate for the simplex oscillator was less than
the required system reliability, a redundant oscillator was a necessary
component of the reconfigured computer. Also, two of the six
regulators in the Saturn-V power supply did not contain isolation circuits,
which resulted in considerably lower reliability than the four regulators
containing this protection. Addition of isolation to these circuits was
considered in the reconfigured supplies as well as triplex
instrumentation. TMR memory and module switching were also considered in the
reorganized computer system to increase the inherent equipment
reliability to the specified 0.999999.
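The series structure of the models can be checked against the critical-phase column of Table 7. The sketch below multiplies the tabulated element reliabilities; the dictionary and function names are illustrative, and the numerical values are those listed in the table.

```python
import math

# Critical-phase element reliabilities as listed in Table 7.
COMPUTER_ELEMENTS = {"oscillator": 0.999992, "logic": 0.999998, "memory": 0.999942}
ADAPTER_ELEMENTS = {"power_supply": 0.999995, "logic": 0.999994}

def series(reliabilities):
    """Reliability of elements that must all work (series combination)."""
    return math.prod(reliabilities)

r_computer = series(COMPUTER_ELEMENTS.values())
r_adapter = series(ADAPTER_ELEMENTS.values())
r_subsystem = r_computer * r_adapter
print(f"computer = {r_computer:.6f}, data adapter = {r_adapter:.6f}, "
      f"subsystem = {r_subsystem:.6f}")
```

The products agree with the tabulated computer, data adapter, and subsystem entries to within about one unit in the sixth decimal place, the difference being attributable to rounding of the tabulated inputs.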
2.4 Oscillator
The reliability estimate for the basic computer given in Section
2.3 revealed that the simplex Saturn-V oscillator presented a
reliability constraint which will prevent any reconfigured computer from
meeting the short term requirement of 0.999999 specified in the AES-EPO
statement of work. A redundant oscillator investigation was
therefore performed during August to find a technique for providing a
redundant oscillator and thereby removing this reliability constraint.
2.4.1 Alternate Approaches
A previous IBM oscillator study considered two general
approaches to oscillator design for space system applications: 1) an
astable multivibrator (saturated stages) and 2) a sine wave oscillator
(outputs shaped into symmetrical square waves). The second
technique could be accomplished with tuned circuits, consisting of normal
inductors or capacitors, or with piezo-electric crystals.
The astable multivibrator was eliminated from consideration for
the AES computer for several reasons including frequency, accuracy,
stability, and temperature sensitivity. Redundancy and crystal control
are feasible in the multivibrator pulse generator, but the required
pulse characteristics could not be maintained in the presence of a fault
even though the pulse repetition frequency might be maintained.
Conventional sine wave oscillators were eliminated from
consideration mainly because of the high component count in redundant
configurations, especially TMR. The poor frequency accuracy and
stability inherent to non-crystal sine wave oscillators, and the required
adjustments, were additional reasons for their elimination, although
precise frequency is not necessarily a systems requirement for
computer timing if delay lines are not used and if real time is derived from
a separate source.
TABLE 7 -- Reliability Estimates (Basic System)
                                    Mission Reliability
                 Critical        Non-Op λ > 0            Non-Op λ = 0
Element          Phase       100%     50%     25%       50%     25%
Computer         0.999933    0.8879   0.9272  0.9446    0.9647  0.9886
  Oscillator     0.999992    0.9984   0.9989  0.9992    0.9992  0.9996
  Logic          0.999998    0.9334   0.9580  0.9686    0.9809  0.9947
  Memory         0.999942    0.9528   0.9689  0.9760    0.9843  0.9943
Data Adapter     0.999989    0.8096   0.8921  0.9272    0.9429  0.9840
  Power Supply   0.999995    0.9980   0.9988  0.9992    0.9992  0.9997
  Logic          0.999994    0.8113   0.8932  0.9280    0.9436  0.9843
Computer
Subsystem        0.999921    0.7189   0.8272  0.8759    0.9096  0.9728
A special crystal-controlled oscillator was chosen for the AES
computer in preference to crystal stabilized instrumentations to
provide a precise timing source with a minimum component count and no
required adjustments. Several redundant configurations of this basic
crystal controlled oscillator were considered.
The maximum reliability increases are achieved generally when
redundancy is applied to the lowest circuit levels. Quad circuits apply
redundancy at the component level by arranging sets of four identical
components in series, parallel, or series-parallel. Five different
quad circuit configurations of the crystal oscillator were considered for
the AES computer and rejected for the following reasons:
1) Excessive component count.
2) Quad crystal circuits tend to cause heterodyning of the
output frequency, especially in the presence of a component
failure, except in the configuration in which the four
crystals were connected directly in parallel.
3) Certain failures such as shorts at the junctions of
transistors will cause system failure, a violation of the design
ground rule that single component failures shall not cause
failure of the redundant system.
4) Output wave shapes changed in unacceptable manners when
a component failure occurred in all but one of the five
configurations studied.
5) A transient occurred in the output of the redundant
oscillator when a component failure occurred.
A redundant oscillator configuration was considered in which two
independent oscillators are operated in parallel but synchronized by
appropriate coupling. Inductive coupling was rejected because the in-
ductor becomes a significant factor in the equivalent crystal circuit and
therefore detracts from the desired crystal control. Resistive coupling
was rejected because shorted transistor junctions will cause system
failure. Capacitive coupling is feasible but requires more components,
dissipates more power, and yields lower reliability than the DC-AC-DC
synchronized crystal controlled oscillator described next.
This oscillator, a duo redundant configuration developed for the
OAO system, uses a pair of piezo-electric crystals as couplers as well
as resonators. The crystals couple the driving currents into the bases
of the transistors. The primary disadvantage of this scheme is the
occurrence of a component failure. The reliability estimate of this
redundant oscillator for the critical phase of the AES mission exceeds
0.9999999.
2.4.2 Synchronization and Transients
Two problems must be solved in instrumenting a TMR oscillator:
1) synchronization of the three oscillators and 2) elimination of the
transient which will appear at the voter outputs when a component failure
occurs in one of the oscillators.
Synchronization can be obtained by slaving the oscillators as
shown in Figure 26. The channel 1 oscillator may be considered the
master oscillator; it drives the channel 2 oscillator, which in turn
drives the channel 3 oscillator. As long as the channel 1 oscillator is
operating, channels 2 and 3 will be slaved to channel 1. If channel 1
fails, however, the channel 2 oscillator becomes the master and forces
channel 3 to synchronize with it.
Elimination of the loading transient can be obtained by inserting
time delays in the three channels between the oscillator buffers and the
voters. The delay in each channel will be different by an amount at
least equal to the period of the loading transient (which should not
exceed one cycle, although much larger transient periods could be
accommodated). The relative delays between channels must also be an
integral number of oscillator output periods. The outputs of the delay
circuits and of the voters are shown in Figure 27. Note that the transient
(assumed to be a failure in the channel 1 oscillator) is voted out.
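The effect of staggered delays on the voter output can be illustrated with a simple sampled model. This is a sketch under stated assumptions, not a description of the actual circuit: the waveform resolution, the fault timing, and the modeling of the loading transient as a one-cycle interval during which the common oscillator output is held low are all hypothetical.

```python
PERIOD = 8   # samples per oscillator cycle (hypothetical resolution)

def square(t):
    """Nominal symmetrical square wave, high for the first half of each cycle."""
    return 1 if (t % PERIOD) < PERIOD // 2 else 0

def oscillator(t, fault_start, fault_len):
    """Oscillator output; held low for the duration of the loading transient."""
    if fault_start <= t < fault_start + fault_len:
        return 0
    return square(t)

def voted_output(t, delays, fault_start, fault_len):
    """Majority vote over the three delayed channel signals at sample t."""
    inputs = [oscillator(t - d, fault_start, fault_len) for d in delays]
    return 1 if sum(inputs) >= 2 else 0

def is_clean(delays, fault_start=20, fault_len=PERIOD):
    """True if the voter output matches the nominal wave at every sample checked."""
    return all(voted_output(t, delays, fault_start, fault_len) == square(t)
               for t in range(2 * PERIOD, 10 * PERIOD))

staggered = is_clean((0, PERIOD, 2 * PERIOD))   # delays differ by one full period
unstaggered = is_clean((0, 0, 0))               # no relative delay
print(staggered, unstaggered)
```

With delays that differ by a whole period, the transient reaches only one voter input at a time and is outvoted, so the staggered case is clean. With no relative delay the transient reaches all three inputs together and appears at the voter output.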
Figure 26. Oscillator Synchronization (Oscillators 1 to 3 driving Channels 1 to 3)
Figure 27. Transient Filter (delay and voter outputs)
Analysis of the Saturn-V timing indicated, further, that neither
the delay nor the synchronizing may be necessary. A functional
representation of the Saturn-V clock generator is shown in Figure 28. The
generator is shown in simplex for simplicity. The generation of the
clock pulses is sequential (w derived from z, x from w, y from x, and
z from y). The beginning of each clock (say x) is determined from the
change in state of the preceding clock (w) in the direction of reset (on to
off). Normal operation is indicated in Figure 28 by the solid-line
waveforms.
If the oscillator fails by missing a pulse, indicated in Figure 28 by
the dotted-line waveforms, the only result is an elongation of one of the
clock pulses. In Figure 28, the second oscillator pulse is missing,
which delays reset of the w clock for one oscillator period. The
remaining clocks are undistorted but delayed by the transient period.
Operationally, the effect of the elongated clock pulse is effectively
a suspension in computation for the period of the transient provided
that the computer does not contain fixed-time components such as delay
lines and provided that the transient is not so long that it affects the
accuracy of real-time computations. The computer configuration
visualized at this time for the AES application does not contain delay
lines, and experience at IBM indicates that the transient period will be
of a short enough period that real-time accuracy will not be affected.
Figure 28. Clock Generator (shaped oscillator output driving the w, x, y, and z clock generators)
Loading transients caused by oscillator component failures could
conceivably result in double oscillator pulses rather than missing
pulses. The bandwidth of properly designed clock circuits would not
respond to a double pulse (double frequency), however, and the effect
would be the same as a missing pulse. Similarly, if a distorted pulse
occurred, rather than a missing or a double pulse, then the result
would be a normal clock output (if the clock circuits responded to the
distorted waveform) or a delayed clock output (if the clock circuits did
not respond).
The conclusion of the preceding discussion is that the delay cir-
cuits described previously may not be necessary. In addition,
frequency synchronization of the redundant oscillator may not be necessary
as indicated by the following discussion.
2.4.3 Selected Approaches
In addition to the TMR oscillator, two other redundant oscillator
configurations were considered to be feasible. One scheme is shown
in Figure 29 in which the outputs of three oscillators are tied together
and the common output used to drive all three channels of the TMR
machine. Each oscillator develops a d-c voltage (in addition to the sine
wave output) to bias the other two oscillators off. When the power is
first applied, all three oscillators try to build up oscillations but one
will win the race and turn the other two off. This winner will supply
the basic timing for the entire TMR computer unless it fails, at which
time the remaining two oscillators race for control.
Figure 29. Biased Oscillators (Oscillators 1 to 3 with a common output driving Channels 1 to 3)
A transient will occur upon occurrence of a component failure in
the operating oscillator due to the time it takes the second oscillator to
build up to a usable output. Again this transient will cause only a clock
elongation which effectively halts computation for the length of the
transient. In this scheme, however, the transient can be of the order
of a millisecond rather than an oscillator cycle, and a serious error
may result, especially in the computation of real-time.
A possible solution to this problem is to develop real time from
an independent simplex oscillator. To provide sufficient reliability for
real-time computations, the triplex oscillator of Figure 29 may provide
backup for real time computation, i.e., the triplex oscillator may be
automatically switched to update the real-time counter when a
malfunction is sensed in the simplex oscillator.
A third scheme of redunding the oscillator is to allow three
independent oscillators to operate in parallel and gate their outputs so that
only one is used at a time as shown in Figure 30. The output of each
oscillator is checked by the sense circuits to verify operation of that
channel. The latch and sequence circuits select an operating channel
and provide a gating signal (say UO1, use oscillator 1) to the selected
channel gate. The gate outputs are tied together to drive the TMR
channels.
This scheme reduces the transient time to that required for
switching (since all three oscillators are operating) and should be of
the order of microseconds.
All three schemes (TMR, biased, and gated oscillators) appear
to be feasible for use in the AES computer. The gated oscillator
scheme was selected as the reliability and performance model for the
reorganized subsystem.
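The selection logic of the gated oscillator scheme can be sketched as follows. This is an illustrative model only: the function name, the list-based representation of the sense circuit outputs, and the policy of holding the current selection until it fails and then taking the lowest-numbered operating oscillator are assumptions, not details given in the text.

```python
def select_oscillator(healthy, current=None):
    """Return the index (0, 1, or 2) of the oscillator to gate through, or None.

    healthy -- sense circuit results, one boolean per oscillator
    current -- index of the oscillator selected at present, if any
    """
    if current is not None and healthy[current]:
        return current                 # hold the selection while it keeps working
    for index, ok in enumerate(healthy):
        if ok:
            return index               # step to the first operating oscillator
    return None                        # no operating oscillator remains

def gate_signals(selected, count=3):
    """Gating signals (UO1, UO2, UO3 in the text); at most one is asserted."""
    return [index == selected for index in range(count)]

selected = select_oscillator([True, True, True])             # power-up: oscillator 1
selected = select_oscillator([False, True, True], selected)  # oscillator 1 fails
print(selected, gate_signals(selected))
```

Because all three oscillators run continuously in this scheme, a changeover costs only the switching time rather than an oscillator build-up time.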
Figure 30. Gated Oscillators (Oscillators 1 to 3, Gates 1 to 3, sense circuits, and latch and sequence circuits supplying UO1, UO2, and UO3)
2.5 Memory
The reliability estimate for the basic computer given in Section
2.3 revealed that the duplex Saturn-V memories, like the simplex
oscillator, presented a reliability constraint which precludes meeting the
short-term AES reliability requirement of 0.999999. A reorganization
of the computer memory from duplex to TMR was found to provide
increased reliability in this critical area as well as a potential for
improved transient protection. Although the total memory capacity was
increased 50 percent over the equivalent duplex configuration, much of
the circuitry associated with duplex memory was eliminated including:
1) Half-select current monitoring circuits
2) Parity generating and measuring circuits
3) Buffer registers and associated switching
4) Voters on X-Y drivers
5) Voters on R-W timing.
The functional block diagrams of the duplex memory of Saturn V
and the TMR memory of the proposed AES configuration are shown in
Figures 31 and 32, respectively.
Figure 31. Duplex Memory
Figure 32. TMR Memory (channels A, B, and C, each with X-Y drive, inhibit drive, transfer register, and timing generator; voted outputs)
2.5.1 TMR Reliability Model
A reliability analysis of a TMR memory configuration was per-
formed and compared with the duplex configuration of the basic com-
puter. Reliability of the TMR memory may be expressed in the follow-
ing form:
$$R_{TMR} = R_M^3 + 3 R_M^2 (1 - R_M) + C_{s2} (3 R_M)(1 - R_M)^2 + C_{s3} (1 - R_M)^3,$$
where $R_M$ is the reliability of one simplex memory module of the trio,
$C_{s2}$ is the conditional probability that no identical address bit has failed
in the two failed modules, and $C_{s3}$ is the conditional probability that no
identical address bit has failed in the three failed memory modules.
This expression is conservative in that it does not recognize the
possibility of compensating errors.
The failure events included in the $(1 - R_M)$ probability expression
consist of one, two, etc., component part failures and will have a
conditional distribution of
$$(C_1 + C_2 + \cdots + C_n) = 1.0,$$
where Ci is the conditional probability that i component part failures
have occurred in a given module failure. Since the number of compo-
nents in each memory is large and the probability of failure associated
with each component is small, a Poisson distribution can be assumed
in evaluating Cs2 and Cs3. Therefore,
$$C_i = \frac{e^{-\lambda_M T} (\lambda_M T)^i}{i! \, (1 - e^{-\lambda_M T})},$$
where T is the equivalent time (real time times environmental stress
factor) of the most critical mission phase and $\lambda_M$ is the failure rate of
a simplex memory module. $C_{s2}$ and $C_{s3}$ can then be expressed as
$$C_{s2} = \sum_{j=1}^{n} \sum_{i=1}^{n} C_j C_i K_{ji}$$
$$C_{s3} = \sum_{j=1}^{n} \sum_{i=1}^{n} \sum_{k=1}^{n} C_j C_i C_k K_{jik},$$
where $K_{ji}$ is the conditional probability that no identical address bit
failure has occurred, given j component failures in one memory module
and i failures in a second module. $K_{jik}$ is the conditional probability
that no identical address bit failure has occurred, given j component
failures in one memory module, i failures in a second module, and k
failures in the third module.
$C_{s2}$ and $C_{s3}$ were evaluated on the basis of no more than two
component failures in each memory module, since the resulting
unevaluated events are negligible. In order to determine $K_{ji}$ and $K_{jik}$, the
memory module component part failures were categorized as
1) Serial failures (failure results in complete memory module
failure, i.e., all bits failed)
2) Core failures
3) High X driver failures
4) Low X driver failures
5) High Y driver failures
6) Low Y driver failures
7) Z failures (memory buffer register, sense amplifiers,
inhibit drivers)
8) Y line failures (including Y line solder connections,
terminating resistors and open failure mode of the decoupl-
ing diodes)
9) X line failures.
The value of $(1 - K_{11})$ was determined from an analysis of 81
categorized failure combinations. Since no more than two component part
failures are assumed, the expected distribution of the second
component part failures was assumed to be the same as for the first, or
$$1 - K_{12} = (1 - K_{11}) + K_{11} (1 - K_{11}) = 1 - K_{11}^2$$
$$K_{12} = K_{21} = K_{11}^2.$$
Similarly,
$$K_{22} = K_{11}^4,$$
$$K_{111} = K_{11}^3,$$
$$K_{112} = K_{121} = K_{211} = K_{11}^5,$$
$$K_{221} = K_{212} = K_{122} = K_{11}^8, \text{ and}$$
$$K_{222} = K_{11}^{12}.$$
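The TMR memory model can be evaluated numerically with the following sketch, which uses the failure rate, critical-phase time, and K11 value quoted for the TMR memory in Section 2.5.3. The function names are illustrative. Two points are assumptions of the sketch rather than statements of the text: the failure rate is taken to be per hour, and the K factors are generated from the exponent pattern j·i (two modules) and j·i + j·k + i·k (three modules), which reproduces every relation listed above.

```python
import math

def failure_count_weights(lam_t, n=2):
    """C_i for i = 1..n: probability of i part failures, given that the module failed."""
    q = 1.0 - math.exp(-lam_t)
    return [math.exp(-lam_t) * lam_t**i / (math.factorial(i) * q)
            for i in range(1, n + 1)]

def tmr_memory_reliability(lam, t, k11, n=2):
    """R_TMR truncated at n component part failures per module."""
    lam_t = lam * t
    r = math.exp(-lam_t)                      # simplex module reliability R_M
    c = failure_count_weights(lam_t, n)
    idx = range(1, n + 1)
    cs2 = sum(c[j - 1] * c[i - 1] * k11 ** (j * i)
              for j in idx for i in idx)
    cs3 = sum(c[j - 1] * c[i - 1] * c[k - 1] * k11 ** (j * i + j * k + i * k)
              for j in idx for i in idx for k in idx)
    return (r**3 + 3 * r**2 * (1 - r)
            + cs2 * 3 * r * (1 - r)**2 + cs3 * (1 - r)**3)

# Values quoted for the TMR memory in Section 2.5.3 (rate assumed to be per hour).
r_tmr = tmr_memory_reliability(lam=55.2e-6, t=10.8, k11=0.441812)
print(f"R_TMR = {r_tmr:.8f}")
```

Under these assumptions the result agrees with the TMR memory reliability quoted in Section 2.5.3 to within rounding.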
2.5.2 Duplex Reliability Model
The dual memory reliability equation will be of the form:
$$R_{Dual} = R_M + C_{D1} \times R_M \times (1 - R_M) + C_{D2} \times C_{D3} \, (1 - R_M)^2,$$
where:
$C_{D1}$ = Conditional probability that the first failure will be
detected and operation switched to the active standby
memory
$C_{D2}$ = Conditional probability that the same address word has
not failed
$R_M$ = Reliability of one memory module (i.e., one of the two
memory modules in the dual). Same electronics as for
the TMR except one additional plane and associated
electronics are required for parity checking (i.e., dual
memory module has 14 planes).
$C_{D3}$ = Conditional probability that the failure(s) in one memory
module are continuously detectable by the memory error
detection circuitry, and operation is switched to the other
memory module.
Probability $C_{D1}$ will be greater than $C_{D3}$, since $C_{D1}$ includes only first
component part failures. In addition to this, some of the component
part failures result in a memory module failure mode which may not be
repeatable with respect to the memory error detection. For purposes
of this analysis, it will be assumed that $C_{D1} = C_{D3}$, which will result in
a slightly optimistic reliability number for the dual memory.
In the analysis of the dual memory, the memory module
electronics were the same as for the TMR with the addition of one memory
plane, one inhibit driver, one sense amplifier, and one more bit
position in the memory buffer register for purposes of implementing parity
checking.
In order for $C_{D1}$ to be close to unity (i.e., 0.9 or better) it would
require more than a single bit parity check (i.e., two or more planes
per memory module for parity bits) or some other additional form of
memory error detection. For purposes of this analysis and comparison
with the TMR, a bare minimum configuration was assumed (i.e., one
parity bit).
$C_{D2}$ is found by the following equation, which is similar to that for $C_{s2}$ in
the TMR memory analysis:
$$C_{D2} = \sum_{j=1}^{n} \sum_{i=1}^{n} C_j C_i K_{(j,i)},$$
where $C_i$ is the conditional probability that i component part failures
have occurred in one memory module, given that the memory module
has failed.
$K_{(j,i)}$ = Conditional probability that no same word failure has
occurred given j and i component part failures in two
memory modules respectively.
$C_{D2}$ can be evaluated on the basis of no more than two component part
failures. This is analogous to what was done in evaluating $C_{s2}$ and $C_{s3}$
in the TMR memory analysis.
The value of $(1 - K_{11})$ was determined in a manner similar to
$(1 - K_{11})$ for the TMR memory, where:
for both the dual and the TMR memory. (Note: value of $K_{11}$ for dual
memory is not the same as $K_{11}$ for the TMR memory.)
2.5.3 Reliability Estimates
Simulations were performed to derive reliability estimates for
the basic duplex memories and for the reconfigured TMR memories.
The latter configuration was examined in both TMR and TMR/simplex
modes. Relative memory reliability estimates are listed below for the
critical mission phase (T = 10.8 hours) in order to compare these
configurations.
TMR Memory
$\lambda_M$ = Failure rate for one double density memory module
plus its associated memory buffer register
= $55.2 \times 10^{-6}$
$K_{11}$ = 0.441812
$R_M$ = 0.9994038
$R_{TMR}$ = 0.99999941
Dual Memory
$\lambda_M$ = $57.6 \times 10^{-6}$ (Same as for TMR with the addition of one
plane, one inhibit driver, one sense amplifier, and one
more bit position in the MBR.)
$C_{D2}$ = 0.275307
$C_{D1}$        $R_{Dual}$
1.000000     0.9999994
0.998150*    0.9999977
0.950000     0.9999539
0.900000     0.9999083
*The value that $C_{D1}$ must be for $R_{Dual}$ =
$R_{TMR}$, i.e., 99.8 percent of all first
failures must be detected by the memory
error detection circuitry.
2.5.4 Conclusions
In order for the dual memory predicted reliability to be equal to
the predicted reliability of the TMR memory, it is required that the
error detection circuitry in the dual memory have the capability of
detecting nearly all first failures that may occur in the memory module,
i.e., 99.8 percent of all first failures. The dual memory model
utilized for this prediction contains a minimum of error detection circuitry,
i.e., one additional memory plane, inhibit driver, sense amplifier, and
one additional bit position in the MBR for a one-bit parity check, and
could not be expected to detect 99.8 percent of all expected first failures
in the memory module.
A memory error detection utilizing a one-bit parity check as the
only means of error detection could not be expected to detect more than
approximately 85 percent of the expected first failures in the memory
module. Of this 85 percent, approximately 86 percent comprise a
failure mode where detection is repeatable, and approximately 19 per-
cent of the 85 percent involves failures which are by chance detected by
parity checking and whose detection by parity checking is not repeatable,
i.e., failure would not be detected every time an error occurred. To
be able to predict approximately 100 percent detection of the expected
first failures in the memory module would require very sophisticated
memory error detection circuitry which, in turn, would further de-
grade the predicted dual memory reliability number.
The reliability prediction for the TMR memory was pessimistic
in that it assumes that no compensating failures can occur, i.e., failure
is assumed if two or more bits at the same address have failed. This
is probably more realistic than assuming 50 percent of these failures
are compensating.
The reliability advantage of TMR memories is further
strengthened by the feasibility of a TMR/simplex mode of operation and by better
operation in a disruptive transient environment. Transient protection
techniques for TMR memories are discussed in Section 2.10.
2.6 Power Supplies and Distribution
A guidance computer/data adapter designed for a long term,
manned mission requires an internal power system with performance
characteristics far surpassing those required in forerunning guidance
systems. To meet these requirements, the power system must be
developed around a proven redundancy concept. Many design
approaches have applied redundancy techniques at the component, circuit,
or subsystem level. Their relevant merits are discussed in the
following paragraphs.
2.6.1 Design Approaches
Because of the inflight maintenance requirement, component
redundancy may not be applicable except to assure reliability of an
individual building block used within a system. Component redundancy, in
this case, would affect only the replacement interval for building blocks.
On the other hand, a circuit redundancy concept would be a valid
approach only if the power distribution system is made highly reliable
and various load failures cannot cause secondary failures in the re-
mainder of the system.
Both component and circuit redundancy techniques carry cost,
weight, and volume penalties in their usual form. In general,
component redundant circuits require four components for each circuit
element. In addition, the use of component redundancy for linear circuits
is an extremely difficult design problem.
Circuit redundant systems generally employ simple "brute force"
duplexing to achieve reliability. In power systems, this means that
double the power capability required is available when all circuits are
functioning properly. Since simple duplexing, i.e., paralleling of
power circuits, furnishes protection only against undervoltage
conditions, additional provisions must be made to insure protection against
overvoltage.
A subsystem redundancy design approach results in a power
system with several important advantages. For the TMR logic circuits,
the power system contains three independent power supply circuits
and three independent distribution systems. This can take the form of
a triple-duplex or a triple-simplex configuration. In the triple-duplex
design, each of the three logic channels receives independent power
from two of the three power systems. In the triple-simplex design,
each logic channel receives independent power from a single source.
In the event that one of the power supplies fails, the TMR logic loads
will operate in duplex but without the capability of making a true
majority decision. This may not be desirable from a reliability standpoint.
For a little extra volume, weight, and cost penalty, a
triple-duplex design can be realized. This configuration, shown in Figure 33,
results in double-sized power supplies, but the computer adapter
logic configuration remains TMR, even with a power supply failure.
A disadvantage of either the triple-simplex or triple-duplex
supply configuration is that protection is required against overvoltage.
A power supply failure must not result in stress failures within the
logic channel loads. For this reason, the control portions of each
power supply are made duplex redundant. The converter portions make
use of circuit techniques developed for the Saturn-V Launch Vehicle
Data Adapter power supplies. These are designed not to fail in an
overvoltage mode.
To prevent power supply load-sharing problems, it is necessary
for each feedback control loop to sample both the isolation diode input
voltage and the respective bus voltage. With this arrangement, a short
circuit in an individual distribution system will not cause excessive
voltage rises on the other distribution lines. Further precaution
against load-sharing problems can be taken by providing energy
coupling between primary power sources. This can be accomplished by
using a multiple-winding, common core filter inductor to couple
transient energy between independent power lines, thereby stabilizing the
process of transferring loads between power supplies.
Figure 33. Triplex-Duplex Power System
2.6.2 Power Supply Switching
There are three reasons for providing some form of a power
switching arrangement between the primary sources and the loads:
1) To provide proper turn-on and turn-off sequencing for loads
such as destructive readout memories.
2) To permit removal of a power supply because of a failure in
the supply.
3) To permit removal of a section of the load because of a
power failure in the load itself.
The switching combinations that are possible between power sup-
plies and loads require the use of remote error sensing circuits. These
should be designed to insure proper voltage regulation at the various
distribution loads.
To switch the power supply outputs, the isolation diodes shown in
Figure 33 can be replaced by power transistors. The base drive cir-
cuitry for these power switches requires some auxiliary voltage
sources to insure sufficient forward bias for "on" switches and to pre-
vent base-emitter breakdown of "off" switches. The remote sensing
lines must also be switched or otherwise disabled. One of the several
techniques used for low-level analog switching may prove to be suitable.
2.6.3 Recommendations
In summary, the power system described includes three simplex
power supplies, duplexed into three independent power distribution
systems using power switching transistors. Each simplex supply
includes duplex error amplifiers to prevent overvoltage failures. All
remote sense lines are capable of being switched to permit removal of
any power supply or any load. A single power supply failure results
in two simplex powered loads and one duplex powered load. A single
short circuit load failure results in one operational power supply feed-
ing two loads.
2.6.4 Power Requirements
Based on the preceding requirements for a triple-duplexed power
system, the power requirements for each of the three TMR portions of
the computer-adapter equipment are summarized in Table 8.
TABLE 8 -- Regulated DC Power Per Section
Voltage (Vdc)    Load Current (amperes)    Load Power (watts)
+20              0.84                      16.8
+12              0.092                     1.1
+ 6              3.44                      20.6
- 3              0.10                      0.3
2.6.5 Power Supply Circuit Configurations
Since each power supply output is required to drive two loads,
the design centers for each voltage will be twice that given in Table 8.
This table also shows that two of the loads are very light in comparison
to the others. Because of this, the use of independent power supplies
for these outputs is not recommended. This decision is based on both
the design cost and hardware standpoint. The best balance between
component cost, manufacturability, testing, and maintenance is
obtained by using an interrelated system. This is shown in Figure 34.
These blocks represent one-third of the total power supply system.
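The doubling of the Table 8 values can be sketched numerically. The following is a minimal illustration, assuming only the per-section voltages and currents listed in Table 8 and the factor of two stated above; the function and variable names are hypothetical and not part of the study.

```python
# Per-section regulated DC loads from Table 8: (voltage in volts, current in amperes).
TABLE_8 = [(+20, 0.84), (+12, 0.092), (+6, 3.44), (-3, 0.10)]


def load_power(volts, amps):
    """Load power in watts for one regulated output."""
    return abs(volts) * amps


def design_centers(table, loads_per_output=2):
    """Scale each output for the number of loads it must drive.

    Returns (voltage, design current, design power) per output. The factor of
    two reflects the statement that each supply output drives two loads.
    """
    return [
        (v, loads_per_output * i, loads_per_output * load_power(v, i))
        for v, i in table
    ]


section_total_watts = sum(load_power(v, i) for v, i in TABLE_8)

for volts, amps, watts in design_centers(TABLE_8):
    print(f"{volts:+3d} V  {amps:6.3f} A  {watts:6.2f} W")
print(f"Per-section load total: {section_total_watts:.2f} W")
```

The two light outputs (+12 V and -3 V) stand out clearly in the printed values, which is the observation the text uses to argue against independent supplies for them.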
The series regulator in Figure 34 must be designed to prevent an
overvoltage at the output. This is accomplished most economically by
providing duplexed error signal amplifiers and a dual series pass
element.
[Figure 34 block diagram: +28 Vdc input; master timing oscillator; +20 V
regulator converter and redundant series regulator (+20 and +12 outputs);
+6 V / -6 V regulator converter and redundant series regulator (+6 and -3
outputs)]
Figure 34. Interrelated Power System
The entire power supply system will require three master oscil-
lators, six regulated DC-to-DC converters with duplexed amplifiers,
six redundant series regulators, twelve isolation and/or power switch-
ing elements, and twenty-four remote sense line switches. Since the
power levels are similar for the two types of converter-regulator
chains, multiple use can be made of many existing building blocks used
in the computer-adapter equipment. Component part requirements are
given in Table 9.
TABLE 9 - Power System Component Count
Components                                     Quantity
Small Signal Semiconductors and Resistors      1194
Power Semiconductors                           42
Magnetic Components                            24
Capacitors - Ceramic and Tantalum              118
Reference Network Components                   42
Total Count                                    1420
The volume and weight estimates of each power supply section
are based on the use of discrete components, with an operating
frequency of 100 kc/s for the converter portions. A reduction of
approximately 20 percent in volume can be realized if integrated circuit
amplifiers, modulators, and control circuits are utilized. Power
supply size estimates are based on the use of three identical packages.
Each one contains two basic converters which supply four outputs and
the sequencing controls necessary for those outputs. Each unit will be
contained in a volume of approximately 55 cubic inches and will weigh
3 pounds.
2.7 Grounding
2.7.1 Saturn-V Grounding
The general grounding system of the Saturn-V computer and data
adapter, consisting of a regulated d-c return and a filtered power
return, was retained in the AES computer subsystem. As shown in
Figure 35, the common vehicle ground for the AES system is brought
into the computer subsystem in duplex form through an RFI filter. The
outputs of the RFI filter are "commoned" in a ground plane designated
as the filtered power return, are capacitively coupled to the subsystem
chassis, and are routed through a transient decoupler to the triplex
power supplies. The secondaries of the power supply transformers
are "commoned" in a second ground plane designated as the regulated
d-c return, which in turn is routed in duplex form from the computer
subsystem to the common vehicle ground.
The filtered power return is used as a reference for the 28-volt
d-c output driver circuits and for elapsed time indicators. Discrete
outputs, discrete inputs, and d-c interrupts are referenced to the fil-
tered power return. The ground plane is located physically in the
back-panel multilayer interconnection board.
The regulated d-c return is used as a reference for signal and
power supply ground. Any signals to other equipment in the command
module which are referenced to the regulated d-c return will be trans-
former coupled or floated within that equipment. The regulated d-c
return consists of ground planes in the back-panel multilayer board
and in each of the logic module multilayer boards.
2.7.2 AES Grounding Modifications
Certain modifications to the Saturn-V grounding details were
made. In Figure 36, the module reference for the memory is shown
as three separate ground planes commoned at a single point in the
back-panel ground plane. The connection to the common vehicle
ground is also made at this same point on the back-panel ground plane.
This representation of the memory grounding is meant to indicate that
special consideration was given the physical location and electrical
connection of the memory ground reference to minimize the noise
effects of equipment ground current distributions on the AES memory.
Isolation of the three channel ground planes of the memory in this
[Figure 35 diagram: prime power (duplex), RFI filter, transient decoupler,
filtered power return, regulated d-c return, chassis ground, floated ground
plane, computer subsystem, common vehicle ground]
Figure 35. Grounding System
manner also minimizes the probability that a voltage transient of ex-
ternal origin will affect all three memory channels, since cross-
channel coupling of transient currents is minimized.
The ground planes of the remaining modules in the computer-
data adapter subsystem are shown as single planes although each
module exists physically as three individual simplex subassemblies
containing individual ground planes in their respective multilayer
[Figure 36 diagram: ground planes for Memory Module Channels 1, 2, and 3;
Arithmetic Module; Address Registers Module; Control and Timing Module;
Data Adapter Output Counter, Input Counter, Time Counter, Data Flow,
Control, and Processor Modules]
Figure 36. Regulated DC Return Ground Planes
interconnection boards. This representation of the module grounding
is meant to indicate that there was no attempt made to isolate the indi-
vidual ground planes as in the memory area.
An additional ground plane is provided in each simplex module,
as shown in Figure 37, to isolate the logic ground currents from the
voted intermodule signal returns. The ideal intermodule
interconnection technique for ground current isolation would be twisted
signal-to-ground pairs except that the number of module interconnections becomes
prohibitive. The technique shown in Figure 37 effectively contains the
logic ground currents in the module and separated from the voted
interconnection currents. The regulated d-c return ground plane in the
back-panel interconnection board contains mainly interconnection signal
return currents and power currents.
2.7.3 Interface Circuits
There are four basic types of circuits which interface the
computer subsystem to the other AES subsystems:
1) Type D - Discrete inputs,
2) Type C - Discrete outputs,
3) Type Y - Transformer coupled inputs,
4) Type X - Transformer coupled outputs.
The interface characteristics for these interface circuits are the same
as those of the Apollo backup data adapter except that the AES circuits
will be component-redundant. Simplex versions of these circuits and
some of their more important characteristics, including the manner in
which they are tied into the AES grounding system, are described in the
following paragraphs.
A simplex example of a typical AES discrete input circuit is shown
in Figure 38. Note that the ground reference from the driving AES
subsystem for the 28-volt d-c discrete is not routed along with the signal.
This separate routing of signal and reference simplified Apollo
interface wiring, and the resulting sensitivity to noise pickup was not
[Figure 38 diagram: driving AES subsystem with +28 V switching; computer
subsystem with +6 V supplies; RFI filter; prime power; common vehicle ground]
Figure 38. Discrete Input Circuit
considered to be critical in the case of d-c signals. The circuit charac-
teristics of the discrete input signals are:
1) Input signal level - "one"   28 ± 11 volts d-c
                        "zero"  0 ± 2 volts d-c
2) Source impedance   - "one"   4,000 ohms
                        "zero"  open
3) Load impedance     - "one"   22,000 ohms
                        "zero"  22,000 ohms
A simplex example of a typical AES discrete output circuit is
shown in Figure 39. Again it was not considered necessary to route the
ground reference along with the d-c signal. The circuit characteristics
of the discrete output signals are:
1) Source impedance   - "one"   3,000 ohms maximum
                        "zero"  500,000 ohms maximum
2) Collector current  - "one"   5 milliamperes maximum
3) Output             - "zero"  0 milliamperes at 40
                        volts d-c
A simplex example of a typical AES transformer coupled pulse
input circuit is shown in Figure 40. The regulated d-c return is used
as the ground reference for these circuits, since the transformer pro-
vides ground decoupling between the computer subsystem and the driving
subsystem. The circuit characteristics of the pulse input signals are:
1) Input signal level - "one"   7 ± 3 volts
2) Signal pulse width -         4 ± 2 microseconds
3) Source impedance   - "one"   100 ohms
                        "zero"  10 ohms
4) Load impedance     - "one"   200 ohms
                        "zero"  20 ohms.
A simplex example of a typical AES transformer coupled pulse
output circuit is shown in Figure 41. Again the regulated d-c return is
[Figure 39 diagram: Type C circuit in the computer subsystem with +6 V and
+28 V supplies, OR extender, load circuit, power return, filter, common
vehicle ground]
Figure 39. Discrete Output Circuit
[Figure 40 diagram: pulse input circuit in the computer subsystem with +6 V
supplies, referenced to the regulated d-c return]
Figure 40. Pulse Input Circuit
[Figure 41 diagram: Type X circuit in the computer subsystem with +12 V and
+6 V supplies, AND inputs, OR extender, referenced to the regulated d-c
return]
Figure 41. Pulse Output Circuit
used as the ground reference. The circuit characteristics of the pulse
output signals are:
1) Output signal level - "one"   7 ± 3 volts
2) Signal pulse width  -         0.25 to 6.0 microseconds
3) Source impedance    - "one"   100 ohms
                         "zero"  10 ohms
4) Load impedance      - "one"   105 to 500 ohms
                         "zero"  20 ohms.
2.8 TMR/Simplex Mode
A new operating mode for the AES system was considered during
the study in which one or more modules in the computer and data
adapter operate simplex while the remaining modules operate TMR.
This mode would be useful for equipment checkout, as channel
and module switching modes are used in the Saturn-V program, and
would also provide an increase in reliability over the basic TMR mode, as
described in the following sections.
2.8.1 Reliability Considerations
The expression for the basic reliability of a TMR module without
regard to logical direction of failure is

$$R_{TM} = 3R_c^2 - 2R_c^3,$$

where $R_c$ is the reliability of one channel of the module. Assuming a
constant failure rate ($\lambda$) for the module,

$$R_c = e^{-\lambda t} = e^{-t/\mathrm{MTBF}},$$

where MTBF is the mean time between failures of each channel of the
TMR module.
The reliability curves for the simplex module and for the TMR
module are plotted in Figure 42 against normalized time (t/MTBF).
Since the curves cross at t = 0.69 MTBF, it is obvious that TMR
Figure 42. Generalized Reliability Curves
configurations should be used only in applications where mission time
is short compared to the MTBF of the simplex channel unless mainte-
nance is permitted during the mission.
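The crossover quoted above can be checked numerically. The sketch below assumes only the two expressions given in this section (the simplex exponential and the basic TMR polynomial); the function names are illustrative. Setting the two reliabilities equal gives $R_c = 1/2$, that is, t/MTBF = ln 2, which is approximately 0.69.

```python
import math


def simplex_reliability(t_norm):
    """Reliability of one channel at normalized time t/MTBF."""
    return math.exp(-t_norm)


def tmr_reliability(t_norm):
    """Basic TMR module reliability, 3*Rc^2 - 2*Rc^3, at normalized time t/MTBF."""
    rc = simplex_reliability(t_norm)
    return 3 * rc**2 - 2 * rc**3


# The curves meet where Rc = 1/2, i.e. at t/MTBF = ln 2.
crossover = math.log(2)

for t_norm in (0.1, 0.5, crossover, 1.0):
    print(
        f"t/MTBF = {t_norm:.3f}  "
        f"simplex = {simplex_reliability(t_norm):.4f}  "
        f"TMR = {tmr_reliability(t_norm):.4f}"
    )
```

Before the crossover the TMR value is the larger of the two; after it the simplex value is larger, which is the basis for restricting unmaintained TMR to short missions.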
Note that if a failure occurs in one channel, both the remaining
channels must continue to operate correctly. Once a failure has
occurred in a module, therefore, the reliability of that module is less than
that of a simplex channel. For example, if a failure occurs in channel
1, the reliability of the module becomes $R_2 R_3 = R_c^2$ while the
reliability of a channel is $R_c$. A reliability improvement should be possible
by switching out one of the remaining unfailed channels after a failure
has occurred in a module and thereby operating that module simplex
while the remainder of the computer and data adapter operate TMR.
The choice of which of the good channels to switch out is arbitrary.
2.8.2 Basic TMR/Simplex Reliability
The reliability of a TMR module in the TMR/simplex mode can
be derived by adding the probabilities of operating states as was done
for the basic TMR mode in Section 2.1.2. A somewhat more rigorous
method was used, however, as follows.
Denote the phase or mission time period for which the reliability
is to be calculated by the constant T and operating time within this
period by the variable t. Assume that a first malfunction occurs at
some time $\tau$ during the phase or mission, where $0 \le \tau \le T$.
The probability that one or more malfunctions will occur by
time t, assuming constant failure rate $\lambda$, can be represented by the
unreliability expression

$$U = 1 - e^{-3\lambda t}.$$

The probability density of a failure occurring at the specified time $\tau$ is

$$u = \left. \frac{dU}{dt} \right|_{t=\tau} = 3\lambda e^{-3\lambda\tau}.$$

The probability that a failure will occur during any increment of time,
$d\tau$, at a specified time $\tau$ is

$$u \, d\tau = 3\lambda e^{-3\lambda\tau} \, d\tau.$$

Now assume that, when the first failure occurs, the faulty channel of
the module plus one of the good channels are switched off. The
probability that the module is operating at the end of the time period T is
then given by the product of the probability that a failure occurs at time
$\tau$ ($u \, d\tau$) and the conditional probability that if a failure does occur at
time $\tau$ and the switching action is accomplished, then the remaining
good module operates for the remainder of the phase or mission time
($e^{-\lambda(T-\tau)}$):

$$3\lambda e^{-3\lambda\tau} \, d\tau \; e^{-\lambda(T-\tau)} = 3\lambda e^{-2\lambda\tau} e^{-\lambda T} \, d\tau.$$

Since the time of first failure could occur at any specific time
during the phase or mission, the reliability expression derived above
must be summed for all specific times during the period T, or

$$3\lambda e^{-\lambda T} \int_0^T e^{-2\lambda\tau} \, d\tau.$$

Solving the integral expression and substituting $R_c$ (reliability of one
channel of the TMR module) for the exponential $e^{-\lambda T}$,

$$\tfrac{3}{2} R_c - \tfrac{3}{2} R_c^3.$$

This is the expression for the probability that the module is operating
at the end of the phase or mission, assuming that a failure occurs
during the period of the phase or mission. The reliability of the TMR
module in the TMR/simplex mode is derived by adding to this
expression the probability that no failures occur during the time period, or

$$R_{TSM} = \tfrac{3}{2} R_c - \tfrac{3}{2} R_c^3 + R_c^3 = \tfrac{3}{2} R_c - \tfrac{1}{2} R_c^3.$$
This expression is identical to the expression for TMR
reliability without the TMR/simplex mode but assumes with equal
probabilities that two failures may occur in the same or opposite directions
(P(o) = 1/2). The TMR/simplex mode therefore effectively forces
compensating errors in different channels of the TMR module 50
percent of the time.
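As a cross-check on the derivation in this section, the integral over the time of first failure can be evaluated numerically and compared with the closed form 3/2 Rc - 1/2 Rc^3. This is a minimal sketch; the failure rate and mission time used at the end are arbitrary illustrative values, not figures from the study, and the midpoint rule is simply a convenient integration method.

```python
import math


def tmr_simplex_closed_form(rc):
    """Closed-form TMR/simplex reliability: 3/2*Rc - 1/2*Rc^3."""
    return 1.5 * rc - 0.5 * rc**3


def tmr_simplex_numeric(lam, mission_time, steps=20_000):
    """Evaluate the derivation directly.

    Integrates (first failure at tau) * (surviving channel lasts to T) over
    0..T with the midpoint rule, then adds the no-failure term exp(-3*lam*T).
    """
    d_tau = mission_time / steps
    with_failure = 0.0
    for k in range(steps):
        tau = (k + 0.5) * d_tau
        first_failure_density = 3 * lam * math.exp(-3 * lam * tau)
        survivor = math.exp(-lam * (mission_time - tau))
        with_failure += first_failure_density * survivor * d_tau
    no_failure = math.exp(-3 * lam * mission_time)
    return with_failure + no_failure


# Illustrative values only: failure rate per hour and mission time in hours.
lam_example = 0.001
t_example = 500.0
rc_example = math.exp(-lam_example * t_example)

print(tmr_simplex_closed_form(rc_example))
print(tmr_simplex_numeric(lam_example, t_example))
```

The two printed values should agree to within the integration error, which supports the reconstruction of the intermediate term as 3/2 Rc - 3/2 Rc^3.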
2.8.3 Ultimate TMR/Simplex Reliability
In the derivation of the reliability expression for the TMR/simplex
mode it was assumed that, when a failure occurred, a good channel of
the module was switched off along with the defective channel. If it is
possible to use this channel as a spare in case the operating channel of
the switched TMR module in turn fails, then the ultimate reliability
provided by a TMR/simplex mode can be achieved.
Assuming, conservatively, that the off-time failure rate of the
"spare" channel is equal to the operating failure rate of a channel, then
the reliability of a TMR module in the TMR/simplex mode (and using
the "spare" channel) is

$$R_{TM} = 1 - (1 - R_c)^3.$$

This equation expresses the fact that all three channels must fail to fail
the TMR module. It is identical to the expression for TMR reliability
without the TMR/simplex mode but assumes that compensating errors
always occur in related portions of the three channels.
The reliability of the basic TMR/simplex mode and the "ultimate"
reliability of this mode are compared in Figure 43 with the reliability
of the basic TMR mode. Reliability is shown plotted against normal-
ized time on a time scale representing a few hundred operating hours.
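The comparison can also be tabulated directly from the three expressions given in Sections 2.8.1 through 2.8.3. The sketch below is illustrative only: the function names are hypothetical, and the sample points simply span the normalized time range 0 to 0.10.

```python
import math


def channel_reliability(t_norm):
    """Rc = exp(-t/MTBF) for one channel."""
    return math.exp(-t_norm)


def basic_tmr(rc):
    """Basic TMR mode: 3*Rc^2 - 2*Rc^3."""
    return 3 * rc**2 - 2 * rc**3


def tmr_simplex(rc):
    """Basic TMR/simplex mode: 3/2*Rc - 1/2*Rc^3."""
    return 1.5 * rc - 0.5 * rc**3


def ultimate_tmr_simplex(rc):
    """Ultimate TMR/simplex mode with the spare channel: 1 - (1 - Rc)^3."""
    return 1 - (1 - rc) ** 3


def compare(t_norm):
    """Return (basic TMR, TMR/simplex, ultimate) at normalized time t/MTBF."""
    rc = channel_reliability(t_norm)
    return basic_tmr(rc), tmr_simplex(rc), ultimate_tmr_simplex(rc)


for t_norm in (0.02, 0.04, 0.06, 0.08, 0.10):
    basic, switched, ultimate = compare(t_norm)
    print(f"{t_norm:.2f}  {basic:.5f}  {switched:.5f}  {ultimate:.5f}")
```

Algebraically, each step up (basic to TMR/simplex, and TMR/simplex to ultimate) adds the same amount, 3/2 Rc (1 - Rc)^2, so the three curves are evenly spaced at any given time.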
2.9 Reorganized Subsystem
The reorganized computer subsystem was derived from the basic
subsystem consisting of the Saturn-V computer and a redundant version
of the Apollo backup data adapter by reconfiguring certain performance
constraining areas (such as the oscillator, memory, and power supplies)
and by repartitioning the basic computer and data adapter into replace-
able modules suitable for error diagnosis and replacement. This re-
organized subsystem was then examined to determine to what extent it
met the functional and availability requirements of the 90-day AES-EPO
mission.
2.9.1 Computer Description
The attempts at partitioning the computer into modules resulted
in a four-module computer with the memory contained as one of the
[Figure 43 plot: reliability (0.980 and above) versus normalized time
(t/MTBF, 0 to 0.10); one curve labeled "...ex Mode - Ultimate Reliability"]
Figure 43. Reliability Comparison (TMR and TMR/Simplex)
modules. These four modules (memory and memory interface,
arithmetic, address registers, and control and timing) required a total of
120 voters to be assigned to the intermodule interface signals. Using
the Saturn logic as a reference, the logic was "sized" using a typical
DTL integrated circuit family. Since no voter or disagreement
detector circuits are presently available in this circuit family, it was
assumed that each could be contained on an integrated circuit chip or
flat pack. The subsequent module sizing data are tabulated in Table 10.
Figure 44 indicates how the Saturn computer could be partitioned into
the modules as described.
TABLE 10 - Computer Sizing
          No. of Chips              No. of Inputs-Outputs
Module                                              Voter
No.       Logic   Voter   Total     Inputs  Outputs Interconnection  Total
1         72*     34      106*      78      17      51               146
2         169**   18      187**     72      9       27               108
3         59      80      139       59      40      120              219
4         83      92      175       36      46      138              220
* Does not include memory electronics
** Does not include delay line (or shift register) chips
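The row totals in Table 10 can be checked mechanically. The sketch below uses only the tabulated values; the names are hypothetical. It also tests one pattern that is visible in the data but not stated in the text, namely that each voter interconnection count is three times the corresponding output count, which would be consistent with each voted output being distributed to three channels. That interpretation is an assumption.

```python
# Table 10 rows: module number -> (logic chips, voter chips, total chips,
#                                  inputs, outputs, voter interconnections, total I/O)
TABLE_10 = {
    1: (72, 34, 106, 78, 17, 51, 146),
    2: (169, 18, 187, 72, 9, 27, 108),
    3: (59, 80, 139, 59, 40, 120, 219),
    4: (83, 92, 175, 36, 46, 138, 220),
}


def check_row(row):
    """Return consistency flags for one Table 10 row."""
    logic, voter, chips, inputs, outputs, voter_ic, io_total = row
    return {
        "chip_total_ok": logic + voter == chips,
        "io_total_ok": inputs + outputs + voter_ic == io_total,
        # Observed pattern in the data, not a rule stated in the report.
        "voter_ic_is_3x_outputs": voter_ic == 3 * outputs,
    }


for module, row in TABLE_10.items():
    print(module, check_row(row))
```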
The partitioning of the computer logic into modules was functional
in nature. For example, module number 2 contains all the arithmetic
logic. This allows the maximum intra-connection of the logic within a
given module and, therefore, reduces the interconnection between
modules. Module 2 is a classic example because it contains the most
logic of any module and the least interconnections. Only nine signals
are outputs and require voters.
In theory, the more signals that are voted, the more reliable the
machine. However, the voters also have a failure rate and a situation
where more voter failures occur than logic failures cannot be tolerated.
Also, it has been demonstrated that the TMR reliability is maximized
when the modules are the same size, i.e., their simplex reliabilities
are equal.
In practice, the partitioning and voter assignment are a series of
trade-offs. Unfortunately, no tool is presently available which will do
this and maximize the system reliability at the same time. This can be
done only through a trial and error procedure where the reliability of a
given partitioning is computed. Obviously, it is impossible to do this
for all possible partitionings. The final result was the selection of the
partitioning with the least number of voters which yields the desired
reliability.
In an attempt to simplify the data adapter portion of the Apollo
computer, a study was made of one possible means of increasing the
central processor speed. In this study, the central processor
organization was restricted to be identical to that employed in the Saturn-V
computer. An increase of four times in the speed was found to be
feasible using state-of-the-art components. The three major areas
affected would be the memory, the logic circuits, and the glass delay
line storage elements. The requirements in these areas for a "times
four" computer are:
1) Memory - The memory proposed for the AES is a double
density Saturn-V array which uses conventional toroidal
cores. Unlike the Saturn-V system, the memories will not
possess the duplex capability. This scheme will be
replaced with a TMR organization that will also allow simplex
operation of the three redundant computers. Each memory
will contain 8192 28-bit words and will be augmented with
an on-board memory load capability. Memory cycle time
will be 2.5 nanoseconds.
Parity checking is not required when operating in the TMR
mode, since each bit of the 14-bit parallel word is voted.
The parity check capability is included, however, to
provide error indications when operating as independent
simplex systems. In the simplex mode of operation, the three
memory modules operate independently with their
respective computers. Thus, each computing system has its own
8192 word memory.
When in the TMR mode of operation, each output of the
14-bit positions of the buffer register is voted. Accordingly,
the contents of each memory are restored correctly had a
disagreement occurred in any one.
2) Logic Circuits - The logic circuits have approximately 50
nanoseconds delay per stage. Integrated circuits are
available from many sources which would satisfy this
requirement.
3) Delay Lines - The delay lines have a 2 megacycle bit rate.
This requirement may be satisfied in a number of ways.
The present Saturn-V computer uses two glass delay lines
with four channels of information per line. For an
equivalent 2-megacycle bit rate, eight of these lines would be
required in the "times four" computer. An integrated circuit
shift register using field effect transistors (FET) is
available with a reliable operating range from 50 kilocycles to 2
megacycles. These shift registers duplicate the function of
the glass delay lines and have the advantage of being much
smaller and less critical to variations in clocking frequency.
Each computer will use eight of these devices to provide the
required arithmetic registers.
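The bit-by-bit voting of the buffer register described under item 1 can be sketched as a bitwise majority function. This is an illustration of the general two-out-of-three voting technique, not the study's circuit; the 14-bit width follows the text, while the function names and the restore step are assumptions made for the example.

```python
WORD_BITS = 14
WORD_MASK = (1 << WORD_BITS) - 1


def vote(a, b, c):
    """Bitwise two-out-of-three majority of three 14-bit words."""
    return ((a & b) | (a & c) | (b & c)) & WORD_MASK


def disagreement(a, b, c):
    """Flag each channel whose word differs from the voted word."""
    voted = vote(a, b, c)
    return tuple(((word ^ voted) & WORD_MASK) != 0 for word in (a, b, c))


def restore(a, b, c):
    """Return the word to be rewritten into all three memories after a read."""
    voted = vote(a, b, c)
    return voted, voted, voted


# Example: channel 2 reads one wrong bit; voting masks it and flags the channel.
good_word = 0b10101010101010
bad_word = good_word ^ 0b100
print(bin(vote(good_word, bad_word, good_word)))
print(disagreement(good_word, bad_word, good_word))
```

Because the vote is taken independently at each bit position, single-bit errors in different channels are also masked as long as they fall in different bit positions.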
These changes will allow the central computer to perform the
computations presently done on the I/O processor of the backup Apollo
data adapter. A block diagram of the information flow is shown in
Figure 45. A summary of the computer characteristics is given in
Table 11.
2.9.2 Data Adapter Description
The organization of the AES data adapter is similar to the Apollo
data adapter system described in Section 2.3 except that the I/O proc-
essor and bit gate generator were eliminated. The central processor
unit bit gate generator will suffice for the Apollo data adapter.
Memory-steal was eliminated. The functions which were performed
by the processor were assumed by the AES computer except for the
task of updating the 32 counting registers. These registers were
incorporated as hardware in the form of shifting registers. In addition
to the shift registers, each counter has its own add/subtract or shift
unit and control logic. To conserve logic, the shift registers utilize
the field effect transistor shift registers in which up to 100 bits of
storage may be obtained on a single flat pack, which makes the
hardware implementation of the counters feasible. The input and real-time
counters may be loaded or unloaded by computer command. The output
counters can only be loaded by computer command. Data is transferred
serially to or from the computer arithmetic element. The static data
such as the discrete outputs are stored in latch registers, which are
loaded in parallel from the data exchange register (DER). Discrete
inputs are transferred in parallel to the DER. This register converts
the data from parallel to serial or vice versa for communications with
the computer. The real time counter was implemented in hardware as
a shift register and the central processor unit interrupt register was
increased to 15 bits to allow for time control interrupts.
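The parallel-to-serial role of the DER can be illustrated with a simple shift-register model. This is a sketch under stated assumptions: the 14-bit width and the least-significant-bit-first order are chosen for the example and are not specified in the text above, and the function names are hypothetical.

```python
def parallel_to_serial(word, width=14):
    """Shift a parallel word out as a list of bits, least significant bit first.

    The width and bit order are assumptions made for illustration.
    """
    return [(word >> position) & 1 for position in range(width)]


def serial_to_parallel(bits):
    """Assemble serially received bits (LSB first) into a parallel word."""
    word = 0
    for position, bit in enumerate(bits):
        word |= (bit & 1) << position
    return word


# Discrete inputs latched in parallel, sent serially to the computer, and a
# serial word from the computer loaded back in parallel for the output latches.
discrete_inputs = 0b00101100011101
serial_stream = parallel_to_serial(discrete_inputs)
print(serial_stream)
print(bin(serial_to_parallel(serial_stream)))
```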
[Figure 45 diagram: Memory; Buffer Register; Transfer Register;
Accumulator-Instruction Counter; Multiplier, Quotient and Final Result
Register; Multiplicand and Divisor; Product and Remainder; FET shift
register drivers; Oscillator; Address Decode and Memory Timing Controls;
Clock Gates; Computer Timing; Multiply-Divide Timing]
Figure 45. AES Computer Flow Diagram
TABLE 11 - TMR Computer Characteristics (AES)
Type               General purpose, stored program, serial, fixed
                   point binary
Clock              2 megabits per second information rate
Speed              Add-subtract and multiply-divide simultaneously
  Add              21 µs
  Multiply         84 µs
  Divide           168 µs
Memory Type        Toroidal magnetic core, random access
Storage Capacity   Three TMR memory modules each having 8192
                   28-bit words.
Input/Output       External; computer-programmed I/O control.
                   External interrupt provided.
Component Count    47,800 integrated circuit semiconductors and
                   resistors. (4 memory modules)
Reliability        0.9994 probability of success for 90 days
Packaging          21 electronic page assemblies
Weight             69 pounds
Volume (Swept)     1.8 cubic feet
Power              102 watts
The data adapter contains approximately twice as much logic as
the computer. This is attributed primarily to the large amount of
static storage (latches) required to buffer the input and output signals
and control the counting registers. Figure 46 illustrates the basic
configuration of the data adapter.
[Figure 46 diagram: Control, Processor, Data Flow, Input Counter, Output
Counter, and Time Counter modules (A1 through A3 of each); data exchange
register (DER); discrete output registers and drivers; transformer coupled
inputs and outputs; voters; real-time counter; external inputs from CDUs,
telemetry, accelerometers, and LEM hand controller]
Figure 46. AES Data Adapter Flow Diagram
The partitioning of the data adapter into modules is not as
straightforward as in the computer, in that functional areas are not as
clearly defined. Also, the use of multiplexor logic to serialize the
parallel registers tends to reduce the total number of voters required.
This is compensated for in the number of voters required in the
interface to the rest of the system. For the system requirements, 121
interface voters are required. These are packaged separately from the
six logic modules.
Table 12 indicates how the data adapter was partitioned into six modules
and the chip or flat pack component count and interconnection
requirements. It should be noted that these modules are approximately
the size of the computer modules. This is an important consideration if
the same packaging philosophy is to be applied to the whole
computer-data adapter complex.
The data in Table 12 is based on partitioning the data adapter into
six unique modules. It is a possibility that one or two of these modules
could be standardized such that only four module types would be
required. This standardization would be at the expense of a slight increase
in module size and interconnections. The obvious savings are in the
spares required to be carried for inflight maintenance.
To preserve the symmetry of the replaceable simplex modules,
the data adapter interface voters are packaged separately. These
circuits convert the TMR output signals to a single line. Circuit
reliability was increased through use of redundant and/or quadded
components so as not to degrade the system reliability.
The results of the exploratory (Phase I) and evaluation (Phase II)
testing effort described in Section 6 indicated that adequate connector
sealing could be maintained during operation and maintenance in the
high humidity-zero gravity environment when resorting to over-all unit
sealing of the computer or data adapter. The reconfigured computer
and data adapter were packaged as a unit for the AES application; the
physical organization is illustrated by Figure 47. The individual
replaceable modules are shown in individually sealed cans.
2.9.3 Reliability Models
A reliability model for the reconfigured computer subsystem was
derived and used to compare the reliability characteristics of the
reconfigured system with the basic system of Section 2.3. The time
models and basic reliability assumptions (such as constant failure
rates) are the same as in the reliability model for the basic system,
although the failure rates of individual component parts are different
due to the radically different packaging concept.
Since the computer and data adapter operate effectively in series
for reliability computations, the model for the computer subsystem is
simply

    $$R_{ss} = R_{co} R_{da},$$

where $R_{co}$ is the computer reliability and $R_{da}$ is the data
adapter reliability.
The reorganized computer is composed of a redundant oscillator,
TMR logic, and TMR memories:

    $$R_{co} = R_{os} R_{cl} R_{mem},$$

where $R_{os}$ is the reliability of the oscillator, $R_{cl}$ is the
reliability of the computer logic, and $R_{mem}$ is the reliability of
the memory.
In the gated triplex oscillator, described in Section 2.4, successful
operation depends only on the operation of any one of the three
simplex oscillators. The reliability model is

    $$R_{os} = 1 - (1 - R_o)^3,$$

where $R_{os}$ is the reliability of the triplex oscillator and $R_o$
is the reliability of the simplex oscillator.
The computer logic, including the timing generator, is physically
divided into three replaceable subassemblies. Hard core
(nonreplaceable) hardware is included in the basic frame. The
reliability of a TMR module can be expressed mathematically as

    $$R_{trio} = 3 (R_{mod})^2 - 2 (R_{mod})^3,$$

where $R_{mod}$ is the reliability of each simplex module of a TMR
module. The reliability model for the computer logic is the product of
the reliabilities of the individual logic modules, or

    $$R_{cl} = \prod_{i=1}^{4} (R_{trio})_i.$$
The reorganized computer uses a TMR memory which operates
in a manner similar to the logic and was described in Section 2.5. The
model is

    $$R_{mem} = R_m^3 + 3 R_m^2 (1 - R_m)
              + 3 C_{s2} R_m (1 - R_m)^2
              + C_{s3} (1 - R_m)^3,$$

where $R_m$ is the reliability of one simplex memory module, $C_{s2}$
is the conditional probability that no identical address bit has failed
in the two failed simplex modules, and $C_{s3}$ is the conditional
probability that no same address bit has failed in the three failed
simplex modules. With $C_{s2} = C_{s3} = 0$ the model reduces to the
TMR module expression $3 R_m^2 - 2 R_m^3$.
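The oscillator, TMR module, and TMR memory models can be checked numerically. The following is a minimal Python sketch and not part of the study: the function names are illustrative, and the memory model assumes the binomial coefficient 3 on the two-of-three term, so that it agrees with the TMR module formula when both conditional probabilities are zero.

```python
import math


def triplex_oscillator(r_o: float) -> float:
    """Gated triplex oscillator: works if any one of three simplex oscillators works."""
    return 1.0 - (1.0 - r_o) ** 3


def tmr_module(r_mod: float) -> float:
    """TMR trio with majority voting: at least two of three simplex modules work."""
    return 3.0 * r_mod ** 2 - 2.0 * r_mod ** 3


def tmr_memory(r_m: float, c_s2: float, c_s3: float) -> float:
    """TMR memory: also credits cases where failed modules share no failed address bit."""
    q = 1.0 - r_m
    return (r_m ** 3
            + 3.0 * r_m ** 2 * q
            + 3.0 * c_s2 * r_m * q ** 2
            + c_s3 * q ** 3)


def computer(r_o, logic_module_rs, r_m, c_s2, c_s3):
    """Series product of oscillator, logic trios (replaceable plus hard core), and memory."""
    r_cl = math.prod(tmr_module(r) for r in logic_module_rs)
    return triplex_oscillator(r_o) * r_cl * tmr_memory(r_m, c_s2, c_s3)
```

Calling `computer(0.99, [0.999] * 4, 0.99, 0.9, 0.8)` with made-up simplex reliabilities returns a value below each of its three factors, as expected for elements in series.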
The reorganized data adapter is composed primarily of TMR
logic and triplex power supplies:

    $$R_{da} = R_{dal} R_{ps},$$

where $R_{dal}$ is the reliability of the data adapter logic and
$R_{ps}$ is the reliability of the power supplies.
The data adapter logic is packaged in six replaceable
subassemblies, and other hardware (including the input-output circuitry)
is mounted on the basic frame. The output drivers and the hard-core
hardware were assumed to be TMR as well as the replaceable modules.
As in the computer, the reliability of a TMR module can be expressed
as

    $$R_{trio} = 3 (R_{mod})^2 - 2 (R_{mod})^3.$$

The model for the six replaceable TMR modules and hard core is

    $$R_{dal} = \prod_{i=1}^{7} (R_{trio})_i.$$
A triplex power supply drives both the computer and data adapter.
Each supply is composed of three identical converter-regulators with
duplex error amplifiers and is connected to drive any or all three of
the TMR channels. Each regulator is protected by isolation diodes.
Successful power supply operation is defined in the following ways:
1) All regulators (with duplex error amplifiers) work; all
isolation circuits work.
2) All regulators work; two isolation circuits work and one
fails high or low. This can occur in three ways.
3) All regulators work; one isolation circuit works and two
fail (three ways).
4) Two regulators work and one fails low; all isolation circuits
work (three ways).
5) One regulator works and two fail low; all isolation circuits
work (three ways).
6) Two regulators work and one fails low; two isolation cir-
cuits work and one fails low (three ways).
7) One regulator works and two fail low; one isolation circuit
works and two fail low (three ways).
The resulting reliability model for a regulator, with one term for each
of the seven success cases listed above, is:

    $$R_{reg} = (R_S R_I)^3
              + 3 R_S^3 R_I^2 (1 - R_I)
              + 3 R_S^3 R_I (1 - R_I)^2
              + 3 R_S^2 P_{SFL} R_I^3
              + 3 R_S P_{SFL}^2 R_I^3
              + 3 R_S^2 P_{SFL} R_I^2 P_{IFL}
              + 3 R_S P_{SFL}^2 R_I P_{IFL}^2,$$

where $R_{reg}$ is the triplex reliability of one regulator, $R_S$ is
the reliability of one converter-regulator with duplex error amplifiers,
$R_I$ is the reliability of one isolation diode, $P_{SFL}$ is the
probability that a converter-regulator fails low, and $P_{IFL}$ is the
probability that an isolation diode fails low. Since there are six
regulators in the power supply, the reliability of the power supply is
given by

    $$R_{ps} = \prod_{i=1}^{6} (R_{reg})_i.$$
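The seven success cases map one-to-one onto the terms of the regulator model. The following Python sketch is illustrative only: the function and parameter names are assumptions, and the second term (two isolation circuits work, one fails in either direction) is written as 3 R_S^3 R_I^2 (1 - R_I), which is the reading implied by case 2.

```python
import math


def regulator_reliability(r_s: float, r_i: float, p_sfl: float, p_ifl: float) -> float:
    """Triplex regulator reliability as the sum of the seven success cases.

    r_s   : reliability of one converter-regulator (with duplex error amplifiers)
    r_i   : reliability of one isolation diode
    p_sfl : probability that a converter-regulator fails low
    p_ifl : probability that an isolation diode fails low
    """
    q_i = 1.0 - r_i  # isolation circuit fails, high or low
    terms = [
        (r_s * r_i) ** 3,                          # 1) everything works
        3 * r_s ** 3 * r_i ** 2 * q_i,             # 2) one isolation circuit fails
        3 * r_s ** 3 * r_i * q_i ** 2,             # 3) two isolation circuits fail
        3 * r_s ** 2 * p_sfl * r_i ** 3,           # 4) one regulator fails low
        3 * r_s * p_sfl ** 2 * r_i ** 3,           # 5) two regulators fail low
        3 * r_s ** 2 * p_sfl * r_i ** 2 * p_ifl,   # 6) one regulator, one isolation fail low
        3 * r_s * p_sfl ** 2 * r_i * p_ifl ** 2,   # 7) two regulators, two isolations fail low
    ]
    return sum(terms)


def power_supply_reliability(regulator_rs) -> float:
    """Series product over the regulators in the supply (six in the study)."""
    return math.prod(regulator_rs)
```

With perfect regulators (`r_s = 1`, `p_sfl = 0`) the first three terms sum to `1 - (1 - r_i)**3`, which is a quick sanity check on the coefficients.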
2.9.4 Reliability Estimates
Simulations were performed to derive the reliability estimates
for the reorganized computer and data adapter. These reliability
estimates are summarized in Table 13 for the time-critical phases and
for the mission. The latter is estimated for duty cycles of 100, 50,
and 25 percent and for zero and non-zero standby failure rates. The
mission reliability figures represent the basic long-term reliability of
the equipment. The sparing requirements to raise these figures to the
required 0.9994 mission reliability are described in Section 2.9.5.
Note that the required computer subsystem reliability of 0.999999 for
the critical mission phases has been achieved by the basic TMR
configuration without resorting to the TMR/simplex switching mode. The
instrumentation of this TMR/simplex mode would provide significant
improvement even though the study goal has been met without it.
Table 14 summarizes these reliability estimates of the AES
equipment in the TMR/simplex mode. These estimates represent only
the degree of reliability attainable with this mode of operation but do
not account for the possible reliability degradation introduced by the
switching mechanism required to accomplish this.
TABLE 13 - Reliability Estimates (AES System - TMR Mode)

                                 Mission reliability by duty cycle
                  Critical                Non-op failure rate > 0   Non-op failure rate = 0
Element           Phase        100%       50%        25%            50%        25%

Computer          0.9999994    0.978717   0.989052   0.993031       0.994346   0.998542
  Oscillator      ~1.000       ~1.000     ~1.000     ~1.000         ~1.000     ~1.000
  Logic           0.99999999   0.999647   0.999822   0.999888       0.999909   0.9999765
  Memory          0.9999994    0.979063   0.989228   0.993142       0.994437   0.998566
Data Adapter      0.9999997    0.991132   0.995442   0.997092       0.997639   0.999372
  Logic                        0.999389   0.999697   0.999808       0.999845   0.999959
  Input/Output                 0.991728   0.995743   0.997282       0.997793   0.999412
Power Supply      0.99999992   0.999900   0.999951   0.999975       0.999976   0.999995
System            0.9999991    0.969941   0.984496   0.990118       0.991976   0.997911

TABLE 14 - Reliability Estimates (AES System - TMR/Simplex Mode)

                                 Mission reliability by duty cycle
                  Critical                Non-op failure rate > 0   Non-op failure rate = 0
Element           Phase        100%       50%        25%            50%        25%

Computer          0.99999946   0.981708   0.990500   0.993921       0.995062   0.998716
  Oscillator      ~1.000       ~1.000     ~1.000     ~1.000         ~1.000     ~1.000
  Logic           0.99999999   0.999824   0.999911   0.999944       0.999955   0.999988
  Memory          0.99999947   0.981882   0.990588   0.993977       0.995107   0.998728
Data Adapter      0.99999988   0.995488   0.997693   0.998532       0.998809   0.999685
  Logic           0.99999999
  Input/Output    0.99999925   0.995788   0.997844   0.998627       0.998886   0.999705
Power Supply      0.99999997   0.999900   0.999952   0.999975       0.999976   0.999995
System            0.99999927   0.977181   0.988167   0.992437       0.993854   0.998397

2.9.5 Sparing Considerations
In Section 2.9.4 reliability estimates of the AES computer/data
adapter system were presented. Mission reliabilities for the various
types of missions ranged from about 0.9699 to 0.9979, all of which fall
below the required 0.9994 for the AES mission. In order to meet the
required reliability, inflight maintenance in the form of spare
subassemblies had to be implemented. The problem then was one of
determining the optimum spares inventory required to achieve the
desired reliability. Optimization of this inventory for this study was
based on the criterion

    $$R_{wi} = \frac{\Delta R_i \, P_{ai}}{W_i},$$

where
$R_{wi}$ = Weighted ranking criterion for ith spare
$\Delta R_i$ = Increase in system reliability provided by ith spare
$P_{ai}$ = Probability of detecting a failure in the ith module
$W_i$ = Weight of the ith spare
An ordered list of spares was generated which provides the great-
est increase in system reliability for the added weight of the spare.
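The ranking step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used in the study: the class and function names are hypothetical, the detection probability is set to 1.0 for both candidates, and the reliability increments are the first two entries of Table 16 treated as a static snapshot (in the study the increment of each spare depends on the spares already selected).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    delta_r: float    # increase in system reliability provided by this spare
    p_detect: float   # probability of detecting a failure in the module (assumed)
    weight_lb: float  # weight of the spare in pounds


def ranking(c: Candidate) -> float:
    """Weighted ranking criterion: reliability gained per pound, scaled by detectability."""
    return c.delta_r * c.p_detect / c.weight_lb


def rank_spares(candidates):
    """Return candidates ordered from best to worst ranking criterion."""
    return sorted(candidates, key=ranking, reverse=True)


candidates = [
    Candidate("Memory Module", 0.01608645, 1.0, 4.50),
    Candidate("Input/Output", 0.00737480, 1.0, 0.43),
]
ordered = rank_spares(candidates)
```

Although the memory module yields the larger reliability increment, the input/output page ranks first because it gains more reliability per pound.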
Table 15 is a list of subassemblies in the system which are
available for sparing.
Tables 16 through 20 are lists of spares selected according to
this criterion. Each of these lists represents those spares required for
a mission of the specified equipment duty cycle and nonoperating
failure rate condition. Each list consists of two parts: 1) a list of the
subassemblies available for sparing the system and 2) the list showing
the order and quantity of each subassembly selected, the resultant
noncritical phase system reliability, and the cumulative weight.
Table 21 is a summary of the improved reliability showing the
combined reliability of the launch, orbit inject, and orbit adjust phases;
the orbit phase reliability before and after sparing; the weight of spares
required to obtain this reliability; and the total mission system
reliability just prior to re-entry.
This configuration includes the possibility that only two of the
three modules of each TMR trio will survive until the end of the orbit
phase. Therefore, a full TMR configuration will not exist just prior to
re-entry. Without the full TMR capability, the critical re-entry phase
will not meet the critical phase reliability requirements.
If re-entry must be considered as a critical phase and must meet
that requirement, it will be necessary to carry an additional
complement of spares in order to restore the system to its full redundant
capability just prior to re-entry. This requires one each of the logic
pages and memory modules and two power supply modules, a total of
16.8 pounds, in addition to the spares required to achieve the mission
requirement.
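The 16.8-pound figure follows from the subassembly weights in Table 15. A short Python check, assuming "one each of the logic pages" means one of each of the ten 0.43-pound page types (three computer and seven data adapter); the names below are illustrative:

```python
# Weights in pounds, taken from Table 15.
LOGIC_PAGE_LB = 0.43
MEMORY_MODULE_LB = 4.50
POWER_SUPPLY_MODULE_LB = 4.00
N_LOGIC_PAGE_TYPES = 10  # 3 computer pages + 7 data adapter pages (assumed reading)


def restore_tmr_weight(memory_modules: int = 1, power_supply_modules: int = 2) -> float:
    """Weight of the extra spares needed to restore full TMR before re-entry."""
    return (N_LOGIC_PAGE_TYPES * LOGIC_PAGE_LB
            + memory_modules * MEMORY_MODULE_LB
            + power_supply_modules * POWER_SUPPLY_MODULE_LB)


total_lb = restore_tmr_weight()  # 4.3 + 4.5 + 8.0
```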
TABLE 15 - List of Available Spares

Spare
Number   Subassembly Name       Weight (lbs)

         Computer
  1        Arithmetic              0.43
  2        Address Register        0.43
  3        Control/Timing          0.43
         Memory
  4        Memory Module           4.50
         Data Adapter
  5        Output Counter          0.43
  6        Input Counter           0.43
  7        Time Counter            0.43
  8        Data Flow               0.43
  9        Control                 0.43
 10        Processor               0.43
 11        Input/Output            0.43
         Power Supply
 12        Power Supply Module     4.00
If re-entry is not a critical phase, the system may be in a "two
parallel" configuration at the end of the orbit phase, that is, only two
simplex modules of each trio may be operating. If the TMR/simplex
switching modification is available, one each of these parallel modules
may be switched out and the entire system operated simplex during
re-entry.
Table 22 shows the unit and system reliabilities for the re-entry
phase for each of these three equipment configurations.
Table 23 then gives the total mission reliability for the AES
system with the three re-entry configurations. Only when the system
operates at its full redundant capability during re-entry will the total
mission reliability requirement of 0.9994 be achieved.
TABLE 16 - On-board Spares - 100-Percent Duty Cycle

Duty Cycle 100%

Spare    Subassembly       Delta         Cumulative     System
Number   Name              Reliability   Weight (lbs)   Reliability

11       Input/Output      0.00737480     0.43          0.97749496
 4       Memory Module     0.01608645     4.93          0.99358141
11       Input/Output      0.00071146     5.36          0.99429286
 4       Memory Module     0.00443837     9.86          0.99873123
 1       Arithmetic        0.00012259    10.29          0.99885383
 9       Control           0.00012093    10.72          0.99897476
 3       Control/Timing    0.00011069    11.15          0.99908544
 8       Data Flow         0.00009657    11.58          0.99918202
 7       Time Counter      0.00009165    12.01          0.99927367
 6       Input Counter     0.00008464    12.44          0.99935830
10       Processor         0.00008128    12.87          0.99943958

12.87 Pounds of spares required.
TABLE 17 - On-board Spares - 50-Percent Duty Cycle,
Non-op Failure Rate > 0

Duty Cycle 50%
Non-op failure rate > 0

Spare    Subassembly       Delta         Cumulative     System
Number   Name              Reliability   Weight (lbs)   Reliability

11       Input/Output      0.00388689     0.43          0.98851638
 4       Memory Module     0.00894830     4.93          0.99746468
11       Input/Output      0.00024814     5.36          0.99771282
 4       Memory Module     0.00171687     9.86          0.99942970

9.86 Pounds of spares required.
TABLE 18 - On-board Spares - 25-Percent Duty Cycle,
Non-op Failure Rate > 0

Duty Cycle 25%
Non-op failure rate > 0

Spare    Subassembly       Delta         Cumulative     System
Number   Name              Reliability   Weight (lbs)   Reliability

11       Input/Output      0.00249953     0.43          0.99272476
 4       Memory Module     0.00592437     4.93          0.99864914
11       Input/Output      0.00012162     5.36          0.99877076
 4       Memory Module     0.00089125     9.86          0.99966200

9.86 Pounds of spares required.
TABLE 19 - On-board Spares - 50-Percent Duty Cycle,
Non-op Failure Rate = 0

Duty Cycle 50%
Non-op failure rate = 0

Spare    Subassembly       Delta         Cumulative     System
Number   Name              Reliability   Weight (lbs)   Reliability

11       Input/Output      0.00203302     0.43          0.99410621
 4       Memory Module     0.00487773     4.93          0.99898394
11       Input/Output      0.00008739     5.36          0.99907133
 4       Memory Module     0.00065684     9.86          0.99972817

9.86 Pounds of spares required.
TABLE 20 - On-board Spares - 25-Percent Duty Cycle,
Non-op Failure Rate = 0

Duty Cycle 25%
Non-op failure rate = 0

Spare    Subassembly       Delta         Cumulative     System
Number   Name              Reliability   Weight (lbs)   Reliability

11       Input/Output      0.00053165     0.43          0.99849373
 4       Memory Module     0.00140191     4.93          0.99989564

4.93 Pounds of spares required.
TABLE 21 - Reliability Improvement Due to Sparing

Mission   Spare     Orbit Phase   Orbit Phase   Mission Reliability
Duty      Weight    (Before       (After        (Prior to Re-entry)
Cycle     (lbs)     Sparing)      Sparing)

100       12.87     0.970120      0.999439      0.999437
50*        9.86     0.984629      0.999429      0.999427
25*        9.86     0.990225      0.999662      0.999660
50**       9.86     0.992073      0.999728      0.999726
25**       4.93     0.997962      0.999895      0.999893

*  Non-op failure rate > 0
** Non-op failure rate = 0
Pre-orbit Reliability = 0.9999981
TABLE 22 - AES System Reliability - Re-entry Phase

                        System Configuration
Element          TMR            Two Parallel   Simplex

Computer         0.999999996    0.9998688      0.9999244
Data Adapter     0.999999908    0.99943172     0.9997158
Memory           0.999999763    0.99941414     0.9996240
Power Supply     0.999999999    0.99977775     0.99977775
System           0.999999667    0.9984317      0.99904227
TABLE 23 - AES Mission Reliability

Mission             Total Mission Reliability
Duty Cycle     TMR         Two Parallel   Simplex

100%           0.999436    0.997931       0.998479
50%*           0.999426    0.997921       0.998469
25%*           0.999659    0.998153       0.998702
50%**          0.999725    0.998219       0.998768
25%**          0.999892    0.998386       0.998935

*  Non-op failure rate > 0
** Non-op failure rate = 0
In Section 2.8, a concept of the ultimate reliability of the TMR/
simplex mode was discussed. This mode of operation is actually a
sparing situation because the switched-out module may ultimately be
used as a switchable spare.
Table 24 presents the reliability estimates of the AES system
with this capability in the noncritical coast phase only.
TABLE 24 - AES System Reliability - Switchable Spare Mode

                            Non-op failure rate > 0     Non-op failure rate = 0
Element         100%        50%          25%            50%           25%

Computer        0.9999993   0.99999997   0.9999998      0.999999991   0.99999998
Memory          0.99859     0.99926      0.99974        0.99981       0.999974
Data Adapter    0.999851    0.999946     0.999973       0.999980      0.999997
Power Supply    0.99990     0.999951     0.999975       0.999976      0.999995
System          0.99830     0.99915      0.99969        0.99976       0.999966
The system must be in a TMR configuration for the critical re-
entry phase to meet its reliability requirement. With the switchable
spare capability, the entire system may be operating simplex at the
end of the coast phase. To restore it to a TMR configuration would
require two additional subassemblies of each type, a total of 25.6
pounds. If the system is not restored to a TMR configuration, it may
operate simplex throughout the re-entry phase.
Table 25 shows the system mission reliability with the two re-
entry configurations. Only with the TMR configuration during re-entry
is the mission reliability requirement achieved, but this can be accom-
plished more economically by selecting a more optimum stock of
spares.
TABLE 25 - AES System Reliability - Total Mission

                            Non-op failure rate > 0   Non-op failure rate = 0
Re-entry
Configuration   100%        50%         25%           50%         25%

TMR             0.998337    0.999154    0.999685      0.999763    0.99996
Simplex         0.997381    0.998197    0.998728      0.998803    0.99900
2.9.6 Error Detection and Fault Isolation
Simulations were performed on the basic computer using the
IBM 7090 logic simulator to determine the error detection and fault
isolation capabilities of the computer. Since Saturn-V test programs
were found to be adequate for simulation purposes, the appreciable test
programming effort which was visualized at the start of the study was
not required. The test programs were used to operate the logic
simulator, which simulated selected failure conditions and produced error
symptoms.
Error detection is accomplished in the Saturn-V computer by
means of disagreement detectors which sense a difference in the three
channels feeding a voter. These disagreement detectors consist of
three-way exclusive OR circuits. Since the Saturn-V computer contains
approximately 200 disagreement detectors, the outputs of groups of
detectors are "OR'd" together to provide 13 signals to an error monitor
register in the data adapter. The inputs to the detectors are clocked to
allow time for the input signals to reach steady-state conditions before
sampling.
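The relationship between a voter, its disagreement detector, and the OR-ed error monitor bits can be sketched in Python. This is a behavioral illustration only: the function names and the grouping passed to `error_monitor` are hypothetical, and the actual assignment of the roughly 200 detectors to the 13 monitor positions is not reproduced here.

```python
def vote(a: int, b: int, c: int) -> int:
    """Majority voter: output follows at least two of the three channel bits."""
    return (a & b) | (a & c) | (b & c)


def disagree(a: int, b: int, c: int) -> bool:
    """Disagreement detector: true whenever the three channel bits are not all equal."""
    return not (a == b == c)


def error_monitor(detector_outputs, groups):
    """OR the outputs of each detector group into one error monitor bit.

    detector_outputs : list of booleans, one per disagreement detector
    groups           : list of index lists, one per error monitor position (assumed layout)
    """
    return [any(detector_outputs[i] for i in group) for group in groups]


# One channel disagrees: the voter still delivers the majority value,
# and the detector flags that a difference exists (but not which channel).
channels = (1, 1, 0)
voted = vote(*channels)
flag = disagree(*channels)
```

Because the detector reports only that a difference exists, any diagnosis of which channel failed has to come from additional switching or analysis, as the following paragraphs discuss.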
The logic simulation of the basic computer confirmed earlier
simulation results with the Saturn-V computer that indicated a 99-
percent detection efficiency based on the disagreement detectors. That
is, once the logic was screened for undetected failures involving re-
dundant logic elements or circuits which were included in the computer
design to conserve power or to ensure against marginal conditions,
99-percent of the component failures injected into the simulator were
detected by the disagreement detectors when the simulator was being
operated by a standard operation code exerciser test program. In fact,
extensive propagation of errors through the computer tended to be
sensed by many detectors even though these detectors were not directly
associated with the logic containing the fault, thus masking the source
of the error by overdetection.
Although the computer was judged on the basis of the logic
simulation results to possess adequate error detection capabilities, fault
isolation under AES mission conditions was judged inadequate in several
areas. An assumption of the study was that error detection and fault
isolation should necessarily be automatic. Since the disagreement
detectors indicate only that there is a difference in the information con-
tained in the three channels and not which channel disagrees with the
other two, channel or module switching would have to be performed by
the astronaut to isolate the error to a replaceable simplex module.
Also, since the voters existed at the module interfaces and since the
detectors are positioned at the voter inputs, only clever analysis of the
disagreement patterns by the astronaut could differentiate between a
voter failure in one module and a logic failure in the module following
the suspected voter.
Overdetection occurs by propagation of errors in time as well as
in circuitry. Saturn-V disagreement detectors are clocked every like
clock time (for example, any one disagreement detector may be clocked
every x-time, another every y-time, etc.). As a result, detectors are
sensing for disagreements between the simplex modules of a TMR trio
even at times when those modules are not being used by the program,
making any diagnostic correlation between the detectors and program
very difficult.
The "OR-ing" network which reduced the 200 disagreement
signals to 13 error monitor signals was found to need modification to
improve the fault isolation capabilities of the computer. Several identical
disagreement patterns resulted from different component failures which
could have been isolated if the respective detector outputs had been
routed or clocked in different times to different positions in the error
monitor register.
For example, Table 26 shows the error patterns of some voter
failures which gave the same diagnostic error symptoms, although the
failure signals were located in different modules.
To unscramble identical error-pattern symptoms, simulation
experiments were performed. Errors were injected into the simulation
program, and the error propagation was noted. This was done by
observing the responses of the disagreement detectors. From these
results, optimum timing and placement of some disagreement detectors
(DD) could be determined.
An example of this optimization is shown in Table 27. In the first
failure, symptoms could be unscrambled by clocking the status of the
TRSN disagreement detector into the error monitor register at either
instruction CLA 776 or STO 776. CLA 776 would pinpoint the error to
the RDV signal net, and STO 776 would indicate that the MD2V signal
net has failed.
In the second example, the CDS and SYLO disagreement detectors
would be removed from the OR EP5 and EP9 positions and assigned to a
new error monitor position. This would permit isolation of the STONV
and RUNNV signal net failures. Similar results can be obtained for the
other conflicting symptom conditions. A trade-off would have to be
made in cost of hardware to obtain this failure isolation capability.
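Finding which failures share an error pattern, and confirming that a reassigned detector separates them, is a simple grouping exercise. The Python sketch below is illustrative only: the signal names come from Table 26, but the monitor positions attached to them are hypothetical placeholders, not the patterns recorded in the study.

```python
from collections import defaultdict


def ambiguous_groups(signatures):
    """Group failure signals whose error monitor patterns are identical.

    signatures : dict mapping a failed signal name to the set of error
                 monitor positions it sets.
    Returns a list of sorted name lists, one per pattern shared by two or
    more signals.
    """
    by_pattern = defaultdict(list)
    for signal, pattern in signatures.items():
        by_pattern[frozenset(pattern)].append(signal)
    return [sorted(names) for names in by_pattern.values() if len(names) > 1]


# Hypothetical patterns: STONV and RUNNV are indistinguishable.
before = {"STONV": {5, 9}, "RUNNV": {5, 9}, "A9V": {3}}

# After routing one detector to a new (hypothetical) monitor position 13,
# the two failures produce different patterns.
after = {"STONV": {5, 13}, "RUNNV": {5, 9}, "A9V": {3}}

conflicts_before = ambiguous_groups(before)
conflicts_after = ambiguous_groups(after)
```

Each added monitor position costs hardware, so a check of this kind would be repeated for every candidate reassignment before committing to one.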
TABLE 26 - Disagreement Patterns

Failure   Signal    Module      Error Pattern of Detectors
No.       Name      Location    (positions 7 11 5 1 2 3 4 6 9 10 12 8)

1         RDV       7           X X
          MD2V      4           X X
2         RUNNV     7           X X X X
          STONV     4           X X X X
3         TRSN      2           X X
          OP4N      5           X X
          PIO       5           X X
4         VO4V      4           X X
          RUNV      5           X X
5         A9V       6           X
          TR10V     2           X
6         TBCV      Timing      X X X
          OP3NV     5           X X X
          Q8        4           X X X
2.10 Transient Protection
Considerable effort was expended to ensure adequate protection
of the computer memory from voltage transients of external origin.
The nature of the Gemini computer memory alterations was investigated
to determine if these same types of alterations can occur in the
Saturn-V or AES computers. (Most of the Gemini computer memory alterations
appeared to result from the alteration of address information. The
addition or deletion of bits resulted from voltage transients between the
computer chassis and signal ground.)
Methods of preventing memory alterations due to address
modifications were investigated. Consideration was given to the magnitude of
this problem in relationship to simple hardware failures which can be
overcome by redundancy. Possible solutions to the transient problem
involved additional selective redundancy, separation of channels to the
extent practical, and improved circuitry.
2.10.1 Gemini Experience
The results of the transient susceptibility tests of the Gemini
computer performed over the last few months were reviewed to
determine their applicability to the AES-EPO computer organization. The
most transient-sensitive areas of the Gemini computer (next to jitter
in the cross-over detectors) were found to be the memory sense lines
and the output lines of the delay line sense amplifiers. Since the
memory sense lines operate with signal levels in the order of 10 millivolts,
low-level noise introduced on these lines will cause zeros to be read as
ones. The delay line drive and sense amplifiers are cabled to the
remainder of computer logic (rather than connected by multi-layer board
lines) and are therefore more noise sensitive than other logic circuits
of essentially the same noise rejection levels.
The specific types of errors encountered during the transient
susceptibility tests included cross-over detectors, accumulator shift,
add and subtract, multiply, RDR, DAS, DCS, modified diagnostic
program, and I/O processor failures. Although the shift, add, subtract,
and multiply errors appeared to be of different types, these errors
were found to have a common source. In each case, the errors were
the result of transient noise coupled into the memory sense lines on the
panel interconnecting the memory multiplex circuits. The noise
resulted in the incorrect reading of memory data.
Because of the low signal levels in these areas, it was not feasible
to measure the magnitude of the noise. The output of the memory sense
amplifiers was observed while injecting transient noise into the cable
test loop. Errors occurred mainly in memory locations having the
most zeros.
Analysis of the Gemini data indicates that the following features
should be incorporated into the AES design to decrease the susceptibility
of the memories to voltage transients:
1) TMR organization
2) Improved layout of the multilayer interconnection board
(MIB) sense lines
3) Limited bandwidth sense amplifiers
4) Alternately strobed memories
5) Isolated memory grounding.
2.10.2 TMR Organization
Whether or not a TMR organization per se will decrease the
transient susceptibility of the computer subsystem has not been
determined. Unless the transient manifests itself as a local phenomenon
within the computer, the TMR organization will be no less susceptible
than a simplex organization. The Gemini tests gave somewhat
conflicting results regarding the propagation characteristics of
externally generated voltage transients. Isolating the computer chassis by
removing ground shields in the cable apparently decreased the
threshold of the transient noise applied to the chassis relative to power and
reference ground by altering the number of return paths and thereby
changing the distribution and concentration of currents flowing in the
chassis. On the other hand, changes in the point of contact of the noise
generator probe on the computer chassis seemed to have no effect on
the threshold level. An investigation of the propagation characteristics
of externally generated voltage transients in computer organizations
should be performed but is beyond the scope of this study. The
assumption is therefore made that additional transient protection must be
designed into the AES-EPO computer organization, especially in the
area of TMR memories.
2.10.3 Sense System
Tests at IBM have indicated that the rise time of a voltage
transient is probably the most critical parameter in defining the transient
susceptibility of a digital machine, even more significant than voltage
levels or durations. Rise times of the order of a few nanoseconds at
relatively low voltage levels have resulted in computer failures in the
laboratory while longer rise times at substantially higher voltage levels
have been tolerated. Unfortunately, transients with rise times of 50
nanoseconds and less are apparently a common occurrence in
operational systems.
The relatively poor high frequency common mode rejection of the
Gemini memory sense system could be improved in the AES computer
organization by careful MIB layout of the memory sense lines. Further
improvement in high frequency common mode rejection probably can
be attained, however, by reducing the bandwidth of the sense system to
the minimum allowed by strobe margins. It was found in the Gemini
testing that an optimum sense system bandwidth exists such that
maximum strobe margins are obtained. This maximum strobe margin at a
specific bandwidth is due to the fact that zeros read from memory have
higher frequency components than the ones. The area of a zero
therefore decreases at a faster rate than the area of a one as
the bandwidth of the sense system is reduced. The optimum bandwidth
of the memory in Gemini was found to be approximately 1 megacycle
for maximum strobe margins.
2.10.4 Memory Strobing
A class of voltage transients which appears to be common in
digital systems is a high frequency (20 megacycles or greater) burst
lasting less than 1 microsecond. Assuming that these bursts are
coupled into all three TMR channels and that they are not sufficiently
suppressed by the limited bandwidth of the memory sensing system,
memory errors will be generated. If each simplex memory of the TMR
configuration is strobed at consecutive time intervals rather than
simultaneously, however, the noise burst will affect only one channel of the
memory, and no system error will occur. The data from the alternately
strobed memories could be stored in individual buffers with synchronized
outputs.
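The benefit of alternate strobing can be illustrated with a small timing model. The Python sketch below rests on assumptions that are not from the study: a channel is treated as corrupted whenever its strobe instant falls inside the burst (a worst case), the strobe spacing of 1.5 microseconds is an arbitrary value chosen to exceed the burst length, and the function names are hypothetical.

```python
def corrupted_channels(strobe_times_us, burst_start_us, burst_len_us):
    """Mark each channel whose strobe instant falls inside the noise burst."""
    burst_end_us = burst_start_us + burst_len_us
    return [burst_start_us <= t < burst_end_us for t in strobe_times_us]


def system_error(corrupted) -> bool:
    """Majority voting masks one bad channel; two or more bad channels can win the vote."""
    return sum(corrupted) >= 2


# A burst shorter than 1 microsecond that overlaps the first strobe instant.
burst_start_us, burst_len_us = -0.2, 1.0

simultaneous = corrupted_channels([0.0, 0.0, 0.0], burst_start_us, burst_len_us)
staggered = corrupted_channels([0.0, 1.5, 3.0], burst_start_us, burst_len_us)
```

In this model the simultaneous case corrupts all three channels, while the staggered case corrupts only the first, so the voter output is unaffected. The protection holds only while the strobe spacing exceeds the burst duration.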
This instrumentation would require the reinstatement of the mem-
ory buffer registers (eliminated in the reorganized computer) and a
slight decrease in computing speed. The feasibility of the approach, the
scope of instrumentation required, and the resulting transient protection
effectiveness should be further investigated.
2.10.5 Isolated Grounds
Isolated channel grounds in transient susceptible areas such as
the memory would reduce the probability of system error due to an
externally generated voltage transient. The three channel grounds
would be routed independently to the computer memory from the common
ground plane in the data adapter. Isolated channel grounds in a
TMR memory would require pulse transformers in the output lines of
the voters in the logic module driving the memory module and differ-
ential amplifiers in the channel inputs to the output voters of the mem-
ory module.
3.0 ERROR DETECTION AND DIAGNOSIS
The Saturn-V computer and a redundant version of the Apollo
backup data adapter were examined to determine the required organization
to allow efficient error detection and failure isolation for inflight
maintenance in an AES-EPO mission. The machine organization was
required to be such as to allow failure isolation down to each separate
channel, module, voter, or replaceable spare. Failure detection efficiency
was required to be sufficient to assure that all channels are
operative prior to a critical phase of the mission. Feasibility of com-
puter programs to isolate these failures was investigated as well as the
organization of disagreement detector nets.
3.1 Approach
The Saturn-V computer and data adapter use majority voting circuits
which provide correct system operation even when several randomly
located failures exist in the hardware. Correct operation is
possible even if two failures exist in the same functional area of any
two channels if the logic feeding the voters has failed to opposite states.
Correct operation is also possible if two or more failures exist in the
same functional area of the three channels if the failure effects are
noncontinuous and occur at different times or if the failures exist at
different points in the data flow (such as different positions in a shift
register) with voting between the failed positions. Although these TMR
error masking characteristics are conducive to achieving very high
reliabilities, the problem of error detection and fault isolation is
increased by the very error masking features for which TMR was invented.
Failure conditions in the Saturn-V computer and data adapter are
detected by means of disagreement detector circuits located primarily
at voter inputs. Failure isolation is accomplished by means of disagreement
detector signal data, channel switching, module switching, and
data analysis. Special test programs are required to accomplish the
task of error detection and fault isolation. The functions of voting and
disagreement detection are shown in Figure 48.
Channel switching in the Saturn-V computer and data adapter
provides the capability of switching the TMR equipment into a simplex
operating mode by forcing two of the three channels into opposing logic
levels and thereby causing the third channel to control operation. Since
any channel may be selected as the operating channel, three simplex
modes are provided. Channel switching may be performed in the
laboratory or on the launch pad.
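The forcing principle can be checked with a short sketch. It models the voter as the two-out-of-three function and is illustrative only; the function names are hypothetical and no circuit detail of the Saturn-V voter is implied.

```python
def vote(a, b, c):
    """Two-out-of-three majority vote on single bits."""
    return (a & b) | (b & c) | (a & c)


def simplex_output(channels, operating):
    """Voter output when the two non-operating channels are forced.

    One non-operating channel is forced to 1 and the other to 0, so the
    vote follows the operating channel alone.
    """
    forced = list(channels)
    others = [i for i in range(3) if i != operating]
    forced[others[0]] = 1
    forced[others[1]] = 0
    return vote(*forced)


# The voted output equals the operating channel for every input pattern
# and for each of the three simplex modes.
for operating in range(3):
    for pattern in range(8):
        bits = ((pattern >> 2) & 1, (pattern >> 1) & 1, pattern & 1)
        assert simplex_output(bits, operating) == bits[operating]
```

Because the forced pair always splits one against one, the third input decides the vote regardless of what the forced channels were carrying.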
Figure 48. Voting and Disagreement Detection
Module switching in the Saturn-V computer provides simplex
operating capability as in channel switching, but the operating channel
can be formed by selecting modules from two or more of the three
physical channels. The data path can therefore be made to jump be-
tween channels as the data progresses from module to module as shown
by the solid path in Figure 49. Module switching may be performed
only in the laboratory.
The approach to solving the problems of error detection and fault
isolation to a replaceable module level was primarily by means of built-
in test and switching circuits in this study. Test and diagnostic pro-
grams were a secondary consideration meant to fill any gaps in the test
functions left by the hardware approach. Simulation of the computer
configurations on an IBM 7090 computer was the primary analytic tool.
3.1.1 Hardware
Any computer development program includes a set of hardware/
software trade-off studies to determine the optimum characteristics for the
particular application. In a development program for an advanced AES
guidance and control computer, such studies would include an evaluation
of the relative cost and merits of hardware and software methods
of error detection and diagnosis. Since a basic ground rule was
established for the AES-EPO study that automatic methods and uninterrupted
system operation were to be the primary criteria for trade-off,
the hardware approach was followed wherever a choice between hardware
and software existed. The resulting machine configuration therefore
does not necessarily represent the optimum configuration for AES
applications if uninterrupted system operation is not weighted as
heavily as in the study and if component count is considered more
important.
Figure 49. Module Switching
The hardware approach involved investigations of various methods
of providing disagreement detection and of combining the disagreement
signals in a manner to optimize failure isolation. Appreciable logic
design effort was devoted to the problem of minimizing built-in test
circuitry by combining the detection and voting functions in common
logic. The feasibility of automatic switching to bypass failed elements
was investigated with some success.
3.1.2 Software
As originally conceived, an appreciable study effort was to be
applied to the design of test programs to support the logic simulation
tasks and to the architecture (flow diagrams) of diagnostic programs
to supplement the failure isolation capabilities of the hardware instrumentation.
Existing Saturn-V test programs were found to be adequate
for simulation purposes, however, and no new test programs were developed
for this purpose.
Flow diagrams of programs capable of fault detection and isolation,
sufficient to ensure proper operation of all channels of the computer
and data adapter prior to any critical mission phase, were to be
developed. However, since the hardware approach to failure detection
and isolation was emphasized and since the hardware approach was
extremely successful, little need remained for special diagnostic pro-
gramming. This area of the study narrowed down accordingly to the
definition of basic requirements for efficient detection and diagnostic
programs based on simulation results.
3.1.3 Simulation
IBM developed a system simulator under the Saturn-V program
to verify the logical integrity of the computer and data adapter, determine
the effects of design changes, and evaluate test programs. Over
a period of time, however, emphasis gradually shifted to special simulator
applications where data concerning machine operation are generated
to aid an operator in isolating detected errors. This simulator
was adapted to the AES-EPO study to examine the error propagation
effects of various types of component failures on the error detection
and fault isolation capabilities of the AES computer and data adapter.
The Saturn-V system simulator is a set of IBM 7090 programs
consisting of compiler, failure injection, simulator, and diagnostic
evaluation programs. The simulator flow diagram is shown in Figure 50.
The failure injection program allows cards containing selected
failure identifications and descriptions to be read into the logic simulator
on a failure injection tape. The failure injection program also
produces a failure tape that the diagnostic evaluator program uses to
compare actual injected failure data with the results of the simulator
program.
[Flow diagram. Inputs: simulation control cards, select control cards, failure injection cards, diagnostic programming data, node edit control cards, snap edit control cards. Programs: select, failure injection, compiler, memory loader, executive, node edit, snap edit, evaluator. Outputs: node states, node and register values, evaluation report.]
Figure 50. Saturn-V System Simulator Flow Diagram
The compiler program will produce 7090 instructions for the
logic portion of the simulator program. The logic tape that feeds the
compiler provides a detailed logical description of that portion of the
machine selected from the logic master tape. The outputs of the com-
piler include (1) a simulation tape containing 7090 computer instructions
for the simulator and (2) a location tape containing the assigned 7090
core storage locations for various logical element outputs.
The simulator program can determine system states while exe-
cuting stored test or operational programs and can display on print-outs
the state of selected nodes or register contents at any time during in-
struction execution. Simultaneous failure environments are provided
by parallel simulation techniques; up to 25 multiple failures may be
injected into each of 33 simultaneous environments. Up to 100 logical
nodes may be monitored in either normal or failure simulation modes.
Special pseudo operation codes allow additional selected nodes to be
retrieved should the need arise.
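The report does not state how the parallel simulation was implemented. One common technique, shown here purely as an assumption, packs one bit per simulated environment into a machine word so that a single logical operation evaluates a gate in every environment at once. The names `N_ENV`, `inject_stuck`, and the chosen failures are hypothetical.

```python
N_ENV = 33                    # number of simultaneous environments quoted above
MASK = (1 << N_ENV) - 1       # one bit per environment


def vote(a, b, c):
    """Bitwise two-out-of-three vote, evaluated in all environments at once."""
    return (a & b) | (b & c) | (a & c)


def inject_stuck(word, env, value):
    """Force the signal in environment `env` to `value` (stuck-at failure)."""
    if value:
        return word | (1 << env)
    return word & ~(1 << env) & MASK


good = MASK                                   # signal is 1 in every environment
ch_a = inject_stuck(good, env=5, value=0)     # environment 5: channel A failed
ch_b = inject_stuck(good, env=5, value=0)     # environment 5: channel B failed too
ch_b = inject_stuck(ch_b, env=7, value=0)     # environment 7: channel B failed alone
ch_c = good

voted = vote(ch_a, ch_b, ch_c)
failed_envs = [e for e in range(N_ENV) if not (voted >> e) & 1]
print(failed_envs)
```

Only the environment with two failed channels produces a wrong voted output; the single failure in environment 7 is masked.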
3.2 Disagreement Detectors
In the TMR Saturn-V computer and data adapter, disagreement
detectors provide an output if any of the triplicated modules fail. The
disagreement detector consists of a three-way exclusive OR connected
to each set of outputs of each trio of modules. There are approximately
200 disagreement detectors in the Saturn-V Guidance Computer. The
outputs of several disagreement detectors are "OR'd" together to provide
fewer outputs to the data adapter where an error-monitor register stores
disagreement-detector outputs for telemetry transmission. The inputs
to the disagreement detectors are clocked to allow time for the inputs
to reach steady-state conditions before sampling.
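A minimal sketch of the detection and "OR-ing" arrangement follows. The signal names, the grouping, and the monitor bit labels are hypothetical and serve only to show how several detectors collapse into one error-monitor bit.

```python
def disagreement(a, b, c):
    """Return 1 when the three channel outputs are not all equal."""
    return 0 if a == b == c else 1


def error_monitor(trio_outputs, groups):
    """OR the detector outputs of each group into one error-monitor bit.

    trio_outputs maps a signal name to its (channel 1, 2, 3) values;
    groups maps an error-monitor bit to the signals it covers.
    """
    return {
        bit: int(any(disagreement(*trio_outputs[name]) for name in names))
        for bit, names in groups.items()
    }


trio_outputs = {
    "adder_sum": (1, 1, 1),
    "adder_carry": (0, 1, 0),   # channel 2 disagrees
    "mem_sense": (1, 1, 1),
}
groups = {"EP01": ["adder_sum", "adder_carry"], "EP02": ["mem_sense"]}

monitor = error_monitor(trio_outputs, groups)
print(monitor)
```

The grouping reduces the number of outputs but also hides which signal in the group disagreed, which is the isolation problem examined in the remainder of this section.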
Disagreement detector circuits can be made to sense errors between
voter inputs, voter outputs, or channel input and voter output as
shown in Figure 51. The Saturn-V instrumentation uses the method
indicated in Figure 51a. Reliability requirements or module packaging
configurations may dictate the need for using the method of Figure 51b.
The method of Figure 51c is recommended for the AES configuration
primarily because it indicates which channel is in error as well as
which module.
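The difference between comparing voter inputs with one another and comparing each channel input with the voter output can be sketched as follows. This is an illustrative logic model only; the function names are hypothetical.

```python
def vote(a, b, c):
    """Two-out-of-three majority vote on single bits."""
    return (a & b) | (b & c) | (a & c)


def input_disagreement(inputs):
    """Single detector across the voter inputs: reports only that an error exists."""
    a, b, c = inputs
    return 0 if a == b == c else 1


def channel_errors(inputs):
    """One detector per channel, comparing that channel's input with the voter output.

    The position of the 1 identifies the channel in error.
    """
    voted = vote(*inputs)
    return [x ^ voted for x in inputs]


inputs = (1, 0, 1)                     # channel 2 has failed to zero
print(input_disagreement(inputs))      # an error exists somewhere in the trio
print(channel_errors(inputs))          # the error is in channel 2
```

With a single failed channel the per-channel comparison points at that channel, whereas the comparison across voter inputs yields one undifferentiated signal for the trio.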
Error detection and diagnosis studies were performed in the areas
of optimum placement and timing of disagreement detectors in the AES
computer and data adapter. Consideration was given to the problem of
optimizing the "OR-ing" network for the detectors to provide failure
isolation by means of disagreement signals to a replaceable module
level.
Figure 51. Methods of Error Detection (a: voter inputs; b: voter outputs; c: channel input to voter output)
Failure simulation experiments performed with the Saturn-V system
simulator have shown that three error identification tags are required
for failure isolation to a logic signal level using conventional
disagreement detectors as instrumented in the Saturn-V computer and
data adapter:
1) Detection time of the error
2) Program instruction steps at times of detection
3) Error detector patterns.
Correct timing and placement of the disagreement detectors will provide
the first two failure symptoms, and selective grouping of the disagreement
detectors will provide the third.
3.2.1 Timing
The extensive propagation of errors through the computer pre-
sented the greatest problem in isolating failures to a replaceable
module. Propagated errors tend to be sensed by many detectors, even
though these detectors are not directly associated with the logic con-
taining the failure, thus masking the source of error by "overdetection".
An approach suggested during the course of the study, clocking the
detectors only at the time that the associated logic is being used, was
found to require too much additional timing circuitry to be practical.
Bit gates, phase gates, and in some cases even program step identification
were found to be required to accomplish the desired detector
timing. A method of combining the detector logic with the voter circuitry
which would partially accomplish the object of optimum timing
was investigated and is described in Section 3.3.
The Saturn-V disagreement detectors are clocked every like clock
time (for example, any one disagreement detector may be clocked every
x-time, another every y-time, etc.). As a result, detectors are sen-
sing for disagreements between the simplex modules of TMR trios even
at times when those modules are not being used by the program.
3.2.2 Placement
Error propagation has also been the major problem in attempting
an optimum placement of disagreement detectors. Although failure isolation
to a replaceable module level has been found to be feasible in the computer by
reorganization on a functional basis and by redesign of the Saturn-V disagreement
detectors, means must be found to prevent the error from propagating
from one module to another and thereby destroying the isolation (as in the
case of timing signals). An approach was investigated in which each of
the logic modules was partitioned into two or more diagnostic sections
by placing additional detectors internal to the module to provide required
isolation information.
The logic simulator was revised to allow flexible diagnostic partitioning
and used to provide data on optimum placement of disagreement
detectors.
A logic simulation was designed to determine the optimum placement
of disagreement detectors in the TMR logic. A total of 32 voters
were failed and the failure data analyzed to determine the logic level to
which the failures can be localized. The specific voters to be analyzed
were chosen as representative of the various types of combinational
and sequential circuits which would be "inputted" by the voted signals.
The instruction and computer time when any of the module interface
disagreement detectors sensed a failure was tabulated. An analysis of
the simulation results showed that:
1) 53 percent of the voter failures could be identified by knowing
which disagreement detectors had sensed the failed
conditions.
2) 40.7 percent of the voter failures could be identified by
knowing the program instruction and computer time of first
detection in addition to which detectors had sensed the
failed conditions.
3) 6.3 percent could not be identified.
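The three categories above amount to asking whether a failure's symptoms are unique. The sketch below sorts failures into the same three classes; the failure identifiers, detector names, and instruction/time values are invented for illustration and are not simulation data from the study.

```python
from collections import defaultdict


def classify(failures):
    """Sort failures by how much symptom data is needed to identify them.

    failures maps a failure id to (detector_pattern, first_detection), where
    detector_pattern is a frozenset of detector names and first_detection is
    an (instruction, computer_time) pair.
    """
    by_pattern = defaultdict(list)
    by_full = defaultdict(list)
    for fid, (pattern, first) in failures.items():
        by_pattern[pattern].append(fid)
        by_full[(pattern, first)].append(fid)

    pattern_only, with_time, unresolved = set(), set(), set()
    for fid, (pattern, first) in failures.items():
        if len(by_pattern[pattern]) == 1:
            pattern_only.add(fid)        # detector pattern alone is unique
        elif len(by_full[(pattern, first)]) == 1:
            with_time.add(fid)           # unique once instruction and time are added
        else:
            unresolved.add(fid)          # identical symptoms for different failures
    return pattern_only, with_time, unresolved


failures = {
    "V1": (frozenset({"DD3"}), ("ADD", 12)),
    "V2": (frozenset({"DD3", "DD7"}), ("ADD", 12)),
    "V3": (frozenset({"DD5"}), ("SHF", 4)),
    "V4": (frozenset({"DD5"}), ("MPY", 9)),
    "V5": (frozenset({"DD9"}), ("TRA", 2)),
    "V6": (frozenset({"DD9"}), ("TRA", 2)),
}
pattern_only, with_time, unresolved = classify(failures)
print(sorted(pattern_only), sorted(with_time), sorted(unresolved))
```

Failures left in the third class share every available symptom with at least one other failure, which is the situation attributed below to error propagation and signal feedback.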
The partitioning of the reorganized computer resulted in using
approximately 120 voters at the module interfaces. The simulation
described assumed disagreement detectors at the input of each voter
and nowhere else. The 6.3 percent of the voter failures which could
not be identified was due to error propagation within a module and signal
feedbacks between modules, resulting in identical error patterns
for different failures.
This problem was alleviated by placement of additional disagreement
detectors within the modules and at the module interfaces. To
determine the number and location of the intramodule detectors, the
four computer modules of the reorganized computer were divided into
equivalent diagnosable subunits by physical count of the signal inputs
to each of the latches and tratches in each of the modules. Table 28
summarizes the results of this count and indicates a measure of the
unbalance of signals and voters (disagreement detectors) in each module.
TABLE 28 - Signals, Logic, and Voters
Module   Name                      Signal   Latches,   Voters
Number                             Inputs   Tratches   (DD's)
1        Memory and Read              459       27        17
2        Arithmetic                  1213       74         9
3        Control Timing               387       34        26
4        Operation and Decoder        720       69        45
         Timing (Distributed          135       13        23
         among four Modules)
         Total                       2914      217       120
Of particular interest is the ratio of the total number of signal
inputs to the total number of voters (or disagreement detectors since
the DD's were located at the voter inputs). This ratio was found to be
24:1. Using this figure as the basis for organization of equivalent
diagnosable subunits, approximately 21 additional disagreement de-
tectors were required. Their distribution and effect on the detector-to-
signal ratio is shown in Table 29. The ratios are average values, which
may be misleading because the additional detectors were chosen on the
basis of individual circuit sizes within the module and on the basis of use
and criticality. The effect of these additional 21 disagreement detectors
was determined by simulation.
Based on component packaging density and intermodule wiring
considerations, the AES computer was partitioned into four modules.
Approximately 105 disagreement detector trios have been defined for
intermodule failure detection. Table 30 shows the distribution of these
detectors in the four modules.
The AES data adapter logic was partitioned into six modules containing
a total of about 65 disagreement detector trios. In addition, 21
disagreement detectors monitor signals at the interface of
the computer-data adapter unit.
TABLE 29 - Additional Disagreement Detectors
Module   Name                      Basic    Added    Modified
Number                             Ratio    DD's     Ratio
1        Memory and Read            27.0       1       25.5
2        Arithmetic                134.8      16       48.5
3        Control Timing             14.9       2       13.8
4        Operation and Decoding     16.0       2       15.3
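The ratios in Tables 28 and 29 can be reproduced from the tabulated counts. The sketch below assumes that the basic ratio is signal inputs divided by voters and that the modified ratio divides the same signal inputs by voters plus added detectors; that reading is an inference, although it agrees with every tabulated value.

```python
# Counts taken from Table 28 (per module) and Table 29 (added detectors).
signal_inputs = {1: 459, 2: 1213, 3: 387, 4: 720}
voters = {1: 17, 2: 9, 3: 26, 4: 45}
added = {1: 1, 2: 16, 3: 2, 4: 2}
timing_inputs, timing_voters = 135, 23   # timing logic distributed among the modules

total_inputs = sum(signal_inputs.values()) + timing_inputs
total_voters = sum(voters.values()) + timing_voters
overall_ratio = total_inputs / total_voters
print(total_inputs, total_voters, round(overall_ratio))

ratios = {}
for module in signal_inputs:
    basic = signal_inputs[module] / voters[module]
    modified = signal_inputs[module] / (voters[module] + added[module])
    ratios[module] = (round(basic, 1), round(modified, 1))
    print(module, ratios[module])

print(sum(added.values()))   # total number of added detectors
```

The arithmetic module stands out: nine voters cover 1213 signal inputs, which is why it receives 16 of the 21 added detectors.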
TABLE 30 - Distribution of Detectors
Module   Function                        DD's
1        Memory and Memory Interface       39
2        Arithmetic                         9
3        Address Registers                 27
4        Control                           30
The partitioning of the computer into four modules and the data
adapter into six modules seemed to be optimal from physical considerations
such as the size of a replaceable module, the number of
module interconnections, and the complexity of the computer-data
adapter unit. Assuming that a disagreement detector is placed across
each voter (comparing voter input and output signals) to isolate an
error to the channel in a TMR module, and that a disagreement detector
is also placed at the output of each voter trio to isolate voter
failures from failures in the following logic as shown in Figure 51,
then the partitioning described allows diagnostic error resolution by
means of built-in circuitry alone (no special test routines) to a replaceable
simplex level. Simulation results showed, however, that this was
not an ideal diagnostic partitioning if the detector technique is restricted
to conventional Saturn-V disagreement detectors placed across voter
inputs or if diagnostic resolution to a functional signal is required
(because the generation of identical diagnostic symptoms from entirely
different and unrelated signal failures resulted).
At present, no clearly defined ground rules exist which can be
applied to optimally partition electronic units into diagnostic modules.
Logic simulation has been used, instead, to determine the characteristics
of failed machines and the nature of error propagation in a digital
system to provide data from which such ground rules might be
derived. Two simulation experiments were performed during the study
to trace failure propagation through the computer logic. Sixty-six
simulated failures were injected into representative voter interfaces
and error propagation monitored by disagreement detectors placed at
the input to every voter and at other selected logic nodes within the
four modules of the AES computer. These nodes were selected on the
basis of the total number of signal inputs to logic latches.
These experiments provided sufficient data to partition the com-
puter into eight diagnostic modules although no change in the physical
packaging of the four AES computer modules was considered. (The
diagnostic module is defined by placement of disagreement detectors
rather than by physical packaging.) The arithmetic module of the AES
computer was partitioned further into three diagnostic modules, as
was the control module. A comparison of error signal propagation be-
tween four and eight diagnostic modules is shown in Table 31 for sample
failures. Note that there is less likelihood of identical failure symp-
toms occurring for failures in each of the four physical modules if the
additional diagnostic partitioning is instrumented. For example, a
failure in physical module 2 and another in physical module 3 caused
identical failure symptoms in physical modules 2 and 3 when the com-
puter was partitioned diagnostically into four modules but no identical
failure symptoms when the computer was partitioned into eight diag-
nostic modules.
3.2.3 Detection and Resolution
The redundant mode was designed to be the operational and failure-
detection mode of the Saturn-V computer. However, availability of a
sufficient amount of failure data based on error-monitor indications
allows a high degree of failure isolation. For this study, IBM simulated
several hundred failures using the batch-simulation technique already
described. Only single failures were injected into each simulated
machine and each failed machine was exercised for the duration of the
TABLE 31 - Error Signal Propagation

Functional Partitioning
Interface Failure    Symptoms Will Occur in
In Module            Functional Modules
Timing               1 2 3 4
1                    2 3 4
2                    1 2 3
3                    1 2
4                    1 2 1 4

Diagnostic Partitioning
Interface Failure    Symptoms Will Occur in
In Module            Dynamic Modules
Timing               1 2 3 4 5 6 7 8
1                    4 5 6 8
2                    2 3 4 5 6 7 8
3                    2 3 4 5 7 8
4                    2 4 5 8
test program. As a result, a varying error-monitor pattern was gen-
erated for each failed machine.
Of the several hundred failures injected into the computer logic,
less than 10 percent were undetected by the error monitors. These un-
detected failures involved either redundant logic elements or those in-
cluded in the computer design to conserve power or insure against
marginal conditions. A 99-percent failure-detection effectiveness was
obtained after these types of failures were screened out.
The approach to failure isolation in the simulation was based on
correlation of logic failures with error monitor patterns, pattern
changes, and sequence of pattern changes. The simulation data indicated
that about 75 percent of the failures could be isolated to a single
logic module through examination of the error monitor patterns. About
90 percent could be isolated to one or two modules. In addition, examination
of certain pattern characteristics (such as fixed-or-variable
pattern, number of pattern changes during the test program, and sequence
of error monitor changes) as the test program exercises
various portions of the computer logic provides an error resolution
of one module for almost all of the simulated failures.
Table 32 is a portion of a typical print-out from the simulation of
a redundant computer. The phase, bit, and clock time listed in the
left-hand column is the instruction fetch time, but the simulator could
be instructed to print out the actual time of occurrence of the error
signal instead. The error monitor signals are represented by the 13 EP
(error position) columns and the instruction sector and address location
by the right-hand columns.
The simulator was instructed to print out a new line every time
an EP location changed state. Consequently, only a small portion of
the test program is listed in Table 32. The particular failures simu-
lated in this run affected error monitor positions 12 and 19. Diagnostic
information __scontained not only in the generated EP signals but also
in the instructions associated with a change of state of an error monitor
and in the total number of changes in state, i.e., with the entire EP
pattern.
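The pattern characteristics named above (which monitors are set, how often each changes state, and at which instruction) can be extracted mechanically from a print-out of this kind. The sketch below is illustrative; the sample rows are the first four lines of Table 32 with the garbled characters read as 0 or 1, and all function names are hypothetical.

```python
# Error monitor positions in the column order of Table 32.
EP_NAMES = ["01", "02", "03", "04", "11", "12", "13",
            "14", "15", "16", "18", "19", "20"]


def make_bits(set_monitors):
    """Build one row of 13 EP bits from the names of the monitors that are set."""
    return tuple(1 if name in set_monitors else 0 for name in EP_NAMES)


def pattern_features(rows):
    """Summarize an error-monitor print-out.

    rows is a list of (ep_bits, sector, address) in print-out order. Returns
    the monitors that were ever set, the number of state changes per monitor,
    and a log of (monitor, new state, sector, address) for each change.
    """
    active = set()
    changes = {name: 0 for name in EP_NAMES}
    change_log = []
    previous = None
    for bits, sector, address in rows:
        for i, name in enumerate(EP_NAMES):
            if bits[i]:
                active.add(name)
            if previous is not None and bits[i] != previous[i]:
                changes[name] += 1
                change_log.append((name, bits[i], sector, address))
        previous = bits
    return sorted(active), changes, change_log


rows = [
    (make_bits({"19"}), "05", "054"),
    (make_bits({"12", "19"}), "05", "057"),
    (make_bits({"19"}), "05", "060"),
    (make_bits(set()), "05", "062"),
]
active, changes, change_log = pattern_features(rows)
print(active)
print(changes["12"], changes["19"])
print(change_log)
```

Two failures that set the same monitors can still be told apart if their change counts or the instructions at which the changes occur differ.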
Table 33 represents the results of another redundant simulation
in which the simulated failures are associated with logic pages 1 and 2,
and with error monitors 1, 2, and 3. From an examination of the failure/
monitor correlation alone (ignoring the additional diagnostic information
given by the instructions and time sequences associated with state
changes, as shown in Table 32), Table 33 indicates a high degree of
TABLE 32 -- Typical Print-out from Redundant Computer Simulation
INSTRUCTION FETCH TIME    ERROR MONITORS                           INSTRUCTION
                          EP EP EP EP EP EP EP EP EP EP EP EP EP
PHASE  BIT TIME  CLOCK    01 02 03 04 11 12 13 14 15 16 18 19 20   SECTOR  ADDRESS
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   05      054
A      09        Z         0  0  0  0  0  1  0  0  0  0  0  1  0   05      057
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   05      060
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   05      062
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   13      037
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   13      042
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   13      043
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   13      051
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   13      054
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   13      061
A      09        Z         0  0  0  0  0  1  0  0  0  0  0  1  0   13      063
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   13      064
A      09        Z         0  0  0  0  0  1  0  0  0  0  0  1  0   13      226
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   13      227
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   13      232
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   13      233
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   10      212
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   10      213
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   10      222
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   10      223
A      09        Z         0  0  0  0  0  1  0  0  0  0  0  1  0   10      326
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   10      327
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   10      332
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   10      334
A      09        Z         0  0  0  0  0  1  0  0  0  0  0  1  0   10      341
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   10      342
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   10      344
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   02      103
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   02      104
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   02      111
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   02      113
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   02      201
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   02      202
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   02      203
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   02      204
A      09        Z         0  0  0  0  0  1  0  0  0  0  0  1  0   02      206
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   02      207
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   02      211
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   02      213
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   02      222
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   02      224
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  1  0   02      225
A      09        Z         0  0  0  0  0  0  0  0  0  0  0  0  0   03      012
TABLE 33 -- Typical Redundant Computer Simulation
          EP 1        EP 2        EP 3
EP 2      Page 1      Page 2
          15%
EP 3      Page 1      Page 2      Pages 1 and 2
          12%         13%         25%
resolution between the two pages. Error monitor combinations EP1
alone, EP1/EP2, and EP1/EP3 were associated with failures injected
onto page 1; error monitor combinations EP2 alone and EP2/EP3 were
associated with failures on page 2; and error monitor EP3 alone indicated
a failure on either page 1 or page 2. Failures thus isolated to
page 1 represented 39 percent of the simulated failures, those isolated
to page 2 represented 36 percent, and those which could not be resolved
between page 1 and page 2 represented 25 percent. However, the
25 percent of unresolved failures could then be resolved by an examination
of the full pattern equivalent to that illustrated in Table 32.
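The failure/monitor correlation described above is in effect a lookup from the set of error monitors that responded to the logic page containing the failure. A minimal sketch follows; the EP2/EP3 entry for page 2 follows the layout of Table 33 and is an assumption of this sketch, as are the function and variable names.

```python
# Monitor combination -> logic page, as described for the Table 33 simulation.
ISOLATION = {
    frozenset({"EP1"}): "page 1",
    frozenset({"EP1", "EP2"}): "page 1",
    frozenset({"EP1", "EP3"}): "page 1",
    frozenset({"EP2"}): "page 2",
    frozenset({"EP2", "EP3"}): "page 2",          # assumed from the Table 33 layout
    frozenset({"EP3"}): "page 1 or page 2",       # unresolved by monitors alone
}

# Shares of the simulated failures quoted in the text, in percent.
SHARES = {"page 1": 39, "page 2": 36, "page 1 or page 2": 25}


def isolate(monitors):
    """Return the page indicated by the set of error monitors that responded."""
    return ISOLATION.get(frozenset(monitors), "unknown")


print(isolate({"EP1", "EP3"}))
print(isolate({"EP3"}))
print(sum(SHARES.values()))
```

Only the EP3-alone case needs the additional instruction and time-sequence data to separate the two pages.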
3.3 Switching
Module and channel switching, both automatic and manual, were
considered in order to increase the reliability of the Saturn-V computer
and the redundant version of the Apollo backup data adapter and to aid
inflight maintenance. Single channel operation of some modules or of
the entire computer-data adapter subsystem for noncritical mission
phases was considered.
Two new modes of operation were considered for the AES
computer system:
1) TMR/simplex
2) Switchable spare.
In the TMR/simplex mode, one or more modules of the system may
be operated simplex while the remainder of the system operates TMR.
One operational simplex module is turned off with every failed simplex
module when that TMR module is switched to simplex operation. The
switchable spare mode is an extension of the TMR/simplex mode in
which the turned-off operational module is made available if a failure
occurs in the operating simplex module. Switching problems associated
with each of these modes were examined during the course of the
study.
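The two modes can be pictured as a small state machine for one module trio. The sketch below is a behavioral illustration only; the class name, the choice of which good channel operates first, and the "failed" end state are assumptions, not part of the study.

```python
class ModuleTrio:
    """Mode bookkeeping for one TMR module with a switchable spare."""

    def __init__(self):
        self.mode = "TMR"
        self.operating = [0, 1, 2]   # channels currently in use
        self.spare = None            # good channel that has been turned off

    def report_failure(self, channel):
        """Record a failure in an operating channel and return the new mode."""
        if channel not in self.operating:
            return self.mode                     # failure in an unused channel
        if self.mode == "TMR":
            good = [c for c in self.operating if c != channel]
            self.operating = [good[0]]           # one good channel runs simplex
            self.spare = good[1]                 # the other is turned off but kept
            self.mode = "simplex"
        elif self.spare is not None:
            self.operating = [self.spare]        # switchable spare mode
            self.spare = None
        else:
            self.operating = []
            self.mode = "failed"
        return self.mode


trio = ModuleTrio()
print(trio.report_failure(1), trio.operating, trio.spare)
print(trio.report_failure(0), trio.operating, trio.spare)
print(trio.report_failure(2), trio.operating, trio.spare)
```

In this model the module survives two successive channel failures, the second by bringing the turned-off channel back into service.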
3.3.1 General
If the three signal channels can be made to function independently
at the voter, simultaneous TMR/simplex operation is possible. The
signal switching scheme is shown in Figure 52. Channel independency
is obtained by forcing the voter input to a selected binary state regardless
of the failed condition of the inputted logic. The failed logic module
is switched out of operation along with a good module, and the selected
simplex module operates with the other TMR modules. The key to
TMR/simplex operational capability is in the method of using disagreement
detectors to localize the failure to a module and switching
voltages to control the flow of data through the voter circuits.
Voter designs using two different circuit technologies were studied
for AES channel/module switching operations. The first (Saturn V)
operates as a current summer feeding a threshold circuit. The second
(integrated circuit - modified current switch) operates as a logic exclusive
OR function (vot = AB + BC + AC). Neither of these existing
designs allows independent channel operation necessary for TMR/
simplex mode capability, although both designs do allow simplex channel
and simplex module operation.
Module/channel switching is accomplished at present by voltage-
switch forcing of a binary one into one channel and a binary zero into
another. In the Saturn-V instrumentation, a binary one is forced by
grounding the +6 volt line to the logic AND gate preceding the voter,
and a zero is forced by grounding the + 12 volt input to the voter circuit.
The present modified current switch instrumentation requires the out-
puts of preceding logic NOR gates to be forced. A binary one is forced
by raising the reference supply of the NOR gate to the collector supply
voltage, and a zero is forced by lowering the collector supply to the
reference supply level.
Figure 52. TMR/Simplex Operation
The desired TMR/simplex mode capability can be realized with
the present integrated circuit design at the cost of additional logic and
additional voltage planes. On the other hand, the Saturn-V instrumenta-
tion will require only a wiring modification consisting of: 1) supplying
the voters with three independent +12 volt power lines which can be
independently switched to ground level and 2) providing a switching
capability on the -3 volt line which reduces the threshold level at the
base of the voter transistor.
3.3.2 Switchable Voter
Considerable effort was devoted to investigating methods for
using error correcting devices in lieu of the Saturn-V voters. Most of
these methods permitted a switch down to one good channel in a trio
when another channel has failed. A constraint which was placed on the
switching system was that control lines be kept to a minimum. For ex-
ample, a control line for each individual error correction device was
not considered to be feasible.
A switching device was designed with a capability of automatically
switching from TMR operation on detection of an error in one channel
to simplex operation of one of the two remaining channels. This automatic
switch affects only the logic trio in which the failure occurred,
all other trios remaining in the TMR mode. Propagation of errors is
held to a minimum by using input error correction techniques rather
than output error correction.
This design approach has two disadvantages compared to conventional
majority voters. An intermittent failure causes switching to a
simplex mode, and the trio does not recover its initial redundant state
when the intermittent has ended, as a voter circuit does. Also, the
switching device must have a preferred failure mode or else it will have
a reliability no greater than the majority voter.
This first method was then extended to give an operator control
over the switching. The logic trio would automatically switch to sim-
plex operation upon occurrence of an error, but upon the occurrence
of a second error, the operator could switch all trios involved in the
module to the channel that has not failed. All other modules would re-
main in the TMR mode.
If the error was due to an intermittent failure, the operator could
switch the module back to TMR mode when the period of the intermittent
has passed. The disadvantage of this approach is that, to achieve an
appreciable reliability gain over a voted system, the computer would
have to be partitioned into a large number of modules. This would re-
quire a large number of error monitor indicators and switches, which
the operator would be required to use.
To overcome the problem of differentiating between an intermit-
tent and a solid failure, a third method was investigated in which the
operator (or an automatic switching device) resets the switched-down
trio and checks to see if the error is again detected. The first error
encountered in each trio switches only that trio to one of the two remaining
good channels. The two good channels are compared and if a
second failure occurs to make them disagree, an automatic switching
device on the trio output selects one of the two operating channels.
There is a 50-percent probability that the automatic switch will select
the remaining good channel. If it selects the wrong channel and the
operator detects a system error, he can override the switch selection
and select the remaining good channel.
Three disadvantages exist in this approach. The logic required
to perform the switching function exceeds the other approaches. The
method corrects only the first trio in any module having two channels
that fail. Certain combinations of failures will not alert the operator,
and a diagnostic routine is necessary to enable the operator to detect
a system error. However, since the system can continue to operate
even when two channels of a trio have failed, the reliability of the
system approaches that of a majority voting system with manual replacement
of failed modules with spares. Also, each trio is switched
independently of all other trios.
A fourth method was examined which was very similar to the
third with the following exceptions. If the logic circuits have a preferred
failure direction, the fourth method will have the same probability
of selecting the proper channel as the failure preference. The
AES computer subsystem contains a great deal of single-line transfer
logic which does have a high probability of failure in one direction.
However, the logic and interface required for the fourth method is
greater than that for the third method.
Each of the methods examined provides specific advantages and
contains certain disadvantages. Further study is required to select
the best method for the AES computer subsystem.
The module/channel switching techniques considered for AES
application were simple and straightforward but assumed the existence
of a switchable voter design. The existing voter in the Saturn-V com-
puter and data adapter is module or channel switched by forcing a
logical one in one channel and a logical zero in a second channel.
Since the "votes" of these switched voters cancel, the third channel
effectively controls the voter output providing that no faults affecting
voter operation exist in the equipment. Certain failures could exist
in the Saturn-V design which might prevent forcing the two "switched"
channels to the desired logical level.
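The cancellation of the two forced "votes" can be illustrated with a short sketch. It assumes an idealized, non-inverting two-out-of-three majority function and fault-free forcing; the function names are illustrative and are not taken from the Saturn-V design.

```python
def majority(a: int, b: int, c: int) -> int:
    """Idealized two-out-of-three majority vote on binary inputs."""
    return 1 if (a + b + c) >= 2 else 0


def switched_vote(selected: int) -> int:
    """Vote with one channel forced to logical one and a second forced to
    logical zero; the third (selected) channel carries the live signal."""
    return majority(1, 0, selected)


# The forced one and forced zero cancel, so the output follows the
# selected channel for both of its possible values.
results = {value: switched_vote(value) for value in (0, 1)}
```

If either forced level cannot be established because of a fault, the cancellation no longer holds, which is the weakness noted above.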
Considerable effort was expended in reviewing technologies and
developing concepts for switchable voting circuits. Although the
primary requirement was a voter design in which the individual chan-
nel inputs could be forced to desired logic levels even in the presence
of circuit failures, the capability to turn off individual channels was
considered to be a more desirable feature. The "off" channels would
be effectively removed from the circuitry and would present neither a
logical zero nor a logical one to the voter.
The Saturn-V voter is a three-input current summing circuit in
which the bias voltage sets the logical one threshold to two units of
current out of three. This circuit could be converted to a switchable
voter of the type desired by providing means of removing power at
each of the channel inputs and simultaneously changing the bias so that
the logical one threshold is set at one unit of current. Two channel
inputs can then be turned off, and the voter operates as an inverter on
the third channel.
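The threshold change described above can be sketched as a simple behavioral model. This is an idealization under stated assumptions: each powered channel at logical one contributes exactly one unit of current, the bias is represented as an integer threshold, and the circuit's output inversion is ignored. The function and parameter names are hypothetical.

```python
def summing_voter(inputs, enabled=(True, True, True)):
    """Behavioral model of a switchable current-summing voter.

    inputs  -- logic levels (0 or 1) on the three channels
    enabled -- which channel inputs still have power applied
    """
    # Assumed bias rule: two units with all three channels powered,
    # one unit when channels have been turned off.
    threshold = 2 if sum(enabled) == 3 else 1
    # Channels with power removed contribute no current at all.
    current = sum(level for level, on in zip(inputs, enabled) if on)
    return 1 if current >= threshold else 0


tmr_output = summing_voter((1, 1, 0))
simplex_output = summing_voter((1, 1, 0), enabled=(False, False, True))
```

In the simplex call the two channels at logical one are switched off, so the output follows channel 3 alone.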
An intuitively obvious approach to switchable voters was to consider
logic voting as illustrated in Figure 53 rather than current summing
as in the Saturn-V circuits. The output of the OR gate shown in
Figure 53 is the AND signal of channels 1 and 2 or the AND of 2 and 3
or the AND of 3 and 1. If one channel (say 1) is in error, then the output
of the OR gate is 2 • 3, the logic level existing on channels 2 and 3.
Although the voting function is achieved, no practical means has yet
been discovered to provide the desired switching function with logic
circuits of this type.
Figure 53. Logic Voter (three AND gates feeding an OR gate; output = 1•2 + 2•3 + 3•1)
3.4 Crew Requirements
A primary goal was defined early in the study to automate the
error detection and fault isolation functions to the highest possible degree
and thereby minimize crew requirements for inflight maintenance.
Training, experience, and test information required by the crew to
effect repair were made negligible by the hardware approaches pursued
in the study. Man-in-the-loop operations required by the AES
instrumentation were limited to reading a bank of indicator lights to
determine the location of the failure and to making a manual replacement
of the failed module. Semi-automatic repair methods in which
the astronaut switches in wired-in spare units or changes mode were
also investigated, as well as fully automatic repair and mode changing.
Test approaches and mechanical packaging approaches were
directed towards eliminating the need for special test equipment or
tools to effect inflight maintenance. No approaches were considered
which could not be used by a suited astronaut.
3.5 Programming Requirements
The test programming requirements for AES applications were
derived mainly from evaluation of existing Saturn-V programs and
from simulation experience gained on this study and previous Saturn-V
studies. Although the hardware approach to error detection and failure
isolation taken in this study has minimized the need for special test
programs, the following sections outline the program types and requirements
for the general case where either a software approach might be
emphasized or where a mix of the two approaches has been chosen.
The Saturn-V test programs consist of four primary types:
1) Memory load and verify
2) Computer self-test
3) Data adapter test
4) Marriage test.
The test programs required for the AES computer system would be
similar except that the data adapter test and marriage test programs
would be combined, since the data adapter is packaged with the com-
puter in the AES configuration. Also, these programs would be useful
mainly for laboratory evaluation, since special test programs are not
required for inflight error detection and fault isolation. The memory
load and verify, or at least a simplified form of the one described in
the following paragraphs, would be used in flight.
3.5.1 Memory Test
The Saturn-V memory test programs are load and verify programs
which exercise the memory circuits with selected instruction
and data combinations. The programs are organized on a bootstrap
principle in that operations progress from the simplest to the most
complex tests. The programs are standard functional exercisers
which force the computer to perform the following tests:
1) Checksum
2) One's Discrimination
3) Zero's Discrimination
4) Addressing
5) Checkerboard
6) Inverted Checkerboard.
The checksum test is a check of proper memory loading and
operation of the test program.
The one's discrimination test checks the memory's ability to
write and read ones correctly. The memory buffer registers, sense
amplifiers, core array, and driving circuits are checked by this test.
The zero's discrimination test checks the memory's ability to
write and read zeros correctly. The driving circuits are checked by
this test, as well as the sense amplifiers' sensitivity to noise.
The addressing test checks whether or not each memory location
can be addressed correctly. The following circuits are checked in
addition to those covered by the one's and zero's tests: memory
selection logic, diode matrix decoders, and all memory drivers.
The checkerboard and complement test produces maximum delta
noise condition upon half read, which results in maximum inhibit noise
whenever a zero is written. The inhibit noise from a cycle where zero
was written can cause an error during the read portion of the next cycle.
The inflight memory load and verify program for the AES computer
would probably be limited to address and checksum test routines,
since the discrimination and checkerboard tests are for marginal conditions
which tend to exist early in the computer's operational life
("infant mortality") or very late in life ("wearout"). These marginal
tests would be a part of ground checkout before the start of the mission
but would be of little value during the mission because the expected
memory mission failures are catastrophic rather than marginal.
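The memory tests listed in this section can be sketched on a simulated memory. The 8-bit word, the 16-word size, the modular checksum, and the StuckBitMemory fault model are all assumptions chosen for illustration; the actual Saturn-V word format and test sequences are not reproduced here.

```python
WORD_BITS = 8   # assumed word length for the sketch
SIZE = 16       # assumed number of memory locations
MASK = (1 << WORD_BITS) - 1


class StuckBitMemory(list):
    """Simulated memory with one bit of one word stuck at zero (hypothetical fault)."""

    def __init__(self, size, bad_addr, bad_bit):
        super().__init__([0] * size)
        self.bad_addr, self.bad_bit = bad_addr, bad_bit

    def __setitem__(self, addr, word):
        if addr == self.bad_addr:
            word &= ~(1 << self.bad_bit)  # the faulty bit never stores a one
        super().__setitem__(addr, word)


def checksum(words):
    """Simple modular sum used to confirm that a load was correct."""
    return sum(words) & MASK


def checkerboard(size, inverted=False):
    """Alternating 0101.../1010... words; inverted swaps the two patterns."""
    even, odd = (0xAA, 0x55) if inverted else (0x55, 0xAA)
    return [even if addr % 2 == 0 else odd for addr in range(size)]


def write_read_test(memory, pattern):
    """Write the pattern, read it back, and return the failing addresses."""
    for addr, word in enumerate(pattern):
        memory[addr] = word
    return [addr for addr, word in enumerate(pattern) if memory[addr] != word]


def addressing_test(memory):
    """Store each location's own address so that addressing errors show up."""
    return write_read_test(memory, [addr & MASK for addr in range(len(memory))])


faulty = StuckBitMemory(SIZE, bad_addr=5, bad_bit=0)
ones_failures = write_read_test(faulty, [MASK] * SIZE)   # one's discrimination
zeros_failures = write_read_test(faulty, [0] * SIZE)     # zero's discrimination
```

With this fault model the one's discrimination pass reports address 5, while the zero's discrimination pass reports nothing, since a bit stuck at zero stores zeros correctly.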
3.5.2 Computer
The Saturn-V computer test programs are functional exercisers
which force the computer to perform each of the control, logic, and
arithmetic operations for which the computer was designed. The programs
are organized on a bootstrap principle in that the programmed
operations progress from those which exercise the least amount of
computer circuitry to those which exercise the most. In general, the
order of test instructions is as follows:
1) Transfers
2) Shifts
3) One cycle arithmetic (ADD and SUB)
4) Logic (AND and XOR)
5) Multiple cycle arithmetic (MPY, MPH, DIV)
6) Input/output operations
7) Interrupt
Within each class of instructions the test words also progress in a
bootstrap manner. For example, a shift test would progress from
shift high and low order bits to shift odd and even bits to shift all bits.
Although the computer test programs do not perform a diagnostic
analysis of detected failures, they do provide diagnostic assistance to
the operator by storing failure information.
Test data (literals) are used by the program in selected bit and
word sequences to optimize test efficiency.
The test program is written such that the normal order of instruction
upon detection of an error is to enter the error storage routine,
store the error, and return to the main program at the next program
step after that which detected the error. However, under operator
control, the program may be halted to read available data, or the program
may be recycled from the beginning.
The basic organization of the later Saturn-V test programs was
changed from a bootstrap functional exerciser to a component-oriented,
sandwiched-subroutine format.
The programs were generated by failing components system-
atically (on paper) and deriving subroutines to check each and every
failure. The subroutines were then assembled in groups of 11 instructions
followed by a special PIO, with given computer operations (such
as multiply) distributed throughout the program. If this special PIO is
not received at least every 11 instructions, the computer will assume a
runaway or inactive condition of the test resulting from a malfunction.
An alarm will be issued and the storage delay lines latched up. Instruction
addresses were chosen to exercise all drive lines in all sectors
during the program run.
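The special PIO acts as a check that the test program is still running in sequence. A minimal sketch of that check follows, assuming the instruction stream can be represented as a list of mnemonic strings and that the alarm fires on the first instruction that exceeds the 11-instruction limit; the representation and the function name are illustrative only.

```python
MAX_GAP = 11  # from the text: the special PIO must appear at least every 11 instructions


def first_alarm(stream, max_gap=MAX_GAP):
    """Return the index of the instruction at which the alarm would be
    raised, or None if the special PIO always arrives in time."""
    since_pio = 0
    for index, op in enumerate(stream):
        if op == "PIO":
            since_pio = 0          # special PIO received: restart the count
            continue
        since_pio += 1
        if since_pio > max_gap:    # a twelfth instruction with no PIO
            return index
    return None


healthy = (["ADD"] * 11 + ["PIO"]) * 3
runaway = ["ADD"] * 11 + ["PIO"] + ["ADD"] * 20
```

Here first_alarm(healthy) returns None, and first_alarm(runaway) returns 23, the twelfth instruction after the last PIO.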
The resultant test program provides advantages over a simple
functional program, although the work effort involved in generating it
is considerably greater. The component orientation of the program
requires fewer instructions. The distribution of the computer functions
throughout the program, rather than lumping each function in a particular
portion of the program, provides a better inherent capability
for detecting intermittents. Control of the test in the case of a computer
malfunction which would normally disrupt the program is provided
by the instruction grouping with the special PIO. Better diagnostic
capability is provided through operator interpretation of the
failure information and reference to logic analysis data which will be
available.
An important conclusion from computer simulation is the apparent
feasibility of constructing a diagnostic test program in which program
branching is based on error monitor indications. The main
program would be a short logic exerciser designed for efficient error
detection only, and could operate periodically during the operational
periods of the computer mission. If no error is detected, little operational
time is consumed by the test. But if an error is detected, the
program will branch to specific subroutines determined by the error
monitor patterns.
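The branching scheme can be sketched as a dispatch on the error monitor pattern. The three-bit patterns and the subroutine names below are hypothetical placeholders; the study does not specify the pattern encoding or the subroutine set.

```python
# Hypothetical diagnostic subroutines, for illustration only.
def diagnose_arithmetic(pattern):
    return f"arithmetic subroutine for pattern {pattern:03b}"


def diagnose_memory(pattern):
    return f"memory subroutine for pattern {pattern:03b}"


def diagnose_general(pattern):
    return f"general subroutine for pattern {pattern:03b}"


# Assumed mapping from error monitor pattern to diagnostic subroutine.
SUBROUTINES = {0b001: diagnose_arithmetic, 0b010: diagnose_memory}


def periodic_test(error_monitor_pattern):
    """Handle the result of one pass of the short logic exerciser."""
    if error_monitor_pattern == 0:
        return None  # no error: return to the mission program at once
    handler = SUBROUTINES.get(error_monitor_pattern, diagnose_general)
    return handler(error_monitor_pattern)
```

The no-error path does no diagnostic work, which reflects the point that little operational time is consumed unless an error is detected.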
In an actual development program, an "optimum" diagnostic
configuration would be derived by a trade-off between hardware (built-in
test circuitry) and software (test programs or routines). In the
AES-EPO study, however, the hardware approach was selected whenever
a choice existed, but consideration was given to the requirements
for eventual diagnostic programs.
The majority of test programs are written by hand and are de-
signed to test either every function or every component of the machine.
The Saturn-V computer test programs were more than operation code
exercisers. They were designed to bring to an up-level every diode
line to an AND gate except the one being tested and to determine that
the associated latch, tratch, or inverter does not set or reset. Obvi-
ously, a complete and independent check of each diode cannot be
achieved. Also, the bit pattern intended to test a specific diode will
unintentionally test other diodes in other groups of circuits, and this
situation will not be recognized when the program is being written.
Only simulation will identify these multitest conditions.
Figure 54 shows the distribution of the disagreement detectors
for a "minimum length" diagnostic program. Although this program
contains 294 instruction program steps, the final failure propagation
was determined by the 150th step. From these results, these observations
were made: either the total program written to test out all the
solid state devices was not needed or a better diagnostic symptom
distribution could be had.
The diagnostic distribution can be increased provided the dis-
agreement detectors can be properly timed. This is made possible by
instrumenting the detectors to accept failures only at selective program
steps. These steps are chosen to give predetermined failure
symptoms. Control should be made available to operate the selected
group of logic a set number of clock times.
Figure 54. (rotated plot; axis label reads: Failures Having Final Error Pattern Status, percent)
Many different types of symptoms were produced as a byproduct
of the simulation experiments. All of these were analyzed to determine
their individual and combined value in identifying logic signal failures.
Table 34 gives a summary of these results. Signal failure identifica-
tions are based on the rearranged eight-diagnostic module configura-
tion. Only unique signal identifications were tabulated.
TABLE 34 -- Symptom - Failure Correlation
No.  Observed Symptoms                                   Failures Identified
                                                         in Logic (percent)
1    First Program Step of Detected Error                10.5
2    Final Error Pattern                                 26.3
3    Time of First Detected Failure                      28.1
4    Final Error Pattern                                 20.2
5    First Three Program Steps of Detected Errors        63.2
6    First Three Program Steps of Detected Error
     and Final Error Pattern                             96.5
7    First Program Step of Detected Error
     and Final Error Pattern                             63.1
8    First Program Step of Detected Error,
     Final Error Pattern, and Phase, Bit, Clock
     Time of First Detected Error                        82.4
To test the conclusions of the simulation experiments, a failure
was physically injected in the computer by cutting a lead on an output
of a logic inverter. The symptoms are given in Table 35.
TABLE 35 -- Computer Symptoms
          Instruction Step   Operation   Data Address   Error Position
          006                CDS         121            -
Failure   007                CLA         077            6
          010                MPY         100            6
The simulation problem was to fail all of the possible signals
which could cause error position 6 to appear during instruction step
007. Based on the results of Table 34, approximately 63 percent of
the failures could be uniquely identified if only the failure program
step and final error pattern were given.
All the possible logic signals which could cause the error pattern
were simulated to fail to the stuck "1" or stuck "0" case. As shown,
only the HOP signal failure gave a duplicate symptom as observed in
the laboratory. Indeed, this was the injected failure. Although failure
isolation capability cannot be claimed from this single case, this is
sufficient evidence that the simulation results are valid. A number of
similar verified failure cases will have to be simulated to prove the
worth of the diagnostic partitioning.
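The matching of an observed symptom against simulated stuck-at failures can be sketched as a lookup in a table of simulated symptoms. Only the HOP signal name and the observed symptom (instruction step 007, error position 6) come from the text; the other signal names, every stuck value, and all remaining table entries are invented for illustration.

```python
from collections import defaultdict

# (signal, stuck value) -> (first failing instruction step, error position).
# SIG_A, SIG_B, SIG_C and all stuck values are hypothetical.
SIMULATED = {
    ("HOP", 0): ("007", 6),
    ("SIG_A", 1): ("007", 4),
    ("SIG_B", 0): ("010", 6),
    ("SIG_C", 1): ("006", 6),
}


def candidates(observed, table=SIMULATED):
    """Return the simulated failures whose symptom matches the observation."""
    return sorted(fault for fault, symptom in table.items() if symptom == observed)


def unique_fraction(table, key=lambda symptom: symptom):
    """Fraction of failures identified uniquely when only key(symptom) is known."""
    groups = defaultdict(list)
    for fault, symptom in table.items():
        groups[key(symptom)].append(fault)
    unique = sum(1 for faults in groups.values() if len(faults) == 1)
    return unique / len(table)


observed = ("007", 6)
matches = candidates(observed)
```

In this toy table the full symptom isolates every failure, while the error position alone isolates only one in four, which mirrors the gain from combining symptoms reported in Table 34.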
3.5.3 System
The Saturn-V adapter test programs are similar in organization
to the computer test programs, being component-oriented rather than
simple functional exercisers. The subroutines, assembled in functional
groups of instructions, include the following tests:
1) Real time
2) Interrupts
3) Accelerometer processor
4) Ladder pulse counter
5) Discrete output register (DOR)
6) Switch selector register (SSR)
7) Buffer register
8) Mode
9) Ladders
10) Cross-over detectors (COD's)
11) Computer telemetry
12) Error monitor register
13) Discrete input multiplexer (DIM)
14) Data output multiplexer (DOM)
An error subroutine is forced upon program detection of an error,
the normal order of instructions being to enter the error routine, store
the error, and return to the main program at the next program step
after that which detected the error. Error storage capabilities are
provided for the first 46 errors occurring in the test run. Under program
control, the program may be recycled through the subroutine in
which the error was detected, the number of passes being predeter-
mined by a constant. Upon completion of the subroutine recycling,
the program resumes normal operation. Error data may be read out
visually or by printout on operator request through the front panel
switches.
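The error routine behavior described above (store the error, optionally recycle the failing subroutine, then resume at the next step) can be sketched as follows. The 46-entry limit comes from the text; representing each subroutine as a callable that returns True on a pass, and recording a count of repeat failures, are assumptions made for the sketch.

```python
MAX_ERRORS = 46  # from the text: storage is provided for the first 46 errors


def run_test(subroutines, recycle_passes=0):
    """Run each (name, check) pair in order and collect stored errors.

    check() returns True when the subroutine passes. On a failure the
    subroutine is recycled recycle_passes times before the program
    resumes with the next subroutine.
    """
    errors = []
    for step, (name, check) in enumerate(subroutines):
        if check():
            continue
        # Recycle through the failing subroutine a predetermined number of times.
        repeats = sum(1 for _ in range(recycle_passes) if not check())
        if len(errors) < MAX_ERRORS:
            errors.append((step, name, repeats))
    return errors


program = [("real time", lambda: True),
           ("interrupts", lambda: False),
           ("mode", lambda: True)]
stored = run_test(program, recycle_passes=3)
```

A solid failure shows the same number of repeat failures as recycle passes, whereas an intermittent would show fewer, which is one reason recycling is useful to the operator.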
The Saturn-V marriage test programs were designed to check
out computer/data adapter combinations. Since the computers and
data adapters used in the marriage tests were to have been checked
individually before the marriage test, the primary purpose of these
programs was to check the computer/data adapter interface and operational
capability, especially in the area of timing.
In the AES configuration, since the computer and data adapter
are packaged in the same unit, the data adapter and marriage tests
would be combined into a single system test program. It would also be
possible to combine the computer and system test programs for the
AES configuration. However, there are advantages to applying functional
test routines to the computer and operational test routines to
the computer/data adapter system, and this would be more convenient to
implement in two test programs rather than one.
4.0 FABRICATION AND TEST
Limited fabrication was required in the study to prove the feasibility
of inflight maintenance in a high humidity-zero gravity environment.
Nine representative replaceable modules illustrating a solution to the
problem of packaging and sparing in the adverse AES environment were
fabricated by modifying Saturn-V Breadboard Computer No. 1 logic
pages.
A nonfunctional mock-up of the AES computer subsystem was fabricated
to illustrate the physical organization and packaging technique
developed by the packaging and machine organization investigations of
the study. A departure from conventional aerospace computer packaging
illustrated by this mock-up is the elimination of over-all unit
sealing in favor of individual module sealing.
Since over-all unit sealing was eliminated as a design feature,
special consideration had to be given to the problem of providing adequate
connector protection against long-term exposure to the high
humidity-zero gravity AES environment. Exploratory tests of various
connector sealing methods resulted in a gasket-silicone gel technique
which showed no appreciable change in leakage resistance between con-
tact pins even when unmated and remated in a salt water bath.
Fabrication of a special environmental test chamber was necessary
to provide a simulated AES environment for evaluating the representative
modules. This chamber provided a controllable humidity, a periodic
spray of a solution of sodium chloride and urea in water, a controllable
duty cycle on the test modules, and a means for disconnecting and reconnecting
the modules in the environment to simulate maintenance
operations.
Evaluation tests were performed on the nine representative modules
in the special test chamber. These tests were exploratory rather than
demonstrative in nature. The environmental stresses such as humidity,
temperature, contaminants, and disconnections were gradually increased
over a test period of several weeks in order to determine by what
margin the modules meet the operational requirements rather than
simply whether they qualify or not.
4.1 Computer Mock-Up
A nonfunctional mock-up of the AES computer system was fabri-
cated to illustrate the organization and packaging concepts developed
during the study. A general layout of the mock-up is shown in Figure
55. The computer and data adapter are packaged in the same structure
although there may be some installation advantages to packaging them
separately as in the Saturn-V system.
The recommended structure would be made of magnesium-lithium
and is designed for integral cooling. Since the AES computer will not
be a sealed unit as in conventional aerospace designs, no provision for
over-pressurization, relief valves, or purging was required. Most of
the electronic components are packaged in thirty logic modules, three
memory modules, and four power supply modules (including an RFI
filter). Twelve additional logic modules are included as spares or
growth potential. A large screw located in each module provides a
positive connect-disconnect technique for the logic pages and eliminates
the need for special tools such as the Saturn-V page puller. Each
module is individually sealed. The structure itself is hermetically
sealed and contains the interconnection boards and cables for the
modules.
A photograph of the completed mock-up is shown in Figure 56 in
the Appendix.
4.2 Exploratory Tests
Whatever packaging technologies are eventually selected for AES
applications, the problem of sealing the intermodule and interequipment
connectors against the high humidity-zero gravity environment will
exist. Exploratory testing of methods of sealing the Saturn-V page
connectors resulted in the selection of a gasket-silicone gel technique
for the representative module to be demonstrated according to the
Phase II test plan. A sketch of the representative module is shown in
Figure 57.
A request to use Saturn Computer Breadboard No. 1 logic pages in
the AES-EPO study to test various methods of protecting against the
high humidity-zero gravity environment was approved by NASA-MSFC.
These breadboard pages were modified by sealing the pages with a
potting compound (RTV). Since thin laminar coatings (as used on
Gemini and Saturn-V pages) were found to be porous and since thick
encapsulant coatings were found to damage connections and components
during curing, a technique was developed for the AES modules in which
a thin laminar coating was applied to the page surfaces to protect the
circuits mechanically from the thick layer of encapsulant which was
then applied over the laminar to provide a nonporous seal. The modules
were baked and evacuated before treatment to avoid sealing in moisture.
The female connector, wired into the test fixture, was sealed with a
silicone rubber gasket on top with an RTV seal around the sides and
was loaded with silicone grease to retard moisture accumulation in the
female pins.
Figure 57. Representative Module for Phase II Testing (sketch labels: page frame, potting compound, two rows of screws, cap, gasket, connector loaded with silicone grease, test fixture base sealed with RTV, test fixture wiring)
Phase I testing included the following investigations:
1) Gasket seals on the interface between the male and female
connectors
2) Sealing of the connector with various greases
3) Combinations of the above.
The technique showing the most promise is sketched in Figure 58.
Male and female Saturn-V page connectors were wired and sealed with
epoxy on their rear surfaces. A silicone rubber gasket was glued to the
face of the female connector with Dow-Corning A9-4000. The female
cap was removed and DC-3 silicone grease packed inside the connector.
The pins of the male connector were also saturated with DC-3 silicone
grease. Contact measurements before and after application of silicone
grease indicated that the grease had no measurable effect on the contact
resistance between male and female connections. Leakage resistance
checks between adjacent pins showed the following worst-case
conditions:
1) Initial leakage resistance of mated test model--500,000
megohms
2) Immersed mated connector in fresh water for 15 seconds
and shook off excess water--2,000 to 10,000 megohms,
erratic
3) Unmated connector, dried male for 20 seconds at 125
degrees Fahrenheit, remated--140,000 megohms
4) Unmated connector, immersed both halves in fresh water
for 15 seconds, shook off excess water, remated--70,000
megohms
5) Unmated connector and remated--5,000 megohms
6) Unmated connector and remated--70,000 megohms
Figure 58. Phase I Test Model (sketch labels: cable, connector, screws, cap, gasket, connector loaded with silicone grease, epoxy)
7) Unmated connector and remated--85,000 megohms
8) Unmated connector and remated--60,000 megohms
9) Unmated and remated connector under fresh water--reading
erratic
10) Unmated connector and shocked water off male on desk top,
remated--50,000 megohms
11) Unmated connector and remated--10,000 megohms.
Previous tests of a similar nature with gaskets alone and with
greases alone resulted in low leakage resistance readings. Although
the readings listed above appear erratic, they were very encouraging
from the following viewpoints:
1) The lowest leakage resistances were still in the thousands
of megohms.
2) The surface between the cap and connector of the female,
and the screw holes in the female, presented sources for
leakage which were sealed during Phase II tests.
3) Additional exploratory testing with a lighter silicone grease
did not exhibit as erratic readings as above even though
the water bath was changed to salt water.
4.3 Environmental Simulation Equipment
A special environmental test chamber was required to simulate
the high humidity-zero gravity AES environment. A photograph of the
chamber is shown in Figure 59 in the Appendix, and a functional diagram
is shown in Figure 60. Test equipment used in Phase II testing is
shown in Table 36. The chamber was designed to provide the following
nominal environmental conditions with means for varying these conditions
over a wide range:
1) Relative humidity of 90 percent
2) Temperature of 100° F
Figure 60. Functional Diagram - Test Chamber (blocks shown include: page collector supply, 6 V; page bias supply, 3 V; fog bias power supply; regulated air pressure; 115 V solenoid valve; fog nozzle; one-revolution-per-hour timer; salt and urea solution in distilled water; rubber gloves; access port; 50% and 25% duty cycle timer/switches; load box; digital voltmeter; muffin fan; air heater; water reservoir for humidity control; water level control (float valve); wet bulb temperature control; water temperature control with thermocouple bridge; Variac autotransformer; distilled water supply)
TABLE 36 -- Test Equipment Listing
Unit                                            Type
Page collector power supply, +6v .............. Trygon Mod. $36-2.5
Page bias power supply, -3v and 6v
fog charge power supply ....................... Twin low voltage by Harrison Labs., Mod. 802-B
Duty cycle timer switches ..................... Paragon Model 4001-0
Solenoid valve, 115 Vac .......................
Timer - one revolution per hour - for
fog spray ..................................... Haydon
Muffin fan for air circulation ................ Rotron Corp.
Digital voltmeter ............................. Kintel Mod. 456
Digital readout ............................... Kintel Mod. 473-A
Water temperature controller .................. Honeywell Mod. 152C33P-36-11
Water level control ........................... Water Boy by Maid-O-Mist Co.
Water heater .................................. Chromalox, 220v, 2000 watt, Edwin L. Wiegand, Pittsburgh, Pa.
Air heater .................................... 100v 'Hot Watt' cartridge type with 10 1-1/2" square of 0.08" copper soldered to heater
Air heater control ............................ Variac, General Radio, Type W10MT3
Temperature control bridge readout ............ Honeywell Mod. 15618826-06-01-2-061
3) Water with 0.22-percent sodium chloride and 0.11-percent
urea in solution sprayed on test modules once each hour to
simulate the attraction of free water to an electrical field
under zero gravity conditions
4) Electrical charge applied to spray prior to contacting
modules to simulate ionization of free water
5) Duty cycles imposed on modules of 25, 50 and 100 percent
6) A source of hot air for drying the module connector after
it has been removed from its receptacle and sprayed, and
before it is reconnected, to simulate a possible maintenance
technique (not used during entire test)
7) Rubber gloves sealed to one wall of the chamber to allow
simulated maintenance activity (module replacement) without
changing the environmental conditions (and possibly
simulating the suited astronaut conditions).
A photograph of the chamber in operation is shown in Figure 61
in the Appendix.
The following Saturn-V Simplex Breadboard No. 1 pages (replaceable
modules) were placed on test after potting modifications:
1) Transfer Register 1 6109211
2) Multiply and Divide 1 6109212
3) Operation Code 6109213
4) Transfer Register 2 6109214
5) Interrupt 6109215
6) Sector Register Y Decode 6109216
7) Multiply/Divkte 3 6109230
8) Arithmetic Instr¢lction Counter 6109231
9) Address Reg,stez ana X Decode 6109232
Twenty functional connections were chosen for each logic page,
and d-c voltages were applied to the input terminals of this connection
set. The d-c patterns at the output terminals of each logic page were
then monitored periodically during the test period to determine if the
environment had caused a deterioration of the logic pages or connectors.
Photographs of the test fixture containing the nine representative
test modules are shown in Figure 62 in the Appendix.
4.4 Evaluation Tests
The fabrication tasks required by the statement of work were
directed at solving the long term reliability problem by sparing. However,
inflight maintenance in zero gravity is complicated by the fact
that free water in the form of droplets, migrating to points of electrical
potential difference, exists in the spacecraft environment. Consequently,
it was necessary to solve the problem of sparing in this environment
before inflight maintenance could be utilized as a means of meeting high
reliability apportionments for long term missions.
Nine functional computer modules (Saturn V breadboard pages)
were packaged using the technique determined by exploratory testing to
be the most suitable for successful operation and maintenance in the
AES environment. All test modules and their mating receptacles were
visually examined to determine and record their condition prior to test.
Each module was initially checked electrically using the following
procedure:
1) Module placed in the receptacle.
2) Resistance measurements made from its external load
resistors to either ground or the +6 volt power supply.
3) Power applied.
4) Current flow through each of the 20 activated pins was
determined by measuring the voltage drop across the load
resistors. (Voltage sources have 10-ohm resistors in
series with their pins.)
5) Power was turned off and the module removed.
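Step 4 of the procedure infers pin current from the voltage drop across a load resistor. A minimal sketch of that conversion by Ohm's law follows. The function name and the example values (0.5 V across a 100-ohm load) are hypothetical and are not taken from the report, which does not state the load resistor values.

```python
def pin_current_ma(voltage_drop_v: float, load_resistance_ohm: float) -> float:
    """Return the current through a pin, in milliamperes, from the
    voltage drop measured across its load resistor (I = V / R)."""
    if load_resistance_ohm <= 0:
        raise ValueError("load resistance must be positive")
    return voltage_drop_v / load_resistance_ohm * 1000.0


# Hypothetical reading: 0.5 V measured across a 100-ohm load resistor.
example_ma = pin_current_ma(0.5, 100.0)
print(f"pin current: {example_ma:.2f} mA")
```

Repeating the calculation for each of the 20 activated pins would give the baseline against which later readings are compared.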
All modules were placed in their receptacles, power applied, and
currents checked. Voltage planes were supplied their nominal
voltages, and logic inputs were supplied with d-c levels.
Environmental conditions described in Section 4.3 were applied
and the test initiated.
Functional operation of the modules was checked daily for the
first five days of testing and twice a week thereafter.
Duty cycle, total number of removals and replacements, and
number of times each module was sprayed while disconnected from its
mating connector are indicated in Table 37.
TABLE 37. Test Schedule
Module Number            1    2    3    4    5    6    7    8    9
Duty Cycle (%)         100  100  100   50   50   50   25   25   25
Removals/Replacements    0    0    2    0    0    2    0    0    2
Sprays while removed     1    1    2    1    1    2    0    2    2
Plus 6-volt power was turned on and off once each 24-hour period
to achieve duty cycles less than 100 percent.
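As a rough illustration of the duty-cycle arithmetic, the sketch below converts each duty cycle in Table 37 into powered hours per 24-hour period. It assumes a single continuous on interval per period, as the once-per-day switching suggests. The function name is hypothetical.

```python
def daily_on_hours(duty_cycle_percent: float, period_hours: float = 24.0) -> float:
    """Return the hours per period that +6 volt power is applied
    for a given duty cycle expressed in percent."""
    if not 0 <= duty_cycle_percent <= 100:
        raise ValueError("duty cycle must be between 0 and 100 percent")
    return period_hours * duty_cycle_percent / 100.0


# Duty cycles used in the test: 25, 50 and 100 percent.
schedule = {pct: daily_on_hours(pct) for pct in (25, 50, 100)}
for pct, hours in schedule.items():
    print(f"{pct:3d} percent duty cycle: power on {hours:.0f} h per 24 h")
```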
Removal and replacement of the modules was performed manually
by utilizing rubber gloves fitted to the wall of the chamber. Modules
were removed from the mating connector for approximately 3 minutes
to simulate a maintenance action.
For each replaceable representative module, a log was maintained
containing a detailed time history of modification and test events,
including all pertinent environmental and test data. These logs remained
with their respective modules throughout the study period.
4.5 Test Results
Phase I (exploratory) testing of various methods of sealing
module connectors resulted in the selection of a gasket-silicone gel
technique. Although the purpose of Phase II testing was to evaluate this
method of sealing the module connectors in a simulated high humidity -
zero gravity environment, the Saturn-V logic pages used to represent
AES replaceable modules had to be sealed as well to prevent logic
failures during test. Unfortunately, the magnesium frames of the logic
pages deteriorated during the course of the test, allowing moisture to
work between the decomposed frames and the RTV sealing compound,
and logic failures did occur. However, since the purpose of Phase II
testing was not to evaluate sealing of the module itself (replaceable
modules in the AES computer would be hermetically sealed as individual
components) and since only one connector failure (a loose
gasket) could be identified during the entire test period, the gasket-
silicone gel sealing technique for electrical connectors was felt to be
demonstrated as a feasible solution to connector operation in the
adverse AES environment.
Very few test points on the nine representative modules showed
any appreciable change in voltage level during the first month of testing
although the magnesium-lithium frames of the Saturn-V logic pages
showed drastic deterioration. Those test points which did exhibit
appreciable change (over 5 or 10 millivolts) were found to be located
on those pages showing the most frame deterioration, indicating that the
voltage changes resulted from moisture leakage around the frames
(under the RTV seal) rather than at the connector.
On 7 November the tests were interrupted by a failure of the test
chamber. A connection came loose from the water tank causing a
flooding condition and drastic changes in the temperature-humidity
environment. A rash of voltage changes occurred at this time,
especially on those modules exhibiting the most frame deterioration.
4.5.1 Electrical Measurements
It is difficult to define a voltage change as a failure, since the
operation of digital systems is on-off in nature. Even the most drastic
changes in voltage levels monitored during the test may not cause a
logic failure in actual practice. For the purpose of this discussion,
however, a logic failure is defined as a voltage change of over 25
millivolts.
Of the nine representative modules, six exhibited no failures until
the test chamber failed (a test period of over a month). Two modules
survived the flood and exhibited no failures for the entire test period
(57 days). One module exhibited one failure, and a second module
exhibited two failures after the flood. One of the modules which operated
perfectly up to the time of the flood had to be disconnected following
the test equipment failure because it began to draw excessive
current from the power supplies.
One module exhibited two failures during the first month of testing.
Another module lasted 20 days of testing before experiencing its first
failure and then degraded rapidly. The ninth module experienced three
failures after only a few days of test and then stabilized until the flood,
after which it degraded steadily.
Only one module out of the nine had to be removed from the
tester due to drawing excessive current, however. Eight finished the
test period of 57 days.
In general, the frequency of failures could be correlated with
deterioration of the magnesium-lithium frame (causing the RTV seal
to peel off the circuit board). The failure mechanism seemed, therefore,
to be moisture leakage under the edge of the RTV seal rather than
by means of the connectors. Detailed examination and dissection of the
failed modules supported this conclusion.
The failures (defined as a change in voltage level at the test points
of greater than 25 millivolts) are charted for each of the test modules
in Figures 63 through 71. A number of recoveries (return of test point
readings to within 25 millivolts of the original reading) were identified
during the tests and denoted by downward pointing arrows on the charts.
The chamber failure occurred at 936 hours.
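The failure and recovery definitions above can be sketched as a simple pass over the periodic readings of one test point. This is an illustrative reconstruction, not the procedure used in the study. The function name and sample readings are hypothetical, and treating a deviation of exactly 25 millivolts as "within" the limit is an assumption.

```python
THRESHOLD_MV = 25.0  # failure criterion stated in the report


def classify_events(original_mv, readings_mv, threshold_mv=THRESHOLD_MV):
    """Return (reading_index, event) pairs for one test point.

    A 'failure' is logged when a reading first deviates from the
    original reading by more than the threshold; a 'recovery' is
    logged when a failed point returns to within the threshold.
    """
    events = []
    failed = False
    for index, reading in enumerate(readings_mv):
        deviation = abs(reading - original_mv)
        if not failed and deviation > threshold_mv:
            events.append((index, "failure"))
            failed = True
        elif failed and deviation <= threshold_mv:
            events.append((index, "recovery"))
            failed = False
    return events


# Hypothetical readings (millivolts) for a point that originally read 100 mV.
sample_events = classify_events(100.0, [100.0, 110.0, 140.0, 150.0, 120.0, 60.0])
print(sample_events)
```

Counting the points currently in the failed state at each check would give the per-module totals charted against test hours.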
As an example of how to read the charts, refer to Figure 63. No
failures occurred until the chamber flooded. The voltage level at seven
test points shifted by more than 25 millivolts at this time. Another
failure occurred at about 1125 hours but recovered 2 days later. The
test point levels then remained unchanged for the remainder of the test.
Figure 63. Phase II Module Failures (>25 Millivolts) - Module No. 211
Figure 64. Phase II Module Failures (>25 Millivolts) - Module No. 212
(chart annotation: Page Removed)
Figure 65. Phase II Module Failures (>25 Millivolts) - Module No. 213
Figure 66. Phase II Module Failures (>25 Millivolts) - Module No. 214
(chart annotation: No changes in test point readings > 25 mv)
Figure 67. Phase II Module Failures (>25 Millivolts) - Module No. 215
Figure 68. Phase II Module Failures (>25 Millivolts) - Module No. 216
Figure 69. Phase II Module Failures (>25 Millivolts) - Module No. 230
Figure 70. Phase II Module Failures (>25 Millivolts) - Module No. 231
(chart annotation: No changes in test point readings > 25 millivolts)
Figure 71. Phase II Module Failures (>25 Millivolts) - Module No. 232
(Figures 63 through 71 are plotted against Test Hours, 0 to 1400.)
The total failures of all nine modules are plotted in Figure 72 as
percentage of test points failed versus test days. This plot is of some
interest since a continuous curve drawn through the discrete test data
points generally follows a typical logistic curve (growth modified by a
limiting factor). If this figure does represent a logistic trend, it
indicates that additional failures would occur at a slower rate if the test
period were extended for an additional period of time. The "growth"
(exponentially increasing) portion of the curve could be due primarily
to deterioration of the magnesium-lithium frame, and the "constraint"
(leveling off of the curve) could be due to the better protection afforded
the logic located internally on the module pages (away from the edges
where the frame is deteriorating).
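The logistic trend suggested for Figure 72 can be illustrated with the standard logistic function, in which early growth is nearly exponential and later growth is limited by a ceiling. The sketch below is not a fit to the test data. The ceiling, rate, and midpoint values are hypothetical placeholders chosen only to show the shape.

```python
import math


def logistic(t_days: float, ceiling: float, rate: float, midpoint_days: float) -> float:
    """Standard logistic curve: growth that levels off at `ceiling`.

    t_days         elapsed test time in days
    ceiling        limiting value (for example, percent of test points failed)
    rate           growth rate per day
    midpoint_days  time at which the curve reaches half the ceiling
    """
    return ceiling / (1.0 + math.exp(-rate * (t_days - midpoint_days)))


# Hypothetical parameters, not derived from the report's data.
CEILING, RATE, MIDPOINT = 30.0, 0.2, 35.0

for day in range(0, 71, 10):
    print(f"day {day:2d}: {logistic(day, CEILING, RATE, MIDPOINT):5.1f} percent")
```

Past the midpoint each successive interval adds fewer failures than the one before, which is the behavior the text describes for an extended test period.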
4.5.2 Physical Examination
The nine representative replaceable modules were examined at
the start of the Phase II testing period for defects in the materials or
for any unusual physical characteristics. Particular attention was
given to the RTV sealing compound and its adhesion to the magnesium-
lithium module frame, to the adhesion of the silicone gasket to the
female connector, and to the pin contacts. No defects or unusual
characteristics (such as discolorations) were noted.
Figure 73 in the Appendix is a photograph of the nine modules
under test conditions (photographed through the plexiglass test chamber)
taken 9 days after the start of test. The photograph shows the beginning
of an accumulation of salts and contaminants on the aluminum test
fixture, indicating the severity of the test. Figure 74 in the Appendix
shows the same test fixture photographed out of the test chamber after
27 days of continuous testing.
Figure 72. Summary of "Failures" (percentage of test points failed versus Test Days)
Figure 75 shows seven of the nine modules removed from the
test fixture after 27 days of testing. The two modules not shown could
not be removed from the test fixture because of adhesive corrosion
between the module frames and the fixture guides. The deterioration
of the frames of the seven modules shown is noticeable, especially of
Module 212 (middle right hand of photograph).
The test chamber failed 32 days after start of the test. Eight
modules were removed from the chamber and test fixture at this time
and photographed. Difficulty was experienced in removing several of
the modules from the fixture because of adhesive corrosion, and one
module could not be removed. Photographs of the eight individual
modules are shown in Figures 76 through 83, which are in the Appendix.
Testing of Module 212 had to be discontinued at this time because
its circuits began to draw excessive current. Figure 84 in the Appendix
shows an end view of the module and the peeling of the RTV
caused by the deterioration of the frame. Note the pieces of foreign
matter that have fallen between the RTV seal and the logic module.
Appendix Figure 85 is an enlargement of one section of the module
showing the exposure of the logic after loose RTV was cut away.
Although physical inspection of Module 212 indicated that the
excessive current drawn by this module was probably due to a failure in
the module sealing, the gasket attached to the female connector was
also found to be loose (indicating that the connector might have
contributed to the failure) as shown in Figure 86, in the Appendix.
In general, the contact pins were found to be in very good shape
on all the modules at this time. Appendix Figure 87 is an enlargement
of the set of pins exhibiting the worst pin discoloration. A chemical
analysis of the corrosion products identified the blue-green material
as copper chloride hydrate and the black material as copper chloride
combined with a small amount of copper oxide. (No nickel barrier
layer existed between the copper pins and the gold plating on these
connectors).
At the end of test (57 days), seven of the eight remaining modules
were removed from the fixture and again photographed individually.
Module 215 could not be removed because of adhesive corrosion. The
seven modules are shown in Appendix Figures 88 through 94.
4.5.3 Conclusions
Although it was not possible to assign the failure mechanism
(which caused a test point reading to change by more than 25 millivolts)
to either the connector or to the logic module, the general correlation
between frame deterioration and failure occurrence indicates that most
of the failures were due to leakage of moisture and contaminants into
the logic circuits rather than by means of the connectors. The failure
of the magnesium-lithium frames and resulting loosening of the RTV
seal were not considered significant, since hermetically sealed
modules would be used in the AES computer.
PHOTOGRAPHIC APPENDIX
AES-EPO STUDY PROGRAM
• M o,: ku I)Fi_ure 56. Completed" "
Figure 62. Test Fixture (Sheet 1 of 2)
Figure 62. Test Fixture (Sheet 2)
Figure 74. Nine Modules Under Test - 27 Days
Figure 75. Seven Modules - 27 Days
Figure 76. Individual Module - 32 Days
Figure 77. Individual Module - 32 Days
Figure 80. Individual Module - 32 Days
Figure 84. Module 212 - 32 Days
Figure 85. Module 212 Enlargement - 32 Days
Figure 88. Individual Module - 57 Days
Figure 89. Individual Module - 57 Days
Figure 90. Individual Module - 57 Days
Figure 93. Individual Module - 57 Days