Advanced Launch System Multi-Path Redundant Avionics Architecture Analysis and Characterization by Baker, Robert L.
NASA Contractor Report 4516
Advanced Launch System Multi-Path
Redundant Avionics Architecture
Analysis and Characterization
Robert L. Baker
CONTRACT NAS1-17964
JUNE 1993
(NASA-CR-4516) ADVANCED LAUNCH
SYSTEM MULTI-PATH REDUNDANT
AVIONICS ARCHITECTURE ANALYSIS AND
CHARACTERIZATION Final Report
(Research Triangle Inst.) 200 p
N93-31642
Unclas
H1/62 0175543
https://ntrs.nasa.gov/search.jsp?R=19930022453 2020-03-17T05:08:35+00:00Z
Research Triangle Institute
Research Triangle Park, North Carolina
Prepared for
Langley Research Center
under Contract NAS1-17964
National Aeronautics and
Space Administration
Office of Management
Scientific and Technical
Information Program
1993
Contents
Acronyms, Abbreviations and Symbology
1 INTRODUCTION
1.1 Purpose
1.2 General System Description
1.3 Architecture Configurations and Coarse Module Breakdowns
2 INFORMATION REQUIREMENTS
2.1 Requested Information
2.2 Information Sources
3 RELIABILITY, AVAILABILITY, MAINTAINABILITY AND TESTABILITY EVALUATION
3.1 Introduction
3.2 Hardware Module Failure Rates
3.3 Core Processing Functions
3.4 Acquisition Network Reliability
3.5 Testability
4 PERFORMANCE EVALUATION
4.1 Introduction
4.2 Sensor/Actuator Data Interprocessor Communications Performance
5 FAULT TOLERANCE FEATURES OF MPRAS ARCHITECTURES
5.1 Introduction
5.2 Fault Masking and Data Consistency Mechanisms
5.2.1 Background
5.2.2 General Dynamics Fault Masking
5.2.3 Boeing Fault Masking
5.2.4 AIPS Fault Masking
5.3 Fault Detection and Diagnosis
5.4 Fault Recovery
5.5 Redundancy and Sparing
6 SUMMARY AND CONCLUSIONS
APPENDIX A - Functional Decomposition of MPRAS Requirements
APPENDIX B - OS Performance Modeling for Distributed Real-Time Systems
List of Figures
1.1
1.2 General Dynamics Core Avionics Example Configuration
1.3 General Dynamics Booster Avionics Example Configuration
1.4 Boeing Avionics Example Configuration
1.5 AIPS Example Configuration
3.1 Framework for Reliability Evaluation
3.2 Core Processing Single Channel Failure Rates
3.3 Quadruplex On-pad Model
3.4 Quadruplex Launch
3.5 Self-Checking Pair with Spare Model
3.6 Self-Checking Pair Duplex Unreliability
3.7 Quadruplex On-Pad Model (State descriptions are in following text)
3.8 Quad On Pad
3.9 Quad Failure to Launch
3.10 Quadruplex Launch Model (State descriptions are in following text)
3.11 Triplex Launch Unreliability (State descriptions are in following text)
3.12 Sensor Acquisition Network with Local Voting
3.13 Sensor Acquisition Network with Central Voting
4.1 System Model
4.2 Utilization of Model Components for 1 Sender and 1 Receiver
4.3 Utilization of Model Components for 2 Senders and 1 Receiver
4.4 Utilization of Model Components for 8 Senders and 1 Receiver
4.5 Utilization of Model Components for 2 Senders and 2 Receivers
4.6 Utilization of Model Components for 8 Senders and 2 Receivers
4.7 Utilization of Communication Network Across Sender to Receiver Ratios
4.8 Utilization of Each Receiver Across Sender to Receiver Ratios
4.9 Effective System Bandwidth Across Sender to Receiver Ratios
4.10 FTP Architecture and I/O Data Flow
4.11 FTP I/O Structure
4.12 Boeing Architecture and Time Critical I/O Data Flow
4.13 Boeing Non-Time Critical I/O Data Flow
4.14 Boeing I/O Structure
4.15 General Dynamics Sensor Data Flow
4.16 General Dynamics I/O Structure
List of Tables
1.1 Module Counts for General Dynamics Example
1.2 Module Count for Boeing Example
1.3 Module Count for AIPS Example
3.1 Mission Phases
3.2 MIL-HDBK-217E Failure Rates for JIAWG Modules
3.3 Failure Rates for General Dynamics Modules
3.4 Failure Rates for Boeing Modules
3.5 FTP Failure Rates
3.6 AIPS Module Failure Rates
3.7 Sensor Interface Failure Rates
3.8 Probability of Failure Due to Imperfect Pre-launch Diagnostics
3.9 Probability of Launch Failure For a Quad Degraded to Triplex During Pre-launch (Duplex Coverage = 0.5)
3.10 Probability of Failure Due to Imperfect Coverage During Launch
3.11 Triplex Launch Unreliability
3.12 Models for Probability that Maintenance Will be Required Prior to Launch
3.13 Boeing Sensor Network
3.14 General Dynamics Sensor Network
3.15 AIPS Sensor Network
3.16 Sensor Network Unreliability
4.1 MPRAS Computational Resource Requirements — Boeing
4.2 MPRAS Computational Resource Requirements — General Dynamics
4.3 MPRAS Computational Resource Requirements — Martin-Marietta
Acronyms, Abbreviations and Symbology
ADAS Architecture Design and Assessment System
ADP Advanced Development Program
AGN&C Adaptive Guidance, Navigation and Control
AIPS Advanced Information Processing System
ALS Advanced Launch System
ASSIST Tool for Specifying Reliability Models
BIT Built-in-Test
BIU Bus Interface Unit
BM/C3 Battle Management Communications Command and Control
CAP-32 32 bit Common Avionics Processor
CP Computational Processor
DIU Device Interface Unit
DOF Degrees of Freedom
DPM Dual Port Memory
EEPROM Electronically Erasable Programmable Read Only Memory
EME Electromagnetic Environment
FDIR Fault Detection Isolation and Recovery
FLV Future Launch Vehicle
FTM Fault Tolerance Module
FTP Fault Tolerant Processor
GN&C Guidance, Navigation and Control
GPS Global Positioning System
G&N Guidance and Navigation
HERF High Energy Radio Frequency
HSDB High Speed Data Bus
HW Hardware
IC Intercomputer Network
ICIS Intercomputer Interface Sequencer
ID Identification
IMU Inertial Measurement Unit
IOS Input/Output Sequencer
ISA/RTL Instruction Set Architecture/Register Transfer Language
JIAWG Joint Integrated Avionics Working Group
LCC Life Cycle Cost
LDL Local Data Link
MIL-HDBK Military Handbook
MIPS Millions of Instructions Per Second
MPRAS Multipath Redundant Avionics Suite
MTBF Mean Time Between Failure
OS Operating System
P/A Propulsion/Avionics
PAVE PILLAR Air Force Program to Develop Common
Avionics Computing Modules
PDR Preliminary Design Review
PE Processing Elements
PIbus Processor Interface Bus for JIAWG Modules
PROM Programmable Read Only Memory
QMR Quadruplex Modular Redundancy
RADC-TR Rome Air Development Center Technical Report
RAM Random Access Memory
RB:BB Receiver Bandwidth to Bus Bandwidth Ratio
RCS Reaction Control System
RDI Remote Data Interface
ROM Read Only Memory
RTI Research Triangle Institute
SCP Self Checking Pair
SDIO Strategic Defense Initiative Office
SIM Sensor Input Module
SRD System Requirements Document
SRR System Requirements Review
STM System Test and Maintenance Modules
SURE Tool for Analyzing Unreliability
Based on Semi-Markov Models
TDPA Table Driven Proportional Access
TLM Telemetry
TMbus Test and Maintenance Bus for JIAWG Modules
TT&C Telemetry, Tracking and Command
TVC Thrust Vector Control
UTC Universal Coordinated Time
VHDL VHSIC Hardware Description Language
VHM Vehicle Health Monitor
VLSI Very Large Scale Integration
1. INTRODUCTION
1.1. Purpose
The objective of the Multi-Path Redundant Avionics Suite (MPRAS) program is the
development of a set of avionics architectural modules which will be applicable to the
family of launch vehicles required to support the Advanced Launch System (ALS).
To enable ALS cost/performance requirements to be met, the MPRAS must support
autonomy, maintenance and testability capabilities which exceed those present in
conventional launch vehicles. The multi-path redundant or fault tolerance characteristics
of the MPRAS are necessary to meet avionics reliability requirements.
A complex, real-time distributed computing system is needed to meet the ALS
avionics system requirements. General Dynamics, Boeing Aerospace and C.S. Draper
Laboratory have proposed system architectures as candidates for the ALS MPRAS.
The purpose of this document is to report the results of independent performance and
reliability characterization and assessment analyses of each proposed candidate
architecture and qualitative assessments of testability, maintainability and fault tolerance
mechanisms. These independent analyses were conducted as part of the MPRAS Part
2 program and were carried out under NASA Langley Research Contract NAS1-17964,
Task Assignment 28.
The characterization and assessment analyses were directed toward identifying
strengths, weaknesses, limitations and development risks for each architecture. At
the outset of this effort, an evaluation plan was developed which called for compar-
ative evaluations of the architectures against a common baseline set of application
requirements. As the program evolved, each architecture was designed to meet dif-
ferent detailed application requirements as defined by the developer. While there
was considerable similarity between these requirements, there were instances where
the requirements differed radically. Consequently, comparative evaluations were not
possible in certain areas.
1.2. General System Description
Figure 1.1 is a generic diagram showing the major elements of a digital avionics system
appropriate for the MPRAS application. The system is composed of: 1) sensors that
provide measurements of physical parameters which are necessary to implement the
desired control functions, 2) a sensor communications network to deliver sensor data to
the distributed computing resources, 3) distributed computing resources to carry out
the control computations, 4) a computer communications network to provide for data
and control communications between computing resources, 5) operating system and
application software which implement the desired avionics functions, 6) an actuator
data distribution network which delivers control information to the actuators, 7) the
actuators that effect the desired control actions, and 8) a power system that provides
the necessary electrical power to all elements and interfaces to other vehicle or ground
support subsystems.
The major functions implemented by the digital avionics system for core and
booster components include:
1. engine or propulsion control,
2. adaptive guidance navigation and flight control,
3. fluids management,
4. integrated health monitoring,
5. power management,
6. data recording,
7. communications and telemetry, and
8. mission control.
1.3. Architecture Configurations and Coarse Module Break-
downs
In order to determine rough complexity characteristics for the MPRAS application,
modules of the candidate architectures were configured to implement a hypothetical
case. This case was derived to be generally within and representative of the MPRAS
requirements as represented by the Boeing Aerospace MPRAS System Requirements
Document and the General Dynamics MPRAS Point Design Evaluation Report. It
does not, however, represent a specific application. This effort was undertaken to
determine the overall complexity of the application in terms of module counts and to
determine each architecture sensitivity to application complexity.
Figure 1.1.
A hypothetical launch vehicle with three engines in the core stage and five engines
in the boost stage was considered. For each engine the propulsion control used 30
parameter measurements that must be updated at a 100 Hz rate. Of the 30, 15 used
dual redundant sensors and 15 required single sensors. A total of 45 100 Hz sensors
were used for propulsion. In addition, 30 parameter measurements updated at a 1 Hz
rate were used. Similarly, 15 used dual redundant sensors and 15 used single sensors
for a total of 45 additional sensors. Finally, two dual redundant acoustic vibrator
sensors sampled at a 2 kHz rate were used. Actuators required for each engine were
16 updated at a 1 Hz rate and 4 updated at a 50 Hz rate. To support fluids
management, adaptive guidance, navigation and flight control an additional 540 sensors and
240 actuators were distributed within the core and booster. A block of inertial
measurement sensors and a GPS receiver and processor were assigned to provide essential
inputs for the navigation function. A total of 1290 sensors and 400 actuators were
required for this example. This number is well within the estimated 3000 to 6000
sensor/actuator requirements set for MPRAS. Functions such as integrated health
monitoring and data recording were not considered.
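The sensor and actuator totals for the hypothetical vehicle can be cross-checked with a short tally. The sketch below is illustrative only. It assumes that each dual redundant parameter contributes two sensors, that the "two dual redundant" acoustic sensors amount to four devices per engine, and that the inertial measurement block and GPS receiver are not counted; the function names are hypothetical. Under these assumptions the tally gives 1292 sensors, close to the 1290 quoted, and exactly 400 actuators.

```python
# Tally of the hypothetical sensor/actuator example.
# Assumption: a dual redundant parameter needs two sensors, a single one needs one.

ENGINES = 3 + 5  # three core engines plus five booster engines


def sensors_for_parameters(dual: int, single: int) -> int:
    """Sensors needed for `dual` dual redundant and `single` single parameters."""
    return 2 * dual + single


def tally() -> dict:
    """Return total sensor and actuator counts for the example vehicle."""
    per_engine_100hz = sensors_for_parameters(dual=15, single=15)   # 45 sensors
    per_engine_1hz = sensors_for_parameters(dual=15, single=15)     # 45 sensors
    per_engine_acoustic = sensors_for_parameters(dual=2, single=0)  # 4 sensors at 2 kHz
    per_engine_sensors = per_engine_100hz + per_engine_1hz + per_engine_acoustic
    per_engine_actuators = 16 + 4  # 16 updated at 1 Hz, 4 updated at 50 Hz

    return {
        # 540 sensors and 240 actuators are distributed in the core and booster
        "sensors": ENGINES * per_engine_sensors + 540,
        "actuators": ENGINES * per_engine_actuators + 240,
    }


if __name__ == "__main__":
    print(tally())
```

The small difference in the sensor total presumably reflects rounding or a different count of the acoustic sensors in the original estimate.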
Except where noted, modules represent a circuit card. The modules that make
up each architecture were configured to meet the requirements of this case. No claim
is made that the configurations are optimum.
Following General Dynamics guidelines, a triplex system with a remote data
interface (RDI) dedicated to each engine is used. Additional RDIs are used to acquire
sensor data from sensor input modules (SIM) distributed throughout the vehicle. The
SIM was assumed to handle 15 sensor signals with necessary excitation signal condi-
tioning, analog/digital conversion and testing capability. Since this module has not
been completely defined and since it may require more than one circuit card to realize,
module counts for this function are not directly comparable to other module counts.
The processing required to carry out MPRAS functionality is provided by the Iner-
tial Measurement Unit (IMU) and the Vehicle Management processor (VMP). The
proposed General Dynamics self-checking processor/memory module is substantially
more complex than the processor or memory modules used in the other architectures
and may require more than one circuit card for implementation. Consequently, mod-
ule counts shown for the General Dynamics configuration may tend to be less than
that actually required. This configuration does not incorporate the more distributed
sensor data collection scheme currently being considered by General Dynamics under
MPRAS Part III.
The configuration for the Boeing modules following Boeing's guidelines is quadru-
plex and assigns only one processing site in each stage to propulsion control. Pro-
cessors are assigned to perform mission processing (MP) and guidance, navigation,
and control (GNC). Note that the computing requirements for propulsion control, if
they are as high as 5-10 MIPS/engine as some estimates indicate, may require more
processors to control the eight engines. Interface hardware for the 1773a and HSDB
buses is required for the processing sites and I/O processors. Module requirements
for the local signal conditioner unit (LSC), which was not fully defined by Boeing,
assume that six modules are required to interface to 30 sensors and/or actuators
and that an additional two modules are required for A/D conversion, testing and
transducer bus interfaces.
The AIPS module count is based on a quadruplex system and on one
fault-tolerant processor (FTP) assigned to each engine. Additional processors are assigned to
perform mission processing (MP) and guidance, navigation and control (GNC). The
module breakdown for this configuration follows the proof-of-concept implementation
of AIPS. Consequently, the number of unique modules and hence the total module
count will tend to be higher than module counts for a repackaged AIPS which takes
advantage of projected microelectronics technology, and higher than module counts
for the other architectures which are based on technology. Device interface
units (DIU) provide for the acquisition of sensor data and the distribution of actuator
control data. The DIU module count assumes 15 inputs or outputs per module
for signal conditioning and sensor interfacing. The intercomputer network (IC) is
triplex redundant and the fault-tolerant I/O network is assumed to have at least dual
redundancy. Network node hardware provides for circuit switching on the IC and I/O
networks. Some I/O links can be used in AIPS to provide for additional reliability of
the networks. However, the constraints of routing these spare links within the vehicle
may render this AIPS feature unusable.
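The per-module capacity assumptions above translate into module counts by ceiling division. The following minimal sketch applies the stated figure of 15 inputs or outputs per DIU module to the example totals of 1290 sensors and 400 actuators. It is a lower-bound illustration only: it ignores redundancy and the physical partitioning of the vehicle, the helper name is hypothetical, and it is not intended to reproduce the entries of the module count tables.

```python
def modules_needed(signals: int, signals_per_module: int) -> int:
    """Smallest number of modules able to terminate `signals` channels."""
    if signals < 0 or signals_per_module <= 0:
        raise ValueError("signals must be >= 0 and signals_per_module > 0")
    # Integer ceiling division avoids floating point rounding.
    return -(-signals // signals_per_module)


if __name__ == "__main__":
    sensors, actuators = 1290, 400  # totals quoted for the example vehicle
    per_diu_module = 15             # inputs or outputs per DIU module (from the text)
    print("sensor-side DIU modules:  ", modules_needed(sensors, per_diu_module))
    print("actuator-side DIU modules:", modules_needed(actuators, per_diu_module))
```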
Figures 1.2, 1.3, 1.4 and 1.5 show configurations for each architecture. Tables 1.1, 1.2 and 1.3
summarize module counts.
A direct comparison of module counts for each architecture is not appropriate
given the assumptions going into their derivation. However, a number of general ob-
servations can be made from these coarse module counts. First, between 50% and
85% of the modules are directly involved in sensor/effector I/O. Second, if the module
counts are divided by the assumed redundancy, the number of modules in each
redundant path is nominally 150 modules plus or minus 20%. This will dictate average
module failure rates which are more than two orders of magnitude lower than the
necessary channel failure rate. Further, each channel will have nominally 50,000 circuit
card electrical contacts. Third, the more distributed sensor data collection character-
istics of the Boeing architecture, which also was assumed for the AIPS architecture,
results in substantially more mechanical enclosures and hence more maintenance ac-
cess points within the vehicle. Since General Dynamics has indicated a desire to use
a more distributed sensor data collection scheme in their current MPRAS Part III ac-
tivities, the significant difference between enclosure requirements for the architectures
would be reduced.
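The relation between channel and module failure rates noted above can be illustrated numerically. The sketch below assumes a simple series model with constant failure rates, in which a channel fails when any one of its modules fails, so that the channel failure rate is the sum of the module failure rates. The series model, the function names and the channel budget used in the example are assumptions for illustration and are not taken from the analyses in this report.

```python
import math


def average_module_failure_rate(channel_rate: float, modules: int) -> float:
    """Average per-module failure rate allowed by a series channel budget."""
    if modules <= 0:
        raise ValueError("modules must be positive")
    return channel_rate / modules


def orders_of_magnitude_below(modules: int) -> float:
    """Decades by which the average module rate sits below the channel rate."""
    return math.log10(modules)


if __name__ == "__main__":
    channel_budget = 1.0e-4  # failures per hour, a hypothetical value
    for modules in (120, 150, 180):  # 150 modules plus or minus 20%
        rate = average_module_failure_rate(channel_budget, modules)
        decades = orders_of_magnitude_below(modules)
        print(f"{modules} modules: {rate:.2e} per hour, {decades:.2f} decades lower")
    # Nominal contacts per module for a 150-module channel with 50,000 contacts.
    print(round(50_000 / 150), "contacts per module")
```

For 120 to 180 modules the average module rate falls between about 2.1 and 2.3 decades below the channel rate, consistent with the statement of more than two orders of magnitude.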
Figure 1.2. General Dynamics Core Avionics Example Configuration
Figure 1.3. General Dynamics Booster Avionics Example Configuration
Figure 1.4. Boeing Avionics Example Configuration
Figure 1.5. AIPS Example Configuration
Table 1.1. Module Counts for General Dynamics Example
Table 1.2. Module Count for Boeing Example (column headings are illegible in the source; legible rows follow)
GN&C Processor: 1 8 2 4 4 4 4
Mission Processor (2): 2 16 4 8 8 8 8
Local Signal Conditioner (42)
I/O Channels (16)
Note: Shaded entries represent sensor/actuator I/O.
Table 1.3. Module Count for AIPS Example
A final observation regarding the total number of processors used in these modules
is in order. Since the standard JIAWG modules such as the HSDB interface and
the 1773 interface contain 1750A processors, the total number of processors used in
the General Dynamics configuration is about 250, in the Boeing configuration about
125 and in the AIPS configuration, assuming only the processor modules use
programmable computers, about 80. Given the uncertainty in the configuration and the
designs, particularly regarding computer requirements for propulsion, these should
not be compared. The point of importance to be considered is that a large number
of embedded computers are required for this application. All of these computers
must interact in such a manner that real-time deadlines are met and
that overall synchronization and timing are maintained. Design and validation methods
which account for this complexity must be used to develop MPRAS applications.
Furthermore, the Boeing architecture is structured such that all time critical
sensors/actuators must reside in the propulsion/avionics (P/A) module.
The Boeing architecture distributes the 2 megabit/sec 1773 bus throughout the
vehicle while confining the 50 megabit/sec system busses to each P/A module. The
General Dynamics architecture distributes the 50 megabit/sec vehicle management
and sensor data network bus throughout the vehicle. Maintaining the acceptable
timing and bit error rates for the faster bus over greater physical distances represents
a higher technical risk. There are some observations that can be made about specific
architectures. If the number of sensors/actuators required is increased, the Boeing
architecture limits to 31 the number of remote terminals connected to each of the four
transducer busses. Additional sensors/actuators can be connected to the flight control
bus but care must be exercised not to violate time criticality constraints associated
with the flight control bus.
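One way to see why the faster bus is more sensitive to physical distance is to compare the propagation delay along the cable with the duration of a single bit. The sketch below is a rough illustration, not an analysis from this report. The propagation delay of 5 ns per metre is a typical rule-of-thumb value assumed here, and the 50 m run length is a hypothetical figure chosen only for the example.

```python
def bit_time_ns(rate_mbit_per_s: float) -> float:
    """Duration of one bit in nanoseconds at the given signalling rate."""
    if rate_mbit_per_s <= 0:
        raise ValueError("rate must be positive")
    return 1_000.0 / rate_mbit_per_s


def bits_in_flight(rate_mbit_per_s: float, length_m: float,
                   delay_ns_per_m: float = 5.0) -> float:
    """Bit periods that elapse while a signal travels `length_m` of cable.

    `delay_ns_per_m` is an assumed propagation delay, not a measured value.
    """
    return (length_m * delay_ns_per_m) / bit_time_ns(rate_mbit_per_s)


if __name__ == "__main__":
    run_length_m = 50.0  # hypothetical cable run through the vehicle
    for rate in (2.0, 50.0):  # 1773 bus and 50 megabit/sec bus rates from the text
        print(f"{rate:>4.0f} Mbit/s: bit time {bit_time_ns(rate):.0f} ns, "
              f"{bits_in_flight(rate, run_length_m):.1f} bit periods in flight")
```

Under these assumptions the end-to-end delay is half a bit period at 2 megabit/sec but more than twelve bit periods at 50 megabit/sec, which leaves far less timing margin for the faster bus.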
The I/O net of the AIPS must be distributed throughout the vehicle. The AIPS
proof-of-concept I/O network bandwidth is two megabits/sec. This bandwidth is
expected to be increased to 20 or 30 megabits/sec. As such, its bandwidth will
fall between that of the Boeing and the General Dynamics architecture. While the
AIPS I/O network can be configured as linear connections from node to node, it
has features which permit use of spare links to reconfigure around failed links and
nodes. Use of this capability would represent a more efficient use of the AIPS I/O
network hardware. The recommended topology for the AIPS I/O nodes and links is
a binary tree with I/O nodes at each branch point and spare links included within
the tree. The partitioning of the I/O nodes into the vehicle sections and limitations
on the number of links which can cross a boundary between vehicle sections could
constrain the I/O network topology to an extent that the recommended topology
cannot be fully realized. The number of sensors and actuators handled by AIPS
can be increased by adding I/O nodes and DIU's to the I/O networks. This can be
accomplished up to the maximum allowable number of nodes in a network. Beyond
this, new networks would have to be set up. The 4000-6000 sensors and actuators
required, with a maximum combined bandwidth of 1 megabit/sec, are well within the
capacity of the AIPS.
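The margin implied by these figures can be expressed as a simple utilization ratio. The sketch below compares the stated 1 megabit/sec combined demand with the proof-of-concept bandwidth of 2 megabits/sec and the expected 20 to 30 megabits/sec. It is a first-order estimate that ignores protocol overhead, redundant transmissions and message scheduling; the function name is hypothetical.

```python
def utilization(demand_mbit_per_s: float, capacity_mbit_per_s: float) -> float:
    """Fraction of raw network bandwidth consumed by the offered load."""
    if capacity_mbit_per_s <= 0:
        raise ValueError("capacity must be positive")
    return demand_mbit_per_s / capacity_mbit_per_s


if __name__ == "__main__":
    demand = 1.0  # maximum combined sensor/actuator bandwidth, megabit/sec
    for capacity in (2.0, 20.0, 30.0):
        share = utilization(demand, capacity)
        print(f"{capacity:>4.0f} Mbit/s network: {share:.1%} utilized")
```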
2. INFORMATION REQUIREMENTS
2.1. Requested Information
General Discussion — The information requested for the assessment and charac-
terization of the MPRAS architectures was baseline MPRAS requirements, hardware
module specifications, software operating system specifications, application functional
description, descriptions of architectures and topologies to meet the application re-
quirements and descriptions of the fault avoidance and fault tolerance features of each
architecture. The information used for this evaluation encompasses all MPRAS parts
1 and 2 information reported through December 1989.
Characterizations capable of quantitatively discriminating between different "well
designed" architectures require high-fidelity architectural specifications. Such infor-
mation is consistent with the later stages of development. In the development stages
prior to the Preliminary Design Review (PDR), less precise information is available.
Consequently, the characterization of MPRAS architectures of necessity relied upon
qualitative evaluation factors, subjective measures and quantitative architectural
parameter sensitivity analyses.
The following paragraphs describe the design information requested for this effort.
Specification of Hardware Architecture — The specification of the hardware
architecture requested the identification of all hardware functions including:
• power modules
• processor modules
• memory modules
• I/O communication links and modules
• data acquisition and distribution modules
• interprocessor communications links
• data buses
• maintainability features
• test interface modules
• testability features
• fault masking and reconfiguration mechanisms
• error detection mechanisms
• timing/clocking modules
A brief description of each module or element including relevant performance
parameters such as processor speed, communications bandwidth, memory read/write times,
communications overhead and capacities was requested.
In addition to performance parameters, module failure rates for various mission
phases were requested. For design parameters that were not known, the assumed
range of values for these parameters was requested.
Specification of the Application and Architectures to Meet Baseline
Requirements — The application specification requested called for a functional de-
composition. For each subfunction a description of the inputs, processing, outputs
and special requirements such as transport delay, jitter and process update rate, was
requested. Processing workload and information flow between subfunctions as well
as the basis for the workload estimates were requested.
A detailed audit of the sensors and actuators required to support the application
was requested. The audit called for specification of each sensor/actuator, the type,
proposed redundancy, number of each type, number of bits, source/destination
subfunction associated with the sensor/actuator, associated failure rate for each mission
phase and on-line test and calibration requirements.
The proposed topology for the hardware elements and interconnections for each
vehicle configuration specified in the baseline requirements was also requested. In
addition, the allocation of subfunctions to specific hardware resources was requested.
The overall mission reliability and safety requirements as well as the size, weight
and power of the MPRAS for each configuration were requested.
Specification of the Reliability and Fault Tolerance Features of the Ar-
chitecture — Specification of the fault tolerance features of an architecture called
for the identification of the set of faults to be tolerated. Fault identification was to
include type, frequency of occurrence, dependence on mission phase or technology
and a characterization of the errors induced by the fault type. Error effects were to
be characterized by:
1. Count: single versus multiple
2. Origin: environmental factors
3. Activity: dormant versus active
4. Duration of activity: transient versus permanent
5. Extent: local versus distributed
6. Temporal behavior for multiples: coincident versus separated
7. Cause of multiples: independent versus common mode
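The seven characterization attributes listed above lend themselves to a small record type. The following sketch shows one possible encoding. The class and field names are hypothetical, the origin is kept as free text because only environmental factors are named in the list, and the consistency rule (temporal behavior and cause apply only to multiple errors) is an interpretation of the list rather than a requirement stated in the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Count(Enum):
    SINGLE = "single"
    MULTIPLE = "multiple"


class Activity(Enum):
    DORMANT = "dormant"
    ACTIVE = "active"


class Duration(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"


class Extent(Enum):
    LOCAL = "local"
    DISTRIBUTED = "distributed"


class Timing(Enum):
    COINCIDENT = "coincident"
    SEPARATED = "separated"


class Cause(Enum):
    INDEPENDENT = "independent"
    COMMON_MODE = "common mode"


@dataclass(frozen=True)
class ErrorEffect:
    count: Count
    origin: str                      # free text, e.g. an environmental factor
    activity: Activity
    duration: Duration
    extent: Extent
    timing: Optional[Timing] = None  # temporal behavior, multiples only
    cause: Optional[Cause] = None    # cause of multiples, multiples only

    def __post_init__(self) -> None:
        both_set = self.timing is not None and self.cause is not None
        both_unset = self.timing is None and self.cause is None
        if self.count is Count.MULTIPLE and not both_set:
            raise ValueError("multiple errors need both timing and cause")
        if self.count is Count.SINGLE and not both_unset:
            raise ValueError("timing and cause apply only to multiple errors")


if __name__ == "__main__":
    example = ErrorEffect(Count.MULTIPLE, "electromagnetic environment",
                          Activity.ACTIVE, Duration.TRANSIENT, Extent.DISTRIBUTED,
                          Timing.COINCIDENT, Cause.COMMON_MODE)
    print(example)
```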
Descriptions of the fault tolerance features such as error detection, error masking,
fault containment, consistency, reconfiguration, redundancy, and redundancy
management were requested for each architecture. Recovery strategy descriptions for
both the local (subsystem) and global (system) levels were requested. Assumptions
for the effectiveness of error detection and for the times required to detect, recover,
and reconfigure and the techniques for providing fault-tolerant power, clocking, syn-
chronization and startup were also requested. Descriptions of fault tolerance features
called for the identification of those portions of each feature that were to be incorpo-
rated in hardware and those portions which were to be provided by software. Also,
the approach used to detect faults in the fault tolerance mechanism was requested.
The primary reliability parameters associated with the architecture were requested.
These included, but were not limited to, module failure rates, transient frequency and
duration, coverage parameters such as latency and detection effectiveness, redundancy
of each element and the number of spares.
The techniques expected to be employed for fault tolerance in the application
software were to be defined. Methods and policies which were to be employed to
avoid software design faults and to provide software quality assurance were to be
identified.
Life Cycle Cost Report — A Life Cycle Cost (LCC) report which summarized
the recurring LCC analysis for each MPRAS architecture was requested. It was
requested that the report state all ground rules and assumptions used and identify all
factors included in the recurring LCC. Assumptions regarding the use of common
modules within the architecture, which are used in other avionics programs or in
ALS ground support, were to be stated. Assumptions regarding maintenance and
testability were to be included in the report.
System Software Architecture Descriptions — It was recognized that limited
information would be available regarding the system software architecture for some
of the MPRAS designs. To the extent that this information was available, it would
be used. The information of interest regarding system software architecture included
a description of control characteristics such as distributed, central or hierarchical and
descriptions of functions including task scheduling, I/O services, interrupt services,
memory management, utilities, interprocessor communications services and functions
related to fault tolerance. In addition, performance information such as function
overheads, expected execution time for a given system function, function response
times and uncertainty in response times was requested.
MPRAS Engineering Trade Studies and Analyses — The engineering trade
studies and analyses conducted by the three MPRAS contractors to determine the
requirements and characteristics of the specified architectures were requested.
System Development Methods — The complexity and criticality of the mission
requirements dictate the use of rigorous, systematic methods and policies for the
development of MPRAS. This is necessary to assure that all system objectives such
as cost, performance, reliability, safety and maintainability are met. Good engineering
practice dictates that these factors be addressed starting in the concept, requirements
and early design phases and continuing throughout the development.
Validating that the design objectives of a complex system are met requires that
systematic validation be conducted throughout the system development, starting
at the beginning of the development cycle. All elements of the architecture develop-
ment (application software, operating system software and hardware elements) must
be included in the design validation.
The methods expected to be used for the development of MPRAS will determine
to a large extent the potential for a successful development. Documentation of the
system development methodology that each contractor expected to use was requested.
2.2. Information Sources
Listed below are the information sources used to conduct the architecture character-
izations and assessments.
Boeing Aerospace
MPRAS System/Subsystem Requirements Document (SRD),
February 1989, Doc. No. 180-30579-2.
MPRAS Third Quarterly Review, March 1989
MPRAS Conceptual Avionics Architecture Specification,
July 1989, Doc. No. 180-30579-4.
MPRAS Fourth Quarterly Review, July 1989
MPRAS Final Review, October 1989
MPRAS Preferred Vehicle Avionics Architecture Specification,
October 1989, Doc. No. 180-30579-5.
MPRAS Critical Item Development Specification, GN&C Processor,
October 1989, Doc. No. 180-30579-7.
MPRAS Critical Item Development Specification, Local Signal
Conditioner, October 1989, Doc. No. 180-30579-9.
MPRAS Critical Item Development Specification, I/O Channel,
October 1989, Doc. No. 180-30579-10.
General Dynamics Space Systems Division
MPRAS Reference Vehicle/Requirements, June 1989
Preliminary MPRAS Architecture Specification, Vol. 1,
Concept Specification, July 1989
Future Launch System Technology, Contracted Exploratory
Research & Development, MPRAS Review, July 1989
MPRAS Point Design, Fault Set, August 1989
MPRAS Point Design, Design Evaluation, August 1989
MPRAS Architecture Specification, Vol. I, September 1989
Draft Version, MPRAS Architecture Specification, Vol. II,
(Application to Advanced Launch System), September 1989
MPRAS Point Design, Fault Detection, Isolation, and
Recovery, September 1989
MPRAS ADP 2103, Technical Interchange Meeting, October 1989
MPRAS Point Design, Performance/Dependability Goals,
SCM, July 1989
The Charles Stark Draper Laboratory, Inc.
Advanced Information Processing System: Input/Output Network
Management Software, Contract NAS1-17666, May 1988
Completion of AIPS (FY 88 Tasks), Oral Review, March 1989
Validation of Core Fault Tolerance Concepts, AIPS Fault
Tolerance Concepts, March 1989
AIPS FTP Reliability Analysis, March 1989
MPRAS 2102: AIPS for ALS Review, Program Overview and Status
October 1989
Other Information
Architecture Specification for PAVE PILLAR Avionics,
SPA90099001A, January 1987
Modular Avionics, IBM Product Specifications for JIAWG Modules
Application of Fault Tolerance Technology, Vol. I, Design of
Fault-Tolerant Systems, SDIO BM/C3 Processor and Algorithm
Working Group, Avizienis, A. and Gilley, G., Editors
Application of Fault Tolerance Technology, Vol. II, Management
Issues: Contractor Milestones and Evaluation, SDIO BM/C3
Processor and Algorithm Working Group, Rennels, D. and
Gilley, G., Editors
RADC Testability Notebook, RADC-TR-82-189, Hughes Aircraft
Reliability Handbook, MIL-HDBK-217E
3. RELIABILITY, AVAILABILITY,
MAINTAINABILITY AND TESTABILITY
EVALUATION
3.1. Introduction
Reliability and the closely related characteristics of maintainability, availability,
testability and fault tolerance are of prime importance in the design of the MPRAS
architecture. Not only does improved reliability lead to cost reductions through
reduced logistics support costs, it enables introduction of functionality into the
avionics that will support reduced integration testing and operations costs.
The avionics used in current launch vehicles provide the basic mission functions
with moderate reliability. Improvements in the reliability of the basic systems would
result in reduced life cycle costs through reduced support costs. The avionics
necessary to provide functionality such as adaptive guidance, navigation and
control (AGN&C), integrated health monitoring, and increased vehicle autonomy for
a complex vehicle could require thousands of sensors and in excess of 100 electronics
modules. If the average module MTBF for the prelaunch environment was 100,000
hours, the reliability of the avionics at the end of 200 hours on the launch pad
would be about 0.8. If the average sensor MTBF was 50,000 hours, it would be
reasonable to expect 10 or so sensor failures at the end of 200 hours on the pad.
Fault-tolerant avionics architectures are required to provide the advanced functional
capabilities while at the same time meeting reliability requirements.
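The two pad estimates above can be checked with a short calculation. This is a minimal sketch assuming independent units with constant failure rates (exponential model). The function names are illustrative, and the count of 2,500 sensors is an assumed value standing in for "thousands of sensors".

```python
import math

def survival_probability(n_modules: int, mtbf_hours: float, hours: float) -> float:
    """Probability that none of n_modules fails within `hours`,
    assuming independent modules with constant rate 1/MTBF."""
    return math.exp(-n_modules * hours / mtbf_hours)

def expected_failures(n_units: int, mtbf_hours: float, hours: float) -> float:
    """Expected number of failures among n_units over `hours`
    (valid while hours is much shorter than the MTBF)."""
    return n_units * hours / mtbf_hours

# 100 modules, 100,000 hour MTBF, 200 hours on the pad (values from the text).
pad_reliability = survival_probability(100, 100_000.0, 200.0)

# 2,500 sensors is an assumed count; 50,000 hour MTBF and 200 hours are from the text.
sensor_failures = expected_failures(2_500, 50_000.0, 200.0)

print(f"avionics reliability after 200 hrs: {pad_reliability:.2f}")
print(f"expected sensor failures after 200 hrs: {sensor_failures:.0f}")
```

With these inputs the reliability comes out near 0.8 and the expected number of sensor failures near 10, in line with the figures quoted above.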
Consider using a simple fault-tolerant architecture with three independent
redundant channels whose output is voted. If the overall unreliability must be less
than 1 x 10^-5 and the exhaustion of components is the primary failure mechanism,
the probability of failure for each channel must be less than about 1.82 x 10^-3. If
100-200 modules are required in each channel, the average module failure rate must
be between 5 x 10^-5/hour and 10^-4/hour for 10 minutes in the missile launch
environment.
Even if it is assumed that this failure rate is the MIL-HDBK-217E predicted failure
rate and that it is not necessary to account for transient failures, which occur at a
higher rate than the 217E rate, the failure rate per module for the ground-fixed
environment must be between 10^-5 and 2 x 10^-5/hour.
This failure rate is at or below that for modules of the complexity of processors,
memory, and I/O controllers implemented using current VLSI technologies. One
must conclude that it would be difficult to achieve the desired reliability with a
simple independent redundant 3 channel system.
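The budget argument for the voted three-channel system can be reproduced numerically. The sketch below is illustrative only: it uses the rare-event approximation that the system fails when any two of three channels fail (about 3q^2 for channel failure probability q), treats each channel as a series string of modules, and assumes a factor of 5 between the missile launch and ground-fixed environment factors, which is consistent with the two rate ranges quoted above. All names are hypothetical.

```python
import math

LAUNCH_HOURS = 10.0 / 60.0  # 10 minute launch phase
ENV_RATIO = 5.0             # assumed launch / ground-fixed environment factor ratio

def channel_failure_budget(system_unreliability: float) -> float:
    """Largest per-channel failure probability q for a voted triplex,
    from P(system fails) ~= 3 * q**2."""
    return math.sqrt(system_unreliability / 3.0)

def module_rate_budget(q_channel: float, n_modules: int, hours: float) -> float:
    """Average module failure rate (per hour) that keeps a series channel of
    n_modules within q_channel over `hours`, using q ~= n * rate * hours."""
    return q_channel / (n_modules * hours)

q_channel = channel_failure_budget(1e-5)
launch_rates = {n: module_rate_budget(q_channel, n, LAUNCH_HOURS) for n in (100, 200)}
ground_fixed_rates = {n: rate / ENV_RATIO for n, rate in launch_rates.items()}

print(f"per-channel budget: {q_channel:.2e}")
for n in (100, 200):
    print(f"{n} modules: launch {launch_rates[n]:.1e}/hr, "
          f"ground fixed {ground_fixed_rates[n]:.1e}/hr")
```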
Figure 3.1 diagrams a framework for reliability evaluation. Evaluation starts at the
mission requirements level and incorporates the system architecture, the technology
of implementation, and the appropriate failure processes.
[Figure 3.1: block diagram with the following boxes: Mission Characteristics and
Requirements (Type, Duration, Environment, Performance, Reliability); Fault
Models; Technology Assumptions; Architecture Characteristics; Reliability
Determination Requirements; conventional, analytic modeling, measurement and
experimental methods; Tools.]
Figure 3.1. Framework for Reliability Evaluation
The three areas of focus for reliability determination are thus: 1) missions, fault
models, architectures and technologies; 2) reliability determination requirements;
and 3) methods and tools.
The MPRAS application characteristics which impact reliability modeling
requirements are mission criticality, many distinct mission phases with diverse
activity levels and reliability requirements, a relatively large number of distributed
system modules and a harsh operating environment.
The high system reliability requirements dictate the need for a fault-tolerant
system. Such systems have to be carefully evaluated to determine whether or not
the requirements are met. Fault tolerance mechanisms make evaluation more
difficult by increasing the number and complexity of significant factors affecting
system reliability.
Except for the 10 minute launch phase, mission profiles for all three architectures
differ significantly. Table 3.1 summarizes the various phases and their duration.
One of the most important differences is the amount of time that the avionics is
powered prior to launch. There is a factor of 8 between the shortest and the longest
time.
        BOEING                  GENERAL DYNAMICS               MARTIN/DRAPER
Phase        Hrs On  Hrs Off   Phase         Hrs On  Hrs Off   Phase        Hrs On  Hrs Off
Recovery     8       2         Fctry Chckt   136     ~800      Rec&Refrbsh  200
Refurbish    150     150
Tnk,P/A Mte  8       16        Vert Integ    10      ~100      Integ&Rllt   400
Cr,Bst Mate  32      88
Payld Integ  90                Cargo Integ   6       ~100
Launch Pad   48      32        PreLaunch     25      ~100      PreLaunch    200
Lnch/Ascnt   0.15              Flight        .17               Ascent       .17
                               Core Pow Flt
                               C Eng Shtd
Orbit        12.65             On-Orbit                        On-Orbit     1.5
                               Payld Sprtn
                               Core DeOr
RecModRbst   0.1               PA Mod Sprtn                    Flyback      .17
RecModFlt    33.2              PA Recovery
                               BRM Sprtn
                               BRM Recvry
Table 3.1. Mission Phases
This difference would result in significantly different failure states at launch time
and directly impacts the availability of the avionics for launch.
Various reliability requirements have been put forth for MPRAS applications.
Boeing indicates a requirement of 0.999998 for the first flight of the core recoverable
P/A module and a 0.9999 for the 50th and last flight for that module. A reliability
of 0.9999997 is allocated to computing and 0.9999984 is allocated to I/O. General
Dynamics has put forth a requirement of 0.99999 at the end of a 10 minute launch
phase with the assertion that reliabilities better than 0.9999 are not likely to be cost
effective. General Dynamics has set a pre-launch availability goal of 0.99954 and
100% fault detection for all test methods combined with 99% from built-in-test
features of the VLSI chips.
The fault models for the reliability analysis will include transient and permanent
faults. The failure rates for the modules will be for the most part based on
MIL-HDBK-217E. Permanent failures will use the base rates adjusted by the
appropriate microelectronics environmental factor for the mission phase of interest.
Allowance for transient failures occurring at rates faster than the permanent failure
rate will be made.
Other fault models that should eventually be taken into account for MPRAS
reliability evaluations include the common mode failure. The common mode failure
violates the independent failure assumption upon which the reliability of redundant
systems is based. Mechanisms need to be included in the design to cope with
important subsets of common mode failure mechanisms. Since there may be up to
200,000 connector contacts in an MPRAS application, other fault mechanisms such
as intermittent failures due to mechanical vibration should also be included. The
electromagnetic environment (EME) within which the MPRAS system operates can
pose a significant threat to the safe operation of these systems. Whether by
radiation or conduction, electrical energy from on-board components, radar
equipment, and RF communications equipment and from external sources such as
lightning strikes, high energy radio frequency broadcasts (HERF), nuclear radiation
and electrostatic discharge can couple into digital microelectronics and induce
faults. Digital microelectronic systems are particularly vulnerable to this coupled
energy because of fast circuit switching times, the relatively small amount of energy
required to upset their operation and the fact that critical operating state
information stored in registers and memories can be easily lost.
The NASA Langley reliability modeling support tools ASSIST and SURE were used
for reliability analyses.
3.2. Hardware Module Failure Rates
Projected failure rates for the General Dynamics and Boeing MPRAS hardware
modules were not reported in the MPRAS documentation. Boeing, however,
specified reliability objectives for each major hardware unit. Since JIAWG or
JIAWG-like modules were proposed for most of the hardware functions, it was
decided to use the failure rates for these modules as the basis for reliability analyses.
The MIL-HDBK-217E-predicted failure rates that were available for JIAWG
modules of interest are shown in Table 3.2. These failure rates have been translated
to the Ground Fixed environment. The Ground Fixed environment will be used for
the pre-launch on-pad portion of the MPRAS mission. The assumed failure rates for
General Dynamics hardware modules are given in Table 3.3. The rate for the
self-checking processor is an estimate which assumes that two processors, two 1
megaword memories and two program ROM's are on board along with a single
test and maintenance processor and a self-checking comparator. This estimate
assumes a memory failure rate of 1.6 x 10^-5/hr for 1 megaword memory when
added to an existing module. General Dynamics did not address the feasibility of
this degree of complexity for a module or if a 10 MIP self-checking processor can be
designed to have a power consumption of 10 to 15 watts. The General Dynamics
Local Data Link has not been defined. Its failure rate will be assumed to be the
same as a JIAWG data processing module.
Table 3.4 gives the failure rates projected for the Boeing hardware modules. The
Fault Tolerance Module for the Boeing architecture was not well defined. Based on
rough descriptions of its functionality and a block diagram, it is judged to be of the
same complexity as the Processor. It has less memory but has the added function of
a voter, cross-channel interfaces and a bus switching unit.
The AIPS FTP failure rates have been reported for a VLSI implementation. The
failure rates include an I/O processor and a computation processor along with two
floating point co-processors, a megabyte of PROM and RAM memory and
associated glue logic. Table 3.5 details these failure rates for the Ground Fixed
environment factor.
In order to get the FTP failure rate more or less on an equivalent footing with the
failure rates used for the processors used in the Boeing and General Dynamics
architectures, the FTP memory failure rates were modified to be consistent with
those used in the JIAWG modules. These are:
• 1.6 x 10^-5/hr for adding 1 megaword RAM to an existing module
• 5 x 10^-6/hr for adding 2 megaword PROM to an existing module
An additional 6 x 10^-6/hr was added to cover the addition of a test and
maintenance interface.
The failure rates for the remaining FTP functions such as the Input/Output
Sequencer, the Intercomputer Communications Interface Sequencer, the shared
memory, the Communicator (voter) and Dual Port Memory are not known.
However, assumptions were made for them based on estimates of function
Module                Contents                     Power        Failure Rate (Ground Fixed)
Processor             1750A processor, 3.8 MIPs    21 watts     2.4 x 10^-5/hr
                      512K word RAM
                      16K word EEPROM
                      Test/Maintenance Processor
Power Supply Module   5v @ 44 amps                              8 x 10^-6/hr
1553B Interface       1750A processor              21.5 watts   1.8 x 10^-5/hr
                      128K word RAM
                      16K word EEPROM
                      Test/Maintenance Processor
High Speed Bus        1750A processor              31.5 watts   2.4 x 10^-5/hr
                      128K word RAM
                      16K word EEPROM
                      Test/Maintenance Processor
Bulk Memory           512K word EEPROM             15 watts     1.4 x 10^-5/hr
                      16K word EEPROM
                      Test/Maintenance Processor
Table 3.2. MIL-HDBK-217E Failure Rates for JIAWG Modules
Module                     Failure Rate (Ground Fixed)
Self-checking Processor    6 x 10^-5/hr
Local Data Link (Voter)    2.4 x 10^-5/hr
High Speed Data Bus        2 x 10^-5/hr
Power Supply               8 x 10^-6/hr
Table 3.3. Failure Rates for General Dynamics Modules
Module                            Failure Rate
Processor                         2.4 x 10^-5/hr
Memory                            2.8 x 10^-5/hr
Fault Tolerance Module (Voter)    2.4 x 10^-5/hr
Power Supply                      8 x 10^-6/hr
High Speed Data Bus I/F           2 x 10^-5/hr
1773 I/F                          1.8 x 10^-5/hr
Table 3.4. Failure Rates for Boeing Modules
Component                                     Failure Rate (Ground Fixed)
Processors                                    11 x 10^-6/hr
(IOP, CP & 2 floating points and glue)
RAM (Bipolar) (1 Megabyte)                    1320 x 10^-6/hr
PROM (Megabyte)                               135 x 10^-6/hr
Table 3.5. FTP Failure Rates
complexity. That is, there is no basis other than engineering judgement for their
values. Table 3.6 gives the failure rates used for the AIPS modules.
Figure 3.2 gives the projected failure rates for processing channels in each of the
architectures based on the rates for component modules. A key parameter in
determining the reliability of MPRAS architectures during the launch phase is the
transient failure rate during this phase. For this evaluation, this rate was assumed
to be in the range of 2 to 10 times the 217E rate for the missile launch environment.
The sensor interface hardware was the least defined component for all of the
proposed architectures. Consequently, estimates of the failure rates were made. It
was assumed that all implementations would involve the use of similar components.
It was further assumed that the General Dynamics Sensor Interface module required
fewer parts since it was part of a larger assembly (RDI). Consequently, it shared
certain common components such as power supplies with other functions, whereas
the Boeing Local Signal Conditioner and presumably the Draper Device Interface
Unit are self contained. The items included in the estimate are A/D converters, Bus
I/F's, Multiplexers, 30 Differential Amplifiers, Power Supplies (low current) and test
electronics. Table 3.7 shows the failure rates used for the sensor interface.
The failure rate estimates are based on a minimum of design information and were
made by RTI for the purpose of doing rough overall system reliability comparisons.
These estimates are believed to be reasonable but are not represented as accurate.
It is assumed that the three contractors can implement the function with equivalent
failure rates. The sharing of common components across several functions inherent
in the General Dynamics RDI modularity is reflected by the lower failure rate for the
Component                               Failure Rate (Ground Fixed)
Processors (IOP, CP, Memory)            3 x 10^-5/hr
Shared Resources (Data Xchg, Memory)    2.5 x 10^-5/hr
I/O Sequencer                           2 x 10^-5/hr
IC I/F Sequencer                        2 x 10^-5/hr
Power Supply                            8 x 10^-6/hr
I/O Node                                2 x 10^-5/hr
Table 3.6. AIPS Module Failure Rates
(Failure rates per 10^6 hrs, Ground Fixed)

GENERAL DYNAMICS (1 megaword RAM)
Power  Self-Check Pair  Local Data Link  H S Data Bus   Total
8      60               24               20             112
Note: Not clear how to add memory w/o adding SCP's

BOEING AEROSPACE (1/2 megaword RAM)
Power  Proc  Fault Tolerance Module  H S Data Bus  1773  Mem   Total
8      24    24                      20            18    28/   94 + Mem

AIPS (1 megabyte)
Power  Proc  Shared Resource  IC I/F  I/O  Mem   Total
8      30    25               20      20   28/   83 + I/O + Mem
Note: Assumes that memory modules can be added

Figure 3.2. Core Processing Single Channel Failure Rates
Component                        Failure Rate (Ground Fixed)
Local Signal Conditioner         2.6 x 10^-5/hr*
(Boeing 30 inputs)
Device Interface Unit            2.6 x 10^-5/hr*
(AIPS 30 inputs)
Sensor Interface Module          1.5 x 10^-5/hr*
(General Dynamics 30 inputs)
*Note: This rate is 30% higher than the design
objective of 0.999 reliability for a 423 hour mission
called out for the Boeing Local Signal Conditioner.
Table 3.7. Sensor Interface Failure Rates
Sensor Interface Module. Since these modules will be used extensively in MPRAS,
they can influence overall system reliability. Consequently, good reliability estimates
for these modules should be a priority for the MPRAS module design specifications.
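The single channel totals of Figure 3.2 follow from adding the module rates of Tables 3.3, 3.4 and 3.6, since a channel is treated as a series string in which any module failure fails the channel. The sketch below is illustrative; the dictionary keys are hypothetical names, and memory (and, for AIPS, I/O) is left out because the figure lists those as separate additions.

```python
# Module failure rates in failures per 10^6 hours (Ground Fixed).
CHANNELS = {
    "General Dynamics": {
        "power": 8, "self_checking_pair": 60, "local_data_link": 24, "hs_data_bus": 20,
    },
    "Boeing": {
        "power": 8, "processor": 24, "fault_tolerance_module": 24,
        "hs_data_bus": 20, "if_1773": 18,
    },
    "AIPS": {
        "power": 8, "processors": 30, "shared_resources": 25, "ic_if_sequencer": 20,
    },
}

def channel_rate(modules: dict) -> int:
    """Series channel: the channel fails if any module fails, so rates add."""
    return sum(modules.values())

totals = {name: channel_rate(modules) for name, modules in CHANNELS.items()}

for name, total in totals.items():
    print(f"{name}: {total} per 10^6 hrs")
```

The sums are 112 for General Dynamics, 94 for Boeing (before memory) and 83 for AIPS (before I/O and memory).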
3.3. Core Processing Functions
Preliminary Discussion — Core processing resources for the MPRAS applications
will be made up of a number of distributed processing centers. Depending upon
which architecture and which design options are exercised, these processing centers
may be configured as separate multi-channel redundant processors or configured
such that all processing centers function as a multi-channel redundant system.
As discussed in the previous section, the MPRAS mission is multi-phased. The
reliability of the on-pad and launch phases of the mission is the focus of the
reliability analyses discussed in this section. Analysis was carried out for a
quadruplex and triplex for both the on-pad and launch phases. Reliability models
for these cases were built using ASSIST and analyzed using SURE.
Prior to describing these models in detail, an approximate analysis of the essential
parts of these models will be reviewed.
[Figure 3.3: state diagram with states 4, 3, $V_P$ and System Fail, intermediate
fault-handling states, and a branch labeled "To Model States for more than two
failures".]
Figure 3.3. Quadruplex On-pad Model
Consider a quadruplex prior to launch. In state 4 of Figure 3.3, all redundant
channels are functioning properly. Upon failure, and assuming that no
near-coincident second failures occur, a relatively fast recovery process successfully
isolates the faulty channel with a probability of $C_P$. For those faults that are not
detected and isolated by the fast process, a slower, more thorough fault detection
process (pre-launch diagnostics) may detect and isolate the faulty processor.
Otherwise, the system enters a state ($V_P$) where a fault has occurred but has not
been detected. In this state, the system is believed to be fully functional. The
system leaves this state if another fault occurs in the faulty processor, or if a fault
occurs in another processor. In state $V_P$, the system is vulnerable to additional
failures. If the vehicle is launched while in this state, the next fault that occurs
could result in loss of the vehicle. While a latent fault that could not be detected
and isolated with pre-launch diagnostics may not be error-producing after another
fault occurs, the conservative assumption is that it will be and as such leads to
system failure. Assuming that $\delta$ and $\gamma$ are much faster than the failure rate $\lambda$, the
probability of being in state $V_P$ is approximated by:
	
$$P_Q[V_P] \approx 4\lambda_P (1 - C_P)\, t_P \tag{3.1}$$
where: $\lambda_P$ = pad failure rate
       $C_P$ = effective coverage of fast and slow fault detection
       $t_P$ = time on pad
For a triplex, equation 3.1 becomes:
$$P_T[V_P] \approx 3\lambda_P (1 - C_P)\, t_P \tag{3.2}$$
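Equations 3.1 and 3.2 differ only in the number of channels, so both can be evaluated with one function. This is a minimal sketch; the function name is hypothetical and the parameter values are assumed purely for illustration.

```python
def prob_vulnerable(n_channels: int, pad_rate: float, coverage: float,
                    pad_hours: float) -> float:
    """First-order probability of being in the undetected-fault state V_P:
    n * lambda_P * (1 - C_P) * t_P  (n = 4 for eq. 3.1, n = 3 for eq. 3.2)."""
    return n_channels * pad_rate * (1.0 - coverage) * pad_hours

# Assumed illustrative values: lambda_P = 1e-4/hr, C_P = 0.99, t_P = 200 hrs.
quad_vulnerable = prob_vulnerable(4, 1e-4, 0.99, 200.0)
trip_vulnerable = prob_vulnerable(3, 1e-4, 0.99, 200.0)

print(f"quadruplex P[V_P] ~ {quad_vulnerable:.1e}")
print(f"triplex    P[V_P] ~ {trip_vulnerable:.1e}")
```

The approximation is linear in each of the failure rate, the time on pad and the uncovered fraction, so it only holds while the product stays small.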
[Figure 3.4: state diagram of the launch-phase model, entered from state 4 or from
an undetected failure during pre-launch, with a branch labeled "To Model States
for more than two failures".]
Figure 3.4. Quadruplex Launch
Figure 3.4 shows a portion of a reliability model for the launch. The states are
defined as follows:
State   Description
4       All 4 channels operational at beginning of launch interval
$V_P$   An undetected channel failure occurred during the on-pad phase
x       Recovery state
$V_L$   Undetected channel failure during launch interval
3       3 operational channels with a failed channel properly
        eliminated. System can be in this state at beginning of
        launch due to an on-pad failure or due to a failure after
        launch
$SL_n$  System failure states
During launch only a relatively fast fault detection and isolation process is assumed.
As long as the fault detection and isolation is not successful, the system is
vulnerable to additional faults. If the system starts in a vulnerable state from the
pre-launch phase, it remains vulnerable to additional faults. For this analysis, it is
assumed that after launch any fault which occurs when the system is in a vulnerable
state will result in the loss of the vehicle. Vulnerable states associated with
transitions out of the three-operational channel state are included in the model but
will not be analyzed for this discussion.
The probability of reaching state $SL_1$, that is, the probability of losing the vehicle
due to incurring additional faults after having launched in a vulnerable state, is
approximated by:
$$P_Q[SL_1] \approx P_Q[V_P]\, 3\lambda_P \left(\frac{\pi_{eL}}{\pi_{eP}}(1 + F_t)\right) t_L \tag{3.3}$$
where: $\pi_{eL}$ = the failure rate environment factor for missile launch
       $\pi_{eP}$ = the failure rate environment factor for ground
                    fixed, from MIL-HDBK-217E
       $F_t$ = the ratio of the rate of occurrence of transient
               faults to the permanent fault failure rate
       $t_L$ = duration of the launch phase.
Substituting equation 3.1 for $P_Q[V_P]$ yields:
$$P_Q[SL_1] \approx 12\lambda_P^2\, t_P\, t_L\, [1 - C_P]\, \frac{\pi_{eL}}{\pi_{eP}}(1 + F_t) \tag{3.4}$$
Similarly, the probability of vehicle loss for a triplex due to launching in a
vulnerable state is given by:
$$P_T[SL_1] \approx 6\lambda_P^2\, t_P\, t_L\, [1 - C_P]\, \frac{\pi_{eL}}{\pi_{eP}}(1 + F_t) \tag{3.5}$$
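The step from equation 3.3 to equations 3.4 and 3.5 can be written as a product of two factors: the probability of launching in the vulnerable state, and the probability that one of the remaining good channels faults during launch. The sketch below is illustrative only; function names are hypothetical and the parameter values are assumptions chosen for the example.

```python
def launch_multiplier(env_ratio: float, transient_ratio: float) -> float:
    """(pi_eL / pi_eP) * (1 + F_t): converts the pad failure rate into the
    launch-phase fault rate with transients included."""
    return env_ratio * (1.0 + transient_ratio)

def prob_loss_vulnerable_launch(n_channels: int, pad_rate: float, pad_hours: float,
                                pad_coverage: float, launch_hours: float,
                                env_ratio: float, transient_ratio: float) -> float:
    """Equation 3.4 (n_channels = 4) and equation 3.5 (n_channels = 3)."""
    # Probability of an undetected channel fault at launch (eq. 3.1 / 3.2).
    p_vulnerable = n_channels * pad_rate * (1.0 - pad_coverage) * pad_hours
    # Fault rate of the remaining good channels during launch.
    next_fault_rate = (n_channels - 1) * pad_rate * launch_multiplier(env_ratio, transient_ratio)
    return p_vulnerable * next_fault_rate * launch_hours

# Assumed illustrative values.
params = dict(pad_rate=1e-4, pad_hours=200.0, pad_coverage=0.99,
              launch_hours=0.167, env_ratio=5.0, transient_ratio=10.0)
quad_loss = prob_loss_vulnerable_launch(4, **params)
trip_loss = prob_loss_vulnerable_launch(3, **params)

print(f"quadruplex: {quad_loss:.2e}, triplex: {trip_loss:.2e}")
```

For identical parameters the quadruplex value is twice the triplex value, which is the ratio of the leading coefficients 12 and 6.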
The probability of reaching state $SL_2$ is approximated by:
$$P_Q[SL_2] \approx 6\lambda_P^2\, t_L^2 \left[\frac{\pi_{eL}}{\pi_{eP}}(1 + F_t)\right]^2 (1 - C_L)\, P[\text{state 4 at launch}] \tag{3.6}$$
Substituting for P[state 4 at launch],
$$P_Q[SL_2] \approx 6\lambda_P^2\, t_L^2 \left[\frac{\pi_{eL}}{\pi_{eP}}(1 + F_t)\right]^2 (1 - C_L)\, e^{-4\lambda_P t_P} \tag{3.7}$$
Similarly, for a triplex
$$P_T[SL_2] \approx 3\lambda_P^2\, t_L^2 \left[\frac{\pi_{eL}}{\pi_{eP}}(1 + F_t)\right]^2 (1 - C_L)\, e^{-3\lambda_P t_P} \tag{3.8}$$
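One way to see where the leading factors 6 and 3 in equations 3.6 and 3.8 come from is to integrate over the times of the two launch faults. The sketch below assumes constant rates, writes $\lambda_L$ as shorthand (not used in the report) for $\lambda_P (\pi_{eL}/\pi_{eP})(1 + F_t)$, and neglects depletion of the starting state during the short launch interval:

```latex
\begin{align*}
P_Q[SL_2] &\approx (1 - C_L)\, P[\text{state 4 at launch}]
             \int_0^{t_L} 4\lambda_L \int_s^{t_L} 3\lambda_L \, du \, ds \\
          &= 12\lambda_L^2\, (1 - C_L)\, P[\text{state 4 at launch}]
             \int_0^{t_L} (t_L - s)\, ds \\
          &= 6\lambda_L^2\, t_L^2\, (1 - C_L)\, P[\text{state 4 at launch}]
\end{align*}
```

For a triplex the two rates are $3\lambda_L$ and $2\lambda_L$, which gives the factor 3 by the same steps.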
The probability of vehicle loss for a quadruplex due to exhaustion of components,
assuming that a duplex cannot be recovered and that the system starts in state 4, is
approximated by:
$$P_Q[SL_E] \approx 12\lambda_P^3\, t_L^3 \left[\frac{\pi_{eL}}{\pi_{eP}}(1 + F_t)\right]^3 \tag{3.9}$$
Similarly, the exhaustion probability for a triplex becomes:
$$P_T[SL_E] \approx 3\lambda_P^2\, t_L^2 \left[\frac{\pi_{eL}}{\pi_{eP}}(1 + F_t)\right]^2 \tag{3.10}$$
Comparison of the relative magnitude of each of these quantities is of interest.
Consider the ratio of $P_Q[SL_1] : P_Q[SL_2]$:
$$\frac{P_Q[SL_1]}{P_Q[SL_2]} \approx \frac{2\, t_P\, [1 - C_P]}{t_L\, [1 - C_L]}\, \frac{\pi_{eP}}{\pi_{eL}}\, \frac{1}{(1 + F_t)}\, \frac{1}{e^{-4\lambda_P t_P}} \tag{3.11}$$
This ratio ranges from about 0.5 to about 100 for the following parameter ranges
of interest:
25 hrs $\le t_P \le$ 200 hrs,
$t_L$ = 0.167 hrs,
0.9 $\le C_P \le$ 0.99,
0.9 $\le C_L \le$ 0.99,
$C_P \ge C_L$,
$\pi_{eP} / \pi_{eL}$ = 1/5,
$F_t$ = 10,
10^-4 $\le \lambda_P \le$ 10^-3
Note that $C_P$ = .99 is consistent with the built-in-test design goal for fault
detection of the General Dynamics MPRAS modules. For short pre-launch phases
combined with lower but reasonable coverage of failures during launch, the two
modes of failure contribute more or less equally to system failure. As the pre-launch
phase is lengthened, the failures due to launching in a vulnerable state dominate.
This dominance can be reduced by improving the coverage of the pre-launch tests.
Consider the ratio of $P_Q[SL_1] : P_Q[SL_E]$:
$$\frac{P_Q[SL_1]}{P_Q[SL_E]} \approx \frac{[1 - C_P]\, t_P}{\lambda_P\, t_L^2} \left[\frac{\pi_{eP}}{\pi_{eL}}\, \frac{1}{(1 + F_t)}\right]^2 \tag{3.12}$$
For the parameter ranges of interest, this ratio varies from 2.8 to 2200. That is, the
probability of failure during launch due to lack of coverage in the pre-launch testing
dominates the probability of launch failure due to exhaustion of components.
The second exhaustion of components failure mode for the quadruplex during
launch occurs when the system is reconfigured to triplex prior to launch. The
probability of failure for a degraded quadruplex due to exhaustion of components is
approximated by:
$$P_{QD}[SL_E] \approx 3\lambda_P^2\, t_L^2 \left[\frac{\pi_{eL}}{\pi_{eP}}(1 + F_t)\right]^2 P_Q[\text{degenerates to 3 on pad}] \tag{3.13}$$
Substituting an approximation for the probability of degrading to triplex during
pre-launch, equation 3.13 becomes:
$$P_{QD}[SL_E] \approx 3\lambda_P^2\, t_L^2 \left[\frac{\pi_{eL}}{\pi_{eP}}(1 + F_t)\right]^2 \left(1 - e^{-4\lambda_P t_P}\right) C_P \tag{3.14}$$
Consider the ratio of $P_Q[SL_1] : P_{QD}[SL_E]$:
$$\frac{P_Q[SL_1]}{P_{QD}[SL_E]} \approx 4 \left(\frac{t_P}{t_L}\right) \frac{\pi_{eP}}{\pi_{eL}}\, \frac{(1 - C_P)}{(1 + F_t)\, C_P \left(1 - e^{-4\lambda_P t_P}\right)} \tag{3.15}$$
This ratio ranges between 1 and 117 for the parameter ranges of interest. The
probability of loss of an avionics processing site due to lack of coverage in pre-flight
diagnostics is about the same as or much greater than that due to exhaustion of
components.
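Consider the ratio $P_{QD}[SL_E] : P_Q[SL_E]$:
$$\frac{P_{QD}[SL_E]}{P_Q[SL_E]} \approx \frac{C_P}{4\lambda_P\, t_L}\, \frac{\pi_{eP}}{\pi_{eL}}\, \frac{1}{(1 + F_t)} \left(1 - e^{-4\lambda_P t_P}\right) \tag{3.16}$$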
This ratio ranges between about 2 and 20 for parameter ranges of interest.
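The four ratios can be explored numerically by evaluating the approximate probabilities at the corners of the parameter ranges. The sketch below is illustrative: it assumes the approximations of equations 3.4, 3.7, 3.9 and 3.14 hold over the whole range, uses only the corner points (each ratio is monotonic in each parameter), and all function and variable names are hypothetical.

```python
import itertools
import math

T_L = 0.167               # launch duration, hours
ENV_RATIO = 5.0           # pi_eL / pi_eP
F_T = 10.0                # transient-to-permanent fault ratio
R = ENV_RATIO * (1.0 + F_T)  # launch rate multiplier

def p_sl1(lam, t_p, c_p):
    """Eq. 3.4: launch in the vulnerable state, then one more fault."""
    return 12 * lam**2 * t_p * T_L * (1 - c_p) * R

def p_sl2(lam, t_p, c_l):
    """Eq. 3.7: uncovered fault during launch followed by a second fault."""
    return 6 * (lam * T_L * R) ** 2 * (1 - c_l) * math.exp(-4 * lam * t_p)

def p_sle(lam):
    """Eq. 3.9: exhaustion of components starting from four good channels."""
    return 12 * (lam * T_L * R) ** 3

def p_qd_sle(lam, t_p, c_p):
    """Eq. 3.14: exhaustion after degrading to a triplex on the pad."""
    return 3 * (lam * T_L * R) ** 2 * (1 - math.exp(-4 * lam * t_p)) * c_p

# Corner points of the parameter ranges, keeping only C_P >= C_L.
corners = [
    (lam, t_p, c_p, c_l)
    for lam, t_p, c_p, c_l in itertools.product(
        (1e-4, 1e-3), (25.0, 200.0), (0.9, 0.99), (0.9, 0.99))
    if c_p >= c_l
]

def span(values):
    values = list(values)
    return min(values), max(values)

ratio_311 = span(p_sl1(l, t, cp) / p_sl2(l, t, cl) for l, t, cp, cl in corners)
ratio_312 = span(p_sl1(l, t, cp) / p_sle(l) for l, t, cp, _ in corners)
ratio_315 = span(p_sl1(l, t, cp) / p_qd_sle(l, t, cp) for l, t, cp, _ in corners)
ratio_316 = span(p_qd_sle(l, t, cp) / p_sle(l) for l, t, cp, _ in corners)

for label, (low, high) in (("3.11", ratio_311), ("3.12", ratio_312),
                           ("3.15", ratio_315), ("3.16", ratio_316)):
    print(f"ratio {label}: {low:.2g} to {high:.4g}")
```

The printed spans can be compared with the ranges quoted in the text; small differences are expected because the corner values depend on the rounding used for the launch duration.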
From these simple calculations, it can be determined that for the ranges of
parameters assumed, the two main contributors to the failure of an avionics
processing site are: 1) failure due to launching with undetected failures and 2)
failure due to attrition of components from starting launch with a quadruplex
degraded to a triplex.
Self-Checking Pair with a Spare — The self-checking pair (SCP) with a spare is
the basic processing kernel for each channel of the General Dynamics architecture.
An SCP has a failure rate that is nominally twice that of a single
processor. However, the SCP should result in better fault coverage and lower fault
latency than methods that rely only on diagnostics, memory tests and
voting of computed application data with that from redundant computing channels.
A Markov model for a self-checking pair architecture with one spare is illustrated in
Figure 3.5. This model contains states and transitions to capture the occurrence of
both permanent and transient faults in the active pair and in the spare pair. It also
contains states and transitions to model the detection of a fault in the SCP and the
replacement of the faulty SCP by the spare SCP and to model the return of a
processor to the spare pool after experiencing a transient fault.
The states in the model can be represented by a vector
S=(SCP,S,T,P), where
SCP = the number of operational SCPs
S   = the number of operational spare SCPs
T   = 1 if the fault is transient,
      0 otherwise
P   = 1 if the fault is permanent,
      0 otherwise
Transitions from one state to another in the model are made according to the
following rates and probabilities:
Figure 3.5. Self-Checking Pair with Spare Model
$\lambda_P$ Permanent Processor Failure Rate
$\lambda_T$ Transient Processor Failure Rate
P, Probability that fault is detected and spare
is successfully switched in to replace failed SCP
$\delta$ Rate at which fault is detected and spare is
switched in to replace failed SCP
$\mu$ Rate at which SCP failed due to a
transient fault is recovered
The states in this launch model are described in the following table:
State Number  State Vector  Description
2             1,1,0,0       1 operational SCP
                            1 available spare SCP
3             1,0,0,1       1 operational SCP
                            Spare unavailable due to permanent fault
4             1,0,1,0       1 operational SCP
                            Spare unavailable due to transient fault
5             0,1,0,1       0 operational SCPs due to permanent fault
                            1 available spare SCP
6             0,1,1,0       0 operational SCPs due to transient fault
                            1 available spare SCP
7             0,0,0,0       0 operational SCPs
                            0 operational spare SCPs
This model was analyzed for both the on-pad and launch mission phases.
Figure 3.6 shows the probability of failure for a self-checking pair with a spare as a
function of mission time and as the coverage factor, f,, is varied. The transient
failure rate is assumed to be zero and the permanent failure rate, $\lambda_P$, is assumed to
be 6 x 10^-5/hour. When transients are included at a rate of $10\lambda_P$, the unreliability
results are increased by about a factor of 10. This is due to the modeling
assumption that transient faults that occur after either the primary or spare SCP
has failed cannot be recovered and result in channel failure. This assumption is
appropriate for launch. Prior to launch this SCP could be recovered.
For a permanent failure rate of 3 x 10^-5/hr for each processor in the SCP, the
average failure rate for an SCP with a spare is 1.32 x 10^-6/hr for a 200 hour mission
and is about 7 x 10^-7/hr for a 25 hour mission. These compare to 6 x 10^-5/hr for
an SCP. Note that this improvement in reliability is accomplished through
the use of four single processor units with memory.
Quadruplex On-pad - A Markov model for a quadruplex architecture during the
on-pad mission phase is illustrated in Figure 3.7. This model includes fault
occurrence and fault handling states for both permanent and transient faults. For
both fault types, the model contains fault handling states for fast detection,
isolation, and recovery (FDIR) mechanisms, such as switching out out-voted
processors, and for slower mechanisms, such as memory scrubs. For example,
state 11 represents a fast FDIR process after the occurrence of a permanent fault.
This process leads to a recovered state where the failed processor is switched out
(state 13) or to a slower diagnostic process (state 14). The slower diagnostic process
state can lead to system failure if another fault occurs, to the recovered state (state
13), or to an undetected fault state (state 19).
Since this is a model of system behavior during the on-pad mission phase, system
failure has to include the inability to launch. States 10 - 23 in Figure 3.7 are
operational states, and states 1 - 9 represent system failure states and those
operational states that preclude vehicle launch. Of the states from which a launch is
possible, the following are of sufficiently high probability to be included in the
launch-phase model: state 10 (4 nonfaulty processors), state 13 (3 nonfaulty
processors), state 19 (4 operating processors, one with an undetected fault), and
state 23 (3 operating processors, one with an undetected fault).
The states in this model can be represented by a vector
S=(N,P,T,R,F,UPF,UTF), where
N	 = the number of operational processors in the state
Figure 3.6. Self-Checking Pair Duplex Unreliability
Figure 3.7. Quadruplex On-Pad Model (State descriptions are in following text)
P   = 1 if a permanent fault has occurred,
      0 otherwise
T   = 1 if a transient fault has occurred,
      0 otherwise
R   = 0 if no detection/isolation/recovery actions are in progress,
      1 if fast detection/isolation/recovery actions are in progress,
      2 if slow detection/isolation/recovery actions are in progress
F   = 0 if vehicle can be launched from this state,
      1 otherwise
UPF = 1 if there is an undetected permanent fault present,
      0 otherwise
UTF = 1 if there is an undetected transient fault present,
      0 otherwise
Transitions from one state to another in the model are made according to the
following rates and probabilities:
λ_P   Permanent failure rate of a processor
λ_T   Transient failure rate of a processor
C_P1  Probability that a permanent fault is detected, isolated,
      and the failed processor switched out as a result of
      fast diagnostics
C_P2  Probability that a permanent fault is detected, isolated,
      and the failed processor switched out as a result of
      slow diagnostics
C_T1  Probability that a transient fault is detected and
      isolated as a result of fast diagnostics
C_T2  Probability that a transient fault is detected, isolated,
      and the processor recovered as a result of slow diagnostics
δ     Rate at which a permanent fault is detected, isolated,
      and the failed processor switched out as a result of
      fast diagnostics
δ_1   Rate at which a transient fault is detected
      and isolated as a result of fast diagnostics
γ     Rate at which a permanent fault is detected, isolated,
      and the failed processor switched out as a result of
      slow diagnostics
μ_1   Rate at which a processor with a transient fault
      is recovered as a result of fast diagnostics
μ_2   Rate at which a transient fault is detected, isolated,
      and the processor recovered as a result of slow diagnostics
K     Percentage of permanent faults whose errors cannot be masked
      when another undetected fault is present in the system
The states in this on-pad model are described in the following table:
State Number   State Vector      Description
10             (4,0,0,0,0,0,0)   4 operating processors; no faults
11             (4,1,0,1,0,0,0)   4 operating processors; 1 permanent fault; fast FDIR in process
12             (4,0,1,1,0,0,0)   4 operating processors; 1 transient fault; fast FDIR in process
13             (3,0,0,0,0,0,0)   3 operating processors; 1 processor switched out due to permanent fault
14             (4,1,0,2,0,0,0)   4 operating processors; 1 permanent fault; slow FDIR in process
15             (4,0,1,0,0,0,0)   4 operating processors; transient fault detected and isolated
16             (4,0,1,2,0,0,0)   4 operating processors; 1 transient fault; slow FDIR in process
17             (3,1,0,1,0,0,0)   3 operating processors; 1 permanent fault; fast FDIR in process
18             (3,0,1,1,0,0,0)   3 operating processors; 1 transient fault; fast FDIR in process
19             (4,1,0,0,0,1,0)   4 operating processors; 1 undetected permanent fault
20             (3,1,0,2,0,0,0)   3 operating processors; 1 permanent fault; slow FDIR in process
21             (3,0,1,0,0,0,0)   3 operating processors; transient fault detected and isolated
22             (3,0,1,2,0,0,0)   3 operating processors; 1 transient fault; slow FDIR in process
23             (3,1,0,0,0,1,0)   3 operating processors; 1 undetected permanent fault
1              (2,0,0,0,0,0,0)   2 operating processors; no faults; unlaunchable
2              (2,1,0,0,1,0,0)   Second failure during recovery from permanent fault
3              (2,1,0,0,1,1,0)   Second failure while undetected permanent fault exists
4              (1,1,0,0,1,0,0)   Second failure during recovery from permanent fault
5              (1,1,0,0,1,1,0)   Second failure while undetected permanent fault exists
6              (2,0,1,0,1,0,1)   Second failure during recovery from transient fault
7              (1,0,1,0,1,0,1)   Second failure during recovery from transient fault
8              (4,0,1,0,0,0,1)   Unsuccessful detection, isolation, and recovery from transient fault
9              (3,0,1,0,0,0,1)   Unsuccessful detection, isolation, and recovery from transient fault
This model was used to determine the probability of being in various states at the
end of the pre-launch phase. These probabilities are used to initialize the
starting states of the launch model. Of particular interest is the probability of being
in a state where a failure has occurred but has not been identified and the faulty
unit has not been removed from voting. Figure 3.8 shows a portion of the model
results for the undetected failure state, given various effective coverage values and
channel failure rates between 5 x 10^-5 and 1 x 10^-3/hr, at the end of 200 hours of
powered operation prior to launch.
The results for pad times of 25 and 50 hours are approximately 1/8 and 1/4,
respectively, of the 200 hour results. These results were used to instantiate the state
probability for the imperfect diagnostics state in the launch model.
Figure 3.9 shows the probability that a single quadruplex processing site will
degrade sufficiently to prevent launch at the end of 200 hours of pad time for
channel failure rates between 5 x 10^-5 and 10^-3/hr.
Quadruplex Launch - A Markov model for a quadruplex architecture during the
launch mission phase is illustrated in Figure 3.10. This model combines the
permanent and transient processor failure rates of the on-pad model with a launch
environment factor into one processor failure rate, λ_L. The model contains fault
occurrence states for up to 3 faults, allowing the initial quadruplex configuration to
degrade to a simplex. The fault handling mechanisms are modeled simply as a
holding state which leads either to a recovered state or to system failure. Since the
ability to degrade from a duplex to a simplex relies on a different fault isolation and
recovery mechanism than is required for the other reconfigurations, a different
recovery rate and probability are assigned to the duplex-to-simplex transitions.
States 9 and 12 in Figure 3.10 represent the probability that the launch mission is
begun when an undetected permanent fault is present in the system. Given that the
initiation of the launch phase is contingent upon the successful completion of the
on-pad phase, states 7, 10, 9, and 12 are assigned initial probabilities equal to their
occupancy probabilities at the completion of the on-pad phase. These occupancy
probabilities are determined from the on-pad model described previously.
The states in this model can be represented by a vector
S=(N,F,R,UF), where
N  = the number of operational processors in the state
F  = 1 if a fault has occurred and detection/isolation/recovery
     actions are in progress,
     0 otherwise
R  = 1 if detection/isolation/recovery actions are unsuccessful or
     a second fault has occurred during
     detection/isolation/recovery,
Figure 3.8. Quad On Pad (horizontal axis: effective coverage, %)
Figure 3.9. Quad Failure to Launch (horizontal axis: channel failure rate, /hr)
Figure 3.10. Quadruplex Launch Model (State descriptions are in following text)
     0 otherwise
UF = 1 if there is an undetected fault present,
     0 otherwise
Transitions from one state to another in the model are made according to the
following rates and probabilities:
λ_L  Failure rate of a processor
C_1  Probability that a fault is detected, isolated,
     and the failed processor switched out given that
     there are more than 2 non-failed processors
C_2  Probability that a fault is detected, isolated,
     and the failed processor switched out given that
     there are only 2 non-failed processors
δ_1  The rate at which a fault is detected, isolated,
     and the failed processor switched out given that
     there are more than 2 non-failed processors
δ_2  The rate at which a fault is detected, isolated,
     and the failed processor switched out given that
     there are only 2 non-failed processors
The states in this launch model are described in the following table:
State Number   State Vector   Description
7              (4,0,0,0)      4 operating processors; no faults
8              (4,1,0,0)      4 operating processors; 1 fault; FDIR in process
9              (4,0,0,1)      4 operating processors; 1 undetected fault (due to on-pad failure)
10             (3,0,0,0)      3 operating processors; 1 processor switched out due to fault
11             (3,1,0,0)      3 operating processors; 1 fault; FDIR in process
12             (3,0,0,1)      3 operating processors; 1 undetected fault
13             (2,0,0,0)      2 operating processors; no faults
14             (2,1,0,0)      2 operating processors; 1 fault; FDIR in process
15             (1,0,0,0)      1 operating processor; no faults
1              (4,1,0,1)      4 operating processors; a fault has occurred while an
                              undetected fault is present
2              (4,0,1,0)      4 operating processors; detection/isolation/recovery actions are
                              unsuccessful or a second fault has occurred during
                              detection/isolation/recovery
3              (3,1,0,1)      3 operating processors; a fault has occurred while an
                              undetected fault is present
4              (2,0,1,0)      2 operating processors; detection/isolation/recovery actions are
                              unsuccessful or a second fault has occurred during
                              detection/isolation/recovery
5              (1,0,1,0)      1 operating processor; detection/isolation/recovery actions are
                              unsuccessful or a second fault has occurred during
                              detection/isolation/recovery
6              (0,0,0,0)      No operating processors
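The launch model can be exercised numerically as a continuous-time Markov chain. The following is a minimal sketch, not the analysis tool used for this report. The transition rates are those legible in Figure 3.10; the exits from the undetected-fault states 9 and 12 (3λ_L and 2λ_L) are assumptions, all failure states are lumped into a single absorbing state labeled "FAIL", and the numerical values in the usage lines are hypothetical.

```python
import math

# Operational states of the launch model plus one lumped absorbing failure state.
STATES = [7, 8, 9, 10, 11, 12, 13, 14, 15, "FAIL"]
INDEX = {s: i for i, s in enumerate(STATES)}


def launch_generator(lam, c1, c2, d1, d2):
    """Build the generator matrix Q (rates per hour); each row sums to zero."""
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]

    def add(src, dst, rate):
        q[INDEX[src]][INDEX[dst]] += rate
        q[INDEX[src]][INDEX[src]] -= rate

    add(7, 8, 4 * lam)                        # first fault in the quadruplex
    add(8, 10, c1 * d1)                       # covered: degrade to triplex
    add(8, "FAIL", (1 - c1) * d1 + 3 * lam)   # uncovered, or second fault in FDIR
    add(10, 11, 3 * lam)
    add(11, 13, c1 * d1)                      # covered: degrade to duplex
    add(11, "FAIL", (1 - c1) * d1 + 2 * lam)
    add(13, 14, 2 * lam)
    add(14, 15, c2 * d2)                      # covered: degrade to simplex
    add(14, "FAIL", (1 - c2) * d2 + lam)
    add(15, "FAIL", lam)
    # Assumption: any further fault while an undetected fault is present is fatal.
    add(9, "FAIL", 3 * lam)
    add(12, "FAIL", 2 * lam)
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2.0 ** squarings for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, 20):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def state_probabilities(p0, q, hours):
    """Return p(t) = p(0) exp(Q t) as a dict keyed by state label."""
    n = len(STATES)
    e = mat_exp([[x * hours for x in row] for row in q])
    return {s: sum(p0.get(STATES[i], 0.0) * e[i][j] for i in range(n))
            for j, s in enumerate(STATES)}


# Hypothetical inputs: lambda_L = 1e-3/hr, C1 = 0.9, C2 = 0.5, FDIR rates of 3600/hr.
q = launch_generator(lam=1e-3, c1=0.9, c2=0.5, d1=3600.0, d2=3600.0)
p = state_probabilities({7: 1.0}, q, hours=10 / 60)
print(p["FAIL"])
```

The initial distribution can be spread over states 7, 9, 10, and 12 to represent the occupancy probabilities carried over from the on-pad model.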
The launch model for the quadruplex was examined using different permanent
failure rates, different transient failure rates, different launch fault coverage factors,
and different pre-launch powered avionics times.
Table 3.8 gives the probability of system failure due to imperfect pre-launch
diagnostics for various failure rates, pre-launch intervals, and diagnostic coverages.
Transient failures are assumed to occur at 10 times the permanent rate.
                                        Pre-launch Coverage
Pre-launch   Channel Failure Rate
Time (hrs)   (Ground Fixed)/hr      0.9           0.95          0.99          0.995
25           5 x 10^-5              7 x 10^-7     3.5 x 10^-7   7 x 10^-8     3.5 x 10^-8
             10^-4                  2.8 x 10^-6   1.4 x 10^-6   2.8 x 10^-7   1.4 x 10^-7
             5 x 10^-4              4.9 x 10^-6   2.5 x 10^-6   4.9 x 10^-7   2.5 x 10^-7
50           5 x 10^-5              1.4 x 10^-6   7 x 10^-7     1.4 x 10^-7   7 x 10^-8
             10^-4                  5.5 x 10^-6   2.8 x 10^-6   5.5 x 10^-7   2.8 x 10^-7
             5 x 10^-4              9.8 x 10^-6   4.9 x 10^-6   9.8 x 10^-7   4.9 x 10^-7
200          5 x 10^-5              5.6 x 10^-6   2.8 x 10^-6   5.6 x 10^-7   2.8 x 10^-7
             10^-4                  2.2 x 10^-5   1.1 x 10^-5   2.2 x 10^-6   1.1 x 10^-6
             5 x 10^-4              3.9 x 10^-5   2 x 10^-5     3.9 x 10^-6   2 x 10^-6
Table 3.8. Probability of Failure Due to Imperfect Pre-launch Diagnostics
Regardless of the length of the pre-launch interval, pre-launch diagnostic coverages
of 0.99 or greater result in unreliabilities of 3.9 x 10^-6 or better for
all channel failure rates considered. For a 25 hour pre-launch interval, coverages of
0.9 or better result in unreliabilities of 4.9 x 10^-6 or better. There are combinations
of pre-launch coverage, pre-launch time and channel failure rate which will cause the
processing site unreliability to exceed 1 x 10^-5. It is concluded that the pre-launch
diagnostic coverage needs to exceed 0.95. If the pre-launch interval is closer to 200
hours, the coverage needs to be closer to 0.99.
Table 3.9 gives the conditional probability that a processing site fails during the
launch phase given that the quadruplex degraded to a triplex during the pre-launch
interval and experienced two failures during launch. The duplex coverage is
assumed to be 0.5 and the transient failure rate is assumed to be 10 times the
permanent failure rate. Only the combination of a 200 hour pre-launch interval and a
permanent ground fixed failure rate of 5 x 10^-4/hour approaches an unreliability of
1 x 10^-5.
Channel Failure Rate     Pre-launch Time (hrs)
(Ground Fixed)/hr        25            50            200
5 x 10^-5                1.7 x 10^-9   3.4 x 10^-9   1.2 x 10^-8
10^-4                    1.4 x 10^-8   6.7 x 10^-8   9.6 x 10^-8
5 x 10^-4                1.7 x 10^-6   3.2 x 10^-6   8.3 x 10^-6
Table 3.9. Probability of Launch Failure For a Quad Degraded to Triplex During
Pre-launch (Duplex Coverage = 0.5)
Table 3.10 gives the probability of failure due to a first failure during
the launch phase for different values of coverage during the launch phase. Launch
coverages of 0.8 and 0.9 result in a processing site unreliability above 1 x 10^-5
when the channel failure rate is 5 x 10^-4/hour.
Triplex - A Markov model for a triplex architecture is illustrated in Figure 3.11.
This model includes fault occurrence and fault handling states for both permanent
and transient faults. The fault handling mechanisms are modeled simply as a
holding state which leads either to a recovered state or to system failure.
The states in this model can be represented by a vector
S=(N,T,P), where
                                        Launch Coverage
Pre-launch   Channel Failure Rate
Time (hrs)   (Ground Fixed)/hr      0.8           0.9           0.95          0.99
25           5 x 10^-5              2.7 x 10^-7   1.4 x 10^-7   7 x 10^-8     1.4 x 10^-8
             1 x 10^-4              1.1 x 10^-6   5.5 x 10^-7   2.7 x 10^-7   5.5 x 10^-8
             5 x 10^-4              2.6 x 10^-5   1.3 x 10^-5   6.5 x 10^-6   1.3 x 10^-6
50           5 x 10^-5              2.7 x 10^-7   1.4 x 10^-7   7 x 10^-8     1.4 x 10^-8
             1 x 10^-4              1.1 x 10^-6   5.5 x 10^-7   2.7 x 10^-7   5.5 x 10^-8
             5 x 10^-4              2.4 x 10^-5   1.2 x 10^-5   6 x 10^-6     1.2 x 10^-6
200          5 x 10^-5              2.6 x 10^-7   1.3 x 10^-7   6.5 x 10^-8   1.3 x 10^-8
             1 x 10^-4              1 x 10^-6     5 x 10^-7     2.5 x 10^-7   5 x 10^-8
             5 x 10^-4              2 x 10^-5     1 x 10^-5     5 x 10^-6     1 x 10^-6
Table 3.10. Probability of Failure Due to Imperfect Coverage During Launch
Figure 3.11. Triplex Launch Unreliability
N = the number of operational processors in the state
T = 1 if a transient fault has occurred and
detection/isolation/recovery actions are in progress,
0 otherwise
P = 1 if a permanent fault has occurred and
detection/isolation/recovery actions are in progress,
0 otherwise
Transitions from one state to another in the model are made according to the
following rates and probabilities:
λ_P  Permanent processor failure rate
λ_T  Transient processor failure rate
P, Probability that a fault is detected, isolated,
and the failed processor switched out
δ    Rate at which a fault is detected, isolated,
     and the failed processor switched out
µ	 Rate at which a processor failed due
to a transient fault is recovered
The states in this model are described in the following table:
State Number   State Vector   Description
2              (3,0,0)        3 operating processors; no faults
3              (3,0,1)        3 operating processors; 1 permanent fault; FDIR in progress
4              (3,1,0)        3 operating processors; 1 transient fault; FDIR in progress
5              (2,0,0)        2 operating processors; no faults
6              (2,0,0)        2 operating processors; no faults
1              (0,0,0)        1 operating processor; loss of system
The recommended configuration for the General Dynamics architecture is a triplex.
Assuming that the pre-launch diagnostics are perfect, the system can start the
launch phase undegraded. Under this assumption the unreliability of a triplex
configuration was examined for different launch fault coverages, different channel
failure rates and different transient failure rates. Table 3.11 gives the approximate
probability of failure for various channel failure rates for a triplex experiencing two
failures during the launch phase, the second of which results in an uncovered duplex
failure. Depending upon the transient failure rate scale factor, a triplex whose
channel failure rate exceeds 5 x 10^-4/hour or 1 x 10^-3/hour will not meet a
1 x 10^-5 unreliability requirement for a 10 minute launch phase.
Transient   Channel Failure Rate/hour   Duplex Coverage
Factor      (Ground Fixed)              0.5            0.8
2           10^-4                       1 x 10^-7      4 x 10^-8
            5 x 10^-4                   2.5 x 10^-6    1 x 10^-6
            10^-3                       1 x 10^-5      4 x 10^-6
10          10^-4                       1.4 x 10^-6    5.5 x 10^-7
            5 x 10^-4                   3.5 x 10^-5    1.4 x 10^-5
            10^-3                       1.4 x 10^-4    5.6 x 10^-5
Table 3.11. Triplex Launch Unreliability
Multiple Core Processing Sites (Launch Phase) - Using the channel failure
rates given in Figure 3.2, the reliability of a single processing site can be
determined. If the General Dynamics self-checking pair (SCP) is spared in each
channel, the model for the spared SCP must be used for the processor element.
Since multiple processing sites are needed for MPRAS applications, the reliability of
a network of processing sites is needed. Determination of multiple site reliability
depends upon the techniques used to interconnect the sites. For certain designs, the
interconnection mechanism can be treated as an N modular redundant component
whose failure probability contributes to system failure as an independent component,
and each processing site can be treated as an independent N modular redundant
component. At the other extreme, the interconnection is such that the channel
failure rates for each processing site and for one interconnect channel must be
combined to give a composite channel failure rate. This connection scheme will be
referred to as direct connection in this discussion. Analysis of systems which do not
fit either of these categories is often difficult because events leading to
failure are not always independent and mutually exclusive.
The analyses presented by Boeing in their MPRAS Fourth Quarterly Review
assumed that the interconnection scheme is a quad modular bus whose failure can
be treated independently. Their justification for this assumption is based on the use
of a switching bus interface unit (BIU) to provide processing site access to the quad
redundant busses. The derivation of General Dynamics MPRAS dependability
goals, as presented in MPRAS Reference Vehicle Requirements, implies that the
multiple processing sites are interconnected such that the core processing can be
treated as a composite multistring system. That is, the channel failure rates for all
sites can be added to give a composite channel failure rate. This implies that
processing channels for each site are directly connected to a bus and that no
cross-channel voting or bus switching is employed.
The AIPS interconnection provides for redundant connection layers or channels with
voting or selection of data received on each layer. Processing channels can
transmit on one layer. Analyses which simplify and bound the reliability of this
type of interconnection have been presented.
If the direct connection of processing site channels without cross-channel voting or
selection is assumed for the General Dynamics architecture, the processing site
channel failure rates for all critical sites which are interdependent should be
combined and used as the channel failure rate for the composite redundant system.
Depending upon the assumption for the transient failure rate, a triplex, which is the
recommended configuration for the General Dynamics architecture, will not meet a
10 minute launch unreliability goal of 1 x 10^-5 when the composite ground fixed
channel failure rate exceeds either 2 x 10^-4/hour for a transient rate factor of 10 or
10^-3/hour for a transient rate factor of 2.
If M identical processing sites are required, the channel failure rate for each site
must not exceed 1/M of these composite rates. If the estimated failure rate for a
processing channel without a spare SCP given for General Dynamics in Figure 3.2 is
used, the number of processing sites will be limited to between 2 and 10 depending
upon the transient multiplying factor. Once again the reliability sensitivity to the
transient multiplying factor and the need to better quantify this factor is apparent.
Use of a spare SCP in each site processing channel would approximately double the
number of processing sites that could be supported.
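The site limit follows from dividing the composite rate limit by the per-site channel failure rate. The sketch below illustrates the arithmetic only. The per-site rate of 1 x 10^-4/hour is an assumed value, chosen because it reproduces the stated range of 2 to 10 sites; it is not taken from Figure 3.2. The function name is hypothetical.

```python
import math


def max_supported_sites(composite_rate_limit, site_channel_rate):
    """Largest M such that M * site_channel_rate <= composite_rate_limit."""
    # The small epsilon guards against floating-point round-off in the division.
    return math.floor(composite_rate_limit / site_channel_rate + 1e-9)


SITE_RATE = 1e-4  # assumed per-site channel failure rate, per hour

print(max_supported_sites(2e-4, SITE_RATE))  # transient rate factor of 10 -> 2
print(max_supported_sites(1e-3, SITE_RATE))  # transient rate factor of 2  -> 10
```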
For the direct connection approach, the probability of system failure for highly
reliable systems tends to be proportional to M^2 for a triplex and M^3 for a
quadruplex.
Assuming that the interconnection allows each site and the interconnection
mechanism to fail independently as N modular redundant elements and that all
processing sites are identical, the probability of system failure is given by:
$$P_{SF} = 1 - (1 - P_I)(1 - P_P)^M \qquad (3.17)$$
where:
P_SF = Probability of system failure
P_I  = Probability that interconnection fails
P_P  = Probability that a processing site fails
M    = Number of processing sites
Assuming that the interconnection failure probability is much smaller than the site
probability of failure, the probability of system failure is approximated by:
$$P_{SF} \approx 1 - (1 - P_P)^M \qquad (3.18)$$
If P_P is also small, the expression can be approximated by:
$$P_{SF} \approx M P_P - \frac{M!}{2\,(M - 2)!} P_P^2 \qquad (3.19)$$
The probability of system failure for M triplex processing sites connected in this
manner, under the assumptions that P_I ≪ P_P and P_P is small, is proportional to M
and would be a factor of M smaller than the direct connect case. For a quadruplex,
it is a factor of M^2 smaller. From the reliability standpoint, architectures whose
interconnection mechanisms support these latter assumptions can be more readily
scaled to larger configurations (increased M) than can the direct interconnect
architectures. The practical limit of this approach is the point where the
unreliability of the fault-tolerant interconnection due to the voters, switches and
other components is no longer much smaller than the processing site unreliability.
For a given interconnect failure rate, this places a lower limit on the processing site
unreliability beyond which system reliability will not improve.
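The three expressions for independently failing sites can be compared numerically. The sketch below evaluates the exact form (3.17), the sites-only form (3.18), and the second-order form (3.19). The function names and the sample probabilities are illustrative assumptions, not values from the contractor analyses.

```python
from math import comb


def system_failure_exact(p_i, p_p, m):
    """Equation 3.17: interconnection and M identical sites fail independently."""
    return 1 - (1 - p_i) * (1 - p_p) ** m


def system_failure_sites_only(p_p, m):
    """Equation 3.18: interconnection failure probability neglected."""
    return 1 - (1 - p_p) ** m


def system_failure_second_order(p_p, m):
    """Equation 3.19: first two terms of the binomial expansion of 3.18."""
    return m * p_p - comb(m, 2) * p_p ** 2


# Hypothetical values: site unreliability 1e-5, interconnect 1e-7, 10 sites.
print(system_failure_exact(1e-7, 1e-5, 10))
print(system_failure_sites_only(1e-5, 10))
print(system_failure_second_order(1e-5, 10))
```

The coefficient M!/(2(M - 2)!) in equation 3.19 is the binomial coefficient "M choose 2", which is why `comb(m, 2)` is used.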
Analyses reported for both the Boeing and AIPS architectures indicate that their
proposed interconnection reliability will be better than the reliability of a processing
site.
Sparing for interconnection architectures of this type takes the form of additional
processing sites which may be substituted for any failed site. The Boeing
architecture has provision for this type of sparing. The structure of the Boeing
transducer and flight control networks permits this to be accomplished by simply
activating the spare processor.
Supporting sparing in this manner is within the capability of the AIPS architecture
building blocks. However, switching in a single spare so that the sensor/actuator
I/O network or networks for all processing sites in a configuration can be
serviced may not be cost effective. The need for, or method of, sparing for MPRAS
applications has not been proposed for AIPS at this point.
Table 3.12. Models for Probability that Maintenance Will be Required Prior to
Launch
Site          Interconnection
Redundancy    Direct       Independent
Triplex       3Mλt         3Mλt[1 - 2(M - 1)λt]
Quadruplex    6(Mλt)^2     6M(λt)^2[1 - 9(M - 1)(λt)^2]
It is feasible that both quadruplex and triplex configurations composed of several
processing sites can meet the mission reliability requirements. Interconnection
mechanisms which provide for voting or bus selection are superior to and more scalable
than direct connection methods. While General Dynamics has indicated in some
documents that a direct connect, composite, multi-channel system is their
recommended configuration, some documents also indicate that processing sites can
vote data passed between sites. It is not clear which option is recommended.
Multiple Processing Sites (Prelaunch) - For triplex processing sites, a failure
of a redundant channel in any processing site is sufficient to require maintenance
prior to launch. Similarly, two redundant channel failures in any processing site are
sufficient to require maintenance for a quadruplex configuration. Table 3.12 gives
models that approximate the probability of maintenance for triplex and quadruplex
configurations using both direct and independent interconnection of processing sites.
Equation 3.19 was used for the independent interconnection configuration.
By varying λ, M, and t over the ranges of interest, the probability that maintenance
will be required can be determined. With λ = 1 x 10^-4/hour, which is
approximately the permanent failure rate of a processing channel in each
architecture, the probability that maintenance will be required ranges
from 7.5 x 10^-3 to 6 x 10^-1 for triplex redundancy, 1 ≤ M ≤ 10, and 25 hrs ≤ t ≤
200 hrs. Even for the most favorable parameters, the probability that maintenance
will be required could be unacceptably high.
For the same parameter ranges, the directly interconnected quadruplex probability
of maintenance ranges from 3.8 x 10^-5 to 2.4 x 10^-1 and varies as M^2 t^2. If t = 200
hrs and M = 2, the probability of required maintenance would be 0.01. The
independently interconnected quadruplex probability varies as M t^2 and ranges from
3.8 x 10^-5 to 2.3 x 10^-2. A 0.01 probability results when t = 200 hrs and M = 4.
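The Table 3.12 models are simple enough to evaluate directly. The sketch below codes the four expressions and evaluates them at the corners of the parameter ranges discussed above. The function names are illustrative, and λ = 1 x 10^-4/hour is taken as the channel failure rate because it reproduces the endpoints quoted in the text.

```python
def triplex_direct(m, lam, t):
    """Table 3.12, triplex sites, direct interconnection: 3*M*lam*t."""
    return 3 * m * lam * t


def triplex_independent(m, lam, t):
    """Table 3.12, triplex sites, independent interconnection."""
    x = lam * t
    return 3 * m * x * (1 - 2 * (m - 1) * x)


def quad_direct(m, lam, t):
    """Table 3.12, quadruplex sites, direct interconnection: 6*(M*lam*t)^2."""
    return 6 * (m * lam * t) ** 2


def quad_independent(m, lam, t):
    """Table 3.12, quadruplex sites, independent interconnection."""
    x = lam * t
    return 6 * m * x ** 2 * (1 - 9 * (m - 1) * x ** 2)


LAM = 1e-4  # per hour; assumed value consistent with the quoted endpoints

print(triplex_direct(1, LAM, 25), triplex_direct(10, LAM, 200))      # about 7.5e-3 and 0.6
print(quad_direct(1, LAM, 25), quad_direct(10, LAM, 200))            # about 3.75e-5 and 0.24
print(quad_direct(2, LAM, 200), quad_independent(4, LAM, 200))       # both close to 0.01
print(quad_independent(10, LAM, 200))                                # about 2.3e-2
```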
3.4. Acquisition Network Reliability
The hardware modules required to support the sensor/actuator communications
function in MPRAS applications represent one half or more of the total hardware
requirements. It is also the case that to date no reliability analyses of the
acquisition networks have been reported by the MPRAS contractors. Based on this,
it was decided that this area of design should be examined more closely.
Accordingly, the reliability characteristics of this function were assessed for each
architecture. Given the early design stages of the MPRAS architecture, the lack of
specific application details and the breadth of MPRAS applications, a hypothetical
sensor data acquisition problem was synthesized. The characteristic of this
hypothetical application is simplicity and its purpose is to examine general issues
related to the reliability of proposed architectures.
The only reasonably complete description of the sensors and actuators for a vehicle
was provided by General Dynamics in the MPRAS Reference Vehicle Requirements.
This document details a representative set of sensors for what would be considered
to be a demanding application for MPRAS. The application incorporates adaptive
GN&C, advanced telemetry processing, integrated Health Monitoring, and Fluids
Management, as well as propulsion control and other typical avionics functions.
From this representative application it can be concluded that flight-critical sensors
and actuators of mixed redundancy will be distributed throughout the entire
vehicle. Upwards of 3000 transducers are identified for the core and booster.
Based on the characteristics of the reference vehicle requirements, a simple sensor
acquisition problem was defined. This hypothetical application was then used to
examine the reliability characteristics of the candidate MPRAS architectures. In
this application it was assumed that sensors are grouped into N sections of the
vehicle and that all sensors have the same redundancy.
Figure 3.12 shows an acquisition network to collect the sensor data. Sensor interface
units collect data from a set of sensors. Multiple busses deliver redundant sensor
data to central computing resources. This network has local voting planes
associated with each sensor group. Thus, if properly implemented, consistent sensor
data can be provided. The network is representative of the architecture proposed by
General Dynamics.
Figure 3.13 shows a network which is representative of that proposed by Boeing
Aerospace. This architecture has a central voting plane. It is also presumed to be
the preferred configuration for the AIPS architecture, although both configurations
should be feasible with AIPS building blocks.
Consider the local voter network shown in Figure 3.12. The network is shown as
triply redundant, but the quad-redundant case will also be considered. Assume that
the failure to deliver at least 2 redundant inputs from any sensor group is sufficient
to cause system failure. Within a sensor group, loss of 2 sensor interfaces for a
triplex or 3 sensor interfaces for a quadruplex configuration results in system failure.
The loss of a sufficient number of busses or the loss of any voting plane will also
result in failure. The sensor interface will be taken to be any component which
delivers sensor data to the voter. It is assumed that the voter outputs are coupled
more or less directly to the bus. A bus link will be lost if a babbling bus transmitter
cannot be disabled, if bus connections fail, or if the destination receiver and bus
interface fail. Assuming that all sensor group interfaces have the same failure rate
and that all voter channels have the same failure rate, the probability that a local
voting triplex system is operational is given by:
$$P_{SG} = (1 - 3P_v^2 + 2P_v^3)^N \, (1 - 3P_u^2 + 2P_u^3)^N \left[ (1 - P_B)^3 + 3P_B(1 - P_B)^2 \frac{1}{(1 + 2P_v)^N} \right] \qquad (3.20)$$
where: P_v = Probability that a voter channel fails,
P_u = Probability that a sensor group channel fails,
P_B = Probability that a bus channel fails,
N = Number of sensor groups.
The left most factor of this equation is the probability that at least 2 of the :3 voter
channels are functional. Since there are N such components, this factor appears to
the Nth power. Similarly, the second factor represents that at least 2 out of :3 sensor
interface channels within a group are functional. Again, this factor appears to the
Nth power. The third factor is similar to the probability that two or more of three
bus channels are operational, that is,
[(1 — PB ) 3 + 3PB (1 — PB)1]	 (3.21)
However, it differs due to combinations of bus channel failures and voter failures.
This factor will be referred to as an "adjusted" bus reliability factor. Similarly, the
61
.............................................................................................
SENSOR GROUP 1	 S /F
LOCAL	 Al
VOTING
PLANES
......................
............................
SENSOR GROUP 2:	 S I/F
LOCAL
VOTING
PLANES	 ...... ......................
•	 •	 •
	• 	 •
SENSOR GROUP N: 	 S I/F	 S 11F
LOCAL
VOTINGPLANES.............................
......
•
................................... .....
S	 I/F
Cn
7—.................................
BUS-A	 BUS-B I	 BUS-C
Figure 3.12. Sensor Acquisition Network with Local Voting
62
----'---------------------------
SENSOR GROUP 1
SENSOR GROUP o LQ2
-'-'-_''_--' ...................................0	 K	 0	 U	 0
0	 0	 0
SENSOR GROUP N
Cn
8VS'A
CENTRAL VOTING PLAmE----~ 	 ^	 ^
Figure 3.13. Sensor Acquisition Network with Central Voting
63
probability that a local voting quadruplex system is operational is given by:
P,C = ( 1 -4P,3 +3Pu) N ( 1 -4P„+3Pu )'v 	 [1 + 2Fv +3Pv2I0 - PB ) 4 +4PB (1 - YB)3	
(1 +.2P„] N Z
 N +6F'B(1 - PB ) z 	 1	
JL	 1	 [1 + 2P^ +31'2111(3.22)
Here, the third factor is similar to the probability that two or more of four bus
channels are operational, that is,
$$\left[(1 - P_B)^4 + 4P_B(1 - P_B)^3 + 6P_B^2(1 - P_B)^2\right] \qquad (3.23)$$
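The local voting expressions (3.20) and (3.22) can be evaluated numerically. The following is a minimal Python sketch, assuming the equations are read as a voter factor and a sensor interface factor, each to the Nth power, multiplied by the "adjusted" bus factor in which each bus-loss term is weighted by the conditional probability that the surviving voters in all N groups still form a working set. The function names and the sample probabilities are illustrative and do not come from the report.

```python
def two_of_three(p):
    """Probability that at least 2 of 3 independent channels work."""
    return 1 - 3 * p**2 + 2 * p**3


def two_of_four(p):
    """Probability that at least 2 of 4 independent channels work."""
    return 1 - 4 * p**3 + 3 * p**4


def local_triplex(p_v, p_s, p_b, n):
    """Eq. 3.20 as read here: local voting, triplex, N sensor groups."""
    adjusted_bus = ((1 - p_b)**3
                    + 3 * p_b * (1 - p_b)**2 * (1 / (1 + 2 * p_v))**n)
    return two_of_three(p_v)**n * two_of_three(p_s)**n * adjusted_bus


def local_quadruplex(p_v, p_s, p_b, n):
    """Eq. 3.22 as read here: local voting, quadruplex, N sensor groups."""
    d = 1 + 2 * p_v + 3 * p_v**2
    adjusted_bus = ((1 - p_b)**4
                    + 4 * p_b * (1 - p_b)**3 * ((1 + 2 * p_v) / d)**n
                    + 6 * p_b**2 * (1 - p_b)**2 * (1 / d)**n)
    return two_of_four(p_v)**n * two_of_four(p_s)**n * adjusted_bus


if __name__ == "__main__":
    # Illustrative channel failure probabilities, not values from the report.
    for n in (4, 8, 12):
        print(n,
              1 - local_triplex(1e-3, 1e-3, 1e-3, n),
              1 - local_quadruplex(1e-3, 1e-3, 1e-3, n))
```

For N = 1 and a perfect sensor interface, both functions reduce to the ordinary 2-of-3 and 2-of-4 reliability of a chain made of one voter channel in series with one bus channel, which is a convenient check on the reading of the adjusted bus factor.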
Consider the central voting architecture shown in Figure 3.13. Sensor interfaces are
directly coupled to the bus associated with a given redundant channel. The sensor
interface is likely to be less complex than that for the previously illustrated
architecture since it does not have to drive a dedicated local voter. Bus failures can
be taken to be the failure of bus connectors, uncontrollable failure of bus drivers at
the sensor interface and the failure of the destination receiver and all support
components which deliver bus data to the central voting plane. The reliability of a
central voter triplex system is given by:
$$P_{SG} = \left(1 - 3P_v^2 + 2P_v^3\right)\left(1 - 3P_s^2 + 2P_s^3\right)^N \left[(1 - P_B)^3 + 3(1 - P_B)^2 P_B\left(\frac{1}{1 + 2P_s}\right)^N\right] \qquad (3.24)$$
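The reliability of a central voter quadruplex system is given by:
$$P_{SG} = \left(1 - 4P_v^3 + 3P_v^4\right)\left(1 - 4P_s^3 + 3P_s^4\right)^N \left[(1 - P_B)^4 + 4(1 - P_B)^3 P_B\left(\frac{1 + 2P_s}{1 + 2P_s + 3P_s^2}\right)^N + 6P_B^2(1 - P_B)^2\left(\frac{1}{1 + 2P_s + 3P_s^2}\right)^N\right] \qquad (3.25)$$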
The form of these equations is similar to that of equations 3.20 and 3.22 in that
they contain factors for the voter, the sensor interface and the bus. However, the
voter factor is not to the Nth power and the bus factor is not the same as that for
the local voting configuration. While the "adjusted" bus reliability factor is written
in a form that is similar to that of the local case, it contains factors that depend on
the sensor interface failure probability. This is due to the fact that without the
intervening local voter, the loss of sensor interface units and bus channels are no
longer independent.
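The central voting expressions (3.24) and (3.25) can be sketched in the same way. This is a minimal Python sketch under the reading described in the text: a single voter factor, a sensor interface factor to the Nth power, and an adjusted bus factor whose bus-loss terms depend on the sensor interface failure probability. Function names and sample values are illustrative assumptions.

```python
def at_least_2_of_3(p):
    """Probability that at least 2 of 3 independent channels survive."""
    return 1 - 3 * p**2 + 2 * p**3


def at_least_2_of_4(p):
    """Probability that at least 2 of 4 independent channels survive."""
    return 1 - 4 * p**3 + 3 * p**4


def central_triplex(p_v, p_s, p_b, n):
    """Eq. 3.24 as read here: one central voter, N sensor groups."""
    adjusted_bus = ((1 - p_b)**3
                    + 3 * (1 - p_b)**2 * p_b * (1 / (1 + 2 * p_s))**n)
    return at_least_2_of_3(p_v) * at_least_2_of_3(p_s)**n * adjusted_bus


def central_quadruplex(p_v, p_s, p_b, n):
    """Eq. 3.25 as read here."""
    d = 1 + 2 * p_s + 3 * p_s**2
    adjusted_bus = ((1 - p_b)**4
                    + 4 * (1 - p_b)**3 * p_b * ((1 + 2 * p_s) / d)**n
                    + 6 * p_b**2 * (1 - p_b)**2 * (1 / d)**n)
    return at_least_2_of_4(p_v) * at_least_2_of_4(p_s)**n * adjusted_bus


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the report's tables.
    for n in (4, 8, 12):
        print(n,
              1 - central_triplex(1e-3, 1e-3, 1e-3, n),
              1 - central_quadruplex(1e-3, 1e-3, 1e-3, n))
```

With a perfect voter and N = 1, each function reduces to the 2-of-3 or 2-of-4 reliability of a sensor interface in series with its bus channel, which reflects the coupling between sensor interface and bus losses when no local voter intervenes.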
If the probability of sensor interface unit failure is one-half, the terms in the bus
factor which depend on sensor interface unreliability (P_s) become (1/2)^N for a
triplex and (8/11)^N and (4/11)^N respectively for the quadruplex. If N or P_s is
increased, the factor tends to degenerate further from that of the independent
redundant bus factor and tends to be influenced more by the all-busses-operational
term, (1 - P_B)^k, where k is the number of bus channels.
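The one-half case can be checked with exact arithmetic. The sketch below assumes the sensor-dependent terms of the central voting bus factor are 1/(1 + 2P_s) for the triplex, and (1 + 2P_s)/(1 + 2P_s + 3P_s^2) and 1/(1 + 2P_s + 3P_s^2) for the quadruplex; the variable names are illustrative.

```python
from fractions import Fraction

p_s = Fraction(1, 2)  # sensor interface failure probability of one-half

# Triplex: term multiplying the one-bus-lost contribution (per sensor group).
triplex_term = 1 / (1 + 2 * p_s)

# Quadruplex: terms for one bus lost and for two buses lost (per sensor group).
d = 1 + 2 * p_s + 3 * p_s**2
quad_one_bus_lost = (1 + 2 * p_s) / d
quad_two_buses_lost = 1 / d

print(triplex_term, quad_one_bus_lost, quad_two_buses_lost)
```

Each of these per-group terms is raised to the Nth power in the bus factor, so they shrink quickly as the number of sensor groups grows.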
A qualitative comparison of the two types of architectures suggests the potential for
trade-offs between voter costs and system reliability. In the local voter case, the
voter reliability factor appears to the Nth power, whereas this factor only appears
once in the central voter case. For equivalent voter failure rates, this factor favors
the central voting for reliability. Since the local voters do not require the higher
throughput of the central voter, the potential exists for these voters to be less
complex, which in turn could result in a local voter that has a lower failure rate
than the central voter. The effect of this term would be weak.
If the local voter is designed so its failure rate is smaller than the sensor interface
unit failure rate, the "adjusted" bus reliability factor favors the local voting architecture.
Tables 3.13, 3.14, and 3.15 detail, for the proposed architectures, the hardware
modules and associated failure rates which are allocated to the model parameters
P_s, P_v, and P_B.
For the Boeing architecture, Local Signal Conditioners are allocated to the sensor
interface component. The central voter includes the Fault Tolerance module, the
Processor module and a Power Supply module. The modules that are associated
with bus channels include an I/O channel unit, the bus interface at the processing
site and other components such as bus connectors, links, and uncontrollable failures
of bus receiver/transmitter units.
The General Dynamics architecture provides for a sensor interface module that is
part of a Remote Data Interface (RDI). Support for sensor interface modules is
provided by the RDI. Consequently, these modules are assumed to fail at about
one-half the rate associated with the Boeing Local Signal Conditioner. A bus
channel is allocated the HSDB module at the receive end of the data collection bus.
Other parts of the bus that can fail are connectors, links, and uncontrollable bus
receiver/transmitter failures. Note also that the voter failure rate is not less than
the sensor interface unit failure rate.
At this point, it is appropriate to discuss an [illegible] feature of the General
Dynamics architecture. If the number of sensors within each section of the vehicle is
increased beyond the number that can be handled by a single sensor interface
module in each RDI, additional sensor interface modules can be incorporated into
the RDIs, and the RDIs become regional voters rather than dedicated local voters.
The additional sensor interface modules can be added with less failure rate penalty
than that for additional sensors in the central architectures. For the central
architectures, N would double if the number of sensors in the vehicle doubled. For
the General Dynamics modularity, the number of voting planes could remain the same,
and the number of sensor interface modules would double. In equations 3.20 and
3.22, the N in the voter factor and in the adjusted bus factor would be replaced by
a value equal to the number of voting planes, and the unreliability of the sensor
interface, P_s, would be increased.

Configuration: Central (Quad Recommended)

Model Parameter            Associated Hardware Module                  Total Failure Rate
                                                                       (Ground Fixed)
-------------------------  ------------------------------------------  ------------------
Sensor I/F Channel (P_s)   Local Signal Conditioner @ 2.6 x 10^-5/hr   2.6 x 10^-5/hr
Bus Channel (P_B)          I/O channel:                                6.6 x 10^-5/hr
                             HSDB I/F @ 2 x 10^-5/hr
                             1773 I/F @ 1.8 x 10^-5/hr
                             P.S. @ 8 x 10^-6/hr
                           Processing Site:
                             HSDB I/F @ 2 x 10^-5/hr
                           Other: N Links
                                  Connectors @ N x 5 x 10^-7/hr
                                  Rec/Tx Babble
Voter (P_v)                Processing Site:                            5.6 x 10^-5/hr
                             Processor @ 2.4 x 10^-5/hr
                             F.T. Module @ 2.4 x 10^-5/hr
                             P.S. @ 8 x 10^-6/hr
Table 3.13. Boeing Sensor Network

Configuration: Local (Triplex Recommended)

Model Parameter            Associated Hardware Module                  Total Failure Rate
                                                                       (Ground Fixed)
-------------------------  ------------------------------------------  ------------------
Sensor I/F Channel (P_s)   Sensor I/F @ 1.5 x 10^-5/hr                 1.5 x 10^-5/hr
Bus Channel (P_B)          Processing Site:                            2 x 10^-5/hr
                             HSDB I/F @ 2 x 10^-5/hr
                           Other: N Links
                                  Connectors @ N x 5 x 10^-7/hr
                                  Rec/Tx Babble
                           (Has spare bus for each channel)
Voter (P_v)                RDI:
                             Processor w/o spare @ 6 x 10^-5/hr        11.2 x 10^-5/hr
                                       w spare                           or
                             Local Data Link @ 2.4 x 10^-5/hr          5.3 x 10^-5/hr
                             P.S. @ 8 x 10^-6/hr
                           RDI:
                             HSDB I/F @ 2 x 10^-5/hr
Table 3.14. General Dynamics Sensor Network

Configuration: Central

Model Parameter            Associated Hardware Module                  Total Failure Rate
                                                                       (Ground Fixed)
-------------------------  ------------------------------------------  ------------------
Sensor I/F Channel (P_s)   Device I/F unit @ 2.6 x 10^-5/hr            Pre-launch:
                           I/O Node:                                     4.6 x 10^-5/hr
                             Pre-launch @ 2 x 10^-5/hr                 Launch:
                             Launch @ 1 x 10^-5/hr                       3.6 x 10^-5/hr
Bus Channel (P_B)          I/O Network: N Nodes                        Pre-launch:
                             Pre-launch: 1 @ 2 x 10^-5/hr                4 x 10^-5/hr
                             Launch: N @ 1 x 10^-5/hr                  Launch:
                               plus 1 @ 2 x 10^-5/hr                     (4 + N) x 10^-5/hr
                           Launch only:                                Nodeless:
                             Connectors                                  2 x 10^-5/hr
                             Rec/Tx @ N x 10^[exponent illegible]/hr
                           FTP:
                             IOS @ 2 x 10^-5/hr
Voter (P_v)                FTP Site:                                   6.3 x 10^-5/hr
                             Processor @ 3 x 10^-5/hr
                             Shared HW (Voter) @ 2.5 x 10^-5/hr
                             P.S. @ 8 x 10^-6/hr
Table 3.15. AIPS Sensor Network
The AIPS architecture has a Device Interface Unit for sensor inputs. Since the
AIPS I/O network is reconfigurable during the pre-launch period, I/O Nodes which
facilitate reconfiguration will be combined with the DIU for failure purposes. That
is, if an I/O Node fails prior to launch, only the associated DIU and its sensor
complement are lost. During launch, the loss of an I/O Node could result in loss of
the bus. It was assumed that one-half of the total I/O Node failure rate could result
in failures that cause loss of the I/O network if reconfiguration was not allowed.
This part of the node failure rate is allocated to the bus channel. The remaining
I/O Node failure rate is allocated to the DIU.
Both during and prior to launch, the failure of the I/O Node at the receive end of
this network and the I/O Sequencer in the destination FTP can, in effect, cause the
loss of the bus channel. Prior to launch, multiple failures of connectors, links and
bus Receiver/Transmitters are required to cause the loss of the bus. Otherwise, I/O
network reconfiguration can configure around these sources of bus failure. The
failure rate of FTP channel components where central voting occurs is allocated to
the voter failure parameters of the central model.
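The failure rates in Tables 3.13 through 3.15 must be converted to channel failure probabilities before they can be used as P_s, P_B and P_v. This section does not state the conversion, so the sketch below assumes a constant failure rate (exponential) model, P = 1 - exp(-rate x time). The rates are the Boeing totals from Table 3.13 with the N-dependent connector term omitted; the function and dictionary names are illustrative.

```python
import math


def failure_probability(rate_per_hr, hours):
    """Probability of failure by `hours`, assuming a constant failure rate."""
    # -expm1(-x) equals 1 - exp(-x) but keeps precision for small x.
    return -math.expm1(-rate_per_hr * hours)


# Boeing totals from Table 3.13 (per hour, ground fixed), connector term omitted.
BOEING_RATES = {"sensor_if": 2.6e-5, "bus": 6.6e-5, "voter": 5.6e-5}

for hours in (25, 200):
    probs = {name: failure_probability(rate, hours)
             for name, rate in BOEING_RATES.items()}
    print(hours, probs)
```

For rates of this size the result is close to the simple product of rate and time, so the powered pre-launch duration scales the channel failure probabilities almost linearly.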
The probability of system failure for various mission times was determined for
various architecture configurations. Selected results are given in Table 3.16. The
N=4 cases are equivalent to 30 quad-redundant or triple-redundant sensors in each
of four sections of the vehicle or 120 multiple redundant sensor sets. The N=8 and
N=12 cases are equivalent to 240 and 360 multiple redundant sensor sets,
respectively. Results are given for General Dynamics in their recommended triplex
configuration. Both regional and local voter, and single self-checking and spared
self-checking processor cases are given. Quadruplex configurations for Boeing, AIPS,
and General Dynamics are also given. The Boeing quadruplex and the AIPS
quadruplex with switched I/O nodes produce essentially the same results. The
AIPS quadruplex without switched I/O nodes and the General Dynamics
quadruplex with spared SCPs give the most favorable results. The reliability for the
General Dynamics architectures which use regional voter configurations does not
vary significantly with the number of sensors. This is an indication that the
reliability is dominated by the unreliability of the voter module.
[Table 3.16: tabulated results not recoverable from the source.]

If the powered pre-launch period is 200 hours, none of these configurations will meet
a 10^-5 unreliability goal. Some will meet a 10^-4 unreliability goal. The triplex
configuration will not meet a 10^-3 goal. If the powered pre-launch period is limited
to 25 hours, most configurations except the triplex will meet a 10^-5 unreliability
goal. Note that the recommended General Dynamics triplex configuration with
spare self-checking processors requires 50% more processor modules than the
quadruplex versions of other architectures and does not perform as well. Note also
that for the Boeing and AIPS architecture, the central voter can be spared. This
approach would improve the probability associated with the bus components such
as the Boeing HSDB interface and the AIPS Input/Output Sequencer. As a result
the reliability of these architectures could be further improved.
While the particular configuration analyzed contains simplifying assumptions which
may not be completely applicable to particular MPRAS applications, it is
sufficiently representative that the results should be of concern. The topology of the
networks matches those proposed by both Boeing and General Dynamics, and the
complexity or sizes considered are well within MPRAS requirements. Since the
resulting reliabilities are near or fall short of mission reliability requirements, the
design of the acquisition network should be examined much more closely. This is
particularly true for triplex configurations. If well-designed, the local voting
approach can exhibit better reliability, scalability and performance characteristics.
3.5. Testability
Testability is the ability of an item to undergo valid, dependable functional testing
and associated fault detection/isolation, within constraints of elapsed time,
complexity of access, support equipment and functional procedures, and within set
limits of manpower, material and other resources.
Testability underlies reliability, fault tolerance, maintainability, availability and
productivity. For the ALS, MPRAS testability is a key factor which enables cost
savings. The costs associated with test and maintenance for the current launch
system are significant. These include the cost of extensive checkout at all stages of
vehicle integration and the cost of support personnel and equipment to respond to
failure prior to launch. Testability can reduce the cost of building a system. The
cost of reworking a system due to an undetected fault at a particular assembly stage
increases as assembly proceeds past that point. Improved testability at all assembly
stages reduces both the frequency of reworks and the extent that assembly can
proceed before a fault is finally detected.
The increased complexity of MPRAS avionics increases testing requirements
substantially beyond that required for current launch vehicle avionics. With the
large number of components in the system, it is reasonably certain that there will be
several component failures during the pre-launch period. Improved testability
results in better fault isolation and improved mean-time-to-repair. More
importantly, improved fault diagnosis reduces the probability that the vehicle will
be launched when the avionics is in a degraded state that could jeopardize a
successful launch. Hence, mission reliability can be improved.
Testability features for each architecture were reviewed to determine if they had
been incorporated into the design to the extent appropriate for the concept,
requirements and early design stages and to determine if testability requirements
are consistent with the system maintainability, reliability, availability and fault
tolerance objectives.
The testability goals/requirements defined at this stage in MPRAS should include,
but not be limited to, the following subjects:
1. requirement for status monitoring.
2. definition of the failure modes specified to be the basis for test design.
3. requirement for failure detection (failure coverage, failure latency) using full
test resources.
4. requirement for failure detection using built-in test resources.
5. requirement for failure detection using only passive monitoring.
6. requirement for limiting false alarm rate.
7. requirement for failure localization to a subsystem/equipment using built-in
test.
8. requirement for failure isolation to one or more modules using
built-in test. The requirement may be expressed in terms of percentage of
modules in a subsystem/equipment.
9. requirement for failure localization/isolation times.
10. restrictions on built-in test resources in terms of hardware size, weight and
power, memory size and test time.
11. requirement for BIT hardware reliability.
12. BIT MTBF.
13. allowable down time.
14. percentage of false alarms.
15. mean fault detection time.
16. mean BIT running time and frequency.
17. maintenance skill levels.
18. system modularity.
19. test point isolation.
20. number of maintenance points and access.
21. test equipment and access.
The General Dynamics requirements establish goals for the MTTR, for fault
detection and for fault isolation. Testing which supports assembly, integration and
pre-flight checkout is composed of tests that are directed toward several levels of
hierarchy. These include chip level, module level, subsystem level and system level.
These tests are to be used in a manner that reduces the time required to test the
overall system but maintains a high level of fault detection. The hierarchical
breakdown reduces the testing required for complex systems.
Chip level testing is directed toward complex VLSI functions and relies upon on-chip
built-in-test (BIT). For complex VLSI functions, BIT is often the only practical way
to determine that a chip is fault free with a high level of confidence. Module level
testing is directed toward line replaceable modules and will rely upon both on-line
and off-line self-testing as well as error detection mechanisms. For example, the
processor/memory proposed is to implement a self-checking pair. Module testing
and error reporting is to be supported by the test and maintenance bus interface
provided to each module. System level testing is supported by a health maintenance
controller. This controller receives health and status reports from modules and
subsystems, monitors module/subsystem self-test functions, controls system
configuration for test and diagnostics, provides diagnostic capability for subsystem
interconnections, monitors system communications for errors, inserts data to check
error detection mechanisms and directs, monitors and diagnoses system level tests.
Independent communication will be provided to each channel of the redundant
paths to maintain isolation capability. Standard test interfaces are specified.
Consider the test and maintenance hardware for a triplex system which has ten
engine controllers, 5 remote data units, a vehicle management processor and a
guidance and navigation processor. There will be 3 system test and maintenance
modules (STM) in each of these or 17x3=51 STM modules. Assume the failure rate
for these modules in a ground fixed environment is 2x10^-5/hr. Also, assume that the
test related hardware on each module in the system represents about 20% of the
module complexity. If the average module failure rate is 2.5x10^-5/hr, the failure
rate of the related test and maintenance components is 5x10^-6/hr. The combined
failure rate for the test and maintenance hardware in a 200 module system is
200x5x10^-6 + 51x2x10^-5, or on the order of 10^-3/hr. For 25-200 hours of pre-launch operation, the
probability of a failure in the test and maintenance hardware ranges between 0.025
and 0.18. Since there is a significant chance that test and maintenance hardware
can fail in pre-launch operations and since the desired probability of not detecting a
system fault is somewhat lower than the chance of test failure, the reliability and
testability of the test and maintenance hardware appears to be a risk area for the
MPRAS design.
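The arithmetic in this estimate can be reproduced as follows. This is a minimal sketch that assumes an exponential failure model, which this section does not state explicitly; the variable and function names are illustrative. The quoted range of 0.025 to 0.18 corresponds to a combined rate rounded to 10^-3/hr, while the unrounded sum of the two contributions is about twice that, which would strengthen rather than weaken the conclusion that this hardware is a risk area.

```python
import math

# Inputs taken from the text.
stm_modules = 17 * 3          # 10 engine controllers + 5 RDUs + 2 processors, triplex
stm_rate = 2e-5               # per hour, ground fixed
module_count = 200
avg_module_rate = 2.5e-5      # per hour
test_fraction = 0.20          # share of module complexity that is test related

per_module_test_rate = test_fraction * avg_module_rate
combined_rate = module_count * per_module_test_rate + stm_modules * stm_rate


def p_fail(rate_per_hr, hours):
    """Probability of at least one failure, assuming a constant failure rate."""
    return -math.expm1(-rate_per_hr * hours)


print("per-module test hardware rate:", per_module_test_rate)
print("unrounded combined rate:", combined_rate)
for hours in (25, 200):
    print(hours, p_fail(1e-3, hours), p_fail(combined_rate, hours))
```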
In addition to the self-checking pair, the use of memory error detection and
correction, communications error detection and correction, voting and software
diagnostics to flush out latent faults are suggested for in-flight operation. The
specifics of these mechanisms are not defined.
The Boeing health monitoring and BIT support self-test, monitoring, readiness
evaluation for launch, maintenance and assembly operations. Health monitoring is
organized into a hierarchy of levels which are:
• Vehicle Level
• Stage Level (e.g., core or booster)
• Module Level (e.g., P/A module, payload)
• Subsystem Level (e.g., propulsion processing)
• Component Level (e.g., computer enclosure)
• Subcomponent Level (e.g., circuit board)
• Part Level (e.g., integrated circuit or sensor)
Tests for each level include internal self-test with isolation from assemblies of same
or higher levels, external interconnect test with assemblies of same or higher level
and external test of the assembly with BIT of higher assembly level.
The Boeing functional specifications call for an advanced, comprehensive, thorough
health monitoring and BIT system. The philosophy, constraints, goals and
guidelines for the design of this function have been laid out for the core processing
as well as for the sensor/effector elements. Extensive descriptions of the
sensor/effector failure modes and tests have been developed. Standard test
interfaces are specified and each electronics enclosure will have an [illegible] test
connector which will provide access to the test and maintenance port.
On-line error detecting requirements for the Boeing architecture call for error
detection and correction of memory data and communications data, scrubbing of
memories, and on-line diagnostics. Specific techniques and details for these
mechanisms were not specified.
Extensive test and error detection mechanisms have been implemented for the AIPS
proof-of-concept system. These include an on-line self-test diagnostic designed to
uncover latent faults in the voter and error reporting circuits, data memory, program
memory and the real-time clock. A presence test is run every processing frame prior
to application processing to establish which members of a fault masking group are
available. Processor exception errors that are detected include bus errors, address
errors, illegal instruction errors, spurious interrupts and arithmetic traps. A watchdog
timer is used to detect processors that fail to complete operation sequences.
The intercomputer and input/output networks detect protocol errors, data errors
and time-outs. Voters are used to mask errors on [illegible] communications. A
test port provides host controller access to the shared bus of the FTP. The extensive
BIT that is being used for modern complex VLSI devices and the extensive BIT
mechanisms needed to support thorough high coverage assembly and pre-launch
testing of MPRAS avionics, including the sensor/effector acquisition/distribution
hardware, are not implemented in the AIPS proof-of-concept. The extent to which
these concepts will be proposed for MPRAS was not available for this evaluation.
Both the Boeing and General Dynamics testability and maintenance functionality
and design guidelines are appropriate for the mission requirements and the stage of
development. The Boeing test-related requirements are at a more
advanced stage than those for General Dynamics.
The BIT, self-test features and test interfaces which support assembly and
pre-launch checkout for AIPS in an [illegible] application are not fully defined. The
AIPS FTP and communications network self-test and error detection mechanisms
which are appropriate for the in-flight and on-pad phases have been implemented
and tested in the proof-of-concept system. Only the intention to use self-test
diagnostics and error detection techniques has been declared for the Boeing and
General Dynamics architectures. Which techniques and at what places in the
architecture they are to be employed has not been detailed for either architecture.
None of the architecture design information adequately addresses the reliability of
the BIT and related test support hardware. At least for the Boeing and General
Dynamics cases, the estimated failure rate for this is sufficient to be recognized as a
risk area requiring closer attention. Not enough is known about the AIPS to
determine if there is reason for concern.
4. PERFORMANCE EVALUATION
4.1. Introduction
The primary objective of the performance evaluation was to determine if the
proposed MPRAS architectures are capable of handling the processing workloads
projected for ALS avionics as expressed in baseline requirements. In addition,
characteristics such as sensitivities to architecture parameters and workloads were
considered. The evaluation takes into account the performance of the
sensor/actuator data acquisition/distribution architecture as well as the basic
computing resource architecture.
These performance analyses provide for the identification of strengths and
weaknesses of each architecture, identification of serious design deficiencies,
identification of potential development risks and critical design areas, and
identification of significant differences in the designs assessed. The basis for a
comprehensive performance evaluation includes specifications of hardware building
blocks and their associated characteristics, specifications of software operating
system characteristics, specifications of fault tolerance mechanisms and the
specification of the application characteristics.
With a few exceptions, the hardware building blocks for each of the architectures
have been adequately specified. The functionality and performance characteristics
of the hardware fault tolerance mechanisms in the General Dynamics and Boeing
architectures have not been specified sufficiently to either qualitatively or
quantitatively evaluate their impact on [illegible] performance. Moreover, they
are not sufficiently specified to assess their adequacy to support the overall fault
tolerance of the architectures. For all architectures, the specification of the
sensor/actuator interface is not as complete as desired. The least complete is that
for the Draper Device Interface Unit.
The information of interest regarding system software architecture includes a
description of control characteristics such as distributed, central or hierarchical, and
descriptions of functions including task scheduling, I/O services, interrupt services,
memory management, utilities, interprocessor communications services and
functions related to fault tolerance. In addition, performance information such as
function overheads, context switching time, expected execution time for a given
system function, function response times and uncertainty in response times is of
interest. The operating system software for the Draper architecture has been
designed, developed and documented for a proof-of-concept system. Performance
characteristics such as context switching time, fault detection, isolation and recovery
parameters, and certain I/O network service parameters have been measured and
reported. Not reported to date are measurements for IC and I/O network message
delivery times. Due to the early design stage of the Boeing and General Dynamics
MPRAS architectures, specification of their operating systems design and
characteristics is minimal.
An application specification which contained baseline requirements and provided a
functional decomposition was requested. For each subfunction, a description of the
inputs, processing, outputs and special requirements such as transport delay, jitter
and process update rate is necessary. Subfunctions should be broken down into
tasks and the execution sequence for these tasks should be specified. Processing
workload in terms of instructions per process update and information flow between
subfunctions should be characterized. The basis for the workload estimates was
requested.
A good computational model specification for distributed real-time systems is
essential for the design of such systems. To be consistent with "concurrent
engineering" principles, good engineering practice and a methodology for the design
of dependable systems, the SDIO BM/C3 Processor and Algorithm Working Group
has recommended that this model be created and delivered at the earliest milestone,
the System Requirements Review (SRR). (Applicable documents a, b, and c.)
Specification and modeling of the computations in an application is the starting
point for the design and evaluation of computing systems. From such specifications
and models, the workloads associated with the application can be characterized by
temporal behavior and by function or subfunction within an application.
In the past, computational models for systems consisted of estimates of various
parameters such as processing throughput, input/output data rates, and memory
accesses. These estimates were often based on coarse (low-fidelity) information
which was scaled up to provide adequate safety margins. While these requirements
were often broken down by application subfunction they seldom provided any
information regarding temporal characteristics of the workloads. Average workloads
do not account for peaking factors due to multiple rate groups and transport delay
requirements present in a complex real-time control system. Furthermore, average
values are not adequate if a function's workload is too large to be implemented on a
single processor.
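The gap between average and peak workload can be illustrated with a small calculation. The sketch below uses three hypothetical rate groups (the rates and instruction counts are invented for illustration and are not from any of the MPRAS requirements) and assumes a simple frame-based schedule in which all rate groups are released together at time zero and each update must complete within one period of the fastest group.

```python
# Hypothetical rate groups: (update rate in Hz, instructions per update).
RATE_GROUPS = [(100, 20_000), (50, 60_000), (25, 200_000)]


def average_mips(groups):
    """Long-run average throughput demand in MIPS."""
    return sum(rate * instr for rate, instr in groups) / 1e6


def peak_mips(groups):
    """Demand in the worst minor frame, when every group is released together
    and all updates must finish within one period of the fastest group."""
    fastest_rate = max(rate for rate, _ in groups)
    worst_frame_instructions = sum(instr for _, instr in groups)
    return worst_frame_instructions * fastest_rate / 1e6


print("average MIPS:", average_mips(RATE_GROUPS))
print("peak MIPS:", peak_mips(RATE_GROUPS))
```

Under these assumptions the worst frame demands almost three times the average throughput, which is the kind of peaking that an average-only workload estimate hides.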
While past practice is at best marginally adequate for a real-time system, it is
unacceptable for complex real-time distributed systems. If an application workload
must be distributed across processing resources, allowable decomposition strategies
must be specified for the application. Furthermore, the communications workload
generated by each decomposition strategy must be characterized. For complex
real-time applications, the deadlines associated with different processes in an
application can lead to variations in workload over time. In addition, specific
decompositions of the workload to reside on the distributed computing resources
can dictate the specific design of the fault detection, isolation and recovery
processes that must be employed.
Tables 4.1, 4.2, and 4.3 summarize the computational resource requirements derived
for MPRAS applications by each of three aerospace companies. The bases for these
resource estimates were not documented. Computational throughput requirements
are generally consistent at least to the degree of certainty expected in the early
system requirements stage. Exceptions to this observation are the areas of
propulsion control and telemetry processing (TT&C). There is at least a 10 to 1
discrepancy between the propulsion throughput requirements for the Martin
Marietta requirements and the Boeing and General Dynamics requirements. This
difference is significant since it applies to every vehicle engine and can impact overall
requirements substantially. The proposed Boeing topology has a highly centralized
processing core which would not be feasible if the higher propulsion throughput is
required. At the very least the Boeing topology would require a large number of
processing sites. This requirement should be clarified and resolved for future
MPRAS effort. The telemetry processing throughput for the General Dynamics
requirements is substantially greater than that called out in the other requirements.
General Dynamics requirements provide for substantial data compression and
transmission of vehicle data via telemetry down links to the ground.
The throughput for most functions is somewhat less than the 10 to 20 MIPS
processing throughput projected for MPRAS processor modules. The Martin
Marietta propulsion throughput estimate of 10 MIPS, the General Dynamics
telemetry processing estimate of 9.4 MIPS and the Boeing adaptive guidance and
navigation processing estimate are exceptions. If any of these functions cannot
reside in a single processing module, decomposition of them will require
re-examination of systems communication estimates. The core processing
throughput requirements for the General Dynamics computation model include a
percentage for operating system overhead. Operating system overhead for real-time
systems with a range of sensor sampling rates and control loop update rates is not
always well modeled by a percentage of the process computation throughput
requirements. An example is the task or context switching times which require a
constant number of instructions each time a task is invoked. For low frequency tasks
this overhead is much lower than for high frequency tasks. Of particular concern for
General Dynamics throughput estimates are the tasks that will be associated with
the sampling and processing of the acoustic and vibration sensors in the propulsion
instrumentation which must be sampled at 2kHz. These tasks must be invoked
every 1/2 millisecond. A context switch must be completed in significantly less than
1/2 millisecond in order to maintain real-time processing.
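The point that context switching is a fixed cost per invocation, and therefore not well modeled as a percentage of computation, can be made concrete. In the sketch below the context switch cost of 500 instructions and the 10 MIPS processor (the lower end of the projected 10 to 20 MIPS range) are hypothetical values chosen for illustration; only the 2 kHz sampling rate comes from the text.

```python
def switch_overhead_mips(task_rate_hz, switch_instructions):
    """Throughput consumed by context switches alone, in MIPS."""
    return task_rate_hz * switch_instructions / 1e6


def switch_time_ms(switch_instructions, processor_mips):
    """Time for one context switch, in milliseconds."""
    return switch_instructions / (processor_mips * 1e6) * 1e3


SWITCH_INSTRUCTIONS = 500     # hypothetical cost of one context switch
PROCESSOR_MIPS = 10           # assumed processor throughput

period_ms = 1e3 / 2000        # 2 kHz sampling gives a 0.5 ms period

print("period (ms):", period_ms)
print("switch time (ms):", switch_time_ms(SWITCH_INSTRUCTIONS, PROCESSOR_MIPS))
for rate_hz in (25, 100, 2000):
    print(rate_hz, "Hz:", switch_overhead_mips(rate_hz, SWITCH_INSTRUCTIONS), "MIPS")
```

The same switch cost that is negligible for a 25 Hz task consumes a substantial share of a processor at 2 kHz, so the overhead depends on task rate rather than on the amount of computation per task.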
Sensor/actuator I/O rates are broken out in the Boeing and Martin Marietta
models. There is a 6 to 1 factor between these estimated I/O rates (6 Mbps vs
1 Mbps). Interfunction I/O rates for the General Dynamics and the Boeing
computation models differ by a factor of 4. These differences should also be resolved
before further MPRAS work proceeds. These differences are sufficient to render
certain of the proposed architectural topologies infeasible.
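The two factors quoted above can be checked against the tabulated figures. The sketch below uses the totals listed in Tables 4.1 and 4.2 and assumes that the 6240 kbps Instrumentation entry of Table 4.3 is the Martin Marietta sensor/actuator I/O figure; that reading is an assumption made for this example.

```python
# Sensor/actuator I/O rates in kbps (Boeing total; Martin Marietta Instrumentation entry, assumed)
io_kbps = {"Boeing": 936, "Martin Marietta": 6240}
# Interfunction rates in kbps (table totals)
if_kbps = {"Boeing": 3242, "General Dynamics": 12800}

io_factor = io_kbps["Martin Marietta"] / io_kbps["Boeing"]
if_factor = if_kbps["General Dynamics"] / if_kbps["Boeing"]

print(f"Sensor/actuator I/O factor: {io_factor:.1f} to 1")
print(f"Interfunction factor:       {if_factor:.1f} to 1")
```

Both results round to the 6 to 1 and 4 to 1 factors cited in the text.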
Appendix A contains hierarchical diagrams for the MPRAS application as described
by each aerospace company. The functional decompositions and relationships
between decomposed functions are provided in these diagrams.
FUNCTION NAME         I/O RT   I.F. RT   THRPUT   DATA MEM   PGM MEM
                      (kbps)   (kbps)    (MIPS)   (kB)       (kB)
Propulsion              200      726       5.8       570       (4)
Fluids                  (1)      (1)       (1)       (1)       (4)
G&N                      67      270      12.8      1350       (4)
Control                  67      380        .4       270       (4)
TT&C                    560      852        .1       215       (4)
Communications          (2)      (2)       (2)       (2)       (4)
Ground Interfaces       (2)      (2)       (2)       (2)       (4)
Range Safety              1       30      .003         3       (4)
Mission Mngmnt            0      280        .2        25       (4)
Health Monitoring        40      654        .4      2620       (4)
Instrumentation          NA       NA        NA        NA        NA
Veh-Elem Interfaces     (3)      (3)       (3)       (3)       (3)
Power                     1       50        .2        70       (4)
Fault Tol Mngmnt        (4)      (4)       (4)       (4)       (4)
Miscellaneous                                       5000
TOTAL                   936     3242      19.9     10120
(1) Included in Propulsion
(2) Included in TT&C
(3) Included in Control
(4) No Information
* Core Stage Requirements
Table 4.1. MPRAS Computational Resource Requirements — Boeing
The computational model does not provide task sequence information, does not give
FUNCTION NAME         I/O RT   I.F. RT   THRPUT     DATA MEM   PGM MEM
                      (kbps)   (kbps)    (MIPS)     (kB)       (kB)
Propulsion              (3)     4400      6.6 (6)       20        20
Fluids                  (3)      (7)      .03            1         3
G&N                     (3)      (7)      1.9          160       140
Control                 (3)     3200      3.2           25       100
TT&C                    (3)      (7)      9.4            7         2
Communications          (1)      (1)      (1)          (1)       (1)
Ground Interfaces       (3)     4000      1              1         5
Range Safety            (3)      (7)      .6            17        36
Mission Mngmnt          (2)      (2)      (2)          (2)       (2)
Health Monitoring       (3)      (7)      9.8         3035      2254
Instrumentation         (5)      (5)      (5)          (5)       (5)
Veh-Elem Interfaces     (5)      (5)      (5)          (5)       (5)
Power                   (3)      (7)      .1             5         5
Fault Tol Mngmnt        (3)      (7)      (3)          (3)       (3)
Miscellaneous (4)       (3)      (7)      2.7            7         2
TOTAL                          12800     35.3         3276      2567
(1) Included in Ground Interfaces
(2) Not a separate function for General Dynamics
(3) Included in I.F. Rate
(4) Data Recording Function
(5) Not called out for Point Design
(6) For 10 engines
(7) Relatively small values.
* Point Design
Table 4.2. MPRAS Computational Resource Requirements — General Dynamics
FUNCTION NAME   I/O RT (kbps)   I.F. RT (kbps)   THRPUT (MIPS)   DATA MEM (kB)   PGM MEM (kB)
Propulsion 10 (1) 40 (1) 4000 (1)
Fluids NA NA NA NA NA
G&N (2) (2) (2) (2) (2)
Control (2) 185 4.32 164 930
TT&C .35 330 1650
Communications NA NA NA NA NA
Ground Interfaces NA NA NA NA NA
Range Safety .004 660 2
Mission Mngmnt NA NA NA NA NA
Health Monitoring NA NA NA NA NA
Instrumentation 6240 .6 14 70
Veh-Elem Interfaces NA NA NA NA NA
Power .005 1 1
Fault Tol Mngmnt (2) (2) (2) (2) (2)
Miscellaneous (3) 60 6 34 7
TOTAL - - 11+10/ENG 1200 2600+4000/ENG
(1) Per Engine
(2) Includes Central Control, Staging Control, G&N and Redundancy Management
(3) Winds Ahead
Table 4.3. MPRAS Computational Resource Requirements — Martin-Marietta
workload in terms of instructions as opposed to MIPS, does not describe the basis
for the model in terms of computations being implemented, I/O operations,
compiler assumptions, does not include what redundancy management operations
are performed, does not provide a temporal distribution of workload and does not
provide a sound basis for interfunction communications.
The function (control loop) update rates, latency and jitter have been specified for
each computational model. However, there is no specification for the degree of skew
allowed between samples taken from different sensors for the same time index. For
the fuel slosh or structural vibrations or bending measurements that could be used
in an adaptive control loop, it should be expected that a high degree of time
coherency would be required across spatially distributed sensors in order to best
learn and adapt to vehicle dynamic parameters. A time coherency specification
could impact the design for the control and synchronization of distributed sensor
sampling hardware.
It is strongly recommended that the MPRAS computation model be refined and the
areas where substantial differences exist between the three models described herein
be resolved before further MPRAS development proceeds.
A detailed audit of the sensors and actuators required to support the application
was requested. The audit should specify for each sensor/actuator the type, proposed
redundancy, number of each type, number of bits, source/destination subfunction
associated with the sensor/actuator, associated failure rate for each mission phase
and oil 	 test and calibration requirements. Short of a detailed audit a coarse
specification that provides a gross decomposition of sensors as to appropriate
number in various portions of the vehicle, the number of sensor/actuators that
support a particular subfunction, the number of sensor/actuators that have a given
redundancy level and the number that are time-critical or flight-critical and the
approximate bandwidth of each sensor category is necessary to establish credible
system requirements.
The proposed topology for the hardware elements and interconnections for each
vehicle configuration specified in the baseline requirements is desirable. In addition,
the allocation of subfunctions to specific hardware resources should be defined for
the range of MPRAS requirements.
4.2. Sensor/Actuator Data Interprocessor Communications
Performance
Background — As indicated in the previous section, the sensor/actuator I/O
interface and interprocessor communications hardware comprises more than 50%
of the total MPRAS hardware. The combined data bandwidth of the numerous
sensors/actuators is typically much less than the bandwidth of the communication
network used to collect/distribute these data. However, providing the capability to
distribute/collect these data in a manner that is fault-tolerant and that is flexible
and scalable to meet diverse requirements for a broad range of vehicle configurations
is not without performance considerations. Certain of the performance analyses for
MPRAS indicated that, since the total sensor/actuator data bandwidth or the total
interfunction communications bandwidth represented only a small fraction (less
than 5%) of the system communication network bandwidth, a detailed
performance analysis was not necessary. Such a design and performance analysis
approach is questioned, since the sensor/actuator I/O communications performance for the
MPRAS architectures is a potential risk area and limiting factor. Communications
for the MPRAS sensor/actuator data collection/distribution can be defined as the
delivery of information with appropriate fault tolerance and error checking measures
from a sensor to a destination function (software task) or the delivery of information
from a source function (software task) to an actuator. Under this definition of
communications, not only are the communication network links and signaling
hardware part of the communications path, but the I/O hardware and software are
included as well. The delays associated with the network services software place
limits on the overall communications bandwidth for a distributed system. When
there are numerous sources and destinations uniformly sharing the communications
network, a fairly high network utilization can be maintained and the performance
limitations due to network services software can have modest impact on network
utilization. These performance limitations are more restrictive in networks where
communications are limited to a few destinations or where communications emanate
from a few sources. This is precisely the case for the sensor/actuator I/O
communications network. Under these circumstances, the overall communication
bandwidth can be substantially lower than the network bandwidth.
Each of the few destinations or receivers was considered to have a message
processing latency. During the message processing, the receiver could not accept
new messages. Network utilization and overall communications bandwidth were
determined as functions of the ratio of the message time on the network to the
network service software message processing time.
Background — The process of systems design requires analysis of communication
network bandwidths. This analysis must consider the characteristics of the message
senders, the communication network, the message receivers, and the messages
themselves. Often, if the combined message traffic generated by the senders is a
small percentage of the communication network's bandwidth, the network is
thought to be sufficient, regardless of the receiver characteristics.
However, if the receiver is unable to process its incoming messages in a timely
manner, messages may be lost. Message processing for many communication
networks is a software-intensive function which typically takes many times longer
than the message transmission takes on the physical
communications link. Therefore, receiver characteristics cannot be ignored when
analyzing a system's communication network.
Approach — With receiver characteristics in mind, a study was conducted to
examine the expected performance of a network with many senders and few
receivers. The network was modeled as a bus for which each message contended. The
model does not represent network protocols such as token passing. Although this model is a
simplification of the operation of most communication networks in use today, it does
demonstrate the impact of receiver bandwidths on such networks. This
simplification has little effect on performance predictions when the receiver delays
are large relative to cable delays and only a few receivers are present.
The model consisted of a maximum of 32 sender nodes, each connected through 32
corresponding bus nodes to a maximum of 4 receiver nodes. Figure 4.1 is a diagram
of the modeled system.
Each sender and receiver node was modeled as having its own hardware, while all
bus nodes contended for the same bus hardware resource. Although there were a
maximum of 32 senders and 4 receivers, only the number of components of interest
were active during a given simulation. Each model node seized its hardware
resource for a period of time representing the time required to send, pass or receive
its data structure. This period (in seconds) was calculated by dividing data
structure size (in bits) by the bandwidth (bits/second) of the seized hardware
component. If messages queued up at a receiver node, the node's period was
modified to reflect the time required to execute all queued messages.
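The seize period described above is a simple quotient. The following minimal sketch shows the calculation; the function name, the message size and the bandwidth are illustrative assumptions, not values taken from the study.

```python
def seize_period(size_bits, bandwidth_bps, queued_messages=1):
    """Seconds a node holds its hardware resource.

    The period is the data structure size divided by the bandwidth of the
    seized component. For a receiver with a backlog, the period is stretched
    to cover every queued message (equal message sizes are assumed here).
    """
    return queued_messages * size_bits / bandwidth_bps


# Hypothetical 1000 bit message on a 1 Mbit/second component
print(seize_period(1000, 1e6))                      # one message
print(seize_period(1000, 1e6, queued_messages=3))   # three queued messages
```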
The network traffic model consisted of two types of messages, node-to-node and
broadcast. Node-to-node messages originated in a specific sender node and were
delivered to a specific destination or receiver node. Node-to-node messages were
generated so that each destination node received an equal number of messages from
each send node during a given interval. Broadcast messages originated in specific
send nodes and were simultaneously delivered to all destination or receive nodes. A
certain percentage of the messages from each send node were broadcast messages.
Node-to-node messages account for sensor data that is used by a single processing
site, whereas broadcast messages account for sensor data that is used by all
processing sites.
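The traffic pattern described above can be sketched as a deterministic message list. This is an illustrative reconstruction only: the function name, the use of `None` to mark a broadcast and the "one broadcast after every N node-to-node messages" rule are assumptions, since the report does not state how the broadcast percentage was applied.

```python
from collections import Counter


def build_traffic(n_senders, n_receivers, rounds, broadcast_every):
    """Return (sender, destination) pairs; destination None marks a broadcast.

    Each sender addresses every receiver once per round, so all receivers get
    the same number of node-to-node messages from each sender. After every
    `broadcast_every` node-to-node messages a sender also issues one broadcast.
    """
    traffic = []
    for sender in range(n_senders):
        sent = 0
        for _ in range(rounds):
            for receiver in range(n_receivers):
                traffic.append((sender, receiver))
                sent += 1
                if broadcast_every and sent % broadcast_every == 0:
                    traffic.append((sender, None))
    return traffic


traffic = build_traffic(n_senders=4, n_receivers=2, rounds=5, broadcast_every=10)
counts = Counter(traffic)
print(len(traffic), "messages,", sum(1 for _, d in traffic if d is None), "broadcasts")
```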
Parameter Variation — Two model parameters were designated independent
variables to be altered to observe the impact on the utilization of the model's
components and the system's total effective bandwidth. They were the bandwidth
of each receiver and the ratio of senders to receivers. The bandwidth of each
receiver was varied relative to the constant bandwidth of the network. The receiver
to network bandwidth ratios were 128:1, 64:1, 32:1, 16:1, 8:1, 4:1, 2:1, 1:1, 1:2, 1:4,
1:8, 1:16, 1:32, 1:64, and 1:128. The ratio of senders to receivers was varied to
include 32:4, 16:4, 8:4, 4:4, 32:2, 16:2, 8:2, 4:2, 2:2, 32:1, 16:1, 8:1, 4:1, 2:1, and 1:1.
For the cases discussed in the following section, the bandwidth of each sender was set
at 1/6 of the bus bandwidth and the percentage of broadcast messages was set to a
small value.
Figure 4.1. System Model
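The two independent variables listed in the parameter variation paragraph form a small grid. The sketch below generates it; the variable names are illustrative and the pairing of every bandwidth ratio with every sender to receiver ratio is an assumption about how the runs were organized.

```python
from fractions import Fraction

# Receiver to network bandwidth ratios, 128:1 down to 1:128 in powers of two
RB_TO_BB = [Fraction(2) ** k for k in range(7, -8, -1)]

# Sender to receiver ratios in the order listed in the text
SENDER_RECEIVER = [(s, r) for r in (4, 2, 1) for s in (32, 16, 8, 4, 2, 1) if s >= r]

SENDER_TO_BB = Fraction(1, 6)   # each sender set at 1/6 of the bus bandwidth

print(len(RB_TO_BB), "bandwidth ratios x", len(SENDER_RECEIVER), "sender:receiver ratios")
```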
Component Utilizations Per Sender to Receiver Ratio — Sender to receiver
ratios were statically established prior to each simulation run. For each run, the
results showed that the utilization of each component varied with the receiver
bandwidth to bus bandwidth ratio (RB:BB).
Figure 4.2 shows the percent utilizations of one sender, the bus and one receiver for
the case of 1 sender and 1 receiver.
[Plot: Utilization of Communication Components, 1 Sender : 1 Receiver. Percent utilization of the sender, bus and receiver versus the receiver bandwidth to bus bandwidth ratio, 128:1 through 1:128.]
Figure 4.2. Utilization of Model Components for 1 Sender and 1 Receiver.
As the receiver bandwidth decreased relative to the bus' ability to provide data (to
the 1:1 ratio), the sender was the limiting factor of the system. Both the bus and
the receiver were waiting for the sender to supply messages to them. At an RB:BB
of 1:1, the bandwidths of the bus and the receiver were equal, both being utilized at
approximately 17 percent. Between RB:BBs of 1:4 and 1:8, the receiver's bandwidth
decreased to the point where both the bus and the sender supplied messages faster
than the receiver could handle. The receiver was the limiting factor of the system's
effective bandwidth.
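The behavior described for the single sender, single receiver case is consistent with a simple bottleneck view in which the slowest of the three stages sets the message flow. The sketch below is an analytic approximation written for illustration, not the simulation used in the study: it ignores broadcast traffic and queueing transients and assumes messages are spread evenly over senders and receivers. The function and key names are hypothetical.

```python
from fractions import Fraction


def utilizations(n_senders, n_receivers, rb_to_bb, sender_to_bb=Fraction(1, 6)):
    """Steady-state utilizations, with all bandwidths in units of the bus bandwidth."""
    capacity = {
        "sender": n_senders * sender_to_bb,     # combined sender bandwidth
        "bus": Fraction(1),                     # the shared bus
        "receiver": n_receivers * rb_to_bb,     # combined receiver bandwidth
    }
    throughput = min(capacity.values())         # the slowest stage sets the flow
    result = {name: throughput / cap for name, cap in capacity.items()}
    result["limit"] = min(capacity, key=capacity.get)
    return result


for ratio in (Fraction(1), Fraction(1, 4), Fraction(1, 8)):
    u = utilizations(1, 1, ratio)
    print(f"RB:BB = {ratio}: sender {float(u['sender']):.0%}, "
          f"bus {float(u['bus']):.0%}, receiver {float(u['receiver']):.0%}, "
          f"limited by the {u['limit']}")
```

With each sender at 1/6 of the bus bandwidth, this approximation gives about 17 percent bus and receiver utilization at an RB:BB of 1:1 and places the change of limiting component between 1:4 and 1:8, in agreement with the results described above.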
Figure 4.3 shows the percent utilizations of each of two senders, the bus and one
receiver for the case of 2 senders and 1 receiver.
[Plot: Utilization of Communication Components, 2 Senders : 1 Receiver. Percent utilization of each sender, the bus and the receiver versus the receiver bandwidth to bus bandwidth ratio, 128:1 through 1:128.]
Figure 4.3. Utilization of Model Components for 2 Senders and 1 Receiver.
These results are different from the previous case. Notice that between RB:BBs of
1:2 and 1:4, the receiver's bandwidth decreased to the point of becoming the
limiting factor of the system's effective bandwidth. In the prior case, this occurred
between 1:4 and 1:8. The shift occurs because there are twice as
many suppliers of messages to the one receiver. Twice the number of messages also
increased the utilization of the bus from 17 to 33 percent.
This crossover point continues to move to the left on the RB:BB scale until the S:R
becomes 8:1 (Figure 4.4).
Here, the bus was the limiting factor from RB:BBs of 128:1 through 2:1. At the
RB:BB of 1:1, the utilizations of the bus and the receiver were the same, and from
RB:BBs of 1:2 through 1:128 the receiver was the limiting factor.
[Plot: Utilization of Communication Components, 8 Senders : 1 Receiver. Percent utilization of each sender, the bus and the receiver versus the receiver bandwidth to bus bandwidth ratio, 128:1 through 1:128.]
Figure 4.4. Utilization of Model Components for 8 Senders and 1 Receiver.
As the number of senders was increased while the number of receivers was kept at 1,
the bus limited the system bandwidth until the RB:BB of 1:1, when the receiver
became the limiting system component.
When the number of receivers was doubled to two, the crossover point of the
utilization curves of the bus and each receiver became the RB:BB of 1:2. This can
be explained by the fact that each receiver was receiving half the number of
messages going through the bus. By comparing Figures 4.2 and 4.5, one can see
that the sender and receiver utilization curves are at the same RB:BB location. The
crossover occurs at this point whenever the number of senders equals the number of
receivers.
Similar to the single receiver case, as the number of senders was increased and the
number of receivers held constant, the bus' utilization increased, becoming the
limiting factor until its bandwidth equaled the combined bandwidths of all receivers.
At that point, as above, the utilization of the receiver became dominant. This can
be seen in Figure 4.6.
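The movement of the crossover point with the number of senders and receivers can be summarized in one expression. The sketch below uses the same bottleneck approximation as a reading aid (no broadcast traffic, evenly spread messages, each sender at 1/6 of the bus bandwidth); it is an assumption-based illustration, not the study's simulation, and the function name is hypothetical.

```python
from fractions import Fraction


def receiver_crossover(n_senders, n_receivers, sender_to_bb=Fraction(1, 6)):
    """RB:BB ratio below which the receivers become the limiting component."""
    # Flow offered to the receivers: the senders or the bus, whichever is slower
    upstream = min(n_senders * sender_to_bb, Fraction(1))
    # Receivers limit once their combined bandwidth drops below that flow
    return upstream / n_receivers


for senders, receivers in [(1, 1), (2, 1), (8, 1), (2, 2), (8, 2)]:
    print(f"{senders}:{receivers} -> crossover at RB:BB = "
          f"{receiver_crossover(senders, receivers)}")
```

The values fall between 1:4 and 1:8 for one sender and one receiver, between 1:2 and 1:4 for two senders and one receiver, at 1:1 for eight senders and one receiver, and at 1:2 for eight senders and two receivers, matching the crossover locations reported for Figures 4.2 through 4.6.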
Component Utilizations Across Sender to Receiver Ratios — The impact of
the S:R can be seen by examining the utilization of one component across
numerous S:Rs, such as that of the communication network as seen in Figure 4.7.
[Plot: Utilization of Communication Components, 2 Senders : 2 Receivers. Percent utilization of each sender, the bus and each receiver versus the receiver bandwidth to bus bandwidth ratio, 128:1 through 1:128.]
Figure 4.5. Utilization of Model Components for 2 Senders and 2 Receivers.
[Plot: Utilization of Communication Components, 8 Senders : 2 Receivers. Percent utilization of each sender, the bus and each receiver versus the receiver bandwidth to bus bandwidth ratio, 128:1 through 1:128.]
Figure 4.6. Utilization of Model Components for 8 Senders and 2 Receivers.
[Plot: Percent Utilization for the Communication Network versus the receiver bandwidth to network bandwidth ratio, 128:1 through 1:128, with one curve per sender to receiver ratio.]
Figure 4.7. Utilization of Communication Network Across Sender to Receiver
Ratios.
For cases where the number of senders was less than 8, the dominant component
utilization for RB:BBs of less than or equal to 1:1 is that of the sender. For the
same RB:BB range, when the number of senders is 8 or greater, the network's
utilization is dominant. As discussed above, an increase in the number of receivers
causes the knee of the network's utilization to shift to the right on the RB:BB scale.
The corresponding curve for the utilization of each receiver is shown in Figure 4.8.
[Plot: Percent Utilization for Each Receiver versus the receiver bandwidth to network bandwidth ratio, 128:1 through 1:128, with one curve per sender to receiver ratio.]
Figure 4.8. Utilization of Each Receiver Across Sender to Receiver Ratios.
Effective System Bandwidth — The results of this study indicate that the
effective bandwidth of the network is determined by the combined bandwidth of all
receivers and not by the bandwidth of the bus alone. As the combined bandwidth of
the receivers diminished, so did the effective bandwidth of the system as a whole.
The system's bandwidth relative to that of the network alone is shown in Figure 4.9.
The system's effective bandwidth was calculated by dividing the total number of
bits received by the total simulation time. When the receiver message processing
delays become large relative to the physical communication link message
transmission time and only a few receivers are present, link bandwidth
utilization decreases. At ratios of 50:1 to 100:1, utilization is severely limited by the
receiver message processing delay.
[Plot: Combined Receiver Bandwidth Relative to Network Bandwidth, expressed in percentage, versus the receiver bandwidth to network bandwidth ratio, 128:1 through 1:128, with one curve per sender to receiver ratio.]
Figure 4.9. Effective System Bandwidth Across Sender to Receiver Ratios.
In addition to this model, measurements of the message processing time for a
general purpose communication network consisting of an Ethernet system coupled with
DECNET and VAXELN network service software were made as a reference point.
These measurements indicate a message processing time to message cable time ratio
of about 100:1 for the 10 Mbits/second cable and 1 MIPS processors. RTI's experience
with other network communications indicates that this is not an unusual ratio.
Depending on the number of destinations or receivers, the model indicates that the
maximum network utilization that can be expected would be in the 1 to 10 percent
utilization range with such ratios. Good engineering practice dictates that network
service message processing delays due to normal message assembly and delivery and
due to error checking and fault tolerance related processing, should be analyzed and
predicted to determine the limitations of the architecture. The communications
models and measurements referred to in this paragraph are detailed in Appendix B.
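The 1 to 10 percent range quoted above follows from the processing to cable time ratio and the number of receivers. The sketch below is a rough estimate under stated assumptions: receivers process messages independently, traffic is spread evenly across them, and the senders always have messages waiting. The function names are illustrative.

```python
def receiver_bandwidth(cable_bps, processing_to_cable_ratio):
    """Bits per second one receiver can absorb when processing dominates."""
    return cable_bps / processing_to_cable_ratio


def max_network_utilization(n_receivers, processing_to_cable_ratio):
    """Upper bound on the fraction of the network bandwidth that can be used."""
    return min(1.0, n_receivers / processing_to_cable_ratio)


print(f"Receiver bandwidth at 100:1 on a 10 Mbit/s cable: "
      f"{receiver_bandwidth(10e6, 100):.0f} bits/second")
for ratio in (50, 100):
    for n in (1, 2, 4):
        print(f"ratio {ratio}:1, {n} receiver(s): "
              f"{max_network_utilization(n, ratio):.0%} of network bandwidth")
```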
AIPS I/O and Intercomputer Communications — Figure 4.10 shows the
hardware elements associated with an FTP processing channel. A processing
channel consists of a computational processor (CP), an input/output processor
(IOP), a shared memory, a data exchange and voter unit, a dual ported memory
(DPM), an input/output sequencer (IOS), an intercomputer interface sequencer
(ICIS), an I/O network and an intercomputer communications network. Circuit
switched I/O network nodes and links establish communications from the IOS and
the DIU's. Spare nodes and links are provided so that communications to all DIU's
can be maintained via reconfiguration in the event of network failures. I/O network
system services software provides for communications between user tasks and the
sensors and actuators connected to the I/O networks in an AIPS system. In
addition to the message processing and delivery functions required for I/O
communications, I/O network services provides for fault detection and isolation
capability for the I/O hardware elements, reconfigures the I/O network hardware as
dictated by I/O hardware failures and provides for distribution of consistent input
data to all non-faulted redundant channels.
Figure 4.10. FTP Architecture and I/O Data Flow
An application task running in the CP issues an I/O request through the I/O
services resident in the CP. The I/O services resident in the IOP receives the I/O
request and issues a command to the IOS to activate a programmed chain of I/O
network transactions which are associated with the specific I/O request. The IOS
then issues commands to specific DIU's on the I/O network. For each command the
selected DIU responds with a message which is processed in the IOS interface and
stored in the DPM. Typically, the message contents are distributed to other
redundant channels via the data exchange unit and are delivered to the shared
memory. The IOP processes the received messages and performs error processing.
The message contents are then delivered to the application task in the CP.
Typically, only a single command is required by the IOS to carry out a complex set
of I/O operations. The IOP is relieved of much of the detailed control necessary to
implement the I/O transfers.
Even with the mechanization of the detailed I/O control via the IOS and
programmed chains, AIPS I/O operations are software intensive. Maintaining
general and flexible I/O services while providing high coverage fault detection and
network reconfiguration capabilities leads to I/O throughput that is much less than
the I/O network bandwidth. Effective I/O bandwidth has been one of the primary
performance limiting factors of the current FTP design.
Measurements of I/O request processing times for the AIPS proof-of-concept system
have indicated that the message processing time exceeds the I/O network cable time
by factors of 25 to 50. On average, in excess of 100 IOP instructions are required
for each sample acquired via the I/O network. Further, to meet the latency (2 to 5
milliseconds) and control loop update rates (100 Hz) required for MPRAS, the I/O
request processing time must be reduced by a factor of 10 to 20. Significant I/O
speedup cannot be achieved by increasing the bandwidth of the I/O network. Faster
data exchange and voting hardware, higher throughput processors (CP and IOP),
use of a more efficient Ada compiler and more efficient I/O service software are
candidate areas to provide the necessary speedup. The AIPS processors expected to
be used for MPRAS should provide at least a factor of 5 to 10 speedup over the
AIPS proof-of-concept model. Additional speedup beyond that of the processors
will likely be required to meet the MPRAS requirements. Only after the I/O
request processing time has been reduced by the factors indicated would it be
worthwhile to increase the I/O network bandwidth by a factor of 5 to 10 and to
speed-up the voter from the 2.5 to 5 microseconds per word of the current
proof-of-concept model. Note that speed-up of the voter can be limited by the
amount of time skew permitted between channels.
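The gap between the required speedup and the speedup expected from faster processors can be bounded with simple arithmetic. The sketch below assumes that speedup factors from independent sources multiply, which is an idealization; the function name is illustrative.

```python
def residual_speedup(required, from_processors):
    """Range of extra speedup still needed after the processor improvement.

    Both arguments are (low, high) tuples of speedup factors. The best case
    pairs the lowest requirement with the highest processor gain, and the
    worst case pairs the highest requirement with the lowest processor gain.
    """
    best = required[0] / from_processors[1]
    worst = required[1] / from_processors[0]
    return best, worst


best, worst = residual_speedup(required=(10, 20), from_processors=(5, 10))
print(f"Additional speedup needed from other sources: {best:.0f}x to {worst:.0f}x")
```

Under that assumption, up to a further factor of 4 would have to come from the data exchange and voting hardware, the compiler and the I/O service software.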
Due to I/O overhead, the effective I/O bandwidth, which accounts for both the I/O
network bandwidth and the I/O request processing time, depends on the number of
sensors that can be read in a single I/O request, which in turn is application
dependent. Based on the measured I/O request times reported, the effective I/O
bandwidth for a flight-control application was between 1.5x10^4 and 3x10^4
bits/second. The actual I/O network bandwidth was 2x10^6 bits/second. Only about
1% of the actual network bandwidth could be used. If the I/O request processing
time is reduced by a factor of 20, the effective I/O bandwidth could be increased to
about 5x10^5 bits/second. Since the combined bandwidth of all MPRAS sensors and
actuators is between 1x10^6 bits/second and 6x10^6 bits/second, an effective
bandwidth of 5x10^5 bits/second for a single I/O network should be sufficient for
AIPS to meet MPRAS requirements. Figure 4.11 summarizes the structure of the
AIPS I/O data delivery path.
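The effective bandwidth figures in this paragraph can be reproduced directly. The sketch below assumes that the effective I/O bandwidth scales inversely with the I/O request processing time, which holds only while processing time dominates cable time; the variable names are illustrative.

```python
NETWORK_BPS = 2e6                  # proof-of-concept I/O network bandwidth
MEASURED_BPS = (1.5e4, 3e4)        # measured effective I/O bandwidth range
SPEEDUP = 20                       # reduction factor for request processing time

usable_fraction = tuple(b / NETWORK_BPS for b in MEASURED_BPS)
improved_bps = tuple(b * SPEEDUP for b in MEASURED_BPS)

print(f"Usable share of network bandwidth: "
      f"{usable_fraction[0]:.2%} to {usable_fraction[1]:.2%}")
print(f"Effective bandwidth after a {SPEEDUP}x speedup: "
      f"{improved_bps[0]:.1e} to {improved_bps[1]:.1e} bits/second")
```

The result, 3x10^5 to 6x10^5 bits/second, brackets the value of about 5x10^5 bits/second quoted above.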
The AIPS intercomputer network provides communications between processing
sites. The ICIS shown in Figure 4.10 provides an interface to a layered (redundant)
intercomputer data network and controls data transfers between the IC network and
the processing channel. Each redundant channel of the processing site can transmit
on a layer of the network and receives data from all layers of the network. Received
data can be selected from any layer or can be voted across all layers.
Circuit-switched nodes and data links provide communications between
processing sites and in the event of network failures can be reconfigured to maintain
communications.
The intercomputer system services provide intercomputer communication service
for user tasks, provide a mechanism for maintaining time across distributed
processing sites, and manage the fault detection, isolation and reconfiguration for
each layer of the IC network. The IC communication services architecture is
designed in a layered approach similar to the proposed Open Systems
Interconnection protocols. The layers are:
1. the physical layer
2. the data link layer
3. the network layer
4. the transport layer
5. the session layer
6. the presentation layer
7. the process or application layer
[Diagram: I/O data delivery path within an FTP through the IOS, DPM, data exchange and vote, I/O service and application task.]
Figure 4.11. FTP I/O Structure
The ICIS provides the mechanism for controlling the IC data communications via
high-level commands from the IOP. Even with these ICIS features the IC system
services are software intensive. In some areas they are more complex than the I/O
network services. At present the IC message delivery times for the proof-of-concept
system have not been characterized. It should be expected that those delivery times
will be of the same order as the I/O request processing times for the I/O network
services. In fact, the delay components due to use of the IC network will increase
due to the time required to contend for network access. Interfunction
communication requirements for MPRAS are as high as 12.8x10^6 bits/second
combined and range from negligible to about 3x10^6 bits/second for any single
function. If functions are allocated to processing sites such that these larger
interfunction communications must take place across the IC network, IC
communication requirements will be demanding. To meet these requirements, the
proof-of-concept IC bandwidth must be increased substantially from the current
2x10^6 bits/second.
In addition, to maintain sufficient effective IC bandwidth the IC message delivery
times must be reduced from those that will be realized in the proof-of-concept
system. Message delivery times of a few microseconds per byte instead of tens or
hundreds of microseconds per byte will be required. If the higher interfunction
requirements are valid and if the functions with the larger requirements must make
use of the IC network, the IC message delivery time will be one of the highest
performance risk areas for AIPS applied to MPRAS.
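The relationship between per-byte delivery time and sustainable bandwidth explains why delivery times of a few microseconds per byte are needed. The sketch below treats message delivery as strictly serial with no overlap between messages, which is a simplifying assumption; the function names are illustrative.

```python
def bandwidth_bps(delivery_us_per_byte):
    """Sustained bits per second for a given delivery time per byte."""
    return 8 / (delivery_us_per_byte * 1e-6)


def max_delivery_us_per_byte(required_bps):
    """Largest delivery time per byte that still sustains the required rate."""
    return 8 / required_bps * 1e6


for us in (3, 10, 100):
    print(f"{us:>3} microseconds/byte -> {bandwidth_bps(us):.2e} bits/second")

# Largest single-function interfunction requirement quoted in the text
print(f"3x10^6 bits/second requires at most "
      f"{max_delivery_us_per_byte(3e6):.2f} microseconds/byte")
```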
Network reconfiguration times for both the IC and I/O networks will not be critical
for MPRAS since reconfiguration will only be allowed during the pre-launch phase
when sufficient time is available for reconfiguration.
Boeing I/O and Intercomputer Communications — Figure 4.12 diagrams the
hardware modules associated with a processing channel in the Boeing
architecture. A channel is comprised of a local signal conditioner, all
	 channel
module, a system bus module, a fault tolerance module, an I/O interface module, a
processor module, a memory module, an output module, a MIL-STD 1773 flight
control bus, a MIL-STD 1773 transducer network bus and an HSDB system bus.
The software that provides for communications between application tasks and the
sensor/actuator network and manages network failures has not been specified for
the Boeing architecture.
[Figure 4.12. Boeing Architecture and Time Critical I/O Data Flow]

Two distinct types of sensor/actuator I/O have been defined for the Boeing
architecture: time critical and non-time critical. Time critical I/O is handled using
the MIL-STD 1773 flight control bus. Non-time critical I/O is handled using the
transducer network and the HSDB system bus. Figure 4.12 shows a conjectured
sequence for time critical I/O data flow via the flight control bus. Sensor data from
a local signal conditioner is transferred to an I/O interface module memory via the
flight control bus. This transfer requires I/O interface module software to check for
data errors and to control delivery of the data to a desired destination. It is
assumed that in order to have consistent sensor data in all redundant processing
channels the fault tolerance module will be used to distribute and vote sensor data
across the redundant channels. This is believed to be within the intended use of the
fault tolerance module. However, the design for and use of this module have not been
clearly specified for the Boeing architecture. Following this assumption, data would
be transferred to the fault tolerance module via the PI bus. This data would be
distributed to the other redundant channels via cross channel data links. Similarly,
data from redundant sensors acquired by the other redundant channels would have
to be distributed to all channels as well. Presumably, errors occurring during these
sensor data distribution processes would be detected, processed and reported by
software in the fault tolerance module. In addition, the redundant sensor data could
be processed to obtain a single set of sensor data which in turn could be distributed
to all channels via the cross channel links and voted by the fault tolerance module.
Finally, either the redundant sensor data sets or a single voted sensor data set
would be delivered to the requesting application task executing in the processor
module via designated locations in processor memory. I/O service software in the
processor module would be required to manage this process.
Figure 4.13 illustrates a conjectured data flow sequence for non-time critical I/O for
the Boeing architecture. It differs from time critical I/O in that sensor data is
transferred via the transducer network, an I/O channel module and a system bus
module. (The switched BIU function shown is expected to reside on the fault
tolerance module.) I/O software to direct these transfers and to detect and report
errors would be required in the appropriate modules. Further, the HSDB module
software would have to manage the token ring protocol of the HSDB and any
required network flow control. Sensor data received in the system bus module would
then be passed to the fault tolerance module where the process would proceed as
previously conjectured. Figure 4.14 summarizes the structure of the Boeing
non-critical I/O data network.
The message delivery time associated with sensor data would be the sum of the
times required to complete each step in the process. In the worst case, the time
between the delivery of one sensor data message and the next sensor data message
would be the same as message delivery time. The communications bandwidth
realized would be determined by this message delivery time as opposed to the
bandwidth of the data transfer busses. If the message delivery process is designed so
that a new message can be started as soon as a given hardware element such as the
I/O interface module has completed its operations on the previous message, the
time between sensor data messages will be reduced from the sum of the times
required to complete all steps in the delivery process to the longest time that a
single hardware module is used in the delivery process. In this case, message
delivery would be designed as a pipelined sequence of steps.
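
The difference between the two cases can be illustrated with a short sketch. The step names and step times below are hypothetical placeholders and are not values from the Boeing documentation; the sketch only shows that the unpipelined message interval is the sum of the step times, while the pipelined interval is set by the slowest single step.

```python
# Hypothetical per-step times in microseconds. None of these values come from
# the architecture documentation; they exist only to exercise the arithmetic.
STEP_TIMES_US = {
    "signal_conditioner_response": 20.0,
    "io_interface_module": 35.0,
    "fault_tolerance_module": 50.0,
    "processor_io_service": 25.0,
}

WORD_BITS = 16  # one 16-bit sensor data word per message, assumed for simplicity


def message_interval_us(step_times, pipelined):
    """Time between successive messages for a chain of delivery steps.

    Unpipelined: a new message starts only after the previous one is delivered,
    so the interval is the sum of all step times.
    Pipelined: a new message starts as soon as a module is free, so the interval
    is the longest time any single module is occupied.
    """
    times = list(step_times.values())
    return max(times) if pipelined else sum(times)


def effective_bandwidth_bps(step_times, pipelined, bits_per_message=WORD_BITS):
    """Effective bandwidth implied by the message interval, in bits/second."""
    interval_s = message_interval_us(step_times, pipelined) * 1e-6
    return bits_per_message / interval_s


if __name__ == "__main__":
    for mode in (False, True):
        label = "pipelined" if mode else "sequential"
        print(label,
              message_interval_us(STEP_TIMES_US, mode), "us per message,",
              round(effective_bandwidth_bps(STEP_TIMES_US, mode)), "bits/s")
```

Note that pipelining shortens the interval between messages but does not shorten the latency of any individual message, which remains the sum of the step times.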
[Figure 4.13. Boeing Non-Time Critical I/O Data Flow]

[Figure 4.14. Boeing I/O Structure]

Components of the overall message delivery time for the data flow sequence
described above include:
1. the time required for data to be solicited from a local signal conditioner after
an application task has requested data.
2. the time required for the local signal conditioner to respond.
3. the time required to check for errors in each step.
4. the time required to determine the destination of received data.
5. the time required for data formatting.
6. the time required to distribute data across redundant processing channels.
7. the time required to transfer data into and out of the various memories
involved in the message delivery process.
8. the time required to reduce the redundant sensor data sets to a single
consistent sensor data set for all operational channels.
9. the task switching times associated with each processor in the message
delivery path.
None of the performance evaluations reported by Boeing indicate the extent to
which the data delivery sequence, the associated error checking and the
management of redundant sensor data were incorporated in the system performance
evaluation. The data delivery time rather than bus bandwidth is the limiting factor
for both I/O and interprocessor communications. It is more critical that this factor
be analyzed for the Boeing architecture because of the more centralized processing
topology proposed. That is, data from the large number of sensors are delivered to
fewer core processing sites in the Boeing architecture and, consequently, a higher
data bandwidth is required at each site.
Architecture elements such as the I/O channel module, the I/O interface module,
and the system bus module must handle the data rate associated with the signal
conditioners connected to the processing channel. Since redundant sensor data must
be distributed to all processing channels, the fault tolerance module must handle
the data rate associated with all local signal conditioners connected to the
redundant channels of a processing site. While all delay factors in the data delivery
mechanisms were of concern, the handling of sensor data in the fault tolerance
module was examined more closely.
The functionality and use of the fault tolerance module is not described extensively
in the Boeing specifications. With the exception of calling out a 3.5 MIP processor,
the performance of the fault tolerance module is not specified. Assuming that the
module will be used to establish consistent sensor data across processors, to
synchronize redundant channels, to vote partially processed results, to distribute
error data and status across redundant channels, and to align processor states
during recovery, only a portion of its capacity can safely be allocated to establishing
consistent sensor data. A simple performance calculation can be made for the fault
tolerance module. Assume that 25% of the fault tolerance module capacity can be
dedicated to establishing sensor data consistency and that 500 kbps of the
maximum specified sensor data rate, 1 Mbps, must be handled by a processing site.
Further, assume that the sensor data rate is uniformly distributed across quad
redundant channels. The average time between 16-bit sensor data words in each
channel is then 128 microseconds. A simple model of the relationship between the
sensor data word interval and fault tolerance module performance is given by:
\[
R\,(T_{DV} + T_S + I \cdot T_I) \le C_{AS}\, T_{SD} \tag{4.1}
\]
where:
R = redundancy
T_{DV} = time required to distribute and vote a simplex sensor data word
T_S = time skew between redundant processing channels
I = average number of instructions allocated to handling a word to be
distributed and voted including memory transfers and error processing
(These are instructions that cannot be overlapped with the distribution
process.)
T_I = average instruction time of the processor in the fault tolerance module
C_{AS} = % of fault tolerance module capacity dedicated to sensor data consistency
(expressed as a fraction)
T_{SD} = average interval between sensor data words within a processing channel
Note that all fault tolerance module times are multiplied by the redundancy factor
R. This is due to the need to distribute copies of the redundant sensor data to all
channels in order to assure consistent data in all operational channels. Note also
that it is assumed that the skew between redundant channels impacts the voting of
each word. The impact of channel skew depends on the design of the fault tolerance
module which was not given by Boeing. This simple model accounts for establishing
R sensor data sets in each processing channel. Additional processing would be
required to establish a single voted sensor data set.
For the quad redundant case with the 3.5 MIPS processor, if a skew of 1 microsecond
and 10 instructions are required for each word, the time to distribute and vote
simplex data must be less than 4 microseconds per word or 4 Mbps. This results in a
reasonable specification for the fault tolerance module distribution/vote timing. If
20 instructions are required for each word, the time for distributing and voting must
be less than 1 microsecond per word or 16 Mbps. This would result in a much more
demanding specification for the fault tolerance module.
Recall that the AIPS requires in excess of 100 instructions for each sensor
sample delivered to an application task. Thus, 10 instructions per word for this
portion of the sensor delivery is very tightly coded. If the more demanding I/O rate
of 6 Mbps as estimated by the Martin Marietta specifications must be met instead
of the 1 Mbps rate, the fault tolerance module processor would have to be much
faster than 3.5 MIPS, the channel skew would have to be reduced, the distributed
vote time would have to be reduced and the 1773 busses would not be adequate.
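
The arithmetic behind these figures can be reproduced with a short calculation based on Equation 4.1. This is a sketch only: the function and parameter names are illustrative, and the inputs are the assumptions stated in the text (500 kbps per processing site, quad redundancy, 16-bit words, a 3.5 MIPS processor, 25% of module capacity and a 1 microsecond skew).

```python
def sensor_word_interval_us(site_rate_bps, channels, word_bits=16):
    """Average interval between sensor data words in one channel (T_SD), in us."""
    per_channel_bps = site_rate_bps / channels
    return word_bits / per_channel_bps * 1e6


def max_distribute_vote_time_us(redundancy, skew_us, instructions, mips,
                                capacity_fraction, word_interval_us):
    """Largest T_DV that satisfies R * (T_DV + T_S + I * T_I) <= C_AS * T_SD."""
    instruction_time_us = 1.0 / mips  # T_I: 3.5 MIPS is about 0.29 us/instruction
    budget_per_copy_us = capacity_fraction * word_interval_us / redundancy
    return budget_per_copy_us - skew_us - instructions * instruction_time_us


# Assumptions taken from the text: 500 kbps spread uniformly over four channels.
t_sd = sensor_word_interval_us(500e3, channels=4)

for n_instr in (10, 20):
    t_dv = max_distribute_vote_time_us(
        redundancy=4, skew_us=1.0, instructions=n_instr, mips=3.5,
        capacity_fraction=0.25, word_interval_us=t_sd)
    rate_mbps = 16 / t_dv  # 16 bits per word, t_dv in microseconds
    print(f"I = {n_instr}: T_DV <= {t_dv:.2f} us per word "
          f"(about {rate_mbps:.1f} Mbps)")
```

Under these assumptions the bound evaluates to roughly 4.1 microseconds for 10 instructions per word and roughly 1.3 microseconds for 20 instructions per word, consistent with the rounded values of 4 microseconds and 1 microsecond quoted above.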
Since the channel timing skew, the average number of instructions per word, the
distribution/voting time and the module capacity dedicated to each fault tolerance
module function such as synchronization or sensor data consistency have not been
specified, it is not possible to assess the adequacy of the Boeing design. It is
further believed that the
performance of the fault tolerance module is critical and, left unspecified and
unanalyzed, represents a significant technical risk.
Since the proposed Boeing topology has only a GN&C processing site and a vehicle
management processing site, most interfunction communication takes place within
the processing sites and does not require the use of the system HSDB. Thus, the
bandwidth of the bus and even the effective bandwidth of interprocessor
communication software would not likely limit system communications. If Boeing's
low estimates for propulsion control throughput do not hold and additional
processing sites must be introduced into the Boeing design, substantial portions of
interfunction I/O will be implemented via the system HSDB and the performance of
the interprocessor communications software will have to be analyzed to determine
its adequacy.
Further development of the Boeing MPRAS architecture should be predicated upon
a better definition and performance analysis of the I/O and interprocessor
communications software.
General Dynamics I/O and Intercomputer Communications — Figure 4.15
diagrams the hardware modules associated with a channel of a Remote Data
Interface (RDI) or a processor in the General Dynamics architecture. An RDI
channel is comprised of a sensor input module, a local data link module, a
self-checking pair processor/memory module, a system bus module and an output
module. The software that provides communications between application tasks and
the sensors and actuators, and manages the network errors has not been specified
for the General Dynamics architecture.
[Figure 4.15. General Dynamics Sensor Data Flow]

Illustrated in Figure 4.15 is a sequence of I/O data flow operations that is presumed
for a Remote Data Interface (RDI) of the General Dynamics architecture. Sampled
sensor data is transferred from the sensor input module to the local data link
module. Sensor
data is distributed to other redundant processing channels via the cross channel
data links. Redundant sensor data is also received from other channels via the cross
channel data links. Presumably, each channel will derive a single set of sensor data
which will be assembled into a message and transferred to core processing sites such
as the vehicle management processor via the redundant HSDB system data bus.
Sensor data messages received by the core processing site will be processed for
errors and directed to appropriate application tasks. Figure 4.16 shows the overall
sensor data collection structure of the General Dynamics architecture. Multiple
RDI's transfer redundant voted data to core processing sites via the HSDB.
Similar to the Boeing architecture, the software to handle the required sequence of
operations to deliver consistent sensor data to application tasks has not been defined
nor have its performance characteristics been analyzed and specified. Of concern are
the overall latency of data delivery and the effective sensor data communications
bandwidth. The performance and functionality of the local data link module were
not defined for MPRAS Part 1. It is assumed that this module will provide for
distribution and voting to assure consistent sensor data in all channels, will provide
for synchronization of redundant channels, will provide for alignment of the state of
channels during recovery and will provide for distribution of error information and
status between redundant channels. The distributed structure of the General
Dynamics architecture can reduce the throughput requirements for the local data
link module relative to that required for the Boeing fault tolerance module. By
distributing the voting and data distribution function to local or regional data
collection sites, the throughput requirements for any one site are reduced from those
required for a more centralized approach. In this respect, a more distributed sensor
data collection structure tends to be more expandable and places less demanding
performance requirements on the local data link modules. However, unless properly
designed, the additional voting planes required in the distributed architecture could
result in higher costs.
As was the case with the Boeing fault tolerance module, the definition of the local
data link module was not sufficient to assess the adequacy of its functionality and
performance for the MPRAS application. Since this module is expected to provide
the functions that are essential to the redundancy management of a multipath
system, incomplete functional specification and performance analysis of this module
represents a significant technical risk.
[Figure 4.16. General Dynamics I/O Structure]

Communications between processing sites and the delivery of sensor data to
processing sites are provided by the redundant HSDB system data bus. The General
Dynamics computation model sets this communications rate at 12.8 Mbps which is
about 25% of the HSDB capacity. Capability to actually realize this
communications bandwidth depends upon the software data delivery delays and the
distribution of this requirement across the processing sites. Input requirements are
highest for the vehicle management processor which provides control functions, the
data recording group, and the telemetry processing group. The software that
provides for preparation of data messages to be sent and for the delivery of data to
an application task must be capable of sustaining the peak data rate while
providing the necessary error detection and management capability. General
Dynamics has proposed the Table Driven Proportional Access timing approach
which was devised by Honeywell SRC. Using this approach, bus transmissions are
prescheduled and message lengths and destinations are predetermined during the
application software development process and occur at deterministic times. This
approach has the potential to simplify and hence improve communications
performance by eliminating the time required for contention on the system bus, by
reducing the need for source, destination, and message length information in
messages and by reducing the processing overhead required to deliver a message to
an application task relative to that required by general purpose network message
handling software. Since this software has not been specified and described, it was
not possible to analyze its performance.
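
The prescheduled bus access just described can be illustrated with a minimal sketch. It assumes, for illustration only, that every module holds an identical copy of a slot table and compares its own ID with the entry for the current slot; the table contents, class names and module IDs are hypothetical and are not taken from the General Dynamics or Honeywell documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    sender_id: int      # ID of the only module allowed to transmit in this slot
    message_words: int  # message length fixed when the table is built


# Hypothetical access table; slot order, IDs and lengths are illustrative only.
# Module 1 appears twice, so it receives a proportionally larger bus share.
ACCESS_TABLE = (
    Slot(sender_id=1, message_words=8),
    Slot(sender_id=2, message_words=4),
    Slot(sender_id=1, message_words=8),
    Slot(sender_id=3, message_words=16),
)


class BusInterface:
    """One module's view of the shared, prescheduled access table."""

    def __init__(self, module_id, table=ACCESS_TABLE):
        self.module_id = module_id
        self.table = table
        self.pointer = 0  # all modules must advance this pointer in step

    def may_transmit(self):
        """True when the current slot is assigned to this module."""
        return self.table[self.pointer].sender_id == self.module_id

    def advance(self):
        """Move to the next slot, wrapping at the end of the frame."""
        self.pointer = (self.pointer + 1) % len(self.table)


def senders_for_one_frame(module_ids, table=ACCESS_TABLE):
    """Return, slot by slot, the modules that claim the bus during one frame."""
    interfaces = [BusInterface(m, table) for m in module_ids]
    frame = []
    for _ in table:
        frame.append([b.module_id for b in interfaces if b.may_transmit()])
        for b in interfaces:
            b.advance()
    return frame


if __name__ == "__main__":
    print(senders_for_one_frame([1, 2, 3]))
```

Because the sender and message length are implied by the slot, no contention resolution is needed and each slot has exactly one claimant as long as all table pointers remain aligned. The sketch does not model loss of pointer alignment or timing skew between modules.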
To achieve the peak input rate to the vehicle management processor from the
system bus of 0.9 x 10^6 bytes/second, the HSDB interface module must complete its
processing of a data word in an average of 2.2 microseconds. If the processor on the
JIAWG HSDB module has the current throughput of 3.5 MIPS, approximately 7
instructions can be executed for each word handled. This would result in 100%
loading of the HSDB module and would not allow for system output. Even without
definition of the interprocessor communication software, it can be concluded that
using the current standard JIAWG HSDB module is not feasible to meet the vehicle
management processor input requirements as given in the General Dynamics
MPRAS Point Design System Partitioning of August 2, 1989.
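
The instruction budget quoted above follows from a simple calculation, sketched below. The 2-byte (16-bit) word size is an assumption; it is consistent with the 2.2 microsecond figure and with the 16-bit sensor data words used earlier in this section, but it is not stated explicitly for the HSDB module. The function names are illustrative.

```python
def word_time_us(bytes_per_second, bytes_per_word=2):
    """Average time available per data word, in microseconds."""
    words_per_second = bytes_per_second / bytes_per_word
    return 1e6 / words_per_second


def instructions_per_word(mips, word_time):
    """Instructions a processor of the given MIPS rating executes in word_time us."""
    return mips * word_time


peak_input_rate = 0.9e6  # bytes/second into the vehicle management processor
t_word = word_time_us(peak_input_rate)
budget = instructions_per_word(3.5, t_word)

print(f"time per word: {t_word:.2f} us")
print(f"instruction budget at 100% loading: {budget:.1f} instructions per word")
```

With these inputs the time per word is about 2.2 microseconds and the budget is between 7 and 8 instructions per word with the module fully loaded by input handling alone.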
As with the Boeing architecture, further development of the General Dynamics
MPRAS architecture should not proceed without more careful definition and
performance analysis of the I/O and interprocessor communications software.
5. FAULT TOLERANCE FEATURES OF MPRAS
ARCHITECTURES
5.1. Introduction
To provide the advanced functional features, such as adaptive guidance and control
and vehicle health management, which are expected to enable substantial savings in
launch operations, the complexity of the avionics hardware is much greater than
that for current launch vehicles. A consequence of this increased complexity is an
increase in unreliability. To offset this reduction in reliability and to further reduce
launch vehicle costs due to unreliability, the use of fault-tolerant systems technology
for the ALS avionics was examined under the MPRAS ADP's. While essential
architecture characteristics such as performance and use of common modules had to
be considered for MPRAS, the primary focus was to be on the infusion of
fault-tolerant system technology into the ALS avionics. Identification of appropriate
fault tolerance techniques, definition of suitable system development methods,
assessment of potential benefits, limitations and technology risks and the
preparation of technology development plans were among the goals of the MPRAS
ADP's.
The designs of the fault tolerance features for the proposed MPRAS architectures
were reviewed to determine adequacy, completeness and potential development
risks. The primary areas focused on were: 1) fault masking and data consistency
mechanisms, 2) fault detection mechanisms, 3) fault recovery techniques, 4)
redundancy and sparing, and 5) validation and development methods.
5.2. Fault Masking and Data Consistency Mechanisms
5.2.1. Background
Fault masking provides fault tolerance through the use of redundancy to isolate or
correct fault effects before erroneous outputs can propagate from a failed module.
Each of the proposed MPRAS architectures relies on N-modular redundancy with
voting to mask processing and computational errors. N-modular redundancy is
based on the comparison of data from redundant channels to detect and mask
faults. For this comparison to be effective, the following conditions are necessary:
1. The redundant processors have to be synchronized to a known and bounded
timing skew.
2. The redundant processors must be initialized to consistent starting conditions.
3. All redundant processors must gather inputs and produce outputs in the same
order.
4. Bitwise identical inputs must be used in the redundant processors.
5. A mechanism to detect bitwise disagreement between redundant processors
must be used.
Synchronization is needed to establish an a priori limit on processing time so that a
slow non-failed processor can be differentiated from a failed processor, the
comparator can determine when valid outputs have been received, and duplicated
sites can reach bitwise consensus on data. It can be achieved by communication
between channels, which is subject to faults in either channel or in the inter-channel
communication mechanisms, or by the provision of a reliable clocking signal on each
channel to which all channels synchronize.
The requirement for bitwise identical computation and output agreement requires
bitwise identical input, which in turn requires that all channels use identical or
consistent input data. To assure that identical inputs are distributed to all channels
in the presence of arbitrary failures, an input consistency protocol is required that is
f-Byzantine resilient; i.e., can tolerate f arbitrary failures. For an input consistency
protocol to be f-Byzantine resilient requires: 1) at least 3f+1 participants, 2) each
participant to be connected to at least 2f+1 other participants through disjoint
communication paths, 3) at least f+1 rounds of communication among the
participants, and 4) the synchronization of the participants within a known skew.
The participants in the protocol need to be fault containment regions; i.e., regions
to which faults can be sufficiently contained to ensure the statistical independence
of failures in any two regions. Otherwise, single failures could result in the
simultaneous loss of more than one of the redundant channels. Therefore, four
FCR's are required to tolerate one arbitrary failure. Four architectural features lead
to independence of failure: 1) physical isolation, 2) electrical isolation, 3)
independent power, and 4) independent clocking.
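
The numerical requirements listed above can be tabulated for a given number of tolerated faults with a small helper. This is a sketch; the function and class names are illustrative, and the fourth requirement (synchronization within a known skew) is not a count and is therefore not represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByzantineRequirements:
    participants: int    # fault containment regions needed: 3f + 1
    disjoint_paths: int  # other participants each must reach: 2f + 1
    rounds: int          # rounds of communication: f + 1


def byzantine_requirements(f):
    """Minimum resources for a protocol that tolerates f arbitrary failures."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return ByzantineRequirements(
        participants=3 * f + 1,
        disjoint_paths=2 * f + 1,
        rounds=f + 1,
    )


def tolerable_faults(fault_containment_regions):
    """Largest f supported by a given number of regions, judged by count alone.

    Connectivity, rounds of communication and synchronization must also be
    provided; this helper checks only the 3f + 1 participant requirement.
    """
    return max((fault_containment_regions - 1) // 3, 0)


if __name__ == "__main__":
    print(byzantine_requirements(1))
    print(tolerable_faults(3), tolerable_faults(4))
```

For f = 1 this gives four participants, three disjoint paths per participant and two rounds of communication, which matches the statement that four FCR's are required to tolerate one arbitrary failure. A triplex arrangement with only three regions does not meet the participant requirement.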
Fault masking and data consistency mechanisms are essential elements for achieving
fault tolerance with a multipath redundant system. Unless these mechanisms are
properly designed, the potential reliability of a multipath system cannot be realized.
So important are these mechanisms that an architecture design cannot credibly be
represented as fault tolerant unless these mechanisms are rather fully specified as to
functionality and performance. Proceeding with an architecture development
without such specifications should be considered a technical and development risk.
The mechanisms which provide fault masking and data consistency for each
proposed MPRAS architecture are discussed in the following paragraphs.
5.2.2. General Dynamics Fault Masking
Output Voting — Digital outputs from each redundant string are voted bit-for-bit
in the General Dynamics output modules. Each redundant string may have an
output module capable of driving multiple actuators. The voted outputs from each
string are converted and then used to drive redundant actuators. Up to four
redundant channels can be voted. The specification of the output module
functionality and performance is incomplete in several essential areas. Among those
characteristics left unspecified are: 1) the nominal timing skew allowed between
redundant strings and the output data/command buffering required to align skewed
outputs at voter inputs, 2) how an output module responds if a redundant string
fails to deliver data to be voted, 3) if and how a bad string can be excluded from a
vote, 4) if and how minority strings are identified and reported, 5) if and how the
voter and minority reporting are checked during operation, and 6) the data
bandwidth of the voter and drive converter elements. Without specification of these
essential functions, the design of the output fault masking is incomplete and its
adequacy cannot be assessed. No specific redundant channel fault masking is
provided for communications on the system bus. Accordingly, the failure rates for
all processing sites within a channel along with the failure rates of the bus
communication paths are added to obtain the failure rate for a single string.
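
A bit-for-bit vote of the kind described for the output modules can be sketched as follows. This is an illustration of the general technique, not the General Dynamics voter design, which the text notes is incompletely specified. The 16-bit word width, the tie handling for an even number of strings and the minority report are all assumptions.

```python
def bitwise_majority(words, width=16):
    """Bit-for-bit majority vote across redundant output words.

    Each bit of the result is 1 only if a strict majority of the inputs have
    that bit set. With an even number of inputs a tied bit resolves to 0;
    this tie rule is an assumption of the sketch.
    """
    if len(words) < 3:
        raise ValueError("at least three redundant words are needed to mask a fault")
    voted = 0
    for bit in range(width):
        ones = sum((w >> bit) & 1 for w in words)
        if 2 * ones > len(words):
            voted |= 1 << bit
    return voted


def minority_strings(words, voted):
    """Indices of strings whose word disagrees with the voted result."""
    return [i for i, w in enumerate(words) if w != voted]


if __name__ == "__main__":
    outputs = [0x1234, 0x1234, 0x1034]  # string 2 has one corrupted bit
    result = bitwise_majority(outputs)
    print(hex(result), minority_strings(outputs, result))
```

Because the vote is taken bit by bit, the result can differ from every individual input when more than one string is in error, which is one reason the identification and reporting of minority strings matters.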
Input Data Consistency — Voting of output data values depends upon each
redundant channel having consistent input data. The General Dynamics
architecture provides for a Local Data Link (LDL) module which can be used to
establish consistent data in each redundant channel. Cross channel communication
paths are used to pass data between LDL modules in redundant channels.
Redundant data from multiple strings can be voted in the LDL. Communications
paths are optically isolated to maintain independent fault containment areas. This
prevents failures on a link driver or receiver from causing all LDL's to fail. It is
implied that the communication paths will be either self-checking paths or will use
error detection and correction coding to ensure reliable cross channel
communications. A description of the options for establishing consistent input is
given in the following paragraphs.
The architecture uses two methods to ensure consistency of the sensor input
data: syntactic error checks and semantic checks. When possible, the architecture
depends on syntactic error checks to detect potential faults. Where syntactic checks
are not effective, it relies on semantic checks to detect errors based on the value or
reasonableness of the data. Semantic checks are dependent on application and
therefore left up to the programmer to implement. Syntactic checks, such as a
bit-for-bit vote of the three redundant channel outputs, are performed automatically
by the system. The LDL provides the hardware means by which these syntactic
checks are performed. The RDI's, designed to collect sensor data and develop a
single, consistent input value for each channel, can be configured in a variety of
options depending on the criticality of the sensor data and the redundancy level of
the sensor. In the case of a single flight critical sensor, two options are specified.
The first option ties the sensor to channel A of the RDI and the data value is
distributed to channels B and C via the LDL data exchange. This option does not
guarantee input consistency in the presence of arbitrary failures in channel A.
The second option cross-straps the sensor to four channels: A, B, and C and a fourth
consisting of a sensor interface and an LDL module. The addition of the fourth
channel provides the fourth fault containment area which is the basis for a
Byzantine-resilient protocol. However, it is not clear whether the required two
rounds of communication are to be provided. The guidance given for using the
sensor cross strapping option instead of the capture then distribute option was that
the reliability of the sensor and the cross strap wiring should be much higher than
that of the sensor interface and the LDL module. It is RTI's opinion that the
decision must also include the probability that for certain upset events such as
lightning strikes, the sensor cross strap wiring makes all channels vulnerable to
damage. That is, use of sensor cross strapping can make the system more
susceptible to non-recoverable common mode failures.
Instead of using the above approach to distributing consistent input data, an
approach which provides the appropriate number of fault containment regions and
disjoint communication paths for a protocol that tolerates arbitrary behavior could
have been considered for the LDL. When error correcting or self checking
communication paths, multiple rounds of communications and the extra sensor
input and LDL module proposed by General Dynamics are taken into consideration,
this approach could in fact be less complex and expensive. Important unspecified
characteristics of the LDL function include: 1) the maximum timing skew between
redundant channels, 2) the impact of the skew upon LDL throughput, 3) the
buffering required to align voter inputs in the presence of skew, 4) the behavior of
the LDL when the LDL in a redundant string fails to deliver expected data, 5) if
and how a faulty channel is excluded from a vote, 6) if and how a minority string is
reported, 7) if and how the voter and error report logic are checked, 8) the necessary
bandwidth for the LDL function, and 9) the specifics of the cross channel link
self-checking or error correction functions. Without specification of the essential
functions, the design of the LDL is incomplete and its adequacy cannot be assessed.
As such, this represents a significant development risk.
Synchronization — There are three types of synchronization that must be carried
out for the General Dynamics architecture. The first type is the synchronization of
the two processors in the self-checking pair (SCP) processor module. The second
type is the establishment of synchronization of all processing elements within a
redundant channel. The third type is the synchronization of the redundant strings.
The synchronization of processing elements within a channel provides the basis for
communication on the system bus. Communication between Bus Interface Units
(BIU's) within the architecture is accomplished with decentralized bus control via
Table Driven Proportional Access (TDPA). TDPA is a time-multiplexed scheme
based on a coordinated allocation vector in each BIU. This permits processors to
share a bus without a bus controller. Connections to the bus consist of
both transmit/ receive and receive only ports. Each BIU has a unique (for its
transmitting bus) identification (ID) code which determines when a particular
module is allowed to transmit to the system bus. Each processor contains a bus
access table or vector. The access table allows only one module access to the bus at
any one time. Using the same access table, each module's BIU checks to see if its ID
number matches the access table's current time slot ID as listed in the access table.
If so, that module has transmission rights on the system bus at that time. Proper
maintenance of the access table pointer by all system modules maintains
synchronization across the system. Each processor has special algorithms that
maintain synchronization with the system bus and provide for resynchronization to
the bus when required. The length of the access table and the bus clock rate impose
a frame rate for data transfer on a given bus. Because the data type to be
transmitted in a particular slot is known ahead of time, it is possible to allocate a
tight message window time to each slot. The window is large enough to
accommodate a message of the prescribed number of words along with tolerances for
timing skews between processors. Each TDPA frame is composed of a number of
time slots. Each time slot has a message window and an inter-message gap. The
message window is made large enough to accommodate a message of the indicated
data type as well as tolerances on either side for BIU-to-BIU timing skew. The
inter-message gap is a fixed size determined largely by the amount of time required
by each BIU to set up for the next message. The timing skew between BIU's and
the synchronization and resynchronization algorithms have not been specified. A
non-responding BIU does not affect the timing of any other messages sent by other
modules. Therefore, the time to run completely through the table is always known
and is independent of the number of timed-out messages as long as there is at least
one message transmitted during each cycle through the table. This mechanism is
said to yield completely time-determinate system operation, and the access
table is said to be configurable to ensure that time-critical data transfers are supported.
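The TDPA slot arithmetic described above can be illustrated with a minimal Python sketch. The word time, skew tolerance, gap size and table contents are hypothetical values chosen for illustration; none of them come from the General Dynamics specification, which leaves the timing skew unspecified.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    owner_id: int   # ID of the BIU allowed to transmit in this slot
    words: int      # prescribed message length for the slot's data type

# Hypothetical timing parameters (assumptions, not from the report).
WORD_TIME_US = 1.0   # time to transmit one word
SKEW_TOL_US = 0.5    # BIU-to-BIU skew tolerance on each side of the message
GAP_US = 2.0         # fixed inter-message gap

def slot_time_us(slot):
    """Message window (message plus skew tolerances) plus the inter-message gap."""
    window = slot.words * WORD_TIME_US + 2 * SKEW_TOL_US
    return window + GAP_US

def frame_time_us(table):
    """Time for one complete pass through the access table."""
    return sum(slot_time_us(s) for s in table)

def may_transmit(biu_id, table, pointer):
    """A BIU transmits only when its ID matches the current table entry."""
    return table[pointer % len(table)].owner_id == biu_id

def run_frame(table, responding):
    """Walk one frame; a silent owner still consumes its whole slot."""
    t = 0.0
    log = []
    for slot in table:
        log.append((t, slot.owner_id, slot.owner_id in responding))
        t += slot_time_us(slot)
    return log, t
```

Because every slot elapses whether or not its owner transmits, run_frame returns the same slot start times and the same total frame time for any set of responding BIU's, which is the time-determinate property claimed for the scheme. The sketch does not model the requirement that at least one message be sent per cycle, nor the unspecified resynchronization algorithms.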
Establishing and maintaining synchronization of redundant strings is necessary to
satisfy the bounded skew requirements for voting of redundant processor outputs
and for establishing consistent input data. The LDL modules in a string provide the
only cross channel communication paths by which cross channel synchronization can
be achieved. Description of this process for the General Dynamics architecture is
limited to the following statements: 1) "The exchange of data across channels tends
to provide inter-channel synchronization. This exchange of information provides a
means by which the skew between channels is limited and directly affects any
time-out functions implemented within the system." and 2) "Strings synchronize
with each other at message boundaries according to the TDPA mechanism." This
description is insufficient to determine the functionality and adequacy of redundant
channel synchronization. This description leaves the actual bound on time skew
unspecified. Note that in addition to affecting time-out functions, skew between
redundant channels can add to the sample data control loop latency or lag, can
limit the coherency of sensor data samples and, depending upon LDL design details,
may limit LDL bandwidth. Note also that the skew within a string may be additive
with the cross channel skew in turn producing a larger overall skew. Also left
unspecified is the explicit mechanism or algorithm by which synchronization is
established initially, maintained in the face of relative timing drift and
re-established following transient faults. Since there are multiple processing
elements and associated LDL modules within the system, it is not clear if one set or
all sets are involved in this cross channel synchronization process. If more than one
set is involved, their mutual coordination may need to be specified. If only one set is
involved, the designation of this master set and the potential need for alternate
masters may need to be specified.
Without more complete specification of the synchronization of redundant channels,
the fault tolerance of the architecture cannot be evaluated. The role of this
synchronization is of such importance to the fault tolerance of a multipath system
that development of the architecture should not proceed without a more complete
specification of this function.
Error Correction Features — Error correction techniques can be employed to
mask errors in data transmission and storage. Inherent in the use of JIAWG
modules for the General Dynamics architecture is the use of a Hamming code to
correct or mask transmission errors on the PI bus. This is the only specific use of
error correction techniques that can be identified for this architecture. Other areas
where the use of error correction has been suggested but not specified include the
use of non-specific error correction coding for the LDL cross-channel data links,
error correction for memory bit errors and allocation of spare transmission windows
in the TDPA template for retransmission of messages that previously contained
errors. Presumably, software would have to be included to respond to message
errors with a retry. Consideration should be given to the use of
memory error correction in the self-checking processor/memory module. The
purpose would be to reduce the frequency at which the self-checking logic pulls the
SCP off-line due to soft errors in the memories.
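As a generic illustration of how a Hamming code masks a single-bit error, the following Python sketch implements the textbook Hamming(7,4) code. It is not the PI bus code itself; the code parameters used on the PI bus are not given in this report, and the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into 7 code bits (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(code_bits):
    """Return (data_bits, syndrome); syndrome is the corrected position or 0."""
    c = list(code_bits)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1   # flip the single bit the syndrome points at
    return [c[2], c[4], c[5], c[6]], syndrome
```

A nonzero syndrome both corrects the word and identifies which bit was wrong, so the same mechanism could feed an error report. A code of this minimal form miscorrects double-bit errors, which is one reason the unspecified coding for the LDL cross-channel links matters.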
5.2.3. Boeing Fault Masking
Input Data Consistency and Output Voting — The fault tolerance module
(FTM) of the Boeing architecture implements the voting and synchronization
functions. It will be composed of a 1750A processor, a 256K memory, a
maintenance controller and interface, a PI bus interface, a switched or voted
redundant system bus interface and a communication interface with
voter and synchronization logic. Boeing also indicates that rigorous fault tolerance
concepts will be used. Boeing defines the degree of electrical isolation that is
expected to constitute a physical fault containment region. This is the extent of the
functionality and performance specifications for the FTM. As with the General
Dynamics architecture, the specification of essential features is incomplete and the
potential for the architecture to achieve expected fault tolerance cannot be
evaluated. Among the essential characteristics which should be specified and
evaluated before development proceeds are: 1) the nominal skew expected between
redundant channels, 2) the fault containment characteristics of the FTM, 3) the
behavior of the FTM when a channel fails to deliver data to be voted, 4) if and how
faulty channels are excluded from voting, 5) if and how a string whose data is in
minority is reported, 6) if and how the voter and error report logic are checked for
faults during operation, 7) the mechanism for distributing consistent simplex data,
8) the connectivity of the inter-channel communication paths, 9) the bandwidth of
the inter-channel communication paths and associated voting and error reporting
functions, and 10) if and how fault masking is provided for system bus
communications between processing sites. The design of the switched or voted
system bus interface is not indicated. While the interface can provide masking of
errors in intercomputer communications, unless designed properly it has the
potential to permit malicious behavior in one processing channel to disrupt the
other redundant channels.
Synchronization — All redundant channels within a processor will be powered-up
together at the start of power-up. Following self-test of all of the channels, the
individual channels will be synchronized to each other via the FTM hardware and
self-test will then be conducted at the node level. Task or frame level
synchronization will be used. A real-time clock in the Guidance, Navigation and
Control (GN&C) processor will be used as the reference to which all elements
synchronize. A spare processor will be capable of taking over these functions if the
GN&C processor fails. Real-time clock synchronization will control drift.
Without more complete specification of the synchronization process, the capability
to establish, maintain and re-establish, if necessary, a bounded timing skew across
redundant channels cannot be evaluated. Further, the fault tolerance of this
function in the presence of arbitrary failures cannot be assessed.
Error Correction Features — Specific error correction techniques other than
redundant channel voting identified for the Boeing architecture include the error
correction coding inherent in the JIAWG PI bus interfaces and the error correction
coding present in the JIAWG bulk memory modules. Retry of transmissions for
messages with errors is specified for data busses and backplanes.
5.2.4. AIPS Fault Masking
Data consistency and output voting is provided in the AIPS FTP via the
communicator/interstage hardware. This hardware has the connectivity and
physical fault containment regions necessary to provide for distribution of consistent
data in the presence of arbitrary failures. It is one-fault Byzantine resilient in a
triplex configuration and two-fault resilient in a quadruplex configuration. The
hardware provides for bit-for-bit majority voting, reporting of errors in the voting
process and the capability to block faulty processing channels from the vote process.
The function has been implemented and tested extensively in a series of
implementations.
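The voting behavior described above (bit-for-bit majority voting, error reporting, and blocking of faulty channels) can be sketched in Python as follows. This is a software illustration only, not the AIPS hardware design; the function name, the 16-bit default width and the tie rule for an even number of active channels are assumptions.

```python
def majority_vote(words, enabled=None, width=16):
    """Bit-for-bit majority vote over one word from each channel.

    words   : list of ints, one per redundant channel
    enabled : list of bools; False blocks a channel from the vote
    Returns (voted_word, indices of enabled channels that disagree).
    """
    if enabled is None:
        enabled = [True] * len(words)
    active = [w for w, e in zip(words, enabled) if e]
    voted = 0
    for bit in range(width):
        ones = sum((w >> bit) & 1 for w in active)
        # Assumption: a tie among an even number of channels resolves to 0.
        if 2 * ones > len(active):
            voted |= 1 << bit
    errors = [i for i, (w, e) in enumerate(zip(words, enabled))
              if e and w != voted]
    return voted, errors
```

For example, majority_vote([0xA5A5, 0xA5A5, 0xA5A4]) masks the single-bit error in the third channel and reports that channel as disagreeing. The sketch covers only the vote itself; the connectivity and fault containment regions that provide consistent inputs in the presence of arbitrary failures are outside its scope.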
Error reporting circuits and voter logic are tested exhaustively by a diagnostic test
program executing on a time available basis in the FTP. Data paths and control
circuit logic are tested through normal use of the hardware and through a special
test executed at the beginning of each processing frame.
The minimum time required to vote a word is 2.5 microseconds in the
proof-of-concept model of the FTP. The fault-tolerant clock used to cycle the voter
has a 5 microsecond period and is controlled by redundant digital control loops
which can adjust the fault-tolerant clock skews by 125 nanoseconds every
fault-tolerant clock period. The technology used for MPRAS should permit these
parameters to be reduced.
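The figures quoted for the proof-of-concept FTP allow a rough worked calculation, sketched below in Python. The assumption that the control loops apply the full 125 nanosecond adjustment in every clock period is a simplification made here, not a statement from the AIPS documentation.

```python
# Figures quoted for the AIPS proof-of-concept FTP, in nanoseconds.
MIN_VOTE_TIME_NS = 2500      # minimum time to vote one word
CLOCK_PERIOD_NS = 5000       # fault-tolerant clock period
MAX_ADJUST_NS = 125          # skew adjustment available per clock period

def max_vote_rate_words_per_s():
    """Upper bound on voted words per second at the minimum vote time."""
    return 1_000_000_000 // MIN_VOTE_TIME_NS

def max_correction_fraction():
    """Largest fraction of a clock period that can be corrected per period."""
    return MAX_ADJUST_NS / CLOCK_PERIOD_NS

def periods_to_remove_skew(skew_ns):
    """Clock periods needed to remove a skew, assuming the full adjustment
    is applied every period (an assumption for illustration)."""
    return -(-skew_ns // MAX_ADJUST_NS)   # integer ceiling division

def time_to_remove_skew_ns(skew_ns):
    return periods_to_remove_skew(skew_ns) * CLOCK_PERIOD_NS
```

Under these assumptions the voter is bounded at 400,000 words per second, the clock can be pulled by at most 2.5 percent per period, and a 1 microsecond skew would take 8 periods (40 microseconds) to remove.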
Upon start-up and for resynchronization, operational redundant FTP channels are
synchronized to an instruction interval and to identical hardware states by a
software service. Initial synchronization is attained by exchanging unique words via
the communicator/interstage hardware.
Communications between FTP's are provided via a three-layer IC network. FTP
channels can receive on all three network layers either by voting data received on all
layers or by selecting data from a single layer. FTP channels can transmit on only
one layer of the network. Transmission errors for data transmitted on all layers are
masked by a voter in the FTP IC interface. Data transmitted on only one layer are
received by all channels in the receiving FTP. A consistent copy is formed by voting
the received copies via the communicator/interstage hardware. Error correction of
redundant memory values via a memory scrub program that executes as a
background self-test corrects for soft memory errors and prevents the accumulation
of these errors over long periods of operation. This mechanism makes use of the
voter hardware to find and isolate these errors.
5.3. Fault Detection and Diagnosis
Fault detection relies on the provision and use of redundant information or resources
to detect the faults and errors caused by failures. The N-modular redundancy with
voting strategies that are being used in the proposed MPRAS architectures provide
the basis for fault detection, provided that they are augmented with appropriate
and effective fault diagnostics. The major issues are diagnosability and coverage.
Also, as for fault masking, for N-modular redundancy with voting to be effective,
the following conditions are necessary:
1. The redundant processors have to be synchronized to a known and bounded
skew.
2. Bitwise identical inputs must be provided to the redundant processors.
3. A mechanism to detect bitwise disagreement between redundant processors
must be used.
Although the fault masking provided by majority voting is sufficient to provide fault
tolerance, additional diagnostic and fault detection capabilities may be required to
support redundancy management requirements, including notification of fault
occurrence and location of the fault to some finite number of possible failure
locations. These two functions are essential if timely reconfiguration is needed to
maintain the level of redundancy after the occurrence of failures or to remove faulty
components.
Each of the proposed MPRAS architectures provides error detection and
identification capabilities beyond that provided via the majority voting mechanisms.
Those capabilities excluding the built-in-test features which support thorough
assembly and prelaunch readiness testing are discussed in the following paragraphs.
General Dynamics requirements call for the use of "state-of-the-art hardware and
software techniques to establish a high level of system tolerance to both hardware
and software errors." "The ability to detect, correct or compensate for soft errors
induced by transient hardware or environmental anomalies is required."
Specific error detection capabilities that are inherent in the use of JIAWG common
modules and would be applicable to the General Dynamics architecture are: 1) error
detection and correction in bulk memories, 2) memory parity checking, and 3) error
detection and correction on the PI bus interface.
The primary error detection method put forth for the system bus is the use of an
extra bus for each redundant bus channel along with self-checking bus interface
modules. This would require a bus interface module that differs from the JIAWG
standard interface module. If this approach is not used, the extra busses would be
used as spares and unspecified error checking words would be included and tested
for each message transmitted.
The primary error checking mechanism proposed for the General Dynamics
architecture is a processor memory module which uses the self-checking pair concept
to achieve high coverage and rapid detection of errors. Self-checking pairs (SCP's)
are used in all redundant channels to provide rapid fault detection, isolation and
containment at the module level. The use of SCP's simplifies the diagnostics that are
necessary to perform these functions and thereby speed up the recovery process so
that spare modules can be used to maintain the desired redundancy levels after
multiple failures. However, the SCP's still need to execute diagnostic sequences to
check the comparator and to determine whether faults are transient or permanent.
Each SCP consists of 2 lock-step processors, a fault monitor module and a PI bus
interface. A self-checking comparator is part of the fault monitor. The self-checking
monitor checks itself while it checks the processor outputs. It also checks that
processor clocks are correct and indicates an error if they are not. A self-checking
fault management sequencer in the fault monitor module disconnects the SCP from
the system bus when an error is detected, diagnoses transient faults and allows the
SCP to re-establish normal system operation when the fault is diagnosed to be
transient. The processors that make up the SCP are run in lock-step.
The self-checking fault management sequencer in the fault monitor module of the
SCP is a simple device comprising counters and combinational logic. If processor
disagreement occurs or the self-checking comparator finds an internal fault, the
sequencer transitions from the "good" state to the "abort" state. In the abort state,
it signals an external fault and resets the processors. The external fault signal is
transmitted over the Test and Maintenance Bus and PI bus to ensure that errors
affecting the transmission or generation logic are also detected. Then the sequencer
unconditionally transitions to the "disable PI bus" state, where SCP outputs to the
PI bus are disabled. PI bus outputs were left enabled in the "abort" state to allow the
sequencer to signal the malfunction. Processor resets are released in the disable PI
bus state to allow them to execute a diagnostic sequence. The diagnostic sequence
exercises all of the SCP module hardware including the processors and local
memory. The processors send a command to the sequencer when they have
successfully completed the diagnostic sequence. The sequencer responds by
transitioning to "re-enable" state to allow the SCP module to communicate over the
PI bus. The sequencer then transitions back to the "good" state which allows the
SCP module to be reused as a spare. The recovery process requires the processor to
obtain software state and database information from one of the active processors.
This would normally be performed after the module is brought online as an active
module. The sequencer transitions to the "crowbar" state which permanently
disables the SCP module if during the "disable PI bus" state 1) the processors
disagree, 2) the check circuit has an internal fault, 3) there is a sequencer fault
during the diagnostic period, or 4) the processors fail to complete the diagnostics
within a specified period of time. The sequencer can disable the
module either by continuously maintaining the reset signal true, or by causing
power to be removed from the module.
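The sequencer behavior described above can be summarized as a small state machine. The Python sketch below follows the state names used in the text; the event names, the TICK event used to drive unconditional transitions, and the function names are hypothetical, and the real sequencer is hardware built from counters and combinational logic.

```python
from enum import Enum, auto

class State(Enum):
    GOOD = auto()
    ABORT = auto()            # signal external fault, hold processors in reset
    DISABLE_PI_BUS = auto()   # outputs disabled, diagnostics running
    RE_ENABLE = auto()        # diagnostics passed, PI bus outputs restored
    CROWBAR = auto()          # module permanently disabled

class Event(Enum):
    TICK = auto()             # nothing new; take any unconditional transition
    DISAGREE = auto()         # processor outputs disagree
    CHECKER_FAULT = auto()    # comparator or check circuit internal fault
    SEQUENCER_FAULT = auto()
    DIAG_TIMEOUT = auto()     # diagnostics not completed in the allowed time
    DIAG_PASSED = auto()      # processors report successful diagnostics

FATAL_DURING_DIAG = {Event.DISAGREE, Event.CHECKER_FAULT,
                     Event.SEQUENCER_FAULT, Event.DIAG_TIMEOUT}

def step(state, event):
    """Return the next sequencer state for one event."""
    if state is State.GOOD:
        if event in (Event.DISAGREE, Event.CHECKER_FAULT):
            return State.ABORT
        return State.GOOD
    if state is State.ABORT:
        return State.DISABLE_PI_BUS          # unconditional
    if state is State.DISABLE_PI_BUS:
        if event in FATAL_DURING_DIAG:
            return State.CROWBAR
        if event is Event.DIAG_PASSED:
            return State.RE_ENABLE
        return State.DISABLE_PI_BUS
    if state is State.RE_ENABLE:
        return State.GOOD                    # available again as a spare
    return State.CROWBAR                     # permanent

def pi_bus_outputs_enabled(state):
    """Outputs stay enabled in ABORT so the fault can be signalled."""
    return state in (State.GOOD, State.ABORT, State.RE_ENABLE)

def run(events, state=State.GOOD):
    for event in events:
        state = step(state, event)
    return state
```

A transient fault follows the path good, abort, disable PI bus, re-enable, good, while any of the four failure conditions during diagnostics ends in the crowbar state. State recovery and state alignment after re-enabling are not modeled, which is the part of the process the report identifies as unspecified.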
The use of SCP's has the following potential advantages:
1. Processors do not need to send "keep alive" signals to the fault monitor logic.
2. Eliminates need for detailed fault analysis for the processors since all non
common cause faults will result in disagreements.
3. Immediate disabling of outputs eliminates possibility that errors will
propagate.
4. Reuse of SCP if fault is determined to be transient.
5. Potential higher coverage and reduced latency for faults.
The use of SCP's has the following liabilities:
1. Comparator is a single-point failure; careful design is required for reliability.
2. They are potentially more prone to common mode failures.
3. The fact that the checker exists in the same fault containment region as the
devices being checked makes it difficult to guarantee independence of failure
modes.
4. SCP failure rates are at least double single processor failure rates.
Specific error detection capabilities that are inherent in the use of JIAWG common
modules and which are applicable to the Boeing architecture are: 1) error detection
and correction in bulk memories, 2) memory parity checking, and 3) error detection
and correction on the PI bus interface. In addition, Boeing indicates the inclusion of
error checking words for all message transmissions on various system busses. The
Boeing specifications present a relatively thorough analysis of a broad range of faults
within their architecture and indicate how each fault is covered by the various error
detection mechanisms. Fault and error assessment processing will be implemented
in the Boeing FTM. The Boeing specifications call for each processing site to carry
out a self test on power up to establish its readiness to begin processing.
Fault identification algorithms for both the General Dynamics and Boeing
architectures cannot be specified until their designs progress further. However,
certain important characteristics such as the methods for assuring consistent error
report inputs to the fault diagnosis process and for assuring that the non faulty
channels arrive at the same diagnosis should be, but have not been, specified.
Extensive error detection mechanisms have been implemented for the AIPS
proof-of-concept system. Each processor detects the following exception errors: 1)
bus errors, 2) address errors, 3) illegal instruction errors, 4) arithmetic traps, and 5)
spurious interrupts. A watch dog timer is used to detect processors that fail to
complete operation sequences. The intercomputer and input/output networks
detect protocol errors, data errors and time-outs. In addition to the detection of
data errors on network communication paths, messages contain extra error check
words that detect data errors in messages after message data has been handled
within FTP memories. Voters are used to mask errors on intercomputer
communications. On-line diagnostic self-test programs are executed during available
processing intervals to detect faults in the voter logic, error reporting circuits, data
memory, program memory, the real-time clock and the monitor interlock. Presence
tests are executed every processing frame prior to application processing to establish
which channels of each FTP are available. In addition to these error detection
mechanisms, fault diagnosis algorithms have been implemented to analyze these
error reports along with errors detected by the voters to identify faulty elements.
Note that proper fault diagnosis for a redundant system requires that all
operational channels arrive at the same diagnosis. Consequently, the AIPS uses the
data exchange mechanism to distribute to all channels the error reports from each
channel so that consistent error report inputs can be used for diagnosis in each
channel. Testing has been conducted to determine the correctness and effectiveness
of these algorithms. Test programs which exercise the spare links and nodes in the
intercomputer and I/O networks are executed to detect failure in these spare
components. FTP's also execute a full self-test sequence following power up.
5.4. Fault Recovery
Recovery from faults in the AIPS architecture will be triggered by the fault
detection and identification mechanisms discussed in the previous section. Initial
response to all diagnosed channel faults is to block the faulty channel from voting.
Following this, attempts are made to recover this channel. A channel is considered
recovered if it can be resynchronized and state aligned to the operational channels
and if there are no data exchange voting errors being produced in the recovering
channel after a suitable period following resynchronization and state alignment.
Faults detected and identified for the AIPS I/O networks and IC network will cause
the networks to be reconfigured around the faulty component using spare
communication links and nodes.
Since network reconfiguration and channel recovery require substantial time to
complete, the AIPS architecture recommended for MPRAS will only make use of
these fault recovery features during the pre-launch period. During the launch phase
only fault masking and the capability to block channels diagnosed as faulty from the
voting process will be in force.
Boeing has not specified or described the error diagnosis, reconfiguration, transient
recovery, resynchronization and state alignment processes which are essential for
fault recovery. Critical aspects of these processes need to be specified before their
adequacy can be assessed and before the design can be considered to represent a
credible fault-tolerant system.
The basic recovery process for the General Dynamics SCP was described in the
previous section. In this architecture, recovery occurs at the subsystem level and
the goal is for rapid in-flight recovery. The capability to properly and rapidly
introduce a spare SCP into the channel or to recover the faulty SCP from a
transient depends upon self-test, synchronization, state recovery and state
alignment processes. These processes must be specified and analyzed in more detail
before their feasibility and adequacy can be established. Of particular concern
should be the time required to align the state of the SCP memories with that of the
other channels. This time could well exceed the specified recovery time.
The rapid in-flight recovery requirement specified by General Dynamics and Boeing
is a major contrast to the in-flight fault masking approach taken for the AIPS. Since
this requirement can have a substantial effect on the feasibility, cost and
development risk associated with the MPRAS architectures and could be a
discriminator between architectures, the need for this requirement must be resolved.
5.5. Redundancy and Sparing
The redundancy of the General Dynamics architecture can support up to
quadruplex configurations. Triplex has been recommended as the baseline
configuration. Due to the "two-failure" requirement for safety, launch will not be
permitted if, including the failure of spares, the system degrades below the triplex
level during pre-launch. The recommended redundancy for both the Boeing and
AIPS architectures is quadruplex. These systems have to degrade from quadruplex
to duplex during pre-launch before launch would be aborted.
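The launch commit rule implied by these redundancy levels can be expressed as a short Python sketch. The treatment of spares as whole replacement channels and all names below are simplifying assumptions made for illustration; the actual sparing granularity differs between the architectures.

```python
# Recommended baseline redundancy levels as stated in the text.
BASELINE_CHANNELS = {"General Dynamics": 3, "Boeing": 4, "AIPS": 4}

# Launch is not permitted below triplex, so that two further failures
# can be tolerated in flight.
MIN_LAUNCH_LEVEL = 3

def launch_permitted(operational_channels):
    """True if the remaining redundancy level allows launch."""
    return operational_channels >= MIN_LAUNCH_LEVEL

def prelaunch_failures_tolerated(architecture, spares=0):
    """Pre-launch channel failures that can occur without aborting launch.

    'spares' counts hypothetical whole-channel replacements, a
    simplification of the sparing schemes described in the report.
    """
    return BASELINE_CHANNELS[architecture] + spares - MIN_LAUNCH_LEVEL
```

With no spares, the triplex General Dynamics baseline tolerates no pre-launch channel failure, while the quadruplex Boeing and AIPS baselines each tolerate one. This makes the role of spares in the General Dynamics pre-launch availability explicit.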
Sparing in the General Dynamics architecture includes a spare PI bus for each
backplane in a redundant channel, a spare system bus for each redundant channel,
provisions for spare processors in redundant channels and the capability to provide
spare processing sites at the system level. Spare processing sites can be used to
replace any processing sites that are not used as an RDI; that is, are not connected to
sensors or actuators.
Sparing for the Boeing architecture includes a spare PI bus for each backplane in a
redundant channel and pooled system level spare processing sites. Due to the Boeing
core processing and local signal conditioner topology which permits all processing
sites to have access to all sensors and actuators, the spare processing site can be
used to replace any processing site. AIPS permits sparing at the fault masking
group or processing site level. The capability of the spare to replace a fault masking
group depends upon whether it can be given access to the I/O devices (sensors and
actuators) connected to the failed site. Access to an I/O network requires an IOS
module. Thus, sparing will be limited to the number of distinct I/O networks for
which distinct IOS modules are provided in the spare. If a system has a number of
I/O networks, sparing could be constrained. AIPS building blocks also provide the
opportunity to include spare communication links and circuit switched network
nodes within the I/O and IC networks. These spare components can be used to
reconfigure these network components to bypass faulty network components.
6. SUMMARY AND CONCLUSIONS
The following paragraphs summarize the conclusions that have been drawn
regarding the MPRAS architectures.
Reliability — Reliability projections were made by Boeing for recoverable core
avionics over multiple missions. In addition, Draper Laboratory used mid-1980s
failure rates to evaluate the reliability of an AIPS core avionics for a mission
consisting of pre-launch and launch phases. No system reliability projections were
made by General Dynamics for their architecture. None of the contractors
conducted reliability analyses of the sensor/actuator communications networks.
Projections indicate that this area has a substantial impact on MPRAS reliability
and could cause overall MPRAS reliability to fall short of objectives. In particular,
the triple redundant General Dynamics sensor acquisition network topology could
fall far short of desired goals. The reliability of the sensor/actuator communications
hardware should be considered a risk area for all MPRAS architectures.
The major factor impacting reliability projections for MPRAS is the failure rate
assumed for transient failures under both launch and pre-launch conditions.
Establishing credible transient failure rates should be a priority item for MPRAS
development. Otherwise, the system could suffer the costs of overdesign or suffer the
costs of failure to fulfill mission requirements.
No reliability analyses of the built-in-test features were reported for the MPRAS
architectures. The substantial built-in-test features required to enable cost savings
in vehicle assembly, integration, and launch operations require sufficient hardware
that the reliability of this hardware could adversely affect MPRAS reliability. The
design of the testability features should be reviewed with the goal of improving
reliability of the overall test system. The effectiveness of the entire error detection
and isolation features of the MPRAS architectures has been shown to be a critical
parameter for mission success and as such should be an area for scrutiny throughout
the MPRAS development.
Testability — Both the General Dynamics and Boeing testability design
specifications are at a level consistent with the development stage and are
appropriate for the application. Both designs provide for off-line chip level
built-in-test circuitry appropriate for reducing the otherwise enormous testing task
presented by complex systems implemented with VLSI technology. The Boeing
development is more advanced in specifying the testability features for the
sensor/actuator signal conditioning/interface elements. The Boeing design provides
for a test and maintenance bus within an avionics enclosure and provides for an
external test and maintenance connection port on the avionics enclosure but does
not carry a dedicated test and maintenance bus to all enclosures throughout the
vehicle. The General Dynamics design is based on a dedicated test and maintenance
bus that is provided to each avionics processing enclosure. The bus can provide
access to points within the system that should result in better systems testability.
Information regarding the testability of the AIPS architecture for the projected
MPRAS application was not available. However, the lack of chip level built-in-test
capability appropriate for VLSI complexity in the AIPS proof-of-concept system is a
significant disadvantage for assembly and pre-launch testing for MPRAS.
Because of the importance of testability to enabling cost savings in launch vehicle
assembly, test, and operations, competitive MPRAS architectures should be
carefully scrutinized with respect to adequacy and cost effectiveness. Due to the
large amount of hardware required for an MPRAS application, the reliability of the
associated built-in-test and maintenance hardware is a potential risk area that must
be addressed for all architectures.
Performance — Because there are substantial differences in projected performance
requirements for certain functions given in the three application descriptions,
conclusions regarding the adequacy of architecture performance characteristics are
subject to errors. Improved application descriptions and performance requirements
should be developed before substantial MPRAS development is undertaken. Areas
of most concern are propulsion control and adaptive GN&C computation and
communication resource requirements.
Intercomputer and sensor/actuator communications are judged to represent the
greatest performance risk for each architecture. The overall effective communication
bandwidth which includes the effect of software processing for high coverage error
detection and message delivery, as well as the cable bandwidth, is the relevant
performance parameter of interest. The potential for efficient, low-overhead
communications and task scheduling provided by the TDPA concept and the
distributed sensor data voting topology are considered performance advantages for
the General Dynamics architecture. Even though the AIPS communications
mechanisms are software intensive, the fact that they have been implemented, that
their message delivery times are known, and that the required improvement of a
factor of 10 to 20 is feasible using projected technology, reduces the risk for AIPS
performance. Due to the centralized topology of the Boeing architecture, including the
more centralized sensor data voting characteristics, it is the architecture that would
be most affected by the need for higher throughput requirements such as those put
forth for the propulsion control functions. Further, the topology seems to constrain
the location of time-critical sensors to the PA module. This would preclude sensors
for fuel level and structures which must be distributed throughout the vehicle and
which are expected to be used for adaptive GN&C functions from being handled by
the time-critical I/O mechanisms. Thus, the Boeing architecture development would
carry a high performance risk.
Fault Tolerance — As indicated in Chapter 5, the fault tolerance features of the
Boeing and General Dynamics architectures are insufficiently specified or reported
to determine their adequacy or feasibility. For each architecture, numerous critical
fault detection, fault diagnosis, fault recovery, fault masking and redundancy
management characteristics must be defined before adequacy, feasibility, and
development risk can be determined or before these designs can be represented as
credible candidates for a fault-tolerant architecture.
The AIPS fault tolerance features are well-established and have been implemented
in a series of development systems. These design features have undergone extensive
analysis and testing. The verification and validation of the AIPS FTP and its
associated local system service software has received the most attention. This
process continues. Due to the advanced state of development, the AIPS fault
tolerance concepts have by far the lowest technical and development risks for
MPRAS.
The General Dynamics proposed requirement for rapid in-flight recovery can have a
major impact on MPRAS fault tolerance requirements. If this proposed
requirement is in fact necessary, all architectures including AIPS will be
significantly impacted. If, however, in-flight fault masking is all that is required, the
proposed non-reconfiguring AIPS architecture can be used.
Susceptibility of multipath systems to the common mode (non-independent) failure
is an obstacle for the acceptance of such systems for applications such as launch
vehicle avionics. Additional work must be done to determine if this threat is
significant and to devise ways to reduce this threat or its effect.
Both Boeing and General Dynamics indicate that the SDIO BM/C3 working group
reliable system design framework will be used to develop their architectures. If this
framework is augmented by appropriate methods and carried out carefully, it should
be sufficient to reduce development and technical risks.
The AIPS application development concept is built on validated "building
blocks" and appropriate application guidelines such that configuring an architecture
from them will require that only the application be validated. Substantial effort has
been directed toward this goal and validation is relatively advanced. However, total
architecture validation has not been accomplished nor has the concept of separating
architecture and application validation been demonstrated. AIPS will have
development support tools and an extensive performance information base which
can be used to reduce development risks.
The General Dynamics TDPA concept introduces deterministic application timing
which should greatly reduce the application validation or revalidation requirements
and costs.
Common Modules and Adherence to Standards — Both General Dynamics
and Boeing Aerospace have set requirements to adhere to or meet a comprehensive
list of standards for materials, workmanship, processes, packaging, practices,
environment factors and maintainability. JIAWG standard line replaceable module
sizes and standard connectors are specified for packaging. Standard interfaces such
as PI Bus, TM Bus, MIL-STD-1773 and the High Speed Data Bus are used.
Common JIAWG modules or upgraded JIAWG modules are being used to the
extent possible. The common avionics processor CAP-32 and the MIL-STD-1750
instruction architecture are also specified. MIL-STD-1815 Ada programming
language is naturally dictated.
Based on the adherence to these standards and the maximum possible use of JIAWG
common modules, it is indicated that costs will be reduced. Acquisition costs are
reduced because high production common modules will cost less. Logistics costs will
be reduced because of the adherence to standards. JIAWG module acquisition costs
have been estimated at between $10 thousand and $15 thousand for typical modules.
The costs for upgraded or new modules have not been formally reported but
informal speculation sets these costs as high as $20 thousand to $40 thousand. No
life-cycle cost reports addressing the avionics were available for the MPRAS Part 1.
Boeing Aerospace specifications called for the use of JIAWG modules. General
Dynamics on the other hand indicated that the JIAWG modules would have to be
upgraded to meet launch vehicle specifications. The primary question is how much
of the cost advantage for common modules is lost due to upgraded repackaging.
That is, when is a common module no longer a common module? Both Boeing and
General Dynamics specify new modules such as sensor/actuator interfaces, fault
tolerance modules, self-checking processor module and high throughput processors.
These modules would not necessarily be used in programs other than ALS. There
are about as many unique module types as there are common module types for both
architectures and the total number of unique modules used in applications seems to
equal or dominate the total number of common modules used.
A significant difference between the Boeing Aerospace design and the General Dynamics
design is the use of liquid cooling. General Dynamics proposes to keep module
power dissipation to between 10 and 15 watts so that the cost of liquid cooling can
be avoided. Since current JIAWG modules such as a 4 MIP processor require 35-40
watts and since 10 MIP self-checking pair processor modules are proposed, the
feasibility of not having liquid cooling must be demonstrated.
The General Dynamics sensor/actuator conditioning and interface modules for
MPRAS are more completely specified, while the Boeing sensor/actuator
conditioning and interface module test features are more advanced.
Information as to standards and common module usage by AIPS was not available
for this review. If the cost savings associated with common modules and adherence
to standards hold for the Boeing and the General Dynamics architectures, it would
be difficult for AIPS to compete on recurring life cycle costs without using common
modules and adhering to supported standards.
APPENDIX A
FUNCTIONAL DECOMPOSITION OF
MPRAS REQUIREMENTS
Functional Decomposition of
Boeing MPRAS Requirements
[Figures 1 through 14: functional decomposition diagrams of the Boeing MPRAS
requirements. The graphics are not recoverable from the source text.]
Functional Decomposition of
General Dynamics MPRAS Requirements
[Functional decomposition diagrams of the General Dynamics MPRAS requirements.
The graphics are not recoverable from the source text.]
Functional Decomposition of
Martin Marietta MPRAS Requirements
[Functional decomposition diagrams of the Martin Marietta MPRAS requirements.
The graphics are not recoverable from the source text.]
APPENDIX B
OS PERFORMANCE MODELING FOR
DISTRIBUTED REAL-TIME SYSTEMS
Introduction
In designing and developing a highly reliable and fault-tolerant system, the use of
performance modeling at various stages of the process may provide useful
information to system developers. This task investigates issues concerning the role of
performance modeling in a design environment. In particular, it
examines modeling abstractions and modeling fidelity issues for incorporating
operating system (OS) characteristics into system performance models. Given the
application of interest, the work concentrates on distributed operating
systems rather than general purpose systems.
Specific goals and objectives were to identify useful performance abstractions for
operating systems that maintain modeling fidelity and to demonstrate their use in a
simple application model.
Performance Modeling
In keeping with good engineering practice, the design process is multiply and
recursively iterative. In this context, the design process proceeds from the abstract
to the specific. Performance modeling at each stage and substage can help guide the
decomposition process further.
Performance can mean different things at different levels of abstraction. Higher
levels of abstraction provide relatively crude measures, while lower levels can provide
more accurate measures, as shown by the modeling experiments described below.
The most important question for any modeling effort is clear: how can lower level
details be abstracted without losing too much fidelity in the model? In other words,
what can be abstracted and what cannot? To some extent, the answer is dependent
upon the importance of the issues under experimental study, i.e., the dependent
variables for the modeling effort, relative to the particular system.
Since performance is only one of the many design requirements a system must meet,
modeling trade-offs need to be made. One such trade-off is modeling fidelity versus
cost. High fidelity models may be expected to give more accurate results, but may
be more costly to develop. Ideally, system modeling should allow developers to
select a level of detail appropriate to their particular needs and constraints.
In the realm of performance modeling, the partitioning of the system is a very
important issue. Modeling the application, the OS system services, the OS task
controls, or the architecture and its interconnection network, requires different
perspectives on performance issues. The performance attributes at each level of
modeling abstraction vary according to the system element being modeled.
Suppose that a model of operating system abstractions for task control is to be
created. The operating system would decide when tasks get executed, blocked, or
suspended. Performance metrics, such as processor utilization and task response
times (which depend on OS scheduling policy), can be used. Some task control
mechanisms, such as fixed schedules and preemptive priority, are relatively simple
and can be modeled using performance simulation tools like ADAS. Task control
can also be very complex to model, as with the Ada tasking semantics, since so
much is hidden by the language and its run-time system.
If application software is being modeled, the most abstract view might use an
estimate of service time, examine data flow in and out, and consider memory
requirements. An intermediate level of abstraction might consider calling sequences
and pseudocode for specific pieces of the algorithm being modeled. The most
granular and least abstracted view might model detailed timing constraints and
instruction counts. Figure 15 illustrates this point.
Level of Abstraction    Performance Attributes    Representation
Most abstract           Service time estimate     Module M1
                        Data flow in
                        Data flow out
                        Memory requirements
Intermediate            Calling sequence          Modules M1.1, M1.2
                        Pseudocode
                        Language selection
Least abstract          Detailed timing           Modules M1.1.1, M1.1.2, M2.1.1, M2.1.2
                        Instruction counts        (Task Do-it is Begin ... End)
Figure 15. Application Software
Table 1. Operating Systems Abstractions for System Services
Level of Abstraction    Performance Attribute
Most abstract           % overhead for task
Intermediate level      # OS calls x time per call by a task
Lower level             Dynamic model of OS services (e.g., shared memory contention)
Least abstract          Actual run time environment
If operating system services are being modeled, the most abstract view might
estimate the percentage of overhead for a task service. An intermediate level might
evaluate performance in terms of the number of OS calls multiplied by the
amount of time per call by a task. A further degree of refinement in detail might
model shared memory contention. Table 1 illustrates this point.
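The two simplest abstractions in Table 1 can be sketched as follows. This is a minimal illustration, not part of the ADAS models; the function names and the example figures are assumptions chosen only to show how the two estimates are formed.

```python
def with_percentage_overhead(task_time_s: float, overhead_pct: float) -> float:
    """Most abstract view: inflate the task service time by a flat OS overhead percentage."""
    return task_time_s * (1.0 + overhead_pct / 100.0)


def with_per_call_overhead(task_time_s: float, os_calls: int, time_per_call_s: float) -> float:
    """Intermediate view: add (number of OS calls) x (time per call) to the task time."""
    return task_time_s + os_calls * time_per_call_s


if __name__ == "__main__":
    # Hypothetical task: 10 ms of application work.
    print(with_percentage_overhead(0.010, 20.0))    # flat 20 % overhead
    print(with_per_call_overhead(0.010, 4, 0.0005)) # 4 OS calls at 0.5 ms each
```

The percentage form needs only one parameter per task, while the per-call form requires a call count and a measured cost per call, which is the additional modeling cost paid for the additional fidelity.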
If interprocessor communication is being modeled, the most abstract view might
model communication in terms of a fixed size delay. An intermediate level might
take the model further and determine the delay based on channel bandwidth, size of
data message, and overhead related to a network packet. Even more detail could be
achieved by modeling the changes in delay dynamically. The least abstracted view
would have detailed timings based on all of the factors that enter the transmission
process. Figure 16 illustrates this point.
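The first two levels of communication abstraction can be written down directly. The sketch below is illustrative only; the parameter names, and the choice to charge a fixed overhead per packet, are assumptions rather than details taken from the report.

```python
import math


def fixed_delay(delay_s: float) -> float:
    """Most abstract view: every message costs the same fixed delay."""
    return delay_s


def bandwidth_delay(message_bytes: int, bandwidth_bps: float,
                    packet_payload_bytes: int, packet_overhead_bytes: int) -> float:
    """Intermediate view: delay from channel bandwidth, amount of data, and packet overhead."""
    packets = math.ceil(message_bytes / packet_payload_bytes)
    total_bits = (message_bytes + packets * packet_overhead_bytes) * 8
    return total_bits / bandwidth_bps


if __name__ == "__main__":
    # Hypothetical channel: 10 Mbit/s, 1024-byte payloads, 8 bytes of overhead per packet.
    print(bandwidth_delay(2048, 10_000_000, 1024, 8))
```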
In summary, useful performance abstractions exist for applications, operating
systems, and architecture modeling. In the case of operating system control and
interprocess communication (the subjects of the experiments described below), some
control and communication semantics can be modeled relatively easily. More
complex task control semantics, such as with Ada tasking, are more difficult to
model.
Level of Abstraction    Performance Attribute                 Representation
Most abstract           Fixed delay                           P1 -- Delay -- P2
Intermediate levels     Delay is function of channel          P1 -- Variable Delay -- P2
                        bandwidth, amount of data, and
                        packet overhead
                        Dynamic model with channel            PI-Bus semantics
                        semantics
Least abstract          Detailed communication timings        ISA/RTL simulation models in VHDL
Figure 16. Interprocessor Communication
Figure 17. Demonstration
Experiment
This experiment illustrates the relationship of model fidelity to levels of abstraction.
It shows that the addition of details to the model produces simulation results that
have increasing degrees of fidelity. The experiment was constructed so that
performance measures taken from models containing operating system and
communication abstractions could be compared to performance measures taken
from an actual prototypical implementation. The models were constructed and
simulated using ADAS. They consisted of one very abstract model and a second
more detailed model, which allowed for differing degrees of abstraction in functional
simulation with the same topographical model. Figure 17 illustrates the
experimental paradigm.
Description
The system application chosen for this experiment was a distributed client-server
model. A single client module was included, with six server modules to provide
generic services and a single Ethernet-like network for communication among
processes. Each functional component was conceived to be executing on separate
hardware processors. In simulation, the functional modules were utilizing their own
processing elements (PEs), while the network modules were viewed as competing for
the Ethernet transmission medium.
As mentioned above, the ADAS models included two basic levels of abstraction, a
high level of abstraction and a more detailed level, with this second level being
simulated in two modes of increasing detail. The models included operating system
abstractions and communication abstractions.
With the ADAS performance and functional modeling tool, node firing delays (or
the amount of simulated execution time a process uses) can be random numbers
that are computed by user-defined routines during functional simulation. Random
numbers are often required in simulations to model unpredictable events, such as
arrival rates of requests for service and random service times.
The use of stochastic attributes to represent random events in the modeling process
enhances fidelity. Random numbers are usually generated from distributions that
attempt to model real-world behavior, such as Poisson distributions for queuing
models, exponential and Weibull distributions for failure models, and uniform
distributions for service time modeling. A particular distribution and its parameters
are chosen for the application based on experimental observations and assumptions
about the processes involved in the experiment. Simulations should then be carried
out until performance measures stabilize. It is difficult to predict a priori how long a
simulation should run; usually the experimenter should decide based on modeling
experience and knowledge of the particular modeling tool.
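As an illustration of running a stochastic simulation until a measure stabilizes, the sketch below draws uniformly distributed service times and stops when the running mean changes by less than a relative tolerance between checkpoints. The stopping rule, the spread of the distribution, and all names are assumptions for illustration; the report leaves the stopping decision to the experimenter's judgment.

```python
import random


def uniform_service_time(mean_s: float, rng: random.Random, spread: float = 0.5) -> float:
    """Draw a service time uniformly from mean*(1-spread) to mean*(1+spread)."""
    return rng.uniform(mean_s * (1.0 - spread), mean_s * (1.0 + spread))


def run_until_stable(mean_s: float, rel_tol: float = 0.001, window: int = 1000,
                     max_samples: int = 1_000_000, seed: int = 1):
    """Sample until the running mean moves by less than rel_tol between checkpoints.

    Returns (estimated mean, number of samples drawn).
    """
    rng = random.Random(seed)
    total = 0.0
    previous = None
    for n in range(1, max_samples + 1):
        total += uniform_service_time(mean_s, rng)
        if n % window == 0:
            current = total / n
            if previous is not None and abs(current - previous) <= rel_tol * abs(previous):
                return current, n
            previous = current
    return total / max_samples, max_samples


if __name__ == "__main__":
    estimate, samples = run_until_stable(0.5)
    print(estimate, samples)
```

A fixed seed makes the run repeatable, which is useful when comparing models at different levels of abstraction against the same random inputs.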
This experiment examined average network throughput, client utilization of its
processor during simulation run, and server utilizations of their separate processors
during simulation run. In this context, throughput was considered to be the total
amount of data that passed out of the network nodes in a given second of simulation
time. Client and server utilizations were considered to be measures of the amount of
simulation time, as a percentage of the total simulation time, that the individual
hardware PEs were busy.
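The two kinds of measure defined above reduce to simple ratios. The following sketch assumes the simulation log supplies a byte count leaving the network nodes and a list of (start, end) busy intervals per processing element; that data layout is an assumption made for illustration, not the ADAS output format.

```python
def throughput_bytes_per_s(bytes_out: float, sim_time_s: float) -> float:
    """Average throughput: data leaving the network nodes per second of simulation time."""
    return bytes_out / sim_time_s


def utilization_pct(busy_intervals, sim_time_s: float) -> float:
    """Utilization: percentage of total simulation time that a PE was busy."""
    busy = sum(end - start for start, end in busy_intervals)
    return 100.0 * busy / sim_time_s


if __name__ == "__main__":
    print(throughput_bytes_per_s(6400, 2.0))
    print(utilization_pct([(0.0, 1.0), (2.0, 2.5)], 10.0))
```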
Independent Variables
This experiment used three independent variables. They were the client execution
time, the server execution time, and the size of the user message. The client
execution time was based on measured values from an actual prototype
implementation of the application. Although it turned out to be a very small
amount of time, 500 microseconds, a greater time would not have affected the
essential problem being modeled. It was statically set, and remained the same, for
all simulation runs.
The server execution times were chosen to be uniformly distributed about a mean
value. The mean value was statically set, prior to each simulation run, with the
uniform distribution being calculated dynamically for each server node firing. The
mean times varied from 500 microseconds to 1 second.
User messages were varied from 64 bytes to 32 kilobytes, with the specific size
statically set prior to each set of simulation runs (ten settings). Message size was
incorporated directly into the network simulation models as network node firing
delays in the first model, and as production and consumption of simulation tokens
in the second model.
Sets of simulation runs were carried out, with each of the ten user message sizes
being held constant for all of the twelve server mean values, resulting in 120 data
points.
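The experimental grid can be enumerated as a cross product. The ten message sizes below are the powers of two from 64 bytes to 32 kilobytes that appear in the tables; the twelve server mean values are not listed in the text, so the logarithmically spaced values used here are purely hypothetical placeholders spanning the stated range of 500 microseconds to 1 second.

```python
import itertools

# Ten user message sizes, 64 bytes to 32 kilobytes.
MESSAGE_SIZES_BYTES = [64 * 2 ** k for k in range(10)]

# Hypothetical: twelve log-spaced server mean times from 500 microseconds to 1 second.
LOW_S, HIGH_S, COUNT = 500e-6, 1.0, 12
SERVER_MEANS_S = [LOW_S * (HIGH_S / LOW_S) ** (i / (COUNT - 1)) for i in range(COUNT)]

# One simulation run per (message size, server mean) pair.
RUNS = list(itertools.product(MESSAGE_SIZES_BYTES, SERVER_MEANS_S))

if __name__ == "__main__":
    print(len(RUNS))
```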
Model One
Model One was the most abstract and coarsely-grained of the experimental models.
It assumed that the execution times for both the client and server models
encompassed the application functions and system networking functions, with
additional network functions modeled as part of the network nodes. Again, firing
delays for the node execution times were taken from an actual prototype application
implementation.
Table 2 shows the message cycle times as actually measured in a prototype
implementation. The times shown are the average times taken for a message to be
read, processed, and for a new message to be sent. Since process time for the
application was known to be very small, the largest portion of the times represents
message overhead for reading and writing.
Table 3 shows the firing delays used in Model 1 to simulate the passing of messages,
and application code execution times. The server delay was incremented by a
reading and writing cost constant for each message size. The network delay was
estimated to be a fraction of the message size (in bits).
This model assumed that transmission time across the network was a function of
network bandwidth and the length of a user message. The network node firing delay
was set to be a function of the size of the user message as a result of this
assumption. The total user message was assumed to be transmitted and received
without being broken down into network transmission packets. Coarse-grained
contention for network services was modeled in terms of the network nodes
competing for the network hardware resource. Figure 18 illustrates the ADAS
software graph of Model One.
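The Model One network delay can be sketched as message length divided by bandwidth. The 10 Mbit/s rate and the millisecond unit are assumptions; they are chosen because, read that way, the formula reproduces the Net Delay column of Table 3 (for example 0.0512 for a 64-byte message).

```python
ETHERNET_BPS = 10_000_000  # assumed 10 Mbit/s Ethernet


def model_one_net_delay_ms(message_bytes: int, bandwidth_bps: float = ETHERNET_BPS) -> float:
    """Network node firing delay, in milliseconds, for an unpacketized user message."""
    return message_bytes * 8 / bandwidth_bps * 1000.0


if __name__ == "__main__":
    for size in (64, 1024, 32768):
        print(size, model_one_net_delay_ms(size))
```

Because the whole message is charged as a single transfer, no per-packet overhead or interleaving appears at this level of abstraction.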
Message Size    Delay
(bytes)         (msec)
64              6.737
128             6.791
256             7.023
512             7.533
1024            8.870
2048            12.258
4096            17.619
8192            26.961
16384           53.356
32768           97.364
Table 2. Measured Delays for Message Cycle
Message Size    Client Delay    Server Delay             Net Delay
(bytes)         (µsec)          (µsec)                   (msec)
64              3677            3678 + random number     0.0512
128             3713            3713 + random number     0.1024
256             3868            3868 + random number     0.2048
512             4208            4208 + random number     0.4096
1024            5099            5099 + random number     0.8192
2048            6544            6544 + random number     1.6384
4096            9304            9304 + random number     3.2768
8192            13090           13090 + random number    6.5536
16384           32469           32469 + random number    13.1072
32768           46187           46187 + random number    26.2144
Table 3. Component Delays for Model 1
[Graph: client0, six server nodes, and network nodes net0 through net11.]
Figure 18. Model One
Model Two
Model Two represented a less abstract model with the introduction of two finer
levels of detail. In the two increasingly detailed views of the system, the client and
server modules were decomposed into application functions and I/O functions. This
more closely represents the way that the operating system and its network protocol
software actually function in relation to any specific application. The ADAS
graph topography was hierarchical in its decomposition of the model, and it was the
same for both versions of Model Two. Figures 19, 20, 21, and 22 illustrate the
ADAS software graph hierarchy for the Model Two topography.
Model Two — First Version
The first version of Model Two made a similar assumption about user message
transmission as Model One. In both cases the total user message was transmitted
and received as a unit. However, in this case, the transmission time was a function
of the number of Ethernet packets required for a particular message size (packet
sizes ranging from 64 bytes to 1024 bytes, even though Ethernet packets can be
up to 1518 bytes). Based on the prototype implementation, the amount of Ethernet
transmission time was calculated (exclusive of operating system network services)
for messages of different sizes. The firing delay for the network nodes was statically
calculated in terms of this transmission time, and the simulations were run using
this calculation.
Table 4 shows the component delays for Model 2 Version 1. The client and server
nodes have been decomposed to show the actual reading and writing of messages.
The times again were based on the results obtained from the prototype
implementation for each message size. The net delay was based on message size
relative to maximum network bandwidth.
This version represented a more detailed and accurate view of data transmission in
the system. Message transmission interleaving was again modeled in terms of
contention for the Ethernet hardware resource. The next version refined this model
further.
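A packet-based delay of the kind used in this version can be sketched as follows. The 10 Mbit/s rate, the 8 bytes of per-packet overhead, and the microsecond unit are assumptions; they are used here because they happen to reproduce the legible Net entries of Table 4 (58 for 64 bytes, 826 for 1024 bytes, 1652 for 2048 bytes), not because the report states them.

```python
import math

ETHERNET_BPS = 10_000_000   # assumed 10 Mbit/s Ethernet
MAX_PACKET_BYTES = 1024     # packet size limit used by the models
OVERHEAD_BYTES = 8          # assumed per-packet overhead


def packet_time_us(payload_bytes: int) -> float:
    """Transmission time in microseconds for one packet, including assumed overhead."""
    return (payload_bytes + OVERHEAD_BYTES) * 8 / ETHERNET_BPS * 1e6


def model_two_v1_net_delay_us(message_bytes: int) -> int:
    """Static net firing delay: one packet time, or a multiple of the full-packet time."""
    if message_bytes <= MAX_PACKET_BYTES:
        return round(packet_time_us(message_bytes))
    packets = math.ceil(message_bytes / MAX_PACKET_BYTES)
    return packets * round(packet_time_us(MAX_PACKET_BYTES))


if __name__ == "__main__":
    for size in (64, 128, 256, 512, 1024, 2048, 4096, 8192, 32768):
        print(size, model_two_v1_net_delay_us(size))
```

The message is still delivered as a unit; only the cost of the transfer reflects packetization.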
Model Two — Second Version
Using the same graph hierarchy as the first version of Model Two, the second
version altered the network model assumptions. In this case, the user messages were
decomposed into Ethernet packets for transmission, each packet no larger than
1024 bytes (arbitrarily chosen for modeling convenience). A single token was used to
model a single packet of user message data. (Note: Recall that in Model One and
the first version of Model Two the entire user message was sent and received as a
single token.) The total user message thus consisted of the number of token packets
required to send the message by the actual implementation. Consumption and
production of tokens from and to network nodes were scaled to the size of the user
message (which was statically set for the entire simulation run). The client-read
nodes and the server-read nodes consumed the number of tokens representing a full
user message. The client-write node and the server-write nodes produced the
number of tokens representing a full user message. The network nodes could only
consume and produce a single token. Contention for the network hardware under
these conditions simulated the interleaving of packets and the consequences of
message delays in transmission for the application software in an actual application.
Each network node execution time (firing delay) was based on a single packet's
transmission costs.
[Graph: client decomposed into CLread, client, and CLwrite nodes; server nodes
server0 through server5; network nodes net0 through net11.]
Figure 19. Model Two
[Graph: six input port nodes and six output port nodes, numbered 0 through 5.]
Figure 20. CLREAD.SWG
[Graph: six input port nodes and six output port nodes, numbered 0 through 5.]
Figure 21. CLIENT.SWG
[Graph not recoverable from the source text.]
Figure 22. Server 0
Message Size    Client Read    Client Write    Server Read    Server Write    Net
(bytes)         (µsec)         (µsec)          (µsec)         (µsec)          (µsec)
64              1672           1672            1672           1672            58
128             1690           1690            1690           1690            109
256             1767           1767            1767           1767            211
512             1937           1937            1937           1937            416
1024            2383           2383            2383           2383            826
2048            3105           3105            3105           3105            1652
4096            4485           4485            4485           4485            3304
8192            6378           6378            6378           6378            6608
16384           16068          16068           16068          16068           13216
32768           22927          22927           22927          22927           26432
Table 4. Component Delays for Model 2 Version 1
With the modification of the produces and consumes relative to message
transmission and the single token as single packet modeling assumption, this version
of Model Two further refines the application-OS-network interactions.
Table 5 shows the most granular model, Model 2 Version 2. The client and server
nodes were again decomposed into reading and writing components with firing
delay times taken from the prototype implementation. The net delays are also the
same as in Version 1, up to the 2048 bytes message size. In Version 2 tokens
represent packets of data. The model uses a maximum packet size of 1024 bytes.
Larger messages require more packets. The net node is modeled to only transfer a
packet at a time causing larger messages to be transmitted piece by piece,
interleaved with packets containing other messages. This approach is of higher
fidelity to the way that network transmission actually works.
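The token counts and the interleaving effect can be illustrated with a short sketch. The token rule (one token per 1024-byte packet, rounded up) follows the text; the round-robin service order is an assumption made for illustration, since the report does not describe how contention between network nodes is arbitrated.

```python
import math
from collections import deque

MAX_PACKET_BYTES = 1024  # maximum packet size used by the model


def tokens_for_message(message_bytes: int) -> int:
    """Number of single-packet tokens needed to carry one user message."""
    return max(1, math.ceil(message_bytes / MAX_PACKET_BYTES))


def interleave_packets(message_sizes: dict) -> list:
    """Order in which packets cross a shared medium under assumed round-robin service."""
    pending = deque((name, tokens_for_message(size)) for name, size in message_sizes.items())
    order = []
    while pending:
        name, remaining = pending.popleft()
        order.append(name)  # one packet of this message is transmitted
        if remaining > 1:
            pending.append((name, remaining - 1))
    return order


if __name__ == "__main__":
    print([tokens_for_message(s) for s in (64, 1024, 2048, 32768)])
    print(interleave_packets({"a": 2048, "b": 1024}))
```

With one token per packet, a large message occupies the medium for several firings, and packets from other messages can be served between them.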
Message Size  Client Read  Tokens    Client Write  Tokens    Server Read  Tokens    Server Write  Tokens    Net Delay
(bytes)       (µsec)       Consumed  (µsec)        Produced  (µsec)       Consumed  (µsec)        Produced  (µsec)
64            1672         1         1672          1         1672         1         1672          1         58
128           1690         1         1690          1         1690         1         1690          1         109
256           1767         1         1767          1         1767         1         1767          1         211
512           1937         1         1937          1         1937         1         1937          1         416
1024          2383         1         2383          1         2383         1         2383          1         826
2048          3105         2         3105          2         3105         2         3105          2         826
4096          4485         4         4485          4         4485         4         4485          4         826
8192          6378         8         6378          8         6378         8         6378          8         826
16384         16068        16        16068         16        16068        16        16068         16        826
32768         22927        32        22927         32        22927        32        22927         32        826
Table 5. Component Delays for Model 2 Version 2
Prototype Testbed
The prototype testbed consisted of a VAXstation II/GPX with Ethernet interface
and twenty-two rtVAX 1000 systems, also with Ethernet interfaces. As in the ADAS
models, the software client and servers each ran on a separate rtVAX 1000. The
testbed used the VAXELN real-time operating environment. Figure 23 illustrates
the hardware testbed.
The software design for the prototype application was similar to the ADAS graph
models in concept. Task PEs, or servers, received input messages and performed a
synthetic computational task with execution times set as in the ADAS models.
Total time for each task was accumulated with high precision by the server process,
and finally an output message was transmitted.
[Diagram: a VAXstation II/GPX host with color display and DEQNA Ethernet
interfaces, connected over Ethernet to rtVAX #1 through rtVAX #22, each labeled
1 Mb with a DEQNA interface.]
Figure 23. Hardware Configuration
A LoopDriver PE, or client, caused Task PEs (servers) to repeat their operations for
many iterations. The start-to-finish time for Task PEs was measured by recording
a start time before the first message was transmitted to each PE and an end time
after receipt of the last return message. The difference between the
start-to-finish time for each task and the total synthetic compute time was
considered to be "everything else," including I/O driver time, actual Ethernet
transmit time, OS kernel time, and idle times.
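The "everything else" figure is a simple subtraction of the accumulated compute time from the measured elapsed time. A minimal Python sketch follows; the function name, argument names, and sample values are hypothetical and are not measurements from the testbed.

```python
def overhead_time(start_time: float, end_time: float, compute_time: float) -> float:
    """Return elapsed time not spent in the synthetic compute task.

    All arguments share one time unit (for example, seconds). The result
    lumps together I/O driver, Ethernet transmit, OS kernel, and idle time.
    """
    elapsed = end_time - start_time
    if compute_time > elapsed:
        raise ValueError("compute time cannot exceed start-to-finish time")
    return elapsed - compute_time


# Hypothetical run: 10.0 s start to finish, 7.5 s of synthetic computation
print(overhead_time(0.0, 10.0, 7.5))
```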
A LANalyzer was used to monitor the Ethernet network, measuring the total
number of packets transmitted, the number of bytes sent, and the average
utilization of the network. Capture and time-stamping of the packet data allowed
packet transmit times to be measured with high precision. As in the ADAS models,
message size was fixed for each simulation run. Figure 24 illustrates the software
paradigm for the testbed application. In this experiment, only one LoopDriver and
six Tasks were used, but expansion is possible with this testbed.
[Diagram; surviving labels: Host, LoopDriver, PE #2 (Optional), PE #3 (Optional),
Data, and Control and Measurement (dashed lines).]
Figure 24. Software Design
Experiment Results
This experiment was intended to demonstrate that increasing refinement and detail
in a model yields higher fidelity, as shown by the performance measures.
The results support this assertion. While ten user message sizes from
64 bytes to 32 kilobytes were used in the experiment, three representative samples
of the results are presented below, representing a small user message size (64 bytes),
a medium user message size (2 kilobytes), and a large user message size (16 kilobytes).
The dependent variables studied by this experiment were average throughput, server
utilization, relative error of server utilization, network utilization, and the relative
error of the network utilization.
Throughput was taken to be the total number of accesses to the network nodes
during a simulation run, scaled to kilobytes per second.
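As a rough illustration of the scaling step, the sketch below converts an access count into kilobytes per second. It assumes that every network access carries a fixed number of bytes and that a kilobyte is 1024 bytes; neither assumption is stated in the report, and the function and parameter names are hypothetical.

```python
def throughput_kb_per_sec(accesses: int, bytes_per_access: int, run_seconds: float) -> float:
    """Scale a count of network accesses to kilobytes per second.

    Assumes a fixed payload per access and 1 kilobyte = 1024 bytes.
    """
    if run_seconds <= 0:
        raise ValueError("run time must be positive")
    total_kilobytes = accesses * bytes_per_access / 1024
    return total_kilobytes / run_seconds


# Hypothetical run: 1000 accesses of 1024 bytes each over 2 seconds
print(throughput_kb_per_sec(1000, 1024, 2.0))
```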
Server and network utilizations were calculated to be the percentage of simulation
time that the hardware resources for each were active. Relative error for each was
calculated to be the difference between the ADAS utilization values and the
prototype testbed values.
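The two calculations described above can be sketched in Python as follows. The function names and the sample numbers are hypothetical; the error is computed as a plain difference in percentage points, as the text describes, with no further normalization assumed.

```python
def utilization_percent(busy_time: float, total_time: float) -> float:
    """Percentage of the run during which a resource was active."""
    if total_time <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * busy_time / total_time


def utilization_error(model_percent: float, prototype_percent: float) -> float:
    """Difference between the modeled and measured utilization values.

    A positive result means the model overestimates the utilization.
    """
    return model_percent - prototype_percent


# Hypothetical values: resource busy for 2.5 s of a 10 s run in the testbed,
# while the model predicts 30 percent utilization
measured = utilization_percent(2.5, 10.0)
print(measured, utilization_error(30.0, measured))
```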
Since sets of simulation runs were made for each user message size, each of the
dependent variables was plotted against the independent variable, mean server time.
In the plot figures that follow, curve A represents Model One, curve B represents the
first version of Model Two, curve C represents the second version of Model Two,
and curve D represents the actual implementation, where
• Model One was the simplest, lumping the application and operating system
together
• the first version of Model Two refined Model One by separating the I/O
functions from the application
• the second version of Model Two further refined the modeling to consider the
effects of Ethernet packetizing of application data
Throughput Results
Figures 25 through 30 show the throughput results for the three data message sizes.
In all cases the basic shapes of the curves show that as modeling fidelity increases,
more accurate simulation results are obtained. Curve C is closer to the actual
implementation than the other models. The curves differ markedly over the range of
mean server times where network transmission time dominates server time. As the
server time increases, it begins to dominate the transmission time and
the curves all converge. In the case of the 64-byte message size, the model
overestimates throughput while mean server times are small and underestimates it
as the server times grow. Relative error is fairly high at the smaller server
times, which suggests that the model needs further refinement to improve its
fidelity. The most likely area for this refinement is the network services
related to developing data packets for transmission.
This is likely to be true for all of the results that follow.
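The convergence of the curves at large server times can be illustrated with a deliberately simplified sketch. It assumes that throughput is inversely proportional to the sum of server time and a fixed per-message overhead, and that two models differ only in that overhead. This serial timing assumption and the overhead values are hypothetical; they are not the ADAS models or data from the experiment.

```python
def relative_gap(server_time_usec: float, overhead_a_usec: float, overhead_b_usec: float) -> float:
    """Relative throughput difference between two models at one server time.

    Assumes throughput is proportional to 1 / (server time + overhead).
    """
    throughput_a = 1.0 / (server_time_usec + overhead_a_usec)
    throughput_b = 1.0 / (server_time_usec + overhead_b_usec)
    return abs(throughput_a - throughput_b) / throughput_b


# Hypothetical overheads of 1000 and 2000 microseconds per message
for server_time in (100, 1_000, 10_000, 100_000):
    print(server_time, round(relative_gap(server_time, 1_000, 2_000), 3))
```

Under these assumptions the gap between the two models shrinks steadily as the server time grows, because the overhead term becomes a small fraction of the total time per message.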
[Figures 25 through 30 are plots against mean server time; only the panel titles
are recoverable.]
Figures 25 through 28. Throughput and throughput error for the 512-bit and
16384-bit messages
Figure 29. Throughput: 131072-bit message
Figure 30. Throughput error: 131072-bit message
Server Utilization Results
Figures 31 through 36 show the server utilization results for the three data message
sizes. As with the throughput results, the two versions of Model Two more closely
approximate the results from the actual implementation in terms of the shape of the
curves. However, when the relative error curves are examined, even the more refined
models show a high degree of relative error. Again, the increasing detail of the
models shows increasing fidelity, but additional iterations to refine the model further
might be desirable since the models consistently overestimate the server utilization.
Network Utilization Results
Figures 37 through 42 show the network utilization results for the three data
message sizes. These results are consistent with the previous sets. For small mean
server times, the models overestimate the network utilization. They then
underestimate it for larger server times. These results also show that the most
abstract model is the least accurate. Relative error remains higher than might be
desirable, suggesting that further refinement be carried out on the model.
Experiment Conclusions
This experiment indicates that the hierarchical refinement approach is a useful way
to carry out a modeling effort. It suggests that any modeling effort should strive to
achieve the simplest model that will give acceptable results. Clearly there will be
trade-offs in cost and effort for a level considered acceptable by the modelers in any
specific case.
It is also clear that any model must be validated for it to be useful. Validation
implies that the model must be compared against reality to establish a range of
valid results. This helps to identify problems in the model or in the measurements
that were used with the model. Accurate timing information is essential to a good
simulation model and this often requires specialized equipment. A model is only as
good as the information it uses to generate results.
Modeling efforts are most useful when the simulation results can be reused. In other
words, the effort should seek to develop a reusable model that can become a library
element after validation. This lowers the cost of any modeling work over time.
Experience and judgment are important in the modeling and analysis process. Any
modeling tool is only as good as the information it has available to it, and that
information is a product of the professionals involved.
[Figures 31 through 42 are plots against mean server time; only the panel titles
are recoverable.]
Figures 31 through 34. Server utilization and server utilization error for the
512-bit and 16384-bit messages
Figure 35. Server utilization: 131072-bit message
Figure 36. Server utilization error: 131072-bit message
Figures 37 through 40. Network utilization and network utilization error for the
512-bit and 16384-bit messages
Figure 41. Network utilization: 131072-bit message
Figure 42. Network utilization error: 131072-bit message
REPORT DOCUMENTATION PAGE
Form Approved, OMB No. 0704-0188
Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources,
gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this
collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson
Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.
1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: June 1993
3. REPORT TYPE AND DATES COVERED: Contractor Report
4. TITLE AND SUBTITLE: Advanced Launch System Multi-Path Redundant Avionics Architecture Analysis and Characterization
5. FUNDING NUMBERS: C NAS1-17964; WU 506-46-21-56
6. AUTHOR(S): Robert L. Baker
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Research Triangle Institute, P.O. Box 12194, Research Triangle Park, NC 27709-2194
8. PERFORMING ORGANIZATION REPORT NUMBER
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): National Aeronautics and Space Administration, Langley Research Center, Hampton, VA 23681-0001
10. SPONSORING/MONITORING AGENCY REPORT NUMBER: NASA CR-4516
11. SUPPLEMENTARY NOTES: Technical Monitor: Felix L. Pitts. Final Report - Task 28
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Unclassified - Unlimited. Subject Category 62
12b. DISTRIBUTION CODE
13. ABSTRACT (Maximum 200 words)
The objective of the Multi-Path Redundant Avionics Suite (MPRAS) program is the development of a set of
avionics architectural modules which will be applicable to the family of launch vehicles required to support
the Advanced Launch System (ALS). To enable ALS cost/performance requirements to be met, the MPRAS
must support autonomy, maintenance, and testability capabilities which exceed those present in conventional
launch vehicles. The multi-path redundant or fault tolerance characteristics of the MPRAS are necessary
to offset a reduction in avionics reliability due to the increased complexity needed to support these new cost
reduction and performance capabilities and to meet avionics reliability requirements which will provide
cost-effective reductions in overall ALS recurring costs.
A complex, real-time distributed computing system is needed to meet the ALS avionics system requirements.
General Dynamics, Boeing Aerospace, and C.S. Draper Laboratory have proposed system architectures as
candidates for the ALS MPRAS. The purpose of this document is to report the results of independent
performance and reliability characterization and assessment analyses of each proposed candidate architecture
and qualitative assessments of testability, maintainability, and fault tolerance mechanism. These independent
analyses were conducted as part of the MPRAS Part 2 program and were carried out under NASA Langley
Research Contract NAS1-17964, Task Assignment 28.
14. SUBJECT TERMS: ALS MPRAS Architectures, Digital Computers, Distributed Processors, System Assessment,
Multi-Path Redundant Avionics Suite (MPRAS), Fault tolerance, Reliability, Redundancy
15. NUMBER OF PAGES: 200
16. PRICE CODE: A09
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT
20. LIMITATION OF ABSTRACT
NSN 7540-01-280-5500
Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std Z39-18
298-102
