Assurance of COTS Boards for Space Flight by Plante, Jeannette et al.
Assurance of COTS Boards for Space Flight - Part I
Jeannette Plante, Norm Helmold, Clay Eveland
Swales Aerospace/NASA GSFC Advanced Interconnect Program
5050 Powder Mill Rd
Beltsville, MD 20705
301-595-5500 ph / 301-902-4114 fx
Abstract
Space Flight hardware and software designers are
increasingly turning to Commercial-Off-the-Shelf
(COTS) products in hopes of meeting the demands
imposed on them by projects with short development
cycle times. The Technology Validation Assurance
(TVA) team at NASA GSFC has embarked on applying a
method for inserting COTS hardware into the Spartan 251
spacecraft. This method includes Procurement,
Characterization, Ruggedization/Remediation and
Verification Testing process steps which are intended to
increase the user's confidence in the hardware's ability to
function in the intended application for the required
duration. As this method is refined with use, it has the
potential for becoming a benchmark for industry-wide use
of COTS in high reliability systems.
Introduction and Background
The Spartan 251 is a vehicle which provides experiment
platforms for instruments with relatively short (days to
weeks) mission lifetimes. The Spartan project is expected
to provide experiment platforms which have a fast
development cycle time (less than 3 years) and a low cost.
To meet this challenge the project looked to COTS
computer systems primarily as a solution to the schedule
challenge.
Schedule improvements were hoped to be realized
because so many of the development activities would not
be needed: circuit design, breadboarding, troubleshooting,
parts procurement and board assembly. The software
required was standard. It would also require development
activities and specialized skills to operate it.
COTS has as many definitions as there are people to
define it. The COTS items being implemented by Spartan 251
are fabricated boards. Some of the boards being used are
custom designed and fabricated by an outside vendor and
they were not considered COTS during this assessment.
The entire system is called the central processing unit
electronics (CUE box), and provides command and data
handling for the spacecraft.
The lead CUE box designer identified a CompactPCI bus
based system because it provided several system and
software advantages. Two arrangements were considered
(Figure 1 and 2) based on the use of 6U (6.3" x 9.18") or
3U (3.94" x 6.30") sized boards.
No boards with prior flight history were identified at the
time, so a variety of products were purchased.
Based on how they performed individually, together and
after environmental testing, the suite of boards would be
selected. Table 1 lists all of the COTS boards that were
purchased.
COTS Assurance Background and Approach
NASA and the aerospace industry have learned a lot
about failure modes in electronic parts and boards that
result from the stresses of use in flight environments.
Military, NASA and Industry specifications and standards
exist which try to prevent products with flaws, which
cannot independently survive high and low temperature
extremes, thermal cycling, shock and vibration, vacuum
and other environmental conditions, from being applied in
flight hardware. Testing of high reliability parts exercises
them using maximum electrical ratings. Qualification
testing is done by sampling a production process and is
considered applicable to subsequent lots from that process
for some length of time (one or two years). The goal is to
increase system reliability by reducing the number and/or
size of failure causing flaws and to characterize the
quality of parts produced by a given set of processes.
The same philosophy can be applied to evaluating the
reliability of COTS boards; however, since COTS products
are not produced with reliability as a priority, the methods
are not the same. The processes producing the hardware
are not known to be controlled or are changed frequently
over time (one or two years), without notice to the users.
It must be assumed that COTS boards will never be able
to withstand flight environments "independently". They
are designed and built to operate on the ground and are
often augmented with heat dissipating mechanisms such
as fans and heat sinks to enable them to survive in a 25°C
environment. Once the fan is removed, it is likely that the
temperature rating given by the manufacturer will be
exceeded because of component self-heating.
Figure 1. CUE Configuration A, 6U Boards
Figure 2. CUE Configuration B, 3U Boards
Certainly the individual parts cannot be tested to
their electrical limits because they are in a circuit
configuration which physically precludes this.
While we test high reliability parts using known
maximum electrical, thermal and mechanical
conditions, we must test COTS boards in a wide
variety of operational modes to expose them to
the variety of electrical stresses that they may see
in application. Given that the thermal
environment will have to be controlled, whether
through the use of fans, heat pipes or other
methods, and that deriving a test plan that
addresses all possible software driven
operational modes is non-trivial, the assurance
engineer will realize that each COTS board in a
given application should be considered unique.
Table 1. COTS Boards Purchased
Board Type       Part Number           Manufacturer              Function
6U Processors    CR6400053NBC          OR Industrial Computers   CompactPCI Pentium Board
6U Processors    CTM6                  OR Industrial Computers   I/O Transition Module
6U Processors    ZT5510C4R3P3SIM3      Ziatech                   CPCI Processor Board
3U Processors    CPCI-3603 (103095)    Force                     POWERCORE/CPCI Processor board
3U Processors    CPCI-3740 (104912)    Force                     POWERCORE/CPCI Processor board (233MHz)
Memory           104949                Force                     POWERCORE/CPCI memory board
IP Carriers      CPCI-200-FP           SBS Greenspring           6U IP Carrier Board
IP Carriers      CPCI-IPC              Alphi Technology          3U IP Carrier board
IP Carriers      CPCI-SIP              Alphi Technology          3U IP Carrier board
Analog Input     IP-320                Acromag                   Analog Input
Analog Output    IP-220-16             Alphi Technology          16 channel Analog Output
Digital I/O      IP-UNIDIG-E-48        SBS Greenspring           DIGITAL I/O
Digital I/O      IP-UNIDIG-HV-8I16O    SBS Greenspring           DIGITAL I/O
Digital I/O      IP-UNIDIG-D           SBS Greenspring           DIGITAL I/O
Digital I/O      IP-OPTO DRIVER        SBS Greenspring           DIGITAL I/O Optical Driver
Digital I/O      IP-445                Acromag                   32 Channel, bus isolated Digital output
Digital I/O      IP-440-1              Acromag                   32 Channel, bus isolated Digital I/O, +/- 4 to +/- 18 Vdc
Serial I/O       SCC-04B               Alphi Actis Technolg.     Quad RS422 serial interface
Serial I/O       IP-Serial             SBS Greenspring           Serial I/O
Serial I/O       MP-Serial             SBS Greenspring           Synch/Asynch Data Com
Thermistor I/F   IP-Thermistor         SBS Greenspring           Thermistor card
Since we cannot control the quality of COTS
products, we need to understand their limitations
and control how they are applied. Sealed
enclosures can provide protection from vacuum
effects and may provide a structure in which
mechanisms such as fans or fluid pumps can be
used for thermal management. Mechanical
damping should be considered at the board and
box level to reduce vibration and shock effects.
Shielding can be used to limit the total deposited
charge from ionizing radiation. The goal is to
limit the environment's contribution to flaw
growth which can result in board or part failure.
The operational modes in which the system is
used are as important a consideration as is the
physical environment. The operating system and
the commands used will drive the electronics in
ways which may be stressful to the electronics
when used in specific combinations, repetitively
and over time. Consideration must be given to
the intended use of the hardware and how
particularly stressful operations can be
minimized or avoided.
At the beginning of the Spartan 251 COTS
insertion project a four step process was
proposed to gather information about the
capability of the boards intended for use to
survive the intended environment and how
ruggedization or failure mitigation approaches
could be validated. This flow is shown in
Figure 3. Lessons were and are being learned
along the way that have expanded the original
definitions assigned to each step. The
elaboration described here is intended to describe
details about the first two process steps and some
of the lessons learned during their implementation.
It is hoped that lessons learned while
implementing the last two,
Ruggedization/Remediation and Validation, will
be similarly documented and fed back to
improve the process.
Though the flow chart shows a series
relationship between Characterization and
Ruggedization/Remediation we found that it was
very valuable to start considering ruggedization
options well ahead of time, even before the
characterization data was collected. It helped to
formulate questions about what the most serious
concerns were and allowed time for planning.
For example, we knew that it was very possible
that the processor would need an augmented
thermal path because it was intended to be used
with a fan and heat sink. This led the
mechanical team to start considering mechanical
impacts that heat pipes and other thermal
management augmentations would have on the
closely spaced boards and on the CUE box
enclosure.
Procurement
As noted above, the boards were selected by the
project design engineers. The TVA team
became involved prior to this selection process.
We first spent time brainstorming ideas about
what type of information may exist that would
give us insight into the board and individual part
failure rate. We are accustomed to being able to
witness the production process, facilities and
management of companies who make high
reliability parts. Due to high reliability
qualification and screening requirements, these
manufacturers often keep data for multiple test
lots over a long period of time (3 to 5 years) and
may maintain well documented traceability
between incoming materials lots and finished
product lots. COTS manufacturers are not
driven by their main markets to keep these kinds
of records or to provide the manpower required
to support a NASA vendor survey. They are cost
driven and will not be able to recoup costs
required to change their processes to produce
higher reliability parts for flight use. We decided
to ask for as much data and as little variability as
possible, knowing that it would be a learning
experience as we went along about how much
the manufacturers actually document about their
product.
It is impossible to know with sufficient certainty
whether or not a COTS system will be able to
withstand the rigors of space flight use before it
is characterized through testing. To realize the
schedule benefits which make the use of COTS
so desirable, a variety of systems or boards
should be procured to increase the chances that
the project will have a full complement of boards
to use when the testing is completed. This was
done especially with respect to key boards such
as the processor and the IP Carriers.
Single assembly lots and single lot/date codes for
components used to build up the boards were
requested. Single lot/date code at the part level
was not available at this time; however, the
industry is starting to respond to this need and
individual vendors are specializing in "custom
built", ruggedized COTS boards where lot
control can be implemented. A single assembly
lot was available for the boards; however, the
vendor(s) that were amenable to this indicated an
increased delivery schedule to meet this
requirement as well as increased cost.
Sequential serial numbers were available.
The user should be aware of the vendor return
policy prior to procurement. In most cases where
commercial product is used, flight, spare, and
ETU samples are purchased at the same time to
avoid delivery driven schedule delays. This
increases cost risks. Depending on the vendor's
policies, unopened, unused hardware may be
returnable when characterization testing shows
critical or unavoidable failure modes or other
reasons for not using the product. Most of the
vendors used on this project were flexible in
their return policy since the boards could be
resold on the commercial market if they were
unopened. Some of those that offered a return
policy gave only account credits which did not
allow the project to recoup funds for returned
items.
A thorough review of product specifications,
when available, should be performed during the
design cycle. Documentation beyond what is
available through the marketing literature is
often not available. Critical performance
parameters, not defined by the manufacturer,
may need to be requested. Careful consideration
must be given to product obsolescence of the
higher level assemblies, as well as critical
component(s) (i.e. microprocessor and peripheral
devices). Significant changes may be made to
the product without your knowledge to keep it
current with the changing platform environment.
Figure 3. COTS Insertion Process Flow
Procurement
• Mfr Redundancy
• Single Lot/Lot Traceability
• Return Policy
• Return/Failure Experience
• Drawing Product Specifications
• Parts and Materials List
• Product Change Notifications
• Product Characterization Data
• Visual Inspection
Characterization
• Radiation
• Mechanical
• Thermal
• Contamination
• Electrical/Software
Ruggedization/Remediation
• Radiation
• Mechanical
• Thermal
• Contamination
• Electrical/Software
Acceptable for Use
An "as built" parts and materials list should be
requested from the vendor prior to procurement.
The purpose of this is two-fold. First, potential
contaminants can be identified such as
outgassing plastics used for connector back
shells, epoxies, and cleaning solvents. Second,
this allows a parts list review which highlights
potential reliability hazards or concerns. Parts
and materials lists were requested from the
vendors selected for this project; however, they
were neither provided nor offered for sale.
When a problem was found with a specific part
during testing, the manufacturer was able to
provide part specific information.
The user should request to be made aware of
product changes that would affect their design.
Engineering changes may or may not be
adequately reflected by the documentation. It
may be the case as well, that documentation such
as circuit or layout drawings must be requested
or purchased separately. Change notifications
(controlled) were not available to the project
because it was not considered a volume
customer. A case of out-of-date documentation
was found for two board types. One set of
documentation did not match the revision letter
printed on the board and the other board
contained an un-insulated jumper wire that was
not part of the documented configuration.
The user should obtain all characterization data
available from the manufacturer. Depending on
a vendor's participation in "high reliability"
markets, they may or may not maintain quality
or reliability data. Some manufacturers perform
acceptance testing and some that ruggedize their
products for harsh environments have
qualification data to prove the effectiveness of
their approaches. Many manufacturers of COTS
electronics do neither and do not have
performance data available. No vendors
provided existing characterization data for the
boards bought for Spartan 251.
As COTS find increasing use on flight hardware,
historical data will begin to accumulate. At this
time this is already occurring and it is prudent to
contact NASA and military organizations that
may have used the products you are considering,
to learn about their experiences. It is important
to remember though that most COTS products
are not controlled for quality or designed for
reliability and cannot be qualified by similarity.
Finally, the boards should be visually inspected
upon receipt. One should verify that the product
ordered was the one shipped and that deviations
from expected configurations are noted. Look
for obvious flaws such as poor soldering or stress
on leads from unsupported parts. Solvent
residue and corrosion should be noted. Use this
opportunity to try to identify as many of the parts
on the board as possible. Some will be easy to
identify and may even have a lot date code
identified while others will not be marked at all.
Use this information to create a part map for
future reference (An example is shown in Figure
4).
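A part map can be as simple as a list of records, one per part location. The following is a minimal sketch of such a record; the field names, reference designators and markings are hypothetical placeholders and are not taken from the Spartan 251 boards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartEntry:
    """One part location recorded during receiving inspection."""
    ref_des: str                          # location label on the board (hypothetical)
    marking: Optional[str] = None         # part marking as read; None if unmarked
    lot_date_code: Optional[str] = None   # lot date code, when one is printed
    notes: str = ""                       # e.g. poor soldering, residue, lead stress


def unidentified(part_map):
    """Return the locations whose parts could not be identified."""
    return [p.ref_des for p in part_map if p.marking is None]


# Hypothetical entries for a single board.
part_map = [
    PartEntry("U1", marking="CPU-EXAMPLE", lot_date_code="9812"),
    PartEntry("U7", notes="no marking visible"),
    PartEntry("C14", marking="CAP-EXAMPLE", notes="lead stress, unsupported part"),
]

print(unidentified(part_map))
```

Keeping unmarked locations in the map, rather than omitting them, preserves a list of parts that may need follow-up with the manufacturer if a problem appears during testing.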
At this stage it is important to be sure that all
storage and handling areas where these boards
will be kept are electrostatic discharge
(ESD) controlled. Packing materials must be
controlled as well. It might be prudent as well at
this point to set up a controlled
system for stocking the COTS hardware. If the
testing goes well and some "gems" are identified,
other projects will be anxious to absorb the
spares.
Characterization
The boards must be characterized in order to
understand the ability of the "as received" units
to withstand the intended application and
environment and to understand the engineering
challenges associated with providing risk
mitigation paths for the hardware.
Characterization involves understanding their
performance with respect to ionizing radiation,
mechanical shock, vibration and temperature.
Assessments must also be made with respect to
temperature ratings of the boards, increases in
ambient temperatures due to self heating, sources
of contamination, concerns with respect to use in
vacuum, and electrical performance such as
power consumption and timing considerations.
Since COTS hardware is normally designed for
terrestrial applications which maintain
temperatures around 25°C ± 15°C (although
"ruggedized" boards and individual parts may
have much wider temperature ratings), its
ability to withstand the vacuum, thermal,
mechanical and radiation environment is
unknown until the board is tested.
Characterization testing then must be considered
to be destructive and extra boards must be
procured for it. The project referred to these
destruct units as Martyr boards.
Figure 4. Part Map
Radiation Characterization
The first step is to perform a susceptibility
assessment of the radiation environment. Based
on the mission launch date, duration and orbit,
the radiation environment can be defined. This
definition should provide the severity of the
proton, gamma ray and heavy ion environment
that will allow proper determination of the
fluence to use during Single Event Effect (SEE)
testing. SEE testing will show if charged
particle impacts will cause the electronics to
fail temporarily (upset) or permanently (latchup,
lockup, etc.). The environment definition should
also describe the total accumulated (electron)
dose expected so that the board's ability to
withstand deposited charge can be evaluated
and/or shielding can be designed. Total Ionizing
Dose is not an issue for short duration missions
in low earth orbits. The radiation assessment
done for the Spartan 251 mission showed that the
primary radiation environment risk was due to
protons.
Following environment definition, application
and mission-specific radiation test procedures
must be prepared. The test procedure must be
based on the conditions of the application and
should simulate in-flight usage of the COTS
hardware. COTS hardware cannot be "qualified"
as a technology like electronic parts are. They
must be validated on a lot-by-lot basis so testing
must represent the application that the parts will
be used in, to the greatest extent possible.
The proton testing consists of high fluence and
low fluence testing. High fluence proton testing
provides data on component susceptibility, rate
prediction data and total accumulated dose.
High fluence testing is destructive. For the
Spartan project, 63 MeV incident protons were
used with a fluence of 3.8E10 protons/cm².
Low fluence testing is performed on the flight
units and is not destructive; however, the board is
considered to have accumulated total dose. Low
fluence testing provides data which gives a
reasonable confidence that the flight boards have
a radiation susceptibility that is similar to that
shown during the high fluence testing.
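Because proton exposure also deposits total dose, it can be useful to estimate the dose that accompanies a given fluence. The sketch below uses the standard conversion dose = fluence x mass stopping power x 1.602E-8 rad per (MeV/g). The fluence is the value quoted above; the stopping power is an illustrative placeholder, not a looked-up value, and should be replaced with a tabulated figure for the proton energy and target material of interest. Energy loss in packaging and shielding is ignored.

```python
# 1 MeV/g = 1.602e-13 J / 1e-3 kg = 1.602e-10 Gy = 1.602e-8 rad
MEV_PER_GRAM_TO_RAD = 1.602e-8


def proton_dose_rad(fluence_per_cm2, stopping_power_mev_cm2_per_g):
    """Estimate ionizing dose (rad) deposited by a monoenergetic proton fluence."""
    return fluence_per_cm2 * stopping_power_mev_cm2_per_g * MEV_PER_GRAM_TO_RAD


# Fluence from the high fluence test described in the text.
fluence = 3.8e10  # protons/cm^2

# Placeholder stopping power (assumption); consult a stopping power table.
assumed_stopping_power = 9.0  # MeV*cm^2/g

dose = proton_dose_rad(fluence, assumed_stopping_power)
print(f"Estimated dose: {dose / 1000.0:.1f} krad")
```

The same function can be applied to the low fluence exposure of the flight units to track how much dose they have accumulated before flight.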
Mechanical Characterization
Mechanical characterization of the Spartan CUE
Box and its COTS electronics boards was done
using both structural analysis and mechanical
vibration testing. Resonance frequencies of
vibration, random vibration, steady state and
transient loading and thermal loading (simulating
Space Shuttle launch and landing conditions) are
applied in the testing of these items. Excessive
deformation, excessive mechanical cycling of the
boards and failure of the solder joints are the
critical reliability concerns. This is especially
the case for the Spartan 251 project because most
of the components on the COTS boards are
surface mounted. The boards need to survive
one shuttle flight.
Qualification and acceptance vibration levels
were defined for the CUE Box. The
qualification level is used on the Martyr boards
and the acceptance level is used on the flight
boards. The boards will be held in the CUE box
along their side edges, along the bottom edges
and at the top two corners of each board. Wedge-lock
type card retainers normally used in space
flight boxes cannot be used because only 0.10
inches of edge space are available for securing
the boards in the box. A less robust,
commercially available, edge restrainer was
baselined for use.
The PC boards should have resonance
frequencies of vibration that are sufficiently
different than those of the CUE box in which
they are secured so that the resonance
frequencies of the boards and box do not couple.
If they do couple, excessive mechanical
excitation could occur and cause board failure.
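The separation requirement can be screened with a simple ratio check. No numerical criterion is given here for "sufficiently different"; the factor of two used below (often called the octave rule) is a commonly cited design guideline and is an assumption in this sketch, as are the example frequencies.

```python
def frequencies_separated(board_hz, box_hz, min_ratio=2.0):
    """Return True if the two resonances differ by at least min_ratio.

    min_ratio=2.0 is an assumed guideline, not a project requirement.
    """
    if board_hz <= 0 or box_hz <= 0:
        raise ValueError("frequencies must be positive")
    higher = max(board_hz, box_hz)
    lower = min(board_hz, box_hz)
    return higher / lower >= min_ratio


# Hypothetical first-mode frequencies from modal test or FEM results.
print(frequencies_separated(board_hz=300.0, box_hz=120.0))
print(frequencies_separated(board_hz=150.0, box_hz=120.0))
```

A check of this kind is only a first screen; the coupled response still has to be confirmed by the structural analysis and vibration testing described in this section.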
Structural Finite Element Models (FEMs) of the
boards are developed and used in all of the
structural analyses involving the boards and the
CUE box. Modal testing of each COTS board,
using free-free edge conditions is used to provide
data that can be used to adjust the FEMs so that
they accurately represent the board under test.
Three-axis random vibration testing is used to
characterize the integrity of each COTS board
and each module attached to the Carrier boards.
These tests are performed in a test fixture
designed to secure the boards using the same
hardware and configuration that will be used in
the CUE Box application. Board preparation
steps include: staking the components, soldering
in socketed parts, conformal coating of the entire
board (except at specified masked off areas) and
installation of connectors as applicable. The
staking, soldering and conformal coating steps
will be duplicated for the flight boards. Care
must be taken when applying solder, coating and
staking materials so that the temperature ratings
of the components are not exceeded.
A full board electrical functional test should be
performed during and after each axis of testing.
Careful visual inspections must be made to
validate that solder joints are not damaged.
Photographs should be taken for this purpose.
Maximum displacement of the 6U IP Carrier
boards was a concern. On the 8-slot backplane
chosen for the CUE box, two IP Carrier boards
will be in adjacent slots with only 0.1 inches
clearance between them. A random vibration
analysis was planned for the 6U IP Carrier using
the FEM, which includes edge constraints that
estimate the fixity that the actual board will have
in the CUE box. This analysis should provide a
good idea of where to expect the maximum
displacement of the board under random
vibration loading.
A random vibration test can then be performed
on the 6U IP Carrier with an accelerometer
attached to the location that the FEM analysis
predicts maximum displacement will occur.
The displacement data will be used for the
following two purposes: to verify whether or not
the two boards will contact during vibration and
to update the FEM. The edge conditions
simulated in the FEM will be modified so that
the FEM random analysis results match the
random test results. Sine sweep test data
obtained during the test is also used in the
adjustment of the edge restraint simulation.
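Before FEM or test results are available, the board-to-board contact question can be bounded with a single-mode hand calculation. The sketch below uses Miles' equation for the RMS response of a lightly damped mode and converts it to a 3-sigma displacement. It is not the analysis performed for the CUE box: the natural frequency, amplification factor Q and input spectral density are hypothetical, and only the 0.1 inch clearance comes from the text. Two identical boards moving out of phase are assumed as the worst case.

```python
import math

G0 = 9.80665         # standard gravity, m/s^2
M_PER_INCH = 0.0254  # metres per inch


def miles_grms(fn_hz, q, asd_g2_per_hz):
    """Miles' equation: RMS acceleration (g) of a single-mode response."""
    return math.sqrt(math.pi / 2.0 * fn_hz * q * asd_g2_per_hz)


def peak_displacement_in(fn_hz, q, asd_g2_per_hz, sigma=3.0):
    """Approximate sigma-level peak displacement (inches) at resonance."""
    accel_rms = miles_grms(fn_hz, q, asd_g2_per_hz) * G0   # m/s^2
    z_rms_m = accel_rms / (2.0 * math.pi * fn_hz) ** 2     # metres
    return sigma * z_rms_m / M_PER_INCH


def boards_clear(fn_hz, q, asd_g2_per_hz, clearance_in=0.10):
    """True if two identical boards moving out of phase do not touch."""
    return 2.0 * peak_displacement_in(fn_hz, q, asd_g2_per_hz) < clearance_in


# Hypothetical inputs: 200 Hz board mode, Q = 10, 0.04 g^2/Hz input level.
print(boards_clear(200.0, 10.0, 0.04))
```

Because displacement scales with the inverse square of frequency, a board with a low first mode can exceed the clearance even at modest input levels, which is one reason the measured edge fixity matters.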
Thermal Characterization
Thermal characterization is performed to get an
understanding of the thermal effects on the
components and materials due to: self-heating
by the electronic components and the ambient
temperature, the availability of thermal paths and
the lack of convective cooling in vacuum and
zero G. Theoretical and empirical analysis must
be done to establish the thermal conditions that
will be seen by the boards in flight configuration,
in vacuum. Worst case power dissipation values
should be used for the analysis to simplify
obtaining the power data.
Since the design margins, electrical and thermal,
are not known for COTS boards, it is important
to measure these conditions for the boards
received. Infrared Imaging can be used to
measure radiated emissions from hot objects
such as the parts on the board. These radiated
emissions, which are visible in the infrared
frequency range, can be translated to temperature
provided the emissivity of the material is known.
Extreme care must be taken when collecting this
data to avoid stray sources of reflected emissions
and to isolate the subject from cooling
mechanisms (such as moving air) that will not be
available in the flight application. An
understanding of how the camera works and its
limitations with respect to accuracy and fidelity,
Thermocouples were also put in place to track the
accuracy of the temperatures calculated from the IR data.
The delta between these two forms of temperature data
varied between 0.35°C and 3.83°C (averages for each
board). The temperatures measured were in the range of
23°C (ambient temperature) to 62.9°C. Comparisons
between the temperatures measured and the part ratings
(at least those which could be identified) indicated that
there was probably little to no thermal margin at 60°C
ambient temperature (the project's maximum operating
temperature requirement). The thermal margins were
based on the addition of 27°C to the measurements taken
at 23°C ambient. Figure 6 shows the IR imaging set-up.
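The margin estimate described above amounts to shifting each bench measurement by a fixed ambient offset and comparing the result with the part rating. A minimal sketch follows. The 62.9°C measurement and the 27°C offset are the values quoted in the text; the 85°C part rating is a hypothetical placeholder. The calculation assumes the temperature rise above ambient is unchanged at the hotter ambient, which is optimistic for a vacuum application without convective cooling.

```python
def projected_margin_c(measured_c, rating_c, ambient_offset_c):
    """Margin (deg C) left after shifting a bench measurement to a hotter ambient.

    A negative result means the projected temperature exceeds the rating.
    """
    return rating_c - (measured_c + ambient_offset_c)


# Hottest temperature measured at 23 C ambient, and the offset from the text.
hottest_measured_c = 62.9
ambient_offset_c = 27.0

# Hypothetical part rating (assumption); use the rating of the identified part.
assumed_rating_c = 85.0

margin = projected_margin_c(hottest_measured_c, assumed_rating_c, ambient_offset_c)
print(f"Projected margin: {margin:.1f} C")
```

Running the same calculation for every identified part on the part map highlights which locations most need an augmented thermal path.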
Thermal modeling can be used to predict or reveal
thermal problem areas in vacuum (or in a sealed
enclosure) that could not be measured during testing. The
results of these models may lead to the selection and design of an
alternate enclosure or additional thermal management
assemblies. Additional thermocouple measurements may
be required after the results of the thermal modeling are
obtained. The thermal data indicated a need for additional
thermal management at the board and/or part level. The
thermocouple and IR data was provided to the thermal
engineering group to both validate their models of the
thermal system and to help them begin to design
ruggedization hardware.
Contamination
Since COTS boards are generally not designed for use in
applications that are extremely sensitive to contamination,
the COTS hardware must be reviewed and tested for
sources of contamination and moisture. The first place to
collect this data is from the manufacturer where it may or
may not be available. If the materials cannot be
adequately documented in this way, testing is required.
First the boards should be baked out in an oven at the
highest temperature allowed by the manufacturer's
ratings. This should be done for as long as the schedule
will allow (one to two weeks for maximum temperatures
of 50°C). An instrumented outgassing test then should be
run on the baked out board to verify that the
contamination level is acceptable. Following bake outs,
the boards should be conformally coated in accordance
with NASA standard procedures. Extreme care should be
taken to avoid exposing components to temperatures
beyond their rating.
Summary
By using this methodology for purchasing and
characterizing COTS boards, a project should be able to
understand the most critical vulnerabilities of the system
in the space environment and will be able to create a
ruggedization and/or risk reduction plan that addresses
these quantified concerns. Theoretical and empirical
methods can be used to simulate environments without
sacrificing hardware. Since qualification testing has
limited application in the case of COTS, it is important to
get data for the flight units in the configuration in which
they will be used.
This work was done under contract to NASA GSFC for
the NASA Advanced Interconnect Program and the
Spartan 251 program. Work which was performed on the
Spartan 251 COTS task which contributed significantly to
this paper was done by Dr. Michele Gates, Greg Martins,
Dr. Henning Leidecker, Don Deibler and Beverly Settles.
Their contributions are appreciated.
