Risk Reduction for Use of Complex Devices in Space Projects by LaBel, Kenneth et al.
To be presented by Melanie Berg at the IEEE Nuclear and Space Radiation Effects Conference (NSREC), July 23-27, 
2007 and to be published in the 2007 IEEE Radiation Effects Poster Session and on http://rahome.gsfc.nasa.gov 
https://ntrs.nasa.gov/search.jsp?R=20080040866 2019-08-30T05:34:38+00:00Z
Risk Reduction for Use of Complex Devices in Space Projects 
Melanie Berg, 
Christian Poivey, 
Dave Petrick, 
Kenneth LaBel Scott Stansberry 
Mark Friendlich USC Information Sciences Institute 
MEI Technologies Inc. NASA/GSFC 
ABSTRACT 
We present guidelines to reduce risk to an acceptable level when using complex devices in space applications. 
Application to the Virtex-4 Field Programmable Gate Array (FPGA) on the Express Logistic Carrier (ELC) project is presented. 
Introduction 
With the increased complexity of Field Programmable Gate Array (FPGA) technology, users are now able 
to utilize FPGAs to implement System On a Chip (SOC) applications. A design's state space consists of a 
combination of its hardware and software. Because of the large number of gates, the modes of operation, 
and the amount of software contained within SOCs, they tend to be highly complex 
solutions. This complexity makes it nearly impossible for a customer to verify such products with 
near 100% fault test coverage, due to limitations such as time, verification tool constraints, memory 
restrictions, and available tester speeds. 
In order to increase test coverage, the ASIC industry has developed Design For Test (DFT) 
methodologies [1]. However, such schemes have not been fully embraced by the FPGA community, and test 
coverage remains an issue. When characterizing single particle radiation response within these devices, the 
goal is to compare, via test processes, normal operational response versus ionizing fault response. 
Obtaining the ability to observe single particle radiation-induced faults in an SOC increases the intricacy of 
the test requirements exponentially. Given the inherent restrictions in test coverage within normal 
operational environments, it becomes unrealistic to aim for a full radiation-response characterization 
covering all possible states of such products. However, an effective analysis must be performed to 
accurately determine project-specific risk reduction techniques. Traditional space qualification 
approaches may no longer be valid for contemporary, complex integrated circuits unless unrealistically large 
sample sizes and particle fluences are utilized. The goal is therefore to constrain the targeted state space to 
a level that will provide acceptable information to qualify the SOC operability in space. 
In this paper we present the qualification methodology we have applied to one SOC: the Xilinx Virtex-4 
XC4VFX60-SOC as implemented in the NASA Space Cube targeted for the Express Logistic Carrier (ELC) 
mission. ELC is a carrier to transport equipment and material to and from the International Space Station 
(ISS). The Space Cube utilizes a redundant PowerPC topology within the FX60 to perform several data 
processing functions in a space radiation environment. Within this discussion, we also present a 
synopsis of the NASA High Speed Digital Tester (HSDT) [2], which contains an "all-in-one" custom designed 
Virtex-4 Configuration Manager, Scrubber, Fault Injector, and read-back manager. 
Constraining the Design State Space - A Definition 
A synchronous design's state space is related to the number of D flip-flops (DFFs) utilized. In its simplest 
form, the upper bound of the design's possible state space is represented as: 
$$\text{State Space} = 2^{n}$$ (n: number of DFFs implemented within the DUT) [3] 
Combinatorial logic is not considered as part of the equation because synchronous designs are defined 
by clock periods. At the end of each clock period the circuit is settled in one of $2^{n}$ possible DFF states. 
Designs that are not affected by noise or radiation are in most cases bounded by a much smaller subset of 
the $2^{n}$ upper bound. However, when noise or radiation is a factor, any state is reachable. 
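To give a feel for how quickly the 2^n bound grows, the following is a minimal sketch that evaluates the bound and estimates how long it would take to visit every state once at a given clock rate. The function names, the 64-DFF example, and the one-state-per-clock assumption are illustrative only and are not taken from the ELC design.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def state_space_upper_bound(num_dffs: int) -> int:
    """Upper bound on the state space of a synchronous design: 2**n."""
    if num_dffs < 0:
        raise ValueError("number of DFFs must be non-negative")
    return 2 ** num_dffs


def log10_years_to_visit_all_states(num_dffs: int, clock_hz: float) -> float:
    """log10 of the years needed to visit every state once.

    Assumes (hypothetically) one new state per clock cycle. Working in
    log10 avoids floating point overflow for large DFF counts.
    """
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return num_dffs * math.log10(2) - math.log10(clock_hz * SECONDS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical example: only 64 DFFs, clocked at 250 MHz.
    print(state_space_upper_bound(64))
    print(10 ** log10_years_to_visit_all_states(64, 250e6), "years")
```

Even this small hypothetical register count takes on the order of thousands of years to enumerate, which is why the targeted state space has to be constrained rather than covered exhaustively.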
Due to limits on time, money, and physical resources, it is impossible to test every condition needed to cover the 
design's entire state space when performing Single Event Upset (SEU) radiation testing of complex circuits 
such as SOCs. It boils down to research vs. application, and hence, SEU tests have been divided into two 
categories: 
(1) Device Characterization (research driven): error rate calculations of device primitive circuits such as 
(but not limited to) DFFs, inverters, buffers, Look-up Tables (LUTs), I/O, and configuration memory. 
Accurately determining error cross sections for multifaceted device primitives can take years due to the 
complexity of determining each element's contribution to the error cross sections. Fault masking and element 
cascading, which create non-linear effects and dependencies, are major contributors to the necessity of 
developing large test sets that can home in on particular elements and accurately measure their single event 
responses. 
(2) Design Characterization (application driven): targets a specific design under test. Such calculations, 
when constrained and analyzed correctly, may only take months. Element observability is minimal. 
Therefore, designs should be specific to the application under investigation. The objective is to determine 
the strength of a given design in a space environment. 
Setting Goals and Specifications for ELC SEU Testing 
The requirement was to supply the ELC mission with a radiation characterization of the Xilinx Virtex-4 
XC4VFX60-SOC with dual-core embedded PowerPC processors. Due to the stringent schedule constraints 
required by the project, design characterization of the Space Cube was considered. However, due to the 
complexity of this processor-based design, its state space coverage had to be defined and constrained. In 
order to constrain the SOC's state space, a strong understanding of the targeted device and the design's 
infrastructure was essential. To be concise, the objectives were: 
(1) To develop a design under test (DUT) that was compatible with the actual design targeted for the ELC 
mission 
(2) To constrain the complex state space such that the design characterization was informative and a 
good representative of the actual flight project 
(3) To observe and compare possible radiation hazard responses 
(4) To determine an appropriate fault cross-section metric, in essence to supply the mission with 
qualification data. 
Synopsis of the ELC SOC Design 
The Space Cube processor card is populated with two Virtex-4 FX60 devices, yielding a total of four 
PowerPC processors. Each processor is allocated 50% of the FPGA fabric and is considered an 
independent processor node. All four processors run independently of each other, and results are voted on 
by a separate rad-hard FPGA. The rad-hard FPGA is also tasked with trapping error conditions and flagging 
which processor node needs immediate attention. A combination of internal and external interaction will 
bring the malfunctioning processor back online and resynchronize its tasks with the other processors. 
Different approaches are being considered to bring the processors back online (e.g., warm reset, full re-boot, 
partial reconfiguration). Radiation test results will hopefully help the Space Cube team determine which 
procedure is needed, as well as which mitigation is best for keeping the processors functioning as long as 
possible (e.g., scrubbing, an Internal Configuration Access Port (ICAP) hardware controller, or some combination of 
the two). 
Figure 1 shows the block diagram of the embedded PowerPC and the major FPGA interfaces. The current 
hardware system interfaces to the processor using the Xilinx-specific Processor Local Bus (PLB). All 
instructions and data are stored in external RAM. 
Figure 1: ELC PowerPC Block Diagram. (Block labels recoverable from the figure include: Timers, Cache Units, 
D-Cache Array, D-Cache Controller, PLB Master Read Interface, PLB Master Write Interface, Instruction OCM, 
Data OCM Controller, External-Interrupt Controller Interface, JTAG, Trace.) 
Partitioning and Selecting the Functional State Space for Testing 
Partitioning Scheme 
The functional state space was partitioned as follows: 
(1) CPU (and its interfaces) along with Cache Units and 
(2) MMU and Timers/Debug Logic. 
Partitioned logic will be tested in stages. This discussion focuses on a portion of stage one: the CPU and 
its interfaces. 
Design Under Test 
General 
As previously stated under the ELC objectives for a design characterization, the selected test structure 
should be representative of the application. Fig. 2 shows a block diagram of the processor node test design. 
This figure only depicts half of the Virtex-4 FX60 implemented as the Design Under Test (DUT) unit. The 
processor node design is instantiated for both processors, and each instance has independent control lines from the tester. 
In addition to this processor design, the radiation test design includes a large shift register to exercise the 
logic part of the FPGA (used for latch-up testing). 
Custom High-Speed Peripheral 
The Custom High-Speed Peripheral (CHSP) (illustrated in Figure 2) is an instantiated Memory IP core: 
8x32 bits. This is a novel approach to high-speed SEU PowerPC testing. The processor writes to the CHSP 
as if it were writing to a 32-bit memory location. However, there is no memory on this port; it is actually the HSDT 
posing as memory and grabbing data. Raw data rates are 64 MHz by 32 bits (2048 Mb/sec); however, accounting for memory 
write overhead reduces this throughput by a factor of 4. Therefore, data rates through the CHSP are 512 
Mb/sec. 
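The throughput figure follows from simple arithmetic on the numbers quoted above. A minimal sketch, with a hypothetical function name and with the clock rate, bus width, and overhead factor taken from the text as default arguments:

```python
def chsp_effective_throughput_mbps(clock_mhz: float = 64.0,
                                   bus_width_bits: int = 32,
                                   write_overhead_factor: float = 4.0) -> float:
    """Effective CHSP data rate in Mb/sec.

    Raw rate is clock (MHz) times bus width (bits); the memory write
    overhead is modelled as a simple divisor, as described in the text.
    """
    if write_overhead_factor <= 0:
        raise ValueError("overhead factor must be positive")
    raw_mbps = clock_mhz * bus_width_bits
    return raw_mbps / write_overhead_factor


if __name__ == "__main__":
    print(chsp_effective_throughput_mbps())  # 64 * 32 / 4
```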
Figure 2: Single Processor Node DUT design. (Signal labels recoverable from the figure include: Tester Reference Clock, 
Tester Reset Control.) 
Defining ELC Test Requirements 
In order to create a test vehicle with a sufficient number of observable points of the design's state, the 
tester must contain a large number of high-speed I/O that connect directly to the DUT. The Xilinx Virtex-4 is 
an SRAM-based FPGA: its configuration is stored in SRAM. It therefore becomes necessary to be able to scrub 
the configuration memory, i.e., to correct errors by re-writing the erroneous bits. 
Figure 3: HSDT and XC4VFX60-SOC Test Vehicle. 
HSDT to DUT Interface - General Description 
Due to its large number of available high-speed I/O, the HSDT was utilized for SEU radiation testing 
of the XC4VFX60-SOC. The HSDT controls the IRQ, clock, and reset inputs to the DUT. It is also the 
responsibility of the HSDT to grab data from the DUT via the CHSP or the UART interfaces. The purpose of 
the DUT's CHSP from the perspective of the HSDT is to create a high-bandwidth bus to the DUT on which: 
(1) The tester can control the DUT 
(2) The DUT can transfer dynamic (high-speed) data to the HSDT 
(3) The tester can send instruction code to the DUT configuration SRAM (through a custom made 
SRAM CNTRL unit that can MUX between processor control and HSDT control). 
Fault Injection 
The third concept is a novel approach and is also used for real-time fault injection, controllable by the 
tester, into the instruction and/or data path. The test software application code is loaded into and executed out of 
external RAM. 
Frequency Considerations 
In keeping with emulating a design representative of the application, the DUT processors were exercised at 
the same speed as proposed in the ELC mission: 250 MHz. Data acquisition was implemented in 2 
separate categories: 
(1) "Ping Pong": interrupt-driven counter increment followed by a transmission to the HSDT. Time 
between interrupts was varied. 
(2) Constant data acquisition: the counter was incremented every PC cycle and sent to the HSDT via the 
CHSP. 
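Both acquisition modes send an incrementing counter to the tester, so one way a tester could flag an upset is to compare each received value against the expected next count. The sketch below is a hypothetical illustration of that idea only; it is not the HSDT's implementation, and the function name, the 32-bit width, and the resynchronisation policy are assumptions.

```python
def check_counter_stream(samples, width_bits=32):
    """Return (index, expected, observed) for every out-of-sequence sample.

    Each sample is expected to equal the previous one plus one, modulo
    2**width_bits. After a mismatch the checker resynchronises on the
    observed value so that one upset is not reported indefinitely.
    """
    mask = (1 << width_bits) - 1
    mismatches = []
    if not samples:
        return mismatches
    expected = samples[0]
    for index, observed in enumerate(samples[1:], start=1):
        expected = (expected + 1) & mask
        if observed != expected:
            mismatches.append((index, expected, observed))
            expected = observed  # resynchronise on what was actually seen
    return mismatches


if __name__ == "__main__":
    # Hypothetical capture: the count skips from 7 to 9.
    print(check_counter_stream([5, 6, 7, 9, 10]))
```

A stream that simply stops arriving would need a separate timeout check, which is closer to how a functional interrupt would show up than a single wrong count.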
Scrubbing 
ELC plans to utilize the Xilinx ICAP and Frame ECC cores for scrubbing the XC4VFX60. Such cores are 
DUT-internal (unhardened) scrubbing mechanisms. To determine the effectiveness of this scrubbing 
methodology, a comparison of the internal scrubber vs. an external scrubber that can potentially be hardened 
was performed. 
External Tester Control of the SelectMAP Interface 
The HSDT stores the DUT's configuration bit file within onboard SRAM. The user is able to control 
configuration, reconfiguration, scrubbing, fault injection, and read-back from the HSDT console. Fault 
injection was used to assist with error analysis and scrubbing verification. 
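The combination of fault injection and scrubbing against a stored configuration file can be modelled in a few lines. The following is a simplified in-memory sketch, assuming the configuration is a plain byte array and the scrubber rewrites any byte that differs from the stored golden copy; the function names are hypothetical, and the real HSDT operates on the device's configuration interface rather than on a Python buffer.

```python
def inject_faults(config: bytearray, bit_positions) -> None:
    """Flip the given bit positions in place (bit 0 is the LSB of byte 0)."""
    for pos in bit_positions:
        config[pos // 8] ^= 1 << (pos % 8)


def scrub(config: bytearray, golden: bytes) -> int:
    """Restore config from the golden copy; return the number of bits fixed."""
    if len(config) != len(golden):
        raise ValueError("configuration and golden copy differ in length")
    corrected_bits = 0
    for index, (current, reference) in enumerate(zip(config, golden)):
        difference = current ^ reference
        if difference:
            corrected_bits += bin(difference).count("1")
            config[index] = reference
    return corrected_bits


if __name__ == "__main__":
    golden = bytes(range(16))          # stand-in for a stored bit file
    live = bytearray(golden)           # stand-in for DUT configuration memory
    inject_faults(live, [0, 9, 10, 127])
    print(scrub(live, golden), live == golden)
```

Because the golden copy is held outside the device, this style of scrubbing is not limited by how many bits are upset within a region, in contrast to a code-based corrector.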
Internal Scrubbing: ICAP and FRAME ECC 
Knowing that the ICAP/FRAME ECC design is based on a single error correct, double error detect 
(SECDED) correction scheme, both single and multiple bit frame errors were injected via the HSDT. All 
single bit errors were corrected. Double bit errors were not corrected, but were detected as expected. However, 
some multiple bit errors went undetected, and some false corrections were also observed. This is of 
interest because it has been observed that multiple bit errors are a major concern within the V4 configuration 
memory upon radiation exposure [3]. We irradiated the ICAP design utilizing our custom scrubber under a 
heavy ion beam at the Texas A&M University Cyclotron (TAMU) on February 17-20, 2007 for further 
comparison studies. Please see Figures 4 and 5 for results. 
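The failure modes described above are inherent to any SECDED code and can be reproduced with a toy example. The sketch below uses an 8-bit extended Hamming code protecting 4 data bits; it is an illustration of the SECDED principle only, not the Xilinx Frame ECC algorithm, and all names are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # Hamming positions that carry data bits


def encode(nibble: int) -> list:
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) word."""
    code = [0] * 8
    for k, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> k) & 1
    for p in (1, 2, 4):  # each parity bit covers positions with bit p set
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity bit
    return code


def decode(code: list):
    """Return (nibble, status) as a SECDED decoder would report it."""
    word = list(code)
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        status = "clean"
    elif overall == 1:
        word[syndrome] ^= 1  # syndrome 0 means the overall parity bit
        status = "corrected"
    else:
        status = "double-detected"
    nibble = sum(word[pos] << k for k, pos in enumerate(DATA_POSITIONS))
    return nibble, status


def flip(code: list, positions) -> list:
    """Return a copy of code with the given bit positions inverted."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out


if __name__ == "__main__":
    good = encode(0b1011)
    for upset in ([5], [3, 5], [3, 5, 6], [1, 2, 4, 7]):
        print(len(upset), "bit(s) flipped ->", decode(flip(good, upset)))
```

In this toy code a three-bit upset is reported as "corrected" while returning wrong data, and a particular four-bit upset is reported as "clean", which mirrors the false corrections and undetected errors noted in the text.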
Testing was performed at heavy ion and proton facilities. This presentation will only consider proton SEU 
data. 
Facility: Indiana University Cyclotron Facility. 
Energy: 93 MeV and 200 MeV 
Flux: 1e7 protons/cm^2/s 
Fluence: All tests were run until Single Event Functional Interrupt (SEFI). 
It was decided that flux would be kept extremely low at both the proton and heavy ion test facilities. This 
decision was based on: 
(1) Previous SEU testing experience with SRAM-based FPGAs 
(2) The fact that the design is complex. Errors can accumulate before propagating, making it difficult to 
determine the accuracy of the error rate calculation. 
Proton SEU Results 
Figure 4: SEFI Cross Sections - A Comparison of Scrubbers. (Horizontal axis: Energy (MeV).) 
Figure 5: Error Cross-Section while Varying Time Between Each Interrupt. (Horizontal axis: Time between interrupt (ms).) 
Because we were performing a design characterization, it was necessary to measure the error cross section 
as SEFIs over fluence. Each test was run until a SEFI occurred; therefore, the error cross section per device is defined 
as: 
$$\sigma_{\mathrm{SEFI}} = \frac{1}{\text{fluence}}$$ 
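As a worked illustration of this calculation, the sketch below assumes each run ends at its first SEFI, so the per-run cross section is the reciprocal of the accumulated fluence (flux times exposure time). The function names and the exposure times are hypothetical; only the 1e7 protons/cm^2/s flux comes from the test description, and no measured results are reproduced here.

```python
def fluence_at_sefi(flux_per_cm2_s: float, seconds_to_sefi: float) -> float:
    """Fluence (particles/cm^2) accumulated when the SEFI occurred."""
    return flux_per_cm2_s * seconds_to_sefi


def sefi_cross_section(fluence_per_cm2: float) -> float:
    """Cross section (cm^2/device) for a run stopped at its first SEFI."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return 1.0 / fluence_per_cm2


def combined_cross_section(fluences) -> float:
    """Total SEFIs over total fluence for several one-SEFI runs."""
    fluences = list(fluences)
    if not fluences:
        raise ValueError("at least one run is required")
    return len(fluences) / sum(fluences)


if __name__ == "__main__":
    flux = 1e7  # protons/cm^2/s, as quoted for the proton tests
    # Hypothetical exposure times to first SEFI, in seconds.
    runs = [fluence_at_sefi(flux, t) for t in (100.0, 300.0)]
    print([sefi_cross_section(f) for f in runs])
    print(combined_cross_section(runs))
```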
When performing an SEU analysis for a complex SOC device, it is important to have a full understanding 
of the design(s) under consideration. Device level characterization is practically impossible to achieve 
within the time constraints of a mission project (although it is achievable within a research environment). 
Therefore, it is important to develop a design characterization approach to radiation testing for maximum 
project risk reduction. 
It is necessary to implement a DUT that is a replica of (or very close to) the actual design within the project 
under investigation. However, due to the complexity, the design's state space must be constrained without 
loss of essential data collection/information. We chose to constrain the ELC Space Cube by: 
(1) Using only 2 out of the 4 PowerPCs 
(2) Selecting simple software routines that will not mask operation 
(3) Changing the frequency of processing (time to interrupt and constant high-speed counting) 
Proton test results illustrate that using an external hardened scrubber reduces the error cross section by 
an order of magnitude. There are many reasons for such results. However, the key is that the ICAP/FRAME ECC 
core is only a SECDED (single error correct, double error detect) module. It has been shown that 
configuration memory is subject to multiple bit hits. The external scrubber is capable of correcting any 
number of multiple bit hits as long as the DUT internal scrubbing interface, scrubbing logic, scrubbing registers, 
and un-writable configuration bits are not hit. 
Acknowledgment 
The authors gratefully acknowledge support from ELC, the NASA Electronic Parts and Packaging 
Program (NEPP), the Defense Threat Reduction Agency under IACRO# 07-42071, and Xilinx Corporation. 
References 
[1] "The IBM ASIC/SoC methodology - A recipe for first-time success," 
http://www.research.ibm.com/journal/rd/466/doerre.html. 
[2] J.W. Howard, et al., "Development of a Low-Cost and High-Speed Single Event Effects Testers based 
on Reconfigurable Field Programmable Gate Arrays (FPGA)," SEESYM06, April 2006. 
[3] Melanie Berg, "FPGA Design Strategies for the Space Radiation Environment," SEESYM06, April 2006. 
