Abstract-This paper provides information regarding the use of the Xilinx Virtex-4 field-programmable gate array (FPGA) in a spacecraft deployed to low-earth orbit. The results are compared to pre-deployment accelerated single-event effect (SEE) and fault-injection testing.
Flight Experience of the Xilinx Virtex-4 can be quite high, which can make fault-tolerant computing difficult. One of the MRM applications is a technology-readiness-level (TRL) application that monitors the commercial components for computational errors. This application provides information about single-event upsets (SEUs) that are scrubbed from the FPGAs' programming data, computational failures in the FPGAs, and SEUs in the auxiliary memories that could affect the FPGAs' computation. This information is time stamped and returned to the ground station so that each error can be correlated with where it occurred in orbit. The system has also been instrumented to report the location on the FPGA that was affected, so that on-orbit SEUs can be correlated with the accelerated radiation and fault-injection testing completed before launch. The purpose of this instrumentation was to study how the FPGA responded to the space environment, how the application responded to radiation-induced faults, and how well we could predict the behavior of both the FPGA and the application using pre-deployment accelerated SEE and fault-injection testing.
In this paper, we present data for the MRM system from October 2011 through December 2012. Section II provides information about the MRM system and the TRL application. Section III describes the accelerated radiation test data and fault injection data collected before launch, which were used to develop the CREME96 predictions in Section IV. These predictions are compared to on-orbit results in Section V.
II. THE TRL APPLICATION AND THE MRM SYSTEM
Fig. 1 illustrates the general user-circuit architecture for the TRL application. The TRL application is in part a digital signal processing (DSP) application, implemented so that the same algorithm can run in lockstep on both FPGAs despite their difference in size. Because of the resource differences between the two FPGAs, the SX FPGA implements the algorithm with DSP units (multiply-accumulate cores), whereas the LX FPGA uses both DSP units and LUT-based multipliers. The application also includes circuitry to monitor Samsung quad-data-rate (QDR) SRAMs and BAE C-RAM for errors.
The TRL application has been mitigated for SEUs using the BL-TMR tool for automated implementation of triple-modular redundancy (TMR) [1]. The user circuit includes intellectual property (IP) processing cores that were generated with the Xilinx CoreGen tool. Because CoreGen uses proprietary design formats, the BL-TMR tool triplicated these cores at the core level instead of at the gate level. The rest of the circuitry was triplicated at the gate level.
The scrubber algorithm uses a frame-based readback-and-repair methodology. Readback-based scrubbers read the FPGA configuration memory frame by frame and repair only the individual frames in which SEUs are detected. The LANL scrubber calculates a checksum for each frame and compares it to a "golden" checksum to detect errors. The scrubber stores the golden checksums in radiation-hardened static random-access memories (SRAMs) from Aeroflex. Because SEUs can also affect the golden checksums, the scrubber for this system can detect and scrub SEUs in the golden checksums as well.
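The readback-and-compare loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LANL scrubber: CRC-32 from zlib stands in for the unspecified per-frame checksum, frames are modeled as byte arrays, and the `golden_frames` list is a hypothetical source of repair data. Detection and repair of SEUs in the golden checksums themselves are not modeled.

```python
import zlib
from typing import List


def frame_checksum(frame: bytes) -> int:
    # Assumption: CRC-32 stands in for the scrubber's actual per-frame checksum.
    return zlib.crc32(frame)


def scrub_pass(config_memory: List[bytearray],
               golden_checksums: List[int],
               golden_frames: List[bytes]) -> List[int]:
    """Read back each frame, compare its checksum with the golden checksum,
    and rewrite only the mismatching frames. Returns the repaired indices."""
    repaired = []
    for index, frame in enumerate(config_memory):
        if frame_checksum(bytes(frame)) != golden_checksums[index]:
            # Single-frame repair from a hypothetical golden copy of the frame.
            config_memory[index][:] = golden_frames[index]
            repaired.append(index)
    return repaired


# Usage: three 4-byte frames with one bit flipped in frame 1.
golden = [bytes([i] * 4) for i in range(3)]
memory = [bytearray(f) for f in golden]
checksums = [frame_checksum(f) for f in golden]
memory[1][2] ^= 0x01
print(scrub_pass(memory, checksums, golden))  # [1]
```

Repairing only the frames whose checksum mismatches is what keeps a readback scrubber cheap: unaffected frames are read but never rewritten.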
Because the two FPGAs in each LEU run the same algorithm in lockstep on the same input data, each FPGA can monitor the other for observable output errors. To do so, the output of the TRL application is sent both to a local comparator and to a comparator on the other Xilinx FPGA. The comparators themselves are triplicated for added reliability. The comparator redundancy between the two FPGAs also provides additional confidence in the reported errors: if both comparators report the same errors and capture the same data streams, the errors very likely originate in the TRL application rather than in the comparators themselves. If only one comparator reports an error, a comparator error is the more likely cause.
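The cross-checking rule in the preceding paragraph can be expressed as a small decision function. This is a hypothetical sketch: the `ComparatorReport` type and the `"inconclusive"` outcome (both comparators flag an error but capture different data) are assumptions added for illustration; the text only describes the matching and single-report cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComparatorReport:
    """Hypothetical summary of what one triplicated comparator reports."""
    error: bool
    captured: Optional[bytes] = None  # sampled output stream, if an error was flagged


def attribute_error(local: ComparatorReport, remote: ComparatorReport) -> str:
    """Attribute a reported mismatch using the two cross-checking comparators."""
    if local.error and remote.error:
        # Both comparators agree on the error and on the captured data:
        # most likely a genuine TRL application error.
        if local.captured == remote.captured:
            return "application"
        # Assumption: disagreement in captured data is left for ground analysis.
        return "inconclusive"
    if local.error or remote.error:
        # Only one comparator reports an error: most likely a comparator fault.
        return "comparator"
    return "none"


print(attribute_error(ComparatorReport(True, b"\x5a"), ComparatorReport(True, b"\x5a")))  # application
print(attribute_error(ComparatorReport(True, b"\x5a"), ComparatorReport(False)))          # comparator
```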
When an error occurs, each comparator circuit notifies the RTAX, and samples of the output streams are sent to the ground for further processing. Information about SEUs and single-event functional interrupts (SEFIs) is also sent to the ground for processing. The system records an approximate time for each of these event types to aid in correlating output errors with SEFI or SEU effects.
III. PRE-LAUNCH ACCELERATED AND FAULT INJECTION TESTING
Several organizations have performed radiation testing of the Virtex-4, and the results are summarized in [2]-[6]. In [2], the saturated heavy ion cross-section for configuration memory is stated as (cm²)/(bit) and for BlockRAM memory as (cm²)/(bit). The proton cross-sections for configuration memory and BlockRAM memory are the same, at (cm²)/(bit) [2]. LANL also collected its own heavy ion and proton test results from 2004 to 2006, which agree with the previous test results. The LANL testing also included a validation of the scrubber using 63.3 MeV protons to ensure proper scrubbing before launch.
Because the application was to be mitigated with TMR, we completed studies on multiple-bit upsets (MBUs) and TMR. These studies showed that MBUs can limit the effectiveness of TMR [7]. The MBU analysis for the Virtex-4 was published in [6]. These results indicated that % % of all SEUs caused by 63.3 MeV protons would be MBUs, although the configuration memory could have up to % % MBUs. In heavy ions, the MBU rates could be much higher, as shown in Fig. 3. For the LET ranges that cover the naturally occurring space particles, single-bit upsets (SBUs) account for at least 85% of all upsets. Of the MBUs that occur, most are 2-bit upsets, as shown in Fig. 4. Only at 30 (MeV-cm²)/(mg) and above do 3-bit and 4-bit events occur at rates of 1% or higher, and 5-bit and larger events are exceptionally rare unless the charged particle strikes the component at a large angle of incidence.
Fault injection testing of both FPGAs was completed prior to launch using a methodology similar to the one discussed in [8]. Fault injection is useful for finding sensitive bits in the application. Sensitive bits are SRAM bits in the FPGA user circuit that, if flipped, can cause an observable output error. Sensitive bits sometimes result from problems in how TMR was applied; in particular, where TMR has been only partially applied to the user circuit, sensitive bits are expected in the unmitigated portion. We have found from previous experiments that the number of sensitive bits in an FPGA user circuit can be combined with the static cross-section to estimate an observable output error rate, and that this rate is often much smaller than the SEU rate for the components because of utilization and logical masking. The fault injection tests showed that some single points of failure remained in the application, but no persistent cross-section [1].
Fault injection was particularly challenging for this application, because the application requires one second to properly initialize and synchronize. Because the fault injection methodology that LANL follows requires resetting the system after each bit is injected, completing fault injection would have taken two years. For this application, the system was therefore reset only after an injected fault caused an observable output error. Because corrupted state from an earlier injection, rather than the most recently injected bit, could have caused the output error, we then applied the standard LANL fault injection process to a window of bits around each location that caused an output error.
After running multiple passes of this faster fault injection process, we found a subset of bits that caused output errors on every pass. These bits were also the most likely to recur in the detailed fault injection of the windows. While one pass of fast fault injection on either FPGA took about four hours, the detailed fault injection of the windows took months to complete.
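The two-stage procedure above (a fast pass that resets only after an error, followed by the standard reset-per-bit process over a window around each suspect) can be sketched as follows. Everything here is hypothetical: the `DeviceUnderTest` interface, the `window` and `observe_cycles` parameters, and the `ToyDut` model, in which one bit corrupts state that only shows up at the next output check, so the fast pass blames the wrong bit and the windowed pass finds the right one.

```python
from typing import Iterable, Protocol, Set


class DeviceUnderTest(Protocol):
    """Hypothetical test-harness interface (not described in the paper)."""
    def flip(self, bit: int) -> None: ...
    def output_error(self) -> bool: ...
    def reset(self) -> None: ...  # the slow (~1 s) initialize-and-synchronize step


def fast_pass(dut: DeviceUnderTest, bits: Iterable[int]) -> Set[int]:
    """Inject each bit once and reset only after an observable output error."""
    suspects: Set[int] = set()
    for bit in bits:
        dut.flip(bit)                # inject the upset
        error = dut.output_error()
        dut.flip(bit)                # repair the bit, as the scrubber would
        if error:
            suspects.add(bit)        # may be blamed on the wrong bit
            dut.reset()
    return suspects


def detailed_pass(dut: DeviceUnderTest, suspects: Iterable[int], window: int,
                  n_bits: int, observe_cycles: int = 2) -> Set[int]:
    """Standard process (reset before every injection) on a window around each suspect."""
    candidates = {b for s in suspects
                  for b in range(max(0, s - window), min(n_bits, s + window + 1))}
    sensitive: Set[int] = set()
    for bit in sorted(candidates):
        dut.reset()
        dut.flip(bit)
        if any(dut.output_error() for _ in range(observe_cycles)):
            sensitive.add(bit)
        dut.flip(bit)
    dut.reset()
    return sensitive


class ToyDut:
    """Toy model: `sensitive` bits fail immediately; `delayed` bits corrupt
    state that only shows up at the next output check."""
    def __init__(self, sensitive: Set[int], delayed: Set[int]) -> None:
        self.sensitive, self.delayed = sensitive, delayed
        self.flipped: Set[int] = set()
        self.pending = self.latched = False

    def flip(self, bit: int) -> None:
        self.flipped ^= {bit}

    def output_error(self) -> bool:
        error = self.latched or self.pending or bool(self.flipped & self.sensitive)
        self.pending = bool(self.flipped & self.delayed)
        self.latched = error         # corrupted state persists until reset
        return error

    def reset(self) -> None:
        self.pending = self.latched = False


dut = ToyDut(sensitive={10}, delayed={20})
suspects = fast_pass(dut, range(32))
print(sorted(suspects))                                           # [10, 21]
print(sorted(detailed_pass(dut, suspects, window=2, n_bits=32)))  # [10, 20]
```

In this toy run the fast pass attributes the delayed error to bit 21, and the windowed pass around it recovers bit 20 as the actual sensitive bit, which is the failure mode the window is meant to catch.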
IV. CREME96 PREDICTIONS
The radiation environment is shown in Fig. 5. MRM experiences two distinct radiation environments: the South Atlantic Anomaly (SAA) and the polar cap regions. The figure shows that the flux of protons above 0.1 MeV spans nearly all orbit locations, although protons with enough energy to penetrate the spacecraft and upset the FPGAs are generally found only in the SAA region. Detailed analysis of the operation of MRM indicates that there should be approximately four descending and four ascending passes through the main portion of the SAA every day.
The data in [2] were used as inputs to CREME96. For this orbit, CREME96 predicts an SEU rate of 50 (SEUs)/(device-day) for the LX FPGA, 18 (SEUs)/(device-day) for the SX FPGA, and a total of 68 (SEUs)/(LEU-day) for each LEU, assuming an active solar environment. Because the BlockRAM is not being monitored, these values include only SEUs in the configuration memory, which covers the configuration logic blocks, input/output blocks, BlockRAM interconnect, and DSP units. For a quiet solar environment, the predicted SEU rates are approximately 25% higher. For both solar conditions, 98.5% of SEUs are predicted to be caused by protons.
Fault injection results for the application indicate that 0.4% of all SEUs in the configuration memory of the XQR4VSX55 would cause observable output errors, compared with 0.1% for the XQR4VLX200. Using the worst-case predicted SEU rates for an active solar cycle, each Virtex-4 should have an observable output error approximately every 14 days for the SX FPGA and every 20 days for the LX FPGA. These results also indicate that, because of the mitigation applied, the application should withstand over 99% of SEUs without an observable output error.
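The 14-day and 20-day figures follow from multiplying the predicted SEU rate by the fraction of SEUs that fault injection found to be observable, then taking the reciprocal. The sketch below assumes, as the estimate implicitly does, that each configuration-memory SEU independently causes an output error with that probability.

```python
def days_between_output_errors(seu_per_day: float, observable_fraction: float) -> float:
    """Mean days between observable output errors, assuming each SEU
    independently causes one with probability `observable_fraction`."""
    return 1.0 / (seu_per_day * observable_fraction)


sx_days = days_between_output_errors(18.0, 0.004)  # XQR4VSX55: about 13.9 days
lx_days = days_between_output_errors(50.0, 0.001)  # XQR4VLX200: about 20 days
print(f"SX: {sx_days:.1f} days, LX: {lx_days:.1f} days")
```

Note that the smaller SX FPGA has the shorter interval despite its lower SEU rate, because a larger fraction of its SEUs are observable.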
When MRM was deployed, we had no estimate of the MBU rate, because no tool was available to perform such a calculation. After launch, we experimented with developing an MBU prediction using CREME96. We fitted the heavy ion MBU cross-sections from Fig. 3 to a Weibull curve, which gave input values of an onset of 0.5 (MeV-cm²)/(mg), a width of 67 (MeV-cm²)/(mg), an exponent of 2.3, and a limiting cross-section of 2.04 µm². Because we had only one proton data point, we approximated a proton Weibull curve from the data in [2], reducing the limiting cross-section by the MBU percentage measured at that data point. CREME96 then predicts an MBU rate of 1.18 (MBUs)/(device-day) for the SX FPGA and 3.31 (MBUs)/(device-day) for the LX FPGA, which means that MBUs would make up 6.5% of SEUs for the SX FPGAs and 6.6% for the LX FPGAs. We are not claiming that this is a correct use of CREME96, as the method lacks a physical interpretation; it was simply the only option we had for predicting an MBU rate with current tools.
The Virtex-4 has a few SEFI modes. Most of them affect either the programming data or the interfaces for scrubbing the FPGA. The power-on reset (POR) SEFI, which unprograms the FPGA and leaves it in an uninitialized state, has a predicted occurrence of once every 24-32 years, depending on the solar cycle. The SelectMAP SEFI, which affects reading from and/or writing to the SelectMAP interface for scrubbing, has a predicted occurrence of once every 26-34 years. While the MRM scrubber was designed to handle SEFIs, we did not expect to observe any while deployed.
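The Weibull fit quoted above can be evaluated with the standard four-parameter form used for CREME96 cross-section inputs, sigma(L) = limiting * (1 - exp(-((L - onset)/width)^exponent)) for L above the onset and zero below it. The sketch below only evaluates the fitted curve; it does not reproduce the CREME96 rate calculation. Units follow the text: LET in (MeV-cm²)/(mg) and the limiting cross-section in µm².

```python
import math


def weibull_cross_section(let: float, onset: float = 0.5, width: float = 67.0,
                          exponent: float = 2.3, limiting: float = 2.04) -> float:
    """Weibull cross-section sigma(L) = limiting * (1 - exp(-((L - onset)/width)**exponent)).

    Defaults are the heavy ion MBU fit quoted in the text: LET in (MeV-cm^2)/(mg),
    limiting cross-section in um^2.
    """
    if let <= onset:
        return 0.0
    return limiting * (1.0 - math.exp(-(((let - onset) / width) ** exponent)))


for let in (1.0, 10.0, 30.0, 67.5, 100.0):
    print(f"LET {let:6.1f}: {weibull_cross_section(let):.3f} um^2")

# MBU share of all SEUs implied by the CREME96 rates in the text
# (compare with the 6.5% and 6.6% quoted there).
print(f"SX: {1.18 / 18:.4f}, LX: {3.31 / 50:.4f}")
```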
V. ON-ORBIT RESULTS
In this section, we compare the accelerated SEE and fault injection test results with the on-orbit behavior of the system over several months of operation. The discussion focuses on where in the orbit SEUs occur, how often and where MBUs occur, and how accurate the CREME96-predicted SEU rates were. These results use extensions of the analysis tools described in [6] and [8], modified to take on-orbit data as input. All results presented in this paper are for the TRL application only.
A. SEUs
During this time period, the four FPGAs in the two LEUs experienced a total of 11,330 SEUs. Fig. 6 shows a breakdown of (SEUs)/(device-day) for each of the four FPGAs. The TRL application runs primarily on LEU1 and only rarely on LEU2, so the data for LEU2 are not as complete as those for LEU1. On average, the LX FPGA has and the SX FPGA has .¹ The ratio of the LX and SX SEU rates is consistent with the difference in FPGA sizes.
¹ Many of the calculations in this paper are based on quotients of two observed values, such as . As stated in [9], the error on value is , where both and are based on the 95% confidence intervals. In many cases, these error bounds are slightly narrower than the error bars traditionally used for cross-sections.
For the same time period, CREME96 predicted 51,500 SEUs for the two LEUs, based on operational usage. The predicted SEU rate is therefore approximately four to five times the actual SEU rate. Fig. 7 shows the monthly ratio of actual to predicted SEUs. This graph shows very little month-to-month variation and very little SX-to-LX variation, which leads us to believe that these values will not change significantly in the future. Although this result is consistent with our experience flying the Cibola Flight Experiment (CFE), which has nine Virtex-1000 components, and with other organizations' experience with deployed Xilinx Virtex family components [10], we examined the results in more detail. To this end, we studied how operational uptime in the SAA and shielding could have affected these results.
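The "four to five times" figure is the quotient of the two totals given in the text; a quick check:

```python
observed_seus = 11_330   # total on-orbit SEUs, October 2011 to December 2012
predicted_seus = 51_500  # CREME96 prediction for the same operational time

ratio = observed_seus / predicted_seus           # actual-to-predicted, about 0.22
overprediction = predicted_seus / observed_seus  # about 4.5
print(f"actual/predicted = {ratio:.2f}, predicted/actual = {overprediction:.2f}")
```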
During the turn-on phase of the payload, we noticed that operational scheduling could affect the ratio of actual to predicted SEUs. In particular, if the payload was scheduled to operate only in the SAA region, the ratio could exceed 100%. Since the turn-on phase ended and we began scheduling the payload to be operational as much as possible, such high ratios have not been achievable. We believe the difference depends on where in the orbit the payload is operational. This became clearer when we realized in March 2012 that regular per-orbit maintenance work was occurring in the SAA. This maintenance work makes the payload non-operational, so SEE data cannot be collected. By minimizing the maintenance work, we increased the operational time by approximately 15%. This change did not increase the SEU rate by 15%, because the operational time in the SAA did not increase by 15%. Fig. 8 shows the percentage of time that the payload is operational in the SAA. This graph explains some of the dips in the (SEUs)/(device-day) plot (Fig. 6), such as in February 2012, when utilization was low. We used the operational time in the SAA to predict SEU rates for the SAA. The ratio of actual to predicted SEUs for the SAA is shown in Fig. 9. From this analysis, we found that the ratio of actual to predicted SEU rates for the SAA is on average % %.
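A quantity like the one plotted in Fig. 8 can be estimated from time-stamped position samples and an operational flag. The sketch below is illustrative only: the latitude/longitude box is a crude stand-in for the SAA boundary (the boundary actually used is not given in the text), and `OrbitSample` is a hypothetical telemetry record assumed to be sampled at a fixed time step, so sample counts are proportional to time.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: a crude latitude/longitude box standing in for the SAA boundary.
SAA_LAT = (-50.0, 0.0)
SAA_LON = (-90.0, 40.0)


@dataclass
class OrbitSample:
    """One position sample taken at a fixed time step (hypothetical telemetry)."""
    lat: float
    lon: float
    operational: bool


def in_saa(lat: float, lon: float) -> bool:
    return SAA_LAT[0] <= lat <= SAA_LAT[1] and SAA_LON[0] <= lon <= SAA_LON[1]


def saa_operational_fraction(samples: Iterable[OrbitSample]) -> float:
    """Fraction of the time spent in the SAA during which the payload was operational."""
    saa = [s for s in samples if in_saa(s.lat, s.lon)]
    if not saa:
        return 0.0
    return sum(s.operational for s in saa) / len(saa)


samples = [OrbitSample(-20.0, -40.0, True), OrbitSample(-20.0, -40.0, False),
           OrbitSample(-30.0, -10.0, True), OrbitSample(45.0, 100.0, False)]
print(saa_operational_fraction(samples))  # two of the three SAA samples are operational
```

Multiplying a full-time SAA SEU prediction by this fraction gives an SAA prediction that accounts for the maintenance downtime described above.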
Our other concern is the amount of metal surrounding the payload. While one side of the payload is on an outer edge of the satellite behind a standard 130 mils of aluminum, the other side is surrounded by several inches of aluminum. According to discussions with other researchers, 10 cm of aluminum would shield protons below 200 MeV, oxygen below 400 MeV, and iron below 800 MeV [11]. It is therefore possible that the extra shielding on the interior side reduces the SEU rate. Unfortunately, we have been unable to obtain a mechanical model of the spacecraft, so we cannot fully quantify the effect of shielding on the SEU rates.
Because the actual SEU rate is much lower than predicted, the time between observable output errors should also be longer. The original estimate of 14-20 days was based on 68 (SEUs)/(LEU-day); to reflect the observed rate of approximately 19 (SEUs)/(LEU-day), it should be amended to 50 days for the SX FPGA and 71 days for the LX FPGA. There have been six output errors since October 2011. The current rate of output errors is once every 45 days, with a 95% confidence interval of (2 days, 597 days). Furthermore, the LEUs have been available for 0.999999 of all uptime, and the application is immune to 99.9% of all SEUs. Because the data are very sparse, these estimates may be revised as more data are collected.
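The amended intervals follow from scaling the original estimates by the ratio of predicted to observed SEU rates (68/19), assuming the output error rate is proportional to the SEU rate. The last line checks the immunity figure against the six output errors and 11,330 SEUs reported in this section.

```python
def rescaled_interval(days: float, predicted_rate: float, observed_rate: float) -> float:
    """Output errors are assumed proportional to the SEU rate, so the mean
    interval between them scales by predicted_rate / observed_rate."""
    return days * predicted_rate / observed_rate


sx_days = rescaled_interval(1.0 / (18 * 0.004), 68.0, 19.0)  # about 49.7 days
lx_days = rescaled_interval(1.0 / (50 * 0.001), 68.0, 19.0)  # about 71.6 days

# Fraction of observed SEUs that did not cause an observable output error.
immune_fraction = 1.0 - 6 / 11_330  # about 0.9995
print(f"SX: {sx_days:.1f} days, LX: {lx_days:.1f} days, immune: {immune_fraction:.4f}")
```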
While there have been a number of flares since October 2011, only a handful changed the SEU rate. In January 2012 there was an M8 solar flare with a full-halo coronal mass ejection and an X1 solar flare with an asymmetrical-halo coronal mass ejection. These flares significantly disturbed the proton fluxes measured by the GOES satellites. After the January flares, there were several substantial flares in July 2012. That month alone had five days on which the SEU rate was higher than average, including one day on which the SX FPGA had twice as many SEUs as expected. The (SEUs)/(device-day) rate for July is higher than average because of these solar events, not because of operational changes. During these solar events, the SEU rate increased by 25-50% without triggering observable output errors.
B. SEU Locations
Figs. 10-12 show projections of the sub-satellite locations of all SEUs for the four FPGAs. Fig. 10 displays the portion of the plot around the SAA as a heat map showing the density of SEUs in 3°×3° regions. These plots show that the majority of the SEUs occur in the SAA. The handful of non-SAA SEUs occur predominantly around the northern polar cap. SEUs have only started to appear around the southern polar cap as active solar conditions intensify.
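A heat map like the one described above amounts to counting SEU sub-satellite positions per latitude/longitude grid cell. This is a minimal sketch assuming 3-degree cells keyed by their south-west corner; the cell size follows the text, while the input format and the sample points are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def grid_cell(lat: float, lon: float, size: float = 3.0) -> Tuple[float, float]:
    """South-west corner of the size-by-size degree cell containing (lat, lon)."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def seu_density(points: Iterable[Tuple[float, float]], size: float = 3.0) -> Counter:
    """Count SEUs per grid cell from their sub-satellite (lat, lon) positions."""
    return Counter(grid_cell(lat, lon, size) for lat, lon in points)


density = seu_density([(-25.0, -45.0), (-26.5, -43.5), (10.0, 20.0)])
print(density.most_common(1))  # [((-27.0, -45.0), 2)]
```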
Compared with Fig. 5, the system is getting good east-to-west coverage and northern coverage of the SAA, including the westernmost tip north of New Zealand. Coverage of the southern SAA is still sparse because of the previously mentioned operational issues. In the six months since the operational change, the density of SEUs has increased across the entire SAA region, although SEUs are still largely absent from the southern portion. While there is a large population of low-energy protons at the southern boundary of the SAA, we believe it will simply take time to accumulate SEUs in this region. Over the course of the summer, we have also seen an increase in SEUs in the northern polar cap and mid-latitude regions.
C. SEFIs
To date, five SelectMAP SEFIs have occurred on two different FPGAs. As discussed in Section IV, a SelectMAP SEFI is predicted approximately once every 34 years during the active solar cycle. We are still investigating why the SelectMAP SEFI rate is higher than expected. The scrubber is designed to detect and correct SEFIs, and in each case it properly detected and corrected the SEFI. Once the SEFIs were cleared, the component returned to fault-free operation each time.
We have also seen a handful of events that are clearly not typical SEUs. In these cases, many bits in a single configuration frame were corrupted. While such events did occur occasionally during accelerated radiation testing, they occurred sporadically and we were unable to predict an on-orbit error rate. We are also uncertain of the mechanism that causes these events. The scrubber corrected these events without intervention and without triggering an observable output error.
D. MBUs
We determined that % % of the on-orbit SEU events are MBUs. These MBUs are defined as multiple SEUs Table II shows the breakdown of SEUs by size. The presence of 5-bit and 6-bit MBUs seemed unusual. Initially, all of the large MBUs occurred on the SX FPGAs, although recently one of the LX FPGAs experienced a 6-bit MBU. Further analysis showed that these events occurred exclusively in the SAA and were randomly distributed in time. When we revisited the pre-launch MBU testing, we determined that the observed 5-bit and 6-bit MBU rate is reasonable. In particular, 6-bit events occurred at a rate of 0.009% (0.008%, 0.017%) in 63.3 MeV protons and less than 0.06% in heavy ions. Currently, the 6-bit MBU rate
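Once each upset event has been assigned a size (the grouping rule that defines an MBU is not fully stated here, so the sketch assumes event sizes are already known), the MBU percentage and a Table II-style breakdown by size reduce to simple counting. The input below is illustrative only, not flight data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def mbu_breakdown(event_sizes: Iterable[int]) -> Tuple[float, Dict[int, float]]:
    """Return (MBU fraction, fraction of events by size) for a list of event sizes,
    where an event of size 1 is an SBU and any larger event is an MBU."""
    counts = Counter(event_sizes)
    total = sum(counts.values())
    mbu_fraction = sum(n for size, n in counts.items() if size > 1) / total
    by_size = {size: counts[size] / total for size in sorted(counts)}
    return mbu_fraction, by_size


# Illustrative input only, not flight data.
fraction, by_size = mbu_breakdown([1, 1, 1, 2, 1, 6, 1, 2])
print(fraction)  # 0.375
```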
E. Memory Types
While the FPGAs' memory consists of standard SRAM cells, these cells are laid out in three feature sizes from 90 nm to 130 nm. From testing, we have found that these different memory cell layouts have subtly different sensitivities to radiation. Table III compares the percentage of area that each memory type occupies on the component with its percentage of SEUs. These data show that BlockRAM interconnect is a very small portion of the reconfigurable fabric but accounts for a third of all SEUs. Conversely, CLB memory makes up the majority of the fabric but accounts for less than two thirds of the SEUs. Data on the BlockRAM itself are not available because it is not being monitored for SEUs.
When the data are analyzed in terms of routing versus logic memory cells, most SEUs affect routing resources. SEUs in the programmable routing that connects the programmable logic comprise % % of the SEUs, and SEUs in the programmable routing that interfaces to the hard cores, such as the DSP units and the BlockRAM, comprise % % of the SEUs. SEUs in the programmable logic, which consists of lookup tables (LUTs), comprise % % of the SEUs.
VI. CONCLUSION
In summary, we have presented on-orbit results from a deployed Virtex-4. These early results show that the CREME96-predicted SEU rate is four to five times the actual SEU rate. We believe that the observed SEU rate is artificially low because of limited operational usage in the SAA and the amount of metal on one side of the payload. We found that % % of the SEUs are MBUs. We also determined that % % of the SEUs are occurring in the CLBs. Finally, we had six incidents of observable output errors, which indicates that the application is immune to over 99.9% of all SEUs, consistent with the fault injection predictions.
