Over the past 15 years many organizations have researched the use of Static Random Access Memory (SRAM)-based Field-Programmable Gate Arrays (FPGAs) in space. Although these components can provide a performance improvement over radiation-hardened processing components, random soft errors can occur because of the naturally occurring space radiation environment. Many organizations have been developing methods for characterizing, emulating, and simulating radiation-induced events; mitigating and removing radiation-induced computational errors; and designing fault-tolerant reconfigurable spacecraft. Los Alamos National Laboratory has fielded one of the longest-running space-based FPGA experiments, called the Cibola Flight Experiment (CFE), using Xilinx Virtex FPGAs. CFE successfully deployed commercial SRAM FPGAs into low-Earth orbit with Single-Event Upset (SEU) mitigation and was able to exploit the reconfigurability and customization of FPGAs effectively in a harsh radiation environment. Although the CFE FPGAs are older than current state-of-the-art FPGAs, the same concepts have been used to deploy newer FPGA-based space systems since the launch of the CFE satellite and will continue to be useful for newer systems. In this article, we present how the system was designed to be fault tolerant, prelaunch predictions of expected on-orbit behaviors, and on-orbit results.
INTRODUCTION
Many space-based sensors have the ability to collect more raw data than can be transmitted to the ground station. On-board computational systems can reduce the amount of data transmitted to the ground station by compressing data, summarizing datasets, or prioritizing data. Because these types of tasks rely heavily on signal-processing algorithms, Field-Programmable Gate Arrays (FPGAs) can be useful for developing custom hardware designs for high-performance, space-based computing. In previous decades, custom Application-Specific Integrated Circuits (ASICs) or antifuse-based FPGAs, such as the Microsemi (formerly Actel) RTAX FPGA, have been frequently used for these purposes. Over the past decade, many organizations have been researching the use of Static Random Access Memory (SRAM)-based FPGAs in spacecraft because these types of FPGAs are more capable than antifuse FPGAs and more flexible than either alternative due to reprogrammability.
SRAM-based FPGAs are often adversely affected by the space radiation environment. There are two primary concerns for long space missions: the accumulation of ionizing radiation that can cause wearout-like mechanisms in the transistors and the interaction of single, ionizing particles that can cause both temporary and permanent operational failures [Holmes-Siedle and Adams 2002] . The first case, called total ionizing dose, results in a degradation of the transistors as dose accumulates. The Virtex FPGA can accumulate up to 90 krads (Si) before the component's parameters start to change [Fabula and Bogrow 2000] . The second case, called Single-Event Effects (SEEs), can be an issue with SRAM-based FPGAs.
There are many different types of SEEs. The four that are a primary concern for SRAM-based FPGAs are:
- Single-Event Latchup (SEL): a radiation-induced version of electrical latchup that can permanently damage a component
- Single-Event Transient (SET): a transient change in a gate's output
- Single-Event Upset (SEU): a change in a memory value stored in an SRAM memory cell, register, or flip-flop
- Single-Event Functional Interrupt (SEFI): an SET or SEU in control logic that causes the component to operationally fail until reset
Xilinx FPGAs have been most frequently used in space computing because the Virtex family of FPGAs has been shown to be SEL-immune [Fuller et al. 2000a; Swift 2004]. Although free of SEEs with permanent failure modes, many of the FPGAs in the Virtex and Spartan families are sensitive to SEUs and SEFIs. SEUs and SEFIs are known to corrupt output values in many types of processing components [Hiemstra and Bailak 2004; Rech et al. 2012], including FPGAs [Graham et al. 2003b; Sterpone et al. 2007]. The Cibola Flight Experiment (CFE) satellite was one of the first satellites with Virtex FPGAs when it launched in 2007. The satellite was designed by Surrey Satellite Technology Ltd. (SSTL) and the reconfigurable payload was designed by Los Alamos National Laboratory (LANL). Research on radiation effects on SRAM-based FPGAs at LANL started in 1998 in conjunction with research partners at Brigham Young University (BYU), Xilinx, and the Xilinx Radiation Test Consortium [Fuller et al. 1999]. In the early days of the CFE project, LANL worked with these teams to quantify and understand how SEUs affected the reconfigurable fabric. The primary concern identified was that SEUs could modify both the user circuit and user memory [Caffrey et al. 2002b; Rollins et al. 2002; Graham et al. 2003b].
It was eventually determined that SEUs could be masked through the use of Triple-Modular Redundancy (TMR) and half-latch removal [Bridgford et al. 2008; Graham et al. 2003a; Carmichael 2001; Morgan et al. 2005; Samudrala et al. 2004; Sutton 2013; Do 2011]. TMR is only guaranteed to mask one error in the system, so online reconfiguration, or scrubbing, is needed to keep SEUs from accumulating [Carmichael et al. 2000]. The CFE scrubbing circuit is designed to detect both SEUs and SEFIs.
In this article, we discuss the 7 years of successful flight experience on the CFE satellite (Section 4), including prelaunch predictions for how the FPGAs would perform (Section 3). We cover advancements made in FPGA radiation research that made CFE possible in Section 2. Although many of these advances were made in the early 2000s, many of them are valid and useful with present-day systems. In fact, many of the same techniques were used to design a Virtex-4 payload, called the Mission Response Module (MRM), that was launched into a much harsher orbit than CFE (Section 5).
BACKGROUND
The CFE satellite has a reconfigurable processor payload designed to survive the Low-Earth Orbit (LEO) radiation environment. The Department of Energy, National Nuclear Security Administration NA-22 funded the CFE satellite. It was launched by the U.S. Department of Defense Space Test Program (STP) as part of the STP-1 space flight mission. The STP-1 mission was launched in March 2007 on a U.S. Air Force Atlas-V Evolved Expendable Launch Vehicle (EELV), deploying the Orbital Express spacecraft and four EELV Secondary Payload Adapter (ESPA) satellites, of which CFE was one. CFE was deployed into a 560 km circular orbit inclined at 35.4 degrees. The satellite and the payload are operated from an autonomous ground station at LANL in Los Alamos, New Mexico. The expected CFE on-orbit lifetime was originally predicted to be 4-5 years for average solar maximum activity. However, due to a weak solar maximum, CFE is continuing operations into the eighth year since launch.
The satellite is a Radio Frequency (RF) instrument with two antennae and a reconfigurable payload used to process the Intermediate Frequency (IF) for ionospheric and lightning studies. Two 40 MHz RF channels are gang tuned to an IF of 55-95 MHz. The two IF channels are sampled at 100 MHz with 12-bit resolution, as shown in Figure 1. Figure 1 also shows the functional diagram of the CFE satellite chassis with the three Reconfigurable Computers (RCCs). The functional diagram of the RCC boards is shown in Figure 2. The RCC boards implement a Software-Defined Radio (SDR) that surveys the radio spectrum between 100 and 500 MHz [Caffrey et al. 2002a]. Each of the three RCC boards has three Xilinx XQVR1000 FPGAs. Because the use of SRAM-based FPGAs was rare when CFE was designed, the conceptual framework allowed for the loss of individual FPGAs in an RCC and the loss of individual RCCs.
Reconfiguration can be used to change processing to search for different signatures or enhance the sensitivity in the deployed system by changing the detection algorithms. On-board detection is used to compress data on the satellite to reduce the amount of data transmitted to the ground station. The processing demands for these tasks are immense and difficult to accomplish with traditional microprocessors, let alone with radiation-hardened microprocessors that are essentially 15 years behind state of the art. LANL has taken full advantage of the reconfiguration capability and has uploaded new bitstreams twice a month since launch with new detection algorithms and different mitigation schemes.
In this section, we discuss some of the innovations from LANL and BYU that helped CFE successfully deploy nine Virtex FPGAs in LEO. Although some of these innovations are widely used by many organizations now, the initial advancements were made early in the CFE project and remain among the most comprehensively tested approaches to fault emulation, SEU mitigation, and FPGA scrubbing. This section explains how this work is applicable to all SRAM-based FPGAs.
Fig. 3. Measured radiation sensitivities for heavy ions and protons. These curves are used to predict on-orbit SEU rates when convolved with the radiation environment data.
Radiation Testing
Although SRAM FPGAs had never been used in space when the LANL project started, radiation-induced upsets in SRAM technology had already been explored [Flament et al. 2004; Tosaka et al. 2004] . Therefore, SEUs in SRAM FPGAs were expected, but there was no clear concept of how these sensitivities would translate to SRAM-based FPGAs. Accelerated testing was needed to accomplish two things:
(1) Measure the sensitivity (cross-section) of the SRAM cells to heavy ion and proton radiation so that error rate predictions can be estimated.
(2) Characterize how SEUs affect the FPGA's user circuits to determine potential on-orbit failure modes.
The cross-section is a measurement of the area sensitive to SEEs and is used to determine predictions for on-orbit errors. For space applications, cross-sections are measured for both proton and heavy ion radiation using Equation (1):
$$\sigma = \frac{\mathrm{events}}{\mathrm{fluence}} \qquad (1)$$
where events are the number of observed SEEs that occurred in the amount of fluence, which is a measurement of the number of particles per unit area. More discussion of radiation effects can be found in Holmes-Siedle and Adams [2002]. LANL completed one of the first radiation tests to measure the cross-section of an SRAM FPGA when it conducted heavy-ion tests on the Xilinx Virtex XQVR300 in 1998 [Fuller et al. 2000b]. The measured cross-section for heavy ions is shown in Figure 3(a). This curve has the characteristic shape for radiation effects: a Linear Energy Transfer (LET) onset where SEUs start to occur, a rapid increase in sensitivity, and a saturation region where increasing the LET no longer increases the SEU sensitivity very much. Figure 3(b) shows the measured sensitivity to proton radiation over several energies [Fuller et al. 2000b]. Note that both of these cross-section curves measure individual bit sensitivity. For a measurement of device sensitivity, the bit cross-section needs to be multiplied by the number of bits in the component. These cross-sections are consistent not only with other SRAM components of the era but also with many SRAM and SRAM FPGA components today. Over the nearly 15 years of radiation testing completed by LANL, feature sizes of SRAM cells have decreased, but SRAM cell sensitivities have decreased by less than an order of magnitude [Swift 2004; Quinn et al. 2007a]. In comparison, DRAM sensitivities have decreased by many orders of magnitude [Baumann 2005] over the same time period.
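As a minimal sketch of the arithmetic behind Equation (1) and the bit-to-device scaling described above, the following Python computes a cross-section from an event count and a fluence. The function names are illustrative, and the event count and fluence are hypothetical values chosen only to show the units; they are not CFE test data. The 5.6 million bit count is the configuration bitstream size quoted later in the text.

```python
def cross_section(events: int, fluence: float) -> float:
    """Cross-section (cm^2) = observed events / fluence (particles/cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence


def per_bit_cross_section(device_sigma: float, num_bits: int) -> float:
    """Convert a whole-device cross-section (cm^2) to cm^2 per bit."""
    return device_sigma / num_bits


def device_cross_section(bit_sigma: float, num_bits: int) -> float:
    """Scale a per-bit cross-section (cm^2/bit) up to the whole device."""
    return bit_sigma * num_bits


# Hypothetical test: 560 upsets observed after 1e10 particles/cm^2.
NUM_BITS = 5_600_000
device_sigma = cross_section(events=560, fluence=1.0e10)
bit_sigma = per_bit_cross_section(device_sigma, NUM_BITS)
```

Dividing by the number of bits gives the per-bit value plotted in curves such as Figure 3; multiplying back recovers the device-level sensitivity.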
In addition to the basic sensitivity, it is important to understand how SEUs affect the functioning of the user circuit. As shown in Caffrey et al. [2002b], 91% of the SEUs occur in the configuration memory. The remaining 9% includes user flip-flops (0.4%), LUT bits (6.4%), and BlockRAM (2.1%). Whereas the relative proportion of these different types of memories is often similar for other Virtex components, the raw number of bits for each type is particular to the specific component. Graham et al. [2003b] break down how these SEUs can affect the user circuit, including changing the logic equation in the LUT; causing routes to bridge, open, or short; changing the value stored in half-latches used for logical constants; and changing the "state" configuration of the configuration logic blocks and user flip-flops. Although the number of embedded "hard" cores has increased and the architecture has changed over the years, this categorization remains valid with today's FPGAs. Due to the large size of the reconfigurable fabric, most of the radiation effects continue to be dominated by SEUs in the programmable logic and routing.
SEU Emulation
SEU emulation is the most useful tool that LANL has used for estimating the effects of SEUs on user circuits because it provides a method for mimicking SEUs without the use of a particle accelerator to study, understand, and improve the fault tolerance of flight designs. Within the radiation effects community, there is a certain amount of controversy about using emulation or simulation tools because these tools have limitations and there are some basic differences between intentionally injected faults and actual radiation-induced behaviors. Developing and validating an FPGA emulation system is straightforward, though. Emulation, done properly, gives the designer a set of behaviors and bit locations that should be consistent with the error signatures and bit locations found in radiation testing. The SEU emulation systems developed jointly by LANL and BYU have been used extensively for benchtop testing once validated using radiation sources.
The SEU emulation used for CFE was based on the Systems Level Applications of Adaptive Computing (SLAAC)-1V board that was designed by the Information Science Institute [Schott et al. 1999]. The SLAAC-1V board has three Xilinx Virtex 1000 FPGAs, as shown in Figure 4. The Processing Element (PE)1 and PE2 FPGAs operate synchronously and, under normal circumstances, behave identically. During SEU emulation, partial configuration is used to alter the configuration memory of PE1 to mimic the SEU. Once the SEU is injected, PE1 and PE2 are executed with the same input vectors for 215μs. During this time period, approximately 250,000 randomly generated input vectors are input and executed in the circuit. It is important that sufficient time and input vectors are used at this stage to allow for fault propagation in the circuit and reasonable coverage of input-sensitive faults. PE0 monitors the outputs of PE1 and PE2 to determine if the outputs are identical. When the outputs differ, the configuration bit is logged as a sensitive bit. All SEUs are removed through partial reconfiguration, and the PE1 and PE2 FPGAs are reset before the next SEU is injected so that each bit is tested independently. The current testing procedure for the SLAAC-1V board takes at least 215μs to test a single configuration bit and at least 20 minutes to test the entire 5.6 million bit configuration bitstream.
Fig. 4. The SLAAC-1V conceptual design highlights the role of the three FPGAs labeled PE0-PE2. PE0 is used to control the experiment. PE1 is the FPGA that is corrupted with emulated SEUs. PE2 is the FPGA that is not corrupted. PE1 and PE2 are operated in lockstep to determine whether the emulated SEUs cause incorrect output.
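The inject, compare, and repair loop described above can be sketched in software. The Python below is only a toy model under stated assumptions: `toy_circuit` is a hypothetical stand-in for a configured design (a single 4-input lookup table followed by unused bits), not the SLAAC-1V hardware flow, and `find_sensitive_bits` mimics the lockstep comparison between a golden and a corrupted copy. The `campaign_minutes` helper simply multiplies the per-bit test time and bitstream size quoted in the text.

```python
import random

CONFIG_BITS = 5_600_000    # bitstream size quoted in the text
SECONDS_PER_BIT = 215e-6   # per-bit test time quoted in the text


def campaign_minutes(num_bits, seconds_per_bit):
    """Lower bound on exhaustive campaign time, in minutes."""
    return num_bits * seconds_per_bit / 60.0


def toy_circuit(config, x):
    """Hypothetical design: bits 0-15 form a LUT truth table; the rest are unused."""
    return config[x & 0xF]


def find_sensitive_bits(config, circuit, num_vectors=1000, seed=0):
    """Flip each configuration bit in turn and compare against the golden copy."""
    rng = random.Random(seed)
    sensitive = []
    for bit in range(len(config)):
        faulty = list(config)   # fresh copy, so each bit is tested independently
        faulty[bit] ^= 1        # inject the emulated SEU
        vectors = (rng.randrange(256) for _ in range(num_vectors))
        if any(circuit(config, v) != circuit(faulty, v) for v in vectors):
            sensitive.append(bit)   # outputs diverged: log as a sensitive bit
    return sensitive


golden_config = [0, 1] * 8 + [0] * 16   # 16 LUT bits plus 16 unused bits
sensitive_bits = find_sensitive_bits(golden_config, toy_circuit)
total_minutes = campaign_minutes(CONFIG_BITS, SECONDS_PER_BIT)
```

In this toy case only the 16 truth-table bits are reported as sensitive, which illustrates why only a fraction of a bitstream affects the output, and the time estimate comes out just over 20 minutes, consistent with the figure quoted above.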
The SEU emulation system was validated by testing several circuits in both emulation and proton radiation so the results could be compared. Multiple approaches were used to compare fault emulation data with radiation test data. It was determined that the percentage of radiation-induced sensitive bits out of all SEUs was within 1% of the percentage of emulation-found sensitive bits out of the entire bitstream.
Although SEU emulation will never replace accelerated testing, especially for the basic SEL, SEU, and SEFI characterization of components, it continues to be used heavily at LANL and other organizations to study design alternatives and predict on-orbit error rates for flight designs. The SLAAC-1V SEU emulation system and the CFE payload were used for testing user circuit designs before the launch. The initial results for determining limitations in circuit mitigation were collected using a Virtex-II fault emulation system designed at LANL [Quinn et al. 2007b]. Later, fault emulation was implemented for the Virtex-4-based MRM so that in-system fault emulation could be completed before launch [Quinn et al. 2013]. Many other organizations have created FPGA fault emulation systems for research and risk reduction purposes [Alderighi et al. 2007; Berg et al. 2008; Jacobs et al. 2012].
User Circuit Mitigation
Early in the process of investigating SEUs, it became apparent that it would be necessary to mitigate the user circuit so that the effect of SEUs could be suppressed. The best option found to date is TMR. From radiation testing, it was determined that to fully TMR a user circuit, it is necessary to triplicate all of the logic, inputs, outputs, global signals (clock and reset), and voters. Most FPGAs also have a structure called a half-latch that can be highly sensitive to SEUs [Graham et al. 2003a]. A half-latch is a weak keeper circuit that is used to generate the constants "1" and "0" [Quinn et al. 2009] while using fewer transistors and less silicon area than a "full" latch. There are many details in properly mitigating a circuit, including applying mitigation to the user circuit and half-latches. In this section, we discuss persistent errors and half-latch errors, including methods for mitigating them.
All user circuits have a unique sensitivity to SEUs. Only about 1%-20% of the bitstream is found to be sensitive, where an SEU in a particular bit is likely to cause an observable output error. The actual percentage for a given design implementation is based on a number of factors including how much of the FPGA the design uses and how well the design can natively mask faults. Most of the bitstream is devoted to defining routing and cannot be completely used at one time logically because defining one route negates the use of many other routes.
The structure of the user circuit also matters. In particular, feedforward circuits, such as adders or multipliers, generally recover quickly to correct operation once the SEU is removed from the FPGA. In circuits with feedback, including counters and state machines, it is possible that an SEU could cause persistent state corruption, called a persistent error. Figures 5(a) and 5(b) show the arithmetic difference ("magnitude") between the output of two FPGAs operating in lockstep when one of the FPGAs is in error [Morgan et al. 2005]. Figure 5(a) shows that for a feedforward circuit there is no error in computation until the SEU is introduced into the FPGA; then, the SEU causes a highly varying arithmetic difference between the two FPGAs until the SEU is repaired. In contrast, Figure 5(b) shows that for a feedback circuit the erroneous output is not removed with the SEU. If a persistent error occurs, the only way to recover correct output is to reset the user circuit, restart the calculation, and lose all intermediate data. Persistent errors can also be difficult to detect, which can prevent in-situ correction. Triplicating the feedback circuit such that majority voters are inserted into the feedback loops masks many of the state-corrupting SEUs. This mitigation was shown to be highly effective in both fault emulation and radiation testing [Morgan et al. 2005].
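The difference between a persistent error and a voted feedback loop can be illustrated with a small software model. The Python sketch below is an assumption-laden toy, not the BLTmr implementation: it models a triplicated 8-bit counter, flips one state bit in one replica at a chosen cycle, and either votes inside the feedback path or lets each replica run free. The function names and the choice of flipped bit are illustrative.

```python
def majority(a, b, c):
    """Bitwise majority vote of three integers."""
    return (a & b) | (a & c) | (b & c)


def run_counter(cycles, upset_cycle, feedback_voting, width=8):
    """Simulate three counter replicas with one emulated upset in replica 0."""
    mask = (1 << width) - 1
    state = [0, 0, 0]
    for cycle in range(cycles):
        if cycle == upset_cycle:
            state[0] ^= 0b100   # emulated SEU flips one state bit
        if feedback_voting:
            # Voter inside the feedback loop: every replica reloads the voted value.
            voted = majority(*state)
            state = [(voted + 1) & mask] * 3
        else:
            # No voter in the loop: the corrupted replica keeps its bad state.
            state = [(s + 1) & mask for s in state]
    return state


with_voting = run_counter(cycles=10, upset_cycle=3, feedback_voting=True)
without_voting = run_counter(cycles=10, upset_cycle=3, feedback_voting=False)
```

With voting in the loop, all three replicas agree again on the cycle after the upset. Without it, replica 0 stays offset indefinitely; an output voter would still mask that single bad replica, but a later upset in a second replica could no longer be outvoted.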
Half-latches also need to be mitigated. Half-latches are an artifact of mapping user circuits described in a hardware description language to the FPGA architecture and are often used to tie off unspecified inputs. For example, if the carry-in signal for an adder is not specified, the input to the fast carry chain on the Virtex FPGA needs to be tied to ground. In this case, a corrupted half-latch can inject bad data into the user circuit through the fast carry chain. The half-latches in the Virtex XQVR1000 FPGA are an interesting aspect of this FPGA because they are set through the full reconfiguration process, are sensitive to SEUs, and cannot be scrubbed. Because scrubbing does not correct the error, it appears as if the circuit is persistently upset. It is possible to clear a corrupted half-latch through a full reconfiguration, but doing so requires fast detection of errors in the output stream so that a reconfiguration and reset can be triggered. It is easier to remove half-latches through one of these two methods:
(1) Replace half-latches with constant Lookup Tables (LUTs), or (2) Extract half-latches to input signals so that the signal can be driven from off chip.
It should be noted that the persistence issue with half-latches only exists in the Virtex generation of FPGAs. In the Virtex-II and later, the weak keeper circuits leaked more and were able to leak back to their original state [Quinn et al. 2009 ].
In later generations, constant LUTs are used more frequently. Constant LUTs can be used as regional constants that can be shared by multiple portions of the circuit. Because constant LUTs have all of the usual radiation failure modes as other LUTs, sharing constant LUTs can cause single points of failure in the design if not disentangled.
LANL and BYU have both been active in developing tools for mitigating user circuits, as have many other organizations [Bridgford et al. 2008; Samudrala et al. 2004; Sutton 2013; Do 2011] . BYU, in cooperation with LANL, developed a software tool to apply partial mitigation automatically on any Electronic Design Interchange Format (EDIF) formatted design. The BYU-LANL Triple Modular Redundancy (BLTmr) tool uses best practices for mitigating user circuits. This tool was initially designed to meet the needs for CFE, including the ability to apply partial mitigation in a manner that discovered and then applied mitigation to the most critical portions of the circuit [Pratt et al. 2006] , including half-latches. LANL also developed a tool called RadDRC that only mitigated half-latches [Graham et al. 2003a ] and disentangled constant LUTs [Quinn et al. 2009 ]. Both BLTmr and RadDRC have been tested using fault injection and radiation testing. Furthermore, all of the CFE user circuits have used BLTmr and RadDRC for mitigation. BLTmr was also used to mitigate the Virtex-4 components used for MRM.
Scrubbing
An Actel anti-fuse-based, one-time programmable FPGA and a BAE Systems R6000 Radiation-Hardened-by-Process (RHBP) microprocessor provide watchdog monitoring, configuration, SEU scrubbing, and SEFI recovery. The scrubbing system provides SEU detection for the Virtex FPGAs by continuously reading each Xilinx configuration frame, calculating a Cyclic Redundancy Check (CRC) for each frame, and comparing the CRC with the precalculated "golden" codebook of correct CRCs stored in local SRAM. Each FPGA is read every 180ms while the device is in operation with no interruption in service. The goal of the SEU detection scheme is to detect a high percentage (99% or so) of upsets and trigger fast recovery of correct operation, thus resulting in a small system cost in terms of power and performance. Once an SEU is detected, data are marked as invalid and the system is reset. The payload was also designed to provide complete diagnostic information regarding any detected SEUs, including both time and hardware location of occurrence. These data are collected by the ground station, allowing both mapping of the geographical location of the SEUs and comparison of the actual rates with prelaunch predictions.
The algorithm for configuring and scrubbing the Virtex FPGAs is shown in Figure 6. The CFE scrubber is co-designed between the Actel FPGA and the BAE R6000 radiation-hardened microprocessor. When an SEU is detected, information about the SEU is given to the microprocessor so that the data can be returned to the ground station and so that the correction process is coordinated. Because the codebook is stored in memory that is not SEU-immune, the microprocessor also determines whether the codebook has had an SEU and scrubs the codebook memory.
The microprocessor also coordinates SEFI detection. The microprocessor keeps track of whether an unusually large number of frames have been upset simultaneously and/or whether a frame appears to be "unscrubbable." Both of these issues are symptoms of the SelectMAP SEFI, which affects the functionality of the SelectMAP port. When the SelectMAP SEFI is triggered, the SelectMAP port will be unable to read and/or write correctly. When either of these situations arises, the microprocessor can coordinate a full or offline reconfiguration of the affected FPGA.
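The detection logic described in this section (per-frame CRC compared with a golden codebook, plus the two SEFI symptoms) can be sketched as follows. This Python is a simplified illustration under several assumptions: it uses `zlib.crc32` because the CRC polynomial used on CFE is not stated here, the frame contents and the `SEFI_FRAME_THRESHOLD` value are hypothetical, and `rewrite_frame` is a placeholder callback for whatever mechanism restores a frame. It also omits the data invalidation and reset that follow a detected SEU on the flight system.

```python
import zlib

SEFI_FRAME_THRESHOLD = 8   # hypothetical cutoff for "unusually many" upset frames


def build_codebook(golden_frames):
    """Precompute the golden CRC for every configuration frame."""
    return [zlib.crc32(frame) for frame in golden_frames]


def find_upset_frames(frames, codebook):
    """Return indices of frames whose CRC no longer matches the codebook."""
    return [i for i, frame in enumerate(frames) if zlib.crc32(frame) != codebook[i]]


def scrub_pass(frames, codebook, rewrite_frame):
    """One readback pass: classify the device as clean, SEU, or suspected SEFI."""
    upset = find_upset_frames(frames, codebook)
    if len(upset) >= SEFI_FRAME_THRESHOLD:
        return "SEFI", upset              # many frames upset at once
    for i in upset:
        rewrite_frame(i)                  # attempt to scrub the frame
        if zlib.crc32(frames[i]) != codebook[i]:
            return "SEFI", upset          # frame appears unscrubbable
    return ("SEU" if upset else "clean"), upset


# Toy usage: 32 hypothetical 16-byte frames with one bit flipped in frame 5.
golden = [bytes([i]) * 16 for i in range(32)]
codebook = build_codebook(golden)
frames = list(golden)
frames[5] = bytes([golden[5][0] ^ 0x01]) + golden[5][1:]


def restore(i):
    frames[i] = golden[i]


status, upset_frames = scrub_pass(frames, codebook, restore)
```

Storing only one CRC per frame keeps the codebook small, which fits the text's description of a codebook held in local SRAM that must itself be checked for upsets.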
One problem with the Virtex-era FPGAs is that the SRL16s, LUTRAM, User Flip-Flops, and BlockRAM could be corrupted through readback operations. In more modern FPGAs, these areas are maskable. The CFE scrubber specifically avoids readback in the User Flip-Flops and BlockRAM, which involves skipping certain columns and frames. The problems with LUTRAM and SRL16s were avoided by not using these resources. It is often suggested to avoid the use of BlockRAM in the earlier Virtex FPGAs, where BlockRAM scrubbing is not straightforward. It is still possible to use the BlockRAM as long as users understand that SEUs could affect their data.
PRELAUNCH TESTING AND MODELING
Because of the low altitude and low inclination for the CFE satellite orbit, the primary radiation interaction is predicted to be protons in the South Atlantic Anomaly (SAA). The SAA is the area where the trapped proton belt in the inner Van Allen belt comes closest to Earth. Whereas the auroral zones at the poles often bring in more protons and heavy ions on the magnetic field lines, the satellite is shallowly inclined, and these interactions are not expected.
Omere [Beutier et al. 2003] and SPENVIS [Kruglanski et al. 2009] were used to model the radiation environment. The map of the proton flux from Omere is shown in Figure 7. This map shows the intensity of the flux of ≥30MeV protons from the trapped proton belt projected onto the Earth as subsatellite locations. It is not possible to create the same type of map for heavy-ion interactions. MAST data shown in Cummings et al. [1993] indicate that oxygen atoms would be predominantly found at the poles and the southern portion of the SAA. Figure 8 shows a breakdown of flux by ion and energy, as calculated by CREME96 [Tylka et al. 1997]. Although this graph does not show the locations of these ions, it does show the most common ion interactions for the payload. The most common ions are hydrogen (protons), helium, and carbon. Any of these ions could cause SEUs in FPGAs.
Shielding the spacecraft from solar conditions has a first-order effect on the radiation environment. Shielding moderates ions such that the interior of the spacecraft should have a less intense radiation environment than its outside. There is also an inverse relationship with the solar cycle, such that the inner Van Allen belt radiation environment is more intense when the sun is quiet (without sunspots). Increased solar activity can increase the possibility of solar flares and coronal mass ejections, which can temporarily increase X-ray and particle fluxes, respectively. Before launch, three predictions were developed for different solar conditions: solar maximum, solar minimum, and a coronal mass ejection model for "worst-day" solar conditions. These spectra were convolved with the proton and heavy ion cross-sections to determine predicted error rates. This type of calculation can be handled by a number of tools, including CREME96 [Tylka et al. 1997]. Table I shows a prediction for SEU rates based on these solar conditions. This table shows that the predicted SEU rate at solar minimum is 60% higher than the predicted SEU rate for solar maximum conditions due to the increased proton flux during solar minimum. The predicted SEU rate for the worst-day spectra is 58% higher than the predicted SEU rate for solar maximum due to an increase in both proton and heavy ion particle fluxes. In all of these cases, the SEU response to protons dominates the predicted SEU rate.
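The idea of convolving a particle spectrum with a cross-section curve can be sketched numerically. The Python below is a deliberately simplified illustration, assuming tabulated per-bit cross-sections and differential flux on a shared energy grid and a plain trapezoidal integral; tools such as CREME96 use more detailed models. The function name, grid, and values are hypothetical placeholders and do not come from the CFE predictions.

```python
def fold_rate(energies, bit_cross_sections, differential_flux, num_bits):
    """Approximate rate = num_bits * integral of sigma(E) * flux(E) dE.

    energies:            ascending grid, e.g. MeV
    bit_cross_sections:  cm^2/bit at each grid point
    differential_flux:   particles/(cm^2 * day * MeV) at each grid point
    Returns upsets per day for the whole device.
    """
    if not (len(energies) == len(bit_cross_sections) == len(differential_flux)):
        raise ValueError("all tables must share the same energy grid")
    total = 0.0
    for i in range(len(energies) - 1):
        left = bit_cross_sections[i] * differential_flux[i]
        right = bit_cross_sections[i + 1] * differential_flux[i + 1]
        total += 0.5 * (left + right) * (energies[i + 1] - energies[i])
    return total * num_bits


# Hypothetical flat spectrum and flat cross-section, used only to check units.
example_rate = fold_rate(
    energies=[10.0, 20.0, 30.0],
    bit_cross_sections=[1.0e-14, 1.0e-14, 1.0e-14],
    differential_flux=[100.0, 100.0, 100.0],
    num_bits=1_000_000,
)
```

Running the same integral with solar minimum, solar maximum, and worst-day spectra would give one rate per condition, which is the structure of the comparison reported in Table I.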
In addition to SEUs, Multiple-Bit Upsets (MBUs) and SEFIs are a concern. In both cases, these effects can be viewed as special types of SEUs. In the case of the MBU, the ionizing particle causes more than one SEU at a time. In ground testing of the XQVR1000, it was observed that 7.5% of SEUs from heavy ions are MBUs and 0.04% of SEUs from protons are MBUs. When the solar minimum predicted SEU rate is scaled by these percentages, the predicted MBU rate would be $6.0 \times 10^{-3}$ MBUs/satellite-day. In the case of a SEFI, the SEU affects control logic, which causes the component to malfunction operationally until reset. The SelectMAP SEFI is approximately the size of 40 bits. When the predicted SEU rate is scaled, the predicted SEFI rate would be approximately $4.7 \times 10^{-5}$ SEFIs/satellite-day, or one SEFI every 58 years. Although the occurrence of a SEFI might be rare, the arrival time of the events is unknown. Whereas on average SEFIs should occur every 58 years, there is nothing to stop one from occurring on the first day of the mission.
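The conversion from a predicted daily rate to a mean interval is simple arithmetic, sketched below in Python. The helper name and the 365.25-day year are assumptions made for illustration; the rate is the predicted SEFI rate quoted above.

```python
DAYS_PER_YEAR = 365.25   # assumed average year length


def mean_years_between_events(rate_per_day):
    """Mean interval between events, in years, for a constant daily rate."""
    if rate_per_day <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate_per_day / DAYS_PER_YEAR


# Predicted SEFI rate from the text: 4.7e-5 per satellite-day.
sefi_interval_years = mean_years_between_events(4.7e-5)   # roughly 58 years
```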
There is a lot of uncertainty in these predictions. There are experimental uncertainties in the cross-sections. The space radiation environment has its own uncertainties due to the dynamic environment. The operation of the payload can also affect the measured SEU rate if the payload is off for long periods of time. As a general rule of thumb, it is assumed that the actual rates are within a factor of 2 of the predicted rates on average. Because the operational profile is not known until launch, the rates should also be considered a worst-case rate until then.
FLIGHT EXPERIENCE
The CFE satellite has been in orbit for more than 7 years. During this period, the satellite transitioned from solar minimum conditions to solar maximum conditions. It experienced one of the deepest and longest solar minimum periods. Figure 9 shows a graph of sunspots as a function of year [SILSO 2013] . Usually, the solar minimum is not completely free of sunspots, and the solar maximum period has a sunspot number that is two to three times the current sunspot number. We are artificially marking the transition from solar minimum to solar maximum as January 2012, when a series of solar flares affected a number of satellites. In practice, the transition is not as clean. On any given day, the sun could be quiet or active. There were many days in 2012 when solar minimum conditions dominated the radiation environment. Furthermore, this particular solar cycle appears to be having a "double-humped" peak, as shown by the decrease in sunspots in 2012 and 2013.
In this section, we present information about how the Virtex FPGAs have been faring. This discussion includes geographic location of SEUs, a comparison of predicted SEU rates to actual SEU rates, and affected FPGA resources.
SEU Locations
Figure 10 shows the geographic locations of the CFE-detected SEUs as projected onto the Earth using subsatellite locations. From this figure, it is clear that the largest density of SEUs (red) and most of the MBUs (black) have occurred in the SAA. Figure 11 shows the SEUs as a heat map, so that the intensity of SEUs in the SAA can be directly compared to the Omere flux map in Figure 7. The densest region of SEUs correlates to the most intense flux region. Although there is reasonable north-south coverage of the SAA, there are fewer SEUs in the eastern region of the SAA than expected. This result correlates with LANL's other on-orbit FPGA system, MRM [Quinn et al. 2013]. Approximately 5% of the SEUs have occurred outside of the SAA. Many of those locations are near the highest and lowest portions of the orbit, where geomagnetic shielding of cosmic rays is the lowest. Therefore, these SEUs are more likely caused by heavy ions, whereas the SEUs in the SAA are mostly proton-induced. Many of the heavy-ion SEUs occurred during solar minimum conditions, when cosmic rays peak. Although more heavy-ion SEUs are occurring during solar maximum conditions, most of the on-orbit time is still dominated by solar minimum conditions.
SEU Predictions
As of May 16, 2014, there have been a total of 2,816 SEUs. Table II shows the breakdown by FPGA. When compared FPGA-to-FPGA, these numbers look reasonably similar, especially given that radiation events are a Poisson random process and that each of the FPGAs is scheduled independently. Using our delineation of solar minimum and maximum periods, the average
Figure 12 shows the breakdown of SEUs/satellite-day since launch. Because CFE has been operational for many years now, the data are binned into quarters to make the amount of information more manageable. Looking at quarters instead of months also smooths out some of the randomness of the Poisson statistics. The SEUs/satellite-day are 50% with a 95% confidence interval of (23%, 725.30%) of the prediction for solar minimum and 45% with a 95% confidence interval of (11%, 956%) of the prediction for solar maximum, which places the predicted SEU rate as reasonably close to that expected for Poisson statistics. These values also indicate that the SEUs/satellite-day rate for solar maximum is 58% of that for solar minimum with a 95% confidence interval of (47%, 309%).
As of May 16, 2014, CFE has experienced 11 MBUs (eight in solar minimum conditions and three in solar maximum conditions). Based on the predicted MBU rate, CFE should have experienced five MBUs since launch. The actual MBU rate is 2.3 times higher than the predicted MBU rate at 0.013 MBUs/satellite-day with a 95% confidence interval of (0.007, 0.037) MBUs/satellite-day. The percentage of MBUs out of all SEUs is 0.4% with a 95% confidence interval of (0.2%, 0.8%).
As of May 16, 2014, there have been no SEFIs. With SEFIs occurring every 58 years, on average, there would have only been an 11% probability of having a SEFI during the 7 years on orbit. Therefore, the lack of SEFIs is not unexpected.
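The 11% figure is what a Poisson model gives for these numbers. The following is a minimal sketch, assuming a constant rate of one SEFI per 58 years and 7 years of exposure; the function name is illustrative and not taken from the CFE tools.

```python
import math

def prob_at_least_one(mean_years_between_events, years_on_orbit):
    """P(at least one event) for a Poisson process with a constant rate."""
    expected_events = years_on_orbit / mean_years_between_events
    # P(no events) = exp(-expected_events) for a Poisson process.
    return 1.0 - math.exp(-expected_events)

# Values from the text: one SEFI per 58 years on average, 7 years on orbit.
p_sefi = prob_at_least_one(58.0, 7.0)
print(f"P(at least one SEFI in 7 years) = {p_sefi:.1%}")  # prints 11.4%
```

Under the same assumption, the probability of seeing no SEFIs at all is about 89%, which is why the absence of SEFIs is the expected outcome.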
Based on predictions, CFE should not be affected by CMEs, and the magnetosphere would moderate most of the increased particle flux. Since the MRM launch in 2011, we have been correlating CMEs that occurred with increased upset rates in the MRM and CFE platforms. For the most part, this prediction has been true. We were particularly interested in CMEs that occurred in 2012 and 2013, so that we could correlate the SEU rates to MRM. The January 2012 flare affected both CFE and MRM, with both satellites experiencing higher than average SEU rates. The next event was a series of flares in July 2012. These flares caused the SEU rates on MRM to double, but did not cause an increase in SEU rates on CFE. Finally, the solar flare that occurred in April 2013 caused a three-order-of-magnitude increase in proton radiation at GEO, but did not affect either MRM or CFE. Although it is not possible to predict accurately the effect of CMEs on satellites due to their uncertain and individual nature, the prediction that solar flares would not cause much of an effect on CFE appears to be accurate.
4.2.1. Discussion. For some time, there has been a concern that the rate prediction tools do not accurately determine SEU rates for FPGAs. One reason why some of the actual SEU rates are low is that the measured rate reflects the operational behavior of the payload. When the payload is off for any reason, no SEUs are recorded. For example, CFE has been on-orbit for 2,330 days, but the payload has only been operational for 1,064 of those days. Thus, the payload is operational approximately 46% of the time. The other complicating factor for LEO satellites is the portion of the day that the payload is operational, as nearly all of the SEUs will occur in the SAA. For CFE, the concept of operation often means the payload is off for part of the day.
For most of 2007-2009, the concept of operation included not using the payload in the SAA, which meant that the payload was not operational in the location where most of the SEUs would be reported. The concept of operation changed again in 2009 to include more operational time in the SAA, but included the payload being turned off when it was in eclipse, for power management. Although this change increased coverage of the SAA, there are times when SAA passage occurs during the eclipse, when the payload is not operational. Figure 13 shows the number of days per quarter that the payload is operational in terms of both total time and time in the SAA. The operational days were determined by the sum of the total number of seconds operational divided by the number of seconds in a day. The operational days in the SAA were determined in a similar manner, although using the total amount of time operational in the SAA and the total number of seconds spent in the SAA. We developed a program for determining how many seconds the spacecraft is operational in the SAA every day. We then determined the average number of seconds the spacecraft should be operational in the SAA on any given day using Satellite Toolkit, which is 11,012 seconds. Note that there are times when the day's mission is predominantly focused on only the SAA, such that the operational day in the SAA is completed but the operational day in other locations is not. This situation causes the operational days in the SAA to exceed the total operational days.
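The operational-day bookkeeping described above can be sketched as follows. This is a minimal illustration, not the program used for CFE: the three-day log is hypothetical, and only the constants 11,012 seconds, 2,330 days, and 1,064 days come from the text.

```python
SECONDS_PER_DAY = 86_400
AVG_SAA_SECONDS_PER_DAY = 11_012  # average SAA time per day, from the text

def operational_days(seconds_on_per_day):
    """Total operational days: summed operational seconds / seconds in a day."""
    return sum(seconds_on_per_day) / SECONDS_PER_DAY

def operational_saa_days(saa_seconds_on_per_day):
    """Operational SAA days: summed operational SAA seconds / average SAA seconds per day."""
    return sum(saa_seconds_on_per_day) / AVG_SAA_SECONDS_PER_DAY

# Hypothetical log: (seconds operational, seconds operational inside the SAA) per day.
log = [(43_200, 11_012), (21_600, 11_012), (0, 0)]

total_days = operational_days(total for total, _ in log)
saa_days = operational_saa_days(saa for _, saa in log)
print(f"operational days: {total_days:.2f}, operational SAA days: {saa_days:.2f}")

# Overall duty cycle quoted in the text: 1,064 operational days out of 2,330 on orbit.
duty_cycle = 1_064 / 2_330
print(f"payload duty cycle: {duty_cycle:.0%}")
```

In this hypothetical log the payload covers two full SAA days while accumulating only 0.75 total operational days, which is the SAA-focused situation in which operational days in the SAA exceed total operational days.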
For systems with operational gaps and uneven coverage of the collection region, the actual SEU rate needs to be calculated using Equation (2):
This correction has been used on both CFE and MRM. In both cases, the actual SEU rates are within 50% of the predicted SEU rates. Therefore, for these cases, there is no evidence that the prediction tools are inadequate. It should be noted that these specific changes to operational time in this article reflect this payload's usage and will be different for each system. Because it can be difficult to predict how the system will work operationally before launch, it is best to assume that the predicted SEU rates are worst-case scenarios until the deployed operational behavior can be determined.
Affected Memory Types
During the research phase of the CFE project, categorizing how SEUs affected the reconfigurable fabric was of interest because the consequences of the SEUs were directly tied to the resource affected [Graham et al. 2003b]. To develop an understanding of how the SEUs affected the FPGAs, we developed analysis tools for translating the bit position of errors in the readback bitstream to logical addresses in the FPGA. These tools allow us to determine whether SEUs occurred in any of the major memory resources on the FPGA: input/output buffers (IOBs), BlockRAM interconnect, BlockRAM, Configurable Logic Blocks (CLBs), and clocking. These tools have been modified to take input from the satellite logs. Because these data come from the scrubber and BlockRAM scrubbing was not implemented, there are no data on the BlockRAM.
Before discussing how the resources are affected, it is important to discuss whether the SEUs are uniformly and randomly distributed across the FPGAs. Figure 14 shows the histogram of frames affected by SEUs for all FPGAs. Because this particular FPGA has 4,775 frames, the frame numbers are binned in groups of 100 frames. There is an is lower than the rest of the bins. When individual frames are analyzed, 55% of all frames have not had SEUs, 33% of all frames have had one SEU, and 12% of all frames have had 2-6 SEUs. If there is more than one upset in a frame, the most common cause is that multiple FPGAs had SEUs in the same frame at different times. In the very rare case where a single FPGA gets multiple SEUs in the same frame, these occurrences were separated in time. These data indicate that the SEU data are sparse, not clustered across particular frames, and spread uniformly around the FPGAs.
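The per-frame percentages can be compared against what a uniform, random distribution would produce. The sketch below assumes that all 2,816 SEUs from the nine FPGAs are overlaid on a single 4,775-frame map, that each SEU touches exactly one frame, and that every frame is equally likely to be hit; these are simplifying assumptions made for illustration, not the authors' analysis.

```python
import math

TOTAL_SEUS = 2_816  # all SEUs as of May 16, 2014 (from the text)
FRAMES = 4_775      # configuration frames in this FPGA (from the text)

# Mean number of SEUs per frame if upsets land uniformly at random.
mean_per_frame = TOTAL_SEUS / FRAMES

# Poisson probabilities for a frame seeing 0, 1, or at least 2 SEUs.
p_zero = math.exp(-mean_per_frame)
p_one = mean_per_frame * p_zero
p_two_plus = 1.0 - p_zero - p_one

print(f"no SEUs:   {p_zero:.0%}")      # prints 55%
print(f"one SEU:   {p_one:.0%}")       # prints 33%
print(f"2 or more: {p_two_plus:.0%}")  # prints 12%
```

Under these assumptions the Poisson model gives the same 55%, 33%, and 12% split as the on-orbit data, which is consistent with the conclusion that the SEUs are not clustered in particular frames.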
In Table III , the percentage of SEUs is shown as a function of affected resource. The percentage of the reconfigurable fabric (minus the BlockRAM) dedicated to each resource is also shown. The percentage of SEUs by resource closely tracks the percentage of fabric each resource takes. This result is unique to the Virtex XQVR1000 FPGA. Starting in the Virtex-II, the BlockRAM interconnect is unusually sensitive to radiation. In MRM, which deployed Virtex-4 FPGAs, SEUs in BlockRAM interconnect are 37% ± 1% of all SEUs, even though it takes up only 5% of the reconfigurable fabric [Quinn et al. 2013] .
Another interest is in the breakdown of SEUs between routing and the LUTs in the CLB resources. As a rule of thumb, we have found from testing user circuits that approximately two-thirds of observable output errors come from the routing and one-third come from the LUTs. In this FPGA, the LUTs are 4.2% of the CLB resources and experience 3.78% ± 0.78% of all CLB SEUs. The routing is 95.8% of the CLB resources and experiences 96.2% ± 5.43% of all CLB SEUs. Once again, these values track the number of SRAM cells dedicated to the resource. This result is similar to the results from MRM [Quinn et al. 2013].
Fig. 15. Prediction of CFE de-orbiting. CFE will gradually decrease in altitude until late 2022, at which point increased drag from the atmosphere will rapidly cause it to reenter the atmosphere.
Codebook SEUs
An SEU in the codebook has occurred once since launch. It should be noted that the expected SEU rate of the radiation-hardened memory was so low that SEUs in the codebook were not expected during the mission. Because of the predicted SEU rate, the check on the codebook was not implemented until after launch. Although a minor problem on orbit, it remains the only "lesson learned" from the CFE mission.
De-orbiting
CFE started to de-orbit at the beginning of solar maximum because of the increased drag caused by magnetosphere changes. The prediction of the changes in altitude in upcoming years is shown in Figure 15. In the next year, it is predicted that the orbit will decrease to 500 km. Because of the decreasing altitude, the SEU rates will also decrease. The decrease of SEU rates from 560 km to 500 km is 30%. Therefore, it is expected in the 2015 time frame that the SEU rates will drop to 1.3 SEUs/satellite-day. The satellite is predicted to enter the Earth's atmosphere in the 2023 time frame.
MRM
Since the CFE launch, LANL has continued to fly Xilinx FPGAs in different missions. The most commonly known post-CFE mission is MRM, which is another RF sensor implementing an SDR in FPGAs. MRM was launched in 2011 on a U.S. Department of Defense (DoD) satellite. In this section, a very short introduction to the MRM mission is given, including design and radiation differences from CFE. More information on MRM can be found in Quinn et al. [2013] .
The MRM hardware design is based on Xilinx Virtex-4 FPGAs. Instead of three RCC boards each with three FPGAs that could work independently or together, MRM has two independent units with two Virtex-4 FPGAs (one XQR4VLX200 and one XQR4VSX55) in each unit. Given the size of these FPGAs, it is possible to do the complete CFE mission in each unit. MRM was also integrated with four independent antennae, so it is possible to record RF data from two different antennae simultaneously using both units. Like CFE, MRM has state-of-the-art Synchronous Dynamic Random Access Memory (SDRAM), Quad-Data Rate (QDR) SRAM, and Analog-to-Digital Converters (ADCs). Because uploading new bitstreams is less likely on a hosted system, MRM was preloaded with two applications. A third application was uploaded in 2013.
The MRM mitigation scheme was based on the CFE design without significant differences. BLTmr was used to mitigate the user circuits, which necessitated updating the EDIF parser to handle Virtex-4 constructs. A scrubber is used to keep SEUs from accumulating. Because of the issues with the codebook in CFE, MRM's scrubber was designed from the beginning to handle SEUs in the codebooks. The MRM scrubber also does not reset the unit each time an SEU occurs, which increases the availability of the units but could lead to occasional data corruption. Finally, the bitstreams are compressed in nonvolatile memory, which significantly reduces the amount of nonvolatile memory needed to store them.
The MRM orbit is significantly harsher than the CFE orbit. Between the harsher orbit and the larger FPGAs, the MRM SEU rate is significantly higher than CFE's. On average, each unit has 14.4 in the SX FPGA. The same corrections to the predicted SEU rate were completed for MRM, and the on-orbit SEU rate is 53% ± 3% of the predicted SEU rate. SEFIs have been more common on MRM than on CFE, with five SelectMAP SEFIs since launch. The Virtex-4 FPGAs are significantly more sensitive to MBUs than Virtex FPGAs, and 6.37% ± 0.49% of all SEUs are MBUs.
Finally, due to particular design features of MRM, MRM is experiencing a lifetime failure mode. In MRM, the FPGA bitstreams are stored temporarily in SDRAM before the codebook is populated. Due to the concept of operations for the host satellite, it is necessary to repopulate the codebook once in orbit. This operation can be completed by reading the decompressed bitstreams in SDRAM, as long as the application is not being changed. In January 2014, we noticed that one of the SDRAM bits had started to telegraph, that is, its value fluctuates randomly [Chugg et al. 2009]. Random telegraphing is a sign that the bit is soon to become stuck permanently at one value. The bit is weak enough that it needs to be "refreshed" through overwriting each orbit before the codebook is loaded. Without refreshing, it is not possible to load the bitstream properly in the LX FPGA of the second unit. We are attempting to determine how many other telegraphing or stuck bits exist in SDRAM to see if the bitstream can be moved to another SDRAM location. Ultimately, though, telegraphing and stuck bits accumulate with time, so this failure mode might eventually limit the length of the MRM mission.
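A survey for other telegraphing or stuck bits could, in principle, be run by writing a known pattern to a word and reading it back several times. The sketch below is illustrative only and is not the procedure used on MRM; `classify_bits` and the sample read values are hypothetical.

```python
def classify_bits(written, reads, width):
    """Label each bit position of a word as 'ok', 'stuck', or 'telegraphing'.

    written: the value written to the word
    reads:   values read back from the same word on repeated reads
    width:   number of bits in the word
    """
    labels = {}
    for bit in range(width):
        expected = (written >> bit) & 1
        observed = {(value >> bit) & 1 for value in reads}
        if observed == {expected}:
            labels[bit] = "ok"            # every read matched the written value
        elif len(observed) == 1:
            labels[bit] = "stuck"         # every read returned the same wrong value
        else:
            labels[bit] = "telegraphing"  # the value fluctuated between reads
    return labels

# Hypothetical 4-bit word written as all zeros and read back three times.
labels = classify_bits(0b0000, [0b1010, 0b1000, 0b1010], width=4)
print(labels)  # {0: 'ok', 1: 'telegraphing', 2: 'ok', 3: 'stuck'}
```

A bit stuck at the written value looks healthy in a single pass, so a survey of this kind would need to be repeated with the complemented pattern before a region is considered usable.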
CONCLUSION
In this article we have discussed the CFE satellite, including a number of innovations that made it possible to successfully deploy nine Virtex FPGAs in LEO. These innovations covered a wide range of topics from characterizing how FPGAs are affected by radiation to mitigating radiation effects. These concepts, as well as our tools for SEU emulation and circuit mitigation, continue to be useful for present-day systems.
The payload has been on-orbit for more than 7 years. As of May 16, 2014, the reconfigurable payload has experienced 2,816 SEUs. The on-orbit SEU rate is 45%-50% of the predicted SEU rate, when the operational behavior is taken into account. The payload has also experienced 11 MBUs since launch, which is 2.3 times the predicted MBU rate. Finally, there have been no SEFIs since launch. Because there was only an 11% chance of having a SEFI in 7 years, having no SEFIs is the expected result.
