Abstract-This paper describes ongoing research on analyzing the radiation hardness of circuit designs implemented in Field-Programmable Gate Array (FPGA) devices. Radiation induces single-event effects in FPGAs that can cause erroneous operation by upsetting data bits or changing logic behavior. Design-level techniques can help mitigate these upsets to some degree; however, there is currently no method available to quantify the benefit of these techniques after they are incorporated into a design. This research strives to develop a framework to analyze FPGA netlists and score a design in terms of upset hardness. Additionally, this framework will provide a method to determine the most upset-susceptible locations in design netlists and help identify areas of circuits that would most benefit from additional design mitigation.
I. BACKGROUND
Field-Programmable Gate Array (FPGA) devices have become one of the predominant processing technologies for space electronic designs. The reprogrammable nature of the devices, coupled with good performance and reasonable power consumption, makes them the processing platform of choice for a number of space hardware architectures. However, a challenge faced by hardware designers who utilize these devices is unwanted charge deposition within the device caused by ionizing particles. These particles can cause single-event effects (SEE), which can result in a variety of destructive and non-destructive events [1]. As most space-based systems avoid the use of parts that exhibit destructive SEE behavior, this work focuses on characterizing the effects of non-destructive SEE. In SRAM-based FPGA designs, these events can change the logic, routing, or data stored in the FPGA, leading to erroneous operation and results.
Devices incorporating radiation hardening by design (RHBD) features and special fabrication processes have been designed to help mitigate non-destructive radiation-induced events; however, recent testing shows even the latest-generation devices to be susceptible to measurable SEE [2]. The predominant SEE effects are single-event upsets (SEUs), which can change the value of memory elements, and single-event transients (SETs), which can cause momentary glitches in signals. To combat SEE, design-level mitigation may be incorporated into an FPGA design, and a great deal of research has been conducted to this end. While very effective methods exist to ensure correct operation in the presence of SEE, there are a number of drawbacks when incorporating additional mitigation into an FPGA design.
Design-level SEE mitigation techniques can be very costly in terms of resources and power, and more complex methods can also be difficult to implement without the help of automated tools. It is important to minimize resource and power penalties by selecting the least complex method that provides enough robustness to meet operating requirements. Unfortunately, there is currently no way to quantify the benefit of a specific design mitigation method after inserting it into a design without extensive testing. Additionally, verification of proper mitigation implementation is important, especially as synthesis tools have been observed to remove intentionally placed error correction circuitry after "optimizing" the circuit and classifying the mitigation as redundant [3]. For these reasons, it is very difficult for FPGA designers to ensure a reliable design without performing a significant amount of additional work.
Another consideration is the selection of the target device in which the design is to be implemented. While space-grade devices are well characterized for radiation environments, device SEE data is typically provided at the lowest level, as upset rates of individual flip-flops, configuration memory cells, and so forth. These rates do not translate directly into the overall reliability of a design built from those fundamental blocks in a particular device. Thus, estimating the reliability of an overall design is difficult, especially when trying to compare a single FPGA design implementation across multiple hardware architectures.
The primary goal of this research is to determine the radiation hardness of a particular FPGA design against SEE so that quantifiable benefits from properly added design mitigation can be easily determined at early design stages on any hardware platform without the need for extensive verification. Another area in which this research will have impact is analyzing the degree of severity of SEE in various parts of the circuit. The analysis of design hardness will provide a method for hardware designers to identify the most susceptible portions of a circuit and identify where mitigation should be placed to maximize benefit. This can allow selective mitigation of the most sensitive portions of a circuit, saving development time and resources when compared to fully mitigating the entire design. Finally, this research is ultimately striving to develop an algorithm that can combine device radiation characterization data, netlist analysis, and operational requirements to provide an anticipated upset rate along with a quantifiable reliability metric of proper design operation.
II. PREVIOUS WORK
One of the most popular methods of SEU estimation is fault injection, which involves intentionally upsetting bits in an FPGA's configuration memory for the purpose of simulating upsets in the device. This method is described in [1], [4], [5] and is popular because it does not necessarily need any additional hardware or testing infrastructure and can be performed in just about any laboratory setting. However, this method can take a significant amount of time, especially for complex circuits. Fault injection is also limited to evaluating upsets in user-accessible memory cells and cannot easily simulate SET behavior.
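The idea behind a fault injection campaign can be illustrated with a minimal software sketch. Everything here is an assumption made for illustration: the two-LUT design, the 12-bit "configuration memory", and the function names are hypothetical and do not correspond to any real bitstream format or to the tools used in [1], [4], [5]. Each configuration bit is flipped in turn, the design is exercised with all input vectors, and the bit is marked sensitive if any output differs from the golden run.

```python
from itertools import product

def eval_design(config, a, b, c):
    """Hypothetical design: out = LUT1(LUT0(a, b), c, 0).

    config[0:4]  is the truth table of the 2-input LUT0.
    config[4:12] is the truth table of the 3-input LUT1, whose third
    input is tied low, so only its even-indexed entries are ever read.
    """
    lut0 = config[0:4]
    lut1 = config[4:12]
    mid = lut0[(a << 1) | b]
    return lut1[(mid << 2) | (c << 1) | 0]

def fault_injection_campaign(golden_config):
    """Flip one configuration bit at a time and report the sensitive bits."""
    vectors = list(product((0, 1), repeat=3))
    golden = [eval_design(golden_config, *v) for v in vectors]
    sensitive = []
    for bit in range(len(golden_config)):
        faulty = list(golden_config)
        faulty[bit] ^= 1  # emulate a single configuration upset
        observed = [eval_design(faulty, *v) for v in vectors]
        if observed != golden:
            sensitive.append(bit)
    return sensitive

# LUT0 implements AND(a, b); LUT1 implements OR(mid, c), ignoring input 3.
GOLDEN = [0, 0, 0, 1] + [0, 0, 1, 1, 1, 1, 1, 1]
sensitive_bits = fault_injection_campaign(GOLDEN)
print(sensitive_bits)
```

In this toy case only 8 of the 12 bits are sensitive, because the odd-indexed LUT1 entries are never addressed. The cost of the campaign grows with the number of configuration bits times the number of test vectors, which is the scaling problem noted above for complex circuits.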
Other works such as [6], [7] investigate SEU estimation techniques, but their analysis targets mitigated designs, specifically those using Triple Modular Redundancy. The closest work to the research proposed in this paper is [8], which proposed upset estimation at the netlist and routing stages of design development and uses the idea of fault propagation and sensitization to evaluate circuit behavior. Although similar to [8] in FPGA circuit modeling, the method proposed in this paper seeks a better way of analyzing fault propagation in sequential circuits, which could improve accuracy and would provide additional symbolic fault propagation data to assist planned future research on improving a circuit's SEE response. Work described in [9] performs synchronous simulation but operates by injecting faults independently and simulating over time to see if they cause errors in the output. In contrast, this research seeks a more analytical method by developing a temporal fault model that will provide this information without the need for individual behavioral simulations of faults. Figure 1 depicts the preliminary vision for our fault analysis methodology.
III. FAULT ANALYSIS METHODOLOGY
The first step in this research is developing a fault model of the FPGA design that can be used to evaluate potential faults and the probabilities of their occurrence and propagation. This involves breaking the design down into its constituent elements and then evaluating the upsets that can occur in the data path that would ultimately affect overall operation. The SEE effects under investigation can upset the circuit, causing either persistent or non-persistent errors. Persistent errors are those that retain their erroneous state until they are explicitly corrected. Examples include logic equation changes in the FPGA look-up tables (LUTs) caused by FPGA configuration memory upsets, and routing changes, which could cause shorts, opens, or stuck-at errors on specific lines. Non-persistent errors are those that occur in device elements but do not permanently change their operating characteristics. SETs on data lines and SEU-induced data flips in user flip-flops are examples of non-persistent errors, as the circuit elements continue to operate as intended, albeit with an erroneous data bit present in the circuit. This erroneous data bit may still cause significant perturbation in the device output even though the circuit elements are functioning normally, since the elements are now processing faulty data.
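The distinction between the two error classes can be made concrete with a small simulation. This is only a sketch under stated assumptions: the circuit (a 1-input LUT configured as a buffer feeding a single flip-flop, with no feedback), the input stream, and the function names are hypothetical. A flip-flop upset corrupts one output sample and is then overwritten by new data, whereas a LUT configuration upset keeps corrupting outputs until the configuration is repaired.

```python
def simulate(inputs, ff_flip_cycle=None, lut_flip_cycle=None):
    """Simulate a buffer LUT feeding one flip-flop; output is the flip-flop.

    ff_flip_cycle:  cycle at which the flip-flop value is upset (SEU in user data).
    lut_flip_cycle: cycle at which LUT entry 0 is upset (configuration SEU).
    """
    lut = [0, 1]  # truth table of a buffer: out = in
    ff = 0
    outputs = []
    for cycle, x in enumerate(inputs):
        if lut_flip_cycle == cycle:
            lut[0] ^= 1  # persistent: stays wrong until explicitly corrected
        if ff_flip_cycle == cycle:
            ff ^= 1      # non-persistent: overwritten at the next clock edge
        outputs.append(ff)
        ff = lut[x]      # clock edge: capture the LUT output
    return outputs

def error_cycles(golden, observed):
    """Return the cycles at which the observed output differs from golden."""
    return [i for i, (g, o) in enumerate(zip(golden, observed)) if g != o]

STIMULUS = [0, 1, 0, 0, 1, 0, 0, 1]  # arbitrary illustrative input stream
golden = simulate(STIMULUS)
ff_errors = error_cycles(golden, simulate(STIMULUS, ff_flip_cycle=2))
lut_errors = error_cycles(golden, simulate(STIMULUS, lut_flip_cycle=2))
print(ff_errors, lut_errors)
```

In a circuit with feedback, the faulty data from a flip-flop upset could circulate for many cycles even though no element is malfunctioning, which is why the temporal analysis discussed next matters.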
In generating the fault model, it is useful to consider existing digital logic test generation techniques to establish sensitivity and propagation of the fault [10], [11], [12]. In order for a fault to propagate, an erroneous value must be able to propagate from a faulty FPGA element (flip-flop, LUT, etc.) to an output. Similarly, the inputs that sensitize that particular fault must be a valid combination of inputs that can occur at some time in the circuit's operation. The analysis in this research builds upon these algorithms by adding a temporal element that accounts for the synchronous nature of the FPGA fabric. Doing so achieves a more accurate model of the circuit's overall behavior, because the analysis considers the passage of time and the movement of faulty data through sequential elements. This type of analysis may often be necessary to evaluate complex logic chains that involve feedback or reconvergent fanout after multiple clock cycles. The final result will be a time-based symbolic fault model of the circuit that contains information on all potential faults and the temporal dependencies that cause errors to propagate to outputs, along with a statistical probability that each fault will propagate, based on the probability that the inputs sensitizing the fault will occur.
Figure 1: Flowchart illustrating fault analysis methodology
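To show what such a propagation probability means, the sketch below computes it by brute force for a tiny sequential circuit. This is explicitly not the symbolic temporal model proposed here: it enumerates every input sequence, which is the kind of per-fault simulation the proposed method aims to avoid, and serves only as a reference for the quantity being estimated. The circuit, the input names (`d`, `en`, `sel`), and the input probabilities are all hypothetical.

```python
from itertools import product

INPUT_NAMES = ("d", "en", "sel")

def step(state, inp):
    """One clock cycle of a hypothetical circuit with flip-flops q0 and q1.

    out = q1 AND sel;  q0 <= d;  q1 <= q0 AND en.
    """
    q0, q1 = state
    d, en, sel = inp
    out = q1 & sel
    return (d, q0 & en), out

def propagation_probability(p_one, cycles):
    """Probability that an upset of q0 at cycle 0 reaches the output.

    p_one maps each input name to the probability that it is 1 in a cycle
    (inputs are assumed independent across signals and cycles).
    """
    per_cycle = list(product((0, 1), repeat=len(INPUT_NAMES)))
    total = 0.0
    for seq in product(per_cycle, repeat=cycles):
        # Probability of this particular input sequence occurring.
        weight = 1.0
        for inp in seq:
            for name, bit in zip(INPUT_NAMES, inp):
                weight *= p_one[name] if bit else 1.0 - p_one[name]
        # Run golden and faulty copies in lockstep; q0 is flipped in the faulty one.
        good, bad = (0, 0), (1, 0)
        for inp in seq:
            good, out_good = step(good, inp)
            bad, out_bad = step(bad, inp)
            if out_good != out_bad:
                total += weight
                break
    return total

P_ONE = {"d": 0.5, "en": 0.25, "sel": 0.5}  # illustrative input statistics
p_prop = propagation_probability(P_ONE, cycles=3)
print(p_prop)
```

For this circuit the fault is visible only if `en` is 1 in the cycle of the upset and `sel` is 1 one cycle later, so the result equals P(en) x P(sel). A purely combinational analysis would miss this one-cycle dependency.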
The final step of this phase will be to generate an overall system upset metric by correlating the upset rates of critical components in the data path with available hardware device radiation characterization data. Persistent and non-persistent errors will be weighted accordingly and be scaled by their probabilistic effects as determined by the fault simulation data. The end result should be the generation of an overall design sensitivity cross-section to upsets, indicating how often that design will upset when implemented in a specific device. This cross-section can be correlated with existing phenomenological models to determine an expected upset and failure rate when operating the design in real-world space environments.
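One plausible form for this aggregation, given here only as an assumption since the weighting scheme is not yet defined, is a sum over element types of (number of used elements) x (per-element device cross-section) x (propagation probability) x (persistence weight). The field names and all numeric values below are hypothetical placeholders, not characterization data.

```python
import math

def design_cross_section(elements):
    """Sum weighted per-element cross-sections into a design cross-section (cm^2).

    Each entry provides:
      count        number of such elements used by the design
      sigma_cm2    per-element device cross-section from characterization data
      p_propagate  probability an upset in the element reaches an output
      weight       assumed extra weighting for persistent vs non-persistent errors
    """
    return sum(
        e["count"] * e["sigma_cm2"] * e["p_propagate"] * e["weight"]
        for e in elements
    )

# Placeholder values chosen for illustration only.
ELEMENTS = [
    {"name": "config_bit", "count": 4000, "sigma_cm2": 5e-9,
     "p_propagate": 0.10, "weight": 1.0},
    {"name": "flip_flop", "count": 200, "sigma_cm2": 1e-8,
     "p_propagate": 0.25, "weight": 1.0},
]
sigma_design = design_cross_section(ELEMENTS)
print(f"{sigma_design:.2e} cm^2")
```

Mitigation would then show up as a lower propagation probability for the protected elements, which makes designs with and without mitigation directly comparable on one number.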
IV. VALIDATION
An important step for designers when using any error estimation method is to verify its accuracy when applied to real-world scenarios. As such, validation of this algorithm is planned by irradiating circuits implemented in different Xilinx FPGA devices. The observed failure rate of the FPGA outputs can then be correlated to the SEU estimation algorithm's calculated design failure rate. For this research, accelerator testing is preferred over bench-top fault injection, since fault injection can only provide an initial basis for approximate design robustness.
Additionally, the validation of this algorithm needs accelerator testing in order to establish real-world performance and to maintain consistency, considering that the device characterization data used by the estimation algorithm is also obtained by accelerator testing.
The first device used for validation is the Xilinx Kintex-7 FPGA, a commercial-grade part built in 28 nm technology. Since this is a commercial device, radiation characterization data was not readily available. Thus, to obtain this data, as a precursor to this research, a characterization of the device in heavy ions was performed in September 2013 [13]. This testing provided basic upset data of components relevant to this SEU estimation research, which included flip-flops, configuration memory elements, and user block RAM.
To obtain data for validating the accuracy of the SEE estimation algorithm, a brief heavy-ion accelerator test at Texas A&M (shown in Figure 2) was performed in March 2014. The Kintex-7 FPGA was irradiated while running self-designed kernel circuits and benchmark circuits from the ISCAS'89 [14] and ITC'99 [15] benchmark suites. The goal in circuit selection was to have a moderately diverse set of representative circuit classes, such as finite state machines and arithmetic logic. During the beam test, a number of these kernel circuits were implemented in the FPGA and run in parallel. A second board was connected to the device under test and provided the input stimulus and monitoring of the outputs for erroneous behavior.
The benchmark circuits were tested across a variety of heavy ion species. The measure of energy deposition into the active regions of silicon caused by the heavy ion beam, also known as the Linear Energy Transfer (LET), was controlled and set to values of 1.5, 2.3, 3.2, 4.2, 10.9, and 38.8 MeV·cm²/mg. For each LET value, the circuit was tested by beginning operation, starting the heavy ion beam, and then recording the cumulative ion count (fluence) to the point where the first observed failure occurred. The total fluences and error counts are shown in Table 1 for one of the state machine circuits under test. Figure 2 shows the design's cross-section, or measured susceptibility in units of area, as a function of LET. This fluence-to-failure data will be used to verify the overall failure rate of the design as predicted from the statistical model formed during the previously mentioned fault simulation steps. Furthermore, this data can approximate a failure rate in space when combined with assumptions about the device construction, spacecraft design, and space environment (such as solar activity and device orbit) [16].
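The relationship between fluence-to-failure data and cross-section can be sketched as follows, using the standard definition of cross-section as the number of observed events divided by the fluence. Each run ends at its first failure, so a run contributes one failure and its recorded fluence. The function name and the fluence numbers are placeholders for illustration and are not the measured values from Table 1.

```python
import math
from collections import defaultdict

def cross_section_by_let(runs):
    """Estimate design cross-section (cm^2) per LET value.

    runs is a list of (let, fluence_to_first_failure) pairs, with LET in
    MeV*cm^2/mg and fluence in ions/cm^2. Each run contributes one failure.
    """
    totals = defaultdict(lambda: [0, 0.0])  # let -> [failures, total fluence]
    for let, fluence in runs:
        if fluence <= 0:
            raise ValueError("fluence must be positive")
        totals[let][0] += 1
        totals[let][1] += fluence
    return {let: n / f for let, (n, f) in sorted(totals.items())}

# Placeholder runs (NOT measured data): two runs at each of two LET values.
PLACEHOLDER_RUNS = [
    (1.5, 4.0e7), (1.5, 6.0e7),
    (38.8, 2.0e5), (38.8, 3.0e5),
]
sigma = cross_section_by_let(PLACEHOLDER_RUNS)
for let, value in sigma.items():
    print(f"LET {let}: {value:.1e} cm^2")
```

With only a handful of failures per LET value the statistical uncertainty on each point is large, so repeated runs per LET value tighten the estimate.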
V. FUTURE WORK
In addition to completing the SEU estimation algorithm, future work will include another accelerator trip to complete data collection for validating the algorithm against multiple devices and against more designs. A second verification platform, consisting of a Xilinx Virtex-5QV FPGA, will be utilized on the next accelerator test. The Virtex-5QV is a space-grade FPGA based on 65 nm technology and has been heavily characterized by the manufacturer. As such, a significant amount of analyzed beam data has been published [2] pertaining to this device. This published data will provide input for analysis of test circuits operating in a second device family.
Additionally, more benchmark kernels will be implemented in both devices to collect more data and tighten the statistical bounds on the results.
Other research is also planned on analyzing the effects of different design parameters that might affect reliability. In particular, tool settings for optimization of speed versus area and other user-controllable design aspects will be analyzed to determine their effect on design failure. Accelerator testing is planned to confirm any analytical hypothesis on the differences in reliability based on the design methods used.
VI. CONCLUSION
This research will develop a framework to provide accurate non-destructive SEE behavior estimation for complex circuit topologies in arbitrary FPGA designs. Analysis data can be used to identify soft areas of a circuit that can benefit from additional design-level mitigation. When combined with the radiation characterization data typically provided by device manufacturers, the algorithm developed can also provide an anticipated real-world upset rate in various orbits.
The impact of the research will be a significant reduction in overall design complexity, resources, and power, and a higher level of confidence that FPGA designs for space systems will operate as intended. Since no mechanisms are currently available that provide failure rates of FPGA designs correlated with real-world data, a model that relates to actual real-world performance will inform designers where their designs are particularly susceptible so that additional mitigation can be inserted. This research will also provide a mechanism to ensure that design-level mitigation is properly incorporated and provides a measurable benefit in terms of a quantifiable metric, allowing different design options to be easily compared and contrasted. The result will be an overall increase in confidence in space designs and a reduction in power and in both recurring and nonrecurring costs, achieved by reducing required device sizes and minimizing design complexity and development time.
