Abstract: Local triple modular redundancy (LTMR) is often the first choice to harden a flash-based FPGA application against soft errors in space. In this work, we compare parity-based error detection with software-based retry against LTMR on a reference architecture with regard to maximum frequency, area overhead, and processing time. Our results show that our solution based on parity-based error detection saves between 30 % and 45 % of the area overhead caused by LTMR.
I. INTRODUCTION
Field-programmable gate arrays (FPGAs) are often utilized in space avionics. The avionics must be protected from ionizing radiation in space. In the absence of a shield (e.g., the magnetic field of the earth), a high-energy particle can traverse a digital circuit and induce a significant amount of charge, which can cause soft errors. These errors are not permanent and can be corrected, e.g., with a reset. In flash-based FPGAs, soft errors mainly occur in the flip-flops (FFs) of an FPGA application in the form of bitflips. The FPGA configuration bits do not have to be protected, because flash memory has a negligible soft error rate.
The state-of-the-art solution for flash-based FPGAs is local triple modular redundancy (LTMR), i.e., triplicating the application FFs and voting on their outputs. Unfortunately, triplication has a significant area overhead. Alternatively, part of the space redundancy in the FPGA may be replaced by additional time redundancy, e.g., in software, if the FPGA acts as a co-unit beside an already radiation-hardened processor. An example architecture is depicted in Fig. 1, where the FPGA implements the communication protocol interfaces needed for communicating with the satellite subsystems and the processor runs the mission software. The FPGA circuit that has to be hardened implements only error detection. In case of an error, this circuit is functionally isolated and the software instructs the circuit to reprocess the last request. With this collaborative approach, error correction is achieved and the overhead of local error correction in the FPGA is eliminated. This technique will be referred to as error detection with software-based retry (EDSR). In this paper, parity-based error detection (PBED) is used in EDSR.
Parity-based codes and triplication are well-known concurrent error detection (CED) techniques [1], [2]. Error detection with retry for achieving error correction has also been proposed, e.g., in [3]. In recent years, on the one hand, partial hardening techniques, which selectively harden susceptible parts of the circuit, have been proposed because of the relatively high overhead of CED techniques [4]. On the other hand, software-based fault-tolerance techniques are also popular because of the flexibility and relatively loose constraints of software compared to hardware, e.g., regarding memory requirements [5], [6]. Software- and hardware-based techniques each have their trade-offs; therefore, they can also be used together [5].
This work applies parity-based EDSR to an example data handling architecture based on a commercially available flash-based FPGA and provides an experimental comparison with LTMR. Up to now, there has been no detailed comparison based on a state-of-the-art (e.g., [7]) flash-based FPGA. Due to the limited resources of space-proven flash-based FPGAs, area savings can be the key to fitting the application onto the FPGA. Our contributions are:
• EDSR in the context of the full system stack, including a discussion of the requirements for the application, and
• an empirical comparison of LTMR versus EDSR in terms of circuit area overhead, maximum circuit frequency, and overall system latency due to error correction on a representative system in space-proven technology.
In the following sections, we first present the reference data processing system, which is used as the testbench. Then, we explain LTMR and EDSR and the implementations that are compared. Afterwards, experimental results based on a known flash-based FPGA are presented.
II. REFERENCE ARCHITECTURE
We use a reference model of an on-board data handling unit (OBDH) for satellites [7] for our analysis. In this section, we give an overview of the system, the FPGA design, and the communication protocol between the processor and the FPGA.
A. Overview
Fig. 1 shows an overview of the architecture. The OBDH comprises two main processing modules: a processor and an FPGA. The processor runs the mission software, which involves communicating with different subsystems on board the space system. The communication is done through the FPGA, which acts as an interface component and implements the various communication interfaces needed by the subsystems (e.g., RS232, CAN). We assume that the processor, the communication line between the processor and the FPGA, and the subsystems are sufficiently protected against soft errors.
B. FPGA Design
From the processor's point of view, the FPGA is a remote memory bus on which the implemented link interfaces are memory-mapped. The processor utilizes these interface modules by reading and writing the respective memory areas.
The FPGA model consists of three functional blocks: circuits A, B, and C, as shown in Fig. 2. Circuit A serves the memory access requests from the processor to circuit B, which issues memory accesses on circuit C and finally returns the data to the processor using the FIFO interface of circuit A. Circuit B is shown in more detail in Fig. 3. Circuit C, which contains a memory block, represents the memory-mapped interfaces. Circuits A and C are assumed to be sufficiently protected against soft errors (e.g., by LTMR). Circuit B must be hardened.
The FIFOs and the memory need a single clock cycle for reading or writing a single word, which makes it possible to mask a single-word access operation within the same clock cycle.
C. Communication Protocol
The communication protocol between the processor and the FPGA is visualized in Fig. 4. It consists of two kinds of messages: request and response. The processor sends memory access requests for a specific address or address interval to the FPGA, and the FPGA (more precisely, circuit B) answers with the corresponding response: a read request is answered with the read data, and a write request is acknowledged after the write operation. Every request is acknowledged with a response, and a second request cannot be sent before the response to the first request has been received. If the FPGA does not respond within a timeout, e.g., due to a soft error, the last request is repeated.
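The request/response rule with timeout-triggered repetition can be sketched from the software side as follows. This is a minimal illustration, not the mission software: the `transfer` callback, the byte-string message format, and the `max_attempts` bound are assumptions introduced here (the protocol description itself only states that the last request is repeated).

```python
from typing import Callable, Optional


def send_with_retry(
    transfer: Callable[[bytes], Optional[bytes]],
    request: bytes,
    max_attempts: int = 5,  # assumed bound; the protocol text gives no limit
) -> bytes:
    """Send one request and repeat it until a response arrives.

    `transfer` is a hypothetical link function: it returns the response,
    or None if the timeout window expired without a complete response.
    """
    for _ in range(max_attempts):
        response = transfer(request)
        if response is not None:
            # Only after a response may the caller send the next request.
            return response
    raise TimeoutError(f"no response after {max_attempts} attempts")
```

Because the function returns only after a response has arrived, calling it once per request enforces the rule that a second request is never sent before the first has been answered.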
III. COMPARED HARDENING TECHNIQUES
In this section, LTMR and EDSR and their characteristics are discussed. The implementation of EDSR on the reference system is discussed in more detail because of its system-level impact.
In LTMR, each FF of the application is triplicated and the outputs of the resulting three FFs are fed into a voter, which outputs the majority value. LTMR detects and corrects a bitflip in an FF locally; hence, it can be applied automatically on top of a circuit. This makes LTMR functionally transparent to the rest of the system; consequently, the circuit usually does not require a redesign before mapping to an FPGA.
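The voting principle can be illustrated with a small behavioral model. This is only a sketch: the class `TmrFlipFlop` and its methods are hypothetical names chosen for illustration and do not correspond to any synthesis tool output.

```python
def majority(a: int, b: int, c: int) -> int:
    """Majority vote of three single-bit values."""
    return (a & b) | (a & c) | (b & c)


class TmrFlipFlop:
    """Behavioral model of one application FF after triplication."""

    def __init__(self, value: int = 0) -> None:
        self.copies = [value & 1] * 3

    def clock(self, d: int) -> None:
        # All three copies sample the same data input.
        self.copies = [d & 1] * 3

    def flip(self, index: int) -> None:
        # Model a soft error (bitflip) in one of the three copies.
        self.copies[index] ^= 1

    @property
    def q(self) -> int:
        # The voter output is what the rest of the circuit sees.
        return majority(*self.copies)
```

A single flipped copy is outvoted by the other two, so the output stays correct; two flipped copies defeat the voter, which is the usual limit of triplication.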
PBED is a well-known error detection technique, which adds a parity bit to every data word being stored, e.g., by XORing the data bits [1] . Upon reading the data word, the parity is calculated again, compared to the stored parity value and in case of a mismatch, an error signal is asserted. Subsequently, an error handler can react and initiate a recovery scheme to correct the error.
After an error, a module must be recovered to an operational state. Often, this is done by resetting the module to its initial state. This in turn leads to a loss of the processing context, which must be restored; doing so involves periodically backing up the processing context, i.e., checkpointing. If the processing context does not contain any information that is needed for a long time, i.e., when a module regularly falls back to a defined state after a short time period, then the overhead of checkpointing in the circuit may be eliminated by reissuing a processing request after an error. Examples of such a module are a protocol converter or a module that exchanges data between two modules after reformatting the data. Reissuing a request introduces extra delays, which should be negligible if the soft error rates are low.
Fig. 2 shows PBED applied to circuit B. The error detection block continuously generates and checks the parity. If an error is detected, the error signal is asserted and the error handling block immediately masks the control signals on either side of the unreliable circuit. The FFs in the unreliable circuit are segmented into groups, and one parity FF is introduced for each group. A single group together with its parity FF is called a cluster. Fig. 5 shows the generic implementation of the error detection in a single cluster. The number of clusters is given by c_cl (c: count, cl: cluster). Each cluster contains s_cl − 1 user FFs plus one parity FF (s: size). Even parity is generated by XORing the inputs to the user FFs with XOR_pg. The integrity of the stored bits is checked by XOR_pc, which has s_cl inputs, and each cluster generates a cluster error signal. Finally, the c_cl cluster error signals are reduced to a single error signal by an OR gate. Error handling is done by generating the reset and mask signals from the error signal.
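The cluster scheme can be modeled behaviorally in a few lines. This is a minimal sketch under the assumption that each FF holds one bit represented as 0 or 1; the function names are hypothetical and only mirror the roles of the parity generator, the parity checker, and the final OR gate.

```python
from functools import reduce
from operator import xor
from typing import List


def store_cluster(user_bits: List[int]) -> List[int]:
    """Values stored in one cluster: the user FF bits plus the even-parity
    bit computed from the user FF inputs (role of the parity generator)."""
    parity = reduce(xor, user_bits, 0)
    return list(user_bits) + [parity]


def cluster_error(stored: List[int]) -> int:
    """XOR over all stored bits of a cluster (role of the parity checker).
    With even parity the result is 0 unless an odd number of bits flipped."""
    return reduce(xor, stored, 0)


def circuit_error(clusters: List[List[int]]) -> int:
    """OR-reduce all cluster error signals into a single error signal."""
    return int(any(cluster_error(c) for c in clusters))
```

For a cluster size of three, `store_cluster([1, 0])` stores `[1, 0, 1]`. A single bitflip in any of the three stored bits raises the cluster error, whereas two bitflips in the same cluster cancel out and remain undetected.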
If the processor receives an incomplete response or no response within the timeout window, the recovery of the software processing context depends on the state. If an error happens while a read request is being processed, this request is repeated. If an error occurs in the middle of a write transaction, the software cannot know which part of the transaction was completed; it can resynchronize by reading these addresses again or simply retry the last transaction. If a write to a memory location triggers an operation (e.g., transmitting a command to a subsystem), then retrying retriggers the last operation, which can be undesirable and dangerous. For such action-triggering memory locations, the software may issue only single-word memory write operations. This has the advantage that every atomic memory write operation is acknowledged separately and the software knows exactly which single memory operation did not succeed, which avoids an indeterminable system state. This requirement can be relaxed if the memory area being written does not trigger an action, i.e., if the output of the target system does not change after the transaction. An example is the transmit buffer of a communication interface module, where the transmit operation must first be triggered by setting a bit in a control register, which starts the data transfer to a subsystem. In this case, the processor would first write the transmit payload data to the buffer with one write request and then trigger the transmission with a second write request.
IV. EXPERIMENTAL RESULTS
We compared the processing time needed for an example mission and the synthesis results for circuits of different sizes. As circuit B, we implemented a module that is functionally a concrete instantiation of the FSM in Fig. 3. For PBED, we chose the cluster size s_cl = 3, which fits the ProASIC architecture with its three-input LUTs and should give area-efficient results. In the tested implementation, the error handling comprises (a) masking the circuit outputs and (b) resetting the circuit. The results are presented in the following.
A. Processing Time Penalty
To verify our PBED implementation tool and compare the runtime performance of LTMR and EDSR under injection of bitflips, we implemented a bitflip injection tool and a testbench which performs a mission. The mission consists of 100 memory access blocks. Each memory access block consists of three subsequent memory accesses. One single memory access block is visualized in Fig. 6 . The block starts with a write transaction consisting of 200 words, which resembles data that should be sent to a subsystem by the FPGA. After the data are written, the subsystem data transmission is activated by a single word access. The subsystem responds in a predefined time window of 100 cycles. After a delay of 100 cycles, the subsystem response consisting of 55 words is read. At the end of the mission, the time needed for the whole mission is measured.
At every clock cycle, the bitflip injection tool iterates over all FFs in the target circuit and randomly flips the FF bits according to the given probability p. The probability p is defined as the bitflip probability per clock cycle for a single FF. The random numbers generated for the bitflip injection depend on a seed. We ran the mission for 0 ≤ p ≤ 0.0001, and for each value of p, the simulation was run with 32 different seeds.
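The per-cycle injection step can be sketched as follows. This is not the tool used for the experiments; it is a minimal model that assumes the FF state is available as a list of 0/1 values and uses Python's `random.Random` as the seeded generator.

```python
import random
from typing import List


def inject_bitflips(ffs: List[int], p: float, rng: random.Random) -> int:
    """Flip each FF bit independently with probability p.

    Intended to be called once per simulated clock cycle.
    Modifies `ffs` in place and returns the number of bits flipped.
    """
    flips = 0
    for i in range(len(ffs)):
        if rng.random() < p:
            ffs[i] ^= 1
            flips += 1
    return flips
```

Creating the generator as `random.Random(seed)` for each run makes the injected fault pattern reproducible for a given seed, so one run per seed for each value of p matches the setup described above.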
In LTMR, the error is corrected in the same clock cycle, whereas EDSR requires the software to correct the error by repeating the failed memory access request, which in turn causes additional processing delays. Fig. 7 shows the relative processing time needed by EDSR for the given mission. The processing time of EDSR is plotted relative to the LTMR processing time, which is constant. For PBED, the processing time increases with increasing bitflip probability p, as a failed memory access request must be repeated. The time loss due to retransmission is at least the time required to transmit the failed request. At higher p, if the bitflip rate equals the memory access request rate, the processing time would be infinite. Therefore, the processing time grows exponentially with respect to p. Note that, in the simulated p interval, there were no undetected errors (e.g., multiple bitflips in a PBED cluster) for either technique.
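The growth of the retry penalty with p can be made plausible with a simplified model. The following is a sketch under assumptions that are not part of the experiment: the hardened circuit has N FFs that flip independently, a request occupies n clock cycles, any bitflip during a request forces a complete repetition, and timeout delays are ignored. The symbols N, n, q, and T are introduced here for illustration only.

```latex
\begin{align}
q &= (1 - p)^{N n}
  && \text{probability that one attempt completes without a bitflip,} \\
\mathrm{E}[\text{attempts}] &= \frac{1}{q}
  && \text{mean of a geometric distribution,} \\
\mathrm{E}[T] &\approx \frac{n}{q} = n\,(1 - p)^{-N n} \approx n\, e^{\,p N n}
  && \text{for } p \ll 1 .
\end{align}
```

Under these assumptions, the expected processing time per request grows exponentially with p, with the rate set by the product of circuit size and request length.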
For comparison, note that, assuming a one-year mission in L2 orbit under 1/cm² shielding, a programmed circuit with 5000 FFs on a ProASIC RTPE3000L FPGA experiences four SEUs [8].
Assuming that this design runs at 20 MHz, p for this mission is calculated by dividing the errors per year by the number of cycles in one year (Eq. 1). With the error rate from Eq. 1, the time penalty per year is insignificant.
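The arithmetic behind this estimate can be sketched as follows, assuming a year of 365 days. The first quotient follows the description above (errors per year divided by cycles per year). Since p is defined per FF, a further division by the 5000 FFs is also shown; which of the two values the original equation reports is an assumption left open here.

```latex
\begin{align}
N_{\text{cycles}} &= 20 \times 10^{6}\,\tfrac{1}{\text{s}} \times 365 \times 24 \times 3600\,\text{s}
  \approx 6.3 \times 10^{14}, \\
p_{\text{circuit}} &= \frac{4}{N_{\text{cycles}}} \approx 6.3 \times 10^{-15}, \\
p_{\text{FF}} &= \frac{p_{\text{circuit}}}{5000} \approx 1.3 \times 10^{-18}.
\end{align}
```

Either value lies many orders of magnitude below the upper end of the simulated interval (p = 0.0001).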
B. Synthesis Results
To compare the synthesis impacts, we created circuits of different sizes by instantiating circuit B multiple times. The circuits were synthesized for the ProASIC A3P250 using Synplify. LTMR was applied using Synplify, and PBED was applied using a newly implemented tool that generates the PBED circuitry on top of an RTL design. The output netlists were then placed and routed using Designer from Microsemi. The results are shown in Table I. The parameters shown are: FF count (c_FF), circuit area (A), maximum frequency (f_max), critical path length (t_crit), critical path overhead (t_crit+), circuit area overhead (A_+), and circuit area overhead per FF. Note that in the ProASIC3 architecture, every configurable logic block (CLB) can be configured as either an FF or a LUT.
Consequently, in this work, circuit area A is defined as the total count of FFs and LUTs in the circuit.
The impact of PBED on the critical path (and thus on the maximum frequency) is significant due to the synchronous reset in the error handling. Compared to LTMR, PBED reduces the hardening overhead by between 30.3 % and 44.8 %.
V. CONCLUSION
We applied LTMR and PBED with software-based retry to a reference architecture and experimentally compared circuit area overhead, maximum frequency, and the processing time needed for an example mission under fault injection. The results show that at least 30 % of the area overhead caused by LTMR can be saved by implementing PBED and correcting the errors with time redundancy. In our implementation, the impact on the critical path of the circuit is significant; a solution based on asynchronous reset and pipelined error detection will be investigated in future work.
