Abstract-The use of microprocessor-based systems is gaining importance in application domains where safety is a must. For this reason, there is a growing concern about the mitigation of SEU and SET effects. This paper presents a new hybrid technique aimed at protecting both the data and the control flow of embedded applications running on microprocessors. On the one hand, the approach is based on software redundancy techniques for correcting errors produced in the data. On the other hand, control-flow errors are detected by reusing the on-chip debug interface that exists in most modern microprocessors. Experimental results show a substantial increase in system reliability, in some cases exceeding two orders of magnitude, in terms of mitigation of both SEUs and SETs. Furthermore, the overheads incurred by our technique are affordable in low-cost systems.
I. INTRODUCTION
The use of microprocessor-based systems is gaining importance in application domains where safety is a must. In this case, errors induced by radiation in the microprocessor may cause wrong computations or even loss of control of the entire system. Therefore, mitigation of Single-Event Effects (SEEs) is mandatory in safety- or mission-critical applications.
SEEs, such as Single-Event Upsets (SEUs) or Single-Event Transients (SETs), may affect microprocessors in several ways. If an error occurs in a register or memory position storing data, a wrong computation result may be obtained. If an error occurs in a control register, such as the program counter or the stack pointer, the instruction flow may be corrupted and a wrong result may be produced or the processor may lose control and enter an infinite loop. Both data and control-flow errors need to be carefully addressed by software and hardware error mitigation techniques.
Software-based approaches have been proposed for both data and control-flow errors. For data errors, software approaches apply redundancy at low level (assembly code) [1] [2] or to high-level source code by means of automatic transformation rules [3]. Also, multithreading has been applied to implement software detection and recovery solutions to mitigate faults [4]. Control-flow checking techniques are typically based on signature monitoring [5] [6] [7]. The program is divided into a set of branch-free blocks, where each block is a set of consecutive instructions with no branches except for possibly the last one. A reference signature is calculated at compile time and stored in the system for each block. During operation, a run-time signature is calculated and compared with the reference signature to detect control-flow errors.
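As an illustration of the signature monitoring idea described above, the following minimal sketch computes a reference signature per branch-free block and compares it with a run-time signature. The XOR-based signature, the function names and the example instruction words are assumptions made for illustration only; the schemes in [5] [6] [7] define their own signature functions.

```python
from typing import Dict, List

def block_signature(words: List[int]) -> int:
    """Illustrative signature: XOR of the instruction words of one block."""
    sig = 0
    for word in words:
        sig ^= word
    return sig

def build_reference_table(blocks: Dict[int, List[int]]) -> Dict[int, int]:
    """'Compile time' step: map each block start address to its reference signature."""
    return {addr: block_signature(words) for addr, words in blocks.items()}

def check_block(start_addr: int, executed: List[int], reference: Dict[int, int]) -> bool:
    """'Run time' step: True if the executed words match the stored signature."""
    return reference.get(start_addr) == block_signature(executed)

# Hypothetical program with two branch-free blocks starting at addresses 0 and 3.
blocks = {0: [0x10, 0x22, 0x3F], 3: [0x05, 0x81]}
reference = build_reference_table(blocks)

ok_run = check_block(0, [0x10, 0x22, 0x3F], reference)   # correct execution
skipped = check_block(0, [0x10, 0x3F], reference)        # one instruction skipped
bad_entry = check_block(1, [0x22, 0x3F], reference)      # jump into the middle of a block
```

The sketch also makes the storage cost visible: one reference signature must be kept for every block of the program.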
As with any other digital device, hardware-based approaches can be applied to microprocessors at several abstraction levels. Microprocessor-specific techniques introduce system-level redundancy by using multiple processors, coprocessors or specialized system modules [8] [9]. In particular, the reuse of debug infrastructures has recently been proposed [10]. Debug infrastructures are intended to support debugging during the development phase, and are very common in modern microprocessors. Since they are unused during normal operation, they can be easily reused for on-line monitoring in an inexpensive way [11]. Moreover, they can provide internal access to the microprocessor without disturbing it, and require neither processor nor software modifications.
Both hardware and software-based techniques have advantages and disadvantages. The software-based approaches are very flexible and can be used with Commercial Off-The-Shelf (COTS) microprocessors, since no internal modifications to the microprocessor are required. However, they may produce large overheads in processing time and storage needs [12] [13]. These can be particularly large for control-flow checking, because a large number of signatures need to be stored and checked very often. In contrast, hardware-based techniques can be quite effective, but they usually introduce large area overheads and generally require modifications to the microprocessor, which are not feasible in COTS devices. However, these drawbacks can be overcome by reusing the debug infrastructures, as they use existing hardware interfaces in a non-intrusive manner.
This paper presents a new hybrid technique aimed at protecting both the data and the control flow of embedded applications running on microprocessors. On the one hand, the approach is based on software redundancy techniques for correcting the errors produced in the data. On the other hand, control flow is checked by a small hardware module that monitors the sequence of instructions executed by the processor through the debug interface and detects illegal changes in the control flow.
The experimental results show a substantial increase in system reliability, in some cases exceeding two orders of magnitude, in terms of mitigation of both SEU and SET effects. Furthermore, the overheads incurred by our technique are affordable in low-cost systems.
II. HYBRID HARDENING APPROACH

A. Data Hardening
Two software-based techniques are proposed in this work to be applied in different scenarios according to the maximum response time allowed, and the performance and code size constraints. Both of them are applied to low-level (assembly) code but with different granularity levels.
The first one is an adaptation of the SWIFT-R technique proposed by Reis et al. [14]. SWIFT-R is an overall method aimed at recovering from faults in the data section, mainly related to the register file of the microprocessor. Similar to hardware TMR, the idea is to keep two redundant copies of any data that come into the software protected area (also called Sphere of Replication or SoR), so that three versions of each value are available for voting. In our case, the borders of the SoR include the entire microprocessor data-path excluding the memory and ports. Every instruction that operates on the data is replicated too. Finally, to check the consistency of the data, software majority voters and recovery procedures are inserted before any instruction that makes the data leave the SoR (e.g., a store into a memory location or a write to an output port), and also before any conditional branch.
Any soft error affecting the program data within the microprocessor is masked by the copies and corrected shortly afterwards. Data correction takes the number of clock cycles necessary to execute the instructions of the voter and the recovery of the affected register. However, because of its fine replication granularity (instruction level), the code size and the execution time can increase by a factor of 2.5 to 3 compared to the non-hardened program [15]. Therefore, this method represents a suitable solution when a quick recovery time is needed and there are no severe overhead limitations.
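The replication and voting scheme described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardened assembly code generated in this work: registers are assumed to be 8 bits wide, the helper names (load_hardened, add_hardened, vote, store_hardened) are hypothetical, and the voter is modeled as a bitwise 2-out-of-3 majority.

```python
from typing import Dict, Tuple

Triple = Tuple[int, int, int]  # one value held in three register copies

def load_hardened(memory: Dict[int, int], addr: int) -> Triple:
    """Data entering the SoR is replicated into three copies."""
    value = memory[addr]
    return (value, value, value)

def add_hardened(a: Triple, b: Triple) -> Triple:
    """The ADD instruction is replicated, once per copy (8-bit wrap-around)."""
    return tuple((x + y) & 0xFF for x, y in zip(a, b))

def vote(copies: Triple) -> Triple:
    """Software majority voter (bitwise 2-out-of-3) followed by recovery of all copies."""
    r0, r1, r2 = copies
    voted = (r0 & r1) | (r0 & r2) | (r1 & r2)
    return (voted, voted, voted)

def store_hardened(memory: Dict[int, int], addr: int, copies: Triple) -> Triple:
    """Vote before the data leaves the SoR, then store the voted value."""
    repaired = vote(copies)
    memory[addr] = repaired[0]
    return repaired

memory = {0: 5, 1: 7}
a = load_hardened(memory, 0)
b = load_hardened(memory, 1)
total = add_hardened(a, b)

# Emulate an SEU flipping one bit in the second copy of the result.
upset = (total[0], total[1] ^ 0x10, total[2])
recovered = store_hardened(memory, 2, upset)
```

In this model the upset is masked at the store, which is the point where the data would otherwise leave the SoR; the cost is that every data instruction is executed three times and a voter is added at each boundary.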
The second method is based on Procedural Replication (PR) instead of instruction replication. The replication unit is the procedure (function), which is a block of code that performs a single task and returns some values. Every procedure is computed twice and recomputed a third time if a discrepancy between the previous two computations occurs [21]. To obtain this behavior, a few code transformations are needed in the original code, involving conditional jumps and consistency checkers (inserted after procedure calls). Thus, the impact on code size is, a priori, small. In contrast, the recovery time is much longer for this second technique than for SWIFT-R. It depends on the number of instructions included in the procedure's duplicated call. Usually, the recovery time is equal to the total execution time of the original procedure plus a few additional comparisons. The recovery time is, therefore, a significant issue that must be taken into account with this method. Nonetheless, since the third procedure call only occurs in case of error detection, the execution time overhead factor during normal operation of the system (error-free state) is only 2 times. In the unlikely event of error recovery, this overhead is up to 3 times.
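The behavior of Procedural Replication can be sketched as a wrapper around a procedure call. The wrapper name run_replicated and the decision to accept the third result only when it agrees with one of the first two are assumptions made for illustration; the text above only states that the procedure is recomputed a third time on a discrepancy.

```python
def run_replicated(procedure, *args):
    """Run a procedure twice; run it a third time only if the results differ.

    Returns (result, number_of_executions).
    """
    first = procedure(*args)
    second = procedure(*args)
    if first == second:
        return first, 2            # error-free case: 2x execution time
    third = procedure(*args)       # recovery: one more full execution
    if third == first or third == second:
        return third, 3
    raise RuntimeError("no two executions agree")

# Hypothetical procedure whose second execution suffers a transient fault.
calls = {"n": 0}

def faulty_sum(values):
    calls["n"] += 1
    total = sum(values)
    return total ^ 0x04 if calls["n"] == 2 else total

clean = run_replicated(sum, [1, 2, 3])
recovered = run_replicated(faulty_sum, [1, 2, 3])
```

The execution counts returned by the wrapper mirror the overhead factors given above: two executions in the error-free case and three when recovery is needed, with the recovery time being one full run of the procedure.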
B. Control Flow Checking
The hybrid approach presented in this paper performs Control-Flow Checking (CFC) by adding a dedicated hardware module (CFC module). The module accesses internal resources by means of the trace interface available in modern microprocessors (see Fig. 1). Therefore, it is possible to observe the system behavior without modifying the normal operation or adding any performance penalty.
Figure 1. System structure hardened with a CFC-Module (the CFC-Module is connected to the microprocessor, uP, through the trace interface)
In particular, the operation of the CFC consists of predicting the next Program Counter (PC) value, starting from the executed instruction, and comparing it with the actual PC value for the next executed instruction. If there is any difference, an error in the execution flow is detected. This technique has preliminarily been explored in [20], obtaining a good trade-off between the fault detection coverage and area overhead. In order to apply this technique, the value of the PC and the operation code of the currently executed instruction must be accessible. These values are generally present in trace interfaces, and therefore, this technique is applicable to those processors that contain a trace bus. Fig. 2 shows the structure of the CFC module. The CFC module consists of three blocks:
• PC Checker: it compares the predicted PC value with the actual one.
• N-level Stack Replica: this block replicates N stack positions in order to check the control flow in case of call and return from subroutines.
• Manager: this block is in charge of three main tasks: decoding the executed instruction, predicting the PC value for the next instruction and managing the other blocks.
The PC prediction value of the next instruction to be executed takes into account branch and non-branch instructions, distinguishing between conditional and unconditional branches. Furthermore, when a subroutine is called, the return PC value is stored in the N-level Stack Replica. Thus, when a return instruction is executed, the predicted PC value for the next instruction can be recovered from that block and checked. A suitable trade-off between necessary hardware resources and error coverage can be achieved by selecting the number of stack positions to be replicated. For applications with a low level of subroutine nesting, only a few stack positions need to be replicated. In this paper, the experiments have been performed with 3 replicated positions (3-Level Stack Replica).
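The prediction and checking scheme described above can be sketched as a small behavioral model. It is not the RTL of the CFC module: the trace record format (pc, kind, target), the assumption that the PC advances by one per instruction, the acceptance of both successors of a conditional branch, and the policy of discarding the oldest entry when the stack replica is full are all assumptions made for illustration.

```python
class CfcModel:
    """Behavioral sketch of the PC Checker, Manager and N-level Stack Replica."""

    def __init__(self, stack_levels=3):
        self.stack_levels = stack_levels
        self.stack = []        # replica of the most recent return addresses
        self.expected = None   # set of legal next PC values (None: no prediction)

    def observe(self, pc, kind, target=None):
        """Process one trace record; return True if pc matches the prediction."""
        # PC Checker: compare the actual PC with the predicted one.
        ok = self.expected is None or pc in self.expected

        # Manager: decode the instruction kind and predict the next PC.
        if kind == "seq":                      # non-branch instruction
            self.expected = {pc + 1}
        elif kind == "jump":                   # unconditional branch
            self.expected = {target}
        elif kind == "cond":                   # conditional branch
            self.expected = {pc + 1, target}
        elif kind == "call":
            self.stack.append(pc + 1)          # store the return address
            if len(self.stack) > self.stack_levels:
                self.stack.pop(0)              # oldest entry is no longer tracked
            self.expected = {target}
        elif kind == "ret":
            # An untracked return (empty replica) cannot be checked.
            self.expected = {self.stack.pop()} if self.stack else None
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
        return ok

# Hypothetical error-free trace: call a subroutine at address 10 and return.
good_trace = [(0, "seq"), (1, "call", 10), (10, "seq"), (11, "ret"), (2, "seq")]
cfc = CfcModel(stack_levels=3)
good_results = [cfc.observe(*record) for record in good_trace]

# Same trace, but the return lands on address 5 instead of 2.
bad_trace = good_trace[:4] + [(5, "seq")]
cfc = CfcModel(stack_levels=3)
bad_results = [cfc.observe(*record) for record in bad_trace]
```

In this model, nesting deeper than the configured number of levels leaves the outermost returns unchecked, which corresponds to the trade-off between hardware resources and error coverage mentioned above.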
III. EXPERIMENTAL RESULTS AND DISCUSSION
To assess the effectiveness of our approach, we have performed extensive SEU and SET fault injection campaigns in a PicoBlaze microprocessor considering several software applications and different combinations of software and hardware hardening techniques.
We worked with a compiler front-end and back-end for the PicoBlaze microprocessor in order to generate the hardened software versions. PicoBlaze is a soft microprocessor based on an 8-bit RISC architecture, with severe limitations in performance and resources, but widely used in FPGA-based embedded systems. These facts make it especially appropriate for our case study, taking into account that PicoBlaze is mainly found in cost-sensitive applications. The PicoBlaze architecture contains 16 general-purpose 8-bit registers, an internal scratchpad RAM, ROM, a stack memory to support subroutine calls and up to 256 input and output ports.
In this case study, a cycle accurate and RTL equivalent clone of the original PicoBlaze-3 version (RTLPicoBlaze) has been used. The PicoBlaze design has been extended with a trace interface, similar to the one that can be found for LEON3 [16] , and the CFC module has been connected to it. The CFC module implies an overhead of 435 gates and 119 FFs. This corresponds to an area overhead of about 40% because PicoBlaze is a very small microprocessor. For more complex microprocessors, the overhead can be expected to be much smaller.
The architecture was synthesized for a 90 nm technology using the SAED 90 nm library provided by Synopsys [17]. For SEU experiments, we injected SEUs at every FF and clock cycle. For SET experiments, we injected faults at several random instants within every clock cycle for every gate and with a pulse width of 500 ps, using the AMUSE tool [18]. The number of injected faults varies with the software application and the selected hardening techniques. It ranges from 17,483 faults/node to 105,344 faults/node (up to 111 million faults in total for SETs and up to 20 million faults in total for SEUs).
In the experiments, faults were classified according to their effect on the program behavior as proposed in [19]. Silent Data Corruption (SDC) failures are caused by faults that have not been detected or corrected and make the program finish with an erroneous output. Hang failures are the ones that provoke abnormal program termination or an infinite loop. To detect this type of failure, we established a timeout condition with some allowed extra clock cycles for the computation to complete correctly (the timeout value depends on the application and the software hardening technique, ranging from 100 to 3300 clock cycles for the performed experiments).
Three different software applications were used for the experiments: matrix multiplication (Mmult), a Proportional-Integral-Derivative controller (PID) and a Finite Impulse Response filter (FIR).
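The classification described above can be summarized with a short sketch. The function name, its parameters and the label used for runs that end correctly ("no failure") are assumptions made for illustration; the actual classification is the one proposed in [19] and implemented by the fault injection tool.

```python
def classify_run(finished, cycles, timeout, output, golden_output):
    """Classify one fault injection run by its effect on the program."""
    if not finished or cycles > timeout:
        return "Hang"         # abnormal termination or infinite loop
    if output != golden_output:
        return "SDC"          # program finished with an erroneous output
    return "no failure"       # output matches the fault-free reference

# Hypothetical runs of a program whose fault-free output is [3, 7].
golden = [3, 7]
hang = classify_run(False, 3301, 3300, None, golden)
sdc = classify_run(True, 950, 3300, [3, 5], golden)
correct = classify_run(True, 950, 3300, [3, 7], golden)
```

The timeout must be chosen per application and hardening technique, since a hardened program that recovers from an error legitimately needs extra clock cycles to finish.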
In the experiments, we tested every application with no hardening at all (NH) and hardened with SWIFT-R (SR), Procedural Replication (PR) and all of the above combined with CFC (NH+CFC, SR+CFC and PR+CFC). Tables I and II report the results of these experiments. The experimental results demonstrate that the combination of SW hardening for data errors and HW hardening for control-flow errors is able to mitigate almost all errors. Generally, the CFC module removes most Hang errors and the SR or PR techniques remove most SDC errors, but both techniques are required to produce a significant mitigation. The best results are achieved with the combination of PR and the CFC module. The last column in Tables II and III shows the relative error rate reduction that is obtained in this case with respect to the NH version. This reduction can be as large as 114 times, i.e., more than 2 orders of magnitude, in the FIR case.
When selecting a software technique, it must be taken into account that the studied software techniques require different recovery times and memory overheads. SWIFT-R is less effective than PR and it also introduces higher code and execution time overheads, as reported in Table III. In contrast, PR has significant recovery times (ranging from 590 to 3300 clock cycles for the considered benchmarks), while the recovery time for SWIFT-R is negligible.
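The relative error rate reduction quoted above can be written as a ratio. The symbols below are introduced here only for illustration, under the assumption that the error rate of a version is the fraction of injected faults that end as SDC or Hang failures:

```latex
% E_X: error rate of version X (assumed: (SDC + Hang failures) / injected faults)
R = \frac{E_{\mathrm{NH}}}{E_{\mathrm{PR+CFC}}},
\qquad
R = 114 \;\Rightarrow\; \log_{10} R \approx 2.06 > 2
```

A reduction factor of 114 therefore corresponds to slightly more than two orders of magnitude.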
