Abstract-This paper presents a fault injection system for performing fault injection campaigns on Commercial-off-the-shelf (COTS) microprocessors. The proposed system takes advantage of the debug facilities of modern microprocessors along with the standard GNU Debugger (GDB) for executing and debugging benchmarks. Experiments on real boards, as well as on virtual machines, demonstrate the feasibility and flexibility of the proposal as a low-cost solution for assessing the reliability of COTS microprocessors.
I. INTRODUCTION
In recent years, commercial off-the-shelf (COTS) devices have become an attractive alternative to rad-hard components in applications working in harsh environments. By using COTS devices, it is possible to significantly reduce both the cost and the development time of systems. In addition, such devices are widely available on the market and offer lower power consumption and higher throughput than their radiation-protected counterparts [1]. However, the miniaturization of electronic components has had adverse consequences for the reliability of systems based on COTS processors [2], [3]. In these systems, radiation can cause execution faults known as soft errors, among them Single Event Upsets (SEUs) [4]. SEUs do not produce permanent damage, but they can cause a circuit to malfunction or a system to crash, and these effects may or may not be detected. SEUs are becoming more prevalent in electronic circuits and are therefore a growing concern in several application domains. SEUs affect not only aerospace systems, but also military and automotive systems, among others [5].
Hardware fault injection is a technique that has been widely used to evaluate the dependability of such systems in the presence of faults of this nature. Hardware fault injection methods can be classified into three categories: physical fault injection, logical fault injection by circuit emulation or simulation, and logical fault injection by means of processor debugging facilities [1].
Physical fault injection uses radiation or laser beams to induce SEUs in integrated circuits. This method has the advantage of causing actual hardware faults on real systems. However, these experiments are very expensive and require special facilities [6].
Logical fault injection by circuit emulation or simulation is performed using hardware emulation platforms and hardware description language (HDL) simulators, respectively [7]. Fault injection is implemented by means of the bit-flip fault model, in which the content of a memory cell is inverted at the time of injection. Commonly, circuits under emulation tests are prototyped in one or more FPGAs. In addition, specific logic resources are built into the circuit for debugging purposes. They make it possible to gather information about the circuit operation in the presence of faults, and to evaluate the behavior of fault tolerance mechanisms [8], [9], [10]. Another approach performs the fault injection by using the reconfiguration capabilities of FPGAs [11]. Unfortunately, certified and reliable HDL models are rarely available for COTS processors. Therefore, these techniques have severe limitations in the dependability evaluation of COTS microprocessors.
Finally, logical fault injection by means of debugging facilities takes advantage of the logic resources of processors to access their internal elements and insert bit-flips. These extra logic resources are originally intended for other purposes, such as Boundary Scan or On-Chip Debugging (OCD) [12]. OCD infrastructures are intended to support debugging during the development phase, and are very common in modern microprocessors [13]. This method can emulate faults on real devices. However, its application is limited by the capabilities of the OCD, and it requires the design of an experimental setup, including hardware and software, for each system.
17th IEEE Latin-American Test Symposium - LATS 2016
This paper presents a method based on a new low-cost logical fault injection tool, conceived for evaluating processor reliability. This tool does not require the design of additional external or internal hardware modules and only makes use of a common built-in resource (a timer) to accelerate the injection of faults. It uses hardware debugging infrastructures such as OCD, and the de facto standard GNU Debugger (GDB). The main advantage of our method is its high portability to different processor architectures and emulation/simulation platforms. Furthermore, our proposal supports any processor implementation (soft-core and hard-core) and specifically COTS devices. In addition, our fault injection technique is cycle and bit accurate, which improves the traceability of faults for their subsequent analysis.
This paper is organized as follows: section II summarizes the previous works regarding fault injection methods on COTS components; section III details the proposed tool and methodology; section IV discusses the experimental results obtained in the evaluation of different microprocessors; finally, section V presents the conclusions and proposes some future enhancements.
II. PREVIOUS WORKS
Several approaches have been proposed to assess the fault tolerance and reliability of COTS microprocessors in the presence of SEU faults. Software-based techniques inject faults by executing a few instructions added to the program. Although the time needed to inject the upset is relatively short, those methods are very intrusive since they require modifying the source code. In [14]
On the other hand, techniques based on processor debug facilities do not require modifying the application code. From a general point of view, OCD can be defined as the integrated combination of hardware and software that allows access to internal resources during execution [15], [16]. The implementation of debugging facilities varies across processor families, and so do its features and capabilities. Access to internal resources (registers, memory, etc.) is usually achieved through the Joint Test Action Group (JTAG) standard.
The generic architecture of a processor provided with an OCD infrastructure is mainly composed of a Device-Under-Test (DUT) and an external controller, which can be a host computer or a hardware component associated with it for fault injection campaigns. The communication between the external controller and the DUT is also performed via the JTAG interface.
Based on this concept, a scalable methodology is proposed in [15]. The authors modified hardware interfaces and dedicated OCD circuitry to assist the execution of fault injection campaigns in real time. The main goal of this approach is to allow the insertion of faults in microprocessor memory elements.
A similar approach is presented in [17], where a specific hardware module is implemented for interfacing between the DUT and the host. The technique is based on Nexus, the embedded processor debugging interface standard. In this case, faults are injected without altering the execution. The applicability of this proposal depends on the microprocessor complying with the Nexus standard. In addition, a circuit instrumentation technique to measure SEU sensitivity in complex processors is presented in [18]. This method employs a custom hardware module and a control unit connected to the OCD via JTAG. The control unit carries out fault injection and controls the functionalities that the OCD offers. This technique does not require modification of the target system; it is non-intrusive, but requires designing a specific control unit.
Moreover, an on-line error detection technique aimed at microprocessor-based systems is presented in [19]. The proposed solution consists of adding a hardware module (CPU Checker) connected to the available trace interface. This module not only supports the OCD debugging interface, but can also process the processor trace information, either on-line or through buffering. A similar approach designed for SoC devices is presented in [16], where fault injection is performed by a custom hardware module embedded in the system.
Other approaches using GDB have also been proposed [20], [21]. However, the process of generating static scripts and the need to launch the debugger for every single injection make these methods quite slow. In the SFIG approach [21], GDB injects the fault when the program counter reaches a given instruction address. If that address is within a loop, GDB is in charge of counting the number of iterations, which makes the process less efficient. The FAUST tool [20] uses software timers to select the injection time, which makes the process less accurate. Those approaches are conceived to be applied to processors running a Unix-like operating system. Our approach supports both processors running bare metal and processors running an OS. In addition, it includes some improvements to accelerate the injection of faults and to make the process cycle-accurate. This permits an a posteriori analysis of the fault if needed. Also, the concept of "time critic" is included to support fine-grained control of the execution time. This concept is important when some kind of software-based fault tolerance technique is included in the code.
III. PROPOSED FAULT INJECTION SYSTEM
The fault injection system presented here is intended to inject SEUs and observe their effects by means of experimental testing carried out directly on the DUT. Our proposal uses the standard debugging infrastructure of GDB for the experiments. Additionally, it uses a hardware timer module, which is common on current COTS processors.
In order to demonstrate the feasibility of this method, we have developed a fault injection tool that can be applied to real processors and is also capable of working in different environments such as simulators, emulators and virtual machines. Our tool features four main components (Fig. 1): the Fault Injection Manager (FIM), GDB, the Gateway, and a Debugging Interface (DI). The FIM is in charge of generating and managing fault injection campaigns, and of analyzing and classifying the results. Using the campaign parameters provided by the user, the FIM generates the corresponding fault injection scripts for GDB. GDB is the debugger used for the fault injection on the DUT. The Gateway is DUT specific and allows the communication between GDB and the DUT. In the case of a COTS microprocessor, the Gateway is launched by the host to perform the communication; otherwise, the Gateway is integrated in the emulator/simulator. The DI implements the GDB commands, such as code downloading, stepping through code, or breakpoint insertion. It varies according to the DUT: in the case of COTS processors, the role of the DI is performed by the OCD, whereas emulators/simulators use a software version known as a GDB stub. The fault injection campaign is managed by the host where the FIM is running, so in both cases the speed of the injection process is limited by the required communication between the host and the DUT.
A. Fault Injection Phases
The injection campaign involves two phases: initialization and run-time injection.
The initialization phase includes all the tasks carried out by the FIM to prepare a fault injection campaign for a specific application (benchmark program).
First, the FIM executes the program without performing fault injection to obtain a reference execution of the workload (Golden Execution). It also collects valuable information from the execution, such as the starting and ending addresses of the program and the total number of execution steps. Once this information is gathered, the FIM automatically produces the injection scripts, which integrate configuration data about when and where a fault should be injected. In addition, this configuration contains information about the total number of faults per campaign and the finalization conditions, such as the timeout barriers used to control abnormal program terminations.
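As an illustration of how such a script might be produced, the following minimal sketch builds the text of a GDB command file for one injection on a simulator target, where the injection point is reached by counting executed steps. The function name, its parameters and the script layout are assumptions made for this example and are not the actual format used by the FIM. Only standard GDB commands are used (file, target remote, break, continue, stepi, set var), and timeout handling is left out.

```python
def make_injection_script(elf_path, remote, start_addr, steps,
                          register, bit, end_addr):
    """Return the text of a GDB command file for a single bit-flip.

    All names and the layout are hypothetical; this is not the FIM format.
    """
    mask = 1 << bit  # fault mask with only the selected bit set
    lines = [
        f"file {elf_path}",            # load symbols of the benchmark
        f"target remote {remote}",     # connect to the gateway or GDB stub
        f"break *{start_addr:#x}",     # stop at the start of the program
        "continue",
    ]
    if steps > 0:
        lines.append(f"stepi {steps}")  # advance to the injection point
    lines += [
        f"set var ${register} = ${register} ^ {mask:#x}",  # flip one bit
        f"break *{end_addr:#x}",       # stop at the end of the program
        "continue",
    ]
    return "\n".join(lines) + "\n"


script = make_injection_script("bench.elf", "localhost:1234",
                               0x8000, 120, "r4", 3, 0x8100)
```

The resulting text could be saved to a file and passed to GDB in batch mode; the addresses and the step count would come from the Golden Execution.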
Second, in those cases where the DUT is a COTS processor, the code has to be slightly instrumented to speed up the injection process. The usual way to reach the execution point where a fault has to be injected is to step through the execution by means of the slow stepi GDB command. To avoid this bottleneck, we propose to use a built-in hardware timer to trigger an interrupt at the selected cycle, so that GDB can perform the injection. The timer runs in parallel with the processor, and its interrupt service routine (ISR) adds insignificant code and execution time overheads (about 40 clock cycles). Figure 2 shows the procedure. As can be seen, several lines are added to the original code to configure and enable the timer. The timer ISR is only used to expose the code address where GDB has to insert a breakpoint, which is then used to launch the GDB injection routine.
The run-time injection phase performs the fault injection and the classification of every fault. The FIM starts a debugging session and randomly selects a clock cycle and one bit within the target resources. Then, it continues the program execution until the program is interrupted, either by the counter-based ISR in the case of COTS processors, or by reaching the selected number of executed steps in the case of a simulator/emulator. After a fault is injected, the execution resumes until the end of the program.
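The random selection of injection points can be sketched as follows. This is only an illustration: it assumes a uniform distribution over the execution steps and over the bits of each register, and the function name, parameters and fixed seed are hypothetical choices made here so that a campaign can be reproduced.

```python
import random


def plan_campaign(total_steps, registers, width, faults_per_register, seed=0):
    """Return a list of (step, register, bit) injection points.

    total_steps is assumed to come from the golden execution; the uniform
    distribution and the seed are assumptions of this sketch.
    """
    rng = random.Random(seed)  # private generator, reproducible campaigns
    plan = []
    for reg in registers:
        for _ in range(faults_per_register):
            step = rng.randrange(total_steps)  # when to inject
            bit = rng.randrange(width)         # which bit to flip
            plan.append((step, reg, bit))
    return plan


# Example: 1000 faults in each of 16 registers of 16 bits.
plan = plan_campaign(5000, [f"r{i}" for i in range(16)], 16, 1000)
```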
B. Fault Injection Method and Classification
The fault injection method is based on the injection of SEUs (bit-flips), which are random in time and location. The injection of a bit-flip into the internal resources of the processor (such as memory or registers) is performed as follows. First, the content of the memory element where the fault will be injected is read. Second, an XOR operation is applied between that value and a fault mask, which specifies the bit to be flipped. Finally, the flipped value is written back to the original location.
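The read, XOR and write-back steps can be sketched in a few lines. In this example the read and write callbacks stand in for the debugger accesses to the DUT; they, and the function names, are hypothetical placeholders rather than part of the tool.

```python
def flip_bit(value, bit, width=16):
    """Return value with one bit inverted, limited to a width-bit word."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside the register width")
    mask = 1 << bit                            # fault mask: a single 1 bit
    return (value ^ mask) & ((1 << width) - 1)


def inject_seu(read, write, location, bit, width=16):
    """Read a location, flip one bit and write the result back."""
    original = read(location)                  # step 1: read the content
    faulty = flip_bit(original, bit, width)    # step 2: XOR with the mask
    write(location, faulty)                    # step 3: write back
    return original, faulty


# Usage with a dictionary standing in for the register file.
regs = {"r4": 0x00FF}
before, after = inject_seu(regs.__getitem__, regs.__setitem__, "r4", 8)
```

Since XOR is its own inverse, applying the same mask twice restores the original value, which is convenient when checking an injection routine.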
The fault effects are classified according to the following categories [22] :
• unACE: the system completes its execution and obtains the expected output after the fault is injected. Such faults are classified as unnecessary for Architecturally Correct Execution.
• SDC: the fault is neither detected nor corrected, and the program completes its execution with an erroneous output. This effect is called Silent Data Corruption.
• Hang: the fault causes an abnormal program termination or an infinite execution loop.
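A minimal sketch of this classification is given below. It assumes that each faulty run reports whether it finished normally before the timeout, together with its output, and that the reference output comes from the Golden Execution; the function names and the shape of the inputs are hypothetical.

```python
from collections import Counter


def classify(finished, output, golden_output):
    """Classify one faulty run as 'unACE', 'SDC' or 'Hang'."""
    if not finished:              # abnormal termination or timeout
        return "Hang"
    if output == golden_output:   # fault had no visible effect
        return "unACE"
    return "SDC"                  # completed with a wrong output


def rates(outcomes):
    """Return the percentage of each category in a list of outcomes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {k: 100 * counts[k] / total for k in ("unACE", "SDC", "Hang")}


runs = [(True, 42, 42), (True, 41, 42), (False, None, 42), (True, 42, 42)]
summary = rates([classify(f, out, gold) for f, out, gold in runs])
```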
IV. EXPERIMENTAL RESULTS

A. Experimental setup
To assess the fault injection methodology, two case studies based on different DUT targets were performed. In the first one, the DUT is a Texas Instruments MSP430 processor, whereas in the second one it is an ARMv4T processor simulated using QEMU.
The main features of the MSP430 processor are a 16-bit RISC CPU, 16 KB of flash, and 512 bytes of RAM. The processor has a register file with 16 registers (R0-R15). The first four registers are intended for special purposes: R0 is reserved for the Program Counter (PC), R1 is the Stack Pointer (SP), R2 is the Status Register, and R3 is used for constant generation. The Status Register contains the status of the MSP430 CPU, which is defined by a set of bits. It holds the arithmetic flags (Carry, Overflow, Negative, and Zero), as well as control bits such as SCG1 (System Clock Generator 1), SCG0 (System Clock Generator 0), OSCOFF (Oscillator Off), and CPUOFF (CPU Off), which are used to control the operational mode of the CPU. It also holds the GIE bit (General Interrupt Enable), which is used to enable or disable maskable interrupts. The remaining registers, R4-R15, are general purpose.
QEMU is a generic machine simulator and virtualizer, i.e. it is capable of simulating a full system. In addition, it has the advantage of being able to run either as a pure emulator or as a native virtual machine (on x86/x86-64 architectures). It is also possible to run and debug a standalone application without an operating system. The ARMv4T architecture provides 16 general purpose registers in user mode, all of which are 32 bits wide. R15 is the Program Counter, but it can be manipulated as a general purpose register. R14 is used as the Link Register (LR) by the branch-and-link instruction. R13 is typically used as the Stack Pointer. The Current Program Status Register (CPSR) contains four 1-bit condition flags (Negative, Zero, Carry, and Overflow) and four fields reflecting the execution state of the processor. The remaining CPU registers, R0 to R12, are general purpose registers.
The benchmark suite used in the experiments is made up of the following test programs: Vector Reduction (VR), the QuickSort algorithm (QuickSort and QuickSort_crc), and Matrix Multiplication (MxM and MxM_crc). In the _crc variants, a Cyclic Redundancy Check (CRC) has been added to reduce the data width of the result and to facilitate the correctness check. CRC is a technique for detecting errors in digital data, commonly used in digital telecommunications networks and storage devices. It uses polynomial division to compute a value called the CRC, which is usually 16 or 32 bits wide. When used as a reduction technique, two CRC values will not match if any single bit of a given bit set is changed. We used CRC to evaluate the coherence of the results by comparing each benchmark with and without the CRC algorithm.
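The use of a CRC as a result reduction can be illustrated with a short sketch. The paper does not state which CRC polynomial the benchmarks use, so this example assumes the 32-bit CRC provided by the Python standard library and 16-bit result words; both choices, and the function name, are assumptions.

```python
import struct
import zlib


def result_crc(values):
    """Reduce a list of 16-bit unsigned results to one 32-bit CRC."""
    data = struct.pack(f"<{len(values)}H", *values)  # little-endian words
    return zlib.crc32(data)


golden = [3, 1, 4, 1, 5]       # stand-in for a sorted vector or a matrix
faulty = list(golden)
faulty[2] ^= 1 << 7            # a single bit-flip in one result word

# Comparing two CRC values replaces an element-by-element comparison.
mismatch = result_crc(golden) != result_crc(faulty)
```

With this reduction, the correctness check of a faulty run only needs the CRC of the golden output rather than the complete output data.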
Benchmarks were compiled with the following two GCC ports for MSP430 and ARMv4T: msp430-gcc (v4.6.3), and arm-none-eabi-gcc (v4.8.4) respectively.
To assess the performance of the injection method, the time needed to inject a single fault was measured and averaged over 1000 tests. The obtained averages were 1.10 seconds and 0.12 seconds for the MSP430 and the QEMU ARMv4T, respectively. The first result reveals a strong dependency on the communication latencies between the host and the DUT in the case of the MSP430.
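These averages allow a rough estimate of the duration of a campaign. The sketch below assumes that the measured time per fault stays constant over the campaign and covers the whole cost of one injection, and it takes as an example 1000 faults per register with 16 injection targets on the MSP430 and 17 on the ARMv4T (16 registers plus the CPSR). The function name is hypothetical.

```python
def campaign_seconds(n_targets, faults_per_target, seconds_per_fault):
    """Estimate the campaign duration, assuming a constant cost per fault."""
    return n_targets * faults_per_target * seconds_per_fault


msp430 = campaign_seconds(16, 1000, 1.10)  # real board through the OCD
qemu = campaign_seconds(17, 1000, 0.12)    # simulated ARMv4T

msp430_hours = msp430 / 3600               # roughly 4.9 hours
qemu_minutes = qemu / 60                   # roughly 34 minutes
```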
B. Fault injection campaigns
As an initial test, we verified the coherence of the results by analyzing in detail the faults injected into the registers of the register file. Two fault campaigns, one for the MSP430 and one for the ARMv4T, were performed by injecting 1000 SEUs in every register, for a total of 16000 SEUs per campaign on the MSP430 and 17000 SEUs on the ARMv4T. Two benchmarks were tested in each campaign: QuickSort and MxM. Fig. 3 and Fig. 4 show the corresponding results.
As expected, faults only affect those registers that are actually involved in program instructions. For instance, all faults injected in registers that are unused during the execution (e.g. R4 and R5 in QuickSort on the MSP430) are classified as unACE (Fig. 3). In both benchmarks executed on the MSP430, critical registers such as R0 (Program Counter) and R1 (Stack Pointer) present the highest error rates. This is particularly true of register R1/SP in the case of QuickSort. This algorithm is mainly based on recursion and thus uses the stack extensively, which makes the SP register the most critical one.
It is worth noting that the percentages of faults classified as unACE in registers R6 to R12 are quite high because these registers have short lifetimes, i.e. they were not prone to faults during the program execution. In contrast, other registers (such as R14 and R15 in QuickSort, and R13 to R15 in MxM) were widely used to perform many calculations throughout the process and, as a consequence, they present a higher percentage of ACE faults. In particular, in the case of MxM, registers R13 to R15 are used to calculate and store intermediate results within the program, and thus they cause an increase in the SDC rate. Regarding QuickSort running on the ARMv4T (Fig. 4), the LR register presents a notable increase in its Hang rate in comparison with MxM. This is due to the increase in its lifetime, as it is used to save the return address of the subroutines. QuickSort has two large recursive routines, which increase the fault probability. In summary, the overall injection results on both RISC architectures are quite similar, as expected.
The second test shows the global reliability of the application when faults are injected into the whole register file.
The benchmark suite has been tested on both architectures by injecting 1000 SEUs per register. On average, the MSP430 offers better reliability than the ARM, since its unACE fault percentages are higher in all the benchmarks except QuickSort_crc. Considering the different nature of the benchmarks, some features can be observed. Algorithms focused on data processing (i.e. VR, MxM and MxM_crc) are more prone to faults that cause SDC. On the other hand, the QuickSort and QuickSort_crc programs, which are more control-oriented, are less vulnerable to faults in local variables. This peculiarity is due to the intensive use of the stack to save the automatic variables throughout the recursion flow (stack process).
In other words, the SDC rate was directly related to the use of the stack for intermediate results. Fig. 5 and Fig. 6 show that QuickSort with CRC and QuickSort have different SDC rates and similar Hang rates. This is because of the low impact of the CRC algorithm, due to its reduced code size overhead.
V. CONCLUSIONS
The proposed system is a tool aimed at evaluating the dependability of COTS microprocessors. Its automatic application and its minimal impact on microprocessor operation are some of the major features of our proposal. In addition, this tool does not require any special-purpose hardware, and it has low development and implementation costs. The tests carried out show the flexibility of the tool to adapt to different platforms, as well as the coherence of the results obtained when injecting faults into CPU registers. The use of the tool with emulation/simulation platforms supporting GDB, such as QEMU, accelerates the fault injection campaign and also gives estimates of the system reliability.
