Abstract. Embedded systems are increasingly deployed in power systems. To support the reliability assessment of these devices, this paper presents FFI4SoC, a fine-grained fault injection framework for SoC devices. The application of the framework is illustrated through the design of a fault injection tool. Finally, a fault injection experiment was conducted, and the results show that FFI4SoC can be applied to SoC devices and supports fine-grained injection.
Introduction
Nowadays, power systems are increasingly equipped with SoC (system-on-chip) devices. Because power systems have high reliability requirements, assessing the reliability of these embedded systems and their workloads is a priority. Fault injection has been recognized as a powerful technique for evaluating a system under faults [1].
Fault injection introduces faults into a target system under observation and logs its behavior, so that the response of the system in the presence of faults can be analyzed. Software-implemented fault injection (SWIFI) is regarded as a primary method for fault injection experiments because it is inexpensive and efficient. SWIFI changes the contents of memory or registers to simulate the occurrence of hardware or software faults [1]. Many tools have been developed over the past two decades. FERRARI [2] injects faults into the process control block; the tool used in [3] is based on the fault injection framework of Linux. These tools rely too heavily on the OS of the target system to be applied to some specialized embedded systems. EDFI [4] and LLFI [5] operate on the intermediate code of the compiler to disrupt the SUT (system under test), which requires the source code or reverse engineering of the SUT. In [6], the extensible firmware interface is used to inject faults, but this standard is currently available only on some x86/x64 systems. Xception [7] can only simulate CPU, memory and bus faults through the runtime hardware exception vector.
Few tools in previous SWIFI-related research target SoC devices. Most existing SWIFI tools run on the OS (operating system) of the target system and therefore cannot be applied to some specialized embedded systems. Furthermore, they concentrate on coarse-grained injection sites such as the CPU and memory. To overcome these problems, this paper presents FFI4SoC, a fine-grained fault injection framework for SoC devices. XSSIFI, a SWIFI tool based on FFI4SoC for Xilinx SoC devices, is presented to illustrate the application of the framework. The utility and flexibility of FFI4SoC are confirmed by using XSSIFI to perform fault injection experiments.
FFI4SoC Architecture
To support fine-grained fault injection for SoC devices, FFI4SoC, an improved fault injection framework, is presented. The framework can be configured to support injection not only into CPU registers and memory but also into peripherals such as the DMA (direct memory access) controller and the QSPI (queued serial peripheral interface) controller. Fig. 1 shows a component diagram of FFI4SoC and the relationships among its components. The framework is divided into a host subsystem and a target subsystem. The host subsystem provides user interaction, analysis, statistics and similar services, while the target subsystem implements the main process of fault injection. A centralized organization of the components within each subsystem is recommended to facilitate communication among them. In a centralized organization, the connectors are procedure calls or inter-process communication, whilst in a distributed organization of a subsystem the connectors are remote procedure calls. The host subsystem runs on the OS of the host. The target subsystem, however, can relate to the OS of the target system in three ways: it can run on the OS, run under or integrated with the OS, or run partially under or integrated with the OS.
The Knowledge Base, the information database used for analysis, stores fault models, actuation strategies, logs and so on. Fault Model Config provides an interface through which the user configures fault models and actuation strategies. Log and Statistics record and analyze the results. The Fundamental Layer exposes a stable interface to the layers above.
The details of the main components of FFI4SoC are described as follows.
1) FAWR Data Modeling.
FAWR is an improved data structure based on FARM [8]. F denotes the fault set. A, the fault actuation set, is an addition to FARM. W denotes the SUT or workload set. R is the readout set, i.e. the feedback of the framework. The detailed expressions are given in Eq. 1 to Eq. 4:
F = FIPA × FIMask × FT. (1)
FIPA denotes the set of target physical addresses for injection. The elements of FIMask, named masks, are sequential binary codes in which '1' means affected and '0' means unaffected. FT includes multiple fault types, which can be expressed flexibly.
A = AM × AP. (2)
The AM set includes actuation mechanisms such as temporal actuation, spatial actuation and state-specific actuation. The AP set denotes the detailed parameters of a given mechanism. For temporal actuation, AP can be expressed as a combination of time units and countdowns.
W = WR × WD. (3)
All resources used by the workloads in the target system constitute the WR set. WD is the set of workload durations.
R = CTLFB × FIFB × WRFB. (4)
CTLFB denotes the set of control information. FIFB denotes the set of injection conditions. WRFB is the set of exceptions and results of the workloads after fault injection.
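The FAWR sets can be illustrated with a short sketch. It assumes that F and A are Cartesian products of their component sets in the same way as W and R; the class names, the helper `cartesian` and all concrete values are hypothetical and serve only as an illustration, not as the data structures used by the framework.

```python
from itertools import product
from typing import NamedTuple


class Fault(NamedTuple):
    """One element of the F set (assumed to be FIPA x FIMask x FT)."""
    fipa: int      # target physical address
    fimask: int    # bit mask, '1' bits are affected
    ft: str        # fault type label


class Actuation(NamedTuple):
    """One element of the A set (assumed to be AM x AP)."""
    am: str        # actuation mechanism
    ap: tuple      # mechanism-specific parameters


def cartesian(cls, *component_sets):
    """Build a FAWR set as the Cartesian product of its component sets."""
    return [cls(*combo) for combo in product(*component_sets)]


# Hypothetical component sets, chosen only for illustration.
FIPA = [0x1000, 0x1004]
FIMASK = [0b01, 0b10]
FT = ["SEU"]
F = cartesian(Fault, FIPA, FIMASK, FT)

AM = ["temporal"]
AP = [("us", 100), ("ms", 2)]
A = cartesian(Actuation, AM, AP)

print(len(F), F[0])
print(len(A), A[0])
```

Named tuples keep each element readable as f.fipa, f.fimask and f.ft, which mirrors the f[fipa], f[fimask] and f[ft] notation used in the injection algorithms.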
2) Fault Actuator. The fault actuator can be configured as a temporal, spatial or state-specific actuator, among others. The component receives the A set and actuates fault injection accordingly. The fault actuator starts on an actuation event, which can be a message or an external or internal interrupt request. It then interrupts the workload so that the fault can be injected. For all configurable components, a Config Manager is introduced to maintain a registered components list, which guides component invocation.
3) Fault Injector. The fault injector can be configured as different types, such as CPU injector, memory injector, DMA injector and QSPI injector. According to the F set, the component injects a specific fault into the workload, the OS, or memory and registers at runtime. Different injection algorithms should be designed for different injection sites. Because of constraints imposed by the OS or the hardware, an injection may fail. Each injection should therefore be confirmed so that unsuccessful injections can be eliminated. After that, the workload continues under control.
4) Fault Monitor. The behavior of the fault monitor depends on its configuration. The component monitors the behavior of the SUT under fault and collects information on potential exceptions or the results produced by the SUT.
5) Process Controller. This component integrates all other components of the target subsystem into a whole. Its responsibility is to manage the interaction, control flow and data flow among the components.
6) Communication. FFI4SoC supports multiple remote communication types, such as UART and Ethernet, which can be configured through Communication Config. All of them conform to the protocol defined by the framework to ensure efficient communication.
FFI4SoC Example
Single event effects (SEEs) may occur when a single ionizing radiation particle strikes the silicon. A single event upset (SEU), one type of SEE, causes bit upsets in memory or registers. To illustrate the usage of FFI4SoC, XSSIFI, a tool to reproduce SEEs on the Xilinx ZedBoard (a Zynq-7000 All Programmable SoC board), is presented.
1) Data Modeling. F set: ZedBoard is equipped with an industry-standard dual-core ARM Cortex-A9 CPU, and almost all registers are 32 bits wide. The FIPA set of XSSIFI can therefore be any valid physical address within the system address space of ZedBoard, and the mask length in the FIMask set is 32 bits. The FT set contains faults such as SEU, transition to '1' and transition to '0'. For example, (0xfffab1, 10, SEU) means that the second bit of the data at 0xfffab1, counting from the least significant bit, is injected with an SEU fault.
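How a mask and a fault type turn an original word into a faulted one can be sketched as follows. This is an illustration under stated assumptions rather than the XSSIFI implementation: an SEU is modeled as a flip of every masked bit, the two transition faults as forcing the masked bits to a fixed value, and the labels "TO_1" and "TO_0" as well as the function name `apply_fault` are hypothetical.

```python
WORD_BITS = 32                       # mask length used by XSSIFI
WORD_MASK = (1 << WORD_BITS) - 1


def apply_fault(original, fimask, ft):
    """Return the faulted 32-bit word for one (mask, fault type) pair."""
    fimask &= WORD_MASK
    if ft == "SEU":                  # assumed model: flip every masked bit
        faulted = original ^ fimask
    elif ft == "TO_1":               # force masked bits to '1'
        faulted = original | fimask
    elif ft == "TO_0":               # force masked bits to '0'
        faulted = original & ~fimask
    else:
        raise ValueError(f"unknown fault type: {ft}")
    return faulted & WORD_MASK


# Mask 10 (binary) selects the second bit, as in the (0xfffab1, 10, SEU) example.
# The original word 0xF0 is a made-up value.
print(hex(apply_fault(0xF0, 0b10, "SEU")))   # 0xf2
```

Under this model an SEU applied twice with the same mask restores the original word, whereas the two transition faults leave a word unchanged when the masked bits already hold the forced value.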
A set: Temporal actuation is the only element in AM, and AP is the Cartesian product of TU and CD. TU contains the time units ms and μs, whilst CD is the countdown integer expressed in a specific time unit.
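The structure of the A set can be sketched in a few lines. The countdown values below and the helper `countdown_us`, which normalizes an AP element to microseconds, are assumptions made for illustration; the paper does not state how XSSIFI programs its timer.

```python
from itertools import product

US_PER_UNIT = {"us": 1, "ms": 1000}   # TU: supported time units


def countdown_us(tu, cd):
    """Convert one AP element (time unit, countdown) to microseconds."""
    if tu not in US_PER_UNIT:
        raise ValueError(f"unknown time unit: {tu}")
    if cd < 0:
        raise ValueError("countdown must be non-negative")
    return cd * US_PER_UNIT[tu]


TU = ["ms", "us"]
CD = [1, 250]                         # hypothetical countdown values
AP = list(product(TU, CD))            # AP = TU x CD
A = [("temporal actuation", tu, cd) for tu, cd in AP]

print([countdown_us(tu, cd) for _, tu, cd in A])   # [1000, 250000, 1, 250]
```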
W set: Every element in WR is named after the workload. WD contains the duration of each SUT in μs.
R set: CTLFB includes information such as the connection state between the host and the target system and the reset condition of the target system. FIFB is {success, failure for SUT end, failure before SUT start, unknown failure}. WRFB is described in detail in the experiment section.
2) Design of Fault Injector. XSSIFI runs below the OS, which gives the tool complete access to the resources of the target system. Multiple injectors are implemented in XSSIFI. The designs of the CPU injector and the DMA injector are presented as follows.
CPU fault injector: Unlike tools running on an OS, XSSIFI has no system call for modifying memory or registers. In addition, the timer interrupt vector saves the current context on entry and restores it on exit. Therefore, faults injected directly into registers by the injector are overwritten when the interrupt vector exits. To overcome this problem, the injector injects the fault into the backup stack instead, and the vector then restores the faulted context. The algorithm is as follows:
Input: f, where f belongs to F. Output: r[fifb], where r belongs to R.
Step 1: in interrupt vector, save r0~r15.
Step 2: in interrupt routine, fetch original value in backup stack according to f[fipa].
Step 3: in interrupt routine, compute faulted value and target address according to f[fimask], f[ft] and original value.
Step 4: in interrupt routine, inject faulted value into target address.
Step 5: detect condition of fault injection, return r[fifb].
Step 6: in interrupt vector, restore r0~r15, end.
DMA fault injector: From the software perspective, hardware is regarded as a set of registers. In the DMA controller, direct access to some primary registers is prohibited for the sake of stability. The injector can change the contents of the DMA controller registers only by means of specialized instructions. The basic idea of the DMA fault injection algorithm is to corrupt the driver. Details are presented as follows:
Input: f, where f belongs to F. Output: r[fifb], where r belongs to R.
Step 1: fetch original value of specific register from driver according to f[fipa].
Step 2: compute faulted value according to f[fimask], f[ft] and original value.
Step 3: inject faulted value into specific memory map within driver.
Step 4: detect condition of fault injection, return r[fifb], end.
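The reason the CPU injector targets the backup stack can be illustrated with a small simulation of Steps 1 to 6. This is a sketch under simplifying assumptions, not the XSSIFI code: the register file and the backup stack are plain dictionaries, the SEU is modeled as a bit flip, and the function name `timer_interrupt` is hypothetical. The DMA injector follows the same fetch, compute, inject and confirm pattern, with the memory map of the driver in place of the backup stack.

```python
WORD_MASK = 0xFFFFFFFF   # 32-bit registers


def timer_interrupt(registers, reg, fimask, inject_into_backup):
    """Simulate one timer interrupt that injects an SEU into `reg`.

    Returns (registers after the vector exits, FIFB value).
    """
    if fimask & WORD_MASK == 0:
        raise ValueError("mask must select at least one bit")
    backup = dict(registers)                   # Step 1: vector saves the context
    live = dict(registers)                     # registers seen by the routine
    target = backup if inject_into_backup else live
    original = target[reg]                     # Step 2: fetch the original value
    faulted = (original ^ fimask) & WORD_MASK  # Step 3: SEU modeled as a bit flip
    target[reg] = faulted                      # Step 4: inject the faulted value
    # Step 5: confirm by reading back the copy that will be restored
    fifb = "success" if backup[reg] == faulted else "unknown failure"
    live = dict(backup)                        # Step 6: vector restores the context
    return live, fifb


registers = {f"r{i}": 0 for i in range(16)}
registers["r0"] = 0xF0                         # made-up register content

after, fifb = timer_interrupt(registers, "r0", 0b10, inject_into_backup=True)
print(hex(after["r0"]), fifb)                  # fault survives the restore

after, fifb = timer_interrupt(registers, "r0", 0b10, inject_into_backup=False)
print(hex(after["r0"]), fifb)                  # fault is overwritten by the restore
```

In the second call the fault is written to the live register, so the restore in Step 6 overwrites it and the read-back check in Step 5 reports a failed injection.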
3) Adding and Porting Components. When a new configurable component is added to FFI4SoC, the following work is required: (1) provide the configuration description of the new component; in XSSIFI, this is a C header file that defines the interface and internal data; (2) implement the corresponding interface in the fundamental layer; (3) register the new component before running so that the mappings in the registered components list operate properly.
If all components follow the steps above, porting a component merely requires re-implementing its lower-level interface, followed by integration testing.
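The registration step can be pictured with a minimal registry. The sketch below is an assumption about how a registered components list might behave; the class layout, the method names `register` and `invoke`, and the example injector are hypothetical, and XSSIFI itself describes its components in C header files rather than in Python.

```python
class ConfigManager:
    """Keeps a registered components list and dispatches calls by name."""

    def __init__(self):
        self._components = {}

    def register(self, kind, name, handler):
        """Step (3): register a component before the experiment runs."""
        if not callable(handler):
            raise TypeError("handler must be callable")
        key = (kind, name)
        if key in self._components:
            raise ValueError(f"{kind}/{name} is already registered")
        self._components[key] = handler

    def invoke(self, kind, name, *args):
        """Look up a registered component and call it."""
        try:
            handler = self._components[(kind, name)]
        except KeyError:
            raise LookupError(f"no {kind} named {name!r} is registered") from None
        return handler(*args)


def memory_injector(memory, address, mask):
    """Hypothetical injector: flip the masked bits of one memory word."""
    memory[address] ^= mask
    return "success"


manager = ConfigManager()
manager.register("injector", "memory", memory_injector)

memory = {0x10: 0xF0}                          # made-up memory content
print(manager.invoke("injector", "memory", memory, 0x10, 0b10), hex(memory[0x10]))
```

Because callers reach a component only through the registry, replacing the handler behind a name leaves the calling code unchanged, which is the property the porting step relies on.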
Fault Injection Experiment
In order to demonstrate the utility of FFI4SoC and its capability of fine-grained injection in an SoC running environment, fault injection experiments reproducing SEEs were conducted. In the experiments, XSSIFI ran on ZedBoard without an OS. The supported fault injection sites are: CPU, memory, on-chip memory, DMA controller, DDR memory controller, QSPI controller and generic interrupt controller.
Two SUTs were chosen: a matrix multiplication program with 50 × 50 integer matrices (mm), and a DMA test program in which 256 long integers are copied from source to destination memory (dmat). The durations of the two SUTs were measured before the experiment, and each SUT returned its computational results on exit.
The exceptions defined by ARM (except for fast interrupt request and interrupt request) were included in the R set. In addition, a timeout triggered by unknown causes was treated as an exception. The combination of the run results of the SUTs and the exceptions constitutes the WRFB set, which is {undefined, data abort, prefetch abort, SWI, timeout, false result, no error}.
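One way to map a single run to an element of WRFB is sketched below. It assumes that a run without an exception is judged by comparing its result with a fault-free reference result; the comparison rule, the function name `classify` and the run records are assumptions made for illustration.

```python
from collections import Counter

ARM_EXCEPTIONS = ("undefined", "data abort", "prefetch abort", "SWI")


def classify(exception, result, reference):
    """Map one run to a WRFB element.

    exception: name of the raised exception, "timeout", or None
    result:    value returned by the SUT (ignored if an exception occurred)
    reference: fault-free result used for comparison (an assumption)
    """
    if exception is not None:
        if exception != "timeout" and exception not in ARM_EXCEPTIONS:
            raise ValueError(f"unexpected exception: {exception}")
        return exception
    return "no error" if result == reference else "false result"


# Hypothetical run records (exception, result); the reference result is 42.
runs = [(None, 42), (None, 7), ("data abort", None), ("timeout", None), (None, 42)]
tally = Counter(classify(exc, res, 42) for exc, res in runs)
print(tally["no error"], tally["false result"], tally["data abort"], tally["timeout"])
```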
For the mm SUT, the F set is {(r0, one bit random, SEU), (r15, one bit random, SEU), (random memory address, one bit random, SEU)} and the A set is {(temporal actuation, μs, random countdown)}. All random values follow a uniform distribution. One bit random means that one randomly chosen bit of the mask is set to '1'. The random memory address is generated within the image of the SUT, and the random countdown is generated within the measured duration. For the dmat SUT, the upper bound of the random countdown in the A set is changed to the start time of the driver, and the tuple (DMA controller, one bit random, SEU) is added to the F set.
Table 1 and Table 2 give the results of the fault injection campaigns, with 2000 injections per element of the F set per SUT. Table 1 shows that, for the mm SUT, SEEs in memory and r15 may lead to much more severe errors than those in r0. The explanation is that faults in memory may corrupt both the instructions and the data of the SUT, and that a faulted r15 is likely to drive the target system into an unexpected state. Table 2 shows that the damage caused by faults in memory and r0 is less severe for the dmat SUT, and that r15 is still the most fragile region compared with the others. An interesting result is that faults in the DMA controller affect the dmat SUT most (only 11.1 percent of the computational results are correct). This can be explained by the fact that the dmat SUT relies more on DMA than on other resources.
From the above, the following more general conclusions are drawn: (1) FFI4SoC can be properly applied to fault injection experiments for SoC devices and provides the ability of fine-grained injection; (2) the influence of a fault varies with the hardware in which it resides; (3) for SUTs in which peripherals are the dominant resources, fine-grained injection is more efficient than coarse-grained injection.
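The randomized campaign setup used in this section, with one-bit random masks and uniformly distributed countdowns, can be sketched as follows. The function names, the seed and the duration of 500 μs are placeholders rather than values from the experiment, and the inclusive countdown range is an assumption.

```python
import random


def one_bit_mask(rng, width=32):
    """Return a mask with exactly one uniformly chosen bit set to '1'."""
    return 1 << rng.randrange(width)


def random_campaign(rng, site, upper_bound_us, runs):
    """Generate (fault, actuation) pairs for one injection site.

    upper_bound_us: upper bound of the countdown, e.g. the measured
    duration of the SUT (placeholder value in the example below).
    """
    campaign = []
    for _ in range(runs):
        fault = (site, one_bit_mask(rng), "SEU")
        actuation = ("temporal actuation", "us", rng.randint(0, upper_bound_us))
        campaign.append((fault, actuation))
    return campaign


rng = random.Random(0)                         # fixed seed for repeatability
campaign = random_campaign(rng, "r0", upper_bound_us=500, runs=2000)
print(len(campaign), campaign[0])
```

A seeded generator makes a campaign repeatable, so a run that ends in an exception can be replayed with the same mask and countdown.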
Conclusions
This paper presents a fine-grained SWIFI framework for SoC devices (FFI4SoC) and a tool that reproduces SEEs on the Xilinx ZedBoard (XSSIFI). The experimental results show that FFI4SoC and XSSIFI are useful and effective. Although designed for embedded systems, FFI4SoC can also be extended to general-purpose computer systems. In future work, FFI4SoC will be applied to more fields.
