Abstract The single event upset (SEU) effect, caused by highly energized particles in the aerospace environment, threatens the reliability and security of small satellites built from commercial-off-the-shelf (COTS) components. SEU-induced control flow errors (CFEs) may cause unpredictable behavior or crashes of COTS-based small satellites. This paper proposes a generic software-based control flow checking (CFC) technique named bipartite graph-based control flow checking (BGCFC). To simplify the types of illegal branches, it transforms the conventional control flow graph into an equivalent bipartite graph. It checks the legality of control flow at runtime by comparing a global signature with the expected value, and introduces consecutive IDs and bitmaps to reduce the time and memory overhead. Theoretical analysis shows that BGCFC can detect all types of inter-node CFEs with constant time and memory overhead. Practical tests verify the result of the theoretical analysis. Compared with previous techniques, BGCFC achieves the highest error detection rate with lower time and memory overhead; the composite evaluation factor shows that BGCFC is the most effective among all these techniques. The results in both theory and practice verify the applicability of BGCFC to COTS-based small satellites.
Introduction
The increasing popularity of commercial-off-the-shelf (COTS) components in modern small satellites, such as pico- and nanosatellites, 1 calls for high reliability within tight limits on cost and weight. Because COTS components lack radiation hardening, radiation in the aerospace environment, such as highly energized solar particles, photons, and electrons, may cause transient and permanent errors in both the software and hardware of COTS-based satellites. Among these radiation effects, the single event upset (SEU) effect caused by highly energized particles has been proven to be one of the major threats to spaceborne semiconductor devices and their host satellites. 2 SEU results in bit flips in memory cells, registers, or flip-flops. 3 Bit flips may introduce transient errors in both data and control flows. Data errors may cause unexpected results, while control flow errors (CFEs) are generally more serious than data errors, since they can cause the program to jump to unexpected locations and lead to unpredictable behavior and system crashes, which compromise the security of satellites. Additionally, experiments show that 33-77% (depending on the type of the processor) of SEU-induced errors are CFEs. 4, 5 Therefore, to improve the reliability and security of small satellites, fault tolerant techniques are necessary for detecting and blocking CFEs before damage occurs. 6 Several fault tolerant techniques for detecting and blocking CFEs have been reported, which fall into two categories: hardware redundancy and control flow checking (CFC). Furthermore, CFC techniques can be classified into hardware-based and software-based ones according to the implementation. In general, hardware redundancy techniques employ two or more identical processors to execute the same programs and detect CFEs (along with other errors) by comparing their outputs.
Hardware redundancy techniques have better fault tolerant capability and lower time overhead but impose higher cost, weight and complexity, 7 so they are viable for large satellites but infeasible for small satellites, whose cost and weight are strictly constrained. The general approach adopted by CFC techniques is to divide the source code into basic blocks (a block with no branching instructions except possibly the last one, as described in Section 2.1) and insert extra instructions to check the control flows running inside blocks and between them. CFEs analyzed by CFC techniques are classified into three categories:
(1) Intra-node CFEs: caused by illegal branches within a basic block. (2) Inter-node CFEs: caused by illegal branches between basic blocks. (3) Out-of-memory CFEs: caused by illegal branches to the unused memory space.
CFC techniques aim to detect all categories of CFEs with low time and memory overhead. Hardware-based CFC techniques meet this demand but require external devices or hardware modification. For example, an approach called control flow checking using execution tracing (CFCET) 8 employed an external watchdog processor (a coprocessor attached to the main processor via the address bus) to trace the execution of the target program in the main processor, and detected CFEs by validating each branching address. A hardware assisted preemptive CFC method 9 modified the processors to insert extra checking instructions for detecting CFEs. Compared to hardware redundancy, hardware-based CFC techniques are cheaper due to the lower complexity of watchdog processors and hardware modification. They have good CFE detection rates but are unsuitable when hardware changes are not permitted. Moreover, watchdog processors are difficult to attach to modern processors whose address buses are internal, and processor modification is also infeasible because most modern COTS processors are closed-source. Therefore, hardware-based CFC techniques are unfit for small satellites.
Instead of introducing extra hardware or hardware modification, software-based CFC techniques insert extra checking instructions into the source code of the target program at compile time, so that the target program performs the checking itself. Prominent software-based CFC techniques include software-based control flow checking (SCFC), 10 graph-tree-based control flow checking (GTCFC), 11 relationship signatures for control flow checking (RSCFC), 12 yet another control flow checking using assertions (YACCA), 13 control flow checking by software signatures (CFCSS) 14 and enhanced control flow checking by assertions (ECCA), 15 listed here in reverse chronological order. All these techniques except CFCSS can detect all inter-node CFEs; they are useful in industrial environments but restrictive or impractical for small satellites. Their drawbacks are described as follows.
Both SCFC and RSCFC employed an N-bit bitmap (represented as an integer) to store the successors for each basic block, where N is the total number of basic blocks in the target program. Each basic block is assigned an ID from 1 to N respectively and the basic block with ID i corresponds to the ith bit in the bitmap. The bitmap is effective in exploiting bit-level parallel operation. Nevertheless, in practice, N is usually larger than the word length of the processor (usually 8, 16 or 32), so that a bitmap consists of multiple registers, then a single operation on the bitmap decomposes to multiple operations on registers, which increases the time and memory overhead. GTCFC, which claimed to be dedicated for picosatellites, introduced virtual basic blocks for the basic blocks that have multiple predecessors to ensure that each basic block has only one predecessor, and introduced an array to store the predecessors for each (virtual) basic block. The array occupies a linear memory space, which is considerable for resource constrained small satellites. The inserted checking instructions in virtual basic blocks consume extra time and memory. Moreover, element access in the array needs indirect addressing, which is the slowest one among addressing modes of processors. Thus, GTCFC is restrictive to small satellites. YACCA and ECCA represent predecessors of each basic block as the product P of their IDs, and then expose CFEs by checking the divisibility of P and divided-by-zero exception respectively. In practice, P may be a big integer and exceed the word length, so the operation and storage may lead to extra time and memory overhead. Furthermore, multiplication and division will cause a huge performance decay for processors without multipliers like ADuC841, which is adopted by ZDPS-1A pico-satellite. 16, 17 Thus, YACCA and ECCA may be time and memory consuming in practice. 
CFCSS performed simple XOR and AND operations at the beginning of each basic block to check CFEs, so it has a low time and memory overhead. However, its detection rate is low due to the lack of checking instructions at the end of basic blocks and the aliasing problem. 14
To overcome the drawbacks of previous techniques in small satellite applications, we propose a generic high-performance and low-overhead software-based CFC technique named bipartite graph-based control flow checking (BGCFC). BGCFC partitions the target program into basic blocks and builds its control flow graph as previous techniques do. 10-15 A conventional basic block consists of a cluster of instructions, and an illegal jump can start from or end at any instruction of the basic block, so illegal branches that start from or end at the beginning, the inside, or the end of a basic block have to be treated differently. To simplify illegal branches, BGCFC transforms the control flow graph into an equivalent bipartite graph by splitting each basic block into two sub-blocks, so that only illegal branches between sub-blocks need to be handled. Like SCFC and RSCFC, BGCFC employs a bitmap to store the predecessors of each basic block in the corresponding bits. However, because a basic block usually has only a few predecessors, storing all basic blocks in a single bitmap wastes most of its bits. Hence, BGCFC stores only the predecessors of basic blocks with multiple predecessors in the bitmap to reduce its length. Furthermore, to increase the storage density of the bitmap, BGCFC assigns IDs to the predecessors of each basic block as consecutively as possible. During the execution of the target program, BGCFC performs a check at each sub-block using only XOR, AND, SUB and SHIFT operations, which are among the fastest instructions and are ubiquitous in processors. BGCFC inherits the advantages of SCFC, RSCFC and CFCSS, and overcomes their shortcomings.
As a result, BGCFC is not only capable of detecting all inter-node CFEs with low time and memory overhead, but also generic among arbitrary COTS processors, and independent of any specific hardware. Consequently, BGCFC is applicable and useful in COTS-based small satellites.
Bipartite graph
To reduce the types of illegal branches for further CFC, BGCFC transforms the original control flow graph into the equivalent bipartite graph. This section presents the details of the bipartite graph, including the motivation of introducing it and the effects that it brings on.
Definition
Definitions of the relevant concepts are provided as follows (some of them are adopted commonly by previous techniques).
Basic block: 11 A maximal set of ordered instructions that execute sequentially from the first to the last, with no branching except possibly at the last instruction. The branching destination is always the first instruction of a basic block. Basic blocks are denoted as vertices $v_i$ ($i \in \{0, 1, \ldots, N-1\}$). The zero-based index $i$ conforms to the convention of the C programming language, which is widely adopted in small satellite development. 18
Control flow: 12 A branch from vertex $v_i$ to $v_j$ represents that $v_i$ possibly branches to $v_j$ in the error-free state. Control flows are denoted as directed edges $e_k$ ($k \in \{0, 1, \ldots, M-1\}$), where $M$ is the total number of edges. The edge is also denoted as $br_{ij}$ if it is from $v_i$ to $v_j$.
Predecessor/Successor: If there is an edge $br_{ij}$ from $v_i$ to $v_j$, then $v_i$ is a predecessor of $v_j$. Equivalently, $v_j$ is a successor of $v_i$. The set of predecessors of $v$ is denoted as $pred(v)$. The set of successors of $v$ is denoted as $succ(v)$.
Control flow graph: 12 A program can be represented by a graph $G = \{V, E\}$, which is composed of a set of basic blocks $V = \{v_i, i \in \{0, 1, \ldots, N-1\}\}$ and a set of control flows $E = \{e_k, k \in \{0, 1, \ldots, M-1\}\}$.
Illegal branch: A branch that fails to meet the definition of basic blocks (e.g., not from the end of a basic block or not to the beginning of a basic block), or that represents an edge that does not exist in the set of edges $E$ (e.g., not to a legal successor).
Sub-block: Split a basic block $v_i$ into two sub-blocks $v_i^u$ and $v_i^w$, where $v_i^u$ corresponds to the beginning of $v_i$ and $v_i^w$ to the end of $v_i$. An extra directed edge $e_i^{uw}$ from $v_i^u$ to $v_i^w$ is generated. Obviously, $succ(v_i^u) = \{v_i^w\}$ and $pred(v_i^w) = \{v_i^u\}$.
Atomic: If all branches from $v_i$ to $v_j$ are treated identically, so that there is no need to consider whether a branch starts from or ends at the beginning, the inside, or the end of $v_i$ and $v_j$, then both $v_i$ and $v_j$ are atomic; otherwise, they are not atomic.
Cardinality: 19 Number of elements of a set. The cardinality of a set U is denoted as |U|.
Effects of bipartite graph
According to the legality of error-free control flows, illegal branches can be further categorized into six types as follows: 10
(1) Type 1: the illegal branch starts from any instruction of a basic block and ends at any other instruction of the same basic block. It skips or re-executes intermediate instructions within the basic block and causes an intra-node CFE.
(2) Type 2: the illegal branch starts from any instruction excluding the last one of a basic block and ends at the first instruction of its legal successor. It skips the instructions including the last one of the former basic block and causes an inter-node CFE.
(3) Type 3: the illegal branch starts from the last instruction of a basic block and ends at any instruction excluding the first one of its legal successor. It skips the instructions including the first one of the latter basic block and causes an inter-node CFE.
(4) Type 4: the illegal branch starts from any instruction excluding the last one of a basic block and ends at any instruction excluding the first one of its legal successor. It skips both the instructions including the last one of the former basic block and the instructions including the first one of the latter basic block, and causes an inter-node CFE.
(5) Type 5: the illegal branch starts from a basic block and ends at an illegal successor. It generates an extra edge that does not exist in the edge set $E$ and causes an inter-node CFE.
(6) Type 6: the illegal branch starts from a basic block and ends at the unused memory space. It causes an out-of-memory CFE.
Types 2-4 correspond to illegal branches from a basic block to its legal successor, and Type 5 corresponds to the illegal branch from a basic block to an illegal successor. The types mentioned above are cumbersome because basic blocks are not atomic, so illegal branches that start from or end at the beginning, the inside, and the end of basic blocks have to be handled separately. To eliminate this cumbersomeness, we split each basic block into two atomic sub-blocks. Thus, the original control flow graph can be transformed into a bipartite graph, and all branches between sub-blocks are considered the same. Fig. 1 shows an example of a control flow graph that contains basic blocks with a unique predecessor ($v_3$ and $v_4$) and with multiple predecessors ($v_1$, $v_2$, $v_5$), together with the transformed bipartite graph. In Fig. 1, circles represent the original basic blocks and rectangles represent the sub-blocks. Basic blocks and their corresponding sub-blocks are placed horizontally.
Theorem 1. Any control flow graph can be transformed into an equivalent bipartite graph by splitting each basic block into two sub-blocks.
Proof. Given a control flow graph $G = \{V, E\}$, three cases are analyzed below according to the number of vertices.
(1) $|V| = 0$. Obviously, the corresponding sub-block sets are $U = \emptyset$ and $W = \emptyset$, and vertices in the same set are not connected because both sets are empty. In addition, $U \cap W = \emptyset$ and $E^{uw} = \emptyset$. Thus $G = \{\emptyset, \emptyset\}$ can be transformed into the equivalent bipartite graph.
(2) $|V| = 1$. Obviously, vertices in the same set are not connected since each set contains only one element, $U \cap W = \emptyset$, and an extra edge $e^{uw}$ starts from $v^u$ and ends at $v^w$. Thus, $G = \{\{v\}, \emptyset\}$ can be transformed into the equivalent bipartite graph $BG$.
(3) $|V| > 1$. In this case, each single vertex $v_i$ can be treated as a sub-graph $G\_v_i = \{\{v_i\}, \emptyset\}$; applying the result of Case (2), $G\_v_i$ can be transformed into its equivalent bipartite graph.
To sum up, any control flow graph can be transformed into an equivalent bipartite graph.
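The splitting step used in Theorem 1 can be sketched in C as follows. This is only an illustrative sketch: the `Edge` type and the function names are hypothetical, and the sub-blocks are numbered with the even/odd mapping that the paper adopts later for ID assignment (sub-block $v_i^u$ becomes index 2i, sub-block $v_i^w$ becomes index 2i+1).

```c
#include <stddef.h>

/* A directed edge between two vertices, identified by index (hypothetical type). */
typedef struct {
    int from;
    int to;
} Edge;

/*
 * Split every basic block i into sub-block 2i (beginning, set U) and
 * sub-block 2i+1 (end, set W). The caller must provide room for
 * n_blocks + n_edges entries in `out`. Returns the number of edges written.
 */
size_t to_bipartite(int n_blocks, const Edge *cfg, size_t n_edges, Edge *out)
{
    size_t k = 0;

    /* One extra edge per basic block, from its U sub-block to its W sub-block. */
    for (int i = 0; i < n_blocks; i++) {
        out[k].from = 2 * i;
        out[k].to = 2 * i + 1;
        k++;
    }

    /* Each original edge leaves the end of its source and enters the
       beginning of its destination. */
    for (size_t e = 0; e < n_edges; e++) {
        out[k].from = 2 * cfg[e].from + 1;
        out[k].to = 2 * cfg[e].to;
        k++;
    }
    return k;
}

/* Returns 1 if every edge joins an even-indexed vertex (U) and an
   odd-indexed vertex (W), i.e. no edge stays inside one set. */
int is_bipartite_split(const Edge *edges, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        if (((edges[k].from ^ edges[k].to) & 1) == 0)
            return 0;
    }
    return 1;
}
```

Because every generated edge connects an even index with an odd index, no two vertices of the same set are ever connected, which is the property the proof relies on.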
Because no vertices in the same set are connected in BG, we evidently have four formulas below:
Theorem 2. After the bipartite graph transformation, four types of illegal branches that cause inter-node CFEs can be reduced to the illegal branches between two sub-blocks.
Proof. Illegal branches only exist between two basic blocks, so a control flow graph whose vertices contain both legal and illegal successors is sufficient for this proof. Suppose a control flow graph G = {{v 0 , v 1 , v 2 }, {e 0 , e 1 }}, where e 0 is from v 0 to
uw , e 0 , e 1 }}, and
A side effect of the bipartite graph is that intra-node CFEs cannot be detected, because intra-node CFEs appear as legal branches along $e_i^{uw}$ ($i \in \{0, 1, \ldots, N-1\}$). Actually, almost no previous software-based CFC technique attempted to detect intra-node CFEs, because intra-node CFEs skip no checking instructions. Only SCFC claimed to be able to detect intra-node CFEs by inserting a checking instruction in the middle of the basic block, 10 but the detection performance is limited because not all intra-node CFEs skip that checking instruction. The essence of the intra-node CFE detection of SCFC is to divide the basic blocks into multiple sub basic blocks (not the sub-blocks of BGCFC), which can then be used as individual basic blocks in the CFC technique. More sub basic blocks lead to better detection rates but come with more overhead. According to the experimental results of previous techniques, 10-12 the high detection rate (>90%) shows that the ratio of intra-node CFEs is so low (<10%) that BGCFC does not detect them explicitly. If the detection of intra-node CFEs is necessary, then the tradeoff between the division of basic blocks and the overhead needs to be considered.
Algorithm for CFC
After transforming the original control flow graph into its equivalent bipartite graph, various illegal branches are reduced to the illegal branch between two sub-blocks. BGCFC checks the control flow by validating whether the last visited subblock is the predecessor of the current visited sub-block. BGCFC employs a bitmap to store the predecessors for each sub-block that has multiple predecessors. The consecutive IDs of predecessors help to reduce the bitmap length.
BGCFC performs an ID assignment algorithm to ensure that the IDs of the predecessors of every basic block that has multiple predecessors are as consecutive as possible. This section elaborates on the ID assignment algorithm and the control flow checking principle. The ID assignment algorithm includes two steps: (1) perform a breadth-depth-first search to obtain the predecessors of each basic block; (2) enumerate all ID permutations to find the most consecutive one. The details of the algorithm are described as follows. After performing the algorithm, the IDs of the predecessors of each basic block are assigned as consecutively as possible. IDs of basic blocks correspond to the IDs of sub-blocks, that is, $id(v_{2i}) = 2 \cdot id(v_i)$ and $id(v_{2i+1}) = 2 \cdot id(v_i) + 1$.
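The permutation step can be illustrated with a small brute-force search in C. This is a hedged sketch rather than the paper's algorithm: the predecessor matrix is taken as given (the search step that derives it is omitted), the names `PredMatrix`, `gap_cost`, `search` and `assign_ids` are hypothetical, and "most consecutive" is interpreted here, as an assumption, as minimizing the number of unused ID slots between the smallest and largest predecessor ID, summed over all blocks with multiple predecessors.

```c
#include <limits.h>
#include <string.h>

#define MAX_BLOCKS 8  /* brute force is only practical for small graphs */

/* pred[j][i] != 0 means basic block i is a predecessor of basic block j. */
typedef unsigned char PredMatrix[MAX_BLOCKS][MAX_BLOCKS];

/* Assumed cost: unused ID slots inside the predecessor ID range of every
   block that has more than one predecessor. Zero means fully consecutive. */
static int gap_cost(int n, PredMatrix pred, const int *id)
{
    int cost = 0;
    for (int j = 0; j < n; j++) {
        int lo = INT_MAX, hi = INT_MIN, count = 0;
        for (int i = 0; i < n; i++) {
            if (!pred[j][i])
                continue;
            if (id[i] < lo) lo = id[i];
            if (id[i] > hi) hi = id[i];
            count++;
        }
        if (count > 1)
            cost += (hi - lo + 1) - count;
    }
    return cost;
}

/* Enumerate all permutations of id[k..n-1] by swapping, keeping the best. */
static void search(int n, PredMatrix pred, int *id, int k,
                   int *best_cost, int *best_id)
{
    if (k == n) {
        int cost = gap_cost(n, pred, id);
        if (cost < *best_cost) {
            *best_cost = cost;
            memcpy(best_id, id, (size_t)n * sizeof id[0]);
        }
        return;
    }
    for (int i = k; i < n; i++) {
        int tmp = id[k]; id[k] = id[i]; id[i] = tmp;
        search(n, pred, id, k + 1, best_cost, best_id);
        tmp = id[k]; id[k] = id[i]; id[i] = tmp;  /* undo the swap */
    }
}

/* Writes the chosen ID of block i to best_id[i]; returns the best cost,
   or -1 if n is out of range. */
int assign_ids(int n, PredMatrix pred, int *best_id)
{
    int id[MAX_BLOCKS];
    int best_cost = INT_MAX;

    if (n < 1 || n > MAX_BLOCKS)
        return -1;
    for (int i = 0; i < n; i++)
        id[i] = i;
    search(n, pred, id, 0, &best_cost, best_id);
    return best_cost;
}
```

The search is factorial in the number of basic blocks, so a practical tool would presumably prune or restrict the enumeration; the sketch only makes the objective concrete.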
ID assignments
For a control flow graph $G = \{V, E\}$ with $V = \{v_i \mid i \in \{0, 1, \ldots, N-1\}\}$, each $v_i$ corresponds to two sub-blocks $v_i^u \in U$ and $v_i^w \in W$ in the equivalent bipartite graph $BG$. We change the notation $v_i^u$ to $v_{2i}$ and $v_i^w$ to $v_{2i+1}$ for mapping the indices. Thus, all elements in $U$ have even indices and all elements in $W$ have odd indices. By Eq. (1), we have $pred(v_{2i}) = pred(v_i) \subseteq W$; by Eq. (4), we have $pred(v_{2i+1}) = \{v_{2i}\}$. Thus, the bitmap only needs to store the sub-blocks in $W$, which are indexed as $2i+1$. For simplicity, the bitmap only contains the basic block index $i$. Furthermore, for the basic block $v_i$, we usually have |pred(v
Checking instructions
BGCFC keeps a global signature $S$ at runtime. When the control flow is transferred from one sub-block to another, $S$ is updated to a new value. Under the legal execution of the target program, $S$ should be equal to $id(v)$ after visiting the sub-block $v$. BGCFC divides the sub-blocks $v$ into two types and handles them differently. For the type $|pred(v)| = 1$, BGCFC inserts simple checking instructions: because $v$ has a unique predecessor, the legal control flow is determinate. For the type $|pred(v)| > 1$, BGCFC inserts more complex checking instructions: because $v$ has multiple predecessors, the legal control flow may come from several legal source sub-blocks and is therefore indeterminate. The checking instructions for these two types are described in detail below. For convenience of description, we use the C programming language symbols "^", "&" and ">>" to represent the XOR, AND and SHIFT RIGHT operations respectively. We denote the previously visited sub-block as $v_{prev}$. Before executing the checking instructions, $S = id(v_{prev})$.
Statement 03 checks whether S is an odd number; if it is not, then a CFE occurs. Statement 04 checks whether $v_{prev} \in pred(v)$ by checking the bitmap bm(v); the constant off(v) = id($v_{p0}$)/2 is calculated at compile time to avoid unnecessary calculations at runtime. Because Statement 04 is complicated, we decompose it step by step:
(a) S >> 1 calculates the index i of the corresponding basic block in G;
(b) (S >> 1) - off(v) calculates the bit number n (zero-based, counted from the least significant bit) of the possible predecessor in the bitmap;
(c) bm(v) >> ((S >> 1) - off(v)) shifts bit n of the bitmap right to the least significant bit.
As a whole, Statement 04 checks whether bit n is 0. If it is not 0, then $v_{prev} \in pred(v)$ and S is updated to the ID of the current sub-block; otherwise, Statement 05 is executed, which means that a CFE occurs.
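The two kinds of checks described above can be sketched in C as below. The exact form of Statements 01-05 is an assumption reconstructed from the prose: the XOR update for the unique-predecessor case, the counter `cfe_count`, the body of `Error()`, and the function names `check_unique` and `check_multi` are hypothetical, and the range guard on the shift amount is an addition made here to keep the C code well defined. An 8-bit signature and bitmap are assumed.

```c
#include <stdint.h>

uint8_t S;            /* global signature: ID of the last visited sub-block */
unsigned cfe_count;   /* hypothetical counter standing in for error handling */

/* Hypothetical common error handler. */
void Error(void)
{
    cfe_count++;
}

/* Sub-block v with a unique predecessor (assumed form of Statements 01-02).
   In generated code, id_pred ^ id_v would be a compile-time constant. */
void check_unique(uint8_t id_pred, uint8_t id_v)
{
    S ^= (uint8_t)(id_pred ^ id_v);   /* S becomes id_v only if S was id_pred */
    if (S != id_v)
        Error();
}

/* Sub-block v with multiple predecessors (assumed form of Statements 03-05).
   bm is the bitmap bm(v) and off is the constant off(v). */
void check_multi(uint8_t bm, uint8_t off, uint8_t id_v)
{
    uint8_t block = (uint8_t)(S >> 1);   /* basic-block index of v_prev */

    if ((S & 1) == 0) {
        /* Statement 03: a legal predecessor is a W sub-block with an odd ID. */
        Error();
    } else if (block >= off && (block - off) < 8 &&
               ((bm >> (block - off)) & 1)) {
        /* Statement 04: bit n of bm(v) is set, so v_prev is in pred(v). */
        S = id_v;
    } else {
        /* Statement 05: v_prev is not a legal predecessor. */
        Error();
    }
}
```

For instance, with bm(v) = 0x1B and off(v) = 3 (predecessor basic blocks 3, 4, 6 and 7), arriving with S = 9 (basic block 4) passes the check, while S = 11 (basic block 5) or any even S triggers `Error()`.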
Take an example to describe the bitmap bm(v). Suppose id(pred(v)) = {3, 4, 6, 7}, then
$$bm(v) = \underbrace{11011}_{7,\,6,\,5,\,4,\,3} \qquad (5)$$
The IDs 3, 4, 6 and 7 are represented from the least significant bit to the most significant bit of bm(v) respectively, and each corresponding bit is set to 1; the bit for the unused ID 5 is 0.
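The construction of bm(v) in this example can be reproduced with a short C sketch. The function name `build_bitmap` is hypothetical, an 8-bit bitmap is assumed, and off(v) is taken to be the smallest predecessor index, which is an assumption about how $v_{p0}$ is chosen.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Build an 8-bit predecessor bitmap from basic-block indices.
 * Bit 0 corresponds to the smallest index, which is returned in *off.
 * Returns 0 on success, or -1 if the list is empty or the indices
 * span more than 8 consecutive values.
 */
int build_bitmap(const uint8_t *pred, size_t n, uint8_t *bm, uint8_t *off)
{
    uint8_t lo;
    uint8_t bits = 0;

    if (n == 0)
        return -1;

    /* Find the smallest predecessor index (assumed to define off(v)). */
    lo = pred[0];
    for (size_t k = 1; k < n; k++) {
        if (pred[k] < lo)
            lo = pred[k];
    }

    /* Set one bit per predecessor, relative to the smallest index. */
    for (size_t k = 0; k < n; k++) {
        unsigned bit = (unsigned)(pred[k] - lo);
        if (bit >= 8)
            return -1;
        bits |= (uint8_t)(1u << bit);
    }

    *bm = bits;
    *off = lo;
    return 0;
}
```

For the predecessor set {3, 4, 6, 7}, this yields bm(v) = 0x1B (binary 11011) with off(v) = 3, matching Eq. (5). The failure path shows why consecutive IDs matter: widely spread predecessor IDs would not fit into a single word.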
The sample program in Fig. 1 is $G = \{V, E\}$, where $V = \{v_0, v_1, v_2, v_3, v_4, v_5\}$, $E = \{e_0, e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8\}$, and $e_0 = br_{01}$, $e_1 = br_{12}$, $e_2 = br_{12}$, $e_3 = br_{23}$, $e_4 = br_{24}$, $e_5 = br_{42}$, $e_6 = br_{35}$, $e_7 = br_{45}$. Fig. 3 illustrates the CFC instructions in the BG of Fig. 1. In Fig. 3, numbers inside the solid rectangles represent the IDs of sub-blocks, and dashed rectangles contain the checking instructions for each sub-block. Note that in practice, the bitmap length equals the word length, which is usually 8, 16 or 32. In this example, the bitmap length is 8. The statements outside the dashed rectangles show, step by step, how S is updated along the legal control flow path $v_0 \to v_1 \to v_2 \to v_4 \to v_2 \to v_5$.
Error detection rate and overhead evaluation
In the aerospace, SEU affects the binary machine code instead of the higher level programming language. This section analyzes the machine code of checking instructions to evaluate the error detection rate, time and memory overhead of BGCFC respectively, and proves the applicability of BGCFC in COTS-based small satellites theoretically.
Error detection rate
SEU-induced transient errors include data errors and CFEs. Data errors are not covered by BGCFC since BGCFC is a control flow checking technique. The incorrect conditional branching is considered as the data error. 11, 12, 15 Data errors can be detected by dedicated techniques. 20, 21 The intra-node CFE is not detected by BGCFC explicitly, and the reason is explained in detail at the end of Section 2.2. To tackle the out-of-memory CFE, BGCFC fills all the unused memory space with instructions ''call Error()'', so once illegal branches jump to the unused memory space, the common error handler Error() is invoked. Therefore, the analysis of error detection rate mainly focuses on the detection of inter-node CFEs. Theorem 3. BGCFC is able to detect all types of illegal branches that cause the inter-node CFE.
Proof. From Theorem 2, the various types of illegal branches that cause inter-node CFEs can be reduced to illegal jumps between two sub-blocks. Because sub-blocks are atomic, if a branch $br_{ij}$ from sub-block $v_i$ to $v_j$ is illegal, then we must have $v_i \notin pred(v_j)$. BGCFC divides the sub-blocks $v$ into two types, $|pred(v)| = 1$ and $|pred(v)| > 1$, and inserts different checking instructions accordingly. These two types are analyzed in this proof. To sum up, BGCFC is able to detect all types of illegal branches that cause inter-node CFEs.
Therefore, BGCFC achieves a high error detection rate because it is able to detect all types of inter-node CFEs.
Time overhead
Although BGCFC only performs simple XOR, AND, SUB and SHIFT operations, which are independent of specific hardware, the time overhead still varies among different COTS processors because of differences in their speed, cache and pipelines. 22 The time overhead is defined as the ratio of the increase in execution time of the hardened program to the execution time of the original one. For the sake of fairness, we evaluate the execution time by the number of instruction cycles, which does not depend on these processor traits.
BGCFC employs a register to store S and represents other constants as immediate operands within instructions, so the addressing mode is immediate addressing which is the fastest one. In this addressing mode, the XOR, AND, SUB and SHIFT operation all take only one cycle in most COTS processors. Besides, the auxiliary instructions CMP, CALL and MOV also take one cycle.
In the best case, the branch is legal and the CALL in Statement 02 is not executed. In the worst case, the branch is illegal and the CALL is executed additionally. So the checking instructions in the sub-block with a unique predecessor take two cycles in the best case and three cycles in the worst case.
The if statements in Statements 03-05 are mutually exclusive. In the best case, the branch is illegal because S is an even number, and only Statement 03 is executed. In the worst case, Statement 04 is executed, regardless of whether Statement 05 is executed; it takes six additional cycles, and the CALL in Statement 03 is not executed. The checking instructions in the sub-block with multiple predecessors therefore take three cycles in the best case and eight cycles in the worst case.
Assume the target program contains N basic blocks, while x of them have multiple predecessors and a basic block is divided into two sub-blocks, then in the worst case, the additional instruction cycle of checking instructions is
The time overhead is
where T ori is the instruction cycles of the original program. The coefficient of x and N in T e is constant, which means for each basic block, the time complexity of inserted checking instructions is constant, i.e., O(1). Therefore, the time overhead is low.
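As a worked sketch of the worst-case count, assume that each of the 2N sub-block checks is counted once, that the x sub-blocks with multiple predecessors cost eight cycles each, and that the remaining 2N - x sub-blocks cost three cycles each (treating the entry block like a unique-predecessor sub-block is an additional assumption). The closed form below is derived here from those per-sub-block counts and should be read as an illustration under these assumptions:

```latex
T_e = 8x + 3(2N - x) = 5x + 6N,
\qquad
\text{time overhead} = \frac{T_e}{T_{ori}}
```

For the sample program of Fig. 1, with N = 6 basic blocks of which x = 3 have multiple predecessors, this gives T_e = 5 · 3 + 6 · 6 = 51 cycles for one pass over every check.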
Memory overhead
Memory overhead is defined as the ratio of the increase in memory size of the hardened program to the memory size of the original one. We evaluate the memory size by the number of checking instructions in the machine code instead of the number of bytes, because the byte size of an instruction depends on the word length of the processor. The checking instructions in a sub-block with a unique predecessor take three instructions, and the checking instructions in a sub-block with multiple predecessors take 10 instructions. Assume the target program contains N basic blocks, x of which have multiple predecessors, and each basic block is divided into two sub-blocks; then in the worst case, the additional number of instructions is
The memory overhead is
where M ori is the memory size of the original program. The coefficient of x and N in M e is constant, which means for each basic block, the memory complexity of inserted checking instructions is also constant, i.e., O(1). Therefore, the memory overhead is low.
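As a worked sketch of this count, assume that the x sub-blocks with multiple predecessors need 10 instructions each and the remaining 2N - x sub-blocks need three instructions each (again treating the entry block like a unique-predecessor sub-block, which is an assumption). The closed form is derived here from the instruction counts stated above and is an illustration under these assumptions:

```latex
M_e = 10x + 3(2N - x) = 7x + 6N,
\qquad
\text{memory overhead} = \frac{M_e}{M_{ori}}
```

For the sample program of Fig. 1 (N = 6, x = 3), this gives M_e = 7 · 3 + 6 · 6 = 57 inserted instructions.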
As to the resource constrained small satellites, we hope the CFC techniques are able to detect more CFEs but take less time and memory. Therefore, in theory, BGCFC is applicable for COTS-based small satellites due to the high error detection rate, low time and memory overhead.
Test results and discussion
To verify the theoretical analysis in practice, this section presents tests and analysis based on the onboard computer (OBC) of the in-service ZDPS-1A pico-satellite, 17 which was researched and developed by our institute. The OBC utilizes an 8-bit 8052-compatible Harvard architecture COTS processor, the ADuC841 from Analog Devices, which contains a 2 KB random access memory (RAM) for data storage and a 64 KB read only memory (ROM) for code storage. The program code is executed from the ROM directly. The ADuC841 is a simple controller and lacks dedicated hardware for calculation, such as multipliers. One of the successors of ZDPS-1A utilizes a more powerful 32-bit COTS digital signal processor (DSP), the TMS320C6747 from Texas Instruments, which also features the Harvard architecture. The TMS320C6747 is equipped with various controllers, multipliers and signal processing units; it is much more powerful than the ADuC841. However, we still choose the ADuC841 instead of the TMS320C6747 as the test bench for the following reasons. Firstly, the main purpose of this paper is to supply a generic high-performance and low-overhead software-based CFC technique for COTS-based small satellites, not one limited to our own products. The TMS320C6747 is powerful and may hide the shortcomings of the proposed method, because not all processors in small satellites are as powerful as the TMS320C6747. Secondly, according to the surveys, 1, 23 mainstream small satellites adopt Harvard architecture processors, such as ARM processors, PIC controllers and DSPs. The ADuC841 is generally less powerful than these, so if BGCFC is verified to be applicable to the ADuC841, we believe it will be generally adequate for most processors of mainstream small satellites. Thirdly, ZDPS-1A has successfully fulfilled all its missions and has been functioning well in the aerospace environment for more than three years, 17 so we believe the OBC of ZDPS-1A is reliable and will not introduce extra unsafe factors that affect the accuracy of the test results.
Finally yet importantly, the goal of BGCFC is to extend the lifecycle of the small satellites in real projects, not only limited to the academic research.
We perform the tests on four benchmarks: insertion sort (IS), quick sort (QS), matrix multiplication (MM) and fast Fourier transform (FFT). These benchmarks are widely used by other software-based CFC techniques, and they represent different patterns of control flow graph. IS and QS take many branches among relatively simple calculations, while FFT and MM carry out many time-consuming multiplications and relatively few branches. Together, the benchmarks cover the patterns of control flow graphs found in real programs, and in this sense they are adequately representative. Besides, FFT and MM are commonly adopted in real satellite missions, which makes the benchmarks more representative.
In order to simulate the SEU effects on the ground, we utilize an external complex programmable logic device (CPLD) to flip the bit randomly in the ROM and RAM for error injections. After injecting an error, the flipped bits are recovered immediately for the next error injection. Once the CFC technique detects a CFE, the Error() function increases the counter in CPLD to record the number of successful detections. We choose four preeminent previous software-based CFC techniques SCFC, GTCFC, CFCSS and ECCA for comparison tests. SCFC is the latest technique that is proposed in this year (2014). GTCFC is a dedicated CFC technique for picosatellites presented by our institute in 2013. CFCSS and ECCA are classical techniques proposed in 2002 and 1999 respectively, which have been compared in many previous papers. Therefore, these techniques are representative.
We generate six versions of each benchmark:
(1) The original code.
(2) The code hardened by BGCFC (the technique proposed in this paper).
(3)-(6) The code hardened by SCFC, GTCFC, CFCSS and ECCA, respectively.
We evaluate the time and memory overhead as the ratio by which the execution time and memory size of the hardened programs increase with respect to the original ones. Table 1 shows the comparison results of the time and memory overhead for each version of each benchmark.
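As a minimal sketch of this overhead measure, the ratio of increase can be computed as below. The function name `overhead_ratio` and the sample measurements are hypothetical and are not values from Table 1.

```python
def overhead_ratio(original: float, hardened: float) -> float:
    """Relative increase of the hardened program over the original one."""
    if original <= 0:
        raise ValueError("original measurement must be positive")
    return (hardened - original) / original

# Hypothetical measurements, not taken from Table 1.
time_overhead = overhead_ratio(original=200.0, hardened=250.0)   # execution time
memory_overhead = overhead_ratio(original=1024, hardened=1280)   # code size in bytes
print(f"{time_overhead:.2%} {memory_overhead:.2%}")              # prints 25.00% 25.00%
```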
Note that the overhead of MM and FFT hardened by CFCSS is much lower than that reported in the original paper. 14 The reason is that the ADuC841 lacks a multiplier, whereas the MIPS processor used in the original paper contains one. Because MM and FFT rely heavily on multiplications, both the execution time and the memory size of the original code are larger on the ADuC841. The checking instructions inserted by CFCSS contain only simple XOR and AND operations, which add little execution time and memory. The ratio of increase in execution time and memory size therefore drops, so CFCSS shows low time and memory overhead. SCFC incurs large time and memory overhead because the number of basic blocks N is larger than 8 (the word length of the ADuC841), so a single bitmap operation has to be split into multiple operations. ECCA causes a large performance penalty because of the numerous multiplications in its checking instructions; this is most evident for IS and QS because these two benchmarks contain no multiplications. The results for MM and FFT look better because both the benchmarks and the checking instructions contain numerous multiplications, which pulls the ratios down. The time overhead of GTCFC is slightly larger than that of CFCSS because of slow indirect addressing and the extra overhead of virtual basic blocks, but its memory overhead is significantly larger because of its linear memory space complexity. BGCFC is the only technique comparable with CFCSS in both time and memory overhead. Table 2 shows the comparison of the error detection rates of the software-based CFC techniques. We inject 10,000 errors into each version of each benchmark. Although the injected errors are random and not selected by type, the large number of injections makes it very likely that all types of CFEs occur. In the real space environment, SEUs also flip bits randomly, so this test is a good simulation of the real environment. 6 We obtain the results from the counter in the CPLD.
The comparison results show that BGCFC detects over 93% of all injected errors. The few undetected errors are data errors or intra-node CFEs, 11 which BGCFC does not detect explicitly. BGCFC, SCFC and GTCFC achieve high error detection rates, the detection rate of ECCA varies, and CFCSS shows the lowest result. The reasons are as follows. For ECCA, the memory size of the inserted checking instructions exceeds that of the original code. Since the injections are random, bit flips are more likely to occur in the inserted checking instructions than in the original code, so the result is unstable. CFCSS lacks checking instructions at the end of basic blocks, and its XOR operation leads to an aliasing problem for basic blocks with multiple predecessors (named branch-fan-in nodes in CFCSS 14), so its result is poor. Theoretically, the error detection capabilities of BGCFC, SCFC and GTCFC are the same. However, BGCFC achieves the highest error detection rate, for two reasons. First, the memory overhead of BGCFC is the lowest among these three techniques; since the injections are random, bit flips are less likely to hit its inserted checking instructions than those of the others. Second, BGCFC fills the unused memory space with "call Error()" instructions whereas the others do not, which further increases the error detection rate. SCFC performs better than GTCFC because its memory overhead is smaller and its checking instructions in the middle of basic blocks help to increase the error detection rate. Therefore, we conclude that even though the error detection rates are the same in theory, in practice the detection rate decreases as the memory overhead increases.
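The aliasing problem of CFCSS at branch-fan-in nodes can be illustrated with a small model of its XOR signature update. This is a sketch under stated assumptions: the five-block graph, the signature values, and the names `SIG`, `BASE_PRED`, `ADJUST` and `check_passes` are hypothetical, and the model covers only jumps leaving v1, v2 or v3.

```python
# Hypothetical 4-bit signatures for five basic blocks.
SIG = {"v1": 0b0001, "v2": 0b0010, "v3": 0b0011, "v4": 0b0100, "v5": 0b0101}

# Legal edges: v1->v4, v2->v4, v1->v5, v3->v5, so v4 and v5 are
# branch-fan-in nodes. v1 is taken as the base predecessor of both.
LEGAL = {("v1", "v4"), ("v2", "v4"), ("v1", "v5"), ("v3", "v5")}
BASE_PRED = {"v4": "v1", "v5": "v1"}

# Run-time adjusting signature D set in each source block:
# D = s(base) XOR s(source), which is 0 in the base predecessor itself.
ADJUST = {src: SIG["v1"] ^ SIG[src] for src in ("v1", "v2", "v3")}

def check_passes(source: str, target: str) -> bool:
    """Simulate the signature check at the entry of `target` after leaving `source`."""
    g = SIG[source]                                # G holds the signature of the block just left
    d = SIG[BASE_PRED[target]] ^ SIG[target]       # compile-time signature difference of target
    g ^= d ^ ADJUST[source]                        # update G with d and the adjusting signature
    return g == SIG[target]

for src, dst in [("v1", "v4"), ("v2", "v4"), ("v2", "v5"), ("v3", "v4")]:
    kind = "legal" if (src, dst) in LEGAL else "ILLEGAL"
    print(f"{src}->{dst} ({kind}): check passes = {check_passes(src, dst)}")
```

In this model the illegal branches v2->v5 and v3->v4 pass the check, because v4 and v5 share the base predecessor v1 and the XOR update cancels the difference between the real and the expected predecessor.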
Resource-constrained small satellites need to detect more CFEs with less time and memory, so neither the error detection rate nor the time and memory overhead alone reflects the effectiveness of a CFC technique. We therefore employ an evaluation factor that considers all three factors to evaluate the overall effectiveness of the CFC techniques. Evidently, both a higher error detection rate and lower time and memory overhead should lead to a higher evaluation factor. However, the error detection rate should carry a higher weight than the time and memory overhead because the primary goal of CFC techniques is to detect CFEs. Therefore, we set the weight ratio of the error detection rate $r_{\text{detection}}$, the time overhead $t_{\text{overhead}}$ and the memory overhead $m_{\text{overhead}}$ to 2:1:1. Note that $r_{\text{detection}}$ lies in the range 0%-100%, whereas $t_{\text{overhead}}$ and $m_{\text{overhead}}$ lie in the range $0$ to $\infty$, so we cannot simply add $r_{\text{detection}}$, $t_{\text{overhead}}$ and $m_{\text{overhead}}$. We introduce normalized factors $f(r_{\text{detection}})$, $f(t_{\text{overhead}})$ and $f(m_{\text{overhead}})$ to evaluate their contributions to the effectiveness. These factors should contribute positively to the effectiveness and be normalized to the range 0%-100%. The normalized factors are defined with respect to $\max(t_{\text{overhead}}, m_{\text{overhead}})$, the maximum value of the time and memory overhead in the test results. According to the results in Table 1, $\max(t_{\text{overhead}}, m_{\text{overhead}}) = 595.49\%$. We then define the evaluation factor $f_{\text{eval}}$ as

$$f_{\text{eval}} = 2f(r_{\text{detection}}) + f(t_{\text{overhead}}) + f(m_{\text{overhead}}) \qquad (11)$$

Fig. 4 shows the comparison of the average evaluation factors of the CFC techniques used in the tests, in which BGCFC achieves the highest average evaluation factor. These results suggest that BGCFC not only offers a high error detection rate and performs consistently well for different patterns of control flow, but also achieves low time and memory overhead, so it is applicable to COTS-based small satellites in practice.
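The 2:1:1 weighting of the evaluation factor can be illustrated numerically. The sketch below is not the exact definition used in the tests: the two normalization functions are assumptions (the detection rate used as is, and a linear mapping of the overhead onto the range 0 to 1 relative to the reported maximum of 595.49%), and the inputs are hypothetical.

```python
MAX_OVERHEAD = 5.9549  # max(t_overhead, m_overhead) = 595.49%, as reported for Table 1

def normalize_detection(r_detection: float) -> float:
    """ASSUMPTION: the detection rate is already in [0, 1] and is used as is."""
    return r_detection

def normalize_overhead(overhead: float, max_overhead: float = MAX_OVERHEAD) -> float:
    """ASSUMPTION: linear mapping, zero overhead -> 1.0 and maximum overhead -> 0.0."""
    return 1.0 - overhead / max_overhead

def evaluation_factor(r_detection: float, t_overhead: float, m_overhead: float) -> float:
    """f_eval = 2 f(r_detection) + f(t_overhead) + f(m_overhead), weights 2:1:1."""
    return (2.0 * normalize_detection(r_detection)
            + normalize_overhead(t_overhead)
            + normalize_overhead(m_overhead))

# Hypothetical inputs, not values from Table 1 or Table 2.
score = evaluation_factor(r_detection=0.90, t_overhead=0.50, m_overhead=0.40)
print(f"f_eval = {score:.3f}")
```

Under these assumed normalizations, a technique with perfect detection and zero overhead scores 4, and the score falls as either overhead grows.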
Nevertheless, two problems exist in BGCFC in practice. The first is that the time complexity of the ID assignment algorithm presented in Section 3.1 is $O(N!)$, i.e., factorial. The algorithm obtains the consecutive IDs by enumerating all permutations of the IDs $0$ to $N - 1$, and the number of such permutations is $N!$. In practice, N is large (usually >300), so the algorithm is very time-consuming. Fortunately, it runs on a personal computer (PC) on the ground instead of on the OBC of the small satellite. Thanks to the high performance of current PCs, we can wait several hours to obtain the consecutive IDs without affecting the effectiveness of BGCFC. However, the algorithm remains inconvenient and inflexible: for a large program with thousands of basic blocks, it would probably take several days to obtain the result. Therefore, the main task of future work is to reduce the time complexity of the ID assignment algorithm to quadratic, $O(N^2)$, or linear, $O(N)$. The second problem is a potential one. If the control flow graph of a target program is so unusual that the IDs cannot be made sufficiently consecutive, the storage density of the bitmap will be low and the bitmap length may exceed the word length; the performance of BGCFC may then degrade to the level of SCFC and RSCFC. At present we cannot prove theoretically which patterns of control flow graphs lead to the worst case for the consecutive IDs, so the problem remains potential. Fortunately, we have not encountered this problem so far when applying BGCFC to various real satellite programs. Both problems are related to graph theory, and we will attempt to resolve them with graph-theoretic methods in future work.
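The factorial cost of an exhaustive ID search, and its link to the bitmap length, can be sketched as follows. This is not the algorithm of Section 3.1: the cost function (the widest ID span over all successor sets, taken as a proxy for the bitmap length), the four-node graph, and the names `bitmap_length` and `best_assignment` are assumptions made for illustration.

```python
from itertools import permutations
from math import factorial

WORD_LENGTH = 8  # word length of the ADuC841 in bits

def bitmap_length(successor_ids):
    """Bits needed to cover a set of IDs with one contiguous bitmap window."""
    return max(successor_ids) - min(successor_ids) + 1

def best_assignment(successors):
    """Try all N! ID assignments and keep the one with the shortest worst-case bitmap.

    `successors` maps each node name to the list of its legal successor names.
    """
    nodes = sorted(successors)
    best_ids, best_cost = None, None
    for perm in permutations(range(len(nodes))):      # N! candidate assignments
        ids = dict(zip(nodes, perm))
        cost = max((bitmap_length([ids[s] for s in succ])
                    for succ in successors.values() if succ), default=0)
        if best_cost is None or cost < best_cost:
            best_ids, best_cost = ids, cost
    return best_ids, best_cost

# Hypothetical four-node control flow graph.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
ids, cost = best_assignment(toy)
search_space = factorial(len(toy))        # 24 permutations for N = 4
fits_in_word = cost <= WORD_LENGTH        # True when one word holds the bitmap
print(ids, cost, search_space, fits_in_word)
```

Even for N = 10 the same loop would visit 3,628,800 permutations, which shows why a polynomial-time assignment is the main task of future work.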
Conclusions
(1) The proposed SCFC technique, BGCFC, simplifies the types of illegal branches by transforming the control flow graph into the equivalent bipartite graph. It introduces a bitmap to exploit efficient bit-level parallel operations for verifying the last visited sub-blocks of the control flow at runtime, and assigns consecutive IDs to reduce the bitmap length.
(2) The effects of the bipartite graph, the error detection rate, and the time and memory overhead of BGCFC are analyzed in detail in theory, which shows that BGCFC can detect all types of inter-node CFEs with O(1) time complexity.
(3) Practical tests 11 show that the few undetected errors are data errors or intra-node CFEs. To consider the error detection rate and the time and memory overhead comprehensively, we employ an evaluation factor to evaluate the effectiveness of CFC techniques. Compared with the previous prominent software-based CFC techniques, BGCFC achieves the highest evaluation factor. The results verify the applicability of BGCFC to COTS-based small satellites in practice.
(4) In future work, two problems of BGCFC need to be resolved: the large time complexity of the ID assignment algorithm and the graph-theoretic proof of the worst case of ID assignment.
