ABSTRACT

INTRODUCTION
Transient or permanent faults introduced in a computer system during runtime can cause an incorrect sequence of instruction execution in the program and may cause control flow errors. If the system does not perform some run-time checking, the erroneous output may not be detected and serious damage may result. Therefore, it is important to monitor the program to detect any abnormality in the control flow or other error, and to take appropriate actions to avoid any incorrect output.
Signature monitoring and pure software methods have been proposed to check the control flow of a computer system. Signature monitoring is a signature analysis method in which a signature associated with a block of instructions (one or more nodes in the control flow graph) is calculated and saved during compile time; the same signature is then generated during run time and compared with the saved one. Signatures are assigned arbitrarily (assigned signature) or derived from the binary code or the address of the instructions (derived signature). Structural Integrity Checking (SIC) [Lu 82] employs an assigned signature method, while Path Signature Analysis (PSA) [Namjoo 82], Signatured Instruction Streams (SIS) [Shen 83], Asynchronous SIS (ASIS) [Eifert 84], Continuous Signature Monitoring (CSM) [Wilken 89], [Wilken 90], the extended-precision checksum method [Saxena 90], and On-line Signature Learning and Checking (OSLC) [Madeira 92] all use derived signatures.
In many signature monitoring techniques, dedicated hardware is used to calculate the run-time signatures and to compare them with the saved signatures. A watchdog processor is proposed for this purpose in [Lu 80] and [Mahmood 85]. On the other hand, when the hardware is fixed and cannot be changed, a software method that does not need any extra hardware has to be developed. Examples of software methods are assertions [Andrews 79], [Ersoz 85], watchdog task [Ersoz 85], and Block Signature Self-Checking (BSSC) [Miremadi 92]. [...] Checking (VASC) has been proposed and evaluated in [Furtado 96], in which a comparison of the ratio between control flow errors and data errors in RISC and CISC processors is reported.
SIGNATURE ANALYSIS BY INSTRUCTIONS
This paper presents a new assigned signature monitoring technique called Control Flow Checking by Software Signatures (CFCSS), which monitors assigned signatures for inter-block control flow checking using instructions, without any special hardware. The program is divided into basic blocks. A basic block is a branch-free sequence of instructions: there are no jumps into the block except at its first instruction and no jumps out of it except at its last instruction. CFCSS differs from other signature monitoring techniques in using no watchdog processor or extra hardware to achieve inter-block control flow checking. CFCSS uses an assigned signature technique similar to Structural Integrity Checking (SIC) [Lu 82], but does not need to send check-labels to a watchdog processor since it checks the signatures using instructions. Block Signature Self-Checking (BSSC) [Miremadi 92] is also an assigned signature technique that uses a subroutine to replace the watchdog processor. Its drawback is that the code is location-dependent because the signature consists of an absolute address. The control flow checking scheme presented in [Yau 80] is a pure software method, but it constructs a database containing information about concurrent control flow checking and thus may require significant memory overhead. The watchdog task described in [Ersoz 85] also needs no extra hardware, but the operating system must support a multi-tasking environment. On the other hand, a watchdog processor or similar monitoring hardware is required in the derived signature techniques, as in Path Signature Analysis (PSA) [Namjoo 82], Signatured Instruction Streams (SIS) [Shen 83], Asynchronous SIS (ASIS) [Eifert 84], Continuous Signature Monitoring (CSM) [Wilken 89], [Wilken 90], the extended-precision checksum method [Saxena 90], On-line Signature Learning and Checking (OSLC) [Madeira 92], and Implicit Signature Checking (ISC) [Ohlsson 95].
They check the intra-block and inter-block control flow by observing the signatures derived from the instruction bit patterns or addresses of a basic block, while CFCSS checks the inter-block control flow by monitoring assigned signatures in the GSR. The GSR differs from the reserved register in [Tung 90]: the reserved register stores the program or procedure name, whereas the GSR stores the signature of the current node. The GSR is not a special or additional register in the CPU. It is one of the general purpose registers of the CPU, picked by the compiler or assembler to serve as the GSR. In addition, CFCSS differs from VASC [Furtado 96] in the way it embeds and checks the signatures.
PRELIMINARIES
Before we present our approach, we define the terminology that will be used later.
A basic block is a maximal sequence of ordered instructions whose execution starts at the first instruction and terminates at the last instruction. There is no branching instruction in a basic block except possibly for the last one. A basic block terminates at either an instruction branching to another basic block or an instruction receiving transfer of control flow from two or more places in the program [Yau 80], [Namjoo 83].
By defining $V = \{v_1, v_2, \ldots, v_n\}$ as the set of vertices denoting basic blocks, and $E = \{br_{ij} \mid br_{ij}$ is a branch from $v_i$ to $v_j\}$ as the set of edges denoting possible flows of control between the basic blocks, a program can be represented by a program graph, $P = \{V, E\}$.
These $br_{ij}$'s are not necessarily explicit branch instructions. They also represent fall-through execution paths, jumps, subroutine calls and returns. An example is shown in Fig. 3.
A branch deletion changes the branch instruction of a node to a non-branch instruction. As a result, the node without the branch instruction merges with the node that is adjacent to it in the memory address space.
The xor-difference of a and b is the result of performing the bitwise XOR operation of a and b, i.e., xor-difference = a ⊕ b, where a and b are binary numbers. When control is transferred from one basic block to another, a new run-time signature G is generated by a signature function f at the destination node of the branch. A signature function f is a function that updates G for the current node by using two values: the signature of the previous node (source node of the branch) and the signature of the current node (destination node of the branch). We use these two values since the source and destination nodes of the branch uniquely determine each branch in E.
Suppose that the signature function $f$ is defined as
$f(G_1, d_2) = G_1 \oplus d_2$, where $d_2 = s_1 \oplus s_2$,
and $s_1$ and $s_2$ are the signatures of the source node $v_1$ and destination node $v_2$ of the branch $br_{12}$, respectively. Note that nodes are assigned unique numbers as their signatures. Before a branch is taken, $G$ is equal to $G_1$, which is the same as $s_1$, the signature of the source node of the branch. After the branch is taken, $G$ is updated with a new run-time signature
$G_2 = f(G_1, d_2) = G_1 \oplus d_2$.
Since the signature difference $d_2$ is $d_2 = s_1 \oplus s_2$ and $G_1$ was $s_1$, the new run-time signature is $G_2 = s_1 \oplus (s_1 \oplus s_2) = s_2$, i.e., the updated run-time signature $G_2$ is the same as the signature $s_2$ of the current node $v_2$; therefore, no error has occurred.
We chose the XOR operation as the signature function because it is better suited than other ALU operations to checking or generating signatures. As AND, OR and XOR operations use fewer gates in the ALU than addition and multiplication, they have less chance of having an error in the ALU than addition and multiplication. We want to check the correct control flow in the original program and minimize the probability of error in the signature function. The fewer gates the signature function uses, the smaller the ALU area in which an error can occur, so the probability of error in the signature function decreases. Furthermore, AND and OR operations cannot uniquely determine one input given the other input and the output. Therefore, the best candidate for the signature function is the XOR operation.
On the other hand, suppose that an illegal branch from node $v_1$ to node $v_4$ is taken. In other words, the control should have moved from node $v_1$ to node $v_2$, but an error causes an illegal branch from $v_1$ to $v_4$. Before the illegal branch is taken, $G_1$ is equal to $s_1$ as before. However, after the branch is taken, at node $v_4$, the new updated run-time signature $G_4$ is different from the signature $s_4$ of the new node $v_4$ because $G_4 = G_1 \oplus d_4 = s_1 \oplus d_4 \neq s_4$: the difference $d_4$ was computed from the signature of the legal predecessor of $v_4$, which differs from $s_1$.
[Figure: correct branching and incorrect (illegal) branching. $G_n$: run-time signature at the node $v_n$; $s_n$: signature assigned to the node $v_n$; $d_n$: signature difference.]
Thus an illegal branch to any instruction in the node will also be caught. The detailed calculation of $G$ is shown in the figure. Fig. 4.6 shows the case where an illegal branch $br_{14}$ lands at the second instruction of the node, i.e., 'br (G≠s) error'. In a similar way, as the new run-time signature is not generated at $v_4$, $G$ is still equal to the previous value $G_1$, which is not equal to $s_4$. Therefore, 'br (G≠s) error' catches this mismatch and the error is detected.
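The legal and illegal branch cases described above can be traced with a few lines of Python. This is only a sketch under stated assumptions: the 4-bit signature values, the legal edges $v_1 \to v_2$ and $v_3 \to v_4$, and the helper names `f`, `enter_at_first_line` and `enter_at_second_line` are all hypothetical and chosen for illustration.

```python
# Hypothetical 4-bit signatures; any unique numbers would do.
s = {1: 0b0001, 2: 0b0010, 3: 0b0011, 4: 0b0100}

# Signature differences for the assumed legal branches v1 -> v2 and v3 -> v4.
d = {2: s[1] ^ s[2], 4: s[3] ^ s[4]}


def f(G, d_j):
    """Signature function: XOR the run-time signature with the difference."""
    return G ^ d_j


def enter_at_first_line(G, j):
    """Run both checking instructions of node v_j.

    Returns the updated G and True when 'br (G≠s) error' does not fire.
    """
    G = f(G, d[j])
    return G, G == s[j]


def enter_at_second_line(G, j):
    """Skip the signature function and run only the comparison."""
    return G, G == s[j]


legal = enter_at_first_line(s[1], 2)      # legal branch v1 -> v2
illegal = enter_at_first_line(s[1], 4)    # illegal branch v1 -> v4
skipped = enter_at_second_line(s[1], 4)   # illegal branch landing on the comparison

print(legal, illegal, skipped)
```

In this sketch the legal branch leaves $G$ equal to $s_2$, while both illegal arrivals at $v_4$ leave $G$ different from $s_4$, so the comparison instruction would branch to the error handler.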
[Figure: nodes with the comparison instructions 'br (G≠s_1) error', 'br (G≠s_2) error', 'br (G≠s_4) error' and 'br (G≠s_5) error'.]
[...] instructions to them. However, there are cases where the same signature has to be assigned to multiple nodes, for example, the predecessors of a branch-fan-in node (Fig. 4): only one signature difference is stored at the branch-fan-in node, so all of its predecessors would need the same signature for $G$ to be updated correctly. However, if we use $s_1 = s_3$ as the signatures, then an illegal branch from $v_1$ to $v_4$, or from $v_3$ to $v_2$, will not be detected.
In order to solve the problem of assigning the same signature to multiple predecessors of a branch-fan-in node, a run-time adjusting signature $D$ is introduced. After the run-time signature $G$ is generated by the signature function, $G$ is XORed with $D$ to get the signature of the branch-fan-in node; thus, at the source node, $D$ has to be set to the value that makes $G$ equal to the signature of the destination node. For the branch $br_{12}$, $D$ is not necessary as node $v_2$ is not a branch-fan-in node; only one branch comes into $v_2$ and $d_2$ is equal to $s_1 \oplus s_2$. Thus, the updated $G$ at node $v_2$ is equal to $s_2$ as in the previous case in Fig. 4.7.
In summary, if a source node has a branch to a branch-fan-in node, the source node needs one extra checking instruction that sets $D$ to the appropriate value before branching. If the branch to the branch-fan-in node is taken, $D$ is XORed with $G$ at the destination node. If not, $D$ is simply ignored. In this way, we can assign arbitrary different numbers to all nodes in the program graph.
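The role of the adjusting signature $D$ can be illustrated with a small sketch. The assumptions here are not taken from the figures: a branch-fan-in node labelled $v_5$ with predecessors $v_1$ and $v_3$, the signature values, the choice of $v_1$ as the node from which $d_5$ is computed, and the names `D_at` and `branch_to_v5` are all hypothetical.

```python
# Hypothetical unique signatures for two predecessors and one fan-in node.
s = {1: 0b0001, 3: 0b0011, 5: 0b0101}

# The fan-in node stores a single difference, computed here from v1.
d5 = s[1] ^ s[5]

# Each predecessor sets D before branching: zero for the node used to
# compute d5, and s1 XOR s3 for the other predecessor.
D_at = {1: 0b0000, 3: s[1] ^ s[3]}


def branch_to_v5(src, use_D=True):
    """Return G after the checking instructions at the head of v5."""
    G = s[src]          # G equals the signature of the source node
    G ^= d5             # G = G xor d5
    if use_D:
        G ^= D_at[src]  # G = G xor D
    return G


from_v1 = branch_to_v5(1)
from_v3 = branch_to_v5(3)
from_v3_without_D = branch_to_v5(3, use_D=False)

print(from_v1 == s[5], from_v3 == s[5], from_v3_without_D == s[5])
```

With $D$ applied, both legal branches reach $s_5$ even though $s_1 \neq s_3$; without it, the branch from $v_3$ would be flagged as an error.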
Algorithm A
The following is the complete description of Algorithm A, which assigns signatures to each node in a program flow graph when a program is compiled. In addition, when a branch $br_{ij}$ is taken, if the destination node $v_j$ is a branch-fan-in node, the run-time signature $G_j$ is generated by the signature function and $D$, i.e.,
$G_j = f(f(G_i, d_j), D) = G_i \oplus d_j \oplus D$.
$G_j$ is then compared with $s_j$. If they match, it means no control flow error has occurred in taking branch $br_{ij}$.
CFCSS will detect the following control flow errors:
Corollary 1. An illegal branch taken to the signature function instruction (the first line of the node) will be detected.
Proof. Suppose that $br_{ij}$ is an illegal branch, thus $v_i \notin pred(v_j)$. At node $v_i$, $G$ is equal to $s_i$. After the branch is taken, the new run-time signature is generated, $G_j = G_i \oplus d_j = s_i \oplus d_j \neq s_j$, because $d_j$ was computed from the signature of a node in $pred(v_j)$ and not from $s_i$. The mismatch is detected.
Corollary 2. An illegal branch taken to the instruction 'br (G≠s) error' (the second line of the node) will be detected.
Proof. Suppose that $br_{ij}$ is an illegal branch and the branch is taken to the second line of the node, i.e., the signature function is skipped. Since the new $G$ was not generated, $G$ is still equal to $s_i$, not $s_j$. Therefore, the 'br (G≠s) error' instruction sees the mismatch and detects the error.
Corollary 4. A branch insertion inside a node will be detected.
Proof. Suppose that $br_{ik}$ is inserted at node $v_i$, $br_{ik} \notin E$. At node $v_i$, $G$ is equal to $s_i$. After $br_{ik}$ is taken to the first instruction of node $v_k$, the new updated run-time signature is $G_k = s_i \oplus d_k \neq s_k$. Mismatch occurs and the error is detected. By Corollary 3, a branch to other instructions of the node will also be detected.
Corollary 5. The deletion of an unconditional branch instruction from the node will be detected.
Proof. Suppose that the branch instruction $br_{ij}$ at node $v_i$ is changed to another instruction; therefore, $br_{ij}$ is removed from $E$ and an adjacent node $v_k$ is merged into the node $v_i$. Then, the signature of this node changes from $s_i$ to $s_k$ in the middle of the node where $v_i$ and $v_k$ are merged; thus, the $G$ updated at the beginning of the node does not match $s_k$ when control reaches the merge point. This is the same case as when an illegal branch $br_{ik} \notin E$ occurs. Therefore, the error is detected.
Aliasing
If multiple nodes share multiple branch-fan-in nodes as their destination nodes, aliasing may occur between legal and illegal branches and cause an undetectable control flow error. Suppose that an illegal branch $br_{16}$ occurs and lands at the first line of node $v_6$, where the instruction of the signature function $f(G_{prev}, d_6)$ is located. $G_{prev}$ is $G_1 = s_1$, $D$ was set to $s_2 \oplus s_1$ at node $v_1$, $d_6 = s_2 \oplus s_6$, and the updated $G$ is
$G_6 = G_1 \oplus d_6 \oplus D = s_1 \oplus (s_2 \oplus s_6) \oplus (s_2 \oplus s_1) = s_6$.
The updated $G_6$ is equal to $s_6$; therefore, this illegal branch is not detected. This aliasing error is caused by the fact that two or more branch-fan-in nodes have their signature differences calculated with the signature of the same node, and their predecessor sets are not equal. More specifically, the condition for aliasing error is:
[...] $pred(v_i)$) to node $v_j$ is undetectable when that branch is taken to the location of the instruction for the signature function.
[Figure: program graph with nodes $v_1$ to $v_6$ (basic blocks B1 to B6) and the illegal branch $br_{16}$. Checking instructions per block:
B1: G = G ⊕ d_1; br (G≠s_1) error; D = s_2 ⊕ s_1
B2: G = G ⊕ d_2; br (G≠s_2) error; D = 0000
B3: G = G ⊕ d_3; br (G≠s_3) error; D = s_2 ⊕ s_3
B4: G = G ⊕ d_4; br (G≠s_4) error
B5: G = G ⊕ d_5; G = G ⊕ D; br (G≠s_5) error
B6: G = G ⊕ d_6; G = G ⊕ D; br (G≠s_6) error]
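The aliasing case can be reproduced numerically. The sketch below assumes a graph in which $pred(v_5) = \{v_1, v_2\}$ and $pred(v_6) = \{v_2, v_3\}$, with both signature differences computed from $s_2$; the predecessor sets, the signature values and the helper name `arrive` are assumptions made for illustration, while the $D$ values follow the per-block instructions of the figure.

```python
# Hypothetical unique signatures.
s = {1: 1, 2: 2, 3: 3, 5: 5, 6: 6}

# Both fan-in nodes compute their difference from the same node v2.
d = {5: s[2] ^ s[5], 6: s[2] ^ s[6]}

# Adjusting signature set in each source node before it branches.
D = {1: s[2] ^ s[1], 2: 0, 3: s[2] ^ s[3]}

# Assumed legal edges into the two fan-in nodes.
legal_edges = {(1, 5), (2, 5), (2, 6), (3, 6)}


def arrive(src, dst):
    """G after 'G = G xor d' and 'G = G xor D' at the head of v_dst."""
    return s[src] ^ d[dst] ^ D[src]


# Every legal branch yields the destination signature, as intended.
all_legal_ok = all(arrive(i, j) == s[j] for i, j in legal_edges)

# The illegal branch v1 -> v6 also yields s6, so the check cannot see it.
aliased = arrive(1, 6) == s[6]

print(all_legal_ok, aliased)
```

Under these assumptions both checks print `True`: the legal branches pass, and so does the illegal branch $br_{16}$, which is exactly the undetectable case described in the text.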
If the illegal branch is taken to any location except for the first line of the node (the instruction for the signature function), the control flow error is detected because the new run-time signature associated with the destination node is not generated. In other words, the illegal branch is detected unless it lands at the first line of a destination node that satisfies the condition described above. With this observation, we can avoid the undetectable illegal branch if we assign signatures to the nodes in the following way.
If we assume a one-bit error, and the Hamming distance between the addresses of the first instructions in nodes $v_5$ and $v_6$ is greater than one, this undetectable illegal branch is avoided: a one-bit error in the destination field of the branch instruction at node $v_1$ cannot cause an illegal branch to the first line of node $v_6$. Similarly, if we assume an $m$-bit error and the addresses of the first instructions of all successor nodes differ by a Hamming distance greater than $m$, undetectable illegal branches caused by aliasing will be avoided.
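The Hamming-distance condition above lends itself to a simple check over the start addresses of the successor nodes. The following is a minimal sketch; the addresses and the function names `hamming`, `min_pairwise_distance` and `tolerates` are hypothetical and only illustrate the condition.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which two addresses differ."""
    return bin(a ^ b).count("1")


def min_pairwise_distance(addresses):
    """Smallest Hamming distance between any two start addresses."""
    return min(hamming(a, b) for a, b in combinations(addresses, 2))


def tolerates(addresses, m):
    """True if no m-bit error can turn one start address into another."""
    return min_pairwise_distance(addresses) > m


# Hypothetical start addresses of the first instructions of v5 and v6.
close = [0x1000, 0x1008]   # differ in one bit
apart = [0x1000, 0x1018]   # differ in two bits

print(tolerates(close, 1), tolerates(apart, 1), tolerates(apart, 2))
```

In this example the first placement does not satisfy the condition for a one-bit error, the second does, and neither satisfies it for two-bit errors.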
EXPERIMENTAL RESULTS
Seven benchmark programs were chosen for the experiment: LZW (compression), FFT, Matrix multiplication, Quick sort, Insert sort, Hanoi, and Shuffle. First, the source files were compiled and assembly code was generated. One of three branch faults (branch deletion, branch creation, or branch operand change) was randomly applied to the assembly code, and machine code was then generated by compiling the faulty assembly program. This machine code was executed, and the result of 500 iterations is shown in Table 1.
The numbers in the second row of the table (an incorrect result is produced) indicate the number of faults that cause the programs to produce incorrect results that look like correct ones to the observer. The third row means that an erroneous result is produced indefinitely because the fault creates an infinite loop in the program. In the fourth row, the processor does not respond to the observer, so the processor has to be stopped manually. The numbers in the fifth row show the number of faults that are detected by the operating system of the machine. A segmentation fault and a failed assertion are examples of faults detected by the operating system. If a correct result is produced although a branch fault was inserted into the program, that number is presented in the sixth row. It means the fault may not cause an error. The last row denotes the number of faults that are not fault-secure: these faults produce errors and are not detected by the operating system. This is the sum of the second, third and fourth rows. On average, 33.7% of the injected faults are not fault-secure.
For the second part of the experiment, CFCSS is included in the assembly source code and a branch fault (branch deletion, branch creation, or operand change) is inserted into the code. The resulting assembly code is compiled and executed. The result of 500 iterations is shown in Table 2. In this table, the last row is the number of faults that result in an error and are detected by neither the CFCSS technique nor the operating system.
The graph in [...] On the other hand, programs such as sorting and searching have small basic blocks because they have frequent branch instructions. Therefore, the overhead of checking instructions in these programs is relatively high compared to calculation-intensive programs.
OVERHEAD REDUCTION AND EXAMPLES
When we consider the overhead of Algorithm A, each node has between 2 and 4 additional instructions. If we assume that the average size of one basic block is 7 to 8 instructions [...]
2) Insert an instruction $G = G \oplus d_j$ into node $v_j$.
3) For node $v_j$ whose $pred(v_j)$ is a set of nodes $v_{i1}, v_{i2}, \ldots, v_{iM}$ (therefore, $v_j$ is a branch-fan-in node), the signature difference is determined by one of the nodes (picked [...]
Fig. 6.1(b) shows an example of a loop in which the signature is checked only at the end of the loop instead of at every node in the loop. As in Fig. 6.1(a), only node $v_6$ has the 'br (G≠s) error' instruction. The overhead of applying Algorithm A to this loop example is 37.5%. When Algorithm B is applied, the overhead is 27%. Fig. 6.2 shows four different nodes with associated checking instructions. A check node is a node that has the comparison instruction 'br (G≠s) error'. The comparison instruction is excluded from a node that is not a check node. Note that a node can be a combination of these types, e.g., a branch-fan-in and check node.
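Postponing the comparison to a check node still catches an earlier illegal branch, because a wrong value of $G$ stays wrong as further differences are XORed into it. The sketch below illustrates this on a hypothetical straight-line chain $v_1 \to v_2 \to v_3 \to v_4$ in which only $v_4$ is a check node; the graph, the signature values and the function name `run` are assumptions and do not reproduce Fig. 6.1.

```python
# Hypothetical unique signatures for a chain v1 -> v2 -> v3 -> v4.
s = {1: 1, 2: 2, 3: 3, 4: 4}

# Each node stores the difference to its single legal predecessor.
d = {2: s[1] ^ s[2], 3: s[2] ^ s[3], 4: s[3] ^ s[4]}

# Only v4 keeps the comparison instruction 'br (G≠s) error'.
check_nodes = {4}


def run(path):
    """Follow a sequence of nodes, starting with G equal to the first signature.

    Returns the node at which a mismatch is detected, or None if none is.
    """
    G = s[path[0]]
    for node in path[1:]:
        G ^= d[node]                       # G = G xor d, executed in every node
        if node in check_nodes and G != s[node]:
            return node                    # comparison fires at a check node
    return None


correct_run = run([1, 2, 3, 4])   # legal execution
faulty_run = run([1, 3, 4])       # illegal branch v1 -> v3 skips v2

print(correct_run, faulty_run)
```

In this sketch the legal path raises no error, while the illegal branch from $v_1$ to $v_3$ is not noticed at $v_3$ but is reported when control reaches the check node $v_4$.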
SUMMARY
This paper presents Control Flow Checking by Software Signatures (CFCSS), a new signature monitoring technique for control flow checking using a pure software approach.
CFCSS employs a run-time signature G and a run-time adjusting signature D to check the control flow of a program. Algorithm A detects the inter-block control flow errors by assigning a different signature to each node in the program graph. The XOR operation is used both to embed the signatures in the program and to check for them during run-time. In addition, the aliasing problem that may cause an undetectable illegal branch is discussed and the technique to avoid it is presented. Furthermore, Algorithm B is proposed to reduce the overhead by postponing the comparison of signatures.
The distinctive feature of CFCSS over previous signature monitoring techniques is that it needs no dedicated hardware such as a watchdog processor for control flow checking. A watchdog task in a multi-tasking environment also needs no extra hardware, but the advantage of CFCSS over it is that CFCSS can be used even when the operating system does not support multi-tasking. If the hardware is fixed, or a system is to be built using commercial-off-the-shelf components to keep costs low, and control flow checking is needed, CFCSS can be used to enhance the reliability of the system.
REFERENCES
[Andrews 79] Andrews, D., "Using executable assertions for testing and fault tolerance," 9th Fault-Tolerant Computing Symp., Madison, WI, June 20-22, 1979.
