Compiler-Assisted Signature Monitoring by Warter, Nancy J. & Hwu, Wen-mei W.
August 1990  UILU-ENG-90-2236 (CRHC-90-6)
Center for Reliable and High-Performance Computing
COMPILER-ASSISTED 
SIGNATURE MONITORING
Nancy J. Warter Wen-mei W. Hwu
Coordinated Science Laboratory 
College of Engineering
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Approved for Public Release. Distribution Unlimited.
UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE
REPORT DOCUMENTATION PAGE
1a. REPORT SECURITY CLASSIFICATION: Unclassified
1b. RESTRICTIVE MARKINGS: None
2a. SECURITY CLASSIFICATION AUTHORITY
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release; distribution unlimited
4. PERFORMING ORGANIZATION REPORT NUMBER(S): UILU-ENG-90-2236 (CRHC-90-6)
5. MONITORING ORGANIZATION REPORT NUMBER(S)
6a. NAME OF PERFORMING ORGANIZATION: Coordinated Science Lab, University of Illinois
6b. OFFICE SYMBOL (If applicable): N/A
7a. NAME OF MONITORING ORGANIZATION: Office of Naval Research
6c. ADDRESS (City, State, and ZIP Code): 1101 W. Springfield Ave., Urbana, IL 61801
7b. ADDRESS (City, State, and ZIP Code): 800 N. Quincy St., Arlington, VA 22217
8a. NAME OF FUNDING/SPONSORING ORGANIZATION: Joint Services Electronics Program
8b. OFFICE SYMBOL (If applicable)
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER: N00014-84-C-0149
8c. ADDRESS (City, State, and ZIP Code): 800 N. Quincy St., Arlington, VA 22217
10. SOURCE OF FUNDING NUMBERS (Program Element No., Project No., Task No., Work Unit Accession No.)
11. TITLE (Include Security Classification): COMPILER-ASSISTED SIGNATURE MONITORING
12. PERSONAL AUTHOR(S): Warter, Nancy J. and Hwu, Wen-mei W.
13a. TYPE OF REPORT: Technical
13b. TIME COVERED: FROM ____ TO ____
14. DATE OF REPORT (Year, Month, Day)
15. PAGE COUNT
18. SUBJECT TERMS: performance, memory, signature monitoring, compiler-assisted arc
20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: [X] UNCLASSIFIED/UNLIMITED  [ ] SAME AS RPT.  [ ] DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL 22b. TELEPHONE (Include Area Code) 22c. OFFICE SYMBOL
Compiler-Assisted Signature Monitoring
Nancy J. Warter  Wen-mei W. Hwu
August 8, 1990
Center for Reliable and High-Performance Computing
Coordinated Science Laboratory
1101 W. Springfield Ave.
University of Illinois at Urbana-Champaign
Urbana, IL 61801
Abstract
A methodology for applying optimizing compiler techniques to signature monitoring in order
to reduce performance overhead and simplify monitor hardware is introduced. We present
models for the monitor architecture and the signature placement. The monitor architecture
model is designed to keep both the hardware and integration complexities low.¹ Our signature
model is designed to insert reference signatures in order to satisfy a bound on the error detection
latency. Justifying signatures are inserted on program arcs using an O(N²) algorithm, which
is significantly better than previous exponential node insertion algorithms. We use optimizing
compiler techniques to customize the signature placement for various target processors and to
minimize the performance overhead due to justifying signatures.
Experiments were performed to study the performance and memory overheads of our compiler-assisted
arc insertion signature monitoring method for a variety of architectures with different
branch handling schemes. Using run-time information for processors with delayed branching
or branch target buffers reduces the performance overhead by approximately 50%. However,
processors that always fetch the instruction following a branch and squash it if the branch is
taken (e.g., the MC68000) are able to hide some of the performance overhead, and therefore the
run-time information only slightly reduces the performance overhead. Using the MC68000 as
the target processor, the performance and memory overheads for latencies between 10 and 200
instruction cycles range from 16% to 4% and from 17% to 11%, respectively. After 200 cycles,
the overheads remain relatively constant. In general, there is an inverse exponential relationship
between the performance and memory overheads and the error detection latency.
¹ Preliminary research for this paper was presented at FTCS-20 [23].
1 Introduction
An efficient concurrent error detection scheme should have good error coverage, be easy to implement,
not significantly degrade the target system performance, and have reasonable error detection
latency. For embedded concurrent error detection schemes, it is particularly important to keep the
implementation complexity low; otherwise, the additional hardware may actually lower the system
reliability. To keep the implementation complexity low, the hardware should be simple and the
integration should not require major modifications to the basic system architecture.
In recent years, signature monitoring has become an attractive embedded concurrent error
detection scheme because it can detect approximately 99% of the control flow errors [11, 17, 25]
using a simple watchdog monitor² [15, 12, 16, 20]. In signature monitoring, the compiler encodes
the program control flow information into signatures. At run-time, the watchdog monitor uses
these signatures to detect instruction bit errors and sequence errors [21]. Sequence errors correspond to
failures that result in incorrect program flow.
In most signature monitoring schemes, signatures are inserted directly into the program code
[14, 18, 25]. Adding these signatures degrades the target system performance and increases the
program memory requirements. In order to reduce these performance and memory overheads,
previous schemes have added hardware assists to the watchdog monitor [15, 19, 25].
In this paper, we present a signature monitoring method which uses optimizing compiler techniques
instead of hardware assists to reduce the performance overhead.³ The optimizing compiler
is customized to the target processor so that, apart from a simple interface, the monitor architecture
is independent of the target processor. Furthermore, signatures are placed such that they guarantee a
bound on the error detection latency.
² Experiments performed by Gunneflo et al. indicate that approximately 78% of the measured errors were control
flow errors [7].
Figure 1: The phases of signature monitoring: (a) Phase 1; (b) Phase 2.
To analyze the effectiveness of our compiler-assisted approach, we compare its performance
and memory overheads with those of the best hardware-assisted method, Wilken and Shen's Embedded
Signature Monitoring [25]. In addition, we analyze the effect of bounding the error detection
latency on the performance overhead, memory overhead, and error coverage.
2 Signature Monitoring
There are two phases to signature monitoring, as shown in Figure 1. In the first phase, the
compiler generates the signatures off-line and either embeds them into the original code [5, 10, 14,
17, 19, 20, 26] or provides the information directly to the watchdog [5, 15]. During the second
phase, the watchdog monitor computes a run-time signature based on the instructions fetched by
the target processor. At certain points, the run-time signature is compared against the precomputed
signature. Errors in the instructions or in their sequencing are detected if the signatures differ.
A program can be represented as a control flow graph. A typical control flow graph is presented
in Figure 2. A node represents a sequence of instructions with only one entry and one exit point.
Arcs represent the flow of control as determined by branch statements. The weight on an arc
represents the execution frequency of that branch. For programs that are not self-modifying, the
control flow graph is fixed and known at compile time. For compilers that can estimate the run-time
behavior of the program, the weights can also be estimated at compile time. This graph is used to
generate signatures.
Figure 2: Weighted program control flow graph.
There are two types of signatures: reference and justifying. A reference signature is used to
verify the control flow of a program interval, which can consist of one or more nodes. Reference
signatures are inserted either within the entry node or within the exit node of an interval. If it is
inserted within the entry node of the interval, then when the signature is fetched the watchdog performs
a zero check on the run-time signature and resets the run-time signature to the new reference
value. On the other hand, if it is inserted within the exit node of the interval, then when the signature is
fetched the watchdog compares its run-time signature with the reference value and resets the run-time
signature to zero.
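The entry-node and exit-node behaviors described above can be illustrated with a small Python model of the checking side of the watchdog. This is a minimal sketch, not the monitor design from this report: it assumes a plain XOR of fetched instruction words as the signature function (the report does not fix an encoding here, and practical monitors use stronger compaction), and the names `Watchdog`, `SignatureError`, `entry_reference`, and `exit_reference` are hypothetical. With XOR, the two insertion styles mirror each other: starting from the reference value and folding in the interval's words returns the run-time signature to zero.

```python
from dataclasses import dataclass
from functools import reduce
from operator import xor


class SignatureError(Exception):
    """Raised when the run-time signature disagrees with a reference."""


@dataclass
class Watchdog:
    # Assumed signature function: XOR of all fetched instruction words.
    runtime: int = 0

    def fetch_instruction(self, word: int) -> None:
        self.runtime ^= word

    def entry_reference(self, ref: int) -> None:
        # Entry-node insertion: the previous interval must have cancelled
        # its own reference, so the run-time signature should be zero.
        if self.runtime != 0:
            raise SignatureError(f"zero check failed: {self.runtime:#x}")
        self.runtime = ref

    def exit_reference(self, ref: int) -> None:
        # Exit-node insertion: compare with the precomputed value, then reset.
        if self.runtime != ref:
            raise SignatureError(f"expected {ref:#x}, got {self.runtime:#x}")
        self.runtime = 0


# Usage: the compiler precomputes the reference for a three-word interval.
interval = [0x1234, 0x5678, 0x9ABC]
ref = reduce(xor, interval)

wd = Watchdog()
for w in interval:
    wd.fetch_instruction(w)
wd.exit_reference(ref)   # matches, so no exception; runtime is reset to 0
```

Corrupting any single word of the interval changes the XOR, so the following `exit_reference` call raises `SignatureError`.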
If an interval associated with a reference signature includes more than one node, the signature
is inconsistent at the branch points for entry node insertion and at the merge points for exit node
insertion. Justifying signatures are used to make the signature consistent at these points.
Justifying signatures can be inserted either within a node (justifying node insertion) or on an arc
(justifying arc insertion).
2.1 Existing Approaches
Namjoo's Path Signature Analysis (PSA) is an example of node insertion [14]. In the original
PSA, reference signatures are inserted at the beginning of each node. To reduce the memory and
performance overhead, generalized PSA (Figure 3a) computes reference signatures for an interval,
or path set, with a common start node. At each branch in the path set, the signatures of the diverging
paths become inconsistent. Justifying signatures are added to make the signatures of all paths within
a path set consistent.
In more recent approaches, reference signatures are assigned to the exit or terminal nodes of
paths. In such approaches, the signatures are inconsistent at the merge nodes. In the Signatured
Instruction Stream (SIS) approach (Figure 3b), which uses Branch Address Hashing (BAH), reference
signatures are placed before a merge on the sequential path [17, 18, 19]. Instead of using
explicit justifying signatures, Shen and Schuette hash the branch address with the implicit signature
value of the branch. If the run-time signature is incorrect, then the rehashed branch address
will be incorrect and the error will be detected unless the incorrect target is another merge node.
Although this scheme does not use justifying signatures, it is a predecessor of arc insertion because
the implicit signature is only hashed along the taken arc of a branch.
Figure 3: Existing signature monitoring schemes.
Embedded Signature Monitoring (ESM) is a hybrid node/arc insertion method (Figure 3c)
[25, 26]. The compiler inserts justifying signatures within the node after a branch instruction. At
run-time, hardware is used to determine whether or not the branch is taken. If it is, the
justifying signature is incorporated into the run-time signature; otherwise, it is discarded. Thus, the
justifying signature is only included into the run-time signature along the taken arc of a branch.
In general, in arc insertion, justifying signatures can be placed on any merge arc, not just
the taken arc of a branch. Our signature model, presented in Section 4.1.1, considers all of the cases
for arc insertion.
2.1.1 Software Complexity
The implementation complexity includes both the hardware and software complexities. In this
paper, the software complexity refers to the time required to compile a program. For a signature
monitoring approach to be practical, the time to compile a program with signatures must be reasonable.
The algorithm complexity of the signature insertion method reflects the additional time
required to compile the program with signatures. In addition, any optimizing compiler techniques
used specifically for signature insertion should also be included in the software complexity.
Figure 4: Directed acyclic graph with out-degree two.
In Namjoo's PSA node insertion algorithm, all paths within a program interval are enumerated
[14]. These paths are then resolved to determine the justifying signatures, their placement, and
the reference signature of the interval. As shown in the following theorem, this algorithm has
exponential complexity.
Theorem 1. The maximum number of paths between two nodes in a directed acyclic graph with an
out-degree of two is exponential in the number of nodes in the interval.
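Proof. For the graph depicted in Figure 4, if node N is added to the graph with arcs to nodes N-1
and N-2, then the number of paths is P(N) = P(N-1) + P(N-2). This is the Fibonacci
recurrence, whose solution is
$$F_N = \frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^{N} - \frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^{N}$$
Because |(1-√5)/2| < 1, the second term vanishes as N grows, so the number of paths grows in
proportion to ((1+√5)/2)^N ≈ 1.618^N, that is, exponentially in N. □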
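To make the growth in Theorem 1 concrete, the following hypothetical Python sketch builds the graph family of Figure 4 under an assumed shape (node k has arcs to k-1 and k-2) and compares counting paths, which a memoized recurrence does cheaply, with listing them, which is what a path-enumeration approach must do. The function names are illustrative only.

```python
from functools import lru_cache


def count_paths(n: int) -> int:
    """Number of paths from node n to node 0 in the assumed Figure 4 graph.

    Assumed shape: node k (k >= 2) has arcs to k-1 and k-2, node 1 has a
    single arc to node 0, and node 0 is the sink.
    """
    @lru_cache(maxsize=None)
    def paths(k: int) -> int:
        if k <= 1:
            return 1                        # node 0 itself, or 1 -> 0
        return paths(k - 1) + paths(k - 2)  # P(N) = P(N-1) + P(N-2)

    return paths(n)


def enumerate_paths(n: int) -> list:
    """Explicitly list every path from node n to node 0 (exponential work)."""
    if n == 0:
        return [[0]]
    successors = [0] if n == 1 else [n - 1, n - 2]
    return [[n] + rest for s in successors for rest in enumerate_paths(s)]


for n in (5, 10, 20, 30):
    print(n, count_paths(n))
```

Counting stays cheap because subresults are reused, but the count itself follows the Fibonacci numbers: `count_paths(30)` is 1,346,269, so any method that must list and resolve every path does exponential work.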
In Section 5, we present algorithms for arc insertion which have O(N²) complexity for a program
graph with N nodes. In addition, we discuss the software complexity associated with the optimizing
compiler techniques we use.
2.1.2 Hardware Complexity
To reduce the performance overhead due to inserting the signatures into the program code, previous
methods have used hardware assists. Namjoo modified PSA by moving the signatures from the
program code into the environment of the Cerberus-16 watchdog monitor [15]. Eifert and Shen extended SIS
by removing the signatures from the program code and instead storing the program control flow
graph and signature information in the monitor memory [5]. This method, Asynchronous Signatured
Instruction Stream (ASIS), can monitor multiple processors continuously. Both of these schemes
eliminate the performance overhead but significantly increase the monitor complexity.
SIS and ESM use simple hardware assists to reduce the number of signatures fetched by the
processor and thus reduce the performance overhead. SIS uses branch detection and address hashing
hardware to combine the signature with the branch instruction. ESM uses hardware to determine
whether or not the branch is taken.
3 Monitor Architecture Model
The watchdog monitor design should be simple and easy to integrate into the target system. It 
is especially important to keep the monitor design simple if the target processor has an on-chip 
instruction cache. Since the monitor must lie between the processor and memory, the monitor 
will have to be integrated into the chip design. To simplify the monitor and ease integration, we 
assume that the signature placement scheme does not require additional hardware support or place 
restrictions on the target architecture.
The two basic parts of the monitor are the interface and checking modules. The interface
module is responsible for detecting instruction words and signatures and propagating the error
signal from the checking module to the target processor. The interface module is target processor
dependent. Previous work has addressed the interface implementation issues for a variety of target
architectures [9, 14, 16, 18, 20].
The checking module is application specific rather than processor specific. The signature encoding
scheme is chosen based on the error coverage, error detection latency, and performance and
memory overhead requirements of the application. The basic functions of the checking module are
to generate the run-time signature, incorporate justifying signatures, compare against reference
signatures, and propagate an error signal to the interface module if the run-time and reference
signatures disagree.
Subroutine calls and interrupts require special handling. Previous methods use signature stacks
to store the signature during a subroutine call or interrupt handling routine [4, 5, 18, 19]. On a
subroutine return or return from interrupt, the signature is popped off the stack and checking of the
interrupted routine continues. The signature stack significantly increases the monitor complexity
because it requires a memory interface to handle stack overflows. Saxena and McCluskey propose
a software approach for target processors that support coprocessors [16]. On an interrupt, the
signature can be saved by generic processor save/restore routines. While this reduces the monitor
complexity, it increases the performance and memory overheads. Wilken and Shen eliminate
the signature stack by using a characteristic signature for each routine [26]. On a return from
interrupt, this characteristic signature is used to justify the run-time signature. The disadvantage of
this approach is that reference signatures cannot be inserted within the interrupt handling routines.
In our approach, we assume that there is a bound on the error detection latency. If an error is
not detected within this bound, it is considered undetected. If a signature stack is used
and an error occurs within a program interval before an interrupt, the error will not be detected
until after the interrupt handler has been executed. Such errors will likely exceed the bound on the
error detection latency and are considered undetected. Therefore, signature stacks are not included
in our model. To eliminate the need for a subroutine signature stack, we assume that reference
signatures are placed before a subroutine call and at the end of a subroutine.
Interrupts, on the other hand, are asynchronous, and therefore reference signatures cannot be
placed before an interrupt. Instead, the signature checker is reset on an interrupt and checking
begins on the interrupt handling routine. Reference signatures are inserted within the handling
routine as needed to satisfy the bound, and at the end of the routine. On a return from interrupt,
the signature checker is disabled until the next reference signature is fetched. After that, normal
checking resumes.
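The interrupt policy above amounts to a two-state checker. The following Python sketch models it assuming an XOR signature function and exit-node reference signatures (compare, then reset to zero); the class and method names are hypothetical. The scenario shows why the checker must be disabled after a return from interrupt: the words fetched before the interrupt were discarded at the reset, so the first reference signature after the return would otherwise report a false error.

```python
from enum import Enum, auto


class Mode(Enum):
    CHECKING = auto()   # normal operation
    DISABLED = auto()   # after a return from interrupt, until a reference


class Checker:
    """Stackless interrupt handling (assumed XOR, exit-node references)."""

    def __init__(self) -> None:
        self.runtime = 0
        self.mode = Mode.CHECKING
        self.errors = 0

    def fetch_instruction(self, word: int) -> None:
        if self.mode is Mode.CHECKING:
            self.runtime ^= word

    def fetch_reference(self, ref: int) -> None:
        if self.mode is Mode.CHECKING and self.runtime != ref:
            self.errors += 1            # stands in for raising the error signal
        self.runtime = 0                # a reference always starts a new interval
        self.mode = Mode.CHECKING

    def interrupt(self) -> None:
        # No signature stack: drop the partial signature, check the handler.
        self.runtime = 0
        self.mode = Mode.CHECKING

    def return_from_interrupt(self) -> None:
        # The interrupted interval cannot be checked; wait for a reference.
        self.mode = Mode.DISABLED


c = Checker()
c.fetch_instruction(0x10)        # interval starts, then an interrupt arrives
c.interrupt()
c.fetch_instruction(0x5)         # handler body
c.fetch_instruction(0x6)
c.fetch_reference(0x3)           # reference at the end of the handler
c.return_from_interrupt()
c.fetch_instruction(0x20)        # rest of the interrupted interval
c.fetch_instruction(0x40)
c.fetch_reference(0x70)          # skipped while disabled, then checking resumes
print(c.errors)
```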
The elimination of signature stacks greatly simplifies the monitor hardware. In addition, for
on-chip monitors, the signatures do not need to be incorporated into the processor state. Therefore,
it is possible to integrate the monitor without major modifications to the original processor design.
4 Signature Insertion Model
The signature insertion model indicates how justifying signatures and reference signatures should 
be inserted into the program code in order to guarantee that the program is properly encoded. 
Furthermore, the justifying signature insertion model is designed to minimize the performance 
overhead and the reference signature insertion model is designed to guarantee a specified bound on 
the error detection latency. The models have low software complexity and do not require special 
hardware support beyond the basic monitor.
4.1 Justifying Signature Insertion
In this section we present our arc insertion model and show how optimizing compiler techniques 
can be used to simplify the monitor and reduce the performance overhead.
In justifying arc insertion, the program interval is justified at the program merge nodes. At a 
merge node, the signature along each incoming arc is different. Only one signature can be used to 
define the signature at the merge node. Justifying signatures are used to transform the remaining 
incoming signatures into this unique signature. There is only one constraint on placing the signatures
on the program arcs.
Constraint 1: For a merge node with i incoming arcs, justifying signatures must be placed on
i - 1 arcs.
The arcs with justifying signatures are justifying arcs and the remaining arc is the unique arc.
4.1.1 Arc Insertion Model
There are three types of justifying arcs, which are drawn as dashed lines in the control flow graphs
of the three cases in Figure 5.
In the first case, the justifying arc represents an unconditional branch. Since it is an unconditional
branch, the signature can be placed directly in the node without affecting any other program
path. The signature can be placed either before or after the branch instruction. If the target
architecture always fetches the instruction following a branch, it can be placed after the branch.
Otherwise, it must be placed before the branch.
In the second case, the justifying arc is on the sequential path. The sequential path can be either
the not-taken path of a conditional branch or the fall-through path after a non-branching node.
Either way, the last instruction in the source node and the first instruction in the destination node
of the justifying arc are in sequential memory locations. The justifying signature is placed between
these two instructions.
Figure 5: Justifying arc insertion. (Panels: Node insertion; Case 1: justifying arc from unconditional branch; Case 2: justifying arc on sequential path.)
In the third case, the justifying arc is on the taken path of a conditional branch. In this case,
the source and destination nodes of the justifying arc are not in sequential memory locations.
Therefore, to place the justifying signature on the arc, a justifying block is inserted between the
source and destination nodes. The justifying block consists of a signature instruction and a jump
instruction. The target of the branch instruction in the source node is changed to the justifying
block, and the justifying block jumps to the original destination node.
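The third case is the only one that changes the code layout. The following Python sketch shows the rewrite on a toy instruction list. The `Block` type, the mnemonics `SIG` and `JMP`, the `J_<src>_<dst>` label scheme, and the choice to append the justifying block after the last block of the program are all assumptions made for illustration, not details taken from this report.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    label: str
    instrs: list = field(default_factory=list)  # hypothetical mnemonics


def justify_taken_arc(program: list, src: str, dst: str, sig: int) -> str:
    """Case 3: place a justifying signature on the taken arc src -> dst.

    Assumes the last instruction of `src` is a conditional branch written as
    "<opcode> <label>" whose target is `dst`.
    """
    blocks = {b.label: b for b in program}
    opcode, target = blocks[src].instrs[-1].split()
    if target != dst:
        raise ValueError(f"{src} does not branch to {dst}")
    jlabel = f"J_{src}_{dst}"
    # Retarget the conditional branch to the new justifying block ...
    blocks[src].instrs[-1] = f"{opcode} {jlabel}"
    # ... which holds the signature word and then jumps to the original target.
    # Appending it after the last block keeps it off every fall-through path,
    # provided the original last block ends in a jump or return.
    program.append(Block(jlabel, [f"SIG {sig:#x}", f"JMP {dst}"]))
    return jlabel


prog = [Block("A", ["cmp r1,r2", "beq C"]),
        Block("B", ["add r1,1"]),
        Block("C", ["ret"])]
justify_taken_arc(prog, "A", "C", 0x3F)
for b in prog:
    print(b.label, b.instrs)
```

After the call, A ends with `beq J_A_C` and the new block holds `SIG 0x3f` followed by `JMP C`; the fall-through path from A to B is untouched, so only executions that take the branch fetch the two extra words.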
4.1.2 Justifying Signature Generation
For arc insertion, signature generation depends on the following property.
Property 1: There is a path along unique arcs between the start and terminal nodes of a program
interval.
Based on this property, all of the signatures of the unique arcs in a program interval can be
determined by a breadth-first search. After all the unique arcs are labeled with their signatures,
the justifying signatures can be generated as shown in Figure 6. The justifying signature J_1 is a
function of the unique signature S_i, the unique signature of its source node S_j, and the signature
of node A.
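Property 1 is what makes a single breadth-first pass sufficient. The sketch below assumes XOR compaction, under which the function of Figure 6 reduces to J_1 = S_i XOR S_j XOR sig(A); the exact function used in the report is not given in this section, so treat this as one possible instance. All names are hypothetical. Here `s_in[n]` is the unique signature on entry to node n, and the example checks that both paths through a diamond reach the same signature.

```python
from collections import deque


def generate_signatures(succ, unique, node_sig, start):
    """Label unique arcs by BFS, then derive the justifying signatures.

    succ:     dict node -> list of successors within the interval
    unique:   set of (u, v) arcs chosen as unique; every other arc into a
              merge node is a justifying arc
    node_sig: dict node -> XOR of that node's instruction words (assumed)
    """
    s_in = {start: 0}            # unique signature on entry to each node
    queue = deque([start])
    while queue:                  # breadth-first search along unique arcs
        u = queue.popleft()
        for v in succ.get(u, []):
            if (u, v) in unique and v not in s_in:
                s_in[v] = s_in[u] ^ node_sig[u]
                queue.append(v)
    justify = {}
    for u, vs in succ.items():
        for v in vs:
            if (u, v) not in unique:
                # J = S_i ^ S_j ^ sig(A), with A = u, S_j = s_in[u], S_i = s_in[v]
                justify[(u, v)] = s_in[v] ^ s_in[u] ^ node_sig[u]
    return s_in, justify


def path_signature(path, node_sig, justify):
    """Run-time signature at the end of `path`, including justifying words."""
    sig = 0
    for a, b in zip(path, path[1:]):
        sig ^= node_sig[a] ^ justify.get((a, b), 0)
    return sig ^ node_sig[path[-1]]


# A diamond: A branches to B and C, which merge at D; C -> D is justified.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
unique = {("A", "B"), ("A", "C"), ("B", "D")}
node_sig = {"A": 0x11, "B": 0x22, "C": 0x44, "D": 0x88}
s_in, justify = generate_signatures(succ, unique, node_sig, "A")
print(hex(path_signature(["A", "B", "D"], node_sig, justify)),
      hex(path_signature(["A", "C", "D"], node_sig, justify)))
```

Both paths end with the signature 0xbb, which is the value a reference signature at the end of D would hold.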
Figure 6: Signature generation for arc insertion.
4.1.3 Optimizing Compiler Techniques
In an optimizing compiler, the architectural features of the target processor are known, so
the compiler can order the instructions to fully utilize the target processor without
violating the required execution order. In a similar fashion, the target processor features can be used to
ensure that signatures are placed properly; that is, only signatures that are supposed to be included
into the run-time signature are fetched by the target processor. In particular, the branch handling
scheme must be accounted for. For example, recall that the MC68000 always fetches the instruction
following a branch and discards it if the branch is taken. Therefore, signatures can always be
placed after an unconditional branch without incurring any performance penalty. On the other
hand, signatures cannot be inserted directly after a conditional branch on its sequential arc; otherwise,
if the branch is taken, the signature will be incorrectly included into the run-time signature.
A detailed performance and memory cost analysis for a variety of branch handling mechanisms is
provided in Section 6.1.1.
Another optimizing compiler technique is to use run-time information to improve the processor
performance. For instance, run-time information can be used to place instructions to improve
sequential locality. Run-time information can also be used to place signatures to reduce the performance
overhead. The minimum number of justifying signatures required to encode a program
interval with one reference signature and n conditional branches is n [26]. Arc insertion places the
minimum number of signatures into the program code. Our goal is to use run-time information to
minimize the number of signatures fetched and thus minimize the performance degradation.
In arc insertion, any merge arc can be selected as the unique arc. Run-time information can be
used to guide this selection. By measuring the run-time behavior of the program, the node execution
and branch frequencies can be predicted. Based on this prediction, the cost of inserting a signature
on each merge arc can be determined. The cost, arc-cost, in terms of the number of instruction words
fetched, is:
arc-cost = arc-frequency × node-weight × just-words.
For example, if the signature is placed on the taken path of a conditional branch, arc-frequency is
the probability that the branch is taken, node-weight is the number of times the branch is executed,
and just-words is the number of instruction words required for a justifying block. The value of
just-words also reflects the cost of the special architectural features of the target processor.
The following theorem proves that using arc-cost to select the unique arc minimizes the performance
overhead for justifying arc insertion.
Theorem 2. If the unique arc of each merge node corresponds to the incoming arc with the highest
arc-cost, the number of instruction words fetched to justify the program is minimized.
Proof. Since justifying signatures are placed on the arcs, the signature assignments for each merge
node do not depend on the assignments at other merge nodes. Therefore, the total number of
instruction words fetched for justification is the sum of the words fetched at each merge node.
For a single merge node, the words fetched to justify it are the sum of the arc-costs of all incoming
arcs except the unique arc, so choosing the arc with the highest arc-cost as the unique arc minimizes
the words fetched to justify that node. Since the choices at different merge nodes are independent,
the sum of these per-node minimums is the minimum total, and the number of instruction words
fetched to justify the entire program is minimized. □
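A small Python sketch of the selection rule in Theorem 2: compute arc-cost for each incoming merge arc, keep the most expensive one as the unique arc, and compare the result against an exhaustive search. The cost figures in the example (execution counts, probabilities, and 2 words for a signature-plus-jump justifying block) are made-up illustrative values, not measurements from this report, and the function names are hypothetical.

```python
from itertools import product


def arc_cost(arc_frequency: float, node_weight: int, just_words: int) -> float:
    """arc-cost = arc-frequency * node-weight * just-words (words fetched)."""
    return arc_frequency * node_weight * just_words


def select_unique_arcs(merge_arcs):
    """Theorem 2 rule: at each merge node keep the costliest arc as unique.

    merge_arcs: dict merge node -> dict incoming arc name -> arc-cost.
    Returns (unique arc per merge node, total justifying words fetched).
    """
    unique, total = {}, 0.0
    for m, costs in merge_arcs.items():
        best = max(costs, key=costs.get)
        unique[m] = best
        total += sum(c for arc, c in costs.items() if arc != best)
    return unique, total


def brute_force_total(merge_arcs):
    """Cheapest total over every combination of unique-arc choices."""
    nodes = list(merge_arcs)
    best = float("inf")
    for choice in product(*(merge_arcs[m] for m in nodes)):
        cost = sum(c for m, u in zip(nodes, choice)
                   for arc, c in merge_arcs[m].items() if arc != u)
        best = min(best, cost)
    return best


# Illustrative numbers only: x is a taken arc needing a 2-word justifying
# block; y is a sequential arc needing a single signature word.
costs = {
    "M1": {"x": arc_cost(0.7, 1000, 2), "y": arc_cost(0.3, 500, 1)},
    "M2": {"a": 50.0, "b": 120.0, "c": 10.0},
}
unique, total = select_unique_arcs(costs)
print(unique, total, brute_force_total(costs))
```

The greedy per-node choice matches the exhaustive search because, as the proof notes, the choice at one merge node does not affect the cost at any other.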
This theorem shows that, to the extent that the run-time information predicts the actual program
behavior, using it minimizes the performance overhead for justifying arc insertion. In the experiment
section (Section 6), we demonstrate empirically that optimized arc insertion (i.e., using run-time
information) minimizes the overhead due to justifying signatures.
4.2 Reference Signature Insertion
The separation of reference signatures defines the checking interval l_max. For bit errors, the average
detection latency is l_max/2 and the maximum detection latency is l_max [26]. For single sequence
errors, the average detection latency is l_max and the maximum detection latency is 2l_max. Let B
be the bound on the error detection latency for all bit errors and single sequence errors. Since the
worst case, 2l_max, must not exceed B, reference signatures must be placed such that l_max is at most B/2.
4.2.1 Reference Insertion Model
The reference signature insertion model is shown in Figure 7. A reference signature is required at
each program exit point in order to correctly check the program (case 1). Recall that a signature
stack would violate the bound on the error detection latency. To eliminate the need for a signature
stack for subroutine calls, reference signatures are placed before the call and at the end of the
routine (cases 2 and 3). A reference signature is placed at the end of an inner loop (case 4) in order
to guarantee that loops of length less than l_max do not violate the bound on the detection latency.
Furthermore, this breaks cycles in the program graph, which simplifies the reference placement
algorithm presented in the next section. Finally, signatures are placed such that no two are farther
apart than l_max (case 5).
Reference signatures are placed:
case 1: at program exit points,
case 2: before a subroutine call,
case 3: at the end of subroutines,
case 4: at the end of an inner loop, and
case 5: to guarantee a bound, l_max, on the error detection latency.
Figure 7: Reference signature insertion model.
5 Signature Insertion Algorithms
In this section, the algorithms for placing and generating both justifying and reference signatures
are presented. A discussion of the algorithm complexities and the overhead associated with collecting
run-time information is provided at the end of the section.
5.1 Justifying Signature Placement Algorithm
The algorithm for justifying signature placement4 is shown in Figure 8. The algorithm implements 
the justifying arc insertion model and generates a partial terminal node set T . This set corresponds 
to the first four cases o f  the reference signature model, namely, a program or subroutine exit node, 
an inner-loop exit node, or the node before a subroutine call. The program control flow graph, G, is 
the input to the algorithm. First, the terminal nodes are determined. Then, for each merge node, if 
all incoming arcs are from terminal nodes, none o f the signatures need to be justified. Otherwise, a 
unique arc is selected. The unique arc can be specifically selected (e.g., using run-time information)
4 For the algorithms in this section, it is assumed that the compiler converts all switch statements into the equivalent 




/* place_justifying_signatures
   input:  G = program control flow graph
   output: program graph with justifying signatures and
           partial terminal node set T */
place_justifying_signatures(G)
{
    for each node n in G
        if n is a terminal node
            add n to the terminal node set T
            place a reference signature at the end of n
    for each merge node m in G
        if all incoming arcs to m are from terminal nodes
            mark all arcs as unique
        else
            select a unique arc
            for each non-unique merge arc x
                if x is from an unconditional branch
                    place a justifying signature before the branch instruction
                else if x is between two sequential nodes s1 and s2
                    create a justifying signature after the last instruction of node s1
                else    /* x is the taken arc of a conditional branch */
                    create a justifying block and place it between the
                        conditional branch node and the target node
                    correct the target labels
}
Figure 8: Justifying signature placement algorithm.
or it can be selected at random. Note that for an unconditional branch, the signature can be placed
after the branch on target architectures that always fetch the instruction following a branch. The 
MC68000 is an example of such an architecture [3].
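The placement rules of Figure 8 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: arcs are hypothetical `(src, dst, kind)` tuples whose `kind` is `"jump"` (unconditional branch), `"seq"` (fall-through), or `"taken"` (taken arc of a conditional branch), and the hypothetical `weight` function stands in for "selected using run-time information" (for example, profiled count times per-arc cost). With no weight, the unique arc is picked at random, as in Random JAI.

```python
import random
from collections import defaultdict

# An arc is a hypothetical (src, dst, kind) tuple; kind is "jump" (unconditional
# branch), "seq" (fall-through), or "taken" (taken arc of a conditional branch).

def place_justifying(arcs, terminal, weight=None, rng=None):
    """Return {arc: placement} for every non-unique incoming arc of a merge node.

    weight(arc) estimates the run-time cost avoided by making `arc` the unique
    arc (an assumed stand-in for profile-guided selection); if weight is None,
    the unique arc is chosen at random.
    """
    rng = rng or random.Random(0)
    incoming = defaultdict(list)
    for arc in arcs:
        incoming[arc[1]].append(arc)
    placements = {}
    for ins in incoming.values():
        if len(ins) < 2:
            continue  # not a merge node
        if all(src in terminal for src, _, _ in ins):
            continue  # every incoming arc is unique; nothing to justify
        unique = max(ins, key=weight) if weight else rng.choice(ins)
        for arc in ins:
            if arc == unique:
                continue
            src, dst, kind = arc
            if kind == "jump":
                placements[arc] = f"signature before the branch in {src}"
            elif kind == "seq":
                placements[arc] = f"signature after the last instruction of {src}"
            else:
                placements[arc] = f"justifying block between {src} and {dst}"
    return placements


counts = {("A", "C", "taken"): 90, ("B", "C", "seq"): 10,
          ("D", "E", "jump"): 5, ("F", "E", "seq"): 50}
print(place_justifying(list(counts), terminal=set(), weight=counts.get))
```

With the hypothetical profile above, the frequently executed arcs A->C and F->E become unique, so only B->C (a fall-through) and D->E (an unconditional jump) receive justifying signatures, and no justifying block is needed.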
5.2 Reference Signature Placement Algorithm
The algorithm for reference signature placement is shown in Figure 9, and its functions are shown 
in Figure 10. The algorithm places reference signatures so that the distance between any two 
consecutive reference signatures is at most lmax. The program control flow graph, G, is effectively 
acyclic, since the terminal nodes break the cycles. S and T denote the start node and the terminal 
node set, respectively.
The algorithm is greedy. Starting from the start node and from each terminal node, it
traverses the paths of all successors, calculating the maximum path length (step 1). The traversal 
along each path stops at a terminal node. When the successor of a node would make the path length 
greater than lmax, the current node is marked as a terminal node. The successors of the new 
terminal nodes are also traversed. The algorithm stops when all arcs have been visited. The 
reference signatures are then placed at the end of each terminal node (step 2).
During the traversal, when paths merge, add_queue combines them into one path whose length is 
set to the maximum of the merged path lengths. In addition, the number of duplicates of the 
end node, dups, is incremented. A merge node is only removed from the queue by remove_queue 
when all of its incoming paths have been traversed (i.e., when p.dups equals the number of 
predecessors of p.end_node).
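A runnable Python sketch of the greedy traversal follows. It is an illustration under stated assumptions, not the authors' code: the graph is a hypothetical `succ` dictionary of successor lists plus a `size` dictionary of instruction counts; the queue is an ordered dictionary keyed by end node, so the linear search in add_queue becomes a lookup; a path is treated as ready once its duplicate count reaches the predecessor count (so the start node, which has no predecessors, is ready immediately); and, as the text states, paths stop at the terminal nodes given as input, whose successors are seeded at the start.

```python
from collections import OrderedDict

def place_reference(succ, size, start, terminals, lmax):
    """Greedy reference-signature placement (a sketch of Figures 9 and 10).

    succ: node -> list of successors (assumed acyclic once the arcs leaving
          the nodes in `terminals` are cut); size: node -> instruction count;
    terminals: the partial terminal node set T. Returns the complete set.
    """
    T = set(terminals)
    npred = {n: 0 for n in size}
    for ss in succ.values():
        for s in ss:
            npred[s] += 1
    queue = OrderedDict()  # end node -> [longest path length, dups]

    def add_queue(node, length):
        if node in queue:  # paths merge: keep the longest, count the arrival
            queue[node][0] = max(queue[node][0], length)
            queue[node][1] += 1
        else:
            queue[node] = [length, 1]

    add_queue(start, size[start])
    for t in terminals:
        for s in succ.get(t, []):
            add_queue(s, size[s])
    while queue:
        # remove_queue: first path whose end node has all incoming arcs visited
        node = next(n for n, (_, dups) in queue.items() if dups >= npred[n])
        length, _ = queue.pop(node)
        if node in terminals:
            continue  # traversal stops here; successors were seeded above
        ss = succ.get(node, [])
        if ss and length + max(size[s] for s in ss) > lmax:
            T.add(node)  # a reference signature goes at the end of this node
            for s in ss:
                add_queue(s, size[s])
        else:
            for s in ss:
                add_queue(s, length + size[s])
    return T


chain = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(place_reference(chain, {n: 4 for n in chain}, "A", {"D"}, lmax=10))
```

For a straight-line chain of four 4-instruction nodes with lmax = 10, the path A-B already has length 8, and adding C would reach 12, so B becomes a terminal node alongside the exit node D.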
5.3 Signature Generation Algorithm
The signature generation algorithm is presented in Figure 11. The unique arcs have already been 
identified by the justifying signature placement algorithm. The intermediate signatures on the 
unique arcs are marked by a traversal that starts from the start node and the terminal nodes 
(Figure 11 implements it with a stack). Once all the unique arcs have been marked with their 
intermediate signatures, the reference signatures are known and the justifying signatures can be 
calculated as shown in Figure 6 in Section 4.1.2.
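The traversal and the justifying-signature calculation can be illustrated with a small Python sketch. Figure 6 is not reproduced here, so the signature function is an assumption: a hypothetical 16-bit update `step` that rotates the signature left by one bit and XORs in the instruction word. Under that assumption, the justifying word for a non-unique arc is simply the value that drives the signature on that arc to the intermediate signature already marked on the unique arc into the same merge node. The sketch also assumes that a reference signature resets the run-time signature to an initial value `init`, and that arcs into single-predecessor nodes are unique automatically.

```python
MASK = 0xFFFF  # hypothetical 16-bit signature register

def step(sig, word):
    """Hypothetical signature update: rotate left by one bit, then XOR the word."""
    return (((sig << 1) | (sig >> 15)) & MASK) ^ word

def justifying_word(sig_on_arc, sig_on_unique_arc):
    """Word J such that step(sig_on_arc, J) == sig_on_unique_arc."""
    return step(sig_on_arc, 0) ^ sig_on_unique_arc

def generate(succ, words, start, terminals, chosen, init=0):
    """Stack traversal in the spirit of Figure 11 (sketch).

    chosen: set of (src, dst) unique arcs picked for the merge nodes.
    Returns (entry signature per node, reference signature per terminal
    node, justifying word per non-unique arc).
    """
    npred = {}
    for ss in succ.values():
        for s in ss:
            npred[s] = npred.get(s, 0) + 1

    def unique(arc):
        return arc in chosen or npred[arc[1]] == 1

    entry, arc_sig, ref = {start: init}, {}, {}
    stack = [start]
    for t in terminals:  # arcs leaving a terminal node start from the reset value
        for s in succ.get(t, []):
            if unique((t, s)) and s not in entry:
                entry[s] = init
                stack.append(s)
    while stack:
        n = stack.pop()
        sig = entry[n]
        for w in words[n]:
            sig = step(sig, w)
        if n in terminals:
            ref[n] = sig  # reference signature checked at the end of n
            sig = init    # assumed reset after the check
        for s in succ.get(n, []):
            arc_sig[(n, s)] = sig
            if unique((n, s)) and s not in entry:
                entry[s] = sig
                stack.append(s)
    just = {arc: justifying_word(arc_sig[arc], entry[arc[1]])
            for arc in arc_sig if not unique(arc)}
    return entry, ref, just
```

The XOR-and-rotate update is chosen only because its inverse is trivial to write down; a real monitor would use the signature function defined in Section 4, and the justifying word would be derived from that function instead.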
5.3.1 Complexity Analysis
For a program graph of N nodes, the complexity of the justifying signature placement algorithm 
is $O(N^2)$. To generate the terminal nodes, loop analysis must be performed; the complexity of 
the loop analysis algorithm is $O(N^2)$ [1]. Once loop analysis has been performed, N nodes are 
considered to identify and mark the terminal nodes. To mark the unique arcs, at most $2N - 2$
/* place_reference_signatures
   inputs:  G = program_graph, S = start node,
            T = partial terminal node set,
            lmax = 1/2 error detection latency bound
   outputs: program graph with reference signatures
            placed no further apart than lmax and
            the complete terminal node set T */
place_reference_signatures(G, S, T, lmax)
{
    p = generate_path(S)                          /* step 1 */
    add_queue(ref_queue, p)
    for each terminal node t in T
        for each successor s of t
            p = generate_path(s)
            add_queue(ref_queue, p)
    while ref_queue not empty
        p = remove_queue(ref_queue)
        if p.end_node was not in T on input       /* paths stop at the input terminal nodes */
            if p.length + max(|s| for each successor s of p.end_node) > lmax
                mark p.end_node as a terminal node and add to T
            for each successor s of p.end_node
                if p.end_node is a terminal node
                    new_p = generate_path(s)
                else
                    new_p = update_path(s, p)
                add_queue(ref_queue, new_p)
        destroy p
    for each node in G                            /* step 2 */
        if a terminal node
            place a reference signature at the end of the node
}




/* generate_path
   input:  n = program graph node
   output: path p whose length equals the length of n,
           whose end node is n, and whose number of
           duplicates of n is initialized to 1 */
generate_path(n)
{
    create p
    p.length = |n|
    p.end_node = n
    p.dups = 1
    return p
}

/* update_path
   inputs: n = program node, p = current path
   output: a new path, new_p, whose length is the
           length of p plus the length of n, whose end
           node is n, and whose number of duplicates of n
           is initialized to 1 */
update_path(n, p)
{
    create new_p
    new_p.length = p.length + |n|
    new_p.end_node = n
    new_p.dups = 1
    return new_p
}

/* add_queue
   inputs: queue = list of paths, p = path to add
   output: queue with either a new path p or an
           updated path e that has the same end
           node as p; the updated path e has its length
           set to the maximum length of p and e, and
           the number of duplicates of the end node
           of e is incremented */
add_queue(queue, p)
{
    for each element e in queue
        if e.end_node = p.end_node
            e.length = max(e.length, p.length)
            e.dups = e.dups + 1
            return
    add p to end of queue
}

/* remove_queue
   input:  queue = list of paths
   output: path p whose end node has had
           all its incoming arcs visited */
remove_queue(queue)
{
    p = remove the first element of queue
    while p.dups < number of predecessors of p.end_node    /* the start node has none */
        add p to end of queue
        p = remove the first element of queue
    return p
}
Figure 10: Functions of the reference signature placement algorithm.
/* signature generation
   inputs: G = program graph, S = start node,
           T = terminal node set */
for each node n in {S} ∪ T
    for each successor s of n
        if arc (n, s) is an unmarked unique arc
            mark the intermediate signature on the unique arc
            push s on unique_stack
while unique_stack not empty
    pop n off unique_stack
    if n is not a terminal node
        for each successor s of n
            if arc (n, s) is an unmarked unique arc
                mark the intermediate signature on the unique arc
                push s on unique_stack
    else
        calculate the reference signature of n
for each merge node in G
    calculate the justifying signatures of its non-unique incoming arcs
Figure 11: Signature generation algorithm.
merge arcs are considered for a graph with N nodes and an out-degree of two.
The complexity of the reference signature placement algorithm is also $O(N^2)$. In step 1, 
add_queue is called once for each arc and remove_queue is called once for every node other than 
the initial start and terminal nodes. Both add_queue and remove_queue linearly search the queue 
and thus have $O(N)$ complexity. In step 2, each node is evaluated once. Therefore, the reference 
signature placement algorithm has $O((2N - 2) \cdot N) + O(N^2) + O(N) = O(N^2)$ complexity.
In the signature generation algorithm, the intermediate signature of each arc is marked once. 
Therefore, it has $O(N)$ complexity. The complexity of all the algorithms combined is $O(N^2)$. 
Compared to the exponential complexity of justifying node insertion, this $O(N^2)$ complexity makes 
justifying arc insertion a desirable approach.
If run-time information is used to select the unique arcs, the performance overhead due to
justifying signatures can be minimized. If a profiler is used to collect the run-time information, the 
program is run on a variety of inputs while the execution frequencies are recorded, which 
increases the compilation time. However, for production code, this one-time cost may be worth the 
improved performance.
6 Experimental Results
In this section, we present the results of experiments performed to study the performance of 
compiler-assisted arc insertion and hardware-assisted node insertion, and to analyze the impact 
of bounding the error detection latency.
To perform the experiments, we added profiling and signature placement to the GNU C compiler. 
Programs were compiled with probes inserted at each node. At run time, these probes were 
used to collect the branch and node execution frequencies. These frequencies, combined with the 
architecture specifications, were used to guide signature placement. Thus, the complete process for 
inserting signatures is to compile the program with probes, profile the program on a large set of 
sample inputs, and recompile the program to place signatures.
The experiments were performed using the benchmark set shown in Table 1. The ten benchmarks⁵ 
are a combination of Unix, CAD, and text processing programs. The largest benchmark is more 
than an order of magnitude larger than the benchmarks of previous studies [17, 19]. The sizes of the 
input sets used in profiling are also given in Table 1. The average node (basic block) size of each 
benchmark is given for the MC68000.






cmp        file comparison              2406   16
compress   compress/expand files       14410   20
diff       file comparison             32314   19
eqn        format equations            55175   20
grep       search file for expression   4630   20
mpla       tile based PLA generator    24104   19
tar        create tape archives        22612   14
tbl        format tables               65117   21
wc         line/word/char count         1686   20
yacc       parsing program generator   48444   10
Table 1: Benchmark characteristics.
6.1 Performance of Arc Insertion
In this section we compare the performance and memory overheads of compiler-assisted arc 
insertion and a hardware-assisted node insertion scheme for a variety of branch handling methods. 
Justifying Arc Insertion (JAI) is our arc insertion scheme, which uses the algorithm in Section 5.1. In 
Optimal JAI, the unique arcs are selected using run-time information. In Random JAI, each unique 
arc is selected at random.
Wilken and Shen's Embedded Signature Monitoring (ESM) scheme is a hybrid node-arc insertion 
method. It has the performance and memory overheads of node insertion but the software 
complexity of arc insertion. Signatures are placed within a node after a branch instruction. At 
run time, hardware is used to determine whether the branch is taken. If so, the signature is included 
in the run-time signature; otherwise, it is discarded. Therefore, signatures are generated to justify 
the arcs.
For these experiments, there was no tight bound on the error detection latency (the effects of 
bounding the error detection latency are presented in Section 6.2). Reference signatures 
were placed at the program and subroutine exit nodes, before subroutine calls, and at the inner-loop 
exit nodes. To make the schemes comparable, these signatures were also inserted for ESM, 
which originally inserts reference signatures only at the program exit nodes.
When a signature is placed after a branch, some or all of the performance overhead may be 
hidden by the branch handling behavior of the target architecture. We ran our experiments for 
three branch handling schemes: prefetch, delayed branching, and Branch Target Buffer (BTB). In 
the prefetch scheme, the instruction following the branch is always fetched; if the branch is taken, 
the instruction is discarded. The MC68000 uses this branch handling method [3]. For delayed 
branching, we assume that the delay slot can be filled 70% of the time for a conditional branch 
and 100% of the time for an unconditional branch [13]. In the BTB scheme, the expected target 
of the branch is fetched from the buffer. We assume that if the target is wrong, the correct target 
is determined within one instruction cycle [8].
6.1.1 Cost Analysis
The performance and memory cost of inserting a signature depends on the insertion scheme and 
the target processor architecture. In ESM, a hardware monitor is used to determine whether the 
branch is taken or not. This hardware depends on the branch handling hardware of the target 
processor. In JAI, the hardware monitor is kept independent of the processor architecture and 
implementation by using the branch handling information at compile time to place the signatures. To guarantee 
correct checking, the signatures must be placed such that the monitor does not see any incorrectly 
fetched signatures. In some cases, to ensure independence from the basic system architecture, NOP 







































b: Memory cost matrix.
Figure 12: Performance and memory cost matrices.
The performance and memory overhead costs are presented in Figure 12.⁶,⁷ The cost depends 
on the signature insertion scheme and the branch handling method. The three cases correspond to 
the three cases in the arc insertion model in Figure 5 in Section 4.1.1. ESM was not designed to 
work with a BTB, and thus the cost for this combination is not presented.⁸ For all of the cases, 
we assume that the justifying signature instruction requires one instruction word. The number of 
instruction words required to implement the justifying block is discussed in case 3.
Performance cost. The performance cost matrix in Figure 12a indicates the number of 
instruction words fetched per justifying signature. Each case is described in detail below.
case 1 - unconditional branch: For an unconditional branch, both schemes place the signature 
in the node. For the prefetch scheme and delayed branching, the signature is placed after the 
branch. Since the instruction after an unconditional branch is always discarded in the 
prefetch scheme, the cost for both JAI and ESM is zero. In delayed branching, the delay 
slot can always be filled for an unconditional branch, and thus the cost is one. For the BTB 
scheme, the signature is placed before the branch, and thus the cost is one.

6 The matrices reflect the cost when the corresponding arc types are traversed. Optimal JAI traverses fewer arcs 
than ESM and thus has a lower overall cost.
7 The costs for JAI depend on the location of the monitor. The costs presented are conservative; if the monitor is 
placed after the instruction register, the cost will be lower.
8 The ESM hardware monitor could be modified to handle a BTB-based target processor.
case 2 - sequential path: For JAI, if the sequential path corresponds to the not-taken path of 
a conditional branch (top number for case 2 in Figure 12a), the signature cannot be placed 
directly after the branch. For the prefetch and BTB schemes, to guarantee that the signature 
is not included when the branch is taken, a NOP instruction is inserted between the branch 
and the signature, and thus the cost is two. This cost is paid to ensure that the monitor is 
independent of the basic system architecture.⁹ For delayed branching, the delay slot of a 
conditional branch is filled from before the branch. Therefore, the signature is placed after the 
delay slot and the cost is one. If the sequential path does not correspond to the not-taken 
path of a conditional branch, the cost is one for all of the branch handling methods.
In ESM, the signature is always fetched, and it is discarded by the monitor if a conditional branch 
is not taken. For the prefetch scheme, the instruction following the branch is always executed 
if the branch is not taken. Therefore, the cost is one. In delayed branching, the delay slot 
can be filled 70% of the time, and thus the cost is 0.7.
case 3 - taken path: For JAI, justifying blocks are placed on the taken arc of the conditional 
branch. For prefetch and delayed branching, the signature is placed after the jump instruction 
in the justifying block. For the BTB method, the signature is placed before the justifying 
block's jump instruction. To prevent the signature from being included in the run-time 
signature when the target of the conditional branch in the BTB is incorrect, a NOP instruction 
is inserted before the signature in the justifying block. Again, this cost is the result of 
ensuring system architecture independence. The cost is simply the number of instruction 
words required for the justifying block. We assume that the justifying block size is three 
instruction words for the prefetch and BTB schemes, and two instruction words for delayed 
branching.¹⁰

9 It will be shown in Figure 14 that this cost is not incurred in practice.
For ESM, the signature is placed directly after the branch. For the prefetch scheme, the 
instruction is squashed if the branch is taken, and thus the cost is zero. For delayed branching, 
the delay slot can be filled 70% of the time, and thus the cost is 0.7.
Memory cost. The memory cost matrix in Figure 12b indicates the number of instruction 
words inserted into the program code. Note that cases 2 and 3 for ESM actually stem from a single 
signature placed after a conditional branch. Therefore, the combined memory cost of the two cases 
is one.
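The per-case costs described above can be collected into a small lookup table to estimate how many extra words a given placement fetches. The sketch below transcribes only the numbers stated in the case descriptions; the key names (`"uncond"`, `"seq_nt"` for a sequential arc that is the not-taken path of a conditional branch, `"seq"` for any other sequential arc, `"taken"`) and the traversal counts are hypothetical, and the ESM entry for an unconditional branch under delayed branching assumes the case 1 statement applies to both schemes.

```python
# Fetched instruction words per justifying signature, transcribed from the
# case descriptions. "seq_nt": sequential arc that is the not-taken path of a
# conditional branch; "seq": any other sequential arc.
PERF_COST = {
    "JAI": {
        "prefetch": {"uncond": 0, "seq_nt": 2, "seq": 1, "taken": 3},
        "delayed":  {"uncond": 1, "seq_nt": 1, "seq": 1, "taken": 2},
        "btb":      {"uncond": 1, "seq_nt": 2, "seq": 1, "taken": 3},
    },
    "ESM": {  # no BTB entry: ESM was not designed for a BTB
        "prefetch": {"uncond": 0, "seq_nt": 1, "taken": 0},
        "delayed":  {"uncond": 1, "seq_nt": 0.7, "taken": 0.7},
    },
}

def fetch_overhead(scheme, method, traversals):
    """Extra words fetched, given how often each kind of justified arc is traversed."""
    costs = PERF_COST[scheme][method]
    return sum(costs[kind] * count for kind, count in traversals.items())


counts = {"uncond": 100, "seq_nt": 10, "taken": 5}  # hypothetical traversal counts
for method in ("prefetch", "delayed", "btb"):
    print(method, fetch_overhead("JAI", method, counts))
```

With these made-up counts, justifying signatures on unconditional arcs are free only under prefetch, which illustrates why a placement that ignores branch type can look nearly optimal for prefetch but not for delayed branching or a BTB.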
6.1.2 Performance Overhead
In this section we present the relative performance overhead results for Optimal JAI, Random JAI, 
and ESM for the three branch handling methods. We also present the performance overhead of the 
three insertion schemes for the MC68000 target processor.
Figure 13 shows how each insertion scheme performs relative to Optimal JAI for each branch 
handling method. Optimal JAI has the minimum performance overhead for all of the branch 
handling schemes. However, for the prefetch branch method, Random JAI performs almost as well 
as Optimal JAI. Since the cost of an unconditional branch is zero for the prefetch scheme, it 
appears that Random JAI places most of its signatures on the unconditional paths. The graph in 
Figure 14b, which shows the distribution of the performance overhead for Random JAI, confirms 
this conclusion. Note that for the delayed branching and BTB methods, Optimal JAI adjusts for 
the cost of an unconditional branch, whereas the signature placement in Random JAI does not 
change. Therefore, for these two branch handling methods, the performance overhead of Random 
JAI is almost double that of Optimal JAI.

10 The prefetch scheme estimate is based on the MC68000, which needs two instruction words for a jump instruction. 
For the other schemes, we assume one word per instruction.

Figure 13: Normalized performance overhead.
For all the schemes and branch handling methods, the number of reference signatures inserted 
is the same. Therefore, the relative percentage of performance overhead due to reference signatures 
(Figure 14) indicates the overall performance of the schemes for a given branch handling method: 
because the reference signature overhead is fixed, a higher percentage due to reference signatures 
means a smaller justifying signature overhead, and thus a better scheme. For all of the signature 
insertion schemes, the percentage due to reference signatures shows that a processor with prefetch 
branch handling will have the lowest performance overhead, and that processors with delayed 
branching will perform slightly better than processors with BTBs.
The distribution of performance overhead for ESM in Figure 14c shows the disadvantage of 
node insertion. In node insertion, for a conditional branch, signatures are fetched along both the 
taken and the not-taken (sequential) paths. For ESM, the signatures fetched along the sequential 
path account for 18.7% of the performance overhead for prefetch and 12.3% for delayed branching. 
These signatures are discarded in ESM but still incur a performance penalty. In arc insertion, these 
signatures are not fetched at all.
The performance overhead for the MC68000 in Table 2 shows that adding signature monitoring 
to an MC68000-based target system degrades performance by only approximately 4% (3.86% on 
average for Optimal JAI).¹¹ This includes the overhead due to reference signatures placed before a 
call, at the subroutine and program exit nodes, and at inner-loop exit nodes. If the overhead due to 
these reference signatures is removed so that the program is only checked at the program exit nodes, 
the performance overhead is reduced to approximately 0.1%. In this case, however, the error 
detection latency is the entire program execution time.
6.1.3 Memory Overhead
Figure 15 shows the normalized memory overhead for all of the branch handling methods. The 
same number of signatures was added for all insertion schemes; the difference in the memory 
overhead is due to the addition of justifying blocks. Since ESM does not use justifying blocks, it has 
the lowest memory overhead; instead, it uses additional hardware. Therefore, there is a tradeoff 
between the memory and hardware overheads. The fact that the memory overhead for Random








[Figure 14 panels: (a) distribution of performance overhead for Optimal JAI; (b) distribution of performance overhead for Random JAI; (c) distribution of performance overhead for ESM. Each panel plots the overhead against the branch handling method, broken down into the unconditional, taken, and sequential cases.]
Figure 14: Performance overhead distributions.
Benchmark   Optimal JAI   Random JAI   ESM
cmp         1.80          1.80         1.80
compress    1.54          1.54         1.75
diff        3.05          3.08         3.72
eqn         3.81          4.02         6.08
grep        4.79          5.14         8.13
mpla        2.26          2.32         2.41
tar         6.70          6.70         7.20
tbl         6.44          6.61         7.23
wc          3.51          3.51         5.42
yacc        4.71          4.87         4.81
mean        3.86          3.96         4.86
std. dev.   1.80          1.85         2.36
Table 2: Percentage of performance overhead for the MC68000.
JAI is less than that for Optimal JAI shows that there is also a tradeoff between the performance and 
memory overheads. The memory overhead for the MC68000 is shown in Table 3. On average, there 
is approximately 11% memory overhead associated with adding JAI to an MC68000-based target 
system.
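The summary rows of Table 2 can be reproduced from the per-benchmark values with Python's standard `statistics` module, as a quick consistency check. The values below are copied from Table 2; the comparison between the sample and population forms is our own arithmetic, not a claim from the original study: the reported std. dev. row matches the sample (n - 1) standard deviation, whereas the population form would give about 1.71 for Optimal JAI.

```python
from statistics import mean, stdev, pstdev

# Table 2: percentage performance overhead on the MC68000, in benchmark order
# cmp, compress, diff, eqn, grep, mpla, tar, tbl, wc, yacc.
TABLE2 = {
    "Optimal JAI": [1.80, 1.54, 3.05, 3.81, 4.79, 2.26, 6.70, 6.44, 3.51, 4.71],
    "Random JAI":  [1.80, 1.54, 3.08, 4.02, 5.14, 2.32, 6.70, 6.61, 3.51, 4.87],
    "ESM":         [1.80, 1.75, 3.72, 6.08, 8.13, 2.41, 7.20, 7.23, 5.42, 4.81],
}

for scheme, values in TABLE2.items():
    print(f"{scheme:12s} mean {mean(values):.3f}  "
          f"sample s.d. {stdev(values):.3f}  population s.d. {pstdev(values):.3f}")
```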
6.2 Bounding the Error Detection Latency
In this section, we analyze the effect of varying the bound on the error detection latency on the 
performance and memory overheads. We also discuss the impact of reference signature placement on the 
error coverage. For this analysis, justifying signatures were optimally placed using the algorithm in 
Section 5.1. Reference signatures were placed using the greedy algorithm presented in Section 5.2, 
where the bound lmax is the maximum distance between two reference signatures. The target 
processor in the experiments was the MC68000.
Figure 15: Normalized memory overhead.
Benchmark   Optimal JAI   Random JAI   ESM
cmp         13.25         13.25         9.52
compress     8.20          8.20         6.46
diff        10.56         10.53         8.60
eqn          8.65          8.72         7.46
grep        16.25         15.82        13.48
mpla         6.76          6.76         5.98
tar          9.57          9.64         8.52
tbl         12.34         12.11         9.90
wc          13.77         12.58        10.18
yacc         9.49          9.47         8.24
mean        10.88         10.71         8.83
std. dev.    2.94          2.72         2.14
Table 3: Percentage of memory overhead for the MC68000.
6.2.1 Performance Overhead
To study the effect of the error detection latency on the performance overhead, $P_{overhead}$, signatures 
were placed for 19 values of lmax: 10, 20, ..., 100 and 200, 300, ..., 1000. For each level of lmax, there were 10 
overhead observations (one for each benchmark). Assuming a normal distribution at each level of 
lmax, a non-linear regression analysis on the experimental observations yields the following statistical 
relationship:

$$P_{overhead} = 14.998\, e^{-0.049\, l_{max}} + 4.017$$
The regression curve for performance overhead is shown in Figure 16. A plot of the residuals 
shows that the actual data points are evenly distributed around the predicted function, and thus 
the fit is reasonable. For low values of lmax, small changes in lmax produce significant changes in 
the overhead, until lmax reaches approximately 80 instructions. The worst case corresponds to placing 
signatures at each basic block. For a basic block length of 5 instructions,¹² the worst-case mean 
performance overhead is approximately 15.76%. The asymptote of this relation, 4.017%, is the performance 
overhead due to justifying signatures and to reference signatures placed at the subroutine exit nodes, 
inner-loop exit nodes, and before subroutine calls. For all of the benchmarks, this asymptote is 
reached by lmax = 300 instructions.
The 95% confidence intervals for the expected value and for an individual prediction are also 
shown in Figure 16. From the individual prediction confidence interval, we can conclude that 95% 
of new programs will have a performance overhead between approximately 1% and 8% for an lmax 
of 100. Furthermore, the expected value confidence interval indicates that for lmax = 100, 95% of 
the time the mean value after including a new program will remain between approximately 3.5% 
and 4.5%.

[Figure 16 plot: predicted performance overhead versus the maximum distance between reference signatures (instructions, 10 to 1000), with the 95% confidence intervals for the expected value and for an individual prediction.]
Figure 16: Predicted performance overhead with 95% confidence intervals.
6.2.2 Memory Overhead
The same experiments were performed to study the statistical relationship between the memory 
overhead, $m_{overhead}$, and lmax. Assuming a normal distribution at each level of lmax, a non-linear 
regression analysis on the experimental observations yields the following statistical relationship:

$$m_{overhead} = 7.848\, e^{-0.040\, l_{max}} + 10.927$$
The regression curve with the 95% confidence intervals is shown in Figure 17. Again, the residual 
plot shows that the predicted curve is a good fit.
The worst-case mean memory overhead (lmax = 5) is approximately 17.35%, and the asymptotic 
overhead is 10.927%. Note that the maximum difference in the performance overhead across 
the lmax range is approximately 12%, whereas for the memory overhead the maximum difference is 
approximately 7%. Therefore, varying the error detection latency has a greater impact on the performance 
overhead than on the memory overhead. On the other hand, the memory overhead is higher than the 
performance overhead, even in the worst case. This implies that the longer basic blocks, which carry 
fewer signatures per instruction, are executed more frequently.

Figure 17: Predicted memory overhead with 95% confidence intervals.
From the confidence intervals of Figure 17, for an lmax of 100, 95% of new programs 
will have a memory overhead between approximately 6% and 17%, and the overall mean will remain 
between approximately 10.5% and 11.5%.
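Evaluating both fitted curves side by side illustrates the comparison above. This sketch uses only the two regression formulas from Sections 6.2.1 and 6.2.2. At lmax = 5 the memory fit gives about 17.35%, at lmax = 100 it gives about 11.07% (inside the 10.5% to 11.5% interval quoted above), and for every lmax from 5 upward the memory curve stays above the performance curve.

```python
import math

def perf_overhead(lmax):
    """Fitted mean performance overhead (%), Section 6.2.1."""
    return 14.998 * math.exp(-0.049 * lmax) + 4.017

def mem_overhead(lmax):
    """Fitted mean memory overhead (%), Section 6.2.2."""
    return 7.848 * math.exp(-0.040 * lmax) + 10.927

for lmax in (5, 100, 300):
    print(f"lmax = {lmax:4d}: performance {perf_overhead(lmax):5.2f}%  "
          f"memory {mem_overhead(lmax):5.2f}%")
```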
6.2.3 Error Detection Latency
For a bound lmax, the detection latency for double bit errors is at most lmax. If errors 
are evenly distributed, the average detection latency for bit errors is lmax/2. For single sequence 
errors, the maximum detection latency is 2lmax. Therefore, for all single errors, the maximum 
detection latency is 2lmax. Near-optimal performance can be achieved for lmax = 100 instructions, 
that is, for a maximum detection latency of 200 instructions.
6.2.4 Error Coverage
Consider a program interval of lmax w-bit instructions with a w-bit signature. Using Carter's MISER, 
all double bit errors are detected if lmax is less than a bound that depends on the signature width w [2]. 
For the MC68000, w = 16, and thus lmax must be less than 4112 to detect all double bit errors. Since 
there is no benefit in increasing lmax beyond 300 instructions, the bit error coverage is not affected 
by varying lmax over its useful range.
Wilken and Shen report that the coverage of sequence errors is less than $1 - 1/(l_{max} + 1)^2$ [25]. 
For lmax = 10, the sequence error coverage is less than 99.17%. To improve the error coverage, the 
intermediate signatures must be randomized [22, 25]. To do this in our signature model, random 
initial signatures are added after each reference signature. For the optimal case, the reference 
signatures account on average for 43% of the memory overhead and 45% of the performance overhead. 
Therefore, randomizing the signatures will increase the optimal performance overhead from 4.02% 
to 5.83% and the optimal memory overhead from 10.93% to 15.63%. The error coverage with 
randomized intermediate signatures is approximately $1 - 2^{-w}$, i.e., more than 99.99% for w = 16 [11, 25].
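The overhead figures for randomized signatures follow from simple arithmetic, sketched below. The sketch assumes, as the quoted numbers imply, that each added random initial signature costs the same as the reference signature it follows, so the reference-signature share of the overhead doubles (for example, 4.02% x 1.45 gives about 5.83%). The function name `randomized` is hypothetical.

```python
def randomized(overhead, ref_share):
    """Overhead (%) after adding one random initial signature per reference signature.

    Assumes each added signature costs the same as the reference signature it
    follows, so the reference-signature share of the overhead is doubled.
    """
    return overhead * (1 + ref_share)

w = 16  # MC68000 signature width
print(f"performance: {randomized(4.02, 0.45):.2f}%")
print(f"memory:      {randomized(10.93, 0.43):.2f}%")
print(f"coverage:    {100 * (1 - 2 ** -w):.4f}%")
```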
For the same performance and memory overhead, Saxena's Extended Precision Checksums can 
be used [16]. Extended Precision Checksums detect all single bit errors and all unidirectional errors. 
In addition, the sequence error coverage approaches one as the number of sequence errors increases, 
and the average detection latency is usually less than lmax/2.
We were not able to study the effect of interrupts and context switches on the error coverage. 
However, since signature checking is disabled on a return from an interrupt until the first reference 
signature, the error coverage will decrease as lmax increases.
7 Conclusions
In this paper, we presented a signature insertion scheme with low implementation complexity and 
low performance overhead. Our justifying arc insertion method has $O(N^2)$ algorithm complexity, 
compared to the exponential complexity of previous node insertion methods. Furthermore, we 
proved that optimizing compiler techniques can be used to minimize the performance overhead for 
arc insertion, and demonstrated empirically that this optimized arc insertion minimizes the performance 
overhead due to justifying signatures.
We also performed experiments bounding the error detection latency and found a decaying 
exponential relationship between the performance and memory overheads and the 
error detection latency. Using the MC68000 as our target processor, the performance and memory 
overheads for our benchmark set are relatively constant for detection latencies greater than 200 
instruction cycles. For latencies between 10 and 200 cycles, the performance overhead drops from 
approximately 15.76% to 4.02%; likewise, the memory overhead drops from approximately 17.35% 
to 10.93%.
Acknowledgements
The authors would like to thank Michael Loui, Pohua Chang, John Fu, Paul Chen, Bob Dimpsey, 
and all members of the IMPACT research group for their support, comments, and suggestions. The 
authors would also like to acknowledge the contributions of Tom Conte for the use of his profiling 
package. This research has been supported by the Office of Naval Research under Contract 
N00014-88-K-0656, the National Science Foundation (NSF) under Grant MIP-8809478, a donation from 
NCR, and the National Aeronautics and Space Administration (NASA) under Contract NASA NAG 
1-613 in cooperation with the Illinois Computer Laboratory for Aerospace Systems and Software 
(ICLASS).
References
[1] A.V. Aho, R. Sethi, J.D. Ullman, Compilers: Principles, Techniques, and Tools, Reading, MA: 
Addison-Wesley, 1986.
[2] W.C. Carter, "Improved Parallel Signature Checkers/Analyzers," FTCS-16, pp. 416-421, 1986.
[3] W. Cramer, G. Kane, 68000 Microprocessor Handbook, Berkeley, CA: McGraw-Hill, 1986.
[4] X. Delord, R. Leveugle, G. Saucier, "Extended Duplex Fault Tolerant System with Integrated 
Control Flow Checking," International Workshop on Defect and Fault Tolerance in VLSI Systems, 
pp. 98-109, 1989.
[5] J.B. Eifert, J.P. Shen, "Processor Monitoring Using Asynchronous Signatured Instruction 
Streams," FTCS-14, pp. 394-399, 1984.
[6] P.J. Fleming, J.J. Wallace, "How Not to Lie with Statistics: The Correct Way to Summarize 
Benchmark Results," Computing Practices, Vol. 29, No. 3, pp. 218-221, March 1986.
[7] U. Gunneflo, J. Karlsson, J. Torin, "Evaluation of Error Detection Schemes Using Fault 
Injection by Heavy-Ion Radiation," FTCS-19, pp. 340-347, 1989.
[8] J.K.F. Lee, A.J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design," 
IEEE Computer, pp. 6-22, January 1984.
[9] R. Leveugle, T. Michel, G. Saucier, "Design of Microprocessors with Built-In On-Line Test," 
FTCS-20, pp. 450-456, 1990.
[10] D.J. Lu, "Watchdog Processors and Structural Integrity Checking," IEEE Trans. on 
Computers, Vol. 31, No. 7, pp. 681-685, July 1982.
[11] A. Mahmood, E.J. McCluskey, "Watchdog Processors: Error Coverage and Overhead," 
FTCS-15, pp. 214-219, 1985.
[12] A. Mahmood, E.J. McCluskey, "Concurrent Error Detection Using Watchdog Processors - A 
Survey," IEEE Trans. on Computers, Vol. 37, No. 2, pp. 160-174, February 1988.
[13] S. McFarling, J. Hennessy, "Reducing the Cost of Branches," Proc. 13th Annu. Symp. on 
Comput. Arch., pp. 396-403, 1986.
[14] M. Namjoo, "Techniques for Concurrent Testing of VLSI Processor Operation," Int. Test 
Conf., pp. 461-468, 1982.
[15] M. Namjoo, "Cerberus-16: An Architecture for a General Purpose Watchdog Processor," 
FTCS-13, pp. 216-219, 1983.
[16] N.R. Saxena, E.J. McCluskey, "Control-Flow Checking Using Watchdog Assists and Extended 
Precision Checksums," FTCS-19, pp. 428-435, 1989.
[17] M.A. Schuette, J.P. Shen, D.P. Siewiorek, Y.X. Zhu, "Experimental Evaluation of Two 
Concurrent Error Detection Schemes," FTCS-16, pp. 138-143, 1986.
[18] M.A. Schuette, J.P. Shen, "Processor Control Flow Monitoring Using Signatured Instruction 
Streams," IEEE Trans. on Computers, Vol. 36, No. 3, pp. 264-276, March 1987.
[19] J.P. Shen, M.A. Schuette, "On-Line Self-Monitoring Using Signatured Instruction Streams," 
Int. Test Conf., pp. 275-282, 1983.
[20] J. Sosnowski, "Detection of Control Flow Errors Using Signature and Checking Instructions," 
Int. Test Conf., pp. 81-88, 1988.
[21] T. Sridhar, S. Thatte, "Concurrent Checking of Program Flow in VLSI Processors," Int. Test 
Conf., pp. 191-199, 1982.
[22] C. Tung, J. Robinson, "On Concurrently Testable Microprogrammed Control Units," Int. Test 
Conf., pp. 895-900, 1986.
[23] N.J. Warter, W.W. Hwu, "A Software Based Approach to Achieving Optimal Performance for 
Signature Control Flow Checking," FTCS-20, pp. 442-449, 1990.
[24] N.J. Warter, W.W. Hwu, "Compiler-Assisted Signature Monitoring," Tech. Report, Center for 
Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL 
(in preparation).
[25] K. Wilken, J.P. Shen, "Embedded Signature Monitoring: Analysis and Technique," Int. Test 
Conf., pp. 324-333, 1987.
[26] K. Wilken, J.P. Shen, "Continuous Signature Monitoring: Efficient Concurrent-Detection of 
Processor Control Errors," Int. Test Conf., pp. 914-925, 1988.
