Automatic Generation of High Coverage Transient Fault Detectors Using GoldMine by Athavale, Viraj & Vasudevan, Shobha
JANUARY 2011 UILU-ENG-11-2201 
CRHC-11-01
AUTOMATIC GENERATION OF HIGH 
COVERAGE TRANSIENT FAULT 
DETECTORS USING GOLDMINE
Viraj Athavale and Shobha Vasudevan
Coordinated Science Laboratory
1308 West Main Street, Urbana, IL 61801
University of Illinois at Urbana-Champaign
REPORT DOCUMENTATION PAGE (Standard Form 298, Rev. 2-89; OMB No. 0704-0188)
Report date: January 2011
Title and subtitle: Automatic Generation of High Coverage Transient Fault Detectors using GoldMine
Author(s): Viraj Athavale and Shobha Vasudevan
Performing organization: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official position, policy, or decision, unless so designated by other documentation.
Distribution/availability statement: Approved for public release; distribution unlimited.
Subject terms: automatic fault detectors, logic soft errors, formal verification, data mining, detector minimization and synthesis
Number of pages: 7
Security classification (report, this page, abstract): UNCLASSIFIED
Limitation of abstract: UL
Automatic Generation of High Coverage Transient Fault
Detectors using GoldMine
Viraj Athavale, Shobha Vasudevan 
University of Illinois at Urbana-Champaign
{athavah, shobhav}@illinois.edu
ABSTRACT
We propose a formal technique for automatic generation of transient fault detectors for flip-flops/latches in control logic. We apply GoldMine, an automatic assertion generation engine based on data mining and formal verification, to the generation of fault detectors. The GoldMine tool flow is modified to include fault coverage evaluation of assertions. Our formal technique is sound and complete with respect to injected faults and provides an accurate assessment of coverage. Our technique is also able to identify faults critical to overall design vulnerability. We demonstrate high fault coverage on the OR1200 processor, SpaceWire RTL and ITC benchmarks. We extract a minimal high coverage subset of fault detectors for synthesis and show that it has minimal performance overhead.
1. INTRODUCTION
With the scaling of semiconductor process technologies into the deep submicron regime, transient faults or soft errors due to particle strikes, radiation, etc. are becoming a growing concern for the reliability of digital systems [4, 23]. Along with soft errors in memory cells, logic soft errors have an increasing impact on reliability with shrinking chip geometries [14]. Transient fault detection is accomplished through various methods at the circuit and microarchitecture levels [17, 15, 19].
At the Register Transfer Level (RTL), assertions, which are typically generated through a manual, time- and resource-hungry process, can be used as fault detectors. The use of assertions as detectors requires a continuous assessment of the tradeoff between the extent of fault coverage by the assertions and the area and performance overhead they entail. Finding the “sweet spot” with a few high coverage assertions is an arduous task that does not guarantee high confidence even on completion.
We present a formal technique for automatic generation of high coverage checkers for detecting logic soft errors in sequential elements. Our method is based on GoldMine [21], a tool that uses data mining and formal verification techniques to generate assertions for design verification. GoldMine employs counterexamples from formal verification to iteratively refine the assertion generation process until all assertions are declared true in formal verification. The final set of assertions for a design output captures the complete functionality of that output. We modify GoldMine to generate assertions that can be used for fault detection. We mainly focus on soft errors in control logic and state machines in this work.
Our fault injection and fault coverage evaluation both follow a formal approach similar to [20]. We inject faults modeled as single event upsets (SEUs) into the state variables (flip-flops/latches) in RTL, thereby creating a faulty design. We use GoldMine to generate assertions for the fault-free design. The faulty design is run with these assertions through a formal verifier. If an assertion that is true in the fault-free design fails in a faulty design, it is said to detect the injected fault.
GoldMine’s fault detectors can, in principle, detect all injected faults that propagate to the output. Our detection method is sound and complete with respect to injected faults. In practice, we may not be able to complete the required counterexample iterations of GoldMine due to increasing memory and time costs of each iteration. The fault coverage achieved is not 100% in this case.
Unlike manual fault detector generation techniques, our automatic technique results in multiple assertions detecting a single fault. Such assertions cover different paths in the design through which the fault propagates to the output. Therefore, the number of GoldMine assertions that detect a particular fault can provide an estimate of the criticality or importance of that fault for overall design vulnerability to soft errors. For the SpaceWire control state machine, we find that the area overhead of GoldMine fault detectors reduces from 60% to under 10% if we limit the detection to critical faults. These estimates can also be used for selective protection of flip-flops/latches [16, 3, 20]. Thus, although our technique does not offer soft error protection, it can be used to provide criticality information to existing soft error protection methods [15, 5].
We are also able to provide a ranking of assertions based on the number of faults they detect. Such a ranking provides a systematic way of selecting assertions for synthesis into checker circuits. We formulate the assertion set minimization as a unate covering problem [10] and use a heuristic branching algorithm to extract the minimal set of assertions for synthesis. The checker circuits cover the fraction of the behavior of the design relevant to detection of injected faults. As a result, they have much lower area overhead than complete duplication, and minimal performance overhead.
Our method has two unique advantages over existing soft error detection techniques. Firstly, GoldMine provides the input space coverage of the set of generated assertions. Since our technique employs these assertions for fault detection, we have a concrete idea about the high-level design behavior covered by our fault detectors. Secondly, through a formal approach, we ensure that the effect of each injected fault on design behavior is completely captured and therefore the fault coverage of every assertion is computed accurately.
We show the efficiency of our technique in practice using nontrivial case studies of OR1200 [2], SpaceWire [6] and a USB design as well as ITC benchmarks, and demonstrate high coverage in the majority of cases. In previous work [20], a fault in the SpaceWire FSM module has been identified as “hard-to-detect” and also predicted as “possibly masked” since it could not be detected by the manually generated assertions. However, our GoldMine generated fault detectors are able to detect this fault, providing a compelling argument for GoldMine.
The chief merit of GoldMine-based fault detection is that it can systematically and accurately assess fault coverage as well as high-level behavioral coverage of detectors. A comparison to existing soft error detection/protection techniques on this point is not possible since they do not provide a mechanism to analyze their relation to design behavior. Therefore we compare against techniques such as parity checking and the self-checking FSM technique [24] in terms of other parameters, viz. area and performance overheads. We find that the checker circuits from GoldMine assertions have minimal performance overheads compared to these techniques.
Our contributions in this paper are as follows.
•  We present an automatic method to generate fault detectors for logic soft errors using GoldMine.
•  The fault detection is entirely done through formal engines, rendering it sound and complete with respect to injected faults.
•  Our method provides a metric for determining the criticality of a specific fault, which is very valuable for selective protection of sequential elements.
•  We can rank order assertions according to fault coverage, providing a basis for selecting them for synthesis. We use a set covering algorithm for fault detector minimization and synthesize them as checker circuits.
•  Our method practically gives very high fault coverage in real designs w.r.t. injected faults with minimal performance overhead.
2. AUTOMATIC GENERATION OF FAULT DETECTORS
Figure 1: Flow of our automatic transient fault detector generation technique. It consists of the GoldMine flow with an additional block for fault coverage analysis.
Fig. 1 shows the overall flow of our automatic fault detector gen­
eration technique.
In the traditional GoldMine flow, a target RTL design is simulated for a fixed number of cycles using random input patterns. The simulation traces generated are used as data by the data mining stage (A-Miner), which comprises a decision tree based supervised learning algorithm. Static analysis of the RTL design is carried out in order to extract the set of variables which affect a design output. This set is known as the logic cone of the output. A-Miner is restricted to analyzing the data only for the logic cone of an output. GoldMine can generate temporal or sequential assertions, which are relevant for detecting SEUs. A mining window length given to A-Miner provides the number of cycles for which the design should be unrolled to capture sequential behavior. A-Miner guesses the likely assertions in the design.
Formal verification is used to extract true assertions from the set of likely assertions. The true assertions in an iteration are retained. The assertions that fail the formal verification phase are poor guesses by the data miner. The formal verifier produces counterexample traces for every failed assertion. These traces are appended to the simulation testbench in the following iteration of GoldMine. This process is repeated until no assertions fail. The final set of true GoldMine assertions captures the complete functionality of a design output.
We enhance the GoldMine flow to act as an engine which produces high coverage transient fault detectors. The enhancement is directed by fault coverage analysis of GoldMine assertions at the end of each counterexample iteration. Assertions are generated for an output of a given RTL design with the regular GoldMine flow. SEUs are injected in state variables in the logic cone of the output. Fault coverage of the set of true assertions is then evaluated for each injected SEU. Subsequent counterexample iterations are performed only if faults are left undetected by the cumulative set of assertions generated until the current iteration. Our algorithm terminates either if all injected faults are detected or if there are no more counterexamples to perform more iterations. This is different from the regular GoldMine flow where iterations are carried out unconditionally until there are no more counterexamples.
2.1 Fault Model and Injection
In this work, we focus on soft errors in sequential elements (flip-flops and latches). We consider only effective faults, i.e. the faults that are able to flip the state of the sequential element.
Soft errors at sequential elements can be modeled as single event 
upsets (SEUs) at corresponding state variables in the RTL design. 
An SEU at a variable corresponds to a bit-flip in the variable at an 
arbitrary cycle of operation. The design behaves normally during 
the remaining cycles.
We employ the SEU model introduced in [20]. Consider a state variable $v$ in the design where we wish to inject a fault. We define a new variable $SEU$ and a single-bit input $inject_{SEU}$. $SEU = 0$ implies that the bit-flip at $v$ has not yet occurred. In this state, if $inject_{SEU}$ is 1 in a particular cycle, $v$ is flipped and $SEU$ changes to 1. In all subsequent cycles of operation, since the fault has already taken place, the input $inject_{SEU}$ is simply ignored and the design behaves normally. As a result, an SEU is injected at $v$ only in the first cycle in which $inject_{SEU}$ is 1. The variable $SEU$ thus limits the fault to a single-event upset and prevents multiple bit-flips.
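The behavior of the two added signals can be illustrated with a minimal Python sketch. This is only a cycle-level illustration under stated assumptions, not the RTL instrumentation used in [20]: the class name `SeuWrapper` and the helper `run` are hypothetical, and the stream of values of $v$ is supplied directly instead of coming from a next-state function.

```python
from dataclasses import dataclass


@dataclass
class SeuWrapper:
    """Hypothetical model of the SEU limiter for one single-bit state variable."""

    seu: int = 0  # 0: the bit-flip has not occurred yet, 1: it already occurred

    def apply(self, v: int, inject_seu: int) -> int:
        """Return the value of v seen by the design in this cycle."""
        if self.seu == 0 and inject_seu == 1:
            self.seu = 1      # remember that the single upset has happened
            return v ^ 1      # flip the bit exactly once
        return v              # inject_seu is ignored from now on


def run(values, injects):
    """Apply the wrapper to a stream of fault-free values of v."""
    wrapper = SeuWrapper()
    return [wrapper.apply(v, inj) for v, inj in zip(values, injects)]


# inject_seu is high in cycles 1, 2 and 3, but only cycle 1 is flipped.
print(run([1, 1, 0, 1], [0, 1, 1, 1]))  # [1, 0, 0, 1]
```

Only the first cycle with the injection input high is affected; later requests are ignored because `seu` is already 1.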
We use formal verification for evaluating fault coverage of a set of assertions. Specifically, the true GoldMine assertions generated from the fault-free design are checked against the faulty design. Formal verification exhaustively explores the states in the faulty design to test the validity of each assertion. Due to the injected fault, some of these assertions do not hold in the faulty design. They are said to detect the injected fault. The new input $inject_{SEU}$ is left unspecified in the faulty design. As a result, the formal verifier non-deterministically assigns a value of 0 or 1 to it in each cycle. This non-determinism essentially models a bit-flip in an arbitrary cycle of operation.
The logic cone of a design output $z$ is the set of all variables in the design that affect $z$. We inject faults at all state variables in the logic cone of $z$. Since these state variables correspond to sequential elements at circuit level, our fault injection is exhaustive w.r.t. all sequential elements that affect $z$. Since our fault injection is based on a formal approach, it is also exhaustive w.r.t. the time of occurrence of the faults at the state variables under consideration.
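The effect of leaving the injection input unconstrained can be sketched with a bounded brute-force search standing in for the formal verifier. Everything in this sketch is an assumption made for illustration: the one-bit toy design ($v' = x$, $z = v$), the assertion $x \Rightarrow X(z)$, and the function names. A real formal tool explores the state space symbolically instead of enumerating traces.

```python
from itertools import product


def simulate(xs, injects):
    """Toy design (hypothetical): next v = x, output z = v, SEU hook on v."""
    v, seu, zs = 0, 0, []
    for x, inj in zip(xs, injects):
        if seu == 0 and inj == 1:  # first cycle in which the inject input is 1
            v ^= 1
            seu = 1
        zs.append(v)               # z observes the possibly flipped v
        v = x                      # next-state function
    return zs


def holds(xs, zs):
    """Assertion x => X(z), checked at every cycle of a finite trace."""
    return all(not xs[t] or zs[t + 1] for t in range(len(xs) - 1))


def fault_free_ok(depth):
    """The assertion must be true on the fault-free design."""
    no_fault = (0,) * depth
    return all(holds(xs, simulate(xs, no_fault))
               for xs in product((0, 1), repeat=depth))


def detects(depth):
    """True if some choice of inputs and injection cycle violates the assertion."""
    return any(not holds(xs, simulate(xs, injects))
               for xs in product((0, 1), repeat=depth)
               for injects in product((0, 1), repeat=depth))


print(fault_free_ok(4), detects(4))  # True True
```

Because every 0/1 assignment to the injection input is tried in every cycle, the search covers all possible times of occurrence of the fault up to the chosen depth.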
2.2 Fault Coverage Based GoldMine Flow
Figure 2: Fault coverage guided assertion generation using GoldMine
Fig. 2 gives a detailed illustration of our automatic fault detector 
generation technique.
Consider a target RTL design $M$. Let $I$, $V$ and $Z$ be the sets of inputs, state variables and outputs of $M$. Consider $z \in Z$. From the static analysis stage of GoldMine, we have a set $V_z$ of state variables in the logic cone of $z$. Note that each element of the sets $I$, $V$ and $Z$ is single-bit. In case of multiple-bit variables in RTL, we add an element for each bit of the variable to the corresponding set.
Let $A^k$ be the set of new true assertions generated by the modified GoldMine flow at the end of the $k$th counterexample iteration. $C^k$ is the set of counterexamples generated from the false assertions. We use $A^0$ to denote the set of true assertions generated after one pass of $M$ through the data generator, A-Miner and formal verifier, i.e. before any counterexample iterations. Similarly, $C^0$ is used to denote the set of counterexamples produced in this case. The set $A^k$ is not cumulative, i.e. it contains the true assertions generated only in the $k$th counterexample iteration. Thus $A^0 \cup A^1 \cup \ldots \cup A^k$ constitutes the set of all true assertions generated at the end of $k$ counterexample iterations.
For a state variable $v \in V_z$, let $M_v$ denote the faulty design obtained from the fault-free design $M$ on injection of a fault at $v$. Let $U$ be the set of state variables such that faults at these variables are yet to be detected. Initially, $U = V_z$, i.e. faults are to be injected in every $v \in V_z$. As faults are detected during the counterexample iterations of the GoldMine flow, the corresponding variables are removed from $U$.
Finally, let $D_z$ be the set of fault detectors generated for output $z$ of the design, which is the output of our technique. If a total of $n$ counterexample iterations are performed, $D_z = A^0 \cup A^1 \cup \ldots \cup A^n$.
Now consider the first pass through the regular GoldMine flow. $A^0$ is the set of true assertions generated. Consider the fault at a variable $v \in U$ and the corresponding faulty design $M_v$. In the fault coverage analysis stage, we use formal verification to check the assertions in $A^0$ against $M_v$. If for an assertion $A \in A^0$, $M_v \not\models A$, then we say that $A$ detects the fault at $v$. Let $A^0_v \subseteq A^0$ be the set of all such $A$'s. If $A^0_v$ is non-empty, we remove $v$ from $U$, since we have found at least one assertion to detect it. We repeat this process for every $v \in U$. If $A^0_v$ is empty for at least one $v$, $U$ is non-empty, which means that there are some undetected faults remaining. In this case, we simulate the counterexamples in $C^0$ to refine the simulation trace data input to the decision tree builder.
A second pass through the GoldMine flow produces $A^1$ and $C^1$. Again, for a fault in each variable in $U$, we find the set of assertions detecting the fault. In particular, $A^1_v \subseteq A^1$ denotes the set of assertions which detect the fault at $v$. If $A^1_v$ is non-empty, $v$ is removed from $U$, and so on. If $U$ is still non-empty, this process is repeated for further counterexample iterations.
The process terminates after $k$ counterexample iterations when either of the following two conditions is satisfied:
1. $U$ is empty, $C^k$ is non-empty.
2. $U$ is non-empty, $C^k$ is empty.
When the first termination condition is satisfied, all the injected faults are detected. The set $A^0 \cup A^1 \cup \ldots \cup A^k$ detects faults at all the state variables in $V_z$. We output this set as the set of fault detectors $D_z$ and stop.
When the second termination condition is satisfied, we have undetected faults. Since $C^k$ is empty, counterexample simulation is no longer possible. $D_z$ captures the complete functionality of $z$, but fails to detect faults at some state variables in its logic cone. These faults cannot propagate to $z$ within the mining window. It can be argued that complete fault coverage for faults that can propagate to $z$ can be obtained using a suitably large mining window.
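The loop described in this section can be summarized by the following Python sketch. It is an illustration under stated assumptions, not the GoldMine implementation: `mine_and_verify` and `fails_on_faulty` are hypothetical callables standing in for one pass through the data generator, A-Miner and formal verifier, and for the formal check of one assertion against a faulty design. The stub data replays the sets reported for the example design of Fig. 3 in the walkthrough of this section.

```python
def generate_detectors(state_vars, mine_and_verify, fails_on_faulty, max_iters=50):
    """Coverage-guided loop: stop when U is empty or no counterexamples remain."""
    undetected = set(state_vars)   # U, initially V_z
    detectors = []                 # accumulates A^0, A^1, ... into D_z
    cex = []                       # counterexamples fed to the next pass
    for k in range(max_iters):
        new_true, cex = mine_and_verify(k, cex)   # (A^k, C^k)
        detectors.extend(new_true)
        for v in sorted(undetected):
            # v is detected if at least one new true assertion fails on M_v
            if any(fails_on_faulty(a, v) for a in new_true):
                undetected.discard(v)
        if not undetected or not cex:
            break
    return detectors, undetected


# Stub data taken from the Fig. 3 walkthrough (pass 0 and pass 1).
PASSES = {0: ([], ["C1"]), 1: (["A2", "A3", "A4", "A5"], ["C6"])}
DETECTS = {"v": {"A2", "A4"}, "z": {"A2", "A3", "A4", "A5"}}

detectors, undetected = generate_detectors(
    ["v", "z"],
    mine_and_verify=lambda k, cex: PASSES[k],
    fails_on_faulty=lambda a, v: a in DETECTS[v],
)
print(detectors, undetected)  # ['A2', 'A3', 'A4', 'A5'] set()
```

The `max_iters` bound is an addition of this sketch; it loosely mirrors the practical situation in which iterations are cut short by memory or time limits.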
Figure 3: (a) A Verilog RTL design and (b) the corresponding finite state machine. In (b), we show the value of input $x$ on the edges and the value of the vector $\{v, z\}$ in each state.
We use the example design from Fig. 3 to illustrate the flow of our automatic fault detector generation technique. Fig. 3(b) shows the state machine for the RTL in Fig. 3(a). Here $I = \{x\}$, $Z = \{z\}$ and $V_z = \{v, z\}$ for this design. Note that in this case one of the state variables in $V_z$ is $z$ itself. Initially, $U = \{v, z\}$ represents the set of state variables at which faults are not yet detected. We use a mining window of 2 cycles.
One pass of the design through the decision tree builder gives one assertion:
$A1 : (\mathit{True}) \Rightarrow X(z)$ $^{1}$
In words, $A1$ means that always $z = 1$. This is obviously not true as per Fig. 3. $A1$ is proved false in the formal verification stage of GoldMine and a counterexample $C1$ is obtained. Therefore, $A^0 = \emptyset$ and $C^0 = \{C1\}$. The fault coverage analysis block is not run because $A^0$ is empty.
Since $U \neq \emptyset$, $C^0$ is simulated and another pass through the decision tree builder gives five assertions:
$A2 : (v \wedge x) \wedge X(x) \Rightarrow XX(z)$
$A3 : (\neg v \wedge x) \wedge X(x) \Rightarrow XX(\neg z)$
$A4 : (\neg x) \wedge X(x) \Rightarrow XX(z)$
$A5 : (x) \wedge X(\neg x) \Rightarrow XX(z)$
$A6 : (\neg x) \wedge X(\neg x) \Rightarrow XX(\neg z)$
1 We use LTL [18] notation to represent assertions.
From the formal verification stage of GoldMine, we get $A^1 = \{A2, A3, A4, A5\}$ and $C^1 = \{C6\}$.
In the fault coverage analysis stage, each assertion from $A^1$ is first checked against the faulty design $M_v$. We find that $M_v \models A3, A5$ but $M_v \not\models A2, A4$. Thus we have $A^1_v = \{A2, A4\}$. Since the fault at $v$ is detected, $v$ is removed from $U$. Each assertion from $A^1$ is then checked against $M_z$ to find that $M_z \not\models A2, A3, A4, A5$. Therefore $A^1_z = \{A2, A3, A4, A5\}$ is the set of assertions that detect a fault at $z$. $z$ is also removed from $U$.
Now $U = \emptyset$, which means that all injected faults have been detected and the first termination condition is satisfied. There is no need to continue with further counterexample iterations of GoldMine. $D_z = \{A2, A3, A4, A5\}$ is the final output from our technique.
3. GOLDMINE ASSERTIONS AS FAULT DETECTORS
In this section, we explain how a GoldMine assertion can detect 
single event upset faults with the help of an example.
We first show how a GoldMine assertion corresponds to a set of state sequences in $M$.
Consider assertion $A4$ generated for the design in Fig. 3:
$A4 : (\neg x) \wedge X(x) \Rightarrow XX(z)$
In words, $A4$ states that if $x = 0$ in the first cycle and $x = 1$ in the second cycle, $z$ will become 1 in the third cycle. We can see that the antecedent of $A4$ corresponds to two initial states 00 and 11 and the input sequence 1,1. As a result, it corresponds to two state sequences $P1 : 11, 11, 11$ and $P2 : 00, 11, 11$ in $M$. In the final state, $z = 1$ in both the sequences, corresponding to the consequent of $A4$.
[Figure 4: detection of a fault at $v$ by assertion $A4 : (\neg x) \wedge X(x) \Rightarrow XX(z)$]
We now demonstrate how a fault at $v$ is detected by the assertion $A4$ (Fig. 4). Consider the state sequence $P1$ in $M$. $P1$ is also a valid state sequence in the faulty design $M_v$. However, suppose that when $A4$ is being checked against $M_v$ in formal verification, a fault at $v$ occurs when $M_v$ is in the second state of $P1$. Then the state sequence followed in the faulty design is $P1' = 11, 01, 00$. Thus, for the same initial state 11 and input sequence 1,1 defined by the antecedent of $A4$, $M_v$ follows a different state sequence due to the injected fault. $z = 0$ in the last state of $P1'$, which makes the consequent of $A4$ false. Since the antecedent of $A4$ was true but the consequent false, $A4$ is declared false in $M_v$, thus detecting the injected fault.
Since the final set of assertions generated by GoldMine captures the complete functionality of $z$, it corresponds to the complete set of state sequences, and in particular the ones that propagate faults to $z$. Therefore it can be proved that our formal technique is sound and complete w.r.t. injected faults that can propagate to the output.
4. FAULT DETECTOR SYNTHESIS AND MINIMIZATION
The set of fault detectors $D_z$ obtained by our technique for design $M$ is converted to equivalent checker circuits. These checker circuits can perform runtime detection of the corresponding transient faults. For example, Fig. 5 shows the checker circuit for assertion $A2 : (v \wedge x) \wedge X(x) \Rightarrow XX(z)$ for the design in Fig. 3.
Figure 5: Checker circuit for the assertion A2 generated for the design in 
Fig. 3. The output signal A2_valid is high whenever A2 is true during 
runtime.
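A cycle-level Python model can make the structure of such a checker concrete. This is a sketch under assumptions: the actual circuit of Fig. 5 is not reproduced here, so the two-register arrangement and the names `A2Checker`, `r1` and `r2` are hypothetical. The model delays the antecedent of $A2$ through a two-stage shift register and compares it with $z$ two cycles later.

```python
class A2Checker:
    """Hypothetical model of a checker for A2: (v & x) & X(x) => XX(z)."""

    def __init__(self):
        self.r1 = 0  # (v AND x) sampled one cycle ago
        self.r2 = 0  # antecedent matched over the previous two cycles

    def step(self, v, x, z):
        """Advance one clock cycle and return the value of A2_valid."""
        valid = int((not self.r2) or z)  # antecedent seen implies z must be 1
        self.r2 = self.r1 & x            # second stage: previous (v & x), then x
        self.r1 = v & x                  # first stage: current (v & x)
        return valid


def run_checker(trace):
    checker = A2Checker()
    return [checker.step(v, x, z) for v, x, z in trace]


# Each tuple is (v, x, z) in one cycle.
good = [(1, 1, 0), (1, 1, 1), (1, 1, 1)]
bad = [(1, 1, 0), (0, 1, 0), (0, 1, 0)]   # z stays 0 two cycles after v & x
print(run_checker(good))  # [1, 1, 1]
print(run_checker(bad))   # [1, 1, 0]
```

The two registers correspond to the shift register part of the checker whose flip-flop count enters the area estimate; the final comparison is the combinational part.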
Our technique generates multiple fault detectors for the same 
fault. In order to limit the area of checker circuits, this redundancy 
needs to be removed by minimizing the set of fault detectors, while 
keeping the fault coverage of the set constant.
The fault detector minimization problem can be formulated as an instance of the unate covering problem, typically used in the minimization of two-level circuits [10]. We use a heuristic branching algorithm to solve the problem. Similar approaches have been used for test set compaction in [7, 11].
Let $D$ be the set of fault detectors generated for all outputs under consideration:
$$D = \bigcup_{z} D_z. \qquad (1)$$
Let $f_v$ denote the fault at $v$. Let $F_D$ be the set of faults detected by $D$. We use our fault injection and coverage evaluation technique as described in Sec. 2.1 to obtain the sets $F_d \subseteq F_D$ of faults detected by each fault detector $d \in D$. We define a cost function $C : D \to \mathbb{R}$ to incorporate the area overhead of fault detectors. We estimate the number of flip-flops in the shift register part of the checker circuit and the number of gates of each type required for the combinational part, and use this estimate as the cost function.
The objective of the fault detector minimization problem is to find $D_{min} \subseteq D$ such that
$$F_D = \bigcup_{d \in D_{min}} F_d \qquad (2)$$
and the total cost $C_{D_{min}}$ is minimum, where
$$C_{D_{min}} = \sum_{d \in D_{min}} C(d). \qquad (3)$$
Let $D = \{d_0, d_1, \ldots, d_{m-1}\}$ and $F_D = \{f_0, f_1, \ldots, f_{n-1}\}$. To formulate the unate covering problem, a coverage matrix $B = (b_{ij})_{m \times n}$ is constructed such that $b_{ij} = 1$ if $d_i$ detects $f_j$ and $b_{ij} = 0$ otherwise. Thus each row of $B$ corresponds to a fault detector in $D$ and each column corresponds to a fault in $F_D$. Our objective is to find a subset of rows $D_{min}$ such that $\forall$ column $j$, $\exists$ row $i \in D_{min}$ such that $b_{ij} = 1$.
We start with $D_{min} = \emptyset$. We first reduce the coverage matrix by identifying essential rows and using row and column dominance [10]:
Essential rows: Row $i$ is essential if $\exists$ column $j$ such that $d_i$ is the only detector which detects $f_j$. We add $d_i$ to $D_{min}$ and remove the columns corresponding to the faults it detects.
Column dominance: Column $j$ dominates column $k$ if $b_{ij} \geq b_{ik}\ \forall i$. In this case, $f_j$ is detected whenever $f_k$ is detected. We remove such a column $j$.
Row dominance: Row $i$ dominates row $k$ if $b_{ij} \geq b_{kj}\ \forall j$. In this case, $d_i$ detects all faults detected by $d_k$. Further, if $C(d_i) \leq C(d_k)$, we remove $d_k$. If row $i$ and row $k$ are identical, we remove the row with higher cost.
These reduction steps are applied recursively until no more reduction is possible. If $B$ has no columns left, we have found an exact solution to our problem. We output $D_{min}$ and stop.
Otherwise, we have a cyclic coverage matrix. We apply branching heuristics in this case to get the final solution. Fault coverage and area costs of individual assertions are taken into account while using these heuristics.
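The reduction steps can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the text: the function name `minimize_detectors` is hypothetical, unit costs are used in the example, and a simple greedy choice (most uncovered faults per unit cost) stands in for the branching heuristics, whose details are not given here. The example input is the detection data reported for the design of Fig. 3.

```python
def minimize_detectors(detects, cost):
    """Return a low-cost subset of detectors covering every detected fault."""
    rows = {d: set(fs) for d, fs in detects.items() if fs}
    cols = set().union(*rows.values())
    chosen = []

    def restrict():
        # Keep only still-uncovered faults in each row; drop empty rows.
        for d in list(rows):
            rows[d] &= cols
            if not rows[d]:
                del rows[d]

    def pick(d):
        chosen.append(d)
        cols.difference_update(rows.pop(d))
        restrict()

    while cols:
        changed = False
        # Essential rows: a fault detected by exactly one remaining detector.
        for f in sorted(cols):
            covering = [d for d in rows if f in rows[d]]
            if f in cols and len(covering) == 1:
                pick(covering[0])
                changed = True
        # Column dominance: f_j is detected whenever f_k is, so drop f_j.
        by_fault = {f: {d for d in rows if f in rows[d]} for f in cols}
        for j in sorted(cols):
            if any(k != j and k in cols and by_fault[k] <= by_fault[j]
                   for k in by_fault):
                cols.discard(j)
                changed = True
        restrict()
        # Row dominance: drop d_k if a no-costlier d_i detects all its faults.
        for k in sorted(rows):
            if any(i != k and rows[k] <= rows[i] and cost[i] <= cost[k]
                   for i in rows):
                del rows[k]
                changed = True
        if not changed and cols:
            # Cyclic matrix: greedy stand-in for the branching heuristics.
            pick(max(sorted(rows), key=lambda d: len(rows[d]) / cost[d]))
    return chosen


detects = {"A2": {"v", "z"}, "A3": {"z"}, "A4": {"v", "z"}, "A5": {"z"}}
cost = {d: 1.0 for d in detects}   # unit costs are an assumption
d_min = minimize_detectors(detects, cost)
print(d_min)
```

For this input a single detector that covers both faults suffices. The greedy step does not guarantee a minimum-cost cover on cyclic matrices; a branch-and-bound search would be needed for that.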
Finally, every fault detector from $D_{min}$ is converted to a checker circuit similar to Fig. 5.
5. EXPERIMENTAL RESULTS
We evaluate the effectiveness of our technique through experiments on the OR1200 OpenRISC processor [2], the control state machine of an end node of the SpaceWire [6] network, a USB 2.0 packet assembler block and a set of ITC benchmarks. OR1200 is a 32-bit scalar RISC with Harvard microarchitecture and a 5-stage integer pipeline. SpaceWire is a standard for high-speed links and networks for use onboard spacecraft. Verilog RTL implementations from [1] for OR1200, the SpaceWire end node and the USB 2.0 controller were used. A mining window of 2 cycles was used for OR1200, USB and ITC benchmark circuits, while a mining window of 3 cycles was used for the SpaceWire experiments. For each module, we ran experiments for a subset of the set of all outputs, and for each output we injected faults at all the state variables in its logic cone. We used Cadence IFV as the formal verifier for fault experiments. Experiments were performed on an Intel Core 2 Quad with 16GB of memory.
Table 1 summarizes the results of these experiments. FIFO modules from OR1200 and the ITC benchmark circuits except b09 and b10 have simpler logic compared to the rest of the modules. As a result, they require a smaller number of counterexample iterations ($I_{GM}$ = 0 to 2) of GoldMine compared to those required for OR1200 decode and SpaceWire ($I_{GM}$ = 5 to 8). Consequently, the time taken by GoldMine to generate the final set of fault detectors in case of the OR1200 decode and SpaceWire FSM modules is also substantially higher. We are able to achieve 100% coverage w.r.t. injected faults in 27 out of 30 outputs. $I_{GM}$ in these cases denotes the number of GoldMine iterations required to achieve complete fault coverage. For the remaining outputs, we had to stop after $I_{GM}$ iterations in each case because of excessive memory usage.
It can be seen that $C_a$ in each case gives the accurate input space coverage of the set of true assertions generated in $I_{GM}$ iterations. Here an interesting observation is that for many outputs, 100% fault coverage is achieved when the input space coverage is much lower (10%-60%). This means that the generation of fault detectors does not depend on the quality of the initial random tests that were used to generate data. While GoldMine would have to run for multiple iterations to achieve high input space coverage, high fault coverage is achieved with far fewer counterexample iterations, making this a highly scalable application of GoldMine.
5.1 Fault coverage with increasing counterexample iterations
Fig. 6 shows the fault coverage of assertions for each output of the OR1200 decode module as a function of counterexample iterations. Although fault coverage increases with an increasing number of iterations, the increase is not strict: assertions generated in successive iterations may not always detect new faults.
Module          Size (gates)  Output           Faults injected  I_GM  T_GM (min)  C_f (%)  C_a (%)
or1200_ctrl     1434          except_illegal   7                3     5.88        100      82.10
                              ex_macrc_op      8                8     24.68       100      41.40
                              no_more_dslot    13               7     159.68      100      15.40
                              sig_syscall      10               8     35.78       100      54.11
                              sig_trap         10               5     6.20        100      90.57
or1200_ic_fsm   383           biu_read_reg     5                0     1.73        100      11.48
                              burst            4                0     2.10        100      9.22
                              first_hit_ack    4                0     0.21        100      95.08
                              first_miss_ack   4                0     0.31        100      83.91
                              first_miss_err   4                0     0.37        100      80.25
                              tag_we           5                0     0.55        100      54.06
or1200_sb_fifo  178           dat_o            15               0     5.85        100      3.52
                              empty_o          7                1     0.12        100      90.63
                              full_o           7                1     0.11        100      84.38
or1200_sb       430           dcsb_ack_o       8                2     4.27        100      59.84
SPW FSM         315           RST_tx_o         25               8     73.59       100      67.81
                              err_sqc          25               6     46.62       100      46.78
                              enTx_o           26               8     150.64      100      10.01
                              enRx_o           26               5     256.59      100      44.49
usbf_pa         327           tx_data[2]       28               8     152.41      96.43    76.92
b01             62            outp             4                0     0.10        100      37.50
b02             49            u                4                0     0.03        100      62.50
b03             280           grant_o[3]       24               0     1.26        100      1.22
                              grant_o[2]       24               1     3.98        100      25.91
                              grant_o[1]       24               1     2.28        100      18.42
                              grant_o[0]       24               1     3.58        100      9.47
b06             80            ackout           4                2     0.13        100      96.86
b09             249           y                28               5     9.11        100      75
b10             [illegible]   ctr              15               10    281.86      66.67    50.68
                              v_out[0]         15               6     174.4       66.67    37.24
Table 1: Number of counterexample iterations of GoldMine ($I_{GM}$) and time taken by GoldMine in minutes ($T_{GM}$) to generate the fault detectors. $C_f$ and $C_a$ are respectively the fault coverage and the input space coverage reported by GoldMine. Size of each module is given in terms of number of gates after synthesis. Since all the modules contain sequential elements, a flip-flop/latch is considered equivalent to 4 gates in order to give an overall sense of size of the module.
In Fig. 7, we present a fault-centric analysis of the OR1200 decode
module. For a given fault, we show the number of GoldMine iterations
that it took to generate the set of assertions that covered the
fault.²
There were a total of 19 state variables in the combined logic cones
of the 4 outputs shown. We can see that, except for f9, all faults in
the logic cone of no_more_dslot are detected within 5 iterations. For
f9, we can see from the RTL that the corresponding state bit has a low
fanout, which makes the fault harder to detect and results in a higher
number of iterations.
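The fault-centric view above amounts to recording, for each injected fault, the first iteration whose assertions detect it. A minimal Python sketch follows, under stated assumptions: the function name and the example data are hypothetical, and iteration 0 is taken to mean the assertions available before any counterexample refinement.

```python
from typing import Dict, List, Optional, Set


def first_detection_iteration(detected_per_iter: List[Set[str]],
                              faults: List[str]) -> Dict[str, Optional[int]]:
    """Map each fault to the first iteration that detects it (None if never)."""
    first: Dict[str, Optional[int]] = {f: None for f in faults}
    for iteration, detected in enumerate(detected_per_iter):
        for fault in detected:
            # Record only the earliest iteration for faults we track.
            if fault in first and first[fault] is None:
                first[fault] = iteration
    return first


# Hypothetical data: f9 is hard to detect and only shows up late.
detected_per_iter = [{"f0"}, {"f0", "f1"}, set(), {"f9"}]
first_seen = first_detection_iteration(detected_per_iter,
                                       ["f0", "f1", "f2", "f9"])
print(first_seen)  # {'f0': 0, 'f1': 1, 'f2': None, 'f9': 3}
```

Faults that map to `None` were not detected by any assertion in the run and would lower the reported fault coverage.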
5.2 Estimation of criticality of a fault
In Fig. 8, we show the percentage of total assertions generated (D_z)
for each output of the SpaceWire FSM that detect a given fault. This
number gives an estimate of the criticality or importance of the fault
for overall design vulnerability. The faults considered are in the
combined logic cone of all the outputs. For the variables state[5:0]
and timer[13:0], the number of assertions is averaged over individual
bits.
² The faults are given names like f0, f1, etc. instead of actual
variable names for readability.

Figure 6: Fault coverage evaluated after each GoldMine iteration for
the decode module of OR1200.

Figure 7: Number of counterexample iterations required to detect each
fault for the decode module of OR1200. A total of 20 faults were
injected. For each output, faults at variables in its logic cone are
plotted on the x-axis and the counterexample iterations at which they
are detected are shown on the y-axis.

Figure 8: Percentage of generated assertions that detect a particular
fault for the SpaceWire control FSM. Fault at the highlighted variable
could not be detected by a set of manually written assertions as
reported in [20]. (x-axis: variables at which faults are injected:
state[5:0], t6_4us, t12_8us, HASgotNULL, HASgotBit, timer[13:0];
legend: RST_tx_o, err_sqc, enTx_o, enRx_o.)

We can see from Fig. 8 that a relatively high percentage of
assertions (25% - 78%) detect faults at the variable state. Therefore,
faults at state would be most critical for all outputs considered.
This is confirmed by the RTL description, which shows a direct
dependence of these outputs on state: in the RTL, state is used as the
switching variable of a case statement. Since the rest of the
variables can only affect the outputs by changing state, a lower
percentage of assertions detect faults at those variables (up to 31%).
An important result of this analysis follows. In [20], the authors
demonstrated that manually written assertions based on the SpaceWire
specifications were not able to detect a fault at HASgotBit.
GoldMine's fault detectors, however, manage to detect this fault. At
the same time, Fig. 8 also illustrates that this fault is hard to
detect, since only a very low percentage of assertions detect it
(0.02% - 4%), thereby confirming the inference in [20].
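The criticality estimate discussed in this section is a simple ratio: for each fault, the share of generated assertions that detect it. The Python sketch below illustrates that computation under stated assumptions: the function name, the assertion names (a0 to a3) and the detection sets are hypothetical, and only the variable names state, timer and HASgotBit come from the text.

```python
from typing import Dict, List, Set


def fault_criticality(assertion_detects: Dict[str, Set[str]],
                      faults: List[str]) -> Dict[str, float]:
    """Percentage of assertions that detect each fault."""
    total = len(assertion_detects)
    if total == 0:
        return {f: 0.0 for f in faults}
    return {
        f: 100.0 * sum(f in detected for detected in assertion_detects.values())
        / total
        for f in faults
    }


# Hypothetical detection sets: which faults each assertion catches.
assertion_detects = {
    "a0": {"state", "timer"},
    "a1": {"state"},
    "a2": {"state", "HASgotBit"},
    "a3": set(),
}
criticality = fault_criticality(assertion_detects,
                                ["state", "timer", "HASgotBit"])
print(criticality)  # {'state': 75.0, 'timer': 25.0, 'HASgotBit': 25.0}
```

A fault with a high percentage is observable through many assertions, while a low percentage marks a fault that only a few assertions can catch.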
5.3 Ranking GoldMine assertions based on fault coverage
In the fault detection context, the extent of fault coverage of an 
individual assertion can be used to rank that assertion. The per­
centage of total injected faults that are detected by an individual 
assertion can be used as a goodness metric for evaluating its utility 
as a fault detector. Higher ranked assertions can be synthesized into 
hardware as checker circuits.
We ranked the assertions generated for the SpaceWire FSM: assertions
detecting <=10%, 10%-20%, 20%-30% and >30% of the injected faults were
assigned ranks 1, 2, 3 and 4 respectively. We then computed the
percentage of assertions belonging to each rank (Fig. 9).
In this experiment, most of the assertions (82%) have rank 1 or 2,
i.e. they detect up to 20% of the injected faults. However, 4% of the
assertions belong to rank 4, and we have also found that about 6% of
those detect more than 80% of the injected faults.
Figure 9: Percentage of total assertions belonging to each rank
(legend: Rank 1, Rank 2, Rank 3, Rank 4).
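The ranking scheme of this section can be sketched as follows. This is an illustrative Python sketch, not the authors' tooling: the function names and the example detection sets are hypothetical, and the treatment of the bin boundaries (10%, 20% and 30% fall in the lower rank) is an assumption based on the "<=10%" wording.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def rank_of(percent_detected: float) -> int:
    """Rank 1: <=10%, rank 2: (10, 20], rank 3: (20, 30], rank 4: >30%."""
    if percent_detected <= 10.0:
        return 1
    if percent_detected <= 20.0:
        return 2
    if percent_detected <= 30.0:
        return 3
    return 4


def rank_distribution(assertion_detects: Dict[str, Set[str]],
                      total_faults: int
                      ) -> Tuple[Dict[str, int], Dict[int, float]]:
    """Return each assertion's rank and the percentage of assertions per rank."""
    ranks = {name: rank_of(100.0 * len(detected) / total_faults)
             for name, detected in assertion_detects.items()}
    counts = Counter(ranks.values())
    n = len(ranks)
    dist = {r: (100.0 * counts[r] / n if n else 0.0) for r in (1, 2, 3, 4)}
    return ranks, dist


# Hypothetical example with 10 injected faults.
assertion_detects = {
    "a0": {"f0"},
    "a1": {"f1"},
    "a2": {"f0", "f1"},
    "a3": {"f0", "f1", "f2"},
    "a4": {"f0", "f1", "f2", "f3"},
}
ranks, dist = rank_distribution(assertion_detects, total_faults=10)
print(ranks)  # {'a0': 1, 'a1': 1, 'a2': 2, 'a3': 3, 'a4': 4}
print(dist)   # {1: 40.0, 2: 20.0, 3: 20.0, 4: 20.0}
```

Under such a scheme, only the assertions in the highest ranks would be candidates for synthesis into checker circuits.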
5.4 Checker circuit overheads and comparison
The checker circuits and the modules themselves were synthe­
sized using Synopsys Design Compiler with OSU FreePDK 45nm 
library [8] and area and performance overheads were computed. 
Area overheads include the total cell areas. Performance overheads 
were computed as percentage increase in the delay of the critical 
path in the design when assertions are added.
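Both overheads above are percentage increases over the unprotected design. The Python sketch below spells out that arithmetic; the function name and the numbers are hypothetical and are not taken from the synthesis reports.

```python
def percent_overhead(baseline: float, with_checkers: float) -> float:
    """Percentage increase of a metric (cell area or critical-path delay)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (with_checkers - baseline) / baseline


# Hypothetical values: total cell area and critical-path delay.
area_overhead = percent_overhead(baseline=200.0, with_checkers=230.0)
delay_overhead = percent_overhead(baseline=2.0, with_checkers=2.0)
print(area_overhead)   # 15.0
print(delay_overhead)  # 0.0
```

A delay overhead of 0 means the added checker logic does not lie on the critical path of the design.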
In Table 2, we compare checker circuits from GoldMine fault detectors
to parity-based soft error detection and to the self-checking FSM
technique [24]. In parity checking, one parity bit was added per
variable in the RTL design at which faults were injected and detected
by our technique. Typical parity-based methods devote one extra cycle
each to read from or write to a variable, which is a very high
performance penalty. For a fair comparison in terms of critical path
delay, a modified scheme was used where the parity computation and
checking was done in the same cycle as the rest of the combinational
logic.
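The parity scheme used as a baseline can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the RTL used in the experiments: the class and method names are hypothetical, even parity is assumed, and the single stored parity bit per variable follows the description above.

```python
def parity(value: int) -> int:
    """Even-parity bit of a non-negative integer (1 if odd number of 1s)."""
    return bin(value).count("1") & 1


class ParityProtectedRegister:
    """Behavioural model of a register guarded by one parity bit."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.value = 0
        self.parity_bit = 0

    def write(self, value: int) -> None:
        # Parity is computed in the same step as the write.
        self.value = value & ((1 << self.width) - 1)
        self.parity_bit = parity(self.value)

    def inject_bit_flip(self, bit: int) -> None:
        # Model a transient fault: flip one stored bit, leave parity as is.
        self.value ^= 1 << bit

    def error_detected(self) -> bool:
        return parity(self.value) != self.parity_bit


reg = ParityProtectedRegister(width=6)
reg.write(0b101101)
clean = reg.error_detected()        # False: no fault yet
reg.inject_bit_flip(2)
single_flip = reg.error_detected()  # True: one flipped bit is caught
reg.inject_bit_flip(0)
double_flip = reg.error_detected()  # False: two flips cancel out
print(clean, single_flip, double_flip)  # False True False
```

The last check shows the well-known limitation of a single parity bit: an even number of bit flips in the same variable goes undetected.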
The performance impact of GoldMine fault detectors is much lower
(within 5%) compared to parity checking. The exception in the case of
or1200_ctrl is due to the fact that the critical path in or1200_ctrl
does not start or end with a flip-flop at which a fault was injected,
and therefore no parity checking logic was added to it. Although the
area overheads of GoldMine fault detectors are higher, systematic
fault and behavioral coverage analysis along with minimal performance
impact can help offset them.
The self-checking FSM technique can only be applied to modules based
on state machines. Moreover, only the faults at the state bits
describing the FSM are detected. In contrast, our technique detects
faults in all the flip-flops of the design. Despite this fact, our
method has lower performance overheads and comparable area overheads
in many cases.
A comparison to the DICE circuit hardening technique [5] revealed
that GoldMine fault detectors have comparable performance overheads
to this technique as well.
Selective detection of critical faults can reduce the overheads of
GoldMine fault detectors to a great extent. We found that when only
the critical faults in the SpaceWire FSM, viz. faults at state[5:0],
t6_4us and HASgotNULL, were considered for detection, the area
overhead reduced from 59.5% to as low as 9.86%.









Module          A_GM    P_GM    A_parity  P_parity  A_SC    P_SC
or1200_ctrl     15.21   10.16   6.3       0         -       -
or1200_ic_fsm   15.57   0       5.75      5.75      4.36    0
or1200_sb_fifo  52.49   0       44.05     27.28     -       -
or1200_sb       16.56   0       10.28     32.79     -       -
SPW_FSM         59.5    4.49    43.19     44.94     7.23    4.49
usbf_pa         90.61   5.97    37.95     36.23     -       -
b01             27.24   0       31.82     31.58     61.79   57.89
b02             50.88   0       40.69     58.82     65.35   67.65
b03             39.75   4.17    56.99     27.79     4.91    6.94
b06             38.8    1.85    18.09     18.       42.11   22.22
b09             50      0       51.88     41.67     16.67   1.19
b10             48.59   0       23.97     32.89     9.08    0.63

Table 2: Comparison of percentage area and performance overheads of
soft error detection schemes: GoldMine fault detectors (A_GM, P_GM),
parity checking (A_parity, P_parity) and self-checking FSM technique
[24] (A_SC, P_SC).
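Selective detection of critical faults, as discussed before Table 2, requires choosing a small set of assertions that still covers the chosen faults. One possible way to do this is a greedy set-cover heuristic, sketched below in Python. This is an assumption for illustration and not necessarily the selection procedure used for the reported numbers; the function name, assertion names and detection sets are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def select_assertions(assertion_detects: Dict[str, Set[str]],
                      critical_faults: Iterable[str]
                      ) -> Tuple[List[str], Set[str]]:
    """Greedily pick assertions until all critical faults are covered.

    Returns the chosen assertions and any critical faults left uncovered.
    """
    uncovered = set(critical_faults)
    chosen: List[str] = []
    if not assertion_detects:
        return chosen, uncovered
    while uncovered:
        # Pick the assertion that covers the most still-uncovered faults;
        # ties are broken by assertion name for reproducibility.
        best = max(sorted(assertion_detects),
                   key=lambda a: len(assertion_detects[a] & uncovered))
        gain = assertion_detects[best] & uncovered
        if not gain:
            break  # remaining faults are not detected by any assertion
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical detection sets; fault names follow the SpaceWire example.
assertion_detects = {
    "a0": {"state", "t6_4us"},
    "a1": {"HASgotNULL"},
    "a2": {"state"},
    "a3": {"timer"},
}
chosen, uncovered = select_assertions(
    assertion_detects, ["state", "t6_4us", "HASgotNULL"])
print(chosen, uncovered)  # ['a0', 'a1'] set()
```

Only the chosen assertions would then be synthesized as checkers, which is where the area saving would come from.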
6. RELATED WORK
Soft error detection and resilience have been studied widely at the
circuit level [17, 15, 5] and the microarchitecture level [9, 22, 19].
Simulation-based approaches are typically used for fault experiments
[12]. Our method works at the RT level and uses formal techniques for
all fault experiments. In [20, 13], formal methods have been used to
inject soft errors in RTL designs and evaluate fault coverage, but the
authors only consider manually written assertions, which do not
guarantee high fault coverage.
In conclusion, we present an automatic methodology for gen­
eration of high coverage transient fault detectors, which is prac­
tically effective. Our method has twin advantages of being high 
confidence as well as providing metrics of importance for hard­
ware checkers. In future work, we will address recovery and/or 
protection from detected faults.
7. REFERENCES
[1] Opencores, http://www.opencores.org.
[2] Openrisc 1200, http://opencores.org/openrisc,or1200.
[3] H. Asadi and M. B. Tahoori. Soft error modeling and protection for 
sequential elements. In Proc. DFT ’05.
[4] R. Baumann. Soft errors in advanced computer systems. IEEE Des. 
Test, 22:258-266, May 2005.
[5] T. Calin, M. Nicolaidis, and R. Velazco. Upset hardened memory 
design for submicron cmos technology. Nuclear Science, IEEE 
Transactions on, 1996.
[6] European Cooperation for Space Standardization. Spacewire-links, 
nodes, routers and networks (ECSS-E-ST-50-12C), July 2008.
[7] P. F. Flores, H. C. Neto, and J. P. Marques-Silva. On applying set 
covering models to test set compaction. In Proc. GLS ’99.
[8] FreePDK 45nm Library,
http://vcag.ecen.okstate.edu/projects/scells/OSUFreePDK.php.
[9] T. S. Ganesh, V. Subramanian, and A. Somani. Seu mitigation 
techniques for microprocessor control logic. In Proc. EDCC ’06.
[10] G. D. Hachtel and F. Somenzi. Logic Synthesis and Verification 
Algorithms. 2006.
[11] D. Hochbaum. An optimal test compression procedure for 
combinational circuits. IEEE Trans. CAD.
[12] M.-C. Hsueh, T. K. Tsai, and R. K. Iyer. Fault injection techniques 
and tools. Computer, 30:75-82, 1997.
[13] U. Krautz, M. Pflanz, C. Jacobi, H. W. Tast, K. Weber, and H. T. 
Vierhaus. Evaluating coverage of error detection logic for soft errors 
using formal methods. In Proc. DATE ’06.
[14] S. Mitra, T. Karnik, N. Seifert, and M. Zhang. Logic soft errors in 
sub-65nm technologies design and cad challenges. In Proc. DAC ’05.
[15] S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim. Robust system 
design with built-in soft-error resilience. Computer, 38.
[16] K. Mohanram and N. Touba. Cost-effective approach for reducing 
soft error failure rate in logic circuits. In Proc. ITC ’03.
[17] M. Nicolaidis. Design for soft error mitigation. Device and Materials 
Reliability, IEEE Transactions on, 5(3):405-418, 2005.
[18] A. Pnueli. The temporal logic of programs. In Proc. FOCS ’77.
[19] S. K. Sastry Hari, M.-L. Li, P. Ramachandran, B. Choi, and S. V. 
Adve. mSWAT: low-cost hardware fault detection and diagnosis for 
multicore systems. In Proc. MICRO ’09.
[20] S. A. Seshia, W. Li, and S. Mitra. Verification-guided soft error 
resilience. In Proc. DATE ’07.
[21] S. Vasudevan, D. Sheridan, D. Tcheng, S. Patel, W. Tuohy, and 
D. Johnson. Goldmine: Automatic assertion generation using data 
mining and static analysis. In Proc. DATE ’10.
[22] N. J. Wang and S. J. Patel. Restore: Symptom-based soft error 
detection in microprocessors. IEEE Trans. Dependable Secur. 
Comput., 3(3):188-201, 2006.
[23] N. J. Wang, J. Quek, T. M. Rafacz, and S. J. Patel. Characterizing the 
effects of transient faults on a high-performance processor pipeline. 
In Proc. DSN ’04.
[24] C. Zeng, N. Saxena, and E. J. McCluskey. Finite state machine 
synthesis with concurrent error detection. In Proc. ITC ’99.
