University of South Florida

Scholar Commons
Graduate Theses and Dissertations

Graduate School

3-3-2010

Probabilistic Error Analysis Models for Nano-Domain VLSI Circuits
Karthikeyan Lingasubramanian
University of South Florida

Follow this and additional works at: https://scholarcommons.usf.edu/etd
Part of the American Studies Commons

Scholar Commons Citation
Lingasubramanian, Karthikeyan, "Probabilistic Error Analysis Models for Nano-Domain VLSI Circuits"
(2010). Graduate Theses and Dissertations.
https://scholarcommons.usf.edu/etd/1699

This Dissertation is brought to you for free and open access by the Graduate School at Scholar Commons. It has
been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Scholar
Commons. For more information, please contact scholarcommons@usf.edu.

Probabilistic Error Analysis Models for Nano-Domain VLSI Circuits

by

Karthikeyan Lingasubramanian

A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy in Electrical Engineering
Department of Electrical Engineering
College of Engineering
University of South Florida

Major Professor: Sanjukta Bhanja, Ph.D.
Nagarajan Ranganathan, Ph.D.
Syed M. Alam, Ph.D.
Wilfrido A. Moreno, Ph.D.
Paris H. Wiley, Ph.D.

Date of Approval:
March 3, 2010

Keywords: Reliability, Worst-case input, Sequential circuits, Redundancy models
© Copyright 2010, Karthikeyan Lingasubramanian

DEDICATION

To my Family and Friends

ACKNOWLEDGEMENTS

I would like to thank my major professor Dr. Sanjukta Bhanja for believing in me and for
giving me this opportunity. Without her support this dissertation wouldn’t have been possible.
She has trained me in every aspect of research and has helped me mold myself as a better
researcher. Moreover she has also been a good friend to me.
My sincere thanks to Dr. Nagarajan Ranganathan, Dr. Syed M. Alam, Dr. Wilfrido A.
Moreno and Dr. Paris H. Wiley for serving on my committee. I would like to thank them all
for their valuable support and advice and, most importantly, for allotting their time in spite of
their busy schedules.
I would like to thank all the faculty and staff of the Department of Electrical Engineering
and College of Engineering.
I would like to thank all my present and former colleagues, Javier, Anitha, Srinath, Dinuka,
Jose, Pruthvi, Thara, Shiva, Nirmal, Vivek, Satish, Praveen and Saket, for their unconditional
support.
I am really very grateful for the invaluable support and motivation that I received from my
family.
I would also like to thank all my friends that I ever had in my whole life.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Significance
1.3 Contribution
1.4 Scope of Application
1.5 Organization

CHAPTER 2 RELATED WORK
2.1 SEU Modeling
2.2 Dynamic Error Modeling
2.2.1 Calculation of Error Bounds
2.2.2 Calculation of Average Error
2.2.3 Error Reduction Through Redundancy
2.3 Relation to State-of-the-Art

CHAPTER 3 DESIGN FUNDAMENTALS
3.1 Probabilistic Representation of Digital Circuits
3.2 Modeling Error in Digital Circuits

CHAPTER 4 MAXIMUM ERROR MODELING
4.1 Maximum a Posteriori (MAP) Estimate
4.1.1 Calculation of MAP Upper Bounds Using Shenoy-Shafer Algorithm
4.1.2 Calculation of the Exact MAP Solution
4.1.3 Calculating the Maximum Output Error Probability
4.1.4 Computational Complexity of MAP Estimate
4.2 Experimental Results
4.2.1 Experimental Procedure for Calculating Maximum Output Error Probability
4.2.2 Worst-case Input Vectors
4.2.3 Circuit-Specific Error Bounds for Fault-Tolerant Computation
4.2.4 Validation Using HSpice Simulator
4.2.5 Results with Multiple ε
4.3 Discussion

CHAPTER 5 MODELING ERROR IN SEQUENTIAL CIRCUITS
5.1 Sequential Logic Model
5.1.1 TDM Model
5.2 Error Model
5.2.1 Structure
5.2.2 Inference Scheme
5.2.3 Output Error Probability
5.3 Experimental Results
5.3.1 Experimental Procedure
5.3.2 Output Error Probabilities
5.3.3 Number of Time Slices
5.3.4 Output Error Propagation Across Time Slices
5.3.5 Output Error Probabilities for ε0 ≠ ε1
5.3.6 Validation Using HSpice Simulation
5.4 Discussion

CHAPTER 6 REDUNDANCY SCHEMES FOR ERROR MITIGATION
6.1 Temporal Redundancy Scheme Using Triple Temporal Redundancy (TTR) Technique
6.1.1 Determination of the Set of Worst-Case Input Combinations for Selective Redundancy in TTR
6.1.2 Experimental Setup for TTR
6.2 Spatial Redundancy Scheme Using Cascaded Triple Modular Redundancy (CTMR) Technique
6.2.1 Sensitivity Analysis for Selective Redundancy in CTMR
6.3 Hybrid Redundancy
6.4 Experimental Results
6.4.1 Error Mitigation Through Temporal Redundancy
6.4.2 Error Mitigation Through Spatial Redundancy
6.4.3 Error Mitigation Through Hybrid Redundancy
6.4.4 Comparison Between the Redundancy Schemes
6.4.5 Error Mitigation Through Hybrid Redundancy with Different Combinations of Spatial and Temporal Redundancies
6.4.6 Delay and Area Penalties
6.5 Discussion

CHAPTER 7 CONCLUSION AND FUTURE DIRECTIONS

REFERENCES

ABOUT THE AUTHOR

LIST OF TABLES

Table 4.1. Valuations of the variables derived from corresponding CPTs
Table 4.2. Combination
Table 4.3. Worst-case input vectors from MAP
Table 4.4. Run times for MAP computation
Table 4.5. Comparison between maximum error probabilities achieved from the proposed model and the HSpice simulator at ε = 0.05
Table 5.1. Conditional probabilistic tables for error-free and error-prone NAND logic
Table 5.2. Conditional probabilistic table for error-prone NAND logic having variable gate error probabilities, ε0 and ε1
Table 5.3. Output error probabilities at ε = 0.001, 0.003, 0.005, 0.01
Table 5.4. Output error probabilities at ε = 0.001, 0.003, 0.005, 0.01 compared with HSpice simulation results

LIST OF FIGURES

Figure 1.1. Significance of this dissertation
Figure 1.2. Scope of application
Figure 2.1. (a) Probabilistic transfer matrix for erroneous NAND gate with error probability ε [45] (b) Markov random field [47]
Figure 2.2. NAND multiplexing scheme introduced by Von Neumann [1]
Figure 2.3. Some of the related works on reliability models for dynamic errors in VLSI circuits
Figure 3.1. Representation of a digital circuit as a probabilistic graph
Figure 3.2. Minimal representation of the probabilistic graph
Figure 3.3. Error model
Figure 3.4. Conditional probabilistic tables for the ideal and erroneous nodes in the error model
Figure 4.1. (a) Digital logic circuit (b) Error model (c) Probabilistic error model
Figure 4.2. Search tree where depth first branch and bound search performed
Figure 4.3. Illustration of the fusion algorithm
Figure 4.4. Partial illustration of binary join tree construction method for the first chosen variable
Figure 4.5. Complete illustration of binary join tree construction method
Figure 4.6. (a) Message passing with cluster C11 as root (b) Message passing with cluster C1 as root (c) Message storage mechanism
Figure 4.7. Binary join tree for the probabilistic error model in Fig. 4.1.(c)
Figure 4.8. Search process for MAP computation
Figure 4.9. Flow chart describing the experimental setup and process
Figure 4.10. Circuit-specific error bound along with comparison between maximum and average output error probabilities for (a) c17, (b) max_flat, (c) voter, (d) pc, (e) count, (f) alu4, (g) malu4
Figure 4.11. Output error probabilities for the entire input vector space with gate error probability ε = 0.05 for c17
Figure 4.12. (a) Output error probabilities ≥ (µ + σ), calculated from probabilistic error model, with gate error probability ε = 0.05 for max_flat (b) Corresponding HSpice calculations
Figure 4.13. Comparison between the average and maximum output error probability and run time for ε = 0.005, ε = 0.05 and variable ε ranging from 0.005 to 0.05 for max_flat
Figure 5.1. (a) Digital logic circuit (b) Corresponding probabilistic model (c) DAG representation which is not minimal (d) TDM model
Figure 5.2. Error model obtained from TDM model with 3rd order temporal dependence
Figure 5.3. (a) Digital logic circuit (b) Corresponding probabilistic model (c) Moral graph obtained by adding undirected links between parents of common child nodes (d) Corresponding join tree obtained
Figure 5.4. Flowchart for experimental procedure
Figure 5.5. Number of time slices needed by bbara and bbtas for ε = 0 − 0.006
Figure 5.6. (a) Transition of output error probability across time slices for bbara, s27 and mc with ε = 0.01 (b) Transition of output error probability across time slices for lion, lion9 and bbtas with ε = 0.01
Figure 5.7. (a) Transition of error-free and error-prone output probabilities across time slices for bbtas, lion9 and lion with ε = 0.01 (b) Transition of error-free and error-prone output probabilities across time slices for bbara with ε = 0.01
Figure 5.8. Output error probabilities for (ε0 = 0.01, ε1 = 0.02) and (ε0 = 0.02, ε1 = 0.01)
Figure 6.1. Determination of the set of worst-case input combinations by backtracking through the search tree used for MAP computation given in Fig. 4.8.
Figure 6.2. Experimental setup for TTR incorporating selective redundancy
Figure 6.3. Spatial redundancy scheme using CTMR technique incorporating majority logic
Figure 6.4. Hybrid redundancy scheme using CTMR and TTR techniques
Figure 6.5. Percentage mitigation of output error achieved through 5% and 15% temporal redundancy with ε = 0.001
Figure 6.6. Percentage mitigation of output error achieved through 5% and 15% spatial redundancy with ε = 0.001
Figure 6.7. Percentage mitigation of output error achieved through 5% and 15% hybrid redundancy with ε = 0.001
Figure 6.8. Comparison between the redundancy schemes for (a) 5% and (b) 15% redundancy with ε = 0.001
Figure 6.9. Percentage mitigation of output error achieved through hybrid redundancy with different combinations of spatial and temporal redundancies while ε = 0.001
Figure 6.10. (a) Delay penalty in temporal redundancy (b) Area penalty in spatial redundancy

PROBABILISTIC ERROR ANALYSIS MODELS FOR NANO-DOMAIN VLSI
CIRCUITS
Karthikeyan Lingasubramanian
ABSTRACT

Technology scaling to nanometer levels has paved the way for multi-dimensional applications
in a single product by increasing the density of electronic devices on integrated chips. This
has naturally attracted a wide variety of industries, such as medicine, communication,
automobiles, defense and even household appliances, to use high-speed multi-functional
computing machines. Beyond the advantages of these nano-domain computing devices,
their use in safety-centric applications like implantable biomedical chips and automobile
safety has greatly increased the need for comprehensive error analysis to enhance their
reliability. Moreover, these nano-electronic devices have an increased propensity for transient
errors due to extremely small device dimensions and low switching energy. These transient
errors are probabilistic rather than deterministic in nature, and so require probabilistic models
for estimation and analysis. In this dissertation, we present comprehensive analytic studies
of error behavior in nano-level digital logic circuits using probabilistic reliability models. The
work comprises the design of exact probabilistic error models to compute the maximum error
over the entire input space in a circuit-specific manner, to study the behavior of transient errors
in sequential circuits, and to achieve error mitigation through redundancy techniques. The
model that computes maximum error also provides the worst-case input vector, which has the
highest probability of generating an erroneous output, for any given logic circuit. The model
for sequential logic measures the expected output error probability, given a probabilistic
input space, and accounts for both spatial dependencies and temporal correlations across
the logic using a time-evolving causal network. For comprehensive error reduction in logic
circuits, temporal, spatial and hybrid redundancy models are implemented. The temporal
redundancy model uses the triple temporal redundancy technique, which applies redundancy in
the input space; the spatial redundancy model uses the cascaded triple modular redundancy
technique, which applies redundancy in the intermediate signal space; and the hybrid redundancy
technique combines the temporal and spatial redundancy schemes. All of these studies are
performed on standard benchmark circuits from the ISCAS and MCNC suites. The experimental
results cover the various aspects of error behavior in nano VLSI circuits and demonstrate the
efficiency and versatility of the probabilistic error models.

CHAPTER 1
INTRODUCTION

Integrated circuits are used in a wide range of important applications such as automobiles,
aircraft, medicine, defense, communication and even household appliances. Critical applications
like medicine demand high accuracy and efficiency due to stringent safety requirements,
while automotive and defense applications demand more robustness due to extreme working
conditions [41, 42, 39]. In addition, the demand for multi-dimensional applications in a single
product has increased the density of electronic devices on a chip, reducing device feature size
and pushing the technology to nanometer levels [60, 59]. Complementary Metal Oxide
Semiconductor (CMOS) transistors, the current generation of electronic devices, have been
shrunk to sub-50nm dimensions [59]. This reduction in feature size results in variations in
device and process parameters, which in turn lead to transient dynamic faults in digital circuits.
In this dissertation, we present an error model that handles these transient dynamic faults
using probabilistic methods. Using this error model, we present a unique method to calculate
maximum errors in digital circuits. Based on the same error model, we also present a
time-evolving probabilistic network that calculates error in sequential circuits. Finally, we
present temporal, spatial and hybrid redundancy techniques, which incorporate selective
redundancy using the base error model, for error mitigation in digital circuits.

1.1 Motivation
Why use probabilistic models? Nano-domain computing devices are likely to have higher
error rates (in terms of both defects and transient faults) because they operate near the thermal
limit and process information in extremely small volumes [61, 47]. Nano-CMOS beyond
22nm is no exception, as frequency scales up while voltage and geometry scale down. The
resulting errors, caused by uncontrollable variations in device and process parameters such as
temperature and threshold voltage, are intractable for the deterministic testing tools used to
detect permanent faults. A fresh look at reliability in a technology-independent fashion is both
timely and necessary. Given the inherently stochastic nature of devices in the nano-regime,
probabilistic models are more appropriate than deterministic logic models. This requires a
significant shift in the design and testing paradigm, with reliability adopting a central role in
the design of electronic devices.
Why model maximum error? Industries like automotive and health care have traditionally
addressed high reliability requirements by employing redundancy, error correction, and
appropriate assembly and packaging technology. In addition, rigorous product testing
under extended stress conditions filters out even an entire lot in the presence of a small
number of failures [39]. Another rapidly growing class of electronic chips where reliability is
critical is implantable biomedical chips [41, 42]. Notably, some of these safety approaches,
such as redundancy and complex packaging, are not readily applicable to implantable
biomedical applications because of their low-voltage, low-power and small form factor
requirements. The authors of [41] identified that conventional approaches to device
and parasitic modeling, circuit techniques, and manufacturing and test need to improve to
meet extreme low-power and high-reliability requirements, since these constraints complicate
circuit design through an unpredictable design environment. We believe that our method of
calculating the maximum probability of error, and the proposed maximum-error-probability-aware
design, are well suited to implantable biomedical IC design. While two
design implementation choices can have different average probabilities of failure, the choice
with the lower average may in fact have a higher maximum probability of failure, leading to
lower yield in manufacturing and more rejects during chip burn-in and extended screening.
Also, when the input space of a circuit is completely random and all inputs are equally
probable, calculating the average error suffices. But when the input space is biased, as it is in
some cases, the average error is not comprehensive enough to characterize the error behavior
of the circuit. Therefore, the maximum probability of failure is needed as a critical design
metric, alongside the average case, in the design of safety-critical electronic chips.
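The gap between average and maximum output error can be illustrated with a minimal Python
sketch. It is not the MAP-based method developed in this dissertation: it simply enumerates
every input vector and every fault pattern of a small, hypothetical three-gate NAND circuit,
assuming that each gate output flips independently with probability ε. The circuit, the function
names and the value ε = 0.1 are assumptions chosen only for illustration.

```python
from itertools import product


def nand(a, b):
    """Ideal two-input NAND on 0/1 values."""
    return 1 - (a & b)


def toy_circuit(inputs, flips):
    """Hypothetical circuit: out = NAND(NAND(a, b), NAND(b, c)).

    flips[k] = 1 inverts the output of gate k, modeling a transient error.
    """
    a, b, c = inputs
    g1 = nand(a, b) ^ flips[0]
    g2 = nand(b, c) ^ flips[1]
    return nand(g1, g2) ^ flips[2]


def output_error_probability(inputs, eps, n_gates=3):
    """Exact P(faulty output != ideal output) for one input vector."""
    ideal = toy_circuit(inputs, (0,) * n_gates)
    p_error = 0.0
    for flips in product((0, 1), repeat=n_gates):
        # Probability of this fault pattern, assuming independent gate errors.
        p_pattern = 1.0
        for f in flips:
            p_pattern *= eps if f else 1.0 - eps
        if toy_circuit(inputs, flips) != ideal:
            p_error += p_pattern
    return p_error


def error_profile(eps):
    """Return per-input error probabilities, their average, and one worst-case input."""
    per_input = {v: output_error_probability(v, eps) for v in product((0, 1), repeat=3)}
    average = sum(per_input.values()) / len(per_input)
    worst = max(per_input, key=per_input.get)
    return per_input, average, worst


per_input, average, worst = error_profile(0.1)
print(f"average = {average:.3f}, maximum = {per_input[worst]:.3f} at input {worst}")
```

For this toy circuit at ε = 0.1, the average output error over equally probable inputs is about
0.214, while worst-case inputs such as (0, 0, 0) reach about 0.252 and the input (1, 1, 1) stays
near 0.108. Exhaustive enumeration grows exponentially with the number of inputs and gates,
so it serves only as an illustration here.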
Why model error in sequential circuits? Most real-time applications of electronic
devices, like random access memories, need them to be sequential in nature. Sequential
circuits consist of a combinational logic block, a set of inputs and a set of state bits, where
the value of each next-state bit is fed back as the present state in the next clock cycle through
latches. At a given time instance $t_i$, the state signals $s_{t_i}$ are uniquely determined as a
function of the primary input signals $i_{t_i}$ and the state signals $s_{t_{i-1}}$ of the previous
time instance, giving rise to temporal correlations. Because of this, an error occurring in the
combinational part of the circuit at one time instance may propagate through several
consecutive time instances, making the device more vulnerable [54, 51]. The static reliability
models used for combinational circuits are not adequate to model the temporal dependencies
between circuit nodes in the combinational part of the sequential circuit at different time
instances [52, 53, 54, 51, 55]. To handle this, a more dynamic model that can evolve through
consecutive time instances is needed.
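The way a single-slice gate error accumulates through state feedback can be sketched in Python.
The example below is not the time-evolving model of Chapter 5. It assumes a hypothetical
one-bit machine whose next state is NAND(input, previous state), a gate output that flips with
probability ε in every time slice, and an input that is 1 with probability 0.5. It tracks the exact
joint distribution of the ideal and the error-prone state across time slices.

```python
def nand(a, b):
    """Ideal two-input NAND on 0/1 values."""
    return 1 - (a & b)


def state_error_over_time(eps, n_slices, p_input_one=0.5):
    """Return P(faulty state != ideal state) after each time slice.

    Hypothetical machine: s[t] = NAND(i[t], s[t-1]), with the gate output
    flipped with probability eps in the error-prone copy.
    """
    # joint[(ideal_state, faulty_state)] = probability; both copies start at 0.
    joint = {(0, 0): 1.0}
    history = []
    for _ in range(n_slices):
        next_joint = {}
        for (ideal, faulty), p in joint.items():
            for i, p_i in ((0, 1.0 - p_input_one), (1, p_input_one)):
                ideal_next = nand(i, ideal)
                fault_free_eval = nand(i, faulty)  # uses the possibly wrong state
                for flip, p_flip in ((0, 1.0 - eps), (1, eps)):
                    key = (ideal_next, fault_free_eval ^ flip)
                    next_joint[key] = next_joint.get(key, 0.0) + p * p_i * p_flip
        joint = next_joint
        history.append(sum(p for (a, b), p in joint.items() if a != b))
    return history


history = state_error_over_time(eps=0.1, n_slices=50)
print([round(e, 3) for e in history[:3]], round(history[-1], 4))
```

With ε = 0.1 the state error probability in the first three time slices is about 0.1, 0.14 and
0.156, and it settles near 1/6, which is well above the single-gate error probability. This is the
kind of propagation across time instances that a static, single-slice model would miss.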

1.2 Significance
The errors that can occur in nano-domain VLSI circuits can be broadly divided into two
categories, hard faults and soft errors. Hard faults refer to any permanent faults that can
occur in a circuit component due to physical defects like oxide abnormalities in the transistor,
electrical defects like shorts and opens, and logical defects like stuck-at and delay faults. Soft
errors refer to failures in circuit components due to external conditions like high energy
neutron interaction or device parameter variations. Of these categories, soft errors are
the hardest to model because of their transient nature, since the external conditions responsible
for them are highly unpredictable at nanometer levels. The reliability models used to address
hard faults are completely deterministic and therefore cannot be used to model soft errors.
Comprehensive probabilistic models, like ours, are well suited to handle transient soft errors.

Figure 1.1. Significance of this dissertation (taxonomy of reliability issues in nano-domain
VLSI circuits: hard faults, comprising physical defects such as silicon substrate defects,
electrical faults such as shorts and opens, and logical faults such as stuck-at faults, are static
permanent errors; soft errors comprise Single Event Upsets caused by high energy neutrons,
which are local errors, and dynamic errors caused by device and process parameter variations,
which are global errors and are the subject of this work)
The most prevalent soft errors in nano-domain VLSI circuits are broadly categorized into
Single Event Upsets (SEUs), due to external particle interaction, and dynamic errors, due to
device and process variabilities. Failures due to SEUs are localized, in the sense that they
occur in a particular component of the circuit and then propagate, whereas dynamic errors are
global, in the sense that they can occur in multiple components of the circuit at the
same time. Models that address failures due to SEUs are therefore not sufficient to model
dynamic errors. The model presented in this work targets dynamic errors by making provision
for error behavior in multiple circuit components at the same time.
Our model can also be considered a complete and comprehensive model that accurately
calculates both maximum and average errors in digital circuits. This versatility offers a
wider diagnostic application space, which aids the collection of a variety of information sets
that are essential for IC testing.

1.3 Contribution
The contributions of this dissertation are as follows:
• A method to calculate the maximum output error in digital circuits using a probabilistic
model is presented.
– Given a circuit with a fixed gate error probability ε, this error model provides
the maximum output error probability and the worst-case input vector, which can
be very useful testing parameters. It is also shown that these worst-case input
vectors depend not only on the circuit structure but can also change with ε.
– It is shown that the maximum output error probabilities are much larger than the
average output error probabilities for comparatively low values of the individual gate
error probability ε, which underlines the importance of maximum error as a design
parameter.
– The circuit-specific error bounds for fault-tolerant computation are presented and
it is shown that maximum output errors provide a tighter bound. Also, it is shown
that the error bound for an individual gate placed in a circuit can depend on
the circuit structure.
– Through this work, an efficient design framework that employs inference in binary
join trees using the Shenoy-Shafer algorithm to compute the MAP hypothesis accurately
is applied for the first time in the context of digital computing machines.
– The validity of the error model is tested through comparison with circuit simulations
using HSpice; the highest percentage difference of the error model over HSpice is
just 1.23%, indicating its accuracy.
– The possibility of efficient error incorporation in this model is demonstrated by
assigning different ε values to different gates of a circuit, instead of the same ε
value to all gates. This form of the error model supports useful diagnostic
studies like error sensitivity analysis.
• An exact probabilistic error model that can study transient error behavior in sequential
logic is presented.
– This model can accurately calculate the average output error probability in any
given sequential circuit.
– A minimal time evolving probabilistic network, namely, the Temporal Dependency
Model (TDM), that can handle both spatial dependencies between nodes in a single
time slice and temporal dependencies between nodes in different time slices, is
presented.
– It is shown that the output error probabilities increase more than twofold even
for a slight increase in ε, indicating the vulnerability of sequential
circuits to transient errors.
– The crucial study of error propagation across different time instances in a
sequential circuit can be performed using this model. This study is important for
understanding error behavior in sequential circuits.
– It is shown that the number of time slices needed by the model, to converge to a
final average output error value, is completely dependent on the circuit structure.
– The flexibility of the error model is shown by incorporating unequal gate error
probability values, ε0 and ε1, to study the effect of 0 → 1 and 1 → 0 errors on
the output of a circuit. Given a gate output signal, ε0 represents the probability
of error occurrence when the ideal value of the signal is '0', and ε1 represents the
probability of error occurrence when the ideal value of the signal is '1'.
– The validity of the error model is tested through comparison with circuit simulations using HSpice and the results showed that the highest percentage difference
of the error model over HSpice is only 6.25%, signifying its accuracy.
• Using the probabilistic error model, temporal, spatial and hybrid redundancy techniques
are applied to achieve error mitigation in digital logic circuits.
– Efficient error reduction is achieved through selective redundancy, which applies
redundancy only to the most influential input combinations and
the most sensitive nodes.
– Through experimental results, the relative benefits of the temporal, spatial and
hybrid redundancy schemes are presented, and hybrid redundancy is shown to be
the best scheme for error mitigation in digital logic circuits.
– It is shown that increasing the amount of redundancy results in better error
mitigation in all three schemes.
– It is shown that the error mitigation percentage is more than 10% for all circuits
with 15% temporal redundancy, more than 20% for all circuits with 15% spatial
redundancy, and more than 30% for all circuits with 15% hybrid redundancy,
showing the high yield of the hybrid redundancy scheme.
– Delay and area penalties in temporal and spatial redundancies, respectively, are
presented, and it is shown that the area penalty is much higher than the delay
penalty.

Figure 1.2. Scope of application (defense, automobiles, communication and medicine, where
circuits have to be reliable to provide accuracy, precision and safety)
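The unequal gate error probabilities ε0 and ε1 defined in the contributions above can be made
concrete with a short Python sketch of a conditional probability table for an error-prone NAND
gate. The function name and the dictionary layout are assumptions for illustration; the tables
actually used in this dissertation are given in Chapter 5.

```python
def noisy_nand_cpt(eps0, eps1):
    """Return {(a, b): P(output = 1 | a, b)} for an error-prone NAND gate.

    eps0: probability of error when the ideal output is 0 (a 0 -> 1 error).
    eps1: probability of error when the ideal output is 1 (a 1 -> 0 error).
    """
    cpt = {}
    for a in (0, 1):
        for b in (0, 1):
            ideal = 1 - (a & b)
            # An ideal 1 survives with probability 1 - eps1;
            # an ideal 0 is observed as 1 only when an error occurs (eps0).
            cpt[(a, b)] = 1.0 - eps1 if ideal == 1 else eps0
    return cpt


cpt = noisy_nand_cpt(eps0=0.01, eps1=0.02)
for inputs, p_one in sorted(cpt.items()):
    print(inputs, p_one)
```

Setting eps0 equal to eps1 recovers the single gate error probability ε used in the rest of the
contributions.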

1.4 Scope of Application
Digital VLSI circuits are widely used in critical and essential applications like automobiles,
defense, medicine and communication. Reliable computation in these circuits is of the
utmost importance because of the nature of these applications. VLSI circuits are
used in automobile brake systems, implantable biomedical devices like pacemakers, aircraft
control systems and multi-functional smart phones like the iPhone. The scaled computational
devices in current-generation nano-domain VLSI circuits have greatly expanded their
application space. The advent of smart phones and implantable biomedical chips was made
possible primarily by this scaling trend. At the same time, VLSI circuits in the nano-domain
suffer from various reliability issues that should be addressed during the design process.
In nano-domain digital VLSI circuits affected by multiple errors, there can be one maximum
error that can even break down the entire device. While most current
reliability studies estimate the overall adverse effect by considering the average of all errors,
estimating the worst-case maximum error has proved tedious and cumbersome. The reliability
model presented in this dissertation can efficiently estimate this worst-case maximum
error, and the input vector associated with it, through intelligent diagnostic studies.
In IC testing, a probabilistic error model and information about the worst-case input vector
can help improve testing techniques like scan chains, burn-in tests and
hierarchical testing. Scan chains are widely used in Design for Test (DFT) methodologies
for IC testing. The basic idea is to form a chain of scannable flip-flops into which
the desired test pattern can be serially inserted. The test pattern is
applied to the logic circuits driven by the flip-flop chain, after which the logic circuit outputs
can be captured into the same or a different flip-flop chain for serial shift-out. In such
a setup, including the worst-case input vector in the test patterns can speed up the testing
process, since the most hazardous behavior of the circuit-under-test can be detected with the
worst-case input vector. Burn-in tests are performed to find devices with inherent
or manufacturing defects [44]. These devices become faulty when subjected to high stress.
During a burn-in test, the IC is subjected to long test times and stress conditions such as
extreme Vdd and temperatures. To aid the burn-in test, a probabilistic error model that can
target and exercise individual device fault modes would help to expedite the failure mechanisms
and to screen for inherent faults in a shorter test time. More specifically, the worst-case input
vectors generated by our method are well suited for application during the burn-in
test. Finally, in hierarchical testing, the circuit-under-test is divided into several
internal modules that can be tested individually. Such a hierarchical division
reduces the size of the circuit-under-test, facilitating rigorous probabilistic error analysis and
the application of worst-case input vectors to the targeted internal modules.
The reliability model for error estimation in sequential circuits presented in this dissertation
can be used to perform efficient diagnostic studies in essential real-time applications like
computer memories. In these sequential circuits, for a fixed input vector, the intermediate
signals can get stuck at a wrong value due to the presence of an error. This error could
propagate across several time instances, and this behavior can occur for any input vector.
Deterministic approaches that analyze a fixed input vector therefore provide an inaccurate
estimate of the error behavior in sequential circuits. Our probabilistic reliability model
addresses this discrepancy by treating both the input and signal spaces probabilistically,
thereby ensuring efficient diagnostic studies for reliability.
While the three error mitigation schemes presented in this dissertation (temporal, spatial and
hybrid) can be used for error optimization in any nano-domain VLSI circuit, the trade-off
studies between them can provide essential application-specific information for circuit
designers. If the application demands smaller area, then temporal redundancy should be given
more importance than spatial redundancy. If errors are more likely to occur in the signal
space than in the input space, then spatial redundancy should be given more importance
than temporal redundancy. The trade-off studies presented in
this dissertation provide information related to these scenarios, which is crucial for
circuit design.
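The basic effect that triple redundancy relies on, whether the three copies are separated in time
or in space, can be sketched in a few lines of Python. The sketch assumes three independent
evaluations that are each wrong with probability ε and an ideal, error-free majority voter.
These are simplifying assumptions for illustration only; the selective TTR, CTMR and hybrid
schemes evaluated in Chapter 6 are not reproduced here, and the function names are hypothetical.

```python
def majority_error(eps):
    """P(majority of three independent copies is wrong), each wrong with probability eps.

    The vote fails when exactly two copies are wrong (3 ways) or all three are wrong.
    """
    return 3 * eps**2 * (1 - eps) + eps**3


def mitigation_percent(eps):
    """Percentage reduction in error relative to a single unprotected copy."""
    return 100.0 * (eps - majority_error(eps)) / eps


for eps in (0.001, 0.01, 0.1):
    print(eps, majority_error(eps), round(mitigation_percent(eps), 2))
```

Under these assumptions a copy error of 0.1 drops to 0.028 after voting, a reduction of 72%. A
real voter is itself built from error-prone gates, which lowers the achievable benefit.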

1.5 Organization
This dissertation is organized as follows:
• Chapter 2 reviews related research in the field of probabilistic reliability
analysis for VLSI circuits.
• Chapter 3 presents the fundamental design concepts of the probabilistic error model.
• Chapter 4 explains in detail the modeling of maximum errors in logic circuits
using a probabilistic error model.
• Chapter 5 explains in detail the modeling of errors in sequential circuits using a
dynamic time-evolving probabilistic error model.
• Chapter 6 explains in detail the temporal, spatial and hybrid redundancy schemes
used for error mitigation in digital logic circuits.
• Chapter 7 provides the conclusion and future directions of this work.

CHAPTER 2
RELATED WORK

In nanometer level circuits, due to device scaling, the most prevalent and detrimental errors
are soft errors that are caused mainly by external particle interactions and variations in device
and process parameters. While the former results in localized failures like Single Event Upsets
(SEUs), the latter leads to more global dynamic errors.

2.1 SEU Modeling
The modeling of device failures due to SEUs is done at different levels of design abstraction, such as the device level, circuit level and gate level [62, 69, 70, 71, 73, 74, 85]. Initial work on
the interaction of external radiation with semiconductors was done as early as 1967 [62], in which the
authors proposed one-dimensional drift-diffusion models to study radiation effects on semiconductor devices used widely in space applications. This work was followed by a number
of significant device-level models for memory elements, using numerical simulation [63, 64].
In order to handle more complex situations, which are intractable by numerical simulation
models, analytic and empirical models were proposed [67]. The study of external particle
interaction with semiconductor devices, which is a multi-dimensional phenomenon,
was enhanced through the advent of two-dimensional and three-dimensional models [65, 66],
which accurately measured the charged-particle drift and diffusion mechanisms. At the circuit
level, SEU modeling is done by addressing circuit parameters like supply voltage, threshold
voltage and clock period, and circuit characteristics like electrical masking, logical masking
and latching window effects. Simulation-based models like SEMM [69] and SERA [56]
encapsulate these circuit aspects to provide soft error rate analysis in digital logic circuits.
While optimization techniques using dual-Vdd and gate sizing are used to model SEU [70], its
effects on interconnects are also modeled at the placement level [71], using simulated annealing. At the gate level, SEU modeling is based primarily on the detection of the probability of
error occurrence at the gate outputs. Logical abstraction tools like binary decision diagrams
are used to perform soft error rate analysis in both combinational [72] and sequential [51] circuits, while a completely probabilistic model based on Bayesian networks was used in [85] to
detect SEUs in digital logic circuits. While practical experiments like injecting SEUs in chips
using laser pulses to verify fault tolerance [74, 75] were performed, popular testing techniques
like built-in self-test mechanism [76] were also used to study soft errors.

2.2 Dynamic Error Modeling
Dynamic errors are transient soft errors caused by the uncontrollable and unpredictable
fluctuations in device and process parameters due to scaling. These global errors can coexist with local SEUs and static hard faults, and they can happen randomly at any node in
the circuit, making them untraceable. The basic concept of dynamic error modeling is the
assumption that every circuit component has a finite propensity to be erroneous. Based
on this idea, researchers have approached the dynamic error modeling problem in three broad categories: calculation of error bounds, calculation of average error, and error reduction through
redundancy.

2.2.1 Calculation of Error Bounds
The study of reliable computation using unreliable components was initiated by Von Neumann [1], who showed that erroneous components with some small error probability can provide
reliable outputs, and that this is possible only when the error probability of each component
is less than 1/6. In this heuristic study, Von Neumann represented the logic gates as automatons
governed by logic functions. It was stated that the probability of error in the automaton and its output cannot exceed 1/2, since the system becomes irrelevant at that
bound. Keeping this as the basic upper bound for the probability of error in the output, the
error probability of the automaton was studied through a majority organ, in which three copies
of the same automaton were created and the majority of the three outputs was considered true.
This arrangement was proven to reduce the error probability of the base system, and through
this it was shown that the error probability of the automaton cannot be ≥ 1/6, since at this
upper bound the system becomes unsustainable.
This work was later enhanced by Pippenger [3], who realized Von Neumann’s model using
formulas for Boolean functions. Here the digital logic components are realized using functions
whose number of arguments corresponds to the number of inputs in the component. Through this
arrangement, it was shown that for a function controlled by k arguments, the error probability
of each component should be less than (k − 1)/(2k) to achieve reliable computation. Through
this, an interesting result was shown for 3-input components, whose error probability bound
for reliable computation was 1/3, which is greater than the Von Neumann bound of 1/6
and which drew further attention. This work was later extended by using networks instead of formulas
to realize the reliability model [4]. In [5], Hajek and Weller used the concept of formulas to
show that for 3-input gates the error probability should be less than 1/6, thereby reiterating
Von Neumann’s bound. Later this work was extended for k-input gates [6] where k was chosen
to be odd. The authors claimed that since k + 1 input gates can simulate k input gates, their
model can be easily used to compute bounds for gates with even number of inputs. For a
specific even case, Evans and Pippenger [7] showed that the maximum tolerable noise level
for 2-input NAND gate should be less than (3 − √7)/4 = 0.08856 · · ·.


Later this result was reiterated by Gao et al. [8] for 2-input NAND gate, along with other
results for k-input NAND gate and majority gate, using bifurcation analysis that involves
repeated iterations on a function relating to the specific computational component. The probability of the output line of a NAND gate, given by Z, was associated with the probabilities of
the input lines X and Y using the equation,
\[ Z = (1 - \varepsilon)(1 - XY) + \varepsilon XY = (1 - \varepsilon) + (2\varepsilon - 1)XY \tag{2.1} \]

where ε is the probability of error in the NAND gate. In order to study the error behavior, a
network of NAND gates, where the output of each gate is connected to the input of at least
one other gate, was created and the inputs X and Y are considered to be equally probable to
be at logic ’1’. The corresponding equation for this network was written as,
\[ X_{i+1} = (1 - \varepsilon) + (2\varepsilon - 1)X_i^2 \tag{2.2} \]

The initial value X0 was arbitrarily chosen and an iterative process was performed to obtain
subsequent Xi values. After the iteration had converged, values from the last few iterations
were plotted against the corresponding ε values to obtain the bi-modal graph for bifurcation
analysis. This bi-modal graph showed that reliable computing using erroneous 2-input
NAND gates is not possible once the gate error probability reaches ε = 0.08856 · · ·.
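
The iteration of Eq. 2.2 can be sketched in a few lines of Python. This is only an illustration of the recurrence, not the procedure of [8]: the starting value, the number of discarded iterations and the number of retained iterates are assumed values chosen for the example. Below the quoted threshold the retained iterates alternate between two separated values, while above it they collapse onto a single value.

```python
import math

# Threshold quoted in the text for the 2-input NAND gate.
THRESHOLD = (3.0 - math.sqrt(7.0)) / 4.0


def nand_iterates(eps, x0=0.3, burn_in=2000, keep=8):
    """Iterate X_{i+1} = (1 - eps) + (2*eps - 1) * X_i**2 (Eq. 2.2).

    Returns the last `keep` iterates after discarding `burn_in` steps.
    x0, burn_in and keep are illustrative assumptions, not values from [8].
    """
    x = x0
    for _ in range(burn_in):
        x = (1.0 - eps) + (2.0 * eps - 1.0) * x * x
    tail = []
    for _ in range(keep):
        x = (1.0 - eps) + (2.0 * eps - 1.0) * x * x
        tail.append(x)
    return tail


def spread(values):
    """Gap between the largest and smallest retained iterate."""
    return max(values) - min(values)


if __name__ == "__main__":
    for eps in (0.02, 0.05, 0.08, 0.10, 0.15):
        print(f"eps={eps:.2f}  spread={spread(nand_iterates(eps)):.6f}")
```

A clearly positive spread corresponds to the two branches of the bi-modal graph; a spread near zero corresponds to the single branch beyond the threshold.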
While there exist studies of circuit-specific bounds for circuit characteristics like switching
activity [9], the study of circuit-specific error bounds would be highly informative and useful
for designing high-end computing machines.


2.2.2 Calculation of Average Error
Many researchers are currently focusing on computing the average error from a circuit
and also on the expected error to conduct reliability-redundancy trade-off studies. In [45], a
Probabilistic Transfer Matrix (PTM) based model for reliability studies was proposed. In this
method each circuit signal is represented using random variables and the functionality of each
erroneous gate is represented in a matrix form using the PTMs (Fig. 2.1.(a)). Each gate in
the underlying digital circuit was represented by an individual PTM. To calculate the error
probability of the circuit, a PTM for the entire circuit is formed by multiplying the individual
gate PTMs. If gates g1 and g2 are connected in series, under the condition that when g1 gets
an input $g_1^I$ it results in g2 giving an output $g_2^O$, the combined PTM can be written as
\[ p(g_2^O \mid g_1^I) = \sum_{\text{all } j} p(g_2^O \mid j)\, p(j \mid g_1^I) \tag{2.3} \]

If gates g1 and g2 are connected in parallel, under the condition that when g1 gets an input $g_1^I$
it results in output $g_1^O$ and when g2 gets an input $g_2^I$ it results in output $g_2^O$, the combined PTM
can be written as
\[ p(g_2^O \mid g_1^I) = p(g_2^O \mid g_2^I)\, p(g_1^O \mid g_1^I) \tag{2.4} \]

This is an exact method but it is computationally expensive.
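
As a rough illustration of Eqs. 2.3 and 2.4, the sketch below composes small PTMs in plain Python. The row and column ordering, the choice of gates and the function names are assumptions made for this example and are not taken from [45]. Series composition sums over the intermediate signal, which is an ordinary matrix product; parallel composition multiplies the entries of the two gates, giving one row per joint input and one column per joint output.

```python
def nand_ptm(eps):
    """PTM of a NAND gate: rows are inputs 00, 01, 10, 11; columns are output 0, 1."""
    return [[eps, 1 - eps], [eps, 1 - eps], [eps, 1 - eps], [1 - eps, eps]]


def inverter_ptm(eps):
    """PTM of an inverter: rows are input 0, 1; columns are output 0, 1."""
    return [[eps, 1 - eps], [1 - eps, eps]]


def series(a, b):
    """Series composition (Eq. 2.3): sum over the intermediate signal j."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]


def parallel(a, b):
    """Parallel composition (Eq. 2.4): product of the two gates' entries."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


if __name__ == "__main__":
    # A NAND followed by an inverter behaves as a noisy AND gate.
    for row in series(nand_ptm(0.1), inverter_ptm(0.1)):
        print(row)
```

The size of the parallel result grows multiplicatively with the number of gates placed side by side, which is one way to see why the exact method becomes expensive.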
An approximate method based on Probabilistic Gate Model (PGM) is discussed by Han
et al. in [15]. Here the PGMs are formed using the sum of product equations governing the
functionality between an input and an output. For any gate, with an output Zi and with error
probability ε, its PGM can be written as,
\[ Z_i = E_i(1 - \varepsilon) + (1 - E_i)\varepsilon \tag{2.5} \]

Figure 2.1. (a) Probabilistic transfer matrix for erroneous NAND gate with error probability
ε [45] (b) Markov random field [47]

where Ei is the sum of product equation. For a 2-input AND gate with inputs I1 and I2,
Ei = I1 I2. So the corresponding Zi can be written as,
\[ Z_i = (I_1 I_2)(1 - \varepsilon) + (1 - I_1 I_2)\varepsilon \tag{2.6} \]

All the gates in the circuit were represented with individual PGMs and the overall reliability of
the circuit was calculated by multiplying the individual gate reliabilities, which were assumed
to be independent. This approximate model was proved to be faster than the exact PTM model.
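
A minimal sketch of the PGM of Eqs. 2.5 and 2.6 follows. The function names are hypothetical, the inputs are treated as independent signal probabilities as the text describes, and the NAND form simply uses Ei = 1 − I1 I2, which matches Eq. 2.1.

```python
def pgm_output(e_i, eps):
    """Eq. 2.5: probability that the gate output is 1, given the
    error-free output probability e_i and gate error probability eps."""
    return e_i * (1 - eps) + (1 - e_i) * eps


def and_gate(p1, p2, eps):
    """Eq. 2.6 with E_i = I1 * I2 (inputs assumed independent)."""
    return pgm_output(p1 * p2, eps)


def nand_gate(p1, p2, eps):
    """Same model with E_i = 1 - I1 * I2, as in Eq. 2.1."""
    return pgm_output(1 - p1 * p2, eps)


if __name__ == "__main__":
    # Propagate signal probabilities gate by gate through a two-level example.
    first = nand_gate(0.5, 0.5, 0.05)
    print(and_gate(first, 0.5, 0.05))
```

Each gate is evaluated once from the probabilities of its inputs, which is the source of the speed advantage over building a PTM for the whole circuit.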
A Markov Random Field (MRF) based probabilistic model for reliability studies was proposed in [47], which concentrated more on hard errors than soft errors. Here, the circuit
signals were represented as random variables in a Markov random network, where every node
depends only on the directly connected nodes, which are called its neighbors (Fig. 2.1.(b)).
Given a set of random variables Γ = {X1, · · · , Xn} forming a Markov network, the probability
of any random variable, Xi, in the Markov network was described using the Gibbs distribution as
follows,
\[ P(X_i \mid \{\Gamma - X_i\}) = \frac{1}{Z} e^{-\frac{1}{kT} \sum_{c \in \phi} U_c(X)} \tag{2.7} \]

where Z is a normalizing constant that bounds the probability value to [0,1], kT is the thermal
energy, c is a clique in the set of cliques φ associated with Xi and Uc is the clique energy. A
typical clique in the circuit representation of a Markov network comprises the nodes
representing the inputs and output of a gate. In this sense, every gate has its own clique
and clique energy. The logic gates were represented using their sum of products terms and the
clique energy for each gate was derived. For an inverter with input x0 and output x1, the clique
energy was derived as follows,
\[ \begin{aligned} U &= -((1 - x_0)x_1 + x_0(1 - x_1)) \\ &= -(x_1 - x_0 x_1 + x_0 - x_0 x_1) \\ &= 2x_0 x_1 - x_0 - x_1 \end{aligned} \tag{2.8} \]

The negative sign in the clique energy signified the design condition that clique energies of
valid states should be lower than those of invalid states. The corresponding Gibbs distribution
was given as,
\[ P(x_0, x_1) = \frac{1}{Z} e^{-\frac{1}{kT}(2x_0 x_1 - x_0 - x_1)} \tag{2.9} \]

The probability of output x1 = 1 was calculated by marginalizing P(x0 , x1 ) over all possible
values of x0 .
\[ P(x_1) = \frac{1}{Z} \sum_{x_0 \in \{0,1\}} e^{-\frac{1}{kT}(2x_0 x_1 - x_0 - x_1)} = \frac{e^{\frac{x_1}{kT}} + e^{\frac{(1 - x_1)}{kT}}}{2(1 + e^{\frac{1}{kT}})} \tag{2.10} \]
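
The inverter example of Eqs. 2.8 to 2.10 is small enough to check numerically. The sketch below is an illustration only; the function names and the kT values used are assumptions, with kT expressed in the same arbitrary units as the clique energy. It builds the joint Gibbs distribution, marginalizes over x0, and compares the result with the closed form of Eq. 2.10.

```python
import math


def clique_energy(x0, x1):
    """Inverter clique energy from Eq. 2.8."""
    return 2 * x0 * x1 - x0 - x1


def joint(kT):
    """Gibbs distribution P(x0, x1) of Eq. 2.9 as a dict keyed by (x0, x1)."""
    weights = {(x0, x1): math.exp(-clique_energy(x0, x1) / kT)
               for x0 in (0, 1) for x1 in (0, 1)}
    z = sum(weights.values())  # normalizing constant Z
    return {state: w / z for state, w in weights.items()}


def output_marginal(kT, x1):
    """P(x1) obtained by summing the joint over x0 (middle of Eq. 2.10)."""
    p = joint(kT)
    return p[(0, x1)] + p[(1, x1)]


def closed_form(kT, x1):
    """Right-hand side of Eq. 2.10."""
    return ((math.exp(x1 / kT) + math.exp((1 - x1) / kT))
            / (2 * (1 + math.exp(1 / kT))))


if __name__ == "__main__":
    for kT in (0.1, 0.5, 1.0):
        print(kT, joint(kT)[(0, 1)], output_marginal(kT, 1), closed_form(kT, 1))
```

Lowering kT concentrates the joint probability on the valid states (0, 1) and (1, 0), which is the design condition stated for the sign of the clique energy.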

Likewise, the probability distribution of every signal in the circuit was represented using the Gibbs
distribution. The corresponding probability distributions for the primary outputs of the circuit were
determined by propagating the marginalized distributions across various cliques using the belief
propagation algorithm. Since these distributions were associated with the thermal energy kT,
comprehensive reliability studies on nanoarchitectures working under critical thermal limits
were performed by altering the kT values and examining the signal probability distributions.
Although this work provided much-needed insight into the thermal behavior of nano-domain
circuits, it was performed on error-free devices instead of erroneous ones.
Another work on reliability studies using probabilistic model checking was proposed
in [58]. This method employed discrete-time Markov Chains for probabilistic model checking. In another significant work [99], the average output error in digital circuits was calculated
using a probabilistic reliability model that employed Bayesian Networks.

2.2.3 Error Reduction Through Redundancy
The term ’redundancy’ refers to the use of multiple copies of the same erroneous component in order to test or improve its reliability. Von Neumann, in his seminal
work, was one of the first to propose one such methodology, called multiplexing, and he used
it to study the reliability of NAND logic [1]. This model was created by taking multiple
copies of the same erroneous NAND gate and supplying them with input signals drawn randomly from
various bundles of input lines. This setup ensures effective duplication of all possible signals
at the outputs. To obtain better error tolerance, two more NAND multiplexing setups were cascaded with the previous one. While the first NAND multiplexing setup, called the “Executive
Unit”, performed the logic computation, the following two units, together called the “Restorative Unit”,
restored the correct computation values (Fig. 2.2.).
Figure 2.2. NAND multiplexing scheme introduced by Von Neumann [1] (blocks shown: Randomizing Units, Executive Unit, Restorative Unit)

Von Neumann also introduced the widely used redundancy technique called Triple Modular Redundancy (TMR) [1]. In TMR, three copies of the same erroneous logic component
were created and the correct value was determined by majority voting over the
three outputs. Given three different signals X, Y and Z, the majority voting could be performed
using the function XY + YZ + XZ. Using this, Von Neumann showed a significant reduction in
the probability of error occurrence in logic devices. As an extension of TMR, a more general
model called N-Modular Redundancy (NMR) [2] was proposed, where N is chosen to be odd
to facilitate majority voting. While TMR chooses the majority of 2 out of 3 inputs,
NMR chooses the majority of n + 1 out of 2n + 1 inputs. Also, given an erroneous
system with error probability ε, the reliability R through performing TMR was given by,
\[ R(TMR) = \varepsilon^3 + 3\varepsilon^2(1 - \varepsilon) \tag{2.11} \]
and the corresponding reliability through performing NMR was given by,
\[ R(NMR) = \sum_{i=0}^{n} \frac{N!}{(N - i)!\, i!} (1 - \varepsilon)^i \varepsilon^{N-i} \tag{2.12} \]

where N = 2n + 1. These base models for hardware redundancy were later applied in essential
applications like fault-tolerant microprocessor design [10], and also paved the way for a variety
of techniques for software, data and time redundancy. Apart from being used at the circuit
level, they were also used at other levels of design abstraction, as in [73], where Selective
TMR (STMR) was used in FPGAs to minimize error behavior due to SEUs.
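
The majority function and the expressions of Eqs. 2.11 and 2.12 can be evaluated with a short Python sketch. The function names are hypothetical, and the sketch assumes independent copies and an error-free voter. Reading ε as the per-copy error probability, the right-hand sides as written sum the cases in which at most n of the N copies are correct; the helper `majority_correct` gives the complementary probability that at least n + 1 copies are correct, so the two can be compared side by side.

```python
from math import comb


def majority(x, y, z):
    """Majority vote XY + YZ + XZ on three bits."""
    return (x & y) | (y & z) | (x & z)


def eq_2_11(eps):
    """Right-hand side of Eq. 2.11, evaluated as written."""
    return eps**3 + 3 * eps**2 * (1 - eps)


def eq_2_12(eps, n):
    """Right-hand side of Eq. 2.12, evaluated as written, with N = 2n + 1."""
    big_n = 2 * n + 1
    return sum(comb(big_n, i) * (1 - eps)**i * eps**(big_n - i)
               for i in range(n + 1))


def majority_correct(eps, n):
    """Probability that at least n + 1 of the N = 2n + 1 copies are correct."""
    big_n = 2 * n + 1
    return sum(comb(big_n, i) * (1 - eps)**i * eps**(big_n - i)
               for i in range(n + 1, big_n + 1))


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(2 * n + 1, eq_2_12(0.1, n), majority_correct(0.1, n))
```

With n = 1 the NMR expression reduces to the TMR expression, which is a quick consistency check between the two equations.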
From the initial works of Von Neumann, the study of fault-tolerant computation expanded
into fields like nano-computing architectures. An extension of the TMR technique
called Cascaded Triple Modular Redundancy (CTMR) [11] was used for reliability studies of
nanochips using single-electron devices and quantum cellular automata gates. While TMR
is referred to as a single-level redundancy technique, CTMR is referred to as a multilevel redundancy technique, where the outputs from three different TMR units are supplied to another
majority gate to perform multiple levels of voting in order to obtain better error reduction. A
generalized CTMR technique, called Cascaded General Modular Redundancy (CGMR), was
also proposed in this work [11].
In [12], the reliability of reconfigurable architectures was obtained using NAND multiplexing technique. The processors in the architecture were implemented with NAND multiplexing system with a redundancy factor of 3. In the design, redundant spare circuitries were
also developed to enhance error correction and minimize error detection. In [13], majority
multiplexing was used to achieve fault-tolerant designs for nanoarchitectures. They further
enhanced the majority multiplexing model for small input error probabilities, by removing the
restorative stage, since effective restoration is possible without that stage. A recent comparative study of some of these methods [14] indicates that a 1000-fold redundancy would be
required for a device error (or failure) rate of 0.01.¹
¹ Note that this does not mean that 1 out of 100 devices will fail; it indicates that the devices will generate erroneous
output 1 out of 100 times.

Figure 2.3. Some of the related works on reliability models for dynamic errors in VLSI circuits
Reliability models for dynamic errors in combinational circuits:
  Error bounds: Von Neumann [1], Pippenger [3], Feder [4], Hajek et al. [5], Evans et al. [6, 7], Gao et al. [8], Marculescu et al. [9]
  Average error: Krishnaswamy et al. [45], Han et al. [15], Bahar et al. [47], Norman et al. [58], Rejimon et al. [99]
  Maximum error: This work
  Redundancy: Von Neumann [1], Mathur et al. [2], Depledge [10], Spagocci et al. [11], Han et al. [12], Roy et al. [13], This work
Reliability models for dynamic errors in sequential circuits: This work

2.3 Relation to State-of-the-Art
This work concentrates on the following:
• Modeling dynamic errors, which are global, as opposed to localized SEUs. This is
done using a probabilistic error model, where efficient error incorporation in multiple
nodes is possible. Also in this model, the error injection and probability of error for
each gate can be modified easily. Moreover, both fixed and variable gate errors can be
accommodated in a single circuit without affecting computational complexity.
• Estimation of maximum error as opposed to average error, since for higher design
levels it is important to account for maximum error behavior, especially if this behavior
is far worse than the average case behavior. This estimation is performed as a diagnostic
study in our error model, using the Maximum a posteriori (MAP) hypothesis, where the
output nodes are forced to be erroneous and the information is propagated towards the
input nodes to estimate the input configuration that can produce a maximum
error in the output.
• Estimation of output error in sequential circuits as opposed to combinational circuits,
since the transient errors that occur in a particular time frame of a sequential circuit
will propagate to consecutive time frames, thereby making the device more vulnerable. This estimation is performed using a minimal time-evolving probabilistic network,
namely the Temporal Dependency Model (TDM), that can handle both spatial dependencies between nodes in a single time slice and temporal dependencies between nodes
in different time slices.
• Designing temporal, spatial and hybrid redundancy schemes, using our probabilistic
error model, to achieve error mitigation. We perform temporal redundancy using the Triple
Temporal Redundancy (TTR) technique and spatial redundancy using the CTMR technique.
Also, efficient error reduction is achieved through selective redundancy, by applying
redundancy only to the most influential input combinations and the most sensitive nodes.


CHAPTER 3
DESIGN FUNDAMENTALS

3.1 Probabilistic Representation of Digital Circuits
A digital circuit is basically a network of digital signals connected together through gates
whose functionalities are based on Boolean logic. This network can be represented accurately
using a graphical model, where the nodes represent the digital signals and the edges represent
the Boolean logic functionality of the gates. These edges should be unidirectional, since
information flow in digital circuits is unidirectional from input to output. In order to assist efficient diagnostic studies on digital circuits, their graphical representation can be modeled as
a probabilistic graphical model where each node is a random variable with two possible states,
’logic 0’ and ’logic 1’ or simply ’0’ and ’1’. To represent the digital functionalities, each random variable should be associated with a probability distribution function (pdf). Consider the
example in Fig 3.1., where a digital circuit and its probabilistic graphical model are given. As
discussed, each node from N1 to N8 is a random variable whose value will be either ’0’ or ’1’.
In a network representing any digital circuit, the nodes corresponding to the primary inputs
(i.e., N1, N2, N3 in our example) will always be completely independent and every other child
node will be dependent on at least one parent node. This kind of interdependency between
nodes gives rise to conditional probability distributions, and so the pdfs are represented as
Conditional Probabilistic Tables (CPTs). Fig 3.1. provides the CPTs for all the nodes. Since
N1, N2, N3 are primary inputs, their pdfs can be controlled by the user. The child node N4
is dependent on its parent nodes N1 and N2 through AND logic, and the corresponding CPT
should reflect this functionality. This can be achieved by providing the pdf as follows,
\[ P(N4 = 0 \mid N1, N2) = \begin{cases} 0 & \text{if } N1 = 1 \text{ and } N2 = 1 \\ 1 & \text{otherwise} \end{cases} \qquad P(N4 = 1 \mid N1, N2) = \begin{cases} 1 & \text{if } N1 = 1 \text{ and } N2 = 1 \\ 0 & \text{otherwise} \end{cases} \tag{3.1} \]
Similarly the CPTs for N5 should obey NOT logic, N6 should obey OR logic, N7 should obey
NOR logic, and N8 should obey NAND logic.

Figure 3.1. Representation of a digital circuit as a probabilistic graph (panels: digital circuit, with digital signals governed by Boolean logic; corresponding probabilistic graph model; corresponding Conditional Probabilistic Tables (CPTs) for the graph)
CPT entries shown in the figure:
Primary inputs: P(N1=0) = P(N1=1) = 0.5, and likewise for N2 and N3.
Node   Parents   P(Node=1 | parents = 00, 01, 10, 11)
N4     N1, N2    0, 0, 0, 1
N5     N2        1, 0 (for N2 = 0, 1)
N6     N2, N3    0, 1, 1, 1
N7     N4, N5    1, 0, 0, 0
N8     N5, N6    1, 1, 1, 0
In each row, P(Node=0 | parents) = 1 − P(Node=1 | parents).
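
The CPTs of the error-free gates follow mechanically from their truth tables. The sketch below shows one possible way to generate them in Python; the dictionary layout and the names `GATES`, `build_cpt` and `CPTS` are assumptions made for this illustration.

```python
from itertools import product

# Boolean functions of the gates that appear in the example circuit.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: 1 - (a | b),
    "NAND": lambda a, b: 1 - (a & b),
    "NOT": lambda a: 1 - a,
}


def build_cpt(gate, n_parents):
    """Return {parent_values: (P(node=0), P(node=1))} for an error-free gate."""
    cpt = {}
    for parents in product((0, 1), repeat=n_parents):
        out = GATES[gate](*parents)
        cpt[parents] = (1.0 - out, float(out))
    return cpt


# CPTs for the child nodes of the example circuit, keyed by node name.
CPTS = {
    "N4": build_cpt("AND", 2),   # parents (N1, N2)
    "N5": build_cpt("NOT", 1),   # parent  (N2,)
    "N6": build_cpt("OR", 2),    # parents (N2, N3)
    "N7": build_cpt("NOR", 2),   # parents (N4, N5)
    "N8": build_cpt("NAND", 2),  # parents (N5, N6)
}

if __name__ == "__main__":
    for node, cpt in CPTS.items():
        print(node, cpt)
```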
Once the graph model for a digital circuit is ready, the next obvious question is whether the
model captures all the interdependencies between the nodes. For example, in the probabilistic
graph given in Fig 3.1., the node N7 is directly dependent on nodes N4, N5 and indirectly dependent on nodes N1, N2. Also, node N8 is directly dependent on nodes N5, N6 and indirectly
dependent on nodes N2, N3. If we add edges representing these indirect dependencies, then
the resulting probabilistic graph will be as seen in Fig 3.2.(a). But are these edges necessary?
In the given digital circuit, it can be seen that the relation of the signal N7 to the signals
N1, N2 is taken care of by the signals N4, N5, i.e., any change in signals N1, N2 will be captured
by their direct output signals N4, N5 and the same changes will be passed on to signal N7
through N4 and N5. So, in the corresponding probabilistic graph model, we can
say that node N7 is independent of nodes N1, N2 given nodes N4, N5. In a similar fashion,
we can also say that node N8 is independent of nodes N2, N3 given nodes N5, N6. In other
words, all the indirect dependencies are taken care of by the direct dependencies.
As a result all the extra edges representing indirect dependencies can be removed from the
probabilistic graph model given in Fig 3.2.(a) resulting in Fig 3.2.(b), which is similar to the
initial model given in Fig 3.1. This representation is minimal, in the sense that
removing even one edge will break the interdependencies between the nodes and eventually
result in an incomplete representation of the given digital circuit.

Figure 3.2. Minimal representation of the probabilistic graph: (a) not a minimal representation, with both direct and indirect dependencies; (b) minimal representation, with direct dependencies only

The probabilistic graph model can be represented mathematically as the conditional factoring of a joint probability distribution. Any probability function P(y1, y2, · · · , yN) can be
written as,
\[ P(y_1, \cdots, y_N) = P(y_N \mid y_{N-1}, y_{N-2}, \cdots, y_1)\, P(y_{N-1} \mid y_{N-2}, y_{N-3}, \cdots, y_1) \cdots P(y_1) \tag{3.2} \]

where y1, y2, · · · , yN are random variables. This expression holds for any ordering of these
random variables. For the example probabilistic graph model in Fig 3.1., this probability
function can be written as,
\[ \begin{aligned} P(n1, \cdots, n8) ={}& P(n8 \mid n7, n6, n5, n4, n3, n2, n1)\, P(n7 \mid n6, n5, n4, n3, n2, n1) \\ & P(n6 \mid n5, n4, n3, n2, n1)\, P(n5 \mid n4, n3, n2, n1) \\ & P(n4 \mid n3, n2, n1)\, P(n3)\, P(n2)\, P(n1) \end{aligned} \tag{3.3} \]

where n1, · · ·, n8 are the random variables represented by the nodes N1, · · · , N8 respectively.
But this equation does not perfectly represent the structure of the corresponding probabilistic
graph model. As discussed earlier, in the minimal representation of the probabilistic graph
model, every child node is connected only to its parent nodes. So Eqn. 3.2 can be restructured
as follows,
\[ P(y_1, \cdots, y_N) = \prod_{v} P(y_v \mid Pa(Y_v)) \tag{3.4} \]

where Pa(Yv ) are the parents of the node Yv , representing its direct causes. For the example
probabilistic graph model in Fig 3.1., this restructured joint probability function can be written
as,
\[ P(n1, \cdots, n8) = P(n8 \mid n6, n5)\, P(n7 \mid n5, n4)\, P(n6 \mid n3, n2)\, P(n5 \mid n2)\, P(n4 \mid n2, n1)\, P(n3)\, P(n2)\, P(n1) \tag{3.5} \]
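
The factored joint of Eq. 3.5 can be evaluated directly for the example circuit. The sketch below is an illustration under stated assumptions: the gates are error-free, the primary inputs are equiprobable as in Fig 3.1., and the marginals are obtained by exhaustive summation over all 2^8 assignments rather than by any efficient inference scheme. The function names are hypothetical.

```python
from itertools import product


def gate_prob(value, expected):
    """Deterministic CPT entry: 1 if the node value matches the gate output."""
    return 1.0 if value == expected else 0.0


def joint(n1, n2, n3, n4, n5, n6, n7, n8, p_in=0.5):
    """Eq. 3.5 for the circuit of Fig 3.1 with error-free gates."""
    prior = 1.0
    for bit in (n1, n2, n3):
        prior *= p_in if bit == 1 else 1.0 - p_in  # P(n1) P(n2) P(n3)
    return (prior
            * gate_prob(n4, n1 & n2)           # P(n4 | n2, n1): AND
            * gate_prob(n5, 1 - n2)            # P(n5 | n2): NOT
            * gate_prob(n6, n2 | n3)           # P(n6 | n3, n2): OR
            * gate_prob(n7, 1 - (n4 | n5))     # P(n7 | n5, n4): NOR
            * gate_prob(n8, 1 - (n5 & n6)))    # P(n8 | n6, n5): NAND


def marginal(index, value):
    """P(node = value) by summing the joint; index 0 is N1, index 7 is N8."""
    return sum(joint(*s) for s in product((0, 1), repeat=8) if s[index] == value)


if __name__ == "__main__":
    print("P(N7=1) =", marginal(6, 1))
    print("P(N8=1) =", marginal(7, 1))
```

Each factor involves only a node and its parents, so the model stores a handful of small tables instead of one table over all eight variables.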

3.2 Modeling Error in Digital Circuits
Any unexpected change in the logic state of the digital signals gives rise to error in digital
circuits. In order to understand and study these errors, we need a model that can detect these
unexpected changes. One way of doing that is to compare the erroneous circuit with its
ideal error-free counterpart. Consider the circuit in Fig 3.3.(a), where each signal other than
the primary input signals can be erroneous because of the faulty gates. Note that we assume that
primary input signals are error-free. In order to create the error detection model, two copies of
the circuit are created, where one copy represents the circuit in its normal erroneous form and
the other copy represents the circuit in its ideal form. When the primary outputs of these two
copies are compared, any error occurrence becomes evident through the presence
of dissimilar logic states. The appropriate logic gate for this operation is the XOR gate,
which produces a ’1’ at its output when its inputs have dissimilar logic states and a ’0’
when its inputs have similar logic states. Fig 3.3.(b) illustrates the error detection
model for digital circuits based on the above mentioned concept. N4e, N5e, · · · , N8e are the
erroneous signals and N4, N5, · · · , N8 are the ideal signals. Signal C1 gives the comparison
between the ideal and erroneous primary outputs N7 and N7e; signal C2 gives the comparison
between the ideal and erroneous primary outputs N8 and N8e. It should be noted that the
ideal error-free portion and the comparator portion are fictitious and used only for studying
the given circuit.
The corresponding probabilistic graph model for error detection can be created as shown
in Fig 3.3.(d). Let us say that each gate in the digital circuit has a probability ε of being faulty; ε
can be termed the gate error probability. This can be accommodated in the corresponding
probabilistic graph model by changing the CPTs as follows,
\[ P(N4^e = 0 \mid N1, N2) = \begin{cases} \varepsilon & \text{if } N1 = 1 \text{ and } N2 = 1 \\ 1 - \varepsilon & \text{otherwise} \end{cases} \qquad P(N4^e = 1 \mid N1, N2) = \begin{cases} 1 - \varepsilon & \text{if } N1 = 1 \text{ and } N2 = 1 \\ \varepsilon & \text{otherwise} \end{cases} \tag{3.6} \]
where N4e is the erroneous output signal of a faulty AND gate as shown in Fig 3.3.(a). Accordingly, the corresponding CPTs for the rest of the erroneous nodes are provided in Fig 3.4.

Figure 3.3. Error model: (a) erroneous digital circuit; (b) circuit model used to detect error in the erroneous digital circuit, consisting of the erroneous circuit, the ideal circuit and the comparators; (c) probabilistic graph model representing the erroneous digital circuit; (d) probabilistic graph model representing the error detection model, with erroneous nodes, ideal nodes and comparator nodes

Figure 3.4. Conditional probabilistic tables for the ideal and erroneous nodes in the error
model
The ideal nodes N4 to N8 keep the deterministic CPTs of Fig 3.1. CPT entries for the erroneous nodes:
Node   Parents    P(Node=1 | parents = 00, 01, 10, 11)
N4e    N1, N2     ε, ε, ε, 1−ε
N5e    N2         1−ε, ε (for N2 = 0, 1)
N6e    N2, N3     ε, 1−ε, 1−ε, 1−ε
N7e    N4e, N5e   1−ε, ε, ε, ε
N8e    N5e, N6e   1−ε, 1−ε, 1−ε, ε
In each row, P(Node=0 | parents) = 1 − P(Node=1 | parents).
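
To make the error detection model concrete, the sketch below computes the comparator probabilities P(C1 = 1) and P(C2 = 1) for the example circuit by exhaustive enumeration. It is an illustration only: the function names are hypothetical, the primary inputs are assumed equiprobable, every gate uses the same ε, and brute-force enumeration stands in for proper probabilistic inference.

```python
from itertools import product


def bern(value, p_one):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def noisy(ideal_out, eps):
    """P(erroneous gate output = 1), given the gate's error-free output (Eq. 3.6)."""
    return 1.0 - eps if ideal_out == 1 else eps


def comparator_probs(eps, p_in=0.5):
    """Return (P(C1 = 1), P(C2 = 1)) for the error model of Fig 3.3.(d)."""
    p_c1 = p_c2 = 0.0
    for n1, n2, n3 in product((0, 1), repeat=3):
        w_in = bern(n1, p_in) * bern(n2, p_in) * bern(n3, p_in)
        # Ideal (error-free) primary outputs.
        n7 = 1 - ((n1 & n2) | (1 - n2))
        n8 = 1 - ((1 - n2) & (n2 | n3))
        # Enumerate the states of the erroneous nodes N4e .. N8e.
        for e4, e5, e6, e7, e8 in product((0, 1), repeat=5):
            w = (w_in
                 * bern(e4, noisy(n1 & n2, eps))
                 * bern(e5, noisy(1 - n2, eps))
                 * bern(e6, noisy(n2 | n3, eps))
                 * bern(e7, noisy(1 - (e4 | e5), eps))
                 * bern(e8, noisy(1 - (e5 & e6), eps)))
            p_c1 += w * (n7 ^ e7)  # XOR comparator C1
            p_c2 += w * (n8 ^ e8)  # XOR comparator C2
    return p_c1, p_c2


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.05, 0.1):
        print(eps, comparator_probs(eps))
```

Note that the erroneous gates N7e and N8e take their inputs from the erroneous nodes, so errors injected at earlier gates propagate to the comparators.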


CHAPTER 4
MAXIMUM ERROR MODELING

In this chapter, we present a probabilistic model to study the maximum output error over
the entire input space for a given logic circuit. We present a method to find the worst-case input vector, i.e., the input vector that has the highest probability of giving an error at the
output. In the first step of our model, we convert the circuit into a corresponding edge-minimal
probabilistic network that represents the basic logic function of the circuit by handling the
interdependencies between the signals using random variables of interest in a composite joint
probability distribution function P(y1, y2, · · ·, yN). Each node in this network corresponds to a
random variable representing a signal in the digital circuit, and each edge corresponds to the
logic governing the connected signals. The individual probability distribution for each node
is given using conditional probability tables.
From this probabilistic network we obtain our probabilistic error model that consists of
three blocks: (i) ideal error-free logic, (ii) error-prone logic where every gate has a gate error
probability ε, i.e., each gate can go wrong individually with probability ε, and (iii) a
detection unit that uses comparators to compare the error-free and erroneous outputs. The error-prone
logic represents the real time circuit under test, whereas the ideal logic and the detection
unit are fictitious elements used to study the circuit. Both the ideal logic and the error-prone logic
are fed by the primary inputs I. We denote all the internal nodes, both in the error-free
and erroneous portions, by X and the comparator outputs by O. The comparators are based on
XOR logic and hence a state “1” signifies an error at the output. An evidence set o is created
by evidencing one or more of the variables in the comparator set O to state “1” (P(Oi = 1) = 1).
Then performing the MAP hypothesis on the probabilistic error model provides the worst-case input vector iMAP, which gives max∀i P(i, o). The maximum output error probability can
be obtained from P(Oi = 1) after instantiating the input nodes of the probabilistic error model with
iMAP and performing inference. The process is repeated for increasing ε values and finally the ε value
that makes at least one of the output signals completely random (P(Oi = 0) = 0.5, P(Oi =
1) = 0.5) is taken as the error bound for the given circuit.
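
For a circuit as small as the example of Fig 3.1., the worst-case input can be found by exhaustive search, which gives a reference point for the MAP computation. The sketch below is not the method used in this work: it swaps in brute-force enumeration, considers only the comparator C1 on output N7, and assumes equiprobable inputs so that maximizing P(i, o) is the same as maximizing P(o | i). All function names are hypothetical.

```python
from itertools import product


def bern(value, p_one):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def noisy(ideal_out, eps):
    """P(erroneous gate output = 1), given the gate's error-free output."""
    return 1.0 - eps if ideal_out == 1 else eps


def output_error_given_input(inputs, eps):
    """P(C1 = 1 | inputs): probability that N7e differs from the ideal N7."""
    n1, n2, _n3 = inputs  # N7 does not depend on N3 in this circuit
    n7 = 1 - ((n1 & n2) | (1 - n2))  # ideal NOR(AND(n1, n2), NOT(n2))
    p_err = 0.0
    for e4, e5, e7 in product((0, 1), repeat=3):
        w = (bern(e4, noisy(n1 & n2, eps))
             * bern(e5, noisy(1 - n2, eps))
             * bern(e7, noisy(1 - (e4 | e5), eps)))
        p_err += w * (n7 ^ e7)
    return p_err


def worst_case_input(eps):
    """Exhaustive search: returns (input vector, maximum error, average error)."""
    scores = {i: output_error_given_input(i, eps)
              for i in product((0, 1), repeat=3)}
    best = max(scores, key=scores.get)
    return best, scores[best], sum(scores.values()) / len(scores)


if __name__ == "__main__":
    print(worst_case_input(0.05))
```

The gap between the maximum and the average returned by `worst_case_input` illustrates, on a toy scale, why the maximum error is studied separately from the average error. The cost of this search doubles with every added primary input, so it is only practical for very small circuits.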
We can arrive at the MAP estimate by enumerating all possible input instantiations and computing the maximum P(i, o) with any probabilistic computing tool. The attractive
feature of this MAP algorithm lies in eliminating a significant part of the input search subtree
based on an easily available upper bound of P(i, o), obtained by probabilistic traversal of a binary
join tree with the Shenoy-Shafer algorithm [23, 24]. The actual computation is divided into two
theoretical components. First, we convert the circuit structure into a binary join tree and employ the Shenoy-Shafer algorithm, which is a two-pass probabilistic message-passing algorithm,
to obtain a multitude of upper bounds of P(i, o) with partial input instantiations. Next, we construct a binary tree of the input vector space where each path from the root node to a leaf
node represents an input vector. At every node, we traverse the search tree if the upper bound,
obtained by Shenoy-Shafer inference on the binary join tree, is greater than the maximum
probability already achieved; otherwise we prune the entire subtree. Experimental results
on a few standard benchmarks show that the worst-case errors deviate significantly from the
average ones and also provide tighter bounds for the circuits that use a homogeneous gate type
(c17 with NAND-only). Salient features and deliverables are itemized below:
• We have proposed a method to calculate maximum output error using a probabilistic
model. Through experimental results, we show the importance of modeling maximum
output error. (Fig. 4.10.)

• Given a circuit with a fixed gate error probability ε, our model can provide the maximum
output error probability and the worst-case input vector, which can be very useful testing
parameters.
• We present the circuit-specific error bounds for fault-tolerant computation and we show
that maximum output errors provide a tighter bound.
• We have used an efficient design framework that employs inference in binary join trees
using Shenoy-Shafer algorithm to perform MAP hypothesis accurately.
• We give a probabilistic error model, where efficient error incorporation is possible, for
useful reliability studies. Using our model the error injection and probability of error
for each gate can be modified easily. Moreover, we can accommodate both fixed and
variable gate errors in a single circuit without affecting computational complexity.
Throughout this chapter, a set of variables is represented by bold capital letters, a set of
instantiations by bold small letters, and any single variable by a capital letter. The
probability of the event Yi = yi is denoted simply by P(yi) or by P(Yi = yi).
4.1 Maximum a Posteriori (MAP) Estimate
Let us define the random variables in our probabilistic error model as Y = I ∪ X ∪ O,
composed of the three disjoint subsets I, X and O where
• I1, ···, Ik ∈ I are the k primary inputs.
• X1, ···, Xm ∈ X are the m internal logic signals of both the erroneous (every gate has a
failure probability ε) and error-free ideal logic elements.
• O1, ···, On ∈ O are the n comparator outputs, each one signifying the error in one of the
primary outputs of the logic block.
• N = k + m + n is the total number of network random variables.

Figure 4.1. (a) Digital logic circuit (b) Error model (c) Probabilistic error model. Block 1:
error-free logic; Block 2: error-prone logic; Block 3: comparator logic.

Any primary output node can be forced to be erroneous by fixing the corresponding comparator
output to logic “1”, that is, by providing the evidence o = {P(Oi = 1) = 1} to a comparator
output Oi. Given some evidence o, the objective of the maximum a posteriori estimate is to
find a complete instantiation iMAP of the variables in I that attains the following joint
probability:
\[ \mathrm{MAP}(i_{MAP}, o) = \max_{\forall i} P(i, o) \qquad (4.1) \]
The probability MAP(iMAP, o) is termed the MAP probability, the variables in I are termed
the MAP variables, and the instantiation iMAP that gives the maximum P(i, o) is termed the
MAP instantiation.
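The following short derivation, which assumes independent and equiprobable primary inputs (an assumption of this note, not a requirement of Eq. 4.1), shows why the MAP instantiation can be read as the input vector most likely to produce the evidenced output error:

```latex
\[
P(i, o) = P(o \mid i)\, P(i), \qquad P(i) = 2^{-k} \ \text{for every } i
\]
\[
\Rightarrow \quad
i_{MAP} = \arg\max_{\forall i} P(i, o) = \arg\max_{\forall i} P(o \mid i)
\]
```

When the input priors are not uniform, the factor P(i) stays inside the maximization and the two maximizers may differ.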
For example, consider Fig. 4.1. In the probabilistic model shown in Fig. 4.1.(c), we have
I = {I1, I2, I3}, X = {X1, X2, X3, X4, X5, X6} and O = {O1}. X3 is the ideal error-free primary
output node and X6 is the corresponding error-prone primary output node. Giving the evidence
o = {P(O1 = 1) = 1} to O1 indicates that X6 has produced an erroneous output. The MAP
hypothesis uses this information and finds the input instantiation iMAP that gives the
maximum P(i, o). This indicates that iMAP is the most probable input instantiation that would
give an error in the error-prone primary output signal X6. In this case, iMAP = {I1 = 0, I2 =
0, I3 = 0}. This means that the input instantiation {I1 = 0, I2 = 0, I3 = 0} will most probably
produce a wrong output, X6 = 1 (since the correct output is X6 = 0).
We arrive at the exact maximum a posteriori (MAP) estimate using the algorithms by Park
and Darwiche [29, 30]. The MAP estimate could obviously be obtained by enumerating all
possible input instantiations and computing the maximum output error. To make the search
more efficient, our MAP estimates rely on eliminating part of the input search tree based
on an easily available upper bound of the MAP probability, obtained by a probabilistic
traversal of a binary join tree using the Shenoy-Shafer algorithm [23, 24]. The actual
computation is divided into two theoretical components.
• First, we convert the circuit structure into a binary join tree and employ the Shenoy-Shafer
algorithm, a two-pass probabilistic message-passing algorithm, to obtain a multitude of
upper bounds of the MAP probability with partial input instantiations (discussed in
Section 4.1.1). Readers familiar with the Shenoy-Shafer algorithm can skip that section.
To our knowledge, the Shenoy-Shafer algorithm is not commonly used in the VLSI context,
so we elaborate most steps of join tree creation, two-pass join tree traversal and
computation of upper bounds with partial input instantiations.
• Next, we construct a binary tree of the input vector space where each path from the root
node to a leaf node represents an input vector. At every node, we traverse the search
tree if the upper bound, obtained by Shenoy-Shafer inference on the binary join tree, is
greater than the maximum probability already achieved; otherwise we prune the entire
sub-tree. The depth-first traversal of the binary input instantiation tree is discussed in
Section 4.1.2, where we detail the search process, pruning and the heuristics used for
better pruning. Note that the pruning is key to the significantly improved efficiency of the
MAP estimates.

Figure 4.2. Search tree on which the depth-first branch and bound search is performed

4.1.1 Calculation of MAP Upper Bounds Using Shenoy-Shafer Algorithm
To clearly understand the various MAP probabilities that are calculated during the MAP
hypothesis, let us look at the binary search tree formed using the MAP variables. A complete
search through the MAP variables can be illustrated as shown in Fig. 4.2., which gives the
search tree for the probabilistic error model given in Fig. 4.1.(c). In this search tree, the
root node N has an empty instantiation; every intermediate node $N^{i_{inter}}_{I_{inter}}$ is
associated with a subset Iinter of the MAP variables I and the corresponding partial instantiation
iinter; and every leaf node $N^{i}_{I}$ is associated with the entire set I and the corresponding
complete instantiation i. Each node has v children, where v is the number of values or states
that can be assigned to each variable Ii. Since we are dealing with digital signals, every node
in the search tree has two children. Since the MAP variables represent the primary input signals
of the given digital circuit, one path from the root to a leaf node of this search tree gives one
input vector choice. In Fig. 4.2., at node $N^{01}_{\{I1,I2\}}$, Iinter = {I1, I2} and
iinter = {I1 = 0, I2 = 1}. The basic idea of the search process is to find the MAP probability
MAP(i, o) by finding the upper bounds of the intermediate MAP probabilities MAP(iinter, o).

The MAP hypothesis can be divided into two portions. The first portion involves finding
intermediate upper bounds of the MAP probability, MAP(iinter, o), and the second portion
involves improving these bounds to arrive at the exact MAP solution, MAP(iMAP, o). These two
portions are intertwined and performed alternately to effectively improve on the intermediate
MAP upper bounds. The upper bounds and the final solution are calculated by performing
inference on the probabilistic error model using the Shenoy-Shafer algorithm [23, 24].
The Shenoy-Shafer algorithm is based on a local computation mechanism. The probability
distributions of the locally connected variables are propagated to obtain the joint probability
distribution of the entire network, from which any individual or joint probability distribution
can be calculated. The Shenoy-Shafer algorithm involves the following crucial information and
calculations.
• Valuations: The valuations are functions based on the prior probabilities of the variables
in the network. A valuation for a variable Yi is a function over Yi and its parents,
φ_Yi = P(Yi | Pa(Yi)), where Pa(Yi) are the parents of Yi. For variables without parents,
the valuation is φ_Yi = P(Yi). These valuations can be derived from the CPTs as shown in
Table 4.1.
• Combination: Combination is a pointwise multiplication mechanism used to combine the
information provided by the operand functions. A combination of two given functions f_a and
f_b can be written as f_{a∪b} = f_a ⊗ f_b, where a and b are sets of variables.
Table 4.2. provides an example.
• Marginalization: Given a function f_{a∪b}, where a and b are sets of variables, marginalizing
over b provides a function of a, which can be given as f_a = (f_{a∪b})^{mar(b)}. This process
provides the marginals of a single variable or a set of variables. In general the process
can be carried out by summing, maximizing or minimizing over the marginalizing variables
in b. Normally the summation operator is used to calculate probability distributions.
In the MAP hypothesis both the summation and maximization operators are involved.
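The combination and marginalization operations can be sketched in a few lines of Python. The representation of a function as a pair (variables, table) and the names `combine` and `marginalize` are choices made for this illustration only; the operand tables are those of Table 4.2.

```python
from itertools import product


def combine(f, g):
    """Combination: pointwise product over the union of the two variable sets."""
    (fv, ft), (gv, gt) = f, g
    vs = tuple(sorted(set(fv) | set(gv)))
    out = {}
    for vals in product((0, 1), repeat=len(vs)):
        a = dict(zip(vs, vals))
        out[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return vs, out


def marginalize(f, drop, op=sum):
    """Marginalization: remove the variables in `drop` using `op` (sum or max)."""
    fv, ft = f
    groups = {}
    for vals, p in ft.items():
        key = tuple(x for v, x in zip(fv, vals) if v not in drop)
        groups.setdefault(key, []).append(p)
    keep = tuple(v for v in fv if v not in drop)
    return keep, {k: op(ps) for k, ps in groups.items()}


# Operands of Table 4.2.
f_xy = (("x", "y"), {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0})
f_yz = (("y", "z"), {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0})

f_xyz = combine(f_xy, f_yz)                       # function of (x, y, z)
f_sum = marginalize(f_xyz, {"z"}, op=sum)         # summation over z
f_max = marginalize(f_xyz, {"z"}, op=max)         # maximization over z
```

Passing `op=max` instead of `op=sum` is the only change needed to switch between the two marginalization operators used in the MAP hypothesis.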
Table 4.1. Valuations of the variables derived from corresponding CPTs

Input
  CPT:  P(I1 = 0) = 0.5,  P(I1 = 1) = 0.5
  Valuation:
    I1   φ_I1
    0    0.5
    1    0.5

Error-free AND
  CPT:  P(X1 = 1 | I1, I2)
             I2 = 0   I2 = 1
    I1 = 0   0        0
    I1 = 1   0        1
  Valuation:
    X1  I1  I2   φ_X1
    0   0   0    1
    0   0   1    1
    0   1   0    1
    0   1   1    0
    1   0   0    0
    1   0   1    0
    1   1   0    0
    1   1   1    1

Error-prone AND
  CPT:  P(X4 = 1 | I1, I2)
             I2 = 0   I2 = 1
    I1 = 0   ε        ε
    I1 = 1   ε        1-ε
  Valuation:
    X4  I1  I2   φ_X4
    0   0   0    1-ε
    0   0   1    1-ε
    0   1   0    1-ε
    0   1   1    ε
    1   0   0    ε
    1   0   1    ε
    1   1   0    ε
    1   1   1    1-ε

Table 4.2. Combination

  x  y  f_xy        y  z  f_yz
  0  0  1           0  0  1
  0  1  1           0  1  0
  1  0  1           1  0  0
  1  1  0           1  1  0

  x  y  z   f_xyz = f_xy ⊗ f_yz
  0  0  0   1 × 1
  0  0  1   1 × 0
  0  1  0   1 × 0
  0  1  1   1 × 0
  1  0  0   1 × 1
  1  0  1   1 × 0
  1  1  0   0 × 0
  1  1  1   0 × 0

Figure 4.3. Illustration of the fusion algorithm: (a) probabilistic network and (b) valuation
network, with the variables O1, X1, X2 and I2 eliminated in turn

The computational scheme of the Shenoy-Shafer algorithm is based on the fusion algorithm
proposed by Shenoy in [25]. Given a probabilistic network, like our probabilistic error model
in Fig. 4.3.(a), the fusion method can be explained as follows:
• The valuations provided are associated with the corresponding variables, forming a
valuation network as shown in Fig. 4.3.(b). In our example, the valuations are φ_I1 for {I1},
φ_I2 for {I2}, φ_X1 for {X1, I1, I2}, φ_X2 for {X2, I1, I2} and φ_O1 for {O1, X1, X2}.
• A variable Yi ∈ Y whose probability distribution is to be found is selected.
In our example, let us say we select I1.
• Choose an arbitrary variable elimination order. For the example network, let us choose
the order O1, X1, X2, I2. When a variable Yi is eliminated, the functions associated
with that variable, f^1_Yi, ···, f^j_Yi, are combined and the resulting function is marginalized
over Yi. This can be represented as (f^1_Yi ⊗ ··· ⊗ f^j_Yi)^{mar(Yi)}. This function is then
associated with the neighbors of Yi. The process is repeated until all the variables in the
elimination order are removed. Fig. 4.3. illustrates the fusion process.
– Eliminating O1 yields the function (φ_O1)^{mar(O1)}, associated with neighbors X1, X2.
– Eliminating X1 yields the function ((φ_O1)^{mar(O1)} ⊗ φ_X1)^{mar(X1)}, associated with
neighbors X2, I1, I2.
– Eliminating X2 yields the function (((φ_O1)^{mar(O1)} ⊗ φ_X1)^{mar(X1)} ⊗ φ_X2)^{mar(X2)},
associated with neighbors I1, I2.
– Eliminating I2 yields the function ((((φ_O1)^{mar(O1)} ⊗ φ_X1)^{mar(X1)} ⊗ φ_X2)^{mar(X2)} ⊗
φ_I2)^{mar(I2)}, associated with neighbor I1.
– According to a theorem presented in [24], combining the functions associated with
I1 yields the probability distribution of I1: φ_I1 ⊗ ((((φ_O1)^{mar(O1)} ⊗ φ_X1)^{mar(X1)} ⊗
φ_X2)^{mar(X2)} ⊗ φ_I2)^{mar(I2)} = (φ_I1 ⊗ φ_O1 ⊗ φ_X1 ⊗ φ_X2 ⊗ φ_I2)^{mar(O1,X1,X2,I2)} =
probability distribution of I1 [24]. Note that the function φ_I1 ⊗ φ_O1 ⊗ φ_X1 ⊗ φ_X2 ⊗ φ_I2
represents the joint probability of the entire probabilistic error model.
• The above process is repeated for all the other variables individually.
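The fusion steps listed above can be sketched as follows for the five-variable example. The gate types (an ideal AND for X1, an error-prone AND for X2, an XOR comparator for O1), the value ε = 0.1 and all function names are assumptions made for this illustration; only the variable sets of the valuations and the elimination order come from the text.

```python
from itertools import product

EPS = 0.1  # assumed gate error probability


def table(variables, fn):
    """Build a valuation (variables, {assignment: value}) over binary variables."""
    return variables, {v: fn(*v) for v in product((0, 1), repeat=len(variables))}


def combine(f, g):
    """Pointwise product over the union of the two variable sets."""
    (fv, ft), (gv, gt) = f, g
    vs = tuple(sorted(set(fv) | set(gv)))
    out = {}
    for vals in product((0, 1), repeat=len(vs)):
        a = dict(zip(vs, vals))
        out[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return vs, out


def marginalize(f, drop):
    """Sum out the variables in `drop`."""
    fv, ft = f
    out = {}
    for vals, p in ft.items():
        key = tuple(x for v, x in zip(fv, vals) if v not in drop)
        out[key] = out.get(key, 0.0) + p
    return tuple(v for v in fv if v not in drop), out


VALUATIONS = [
    table(("I1",), lambda a: 0.5),
    table(("I2",), lambda a: 0.5),
    table(("X1", "I1", "I2"), lambda x, a, b: float(x == (a & b))),
    table(("X2", "I1", "I2"), lambda x, a, b: 1 - EPS if x == (a & b) else EPS),
    table(("O1", "X1", "X2"), lambda o, x1, x2: float(o == (x1 ^ x2))),
]


def fuse(valuations, order):
    """Eliminate variables in `order`; combine whatever remains at the end."""
    pool = list(valuations)
    for y in order:
        touching = [f for f in pool if y in f[0]]
        pool = [f for f in pool if y not in f[0]]
        merged = touching[0]
        for f in touching[1:]:
            merged = combine(merged, f)
        pool.append(marginalize(merged, {y}))   # goes to the neighbors of y
    result = pool[0]
    for f in pool[1:]:
        result = combine(result, f)
    return result


p_i1 = fuse(VALUATIONS, ["O1", "X1", "X2", "I2"])   # order used in the text
p_o1 = fuse(VALUATIONS, ["I1", "I2", "X1", "X2"])   # a different target variable
```

Under these assumptions the first call returns the input prior of I1, and the second returns P(O1 = 1) = ε, since the comparator fires exactly when the single error-prone gate flips.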
To perform efficient computation, an additional undirected network called a join tree is
formed from the original probabilistic network. The nodes of the join tree contain clusters of
nodes from the original probabilistic network. The information of locally connected variables,
provided through valuations, is propagated in the join tree by a message passing mechanism.
To increase the computational efficiency of the Shenoy-Shafer algorithm, a special kind of
join tree named a binary join tree is used. In a binary join tree, every node is connected to no
more than three neighbors. In this framework only two functions are combined at a time,
thereby reducing the computational complexity. We will first explain the method to construct
a binary join tree, as proposed by Shenoy in [24], and then explain the inference
scheme using the message passing mechanism.
The binary join tree is constructed using the fusion algorithm. The construction can be
explained as follows:
• To begin with we have,
– Λ: a set that contains all the variables of the original probabilistic network.
In our example, Λ = {I1, I2, X1, X2, O1}.
– Γ: a set that contains the subsets of variables that should be present in the
binary join tree, i.e., the subsets that denote the valuations and the subsets whose
probability distributions need to be calculated. In our example, let us say
that we need to calculate the individual probability distributions of all the variables.
Then we have Γ = {{I1}, {I2}, {X1, I1, I2}, {X2, I1, I2}, {O1, X1, X2}, {X1}, {X2}, {O1}}.
– N: a set that contains the nodes of the binary join tree; it is initially empty.
– E: a set that contains the edges of the binary join tree; it is initially empty.
– We also need an order in which to choose the variables to form the binary join
tree. In our example, since the goal is to find the probability distribution of I1,
this order should reflect the variable elimination order (O1, X1, X2, I2, I1) used in
the fusion algorithm.

Figure 4.4. Partial illustration of binary join tree construction method for the first chosen
variable
• The nodes and edges are then generated by the following procedure:
while |Γ| > 1 do
    Choose a variable Y ∈ Λ
    Γ_Y = {γi ∈ Γ | Y ∈ γi}
    while |Γ_Y| > 1 do
        Choose γi ∈ Γ_Y and γj ∈ Γ_Y such that ||γi ∪ γj|| ≤ ||γm ∪ γn|| for all γm, γn ∈ Γ_Y
        γk = γi ∪ γj
        N = N ∪ {γi} ∪ {γj} ∪ {γk}
        E = E ∪ {{γi, γk}, {γj, γk}}
        Γ_Y = Γ_Y − {γi, γj}
        Γ_Y = Γ_Y ∪ {γk}
    end while
    if |Λ| > 1 then
        Take the single remaining subset γi, i.e., Γ_Y = {γi}
        γj = γi − {Y}
        N = N ∪ {γi} ∪ {γj}
        E = E ∪ {{γi, γj}}
        Γ = Γ ∪ {γj}
    end if
    Γ = Γ − {γi ∈ Γ | Y ∈ γi}
    Λ = Λ − {Y}
end while
• The final structure will have some duplicate clusters. Two neighboring duplicate clusters
can be merged into one if the merged node does not end up having more than three
neighbors. After merging the duplicate nodes we get the binary join tree.

Figure 4.5. Complete illustration of binary join tree construction method
Fig. 4.4. and Fig. 4.5. illustrate the binary join tree construction method for the
probabilistic error model in Fig. 4.3.(a). Fig. 4.4. explains a portion of the construction
method for the first chosen variable, here O1. Fig. 4.5. illustrates the entire method. Note
that, even though the binary join tree is constructed with a specific variable elimination order
for finding the probability distribution of I1, it can also be used to find the probability
distributions of the other variables.
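The construction procedure can be sketched in Python as shown below. This is an illustrative transcription under one simplifying assumption: clusters are stored as sets, so identical clusters collapse into a single node as soon as they appear, instead of being kept as duplicates and merged in a final step. The function name and the tie-breaking rule for the pair selection are choices made for this sketch.

```python
from itertools import combinations


def build_binary_join_tree(subsets, order):
    """Return (nodes, edges) of a join tree built by the fusion-style procedure."""
    gamma = {frozenset(s) for s in subsets}      # Γ
    remaining = list(order)                      # Λ, consumed in the given order
    nodes, edges = set(), set()

    def link(a, b):
        nodes.update((a, b))
        if a != b:                               # identical clusters share one node
            edges.add(frozenset((a, b)))

    while len(gamma) > 1 and remaining:
        y = remaining.pop(0)
        gamma_y = {g for g in gamma if y in g}   # Γ_Y
        while len(gamma_y) > 1:
            # Pick the pair whose union is smallest (ties broken by sorted order).
            gi, gj = min(combinations(sorted(gamma_y, key=sorted), 2),
                         key=lambda pair: len(pair[0] | pair[1]))
            gk = gi | gj
            link(gi, gk)
            link(gj, gk)
            gamma_y -= {gi, gj}
            gamma_y.add(gk)
        if remaining and gamma_y:                # |Λ| > 1 while Y is still counted
            (gi,) = gamma_y
            gj = gi - {y}
            if gj:
                link(gi, gj)
                gamma.add(gj)
        gamma = {g for g in gamma if y not in g}
    return nodes, edges


subsets = [{"I1"}, {"I2"}, {"X1", "I1", "I2"}, {"X2", "I1", "I2"},
           {"O1", "X1", "X2"}, {"X1"}, {"X2"}, {"O1"}]
nodes, edges = build_binary_join_tree(subsets, ["O1", "X1", "X2", "I2", "I1"])
```

For the example network this yields eleven clusters joined by ten edges, with no cluster having more than three neighbors. For other networks the early collapsing of duplicates could in principle exceed the three-neighbor limit, which is why the procedure in the text merges duplicates only when the limit is respected.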

Figure 4.6. (a) Message passing with cluster C11 as root (b) Message passing with cluster C1
as root (c) Message storage mechanism. The clusters of the binary join tree are C1 = {O1},
C2 = {O1, X1, X2}, C3 = {X1, X2}, C4 = {X1}, C5 = {X1, X2, I1, I2}, C6 = {X1, I1, I2},
C7 = {X2, I1, I2}, C8 = {X2}, C9 = {I1, I2}, C10 = {I2} and C11 = {I1}.

Inference in a binary join tree is performed using a message passing mechanism. Initially
all the valuations are associated with the appropriate clusters. In our example, in Fig. 4.6.,
the valuations are associated with the following clusters:
• φ_I1 associated with cluster C11
• φ_I2 associated with cluster C10
• φ_X1 associated with cluster C6
• φ_X2 associated with cluster C7
• φ_O1 associated with cluster C2
A message passed from cluster b, containing a variable set B, to cluster c, containing a variable
set C, can be given as
\[ M_{b \to c} = \Big( \phi_b \prod_{a \neq c} M_{a \to b} \Big)^{mar(B \setminus C)} \qquad (4.2) \]
where φ_b is the valuation associated with cluster b and the product runs over the neighbors a
of b other than c. If cluster b is not associated with any valuation, then this function is
omitted from the equation. The message from cluster b can be sent to cluster c only after
cluster b has received messages from all its neighbors other than c. The resulting function is
marginalized over the variables in cluster b that are not in cluster c. To calculate the
probability distribution of a variable Yi, the cluster containing that variable alone is taken
as root and the messages are passed towards this root. The probability of Yi, P(Yi), is
calculated at the root. In our example, in Fig. 4.6.(a), to find the probability distribution
of I1, the cluster C11 is chosen as the root. The messages from all the leaf clusters are
sent towards C11 and finally the probability distribution of I1 is calculated as P(I1) =
M_{C9→C11} ⊗ φ_I1. Also note that the order of the marginalizing variables is O1, X1, X2, I2,
which exactly reflects the elimination order used to construct the binary join tree. As
mentioned before, this binary join tree can also be used to calculate the probability
distributions of other variables. In our example, in Fig. 4.6.(b), to find the probability
distribution of O1, cluster C1 is chosen as root, the messages from the leaf clusters are passed
towards C1, and finally the probability distribution of O1 is calculated as P(O1) = M_{C2→C1}.
Note that the order of the marginalizing variables changes to I1, I2, X1, X2. We can also
calculate the joint probability distribution of a set of variables that forms a cluster in the
binary join tree. In our example, the joint probability P(I1, I2) can be calculated by assigning
cluster C9 as root. In this fashion, the probability distribution of any individual variable or
set of variables can be calculated by choosing the appropriate root cluster and sending the
messages towards this root. During these operations some of the calculations do not change, so
performing them again would be inefficient. Using the binary join tree structure these
calculations can be stored, thereby eliminating redundant recalculation. In the binary join
tree, between any two neighboring clusters b and c, both messages M_{b→c} and M_{c→b} are
stored. Fig. 4.6.(c) illustrates this using our example.
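The message passing scheme of Eq. 4.2 can be sketched as a short recursive Python function. The cluster contents, the tree edges and the cluster to which each valuation is attached follow the example of Fig. 4.6; the gate types (AND gates, XOR comparator), ε = 0.1 and all identifiers are assumptions of this sketch. Caching the messages with `functools.lru_cache` plays the role of the message storage mechanism.

```python
from functools import lru_cache
from itertools import product

EPS = 0.1  # assumed gate error probability


def table(vs, fn):
    return vs, {v: fn(*v) for v in product((0, 1), repeat=len(vs))}


def combine(f, g):
    (fv, ft), (gv, gt) = f, g
    vs = tuple(sorted(set(fv) | set(gv)))
    out = {}
    for vals in product((0, 1), repeat=len(vs)):
        a = dict(zip(vs, vals))
        out[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return vs, out


def marginalize(f, drop):
    fv, ft = f
    out = {}
    for vals, p in ft.items():
        key = tuple(x for v, x in zip(fv, vals) if v not in drop)
        out[key] = out.get(key, 0.0) + p
    return tuple(v for v in fv if v not in drop), out


CLUSTERS = {
    "C1": {"O1"}, "C2": {"O1", "X1", "X2"}, "C3": {"X1", "X2"}, "C4": {"X1"},
    "C5": {"X1", "X2", "I1", "I2"}, "C6": {"X1", "I1", "I2"},
    "C7": {"X2", "I1", "I2"}, "C8": {"X2"}, "C9": {"I1", "I2"},
    "C10": {"I2"}, "C11": {"I1"},
}
EDGES = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C3", "C5"), ("C5", "C6"),
         ("C5", "C7"), ("C7", "C8"), ("C7", "C9"), ("C9", "C10"), ("C9", "C11")]
VALUATIONS = {
    "C11": table(("I1",), lambda a: 0.5),
    "C10": table(("I2",), lambda a: 0.5),
    "C6": table(("X1", "I1", "I2"), lambda x, a, b: float(x == (a & b))),
    "C7": table(("X2", "I1", "I2"), lambda x, a, b: 1 - EPS if x == (a & b) else EPS),
    "C2": table(("O1", "X1", "X2"), lambda o, x1, x2: float(o == (x1 ^ x2))),
}
UNIT = ((), {(): 1.0})  # used for clusters that carry no valuation


def neighbors(c):
    return [b if a == c else a for a, b in EDGES if c in (a, b)]


@lru_cache(maxsize=None)
def message(b, c):
    """M_{b->c}: valuation of b times messages from its other neighbors, summed over B minus C."""
    f = VALUATIONS.get(b, UNIT)
    for a in neighbors(b):
        if a != c:
            f = combine(f, message(a, b))
    return marginalize(f, CLUSTERS[b] - CLUSTERS[c])


def marginal(root):
    """Distribution of the root cluster's variables: its valuation times all incoming messages."""
    f = VALUATIONS.get(root, UNIT)
    for a in neighbors(root):
        f = combine(f, message(a, root))
    return f


p_i1 = marginal("C11")      # root C11, as in Fig. 4.6.(a)
p_o1 = marginal("C1")       # root C1, as in Fig. 4.6.(b)
p_i1_i2 = marginal("C9")    # joint distribution of a cluster
```

Because the messages are cached per direction, changing the root reuses every message that still points towards the new root.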
If an evidence set e is provided, then the additional valuations {e_Yi | Yi ∈ e} provided by
the evidence have to be associated with the appropriate clusters. A valuation e_Yi for a
variable Yi can be associated with a cluster containing Yi alone. In our example, if the
variable O1 is evidenced, then the corresponding valuation e_O1 can be associated with cluster
C1. While finding the probability distribution of a variable Yi, the inference mechanism (as
explained before) with an evidence set e gives the probability P(Yi, e) instead of P(Yi). From
P(Yi, e), P(e) is calculated as P(e) = ∑_Yi P(Yi, e). Calculation of the probability of
evidence P(e) is crucial for the MAP calculation.
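The effect of an evidence valuation can be checked on the small example by direct enumeration. In the sketch below the joint distribution is written out explicitly instead of being obtained from the join tree; the AND gate types, ε = 0.1 and the function names are assumptions made for the illustration.

```python
from itertools import product

EPS = 0.1  # assumed gate error probability


def joint(i1, i2, x1, x2, o1, eps=EPS):
    """Joint probability of the five-variable example (AND gates assumed)."""
    p = 0.5 * 0.5                                    # input priors
    p *= 1.0 if x1 == (i1 & i2) else 0.0             # error-free gate
    p *= (1.0 - eps) if x2 == (i1 & i2) else eps     # error-prone gate
    p *= 1.0 if o1 == (x1 ^ x2) else 0.0             # XOR comparator
    return p


def evidence_valuation(observed):
    """e_Y: 1 for the observed state of Y and 0 for the other state."""
    return {0: float(observed == 0), 1: float(observed == 1)}


e_o1 = evidence_valuation(1)                         # evidence O1 = 1

p_i1_and_e = {0: 0.0, 1: 0.0}                        # P(I1, e)
for i1, i2, x1, x2, o1 in product((0, 1), repeat=5):
    p_i1_and_e[i1] += joint(i1, i2, x1, x2, o1) * e_o1[o1]

p_e = sum(p_i1_and_e.values())                       # P(e) = sum over I1 of P(I1, e)
```

Multiplying by the evidence valuation simply zeroes every term that contradicts the observation, so the remaining mass is P(e).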
The MAP probabilities MAP(iinter, o) are calculated by performing inference on the binary
join tree with evidences iinter and o. Let us say that we have an evidence set e = {iinter, o};
then MAP(iinter, o) = P(e). For a given partial instantiation iinter, MAP(iinter, o) is
calculated by maximizing over the MAP variables that are not evidenced. This calculation can be
done by modifying the message passing scheme to accommodate maximization over the unevidenced
MAP variables. So for the MAP calculation, the marginalization operation involves both
maximization and summation. The maximization is performed over the unevidenced
MAP variables in I and the summation is performed over all the other variables in X and O.
For MAP, a message passed from cluster b to cluster c is calculated as
\[ M_{b \to c} = \max_{\{I_b\} \in \{B \setminus C\}} \; \sum_{\{X_b \cup O_b\} \in \{B \setminus C\}} \phi_b \prod_{a \neq c} M_{a \to b} \qquad (4.3) \]
where I_b ⊆ I \ Iinter, X_b ⊆ X, O_b ⊆ O and {I_b, X_b, O_b} ∈ B.
Here the most important aspect is that the maximization and summation operators in
Eqn. 4.3 do not commute:
\[ \Big[ \sum_{X} \max_{I} P \Big](y) \;\ge\; \Big[ \max_{I} \sum_{X} P \Big](y) \qquad (4.4) \]
So during message passing in the binary join tree, a valid order of the marginalizing
variables, i.e., a valid variable elimination order, should have the summation variables in X
and O before the maximization variables in I. A message pass through an invalid variable
elimination order can result in a bad upper bound that is stuck at a local maximum, which
eventually results in the elimination of some probable instantiations of the MAP variables I
during the search process. However, an invalid elimination order can provide an initial upper
bound of the MAP probability to start with. The closer the invalid variable elimination order
is to a valid one, the tighter the upper bound will be. In the binary join tree, any cluster
can be chosen as root to get this initial upper bound. For example, in Fig. 4.6.(b) choosing
cluster C1 as root results in an invalid variable elimination order (I1, I2, X1, X2), and a
message pass towards this root can give the initial upper bound. It is also essential to use a
valid variable elimination order during the construction of the binary join tree so that there
is at least one path that can provide a good upper bound.
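The inequality of Eq. 4.4 can be checked numerically on a toy distribution. The four probabilities below are made up for this illustration; `exact_map` applies the operators in the valid order (sum over X, then maximize over I), and `upper_bound` applies them in the invalid order.

```python
# Toy joint distribution P(I = i, X = x); keys are (i, x). Values are illustrative only.
P = {(0, 0): 0.4, (0, 1): 0.1,
     (1, 0): 0.1, (1, 1): 0.4}

# Valid order: sum out X first, then maximize over I.
exact_map = max(sum(P[(i, x)] for x in (0, 1)) for i in (0, 1))

# Invalid order: maximize over I first, then sum over X.
upper_bound = sum(max(P[(i, x)] for i in (0, 1)) for x in (0, 1))
```

Here the invalid order gives about 0.8 while the exact MAP probability is 0.5, so the swapped order overestimates and can only serve as an upper bound.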

Figure 4.7. Binary join tree for the probabilistic error model in Fig. 4.1.(c). The root
clusters used for the MAP probabilities are:

  Probability                                             Root cluster
  MAP({}, o)                                              C2
  MAP({I1=0}, o), MAP({I1=1}, o)                          C31
  MAP({I1=0,I2=0}, o), MAP({I1=0,I2=1}, o)                C30
  MAP({I1=0,I2=0,I3=0}, o), MAP({I1=0,I2=0,I3=1}, o)      C28

Fig. 4.7. gives the binary join tree for the probabilistic error model given in Fig. 4.1.(c),
constructed with a valid variable elimination order (O1, X3, X6, X1, X2, X4, X5, I3, I2, I1).
In this model, there are three MAP variables I1, I2, I3. The MAP hypothesis on this model
results in iMAP = {I1 = 0, I2 = 0, I3 = 0}.
The initial upper bound MAP({}, o) is calculated by choosing cluster C2 as root and passing
messages towards C2. As specified earlier, this upper bound can be calculated with any
cluster as root. With C2 as root, an upper bound will most certainly be obtained since the
variable elimination order (I3, I2, I1, X4, X5, X1, X2, X3, X6) is an invalid one. But since
the maximization variables are at the very beginning of the order, having C2 as root will yield
a looser upper bound. If C16 is chosen as root instead, the elimination order (O1, X3, X6,
X1, I3, X4, X5, I2, I1) will be closer to a valid order, so a much tighter upper bound can be
achieved. To calculate an intermediate upper bound MAP(iinter, o), the MAP variable Ii newly
added to form iinter is identified and the cluster containing the variable Ii alone is selected
as root. By doing this, a valid elimination order and a proper upper bound can be achieved. For
example, to calculate the intermediate upper bound MAP({I1 = 0}, o), where the instantiation
{I1 = 0} is newly added to the initially empty set iinter, a valid elimination order should have
the maximization variables I2, I3 at the end. To achieve this, cluster C31 is chosen as root,
thereby yielding a valid elimination order (O1, X3, X6, X1, X2, X4, X5, I3, I2).
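The validity condition on elimination orders is easy to test mechanically. The sketch below checks the three orders quoted above; `is_valid_order` encodes the rule from the text, while `misplaced_pairs` is a hypothetical closeness measure introduced only for this illustration (it counts the pairs in which a MAP variable is eliminated before a summation variable) and is not a quantity defined in this work.

```python
def is_valid_order(order, map_vars):
    """True when every summation variable precedes every maximization (MAP) variable."""
    seen_map_var = False
    for v in order:
        if v in map_vars:
            seen_map_var = True
        elif seen_map_var:
            return False
    return True


def misplaced_pairs(order, map_vars):
    """Count (MAP variable, summation variable) pairs eliminated in the wrong order."""
    count, maps_so_far = 0, 0
    for v in order:
        if v in map_vars:
            maps_so_far += 1
        else:
            count += maps_so_far
    return count


MAP_VARS = {"I1", "I2", "I3"}
order_c31 = ["O1", "X3", "X6", "X1", "X2", "X4", "X5", "I3", "I2"]   # root C31
order_c16 = ["O1", "X3", "X6", "X1", "I3", "X4", "X5", "I2", "I1"]   # root C16
order_c2 = ["I3", "I2", "I1", "X4", "X5", "X1", "X2", "X3", "X6"]    # root C2

valid = {name: is_valid_order(o, MAP_VARS)
         for name, o in [("C31", order_c31), ("C16", order_c16), ("C2", order_c2)]}
distance = {name: misplaced_pairs(o, MAP_VARS)
            for name, o in [("C31", order_c31), ("C16", order_c16), ("C2", order_c2)]}
```

Under this illustrative measure the order for root C16 is much closer to a valid order than the one for root C2, which agrees with the qualitative statement that C16 gives the tighter initial bound.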

4.1.2 Calculation of the Exact MAP Solution
The calculation of the exact MAP solution MAP(iMAP, o) can be explained as follows:
• To start with we have the following,
– Iinter → subset of the MAP variables I. Initially empty.
– iinter → partial instantiation set of the MAP variables Iinter. Initially empty.
– id1, id2 → partial instantiation sets used to store iinter. Initially empty.
– iMAP → MAP instantiation. At first, iMAP = iinit, where iinit is calculated by sequentially
initializing the MAP variables to a particular instantiation and performing a local taboo
search around the neighbors of that instantiation [30].
– MAP(iMAP, o) → MAP probability. Initially MAP(iMAP, o) = MAP(iinit, o), calculated by
performing inference on the probabilistic error model.
– v(Ii) → number of values or states that can be assigned to a variable Ii. Since we
are dealing with digital signals, v(Ii) = 2 for all i.

Figure 4.8. Search process for MAP computation
• The search then proceeds as follows:
 1: Calculate MAP(iinter, o). /* This is the initial upper bound of the MAP probability. */
 2: if MAP(iinter, o) ≥ MAP(iMAP, o) then
 3:     MAP(iMAP, o) = MAP(iinter, o)
 4: else
 5:     MAP(iMAP, o) = MAP(iMAP, o)
 6:     iMAP = iMAP
 7: end if
 8: while |I| > 0 do
 9:     Choose a variable Ii ∈ I.
10:     Iinter = Iinter ∪ {Ii}.
11:     while v(Ii) > 0 do
12:         Choose a value i_v(Ii) of Ii
13:         id1 = iinter ∪ {Ii = i_v(Ii)}.
14:         Calculate MAP(id1, o) from binary join tree.
15:         if MAP(id1, o) ≥ MAP(iMAP, o) then
16:             MAP(iMAP, o) = MAP(id1, o)
17:             id2 = id1
18:         else
19:             MAP(iMAP, o) = MAP(iMAP, o)
20:         end if
21:         v(Ii) = v(Ii) − 1
22:     end while
23:     iinter = id2
24:     if |iinter| = 0 then
25:         goto line 29
26:     end if
27:     I = I − {Ii}
28: end while
29: if |iinter| = 0 then
30:     iMAP = iMAP
31: else
32:     iMAP = iinter
33: end if

The pruning of the search process is handled in lines 11-23. After choosing a MAP variable
Ii, the partial instantiation set iinter is updated by adding the best instantiation
Ii = i_v(Ii), thereby ignoring the other instantiations of Ii. This can be seen in Fig. 4.8.,
which illustrates the search process for MAP computation using the probabilistic error model
given in Fig. 4.1.(c) as an example.
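The branch and bound idea described in the prose (expand a node only while its upper bound exceeds the best probability found so far) can be sketched as below. This is not a transcription of the numbered listing: it is a generic depth-first branch and bound with backtracking, the circuit OR(I1, AND(I2, I3)) and ε = 0.1 are hypothetical, and the upper bound is computed by brute force (summation outside, maximization inside, as in Eq. 4.4) in place of join tree inference.

```python
from itertools import product

EPS = 0.1   # assumed gate error probability
K = 3       # number of primary inputs (MAP variables)


def is_error(inputs, flips):
    """Comparator output O for the hypothetical circuit OR(i1, AND(i2, i3))."""
    i1, i2, i3 = inputs
    ideal = i1 | (i2 & i3)
    noisy = (i1 | ((i2 & i3) ^ flips[0])) ^ flips[1]
    return ideal ^ noisy


# Probability of each pattern of gate flips (AND gate flip, OR gate flip).
FLIPS = {f: (EPS if f[0] else 1 - EPS) * (EPS if f[1] else 1 - EPS)
         for f in product((0, 1), repeat=2)}


def upper_bound(partial):
    """Bound on max P(i, O = 1) over all completions of `partial`; exact at a leaf."""
    free = K - len(partial)
    bound = 0.0
    for flips, weight in FLIPS.items():
        hit = any(is_error(partial + rest, flips)
                  for rest in product((0, 1), repeat=free))
        bound += weight * 0.5 ** K * hit
    return bound


def branch_and_bound():
    best_p, best_i, visited = 0.0, None, 0
    stack = [()]                                  # start from the empty instantiation
    while stack:
        partial = stack.pop()
        visited += 1
        bound = upper_bound(partial)
        if bound <= best_p:
            continue                              # prune the whole sub-tree
        if len(partial) == K:
            best_p, best_i = bound, partial       # complete instantiation
        else:
            stack.extend(partial + (v,) for v in (1, 0))   # value 0 is explored first
    return best_p, best_i, visited


best_p, best_i, visited = branch_and_bound()
```

For this toy circuit the search examines 7 of the 15 nodes of the full search tree; the sub-tree under I1 = 1 is discarded at its root because its bound is already below the probability found under I1 = 0.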

4.1.3 Calculating the Maximum Output Error Probability
According to our error model, the MAP variables represent the primary input signals of the underlying digital logic circuit. So after MAP hypothesis, we will have the input vector which has the highest probability of producing an error at the output. The random variables I that represent the primary input signals are then instantiated with $i_{MAP}$ and inference is performed, so the evidence set for this inference calculation is $e = \{i_{MAP}\}$. The output error probability is obtained by observing the probability distributions of the comparator logic variables O. After inference, the probability distribution $P(O_i, e)$ is obtained, from which $P(O_i \mid e)$ follows as

$$P(O_i \mid e) = \frac{P(O_i, e)}{P(e)} = \frac{P(O_i, e)}{\sum_{O_i} P(O_i, e)}.$$

Finally, the maximum output error probability is given by $\max_i P(O_i = 1 \mid e)$.
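
As a small worked example of this normalization, the following Python sketch turns joint values P(O_i, e) into conditionals and takes the maximum over the comparator outputs. The function names and the numbers in `joints` are hypothetical and serve only to illustrate the arithmetic.

```python
def conditional_from_joint(joint):
    """Normalize P(O_i, e) over the states of O_i to obtain P(O_i | e)."""
    p_e = sum(joint.values())  # P(e) = sum over the states of O_i
    return {state: p / p_e for state, p in joint.items()}


def max_output_error(joints):
    """Maximum over comparator outputs of P(O_i = 1 | e)."""
    return max(conditional_from_joint(j)[1] for j in joints.values())


# Hypothetical joint values for two comparator outputs under the same evidence e.
joints = {
    "O1": {0: 0.03, 1: 0.01},
    "O2": {0: 0.02, 1: 0.02},
}
print(max_output_error(joints))
```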
4.1.4 Computational Complexity of MAP Estimate
The time complexity of MAP depends on that of the depth-first branch and bound search on the input instantiation search tree and also on that of inference in the binary join tree. The former depends on the number of MAP variables and the number of states assigned to each variable. In our case each variable is assigned two states, so the time complexity can be given as $O(2^k)$, where k is the number of MAP variables. This is the worst-case time complexity, assuming that the search tree is not pruned. If the search tree is pruned, fewer than $2^k$ instantiations are visited.
The time complexity of inference in the binary join tree depends on the number of cliques q and the size Z of the biggest clique. It can be represented as $q \cdot 2^Z$, and the worst-case time complexity can be given as $O(2^Z)$. In any given probabilistic model with N variables, representing a joint probability $P(x_1, \cdots, x_N)$, the corresponding join tree will always have Z < N [27]. Also, depending on the underlying circuit structure, the join tree of the corresponding probabilistic error model can have Z << N or Z close to N, which in turn determines the time complexity.
Since for every pass in the search tree inference has to be performed in the join tree to get the upper bound of MAP probability, the worst-case time complexity for MAP can be given as $O(2^{k+Z})$. The space complexity of MAP depends on the number of MAP variables for the search tree, and on the number of variables N in the probabilistic error model and the size of the largest clique. It can be given by $2^k + N \cdot 2^Z$.
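
To give a feel for these expressions, the sketch below evaluates them for one set of values. The function `map_cost` and the chosen numbers (k = 5, Z = 4, q = 6, N = 17) are hypothetical and are not taken from any benchmark in this chapter.

```python
def map_cost(k, clique_size, n_cliques, n_variables):
    """Worst-case counts from the expressions above, for binary variables."""
    return {
        "search_leaves": 2 ** k,                               # O(2^k)
        "inference_per_pass": n_cliques * 2 ** clique_size,    # q * 2^Z
        "time_bound": 2 ** (k + clique_size),                  # O(2^(k+Z))
        "space": 2 ** k + n_variables * 2 ** clique_size,      # 2^k + N * 2^Z
    }


costs = map_cost(k=5, clique_size=4, n_cliques=6, n_variables=17)
print(costs)
```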
4.2 Experimental Results
The experiments are performed on ISCAS85 and MCNC benchmark circuits. The computing device used is a Sun server with 8 CPUs, where each CPU consists of a 1.5GHz UltraSPARC IV processor with at least 32GB of RAM.

4.2.1 Experimental Procedure for Calculating Maximum Output Error Probability
Our main goal is to provide the maximum output error probabilities for different gate error probabilities ε. To get the maximum output error probabilities, every output signal of a circuit has to be examined through MAP estimation, which is performed through algorithms provided in [31]. The experimental procedure is illustrated as a flow chart in Fig. 4.9.

Figure 4.9. Flow chart describing the experimental setup and process. The boxes read, in order: take the probabilistic model for a given digital logic circuit; provide evidence P(o_r = 0) = 0 and P(o_r = 1) = 1 to output o_r, where r = 1,…,n; perform MAP hypothesis; obtain the input instantiation i, instantiate the input variables in the probabilistic model with i and perform inference; obtain the output probability P(o_r) = max_i P(o_i = 1), where i = 1,…,n; decision "Is r = n?" with No and Yes branches; on Yes, obtain the probability P(o) = max_r P(o_r = 1), where r = 1,…,n.

The steps are as follows,
• First, evidence has to be provided to one of the comparator output signal variables in set O such that $P(O_i = 0) = 0$ and $P(O_i = 1) = 1$. Recall that these variables have a probability distribution based on XOR logic, so giving evidence like this is similar to forcing the output to be wrong.
• The comparator outputs are evidenced individually and the corresponding input instantiations i are obtained by performing MAP.
• Then the primary input variables in the probabilistic error model are instantiated with each instantiation i and inference is performed to get the output probabilities.
• $P(O_i = 1)$ is noted from all the comparator outputs for each i and the maximum value gives the maximum output error probability.
• The entire operation is repeated for different ε values.

Table 4.3. Worst-case input vectors from MAP
Circuit     No. of Inputs   Input vector    Gate error probability ε
c17         5               01111           0.005 - 0.2
max_flat    8               00010011        0.005 - 0.025
                            11101000        0.03 - 0.05
                            11110001        0.055 - 0.2
voter       12              000100110110    0.01 - 0.19
                            111011100010    0.2

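
For very small circuits, the quantity this procedure targets can be cross-checked by brute force. The Python sketch below is such a check and is explicitly not the MAP computation: it enumerates every input vector and every pattern of gate flips for a hypothetical two-gate circuit, out = NAND(NAND(a, b), c), in which each gate output flips independently with probability ε. The circuit and all function names are assumptions made for illustration.

```python
from itertools import product


def nand(x, y):
    return 1 - (x & y)


def circuit(a, b, c, flips=(0, 0)):
    """Two-gate toy circuit; flips[i] = 1 inverts the output of gate i."""
    g1 = nand(a, b) ^ flips[0]
    return nand(g1, c) ^ flips[1]


def output_error_probability(inputs, eps):
    """Exact P(noisy output != ideal output) by summing over all flip patterns."""
    ideal = circuit(*inputs)
    error = 0.0
    for flips in product((0, 1), repeat=2):
        weight = 1.0
        for f in flips:
            weight *= eps if f else 1.0 - eps
        if circuit(*inputs, flips=flips) != ideal:
            error += weight
    return error


def worst_case_input(eps):
    """Input vector with the largest output error probability (first one on ties)."""
    return max(product((0, 1), repeat=3),
               key=lambda v: output_error_probability(v, eps))


worst = worst_case_input(0.05)
print(worst, output_error_probability(worst, 0.05))
```

The enumeration grows exponentially in both the number of inputs and the number of gates, which is why it is usable only as a sanity check on toy examples.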
4.2.2 Worst-case Input Vectors
Table 4.3. gives the worst-case input vectors obtained from MAP, i.e., the input vectors that give the maximum output error probability. The notable results are as follows,
• In max_flat and voter the worst-case input vector from MAP changes with ε, while in c17 it does not change.
• In the range {0.005-0.2} for ε, max_flat has three different worst-case input vectors while voter has two.
• This implies that the worst-case input vectors depend not only on the circuit structure but can also change with ε. This could be of concern for designers, as the worst-case inputs might change after gate error probabilities are reduced by error mitigation schemes. Hence, explicit MAP computation would be necessary to judge the maximum error probabilities and worst-case vectors after each redundancy scheme is applied.

4.2.3 Circuit-Specific Error Bounds for Fault-Tolerant Computation
The error bound for a circuit can be obtained by calculating the gate error probability ε that drives the output error probability of at least one output to a hard bound beyond which the output does not depend on the input signals or the circuit structure. When the output error probability reaches 0.5 (50%), it essentially means that the output signal behaves as a non-functional random number generator for at least one input vector, so 0.5 can be treated as a hard bound.
Fig. 4.10. gives the error bounds for various benchmark circuits. It also shows the comparison between maximum and average output error probabilities with reference to the change in gate error probability ε. These graphs are obtained by performing the experiment for different ε values ranging from 0.005 to 0.1. The average error probabilities are obtained from our previous work by Rejimon et al. [86]. The notable results are as follows,
• The c17 circuit consists of 6 NAND gates. The error bound for each NAND gate in c17 is ε = 0.1055, which is greater than the conventional error bound for a NAND gate, which is 0.08856 [7, 8]. The error bound of the same NAND gate in the voter circuit (which contains 10 NAND gates, 16 NOT gates, 8 NOR gates, 15 OR gates and 10 AND gates) is ε = 0.0292, which is less than the conventional error bound. This indicates that the error bound for an individual NAND gate placed in a circuit can depend on the circuit structure. The same can be true for all other logic gates.
• The maximum output error probabilities are much larger than the average output error probabilities, thereby reaching the hard bound for comparatively lower values of ε, making them a very crucial design parameter to achieve tighter error bounds. Only for alu4 and malu4 does the average output error probability reach the hard bound within ε = 0.1 (ε = 0.095 for alu4, ε = 0.08 for malu4), while the maximum output error probabilities for these circuits reach the hard bound for far lower gate error probabilities (ε = 0.0255 for alu4, ε = 0.0235 for malu4).
• While the error bounds for all the circuits, except c17, are less than 0.08 (8%), the error bounds for circuits like voter, alu4 and malu4 are even less than 0.03 (3%), making them highly vulnerable to errors.

Figure 4.10. Circuit-specific error bound along with comparison between maximum and average output error probabilities for (a) c17, (b) max_flat, (c) voter, (d) pc, (e) count, (f) alu4, (g) malu4. Each panel plots output error probability (Max and Avg curves) against gate error probability ε. Error bounds marked in the panels: c17 = 0.1055, max_flat = 0.069, voter = 0.0292, pc = 0.0407, count = 0.071, alu4 = 0.0255, malu4 = 0.0235.

Table 4.4. Run times for MAP computation
Circuit     No. of Inputs   No. of Gates   Time
c17         5               6              0.047s
max_flat    8               29             0.110s
voter       12              59             0.641s
pc          27              103            225.297s
count       35              144            36.610s
alu4        14              63             58.626s
malu4       14              92             588.702s

Table 4.4. tabulates the run time for MAP computation. The run time does not change significantly for different ε values, so we provide only one run time, which corresponds to all ε values. This is expected, as MAP complexity (discussed in Sec. 4.1.4) is determined by the number of inputs and the number of variables in the largest clique, which in turn depends on the circuit complexity. Note that even though pc has fewer inputs than count, it takes much more time to perform the MAP estimate due to its complex circuit structure.

4.2.4 Validation Using HSpice Simulator
Using external voltage sources, error can be induced in any signal, and this can be modeled using HSpice [43]. In our HSpice model we have induced error, using external voltage sources, at every gate's output. Let $O_f$ be the original error-free output signal, $O_p$ the error-prone output signal and E the piecewise linear (PWL) voltage source that induces error. The basic idea is that the signal $O_p$ depends on the signal $O_f$ and the voltage E. Any change of voltage in E will be reflected in $O_p$. If E = 0 V, then $O_p = O_f$, and if E = $V_{dd}$ (supply voltage), then $O_p = \overline{O_f}$, thereby inducing error. The data points for the PWL voltage source E are provided by computations on a finite automaton which models the underlying error-prone circuit, where individual gates have a gate error probability ε.

Table 4.5. Comparison between maximum error probabilities achieved from the proposed model and the HSpice simulator at ε = 0.05
Circuit     Model    HSpice   % diff over HSpice
c17         0.312    0.315    0.95
max_flat    0.457    0.460    0.65
voter       0.573    0.570    0.53
pc          0.533    0.536    0.56
count       0.492    0.486    1.23
alu4        0.517    0.523    1.15
malu4       0.587    0.594    1.18

Note that, for an input vector of the given circuit, a single simulation run in HSpice is not enough to validate the results from our probabilistic model. Also, the circuit has to be simulated for every possible input vector to find the worst-case one. For a given circuit, the HSpice simulations are conducted for all possible input vectors, where for each vector the circuit is simulated for 1 million runs and the comparator nodes are sampled. From this data the maximum output error probability and the corresponding worst-case input vector are obtained.
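
The sampling idea behind this validation can be mimicked in software. The Python sketch below is only an analogy to the HSpice runs, not a substitute for them: it repeatedly evaluates a hypothetical two-gate circuit, out = NAND(NAND(a, b), c), whose gate outputs flip independently with probability ε, and counts how often the noisy output disagrees with the ideal one. The circuit, the function names, the seed and the number of runs are all assumptions.

```python
import random


def nand(x, y):
    return 1 - (x & y)


def noisy_output(a, b, c, eps, rng):
    """One run of the toy circuit; each gate output flips with probability eps."""
    g1 = nand(a, b) ^ (rng.random() < eps)
    return nand(g1, c) ^ (rng.random() < eps)


def estimate_error(inputs, eps, runs, seed=0):
    """Fraction of runs in which the noisy output differs from the ideal output."""
    rng = random.Random(seed)
    ideal = nand(nand(inputs[0], inputs[1]), inputs[2])
    wrong = sum(noisy_output(*inputs, eps, rng) != ideal for _ in range(runs))
    return wrong / runs


estimate = estimate_error((0, 0, 1), eps=0.05, runs=100_000)
print(estimate)
```

For this toy circuit and input the exact value is 2ε(1 − ε) = 0.095, so the estimate can be compared against a known answer; the sampling error shrinks roughly with the square root of the number of runs.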
Table 4.5. gives the comparison between maximum error probabilities achieved from the proposed model and the HSpice simulator at ε = 0.05. The notable results are as follows,
• The simulation results from HSpice almost exactly coincide with those of our error model for all circuits.
• The highest % difference of our error model over HSpice is just 1.23%.
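
The last column of Table 4.5 is consistent with the relative difference |Model − HSpice| / HSpice × 100. The short Python sketch below recomputes it from the Model and HSpice columns; the names `percent_diff` and `TABLE_4_5` are introduced here for illustration only.

```python
def percent_diff(model, hspice):
    """Relative difference of the model value over the HSpice value, in percent."""
    return abs(model - hspice) / hspice * 100.0


# (Model, HSpice) pairs copied from Table 4.5.
TABLE_4_5 = {
    "c17": (0.312, 0.315),
    "max_flat": (0.457, 0.460),
    "voter": (0.573, 0.570),
    "pc": (0.533, 0.536),
    "count": (0.492, 0.486),
    "alu4": (0.517, 0.523),
    "malu4": (0.587, 0.594),
}

diffs = {name: round(percent_diff(m, h), 2) for name, (m, h) in TABLE_4_5.items()}
largest = max(diffs, key=diffs.get)
print(diffs, largest)
```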

Figure 4.11. Output error probabilities for the entire input vector space with gate error probability ε = 0.05 for c17. The plot shows output error probability (Model and HSpice) against the 32 input vectors from 00000 to 11111.

Fig. 4.11. gives the output error probabilities for the entire input vector space of c17 with gate error probability ε = 0.05. The notable result is as follows,
• The results from both the probabilistic error model and the HSpice simulations show that 01111 gives the maximum output error probability.
Fig. 4.12.(a) and (b) give the output error probabilities, obtained from the probabilistic error model and HSpice respectively, for max_flat with gate error probability ε = 0.05. In order to show that max_flat has a large number of input vectors capable of generating maximum output error, we plot output error probabilities ≥ (µ + σ), where µ is the mean of the output error probabilities and σ is the standard deviation. The notable results are as follows,
• It is evident from Fig. 4.12.(a) that max_flat has a considerably large number of input vectors capable of generating output error, thereby making it error sensitive. Equivalent HSpice results from Fig. 4.12.(b) confirm this aspect.
• The results from the probabilistic error model and HSpice show the same worst-case input vector, 11101000, that is obtained through MAP hypothesis.

Figure 4.12. (a) Output error probabilities ≥ (µ+σ), calculated from probabilistic error model, with gate error probability ε = 0.05 for max_flat (b) Corresponding HSpice calculations

Figure 4.13. Comparison between the average and maximum output error probability and run time for ε=0.005, ε=0.05 and variable ε ranging from 0.005 - 0.05 for max_flat

4.2.5 Results with Multiple ε
Apart from incorporating a single gate error probability ε in all gates of the given circuit, our model also supports different ε values for different gates in the given circuit. Ideally these ε values have to come from the device variabilities and manufacturing defects. Here, each gate in a circuit has an ε value selected at random from a fixed range, say 0.005 - 0.05.
We have presented the result in Fig. 4.13. for max_flat. Here we compare the average and maximum output error probability and run time with ε=0.005, ε=0.05 and variable ε ranging from 0.005 - 0.05. The notable results are as follows,
• It can be seen that the output error probabilities for variable ε are closer to those for ε=0.05 than to those for ε=0.005, implying that the outputs are affected more by the erroneous gates with ε=0.05.
• The run times for all three cases are almost equal, thereby indicating the efficiency of our model.

4.3 Discussion
We have proposed a probabilistic model that computes the exact maximum output error
probabilities for a logic circuit and mapped this problem as maximum a posteriori hypothesis
of the underlying joint probability distribution function of the network. We have demonstrated
our model with standard ISCAS and MCNC benchmarks and provided the maximum output
error probability and the corresponding worst-case input vector. We have also studied the
circuit-specific error bounds for fault-tolerant computing. The results clearly show that the
error bounds are highly dependent on circuit structure and computation of maximum output
error is essential to attain a tighter bound.


CHAPTER 5
MODELING ERROR IN SEQUENTIAL CIRCUITS

Sequential circuits consist of a combinational logic block, a set of inputs and a set of state bits, where the value of each next state bit is fed back to the present state in the next clock cycle through latches. At a given time instance $t_i$, the state signals $s_{t_i}$ are uniquely identified as a function of the primary input signals $i_{t_i}$ and the state signals $s_{t_{i-1}}$ of the previous time instance, giving rise to temporal correlations. Due to this, an error occurring at one time instance might propagate through several consecutive time instances, making the circuit more vulnerable.
In this chapter, we present a time evolving probabilistic model (Temporal Dependency Model, TDM) that can handle the temporal effects of random variables. We form the TDM model (Fig. 5.1.(d)) by unrolling the basic probabilistic model into a sufficiently large number of time slices and connecting the present state node of each time slice $PS_{t_i}$ to the next state node of the previous time slice $NS_{t_{i-1}}$, thereby maintaining the temporal correlations.
To form the error model we have used the concept of miter circuits, where two copies of the same circuit, one representing the ideal circuit and the other representing the erroneous circuit, are compared. For a given circuit, an ideal TDM model and an erroneous TDM model, where each gate is error-prone by a factor ε, are created. The ideal and erroneous primary output nodes, $O_{t_i}$ and $O^e_{t_i}$ respectively, at each time slice $t_i$ are connected to an XOR logic based comparator node $C_{t_i}$, thereby forming a time evolving miter model. The output error probability is calculated by performing inference on the error model and obtaining the probability of state "1" at the comparator nodes, $P(C_{t_i} = 1)$, at each time slice $t_i$, iteratively adding time slices until the results converge. The number of time slices needed for a given sequential circuit is related to the temporal dependence of output error, which in turn is governed by the temporal correlations in the circuit. Our results show that different sequential circuits exhibit different degrees of temporal dependence and that the required number of time slices is less than 10 for all the circuits, which is similar to the observations presented in [49].

Figure 5.1. (a) Digital logic circuit (b) Corresponding probabilistic model (c) DAG representation which is not minimal (d) TDM model

5.1 Sequential Logic Model
We model a sequential circuit as a time evolved probabilistic network, named the temporal dependency model (TDM), which handles temporal dependencies. In this section we provide the details of modeling sequential logic as a TDM model.

5.1.1 TDM Model
Let us consider the sequential circuit shown in Fig. 5.1.(a), where the present state node is represented as PS, the next state node as NS, the primary input as I, the primary output as O and the internal nodes as X1 and X2.
The equivalent probabilistic model shown in Fig. 5.1.(c) can be represented by $G_{t_i} = (V_{t_i}, E_{t_i})$. The nodes of the probabilistic model, V, are the union of all the nodes for each time slice,

$$V = \bigcup_{i=1}^{n} V_{t_i} \qquad (5.1)$$

where n is the number of time slices. In our example $V_{t_i} = \{PS_{t_i}, NS_{t_i}, I_{t_i}, O_{t_i}, X1_{t_i}, X2_{t_i}\}$. The edges, E, of the probabilistic model are not just the union of the edges in a single time slice, $E_{t_i}$, but also include the edges between time slices, that is, the temporal edges $E_{t_i,t_{i+1}}$. Note that the copies of the same variable $X_i$ in all time slices follow a Markov property such that the two sets $\{X_{i,t_1}, \cdots, X_{i,t_{i-1}}\}$ and $\{X_{i,t_{i+1}}, \cdots, X_{i,t_{i+k}}\}$ are independent given $X_{i,t_i}$. For example, in Fig. 5.1.(c), $X1_{t_1}$ and $X1_{t_3}$ are independent of each other given $X1_{t_2}$. So the temporal edges can be defined as

$$E_{t_i,t_{i+1}} = \{(X_{i,t_i}, X_{i,t_{i+1}}) \mid X_{i,t_i} \in V_{t_i},\ X_{i,t_{i+1}} \in V_{t_{i+1}}\} \qquad (5.2)$$

where $X_{i,t_i}$ is any node in time slice $t_i$ and $X_{i,t_{i+1}}$ is the replica of the same node in the adjacent time slice $t_{i+1}$, as shown in Fig. 5.1.(c). Thus, the complete set of edges E is

$$E = E_{t_1} \cup \bigcup_{i=2}^{n} \left(E_{t_i} + E_{t_{i-1},t_i}\right) \qquad (5.3)$$

In the probabilistic model (Fig. 5.1.(c)), apart from the dependencies within one time slice, we also have the dependencies between two copies of the same variable $X_j$ across adjacent time slices. But it is evident that $X_{j,t_i}$ and $X_{j,t_{i-1}}$ are independent of each other given the present state node $PS_{j,t_i}$. For example, the nodes $X1_{t_1}$ and $X1_{t_2}$ from Fig. 5.1.(c) are independent of each other given the present state node $PS_{j,t_2}$; so even if we remove the temporal edges connecting these nodes at consecutive time slices, the underlying structure will still be intact. The same can be said for $X1_{t_2}$ and $X1_{t_3}$.
So in the probabilistic model all the temporal edges except those connecting the present state and next state nodes of adjacent slices (bold lines in Fig. 5.1.(c)) can be removed to achieve a minimal representation as shown in Fig. 5.1.(d), which is termed the TDM model. In our example, the necessary temporal edges can be given as,

$$E_{t_i,t_{i+1}} = \{(NS_{t_i}, PS_{t_{i+1}}) \mid NS_{t_i} \in V_{t_i},\ PS_{t_{i+1}} \in V_{t_{i+1}}\} \qquad (5.4)$$
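
The construction of the minimal edge set can be sketched in a few lines of Python. The function `unroll_tdm` and the single-slice edge list `SLICE_EDGES` are hypothetical: the text only states that O has parents X1 and X2, so the remaining wiring is an assumption made to obtain a runnable example.

```python
# Edges inside one time slice (assumed wiring, except X1 -> O and X2 -> O).
SLICE_EDGES = [
    ("I", "X1"), ("PS", "X1"),
    ("I", "X2"), ("PS", "X2"),
    ("X1", "NS"), ("X2", "NS"),
    ("X1", "O"), ("X2", "O"),
]


def unroll_tdm(slice_edges, n_slices):
    """Return (intra-slice edges, temporal edges) of the unrolled model."""
    intra, temporal = [], []
    for t in range(1, n_slices + 1):
        # Copy the single-slice edges, tagging every node with its time slice.
        intra += [(f"{u}_t{t}", f"{v}_t{t}") for u, v in slice_edges]
        if t > 1:
            # Only NS of the previous slice feeds PS of the current slice (Eq. 5.4).
            temporal.append((f"NS_t{t - 1}", f"PS_t{t}"))
    return intra, temporal


intra_edges, temporal_edges = unroll_tdm(SLICE_EDGES, n_slices=3)
print(len(intra_edges), temporal_edges)
```

With n time slices the minimal model carries only n − 1 temporal edges, whereas linking every node to its replica as in Eq. 5.2 would add one temporal edge per node per pair of adjacent slices.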

5.2 Error Model
From the TDM model of a given sequential circuit, an error model is designed where the
erroneous behavior of the circuit is compared with the ideal error-free behavior of the circuit.

5.2.1 Structure
The error model contains three sections: (i) error-free logic where the gates are ideal, (ii) error-prone logic where each gate goes wrong independently with an error probability ε and (iii) XOR based comparator logic that compares the error-free and error-prone primary outputs. At first two copies of the TDM model of the given sequential circuit are created, where one copy represents the error-free behavior of the circuit while the other represents the erroneous behavior of the circuit. Fig. 5.2. illustrates the error model for the sequential circuit given in Fig. 5.1.(a). The Error-free block includes nodes representing the ideal combinational part of all the time slices. The Error-prone block includes nodes representing the erroneous combinational part of all the time slices. At each time slice $t_k$ an XOR logic based node $C_{t_k}$ is added to compare the error-free and error-prone primary outputs $O_{t_k}$ and $O^e_{t_k}$ respectively. These additional nodes are included in the Comparator block. Note that at every time slice $t_k$ both error-free and error-prone logic have to be fed from the same primary input node $I_{t_k}$, and at the first time slice $t_1$ both have to connect to the same present state (PS) node $PS_{t_1}$. Also the present state nodes, $PS_{t_k}$ and $PS^e_{t_k}$, for all time slices $t_k$ are error-free, since we assume ideal latches. The comparator nodes $C_{t_k}$ and the primary input nodes $I_{t_k}$ for all time slices $t_k$ are also assumed to be error-free.

Figure 5.2. Error model obtained from TDM model with 3rd order temporal dependence. The nodes are grouped into an Error-free block, an Error-prone block and a Comparator block.

Any given probability function $P(x_1, x_2, \cdots, x_N)$ can be written as

$$P(x_1, \cdots, x_N) = \prod_v P(x_v \mid Pa(X_v)) \qquad (5.5)$$

where $Pa(X_v)$ are the parents of the variable $X_v$, representing its direct causes. (The probability of the event $X_i = x_i$ is denoted simply by $P(x_i)$ or by $P(X_i = x_i)$.) This factoring of the joint probability function can be denoted as a graph with links directed from the random variables representing the inputs of a gate to the random variable representing the output. Our error model is one such graph structure, where the probabilities $P(x_v \mid Pa(X_v))$ are provided by Conditional Probability Tables (CPTs) as shown in Table 5.1. It gives the CPTs for the nodes $O_{t_k}$, whose parents are $X1_{t_k}$ and $X2_{t_k}$, and $O^e_{t_k}$, whose parents are $X1^e_{t_k}$ and $X2^e_{t_k}$, from Fig. 5.2. The nodes are governed by NAND logic.
The CPTs represent the underlying logic function of each gate. In this setup it is easy to incorporate the individual gate error probability ε by just changing the probabilities in the CPT. For example, Table 5.1. gives the CPTs for error-free $O_{t_k}$ and error-prone $O^e_{t_k}$. In the error-prone CPT we just have to replace the probability values 0 by ε and 1 by 1 − ε. This indicates that there is an (ε × 100)% chance for the signal to go to state "1" when it has to go to state "0" and an (ε × 100)% chance for the signal to go to state "0" when it has to go to state "1".

Table 5.1. Conditional probabilistic tables for error-free and error-prone NAND logic
Error-free NAND
X1_tk     X2_tk     P(O_tk = 0)     P(O_tk = 1)
0         0         0               1
0         1         0               1
1         0         0               1
1         1         1               0
Error-prone NAND
X1^e_tk   X2^e_tk   P(O^e_tk = 0)   P(O^e_tk = 1)
0         0         ε               1-ε
0         1         ε               1-ε
1         0         ε               1-ε
1         1         1-ε             ε

Table 5.2. Conditional probabilistic table for error-prone NAND logic having variable gate error probabilities, ε0 and ε1
Error-prone NAND
X1^e_tk   X2^e_tk   P(O^e_tk = 0)   P(O^e_tk = 1)
0         0         ε1              1-ε1
0         1         ε1              1-ε1
1         0         ε1              1-ε1
1         1         1-ε0            ε0

Also, in our model we can provide unequal gate error probabilities for any variable $X^e_{t_k}$ at any time slice $t_k$, such that if $P(X_{t_k} = 0) = 1$, then $P(X^e_{t_k} = 0) = 1 - \varepsilon_0$ and $P(X^e_{t_k} = 1) = \varepsilon_0$; if $P(X_{t_k} = 1) = 1$, then $P(X^e_{t_k} = 0) = \varepsilon_1$ and $P(X^e_{t_k} = 1) = 1 - \varepsilon_1$. The corresponding CPT of this implementation for an error-prone NAND logic is given in Table 5.2. Here $\varepsilon_0$ is the error probability of logic "0" and $\varepsilon_1$ is the error probability of logic "1" at the output of a gate. Increasing $\varepsilon_0$ indicates that the circuit has more 0 → 1 errors, whereas increasing $\varepsilon_1$ indicates that the circuit has more 1 → 0 errors. With this implementation, we can use our error model to study the effect of these errors on the output of the circuit.
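
The CPT construction described in this section can be written down directly. The Python sketch below builds the table of a two-input gate from its logic function and the two error probabilities; setting both to the same ε gives the error-prone table of Table 5.1, and setting both to zero gives the error-free one. The function name `error_prone_cpt` and the sample values 0.02 and 0.08 are hypothetical.

```python
def nand(x1, x2):
    return 1 - (x1 & x2)


def error_prone_cpt(logic, eps0, eps1):
    """CPT of a two-input gate: maps (x1, x2) to {0: P(out = 0), 1: P(out = 1)}."""
    cpt = {}
    for x1 in (0, 1):
        for x2 in (0, 1):
            if logic(x1, x2) == 0:
                # Ideal output is 0: it goes to 1 with probability eps0.
                cpt[(x1, x2)] = {0: 1.0 - eps0, 1: eps0}
            else:
                # Ideal output is 1: it goes to 0 with probability eps1.
                cpt[(x1, x2)] = {0: eps1, 1: 1.0 - eps1}
    return cpt


error_free = error_prone_cpt(nand, 0.0, 0.0)
symmetric = error_prone_cpt(nand, 0.05, 0.05)   # single gate error probability
asymmetric = error_prone_cpt(nand, 0.02, 0.08)  # unequal eps0 and eps1
print(asymmetric)
```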


5.2.2 Inference Scheme
The inference scheme calculates the joint probability distribution $P(x_1, \cdots, x_N)$ efficiently by propagating the probability distributions $P(x_v \mid Pa(X_v))$ of locally connected variables, and thereby calculates the updated individual probability distributions of all random variables. The inference, or propagation of belief, on the probabilistic error model is done using the Hugin architecture [26, 27], which is an exact method. The inference on our model can be performed by forming clusters of nodes (cliques) which are directly dependent on each other and performing computations on those clusters, thereby enabling local computing. The network that is formed using these cliques is called a join tree, where information can be propagated between cliques using a message passing mechanism. Since extensive literature is already available, we do not explain the inference scheme in detail; interested readers may refer to [26, 27].
In order to obtain a join tree, a moral graph is created from the error model by adding undirected links between the parents of each common child node, and it is triangulated, to ensure that there are no chordless cycles with more than three nodes, to obtain a chordal graph. Then the cliques are formed from the chordal graph and they are linked accordingly to form the join tree. Adjacent cliques will have one or more common variables, which are termed separators. The following steps explain the formation of the join tree using the example circuit given in Fig. 5.3.(a) and its equivalent probabilistic model given in Fig. 5.3.(b).
• A moral graph, as shown in Fig. 5.3.(c), is formed from the original probabilistic network by adding undirected links between the parents of each common child node.
– Additional links (Fig. 5.3.(c)): G1 - G2, G3 - G4
– These additional links help to form complete subgraphs of each parent-child set. The nodes in each subgraph can form a clique and thereby enable local computation. But this graphical form does not produce the minimal join tree, because some of the independencies represented by the probabilistic network are lost due to its undirected nature. The dependency structure is however preserved. This non-minimal representation will eventually lead to high computational needs, even though it does not sacrifice accuracy.
• To get a more minimal representation of the join tree which can capture the conditional independencies, a chordal graph is formed. It is obtained by triangulating the moral graph. Triangulation is the process of breaking all cycles in the graph into a composition of cycles over just three nodes by adding additional links. To control the computational demands, the goal is to form a chordal graph with the minimum number of additional links.
– Additional links (Fig. 5.3.(c)): No additional links, since there are no cycles with more than three nodes.
• The cliques are formed from the chordal graph and they are linked accordingly to form the join tree (Fig. 5.3.(d)). Adjacent cliques will have one or more common variables, which are termed separators. In Fig. 5.3.(d), between cliques C1 and C2, the variables {G3, G4} form the separator set S1. Also, any two cliques sharing a set of common variables will have these common variables present in all the cliques that lie on the connecting path between these two cliques. In Fig. 5.3.(d), the cliques C1 and C4 share the common variable {G3} and the only clique, C2, in their path also contains {G3}.
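
The moralization step in the first bullet can be sketched as follows. The parent lists in `PARENTS` are an assumption: they describe one wiring that reproduces the additional links and the cliques quoted in the text, and the actual circuit of Fig. 5.3.(a) may differ. The function name `moralize` is likewise hypothetical.

```python
# Assumed parent sets for nodes G3..G6 (G1 and G2 are taken to be root nodes).
PARENTS = {
    "G3": ["G1", "G2"],
    "G4": ["G1"],
    "G5": ["G3", "G4"],
    "G6": ["G3"],
}


def moralize(parents):
    """Return (undirected skeleton, links added between co-parents)."""
    skeleton, added = set(), set()
    for child, pars in parents.items():
        for p in pars:
            skeleton.add(frozenset((p, child)))  # drop edge directions
    for pars in parents.values():
        for i, p in enumerate(pars):
            for q in pars[i + 1:]:
                link = frozenset((p, q))
                if link not in skeleton:
                    added.add(link)              # "marry" the parents
    return skeleton, added


skeleton, added_links = moralize(PARENTS)
print(sorted(tuple(sorted(link)) for link in added_links))
```

Under this assumed wiring the moral graph already consists of triangles only, which matches the remark that triangulation adds no further links in this example.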
Figure 5.3. (a) Digital logic circuit (b) Corresponding probabilistic model (c) Moral graph obtained by adding undirected links between parents of common child nodes (d) Corresponding join tree obtained. The join tree in (d) has cliques C1 = {G1, G3, G4}, C2 = {G3, G4, G5}, C3 = {G1, G2, G3} and C4 = {G3, G6}, with separators S1 = {G3, G4}, S2 = {G1, G3} and S3 = {G3}.

To perform local computation, each clique $C_i$ is associated with probability potentials $\phi_{C_i}$ and each separator $S_j$ is associated with probability potentials $\phi_{S_j}$. Also from here on, the set of variables of any clique $C_i$ or separator $S_j$ will be represented in bold letters as $\mathbf{C_i}$ or $\mathbf{S_j}$. The inference is performed as follows,
• Initialization: Initially all the entries in the clique potentials and separator set potentials are assigned the value 1. In the join tree the variables of the given probabilistic network are divided into separate cliques. Each clique will have its own joint probability governed by its variables. But eventually we need to realize the joint probability of the entire network as given in Eqn. 3.2. To achieve this, for each variable $Y_v$, a particular clique $C_i$ which contains $Y_v$ along with its parents $Pa(Y_v)$ is selected and the conditional probability potential of $Y_v$ from its CPT is multiplied into the clique potential $\phi_{C_i}$,

$$\phi_{C_i} = \phi_{C_i} P(y_v \mid Pa(Y_v)) \qquad (5.6)$$

– Example: (Fig. 5.3.(d))

$$\phi_{C3} = \phi_{C3} P(G3 \mid G1, G2) \qquad (5.7)$$

• Message passing: After initialization the clique potentials are not consistent with their separator potentials, so the joint probability given in Eqn. 3.2 is not perfectly realized. To achieve this consistency, message passing is performed. At first the marginal probability of the separator variables has to be computed from the probability potential of clique $C_p$, and then it is used to scale the probability potential of clique $C_q$.
– Marginalization:

$$\phi_{S_r}^{updated} = \sum_{\mathbf{C_p} \setminus \mathbf{S_r}} \phi_{C_p} \qquad (5.8)$$

– Scaling:

$$\phi_{C_q} = \phi_{C_q} \frac{\phi_{S_r}^{updated}}{\phi_{S_r}} \qquad (5.9)$$

– Example: (Fig. 5.3.(d)) Message passing from C2 to C1

$$\phi_{S1}^{updated} = \sum_{G5} \phi_{C2} \qquad (5.10)$$

$$\phi_{C1} = \phi_{C1} \frac{\phi_{S1}^{updated}}{\phi_{S1}} \qquad (5.11)$$

– The transmission of this scaling factor is the primary necessity for updating and message passing. Eventually the joint probability of the entire network can be represented as,

$$P(y_1, \cdots, y_N) = \frac{\prod_i \phi_{C_i}}{\prod_j \phi_{S_j}} \qquad (5.12)$$

– Message passing in a join tree has to be done in both directions, from root to leaf, termed the outward pass, and from leaf to root, termed the inward pass. An inward pass followed by an outward pass will completely update all the cliques in the join tree.
• Individual probability distribution calculation: The individual probability distribution for each variable can then be calculated by choosing a clique $C_i$ containing the variable $Y_v$ and marginalizing its potential $\phi_{C_i}$ over all the other variables $\mathbf{C_i} \setminus Y_v$. This probability distribution $P(y_v)$ is given as,

$$P(y_v) = \sum_{\mathbf{C_i} \setminus Y_v} \phi_{C_i} \qquad (5.13)$$

– Example: (Fig. 5.3.(d))

$$P(G6) = \sum_{G3} \phi_{C4} \qquad (5.14)$$
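
The three steps above can be traced on a deliberately tiny join tree. The Python sketch below is not the Hugin implementation: it uses a hypothetical chain A → B → C of noisy inverters (error probability 0.1, P(A = 1) = 0.8), two cliques {A, B} and {B, C} with separator {B}, and plain dictionaries as potentials. All names and numbers are assumptions chosen for illustration.

```python
from itertools import product


def marginalize(potential, variables, keep):
    """Sum a potential (dict: assignment tuple -> value) down to the variables in keep."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for assignment, value in potential.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + value
    return out


def absorb(target, target_vars, sep_vars, sep_old, sep_new):
    """Scale a clique potential by sep_new / sep_old, as in Eq. 5.9."""
    idx = [target_vars.index(v) for v in sep_vars]
    scaled = {}
    for assignment, value in target.items():
        key = tuple(assignment[i] for i in idx)
        ratio = sep_new[key] / sep_old[key] if sep_old[key] else 0.0
        scaled[assignment] = value * ratio
    return scaled


def noisy_not(x, y, eps=0.1):
    """P(output = y | input = x) for an inverter that errs with probability eps."""
    return 1.0 - eps if y == 1 - x else eps


p_a = {0: 0.2, 1: 0.8}
states = list(product((0, 1), repeat=2))

# Initialization (Eq. 5.6): multiply each CPT into one clique; separator starts at 1.
phi_ab = {(a, b): p_a[a] * noisy_not(a, b) for a, b in states}
phi_bc = {(b, c): noisy_not(b, c) for b, c in states}
phi_b = {(0,): 1.0, (1,): 1.0}

# Message from clique {A, B} to clique {B, C} (Eqs. 5.8 and 5.9).
phi_b_new = marginalize(phi_ab, ["A", "B"], ["B"])
phi_bc = absorb(phi_bc, ["B", "C"], ["B"], phi_b, phi_b_new)

# Individual distribution of C (Eq. 5.13).
p_c = marginalize(phi_bc, ["B", "C"], ["C"])
print(phi_b_new, p_c)
```

Because {B, C} is a leaf here, this single message already makes its potential equal to P(B, C); in a larger join tree the full inward and outward passes would be needed before marginals are read off.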

5.2.3 Output Error Probability
The output error probability of a given sequential circuit can be obtained by calculating the
probability, P(Ctn = 1) of the comparator node Ctn at the final time slice tn by inferencing the
corresponding error model. Each sequential circuit, based on its underlying structure, will need
a different number of time slices. During inference, if at any time instance a random variable
representing a signal in the error-prone logic picks up a wrong value, this value will propagate
for a considerable amount of time before the signal gets back to its original value. This pattern
will keep on repeating through several samples. Due to this phenomenon the random variable
takes some time to converge at one particular probability distribution. So for each sequential
circuit we have to iteratively calculate the output error probability by increasing the time slices
and stop when the output error probabilities of consecutive time slices converge. This is the
reason for having comparator nodes at every time slice. The number of time slices needed by
a sequential circuit is purely dependent on the underlying functionality and circuit structure.
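The iterative stopping rule described above can be sketched as a simple loop. This is only an illustration: `infer` stands for a hypothetical routine that builds the error model with n time slices and returns P(C = 1) at every slice, the tolerance `tol` is an assumed convergence threshold (the text does not state one), and `toy_infer` is an invented stand-in for real inference.

```python
def slices_until_convergence(infer, n_start=2, tol=1e-4, n_max=64):
    """Add time slices until P(C=1) at the last two slices agree within tol."""
    n = n_start
    while n <= n_max:
        p = infer(n)  # p[k-1] = P(C_tk = 1) for k = 1..n
        if abs(p[-1] - p[-2]) < tol:
            return n, p[-1]
        n += 1
    raise RuntimeError("no convergence within n_max time slices")

def toy_infer(n, p_inf=0.06, rate=0.5):
    """Invented stand-in: output error approaching p_inf geometrically."""
    return [p_inf * (1 - rate ** k) for k in range(1, n + 1)]

n_slices, p_err = slices_until_convergence(toy_infer)
```

A real implementation would replace `toy_infer` with a call into the inference tool for the unrolled error model.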

5.3 Experimental Results
The output error probabilities for various sequential circuits are calculated using our experimental setup. We have performed our experiments on standard MCNC and ISCAS benchmark circuits. We have used the HUGIN tool [50] to perform inference on the error model and
we validate these results with equivalent HSpice simulations.

[Flowchart: take the error model for a given digital logic circuit with a given ε and a fixed number of time slices; set P(PS=0) = P(PS=1) = 0.5 at the first time slice and P(I=0) = P(I=1) = 0.5 at all time slices; inference the error model; read the output error probability P(C=1) at the final time slice; if the output error probability has not converged, add 1 time slice and repeat; otherwise report P(C=1) at the final time slice.]

Figure 5.4. Flowchart for experimental procedure
5.3.1 Experimental Procedure
Fig. 5.4. gives the experimental procedure undertaken to obtain the output error probabilities. At first for a given ε value the probabilistic error model is obtained. The primary input
nodes Itk for all the time slices tk and the present state nodes PSt1 for the first time slice t1
are set to be equally probable to have state ”0” or state ”1”. The model is then inferenced
and the output error probability is obtained by noting the probability of state ”1” at the comparator node, P(Ctk = 1) at every time slice tk . This inference is an exact one and it also
handles reconvergence and spatio-temporal dependencies. P(Ctn = 1) of the final time slice tn
and P(Ctn−1 = 1) of the previous time slice tn−1 are checked for convergence. If they do not
converge the time slices at both error-free and error-prone blocks are increased by 1 and the
procedure is repeated. Thus the circuits are inferenced with different numbers of time slices
iteratively and stopped when the output error probability values converge at consecutive time
slices.

Table 5.3. Output error probabilities at ε = 0.001, 0.003, 0.005, 0.01
Circuits   ε = 0.001   ε = 0.003   ε = 0.005   ε = 0.01
train11    0.0055      0.0161      0.0265      0.0511
lion       0.0060      0.0177      0.0288      0.0545
lion9      0.0069      0.0200      0.0326      0.0614
bbara      0.0074      0.0213      0.0341      0.0621
bbtas      0.0072      0.0211      0.0344      0.0653
s27        0.0075      0.0220      0.0357      0.0676
mc         0.0084      0.0246      0.0399      0.0747

5.3.2 Output Error Probabilities
Table 5.3. gives the output error probabilities for gate error probabilities, ε = 0.001, 0.003,
0.005, 0.01. For a slight increase in ε value from 0.001 to 0.003, there is at least 2.87 fold
increase in the corresponding output error probabilities. Also, for a considerably low influx
of error at the gates, ε = 0.005 (0.5%), the output error probability of most of the circuits
exceeds 3%, with mc producing the highest output error probability of 3.99%, which is almost 8
fold higher than the individual gate error probability. The same can be seen for ε = 0.01 (1%),
where the output error probability of most of the circuits exceeds 6%.
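The fold increases quoted above follow directly from the entries of Table 5.3. A short sketch of the arithmetic, using only the tabulated values (the dictionary layout is an assumption of this example):

```python
# Output error probabilities from Table 5.3, in the order
# eps = 0.001, 0.003, 0.005, 0.01.
table_5_3 = {
    "train11": (0.0055, 0.0161, 0.0265, 0.0511),
    "lion":    (0.0060, 0.0177, 0.0288, 0.0545),
    "lion9":   (0.0069, 0.0200, 0.0326, 0.0614),
    "bbara":   (0.0074, 0.0213, 0.0341, 0.0621),
    "bbtas":   (0.0072, 0.0211, 0.0344, 0.0653),
    "s27":     (0.0075, 0.0220, 0.0357, 0.0676),
    "mc":      (0.0084, 0.0246, 0.0399, 0.0747),
}

# Fold increase in output error when eps goes from 0.001 to 0.003.
fold_increase = {name: row[1] / row[0] for name, row in table_5_3.items()}
min_fold = min(fold_increase.values())

# Output error relative to the gate error probability at eps = 0.005.
amplification = {name: row[2] / 0.005 for name, row in table_5_3.items()}
worst_circuit = max(amplification, key=amplification.get)
```

Here `min_fold` is the smallest ratio over all circuits, and `amplification[worst_circuit]` is the output-to-gate error ratio for the circuit with the largest output error at ε = 0.005.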

5.3.3 Number of Time Slices
Fig. 5.5. shows the number of time slices needed by bbara and bbtas for ε = 0 − 0.006. It
can be seen that the circuits need fewer time slices for small ε values and that the
needed number of time slices gradually increases along with the ε value. For bbtas, the needed
number of time slices settles at 5 at ε = 0.0013, while bbara takes up more time slices and
finally settles at 9 for ε = 0.005. The number of time slices needed is completely dependent
on the circuit structure. The following studies will shed more light on this aspect.

[Plot: number of time slices (vertical axis) versus gate error probability ε (horizontal axis) for bbara and bbtas.]

Figure 5.5. Number of time slices needed by bbara and bbtas for ε = 0 − 0.006

5.3.4 Output Error Propagation Across Time Slices
Fig. 5.6.(a) & (b) give the transition of output error probability across time slices for
ε = 0.01. Here we show two sets of results that show the difference in the transition of output
error across time slices. Fig. 5.6.(a) shows the output error transition for bbara, s27 and
mc, where the output error increases gradually across time slices and finally converges.
In Fig. 5.6.(b), which shows the output error transition for lion, lion9 and bbtas, the
output error reaches a maximum value and then gradually gets back to a steady value. This
behavior can be attributed to the relation between the present state nodes and the primary input
nodes, which have a random unbiased state distribution. If the present state nodes are closely
connected to the input nodes, resulting in an unbiased state distribution, the output error
will not be significant. With a biased state distribution in the present state nodes, the output
error becomes more significant and reaches a maximum value in the early time slices as shown
in Fig. 5.6.(b).

[Plots: output error probability (vertical axis) versus number of time slices (horizontal axis); panel (a) bbara, s27, mc; panel (b) lion, lion9, bbtas.]

Figure 5.6. (a) Transition of output error probability across time slices for bbara, s27 and mc
with ε = 0.01 (b) Transition of output error probability across time slices for lion, lion9 and
bbtas with ε = 0.01

[Plots: output probability (vertical axis) versus number of time slices (horizontal axis), with error-free (O) and error-prone (Oe) curves; panel (a) bbtas, lion9, lion; panel (b) bbara.]

Figure 5.7. (a) Transition of error-free and error-prone output probabilities across time slices
for bbtas, lion9 and lion with ε = 0.01 (b) Transition of error-free and error-prone output
probabilities across time slices for bbara with ε = 0.01

[Bar chart: output error probability per circuit for the two settings (ε0 = 0.01, ε1 = 0.02) and (ε0 = 0.02, ε1 = 0.01).]

Figure 5.8. Output error probabilities for (ε0 = 0.01, ε1 = 0.02) and (ε0 = 0.02, ε1 = 0.01)
Fig. 5.7.(a) & (b) give the transition of error-free (O) and error-prone (Oe) output probabilities across time slices for ε = 0.01. The results in Fig. 5.7.(a) show that, for some circuits,
the temporal dependence of the erroneous output conforms with that of the ideal error-free
output. In circuits like bbara this is not the case, as shown in Fig. 5.7.(b). Due to this,
the output error probability in bbara takes more time to converge.
5.3.5 Output Error Probabilities for ε0 ≠ ε1
In our model we can provide unequal gate error probability values, ε0 and ε1, to study the
effect of 0 → 1 and 1 → 0 errors on the output of a circuit. Fig. 5.8. gives the output error
probabilities for (ε0 = 0.01, ε1 = 0.02) and (ε0 = 0.02, ε1 = 0.01). It can be clearly seen that
when ε0 > ε1 the output error probabilities are higher for all circuits. This indicates that a
0 → 1 error can make the outputs more erroneous and signals that stay at logic ”0” more often
can be vulnerable to this effect. This might be favorable in CMOS technology, since the 0 → 1
bit flip is much harder than the 1 → 0 bit flip, pMOS being less error prone than nMOS
because holes are tougher to dislodge by external particle bombardment. It can
also signify that circuits with series pMOS connections can be less error prone as compared
to circuits with parallel pMOS connections.

Table 5.4. Output error probabilities at ε = 0.001, 0.003, 0.005, 0.01 compared with HSpice
simulation results
           ε = 0.001                        ε = 0.003
Circuits   Error model  HSpice  % diff      Error model  HSpice  % diff
train11    0.0055       0.0057  3.51        0.0161       0.0159  1.26
lion       0.0060       0.0063  4.76        0.0177       0.0171  3.51
lion9      0.0069       0.0066  4.55        0.0200       0.0208  3.85
bbara      0.0074       0.0070  5.71        0.0213       0.0208  2.40
bbtas      0.0072       0.0069  4.35        0.0211       0.0203  3.94
s27        0.0075       0.0080  6.25        0.0220       0.0217  1.38
mc         0.0084       0.0088  4.55        0.0246       0.0250  1.60

           ε = 0.005                        ε = 0.01
Circuits   Error model  HSpice  % diff      Error model  HSpice  % diff
train11    0.0265       0.0263  0.76        0.0511       0.0497  2.82
lion       0.0288       0.0277  3.97        0.0545       0.0522  4.41
lion9      0.0326       0.0339  3.83        0.0614       0.0607  1.15
bbara      0.0341       0.0345  1.16        0.0621       0.0595  4.37
bbtas      0.0344       0.0354  2.82        0.0653       0.0671  2.68
s27        0.0357       0.0345  3.48        0.0676       0.0638  5.96
mc         0.0399       0.0391  2.05        0.0747       0.0733  1.91

5.3.6 Validation Using HSpice Simulation
We validate our results by comparing them with HSpice simulation results. Even though
our error model can be used for any technology, the lack of benchmark circuits in any of
the other emerging technologies has forced us to compare our model with simulations using
45nm CMOS technology. Using external voltage sources, error can be induced in any signal
and it can be modeled using HSpice [43]. In our HSpice model we have induced error, using
external voltage sources, in every gate’s output. Let Of be the original error-free
output signal, Op the error-prone output signal and E the piecewise linear
(PWL) voltage source that induces error. The basic idea is that the signal Op is dependent
on the signal Of and the voltage E. Any change of voltage in E will be reflected in Op. If
E = 0 V, then Op = Of, and if E = Vdd (supply voltage), then Op is the complement of Of, thereby inducing
error. The data points for the PWL voltage source E are provided by computations on a
finite automaton which incorporates the individual gate error probability ε. The width of every
error pulse is fixed to 1 ns. The results are obtained by running the circuits for 5 million
random input vectors and sampling the comparator outputs. Table 5.4. gives the comparison
between the output error probabilities obtained from inference in the error model and HSpice
simulation for different circuits with gate error probability ε = 0.001, 0.003, 0.005, 0.01. The
% difference is calculated as ((Error model − HSpice) / HSpice) × 100. The highest relative
difference between the inference results and HSpice results is just 6.25% and on an average
the relative difference is only 4.43%.
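As a worked instance of the % difference formula, the ε = 0.001 column of Table 5.4 can be recomputed as below. The tabulated values appear to be magnitudes, so this sketch takes the absolute value of the difference; that choice, and the variable names, are assumptions of the example.

```python
# (error model, HSpice) output error probabilities at eps = 0.001, from Table 5.4.
eps_0001 = {
    "train11": (0.0055, 0.0057),
    "lion":    (0.0060, 0.0063),
    "lion9":   (0.0069, 0.0066),
    "bbara":   (0.0074, 0.0070),
    "bbtas":   (0.0072, 0.0069),
    "s27":     (0.0075, 0.0080),
    "mc":      (0.0084, 0.0088),
}

def percent_diff(model, hspice):
    """Magnitude of ((Error model - HSpice) / HSpice) x 100."""
    return abs(model - hspice) / hspice * 100

diffs = {name: round(percent_diff(m, h), 2) for name, (m, h) in eps_0001.items()}
largest = max(diffs, key=diffs.get)
```

The entry `diffs[largest]` corresponds to the highest relative difference reported for this column.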

5.4 Discussion
We have proposed a compact probabilistic model that can handle error in sequential logic
and we have presented experimental results on ISCAS and MCNC benchmark circuits. We
have observed that for low gate error probabilities like ε = 0.005(0.5%), the output error
probabilities are at least 5 fold higher and at most 8 fold higher. Also, our observations
showed that the degree of temporal dependence differs for various sequential circuits. Another
interesting observation indicated that 0 → 1 errors affect the circuit output more than 1 → 0
errors. We have also validated our model using HSpice simulation results and the average %
difference is only 4.43%.


CHAPTER 6
REDUNDANCY SCHEMES FOR ERROR MITIGATION

Reliable computing using unreliable circuit elements can be accomplished using the concept of redundancy. As the name suggests, the basic idea of ’redundancy’ is based on analyzing any given erroneous circuit through multiple redundant components or processes. The
outputs from these redundant components or processes are subjected to a voting scheme where
the majority value of the signal under consideration is chosen to be its ultimate error-free
value. In this chapter, the following unique redundancy schemes are discussed.
• Temporal redundancy scheme - the redundancy is applied in the input space by providing multiple instances of the same input combinations, which are highly probable to
create an error in the output.
• Spatial redundancy scheme - the redundancy is applied in the intermediate signal space
by providing multiple copies of the same gates, whose output signals are erroneous.
• Hybrid redundancy scheme - the redundancy is applied in both input space and intermediate signal space, in order to achieve comprehensive error reduction in the given
erroneous circuit.


6.1 Temporal Redundancy Scheme Using Triple Temporal Redundancy (TTR) Technique
Triple Temporal Redundancy (TTR) is an error reduction technique, where specific input
combinations are applied three times and from the resulting simulation outputs the majority
value is accepted as the correct one. Performing this technique on the entire input space will
result in a large number of unnecessary calculations, leading to high simulation time. In order to
make it more efficient, the technique has to be applied only on a subset of the input space,
where each input combination has a high chance of giving a wrong output compared to those
in the rest of the input space. So as a preliminary step for this redundancy technique, the
above mentioned subset on the input space should be determined. This can be achieved by
using the model that calculates the maximum output error and the corresponding worst-case
input combination, explained in Chapter 4.

6.1.1 Determination of the Set of Worst-Case Input Combinations for Selective Redundancy in TTR
As discussed in Chapter 4, the model that calculates maximum output error probability,
determines a single worst-case input combination which has the highest probability to provide
an error in the output. In order to get a set of worst-case input combinations, the search process
explained in Section 4.1.2 can be extended as follows (Fig. 6.1.),
• After obtaining the most probable worst-case input combination, i_MAP, the corresponding MAP probability, MAP(i_MAP, o), is noted.
• The number of input combinations needed for TTR is decided and a lower bound of
MAP(i, o), called LB_MAP, with reference to MAP(i_MAP, o), is chosen in order to collect the set of worst-case input combinations. This lower bound can be adjusted based
on the number of input combinations needed for TTR.
• The search process for the rest of the worst-case input combinations apart from i_MAP
starts from the node $N_{I}^{i_{MAP}}$ and backtracks through the depth-first branch and bound
search tree, collecting all the input combinations which have MAP(i, o) above the chosen lower bound. This search process is stopped once the target number of worst-case
input combinations for TTR is reached.
• Finally, the resulting set of worst-case input combinations is placed in a library which
can be used as a reference while performing TTR.

[Diagram: search tree over inputs I1, I2, I3 with lower bound LB_MAP and 2 worst-case input combinations needed for TTR. Starting from {I1=0,I2=0,I3=0} with MAP above LB_MAP, the backtrack route rejects {I1=0,I2=0,I3=1} (MAP below LB_MAP) and accepts {I1=0,I2=1,I3=0} (MAP above LB_MAP); the remaining branches are ignored. Library of worst-case input combinations: {I1=0,I2=0,I3=0}, {I1=0,I2=1,I3=0}.]

Figure 6.1. Determination of the set of worst-case input combinations by backtracking through
the search tree used for MAP computation given in Fig. 4.8.

Note that the search process for MAP computation is conducted in a binary tree, where the
root node $N$ and the intermediate nodes $N_{I_{inter}}^{i_{inter}}$ are connected to only two child nodes. In the
process for determining the set of worst-case input combinations, the basic idea is that every
input combination i in the vicinity of i_MAP should be checked for the possibility of being a
worst-case one by comparing the corresponding joint probability MAP(i, o) with the lower
bound LB_MAP. Also, note that whenever an intermediate node with two unvisited child
nodes is encountered, the search path goes through the ’0’ edge first and then to the ’1’ edge.
In the example given in Fig. 6.1., the backtracking starts from node $N_{\{I1,I2,I3\}}^{\{I1=0,I2=0,I3=0\}}$,
goes to the intermediate node $N_{\{I1,I2\}}^{\{I1=0,I2=0\}}$ and evaluates the node $N_{\{I1,I2,I3\}}^{\{I1=0,I2=0,I3=1\}}$ to check
whether the corresponding condition MAP({I1 = 0, I2 = 0, I3 = 1}, o) > LB_MAP is true.
Since it is not true, the input combination {I1 = 0, I2 = 0, I3 = 1} is not categorized as a
worst-case input combination. Then the backtracking search goes one level above to node
$N_{\{I1\}}^{\{I1=0\}}$ and gets to its unvisited child node $N_{\{I1,I2\}}^{\{I1=0,I2=1\}}$, which has two unvisited child nodes.
The child node along the ’0’ edge, $N_{\{I1,I2,I3\}}^{\{I1=0,I2=1,I3=0\}}$, is visited first and the corresponding joint
probability, MAP({I1 = 0, I2 = 1, I3 = 0}, o), is checked for the condition MAP({I1 = 0, I2 =
1, I3 = 0}, o) > LB_MAP. Since it is true, the input combination {I1 = 0, I2 = 1, I3 = 0} is
considered a worst-case one and added to the library. Since the target of 2 worst-case input
combinations for TTR is reached, the search is stopped.
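The collection of worst-case input combinations in this example can be sketched as follows. This is a simplified illustration, not the branch and bound search itself: it walks the leaves in depth-first order ('0' edge before '1' edge) starting after i_MAP, without the bound-based pruning, and the `map_score` lookup with its numeric values is hypothetical.

```python
from itertools import product

def collect_worst_case(i_map, map_score, lb_map, target):
    """Collect up to `target` input combinations with MAP(i, o) > lb_map.

    Leaves are visited in depth-first order, taking the '0' edge before
    the '1' edge, starting right after the MAP combination i_map.
    """
    library = [i_map]
    started = False
    for combo in product((0, 1), repeat=len(i_map)):
        if not started:
            started = combo == i_map
            continue
        if len(library) >= target:
            break
        if map_score(combo) > lb_map:
            library.append(combo)
    return library

# Hypothetical MAP(i, o) values mirroring the outcome shown in Fig. 6.1.
scores = {(0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.25}
library = collect_worst_case(
    i_map=(0, 0, 0),
    map_score=lambda combo: scores.get(combo, 0.0),
    lb_map=0.20,
    target=2,
)
```

With these assumed scores the walk rejects {I1=0, I2=0, I3=1}, accepts {I1=0, I2=1, I3=0}, and then stops because the target of 2 is reached.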

6.1.2 Experimental Setup for TTR
In order to achieve efficient computation, the unnecessary simulation runs are avoided by
incorporating selective redundancy in TTR. This is performed through the following steps,
• Performing TTR only on the worst-case input combinations instead of running it on the
entire input space.
• Deciding on the third run of TTR based on the first two runs.
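
[Block diagram: a random input generator drives both the erroneous circuit (gates with error probability ε, outputs N7e and N8e) and the ideal circuit (outputs N7 and N8). The erroneous outputs pass through the majority value calculator (N7m, N8m) and the run identifier; comparators C1 and C2 compare the majority values with the ideal outputs. The decision block uses the library of worst-case input combinations and exchanges TTR_flag and Run_flag.]

Figure 6.2. Experimental setup for TTR incorporating selective redundancy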
Fig. 6.2. illustrates the experimental setup for TTR. The descriptions of the various segments
are as follows,
• Erroneous Circuit:- This is the circuit under consideration, where each gate has a gate
error probability ε.
• Ideal Circuit:- The fictitious ideal counterpart of the erroneous circuit, under consideration, used to study the erroneous primary output signals.
• Comparators:- XOR gates used to compare between the erroneous primary output signals and their ideal counterparts, in order to detect the occurrence of output error.


• Library of worst-case input combinations:- Collection of input combinations, which
have high probability of inducing error in the circuit output as compared to the rest of
the input space.
• Random Input Generator:- This segment produces random digital input signals which
are applied to the primary inputs of both the erroneous circuit and its ideal counterpart.
• Decision Block:- This segment decides on the necessity of performing TTR and the
necessity of the third run. It gets the needed information from the library of worst-case input combinations, the random input generator and the majority value calculator.
It also sends the decision information to the random input generator and the majority
value calculator.
• TTR_flag:- A boolean flag that triggers the necessity of performing TTR.
– TTR_flag = 1, implies TTR needed.
– TTR_flag = 0, implies TTR not needed.
• Run_flag:- A boolean flag that triggers the necessity of performing the third run.
– Run_flag = 1, implies third run is needed.
– Run_flag = 0, implies third run is not needed.
• Majority Value Calculator:- This segment compares the output values from the TTR
runs and determines the majority value of the primary output signals.
• Run Identifier:- Compares the output values from the first two TTR runs and sends the
information to the decision block.


The procedure to perform TTR is as follows,
• An input combination generated from the random input generator is provided to both
the erroneous circuit and the ideal circuit. The same input combination is also sent to
the decision block.
• The generated input combination is compared with the set of worst-case input combinations. If it matches any of the worst-case input combinations, then TTR_flag = 1,
else TTR_flag = 0.
• TTR_flag is checked for its status.
– If TTR_flag = 1, then the same input combination is applied again and the value
is compared with that of the previous run. Then Run_flag is triggered.
– If TTR_flag = 0, then the next input combination is applied. Run_flag is not
triggered.
• Run_flag is checked for its status.
– If Run_flag = 1, then the input combination is applied for the third time. The
majority value from the three runs is determined.
– If Run_flag = 0, then the input combination is not applied for the third time. The
value from the second run is decided as the majority value.
• Then the majority value of the erroneous output signal and the ideal output signal are
fed to an XOR comparator.
• The output error probabilities are calculated from the comparator outputs.
Note that the decision block can run in parallel with the circuit and so when the inputs are
not from the library, there is no penalty for the library search. Also note that if it is decided
that TTR is needed, then the XOR comparators wait for the majority values to be determined
before comparing the erroneous and ideal signals, thereby avoiding the occurrence of unnecessary samples in their output signals. Also in the example given in Fig. 6.2., if TTR is not
performed, then N7m = N7e, N8m = N8e. If TTR is performed, then N7m = majority of N7e
from the TTR runs, N8m = majority of N8e from the TTR runs.
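The selective TTR procedure above can be sketched for a single input combination as follows. The sketch assumes that Run_flag is raised exactly when the first two runs disagree, which is one reading of the run identifier's role; `run_circuit` is a hypothetical callable standing in for one simulation of the erroneous circuit, and `scripted` is a test helper invented for this example.

```python
def majority3(a, b, c):
    """Bitwise majority of three bits: AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)

def selective_ttr(inputs, run_circuit, library):
    """Return (accepted output bits, number of runs used) for one input."""
    first = run_circuit(inputs)
    ttr_flag = inputs in library        # TTR only for worst-case inputs
    if not ttr_flag:
        return first, 1
    second = run_circuit(inputs)
    run_flag = first != second          # assumed: third run only on disagreement
    if not run_flag:
        return second, 2                # second run is accepted as the majority
    third = run_circuit(inputs)
    voted = tuple(majority3(a, b, c) for a, b, c in zip(first, second, third))
    return voted, 3

def scripted(outputs):
    """Hypothetical circuit stub that replays a fixed list of outputs."""
    it = iter(outputs)
    return lambda _inputs: next(it)

library = {(0, 0, 0), (0, 1, 0)}
out_a, runs_a = selective_ttr((1, 1, 1), scripted([(1, 0)]), library)
out_b, runs_b = selective_ttr((0, 0, 0), scripted([(1, 0), (1, 0)]), library)
out_c, runs_c = selective_ttr((0, 1, 0), scripted([(1, 0), (0, 0), (0, 0)]), library)
```

The three calls exercise the three paths: an input outside the library (one run), a library input whose first two runs agree (two runs), and a library input whose first two runs disagree (three runs with a vote).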

6.2 Spatial Redundancy Scheme Using Cascaded Triple Modular Redundancy (CTMR)
Technique
In triple modular redundancy, three copies of any erroneous gate are created and from
their outputs the majority value is accepted as the correct one. Let us say that the three copies
of the erroneous digital signal are represented as A, B and C. Then the majority value out of
A, B, C can be determined by implementing the boolean function AB + BC + AC. The most
important aspect to note here is that the error probability of any erroneous gate will reduce
when subjected to triple modular redundancy through the majority gate. An even better error
probability can be achieved using CTMR, where two cascading levels of triple modular redundancy are applied by replicating the erroneous gate nine times and producing three majority
outputs, which in turn are supplied to another majority gate to get a final value. Fig. 6.3. illustrates the CTMR technique used to perform spatial redundancy. The signal N8e, whose initial
error probability is ε, attains a better error probability εs < ε when subjected to CTMR.
To achieve efficient spatial redundancy, instead of applying CTMR to all gates in the circuit,
only some selected gates can be chosen. To determine these gates, a sensitivity analysis can
be performed on the corresponding probabilistic error model of the circuit.
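The reduction from ε to εs can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that the copies of the gate fail independently and that the majority gates themselves are error-free; both are simplifying assumptions of this example and may not hold for the model evaluated in this chapter.

```python
def majority(a, b, c):
    """Majority of three bits via the boolean function AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)

def tmr_error(eps):
    """Probability that at least two of three independent copies are wrong."""
    return 3 * eps**2 * (1 - eps) + eps**3

def ctmr_error(eps):
    """Two cascaded TMR levels: nine copies, three voters, one final voter."""
    return tmr_error(tmr_error(eps))

eps = 0.001
eps_tmr = tmr_error(eps)
eps_s = ctmr_error(eps)
```

Under these assumptions each voting level roughly squares the error probability (up to a factor of 3), which is why a second cascaded level lowers it further.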


[Block diagram: ideal circuit and erroneous circuit with comparators C1 and C2. The gate producing N8e is replicated nine times (each copy with error probability ε); three majority gates (MG) feed a final majority gate, giving N8e with error probability εs < ε. Inset: majority gate (MG) with inputs A, B, C and output "Majority of A, B, C".]

Figure 6.3. Spatial redundancy scheme using CTMR technique incorporating majority logic
6.2.1 Sensitivity Analysis for Selective Redundancy in CTMR
To determine the sensitivity of an erroneous node, that node is perturbed so that it will
have a different gate error probability εs, which can be similar to the one obtained by performing CTMR on that particular node, while providing gate error probability ε for all other
nodes. The output error probabilities for this setup are calculated as Pεs(Oi). Then the output
error probabilities, Pε (Oi ), are obtained with gate error probability ε fixed in all the erroneous
nodes. The difference between the output error probabilities, Pεs (Oi ) and Pε (Oi ), determines
the node’s degree of influence or sensitivity. Nodes are ranked on the basis of the decreasing
order of the degree of influence and the top ranked nodes are selected as the sensitive nodes.
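The ranking step can be sketched as below. The function `output_error` is a hypothetical stand-in for inference on the probabilistic error model: it takes a mapping from node name to gate error probability and returns an output error probability. The toy weighted-sum model and its weights are invented purely so that the example runs.

```python
def rank_sensitive_nodes(nodes, output_error, eps, eps_s, top_k):
    """Rank nodes by |P_eps(O) - P_eps_s(O)| when one node is perturbed to eps_s."""
    baseline = output_error({n: eps for n in nodes})
    influence = {}
    for node in nodes:
        perturbed = {n: eps for n in nodes}
        perturbed[node] = eps_s          # only this node gets the improved value
        influence[node] = abs(baseline - output_error(perturbed))
    ranked = sorted(nodes, key=lambda n: influence[n], reverse=True)
    return ranked[:top_k], influence

# Invented toy model: output error as a weighted sum of gate error probabilities.
weights = {"N4e": 0.5, "N5e": 1.0, "N6e": 0.2, "N7e": 2.0, "N8e": 1.5}

def toy_output_error(eps_map):
    return sum(weights[n] * e for n, e in eps_map.items())

sensitive, influence = rank_sensitive_nodes(
    list(weights), toy_output_error, eps=0.001, eps_s=1e-6, top_k=2
)
```

For circuits with several primary outputs, `output_error` would have to reduce the per-output probabilities to a single figure first; how that is done is left open here.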

6.3 Hybrid Redundancy
The hybrid redundancy scheme is a blend of temporal redundancy and spatial redundancy.
As shown in Fig. 6.4., hybrid redundancy can be visualized as performing temporal redundancy
on an erroneous circuit whose error behavior is optimized by spatial redundancy. The
spatial redundancy on the error model is first performed and then the modified structure is used
to perform temporal redundancy. This procedure can be interpreted as performing temporal
redundancy on a circuit whose sensitive gates have a gate error probability εs, compared to
the less-sensitive gates with gate error probability ε, where εs < ε. This redundancy scheme
will have the relative merits of both temporal and spatial redundancy schemes.

[Block diagram: the TTR setup of Fig. 6.2 applied to the erroneous circuit after spatial redundancy, in which the sensitive gate has error probability εs and the remaining gates have error probability ε.]

Figure 6.4. Hybrid redundancy scheme using CTMR and TTR techniques

6.4 Experimental Results
The experiments are performed using 8 million random input vectors and the probabilities
of state ”1” at the comparator outputs, P(Ci = 1), are observed. For circuits with more than
one primary output, the output error is observed as max_i P(Ci = 1). The results are presented
as percentage improvements in mitigation of output error with redundancy over the output
error values without redundancy. Also, results for two varying amounts of redundancy (5%
and 15%) are presented, where the variation in temporal redundancy is achieved by varying
the number of worst-case input combinations while performing TTR, and the variation in spatial redundancy is achieved by varying the number of perturbed nodes while performing CTMR. All the results presented in this section are
for gate error probability ε=0.001. Circuits from the ISCAS85 benchmark suite are used as
test benches and the experiments are performed on a Pentium IV, 2.00 GHz, Windows XP
computer.

[Bar chart: percentage mitigation of output error (vertical axis) for circuits c17, max_flat, voter, decoder, alu4 and malu4, with 5% and 15% temporal redundancy.]

Figure 6.5. Percentage mitigation of output error achieved through 5% and 15% temporal
redundancy with ε=0.001
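The reported metric can be written out as a small calculation. The formula below is one plausible reading of "percentage improvement over the output error without redundancy", and the comparator probabilities are invented numbers used only to make the sketch runnable.

```python
def output_error(comparator_probs):
    """Output error of a circuit: the largest P(Ci = 1) over its comparators."""
    return max(comparator_probs)

def percent_mitigation(err_without, err_with):
    """Assumed metric: relative reduction in output error, in percent."""
    return 100.0 * (err_without - err_with) / err_without

# Invented comparator probabilities for a two-output circuit.
err_plain = output_error([0.012, 0.020])
err_redundant = output_error([0.010, 0.015])
mitigation = percent_mitigation(err_plain, err_redundant)
```

With these invented numbers the output error drops from 0.020 to 0.015, a relative reduction of one quarter.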

6.4.1 Error Mitigation Through Temporal Redundancy
Fig. 6.5. gives the percentage mitigation of output error achieved through 5% and 15%
temporal redundancy with ε=0.001. The important observations are listed as follows,


• For all the circuits, the error mitigation percentage for 15% temporal redundancy is
more than 10%. For 5% temporal redundancy, c17 and voter show significant error
mitigation as compared to other circuits.
• For some circuits like c17 and max_flat, the error mitigation percentage is even beyond
20% when 15% of temporal redundancy is applied.
• For all circuits, the results clearly show the improvement in error mitigation when the
amount of redundancy is increased. The improvement is more than 13% in circuits like
c17 and max_flat, while for other circuits it is more than 6%.

6.4.2 Error Mitigation Through Spatial Redundancy
Fig. 6.6. gives the percentage mitigation of output error achieved through 5% and 15%
spatial redundancy with ε=0.001. The important observations are listed as follows,
• Significant error mitigation is achieved for all circuits with 15% spatial redundancy. The
error mitigation percentage is above 20% for all the circuits for 15% spatial redundancy,
with voter achieving almost 50% error mitigation, c17 achieving 36% error mitigation
and max_flat and malu4 achieving around 30% error mitigation.
• For 5% spatial redundancy, the error mitigation percentage is more than 10% for circuits
like c17, max_flat, voter and alu4, while it is closer to 10% for the other circuits.
• The results clearly show significant improvement in percentage of error mitigation
when spatial redundancy is increased from 5% to 15%. While the improvement is as
high as 33% for voter, for all the circuits except decoder the improvement is more than
15%.


[Bar chart: percentage mitigation of output error (vertical axis) for circuits c17, max_flat, voter, decoder, alu4 and malu4, with 5% and 15% spatial redundancy.]

Figure 6.6. Percentage mitigation of output error achieved through 5% and 15% spatial redundancy with ε=0.001

6.4.3 Error Mitigation Through Hybrid Redundancy
Fig. 6.7. gives the percentage mitigation of output error achieved through 5% and 15%
hybrid redundancy with ε=0.001. 5% hybrid redundancy is achieved using the combination
of 5% temporal and 5% spatial redundancies, while 15% hybrid redundancy is achieved using
the combination of 15% temporal and 15% spatial redundancies. The important observations
are listed as follows,
• Significant error mitigation is achieved for all circuits with 15% hybrid redundancy.
The error mitigation percentage is above 30% for all the circuits, with voter achieving as high as 60% error mitigation, c17 achieving 51% error mitigation and max_flat
achieving around 47% error mitigation.
• Even for 5% hybrid redundancy, the error mitigation percentage is as high as 35% for
voter, while it is more than 20% for circuits like c17 and max_flat.
• The improvement in error mitigation, by increasing the amount of hybrid redundancy
from 5% to 15%, is highly significant in all the circuits. While the improvement is above
24% in circuits like c17, max_flat, voter and malu4, it is around 20% for the circuits
decoder and alu4.

[Bar chart: percentage mitigation of output error (vertical axis) for circuits c17, max_flat, voter, decoder, alu4 and malu4, with 5% hybrid (5% spatial and 5% temporal) and 15% hybrid (15% spatial and 15% temporal) redundancy.]

Figure 6.7. Percentage mitigation of output error achieved through 5% and 15% hybrid redundancy with ε=0.001

6.4.4 Comparison Between the Redundancy Schemes
Fig. 6.8. gives the comparison between the redundancy schemes for 5% and 15% redundancy with ε=0.001. The important observations are listed as follows,
• As expected, hybrid redundancy provides better error mitigation as compared to temporal and spatial redundancies. The other notable result is that spatial redundancy provides
better error mitigation as compared to temporal redundancy.

99

Temporal Redundancy 5%

30

25

Percentage mitigation of output error

Percentage mitigation of output error

Spatial Redundancy 5%
5% Hybrid (5%Spatial &
5%Temporal)

20

15

10

5

0

70

Temporal Redundancy 15%

60

Spatial Redundancy 15%

50

15% Hybrid (15%Spatial &
15%Temporal)

40

30

20

10

0
c17

Max_flat

Voter

Decoder

Alu4

Malu4

c17

Circuits

Max_flat

Voter

Decoder

Alu4

Malu4

Circuits

(a)

(b)

Figure 6.8. Comparison between the redundancy schemes for (a) 5% and (b) 15% redundancy
with ε=0.001

• For the 5% redundancy case, the improvement in error mitigation using the hybrid scheme as
compared to the temporal scheme is above 10% for circuits c17 and max_flat, and above 18%
for voter. Even the least improvement is around 7% for decoder, while the improvement
in the other circuits is around 9%.
• For the 5% redundancy case, the improvement in error mitigation using the hybrid scheme as
compared to the spatial scheme is above 10% only for voter, while the improvement in c17
is around 8%. The improvement in the rest of the circuits is less than 7%.
• For the 15% redundancy case, the improvement in error mitigation using the hybrid scheme as
compared to the temporal scheme is above 20% for all circuits except decoder, with voter
showing the highest improvement of about 43%. Even the least improvement is around
18% for decoder.
• For the 15% redundancy case, the improvement in error mitigation using the hybrid scheme as
compared to the spatial scheme is above 10% for all the circuits except alu4 and malu4.
The highest improvement, shown by max_flat, is around 16%, while the improvement
shown by alu4 and malu4 is above 8%.
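
The improvements quoted in this comparison are differences between mitigation percentages. The following is a minimal sketch of that arithmetic, assuming that percentage mitigation means the relative reduction of the output error probability with respect to the circuit without redundancy. The function names and the error values are hypothetical placeholders chosen for illustration; they are not results from this work.

```python
def percent_mitigation(baseline_error: float, mitigated_error: float) -> float:
    """Relative reduction of output error, in percent (assumed definition)."""
    if baseline_error <= 0.0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - mitigated_error) / baseline_error


def improvement_points(mitigation_a: float, mitigation_b: float) -> float:
    """Difference between two mitigation percentages, in percentage points."""
    return mitigation_a - mitigation_b


# Placeholder output error probabilities (illustrative only, not measured data).
baseline = 0.020
with_temporal = 0.017
with_hybrid = 0.012

m_temporal = percent_mitigation(baseline, with_temporal)
m_hybrid = percent_mitigation(baseline, with_hybrid)
print(f"temporal: {m_temporal:.1f}%  hybrid: {m_hybrid:.1f}%")
print(f"hybrid over temporal: {improvement_points(m_hybrid, m_temporal):.1f} points")
```

Read this way, an "improvement of 10%" between two schemes is a gap of 10 percentage points between their mitigation percentages, not a 10% relative change.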

[Figure 6.9: bar chart of percentage mitigation of output error (y-axis) for the circuits c17, max_flat, voter, decoder, alu4 and malu4 (x-axis), with one series for each combination: 5% spatial and 5% temporal, 15% spatial and 5% temporal, 5% spatial and 15% temporal, 15% spatial and 15% temporal.]

Figure 6.9. Percentage mitigation of output error achieved through hybrid redundancy with
different combinations of spatial and temporal redundancies while ε=0.001

6.4.5 Error Mitigation Through Hybrid Redundancy with Different Combinations of
Spatial and Temporal Redundancies
Fig. 6.9 gives the percentage mitigation of output error achieved through hybrid redundancy with different combinations of spatial and temporal redundancies while ε=0.001. The
combinations are 5% spatial and 5% temporal, 15% spatial and 5% temporal, 5% spatial
and 15% temporal, and 15% spatial and 15% temporal. The important observations are listed as
follows:
• As expected, the combination of 15% spatial and 15% temporal redundancies yields the
best error mitigation, while the combination of 5% spatial and 5% temporal redundancies yields the worst error mitigation.
• Comparing the combination of 15% spatial and 5% temporal with the opposite combination of 5%
spatial and 15% temporal, it is evident that providing
more spatial redundancy is more beneficial for error mitigation. The difference between the
percentage mitigation of output error for these combinations is as high as 18% for
voter, and more than 10% for c17 and malu4. While the difference is around 8% for
alu4, it is as low as 4% for max_flat and decoder.

[Figure 6.10: (a) delay penalty (× delay for 1 run) for 5% and 15% temporal redundancy; (b) area penalty (× area for 1 gate) for 5% and 15% spatial redundancy.]

Figure 6.10. (a) Delay penalty in temporal redundancy (b) Area penalty in spatial redundancy

6.4.6 Delay and Area Penalties
Fig. 6.10 gives the delay penalty due to multiple runs with the same input vector in temporal redundancy, and the area penalty due to multiple copies of the same gate in spatial
redundancy. The important observations are as follows:
• Area penalty is much larger than delay penalty. While the delay penalty is just 1.1
times the delay without redundancy for 5% temporal redundancy and 1.3 times the
delay without redundancy for 15% temporal redundancy, the area penalty is 2.2 times
the area without redundancy for 5% spatial redundancy and 4.6 times the area without
redundancy for 15% spatial redundancy.
• This is expected, since the delay penalty is due to just 2 additional runs, while the area penalty
is due to 24 additional gates.

6.5 Discussion
We have applied temporal, spatial and hybrid redundancy, using our probabilistic error
model, to achieve error mitigation in digital logic circuits. The percentage of error mitigation
achieved using all three types of redundancy is shown through experimental results.
On average, for 15% redundancy, 16% error mitigation was achieved with temporal redundancy, 32% error mitigation was achieved with spatial redundancy and 44% error mitigation
was achieved with hybrid redundancy. We have also provided a comprehensive study of
the relative merits of these redundancy schemes, indicating the effectiveness of
hybrid redundancy, which encapsulates both temporal and spatial redundancy techniques.
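
As a rough illustration of why repeating a computation, either in time (multiple runs) or in space (multiple copies of a gate), reduces the output error, the sketch below computes the probability that a majority vote over independent replicas is wrong. It assumes independent errors with the same gate error probability ε per replica, an odd number of replicas and an error-free voter. These are simplifying assumptions for illustration, and the function name is hypothetical. It is a textbook binomial calculation for a single gate, not the probabilistic graphical model used in this work, and it does not reproduce the selective 5% and 15% schemes evaluated above.

```python
from math import comb


def majority_error(eps: float, copies: int) -> float:
    """Probability that a majority of `copies` independent replicas is wrong.

    Assumes each replica errs independently with probability `eps`
    and that the voter itself never fails.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be a probability")
    if copies < 1 or copies % 2 == 0:
        raise ValueError("copies must be a positive odd number so votes cannot tie")
    needed = copies // 2 + 1  # smallest number of wrong replicas that wins the vote
    return sum(
        comb(copies, k) * eps**k * (1.0 - eps) ** (copies - k)
        for k in range(needed, copies + 1)
    )


eps = 0.001  # gate error probability used in the experiments of this chapter
for copies in (1, 3, 5):
    print(copies, majority_error(eps, copies))
```

Under these assumptions the residual error for three replicas is 3ε²(1−ε) + ε³, which is far smaller than ε when ε is small. The gain comes at the cost of extra runs (delay) or extra gates (area), which is the trade-off examined in Section 6.4.6.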

CHAPTER 7
CONCLUSION AND FUTURE DIRECTIONS

In this dissertation, we have presented reliability models for nano VLSI circuits using
probabilistic graphs and have accomplished the following,
• We have calculated the maximum error occurring in digital logic circuits and the corresponding worst-case input combination, through maximum a posteriori hypothesis,
using an efficient Shenoy-Shafer algorithm. Through the results we have shown the
importance of handling maximum error behavior for achieving fault-tolerant computing machines. We have also studied the circuit-specific error bounds for fault-tolerant
computing, and the results clearly show that the error bounds are highly dependent on
circuit structure and that computation of maximum output error is essential to attain a tighter
bound.
• We have calculated the average output error in sequential digital logic circuits and studied the transient error behavior across different time instances, using a dynamic time-evolving probabilistic error model. Through the results, we have shown the vulnerability of sequential circuits to transient errors and the dependence of error behavior on the
circuit structure.
• We have applied temporal, spatial and hybrid redundancy, using our probabilistic error model, to achieve error mitigation in digital logic circuits. We have shown significant
error reduction using all three techniques and have also provided a comprehensive study of the relative merits of these redundancy schemes, indicating the effectiveness of
hybrid redundancy, which encapsulates both temporal and spatial redundancy
techniques.
Some possible future directions of this work are as follows:
• This work can be further enhanced by obtaining real-time values of the gate error probability, ε,
from device physics and fabrication processes. Using this model to solve reliability
issues in real-time test benches, such as circuits used in automobiles and biomedical chips,
can also further enlarge the scope and effectiveness of the model.
• To handle large circuits, stochastic heuristic algorithms to detect both average and maximum error can be proposed. This work can serve as a baseline exact estimate to judge
the efficacy of the various stochastic heuristic algorithms that will be essential for circuits of higher dimensions.
• The error model to detect error in sequential circuits can be further enhanced by exploring error masking effects, like the latching window masking effect, that can arise in an
erroneous latch connected in the feedback path.
• The error models can be enhanced by addressing design aspects like timing violations
leading to delay faults.
• Our model can be made more versatile: apart from addressing global erroneous behavior,
it should also address specific reliability issues like signal integrity, by modeling the
gate error probability of each gate based on that specific reliability issue.

REFERENCES

[1] J. Von Neumann, “Probabilistic Logics and the Synthesis of Reliable Organisms from
Unreliable Components”, in Automata Studies (C. E. Shannon and J. McCarthy, eds.),
pages 43–98, Princeton Univ. Press, Princeton, N.J., 1954.
[2] F. P. Mathur and A. Avižienis, “Reliability Analysis and Architecture of a HybridRedundant Digital System: Generalized Triple Modular Redundancy with Self-Repair”,
AFIPS Joint Computer Conferences, pages 375–383, 1970.
[3] N. Pippenger, “Reliable Computation by Formulas in the Presence of Noise”, IEEE
Trans on Information Theory, vol. 34(2), pages 194–197, 1988.
[4] T. Feder, “Reliable Computation by Networks in the Presence of Noise”, IEEE Trans on
Information Theory, vol. 35(3), pages 569–571, 1989.
[5] B. Hajek and T. Weller, “On the Maximum Tolerable Noise for Reliable Computation
by Formulas”, IEEE Trans on Information Theory, vol. 37(2), pages 388–391, 1991.
[6] W. Evans and L. J. Schulman, “On the Maximum Tolerable Noise of k-input Gates for
Reliable Computation by Formulas”, IEEE Trans on Information Theory, vol. 49(11),
pages 3094–3098, 2003.
[7] W. Evans and N. Pippenger, “On the Maximum Tolerable Noise for Reliable Computation by Formulas”, IEEE Transactions on Information Theory, vol. 44(3), pages 1299–
1305, 1998.
[8] J. B. Gao, Y. Qi and J. A. B. Fortes, “Bifurcations and Fundamental Error Bounds for
Fault-Tolerant Computations”, IEEE Transactions on Nanotechnology, vol. 4(4), pages
395–402, 2005.
[9] D. Marculescu, R. Marculescu and M. Pedram, “Theoretical Bounds for Switching Activity Analysis in Finite-State Machines”, IEEE Transactions on VLSI Systems, vol. 8(3),
pages 335–339, 2000.
[10] P. G. Depledge, “Fault-Tolerant Computer Systems”, IEE Proc. A, vol. 128(4), pages
257–272, 1981.

106

[11] S. Spagocci and T. Fountain, “Fault Rates in Nanochip Devices”, in Electrochemical
Society, pages 354–368, 1999.
[12] J. Han and P. Jonker, “A Defect- and Fault-Tolerant Architecture for Nanocomputers”,
Nanotechnology, vol. 14, pages 224–230, 2003.
[13] S. Roy and V. Beiu, “Majority Multiplexing-Economical Redundant Fault-tolerant Designs for Nano Architectures”, IEEE Transactions on Nanotechnology, vol. 4(4), pages
441–451, 2005.
[14] K. Nikolic, A. Sadek, and M. Forshaw, “Fault-Tolerant Techniques for Nanocomputers,”
Nanotechnology, vol. 13, pages 357–362, 2002.
[15] J. Han, E. Taylor, J. Gao and J. A. B. Fortes, ‘”Reliability Modeling of Nanoelectronic
Circuits”, IEEE Conference on Nanotechnology, 2005.
[16] J. B. Gao, Yan Qi and J.A.B. Fortes, “Markov Chains and Probabilistic Computation A General Framework for Multiplexed Nanoelectronic Systems”, IEEE Transactions on
Nanotechnology, vol. 4(2), pages 395–402, 2005.
[17] E. Taylor, J. Han and J. A. B. Fortes, “Towards Accurate and Efficient Reliability Modeling of Nanoelectronic Circuits”, IEEE Conference on Nanotechnology, pages 395–398,
2006.
[18] M. O. Simsir, S. Cadambi, F. Ivancic, M. Roetteler and N. K. Jha, “Fault-Tolerant Computing Using a Hybrid Nano-CMOS Architecture”, International Conference on VLSI
Design, pages 435–440, 2008.
[19] C. Chen and Y. Mao, “A Statistical Reliability Model for Single-Electron Threshold
Logic”, IEEE Transactions on Electron Devices, vol. 55, pages 1547–1553, 2008.
[20] A. Abdollahi, “Probabilistic Decision Diagrams for Exact Probabilistic Analysis”,
IEEE/ACM International Conference on Computer-Aided Design, pages 266–272, 2007.
[21] M. R. Choudhury and K. Mohanram, “Accurate and Scalable Reliability Analysis of
Logic Circuits”, Design, Automation, and Test in Europe (DATE) conference, pages
1454–1459, 2007.
[22] S. Lazarova-Molnar, V. Beiu and W. Ibrahim, “A Strategy for Reliability Assessment
of Future Nano-Circuits”, WSEAS International Conference on Circuits, pages 60–65,
2007.
[23] P. P. Shenoy and G. Shafer, “Propagating Belief Functions with Local Computations”,
IEEE Expert, vol. 1(3), pages 43–52, 1986.
[24] P. P. Shenoy, “Binary Join Trees for Computing Marginals in the Shenoy-Shafer Architecture”, International Journal of Approximate Reasoning, pages 239–263, 1997.
107

[25] P. P. Shenoy, “Valuation-Based Systems: A Framework for Managing Uncertainty in
Expert Systems”, Fuzzy Logic for the Management of Uncertainty, pages 83–104, 1992.
[26] J. Pearl, “Probabilistic Reasoning in Intelligent Systems: Network of Plausible Inference”, Morgan Kaufmann Publishers, Inc., 1988.
[27] F. V. Jensen, S. Lauritzen and K. Olesen, “Bayesian Updating in Recursive Graphical
Models by Local Computation”, Computational Statistics Quarterly, pages 269-282,
1990.
[28] R. G. Cowell, A. P. David, S. L. Lauritzen and D. J. Spiegelhalter, “Probabilistic Networks and Expert Systems,” Springer-Verlag New York, Inc., 1999.
[29] J. D. Park and A. Darwiche, “Solving MAP Exactly using Systematic Search”, Conference on Uncertainty in Artificial Intelligence, 2003.
[30] J. D. Park and A. Darwiche, “Approximating MAP using Local Search”, Conference on
Uncertainty in Artificial Intelligence, pages 403–410, 2001.
[31] Sensitivity
Analysis,
Modeling,
Inference
and
More
(SAMIAM),
http://reasoning.cs.ucla.edu/samiam/, Automated Reasoning Group, University of
California, Los Angeles.
[32] J. P. Roth, “Diagnosis of Automata Failures: A Calculus and a Method”, IBM Journal of
Research and Development, vol. 10(4), pages 278–291, 1966.
[33] P. Goel, “An Implicit Enumeration Algorithm to Generate Tests for Combinational Logic
Circuits”, IEEE Transactions on Computers, vol. C-30(3), pages 215–222, 1981.
[34] H. Fujiwara and T. Shimono, “On The Acceleration of Test Generation Algorithms”,
IEEE Transactions on Computers, vol. C-32(12), pages 1137–1144, 1983.
[35] V. D. Agrawal, S. C. Seth and C. C. Chuang, “Probabilistically Guided Test Generation”,
IEEE International Symposium on Circuits and Systems, pages 687–690, 1985.
[36] J. Savir, G. S. Ditlow and P. H. Bardell, “Random Pattern Testability”, IEEE Transactions on Computers, vol. C-33(1), pages 79–90, 1984.
[37] C. Seth, L. Pan and V. D. Agrawal, “PREDICT - Probabilistic Estimation of Digital
Circuit Testability”, IEEE International Symposium on Fault-Tolerant Computing, pages
220–225, 1985.
[38] S. T. Chakradhar, M. L. Bushnell and V. D. Agrawal, “Automatic Test Generation using Neural Networks”, IEEE International Conference on Computer-Aided Design, vol.
7(10), pages 416–419, 1988.

108

[39] M. Mason, “FPGA Reliability in Space-Flight and Automotive Applications”, FPGA
and Programmable Logic Journal, 2005.
[40] E. Zanoni and P. Pavan, “Improving the Reliability and Safety of Automotive Electronics”, IEEE Micro, vol. 13(1), pages 30–48, 1993.
[41] P. Gerrish, E. Herrmann, L. Tyler and K. Walsh, “Challenges and Constraints in Designing Implantable Medical ICs”, IEEE Transactions on Device and Materials Reliability,
vol. 5(3), pages 435–444, 2005.
[42] L. Stotts, “Introduction to Implantable Biomedical IC Design”, IEEE Circuits and Devices Magazine, pages 12–18, 1999.
[43] S. Cheemalavagu, P. Korkmaz, K. V. Palem, B. E. S. Akgul and L. N. Chakrapani, “A
Probabilistic CMOS Switch and its Realization by Exploiting Noise”, IFIP International
Conference on Very Large Scale Integration, 2005.
[44] Military Standard (MIL-STD-883), “Test Methods and Procedures for Microelectronics”, 1996.
[45] S. Krishnaswamy, G. S. Viamontes, I. L. Markov, and J. P. Hayes, “Accurate Reliability
Evaluation and Enhancement via Probabilistic Transfer Matrices”, Design Automation
and Test in Europe (DATE) Conference, pages 282–287, 2005.
[46] J. Han, J. B. Gao, P. Jonker, Y. Qi and J. A. B. Fortes, “Toward Hardware-Redundant
Fault-Tolerant Logic for Nanoelectronics”, IEEE Transactions on Design and Test of
Computers, vol. 22(4), pages 328–339, 2005.
[47] R. I. Bahar, J. Mundy, and J. Chan, “A Probabilistic Based Design Methodology for
Nanoscale Computation”, International Conference on Computer Aided Design (ICCAD), pages 480–486, 2003.
[48] D. Bhaduri and S. K. Shukla, “NANOPRISM: A Tool for Evaluating Granularity vs.
Reliability Trade-offs in Nano Architectures”, Great Lakes Symposium on VLSI, pages
109–112, 2004.
[49] L. P. Yuan, C. C. Teng and S. M. Kang, “Statistical Estimation of Average Power Dissipation in Sequential Circuits,” Design Automation Conference (DAC), pages 377–382,
1997.
[50] HUGIN Inference Tool, http://www.hugin.com/, HUGIN EXPERT A/S, Aalborg, Denmark.
[51] N. M. Zivanov and D. Marculescu, “Soft Error Rate Analysis for Sequential Circuits”,
Design Automation and Test in Europe (DATE) Conference, pages 1–6, 2007.

109

[52] J. J. Shedletsky and E. J. McCluskey, “The Error Latency of a Fault in a Sequential
Digital Circuit”, IEEE Transactions on Computers, vol. C-25(6), pages 655–659, 1976.
[53] S. Y. Huang, K. T. Cheng, K. C. Chen and J. Y. Lu, “Fault-Simulation Based Design
Error Diagnosis for Sequential Circuits”, Design Automation Conference (DAC), pages
632–637, 1998.
[54] H. Asadi and M. B. Tahoori, “Soft Error Modeling and Protection for Sequential Elements”, IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems,
pages 463–471, 2005.
[55] R. Baumann, “Soft Errors in Advanced Computer Systems”, IEEE Design and Test of
Computers, vol. 22(3), pages 258–266, 2005.
[56] M. Zhang and N. R. Shanbag, “A Soft Error Rate Analysis (SERA) Methodology”, International Conference on Computer Aided Design (ICCAD), pages 111–118, 2004.
[57] S. Winograd and J. D. Cowan, “Reliable Computation in the Presence of Noise”, The
MIT Press, 1963.
[58] G. Norman, D. Parker, M. Kwiatkowska and S. K. Shukla, “Evaluating the Reliability of
Defect-Tolerant Architectures for Nanotechnology with Probabilistic Model Checking”,
International Conference on VLSI Design, pages 907–912, 2004.
[59] International
Technology
Roadmap
for
Semiconductors
http://www.itrs.net/Links/2005ITRS/ERD2005.pdf, 2005.

(ITRS),

[60] G. E. Moore, “Cramming More Components onto Integrated Circuits”, Electronics, vol.
38(8), 1965.
[61] L. B. Kish, “End of Moore’s Law: Thermal (Noise) Death of Integration in Micro and
Nano Electronics”, Physics Letters A, vol. 305(3–4), pages 144–149, 2002.
[62] C. W. Gwyn, D. L. Scharfetter and J. L. Wirth, “The Analysis of Radiation Effects
in Semiconductor Junction Devices”, IEEE Transactions on Nuclear Science, vol. NS14(6), pages 153–169, 1967.
[63] D. C. DAvanzo, M. Vanzi and R. W. Dutton, “One-Dimensional Semiconductor Device
Analysis, Tech. Rep. no. G-201-5, Stanford Electronics Laboratories, Stanford University, 1979.
[64] S. Selberherr, W. Fichtner and H. W. Potzl, “MINIMOS - A Program Package to Facilitate MOS Device Design and Analysis”, NASECODE I, pages 275–279, 1979.
[65] P. E. Cottrell and E. M. Buturla, “Two-Dimensional Static and Transient Simulation of
Mobile Carrier Transport in a Semiconductor, NASECODE I, pages 31–64, 1979.

110

[66] E. M. Buturla, P. E. Cottrell, B. M. Grossman, K. A. Salsburg, M. B. Lawlor and C. T.
McMullen, “Three-Dimensional Finite Element Simulation of Semiconductor Devices”,
IEEE Int. Solid State Circuits Conf. Dig. Tech. Papers, pages 76–77, 1980.
[67] M. R. Pinto, C. S. Rafferty and R. W. Dutton, “PISCES-II: Poisson and Continuity Equation Solver”, Stanford Electronics Laboratories, 1984.
[68] P. E. Dodd, “Device Simulation of Charge Collection and Single-Event Upset”, IEEE
Transactions on Nuclear Science, vol. 43(2), pages 561–575, 1996.
[69] G. R. Srinivasan, H. K. Tang and P. C. Murley, “Parameter-Free, Predictive Modeling
of Single Event Upsets due to Protons, Neutrons and Pions in Terrestrial Cosmic Rays”,
IEEE Transactions on Nuclear Science, vol. 41(6), pages 2063–2070, 1994.
[70] M. R. Choudhury, Q. Zhou and K. Mohanram, “Design Optimization for Single-Event
Upset Robustness using Simultaneous Dual-VDD and Sizing Techniques”, International
Conference on Computer Aided Design (ICCAD), pages 204–209, 2006.
[71] K. Bhattacharya and N. Ranganathan, “A New Placement Algorithm for Reduction of
Soft Errors in Macro Cell based Design of Nanometer Circuits”, IEEE Computer Society
Annual Symposium on VLSI (ISVLSI), pages 91–96, 2009.
[72] N. Miskov-Zivanov and D. Marculescu, “MARS-C: Modeling and Reduction of Soft
Errors in Combinational Circuits”, Design Automation Conference (DAC), pages 767–
772, 2006.
[73] P. K. Samudrala, J. Ramos and S. Katkoori, “Selective Triple Modular Redundancy
(STMR) Based Single-Event-Upset (SEU) Tolerant Synthesis for FPGAs”, IEEE Transactions on Nuclear Science, vol. 51(5), pages 2957–2969, 2004.
[74] W. A. Moreno, J. R. Samson Jr. and F. J. Falquez, “Laser Injection of Soft Faults for
the Validation of Dependability Design”, Journal of Universal Computer Science, vol.
5(10), pages 712–729, 1999.
[75] Paris D. Wiley, “Fault Tolerant Design Verification Through The Use of Laser Fault
Injection”, Masters Thesis, Department of Electrical Engineering, University of South
Florida, 2004.
[76] A. Sanyal, S. M. Alam and S. Kundu, “A Built-In Self-Test Scheme for Soft Error Rate
Characterization”, IEEE International On-Line Testing Symposium, pages 65–70, 2008.
[77] S. Bhanja and N. Ranganathan, “Switching Activity Estimation of VLSI Circuits using
Bayesian Networks”, IEEE Transactions on VLSI Systems, pages 558–567, 2003.
[78] S. Bhanja and N. Ranganathan, “Cascaded Bayesian Inferencing for Switching Activity
Estimation with Correlated Inputs”, IEEE Transaction on VLSI Systems, vol. 12(12),
pages 1360–1370, 2004.
111

[79] S. Bhanja and N. Ranganathan, “Modeling Switching Activity Using Cascaded Bayesian
Networks for Correlated Input Streams ”, International Conference on Computer Design
(ICCD), pages 388–390, 2002.
[80] S. Bhanja and N. Ranganathan, “Accurate Switching Activity Estimation of Large Circuits using Multiple Bayesian Networks”, 15th Intl. Conference of VLSI Design and 7th
ASP-Design and Automation Conference, pages 187–192, 2002.
[81] S. Bhanja and N. Ranganathan, “Dependency Preserving Probabilistic Modeling of
Switching Activity using Bayesian Networks”, IEEE/ACM Design Automation Conference (DAC), pages 209–214, 2001.
[82] S. Bhanja and S. Sarkar, “Probabilistic Modeling of QCA Circuits using Bayesian Networks”, IEEE Transactions on Nanotechnology, vol. 5(6), pages 657–670, 2006.
[83] S. Bhanja and S. Sarkar, “Switching Error Modes of QCA Circuits”, IEEE Conference
on Nanotechnology, vol. 1, pages 383–386, 2006.
[84] S. Bhanja and S. Sarkar, “Graphical Probabilistic Inference for Ground State and NearGround State Computing in QCA Circuits”, IEEE Conference on Nanotechnology, pages
290–293, 2005.
[85] T. Rejimon and S. Bhanja, “A Timing-Aware Probabilistic Model for Single-Event-Upset
Analysis”, IEEE Transactions on VLSI Systems, vol. 14(10), pages 1130–1139, 2006.
[86] T. Rejimon and S. Bhanja, “Probabilistic Error Model for Unreliable Nano-logic Gates”,
IEEE Conference on Nanotechnology, pages 717–722, 2006.
[87] T. Rejimon and S. Bhanja, “Time and Space Efficient Method for Accurate Computation
of Error Detection Probabilities”, IEE Computers and Digital Techniques, vol. 152(5),
pages 679–685, 2005.
[88] T. Rejimon and S. Bhanja, “A Stimulus-Free Probabilistic Model for Single-Event-Upset
Sensitivity”, IEEE Intl. Conference on VLSI Design, 2006.
[89] T. Rejimon, L. Hoffmann and S. Bhanja, “A Probabilistic Model for Single-EventUpset”, 12th NASA Symposium on VLSI, 2005.
[90] T. Rejimon and S. Bhanja, “An Accurate Probabilistic Model for Error Detection”, 18th
International Conference in VLSI Design, pages 717–722, 2005.
[91] T. Rejimon and S. Bhanja, “Scalable Probabilistic Computing Models using Bayesian
Networks”, IEEE Midwest Symposium on Circuits and Systems, pages 712–715, 2005.
[92] S. Ramani and S. Bhanja, “Anytime Probabilistic Switching Model using Bayesian Networks,” International Symposium on Low Power Electronic Design, pages 86–89, 2004.

112

[93] N. Ramalingam and S. Bhanja, “Causal Probabilistic Input Dependency Learning for
Switching Model in VLSI Circuits”, ACM Great Lake Symposium on VLSI, pages 112–
115, 2005.
[94] S. Srivastava and S. Bhanja, “Hierarchical Probabilistic Macromodeling for QCA Circuits”, IEEE Transactions on Computers, vol. 56(2), pages 174–190, 2007.
[95] S. Srivastava and S. Bhanja, “Bayesian Macromodeling for Circuit Level QCA Design”,
IEEE Conference on Nanotechnology, pages 31–34, 2006.
[96] S. Srivastava and S. Bhanja, “Hierarchical Bayesian Macromodeling for QCA Circuits”,
12th NASA Symposium on VLSI, 2005.
[97] S. Bhanja and S. Srivastava, “Bayesian Modeling of Quantum-dot Cellular Automata
Circuits”, NSTI, Nanotechnology Conference, 2005.
[98] S. Bhanja, K. Lingasubramanian and N. Ranganathan, “A Stimulus-Free Graphical Probabilistic Switching Model for Sequential Circuits using Dynamic Bayesian Networks”,
ACM Transactions on Design Automation of Electronic Systems, vol. 11(3), pages 773–
796, 2006.
[99] T. Rejimon, K. Lingasubramanian and S. Bhanja, “Probabilistic Error Model for NanoDomain Logic Circuits”, IEEE Transactions on VLSI, vol. 17(1), pages 55–65, 2008.
[100] K. Lingasubramanian and S.Bhanja, “Probabilistic Maximum Error Modeling for Unreliable Logic Circuits”, ACM Great Lake Symposium on VLSI, pages 223–226, 2007.
[101] K. Lingasubramanian and S. Bhanja, “Probabilistic Error Modeling for Sequential
Logic”, IEEE International Conference on Nanotechnology, pages 616–620, 2007.
[102] K. Lingasubramanian and S. Bhanja, “An Error Model to Study the Behavior of Transient Errors in Sequential Circuits”, IEEE International Conference on VLSI Design,
pages 485–490, 2009.
[103] A. Shareef, K. Lingasubramanian and S. Bhanja, “Selective Redundancy: Evaluation
of Temporal Reliability Enhancement Scheme for Nanoelectronic Circuits”, IEEE Conference on Nanotechnology, pages 895–898, 2008.
[104] S. Bhanja, K. Lingasubramanian and N. Ranganathan, “Estimation of Switching Activity in Sequential Circuits Using Dynamic Bayesian Networks”, 18th International
Conference in VLSI Design, pages 586–591, 2005.

113

ABOUT THE AUTHOR

Karthikeyan Lingasubramanian received the B.E. degree in electronics and communication engineering from Kumaraguru College of Technology, India, in 2001. He received
his M.S. degree in electrical engineering from the University of South Florida, Tampa, USA, in
2004, where he is currently pursuing the Ph.D. degree in electrical engineering. His research
interests include design automation and testing, comprehensive nano-domain probabilistic and
statistical models for estimation and optimization of error and power.

