Combining Error-Correcting Codes and Decision Diagrams for the Design of Fault-Tolerant Logic by Astola, Helena
HELENA ASTOLA
COMBINING ERROR-CORRECTING CODES AND DECISION
DIAGRAMS FOR THE DESIGN OF FAULT-TOLERANT LOGIC
Master of Science Thesis
Examiners: Prof. Ioan Tabus
Prof. Radomir Stankovi¢
Subject approved in the meeting of
The Faculty Council of Computing and
Electrical Engineering on April 6, 2011
II
ABSTRACT
TAMPERE UNIVERSITY OF TECHNOLOGY
Master's Degree Programme in Information Technology
HELENA ASTOLA : Combining Error-Correcting Codes and Decision Dia-
grams for the Design of Fault-Tolerant Logic
Master of Science Thesis, 63 pages
September 2011
Major: Signal Processing
Examiners: Prof. Ioan Tabus and Prof. Radomir Stankovi¢
Keywords: Error-correcting codes, logic design, fault-tolerant circuits, decision diagrams
In modern logic circuits, fault-tolerance is increasingly important, since even atomic-
scale imperfections can result in circuit failures as the size of the components is
shrinking. Therefore, in addition to existing techniques for providing fault-tolerance
to logic circuits, it is important to develop new techniques for detecting and cor-
recting possible errors resulting from faults in the circuitry.
Error-correcting codes are typically used in data transmission for error detection
and correction. Their theory is far developed, and linear codes, in particular, have
many useful properties and fast decoding algorithms. The existing fault-tolerance
techniques utilizing error-correcting codes require less redundancy than other error
detection and correction schemes, and such techniques are usually implemented
using special decoding circuits.
Decision diagrams are an ecient graphical representation for logic functions,
which, depending on the technology, directly determine the complexity and layout
of the circuit. Therefore, they are easy to implement.
In this thesis, error-correcting codes are combined with decision diagrams to
obtain a new method for providing fault-tolerance in logic circuits. The resulting
method of designing fault-tolerant logic, namely error-correcting decision diagrams,
introduces redundancy already to the representations of logic functions, and as a
consequence no additional checker circuits are needed in the circuit layouts obtained
with the new method. The purpose of the thesis is to introduce this original concept
and provide fault-tolerance analysis for the obtained decision diagrams.
The fault-tolerance analysis of error-correcting decision diagrams carried out in
this thesis shows that the obtained robust diagrams have a signicantly reduced
probability for an incorrect output in comparison with non-redundant diagrams.
However, such useful properties are not obtained without a cost, since adding re-
dundancy also adds complexity, and consequently better error-correcting properties
result in increased complexity in the circuit layout.
III
TIIVISTELMÄ
TAMPEREEN TEKNILLINEN YLIOPISTO
Tietotekniikan koulutusohjelma
HELENA ASTOLA: Combining Error-Correcting Codes and Decision Dia-
grams for the Design of Fault-Tolerant Logic
Diplomityö, 63 sivua
Syyskuu 2011
Pääaine: Signaalinkäsittely
Tarkastajat: Prof. Ioan Tabus ja Prof. Radomir Stankovi¢
Avainsanat: Virheitäkorjaavat koodit, logiikkasuunnittelu, vikasietoiset piirit, päätösdiag-
rammit
Moderneissa logiikkapiireissä vikasietoisuuden merkitys on entistä suurempi, sillä
komponenttien pienentyessä jopa atomitason epätäydellisyydet voivat aiheuttaa häi-
riöitä piirien toiminnassa. Olemassaolevien tekniikoiden lisäksi onkin tärkeää kehit-
tää uusia menetelmiä, joilla häiriöistä johtuvat virheet voidaan havaita ja korjata.
Virheitäkorjaavia koodeja käytetään tyypillisesti tiedonsiirrossa aiheutuvien virhei-
den havaitsemiseen ja korjaamiseen. Näiden koodien teoria on pitkälle kehittynyttä,
ja erityisesti lineaarisilla koodeilla on useita hyödyllisiä ominaisuuksia ja tehokkaita
dekoodausalgoritmeja. Logiikkapiirien vikasietoisuuden lisäämisessä käytetyt mene-
telmät, jotka pohjautuvat virheitäkorjaaviin koodeihin, toteutetaan tavallisesti liit-
tämällä piiristöön erityisiä dekoodauksen suorittavia piirejä, ja nämä menetelmät
vaativat usein vähemmän redundanssia kuin muut vastaavat tekniikat.
Päätösdiagrammit ovat tehokas keino esittää logiikkafunktioita, jotka toteutuksen
teknologiasta riippuen määrittävät suoraan suunniteltavan piirin kompleksisuuden,
ja niiden toteuttaminen piiritasolla on helppoa.
Tässä opinnäytetyössä virheitäkorjaavat koodit ja päätösdiagrammit yhdistämällä
on kehitetty uudenlainen tekniikka vikasietoisen logiikan suunnitteluun. Kehitetyllä
menetelmällä redundanssi saadaan jo logiikkafunktioiden esitysmuotoihin eikä valmiis-
sa piireissä näin ollen tarvita ylimääräisiä virheiden havainnoinnin tai korjaamisen
suorittavia osia. Opinnäytetyön tarkoituksena on esittää tämä uusi menetelmä ja
arvioida saatujen päätösdiagrammien vikasietoisuutta.
Opinnäytetyössä on mallinnettu kehitetyn menetelmän suorituskykyä vikasietoi-
suuden kannalta ja osoitettu, että suunnittelun tuloksena saaduilla virheitäkorjaa-
villa päätösdiagrammeilla on huomattavasti pienempi todennäköisyys virheellisiin
ulostuloihin kuin vastaavilla tavanomaisilla diagrammeilla. Redundanssin lisäämi-
nen kuitenkin lisää myös kompleksisuutta, ja tehokkaamman virheenkorjauksen myö-
tä piirin kompleksisuus kasvaa. Kuitenkin jo maltillisella kompleksisuuden lisäämisel-
lä saavutetaan huomattavia parannuksia piirin vikasietoisuudessa.
IV
PREFACE
This Master of Science thesis has been written for the Department of Signal Pro-
cessing at the Tampere University of Technology. The research done for this thesis
in the Department of Signal Processing has also resulted in two research papers,
[4] and [5], which introduce and analyze the fault-tolerant logic design application
discussed in this thesis.
I wish to thank my supervisors Prof. Ioan Tabus and Prof. Radomir Stankovi¢
for their valuable comments and guidance, and Ph.D. Stanislav Stankovi¢ for his
help and taking part in the research. I am also grateful to Prof. Jaakko Astola for
his help and support.
Tampere, August 15, 2011
Helena Astola
helena.astola@tut.
VCONTENTS
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2. Mathematical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Discrete Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Decision Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Binary Decision Diagrams . . . . . . . . . . . . . . . . . . . . . . 6
2.2.2 Multiple-Valued Decision Diagrams . . . . . . . . . . . . . . . . . 8
2.2.3 Multi-terminal and Shared Decision Diagrams . . . . . . . . . . . 9
2.3 Fields and Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Error-Correcting Codes . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.1 Linear Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.2 The Hamming Metric . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.3 The Lee Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.4 Decoding of Linear Codes . . . . . . . . . . . . . . . . . . . . . . . 15
3. Fault-Tolerance in Logic Circuits . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1 Faults in Digital Systems . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Fault-Tolerance Strategies . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Triple Modular Redundancy . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 Self-Checking Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.5 The (N;K) Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.6 Low-Density Parity-Check Codes . . . . . . . . . . . . . . . . . . . . . 24
4. Error-Correcting Decision Diagrams . . . . . . . . . . . . . . . . . . . . . . 26
4.1 Introduction to Error-Correcting Decision Diagrams . . . . . . . . . . 26
4.2 Formal Denition of Error-Correcting Decision Diagrams . . . . . . . . 29
4.3 Constructing Error-Correcting Decision Diagrams . . . . . . . . . . . . 29
4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4.1 Binary (5; 2) Code . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4.2 Hamming (7; 4) Code . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4.3 Shortened Hamming Code . . . . . . . . . . . . . . . . . . . . . . 34
4.4.4 Ternary Hamming (4; 2) Code . . . . . . . . . . . . . . . . . . . . 37
4.4.5 Repetition Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4.6 One-Lee-Error-Correcting Code for q = 5 . . . . . . . . . . . . . . 40
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams . . . . . . . 42
5.1 The Probability Model for Diagrams based on Codes in the Hamming
Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2 The Probability Model for Diagrams based on Codes in the Lee Metric 45
5.3 Results of the Fault-Tolerance Analysis . . . . . . . . . . . . . . . . . 47
5.4 Approximating the Probability of Correct Outputs . . . . . . . . . . . 51
VI
6. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.1 Discussion on Error-Correcting Decision Diagrams . . . . . . . . . . . 56
6.2 Discussion on the Fault-Tolerance Analysis . . . . . . . . . . . . . . . 57
7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
VII
ABBREVIATIONS AND NOTATION
 Logical exclusive OR.
c A codeword.
dH(x;y) The Hamming distance between the vectors x and y.
dL(x;y) The Lee distance between the vectors x and y.
e The number of errors a code corrects.
Fq The nite eld of q elements.
Fiq The vector space of length i vectors over Fq.
G A generator matrix of a linear code.
H A parity check matrix of a linear code.
Ii The identity matrix of size i i.
k The number of inputs of a function; the dimension of a linear code,
which equals the number of inputs of a non-robust function.
K The number of repeated memory elements.
l The number of outputs of a function.
L The number of subsets of f1; 2; : : : ;Mg.
i A random variable.
M The number of non-terminal nodes in a decision diagram.
n The length of a code, which for robust diagrams equals the number of
variables.
N The number of repeated modules (processors in the (N;K) concept).
p The probability that a decision node is faulty.
Pi A subset of the set f1; 2; : : : ;Mg.
q The size of the domain and range of a q-ary function, i.e., the number
of elements in the nite eld Fq.
 A binary relation.
r The number of repetitions in a repetition code.
s The syndrome of a word v 2 Fnq .
 The covering radius of a code.
wH The Hamming weight of a vector.
wL The Lee weight of a vector.
BDD Binary decision diagram.
BDT Binary decision tree.
DNF Disjunctive normal form.
ECC Error-correcting code.
LDPC Low-density parity-check (code).
MTBDD Multi-terminal binary decision diagram.
VIII
MTDD Multi-terminal decision diagram.
NMR N -modular redundancy.
OBDD Ordered binary decision diagram.
PLA Programmable logic array.
TMR Triple modular redundancy.
TSC Totally self-checking (circuit).
11. INTRODUCTION
The role of digital systems in modern life is increasingly important, and often these
systems handle critical information and their accurate performance is essential for a
given application. This high dependability is required, for example, in military and
aerospace computing. In addition to these areas where the demands are extremely
high, in most applications high dependability makes the products more competitive
as their digital circuits perform their designed functions with a lower error rate. In
modern logic circuits, the transistors are shrinking, which means that even atomic-
scale imperfections in each transistor can have a negative eect on the performance of
these circuits. Therefore, in addition to existing techniques against failures in digital
systems, it is important to develop new techniques for detecting and, in particular,
correcting errors resulting from dierent types of faults in digital circuits.
Due to the importance of fault-tolerance in digital systems, numerous techniques
have been developed against hardware failures. The most well-known such technique
is triple modular redundancy, for which the groundwork was laid by von Neumann in
[35]. It is a technique, which introduces redundancy to logic circuits by module trip-
lication. Several authors have later further analyzed and developed this technique
(see, for instance, [2], [17]).
Error-correcting codes have many useful properties and they are most typically
used in data transmission to detect and correct errors on noisy communication chan-
nels [34]. With error-correcting codes, error detection and correction in logic circuits
can often be implemented with less redundancy than when using other methods [26].
The properties of error-correcting codes are widely studied, and the highly devel-
oped theory behind them makes them a fruitful basis for new error detection and
correction schemes. Coding theory has been exploited in several techniques for pro-
viding fault-tolerance in logic circuits, usually by introducing special circuits into
logic modules, which handle the detection and correction of possible errors [26].
Since error correction in logic circuits is increasingly important, it is essential
to nd systematic ways to increase fault-tolerance already in the representations of
switching functions, i.e., functions realized by the circuits. The method for providing
fault-tolerance introduced in this thesis combines the theory of error-correcting codes
and decision diagrams to obtain robust representations for functions, which are easily
implemented with the suitable technology. The main advantage of using decision
1. Introduction 2
diagrams for representing switching functions is that the layout and complexity
of a circuit is directly determined by the decision diagram. The idea is to create
error-correcting decision diagrams, i.e., a way of representing switching functions in
a robust manner that can directly be mapped to technology. The purpose of the
thesis is to introduce this original concept and provide fault-tolerance analysis for
the obtained decision diagrams. The idea of combining error-correcting codes and
decision diagrams introduced in this thesis has also resulted in two conference papers
[4], [5], where error-correcting decision diagrams and their performance is discussed.
The thesis is structured as follows. In Chapter 2, the mathematical background
for the topics discussed in this thesis is reviewed. This includes the denitions for
decision diagrams and the theory of error-correcting codes, as well as some basic
concepts related to discrete functions and their representations. Fault-tolerance in
digital systems is explained in more detail in Chapter 3, where also existing error
correction and detection schemes, in particular those utilizing coding theory, are
discussed.
In Chapter 4, an original technique, namely error-correcting decision diagrams,
for providing fault-tolerance in logic circuits is presented for binary and multiple-
valued logic. The performance of this technique is analyzed in Chapter 5, where
the probabilities for correct outputs in such constructions are determined. Since
determining exact probabilities for larger congurations is very time-consuming, a
method of approximating these probabilities is also introduced. Further discussion
based on the analysis is included in Chapter 6.
32. MATHEMATICAL BACKGROUND
In this chapter, the mathematical background for the topics in this thesis is pro-
vided. Since digital circuits realize discrete functions, we begin by dening discrete
functions. Then, decision diagrams are introduced as representations of discrete
functions. The fault-tolerant circuit designs introduced in this thesis are derived
using the theory of error-correcting codes, which are explained in the nal section
of this chapter. However, before moving on to the theory of error-correcting codes,
some basic denitions regarding elds and vector spaces are provided for a better
understanding of the theory.
2.1 Discrete Functions
The functions discussed in this thesis belong to the class of discrete functions, which
can be dened in the following way [6]:
Denition 2.1 Let A and B be sets. Let  be a binary relation from A to B. If
for every element a 2 A there exists a unique element in b 2 B such that (a; b) 2 ,
then  is a function from A to B, which is denoted by  : A ! B. If A and B are
nite, then  is a discrete function.
The applications in this thesis concern discrete functions on nite sets, i.e., func-
tions for which the domain A and range B are nite. A function f of k variables
(inputs) is denoted as f(x0; x1; : : : ; xk 1), xi 2 A, f(x0; x1; : : : ; xk 1) 2 B, and a
multi-output function is dened as:
Denition 2.2 Let A and B be nite sets, k  0, l  0 and f : Ak ! Bl a
function. Then f is a multi-output function.
Thus, a multi-output function is equivalent to a system of single-output functions
f = (f0; f1; : : : ; fl 1).
The most commonly used functions in digital logic are switching (Boolean) func-
tions, i.e., functions f : f0; 1gk ! f0; 1g, which describe the behavior of binary logic
circuits. Logic systems with multiple inputs or outputs are represented by multi-
output switching functions that are functions of the form f : f0; 1gk ! f0; 1gl.
Multiple-valued functions are functions with a domain Ak and range Bl, where
jAj = jBj = q > 2. For example, ternary functions are a class of functions of the
2. Mathematical Background 4
form f : f0; 1; 2gk ! f0; 1; 2gl. Hence, for ternary functions, q = 3. Functions with
q = 4 are called quaternary functions. Generally, a function with a domain having q
values is a q-ary function. Notice that in this thesis we only refer to q-ary functions,
where q is the number of elements of the nite eld Fq (see, Section 2.3), but in
general, q can be any integer, which is larger than 2.
There exist several methods of representing discrete functions, e.g. truth-tables,
algebraic expressions, and graphic representations. Decision diagrams are a graphic
method of representing discrete functions, and they are explained in detail in Section
2.2. For understanding decision diagrams, certain representations and expressions
of logic functions are explained here.
Consider a switching function f(x0; x1; x2) : f0; 1g3 ! f0; 1g. This function can
be given by listing its values as (x0; x1; x2) run through the values of the domain
f0; 1g3. For a binary switching function, listing these values into a tabular form
is called a truth table (Table 2.1). The function f1 = f(x0; x1; x2) may also be
represented by a vector of the function values, which in case of switching functions
is called a truth-vector. The truth-vector of the switching function f1 dened in
Table 2.1 is F1 = [0; 1; 1; 1; 0; 1; 0; 1]
T . When representing switching functions with
truth-vectors, the ordering of the 2k binary input sequences should be specied.
Unless otherwise stated, we use lexicographic ordering as in Table 2.1.
Table 2.1: The truth-table of a binary switching function f1.
x0x1x2 f1(x0; x1; x2)
000 0
001 1
010 1
011 1
100 0
101 1
110 0
111 1
Switching functions can also be represented as a formula written in terms of
some operations over an algebraic structure. For these representations, some basic
denitions must be introduced. The following denitions are given for switching
functions but can directly be generalized to q-ary functions [6].
A two-valued variable xi may be written in terms of a positive literal xi or a
negative literal xi. A positive literal is just an atom, which is a logical formula
containing no subformulas, and a negative literal is the negation of an atom [29].
Denote by  the logical AND operation corresponding to a product of variables and
by + the logical OR operation corresponding to a sum of variables. Any switching
2. Mathematical Background 5
function can be written with literals and operations  and +.
For example, we can represent the function f1 in Table 2.1 as a canonical sum of
products, i.e., in the complete disjunctive normal form (DNF), which corresponds
to the lines on the table where f has the value 1 ( omitted):
f1 = x0x1x2 + x0x1x2 + x0x1x2 + x0x1x2 + x0x1x2; (2.1)
or equivalently in a more compact DNF as
f1 = x2 + x0x1: (2.2)
The expression in (2.2) is not canonical and can be derived from the sum of the two
functions f2 = x2 and f3 = x0x1 as shown in Table 2.2.
Table 2.2: The representation of f1 as the sum of f2 and f3.
x0x1x2 f2
000 0
001 1
010 0
011 1
100 0
101 1
110 0
111 1
+
x0x1x2 f3
000 0
001 0
010 1
011 1
100 0
101 0
110 0
111 0
=
x0x1x2 f1
000 0
001 1
010 1
011 1
100 0
101 1
110 0
111 1
2.2 Decision Diagrams
Decision diagrams are an eective way of representing discrete functions graphically.
The idea of representing switching circuits using reduced binary decision diagrams
(BDDs) was formalized by Bryant in [9], and the topic has been further explored by
numerous authors. Binary decision diagrams have many applications in logic design,
e.g., in logic circuit minimization [3] and probabilistic analysis of digital circuits [33].
Decision diagrams can also be used for representing other discrete functions than
binary switching functions, namely multiple-valued functions. An important feature
of decision diagrams is that they are easily mapped to technology. For example,
depending on the technology, the number of gates in the circuit directly relates to
the number of nodes in the decision diagram and the delay of the circuit is related to
path lengths. In Figure 2.1 is an example of the correspondence of a binary decision
diagram to a circuit layout, where the circuit is constructed using multiplexers.
More on circuit realization can be found in, for example, [28].
2. Mathematical Background 6
Figure 2.1: Correspondence between BDDs and networks of multiplexers [6].
In this section, the denitions of binary decision diagrams, multiple-valued, i.e.,
q-ary decision diagrams, multi-terminal decision diagrams (MTDDs) and shared
decision diagrams are given. The denitions and basic concepts and properties
related to decision diagrams are given according to [6], [23].
2.2.1 Binary Decision Diagrams
Binary decision diagrams are used to represent switching functions, i.e., functions
of the form f : f0; 1gk ! f0; 1g. We dene binary decision diagrams using binary
decision trees, which are graphic representations of functions in the complete DNF.
Denition 2.3 A binary decision tree (BDT) is a rooted directed graph having k+1
levels with two dierent types of vertices. On level i, where i = 0; : : : ; k   1, are
the non-terminal nodes, each having two outgoing edges labeled by 0 and 1 or by
corresponding literals xi and xi. On level k are the terminal nodes having the label
0 or 1 and no outgoing edges.
A BDT has a direct correspondence to the truth-table of a function. Let
f(x0; x1; : : : ; xk 1) be a switching function. In the binary decision tree of f , each
node on level i corresponds to a specic variable xi, and by following the edges the
value of the function at (x0; x1; : : : ; xk 1) is found in the terminal node. Figure 2.2
shows a BDT representing the function f1 dened in Table 2.1.
Denition 2.4 A binary decision diagram is a rooted directed graph obtained from
a binary decision tree by the following reduction rules:
1. If two sub-graphs represent the same function, delete one, and connect the edge
pointing to its root to the remaining subgraph.
2. If both edges of a node point to the same sub-graph, delete that node, and
directly connect its incoming edge to the sub-graph.
2. Mathematical Background 7
Figure 2.2: A BDT for the function f1 in Table 2.1.
In Figure 2.3 is a BDD representing the function f1 dened in Table 2.1.
Figure 2.3: A BDD for the function f1 in Table 2.1.
The letter S in the nodes means that the nodes in the diagrams are Shannon
nodes, i.e., the decision diagram is a graphic representation of the Shannon expansion
of the function, which is dened as follows.
Denition 2.5 The Shannon expansion of the switching function f(x0; x1; : : : ; xk 1)
with respect to the variable xi is f = xif0xif1, where f0 = f(x0; : : : ; xi 1; 0; xi+1; : : : ;
xk 1) and f1 = f(x0; : : : ; xi 1; 1; xi+1; : : : ; xk 1), and  denotes the logical Exclusive
OR.
As in most literature, when discussing BDDs, we refer to ordered binary decision
diagrams (OBDDs), where the variable xi corresponds to the level i of the decision
2. Mathematical Background 8
tree. In [9], it has been shown that the OBDD of a given function is canonical, i.e.,
for a given ordering, the OBDD of a given function is unique up to function graph
isomorphism, for which the denition was given by Bryant in [9]. Several important
consequences follow from the uniqueness of OBDDs, for example, equivalence of
functions can easily be tested using OBDDs. The property of the representation
of functions as BDDs being canonical means, that when we reduce the BDT of a
function with a given variable ordering into a BDD, the operations of reduction
cannot be made in such a way or order, that two non-isomorphic BDDs could be
obtained starting from the same BDT.
2.2.2 Multiple-Valued Decision Diagrams
The denition of a BDD is easily extended to the q-ary case for representing functions
with a larger than two-valued domain. Again, we dene the decision diagram using
the denition of a decision tree.
Denition 2.6 A q-ary decision tree is a rooted directed graph having k + 1 levels
with two dierent types of vertices. On level i, where i = 0; : : : ; k   1 are the non-
terminal nodes, each having q outgoing edges with labels from the set f0; 1; : : : ; q 1g
or the set of q-ary literals X0i ; X
1
i ; : : : ; X
q 1
i . On level k are the terminal nodes, which
have labels from the set f0; 1; : : : ; q   1g and no outgoing edges.
Denition 2.7 A q-ary decision diagram is a rooted directed graph obtained from
a q-ary decision tree by the reduction rules in Denition 2.4.
In Figure 2.4 is an example of the structure of nodes in a q-ary decision diagram
when q = 4.
Figure 2.4: The node structure in a quaternary decision diagram.
In the case of a q-ary function f(x0; x1; : : : ; xk 1), the Shannon expansion of f
with respect to the variable xi is f(x0; x1; : : : ; xk 1) = X0i f(x0; x1; : : : ; xi 1; 0; xi+1;
: : : ; xk 1)+X1i f(x0; x1; : : : ; xi 1; 1; xi+1; : : : ; xk 1)+   +Xq 1i f(x0; x1; : : : ; xi 1; q 
1; xi+1; : : : ; xk 1) [30].
2. Mathematical Background 9
2.2.3 Multi-terminal and Shared Decision Diagrams
The number of terminal nodes in decision diagrams is not necessarily limited to q
nodes. Such decision diagrams are called multi-terminal decision diagrams and are
used to represent functions with a range having more than q elements. The only
dierence between a decision diagram and a MTDD is the number of terminal nodes.
For example, a multi-terminal binary decision diagram (MTBDD) is a BDD having
more than two terminal nodes. In other words, a MTBDD represents a function
f : f0; 1gk ! B, where B has more than two elements. The construction of a
MTDD or MTBDD can be derived from decision trees similarly as described for
BDDs and q-ary decision diagrams.
MTDDs and shared decision diagrams are useful when dealing with multi-output
functions or systems of functions, where terminal nodes are labeled by the values
that the system can get. For example, for switching functions, the binary l-tuples of
the outputs are interpreted as binary representations of the corresponding integers
and terminal values are labeled by these integer values. Shared decision diagrams
are constructed by rst constructing the decision diagrams for the dierent outputs,
and then sharing the isomorphic subgraphs of the obtained decision diagrams.
In Figures 2.5a and 2.5b are examples of a MTBDD and a shared BDD. Both
diagrams represent a two-output function f = (f0; f1) where f0 = x0 and f1 = x0x1.
In the MTBDD, the output values are interpreted as the corresponding integers.
For example, with the input (0; 1), the output values are f0 = 0 and f1 = 1, which
in the MTBBD is interpreted as the binary representation 01, which corresponds to
the integer value 1.
(a) (b)
Figure 2.5: A MTBDD (a) and a shared binary decision diagram (b) [6].
2.3 Fields and Vector Spaces
Linear error-correcting codes are dened as subspaces of vector spaces over elds.
Therefore, some basic concepts related to elds and vector spaces must be provided
2. Mathematical Background 10
here. The following denitions and properties are given according to [13], [20].
A eld is dened using the denition of a group, which can be stated as:
Denition 2.8 A group (G; ) is a set G together with a binary operation  on G,
such that the following properties for G and  are satised:
1. The set G is closed under the operation , i.e., for a; b 2 G the result of the
operation a  b is also in G.
2. The operation  is associative, i.e., (a  b)  c = a  (b  c).
3. There is an element e 2 G such that e  a = a  e = a for all a 2 G. The
element e is called the identity element of G.
4. For each a 2 G there is an element a0 2 G such that a  a0 = a0  a = e. The
element a0 is called the inverse of a.
The group G is called Abelian if commutativity, i.e., ab = ba holds for all a; b 2 G.
The identity element (neutral element) of an Abelian group is usually denoted by 0.
A subset H of the set G is a subgroup if it also forms a group under the operation
.
Denition 2.9 If (G; ) is a group, g an element of G, and H a subgroup of G,
then gH = fg  h : h 2 Hg is a left coset of H in G and Hg = fh  g : h 2 Hg is a
right coset of H in G.
Cosets appear in the decoding process of linear codes. For Abelian groups, the
left and right cosets coincide.
Denition 2.10 A eld is a set F together with two operations, "+" and "", such
that the set is an Abelian group under the operation "+", the nonzero elements
of the set form an Abelian group under the operation "" and the distributive law
a  (b+ c) = a  b+ a  c holds.
The multiplicative identity element (the identity element of the Abelian group
under "") of a eld is called the unit element and it is usually denoted by 1.
Denition 2.11 Let F be a eld. A vector space V over the eld F is an additive
Abelian group, together with the association
(x; v)! xv;
where x 2 F and v 2 V , satisfying:
1. If 1 is the unit element of F, then 1v = v for all v 2 V .
2. Mathematical Background 11
2. If c 2 F and v; w 2 V , then c(v + w) = cv + cw.
3. If x; y 2 F and v 2 V , then (x+ y)v = xv + yv.
4. If x; y 2 F and v 2 V , then (xy)v = x(yv).
Denition 2.12 A subset W of V is a subspace of V if the following conditions are
satised:
1. If v; w 2 W , their sum v + w is also an element of W .
2. The identity element 0 of V is an element of W .
3. If v 2 W and c 2 F, then cv 2 W .
We denote a eld of q elements by Fq and the vector space of length i vectors over
this eld by Fiq. For vectors x 2 Fiq we take the convention of them being column
vectors, hence, a vector xT is always a row vector.
2.4 Error-Correcting Codes
Error-correcting codes are typically used for reliable delivery of digital information
over communication channels, which may introduce errors to the transmitted mes-
sage due to channel noise. Error-correcting codes add redundancy to messages,
which enable error detection and correction at the receiver. The theory of error-
correcting codes is in general a deep topic and many algorithms for the encoding
and decoding processes have been developed for particular code classes. However,
in principle for linear error-correcting codes with short codelengths, encoding and
decoding can be done using simple matrix and lookup operations, so their imple-
mentation is easy.
In the following, we dene and discuss important properties of error-correcting
codes. Some example codes are also given, but mostly the codes considered for
particular applications will be introduced later in the thesis as the corresponding
applications are discussed.
2.4.1 Linear Codes
In this section we recall basic denitions and properties of error-correcting codes,
focusing on linear codes, i.e., linear subspaces of Fnq . The error-correcting properties
of a code are dened over the metric, which is used for the given code, and for the
purposes of this thesis we consider linear codes over the Hamming metric and the
Lee metric. For general properties of error-correcting codes we refer to [7], [21], [34].
2. Mathematical Background 12
Denition 2.13 A code C is a subset of Fnq . C is called a linear code if C is a
linear subspace of Fnq .
The elements of C are length n vectors, which are called codewords. A linear code
C of dimension k  n is spanned by k linearly independent vectors of C, i.e., every
codeword can be written as a linear combination of these k linearly independent
vectors.
Denition 2.14 A matrix G having as rows any k linearly independent vectors of
C is called a generator matrix of the code C.
If the code has length n and dimension k it is called an (n; k) code.
Dene that two column vectors x and y are orthogonal if xTy = 0. The code C
of length n and dimension k can equivalently be specied by listing n   k linearly
independent vectors of C?, where C? is the subset of Fnq consisting of all vectors
orthogonal to all vectors of C. Any matrix H having as rows such n   k linearly
independent vectors is called a parity check matrix of C.
Denition 2.15 Two codes are equivalent if and only if their generator matrices
are obtained from each other by column permutations and elementary row operations.
Since the rows of the generator matrix are linearly independent and span the
linear code, i.e., linear subspace of Fnq , elementary row operations leave the space
unchanged. The permutation of columns corresponds to permutation of symbols in
the codewords. When these operations are performed for the generator matrix G of
the code C, the resulting code will be only trivially dierent from C.
A generator matrix G of the code C is in systematic form if
G = [IkjP]
=
266664
1 0 0    0 p1;1 p1;2    p1;n k
0 1 0    0 p2;1 p2;2    p2;n
...
...
...
. . .
...
0 0 0    1 pk;1 pk;2    pk;n k
377775 ;
where P is called the parity part of the generator matrix. Any generator matrix of
C can be put into this form by column permutations and elementary row operations.
The resulting generator matrix denes a systematic code, which is equivalent to C.
Also, if the generator matrix of C is G = [IkjP], it is clear that the parity check
matrix H is of the form
 PT jIn k, since
GHT = [Ik P]
"
 P
In k
#
=  P+P = 0:
2. Mathematical Background 13
For binary codes, we may write H in the form

PT jIn k

.
The code C encodes an information word i = [i0; i1; : : : ; ik 1]T to a length n
codeword c = [c0; c1; : : : ; cn 1]T by matrix multiplication cT = iT  G. Thus, the
code C can be dened as
C = fiTG j i 2 Fkqg
and equivalently with the parity check matrix H as
C = fc 2 Fnq j cTHT = 0g:
For example, if we have the following generator matrix:
G =
"
1 0 1 1 0
0 1 1 0 1
#
;
then the parity check matrix H for this code is:
H =
264 1 1 1 0 01 0 0 1 0
0 1 0 0 1
375 ;
and encoding the information word i = [0; 1]T we get the corresponding codeword
iTG = [0; 1] 
"
1 0 1 1 0
0 1 1 0 1
#
= [0; 1; 1; 0; 1]. Here k = 2 and n = 5.
2.4.2 The Hamming Metric
The vector space Fnq can be made into a metric space by dening the distance
between two vectors x and y. The Hamming metric is the most commonly used
metric for dening this distance and the properties of the error-correcting code.
Denition 2.16 The Hamming distance dH(x;y) of vectors x and y of length n is
the number of coordinates where x and y dier, i.e., dH(x;y) = jfi j xi 6= yigj.
Consider the vectors x = [0; 1; 0; 0]T and y = [1; 1; 0; 0]T 2 F42. The vectors dier
in the rst entry, hence, the Hamming distance is dH(x;y) = 1.
Denition 2.17 The Hamming-weight wH of a vector x is wH = dH(0;x).
Denition 2.18 A code C is e-error correcting if the minimum Hamming distance
between two codewords is 2e+ 1.
An e-error-correcting code can detect and correct up to e errors in the encoded
information word. This means, that if the values of the encoded information word
2. Mathematical Background 14
iTG change in  e positions, the decoder of the code will be able to detect and
correct the errors, i.e., interpret the received sequence as the correct information
word.
Denition 2.19 A code C 2 Fnq is called a q-ary  -covering code of length n if
for every word y 2 Fnq there is a codeword x 2 C such that the Hamming distance
dH(x;y)   . The smallest such  is called the covering radius of the code.
In other words, the covering radius of the code is the smallest  such that the
nite metric space Fnq is exhausted by spheres of radius  around the codewords.
Denition 2.20 A code is called perfect if it is e-error correcting and its covering
radius is e.
For a perfect code, the entire metric space is lled by the radius e spheres around
the codewords, with no overlaps. This means, that every length n sequence can be
traced back to a length k information word. However, if more than e errors occur,
the interpretation will be incorrect.
An important example of perfect codes in the Hamming metric are the Hamming
codes. Binary Hamming codes are a family of (2m   1; 2m   m   1) codes with
parity check matrices consisting of all 2m 1 distinct non-zero m-tuples. Since all of
the columns are distinct, no sum of two columns can be the zero vector, and hence
the code has minimum distance of  3. Therefore, Hamming codes are one error
correcting. Hamming codes have the covering radius of one, hence they are perfect
codes. This means that for each binary vector v of length 2m   1 there is a unique
codeword within distance 1 from v.
For example, the parity check matrix H of the binary (7,4) Hamming code can
be constructed by listing all vectors of length 3 as columns:
H =
264 1 1 1 0 1 0 01 1 0 1 0 1 0
1 0 1 1 0 0 1
375 :
Therefore, its generator matrix is
G =
26664
1 0 0 0 1 1 1
0 1 0 0 1 1 0
0 0 1 0 1 0 1
0 0 0 1 0 1 1
37775 :
Hamming codes are easily dened for vector spaces Fnq having parameters (
qm 1
q 1 ;
qm 1
q 1  m). For each m, there are (qm   1) dierent nonzero vectors, but since the
2. Mathematical Background 15
minimum distance of the code is  3, the columns of the parity check matrix have
to be pairwise linearly independent. Therefore, there are q
m 1
q 1 distinct q-ary vectors
that we can list as columns of the parity check matrix H.
2.4.3 The Lee Metric
Another metric used for error-correcting codes is the Lee metric. Let us call a
Lee-error-correcting code any error-correcting code dened for the Lee metric.
Denition 2.21 The Lee distance dL(x;y) of q-ary vectors x and y of length n is
the sum
Pn
i=1min(jxi   yij; q   jxi   yij).
For example, for q = 5,
dL(0; 1) = 1; dL(0; 2) = 2; dL(0; 3) = 2 and dL(0; 4) = 1:
And similarly for vectors x = [2; 1; 3]T and y = [1; 1; 0]T , x;y 2 F35, the Lee distance
dL(x;y) = 1 + 0 + 2 = 3:
Also, dL(a + k;b + k) = dL(a;b), i.e., like the Hamming metric, the Lee metric
is translation invariant. Notice, that for q = 2 and q = 3, the Hamming and the Lee
metric coincide.
Denition 2.22 The Lee-weight wL of a vector x is wL = dL(0;x).
An e-Lee-error-correcting code will be able to detect and correct errors, which
are at the Lee distance  e from the encoded sequence iTG, i.e., have Lee-weight
 e.
Perfect codes exist for the Lee metrics also. For example, for any given e, there
exists a perfect e-Lee-error-correcting code with n = 2 over Fq such that q = 2e2 +
2e+ 1.
2.4.4 Decoding of Linear Codes
When error-correcting codes are used in communication, the underlying assumption
is that a digit in a codeword has a small probability to change from one value to
another during the transmission over a noisy channel. Thus the most likely original
codeword is the one that is closest in the used metric to the received one. Hence
the decoding is unique (though not necessarily correct) as long as there is a unique
codeword closest to the received vector.
Consider the codewords of an e-error-correcting linear code C. Around each
c 2 C there is a sphere of radius e containing all the words at distance  e from c,
i.e., all the words that will be decoded into the word c. Hence, the decoding process
2. Mathematical Background 16
can be done by listing all the members of every sphere or more eciently by using
a structure called the standard array.
The standard array uses the coset decomposition of the additive group of Fnq .
Since a linear code is a subspace of Fnq , it is a subgroup of the additive group of Fnq
and hence the all-zero word is always a codeword. Thus each vector in the space
belongs to exactly one coset of the code. The vectors with minimum weight (i.e.,
minimum distance from the all-zero codeword) are called coset leaders. It is clear
that for an e-error correcting code, all vectors of weight at most e are coset leaders.
This follows immediately from the fact that the dierence of two vectors in the same
coset belongs to the code. A moment of thought shows that if each coset has only
one leader then for each element in the space there is a unique closest codeword.
Choosing that closest codeword when decoding the received vector corresponds to
maximum likelihood decoding [34].
The standard array in Table 2.3 is constructed as follows. Let 0; c2; : : : ; cqk be the
codewords of C, which is a subspace of Fnq . Begin by listing these words on the rst
row of the standard array. Then, choose one unused word v1 at distance  e from
the all-zero word, i.e., a coset leader, and place it as the rst entry of the second
row. The rest of the words on the second row will be v1 + c2; : : : ;v1 + cqk , i.e., the
word v1 translated by each of the codewords. These are the words belonging to the
coset of v1. Then, choose again an unused word at distance  e from the all-zero
codeword and translate it with each codeword to get the next row corresponding
to the next coset. Repeat this until there are no unused coset leaders. If there are
still unused words that do not appear in the standard array, draw a horizontal line
across the array, and to obtain the next row of the array, choose a word vj, which
is as close as possible to the all-zero word and translate the word vj by each of the
codewords. Repeat this until there are no unused words in Fnq .
Table 2.3: The standard array.
0 c2 : : : cqk
v1 v1 + c2 : : : v1 + cqk
...
...
...
...
vi vi + c2 : : : vi + cqk
vj vj + c2 : : : vj + cqk
...
...
...
...
In terms of the standard array, a decoder would assign a received word having at
most e errors (i.e., appearing above the horizontal line), into the codeword, which lies
above it in the standard array. Words below the horizontal line are either assigned
2. Mathematical Background 17
to a codeword similarly as the words at distance e from the codeword, or the decoder
refuses to decode such a word indicating more than e errors in the received word.
For perfect codes, there are no words under the horizontal line, since the spheres
around the codewords ll the entire metric space.
The coset leaders can be used to construct a fast decoding strategy by introducing
the syndrome s of a word v 2 Fnq , which is dened as sT = vTHT : It can be shown
that all vectors in the same coset have the same syndrome, which is unique to that
coset [34]. Hence, for ecient decoding, we only need to list the coset leaders and the
syndromes. Then, for each received word, it is enough to compute the syndrome and
nd the corresponding coset leader. Subtracting the coset leader from the received
word will correct the error.
18
3. FAULT-TOLERANCE IN LOGIC CIRCUITS
Fault-tolerance is an important topic in logic design and there exist several tech-
niques for providing tolerance against hardware component failures. In some cases,
for example in military and aerospace computing, fault-tolerance is critical, and it
is often desirable in other applications. Even when using high-quality components
and the best design techniques, system failures can still occur, and therefore tech-
niques for providing fault-tolerance are necessary. The rst ideas for fault-tolerant
logic were given by von Neumann [35] and Shannon and Moore [24] already in the
1950s. In the work of Shannon and Moore it was shown that arbitrarily reliable cir-
cuits can be built from unreliable components when the number of these unreliable
components is suciently large.
Failures in digital systems may be due to numerous reasons, and we begin by an
overview of the most common types of faults that may result in errors. Next, the
basic strategies for increasing fault-tolerance are discussed in Section 3.2. One of the
most well-known techniques against hardware component failures is triple modular
redundancy (TMR), which is explained in Section 3.3. The TMR technique has
been studied and improvements have been presented in several papers. There exist
also methods for increasing fault-tolerance, which utilize more complicated error-
correcting codes, and some of these techniques are reviewed in Sections 3.4-3.6.
3.1 Faults in Digital Systems
In digital systems, a system failure means that an element is unable to function due
to errors in the element or in its environment, the errors being caused by various
faults [26]. Faults are physical defects that can be caused by numerous reasons, e.g.
design errors, damage or external disturbances. Faults are manifested by errors,
thus a fault may change the value of the signal. A fault does not necessarily result
in an error, which gives meaning for the term fault-tolerance. The purpose of fault-
tolerant design is to ensure that a system can perform its intended function even in
the presence of a given number of faults.
Such faults, which change the value of the signal, e.g. from a 0 to a 1 or vice versa,
are called logical faults [19]. Other types of faults are referred to as nonlogical, which
include, for example, the malfunction of the clock signal or a power failure. Logical
faults are the type of faults that are of interest for the fault-tolerance methods
3. Fault-Tolerance in Logic Circuits 19
discussed in this thesis, and they can be further characterized by their value, extent
and duration [19]:
1. Value: Logical faults can cause xed or varying erroneous logical values.
2. Extent: The eect of a logical fault can be local or distributed.
3. Duration: A logical fault can be either permanent or temporary.
Dierent types of faults are described by fault models. The most common model
used for logical faults is the single stuck-at fault. A stuck-at-fault is a fault, in which
one of the inputs or the output of a gate is xed to some value, e.g. in switching
circuits to a 0 or a 1 [19]. The stuck-at-fault model is used for modeling the most
common types of physical defects in circuits, e.g. shorts, which may cause damage
to the circuit due to very low resistance in the circuit, and opens, in which the path
of the current gets broken. The stuck-at-model can also be used for representing
multiple faults in a circuit, when there are multiple stuck-at-faults in the circuit at
the same time.
When signal lines in a circuit get accidentally connected to each other, it results
in a permanent fault called a bridging fault [19]. Bridging faults are connected to
stuck-at-faults, since, depending on the technology, a bridging fault may manifest
as a stuck-at-fault. It may also cause a circuit to oscillate. The bridging fault model
is more applicable when the line widths are small.
A defect in a circuit may also be small enough not to alter the logic function,
which the circuit realizes, but will only cause the circuit to fail to meet its timing
specications [19]. Such defects delay the transition of a signal on a line to the
correct value, and are modeled by delay faults.
3.2 Fault-Tolerance Strategies
The foundation for fault-tolerance is in redundancy. Redundancy may be introduced
to hardware, software, information and computations, and the amount of redun-
dancy depends on the applied fault-tolerance technique. Fault-tolerance strategies
include one or more of the following elements [26]:
1. Masking, i.e., dynamic correction of generated errors.
2. Detection of an error, i.e., a symptom of a fault.
3. Containment, which means the prevention of error propagation.
4. Diagnosis, i.e., identication of the faulty module.
5. Repair or reconguration of the faulty component by replacement, elimination
or bypassing it.
3. Fault-Tolerance in Logic Circuits 20
6. Recovery, i.e., correction of the system to an acceptable state.
Masking, detection and correction are most important with respect to the topic
of this thesis, and several fault-tolerance techniques carry out one or more of these
three. Error detection, i.e., the ability to tell the incorrect values apart from the
correct ones, is the easiest of these to implement. An example of an error detection
scheme is simple parity checks in buses, memory and registers. Masking and correct-
ing errors is more dicult, and often requires a lot of redundancy. This redundancy
can be obtained, for example, by copying modules, which is the basic principle in
TMR technique (Section 3.3).
Error-correcting codes are an eective way of obtaining redundancy, and they of-
ten require less redundancy than other error detection and correction schemes. Their
theory is well developed and exploited in numerous applications. In logic circuits,
error detection and correction is usually implemented in special decoding circuits
[26]. The performance of the error-correcting scheme depends on the properties of
the code, and better error-correction ability usually requires more redundancy and
therefore, more physical space on circuits.
3.3 Triple Modular Redundancy
The TMR technique was rst introduced by von Neumann in [35], where he proposed
a conguration of independently computed copies of a signal and a restoring organ
in between logical operations. The basic idea of the TMR technique is to triplicate
the modules and then use a majority voter to decide the output of the whole system.
This way, if one of the modules produces an incorrect output, the majority vote will
still guarantee a correct output for the entire system. The TMR technique can be
generalized to N -modular redundancy (NMR), where there are in total N modules
and a voter, which decides the output based on the outputs of the modules.
The NMR technique can be described in terms of a (r; 1) repetition code, where
r = N . Repetition code is the simplest linear code. In an (r; 1) binary repetition code
a digit is encoded as a sequence of r repetitions of the digit itself, and the decoding is
done by majority-vote decoding. For example, in the binary (3; 1) repetition code,
which relates to the TMR technique, a 0 is encoded as the sequence 000, and if
the decoder receives either 000, 001, 010 or 100, it decodes the sequence as a 0 by
majority-vote decoding. Therefore, the (3; 1) repetition code is one error correcting.
In the TMR technique, usually an entire module realizing some logical operation
is encoded, i.e., triplicated, and a single voter is placed after the three modules
(Figure 3.1) or, more typically, a voter is placed after each module. These voters
together decode the output to a single output value. This output can be a symbol,
i.e., a sequence of bits, and therefore the TMR network can correct single symbol
3. Fault-Tolerance in Logic Circuits 21
errors. In Figure 3.2a is a network of modules and in Figure 3.2b is a typical TMR
version of this network, where the circles correspond to voters. The inputs are
i1; i2; i3 and outputs o1; o2.
Figure 3.1: The simplest TMR structure, with input i and output o.
(a) (b)
Figure 3.2: A network of modules (a) and its TMR version (b) with three inputs i1; i2; i3
and two outputs o1; o2.
In a TMR computer architecture, a module consisting of a memory and a pro-
cessor is triplicated, and a voter is added after each triplicated module. This allows
correction of single symbol errors, but bit errors distributed over the modules cannot
be corrected. In [17], an improvement for the TMR technique was proposed, where
3. Fault-Tolerance in Logic Circuits 22
the memory part of the modules is organized in a bit-sliced way, i.e., from modules of
smaller bit width, which increases the amount of hardware in the conguration, but
can tolerate arbitrary bit-slice failures. It can also tolerate the failure of arbitrary
bit-slices even if one of the modules is disabled. This idea of combined symbol- and
bit-error-correcting in a computer architecture originates from the (N;K) concept,
which is discussed in Section 3.5.
The TMR technique is the most widely used fault-tolerance technique. It has
been used in highly critical applications, e.g. in space technique [11], as well as
in some ECC memory, where ECC refers to error-correcting codes, although ECC
memory more typically uses Hamming codes [14].
3.4 Self-Checking Circuits
Self-checking circuits utilize the theory of error-correcting codes and have built-in
error detection capability. They are multi-output circuits, which produce an output
vector, from which the possible faults in the circuit can be detected. Formally, the
output of a self-checking circuit is a vector Y(X; f) which is a function of the input
vector X and the fault f in the circuit [36]. The inputs and outputs are codewords,
and the use of dierent error-correcting codes in these circuits has been studied by
several authors. When no faults occur in a self-checking circuit, the output vector is
a codeword, but a fault should result in a non-codeword output vector (detectable
error). However, it may happen that faults in the circuit result in an incorrect
codeword, in which case the fault will result in an undetectable error.
Self-checking circuits are divided to fault secure circuits, self-testing circuits or
totally self-checking (TSC) circuits [26]. In fault secure circuits, a correct input
codeword never causes an incorrect output codeword for a specied fault in the
circuit, i.e., no undetectable errors for correct input codewords can occur. However,
a fault might not result in a detectable error either. For self-testing circuits, if a fault
occurs in the circuit, the output is a non-codeword for at least one input codeword.
This means that if all input codewords occur in the operation of the circuit, a fault
will be detected by at least one of these input codewords. TSC circuits have the
properties of both self-testing and fault secure circuits, since a fault in the circuit,
in principle, cannot cause an error in the outputs without detection of the fault
[27]. Therefore, an incorrect input cannot result in a correct output and at least one
correct input will detect possible faults in the circuit.
TSC circuits are the most important class of self-checking circuits, and the above
properties guarantee that if the output is a codeword, it is safe to assume that it is
correct, and on the other hand, if there is a fault in the circuit, it will be detected
at some point. However, it may happen that faults are not detected in the order of
their appearance [27]. This might be the case if the rst fault is detectable only by
3. Fault-Tolerance in Logic Circuits 23
a particular input codeword. Also, implementing the self-testing properties in an
economical way has been problematic.
The basic implementation of a TSC network is introducing a TSC checker into
the network [27]. The TSC checker is a circuit, which takes as inputs the outputs of
the TSC network and is designed to detect errors in the error-correcting code used
in the network. Hence, the TSC checker is an additional piece of hardware, which
handles the error detection of the designed network. Possible faults can be detected
from the outputs of the checker. However, there is no information on whether the
fault occurred in the TSC network or in the checker.
3.5 The (N;K) Concept
The (N;K) concept was introduced by Krol in [18] as a new fault-tolerant computer
architecture based on a distributed implementation of a symbol-error-correcting
code. It is essentially a generalization of the TMR technique in which dierent
coding schemes are applied to memory data and processor data. In a TMR com-
puter architecture, both the memory data and the processor data are encoded using
the (3; 1) repetition code, but with the (N;K) concept, the amount of additional
memory hardware can be reduced. In [18], the concept was explained in more detail
for the case where N = 4 and K = 2, in which the fault-tolerant computer archi-
tecture is designed using a specic symbol- and bit-error-correcting code. The term
symbol refers to a sequence of bits, so within a faulty symbol, any number of bits
can be incorrect.
The (N;K) concept makes it possible to choose a ratio between memory and
processor redundancy and therefore makes it possible to optimize the total amount
of redundancy. In [18], the TMR technique was described as the (3; 1) concept, where
the computer architecture consists of three identical modules, each having memory,
a processor and a voter. All data in the system is triplicated, and voters mask the
possible failures in a single module. In the (N;K) concept, the processor data is
similarly encoded into a (N; 1) symbol-error-correcting code, but the memory part is
encoded into an (N;K) symbol-error-correcting code, where each of the N modules
contains one symbol of the code word. However, the symbol size in the (N;K) code
is K times smaller than the symbol size of the (N; 1) code, which means that less
redundancy is introduced when using the (N;K) concept.
The concept is easier to understand by means of an example, e.g. the (4; 2) con-
cept. In this case, the computer architecture consist of four modules, each having
memory and a processor. The processors are identical in all modules, but the mem-
ory part of each module has a wordlength of half a data word, i.e., the memory in
the (4; 2) concept is only doubled. The memory in each module is protected by a
symbol- and bit-error-correcting code, which can correct single symbol errors and
3. Fault-Tolerance in Logic Circuits 24
double bit errors even if the bit errors occur in dierent modules. If symbol errors
occur in more than one of the modules, the code can no longer correct the errors.
In Figure 3.3 is the layout of the modules, in which the wordlengths in each line is
written on the transfer lines. The boxes labeled E1; E2; E3; E4 together form the en-
coder of the (4; 2) symbol-error-correcting code. This way, data is encoded when it
is transmitted from the processors to the memories, but the encoding is distributed
over the modules to reduce redundancy. When the data is transmitted from the
memories to the processors, each module receives the complete codeword and the
decoders mask the faulty symbols.
Figure 3.3: The layout of the (4; 2) concept fault-tolerant computer architecture.
Since the (N;K) concept discusses a complete fault-tolerant computer architec-
ture, it is not as such applicable to circuit design. It is, in principle, a generalization
of the TMR technique, which enables the usage of more complicated error-correcting
codes and the choice in the ratio between the memory and processor redundancy.
The advantage of the (N;K) concept compared to the TMR technique is the possi-
bility of minimizing the total amount of hardware.
3.6 Low-Density Parity-Check Codes
Low-density parity-check (LDPC) codes were introduced already in the 1960s by
Gallager in [15], but they were rediscovered for applications in the 1990s. LDPC
codes are a class of codes, which are specied by sparse matrices, i.e., matrices con-
taining mostly zeros. They have been popular in many communication applications,
since the encoding matrices have low density and decoding can be done iteratively in
an easy manner. Here we briey review some selected applications of LDPC codes
in fault-tolerant logic, in which the use of LDPC codes is motivated by their high
error detecting and correcting ability, and because the resulting encoding, decoding
3. Fault-Tolerance in Logic Circuits 25
and checker circuits are sparse. In [10] and [16], LDPC codes are considered for
error correction in fault-tolerant memory, and in [22] the application of LDPC codes
in fault-tolerant nite eld multipliers is introduced.
The application of LDPC codes in nanoscale ECC memory in [16] is interest-
ing, since it considers modern nanoscale logic. A particular type of LDPC codes is
used in the application, namely Euclidean Geometry LDPC codes, see [32], since
the decoding process for these codes can be implemented with multistep majority
decoders, thus making the decoding fast and giving low latency, i.e., time delay,
for memory operations. In [16], the possibility of faults in the encoder and checker
circuits is taken into account, since they are also fabricated out of nanoscale com-
ponents, which are prone to faults. An ecient way of fabricating the encoder and
decoder circuits using nanowire-based programmable logic arrays (PLAs) is pro-
posed in [16], and the error detection and correction capacities of the resulting ECC
nanomemories are analyzed.
In [10], LDPC codes are used for a fault-tolerant memory architecture. The mo-
tivation behind the application is again in the decrease of transistor sizes, which
increases the unreliability of the components. In particular, the authors consider
unreliable components, which are subject to transient errors, i.e., errors, which ap-
pear at particular time steps. The idea is to build a reliable memory entirely of
unreliable components using LDPC codes. The existence of such reliable memories
is proved in [10], and it is also shown that the resulting architecture will have fairly
low redundancy.
The application of LDPC codes in [22] is motivated by fault related attacks in
cryptographic hardware, which is used, for example, for digital signature and identi-
cation schemes. It has been shown that attackers can inject faults into the hardware
causing incorrect outputs, which expose the digital signatures. Cryptography appli-
cations use nite eld arithmetic, and in [22], a method for designing error-correcting
multiplier circuits for nite elds is proposed, in which LDPC codes are used due
to their reduced decoding complexity. The resulting applications are shown to have
signicantly less redundancy than the traditional TMR technique, but additional
delay is introduced to the circuits due to encoding and decoding procedures.
26
4. ERROR-CORRECTING DECISION
DIAGRAMS
Many of the existing fault-tolerance techniques, e.g. the TMR technique (Section
3.3), are based on repetition of logic modules and additional checking circuitry. In
modern logic circuits, transistors are becoming smaller and smaller, and even atomic-
scale imperfections and variations within each transistor can result in circuit failures.
Therefore, in addition to testing and fault detection procedures, it is important to
nd systematic ways of introducing fault-tolerance already into the representations
of switching functions. When redundancy is introduced to the representations of
functions, no additional voters or checker circuits are required. This is the motivation
behind combining the theory of error-correcting codes and decision diagrams, since
decision diagrams are an ecient way of representing functions. The original method
presented in this chapter introduces robustness to the representations of functions,
and due to the properties of decision diagrams, the information on the complexity
and layout of the resulting circuits is contained in these robust representations.
Each robust diagram is specied by an error-correcting code, and the properties of
the given code aect the complexity and layout of the diagram. The procedure of
constructing the robust diagram is analogous to the decoding process of the code,
which for linear codes is, in principle, based on simple matrix and lookup operations.
First, some introduction to the subject is provided in Section 4.1 before moving
onto denitions and details. The denition of error-correcting decision diagrams is
given in Section 4.2, and the step by step procedure of constructing such a diagram is
described in Section 4.3. In Section 4.4, several examples of error-correcting decision
diagrams in both binary and multiple-valued logic are given.
4.1 Introduction to Error-Correcting Decision Diagrams
Consider the TMR technique described in Section 3.3. If instead of logic modules,
we simply consider variables, then TMR realizes the majority function f4 of three
variables, for which the truth vector is given in Table 4.1. On the other hand,
the majority function f4 has a direct correspondence to the binary (3; 1) repetition
code, since the function has the values 0 and 1, when the received length 3 vector
is decoded to 0 and 1, respectively, by the decoding rule of the repetition code.
The majority function f4 can be represented by its BDD, which is shown in Figure
4. Error-Correcting Decision Diagrams 27
Table 4.1: The truth-table of the majority function f4 of 3 variables.
x0x1x2 f4(x0; x1; x2)
000 0
001 0
010 0
011 1
100 0
101 1
110 1
111 1
4.1. There is some similarity between the structure of the BDD and the structure
of a logic module with TMR (Figure 3.1), i.e., the four nodes correspond to the
triplicated units and a voter. Also, due to the correspondence between the majority
function f4 and the (3; 1) repetition code, this diagram can be thought of as the
BDD representing the decoding rule of the (3; 1) repetition code, where by following
the edges corresponding to the received sequence, the original information word is
found in the terminal node. In other words, the received vector is the input and the
output is given by the decoding rule of the (3; 1) repetition code. Hence, the BDD
in Figure 4.1 is the simplest error-correcting decision diagram.
0
1
2
3
0 1
S 0
0
1
S 1
0 4
0 1
S 3
1 5
0
1
S 2
Figure 4.1: The BDD of the majority function f4.
The above diagram is a robust representation of the 1-variable function f5, which
is 0 when the input is 0, and 1 when the input is 1. The binary decision diagram of
the function f5 consists of just a single non-terminal node and two terminal nodes.
The basic idea of how the decision diagram in Figure 4.1 is obtained from f5 is to
map the function f5 to the majority function f4 given in Table 4.1. The mapping is
done by assigning such length 3 vectors to 0 (1), which would be decoded to 0 (1)
by the decoding rule of the (3; 1) repetition code. Then, the BDD of the majority
function f4 will be the error-correcting BDD of the function f5. Hence, the BDD in
4. Error-Correcting Decision Diagrams 28
Figure 4.1 is a robust version of a single decision node.
The idea can be generalized to arbitrary functions and codes. The procedure
of generating error-correcting decision diagrams follows the decoding rule of the
code, and the error-correcting properties of the diagrams depend on the given code.
Using an e-error-correcting code leads to an error-correcting decision diagram, which
corrects e decision errors. If the utilized code is a linear (n; k) code, it is a subspace of
the vector space Fnq , and the paths of an error-correcting decision diagram correspond
to the elements of the additive group of Fnq . The paths corresponding to elements
belonging to cosets having coset leaders at distance  e from the all-zero codeword
lead to the correct output value.
Now, if the error-correcting capability of a given code is e, then in the error-
correcting decision diagram generated with the given code, even if a wrong decision
is made in up to e nodes, we will still end up in the correct terminal node. For
example, when e = 1, if we follow the edges of the diagram towards the terminal
node, we may take a wrong turn once on the path that we are following, and we will
still end up in the correct terminal node. This is illustrated in Figure 4.2, where
following both the edges labeled 0 and 1 after the node on level 1 results in the same
output value. Notice, that the edge from the left-hand-side node on level 1 to the
terminal node labeled with 0 also includes one change in the input value.
Figure 4.2: Illustration of how one faulty node on a path does not aect the output value.
The concept of error-correcting decision diagrams is a new method for intro-
ducing fault-tolerance in logic design. In this thesis, the method is discussed on a
theoretical level, but once the decision diagrams are generated, their implementation
is straightforward with a suitable technology. The ideas introduced in this thesis
have also resulted in two conference papers [4], [5], where error-correcting decision
diagrams and their performance is discussed.
4. Error-Correcting Decision Diagrams 29
4.2 Formal Denition of Error-Correcting Decision Diagrams
The denition of error-correcting decision diagrams is given in terms of linear codes.
However, the idea can be generalized to apply for any coding schemes. This possibil-
ity is briey discussed in Section 6. In the following denition, we denote by d(x;y)
the distance between the vectors x;y. This distance is assumed to be a metric.
Denition 4.1 Let G be the generator matrix of a linear e-error-correcting (n; k)
code, f = f(x0; x1; : : : ; xk 1) a function, where [x0; x1; : : : ; xk 1]T 2 Fkq and g =
g(y0; y1; : : : ; yn 1) a function, where [y0; y1; : : : ; yn 1]T 2 Fnq . The error-correcting
decision diagram of f is the decision diagram of g, where g is dened as
g(y) =
8<:f(x) if there is x 2 Fkq such that d(yT ;xTG)  e otherwise, (4.1)
and the value  can be chosen arbitrarily.
In other words, each xTG and the vectors yT , where y 2 Fnq , within distance e
from xTG are assigned to the value f(x). The vectors y 2 Fnq at distance > e from
all the codewords are assigned to the label . The symbol  can be some arbitrary
value, which can be dened in a suitable way. It can have a value f(x), where x 2 Fkq ,
or it can have some other value. The function now behaves as the decoding rule of
the code having the generator matrix G, i.e., the vectors of Fnq within distance e
from a codeword xTG are interpreted as the codeword itself when determining the
function value. This is just the decoding process, where each received n-ary sequence
is interpreted as the codeword within distance e from the received sequence. If the
label  is obtained, then more than e decision errors have been made indicating at
least e+ 1 faults in the corresponding circuit.
Denition 4.2 Given a linear e-error-correcting (n; k) code, the general error-
correcting q-ary decision diagram of k-variable functions is the error-correcting deci-
sion diagram of f = f(x0; x1; : : : ; xk 1), where the function values as (x0; x1; : : : ; xk 1)
runs through the domain Fkq of f are left unspecied.
The error-correcting decision diagram of a particular f : Fkq ! Fq can be derived
from the general error-correcting q-ary decision diagram of k-variable functions by
assigning the values of the function to the terminal nodes of the error-correcting
decision diagram and reducing with respect to those values.
4.3 Constructing Error-Correcting Decision Diagrams
Suppose we wish to have an error-correcting decision diagram for arbitrary q-ary
functions of k variables, i.e., we want to construct the general error-correcting q-ary
4. Error-Correcting Decision Diagrams 30
decision diagram of k-variable functions. The procedure begins by determining a
suitable q-ary linear (n; k) code, which corrects e errors. After selecting the code,
the function f = f(x0; x1; : : : ; xk 1), where [x0; x1; : : : ; xk 1]T 2 Fkq , is mapped to
the function g = g(y0; y1; : : : ; yn 1) of larger domain, where [y0; y1; : : : ; yn 1]T 2 Fnq .
This mapping is done by equation (4.1) using the specied metric.
The next step is to construct the multi-terminal decision tree having qn terminal
nodes for the function g. Then, the obtained tree is reduced to an MTDD. After
reducing, we have a diagram with qk + 1 terminal nodes labeled by f(x), where
x 2 Fkq , and . This diagram can correct e decision errors, since the correct function
value is obtained even if a decision error occurs in  e nodes of the diagram.
From the obtained reduced decision diagram we get a robust decision diagram of
a particular function f : Fkq ! Fq by replacing the labels f(x) by the actual values of
the function f and reducing the diagram with respect to those values. This diagram
will then give a robust layout for a circuit realizing the desired q-ary function of k
variables.
The step by step method of constructing error-correcting decision diagrams goes
as follows. The last step is only for the case where we wish to obtain the error-
correcting decision diagram for a particular function.
1. Multiply the generator matrix G of the desired (n; k) code by each vector xTi ,
where xi 2 Fkq to obtain ci 2 Fnq .
2. For each ci, list all the vectors yi;1;yi;2; : : : ;yi;u of Fnq within distance e from
each ci.
3. To obtain the function g, assign the value f(xi) to each ci and to the corre-
sponding set of yi;1;yi;2; : : : ;yi;u.
4. Map each y 2 Fnq at distance > e from all the codewords ci to .
5. Construct a MTDD for the function g and reduce it.
6. To obtain the robust decision diagram of a specic function f , assign the values
of f into the terminal nodes of the MTDD of the function g and reduce.
In terms of the standard array, the set of vectors yi;1;yi;2; : : : ;yi;u in step 2
corresponds to the vectors listed directly below the codeword ci, but above the
horizontal line. The vectors of the standard array lying below the horizontal line
correspond to the paths of the error-correcting decision diagram leading to the node
labeled with the symbol . If a coset below the horizontal line in the standard
array has a unique coset leader, the elements of that coset could be assigned to the
values f(xi) corresponding codewords ci above them in the standard array to obtain
4. Error-Correcting Decision Diagrams 31
maximum likelihood decoding. Only the elements in cosets having no unique coset
leader would then be assigned to the value .
4.4 Examples
For better understanding of the concept of error-correcting decision diagrams, ex-
amples in both binary and multiple-valued logic must be provided. In this section,
we give some examples with fairly small values for both q and k, since the number
of nodes increases rapidly, as these parameters become larger. For binary functions,
we consider examples in the Hamming metric, and for multiple-valued functions,
both the Hamming and the Lee metric are considered.
The examples have been generated as follows. A suitable linear (n; k) code was
selected for each example, and for obtaining the mapping of the function f to the
function g, a script was written in MATLAB [1]. The decision diagrams of the
resulting functions were generated using the XML-based framework, which was in-
troduced in [30].
4.4.1 Binary (5; 2) Code
The rst example considers a non-perfect (5; 2) code for constructing a general
error-correcting MTBDD for 2-variable binary functions. Since the given code is
not perfect, the resulting robust diagram will have a terminal node with the label .
Let C be a binary (5; 2) code dened by the generator matrix G:
G =
"
1 0 1 1 0
0 1 1 0 1
#
:
The parity check matrix H of the code C is then
H =
2641 1 1 0 01 0 0 1 0
0 1 0 0 1
375 :
Since the sum of no two columns of H is zero, the code has minimum distance 3
and corrects 1-bit errors.
Using C, we can construct the general error-correcting decision diagram for binary
2-variable functions. Take all x 2 F22 and for each, compute the codeword cT =
xT G, where G is the generator matrix of C. To obtain the function g(y), map
each of the codewords to the label f(x) as in equation (4.1). For example, since
[0; 1] G = [0; 1; 1; 0; 1], the codeword [0; 1; 1; 0; 1]T is mapped to f(0; 1). Then, for
each obtained codeword c, list all the vectors y 2 F52 within distance 1 from c, and
4. Error-Correcting Decision Diagrams 32
map the vectors to the corresponding f(x), where cT = xTG.
For example,
g(1; 1; 1; 0; 1) = g(0; 0; 1; 0; 1) = g(0; 1; 0; 0; 1)
= g(0; 1; 1; 1; 1) = g(0; 1; 1; 0; 0) = g(0; 1; 1; 0; 1) = f(0; 1):
The vectors y 2 F52 at distance > 1 from all the codewords are mapped to .
Next, the multi-terminal binary decision tree for the function g is constructed
and reduced to obtain the general error-correcting MTBDD for 2-variable functions
(Figure 4.3). The terminal nodes are labeled f(0; 0); : : : ; f(1; 1) and . To get
the MTBDD of a particular binary function, the labels f(x) can be replaced by
the actual values of the function at f(x) and the general diagram should then be
reduced with respect to these values.
0
1
2
3
4
5
0 1
S
0 1
S
0 1
S
0
1
S
f(0,0)
0 1
S
0 1
S
0 1
S
01
S
0
1
S
0 1
S
0 1
S
0
1
S
0 1
S
0 1
S
0
1
S
0 1
S
0 1
S
0
1
S
0
1
S
0
1
S
0 1
S
0
f(0,1) f(1,0) f(1,1)*
Figure 4.3: A robust MTBDD for 2-variable functions using the (5; 2) code.
For example, to get the MTBDD of the 2-variable binary function f6 dened in
Table 4.2, assign the value 0 to the terminal nodes labeled f(0; 0) and f(1; 0), and
the value 1 to the terminal nodes labeled f(0; 1) and f(1; 1). Then, reducing the
obtained diagram we obtain the robust diagram for f6 (Figure 4.4).
Table 4.2: The truth-table of the function f6.
x0x1 f6(x0; x1)
00 0
01 1
10 0
11 1
In the error-correcting decision diagram in Figure 4.4, obtaining the output  in-
dicates at least 2 decision errors. However, a decision error in two nodes is not always
detectable, since some of such errors change the codeword into such a word, which
4. Error-Correcting Decision Diagrams 33
0
1
2
3
4
5
0 1
S
0 1
S
0 1
S
0
1
S
0
0
1
S
*
01
S
0 1
S
1
0 1
S
0 1
S
0 1
S
0
1
S
0 1
S
0 1
S
0 1
S
0
1
S
0 1
S
0
1
S
01
S
Figure 4.4: A robust MTBDD for the function f6.
is at distance 1 from some other codeword. For example, the word [1; 1; 1; 1; 1]T is
at distance 2 from the codeword [0; 1; 1; 0; 1]T , but does not output , since it is at
distance 1 from the codeword [1; 1; 0; 1; 1]T .
Notice that the function f6 = x1, which means that it is the identity function of
a single variable. Therefore, the diagram in Figure 4.4 can be seen as a competitor
for the robust diagram in Figure 4.1, which is generated using the (3; 1) repetition
code. The diagram in Figure 4.4 shows how the properties of the code aect the
resulting diagram, since it represents a simple function but has signicantly higher
complexity than, e.g. the diagram in Figure 4.1, where the utilized code is simpler.
4.4.2 Hamming (7; 4) Code
In the case of binary Hamming codes, the parameters of the code are n = 2m   1
and k = 2m   m   1. Therefore, for each binary function with k = 2m   m   1
variables, we can nd a Hamming code and construct the robust BDD using this
code. The properties of the Hamming code guarantee that by following the BDD,
the correct function value is obtained even if one decision error occurs during the
determination of the function value.
Consider the (7; 4) Hamming code having the following (non-systematic) genera-
tor matrix:
G =
26664
1 1 1 0 1 0 0
0 0 1 1 1 0 0
1 1 0 0 0 1 1
0 1 1 0 0 0 1
37775
Using this code we may produce a general error-correcting decision diagram for
binary 4-variable functions. First, we need to nd the length 7 codewords, which
are obtained by multiplying at the left the above generator matrix G by each xT ,
where x 2 F42.
4. Error-Correcting Decision Diagrams 34
Next, each obtained codeword is mapped to the corresponding symbolic value
f(x), and for each codeword, the vectors of length 7 within distance 1 from that
codeword are also mapped to the value f(x). Now, we have obtained a function
g for which we can generate a MTBDD, which will have in total 24 = 16 terminal
nodes (Figure 4.5).
0
1
2
3
4
5
6
7
0 1
S
0 1
S
0 1
S
0 1
S
0 1
S
0 1
S
0
1
S
0
1
S
0
1
S
0
1
S
0 1
S
0 1
S
0 1
S
0
1
S
01
S
0 1
S
0 1
S
0
1
S
0 1
S
0
1
S
0
1
S
01
S
0 1
S
0
1
S
0
1
S
01
S
0
1
S
0 1
S
0 1
S
0 1
S
0
1
S
0 1
S
0
1
S
0
1
S
0 1
S
0
1
S
01
S
0
1
S
0 1
S
0 1
S
0
1
S
0
1
S
01
S
0
1
S
0 1
S
0
1
S
0 1
S
0
1
S
0 1
S
0 1
S
0 1
S
0 1
S
0 1
S
0
1
S
0 1
S
0
1
S
0 1
S
01
S
01
S
01
S
0 1
S
0 1
S
01
S
0
1
S
01
S
0
1
S
0 1
S
0
1
S
0 1
S
0 1
S
0 1
S
0 1
S
0 1
S
0
1
S
0
1
S
0
1
S
0 1
S
0
1
S
0 1
S
0 1
S
0 1
S
0
1
S
0
1
S
0 1
S
0 1
S
0
1
S
0 1
S
f(0,0,0,0) f(1,1,1,0) f(1,0,0,1) f(1,0,1,1) f(1,0,1,0) f(0,1,0,0) f(0,1,0,1) f(0,1,1,1) f(0,0,0,1) f(0,0,1,1) f(1,1,0,1) f(1,1,1,1) f(0,0,1,0) f(1,1,0,0) f(1,0,0,0) f(0,1,1,0)
Figure 4.5: The robust MTBDD for 4-variable functions using Hamming (7; 4) code.
Consider the functions f7 = x0x1  x0x2  x0x3  x2x2  x1x3  x2x3; where
 denotes the exclusive OR, and f8 = x0x1x2x3. The error-correcting decision
diagrams of f7 and f8 can be obtained from the diagram in Figure 4.5 by assigning
their function values to the terminal nodes and reducing with respect to those values.
The resulting diagrams are shown in Figure 4.6.
The two diagrams in Figure 4.6 have a signicantly reduced number of nodes
than the general MTBDD of 4-variable functions.
4.4.3 Shortened Hamming Code
By shortening an existing linear code, it is possible to form a new linear code, which
has some of the properties of the original code. The shortening process is usually
done by taking the subspace of the chosen (n; k) code consisting of all codewords,
which begin by 0. The 0 is then deleted from the beginning, giving a new linear
4. Error-Correcting Decision Diagrams 35
0
1
2
3
4
5
6
7
0 1
S
0 1
S
0 1
S
0
1
S
0
1
S
0
1
S
0
0
1
S
1
0
1
S
0 1
S
01
S
0 1
S
0 1
S
0 1
S
0 1
S
0
1
S
0
1
S
0
1
S
0
1
S
01
S
0 1
S
0
1
S
0
1
S
0 1
S
01
S
0 1
S
0
1
S
0
1
S
0
1
S
0 1
S
0 1
S
01
S
0
1
S
0
1
S
0
1
S
0 1
S
0
1
S
0
1 S
(a) f7 = x0x1x0x2x0x3x2x2x1x3x2x3
0
1
2
3
4
5
6
7
0 1
S
0
1
S
0
1
S
0
0
1
S
0
1
S
0
1
S
01
S
1
0
1
S
0
1
S
0
1
S
0
1
S
0
1
S
(b) f8 = x0x1x2x3
Figure 4.6: Error-correcting decision diagrams of f7 and f8 generated using the Hamming
(7; 4) code.
code of parameters (n   1; k   1). For example, by shortening the Hamming (7; 4)
code we get a (6; 3) code, which has the same minimum distance and a generator
matrix
G =
264 1 0 0 1 1 00 1 0 1 0 1
0 0 1 0 1 1
375 :
However, this code is not perfect.
Consider the full adder of 2 variables, having the truth-table given in Table 4.3.
With the carry in, it is a 3-variable function, which has two outputs (Figure 4.7).
We can construct a robust decision diagram for the full adder by using the shortened
Hamming code of parameters (6; 3).
Table 4.3: The truth-table of the full adder of 2 variables.
ABCi Co S
000 0 0
001 0 1
010 0 1
011 1 0
100 0 1
101 1 0
110 1 0
111 1 1
The two-output function is mapped into a function of a larger domain by multi-
plying at the left the generator matrix of the (6; 3) code by the vectors of F32. The
shared MTBDD of the new function is then constructed, assigning two values to
4. Error-Correcting Decision Diagrams 36
0
1
2
3
0
1
S
0
1
S
0
0
1
S
1
0 1
S
0
1
S
0
1
S
0
1
S
0
1
S
Co S
Figure 4.7: A shared decision diagram of the 2-variable full adder.
each obtained codeword and the length 6 vectors at distance  1 from the code-
words. This procedure gives an error-correcting shared decision diagram for the full
adder of 2 variables (Figure 4.8). The diagram has a terminal node labeled with ,
since, due to the shortening, the used code is no longer perfect. Obtaining the value
 indicates two decision errors.
0
1
2
3
4
5
6
0 1
S
0 1
S
0 1
S
01
S
0
0
1
S
0
1
S
*
01
S
0
1
S
0
1
S
1
01
S
0 1
S
0 1
S
0
1
S
0
1
S
0 1
S
0
1
S
0
1
S
0
1S
0 1
S
0 1
S
0 1
S
0 1
S
0
1
S
01
S
0 1
S
0
1
S
0
1
S
0
1
S
0 1
S
01
S
0
1
S
01
S
0
1
S
0 1
S
0 1
S
0 1
S
0 1
S
0 1
S
0
1
S
01
S
01
S
0 1
S
0
1
S
0 1
S
0
1 S
01
S
0 1
S
0 1
S
0
1
S
0
1
S
0
1 S
0
1 S
0 1
S
0
1 S
0
1
S
0 1
S
0
1
S
0
1 S
Figure 4.8: A shared robust diagram for the full adder of 2 variables using shortened
Hamming code.
4. Error-Correcting Decision Diagrams 37
4.4.4 Ternary Hamming (4; 2) Code
Hamming codes can be dened over any vector spaces Fnq . Non-binary Hamming
codes exist with parameters n = q
m 1
q 1 and k =
qm 1
q 1   m. Choosing q = 3 and
m = 1 gives the ternary (4; 2) Hamming code, which is one-error-correcting. Using
this code, we may construct a general error-correcting decision diagram for ternary
functions of 2 variables.
The generator matrix G for the ternary (4; 2) Hamming code is
G =
"
1 0 1 2
0 1 2 2
#
:
The function g is obtained by multiplying G by the ternary vectors of length
2 and then mapping the obtained codewords and the length 4 ternary vectors
within distance 1 from the codewords to the corresponding function values. The
obtained ternary decision diagram will have 9 terminal nodes corresponding to
f(0; 0); f(0; 1); f(0; 2); : : : ; f(2; 2) (Figure 4.9). The code is perfect, hence there
are no -valued outputs in the obtained decision diagram.
0
1
2
3
4
0 1 2
S
0 1
2
S
01
2
S
0
1
2
S
0 1
2
S
0 1
2
S
0
1
2
S
0
1 2
S
0
1
2
S
0
1
2
S
0
1 2
S
0 1
2
S
01
2
S
0 1
2
S
0
1
2
S
01
2
S
0
1
2
S
0
12
S
0 1
2
S
0
1
2
S
01 2
S
0 1
2
S
0 1
2
S
0
1
2
S
01
2
S
0
1 2
S
0
1
2
S
0
1
2
S
0
1
2
S
0
1
2
S
0
1
2
S
f(0,0) f(0,2) f(1,0) f(2,0) f(0,1) f(1,1) f(2,1) f(2,2) f(1,2)
Figure 4.9: A general error-correcting decision diagram for ternary 2-variable functions
using Hamming (4; 2) code.
Similarly, as for binary functions, the robust decision diagram for a particular
ternary function of 2 variables can be obtained by assigning the function values to
the terminal nodes and reducing with respect to the function values.
4. Error-Correcting Decision Diagrams 38
For example, consider the ternary function dened as f9 = [1; 1; 2; 0; 1; 1; 2; 2; 0]
T .
If we assign the values of this function to the corresponding terminal nodes of the
general diagram in Figure 4.9, we obtain the the robust diagram in Figure 4.10.
Figure 4.10: A robust diagram for f9 using Hamming (4; 2) code.
4.4.5 Repetition Codes
With binary repetition codes we can construct error-correcting decision diagrams,
which have some analogy to the NMR technique described in Section 3.3. The
resulting diagrams are redundant representations for single nodes.
The rst example using such codes was already given in the introductory part
of this chapter. In Figure 4.1, the error-correcting decision diagram generated with
the (3; 1) repetition code was shown. The diagram has no terminal nodes labeled
as , since every possible sequence is always decoded to either the value 0 or 1 by
majority-vote decoding, i.e., every 3-bit sequence is always at a distance  1 from
the sequence 000 or 111. It was discussed, how this example is in principle similar
to the TMR technique. The key dierence is that once we have reduced the decision
diagram, the majority vote property is already included in the structure, i.e., no
voters are needed in the circuit level implementation.
The binary (5; 1) repetition code is two-error-correcting, and may also be used for
generating a robust structure for a single node. With a larger code the robustness
increases, but consequently the complexity of the BDD is higher (Figure 4.11).
Consider the (3; 1) repetition code for Fq. This code is perfect in F2, but will
result in -valued outputs for Fq, where q > 2. However, we may use this code
for generating a robust decision diagram for, e.g. quaternary (Figure 4.12) or 5-ary
(Figure 4.13) logic. The resulting decision diagrams correct one decision error.
4. Error-Correcting Decision Diagrams 39
0
1
2
3
4
5
0 1
S 0
0 1
S 1
0
1
S 3
0 9
0
1
S 6
0 1
S 8
1 10
0 1
S 4
0
1
S 7
0 1
S 2
0
1
S 5
Figure 4.11: A robust BDD for a single binary node using the (5; 1) repetition code.
0
1
2
3
0
1 2
3
S
01 2
3
S
0
0 1
2
3
S
1 *
0
1
2
3
S
2
0
12
3
S
3
0
1 2
3
S
0
1 2
3
S
0
1
2
3
S
0
1 2
3
S
0
1
2
3
S
0
1
2
3
S
Figure 4.12: A robust diagram for a single quaternary node using the (3,1) repetition code.
Figure 4.13: A robust diagram for a single 5-ary node using the (3,1) repetition code.
4. Error-Correcting Decision Diagrams 40
Notice that since the obtained error-correcting decision diagrams using repetition
codes are robust representations of single nodes, we may replace the nodes of a
traditional decision diagram of a function f with these robust diagrams to obtain a
robust representation for f . This method of constructing a robust diagram is briey
discussed when analyzing the fault-tolerance of error-correcting decision diagrams
in Section 5.3.
4.4.6 One-Lee-Error-Correcting Code for q = 5
The previous example for q = 5 is for representing a single decision node, but the
number of non-terminal nodes in the robust diagram is much higher. However, we
may construct a robust diagram using a perfect one-error-correcting code in the Lee
metric, which will have signicantly less non-terminal nodes.
Consider the one-Lee-error-correcting code having the generator matrix
G =
h
3 1
i
:
The construction of the robust diagram is done similarly as in the previous exam-
ples, but the mapping of length 2 vectors to corresponding values in F5 is done
with respect to the Lee metric. Table 4.4 shows the radius one spheres around the
codewords in the vector space F25, illustrating how the vectors y of length 2 satis-
fying dL(y;xG)  1, where x 2 F5, are found. The codewords, i.e., the vectors
[0; 0]T ; [3; 1]T ; [1; 2]T ; [4; 3]T and [2; 4]T , are labeled by the corresponding x 2 F5 and
in boldface, and the vectors of F25 at distance 1 from each codeword are labeled by
the same x 2 F5 as the codeword.
Table 4.4: The radius one spheres around codewords (in boldface) labeled by the corre-
sponding x 2 F5. The rst component of y chooses the column and the second component
chooses the row.
4 0 4 4 4 3
3 3 2 4 3 3
2 2 2 2 1 3
1 0 2 1 1 1
0 0 0 4 1 0
0 1 2 3 4
The obtained robust diagram for a single 5-ary decision node is in Figure 4.14.
The increase in the number of non-terminal nodes is signicantly less than with
the (3; 1) repetition code in the Hamming metric, and the obtained decision diagram
has no terminal nodes labeled with the symbol . However, the Lee metric introduces
a functional dierence to the diagram. In the error-correcting decision diagrams in
4. Error-Correcting Decision Diagrams 41
0
1
2
0 1 2 3 4
S
0
1
2
3
4
S
0 4 1
0
1
23
4
S
2
01
23
4
S
3
0
1
2
3
4
S
0
12
3
4
S
Figure 4.14: A robust diagram for a 5-ary node using a Lee-error-correcting code.
the Hamming metric, each line after a decision node is equal in the sense that
whichever incorrect decision is made, the eect of the decision is the same in terms
of error-correction. In the Lee metric, however, it is possible to make an error of
value  e in just one decision node, following that error correction is no longer
possible.
42
5. FAULT-TOLERANCE ANALYSIS OF
ERROR-CORRECTING DECISION DIAGRAMS
In addition to designing new fault-tolerance methods it is important to nd ways
to describe the performance and behavior of these systems with dierent design
parameters. One approach to evaluating system reliability is to experimentally
determine the reliability of a component as a function of time [2]. This requires
testing several copies of components that can be too expensive or complex for such
testing procedures. Therefore, it is necessary to describe the behavior of fault-
tolerant systems such as error-correcting decision diagrams with reliability modeling
techniques.
Reliability is dened by Naresky in [25] as "the ability of an item to perform a
required function under stated conditions for a stated period of time". The reliabil-
ity of a system can be described as a function of the reliability of a single module
or component in the system. In this chapter, instead of modeling the reliability of
error-correcting decision diagrams, we simply consider probabilities of correct and
incorrect outputs with dierent combinations of nodes performing their functions
correctly or incorrectly. Therefore, the eect of time is discarded, and the probabil-
ity of correct or incorrect outputs with any input combinations can be computed as
a function of the error probability of a single node in the diagram. The probability
model should be considered already when designing the error-correcting decision dia-
grams. The selection of codes should be done with respect to the desired technology,
since the fault model should t the realization. For example, if it is assumed that
the logical values can change into any incorrect values with equal probability, it is
reasonable to use codes in the Hamming metric. When errors of dierent value have
dierent probabilities, one can use, for example, codes in the Lee metric. Suitable
codes can also be designed to t the utilized technology best.
Modeling the probabilities of correct and incorrect outputs is discussed in Sections
5.1 and 5.2. Results of the fault-tolerance analysis for some example diagrams are
given in Section 5.3. However, for larger diagrams, the exact determination of these
probabilities can be dicult, and methods of approximation are discussed in Section
5.4.
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 43
5.1 The Probability Model for Diagrams based on Codes in
the Hamming Metric
In this thesis, the performance of error-correcting decision diagrams is analyzed by
determining the probability that the output is correct for any input in the robust
diagrams. This probability is described as a function of the error probability of a
single node in the diagram, i.e., the probability that the output of a node is incorrect.
In the following, we consider error-correcting decision diagrams generated with codes
in the Hamming metric. For concepts and notation of probability theory we use [12]
as a reference.
We call decision nodes that make an incorrect decision faulty nodes. For tradi-
tional non-redundant decision diagrams, an incorrect output for the whole system
is obtained whenever there is a faulty node on a path. For error-correcting decision
diagrams, there may be up to e faulty nodes on a path, and the output is still cor-
rect. It could be so, that the correct output is obtained even if incorrect decisions
are made in more than e nodes on a path, but we are not interested in these outputs.
We model the non-terminal nodes of an e-error-correcting decision diagram as
independent binary random variables 1; 2; : : : ; M , where
Pfi = 1g = p (1 means faulty),
Pfi = 0g = 1  p;
for i = 1; 2; : : : ;M and M is the total number of non-terminal nodes. Notice that
when q > 2 the probability of a node not being faulty is still (1   p). This simpli-
cation is made, since in the Hamming metric, one incorrect decision is always at
distance 1 from the correct value, due to the denition of the Hamming distance.
Therefore, the eect on the total error is always the same no matter which incorrect
output is given from a single node.
Now, denote by P1; P2; : : : ; PL the subsets of f1; 2; : : : ;Mg formed of the indexes
of the non-terminal nodes in all paths from root to the terminal nodes. Using this
notation, we can express the probability that the output of the error-correcting
decision diagram is correct with any input as
Pfoutput correct for any inputg
= Pfthere are at most e faulty nodes on any pathg
= Pfmax
1iL
X
j2Pi
j  eg:
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 44
Since
max
1iL
X
j2Pi
j 
MX
j=1
j; (5.1)
we can write
Pfoutput correct for any inputg  Pf
MX
j=1
j  eg
=
eX
i=0

M
i

pi(1  p)M i:
Thus, we may write
Pfoutput correct for any inputg =
eX
i=0

M
i

pi(1  p)M i +
MX
i=e+1
ip
i(1  p)M i;
(5.2)
where the coecient i depends on the structure of the error-correcting decision
diagram. Hence, for the probability of an incorrect output for the error-correcting
decision diagram we have
Pfincorrect outputg = 1 
 
eX
i=0

M
i

pi(1  p)M i +
MX
i=e+1
ip
i(1  p)M i
!
= 1 
 
1 
MX
i=e+1

M
i

pi(1  p)M i +
MX
i=e+1
ip
i(1  p)M i
!
;
which can be written in the form:
Pfincorrect outputg =
MX
i=e+1

M
i

  i

pi(1  p)M i: (5.3)
It is clear that in the expansion of the above equation, the lowest degree term is
of degree at least e + 1, i.e., of the form A  pe+1, where A is some constant. For a
traditional diagram, since a single incorrect decision causes an incorrect output, the
lowest degree term of the error probability function is always B  p, where B is some
constant.
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 45
5.2 The Probability Model for Diagrams based on Codes in
the Lee Metric
When analyzing the fault-tolerance of robust decision diagrams constructed in the
Lee metric, we have to take into account that in a decision node, we can make either
the correct decision, an incorrect decision which is at Lee distance  e from the
correct value, or an incorrect decision which is at distance > e from the correct
value. For example, if e = 3, it is possible to make 3 incorrect decisions at distance
1, or an incorrect decision at distance 2 and an incorrect decision at distance 1, or
one incorrect decision at distance 3, and still obtain the correct output. Therefore,
the analysis must be slightly changed to make sense for the Lee-error-correcting
decision diagrams.
The error-correcting decision diagrams based on codes the Lee metric are for q-ary
logic where the variables are assumed to take value 0; 1; : : : q   1. In the following,
for simplicity, we assume that q = 2m+1, i.e., that q is an odd integer of value  3.
It is natural to assume that larger errors are less likely than smaller errors. For
example, depending on the technology, values within smaller distance from each
other can be obtained with smaller dierence in voltage in the circuit level, and it
is therefore more probable that an incorrect value at a smaller distance is obtained.
For instance, if q = 5, it is more likely for a 0 to change into the value 1 than into
the value 2, i.e., the error 0! 1 has a higher probability than the error 0! 2.
We model the non-terminal nodes of a e-Lee-error-correcting decision diagram as
independent random variables 1; 2; : : : ; M , and
Pfi = wg = pw;
where w = 0; 1; 2; : : : ;m, 0 < p1 < p2 <    < pm, p0 = 1  
Pm
w=1 pw > 0, and M
is the total number of non-terminal nodes. Thus, i = w is interpreted as a fault
that causes a decision error of Lee-weight w. For example, let q = 5 and consider
the node i. Then i = 2 is interpreted as decision errors 0 ! f3; 2g, 1 ! f4; 3g,
2! f0; 4g, 3! f1; 0g and 4! f2; 1g.
Similarly as for the Hamming metric, we denote by P1; P2; : : : ; PL the subsets of
f1; 2; : : : ;Mg formed of the indexes of the non-terminal nodes in all paths from root
to the terminal nodes.
Again, we may write
Pfoutput correct for any inputg = Pfmax
1iL
X
j2Pi
j  eg;
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 46
and due to equation (5.1), we can write
Pfoutput correct for any inputg  Pf
MX
j=1
j  eg:
It is dicult to write explicit formulas even for small values of e, but for given
probabilities p0; p1; : : : ; pm and given n and e, we may compute Pf
PM
j=1 j  eg as
follows.
Expand the polynomial
AM(x) = (p0 + p1x+ p2x
2 +   + pmxm)M
= pM0 + A
(M)
1 x+ A
(M)
2 x
2 +   + A(M)e xe +   + pMmxmM :
Now, denote by BM(x) = p
M
0 + A
(M)
1 x+   A(M)e xe. Then,
Pf
MX
j=1
j  eg = BM(1):
This is because for a diagram having M non-terminal nodes, the coecient A
(M)
i
is a sum of all such terms pw0    pws , where w0  w1      ws  m and w0+w1+
   + ws = i, therefore it corresponds to all combinations of faulty and not faulty
nodes for which the total Lee-weight of the error is i.
For example, let q = 5, e = 2, M = 6, and p0 = 0:6; p1 = 0:3; p2 = 0:1. Then,
A6(x) = (p0 + p1x+ p2x
2)6 = 0:047 + 0:140x+ 0:222x2 +    ;
and
B6(x) = 0:047 + 0:140x+ 0:222x
2:
Now we can evaluate B6(x) at 1 giving
Pf
6X
j=1
j  eg = B6(1) = 0:409:
Notice that by writing AM+1(x) = AM(x)A1(x), we can obtain recursive formulas
for the coecients A
(M)
i .
If we simplify the model by assuming that p1 = p; p2 = p
2; : : : ; pw = p
w, following
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 47
that p0 = 1 
Pm
w=1 p
w, then for the case e  m we obtain the following formula:
Pfoutput correct for any inputg
= (1 
mX
w=1
pw)M +
eX
s=1
sX
u=1

M
u

s  1
u  1

ps(1 
mX
w=1
pw)M u (5.4)
+
MX
s=e+1
sX
u=1
up
s(1 
mX
w=1
pw)M u:
The coecient
 
M
u
 
s 1
u 1

is the number of ways to select u faulty nodes from the total
M nodes of the diagram, which together produce an incorrect output at distance s
from the correct outputs. This corresponds to rst selecting any u nodes from the
total M nodes, which is given by
 
M
u

, and then assigning the size of the error to
each node in all possible orders, i.e., the number of ways of writing s as a sum of u
positive integers, which is the u-composition of s given by
 
s 1
u 1

[31]. The coecient
u depends on the structure of the diagram.
5.3 Results of the Fault-Tolerance Analysis
In this section, fault-tolerance analysis based on the probability models introduced
in the previous sections is performed on some of the example decision diagrams
of Section 4.4 to give some idea of their eciency. The probability of a correct
output with any input is computed as a function of p. The obtained probability
is then compared to the probability of a correct output with any input of a non-
redundant decision diagram performing the same function with no error correction.
We compute the probabilities by a brute force method, i.e., by listing all the possible
combinations of faulty and not faulty nodes, for which the output is correct. Since
the coecients of the higher order terms must be determined from the structure of
the diagram, the example diagrams are not very large.
Consider the error-correcting BDD for a single node generated using the (3; 1)
repetition code (Figure 4.1). We want to list all the possible cases for which there
is at most one faulty node on any path, following that the output of the diagram
is always correct. The probability is given by equation (5.2), where the term i
has to be determined separately from the diagram. Since there are 4 non-terminal
nodes in the diagram, the rst term is (1   p)4 when there are no faulty nodes in
the diagram, and the second term is 4p(1   p)3. The nal term p2(1   p)2 is when
there are exactly two faulty nodes in the diagram, situated after the root node on
level 1 of the diagram. This gives the probability
(1  p)4 + 4p(1  p)3 + p2(1  p)2
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 48
for a correct output with any input. Therefore, the probability that the output of
the diagram is incorrect is
~p = 1  ((1  p)4 + 4p(1  p)3 + p2(1  p)2) = 5p2   6p3 + 2p4: (5.5)
The probability of a correct output 1  ~p given any input is depicted together with
the probability of a correct output of the single node decision diagram given by 1 p
in Figure 5.1a.
By similar computations, we obtain the probability of a correct output of the
error-correcting BDD based on the (5; 1) repetition code, which corrects two errors.
This probability is shown together with the error probability of a single node in
Figure 5.1b. Similarly, the probability of a correct output of the general error-
correcting decision diagram of 2-variable functions in Figure 4.3 is depicted together
with the traditional decision diagram for realizing 2-variable functions (Figure 5.2).
Again, the probability that a node is faulty is p.
(a) (3,1) repetition code. (b) (5,1) repetition code.
Figure 5.1: Comparison of the probability of a correct output of traditional and error-
correcting binary decision diagrams generated using repetition codes.
Instead of mapping a function f into a robust one by some error-correcting code
by equation (4.1), we may replace the nodes of the non-redundant decision diagram
representing f by robust structures to obtain a robust representation for the given
function f . It is also possible to replace just some of the nodes by robust structures,
if it is reasonable to assume that in some parts of the circuit, errors are more likely
than in others.
For example, consider a decision diagram for some binary function f , which has
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 49
Figure 5.2: Comparison of the probability of a correct output of a traditional and the
general error-correcting binary decision diagram generated using the (5,2) code.
M nodes in its reduced BDD. The error probability function for this diagram is
1   (1   p)M . Now, we may replace each node by the non-terminal nodes of the
robust diagram generated using the (3; 1)-repetition code. The resulting diagram
will have 4M nodes in total and the error probability function 1  (1  ~p)M , where
~p is from equation (5.5). In Figure 5.3, the probability of a correct output of a
diagram with M = 50 nodes is compared to the probability of a correct output of
a diagram representing the same function, for which each node is replaced by the
robust structure using (3; 1)-repetition code.
Similarly as above for binary diagrams, we can determine the probability of a
correct output of error-correcting decision diagrams of multiple-valued functions
based on codes in the Hamming metric. For example, consider the general error-
correcting decision diagram generated using the ternary Hamming (4; 2) code, which
is shown in Figure 4.9. For this diagram e = 1 and there are in total M = 31 non-
terminal nodes in the diagram. We need to list all the combinations of nodes for
which there are either zero or one faulty nodes on each path.
Again, the rst terms are directly given by equation (5.2) as (1 p)31 and 31p(1 
p)30. The following terms are determined by rst considering all cases with exactly
two faulty nodes in the diagram. It is clear that these faulty nodes can be situated
on any single level of the diagram, since each node on a particular level never belongs
to the same path as the other nodes on that level. It is also possible that the faulty
nodes are on two dierent levels, but in this case we need to make sure that these
faulty nodes are never on the same path. We may continue to list all possible cases
when in the diagram there are exactly 3 faulty nodes, 4 faulty nodes, etc. The last
case is when all the 18 nodes on level 3 are faulty. Adding up all these situations
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 50
Figure 5.3: Comparison of the probability of a correct output of a traditional diagram with
M = 50 nodes and a robust diagram, where each node is replaced by the robust structure
generated using (3,1)-repetition code.
gives us the probability of a correct output with any input. We may now compare
the obtained probability to (1  p)4, which is the probability of a correct output in
the corresponding traditional diagram (Figure 5.4a).
Consider now the error-correcting decision diagram in Figure 4.12, which is a
robust construction corresponding to a single quaternary decision node. By our
assumptions, a single quaternary node gives the correct output with probability
(1   p), since the probability of a faulty node is p. For the diagram in 4.12, we
have e = 1 and in total 11 decision nodes, therefore the rst terms of the probability
function are given by (1 p)11 and 11p(1 p)10. The rest of the terms are determined
similarly as above for the ternary diagram. The obtained probability is depicted
together with (1  p) in Figure 5.4b.
Finally, take the diagram constructed using a one-Lee-error-correcting code in
Figure 4.14, which has in total 6 nodes and e = 1. Since q = 5, an incorrect
value can be at most at distance m = 2 from the correct value. Let us assume
that the probabilities of decision errors are p1 = p and p2 = p
2. Now, we may
compute the probability for a correct output, starting with terms (1   (p + p2))6
and 6p(1   (p + p2))5 given by equation (5.4). In addition to these, it is clear that
the correct output is obtained even if all the nodes on level 1 give incorrect values
at distance 1 from the correct value, if the top level node is correct. Therefore, the
rest of the terms are given by
P5
i=2
 
5
i

pi(1  (p+ p2))6 i, when two or more of the
ve nodes on level 1 are faulty giving incorrect values at distance 1 from the correct
value. The probability of a correct output of the robust diagram is compared to the
probability of a correct output of a single 5-ary node in Figure 5.5.
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 51
(a) (4,2) Hamming code. (b) (3,1) code for q = 4.
Figure 5.4: Comparison of the probability of a correct output of traditional and error-
correcting multiple-valued decision diagrams based on codes in the Hamming metric.
Figure 5.5: Comparison of the probability of a correct output of a traditional and a Lee-
error-correcting decision diagram for q = 5.
5.4 Approximating the Probability of Correct Outputs
Computing the exact probability of correct outputs for error-correcting decision
diagrams by brute force is very time-consuming, and nearly impossible for larger
diagrams. Therefore, it is important to nd ways to approximate the performance
of these diagrams. The easiest approximation is given directly by the equation (5.1),
following that we can obtain a lower bound for the probability of a correct output
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 52
with any input simply by computing PfPMj=1 j  eg. It was shown in Sections
5.1 and 5.2 how this probability is computed in case of diagrams based on codes in
the Hamming metric and Lee metric, respectively. To obtain a better estimate, the
higher order terms should also be estimated in some way.
In the following, it is explained how the structure of the obtained diagrams can
be used for estimating the higher order coecients. The method is for diagrams
based on codes in the Hamming metric, and it is based on the observation that the
general decision diagrams generated using an (n; k) linear code always include the
full tree structure on levels 0 to k. For a full tree structure, it is possible to derive
a formula for determining the coecients of the function describing the probability
of correct outputs with any inputs.
An important property of linear codes is that each linear code is equivalent to a
systematic code (Section 2.4). The systematic property induces a certain structure
on general error-correcting decision diagrams, which are generated using linear codes.
Since the generator matrix G of a systematic code includes the identity matrix Ik,
the codewords of the systematic code are such qk length n vectors, which each begin
with dierent length k vector. Now, the general error-correcting decision diagram
is obtained by reducing the q-ary decision tree of the function g obtained with the
mapping in equation (4.1). In any general error-correcting decision diagram, a path
corresponding to a codeword must lead to a dierent terminal node. Therefore, the
general error-correcting decision diagram generated using an (n; k) systematic code
will have the full q-ary tree on levels 0 to k.
Since any linear (n; k) code C is equivalent to a systematic code, the systematic
code is obtained from C by applying a xed permutation of symbols to the codewords
of C. Therefore, the qk length n vectors, which each begin with dierent length k
vector, must lead to dierent terminal nodes in the general error-correcting decision
diagram of C. Thus, the general error-correcting decision diagram generated using
the code C must similarly have the full q-ary tree on levels 0 to k.
Now, we may exploit this structure for estimating the probability of correct out-
puts of any diagram. For general diagrams, which are not reduced with respect to
any particular functions, this approximation will give a lower bound for the proba-
bility of a correct output, and consequently an estimate for any particular function.
In this approach, the probability is approximated by estimating the higher order
coecients of the probability function. Consider the following for binary functions.
To estimate the probability of a correct output of any robust BDD, we may form the
rst e+ 1 terms by equation (5.2) and approximate the higher order coecients by
the coecients of the probability of a correct output for the binary tree of depth k,
for which there can be up to e faulty nodes on each path. Estimating the coecients
means that we approximate i of the term ip
i(1  p)M i (equation (5.2)) where M
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 53
is the number of nodes in the general robust BDD by Si of the term Sip
i(1 p)2k 1 i,
where k is the depth of the binary tree and M > 2k   1  i.
Thus, we may write for general error-correcting decision diagrams:
Pfoutput correct for any inputg 
eX
i=0

M
i

pi(1  p)M i +
2k 1X
i=e+1
Sip
i(1  p)M i
The above inequality holds, since for a given e-error-correcting decision diagram
ofM non-terminal nodes, the coecient i equals the number of ways we can assign
exactly i of the non-terminal nodes of the diagram faulty with at most e of them
on any path. Similarly, for the depth k binary tree, which is strictly included in the
diagram, the coecient Si equals the number of ways we can assign exactly i of the
non-terminal nodes of the tree faulty with at most e of them on any path. It is clear
that when assigning the faulty nodes to the diagram, we can start by assigning them
to the binary tree part, obtaining the Si ways, and after that consider the situations,
where nodes that are not included in the tree can be faulty. Hence, i  Si.
Therefore, the approximation provides a lower bound for the probability of a cor-
rect output in case of general error-correcting decision diagrams. However, assigning
a particular function may break the full binary tree structure and reduce the num-
ber of nodes in the diagram signicantly. Thus, the approximation does not provide
bounds for particular functions, but can be used as an estimate for the probability
of a correct output for such diagrams.
The value of the coecient Si can be computed as follows. Let k be the depth
of the tree, e the number of allowed faulty nodes on a path, and i the number of
faulty nodes in the whole tree. Let Si = S(k; e; i) be the number of full depth k
binary trees having i faulty nodes and up to e of them on any path. The number of
such trees is equal to the coecient of the term pi(1  p)2k 1 i in the probability of
a correct output with any input for the full binary tree.
We may compute S(k; e; i) for k; e; i  1 recursively as follows:
Si = S(k; e; i) =
iX
t=0
S(k   1; e; t)S(k   1; e; i  t)
+
i 1X
t=0
S(k   1; e  1; t)S(k   1; e  1; i  1  t): (5.6)
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 54
The initial values are:
when k = 0 : S(k; e; i) = 1 if i  e; where i  1 and S(k; e; i) = 0 otherwise,
when e = 0 : S(k; e; i) = 1 if i = 0 and S(k; e; i) = 0 otherwise,
when i = 0 : S(k; e; i) = 1 always.
For example, the depth 0 tree consists of only the root node, hence S(0; 0; 0) = 1,
since there is exactly one depth 0 tree having 0 faulty nodes, where on each path no
faulty nodes are allowed.
The formula associates two subtrees into a parent node, which may or may not be
faulty. In the rst convolution, the parent node is assumed to be non-faulty, and all
of the exactly i faulty nodes must lie in the subtrees. The terms of the convolution
consist of the dierent ways of assigning faulty nodes into the two subtrees. For
example, in the very rst term, all of the faulty nodes are in one of the subtrees.
Consequently, in the last product of the convolution, all of the faulty nodes lie in the
other subtree. In the second convolution, the parent node is assumed to be faulty,
following that we must subtract 1 from e and i. The initial values guarantee that
the recursion is well-dened.
An interesting observation in case of general BDDs is that the examples in Section
4.4 include not only the depth k binary tree structure, but in fact the full binary
tree of depth k + 1. On the other hand, we might use for estimation such binary
trees, which have approximately the same number of nodes as the obtained error-
correcting decision diagram. For example, Figure 5.6 shows the comparison between
the binary tree approximation and the exact probability of a correct output for
robust representations of single nodes using (3; 1) and (5; 1) repetition codes. The
binary tree used for the approximation is the depth k + 1 = 2 tree, where k is
the number of variables of the original function. The binary tree of depth 2 is not
included in the error-correcting decision diagrams, but the approximation still gives
good results.
The above binary tree approximation can be generalized for the q-ary case. The
general error-correcting decision diagrams of q-ary functions of k variables include
the full q-ary tree of depth k. Therefore, the probability of a correct output with
any input can be approximated using the coecients of the probability function of
this tree.
The equation (5.6) is generalized as follows.
S(k; e; i) =
X
t1++tq=i
S(k   1; e; t1)   S(k   1; e; tq)
+
X
t1++tq=i 1
S(k   1; e  1; t1)   S(k   1; e  1; tq): (5.7)
5. Fault-Tolerance Analysis of Error-Correcting Decision Diagrams 55
(a) (b)
Figure 5.6: Binary tree approximation compared to the exact probability of correct output
for robust structure for a single node using (3; 1) repetition code (a) and (5; 1) repetition
code (b).
The above method of approximation based on the structure of general error-
correcting decision diagrams is one possible means of approximation, and it is par-
ticularly interesting because it is derived from the properties of linear codes. How-
ever, it is dicult to say anything about the eciency of this approximation without
extensive research. In case of very large functions, this estimate might not be much
more ecient than simply using PfPMj=1 j  eg as a lower bound for the proba-
bility of a correct output. Other possible means of approximation would include,
for instance, deriving bounds for the higher order coecients based on the binomial
expansion, since each coecient i, when i > e, has to be less than the binomial
coecient
 
M
i

.
56
6. DISCUSSION
In this chapter, some remarks and explanations are given based on the theory and
results presented in the previous chapters. The goal of the thesis was to describe a
new method of designing fault-tolerant logic, which combines error-correcting codes
with decision diagrams. In addition to this, the fault-tolerance of these systems
was analyzed by deriving a probability model describing the probability of a correct
output with any input. The example error-correcting decision diagrams were shown
to have a signicantly increased probability of a correct output than corresponding
diagrams with no error correction.
6.1 Discussion on Error-Correcting Decision Diagrams
In Section 4.2 it was mentioned that the concept of error-correcting decision dia-
grams can be generalized to apply for nonlinear codes. The idea is simple, and can
be described in terms of the encoding and decoding rules of the code. First, by the
encoding rule of the code, the codewords of length n are determined and mapped to
the corresponding f(xi). Then, each non-codeword that is decoded to the informa-
tion word xi by the decoding rule of the code is mapped to f(xi). Any other possible
words of length n are mapped to the symbol . Then, the decision diagram for the
obtained function is generated, which is the error-correcting decision diagram of the
function f . However, using linear codes has some advantages, since linear codes are
known to have several good properties, and applying linear codes in the design of
error-correcting decision diagrams is fast and straightforward in terms of generating
the function g given by equation (4.1).
An important property of error-correcting decision diagrams is that depending
on the technology, they are easy to implement, since the layout of the circuit is
determined by the structure of the diagram. Therefore, the complexity of the circuit
corresponds to the complexity of the robust diagram. Fault-tolerance is included
already in the representations of functions, which means that no voters or checker
circuits are needed in the designs. Since the diagrams lead to correct outputs with
up to e faulty nodes on each path, a fairly large portion of the nodes can become
temporarily or permanently faulty without aecting the output values.
The general error-correcting decision diagram of k-variable functions is basically
a graphical representation of the decoding algorithm of the linear code, which is
6. Discussion 57
used in construction of the diagram. The terminal nodes correspond to the q-ary
length k vectors and each path corresponds to a received n-ary sequence. This
way, each path leads to the k-ary sequence, which is interpreted from the received
n-ary sequence in the decoding process. Therefore, the general error-correcting
decision diagram of k-variable functions can be seen as a useful tool for the decoding
procedure of the code. This leads to the idea of utilizing error-correcting decision
diagrams when designing fast decoding circuits for error-correcting codes. When
using error-correcting decision diagrams, the decoded sequence is directly available
after the encoded sequence is received, once the structure of the error-correcting
decision diagram is known. Moreover, since the structure of the error-correcting
decision diagram of arbitrary functions contains the depth k binary tree, knowing
the structure on levels k+1 to n would be sucient. On the other hand, storing the
decision diagram structure might not be ecient in case of very large codes, but the
idea seems worth exploring. Also, representing the code by a decision diagram gives
the idea of studying the properties of the code using the error-correcting decision
diagram it generates.
6.2 Discussion on the Fault-Tolerance Analysis
The fault model in this thesis considers any faults that may cause an incorrect output
in a node. Therefore, any type of fault that would cause the logical values to change
inside the nodes to incorrect ones could lead to a faulty output of a node, which
could result in the selection of an incorrect line. For example, in a stuck-at-fault
the output of the gate is xed, which can cause the data to propagate along the
incorrect line after the faulty node.
The probability analysis done in Chapter 5 compares the performance of the ro-
bust constructions to the performance of non-redundant representations of functions.
This analysis of robust decision diagrams shows, that the probability of incorrect
outputs can be signicantly decreased depending on the error-correcting properties
of the code. With traditional diagrams, a single incorrect decision causes the output
to be incorrect, and the lowest degree term of the error probability function is always
a multiple of p, where p is the error probability of a single node in the diagram. For
robust diagrams based on codes in the Hamming metric, the lowest degree term is
always at least of degree e+ 1, i.e., of the form A  pe+1, where A is some constant.
This means, that even with moderately high gate error probabilities, e.g, 10 2, a
robust construction will have a signicantly decreased probability for an incorrect
output. However, better error-correcting properties increase the complexity of the
design.
The problem in analyzing the fault-tolerance of error-correcting decision diagrams
is that due to their complex structure, in particular when large functions are consid-
6. Discussion 58
ered, computing the exact probability is very time-consuming. Since each diagram
has a dierent structure, the coecients i and u in equations (5.2) and (5.4) have
to be analyzed separately in case of each diagram, following that it is impossible to
derive a universal formula for determining these coecients. While computing the
probabilities in Section 5.3, the possibility of using a script, which would go through
each combination of faulty and not faulty nodes in a given error-correcting decision
diagram with a given probability p of a decision error, and determine the overall
probability of an incorrect output was considered, but this exhaustive method was
found to be too heavy even for diagrams generated with the (7; 4) Hamming code.
It was mentioned in the introductory part of Chapter 5 that the selection of
the fault model depends on the utilized technology. By considering realizations
using multiplexers, we can relate the well-known probability of correct decoding
[8] to error-correcting decision diagrams. When implementing a switching function
represented by a decision diagram using multiplexers, each level of multiplexers
corresponding to one variable has a control line, which aects the output of each
multiplexer on that level [6]. If we assume that the faults can only occur in the con-
trol lines, the probability of a correct output directly corresponds to the probability
of correct decoding given by
Pfcorrect decodingg =
eX
i=0

n
i

pi(1  p)n i;
where n is the length of the codewords. This is because in the general error-correcting
decision diagram constructed using a (n; k) linear code, there are in total n levels
of non-terminal nodes, following that in the implementation with multiplexers, the
circuit has n control lines. Therefore, there are always
 
n
i

possible combinations of
faulty control lines for each i. Clearly, the probability of a correct output is much
higher with the above assumptions, than given by equation (5.2), since M , which
is the total number of non-terminal nodes, is much larger than n. However, it is
dicult to say wether this model would be applicable in reality.
To study the performance of error-correcting decision diagrams in more detail, it
would be necessary to compare this method to other methods of increasing fault-
tolerance. A correspondence between the TMR method and the robust diagram
generated using the (3; 1) repetition code is easy to notice. The simplest TMR con-
struction (Figure 3.1) includes triplicated modules and a voter, which in comparison
to the robust diagram in Figure 4.1 has similar complexity. The robust decision di-
agrams are constructed in such a way, that all nodes are equal in terms of weakness,
whereas the voter in the TMR structure is unprotected from errors. Triplicating the
voters provides more fault-tolerance, but consequently the complexity of the layout
increases. Therefore, the robust decision diagram can be seen as a more ecient
6. Discussion 59
structure. However, thorough comparison in terms of reliability would require more
research and testing. Using the (5; 1) repetition code provides protection against
two errors, but the complexity is higher than of the TMR structure with triplicated
voters. The comparison between more complex decision diagram structures and, for
example, the TMR method is more dicult. It might not be straightforward to nd
ecient methods for comparing these.
60
7. CONCLUSIONS
In this thesis, fault-tolerance in digital systems was discussed, and a new method for
providing fault-tolerance, combining error-correcting codes and decision diagrams,
was presented. The method was discussed on a theoretical level, but due to the
properties of decision diagrams, the implementation of the obtained robust struc-
tures, namely error-correcting decision diagrams, is straightforward depending on
the desired technology. The ideas introduced in this thesis have also resulted in
two conference papers [4], [5], where error-correcting decision diagrams and their
performance is discussed.
In addition to easy implementation, a key advantage of error-correcting deci-
sion diagrams is that fault-tolerance is introduced already to the representations of
functions, following that no additional checker circuitry is required in the obtained
circuit layouts. The error-correcting decision diagrams in this thesis are based on
linear codes, which are known to have several good properties and, in principle, easy
implementation. Some of these properties, for example the systematic nature of the
code, is visible in the obtained decision diagrams, and can be used when analyzing
the performance of error-correcting decision diagrams.
The error-correcting properties of the diagrams depend on the properties of
the code on which the diagram is based. In the fault-tolerance analysis of error-
correcting decision diagrams, it was shown that even with moderately high gate er-
ror probabilities, e.g, 10 2, an error-correcting decision diagram has a signicantly
decreased probability for an incorrect output in comparison with a non-redundant
diagram. However, better error-correcting properties increase the complexity of the
design, as more powerful codes are typically more complex.
Future work could include comparing error-correcting decision diagrams to other
fault-tolerance methods, e.g. the TMR method, and determining the reliability of
error-correcting decision diagrams by modeling and testing actual implementations.
The possibility of studying the properties of error-correcting codes based on the
obtained decision diagram is also worth exploring.
61
BIBLIOGRAPHY
[1] The webpage of The MathWorks, MATLAB. [www], [referenced 29.6.2011],
Available: http://www.mathworks.com/products/matlab/.
[2] J.A. Abraham and D.P. Siewiorek. An algorithm for the accurate reliabil-
ity evaluation of triple modular redundancy networks. IEEE Transactions on
Computers, 23(7):682692, 1974.
[3] S. B. Akers. Binary decision diagrams. IEEE Transanctions on Computers,
27(6):509516, 1978.
[4] H. Astola, S. Stankovi¢, and J. T. Astola. Error-correcting decision diagrams.
In Proceedings of The Third Workshop on Information Theoretic Methods in
Science and Engineering, Tampere, Finland, August 16-18, 2010.
[5] H. Astola, S. Stankovi¢, and J. T. Astola. Error-correcting decision diagrams for
multiple-valued functions. In Proceedings of ISMVL 2011, 41th International
Symposium on Multiple-Valued Logic, pages 3843, Tuusula, Finland, May 23-
25, 2011.
[6] J. T. Astola and R. S. Stankovi¢. Fundamentals of Switching Theory and Logic
Design. Springer, 2006.
[7] E. R. Berlekamp. Algebraic Coding Theory. McGraw-Hill, New York, 1968.
[8] Richard E. Blahut. Theory and Practice of Error Control Codes. Addison-
Wesley, 1983.
[9] R. E. Bryant. Graph-based algorithms for Boolean functions manipulation.
IEEE Transactions on Computers, 35(8):667691, 1986.
[10] S.K. Chilappagari and B. Vasic. Fault tolerant memories based on expander
graphs. In IEEE Information Theory Workshop, 2007. ITW '07, pages 126
131, 2007.
[11] M. M. Dickinson, J. B. Jackson, and G. C. Randa. Saturn V launch vehicle
digital computer and data adapter. In Proceedings of the October 27-29, 1964,
fall joint computer conference, part I, AFIPS '64 (Fall, part I), pages 501516,
New York, NY, USA, 1964. ACM.
[12] William Feller. An Introduction To Probability Theory And Its Applications,
volume 1. Wiley, 1971.
[13] J.B. Fraleigh. A First Course in Abstract Algebra. Addison-Wesley, 1974.
BIBLIOGRAPHY 62
[14] K. Furutani, K. Arimoto, H. Miyamoto, T. Kobayashi, K. Yasuda, and
K. Mashiko. A built-in Hamming code ECC circuit for DRAMs. IEEE Journal
of Solid-State Circuits, 24(1):50 56, February 1989.
[15] R. G. Gallager. Low Density Parity Check Codes. M.I.T. Press, 1963.
[16] S. Ghosh and P. D. Lincoln. Low-density parity check codes for error correction
in nanoscale memory. SRI Computer Science Laboratory Technical Report,
September 2007.
[17] W.J. Van Gils. A triple modular redundancy technique providing multiple-
bit error protection without using extra redundancy. IEEE Transactions on
Computers, 35(7):623631, 1986.
[18] Thijs Krol. (N, K) concept fault tolerance. IEEE Transactions on Computers,
35(4):339349, April 1986.
[19] Parag K. Lala. An Introduction to Logic Circuit Testing. Morgan & Claypool,
2009.
[20] Serge Lang. Undergraduate Algebra. Springer, New York, 2005.
[21] F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes.
North-Holland, Amsterdam, 1997.
[22] J. Mathew, J. Singh, A.M. Jabir, M. Hosseinabady, and D.K. Pradhan. Fault
tolerant bit parallel nite eld multipliers using LDPC codes. In IEEE Inter-
national Symposium on Circuits and Systems, 2008. ISCAS 2008, pages 1684
1687, May 2008.
[23] D. M. Miller and M. A. Thornton. Multiple Valued Logic: Concepts and Rep-
resentations. Morgan & Claypool, 2008.
[24] E. F. Moore and C. E. Shannon. Reliable circuits using less reliable relays.
Journal of the Franklin Institute, 262(3):191  208, 1956.
[25] Joseph J. Naresky. Reliability denitions. IEEE Transactions on Reliability,
19(4):198 200, November 1970.
[26] Victor P. Nelson. Fault-tolerant computing: Fundamental concepts. Computer,
23(7):1925, July 1990.
[27] D. K. Pradhan and J. J. Stier. Error-correcting codes and self-checking cir-
cuits. Computer, 13(3):2737, March 1980.
BIBLIOGRAPHY 63
[28] Tsutomu Sasao. Memory-Based Logic Synthesis. Springer, Heidelberg, 2011.
[29] Uwe Schöning. Logic for Computer Scientists. Birkhäuser, 2008.
[30] S. Stankovi¢. XML-based framework for representation of decision diagrams.
Ph.D. Thesis, Tampere University of Technology, Department of Signal Pro-
cessing, 2009.
[31] Richard P. Stanley. Enumerative Combinatorics, volume 1. Cambridge Univer-
sity Press, 2002.
[32] Heng Tang, Jun Xu, Yu Kou, S. Lin, and K. Abdel-Ghaar. On algebraic
construction of Gallager and circulant low-density parity-check codes. IEEE
Transactions on Information Theory, 50(6):1269  1279, 2004.
[33] M. A. Thornton and V. S. S. Nair. Ecient calculation of spectral coecients
and their applications. IEEE Transanctions on Computer-Aided Design of In-
tegrated Circuits and Systems, 14(11):13281341, 1995.
[34] J. H. van Lint. Introduction to Coding Theory. Springer Verlag, New York,
1982.
[35] J. von Neumann. Probabilistic logics and synthesis of reliable organisms from
unreliable components. In C. Shannon and J. McCarthy, editors, Automata
Studies, pages 4398. Princeton University Press, 1956.
[36] J.F. Wakerly. Partially self-checking circuits and their use in performing logical
operations. IEEE Transactions on Computers, 23(7):658  666, 1974.
