A combinatorial method for the evaluation of yield of fault-tolerant systems-on-chip by Suñé, Víctor et al.
A Combinatorial Method for the Evaluation of Yield of Fault-Tolerant
Systems-on-Chip
Doru P. Munteanu
Military Technical Academy
G. Cosbuc 81–83
Bucharest 75275, Romania
munteanud@mta.ro
Víctor Suñé, Rosa Rodríguez-Montañés, Juan A. Carrasco
Departament d'Enginyeria Electrònica
Universitat Politècnica de Catalunya
Diagonal 647, plta. 9
08028 Barcelona, Spain
{sunye,rosa,carrasco}@eel.upc.es
Abstract
In this paper we develop a combinatorial method for the
evaluation of yield of fault-tolerant systems-on-chip. The
method assumes that defects are produced according to a
model in which defects are lethal and affect given compo-
nents of the system following a distribution common to all
defects. The distribution of the number of defects is arbi-
trary. The method is based on the formulation of the yield
as 1 minus the probability that a given boolean function
with multiple-valued variables has value 1. That probability
is computed by analyzing a ROMDD (reduced ordered
multiple-valued decision diagram) representation of the function.
For efficiency reasons, we first build a coded ROBDD
(reduced ordered binary decision diagram) representation
of the function and then transform that coded ROBDD into
the ROMDD required by the method. We present numerical
experiments showing that the method is able to cope with
quite large systems in moderate CPU times.
1 Introduction
Systems-on-chip are becoming popular. The high den-
sities and areas of those integrated systems make them
very susceptible to manufacturing defects. In fact, com-
plex systems-on-chip are likely to have a very small yield
if they are not designed with built-in fault-tolerance. Then,
there is a need for efﬁcient methodologies for estimating
the yield of complex fault-tolerant systems-on-chip. When
the fault-tolerant system-on-chip has a regular structure, it
is often possible to make “ad-hoc” evaluations (see, for
instance, [11, 12, 17, 18]). However, many fault-tolerant
(This work was supported by the "Comisión Internacional de Ciencia y
Tecnología" (CICYT) of the Ministry of Science and Technology of Spain
under the research grant TAP1999-0443-C05-05.)
designs do not have a regular structure, particularly those
using a sophisticated network-on-chip as a communication
subsystem among the intellectual property cores (IPs) [3].
Computing the yield of such systems-on-chip is difficult,
mainly because realistic defect distributions
exhibit clustering [7, 13, 14, 15, 16, 18] and, thus, introduce
dependencies among the failed states of the components
of the system (see, for instance, [18, 27]). Simulation is
an approach which is not severely limited by the complex-
ity of the system, but tends to be expensive and does not
provide strict error control. The aim of this paper is to de-
velop a combinatorial method for the evaluation of the yield
of fault-tolerant systems-on-chip with precise error control
which can cope with quite complex systems using currently
affordable computational resources.
We assume that the fault-tolerant system-on-chip is made
up of a set $\{1, 2, \ldots, C\}$ of components and that whether
the system is functioning or not is determined from the
failed states of the components through a fault-tree function
$F(x_1, x_2, \ldots, x_C)$, where variable $x_i$ takes the value 1 if and
only if component $i$ is failed and the function takes the value
1 if and only if the system is not functioning. No restriction
is imposed on $F(x_1, x_2, \ldots, x_C)$. It will be assumed that a
gate-level description of the function is available.
The production of manufacturing defects will be modeled
using the following probabilities:

$$Q_k = P[\text{number of manufacturing defects is } k], \quad k = 0, 1, 2, \ldots,$$
$$P_i = P[\text{a given defect affects component } i \text{ and is lethal}].$$

It will be assumed that all defects will be distributed over
the components making up the system and will be lethal
following the probabilities $P_i$, $1 \le i \le C$, independently
of the number of defects, of which components the
remaining defects affect and of whether those defects are lethal
0-7695-1959-8/03 $17.00 (c) 2003 IEEEProceedings of the 2003 International Conference on Dependable Systems and Networks (DSN’03) 
or not. That model is useful from the designer's point of
view, since the distribution of the number of defects $Q_k$,
$k = 0, 1, 2, \ldots$, could be easily provided by the manufacturer
of the system-on-chip and the probabilities $P_i$, $1 \le i \le C$,
could be estimated from the final layout of the system-on-chip
using appropriate tools [19, 21, 31, 32] or from IP
layouts and routing estimates [30]. Thus, the methodology
could be used at several design stages. The assumed model
is consistent with all compound Poisson yield models [18],
which include the widely used negative binomial distribution
for the number of defects. The assumed model will not be
consistent, however, with yield models accounting for spatial
clustering (see footnote 1) such as the one proposed in [22].
From a computational point of view, it is convenient to
map the previously described model into a model taking
into account only lethal manufacturing defects, i.e. defects
which effectively make some component of the system
defective (not working properly). That model includes the
probabilities:

$$Q'_k = P[\text{number of lethal manufacturing defects is } k], \quad k = 0, 1, 2, \ldots,$$
$$P'_i = P[\text{a given lethal defect affects component } i].$$

The latter model is computationally more convenient
because, since not all defects will be lethal, the distribution
$Q'_k$, $k = 0, 1, 2, \ldots$, will be shifted to lower values of $k$
in relation to the distribution $Q_k$, $k = 0, 1, 2, \ldots$, and, then,
if only up to $M$ defects are analyzed (the computational cost
of the method will increase with $M$), higher accuracy will be
obtained if the distribution $Q'_k$ is used instead of $Q_k$. The
mapping can be performed using:

$$Q'_k = \sum_{m=k}^{\infty} Q_m \binom{m}{k} P_L^k (1 - P_L)^{m-k}, \tag{1}$$
$$P'_i = \frac{P_i}{P_L},$$

where $P_L = \sum_{i=1}^{C} P_i$ is the probability that any given defect
is lethal. As previously commented, the negative binomial
distribution is the most widely used distribution for the
number of defects affecting a chip. That distribution has the
form:

$$Q_k = \frac{\Gamma(\alpha + k)}{k!\,\Gamma(\alpha)} \, \frac{(\lambda/\alpha)^k}{(1 + \lambda/\alpha)^{\alpha + k}}, \tag{2}$$

where $\lambda$ is the expected number of defects and $\alpha$ is the
clustering parameter (the clustering increases for decreasing
$\alpha$). It is known (see [15]) that, when the distribution of the
number of defects is negative binomial, the distribution of
the number of lethal defects is also negative binomial with
the same clustering parameter. More precisely, when the
distribution of the number of defects is given by (2), the
distribution of the number of lethal defects is:

$$Q'_k = \frac{\Gamma(\alpha + k)}{k!\,\Gamma(\alpha)} \, \frac{(\lambda'/\alpha)^k}{(1 + \lambda'/\alpha)^{\alpha + k}},$$

with $\lambda' = \lambda P_L$. Similar results hold for all compound
Poisson distributions [18].

Footnote 1: Spatial clustering refers to the fact that, irrespective of the
expected number of defects on the system-on-chip, defects tend to cluster spatially.
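
The mapping (1) and the closed form for the negative binomial case can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names, the truncation point `m_max` of the infinite sum and the numerical values of the expected number of defects, the clustering parameter and the lethality probability are illustrative assumptions.

```python
from math import comb, exp, lgamma, log


def neg_binomial_pmf(k, lam, alpha):
    """Eq. (2): probability of k defects for mean lam and clustering parameter alpha."""
    log_q = (lgamma(alpha + k) - lgamma(k + 1) - lgamma(alpha)
             + k * log(lam / alpha)
             - (alpha + k) * log(1.0 + lam / alpha))
    return exp(log_q)


def lethal_pmf(k, q, p_lethal, m_max):
    """Eq. (1), with the infinite sum over m truncated at m_max (an approximation)."""
    return sum(q(m) * comb(m, k) * p_lethal ** k * (1.0 - p_lethal) ** (m - k)
               for m in range(k, m_max + 1))


# Illustrative values only; they are not taken from the paper.
lam, alpha, p_lethal = 2.0, 2.0, 0.5


def q_all(m):
    """Distribution Q_m of the number of all (lethal or not) defects."""
    return neg_binomial_pmf(m, lam, alpha)


for k in range(5):
    via_eq1 = lethal_pmf(k, q_all, p_lethal, m_max=400)
    closed_form = neg_binomial_pmf(k, lam * p_lethal, alpha)
    print(k, via_eq1, closed_form)  # the two columns should agree closely
```

Working in the log domain with `lgamma` avoids overflow of the Gamma function for large arguments.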
2 The method
In the method, the yield, $Y$, is computed by analyzing
whether the system is functioning or not assuming
$0, 1, \ldots, M$ lethal defects. Let

$$Y_k = P[\text{system is functioning} \mid \text{there are } k \text{ lethal defects}].$$

We have

$$Y = \sum_{k=0}^{\infty} Q'_k Y_k.$$

Analyzing up to $M$ defects we can pessimistically estimate
$Y$ by

$$Y_M = \sum_{k=0}^{M} Q'_k Y_k,$$

with error bounded from above by
$\sum_{k=M+1}^{\infty} Q'_k = 1 - \sum_{k=0}^{M} Q'_k$.
Then, given a suitable error control parameter $\varepsilon$,
we can select

$$M = \min\Big\{ m \ge 0 : \sum_{k=0}^{m} Q'_k \ge 1 - \varepsilon \Big\},$$

guaranteeing an absolute error in the yield estimation $\le \varepsilon$.
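
The selection of the truncation parameter amounts to accumulating the lethal-defect distribution until the mass 1 minus the error control parameter is reached. The sketch below illustrates this under stated assumptions: `select_truncation` and `m_limit` are hypothetical names, and a Poisson distribution with mean 1 is used only as a convenient stand-in for the distribution of the number of lethal defects.

```python
from math import exp, factorial


def select_truncation(q_lethal, eps, m_limit=10_000):
    """Return (M, bound): the smallest M whose cumulative mass is >= 1 - eps,
    and the resulting upper bound 1 - sum_{k<=M} Q'_k on the yield error."""
    cumulative = 0.0
    for m in range(m_limit + 1):
        cumulative += q_lethal(m)
        if cumulative >= 1.0 - eps:
            return m, 1.0 - cumulative
    raise ValueError("1 - eps not reached within m_limit terms")


def poisson_pmf(k, mean=1.0):
    """Stand-in distribution for the number of lethal defects (an assumption)."""
    return exp(-mean) * mean ** k / factorial(k)


M, error_bound = select_truncation(poisson_pmf, eps=1e-3)
print(M, error_bound)
```

Because the estimate only drops non-negative terms, the returned bound is also the worst-case amount by which the truncated yield underestimates the true yield.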
The yield estimate $Y_M$ can be formalized in terms of the
probability that a boolean function of certain independent
integer-valued random variables is equal to 1. Assume that the
defects are numbered in some arbitrary order. Those random
variables are:

$$W = \begin{cases} k, \; 0 \le k \le M, & \text{if there are } k \text{ lethal defects} \\ M + 1 & \text{if there are more than } M \text{ lethal defects} \end{cases}$$

and, for $1 \le k \le M$,

$$V_k = i \quad \text{if the } k\text{th lethal defect affects component } i.$$
Note that the random variable $W$ takes values in
$\{0, 1, \ldots, M + 1\}$ and each random variable $V_k$ takes values
in $\{1, 2, \ldots, C\}$. The random variable $W$ has probability
distribution $P[W = k] = Q'_k$, $0 \le k \le M$,
$P[W = M + 1] = 1 - \sum_{k=0}^{M} Q'_k$. The random variables $V_k$
have probability distributions $P[V_k = i] = P'_i$, $1 \le k \le M$,
$1 \le i \le C$.
Let $I_k(x)$ denote the boolean function with integer-valued
variable $x$ returning the value 1 if $x = k$ and the
value 0 otherwise and let $I_{\ge l}(x)$ denote the boolean function
with integer-valued variable $x$ returning the value 1 if
$x \ge l$ and the value 0 otherwise. Let the boolean function

$$G(w, v_1, v_2, \ldots, v_M) = I_{M+1}(w) \lor F\Big( \bigvee_{l=1}^{M} I_{\ge l}(w) \land I_1(v_l), \; \ldots, \; \bigvee_{l=1}^{M} I_{\ge l}(w) \land I_C(v_l) \Big). \tag{3}$$

Then, we have the following result.

Theorem 1. $Y_M = 1 - P[G(W, V_1, V_2, \ldots, V_M) = 1]$.
Intuitively, the reason why Theorem 1 holds is that
$I_{M+1}(W)$ "tells" whether the number of lethal defects is
$> M$, $I_{\ge l}(W)$ "tells" whether there is an $l$th lethal defect,
$I_i(V_l)$ "tells" whether the $l$th lethal defect affects component
$i$ and, then, $\bigvee_{l=1}^{M} I_{\ge l}(W) \land I_i(V_l)$ "tells" whether
component $i$ is affected by some of the first $M$ lethal defects. A
formal proof follows.
Proof of Theorem 1. The quantity $1 - Y_k$ is the probability
that, given there are $k$ lethal defects, the system is
not functioning. Since, assuming there are $k$ lethal defects,
component $i$ is failed if and only if $\bigvee_{l=1}^{k} I_i(V_l) = 1$, we
have

$$1 - Y_k = P\Big[ F\Big( \bigvee_{l=1}^{k} I_1(V_l), \; \ldots, \; \bigvee_{l=1}^{k} I_C(V_l) \Big) = 1 \Big]. \tag{4}$$

Using the theorem of total probability and the independence
of the random variables $W, V_1, \ldots, V_M$:

$$P[G(W, V_1, \ldots, V_M) = 1] = \sum_{k=0}^{M+1} P[W = k] \, P[G(W, V_1, \ldots, V_M) = 1 \mid W = k]$$
$$= \sum_{k=0}^{M} Q'_k \, P[G(k, V_1, \ldots, V_M) = 1] + \Big( 1 - \sum_{k=0}^{M} Q'_k \Big) P[G(M + 1, V_1, \ldots, V_M) = 1]. \tag{5}$$

But, from the definition of $G$ (3), for $0 \le k \le M$:

$$G(k, v_1, \ldots, v_M) = F\Big( \bigvee_{l=1}^{k} I_1(v_l), \; \ldots, \; \bigvee_{l=1}^{k} I_C(v_l) \Big) \tag{6}$$

and

$$G(M + 1, v_1, \ldots, v_M) = 1. \tag{7}$$

Then, using (4)–(7),

$$P[G(W, V_1, \ldots, V_M) = 1] = \sum_{k=0}^{M} Q'_k (1 - Y_k) + 1 - \sum_{k=0}^{M} Q'_k = 1 - \sum_{k=0}^{M} Q'_k Y_k = 1 - Y_M.$$
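
Theorem 1 can be sanity-checked by brute force on a tiny instance. The sketch below is an illustration under stated assumptions, not part of the method: the fault-tree function (components 1 and 2 both failed, or component 3 failed), the truncation parameter 2 and all probability values are made up for the example. It enumerates all value combinations of the random variables to obtain the probability that the function G of (3) equals 1, and compares 1 minus that probability with the truncated yield computed directly from its definition.

```python
from itertools import product


def fault_tree(x):
    """Example F: x[i-1] is 1 if component i is failed; F = x1 x2 OR x3 (assumed)."""
    return int((x[0] and x[1]) or x[2])


def g_function(w, v, M, C, F):
    """Eq. (3): w in {0..M+1}, v[l-1] in {1..C} is the component hit by defect l."""
    if w == M + 1:
        return 1
    x = [int(any(w >= l and v[l - 1] == i for l in range(1, M + 1)))
         for i in range(1, C + 1)]
    return F(x)


def prob_g_is_one(pw, pv, M, C, F):
    """P[G(W, V_1..V_M) = 1] by exhaustive enumeration of independent W, V_l."""
    total = 0.0
    for w in range(M + 2):
        for v in product(range(1, C + 1), repeat=M):
            p = pw[w]
            for comp in v:
                p *= pv[comp - 1]
            total += p * g_function(w, v, M, C, F)
    return total


def truncated_yield(q, pv, M, C, F):
    """Y_M = sum_{k<=M} Q'_k Y_k, with Y_k obtained by enumerating k lethal defects."""
    y = 0.0
    for k in range(M + 1):
        y_k = 0.0
        for v in product(range(1, C + 1), repeat=k):
            p = 1.0
            for comp in v:
                p *= pv[comp - 1]
            x = [int(i in v) for i in range(1, C + 1)]
            if not F(x):
                y_k += p
        y += q[k] * y_k
    return y


M, C = 2, 3
q = [0.5, 0.3, 0.15]            # assumed Q'_0, Q'_1, Q'_2
pw = q + [1.0 - sum(q)]         # distribution of W, including P[W = M + 1]
pv = [0.5, 0.3, 0.2]            # assumed P'_1, P'_2, P'_3
print(truncated_yield(q, pv, M, C, fault_tree),
      1.0 - prob_g_is_one(pw, pv, M, C, fault_tree))
```

The enumeration is exponential in the truncation parameter and is only meant as a check; the ROMDD-based computation is what makes the method practical.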

In the method, the probability $P[G(W, V_1, \ldots, V_M) = 1]$
is computed by building an ROMDD (reduced ordered
multiple-valued decision diagram) of the function
$G(w, v_1, \ldots, v_M)$. ROMDDs are a natural extension of
the well-known ROBDDs (reduced ordered binary decision
diagrams) [5] in which both the variables and the function
are allowed to be multiple-valued. A gate-level representation
of the function $G(w, v_1, \ldots, v_M)$ can be obtained from
a gate-level representation of $F(x_1, \ldots, x_C)$ as shown in
Figure 1, where the gate labeled $i$ inside is a "filter" gate
returning the value 1 if its integer-valued input has value $i$ and
returning the value 0 otherwise and the gate labeled $\ge i$ inside
is a "filter" gate returning the value 1 if its integer-valued
input has value $\ge i$ and returning the value 0 otherwise. Like
ROBDDs, ROMDDs are canonical representations which
can be built and manipulated in a similar way to ROBDDs.
An ROMDD representing a function $F$, which can take values
in the set $S_F$, of variables $x_i$, $i = 1, 2, \ldots, n$, which
can take values in the sets $S_i$, is a directed acyclic graph
with up to $|S_F|$ terminal nodes, each labeled with a distinct
value of the set $S_F$. Every non-terminal node is labeled by
an input variable $x_i$ and has as many as $|S_i|$ edges, each
labeled by a subset of $S_i$, with subsets associated with different
edges being non-intersecting. The ROMDD has a
unique non-terminal node without incoming edges, representing
the function $F(x_1, \ldots, x_n)$, called the top node.
The input variables encountered in every path from the top
node to a terminal node form a sequence of non-repeating
input variables consistent with an ordering $x_{p(1)}, \ldots, x_{p(n)}$
of the input variables of the function. Every non-terminal
node of the ROMDD represents a unique function of the set
of input variables which are found in some path from the
node to some terminal node. That an ROMDD is a canonical
representation means that, given $F$, the ROMDD only
depends on the selected ordering $x_{p(1)}, \ldots, x_{p(n)}$ for the
multiple-valued variables.

Figure 1. Gate-level description of the function
$G(w, v_1, \ldots, v_M)$.

Using the fact that the random variables $W, V_1, \ldots, V_M$
are independent and that the function represented by a
non-terminal node only depends on the set of variables found
on paths from the non-terminal node to terminal nodes, it
is possible to compute $P[G(W, V_1, \ldots, V_M) = 1]$ from an
ROMDD representation of the function $G(w, v_1, \ldots, v_M)$.
This can be achieved by assigning the value 1 to the terminal
node labeled "1" and the value 0 to the terminal
node labeled "0", making a depth-first, left-most traversal [1]
of the ROMDD, and computing the probability that
the function represented by a non-terminal node has value
1 when returning from each non-terminal node. Assume, for
instance, that node $n$ has associated with it the variable $w$
and that $n$ has edges to nodes $n_1$, $n_2$ and $n_3$
labeled with the subsets $S_1$, $S_2$ and $S_3$ of values of $w$,
respectively. Then, denoting by $value(x)$ the "value" variable
associated with node $x$, when returning from node $n$,
$value(n)$ would be computed as

$$value(n) = \sum_{j=1}^{3} \Big( \sum_{k \in S_j} P[W = k] \Big) \, value(n_j).$$

At the end of the traversal, the "value" variable of the top
node will hold $P[G(W, V_1, \ldots, V_M) = 1]$. We illustrate the
computational procedure with the small ROMDD shown in Figure 2,
which corresponds to a fault-tolerant system having fault-tree
function $F(x_1, x_2, x_3) = x_1 x_2 \lor x_3$ and $M = 2$ under
the multiple-valued variable ordering $v_1, v_2, w$. This implies
that the random variable $W$ will take values in the set
$\{0, 1, 2, 3\}$ and the random variables $V_1$ and $V_2$ will take
values in the set $\{1, 2, 3\}$.

Figure 2. Small ROMDD to illustrate the computation of
$P[G(W, V_1, \ldots, V_M) = 1]$.

In that ROMDD, let $n_1$ be the top node (variable $v_1$), let
$n_2$ and $n_3$ be the nodes with variable $v_2$ reached from $n_1$
through the edges labeled 1 and 2, and let $n_4$, $n_5$ and $n_6$ be
the nodes with variable $w$ whose edges to the terminal node "1"
are labeled $\{3\}$, $\{2, 3\}$ and $\{1, 2, 3\}$, respectively. Using a
depth-first, left-most traversal of the ROMDD,
$P[G(W, V_1, V_2) = 1] = value(n_1)$ would be computed
following the sequence:

$$value(n_4) = P[W = 3],$$
$$value(n_5) = P[W = 2] + P[W = 3],$$
$$value(n_2) = P'_1 \, value(n_4) + (P'_2 + P'_3) \, value(n_5),$$
$$value(n_3) = P'_2 \, value(n_4) + (P'_1 + P'_3) \, value(n_5),$$
$$value(n_6) = P[W = 1] + P[W = 2] + P[W = 3],$$
$$P[G(W, V_1, V_2) = 1] = value(n_1) = P'_1 \, value(n_2) + P'_2 \, value(n_3) + P'_3 \, value(n_6).$$

 
Although there are algorithms and packages for ROMDD
manipulation [23, 29], there is currently consensus in the
ROMDD community that the most efficient way of analyzing
multiple-valued functions of multiple-valued variables
is by using coded ROBDDs [23, 24]. A coded ROBDD
of a multiple-valued function $H(x_1, x_2, \ldots, x_n)$ of
multiple-valued variables $x_i$ is the ROBDD of any function

$$H'(x_{1,1}, \ldots, x_{1,k_1}, x_{2,1}, \ldots, x_{2,k_2}, \ldots, x_{n,1}, \ldots, x_{n,k_n})$$

which represents $H(x_1, x_2, \ldots, x_n)$ in terms of groups
$x_{i,1}, x_{i,2}, \ldots, x_{i,k_i}$ of binary variables encoding the
multiple-valued variables $x_i$. Formally, denoting by $D_i$ the
domain of $x_i$ and by $x_{i,1}(j), \ldots, x_{i,k_i}(j)$ the codeword
representing value $j \in D_i$ in the code used for $x_i$, $H'$ has
to satisfy

$$H'(x_{1,1}(j_1), \ldots, x_{1,k_1}(j_1), x_{2,1}(j_2), \ldots, x_{2,k_2}(j_2), \ldots, x_{n,1}(j_n), \ldots, x_{n,k_n}(j_n)) = H(j_1, \ldots, j_n)$$

for every $(j_1, \ldots, j_n) \in D_1 \times D_2 \times \cdots \times D_n$.
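
The defining property of a coded function can be illustrated with a small sketch. Everything here is an assumption made for the example: the helper names, the choice of a minimum-width binary code of the position of each value in its domain, and the convention that unused codewords map to 0 (any value would do, since the definition only constrains codewords of domain values).

```python
from itertools import product


def bits_needed(size):
    """Minimum number of bits able to distinguish `size` values."""
    return max(1, (size - 1).bit_length())


def encode(value, domain):
    """Codeword (most significant bit first) of `value`: binary code of its index."""
    idx = domain.index(value)
    width = bits_needed(len(domain))
    return tuple((idx >> b) & 1 for b in reversed(range(width)))


def make_coded(H, domains):
    """Build a binary function H' satisfying H'(codewords) = H(values)."""
    widths = [bits_needed(len(d)) for d in domains]

    def H_coded(*bits):
        values, pos = [], 0
        for domain, width in zip(domains, widths):
            idx = int("".join(str(b) for b in bits[pos:pos + width]), 2)
            pos += width
            if idx >= len(domain):
                return 0  # unused codeword: the value is unconstrained
            values.append(domain[idx])
        return H(*values)

    return H_coded


# Example: w in {0,1,2,3}, v in {1,2,3}; H is 1 when a defect exists and hits 3.
domains = [[0, 1, 2, 3], [1, 2, 3]]


def H(w, v):
    return int(w >= 1 and v == 3)


H_coded = make_coded(H, domains)
for w, v in product(*domains):
    assert H_coded(*(encode(w, domains[0]) + encode(v, domains[1]))) == H(w, v)
```

Many different binary functions satisfy the property because they may differ on unused codewords, which is why the text speaks of "any function" representing the multiple-valued one.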
Coded ROBDDs can be used directly in many applications
such as formal verification. However, the combinatorial
method for yield computation requires the availability
of the ROMDD. Given an ordering $x_{p(1)}, \ldots, x_{p(n)}$ of the
multiple-valued variables, the ROMDD can be efficiently
obtained from a coded ROBDD if the coded ROBDD is obtained
using an ordering for the binary variables in which the
variables encoding each multiple-valued variable are kept
grouped and the groups are ordered according to the ordering
$x_{p(1)}, \ldots, x_{p(n)}$. The conversion procedure is based
on viewing the coded ROBDD as made up of layers, where
each layer contains the nodes with binary variables encoding
a given multiple-valued variable. Some of the nodes in each
layer are entry nodes (they have incoming arcs from other
layers). The procedure builds the ROMDD incrementally
by processing each layer of the coded ROBDD bottom-up.
Processed entry nodes of the coded ROBDD are associated
with nodes of the constructed ROMDD. Let $mapping(n)$
be the node of the ROMDD associated with the entry node
$n$ of the coded ROBDD. The bottom layer of the coded
ROBDD is processed by creating a copy $n'_0$ in the ROMDD
of the terminal node $n_0$ of the coded ROBDD with
value 0 and creating a copy $n'_1$ in the ROMDD of the
terminal node $n_1$ of the coded ROBDD with value 1 and
making $mapping(n_0) = n'_0$ and $mapping(n_1) = n'_1$. The
remaining layers of the coded ROBDD are processed by
processing each entry node $n$ of the layer as follows. For
each possible value $i$ of the multiple-valued variable $x$
associated with the layer, it is determined which entry node
of a different (lower) layer is reached from $n$ when the values
of the group of binary variables associated with the
layer encoding value $i$ are followed. Let $n_{s(i)}$ be the node
of the coded ROBDD reached when the value $i$ is "simulated".
If all $mapping(n_{s(i)})$ are equal to some node $n'$
of the ROMDD, then $mapping(n)$ must be made equal to
$n'$ and no node has to be added to the ROMDD. Otherwise,
the ROMDD must have a node associated with $n$ with
multiple-valued variable $x$. That node must have successor
$mapping(n_{s(i)})$ for each value $i$ of $x$. If there exists
in the ROMDD some node $n'$ with multiple-valued variable
$x$ and successor $mapping(n_{s(i)})$ for each value $i$ of $x$,
then $mapping(n)$ is set to $n'$ and no node is added to the
ROMDD. Otherwise, a node $n'$ with multiple-valued variable
$x$ and successor $mapping(n_{s(i)})$ for each value $i$ of
$x$ is added to the ROMDD and $mapping(n)$ is set to $n'$.
When not all combinations of values of the groups of binary
variables encode values in the domain of the associated
multiple-valued variable, the ROMDD built in that way may
have nodes which are unreachable from the top node. Such
nodes are identified and deleted by making a depth-first,
left-most traversal of the ROMDD starting from the top node.
Figure 3 illustrates the processing of a layer of a coded
ROBDD associated with a multiple-valued variable $x$ which
takes values in the domain $\{1, 2, 3\}$ and in which two binary
variables have been used to encode variable $x$ using
a code with three two-bit codewords.
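
The conversion can be sketched as follows. This is a simplified illustration, not the authors' implementation, and it deviates from the text in one respect that should be stated plainly: instead of processing layers bottom-up, it uses a memoized recursion starting at the top node, which computes the same mapping but never creates nodes that are unreachable from the top node. The data layout, all names and the tiny example are assumptions.

```python
def coded_robdd_to_romdd(root, bdd, group_of, codes):
    """Convert a coded ROBDD into an ROMDD.

    bdd:      node -> (bit, low_child, high_child); terminals are ints 0 and 1.
    group_of: bit -> multiple-valued variable encoded by that bit.
    codes:    variable -> {value: {bit: 0 or 1}} (codeword of each domain value).
    Returns (romdd_root, romdd), with romdd: node -> (variable, {value: child}).
    """
    romdd = {}
    unique = {}                 # (variable, successors) -> existing ROMDD node
    mapping = {0: 0, 1: 1}      # entry node of the ROBDD -> ROMDD node

    def simulate(n, var, code):
        """Follow the codeword inside the layer of `var`; return the node reached."""
        while n not in (0, 1) and group_of[bdd[n][0]] == var:
            bit, low, high = bdd[n]
            n = high if code[bit] else low
        return n

    def convert(n):
        if n in mapping:
            return mapping[n]
        var = group_of[bdd[n][0]]
        succ = tuple((val, convert(simulate(n, var, code)))
                     for val, code in sorted(codes[var].items()))
        children = {child for _, child in succ}
        if len(children) == 1:          # all values lead to the same node
            node = children.pop()
        else:
            key = (var, succ)
            if key not in unique:       # add a node only if none is equivalent
                unique[key] = f"m{len(unique)}"
                romdd[unique[key]] = (var, dict(succ))
            node = unique[key]
        mapping[n] = node
        return node

    return convert(root), romdd


# Assumed example: x in {1,2,3} coded as x-1 on bits x1 (high) and x0 (low),
# w in {0,1} on bit w0; the ROBDD below represents x1 OR (x0 AND w0).
bdd = {"r": ("x1", "a", 1), "a": ("x0", 0, "b"), "b": ("w0", 0, 1)}
group_of = {"x1": "x", "x0": "x", "w0": "w"}
codes = {
    "x": {1: {"x1": 0, "x0": 0}, 2: {"x1": 0, "x0": 1}, 3: {"x1": 1, "x0": 0}},
    "w": {0: {"w0": 0}, 1: {"w0": 1}},
}
romdd_root, romdd = coded_robdd_to_romdd("r", bdd, group_of, codes)
print(romdd_root, romdd)
```

The `unique` table is what keeps the result reduced: two entry nodes with the same variable and the same successors for every value are mapped to a single ROMDD node.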
The coded ROBDD is built by processing an implementation
of the function $G(w, v_1, \ldots, v_M)$ in binary logic obtained
by encoding the variable $w$ in binary using a minimum
number of bits. For the variables $v_i$, since they have values
in the domain $\{1, 2, \ldots, C\}$, a binary code of minimum
number of bits encoding $v_i - 1$ is used. Such a strategy keeps
the number of binary variables to a minimum and tends to result
in coded ROBDDs of minimum size. Filter gates are

Figure 3. Illustration of the procedure for obtaining the
ROMDD from the coded ROBDD.

substituted by binary logic expressed in terms of the binary
variables $w_{l'}, \ldots, w_0$ encoding the multiple-valued variable
$w$ and the binary variables $v_i^l, \ldots, v_i^0$ encoding each
multiple-valued variable $v_i$. Although, with given orderings of the
variables, the coded ROBDD and the ROMDD will be independent
of the particular implementation of that logic, that
implementation may affect the heuristic-based orderings to
be described next. This makes it convenient to report which
logic is used. Calling $z_{\ge k}$, $1 \le k \le M$, the output of the
"filter" gate labeled $\ge k$ having as input $w$, and calling $z_{\ge M+1}$
the output of the "filter" gate labeled $M + 1$ having as input
$w$, the binary logic used for generating $z_{\ge k}$, $1 \le k \le M + 1$,
is:

$$z_{\ge M+1} = lit(w_{l'}, M + 1) \land lit(w_{l'-1}, M + 1) \land \cdots \land lit(w_0, M + 1),$$
$$z_{\ge k} = z_{\ge k+1} \lor \big( lit(w_{l'}, k) \land lit(w_{l'-1}, k) \land \cdots \land lit(w_0, k) \big), \quad 1 \le k \le M,$$

where $lit(w_i, m) = w_i$ if the $i$th bit of the binary code
representing $m$ is 1 and $lit(w_i, m) = \bar{w}_i$ if it is 0, where $\bar{x}$
denotes the complement of the binary variable $x$. Calling
$z_i^k$ the output of the "filter" gate labeled $k$ having as input
$v_i$, the binary logic used for generating $z_i^k$ is:

$$z_i^k = lit(v_i^l, k - 1) \land lit(v_i^{l-1}, k - 1) \land \cdots \land lit(v_i^0, k - 1),$$
where lit vj
i
 m  v
j
i
if the jth bit of the binary code rep-
resenting m is 1 and lit vj
i
 m  v
j
i
if it is 0. For building
the coded ROBDD the implementation of the method uses
the well-knownBDD Library developed at Carnegie-Mellon
University [2].
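
The binary logic of the filter gates can be exercised with a short sketch. It is an illustration under assumptions: bits are held in a list indexed from the least significant bit, the function names are invented, and the chained definition of the "greater or equal" outputs is only checked on the domain of the variable (values from 0 to the truncation parameter plus one).

```python
def lit(bit_value, i, m):
    """The bit itself if bit i of the binary code of m is 1, else its complement."""
    return bit_value if (m >> i) & 1 else 1 - bit_value


def equals_filter(bits, m):
    """AND of the literals of all bits: 1 iff `bits` is the binary code of m."""
    out = 1
    for i, b in enumerate(bits):
        out &= lit(b, i, m)
    return out


def ge_filters(w_bits, M):
    """Outputs z[k] of the '>= k' gates, 1 <= k <= M + 1, built as a chain:
    z[M+1] recognizes the code of M + 1 and z[k] = z[k+1] OR (code of k)."""
    z = {M + 1: equals_filter(w_bits, M + 1)}
    for k in range(M, 0, -1):
        z[k] = z[k + 1] | equals_filter(w_bits, k)
    return z


def to_bits(value, width):
    """Binary code of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


M, C = 4, 3
w_width = (M + 1).bit_length()      # minimum number of bits for 0..M+1
v_width = max(1, (C - 1).bit_length())  # minimum number of bits for v - 1
for w in range(M + 2):
    z = ge_filters(to_bits(w, w_width), M)
    assert all(z[k] == int(w >= k) for k in range(1, M + 2))
for v in range(1, C + 1):
    # the gate labeled k on input v tests the code of k - 1, since v - 1 is encoded
    assert all(equals_filter(to_bits(v - 1, v_width), k - 1) == int(v == k)
               for k in range(1, C + 1))
```

Chaining the "greater or equal" outputs reuses each comparison once, so the logic grows linearly with the truncation parameter.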
It is well-known that the size of the ROBDD of a
boolean function of binary variables depends on the or-
dering of the binary variables. Similarly, the size of the
ROMDD of a multiple-valued function of multiple-valued
variables depends on the ordering of the multiple-valued
variables. The variables are most often sorted using heuris-
tics and an abundant literature is available about heuristics
for ordering the variables of boolean functions of binary
variables using gate-level representations of the functions
[4, 6, 8, 9, 10, 20, 25, 26]. Those heuristics can be classiﬁed
into static and dynamic depending on whether the ordering
is computed before the ROBDD is built or the ordering may
be changed during the ROBDD construction. Three heuris-
tics which are relatively simple to implement and which
have good performance are the topology heuristic described
in [26], the weight heuristic described in [25] and the H4
heuristic described in [4]. In the topology heuristic, in-
put variables are sorted as found in a depth-ﬁrst, left-most
traversal of the gate description. In the weight heuristic,
a weight 1 is assigned to the inputs, and, processing the
gate description bottom-up, a weight equal to the sum of
the weights of the fan-in nodes is assigned to the non-input
nodes. Then, nodes in the fan-in of each non-input node
are reordered in order of increasing weight, respecting the
original ordering in case of a tie, and input variables are
sorted as found in a depth-ﬁrst, left-most traversal of the
gate description with reordered fan-in. In the H4 heuristic,
input variables are sorted as found in a depth-ﬁrst, left-most
traversal of the gate description with nodes in the fan-in of
a non-input node dynamically sorted when the non-input
node is first visited using the following two criteria, in that
order: first, nodes having the minimum number of non-visited
inputs in their dependency cones; second, nodes with the minimum
sum of indices of visited inputs in their dependency cones (the
index of a visited input is the order assigned to the input).
As in the case of the weight heuristic, in case of a tie, the
original ordering of the fan-in of a non-input node is preserved.
We will experiment with the following orderings for
the variables $w, v_1, \ldots, v_M$:
wv: $w, v_1, \ldots, v_M$.
wvr: $w, v_M, \ldots, v_1$.
vw: $v_1, \ldots, v_M, w$.
vrw: $v_M, \ldots, v_1, w$.
t: ordering which results when the topology heuristic
is applied to the gate-level description of
$G(w, v_1, \ldots, v_M)$ in binary logic and the multiple-valued
variables are sorted in increasing order of the
average indices over the groups of binary variables
encoding each multiple-valued variable.
w: same as t but using the weight heuristic for sorting
the binary variables.
h: same as t but using the H4 heuristic for sorting the
binary variables.
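
The topology and weight heuristics described above reduce to traversals of the gate-level description. The sketch below is an illustration under assumptions: the netlist is a dictionary from each gate to its ordered fan-in list, gate types are omitted because only the structure matters for these two heuristics, and the tiny netlist is invented. The H4 heuristic is not covered.

```python
def topology_order(gates, output):
    """Inputs in the order found by a depth-first, left-most traversal."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        if node in gates:               # non-input node: descend into its fan-in
            for child in gates[node]:
                visit(child)
        else:                           # input variable
            order.append(node)

    visit(output)
    return order


def weight_order(gates, output):
    """Weight heuristic: inputs weigh 1, a gate weighs the sum of its fan-in
    weights; each fan-in is sorted by increasing weight (ties keep the original
    order) and the inputs are then listed by the topology traversal."""
    weights = {}

    def weight(node):
        if node not in weights:
            weights[node] = (sum(weight(c) for c in gates[node])
                             if node in gates else 1)
        return weights[node]

    # sorted() is stable, so ties preserve the original fan-in ordering
    reordered = {g: sorted(fanin, key=weight) for g, fanin in gates.items()}
    return topology_order(reordered, output)


# Assumed netlist: out = OR(g1, x3), g1 = AND(x1, x2)
gates = {"out": ["g1", "x3"], "g1": ["x1", "x2"]}
print(topology_order(gates, "out"))  # ['x1', 'x2', 'x3']
print(weight_order(gates, "out"))    # ['x3', 'x1', 'x2']
```

In the example the weight heuristic moves the lone input ahead of the two-input gate, since its weight (1) is smaller than the weight of the gate (2).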
The size of the coded ROBDD is affected by the ordering
of the group of binary variables encoding each multiple-
valued variable. Then, it is convenient to use an ordering
for those groups of binary variables yielding ROBDDs of as
small size as possible. We will experiment with the follow-
ing orderings for the groups of binary variables encoding
each multiple-valued variable:
ml: most to least signiﬁcant bit.
lm: least to most signiﬁcant bit.
t: ordering which results when the binary variables are
sorted in increasing order of the indices given by
the topology heuristic.
w: same as t but using the weight heuristic.
h: same as t but using the H4 heuristic.
We allow the use of orderings ml and lm for the groups
of binary variables in combination with any ordering for the
multiple-valued variables. However, we will only allow the
use of an ordering t for the groups of binary variables in com-
bination with the ordering t for the multiple-valued variables,
the use of an ordering w for the group of binary variables
in combination with the ordering w for the multiple-valued
variables, and the use of the ordering h for the groups of
binary variables in combination with the ordering h for the
multiple-valued variables.
3 Benchmarks description
In this section we describe the benchmarks which will
be used to evaluate the performance of the combinatorial
method for evaluating the yield. The benchmarks are two
scalable examples which instantiate systems-on-chip of in-
creasing numbers of components. The ﬁrst scalable exam-
ple, called MSn, is the system-on-chip with the architecture
illustrated in Figure 4. The system includes a cluster of two
“master” Intellectual Property cores IPM and n clusters in-
cluding two “slave” Intellectual Property cores IPS. Those
Intellectual Property cores are interconnected using com-
munication modules CM and CS and two buses. Buses are
assumed to be not affected by manufacturing defects. This

Figure 4. Architecture of system-on-chip MSn.

implies that the system can be conceptualized as made up of
only IPMs, IPSs and communication modules. The system
is operational if at least one unfailed IPM can communicate
with at least one unfailed IPS of each cluster using unfailed
communication modules. The communication between the
IPM and each IPS has to be direct, i.e. it can only involve
a bus and two communication modules. Manufacturing defects
are assumed to follow a negative binomial distribution
with a fixed clustering parameter $\alpha$; for the expected number
of defects $\lambda$ two values will be assumed. Furthermore, the
probabilities $P_i$ will be taken so that, with
$P_L = \sum_{i=1}^{C} P_i$, $\lambda' = \lambda P_L$ has the values
1 and 2 and, calling $P_{IPM}$ the $P_i$ probability of an IPM,
$P_{IPS}$ the $P_i$ probability of an IPS, and $P_C$ the $P_i$ probability
of a communication module, the ratios $P_{IPS}/P_{IPM}$ and
$P_C/P_{IPM}$ take fixed values.
The second scalable example is the system-on-chip
ESEN$n$x$m$ with the architecture described in Figure 5
for the case $n = 8$, $m = 2$. The system includes $nm/2$
Intellectual Property cores IPA and $nm/2$ Intellectual
Property cores IPB interconnected by an ESEN multiexchange
interconnection network with $n$ inputs [28], through
$m$:1 concentrators (C) in case $m > 1$, in which each
switching element (SE) of the first and last stages has a
redundant copy. The system is operational if a sufficient number
of unfailed IPAs and of unfailed IPBs can communicate
through the interconnection network. It is assumed
that links are not affected by manufacturing defects. Thus,
the system can be conceptualized as made up of only IPAs,
IPBs, SEs and, in case $m > 1$, Cs. As in the first scalable
example, manufacturing defects are modeled using a negative
binomial distribution with a fixed clustering parameter $\alpha$
and for the expected number of defects $\lambda$ two values will be
assumed. Furthermore, the probabilities $P_i$ will be taken so
that, with $P_L = \sum_{i=1}^{C} P_i$, $\lambda' = \lambda P_L$ has
the values 1 and 2 and, calling $P_{IPA}$ the $P_i$ probability
of an IPA, $P_{IPB}$ the $P_i$ probability of an IPB, $P_{SE}$ the
$P_i$ probability of an SE, and $P_C$ the $P_i$ probability of a C,
the ratios $P_{IPB}/P_{IPA}$, $P_{SE}/P_{IPA}$ and $P_C/P_{IPA}$
take fixed values.
Table 1 gives the number of components C of the benchmarks
which will be used to evaluate the performance of the
combinatorial method and the number of gates of the gate-level
descriptions of the corresponding fault-tree functions
used in the experiments.

Figure 5. Architecture of system-on-chip ESEN8x2.

Table 1. Number of components (C) of the benchmarks
and number of gates of the gate-level descriptions of the
corresponding fault-tree functions.

benchmark     C    gates
MS2          18       27
MS4          30       51
MS6          42       75
MS8          54       99
MS10         66      123
ESEN4x1      14       13
ESEN4x2      26       26
ESEN4x4      34       74
ESEN8x1      32       73
ESEN8x2      56      122
ESEN8x4      72      314
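
As a cross-check of the MSn description against the component counts of Table 1, the fault-tree function of MSn can be sketched directly from the operational condition. This is one possible reading, labeled as an assumption: a single unfailed IPM must reach every cluster, the suffixes A and B of the communication modules are taken to identify the two buses, and a direct connection uses the module of the IPM and the module of the IPS on the same bus. The function names are hypothetical.

```python
def ms_components(n):
    """Component names of MSn (buses are assumed defect-free and are omitted)."""
    comps = []
    for m in (1, 2):
        comps += [f"IPM_{m}", f"CM_{m}_A", f"CM_{m}_B"]
    for c in range(1, n + 1):
        for s in (1, 2):
            comps += [f"IPS_{c}_{s}", f"CS_{c}_{s}_A", f"CS_{c}_{s}_B"]
    return comps


def ms_fault_tree(n, failed):
    """True iff MSn is NOT operational given the set of failed components."""
    def ok(name):
        return name not in failed

    for m in (1, 2):
        if not ok(f"IPM_{m}"):
            continue
        # this IPM must reach at least one unfailed IPS of every cluster
        if all(any(ok(f"IPS_{c}_{s}") and ok(f"CM_{m}_{b}") and ok(f"CS_{c}_{s}_{b}")
                   for s in (1, 2) for b in "AB")
               for c in range(1, n + 1)):
            return False
    return True


for n, expected in [(2, 18), (4, 30), (6, 42), (8, 54), (10, 66)]:
    assert len(ms_components(n)) == expected   # matches column C of Table 1
print(ms_fault_tree(2, {"CM_1_A", "CS_1_1_B", "IPS_1_2"}))  # IPM_2 still works
```

The component count of this reading is 6 + 6n, which agrees with the MSn rows of Table 1; the reading of the operational condition itself remains an interpretation.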
4 Results
All experiments reported in this section were performed
on a workstation with a Sun-Blade-1000 processor and 4 GB
of memory. We will examine first how the ordering of the
multiple-valued variables $w, v_1, \ldots, v_M$ affects the size of
the ROMDD. After that, we will examine how the ordering
of the binary variables within each group of binary variables
encoding a multiple-valued variable affects the size
of the coded ROBDD which is built to derive the
ROMDD from it. We will run the method with a fixed error
requirement $\varepsilon$. Table 2 gives the sizes (number of nodes)
of the ROMDD for all benchmarks under the orderings of
the multiple-valued variables wv, wvr, vw, vrw, t, w, and
h deﬁned in Section 2. The heuristic weight (w) is consis-
tently the one which yields better results. The ordering wvr
(W
 
V
M
       V
 
) gives ROMDDs of exactly the same size as
w, but it fails in one case in which the method succeeds under
the ordering w. Table 3 gives the sizes (number of nodes) of
the coded ROBDDs from which the ROMDDs are obtained
when the ordering w is used for the multiple-valued variables
and the orderings ml, lm and w, considered in Section 2, are
used for the groups of binary variables. The heuristic
ml seems to be the best one: it gives better results in all cases
except for MS4, in which the other two heuristics perform
slightly better. The differences among the three heuristics
are small, and the heuristics lm and w give exactly the same
results in all cases.

Based on our experiments, the best heuristics seem to be w for
the multiple-valued variables and ml for the groups of binary
variables encoding each multiple-valued variable. We
assess the performance of the method in more depth for those
heuristics. Table 4 gives, for the benchmarks in which the
method succeeded, the CPU times, the peak number of ROBDD
nodes (the maximum sum of the nodes of the ROBDDs which
had to be held simultaneously in memory when processing
the generalized fault-tree), the size of the coded ROBDD and
the size of the ROMDD. Several comments are in order. First,
CPU times are reasonable in all cases: in the worst case
(ESEN8x2, [...]) the CPU time is about 18 minutes. Second,
the peak number of ROBDD nodes may or may not be much
larger than the size of the final coded ROBDD. In practice,
the application of the method is limited by that peak, since
it determines the peak memory consumption of the method.
Third, the size of the coded ROBDD is always of the order of
10 times the size of the ROMDD (between about 6 and 19 times
in Table 4). With
that factor, even an efficient implementation of ROMDDs is
likely to consume more memory than the coded ROBDD,
which has a much simpler structure. Thus, working with
coded ROBDDs and translating the final coded ROBDD into
the ROMDD required to perform the yield computations
seems to be a good approach. This is consistent
with the conclusion reached by researchers in the ROMDD
community that coded ROBDDs are probably the most
efficient way of handling ROMDDs [24].

Putting all results
together, it seems that the method can efficiently compute
the yield of systems with up to about 60 components when
the average number of lethal defects is moderate ([...])
and up to about 30 components when the average number of
lethal defects is large ([...]). The number of components
which the method can handle depends, of course, on the
value of the truncation parameter M. That parameter had
value 6 for the examples with [...] and value 10 for the
examples with [...].
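
The role of the truncation parameter M can be illustrated with a small sketch. It assumes, hypothetically, that M is taken as the smallest value for which the probability of more than M lethal defects does not exceed the error requirement, and that the number of lethal defects follows a negative binomial distribution with mean `lam` and clustering parameter `alpha`. Neither the rule, the function names, nor the parameter values below are taken from this section; they are illustrative only.

```python
import math


def neg_binomial_pmf(k, lam, alpha):
    """P[W = k] for a negative binomial with mean lam and clustering alpha.

    Assumed defect-count model, used only for illustration; the method
    itself accepts an arbitrary distribution for the number of defects.
    """
    log_p = (
        math.lgamma(alpha + k) - math.lgamma(k + 1) - math.lgamma(alpha)
        + k * math.log(lam / alpha)
        - (alpha + k) * math.log(1.0 + lam / alpha)
    )
    return math.exp(log_p)


def smallest_truncation(pmf, eps, max_m=10_000):
    """Smallest M with P[W > M] <= eps (hypothetical truncation rule)."""
    cumulative = 0.0
    for m in range(max_m + 1):
        cumulative += pmf(m)
        if 1.0 - cumulative <= eps:
            return m
    raise ValueError("tail did not drop below eps within max_m terms")


if __name__ == "__main__":
    # Illustrative means and error requirement, not the paper's values.
    for lam in (1.0, 2.0):
        m = smallest_truncation(
            lambda k, lam=lam: neg_binomial_pmf(k, lam, alpha=1.0), eps=1e-2
        )
        print(f"mean {lam}: M = {m}")
```

Under this assumed rule, a larger mean number of lethal defects requires a larger M for the same error requirement, which is in line with the larger truncation parameter reported above for the examples with more defects.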
5 Conclusions
Systems-on-chip have reached a degree of complexity that
makes them very susceptible to manufacturing defects, so that
reasonable yields can only be achieved with the use of
fault-tolerance techniques. That application of fault-tolerance
calls for efficient methodologies for the evaluation of the
yield of fault-tolerant systems-on-chip. Such evaluation is
difficult because realistic models of manufacturing defects
exhibit clustering and, thus, introduce dependencies among
the failed states of the components making up the system.
In this paper, we have developed a combinatorial method
for the evaluation of the yield of fault-tolerant systems-on-chip
that supports realistic defect models with clustering.
The method builds a ROMDD of a boolean function with
multiple-valued variables, from which the yield can be
computed with a predefined accuracy. The
ROMDD is built automatically from a gate-level description
of the fault-tree specifying the structure function of the
system. The computational complexity of the method
increases with the expected number of lethal defects in the
fault-tolerant system. We have shown, however, that, using
currently affordable computational resources, the method is
able to deal with systems having tens of components.
In the future, we plan to extend the method to allow
the evaluation of the operational reliability of a fault-tolerant
system-on-chip taking into account manufacturing defects.
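
As a minimal sketch of how a probability can be read off a multiple-valued decision diagram, the following computes P[G = 1] by a memoized depth-first traversal and then the yield as 1 minus that probability. It assumes that the variables are independent and uses a hypothetical toy diagram (two components, at most two lethal defects, system failed only when both components are hit); the node representation, the names `prob_one`, `ROOT` and `DIST`, and all probabilities are illustrative and not taken from the paper.

```python
# A node is a terminal (0 or 1) or a tuple (variable, children), where
# children[j] is the node reached when the variable takes value j.


def prob_one(root, dist):
    """Probability that the function represented by the diagram equals 1.

    dist[var][j] is P[var = j]. The variables are assumed independent and
    each variable appears at most once on every root-to-terminal path.
    """
    memo = {}

    def visit(node):
        if node in (0, 1):
            return float(node)
        key = id(node)  # shared sub-diagrams are evaluated only once
        if key not in memo:
            var, children = node
            memo[key] = sum(
                p * visit(child) for p, child in zip(dist[var], children)
            )
        return memo[key]

    return visit(root)


# Hypothetical toy example: w = number of lethal defects (0, 1 or 2),
# v1 and v2 = component (0 or 1) hit by the first and second defect.
# G = 1 (system failed) only if w = 2 and the defects hit both components.
V2_IS_1 = ("v2", (0, 1))
V2_IS_0 = ("v2", (1, 0))
V1_NODE = ("v1", (V2_IS_1, V2_IS_0))
ROOT = ("w", (0, 0, V1_NODE))
DIST = {"w": (0.5, 0.3, 0.2), "v1": (0.6, 0.4), "v2": (0.6, 0.4)}

if __name__ == "__main__":
    p_fail = prob_one(ROOT, DIST)
    print(f"P[G = 1] = {p_fail:.3f}, yield = {1.0 - p_fail:.3f}")
```

Because results are cached per node, the traversal cost is proportional to the number of edges in the diagram, which is why the ROMDD sizes reported in the tables matter for the cost of the yield computation.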
References
[1] A. V. Aho, J. E. Hopcroft and J. D. Ullman, Data structures and algorithms,
Addison-Wesley, 1983.
[2] The BDD Library. Available at
http://www-2.cs.cmu.edu/~modelcheck/bdd.html.
[3] L. Benini and G. De Micheli, “Networks on Chip: A New SoC Paradigm,”
IEEE Computer, 2002, vol. 35, no. 1, pp. 70–78.
[4] M. Bouissou, F. Bruyère and A. Rauzy, “BDD Based Fault-Tree Processing:
A Comparison of Variable Ordering Heuristics,” Proc. European Safety and
Reliability Association Conference (ESREC’97), 1997, C. Guedes Soares, ed.,
vol. 3, pp. 2045–2052.
[5] R. E. Bryant, “Graph-Based Algorithms for Boolean Function Manipulation,”
IEEE Trans. on Computers, 1986, vol. C-35, no. 8, pp. 677–691.
[6] K. M. Butler, D. E. Ross, R. Kapur and M. Ray Mercer, “Heuristics to Compute
Variable Orderings for Efficient Manipulation of Ordered Binary Decision
Diagrams,” Proc. 28th ACM/IEEE Design Automation Conference, 1991, pp.
417–420.
[7] J. Cunningham, “The Use and Evaluation of Yield Models in Integrated Circuit
Manufacturing,” IEEE Trans. on Semiconductor Manufacturing, 1990, vol. 3,
no. 2, pp. 60–71.
[8] M. Fujita, H. Fujisawa and N. Kawato, “Evaluation and Improvements of
Boolean Comparison Method Based on Binary Decision Diagrams,” Proc.
IEEE Int. Conf. on Computer Aided Design (ICCAD’88), 1988, pp. 2–5.
[9] M. Fujita, Y. Matsunaga and T. Kakuda, “On Variable Ordering of Binary
Decision Diagrams for the Application of Multi-level Logic Synthesis,” Proc.
IEEE European Conference on Design Automation (EDAC’91), 1991, pp.
50–54.
[10] M. Fujita, H. Fujisawa and Y. Matsunaga, “Variable Ordering Algorithms for
Ordered Binary Decision Diagrams and Their Evaluation,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 1,
January 1993, pp. 6–12.
Table 2. Size (number of nodes) of the ROMDDs used in the method for the evaluation of yield for [...], for the heuristics
wv, wvr, vw, vrw, t, w, and h for the ordering of the multiple-valued variables (— indicates that the method failed due to excessive
memory requirements).
benchmark               wv        wvr         vw        vrw          t          w          h
MS2, [...]           3,202      2,034      2,035     73,405      3,202      2,034      3,202
MS4, [...]          28,392     22,760     22,761    882,505     28,392     22,760     28,392
MS6, [...]         119,260    103,228    103,229  3,989,917    119,260    103,228    119,260
MS8, [...]         344,320    309,136    309,137          —    344,320    309,136    344,320
MS10, [...]        797,908    731,748    731,749          —    797,908    731,748    797,908
MS2, [...]          25,038      7,534      7,535          —     25,038      7,534     25,038
MS4, [...]       1,345,390          —          —          —  1,345,350    635,530  1,345,350
ESEN4x1, [...]       5,090      3,046      3,047    190,059      5,090      3,046      5,090
ESEN4x2, [...]      11,031      6,995      6,996    486,205     11,031      6,995     11,031
ESEN4x4, [...]      29,391     19,547     19,548  1,469,685     29,391     19,547     29,391
ESEN8x1, [...]     169,764    134,512    134,513          —    169,764    134,512    169,764
ESEN8x2, [...]     373,117    303,657    303,658          —    373,117    303,657    373,117
ESEN4x1, [...]      38,594     11,666     11,667          —     38,594     11,666     38,594
ESEN4x2, [...]      97,671     30,783     30,784          —     67,671     30,783     97,671
ESEN4x4, [...]     296,175     96,231     96,232          —          —     96,231          —
Table 3. Size (number of nodes) of the coded ROBDDs used in the method for the evaluation of yield for [...], for the
heuristic w for the ordering of the multiple-valued variables and the heuristics ml, lm and w for the ordering of the groups of binary
variables.
benchmark                ml          lm           w
MS2, [...]           24,237      28,418      28,418
MS4, [...]          243,254     236,915     236,915
MS6, [...]        1,120,255   1,290,274   1,290,274
MS8, [...]        3,154,056   3,283,401   3,283,401
MS10, [...]       7,954,261  10,019,092  10,019,092
MS2, [...]          361,428     439,700     439,700
MS4, [...]       11,885,214  11,492,704  11,492,704
ESEN4x1, [...]       19,338      20,721      20,721
ESEN4x2, [...]       54,705      65,208      65,208
ESEN4x4, [...]      184,332     283,338     283,338
ESEN8x1, [...]      904,777     972,506     972,506
ESEN8x2, [...]    2,244,340   2,796,165   2,796,165
ESEN4x1, [...]      105,511     109,692     109,692
ESEN4x2, [...]      378,686     414,939     414,939
ESEN4x4, [...]    1,513,441   2,117,587   2,117,587
Table 4. Performance of the method for the evaluation of yield for [...] with the heuristic w for ordering the multiple-valued
variables and the ordering ml for the groups of binary variables.
benchmark       CPU time (s)  ROBDD peak  coded ROBDD     ROMDD   yield
MS2, [...]              0.98      30,987       24,237     2,034   0.944
MS4, [...]              6.23     427,130      243,154    22,760   0.965
MS6, [...]              66.4   2,564,600    1,120,255   103,228   0.975
MS8, [...]             262.1   7,518,549    3,154,056   309,136   0.980
MS10, [...]            862.2  20,344,432    7,954,261   731,748   0.984
MS2, [...]              3.59     124,067      116,960     7,534   0.830
MS4, [...]             827.7  14,175,238   11,885,214   635,530   0.885
ESEN4x1, [...]          0.86      37,231       19,338     3,046   0.910
ESEN4x2, [...]          2.72     200,272       54,705     6,995   0.848
ESEN4x4, [...]         14.64     368,815      184,332    19,547   0.829
ESEN8x1, [...]        172.85   6,544,206      904,777   134,512   0.881
ESEN8x2, [...]        1060.7  29,926,091    2,244,340   303,657   0.835
ESEN4x1, [...]          3.47     143,633      105,511    11,666   0.756
ESEN4x2, [...]         18.34     757,529      378,686    30,783   0.642
ESEN4x4, [...]        108.52   3,027,309    1,513,441    96,231   0.605
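
The relation between the sizes of the coded ROBDD and the ROMDD can be checked directly from the figures of Table 4. The short sketch below recomputes the ratio for each row. The node counts are copied from the table; the suffixes `/1` and `/2` are hypothetical tags for the first and second appearance of a benchmark, since the parameter value that distinguishes them is not legible in the table.

```python
# (benchmark, coded ROBDD nodes, ROMDD nodes), copied from Table 4 in row
# order. "/1" and "/2" are hypothetical tags for the first and second
# appearance of a benchmark in the table.
TABLE_4 = [
    ("MS2/1", 24_237, 2_034),
    ("MS4/1", 243_154, 22_760),
    ("MS6/1", 1_120_255, 103_228),
    ("MS8/1", 3_154_056, 309_136),
    ("MS10/1", 7_954_261, 731_748),
    ("MS2/2", 116_960, 7_534),
    ("MS4/2", 11_885_214, 635_530),
    ("ESEN4x1/1", 19_338, 3_046),
    ("ESEN4x2/1", 54_705, 6_995),
    ("ESEN4x4/1", 184_332, 19_547),
    ("ESEN8x1/1", 904_777, 134_512),
    ("ESEN8x2/1", 2_244_340, 303_657),
    ("ESEN4x1/2", 105_511, 11_666),
    ("ESEN4x2/2", 378_686, 30_783),
    ("ESEN4x4/2", 1_513_441, 96_231),
]


def size_ratios(rows):
    """Map each benchmark to (coded ROBDD nodes) / (ROMDD nodes)."""
    return {name: robdd / romdd for name, robdd, romdd in rows}


if __name__ == "__main__":
    ratios = size_ratios(TABLE_4)
    for name, ratio in ratios.items():
        print(f"{name:10s} {ratio:5.1f}")
    print(f"min {min(ratios.values()):.1f}, max {max(ratios.values()):.1f}")
```

The ratios are of the order of 10 for every row, with some spread around that value across benchmarks.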
[11] I. Koren and D. K. Pradhan, “Yield and performance enhancement through
redundancy in VLSI and WSI multiprocessor systems,” Proceedings of the
IEEE, vol. 74, no. 5, May 1986, pp. 699-711.
[12] I. Koren and D. K. Pradhan, “Modeling the effect of redundancy on yield and
performance of VLSI systems,” IEEE Trans. on Computers, vol. C-36, no. 3,
March 1987, pp. 344-355.
[13] I. Koren, Z. Koren and D. K. Pradhan, “Designing Interconnection Buses
in VLSI and WSI for Maximum Yield and Minimum Delay,” IEEE J. of
Solid-State Circuits, 1988, vol. 23, no. 3, pp. 859–865.
[14] I. Koren and C. H. Stapper, “Yield Models for Defect-Tolerant VLSI Circuits:
A Review,” Defect and Fault Tolerance in VLSI Systems, vol. I (I. Koren, ed.),
Plenum, 1989, pp. 1–21.
[15] I. Koren, Z. Koren and C. H. Stapper, “A Unified Negative-Binomial
Distribution for Yield Analysis of Defect-Tolerant Circuits,” IEEE Trans. on
Computers, 1993, vol. 42, no. 6, pp. 724–734.
[16] I. Koren, Z. Koren and C. Stapper, “A Statistical Study of Defect Maps of Large
Area VLSI IC’s,” IEEE Trans. on Very Large Scale Integration Systems, 1994,
vol. 2, no. 2, pp. 249–256.
[17] I. Koren and Z. Koren, “Analysis of a Hybrid Defect-Tolerance Scheme for
High-Density Memory ICs,” Proc. IEEE Int. Symp. on Defect and Fault Tol-
erance in VLSI Systems, 1997, pp. 38–42.
[18] I. Koren and Z. Koren, “Defect Tolerance in VLSI Circuits: Techniques and
Yield Analysis,” Proceedings of the IEEE, 1998, vol. 86, no. 9, pp. 1819–1838.
[19] T. M. Mak, D. Bhattacharya, C. Prunty, B. Roeder, N. Ramadan, J. Ferguson,
and Y. Jianlin, “Cache RAM inductive fault analysis with fab defect modeling,”
Proc. IEEE Int. Test Conference, 1998, pp. 862–871.
[20] S. Malik, A. R. Wang, R. K. Brayton, A. Sangiovanni-Vincentelli, “Logic
Verification using Binary Decision Diagrams in a Logic Synthesis
Environment,” Proc. IEEE Int. Conf. on Computer-Aided Design (ICCAD’88), 1988,
pp. 6–9.
[21] C. Metra, S. Di Francescantonio, T. M. Mak, and B. Ricco, “Evaluation of
clock distribution networks’ most likely faults and produced defects,” Proc.
IEEE Int. Symp. on Defect and Fault Tolerance in VLSI Systems, 2001, pp.
357–365.
[22] F. J. Meyer and D. K. Pradhan, “Modeling Defect Spatial Distribution,” IEEE
Trans. on Computers, vol. 38, no. 4, April 1989, pp. 538–546.
[23] D. M. Miller and R. Drechsler, “Implementing a Multiple-Valued Decision
Diagram Package,” Proc. 28th IEEE Int. Symp. on Multiple-Valued Logic,
1998, pp. 52–57.
[24] D. M. Miller, private communication collecting the conclusions of the 30th
IEEE Int. Symp. on Multiple-Valued Logic, 2000.
[25] S. Minato, N. Ishiura and S. Yajima, “Shared binary decision diagram with
attributed edges for efficient Boolean function manipulation,” Proc. 27th
ACM/IEEE Design Automation Conference, 1990, pp. 52–57.
[26] M. Nikolskaïa, A. Rauzy and D. J. Sherman, “Almana: A BDD Minimization
Tool Integrating Heuristic and Rewriting Methods,” Proc. Int. Conf. on Formal
Methods in Computer Aided Design (FMCAD), 1998, pp. 100–114.
[27] D. Nikolos, H. T. Vergos, “On the Yield of VLSI Processors with On-Chip
CPU Cache,” IEEE Trans. on Computers, 1999, vol. 48, no. 10, pp. 1138–
1144.
[28] S. Rai and Y. C. Oh, “Tighter Bounds on Full Access Probability in Fault-
Tolerant Multistage Interconnection Networks,” IEEE Trans. on Parallel and
Distributed Systems, vol. 10, no. 3, March 1999, pp. 328–335.
[29] A. Srinivasan, T. Kam, S. Malik and R. K. Brayton, “Algorithms for Discrete
Function Manipulation,” Proc. IEEE Int. Conf. on Computer-Aided Design
(ICCAD-90), 1990, pp. 92–95.
[30] A. Venkataraman and I. Koren, “Determination of yield bounds prior to rout-
ing,” Proc. IEEE Int. Symp. on Defect and Fault-Tolerance in VLSI Systems,
1999, pp. 4–13.
[31] I. A. Wagner and I. Koren, “An Interactive VLSI CAD Tool for Yield
Estimation,” IEEE Trans. on Semiconductor Manufacturing, vol. 8, no. 2, May 1995,
pp. 130–138.
[32] J. Yu and F. J. Ferguson, “Maximum likelihood estimation for failure analysis,”
IEEE Trans. on Semiconductor Manufacturing, vol. 11, no. 4, November 1998,
pp. 681–691.
0-7695-1959-8/03 $17.00 (c) 2003 IEEE. Proceedings of the 2003 International Conference on Dependable Systems and Networks (DSN’03)
