An ROBDD-based combinatorial method for the evaluation of yield of defect-tolerant systems-on-chip by Carrasco, Juan A. & Suñé, Víctor
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 2, FEBRUARY 2009 207
An ROBDD-Based Combinatorial Method for
the Evaluation of Yield of Defect-Tolerant
Systems-on-Chip
Juan A. Carrasco, Senior Member, IEEE, and Víctor Suñé
Abstract—In this paper, we develop a combinatorial method for
the evaluation of the functional yield of defect-tolerant systems-on-
chip (SoC). The method assumes that random manufacturing de-
fects are produced according to a model in which defects cause the
failure of given components of the system following a distribution
common to all defects. The distribution of the number of defects is
arbitrary. The yield is obtained by conditioning on the number of
defects that result in the failure of some component and performing
recursive computations over a reduced ordered binary decision di-
agram (ROBDD) representation of the fault-tree function of the
system. The method has excellent error control. Numerical exper-
iments seem to indicate that the method is efficient and, with some
exceptions, allows the analysis with affordable computational re-
sources of systems with very large numbers of components.
Index Terms—Combinatorial method, defect-tolerant sys-
tems-on-chip (SoC), manufacturing defects, reduced ordered
binary decision diagram (ROBDD), yield.
I. INTRODUCTION
SYSTEMS-ON-CHIP (SoCs) represent a rapidly growing field in the electronic and computer industry [1]. Applications
include wireless systems [2], 3-D graphics systems [3],
reconfigurable systems [4], and others. The high densities and
areas of those integrated systems make them very susceptible
to random manufacturing defects [5]. In fact, complex SoCs are
likely to have a very small functional yield if they are not de-
signed with built-in defect-tolerance [6]. Here, we take the usual
definition of functional yield as the probability that a system
without parametric faults will not have some random defect pre-
venting the system from functioning properly. Clearly, there is
a need for efficient methodologies for estimating the functional
yield of complex defect-tolerant SoCs. When the defect-tolerant
SoC has a simple structure, it is often possible to make ad hoc
evaluations (see, for instance, [7]–[9]). However, given the trend
towards the use of a sophisticated network-on-chip as the communication
subsystem among the intellectual property cores (IPs) of
the SoC [10]–[12], it is foreseeable that many defect-tolerant designs
will not have such a simple structure. Evaluating the yield
of those defect-tolerant SoCs is far from a trivial task,
mainly because realistic defect models exhibit clustering
[13] and, thus, introduce dependencies among the failed
states of the components of the system (see, for instance, [9] and
[14]). A combinatorial method for the evaluation of the functional
yield of defect-tolerant SoCs has already been developed
[15]. However, the computational cost of that method is relatively
high and the method seems to be able to handle only defect-tolerant
SoCs with up to a few tens of components. The aim
of this paper is to develop a more efficient combinatorial method
for the evaluation of the functional yield of defect-tolerant SoCs
which can handle much more complex systems.
Manuscript received September 03, 2004; revised October 16, 2007. Current
version published January 14, 2009. This work was supported by the “Comisión
Internacional de Ciencia y Tecnología” (CICYT) of the Ministry of Science and
Technology of Spain under the research Grant TAP1999-0443-C05-05 and by
the CICYT and FEDER (“Fondo Europeo de Desarrollo Regional”) under the
research Grant DPI2004-05077.
The authors are with the Universitat Politècnica de Catalunya, Barcelona
08034, Spain (e-mail: carrasco@eel.upc.edu; sunye@eel.upc.edu).
Digital Object Identifier 10.1109/TVLSI.2008.2004479
We will assume that the defect-tolerant SoC is made up of
a set of $C$ components and that whether the system
is functioning or not is determined from the failed states of the
components through a fault-tree function $F(x_1, x_2, \ldots, x_C)$, where
variable $x_i$ takes value 1 if and only if component $i$ is failed
and the function takes value 1 if and only if the system is not
functioning. We will exclude the trivial cases $F = 0$
and $F = 1$. It will also be assumed that a gate-level
description of the fault-tree function is available.
The production of defects will be modeled using two sets
of probabilities: the probabilities $Q_k$, $k = 0, 1, 2, \ldots$, that the
number of random manufacturing defects in the area occupied
by the system is $k$, and the probabilities $P_i$, $1 \le i \le C$,
that a given defect causes the failure of component $i$. It will be
assumed that any given defect will result in the failure of any
given component of the system following the probabilities $P_i$,
$1 \le i \le C$, independently of the number of defects, of whether
the remaining defects cause a component failure or not, and of
which components the remaining defects affect. That model is
useful from the designer’s point of view, since the distribution
of the number of defects, $Q_k$, $k = 0, 1, 2, \ldots$, could be provided
by the manufacturer of the SoC and the probabilities $P_i$,
$1 \le i \le C$, could be estimated as follows. Let $A$, $A_i$, and $D_j$
be, respectively, the area of the system, the area of component $i$,
and the probability that a given defect is of type $j$. Then, we can
take $P_i = (A_i / A) \sum_j D_j K_{i,j}$, where $K_{i,j}$ [16] is the probability
that any given defect of type $j$ affecting component $i$ causes the
failure of the component and the sum is taken over all possible
defect types. Each probability $K_{i,j}$ could be estimated from the
distribution of the size of defects of type $j$ and the layout of
component $i$ using appropriate tools [17], [18].
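As an illustration of the estimate just described, the following minimal Python sketch computes the per-component failure probabilities from areas, defect-type probabilities, and kill probabilities. The function name, the argument names, and all numerical values are hypothetical and are not taken from the paper; defects are assumed to be uniformly distributed over the system area.

```python
def component_failure_probs(system_area, areas, type_probs, kill_probs):
    """Probability that a given defect causes the failure of each component.

    areas[i]         -- area of component i
    type_probs[j]    -- probability that a given defect is of type j
    kill_probs[i][j] -- probability that a type-j defect landing on
                        component i causes its failure
    Assumes defects are uniformly distributed over the system area.
    """
    probs = []
    for area, kills in zip(areas, kill_probs):
        hit = area / system_area  # the defect lands on this component
        lethal = sum(d * k for d, k in zip(type_probs, kills))
        probs.append(hit * lethal)
    return probs


# Hypothetical 3-component system with two defect types.
P = component_failure_probs(
    system_area=10.0,
    areas=[4.0, 3.0, 2.0],
    type_probs=[0.7, 0.3],
    kill_probs=[[0.5, 0.9], [0.4, 0.8], [0.6, 1.0]],
)
```

The probabilities returned need not add up to 1, since a defect may fall outside every component or fail to be lethal.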
The assumed model is consistent with all large-area clus-
tering compound Poisson models [9], [19]. Those models re-
sult by assuming that: 1) defect clusters are comparable in size
with the chip; 2) the number of defects in each chip follows a
Poisson distribution whose parameter is itself a random variable
with some distribution; and 3) defects are uniformly distributed
within the area of the chip; and include the widely-used nega-
tive binomial distribution model [20], [21].
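From a computational point of view, it is convenient to map
the previously described model into a model taking into account
only faults, i.e., defects that cause the failure of some component.
That model includes the probabilities
$Q'_k = P[\text{number of faults is } k]$, $k = 0, 1, 2, \ldots$,
$P'_i = P[\text{a given fault affects component } i]$, $1 \le i \le C$.
The mapping can be performed using
$Q'_k = \sum_{m=k}^{\infty} Q_m \binom{m}{k} P_L^k (1 - P_L)^{m-k}$, $P'_i = P_i / P_L$,
where $Q_m$ and $P_i$ are the defect-count and component-failure
probabilities of the previously described model, $C$ is the number of
components, and $P_L = \sum_{i=1}^{C} P_i$ is the probability that a given defect
causes a fault.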
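The mapping from defects to faults can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `defects_to_faults`, the assumption of a defect distribution with finite support, and the example numbers are ours, not the paper's.

```python
from math import comb


def defects_to_faults(Q, P):
    """Map a defect model (Q, P) to a fault model (Qf, Pf).

    Q[m] -- probability of m defects (finite support assumed here)
    P[i] -- probability that a given defect makes component i fail
    """
    p_fault = sum(P)  # probability that a given defect causes a fault
    # Each defect independently becomes a fault with probability p_fault.
    Qf = [
        sum(Q[m] * comb(m, k) * p_fault**k * (1 - p_fault) ** (m - k)
            for m in range(k, len(Q)))
        for k in range(len(Q))
    ]
    # Conditional on being a fault, the defect affects component i.
    Pf = [p / p_fault for p in P]
    return Qf, Pf


# Hypothetical defect model: at most two defects, two components.
Q = [0.5, 0.3, 0.2]
P = [0.1, 0.4]
Qf, Pf = defects_to_faults(Q, P)
```

In this example a defect is a fault with probability 0.5, so the fault distribution puts more mass on small counts than the defect distribution does.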
As previously mentioned, the negative binomial distribution
is the most widely used distribution for the number of defects in
a chip. That distribution has the form
$Q_k = \frac{\Gamma(\alpha + k)}{k!\,\Gamma(\alpha)} \frac{(\lambda/\alpha)^k}{(1 + \lambda/\alpha)^{\alpha + k}}$   (1)
where $\lambda$ is the expected number of defects and $\alpha$ is the clustering
parameter (the clustering increases for decreasing $\alpha$). It
is known (see [22]) that, when the distribution of the number
of defects is negative binomial, the distribution of the number
of faults is also negative binomial with the same clustering parameter.
More precisely, when the distribution of the number of
defects is given by (1), the distribution of the number of faults is
$Q'_k = \frac{\Gamma(\alpha + k)}{k!\,\Gamma(\alpha)} \frac{(\lambda'/\alpha)^k}{(1 + \lambda'/\alpha)^{\alpha + k}}$
with $\lambda' = P_L \lambda$, $P_L$ being the probability that a given defect
causes a fault. A similar result holds for any large-area clustering
compound Poisson model [9].
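The closure of the negative binomial distribution under the defect-to-fault mapping can be checked numerically. The sketch below is an illustration under assumed parameter values (expected number of defects 2, clustering parameter 0.5, fault probability 0.5); the helper names and the truncation of the defect distribution at 400 are our own choices.

```python
from math import comb, exp, lgamma, log


def neg_binomial_pmf(k, lam, alpha):
    """P[N = k] for a negative binomial with mean lam and clustering alpha."""
    log_p = (lgamma(alpha + k) - lgamma(k + 1) - lgamma(alpha)
             + k * log(lam / alpha) - (alpha + k) * log(1 + lam / alpha))
    return exp(log_p)


lam, alpha, p_fault = 2.0, 0.5, 0.5   # hypothetical values
m_max = 400  # truncation of the defect distribution for this check only


def thinned(k):
    """P[k faults], obtained by thinning the defect distribution."""
    return sum(neg_binomial_pmf(m, lam, alpha) * comb(m, k)
               * p_fault**k * (1 - p_fault) ** (m - k)
               for m in range(k, m_max + 1))


# The thinned distribution should match a negative binomial with the
# same clustering parameter and mean lam * p_fault.
max_gap = max(abs(thinned(k) - neg_binomial_pmf(k, lam * p_fault, alpha))
              for k in range(6))
```

The value of `max_gap` should be at the level of floating-point rounding, which is consistent with the result quoted from [22].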
The rest of the paper is organized as follows. Section II
develops and describes the method. The method will require
the construction of a reduced ordered binary decision diagram
(ROBDD) [23] representation of the fault-tree function of
the system. Section III analyzes the computational cost of
the method using scalable benchmarks and discusses how the
method compares to alternatives. Finally, Section IV presents
some conclusions. The Appendix includes the proofs of the
results on which the method is based.
II. METHOD
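Let $Y$ denote the functional yield of the system and let $Y_k$ be
the functional yield conditioned on the system being affected by
$k$ faults. Using the law of total probability,
$Y = \sum_{k=0}^{\infty} Q'_k Y_k$,
where $Q'_k$ denotes the probability that the number of faults is $k$.
The proposed method estimates $Y$ with absolute error bounded from above
by truncating the summation to $k \le M$ and obtaining
estimates for $Y_k$, $0 \le k \le M$, with absolute error bounded from above.
Let $\varepsilon$ be an error control parameter, let
$M = \min \{ m \ge 0 : \sum_{k=m+1}^{\infty} Q'_k \le \varepsilon/2 \}$   (2)
and assume that an estimate $\tilde{Y}_k$ for $Y_k$ is available satisfying
$|Y_k - \tilde{Y}_k| \le \varepsilon/2$, $0 \le k \le M$. Then, the method estimates $Y$ with absolute
error $\le \varepsilon$ by
$\tilde{Y} = \sum_{k=0}^{M} Q'_k \tilde{Y}_k$.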
The estimates are computed with the help of an ROBDD
representation of . Let denote such a repre-
sentation for a given ordering of variables. We will start by ex-
pressing the in terms of some conditional probabilities asso-
ciated with the root node of and deriving recursive expres-
sions for the conditional probabilities associated with non-ter-
minal nodes of . After that, we will show how the used
by the method are computed by approximating these recursive
expressions with absolute error bounded from above.
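The truncation of the law of total probability can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the even split of the error budget between the tail of the fault distribution and the conditional yields, and the toy system are assumptions made here.

```python
from math import exp, factorial


def truncation_point(Qf, tail_bound):
    """Smallest M such that the probability of more than M faults is at
    most tail_bound. Qf is a function k -> P[number of faults is k]."""
    M, head = 0, Qf(0)
    while 1.0 - head > tail_bound:
        M += 1
        head += Qf(M)
    return M


def estimate_yield(Qf, Yk, eps):
    """Estimate the yield as a truncated sum of Qf(k) * Yk(k).

    Half of the error budget eps is spent on the truncated tail; the other
    half is left for the error of the conditional yields Yk(k)."""
    M = truncation_point(Qf, eps / 2)
    return sum(Qf(k) * Yk(k) for k in range(M + 1)), M


# Hypothetical example: Poisson faults (mean 1) on a two-component system
# that fails only when both components are hit. Each fault picks a
# component with probability 1/2, so Y_0 = 1 and Y_k = 2 * (1/2)**k.
mu = 1.0
Qf = lambda k: exp(-mu) * mu**k / factorial(k)
Yk = lambda k: 1.0 if k == 0 else 2.0 * 0.5**k
estimate, M = estimate_yield(Qf, Yk, eps=1e-6)
```

For this toy system the exact yield is 2e^{-1/2} − e^{-1}, so the estimate can be checked directly. For extremely small error budgets the accumulated rounding error in `head` would have to be taken into account.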
A. Exact
First, we introduce some notation which will be used
throughout this paper.
, “0” and “1” terminal nodes of .
Set of non-terminal nodes of .
Root node of .
, 0- and 1-successor of node , .
Boolean function represented by node ,
.
Component such that variable is
associated with node , .
If , set of components such that
variable is located not before in the





Binary random variable with value 1 if and
only if component , is affected
by some fault.
Conditional probability that a fault affects
some component in set given that it affects
some component in set , :
. We will use
the shorthand for .
Number of elements in set .
Indicator function: 1 if condition is satisfied;
0 otherwise.
Binomial probability mass function:
.
Let , , be the probability that, given that
the set of components is affected by faults, the function
, with replaced by , , has value 1¹. Clearly
The , can be obtained by processing recursively
using recursive expressions for , , .
To obtain these recursive expressions we exploit the “structure”
of . Several cases will have to be considered. Two of them
will be discussed in detail next. The recursive expressions for
all possible cases will be given later in the form of a theorem.
The first case is . In that case, occupies
the last position in the ordering of variables and both and
are terminal nodes. We will obtain first for
and next for . For (no fault affects component
), variable will take the value 0 with probability 1.
But, when , is reduced to the function .
It follows that is equal to the probability that the function
takes the value 1. Since is a terminal node, this
function is the constant function and, then,
. For , all faults will affect component .
Then, with probability 1 variable will take the value 1
and will be reduced to the constant function
. Therefore, for , .
The second case is , . In
words, neither nor are terminal nodes and in the or-
dering of variables, both and are located right
after , implying . Reasoning as we did
in the previously considered case for , is equal to
the probability that the function takes the value 1. But
not being a terminal node, that probability is and,
then, . The expression for is obtained
by conditioning on the number of faults affecting component
from the faults affecting components in . The condi-
tional probability that a fault affects component given that
it affects components in is . In addition, in this
case, . Then, for ,
the faults will be distributed randomly among and the
components in following a binomial prob-
ability distribution with parameters and . Then, the
probability that no fault affects component is .
With that probability, variable will take the value 0,
will be reduced to , and the faults will affect compo-
nents in , implying that will include the contri-
bution . The probability that ,
faults affect component is . With that proba-
bility, variable will take the value 1, will be reduced
to , and the remaining faults will affect components
in , implying that will include the contributions
, . Putting everything together,
in the case , , we have, for
¹For the remainder of this paper, we will refer to the probability that a function
with its binary variables replaced by the corresponding binary random
variables has value 1 simply as the probability that the function has value 1.
The following theorem gives the complete set of recursive
expressions for , , .


















Cases a and b have already been proven; Cases c–j are proven
in the appendix.
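The conditioning argument used in the two cases discussed above can be illustrated with a small exact computation. The sketch below is a simplified formulation, not the ten-case recursion of Theorem 1: it walks every position of the variable ordering explicitly, treating a skipped variable as a component whose state is irrelevant, and it applies no truncation. The node representation (tuples `(var, low, high)` with integer terminals), the function names, and the example values are assumptions made for illustration.

```python
from functools import lru_cache
from itertools import product
from math import comb


def binom_pmf(n, l, p):
    return comb(n, l) * p**l * (1 - p) ** (n - l)


def conditional_yields(root, Pf, k_max):
    """Yield conditioned on k faults, k = 0..k_max.

    Terminals are the ints 0 and 1; a non-terminal node is a tuple
    (var, low, high), where var is the position of its component in the
    variable ordering. Pf[i] is the probability that a given fault affects
    component i (all positive, summing to 1).
    """
    @lru_cache(maxsize=None)
    def prob_one(level, node, k):
        # P[function of node is 1 | k faults hit components level, level+1, ...]
        if node in (0, 1):
            return float(node)
        p = Pf[level] / sum(Pf[level:])  # a fault hits component `level`
        if node[0] > level:  # skipped variable: the component is irrelevant
            return sum(binom_pmf(k, l, p) * prob_one(level + 1, node, k - l)
                       for l in range(k + 1))
        _, low, high = node
        total = binom_pmf(k, 0, p) * prob_one(level + 1, low, k)
        total += sum(binom_pmf(k, l, p) * prob_one(level + 1, high, k - l)
                     for l in range(1, k + 1))
        return total

    return [1.0 - prob_one(0, root, k) for k in range(k_max + 1)]


def brute_force_yield(F, Pf, k):
    """Reference value: enumerate where each of the k faults lands."""
    y = 0.0
    for hits in product(range(len(Pf)), repeat=k):
        weight = 1.0
        for i in hits:
            weight *= Pf[i]
        failed = [int(i in hits) for i in range(len(Pf))]
        if not F(failed):
            y += weight
    return y


# Hypothetical example: the system fails when at least two of its three
# components fail.
x2 = (2, 0, 1)
root = (0, (1, 0, x2), (1, x2, 1))
Pf = [0.5, 0.3, 0.2]
Y = conditional_yields(root, Pf, k_max=4)
```

With two faults, the example system survives only if both faults hit the same component, which gives 0.5² + 0.3² + 0.2² = 0.38. The brute-force enumerator is included only as a cross-check for small cases.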
B. Computation of
We could choose the truncation parameter so that the left
part is (i.e., using (2) with replaced by ), obtain ,
by processing bottom-up, using the recursive
expressions given by Theorem 1, and estimate the functional
yield of the system with absolute error bounded from above by
as , . However, that trivial method
has a computational cost per node, and the total cost
of the bottom-up processing of can be relatively large if is
large. In order to reduce that computational cost, in the proposed
method, as discussed, is chosen so that the left part is ,
estimates for , with absolute error bounded
from above by are obtained, and using them, estimates
for , with absolute error bounded from above by
are obtained, yielding an estimate for the functional yield
with absolute error bounded from above by . The
estimates for , will be obtained by processing
bottom-up, using approximate versions of the recursive
expressions of Theorem 1, in which the summations have
been truncated. The truncations take advantage of the fact that
a binomial random variable with $n$ trials and small “success”
probability has, for large $n$, a number of successes much
smaller than $n$ with high probability. For large , the bottom-up
processing of will account for a significant portion of the
computational cost of the trivial method, the truncations of the
summations will tend to be significant, and the computational
cost of the method will tend to be significantly smaller than that of
the trivial method.
The following theorem shows how the summations involved
in Theorem 1 can be truncated so that , are estimated
with controlled absolute error. The theorem does not
specify the truncation points; it only gives conditions which the
truncation points have to satisfy. Efficient procedures for the selection
of truncation points minimizing the number of terms in
the summations will be described after the theorem.
Theorem 2: Let , . For , let ,
be defined as follows.
Case a) :
Case b) :
where, if , then , , and
otherwise , , are integers satisfying
Case c)
:
where , are as in Case b with replaced by ,
and , , are integers satisfying
Case d) :
where , are as in Case b.
Case e)
:
where, if , then , , and




where , are as in Case e with replaced by and
, are as in Case c.
Case h)
:
where , are as in Case e.
Case i)
:
where , are as in Case c with replaced by .
Case j) :
Then
Proof: See the appendix.
We discuss next procedures for selecting the truncation points
of the summations involved in the previous theorem and computing
the truncated summations. In Cases b, c, and d of the
theorem, we have to select and and compute
with , , satisfying
for some , . In Cases c, g, and i, we
have to select and and compute
with , , satisfying
for some , . Finally, with ,
, and , in Cases e,
g, and h, we have to select and and compute
with , , satisfying
for some , . We note that,
in the latter case, such integers exist because, since
,
.
Let denote the largest integer . The procedure for se-
lecting and and computing is based on the fact that as
goes from 0 to , the terms , first increase mono-
tonically and next decrease monotonically, reaching their largest
value at , except when is an integer, in
which case the largest value is achieved at both
and [24]. The procedure is as follows. Starting
with , we compute
and for decreasing as long as
and either or, being , and
. After that, and are updated for increasing
as long as and either , or,
being , and . From that
point on, we continue updating and for decreasing and
increasing alternatively in a similar way until becomes
or , is reached.
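The idea behind this procedure, namely growing a window of terms outwards from the mode of the binomial distribution until the discarded probability mass is small enough, can be sketched as follows. The sketch is a simplified variant and not the authors' exact stopping rule: it greedily adds whichever neighbouring term is larger, which for a unimodal sequence yields a window with as few terms as possible. The function names and the example parameters are hypothetical.

```python
from math import comb, floor


def binom_pmf(n, l, p):
    return comb(n, l) * p**l * (1 - p) ** (n - l)


def truncation_window(n, p, delta):
    """Return (lo, hi, mass): a contiguous index range around the binomial
    mode whose terms add up to mass >= 1 - delta (or the full range)."""
    mode = min(n, floor((n + 1) * p))  # a largest term of the binomial
    lo = hi = mode
    mass = binom_pmf(n, mode, p)
    while 1.0 - mass > delta and (lo > 0 or hi < n):
        below = binom_pmf(n, lo - 1, p) if lo > 0 else -1.0
        above = binom_pmf(n, hi + 1, p) if hi < n else -1.0
        if below >= above:  # extend towards the larger neighbouring term
            lo -= 1
            mass += below
        else:
            hi += 1
            mass += above
    return lo, hi, mass


# Hypothetical example: 500 faults, each hitting a given component with
# probability 0.01; only a few dozen of the 501 terms are needed.
lo, hi, mass = truncation_window(500, 0.01, 1e-9)
```

A production implementation would update the terms incrementally from their neighbours rather than recomputing each binomial coefficient, and would guard against overflow for very large numbers of faults.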
The procedure for selecting and and computing is
similar. Starting with , we
compute and for decreasing as long as and
either or, being , and
. After that, and are updated for increasing
as long as and either , or, being ,
and . From that point on,
we continue updating and for decreasing and increasing
alternatively in a similar way until becomes or
, is reached.
Let denote the smallest integer . The procedure for
selecting and and computing is based on the following
lemma.
Lemma 1: Assume and , and let
, , and
, , . Then, the terms
increase monotonically as goes from 1 to and
decrease monotonically as goes from to .
Proof: See the appendix.
The procedure, which uses the fact that, since ,
, is as follows.
Starting with , we compute
and for decreasing as long as
and either , or, being , and
. After that, and are updated for increasing
as long as and either or,
being , and . From that
point on, we continue updating and for decreasing and
increasing alternatively in a similar way until becomes
or , is reached.
We can now define the , used in the method
and how they are computed. The are
The estimates , are computed by traversing
bottom-up and using the recursive expressions given in
Theorem 2 with . The truncation
points and the summations are computed using the procedures
just described. Since , we have, by Theorem 2
as required.
Remark: For efficiency reasons, ROBDD packages typically
use ROBDDs with complement edges [25]. In those ROBDDs,
a complement edge leads to a node representing the comple-
ment of the function obtained by setting the variable associated
with the node to the value (0 or 1) associated with the edge.
In addition, the top node may represent the complement of the
function. The proposed method for computing the yield can be
easily adapted to ROBDDs with complement edges. It suffices
to set in case the top node represents the comple-
ment of the function and, in the recursive expressions for
of Theorem 2, use the complements to 1 of the values associ-
ated with the nodes reached by a complement edge, e.g., for the
node , use instead of if is a
non-terminal node and instead of if
is a terminal node, in case the 0-edge is a complement edge.
III. ANALYSIS
A. Benchmarks
We describe next the benchmarks that were used to analyze
the computational cost of the proposed method. The bench-
marks are instances of three scalable SoC examples. In all
benchmarks, the number of defects is assumed to follow a
negative binomial distribution with and the probabilities
are taken so that , implying that the
probability of any given defect causing a fault is 0.5 and that
the expected number of faults is .
As previously said, there exists a trend in the SoC commu-
nity towards the use of a network-on-chip as a communication
subsystem among the IPs. Examples of such networks include
Nostrum [26], SoCBUS [27], and SoCIN [28], which use a reg-
ular 2-D mesh topology, and SPIN [29], which uses a fat-tree
topology. Then, as our first scalable example we have chosen a
defect-tolerant SoC with a network-on-chip with a regular 2-D
mesh topology. The system, called MESH , includes
groups of IPs , interconnected by a
mesh made up of switches S. The architecture of
MESH is illustrated in Fig. 1 for the case , .
The system is functioning if unfailed IPs of every
group can communicate through the mesh with un-
failed IPs of every other group. It is assumed that links are not
affected by defects. Thus, the system can be conceptualized as
made up of only ’s and S’s. The probabilities are taken
so that, calling the probability of an and the
probability of an S, all are equal and .
The second scalable example is the defect-tolerant SoC FAT-
TREE with the architecture described in Fig. 2 for the
case , .
Fig. 1. Architecture of defect-tolerant SoC MESH (2,2).
Fig. 2. Architecture of defect-tolerant SoC FAT-TREE (2,3).
The system includes IPs IPA and IPs IPB interconnected
by a -ary, -tree fat-tree [30]. The system is functioning
if unfailed IPAs can communicate through the
fat-tree with unfailed IPBs. It is assumed that links are
not affected by defects. Thus, the system can be conceptualized
as made up of only IPAs, IPBs, and S’s. The probabilities are
taken so that, calling the probability of an IPA,
the probability of an IPB, and the probability of an S,
and .
The last scalable example is the defect-tolerant SoC ESEN
) with the architecture described in Fig. 3 for the case
, , .
The system includes groups of IPs ,
interconnected by an ESEN multiexchange interconnec-
tion network with inputs [31], through concentrators C if
, in which each switching element (S) of the first and
last stage have a redundant copy. The system is functioning if
unfailed IPs of each group can communicate
with unfailed IPs of every other group through
the ESEN network. It is assumed that links are not affected by
defects. Thus, the system can be conceptualized as made up of
only ’s, S’s, and, if , C’s. The probabilities are
taken so that, calling the probability of an ,
the probability of an S, and the probability of a C, all
are equal, , and .
Fig. 3. Architecture of defect-tolerant SoC ESEN (8, 8   2).
TABLE I
CHARACTERISTICS OF THE BENCHMARKS
Table I gives the number of components of the instances of the
scalable examples used as benchmarks in the experiments and
the numbers of gates and edges of the fault-trees which were
used to specify the fault-tree functions. Those fault-trees were
generated automatically using code specific for every scalable
example.
B. Results
All results were obtained on a workstation equipped with four
Dual-Core AMD Opteron processor chips at 2.2 GHz and 32
GB of main memory, using only one core and limiting memory
consumption to 4 GB. To build the ROBDD representations, we
used the CUDD package [25], which constructs ROBDDs with
complement 0-edges. The order of the variables was chosen
prior to building the ROBDDs using the heuristic weight de-
scribed in [32] for the MESH and ESEN ) bench-
marks, and the heuristic H4 described in [33] for the FAT-TREE
benchmarks.
Fig. 4. Functional yield as a function of the expected number of defects.
Fig. 5. CPU times with as a function of the expected number of defects.
We start by showing that the benchmarks cover a wide range
of design scenarios and that the proposed method can be applied
to very complex SoCs for a wide range of values of the
expected number of defects $\lambda$. Fig. 4 plots the functional yield
of the benchmarks as a function of $\lambda$. We can note that the
benchmarks cover a wide range of dependencies of the functional
yield with respect to $\lambda$. Fig. 5 gives the CPU times consumed
by the method with an error requirement as a
function of $\lambda$. As can be seen, the method is able to compute
the functional yield with a tight error requirement for very complex
systems for $\lambda$ as large as 200 in reasonable CPU times, the
only exception being the FAT-TREE (4,3) benchmark which has
only a moderate number of components (112). An attempt to apply
the method to a FAT-TREE benchmark with a larger number of
components resulted in the failure of the method due to excessive
memory consumption. A behavior even worse than that of
the FAT-TREE example is possible, since the size of the ROBDD
is, in the worst case, exponential in the number of variables of
the represented function [34].
Table II analyzes the computational cost of the method in
more detail. We give the size (number of nodes) of the ROBDD
representation of the fault-tree function, the memory consump-
tion, the total CPU time (tot), and the CPU time of the bottom-up
processing of the ROBDD (proc). In all cases the computa-
tional cost both in terms of memory and CPU time is moderate.
In addition, the CPU time of the bottom-up processing of the
ROBDD is practically negligible up to and afterwards
seems to increase relatively slowly with for almost all the
benchmarks.
TABLE II
COMPUTATIONAL COST OF THE METHOD WITH
Fig. 6. CPU times for as a function of the error requirement .
We analyze next the impact of the error requirement on the
CPU time. Fig. 6 shows the CPU times for as a function
of . As it can be seen, the CPU time increases moderately with
the error requirement.
Finally, we analyze the extent to which the truncation of the
summations of the recursive expressions for , reduces
the CPU time of the bottom-up processing of the ROBDD
and the CPU time of the method for large values of . Table III
compares the CPU times of the bottom-up processing of the
ROBDD (proc) and the total CPU times (tot) for the proposed
method and for the trivial method (without truncation of summations)
for and an error requirement .
The proposed method has significantly smaller CPU times than
the trivial method, with a reduction factor ranging from 2 to 21
for the CPU times of the bottom-up processing of the ROBDD
and from 2 to 15 for the total CPU times. The savings are significantly
smaller for the FAT-TREE benchmark. This has to
do with the fact that the benchmark has a significantly smaller
number of components, causing the “success” probability of the
binomial distributions appearing in the summations to be significantly
larger, and causing the truncations of those summations
to be less significant. The value of the truncation parameter
was 585 for the proposed method and 559 for the trivial
method.
TABLE III
IMPACT OF THE TRUNCATION OF THE SUMMATIONS
Summarizing, the proposed method seems, with some excep-
tions, to be able to process with moderate computational cost
defect-tolerant systems with very large numbers of components
for a wide range of values of and .
C. Discussion of Alternatives
There seem to be only three immediate alternatives to the pro-
posed method: the combinatorial method described in [15], the
so-called “compounding technique” (see, e.g., [13]), and simu-
lation. The compounding technique can only be used with com-
pound Poisson models.
The method described in [15] uses reduced ordered multi-
valued decision diagrams. However, experiments in [15] indi-
cated that that method has a much larger computational cost
than the method proposed in this paper and is able to handle
efficiently only SoCs with up to a few tens of components. As
a confirmation, the method was not able to analyze any of the
five benchmarks considered in this paper with when run
with an error requirement and a memory limitation of
4 GB. In all cases, the method failed due to excessive memory
consumption.
The compounding technique exploits the fact that any com-
pound Poisson model can be interpreted as a Poisson model in
which the parameter is a random variable with some distribu-
tion, and that components are statistically independent when the
number of faults follows a Poisson distribution. Then, the yield
of the system can be computed by combining: 1) a standard
ROBDD-based combinatorial method [35] to compute the yield
of the system conditioned on the parameter of the Poisson distribution
having a specified value and 2) a numerical integration
method requiring only values of the function to be integrated
to obtain the yield from values of the product of the conditional
yield and the probability density function of the parameter of the
Poisson distribution. That alternative requires the construction
of the ROBDD representation of the fault-tree function of the
system exactly as the method proposed in this paper does. In addition,
it requires potentially many traversals of the ROBDD
with a constant cost per node in terms of CPU time consumption.
However, the method does not provide rigorous error control,
as the method proposed in the paper does. On the other hand,
the method proposed in the paper requires only one traversal of
the ROBDD but with a cost per node that depends on and ,
and has a potentially larger memory requirement resulting from
the need of storing floating-point variables per node of
the ROBDD. To compare with our method, we implemented the
compounding technique using the well-known Quadpack nu-
merical integration package [36] to perform the numerical in-
tegration and ran it on the five benchmarks with and
a target absolute error . In terms of memory con-
sumption, the alternative was cheaper in all cases. Thus, for the
MESH (8,32) benchmark it required 81.6 MB versus the 113
MB required by our method; for the MESH (32,32) benchmark,
112 MB versus 142 MB; for the FAT-TREE (4,3) benchmark,
12.7 MB versus 13.0 MB; for the ESEN (128, 32 32) bench-
mark, 57.3 MB versus 78.1 MB; and for the ESEN (512, 32
32) benchmark, 48.9 MB versus 69.6 MB. With regard to CPU
time, our method was slightly faster: 5.65 s versus 6.24 s for
the first benchmark, 2.90 versus 3.96 s for the second, 0.092
versus 0.116 s for the third, 6.71 versus 8.72 s for the fourth,
and 5.77 versus 7.46 s for the fifth. In summary, the price paid
in the proposed method in terms of computational cost seems to
be moderate and justified by the strict error control it provides.
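The compounding technique can be illustrated on a toy case. In the sketch below, the negative binomial model is treated as a Poisson model whose parameter follows a gamma distribution, the conditional yield is computed using the independence of components under a Poisson model, and the integral is evaluated with a composite Simpson rule. Several things here are assumptions made for illustration: a two-component parallel system stands in for the ROBDD-based conditional yield computation, Simpson's rule replaces the Quadpack routines used in the experiments, the clustering parameter is taken to be at least 1 so that the gamma density is bounded at the origin, and all names and values are hypothetical.

```python
from math import exp, gamma


def gamma_pdf(x, shape, scale):
    return x ** (shape - 1) * exp(-x / scale) / (gamma(shape) * scale**shape)


def conditional_yield(rate, P):
    """Yield of a two-component parallel system (it fails only if both
    components fail) when the number of defects is Poisson(rate);
    components then fail independently."""
    fail = [1.0 - exp(-rate * p) for p in P]
    return 1.0 - fail[0] * fail[1]


def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def compounded_yield(lam, alpha, P, upper=60.0, n=6000):
    scale = lam / alpha  # gamma mixing density with mean lam
    return simpson(
        lambda x: conditional_yield(x, P) * gamma_pdf(x, alpha, scale),
        0.0, upper, n)


lam, alpha, P = 2.0, 2.0, [0.3, 0.2]  # hypothetical values
y_numeric = compounded_yield(lam, alpha, P)

# Closed form for this toy system, from the Laplace transform of the gamma.
laplace = lambda s: (1 + lam * s / alpha) ** (-alpha)
y_exact = laplace(P[0]) + laplace(P[1]) - laplace(P[0] + P[1])
```

The fixed integration grid used here gives no a priori guarantee on the error, which mirrors the lack of rigorous error control noted above for this alternative.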
Simulation has clearly a smaller memory consumption than
our method. For practical purposes, that consumption is reduced
to the memory required to hold the description of the fault-tree.
On the other hand, it suffers from poor error control (simulation
offers just a confidence interval for the estimate which is known
to succeed with a given probability, assuming that the estimation
of the variance by the sample variance is accurate) and is poten-
tially very time-consuming if the yield is neither close to 0% nor
close to 100% and has to be estimated with high accuracy. To
analyze the second issue, we built an efficient simulator and ran
it on the five benchmarks with . The simulator used the
fault-tree to determine the functioning state of the SoC. Table IV
gives the computational cost of the simulation with a target 95%
confidence interval of , , and . In compar-
ison with our method (see Table II), we note that, as expected,
in all cases simulation has a significantly smaller memory con-
sumption. However, the CPU times are large, particularly when
the yield to be estimated is not too close to 100% nor to 0%
(see Fig. 4), and increase sharply with the reciprocal of the re-
quired half-width of the confidence interval. In summary, due to
its poor error control, simulation seems to be an alternative to
be used only when our method fails due to excessive memory
consumption or excessive CPU time, and in that case with a not
too tight accuracy requirement.
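A simulator of the kind discussed here can be sketched in a few lines. The sketch is not the simulator used in the experiments: it assumes a negative binomial number of defects sampled as a gamma-mixed Poisson, uses a simple Poisson sampler that is adequate only for small means, and relies on the normal approximation for the confidence interval. The function names, the toy fault-tree function, and the parameter values are hypothetical.

```python
import random
from math import exp, sqrt


def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, prod = exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_yield(F, P, lam, alpha, runs, seed=0):
    """Monte Carlo yield estimate with a 95% confidence half-width.

    F(failed) returns True when the system is not functioning;
    P[i] is the probability that a given defect fails component i."""
    rng = random.Random(seed)
    outcomes = list(range(len(P))) + [None]  # None: defect causes no fault
    weights = list(P) + [1.0 - sum(P)]
    good = 0
    for _ in range(runs):
        rate = rng.gammavariate(alpha, lam / alpha)  # models clustering
        n_defects = sample_poisson(rng, rate)
        failed = [False] * len(P)
        for hit in rng.choices(outcomes, weights, k=n_defects):
            if hit is not None:
                failed[hit] = True
        good += not F(failed)
    y = good / runs
    half_width = 1.96 * sqrt(y * (1 - y) / runs)
    return y, half_width


# Hypothetical two-component parallel system: fails only if both fail.
parallel_fails = lambda failed: failed[0] and failed[1]
estimate, half_width = simulate_yield(
    parallel_fails, [0.3, 0.2], lam=2.0, alpha=2.0, runs=20000, seed=1)
```

Since the half-width shrinks with the square root of the number of runs, halving it requires about four times as many runs, which is why the simulation times grow sharply as the accuracy requirement is tightened.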
TABLE IV
COMPUTATIONAL COST OF SIMULATION WITH A TARGET 95% CONFIDENCE
INTERVAL OF    ,    , AND   
IV. CONCLUSION
In this paper, we have developed a new combinatorial method
for the evaluation of the functional yield of defect-tolerant SoCs.
The inputs of the method are the distribution of the number
of random manufacturing defects in the area occupied by the
system, which can be provided by the manufacturer of the SoC,
and, for each component making up the system, the probability
that a given defect causes the failure of the component, which
can be estimated from data of the manufacturing process and
the layout of the components using existing tools. The method
requires the construction of an ROBDD representation of the
fault-tree function of the system and provides rigorous error
control. Our experiments seem to indicate that, with some
exceptions, the method allows the analysis with affordable
computational resources of defect-tolerant SoCs with very large
numbers of components for a wide range of values of the expected
number of defects and the error requirement.
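To make the roles of the two inputs concrete, the following Python sketch evaluates the yield of a very small system by exhaustive enumeration. It is not the ROBDD-based method of this paper and scales exponentially with the number of defects; it only illustrates the defect model under the assumptions that defects act independently, that each defect makes at most one component fail, and that the distribution of the number of defects has been truncated to a finite list. All names (`yield_brute_force`, `defect_pmf`, `hit_prob`, `system_fails`) and the example system are hypothetical.

```python
from itertools import product
from math import exp, factorial


def yield_brute_force(defect_pmf, hit_prob, system_fails):
    """Yield of a tiny system by conditioning on the number of defects.

    defect_pmf[k]   : probability of exactly k defects (truncated list)
    hit_prob[i]     : probability that a given defect makes component i fail
    system_fails(f) : fault-tree function; f[i] is True if component i failed
    """
    n = len(hit_prob)
    # Outcome index n stands for "the defect makes no component fail".
    outcome_prob = list(hit_prob) + [1.0 - sum(hit_prob)]
    total = 0.0
    for k, q_k in enumerate(defect_pmf):
        working = 0.0
        # Enumerate every way the k defects can land on the n + 1 outcomes.
        for hits in product(range(n + 1), repeat=k):
            p = 1.0
            for h in hits:
                p *= outcome_prob[h]
            failed = tuple(i in hits for i in range(n))
            if not system_fails(failed):
                working += p
        total += q_k * working
    return total


# Hypothetical example: two redundant cores (0 and 1) and one controller (2).
def example_fault_tree(failed):
    return failed[2] or (failed[0] and failed[1])


# Poisson distribution with mean 1, truncated at 6 defects. The truncated
# probabilities sum to slightly less than 1, so the result is a lower bound.
pmf = [exp(-1.0) / factorial(k) for k in range(7)]
print(yield_brute_force(pmf, [0.2, 0.2, 0.1], example_fault_tree))
```

The outer loop mirrors the conditioning on the number of defects; the inner enumeration is the part that a compact representation of the fault-tree function is meant to avoid.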
APPENDIX
A. Proof of Cases c–j of Theorem 1
Let be the number of faults affecting components in .
We will first justify the expressions for and next for
. For , with probability 1 variable will
take the value 0 and will be reduced to the function
. It follows that is equal to the probability that
the function takes the value 1. In Cases c, e, f, g, and i,
is not a terminal node, implying that .
In Cases d, h, and j, is a terminal node. Then, in those
cases, is the constant function and, therefore,
. The expressions for are justified next
on a case by case basis.
Case c) In this case, neither nor is a terminal node,
includes components different from not in
(the components in ), and does not
include components different from not in .
The expression is obtained by conditioning on the number
of faults affecting component from the faults af-
fecting components in . The probability that no fault
affects component is . With that proba-
bility, variable will take the value 0, will be
reduced to , and the faults will affect components
in . In that case, with probability ,
faults, will affect components in
and faults will affect components in . Let
. With probability , faults will affect
component , variable will take the value 1,
will be reduced to , and the remaining
faults will affect components in . This justifies the
recursive expression.
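The conditioning step used in this case and in the cases that follow can be illustrated numerically. The Python sketch below assumes that, given that each of k faults affects some component of a set, the faults choose a component of the set independently with probabilities proportional to the per-defect failure probabilities of the components, so that the number of faults affecting one particular component is binomial. The function and parameter names are illustrative and do not correspond to the notation of Theorem 1.

```python
from math import comb


def prob_faults_on_component(k, j, p_component, p_set):
    """Probability that exactly j of k faults affect one component, given
    that all k faults affect components of a set containing it.

    p_component : per-defect failure probability of that component
    p_set       : sum of per-defect failure probabilities over the set
    """
    if not 0 <= j <= k:
        return 0.0
    r = p_component / p_set  # chance that one fault of the set hits the component
    return comb(k, j) * r**j * (1.0 - r) ** (k - j)


# Probability that none of 3 faults affects the component (variable takes value 0)
no_fault = prob_faults_on_component(3, 0, 0.1, 0.4)
# Probability that at least one fault affects it (variable takes value 1)
some_fault = 1.0 - no_fault
```

Summing the function over j from 0 to k gives 1, which is the partition of the probability space on which the case-by-case argument relies.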
Case d) In this case, is a terminal node, is not a ter-
minal node, and does not include components dif-
ferent from not in . The expression is obtained
by conditioning on the number of faults affecting compo-
nent from the faults affecting components in .
The probability that no fault affects component is
. With that probability, variable will take
the value 0 and will be reduced to the constant func-
tion . The probability that , faults
affect component is . With that probability,
variable will take the value 1, will be reduced
to , and the remaining faults will affect compo-
nents in . This justifies the recursive expression.
Case e) In this case, neither nor is a terminal node,
does not include components different from
not in , and includes components different
from not in (the components in ).
The expression is obtained by conditioning on the number
of faults affecting component from the faults af-
fecting components in . The probability that no fault
affects component is . With that proba-
bility, variable will take the value 0, will be
reduced to , and all faults will affect components
in . The probability that , faults
will affect components in
with some of them affecting component is
. With that probability, variable
will take the value 1, will be reduced
to , and faults will affect components in
. This justifies the recursive
expression.
Case f) In this case, is not a terminal node, is a ter-
minal node, does not include components different
from not in , and includes components
different from . With probability , variable
will take the value 0, will be reduced to ,
and all faults will affect components in . With
probability , variable will take the
value 1 and will be reduced to the constant function
. This justifies the recursive expression.
Case g) In this case, neither nor is a terminal node,
includes components different from not in
(the components in ), and includes
components different from not in (the
components in ). The expression is obtained by con-
ditioning on the number of faults affecting component
from the faults affecting components in . The prob-
ability that no fault affects component is .
With that probability, variable will take the value
0, will be reduced to , and the faults will
affect components in . In that case, with probability
, faults, will affect components in
and faults will affect components in .
It remains to deal with the case in which some fault affects
component . The probability that , faults
will affect components in
with some of them affecting component is
. With that probability, variable
will take the value 1, will be reduced
to , and faults will affect components in
. This justifies the recursive
expression.
Case h) In this case, is a terminal node, is not a
terminal node, and includes components different
from not in (the components in ). The
expression is obtained by conditioning on the number of
faults affecting component from the faults affecting
components in . The probability that no fault affects
component is . With that probability, vari-
able will take the value 0 and will be reduced
to the constant function . The probability that ,
faults will affect components in
with some of them affecting component
is
. With that probability, variable will
take the value 1, will be reduced to , and
faults will affect components in .
This justifies the recursive expression.
Case i) In this case, is not a terminal node, is a ter-
minal node, and includes components different from
not in (the components in ). The prob-
ability that no fault affects component is .
With that probability, variable will take the value 0,
will be reduced to , and the faults will af-
fect components in . In that case, with probability
, faults, will affect components in
and faults will affect components in .
It remains to deal with the case in which some fault affects
component . The probability that some fault affects
component is . With that probability,
variable will take the value 1 and will be re-
duced to the constant function . This justifies the
recursive expression.
Case j) In this case, both and are terminal nodes and
includes components different from . The prob-
ability that no fault affects component is .
Then, with that probability, variable will take the
value 0 and will be reduced to the constant func-
tion and, with probability , variable
will take the value 1 and will be reduced to the
constant function . This justifies the recursive ex-
pression.
B. Proof of Theorem 2
The proof is by complete induction on . The result
is trivially true for because this implies
and, then, from Case a of both Theorem 1 and the the-
orem, , . We will assume now that the
result holds for , , , and will show that,
for ,
(3)
We start by proving (3) for . Since , Case a
of the theorem is impossible. In Cases b, c, e, f, g, and i of the
theorem, we have, using the corresponding case of Theorem 1
and the induction hypothesis
In Cases d, h, and j of the theorem the result is trivial because,
using the corresponding cases of Theorem 1, .
We prove next (3) for on a case by case basis, ignoring
Case a, which is impossible since .
Case b) Let ,
. Then, using: 1) the in-
duction hypothesis; 2) , ;
3) ; and 4)
, which, we note, also holds when
, because in that case ,
(4)
Finally, using: 1) Case b of Theorem 1, 2) the induction
hypothesis, 3) inequality (4), and 4) the fact that, with
and , , we
have, for ,
Case c) Let ,
. Then, using: 1)
the induction hypothesis, 2) ,






analogously as we did to obtain (4), we have
(6)
Finally, using: 1) Case c of Theorem 1, 2) inequalities (5)
and (6), and 3) the fact that, with and ,
, we have, for ,
Case d) Let , be exactly as in Case b (the integers , sat-
isfy the same conditions). Then, using: 1) Case d of The-
orem 1, 2) inequality (4), and 3) the fact that, with
and , , we have, for ,
Case e) Let
,
. Then, using: 1) the induction




which, we note, also holds when , because
in that case ,
(7)
Finally, using: 1) Case e of Theorem 1, 2) the induction
hypothesis, 3) inequality (7), and 4) the fact that, with
and ,
, we have, for ,
Case f) Using Case f of Theorem 1 and the induction hypothesis,
we have, for ,
Case g) Let , be exactly as in Case c (the in-
tegers , satisfy the same conditions).
Furthermore, let
,
. Reasoning analogously as we
did to obtain (7), we have
(8)
Then, using: 1) Case g of Theorem 1, 2) inequalities (5)
and (8), and 3) the fact that, with and ,
, we have, for
Case h) Let , be exactly as in Case e (the integers , sat-
isfy the same conditions). Then, using: 1) Case h of The-
orem 1; 2) inequality (7); and 3) the fact that, with
and ,
, we have, for
Case i) Let ,
. Reasoning analogously
as we did to obtain (5), we have
Then, using: 1) Case i of Theorem 1, 2) the previous in-
equality, and 3) , we have, for
Case j) The result is immediate because, using Case j of The-
orem 1, , .
C. Proof of Lemma 1
From the assumption , it is clear that
both and are 0 and 1.
Further, since , . Therefore,
for we can write
Using elementary analysis techniques it is straightforward to
show that for integer 1 and , the function
increases on and tends to as
approaches 1. Then, since
implying that, for ,
Accordingly, the term , is larger than the
preceding one if and smaller if .
This proves that the terms increase monotonically as goes from
1 to and decrease monotonically as goes from
to .
REFERENCES
[1] J.-C. Lo, C. Metra, and F. Lombardi, “Guest editors’ introduction: Spe-
cial section on design and test of systems-on-chip (SoC),” IEEE Trans.
Computers, vol. 55, no. 2, pp. 97–98, Feb. 2006.
[2] J.-F. Frigon, A. M. Eltawil, E. Grayver, A. Tarighat, and H. Zou, “De-
sign and implementation of a baseband WCDMA dual-antenna mobile
terminal,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 3, pp.
518–529, Mar. 2007.
[3] D. Kim, K. Chung, C.-H. Yu, C.-H. Kim, I. Lee, J. Bae, Y.-J. Kim,
J.-H. Park, S. Kim, Y.-H. Park, N.-H. Seong, J.-A. Lee, J. Park, S. Oh,
S.-W. Jeong, and L.-S. Kim, “An SoC with 1.3 Gtexels/s 3-D graphics
full pipeline for consumer applications,” IEEE J. Solid-State Circuits,
vol. 41, no. 1, pp. 71–84, Jan. 2006.
[4] A. Lodi, A. Cappelli, M. Bocchi, C. Mucci, M. Innocenti, C. De Bar-
tolomeis, L. Ciccarelli, R. Giansante, A. Deledda, F. Campi, M. Toma,
and R. Guerrieri, “XiSystem: A XiRisc-based SoC with reconfigurable
IO module,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 85–96, Jan.
2006.
[5] Y. Zorian and D. Gizopoulos, “Guest editors’ introduction: Design
for yield and reliability,” IEEE Des. Test. Comput., vol. 21, no. 3, pp.
177–182, Mar. 2004.
[6] F. J. Meyer and N. Park, “Predicting defect-tolerant yield in the
embedded core context,” IEEE Trans. Comput., vol. 52, no. 11, pp.
1470–1479, Nov. 2003.
[7] Y.-Y. Chen and S. J. Upadhyaya, “Yield analysis of reconfigurable
array processors based on multiple-level redundancy,” IEEE Trans.
Computers, vol. 42, no. 9, pp. 1136–1141, Sep. 1993.
[8] I. Koren and Z. Koren, “Analysis of a hybrid defect-tolerance scheme
for high-density memory ICs,” in Proc. IEEE Int. Symp. Defect Fault
Tolerance VLSI Syst., 1997, pp. 166–174.
[9] I. Koren and Z. Koren, “Defect tolerance in VLSI circuits: Techniques
and yield analysis,” Proc. IEEE, vol. 86, no. 9, pp. 1819–1838, Sep.
1998.
[10] L. Benini and G. D. Micheli, “Networks on chips: A new SoC para-
digm,” IEEE Computer, vol. 35, no. 1, pp. 70–78, Jan. 2002.
[11] T. Bjerregaard and S. Mahadevan, “A survey of research and practices
of network-on-chip,” ACM Comput. Surveys, vol. 38, no. 1, 2006, Ar-
ticle 1.
[12] F. Angiolini, P. Meloni, S. M. Carta, L. Raffo, and L. Benini, “A layout-
aware analysis of networks-on-chip and traditional interconnects for
MPSoCs,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.
26, no. 3, pp. 421–434, Mar. 2007.
[13] C. H. Stapper, “On yield, fault distributions, and clustering of parti-
cles,” IBM J. Res. Develop., vol. 30, no. 3, pp. 326–338, 1986.
[14] D. Nikolos and H. T. Vergos, “On the yield of VLSI processors
with on-chip CPU cache,” IEEE Trans. Comput., vol. 48, no. 10, pp.
1138–1144, Oct. 1999.
[15] J. A. Carrasco and V. Suñé, “Combinatorial methods for the evaluation
of yield and operational reliability of fault-tolerant systems-on-chip,”
Microelectron. Reliab., vol. 44, pp. 339–350, 2004.
[16] C. H. Stapper and R. J. Rosner, “Integrated circuit yield management
and yield analysis: Development and implementation,” IEEE Trans.
Semicond. Manuf., vol. 8, no. 2, pp. 95–102, Feb. 1995.
[17] I. A. Wagner and I. Koren, “An interactive VLSI CAD tool for yield
estimation,” IEEE Trans. Semicond. Manuf., vol. 8, no. 2, pp. 130–138,
Feb. 1995.
[18] G. A. Allan, “Yield prediction by sampling IC layout,” IEEE Trans.
Comput.-Aided Des. Integr. Circuits Syst., vol. 19, no. 3, pp. 359–371,
Mar. 2000.
[19] J. Cunningham, “The use and evaluation of yield models in integrated
circuit manufacturing,” IEEE Trans. Semicond. Manuf., vol. 3, no. 2,
pp. 60–71, Feb. 1990.
[20] T. Okabe, M. Nagata, and S. Shimada, “Analysis on yield of integrated
circuits and a new expression for the yield,” Elec. Eng. Japan, vol. 92,
no. 6, pp. 135–141, 1972.
[21] C. H. Stapper, “Defect density distribution for LSI yield calculations,”
IEEE Trans. Electron Devices, vol. 20, no. 7, pp. 655–657, Jul. 1973.
[22] I. Koren, Z. Koren, and C. H. Stapper, “A unified negative-binomial
distribution for yield analysis of defect-tolerant circuits,” IEEE Trans.
Comput., vol. 42, no. 6, pp. 724–734, Jun. 1993.
[23] R. E. Bryant, “Graph-based algorithms for Boolean function manipu-
lation,” IEEE Trans. Computers, vol. C-35, no. 8, pp. 677–691, Aug.
1986.
[24] W. Feller, An Introduction to Probability Theory and Its Applications,
3rd ed. New York: Wiley, 1968, vol. I.
[25] Univ. Colorado, Boulder, “CUDD: CU Decision Diagram Package,
Release 2.4.1,” 2007. [Online]. Available: vlsi.colorado.edu/~fabio/
CUDD
[26] M. Millberg, E. Nilsson, R. Thid, S. Kumar, and A. Jantsch, “The
Nostrum backbone—A communication protocol stack for networks on
chip,” in Proc. 17th Int. Conf. VLSI Des., 2004, pp. 693–696.
[27] D. Wiklund and D. Liu, “SoCBUS: Switched network on chip for hard
real time embedded systems,” in Proc. Int. Parallel Distrib. Process.
Symp., 2003.
[28] C. A. Zeferino and A. A. Susin, “SoCIN: A parametric and scalable net-
work-on-chip,” in Proc. 16th Symp. Integr. Circuits Syst. Des. (SBCCI),
2003, pp. 169–174.
[29] A. Andriahantenaia and A. Greiner, “Micro-network for SoC: Imple-
mentation of a 32-port SPIN network,” in Proc. Conf. Des., Autom. Test
Europe (DATE), 2003, p. 1128.
[30] F. Petrini and M. Vanneschi, “Performance analysis of wormhole
routed  -ary -trees,” Int. J. Foundations Comput. Sci., vol. 9, no. 2,
pp. 157–177, 1998.
[31] S. Rai and Y. C. Oh, “Tighter bounds on full access probability in
fault-tolerant multistage interconnection networks,” IEEE Trans. Par-
allel Distrib. Syst., vol. 10, no. 3, pp. 328–335, Mar. 1999.
[32] S. Minato, N. Ishiura, and S. Yajima, “Shared binary decision diagram
with attributed edges for efficient Boolean function manipulation,” in
Proc. 27th ACM/IEEE Des. Autom. Conf., 1990, pp. 52–57.
[33] M. Bouissou, F. Bruyére, and A. Rauzy, “BDD based fault-tree pro-
cessing: A comparison of variable ordering heuristics,” in Proc. Europ.
Safety Reliab. Assoc. Conf. (ESREL), C. G. Soares, Ed., 1997, vol. 3,
pp. 2045–2052.
[34] I. Wegener, “The size of reduced OBDD’s and optimal read-once
branching programs for almost all Boolean functions,” IEEE Trans.
Computers, vol. 43, no. 11, pp. 1262–1269, Nov. 1994.
[35] S. A. Doyle and J. B. Dugan, “Dependability assessment using binary
decision diagrams (BDDS),” in Proc. 25th Int. Symp. Fault-Tolerant
Comput. (FTCS-25), 1995, pp. 249–258.
[36] R. Piessens, E. de Doncker-Kapenga, C. W. Überhuber, and D.
K. Kahaner, Quadpack: A Subroutine Package for Automatic In-
tegration. New York: Springer Verlag, 1983, vol. 1, Series in
Computational Mathematics.
Juan A. Carrasco (SM’02) received the Engineer
degree in industrial engineering and the Ph.D. de-
gree in industrial engineering from the Polytechnical
University of Catalonia (UPC), Barcelona, Spain, in
1982 and 1987, respectively, and the M.Sc. degree
in computer science from Stanford University, Stan-
ford, CA, in 1987.
He is currently an Associate Professor with the
Electronics Engineering Department, UPC. He vis-
ited INRIA (CNRS), Rennes, France, twice, in 1996
and 1998. His research is focused on the development
of methodologies for the modeling and evaluation of fault-tolerant systems, a
topic in which he has published 60 papers in refereed journals and conference
proceedings. He has directed the design and implementation of METFAC-2,
a Markovian modeling tool (see http://www.dit.upc.es/qine/tools/metfac/).
He has been the principal investigator of several research projects funded by
both public and private institutions, has been in the program committees of 11
international conferences, is a research project evaluator of the Spanish and
Catalonian research project evaluation agencies, and is a reviewer of ACM
Computer Reviews.
Prof. Carrasco is a senior member of the IEEE Computer Society.
Víctor Suñé received the Engineer degree in industrial
engineering and the Ph.D. degree in industrial
engineering from the Polytechnical University of
Catalonia (UPC), Barcelona, Spain, in 1995 and
2000, respectively.
He is currently a Lecturer with the Electronics
Engineering Department, UPC. He visited Duke
University in 2006. His research is focused on the
development of methodologies for the modeling and
evaluation of fault-tolerant systems, a topic in which
he has published 13 papers in refereed journals and
conference proceedings. He is co-designer and co-implementor of METFAC-2,
a Markovian modeling tool (see http://www.dit.upc.es/qine/tools/metfac/). He
has participated in a number of research projects funded by public institutions.
