This paper investigates efficient methods for evaluating circuit fault-tolerance. We propose a fault-tolerance evaluation method based on the Belief Propagation (BP) algorithm. Compared with existing approaches, our algorithm is more efficient in terms of memory requirements and CPU time. The algorithm can easily run on multiple CPUs to achieve parallel processing, thus further reducing memory cost and processing time. The significance of this research is that the proposed algorithm can be used for developing computer-aided nanoscale simulation tools to systematically evaluate circuit fault-tolerant behavior. This knowledge, in turn, can help build more robust nanocircuits.
INTRODUCTION
The prosperity of the semiconductor industry over the past decades suggests that the transistor density of semiconductor chips doubles roughly every 18 months. As the dimensions of devices decrease to the nanoscale, this trend of miniaturization cannot continue. 1 In nanoscale devices and circuits, two types of errors, structural faults and signal faults, are unavoidable. 2 (a) Structural faults: In a nanocircuit, a significant number of logic gates and their interconnections can fail during and after fabrication. It is estimated that interconnection failure rates can reach 10% or more. 3 (b) Signal faults: Disturbances such as thermal noise and cross-talk can cause operational failure. These signal errors are dynamic in nature, and they are referred to as "soft errors" in some literature.
The error rate of each device can be obtained based on manufacturers' production yield and production testing results. Although the determination of statistical error rates of devices remains important, this issue is beyond the scope of our paper.
In order to create reliable nanoscale circuits, fault-tolerant design is crucial. Building fault-tolerant circuits using unreliable components was initially proposed by von Neumann. 4 He proposed the use of universal gates (such as NAND and majority logic gates) as primitive building blocks and used the multiplexing technique to improve fault-tolerance. However, the hardware cost of von Neumann's multiplexing is too high. To overcome this problem, the low-degree redundancy technique was proposed in Ref. [5] to reduce hardware usage. The stochastic Markov design is the core of this system, which leads to a comprehensive fault-tolerance theory. In Refs. [6 and 7], bifurcation theory and its associated geometrical representation were used to analyze the Markov multiplexing system. This work showed that the two modes (uni-modal and bi-modal) and the median of the stationary distribution are critical to characterizing the reliability of the system. It also showed that the NAND-multiplexing technique can lead to system reliability with moderate redundancy even when the gate error probability is high. An integrated system was proposed 8 to synthesize self-recovering microarchitectures, where transient faults are detected using duplication and comparison and are recovered using checkpoint and rollback blocks. The designs in Refs. [9 and 10] used extra circuits to compensate for device and connection failures. Demultiplexer-based error-correcting codes were utilized in Ref. [11] to deal with stuck-open defects (i.e., complete breakage between two nodes that should otherwise be connected). Because we have no prior knowledge about when and where faults will occur in nanoscale circuits, probabilistic methods are effective ways to deal with faults in these circuits. For instance, the system proposed in Refs. [5-7] is built upon the probabilistic modeling of faults and errors. Tools and techniques to evaluate reliability trade-offs for probabilistic architectures were also reported in Refs. [12, 13].
To date, many nanoscale fault-tolerant circuit design methodologies have been proposed. Our previous studies show that a circuit's fault-tolerant behavior also depends on its topology. 14 For instance, there are different ways to design adders, including ripple adders, look-ahead adders, etc. The question is: with faulty gates and erroneous signals, which design is the most fault-tolerant? To answer this question, we have to systematically evaluate the fault-tolerance of nanoscale circuits. Methods such as the BDD (Binary Decision Diagram) were proposed in Ref. [15], which can be viewed as a compressed form of the matrix approach. In Ref. [16], Bayesian inference schemes were proposed to estimate the overall output error probability of circuits. In this paper, we propose to use the Belief Propagation (BP) algorithm for fault-tolerance evaluation. The advantages of this method over the existing ones include its lower memory requirements and shorter simulation times. To demonstrate the efficiency of our proposed algorithm (which will be discussed later) in calculating the system error rate (SER) for medium- and large-size circuits, we use 4-bit, 8-bit and 16-bit adders as benchmark circuits. As mentioned previously, circuit topology can impact fault-tolerant capabilities. In this example, we select two kinds of adders, namely the ripple carry adder (RCA) and the conditional select adder (CSA), for illustration. We run our simulations on a Pentium IV 3.0 GHz desktop with 1 GB of memory for different types of adders with different sizes. The results are shown in Table I. It should be noted that in Table I, the SER values are calculated with the assumption that the error probability of each basic logic gate is 1%.
From the sixth column of Table I and from Figure 1, we can see that although the Ripple Carry Adder (RCA) and Conditional Select Adder (CSA) achieve the same logic function, their fault-tolerance capabilities are different. In general, the CSA provides better fault-tolerant performance than the RCA. Furthermore, the fifth column of Table I shows that the memory cost for all the circuits in the simulation is less than 1 MB and does not depend on circuit size. This property holds because all computations are executed locally, so the memory requirement is kept very small. The proposed methodology is general and can evaluate the fault-tolerant behavior not only of nanoscale circuits, but also of combinatorial logic circuits (the evaluation of sequential circuits will be the subject of another paper).
Fault-tolerant design can be achieved at different levels. In this paper, we start with the most fundamental logic gates (such as NAND, NOR and NOT gates) and then analyze the fault tolerance of some sample circuits (such as adders and decoders) built upon these fundamental logic gates. Our goal is to find the best design with the smallest error probability among several possible implementations that can achieve the same function. Our strategy is to: (1) start with basic-operation circuits of a certain fixed error probability; (2) implement the logic gates with the smallest SER as the design criterion; (3) design medium-size circuits for more complicated functions and analyze their fault-tolerance capabilities; and (4) reuse the analysis at each level, since it should be similar or easily extended.
In principle, we can continue the process to develop larger circuits.
Our study will also determine which basic components in a circuit are most likely to generate errors and affect the performance of the entire circuit. By using the same argument as above, our analysis can then be extended to large-scale circuits by considering each "basic" function as a single entity with a fixed error probability, and then deciding which part of the large-scale circuit is the most error-prone link.
In the following, we will explain our proposed design in detail. The paper is organized as follows. In Section 2, we discuss the Ensemble-dependent Matrix (EDM) model, which motivates the use of the BP algorithm. In Section 3, we introduce the BP algorithm as an efficient method to evaluate the fault-tolerant behavior of nanocircuits. In Section 4, the Probability Propagation in Trees of Clusters (PPTC) algorithm is proposed to analyze circuits with re-convergent fanouts. In Section 5, we analyze the CPU time and memory requirements of our method and compare it with other existing methods. We conclude in Section 6.
ERROR-RESILIENCE EVALUATION OF CIRCUITS
In order to understand our proposed design, let us begin with the straightforward Ensemble Dependent Matrix (EDM) method to evaluate circuit fault-tolerance. As described in Ref. [17] , the fault-tolerance performance of a circuit can be evaluated using its System Error Rate (SER), which is calculated through the manipulation of the Ensemble-dependent matrix. The matrix model is easy to implement but inefficient in terms of memory and CPU requirements. Next, we will explain EDM in detail.
To explain how the matrix model works and how we can improve the calculation efficiency, let us use the circuit in Figure 2 as an example. In order to obtain the SER of the circuit, we need to build its ensemble-dependent matrix, M, first. According to the steps outlined in Ref. [14] , the procedures are:
where NAND, NOR and INV are the probability-transfer matrices of the NAND gate, NOR gate, and inverter gate, respectively. Eye(2) is the $2 \times 2$ identity matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. The asterisk ($*$) means removing the corresponding columns of a matrix based on the fanouts of inputs (please refer to Ref. [14] for the details).
M is a $2^2 \times 2^4$ matrix, as shown below. Each column represents an assignment of the inputs, each row represents an assignment of the outputs, and each component in the matrix represents the transfer probability from input to the corresponding output. Further analysis shows that two factors have a significant impact on the performance of the EDM model: (i) Matrix M contains much more information than required to evaluate the fault-tolerance capability of circuits. In our example, we only need $2^4$ conditional probabilities (corresponding to the $2^4$ items in the truth table) in order to calculate SER, not all the $2^2 \cdot 2^4$ components in matrix M. (ii) We decompose the procedure of matrix M calculation and describe each conditional probability of M as an arithmetic expression in terms of the elements of NAND, NOR and inverter. We find that each conditional probability is actually obtained by an exhaustive enumeration method, but this method is not efficient. For example, to calculate
The calculation above involves $2^7 = 128$ summation terms and $128 \times 8 = 1024$ multiplications (each term contains eight multiplications), which is computationally expensive.
The above analysis suggests two ways to improve the efficiency of the fault-tolerance evaluation method. First, we should only calculate the conditional probabilities that are necessary for obtaining SER, and thus avoid spending computational resources on information that we do not need. Second, we could explore an efficient way to calculate the individual conditional probabilities instead of using the exhaustive enumeration method as shown in Eq. (2).
We found that the BP algorithm, which is widely used in the area of probabilistic inference, provides an efficient and effective solution to the above problem. First, the BP algorithm can be used to calculate the individual conditional probabilities separately. Therefore, we do not have to calculate the conditional probabilities that do not contribute to SER computation. Second, the BP algorithm provides an efficient way to compute each conditional probability. As an example, to calculate the conditional probability (2), the BP method does the following calculations: 
PEARL'S BP ALGORITHM
Before introducing our proposed algorithm, this section shows how to use the simple BP algorithm (Pearl's BP Algorithm) to analyze the fault tolerance of nanoscale circuits, and then discusses its limitations.
Introduction of Circuit Evaluation Based on Pearl's BP Algorithm
The BP algorithm was initially proposed by Judea Pearl 18 to design a computational model for the mechanism of human inferential reasoning. In our research, the goal is to efficiently compute the SER of a circuit. To demonstrate how Pearl's BP algorithm works, we start with a simple example as shown in Figure 3 .
From the truth table in Figure 3 , we can easily obtain
We assume that we already have prior knowledge about the input signal distributions, $P(A=0, B=0, C=0)$, $P(A=0, B=0, C=1)$, etc. The conditional probabilities, for example $P(H=1 \mid A=0, B=0, C=0)$, are what we need to compute.
In order to perform the belief propagation, a belief network representing the knowledge of a given domain should be constructed first. Building a belief network for our sample circuit in Figure 3 is straightforward. Each signal in the circuit is represented by a node in the belief network. Each logic gate in the circuit is represented by a link. The belief network can be built from input signals to output signals as shown in Figure 4.
Once constructed, the belief network can be used to simulate human reasoning about the interpretation of specific input data (the observation) leading to certain conclusions. The reasoning process includes instantiating observed variables, computing their impacts on our beliefs of other variables, and finally obtaining the beliefs on the variables that we are interested in. Before showing the detailed calculation process, we first introduce the following definitions.
Definition 1: For any given variable V , suppose that V can be any value in {v 1 
Here E is the evidence. 
where $\alpha$ is a normalizing constant.
$\mathrm{BEL}(V)$ is then obtained by combining these two support vectors
To calculate the $\pi$-message $\pi_{XY}(x_i)$ sent from parent $X$ to child $Y$, we assume that $X$ has multiple children $Y_1, Y_2, \ldots$, and $Y$ is one of them. $\pi_{XY}(x_i)$ is then given by
To calculate the $\lambda$-message $\lambda_{YX}(x_i)$ sent from child $Y$ to parent $X$, assume that $Y$ has multiple parents $X_1, X_2, \ldots$, and $X$ is one of them. $\lambda_{YX}(x_i)$ is then given by

With the above definitions and calculations in place, we can now proceed to the explanation of Pearl's BP Algorithm. In Pearl's algorithm, each node $V$ has two vectors:
The task is to determine how new information can be spread through the network, that is, how the $\lambda$ and $\pi$ values of a node are determined by the $\lambda$ and $\pi$ values of its neighbors. The process can be divided into three steps as shown below.
• For each node $V$, if $\lambda(V)$ has been calculated and $V$ has received the $\pi$-messages from all parents except $X$, calculate $\lambda_{VX}(X)$ and send it to $X$. Iterate this step until no change occurs (note that the observed nodes do not receive messages because the belief in their states will not change).
• Step 3: Normalization. For each node $V$, calculate $\mathrm{BEL}(V) = \lambda(V)\,\pi(V)$ and normalize the result.

In the remaining part of this section, we will use the calculation of the conditional probability $P(H=1 \mid A=0, B=0, C=0)$ in Figure 4 as an example to demonstrate the detailed procedure of Pearl's BP Algorithm.
Example. Assume that the conditional probability tables for an inverter, an AND gate, and an OR gate are such that each gate has a probability of 10% of behaving incorrectly. We then apply the three steps described above to the belief network in Figure 4.
• Step 1: Initialization.
• Step 2: Propagation. The procedure of message propagation is shown in Figure 5.
(1) As shown in Figure 5 (a), A, C, B send messages to D, F, I, respectively. 
In the following step, G will send messages to D and F, and H will send messages to I. However, since we wish to calculate the belief in node H and the following steps have no effect on H, we can stop here and do not have to continue to the next step.
• Step 3: Normalization. We calculate $\mathrm{BEL}(H) = \alpha \cdot \lambda(H) \cdot \pi(H) = (0.12016, 0.87984)$; that is, given $(A, B, C) = (0, 0, 0)$, our belief in $H = 1$ is 0.87984. In other words, $P(H=1 \mid A=0, B=0, C=0) = 0.87984$. This value is the same as what we get using the ensemble-dependent matrix model computation.
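The propagation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gate topology (D = NOT A, F = NOT C, I = NOT B, G = AND(D, F), H = OR(G, I)) is inferred from the message flow described in the example and is not confirmed by Figure 3, and the helper names `gate_cpt`, `pi_child` and `belief_h` are hypothetical. Because the inputs are observed and H has no observed descendants, only the forward ($\pi$) messages are needed and $\lambda(H) = (1, 1)$.

```python
# Assumed topology (hypothetical, inferred from the text):
# D = NOT A, F = NOT C, I = NOT B, G = AND(D, F), H = OR(G, I).
EPS = 0.1  # each gate behaves incorrectly with probability 10%

def gate_cpt(fn, n_inputs, eps=EPS):
    """Return {inputs: P(output = 1 | inputs)} for a gate that errs with prob. eps."""
    cpt = {}
    for idx in range(2 ** n_inputs):
        bits = tuple((idx >> k) & 1 for k in range(n_inputs))
        cpt[bits] = 1 - eps if fn(*bits) else eps
    return cpt

NOT = gate_cpt(lambda x: 1 - x, 1)
AND = gate_cpt(lambda x, y: x & y, 2)
OR = gate_cpt(lambda x, y: x | y, 2)

def pi_child(cpt, parent_pis):
    """pi vector (P(v=0), P(v=1)) of a child computed from its parents' pi-messages."""
    p1 = 0.0
    for bits, p_out1 in cpt.items():
        w = 1.0
        for b, pi in zip(bits, parent_pis):
            w *= pi[b]
        p1 += w * p_out1
    return (1.0 - p1, p1)

def belief_h(a, b, c):
    """BEL(H) given observed inputs (a, b, c) on the assumed tree-shaped network."""
    def observed(v):
        # An observed node sends a deterministic pi-message.
        return (1.0, 0.0) if v == 0 else (0.0, 1.0)
    pi_d = pi_child(NOT, [observed(a)])
    pi_f = pi_child(NOT, [observed(c)])
    pi_i = pi_child(NOT, [observed(b)])
    pi_g = pi_child(AND, [pi_d, pi_f])
    pi_h = pi_child(OR, [pi_g, pi_i])
    # lambda(H) = (1, 1), so BEL(H) equals pi(H) and is already normalized.
    return pi_h
```

Under these assumptions, `belief_h(0, 0, 0)` reproduces the belief vector reported in Step 3, which supports (but does not prove) the inferred topology.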
Limitations of Circuit Evaluation Based on Pearl's BP Algorithm
Although Pearl's BP Algorithm is simple to implement and efficient in computation, it has one drawback: it only works for single-connected belief networks, where there is only one path between any two nodes. As Pearl has pointed out, his BP algorithm may diverge or converge to an inaccurate value in a multiple-connected graph because of "double counting," where the same evidence is passed around the network multiple times and can be mistakenly considered as new evidence. 18, 19 Figure 6 shows examples of both a single-connected belief network and a multiple-connected belief network. Let us see how the BP algorithm in Ref. [18] works in both graphs when we have an observation at node A (e.g., A = a). For the single-connected graph, node A propagates a message to node B, and then node B propagates a message to node C. After this iteration, the belief propagation process finishes and the belief at each node is consistently updated. For the multiple-connected graph, node A propagates messages to nodes B and B', and then both B and B' propagate a message to node C. After this step, the belief at each node is consistently updated based on the observation at node A. However, the BP algorithm in Ref. [18] has not finished yet: when node C receives a message from its neighbor B, it will also propagate a message to its other neighbor B'. Similarly, when node C receives a message from node B', it will propagate a message to node B. As a result, the observation at node A is propagated to both nodes B and B' twice. Consequently, the beliefs at nodes B and B' are incorrect. Unfortunately, for most practical circuits, the corresponding belief networks are multiple-connected graphs due to reconvergent fanout. For example, in Figure 2, E fans out to two different signals, H and I, which then converge to L in a later stage. As a result, these connections form a loop among E, H, I and L.
We therefore need an algorithm that is able to handle loopy, or multiple-connected, graphs.
OUR PROPOSED ALGORITHM

Introduction and Implementation
To address the issue of applying the BP algorithm to loopy graphs, the Probability Propagation in Trees of Clusters (PPTC) algorithm, also called the Junction Tree algorithm in some literature, was developed by Lauritzen and Spiegelhalter 20 and refined by Jensen. 21 It is one of the most widely used BP algorithms. We adopted and adjusted the PPTC algorithm to evaluate the circuit fault rate. Unlike Pearl's BP algorithm, 18 which propagates messages in the original belief network and only works for a single-connected belief network, PPTC first converts the belief network into a cluster tree structure, which contains no loops. Probabilities are then computed by propagating messages in the tree structure. In this way, PPTC can handle multiple-connected belief networks.
To demonstrate how the proposed algorithm can efficiently calculate SER and evaluate the fault-tolerance of nanoscale circuits, we use a simple XOR with four gates as an example. The circuit schematic is shown in Figure 7 and its SER is expressed as
where $f$ is the output value corresponding to the inputs $(a, b)$ in the truth table. The $2^2$ conditional probabilities $P(f \mid a, b)$, $\forall (a, b) \in \{(0,0), \ldots, (1,1)\}$, are what we need to compute. In the following, we will show the detailed procedures as outlined in Figure 8 to calculate SER using the proposed algorithm. We will also illustrate the ideas behind individual steps. For our following discussion, we will use the same notations and definitions as in Ref. [22].
Step (1) Graphical Transformation.
Substep (1.1) Building the Belief Network:
The first step of PPTC is to convert a digital circuit into the corresponding belief network.
• Create a node for each signal in the circuit. For example, there are six signals {A, B, C, D, E, F} in the XOR circuit and thus six nodes are created with the same signal names as shown in Figure 7 .
• Connect every input-output pair of each gate using an arrow, which represents the dependencies among these nodes. For example, in the XOR circuit in Figure 7 , signal A and signal B are the inputs of the left-most NAND gate while signal C is its output. We connect A-C and B-C using two arrows.
The belief network of the XOR circuit is shown in Figure 9 . We can express the joint probability of the graph as
$P(a, b, c, d, e, f) = P(a)\,P(b)\,P(c \mid a, b)\,P(d \mid a, c)\,P(e \mid b, c)\,P(f \mid d, e)$
The conditional probabilities can be viewed as functions of the variables involved. We usually replace them with more general terms:
$\phi$ is called the potential function. In our problem, $\phi(a) = P(a)$, $\phi(c, a, b) = P(c \mid a, b)$, etc. To calculate the conditional probabilities $P(f \mid a, b)$ in Eq. (4) for each given value $(a, b)$, we should utilize the efficient summation method as described in the previous section, for example,
instead of the exhaustive enumeration method. Note that the order in which the summations in the right hand side of Eq. (6) are carried out is arbitrary. We can switch the order of summations, such as computing d first and then c and e. The fundamental difference between the exhaustive enumeration method and the BP method is that the latter does all the computations locally and passes the results among neighboring nodes to obtain the overall conditional probability.
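To make the contrast concrete, the following Python sketch computes $P(f \mid a, b)$ for the XOR circuit in two ways. It assumes that all four gates are NAND gates (C = NAND(A, B), D = NAND(A, C), E = NAND(B, C), F = NAND(D, E)), which matches the factorization of the joint probability but is not stated explicitly in the text, and that each gate output flips with probability `eps`. The nesting used in `nested` (sum over c innermost, then over d and e) is one possible ordering; the function names are hypothetical.

```python
from itertools import product

def nand_cpt(out, x, y, eps):
    """P(out | x, y) for a NAND gate whose output flips with probability eps."""
    ideal = 1 - (x & y)
    return 1 - eps if out == ideal else eps

def exhaustive(f, a, b, eps):
    """Exhaustive enumeration: sum the full product over every (c, d, e)."""
    total = 0.0
    for c, d, e in product((0, 1), repeat=3):
        total += (nand_cpt(c, a, b, eps) * nand_cpt(d, a, c, eps)
                  * nand_cpt(e, b, c, eps) * nand_cpt(f, d, e, eps))
    return total

def nested(f, a, b, eps):
    """Nested summation: the factor that depends only on (f, d, e) is pulled
    out of the inner sum over c."""
    total = 0.0
    for d, e in product((0, 1), repeat=2):
        inner = sum(nand_cpt(c, a, b, eps) * nand_cpt(d, a, c, eps)
                    * nand_cpt(e, b, c, eps) for c in (0, 1))
        total += nand_cpt(f, d, e, eps) * inner
    return total
```

Both functions return the same value for every input. On a circuit this small the saving in multiplications is modest; the benefit of keeping each sum local grows with the number of internal signals.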
Substep (1.2) Constructing the Moral Graph.
Before we introduce how to construct the moral graph, we first introduce the concept of a clique.
Definition. A clique is a sub-graph that is complete and maximal. "Complete" means that every pair of nodes in the clique is connected. "Maximal" means that the clique is not contained in a larger and complete subgraph.
In order to systematically calculate the conditional probability in Eq. (6), we need to exploit the topology of the network. First, we can see that unlike the conditional probability form, the potential function form in Eq. (5) has no directional information. Therefore, all the directions in the graph can be removed. Second, we want the variable group of each potential function to be included within a clique in the final join tree structure, so that we can do all the summations locally. To achieve this goal, we need to make sure that the variable group of each potential function forms a complete graph, that is, all the variables in the group are connected with each other. Otherwise, the variable group will not be included in any clique. Recall that each potential function initially comes from a conditional probability, and thus its variable group is composed of a variable and its parents. Because connections already exist between the variable and its parents, what we need to do is make sure that every pair of parents is connected as well.
To make sure all the variables in the group are connected with each other, we convert the belief network into an undirected graph, called the moral graph, as described below:
• Remove all the directions in the belief network.
• For every node in the graph, connect each pair of its parents if they are not connected.
In our example, nodes A and B have no parents. D's parents, A and C, are already connected, as are E's parents, B and C. However, C's parents, A and B, are not connected, so we connect them with a dashed line. Similarly, we connect F's parents, D and E, with a dashed line. The final moral graph is shown in Figure 10.
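The two moralization rules can be sketched as follows. The parent lists are read off the joint probability factorization given earlier; the function name `moralize` and the edge representation (a set of two-element `frozenset`s) are illustrative choices, not part of the original method description.

```python
def moralize(parents):
    """parents: {node: [parent, ...]}.
    Returns (all undirected edges of the moral graph, the edges that were added)."""
    edges = set()
    # Rule 1: drop directions, keeping one undirected edge per parent-child link.
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))
    # Rule 2: connect every pair of parents of the same node if not yet connected.
    added = set()
    for ps in parents.values():
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                e = frozenset((ps[i], ps[j]))
                if e not in edges:
                    added.add(e)
    return edges | added, added

XOR_PARENTS = {"A": [], "B": [], "C": ["A", "B"], "D": ["A", "C"],
               "E": ["B", "C"], "F": ["D", "E"]}
moral_edges, moral_added = moralize(XOR_PARENTS)
```

For the XOR network this adds exactly the two dashed edges mentioned above, A-B and D-E.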
Substep (1.3) Triangulating the Moral Graph.
As we mentioned in Substep (1.2), an important relationship between Eq. (5) and the moral graph is that the variable group of any potential function is complete, which means that the variables of any single potential function are all connected in the moral graph. However, this relationship could be broken in the calculation process when some variables are eliminated after the summations. For example, in Figure 10

Fortunately, the triangulation condition can guarantee that we will be able to do the summations without creating intermediate functions with an incomplete variable set, provided that an appropriate ordering is chosen. 20 A graph is triangulated if and only if every cycle (with length no shorter than 4) contains an edge that connects two non-adjacent nodes in the cycle.
In this step, the goal is to make sure that the moral graph after substep (1.2) is triangulated. After we choose a suitable summation ordering, the variable group of any intermediate function will be complete. Before we show the detailed triangulation process, we first need to define the weight of a node.
Definition. The weight of a node is the number of states of the corresponding variable. The weight of a node set is the product of the weights of all the nodes in the set.
In our example, the weight of each node is two since the digital signal has two states: 0 and 1.
The process of triangulation is as follows:
• Create a duplicate $G'$ of the moral graph $G$.
• Choose a node according to the following steps: For each node $V$ in $G'$, let $N_V$ denote its neighbors. Count the number of edges that need to be filled in if we want all the nodes in $V \cup N_V$ (the symbol $\cup$ means the union of two sets) to be connected. Select the node that causes the minimal number of edges to be filled in. If there is a tie among several nodes, then select the node whose $V \cup N_V$ has the smallest weight. If there is still a tie, randomly choose one.
• For the selected node $V$ from the previous step, connect all the nodes of $V \cup N_V$ in graph $G'$. For each edge added into $G'$, add the same edge into $G$. Then delete $V$ from $G'$.
• Repeat the previous two steps until there is no node left in $G'$. $G$ is then triangulated.
Table II shows the order of node elimination in the triangulation process for the example in Figure 10. Figure 11 shows the triangulated graph. We can see that a dotted line is added between B and D in Figure 11.
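The elimination procedure can be sketched in Python as below. Two points are assumptions of this sketch rather than of the text: ties that the text breaks randomly are broken alphabetically here so that the result is reproducible, and every node is assumed to have the same weight (2, for binary signals). The function also records each selected node set that is not contained in an earlier one, which anticipates the clique identification that follows.

```python
def triangulate(nodes, edges, weight=2):
    """Min-fill elimination on an undirected graph.
    Returns (fill-in edges, elimination order, candidate cliques)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    fill_ins, order, cliques = [], [], []
    while adj:
        best = None
        for v in sorted(adj):  # alphabetical scan: deterministic tie-break
            nb = sorted(adj[v])
            missing = [(p, q) for i, p in enumerate(nb)
                       for q in nb[i + 1:] if q not in adj[p]]
            # Primary key: number of fill-ins; secondary key: weight of V and its neighbors.
            key = (len(missing), weight ** (len(nb) + 1))
            if best is None or key < best[0]:
                best = (key, v, missing)
        _, v, missing = best
        for p, q in missing:  # connect all neighbors of v
            adj[p].add(q)
            adj[q].add(p)
            fill_ins.append((p, q))
        cluster = frozenset(adj[v] | {v})
        if not any(cluster <= c for c in cliques):
            cliques.append(cluster)  # keep only sets not contained in an earlier one
        for u in adj.pop(v):  # delete v from the working copy
            adj[u].discard(v)
        order.append(v)
    return fill_ins, order, cliques

XOR_MORAL_EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("C", "D"),
                   ("B", "E"), ("C", "E"), ("D", "E"), ("D", "F"), ("E", "F")]
fill_ins, order, cliques = triangulate("ABCDEF", XOR_MORAL_EDGES)
```

With this tie-break the sketch eliminates F first and A second and adds the single edge B-D, consistent with the example described above.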
It should be noted that the triangulation process is not unique. There are many different methods to triangulate a moral graph. The final triangulated graph might be different as well due to different choices we make in the triangulation process. For example, as shown in Table II, we choose node F first and then node A in the second step. In fact, we can choose any one from {A, B, D, E} in the second step. If we choose B, edge AE will be added instead of edge BD and the final triangulated graph will be different. However, as long as we follow the criterion described in Ref. [22] during the triangulation process, it is guaranteed that the final join tree we build will be optimal, and the sum of state space sizes of the cliques in the triangulated graph is minimal.
Substep (1.4) Identifying Cliques.
As indicated in Ref. [19], the cliques can be identified during the triangulation process (substep 1.3). This claim is based on two facts. First, each clique is a selected $V \cup N_V$ (after all the nodes are connected) in substep (1.3). Second, a previous $V \cup N_V$ cannot be a subset of a later one.
As a result, we can get all the cliques during substep (1.3) by saving each selected $V \cup N_V$ that is not a subset of a previous one. In our example shown in Table II

Substep (1.5) Building the Join Tree.
After extracting all the cliques from the triangulated graph, the next step is to build the join tree structure on which the belief propagations will be performed. A join tree (sometimes called a junction tree or a cluster tree in the literature) is a graph that is composed of clusters, sepsets and their connections. Clusters are the cliques we obtained in substep (1.4), and sepsets are intersections of adjacent clusters. In Figure 12, we denote clusters using circles and represent sepsets using rectangles. The letters inside are the variables included in the clusters or sepsets. Before we explain the detailed procedure of building a join tree, two definitions need to be introduced:
Definition 1: The mass of a sepset $S$ is the number of variables in the sepset.
Definition 2: The cost of a sepset $S_{XY}$ is the sum of the weight of cluster $X$ and the weight of cluster $Y$.
The process of building a join tree is as follows:
• Assume that we have n cliques in total, create n separate nodes (each represents a clique).
• For each pair of cliques $X$ and $Y$, create their sepset $S_{XY} = X \cap Y$. There are $n(n-1)/2$ such sepsets in total.
• Select a sepset $S_{XY}$ that has the largest mass. If there is a tie among several sepsets, select the one with the smallest cost. If there is no path connecting $X$ and $Y$, insert $S_{XY}$ between cliques $X$ and $Y$, that is, connect $S_{XY}$ to $X$ and to $Y$. Then delete $S_{XY}$ from the sepset list. Repeat this step until all the cliques are connected together via the sepsets.
Fig. 12. The example procedure of building the join tree for the sample XOR circuit (panels: Step 1, Step 2, Step 3).
Figure 12 shows the procedure of building the join tree for the sample XOR circuit in Figure 7. As discussed before, this process includes three steps.
Step 1: For the XOR circuit, we have three clusters and three sepsets.
Step 2: We choose sepset {B, C, D}, which has the largest mass of the three sepsets. Sepset {B, C, D} is the intersection of clusters {A, B, C, D} and {B, C, D, E}. Since these two clusters have not yet been connected, we insert sepset {B, C, D} between them and connect the sepset to them.
Step 3: We choose sepset {D, E}, which has the largest mass of the remaining two sepsets. Sepset {D, E} is the intersection of clusters {B, C, D, E} and {D, E, F}. Because these two clusters are not connected, we insert sepset {D, E} between them and connect the sepset to them as shown in step 3 of Figure 12 . All the clusters are now connected and we finish building the join tree.
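The join tree construction can be sketched as a greedy selection over candidate sepsets, with a small union-find structure used to test whether two cliques are already connected. The use of union-find, the function name `build_join_tree` and the uniform node weight of 2 are assumptions of this sketch.

```python
from itertools import combinations

def build_join_tree(cliques, weight=2):
    """cliques: list of frozensets. Returns a list of (X, sepset, Y) links."""
    parent = {c: c for c in cliques}

    def find(c):
        # Follow parent pointers to the representative of c's component.
        while parent[c] != c:
            c = parent[c]
        return c

    candidates = []
    for x, y in combinations(cliques, 2):
        sep = x & y
        mass = len(sep)                                  # number of shared variables
        cost = weight ** len(x) + weight ** len(y)       # sum of the cluster weights
        candidates.append((-mass, cost, sorted(sep), x, y))
    # Largest mass first; ties are broken by smallest cost.
    candidates.sort(key=lambda t: t[:3])

    tree = []
    for _, _, sep, x, y in candidates:
        rx, ry = find(x), find(y)
        if rx != ry:                                     # no path between X and Y yet
            parent[rx] = ry
            tree.append((x, frozenset(sep), y))
        if len(tree) == len(cliques) - 1:
            break
    return tree

XOR_CLIQUES = [frozenset("ABCD"), frozenset("BCDE"), frozenset("DEF")]
xor_tree = build_join_tree(XOR_CLIQUES)
```

For the three XOR clusters this selects sepset {B, C, D} and then sepset {D, E}, and never uses the candidate sepset {D}, in agreement with Steps 2 and 3 above.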
Once we build the join tree, we have finished the process of graphical transformation from the initial digital circuit schematic to the final join tree. Next, we will initialize the potentials of each cluster and sepset, insert the states of the observed nodes (the evidence) and perform belief propagation and marginalization. Finally, we will calculate the SER of the circuit. In the following, we will show the processes outlined in Figure 8.
Step (2) Initialization.
In this step, we will initialize the potential function $\phi_X$ for each cluster $X$ of the join tree. We will assign a real value to $\phi_X(x)$ for each variable assignment $x$ based on the conditional probability tables of the basic NAND, NOR and INV gates. For instance, $\phi_X$ is the probability $P(X, E_v)$, where $E_v$ denotes the evidence, or the states of the observed nodes.
To keep track of the initial and the current potential function values, we give each cluster two potential arrays, $\phi^{\mathrm{initial}}_X$ and $\phi^{\mathrm{current}}_X$. The first array is used to store initial potential values, and the second one is used in later stages for belief propagation. Each array has a size of $2^n$, where $n$ is the number of variables in the cluster. Similarly, we also give each sepset two potential arrays.
Step (3) Encoding evidence.
After initialization, we need to extract the information of the observed nodes (evidence) and encode this information into the cluster potentials.
• Extract the information of the evidence. For example, the first input assignment is $(A, B) = (0, 0)$, its output is $F = 0$, and the conditional probability we want to infer is $P(F=0 \mid A=0, B=0)$. In this example, the evidence is $\{A = 0, B = 0\}$, and what we need to calculate is the marginal potential $\phi(F=0)$.
• For each cluster X, let
It may seem more reasonable to put this step in the initialization stage. However, the initialization only needs to be executed once during the whole SER calculation, while this step needs to be executed repeatedly for each input. Therefore, we put it in Step 3.
• Encode each observed node V as a likelihood vector V . For example, V = 0 when V = 1 0 ; otherwise V = 1 when V = 0 1 . We then find a cluster X that contains V and multiply potential array After initializing the joint tree potentials and encoding the evidence into the cluster potentials, we need to perform the global propagation to make them locally consistent, that is, X\S X = S for every cluster-sepset pair X S . Here, X\S = x x ∈ X x S means the subset of X after eliminating the variables of S from X.
In general, the global propagation is executed in three steps (see Ref. [19]):
• Randomly choose a cluster X.
• Unmark all clusters and call function Collect-Evidence (X).
• Unmark all clusters and call function Distribute-Evidence (X).
Here, Collect-Evidence (X) is the procedure of propagating messages in the direction toward cluster X, and Distribute-Evidence (X) is the procedure of propagating messages in the opposite direction.
In our application, however, what we are interested in is the potential of the output variable F given the observed input variables A and B. We do not have to make all clusters in the tree locally consistent. Instead, we only need to make sure that at least one cluster containing F receives the messages from all the other clusters. We can, therefore, reduce the three steps to two:
• Choose a cluster X that contains the last output signal, such as cluster DEF in our example.
• Unmark all clusters and then call function Collect-Evidence (X).
The procedure of Collect-Evidence(X) is:
(i) Mark X.
(ii) Recursively call Collect-Evidence on the unmarked neighboring clusters of X.
(iii) Pass a message from X to the cluster that invoked Collect-Evidence(X).
A single message pass in step (iii) of Collect-Evidence takes place between two adjacent clusters X and Y with sepset S.
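A common formulation of a single message pass in join-tree propagation, which we assume matches the procedure intended here, is a projection followed by an absorption. The symbol $\phi_S^{\mathrm{old}}$ is introduced only for this illustration, and the convention 0/0 = 0 is assumed.

```latex
% Projection: save the old sepset potential, then project X onto S.
\[
  \phi_S^{\mathrm{old}} \leftarrow \phi_S, \qquad
  \phi_S \leftarrow \sum_{X \setminus S} \phi_X
\]
% Absorption: rescale the receiving cluster Y by the ratio of new to old sepset potential.
\[
  \phi_Y \leftarrow \phi_Y \, \frac{\phi_S}{\phi_S^{\mathrm{old}}}
\]
```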
Note that, in this process of belief propagation, a cluster passes a message to a neighbor only after it has received messages from all other neighbors. Next, we will explain the belief propagation process using our example.
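The recursion can be sketched in Python as below. This is a sketch under assumptions, not the implementation used in our experiments: the three-cluster chain ABD, DE, EF (for a hypothetical circuit D = NAND(A, B), E = INV(D), F = INV(E)), the 10% gate error rate, the projection/absorption form of a message pass, and all function names are illustrative.

```python
from itertools import product

def table(variables, fn):
    """Potential over binary variables: maps each assignment tuple to fn(assignment)."""
    return {a: fn(dict(zip(variables, a))) for a in product((0, 1), repeat=len(variables))}

def project(variables, pot, keep):
    """Sum a cluster potential down to the variables listed in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for a, p in pot.items():
        key = tuple(a[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def absorb(variables, pot, keep, new_sep, old_sep):
    """Multiply a cluster potential by new_sep / old_sep, treating 0/0 as 0."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for a, p in pot.items():
        key = tuple(a[i] for i in idx)
        out[a] = p * (new_sep[key] / old_sep[key] if old_sep[key] else 0.0)
    return out

def collect_evidence(tree, x, caller=None, marked=None):
    """Propagate messages toward the cluster on which the first call was made."""
    marked = set() if marked is None else marked
    marked.add(x)                                   # (i) mark X
    for y in tree["neighbors"][x]:                  # (ii) recurse on unmarked neighbors
        if y not in marked:
            collect_evidence(tree, y, caller=x, marked=marked)
    if caller is not None:                          # (iii) pass a message X -> caller
        edge = frozenset((x, caller))
        sep_vars = tree["sepsets"][edge]
        new = project(tree["vars"][x], tree["pot"][x], sep_vars)
        tree["pot"][caller] = absorb(tree["vars"][caller], tree["pot"][caller],
                                     sep_vars, new, tree["sep_pot"][edge])
        tree["sep_pot"][edge] = new

EPS = 0.1  # assumed per-gate error rate

def noisy(correct, out):
    """P(out) for a gate whose fault-free output is `correct`."""
    return 1.0 - EPS if out == correct else EPS

tree = {
    "vars": {"ABD": ("A", "B", "D"), "DE": ("D", "E"), "EF": ("E", "F")},
    "neighbors": {"ABD": ["DE"], "DE": ["ABD", "EF"], "EF": ["DE"]},
    "sepsets": {frozenset(("ABD", "DE")): ("D",), frozenset(("DE", "EF")): ("E",)},
}
tree["pot"] = {
    # The evidence A = 0, B = 0 is already multiplied into the ABD potential.
    "ABD": table(("A", "B", "D"),
                 lambda v: noisy(1 - (v["A"] & v["B"]), v["D"]) * (v["A"] == 0) * (v["B"] == 0)),
    "DE": table(("D", "E"), lambda v: noisy(1 - v["D"], v["E"])),
    "EF": table(("E", "F"), lambda v: noisy(1 - v["E"], v["F"])),
}
tree["sep_pot"] = {edge: table(sep, lambda v: 1.0) for edge, sep in tree["sepsets"].items()}

collect_evidence(tree, "EF")  # EF is the cluster containing the output F
p_f = project(tree["vars"]["EF"], tree["pot"]["EF"], ("F",))
```

In this hypothetical chain, ABD sends its message to DE before DE sends to EF, so each cluster passes a message only after hearing from its other neighbors. With the assumed values, `p_f` is approximately 0.244 for F = 0 and 0.756 for F = 1.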
Cluster DEF is chosen since it is the only cluster that contains the last output signal. Once the messages have been collected, cluster DEF has obtained all the information from the other two clusters. We can then marginalize its potential array to obtain the final result.
Step (5) Marginalization.
Once we have the consistent cluster potential current $\phi_{DEF}$, we can calculate the potential $\phi(F = 0)$, which is $P(F = 0 \mid A = 0, B = 0)$ in our application, by marginalization. That is, we compute $\phi(F = 0) = \sum_{D,E} \text{current } \phi_{DEF}\big|_{F=0}$, which corresponds to the outer summation over variables D and E in Eq. (6).
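A minimal sketch of this marginalization over a flat potential array is given below. The numerical values of the array are made up purely for illustration, and the variable ordering (D, E, F) and the function name `marginal` are assumptions.

```python
from itertools import product

def marginal(variables, table, var, value):
    """Sum a flat cluster potential over all assignments in which var == value."""
    pos = variables.index(var)
    assignments = product((0, 1), repeat=len(variables))
    return sum(p for assign, p in zip(assignments, table) if assign[pos] == value)

# Hypothetical potential over (D, E, F); the numbers are illustrative only.
variables = ("D", "E", "F")
current_def = [0.02, 0.08, 0.05, 0.05, 0.10, 0.60, 0.03, 0.07]

phi_f0 = marginal(variables, current_def, "F", 0)  # sums over D and E with F = 0
phi_f1 = marginal(variables, current_def, "F", 1)  # sums over D and E with F = 1
```

If the potentials are not already normalized, dividing `phi_f0` by `phi_f0 + phi_f1` gives the conditional probability.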
Step (6) Handle next input.
After we finish the calculation for the first input assignment AB = 00, we move to the next input assignment AB = 01, obtain the corresponding output (F = 1), and repeat Steps 3 to 5 to calculate $P(F = 1 \mid A = 0, B = 1)$. This process continues until we reach the last input assignment (AB = 11 in our example). Finally, we use Eq. (3) to compute the circuit's SER.
Note that the first step (Graphical transformation) and the second step (Initialization) only need to be executed once when calculating the SER value, while Step 3 (Encoding evidence) through Step 5 (Marginalization) need to be repeated for each input assignment.
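The overall control flow can be sketched as follows. Several things here are assumptions made only for illustration: the circuit (D = NAND(A, B), E = INV(D), F = INV(E)), the 10% gate error rate, a brute-force enumeration standing in for Steps 3 to 5, and the final averaging over equally likely inputs, which may differ from the exact form of Eq. (3).

```python
from itertools import product

EPS = 0.1  # assumed per-gate error rate (hypothetical value)

def gates():
    """Hypothetical circuit: D = NAND(A, B), E = INV(D), F = INV(E)."""
    return [
        ("D", lambda v: 1 - (v["A"] & v["B"])),
        ("E", lambda v: 1 - v["D"]),
        ("F", lambda v: 1 - v["E"]),
    ]

def output_distribution(inputs, circuit):
    """Stand-in for Steps 3 to 5: P(F | inputs) by enumerating all gate outputs."""
    dist = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(circuit)):
        state = dict(inputs)
        prob = 1.0
        for (name, fn), val in zip(circuit, values):
            # A gate produces its fault-free value with probability 1 - EPS.
            prob *= 1.0 - EPS if val == fn(state) else EPS
            state[name] = val
        dist[state["F"]] += prob
    return dist

def fault_free_output(inputs, circuit):
    """Output F of the circuit when no gate fails."""
    state = dict(inputs)
    for name, fn in circuit:
        state[name] = fn(state)
    return state["F"]

circuit = gates()                          # Steps 1 and 2: executed once
error_probs = []
for a, b in product((0, 1), repeat=2):     # Step 6: iterate over input assignments
    inputs = {"A": a, "B": b}
    expected = fault_free_output(inputs, circuit)
    dist = output_distribution(inputs, circuit)   # Steps 3 to 5, repeated per input
    error_probs.append(1.0 - dist[expected])

# Assumption: inputs are equally likely, so the overall error rate is a plain average.
ser = sum(error_probs) / len(error_probs)
```

The point of the sketch is the loop structure: the setup is done once outside the loop, while the per-input inference is repeated for AB = 00, 01, 10 and 11.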
COMPARISONS WITH OTHER WORKS
Verifying the Proposed Behavior Evaluation Method
We have evaluated the model's accuracy via HSPICE simulations. In our experiments, we simulated erroneous gates with a specific error rate (for example, 10%) by varying the $V_{DD}$ and GND signals (the power supply and ground voltages). The following describes the validation process (please refer to Ref. [14] for the details). We used external control signals instead of $V_{DD}$ and GND, as shown in Figure 13. As a result, we can control the error rate of various gates by varying the external signals. For example, to simulate an inverter with an error rate of 10%, we create two control signals. The first signal, 'ctrl1', takes a random value: it is at logic '0' 10% of the time and at logic '1' the remaining 90% of the time. The second signal, 'ctrl2', is the complement of 'ctrl1' (the logical NOT of 'ctrl1', or '1-ctrl1'). The first signal is used in place of $V_{DD}$, whereas the second replaces GND. These two signals switch concurrently. The comparison between conventional logic gates and ours is illustrated in Figure 13 (here, we show only an inverter and a NAND gate as examples; the same approach can be applied to other logic gates).
We used an HSPICE toolbox for MATLAB developed by Silicon Laboratories, Inc (http://www-mtl.mit.edu/~perrott). We used this toolbox to analyze the simulation results generated by HSPICE and MATLAB, and then calculated the actual error rate of individual circuits. The simulation consists of four steps: (1) Using MATLAB to generate two sets of control signals, 'ctrl1' and 'ctrl2', with the designated error rate. The data is later used in HSPICE. ... averaged results and comparing them with the theoretical error rate values calculated according to the proposed BP method.
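The generation of the control signals in step (1) can be sketched as below. The sketch is written in Python rather than MATLAB, and the function name, sample count, and fixed random seed are assumptions made for illustration.

```python
import random

def make_control_signals(error_rate, n_samples, seed=0):
    """Generate 'ctrl1' (logic 0 with probability error_rate) and its complement 'ctrl2'."""
    rng = random.Random(seed)  # fixed seed so the sequence is reproducible
    ctrl1 = [0 if rng.random() < error_rate else 1 for _ in range(n_samples)]
    ctrl2 = [1 - bit for bit in ctrl1]  # switches concurrently with ctrl1
    return ctrl1, ctrl2

ctrl1, ctrl2 = make_control_signals(0.10, 10_000)
observed_rate = ctrl1.count(0) / len(ctrl1)  # fraction of samples at logic 0
```

With enough samples, `observed_rate` should be close to the designated 10% error rate.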
Our Approach Compared with the EDM Method
We compare the performance of the proposed algorithm with the EDM model using a ripple carry adder. The results are shown in Figures 14 and 15. The comparison covers two aspects: memory requirement and CPU time. The memory requirement is given in MB, while CPU time is represented by the total number of multiplication and addition operations. These are the two major criteria for measuring the load of SER computation. From Figure 14, we can see that the memory cost of the EDM model increases drastically, reaching about 100 GB for an 8-bit adder. This is beyond the capability of current desktop computers. On the other hand, the memory cost of the proposed algorithm is very small (<1 MB) for adders ranging from 1 bit to 16 bits. Figure 15 shows that the proposed method is also much more efficient than the EDM method in terms of computational complexity.
Comparison Between Our Method and the BDD Method
The matrix model was improved in Ref. [15] to handle larger circuits more efficiently via the Binary Decision Diagram (BDD) approach. In Ref. [2], the authors demonstrated the simulation results using LGSynth91 and LGSynth93 benchmark circuits. To compare our method with the BDD method, we chose two sample benchmark circuits, a decoder and a pm1 circuit. Their schematics are shown in Figure 17. The comparison between the BDD method and ours is shown in Table III. Our method shows significant advantages over the BDD method in terms of both CPU time and memory requirement. For the decoder circuit, the CPU time is reduced by 99.9% and the memory requirement by 93%. For the pm1 circuit, the CPU time is reduced by 98% and the memory requirement by 99%.
CONCLUSIONS AND FUTURE WORK
Hardware and signal faults are unavoidable in nanoscale devices. Various fault-tolerant architectures have been proposed to build fault-tolerant nanoscale systems from fault-prone components. Reliability is achieved at the cost of redundancy, in the form of structural redundancy, temporal redundancy, or coding mechanisms. Because different topologies can be used to achieve the same logic function, we need an efficient way to tell which topology is the most fault-tolerant. The contribution of our paper lies in the systematic evaluation of fault tolerance across different designs.
In this paper, based on a performance analysis of the EDM model, we propose a novel evaluation method based on the Belief Propagation (BP) algorithm. The BP algorithm is efficient for circuit evaluation because (1) it avoids the unnecessary calculations contained in the EDM model computation, and (2) it exploits the topology of circuits and localized computation. Simulations show that this new method is more efficient in terms of CPU time and memory requirement than other methods (e.g., EDM and BDD). We have successfully applied the proposed algorithm to evaluate circuits with more than 50 input/output signals. The proposed methodology can also be easily run in parallel on multiple CPUs, further reducing computation time. In future work, we will focus on extending the algorithm to handle sequential circuits and basic memory elements, such as latches and flip-flops.
