Abstract-Two
I. INTRODUCTION
We have witnessed the prosperity of the semiconductor industry during the past decades based on Moore's law. However, as device dimensions keep shrinking to the nanoscale level, Moore's law is no longer valid. In the nanoscale world, there are two obstacles that we must overcome: hardware faults and signal faults [2], [4], [5]. Hardware faults are failures of devices and interconnections during and after manufacture. Signal faults are operation failures caused by surrounding noise when the devices operate near the thermal limit.
Different approaches have been proposed to deal with these faults. In Refs. [6] and [7], researchers used extra circuit elements to supplement failed devices and connections. In Ref. [8], Von Neumann used majority logic gates as a primitive building block and randomizing networks to prevent clusters of failures from overwhelming the fault tolerance of the majority logic. In Ref. [9], demultiplexer-based error-correcting codes were utilized to deal with stuck-open defects. These approaches, however, are technology-specific, and some of them can handle only specific types of faults. The challenge is to handle both hardware faults and signal faults simultaneously, given that we have no prior knowledge of when and where these faults might occur. To achieve this goal, a probabilistic model and circuit design has been proposed: the Markov random field design [2], [10], [11].
The Markov random field (MRF) model was proposed initially in Ref. [2] and extended in Refs. [10] and [11]. MRF was developed to optimize a set of random variables so that their overall joint probability is a global maximum. Based on this idea, MRF models for logic gates were developed. Once the fundamental MRF building blocks such as NOT and NAND are created, we can use them to build complicated circuits, and these circuits will be able to tolerate both hardware faults and signal faults. However, fault-tolerant nanoscale circuits modeled by the MRF model are complicated and are not suitable for CAD tool development. We need to simplify the model to make it practical.
The ensemble-dependent matrix (EDM) model was first introduced in Ref. [3] and extended in Ref. [1]. The matrix elements represent input-output transition probabilities, with the column indices of the matrix representing the input values and the row indices representing the output values. Based on the matrices of the elementary gates and the way the gates are connected, we can obtain the ensemble matrix for any combinational logic circuit. Using the ensemble-dependent matrices, we can calculate the bit error rate (BER) values of circuits and use these values to evaluate their fault-tolerance capability. We also observe that the condition number (or eigenvalue ratio) of the matrix $A^{*}A$, formed from the ensemble-dependent matrix $A$, relates to the fault-tolerance performance of the circuit. With this eigenvalue analysis, we expect to find criteria that can act as general guidelines in designing fault-tolerant nanoscale circuits.
In Section II, we verify the accuracy of the ensemble-dependent matrix model by using HSPICE and MATLAB. In Section III, we show that a better fault-tolerant circuit corresponds to a delta matrix with a smaller trace. We then provide an analytical proof in Section IV that the matrix model and the MRF model converge for digital circuits. We conclude the paper in Section V.
II. VERIFYING ENSEMBLE DEPENDENT MATRIX MODEL
The ensemble-dependent matrix model uses input-output transition probabilities instead of the conventional truth table to describe circuits. For example, with the columns ordered by the input values 00, 01, 10, 11 and the rows by the output values 0, 1, the ensemble-dependent matrix of a NAND gate can be expressed as

$$A_{\mathrm{NAND}} = \begin{pmatrix} ic & ic & ic & c \\ c & c & c & ic \end{pmatrix},$$

where $c$ stands for its correct operation probability, and $ic = 1 - c$ is the incorrect probability. The column indices of the matrix represent input values, while the row indices represent output values.
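As an illustration of the convention just described, the following minimal sketch builds such a matrix from a gate's truth table. The function name `edm_from_truth_table` and the column ordering 00, 01, 10, 11 are assumptions made for this example; the source does not prescribe either.

```python
import numpy as np

def edm_from_truth_table(truth_table, c):
    """Build a 2 x N matrix of input-output transition probabilities.

    truth_table: ideal output (0 or 1) for each input combination,
                 listed in column order (assumed here: 00, 01, 10, 11).
    c:           probability that the gate operates correctly.
    """
    ic = 1.0 - c  # probability of incorrect operation
    # Start with every entry equal to ic, then place c on the ideal output row.
    matrix = np.full((2, len(truth_table)), ic)
    for col, ideal_out in enumerate(truth_table):
        matrix[ideal_out, col] = c
    return matrix

# NAND outputs 1 for inputs 00, 01, 10 and 0 for input 11.
nand = edm_from_truth_table([1, 1, 1, 0], c=0.9)
print(nand)
```

Each column sums to one, because every input combination must produce either output 0 or output 1.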
We can evaluate the fault-tolerance capability of each circuit by using the EDM model. Take the four circuits shown in Fig. 1 as an example. Although these circuits implement the same logic function, their bit error rates (BER) differ, as shown in Fig. 2. The BER plot shows that circuit (a) has the worst performance, while circuit (c) has the best fault-tolerance capability. Circuits (b) and (d) have comparable performance.
The next logical question is: how accurate is this model in describing the performance of actual circuits? To answer this question, we use circuit simulation software (HSPICE) to check whether the simulation results match the BER values we calculated. Before performing the simulation, we have to resolve two issues:
i) How to simulate various types of logic gates with a given gate error probability? (Note: this given gate error probability is an approximation of the faults.)
ii) How to analyze the HSPICE simulation outputs and obtain the actual error rate for each circuit?
Fig. 2. Error probability of various circuits.
To solve the first issue, we use external control signals in place of the conventional VDD and GND, as shown in Fig. 3. As a result, we can control the error rate of various gates by controlling the external signals. For example, to simulate an inverter with an error rate of 10%, we create two control signals. The first signal, 'ctrl1', is a random signal that is at logic '0' 10% of the time and at logic '1' the remaining 90% of the time. The second signal, 'ctrl2', is the complement of 'ctrl1'. The first signal is used in place of VDD, whereas the second replaces GND. These two signals switch concurrently. The comparison between conventional logic gates and ours is illustrated below (here, we show only an inverter and a NAND gate as examples; the same approach can be applied to other logic gates).
To solve the second issue, we use an HSPICE toolbox for MATLAB developed by Silicon Laboratories, Inc. (Http://www-mtl.mit.edu/~perrott). We use this toolbox in MATLAB to analyze the simulation results generated by HSPICE and then calculate the actual error rate of individual circuits. The simulation consists of four steps:
1) Use MATLAB to generate two sets of control signals, 'ctrl1' and 'ctrl2', with the designated error rate. The data is later used in HSPICE.
2) Use HSPICE to simulate the output of individual circuits with the control signals created in the first step.
3) Use MATLAB to analyze the output data from HSPICE and calculate the error rate of the various circuits.
4) Repeat the previous three steps several times, recording the actual error rate of each circuit during each run. Average the results and compare them with the theoretical BER values calculated according to the matrix method.
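The four steps above can be mimicked at the logic level with a short Monte Carlo sketch. This is not the HSPICE/MATLAB flow used in the paper: it replaces the transistor-level simulation with an idealized inverter whose output is flipped whenever 'ctrl1' is at logic '0', and all function names are hypothetical.

```python
import numpy as np

def make_control_signals(n_samples, error_rate, rng):
    """Step 1: ctrl1 is 0 for about `error_rate` of the samples; ctrl2 is its complement."""
    ctrl1 = (rng.random(n_samples) >= error_rate).astype(int)
    ctrl2 = 1 - ctrl1
    return ctrl1, ctrl2

def faulty_inverter(x, ctrl1):
    """Step 2 (idealized): with ctrl1 = 1 the gate inverts correctly;
    with ctrl1 = 0 the supply rails are swapped and the output is wrong."""
    ideal = 1 - x
    return np.where(ctrl1 == 1, ideal, 1 - ideal)

def measured_error_rate(n_samples, error_rate, seed):
    """Step 3: fraction of samples whose output differs from the ideal output."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n_samples)
    ctrl1, _ = make_control_signals(n_samples, error_rate, rng)
    out = faulty_inverter(x, ctrl1)
    return float(np.mean(out != 1 - x))

# Step 4: repeat several runs and average the measured error rates.
rates = [measured_error_rate(100_000, 0.10, seed) for seed in range(5)]
average_rate = float(np.mean(rates))
print(rates, average_rate)
```

For a single gate the averaged rate should sit close to the designated 10%; for multi-gate circuits each gate would need its own independent pair of control signals.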
In the initial simulation, we use the four circuits shown in Fig. 1. We choose incorrect probabilities ic = 10% and ic = 30%, respectively, to verify whether actual circuits generate the same BER as indicated in Fig. 2.
BER values calculated from the matrix models
Fig. 5. Gate error rate p = 30%.
III. CRITERION TO COMPARE ERROR-TOLERANCE CAPABILITY
We have shown how the way circuit components are interconnected (the circuit topology) can affect fault-tolerance performance (refer to the sample circuits in Fig. 1 and the BER in Fig. 2). Our goal, however, is to find a general design guideline so that computer-aided design tools can generate an optimal fault-tolerant circuit. In Ref. [1], different criteria based on the ensemble-dependent matrix model, such as the BER value and the eigenvalue ratio, were used to evaluate the fault-tolerance capability of different circuits. Here we present another criterion.
Every circuit has a desired ensemble-dependent matrix $A$. Suppose the actual matrices of the four circuits are $A_1$, $A_2$, $A_3$ and $A_4$ in the presence of signal errors. We define the delta matrices as $|A_i - A|$, $i = 1, \ldots, 4$. We can show that the trace of the delta matrix reflects the energy of the erroneous outputs and is proportional to the BER. Therefore, the smaller the trace, the more fault-tolerant the circuit, as shown in Fig. 6. In addition to the eigenvalue ratio, the trace of the delta matrix is another indicator for optimal fault-tolerant circuit design.
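A small numerical sketch may help make the criterion concrete. The source does not say how the trace is formed when the delta matrix is not square, so this example assumes the trace of $\Delta^{T}\Delta$ (the sum of squared entries, one reading of the "energy" of the errors). The single NAND gate, its column ordering, and the function names are likewise assumptions for illustration only.

```python
import numpy as np

def nand_edm(c):
    """NAND transition matrix; columns are inputs 00, 01, 10, 11 (assumed order)."""
    ic = 1.0 - c
    return np.array([[ic, ic, ic, c],
                     [c,  c,  c,  ic]])

def delta_trace(actual, desired):
    """Trace of delta^T delta, where delta = |actual - desired| (assumed definition)."""
    delta = np.abs(actual - desired)
    return float(np.trace(delta.T @ delta))

desired = nand_edm(1.0)                          # error-free gate
trace_10 = delta_trace(nand_edm(0.9), desired)   # 10% gate error
trace_30 = delta_trace(nand_edm(0.7), desired)   # 30% gate error
print(trace_10, trace_30)
```

Under these assumptions the lower-error gate yields the smaller trace, in line with the trend stated above.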
IV. RELATIONSHIP BETWEEN THE ENSEMBLE MATRIX MODEL AND THE MRF MODEL
In Ref. [2], another probabilistic Markov model is proposed for hardware faults and signal faults. Given that logic signals in digital circuits are '0' and '1', we can prove that the ensemble-dependent matrix model and the Markov random field (MRF) model are the same. To prove this, we first need to build a general logic circuit model. As mentioned previously, there are three interconnection topologies for combinational logic gates: serial, parallel and fanout. From this perspective, we can build our logic circuit model as follows. Fig. 7 shows a general logic circuit, where 'IN' denotes the inputs and 'OUT' denotes the outputs. A combinational circuit can generally be divided into many sub-stages, S1, S2, …, S(n).
As shown in Fig. 7, different stages are connected in a serial manner. Within each stage, gates can be connected in a parallel or in a 'fanout' fashion. For gates within each stage, we only need to consider basic gates such as inverter, NAND, NOR, AND and OR, because other gates can be constructed from these building blocks. For consistency and simplicity in our matrix computation, we use the identity matrix eye(2) to describe a topology in which a logic signal is transferred directly across a stage. There can also be 'fanout' in each stage, in which a single logic output is connected to several gates. Here $t_0, t_1, \ldots, t_{n-1}$ are the numbers of inputs to each stage, and $t_n$ is the number of outputs. From our ensemble-dependent matrix model, each stage can be represented by a matrix. Suppose that these matrices are $A_1, A_2, \ldots, A_n$, respectively. The matrix of the whole circuit is then

$$A(i,j) = \left(A_n A_{n-1} \cdots A_1\right)(i,j). \qquad (1)$$
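The stage-by-stage construction described above can be sketched numerically. This is a minimal illustration under stated assumptions: serial connection is modeled as a matrix product (later stages multiply on the left, since columns index inputs and rows index outputs), parallel gates within a stage as a Kronecker (tensor) product, and a pass-through wire as the 2 x 2 identity. Fanout (the column deletion denoted F in the source) is not modeled, and the helper names are hypothetical.

```python
import numpy as np

def inverter_edm(c):
    """Inverter: columns are inputs 0, 1; rows are outputs 0, 1."""
    ic = 1.0 - c
    return np.array([[ic, c],
                     [c,  ic]])

def nand_edm(c):
    """NAND: columns are inputs 00, 01, 10, 11 (assumed order); rows are outputs 0, 1."""
    ic = 1.0 - c
    return np.array([[ic, ic, ic, c],
                     [c,  c,  c,  ic]])

def serial(*stages):
    """Compose stages given in signal order S1, S2, ..., Sn as A_n ... A_2 A_1."""
    total = stages[0]
    for stage in stages[1:]:
        total = stage @ total
    return total

def parallel(*gates):
    """Gates side by side in one stage: Kronecker product of their matrices."""
    total = gates[0]
    for gate in gates[1:]:
        total = np.kron(total, gate)
    return total

c = 0.9
and_gate = serial(nand_edm(c), inverter_edm(c))  # NAND followed by an inverter
stage = parallel(nand_edm(c), np.eye(2))         # NAND beside a pass-through wire
print(and_gate)
print(stage.shape)
```

For input 11 the composed AND gate outputs 1 with probability c² + ic², since the output is correct when both gates work or both fail.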
From the MRF model, the marginal probability of inputs and outputs is given by (2), which follows from the Markovian property.
Comparing (1) with (2), we can see that the left-hand sides of both equations give the transition probability from the j-th input of the first stage to the i-th output of the last stage. Next, we prove that these transition probabilities are the same, and thus that the two models (the EDM design and the MRF design) match each other.
Let us look at the right-hand sides of (1) and (2).
F represents the deletion of some columns in the tensor product by considering fanouts. As a result,
and it equals 0 when the probability from the j-th input to the i-th output at stage k is zero. Here, u is the number of gates that operate incorrectly, and v is the number of gates that operate correctly.
If the probability from the j-th input to the i-th output at stage k is zero,
where , , 
V. CONCLUSION
Most methodologies proposed so far to increase the fault-tolerance capability of nanoscale circuits are technology-specific. The two probabilistic models discussed here, however, are independent of the fabrication technology because they are built from the system perspective. In this paper, we verify that circuit errors can be described using the ensemble-dependent matrix model. We show that the BER values calculated directly from the matrices are the same as those obtained from actual circuit simulations. We are working on mapping the design onto a real nanoscale chip to verify our claims and on testing our models on benchmark circuits. We also prove that the MRF model and the EDM model are the same in terms of error description in digital circuits. In future research, we will use the criteria we found as guidelines for designing optimal fault-tolerant circuits. We will also explore methods for applying the matrix model to sequential circuits.
Dr. Jie Chen acknowledges support from the Discovery grant of Natural Sciences and Engineering Research Council of Canada. The authors would like to thank Tingting Liu for valuable discussion about the analytical proof.
