Advances in VLSI technology have dramatically improved the performance of integrated circuits, but they also pose new reliability challenges: integrated circuits have become more susceptible to soft errors. It is therefore imperative to study circuit reliability in the presence of soft errors. This paper implements three probabilistic methods (two pass, error propagation probability, and probabilistic transfer matrix) for estimating gate-level circuit reliability on a PC. The functions and performance of these methods are compared experimentally using ISCAS85 and 74-series circuits.
Introduction
Soft errors arise from single event upsets (SEUs), which are caused by energetic particles such as neutrons and alpha particles. Device scaling and increasing integration density make integrated circuits more and more susceptible to soft errors.
Conventional integrated circuit reliability evaluation is oriented toward chip manufacturing. Reliability is obtained from tests such as environmental tests and life tests. However, as the reliability of VLSI circuits improves, serious drawbacks of these approaches, such as the large number of samples required, high cost, and long test duration, have been exposed [1]. To address these problems, high-level reliability evaluation was proposed. High-level methods rely on simulation-based fault injection at the system level, register transfer level (RTL), and gate level. They can be used to identify the most sensitive parts of a circuit during the design stage.
Recently, circuit reliability estimation in the presence of soft errors has often relied on the soft error rate (SER). So far, SER estimation can be carried out at the gate level and at the transistor level. Gate-level probabilistic models are built on the types of gates and the topology of their interconnections. These methods were proposed based on signal reliability theory [2]. Typical methods are the two pass (TP) method [3], the error propagation probability (EPP) method [4], and the probabilistic transfer matrix (PTM) method [5]. Transistor-level SER estimation methods consider physical factors of the device. They generally adopt simulation-based fault injection [6] and statistical analysis (such as the Qcrit method [7]). Tools such as ROBAN [8], SEUTool [9], and XDRS [10] are often used to simulate circuit behavior at the transistor level. Faults are injected by adjusting the signal pulse, voltage, or current.
Compared with transistor-level methods, gate-level methods are time-saving, more accurate, and easy to understand and use. This paper therefore concentrates on probabilistic methods at the gate level.
In the presence of soft errors, circuit reliability estimation often relies on SER. Based on SER, two estimation measures are used: failure probability and reliability. Failure probability is the SER of those primary outputs (POs) of the circuit that are affected by an internal soft error propagating to the POs, while reliability is the probability that the whole circuit functions correctly.
The paper is organized as follows. Some concepts of signal reliability are presented in Section 2. A brief description of the TP, EPP, and PTM methods is provided in Section 3. The implementation of the three methods is introduced in Section 4. A detailed experimental comparative study is given in Section 5. The paper is concluded in Section 6.
Signal Reliability
Signal reliability was proposed as a measure of logic circuit reliability in the 1970s. Ogus [2] showed that a (hard) fault in a combinational circuit does not always cause an incorrect output. For example, if one input of a 2-input AND gate is logic 0, a stuck-at-0 fault on the other input line does not cause an erroneous output value. Given the probability of faults occurring in a circuit and the probabilities of the applied input combinations, the likelihood that the circuit output is correct can be determined.
It is assumed that all input combinations occur with the same probability. That is, if the circuit has $n$ inputs, the probability of every input combination is $1/2^n$. Table 1 summarizes some lemmas for calculating the signal probability of a circuit, where $X_i$ denotes the $i$th signal and $x_i$ is the probability that $X_i = 1$.
Table 1 Summary of probabilistic model lemmas (columns: logical function, signal probability, assumptions)
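The kind of lemma summarized in Table 1 can be illustrated with the standard signal probability rules for basic gates. The sketch below is only an illustration: the function names are hypothetical, and the rules assume that the gate inputs are statistically independent, which does not hold in general for circuits with reconvergent fanout.

```python
from math import prod


def sp_not(x):
    """Probability that NOT X is 1, given P(X = 1) = x."""
    return 1.0 - x


def sp_and(*xs):
    """Probability that an AND of independent signals is 1."""
    return prod(xs)


def sp_or(*xs):
    """Probability that an OR of independent signals is 1."""
    return 1.0 - prod(1.0 - x for x in xs)


def sp_xor(x1, x2):
    """Probability that the XOR of two independent signals is 1."""
    return x1 + x2 - 2.0 * x1 * x2


# Example: Y = (A AND B) OR C with every input equal to 1 with probability 0.5.
y = sp_or(sp_and(0.5, 0.5), 0.5)
print(y)
```

With equiprobable input combinations, every primary input has $x_i = 0.5$, so the signal probability of any internal line follows by applying these rules gate by gate from the inputs toward the outputs.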
Circuit Reliability Estimation Methods
TP method
The model is based on the susceptibility tables of primitive logic gates. For example, Table 2 shows the susceptibility table of a 2-input NAND gate, where $P_s$ is the probability that a soft error hits an individual node.
Table 2 Susceptibility table of the 2-input NAND gate (column: Prob. of out=1)
This model provides a two-pass algorithm for analyzing the circuit. It is summarized in pseudo-code form below:
By repeating the two-pass algorithm for all input patterns, the model calculates a soft error probability for any output node. The total error probability for an output node can be expressed as
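As a rough illustration only, and not the pseudo-code of [3], the two passes might be sketched as follows. The sketch assumes a hypothetical two-gate circuit built from 2-input NAND gates without reconvergent fanout, that every node (primary input or gate output) is hit independently with probability $P_s$, that a hit inverts the node value, and that all input patterns are equally likely. All names are invented for the example.

```python
from itertools import product

# Hypothetical circuit: out = NAND(NAND(a, b), c), gates listed in topological order.
PRIMARY_INPUTS = ("a", "b", "c")
GATES = {"n1": ("a", "b"), "out": ("n1", "c")}  # every gate is a 2-input NAND


def nand(x, y):
    return 1 - (x & y)


def node_error_probs(pattern, p_s):
    """Probability that each node carries a wrong value for one input pattern."""
    # Pass 1: fault-free logic values for this input pattern.
    value = dict(zip(PRIMARY_INPUTS, pattern))
    for gate, (u, v) in GATES.items():
        value[gate] = nand(value[u], value[v])

    # Pass 2: propagate error probabilities from the inputs to the outputs.
    err = {pi: p_s for pi in PRIMARY_INPUTS}
    for gate, (u, v) in GATES.items():
        wrong_from_inputs = 0.0
        for flip_u, flip_v in product((0, 1), repeat=2):
            prob = (err[u] if flip_u else 1 - err[u]) * (err[v] if flip_v else 1 - err[v])
            if nand(value[u] ^ flip_u, value[v] ^ flip_v) != value[gate]:
                wrong_from_inputs += prob
        # A hit on the gate output node itself inverts whatever value arrives.
        err[gate] = wrong_from_inputs * (1 - p_s) + (1 - wrong_from_inputs) * p_s
    return err


def total_error_probability(output, p_s):
    """Average the per-pattern error probability over all 2^n input patterns."""
    patterns = list(product((0, 1), repeat=len(PRIMARY_INPUTS)))
    return sum(node_error_probs(p, p_s)[output] for p in patterns) / len(patterns)


print(total_error_probability("out", 0.01))
```

The outer loop over all input patterns is what makes the cost grow as $2^n$ with the number of inputs.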
EPP method
It is an error propagation probability computation approach for estimating the failure of a synchronous sequential circuit due to soft errors at the logic level. It uses the signal probabilities of all nodes in the combinational part and then computes EPPs based on the topological structure of the circuit. An error source node $n_i$ is assumed, and the paths from $n_i$ to its reachable POs and flip-flop inputs are set up. Four kinds of signal probability are defined: $P_e$ ($P_{\sim e}$) is the probability of the output being $e$ ($\sim e$), where $\sim e$ denotes the inverted value of $e$, and $P_0$ ($P_1$) is the probability of the output being 0 (1). Nodes that are not on these paths are not affected, so they have only $P_0$ and $P_1$.
The following algorithm shows how to extract and traverse all paths from a given error site to all reachable outputs and how to apply the propagation probability rules. Table 3 shows an example of the rules. For any PO or flip-flop $FF_j$, the main algorithm includes three procedures:
(1) Path construction: Extract all signals and gates from the error source node $n_i$ to any reachable PO or flip-flop.
(2) Ordering: Levelize the signals on these paths using the topological sorting algorithm.
(3) Propagation probability computation: Traverse the paths in order and apply the propagation rules to compute the probabilities for each on-path node.
As an example, Table 3 shows the error propagation calculation rules for an AND gate.
Table 3 Computing probability at the output of an AND gate in terms of its inputs
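The AND-gate rules can be sketched in code as follows. This is a minimal illustration that assumes the gate inputs are statistically independent; the `Sig` tuple and the function name are hypothetical. The output is $e$ when every input is either 1 or $e$ and at least one of them is $e$, and likewise for $\sim e$.

```python
from collections import namedtuple
from math import prod

# p1, p0: probability of an error-free 1 or 0;
# pe, pne: probability of carrying e or its inverted value.
Sig = namedtuple("Sig", "p1 p0 pe pne")


def and_gate(inputs):
    """Propagate the four probabilities through an AND gate (independent inputs assumed)."""
    p1 = prod(s.p1 for s in inputs)
    pe = prod(s.p1 + s.pe for s in inputs) - p1
    pne = prod(s.p1 + s.pne for s in inputs) - p1
    p0 = 1.0 - (p1 + pe + pne)
    return Sig(p1, p0, pe, pne)


# An on-path input that carries the error with certainty,
# combined with an off-path input that is 1 with probability 0.5.
error_site = Sig(p1=0.0, p0=0.0, pe=1.0, pne=0.0)
off_path = Sig(p1=0.5, p0=0.5, pe=0.0, pne=0.0)
out = and_gate([error_site, off_path])
print(out)
```

In this example the error reaches the gate output only when the off-path input is 1, so half of the probability mass ends up in `pe` and the other half in `p0`.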
According to the propagation rules and the gate connections, the EPP of an erroneous value from error source $n_i$ to all reachable primary outputs is calculated. The system failure probability is calculated as
where $R_{seu}(n_i)$ is the rate at which SEUs at node $n_i$ cause a glitch at the output of the gate, and $P_{erroneous}(PO_j) = P_e(PO_j) + P_{\sim e}(PO_j)$. If $PO_j$ is a flip-flop input, the error propagation probability is equal to $P_{erroneous}(PO_j) \, P_{latched}(n_i, j)$, where $P_{latched}(n_i, j)$ is the probability that an erroneous value propagated from node $n_i$ is captured in $FF_j$; $P_{latched}(n_i, j) = 1$ if $PO_j$ is a primary output. $k$ is the number of outputs in $PO_{reachable\_from\_n_i}$ (the set of all outputs reachable from node $n_i$).
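One way to combine the quantities defined above is sketched below. It is an assumption made for illustration, not necessarily the exact expression used by the method: the $k$ reachable outputs are treated as failing independently, so the probability that at least one of them observes the error is one minus the product of the per-output survival probabilities, scaled by $R_{seu}(n_i)$. The function name and the example values are hypothetical.

```python
def failure_probability(r_seu, outputs):
    """Combine per-output propagation probabilities for one error site.

    outputs: list of (p_erroneous, p_latched) pairs, one per reachable output;
    p_latched is 1.0 for a primary output.
    Assumes the outputs fail independently of each other.
    """
    p_no_output_fails = 1.0
    for p_erroneous, p_latched in outputs:
        p_no_output_fails *= 1.0 - p_erroneous * p_latched
    return r_seu * (1.0 - p_no_output_fails)


# Hypothetical values: one primary output and one flip-flop input.
print(failure_probability(0.01, [(0.5, 1.0), (0.2, 0.5)]))
```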
PTM method
The probabilistic transfer matrix method models circuits at the logic level using a matrix representation of gates and represents the parallel composition of gates with tensor products. Figure 1 shows the matrix representation of a NAND gate, in which the probability of each output value is explicit for each input combination; $p$ is the probability that the gate produces an incorrect output for any given input. The circuit PTM is calculated by combining gate PTMs in a manner dictated by their connectivity. If a circuit consists of two gates connected in series, its PTM is the matrix product of the two gate PTMs, while if it consists of two gates connected in parallel, its PTM is the tensor product of the two gate PTMs. Thus, the reliability of a circuit $C$ can be obtained by the following expression:
where $p(i)$ is the probability of the input vector being $i$, $p(j|i)$ is the $(i, j)$-th entry of the PTM, and $I(i, j)$ is the $(i, j)$-th entry of the ideal transfer matrix (ITM).
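As an illustration of the series and parallel composition rules, the sketch below builds the PTM of a hypothetical three-gate circuit (two NAND gates in parallel feeding a third NAND gate) in plain Python. The row and column ordering of the NAND matrix, the helper names, and the reading of the reliability as the sum of $p(i)\,p(j|i)$ over the entries where $I(i, j) = 1$, with uniform $p(i)$, are assumptions made for this example.

```python
def nand_ptm(p):
    """PTM of a 2-input NAND gate; rows are inputs 00, 01, 10, 11, columns are outputs 0, 1."""
    ok, bad = 1.0 - p, p
    return [[bad, ok], [bad, ok], [bad, ok], [ok, bad]]


def matmul(a, b):
    """Series composition: ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    """Parallel composition: tensor (Kronecker) product."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def circuit_ptm(p):
    """Two NAND gates in parallel (4 inputs) feeding a third NAND gate (1 output)."""
    first_level = kron(nand_ptm(p), nand_ptm(p))  # 16 x 4
    return matmul(first_level, nand_ptm(p))       # 16 x 2


def reliability(ptm, itm):
    """Sum p(i) * p(j|i) over entries where the ITM is 1, with uniform input probabilities."""
    p_i = 1.0 / len(ptm)
    return sum(p_i * ptm[i][j]
               for i in range(len(ptm)) for j in range(len(ptm[0])) if itm[i][j] > 0.5)


itm = circuit_ptm(0.0)  # the ideal transfer matrix is the PTM with error-free gates
print(reliability(circuit_ptm(0.05), itm))
```

Even in this small example the first-level matrix already has 16 rows, which hints at how quickly the matrices grow with circuit width.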
Implementation
We implemented the three gate-level reliability estimation methods in C++ on a Dell Dimension 2400 PC (2.66 GHz Pentium IV processor and 512 MB of memory). The experimental circuits include ISCAS85 benchmark circuits and 74-series circuits with netlists in ISCAS85 format.
(1) TP algorithm. We perform a breadth-first search to find all primary inputs (PIs) that reach the selected POs. For convenience, a recursive function is used in place of an explicit stack, with similar complexity. To obtain the total error probability of the selected node, the calculation must be performed $2^n$ times, where $n$ is the number of inputs.
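The search for the PIs that reach a selected PO can be sketched as a backward breadth-first traversal of the netlist. The example below is a Python illustration rather than the C++ implementation described here; the function name is hypothetical, and the dictionary is assumed to follow the connectivity of the ISCAS85 C17 benchmark as it is commonly distributed.

```python
from collections import deque

# Gate output -> driving signals (all gates in C17 are 2-input NANDs).
C17_FANIN = {
    "10": ("1", "3"),
    "11": ("3", "6"),
    "16": ("2", "11"),
    "19": ("11", "7"),
    "22": ("10", "16"),
    "23": ("16", "19"),
}


def inputs_reaching(po, fanin):
    """Return the primary inputs in the fan-in cone of the selected primary output."""
    seen = {po}
    queue = deque([po])
    primary_inputs = set()
    while queue:
        node = queue.popleft()
        drivers = fanin.get(node)
        if drivers is None:  # no driving gate, so the node is a primary input
            primary_inputs.add(node)
            continue
        for driver in drivers:
            if driver not in seen:
                seen.add(driver)
                queue.append(driver)
    return primary_inputs


print(sorted(inputs_reaching("22", C17_FANIN)))
```

The size of the returned set is the exponent that governs how many input patterns TP has to enumerate for the selected PO.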
(2) EPP algorithm. In the first part of the EPP algorithm, we construct the directed graph G(V, E) corresponding to the combinational part of the circuit, and then extract the paths, which are stored in a sub-graph G(VI, EI); this may consume much space.
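Path construction and ordering can be sketched as a forward search from the error site followed by a topological sort of the resulting sub-graph. The Python example below is an illustration under assumptions rather than the C++ implementation described here: the function name is hypothetical, the netlist is assumed to follow the commonly distributed C17 connectivity, and Kahn's algorithm is used for the ordering.

```python
from collections import deque

# Gate output -> driving signals (assumed C17 connectivity).
C17_FANIN = {
    "10": ("1", "3"),
    "11": ("3", "6"),
    "16": ("2", "11"),
    "19": ("11", "7"),
    "22": ("10", "16"),
    "23": ("16", "19"),
}


def error_cone(site, fanin):
    """Return the on-path nodes reachable from the error site, in topological order."""
    fanout = {}
    for gate, drivers in fanin.items():
        for driver in drivers:
            fanout.setdefault(driver, []).append(gate)

    # Path construction: forward search from the error site.
    on_path = {site}
    queue = deque([site])
    while queue:
        node = queue.popleft()
        for succ in fanout.get(node, ()):
            if succ not in on_path:
                on_path.add(succ)
                queue.append(succ)

    # Ordering: Kahn's algorithm restricted to the on-path sub-graph.
    indegree = {n: sum(1 for d in fanin.get(n, ()) if d in on_path) for n in on_path}
    ready = deque(sorted(n for n in on_path if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for succ in fanout.get(node, ()):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    return order


print(error_cone("11", C17_FANIN))
```

Only the nodes in the returned list need the four on-path probabilities; every other signal keeps its ordinary signal probability.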
(3) PTM algorithm. The circuit-levelizing algorithm described in Ref. [11] is used, and algebraic decision diagrams (ADDs) are used to compress the PTMs.
Comparison of the Methods
Qualitative comparison
The PTM model assumes that errors occur at individual gates and performs simultaneous computation over all possible input combinations to obtain the reliability of the whole circuit. It does not need to map out signal dependencies in the network explicitly. The TP and EPP methods assume that an error hits an individual node (i.e., a signal terminal) and perform computation on the active paths. The TP model picks one output and performs serial computation over the input patterns. The EPP model needs a given error site to calculate the propagation probability. The PTM model accounts for the possibility of multiple soft error events, whereas the EPP model can only cope with a single soft error event. TP and PTM are consistent in essence, but PTM includes more circuit information.
As mentioned in Section 1, the PTM method computes the overall probability of correctness of a circuit, while the TP and EPP methods compute the failure probability, i.e., the probability that a node is functionally sensitized by the inputs so that the erroneous value propagates from the error site to the POs.
All three methods are applicable to combinational circuits. Because of their exponential complexity, PTM and TP can only be applied to small and medium scale circuits, while EPP can be applied to large scale circuits.
Additionally, PTM can provide an accurate measure of the susceptibility of a given gate in a circuit. Given a gate error probability $p$, it can be calculated as $R(C)/p$. Obviously, replicating a gate with high susceptibility can improve the reliability of the entire circuit.
Quantitative analysis
Functional analysis
(1) Experiments of TP. The calculated results for a 1-bit full adder are given in Table 4, where $P_s$ is the given error probability of a node. They show that the TP method can obtain the SER of one output node for different input patterns. This allows the designer to estimate soft error resiliency for specific input patterns and to identify highly susceptible inputs so that the design can be modified accordingly. In Table 5, we define the reliability of the selected POs as 1 - SER. #Node is the total number of lines in the netlist. PO is the number of the selected PO whose SER is calculated. #PI is the total number of PIs corresponding to the selected PO. The time complexity of TP is $O(2^{\#PI})$. We find that the larger #PI is, as for 74184 and 74185, the higher the CPU time and memory overheads are. Therefore, the TP method cannot be used for large scale circuits.
(2) Experiments of EPP. In Table 6, $R_{seu}$ is the given SEU occurrence rate of a node, NSN is the number of the error source node, which is selected randomly by the program, and the off-path signal probability $SP_{\text{off-path}}$ is the probability that a signal not on the propagation path is 1. $SP_{\text{off-path}}$ can be set during calculation; it is assumed that $SP_{\text{off-path}} = 0.5$ in our experiments. The complexity of the EPP algorithm is $O(n)$, where $n$ is the total number of lines and gates in the circuit. The EPP model can be applied to large scale circuits, but it is not accurate enough because both the selection of the error source node and the setting of $SP_{\text{off-path}}$ are random.
(3) Experiments of PTM. In Table 7, #PI and #PO are the numbers of PIs and POs of the circuit, respectively. Width is the largest number of signals at any level of the evaluated circuit, to which the algorithm complexity is sensitive. $p$ is the gate error probability.
The PTM model is more accurate and simpler. It provides a measure of reliability for an entire circuit. However, precisely because it contains more structural information about the circuit, it has exponential complexity. For a circuit with $n$ inputs and $m$ outputs, its PTM has space complexity $O(2^{n+m})$. Table 7 gives experimental data for only three small scale circuits; we could not obtain data for larger circuits in our experimental environment.
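A quick calculation shows how fast $2^{n+m}$ grows. The sketch below assumes a dense, uncompressed matrix with 8-byte entries, and uses the PI/PO counts usually quoted for the ISCAS85 circuits C17 (5 inputs, 2 outputs) and C432 (36 inputs, 7 outputs); the function names are hypothetical.

```python
def ptm_entries(n_inputs, n_outputs):
    """Number of entries in a circuit PTM: 2^n rows times 2^m columns."""
    return 2 ** (n_inputs + n_outputs)


def dense_ptm_bytes(n_inputs, n_outputs, bytes_per_entry=8):
    """Memory for a dense PTM, assuming 8-byte floating-point entries by default."""
    return ptm_entries(n_inputs, n_outputs) * bytes_per_entry


for name, n, m in (("C17", 5, 2), ("C432", 36, 7)):
    print(name, ptm_entries(n, m), "entries,", dense_ptm_bytes(n, m), "bytes")
```

Under these assumptions the C17 matrix fits in 1 KiB, while a dense C432 matrix would need $2^{46}$ bytes (64 TiB), far beyond the 512 MB of the test machine, which is consistent with only small circuits being measurable without further compression.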
Performance analysis
Several example circuits are used for experimental analysis of the three methods. Unfortunately, the results show that, without optimization, the TP and PTM models cannot be applied to large scale circuits. Therefore, Fig. 2 illustrates the CPU time and memory overhead only for the full adder, Schneider, and C17 circuits. For these circuits, TP is time-consuming and EPP is memory-consuming.
Conclusions
Estimating the reliability of VLSI circuits at a high level is becoming more and more important. In this paper, we make a comparative study of three logic-level reliability evaluation methods. Through qualitative comparison and quantitative analysis, we find that TP and EPP calculate the failure probability of part of a circuit, assuming that soft errors hit individual nodes, while PTM calculates the reliability of the whole circuit, assuming that soft errors hit gates. EPP can be applied to large scale circuits, while TP and PTM are suitable only for small scale circuits. The PTM method can calculate the overall correctness probability of a circuit. Its reasoning procedure is more accurate, so it deserves further study. One of the most serious problems to be solved is its memory bottleneck for large scale circuits.
