We present a technique using diverse duplication to implement concurrent error detection (CED) in sequential logic circuits. We examine three different approaches for this purpose. Our results for the simulated sequential benchmark circuits demonstrate that the third approach is the most efficient in protecting sequential logic circuits against multiple and common-mode failures. The computational complexity of the data integrity analysis of the third approach is of the same order as that of the first approach and is at least an order of magnitude less than that of the second approach.
Introduction
Concurrent Error Detection (CED) techniques are widely used for designing systems with high data integrity. By data integrity, we mean that the system either produces correct outputs or generates an error signal when incorrect outputs are produced. A duplex system in the form of a self-checking pair is a classical example of a CED scheme which has been used for guaranteeing data integrity in many applications like the IBM G5 and G6 processors [Spainhower 99]. Figure 1.1 shows the basic principle of operation of a duplex system. As long as only one module fails, a duplex system provides guaranteed data integrity. It is generally assumed that module failures are independent events; hence, in a duplex system, the probability that both modules fail is very low for realistic failure rates. However, this assumption is not always true. In a duplex system, common-mode failures (CMFs) result from failures that affect both modules at the same time, generally due to a common cause [Lala 94]. These include operational failures due to external (such as EMI, power-supply disturbances, radiation) or internal causes and design mistakes. CMFs are surveyed in [Mitra 00a].
Design diversity was proposed and used in the past to protect redundant systems against common-mode failures [Avizienis 84, Briere 93, Riter 95]. In [Avizienis 84], design diversity was defined as the independent generation of two or more software or hardware elements (e.g., program modules, VLSI circuit masks, etc.) to satisfy a given requirement. The basic idea is that, with different implementations, common failure modes will cause different error effects.
The conventional notion of diversity is qualitative and does not provide any quantitative insight into the design of diverse duplex systems. In [Mitra 99a], a metric was developed to quantify design diversity and analyze the reliability, availability and data integrity of duplex systems using this metric. In [Mitra 00b], this metric was used as a cost function to synthesize diverse implementations of combinational logic functions. However, the efforts on characterization of diverse duplex systems were focused on combinational logic circuits. In this paper, we extend our ideas to sequential logic circuits.
This work was done as part of the ROAR (Reliability Obtained by Adaptive Reconfiguration) project [Saxena 00]. In the project, the system under consideration is reconfigurable and contains user-programmable logic elements (e.g., FPGAs). For such systems, faults can be detected during system operation, the faulty part can be located, and the system can be reconfigured to operate without using the defective part. The Field Replaceable Unit (FRU) is a programmable logic block or a routing resource, instead of a chip or a board used in any conventional fault-tolerant system. Hence, it is reasonable to design combinational or sequential logic with concurrent error detection such as duplication.
In Sec. 2, we describe three approaches to designing sequential logic circuits with CED based on diverse duplication and present simulation results comparing these three schemes. Section 3 describes a technique to analyze the data integrity of sequential logic circuits with CED. We conclude in Sec. 4.
Diverse Duplication for Sequential Logic Circuits
We consider the Finite State Machine (FSM) model of sequential circuits [McCluskey 86] as shown in Fig. 2.1. In addition, we assume that faults do not affect the clock signal (not shown in Fig. 2.1).
For synthesizing diverse implementations of the next-state and the output logic, the technique in [Mitra 00b] can be used. This CED technique suffers from the problem that there is no diversity in the state encoding (i.e., the flip-flop contents). In the worst case, for a fault f affecting a flip-flop in the first implementation, a fault g affecting the corresponding flip-flop in the second implementation can be identified such that the fault pair (f, g) can never be detected by the comparator; this situation is not desirable.
Diverse State Encoding and Diverse Logic (DSEDL)
Diversity can be created by encoding the internal states of the given FSM in "different" ways in the two implementations. This provides another degree of freedom in the synthesis of FSMs with CED based on diverse duplication and can possibly help in providing enhanced protection against CMFs compared to the scheme in Fig. 2.2. This scheme is shown in Figure 2.3. Since the encoding of the internal states of the FSM is not identical in the two implementations, simple self-checking comparator designs cannot be used to check the flip-flop outputs; the comparator design can be very complex. This can degrade the capability of this technique to detect multiple failures and CMFs. The encoding of the internal states of the second implementation can be looked upon as a transformation of the encoding of the internal states of the first implementation.
Formally, if $E_1(s)$ represents the encoding of state $s$ in the first implementation, and $E_2(s)$ represents the encoding of state $s$ in the second implementation, then $E_2(s) = T(E_1(s))$. If $T$ is a "simple" transformation (e.g., a linear transformation consisting of XOR gates only), then we can design inexpensive checkers (e.g., parity trees) to check the flip-flop outputs.
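The relation between the two encodings can be illustrated with a minimal Python sketch. It assumes that T is a linear map over GF(2), given as a 0/1 matrix, and that the checker recomputes each bit of the second encoding with one parity tree per matrix row. The matrix `T`, the function names and the example encodings are hypothetical and are not taken from the paper. T is chosen to be invertible so that distinct states receive distinct codes.

```python
def parity(bits):
    """XOR of a sequence of 0/1 values (what a parity tree computes)."""
    result = 0
    for b in bits:
        result ^= b
    return result

def transform(T, e1):
    """Apply the GF(2) matrix T (list of 0/1 rows) to the encoding e1."""
    return [parity(t & b for t, b in zip(row, e1)) for row in T]

def checker_flags_error(T, e1, e2):
    """Return True if e2 differs from T(e1).

    Each row uses one parity tree over the selected bits of e1 plus the
    corresponding bit of e2; a nonzero result signals a mismatch.
    """
    syndromes = [parity([t & b for t, b in zip(row, e1)] + [e2_bit])
                 for row, e2_bit in zip(T, e2)]
    return any(syndromes)

# Hypothetical 3-bit example: an invertible, XOR-only transformation.
T = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
e1 = [1, 0, 1]               # flip-flop contents of the first implementation
e2 = transform(T, e1)        # expected contents of the second implementation
print(e2)                                        # [1, 1, 1]
print(checker_flags_error(T, e1, e2))            # False
print(checker_flags_error(T, [1, 1, 1], e2))     # True (one bit of e1 flipped)
```

Because every row is a pure XOR function, the checker cost in this sketch grows with the number of ones in T rather than with the number of states.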
Diverse Duplication for Output Logic; Parity Prediction for Next-State Logic (PPNSDOL & PPDL)
The CED technique ISEDL (Sec. 2.1) has the following advantages over the technique DSEDL (Sec. 2.2): (1) The flip-flop outputs in the two implementations can be compared; hence, if a fault pair produces non-identical next-state outputs, it will be detected; (2) As will be illustrated in Sec. 3, the computational complexity of the analysis of the ISEDL technique is much less than that of the DSEDL technique. However, the ISEDL technique suffers from the problem of having no diversity in the flip-flop contents.
The CED scheme of this section combines the advantages of the ISEDL and DSEDL techniques. We use diverse duplication for the output logic and parity prediction for the next-state logic of the FSM implementation.
Figures 2.4a and 2.4b show two implementations of this CED scheme.
The PPDL technique (Fig. 2.4b) generates circuits with less area overhead than PPNSDOL (Fig. 2.4a); hence, area results for the PPNSDOL technique are not shown in Table 2.1.
Simulation Results

Next, we present simulation results on the vulnerability of CED techniques to multiple failures and CMFs. In dependable systems, it is realistic to assume that a corrective action is initiated after the system generates an error signal. Thus, for any system with CED, data integrity is guaranteed as long as the system does not produce an undetected corrupt output before indicating an error.
For each fault pair $(f_i, f_j)$ affecting the FSM and for each primary input sequence, the behavior of the FSM falls into one of the following categories: (1) it produces only correct outputs; (2) it produces an error signal before producing an undetected erroneous output; (3) it produces an undetected erroneous output before producing an error signal. Let $y_{i,j}$ be the fraction of input sequences for which the system produces only correct outputs; let $z_{i,j}$ be the fraction of input sequences for which the system produces an error signal before producing an undetected erroneous output. We define the term $w_{i,j} = z_{i,j} / (1 - y_{i,j})$ for the fault pair $(f_i, f_j)$ as the detected fraction or incorrect output detectability, which is the fraction of primary input sequences producing erroneous outputs for which the system data integrity is maintained. If the value of this term is 1, the system either produces correct outputs or indicates erroneous situations when incorrect outputs are produced. If the value is 0, the system never produces any error signal when incorrect outputs are produced. Note that, if a CED-based system produces correct outputs for all input combinations even in the presence of a fault, then the fault is redundant.
Similarly, for each fault pair $(f_i, f_j)$, we define the probability of undetected error as $x_{i,j} = 1 - y_{i,j} - z_{i,j}$.
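As a small worked example of these per-fault-pair quantities, the following Python sketch tallies the three outcome categories over a set of simulated input sequences. It takes the detected fraction to be z / (1 - y), which is an assumed reading of the definition (the fraction of sequences with erroneous outputs for which an error signal comes first). The category labels and the function name are hypothetical.

```python
from collections import Counter

CORRECT, DETECTED, UNDETECTED = "correct", "detected", "undetected"

def fault_pair_metrics(outcomes):
    """Compute y, z, x and w from one outcome label per input sequence.

    y: fraction of sequences with only correct outputs
    z: fraction with an error signal before any undetected erroneous output
    x: probability of undetected error, 1 - y - z
    w: detected fraction z / (1 - y) (assumed reading; None if y == 1,
       i.e. no sequence produces an erroneous output)
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    y = counts[CORRECT] / total
    z = counts[DETECTED] / total
    x = 1.0 - y - z
    w = z / (1.0 - y) if y < 1.0 else None
    return {"y": y, "z": z, "x": x, "w": w}

# Hypothetical simulation outcome for one fault pair over 8 input sequences.
outcomes = [CORRECT] * 4 + [DETECTED] * 3 + [UNDETECTED]
print(fault_pair_metrics(outcomes))
# {'y': 0.5, 'z': 0.375, 'x': 0.125, 'w': 0.75}
```

In this example, half of the sequences expose the fault pair, and three quarters of those are flagged before a corrupt output escapes.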
We used the following simulation procedure. For each single-stuck-at fault fi, we simulated exhaustively all fault
The results of Tables 2.2 and 2.3 demonstrate the effectiveness of the PPDL technique of Fig. 2.4b (diverse combinational logic implementation, parity prediction for flip-flops and generation of parity bit through an XOR-tree from a next-state logic implementation) for implementing CED in the simulated designs.
It may be noted here that, if transient faults create bit-flips (rather than bit-stucks) in the flip-flops of a sequential circuit, then the CED technique based on diverse state encoding with linear transformations, which is an extension of the idea of parity prediction as described at the end of Sec. 2.2, is expected to outperform the other techniques (ISEDL, PPNSDOL or PPDL) as far as data integrity is concerned.
In the next section we describe a formal technique for analyzing each of the CED schemes; the discussion also shows that the computational complexity of analyzing the DSEDL technique is at least an order of magnitude higher than that of the ISEDL, PPNSDOL or PPDL techniques.
Analysis of CED schemes
Suppose that we are given two implementations, N1 and N2, of an FSM represented by the set $\{I, O, S, T, L\}$. Here, $I$ is the set of primary input combinations, $O$ is the set of primary output combinations and $S$ is the set of internal states. $T$ is the transition logic, which can be looked upon as a mapping from $S \times I$ to $S$. $L$ is the output logic, which can be represented as a mapping from $S \times I$ to $O$. An input distribution of an FSM is given by the conditional probability distribution $P(i \mid s)$ for all $i \in I$ and $s \in S$. $P(i \mid s)$ is the conditional probability that a primary input combination $i \in I$ is applied to the FSM when it is in state $s$.
For the current paper, we assume that all primary input combinations are equally likely for all states. However, for specific systems, the input distribution can be approximated using trace simulations.
A state (a, b, c) in the product machine K detects the presence of a fault if $b \neq c$. All such states can be merged into a single state Detected. This reduction is not possible for a CED scheme with diverse state encoding unless there is an "easy" way to check that both implementations are in the same state. All detecting transitions in the product machine K can be redirected to the Detected state; all edges starting from the states that are merged into the Detected state can be deleted. There is no outgoing edge from the Detected state. All bad transitions in the product FSM K can be redirected to a new state Error. There is no outgoing edge from the Error state. After these reductions, all unreachable states and edges starting from them in the final FSM can be deleted. Figure 3.4 illustrates these reduction techniques for the product FSM in Fig. 3.3 for the case when the internal states of the two FSMs are checked. In this example, the system never enters the Error state, so the data integrity in the presence of the fault pair is 1.
The data integrity of the CED system at time t in the presence of faults can be defined in the following way. For each state s of the original fault-free FSM, we identify the state S = (a, b, c) in the product FSM such that a = s, and b and c are the corresponding states in the two implementations with faults; next, we calculate the probability E(S, t) of being in the Error state in the product FSM at time t starting from state S. This can be calculated using straightforward Markov analysis techniques and tools like SHARPE (http://www.ee.duke.edu/~kst).
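The reductions and the computation of E(S, t) can be sketched in Python for a toy machine. This is only an illustration under stated assumptions, not the analysis tool used for the reported results: a "detecting" transition is assumed to be one on which the two implementations produce different outputs, a "bad" transition one on which both produce the same incorrect output, and inputs are uniformly distributed. All function names and the two-state example FSM are hypothetical.

```python
from collections import deque

ABSORBING = ("Detected", "Error")

def reduced_product(good, impl1, impl2, inputs, start, check_states=True):
    """Build the reduced product machine reachable from (start, start, start).

    Each machine is a pair (next_state, output) of dicts keyed by
    (state, input). Returns {product_state: {input: successor}}, where a
    successor is a product state, "Detected" or "Error".
    """
    (gt, go), (t1, o1), (t2, o2) = good, impl1, impl2
    edges, queue = {}, deque([(start, start, start)])
    while queue:
        state = queue.popleft()
        if state in edges:
            continue                      # already expanded
        a, b, c = state
        edges[state] = {}
        for i in inputs:
            ob, oc = o1[(b, i)], o2[(c, i)]
            nxt = (gt[(a, i)], t1[(b, i)], t2[(c, i)])
            if ob != oc:
                succ = "Detected"         # assumed detecting transition
            elif ob != go[(a, i)]:
                succ = "Error"            # assumed bad transition
            elif check_states and nxt[1] != nxt[2]:
                succ = "Detected"         # state comparison sees b != c
            else:
                succ = nxt
                queue.append(nxt)         # only reachable states are expanded
            edges[state][i] = succ
    return edges

def error_probability(edges, start_state, t):
    """E(S, t): probability of being in Error after t steps, uniform inputs."""
    dist = {start_state: 1.0}
    for _ in range(t):
        new = {}
        for state, p in dist.items():
            if state in ABSORBING:
                new[state] = new.get(state, 0.0) + p   # no outgoing edges
                continue
            succs = edges[state]
            for succ in succs.values():
                new[succ] = new.get(succ, 0.0) + p / len(succs)
        dist = new
    return dist.get("Error", 0.0)

# Toy FSM: next state = s XOR i, output = s AND i.
inputs = (0, 1)
next_state = {(s, i): s ^ i for s in (0, 1) for i in inputs}
good_out = {(s, i): s & i for s in (0, 1) for i in inputs}
stuck_out = {(s, i): 0 for s in (0, 1) for i in inputs}   # output stuck-at-0
good, faulty = (next_state, good_out), (next_state, stuck_out)

single = reduced_product(good, faulty, good, inputs, start=0)
common = reduced_product(good, faulty, faulty, inputs, start=0)
print(error_probability(single, (0, 0, 0), 2))   # 0.0
print(error_probability(common, (0, 0, 0), 2))   # 0.25
```

With a fault in only one implementation the mismatch is always caught, whereas the identical fault in both implementations lets a corrupt output escape with probability 0.25 within two cycles.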
The data integrity of the CED system in the presence of a given fault pair is equal to $\sum_{s \in S} P(s)\,[1 - E(S, t)]$. Here, $P(s)$ is the stationary probability of state $s$ in the original fault-free FSM. For very low failure rates, it is realistic to assume that the original FSM reaches a stationary probability state before a fault affects the FSM. Analysis of CED schemes based on diverse duplication of output logic and parity prediction of next-state logic is similar to the analysis technique described above and is not repeated.
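The weighted sum over fault-free states can be sketched as follows. The stationary probabilities P(s) are approximated here by simple power iteration under uniform inputs, which is an assumption of this sketch (it may not converge for periodic chains); the values of E(S, t) are supplied as hypothetical, illustrative numbers rather than results from the paper.

```python
def stationary_distribution(next_state, states, inputs, iterations=1000):
    """Approximate P(s) of the fault-free FSM by power iteration.

    next_state maps (state, input) to the next state; all inputs are
    assumed equally likely in every state.
    """
    dist = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        new = {s: 0.0 for s in states}
        for s, p in dist.items():
            for i in inputs:
                new[next_state[(s, i)]] += p / len(inputs)
        dist = new
    return dist

def data_integrity(stationary, error_prob):
    """Sum over s of P(s) * (1 - E(S, t)); error_prob maps s to E(S, t)."""
    return sum(p * (1.0 - error_prob[s]) for s, p in stationary.items())

# Toy two-state FSM with next state = s XOR i.
states, inputs = (0, 1), (0, 1)
next_state = {(s, i): s ^ i for s in states for i in inputs}
P = stationary_distribution(next_state, states, inputs)

# Hypothetical E(S, t) values for the product states starting at s = 0 and s = 1.
E = {0: 0.25, 1: 0.75}
print(P)                      # {0: 0.5, 1: 0.5}
print(data_integrity(P, E))   # 0.5
```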
Computational Complexity of the Analysis
Theoretically, the above analysis technique is computationally intensive because of the following problems. The analysis technique may run into memory problems due to possible state space explosion during the computation of the product FSM. For example, if the original FSM has 64 states, it is theoretically possible that the product FSM will have $64^3$ = 262,144 states if we use the DSEDL technique (without comparators comparing the flip-flop outputs). Moreover, if the original FSM has a large number of primary inputs, then the construction of the product FSM will be very time-consuming if we have to compute the state transition of the product FSM from each state for each primary input combination. In Table 3.1, we show the characteristics of the 1149.1 Boundary-Scan TAP controller and the MCNC FSM benchmark circuits and the average and the maximum number of states in the product FSM over all single stuck-at fault pairs.
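A rough worst-case count makes the gap concrete. The bound on the right is not stated in the text; it is derived here under the assumptions that each faulty implementation stays within the same n states as the fault-free FSM and that all product states (a, b, c) with b ≠ c are merged into Detected, as described in Sec. 3.

```latex
\[
  \underbrace{n^{3}}_{\text{no comparison of flip-flop outputs}}
  \quad\text{vs.}\quad
  \underbrace{n^{2} + 2}_{\text{states with } b = c \text{, plus Detected and Error}},
  \qquad
  n = 64:\quad 262{,}144 \ \text{vs.}\ 4{,}098 .
\]
```

Under these assumptions the worst-case state count shrinks by roughly a factor of n when the flip-flop outputs are compared.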
Most of the FSM benchmarks in the MCNC benchmark suite have no more than 32 states. The TAP FSM has 16 states and a single primary input. A similar observation can be made about the internal benchmark FSM specifications of CAD companies. This is perhaps because FSMs used in real designs are designed as interacting state machines. However, there are some FSM specifications with approximately 97 or 135 states; moreover, for FSMs with a large number of primary inputs, an exact analysis for each input combination can be very time-consuming (FSMs s420, s510, s820 and scf with 19, 19, 18 and 27 primary inputs, respectively). For these FSM benchmarks, approximate techniques must be devised.
Conclusions
We studied the problem of implementing concurrent error detection (CED) based on diverse duplication in sequential logic circuits. We examined three different techniques for this purpose.
Our simulation results demonstrate that the CED technique based on diverse duplication of combinational logic and parity prediction of flip-flop contents is most efficient in protecting sequential logic circuits against multiple and common-mode failures. We also described an exact technique to analyze the data integrity of sequential logic circuits with CED. Our results on MCNC benchmark circuits show that the exact analysis technique is feasible for many (80%) benchmark circuits, although theoretically it can suffer from state space explosion problems. Future research must focus on extending the idea of parity prediction for next-state logic
