Very Large Scale Integrated (VLSI) circuits used in the space and nuclear industries are continuously subjected to ion radiation. As the limits of VLSI technology are pushed toward sub-micron levels to achieve higher levels of integration, devices become more vulnerable to radiation-induced errors. These radiation-induced errors may lead to system failure, particularly if they affect the memory portion of vital sub-systems, such as state machine controllers.
Introduction
The advancement of VLSI technology toward lower supply voltages and smaller feature sizes makes today's circuits more susceptible to errors caused by external factors. These errors, either transient or permanent in nature, may ultimately lead to system failure. This is of particular concern in the case of sequential state machines, which act as controllers in most VLSI systems, as a transient error in a sequential machine has the potential to become permanent if it affects the storage element of the machine. This type of error is very common in VLSI circuits used for space-borne applications, where random high-energy cosmic ion strikes cause soft errors in memory elements, referred to as single event upsets (SEUs) [1]. While there are ways of preventing SEU-induced errors, they involve either costly processes with larger feature sizes or storage elements with higher static power dissipation [2]. In this paper we concentrate on design techniques for synchronous state machines which tolerate single errors in the state variables. This restriction is quite reasonable, as the effect of an ion particle strike is confined to a region 2 µm in diameter [3]. The use of fault-tolerant architectures to tolerate an SEU would allow the manufacture of VLSI circuits for mission-critical space applications using conventional, state-of-the-art, low-power CMOS processes.
Research on the design of fault-tolerant controllers started in the early 1960s. Many different schemes have since been proposed for fault-masking in sequential machines [4]-[12]. With the development of signature analysis in the 1970s and the increased importance of testing and testable designs, focus shifted away from fault-tolerant state machines and led to other areas of research. One such area, control flow checking for synchronous state machines [13]-[15], is closely related to fault-tolerant design techniques. However, this work is not suitable for achieving on-line fault tolerance in state machines because it relies on off-line techniques for detecting errors. The availability of CAD tools and advances made in VLSI technology have rekindled research interest in the classical fault-tolerant design methods [16]. In this discussion, we compare the tradeoffs associated with different fault-tolerant state machine designs implemented in standard cell CMOS technology using the Octtools IC design suite from UC-Berkeley.
The remainder of this paper is organized as follows. In Section 2 we present design methods to tolerate single errors in state machines based on the classical fault-tolerant design principles. Two of these methods are based on hardware redundancy principles and the remainder on information redundancy techniques. State machines from the MCNC logic synthesis benchmark suite [17] were implemented using these techniques and the results are presented in Section 3.
Fault-Tolerant Design Techniques
A finite state machine $M$ is formally defined as a five-tuple $(S, I, O, \delta, \lambda)$: $S$ is a finite, non-empty set of states; $I$ and $O$ are the finite, non-empty sets of inputs and outputs, respectively; $\delta$ is the next state transition function, $\delta: S \times I \to S$; and $\lambda$ is the output forming logic function, $\lambda: S \times I \to O$. The symbolic states in the non-empty set $S$ are encoded by a group of binary state variables. In this discussion, the logic block which implements the next state transition function is referred to as the excitation circuit. The next state logic plus the output forming block represent the combinational logic of the state machine, while the flip-flops which store the state information form the memory section. In this paper we will focus on methods for tolerating single errors in the state variables, such as might be induced by a single event upset. In some cases, the architecture will also tolerate certain gate failures in the excitation circuit.
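As an informal illustration of the five-tuple definition, the following Python sketch models a state machine with a next state transition function and an output function. The class name `StateMachine`, the field names `delta` and `lam`, and the two-state toggle machine are hypothetical and chosen only for this example; they are not taken from the benchmark set.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class StateMachine:
    """Illustrative five-tuple: states, inputs, outputs, transition, output function."""
    states: FrozenSet[str]
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    delta: Callable[[str, str], str]  # next state transition function: S x I -> S
    lam: Callable[[str, str], str]    # output forming function: S x I -> O

    def run(self, start: str, sequence: List[str]) -> Tuple[str, List[str]]:
        """Apply an input sequence and return the final state and the outputs."""
        state, produced = start, []
        for symbol in sequence:
            produced.append(self.lam(state, symbol))  # output from present state and input
            state = self.delta(state, symbol)         # then advance to the next state
        return state, produced


# Hypothetical two-state machine: input "1" toggles the state, input "0" holds it.
toggle = StateMachine(
    states=frozenset({"A", "B"}),
    inputs=frozenset({"0", "1"}),
    outputs=frozenset({"0", "1"}),
    delta=lambda s, i: ("B" if s == "A" else "A") if i == "1" else s,
    lam=lambda s, i: "1" if s == "B" else "0",
)
```

In hardware terms, `delta` plays the role of the excitation circuit and the `state` variable plays the role of the state flip-flops.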
Fault-tolerant designs use additional resources beyond what is needed for normal operation. These additional resources can be in the form of hardware, information, or time. SEU-immune state machines can be designed using one or a combination of these additional resources. However, we have eliminated time redundancy from our study because we are interested in producing state machines which operate at speeds comparable to the non-redundant implementations.
Triple Modular Redundancy Architecture
Triple modular redundancy (TMR) is the most common form of hardware redundancy. In this implementation the excitation circuit and the state flip-flops are triplicated and the three copies of the present state variables are voted on to produce the correct present state. Each module is a complete, non-redundant state machine without the output forming logic. The present state output from the voter is fed back to the three modules of the excitation circuit. This design masks any single state variable error caused by a gate failure in the excitation circuitry or the memory portion of the state machine.
Since the emphasis of this paper is on the design of an SEU-immune state machine, there is no need to triplicate the excitation circuit. A simpler architecture may be constructed in which the next state is generated from a single, non-redundant excitation circuit and fed to three copies of the state flip-flops. As in the full TMR architecture, the outputs of the flip-flops are voted on to produce the correct present state. This configuration will be referred to as SEU-I TMR to distinguish it from the full TMR implementation.
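The voting step shared by both TMR variants can be sketched in a few lines of Python. This is a behavioral model only, assuming each copy of the state register is held as an integer bit vector; the function name `majority_vote` is introduced here for illustration and does not correspond to a cell in the standard cell library.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three copies of the state register."""
    return (a & b) | (a & c) | (b & c)


# Illustrative use: one of the three register copies suffers a single-bit upset.
correct_state = 0b1011
upset_copy = correct_state ^ 0b0100          # flip one state variable
voted_state = majority_vote(correct_state, upset_copy, correct_state)
```

Because each bit is voted on independently, an upset in any single register copy is masked regardless of which state variable it hits.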
Duplex Architecture
A single-fault-tolerant state machine can be designed using two copies of a fault-detecting state machine. Different architectures which implement this scheme have been suggested [10, 11]. In order to perform single bit error detection, the Hamming distance between any pair of states must be greater than or equal to two. This can be accomplished by designing the excitation circuit of each machine to generate a parity bit along with the original state variables, which increases its size compared to a non-redundant circuit.
The next state generated by the excitation circuit is fed into two independent sets of state flip-flops. Parity of the state variables is regenerated by two copies of error detecting circuits to form error signals, $E_M$ and $E_S$. If both of the error signals are low, indicating a condition of no error, the present state generated from module 1 is routed through the selector circuit to the feedback path of the state machine. In the event that a single error occurs in one of the modules, the corresponding error signal will be asserted and the present state from the other module routed to the feedback path. If both $E_M$ and $E_S$ are asserted, indicating faults in both modules, the system can be forced into a fail state by the selector. Faults in the excitation logic block, the error detection circuit, and the selector circuit are not tolerated in this architecture.
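The selector behavior described above can be modeled as follows. This is a minimal sketch that assumes even parity with the parity bit stored in the least significant position; the function names and the use of `None` to represent the fail state are choices made for this example only.

```python
from typing import Optional


def parity(bits: int) -> int:
    """Return 1 if the number of set bits is odd, else 0."""
    return bin(bits).count("1") & 1


def encode(state: int) -> int:
    """Append an even-parity bit (assumed here to be the least significant bit)."""
    return (state << 1) | parity(state)


def error_signal(word: int) -> int:
    """Regenerate parity over the stored word; 1 indicates a detected error."""
    return parity(word)


def select(word_m: int, word_s: int) -> Optional[int]:
    """Route the present state from an error-free module, or None for the fail state."""
    e_m, e_s = error_signal(word_m), error_signal(word_s)
    if not e_m:
        return word_m >> 1   # module 1 is used whenever it reports no error
    if not e_s:
        return word_s >> 1   # otherwise fall back to the other module
    return None              # both error signals asserted: fail state
```

With a stored word `w = encode(0b101)`, the call `select(w ^ 0b100, w)` returns the state from the second module, since the first module's regenerated parity no longer matches.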
Explicit Error Correction Architecture
The explicit error correction architecture was the earliest attempt to apply Hamming error correcting codes to the design of fault-tolerant state machines [4, 7]. The first step is to encode the states of the given sequential machine with Hamming distance three codes, providing for single bit error correction. The excitation circuit is then designed; due to the large number of don't care entries introduced by the redundant state variables, this logic can be implemented efficiently.
Next, the explicit error correction circuit is designed based on the syndrome generation principle. First, the parity bits are regenerated from the output of the state flip-flops. A bit-wise exclusive-or of the regenerated parity and the output from the state flip-flops which hold the original parity bits results in a syndrome. The value of the syndrome indicates the state variable in error and is used to correct the original state variables or the parity state variables. The output of the error correction circuit is the correct present state, which contains both the non-redundant state variables and the appended parity.
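The syndrome generation principle can be illustrated with a small distance-three code. The sketch below assumes three non-redundant state variables and three parity bits with a particular parity-check assignment (`DATA_COLUMNS`); this assignment and the bit layout are hypothetical choices for the example, not the encoding used in the experiments.

```python
# Syndrome produced by a single error in data bit 0, 1, 2 respectively.
# Each pattern also lists which parity bits that data bit contributes to.
DATA_COLUMNS = (0b011, 0b101, 0b110)
NUM_PARITY = 3


def compute_parity(data: int) -> int:
    """Regenerate the parity bits from the non-redundant state variables."""
    p = 0
    for i, column in enumerate(DATA_COLUMNS):
        if (data >> i) & 1:
            p ^= column
    return p


def encode(data: int) -> int:
    """Codeword layout (assumed): data in the upper bits, parity in the lower three."""
    return (data << NUM_PARITY) | compute_parity(data)


def correct(word: int) -> int:
    """Correct any single-bit error in the stored codeword using the syndrome."""
    data = word >> NUM_PARITY
    stored_parity = word & ((1 << NUM_PARITY) - 1)
    syndrome = compute_parity(data) ^ stored_parity
    if syndrome in DATA_COLUMNS:
        data ^= 1 << DATA_COLUMNS.index(syndrome)   # error in a state variable
    elif syndrome in (0b001, 0b010, 0b100):
        stored_parity ^= syndrome                   # error in a parity variable
    # syndrome 0 means no error; any other value is not a single error
    # and the word is returned unchanged in this sketch.
    return (data << NUM_PARITY) | stored_parity
```

Every single-bit error yields a distinct non-zero syndrome, which is what allows the correction logic to identify the flip-flop in error.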
Modified Explicit Error Correction Architecture
In the explicit error correction architecture, all of the state variables, including the parity bits, are generated by the excitation circuit using only the outputs from those flip-flops which hold the non-redundant state information. This redundancy in the excitation circuit can be eliminated by using a separate "parity appender" circuit to generate the parity state variables. A minimum number of parity bits, as given by the Hamming bound, is appended to the output of a non-redundant excitation circuit to maintain a distance of three between any pair of next states.
As with the explicit error correction method, separate error correcting logic is placed between the state flip-flops and the excitation circuitry. However, this circuit is simpler than before because only the non-redundant variables are corrected; errors in the parity state variables are not corrected as they are not used to generate the next state of the machine.
Implicit Error Correction Architecture
This technique is also based on Hamming encoding of the states and is very similar to the ideas proposed by Russo [5] and Meyer [12]. Rather than use separate circuitry to correct errors in the state variables, the error correction function is built into the excitation logic of the sequential machine. As a result, adjacent entries in the next state map, corresponding to single state variable errors, must be changed from don't cares to the state encoding of the correct next state. For example, if state A is encoded as <00000>, then the next state map will have six entries filled with the correct next state for present state A under each input condition: one entry corresponding to no state variable errors and five corresponding to the five possible single errors. Consequently, the number of don't care states is greatly reduced, resulting in a larger excitation circuit.
The advantage of this architecture is that if gates in the excitation circuit are not shared between state variables, then faults introduced by any single gate failure are also tolerated. It should also be noted that a corrected version of the present state variables is not available for the output logic. Consequently, the output logic would need to be designed to use present inputs and present next state values to generate the proper output.
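The way the next state map is filled in for implicit error correction can be sketched as follows. The three-state machine, its five-bit distance-three state codes, and the helper names are all hypothetical and serve only to illustrate the six-entries-per-state pattern described above.

```python
from typing import Dict, List, Tuple


def single_error_neighbours(code: int, width: int) -> List[int]:
    """The codeword itself plus every word at Hamming distance one from it."""
    return [code] + [code ^ (1 << bit) for bit in range(width)]


def build_implicit_map(
    transitions: Dict[Tuple[str, str], str],
    codes: Dict[str, int],
    width: int,
) -> Dict[Tuple[int, str], int]:
    """Fill the next state map so single state variable errors are absorbed."""
    table: Dict[Tuple[int, str], int] = {}
    for (state, symbol), next_state in transitions.items():
        for present in single_error_neighbours(codes[state], width):
            key = (present, symbol)
            if key in table and table[key] != codes[next_state]:
                raise ValueError("state codes are not at distance three")
            table[key] = codes[next_state]
    return table


# Hypothetical machine with distance-three codes over five state variables.
CODES = {"A": 0b00000, "B": 0b00111, "C": 0b11001}
TRANSITIONS = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "C", ("B", "1"): "A",
    ("C", "0"): "A", ("C", "1"): "C",
}
TABLE = build_implicit_map(TRANSITIONS, CODES, width=5)
```

Each present state contributes six specified entries per input condition, so entries that would otherwise be don't cares become constraints on the excitation logic.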
Qualitative Analysis
A true single-fault-tolerant state machine can be defined as one which tolerates any single gate failure in the combinational or memory section of the machine, including the input decoding circuitry. The area overhead associated with such designs can be very large and unacceptable in many cases. More often, fault-tolerant designs cater to a small set of error conditions; in this work the architectures presented tolerate single soft or hard errors in the state flip-flops. The remaining circuitry is assumed to be fault free.
The full TMR architecture is almost a true single-fault-tolerant state machine since any gate failure in the excitation circuit or state flip-flop error is tolerated. Intuitively, it can be seen that this architecture results in an area increase of more than 200% compared to a non-redundant machine. In contrast, the SEU-I TMR architecture would have a smaller area overhead since there is a single copy of the excitation circuit. However, the architecture tolerates only a restricted set of faults compared to the full TMR implementation. The basic advantage of the TMR-based schemes is the ease and simplicity of the design. The duplex architecture has fewer redundant flip-flops compared to the TMR architecture. However, in this case the error detection and selector circuits are complex. The explicit and implicit error correction architectures are based on information redundancy techniques. These architectures will always result in an equal, if not smaller, number of redundant flip-flops compared to the other schemes.
The architectures presented in this section were applied to the MCNC logic synthesis benchmark state machines. In the next section we will compare each method in terms of area and maximum path delay.
Performance Evaluation and Analysis
The Microelectronics Center of North Carolina (MCNC) maintains benchmark circuits for researchers interested in test generation and logic synthesis of combinational and sequential circuits. These circuits were assembled from industrial and academic sources from the United States and abroad and are widely used to compare results in logic synthesis and optimization methods. State machines from the MCNC logic synthesis benchmark set of 1991 were implemented to compare the architectures presented in Section 2 of this work.
Two important criteria useful in comparing VLSI implementations are the die area occupied by the circuit and the speed of operation. Five benchmark state machines listed in Table 1 and recommended by MCNC [17] were simulated and standard cell layouts generated using the Octtools tool suite developed by UC-Berkeley. The results of these implementations form the basis for a quantitative comparison of the architectures in terms of area and speed.
State Machine Implementation using Octtools
A five-step procedure was used in the design of non-redundant state machines:
State Assignment. State encoding was done with the primary goal of minimizing the combinational logic of the machine, using the program jedi.
Excitation Circuitry. A behavioral description file which describes the next state behavior of the state machine was generated from the state table. This file was provided to misII, a multiple-level combinational logic optimization program. misII maps the optimized logic into a netlist of standard cells. The standard cell library used for this work was developed at Mississippi State University for a two-metal, scalable CMOS process.
Flip-Flops. Standard cell, set-reset, D flip-flops were externally interfaced to the unplaced next state logic to realize a complete state machine.
Simulate. A multilevel, functional logic simulator, musa, was used to verify the state machine design.
Place and Route. VLSI layouts were generated using an iterative standard cell place and route utility called wolfe.
Table 1 shows the results of implementation of the non-redundant benchmark machines. Because the standard cell library is scalable, dimensions are given in terms of $\lambda$, rather than microns. The worst case delay through the combinational logic was found with a 1 pF load on all internal nodes of the circuit.
The fault-tolerant architectures were synthesized and implemented using Octtools with minor modifications to the design flow. For the TMR architectures, the excitation circuit design and implementation is similar to the non-redundant machine. The voter circuit in this architecture was described using gates from the standard cell library, along with the state flip-flops. This file was then interfaced to the excitation circuit to realize a complete design. For the duplex, explicit, and implicit architectures the state encoding was a two-step process: first, a non-redundant encoding was done using jedi and the required parity bits appended manually. The state codes obtained were then used in the behavioral description file of the excitation circuit. The error correction circuit of the duplex architecture is included along with the state flip-flops as in the TMR architecture. However, in the explicit and modified explicit error correction methods, the error correction logic was described along with the next state logic using the behavioral description language.
Area Overhead
Optimized areas for each of the architectures were found after generating the layouts for several trial runs. The percentage increase in the area of the fault-tolerant implementation with respect to the non-redundant implementation is shown in Table 2. These percentages include the input forming logic, the excitation circuitry, and the flip-flops, but not the output forming logic. The output forming logic is a block of non-redundant circuitry common to all implementations and would distort the comparisons of area overhead if it were included.
Full TMR. The area overhead for the TMR architecture agrees with the intuitive estimate made in the qualitative analysis of the architecture. The slight variations in overhead between machines implemented using this architecture, as seen in Table 2, are due to the routing penalties associated with the increased next state logic and flip-flops. The error correction circuit, represented by the voter in the TMR architecture, is simple and forms a small percentage of the overall area. This advantage is offset by a threefold increase in the excitation logic. This architecture is not a good choice for implementing radiation-immune state machines since a very large price has to be paid in terms of area.
Duplex. In the duplex architecture the major source of area increase is the replicated state variables, including parity. For the machines bbara, dk512, and cse, the area penalties associated with an increased number of state flip-flops and error correction circuitry are equal since they have the same number of state variables. However, there is considerable variation in the total area overhead due to differences in the excitation circuitry. The amount of this logic increase is dependent upon the state encoding and will be discussed later.
Explicit EC. In the explicit error correction architecture the area increase is caused primarily by the logic in the excitation circuit for generating the parity state variables. This method is efficient when the amount of redundancy added by the Hamming code is a minimum. For example, a machine with six states requires three parity bits, and the same number is required for a machine with sixteen states. The area occupied by the error correction circuitry in this architecture is on average 50% more than that of the duplex architecture. The overall area penalty, due to the redundant excitation circuit and the error correction logic, is more than that of the duplex architecture and outweighs the advantage of fewer parity state variables.
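The parity bit counts quoted in the example follow from the Hamming bound for single error correction: with $k$ state variables, the number of parity bits $r$ must satisfy $2^r \geq k + r + 1$. A short Python sketch of this calculation is given below; it assumes a minimum-length binary encoding of the states, and the function names are introduced here for illustration.

```python
def state_bits(num_states: int) -> int:
    """Minimum number of state variables for a binary encoding of the states."""
    return max(1, (num_states - 1).bit_length())


def parity_bits(k: int) -> int:
    """Smallest r satisfying the Hamming bound 2**r >= k + r + 1."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r


def redundancy(num_states: int) -> tuple:
    """Return (state variables, parity variables) for a distance-three encoding."""
    k = state_bits(num_states)
    return k, parity_bits(k)
```

Under these assumptions, `redundancy(6)` gives three state variables and three parity bits, while `redundancy(16)` gives four state variables and still only three parity bits, so the relative redundancy is lower for the sixteen-state machine.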
Implicit EC. In the implicit error correction architecture, if the gates in the excitation logic are not shared, then any single gate failure in this circuit is tolerated. However, the area overhead for such a design is much higher than that of the TMR implementation. Note that the machines dk14 and dk512 implemented using information redundancy methods have area overheads higher than the average; this is due to poor optimization of the excitation circuit. For a non-redundant machine implementation, the states are encoded by jedi to optimize the logic in the excitation circuit. The optimization property in the state codes is lost when parity state variables are appended in the duplex, explicit, and implicit architectures. The amount of inefficiency induced in the excitation circuit by this factor is variable and machine specific.
SEU-I TMR and Modified EC. The SEU-I TMR architecture and the modified explicit error correction architecture are the two most area-efficient schemes to implement radiation-immune state machines using conventional fault-tolerant methods, due to the absence of any redundancy in the excitation circuit. However, as a result, neither of these architectures will tolerate any gate failure in the excitation circuit. The increased area of the flip-flops in the SEU-I TMR architecture is compensated for by the efficient error correction circuit. The SEU-I TMR architecture is also easier to implement, particularly the error correction circuit, compared to the modified explicit error correction architecture.
Critical Path Delay
In this section we discuss the critical path delay, which dictates the maximum operating frequency of the state machine. To ensure reliable operation, the data inputs to the state flip-flops must be stable for a setup time, $t_{su}$, before they can be clocked. After the flip-flop is clocked, the data propagates to the output after a delay of $t_{ff}$, the flip-flop propagation delay. The output of the flip-flop then becomes the present state input for the excitation circuit. The next state, $Y$, is generated after a delay through the excitation logic, represented by $t_c$. The clock period, $T$, should not be less than the loop propagation delay plus the setup time. That is, $T \geq t_{ff}^{\max} + t_{c}^{\max} + t_{su}^{\max}$.
For a given set of flip-flop setup and propagation delay times, the propagation delay through the combinational logic determines the clock period. For the fault-tolerant state machine architectures the delay of the error correction circuitry increases the excitation circuit delay, resulting in a lower clocking frequency. The longest path in the combinational logic was identified and the delay through this path, $t_c$, calculated for each architecture using the misII utility. The percentage increase of $t_c$ in a fault-tolerant machine compared to a non-redundant implementation is referred to as the delay overhead and is tabulated in Table 2.
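The clock period constraint and the definition of delay overhead can be expressed as a short calculation. The sketch below is illustrative only; the function names are hypothetical and any delay values passed to them are placeholders, not measurements from the benchmark implementations.

```python
def min_clock_period(t_ff: float, t_c: float, t_su: float) -> float:
    """Minimum clock period: flip-flop delay + combinational delay + setup time."""
    return t_ff + t_c + t_su


def max_clock_frequency_mhz(t_ff_ns: float, t_c_ns: float, t_su_ns: float) -> float:
    """Maximum clock frequency in MHz for worst case delays given in nanoseconds."""
    return 1000.0 / min_clock_period(t_ff_ns, t_c_ns, t_su_ns)


def delay_overhead_percent(t_c_fault_tolerant: float, t_c_non_redundant: float) -> float:
    """Percentage increase of the combinational delay over the non-redundant machine."""
    return 100.0 * (t_c_fault_tolerant - t_c_non_redundant) / t_c_non_redundant
```

For instance, with placeholder worst case delays of 1.0 ns, 8.0 ns, and 0.5 ns, `min_clock_period(1.0, 8.0, 0.5)` evaluates to 9.5 ns; if error correction logic raised the combinational delay from 8.0 ns to 10.0 ns, the delay overhead would be 25%.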
The additional delays introduced by the TMR and SEU-I TMR architectures are equal since they have the same error correction circuit. Similarly, the duplex and explicit error correction architectures have almost the same delay overhead. The error correction circuit of the duplex architecture is very slow since the fault has to be detected and then the proper state variables selected. In the explicit error correction architecture, the delay overhead is due more to the redundant excitation circuit.
In the case of the modified explicit error correction architecture, the error correction circuit delay is less than in the explicit error correction architecture. This is because the error correction circuit is designed to correct only the errors in the non-redundant state variables. The SEU-I TMR and modified explicit error correction architectures are comparable in terms of speed of operation, with the TMR-based scheme having a slight edge.
Based on these experiments, it is apparent that the SEU-I TMR and modified explicit EC architectures are the best candidates for building state machines which are tolerant of single event upsets. The modified explicit EC architecture will always be the slower of the two, due to the complexity of the error correction logic compared to the voter circuit. In terms of area overhead, the two architectures are typically quite close. The SEU-I TMR architecture will generally require more state flip-flops than the modified explicit EC architecture; however, this is often comparable to the overhead of the parity appender and error correction circuitry.
