Abstract: This work presents a tool for simulating single event transients produced by ground-level radiation in VDSM ICs. Fault injection procedures and a fast fault simulation algorithm for transient faults were implemented around an event-driven simulator. A statistical analysis was implemented to organize the data sampled from simulations. The performance evaluation shows that, for a large number of fault injections, the algorithm is much faster than a serial fault simulation approach.
Introduction
IC technologies are approaching the ultimate limits of silicon in terms of channel width, power supply and speed. The drastic device shrinking, power supply reduction, and increasing operating speeds that accompany the evolution towards nanometric technologies significantly reduce noise margins, and thus the reliability of deep submicron ICs with respect to the various internal and external sources of noise [NIC 98a]. This process is now approaching a point where it becomes unfeasible to produce ICs that are free from these effects. One of the most significant problems concerns the sensitivity of VDSM circuits to energetic particles such as the atmospheric neutrons produced by solar activity. As we approach 0.1 µm, the rates of random errors induced by cosmic neutrons are becoming unacceptable.
The basic reason for the increased sensitivity to single event transients (SETs) produced by atmospheric radiation and alpha particles is the reduction of device size and VDD voltage. This increased sensitivity affects nodes of both memory cells and logic networks. Traditionally, only memories were protected against transients. Very deep submicron scaling drastically increases the sensitivity of logic networks too. Transient pulses induced by particle strikes have a width of a few hundred picoseconds (the exact value depends on circuit characteristics and particle energies). Pulses wider than the logic transition time of a gate are propagated through the network without attenuation [BAZ97]. Since the logic transition time of gates is shorter than the transient pulse duration, the transient pulses are not attenuated even for relatively low energy particles. In addition, as clock frequencies increase significantly, the probability of latching a transient pulse at the network output increases by the same factor. Due to these trends, logic parts become as sensitive as memories.
In this context, it will become mandatory to design future ICs to tolerate transient faults. Because this protection will be needed for any product, including commodity ones, traditional fault tolerant design such as TMR cannot be used due to its high cost. The only viable solution is to adjust the design flow to transparently implement TFT (transient fault tolerant) techniques.
Means for evaluating the SET sensitivity of an IC in an early phase of the design cycle are mandatory to determine whether the circuit must be protected against SETs. Such means are also needed to evaluate various protection techniques and select the most efficient one. A multilevel simulation framework is under development at iRoC Technologies, including 3D simulations, electrical simulations, gate-level simulations, and RTL simulations. This paper presents ROBAN, a gate-level simulator able to simulate large numbers of transient pulses in a short time.
Transient Fault Modeling and Injection
The impact of a charged particle on a MOS circuit has been extensively analyzed [CHA 93b]. Electron-hole pairs resulting from the particle impact on the drain zone of the transistor are collected between the drain and the substrate due to the potential difference across the junction. The current pulse width depends on several technological parameters. This pulse may be modeled by a double-exponential current pulse [MES 82].
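For reference, the double-exponential model is commonly written in the form below. The symbols I_0, \tau_\alpha and \tau_\beta are the notation assumed here and may differ from the one used in [MES 82].

```latex
I(t) = I_0 \left( e^{-t/\tau_\alpha} - e^{-t/\tau_\beta} \right), \qquad \tau_\alpha > \tau_\beta
```

In this notation, I_0 sets the amplitude of the pulse, \tau_\alpha is the collection time constant of the junction, and \tau_\beta is the time constant for establishing the ion track.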
According to the node parameters, this current pulse creates a voltage pulse on the affected node. As our main concern is to statistically evaluate the behavior of the circuits with respect to transient faults, the exact pulse shape is not important. Actually, the pulse differs from one particle impact to another, so it is meaningless to evaluate the circuit under test for a specific transient pulse. What is important is the instant within the clock cycle when the particle strikes a node and how much time is needed to dissipate the injected charge (the pulse duration). The instant of the impact governs when the pulse will reach the primary outputs of the circuit, after propagation through the circuit network. In fact, the instant at which the pulse occurs and its width are critical for the latching of the fault in a flip-flop or a latch. Therefore, the simulator must allow injecting transient pulses of various widths and also injecting them at various instants within the clock cycle.
The form of the voltage pulse is another important parameter. The actual pulses can have various shapes. However, the response of a gate to a pulse injected on its input primarily depends on the part of the pulse that exceeds the gate threshold. Thus, a given pulse of a certain shape is equivalent to a square pulse whose duration equals the time during which the actual pulse exceeds the threshold of the gate, and square pulses can be considered without compromising the quality of simulations. This equivalent pulse simplifies fault injection and allows fault simulations without changing the internal structure of the circuit under test. ROBAN implements this fault model using Verilog force/release statements as follows:

    begin
      #<impact_time> temp = ~<injection_net>;   // sample and invert the fault-free value
      force <injection_net> = temp;             // fault injection starts
      #<pulse_width> release <injection_net>;   // fault injection ends
    end

The width of the equivalent pulse corresponds to the original pulse and can be chosen as a mean value, as a worst-case value, or varied over a representative range. The designer can modify all the parameters related to the fault injection. Our task is to deliver an efficient tool that can simulate the fault according to the model.
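The equivalence between an arbitrary pulse shape and a square pulse can be illustrated with a minimal sketch that measures how long a sampled waveform stays above a gate threshold. The function names, the reuse of a double-exponential shape for the voltage pulse, and all numeric values are illustrative assumptions and are not taken from ROBAN.

```python
import math


def double_exp(t, scale, tau_fall, tau_rise):
    """Illustrative double-exponential pulse shape (assumed, arbitrary units)."""
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def equivalent_width(waveform, threshold, t_end, dt):
    """Total time on [0, t_end) during which waveform(t) exceeds threshold.

    The result is the duration of the equivalent square pulse, estimated by
    uniform sampling with step dt (same time unit as t_end).
    """
    steps = int(round(t_end / dt))
    above = sum(1 for k in range(steps) if waveform(k * dt) > threshold)
    return above * dt


if __name__ == "__main__":
    # Times in picoseconds; amplitude normalized to the supply voltage.
    pulse = lambda t: double_exp(t, 2.0, 200.0, 50.0)
    for vth in (0.3, 0.5, 0.7):
        print(vth, equivalent_width(pulse, vth, 2000, 1))
```

A higher gate threshold yields a narrower equivalent pulse, which is why the same particle strike can be more or less harmful depending on the gate it drives.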
The tool offers the possibility to inject faults on all the nets within the circuit, even if the selected net is not a logical choice (for example, a supply net). However, we will consider only faults injected onto the output ports of logic gates. Furthermore, it is possible to inject multiple pulses created by a single particle on several topologically close nodes. After its occurrence, the transient pulse will reach the circuit outputs at different times according to the delays of the various internal paths. The pulse will be propagated only through the sensitized paths connecting its node of origin to the circuit outputs. Reconvergent paths with different delays can superpose on a net several transient pulses issued from a single initial pulse, and result in a pulse much wider than the original one, or create multiple pulses at a single output. Also, pulses can reach different outputs at different instants. These phenomena aggravate the sensitivity of the circuit since they increase the probability that a transient pulse causes an error by reaching one or more outputs during the latching edge of the circuit clock.
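The widening effect of reconvergent paths can be sketched with simple interval arithmetic. The sketch below assumes that every listed path is sensitized, that pulses are neither attenuated nor inverted in a way that cancels them, and that overlapping pulses on a net simply merge. The function names and the setup/hold window model are hypothetical.

```python
def propagate(pulse_start, pulse_width, path_delays):
    """Intervals (start, end) seen at an output, one per sensitized path."""
    return [(pulse_start + d, pulse_start + d + pulse_width) for d in path_delays]


def merge(intervals):
    """Merge overlapping or touching intervals into wider pulses."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def hits_latching_window(intervals, clock_edge, setup, hold):
    """True if any pulse overlaps the window [clock_edge - setup, clock_edge + hold]."""
    lo, hi = clock_edge - setup, clock_edge + hold
    return any(start < hi and end > lo for start, end in intervals)


if __name__ == "__main__":
    # Times in picoseconds: a 300 ps pulse injected at t = 100 ps.
    close_paths = merge(propagate(100, 300, [200, 400]))
    far_paths = merge(propagate(100, 300, [200, 700]))
    print(close_paths)  # one merged pulse, wider than 300 ps
    print(far_paths)    # two separate pulses at the same output
    print(hits_latching_window(far_paths, clock_edge=1000, setup=50, hold=50))
```

With path delays of 200 ps and 400 ps the two copies overlap and form a single 500 ps pulse; with 200 ps and 700 ps they arrive as two distinct pulses, the second of which overlaps the assumed latching window.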
Test Environment
The test environment provides the elements needed to test the circuit: signals to drive the inputs of the device under test, circuits to analyze the outputs, etc. We have chosen to implement the test environment at a behavioral level, independently of the technology used for the device under test. This approach has the advantage of being generic (it works with any device under test). To build the test environment, we need to monitor the circuit outputs with a program that checks their correctness, to generate the input stimuli, and to create the list of injected faults.
The output monitoring relies on a fault-free simulation that determines the correct responses of the circuit, which are used as reference values. The output monitoring also includes the monitoring of the error-detecting signals, which determines the efficiency of error detection or fault tolerant techniques.
The input stimuli can be either user defined or randomly generated. Random input vector generation has to be done carefully, because some input configurations that do not occur during normal circuit operation can drive the circuit into an inconsistent internal state. Entering an inconsistent internal state due to illegal input values has to be avoided, since after such an event the circuit can be blocked for the rest of the simulation. On the other hand, entering an inconsistent internal state due to an injected fault is allowed, since the circuit state is restored to its correct values before the next fault injection.
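One simple way to keep random stimuli legal is rejection sampling against a user-supplied constraint. The sketch below is only an illustration of that idea: the `is_legal` predicate, the function name and its parameters are hypothetical, and the text does not state how ROBAN filters its random vectors.

```python
import random


def random_legal_vectors(width, count, is_legal, seed=0, max_tries=100000):
    """Draw `count` random input vectors of `width` bits that satisfy is_legal.

    Vectors rejected by is_legal (a hypothetical, design-specific predicate)
    are discarded so that they never reach the circuit under test.
    """
    rng = random.Random(seed)
    vectors = []
    tries = 0
    while len(vectors) < count:
        if tries >= max_tries:
            raise RuntimeError("constraint too restrictive for rejection sampling")
        tries += 1
        candidate = tuple(rng.randint(0, 1) for _ in range(width))
        if is_legal(candidate):
            vectors.append(candidate)
    return vectors


if __name__ == "__main__":
    # Assumed example constraint: inputs 0 and 1 must never be high together.
    no_conflict = lambda v: not (v[0] and v[1])
    for vec in random_legal_vectors(4, 5, no_conflict):
        print(vec)
```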
Simulation Algorithm
The algorithm is dedicated to the simulation of transient faults in combinational circuits. It exploits the characteristics of this specific class of faults (they are temporary, not permanent) to minimize the time needed for the analysis, at the cost of restricting the algorithm to this class.
The algorithm uses an event-driven simulator to load the circuit and evaluate its behavior. An event-driven simulator evaluates only events (changes) in the circuit. ROBAN controls the primary inputs because it generates the test vectors applied to the circuit, and thus controls the activity of the circuit. This control is realized using the Verilog language. The time needed for the simulation is proportional to the total number of events evaluated. This number depends on the size and complexity of the circuit, the number of test vectors applied at the inputs, and the additional activity associated with the fault injection. To reduce the simulation time, the number of evaluated events must be reduced. The test vectors can be arranged to minimize the activity in the circuit when the inputs change, but this yields only a minimal improvement. The largest gain is obtained by minimizing the events associated with the fault injection.
In a serial approach, the number of events to be simulated for each fault is equal to the number of input bit transitions associated with the change from the previous input vector to the new input vector, plus the node transitions associated with the injected fault. However, one can show that only transient pulses occurring on a node during the steady state period of the node are of interest. Thus, we can consider the steady state of the circuit and eliminate the simulation of the input events for each fault injection [CHA93a].
The algorithm is:
    for each test vector
        evaluate the circuit (without faults)
        for each fault
            inject the fault
            evaluate fault activity (propagate fault)
            compare current outputs with reference values
            calculate sensitivity
            restore correct state
        end 'for each' fault
    end 'for each' test vector

For a given test vector, the first clock period is reserved for the simulation without faults. Then one fault is injected during each subsequent clock period. After all faults have been injected, the test vector applied on the primary inputs can be changed. Thus, N fault injections require N+1 simulations, that is, N+1 clock periods. For M vectors, a total of M*(N+1) simulations are needed.
Extensions have been developed to also handle registers in fairly complex sequential circuits. During the simulation, the algorithm uses two copies of the circuit, one with fault injection and the other as a reference. The internal states of the corresponding registers in the two circuits are compared to discover latched faults. The configuration of the affected registers is restored using the registers in the reference circuit.
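The loop structure can be illustrated on a toy netlist. The sketch below is purely logical: it flips one net in the steady state and re-evaluates only the gates downstream of the fault, ignoring pulse timing, gate delays and latching. The netlist, names and counting scheme are assumptions made for illustration and do not describe ROBAN's implementation.

```python
from itertools import product

# Hypothetical netlist: (output_net, function, input_nets), in topological order.
NETLIST = [
    ("n1", lambda a, b: a & b, ("a", "b")),
    ("n2", lambda a, b: a | b, ("b", "c")),
    ("y", lambda a, b: a | b, ("n1", "n2")),
]
OUTPUTS = ("y",)


def evaluate(inputs):
    """Fault-free steady state for one test vector."""
    values = dict(inputs)
    for net, func, ins in NETLIST:
        values[net] = func(*(values[i] for i in ins))
    return values


def propagate_fault(reference, site):
    """Flip `site` and re-evaluate only gates reached by a changed net."""
    faulty = dict(reference)  # the reference itself is left intact
    faulty[site] ^= 1
    changed = {site}
    for net, func, ins in NETLIST:
        if any(i in changed for i in ins):
            new_value = func(*(faulty[i] for i in ins))
            if new_value != faulty[net]:
                faulty[net] = new_value
                changed.add(net)
    return faulty


def run_campaign(vectors, fault_sites):
    """Return (clock periods used, faults visible on an output)."""
    periods = 0
    visible = 0
    for vec in vectors:
        reference = evaluate(vec)  # first period: fault-free simulation
        periods += 1
        for site in fault_sites:   # one injection per following period
            faulty = propagate_fault(reference, site)
            periods += 1
            if any(faulty[o] != reference[o] for o in OUTPUTS):
                visible += 1
            # "restore correct state": discard `faulty`, keep `reference`
    return periods, visible


if __name__ == "__main__":
    vectors = [dict(zip("abc", bits)) for bits in product((0, 1), repeat=3)]
    print(run_campaign(vectors, ["n1", "n2", "y"]))
```

With M = 8 vectors and N = 3 fault sites the campaign uses M*(N+1) = 32 periods, and some faults are logically masked before they reach the output.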
Data Analysis and Interpretation
To evaluate the sensitivity of a circuit with respect to the transient faults, we use a sensitivity rate (in %). If the fault is not latched at any memory element (latch, flip-flop), a "fault dropping/discarding" occurred and the fault has no impact on the circuit. The sensitivity of the circuit, in this case, is 0%.
If the fault is latched into a memory element, then the sensitivity of the device under test with respect to this fault is strictly greater than 0%. The sensitivity can be 100% when we have a clear observation of the fault (e.g., when the latched state of a memory element is 0 and the corresponding correct value is 1), or it can be smaller when the outputs cannot be evaluated correctly ("x" state). In this case we calculate a probability p_correct (in %) that the circuit is in a correct state. The sensitivity is then 100% - p_correct.
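The three cases can be captured in a short sketch. The text does not say how p_correct is obtained, so the sketch makes a loudly stated assumption: each "x" bit is independently correct with probability 0.5. The function name and the string encoding of register states are also hypothetical.

```python
def sensitivity_percent(latched, expected, p_x_correct=0.5):
    """Sensitivity (in %) for one fault injection.

    latched  : register bits after the fault, as a string of '0', '1', 'x'
    expected : fault-free register bits, as a string of '0' and '1'
    p_x_correct : ASSUMED probability that an unknown ('x') bit is correct
    """
    if len(latched) != len(expected):
        raise ValueError("register vectors must have the same length")
    p_correct = 1.0
    for got, want in zip(latched, expected):
        if got == "x":
            p_correct *= p_x_correct  # unknown bit: only probably correct
        elif got != want:
            return 100.0              # clearly observed error
    return 100.0 * (1.0 - p_correct)  # 0% when every bit matches


if __name__ == "__main__":
    print(sensitivity_percent("0101", "0101"))  # fault dropped
    print(sensitivity_percent("0001", "0101"))  # clear observation of the fault
    print(sensitivity_percent("0x01", "0101"))  # one unknown bit
```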
If the device under test has error-detecting mechanisms, we can use an error signal to determine whether the error has been detected. Along with the sensitivity rate, ROBAN can estimate an error detection rate (the coverage of the error-detecting mechanism).
Figure 1a illustrates the linear dependence of the circuit sensitivity on the clock frequency. The trends of this analysis show that in the multi-GHz domain, logic circuits become as sensitive as memories. For instance, extrapolation to a clock frequency of 1 GHz gives a sensitivity of 25% for the analyzed circuit with 300 ps transient pulses. Considering a single net with several fault injections and observing the errors on the outputs, Figure 1b shows the sensitivity of the net with respect to the pulse width.
The simulations were run with the NCSim simulator (Verilog mode) from Cadence Design Systems on an embedded block of a microcontroller totaling 3942 combinational gates and 228 registers. The analysis considered 3942 injection points (the output of each combinational gate) with 51 faults per injection point evenly distributed over the clock period and 50 test vectors, for a total of about 10 million simulations. On a Sun Ultra 10 workstation the whole process takes 354 minutes (about 6 hours). With a classic serial fault simulation, where only one fault is injected per run, the same process can take over 10000 hours to complete.
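The figures quoted above can be cross-checked with a few lines of arithmetic. The per-injection and per-run times below are derived values, not numbers reported in the text, and the serial figure is only the lower bound implied by "over 10000 hours".

```python
# Campaign size, taken from the text.
injection_points = 3942
faults_per_point = 51
test_vectors = 50
total_injections = injection_points * faults_per_point * test_vectors

# Run times, taken from the text.
roban_minutes = 354
serial_hours = 10000  # lower bound quoted for the serial approach

# Derived quantities.
ms_per_injection = roban_minutes * 60 * 1000 / total_injections
serial_s_per_run = serial_hours * 3600 / total_injections
speedup = serial_hours * 60 / roban_minutes

if __name__ == "__main__":
    print(f"total injections : {total_injections}")
    print(f"time / injection : {ms_per_injection:.2f} ms")
    print(f"serial time / run: {serial_s_per_run:.2f} s")
    print(f"speedup          : {speedup:.0f}x")
```

The campaign amounts to 10,052,100 injections, or roughly 2.1 ms per injection, against about 3.6 s per run for the serial approach, a speedup of roughly 1700.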
It is clear that a dedicated simulation algorithm can tremendously accelerate the design flow to help the designer to achieve an extensive analysis of the behavior of circuits under transient faults.
Conclusions
We have presented a transient fault simulator able to simulate a high number of transient faults injected in complex circuits within a few hours. Thanks to its high speed, the simulator makes it possible to run large statistical simulations and obtain an accurate evaluation of circuit sensitivity, provided that the range of transient pulse durations is known. Three-dimensional and electrical simulations can be used to evaluate the transient pulse duration, resulting in a framework that allows accurate circuit sensitivity evaluation. We are currently developing a framework that makes it possible to predict with high accuracy the error rates in complex VDSM ICs under ground-level radiation.
