Abstract: A gate-level radiation hardening technique for cost-effective reduction of the soft error failure rate in combinational logic circuits is described. The key idea is to exploit the asymmetric logical masking probabilities of gates, hardening gates that have the lowest logical masking probability to achieve cost-effective tradeoffs between overhead and soft error failure rate reduction. The asymmetry in the logical masking probabilities at a gate is leveraged by decoupling the physical from the logical (Boolean) aspects of soft error susceptibility of the gate. Gates are hardened to single-event upsets (SEUs) with specified worst case characteristics in increasing order of their logical masking probability, thereby maximizing the reduction in the soft error failure rate for specified overhead costs (area, power, and delay). Gate sizing for radiation hardening uses a novel gate (transistor) sizing technique that is both efficient and accurate. A full set of experimental results for process technologies ranging from 180 to 70 nm demonstrates the cost-effective tradeoffs that can be achieved. On average, the proposed technique has a radiation hardening overhead of 38.3%, 27.1%, and 3.8% in area, power, and delay for worst case SEUs across the four process technologies.
I. INTRODUCTION
When high-energy neutrons (present in terrestrial cosmic radiation) or alpha particles (which originate from impurities in the packaging materials) strike a sensitive region in a semiconductor device, the resulting single-event upset (SEU) can alter the state of the system, resulting in a soft error. Although soft errors cause no permanent damage, they can severely limit the reliability of electronic systems. Soft errors in memories (both static and dynamic) have traditionally been a much greater concern than soft errors in combinational logic circuits (for the same minimum feature size) since memories contain by far the largest number and density of bits susceptible to particle strikes. In the next decade, technology trends (smaller feature sizes, lower voltage levels, higher operating frequencies, and reduced logic depth) are projected to cause an increase in the soft error failure rate in core combinational logic in integrated circuits [5], [18], [22], [33]. In [22], it has been shown that propagated SEUs will constitute an important failure mode in integrated circuits at feature sizes below 0.35 µm. In [5], it was shown that, normalized to the number of cells, the soft error failure rates for dynamic combinational logic test circuits and static random access memory (SRAM) circuits are essentially identical at the 0.25-µm technology node. Another recent study [33] indicates that the soft error failure rate in logic will increase to an extent where it will be comparable to present-day unprotected memory elements. Thus, the development of techniques to estimate and reduce the soft error failure rate in combinational logic circuits is an important challenge for the near future.
Manuscript received September 24, 2004; revised November 12, 2004 and February 11, 2005. This paper was recommended by Associate Editor S. Sapatnekar.
The authors are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005 USA (e-mail: quming@rice.edu; kmram@rice.edu).
Digital Object Identifier 10.1109/TCAD.2005.853696
The approaches to increasing system reliability to SEUs and soft errors can be divided into techniques based on fault avoidance (intolerance) and fault detection/tolerance. Radiation hardening techniques for fault avoidance to increase reliability primarily rely on conservative design practices such as the use of high-reliability components, the exclusion of radiation-sensitive circuit styles (such as dynamic logic and non-complementary metal-oxide semiconductor (non-CMOS) styles) and the incorporation of sufficient functional margin in circuit designs to account for anticipated shifts in circuit characteristics [7], [19], [34]. Fault detection/tolerance techniques are used when fault avoidance alone cannot economically be used to meet reliability requirements during design. Both classes of techniques have been historically used for space and mission critical applications (e.g., traffic control, banking, medicine). In such applications, the primary objective is to achieve very high reliability with cost and performance as secondary concerns.
However, the overhead (area, power, delay) costs of traditional radiation hardening approaches (often exceeding 100%) are unacceptable for high-volume mainstream applications, where cost and performance are the primary objectives. As the soft error failure rate in mainstream application environments increases, there is a need for low overhead solutions to meet the demands of the highly competitive and cost-sensitive mainstream commercial market [1]. Whereas traditional fault avoidance techniques for mission critical applications target all modeled faults, fault avoidance techniques for mainstream applications need to target soft error failure rate reductions in a cost-effective manner.
This paper describes a new technique for designing radiation-hardened combinational logic circuits that spans the middle ground between no protection/no overhead and very high protection/very high overhead. Rather than focus on all modeled faults, radiation hardening is targeted towards the nodes that have the highest soft error susceptibility, i.e., the nodes that contribute the most to the soft error failure rate of the logic circuit. This allows cost-effective tradeoffs between radiation hardening overhead and soft error failure rate reduction.
The proposed approach belongs to a class of techniques for radiation hardening that increase (or maximize) the critical charge (Q_crit) for nodes in a design. Q_crit is the minimum amount of charge that needs to be deposited by a particle strike to produce a SEU [7]. A node is hardened by adding capacitance (to increase Q_crit) or drive (to dissipate deposited charge), or a combination of both. For elementary CMOS gates, this is achieved by sizing gates (or just transistors), i.e., by altering the W/L ratios of the transistors in the gates. The proposed algorithm uses an efficient fault simulation-based technique to identify and rank the critical nodes that contribute significantly to the soft error failure rate of a combinational logic block. A fast and accurate technique is used to size these critical gates to render them immune to SEUs with specified worst case characteristics. It has linear runtime complexity, since all the gates are processed in a single pass to reduce the soft error failure rate.
The proposed technique has several advantages. First, it is compatible with other optimization techniques that specifically target area, delay, and/or power reduction. Second, it can also be used to complement other fault avoidance and fault detection/tolerance techniques, such as the use of silicon-on-insulator (SOI) substrates, error detection and correction codes, etc., to reduce the soft error failure rate. Sizing for radiation hardening also does not incur any overhead for error detection and retry, since SEUs are dissipated locally and cannot result in soft errors. Last, by addressing SEU robustness earlier in the design cycle at the gate level, it aids the synthesis of inherently reliable circuits, thereby decreasing the number of iterations in the design cycle. Experimental results for the 180- to 70-nm process technologies are presented to show that the proposed technique reduces the soft error failure rate significantly (as estimated through coverage to sensitized faults) with minimal impact to overhead. Preliminary results were published in [40] and [41].
The rest of this paper is organized as follows. In Section II, different approaches to radiation hardening are classified and summarized. In Section III, the problem is examined in greater detail and the key ideas presented in this paper are discussed. In Section IV, a technique to size individual gates for SEU immunity is described. In Section V, the proposed algorithm for soft error failure rate reduction in combinational circuits is presented. In Section VI, simulation results are presented and discussed. In Section VII, the conclusion is presented.
II. BACKGROUND
The purpose of this section is: 1) to list representative techniques that could be (or are already) used to address SEU-related concerns in mainstream electronics and 2) to point out limitations (overhead and cost) that motivate research in alternate cost-effective strategies for SEU-tolerant design. Note that a detailed discussion of these techniques is beyond the scope of this paper. The references in this section are representative and by no means exhaustive; the reader is referred to [7], [19], [23], and [42] for a comprehensive introduction to and survey of this area.
Radiation hardening techniques can be roughly classified into three categories, which are: 1) device-level; 2) circuit-level; and 3) system-level. Device-level hardening approaches involve either a fundamental change or an enhancement to the fabrication process. Circuit-level hardening approaches usually require robust circuit design methodologies that reduce the sensitivity of the final design to SEUs. Both these approaches can be classified as fault avoidance approaches. System-level hardening approaches usually rely on fault detection/tolerance mechanisms that either detect or correct the effects of SEUs.
Device-level hardening approaches mainly aim to reduce and mitigate the effects of charge collection at the site of the particle strike. Proposed techniques include an extra doping layer to limit substrate charge collection [8], well structures [11] to isolate the strike inside the well, and a buried layer that provides an internal electric field to oppose the deposited charge [35]. Whereas these techniques are effective in reducing SEU sensitivity, the cost from a process and materials standpoint might be excessive for mainstream applications. SOI processes, which can reduce the charge collection depth, were expected to provide a significant increase in SEU immunity. However, research [9], [16] has shown that partially depleted SOI, used to manufacture commercial devices, is comparable to bulk processes in SEU sensitivity, and that scaling of SOI will result in a higher soft error susceptibility just as in bulk processes.
Examples of circuit-level hardening techniques include the insertion of feedback elements (e.g., resistors and capacitors) in the gate cell to slow the propagation of voltage transients, the avoidance of dynamic logic and the removal of floating (nondriven) nodes [32], as well as the use of local space and time redundancy for soft-error-tolerant latches [13], [22], [27]. Most circuit-level hardening techniques to date have, however, focused only on hardening memories, latches, and flip-flops since, at larger process technologies, they have a soft error susceptibility that is at least an order of magnitude higher than that of combinational logic.
System-level hardening approaches for logic circuits usually involve the introduction of redundancy into the design to achieve fault detection/tolerance capability [34]. A common solution, which is already widely used in memories, is the use of error detection and correction codes [14]. The complexity of the code (e.g., parity versus single-error-correcting double-error-detecting) is an example of trading cost and performance for SEU immunity. For logic circuits, concurrent error detection and correction schemes based on parity, duplication, as well as triple-modular redundancy have also been proposed. However, the irregular multilevel structure of combinational logic leads to very high overhead costs, precluding their use for most mainstream applications. At higher levels, watchdog processors [21], lock-step processing [34], etc. have been proposed to increase reliability. More recently, solutions like simultaneous and redundant multithreading (SRT) [30], [36], active-stream/redundant-stream simultaneous multithreading (AR-SMT) [31], and slipstream processors [28] have been proposed to exploit the significant amount of redundancy, repetition, and predictability in general-purpose programs. Such techniques will eventually be integrated into mainstream applications since they significantly increase reliability without seriously compromising performance.
III. MOTIVATION
Since radiation bombards a chip fairly uniformly in space and time, the probability of a particle strike at a combinational node is roughly proportional to its active area. Following a strike, the characteristics of a SEU vary greatly depending on which node it occurs at in the combinational logic circuit. For a specific application, the first step in radiation hardening is to select a range of incident particle energies over which the probability of occurrence of a particle is significant enough to require hardening. A discussion on how this range can be selected is deferred to Section IV-A. Once a range of particle energies is chosen, the three interrelated factors that determine whether a particle strike at a node produces a SEU at that node are: 1) the total charge deposited at the node; 2) the drive strength of the gate that drives the node; and 3) the capacitance of the node. In this section, the authors begin by discussing how transistor sizing within a gate to alter its drive strength affects the vulnerability of the gate to SEUs. The focus is on the magnitude and duration of the SEU that results from a particle strike, as a function of deposited charge and drive strength. The masking factors that influence how a SEU propagates through a logic circuit are described. Methods on how sensitization, one of the masking factors, can be used to rank and size the gates in a logic circuit to decrease the soft error failure rate are discussed.
A. Sizing and SEU Vulnerability
Consider a two-input NAND gate driving a lumped capacitance C_p at its output N in Fig. 1. The total capacitance at N is

$$C_{\mathrm{total}} = C_{\mathrm{unit}}\,(W/L) + C_p. \tag{1}$$

Here, (W/L) is the size of a single negative-channel metal oxide semiconductor (nMOS) transistor in the NAND gate.¹ C_unit is the unit output capacitance (includes nMOS and positive-channel metal oxide semiconductor (pMOS) diffusion capacitances) obtained by dividing the output capacitance of the NAND gate by the size of the nMOS transistor in the NAND gate. C_p is the lumped parasitic capacitance (interconnect and fanout) at N. The authors focus on the voltage V_out at N, since its magnitude and duration will determine how a SEU propagates through gates in the transitive fanout of the NAND gate to the primary outputs/latches/flip-flops. The charge deposition due to a particle strike at N is modeled by a double exponential current pulse I_in at the site of the particle strike [6], [24]

$$I_{\mathrm{in}}(t) = \frac{Q}{\tau_\alpha - \tau_\beta}\left(e^{-t/\tau_\alpha} - e^{-t/\tau_\beta}\right) \tag{2}$$

where Q is the charge (positive or negative) deposited as a result of the particle strike, τ_α is the collection time constant of the junction, and τ_β is the ion-track establishment time constant. τ_α and τ_β are constants that depend on several process-related factors.

¹ Although we refer to transistor sizes and use W/L in the formulation, we limit ourselves to symmetric gate sizing in this paper for reasons explained in Section IV-C. Thus, scaling a single transistor is equivalent to scaling all transistors (nMOS and pMOS) in the gate by the same ratio.
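As an illustration of the double exponential strike model, the following minimal Python sketch evaluates a pulse of the standard form I_in(t) = Q/(τ_α − τ_β)·(e^(−t/τ_α) − e^(−t/τ_β)), the time at which it peaks, and the charge delivered up to a given time. The function names and the numerical values of Q, τ_α, and τ_β are illustrative assumptions and are not taken from the experiments.

```python
import math

def seu_current(t, q, tau_alpha, tau_beta):
    """Double exponential strike current (A) at time t (s)."""
    return q / (tau_alpha - tau_beta) * (
        math.exp(-t / tau_alpha) - math.exp(-t / tau_beta))

def peak_time(tau_alpha, tau_beta):
    """Time at which the pulse is maximal (derivative of the pulse set to zero)."""
    return math.log(tau_alpha / tau_beta) * tau_alpha * tau_beta / (tau_alpha - tau_beta)

def collected_charge(t, q, tau_alpha, tau_beta):
    """Closed-form integral of the pulse over [0, t]; tends to q as t grows."""
    return q / (tau_alpha - tau_beta) * (
        tau_alpha * (1.0 - math.exp(-t / tau_alpha))
        - tau_beta * (1.0 - math.exp(-t / tau_beta)))

# Hypothetical values: 0.3 pC deposited, 200 ps collection, 50 ps track establishment.
Q, TAU_A, TAU_B = 0.3e-12, 200e-12, 50e-12
t_peak = peak_time(TAU_A, TAU_B)
print(f"peak at {t_peak * 1e12:.1f} ps, I = {seu_current(t_peak, Q, TAU_A, TAU_B) * 1e3:.3f} mA")
```

The integral of the pulse over all time equals Q, so Q alone fixes the total deposited charge while τ_α and τ_β only shape the pulse.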
Note that only the worst case transient effects to logic values 0 and 1 are considered. A transient to logic 1 (logic 0) occurs when the steady-state logic value at N is logic 0 (logic 1) in the fault-free case and a SEU generates a positive (negative) transition to logic 1 (logic 0) at N. Whereas the figures focus on the response of the NAND gate when a SEU causes a 0 → 1 transient, the analysis for a SEU that causes a 1 → 0 transient is symmetric, with the use of pMOS transistor equations. Note that the pMOS transistors whose inputs are at logic value 1 are OFF. Hence, the pMOS transistors are ignored for analysis of a 0 → 1 SEU at N. Both inputs of the NAND gate are set to logic 1 so that the voltage is 0 at N in the fault-free case. For a transient to logic 1 in the NAND gate, the site for injection of the current I_in can be one of the internal nodes or the output of the gate. Since any disturbance internal to the gate has to propagate through one or more series transistors before reaching the output of the gate, the magnitude of the pulse may be reduced (or fade out altogether) during this propagation. Thus, the worst case occurs when the site for the particle strike is the output of the gate (i.e., node N).
With this model, Fig. 2 presents how sizing affects the vulnerability of the NAND gate to particle strikes. The output response of the NAND gate (determined using SPICE simulations) to a SEU that produces a 0 → 1 transient at the output is presented for combinations of values of transistor sizing, process parameters τ_α and τ_β, and deposited charge. In each subfigure, it is clear that as the size of the nMOS transistors (which dissipate the deposited charge) increases, the magnitude and duration of the SEU transient diminish rapidly. In other words, transistors (i.e., gates) can be sized to dissipate (sink) the deposited charge as quickly as it is deposited so that the transient does not achieve sufficient magnitude and duration to propagate to the fanout. Note that sizing a gate increases the area of the gate sensitive to particle strikes. However, the added drive and capacitance mean that a SEU of particular magnitude (i.e., a SEU that is capable of depositing a particular charge Q) can no longer produce a transient of the same severity as before. This is because sizing adds drive to the nMOS and pMOS transistors in the gate, which can now sink the deposited charge faster and prevent the transient from reaching sufficient magnitude to propagate through the fanout. Besides τ_α and τ_β, the maximum charge Q for which SEU immunity is desired (i.e., the range of incident particle energies) has to be considered to determine this optimal transistor size. By sizing the gate for the worst case SEUs, i.e., the largest Q that can be deposited, the sensitivity of the gate to SEUs is locally reduced to zero for that worst case SEU and the gate no longer contributes to the soft error failure rate of the logic circuit.
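The qualitative trend described above can be reproduced with a very simple numerical experiment. The Python sketch below is not the SPICE setup used for Fig. 2; it assumes a first-order model in which the pull-down network behaves as a linear conductance `size * g_unit` (a hypothetical simplification) and the output capacitance is `c_unit * size + c_p`, and it integrates the node equation with forward Euler. All parameter values are illustrative assumptions, and the model does not clamp V_out at the supply rail.

```python
import math

def pulse(t, q, tau_a, tau_b):
    """Double exponential strike current."""
    return q / (tau_a - tau_b) * (math.exp(-t / tau_a) - math.exp(-t / tau_b))

def peak_voltage(size, q=0.3e-12, tau_a=200e-12, tau_b=50e-12,
                 c_unit=2e-15, c_p=10e-15, g_unit=1e-4,
                 dt=0.1e-12, t_end=2e-9):
    """Peak of the 0 -> 1 transient at the output for a given gate size."""
    c_total = c_unit * size + c_p          # capacitance grows with size
    v, peak = 0.0, 0.0
    for k in range(int(t_end / dt)):
        t = k * dt
        # C dV/dt = injected current minus current sunk by the pull-down network
        dv_dt = (pulse(t, q, tau_a, tau_b) - size * g_unit * v) / c_total
        v += dv_dt * dt
        peak = max(peak, v)
    return peak

sizes = (1, 2, 4, 8, 16)
peaks = [peak_voltage(s) for s in sizes]
for s, p in zip(sizes, peaks):
    print(f"size {s:2d}: peak V_out = {p:.2f} V")
```

Under these assumptions the peak falls monotonically as the gate is sized up, because both the added capacitance and the added drive work against the transient.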
B. Masking Factors
A simple and direct solution to radiation harden a logic circuit would be to size all the gates over a range of particle energies. However, the overhead costs of such an approach will be prohibitive. Selective hardening of the most sensitive gates can be performed to significantly harden the logic circuit with lower overhead costs. The factors that affect the ability of a SEU to propagate through the logic circuit and cause a soft error can be used for this purpose. Whereas the rate at which a SEU at a node occurs depends on incident particle energy distribution, the drive strength of the gate, and the capacitance of the node, there are three masking factors that determine whether this SEU can propagate to the primary outputs/latches/flip-flops and result in a soft error.
1) Logical masking occurs in the absence of a functionally sensitized path from the gate to the primary outputs/latches/flip-flops. This can be estimated by fault simulation.
2) Electrical masking occurs if the SEU is attenuated as it propagates along a sensitized path to the primary outputs/latches/flip-flops. This can be estimated over a range of deposited charge (which corresponds to a range of particle energies) by SPICE simulations.
3) Temporal masking occurs if a SEU reaches the primary outputs/latches/flip-flops at an instant other than the clocking window. This is estimated as a fraction of the clock period.
Note that the rate at which soft errors are generated at a primary output/latch/flip-flop due to SEUs at a particular gate diminishes as each masking factor increases. Note also that logical, electrical, and temporal masking terms are referred to on the average and not on a per input pattern basis. In this manner, all three factors may be estimated independent of each other and used to compute the soft error failure rate of the circuit.
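To make the role of the three masking factors concrete, the following Python sketch combines them into a per-gate contribution to the soft error failure rate. The product form (each factor treated as an independent average probability) and all names and numbers are assumptions made for illustration; the text above states only that the three factors can be estimated independently and that the rate diminishes as each factor increases.

```python
def soft_error_contribution(seu_rate, p_logical, p_electrical, p_temporal):
    """Rate of soft errors caused by SEUs at one gate (same unit as seu_rate).

    Assumes the three average masking probabilities act independently.
    """
    for p in (p_logical, p_electrical, p_temporal):
        if not 0.0 <= p <= 1.0:
            raise ValueError("masking probabilities must lie in [0, 1]")
    return seu_rate * (1.0 - p_logical) * (1.0 - p_electrical) * (1.0 - p_temporal)

def circuit_failure_rate(gates):
    """Sum of per-gate contributions; gates is an iterable of 4-tuples."""
    return sum(soft_error_contribution(*gate) for gate in gates)

# Hypothetical gates: (SEU rate, P_logical, P_electrical, P_temporal)
gates = [
    (1.0, 0.10, 0.20, 0.50),   # highly observable gate near the outputs
    (1.0, 0.95, 0.60, 0.50),   # deep gate that is rarely sensitized
]
print(circuit_failure_rate(gates))
```

In this toy example the first gate dominates the total, which is the kind of asymmetry that selective hardening aims to exploit.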
While these three factors present a natural barrier to soft errors in logic circuits [20], technology trends such as smaller feature sizes, lower voltage levels, higher operating frequencies, and reduced logic depth are causing these barriers to diminish significantly. In [25], it was shown that as a result of these factors, the soft error susceptibility of internal nodes (which is the contribution of the node to the overall soft error failure rate) in a logic circuit can vary by an order of magnitude or more. This provides an opportunity to significantly reduce the soft error failure rate at reduced cost, since nodes with high soft error susceptibility can be hardened, while those with very low soft error susceptibility can be ignored. In this manner, the soft error failure rate in logic circuits can be significantly reduced at a fraction of the cost of conventional techniques that try to harden all nodes.
C. Asymmetric Sensitization
The central idea in this paper is to decouple sensitization, which determines the propagation probability of a SEU in Boolean terms, from the electrical and physical properties of SEU vulnerability at a gate in a logic circuit. Consider the three masking factors introduced in Section III-B. Logical masking depends on the input pattern that is being applied to the circuit, i.e., whether or not there is a sensitized path from the gate to the primary outputs/latches/flip-flops. The probability of logical masking at the gate is given by

$$P_{\text{logical masking}} = 1 - P_{\text{sensitization}}$$

where P_sensitization is the probability of sensitization, i.e., the probability that there exists one (or more) functionally sensitized paths from the gate to the primary outputs/latches/flip-flops. Consider the node G_3 shown in Fig. 3. If a is set to logic 0, the effects of a SEU at G_3 are logically masked from the primary output G. Similarly, if either (or both) G_5 and G_6 evaluate to logic 1, a SEU at G_3 will be logically masked from primary output H, since one or more side inputs along the SEU's propagation path are set to controlling values. Logical masking leads to a high asymmetry, for SEUs of the same magnitude, in the soft error susceptibility of gates in combinational logic. A similar observation from testing theory is that fault detectability can vary by orders of magnitude across a design. To illustrate this skew, the sensitization probability distribution profile for six benchmark circuits is presented in Fig. 4. Sensitization probability on the x-axis is divided into ten intervals from 0 to 1, and the y-axis shows the number of nodes with a sensitization probability in each interval. The primary outputs, which are always sensitized, are omitted from the histograms. It is clear from the figure that less than 20% of the gates on average have a high sensitization probability (> 0.8) in logic circuits.
Electrical masking depends on the electrical properties of the intermediate gates along a sensitized path, i.e., on their drive strengths. There is a significant correlation between gates with low electrical masking probability and gates with low logical masking probability. Gates with low logical masking probability are highly observable, and are close to the primary outputs. Thus, SEUs that occur at such gates are less susceptible to electrical masking since they have to propagate through fewer gates. Gates several levels of logic deep have low observability, and hence, a high probability of logical masking. SEUs at such gates have to propagate through several levels of logic and are, hence, more likely to undergo electrical masking. This was also shown in [33], where gates 16 FO4 levels deep needed a larger minimum charge for SEU since they were more prone to electrical masking. Further, whereas logical masking is a cumulative effect over all input patterns, electrical masking is of concern only on those input patterns where logical masking does not occur. In other words, electrical masking is of concern only on those input patterns where a logically sensitized path exists. Lastly, results from [2] suggest that while electrical masking does produce an observable effect, it does not significantly reduce the observed soft error failure rate. These reasons motivate the exclusion of electrical masking as a metric to identify the gates with a high sensitivity to SEUs.
Finally, temporal masking depends on the frequency of operation of the circuit. For a particular clock period, and for propagated SEUs of the same magnitude, the probability of temporal masking is the same across all the gates in a logic block.
In summary, logical masking is an aggregate, direct, and first-order measure of the asymmetry in the soft error susceptibility of gates in a combinational circuit. In this paper, logical masking is used as the criterion to guide the search for the most susceptible gates for radiation hardening. The hardening is achieved by sizing the gate to limit the peak of the SEU-induced transient to 0.5 V_DD at the gate where the SEU occurs. Thus, the SEU is rendered marginal, i.e., it is electrically masked, along all paths of propagation to the primary outputs. Since logical masking is a direct measure of the number of times such paths occur, it is an effective choice to identify candidate gates for radiation hardening.
IV. SIZING FOR SEU IMMUNITY
In this section, an efficient method to compute the minimum transistor size (W/L)_min required to limit the maximum value of the transient pulse V_out at N to a prespecified value is described [40]. For the rest of this discussion, it is assumed that this limit on the peak value is 0.5 V_DD. Note that the method is equally applicable for any other limit on the peak value of V_out. Without loss of generality, a technique to size the nMOS transistors in a logic gate for SEU immunity is described. The method is equally applicable to sizing the pMOS transistors in a gate. Whereas the pMOS and nMOS transistors can be sized independently in a logic gate, this has implications for gate sizing, which are discussed in Section IV-C.
A. Choosing Worst Case Deposited Charge
Upper bounds for the deposited charge used for gate sizing are determined as follows. The term linear energy transfer (LET) is used to describe the sensitivity of a process technology to SEUs. A particle with an LET of 1 MeV·cm²/mg deposits approximately 10 fC/µm of electron-hole pairs along its track [7], [22]. The LET of very few ionizing particles in silicon is higher than 15 MeV·cm²/mg [15], [37]. The LET of a particle is multiplied by the charge collection depth to obtain the total electron-hole pairs generated by a strike. For process technologies of 180 nm and larger, the charge collection depth does not change significantly and is typically 2 µm in epitaxial (as well as bulk) substrates [16], [22]. This gives an upper bound of 0.3 pC for deposited charge in the 180-nm process technology. For smaller feature sizes, the charge collection efficiency decreases primarily due to higher channel doping density and a decrease in active layer thickness, which reduces depletion width and channel funneling [16], [17]. In [12], an inverse linear relation between collected charge and doping density was determined empirically. For uniform technology scaling [29], the doping density increases by a factor of λ (equal to √2) in successive process technologies. Accordingly, upper bounds of 0.21, 0.15, and 0.11 pC can be derived for the 130-, 100-, and 70-nm process technologies. Similarly, as reported in the authors' original paper [41], the doping density at 130 nm was more than twice the doping density at 180 nm for the SPICE libraries that were used for the experiments. This is consistent with fixed-voltage scaling [29] by λ² (equal to 2) and can be used to derive upper bounds of 0.15, 0.08, and 0.04 pC for the 130-, 100-, and 70-nm process technologies, respectively. In Section VI, the actual values of doping density are used to scale the base value of 0.30 pC for the smaller process technologies.
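The arithmetic behind these bounds can be retraced with a few lines of Python. The sketch below uses only the figures quoted above (10 fC/µm per unit LET, a maximum LET of 15 MeV·cm²/mg, a 2-µm collection depth, and an inverse linear dependence of collected charge on doping density); the constant and function names are hypothetical.

```python
import math

FC_PER_UM_PER_LET = 10.0     # fC/um deposited per 1 MeV.cm^2/mg of LET
MAX_LET = 15.0               # MeV.cm^2/mg, upper end considered for silicon
COLLECTION_DEPTH_UM = 2.0    # um, charge collection depth at 180 nm

def base_charge_pc():
    """Upper bound on deposited charge at the 180-nm node, in pC."""
    return MAX_LET * FC_PER_UM_PER_LET * COLLECTION_DEPTH_UM / 1000.0

def scaled_bounds_pc(base_pc, doping_factor, generations=3):
    """Bounds for successive nodes, assuming charge scales as 1/doping density."""
    return [base_pc / doping_factor ** k for k in range(1, generations + 1)]

base = base_charge_pc()
uniform = scaled_bounds_pc(base, math.sqrt(2.0))   # doping grows by sqrt(2) per node
fixed_voltage = scaled_bounds_pc(base, 2.0)        # doping grows by 2 per node
for node, u, f in zip((130, 100, 70), uniform, fixed_voltage):
    print(f"{node} nm: {u:.4f} pC (uniform), {f:.4f} pC (fixed-voltage)")
```

Rounded to two decimal places, the computed values correspond to the bounds quoted in the text for both scaling scenarios.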
B. Sizing Technique
The voltage V_out following a particle strike is given by the solution to

$$C_{\mathrm{total}}\,\frac{dV_{\mathrm{out}}}{dt} = I_{\mathrm{in}}(t) - (W/L)\,I_D(V_{\mathrm{out}}) \tag{3}$$

where C_total is the total capacitance at N (1), I_in is the current from the particle strike (2), and (W/L) is the aspect ratio of a single nMOS transistor in the gate. I_D is the effective drain current through the nMOS transistor network in the gate, per unit (W/L), and is a function of V_out. It is assumed that the pMOS transistors are off, since the inputs to the gate are such that N evaluates to logic 0 when the SEU occurs. The cross-coupled nature (time t and voltage V_out) of the differential equation (3) implies that there is no closed-form expression for the instant t_max when V_out reaches 0.5 V_DD. However, since t_max occurs after I_in reaches its maximum, it is possible to use the following iterative procedure to compute t_max. The first step is to determine a suitable search interval for t_max. The maximum value of I_in occurs at a time instant t_start, which, from (2), is given by

$$t_{\mathrm{start}} = \frac{\tau_\alpha \tau_\beta}{\tau_\alpha - \tau_\beta}\,\ln\frac{\tau_\alpha}{\tau_\beta}. \tag{4}$$

The first condition, from (3), is that the slope dV_out/dt must equal 0 at t_max, i.e.,

$$I_{\mathrm{in}}(t_{\max}) - (W/L)_{\min}\,I_D(0.5V_{DD}) = 0$$

where (W/L)_min is the minimum transistor size required to limit the peak of the SEU transient to 0.5 V_DD. Rearranging, it becomes

$$(W/L)_{\min} = \frac{I_{\mathrm{in}}(t_{\max})}{I_D(0.5V_{DD})}. \tag{5}$$

The second condition is given by charge conservation over the interval [0, t_max]. In other words, the integrals of both sides of (3) over the interval [0, t_max] must be equal, i.e.,

$$\left(C_{\mathrm{unit}}\,(W/L)_{\min} + C_p\right)\cdot 0.5V_{DD} = \int_0^{t_{\max}} I_{\mathrm{in}}(t)\,dt - (W/L)_{\min}\int_0^{t_{\max}} I_D\!\left(V_{\mathrm{out}}(t)\right)dt. \tag{6}$$

Since I_D is a nonlinear function of V_out, the following approximation is used to simplify the integral. It is assumed that the voltage V_out rises from 0 to the peak value of 0.5 V_DD linearly, i.e.,

$$V_{\mathrm{out}}(t) = 0.5V_{DD}\,\frac{t}{t_{\max}}, \qquad 0 \le t \le t_{\max}. \tag{7}$$

As a result, I_D is just a function of time t and (6) is directly integrated to get a nonlinear equation in (W/L)_min and t_max. Note that this assumption is accurate since the nMOS transistors are in the linear region of operation (V_out ≤ 0.5 V_DD).
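With this approximation, (5) and (6) form a pair of nonlinear equations in the two unknowns (W/L)_min and t_max, which can be solved iteratively.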
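One way such a pair of conditions could be solved is sketched below in Python. This is not the authors' procedure: it assumes the node equation has the form C_total·dV_out/dt = I_in(t) − (W/L)·I_D(V_out), models the unit pull-down current as a linear conductance, I_D(V) = g_unit·V (a hypothetical stand-in for a real device model), applies the linear ramp assumption for V_out, and finds t_max by bisection starting from the peak time of the pulse. All parameter values are illustrative.

```python
import math

def pulse(t, q, tau_a, tau_b):
    """Double exponential strike current I_in(t)."""
    return q / (tau_a - tau_b) * (math.exp(-t / tau_a) - math.exp(-t / tau_b))

def pulse_charge(t, q, tau_a, tau_b):
    """Closed-form integral of I_in over [0, t]."""
    return q / (tau_a - tau_b) * (tau_a * (1 - math.exp(-t / tau_a))
                                  - tau_b * (1 - math.exp(-t / tau_b)))

def size_for_seu_immunity(q, tau_a, tau_b, v_dd, c_unit, c_p, g_unit):
    """Return ((W/L)_min, t_max) for a transient peak of 0.5 * v_dd."""
    v_peak = 0.5 * v_dd

    def size_at(t):
        # Zero-slope condition at the peak: I_in(t_max) = size * I_D(v_peak)
        return pulse(t, q, tau_a, tau_b) / (g_unit * v_peak)

    def residual(t):
        size = size_at(t)
        # Linear ramp: integral of g_unit * V_out over [0, t] is g_unit * v_peak * t / 2
        drained = size * g_unit * v_peak * t / 2.0
        stored = (c_unit * size + c_p) * v_peak
        return pulse_charge(t, q, tau_a, tau_b) - drained - stored

    lo = math.log(tau_a / tau_b) * tau_a * tau_b / (tau_a - tau_b)  # t_start
    hi = 20.0 * tau_a
    if residual(lo) > 0 or residual(hi) < 0:
        raise ValueError("no solution with t_max >= t_start for these parameters")
    for _ in range(200):                    # bisection on the charge balance
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    t_max = 0.5 * (lo + hi)
    return size_at(t_max), t_max

size, t_max = size_for_seu_immunity(q=0.3e-12, tau_a=200e-12, tau_b=50e-12,
                                    v_dd=1.8, c_unit=2e-15, c_p=10e-15, g_unit=1e-4)
print(f"(W/L)_min = {size:.2f} unit sizes, t_max = {t_max * 1e12:.1f} ps")
```

Bisection is used here because, for t beyond the pulse peak, the charge balance residual under these assumptions increases monotonically with t, so a single sign change brackets the solution; the function raises an error when the chosen parameters admit no solution after the peak.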
C. Continuous Symmetric Gate Sizing
Since the nMOS (pMOS) network of a CMOS gate can be sized independently of the pMOS (nMOS) network, the above algorithm can be extended to size CMOS gates asymmetrically. The disadvantage of skewing transistor sizes significantly is that the 1 → 0 (0 → 1) delay through the gate can be significantly affected. For example, increasing the W/L of the nMOS transistors adds to the diffusion capacitance and can significantly increase the pull-up time of a gate if the pMOS transistors are not adequately resized. If the rising transition through the gate lies on the critical path, this can significantly impact performance. For this reason, this paper only discusses symmetric gate sizing for radiation hardening.
V. PROPOSED ALGORITHM
In this section, the gate sizing problem for SEU immunity is formulated. It is shown how the gate sizing technique presented in Section IV can be used to size critical nodes in a logic circuit to reduce the soft error failure rate significantly with minimal impact to overhead.
A. Problem Statement
Consider a mapped combinational circuit composed of gates from a technology library. For each gate g in the circuit, several different sizes 1, 2, ..., k are available in the library, each of which implements the same logic function but differs in one or more of the following aspects: area, delay, drive strength, and power consumption. The gate sizing problem for SEU immunity is to select optimum sizes for each (or a subset) of the gates in the combinational logic circuit such that the objective function, defined by the susceptibility of the logic circuit to SEUs (i.e., the soft error failure rate of the logic circuit), is minimized.
B. Proposed Algorithm
The pseudocode for the proposed procedure for radiation hardening is presented in Fig. 6. The first step is to rank all the gates in the circuit in descending order of their sensitization probability using the method Fault-Simulate as follows. Since the probability of logical masking of a node depends on the probability of each input pattern being applied to the circuit, an efficient way to calculate the probability of logical masking is to simulate the system with a typical workload for some number of clock cycles. For each clock cycle, fault simulation can be performed on each gate to determine if it is sensitized to one or more outputs/latches/flip-flops. Nodes that are sensitized for only a few input patterns will have a negligible effect on the overall soft error rate (since their probability of being sensitized is extremely low) and can, hence, be ignored for radiation hardening. A less accurate alternative to simulating the system with a typical workload would be to just apply random patterns at the primary inputs. Fault simulation was run on the circuit in Fig. 3 in this manner, and the logic 0 and logic 1 sensitization probabilities were computed as shown in the figure. Note that the fraction of cycles where a node may assume a logic 0 value may differ significantly from the fraction of cycles when the node assumes a logic 1 value. As a direct consequence, there can be a significant difference between the logic 0 and logic 1 sensitization probabilities of a gate, especially if there is reconvergent fanout in the logic circuit (e.g., $G_2$). Since this paper focuses on continuous symmetric gate sizing, the logic 0 and logic 1 sensitization probabilities are collapsed (summed) when the gates are inserted into the priority queue sensitizationQ.
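The random-pattern variant of Fault-Simulate can be illustrated with a minimal sketch. The four-gate netlist, the names `NETLIST`, `simulate`, and `sensitization_probabilities`, and the single-output-flip fault model are assumptions made for this illustration; they are not the circuit of Fig. 3 or the authors' implementation. A gate is counted as sensitized in a cycle if inverting its output changes at least one primary output, and the count is recorded separately for the fault-free logic 0 and logic 1 values of the node.

```python
import random

# Hypothetical netlist (NOT the circuit of Fig. 3), in topological order:
# gate name -> (function, fanin names). "a", "b", "c" are primary inputs.
NETLIST = {
    "g1": ("NAND", ("a", "b")),
    "g2": ("NOR", ("b", "c")),
    "g3": ("NAND", ("g1", "g2")),
    "g4": ("INV", ("g2",)),
}
INPUTS = ("a", "b", "c")
OUTPUTS = ("g3", "g4")


def gate_value(func, ins):
    """Evaluate one gate on a list of 0/1 input values."""
    if func == "INV":
        return 1 - ins[0]
    if func == "NAND":
        return 1 - int(all(ins))
    if func == "NOR":
        return 1 - int(any(ins))
    raise ValueError(f"unknown gate function: {func}")


def simulate(pattern, flipped=None):
    """Logic-simulate one input pattern; optionally invert one gate output."""
    values = dict(pattern)
    for name, (func, fanin) in NETLIST.items():
        value = gate_value(func, [values[f] for f in fanin])
        if name == flipped:
            value = 1 - value  # model the upset as a flipped gate output
        values[name] = value
    return values


def sensitization_probabilities(num_patterns, seed=0):
    """Return {gate: (logic-0 probability, logic-1 probability)}."""
    rng = random.Random(seed)
    counts = {g: [0, 0] for g in NETLIST}
    for _ in range(num_patterns):
        pattern = {i: rng.randint(0, 1) for i in INPUTS}
        good = simulate(pattern)
        for g in NETLIST:
            bad = simulate(pattern, flipped=g)
            if any(good[o] != bad[o] for o in OUTPUTS):
                counts[g][good[g]] += 1  # index by fault-free node value
    return {g: (c0 / num_patterns, c1 / num_patterns)
            for g, (c0, c1) in counts.items()}


if __name__ == "__main__":
    probs = sensitization_probabilities(4000)
    # Collapse (sum) the two probabilities and rank in descending order.
    ranking = sorted(probs, key=lambda g: sum(probs[g]), reverse=True)
    for g in ranking:
        print(g, probs[g], sum(probs[g]))
```

In this toy netlist, `g1` is observable only when `b = c = 0`, which forces `g1` to logic 1, so its logic 0 sensitization probability is zero while its logic 1 probability is close to 0.25. This is the kind of asymmetry the text describes.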
Gates are dequeued from sensitizationQ in decreasing order of their collapsed sensitization probability (increasing order of logical masking probability). The gate sizing routine Size-SEU-Immunity symmetrically sizes both the nMOS and the pMOS transistors in a library gate using the technique from Section IV. Once the minimum size for SEU immunity is determined for a gate, the transistor sizes (both nMOS and pMOS) are updated using
Note that the scaling of the gate is done such that the ratio of the sizes of the nMOS and pMOS transistors in the original library gate remains unchanged. The gates are processed in decreasing order ($G$, $H$, $G_3$, $G_1$, . . . for Fig. 3) until the coverage objective is met or any of the constraints are violated. The routine Update-CoverageConstraints first updates coverage, which is defined as

$$\text{coverage} = \frac{\sum_{\text{candidate gates } g_c} P_s(g_c)}{\sum_{\text{all gates } g} P_s(g)} \times 100\%. \tag{8}$$
In (8), $P_s(\cdot)$ returns the collapsed sensitization probability of a gate. Candidate gates $g_c$ are all the gates that may be sized for SEU immunity as they are dequeued from sensitizationQ. Thus, for the worst case parameters, the fraction of propagated SEUs over all the cycles is reduced by an amount (in percent) equal to coverage, since the gates have been sized such that the SEUs will not propagate even if a sensitized path exists. The coverage metric is used to estimate the reduction in soft error failure rate, without computation of the exact soft error failure rate of the original and hardened circuits (refer to Section V-C). Note that 90% (50%) coverage corresponds (approximately) to an order of magnitude (factor of 2) reduction in the soft error failure rate for the chosen charge range (worst case SEU parameters). For the circuit in Fig. 3, $\sum_{\text{all gates } g} P_s(g)$ is 5.05; only gates $\{G, H, G_3\}$ may need to be sized for 50% coverage, while all the gates except $G_5$ and $G_4$ may need to be sized for 90% coverage. The simulation results in Section VI show that, on average, only 55.6% of the gates need to be considered candidates for sizing to achieve 90% coverage.
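The candidate selection described above can be sketched as follows, assuming that coverage is the summed collapsed sensitization probability of the dequeued gates divided by the sum over all gates, in percent. The function name `select_candidates` and the probability values are hypothetical (they are not the values of Fig. 3), and constraint checking is omitted. Python's `heapq` stands in for the priority queue sensitizationQ.

```python
import heapq


def select_candidates(collapsed_ps, target_coverage):
    """Dequeue gates in decreasing order of collapsed sensitization
    probability until the coverage target (in percent) is reached."""
    total = sum(collapsed_ps.values())
    # heapq is a min-heap, so the key is negated to pop the largest first.
    queue = [(-p, gate) for gate, p in collapsed_ps.items()]
    heapq.heapify(queue)
    selected, covered = [], 0.0
    while queue and 100.0 * covered / total < target_coverage:
        neg_p, gate = heapq.heappop(queue)
        selected.append(gate)
        covered -= neg_p  # neg_p is negative, so this adds the probability
    return selected, 100.0 * covered / total


if __name__ == "__main__":
    # Hypothetical collapsed sensitization probabilities (sum = 4.0).
    ps = {"A": 1.0, "B": 1.0, "C": 0.75, "D": 0.5, "E": 0.5, "F": 0.25}
    print(select_candidates(ps, 50.0))
    print(select_candidates(ps, 90.0))
```

With these illustrative values, two of the six gates suffice for 50% coverage, while five are needed to pass 90%. The last, least sensitized gate is never touched, which is where the overhead saving comes from.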
C. Coverage and Soft Error Failure Rate Reduction
In order to verify that there is a significant correlation between coverage determined using sensitization probability and soft error failure rate reduction that includes electrical and temporal masking factors, a Monte Carlo-based simulation framework similar to that described in [39] was implemented to estimate the reduction in soft error failure rate of circuits. The charge used for simulation was the worst case charge used for radiation hardening. The site for particle strikes and the input pattern were chosen randomly. Since the runtime of this SPICE-based simulator is exorbitant (over 10 hours for 100 000 patterns), experiments were run on small circuits from the MCNC benchmark suite [38] to verify this correlation.
In Table I, the reduction in soft error failure rate achieved when the gates are sized to achieve 90% coverage is presented. For a gate $g$, let $n_{unsized}(g)$ ($n_{sized}(g)$) be the number of soft errors due to a particle strike at $g$ in the unsized (sized) circuit. Similarly, let $A_{unsized}(g)$ ($A_{sized}(g)$) be the area of the gate $g$ in the unsized (sized) circuit. The sized and unsized circuits are simulated simultaneously. Therefore, the reduction in the soft error failure rate can be estimated as
It is clear from the results in Table I that coverage is a good estimate for soft error failure rate reduction. The average reduction in soft error failure rate across the circuits and process technologies is 80.9%. The difference between the 90% coverage target and this measured reduction can be attributed to electrical and temporal masking, as well as other factors. However, the computational complexity of accounting for electrical and other masking effects increases rapidly with the size of the circuit. Hence, the coverage determined using sensitization probability of gates is a good and computationally efficient metric to estimate the reduction in the soft error failure rate in logic circuits.
D. Order of Processing
When gates are dequeued from sensitizationQ, it is possible that a gate may be sized after one or more of its fanin gates have been sized. This perturbs the soft error sensitivity of the gates in the immediate fanin of the gate, since the increase in the gate's input capacitance was not accounted for when the fanin gates were originally sized. Since the gates are processed one at a time, the procedure Radiation-Harden has to be run multiple times until the changes in gate sizes stabilize.
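The effect of processing order on the number of passes can be illustrated with a toy fixed-point iteration. Everything in this sketch is an assumption made for illustration: the three-gate chain, the function `run_passes`, and in particular the size model `toy_size`, which simply makes the required size of a gate depend on the summed size of its fanout gates. It is not the sizing model of Section IV.

```python
def run_passes(order, fanout, size_for_load, max_passes=10, tol=1e-9):
    """Re-size every gate in the given order, pass after pass, until no
    gate size changes by more than tol. Returns (sizes, passes used)."""
    sizes = {g: 1.0 for g in order}  # start from minimum-size gates
    for passes in range(1, max_passes + 1):
        largest_change = 0.0
        for g in order:
            # Load seen by g: summed size of the gates it drives.
            load = sum(sizes[f] for f in fanout.get(g, ()))
            new_size = size_for_load(g, load)
            largest_change = max(largest_change, abs(new_size - sizes[g]))
            sizes[g] = new_size
        if largest_change <= tol:
            return sizes, passes
    return sizes, max_passes


def toy_size(gate, load):
    """Hypothetical placeholder: required size as a function of load."""
    return max(1.0, 3.0 - 0.5 * load)


if __name__ == "__main__":
    # Chain "in" -> "mid" -> "out"; "out" drives a primary output.
    fanout = {"in": ["mid"], "mid": ["out"], "out": []}
    print(run_passes(["out", "mid", "in"], fanout, toy_size))  # fanout first
    print(run_passes(["in", "mid", "out"], fanout, toy_size))  # fanin first
```

Both orders reach the same sizes in this toy chain, but the fanout-first order needs fewer passes, because each fanin gate is sized against a load that has already settled.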
The experiments with three passes of Radiation-Harden indicate that the impact of this effect on the overall performance of the algorithm is negligible. Area, delay, and power overhead change by less than 3% on average across all the benchmarks and process technologies. There are two reasons for this observed behavior.
1) First, the sensitization probability of a fanout gate usually exceeds that of its fanin gates. As a result, the number of cases where the fanin is processed before the fanout is less than 44.1% of the gates that are sized (24.5% of total gates) on average (for the benchmarks in Section VI 
E. Design Constraints
Sizing the transistors in a gate affects the three major design constraints: 1) area; 2) power consumption; and 3) delay, which can be integrated into the method Update-CoverageConstraints. Since the constraints are updated after each gate is sized, the algorithm terminates as soon as one of the constraints is violated. Area information is obtained from physical layout of the standard cell library. Area changes in discrete steps as $(W/L)_{min}$ increases. This is because in most standard cell libraries, gates of drive strength 1 and 2, 3 and 4, etc. usually have the same cell area. Power changes continuously as the gate is sized. The switching activity at each of the gates can be obtained during Fault-Simulate and can be used to estimate the increase in power after each gate is sized using a simple load model. If either the area or the power constraint is violated and Radiation-Harden terminates, the reduction in the soft error failure rate will be maximized for the overhead incurred, since the gates were processed in descending order of their collapsed sensitization probability.
Delay is the most difficult constraint to handle, since sizing changes not only the drive strength of a gate, but also the input and output capacitances. The effects of sizing a gate are thus not localized from a delay perspective, since all the gates in the transitive fanin and transitive fanout are impacted by the change in capacitance. The load-dependent nature of delay means that the problem of gate sizing for delay is NP-complete [10]. Recomputing delay after each gate is sized may be computationally expensive, so it may be done only if the gate is on a critical path. However, as presented in the experimental results in Section VI, delay is minimally impacted by the sizing procedure proposed in this paper. If the final delay exceeds specifications, techniques such as the one presented in [4] may be used to decrease the delay of the circuit. This is done by flagging the gates that have actually been sized for SEU immunity such that their sizes are not further reduced (i.e., these sizes serve as a lower bound so that SEU immunity is not compromised).
VI. SIMULATION RESULTS
The SPICE libraries for four process technologies (180, 130, 100, and 70 nm) were obtained from the Berkeley Predictive Technology Model [3]. The combinational benchmark circuits were chosen from the ISCAS85 and LGSynth91 suite [38]. $\tau_\alpha = 0.2$ ns and $\tau_\beta = 0.05$ ns are used in all the simulations in this paper [6]. A technology library that comprised inverters and two- and three-input NAND and NOR gates was built for synthesis of the benchmarks. The charges used for each process technology are presented in Table II. The charges were derived based on the discussion presented in Section IV-A. The doping densities for the n-channel were obtained from the SPICE files and used to scale the worst

Note that these values are only a guideline to determine the maximum charge for which radiation hardening is desired. For comparison, results obtained using the lower charge are also reported to show how radiation hardening overhead increases with increasing charge deposition for each process technology.

Table III presents experimental results for the 180- and 130-nm process technologies. Under the first major heading in Table III, details about the circuits that were chosen (name, number of primary inputs, number of primary outputs, and number of gates) are provided. Under the second major heading, the circuit function is reported. Under the third major heading, the fraction (in percent) of gates that were targeted for sizing is reported. This remains constant across all process technologies, since logical masking is the criterion used to determine coverage and the synthesized netlists remain the same. Under the fourth major heading, the area, power, and delay overhead when the gates in the circuit are sized to obtain 90% coverage for the 180 nm process technology is reported. The charges used to simulate SEUs were 0.2 pC (the lower charge) and 0.3 pC (the worst case charge).
The overhead is normalized with respect to the area, power, and delay of the original circuit after technology mapping and is reported as a percentage in all the cases. The area numbers are derived from the technology library, while power and delay are given by
where f and g are obtained from extensive simulation and characterization of the library cells. It is clear that roughly an order of magnitude reduction in the soft error failure rate, i.e., 90% coverage, for the worst case charge of 0.3 pC may be obtained with area, power, and delay overheads of 19.3%, 9.8%, and 1.2% on average. Under the fifth major heading, the results for the 130 nm process technology are reported. Similarly, for the worst case charge of 0.30 pC, overheads of 44.7%, 33.5%, and 4.9% in area, power, and delay are incurred on average. These results are also consistent with the expectation that process technologies with smaller feature sizes (130 nm in this case) will have a higher susceptibility to SEUs in comparison to a process technology with a larger feature size (180 nm in this case).

The table for the 100 and 70 nm process technologies is organized like Table III. Achieving 90% coverage for soft error failure rate reduction, for a worst case charge of 0.18 pC for the 100 nm technology, has overheads of 33.1%, 20.9%, and 5.1% in area, power, and delay, respectively. Similarly, a worst case charge of 0.15 pC for the 70 nm technology has overheads of 56.3%, 44.1%, and 4.0% in area, power, and delay, respectively. It is interesting to note that delay is minimally impacted, with an average overhead of 3.8% for the worst case charge across the four technology nodes. This can be further reduced using the techniques suggested in Section V-E.
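As a quick arithmetic check, the per-technology averages quoted above can be combined into the overall averages across the four process technologies. The dictionary name `OVERHEAD` and the helper `mean_overheads` are introduced only for this check; the numbers are those stated in the text.

```python
# Average overheads (area %, power %, delay %) for 90% coverage at the worst
# case charge, as quoted in the text for each process technology (in nm).
OVERHEAD = {
    180: (19.3, 9.8, 1.2),
    130: (44.7, 33.5, 4.9),
    100: (33.1, 20.9, 5.1),
    70: (56.3, 44.1, 4.0),
}


def mean_overheads(table):
    """Arithmetic mean of each overhead column over the technologies."""
    n = len(table)
    return tuple(sum(row[i] for row in table.values()) / n for i in range(3))


area, power, delay = mean_overheads(OVERHEAD)
print(f"area {area:.2f}%  power {power:.2f}%  delay {delay:.2f}%")
```

The means come out at roughly 38.35%, 27.08%, and 3.80%, which agree with the reported overall averages of 38.3%, 27.1%, and 3.8% to within the rounding of the per-technology values. Note that 3.8% is the average delay overhead; the 130 and 100 nm values individually exceed it.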
Finally, the choice of the process-related parameters τ α and τ β (0.2 and 0.05 ns) influences the results that are obtained, since they determine the magnitude and severity of SEU for a particular charge. In order to comprehend the effects of τ α and τ β on SEU severity and overhead, Table V presents the average overhead for the four process technologies for a range of values of the process parameters τ α and τ β and corresponding worst case charges. It is evident from this table that as τ α and τ β decrease, the average overhead required for radiation hardening by gate sizing increases. However, as explained in Section II, device-level hardening techniques may be used to reduce the impact of τ α and τ β as well as the amount of deposited charge Q. The results presented in this table are thus only indicative of the overhead that may be incurred if sizing were used based on currently available parameters.
VII. CONCLUSION
In the future, as designs become more complex and as the soft error failure rate of logic circuits becomes unacceptably high, there will be a need for gate-level techniques for radiation hardening. The gate sizing technique for radiation hardening presented in this paper targets soft error failure rate reduction by selectively sizing the most sensitive nodes in a logic circuit. The proposed technique has an average overhead of 38.3%, 27.1%, and 3.8% in area, power, and delay for worst case single-event upsets (SEUs) across four process technologies. Since the proposed approach has significantly less overhead than approaches based on fault detection and tolerance, and since it does not require any runtime support in hardware, it is an attractive option to reduce the soft error failure rate with minimal impact on performance. An area for future research is to investigate how the proposed technique can be integrated with other technology-dependent optimization algorithms with multiple objectives.
