Abstract-Near-threshold computing is an effective strategy to reduce the power dissipation of deeply-scaled CMOS logic circuits. However, near-threshold strategies exacerbate the impact of delay variations on device performance and increase the susceptibility to soft errors due to narrow voltage margins. The objective of this work is to develop and assess design approaches that leverage tradeoffs between performance and the resilience of fault masking coverage for various soft-error mitigation techniques. The primary insight from this work is identification of redundancy-based hardening techniques that can deliver increased benefits in terms of the fault coverage energy ratio (FCER) for the leveraged tradeoffs within iso-energy constraints at near-threshold voltage (NTV). Simulation results demonstrate that temporal redundancy approaches offer favorable tradeoffs in terms of FCER. They exhibit reduced impact on performance variations and achieve extensive soft fault masking, therefore improving the system robustness within acceptable delay constraints. Meanwhile, it is shown that a hybrid redundancy approach can be used to protect a low-power system to maintain throughput while tolerating soft errors. We demonstrate how the FCER metric can be used as an optimization parameter to guide circuit synthesis to meet performance and robustness goals. Finally, the impact of design diversity on spatial and hybrid redundancy at NTV is assessed in terms of FCER and delay variation to form overall recommendations regarding soft-error mitigation at NTV.

INTRODUCTION
Reductions in the power dissipation of CMOS circuits are desired due to thermal limitations associated with technology scaling and diminished energy budgets. These constraints restrict the number of simultaneously activated components, e.g., the proportion of active cores within a many-core processor. In this regard, supply voltage scaling has been widely recognized as an effective lever to reduce the power dissipation and energy consumption in CMOS logic datapaths.
Total energy consumption of CMOS logic circuits is determined by dynamic and static energy components, both of which depend on the supply voltage, V_DD. The dynamic energy component has a quadratic relationship with V_DD, whereas the static component has a linear dependence. Scaling down the supply voltage for low-power designs has shown significant improvements in energy efficiency [1]. However, aggressive voltage scaling negatively impacts system robustness by increasing soft error susceptibility in deeply-scaled CMOS-based computing systems [2]. Supply voltage reduction encounters an absolute theoretical limit around a value that is twice the thermal voltage, i.e., about 36 mV [3].
Operating at a low voltage, such as below the transistor's threshold voltage V_th, which is an order of magnitude larger than the thermal voltage, can result in a highly undesirable exponential increase in delay. The leakage energy also begins to dominate at a certain point, such that designs within this sub-threshold region have limited applicability. Hence, operation in the near-threshold region is sought, as it provides an energy-efficient operating tradeoff against delay. In this region, V_DD is set slightly above the V_th of the transistors to provide a 10X delay improvement compared to operation in the sub-threshold region with only a 50 percent reduction in energy savings [3]. Taking all of these factors into consideration, NTV can be preferred to provide up to 6X energy savings with reasonable performance overheads as compared to operating at nominal voltage [3], [4].
Even though NTV computing offers an attractive approach to balance power consumption versus delay for applications within iso-energy constraints, such as high-performance computing [3], it has reliability implications because the fault masking coverage is reduced. Herein, the reliability R(t) of a system is defined as the probability (0 ≤ R(t) ≤ 1) that soft errors do not impact the output of the circuit during the time interval from 0 to t. Similarly, the soft fault coverage of a redundant system refers to its ability to tolerate radiation-induced soft errors. The majority of such errors are due to a Single-Event Transient (SET) [5] that impacts a logic gate or a Single-Event Upset (SEU) [6] that strikes a register/flip-flop. These events are expected to increase at NTV and will be exacerbated by manufacturing-induced Process Variations (PV) due to technology scaling [7], [8]. As a result, soft errors during NTV operation can affect memory elements and cause arbitrary bit flip(s) in the stored data (SEU), or transient errors can propagate through a logic path and eventually become latched by a latch/flip-flop circuit (SET) to cause erroneous computations.
A system that tolerates all possible transient and upset glitches (SET and SEU) during its lifetime achieves complete soft fault masking coverage, i.e., 100 percent reliability. In particular, a spatial redundancy approach such as a Triple Modular Redundancy (TMR) arrangement employs triplicated datapath and register circuits and can therefore deliver complete fault masking coverage for faults in any single module at a given time instance. Generally, redundancy-based mitigation techniques, such as spatial [9], temporal [6], and hybrid redundancy [10], have been used to provide fault tolerance and improve system robustness. Numerous works have been reported recently aiming to provide viable soft-error detection and correction in memory systems using information redundancy, such as Error Correcting Codes [3]. However, in this work, we focus on datapath logic, which is more amenable to redundancy-based masking techniques than to data encoding. These masking techniques allow a range of favorable area versus performance tradeoffs. As near-threshold computing becomes mainstream, more mission-critical applications will seek to embrace this mode of operation for its energy benefits while requiring maximum reliability. The potentials and pitfalls of near-threshold computing for mission-critical applications, such as online banking systems and avionics operation and control systems, whose failure can have a severe impact [11], are end-product impacts of this work.
Herein, we evaluate the energy and performance costs at NTV while ensuring a certain degree of reliability. These are crucial design considerations if the use of low-voltage operation is to prevail in contemporary computing systems. In particular, we provide detailed tradeoffs between energy consumption and resilience of fault masking coverage for redundancy-based soft error mitigation approaches by introducing a new metric referred to as the Fault Coverage Energy Ratio (FCER). Additionally, an important insight of this work is the identification of redundancy-based hardening techniques that can deliver increased benefits for the leveraged tradeoffs within iso-energy constraints.
Specifically, the contributions of this paper are as follows:
1) quantify the impact of threshold voltage variation due to NTV operation on circuits protected via spatial, temporal, and hybrid redundancy mitigation approaches,
2) establish the relationship between soft fault masking and energy-efficient operation for multiple mitigation approaches via the FCER metric,
3) use FCER as an optimization parameter for the design of resilient circuits, and show its applicability to guide circuit synthesis algorithms while meeting performance goals, and
4) quantify the pros and cons of design diversity for redundancy-based soft error mitigation approaches.
CHIP-LEVEL SINGLE-EVENT CATEGORIES
As CMOS technology continues to scale towards fundamental physical limits, its immunity against charged particle strikes decreases significantly. Thus, soft errors at the chip level have become a major reliability challenge for VLSI designs [12]. To achieve high reliability in contemporary computing systems, memory protection techniques, i.e., Error Correcting Code (ECC), have been utilized. However, such techniques are inadequate to completely and efficiently protect VLSI circuits and systems operating at NTV. In particular, soft error effects (SEE) in logic circuits, i.e., latches/flip-flops and combinational logic, have become significant contributors to the increased Soft Error Rate (SER) at the system level [13], [14]. Therefore, it has become essential to ensure the integrity of logic paths by utilizing efficient soft-error mitigation techniques. At the chip level, various types of soft error effects can occur in CMOS logic circuits due to energetic radiation particles, including 1) particles that strike combinational logic and generate a transient pulse, i.e., an SET [15], which propagates through downstream logic and might eventually be captured by a latch/flip-flop, thereby causing an upset, 2) energetic particles that strike node(s) inside a storage element to cause a so-called SEU that flips the bit state in a DRAM memory cell or an SRAM-based register, and 3) energetic particle strikes on global signal lines, such as control signals or instruction lines, which can result in a so-called Single-Event Functional Interrupt (SEFI) that causes a temporary malfunction, i.e., an interruption of normal operation [16], as the wrong instruction might be executed. Since the corrupted data can be overwritten by new legitimate data, these temporary glitches are considered soft errors. Altogether, these events can cause catastrophic failures in mission-critical applications.
Thus, protection schemes need to be evaluated for energy-efficient operation within the near-threshold region.
Trends and Prospects of SER at NTV Region
The impact of soft errors in logic paths, as opposed to memory elements, is becoming significant as the supply voltage is scaled down to the near-threshold region. For instance, [17] predicts that the SER of logic circuits per die will become comparable to the SER for unprotected memory elements, which was later verified through experimental data measured from a microprocessor [8]. Operation at NTV is predicted to aggravate these trends. For example, [8] reports that SER increases by roughly 30 percent for each 0.1 V decrease as V_DD is decreased from 1.25 to 0.5 V. Moreover, both simulation and experimental results at the 40 nm and 28 nm technology nodes concur that SER increases by two orders of magnitude when V_DD is reduced from 0.7 to 0.5 V [8]. Primarily, the critical charge, Q_crit, needed to cause a failure decreases as V_DD is scaled, and SER has an exponential dependence on critical charge [8]. Such trends are consistent with decreasing feature sizes due to technology scaling [18]. For instance, in earlier technologies, gate capacitances held large amounts of charge and were therefore less prone to upsets; even if an upset did occur, it was most likely to be attenuated by electrical masking since the gate transition delay was larger than the SET pulse width [15]. With technology scaling, both the supply voltage and gate capacitance are shrinking, and thus a higher SER is predicted due to lower Q_crit and higher device density per unit area, i.e., roughly doubling, which leads to a higher strike probability.
Soft Error Masking Mechanisms in Logic Paths
In logic paths, the propagation of a transient pulse through downstream logic can be masked by three inherent masking mechanisms. These mechanisms can prevent the propagation of a spurious transient pulse along a path towards the input of a flip-flop/latch, where it may be registered to cause an SEU [17]: 1) Logical Masking: occurs when a transient pulse does not affect the computation in other gates along the path towards the output for a given input vector, 2) Electrical Masking: occurs due to the attenuation of the glitch while passing through subsequent logic gates, and 3) Latching-window/temporal Masking: occurs when a generated glitch does not arrive within the setup/hold time window of a flip-flop. Among the mechanisms listed above, logical masking is not impacted by operation at lower voltages since it is circuit-design dependent [15], [17], [19]. For example, a logic 0 on one input of a NAND gate or a logic 1 on one input of a NOR gate prevents an undesired transient pulse on the other input from being propagated to the next logic gate, so that masking is controlled by the state of the combinational logic. However, as operating frequencies at NTV are expected to be low, it has been suggested that pipeline stages consist of fewer gates to regain lost throughput. This will consequently lower the benefit of both logical and electrical masking. In addition, electrical attenuation is reduced at low supply voltages as large pulse-width transients are created. In contrast, latching-window masking benefits from the lowered operating frequencies. Latching-window masking is also dependent on the design of the flip-flop utilized [8], where some designs show more SER immunity than others. Overall, reduced pipeline depths, technology scaling, and voltage reduction can be anticipated to have a detrimental impact on logic SER. Thus, there is a need to develop effective soft error mitigation techniques for reliable NTV operation.
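The NAND/NOR example of logical masking can be checked with a small sketch. The helper names (`nand`, `nor`, `is_logically_masked`) are illustrative and not taken from the cited works; the model is purely Boolean and ignores electrical and latching-window effects.

```python
def nand(a: int, b: int) -> int:
    """2-input NAND on single bits."""
    return 1 - (a & b)


def nor(a: int, b: int) -> int:
    """2-input NOR on single bits."""
    return 1 - (a | b)


def is_logically_masked(gate, side_input: int) -> bool:
    """True if a glitch on the first input cannot change the gate output,
    given a stable value on the other (side) input."""
    return gate(0, side_input) == gate(1, side_input)


# A controlling value on the side input masks the glitch.
print("NAND, side input 0:", is_logically_masked(nand, 0))
print("NAND, side input 1:", is_logically_masked(nand, 1))
print("NOR,  side input 1:", is_logically_masked(nor, 1))
print("NOR,  side input 0:", is_logically_masked(nor, 0))
```

Because the check depends only on the logic function and the input vector, it is independent of V_DD, which is consistent with the observation that logical masking is not affected by voltage scaling.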
REDUNDANCY-BASED SOFT ERRORS MASKING TECHNIQUES
In the literature, several hardening techniques have been proposed to protect CMOS circuits against soft errors. They can be grouped into three categories: 1) hardening by process at the device level, 2) hardening by design at the circuit level, and 3) resilience by information encoding at the system level. Each level leverages some design property to protect the design against radiation-induced soft errors and ensure the integrity of data and computation. At the device level, fault avoidance techniques [12] are utilized. They concentrate on either increasing the critical charge, Q_crit, or reducing the charge collected at a struck vulnerable node. Increasing Q_crit requires an increase in the charge/discharge capacitance by resizing every vulnerable node, together with a modification of the existing fabrication process. This improves the layout design and reduces the probability that the deposited charge is collected at sensitive nodes [20]. However, as technology is further scaled, utilizing more robust fabrication processes incurs more complexity and increases the production cost [14]. Additionally, Radiation-Hardening By Process (RHBP) and material approaches are neither guaranteed nor efficient in mitigating both transient and upset glitches [21]. For instance, identifying and resizing every susceptible node in designs that consist of over 1 billion transistors is a complicated process, in addition to its expensive fabrication cost [22]. Therefore, Radiation-Hardening By Design (RHBD) techniques, i.e., fault correction techniques that rely on some form of redundancy at the circuit/module level of abstraction, are required, since they can be directly implemented in commercial applications without any modification of the technology process or use of advanced fabrication processes.
On the other hand, resilience-by-coding techniques, i.e., ECC and parity, which are used to protect memory elements, are the predominant techniques at the system level [14]. However, these strategies require extra bits of information along with the data to check the validity of the stored data bits and restore the correct state in the event of an upset. In addition, coding techniques are not widely applicable beyond memory elements in the circuit. Thus, an effective alternative is to utilize circuit-level schemes. They potentially offer attractive features, such as being directly implementable on commercial technologies and able to tolerate SEU and Multiple-Bit Upset (MBU), while incurring acceptable area, energy, and performance overheads at moderate cost. Note that the scope of this work is fault detection and correction techniques at the circuit level, such as spatial, temporal, and hybrid redundancy, rather than fault avoidance techniques at the device level.
To reduce the design complexity and fabrication cost of SER masking techniques designed at the device level, redundancy-based soft error mitigation approaches have been introduced as an efficient solution. Protection of logic paths against soft errors can typically be achieved by applying three redundancy strategies: spatial approaches such as TMR, temporal schemes such as delayed-clock shadow latches, and hybrid redundancy combining spatial and temporal approaches [5], [14], [23].
The SER in logic paths can be reduced by schemes such as gate sizing [24] or dual-domain supply voltage assignments [25] to harden components which are more susceptible to soft errors. These techniques trade off increased area and/or power to reduce the SER of the logic circuit, but may not be able to provide comprehensive coverage. For instance, an SER reduction of only 33.45 percent is demonstrated in [25] using multiple voltage assignments. One option identified for masking soft errors is spatial redundancy, in particular the readily-accepted use of TMR, which is effective for mitigating soft errors. TMR is considered appropriate for applications that demand immunity to soft errors and can accommodate its inherent overheads. In contrast, temporal and hybrid redundancy approaches have been introduced in the literature to handle soft error effects with reduced area and power consumption, while incurring higher computing delay than the conventional TMR approach [6], [10]. Next, we discuss each redundancy-based mitigation approach in detail.
Spatial Redundancy
Spatial redundancy involves replicating N instances of a circuit module and obtaining the majority output via a voting element. Hence, N is typically chosen to be an odd number to preclude ties. The output of a spatial redundancy arrangement can be considered correct whenever the majority of the instances produce identical and valid outputs. Spatial redundancy is typically utilized in applications that operate in harsh environments to ensure system operation even in unforeseen circumstances, such as autonomous vehicles and satellites [9], nuclear reactors [26], and deep space systems [27]. It has also been employed in commercial systems such as High-Performance Computing applications [28], where a significant increase in computing node availability is sought. Utilization of compute-node level redundancy at the processor, memory module, and network interface can improve reliability by a factor of 100-fold to 100,000-fold [28]. This is because TMR realizes 100 percent fault resilience coverage for faults in a single module at a time, compared to the simplex arrangement (unprotected design). However, this comes at the cost of roughly two orders of magnitude in area and energy overheads [29], which limits the usage of spatial techniques in applications with tight energy budgets. Thus, TMR is suitable for high-performance and mission-critical applications which can accommodate its inherent area and energy overheads.
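The majority condition above can be illustrated with the textbook binomial model of NMR reliability. This is a minimal sketch under assumptions that are not part of the paper's analysis: module failures are independent, every module has the same reliability `r`, and the voter is fault-free. The function name `nmr_reliability` is hypothetical.

```python
from math import comb


def nmr_reliability(r: float, n: int = 3) -> float:
    """Probability that a majority of n independent modules, each with
    reliability r, produce the correct output (ideal voter assumed)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number to preclude ties")
    majority = n // 2 + 1
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k)
               for k in range(majority, n + 1))


for r in (0.4, 0.9, 0.99):
    print(f"r={r}: simplex={r:.6f}  TMR={nmr_reliability(r, 3):.6f}  "
          f"5MR={nmr_reliability(r, 5):.6f}")
```

Under these assumptions the redundant arrangement only helps when the per-module reliability exceeds 0.5; below that point, majority voting amplifies the likelihood of an incorrect output.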
Meanwhile, identical yet invalid outputs in an N-Module Redundancy (NMR) system with a multiple-bit word output require the transient(s) to impact distinct NMR instances at the corresponding functional locations to manifest identically incorrect outputs. In the case of an isolated SET in an NMR system during a computation interval, the resultant soft error is masked. However, if more than one bit is upset then a MBU results. Spatial MBUs occur when a single particle upsets multiple bits which reside within the same physical neighborhood. Temporal MBUs occur when two or more particle strikes independently upset distinct NMR instances. MBUs may still generate a diagnosable error from an NMR word-wise voted output. In such scenarios, word-wise voting can be advantageous compared to bit-by-bit voting [30] .
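The difference between bit-by-bit and word-wise voting under multiple upsets can be sketched as follows. The function names and the 4-bit example values are illustrative assumptions, not taken from [30].

```python
from collections import Counter
from typing import Optional, Sequence


def bitwise_vote(a: int, b: int, c: int) -> int:
    """Bit-by-bit majority of three words; always returns some word."""
    return (a & b) | (a & c) | (b & c)


def wordwise_vote(words: Sequence[int]) -> Optional[int]:
    """Return the word held by a strict majority of instances,
    or None when no majority exists (a diagnosable error)."""
    word, count = Counter(words).most_common(1)[0]
    return word if count > len(words) // 2 else None


golden = 0b1010
# Single upset in one instance: both voters recover the golden word.
print(bin(bitwise_vote(golden, 0b1011, golden)),
      wordwise_vote([golden, 0b1011, golden]))

# Two instances upset, both at bit 0 (one also at bit 3):
# bit-by-bit voting silently returns a wrong word, while
# word-wise voting finds no majority and flags the error.
print(bin(bitwise_vote(golden, 0b1011, 0b0011)),
      wordwise_vote([golden, 0b1011, 0b0011]))
```

In the second case the bit-by-bit voter outputs a corrupted word without any indication, whereas the word-wise voter reports that no majority exists, which is the diagnosable behavior referred to above.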
Under nominal operating conditions, the energy consumption of NMR systems is about N-fold that of simplex (N = 1) systems, which lack the soft error masking capability. This paper explores the tradeoffs of operating NMR systems at NTV beyond processor caches where a low-complexity means for improving resilience has been sought [31]. For cache memories, Orthogonal Latin Square Codes (OLSCs) have been employed to encode orthogonal groups of checking bits without syndrome generation, yet enable recovery with majority voting, and further extensions to Variable-Strength Error Correcting Codes (VS-ECCs) have been employed which combine the use of ECC and memory tests to ensure reliable cache operation under aggressive voltage scaling [32]. Herein, we concentrate on redundancy-based mitigation techniques for logic paths as opposed to memory elements.
Temporal Redundancy
As compared to spatial redundancy, temporal redundancy uses a simplex instance of the datapath for error detection and correction. Here, data from the same logic circuit is captured at three distinct time instances to configure a voting arrangement [22]. In particular, the data is sampled at three different time instants (T1, T2, and T3) by using three identical flip-flops triggered by three different clock signals (CLK1, CLK2, and CLK3). These clock signals are delayed by relative phase shifts such that Φ1 is the phase delay between CLK1 and CLK2, while Φ2 is the relative latency between CLK1 and CLK3 [22]. These timing constraints, i.e., Φ1 and Φ2, are set depending on the transient pulse width to be covered, so that the legitimate data from the previous stage is latched in the registers. The final masked output is decided by a majority voter circuit that votes on the outputs of the three flip-flops [6]. Meanwhile, this technique is only immune to SEU, based on the assumption that either a single SET pulse occurs in the combinational logic path and is stored in one of the flip-flops to cause an upset, or a single transient glitch occurs directly inside one of the latching circuits and results in a soft error (SEU). In both cases, the soft error should be masked by the voter circuit. Thus, similar to the TMR arrangement, the temporal redundancy approach achieves complete SEU masking in a single module, whereas it fails in the rare case when two out of three registers are impacted concurrently, since the majority vote then selects the corrupted value.
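The role of the clock phase shifts can be illustrated with a simple behavioral sketch. It assumes an idealized rectangular SET pulse that inverts the signal for its whole width, ideal flip-flops with zero setup/hold time, and integer times in picoseconds; the function names and the specific phase values are hypothetical and are not taken from [6] or [22].

```python
def sample(t: int, correct: int, pulse_start: int, pulse_width: int) -> int:
    """Bit seen by a flip-flop sampling at time t (ps). The SET inverts
    the correct value during [pulse_start, pulse_start + pulse_width)."""
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return correct ^ 1 if in_pulse else correct


def majority3(a: int, b: int, c: int) -> int:
    """Majority of three bits."""
    return (a & b) | (a & c) | (b & c)


def temporal_vote(t1: int, phi1: int, phi2: int, correct: int,
                  pulse_start: int, pulse_width: int) -> int:
    """Sample at T1, T1 + phi1, T1 + phi2 and vote on the three samples."""
    times = (t1, t1 + phi1, t1 + phi2)
    return majority3(*(sample(t, correct, pulse_start, pulse_width)
                       for t in times))


WIDTH = 100  # assumed SET pulse width in ps

# Sample spacing equal to the pulse width: at most one sample is corrupted,
# so the vote is correct wherever the pulse lands.
always_masked = all(temporal_vote(0, WIDTH, 2 * WIDTH, 1, start, WIDTH) == 1
                    for start in range(-150, 400, 5))
print("masked for every pulse position:", always_masked)

# Spacing shorter than the pulse: one pulse can corrupt all three samples.
print("vote with too-short spacing:", temporal_vote(0, 40, 80, 1, -10, WIDTH))
```

In this model, masking is guaranteed only when consecutive sampling instants are separated by at least the pulse width, which is why the phase shifts are tied to the expected SET width.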
Hybrid Spatial and Temporal Redundancy
The hybrid spatial and temporal redundancy approach was introduced to detect and correct soft errors, since conventional Dual Modular Redundancy (DMR) is unable to mask transient faults (it can only detect them). In [23], Teifel et al. proposed a hybrid redundancy technique, namely Self-Voting DMR (SV-DMR). It incurs only slightly higher overheads than DMR, since only the sequential part is tri-instantiated as in the TMR approach, while the combinational part is only duplicated for error detection. Specifically, the outputs from the two datapaths are registered in R1 and R2 (DFF1 and DFF2), while a third register R3 receives its input from a self-voter, which votes on the two redundant combinational logic circuits to produce its output. Legitimate data is captured by R3 as long as the two external inputs match, i.e., in the absence of an SET. In this arrangement, R3 is triggered by a clock signal delayed by the width of the SET pulse. However, even though SV-DMR incurs lower overheads than the TMR approach, we explore below other approaches that can provide lower area-energy overheads.
In our previous work [10], a hybrid technique, Temporal Self-Voting Logic (TSVL), which captures the benefits of both spatial and temporal redundancy approaches, was introduced as an efficient approach that reduces the performance penalties of spatial redundancy, i.e., conventional TMR, and improves the performance of temporal redundancy. The technique is utilized to mask soft error effects in logic paths that occur due to a transient or an upset error. The TSVL approach requires a single datapath with duplicated registers/flip-flops, while a self-voter circuit votes based on the duplicated registers and its own fed-back output as the third voter input.
To ensure that the correct data is stored in the duplicated register, it should be triggered by a clock delayed by more than the generated transient pulse width (δ_SET). The clock signals, CLK1 and CLK2, operate at the same clock rate, but are separated by a phase shift Φ1 (Φ1 > δ_SET). Data from the previous stage is stored in the registers on the rising edge of the clock, so the timing constraints are essential to ensure that the correct data is stored. The final output is determined by a comparator circuit that ensures the output is masked; more details can be found in [10]. The hybrid redundancy approach realizes an alternative solution that lies between the temporal and spatial redundancy schemes: it consumes roughly 2-fold less area than TMR and incurs half the speed degradation of the temporal redundancy approach. In terms of Double Node Upset (DNU) [33], hybrid redundancy is also unable to detect and correct simultaneous double upsets. This indicates that the TSVL approach, like the other approaches, does not provide protection in the rare case when both the actual and the redundant registers are upset. Table 1 summarizes the differences between the redundancy-based mitigation approaches in terms of performance penalties, fault masking coverage, and design complexity.
QUANTIFYING SOFT-ERROR TOLERANT DESIGNS
Experiments and simulations are carried out to analyze and evaluate the performance penalties of the three redundancy-based soft error masking approaches in terms of occupied area, energy consumption, output delay variation, and resilience of fault masking coverage within iso-energy constraints. Each redundancy-based technique is synthesized and simulated using Synopsys Design Compiler based on the 45 and 22 nm HK-MG bulk CMOS PTM-based NanGate open source library [34]. Next, the synthesized netlist is imported into the Synopsys HSPICE tool for Monte-Carlo (MC) simulations. At least 1,000 MC simulation runs were conducted to evaluate the delay variation of each circuit. These simulations vary the V_th of the transistors in the netlist based on a Gaussian distribution with a mean equal to the nominal PTM model card value and a standard deviation σV_th as provided in [35]. These experiments are conducted on inverter chains composed of 26 fan-out-of-4 inverters, which is a similar setup to that adopted in [36]. Next, we discuss the impact of PV on the redundancy-based radiation-hardening approaches.
Impact of Threshold-Voltage Variation on Redundant Computation at NTV
Nanoscale devices are susceptible to process variations created by precision limitations of the manufacturing process. Phenomena such as Random Dopant Fluctuations (RDF) and Line-Edge Roughness (LER) are major causes of such variations in CMOS devices [35]. The increased occurrence of PV results in a distribution of threshold voltage V_th. As V_th increases, the increase in switching time affects the delay performance of the circuit. Such variability is observed to become magnified by continued scaling of the process technology node [35]. For example, the effect of RDF is magnified as the number of dopant atoms is fewer in scaled devices, such that the addition or deletion of just a few dopant atoms significantly alters transistor properties. In addition, a large impact on circuit performance occurs as the transistor on-current is highly variable near the threshold region [7]. Recent approaches for dealing with increased PV at NTV in multicore devices include leveraging the application's inherent tolerance for faults through performance-aware task-to-core assignment based on problem size [37]. Variation impacts at NTV on cache reliability have also been analyzed to leverage adaptive methods to dynamically adjust error control strength [38]. Next, we discuss how these PV effects, when combined in redundant arrangements of logic paths, can vary the critical path delay to exhibit a higher mean delay than simplex systems. Stated alternatively, NMR arrangements require a more-than-linear increase in energy in order to obtain a delay which is comparable to that of their component module.
Impact of Threshold Variation on Spatial Redundancy
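Operation at NTV can be seen to increase PV by approximately 5-fold, as quantified in [3] for simplex arrangements. In the case of an NMR arrangement, it can be expected that the worst-case delay will exceed that of any single module instance, since the voted output is valid only after the slowest of the N instances has settled. With D_i denoting the delay of instance i, the critical delay is

$$D_{\mathrm{NMR}} = \max_{1 \le i \le N} D_i + \delta, \qquad (1)$$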
where δ represents the delay of the voting logic, which contributes directly to the critical delay. Furthermore, the chance of having an instance with higher-than-average delay increases with N, which has been validated through experimental results quantified in Section 6. Overall, these results are in agreement with the delay distributions of 128-lane SIMD architectures demonstrated in [36], whereby the speed of the overall architecture is also determined by the slowest SIMD lane. Herein, we focus on the performance of NMR systems as compared to simplex systems, with 3 ≤ N ≤ 5. Intra-die variations for both the 22 nm and 45 nm technology nodes are simulated using MC simulations via HSPICE. A viable alternative approach for simulating PV at NTV is also proposed in [39], which captures both the systematic effects due to lithographic irregularities and the localized variations due to RDF. In the current paper, each module in an NMR arrangement can be anticipated to exhibit comparable spatial variability due to the relative proximity of its module instances in the physical layout. Thus, the scope of this work focuses on random variation impacts, while die-to-die variations would comprise future work. Here, variations caused by both RDF and LER effects are modeled as the random effects that vary the V_th of CMOS devices. The standard deviation σV_th values are adopted from [35] and are 25.9 mV and 59.9 mV for the 45 nm and 22 nm processes, respectively. Fig. 1 shows the mean delay for an inverter chain for commonly-used values of N. It is observed that the performance impact for the 45 nm technology node is around 10.6X on average at 0.5 V as compared to nominal voltage operation (note that the y-axis in Fig. 1 is logarithmic) for a simplex system. However, it reduces to 6.29X when the voltage is increased to 0.55 V. The performance impact for NMR systems with N = 3 and N = 5 tends to follow the same behavior.
This is in agreement with the near-threshold performance results noted in [3] .
Results in Fig. 1 indicate that the mean delay is slightly higher for NMR systems and tends to increase with N.
The spread in mean delays between simplex and NMR systems also increases with the increased variability effects notable at NTV, as shown in Fig. 2. However, more values are clustered closer to the mean when N is increased. This is observed in the delay distributions in Fig. 3 for V_DD = 0.55 V, whereby the mean values are increased, causing right-shifted peaks for N = 3 and N = 5 compared to simplex, while the variances are decreased, causing narrower spreads. In addition, the performance difference between simplex and NMR systems at the 22 nm technology node is magnified due to its higher coefficient of variation compared to the 45 nm technology node, as noted in Fig. 2. For instance, at 22 nm and 0.55 V, the mean delays for N = 3 and N = 5 are 1.16X and 1.24X the mean delay of a simplex system, respectively (see Fig. 1). In contrast, for the 45 nm technology node, the ratios are only 1.06X and 1.09X for N = 3 and N = 5 systems, respectively. These numbers tend to be higher when operating very close to the V_th of the transistors due to increased variations.
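The qualitative trend, a larger mean but a narrower spread as N grows, follows from taking the maximum of N random module delays. The sketch below is a statistical toy model rather than a reproduction of the HSPICE experiments: it assumes independent, Gaussian-distributed module delays with an arbitrary 10 percent coefficient of variation, whereas the simulated circuits derive delay from V_th variation through the transistor models. The function name, parameter values, and seed are hypothetical.

```python
import random
import statistics


def nmr_delay_samples(n_modules: int, runs: int = 5000, mean: float = 1.0,
                      sigma: float = 0.1, voter_delay: float = 0.0,
                      seed: int = 0) -> list:
    """Monte Carlo samples of the NMR critical delay: the slowest of
    n_modules independent Gaussian module delays plus the voter delay."""
    rng = random.Random(seed)
    return [max(rng.gauss(mean, sigma) for _ in range(n_modules)) + voter_delay
            for _ in range(runs)]


for n in (1, 3, 5):
    delays = nmr_delay_samples(n)
    print(f"N={n}: mean={statistics.mean(delays):.3f}  "
          f"stdev={statistics.stdev(delays):.3f}")
```

Raising `sigma` in this model widens the gap between the simplex and NMR means, which mirrors the larger simplex-to-NMR difference reported for the 22 nm node with its higher coefficient of variation.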
Impact of Threshold Variation on Temporal Redundancy
Previous results indicate that speed degradation worsens as V_DD is lowered for redundant implementations. More importantly, the variation in the output delay of redundant implementations was also found to increase when V_DD is scaled down from the nominal level to the NTV region [22]. One reason is that the buffering circuit for the SET pulse width in temporal redundancy incurs more delay variation and speed degradation as the supply voltage is scaled down. However, our previous work [22] shows that temporal redundancy achieves better tolerance to delay variation at NTV than both TMR and SV-DMR. Even the variation at the more recent technology node (16 nm) was found to be lower than that at the 45 nm technology node for TMR and the hybrid redundancy approach. Thus, the temporal redundancy technique can effectively decrease variation-induced timing errors. This indicates that temporal redundancy can be used to harden large-scale logic circuits more adequately against radiation-induced soft errors (SET and SEU) while incurring a minor delay variation and a reasonable speed degradation at NTV.
Impact of Threshold Variation on Hybrid Redundancy
Although the hybrid redundancy approach improves on some limitations of the conventional spatial and temporal approaches, it imposes a higher impact of PV on the output delay path. This is because both the SV-DMR [23] and TSVL [10] approaches incur a longer critical delay path than the conventional temporal redundancy or TMR approaches, as observed in Eq. (2) [10], [22]. Thus, the higher the complexity of the fault masking resolution circuit, the higher the impact of PV on the critical delay path.
where d_voter and d_SET represent the delay of the voter circuit and the delay of the SET pulse width, respectively.
Fault Coverage Energy Ratio
To realize tradeoffs between the fault-masking coverage achieved and its cost in terms of power and delay, an appropriate metric is proposed. The metric is then used to optimize the design of several low-power circuits. We evaluate the redundant arrangements by calculating a new metric called the Fault Coverage Energy Ratio (FCER), which divides the soft or transient fault masking coverage by the worst case energy consumption. The aim is to maximize the resilience of fault coverage and minimize the consumed energy. Thus, the higher the value of the FCER metric, the better the protection arrangement. FCER is expressed as

FCER = (Fault Masking Coverage) / (Worst Case Energy Consumption)
The worst case energy consumption considers the power consumed by the logic circuit as well as the protection circuit. To consider worst case conditions, the circuit needs to be evaluated through the application of worst case input vectors. We use FCER to evaluate the overheads of providing soft error resilience using multiple mitigation techniques. We identify the following applications of the FCER metric: 1) FCER serves to compare multiple mitigation techniques; 2) FCER serves as a parameter for the synthesis of resilient circuits. In an iterative design process, the metric can be used as an optimization parameter to guide selection of the protection approach and meet performance and energy goals.
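The two uses of FCER listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Candidate` record, the `select_best` helper, the delay constraint, and all numeric values are hypothetical placeholders and are not taken from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    coverage: float           # fraction of injected soft faults masked (0..1)
    worst_case_energy: float  # energy under worst case input vectors (e.g., pJ)
    delay: float              # critical-path delay (arbitrary units)


def fcer(c):
    """Fault Coverage Energy Ratio: masking coverage over worst case energy."""
    if c.worst_case_energy <= 0:
        raise ValueError("worst case energy must be positive")
    return c.coverage / c.worst_case_energy


def select_best(candidates, max_delay):
    """Pick the candidate with the highest FCER that meets the delay bound.

    Returns None when no candidate satisfies the constraint.
    """
    feasible = [c for c in candidates if c.delay <= max_delay]
    if not feasible:
        return None
    return max(feasible, key=fcer)


# Placeholder values for illustration only (not measured results).
candidates = [
    Candidate("scheme_A", coverage=1.00, worst_case_energy=3.0, delay=1.2),
    Candidate("scheme_B", coverage=0.95, worst_case_energy=2.0, delay=1.5),
    Candidate("scheme_C", coverage=1.00, worst_case_energy=1.8, delay=2.5),
]

if __name__ == "__main__":
    for c in candidates:
        print(f"{c.name}: FCER = {fcer(c):.3f}")
    best = select_best(candidates, max_delay=2.0)
    print("selected:", best.name if best else "none")
```

Treating the delay limit as a hard constraint and FCER as the objective is one possible way to embed the metric in an iterative flow; a weighted objective would be an equally valid design choice.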
FCER Evaluation of Redundancy-Based Approaches
The FCER analysis for the redundancy-based soft error mitigation techniques is summarized in Table 2. Note that only the fault masking coverage of the TSVL approach is listed in Table 2, while the masking coverage of the other redundancy-based techniques is assumed to be 100 percent. Additionally, Fig. 4 shows FCER as V_DD is scaled down from the nominal level (1.1 V) to 0.5 V for 45 nm technology. As observed in Table 2, at the nominal supply voltage, the hybrid redundancy scheme (TSVL) achieves the preferred tradeoffs in terms of FCER among the redundancy-based soft error mitigation approaches for most of the selected benchmark circuits, irrespective of whether the combinational or the sequential logic ratio is higher. Temporal redundancy realizes high or acceptable FCER when the combinational logic ratio is high, whereas TMR achieves desirable or acceptable FCER either when the sequential logic ratio is high or when the benchmark circuit occupies a small area. On the other hand, in the NTV region, the temporal redundancy approach realizes an increased benefit in terms of FCER, while TSVL still achieves competitive FCER compared to the temporal approach. Consequently, the TSVL approach can be utilized to harden logic paths against radiation-induced soft errors under an iso-energy consumption budget, with a minimal reduction in masking coverage of roughly 1.5 percent of transient errors. Likewise, the temporal approach offers promising attributes at NTV operation while incurring an acceptable delay variation and performance degradation.
Design and Analysis Using FCER
To further investigate the energy efficiency and the resilience of fault masking coverage of the highlighted approaches, we synthesized and optimized, via FCER maximization, the following applications in addition to the benchmark circuits: 1) a 5-stage pipeline processor, 2) a cryptographic processing core, and 3) a memory controller. Using the results, realistic estimates of performance penalties versus robustness are obtained for mission-critical systems. These circuits were synthesized and simulated using Synopsys Design Compiler with the 45 nm technology node of the NanGate open source library. We show how the energy savings obtained from NTV operation can be utilized to fortify both the computational logic and the pipeline registers against soft errors via the aforementioned mitigation approaches. Then, we investigate the effects of soft errors on the protected computational logic while evaluating area overheads. For example, a TMR arrangement is utilized with majority voting, which is able to mask corrupted outputs. The Self-Voting DMR [23] approach, which has lower overhead than TMR, is also evaluated. Alternatively, the temporal and hybrid (TSVL) redundancy approaches replace the pipeline registers, so that soft fault detection and correction circuitry is deployed only in the pipeline registers while a simplex instance of the datapath is utilized. The simulation results for a 5-stage pipeline 32-bit MIPS processor show that the hybrid redundancy (TSVL) scheme incurs an energy increase of only 5.7 percent at the nominal supply voltage compared to the temporal redundancy approach, which incurs the lowest power consumption and area overhead. This is due to a major contribution of the TSVL hardening scheme, as it utilizes a single computational logic instance for error detection together with duplicated registers, instead of the triplicated registers of the temporal redundancy arrangement. Therefore, the performance penalties of TSVL are reduced compared to the conventional TMR or SV-DMR pipeline stages. However, for the Advanced Encryption Standard (AES) encryption circuitry, temporal redundancy realizes the best energy results among all utilized techniques. This is because the AES encryption module has a large fraction of computational logic that remains unprotected under temporal redundancy, while only the state-holding logic, i.e., registers/flip-flops, is triplicated. Therefore, a minimum area overhead is required to immunize the registers. Fig. 5 demonstrates that temporal redundancy delivers an optimized Energy Delay Product (EDP) at 0.8 V, whereas TSVL needs to operate at 0.75 V to realize the optimal EDP. Although temporal redundancy lowers energy consumption, it has a higher propagation delay in the NTV region due to the buffer circuits that delay the main CLK signal and generate the phase-shifted signals used to capture the input data at different time instances. Meanwhile, considering a low supply voltage (V_DD = 0.75 or 0.55 V) for energy efficiency, the TSVL approach delivers intriguing energy savings. It competes well with the temporal redundancy approach by achieving roughly 2.96 and 26.76 percent improvement, respectively. Therefore, it can be utilized to fortify low-power circuits and systems against transient and upset faults. On the other hand, both the conventional TMR and SV-DMR redundancy approaches incur high energy and EDP compared to either the temporal or TSVL approaches at NTV. Consequently, variation-aware fault-tolerant designs can be hardened via temporal redundancy. This is because it incurs less delay variation [22] in the critical output datapath and preserves the best tradeoff between energy saving and fault masking coverage for operations at the nominal supply voltage, as listed in Table 2 and discussed later on.
Evaluation of Fault Coverage
In this work, we utilize an in-house fault-injection module developed at the RTL level to simulate and evaluate the transient fault masking capability of circuits implemented in Verilog HDL. Large-sized circuits such as a 32-bit pipeline processor, a 128-bit AES encryption core, and an I2C controller have been evaluated. This helps us accelerate the determination of FCER for practical systems. For the experimental results herein, 1,000 faults each of SET and SEU type were injected into the gate-level Verilog code describing the benchmark circuitry configured with the redundancy-based mitigation techniques. SETs were injected into a node impacting a logic gate, while SEUs were injected into the input of a register/flip-flop at the rising edge of the main CLK signal. All errors were injected at randomly selected locations. The evaluation results showed that the spatial (TMR) and hybrid (SV-DMR) redundancy approaches can prevent the injected transient fault from propagating through the computational logic of the pipeline stages in all cases. On the other hand, for the hybrid redundancy (TSVL) approach, 5.66 percent of the injected errors were observed to corrupt a pipeline register, causing a transient error at the register's output.
Further investigation into the fault coverage results of the TSVL approach listed in Table 2 reveals interesting trends. Specifically, the TSVL approach achieves high fault resilience for systems that include a higher proportion of combinational logic than sequential logic. For instance, the majority of the logic (roughly 85 percent) of the AES encryption circuitry is combinational logic for computation, while only about 15 percent is used for registers. Therefore, in this case, the possibility of having an SEU at the intermediate registered output hardened with the hybrid redundancy is reduced. On the other hand, for the I2C controller benchmark circuit, the TSVL approach provides lower robustness (93.15 percent), since the I2C controller circuitry includes a high proportion of sequential logic for holding its state, and thus there is a higher probability for an intermediate register to capture an SET and cause an upset. Even with these unpreventable transient faults, the TSVL approach still achieves high soft fault masking, with a margin of error of roughly 7 percent.
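The structure of such a fault-injection campaign can be illustrated with a toy behavioral model. The sketch below is not the in-house RTL tool used in this work: the 8-bit `golden` function, the single bit-flip fault model, and the helper names are assumptions chosen for illustration. Each trial flips one random output bit of one randomly chosen module and checks whether the final output still matches the fault-free value.

```python
import random

WIDTH = 8  # bit width of the toy datapath (illustrative assumption)


def golden(x):
    """Toy combinational function standing in for a benchmark circuit."""
    return (x * 3 + 1) & ((1 << WIDTH) - 1)


def majority(a, b, c):
    """Bitwise 2-of-3 majority voter."""
    return (a & b) | (a & c) | (b & c)


def run_once(x, n_modules, rng):
    """Inject one bit flip into one module; return True if it is masked."""
    outputs = [golden(x)] * n_modules
    victim = rng.randrange(n_modules)
    outputs[victim] ^= 1 << rng.randrange(WIDTH)
    result = outputs[0] if n_modules == 1 else majority(*outputs)
    return result == golden(x)


def coverage(n_modules, n_faults=1000, seed=1):
    """Fraction of injected faults masked; n_modules must be 1 or 3."""
    if n_modules not in (1, 3):
        raise ValueError("this sketch models simplex (1) or TMR (3) only")
    rng = random.Random(seed)
    masked = sum(
        run_once(rng.randrange(1 << WIDTH), n_modules, rng)
        for _ in range(n_faults)
    )
    return masked / n_faults


if __name__ == "__main__":
    print("simplex coverage:", coverage(1))
    print("TMR coverage:", coverage(3))
```

Because the fault is applied directly at a module output, this model contains no logical or electrical masking, so the simplex arrangement masks nothing and the TMR arrangement masks every single fault. The resulting coverage value is the numerator that would feed an FCER calculation.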
Summary of Fault-Tolerant Design Using FCER
The peak tradeoff between energy efficiency and resilience of fault masking for the hybrid (TSVL) approach compared to the temporal redundancy approach is obtained at NTV (V_DD = 0.75 V), where an energy saving of 2.96 percent is achieved. Furthermore, the hybrid redundancy approach realizes a higher FCER than temporal redundancy in the NTV region, as shown in Table 2 and Fig. 6. However, our results demonstrate that this is worthwhile for an application or system with the following traits: balanced sequential and combinational resources, or an iso-energy objective that allows sacrificing some fault resilience while meeting the constrained energy budget. For instance, image processing applications are a promising field in which to employ the hybrid redundancy in the NTV region. This demonstrates the benefit of utilizing either the TSVL or the temporal redundancy approach at NTV to protect low-power designs against transient faults.
In summary, energy usage is minimal when using the hybrid redundancy (TSVL) for protection, at the expense of reduced performance due to speed degradation. Therefore, to improve the reliability of low-power computing systems in the NTV region, the temporal redundancy approach can be selected to harden designs that consist of a higher proportion of sequential logic resources, whereas the hybrid redundancy can be chosen to protect a design with a balanced usage of computational and sequential logic resources. Tradeoffs between energy efficiency and resilience of fault masking coverage can be obtained by utilizing the hybrid redundancy approach to protect a low-power system that must maintain high performance and can tolerate a few unmasked soft errors. Finally, the temporal redundancy approach can be selected to effectively immunize mission-critical systems against radiation-induced transient faults while delivering energy efficiency and preserving high soft fault masking with an acceptable performance degradation. In this case, the FCER metric is validated as an optimization parameter to guide circuit synthesis algorithms to meet performance goals after the selection of a mitigation technique. It is noteworthy that the electrical and logical masking mechanisms that inherently mask transient faults are not considered during the fault injection process; thus, fault masking coverage can be expected to be higher in the field than the values estimated.
Relationship of Area and FCER
Herein, we explore the relationship between the SER and the area overhead of redundant systems by using a set of ISCAS89 benchmark circuits with various numbers of cells. It turns out that the ratio for all circuits is the same, except that the power is positively correlated with area and the probability of soft error is positively correlated with area. Thus, an overall FCER = O(Area^2) relationship is observed in Fig. 7. Meanwhile, since the TSVL approach offers a reduced SER while incurring modest performance penalties in terms of occupied area, power consumption, and speed degradation, it is considered an intriguing approach that addresses or alleviates the drawbacks of the spatial and temporal redundancy approaches.
DESIGN DIVERSITY AND SPATIAL REDUNDANCY
In this section, we investigate the use of Design Diversity, which is a circuit synthesis technique, to mitigate soft errors. Design Diversity helps to prevent so-called Common Mode Failures (CMFs) at negligible cost [40] . By leveraging design diversity among the modules, different outputs are produced under CMF such that an error is detectable [41] . Herein, we employ design diversity with spatial redundancy as a use-case to test the efficacy of fault-tolerant circuit synthesis using FCER. For instance, a TMR arrangement is constructed with three different structures while a voter circuit determines the majority condition in order to determine the final masked output.
Various TMR arrangements are considered utilizing a diverse logic path design based on the NAND2 and INV gates shown in Fig. 8. All of these TMR arrangements are functionally equivalent, yet exhibit different amounts of variability. For example, 22 nm TMR arrangements based on NAND2 gates exhibit about 13 percent less variation as compared to a TMR arrangement with INV gates. Thus, in addition to its capability to reduce the probability of common failures, design diversity with spatial redundancy can be utilized to reduce the impact of process variation and improve fault tolerance. However, in our experiments with the inverter chain, the mean delays for NAND-based systems are higher than for INV-based systems, which can offset the benefit of reduced delay variation.
The influence of variability is also dependent on the length of the logic datapath, i.e., the number of gates in the critical path. For instance, it is noted in [36] that the variability decreases as the length of the inverter chain increases. However, as pointed out earlier, logic datapaths operating at NTV may be structured to have relatively shallower depths. In this work, alternative synthesis techniques are demonstrated which can lower the amount of variability at NTV. For instance, it is observed that the variation is also dependent on the type of logic gate utilized. Fig. 9 shows that functionally-identical inverter chains built using NAND2 gates, with their inputs tied together to realize the same function as an INV gate, exhibit the least amount of variation. As future work, we plan to investigate the pros and cons of design diversity with evaluation of more functionally-complex benchmark circuits.
To quantify the advantages and disadvantages of diverse TMR arrangements, we utilize the FCER metric defined earlier in Section 4.2. Table 3 summarizes some design arrangements of TMR systems. Results indicate that the inverter-based chain achieves the highest FCER, but it incurs the highest delay variation, which impacts performance as compared to the diverse implementations. Additionally, these results are based on a single error injection scenario for each diversity-enabled TMR inverter chain, since the conventional TMR approach is only able to tolerate a single error at a time. Thus, the diversity-enabled TMR arrangements might be used to alleviate the issue of MBU, aside from their benefit in tolerating CMFs. Meanwhile, delay variation results in time-dependent bit errors when the delay variation exceeds the clock period. Therefore, considering both the transient error (SET/SEU) and the timing error when calculating the fault masking coverage will provide a more accurate evaluation of the FCER metric. It provides a means to further highlight the pros and cons of design diversity when used with spatial redundancy.
RESILIENCE OF NMR ARRANGEMENT AT NTV
Near-threshold operation allows improved energy efficiency. The energy savings can be employed either for reduced power operation or to increase resilience via NMR. Thus, operation of NMR systems in the near-threshold region allows for consideration of interesting tradeoffs. For instance, increasing N from 1 to 3 can be evaluated as a means to increase reliability within the same energy consumption budget as a simplex system operating at the nominal supply voltage (V_DD). This is valid provided that the increase in delay, and thus the corresponding drop in performance, is acceptable. Note that this pursuit of increased reliability is predicated upon the assumption that the source of variability in the near-threshold region is variation in V_th, to which this work is restricted. Further study needs to be performed to determine the reliability levels provided by NMR systems in this operating region under other noise sources such as variation in V_DD [42], inductive noise, and temperature [43].
Based on these assumptions, we point out these tradeoffs in Figs. 10 and 1. For instance, they show the feasibility of TMR operation at approximately 0.69 V (operating point B on the plot), which on average consumes an energy budget comparable to that of a simplex system operating at the nominal voltage of 1.1 V (operating point A on the plot) while incurring a delay difference of 2.58X. Similarly, it is possible to achieve 5MR operation at approximately 0.545 V (operating point C on the plot) on average while incurring a performance impact of 7.15X. This represents a greater-than-linear increase in delay as a function of N, as compared to TMR operation. Thus, for mission-critical applications, this offers insights into tuning the degree of redundancy facilitated by a near-threshold computing paradigm.
To consider the feasibility of reducing energy consumption while simultaneously providing soft error masking, it is valuable to consider Fig. 10. The energy requirement of a simplex arrangement at 1.1 V is about 0.541 pJ, while the TMR curve at 0.55 V is only about 0.330 pJ. Thus, a TMR arrangement at 0.55 V results in an energy savings of 38.4 percent as compared to a simplex system at 1.1 V. This means that, compared to a simplex arrangement at nominal voltage, selecting a supply voltage of 0.55 V allows for the provision of TMR for soft error masking in the presence of technology scaling while still reducing the energy requirement significantly.
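The savings figure above follows from a simple relative difference, which the short sketch below reproduces from the two approximate energies quoted in the text. The helper name `energy_savings_percent` is hypothetical. With the rounded inputs the result is roughly 39 percent, close to the 38.4 percent reported; the small gap presumably arises because the quoted energies are rounded readings of Fig. 10.

```python
def energy_savings_percent(baseline_pj, candidate_pj):
    """Relative energy reduction of the candidate versus the baseline, in percent."""
    if baseline_pj <= 0:
        raise ValueError("baseline energy must be positive")
    return 100.0 * (baseline_pj - candidate_pj) / baseline_pj


# Approximate values quoted in the text (read from Fig. 10).
simplex_nominal_pj = 0.541  # simplex arrangement at 1.1 V
tmr_ntv_pj = 0.330          # TMR arrangement at 0.55 V

savings = energy_savings_percent(simplex_nominal_pj, tmr_ntv_pj)

if __name__ == "__main__":
    print(f"TMR at 0.55 V vs simplex at 1.1 V: {savings:.1f} percent savings")
```

The same helper can be applied to any pair of operating points read from the energy curves, for example to check whether a given N and supply voltage stay within an iso-energy budget (a non-negative savings value).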
RELATED WORK
In the literature, several soft error mitigation approaches have been introduced to harden large-scale logic circuits and systems against transient and upset glitches. For instance, an evolved temporal sampling methodology, associated with minimum overheads of area, power, and performance degradation, was proposed by Mavis et al. [6]; they describe how the design methodology can totally eliminate all soft errors, i.e., foreseeable SETs and SEUs, in any synchronous microcircuit. The key contribution is the ability to implement the proposed approach in combinational logic preceding a latch or to utilize it to protect the latch itself. On the other hand, Avirneni and Somani [19] employed the temporal redundancy approach for mitigating soft errors in a two-stage pipeline processor. The authors proposed two approaches, called Soft Error Mitigation (SEM) and Soft and Timing Error Mitigation (STEM). The latter can be used to detect and correct timing errors that occur when operating at very high frequencies, so-called over-clocking, in addition to detecting and correcting soft errors. Both approaches provide 100 percent fault coverage, and since the proposed schemes are designed such that they do not incur any reduction in the system's overall performance during fault-free operation, they offer better performance. Specifically, SEM attains a 26.58 percent performance improvement on average compared to a conventional TMR approach, while STEM outperforms SEM by 27.42 percent. In [44], a heterogeneous spatially and temporally redundant system is designed and implemented by exploiting the linear transform of pipelineable applications. The proposed schemes are based on Concurrent Error Detection (CED). In the heterogeneous CED (hCED) based spatial redundancy, the redundant module is used as a checker for the original module, and since it only functions as an error detection circuit, its size can be reduced considerably, which is the major motivation of the hCED scheme.
The major drawback of hCED is that it is an application dependent scheme since it needs an invariant condition to identify a module as fault-free. Additionally, it can only be used for error detection but not for correction. Alternatively, SV-DMR [23] combines self-voting with DMR or CED. As shown herein, SV-DMR incurs significantly lower performance penalties as compared to TMR. Additionally, lower area overheads as compared to TMR in the range from 10 to 24 percent are reported depending on the proportion of combinational logic in the circuit [23] .
To improve the SV-DMR approach, Temporal Self-Voting Logic (TSVL) [10] was introduced to achieve energy efficiency irrespective of the ratio of combinational and sequential logic, while sacrificing some fault masking coverage. On average, the TSVL scheme outperforms SV-DMR by 2.15 percent, while improving area overhead by roughly 22 percent at a cost of 1.5 percent less error resilience. Herein, we have analyzed the energy consumption versus fault resilience and the design diversity versus fault masking coverage of redundancy-based mitigation techniques for near-threshold operation. In addition, we investigate the impact of threshold voltage variation on the output delay, and highlight the technique that achieves increased benefits in terms of FCER and less delay variation at NTV for several benchmarks and systems.
Future work includes the development of mitigation techniques that are able to concurrently tolerate MBU. This is because the spacing between sensitive nodes is reduced with advancing technology generations, and thus the vulnerability of SEU-immune logic circuits increases. In other words, charge sharing between adjacent nodes can upset two nodes simultaneously, causing a Double Node Upset (DNU), which has become a major contributor to reduced reliability [33], [45]. Consequently, effective solutions that address this issue completely and efficiently are sought to improve the reliability of nanoscale circuits and systems.
CONCLUSION
NTV operation provides an energy-efficient and performance-optimal operating point. However, it gives rise to resilience concerns due to increased susceptibility to soft errors and the impacts of process variation. Evaluating the tradeoffs between resilience and energy efficiency with the proposed FCER metric across a wide range of redundancy-based approaches provides a unifying measure of these tradeoffs. Additionally, the benefits of NTV operation are highlighted through the opportunity to increase resilience using spatial redundancy within iso-energy constraints. Overall, the temporal redundancy approach demonstrates promising results for improving the reliability of low-power computing systems that consist of higher proportions of sequential logic resources, and therefore it can be considered for hardening variation-aware designs. On the other hand, the hybrid redundancy approach demonstrates beneficial results for a design containing a balance between combinational and sequential logic resources. Finally, the developed FCER metric provides a concise and effective optimization parameter to guide the synthesis of fault-tolerant circuits to meet performance and energy goals.
ACKNOWLEDGMENTS
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Faris S. Alghareb received the MS degree in computer engineering from the University of Mosul, Mosul, Iraq. Currently, he is working toward the PhD degree in computer engineering in the Department of Electrical and Computer Engineering, University of Central Florida. His research interests include soft-error resilient computing architectures, reliable VLSI design with emphasis on low power and high performance, reconfigurable computing, imprecise signal processing, and spin-based emerging nonvolatile (NV) latching circuits. He is a student member of the IEEE.
Rizwan A. Ashraf received the MS and the PhD degrees in computer engineering from the University of Central Florida, in 2013 and 2015, respectively. He is currently with the Oak Ridge National Laboratory. His research interests are in reliable and low-power computing, fault-tolerant system design, reconfigurable computing, and high-performance computing. He is a member of the IEEE.
