There is a growing concern about timing errors resulting from design marginalities and the effects of circuit aging on speed-paths in logic circuits. This paper presents a low overhead solution for masking timing errors on speed-paths in logic circuits. Error masking at the outputs of a logic circuit is achieved by synthesis of a nonintrusive error-masking circuit that has at least 20% timing slack over the original logic circuit. The error-masking circuit can also be used to collect runtime information when the speed-paths are exercised to (i) predict the onset of wearout and (ii) assist in in-system silicon debug. Simulation results for several benchmark circuits and modules from the OpenSPARC T1 processor are presented to illustrate the effectiveness of the proposed solution. 100% masking of timing errors on all speed-paths within 10% of the critical path delay is achieved for all circuits with an average area (power) overhead of 16% (18%).
Introduction
It is widely acknowledged that there will be a sharp increase in hardware failures in scaled CMOS technologies, e.g., [1, 2]. Timing errors resulting from process variability and manufacturing defects in scaled CMOS technologies are an important, and possibly dominant, failure mode that impacts hardware reliability. In addition, long-term degradation effects like hot carrier injection, electromigration, and negative-bias temperature instability are also projected to increase timing failures with technology scaling. Timing errors are most pronounced on critical or near-critical paths, also called speed-paths, in multi-level control logic modules in integrated circuits. Control logic, with its irregular, multi-level structure and significant number of speed-paths, is usually the most challenging part of an integrated circuit (i) to achieve timing closure on, (ii) to validate and debug during the post-silicon phase, and (iii) to achieve high fault coverage on during manufacturing test. The system-level effects of errors resulting from failures in control logic can be far reaching, especially given the recent trend toward highly integrated hardware platforms with multi-threaded, parallel execution environments. These factors motivate the development of cost-effective solutions to provide online protection against timing errors on speed-paths in logic circuits.
Conventional techniques for concurrent error detection (CED) in logic circuits provide good coverage when the stuck-at fault model is used to evaluate coverage of both transient and permanent faults [3, 4], but they are poorly suited to detect and/or mask timing errors. This is because the synthesis of CED circuits usually results in an error detection circuit with a larger critical path delay than the original circuit, making the error detection circuit more susceptible to timing errors. Online delay-fault detection techniques based on monitoring the outputs of the circuit [5] [6] [7] and re-sampling values of the output [8] suffer from the inability to detect errors due to late transitions outside the stability checking period determined during design. Moreover, design effort to pad short paths is necessary to increase the stability checking period [9]. Following timing error detection, error correction based on the above techniques incurs a performance penalty when rollback to the last correctly executed instruction is initiated for re-execution of the instruction stream. Circuit-level waveform monitoring using guard-bands [10] and other architecture-level techniques [11] [12] [13] [14] have been proposed to predict or detect slowdown of paths due to wearout or temperature and voltage variations. The techniques in [10] [11] [12] [13] target specific sources of slowdown, either variations or wearout, but not both. The technique proposed in [14] cannot detect errors dynamically during runtime, but requires periodic offline stress testing of the system in order to be effective. This paper proposes a low-cost error-masking solution that specifically targets timing errors on speed-paths in logic circuits. To the best of our knowledge, this is the first paper that proposes runtime masking of timing errors.
This research was supported by NSF CAREER Award CCF-0746850.
An error-masking circuit is designed to mask timing errors arising on the application of input patterns that sensitize speed-paths of the logic circuit. The patterns that sensitize all speed-paths in the logic circuit are represented using a speed-path characteristic function (SPCF), and an efficient algorithm to compute the SPCF based on timing analysis of the logic circuit is described. The error-masking circuit is synthesized by using the SPCF to simplify the Boolean functions of the internal nodes in the technology-independent representation of the original logic circuit. By design, the error-masking circuit has at least 20% timing slack over the original logic circuit and is hence immune to timing errors. The inputs to the error-masking circuit are the inputs of the logic circuit and errors are masked only at the outputs of the logic circuit. Error masking is thus non-intrusive, and the critical path delay of the original circuit is not altered by the addition of the error-masking circuit.
Simulation results for several benchmark circuits and modules from the OpenSPARC T1 processor illustrate that 100% masking of timing errors arising on speed-paths within 10% of the critical path delay is achieved for all circuits with an average area (power) overhead of 16% (18%) and an average slack of 57% over the original circuit. In addition to masking timing errors, the error-masking circuit can also be used to log the application of and response to inputs that sensitize speed-paths. This runtime information can be used to (i) predict the onset of wearout through periodic analysis and (ii) assist in in-system silicon debug by expanding trace buffer windows through selective capture. This paper is organized as follows. Section 2 motivates the proposed error-masking technique. Sections 3 and 4 describe the algorithm for synthesis of the error-masking circuit. Section 5 presents results for various benchmarks. Section 6 draws conclusions and summarizes directions for future work.
Background and motivation
Several techniques have been proposed in literature to achieve resilience to timing errors arising due to design marginalities, environmental variations like temperature and voltage, and aging. Existing solutions can be broadly classified into techniques that require (i) logic redundancy or logic restructuring, (ii) output waveform monitoring, or (iii) architectural support.
Concurrent error detection is a logic redundancy based technique that traditionally uses the stuck-at fault model to evaluate and improve coverage of transient and permanent faults, e.g., [3, 4, 15, 16, 17]. However, these techniques are either intrusive, i.e., they require making modifications to the original circuit, or they result in an error detection circuit with long critical paths that is more vulnerable to timing errors. Although a totally self-checking concurrent delay testing technique was proposed in [18], it is based on duplication of the original logic circuit and is hence vulnerable to common-mode timing errors. Logic restructuring techniques based on rewiring [19] and implication [20] transformations have also been proposed. These techniques target soft error rate reduction and their coverage of timing errors has not been demonstrated.
One class of waveform monitoring techniques monitors the outputs for late transitions that occur after the clock edge. These techniques are limited in their effectiveness since they cannot detect errors due to late transitions outside the monitoring period. Further, they are intrusive since short paths in the circuit must be padded to extend the monitoring period. Since such techniques focus on detecting timing errors, the errors are corrected through rollback to the last correctly executed instruction followed by re-execution of the instruction stream, which imposes control overhead and impacts performance. Examples of these techniques include monitoring using a sensing circuit [5] [6] [7] and output re-sampling after a certain delay [5, 8]. Re-sampling techniques also need to address data path metastability and increased clock energy due to the addition of an extra latch, which require design effort and overhead [9]. Output waveform monitoring that provides a guard-band at the outputs before the clock edge for wearout prediction has also been proposed in [10]. Since this technique is based on collection of timing data using sensors, it is specific to prediction of timing errors arising from gradual slowdown of speed-paths due to aging.
Several architecture-level techniques for resilience to variations and lifetime reliability, including dynamic on-chip verification [11], lifetime-reliability tracking based on technology parameters [12], wearout detection circuitry to predict the onset of wearout [13], periodic stress testing [14], and on-chip temperature and voltage sensors to predict temperature surges and voltage droop [21], have been proposed. A major drawback of these techniques is that they either target only specific sources of timing failures or require periodic offline stress testing in order to be effective. This paper proposes a low-cost online error-masking technique that specifically targets failures arising from timing errors on speed-paths in logic circuits. To the best of our knowledge, this is the first paper that proposes a solution for masking timing errors during runtime without any rollback. The proposed error-masking solution has several advantages. First, since a timing error is logically masked, unlike periodic monitoring techniques, the proposed technique is not restricted to any specific source of variation and can be used to mask timing errors due to temperature and voltage variations, aging and wearout related slowdown of speed-paths, and errors arising in early lifetime due to latent defects and design marginalities. Second, the error-masking circuit is designed to have at least 20% timing slack over the original circuit. Hence, the error-masking circuit is immune to timing errors. Finally, by design, the error-masking circuit is non-intrusive since errors are masked directly at the primary outputs using a 2-to-1 multiplexer. Hence, there is a marginal, quantifiable impact on the critical path delay of the original circuit, which can be easily compensated for during synthesis.
Error-masking mechanism
The proposed solution for error-masking is based on synthesizing a circuit, referred to as the error-masking circuit, that correctly predicts the outputs of the circuit upon application of inputs that sensitize the speed-paths of the circuit. The basic mechanism of the proposed error-masking approach is shown in Fig. 1. The original logic circuit has inputs x1, x2, ..., xn, outputs y1, y2, ..., ym, and a critical path delay Δ. Errors are masked only at the critical outputs, i.e., outputs at which one or more speed-paths terminate. In Fig. 1, outputs yk, ..., ym are critical and the error-masking circuit is used to mask timing errors only at these outputs. For each critical output yi, k ≤ i ≤ m, the error-masking circuit produces two outputs, ỹi and ei. The first output ỹi predicts the correct value of yi when a speed-path is sensitized in the fanin cone of yi. The second output ei is used to indicate that a speed-path is sensitized, i.e., ei is 1 if a speed-path is sensitized to output yi. The logic for the outputs ỹi and ei is designed so that ỹi correctly predicts yi when ei is 1. Error masking at the output is performed using a 2-to-1 multiplexer. The indicator output ei is connected to the select input, yi to the 0-input, and ỹi to the 1-input of the multiplexer. Thus, when a speed-path is sensitized, the select input (ei) routes the 1-input (ỹi) of the multiplexer to the output; otherwise ei is 0 and the multiplexer routes the original output yi to the output. Note that the error-masking circuit is designed so that it has at least 20% smaller critical path delay than the original circuit. Hence, the error-masking circuit is itself immune to timing errors on its speed-paths.
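The multiplexer behaviour described above can be sketched behaviourally as follows. This is only an illustrative model, not the authors' implementation; the function name `mask_output` and the single-bit integer encoding are assumptions made here.

```python
def mask_output(y: int, y_pred: int, e: int) -> int:
    """Behavioural 2-to-1 multiplexer: e selects the predicted value.

    y      -- value sampled at the critical output (may be stale/late)
    y_pred -- prediction from the error-masking circuit
    e      -- indicator, 1 when a speed-path is sensitized
    """
    return y_pred if e else y


# A late transition leaves a stale 0 at the output while the correct value is 1.
# Because a speed-path is sensitized (e = 1), the prediction is forwarded.
masked = mask_output(y=0, y_pred=1, e=1)

# With no speed-path sensitized (e = 0) the original output passes through,
# whatever the (don't-care) prediction happens to be.
passthrough = mask_output(y=1, y_pred=0, e=0)
```

The prediction is only trusted when the indicator is 1, which is what allows the prediction logic to treat all other input patterns as don't cares.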
Wearout detection:
The error-masking circuit can be used to detect the onset of wearout. As speed-paths slow down due to wearout and aging, timing errors at the critical outputs yk, ..., ym start to increase. With the proposed error-masking circuit in place, these timing errors will be masked. However, the information that a timing error occurred, indicated by ei · (yi ⊕ ỹi), can be recorded and analyzed offline periodically. For instance, a high timing error rate observed during offline analysis can predict the onset of wearout and the system can be designed to adapt dynamically to reduce the timing error rate.
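A minimal sketch of how the masked-error event e · (y ⊕ ỹ) might be counted for periodic offline analysis is given below. The class name `WearoutMonitor`, its fields, and the idea of reporting a simple error rate are assumptions introduced for illustration; the text does not specify a logging structure or a threshold.

```python
from dataclasses import dataclass


@dataclass
class WearoutMonitor:
    """Hypothetical per-output counter for masked timing errors."""
    cycles: int = 0
    masked_errors: int = 0

    def observe(self, y: int, y_pred: int, e: int) -> None:
        # A timing error was masked when a speed-path is sensitized (e = 1)
        # and the sampled output disagrees with the prediction.
        self.cycles += 1
        self.masked_errors += e & (y ^ y_pred)

    def error_rate(self) -> float:
        return self.masked_errors / self.cycles if self.cycles else 0.0


monitor = WearoutMonitor()
# (y, y_pred, e) samples; only the first one is a masked timing error.
for sample in [(0, 1, 1), (1, 1, 1), (0, 1, 0)]:
    monitor.observe(*sample)
```

A rising value of `error_rate()` across analysis periods would be the kind of signal the text associates with the onset of wearout.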
Debug information:
The error-masking circuit can also assist post-silicon at-speed in-system debug by guiding selective capture of debug information in trace buffers. Trace buffers are very useful because they can be used for real-time at-speed observation of limited signals during in-system debug [23, 24]. However, trace buffers can only store a limited amount of data in one debug session. To optimize usage of trace buffers, selective storage of signal values on only a few suspect clock cycles has been proposed in [25]. Since errors occur mainly as timing errors on speed-paths, we propose that the error-masking circuit can provide runtime information to selectively store debug information. For a critical output yi, the output ei of the error-masking circuit indicates the application of an input pattern that sensitizes speed-paths that terminate at yi. By storing debug information in the trace buffers only when yi is vulnerable to timing errors, the window size of the trace buffers can be expanded significantly. This runtime identification of the application of patterns that sensitize speed-paths also increases the ability to debug unreproducible bugs. With this background, we describe the proposed algorithm for computing the speed-path characteristic function in Sec. 3 and the synthesis of the error-masking circuit in Sec. 4.
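The effect of selective capture on the trace window can be illustrated with a small behavioural model. The class `SelectiveTraceBuffer`, the buffer depth, and the synthetic pattern in which the indicator is 1 on every tenth cycle are all hypothetical choices made for this sketch.

```python
from collections import deque


class SelectiveTraceBuffer:
    """Fixed-depth trace buffer that stores a sample only when e is 1."""

    def __init__(self, depth: int):
        self.entries = deque(maxlen=depth)  # oldest entries are overwritten

    def clock(self, cycle: int, signals: tuple, e: int) -> None:
        if e:
            self.entries.append((cycle, signals))


buf = SelectiveTraceBuffer(depth=4)
for cycle in range(100):
    e = 1 if cycle % 10 == 0 else 0       # synthetic indicator activity
    buf.clock(cycle, signals=(cycle & 1,), e=e)

captured_cycles = [c for c, _ in buf.entries]
window = captured_cycles[-1] - captured_cycles[0] + 1
```

With unconditional capture a 4-entry buffer would hold only the last 4 cycles; here the same 4 entries span cycles 60 to 90, i.e. a 31-cycle window.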
Speed-path characteristic function
Consider a technology-mapped circuit C with primary inputs x1, x2, ..., xn and primary outputs y1, y2, ..., ym. For ease of notation and without loss of generality, we will use an output y ∈ {y1, y2, ..., ym} for illustration, although our implementation considers all outputs simultaneously. For a given input pattern I, let y stabilize to the correct value after a finite non-zero delay ΔI. The value of ΔI depends on the applied pattern I, gate delays, and circuit structure. Given a target arrival time Δy at output y, pattern I is a speed-path activation pattern iff ΔI > Δy.
Definition:
The speed-path characteristic function (SPCF) for an output y, denoted by Σy(x1, x2, ..., xn, Δy), is the characteristic function for the set of all speed-path activation patterns. Thus, if speed-paths within 10% of the critical path delay Δ are targeted, Δy = 0.9Δ. In the rest of this paper, the SPCF at y is denoted by Σy(Δy) for brevity. Traditionally, the SPCF has been used in timing-driven optimization during logic synthesis [26] and in variable latency designs [27]. Many algorithms for computing the SPCF have been proposed in the literature. In [27], a path-based algorithm for computation of the exact SPCF was proposed using an ADD-based timing analysis framework. The ADD-based approach traverses all paths in a circuit and is hence memory and time intensive, especially when a complex and realistic gate delay model is used. An over-approximation algorithm that traverses nodes on the critical paths to compute a super-set of the SPCF, instead of the exact SPCF, was proposed in [28]. Comparing the over-approximated SPCF, computed using the node-based algorithm presented in [28], to the exact SPCF indicates that the node-based approach presented in [28] may lead to large over-approximations of the SPCF for most circuits [22]. Hence, an extension of the node-based approach to reduce the over-approximation in the SPCF was presented in [22].
The extended node-based approach of [22] uses arrival and required time information to mark gates with a negative slack as critical. Using two functions, the long path activation function and the short path activation function, both statically and dynamically sensitizable patterns are computed in a single topological pass through the circuit. The algorithm is node-based because the critical gates are marked statically, i.e., before the topological pass through the circuit. Thus, if a gate has more than one fanout and the gate lies on a critical path only along one fanout, the gate is marked critical and input patterns that sensitize any path through this gate are included in the SPCF. Although node-based traversal makes it computationally efficient, the over-approximation in the SPCF arises as a consequence of node-based traversal, i.e., statically marking critical gates before the topological pass to compute the SPCF.
Our work extends the node-based algorithm from [22] to a path-based algorithm that computes the SPCF exactly. In our path-based approach, gates are not marked as critical based on required and arrival time information. Instead, a gate is marked as critical in the context of the path on which it lies. This enables the exact computation of the SPCF using a path-based algorithm. However, the accuracy of the path-based algorithm comes at the cost of computational complexity. The trade-off between accuracy and runtime for the node-based approach of [22] and the proposed path-based approach is illustrated in Table 1. The first 3 columns report the name, number of inputs and outputs, and area of the circuit. The SPCF is computed as the set of all patterns that sensitize speed-paths within 10% of the critical path delay. The number of critical patterns, i.e., the number of input patterns in the SPCF, and the runtime for computing the set of critical patterns for the node-based approach [22] and the path-based extension are shown in columns 4 and 5. Note that the set of critical patterns computed using the node-based approach is always a super-set of the set of critical patterns computed using the proposed path-based approach. However, the path-based approach is, on average, 3.5X slower than the node-based approach.
The computational complexity of the path-based extension of [22] can be attributed to the path traversals for the computation of the long path and short path activation functions. In this paper, we show that the computational complexity can be reduced significantly by computing the SPCF based on the short path activation function only. Consider a gate g with a single output z in a technology-mapped circuit with inputs l1, l2, ..., lk. Let f(l1, l2, ..., lk) denote the Boolean function at z. Let δ_li denote the delay from input li to output z and Δz denote the target arrival time at z. Let $\bar{\Sigma}_z(\Delta_z)$ denote the complement of the SPCF at z, i.e., $\bar{\Sigma}_z(\Delta_z)$ is the set of all input patterns such that the value at z stabilizes before the target arrival time Δz. Let P be the set of all prime implicants in the on-set and off-set of f. Let L(p) denote the set of literals in each prime implicant p ∈ P, where L(p) ⊆ {l1, l2, ..., lk}. The target arrival time at z is met iff the target arrival time is met for all the literals in at least one prime implicant in the off-set or on-set of f. Thus, $\bar{\Sigma}_z(\Delta_z)$ is given by Eqn. 1, which takes the sum, over all prime implicants p ∈ P, of the condition that every literal li ∈ L(p) meets its own target arrival time Δz − δ_li.
Eqn. 1 can be used to recursively compute Σy for each primary output y of the circuit that contains speed-paths. The runtime for the proposed path-based algorithm is shown in column 6 of Table 1 . Note that for runtimes that are comparable to the node-based approach, the proposed algorithm computes the SPCF exactly. The next section describes a synthesis algorithm for the error-masking circuit using the SPCF.
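The recursion can be illustrated on a small example by computing, for every input pattern, the time at which each net settles and then collecting the patterns whose output settles after the target arrival time. The sketch below rests on several assumptions that are not taken from the text: the netlist is a hypothetical gate-level implementation of a 2-bit "a ≥ b" comparator (it is not claimed to be any circuit from the figures), all primary inputs arrive at time 0, an inverter has unit delay and a 2-input gate a delay of two units, and for AND/OR gates the prime implicants are either a single controlling input or the conjunction of all inputs at their non-controlling value.

```python
from itertools import product

INV, AND, OR = "inv", "and", "or"
DELAY = {INV: 1, AND: 2, OR: 2}          # assumed unit-delay model
INPUTS = ("a0", "a1", "b0", "b1")

# Hypothetical netlist in topological order: (output net, gate type, inputs).
NETLIST = [
    ("nb1", INV, ("b1",)),
    ("nb0", INV, ("b0",)),
    ("h1", AND, ("a1", "nb1")),
    ("h2", OR, ("a0", "nb0")),
    ("h3", OR, ("a1", "nb1")),
    ("h4", AND, ("h2", "h3")),
    ("y", OR, ("h1", "h4")),
]


def evaluate(pattern):
    """Return {net: (value, settle_time)} for one input pattern."""
    nets = {name: (bit, 0) for name, bit in zip(INPUTS, pattern)}
    for out, kind, ins in NETLIST:
        vals = [nets[i][0] for i in ins]
        times = [nets[i][1] for i in ins]
        if kind == INV:
            value, ready = 1 - vals[0], times[0]
        else:
            ctrl = 0 if kind == AND else 1      # controlling input value
            early = [t for v, t in zip(vals, times) if v == ctrl]
            # One controlling input is a single-literal prime implicant: the
            # gate settles once the earliest such input is stable. Otherwise
            # the only satisfied prime implicant needs every input stable.
            ready = min(early) if early else max(times)
            value = ctrl if early else 1 - ctrl
        nets[out] = (value, ready + DELAY[kind])
    return nets


def spcf(target):
    """Set of input patterns whose output y settles after `target`."""
    return {p for p in product((0, 1), repeat=len(INPUTS))
            if evaluate(p)["y"][1] > target}


critical = max(evaluate(p)["y"][1] for p in product((0, 1), repeat=4))
speed_patterns = spcf(0.9 * critical)
```

For this particular netlist and delay model the critical delay evaluates to 7, and `speed_patterns` holds the 10 patterns (ordered as a0, a1, b0, b1) with a1 = 0, or with a0 = 0 and b1 = 1. A symbolic implementation would propagate characteristic functions instead of enumerating patterns; enumeration is used here only to keep the sketch short.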
Synthesis of the error-masking circuit
Consider a technology-mapped circuit C with inputs x1, x2, ..., xn and outputs y1, y2, ..., ym. For ease of notation and without loss of generality, we will use an output y ∈ {y1, y2, ..., ym} for illustration, although our implementation considers all outputs simultaneously. Let Σy(Δy) denote the SPCF of output y, and let Δ be the critical path delay for the design. Thus, if the speed-paths in the top 10% of Δ are targeted for protection by the error-masking circuit, then Δy is set to 0.9Δ. Note that if an output has a slack greater than 0.1Δ, the output is not critical. The objective of an error-masking circuit C̃ is to predict the correct output of C for patterns in Σy(Δy). For patterns not in Σy(Δy), the circuit C̃ need not predict the outputs of circuit C correctly, i.e., the patterns not in Σy(Δy) lie in the input don't care space for circuit C̃. Since the error-masking circuit C̃ must correctly predict the outputs of circuit C only on patterns in Σy(Δy), the circuit C̃ must also indicate when an output is correctly predicted. Thus, for an output y in circuit C, the error-masking circuit C̃ produces two outputs: (i) ỹ that predicts the correct value of y and (ii) e that indicates when the prediction of the error-masking circuit C̃ is correct. The logic functions for ỹ and e are designed so that e is 1 when a speed-path is sensitized, i.e., a pattern from the SPCF is applied, and ỹ predicts the correct value of y when e is 1. Although the specifications of ỹ and e have a rich input don't care space to be exploited during synthesis, the Boolean functions of ỹ and e are inter-dependent (since ỹ must correctly predict y whenever e is 1) and this makes synthesis of C̃ challenging.
The rest of this section describes an algorithm for the synthesis of the error-masking circuit. The synthesis algorithm exploits the don't care space in the specification of circuit C̃ to optimize the area and delay of C̃. Hence, the error-masking circuit C̃ has a small area-power overhead (16-18% on average) and greater than 20% timing slack over the original circuit, as presented in Sec. 5. The large timing slack ensures that the error-masking circuit is itself immune to timing errors on its speed-paths. Before presenting the proposed algorithm for the synthesis of the error-masking circuit C̃, we will briefly describe two simple synthesis algorithms along with their limitations to motivate the proposed synthesis algorithm.
Bottom-up synthesis:
The bottom-up approach involves direct synthesis, i.e., two-level minimization of the incompletely specified function, followed by multi-level optimization of the logic for ỹ and e. Although this approach can leverage the rich space of don't cares in the specification of ỹ and e for optimizing the logic, it is not scalable to circuits with more than 15-20 inputs. Further, the inter-dependence of the Boolean functions of ỹ and e makes bottom-up synthesis even more computationally demanding.
Top-down synthesis:
A top-down approach uses circuit C as a starting point and simplifies the circuit for synthesizing the logic for ỹ. This approach has several disadvantages: (i) it is not flexible because the synthesis is tied to the implementation of circuit C, (ii) this approach may yield circuits that are susceptible to common-mode timing failures, since the implementation of circuit C̃ will be structurally similar to circuit C, and (iii) since the Boolean functions of ỹ and e are inter-dependent, top-down synthesis from C may not effectively use the input don't care space. The top-down approach, in the extreme case, is based on duplication of the critical paths of C. This approach is ineffective because the duplicated paths will be as susceptible to timing errors as the critical paths in the original circuit.
Proposed synthesis algorithm
The technique proposed in this paper uses an intermediate representation, in the form of the technology-independent representation of the original circuit C, as a starting point for the synthesis of the error-masking circuit C̃. A technology-independent network is an intermediate representation of a circuit in which the internal nodes can have complex Boolean functions (with 10-15 inputs). The main advantage of starting with a technology-independent network is that the don't care space in the specification of the Boolean function for the error-masking circuit C̃ can be exploited effectively to simplify the Boolean expressions of the internal nodes and thus reduce the overhead of the error-masking circuit C̃. In addition, working with a technology-independent representation does not suffer from the scalability issues of the bottom-up approach described above because the Boolean expressions of the internal nodes are limited to 10-15 inputs. We will now describe the technique for simplification of the technology-independent representation, T, of circuit C to obtain the technology-independent representation, T̃, of the error-masking circuit C̃.
Let Σy(Δy) denote the SPCF of an output y of C. Since the patterns in Σy(Δy) are targeted for error masking, Σy(Δy) is the input care-set for the logic cone of y. Let nj be an internal node in the fanin cone of y in the technology-independent network T. Let a1, a2, ..., ak be the inputs of nj. In order to predict the output y correctly for input patterns in Σy(Δy), the output of nj must be correctly predicted for the minterms in the satisfiability-care set induced by Σy(Δy) at the inputs of nj. Let s0 and s1 denote the satisfiability-care minterms in the off-set and on-set of nj, respectively. The Boolean expression of nj can be simplified to ensure the correct prediction of minterms in s0 and s1. Since nj can have up to 10-15 inputs, the exact computation of the satisfiability-care minterms, s0 and s1, is computationally intensive. Hence, we propose a technique based on eliminating cubes from the sum-of-product (SOP) expressions of the on-set ($n_j$) and off-set ($\bar{n}_j$) to obtain a reduced off-set ($n_j^0$) and a reduced on-set ($n_j^1$) that cover s0 and s1, respectively. Note that the proposed technique is computationally efficient because it uses a coarser granularity, i.e., the cubes of the SOP, rather than the individual minterms, to compute the cover. After generating the covers $n_j^0$ and $n_j^1$, the outputs $\tilde{n}_j$ and $e_{n_j}$ for node nj are given by

$$\tilde{n}_j = n_j^1, \qquad e_{n_j} = n_j^0 + n_j^1 \qquad (2)$$
The output e_nj is 1 when an input pattern from Σy(Δy) is applied, and output ñj predicts the correct value of nj when e_nj is 1. Note that e_nj may be 1 for many more minterms apart from the minterms in Σy(Δy). Hence, the Boolean expression for e_nj can be simplified further by elimination of cubes in its on-set that are not essential for covering minterms in Σy(Δy). The indicator output ey for primary output y is 1 when all internal nodes in the fanin cone of y predict their outputs correctly. Thus, ey is generated as a Boolean AND of the indicator outputs, e_nj, of all internal nodes nj in the fanin cone of y. The simplified technology-independent network T̃ is then synthesized, optimized, and mapped to produce the error-masking circuit C̃.
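The cube-elimination step for a single node can be sketched as below. Several things here are assumptions rather than the authors' procedure: cubes and minterms are encoded as Python dicts from variable name to bit, cubes are kept by a simple greedy covering heuristic, the prediction is taken to be the reduced on-set cover, and the indicator is taken to be the union of the reduced on-set and off-set covers. All function names are hypothetical.

```python
from itertools import product


def cube_covers(cube, minterm):
    """True if every literal of the cube agrees with the minterm."""
    return all(minterm[var] == val for var, val in cube.items())


def select_cubes(cubes, care):
    """Greedily keep SOP cubes until every care minterm is covered."""
    remaining, kept = list(care), []
    while remaining:
        best = max(cubes, key=lambda c: sum(cube_covers(c, m) for m in remaining))
        if not any(cube_covers(best, m) for m in remaining):
            raise ValueError("care minterm lies outside the given SOP cover")
        kept.append(best)
        remaining = [m for m in remaining if not cube_covers(best, m)]
    return kept


def in_cover(cover, minterm):
    return int(any(cube_covers(c, minterm) for c in cover))


def masked_node(on_cubes, off_cubes, s1, s0):
    """Return (predict, indicator) functions for one node."""
    n1 = select_cubes(on_cubes, s1)      # reduced on-set cover
    n0 = select_cubes(off_cubes, s0)     # reduced off-set cover
    predict = lambda m: in_cover(n1, m)
    indicator = lambda m: in_cover(n0, m) | in_cover(n1, m)
    return predict, indicator


# Toy node f = a*b + c with one care minterm in each of s1 and s0.
on_cubes = [{"a": 1, "b": 1}, {"c": 1}]
off_cubes = [{"a": 0, "c": 0}, {"b": 0, "c": 0}]
s1 = [{"a": 1, "b": 1, "c": 0}]
s0 = [{"a": 0, "b": 0, "c": 0}]
predict, indicator = masked_node(on_cubes, off_cubes, s1, s0)
```

Because the kept cubes are drawn from the on-set and off-set covers of the node, the prediction agrees with the node function wherever the indicator is 1, not only on the care minterms; this is why the indicator can later be trimmed further.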
2-bit Comparator
A 2-bit comparator is used to illustrate the proposed error-masking technique. Consider the 2-bit comparator shown in Fig. 2(a) with inputs a0, a1, b0, b1 and output y. The comparator output, y, is 0 when the decimal equivalent of the binary number a1a0 is less than the decimal equivalent of b1b0. The optimal factored form for the off-set and on-set of y is

$$\bar{y} = \bar{a}_1 b_1 + \bar{a}_0 b_0 (\bar{a}_1 + b_1), \qquad y = a_1 \bar{b}_1 + (a_0 + \bar{b}_0)(a_1 + \bar{b}_1) \qquad (3)$$
Assuming unit delay for an inverter and a delay of two units for 2-input gates, the critical path delay of the 2-bit comparator is 7. Suppose all speed-paths within 10% of the critical path delay are susceptible to timing errors; then Δy is 6.3. The speed-paths within 10% of the critical path delay have been highlighted in Fig. 2(a). The SPCF, Σy(a0, a1, b0, b1, Δy), is

$$\Sigma_y(a_0, a_1, b_0, b_1, \Delta_y) = \bar{a}_1 + \bar{a}_0 b_1$$
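The satisfiability care-set induced by Σy(a0, a1, b0, b1, Δy) in the off-set and on-set of y, represented by s0 and s1, respectively, is

$$s_0 = \bar{a}_1 b_1 + \bar{a}_0 b_0 (\bar{a}_1 + b_1), \qquad s_1 = \bar{a}_1 \bar{b}_1 (a_0 + \bar{b}_0) + a_1 b_1 \bar{a}_0 \bar{b}_0$$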
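The cubes from Eqn. 3 that are selected to cover all the satisfiability care minterms, s0 and s1, are $y^0 = \bar{a}_1 b_1 + \bar{a}_0 b_0 (\bar{a}_1 + b_1)$ and $y^1 = (a_0 + \bar{b}_0)(a_1 + \bar{b}_1)$. Using $y^0$ and $y^1$, the Boolean functions for the outputs of the error-masking circuit, ỹ and e, as shown in Eqn. 2, are

$$\tilde{y} = (a_0 + \bar{b}_0)(a_1 + \bar{b}_1), \qquad e = \bar{a}_1 b_1 + \bar{a}_0 b_0 (\bar{a}_1 + b_1) + (a_0 + \bar{b}_0)(a_1 + \bar{b}_1) \qquad (4)$$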
Note that the error-masking circuit specified by Eqn. 4 covers extra minterms in addition to the minterms in Σy(a0, a1, b0, b1, Δy). The Boolean expression for e is simplified further by elimination of cubes in the on-set that are not essential to cover minterms in Σy(a0, a1, b0, b1, Δy). The simplified Boolean expression for e is

$$e = \bar{a}_1 + b_1$$
This error-masking circuit is shown in Fig. 2(b) .
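The two properties required of the comparator's error-masking circuit can be checked exhaustively over the 16 input patterns. The check below assumes the following polarities, writing ' for complement: ỹ = (a0 + b0')(a1 + b1'), e = a1' + b1, and Σy = a1' + a0'·b1. The helper names are hypothetical.

```python
from itertools import product


def y(a0, a1, b0, b1):
    """Comparator output: 0 only when a1a0 < b1b0."""
    return int(2 * a1 + a0 >= 2 * b1 + b0)


def y_pred(a0, a1, b0, b1):
    """Assumed prediction: (a0 + b0')(a1 + b1')."""
    return (a0 | (1 - b0)) & (a1 | (1 - b1))


def e(a0, a1, b0, b1):
    """Assumed simplified indicator: a1' + b1."""
    return (1 - a1) | b1


def spcf(a0, a1, b0, b1):
    """Assumed SPCF: a1' + a0'*b1."""
    return (1 - a1) | ((1 - a0) & b1)


patterns = list(product((0, 1), repeat=4))

# Property 1: whenever the indicator is 1, the prediction equals y.
prediction_ok = all(y_pred(*p) == y(*p) for p in patterns if e(*p))

# Property 2: every speed-path activation pattern raises the indicator.
coverage_ok = all(e(*p) for p in patterns if spcf(*p))

# Patterns where the indicator is 1 although no speed-path is sensitized.
extra = [p for p in patterns if e(*p) and not spcf(*p)]
```

Under these assumptions both properties hold, and the indicator is 1 on two patterns outside the SPCF, which is harmless because the prediction is still correct there.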
Results
This section presents simulation results for the proposed error-masking technique. The simulations were run on a 64-bit 2.4 GHz Opteron-based system with 6 GB memory. The benchmark circuits were synthesized using Synopsys Design Compiler and mapped with the lsi_10k gate library.
The SPCF was computed for a target arrival time of 0.9Δ, where Δ is the critical path delay of the circuit. Table 2 presents the area-power overhead of the proposed technique. Columns 1-3 report the name, number of inputs and outputs, and number of gates for each circuit. Column 4 reports the number of critical primary outputs, i.e., primary outputs that contain speed-paths. The number of input patterns in the SPCF over all critical primary outputs is reported in column 5. We observed that on average, about 20% of the primary outputs were critical primary outputs. The slack in the critical path delay of the error-masking circuit over the original circuit is reported in column 6. The area and power overheads of the error-masking circuit are reported in columns 7 and 8. For every benchmark circuit, 100% coverage for masking of timing errors was achieved, i.e., all the input patterns in the SPCF were covered by the error-masking circuit. The average area (power) overhead of the error-masking circuit is 16% (18%) and the average timing slack is 57%.
Conclusions
Timing errors resulting from process variability, manufacturing defects, and long-term degradation effects on logic circuit speed-paths are an important, and possibly dominant, failure mode that impacts hardware reliability. This paper described the synthesis of a low-cost error-masking circuit to mask timing errors on speed-paths in a logic circuit. The error-masking circuit is itself immune to timing errors, and can also be used in wearout prediction and post-silicon debug. Other potential applications of error-masking circuits, e.g., (i) adaptive speed-up of critical gates using body bias and (ii) aggressive dynamic voltage scaling by masking timing errors, are areas for future research.
