Abstract: With the growing complexity of current designs and shrinking time-to-market, traditional ATPG methods fail to detect all electrical faults in the design. Debug teams have to spend a considerable amount of time and effort identifying these faults during post-silicon debug. This work proposes an off-chip analysis that speeds up the identification of hard-to-find electrical faults that escape conventional test methods but cause the chip to crash during functional testing or silicon bring-up. To reduce the search space when reconstructing the failure trace path, a formal methodology is used to analyze the reachable states along the path, which also accelerates isolation of the root cause of failure. Moreover, we propose a forward traversal technique, applied to a few selected candidate faults, that generates a complete failure trace from the initial state to the crash state. Experimental results show that the proposed approach can lead to a 44% reduction in actual silicon runs with a commensurate reduction in off-chip debug time.
I. INTRODUCTION
The goal of conventional Automatic Test Pattern Generation (ATPG) algorithms is to generate test vectors for every fault in the gate-level model of the circuit. Although this methodology achieves high fault coverage, current designs contain a large number of faults to be tested, and the algorithmic complexity makes the process time-consuming and expensive [1]. These shortcomings are even more acute when sequential ATPG algorithms are used to identify electrical bugs that require multiple cycles to be detected [2] [3]. Thus, deep electrical bugs, which are triggered only at extreme sequential depth, often escape the manufacturing test step but cause the chip to crash after a few seconds of execution. Once a failure is observed, an additional validation step, known as post-silicon debug, is used to locate the design defects. Identifying these bugs is painstakingly slow and takes more than half of the chip development cycle [4] [5] [6]. In this paper we propose a novel technique to reduce the time needed to diagnose these bugs. An off-chip analysis is combined with actual silicon runs to reconstruct the path from the initial state to the crash state and to identify a small set of faults that may have caused the chip to crash.
In [7], a post-silicon debug technique called FD-BackSpace is proposed to identify deep electrical bugs. Starting from the crash state, FD-BackSpace traverses the failing trace backwards towards the initial state, pruning the set of possible faults along the way. Given a reachable state and a possible bug, formal techniques are used to identify the possible previous states. Next, reachability analysis is performed for each possible previous state by adding a breakpoint and running the chip. The use of a formal engine makes FD-BackSpace more robust and practical than simulation-based techniques [7]. Although a single chip run may take only a few seconds, performing reachability analysis for each state requires the verification engineer to run the chip many times. This is a very slow process that undermines the effectiveness of the approach. Building on FD-BackSpace, this paper reduces the time required to identify a small set of electrical faults that can cause the chip to fail.
In general, deep electrical bugs cause the chip to malfunction after executing millions of cycles. Generating test cases for such bugs requires months of simulation on server farms. In contrast, actual silicon runs several orders of magnitude faster and needs only a few seconds to trigger the error and reach the crash state. Unlike in pre-silicon verification, the accessibility and visibility of internal signals are very limited in post-silicon debug; this is the major challenge in the validation and debug of first silicon [8]. In [9], trace buffers are used to store the values of selected signals over multiple clock cycles. A binary-search-based debug method [10] iteratively halves the search space until it identifies the first cycle in which the error is activated and observed. All of these techniques address the observability problem and can complement the off-chip analysis presented in this paper. This paper describes a technique that combines formal methods with on-chip support logic to identify the root cause of the failure. The off-chip analysis extracts information from the crash state, which reduces the number of actual silicon runs. In particular, our contributions are as follows:
• We propose a formal technique to analyze the possible states in the failure trace path. Rather than running the chip to determine the reachability of each possible state, we perform off-chip analysis to determine whether a state can exist on the failure trace path. This reduces the number of silicon runs and constrains the state-space search.
• Assuming deterministic system behavior, the algorithm quickly reduces the number of possible faults by identifying failure paths that do not match the actual execution of the chip.
• In addition to backward traversal along the failure trace path, this work proposes a forward traversal technique that further reduces the number of actual silicon runs and, with it, the overall debug time.
Experiments on the ITC'99 benchmarks demonstrate the effectiveness of our approach. On average, the number of silicon runs is reduced by 44%, which translates into shorter debug time, and the run-time of the off-chip analysis is reduced by 36%. The rest of the paper is organized as follows. Section II provides background information and introduces the notation used in the rest of the paper. Sections III, IV and V present the three off-chip analysis techniques that reduce the debug time. Experimental results are presented in Section VI, followed by the conclusion in Section VII.
II. PRELIMINARIES

A. Notation and Basic Definitions
We model the circuit $C$ as a finite state machine $M$ with $L$ flip-flops, $I$ inputs, $O$ outputs, initial states $S_0 \subseteq 2^L$, and a transition relation over the state space $2^L$. An execution path (run) on $M$ is a finite sequence of states $\pi = s_0 s_1 s_2 \ldots s_n$, where $n \in \mathbb{N}$ and $s_0 \in S_0$. A crash state is a state of the chip in which a bug is observable (e.g., a system hang). A path $s_i s_{i+1} \ldots s_n$ is said to be a valid trace leading to the crash state if $s_n$ is the crash state and, for each $s_j$ with $i \le j \le n-1$, $s_j$ is a predecessor of $s_{j+1}$ and is reachable from the initial state.
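To make the valid-trace definition concrete, the following Python sketch checks it on an explicit-state toy machine. The 2-bit counter `counter`, the helper names, and the enumeration-based reachability are hypothetical illustrations that are feasible only for tiny machines; they are not part of the paper's flow, which reasons symbolically.

```python
from itertools import product
from typing import Callable, Iterable, Sequence, Set, Tuple

State = Tuple[int, ...]
Inputs = Tuple[int, ...]
NextState = Callable[[State, Inputs], State]


def input_vectors(n_inputs: int) -> Iterable[Inputs]:
    """Every assignment to the primary inputs."""
    return product((0, 1), repeat=n_inputs)


def is_predecessor(a: State, b: State, delta: NextState, n_inputs: int) -> bool:
    """True if some input vector moves the machine from a to b in one cycle."""
    return any(delta(a, i) == b for i in input_vectors(n_inputs))


def reachable_states(init: Set[State], delta: NextState, n_inputs: int) -> Set[State]:
    """Explicit worklist reachability from the initial states (toy sizes only)."""
    seen, work = set(init), list(init)
    while work:
        s = work.pop()
        for i in input_vectors(n_inputs):
            t = delta(s, i)
            if t not in seen:
                seen.add(t)
                work.append(t)
    return seen


def is_valid_trace(path: Sequence[State], crash: State, init: Set[State],
                   delta: NextState, n_inputs: int) -> bool:
    """Valid trace: ends in the crash state, and every s_j (j < n) is
    reachable and a predecessor of s_{j+1}."""
    if not path or path[-1] != crash:
        return False
    reach = reachable_states(init, delta, n_inputs)
    return all(path[j] in reach and is_predecessor(path[j], path[j + 1], delta, n_inputs)
               for j in range(len(path) - 1))


def counter(s: State, i: Inputs) -> State:
    """Hypothetical 2-bit counter with one enable input."""
    value = (2 * s[0] + s[1] + i[0]) % 4
    return (value >> 1, value & 1)


print(is_valid_trace([(0, 0), (0, 1), (1, 0)], (1, 0), {(0, 0)}, counter, 1))  # True
print(is_valid_trace([(0, 0), (1, 0)], (1, 0), {(0, 0)}, counter, 1))          # False
```

The second path fails because no enable value moves the counter from ⟨0, 0⟩ directly to ⟨1, 0⟩.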
To simplify the exposition, we assume a single stuck-at fault model. However, our method works for any fault model that can be expressed with the SAT-based technique described next. We denote by $G$ the set of all possible faults.
B. Fault Modeling
Similar to [7], we use the SAT-based fault-modeling technique introduced in [11]: we augment the model of the fault-free circuit $C$ by adding a mux at the output of each gate $g$, and denote the augmented circuit by $C'$. Each mux has one fault-select line and one fault-signal line. We denote the set of fault-select lines by $E = \{e_1, e_2, \ldots, e_n\}$ and the set of corresponding fault-signal lines by $W = \{w_1, w_2, \ldots, w_n\}$, where $n = |G|$. Setting $e_i = 1$ disconnects node $g_i$ from $\mathrm{fanout}(g_i)$ and connects $w_i$ to every node $g_j \in \mathrm{fanout}(g_i)$.
This modeling framework allows us to disconnect any gate from its fanout in circuit $C$ and force an arbitrary faulty value in its place. For the single stuck-at fault model, for example, we require exactly one $e_i = 1$ (which enforces "single"), assign the corresponding $w_i$ to 0 (1) for stuck-at-0 (stuck-at-1), and do not allow $e_i$ or $w_i$ to change across time-frames (which enforces "stuck-at").
For example, consider the sequential circuit in Fig. 1(a). A possible faulty version is shown in Fig. 1(b), where a stuck-at-1 (s-a-1) fault is present at $g_2$. Fig. 1(c) shows the augmented circuit with a set of muxes used to model each fault. (For space reasons, the latches have been omitted from Fig. 1(c).) The fault is modeled by setting the fault-select line $e_2 = 1$, with all other $e_i = 0$, and the fault-signal line $w_2 = 1$.
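The mux-based fault model can be sketched as a cycle-level simulation of the augmented circuit $C'$. The three-gate netlist below is hypothetical (it is not the circuit of Fig. 1); each gate output passes through a mux controlled by $e_i$ and $w_i$, and the single stuck-at constraint is imposed by selecting exactly one gate and holding its lines constant across time-frames.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical netlist in topological order: (gate, operator, fanin signals).
# Present state is (x, y), primary input is a, next state is (g2, g3).
NETLIST: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("g1", "AND", ("a", "x")),
    ("g2", "OR", ("g1", "y")),
    ("g3", "NOT", ("g2",)),
]
OPS = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOT": lambda v: 1 - v[0],
}
Lines = Dict[str, int]


def step(state: Tuple[int, int], a: int, e: Lines, w: Lines) -> Tuple[int, int]:
    """One clock cycle of the augmented circuit C'.

    Each gate output passes through a mux: when e[g] == 1, every gate in
    fanout(g) sees the fault-signal value w[g] instead of g's own value.
    """
    sig = {"x": state[0], "y": state[1], "a": a}
    for gate, op, fanin in NETLIST:
        good = OPS[op]([sig[src] for src in fanin])
        sig[gate] = w[gate] if e[gate] else good
    return (sig["g2"], sig["g3"])


def fault_lines(fault: Optional[Tuple[str, int]]) -> Tuple[Lines, Lines]:
    """Single stuck-at: exactly one e_i = 1 and its w_i holds the stuck value."""
    e = {name: 0 for name, _, _ in NETLIST}
    w = {name: 0 for name, _, _ in NETLIST}
    if fault is not None:
        gate, value = fault
        e[gate], w[gate] = 1, value
    return e, w


def run(init: Tuple[int, int], inputs: List[int],
        fault: Optional[Tuple[str, int]] = None) -> List[Tuple[int, int]]:
    e, w = fault_lines(fault)  # fixed for all time-frames ("stuck-at")
    states = [init]
    for a in inputs:
        states.append(step(states[-1], a, e, w))
    return states


print(run((0, 0), [1, 1, 0]))             # [(0, 0), (0, 1), (1, 0), (0, 1)]
print(run((0, 0), [1, 1, 0], ("g2", 1)))  # [(0, 0), (1, 0), (1, 0), (1, 0)]
```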
To communicate with the SAT solver, $C$ is encoded in Conjunctive Normal Form (CNF) using the Tseitin transformation, which introduces auxiliary variables for internal signals [12].
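As a rough illustration of the Tseitin encoding, the sketch below emits DIMACS-style clauses (positive integers for variables, negative integers for their negations) for an AND gate and for the fault mux described above, then checks by brute force that the CNF admits exactly the consistent assignments. The variable numbering and helper names are assumptions made for this example.

```python
from itertools import product
from typing import Dict, Iterator, List

Clause = List[int]  # DIMACS-style literals: v means "v is true", -v means "v is false"


def tseitin_and(out: int, a: int, b: int) -> List[Clause]:
    """Clauses equivalent to out <-> (a AND b)."""
    return [[-out, a], [-out, b], [out, -a, -b]]


def tseitin_mux(out: int, sel: int, w: int, g: int) -> List[Clause]:
    """Clauses equivalent to out <-> (w if sel else g), i.e. the fault mux."""
    return [[-sel, -w, out], [-sel, w, -out], [sel, -g, out], [sel, g, -out]]


def satisfied(cnf: List[Clause], assignment: Dict[int, int]) -> bool:
    return all(any((lit > 0) == bool(assignment[abs(lit)]) for lit in clause)
               for clause in cnf)


def models(cnf: List[Clause], n_vars: int) -> Iterator[Dict[int, int]]:
    """Enumerate all satisfying assignments by brute force (tiny CNFs only)."""
    for bits in product((0, 1), repeat=n_vars):
        assignment = dict(enumerate(bits, start=1))
        if satisfied(cnf, assignment):
            yield assignment


# Gate g = a AND b followed by its fault mux; g is an auxiliary (internal) variable.
A, B, G, E, W, OUT = range(1, 7)
cnf = tseitin_and(G, A, B) + tseitin_mux(OUT, E, W, G)
sols = list(models(cnf, 6))
print(len(sols))  # 16: a, b, e, w are free; g and out are determined
```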
C. FD-BackSpace
Since the proposed off-chip analysis is based on the FD-BackSpace framework, we briefly review its debug flow here; for a complete presentation, please refer to [7]. The FD-BackSpace framework starts from the observed bug sighting (e.g., the crash state) $s_n$ and works backwards towards the initial states, considering only faults that are consistent with possible executions leading to the actual bug. Initially, the possible fault set consists of all possible faults in the circuit. For each fault, formal analysis of the corresponding RTL (or other model) computes the set of predecessor candidates $P$ of the crash state. If no predecessor exists under the fault, $P$ is empty and the fault is removed from the possible fault set. If $P$ is non-empty, each predecessor state is loaded into a breakpoint circuit on the chip. Starting from the initial state $s_0$, the chip is run at full speed and stops at the breakpoint state if that state lies on the failure trace path. A predecessor state is considered invalid if the chip does not reach the breakpoint state before reaching $s_n$. If a predecessor state is observed, it extends a possible trace path from $s_0$ to $s_n$ under the given fault. As the algorithm traverses the state space, the possible fault list is pruned. This framework was able to reduce the possible fault set to a handful of faults that can then be analyzed further to diagnose the root cause of the failure.
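One backward step of this flow can be sketched in Python as follows. It is a simplified illustration under stated assumptions: predecessor candidates are found by enumerating a toy 2-bit counter rather than by formal analysis, the faults (`lsb-sa0`, `lsb-sa1`, `msb-sa1`) are hypothetical forced next-state bits, and the breakpoint circuit is emulated by scanning a recorded, deterministic chip run `RUN`.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

State = Tuple[int, int]
STATES = list(product((0, 1), repeat=2))
INPUTS = (0, 1)  # one hypothetical enable input

# Hypothetical faults on a 2-bit counter: (forced msb, forced lsb), None = not forced.
FORCED = {"lsb-sa0": (None, 0), "lsb-sa1": (None, 1), "msb-sa1": (1, None)}


def delta_for(fault: str) -> Callable[[State, int], State]:
    force_hi, force_lo = FORCED[fault]

    def delta(s: State, en: int) -> State:
        v = (2 * s[0] + s[1] + en) % 4
        hi = (v >> 1) if force_hi is None else force_hi
        lo = (v & 1) if force_lo is None else force_lo
        return (hi, lo)
    return delta


def predecessors(fault: str, target: State) -> Set[State]:
    """Stand-in for the formal query: every state that can step to `target`."""
    delta = delta_for(fault)
    return {s for s in STATES for en in INPUTS if delta(s, en) == target}


def hits_breakpoint(chip_run: List[State], bp_state: State, crash: State) -> bool:
    """Emulated breakpoint: is `bp_state` reached before the crash state?"""
    for state in chip_run:
        if state == crash:
            return False
        if state == bp_state:
            return True
    return False


def backward_step(target: State, faults: List[str], chip_run: List[State],
                  crash: State) -> Dict[str, Set[State]]:
    """Drop faults whose predecessor set P is empty; keep observed predecessors."""
    survivors = {}
    for fault in faults:
        p = predecessors(fault, target)
        if not p:
            continue  # the fault cannot produce `target`: remove it
        survivors[fault] = {s for s in p if hits_breakpoint(chip_run, s, crash)}
    return survivors


RUN = [(0, 0), (1, 1), (1, 0)]  # recorded run of the (unknown) faulty chip
print(backward_step((1, 0), ["lsb-sa0", "lsb-sa1", "msb-sa1"], RUN, (1, 0)))
```

Here `lsb-sa1` is removed outright because no state can produce the crash state ⟨1, 0⟩ under it, while the other two faults keep only the predecessors the emulated chip actually passes through.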
III. PREVIOUS STATE ANALYSIS
We start this section by formally defining unreachable states. A light-weight technique to detect unreachable states is then presented.
Definition 1: Given a circuit $C$, a trace $\pi$ is a finite sequence of states $\pi = s_0 s_1 \cdots s_n$, where $n \in \mathbb{N}$, such that $s_0$ is an initial state and each pair $\{s_i, s_{i+1}\}$ in $\pi$ is a legal transition in $C$.
Definition 2: Given a circuit $C$, a state $s$ is said to be unreachable if and only if there does not exist a trace $\pi$ such that $\pi = s_0 s_1 \cdots s$.
Lemma 1: Given a circuit $C$, if a state $s$ is unreachable, then every sequence $\pi = s_0 s_1 \cdots s$ contains a pair $\{s_i, s_{i+1}\}$ that is not a legal transition in $C$.
Proof: From Definition 2, if $s$ is an unreachable state, then no sequence of the form $\pi = s_0 s_1 \cdots s$ is a trace. From Definition 1, a sequence is not a trace when it contains a pair $\{s_i, s_{i+1}\}$ that is not a legal transition in $C$.
Theorem 1: Given a circuit $C$, a state $s$ is unreachable if and only if, for each reachable state $s_r$, the pair $\{s_r, s\}$ is an illegal transition in $C$.
Proof:
(→) Assume $s$ is unreachable and suppose there exists a reachable state $s_r$ such that $\{s_r, s\}$ is a legal transition. A trace $\pi$ can then be constructed that starts at an initial state $s_0$, reaches $s_r$, and finally transitions to $s$. Because $s_r$ is reachable, all state transitions from $s_0$ to $s_r$ are legal in $C$. Hence, $\pi = s_0 s_1 \cdots s_r s$ is a trace and $s$ is reachable, which is a contradiction. Thus, no such reachable state $s_r$ exists. (←) Consider an arbitrary sequence $\pi = s_0 s_1 \cdots s' s$, where $s'$ is the penultimate state. If $s'$ is reachable, the last pair $\{s', s\}$ is an illegal transition by assumption, and thus $\pi$ is not a trace. If $s'$ is unreachable, the prefix $s_0 s_1 \cdots s'$ is not a trace (Definition 2), so $\pi$ is not a trace either. Therefore, $s$ is unreachable.
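Theorem 1 can be sanity-checked exhaustively on a small machine. The sketch below uses a hypothetical 3-bit ring counter that rotates when its input is 1 and holds otherwise; it computes the reachable set $R$ explicitly and confirms the equivalence for every state. It also shows why only reachable predecessors matter: the unreachable state ⟨0, 1, 1⟩ still has a legal predecessor, ⟨1, 1, 0⟩, which is itself unreachable. This is a check on one example, not a proof.

```python
from itertools import product
from typing import Set, Tuple

State = Tuple[int, int, int]
STATES = list(product((0, 1), repeat=3))
INIT: Set[State] = {(1, 0, 0)}


def delta(s: State, i: int) -> State:
    """Hypothetical 3-bit ring counter: rotate right when i == 1, hold otherwise."""
    return (s[2], s[0], s[1]) if i else s


def reachable(init: Set[State]) -> Set[State]:
    seen, work = set(init), list(init)
    while work:
        s = work.pop()
        for i in (0, 1):
            t = delta(s, i)
            if t not in seen:
                seen.add(t)
                work.append(t)
    return seen


R = reachable(INIT)
for s in STATES:
    # Theorem 1: s is unreachable iff no reachable state has a legal transition to s.
    no_reachable_pred = all(delta(r, i) != s for r in R for i in (0, 1))
    assert (s not in R) == no_reachable_pred

# Unreachable (0, 1, 1) still has a legal predecessor, the unreachable (1, 1, 0).
assert (0, 1, 1) not in R and delta((1, 1, 0), 1) == (0, 1, 1)
print(sorted(R))  # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
```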
To prove that a state $s$ is unreachable, one can show either that each sequence $\pi = s_0 s_1 \cdots s$ contains at least one pair $\{s_i, s_{i+1}\}$ that is an illegal transition (Lemma 1), or that $s$ cannot be reached from any reachable state (Theorem 1). Both approaches, however, require a tremendous amount of work and are not feasible in practice. In this work, we propose a light-weight method to verify whether a state $s$ is unreachable. The next subsection describes the method and proves its correctness.
A. Unreachable State Identification
Theorem 1 provides an approach to verify whether a state $s$ is unreachable. However, computing the complete set of reachable states $R$ is an arduous task [13]. The following corollary of Theorem 1 shows how we over-approximate $R$ and thereby simplify the problem of verifying that a state is unreachable.
Corollary 1: Given a circuit $C$ and a set $S$ that over-approximates the set of reachable states $R$ (i.e., $R \subseteq S$), if for every state $s_i \in S$ the pair $\{s_i, s\}$ is an illegal transition, then $s$ is unreachable.
Proof: Since $S$ over-approximates $R$, it contains every reachable state $s_r$. Because $s_r \in S$, $\{s_r, s\}$ is an illegal transition in $C$. Since $\{s_r, s\}$ is illegal for every reachable $s_r$, Theorem 1 implies that $s$ is unreachable.
It is essential to note that the converse of Corollary 1, "if there exists a state $s_i \in S$ such that the pair $\{s_i, s\}$ is a legal transition in $C$, then $s$ is reachable", does not hold. Because $S$ is only an over-approximation of $R$, it may still contain unreachable states. The conclusion that $s$ is reachable would follow only if $s_i$ were known to be reachable, which cannot be confirmed without further investigation.
From Corollary 1, a state $s$ can be verified as unreachable by checking whether any state can transition to $s$. This check can be encoded as a SAT query by constraining the next-state signals to $s$ while leaving the present-state signals unconstrained; the SAT solver is then asked to find an assignment to the present-state signals. If there is no solution (UNSAT), no state can transition to $s$, and $s$ is unreachable. If there is a solution (SAT), no conclusion can be drawn about $s$.
In the SAT query, we leave the present-state signals unconstrained so that the SAT solver searches the set of all states, which is trivially an over-approximation of $R$. This solution space can, however, be reduced while remaining an over-approximation of $R$. Specifically, if a state $u$ has already been found unreachable, $u$ can be safely removed from the solution space, since $R$ cannot contain it. This is accomplished simply by forcing the solver to return a present state different from $u$ (i.e., by adding a clause that blocks $u$).
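The query of Corollary 1, together with the blocking of states already proven unreachable, can be emulated by explicit enumeration in place of a SAT solver. In the hypothetical 4-state machine below, state 2 has no predecessor and state 3 is entered only from state 2, so state 3 becomes provable only after state 2 is excluded from the solution space. The transition table and function names are illustrative assumptions.

```python
from typing import Set

# Hypothetical 4-state machine (states encoded as integers): next state per (state, input).
TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 0,
         (2, 0): 3, (2, 1): 3, (3, 0): 0, (3, 1): 0}
STATES = range(4)
INPUTS = (0, 1)


def delta(s: int, i: int) -> int:
    return TABLE[(s, i)]


def proven_unreachable(target: int, blocked: Set[int]) -> bool:
    """Corollary 1 query: the present state is unconstrained except that states
    already proven unreachable are blocked; no solution means `target` is unreachable."""
    return not any(delta(p, i) == target
                   for p in STATES if p not in blocked for i in INPUTS)


def unreachable_fixpoint() -> Set[int]:
    """Re-run the query until no new state can be blocked."""
    blocked: Set[int] = set()
    changed = True
    while changed:
        changed = False
        for s in STATES:
            if s not in blocked and proven_unreachable(s, blocked):
                blocked.add(s)
                changed = True
    return blocked


print(proven_unreachable(3, set()))  # False: state 2 can step to 3
print(unreachable_fixpoint())        # {2, 3}
```

Blocking is sound here because every blocked state has been proven unreachable, so the shrunken present-state space still contains all of $R$.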
Example 1: Consider the example in Figure 2. The next-state signal $x$ is constrained to 0 and $y$ is constrained to 1. In this example, the circuit is also assumed to have a stuck-at-1 fault at gate $g_2$. We want to find a present state, i.e., an assignment to $x$ and $y$, that produces the next state $\langle 0, 1 \rangle$. However, no values of $x$ and $y$ can overcome the effect of the stuck-at-1 fault at gate $g_2$. Hence, no state can transition to $\langle 0, 1 \rangle$, and $\langle 0, 1 \rangle$ is unreachable.
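Algorithm 1: Unreachable state identification (inputs: circuit C, fault p, state s)
1 φ ← CNF encoding of C under fault p;
2 φ ← φ ∧ (next-state signals = s);
3 if Solve(φ) = SAT then return false;
4 else return true;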
Algorithm 1 depicts the process of identifying an unreachable state. Given a circuit $C$, a fault $p$ and a state $s$, the function returns whether $s$ is proven unreachable under $p$. First, a CNF representation of circuit $C$ under the assumption of fault $p$ is constructed (line 1). Next, the next-state signals of the CNF instance are constrained to $s$ (line 2). The CNF instance is then solved using the SAT solver. If there is a satisfying assignment, $s$ cannot be proven unreachable, so it is conservatively treated as possibly reachable and the algorithm returns false (line 3). Otherwise, the algorithm returns true, indicating that $s$ is unreachable under $p$ (line 4).
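Algorithm 1 can be sketched end to end with a small CNF encoding and a minimal DPLL solver standing in for a production SAT solver. Assumptions: the three-gate netlist (g1 = a AND x, g2 = g1 OR y, g3 = NOT g2, with next state ⟨g2, g3⟩) is hypothetical, and a stuck-at fault is encoded by replacing the faulty gate's clauses with a unit clause on its output, which is equivalent to fixing $e_i = 1$ and $w_i$ in the mux model.

```python
from typing import Dict, List, Optional, Tuple

Clause = List[int]
X, Y, A, G1, G2, G3 = range(1, 7)  # present state x, y; input a; gate outputs
NEXT_STATE_VARS = (G2, G3)         # hypothetical: next state is <g2, g3>

GATE_CLAUSES: Dict[int, List[Clause]] = {
    G1: [[-G1, A], [-G1, X], [G1, -A, -X]],   # g1 <-> a AND x
    G2: [[G2, -G1], [G2, -Y], [-G2, G1, Y]],  # g2 <-> g1 OR y
    G3: [[-G3, -G2], [G3, G2]],               # g3 <-> NOT g2
}


def circuit_cnf(fault: Optional[Tuple[int, int]]) -> List[Clause]:
    """CNF of C under fault p = (gate, stuck value): the faulty gate's clauses
    are replaced by a unit clause, equivalent to e_i = 1 with w_i fixed."""
    cnf: List[Clause] = []
    for gate, clauses in GATE_CLAUSES.items():
        if fault is not None and fault[0] == gate:
            cnf.append([gate if fault[1] else -gate])
        else:
            cnf.extend(clauses)
    return cnf


def dpll(clauses: List[Clause],
         assignment: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Minimal DPLL: unit propagation plus branching. Returns a model or None."""
    assignment = dict(assignment or {})
    while True:
        remaining: List[Clause] = []
        unit = None
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue                  # clause already satisfied
            free = [lit for lit in clause if abs(lit) not in assignment]
            if not free:
                return None               # conflict: clause falsified
            if len(free) == 1:
                unit = free[0]
            remaining.append(free)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0  # propagate one unit literal, then rescan
        clauses = remaining
    if not remaining:
        return assignment
    var = abs(remaining[0][0])
    for value in (True, False):
        model = dpll(remaining, {**assignment, var: value})
        if model is not None:
            return model
    return None


def is_unreachable(fault: Optional[Tuple[int, int]], s: Tuple[int, int]) -> bool:
    phi = circuit_cnf(fault)                                           # line 1
    phi += [[v if bit else -v] for v, bit in zip(NEXT_STATE_VARS, s)]  # line 2
    return dpll(phi) is None  # lines 3-4: UNSAT means unreachable under p


print(is_unreachable(None, (1, 1)))     # True: g3 = NOT g2 rules out <1, 1>
print(is_unreachable((G2, 1), (0, 1)))  # True: g2 stuck-at-1 forces x' = 1
print(is_unreachable((G2, 1), (1, 0)))  # False: SAT, so no conclusion
```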
IV. PATH ANALYSIS
In this section, a technique to prune out suspect faults early in the execution of FD-BackSpace is presented. This technique is based on the consistency between a fault and the chip's behavior.
Recall that in the FD-BackSpace framework, if a state $s$ has no predecessors under the assumption of a fault $p$, $s$ is ignored and the algorithm continues to analyze fault $p$. In this case, the algorithm is conservative and assumes that $s$ is not on the plausible path. This is sound, as shown in Section III: if a state $s$ has no predecessor states, it is unreachable under fault $p$. However, the fact that $s$ is unreachable under $p$ does not imply that it is unreachable under other faults. This scenario is illustrated in Example 2.
Example 2: Consider the scenario in Figure 3, where a faulty chip is returned with a crash state and an initial state. In this case, the crash state is $\langle 0, 1 \rangle$ and the initial state is $\langle 0, 0 \rangle$. Now assume that the real fault is a stuck-at-0 at gate $g_2$. We have proved that the state $\langle 0, 1 \rangle$ is unreachable under two other faults: stuck-at-1 at gate $g_2$ and stuck-at-0 at gate $g_4$. Under the real fault, stuck-at-0 at gate $g_2$, the state $\langle 0, 1 \rangle$ is actually reachable from the initial state $\langle 0, 0 \rangle$. Figure 4 shows a plausible path for this fault. Since $\langle 0, 1 \rangle$ is actually observed as the crash state, stuck-at-1 at gate $g_2$ and stuck-at-0 at gate $g_4$ are refuted as possible faults.
Now consider the case where a state $s$ is observed during a chip run; this observation indicates that $s$ must be reachable under the real fault. As a result, if $s$ is observed and is unreachable under a fault $p$, $p$ cannot be the real fault and is invalidated as a possible fault. Algorithm 2 shows how to invalidate a fault $p$ using this technique. Given a state $s$ that is unreachable under fault $p$, a timeout, and a set of initial states $S_0$, the function returns whether $p$ is an invalid fault. The function runs the chip, using the original BackSpace framework, to determine whether $s$ can be observed (line 2). If $s$ is observed, the function returns true, indicating that $p$ is an invalid fault; otherwise, it returns false (lines 3-4). Although this technique requires additional chip runs, it can prune spurious faults that cannot be detected by the original FD-BackSpace framework.
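A sketch of Algorithm 2 follows. The chip run is emulated by a Python generator and the timeout is expressed as a cycle budget; both are assumptions, since the real flow observes states through the BackSpace breakpoint hardware. The hypothetical silicon here has its g2 output stuck at 0, so it settles into the state ⟨0, 1⟩.

```python
from typing import Callable, Iterable, Iterator, Tuple

State = Tuple[int, int]


def is_invalid_fault(s: State, chip_run: Callable[[State], Iterator[State]],
                     initial_states: Iterable[State], timeout: int) -> bool:
    """Sketch of Algorithm 2. `s` must already be proven unreachable under the
    fault p being tested; observing s on the chip therefore refutes p."""
    for s0 in initial_states:
        for cycle, state in enumerate(chip_run(s0)):
            if cycle >= timeout:
                break          # cycle budget exhausted for this run
            if state == s:
                return True    # s observed: p is an invalid fault
    return False               # s not observed: p remains possible


def emulated_chip(s0: State) -> Iterator[State]:
    """Hypothetical silicon whose g2 is stuck at 0; next state is <g2, NOT g2>."""
    state = s0
    while True:
        yield state
        g2 = 0
        state = (g2, 1 - g2)


print(is_invalid_fault((0, 1), emulated_chip, [(0, 0)], timeout=10))  # True
print(is_invalid_fault((1, 1), emulated_chip, [(0, 0)], timeout=10))  # False
```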
A. The Updated FD-BackSpace Algorithm
In this section, we present the updated FD-BackSpace algorithm, which integrates the techniques described in the previous sections.
The circuit $C$, the set of initial states $S_0$, the crash state $s_c$, a timeout, and the set of initial faults $G$ are the inputs to Algorithm 4. At the end of its computation, the algorithm returns the set of possible faults $F$. The set $G_i$ stores invalid faults (line 1). For each fault, $S_v$, $S_{uv}$, $S_r$ and $S_{ur}$ are the sets of visited, unvisited, reachable, and unreachable states, respectively. To expand the reachable state set for a fault $p$, the function Expand Reachable State Set is invoked (line 5). Moreover, for each fault, two Boolean variables, valid and invalid, keep track of its status (lines 7-8). The loop terminates when the fault $p$ is proved to be invalid or valid (line 11). If $S_{uv}$ is empty, $p$ is an invalid fault, since all states found during the traversal have been proved unreachable (line 12). When $p$ is proved invalid, it is added to $G_i$ (lines 13-14). At each iteration, the state $s$ at the top of $S_{uv}$ is analyzed (line 17). If $s$ is reachable from an initial state, $p$ is a possible fault and is added to $F$ (lines 19-21). Otherwise, $s$ is tested for unreachability (line 23). If $s$ is unreachable but observed in a chip run, $p$ is marked as an invalid fault (line 25). If $s$ is unreachable but not observed, it is added to $S_{ur}$ (line 28). When the unreachability test cannot draw any conclusion about $s$, the algorithm attempts to go further back: all predecessors of $s$ are found using the SAT solver (line 33). Each predecessor state $s'$ is carefully tested before being added to $S_{uv}$ (line 37). First, $s'$ must differ from all states found to be unreachable, $s' \notin S_{ur}$, and must not have been visited before, $s' \notin S_v$ (line 35). Moreover, $s'$ must be observed during a chip run (line 37). These tests ensure that $s'$ has not been encountered before and that it lies on the actual path from the initial state to the crash state.
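The control flow of Algorithm 4 can be condensed into the following Python sketch. It is not the authors' implementation: explicit enumeration replaces the SAT queries, a chip run is emulated by membership in a recorded trace, and the hypothetical `expand_reachable` simply explores $k$ forward steps from $S_0$ as a stand-in for Expand Reachable State Set, whose details are not given in this section. Line references to Algorithm 4 are noted in comments where the correspondence is direct.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

State = Tuple[int, int]
STATES = list(product((0, 1), repeat=2))
INPUTS = (0, 1)
# Hypothetical faults on a 2-bit counter: (forced msb, forced lsb); None = not forced.
FORCED = {"lsb-sa0": (None, 0), "lsb-sa1": (None, 1), "msb-sa1": (1, None)}


def delta_for(fault: str) -> Callable[[State, int], State]:
    force_hi, force_lo = FORCED[fault]

    def delta(s: State, en: int) -> State:
        v = (2 * s[0] + s[1] + en) % 4
        hi = (v >> 1) if force_hi is None else force_hi
        lo = (v & 1) if force_lo is None else force_lo
        return (hi, lo)
    return delta


def predecessors(fault: str, s: State, blocked: Set[State]) -> Set[State]:
    """Stand-in for the SAT query; states in `blocked` are excluded (Section III)."""
    delta = delta_for(fault)
    return {p for p in STATES if p not in blocked
            for en in INPUTS if delta(p, en) == s}


def expand_reachable(fault: str, init: Set[State], k: int) -> Set[State]:
    """Hypothetical stand-in for Expand Reachable State Set: k forward steps."""
    delta, reach, frontier = delta_for(fault), set(init), set(init)
    for _ in range(k):
        frontier = {delta(s, en) for s in frontier for en in INPUTS} - reach
        reach |= frontier
    return reach


def fd_backspace(faults: List[str], init: Set[State], crash: State,
                 chip_run: List[State], k: int = 1) -> Tuple[Set[str], Set[str]]:
    possible: Set[str] = set()  # F
    invalid: Set[str] = set()   # G_i

    def observed(s: State) -> bool:  # emulated breakpoint chip run
        return s in chip_run

    for p in faults:
        s_v, s_uv, s_ur = set(), [crash], set()
        s_r = expand_reachable(p, init, k)
        while True:
            if not s_uv:               # line 12: nothing left to explore
                invalid.add(p)
                break
            s = s_uv.pop()
            s_v.add(s)
            if s in s_r:               # lines 19-21: p is a possible fault
                possible.add(p)
                break
            preds = predecessors(p, s, s_ur)
            if not preds:              # line 23: s unreachable under p
                if observed(s):        # line 25: contradicts the silicon
                    invalid.add(p)
                    break
                s_ur.add(s)            # line 28
                continue
            for q in preds:            # lines 35-37: filter predecessors
                if q not in s_ur and q not in s_v and observed(q):
                    s_uv.append(q)
    return possible, invalid


RUN = [(0, 0), (1, 1), (1, 0)]  # recorded run of the chip (real fault: msb-sa1)
print(fd_backspace(["lsb-sa0", "lsb-sa1", "msb-sa1"], {(0, 0)}, (1, 0), RUN))
```

On this toy input, `lsb-sa1` is invalidated because the observed crash state itself is unreachable under it, `lsb-sa0` is invalidated because the observed state ⟨1, 1⟩ is unreachable under it (the IF + UR case), and `msb-sa1` is confirmed as soon as the crash state falls inside the expanded reachable set (the RS case).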
VI. EXPERIMENTAL RESULTS
This section presents the experimental results of the proposed framework. The presented techniques are integrated into the original FD-BackSpace framework from [7]. All experiments are run on an Intel Core i5 3.1 GHz quad-core workstation with 8 GB of RAM. To interface with the original BackSpace framework, all circuits are synthesized using the cell library included with the BackSpace tool. We use Synopsys Design Compiler Version Y-2006.06-SP2 and Synopsys VCS Version A-2008.09 as our compiler and logic simulator, respectively.
Ten ITC'99 benchmark circuits are used in our experiments. A single s-a-0 or s-a-1 fault is randomly inserted into each gate-level netlist obtained by synthesizing the original RTL. For each benchmark, we simulate the gate-level netlist with the accompanying test-bench and randomly select a state as the crash state after running the test-bench for a few seconds. Experiments are conducted with five different versions of Algorithm 4. In the first version, we turn off all new techniques to reproduce the original framework, denoted FD-BackSpace. We then turn on each of the following techniques individually: (a) unreachable state identification, UR; (b) reachable state expansion, RS; and (c) invalid fault removal, IF + UR. Note that IF requires unreachable states to be detected first, so UR must be turned on for IF to work; we therefore denote the invalid fault removal optimization as IF + UR. Finally, we apply all new optimizations together, UR + RS + IF. Table I lists information about each benchmark. The first column gives the instance names; the next two columns show the number of flops and the initial number of faults, respectively. At the start of the debug step, all faults are considered plausible causes of the chip reaching the crash state. Table II shows the results of all our experiments. The first column gives the instance name. The next three columns show the final number of faults, the run-time, and the number of chip-runs under the original FD-BackSpace framework, respectively. Columns five and six show the run-times and numbers of chip-runs under UR. Columns seven and eight give the run-times and numbers of chip-runs under RS. Note that the UR and RS optimizations only expand the sets of unreachable and reachable states to help speed up the traversal process.
These optimizations, however, do not alter the traversal process of the original FD-BackSpace and thus cannot reduce the possible fault set compared to the original FD-BackSpace. Hence, the final numbers of faults in these experiments are the same as in the original FD-BackSpace and are omitted to avoid repetition. Columns nine, ten and eleven show the final number of faults, the run-time and the number of chip-runs under IF + UR, respectively. As presented in Section IV, IF + UR can detect invalid faults that may not be found by the original FD-BackSpace. Thus, the final numbers of faults can differ from the original FD-BackSpace and are reported. The last two columns give the run-times and numbers of chip-runs when all optimizations are turned on (UR + RS + IF). We do not report the final numbers of faults in this case because they are the same as when IF + UR is applied. Figure 5 plots the ratios of run-times and numbers of chip-runs between UR + RS + IF and the original FD-BackSpace. Our algorithm outperforms the original FD-BackSpace framework in all cases, in both run-time and number of chip-runs. Specifically, for b01, we obtain a 60% reduction in run-time and an 85% reduction in the number of chip-runs. Note that for b01, our algorithm also finds six more invalid faults than the original FD-BackSpace algorithm.
In general, FD-BackSpace works well with designs that have ... It is important to note that the unreachable state identification technique adds run-time overhead when applied individually: it requires the system to make additional SAT queries and hence increases the run-times significantly.
As implied by the previous paragraph, if the number of reachable states is significant, the unreachable state identification technique creates noticeable overhead and does not provide many benefits. However, this extra run-time comes with the potential benefit of fewer chip-runs. Given that each chip-run requires a significant amount of time to set up and run, the extra time spent in the analysis software is justified. Moreover, identifying unreachable states is crucial for the IF + UR optimization, which can find invalid faults early in the run. This means considerably fewer states to analyze compared to UR and FD-BackSpace. As a result, although IF + UR still makes extra SAT queries for unreachability analysis, it makes far fewer than when only UR is applied. Furthermore, in some cases where FD-BackSpace and UR go further back and make extra SAT queries, IF + UR marks the fault invalid and stops analyzing it. In fact, as seen from the table, IF + UR reduces the number of chip-runs, finds more invalid faults and has run-times comparable to FD-BackSpace. The RS optimization reduces both the run-times and the chip-runs because it allows the path traversal to stop much earlier than in FD-BackSpace. Overall, when applying all optimizations, we obtain a 36% reduction in run-time and a 44% reduction in the number of chip-runs compared to the original FD-BackSpace. This shows the effectiveness of our methods.
VII. CONCLUSION
This work improves the original FD-BackSpace framework by proposing three optimizations. First, a formal mechanism to detect unreachable states without chip-run trials is introduced. Second, a technique to detect invalid faults using unreachable states is presented. Finally, we propose a light-weight technique to expand the set of known reachable states from the set of initial states. The end result is significant reductions both in run-times and the numbers of chip-runs, demonstrating the practicality of our techniques. 
