Abstract: Register-transfer level (RTL) debug has become a resource-intensive bottleneck in modern very large scale integration computer-aided design flows, consuming as much as 32% of the total verification effort. This paper aims to advance the state-of-the-art in automated RTL debuggers, which return all potential bugs in the RTL, called solutions, along with corresponding corrections. First, an iterative algorithm is presented to compute the dominance relationships between RTL blocks. These relationships are leveraged to discover implied solutions with every new solution, thus significantly reducing the number of formal engine calls. Furthermore, a modern Boolean satisfiability (SAT) solver is tailored to detect debugging nonsolutions, sets of RTL blocks guaranteed to be bug-free, and to imply other nonsolutions using the precomputed RTL dominance relationships. Extensive experiments on industrial designs show a three-fold reduction in the number of SAT calls due to solution implications, coupled with faster SAT run-times due to nonsolution implications, resulting in a 2.63x overall speedup in total SAT solving time. These results demonstrate the robustness and practicality of the proposed approach.
returned, consisting of a sequence of input stimuli that exhibits a mismatch between the actual and expected responses of the design and its specification, respectively. Given a buggy design and a counter-example, design debugging is the process of tracking down the root cause of the observed erroneous behavior. The latter is still a predominantly manual task in the industry, entailing the burdensome analysis of long and complex counter-examples [2] . Recent technical roadmaps and market studies suggest that once a design fails verification, debugging it and fixing it can consume up to 32% of the total verification effort [1] .
With the aim of alleviating the design debugging cost, several methodologies have been proposed over the years to automate this process [4]-[16]. The output of a modern automated design debugger is a set of potential bug locations, referred to as solutions. Each solution denotes a set of register-transfer level (RTL) lines or blocks, where functional changes, called corrections, can rectify the erroneous behavior in the given counter-example. The automated debugger must return all solutions, along with their corrections; engineers are then left with the final task of identifying the real bug and fixing it.
State-of-the-art automated debuggers make heavy use of formal tools, such as Boolean satisfiability (SAT) [11] , quantified Boolean formulas [13] , and maximum satisfiability [17] . They reduce the debugging problem into a propositional formula whose satisfying assignments correspond to debugging solutions. As such, hundreds of formal engine calls are often required to return all solutions, one at a time [15] . With typical design sizes containing millions of synthesized gates and hundreds of thousands of RTL lines, the heavy computational cost of such a high number of formal engine calls limits the effectiveness and scalability of automated debugging software. This work proposes techniques that: 1) reduce the number of required formal engine calls and 2) expedite the run-time of each call to the formal engine. This is done by leveraging structural dominance relationships between RTL components in the design.
A node u is said to be a single-vertex dominator of another node v if every path from v to a primary output passes through u. Single-vertex dominators can be found in linear time [18], [19] and have been used for optimizing various computer-aided design tasks, e.g., test pattern generation [20], [21]. More recently, they have been leveraged in the gate-level debugger in [11], which performs an initial debugging pass on selected dominator gates. However, state-of-the-art automated design debuggers operate at the RTL level [12], [14], where bugs occur in RTL blocks (e.g., an always block, an if statement, a module definition), corresponding to multiple-gate, multiple-output circuit blocks in the synthesized netlist. As such, it is difficult to make use of single-vertex dominators at the RTL level. A block a dominates another block b if every path from every node in b to a primary output passes through a node in a. In existing approaches for computing so-called multiple-vertex or generalized dominators, the gates constituting each block are not fixed in advance. Instead, nodes are grouped to form blocks during the algorithm, according to certain conventions (e.g., the smallest subset of fanouts collectively dominating a node [22], [23]). In contrast, we are interested in computing dominance relationships among blocks of nodes defined a priori by a hierarchical RTL design.

0278-0070 © 2014 IEEE
Our initial contribution is an algorithm that iteratively computes all the dominator RTL blocks of each RTL block in the design. Next, we apply our algorithm as a preprocessing step to debugging, and leverage it in two ways.
First, we prove that for each solution RTL block returned by the automated debugger, the blocks that dominate it are separate implied solutions. As such, the number of formal engine calls for finding all solutions can be significantly reduced using solution implications. Moreover, we show how to extract corrections for such implied solutions in linear time from the satisfying assignment corresponding to the original solution. This can be thought of as pruning the solution space of the debugging problem.
We also use block dominance to prune the nonsolution space of the debugging problem. We introduce the concept of nonsolutions, which are sets of blocks that cannot be modified in any way to correct the counter-example. We show that if a set of n RTL blocks is a nonsolution, then a set of n blocks they dominate can also be ruled out as a nonsolution. Detecting nonsolutions and blocking the RTL blocks they dominate using blocking clauses during an SAT run can lead to significant time savings. In order to make such nonsolution implications possible and useful, we present a new SAT branching scheme where error-select variables [11] are decided upon first, allowing the early detection of original nonsolutions. We also prove that error-select variables are part of the careset [24] of the debugging problem, providing further theoretical ground for moving them up in the SAT decision tree. Finally, solution and nonsolution implications are shown to be valid for any error cardinality.
The proposed techniques are implemented on top of an SAT-based automated RTL debug framework [11], [12] using MiniSat 2.2.0 [25] as the back-end solver. An extensive set of experiments on real industrial designs obtained from our partners demonstrates the consistent benefits of the presented framework. It is shown that 66% of solutions are discovered early due to solution implications, resulting in a three-fold reduction in the average number of SAT solver calls. This, coupled with the fact that 25% of all nonsolutions are implied and blocked, results in a 2.63x overall speedup in solving time. These results demonstrate the effectiveness and practicality of our contributions.

This paper is organized as follows. Section II contains preliminaries on automated design debugging and dominators. Section III presents our iterative algorithm for computing dominance relationships between blocks and proves its correctness. Section IV shows how to leverage block dominators for on-the-fly solution implications in design debugging. Section V introduces debugging nonsolutions and describes the use of block dominators to imply nonsolutions. Section VI details our tailored SAT solver, which can detect original nonsolutions to imply further nonsolutions based on block dominators. Finally, Section VII presents experimental results and Section VIII concludes this paper.
II. Preliminaries
The following notation is used throughout this paper. Given a sequential circuit $C$, the symbol $\mathbf{n}$ denotes the set of all nodes in $C$. The symbols $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{s}$ label (possibly overlapping) subsets of $\mathbf{n}$, respectively referring to the sets of primary inputs, primary outputs, and state elements (flip-flops) of $C$. For each $\mathbf{z} \in \{\mathbf{x}, \mathbf{y}, \mathbf{s}, \mathbf{n}\}$, the Boolean variable $z_i$ denotes the $i$th element in the set $\mathbf{z}$. In general, bold ($\mathbf{z}$) versus regular ($z$) symbols differentiate sets or sequences from single variables.
We consider designs with single clock domains, although the described theory is applicable to multiple synchronous clock domains using the techniques described in [26]. Ganai and Gupta [26] showed how to transform multiple synchronous clock domains into a single domain by using a global high-frequency clock and by adding extra circuitry around flip-flops and latches. The interested reader is referred to [26], as the details of this translation are beyond the scope of this paper.
Time-frame expansion for $k$ clock-cycles is the process of replicating, or unrolling, the combinational component of $C$ $k$ times such that the next state of each time frame is connected to the current state of the next time frame, thus modeling the sequential behavior of $C$. For any variable (or set of variables) $z_i$ (or $\mathbf{z}$), the symbol $z_i^t$ (or $\mathbf{z}^t$) denotes the corresponding variable (or set of variables) in time-frame $t$ of the unrolled circuit. The behavior of $C$ during the $t$th clock-cycle is formalized using the transition relation predicate $T(\mathbf{s}^t, \mathbf{s}^{t+1}, \mathbf{x}^t, \mathbf{y}^t)$, which describes the dependence of the primary outputs $\mathbf{y}^t$ and next state $\mathbf{s}^{t+1}$ on the primary inputs $\mathbf{x}^t$ and current state $\mathbf{s}^t$. The transition relation $T$ can be extracted from $C$ and is normally given in conjunctive normal form (CNF), using the set of nodes $\mathbf{n}^t$ as auxiliary variables.

An RTL design is translated into a gate-level netlist using logic synthesis. Such a gate-level sequential circuit $C$ can also be represented as a directed graph. For convenience, we add an artificial sink node $r$ to this graph such that the set of nodes is $V = \mathbf{n} \cup \{r\}$ and the set of edges is $E = \{(n_i, n_j) \mid n_i \text{ is a fanin of } n_j \text{ in } C\} \cup \{(y_i, r) \mid y_i \in \mathbf{y}\}$. We reserve the letters $u$ and $v$ to refer to nodes in $V$. Let $fanout(v) = \{u \in V \mid (v, u) \in E\}$ and $fanin(v) = \{u \in V \mid (u, v) \in E\}$.

Furthermore, the nodes $\mathbf{n}$ of $C$ are grouped into (possibly overlapping) blocks. Each block consists of the synthesized gates of a given block of RTL code, such as an always block in Verilog. Let $B = \{b_1, b_2, \ldots, b_{|B|}\}$ denote the set of all blocks, where each $b_i \subseteq \mathbf{n}$ is a collection of nodes. Note that the same node $v$ can belong to more than one block because of the hierarchical nature of RTL. The set $out(b_i)$ denotes the outputs of block $b_i$.

Consider the sequential circuit in Fig. 1(a). The blocks
Note that y 1 and y 2 are primary output labels for g 3 and g 2 , respectively, and do not represent separate nodes. Fig. 1(b) presents the corresponding directed graph, including the artificial sink r.
A. Single-Vertex Dominators
In a directed graph C = (V, E, r) with a single output sink r ∈ V , a node u ∈ V is said to be a structural single-vertex post-dominator, or simply a dominator, of a node v ∈ V , if every path from v to the sink r passes through u. The set dom(v) = {u ∈ V |u dominates v} consists of nodes that dominate v. As a convention, we consider that a node dominates itself. Furthermore, to ease the presentation, we assume that every node has a path to r (i.e., all dangling logic has been removed).
The immediate dominator of a node $v$ ($v \neq r$), denoted by $idom(v)$, is a provably unique node $u$ ($u \neq v$) that dominates $v$ and is dominated by all the nodes in $dom(v) - \{v\}$. It can be shown that for all $v \neq r$, $dom(v) = \{v\} \cup dom(idom(v))$ [27]. Therefore, it is sufficient to compute all immediate dominators, which can be done in $O(|E| + |V|)$ time [18], [19]. In the directed graph shown in Fig. 1
In this paper, we are interested in finding dominance relationships between blocks in B, rather than between nodes in V . Section III outlines our approach, and discusses why methods for computing single-vertex dominators, as well as existing techniques for computing multiple-vertex dominators, are not applicable in a design debugging setting.
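The single-vertex definitions above can be illustrated with a short Python sketch. It uses the classical iterative data-flow formulation rather than the linear-time algorithms of [18], [19], and it is only meant to make the definitions of dom(v) and idom(v) concrete. The function names and the five-node graph are hypothetical and are not taken from Fig. 1.

```python
def post_dominators(fanout, sink):
    """Iteratively compute dom(v), the set of nodes lying on every path from v to the sink.

    fanout maps each non-sink node to the list of its successors.
    Assumption: every node has a path to the sink (no dangling logic).
    """
    nodes = set(fanout) | {sink}
    dom = {v: set(nodes) for v in nodes}   # start from the full set and refine
    dom[sink] = {sink}
    changed = True
    while changed:
        changed = False
        for v in nodes - {sink}:
            # A node dominates itself, plus whatever dominates all of its fanouts.
            new = set.intersection(*[dom[u] for u in fanout[v]]) | {v}
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom


def immediate_dominator(dom, v):
    """Return the unique u != v with dom(u) == dom(v) - {v}, or None for the sink."""
    strict = dom[v] - {v}
    for u in strict:
        if dom[u] == strict:
            return u
    return None


# Hypothetical graph: a fans out to b and c, which reconverge at d before the sink r.
fanout = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["r"]}
dom = post_dominators(fanout, "r")
idom_a = immediate_dominator(dom, "a")
```

In this graph neither b nor c dominates a on its own, while the reconvergence node d does, which is the situation that motivates block-level dominators when b and c belong to the same RTL block.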
B. Design Debugging
This section describes SAT-based design debugging [11] and introduces relevant notation, which is used throughout this paper. Given an erroneous design C, a set of blocks B, a counter-example of length k (along with its expected outputs) and an error cardinality N, the task of an automated design debugger is to find all sets of N blocks that can be responsible for the counter-example. More precisely, each returned set of N blocks can be modified to rectify the erroneous behavior exhibited in the counter-example. We refer to each such set of N blocks as a solution of cardinality N. SAT-based automated design debugging [11], [12] encodes the debugging problem as a propositional formula whose satisfying assignments correspond to debugging solutions. The following are the steps to translate design debugging into an SAT problem. We use C and B given in Fig. 1(a) as an example for illustrating the encoding process. Fig. 2 is an illustration of the resulting design debugging encoding for a two-cycle counter-example.
First, a set of error-select variables e = {e 1 , . . . , e |B| } are added to the circuit such that setting e i = 1 disconnects gates in out(b i ) from their fanins, making them free variables, whereas setting e i = 0 does not modify the circuit. This can be achieved by inserting special multiplexers or switches at block outputs or by directly modifying the CNF of the transition relation. Next, this enhanced circuit is replicated using timeframe expansion for the length of the counter-example k, and such that for all time-frames t, outputs out(b t i ) are controlled by the same error-select variable e i . Fig. 2 
where T en (s t , s t+1 , x t , y t , e) refers to the transition relation predicate of the enhanced circuit at time-frame t.
Each assignment to $\mathbf{e} = \{e_1, \ldots, e_{|B|}\}$ satisfying Debug in (1) corresponds to a debugging solution, and the SAT solver must find all such satisfying assignments to $\mathbf{e}$. This is normally done by iteratively blocking each satisfying assignment using a blocking clause and re-solving Debug until the problem becomes unsatisfiable. In a satisfying assignment where some $e_i = 1$, the values of $out(b_i^t)$ across all time-frames $t = 1, \ldots, k$ represent a sequence of corrections, which would correct the erroneous behavior in the counter-example.
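The all-solutions loop described above can be sketched as follows. This is a minimal illustration only: a brute-force search stands in for the SAT solver, the clause list is a hypothetical toy formula and not the Debug instance of Fig. 2, and the names `brute_force_sat` and `all_solutions` are assumptions made for this sketch. Clauses are lists of signed integers, as in the DIMACS convention.

```python
from itertools import product


def brute_force_sat(cnf, num_vars):
    """Toy stand-in for a SAT solver: return a satisfying model as a dict, or None."""
    for values in product([False, True], repeat=num_vars):
        model = {i + 1: values[i] for i in range(num_vars)}
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf):
            return model
    return None


def all_solutions(cnf, num_vars, select_vars):
    """Enumerate all sets of active error-select variables by iterative blocking."""
    cnf = [list(clause) for clause in cnf]
    solutions = []
    while True:
        model = brute_force_sat(cnf, num_vars)
        if model is None:                       # UNSAT: every solution has been found
            return solutions
        solution = frozenset(v for v in select_vars if model[v])
        solutions.append(solution)
        cnf.append([-v for v in solution])      # blocking clause for this solution


# Hypothetical formula: variables 1..3 are error-select variables, 4 is auxiliary.
toy_debug = [
    [1, 2, 3],                      # at least one error-select variable is active
    [-1, -2], [-1, -3], [-2, -3],   # at most one is active (cardinality N = 1)
    [-1, 4], [-1, -4],              # activating variable 1 leads to a conflict
]
solutions = all_solutions(toy_debug, 4, [1, 2, 3])
```

Each iteration costs one solver call, which is why the number of calls grows with the number of solutions when no implications are used.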
Example 1: Consider the sequential circuit in Fig. 1 (a) to be a buggy implementation. We are also given a two-cycle counter-example with initial state 0, inputs x 1 , x 2 = 1, 1 in the first time-frame and 0, 1 in the second, as well as expected outputs y 1 , y 2 = 1, 1 and 0, 0 in the first and second time-frames. This yields a mismatch in the second timeframe at the output y 2 since the buggy circuit yields y 2 2 are shown in boxes, while N is omitted for brevity. For N = 1, {b 2 } and {b 3 } will be returned by the solver as separate solutions, and are therefore considered potentially buggy blocks. Corrections for solution {b 2 } (respectively {b 3 }) consist of the satisfying assignments to {x 2 } (respectively {g 1 , g 2 }) during the two time-frames. For instance, in any correction for solution {b 2 }, x 2 2 must be set to 0.
III. Dominance between Blocks
In this section, an iterative algorithm is presented for computing the dominator RTL blocks of every RTL block. Assuming that internal (nonoutput) block nodes cannot be primary outputs, any path to a primary output exiting a block must pass through one of its outputs. Furthermore, all primary outputs are connected to the artificial sink r. As such, the block dominator relation D ⊆ B × B can be formalized using restricted quantifier notation [28] as follows:
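$$b_j \, D \, b_i \iff (\forall v \in out(b_i))\,(\forall p : v \leadsto r)\,(\exists u \in p)\,(u \in out(b_j)) \tag{2}$$

where a path $p : v \leadsto r$ is a sequence of nodes starting at $v$ and ending at $r$. The right-hand side of (2) reads: for all vertices $v$ in $out(b_i)$, and for all paths $p$ from $v$ to $r$, there exists a vertex $u$ in $p$, such that $u \in out(b_j)$.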
We let the set D(b i ) = {b j |b j Db i } consist of blocks that dominate b i . Note that b i Db i according to (2) . Consider the sequential circuit given in Fig. 1(a) . Although x 2 is not dominated by g 1 or g 2 separately, block
The relation D on the blocks B of C in Fig. 1(b) is illustrated in Fig. 3. Unlike single-vertex dominators, a block does not necessarily have a unique immediate dominator block. This can be seen for block b 1 in Fig. 3. As such, algorithms for calculating single-vertex immediate dominators cannot be used for computing block dominators. On the other hand, in existing approaches for computing so-called generalized or multiple-vertex dominators [22], [23], [29], block boundaries are not defined in advance. Instead, nodes are assembled into multiple-vertex dominators on-the-fly according to certain conventions, e.g., the smallest subset of fanout(v) collectively dominating a node v [22], [23]. This is not applicable in a design debugging setting, where circuit blocks are defined in advance by the hierarchical RTL design.
[Algorithm 1: computation of the block dominator relation D. Surviving fragment of the listing: "while changed do", "changed ← false".]
In this paper, the block dominator relation D on the set of blocks B is computed in two steps. First, the block dominators of each node v ∈ V are computed. Then, these block-to-node dominators are used to compute the block-to-block dominator relation D.
Definition 2: A block b j dominates a node v, denoted as b j dv, if and only if every path from v to a primary output in y passes through a node in b j .
The block-to-node dominator relation d ⊆ B × V can be formalized as
We let the set d(v) = {b j |b j dv} consist of blocks that dominate node v. For instance, in Fig. 1 
Algorithm 1 shows our pseudocode for computing the block dominator relation $D$. It first computes the sets $d(v)$ for every $v \in V$ (lines 1 to 17). This is done using an iterative algorithm, where the set of block dominators of each node is initialized to all blocks $B$ and iteratively refined until it converges to its actual block dominators. These block-to-node dominators are subsequently used on line 19 to compute the block dominators of each block, $D(b_i) = \bigcap_{v \in out(b_i)} d(v)$.
On line 1, $C^T$ denotes the transpose of the directed graph $C$ (i.e., $C$ with its edges reversed). The function ReversePostordering($C^T$, $r$) performs a depth-first search (DFS) of $C^T$ starting from $r$, and sorts the nodes in order of decreasing finishing time. In general, a reverse postordering is not unique. For instance, for $C$ given in Fig. 1(b),
Traversing V in reverse postorder guarantees for each node u ∈ V that at least one of v ∈ fanout(u) is already visited by the time u is traversed. This will reduce the number of iterations needed for convergence when computing the sets d(v) later in the algorithm.
Lines 3-5 calculate the sets $out^{-1}(v)$ for each node $v$, i.e., the blocks in which $v$ is an output. The algorithm for computing the sets $d(v)$ for all nodes $v$ (lines 7-16) is based on the traditional data-flow analysis algorithm for finding single-vertex dominators [27], [30]. Lines 7 and 8 initialize each dominator set $d(v)$ to all blocks $B$ for $v \in V - \{r\}$, and to the empty set for $v = r$. In each iteration of the while loop, the nodes are traversed in reverse postorder (as calculated on line 1) and a refined set of dominator blocks is computed for each node on line 13. This computation is the main difference from the data-flow analysis algorithm for single-vertex dominators: the new set of dominator blocks of a node $u \in V$ is the intersection, over all $v \in fanout(u)$, of the union of the dominator blocks of $v$ and the blocks in which $v$ is an output. If any of the sets $d(v)$ is changed during an iteration (i.e., the if condition on line 14 is true), the while loop is executed again. The while loop terminates after an iteration where all block-to-node dominator sets remain unchanged. Line 17 adds the blocks in which node $v$ is an output to the dominators of $v$.

Example 2: We will go through Algorithm 1 for the graph representation of the circuit in Fig. 1(a), along with its suspect blocks $B$, as shown in Fig. 1(b). Let the reverse postordering returned by ReversePostordering
After line 8, we have $d(r) = \emptyset$ and for every other $v$, $d(v) = B$.

Next, each iteration of the while loop goes through every vertex $u$ in reverse postorder and sets $d(u) = \bigcap_{v \in fanout(u)} \big(d(v) \cup out^{-1}(v)\big)$.
In the first iteration, the block-to-node dominators are set as follows:
The second iteration of the while loop does not modify any of these sets, and as such the loop is exited. Line 17 then adds $out^{-1}(v)$ to $d(v)$ for every node $v$.
Finally, line 19 intersects the block-to-node dominators of the output nodes of each block to get the block-to-block dominator relation D shown in Fig. 3 .
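The two-step computation described in this section can be sketched in Python as follows. This is a hedged reconstruction from the prose description of Algorithm 1, not the authors' listing: the function and variable names are assumptions, line numbers of the original pseudocode are not reproduced, and the graph and block outputs below are hypothetical rather than those of Fig. 1.

```python
def reverse_postorder(fanin, sink):
    """DFS of the transposed graph from the sink; nodes by decreasing finishing time."""
    order, seen = [], set()

    def visit(v):
        seen.add(v)
        for u in fanin.get(v, ()):
            if u not in seen:
                visit(u)
        order.append(v)

    visit(sink)
    return order[::-1]


def block_dominators(fanout, sink, out):
    """Return D, mapping each block to the set of blocks that dominate it.

    fanout: node -> list of successors (primary outputs point to the sink).
    out:    block name -> set of output nodes of that block.
    Assumption: every node has a path to the sink.
    """
    fanin = {}
    for v, succs in fanout.items():
        for u in succs:
            fanin.setdefault(u, []).append(v)
    order = reverse_postorder(fanin, sink)
    all_blocks = frozenset(out)
    # out_inv[v]: the blocks in which node v is an output.
    out_inv = {v: frozenset(b for b in out if v in out[b]) for v in order}

    d = {v: all_blocks for v in order}      # initialize to all blocks
    d[sink] = frozenset()                   # except for the sink
    changed = True
    while changed:
        changed = False
        for u in order:
            if u == sink:
                continue
            new = all_blocks
            for v in fanout[u]:
                new &= d[v] | out_inv[v]    # refinement step over all fanouts
            if new != d[u]:
                d[u] = new
                changed = True
    for v in order:
        d[v] = d[v] | out_inv[v]            # a block dominates its own outputs

    D = {}
    for b, outputs in out.items():
        doms = all_blocks
        for v in outputs:
            doms &= d[v]                    # intersect over the block's outputs
        D[b] = doms
    return D


# Hypothetical circuit graph: x2 fans out to g1 and g2, which form the outputs of b3.
fanout = {"x1": ["g1"], "x2": ["g1", "g2"], "g1": ["g3"], "g2": ["r"], "g3": ["r"]}
out = {"b1": {"x1"}, "b2": {"x2"}, "b3": {"g1", "g2"}, "b4": {"g3"}}
D = block_dominators(fanout, "r", out)
```

In this hypothetical graph, x2 is dominated by neither g1 nor g2 alone, but the block whose outputs are g1 and g2 dominates the block whose output is x2, so `D["b2"]` contains both `b2` and `b3`.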
Lemma 1: The while loop in Algorithm 1 terminates and the block-to-node dominator relation d is correctly computed by the end of the foreach loop on line 17.
Proof: Kam and Ullman [31] described a class of well known iterative data-flow analysis algorithms, which have a variety of applications (e.g., in compiler optimization [32] ) and are not restricted to calculating dominators. It can be shown that Algorithm 1 up to line 16 is a special case of Algorithm MK in [31] . Using the results in [31] , one can show that the block-to-node dominator relation d is correctly computed by the end of the foreach loop on line 17.
Theorem 1: Algorithm 1 correctly computes the block dominator relation D.
Proof:
which satisfies the definition of the block dominator relation D given in (2) .
The overall run-time of Algorithm 1 is normally dictated by the run-time of the while loop from line 10 to 16. Furthermore, during each iteration of the while loop, line 13 clearly dominates computation time. We assume that all dangling logic has been removed during preprocessing (i.e., every node has a path to $r$), and as such $|V| = O(|E|)$. Using an aggregate analysis of all executions of line 13 during a single iteration of the while loop, it can be seen that line 13 performs a total of $O(|E|)$ intersections and unions between two sets of size at most $|B|$. We assume that all sets are implemented using ordered lists, and therefore intersections and unions can be done in linear time. As such, in a single iteration of the while loop, line 13 takes $O(|B| \cdot |E|)$ time.
Let $c$ denote the so-called loop-connectedness of the directed graph $C = (V, E, r)$, which refers to the maximum number of back edges in any cycle-free path in $C$. The back edges are defined according to the DFS performed in ReversePostordering($C^T$, $r$) on line 1. It is proven in [31] that the number of iterations of the while loop for this class of algorithms is bounded by $c + 2$. Hence, our algorithm takes $O(c \cdot |B| \cdot |E|)$ time.
IV. Leveraging Block Dominance in Design Debugging: Solution Implications
In this section, we show how to leverage the relation D to imply debugging solutions. In effect, given a solution
is also a solution of Debug. This is an implication of Theorem 2, which is more general because it is also used in Section V. Furthermore, Corollary 1 shows that corrections for each implied solution can be obtained automatically from the satisfying assignment of the original solution.
First, due to the fixed length of a given counter-example, we define the following slightly modified concept of domination.
Definition
The following three lemmas are used in the proof of Theorem 2 in this section. In Lemma 2,

Proof: In what follows, we refer to the left-hand side (respectively right-hand side) SAT formula of the implication as the LHS (respectively RHS). Let $\pi$ denote any satisfying assignment of the LHS $= \mathit{Debug} \wedge \bigwedge_{m=1}^{n} e_{i_m}$. Assuming that $\bigwedge_{m=1}^{n} (b_{j_m} D \, b_{i_m})$, we will construct an assignment $\pi'$ satisfying the RHS $= \mathit{Debug} \wedge \bigwedge_{m=1}^{n} e_{j_m}$ to prove the claim. Given any set of variables $\mathbf{z}$, we use the notation $\pi(\mathbf{z})$ [respectively $\pi'(\mathbf{z})$] to denote the truth assignment to $\mathbf{z}$ in the satisfying assignment $\pi$ (respectively $\pi'$). Clearly, every variable in $\{e_{i_1}, \ldots, e_{i_n}\}$ must be set to 1 in $\pi(\mathbf{e})$. Furthermore, we let the remaining $N - n$ error-select variables that are set to 1 in $\pi(\mathbf{e})$ be denoted by $\{e_{i_{n+1}}, \ldots, e_{i_N}\}$. Using this notation, the cardinality-$N$ solution of the LHS corresponding to $\pi$ can simply be written as $\{b_{i_1}, \ldots, b_{i_N}\}$. We refer to this set as $\mathcal{B}$.

We start by constructing $\pi'(\mathbf{e})$ from $\pi(\mathbf{e})$. This is equivalent to constructing a set of blocks $\mathcal{B}'$ that we will later show is a cardinality-$N$ solution of the RHS. Clearly, every variable in $\{e_{j_1}, \ldots, e_{j_n}\}$ must be set to 1 in $\pi'(\mathbf{e})$. Furthermore, for every $m = n + 1, \ldots, N$, we obtain the following. We set every other error-select variable to 0 in $\pi'(\mathbf{e})$. The total number of error-select variables assigned to 1 in $\pi'(\mathbf{e})$ is exactly $N$, thus satisfying the error cardinality constraint.
Using the scheme given above, we have
It is easy to show that $\mathcal{B}' D \, \mathcal{B}$. We are already given that for each $m = 1, \ldots, n$, $b_{j_m} D \, b_{i_m}$. Each of the blocks in the other two subsets of $\mathcal{B}'$ shown in (5) already exists in $\mathcal{B}$, and therefore dominates itself by definition. As such, each block in $\mathcal{B}'$ dominates at least one block in $\mathcal{B}$, and therefore, by Lemma 2, $\mathcal{B}' D \, \mathcal{B}$. Furthermore, by Lemma 3, we get $\mathcal{B}' D_k \, \mathcal{B}$.

In the second half of this proof, we will use the fact that $\mathcal{B}' D_k \, \mathcal{B}$ in order to construct a satisfying assignment $\pi'$ for the RHS SAT formula from any satisfying assignment $\pi$ of the LHS SAT formula. The assignment $\pi'$ must set all the variables in Debug, which includes the $k$-time-frame expanded circuit obtained from $C$, as described in Section II-B. Let $U$ refer to this expanded circuit (Fig. 2). In what follows, we refer to the blocks in $\mathcal{B}'$ as $\{b_{j_1}, \ldots, b_{j_N}\}$ to simplify notation. Let I = {b
denote the union of all nodes in $\mathcal{B}$ (respectively $\mathcal{B}'$) across all time-frames in $U$. Also, let $out(I)$ [respectively $out(J)$] refer to the set of outputs of $I$ (respectively $J$). We will partition the nodes in $U$ into three parts, $U_I$, $U_J$, and $U_R$, as follows.

Let $U_J$ denote the transitive fanout of $out(J)$ in $U$. Let $U_I$ denote the nodes in $U$ that are in the transitive fanout of $out(I)$, but not in $U_J$. Finally, let $U_R$ consist of the remaining nodes in $U$, outside $U_I$ and $U_J$. Using $\mathcal{B}' D_k \, \mathcal{B}$ and Lemma 4, we can imply that any path from $out(I)$ to a primary output must pass through $out(J)$. As a result, these partitions of $U$ can be represented by the diagram shown in Fig. 4.
Note that in Fig. 4 , the output constraints are separated into two subsets:
denotes the output constraints applied at the outputs of U J (respectively U R ). This separation is only needed for this proof and is not required by our method.
We know that given $e_{i_1} = 1, \ldots, e_{i_N} = 1$, there exist assignments to the nodes in $U_I$, $U_J$, and $U_R$ satisfying the LHS. Let $\pi(U_I)$, $\pi(U_J)$, and $\pi(U_R)$ refer to these assignments. We want to find assignments $\pi'(U_I)$, $\pi'(U_J)$, and $\pi'(U_R)$ such that given $e_{j_1} = 1, \ldots, e_{j_N} = 1$, the RHS is satisfied. These assignments are found as follows.

First, consider the subset of output constraints applied at the outputs of $U_R$, denoted by $Y_R$ in Fig. 4. Since $\pi(U_R)$ satisfies $Y_R$ and the input constraints to $U_R$ (i.e., $S \wedge X$) are the same in the LHS and the RHS, setting $\pi'(U_R) = \pi(U_R)$ will also satisfy $Y_R$ in the RHS. Next, consider $U_I$. Note that any path from $out(I)$ to a primary output must pass through $out(J)$. Also, setting $e_{j_1} = 1, \ldots, e_{j_N} = 1$ in the RHS disconnects $out(J)$ from their fanins. Therefore, there are no output constraints applied on $U_I$ (i.e., $U_I$ is dangling logic in the RHS). As such, $\pi'(U_I)$ can simply propagate the values of $\pi'(U_R)$ in $U_I$.

Finally, since the nodes in $out(J)$ are disconnected from their fanins in the RHS, the SAT solver is free to pick any assignment for these variables. Furthermore, setting $\pi'(U_R) = \pi(U_R)$ already assigned any inputs to $U_J$ coming from $U_R$ to the same values as in the LHS. Therefore, we can simply pick $\pi'(U_J) = \pi(U_J)$, which will satisfy $Y_J$ in Fig. 4. This completes the satisfying assignment $\pi'$ to all the variables in $U_I$, $U_J$, and $U_R$ in the RHS. Therefore, the RHS is SAT.
Corollary 1: Given a solution {b i 1 , . . . , b i N } and its corresponding satisfying assignment π of Debug, a sequence of corrections for each implied solution {b j 1 , . . . , b j N } consists of the assignments to {out(b
Proof: Consider Theorem 2 with $n = N$. In its proof, we showed how to build a satisfying assignment $\pi'$ of the RHS of (4) given a satisfying assignment $\pi$ of the LHS. In particular, we showed that the subset of $\pi'$ corresponding to $U_J$ is the same as the subset of $\pi$ corresponding to $U_J$. In other terms, $\pi'(U_J) = \pi(U_J)$. Since $U_J$ is simply the transitive fanout of $out(J)$ in $U$, the subset of $\pi'$ corresponding to $out(J)$ is also the same as the subset of $\pi$ corresponding to $out(J)$. As such, given a satisfying assignment $\pi$ for the original solution $\{b_{i_1}, \ldots, b_{i_N}\}$, a sequence of corrections
Intuitively, Theorem 2 and Corollary 1 show that if the SAT solver returns a given debugging solution, we can immediately imply that all its dominators are also solutions, and we can extract their corresponding corrections from the satisfying assignment of the original solution. This eliminates all the additional SAT solver calls to find these dominating solutions and their corrections, therefore significantly expediting the debugging process.
A. Debugging Flow Using Solution Implications
The flowchart in Fig. 5 illustrates the overall design debugging flow using on-the-fly solution implications. Algorithm 1 is first run to compute D(b i ) for every block b i ∈ B. Next, the automated debugger builds the CNF of Debug and passes it to the SAT solver. If it is UNSAT, the flow terminates. Otherwise, a solution {b i 1 , . . . , b i N } is returned. A simple implication engine takes in this solution, and using the precomputed block dominator relation D, generates all newly implied solutions. A blocking clause is added to Debug for each of these implied solutions, as well as the original solution. The resulting debugging instance is given again to the automated debugger, and this process is repeated until the problem becomes UNSAT.
Example 3: Consider the sequential circuit in Fig. 1(a) and the corresponding design debugging formulation illustrated in Fig. 2 , with N = 1. Assume that D ⊆ B × B has been computed using Algorithm 1. Let the solver first return the solution {b 2 }. Since D(b 2 ) = {b 2 , b 3 }, the solution {b 3 } (along with its corrections) can be immediately implied, eliminating a SAT call. After adding the corresponding blocking clauses (ē 2 ) and (ē 3 ) to Debug, the solver returns UNSAT, indicating that all solutions have been found.
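The implication engine of this flow can be sketched in a few lines of Python. The sketch assumes that D is available as a dictionary from each block to the set of its dominator blocks, and it keeps only candidate sets with as many distinct blocks as the original solution; both the names and this filtering choice are assumptions made for illustration. The entry for b2 follows Example 3, while the entry for b3 and the variable numbering are hypothetical.

```python
from itertools import product


def implied_solutions(solution, D):
    """Sets obtained by replacing each block of a solution with one of its dominators."""
    blocks = sorted(solution)
    implied = set()
    for choice in product(*(sorted(D[b]) for b in blocks)):
        candidate = frozenset(choice)
        # Keep candidates of the same cardinality that differ from the original.
        if len(candidate) == len(blocks) and candidate != frozenset(blocks):
            implied.add(candidate)
    return implied


def blocking_clause(solution, select_var):
    """Clause forbidding the simultaneous activation of all blocks in a solution."""
    return sorted(-select_var[b] for b in solution)


D = {"b2": {"b2", "b3"}, "b3": {"b3"}}   # D(b2) as in Example 3; D(b3) is assumed
select_var = {"b2": 2, "b3": 3}          # hypothetical error-select variable indices

original = frozenset({"b2"})
implied = implied_solutions(original, D)
clauses = [blocking_clause(s, select_var)
           for s in [original] + sorted(implied, key=sorted)]
```

With these inputs the engine yields the single implied solution {b3}, and the two blocking clauses correspond to (ē 2) and (ē 3), so only one solver call is spent on finding solutions before the final UNSAT call.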
V. Leveraging Block Dominance in Design Debugging: Nonsolution Implications
In this section, we define the concept of a nonsolution. We show that the reverse of the computed block dominance relationships can be leveraged to perform nonsolution implications, thus pruning the search space of the debugging problem. In Section VI, we present a tailored SAT solver that is able to learn and detect original nonsolutions much faster, leading to earlier nonsolution implications and expedited run-times.
As opposed to a solution, any set of N blocks whose outputs can in no way be simultaneously modified to correct the counter-example can be referred to as a nonsolution. We extend this concept to sets of n ≤ N blocks as follows.
Definition 4: Given an erroneous design $C$, a set of blocks $B$, a counter-example of length $k$ along with the corresponding expected outputs, and an error cardinality $N$, $\{b_{i_1}, \ldots, b_{i_n}\}$ with $n \le N$ is a nonsolution if and only if $\mathit{Debug} \wedge \bigwedge_{m=1}^{n} e_{i_m}$ is UNSAT.
In other terms, a set of n ≤ N blocks is an n-block nonsolution if it cannot be extended to any solution of cardinality N. Theorem 3 proves that sets of blocks dominated by nonsolutions are also nonsolutions.
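Theorem 3: Given an erroneous design $C$, a set of blocks $B$, a counter-example of length $k$ along with the corresponding expected outputs, and an error cardinality $N$, if $\{b_{j_1}, \ldots, b_{j_n}\}$ with $n \le N$ is a nonsolution of Debug and $\bigwedge_{m=1}^{n} (b_{j_m} D \, b_{i_m})$, then $\{b_{i_1}, \ldots, b_{i_n}\}$ is also a nonsolution of Debug.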
Using Theorem 2, we have
Intuitively, Theorem 3 shows that if we are able to identify a nonsolution, i.e., a set of blocks that we cannot modify in any way to fix the given counter-example, we can immediately imply that all blocks dominated by this nonsolution are also nonsolutions. This information can be passed to the SAT solver by adding blocking clauses that divert it from considering these dominated blocks. This prunes its search space and therefore shortens the completion time of the SAT call.
Example 4: Consider the sequential circuit in Fig. 1(a) and the corresponding design debugging formulation illustrated in Fig. 2, using N = 1. We know that block b 4 dominates b 1 and b 5 . If b 4 is known to be a nonsolution, using Theorem 3, we can imply that b 1 and b 5 are each separate nonsolutions. We can therefore automatically add the clauses (ē 1 ) and (ē 5 ) to prune the search space of Debug.
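The reverse use of the dominator relation can be illustrated with a short Python sketch. It is only an illustration under assumptions: the function names dominated_by and implied_nonsolutions are hypothetical, D[b] is assumed to hold the dominators of b including b itself, and the dominator sets below contain only what Example 4 states (b 4 dominates b 1 and b 5 ). Each returned set {b i , . . .} stands for the clause made of the negated error-select variables of its blocks. Combinations in which two replaced blocks coincide are dropped, since Theorem 3 is stated for sets of n blocks.

```python
from itertools import product


def dominated_by(D):
    """Invert D: map every block to the set of blocks it dominates."""
    inverse = {}
    for block, dominators in D.items():
        for d in dominators:
            inverse.setdefault(d, set()).add(block)
    return inverse


def implied_nonsolutions(nonsolution, D):
    """Replace each block of a nonsolution by any block it dominates."""
    inverse = dominated_by(D)
    choices = [sorted(inverse.get(b, {b})) for b in sorted(nonsolution)]
    implied = set()
    for combo in product(*choices):
        if len(set(combo)) == len(nonsolution):  # keep sets of n distinct blocks
            implied.add(frozenset(combo))
    return implied


# Data assumed from Example 4 (N = 1).
D = {"b1": {"b1", "b4"}, "b4": {"b4"}, "b5": {"b5", "b4"}}
clauses = implied_nonsolutions({"b4"}, D)
```

For the nonsolution {b 4 }, the sketch returns {b 1 }, {b 4 }, and {b 5 }, i.e., the original nonsolution together with the two implied ones of Example 4.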
Whereas Theorem 2 can be used to imply solutions of cardinality N given each returned solution by the SAT solver, in order to be able to use Theorem 3 one must first be able to detect nonsolutions. This can only be made possible by modifying the SAT solver.
One way to identify nonsolution blocks is by monitoring learned clauses of the form (ē 1 ∨ · · · ∨ē n ). However, the SAT solver rarely learns such clauses. Instead, learned clauses are more complex and usually involve many other variables along with the error-select variables. Another way to detect blocks that are single-block nonsolutions is to examine the forced assignments of Boolean constraint propagation (BCP) after each solver restart (when the decision stack is empty). If some e i = 0 by unit propagation given an empty decision stack, then the solver has learned that b i is a nonsolution block. However, from our experiments, virtually all such nonsolution blocks are learned during the last solver restart in the last call to the all-solution SAT procedure (after all solutions have been found), leaving little room for improvements using nonsolution implications.
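The second detection method above, reading forced assignments off BCP with an empty decision stack, can be sketched as follows. This is a simplified illustration, not MiniSat code: clauses are lists of DIMACS-style integer literals, the function names are hypothetical, and the three-clause formula is a made-up toy in which variables 1 to 3 play the role of error-select variables.

```python
def unit_propagate(clauses):
    """Top-level BCP: return the forced assignments, or None on a conflict."""
    forced = {}
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                var = abs(lit)
                if var in forced:
                    if forced[var] == (lit > 0):
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false: the formula is UNSAT
            if len(unassigned) == 1:  # unit clause: its last literal is forced
                lit = unassigned[0]
                forced[abs(lit)] = lit > 0
                changed = True
    return forced


def top_level_nonsolutions(clauses, error_selects):
    """Error-select variables forced to 0 with no decisions made."""
    forced = unit_propagate(clauses)
    if forced is None:
        return []
    return [e for e in error_selects if forced.get(e) is False]


# Toy formula (hypothetical): learned unit clause (not e1), then (e1 or not e2), (e2 or e3).
toy = [[-1], [1, -2], [2, 3]]
detected = top_level_nonsolutions(toy, [1, 2, 3])
```

Here e 1 and e 2 are forced to 0 by unit propagation alone, so blocks b 1 and b 2 would be reported as single-block nonsolutions.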
VI. Tailored SAT Solver
In this section, we describe a new SAT branching scheme for design debugging, where error-select variables are decided upon first. This allows the early learning (and simple detection) of nonsolutions, making nonsolution implications useful.
A. Our SAT Branching Scheme
We force the SAT solver to first decide on all error-select variables (e). The rest of this subsection provides several motivations for this choice. Furthermore, we force the solver to always assign error-select variables that are decided (i.e., not forced due to BCP) to 1 before trying to set them to 0. The reason for doing this is to learn and detect nonsolutions, and is explained in detail in Section VI-B. Once all the error-select variables are assigned, the solver uses the standard decision heuristics (e.g., VSIDS [33] ) for the remaining variables.
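The decision rule just described can be summarized by the following Python sketch. It is a hedged illustration only: pick_decision is a hypothetical name, the assignment is modeled as a dictionary, and the fallback that picks the first free variable with value 0 is a placeholder for the solver's standard heuristic (e.g., VSIDS), not a model of it.

```python
def pick_decision(assignment, error_selects, other_vars):
    """Return the next (variable, value) decision, or None if all are assigned."""
    for e in error_selects:  # error-select variables are always decided first
        if e not in assignment:
            return e, True  # a decided error-select is tried with 1 before 0
    for v in other_vars:  # placeholder for the standard decision heuristic
        if v not in assignment:
            return v, False
    return None


first = pick_decision({}, ["e1", "e2"], ["x"])
later = pick_decision({"e1": False, "e2": True}, ["e1", "e2"], ["x"])
```

Variables that BCP has already forced appear in the assignment and are therefore skipped, so only genuinely decided error-select variables receive the value 1 first.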
Motivation: A SAT solver can assign variables in any order. The first motivation for assigning the error-select variables early in the decision tree relates to their importance and their impact on other variable decisions in the SAT solving process. For example, when e i = 1, the internal nodes of block b i become dangling, and therefore they are don't-cares. As such, assigning the nodes in b i , as well as their fanouts, is useless if e i is later assigned to 1.
What follows formalizes this motivation by proving that the error-select variables are part of the careset [24] variables of a design debugging problem. According to [24] , a complete assignment on careset variables is a gateway to a satisfying assignment, and branching on these variables first can speed up SAT solvers by an order of magnitude.
In [24], a constrained circuit is one where certain variables are constrained to Boolean values. For instance, the constrained circuit corresponding to Debug for the circuit in Fig. 1(a) is shown in Fig. 2. Given a constrained circuit and its corresponding CNF formula Φ, a partial assignment π is said to be a minimally satisfying assignment (MSA) if and only if: 1) BCP cannot be applied on Φ| π further; 2) Φ| π can be shown to be satisfiable by assigning all remaining unassigned inputs in its constrained circuit to arbitrary values and using BCP; 3) unassigning at least one variable in π would violate 1) or 2). Here, Φ| π denotes Φ where the variables in π have been assigned to their values in π.
Definition 5: A nonempty set S of variables is a careset of Φ if every variable in S is assigned in every MSA of Φ.
Theorem 4: The error-select variables e belong to the careset of Debug.
Proof: Assume toward a contradiction that a certain error-select variable e i does not belong to the careset of Debug. Then, there must exist an MSA π of Debug where e i is unassigned. By 2), Debug| π must be satisfiable under any combination of assignments to the error-select variables not in π, including e i = 0 and e i = 1. This cannot be true because assigning e i to 0 or 1 in a certain combination of error-select assignments will change the number of error-select variables set to 1, thus violating the error cardinality constraint N in at least one of the cases.
The second, and more important, reason for assigning the error-select variables early is that it allows the solver to learn nonsolution blocks much faster. This, in turn, enables nonsolution implications to prune the SAT search space earlier and therefore more effectively. Section VI-B discusses how to detect learned nonsolutions using our branching scheme.
B. Detecting Nonsolutions
To simplify the presentation of this subsection, let us assume without loss of generality that the error-select variables, which are branched upon first in our SAT solver, are decided in the order of e 1 , . . . , e |B| . According to our branching scheme explained in Section VI-A, the SAT solver first assigns e 1 = 1. If the solver later switches to e 1 = 0 without finding a satisfying assignment under e 1 = 1, this means that e 1 = 1 cannot be extended to a satisfying assignment. Hence, e 1 = 0 is true for all satisfying assignments (if any exist). That is, (ē 1 ) has been learned and {b 1 } is a single-block nonsolution.
Lemma 5 shows that, using our decision scheme, assigning any e i to 0 on the rightmost path in the SAT decision tree indicates that {b i } is a single-block nonsolution.
Lemma 5: For any i, if every error-select variable in {e 1 , . . . , e i } is set to 0 by the SAT solver, then {b i } is a single-block nonsolution.
Proof: Recall that our decision scheme forces the solver to first set each error-select variable to 1 before trying to set it to 0. Thus, by construction, e i = 1 has already been tried under every assignment combination to e 1 , . . . , e i−1 , and cannot be extended to a satisfying assignment. This means that Debug∧e i is UNSAT, which implies that {b i } is a single-block nonsolution.
Example 5: Consider the partial SAT decision tree in Fig. 6(a), where error-select lines do not correspond to the blocks in Fig. 1(a). Each set of blocks shown under a dashed line indicates a nonsolution detected by the solver using Lemma 5. For instance, as soon as e 2 is assigned to 0, since its only ancestor e 1 is also assigned to 0, {b 2 } is detected as a 1-block nonsolution. The solver implies that every block that b 2 dominates is also a nonsolution, by Theorem 3 with n = 1. As such, we can add clauses of the form (ē i ) for every block b i that has b 2 in its dominators D(b i ).
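The detection rule of Lemma 5 can be tried out on a toy search procedure. The sketch below is a plain chronological backtracking search, not a CDCL solver and not the tailored MiniSat of this paper; it only mimics the decision order (error-select variables first, value 1 before 0). The names falsified and search and the toy CNF are assumptions made for illustration: variables 1 to 3 act as error-select variables with N = 1, variable 4 is an internal signal, and the extra clauses make e 1 = 1 and e 2 = 1 infeasible.

```python
def falsified(clauses, assign):
    """True if some clause has all of its literals assigned and false."""
    return any(
        all(abs(l) in assign and assign[abs(l)] != (l > 0) for l in clause)
        for clause in clauses
    )


def search(clauses, num_vars, num_e, assign, nonsolutions):
    """Decide variables 1..num_vars in index order, trying 1 before 0."""
    if falsified(clauses, assign):
        return None
    if len(assign) == num_vars:
        return dict(assign)
    v = len(assign) + 1
    for value in (True, False):
        assign[v] = value
        model = search(clauses, num_vars, num_e, assign, nonsolutions)
        if model is not None:
            return model
        del assign[v]
        # Lemma 5: e_v = 1 failed while every earlier error-select is 0.
        if value and v <= num_e and not any(assign.values()):
            nonsolutions.append(v)
    return None


toy = [
    [1, 2, 3], [-1, -2], [-1, -3], [-2, -3],  # exactly one error-select is 1
    [-1, 4], [-1, -4],                          # e1 = 1 leads to a conflict
    [-2, 4], [-2, -4],                          # e2 = 1 leads to a conflict
]
detected = []
model = search(toy, num_vars=4, num_e=3, assign={}, nonsolutions=detected)
```

On this toy instance the search reports b 1 and b 2 as single-block nonsolutions before returning a model with e 3 = 1, which is the point at which the clauses for the blocks they dominate could be added.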
1) General Case: Detecting n-Block Nonsolutions
Theorem 5: For any n ≤ N and for any i ≥ n, if e i is set to 0, and every error-select variable in {e 1 , . . . , e i−1 } is set to 0 by the SAT solver except n−1 of them, say {e j 1 , . . . , e j n−1 }, which are set to 1, then {b j 1 , . . . , b j n−1 , b i } is an n-block nonsolution.
Proof: Once again, our decision scheme forces the solver to first set each error-select variable to 1 before trying to set it to 0. This means that every time the solver sets an error-select variable e j to 0, it does so because it has exhausted the e j = 1 branch and hence e j = 0 is implied under the partial assignment to e 1 , . . . , e j−1 above e j in the decision tree.
As such, starting from the first error-select variable that is set to 0, which is implied by any previous error-select variable assignments to 1 (if any), it is easy to show by induction that every error-select variable assignment to 0 can be implied from any previous error-select variable assignments to 1. In other terms, e i = 0 is forced due to e j 1 = 1, . . . , e j n−1 = 1. Hence, Debug ∧ e j 1 ∧ · · · ∧ e j n−1 ∧ e i is UNSAT, and by Definition 4, {b j 1 , . . . , b j n−1 , b i } is an n-block nonsolution.
Example 6: The partial SAT decision tree in Fig. 6(b) shows nonsolutions of one and two blocks, detected using Theorem 5. For instance, as soon as e 3 is assigned to 0, since its only ancestor assigned to 1 is e 2 , we know that {b 2 , b 3 } is a 2-block nonsolution. Each of the blocks in this nonsolution can be replaced by a block it dominates to get another nonsolution, by Theorem 3 with n = 2. As such, we can add clauses of the form (ē i ∨ē j ) for every b i and b j dominated by b 2 and b 3 , respectively.
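Reading an n-block nonsolution off the decision trail, as Theorem 5 permits, amounts to collecting the ancestors that are set to 1. The following Python sketch shows this bookkeeping only; the function name, the trail representation as (index, value) pairs, and the particular trail (e 1 = 0, e 2 = 1 above e 3 ) are assumptions for illustration, since the exact tree of Fig. 6(b) is not reproduced here.

```python
def nonsolution_from_trail(ancestors, flipped, N):
    """Return the nonsolution detected when error-select `flipped` is set to 0.

    `ancestors` lists the error-select assignments above `flipped` in the
    decision tree as (index, value) pairs. Returns None if more than N - 1
    ancestors are set to 1, because the resulting set would exceed N blocks.
    """
    ones = [index for index, value in ancestors if value]
    candidate = ones + [flipped]
    if len(candidate) > N:
        return None
    return frozenset(candidate)


two_block = nonsolution_from_trail([(1, False), (2, True)], flipped=3, N=2)
one_block = nonsolution_from_trail([(1, False)], flipped=2, N=1)
```

With no ancestor set to 1 the function returns a single-block nonsolution, so the case of Lemma 5 is covered as n = 1.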
C. Restart Heuristic
Modern SAT solvers have periodic restarts, usually occurring after a certain number of conflicts. A restart clears assignments of all variables (including all decisions) while keeping the learned clauses. We modify the existing restart mechanism to enhance nonsolution detection as follows. If no nonsolutions have been learned during a solver restart, we generate a random number r (1 ≤ r ≤ |B|), and swap the order of the error-select variables below and above level r in the decision tree. The reasoning behind this is to avoid spending too much time in parts of the search-space where it is hard to detect and therefore imply nonsolutions.
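One possible reading of this reordering step is sketched below in Python. The text does not spell out the exact permutation, so the sketch assumes that the variables decided below level r are moved in front of those at levels 1 to r; the names swap_at and reorder_on_restart are hypothetical, and the random source is injectable only to make the behavior reproducible.

```python
import random


def swap_at(order, r):
    """Move the variables below level r in front of those at levels 1..r."""
    return list(order[r:]) + list(order[:r])


def reorder_on_restart(order, learned_nonsolution, rng=random):
    """Keep the order if a nonsolution was learned, otherwise swap at a random level."""
    if learned_nonsolution:
        return list(order)
    r = rng.randint(1, len(order))  # 1 <= r <= |B|, both bounds inclusive
    return swap_at(order, r)


order = ["e1", "e2", "e3", "e4", "e5"]
swapped = swap_at(order, 2)
```

Under this reading, r = |B| leaves the order unchanged, and any other r changes which error-select variables sit at the top of the decision tree after the restart.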
D. Debugging Flow Using Solution and Nonsolution Implications
The flowchart in Fig. 7 illustrates the overall design debugging flow using both solution and nonsolution implications. This is an extension to the flow in Fig. 5 . Our modified SAT solver is able to detect n-block nonsolutions, where 1 ≤ n ≤ N, of the form {b i 1 , . . . , b i n }. Each such nonsolution is immediately passed to a reverse implication engine, which uses the precomputed block dominator relation D to generate all implied nonsolutions. These are passed back to the SAT solver, which adds the corresponding clauses to prune the search space of the problem.
Example 7: Consider the sequential circuit in Fig. 1(a) and the corresponding design debugging formulation illustrated in Fig. 2. Assume that N = 1, and that the modified SAT solver detects the nonsolution {b 4 }. Since b 4 ∈ D(b 1 ) and b 4 ∈ D(b 5 ), the nonsolutions {b 1 } and {b 5 } can be implied by the reverse implication engine. As such, the clauses (ē 1 ) and (ē 5 ) are added to Debug, immediately pruning the solution search space.
VII. Experimental Results
This section presents the experimental results for solution and nonsolution implication-based design debugging in an industrial setting. All experiments are run using a single core of an i5-2400 3.1 GHz workstation with 8 GB of RAM and a timeout of 3600 s. The proposed debugging framework is implemented on top of a state-of-the-art SAT-based debugger based on [11], [12], and [15], with a Verilog front-end to allow for RTL diagnosis. For nonsolution implications, we tailor the debugger's back-end SAT solver, Minisat-v2.2.0 [25].
Eight industrial Verilog designs from OpenCores [34] and two commercial designs provided by our industrial partners are used in our experiments. For each design, several debugging instances are generated by inserting different errors into the design. The RTL errors that are injected are based on the experience of our industrial partners. These are common designer mistakes such as wrong state transitions, incorrect operators, or incorrect module instantiations [15]. Such errors typically translate into multiple gate-level changes. The erroneous design is then run through an industrial simulator with the accompanying testbench, where a failure is detected and a counter-example is recorded. Each block b i ∈ B consists of the synthesized gates corresponding to a (set of) line(s) in the RTL implementing an assignment, an if statement, a module definition, an instantiation, etc.
Our RTL debugging tool uses bounded model debugging (BMD) [15], which is an orthogonal debugging optimization that helps deal with long counter-examples. BMD first examines the last d time-frames (d = 40 in this paper) of the counter-example, and iteratively analyzes earlier time-frames until all solutions are found. The reasoning behind it is that the bug is most often (although not always) excited within a few time-frames of it being observed.
In our experiments, trad refers to the traditional debugging flow (without solution or nonsolution implications), +impl includes only solution implications as illustrated in Fig. 5, −impl includes only nonsolution implications, and finally +impl−impl includes both as shown in Fig. 7.
Table I presents the characteristics of each design debugging instance, as well as the number of solutions and the run-times of trad with N = 1. The first column gives the instance name, which consists of the design name and an appended number indicating a different inserted error.
The following three columns, respectively, show the number of circuit nodes |n| in the cone-of-influence of the counter-example, the number of blocks |B|, and the number of clock cycles k in the counter-example. Columns # sols and overh, respectively, refer to the total number of returned solutions and the run-time overhead in seconds for setting up the problem (i.e., generating the CNF of Debug). This overhead includes graph optimizations such as dangling logic removal. Note that the # sols and overh numbers are common to all debugging flows. Finally, column trad SAT gives the total SAT solver run-time for returning all solutions with the traditional debugging flow. This is the sum of all SAT calls, including the last call that yields UNSAT.
Table II presents the experimental results for +impl, −impl, and +impl−impl, for the same debugging instances as the ones shown in Table I. Column Alg. 1 gives the run-time in seconds of Algorithm 1 for computing the block dominator relation D, which is run as a preprocessing step in each of these three flows. Next, for each of +impl, −impl, and +impl−impl, column SAT shows the SAT solver run-time in seconds, whereas columns impr and impr-o show the speedup achieved by that debugging method over trad, where impr excludes and impr-o includes the common overhead shown under column overh in Table I. The run-time of Algorithm 1 is included in speedup calculations. Finally, under +impl−impl, columns sol impl and non-sol impl give the percentage of implied solutions to all solutions and implied nonsolutions to all nonsolutions, respectively. Since N = 1, all nonsolutions are single-block here.
Fig. 8 shows the percentages of implied solutions to all solutions and implied nonsolutions to all nonsolutions, sorted in increasing order. On average, 68% of all solutions are implied, resulting in a three-fold reduction in the number of SAT solver runs.
This percentage of implied solutions goes up to 91% for vga-5, indicating that more than nine out of ten solutions are found without a SAT call. Fig. 8 also shows that, on average, 25% of all nonsolutions are implied by our tailored SAT solver.
Fig. 9 plots the number of found solutions versus run-time for trad, +impl, and +impl−impl for design1-2. It can be seen that while trad returns solutions at roughly equal time intervals, +impl and +impl−impl discover solutions at a much faster rate due to solution implications. The rate of solution discovery decreases for both with time, mainly because implied solutions in later SAT calls might have already been found (or implied) in previous calls. Returning most solutions early is beneficial because the engineer can start examining returned solutions earlier, while the debugger continues to run. Moreover, as expected, Fig. 9 demonstrates that nonsolution implications speed up SAT calls in +impl−impl compared to +impl.
Fig. 10 plots the SAT run-times of each of +impl, −impl, and +impl−impl versus those of trad on a logarithmic scale, along with the 1x, 2x, 3x, and 10x lines, clearly showing the superiority of our approaches. The geometric means of the speedups from trad to each of +impl, −impl, and +impl−impl are shown in the last line of Table II to be 2.07x, 1.45x, and 2.63x excluding common overhead, and 1.75x, 1.33x, and 2.02x including overhead. In most cases, higher percentages of implied solutions and nonsolutions mean fewer and faster SAT calls, which result in less total SAT solving time. For instance, in design1-2, 80% of solutions and 58% of nonsolutions are implied, yielding a 4.1x speedup in total SAT run-time, compared to the average 2.63x speedup. However, this is not always true because of the unpredictable behavior of SAT solvers. Furthermore, we have not found any clear relationships between design parameters and improvements due to solution and nonsolution implications.
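Two of the summary figures above follow from simple arithmetic, which the Python sketch below reproduces. It is a back-of-the-envelope aid rather than the evaluation script: the function names are illustrative, the reduction estimate ignores the final UNSAT call and treats the 68% average as if it applied to a single instance, and the list passed to geometric_mean is a made-up example, not data from Table II.

```python
import math


def sat_call_reduction(implied_fraction):
    """Estimated reduction in SAT calls when a fraction of solutions is implied."""
    # Only the non-implied solutions still cost one SAT call each.
    return 1.0 / (1.0 - implied_fraction)


def geometric_mean(values):
    """Geometric mean, the aggregate used for the per-instance speedups."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


average_case = sat_call_reduction(0.68)      # 68% of solutions implied
example_mean = geometric_mean([2.0, 8.0])    # hypothetical speedups
```

With 68% of solutions implied, roughly one solution in three still requires a SAT call, which matches the reported three-fold reduction.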
Finally, Table III shows experimental results for 11 debugging instances with N = 3. Here, trad is compared to two versions of +impl−impl, first only allowing nonsolutions of n = 1 blocks, then nonsolutions of up to n = 3 blocks. The columns of Table III are structured similarly to those of Tables I and II. We can see that detecting and implying nonsolutions of up to three blocks increases the geometric mean of the speedup of +impl−impl relative to trad from 2.21x to 2.35x excluding common overhead, and from 1.55x to 1.59x including overhead.
VIII. Conclusion
We first presented an iterative algorithm for computing dominance relationships between the blocks of an RTL design. We showed how to leverage these dominance relationships in an automated RTL debugger to expedite the discovery of potentially buggy RTL blocks, as well as the clearance of blocks guaranteed to be correct. This was done by using solution and nonsolution implications. Our methods reduced the number of SAT calls three-fold and sped up each call, resulting in a 2.63x overall speedup in total SAT solving time, as demonstrated on an extensive set of experiments on industrial designs.
In the future, we plan to use dominance relationships between RTL blocks to group and rank all potential bugs.
Bao Le (S'13) received the B.A.Sc. and M.A.Sc.
