Abstract-In this paper, we present CORT, a factored concolic execution based methodology for high-level functional test generation. Our test generation effort is visualized as the systematic unraveling of the control-flow response of the design over multiple explorations. We begin by transforming the Register Transfer Level (RTL) source for the design into a high-performance C++ compiled functional simulator which is instrumented for branch coverage. An exploration begins by simulating the design with concrete stimuli. Then, we perform an interleaved cycle-by-cycle symbolic evaluation over the concrete execution trace extracted from the Control Flow Graph (CFG) of the design. The purpose of this task is to dynamically discover means to divert the control flow of the system, by mutating primary-input stimulated control statements in this trace. We record the control-flow response as a Test Decision Tree (TDT), a novel representation for the test generation effort. Successive explorations begin at system states heuristically selected from a global TDT, onto which each new decision tree resultant from an exploration is stitched. CORT succeeds at constructing functional tests for ITC99 and IWLS-2005 benchmarks that achieve high branch coverage using the fewest number of input vectors, faster than existing methods. Furthermore, we achieve orders of magnitude speedup compared to previous hybrid concrete and symbolic simulation based techniques.
I. INTRODUCTION
Hardware design validation consumes as many resources as design development, or more. Rapid test generation for functional metrics at a high level of design abstraction, such as the Register Transfer Level (RTL), can not only aid the validation effort but also catch design flaws earlier. Commonly used functional metrics are akin to their software testing counterparts, i.e., statement coverage, branch coverage, path coverage and assertion coverage [2]. Considering the complexity of practical hardware design, the standard approach of constrained random test generation is proving unsuitable for achieving high functional coverage. This is because such random testing does not utilize the high-level functional design information available at this abstraction level. To bridge the gap, designers manually guide the directed test generation effort. It is crucial that this task is automated to decrease the overall human effort and redirect it to more suitable goals. Directed test generation by evaluating the design under formal methods can theoretically reach all system states in the design. However, in reality, these techniques do not scale well due to the size and complexity of modern designs. Traditionally, symbolic evaluation of RTL [1] is statically performed by unrolling the design for several cycles. Considering that the entire hardware RTL needs to be evaluated per cycle of unrolling, the number of feasible paths that need to be considered grows exponentially (the path explosion ceiling).
In Software Testing, Godefroid et al. [10] introduced the concept of high performance directed automated random testing (DART) by engaging classical symbolic execution over the concrete execution path derived by testing the software with real (concrete) inputs. Based on this concept, Sen et al. [7] built an efficient C-language unit-test generator, CUTE, and coined the term, Concolic Execution (portmanteau of concrete and symbolic). Inspired by the success of concolic execution, semi-formal directed RTL test generators [8, 9, 13, 16] tackle this problem by restricting the symbolic evaluation to the analysis of the real executed path extracted during dynamic simulation using random vectors. These hybrid techniques score well on their intended functional metric with a minimal number of test vectors. On the other hand, stochastic or heuristically guided search-based techniques [11, 12] are generally faster than semi-formal techniques, but usually, end up producing relatively larger tests to meet similar goals.
Paper 11.2 978-1-5386-3413-4/17/$31.00 © 2017 IEEE INTERNATIONAL TEST CONFERENCE
In this work, we present CORT (Concolic RTL Test Generator), a methodology for rapid RTL directed test generation that aims to maximize branch coverage with a minimal number of test vectors. Tests with high branch coverage can be used towards the task of state justification, a primary task in gate-level ATPG (Automated Test Pattern Generation) [14]. Moreover, being able to reach every branch would also imply executing every user-defined assertion in the design, thereby providing high assertion coverage and confidence in the design-under-test. The following is a short overview of our test generation methodology. We iteratively build upon several relatively smaller explorations to generate the overall test. For each exploration, we perform cycle-by-cycle concolic simulation where symbolic expressions are only constructed for primary input stimulated (activated) statements in the concrete execution trace. Not all statements or variables are data-dependent on primary input. Such statements can be abstracted away, and the unstimulated variables defined by them can be substituted with their concrete values (read from the simulator) if they are eventually used in other activated statements. As a result, the overall symbolic evaluation effort is significantly reduced. The goal of concolic simulation is to dynamically reveal activated control points (guard statements) in the trace. The symbolic expressions for activated guards are mutated (negated) using a formal Satisfiability Modulo Theory (SMT) solver to generate mutation stimuli. The concrete and mutation stimuli are used to build a decision tree which represents the control-flow response of the design in the exploration. The decision trees from individual explorations incrementally uncover the overall control-flow response of the design to various stimuli over several cycles, i.e., a global Test Decision Tree. This is subsequently used to guide new explorations along heuristically determined starting system states. Once all explorations have concluded, the optimal test can be derived directly from the global Test Decision Tree. Our contributions are summarized below:
• We present an efficient factored concolic-execution based automated test generator for RTL.
• We introduce a cycle-by-cycle concolic execution, where the symbolic evaluation effort is strictly limited to primary input stimulated statements, and unstimulated values are substituted with concrete values from the trace.
• A novel test representation, the Test Decision Tree is proposed, which specifies the multi-cycle control-flow response of the design under various stimuli. Each of our concolic execution based explorations results in a decision tree.
• Path explosion and limitation of unrolling are overcome by a systematic stitching of the decision trees of many individual explorations to form a global Test Decision Tree. Three selection methods are described which heuristically select the next starting state for each factored exploration.
• Our approach generates the optimal test that reaches all covered branches with the least number of test vectors using the Test Decision Tree.
• Despite being developed in Python, our method still achieves significant speedup over previous hybrid symbolic and concrete methods, while retaining their advantage of being able to generate minimal tests.
The rest of the paper is organized as follows. Section II discusses relevant previous work followed by fundamental concepts used in our work. Section III covers the framework of CORT and discusses underlying operations and methodology alongside an illustrative example. In Section IV, we detail the performance of CORT, followed by limitations in Section V. Finally, a concluding summary is presented in Section VI.
II. BACKGROUND AND PRELIMINARIES
In this section, we present related work. Then, we describe the fundamental concepts used in our work along with key definitions.
A. Related Work
HYBRO [9] demonstrates viable RTL directed test generation by symbolically executing the concrete trace extracted from the design Control Flow Graph (CFG) during simulation. The RTL is instrumented with branch coverage counters, and the execution path in the CFG is deduced by observing the change in branch counters per simulation cycle. HYBRO symbolically evaluates the entire concrete trace to reveal activated (stimulated by input) guards along that trace. New regions of execution are explored by mutating these activated guards. Xiaoke and Mishra [13] also interleaved concrete simulation and symbolic execution, by instrumenting the RTL with statements that printed the concrete execution in terms of syntactic elements. The resultant trace file output during concrete simulation is symbolically analyzed to reveal activated statements. In contrast to HYBRO, which symbolically evaluates unstimulated variables used in activated statements, their work could simply substitute them with their concrete equivalent. Both [9] and [13] generate tests by unrolling the design for a fixed number of cycles from a specific starting state (say, system reset), and are limited to the exploration of system states reachable within the unrolled cycles.
PACOST [16], a target-oriented RTL test generator, has shown success in reaching hard-to-reach states by using factored explorations in addition to interleaved concrete and symbolic simulation. The abstraction-guided exploration begins at a successor state selected from previously discovered states. However, the construction of path constraints in each exploration is limited to variables that can trace their use-definition chain to an input variable within that exploration. In contrast to [16], our factored explorations are guided by heuristic metrics derived from the overall coverage status of the test, and our analysis is not restricted to use-definition chains wholly contained within the exploration.
B. Concolic Execution of RTL
A concrete path or trace is defined as a sequence of statements (assignments and guards) executed upon simulating the RTL code (for one or more cycles) with concrete input stimuli. However, unlike previous work, we do not construct a path constraint which represents the concrete trace as a stack of symbolic assignments and guards. Rather, we perform cycle-by-cycle concolic evaluation and maintain an Activation Table which stores the symbolic definition of activated variables defined in assignment statements. In our work, concolic execution requires symbolically executing the concrete path by only using symbols for inputs. In each cycle, a symbolic expression is constructed for a statement if it is found to be activated, with appropriate substitutions for each variable used. For primary inputs, the cycle-annotated input symbols representing them are used. An activated variable is substituted with its defining symbolic expression fetched from the Activation Table. Values of unstimulated variables are said to be concrete, and are obtained (read) from the simulator during simulation (between cycles).
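The bookkeeping described above can be illustrated with a minimal Python sketch. It is not the CORT implementation: the statement form `(defined_var, operator, used_vars)`, the function name `concolic_cycle`, and the use of plain strings as symbolic expressions are all assumptions made for illustration, and the ordering of reads within a cycle is ignored.

```python
# Hypothetical statement form: (defined_var, operator, [used_vars]).
# Symbolic expressions are plain strings here; a real engine would build solver terms.

def concolic_cycle(cycle, trace, inputs, concrete, activation):
    """Update the Activation Table for one cycle of an executed trace."""
    for target, op, used in trace:
        stimulated = [v for v in used if v in inputs or v in activation]
        if not stimulated:
            # Defined wholly by unstimulated values: drop any stale entry.
            activation.pop(target, None)
            continue
        terms = []
        for v in used:
            if v in inputs:
                terms.append(f"{v}_{cycle}")        # cycle-annotated input symbol
            elif v in activation:
                terms.append(f"({activation[v]})")  # defining symbolic expression
            else:
                terms.append(str(concrete[v]))      # concrete value read from the simulator
        activation[target] = f" {op} ".join(terms)
    return activation


activation = {}
# Cycle 1: a = din + k, where k is unstimulated and read as the concrete value 3.
concolic_cycle(1, [("a", "+", ["din", "k"])], {"din"}, {"k": 3}, activation)
# Cycle 2: b = a ^ din, where a is now an activated variable.
concolic_cycle(2, [("b", "^", ["a", "din"])], {"din"}, {"k": 3}, activation)
print(activation)
```

Only statements touched by an input or an activated variable receive a symbolic expression; everything else is resolved to a concrete value, which is what keeps the symbolic effort small.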
Along a concrete trace, guards that use activated variables or primary inputs are termed activated guards. For each activated guard, a Boolean symbolic expression representing its predicate and the concrete branch taken is constructed. A guard expression is a function of input stimuli. To divert the control flow, we attempt to mutate the activated guards. An activated guard that can be mutated is called a mutable guard, and the stimuli that satisfy the mutation are referred to as the mutation stimuli. The constraint stack for each activated guard includes its negated expression and the concrete expressions for the intersecting mutable guards found before it in this trace. Two guards intersect if they have cycle-annotated input stimuli in common, and we consider this during the mutation effort to ensure that the mutation stimuli of a later guard do not alter the concrete execution of an earlier one. The constraint stack is solved using an SMT solver. If the constraint stack cannot be solved, then the activated guard is ignored henceforth in the analysis of this trace.
Comparable to [13], our work can build symbolic expressions for statements involving array accesses (e.g., memory, queues, FIFO) where every element in the array is treated as a scalar variable annotated by its index. This means that the array index would be read as concrete. Thus, upon encountering activated variables being used as array access indexes, we deactivate them and consider their input stimuli to be immutable stimuli. No symbols are created for immutable stimuli, and their concrete values are substituted wherever they are used. On a similar note, our work considers clocking and reset inputs as immutable stimuli, and always treats them as concrete values.
C. Test Decision Tree
The test generation effort is represented as a decision tree in our work. Performing concolic execution over a given sequence of input vectors (concrete stimuli), we detect mutable guards and their corresponding mutation stimuli. The mutable guards can be considered as points of divergence in the control-flow, with the divergence being achieved by applying the mutation stimuli instead of concrete stimuli.
Structurally, the decision tree consists of two types of nodes, data nodes and control nodes. A data node comprises zero or more concrete input vectors called default stimuli. Control nodes are non-terminal, and detail a cycle annotated mutable guard, its concrete execution (default execution) and mutation stimuli.
An exploration is the task of performing concolic execution over a given concrete stimuli and returning a decision tree representing its control-flow response. The exercise of building a decision tree begins with an empty data node. The concrete input vectors are enlisted cycle-by-cycle as default stimuli until their execution discovers a mutable guard. Thus, the data node leads to a new control node which will branch into two children, along edges called default and mutate, indicating the concrete and divergent execution of the mutable guard, respectively. The control node holds the mutation stimuli, and its branches represent a choice to either default or mutate (application of mutation stimuli) the control-flow. Subsequent concrete input vectors are enlisted into the default data nodes, branching as necessary until all concrete stimuli are exhausted.
Construction of the test represented by a data node in the decision tree requires a reverse traversal from that node towards the root gathering input vectors along the way. The default stimuli from each traversed data node are first collected in its entirety. Then, the mutation stimuli are merely noted from the control nodes that fell on the mutate branch of their parent during traversal. In situations where some mutation stimulus of a deeper control node coincides (same signal on the same cycle) with the mutation stimulus of a higher control node, the latter is ignored. Lastly, the default stimuli are overwritten by mutation stimuli for the cycles that it denotes. The task of overwriting the default stimuli with the mutation stimuli is indicative of the minimum change in the input vectors necessary to guide the control-flow to the target node. There is a need to apply the mutation stimuli over the complete default stimuli because the mutation of a guard (while honoring the control-flow through the guards above it) might require altering input, many cycles in the past.
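The reverse traversal can be sketched in Python as follows. This is an illustrative reading of the description above, not CORT's code: the `Node` class, its field names, and `construct_test` are hypothetical, and the default `din` values 10, 20, 30 and 50 are made up (173, 222 and 232 are taken from the running example).

```python
class Node:
    """Data node (kind='data') or control node (kind='control') of a decision tree."""
    def __init__(self, kind, parent=None, edge=None, default=None, mutation=None):
        self.kind = kind
        self.parent = parent
        self.edge = edge                # 'default' or 'mutate': edge taken from a control parent
        self.default = default or []    # data node: one input vector (dict) per cycle
        self.mutation = mutation or {}  # control node: {(cycle, signal): value}


def construct_test(node):
    """Walk from `node` to the root and return the input vectors that reach it."""
    chunks, overrides = [], {}
    child_edge = None
    while node is not None:
        if node.kind == "data":
            chunks.append(node.default)
        elif child_edge == "mutate":
            # Deeper control nodes are visited first, so their stimuli win on a clash.
            for key, value in node.mutation.items():
                overrides.setdefault(key, value)
        child_edge = node.edge
        node = node.parent
    # Default stimuli in root-to-node order, then overwritten by the noted mutations.
    vectors = [dict(v) for chunk in reversed(chunks) for v in chunk]
    for (cycle, signal), value in overrides.items():
        vectors[cycle - 1][signal] = value  # cycles are numbered from 1
    return vectors


n0 = Node("data", default=[{"din": 10}, {"din": 20}, {"din": 30}, {"din": 232}])
n1 = Node("control", parent=n0, mutation={(1, "din"): 173, (2, "din"): 222})
n2 = Node("data", parent=n1, edge="default", default=[{"din": 50}])
n3 = Node("data", parent=n1, edge="mutate")

print(construct_test(n3))  # default stimuli with cycles 1 and 2 overwritten
print(construct_test(n2))  # default stimuli only
```

Collecting all default stimuli before applying any override mirrors the point made above: a mutation may need to change an input many cycles before the guard it diverts.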
Construction of a decision tree and test to reach a constituent data node will be shown in the illustrative examples in Sections III-B and III-C.
Before test generation, a preprocessor generates a cycle-accurate functional simulator from the given Verilog alongside extracting various design information. The Test Generator (TG) block engages the Concolic Execution Engine (CEE) for several explorations towards the goal of building a test that achieves maximum branch coverage. The CEE receives input vectors from the TG and performs concolic execution over their concrete simulation. It discovers mutable guards and their mutation stimuli, and returns a decision tree identifying them. The TG stitches these decision trees from individual explorations to form a global Test Decision Tree (TDT). At the start of each exploration, the TG selectively determines terminal nodes in the TDT from which to commit new explorations, and generates appropriate input vectors to reach the selected node followed by random input vectors for the unexplored cycles. Once all explorations have completed, the input vectors that form the optimal test are written out. A detailed description of each block is presented below.
A. Preprocessor
The Preprocessor begins by translating the Verilog source of the design into C++ using Verilator [18]. Given synthesizable Verilog, Verilator outputs a C++ class representing a cycle-accurate functional model of the RTL. All signals (IO and internal) of the top hierarchical component of the design are directly accessible as public data members of this class. The class also provides a public function, eval, which simulates the model's behavioral response based on the current values of its public data members. Furthermore, Verilator automatically instruments the source code for branch coverage measurement.
Counters are placed under decision branches and increment when the control flow executes through them. Simulating the model for a single cycle can be understood as assigning the data input, followed by two calls to eval with complementary values set for the clock input. The coverage status for the simulation is the value in each instrumented branch coverage counter.
The Abstract Syntax Tree (AST) for the model's eval function is parsed from the simulator's source code and is transformed into a Control Flow Graph (CFG) [6] . The AST structure for each statement helps with analysis and construction of symbolic expressions. RTL for FSM-based designs often involves guidepost variables to denote the state of the system. The assignments and guards for such variables are statically analyzed in the AST to construct a Branch Transition Graph (BTG) as described in [15] . A node in the BTG is representative of a state in an FSM graph, and its branch coverage point identifies the guard that acts as its state-variable guidepost. The BTG edges denote the branch that must be covered to transition from the current state to the next. We can associate all branch points with their respective BTG nodes, by following their hierarchy in the CFG. This task mimics the purpose of design abstraction in [16] and aims to identify the underlying design FSM.
Lastly, a simple harness that instantiates the C++ model and provides generic accessor functions to its public members (signal variables, coverage counters, and eval) is generated. The harness is compiled along with the model to obtain the design simulator. The Verilog description for the design used as the motivating example for this section is presented in Fig. 2a. Its cycle-accurate behavior and the CFG extracted from it are displayed in Fig. 2b and Fig. 2c, respectively. The AST forming statement-19 is portrayed in Fig. 2d, highlighting a complex expression that utilizes arrays. The BTG representation of the FSM in the design is shown in Fig. 2e.
B. Concolic Execution Engine (CEE)
The CEE is the heart of the CORT framework and performs the primary task of exploring the design response for a given sequence of input vectors. The following exemplifies an exploration of the design in Fig. 2 and the cycle-by-cycle simulation and analysis is annotated in Table I . Assume that the TG has provided a random concrete input for five cycles (cycles 1-to-5 in column 2).
Before simulating the model with the concrete input, initialization vectors are applied to the model. In this case, that would be a single cycle of reset set to high, during cycle-0, beyond which it is held low. Then, a base-test (provided by the TG) is applied which brings the system to a desired starting state. No analysis is performed for the initialization and base cycles. In this example, we start from the reset state, and thus no base-test is required.
We have access to concrete values of all signals by simply reading them directly from the simulator (instead of parsing simulation print-outputs as in [13] ), but the values can only be read between clock pulses, not during it. Therefore, we require at least two passes through the same concrete stimuli to have valid requisite pre-cycle or post-cycle concrete reads. Each simulation pass begins by applying the initialization and base-test vectors. The Activation Table is only required for the second pass and starts off empty. Firstly, as the functional simulation of the C++ translated RTL is at least five times faster than native Verilog simulation, such an overhead is acceptable. Secondly, being able to read and substitute concrete values for unstimulated variables avoids the redundant effort of symbolically evaluating unstimulated statements, as done in [9] . We take advantage of the fact that the simulator functionally evaluates every statement and variable regardless of stimulation, and that we have easy access to read any variable value during simulation.
Pass-1: Concrete Trace Analysis. The design is simulated over the concrete stimuli. The concrete execution trace (column 3) traversed in the model is inferred from the CFG by observing the change in the instrumented counters before and after the cycle. Say, in cycle-1, by noting that counters 1 and 7 are updated, we can deduce that the execution followed a concrete path of <1F-6-7T-8-9-10-11> where the guards in statements 1 and 7 evaluated to False (else) and True (then) respectively. By analyzing the statements along each per-cycle trace, we mark each used variable as a post-cycle or pre-cycle read for that particular cycle. A used variable is marked as a post-cycle-read if the variable was first defined (Verilog blocking assignment) and then utilized in the same cycle. Otherwise, it is characterized as a pre-cycle-read. Symbolic analysis of the concrete trace is performed in the second pass.
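As a rough sketch of how a per-cycle trace can be inferred from counter changes, the following Python fragment diffs two coverage snapshots and matches the result against candidate CFG paths. The counter names, the helper names, and every path label other than <1F-6-7T-8-9-10-11> are hypothetical; how CORT actually maps counters onto the CFG is not detailed here.

```python
def branches_taken(before, after):
    """Return the IDs of coverage counters that incremented during one cycle."""
    return sorted(b for b in after if after[b] > before.get(b, 0))


def trace_for_cycle(cfg_paths, taken):
    """Pick the candidate CFG path whose instrumented branches match the observed set."""
    matches = [path for path, branches in cfg_paths.items()
               if set(branches) == set(taken)]
    return matches[0] if len(matches) == 1 else None


# Hypothetical mapping from CFG paths to the branch counters they pass through.
cfg_paths = {
    "1T-2-3-4-5": ["c1_then"],
    "1F-6-7T-8-9-10-11": ["c1_else", "c7_then"],
    "1F-6-7F-12": ["c1_else", "c7_else"],
}

# Counter snapshots read from the simulator before and after the clock pulse.
before = {"c1_then": 0, "c1_else": 0, "c7_then": 0, "c7_else": 0}
after = {"c1_then": 0, "c1_else": 1, "c7_then": 1, "c7_else": 0}

taken = branches_taken(before, after)
print(taken, trace_for_cycle(cfg_paths, taken))
```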
Pass-2: Concolic Execution. In this pass, before simulating a clock pulse with concrete stimuli, we first analyze the previously extracted concrete path for the upcoming cycle to reveal activated statements (stimulated by primary input or an activated variable from the Activation Table). The used variables in each activated statement which were marked as pre-cycle-read for this cycle are now read from the simulator. Next, the clock pulse is applied. Symbolic expressions (column 5) are constructed for activated statements using their AST representation with appropriate value substitutions for the used variables. Post-cycle concrete reads for unstimulated variables are read from the simulator on demand. The defined activated variable is enlisted in the Activation Table (column 6). The concrete reads in each cycle are shown in column 4.
For this example, din_i denotes the byte input din on cycle i. In cycle-1, the first cycle of analysis, the assignment statement-9 is found to be activated by the input din. It is seen that this ... We require that the array access index variables themselves are concrete. Thus, the concrete value for addr is pre-cycle-read from the simulator and the index is calculated by substituting it in the AST for array index selection (ArraySel AST node).
In the LHS of the activated statement, we use din_1 for the construction of the symbolic expression defining buffer[0]. Finally, we add the index-annotated array variable, buffer_0, and its symbolic expression, din_1, to the Activation Table. Similarly, in cycle-2, statement-14 is found to be stimulated, and the Activation Table is updated with buffer_1 and its expression, din_2.
During cycle-3, statement-19 is found to be activated as it uses activated variables buffer_0 and buffer_1. The symbolic expression for r0 is constructed by substituting values for activated variables from the Activation Table in statement's AST representation. In cycle-4, statement-24 is the first activated guard to be encountered along the overall concrete path. We see that, with the concrete stimuli, the execution takes the else branch, and thus, the appropriate predicate expression is constructed. Statement-9 is re-activated in cycle-5 and updated in the Activation Table but is not used further in this exploration. While not shown in the example, it must be noted that if an entry in the Activation Table is found to be deactivated (defined wholly by unstimulated variables or constants) while analyzing the concrete path, it is simply removed.
Once all the activated guards are discovered, an attempt is made to check if they are mutable. The negated expression for each activated guard is fed to the SMT solver, constrained by expressions for mutable guards with which its stimuli intersect. In the current example, the expression representing activated guard-24 on cycle-4 is negated to obtain (57005 == ((din_2 << 8) | din_1)). The solver returns din_1 = 173 and din_2 = 222 as mutation stimuli, and guard-24 is found to be a mutable guard in cycle-4.
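The mutation of guard-24 can be reproduced with a small Python sketch. CORT issues this query to an SMT solver; the fragment below deliberately swaps in an exhaustive search over the two 8-bit inputs so that it runs without any solver installed. The function names and the concrete stimuli (10 and 20) are hypothetical.

```python
from itertools import product


def guard_24(din_1, din_2):
    """Predicate of guard-24 on cycle-4 in terms of cycle-annotated 8-bit inputs."""
    return 57005 == ((din_2 << 8) | din_1)


def mutate(guard, concrete, width=8):
    """Search for inputs that flip the guard's concrete outcome (stand-in for an SMT query)."""
    want = not guard(**concrete)
    names = sorted(concrete)
    for values in product(range(1 << width), repeat=len(names)):
        candidate = dict(zip(names, values))
        if guard(**candidate) == want:
            return candidate
    return None  # no satisfying stimuli: the guard is not mutable


# Concrete stimuli take the else branch, so the search looks for the then branch.
stimuli = mutate(guard_24, {"din_1": 10, "din_2": 20})
print(stimuli)
```

Since 57005 is 0xDEAD, the only satisfying assignment is din_2 = 0xDE = 222 and din_1 = 0xAD = 173, which agrees with the mutation stimuli reported above.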
Lastly, the decision tree (Fig. 3) is built for the current exploration as per Section II-C. 
C. Test Generator
Concolic execution on its own is hindered by the same limitations of path explosion and computational effort of evaluation over a large number of cycles. The entire CFG of the design is processed every cycle. As demonstrated in [16] , factoring the exploration into a smaller number of cycles and combining the results of each exploration offers a promising avenue to scale. Our work treats the test generation problem as a task of iteratively building the global Test Decision Tree (TDT) for the design over each exploration.
A configurable parameter called exploration-radius (R) determines the length of each exploration. An iteration begins by heuristically selecting a terminal data node in the TDT. Assume that the terminal data node n3 from Fig. 3 is selected as the starting point for a new exploration. Reaching the system state represented by n3 requires that the mutate branch is taken from node n1. The test is constructed as described in Section II-C, where the entire 4-cycle default stimulus is collected first and then overwritten by the specific mutation stimuli on cycles 1 and 2. The execution region under the then branch of guard-24 on cycle-4 is unexplored, which is also implied by an empty n3. Thus, to ensure that the exploration does not miss any mutable guard on cycle-4 along the new execution branch, the last input vector in the 4-cycle test that reaches n3 is set as the first cycle of the concrete stimuli. This represents a 1-cycle overlap of the previous exploration and the new one.
The concept of stitching factored decision trees that guide the overall control flow of the design is afforded by overlapping explorations using a configurable parameter, exploration-overlap (Q). The exploration-overlap can take a minimum value of 1. The overlapping vectors are treated as concrete stimuli, which implies that concolic execution is performed over Q + R vectors. The test that reaches the selected terminal node is reduced by Q cycles to form a base-test, and the remaining vectors are used as overlap (Fig. 4a). The returned decision tree is displayed in Fig. 4b. This time, the previously empty n3 starts off as the head node while building the exploration's decision tree. However, the vectors that were included as overlap are not gathered as default stimuli, as illustrated by the absence of <cycle-4, din=232> in the updated n3. The stitching of the decision tree returned by the new exploration at n3 is shown in Fig. 5a.
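The split of a reaching test into a base-test and overlap vectors can be sketched as below. The function `plan_exploration` and the representation of an input vector as a dict with a single byte-wide `din` field are assumptions for illustration; the value 30 on cycle 3 is made up.

```python
import random


def plan_exploration(reaching_test, Q, R, rng=None):
    """Split a reaching test into (base_test, concrete_stimuli) for a new exploration."""
    if Q < 1:
        raise ValueError("exploration-overlap Q must be at least 1")
    rng = rng or random.Random()
    base_test = reaching_test[:-Q]   # applied without analysis to reach the start state
    overlap = reaching_test[-Q:]     # re-analysed as part of the new exploration
    fresh = [{"din": rng.randrange(256)} for _ in range(R)]  # random byte inputs
    return base_test, overlap + fresh  # concolic execution covers Q + R vectors


# 4-cycle test that reaches n3 in the running example.
test_n3 = [{"din": 173}, {"din": 222}, {"din": 30}, {"din": 232}]
base, stimuli = plan_exploration(test_n3, Q=1, R=4, rng=random.Random(0))
print(len(base), len(stimuli), stimuli[0])
```

With Q = 1 the last vector of the reaching test becomes the first vector of the new concrete stimuli, which is the 1-cycle overlap used in the example.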
Overlapping explorations may lead to re-discovering mutable guards within the region of overlap. The CEE ignores these previously explored cycle-specific mutable guards while building its local decision tree, but includes them for mutation effort (in cases where their stimuli intersect with activated guards down the new concrete path). Increasing Q may increase the discovery of mutable guards in the new exploration, as it deepens the influence of primary input stimuli. A Q = ∞ would rediscover all mutable guards from the root of the TDT but would be computationally intensive. A smaller Q and R would lead to relatively fewer activated statements being processed per exploration.
The test represented by a data node includes the test that reaches it, and its default stimuli (if any). The test representing node n6 is shown in Fig. 5b. At the end of an exploration, the coverage status for each new terminal node is recorded by simulating the design over their tests. If no mutable guards were discovered during an exploration, then the head data node from which the exploration began would continue to be terminal, though with more default stimuli. The coverage status needs to be refreshed for such updated terminal nodes. The TG currently employs three heuristic metrics to select the next terminal node for each exploration.
Random Path Selection (RPS): This selection heuristic was introduced in KLEE [4]. A terminal node is selected by traversing the TDT, starting from the root with random choices made at every branch. This strategy is expected to uncover easy-to-reach system states.
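A minimal sketch of Random Path Selection over a tree is shown below. The dict-based node layout with `name` and `children` keys is a hypothetical stand-in for the TDT, and intermediate control nodes are not distinguished from data nodes.

```python
import random


def random_path_selection(root, rng=None):
    """Descend from the root, picking a random child at each branch, until a terminal node."""
    rng = rng or random.Random()
    node = root
    while node["children"]:
        node = rng.choice(node["children"])
    return node


# Hypothetical TDT shape: n0 branches into n2 (terminal) and n3, which branches again.
n5 = {"name": "n5", "children": []}
n6 = {"name": "n6", "children": []}
n3 = {"name": "n3", "children": [n5, n6]}
n2 = {"name": "n2", "children": []}
n0 = {"name": "n0", "children": [n2, n3]}

picked = random_path_selection(n0, random.Random(1))
print(picked["name"])
```

Because a choice is made per branch rather than uniformly over all terminal nodes, shallow terminals such as n2 are picked more often than deep ones, which is consistent with the expectation that this strategy favors easy-to-reach states.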
Coverage Oriented Selection (COS): This strategy aims to guide the exploration along least covered branches in hopes of uncovering unexplored regions of the design. Say I branches are covered among all J terminal nodes, and c_ij is the coverage count for branch i in terminal node j. Then, as shown below, a score s_j for each node is calculated from a branch heuristic weight w_i. The node with the highest score is selected.
Target Oriented Selection (TOS): Given a target branch, TOS restricts a COS type heuristic selection to terminal nodes that may possibly reach the target. If a terminal node is found to cover a branch that offers no path to the target branch in the design BTG, then it is ignored during selection.
Algorithm 1 Derive Final Test from the TDT
I = set of all unique branches covered in the TDT
M = data nodes that first-discovered some branch ∈ I
N_j = set of branches covered by node j ∈ M, of size n_j
T_j = test represented by j ∈ M, of length t_j
T_init = initialization vectors
The concluding task for the TG is to derive the test from the TDT. Writing out the optimal test can be boiled down to selecting the optimal set of data nodes in the TDT that cover all explored branches with the minimal number of input vectors. We utilize a greedy approach for handling this operation. With a breadth-first traversal, we collect a list of data nodes, M, that were the first at discovering a branch, until M includes nodes that cover all discovered branches, I. The test T_j represented by each node j includes the test that reaches it and its default stimuli. Algorithm 1 shows the greedy accumulation of vectors among M to build the final test, T. Concluding our example, the final test written out will only consist of n3 as it covers all branches (specifically, branch-4 in cycle-4, and branch-5 in cycle-8). A test of length 9 (Fig. 5c) is built by concatenating the initialization vector and the 8-cycle test represented by n3.
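One plausible form of the greedy accumulation is sketched below in Python. The selection criterion (most newly covered branches per input vector) and the choice to prefix every selected test with the initialization vectors are assumptions, not a transcription of Algorithm 1; the branch sets and node contents are hypothetical, chosen only so that the outcome matches the length-9 test of the example.

```python
def derive_final_test(nodes, t_init):
    """Greedy selection over M; nodes maps name -> (branches N_j, test vectors T_j)."""
    uncovered = set().union(*(cov for cov, _ in nodes.values()))
    chosen, final = [], []
    while uncovered:
        def gain(name):
            cov, test = nodes[name]
            # Assumed criterion: newly covered branches per input vector.
            return len(cov & uncovered) / max(len(test), 1)
        best = max(sorted(nodes), key=gain)
        cov, test = nodes[best]
        final += t_init + test   # assumed: each selected test restarts from initialization
        chosen.append(best)
        uncovered -= cov
    return chosen, final


# Hypothetical coverage: n3's 8-cycle test covers everything n2's 5-cycle test does.
nodes = {
    "n2": ({1, 2, 3}, list(range(5))),
    "n3": ({1, 2, 3, 4, 5}, list(range(8))),
}
chosen, final = derive_final_test(nodes, t_init=["reset"])
print(chosen, len(final))
```

Under these assumptions only n3 is selected, and the resulting test has one initialization vector followed by eight input vectors.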
IV. RESULTS
CORT is written in Python (v3.4.3) and uses Z3Py, the Python interface to Z3 (v4.5.1) [17], an SMT solver from Microsoft Research. The design information parsed and generated by the Preprocessor is stored as Python objects (Python pickling) in binary format to be read in further stages. The simulator source code, which includes the Verilator C++ translation of the RTL and its harness generated by the Preprocessor, is compiled as a shared library object that can be imported as a Python module. All experiments are run on a Linux (Lubuntu 15.10) machine with an Intel Core i7-975 (3.33GHz) with 6GB memory. We present CORT's performance in terms of branch coverage, test length and execution time over benchmarks from ITC99 [5] (rows 2-8) and IWLS-2005 [3] (rows 9-14) detailed in Table II.
A. Search Strategy
The benchmark circuits used and the parameters set for each circuit are described in Table II. The number of branches instrumented by Verilator is shown in column 4 (Bran.). The one-time cost of preprocessing, inclusive of the simulator compilation, is shown in column 5 (Pre.). Each individual exploration is bounded by the exploration-radius (R) and exploration-overlap (Q), selected according to design complexity and size. These parameters are determined experimentally, starting from a minimum of Q = 1 and R = 4 and increasing until the test generation trials result in high branch coverage (aim: >95%).
The Search column (column 8) describes the number and type of the iterative selection methods used for the explorations. We begin growing the TDT for each design with repeated rounds of RPS until no new coverage points are discovered for four rounds. The actual number of RPS iterations during test generation is non-deterministic and is thus simply listed as RPS. This is usually followed by multiple explorations using COS, the number of which is selected according to design size. A description of RPS+N×COS implies that N explorations using the COS strategy were pursued after exhausting RPS. Almost all designs are suitably covered by using RPS and COS alone. For b12, a single-player game of guessing a sequence, a TOS for the branches representing the hard-to-reach states c1 and c2 [11, 16] is repeated until the targets are covered. A relatively larger exploration radius is set for b12 because the primary inputs are only used in guards, offering low concolic execution complexity.
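The RPS+N×COS schedule described above can be sketched as a simple driver loop. This is only an illustration: `explore_rps` and `explore_cos` are hypothetical callables standing in for one exploration under each strategy, each assumed to return the set of branches it covered, and "no new coverage points for four rounds" is interpreted here as four consecutive RPS rounds without a new branch.

```python
def run_schedule(explore_rps, explore_cos, n_cos, patience=4):
    """Run RPS until `patience` consecutive rounds add nothing, then N x COS.

    explore_rps, explore_cos: hypothetical callables taking the set of
    branches covered so far and returning the branches covered by one
    exploration.
    Returns (set of covered branches, number of RPS rounds executed).
    """
    covered = set()
    stale_rounds = 0
    rps_rounds = 0
    while stale_rounds < patience:
        new = explore_rps(covered) - covered
        rps_rounds += 1
        # Reset the counter whenever an RPS round discovers a new branch
        stale_rounds = 0 if new else stale_rounds + 1
        covered |= new
    for _ in range(n_cos):
        covered |= explore_cos(covered)
    return covered, rps_rounds


# Deterministic stand-ins for the two strategies, used only for illustration
results = iter([{1, 2}, {2, 3}])
covered, rounds = run_schedule(
    explore_rps=lambda seen: next(results, set()),
    explore_cos=lambda seen: {9},
    n_cos=2,
)
print(sorted(covered), rounds)
```

In this toy run, two productive RPS rounds are followed by four empty ones, after which the two COS explorations add the remaining branch.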
B. Branch Coverage
CORT results are compared against HYBRO [9], [13] and BEACON [11] in Table III. Benchmarks not reported for HYBRO and [13] are marked NA. Columns 5 and 11 highlight the branch coverage and test length for CORT. Column 9 gives the time taken by CORT to execute the search strategy and generate the test for each benchmark; it excludes the preprocessing cost. Despite relying on interleaved concrete and symbolic constraint solving, our work demonstrates significantly shorter test generation time and achieves high RTL branch coverage. Fewer vectors are generated for the same goal of maximal branch coverage compared to BEACON, a search-based test generator. The minor differences in coverage between tools could be attributed to instrumentation differences and optimizations.
Consider design b10, where CORT achieves a high branch coverage score (96.88%), on par with HYBRO and [13], in only 0.52s. Although HYBRO, [13], and CORT all base their test generation effort on hybrid concrete and symbolic simulation, our work is more than 40× faster for b10 alone. For b11, CORT manages to cover all reachable branches (the 96.97% overall coverage corresponds to 100% of reachable branches) 90× faster than previous concolic execution based methods. For b14, we compare CORT with BEACON, which also uses Verilator for instrumentation and simulation. We achieve a higher coverage (93.36%) with a test that is 7.5× smaller.
Concerning target-oriented test generation, CORT consistently manages to cover states c1 and c2 in b12. Satisfying c1 requires timing out by not pressing the input (k). The deepest system state in b12, c2 (WIN, state variable gamma = 25), which requires over 30000 cycles, is covered in a time comparable to PACOST [16].
1 Number of branches present in the RTL may vary between compared methods due to instrumentation differences.
2 Test length was not reported for HYBRO [9] and [13].
Table IV shows results of test generation on the remaining benchmarks, for which results for HYBRO and [13] were not available. Hence, we compare CORT against random test generation (RAND) and BEACON configured to the settings used in [11]. RAND is constrained to non-reset primary inputs for 50000 cycles and averaged across eight trials to represent a baseline. The table highlights the best results from BEACON and CORT. BC%, Vec. and Time represent percentage branch coverage, test length and runtime respectively. In all cases, the result from CORT matches or exceeds the compared methods in terms of coverage and test length. The design b15 contains a 16-byte prefetch queue (array) through which both instructions (80386 opcodes) and data are buffered. Random test generation is unable to handle such sequential depth in the control flow. BEACON covers almost as many branches (89.93%) as CORT (92.62%) but requires notably more vectors to do so (almost 100×).
Although CORT is not as fast as stochastic search-based techniques like BEACON for these circuits, the tests generated are efficient in terms of coverage per vector. The slower runtime is expected, as CORT is a Python-based semi-formal methodology relying on costly SMT operations. Nonetheless, in comparison to the same class of techniques that use hybrid concrete and symbolic simulation (HYBRO, [13]), CORT is remarkably faster. These results are indicative of CORT's ability to generate minimal directed tests for practical Verilog designs.
C. Effects of Exploration Length
In Table V, for a fixed search strategy of 64×COS, the exploration radius (R) and exploration overlap (Q) are varied for the design b15. An increasing exploration length (R + Q) is analyzed, and the average branch coverage (BC%), test length (Vec.) and runtime (Time) are reported over 8 trials. Our experiments show that a suitably sized exploration is necessary for generating efficient tests. With shorter exploration lengths (rows 2-3), we discover fewer branches in the same search. However, these tests require fewer vectors to cover those branches, highlighting the utility of deriving the overall test from the TDT. Finding the suitable exploration length is a matter of experimentation and knowledge of design complexity. A smaller Q leads to discovering fewer mutable guards per exploration and thus to longer tests, which is indicative of a greater reliance on the randomness of the concrete stimuli (R) to reach certain branches. On the other hand, an exploration with a high Q is computationally intensive and demands a high execution time. It is seen that an exploration length of 24 (row 5) optimally leverages all goal metrics. The efficiency of the test generation effort stagnates beyond a certain exploration length, indicating its sufficiency at handling the design complexity. Lastly, row 7 (Q = 16, R = 32) shows a larger final test length, which is an adverse effect of the greedy test accumulation technique.
V. LIMITATIONS
In our work, the concrete values of variables can only be read between, not during, cycles. In the same cycle, if a variable is used in a statement between two blocking definitions of the same variable, then no valid (pre/post cycle) concrete value can be read for its use. When the CEE encounters such a situation in the concrete path, it is reported to the user, and the program terminates. The user is expected to insert a temporary internal variable in the RTL to mirror the first blocking definition. This adds no functional value and does not increase the size of the synthesized design, but is quite helpful for concolic execution. Automation of such insertions is expected in our future work. Secondly, our work is limited to RTL designs containing signals with a maximum width of 64 bits. This limitation is posed by Verilator's handling of large-width signals as contiguous C arrays. The capability to handle such signals and their assignments is also being considered for future work. Similarly, the Verilog HDL semantics supported by CORT are limited by Verilator. Thirdly, in CORT we treat array accesses as concrete. This limits the strength of symbolic evaluation in designs such as cache controllers that require heavy logic around array accesses. Lastly, CORT is written in Python as a proof of concept and will be reconstructed in C++ for performance.
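The first limitation above can be illustrated with a small Python check over one cycle of a concrete path. This is a sketch under assumptions and not the CEE's actual implementation: the trace format (a list of `(target, sources)` pairs for blocking assignments in execution order), the function name `find_unreadable_uses` and the signal names are all hypothetical. A use is flagged when it follows one blocking definition of a variable and precedes another in the same cycle, so that neither the pre-cycle nor the post-cycle concrete value matches what the statement actually read.

```python
def find_unreadable_uses(trace):
    """Flag variables used between two blocking definitions in one cycle.

    trace: list of (target, sources) pairs, one per blocking assignment,
           in the order they execute along the concrete path (assumed format).
    Returns a list of (variable, index of use, index of redefinition).
    """
    defined = set()   # variables already assigned in this cycle
    pending = {}      # variable -> index of first use since its last definition
    problems = []
    for idx, (target, sources) in enumerate(trace):
        for src in sources:
            # A use after a definition is only a problem if a redefinition follows
            if src in defined and src not in pending:
                pending[src] = idx
        if target in pending:
            problems.append((target, pending.pop(target), idx))
        defined.add(target)
    return problems


# tmp is read by y between its two blocking definitions
bad = [("tmp", ["a"]), ("y", ["tmp"]), ("tmp", ["b"])]
# Assumed workaround: tmp_m mirrors the first definition and replaces the use
fixed = [("tmp", ["a"]), ("tmp_m", ["a"]), ("y", ["tmp_m"]), ("tmp", ["b"])]
print(find_unreadable_uses(bad), find_unreadable_uses(fixed))
```

In the second trace the mirror variable is defined once and never redefined, so its post-cycle concrete value is exactly the value read by `y`.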
VI. CONCLUSIONS
We have presented CORT, a factored concolic execution based test generation technique for the Register Transfer Level. Our method tightly interleaves concrete simulation and symbolic evaluation at a cycle-by-cycle level, providing advantageous performance over previous hybrid techniques. Our minimalist approach to concolic execution, where we dynamically evaluate primary-input stimulated statements, allows us to explore the search space effortlessly. Furthermore, we abstract the control-flow response for the entire test search effort in a novel representation, the Test Decision Tree (TDT), which is systematically grown over several factored explorations. We not only use the TDT for strategic guidance for each new exploration towards high branch coverage but also derive the optimal test from it. Our results show that CORT generates smaller tests with high branch coverage, faster than existing hybrid semi-formal methods.
