Abstract: We propose Specification-Based Test Generation (SBTG), which automatically generates functional tests directly from the specification rather than from the HDL description of the design. The main benefit of generating tests from the specification is the ability to detect Specification-based Translation Errors (SBTEs) that occur due to a misunderstanding of the specification. Our results show that our test generation approach is more effective at detecting these errors than approaches that generate tests from the HDL code to maximize code coverage metrics.
I. INTRODUCTION
In an analysis by an Intel group of the Pentium® 4 processor design [1], about 54% of bugs were specification-level errors. We define Specification-based Translation Errors (SBTEs) to be those errors resulting from misunderstandings of design specifications. SBTEs describe mistakes that occur during the manual creation of a design from a specification. The goal of this paper is to define a subset of SBTEs that are not well tested using existing automatic test generation approaches and to define a test generation approach to detect these errors. In industry, a commonly used testbench generation process for a new design is shown in Figure 1a. The process of generating the initial hardware design, behavioral-level or RTL, is primarily manual because higher-level synthesis tools have not caught up to the level of efficiency and precision found in the manual designs of experienced engineers. Functional testbench generation is also primarily manual, using information from the specification and the design. A verification engineer examines the specification and design in order to identify interesting test cases for inclusion in the testbench. Figure 1b shows how many existing automatic test generation approaches are incorporated into the design flow. The key observation is that most existing automatic test generation approaches use information from the design, not the specification, because the design is described precisely in a hardware description language (HDL) suitable for automatic analysis. The potential weakness of the approach in Figure 1b is that the functional testbench is created from a design assumed to be faulty. Such a testbench explores the behavior of the design, but it does not necessarily explore potential differences between the specification and design behaviors.
The Specification-based Test Generation (SBTG) approach we propose is presented in Figure 1c . Our approach automates the test generation process using information directly from the specification to generate functional tests. We generate tests directly from the specification because specification-related errors are common in large design projects. Designers are typically partitioned into groups working on different components of a larger design. Miscommunication between design teams is a known source of error and delay [2] , resulting in Specification-based Translation Errors (SBTEs). SBTEs are significant and not directly captured using coverage-based or simulation-based automatic test generation techniques since they analyze the code, not the specification.
In this research, we focus on detecting a subset of SBTEs referred to as subtractive errors. Subtractive errors are errors of omission where some behavior described in the specification is not implemented in the HDL of the design. The manifestation of a subtractive error in an HDL description is that the description is incomplete. Consequently, there is no individual line of code that can be categorized as being incorrect since it has been omitted. Automatic test generation approaches that target code coverage are not likely to detect subtractive errors because they are not evident in the code. In order to target subtractive errors during test generation, the original specification must be used to guide the generation process.
In order to use the specification in testbench generation, the system behavior must be formally described. The technique used to formally describe the behavior must be straightforward, adding minimal complexity to the existing specification process. To formally capture the behavior described in the specification, we use the concept of scenarios, which is widely used in the software engineering domain. Scenarios [3], [4], like live sequence charts [5], describe typical behavior present in the specification. We assume that a complete set of scenarios is provided by the designer, since we are not presenting a system to formally capture a design for synthesis. This set is an input to the functional test generation process. We argue that this represents minimal additional burden on the designer because scenarios describe simple event sequences and can be derived directly from timing diagrams commonly included in hardware specifications.
The SBTG technique we propose first generates tests directly from scenarios to stimulate behavior described in the specification. The scenarios are generated iteratively and then perturbed on each iteration to model the variety of mistakes a designer might make while writing HDL code. The perturbations in the test cases increase the likelihood that any differences between the design and the specification will be triggered and detected during simulation.
Our test generation technique is meant to be used together with other test generation techniques that target code coverage. While the SBTG technique is uniquely effective at detecting SBTEs, code coverage-based test generation techniques are still most effective at detecting a wide range of other types of errors identified in the previous and related work section. By using SBTG together with code coverage-based test generation, the complete set of errors, both specification-based and non-specification-based, can be detected.
II. PREVIOUS AND RELATED WORKS

A. Functional Test Generation for Simulation-based Validation
Functional test generation in simulation targets design-based errors, that is, errors related to the design of the system, and exercises the existing model of the circuit. Some approaches combine several techniques with specialized algorithms. For example, [6] uses a strategy to map high-level faults into logic-level faults, and genetic algorithms [7] and b-algebra [8] provide a more directed approach. Test generation techniques based on formal specifications have also been proposed [9].
Functional test generation involves solving sets of constraints that collectively guarantee the detection of design errors. Researchers use many different solving techniques to perform functional test generation. Boolean SAT solvers perform Bounded Model Checking (BMC) [10]. Constraint Satisfaction/Logic Programming (CSP/CLP) solvers, which solve both arithmetic and Boolean constraints, are used for microcontroller test generation [11] and with Extended Finite State Machines (EFSMs) [12].
Automatic Test Pattern Generation-based (ATPG-based) justification and propagation algorithms are also used [6], [8], [13].
B. Coverage Metrics
Coverage metrics determine the adequacy of a test, assess how thoroughly a program is exercised, and show whether the test promotes high system activation. Software testing first qualified the capacity of a given input stimulus to activate specific properties of the program code [14] . In hardware, the most common classifications of coverage metrics are as follows: code coverage metrics, metrics based on circuit activity, metrics based on finite-state machines, functional coverage metrics, error-(or fault-) based coverage metrics, and coverage metrics based on observability [15] .
C. Specification-based Testing
Specification-based testing is well explored in the software domain. The survey papers [16] , [17] highlight problems with requirements documents and specification. They show that designers often implement their code incorrectly. The ambiguity inherent in specifications is the source of significant design complexity in software [18] and in hardware.
Some previous works in the hardware domain examine the problem of functional verification from specification. In [19] , CTL is not used for test generation but to generate coverage goals. The approach described in [20] generates tests from use cases, a formalism originally developed in the software domain.
D. Mutation-based Testing
Mutation-based testing is well explored in the hardware domain. In [21], the testbench inputs come from specifications using mutation-based approaches. These approaches inject mutations into the design and evaluate the design, not the inputs.
III. SYSTEM OVERVIEW
We provide a functional testbench generator using the specification directly.
Scenarios describe behavior in the specification and can be derived from the timing diagrams widely used in industry design practice. For this reason, we assume that a complete set of scenarios is provided by the designer, since our goal is not to present a system to formally capture a design for synthesis. These scenarios are then converted into a behavioral Verilog testbench that is injected with perturbations, increasing the likelihood of SBTE detection. Figure 2 shows a general overview of the test generation system. From the specification, a set of correct scenarios is created from all timing diagrams, rules, and properties for the design. From these scenarios, along with a set of random test vectors for data, a scenario testbench is generated with n copies of the original scenario-generated testbench code. Our test generation technique, which utilizes a modified set of these scenarios (n modified, perturbed iterations), also uses a set of random test vectors for data to generate a scenario testbench from the Verilog generator.
IV. SCENARIOS
The idea of scenarios was initially investigated, together with use cases, in the field of software engineering to help define aspects of software behavior [22] . To formally define scenarios, the analogous terminology used in software engineering [23] is modified to suit the needs of hardware validation. We utilize an existing model [24] since the goal of this paper is not to write a new executable specification language. Other models, such as [23] , express the specification in the form of extended timing diagrams that are able to express sequence.
A use case is defined abstractly as a contract for the behavior of the DUT (Device Under Test), the system being validated [22]. A scenario is an instance of a use case that describes a single input sequence that triggers a behavior. It is seen as an instance of a use case because a use case should define the set of input sequences that trigger a behavior, while a scenario describes only one input sequence that triggers a behavior. The scenario is a pair (T, R) where T is an ordered sequence of events on system inputs that triggers a behavior, and R is an ordered sequence of events on state variables and outputs that describe the response associated with the behavior.
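As a minimal sketch, the pair (T, R) could be represented in Python as shown here. The encoding of an event as a (signal, value) tuple, the class name Scenario, and the example signal names and values are assumptions made for illustration; they are not the representation used inside SBTG.

```python
from dataclasses import dataclass
from typing import Tuple

# An event is modelled as a (signal, value) pair. This encoding is an
# assumption for illustration, not the paper's internal format.
Event = Tuple[str, int]


@dataclass(frozen=True)
class Scenario:
    name: str
    trigger: Tuple[Event, ...]   # T: ordered events on system inputs
    response: Tuple[Event, ...]  # R: ordered events on state variables/outputs


# Hypothetical scenario; signal names and values are illustrative only.
example = Scenario(
    name="load_instruction",
    trigger=(("SelectWIR", 1), ("ShiftWR", 1), ("UpdateWR", 1)),
    response=(("instruction_loaded", 1),),
)
```

Keeping T and R as ordered, immutable tuples reflects the fact that a scenario describes exactly one input sequence and its associated response.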
A specification is a set of behaviors that can be refined into a state transition graph such as a Finite State Machine (FSM). The assumption of a correct specification is important to focus the problem space. Also, the assumption of correct and complete scenarios is important to set this research apart from vacuity related problems [27] not addressed in this paper. A scenario describes some aspect of the input/output behavior of a system, and each scenario describes many FSM paths because the trigger of a scenario only defines the values of a subset of input signals at specific times. Any FSM path whose input sequence matches the scenario's trigger sequence is implicitly described by the scenario. For this reason, it is possible to define complex behaviors with a reasonable set of scenarios.
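The claim that one scenario implicitly describes many FSM paths can be illustrated with a small sketch. Here a trigger constrains only one of two input signals, and we count how many complete input sequences agree with it. The signal names a and b, the dictionary encoding, and the helper matches are hypothetical.

```python
from itertools import product


def matches(trigger, inputs):
    """True if every signal value the trigger specifies agrees with the inputs."""
    return len(trigger) == len(inputs) and all(
        step_inputs.get(sig) == val
        for step_trigger, step_inputs in zip(trigger, inputs)
        for sig, val in step_trigger.items()
    )


# The trigger constrains only signal "a": a=1 at step 0, a=0 at step 1.
trigger = [{"a": 1}, {"a": 0}]

# Enumerate all input sequences over two signals (a, b) and two time steps.
steps = [{"a": a, "b": b} for a, b in product((0, 1), repeat=2)]
all_sequences = [list(seq) for seq in product(steps, repeat=2)]
matching = [seq for seq in all_sequences if matches(trigger, seq)]

print(len(all_sequences), len(matching))  # 16 4
```

Because b is left unconstrained at both steps, four of the sixteen possible input sequences match the same trigger, which is why a modest number of scenarios can cover complex behavior.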
Production-based specifications (PBS) [24] are concise and easy to debug and understand due to the local nature of each production in the specification. Symbols are defined in [24], [25]. At the lowest level of abstraction, tokens describe atomic sets of external signal transitions, while those at higher levels describe arbitrarily complex sequences of these tokens. In the same way, since scenarios are sequences of events on signals, they are built from atomic tokens to create higher levels of complex sequences. SBTG utilizes a reduced set of operations from PBS [24] to describe scenarios using two main types of operators, logic and sequential.
V. DETAILED SCENARIO EXAMPLE
The IEEE 1500 specification [26] is used to illustrate a scenario in more detail. The IEEE 1500 specification describes a test wrapper component consisting of a Wrapper Bypass (WBY), Wrapper Instruction Register (WIR), and Wrapper Boundary Register (WBR). Figure 3 shows an overview of the IEEE 1500 design. Most systems on a chip (SoCs) consist of multiple cores that must interact with one another, and the IEEE 1500 specification describes this system of interaction as well as each individual core's wrapper behavior. This design was implemented in eight modules. The Wrapper Serial Control (WSC) signals are the main inputs that control the system. The Wrapper Serial Input (WSI) and Wrapper Serial Output (WSO) provide the serial interface to the design. An example of a high-level scenario would be "placing an instruction into the WIR." To do this, the WIR is activated using the "SelectWIR" signal. Then, an instruction is shifted in. Finally, the WIR is updated with the instruction shifted into the instruction register of the wrapper. From the timing diagram in Figure 4, the three steps mentioned for placing an instruction into the WIR can be extrapolated. The control signals are SelectWIR, ShiftWR, and UpdateWR. Each of the three steps is a building block for the main high-level scenario.
The "instruction_bypass" demonstrates a simple scenario. It is a special case of what is specified in the Wrapper Serial Port (WSP) timing diagram ( Figure 4 ) and puts the wrapper in bypass instruction mode. Figure 5 shows a textual view of this scenario derived from a digraph of the scenario "instruction_bypass" T = (P, E) where P is a set of production nodes p and E is a set of edges e within the triggering sequence graph T. For every edge in the original digraph, e = (l,r) where l is the predicate of the edge e as described textually in Figure 5 representing symbols on the left side of the production and r is on the right side. The terminal symbols map to actual Verilog constructs. The textual file in Figure 5 is created using the five operations ',', '&', '&&', ':', and '->'. The "instruction_bypass" top-level scenario first determines that this operation is an instruction with the "sel_instruction" mid-level token. Then, two zeros are shifted in from wscan_in since '00' represents a WS_BYPASS instruction. Once the two zeros are shifted into the instruction register, an update event occurs to update the register with the appropriate value.
Figure 6. Example blackjack rule: an ace doubles as a 1 and an 11 (depending on which is most advantageous to the player).
VI. SPECIFICATION-BASED TRANSLATION ERRORS
SBTEs arise from misunderstandings of design specifications and concern aspects associated directly with the specification. The specification and design are closely related in that parts of the specification map to parts of the design. A designer creates a design from a specification by manually defining this mapping. If the designer makes a mistake during this process, the mapping may be incorrect in several different ways.
Two types of errors are modeled to heuristically capture SBTEs: additive and subtractive.
An additive error occurs when the designer adds a dependency that is not in the specification. In the code, this appears as extra behavior that the specification does not describe. This includes signal modifications that change the values of the signals (high to low, negedge to posedge, and vice versa) as well as operations and control sequences. Most code coverage-based test generation techniques detect additive errors in the design since these are exposed by metrics targeting code directly. If something is added to the code, it will be exercised with full code coverage, and if it produces erroneous results, it will be detected. However, if code is missing, the omission cannot be detected even with full code coverage because it is not present in the design code. Thus, we will focus on subtractive errors.
A subtractive error occurs when the designer subtracts or leaves out a dependency that is in the specification. Subtractive errors are not present in the code and are only present in the specification. These are the errors of omission on which we focus in this work. In verifying that the behavior of a design matches the specification, SBTG is needed to formalize the specification and help to provide a framework for modeling and generation of tests based on the specification rather than on the structure of a design.
The design errors targeted are not errors in the specification but in the translation of the specification. To detect these design errors, we inject perturbations into the testbench that target these design errors. In other words, we make a distinction between the design errors we are targeting (i.e. subtractive SBTE) vs. those perturbations we insert into the created scenarios to automatically generate the testbench used to target these design SBTE.
To explain the use of SBTG to discover specification-based errors, we propose a simple example to describe the design errors we are targeting. In the design of a blackjack application, a designer looks at the set of rules of the application. One such rule is shown in Figure 6 and coded in its entirety in Figure 7 . However, if the grayed out part of the specification is left off (i.e. in the specification but not caught or implemented by the designer in code), the rule and coding drastically change. If the designer omitted the part of the rule in the specification that stated that an ace could be an 11 (in addition to being a 1), a traditional test generation technique would not catch this omission by the designer in the code because it is an error caused by an omission, not incorrect implementation, of a rule. Achieving 100% code coverage does not necessarily reveal this kind of a bug. Generating tests based on scenarios based on the specification would not miss this property.
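The blackjack omission can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of Figure 7: the function names are hypothetical, cards are encoded as ranks 1 to 10 with 1 denoting an ace, and the ace rule is modelled as "count one ace as 11 when that does not exceed 21".

```python
def hand_value_golden(cards):
    """Hand value under the full rule: an ace may count as 1 or as 11."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        total += 10  # count one ace as 11 instead of 1
    return total


def hand_value_faulty(cards):
    """Subtractive error: the 'ace may be 11' clause was never implemented."""
    total = sum(cards)
    return total


# A test derived from the faulty code executes every line of it and passes.
code_derived_test = [5, 6]
print(hand_value_faulty(code_derived_test) == hand_value_golden(code_derived_test))

# A test derived from the specification's ace rule exposes the omission.
spec_derived_test = [1, 9]
print(hand_value_golden(spec_derived_test), hand_value_faulty(spec_derived_test))
```

The faulty version contains no incorrect line, so any test reaches full line coverage of it. Only a hand containing an ace, which the specification singles out, produces a different value in the two versions.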
To explain this further, we generate a testbench with additive and subtractive qualities since we do not have knowledge of what is missing from the design. We start with correct sequences (1-3) in Figure 8 derived from the scenario that states the material conditional with antecedent and consequent as follows: if total_value less than 10, ace_value should be 11. In order to detect what is missing, we must add something to the testbench. What is added in the blackjack example are simulated hands (total value of all cards) in which ace_value is assigned the opposite value. Sequence 1s shows a subtractive perturbation since the material conditional does not apply. Sequences 2a and 3a show additive perturbations since we "added" an incorrect sequence that includes a new material conditional as follows: if total_value greater than 10, ace_value should be 11.
VII. SBTG SYSTEM
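The SBTG system described in the "System Overview" section (Section III) and Figure 2 comprises two components, described in the following subsections: the Scenario Perturbation Generator and the Verilog Generator.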
A. Scenario Perturbation Generator
The Scenario Perturbation Generator creates a testbench in the form of the perturbed scenarios in addition to the original scenarios provided.
In the example regarding the blackjack application, a scenario to compute the value of a hand is shown in Figure 9. The perturbations included in the tests correspond to the SBTEs we intend to detect. Additive perturbations include adding variable assignments and branches in a control path of the testbench. This includes perturbations that change comparator operations and arithmetic operations. Signal perturbations change values of the signals (high to low, negedge to posedge, and vice versa). Subtractive perturbations include taking away variable assignments, as shown previously in the testbench in Figure 8.
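A signal perturbation of the kind described above can be sketched as follows. The encoding of events as (signal, value) pairs with binary values and the helper name signal_perturbations are assumptions for illustration; the actual generator operates on the terminal symbols of scenarios.

```python
def signal_perturbations(sequence):
    """Yield copies of the sequence with exactly one binary signal value inverted."""
    for i, (signal, value) in enumerate(sequence):
        flipped = (signal, 1 - value)  # high to low, or low to high
        yield sequence[:i] + [flipped] + sequence[i + 1:]


# Hypothetical trigger sequence using the IEEE 1500 control signal names.
original = [("SelectWIR", 1), ("WSI", 0), ("UpdateWR", 1)]
variants = list(signal_perturbations(original))

for variant in variants:
    print(variant)
```

Each variant differs from the original in a single event, so a mismatch between design and specification observed under one variant can be traced to one signal value.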
Inputs: Symbol p describing the trigger of a scenario. Set of all productions P.
Output: A Verilog testbench.
1. T = ExpandSymbol(p, P)
2. S = {T}
3. for all e in T
4.     T' = T - e
5.     S = S union {T'}
6. Testbench = GenerateVerilog(S)

Inputs: Symbol p describing the trigger of a scenario. Set of all productions P.
Output: A sequence of events T that is accepted in the symbol p.
B. The Verilog Generator
The Verilog Generator compiles input specifications in the form of scenarios and outputs testbenches in the form of Verilog code segments derived from the terminal symbols of the scenarios. The testbench is completed with a header and footer added automatically by the generator. The system for this Verilog generator has been implemented as a Python script.
The original scenarios do not contain timing information; however, the final testbench must include minimal discrete timing information to enforce temporal dependencies. We make the following assumptions to support timing. First, we insert a minimum time delay, t_min, between any two events adjacent in sequence. Second, the clock period is much greater than t_min.
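The first timing assumption can be sketched as a small Python helper that places a Verilog delay statement between adjacent events. The function name add_delays, the default delay value, and the example statements are hypothetical; the actual generator may emit timing differently.

```python
def add_delays(statements, t_min=1):
    """Interleave a Verilog delay statement between adjacent events."""
    timed = []
    for i, stmt in enumerate(statements):
        if i > 0:
            timed.append(f"#{t_min};")  # wait t_min time units before the next event
        timed.append(stmt)
    return timed


# Hypothetical event statements using the IEEE 1500 signal names.
events = ["SelectWIR = 1;", "WSI = 0;", "UpdateWR = 1;"]
print("\n".join(add_delays(events, t_min=2)))
```

Delays are placed only between events, never before the first or after the last, so a sequence of k events receives k - 1 delay statements.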
VIII. TESTBENCH GENERATION ALGORITHM
Our goal is to create Verilog testbenches from scenarios directly created from the specification of a design. If the scenario is not complete, for example in the case when a signal dependency was not written into the scenario by the verification engineer, the designer should know what the results should be.
A. Algorithm
The algorithm for creating the testbenches using SBTG is given in Figure 10 (Algorithm to Generate a Testbench from a Scenario), which displays the pseudo-code for the test generation algorithm. The inputs to the algorithm are a symbol p describing the trigger of the scenario used as a template for test generation and the set of all productions P. The first line of the algorithm calls the function ExpandSymbol to return a sequence of events that is accepted by the symbol p. T is therefore a triggering sequence for the scenario. Line 2 of the algorithm initializes the set S of all sequences that will be used to create the testbench. Lines 3, 4, and 5 describe the loop that iteratively adds sequences with subtractive changes. The loop iterates through all events e in T (line 3) and for each event, generates a T' that does not contain the event e (line 4) that is added to the set of all sequences S (line 5). Each event in a sequence is a legal Verilog statement, either a signal assignment or a delay statement. Once the set S is complete, it is converted into a legal Verilog testbench by adding the appropriate header and footer Verilog statements.

The ExpandSymbol(p, P) function, shown in Figure 11, returns a sequence of events that is accepted by the grammar defined by the set of productions P and whose parse tree starts with symbol p. ExpandSymbol is a recursive function, and line 1 is the terminating condition; if the start symbol is a terminal then the terminal is returned, and the expansion is complete. The function FindProduction(p, P), called on line 2, searches through P to find the production with symbol p as its left-hand side. This is assigned to intermediate variable q. Line 3 assigns B to be the right-hand side of the production. The loop on lines 4 and 5 iteratively replaces symbols in B with their expansions, calling ExpandSymbol recursively. The resulting list of terminal symbols is returned on line 6.
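The two procedures described above can be sketched in Python as follows. This is a simplified illustration under several assumptions: productions are stored in a dictionary (so FindProduction becomes a lookup), only sequential composition is modelled (the logic operators of PBS are omitted), the example grammar is hypothetical, and the header and footer are placeholders without signal declarations or DUT instantiation.

```python
def expand_symbol(p, productions):
    """Return a list of terminal events accepted by symbol p."""
    if p not in productions:          # terminal symbol: expansion is complete
        return [p]
    body = productions[p]             # right-hand side of the production for p
    events = []
    for symbol in body:               # replace each symbol with its expansion
        events.extend(expand_symbol(symbol, productions))
    return events


def generate_sequences(p, productions):
    """Return the original trigger plus one variant per omitted event."""
    trigger = expand_symbol(p, productions)
    sequences = [trigger]
    for i in range(len(trigger)):
        sequences.append(trigger[:i] + trigger[i + 1:])  # T' = T - e
    return sequences


def generate_verilog(sequences):
    """Wrap the sequences in a placeholder testbench header and footer."""
    lines = ["module testbench;", "initial begin"]
    for seq in sequences:
        lines.extend("  " + stmt for stmt in seq)
    lines += ["  $finish;", "end", "endmodule"]
    return "\n".join(lines)


# Hypothetical productions, loosely modelled on the bypass scenario.
productions = {
    "instruction_bypass": ["sel_instruction", "shift_zero", "shift_zero", "update"],
    "sel_instruction": ["SelectWIR = 1;"],
    "shift_zero": ["ShiftWR = 1;", "WSI = 0;"],
    "update": ["UpdateWR = 1;"],
}

sequences = generate_sequences("instruction_bypass", productions)
testbench = generate_verilog(sequences)
print(len(sequences))  # 7: the original trigger plus six subtractive variants
```

Events are removed by position rather than by value, so a sequence that contains the same statement twice still yields one distinct variant per occurrence.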
IX. EXPERIMENTAL RESULTS
The results demonstrate that our SBTG technique generates tests effective in detecting subtractive errors. The results demonstrate two key points: 1) SBTG detects a high percentage of subtractive errors, and 2) code coverage (specifically line coverage) is not correlated with the detection of subtractive errors. The second point is important because it implies that SBTG is more effective at detecting subtractive errors than any test generation technique based on line coverage. Figure 12 shows an overview of our results generation process. We compare our test generation approach to the use of a random test generation approach. The random approach iterates through each IEEE 1500 test instruction, shifting random data into the WSI port at each iteration and observing outputs each clock cycle. The number of testing cycles for random testing is the same as the number of cycles for SBTG in all cases.
We implemented a simple version of the IEEE 1500 wrapper that does not include optional instructions or parallel instructions. Our IEEE 1500 design has 779 lines of Verilog code (LOC). The scenarios generated for the IEEE 1500 contained 120 production rules.
SBTG is implemented as a Python script which was executed on a Sun Microsystems Sun-Fire-V240 machine with a 1503 MHz SunW, UltraSPARC-IIIi Dual Core processor and 4Gb RAM running SunOS Release 5.10 Version Generic_142900-13 [UNIX® System V Release 4.0]. Test generation completes in 0.9 seconds. We compare our SBTG method with that of a standard random algorithm and run this to get at least 99% line coverage. A golden design is compared to 20 erroneous designs (each with one subtractive SBTE which is injected into the golden design). The erroneous designs model subtractive errors by line deletions and branch deletions. Each erroneous design is simulated to compute line and defect coverage. Line coverage is the number of lines exercised during the simulation process. Defect coverage is the number of designs detected to be erroneous by observable output comparison with the golden design.
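The defect coverage measure can be sketched as follows, expressed here as the fraction of erroneous designs whose observable output differs from the golden design on at least one test. Designs are modelled as plain Python functions purely for illustration; in the experiments they are Verilog designs under simulation, and all names in this sketch are hypothetical.

```python
def defect_coverage(golden, faulty_designs, tests):
    """Fraction of faulty designs whose output differs from golden on some test."""
    detected = sum(
        any(design(t) != golden(t) for t in tests) for design in faulty_designs
    )
    return detected / len(faulty_designs)


def golden(x):
    if x > 3:
        return 2 * x
    if x == 0:
        return -1
    return x


def faulty_a(x):
    """Subtractive error: the x > 3 branch is missing."""
    if x == 0:
        return -1
    return x


def faulty_b(x):
    """Subtractive error: the x == 0 branch is missing."""
    if x > 3:
        return 2 * x
    return x


print(defect_coverage(golden, [faulty_a, faulty_b], tests=[1, 5]))  # 0.5
```

With the tests 1 and 5, only faulty_a is detected; detecting faulty_b requires a test that exercises the omitted x == 0 behavior, which the faulty code itself gives no hint of.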
Coverage results are reported in Table I. The line coverage values reported are the average coverage values over all erroneous designs. It is clear that in all cases line coverage is at least 99% for both SBTG and random test generation. The results also demonstrate that defect coverage is much higher for SBTG, 75% compared to 10%. The defect coverage curves for both SBTG and random are shown in Figure 13. The results in Figures 12 and 13 show that high line coverage does not guarantee increased detection of subtractive errors. For this reason, test generation techniques that rely on code coverage should not be expected to effectively detect this class of errors.
X. CONCLUSION
In this paper, we define a class of errors that occur during the design process. We also present a functional test generation technique that detects this class of errors more effectively than code coverage-based techniques. Our SBTG technique can be used together with code coverage-based test generation to detect both specification-based errors and non-specification-based errors. This research represents an effort to automate the process of specification-based verification, traditionally a manual process.
