In formal verification, the first step is to build a formal model. Most 
Introduction
One of the crucial steps in formal verification [1] is constructing a formal model of the system being verified. However, creating an accurate model of the design is difficult. Formal models are usually constructed from informal design documents, but such documents are often ambiguous and do not reflect the actual system completely: many decisions taken during implementation are not documented. In such cases, the verification results are questionable. Building verification models directly and automatically from the code avoids these problems.
Most academic tools represent designs as BLIF or BLIF-MV [2] networks and cannot take Verilog source code directly as input. For example, VIS [3] depends on the translator vl2mv [4] to compile Verilog into the intermediate format BLIF-MV first, and then extracts the formal model from these files rather than from the Verilog source code itself. Only a few tools can take Verilog source code directly as input, although extracting the formal model this way is more convenient. We therefore propose extracting the formal model directly from Verilog source code.
The Static Single Assignment (SSA) [5, 6] form is an Intermediate Representation (IR) that has been adopted by a wide range of compilers such as GCC (since the GCC 4.0 release), Intel CC, and LLVM. The SSA representation is effective because it allows efficient implementations of data flow analyses and optimizing transformations, such as constant propagation, induction variable analysis, dead code elimination, global value numbering [7, 21] and partial redundancy elimination [8, 22]. The SSA transformation, originally devised for sequential programs manipulating scalars, has been extended to parallel programs as well as to sequential programs with arrays [9, 10].
Combinational Equivalence Checking (CEC) [11] is a formal verification method that checks whether two given combinational circuits, that is, circuits containing no storage elements (flip-flops or latches), are functionally equivalent. We observe that synthesizable [12] Verilog is well structured and can be represented in SSA form, and that the SSA form of combinational Verilog directly defines the CEC constraints.
In this paper, we present a straightforward method that uses the SSA form to mechanically extract models directly from the source code of a Verilog design, which can serve as the front-end of a formal verification tool. Although the extracted model is used here as the front-end of CEC, our method for extracting the formal model is not limited to combinational circuits; it is also suitable for sequential circuits.
The rest of this paper is organized as follows. Section 2 briefly reviews the background of SSA conversion. Section 3 presents the implementation details of conversion to and out of SSA for a subset of synthesizable Verilog. Section 4 explains how we ensure that the formal model extracted from Verilog source code is function-preserving. Section 5 presents the experimental results of the SSA conversion, and Section 6 discusses related work relevant to building our prototype system. Section 7 concludes the paper.
Background
The CEC process consists of several steps. First, the reference and implementation circuits are parsed by the front-end lexical and syntax analyzers. Then, after elaboration, the formal model defining the system constraints is extracted from each circuit separately. Finally, a Miter [13] circuit is constructed and handed over to the back-end constraint solver.
In this section, we give a brief introduction to the first few steps, from lexical analysis to SSA conversion.
Lexical/Syntax Analyzing
Lexical and syntax analysis are the first two phases of compilation. The lexical analyzer is a pattern matcher that groups the input stream into tokens according to regular expression patterns. The syntax analyzer then reads the token stream and builds an internal syntactic representation of the program. The most common IR is the Abstract Syntax Tree (AST).
Tools exist for generating such analyzers. For example, Lex and YACC are free and open source tools for generating lexical analyzers and syntax analyzers, respectively [14].
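As a minimal illustration of regular-expression-based tokenization, the following Python sketch groups a Verilog-like assignment into tokens. The token classes and the name `tokenize` are assumptions made for this example only; they do not reproduce the Lex specification used in our front-end and cover only a handful of operators.

```python
import re

# Hypothetical token classes for a tiny Verilog-like fragment (not a full lexer).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_][A-Za-z0-9_$]*"),
    ("OP",     r"<=|[=+\-*&|^~?:;()]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Group the input characters into (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":          # whitespace is dropped
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


tokens = tokenize("x <= a + 1;")
```

The syntax analyzer would then consume `tokens` to build the AST.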
Elaboration
In the Verilog and VHDL communities, elaboration [12] is the process of finding identifier definitions, resolving parameters overridden by instantiation and defparam statements, and generating fully expanded code.
After a Verilog design has been parsed, and before simulation or synthesis begins, the instantiated modules must be linked to their definitions, parameters must be propagated among the various modules, and hierarchical references must be resolved. All of this is done during the elaboration phase.
In our implementation, the elaboration phase revises the AST accordingly. It is a necessary preprocessing step for the SSA conversion.
SSA Form
In this section we briefly review the SSA form. SSA form is an IR that allows simple yet efficient optimizations and analyses. Take the branch code in Table II as an example. The variable x has two versions, x1 and x2, after the if-then-else statement. Therefore, a φ function is needed to merge these two versions. As a result, a new version of x is introduced, and its assignment is placed at the join point of the if-then-else statement. In this simple case, the φ function is represented by the conditional operator (condition ? result1 : result2). Mature algorithms exist for computing SSA form, but they are beyond the scope of this paper. The naive method for computing SSA form is to place φ nodes at the join nodes, rename the variable at each assignment, and replace each use of a variable with the appropriate version. In the following we discuss the SSA conversion for the Verilog language specifically.
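The effect of the φ function can be sketched with a small hypothetical branch, written here in Python rather than Verilog. The functions `original` and `ssa_form` and their operands are assumptions chosen for illustration; they are not the example of Table II.

```python
def original(c, a, b):
    # Branch code: x is assigned in both arms of the if-then-else.
    if c:
        x = a + b
    else:
        x = a - b
    y = x * 2
    return y


def ssa_form(c, a, b):
    # Each assignment defines its own version of x.
    x1 = a + b
    x2 = a - b
    # The phi function at the join point, written as a conditional expression.
    x3 = x1 if c else x2
    y1 = x3 * 2
    return y1
```

Both functions return the same value for every input, since the conditional expression selects exactly the version that the taken branch would have produced.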
Formal Model Extraction Implementation
As depicted in Figure 1, our front-end Verilog parser contains two processes: lexical/syntax analysis and elaboration. The first process produces the IR of the Verilog source code, the Abstract Syntax Tree. Elaboration is executed on the AST and usually needs several iterations to complete. This phase is a necessary preprocessing step for the SSA conversion. Extraction of the formal model is crucial in combinational equivalence checking; it is in this phase that the Verilog model is transformed into SSA form, which can be conveniently exploited by the back-end solver. After the formal models of the two Verilog designs have been extracted separately, a Miter circuit is constructed and handed over to the back-end constraint solver. A back-end constraint solver is usually a SAT or SMT solver that decides whether the variables of a given Boolean formula can be assigned in such a way as to make the formula true. In our prototype system we adopt an SMT-based solver called STP (Simple Theorem Prover) as our back-end constraint solver.
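The role of the Miter circuit can be sketched as follows. This is a toy illustration under stated assumptions: the two circuits are hypothetical single-output functions, and exhaustive enumeration stands in for the SAT/SMT query. It does not use STP and does not scale beyond a few inputs.

```python
from itertools import product


def reference(a, b, c):
    # Hypothetical reference circuit: majority of three bits.
    return (a & b) | (a & c) | (b & c)


def implementation(a, b, c):
    # Hypothetical implementation of the same function with different gates.
    return (a & b) | (c & (a ^ b))


def miter(f, g, inputs):
    """Miter output: 1 if the two circuits disagree on this input vector."""
    return f(*inputs) ^ g(*inputs)


def find_counterexample(f, g, n_inputs):
    """Exhaustive search standing in for a SAT/SMT query on the miter."""
    for inputs in product((0, 1), repeat=n_inputs):
        if miter(f, g, inputs):
            return inputs   # miter satisfiable: the circuits differ
    return None             # miter unsatisfiable: the circuits are equivalent


result = find_counterexample(reference, implementation, 3)
```

Here `result` is `None`, meaning the miter output can never be 1 and the two circuits are equivalent. A real solver reaches the same verdict symbolically instead of by enumeration.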
In this paper, we do not explain every element of our model, but focus on formalizing the extraction of formal models from Verilog. To illustrate our algorithm, we give a simplified formal syntax that defines a subset of synthesizable Verilog [12]; it is sufficient to present our approach.
Formal Model
function id; {stmt-item} endfunction    (6)
stmt-item ::= lvalue = expr ;    (7)
            | lvalue <= expr ;    (8)
            | begin {stmt-item} end    (9)
The formal syntax given above does not strictly follow the IEEE standard, and contains only representative structures of synthesizable Verilog.
In Verilog designs, one module can instantiate another through the module instantiation structure (4) defined above. A module that is not instantiated by any other module is termed a top module; a design may have several.
One special structure is the procedural assignment, or more precisely the event control statement (12) in our paper. Other procedural timing controls, such as the delay control statement, are not defined here. The procedural statement appears only in an always block (3). The stmt-item is executed when a specified transition (rising/falling edge) occurs on a signal if the event-expr is edge-sensitive, or when the event-expr evaluates to true if it is level-sensitive. The assignment in a procedural assignment (12) can be blocking or non-blocking. In addition, the Left Hand Side (LHS) variables of assignments in a procedural assignment must be of register type.
In our paper, we assume the design is clean. That is, blocking assignments appear only in combinational always blocks (level triggered), while non-blocking assignments appear only in clocked always blocks (edge triggered). We also assume that the loop construct (10) is bounded by a constant, and that no expression construct has side effects.
Although there are various kinds of module-item, the same functionality can be implemented with only continuous assign (2) and clocked always. An instantiated module can be flattened, a gate instantiation can be represented by a continuous assign, and a combinational always can be transformed into equivalent continuous assign statements. In this way, only two kinds of assignments remain in Verilog after conversion: the blocking assignment in continuous assignments and the non-blocking assignment in clocked always blocks. The equivalent structure is listed below.
Convert to SSA
The only branching structure here is the conditional statement (13) (the case statement (14) is first elaborated into nested conditional statements), and it is the only structure that requires a φ function. As there are only two branches, it is straightforward to use the conditional expression (? :) to represent the φ function.
To ensure that each variable is assigned only once in SSA form, a global counter is used to generate a unique number (UID) for each assignment (rather than for each id).
The main idea of the SSA transformation is to extract assignment information (which we refer to as dataflow) from the various syntax structures. The assignment information includes the assignment kind (blocking or non-blocking), the Left Hand Side (LHS) expression, the Right Hand Side (RHS) expression and the Sensitivity List (SL). The SL is part of the event-expr (12).
To record the dataflow, we need three key maps: the Dataflow map, the Bit-score map and the SSA-ID map. The Dataflow map stores all the assignment information. The Bit-score map stores a vector of numbers for each id, one number per bit. The relation between a newly introduced variable and the original variable is stored in the SSA-ID map, such that the new variable represents certain bits of the original variable. The formal definitions are listed below.
Bit-score: ID -> UID*
Dataflow: UID -> LHS × KIND × RHS × SL
SSA-ID: ID -> ID × BIT_RANGE
Once an assignment is encountered, a unique number obtained from the global counter is added to Dataflow together with the assignment information. The corresponding bits of the LHS variable are also updated to this unique number. Therefore, the bit-vector of numbers for each ID in Bit-score is the key for retrieving the corresponding assignments in Dataflow.
Note that the LHS (lvalue) can be a single variable, a part-select, or a concatenation of these. In the case of a concatenation, a temporary variable is introduced to hold the value, and each piece of the LHS obtains its value by part-selecting this temporary variable.
The SSA conversion starts from the top modules and calls the conversion function of each syntax structure in turn.
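The bookkeeping described above can be sketched in Python as follows. This is a minimal sketch, not our C++ implementation: the class names `Assignment` and `DataflowState`, the method `record`, the `<id>_<uid>` naming of new versions, and the use of 0 for an unassigned bit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Assignment:
    lhs: str     # name of the new SSA version
    kind: str    # "BLOCKING" or "NON-BLOCKING"
    rhs: str     # RHS expression, kept as text in this sketch
    sl: tuple    # sensitivity list, empty for combinational logic


class DataflowState:
    def __init__(self, widths):
        self.uid = count(1)   # global counter
        self.dataflow = {}    # Dataflow: UID -> Assignment
        # Bit-score: ID -> one UID per bit (index 0 is bit 0, 0 = unassigned)
        self.bit_score = {v: [0] * w for v, w in widths.items()}
        self.ssa_id = {}      # SSA-ID: new ID -> (original ID, (msb, lsb))

    def record(self, var, msb, lsb, kind, rhs, sl=()):
        """Record an assignment to var[msb:lsb] and return the new version."""
        uid = next(self.uid)
        new_name = f"{var}_{uid}"
        self.ssa_id[new_name] = (var, (msb, lsb))
        for bit in range(lsb, msb + 1):
            self.bit_score[var][bit] = uid
        self.dataflow[uid] = Assignment(new_name, kind, rhs, sl)
        return new_name


state = DataflowState({"x": 4})
first = state.record("x", 3, 0, "BLOCKING", "a")    # x[3:0] = a
second = state.record("x", 1, 0, "BLOCKING", "b")   # x[1:0] = b
```

After these two assignments, bits 3 and 2 of `x` still point at the first entry in Dataflow, while bits 1 and 0 point at the second, which is what allows a later use of `x` to be assembled from the newest version of each bit.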
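The handling of a concatenated LHS can be sketched as a text-level rewrite. The helper `split_concat_assign` and the temporary name `tmp` are hypothetical, and the sketch assumes each piece of the concatenation is a whole variable of known width.

```python
def split_concat_assign(pieces, rhs, tmp="tmp"):
    """Rewrite {p0, p1, ...} = rhs using a temporary variable.

    pieces: list of (name, width), most significant piece first.
    """
    total = sum(width for _, width in pieces)
    stmts = [f"{tmp}[{total - 1}:0] = {rhs};"]
    msb = total - 1
    for name, width in pieces:
        lsb = msb - width + 1
        # Each piece reads its own slice of the temporary variable.
        stmts.append(f"{name} = {tmp}[{msb}:{lsb}];")
        msb = lsb - 1
    return stmts


stmts = split_concat_assign([("carry", 1), ("sum", 4)], "a + b")
```

For `{carry, sum} = a + b` with a 1-bit `carry` and a 4-bit `sum`, the temporary is 5 bits wide, `carry` takes bit 4, and `sum` takes bits 3 down to 0.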
1) Statement
a) Blocking assignment (7)
1) Rename the RHS based on the current data flow. That is, replace every ID in the RHS expression with its newest version, which can be looked up in the Bit-score map.
2) Update the data flow with the LHS: get a UID from the global counter, introduce a new version variable for this assignment, and add this variable to the SSA-ID map. Then update the corresponding bits of the LHS in Bit-score to the UID, and add the assignment to Dataflow.
b) Non-blocking assignment (8)
Similar to the blocking assignment, except that the assignment kind is NON-BLOCKING.
c) Sequential block (9)
Apply SSA conversion to each statement in the list.
d) Loop statement (10)
Unfolded in the elaboration phase.
e) Conditional statement (13)
1) Rename the conditional expression first.
2) Make a copy of the data flow for each branch, and transform each branch using its copy.
3) Merge the data flows of the then and else branches. That is, for each variable assigned in either branch, insert a φ function to merge the two versions.
Formal Model
Extraction for Combinational Equivalence Checking Guiling Zhang, Dantong Ouyang, Hongtao Bai, Hailin Zeng, TieMin Ma, YueHua Zhang
f) Case statement (14)
Converted to a conditional statement in the elaboration phase.
g) Function call statement (11)
1) Get the body of the function definition (6) being called.
2) Apply the SSA transformation to this function definition, and get the return value assigned to the function id.
3) Replace each formal id in the return value with the actual id.
4) Update the data flow with the LHS.
h) Procedural assignment (12)
1) Apply the SSA transformation to its body stmt-item.
2) If the statement is edge-sensitive, that is, the assignments are non-blocking as assumed above, set all the newly introduced variables to net type, and choose one version or introduce a new version for each register variable assigned. Keeping only one version of each register is required by Verilog semantics; otherwise the cycle timing would change.
2) Module item
a) Continuous assign (2)
Treated as a blocking assignment.
b) Variable declaration (1)
If the declaration assigns an initial value, treat this assignment as a continuous assign.
3) Module declaration
1) For each module item, apply SSA conversion.
2) Keep the latest version of each output variable under the same name as in the original Verilog.
3) As Verilog is not executed sequentially, a variable used earlier in the source may be assigned later. Therefore another pass is needed to update each variable to its latest version. In this step no new variables are introduced; the variables in the RHS expressions in Dataflow are simply replaced with their newest versions.
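The statement rules for blocking assignments, sequential blocks and conditional statements can be sketched together on a toy AST. This is a simplified sketch under several assumptions: statements are tuples `("assign", id, expr)` and `("if", cond, then_list, else_list)`, expressions are identifiers or tuples `(op, arg, ...)`, variables are treated as whole (no per-bit Bit-score), and the names `rename` and `convert` are hypothetical.

```python
from itertools import count


def rename(expr, env):
    """Replace every identifier in expr by its newest version."""
    if isinstance(expr, str):
        return env.get(expr, expr)
    op, *args = expr
    return (op, *[rename(a, env) for a in args])


def convert(stmts, env, dataflow, uid):
    """Append SSA assignments for stmts to dataflow, updating env in place."""
    for stmt in stmts:
        if stmt[0] == "assign":
            _, var, rhs = stmt
            new_rhs = rename(rhs, env)            # step 1: rename the RHS
            new = f"{var}_{next(uid)}"            # step 2: new version for the LHS
            dataflow.append((new, new_rhs))
            env[var] = new
        elif stmt[0] == "if":
            _, cond, then_body, else_body = stmt
            c = rename(cond, env)                 # rename the condition first
            then_env, else_env = dict(env), dict(env)   # one copy per branch
            convert(then_body, then_env, dataflow, uid)
            convert(else_body, else_env, dataflow, uid)
            # Merge: a phi function for each variable whose versions differ.
            for var in sorted(set(then_env) | set(else_env)):
                t = then_env.get(var, var)
                e = else_env.get(var, var)
                if t != e:
                    new = f"{var}_{next(uid)}"
                    dataflow.append((new, ("?:", c, t, e)))
                    env[var] = new
    return dataflow


program = [
    ("if", "s", [("assign", "x", ("+", "a", "b"))], [("assign", "x", "a")]),
    ("assign", "y", ("&", "x", "c")),
]
ssa = convert(program, {}, [], count(1))
```

For this program the conversion yields `x_1` and `x_2` for the two branches, a merged version `x_3 = s ? x_1 : x_2`, and finally `y_4`, which reads `x_3`.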
Out of SSA
The SSA form can easily be transformed back to Verilog for optimization and debugging. As Dataflow holds all the statement information needed, the conversion out of SSA is straightforward. We only need to convert each blocking assignment to a continuous assign, and place each non-blocking assignment in a clocked always block. As each top module is flattened during SSA conversion, there will be only one module for each top module.
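The conversion out of SSA can be sketched as a simple emitter over Dataflow entries. The function `emit_verilog` and the tuple layout `(lhs, kind, rhs, sl)` are assumptions for illustration; port and variable declarations are omitted, so the output is a skeleton rather than complete Verilog.

```python
def emit_verilog(module, ports, dataflow):
    """Emit a Verilog skeleton from Dataflow entries.

    dataflow: list of (lhs, kind, rhs, sl), with rhs and sl already
    rendered as text. Declarations are omitted in this sketch.
    """
    lines = [f"module {module}({', '.join(ports)});"]
    for lhs, kind, rhs, sl in dataflow:
        if kind == "BLOCKING":
            # Blocking assignments become continuous assigns.
            lines.append(f"  assign {lhs} = {rhs};")
        else:
            # Non-blocking assignments go into a clocked always block.
            lines.append(f"  always @({sl}) {lhs} <= {rhs};")
    lines.append("endmodule")
    return "\n".join(lines)


text = emit_verilog(
    "top",
    ["clk", "a", "b", "q"],
    [
        ("t_1", "BLOCKING", "a & b", ""),
        ("q", "NON-BLOCKING", "t_1", "posedge clk"),
    ],
)
```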
Table IV below shows a Verilog program and the Verilog converted out of SSA.
Table IV. Out of SSA Form
The internal maps during the SSA transformation process are listed in Table V. In the fourth row, when the two branches s3 and s4 of the conditional statement s2 are finished, the Dataflow copies are merged. In addition, we use copy propagation to eliminate the internal variables b_2 and b_3.
Formal Model
The assignments in Dataflow, represented in SSA form, define the formal model. For a combinational circuit, the internal variables in the program are net-type variables. The constraints for CEC can be modeled by requiring that the LHS equals the RHS for each entry in Dataflow.
For a sequential circuit, we also have to model the behavior of register variables. A register variable appears only in non-blocking assignments, and the SL is used to model the condition that triggers the non-blocking assignment.
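For the combinational case, the constraint "LHS equals RHS for each entry" can be sketched by evaluating the entries in order for a given input vector. This is an illustrative sketch only: it assumes single-bit values, entries listed so that each variable is defined before it is used, and a hypothetical tuple encoding of expressions. A real back-end would pass the same equalities to the solver symbolically.

```python
import operator

# Hypothetical operator table for single-bit values.
OPS = {"&": operator.and_, "|": operator.or_, "^": operator.xor}


def evaluate(expr, values):
    """Evaluate an identifier or a tuple (op, arg, ...) under values."""
    if isinstance(expr, str):
        return values[expr]
    op, *args = expr
    if op == "?:":
        cond, then_val, else_val = (evaluate(a, values) for a in args)
        return then_val if cond else else_val
    left, right = (evaluate(a, values) for a in args)
    return OPS[op](left, right)


def solve_constraints(dataflow, inputs):
    """Find the values that make LHS == RHS hold for every entry."""
    values = dict(inputs)
    for lhs, rhs in dataflow:
        values[lhs] = evaluate(rhs, values)
    return values


dataflow = [
    ("x_1", ("&", "a", "b")),
    ("x_2", ("|", "a", "b")),
    ("x_3", ("?:", "s", "x_1", "x_2")),
]
values = solve_constraints(dataflow, {"a": 1, "b": 0, "s": 1})
```

Because every variable is assigned exactly once in SSA form, the equalities determine a single value for each internal variable once the inputs are fixed.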
Function Preserving
As the front-end of combinational equivalence checking, the formal model extracted by our method must be function-preserving for the verification result to be valid. In our approach, we make sure every transition in the prototype system is function-preserving by checking the equivalence of the model after the transition against the source model. Formality is an equivalence-checking solution that uses formal, static techniques to determine whether two versions of a design are functionally equivalent. We use this widely used industrial verification tool to check the equivalence.
From the Verilog source model to our formal model, three transitions are needed: Verilog source model to AST model, AST model to elaboration model, and elaboration model to SSA form model. The SSA form model is the final extracted model handed over to the back-end solver. After every transition, we take the source code model as the reference model in Formality and the model produced by the transition as the implementation model, and check whether the conversion is function-preserving. We also use ISCAS and picoJava-II to test the robustness of each transition and make sure it is appropriate for most Verilog source models.
Experiment Result

Our prototype system for extracting the formal model is implemented in C++, and we ran the experiments on a 32-bit 1.9-GHz AMD Athlon(tm) 64 X2 Dual Core Processor with 2 GB of RAM under Linux. Currently, our system supports only scalar and bit-vector variables, and does not support memory types.
To show that our conversion procedure does not lose any information, we translate the Dataflow out of SSA to Verilog and check its equivalence with the original design using Formality. We have tested it on a large number of synthesizable circuits. The circuits listed in Table VI are some representative test cases, including both combinational and sequential circuits.
Among the test cases, the largest is picoJava-II [15]. picoJava-II has 15 blocks in total. The topmost block, CPU, has memory variables, which we do not support (Not Supported, NS). In addition, the ICU block contains a combinational always that synthesizes to a latch, which is not supported, as discussed in Section 3. We ran SSA conversion on the remaining 13 blocks, and 11 of them were proved equivalent. The other two blocks, IU and IFU, are too large for the equivalence checking tool to load (Over Memory, OM), so their equivalence could not be confirmed.
Efficient CEC Based on Our Front-end
As mentioned above, the extracted model can be used as the front-end of CEC. In [18], the CEC system based on our front-end achieves better time and space performance than Formality. The authors of [18] perform an experiment with the ISCAS'85 benchmarks, which are gate-level designs. The benchmarks contain several groups of test cases that are functionally equivalent but implemented in different Verilog code. The experimental results show that the memory usage of the CEC system is roughly half that of Formality. Another experiment in [18] uses the ARITH benchmarks, a set of arithmetic circuits used to verify the CEC system. In this test, the system also demonstrates good time and space performance compared with Formality.
These results indicate not only that our front-end is function-preserving and robust, but also that the extracted model can be exploited efficiently.
Related Work
Vl2mv is a tool that compiles a given Verilog description into FSMs. Vl2mv extracts a set of finite state machines that preserves the behavior of the source Verilog program, defined in terms of simulation results [4]. The vl2mv project is now part of a larger project, VIS [3], which aims to implement formal verification, synthesis, and simulation of finite state systems. Compared with our method, vl2mv supports a wider subset of Verilog and supports timing constraints. Our method is more targeted, and it is sufficient for CEC. In addition, various optimizations on the SSA form are available, which can narrow the solution space of subsequent verification.
Module  State  Module  State
BIU     PASS   TRAP    PASS
DCU     PASS   PIPE    PASS
FPU     PASS   RCU     PASS
PCSU    PASS   UCODE   PASS
SMU     PASS   EX      PASS
LOGIC   PASS   CPU     NS
ICU     NS     IU      OM
IFU     OM
SSA form has been recommended for representing data flow, and it is an efficient data structure for program analysis and compiler optimization. Although SSA form is an attractive data structure, constructing it is not easy, and the problem of space explosion discourages its use [19]. Minimal SSA form is a refinement of the pseudo-assignments put forward by Shapiro and Saint [20], but when there is no explicit φ function it is difficult to manage the relationship between new names and old names. In [5], R. Cytron et al. proposed an approach to constructing SSA form. The algorithm uses dominance frontiers to place φ functions and is a two-step process. First, φ functions are inserted at the join nodes of the data flow. Second, new variables V (V = φ(V1, V2, ...)) are generated. The transformation can be performed efficiently and leads to only a moderate increase in space. In our system, we adopt the concept of the φ function described by R. Cytron et al. to convert to SSA form. For concrete Verilog source models, we analyze the features of the Verilog language to determine how the source model can be converted to SSA form efficiently while preserving function, and what structure should be used to store the SSA-form data flow. Before converting to SSA form, our system goes through the AST and elaboration phases. After these two processes, we can insert φ functions at join nodes and rename variables conveniently.
Conclusion
In this paper, we presented a straightforward method to mechanically extract formal models directly from the source code of a Verilog design. Many formal verification tools cannot take Verilog source code directly as input, so high-level designs have to be converted to BLIF or BLIF-MV format by third-party tools. In addition, the SSA form can easily be translated back to Verilog, so the correctness of the SSA conversion can be checked by its equivalence with the original Verilog design. Experimental results show that the SSA conversion is function-preserving and robust. The proposed method can serve as the front-end of a formal verification tool, and the extracted model can be exploited efficiently in CEC.
Though we have converted the Verilog model to SSA form successfully, the SSA representation is not guaranteed to be optimal. In the future, we plan to improve and extend the SSA conversion and try to represent the formal model with an optimal SSA form.
