Abstract-A semi-formal verification technique, which performs brute-force compiled simulation with sophisticated search-space pruning, has been proposed and shown to be competitive with state-of-the-art SAT-based verification techniques [1]. This paper presents a novel approach for accelerating semi-formal verification by utilizing hardware/software co-execution. To maximize the gain from hardware acceleration, we propose two novel techniques: hardwired conflict analysis for learning and speculative input pattern generation. We demonstrate that our FPGA-based prototype system achieves about a 7x speedup over the software implementation of the semi-formal verifier.
I. Introduction
As formal verification grows in importance and remains costly, verifying large-scale designs efficiently has become a major challenge. One of the most promising approaches for overcoming this challenge is the use of a hardware accelerator. A number of studies on hardware acceleration of SAT solvers, the core engine of formal verification, are available [2], [3]. However, most hardware-accelerated SAT solvers do not integrate the advanced techniques used in state-of-the-art SAT solvers, such as conflict learning and non-chronological backtracking. Recently, a semi-formal verification technique, which performs brute-force compiled simulation with sophisticated search-space pruning, has been proposed and shown to be competitive with state-of-the-art SAT-based verification techniques. This paper presents a novel approach for accelerating semi-formal verification by utilizing hardware/software co-execution. To maximize the gain from hardware acceleration, we propose several novel techniques, such as hardwired conflict analysis and speculative input pattern generation. Experimental results demonstrate that our FPGA-based prototype system achieves about a 7x speedup over the software-only implementation of the semi-formal verifier.
II. Semi-Formal Verification Technique
Bingham and Hu [1] proposed a semi-formal bounded model checking method based on compiled simulation as a replacement for SAT-based model checkers. The main disadvantage of a simulation-based approach is the exponential number of possible input vectors. The semi-formal method overcomes this disadvantage by pruning the search space in a manner analogous to the advanced techniques used in state-of-the-art SAT solvers.
Procedure 1 (excerpt): $(f, c_{skip}) \leftarrow$ simulate_circuit(v)
In the remainder of this section, we explain the details of the semi-formal verification method, which is summarized in Procedure 1. In the procedure, V maintains the set of input vectors that have already been verified. First, a new input vector that has not yet been verified is picked as a random minterm from the complement of V, and a compiled simulation for that input vector is performed by calling the function simulate_circuit.
In addition to the compiled simulation, the function simulate_circuit computes the skip cube of the primary output (the verification target). A skip cube is a subset of the input values that determines the value of the output independently of the other input values. The skip cube $A_v(w)$ of a wire w with respect to an input vector v is defined as follows:
The skip cube of a gate with more than 2 inputs can be defined similarly. In the procedure, the function simulate_circuit computes the skip cube $A_v(f)$ of the primary output f for each input vector v, in addition to simulating the target circuit. Since any input vector contained in the skip cube leads to the same output, all input vectors in the skip cube can be skipped in subsequent iterations by adding the skip cube to V. Thus, the number of input vectors that are actually simulated is significantly reduced.
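As a rough illustration of the loop described above, the following Python sketch simulates unverified minterms one at a time and adds each returned skip cube to V. The skip cube rule used for AND gates and inverters is an assumption based on the usual controlling-value argument, and the netlist format, the helper names `covers` and `verify`, and the explicit enumeration of minterms are hypothetical choices made for readability rather than the authors' implementation. The check assumes, as in Section III-A, that the target property is that f is 1 for every input vector.

```python
import itertools
import random

# Hypothetical netlist format (not from the paper): a list of
# (name, op, operands) in topological order, where op is "AND" (2 inputs)
# or "NOT" (1 input). Primary inputs are named by their index 0..n-1.

def simulate_circuit(gates, output, v):
    """Simulate input vector v and return (f, skip_cube).

    A skip cube is a dict {input_index: value}; inputs that are absent
    from the dict are don't-cares.
    """
    value = dict(enumerate(v))
    cube = {i: {i: bit} for i, bit in enumerate(v)}
    for name, op, operands in gates:
        if op == "NOT":
            (a,) = operands
            value[name] = 1 - value[a]
            cube[name] = cube[a]
        elif op == "AND":
            a, b = operands
            value[name] = value[a] & value[b]
            if value[name] == 1:
                # Assumed rule: both inputs are needed to justify a 1.
                cube[name] = {**cube[a], **cube[b]}
            else:
                # Assumed rule: one 0 input alone justifies a 0;
                # keep the smaller cube among the 0-valued inputs.
                zeros = [x for x in (a, b) if value[x] == 0]
                cube[name] = min((cube[x] for x in zeros), key=len)
        else:
            raise ValueError(f"unsupported gate type: {op}")
    return value[output], cube[output]

def covers(cube, v):
    """True if input vector v lies inside the cube."""
    return all(v[i] == bit for i, bit in cube.items())

def verify(gates, output, n, seed=0):
    """Return (holds, counterexample, number_of_simulated_vectors)."""
    rng = random.Random(seed)
    verified = []  # V, stored as a list of skip cubes
    minterms = list(itertools.product((0, 1), repeat=n))
    rng.shuffle(minterms)  # stands in for picking a random unverified minterm
    simulated = 0
    for v in minterms:
        if any(covers(c, v) for c in verified):
            continue  # v is already in V, so it is skipped
        f, c_skip = simulate_circuit(gates, output, v)
        simulated += 1
        if f != 1:
            return False, v, simulated
        verified.append(c_skip)
    return True, None, simulated

# f = NOT(x0 AND NOT x0) is always 1; x1 and x2 are irrelevant inputs.
tautology = [("g1", "NOT", (0,)), ("g2", "AND", (0, "g1")), ("f", "NOT", ("g2",))]
print(verify(tautology, "f", 3))  # (True, None, 2)
```

In this toy circuit only 2 of the 8 input vectors are simulated, because each skip cube fixes x0 alone and therefore covers half of the input space.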
III. Hardware/Software Co-Execution System for Formal Verification

A. System Overview
In this paper, we assume that a given verification problem can be transformed into a problem of checking the satisfiability of a circuit. For instance, suppose the given problem is equivalence checking between two circuits. First, a new circuit is generated by adding a gate that computes the exclusive-OR of the primary outputs of the two circuits; equivalence is then proved by checking that the output of the exclusive-OR gate is false for all possible input vectors. Similarly, a property checking problem can be transformed by adding an extra circuit that checks the property, where the output of the extra circuit is true iff the property holds. Several methods have been proposed for generating such an extra circuit [5]. Thus, under the assumption that a target circuit has one and only one primary output f, which is the verification target, our verification system focuses only on the problem of checking whether f is true for all input vectors.
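The equivalence-checking transformation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: circuits are modeled as plain Python functions from an input tuple to a single output bit, and the names `make_miter` and `equivalent_by_enumeration` are hypothetical. Exhaustive enumeration is used here purely to show what is being checked; it is not the pruned search of Procedure 1.

```python
import itertools

def make_miter(circuit_a, circuit_b):
    """Return a single-output function that is 1 iff the two circuits disagree."""
    def miter(v):
        # Exclusive-OR of the two primary outputs.
        return circuit_a(v) ^ circuit_b(v)
    return miter

def equivalent_by_enumeration(circuit_a, circuit_b, n):
    """Check that the miter output is 0 (false) for all 2**n input vectors."""
    miter = make_miter(circuit_a, circuit_b)
    return all(miter(v) == 0 for v in itertools.product((0, 1), repeat=n))

# Two structurally different but functionally identical circuits (De Morgan).
nand = lambda v: 1 - (v[0] & v[1])
or_of_nots = lambda v: (1 - v[0]) | (1 - v[1])
print(equivalent_by_enumeration(nand, or_of_nots, 2))  # True
```

Inverting the miter output gives a single output that must be true for all input vectors, which matches the form of the problem handled by the verification system.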
Initially, we implemented the software semi-formal verifier described in Procedure 1 and performed a runtime profiling analysis using the example circuit from Section IV. The profiling result, presented in Table I, shows that the circuit simulation and the skip cube computation account for most of the execution time. Based on this analysis, we designed a system that performs Procedure 1 in a hardware/software co-execution fashion, as illustrated in Figure 1. The circuit simulation and skip cube computation in the procedure are executed in hardware. We first describe how to efficiently
B. Hardware Emulation with Embedded Skip Cube Computation
In our verification system, we accelerate Procedure 1 by executing the circuit simulation and the skip cube computation (i.e. simulate_circuit) in hardware. The circuit simulation itself can be implemented in hardware easily. However, if the skip cube computation is implemented in hardware exactly as described in Def. 1, it may require N-bit Boolean operations for each gate in the circuit, where N is the number of primary inputs. The size of the skip cube computation circuit then grows as O(G · N), where G is the number of gates, which critically limits the capacity for a target design. To avoid this large overhead, we propose a novel circuit in which every gate embeds fixed skip cube computation logic. The size of our circuit is proportional to the original circuit size regardless of the number of inputs.
First, we slightly modify the definition of the skip cube for efficient hardware implementation as follows: 
Next, we define a propagation flag (p-flag):
In the proposed circuit, we replace each logic gate with a logic block that not only performs the original logic operation but also computes the p-flags. Each wire in the original circuit is replaced with two wires, one for the signal and the other for the p-flag. The direction of the p-flag computation is opposite to that of the logic operation; that is, the p-flags of the gate inputs are computed from the p-flag of the gate output. After the p-flag of the primary output is set to 1, the p-flag of each wire is computed in topological order from the primary output to the primary inputs.
Figure 2(a)-(c) show basic logic gate blocks with embedded p-flag computation. For ease of explanation, it is assumed that a given circuit consists only of 2-input AND gates and inverters. An extension to more general circuits should be straightforward. A circuit for hardware emulation with embedded skip cube computation can be obtained by simply replacing each 2-input AND gate with the 2-input AND block in Figure 2(a), each inverter with the inverter block in Figure 2(b), and each multiple-fanout point with the distribution block in Figure 2(c). Figure 3 illustrates an example of the skip cube computation by the conventional method (Figure 3(a)) and by the proposed hardwired method (Figure 3(b)). The circuit in Figure 3(a) is transformed into the circuit in Figure 3(b) by applying the rules described above.
After computing the p-flags of all wires, we can compute the skip cube $A_v(f)$ of the primary output f by the following theorem:
Theorem: Let C be a cube $(B_1, \ldots, B_N)$ such that
where $v_i$ is the value of the i-th primary input and $P_i$ is the p-flag of the i-th primary input. Then, C is equivalent to the skip cube $A_v(f)$ of the primary output f.
Proof: The proof is omitted due to space limitations.
Thus, we can easily compute the skip cube from the p-flags of the primary inputs.
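The backward p-flag computation can be mimicked in software as a sanity check. The sketch below is a minimal Python model under stated assumptions, since the exact block logic of Figure 2 is not reproduced in this text: an AND block whose output is 1 passes its p-flag to both inputs, an AND block whose output is 0 passes it to one 0-valued input chosen by fixed priority, an inverter passes it through, and a fanout point ORs the p-flags of its branches. The final mapping from p-flags to the cube (an input is kept at its value $v_i$ when $P_i = 1$ and is a don't-care otherwise) is likewise an assumption. The netlist format and the name `simulate_with_pflags` are hypothetical.

```python
def simulate_with_pflags(gates, output, v):
    """Forward-simulate, then propagate p-flags from the output to the inputs.

    gates: list of (name, op, operands) in topological order, with op being
    "AND" (2 inputs) or "NOT" (1 input); primary inputs are named 0..n-1.
    Returns (f, cube), where cube[i] is v[i] if the p-flag of input i is 1
    and None (don't-care) otherwise.
    """
    # Forward pass: ordinary logic simulation.
    value = dict(enumerate(v))
    for name, op, operands in gates:
        if op == "NOT":
            value[name] = 1 - value[operands[0]]
        elif op == "AND":
            value[name] = value[operands[0]] & value[operands[1]]
        else:
            raise ValueError(f"unsupported gate type: {op}")

    # Backward pass: one p-flag bit per wire, starting from the primary output.
    pflag = {wire: 0 for wire in value}
    pflag[output] = 1
    for name, op, operands in reversed(gates):
        p = pflag[name]
        if op == "NOT":
            pflag[operands[0]] |= p      # inverter block: pass through
        else:
            a, b = operands
            if value[name] == 1:
                pflag[a] |= p            # both inputs justify a 1
                pflag[b] |= p
            elif value[a] == 0:
                pflag[a] |= p            # assumed fixed priority: first 0 input
            else:
                pflag[b] |= p
        # The |= models the distribution block at fanout points (assumed OR).

    cube = tuple(v[i] if pflag[i] else None for i in range(len(v)))
    return value[output], cube

# f = NOT(x0 AND NOT x0); inputs x1 and x2 do not influence f.
gates = [("g1", "NOT", (0,)), ("g2", "AND", (0, "g1")), ("f", "NOT", ("g2",))]
print(simulate_with_pflags(gates, "f", (1, 0, 1)))  # (1, (1, None, None))
```

The point of the model is that each wire carries a single extra bit rather than an N-bit cube, so the added state grows with the number of wires and not with the number of primary inputs.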
C. Speculative Input Vector Generation
In a typical hardware/software co-execution system, the communication between hardware and software can be a major performance bottleneck. Every step in Procedure 1 must be executed sequentially because the input vector for an iteration depends on the skip cube of the previous iteration, and vice versa. Therefore, every iteration in the procedure requires at least one distinct hardware/software communication. Since our system uses a generic communication method, UDP/IP over Ethernet, every communication incurs a large overhead.
To mitigate this communication overhead, we reduce the number of communications by slightly modifying the original procedure. Instead of generating one input vector per iteration, a certain number of input vectors are randomly picked from the complement of V and transmitted to the hardware at once. The simulation is performed for all of these input vectors, and the resulting outputs and skip cubes are returned together. The skip cubes are then added to V. A potential drawback of this speculative vector generation is that many input vectors may be redundant because they are contained in the skip cubes computed for other input vectors.
We illustrate the effectiveness of speculative generation with an example in which the equivalence of the lower 8 bits of the outputs of two 16×16-bit multipliers is checked. Figure 4(a) presents the total number of communications $N_C$ when $N_S$ input vectors are generated speculatively for each communication. The graph shows that speculative generation dramatically reduces the number of communications. Figure 4(b) presents the total number of input vectors $N_V$ generated for the full verification, i.e. $N_V = N_C \times N_S$. Speculative generation is effective up to $N_S = 1000$. Figure 4(c) presents the estimated runtime under the practical assumption that the computation for each iteration requires 25 µs and a single communication requires 1 ms. In this example, $N_S = 1000$ yields the shortest estimated runtime. This example demonstrates that speculative generation can dramatically reduce the communication overhead.
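The runtime estimate behind Figure 4(c) can be written as a simple cost model, sketched below in Python. The model assumes that the total time is the per-vector computation time multiplied by $N_V = N_C \times N_S$ plus the per-communication time multiplied by $N_C$, using the 25 µs and 1 ms figures quoted above. The function name `estimated_runtime` and the communication counts in the usage example are hypothetical and are not the measured values of Figure 4.

```python
def estimated_runtime(n_c, n_s, t_iter=25e-6, t_comm=1e-3):
    """Estimated total runtime in seconds.

    n_c:    total number of hardware/software communications (N_C)
    n_s:    input vectors generated speculatively per communication (N_S)
    t_iter: computation time per input vector (25 us in the text)
    t_comm: time per communication (1 ms in the text)
    """
    n_v = n_c * n_s  # total number of input vectors, N_V = N_C * N_S
    return n_v * t_iter + n_c * t_comm

# Hypothetical counts, chosen only to show the trade-off: batching sends more
# vectors in total (some are redundant) but needs far fewer communications.
no_batching = estimated_runtime(n_c=40_000, n_s=1)
batched = estimated_runtime(n_c=60, n_s=1000)
print(f"N_S = 1:    {no_batching:.2f} s")
print(f"N_S = 1000: {batched:.2f} s")
```

With $N_S = 1$ the communication term dominates, whereas for large $N_S$ the cost of redundant vectors takes over, which is why an intermediate batch size minimizes the estimate.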
IV. Experimental Results
We implemented the proposed hardware/software co-execution verification system described in Section III. The software portion runs on a Linux PC with a 2.0 GHz Core 2 Duo processor and 2 GB of main memory, and the hardware portion is implemented on a Xilinx Virtex-II Pro FPGA with an embedded PowerPC processor. UDP/IP over Ethernet is used for the communication between the PC and the FPGA, and the embedded processor in the FPGA controls the UDP/IP communication.
We used a 16×16-bit multiplier as an example circuit, and our system was used to check the equivalence of the lower n bits of the outputs. As a reference, we also implemented the semi-formal model checker explained in Section II in software and performed the same equivalence checking. Table II compares the runtimes of software-only execution and hardware/software co-execution, and also shows the speedup factor. The results show that our system was about 7x faster than the software-only implementation of the state-of-the-art semi-formal model checker.
V. Conclusions
In this paper, we proposed a novel hardware/software co-execution approach for accelerating semi-formal verification. To maximize the gain from hardware acceleration, we proposed several novel techniques, such as hardwired conflict analysis and speculative input pattern generation. Despite the recent progress in formal verification, conventional simulation is still widely used for functional verification. From another perspective, our approach can be regarded as a technique for enhancing the coverage of simulation-based verification. Experimental results demonstrated that our FPGA-based prototype system achieved about a 7x speedup over the software implementation of the semi-formal verifier.
