Conventional tools for formal hardware/software co-verification use bounded model checking techniques to construct a single monolithic propositional formula. Formulas generated in this way are extremely complex and contain a great deal of irrelevant logic, hence are difficult to solve even by the state-of-the-art Satisfiability (SAT) solvers. In a typical hardware/software co-design the firmware only exercises a fraction of the hardware state-space, and we can use this observation to generate simpler and more concise formulas. In this paper, we present a novel verification algorithm for hardware/software co-designs that identify partitions of the firmware and the hardware logic pertaining to the feasible execution paths by means of path-based symbolic simulation with custom path-pruning, propertyguided slicing and incremental SAT solving. We have implemented this approach in our tool COVERIF. We have experimentally compared COVERIF with HW-CBMC, a monolithic BMC based co-verification tool, and observed an average speed-up of 5× over HW-CBMC for proving safety properties as well as detecting critical co-design bugs in an open-source Universal Asynchronous Receiver Transmitter design and a large SoC design.
INTRODUCTION
In modern embedded system development, software and hardware components are designed and implemented in parallel. Hardware/software co-verification is performed throughout the design cycle to ensure that both components work correctly together.
Before Register Transfer Level (RTL) code exists for the hardware components, engineers write abstract models of the proposed hardware; such models are commonly known as Transaction Level Models (TLM) [2] . TLMs are typically implemented using the Sys-temC TLM library [20] or as plain C programs. TLMs capture enough functionality of the hardware (HW) to enable executing and debugging of the software (SW) or firmware (FW) before the RTL is available [2, 3] , but TLMs are always incomplete. Co-verification of the TLM and the SW is typically performed by testing [1, 2] . We use the term FW and SW interchangeably in this paper.
Once RTL coding for the hardware components is complete (that is, post-RTL), hardware/software co-verification becomes more complex. Unlike the TLM, which only captures limited design functionality, the RTL code describes the cycle-accurate behavior of the final HW, and contains many extra-functional artefacts related to power, area, and timing. Because of the RTL's detail and complexity, the effectiveness of testing is severely limited in post-RTL coverification. Formal verification is mandatory to ensure correctness. Note that whenever we refer to "hardware" from this point onwards, we mean an RTL implementation and not a TLM.
The verification of SW written in C/C++ together with RTL coded in Verilog/VHDL is very challenging. First, there is a timing mismatch between the synchronous clock-driven HW model and asynchronous event-driven SW model. For example, the FW running on a processor could be much faster or slower than the HW model it interacts with. Second, there are no standard languages or techniques for specifying properties of HW/SW co-designs. Third, the hardware is highly concurrent and the software are frequently multithreaded; leading to a large number of event interleavings to analyze. Finally, there are few automated formal HW/SW co-verification tools that support co-designs implemented in C/C++ and Verilog.
Recently, Mukherjee et al. [15] presented a formal HW/SW coverification tool, called HW-CBMC, that constructs a combined HW-SW model through in-tandem symbolic execution of the SW and the RTL code. SAT-based Bounded Model Checking (BMC) [6] is used to prove safety properties of the combined model. We will refer to the combined HW-SW model as the co-verification model.
In this paper, we build on the observation that monolithic BMC of the co-verification model leads to propositional SAT formulas containing much irrelevant logic. The size and complexity of the formulas pose difficulties for SAT solvers, making the approach ineffective for practical co-verification problems. An effective and practical co-verification solution must reason only about "relevant" interactions between the SW and HW. The notion of relevance stems from a few sources: 1) the property or co-specification to be proved, 2) the behavior of the software, and 3) environmental assumptions. First, the co-specification in a HW/SW co-design captures the design intent that is to be verified. A scalable HW/SW co-verification tool need only check those parts of the co-verification model that pertain to the given co-specification model. Second, the SW in a typical HW/SW co-design only exercises a fragment of the HW state-space [8, 18, 19] . Formal tools may use this fact to verify only the HW functionality exercised by the SW, ignoring or abstracting the rest. This approach can generate much simpler and concise formulas than those arising in monolithic BMC -most importantly, formulas that are more readily solved by state-of-the-art SAT solvers. Finally, assumptions about the environment of a HW/SW co-design may be exploited to further constrain the verification state-space.
In this paper, we present a novel verification algorithm for HW/SW co-designs that identifies partitions of the SW and the HW logic pertaining to the feasible execution paths by means of path-based symbolic execution with custom path-pruning, property-driven slicing, and incremental SAT solving (see Section 3). We employ these techniques in our tool COVERIF to demonstrate that the domainspecific optimizations in COVERIF lead to more scalable reasoning for HW/SW co-designs compared to HW-CBMC.
Contributions
In this paper, we present a novel verification algorithm for HW/SW co-designs called, COVERIF, using path-based symbolic simulation with custom path-pruning, property-guided slicing, and incremental SAT solving techniques. COVERIF supports HW designs in Verilog RTL (IEEE SystemVerilog 2005 standards) and SW in ANSI-C (C89, C99 standards). We experimentally compare two approaches for formal HW/SW co-verification -1) the monolithic approach used in HW-CBMC, and 2) the path-based approach of COVERIF.
We study an open-source Universal Asynchronous Receiver Transmitter (UART) design and a large SoC design, and we find that COVERIF is 5× faster than HW-CBMC for proving safety properties as well as for detecting critical co-design bugs. Figure 1 gives a fragment of a SW driver for a UART design. The main module of the UART SW, shown on the right side of Figure 1 , begins by resetting the UART HW which is followed by a wb_idle() function (explained next). The SW implements Linux style inb() and outb() functions which further invoke the wb() class of functions to communicate with the UART HW. The SW then configures the UART in loopback mode using outb() function calls. The wb() class of interface functions, shown on the left side of Figure 1 , communicate with the wishbone bus interface to set/reset (wiggle) the UART input ports and read/write data through the bus interface. The calls to the top-level UART module is represented by U ART _top() function. We verify that the transmitted data is the same as the received data in the loopback mode. To do so, we place an assertion given by the assert() statement (marked in red) inside the main logic of the UART SW, on the right-side of Figure 1 . Figure 2 shows our proposed verification methodology, as implemented in COVERIF. We now describe each step in detail.
WORKING EXAMPLE

PROPOSED METHODOLOGY
Step 1: Generating Software Netlist from HW A HW circuit, given in Verilog RTL, is automatically synthesized into a cycleaccurate and bit-precise software netlist [14, 16, 17] model following synthesis semantics. The software netlist model is represented as a C program 1 which retains the word-level structure as well as module hierarchy of the Verilog RTL. In contrast with conventional RTL synthesis into a netlist, our software netlist exists solely to facilitate hardware/software co-verification. Figure 3 shows an example of Verilog RTL circuit containing both sequential and combinational elements. Column 2,3 in Figure 3 shows the formal semantics of the Verilog RTL and the synthesized HW respectively. The equivalent software netlist model is shown in column 4. UART_top(); we_i = 0; cyc_i = 0; stb_i = 0; } int main() { wb_reset(); wb_idle(); // Configure the uart outb (0x13, UART_MC); outb (0x80, UART_CM3); outb (0x00, UART_CM2); outb (0x00, UART_CM1); outb (0x00, UART_CR); outb (0x03, UART_IE); char tx_b[] = "Hello world"; _u8 status = 0; char rx_b[100]; int i=0,c=0,d=0; Step 2: Sequentializing Interactions of Firmware and Software Netlist Concurrency is a key problem for co-verification. Testing concurrent threads requires exploring all possible interleavings between HW and SW threads. The number of interleavings could potentially be exponential. However, we observe a specific interaction pattern, which resembles a producer-consumer relationship. That is, a FW thread is mostly independent of the HW thread it interacts with [1, 2] . Specifically, a FW thread is only responsible for configuring the HW by writing to memory mapped registers, Verilog Formal Semantics Synthesized Hardware Software netlist module t o p ( c l k , a ) ; i n p u t c l k , a ; reg b , d , e ; w i re c , co n d ; a s s i g n c = e ? 1 ' b0 : d ; a s s i g n co n d = a ; a l w a y s @( p o sed g e c l k ) b e g i n b <=a ; i f ( co n d && b ) e <=b ; e l s e e <=0 ; d <=c ; end endmodule or polling the interrupt status register for data transmission, or receiving incoming data packets. Furthermore, we observe producerconsumer interaction patterns in many practical industrial co-designs from IBM [8, 18, 19] , RockBox Media Player [1] , and others codesigns. In this paper, we verify co-designs that exhibit producerconsumer interaction behavior. A co-verification model, M seq , is constructed through sequential composition of the FW and its interacting software netlist.
Step 4: Property Driven Slicing of Co-verification Model A property-driven slicing is performed on the unified co-verification model, M seq . This step is purely syntactic, meaning that we perform a backward dependency analysis starting from the property which only preserve those program fragments that are relevant to a given property. The sliced program is then passed to the symbolic execution engine for path-based exploration.
Step 5: Co-verification Using Path-based Symbolic Execution Given a co-verification model, M seq , a scenario, S typically represented by assume properties, and a co-design property expressed as assert(c) (where c is a condition stated in terms of variables in M seq ) as input, COVERIF performs path-based exploration of M seq to automatically check its validity using backend solvers. If the condition c does not hold, then M seq is said to have violated the property.
A typical path-based symbolic execution engine might explore a path until it come to an assert(c) statement. This whole path can then be posed as a query to a SAT solver to see if the assertion is violated at that point. If the path is infeasible, the assertion holds trivially. If a large number of paths are infeasible, symbolic execution may waste time exploring them. COVERIF employs an eager infeasibility check to prune infeasible paths, as well as incremental encoding that makes it easier for the underlying SAT solver.
Alg. 1 shows the overall algorithm of COVERIF. States mentioned in the algorithm are all symbolic states, which are quantifierfree predicates characterizing a set of program states. Symbolic execution starts with an initial symbolic state I (x), is a quantifier-free predicate over program variables x, and the first statement stmt to be executed. Note that we assume all program variables have finite bit-width and thus can be represented as bit-vectors. Every statement acts as a state transformer during the symbolic execution. worklist maintains the set of symbolic states, along with the corresponding stmt that should execute next. Assumptions can be specified in the program using assume(c) statements, where c is the condition expressed in terms of program variables. Assumptions restrict the search to only those states for which the condition c holds at the program point where assume(c) is encountered. For example, suppose (x! = 0) characterizes the set of states that has been discovered to be reachable so far by a verification tool. Here, x is a program variable. Upon encountering assume(x > 0), the set of states reachable at the point of assumption is shrunk to only those that satisfy (x > 0). A user can specify assumptions to restrict the verification to only certain regions of the program's state space.
COVERIF performs a feasibility check at an assume statement or a branch (Line 6 and 9) to ensure that only feasible symbolic states are kept in the worklist. This ensures that the infeasibility is detected as early as possible. If an assertion is violated, then a counterexample is detected and Alg. 1 terminates (Line 14). In all other cases, symex(S, stmt) performs one step of symbolic execution by executing stmt from the symbolic state S (Line 22). If no further statement remains to be executed along the path that is being explored, then stmt is assigned the value ⊥. The symbolic state is put in the worklist only if there are further statements remaining (Line 24).
The feasibility checks shown as isFeasible pose a query to the underlying SAT solver. Note that Alg. 1 does not refer to how the methods worklist.put and worklist.get work. In principle, one can use any search heuristic to select which symbolic state to explore further from the worklist. In the current version COVERIF employs a depth first strategy of exploration.
Apart from the eager infeasibility check, another crucial optimization is the use of incremental SAT solving. During the symbolic execution, only one solver instance is maintained while going down a single path. Thus, when making a feasibility check from one branch b 1 to another branch b 2 along a single path, only the program segment from b 1 to b 2 is encoded as a constraint and added to the existing solver instance. This results in speeding up the process of feasibility check of the symbolic state at b 2 as the feasibility check at b 1 was true. A new solver instance is used to explore a different path, after the current path is detected as infeasible.
The eager infeasibility check restricts the search to explore only those SW/HW interactions which are feasible under a given scenario. In our experiments, we find this optimization has a large effect on runtimes. Though COVERIF poses many queries to the SAT solver, each query is relatively simple due to two reasons: the resultant formula encodes only a single path, and exploration along a path only needs to encode and solve for the path segment (along with the existing constraints) from the last point of query.
HW-CBMC: MONOLITHIC HW/SW CO-VERIFICATION
We briefly describe the working of the HW-SW co-verification tool, HW-CBMC. In contrast to the path-based approach, in HW-CBMC, the symbolic execution of HW and SW models are clearly separate and the two flows meet only at the solver phase, where a complex monolithic formula is generated in Conjunctive Normal Form (CNF) from the HW and SW designs. This complex formula is passed on to the solver for verification purpose. Furthermore, HW-CBMC provides specific handshake primitives such as next_time f rame() and set_inputs() to model FW-HW communications.
PROPERTIES
Lack of support for property specifications in a HW/SW co-design is one of the stumbling blocks for the application of formal techniques in co-verification. We express a co-design property in C language, which is discussed next. Figure 4 shows an example of a temporal property for the UART HW. The System Verilog Assertion (SVA) is shown on the left and the equivalent property in C semantics is shown on the right. The temporal delay (##2) of the SVA on the left is simulated by the calls to the top-level UART module in the software netlist, which is represented by U ART _top() function in the right column. 
EXPERIMENTAL RESULTS
We report experimental results for SW-HW co-verification of a UART and a SoC design. All our experiments were performed on an Intel Xeon 3.0 GHz machine with 48 GB RAM. All times reported are in seconds. The timeout for all our experiments was set to 2 hours. The performance of bit-level and word-level flow in HW-CBMC are similar. So, we only report bit-level results for HW-CBMC. MiniSAT-2.2.0 [7] was used as underlying SAT solver with HW-CBMC and COVERIF. The focus of our experiments is to compare the performance of COVERIF against HW-CBMC for verification of an UART design and a SoC design. We distribute our tool COV-ERIF, along with the HW/SW co-design benchmarks here 2 .
Comparison with Other HW/SW Co-verification Tools: Despite extensive use of model checking and other formal methods in SW verification or HW verification domain, building automated HW/SW co-verification tools using formal methods has received little attention in the past. Other than HW-CBMC [15] , we are not aware of any other automated formal co-verification tool in the literature that can readily accept co-designs written in C/C++ and Verilog RTL. Hence, we only compare our results with HW-CBMC in this paper. 
HW/SW co-verification of UART
About UART: A UART core is used for asynchronous transmission and reception of data which provides serial communication capabilities with a modem or other external devices. The UART is compliant with industry standards for UART and interfaces with the wishbone bus. The design statistics for UART is shown in Table 1 .
The UART core is configured in 3 different operating modes, namelytransmission without interrupt enabled (Scenario A), transmission with interrupt enabled (Scenario B) which transmit non-deterministic data through the serial output while the receiver module is inactive. In Scenario C, the UART is configured in loopback mode with interrupt enabled in which both the transmitter and the receiver are active. The data-width varies in each mode, ranging from 8 bits to 64 bits. Discussion of Result: Table 2 reports the run times for bounded safety proofs of co-design properties in UART core. Column 1 in Table 2 gives the name of the scenario, Column 2 gives the maximum loop unroll bound of the firmware, column 3-7 present the runtime using HW-CBMC, total/feasible path counts, COVERIF, respectively. Table 2 shows that COVERIF dominates HW-CBMC in all scenarios (marked in bold). COVERIF is on average 8× faster than HW-CBMC, both for proving safety as well as detecting bugs. Thus, COVERIF outperforms HW-CBMC in all scenarios. We verified a total of 39 properties of the UART design. Table 2 reports some of the representative properties. In Scenario A and scenario B, we verified whether the transmitted data (32-bit or 64-bit) is available through the serial output port after a pre-determined number of clock cycles. In both of the configurations, COVERIF is able to prune the receiver logic since the SW only exercises the transmitter module by appropriately configuring the memorymapped registers. In Scenario C, we verified whether the transmitted data matches the data received when the UART is configured in loopback mode. We found several bugs in the open source UART obtained from http://www.opencores.org. The bottom part of Table 2 reports the runtimes for detecting data-path and control-path bugs.
HW/SW Co-verification of System-on-Chip
About the SoC:
We obtained an open source System-on-Chip design from [21] . It consists of an 8051 micro-controller, a memory arbiter, an external memory (XRAM) and cryptographic accelerators, as shown in Figure 6 . The design statistics for SoC is given in accelerators and the XRAM by reading or writing to XRAM addresses. The arbitration of these module is done by the memory arbiter module. The FW initiates the operation in the SoC by first writing to an initial memory-mapped register. The FW implements Linux-style inb() and outb() functions calls, which are used to communicate with the HW ports. The FW writes a sequence of nondeterministic data to the XRAM port and then reads the data from the same port. The cryptographic accelerators use direct memory access to fetch the data from the external memory. The completion of the operation is determined by polling the appropriate memorymapped registers in the FW. Discussion of Result: Table 3 gives the runtimes for the bounded safety proofs of the SoC design. Column 1 gives the name of the scenario, column 2 report the maximum loop unroll bound of the FW. Column 3-5 present the verification runtimes using HW-CBMC, COVERIF and the verification outcome respectively. The result in Table 3 shows that COVERIF is approximately 2× faster than HW-CBMC for proving safety. For detecting bugs, the speedup is 6× for COVERIF over HW-CBMC. In the case of SoC scenario (data_transfer), the SW exercises only the micro-controller and transfer sequence of bytes to the XRAM port bypassing peripherals connected to other ports such as hardware accelerator. This scenario allows path-based symbolic execution engine in COVERIF to prune the logic for the accelerator and generate only relevant verification conditions for the micro-controller and XRAM. It is important to note that forward symbolic execution without these optimizations timeout for all the benchmarks.
We verifies a total of 19 properties for the SoC co-design. Due to space limitations, we report 4 representative properties in Table 3 .
We check whether the acknowledgement for data transmission and data reception arrives from the micro-controller in the correct cycle. We verify whether the non-deterministic data transmitted through outb() is the same as the data received through inb(). We also verify that reading/writing to the appropriate memory-mapped registers produce the correct result during the data transmission phase. We found one critical control bug in the SoC design. The bug is manifested when memory arbiter hardware wrongly arbitrates the port selection thereby forcing the write strobe for the external RAM to be LOW. This violates the data transfer protocol in the SoC design.
LIMITATIONS OF PROPOSED APPROACH
The primary motivation for constructing a sequential single threaded unified co-verification model is to avoid enumerating exponential number of interleavings between the HW and SW threads. We have shown that this is extremely beneficial for co-designs that exhibits producer-consumer relationship. Such interaction pattern is prevalent in many practical co-designs [1, 8, 18, 19] . However, the proposed verification approach is not applicable for co-designs that exhibits true concurrency [5] , that is, when a SW and its interacting HW threads do not exhibit producer-consumer relationship. In this case, it is imperative to consider all possible interleavings between participating threads in an efficient manner.
RELATED WORK
Previous work [2, 3, 9, 22] for co-verification have addressed the problem at the pre-RTL phase. However, we address the co-verification problem at the post-RTL phase [11, 13] where a key risk is divergence of the HW RTL from the behavior expected by the SW. Generating a unified co-verification model is a well-known technique in HW/SW co-verification. Notably, Kurshan et al. modeled HW and SW using finite state machines [10] , Monniaux in [12] modeled HW and SW as C programs which are formally pushdown systems (PDS), Li et al. in [11] used Buchi Automata to abstractly model a hardware and PDS to abstractly model a software to generate a unified SW-HW model, called Buchi Pushdown System (BPDS). In this paper, we construct a unified sequential coverification model in C language. Common practice in industry for system-level co-verification is to either use emulators/accelerators or Instruction Set Simulators (ISS) [4] . However, no rigorous formal verification effort is performed at the post-RTL phase to ensure the validity of the SW-HW interactions.
CONCLUDING REMARKS
In this paper, we presented a formal HW/SW co-verification tool called COVERIF. In a typical HW/SW co-design, the software only exercises a fragment of the HW state-space. This renders many interactions between HW and SW modules infeasible. Our general observation is that the bounded model checking technique in HW-CBMC cannot prune irrelevant logic, and hence generates formulas that are extremely difficult to solve with a SAT/SMT solver. In contrast, the path-based exploration strategy in COVERIF is able to automatically prune design logic with respect to a given configuration (scenario), owing to domain-specific optimizations such as eager path pruning combined with incremental SAT solving and propertyguided slicing. Our experiments show that COVERIF is on average 5× faster than HW-CBMC for proving safety as well as for finding critical bugs. In the future, we plan to extend COVERIF to support HW/SW co-designs that exhibit further interaction patterns as well as implement efficient domain-specific path-merging techniques.
