Abstract-The current verification flow of complex systems uses different engines synergistically: virtual prototyping, formal verification, simulation, emulation, and FPGA prototyping. However, none of them can verify a complete architecture on its own. Furthermore, hybrid approaches aiming at complete verification use techniques that lower the overall complexity by raising the abstraction level. This work focuses on the verification of complex systems at the RT level in order to capture hardware-specific details. Our results show an improvement of 100% compared to the commercial tool's results for the prototype we used to validate our approach.
I. INTRODUCTION
Systems-on-chip (SoCs) are widespread nowadays, covering a wide spectrum of electronics, e.g., cell phones, tablets, and cars. Along with this variety of applications, SoC complexity is increasing and will keep increasing in the next generations [1]. For a single SoC architecture, the ITRS System Integration group predicts an increase in application processors from 9 elements in 2017 to 18 in 2020 and 36 in 2025, and in graphics processing units from 19 elements in 2017 to 58 in 2020 and 247 in 2025. Also according to [1], "the degree of integration after 2008 keeps increasing to meet the demands of (i) higher computation performance, (ii) faster wireless connections, and (iii) richer multimedia capabilities."
However, the increase in complexity and functionality has a hidden cost: "The increasing number of heterogeneous components (RF, logic, memory, and MEMS) complicates the system design" [1], and complete verification of such systems is practically impossible [2]. Different engines address the verification problem at each stage of development. Simulation, emulation, FPGA prototyping, and formal verification are currently the preferred engines for hardware verification in industry [3]. Nevertheless, so far no engine can provide 100% coverage for a complex architecture.
Research on solutions to these verification gaps is broad and active in both academia and industry, and many techniques and methodologies exist [4]. For example, some techniques verify an architecture at a higher level of abstraction [5], while others divide it into smaller sub-blocks and verify them separately. Both have advantages and disadvantages. The first technique speeds up the verification process by raising the abstraction level, but it gives meaningful results only for the functionality of the hardware, not for its low-level behavior, e.g., timing and parallelism. Therefore, it is useful only in the initial phases of development. The second technique lowers the system complexity by verifying small portions instead of the whole system at once; however, it cannot cover the interactions between the many sub-blocks that compose the system.
Even though these and many other techniques have their advantages, it is still necessary to verify the entire architecture in the later stages of development, before moving to the physical implementation. Only by stimulating the complete architecture can all its functionalities run in parallel and expose the corner cases that need deeper verification.
Coming back to the four engines used in the industry, each serves a different purpose and applies to different stages of development.
Formal verification can exhaustively verify IPs and small subsystems, achieving complete coverage [6], and fits well in the early stages of development. However, state space explosion keeps it from scaling to more complex architectures.
Simulation and emulation stimulate an architecture with specific test vectors to generate intermediate and output values [7], and they fit best later in the development flow, when the architecture is more stable and all sub-blocks are verified. Although both are scalable, they cannot cover every possible test case.
FPGA prototyping applies to mature architectures [8] and allows at-speed testing with the embedded software, although at the expense of reduced internal observability.
To improve the verification results for large architectures, an increasingly popular trend in industry is the synergy between verification engines [9], where the verification team seeks to apply each engine in the development phase where it is most effective. One simple but powerful example is deep dynamic formal verification [10]: simulation drives the architecture to a specific state, and from there, formal tools try to verify a smaller set of states. This technique relies on the quality of the input vectors to drive the system to the desired deep state or corner case. However, this can be challenging and time-consuming to perform iteratively, mostly because millions of cycles must be simulated to reach the desired state.
The aim of this work is to increase coverage by using dynamic data to cover a larger set of states without resorting to deep states. The growing complexity of SoC architectures in every new generation of electronics highlights the need for new approaches to the verification problem. This work proposes a "build-and-prove" process tied to a static register assignment (SRA) heuristic to reduce the state space for formal verification.
The main contribution of this work is a scalable, hybrid, iterative flow, driven by embedded software simulation, that improves verification productivity. The build-and-prove process tries to verify subsystems that grow at each iteration until it reaches the complete architecture. Simulation runs help reduce the state space for the formal verification process and avoid state space explosion. The simulation process uses the embedded software to provide the dynamic data that constrain the architecture during the semiformal phases. The SRA process iteratively refines these constraints.
The next sections of this paper are organized as follows: Section II describes the related work. Section III presents the developed work in detail. Section IV summarizes the results after applying this work to a test case. Finally, Section V concludes this paper.
II. RELATED WORK
Mukherjee et al. [11] propose a flow to translate RTL code to ANSI-C code and apply different formal techniques for software to it, e.g. bounded model checking, path-based symbolic simulation, and abstract interpretation. The idea is to increase the abstraction level and simplify the proofs in order to get results faster. However, due to hardware's nature, software models cannot accurately describe some of its peculiarities. An example is the generation of netlists from high-level models using technologies such as high-level synthesis (HLS). Since current HLS tools cannot capture specific hardware details from the software description, e.g. parallelism and pipelining, they do not implement the developer's intent correctly.
Herber [12] aims at hardware/software co-verification by partitioning SystemC models to achieve a scalable flow. In this work, different engines verify different aspects of an architecture. For instance, a satisfiability-modulo-theory (SMT) tool verifies synchronous components of the RTL. The proposed tool splits the verification task into hardware, software, and system level. However, as useful as this approach may be, it is only applicable to the initial stages of design since the granularity level is too coarse for deep verification.
Große et al. [13] divide the formal verification of SystemC models into three steps. The first step checks the hardware blocks separately. The second step verifies the hardware/software interface using the results from the previous step. The third step verifies the embedded software (ESW). As is the case with the previously cited works, this approach targets a higher-level description, i.e., SystemC, and thus fails to address characteristics inherent to the hardware, focusing only on its functionality.
The above-mentioned works show a gap in low-level verification, since all of them employ higher-level abstractions to improve verification results at the expense of fine-grained details. This work closes this gap by focusing exclusively on the RT level and using a "build-and-prove" system tied to a novel heuristic that avoids state space explosion as much as possible.
III. SCALABLE SEMIFORMAL HARDWARE VERIFICATION METHODOLOGY
As mentioned above, large systems cannot be completely verified using formal methods due to state space explosion. To overcome this problem, we developed an iterative build-and-prove system. It starts the verification process by proving a small subsystem and grows it iteratively, IP by IP. The verification begins with formal methods and, when they become insufficient, a proposed semiformal heuristic helps overcome the system complexity.
We propose the HWVerifyr verification approach in this work, which has five phases: (1) RTL preprocessing, (2) formal and (3) semiformal verification of the IPs, and (4) formal and (5) semiformal verification of subsystems using the build-and-prove process. Algorithm 1 describes the proposed flow.
The next sections describe each phase in detail as well as our developed SRA heuristic to improve the verification process.
A. Static Register Assignment Heuristic
The Static Register Assignment (SRA) heuristic's goal is to reduce the state space using dynamic information. Each time HWVerifyr calls SRA, it uses information from the simulation run.
To achieve the best results, it is important to scale down the state space without over-constraining it; otherwise, the constraints can make errors unreachable. SRA addresses this point by using only registers that are mapped between the RTL and the embedded software, i.e., elements that can be reached from the "user" side. This avoids using elements that the user has no control over, e.g., I/O interfaces.
SRA begins by building the netlist of the design under verification (DUV), either a single IP or a subsystem. From this netlist, SRA calculates the cone-of-relevance (COR) for each of the mapped registers and ranks them from highest to lowest. The output is a list of ranked registers for the semiformal verification phases.
The COR is a measure of the influence of a register based on its breadth and depth. The breadth indicates how many paths start at the register. The depth indicates how many state elements a register connects to, either directly or indirectly. Breadth receives a greater weight because a register that drives multiple paths influences more of the design. The measure starts at each mapped register and covers all the logic from it to the outputs. For each path connected to a register, the COR of that register increases by 100 points, and for each element connected along each path, it increases by 1 point. Figure 1 presents a graphical illustration of this concept.
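The scoring rule above can be sketched in a few lines of Python. This is a minimal sketch, not the HWVerifyr implementation: it assumes the path and element counts per register have already been extracted from the netlist (that extraction is not shown), and it assumes the element count is summed over all paths, since the text does not say whether elements shared by several paths are counted once. The example uses the counts the text gives for Figure 1.

```python
from dataclasses import dataclass
from typing import List

PATH_POINTS = 100    # points per path starting at the register (breadth)
ELEMENT_POINTS = 1   # points per element along those paths (depth)


@dataclass(frozen=True)
class RegisterCone:
    name: str
    paths: int     # paths from the register to the outputs
    elements: int  # elements along those paths, summed over all paths (assumed)


def cor(cone: RegisterCone) -> int:
    """Cone-of-relevance score: breadth weighted far above depth."""
    return PATH_POINTS * cone.paths + ELEMENT_POINTS * cone.elements


def rank_registers(cones: List[RegisterCone]) -> List[RegisterCone]:
    # Highest COR first; ties broken by name to keep the order deterministic.
    return sorted(cones, key=lambda c: (-cor(c), c.name))


figure1 = [RegisterCone("register 1", paths=1, elements=9),
           RegisterCone("register 2", paths=3, elements=13)]
for cone in rank_registers(figure1):
    print(cone.name, cor(cone))
# register 2 313
# register 1 109
```

With a weight of 100 per path, a register with more paths outranks one with fewer paths unless the latter drives at least 100 more elements, which is how the heuristic favors breadth over depth.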
In Figure 1, register 1 connects to one path and nine logic elements, and register 2 connects to three paths and 13 logic elements, which gives COR values of 100 + 9 = 109 and 300 + 13 = 313, respectively. The resulting list therefore has register 2 in first place and register 1 in second: register 1 has a deeper but narrower relevance, whereas register 2 has a shallower but broader relevance.
B. Phase 1: RTL preprocessing
Phase 1 performs five preprocessing actions. First, it extracts from the RTL the IPs that compose the architecture (Line 3). The result is a list of all unique IPs in the architecture, used by Phases 2 and 3, which verify each IP separately.
Second, it ranks the instantiated IPs according to their connectivity (Line 4). The result is the ranked list of elements used by phases 4 and 5. These phases are the base of the iterative build-and-prove process introduced in Section III-E.
Third, it merges all source files to generate a single file, which has all accessed addresses explicitly encoded. We use the tool cilly (Line 5), from the C Intermediate Language framework [9] , to perform this action.
Fourth, it maps the registers between ESW and RTL using the single source file generated in the previous action (Line 6). From this file, HWVerifyr generates the mapping between the implementation of the RTL registers and the locations where the ESW accesses them. This mapping is the key point of the SRA heuristic developed in this work.
Fifth, it groups the user-provided formal properties by the IP(s) they cover (Line 7). The model checker uses these groups in all following phases.
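The preprocessing actions that do not depend on external tools can be sketched in Python. This is a minimal sketch under stated assumptions, not HWVerifyr's code: the design is given as a hypothetical instance-to-IP dictionary plus a list of connected instance pairs; "connectivity" is assumed to mean the number of distinct neighbouring instances; the RTL register map and the ESW accesses (which in the flow come from the merged cilly output) are hypothetical name/address inputs; and each property is assumed to list the IPs it references. The merge with cilly (the third action) is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unique_ips(instances: Dict[str, str]) -> List[str]:
    """Action 1: one entry per IP type, however often it is instantiated."""
    return sorted(set(instances.values()))


def rank_by_connectivity(instances: Dict[str, str],
                         connections: Iterable[Tuple[str, str]]) -> List[str]:
    """Action 2: rank instances by their number of distinct neighbours.
    This connectivity metric is an assumption; ties are broken by name."""
    neighbours: Dict[str, set] = defaultdict(set)
    for a, b in connections:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return sorted(instances, key=lambda inst: (-len(neighbours[inst]), inst))


def map_registers(rtl_registers: Dict[str, int],
                  esw_accesses: Iterable[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Action 4: map each RTL register (name -> address, hypothetical format)
    to the ESW source locations that access its address. Registers the ESW
    never touches are dropped, since the user cannot drive them."""
    by_address = {addr: name for name, addr in rtl_registers.items()}
    mapping: Dict[str, List[str]] = defaultdict(list)
    for location, addr in esw_accesses:
        if addr in by_address:
            mapping[by_address[addr]].append(location)
    return dict(mapping)


def group_properties(properties: Dict[str, Iterable[str]]) -> Dict[Tuple[str, ...], List[str]]:
    """Action 5: key each property by the sorted tuple of IPs it covers."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for prop, ips in properties.items():
        groups[tuple(sorted(set(ips)))].append(prop)
    return dict(groups)


def properties_for(groups: Dict[Tuple[str, ...], List[str]],
                   duv_ips: Iterable[str]) -> List[str]:
    """Assumed usage: the properties whose IPs are all present in the DUV."""
    present = set(duv_ips)
    return sorted(p for ips, props in groups.items() if set(ips) <= present for p in props)


# Hypothetical toy inputs, not the case study of Section IV.
instances = {"cpu0": "cpu", "ram0": "ram", "ram1": "ram", "uart0": "uart"}
connections = [("cpu0", "ram0"), ("cpu0", "ram1"), ("cpu0", "uart0")]
regs = {"CTRL": 0x9000_0000, "STATUS": 0x9000_0004}
accesses = [("main.c:12", 0x9000_0000), ("main.c:20", 0x9000_0004), ("main.c:31", 0x9000_0000)]
print(rank_by_connectivity(instances, connections))  # ['cpu0', 'ram0', 'ram1', 'uart0']
print(map_registers(regs, accesses))  # {'CTRL': ['main.c:12', 'main.c:31'], 'STATUS': ['main.c:20']}
```

The ESW locations returned by `map_registers` are the kind of locations that Phase 3 uses as points-of-interest, and `properties_for` shows one way the property groups could be selected once a subsystem contains several IPs.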
C. Phase 2: Formal verification of the IPs
Phase 2 (Algorithm 1, lines 9-13) tries to verify all IPs separately with the model checker (Line 11) to find any errors before starting the subsystems phase. It adds all the IPs that do not successfully complete within the time limit to a list for Phase 3 (Line 13). The default time limit to verify each IP is 3600 seconds, but the user can set a different value through a parameter.
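The Phase 2 loop can be sketched as follows. This is a minimal sketch, assuming a hypothetical `prove(ip, time_limit_s)` callback that stands in for the model checker and reports `"proven"`, `"cex"` (a counterexample, i.e., a real error) or `"timeout"`; treating counterexamples as errors to report, rather than passing them on to Phase 3, is an assumption based on the stated goal of finding errors before the subsystem phase.

```python
from typing import Callable, List, Tuple

DEFAULT_TIME_LIMIT_S = 3600  # default per-IP time limit given in the text


def phase2(ips: List[str],
           prove: Callable[[str, int], str],
           time_limit_s: int = DEFAULT_TIME_LIMIT_S) -> Tuple[List[str], List[str], List[str]]:
    """Try to prove every IP on its own with the model checker.

    'prove' is a hypothetical stand-in for the model checker: it returns
    'proven', 'cex' (counterexample found) or 'timeout'.
    Returns (proven, errors, for_phase3).
    """
    proven, errors, for_phase3 = [], [], []
    for ip in ips:
        status = prove(ip, time_limit_s)
        if status == "proven":
            proven.append(ip)
        elif status == "cex":
            errors.append(ip)
        else:
            # Did not complete within the time limit: retry semiformally in Phase 3.
            for_phase3.append(ip)
    return proven, errors, for_phase3


outcomes = {"ram": "proven", "uart": "timeout", "dma": "cex"}  # hypothetical stub results
print(phase2(list(outcomes), lambda ip, limit: outcomes[ip]))
# (['ram'], ['dma'], ['uart'])
```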
D. Phase 3: Semiformal verification of the IPs
HWVerifyr executes Phase 3 (Algorithm 1, lines 14-34) if there are any IPs in the output list of Phase 2. Otherwise, the tool goes to Phase 4.
Phase 3 begins by running the SRA process to generate the list of ranked registers for each IP marked for semiformal verification (Line 15). Next, it sets up the points-of-interest (PoIs) for the simulation (Line 16), which are the locations in the ESW that access any of the mapped registers. The simulation then starts (Line 17). Whenever the simulator executes an instruction that involves a PoI (Line 19), the semiformal verification process begins. First, HWVerifyr communicates with the simulation engine to collect the dynamic data for the current ranked registers (Line 21). Next, it creates stop-ats for these registers (Line 22) and uses the values from the simulation to generate assumptions for the registers' outputs (Line 23). The formal tool then tries to prove the properties associated with the current IP using the generated stop-ats and assumptions (Line 24). Each proof attempt runs for up to the time limit. If it cannot complete and no registers are left, the tool either black-boxes the IP (Line 28), if the user chose to black-box failing IPs, or aborts the process and outputs the results to the user (Line 30). If the list of ranked registers still has elements, the tool uses the next one in the sequence and restarts the semiformal verification process (Line 32). If the model checker verifies the IP successfully, the tool removes it from the list and resumes the simulation (Line 34). This loop repeats until every IP is either verified or black-boxed, or the process aborts.
A stop-at is an abstraction used to "cut" the logic driving a chosen point, which lets the model checker choose any value for that point during a proof. Assumptions can then tell the model checker which value it must use from that point on.
Black boxing instructs the formal tool to ignore the internal architecture of a block and leave all its output signals unconstrained.
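To make the stop-at/assumption pair concrete, the following Python sketch turns register values sampled in simulation into a list of stop-at targets and one SystemVerilog-style `assume property` per register. The output format is illustrative only: how stop-ats and assumptions are actually handed to the model checker is tool-specific and not described in the text, and the register names, the 32-bit width, and the clock name `clk` are assumptions.

```python
from typing import Dict, List, Tuple


def stopats_and_assumptions(values: Dict[str, int], width: int = 32,
                            clock: str = "clk") -> Tuple[List[str], List[str]]:
    """Return the signals to cut (stop-at targets) and one SVA-style assumption
    per signal pinning it to the value observed in simulation.
    Output format is illustrative, not a specific tool's command syntax."""
    digits = (width + 3) // 4  # hex digits needed for 'width' bits
    stopats = sorted(values)
    assumptions = [
        f"assume property (@(posedge {clock}) {reg} == {width}'h{val:0{digits}x});"
        for reg, val in sorted(values.items())
    ]
    return stopats, assumptions


# Hypothetical register names and sampled values.
cuts, assumes = stopats_and_assumptions({"STATUS": 0x1, "CTRL": 0xA000})
print(cuts)        # ['CTRL', 'STATUS']
print(assumes[0])  # assume property (@(posedge clk) CTRL == 32'h0000a000);
```

The stop-at alone would let the model checker pick any value for the register; the assumption removes that freedom, so constraining only a few highly ranked registers keeps the reduction in state space targeted.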
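The register iteration at the core of Phase 3 can be sketched as below. This is a minimal sketch under assumptions, not the tool's code: the simulator is abstracted into a dictionary of register values sampled at a PoI, `prove` is a hypothetical callback standing in for a model-checker run with stop-ats and assumptions on the given registers, and each iteration is assumed to add the next ranked register to those already constrained (the text only says the tool "uses the next one in the sequence").

```python
from typing import Callable, Dict, List, Tuple


def semiformal_verify_ip(ip: str,
                         ranked_registers: List[str],
                         sim_values: Dict[str, int],
                         prove: Callable[[str, Dict[str, int], int], bool],
                         time_limit_s: int = 3600,
                         black_box_on_failure: bool = True) -> Tuple[str, Dict[str, int]]:
    """One Phase 3 episode, started when the simulation reaches a PoI.

    sim_values: register values read from the simulator at the PoI.
    prove(ip, constraints, limit): hypothetical model-checker call that puts a
    stop-at on every register in 'constraints', assumes its simulated value,
    and returns True if all properties of the IP are proven within the limit.
    Returns the outcome and the constraints used in the last attempt.
    """
    constraints: Dict[str, int] = {}
    for reg in ranked_registers:
        # Assumption: each iteration adds the next ranked register to the set.
        constraints[reg] = sim_values[reg]
        if prove(ip, dict(constraints), time_limit_s):
            return "verified", constraints  # caller resumes the simulation
    # Ranked list exhausted without a proof.
    return ("black-boxed" if black_box_on_failure else "aborted"), constraints


def needs_two(ip: str, constraints: Dict[str, int], limit: int) -> bool:
    """Stub prover: succeeds once at least two registers are constrained."""
    return len(constraints) >= 2


values = {"REG_A": 0x1, "REG_B": 0x0, "REG_C": 0x7}  # hypothetical registers
print(semiformal_verify_ip("ip_x", ["REG_A", "REG_B", "REG_C"], values, needs_two))
# ('verified', {'REG_A': 1, 'REG_B': 0})
```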
E. Phase 4: Formal verification of the subsystems
After every IP has been verified or black-boxed, HWVerifyr starts the iterative build-and-prove process. This process uses the list of ranked IPs from the preprocessing phase to build subsystems that grow by one IP at each iteration. It starts with a subsystem made of the two highest-ranked IPs, verifies it and, if the verification succeeds, adds the next highest-ranked IP from the list and repeats the process. The build process follows the original architecture. The starting set of registers for this phase contains all the registers that were successful at the end of Phase 3.
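The growth pattern of build-and-prove can be sketched in Python. This is a minimal sketch, assuming a hypothetical `prove_subsystem` callback that stands in for Phase 4 (the model checker alone) with a fallback to Phase 5 (semiformal) when that times out; stopping at the first subsystem that cannot be proven is an assumption consistent with the case study in Section IV. The IP names are hypothetical.

```python
from typing import Callable, List, Tuple


def build_and_prove(ranked_ips: List[str],
                    prove_subsystem: Callable[[List[str]], bool]
                    ) -> Tuple[List[List[str]], List[str]]:
    """Grow the subsystem by one IP per iteration, highest-ranked IPs first.

    prove_subsystem: hypothetical stand-in for Phase 4, falling back to Phase 5,
    returning True if the subsystem (IPs connected as in the original
    architecture) is proven.
    Returns the proven subsystems and the first one that could not be proven
    (an empty list if the complete architecture was proven).
    """
    if len(ranked_ips) < 2:
        raise ValueError("build-and-prove starts from the two highest-ranked IPs")
    proven: List[List[str]] = []
    for size in range(2, len(ranked_ips) + 1):
        subsystem = ranked_ips[:size]
        if not prove_subsystem(subsystem):
            return proven, subsystem
        proven.append(subsystem)
    return proven, []


ranked = ["ip_a", "ip_b", "ip_c", "ip_d"]
print(build_and_prove(ranked, lambda sub: len(sub) <= 3))  # stub prover
# ([['ip_a', 'ip_b'], ['ip_a', 'ip_b', 'ip_c']], ['ip_a', 'ip_b', 'ip_c', 'ip_d'])
```

Returning the first unproven subsystem, rather than only a pass/fail flag, keeps the proven subsystems available and points at the region that is left for further analysis.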
IV. RESULTS AND DISCUSSION

A. Verification Environment
We performed the experiments on a 24-core Intel® Xeon® CPU E5-2630 @ 2.3 GHz with 96 GB of RAM running CentOS. We used JasperGold to obtain the baseline results and compared them with those obtained using the scalable hybrid approach described in Section III. We also used JasperGold as the model checker within our flow.
Our experiments used the X-Propagation app from JasperGold to extract the properties for each architecture, either IP or subsystem, and verify them. Table I shows the number of properties extracted for each IP and for each subsystem.
B. Automotive Gateway Prototype
We used an automotive gateway prototype developed in-house as our case study. We developed this prototype using the FuseSoC platform. It has an OpenRISC processor ("MOR1KX"), a CAN IP ("CAN"), an Ethernet IP ("ETH-MAC"), and a RAM IP ("WB RAM"). All elements are from the OpenCores repository. Figure 2 presents its block-level architectural diagram, and Table II summarizes the results of the validation of HWVerifyr. The time limit was 3600 seconds for each IP and 5400 seconds for each subsystem.
We followed the approach presented in Algorithm 1 and started the validation process by verifying the IPs separately. In Phase 2, HWVerifyr successfully verified only the RAM IP; all the other IPs needed Phase 3. In Phase 3, it verified the CAN IP after two iterations and the Ethernet IP after three iterations, while the processor IP was black-boxed after five iterations. The first part of Table II presents the results for Phases 2 and 3.
In Phase 4, the build-and-prove process started with the processor IP and the RAM IP. As mentioned above, it was not possible to verify the processor IP; it was therefore black-boxed in Phases 4 and 5. HWVerifyr verified this first subsystem in one iteration. Next, it added the CAN IP to the subsystem, which the model checker alone could not verify, so the flow switched to Phase 5, where HWVerifyr verified it in one iteration. Finally, it added the ETH-MAC IP to the subsystem but was unable to complete the verification process. The second part of Table II presents the results for Phases 4 and 5.
Even though it was not possible to verify subsystem 3, SRA showed a considerable gain over the model checker alone. It is now possible to focus on the region left unproven, which is smaller than without SRA.
Our results show an improvement over JasperGold alone. Table II shows an improvement of 50% in Phase 3, since our flow completed the formal proof for two more IPs than JasperGold alone. It also shows an improvement of 66% in Phase 5, where our flow completed the proof for two iterations of the build-and-prove process before timing out.
C. SRA Validation
We used the Ethernet and the CAN IPs to validate the SRA heuristic. As described in Section III-A, SRA works with the registers the user has control over, i.e. configuration and control registers. For the validation process, we ran Phase 3 on the selected IPs.
The configuration registers for the Ethernet IP are declared in the file "eth_registers.v". SRA ranked them according to their COR, and Table III presents the results. The time limit for all runs was 3600 seconds. Table III shows that SRA needed three iterations to find the minimum set of registers for the Ethernet IP: MODER, MIICOMMAND, and CTRLMODER. We also ran the process with the other registers for comparison. Furthermore, the low number of stop-ats for this set is a good indication that the system will not be over-constrained during the semiformal verification process.
The configuration registers for the CAN IP are declared in the file "can_registers.v". SRA ranked them according to their COR, and Table IV presents the results. The time limit for all runs was 3600 seconds. Table IV shows that SRA needed two iterations to find the minimum set of registers for the CAN IP: MODE and COMMAND. We also ran the process with the other registers for comparison. Again, the low number of stop-ats helps reduce the state space without over-constraining it.
V. CONCLUSION
We have presented our scalable hybrid verification approach for complex hardware systems. We described the advantages of the proposed methodology, which spans several steps in the hardware verification flow. The process begins with the formal verification of each IP and ends with the build-and-prove system that verifies incrementally larger subsystems up to the complete architecture. The semiformal phases of the proposed methodology use the SRA heuristic to reduce the state space without over-constraining the architectures. Our results on the automotive gateway prototype show that this methodology can substantially benefit the verification flow of complex SoCs.
As future work, it should be possible to execute Phase 4 with different starting subsystems to create "verified islands" in the architecture when complete verification is not possible. It is also our goal to reduce the number of necessary stop-ats, to continue avoiding over-constraint. Finally, we want to add a smart time limit for the model checker since complex systems need more time to complete the verification task.
