Many formal verification techniques make use of Binary Decision Diagrams (BDDs). In most applications the choice of the variable ordering is crucial for the performance of the verification algorithm. Usually BDDs operate on the Boolean level, i.e. BDDs are a bit-level data structure.
Introduction
As modern circuits contain up to several million transistors, verification has become the major bottleneck in the design flow: up to 80% of the overall design costs are due to verification. This is one of the reasons why several formal verification methods have been developed recently, since classical simulation cannot guarantee sufficient coverage of the design. For example, in [1] it has been reported that for the verification of the Pentium IV more than 200 billion cycles have been simulated, but this corresponds to only 2 CPU minutes if the chip is run at 1 GHz.
As alternatives, formal verification or symbolic simulation have been proposed, and in the meantime these have been successfully applied in many projects [5]. In this context many alternative techniques have been proposed that are used to speed up the proof process, such as SAT or BDDs. A lot of work has been done to combine these techniques, resulting in very efficient solvers (see e.g. [7]).
Even though these techniques are very powerful, they all operate on the Boolean level, i.e. high-level information that is available in the initial RTL description is not used. This also applies in cases where very regular structures are verified, such as adders, multipliers or scalable designs. Many difficulties in the proof process result from the fact that this information is not used. Instead, the frontends that read in the RTL (typically given as Verilog or VHDL) transform the design to a flat netlist that consists only of AND-gates and inverters. This has shown several advantages for verification tools, but all structural information gets lost.
The major problem when using BDDs in the verification process is that a good variable ordering has to be determined. But this is an NP-complete problem and thus heuristics have to be applied. The most promising approaches regarding the quality of the BDD are based on dynamic reordering of variables, like sifting [10]. Even though the resulting BDDs are small in size, the run times are prohibitively large, such that sifting is usually switched off during BDD construction. Alternatively, static variable ordering methods have been proposed that compute a BDD ordering from the circuit topology (see e.g. [6]). But these approaches often fail to determine good results. None of the techniques proposed so far makes use of high-level information or considers the scalability of the design.
In this paper we present a new technique to speed up BDD-based formal verification of scalable designs. In a first step a small instance of the Device Under Verification (DUV) is generated and the corresponding BDD is built. This BDD is optimized based on dynamic variable reordering. Since the instance is small, this process runs very fast. Then the resulting optimized variable ordering is analyzed using a pattern matching approach. After this phase the ordering is scaled based on word-level information extracted from the signal names. This scaled ordering is then used as a static ordering for larger instances.
Experimental results for verification of combinational and sequential circuits showed significant reductions, i.e. instances that took several hours before could be verified within a few seconds.
The paper is structured as follows: First we introduce basic definitions. Then we give the main idea of the approach. In the following section our approach is discussed in detail. Next the experimental results are presented. Finally, the work is summarized.
Preliminaries
As is well known, a Boolean function f : B^n →
Basic Idea
Before the algorithm is described in detail, the underlying main idea and the resulting four steps of our technique are first illustrated by a simple example:
Consider the n-bit adder with operands a and b, where a0 and b0 denote the least significant bits. It is known for adders that an interleaved order gives an optimal result if the bits are ordered from the least to the most significant bit, i.e. a0, b0, a1, b1, ..., a_{n-1}, b_{n-1}. The proposed technique works in four steps:
1. Build the BDD for a small number of bits only.
2. Perform an optimization based on dynamic reordering.
3. Analyze the ordering and generalize it to an n-bit order.
4. Build the BDD for the large number of bits based on a static ordering.
In the example we start with the "worst case" ordering, i.e. for the adder this means that the two operands a and b are separated. If we start with a small number of bits, e.g. 10 bits, then sifting determines an interleaved ordering that is afterwards generalized and used as a static ordering for building a 32-bit adder.
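The two orderings in the adder example can be written down as a minimal Python sketch. The function names `separated_order` and `interleaved_order` are hypothetical and chosen only for illustration. Sifting itself (Step 2) is not shown, since it is performed by the BDD package.

```python
def separated_order(names, bits):
    """'Worst case' start: all bits of one operand, then all bits of the next."""
    return [f"{name}{i}" for name in names for i in range(bits)]


def interleaved_order(names, bits):
    """Interleaved order from least to most significant bit: a0 b0 a1 b1 ..."""
    return [f"{name}{i}" for i in range(bits) for name in names]


# Step 1 starts from the separated ordering of a small (10-bit) instance.
small_start = separated_order(["a", "b"], 10)

# Step 4 uses the generalized interleaved ordering for the 32-bit instance.
scaled = interleaved_order(["a", "b"], 32)

print(" ".join(small_start[:3]), "...")
print(" ".join(scaled[:4]), "...")
```

The generalization step only needs the signal names and the target bit width, which is why it costs almost nothing compared with reordering the large instance.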
The benefit of this approach is obvious: since the time-consuming Step 2 of BDD minimization is only carried out on a small design with a small number of variables, the algorithm runs very fast, and due to the regularity of the design the quality is very good, as will be shown by experiments later.
Even though the general approach is simple, it has proven to be very effective. In the following we first describe the analysis phase in more detail and then discuss case studies of scalable designs. It is shown that speed-ups of several orders of magnitude can be achieved.
Scaling BDD Ordering
While the processing in Steps 1, 2 and 4 of the previous section is rather obvious, the crucial step in the approach is the analysis phase. Based on the ordering for the small example, the ordering for the n-bit version is extrapolated. The approach would of course benefit from various runs, i.e. if several orders could be considered. This results from the fact that sifting is also a heuristic approach, and robustness can be obtained by several runs. In the following only a single variable ordering is studied, since our experiments have shown that this is sufficient. However, it should be noted that several runs might become necessary for more complex and more irregular designs.
The resulting ordering is considered as a string of characters, where in each position the name of the corresponding input is given. In the example above this would correspond to e.g. a0 or b5. The text string is evaluated by determining the relative order of each entry. This is then matched against existing patterns. From our studies, and assuming regularity in a scalable design, it turned out that it is sufficient to consider only four patterns: increasing, decreasing, interleaved increasing, and interleaved decreasing. If blocks are more complex, i.e. they do not consider a single bit as in the case of the adder, the method also has to take this hierarchy into account. Notice that the approach works not only for combinational but also for sequential circuits. In this case variables for the present states and next states are also part of the BDDs, but they can be treated in the same way. The next state variables are necessary for computing the transition relation of the sequential circuit.
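Treating the ordering as a string of signal names can be sketched as follows. The helper `parse_ordering` and its naming convention (an alphabetic name followed by a decimal bit index) are assumptions made for this illustration. Real signal names may need a different pattern.

```python
import re

# Assumed convention: alphabetic signal name followed by a decimal bit index.
TOKEN = re.compile(r"([A-Za-z_]+)(\d+)")


def parse_ordering(ordering):
    """Split an ordering string into (signal name, bit index) pairs."""
    pairs = []
    for token in ordering.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot extract an index from {token!r}")
        pairs.append((match.group(1), int(match.group(2))))
    return pairs


print(parse_ordering("a0 b0 a1 b1"))
```

Once names and indices are separated, the relative order of the entries can be compared numerically rather than as text.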
In the following the analysis phase is described in more detail.
Analysis of Ordering
Given a scalable design consisting of n blocks, the corresponding BDD ordering string is of the form "a_i b_i ...", where i is the number of a block and each character string corresponds to an input, a current state or a next state variable of a block. The current state and next state variables are used for representing the transition relation.
The ordering analysis algorithm is split into two parts. The first part is used to identify increasing or decreasing patterns. The second part is applied to identify the interleaved increasing or decreasing patterns.
A sketch of the analysis algorithm is given in Figure 1. The first part of the algorithm works as follows (for the integrated examples assume that the given ordering string os is "a0 a1 a2 a3 c0 c1 c2 b3 c3 b0 b1 b2"):
Then the ratio maxf / (number of variables) is computed. This ratio indicates the probability of an increasing or decreasing pattern. (Example: ratio is 10/12 = 0.83.) If ratio ≥ 0.75, then the ordering string is an increasing or decreasing pattern. In this case the overall result of the first part of the analysis algorithm is increasing or decreasing, depending on a comparison of increasing and decreasing from Step 3 and the relativeOrderList from Step 4. (Example: increasing, because 12 > 0 and "a c b", i.e. the scaled ordering for n will be "a0 ... a_{n-1} c0 ... c_{n-1} b0 ... b_{n-1}".)
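The threshold test and the block-wise scaling from the example can be sketched in Python. The function names and parameters are hypothetical. The value maxf is taken as an input, because this sketch does not attempt to reproduce how the earlier steps derive it.

```python
def is_block_pattern(maxf, num_vars, threshold=0.75):
    """Accept the increasing/decreasing hypothesis if maxf / num_vars >= threshold."""
    return maxf / num_vars >= threshold


def scale_block_ordering(relative_order, n, increasing=True):
    """Emit every signal as one block of n bits, blocks in the given relative order."""
    indices = range(n) if increasing else range(n - 1, -1, -1)
    return [f"{name}{i}" for name in relative_order for i in indices]


# Values from the example: maxf = 10, 12 variables, relative order "a c b".
if is_block_pattern(10, 12):
    print(" ".join(scale_block_ordering(["a", "c", "b"], 4)))
```

For the example values the ratio is 10/12, which rounds to 0.83 and passes the 0.75 threshold, so the scaled ordering lists all bits of a, then c, then b.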
Notice that the described first part of the analysis algorithm does not find a solution for interleaved increasing/decreasing orderings. To identify this type of ordering, the following pattern matching technique is applied (assuming the ordering string os to be "a0 b0 c0 a1 c1 b1 a2 b2 c2 a3 b3 c3"):
1. First it is determined whether the total ordering is mostly increasing or mostly decreasing. This works by comparing the index of a variable with all the indices of its successor variables, analogously to the third step of the first part of the analysis algorithm. (Example: mostly-increasing is 9·3 + 6·3 + 3·3 = 54 and mostly-decreasing is 0.)
... c_{n-1}").
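The pairwise index comparison of the first step can be sketched as follows. The function names are hypothetical, and the sketch assumes that every variable name ends in its decimal bit index.

```python
import re


def index_of(token):
    """Return the trailing bit index of a variable name such as 'a12'."""
    return int(re.search(r"(\d+)$", token).group(1))


def count_trend(ordering):
    """Compare each variable's index with the indices of all its successors."""
    indices = [index_of(token) for token in ordering.split()]
    increasing = decreasing = 0
    for pos, idx in enumerate(indices):
        for later in indices[pos + 1:]:
            if idx < later:
                increasing += 1
            elif idx > later:
                decreasing += 1
    return increasing, decreasing


print(count_trend("a0 b0 c0 a1 c1 b1 a2 b2 c2 a3 b3 c3"))  # (54, 0)
```

Each of the three index-0 variables has nine successors with a larger index, each index-1 variable has six, and each index-2 variable has three, which gives the 54 of the example.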
With the described analysis algorithm the ordering of a small instance can be analyzed and a generalization for larger designs can be computed.
In the following, experimental results show the efficiency of the approach.
Experimental Results
In this section experimental results are given. The proposed technique has been implemented in C++. All run times are given in CPU seconds on an Intel Pentium IV with 1.7 GHz and 512 MByte of main memory. As the BDD package we used CUDD [11]. The run times given for our approach always contain the times for the complete flow, i.e. including analysis and construction for small instances. For the experiments three scalable designs have been considered: adders, multipliers, and bus arbiters.
Adders
The results for the adder circuits are given in Table 1. In the first column the number of bits to be added is given. Then for both approaches Memory and Time denote the memory in MByte used by the BDD manager and the run time in CPU seconds, respectively. A time limit of 2 CPU hours has been set for BDD construction. As can be seen, already for 20 variables the new approach outperforms sifting. For 500 variables, the scaling technique is nearly a factor of 10 faster.
Multipliers
In a next series of experiments we consider multiplier circuits. It is well known that BDDs for multipliers always become exponential in the number of variables, independent of the chosen variable ordering [3]. It is therefore interesting to study the run time of the algorithms until they give up. We started with a live node limit of 2,000,000 BDD nodes. For up to 12-bit multipliers the BDDs can be constructed. For larger instances the construction failed (shown in italic). We report the memory consumption and the run time for sifting and our approach up to 12 bits. Beyond 12 bits the memory and run time used until the construction failed are given. In the case of sifting the values are not monotonically increasing because sifting is called dynamically by the BDD package. Since a static variable ordering is applied in the final phase of our approach, the limit is reached very fast, as can be seen in Table 2. Compared to sifting, a speed-up of more than a factor of 20 can be observed for a 12-bit multiplier.
Table 2. Results for multipliers
Arbiters
As a sequential benchmark for our experiments we considered a scalable bus arbiter. This circuit is often used for experiments in formal verification (see e.g. [8,9]). In the upper part of Figure 2 a single arbiter cell is shown, whereas the composition to an n-cell arbiter is given in the lower part. For the resulting circuit a computation of the reachable states is carried out. For the new approach the analysis phase was run on an example with 20 cells. The run times are negligible, since sifting also needs nearly no time for these instances. In the following we give the results for a complete reachability analysis using sifting and the scaling approach. The results are given in Table 3. In the first column the number of arbiter cells is given. The second column shows the overall number of BDD variables. Then, as above, memory and time are given for both approaches.
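The computation of the reachable states is a fixed-point iteration, which the following Python sketch illustrates with explicit sets. This is a simplification: in the BDD-based flow the state sets and the transition relation are represented symbolically, and the variable ordering determines the size of those BDDs. The token-ring system used here is a hypothetical toy example and not the arbiter cell of Figure 2.

```python
def reachable_states(initial, successors):
    """Least fixed point: add successor states until nothing new appears."""
    reached = set(initial)
    frontier = set(initial)
    while frontier:
        image = {nxt for state in frontier for nxt in successors(state)}
        frontier = image - reached  # keep only states not seen before
        reached |= frontier
    return reached


def token_ring_successors(n):
    """Hypothetical toy system: a single token moves to the next of n cells."""
    def successors(state):
        return [(state + 1) % n]
    return successors


print(len(reachable_states({0}, token_ring_successors(5))))  # 5
```

The loop terminates as soon as an image step yields no new state, which is the same stopping criterion a symbolic reachability analysis uses.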
As has been shown in [4], the reachability analysis can be performed up to n = 11 bits with 512 MB of memory if the original variable ordering as it occurs in the benchmark description is used and sifting is disabled. With sifting this can be improved. But already for 300 cells more than 7200 CPU seconds (corresponding to 2 CPU hours) are needed. The arbiter with 200 cells already takes more than 3000 CPU seconds, while the scaling approach can handle this instance, including the pre-processing, within 5 seconds, i.e. a speed-up of more than a factor of 600. Using the new technique, the complete reachability can be computed for up to 1500 arbiter cells in about 1000 CPU seconds.
Conclusions
A new approach for finding BDD orderings has been proposed. This technique works for scalable designs and makes use of high-level information. Experimental results have demonstrated the quality of the approach. Compared to dynamic reordering, improvements of several orders of magnitude have been observed. The focus of current work is to integrate the approach into an existing verification flow [5]. Here it is important that the ordering can be given to the tool without changing any of the internal structures, but in the form of a pre-processing step.
