Abstract
Introduction
Modern circuit designs can contain several million transistors. As a result, verification of such large designs becomes more and more difficult: simulation alone cannot guarantee correct behavior, and exhaustive simulation is too time-consuming.
But many designs have very regular structures, like ALUs, that can be described easily on a higher level of abstraction. E.g. describing (and verifying) an integer multiplier on the bit-level is very difficult, while the verification becomes easy when the outputs are grouped to build a bit-string. Recently, several approaches to formal circuit verification have been proposed that make use of these regularities [1, 12, 3]. All these approaches have in common that they are based on Word-Level Decision Diagrams (WLDDs), i.e. graph-based representations of functions (similar to BDDs [4]) that can represent functions with a Boolean domain and an integer range. Examples of WLDDs are EVBDDs [20], MTBDDs [9, 2], *BMDs [6], HDDs [10], and K*BMDs [14]. In the meantime WLDDs have been integrated in verification tools [1, 8] and are also used for symbolic model checking [11, 7]. In [19] HDDs have been applied to the verification of circuits at the register transfer level. WLDDs are a tool for bridging the gap between high-level Hardware Description Languages (HDLs), like VHDL, and netlists consisting of basic gates, like AND and OR. But so far, for many HDL commands no effective way of translation into a WLDD is known.
In this paper we present a complete set of datapath operations that can be formally verified based on Word-Level Decision Diagrams (WLDDs). Our verification techniques allow HDL constructs to be translated directly to WLDDs. The key to this transformation is a pair of new algorithms for the modulo operation and for division. Even though the operations have exponential worst-case behavior, we show by experiments that these algorithms can handle functions with up to several hundred variables, while previously known algorithms fail for more than 16 bits. For some important functions that often occur in high-level descriptions we prove polynomial upper bounds on the representation size of the WLDD.
For each HDL operation we describe the main ideas and report some experiments. Finally, a case study on verifying a BCD-to-binary converter shows how the different components can be combined. We succeeded in automatically verifying this circuit, while other approaches, e.g. based on *BMDs only, fail.
The paper is structured as follows: In Section 2 WLDDs are introduced. In Section 3 arithmetic functions are described that often occur in high-level descriptions of circuits and their size is estimated. In Section 4 datapath operations are discussed. An experimental study is given in Section 5. Finally, the results are summarized.
Word-Level Decision Diagrams
In this section notations and definitions are given that are important for understanding the paper. We give a brief overview on Decision Diagrams (DDs). (For more details see [18, 13].) For MTBDD [9, 2], BMD [6], HDD [10], EVBDD [20], *BMD [6] and K*BMD [14], the corresponding functions $f_e$ are given in Table 1 ($a$, $m$ are integer numbers).
Representation Size of Arithmetic Functions
High-level circuit descriptions allow the use of buses, i.e. Boolean variables that belong together are grouped. The big advantage of WLDDs is that they can make direct use of this grouping, while the correlation gets lost in bit-level DDs, like BDDs. Obviously, the smaller the representation is, the faster the algorithms are. This becomes even more important if algorithms with exponential worst-case behavior are used.
For this, we first consider arithmetic operations of functions that are defined over Boolean variables. Let $Var = \{x_1, \ldots, x_n\}$ be a set of Boolean variables:
Since these functions are the basic operations for most others, they are studied in more detail in the following.
Depending on the WLDD-type the representation size varies widely (see below). The best results so far have been obtained for *BMDs: For $X$, $X + Y$, $X \cdot Y$, and $c^X$, *BMDs have linear size [5]. These results directly transfer to K*BMDs. The situation becomes more complex when functions of type $X^c$ ($c$ constant) are considered.
We first prove an upper bound for function $X^c$ for BMDs. Obviously, the given bound holds for *BMDs and K*BMDs as well.
Theorem 1
The BMD for function $X^c$ has at most $\sum_{i=0}^{c} \binom{n}{i}$ nodes using the variable ordering $x_{n-1}, \ldots, x_0$.
Proof: Before we consider the BMD representation we start with some general considerations that will be used in the following:
It now easily follows from Equation (1):
Notice that the exponent of the polynomial decreases by one, i.e. from $d$ to $d - 1$.
We now make use of these equations, when we have a closer look at the influence of the BMD decomposition on polynomials:
Here, $f_{low}$ represents the function if variable $x_i$ is set to zero, i.e. $f_{low} = f_{x_i=0}$. $f_{high}$ is given by $f_{x_i=1} - f_{x_i=0}$.
If in the following we decompose the function starting from the highest coefficient in the polynomial towards the lower coefficients, i.e. $x_{n-1}, x_{n-2}, \ldots$, the function represented by the low-edge of node $v$ computes the same polynomial as $v$, with the only difference that coefficient $x_i$ has been set to zero.
The case for the high-edge is more difficult: We have to subtract the polynomials for the cases $x_i = 1$ and $x_i = 0$. But these polynomials differ only in a constant factor. Thus, Equation (2) can be applied and it directly follows that by each use of the high-edge the exponent of the polynomial represented by the corresponding node is decreased by one. After $c$ high-edges the polynomial assumes a constant value, but this is represented as a terminal node in a BMD. Thus, for $X^c$ we only have to count the number of paths from the root of the BMD that pass at most $c$ high-edges.

In the following two theorems we show that better upper bounds can be given for K*BMDs.

But due to the additive and multiplicative edge values this function becomes isomorphic to the high-successor of the low-edge, while the high-edge points to a constant value. All in all, the number of nodes per level is bounded by two.
(A more detailed analysis shows that the exact number is given by $2n - 2$.)
□
This result shows that K*BMDs are the only DD-type presented so far for hardware verification that can represent all functions considered in [5] in linear size (see Table 2). The representation size becomes extremely important for WLDDs, since most operations have exponential worst-case behavior. Thus, keeping the (final) representation small enables us to define more efficient algorithms.
Finally, we show that the result of Theorem 1 can also be improved for K*BMDs for $c = 3$.
Theorem 3 The K*BMD for function $X^3$ has at most $O(n^2)$ nodes using the variable ordering $x_{n-1}, \ldots, x_0$.
Proof:
A detailed analysis similar to the one of the proof above shows that (starting from the third level) per level one additional node is created. Thus, the total number of nodes becomes quadratic in the number of variables. (The exact number is given by $(n^2 + n - 4)/2$.)
□
All in all, it turns out there exist WLDD-types that can efficiently represent arithmetic operations in polynomial size (by polynomials of low degree), while other types fail. 
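The bound of Theorem 1 can be checked experimentally for small word lengths. The following is a minimal sketch, not the implementation used in the paper: it assumes a plain BMD without edge values, represents each function by an explicit value table, and counts non-terminal nodes only. The helper names `collect_bmd_nodes`, `bmd_size` and `theorem1_bound` are hypothetical.

```python
from math import comb

def collect_bmd_nodes(table, seen):
    """Add the non-terminal BMD nodes of the function `table` to `seen`.

    table[X] is the function value for the input word X. The upper half of
    the table belongs to the top variable being 1, so the decomposition
    order is x_{n-1}, ..., x_0.
    """
    if len(table) == 1:                  # no variables left: terminal node
        return
    half = len(table) // 2
    low = table[:half]                   # f with top variable set to 0
    high = tuple(h - l for h, l in zip(table[half:], low))  # f|1 - f|0
    if not any(high):                    # f is independent of the top variable
        collect_bmd_nodes(low, seen)
        return
    if table in seen:                    # shared subgraph, already counted
        return
    seen.add(table)
    collect_bmd_nodes(low, seen)
    collect_bmd_nodes(high, seen)

def bmd_size(n, c):
    """Number of non-terminal BMD nodes for X^c over n Boolean variables."""
    table = tuple(x ** c for x in range(2 ** n))
    seen = set()
    collect_bmd_nodes(table, seen)
    return len(seen)

def theorem1_bound(n, c):
    """Upper bound of Theorem 1: sum of binomial(n, i) for i = 0..c."""
    return sum(comb(n, i) for i in range(c + 1))

for n in range(1, 7):
    for c in range(1, 4):
        print(n, c, bmd_size(n, c), theorem1_bound(n, c))
```

The explicit tables grow exponentially in n, so this is only suitable for sanity checks on a handful of variables.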
Word-Level Verification
In this section we define a set of datapath operations that make it possible to effectively verify descriptions in high-level HDLs, like VHDL. For this, two operations are introduced first, i.e. the modulo operation and division. Recently, it has been proved that none of the "usual" WLDD-types can represent the division function efficiently [21]. Nevertheless, our algorithms for these closely related operators work very well in practice.
(All experiments in this section have been carried out on a SUN UltraSPARC-170 workstation with 256 MByte of main memory.)
Modulo Operation
Modulo arithmetic based on powers of two is frequently used in specifications of datapaths. But as described above, division (and modulo) is a "hard" problem for WLDDs. A straightforward approach to compute modulo would be to recursively apply Shannon decompositions. But a limitation of this approach when using WLDDs is that the range of functions often becomes prohibitively large.
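In the sequel, we present an algorithm for modulo arithmetic that often avoids explicit enumeration of function values. We make use of two properties of modulo arithmetic:
$(a + b) \bmod n = ((a \bmod n) + b) \bmod n = ((a \bmod n) + (b \bmod n)) \bmod n$
$(a \cdot b) \bmod n = ((a \bmod n) \cdot b) \bmod n = ((a \bmod n) \cdot (b \bmod n)) \bmod n$
Here $\bmod$ denotes the modulo operation, $a, b \in \mathbb{Z}$, and $n \in \mathbb{N}$.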
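The two properties can be confirmed on plain integers. The sketch below is only an illustration; the helper name `check_mod_properties` is hypothetical, and it relies on the fact that Python's `%` operator returns a non-negative result for a positive modulus, also for negative operands.

```python
import itertools

def check_mod_properties(values, moduli):
    """Check the sum and product rules of modulo arithmetic exhaustively."""
    for a, b in itertools.product(values, repeat=2):
        for n in moduli:
            # Sum rule: operands may be reduced before the addition.
            assert (a + b) % n == ((a % n) + b) % n == ((a % n) + (b % n)) % n
            # Product rule: operands may be reduced before the multiplication.
            assert (a * b) % n == ((a % n) * b) % n == ((a % n) * (b % n)) % n
    return True

print(check_mod_properties(range(-20, 21), range(1, 10)))
```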
The algorithm consists of two steps (only the main idea is given in the following, due to page limitation; for more details see [17] ):
1. Terminal cases are checked based on a "conservative" estimate for function ranges. We make use of the algorithm for range estimation as described in [10] . If g depends on x, both algorithms, i.e. exact computation based on Shannon decomposition and estimate are applied recursively.
As an important special case this algorithm also includes the modulo operation with a constant function $g$. Then for computing $f \bmod g$ only the WLDD for $f$ has to be traversed, and the operation has to be applied to the terminal nodes. Afterwards, range estimation on the simplified WLDD frequently leads to an early termination.
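The constant-divisor case can be sketched on a small decision tree. This is an assumption-laden illustration rather than the algorithm of [17]: functions are plain Shannon trees of the form `(variable, low, high)` with integer terminals, there is no node sharing, and the range is taken as the minimum and maximum over the terminals instead of the conservative estimate of [10]. The names `value_range`, `mod_const` and `evaluate` are hypothetical.

```python
def value_range(f):
    """Return (min, max) over all terminal values of the tree f."""
    if isinstance(f, int):
        return f, f
    _, low, high = f
    lo0, hi0 = value_range(low)
    lo1, hi1 = value_range(high)
    return min(lo0, lo1), max(hi0, hi1)

def mod_const(f, g):
    """Compute f mod g for a constant g > 0 by traversing f."""
    lo, hi = value_range(f)
    if 0 <= lo and hi < g:
        return f                  # early termination: f mod g equals f
    if isinstance(f, int):
        return f % g              # apply the operation at a terminal node
    var, low, high = f
    return (var, mod_const(low, g), mod_const(high, g))

def evaluate(f, assignment):
    """Evaluate the tree f under a dict mapping variable names to 0/1."""
    while not isinstance(f, int):
        var, low, high = f
        f = high if assignment[var] else low
    return f

# f = 2*x1 + x0 + 5, written out as a Shannon tree
f = ("x1", ("x0", 5, 6), ("x0", 7, 8))
print(mod_const(f, 4))
```

If the whole range of a subtree already lies in the interval from 0 to g - 1, the subtree is returned unchanged, which is the early termination mentioned above.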
Remark 2 For WLDDs using additive and multiplicative edge values, for constant functions $g > 0$ we proceed as follows:
Then the modulo operation only has to be computed for the simplified function $(a \bmod g) + (m \bmod g) \cdot f_v$.
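A short worked example may help to see why reducing the edge values is safe. It assumes that an edge with additive value $a$ and multiplicative value $m$ pointing to node $v$ represents $a + m \cdot f_v$, and it uses only the sum and product rules of modulo arithmetic; the concrete numbers are chosen for illustration.

```latex
\begin{align*}
(a + m \cdot f_v) \bmod g
  &= \bigl((a \bmod g) + ((m \cdot f_v) \bmod g)\bigr) \bmod g \\
  &= \bigl((a \bmod g) + (((m \bmod g) \cdot f_v) \bmod g)\bigr) \bmod g \\
  &= \bigl((a \bmod g) + (m \bmod g) \cdot f_v\bigr) \bmod g .
\end{align*}
% Example with a = 13, m = 10, g = 4:
%   (13 + 10 f_v) mod 4 = (1 + 2 f_v) mod 4,
%   e.g. for f_v = 3: 43 mod 4 = 3 and 7 mod 4 = 3.
```

The edge values of the simplified function are bounded by $g$, which tends to keep the estimated range small.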
Experiment We consider modulo addition based on WLDDs:
We represent the formula by a K*BMD with only pD decompositions using an interleaved variable ordering. The results of our approach in comparison to the conventional approach based on Shannon expansion for varying bit-length are given in Table 3. Even though the size of the output function grows only linearly with the bit-length, the straightforward approach fails for more than 16 bits, while our algorithm can handle the function with 512 bits (and 1024 variables) in less than 300 CPU seconds.
Division
Based on the modulo operation described above, we now give an algorithm for computing the division on WLDDs.
The basic idea of the algorithm is to first subtract the remainder of the division from the dividend and then to compute the result:
If g is independent of variable x, it holds:

In some cases division can also be computed efficiently when the divisor is not constant. This is often the case if dividend and divisor are monotonic and defined over the same set of variables. The expressions $a + 1$ and $a^2 + 2a + 1$ are given as K*BMDs consisting of pD-nodes only. Dividing the first by the second, the result becomes 1 for $a = 0$ and 0 in all other cases. The K*BMD grows linearly with the bit-length. This is "obvious", but hard to handle using DDs. E.g. BDDs fail, since they cannot represent multiplication efficiently. Applying the standard methods (see e.g. [5]), all input combinations have to be considered, resulting in an exponential runtime.
Experiment
Using the algorithm described above, large bit-lengths can also be handled efficiently (see Table 4). A prerequisite for this is the efficient representation of e.g. $a^2$ as proven in Section 3.
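The basic idea of the division algorithm, subtracting the remainder first so that the remaining division is exact, can be illustrated on plain integers. This sketch works on numbers rather than on WLDDs, and the helper name `div_via_mod` is hypothetical.

```python
def div_via_mod(f, g):
    """Integer division of f by g > 0 via the modulo operation."""
    remainder = f % g
    exact = f - remainder        # a multiple of g, so the division is exact
    return exact // g

# The example from the text: (a + 1) divided by (a^2 + 2a + 1)
quotients = [div_via_mod(a + 1, a * a + 2 * a + 1) for a in range(8)]
print(quotients)
```

For the example, only the entry for a = 0 is 1, and all other entries are 0, in line with the description above.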
Datapath Operations
Based on the algorithms described so far, in combination with the results presented in [16], we can now efficiently describe a large set of datapath operations for HDLs, like VHDL (see Table 5). a, a0, a1 denote bit-vectors of length n that are given by the integer encodings $a$, $a_0$, $a_1$. b is a single bit represented by the Boolean function $b$. n, k, l ($n$, $k$, $l$) are natural numbers.

    equ(inc(ac,2*n),
        cat(inc(selslice(ac,0,n-1),n),
            adc(selslice(ac,n,2*n-1),0,
                equ(inc(selslice(ac,0,n-1),n),0)),
            n))

Figure 1. Example of datapath operation
Notice that the operations often combine Boolean and integer expressions. This is taken into account by using Boolean and integer graph types. In the implementation of the hybrid DD package from [16], e.g. the parity function odd(a) uses a WLDD to represent the integer function $a$, while the result is represented by a Boolean graph type, i.e. an OKFDD or a BDD.
Experiment Consider the datapath operation in Figure 1 . It will be checked whether incrementing register ac (of bitlength 2n) can be done by splitting it into two words of length n and then performing the operation accordingly.
The implementation given in Figure 1 is faulty, since a carry might be generated during addition adc. In our experiment all word-level operations are carried out on K*BMDs and for all Boolean expressions BDDs are used. The BDD for the first occurrence of function equ represents the complete set of possible values of register ac, for which the operation is implemented correctly. (It is easy to see that the BDD only needs 2n + 1 nodes.)
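The check of Figure 1 can be replayed by brute force for small n. The sketch below works on plain integers instead of K*BMDs and BDDs, and the semantics of `selslice`, `inc`, `adc`, `cat` and `equ` are assumptions read off the expression in Figure 1 (the definitions in Table 5 may differ in detail); in particular `adc` is assumed not to truncate its result.

```python
def selslice(a, lo, hi):
    """Bits lo..hi of a (inclusive), as an integer. Assumed semantics."""
    return (a >> lo) & ((1 << (hi - lo + 1)) - 1)

def inc(a, n):
    """Increment modulo 2^n. Assumed semantics."""
    return (a + 1) % (1 << n)

def adc(a0, a1, carry):
    """Addition with carry-in, result not truncated. Assumed semantics."""
    return a0 + a1 + int(carry)

def cat(low, high, n):
    """Concatenate: low word of length n below the high word. Assumed semantics."""
    return low + (high << n)

def equ(a0, a1):
    """Equality test with a Boolean result."""
    return a0 == a1

def split_increment_ok(ac, n):
    """Evaluate the expression of Figure 1 for one value of register ac."""
    low_inc = inc(selslice(ac, 0, n - 1), n)
    carry = equ(low_inc, 0)
    high_sum = adc(selslice(ac, n, 2 * n - 1), 0, carry)
    return equ(inc(ac, 2 * n), cat(low_inc, high_sum, n))

def failing_inputs(n):
    """All register values for which the split increment is wrong."""
    return [ac for ac in range(1 << (2 * n)) if not split_increment_ok(ac, n)]

print(failing_inputs(3))
```

Under these assumed semantics the only failing value is the all-ones register, where `adc` produces a carry out of the upper word. This is consistent with the small BDD reported for the set of correct inputs.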
In Table 6 again the runtimes and the maximum graph sizes during the computation are given. The main problem in this case is the computation of the division and modulo operation in functions selslice and inc, respectively. Notice that the addition adc is again a hybrid operation, i.e. between K*BMDs and BDDs. Even though most of the word-level operations have exponential worst-case behavior, it turns out that in most practically relevant cases
A Case Study
Finally, we describe the complete automatic formal verification of a 10-decade BCD-to-binary converter. (Minor details are left out due to page limitation.) Following the Texas Instruments TTL Data Book for Design Engineers the specification is given by:
The BCD-to-binary function of the SN54184 and SN74184 is analogous to the algorithm: a. Shift BCD number right one bit and examine each decade. Subtract three from each 4-bit decade containing a binary value greater than seven. b. Shift right, examine, and correct each shift until all converted decades contain zeros.
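One way to read this description as executable code is sketched below. It is an illustration on plain integers and not the specification of Figure 2; it assumes that the bit shifted out of the BCD register in each step becomes the next binary result bit, starting with the least significant one. The function names `to_bcd_digits` and `bcd_to_binary` are hypothetical.

```python
def to_bcd_digits(value, num_decades):
    """Decimal digits of value, most significant decade first."""
    return [(value // 10 ** k) % 10 for k in reversed(range(num_decades))]

def bcd_to_binary(digits):
    """Convert BCD digits (most significant first) by shift and correct."""
    num_decades = len(digits)
    bcd = 0
    for d in digits:
        bcd = (bcd << 4) | d             # pack the decades into one register
    result = 0
    bit_pos = 0
    while bcd != 0:                      # until all decades contain zeros
        result |= (bcd & 1) << bit_pos   # shifted-out bit is the next result bit
        bit_pos += 1
        bcd >>= 1                        # shift BCD number right one bit
        for k in range(num_decades):     # examine each decade
            decade = (bcd >> (4 * k)) & 0xF
            if decade > 7:               # a bit worth 10 arrived as 8, not 5
                bcd -= 3 << (4 * k)      # subtract three from this decade
    return result

print(bcd_to_binary(to_bcd_digits(1998, 4)))
```

The correction works because a bit shifted in from the next higher decade should contribute 5 to the halved decade but arrives with weight 8, so the decade exceeds seven exactly in that case.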
One possible formulation of this algorithm in a more formal way is given in Figure 2 . We compare the HDL description to an implementation composed of subcircuits of type SN74184. As can be seen the HDL description makes use of several operations introduced in the previous sections, like addition, multiplication, greater than.
Figure 2. Algorithmic specification
For all Boolean functions we used BDDs and all wordlevel operations are carried out using K*BMDs. The decomposition types and the variable ordering are not predetermined: they are dynamically found using DTL-sifting [18] .
On a SUN UltraSPARC-170 workstation 30 MByte of main memory were needed. For the transformation of the specification to WLDDs about 11 CPU minutes were needed. Then the BDD for the implementation is constructed. The circuit consists of 82 TTL elements (corresponding to about 5000 two-input gates). The BDDs for the outputs are constructed in less than 1 CPU minute. Furthermore, Don't Cares are also considered, i.e. only "valid" input combinations are used.
All in all, the verification could be completed (including computation of specification and implementation) in less than 18 CPU minutes using 97 MByte of main memory. 60% of the runtime was used for dynamic minimization based on DTL-sifting and the maximal number of nodes during the run was 1.5 million.
Finally, notice that in contrast the verification of the specification against the circuit using *BMDs only failed. This further underlines the importance of hybrid structures in verification.
Conclusions
In this paper we presented a complete set of datapath operations that can be formally verified based on Word-Level Decision Diagrams. Our techniques allow a direct translation of HDL constructs to WLDDs. The sizes of WLDDs for important arithmetic functions have been estimated and we have studied manipulation algorithms for WLDDs for modulo operation and division. Based on these core operations, we have shown by several experiments the feasibility of our approach.
In a case study we showed how a specification and its implementation could be automatically verified using formal techniques. Alternative approaches based e.g. on *BMDs could not complete the verification within several hours, while the whole process took less than 18 CPU minutes using our techniques.
