Arithmetic expressions are the fundamental building blocks of hardware and software systems. An important problem in computational theory is to decide whether two arithmetic expressions are equivalent. However, the general problem of equivalence checking on digital computers is NP-hard. Moreover, existing general techniques for solving this decision problem are applicable only to very simple expressions and are impractical for the more complex expressions found in programs written in high-level languages. In this paper we propose a method for solving the arithmetic expression equivalence problem using partial evaluation. In particular, our technique is designed to check the equivalence of arithmetic expressions obtained from high-level language descriptions of hardware/software systems, which consist of regular arithmetic operators (+, −, ×) and logical operators (and, or, not). In our method, we use interval analysis to substantially prune the domain space of arithmetic expressions and limit the evaluation effort to a small set of subspaces. Our results show that the proposed method is fast enough to be of use in practice.
INTRODUCTION AND MOTIVATION
Arithmetic expressions are the fundamental building blocks of hardware and software systems. In hardware, arithmetic expressions form the core of data-path designs. In software, arithmetic expressions form the core of basic blocks. A fundamental problem in computational theory is to decide whether two expressions are equivalent [11, 8]. In hardware and software systems, expression equivalence is uniquely characterized by operating on finite precision integers. Furthermore, the general problem of equivalence checking, as related to hardware and software systems, is NP-hard [7].
Efficiently solving the equivalence problem between two arithmetic expressions will have a profound impact in the areas of formal verification [10] , complex code generation and technology mapping [6] , resource scheduling [3] , code transformation [4] , synthesis technologies [20] , compiler techniques [1] , reconfigurable computing methodologies, extensible processors, VLIW and multiple-processor-on-a-chip compilers.
In this paper we propose a method for solving the expression equivalence problem using partial evaluation. In our method, we use interval analysis [21] to substantially prune the domain space of arithmetic expressions and limit the evaluation effort to a limited set of subspaces. Our results show that the proposed method is fast enough to be of use in practice.
Let us motivate the expression equivalence problem with a specific example. Consider the well-known problem of mutual exclusion in hardware synthesis [3, 16, 18, 19, 23, 24]. Mutual exclusion is a special instance of the equivalence checking problem. Here, if E1 and E2 are two arithmetic expressions, we say that E1 and E2 are mutually exclusive if the condition E1 = E2 is false for all points in the domain of E1 and E2. We say that E1 and E2 are not mutually exclusive if, for at least one point in the domain of E1 and E2, the expression E1 = E2 evaluates to true. This is indeed the problem of equivalence checking. If C1 and C2 are two conditional expressions (e.g., x < 0 and x > 255), we say that C1 and C2 are mutually exclusive if the condition C1 && C2 evaluates to false for all points in the domain of C1 and C2.
The remainder of this paper is organized as follows. In Section 2, we review previous work. In Section 3, we formulate the problem of expression equivalence. In Section 4, we give our solution to this problem for a single simple arithmetic expression. In Section 5, we extend our solution to more complex arithmetic expressions that also contain boolean operators. In Section 6, we present our experimental results. Finally, in Section 7, we give our conclusion.
PREVIOUS WORK
Most of the work on equivalence checking is done in the domain of formal verification. The most commonly used methods to do formal verification of circuits use binary decision diagrams (BDD) [2] and its derivatives, namely ordered BDD (OBDD), ordered functional decision diagrams (OFDD), multi terminal BDD (MTBDD), binary moment diagram (BMD), edge-valued BDD (EVBDD), and multiplicative BMD (*BMD). These approaches differ mainly in bit vs. word level scope and composition rules.
BDD, OBDD, and OFDD are bit-level decision diagrams, while the rest are word-level decision diagrams (bit-level decision diagrams represent boolean functions f : {0, 1}^n → {0, 1}^m, while word-level decision diagrams represent integer-valued functions f : {0, 1}^n → Z). These decision-diagram-based approaches also differ in the type of decomposition rule used, specifically, Shannon (BDD, OBDD, and K*BMD), positive-Davio (OFDD and K*BMD), or negative-Davio (K*BMD). Among the word-level decision diagrams, a further difference is where the integer weights are placed, either in leaves (MTBDD and BMD) or on edges (EVBDD, *BMD and K*BMD). A detailed survey of BDD and its derivatives can be found in [9].
Due to exponential complexity, bit-level decision diagrams are only applicable to simple boolean expressions and are not feasible when applied to arithmetic expressions. Word-level decision diagrams can be applied to simple arithmetic expressions (e.g., datapath segments [15]); however, they can only be used to determine the equivalence of arithmetic expressions. In contrast, our method, in addition to checking equivalence, can also partition the domain space into regions and define the arithmetic relations (less-than, greater-than, and equal) present in those regions.
In related work, Wakabayashi et al. [23] have used the notion of a condition vector to find mutual exclusion between two boolean conditions. Two conditional expressions are mutually exclusive if it can be shown that they can never be evaluated to true at the same time. Likewise, Juan et al. [16] have proposed condition graphs, a form of syntax pattern matching, to find mutual exclusion between two restricted boolean conditions. Further, Jian et al. [18, 19] have used timed decision table (TDT) to find three possible types of mutual exclusion between a pair of conditional expressions, namely structural, behavioral and dataflow. Also, Xie et al. [24] used a branch labeling method to find the mutual exclusion properties between two boolean expressions. Finally, Camposano [3] , in his path-based scheduling technique, has proposed a method for determining mutual exclusion based on an exhaustive traversal of all paths in a control flow graph.
The problem of mutual exclusion between two boolean conditions, as solved previously, is a special case of the problem solved in our work. The main limitation of existing works in this area is the restriction imposed on the grammar and the lack of support for mixed arithmetic and boolean expressions. The problem solved in our work applies to general arithmetic expressions with arbitrary complexity.
Zhou et al. [25] have proposed a formal verification system, called conditional term rewriting on attribute syntax trees (ConTRAST) for verifying the equivalence between two differently synthesized data-paths. In their approach, they maintain attributes (e.g., real bounds) associated with each node of the syntax trees of the two data-paths and combine this with term rewriting to establish equivalence. Their approach differs from ours in that they focus on computation precision of real values as an element of comparison. Cheung et al. [5] have used bit-slicing of binary decision diagrams (BDDs) to establish equivalence between two expressions. The main limitation of their approach is scalability, as representing general and arbitrary arithmetic expressions as a BDD is not feasible in terms of space and time requirements.
PROBLEM DEFINITION
An arithmetic expression is formed over the language (+, −, ×, integer-constant, integer-variable). A simple condition is of the form (expr1 ROP expr2). Here, expr1 and expr2 are arithmetic expressions and ROP is a relational operator (=, ≠, <, ≤, >, ≥). Without loss of generality, we can assume all simple conditions to be of the form (expr ROP 0). This normalization is achieved by converting (expr1 ROP expr2) to (expr1 − expr2 ROP 0). Hence, (expr ROP 0) is called a normalized simple condition. For the remainder of this work, we refer to a normalized simple condition as a simple condition.
We define an n-dimensional space to be a box-shaped region defined by the cartesian product [l0, u0] × [l1, u1] × ... × [ln−1, un−1] of n intervals. In a simple condition, all integer-constants and integer-variables are assumed to be bounded between min and max values. Hence, the domain of a simple condition C with n integer-variables x0, x1, ..., xn−1 is an n-dimensional space defined by the cartesian product [min, max] × [min, max] × ... × [min, max] (n times).
Given a simple condition C with integer-variables x0, x1, ..., xn−1, the domain space partitioning problem for a simple condition is to partition the domain space of C into a minimal set of n-dimensional spaces s1, s2, ..., sk with each space si having one of true, false, or unknown truth value. If space si has a truth value of true, then C evaluates to true for every point in space si. If space si has a truth value of false, then C evaluates to false for every point in space si. If space si has a truth value of unknown, then C may evaluate to true for some points in space si and false for others.
For example, consider C : 2 × x0 + x1 + 4 > 0. Let us assume min = −5 and max = 5. Therefore, the domain of C is a 2-dimensional space defined by the cartesian product [−5, 5] × [−5, 5]. Figure 1 shows the partitioned domain space and the corresponding truth values for this example using our solution to the domain space partitioning problem.
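To make the three truth values concrete, the following minimal sketch classifies a box by exhaustively evaluating the condition at every integer point. It only illustrates the definitions above and is not the proposed method, which is designed to avoid this enumeration. The function name `classify` and the representation of a box as a list of inclusive (lo, hi) pairs are assumptions made for illustration.

```python
from itertools import product


def classify(cond, box):
    """Return 'true', 'false', or 'unknown' for a non-empty integer box.

    box is a sequence of inclusive (lo, hi) pairs, one per variable
    (a hypothetical representation chosen for this sketch).
    """
    seen = set()
    for point in product(*(range(lo, hi + 1) for lo, hi in box)):
        seen.add(bool(cond(*point)))
        if len(seen) == 2:  # both truth values occur inside the box
            return "unknown"
    return "true" if seen == {True} else "false"


def c(x0, x1):
    # The example condition C : 2 * x0 + x1 + 4 > 0
    return 2 * x0 + x1 + 4 > 0


print(classify(c, [(1, 5), (1, 5)]))     # a sub-box of the domain
print(classify(c, [(-5, -5), (-5, 5)]))  # the slice x0 = -5
print(classify(c, [(-5, 5), (-5, 5)]))   # the whole domain
```

The whole domain is unknown because C holds at some points and fails at others; the partitioning problem asks for a small set of boxes that isolates such regions.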
The problem of equivalence checking can be reduced to that of arithmetic expression evaluation. Specifically, given two expressions E1 and E2, by evaluating the condition E1 − E2 = 0, we can establish the equivalence of E1 and E2 (i.e., E1 and E2 are not equivalent if the condition evaluates to false for some point in the domain of E1 and E2). We give our solution to the domain space partitioning problem for a simple condition in Section 4.
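A complex condition is either a simple condition or two complex conditions joined by a logical operator (&&, ||); the logical not operator (!) may also be applied to a complex condition. The domain of a complex condition C with n integer-variables x0, x1, ..., xn−1 is an n-dimensional space defined by the cartesian product [min, max] × [min, max] × ... × [min, max] (n times).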
Similar to the domain space partitioning problem for simple conditions, given a complex condition C with integer-variables x0, x1, ..., xn−1, the domain space partitioning problem for complex conditions is to partition the domain space of C into a minimal set of n-dimensional spaces s1, s2, ..., sk with each space si having one of true, false, or unknown truth value. If space si has a truth value of true, then C evaluates to true for every point in space si. If space si has a truth value of false, then C evaluates to false for every point in space si. If space si has a truth value of unknown, then C may evaluate to true for some points in space si and false for others.
The general problem of equivalence checking between two expressions expr1 and expr2 with bounded variables can be expressed in terms of the domain space partitioning problem for complex conditions. As an example, consider checking equivalence between expr1 = 2 × x0 and expr2 = −x1 − 4. Further, let us assume x0 and x1 are 3-bit two's complement integers. We can construct the following complex condition:
Here, (2 × x0) − (−x1 − 4) = 0 evaluates to true for values of x0 and x1 where expr1 and expr2 are equivalent. The remaining expressions (i.e., x0 + 4 ≥ 0, x0 − 3 ≤ 0, x1 + 4 ≥ 0, and x1 − 3 ≤ 0) evaluate to true when x0 and x1 are within the 3-bit two's complement bounds. To establish equivalence, we solve the domain space partitioning problem and check that the entire region is marked as true. We give our solution to the domain space partitioning problem for a complex condition in Section 5.
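As a point of reference for this example, the sketch below checks the equivalence of expr1 and expr2 by brute force over all 3-bit two's complement values. This is the exhaustive baseline that the partitioning approach aims to prune, not the method proposed in this paper. The helper name `equivalent` is hypothetical, and overflow (wrap-around) of intermediate results is not modeled.

```python
from itertools import product


def equivalent(e1, e2, bits, nvars):
    """Exhaustively compare e1 and e2 over nvars two's complement variables."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    points = product(range(lo, hi + 1), repeat=nvars)
    return all(e1(*p) == e2(*p) for p in points)


def expr1(x0, x1):
    return 2 * x0


def expr2(x0, x1):
    return -x1 - 4


# Points of the 3-bit domain [-4, 3] x [-4, 3] where the two expressions agree.
agree = [(x0, x1) for x0 in range(-4, 4) for x1 in range(-4, 4)
         if expr1(x0, x1) == expr2(x0, x1)]

print(equivalent(expr1, expr2, bits=3, nvars=2))  # False
print(agree)  # [(-3, 2), (-2, 0), (-1, -2), (0, -4)]
```

The two expressions agree at only four of the 64 points, so they are not equivalent; in the terminology of Section 1, they are also not mutually exclusive.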
DOMAIN SPACE PARTITIONING FOR SIMPLE CONDITION
Our overall domain space partitioning strategy is depicted in Figure 2. On input, the arithmetic expression of the simple condition is parsed to obtain an equivalent polynomial representation. Any arbitrary arithmetic expression can be rewritten as an n-variable polynomial with degree D using the general form shown in Equation 1. For example, the expression 2 × x0 + x1 + 4 of Figure 1 can be rewritten as 2 × x0^1 x1^0 + x0^0 x1^1 + 4 × x0^0 x1^0 (zero coefficient terms not shown) with n = 2 and D = 1. We describe the remaining domain space partitioning steps in the following subsections.
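One possible in-memory form of such a polynomial is a mapping from exponent tuples to coefficients. The sketch below stores the running example this way and evaluates it at a point. The dictionary layout and the name `evaluate` are assumptions made for illustration; the source does not specify the data structure used by the tool.

```python
def evaluate(poly, point):
    """Evaluate a polynomial given as {exponent tuple: coefficient}."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for x, e in zip(point, exps):
            term *= x ** e
        total += term
    return total


# 2 * x0^1 x1^0 + 1 * x0^0 x1^1 + 4 * x0^0 x1^0, zero-coefficient terms omitted
P = {(1, 0): 2, (0, 1): 1, (0, 0): 4}

print(evaluate(P, (3, 3)))   # 13
print(evaluate(P, (0, -4)))  # 0, so (0, -4) is a root of P
```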
Computing Root-spaces
During this phase, we operate on an n-variable polynomial P and obtain a set of minimally sized spaces (root-spaces) that contain the roots of P , as outlined in Algorithm 1. We achieve this by finding the roots of P using interval analysis [21] . Let us first give an overview of the interval analysis technique.
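As a minimal sketch of the idea behind interval analysis, the code below implements interval addition and multiplication and uses them to bound the running example 2 × x0 + x1 + 4 over a box. If the resulting interval excludes zero, the box cannot contain a root. The class `Interval` and the helpers `const` and `example` are hypothetical names; a production implementation would also need outward rounding for floating-point bounds, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def contains_zero(self):
        return self.lo <= 0 <= self.hi


def const(c):
    """A degenerate interval holding a single constant."""
    return Interval(c, c)


def example(x0, x1):
    # Interval extension of 2 * x0 + x1 + 4
    return const(2) * x0 + x1 + const(4)


print(example(Interval(1, 5), Interval(1, 5)))    # Interval(lo=7, hi=19)
print(example(Interval(-5, 5), Interval(-5, 5)))  # Interval(lo=-11, hi=19)
```

The first box yields [7, 19], which excludes zero, so no root lies in it. The second yields [−11, 19], which includes zero, so a root may (but need not) lie in it.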
Next, we describe our strategy (Algorithm 1) for computing the root-spaces.

Algorithm 1 Compute Root-spaces
1: Input: an n-variable polynomial P
2: Output: a set R of minimally sized root-spaces
...
9: for all xi ∈ P do
10:   P ← convert P to a polynomial with xi as the only variable and xj = vj (vj ∈ S)
11:   roots ← P.solve()
12:   for all r ∈ roots do
13:     if r ≠ vi (vi ∈ S) then
14:       changed ← 1
15:       r ← r ∩ vi {Intersect new root with old one}
16:       Q.push(v0, ..., r, ..., vn−1) {replace vi with r}
17:     end if
...
25:   Ri ← convert Ri to smallest bounding integer space
26: end for

Algorithm 1 operates as follows:

Clearly, the roots of P (if any) are within S; however, S may not be minimally sized. To minimize S, we push S onto a queue Q to be processed by the iterative phase of the algorithm. In our running example 2 × x0 ...
Iterative Phase (lines 6-23):
We pop a space S from the queue Q and split S into smaller spaces S0, S1, ..., Sk−1. If S0 ∪ S1 ∪ ... ∪ Sk−1 = S, then S cannot be minimized; thus, we add S to the output list of root-spaces R. If S0 ∪ S1 ∪ ... ∪ Sk−1 ⊂ S, then we push S0, S1, ..., Sk−1 onto the queue Q and discard S. This process iterates until the queue Q is empty. This phase proceeds as follows:
(a) As long as the queue Q is not empty, we pop a space S from the queue Q and clear a flag called changed (lines 7-8).
(b) For each variable xi in P, we compute a single-variable polynomial P by setting all variables xj (j ≠ i) to the corresponding intervals vj ∈ S. Next, we solve P using any root-finding algorithm (e.g., the Newton-Raphson method [22]) implemented using interval analysis, to obtain a set of one or more disjoint root-spaces (i.e., roots, lines 9-11). In our running example, P is computed twice during the run of the for loop starting on line 9. In the first round, with x0 as the variable, ... [1, 5] to be pushed on the queue Q for processing during the following iteration of the algorithm.
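A single narrowing step of this kind can be sketched for the special case in which the single-variable polynomial is linear, as it is in the running example. With the other variable fixed to its interval, the equation a × x + b = 0 is solved for x with b ranging over an interval, and the result is intersected with the old interval. The helper names `solve_linear` and `intersect` are hypothetical, and higher-degree polynomials would need an interval root finder instead of this closed form.

```python
def solve_linear(a, b_lo, b_hi):
    """Interval of x with a * x + b = 0 for some b in [b_lo, b_hi], a != 0."""
    ends = (-b_lo / a, -b_hi / a)
    return (min(ends), max(ends))


def intersect(u, v):
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo, hi = max(u[0], v[0]), min(u[1], v[1])
    return (lo, hi) if lo <= hi else None


# Running example 2 * x0 + x1 + 4 = 0 on the box x0, x1 in [-5, 5].
x0, x1 = (-5, 5), (-5, 5)

# x0 as the only variable: 2 * x0 + b = 0 with b = x1 + 4 in [-1, 9].
root = solve_linear(2, x1[0] + 4, x1[1] + 4)
narrowed_x0 = intersect(root, x0)
print(narrowed_x0)  # (-4.5, 0.5)

# On the sub-box x0 in [1, 5]: x1 + b = 0 with b = 2 * x0 + 4 in [6, 14].
print(intersect(solve_linear(1, 2 * 1 + 4, 2 * 5 + 4), x1))  # None
```

The first step shrinks the x0 interval from [−5, 5] to [−4.5, 0.5]. The second shows that a box with x0 in [1, 5] contains no root, since the solved x1 interval [−14, −6] does not meet [−5, 5].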
Quantization Phase (lines 24-26):
Finally, we convert each root-space in the output set R to the smallest bounding integer space. Table 1 gives the final output set R for our running example. This result is shown graphically in Figure 3 . All the shaded areas are the root-spaces, and as shown in Figure 3 , the equation 2x0 + x1 + 4 = 0 passes through all of them. 
Partitioning
Given the root-spaces for an expression Ex (corresponding to a normalized simple condition Ex ROP 0), the entire domain of Ex can be partitioned into a number of disjoint spaces. This is accomplished by extending the boundaries of each root-space to the limits (min and max) of the entire domain to establish the borders between the disjoint spaces. For our running example, the boundary points {0, −1, −4, −3, −2} for x0 and {−4, −2, −1, 1, 5, 0} for x1 (see Table 1 ) partition the entire domain space as shown in Figure 4 . In Figure 4 , the root-spaces are shown in shaded color.
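The boundary-extension step can be sketched as follows: every axis of the domain is cut at the ends of each root-space, and the disjoint spaces are the cartesian product of the resulting per-axis segments. The sketch assumes inclusive integer bounds, so a root-space [a, b] starts one segment at a and the next at b + 1. The function names and the two root-spaces used below are made up for illustration; they are not the entries of Table 1.

```python
from itertools import product


def axis_segments(lo, hi, spans):
    """Split inclusive [lo, hi] at the ends of each inclusive span."""
    starts = {lo}
    for a, b in spans:
        if lo < a <= hi:
            starts.add(a)      # a segment begins where a root-space begins
        if lo <= b < hi:
            starts.add(b + 1)  # and another right after it ends
    starts = sorted(starts)
    ends = [s - 1 for s in starts[1:]] + [hi]
    return list(zip(starts, ends))


def partition(domain, root_spaces):
    """Disjoint boxes obtained by extending root-space borders to the domain."""
    per_axis = [
        axis_segments(lo, hi, [rs[i] for rs in root_spaces])
        for i, (lo, hi) in enumerate(domain)
    ]
    return list(product(*per_axis))


domain = [(-5, 5), (-5, 5)]
root_spaces = [((-4, -3), (2, 4)), ((0, 0), (-4, -4))]  # hypothetical values
cells = partition(domain, root_spaces)
print(len(cells))  # 25
```

Each resulting cell either lies entirely inside the projection of a root-space on every axis or entirely outside it, which is what the evaluation step relies on.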
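For each disjoint space si, and si not overlapping with any of the root-spaces, it must be the case that evaluating the simple condition at any point in si yields the same truth value, because Ex has no root in si and therefore cannot change sign within it.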
Evaluation
After partitioning the domain space, each disjoint space si, and si not overlapping with any of the root-spaces, can be evaluated separately. This is done by picking an arbitrary point in si and evaluating the simple condition C. This will yield either a true or a false result. Accordingly, space si can be marked as true or false. For a disjoint space sj, and sj overlapping with one of the root-spaces, such evaluation cannot be performed; therefore, sj must be marked as unknown. For example, evaluating 2x0 + x1 + 4 > 0 with the arbitrary point (3, 3) in space [1, 5] × [1, 5] yields a true value; thus, the entire space [1, 5] × [1, 5] is marked as true (see Figure 5).
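The evaluation rule can be sketched in a few lines: a cell that overlaps a root-space is marked unknown, and any other cell takes the truth value of the condition at one sample point. The names `overlaps` and `mark` are hypothetical, and the single root-space used below is a hand-chosen (not minimal) enclosure of the integer roots of the running example rather than the output of Algorithm 1.

```python
def overlaps(a, b):
    """True if two boxes of inclusive (lo, hi) pairs share at least one point."""
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))


def mark(cell, root_spaces, cond):
    """Truth value of a cell: 'true', 'false', or 'unknown'."""
    if any(overlaps(cell, rs) for rs in root_spaces):
        return "unknown"
    sample = tuple(lo for lo, _ in cell)  # any point of the cell would do
    return "true" if cond(*sample) else "false"


def c(x0, x1):
    return 2 * x0 + x1 + 4 > 0


root_spaces = [((-4, 0), (-4, 4))]  # hypothetical enclosure of the roots

print(mark(((1, 5), (1, 5)), root_spaces, c))     # true
print(mark(((-5, -5), (-5, 5)), root_spaces, c))  # false
print(mark(((-2, -1), (0, 0)), root_spaces, c))   # unknown
```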
Merging
When two n-dimensional spaces have the same truth value and share n − 1 common borders, then these two spaces can be merged. For example, in Figure 5 ...

In our proposed technique (i.e., Figure 2), the overall running time is bounded by the running time of the merging step. Given k disjoint n-dimensional spaces, a brute-force approach can be used to solve the merging problem. To do so, we take each pair of spaces (i.e., O(k^2) pairs) and look for n − 1 common borders (i.e., O(n)), for a total cost of O(k^2 × n). Here, in the worst case, one pair of spaces may be merged, reducing the total number of spaces to k − 1. Then, the process repeats, up to k times, until a single space remains. Thus, the total running time is O(k^3 × n). The dimensionality n is the number of variables in the simple condition and is usually small (e.g., less than 8) for manually written programs. Hence, the effective running time of the brute-force merging algorithm is O(k^3).

Alternatively, we can use a divide-and-conquer heuristic to do this in O(k^2). The idea is to subdivide the k disjoint spaces into two equal clusters and recursively merge each cluster. In turn, each of these two clusters is broken down further, until the size of a cluster is less than or equal to two. There are k/2 = O(k) such leaf clusters, and merging a leaf cluster takes O(1), for a total of O(k). The above procedure would, in the worst case, merge a single pair during each iteration, reducing the total number of clusters to k − 1. Repeating, as long as some clusters have merged, would take O(k) iterations. Thus, the final running time is bounded by O(k^2). Figure 6 shows the result of the merge operation on Figure 5.
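The brute-force variant of the merging step can be sketched as below. The sketch interprets "sharing n − 1 common borders" as two boxes having identical intervals on all axes but one and being adjacent on that remaining axis; this reading, the inclusive integer bounds, and the names `try_merge` and `merge_all` are assumptions made for illustration.

```python
def try_merge(a, b):
    """Merge two boxes that agree on all but one axis and touch on that axis."""
    differing = [i for i, (u, v) in enumerate(zip(a, b)) if u != v]
    if len(differing) != 1:
        return None
    i = differing[0]
    (alo, ahi), (blo, bhi) = sorted([a[i], b[i]])
    if ahi + 1 != blo:  # inclusive integer bounds: must be adjacent
        return None
    return a[:i] + ((alo, bhi),) + a[i + 1:]


def merge_all(spaces):
    """spaces: list of (box, truth). Repeat pairwise merging until stable."""
    spaces = list(spaces)
    merged = True
    while merged:
        merged = False
        for i in range(len(spaces)):
            for j in range(i + 1, len(spaces)):
                (a, ta), (b, tb) = spaces[i], spaces[j]
                box = try_merge(a, b) if ta == tb else None
                if box is not None:
                    spaces[j] = (box, ta)  # keep the merged box
                    del spaces[i]          # drop the absorbed one
                    merged = True
                    break
            if merged:
                break
    return spaces


spaces = [
    (((1, 2), (1, 5)), "true"),
    (((3, 5), (1, 5)), "true"),
    (((1, 5), (-5, 0)), "false"),
]
print(merge_all(spaces))
```

Each pass scans all pairs and restarts after one successful merge, which mirrors the O(k^3 × n) bound derived above; it is meant to show the rule, not to be efficient.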
DOMAIN SPACE PARTITIONING FOR COMPLEX CONDITION
Our overall strategy for solving the domain space partitioning problem for complex conditions is depicted in Figure 7. The steps involved include parsing, evaluating leaf nodes, and domain space propagation and merging.
Parsing
To capture a complex condition, we use a DAG representation with internal nodes of types (&&, ||, !) and leaf nodes of type simple conditions. As mentioned in Section 3, the simple condition is captured as a multi-variable polynomial ROP 0. As a running example, consider the complex condition (2 × x0 + x1 + 4 > 0) || ( (x0 − 2 < 0) && !(x1 − 3 > 0) ) and its DAG representation shown in Figure 8 .
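One way to hold such a representation in code is a small set of node classes, one per node type. The sketch below builds the running example and evaluates it at single points. The class names and the use of Python callables for the leaf conditions are assumptions made for illustration; the tool itself stores each leaf as a polynomial together with a relational operator.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Leaf:
    label: str
    test: Callable[..., bool]  # the simple condition as a predicate


@dataclass(frozen=True)
class Not:
    child: "Node"


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"


Node = Union[Leaf, Not, And, Or]


def holds(node, point):
    """Evaluate the complex condition rooted at node for one point."""
    if isinstance(node, Leaf):
        return node.test(*point)
    if isinstance(node, Not):
        return not holds(node.child, point)
    if isinstance(node, And):
        return holds(node.left, point) and holds(node.right, point)
    return holds(node.left, point) or holds(node.right, point)


running_example = Or(
    Leaf("2*x0 + x1 + 4 > 0", lambda x0, x1: 2 * x0 + x1 + 4 > 0),
    And(
        Leaf("x0 - 2 < 0", lambda x0, x1: x0 - 2 < 0),
        Not(Leaf("x1 - 3 > 0", lambda x0, x1: x1 - 3 > 0)),
    ),
)

print(holds(running_example, (3, 3)))   # True
print(holds(running_example, (-5, 5)))  # False
```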
Evaluating Leaf Nodes
Each leaf node in the DAG representation is a simple condition and is evaluated as outlined in Section 4. Specifically, each leaf node in the DAG representation corresponds to one instance of the domain space partitioning problem for simple conditions. Figure 9 shows the partitioned domain spaces for the leaf nodes of our running example.
Domain Space Propagation and Merging
After computing the partitioned domain spaces for leaf ...

For the logical not operator (!), the truth value of a space marked as true or false is inverted. A space marked as unknown is unchanged. Figure 11 shows the DAG representation after applying the logical not operator (!) to the (x1 − 3 > 0) leaf node.
For the logical and operator (&&), the merging is performed on those spaces that have an overlap region. Let us assume L and R are two partitioned domain spaces. Let us further assume that sl ∈ L and sr ∈ R are two overlapping spaces in those domains. If space sp is the overlapping space between sl and sr, then sp will be added to the result of the logical and. The truth value of sp is computed using the merge rules given in Figure 10. This procedure is shown in Algorithm 2. Figure 12 shows an example of the logical and merging of two partitioned domain spaces. In Figure 12, two spaces sl1 and sr1 are overlapping and their overlap is space sp1, with its truth value set to false. Algorithm 2, with two nested for loops, has O(N^2) running time. To improve on this algorithm, instead of comparing all pairs of spaces in each domain space to see whether they overlap, we use the R-tree data structure [14] to speed up the search. An R-tree, as defined in [14], is a height-balanced tree suitable for handling spatial data in multidimensional spaces. Figure 13 shows a partitioned domain space and the way it is represented using the R-tree structure.
Algorithm 3 uses the R-tree data structure to make Algorithm 2 faster. Specifically, Algorithm 3 uses an R-tree representation of the domain spaces to efficiently find all overlapping regions. The running time of Algorithm 3 is O(N × log(N)). Finally, the logical or operator can be performed in a way similar to the logical and operator.
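The exhaustive logical-and merging (the nested-loop variant, without the R-tree) can be sketched as follows. Since the merge rules of Figure 10 are not reproduced in the text, the sketch assumes the usual three-valued (Kleene) rules: false dominates, true && true is true, and everything else is unknown. The function names and the hand-written input partitions are also assumptions; the final merging of adjacent result spaces is omitted.

```python
def not3(t):
    """Logical not on a truth label; unknown stays unknown."""
    return {"true": "false", "false": "true", "unknown": "unknown"}[t]


def and3(a, b):
    """Assumed merge rule for &&: three-valued (Kleene) conjunction."""
    if a == "false" or b == "false":
        return "false"
    if a == "true" and b == "true":
        return "true"
    return "unknown"


def intersect(a, b):
    """Overlap of two boxes of inclusive (lo, hi) pairs, or None."""
    box = tuple((max(alo, blo), min(ahi, bhi))
                for (alo, ahi), (blo, bhi) in zip(a, b))
    return box if all(lo <= hi for lo, hi in box) else None


def and_merge(left, right):
    """Pairwise intersection of two partitioned domain spaces."""
    out = []
    for lbox, lt in left:
        for rbox, rt in right:
            p = intersect(lbox, rbox)
            if p is not None:
                out.append((p, and3(lt, rt)))
    return out


# Hand-written partitions of [-5, 5] x [-5, 5] for two leaves.
left = [(((-5, 1), (-5, 5)), "true"), (((2, 5), (-5, 5)), "false")]    # x0 - 2 < 0
leaf = [(((-5, 5), (-5, 3)), "false"), (((-5, 5), (4, 5)), "true")]    # x1 - 3 > 0
right = [(box, not3(t)) for box, t in leaf]                            # !(x1 - 3 > 0)

for box, truth in and_merge(left, right):
    print(box, truth)
```

An R-tree over the right-hand spaces would replace the inner loop with a query that returns only the spaces overlapping lbox, which is the change Algorithm 3 makes.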
Algorithm 2 Logical-AND Space Merging-Exhaustive way
1: Input: Partitioned domain spaces Sl and Sr
2: Output: Merged domain space Sp
3: for all spaces l ∈ Sl do
4:   for all spaces r ∈ Sr do
5:     p ← l ∩ r {Intersection of the two subspaces}
6:     if (p ≠ ∅) then
7:       p.truth ← f_mergerule(l.truth, r.truth) {Fig 10}
8:       Sp.push(p)
9:     end if
10:   end for
11: end for
12: Sp.merge()
13: return Sp

Figure 13: Partitioned Domain Space Representation Using R-tree

Using the logical not operator and the merge algorithms for the logical operations and and or, the DAG representation is recursively merged in a bottom-up traversal. Figure 14 shows the result of merging the spaces of Figure 9 in three steps. Figure 14(a) shows the initial state after evaluating the leaf nodes, Figure 14 ...
EXPERIMENTS
We tested our tool using two different approaches. In the first approach, we picked random simple and complex conditions from Mediabench [17] applications. In the second approach, we evaluated our tool using synthetic examples with a more aggressive combination of supported arithmetic operators.

Algorithm 3 Logical-AND Space Merging-Using R-tree
1: Input: Partitioned domain spaces Sl and Sr
2: Output: Merged domain space Sp
3: rT = make an R-tree using Sr
4: for all spaces l ∈ Sl do
5:   overlappedRegion = rT.overlap(l)
6:   for all spaces o ∈ overlappedRegion do
7:     p ← l ∩ o {Intersection of the two subspaces}
8:     p.truth ← f_mergerule(l.truth, o.truth) {Fig 10}
9:     Sp.push(p)
10:   end for
11: end for
12: Sp.merge()
13: return Sp
Mediabench Examples
In our first set of experiments, we randomly selected a number of simple and complex conditions from Mediabench applications [17]. Table 2 gives some basic statistics for the selected conditions, namely, the total number of simple and complex conditions (#Exp), the average number of variables per condition (Avg. #Var), the average number of arithmetic operations per condition (Avg. #Arith), the average number of logical operations per condition (Avg. #Logic), and the average CPU time for evaluating a condition (Time). Table 3 shows the ratio of truth values for the Mediabench examples, as computed by our technique. On average, about 92.7% of the whole domain of each condition is evaluated to true or false and about 7.30% is evaluated to unknown. Note that the portion of the domain space that is evaluated to true or false (i.e., 92.7%) represents the amount of pruning (with respect to evaluating the condition for all possible domain values) achieved by our algorithm. Conversely, the portion of the domain space that is evaluated to unknown (i.e., 7.30%) would require exhaustive evaluation to resolve the truth value of the condition.
Synthetic Examples
In our second set of experiments, we evaluated our tool using synthetic examples with a more aggressive combination of supported arithmetic operators. We generated a total of 500 synthetic single and complex conditions; of those, a partial list of simple conditions is presented in Table 4.

Table 4: Partial list of synthetic simple conditions
Single Condition                                    #Spaces    Time
(x0 * x0 * x1 * x1 + x2 * x3 == 100)                    678    0.08
(x0 * x0 * x1 * x1 * x2 * x3 == 100)                    345    0.05
(x0 + x1 + x2 + x3 + x4 == 100)                      171975   95.58
(x0 * x1 * x2 + x3 + x4 < 100)                        97802   47.99
((x0 * x0 * x1 * x2) + x3 + x4 == 100)                84499   42.14
((x0 * x0) + (x1 * x1) + x2 + x3 + x4 == 100)         63296  144.97
((x0 * x0) + (x1 * x2 * x3 * x4) < 100)               38456   10.72
((x0 * x0) + (x1 * x2) + (x3 * x4) < 100)             24057   10.02
((x0 * x0 * x1) + (x2 * x3 * x4) < 100)               10616    2.63
(x0 * x0 * x1 * x1 * x2 * x3 * x4 < 100)               6336    1.1
((x0 * x0 * x1 * x1) + x2 + x3 + x4 < 100)             3272    1.29

Table 4 gives some basic statistics for the synthetic simple conditions, namely, the actual example (Single Condition), the generated number of unmerged spaces (#Spaces), and the CPU time for evaluating the synthetic single condition (Time). In our strategy for generating these examples, we considered the number of variables ranging from 1 to 5, the number of arithmetic operations (+, −, ×) from 1 to 5, the number of relational operators from 2 to 3, and the number of logical operators from 1 to 2. For a complete list of the synthetic examples applied for single and complex conditions, refer to [13] and [12]. Figure 15 and Figure 16 show the CPU time for running our algorithm on those simple condition examples with four or five variables. Figure 17 and Figure 18 show the CPU time for running our algorithm on those complex condition examples with 3 variables, 2 or 3 relational operators, and 1 or 2 logical operators. Our results show that the CPU time for running our algorithm is proportional to the number of spaces into which the domain of the condition being evaluated is partitioned.
CONCLUSION
In this paper we have proposed a method for solving the expression equivalence problem using partial evaluation. In our method, we used interval analysis to substantially prune the domain space of arithmetic expressions (and conditional expressions) and limited the evaluation effort to a sufficiently small number of minimally sized spaces within the domain of the expression. Then, we extended the technique to incorporate arbitrary use of the logic operators and, or, and not within arithmetic expressions. Our results show that the proposed method is fast enough to be of use in practice.
ACKNOWLEDGMENT
This work was supported, in part, by the National Science Foundation award number CCR-0205712.
