Abstract-In this paper we present a new data structure for representing Boolean functions and an associated set of manipulation algorithms. Functions are represented by directed, acyclic graphs in a manner similar to the representations introduced by Lee [1] and Akers [2], but with further restrictions on the ordering of decision variables in the graph. Although a function requires, in the worst case, a graph of size exponential in the number of arguments, many of the functions encountered in typical applications have a more reasonable representation. Our algorithms have time complexity proportional to the sizes of the graphs being operated on, and hence are quite efficient as long as the graphs do not grow too large. We present experimental results from applying these algorithms to problems in logic design verification that demonstrate the practicality of our approach.
I. INTRODUCTION
Boolean Algebra forms a cornerstone of computer science and digital system design. Many problems in digital logic design and testing, artificial intelligence, and combinatorics can be expressed as a sequence of operations on Boolean functions. Such applications would benefit from efficient algorithms for representing and manipulating Boolean functions symbolically. Unfortunately, many of the tasks one would like to perform with Boolean functions, such as testing whether there exists any assignment of input variables such that a given Boolean expression evaluates to 1 (satisfiability), or whether two Boolean expressions denote the same function (equivalence), require solutions to NP-complete or coNP-complete problems [3]. Consequently, all known approaches to performing these operations require, in the worst case, an amount of computer time that grows exponentially with the size of the problem. This makes it difficult to compare the relative efficiencies of different approaches to representing and manipulating Boolean functions. In the worst case, all known approaches perform as poorly as the naive approach of representing functions by their truth tables and defining all of the desired operations in terms of their effect on truth table entries. In practice, by utilizing more clever representations and manipulation algorithms, we can often avoid these exponential computations.
Some representations [4] are quite impractical: every function of n arguments has a representation of size $2^n$ or more. More practical approaches utilize representations that, at least for many functions, are not of exponential size. Example representations include a reduced sum of products [4] (or equivalently sets of prime cubes [5]) and a factorization into unate functions [6]. Due to the characteristics of these representations, most programs that process a sequence of operations on Boolean functions have rather erratic behavior.
They proceed at a reasonable pace, but then suddenly "blow up," either running out of storage or failing to complete an operation in a reasonable amount of time.
In this paper we present a new class of algorithms for manipulating Boolean functions represented as directed acyclic graphs. Our representation resembles the binary decision diagram notation introduced by Lee [1] and further popularized by Akers [2]. However, we place further restrictions on the ordering of decision variables in the vertices. These restrictions enable the development of algorithms for manipulating the representations in a more efficient manner.
Our representation has several advantages over previous approaches to Boolean function manipulation. First, most commonly encountered functions have a reasonable representation. For example, all symmetric functions (including even and odd parity) are represented by graphs where the number of vertices grows at most as the square of the number of arguments. Second, the performance of a program based on our algorithms when processing a sequence of operations degrades slowly, if at all. That is, the time complexity of any single operation is bounded by the product of the graph sizes for the functions being operated on. For example, complementing a function requires time proportional to the size of the function graph, while combining two functions with a binary operation (of which intersection, subtraction, and testing for implication are special cases) requires, at most, time proportional to the product of the two graph sizes. Finally, our representation in terms of reduced graphs is a canonical form, i.e., every function has a unique representation. Hence, testing for equivalence simply involves testing whether the two graphs match exactly, while testing for satisfiability simply involves comparing the graph to that of the constant function 0.
0018-9340/86/0800-0677$01.00 © 1986 IEEE
Unfortunately, our approach does have its own set of undesirable characteristics. At the start of processing, we must choose some ordering of the system inputs as arguments to all of the functions to be represented. For some functions, the size of the graph representing the function is highly sensitive to this ordering. The problem of computing an ordering that minimizes the size of the graph is itself a coNP-complete problem. Our experience, however, has been that a human with some understanding of the problem domain can generally choose an appropriate ordering without great difficulty. It seems quite likely that using a small set of heuristics, the program itself could select an adequate ordering most of the time. More seriously, there are some functions that can be represented by Boolean expressions or logic circuits of reasonable size but for which the representation as a function graph is too large to be practical for all input orderings. For example, we prove in the Appendix that the functions describing the outputs of an integer multiplier have graphs that grow exponentially in the word size regardless of the input ordering. With the exception of integer multiplication, our experience has been that such functions seldom arise in digital logic design applications. For other classes of problems, particularly in combinatorics, our methods seem practical only under restricted conditions.
A variety of graphical representations of discrete functions have been presented and studied extensively. A survey of the literature on the subject by Moret [7] cites over 100 references, but none of these describe a sufficient set of algorithms to implement a Boolean function manipulation program. Fortune, Hopcroft, and Schmidt [8] studied the properties of graphs obeying similar restrictions to ours, showing that two graphs could be tested for functional equivalence in polynomial time and that some functions require much larger graphs under these restrictions than under milder restrictions. Payne [9] describes techniques similar to ours for reducing the size of the graph representing a function. Our algorithms for combining two functions with a binary operation, and for composing two functions are new, however, and these capabilities are central to a symbolic manipulation program.
The next section of this paper contains a formal presentation of function graphs. We define the graphs, the functions they represent, and a class of "reduced" graphs. Then we prove a key property of reduced function graphs: that they form a canonical representation of Boolean functions. In the following section we depart from this formal presentation to give some examples and to discuss issues regarding the efficiency of our representation. Following this, we develop a set of algorithms for manipulating Boolean functions using our representation. These algorithms utilize many of the classical techniques for graph algorithms, and we assume the reader has some familiarity with these techniques. We then present some experimental investigations into the practicality of our methods. We conclude by suggesting further refinements of our methods.
A. Notation
We assume the functions to be represented all have the same n arguments, written $x_1, \ldots, x_n$. In expressing a system such as a combinational logic network or a Boolean expression as a Boolean function, we must choose some ordering of the inputs or atomic variables, and this ordering must be the same for all functions to be represented.
The function resulting when some argument $x_i$ of function f is replaced by a constant b is called a restriction of f (sometimes termed a cofactor [10]) and is denoted $f|_{x_i=b}$. That is, for any arguments $x_1, \ldots, x_n$,
$f|_{x_i=b}(x_1, \ldots, x_n) = f(x_1, \ldots, x_{i-1}, b, x_{i+1}, \ldots, x_n)$.
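The restriction operation can be illustrated with a minimal Python sketch that treats a Boolean function as an ordinary callable over 0/1 arguments. The names `restrict`, `shannon_holds`, and `example` are hypothetical, and the check uses the standard expansion of a function about one argument (the Shannon expansion cited in this section) rather than anything specific to function graphs.

```python
from itertools import product

def restrict(f, i, b):
    """Return f with argument x_i (1-based) replaced by the constant b."""
    def g(*xs):
        xs = list(xs)
        xs[i - 1] = b          # the value supplied for x_i is ignored
        return f(*xs)
    return g

def shannon_holds(f, n, i):
    """Check f = x_i * f|x_i=1 + (not x_i) * f|x_i=0 on all 2^n inputs."""
    f1, f0 = restrict(f, i, 1), restrict(f, i, 0)
    for xs in product((0, 1), repeat=n):
        xi = xs[i - 1]
        expanded = (xi & f1(*xs)) | ((1 - xi) & f0(*xs))
        if expanded != f(*xs):
            return False
    return True

# Hypothetical example function of three arguments: x1*x2 + x3.
def example(x1, x2, x3):
    return (x1 & x2) | x3

if __name__ == "__main__":
    print(restrict(example, 3, 0)(1, 1, 1))                    # x3 forced to 0
    print(all(shannon_holds(example, 3, i) for i in (1, 2, 3)))
```

The exhaustive check is exponential in n and is meant only to make the definition concrete for small examples.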
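Using this notation, the Shannon expansion [11] of a function around variable $x_i$ is given by $f = x_i \cdot f|_{x_i=1} + \bar{x}_i \cdot f|_{x_i=0}$.
For nonterminal vertices v and v' with $\sigma(v) = v'$, an isomorphic mapping $\sigma$ must satisfy index(v) = index(v'), $\sigma$(low(v)) = low(v'), and $\sigma$(high(v)) = high(v').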
Note that since a function graph contains only one root and the children of any nonterminal vertex are distinguished, the isomorphic mapping $\sigma$ between graphs G and G' is quite constrained: the root in G must map to the root in G', the root's low child in G must map to the root's low child in G', and so on all the way down to the terminal vertices.
Suppose instead that there is some vertex u with index(u) = j < i, but such that there is no other vertex w having j < index(w) < i. The function f does not depend on $x_j$, and hence the subgraphs rooted by low(u) and high(u) both denote f, but this implies that low(u) = high(u) = v, i.e., G is not reduced. Similarly, vertex v' must be the root of G', and hence the two graphs are isomorphic.
The odd parity function of n variables is denoted by a graph containing 2n + 1 vertices. This compares favorably to its representation in reduced sum-of-products form (requiring on the order of $2^n$ terms). This graph resembles the familiar parity ladder contact network first described by Shannon [11].
Upon closer examination of these two graphs, we can gain a better intuition of how this problem arises. Imagine a bit-serial processor that computes a Boolean function by examining the arguments $x_1$, $x_2$, and so on in order, producing output 0 or 1 after the last bit has been read. Such a processor requires internal storage to store enough information about the arguments it has already seen to correctly deduce the value of the function from the values of the remaining arguments. Some functions require little intermediate information. For example, to compute the parity function a bit-serial processor need only store the parity of the arguments it has already seen. Similarly, to compute the function $x_1 \cdot x_2 + \cdots + x_{2n-1} \cdot x_{2n}$, the processor need only store whether any of the preceding pairs of arguments were both 1, and perhaps the value of the previous argument. On the other hand, to compute the function $x_1 \cdot x_{n+1} + \cdots + x_n \cdot x_{2n}$, we would need to store the first n arguments to correctly deduce the value of the function from the remaining arguments. A function graph can be thought of as such a processor, with the set of vertices having index i describing the processing of argument $x_i$. Rather than storing intermediate information as bits in a memory, however, this information is encoded in the set of possible branch destinations. That is, if the bit-serial processor requires b bits to encode information about the first i arguments, then in any graph for this function there must be at least $2^b$ vertices that are either terminal or are nonterminal with index greater than i having incoming branches from vertices with an index less than or equal to i.
For example, the function $x_1 \cdot x_4 + x_2 \cdot x_5 + x_3 \cdot x_6$ requires $2^3$ branches from vertices with an index less than or equal to 3 to vertices which are either terminal or have an index greater than 3. In fact, the first 3 levels of this graph must form a complete binary tree to obtain this degree of branching. In the generalization of this function, the first n levels of the graph form a complete binary tree, and hence the number of vertices grows exponentially with the number of arguments.
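The bit-serial argument above can be checked on small cases by counting how many distinct functions of the remaining arguments arise once the first i arguments are fixed. The following Python sketch does this by brute force over truth tables; the function names are hypothetical, and the count is offered only as an illustration of the intuition, not as the size of the graph itself.

```python
from itertools import product

def distinct_subfunctions(f, n, i):
    """Count distinct functions of x_{i+1}..x_n obtained by fixing x_1..x_i."""
    tables = set()
    for prefix in product((0, 1), repeat=i):
        # Truth table of the restriction of f to this prefix.
        table = tuple(f(*prefix, *rest)
                      for rest in product((0, 1), repeat=n - i))
        tables.add(table)
    return len(tables)

def adjacent_pairs(x1, x2, x3, x4, x5, x6):
    """x1*x2 + x3*x4 + x5*x6: paired arguments are adjacent in the ordering."""
    return (x1 & x2) | (x3 & x4) | (x5 & x6)

def split_pairs(x1, x2, x3, x4, x5, x6):
    """x1*x4 + x2*x5 + x3*x6: paired arguments are three positions apart."""
    return (x1 & x4) | (x2 & x5) | (x3 & x6)

if __name__ == "__main__":
    print(distinct_subfunctions(adjacent_pairs, 6, 3))  # 3
    print(distinct_subfunctions(split_pairs, 6, 3))     # 8
```

With the first ordering only three pieces of information survive the first three arguments, while the second ordering needs all $2^3$ combinations, matching the branch count stated in the text.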
To use our algorithms on anything other than small problems (e.g., functions of 16 variables or more), a user must have an intuition about why certain functions have large function graphs, and how the choice of input ordering may affect this size. In Section V we will present examples of how the structure of the problem to be solved can often be exploited to obtain a suitable input ordering.
C. Inherently Complex Functions
Some functions cannot be represented efficiently with our representation regardless of the input ordering. Unfortunately, the functions representing the output bits of an integer multiplier fall within this class. The Appendix contains a proof that for any ordering of the inputs $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$, at least one of the 2n functions representing the integer product $a \cdot b$ requires a graph containing at least $2^{n/8}$ vertices. While this lower bound is not very large for word sizes encountered in practice (e.g., it equals 256 for n = 64), it indicates the exponential complexity of these functions. Furthermore, we suspect the true bound is far worse.
Empirically, we have found that for word sizes n less than or equal to 8, the output functions of a multiplier require no more than 5000 vertices for a variety of different input orderings. However, for n > 10, some outputs require graphs with more than 100 000 vertices, and hence become impractical.
Given the wide variety of techniques used in implementing multipliers [12]
First, if id(low(v)) = id(high(v)), then vertex v is redundant, and we should set id(v) = id(low(v)). Second, if there is some labeled vertex u with index(u) = i having id(low(v)) = id(low(u)) and id(high(v)) = id(high(u)), then the reduced subgraphs rooted by these two vertices will be isomorphic, and we should set id(v) = id(u). A sketch of the code is shown in Fig. 4. First, the vertices are collected into lists according to their indexes. This can be done by a procedure similar to Traverse where, as a vertex is visited, it is added to the appropriate list. Then we process these lists working from the one containing the terminal vertices up to the one containing the root. For each vertex on a list we create a key of the form (value) for a terminal vertex or of the form (lowid, highid) for a nonterminal vertex, where lowid = id(low(v)) and highid = id(high(v)). If a vertex has lowid = highid, then we can immediately set id(v) = lowid.
The remaining vertices are sorted according to their keys. We then work through this sorted list, assigning a given label to all vertices having the same key. We also select one vertex record for each unique label and store a pointer to this vertex in an array indexed by the label. These selected vertices will form the final reduced graph. Hence, we can obtain the reduced version of a subgraph with root v by accessing the array element with index id(v). We use this method to modify a vertex record so that its two children are vertices in the reduced graph and to return the root of the final reduced graph when the procedure is exited. Note that the labels assigned to the vertices by this routine can serve as unique identifiers for later routines. The time complexity of the algorithm is dominated by the time to sort the lists. Fig. 5 shows an example of how the reduction algorithm works. Next to each vertex we show the key and the label generated during the labeling process. Observe that both vertices with index 3 have the same key, and hence the right hand vertex with index 2 is redundant.
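The labeling scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of Fig. 4: the `Vertex` record, the field name `label` (standing in for id), the convention that terminal vertices carry index n + 1, and the helpers `collect`, `build_tree`, and `size` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Vertex:
    index: int                        # variable index; n + 1 for terminals (assumed)
    low: Optional["Vertex"] = None
    high: Optional["Vertex"] = None
    value: Optional[int] = None       # 0 or 1 for terminals, None otherwise
    label: int = -1                   # the id assigned during reduction

def collect(root):
    """Group every reachable vertex into lists according to its index."""
    levels, seen, stack = {}, set(), [root]
    while stack:
        v = stack.pop()
        if id(v) in seen:
            continue
        seen.add(id(v))
        levels.setdefault(v.index, []).append(v)
        if v.value is None:
            stack.extend((v.low, v.high))
    return levels

def reduce_graph(root):
    """Label vertices bottom-up and return the root of the reduced graph."""
    levels = collect(root)
    chosen = []                       # chosen[label] = representative vertex
    for index in sorted(levels, reverse=True):    # terminals first, root last
        keyed = []
        for v in levels[index]:
            if v.value is not None:
                keyed.append(((v.value,), v))
            elif v.low.label == v.high.label:
                v.label = v.low.label             # redundant vertex
            else:
                keyed.append(((v.low.label, v.high.label), v))
        keyed.sort(key=lambda kv: kv[0])
        previous = None
        for key, v in keyed:
            if key != previous:                   # first vertex with this key
                chosen.append(v)
                if v.value is None:               # point children into reduced graph
                    v.low = chosen[v.low.label]
                    v.high = chosen[v.high.label]
                previous = key
            v.label = len(chosen) - 1
    return chosen[root.label]

def build_tree(f, n, prefix=()):
    """Build the complete, unreduced decision tree of f over x_1..x_n."""
    if len(prefix) == n:
        return Vertex(index=n + 1, value=f(*prefix))
    return Vertex(index=len(prefix) + 1,
                  low=build_tree(f, n, prefix + (0,)),
                  high=build_tree(f, n, prefix + (1,)))

def size(root):
    """Number of vertices reachable from root."""
    return sum(len(vs) for vs in collect(root).values())
```

As a usage example, `size(reduce_graph(build_tree(lambda *xs: sum(xs) % 2, 4)))` reduces the 31-vertex tree for odd parity of four variables to the 2n + 1 = 9 vertices mentioned earlier in the text.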
C. Apply
The procedure Apply provides the basic method for creating the representation of a function.
G3 is the resulting reduced graph. Fig. 7 shows an example of how this algorithm would proceed in applying the "OR" operation to graphs representing the functions $\neg(x_1 \cdot x_3)$ and $x_2 \cdot x_3$. This figure shows the graph created by the algorithm before reduction. Next to each vertex in the resulting graph, we indicate the two vertices on which the procedure Apply-step was invoked in creating this vertex. Each of our two refinements is applied once: when the procedure is invoked on vertices $a_3$ and $b_1$ (because 1 is a controlling value for this operator), and on the second invocation on vertices $a_3$ and $b_3$. For larger graphs, we would expect these refinements to be applied more often. After the reduction algorithm has been applied, we see that the resulting graph indeed represents the function $\neg(x_1 \cdot \neg x_2 \cdot x_3)$.
Composition can sometimes be performed in a simpler and more efficient way by a more syntactic technique. That is, suppose functions $f_1$ and $f_2$ are represented by graphs $G_1$ and $G_2$, respectively. We can compose the functions by replacing each vertex v in graph $G_1$ having index i by a copy of $G_2$, replacing each branch to a terminal vertex in $G_2$ by a branch to low(v) or high(v) depending on the value of the terminal vertex. We can do this, however, only if the resulting graph would not violate our index ordering restriction. That is, there can be no indexes $j \in I_{f_1}$, $k \in I_{f_2}$ such that $i < j \le k$ or $i > j \ge k$. Assuming both $G_1$ and $G_2$ are reduced, the graph resulting from these replacements is also reduced, and we can even avoid applying the reduction algorithm. While this technique applies only under restricted conditions, we have found it a worthwhile optimization.
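The "OR" example discussed in this section can be reproduced with a small Python sketch of a memoized binary-operation routine. It departs from the procedure described in the text in two labeled ways: graphs are encoded as nested tuples `(index, low, high)` with the integers 0 and 1 as terminals (an assumption of this sketch), and redundant vertices are dropped as the result is built rather than by a separate reduction pass. The controlling-value refinement is omitted. All function names are hypothetical.

```python
import operator

def mk(index, low, high):
    """Make a vertex, skipping it when both children denote the same function."""
    return low if low == high else (index, low, high)

def var(i):
    """Graph for the single variable x_i."""
    return (i, 0, 1)

def index_of(g, n):
    """Index of the root of g; terminals are treated as index n + 1."""
    return g[0] if isinstance(g, tuple) else n + 1

def apply_op(op, g1, g2, n):
    """Combine graphs g1 and g2 over x_1..x_n with the binary operator op."""
    memo = {}                          # one entry per pair of argument vertices

    def step(a, b):
        if not isinstance(a, tuple) and not isinstance(b, tuple):
            return op(a, b)            # both terminal: evaluate directly
        key = (a, b)
        if key not in memo:
            i = min(index_of(a, n), index_of(b, n))
            a0, a1 = (a[1], a[2]) if index_of(a, n) == i else (a, a)
            b0, b1 = (b[1], b[2]) if index_of(b, n) == i else (b, b)
            memo[key] = mk(i, step(a0, b0), step(a1, b1))
        return memo[key]

    return step(g1, g2)

def negate(g, n):
    """Complement, expressed as exclusive-or with the constant 1."""
    return apply_op(operator.xor, g, 1, n)

def evaluate(g, xs):
    """Follow the path selected by the argument values xs (0-based list)."""
    while isinstance(g, tuple):
        g = g[2] if xs[g[0] - 1] else g[1]
    return g

if __name__ == "__main__":
    n = 3
    x1, x2, x3 = var(1), var(2), var(3)
    f1 = negate(apply_op(operator.and_, x1, x3, n), n)      # not (x1 * x3)
    f2 = apply_op(operator.and_, x2, x3, n)                  # x2 * x3
    result = apply_op(operator.or_, f1, f2, n)
    not_x2 = negate(x2, n)
    expected = negate(apply_op(operator.and_,
                               apply_op(operator.and_, x1, not_x2, n), x3, n), n)
    print(result == expected)          # equal tuples mean the same function
```

Because the tuple encoding is structural, two graphs built this way are equal exactly when they are isomorphic, which gives a compact way to see the canonical-form property; a practical implementation would share vertices by reference instead.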
F. Satisfy
There are many questions one could ask about the satisfying set $S_f$ for a function, including the number of elements, a listing of the elements, or perhaps just a single element. As can be seen from Table I, these operations are performed by algorithms of widely varying complexity. A single element can be found in time proportional to n, the number of function arguments, assuming the graph is reduced. Considering that the value of an element in $S_f$ is specified by a bit sequence of length n, this algorithm is optimal. We can list all elements of $S_f$ in time proportional to n times the number of elements, which again is optimal. However, this is generally not a wise thing to do: many functions that are represented by small graphs have very large satisfying sets. For example, the function 1 is represented by a graph with one vertex, yet all $2^n$ possible combinations of argument values are in its satisfying set. Hence, care must be exercised in invoking this algorithm. If we wish to find an element of the satisfying set obeying some property, it can be very inefficient to enumerate all elements of the satisfying set and then pick out an element with the desired characteristics. Instead, we should specify this property in terms of a Boolean function, compute the Boolean product of this function and the original function, and then use the procedure Satisfy-one to select an element. Finally, we can compute the size of the satisfying set by an algorithm of time proportional to the size of the graph (assuming integer operations of sufficient precision can be performed in constant time). In general, it is much faster to apply this algorithm than to enumerate all elements of the satisfying set and count them.
The procedure Satisfy-one shown in Fig. 9 is called with the root of the graph and an array x initialized to some arbitrary pattern of 0's and 1's. It returns the value false if the function is unsatisfiable ($S_f = \emptyset$), and the value true if it is satisfiable. In the latter case, the entries in the array are set to a set of values denoting some element in the satisfying set. This procedure utilizes a classic depth-first search with backtracking scheme to find a terminal vertex in the graph having value 1. The procedure will only backtrack at some vertex when the first child it tries is a terminal vertex with value 0, and in this case it is guaranteed to succeed for the second child. Thus, the complexity of the algorithm is O(n).
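A depth-first search of this kind can be sketched in Python as below. This is written in the spirit of the description above rather than as a transcription of Fig. 9; the tuple encoding `(index, low, high)` with integer terminals 0 and 1, and the name `satisfy_one`, are assumptions of the sketch.

```python
def satisfy_one(g, x):
    """Set entries of the list x to an element of the satisfying set of g.

    g is a reduced graph encoded as nested tuples (index, low, high) with
    the integers 0 and 1 as terminals; x[i - 1] holds the value of x_i.
    Returns False when g denotes the constant 0, and True otherwise.
    """
    if not isinstance(g, tuple):
        return g == 1                  # terminal vertex: succeed only on value 1
    index, low, high = g
    x[index - 1] = 0                   # try the low branch first
    if satisfy_one(low, x):
        return True
    x[index - 1] = 1                   # backtrack and take the high branch
    return satisfy_one(high, x)

if __name__ == "__main__":
    conj = (1, 0, (2, 0, 1))           # graph for x1 * x2
    x = [0, 0, 0]
    print(satisfy_one(conj, x), x)     # True [1, 1, 0]
```

Entries for arguments that do not appear on the chosen path, such as the third entry here, keep their initial values, which is why the array may start with an arbitrary pattern.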
To enumerate all elements of the satisfying set, we can perform an exhaustive search of the graph, printing out the element corresponding to the current path every time we reach a terminal vertex with value 1. The procedure Satisfy-all shown in Fig. 10 implements this method. This procedure has three arguments: the index of the current function argument in the enumeration, the root vertex of the subgraph being searched, and an array describing the state of the search. It is called at the top level with index 1, the root vertex of the graph, and an array with arbitrary initialization. The effect of the procedure when invoked with index i, vertex v, and with the array having its first i - 1 elements equal to $b_1, \ldots, b_{i-1}$, is to enumerate all elements in the set
$\{(b_1, \ldots, b_{i-1}, x_i, \ldots, x_n) \mid f_v(b_1, \ldots, b_{i-1}, x_i, \ldots, x_n) = 1\}$.
As with the previous algorithm, this procedure will work for any function graph, but it could require time exponential in n for an unreduced graph regardless of the size of the satisfying set (consider a complete binary tree with all terminal vertices having value 0). For a reduced graph, however, we are guaranteed that the search will only fail when the procedure is called on a terminal vertex with value 0, and in this case the recursive call to the other child will succeed. Hence, at least half of the recursive calls to Satisfy-all generate at least one new argument value to some element in the satisfying set, and the overall complexity is $O(n \cdot |S_f|)$.
The procedure that computes the size of the satisfying set traverses the graph in the manner of the procedure Traverse.
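The enumeration and counting steps can be sketched together in Python. The enumeration follows the three-argument description above; the counting recurrence is the standard one for ordered decision graphs and is an assumption of this sketch, since the formula itself does not survive in this text. Graphs use the same hypothetical tuple encoding `(index, low, high)` with integer terminals, and the function names are illustrative.

```python
def satisfy_all(i, g, x, n, out):
    """Append to out every element of the satisfying set that extends x[0:i-1].

    i is the index of the current argument, g the subgraph being searched,
    and x the array holding the values chosen so far.
    """
    if g == 0:
        return                         # the search fails only on terminal 0
    if i > n:
        out.append(tuple(x))           # reached terminal 1 with all values set
        return
    index = g[0] if isinstance(g, tuple) else n + 1
    for b in (0, 1):
        x[i - 1] = b
        if index == i:
            satisfy_all(i + 1, g[2] if b else g[1], x, n, out)
        else:                          # g does not test x_i: both values allowed
            satisfy_all(i + 1, g, x, n, out)

def satisfy_count(g, n):
    """Size of the satisfying set, computing one value per distinct vertex."""
    memo = {}

    def idx(v):
        return v[0] if isinstance(v, tuple) else n + 1

    def alpha(v):
        # Number of satisfying assignments to x_index(v) .. x_n (assumed recurrence).
        if not isinstance(v, tuple):
            return v
        if v not in memo:
            _, low, high = v
            memo[v] = ((alpha(low) << (idx(low) - idx(v) - 1))
                       + (alpha(high) << (idx(high) - idx(v) - 1)))
        return memo[v]

    return alpha(g) << (idx(g) - 1)    # account for arguments above the root

if __name__ == "__main__":
    g = (1, 1, (2, (3, 1, 0), 1))      # not (x1 * not x2 * x3)
    found = []
    satisfy_all(1, g, [0, 0, 0], 3, found)
    print(len(found), satisfy_count(g, 3))   # 7 7
```

The shifts account for arguments skipped between a vertex and its child, each of which doubles the number of assignments; Python integers have unbounded precision, which sidesteps the constant-time arithmetic assumption mentioned in the text.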
The formula is applied only once for each vertex in the graph, and hence the total time complexity is O(|G|).
The data in Table II were measured with the best ordering we were able to find, which happened to be the first one we tried: first the 5 control inputs, then the carry input, and then an interleaving of the two data words from the least significant to the most. In this table, the number of gates is defined as the number of logic gates in the schematic diagrams for the two chips times the number of each chip used. The number of patterns equals the number of different input combinations. CPU time is expressed in minutes as measured on a Digital Equipment Corporation VAX 11/780 (a 1-MIP machine). The times given are for complete verification, i.e., to construct the functions from both the circuit and the behavioral descriptions and to establish their equivalence. The final column shows the size of the reduced graph for the A = B output. In all cases, this was the largest graph generated.
As can be seen, the time required to verify these circuits is quite reasonable, in part because the basic procedures are fast. Amortizing the time used for memory management, for the user interface, and for reducing the graphs, each call to the evaluation routines Apply-step and Compose-step requires around 3 ms. For example, in verifying the 64-bit ALU, these two procedures were called over $1.6 \times 10^6$ times. The total verification time grows as the square of the word size. This is as good as can be expected: both the number of gates and the sizes of the graphs being operated on grow linearly with the word size, and the total execution time grows as the product of these two factors. This quadratic growth is far superior to the exponential growth that would be required for exhaustive analysis. For example, suppose that at the time the universe first formed (about 20 billion years ago [16]) we started analyzing the 32-bit ALU exhaustively at a rate of one pattern every microsecond. By now we would be about half way through! For the 64-bit ALU, the advantage over exhaustive analysis is even greater.
These ALU circuits provide an interesting test case for evaluating different input orderings, because the successive bits of the function output word are functions of increasingly more variables. Fig. 11 shows how the sizes of these graphs depend on the ordering of circuit inputs.
for $n > i \ge 1$, and $f_1$ equals the desired function. Each of these functions can be represented by graphs with 6 vertices. It is unclear, however, how often such decompositions occur, how easy they are to find, and how they would affect the efficiency of the algorithms.
APPENDIX THE COMPLEXITY OF INTEGER MULTIPLICATION
In this appendix, we prove that the functions representing the outputs of an integer multiplier provide a difficult case for our representation, i.e., the graph sizes grow exponentially in the word size regardless of the ordering of the input variables. Given that there are (2n)! possible orderings of the input variables, we could not hope to derive this result experimentally, and hence we must provide a detailed proof.
Our proof is based on principles similar to those used in proving area-time lower bounds on multiplier circuits [18] , [19] . However, we must show not just that a large amount of information must be transferred from the set of inputs to the set of outputs in performing multiplication, but that certain individual outputs require high information transfer.
Consider a multiplier with inputs $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ corresponding to the binary encoding of integers a and b with $a_1$ and $b_1$ being the least significant bits. This circuit has 2n outputs corresponding to the binary encoding of the product $a \cdot b$, described by functions $mul_i(a_1, \ldots, a_n, b_1, \ldots, b_n)$ for $1 \le i \le 2n$. For a permutation $\pi$ of $\{1, \ldots, 2n\}$, let $G(i, \pi)$
$L = \{\pi(j) \mid n+1 \le j \le 2n,\ \pi(j) > n\}$.
That is, F represents those indexes of argument a occurring in the first half of the input sequence, while L represents those indexes of b (with n added to them) occurring in the second half. If t < n/2 then define F and L as
$F = \{\pi(j) \mid 1 \le j \le n,\ \pi(j) > n\}$
$L = \{\pi(j) \mid n+1 \le j \le 2n,\ \pi(j) \le n\}$.
That is, F represents those indexes of b (with n added to them) occurring in the first half while L represents those indexes of a occurring in the second. In either case, the sets F and L will each contain at least n/2 elements. We will consider the elements of F to be data inputs and those of L to be control. Since multiplication is commutative, we are free to choose which argument is considered the control input and which is considered the data in our proof. For 
