We show the advantage of Quaternary Decision Diagrams (QDDs) 
Introduction
Branching program machines for BDDs have been used in control applications [2, 5, 7, 6]. Fast response is especially important in control applications, which usually have hundreds of inputs. For such applications, a general-purpose microprocessor (MPU) cannot meet the speed requirements. A branching program machine can be several times faster than an MPU: an ordinary MPU requires two or three machine instructions to read and test one input variable, while the branching program machine requires just one instruction [3].
In this paper, we present a Quaternary Decision Diagram (QDD) to implement a branching program machine. Although the QDD machine requires longer instruction words than the BDD machine, the QDD machine is 1.3 to 2.0 times faster than the corresponding BDD machine. In the past, when the price of memory was high, 16-bit controllers were popular [13, 25]. Nowadays, the price of memory is lower, and a 32-bit or wider architecture is often used to increase the performance of controllers. So, in this paper, we show a method to increase the performance by increasing the number of bits in a word.
The rest of this paper is organized as follows: Section 2 introduces a method to represent multi-output logic functions by multi-valued decision diagrams. Section 3 introduces branching program machines, including both a 4-address QDD machine and a 3-address QDD machine. The 3-address QDD machine requires less memory than the 4-address QDD machine. Section 4 presents the code optimization problem for 3-address QDD machines. Section 5 shows the experimental results. Finally, Section 6 concludes the paper.
Representation of Multiple-Output Functions

Multi-Valued Decision Diagrams
An arbitrary n-variable logic function can be represented by a binary decision diagram (BDD). Evaluation of a BDD requires n table look-ups. Fig. 2.1 shows an example of an MTBDD (multi-terminal binary decision diagram). In this case, many outputs can be evaluated at the same time. To further speed up the evaluation, a multiple-valued decision diagram (MDD) is used. In the MDD(k), k variables are grouped to form a 2^k-valued super variable. To evaluate the MDD(k), we need at most n/k table look-ups [15, 19]. When the function is represented by an MDD(k), the evaluation of a logic function can be k times faster than with the corresponding BDD. Thus, a larger k yields a faster evaluation of the MDD(k). Unfortunately, the size of memory to represent a node for an MDD(k) is proportional to 2^k, as shown in Fig. 2.2. For many benchmark functions, the total size of the memory for an MDD(k) achieves its minimum when k = 2 [19]. Therefore, in logic evaluation, MDD(2)s are more suitable than BDDs. Since nodes in an MDD(2) have 4 branches, it is termed a Quaternary Decision Diagram (QDD). 
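The trade-off between look-up count and node size can be sketched numerically. The following Python fragment is only an illustration: the function names are hypothetical, the look-up bound is rounded up when k does not divide n, and a short final group is padded with zeros, which is an assumption of this sketch rather than something stated in the paper.

```python
import math

def lookups(n: int, k: int) -> int:
    """Upper bound on table look-ups for an n-input function in an MDD(k)."""
    return math.ceil(n / k)

def node_entries(k: int) -> int:
    """Number of branch entries per MDD(k) node (node memory grows as 2^k)."""
    return 2 ** k

def super_variables(bits, k):
    """Group binary inputs into k-bit super variables, first bit as MSB.

    A short final group is padded with zeros (assumption of this sketch).
    """
    padded = list(bits) + [0] * (-len(bits) % k)
    return [int("".join(str(b) for b in padded[i:i + k]), 2)
            for i in range(0, len(padded), k)]

if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, lookups(12, k), node_entries(k))
    print(super_variables([1, 0, 1, 1], 2))
```

For k = 2, each node has four entries and a 12-input function needs at most six look-ups, compared with twelve for a BDD.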
Optimization of MDDs
In an MDD(k), the evaluation of an n-variable logic function can be done by at most n/k table look-ups. So, the major problem is the minimization of the number of nodes. In general, it is not easy to obtain an MDD(k) with the minimum number of nodes. The following heuristic method is used to obtain near-minimal MDDs:
1. Minimize the number of nodes of the BDD by a heuristic method [21].
2. Partition the input variables to generate an MDD(k) [22].
Fig. 2.3 shows an example of a conversion from a BDD into an MDD(2). In the above MDDs, we assume each group of variables has the same size. Such MDDs are homogeneous MDDs. When the groups have different sizes, the MDD is a heterogeneous MDD. For simplicity, in this paper, we consider only homogeneous MDDs.
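Step 2 for k = 2 can be sketched as follows. This is a minimal illustration, not the partitioning method of [22]: it assumes an ordered BDD with 0-indexed variables, pairs variables (x0, x1), (x2, x3), and so on, and does not merge structurally identical QDD nodes. The dictionary layout and all names (`bdd_to_qdd`, `eval_qdd`, `AND4`) are hypothetical.

```python
from itertools import product

def bdd_to_qdd(bdd, root):
    """Convert an ordered BDD into a QDD by pairing consecutive variables.

    bdd maps a node name to (var, low, high); anything not in bdd is a terminal.
    Returns (qdd, qroot), where qdd maps a name to (pair, [c00, c01, c10, c11]).
    """
    qdd, memo = {}, {}

    def step(node, var, value):
        # Follow one binary edge, or stay put if the node does not test var.
        if node in bdd and bdd[node][0] == var:
            return bdd[node][1 + value]
        return node

    def build(node):
        if node not in bdd:
            return node                      # terminal value
        if node in memo:
            return memo[node]
        pair = bdd[node][0] // 2
        kids = [build(step(step(node, 2 * pair, a), 2 * pair + 1, b))
                for a in (0, 1) for b in (0, 1)]
        if len(set(kids)) == 1:
            result = kids[0]                 # redundant node: skip it
        else:
            result = f"Q_{node}"
            qdd[result] = (pair, kids)
        memo[node] = result
        return result

    return qdd, build(root)

def eval_qdd(qdd, node, bits):
    """Evaluate a QDD, reading two input bits per visited node."""
    while node in qdd:
        pair, kids = qdd[node]
        node = kids[2 * bits[2 * pair] + bits[2 * pair + 1]]
    return node

# Toy example (not from the paper): 4-input AND as a chain of four BDD nodes.
AND4 = {"n0": (0, 0, "n1"), "n1": (1, 0, "n2"),
        "n2": (2, 0, "n3"), "n3": (3, 0, 1)}
QDD, QROOT = bdd_to_qdd(AND4, "n0")
```

In this toy case the four BDD nodes collapse into two QDD nodes, so the longest evaluation path is halved.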
Branching Program Machine
Special machines to evaluate MDDs have been developed [8, 9, 10]. Unfortunately, they are unsuitable for practical applications. Here, we consider a machine whose architecture is well-suited for evaluating MDDs, but is easily programmed.
2-Address BDD Machine
A branching program for BDDs uses only two kinds of instructions:
    B_Branch (ADDR0, ADDR1), INDEX
    Output DATA, and GOTO ADDR.
The first one is the binary branch instruction, which is similar to the computed GOTO statement of the FORTRAN language: if the value of the variable specified by INDEX is equal to 0, then go to ADDR0; otherwise, go to ADDR1. The second one performs the output operation followed by an unconditional GOTO operation. The input selector in Fig. 3.1 produces the value of the variable x_i, which selects the next branch address. When x_i = 0, ADDR0 is selected. Otherwise, ADDR1 is selected. The selected address is then loaded into the program counter (PC). In this way, the next address is specified. To reduce the width of the instruction words, 1-address BDD machines, shown in Fig. 3.2, have been developed [2, 6, 25, 13]. In this case, when the value specified by INDEX is 1, the machine works similarly to the 2-address BDD machine. Otherwise, the content of the program counter (PC) is incremented by one to access the next address. The size of the instruction word is thus reduced, but unconditional GOTO instructions become necessary, as shown later.
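The difference between the two BDD machines can be illustrated with a small software model. The sketch below is an assumption-laden toy, not the hardware described in the paper: instructions are Python tuples, addresses are list indices, the interpreter stops at the first Output instead of jumping to ADDR for the next evaluation, and the 2-input AND program is a made-up example.

```python
def run_2addr(program, x, start=0):
    """Interpret a 2-address BDD program; return (DATA, instructions executed)."""
    pc, steps = start, 0
    while True:
        ins = program[pc]
        steps += 1
        if ins[0] == "B_Branch":          # ("B_Branch", ADDR0, ADDR1, INDEX)
            _, addr0, addr1, index = ins
            pc = addr1 if x[index] else addr0
        else:                             # ("Output", DATA, ADDR)
            return ins[1], steps

def run_1addr(program, x, start=0):
    """Interpret a 1-address BDD program; a 0 input falls through to PC + 1."""
    pc, steps = start, 0
    while True:
        ins = program[pc]
        steps += 1
        if ins[0] == "B_Branch":          # ("B_Branch", ADDR1, INDEX)
            pc = ins[1] if x[ins[2]] else pc + 1
        elif ins[0] == "GOTO":            # ("GOTO", ADDR)
            pc = ins[1]
        else:                             # ("Output", DATA, ADDR)
            return ins[1], steps

# Toy function f = x0 AND x1 (hypothetical example).
P2 = [("B_Branch", 2, 1, 0),   # 0: x0 = 0 -> 2, x0 = 1 -> 1
      ("B_Branch", 2, 3, 1),   # 1: x1 = 0 -> 2, x1 = 1 -> 3
      ("Output", 0, 0),        # 2
      ("Output", 1, 0)]        # 3

P1 = [("B_Branch", 2, 0),      # 0: x0 = 1 -> 2, else fall through to 1
      ("Output", 0, 0),        # 1
      ("B_Branch", 4, 1),      # 2: x1 = 1 -> 4, else fall through to 3
      ("GOTO", 1),             # 3: extra unconditional GOTO to reach Output 0
      ("Output", 1, 0)]        # 4
```

The 1-address program needs one extra GOTO because the instruction for output 0 cannot directly follow both branch instructions.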
In this example, DATA in Output DATA is the decimal equivalent of the function output values expressed in binary as f_3, f_2, f_1, f_0. (End of Example)
4-Address QDD Machine
By evaluating two binary variables and by increasing the number of branch addresses to four, we have a branch instruction for a 4-address QDD machine. Since it evaluates two binary variables at a time, it can reduce the evaluation time to half that of the 2-address BDD machine.
A branching program for 4-address QDD machines consists of two kinds of instructions: a branch instruction and an output instruction. The first field of the branch instruction specifies the branch command. The second field, INDEX, specifies the index i of the input variable X_i. It determines which variables to select. In the case of a QDD, two consecutive binary variables are selected at a time. The input selector shown in Fig. 3.3 produces X_i. The upper multiplexer selects the variable. When X_i = (0, 0), ADDR0 is selected; when X_i = (0, 1), ADDR1 is selected; when X_i = (1, 0), ADDR2 is selected; and when X_i = (1, 1), ADDR3 is selected. The selected address is then loaded into the program counter (PC). In this way, the next address is specified as a function of INDEX i and the input variable X_i. Note that this instruction requires a rather long word, which would be expensive for embedded applications. Fig. 3.5 shows the format for the output instruction. The left field specifies the instruction type: Output. The middle field contains the address to which the program should jump. The right field is the output value, as shown at the bottom of the QDD.
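The address selection described above can be modelled in a few lines. This is a hedged sketch only: the tuple encoding, the symbolic addresses, the mapping of X_i to input bits 2i and 2i+1 (0-indexed), and the 4-input AND program are all assumptions made for illustration.

```python
from itertools import product

def run_qdd4(program, bits, start="N0"):
    """Interpret a 4-address QDD program; return (DATA, instructions executed)."""
    pc, steps = start, 0
    while True:
        ins = program[pc]
        steps += 1
        if ins[0] == "Q_Branch":          # ("Q_Branch", (ADDR0..ADDR3), INDEX)
            _, addrs, index = ins
            hi, lo = bits[2 * index], bits[2 * index + 1]
            pc = addrs[2 * hi + lo]       # (0,0)->ADDR0 ... (1,1)->ADDR3
        else:                             # ("Output", DATA, ADDR)
            return ins[1], steps

# Toy program (hypothetical): 4-input AND, two input bits tested per branch.
AND4_QDD = {
    "N0": ("Q_Branch", ("T0", "T0", "T0", "N1"), 0),
    "N1": ("Q_Branch", ("T0", "T0", "T0", "T1"), 1),
    "T0": ("Output", 0, "N0"),
    "T1": ("Output", 1, "N0"),
}

if __name__ == "__main__":
    for b in product((0, 1), repeat=4):
        print(b, run_qdd4(AND4_QDD, b))
```

Every input vector is resolved in at most three instructions here, whereas a BDD program for the same function would execute up to five.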
3-Address QDD Machine
Since the 4-address QDD instruction requires a long word, we developed a 3-address QDD machine. The branch instruction for the 3-address QDD machine contains only three address fields. For example, consider the instruction shown in Fig. 3.6. This instruction is symbolically denoted by Q_Branch (+1, ADDR1, ADDR2, ADDR3), INDEX. In this instruction, ADDR1, ADDR2, and ADDR3 are specified, but ADDR0 is missing. ADDR0 is replaced by "+1", which denotes the next address of the current instruction. This instruction performs the following operations:
• Let i be the value specified by INDEX. If (i = 0), then go to the next address of the current instruction; else go to ADDRi.
Note that the last instruction is an unconditional GOTO statement. As shown in the next section, the number of unconditional GOTO statements can be minimized by an optimization algorithm. Fig. 3.7 shows the architecture of the 3-address QDD machine, where only the circuit for branching operations is shown. Consider the instruction in Fig. 3.6. When the value specified by INDEX and the input variables is non-zero, the machine works similarly to the 4-address QDD machine. When the value specified by INDEX and the input variables is equal to 0, the content of the program counter (PC) is incremented by one to access the next address.
In the real system, we use the four types of branch instructions shown in Fig. 3.8. To distinguish the four branch instructions, we use two additional bits in the instruction field. However, as shown in the experimental results, by using four branch instructions, we can reduce the number of instructions and the total bit size. So, the cost of these extra bits is fully compensated.
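The fall-through behaviour can be sketched as a small interpreter. As before, this is an illustrative model under stated assumptions: numeric list addresses, tuple-encoded instructions, only the first branch type (fall through on value 0), and a hypothetical 4-input AND program chosen so that one unconditional GOTO is unavoidable in this layout.

```python
from itertools import product

def run_qdd3(program, bits, start=0):
    """Interpret a 3-address QDD program; return (DATA, instructions executed)."""
    pc, steps = start, 0
    while True:
        ins = program[pc]
        steps += 1
        if ins[0] == "Q_Branch":          # ("Q_Branch", (ADDR1, ADDR2, ADDR3), INDEX)
            _, addrs, index = ins
            value = 2 * bits[2 * index] + bits[2 * index + 1]
            # Value 0 plays the role of "+1"; otherwise jump to ADDR<value>.
            pc = pc + 1 if value == 0 else addrs[value - 1]
        elif ins[0] == "GOTO":            # ("GOTO", ADDR)
            pc = ins[1]
        else:                             # ("Output", DATA, ADDR)
            return ins[1], steps

# Toy program (hypothetical): 4-input AND.
AND4_QDD3 = [
    ("Q_Branch", (1, 1, 2), 0),   # 0: value 0 falls through to Output 0
    ("Output", 0, 0),             # 1
    ("Q_Branch", (1, 1, 4), 1),   # 2: value 0 falls through to the GOTO
    ("GOTO", 1),                  # 3: unconditional GOTO back to Output 0
    ("Output", 1, 0),             # 4
]

if __name__ == "__main__":
    for b in product((0, 1), repeat=4):
        print(b, run_qdd3(AND4_QDD3, b))
```

Compared with a 4-address encoding of the same function, the program has one more instruction but each branch word carries one fewer address field.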
Optimization of Codes for QDD Machines
In this section, we consider a method to reduce the number of instructions for QDD machines. 
(Proof) In a 3-address QDD machine, a non-terminal node is represented by either a branch instruction or a pair consisting of a branch instruction and an unconditional GOTO statement. Also, a terminal node is represented by an output instruction. Thus, the number of unconditional GOTO statements is at most the number of non-terminal nodes.
(Q.E.D.)
In the case of a 4-address QDD machine, there is no code optimization problem, i.e., the instructions can be generated in any order. However, in the case of a 3-address QDD machine, the length of the program depends on the order of instructions.
    (+1,T3,T3,T3),X2
    GOTO N3
    N2:Q_Branch(+1,T1,T1,T1),X3
    GOTO T0
    N3:Q_Branch(+1,T2,T2,T2),X3
    GOTO T1
    T0:Output 0, and GOTO N0
    T1:Output 1, and GOTO N0
    T2:Output 2, and GOTO N0
    T3:Output 3, and GOTO N0
Note that the above program has four unconditional GOTO statements that are not part of output statements. However, when the code is generated in depth-first order, it has no unconditional GOTO statements that are not part of output statements:
    /** Code without Unconditional GOTO **/
    N0:Q_Branch(+1,N1,N1,N1),X1
    Q_Branch(+1,N3,N3,N3),X2
    Q_Branch(+1,T1,T1,T1),X3
    T0:Output 0, and GOTO N0
    N1:Q_Branch(+1,T3,T3,T3),X2
    N3:Q_Branch(+1,T2,T2,T2),X3
    T1:Output 1, and GOTO N0
    T2:Output 2, and GOTO N0
    T3:Output 3, and GOTO N0
Note that the first four instructions correspond to the leftmost path from the root node to the terminal node T0.
The next three instructions correspond to the path from the node N1, through the node N3, to the terminal node T1. (End of Example)
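The effect of instruction order on the number of unconditional GOTO statements can be reproduced with a short script. The sketch below is not the optimization algorithm of this paper. It reads the graph off the GOTO-free listing above, gives the two unlabeled branch instructions the hypothetical labels A and B, and uses a simple greedy layout that follows 0-edges; the function names and the "naive" comparison order are assumptions as well.

```python
from collections import deque

# Graph read off the GOTO-free listing; "A" and "B" are hypothetical labels.
QDD = {
    "N0": ("X1", ["A", "N1", "N1", "N1"]),
    "A":  ("X2", ["B", "N3", "N3", "N3"]),
    "B":  ("X3", ["T0", "T1", "T1", "T1"]),
    "N1": ("X2", ["N3", "T3", "T3", "T3"]),
    "N3": ("X3", ["T1", "T2", "T2", "T2"]),
}

def emit(qdd, order):
    """Emit 3-address code for nodes laid out in the given order."""
    code = []
    for pos, node in enumerate(order):
        if node not in qdd:               # terminal "Tn" outputs the value n
            code.append(f"{node}:Output {node[1:]}, and GOTO {order[0]}")
            continue
        var, kids = qdd[node]
        code.append(f"{node}:Q_Branch(+1,{kids[1]},{kids[2]},{kids[3]}),{var}")
        nxt = order[pos + 1] if pos + 1 < len(order) else None
        if nxt != kids[0]:                # the 0-child is not at PC + 1
            code.append(f"GOTO {kids[0]}")
    return code

def zero_chain_order(qdd, root):
    """Greedy layout: follow 0-edges as far as possible, then start a new chain."""
    placed, order, pending = set(), [], deque([root])
    while pending:
        node = pending.popleft()
        while node not in placed:
            placed.add(node)
            order.append(node)
            if node not in qdd:           # a terminal ends the chain
                break
            _, kids = qdd[node]
            pending.extend(kids[1:])      # non-zero children start later chains
            node = kids[0]
    return order

def count_gotos(code):
    return sum(line.startswith("GOTO") for line in code)

NAIVE = ["N0", "N1", "A", "B", "N3", "T0", "T1", "T2", "T3"]
naive_code = emit(QDD, NAIVE)
chain_code = emit(QDD, zero_chain_order(QDD, "N0"))

if __name__ == "__main__":
    print(count_gotos(naive_code), len(naive_code))
    print("\n".join(chain_code))
```

For this graph the greedy layout needs no unconditional GOTO, while the naive order needs four. The greedy rule is only a heuristic and is not guaranteed to be optimal for graphs with more sharing.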
The code optimization problem for a 3-address QDD machine can be reduced to a graph covering problem as follows: 
Experiment and Observation
Benchmark Results
To see the effectiveness of QDDs over BDDs, and the effectiveness of the code optimization, we realized certain benchmark functions by BDDs and QDDs. First, we compare QDDs and BDDs with respect to the number of nodes. Then, we convert these into code for BDD and QDD machines, and compare QDDs and BDDs with respect to the number of instructions. Table 5.1 shows the experimental results. The columns are as follows:
• Func. name denotes the name of the benchmark function.
• # Inp. denotes the number of input variables.
• # Out. denotes the number of outputs.
• BDD Nodes denotes the number of nodes of the MTBDD, including both terminal and non-terminal nodes.
• Opt. Codes under BDD denotes the number of instructions of the optimized code for the 1-address BDD machine (near-optimal solution).
• Term. Nodes denotes the number of terminal nodes.
• Aver. Inst. under BDD denotes the average number of instructions to evaluate an input vector by a 1-address BDD machine.
• QDD Nodes denotes the number of nodes of the MTQDD, including both terminal and non-terminal nodes; this is the same as the number of instructions for a 4-address QDD machine.
• X=00 Codes under QDD denotes the number of instructions in the code for the 3-address QDD machine when only the first type of instruction in Fig. 3.8 is used.
• Opt. Codes under QDD denotes the number of instructions of the optimized code for the 3-address QDD machine when all four types of instructions in Fig. 3.8 are used to minimize the number of GOTO statements.
• X=00 GOTO denotes the number of GOTO statements when only one type of branch instruction is used.
• Opt. GOTO = (Opt. Codes - QDD Nodes) under QDD denotes the number of GOTO statements when four types of branch instructions are used.
• Aver. Inst. under QDD denotes the average number of instructions to evaluate an input vector by a 3-address QDD machine.
• Ratio denotes the value (Aver. Inst. in 1-address BDD machine)/(Aver. Inst. in 3-address QDD machine).
Detail of the Experiment
Optimization of Decision Diagrams: First, the ordering that minimizes the size of the MTBDD is obtained. Then, the input variables are partitioned into groups of two variables in the natural order to obtain the MTQDDs.
Optimization of Codes: Theorem 4.1 shows how to minimize the number of GOTO statements. The algorithm given in [11] is only applicable to programs with nodes whose in-degrees and out-degrees are both two. So, we developed our own algorithm to obtain near-optimal solutions for our more general case.
Observations
From the table, we can observe the following:
• The number of nodes in QDDs is smaller than that of BDDs.
• The number of instructions for the 3-address QDD machine can be considerably reduced by an optimization algorithm.
• For C432, in3, misex2, misj, and risc, the number of GOTO statements in the optimized QDD codes is zero. This means that optimal code is generated for these functions. Also, for these functions, optimal code for BDD machines is generated.
• signet requires many GOTO statements in both BDD and QDD machines. The number of GOTO statements for a BDD machine is given by (Opt. Codes)-(BDD Nodes)=8671-7347=1324.
• Opt. Codes, the number of instructions for a 3-address QDD machine, is often larger than QDD Nodes, the number of instructions for a 4-address QDD machine. The column headed by Opt. GOTO (= Opt. Codes - QDD Nodes) shows the extra GOTOs. Except for a few functions, the number of extra GOTOs is rather small.
• Consider the value: (Sum of X=00 Codes)-(Sum of Optimal Codes)=28535-24528=4007. This shows the total number of instructions saved by using four types of branch instructions instead of only one type. However, to specify four types of instructions, we need two additional 
