Introduction.
Field programmable gate arrays (FPGA's) are very useful in rapid prototyping as well as in small-volume production [3]. A look-up table (LUT) type FPGA, shown in Fig. 1.1, consists of LUT's and programmable interconnections. Both the LUT's and the programmable interconnections are controlled by static RAM's. We assume that each FPGA has q LUT's, and that each LUT can realize an arbitrary logic function of k binary variables. We also assume that the interconnection resources are sufficient, i.e., any logic network with q LUT's can be realized. Note that wiring is implemented by pass transistors, and the interconnection delay will often be larger than the LUT delay.
Consider the realization of a moderately complex function. If k is small, e.g., k = 2 or 3, the LUT's are efficiently used, but the interconnections will be complex. This is especially inefficient since interconnections are often more expensive than logic. If k is large, e.g., k = 7 or 8, the interconnections will be simpler, but the LUT's are not efficiently used. Thus, there exists an optimum value for k between 3 and 7. One study [20] shows that when k = 3 or 4, FPGA's require the smallest chip area. However, these results have not been used by manufacturers.
Commercially available FPGA's contain many LUT's, but their interconnections are much slower than those of standard mask-type gate arrays. Thus, it is often more important to reduce the propagation delay than to reduce the number of LUT's [5, 16]. To reduce the propagation delay, we have to reduce the number of levels in the network. Also, if the interconnections and layout are regular, then the propagation delay will be smaller.
In this paper, we propose a design method for LUT-based FPGA's that produces LUT networks having the structure shown in Fig. 1.2. We assume that each LUT has 6 binary inputs. By pairing the variables of two-valued input functions, we obtain four-valued input functions. By using four-valued logic, we can reduce the number of variables to consider, and can design a network having a regular structure. We use the pseudo-Kronecker expansion for four-valued input two-valued output functions to design compact and regular LUT networks. The proposed method is also promising for four-valued LUT-based FPGA's [26, 14, 31].
In a similar way, we can expand the new sub-functions as follows: [...] Fig. 2.2 is the binary decision tree corresponding to the above expansion. If we replace each node by a two-input multiplexer (Fig. 2.1(a)), we have the network shown in Fig. 2 [...]
The relation among BDD's, FDD's, KDD's and PKDD's is shown in Fig. 2.7. This shows that KDD's are a special case of PKDD's, and that BDD's and FDD's are each a special case of KDD's. It follows that when we realize a given function by 3-input LUT's, a method based on a PKDD requires the fewest LUT's. Fig. 2.8 is a realization of the function in Table 2.1 by 3-input LUT's. In this case, we used only the Shannon expansion for each LUT. However, in general, we can use any one of the three expansions to reduce the number of LUT's.
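The freedom to choose among three expansions at each node can be checked numerically. The following is a minimal sketch, assuming the three expansions are the Shannon, positive Davio, and negative Davio expansions (the usual choices behind BDD's, FDD's and KDD's). The example function `majority` and the helper `expansions_agree` are hypothetical names introduced only for illustration.

```python
from itertools import product


def majority(x1, x2, x3):
    # Hypothetical example function; it is not taken from Table 2.1.
    return (x1 & x2) | (x2 & x3) | (x1 & x3)


def expansions_agree(f, n, i):
    """Check that three expansions about variable i all reproduce f.

    Assumed forms, with f0 = f(x_i = 0), f1 = f(x_i = 1), f2 = f0 XOR f1:
      Shannon:        f = x_i' f0  OR  x_i f1
      positive Davio: f = f0  XOR  x_i f2
      negative Davio: f = f1  XOR  x_i' f2
    """
    for x in product((0, 1), repeat=n):
        f0 = f(*(x[:i] + (0,) + x[i + 1:]))
        f1 = f(*(x[:i] + (1,) + x[i + 1:]))
        f2 = f0 ^ f1
        xi = x[i]
        shannon = ((1 - xi) & f0) | (xi & f1)
        pos_davio = f0 ^ (xi & f2)
        neg_davio = f1 ^ ((1 - xi) & f2)
        if not (f(*x) == shannon == pos_davio == neg_davio):
            return False
    return True


all_ok = all(expansions_agree(majority, 3, i) for i in range(3))
```

Since each expansion needs two of the three sub-functions f0, f1, f2 together with the decision variable, each node fits a 3-input LUT, and the choice can be made per node according to which pair of sub-functions is simpler.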
Definitions and Basic Properties
We assume that functions to be realized have n = 2r inputs. Let X = (x1, x2, ..., xn) be the input variables. An arbitrary logic function of n variables f(X) can be converted into a four-valued input two-valued output function [...]
Example 3.1 The logic function shown in Table 2.1 can be converted into the four-valued input two-valued output function in Table 2.2, where X1 = (x1, x2) and [...]
From now on, a four-valued input two-valued output function will be simply called a function, and an ordinary two-valued input function will be called a two-valued input function.
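The conversion by pairing variables can be sketched as follows. This is only an illustration under stated assumptions: consecutive binary variables are paired, and the pair (x_{2i-1}, x_{2i}) is encoded as the four-valued digit 2*x_{2i-1} + x_{2i}. The helper `pair_variables` and the example function `f` are hypothetical and do not reproduce Table 2.1.

```python
from itertools import product


def pair_variables(f, r):
    """Wrap a two-valued input function of n = 2r variables as a
    four-valued input two-valued output function of r variables.

    Assumed encoding: X_i in {0, 1, 2, 3} stands for the bit pair
    (X_i >> 1, X_i & 1).
    """
    def g(*X):
        if len(X) != r:
            raise ValueError("expected %d four-valued inputs" % r)
        bits = []
        for value in X:
            bits.extend((value >> 1, value & 1))
        return f(*bits)
    return g


def f(x1, x2, x3, x4):
    # Hypothetical 4-variable example function.
    return (x1 ^ x2) & (x3 | x4)


g = pair_variables(f, 2)
# Truth table of the converted function over (X1, X2), 16 rows.
table = {X: g(*X) for X in product(range(4), repeat=2)}
```

The converted function has half as many variables, each taking four values, which is what allows a 6-input LUT to be treated as a node with three four-valued inputs.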
Design of FPGA's Using the Shannon Expansion
First, we will show a naive method to design LUT networks, where each LUT has six inputs. We can realize a function f by using the Shannon expansion. It is convenient to view the two binary control inputs as a single four-valued input. In this example, the control input of the output LUT is driven by X1, while X2 drives the control inputs of the other four LUT's. Fig. 4.1 shows that some combination of X3 and X4 drives the primary inputs of these four LUT's.
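The two-level structure described above can be mimicked in software. The sketch below assumes that each 6-input LUT acts as a 4-to-1 multiplexer (two binary control inputs viewed as one four-valued input, plus four data inputs) and that the data inputs of the second level are the sub-functions of X3 and X4. The names `mux4`, `shannon_network` and the example function `f` are hypothetical.

```python
from itertools import product


def mux4(control, data):
    """A 6-input LUT used as a 4-to-1 multiplexer.

    `control` is the four-valued input (two binary control lines);
    `data` holds the four binary data inputs.
    """
    return data[control]


def shannon_network(f, X):
    """Evaluate f(X1, X2, X3, X4) through a two-level multiplexer network."""
    X1, X2, X3, X4 = X
    # Four LUT's controlled by X2; their data inputs are the sub-functions
    # f(a1, a2, X3, X4), each of which depends only on X3 and X4.
    second_level = [
        mux4(X2, [f(a1, a2, X3, X4) for a2 in range(4)])
        for a1 in range(4)
    ]
    # Output LUT controlled by X1.
    return mux4(X1, second_level)


def f(X1, X2, X3, X4):
    # Hypothetical four-valued input, two-valued output function.
    return int((X1 + X2 * X3 + X4) % 3 == 0)


matches = all(
    shannon_network(f, X) == f(*X) for X in product(range(4), repeat=4)
)
```

The check only confirms that the multiplexer tree computes the same values as f; it does not count the LUT's needed for the sub-functions of X3 and X4.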
The following observations will reduce the number of LUT's: If two sub-functions are the same, realize only one sub-function. If a sub-function involves 6 or fewer variables, stop (it is realized by a single LUT). The diagram corresponding to the LUT network is called a QDD (Quaternary Decision Diagram). We can design an LUT network using an algorithm that is similar to that used for Binary Decision Diagrams (BDD's).
Let $p$ be the number of distinct functions of the form $f(a_1, a_2, \ldots, a_{r-3}, X_{r-2}, X_{r-1}, X_r)$, where $a_i \in Q$, $i = 1, 2, \ldots, r-3$. Then, $f$ can be realized with at most $\frac{4^{r-3} - 1}{3} + p$ LUT's.
(Proof) $f$ can be expanded as
$$f(X_1, X_2, \ldots, X_r) = \bigvee_{(a_1, \ldots, a_{r-3})} X_1^{a_1} X_2^{a_2} \cdots X_{r-3}^{a_{r-3}} \, f(a_1, a_2, \ldots, a_{r-3}, X_{r-2}, X_{r-1}, X_r),$$
where $a_i \in Q$, $i = 1, 2, \ldots, r-3$.
We can realize a selector circuit that selects one out of $4^{r-3}$ inputs by using $\frac{4^{r-3} - 1}{3}$ LUT's. Because there are $p$ distinct functions of the form $f(a_1, a_2, \ldots, a_{r-3}, X_{r-2}, X_{r-1}, X_r)$, we need to realize only $p$ such functions.
Q.E.D.
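The bound can be evaluated for a concrete function. The sketch below reads the bound as (4^(r-3) - 1)/3 + p, where the first term counts the 4-to-1 selectors in a tree with 4^(r-3) leaves (1 + 4 + ... + 4^(r-4)), and assumes Q = {0, 1, 2, 3}. The helper `lut_bound` and the parity example are hypothetical.

```python
from itertools import product

Q = range(4)  # assumed value set of each four-valued variable


def lut_bound(f, r):
    """Return (p, bound) for a function f of r >= 3 four-valued variables.

    p is the number of distinct sub-functions obtained by fixing the
    first r - 3 variables; bound = (4**(r-3) - 1) // 3 + p.
    """
    distinct = set()
    for a in product(Q, repeat=r - 3):
        # Truth table of f(a, X_{r-2}, X_{r-1}, X_r), 64 entries.
        sub_table = tuple(f(*a, *tail) for tail in product(Q, repeat=3))
        distinct.add(sub_table)
    p = len(distinct)
    selector_luts = (4 ** (r - 3) - 1) // 3  # 4**k - 1 is divisible by 3
    return p, selector_luts + p


def parity(*X):
    # Hypothetical example: parity of the sum of the four-valued inputs.
    return sum(X) % 2


p, bound = lut_bound(parity, 5)
```

For this example only two distinct sub-functions occur (the parity of the last three variables and its complement), so p = 2 and the bound is 5 + 2 = 7 LUT's.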
A Design Method Using the Pseudo-Kronecker Expansion.
The design method using the Shannon expansion is simple, but requires many LUT's. This is because each LUT is used only as a multiplexer, although it can realize any of $2^{64}$ different functions. To reduce the number of LUT's, we can use the theory of pseudo-Kronecker expansions, which was developed for the optimization [...] different expansions, of which the Shannon expansion is one. This leads to the question of which expansion to use: at each node, we can select an expansion that reduces the complexity of the network.
4) If v is a nonterminal node with index(v)
where [...] is a characteristic vector for Si (i = 0, 1, 2, 3). Xi is called the decision variable for node v.
It is very difficult to obtain an exact minimum PKDD, so we will consider a heuristic method which obtains a good solution quickly. In this case, we have to consider the following 15 sub-functions: [...], cost = 2; [...], cost = 1; [...], cost = 1; [...], cost = 1; [...], cost = 0.
These results are consistent with the relation of decision diagrams in Fig. 2.7. The bottom row of Table 7.1 shows the relative sizes of the diagrams. On the average, PKDD's require 29% fewer nodes than BDD's. However, the numbers of nodes for BDD's and FDD's are comparable.
For 6-input LUT's (4-valued case).
The bit pairing algorithm [22] for PLA's with decoders was used to produce four-valued input two-valued output functions. Table 7.1 also compares the numbers of non-terminal nodes for QDD's (Quaternary Decision Diagrams) and 4-valued PKDD's. QDD's were obtained by the 4-valued extension of the Shannon expansion in (3.1). Details of the optimization algorithm are shown in the Appendix. On the average, QDD's require 37% fewer nodes than BDD's, and 4-valued PKDD's require 51% fewer nodes than BDD's. Also, 4-valued PKDD's require 23% fewer nodes than QDD's.
Conclusions and Comments
In FPGA design, interconnections are often more expensive than logic. FPGA's using 3-input LUT's require many logic levels and complex interconnections. On the other hand, FPGA's using 6-input LUT's require fewer interconnections and fewer logic levels.
In this paper, we showed a method to represent logic functions by using pseudo-Kronecker decision diagrams (PKDD's). Experimental results show that 2-valued PKDD's require 29% fewer nodes than BDD's, and 4-valued PKDD's require 23% fewer nodes than QDD's, the 4-valued extension of BDD's. Thus, this method is useful for the design of FPGA's with 6-input LUT's. However, when LUT's have fewer than 6 inputs, this method is not applicable.
In the practical design of FPGA's, we can further reduce the number of LUT's as follows:
1) Use complement edges [15, 7].
Consider decision diagrams in which each edge can complement the sub-function. This corresponds to the fact that LUT's can complement their inputs at no extra cost. In the case of 2-valued BDD's, complement edges can reduce the number of nodes by 30 to 50% [15].
2) Remove redundant LUT's.
For a 3-input LUT, if both of the two sub-functions are constants, then it realizes a variable or its complement. Thus, this LUT can be removed, and the variable is used as the output. If a function depends on at most three variables, then it can be realized with one LUT. For a 6-input LUT, if the function depends on at most six variables, then it can be realized with one LUT.
3) Merge several LUT's into one.
For 6-input LUT's, if any of the sub-functions is a constant, then the function is realized by an LUT with 5 or fewer inputs. In AT&T ORCA series FPGA's, an LUT can be configured to realize either a 6-input function, or a pair of 5-input functions, or four 4-input functions. Thus, LUT's with fewer inputs can be merged. Except for the 6-input case, some of the inputs must be shared among the functions.
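The checks in items 2) and 3) both rest on knowing which variables a function actually depends on. A minimal sketch of that test is given below, assuming functions are given as Python callables on binary inputs. The helpers `support` and `fits_one_lut` are hypothetical names, and the exhaustive search is only practical for small n.

```python
from itertools import product


def support(f, n):
    """Return the indices of the binary variables that f depends on."""
    deps = []
    for i in range(n):
        for x in product((0, 1), repeat=n):
            if x[i] == 0:
                flipped = x[:i] + (1,) + x[i + 1:]
                # f depends on x_i if flipping x_i changes the output
                # for at least one assignment of the other variables.
                if f(*x) != f(*flipped):
                    deps.append(i)
                    break
    return deps


def fits_one_lut(f, n, k):
    """True if f depends on at most k variables, so one k-input LUT suffices."""
    return len(support(f, n)) <= k


def example(a, b, c, d):
    # Hypothetical function that ignores its second input.
    return (a & c) ^ d


deps = support(example, 4)
```

Here `example` depends on three of its four inputs, so it fits a single 3-input LUT. A function whose support has 5 or fewer variables is likewise a candidate for the merging described in item 3).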
We assumed that the FPGA's are all binary, but the control inputs of the LUT's can be four-valued inputs instead of pairs of binary variables. This will significantly reduce the number of connections. Thus, this method is also useful for the design of multiple-valued LUT type FPGA's.
[9] shows a method to realize a logic function by using multiple-valued multiplexers, where only circuits with tree type structure are generated. 
