This paper presents an approach to logic mapping into LUT-Array-Based PLDs, where Boolean functions in the form of a sum of generalized complex terms (SGCT) can be mapped directly. While the previous mapping approach requires a predetermined variable ordering, our approach performs mapping and variable reordering simultaneously. For this purpose, we propose a directed acyclic graph based on the multiple-valued decision diagram (MDD) and an algorithm to construct the graph. Our algorithm generates candidate SGCT expressions for each node in a bottom-up manner and selects the variables for the current level by evaluating the sizes of the SGCT expressions directly. Experimental results show that our approach reduces the number of terms by up to 71 percent for the MCNC benchmark circuits.
Introduction
Programmable Logic Devices (PLDs) are now widely used for prototyping, scientific computation, education, and even consumer products, since designers can program (in other words, configure) functions on the device by themselves and thus achieve short-term and low-cost design. Driven by advances in technology, the performance, functionality, and flexibility of PLDs have been significantly extended, and PLDs are drawing attention as a key device for opening up a new vista of computer systems called reconfigurable computing.
Plastic Cell Architecture (PCA) [2], [3] has been proposed as such an evolution of PLDs, with the following unique characteristics: an array of homogeneous, relocatable, and expandable cells; unification of logic and memory; asynchronous cooperation of circuits; and dynamic self-reconfiguration. To realize these characteristics, an architecture consisting of two layers, as exemplified in Fig. 1, is proposed. The layers are referred to as the plastic part and the built-in part, where circuits and connections are implemented, respectively. In this paper, a design methodology to map a given Boolean function into the plastic part of the PCA is presented.
A look-up table (LUT) is one of the most popular primitive devices for implementing combinational logic in reconfigurable architectures. An n-input m-output LUT can be programmed to work as any Boolean function with n input variables and m output values. The plastic part of the PCA is composed of a simple and regular array of LUTs. The target devices of this paper are such LUT-Array-Based PLDs, which can realize a target Boolean function in the form of a sum of terms generated by series of cascaded LUTs (Fig. 2); such terms are more complex than the AND terms in Programmable Logic Arrays (PLAs). A term which can be generated by a series of cascaded LUTs is referred to as a generalized complex term (GCT). A Boolean function in the form of a sum of generalized complex terms (SGCT) can be mapped into PCA devices, either directly in the case of PCA-Chip2 [4] or as partial logic [5], [6] in the case of PCA-1/PCA-2 [7], [8]. The number of terms in an SGCT, referred to as the size of the SGCT, corresponds to the amount of resources used in the device. Our objective is therefore to generate an SGCT of a given Boolean function while minimizing its size. Note that our architecture realizes the given logic by cascades of single-output LUTs, whereas a similar reconfigurable architecture is found in [9], [10], where the given logic is realized by a cascade of multiple-output LUTs, which assumes richer hardware resources than ours. In order to handle Boolean functions efficiently, several decision diagrams [15], [16] have been proposed. Since the efficiency of computations using decision diagrams depends on the number of nodes in the diagram (referred to as the size of the diagram, hereinafter) and the size depends on the variable ordering, variable (re)ordering to minimize the size is a key issue for decision diagrams [12], [14].
There have been previous works on minimizing the size of the SGCT of a given Boolean function, such as [4]-[6], where mapping algorithms for a given variable ordering are presented. However, the number of terms in the SGCT also depends on the input variable ordering. Thus, an approach to decide the variable ordering has been proposed [17]. It tries to reduce the number of GCTs by reordering variables as a preprocess of the mapping, seeking a clue in the BDD expression of the given function; that is, the size of the BDD is evaluated instead of the size of the SGCT. However, such an indirect evaluation does not necessarily lead to the minimization of SGCTs. The experiments in [17] reveal that the size of the SGCT is seldom reduced and that the amount of the reduction is small. Therefore, we propose a variable reordering approach which directly evaluates the number of GCTs and is integrated with the SGCT generation process. For this purpose, we introduce a directed acyclic graph based on the MDD in which an SGCT is associated with each node. Our approach constructs candidate SGCTs for nodes in a bottom-up manner and selects the variables for the current level by evaluating the sizes of the SGCTs directly. Experimental results show that our integrated approach reduces the number of terms by up to 71 percent for the MCNC benchmark circuits.
The rest of the paper is organized as follows. Section 2 describes the decision diagrams and the SGCT. Our algorithms and the variable reordering applied to them are described in Sects. 3 and 4. The performance of our algorithms is demonstrated by experiments in Sect. 5. Finally, Sect. 6 concludes the paper.
Preliminaries

Decision Diagrams
A binary decision diagram (BDD) [11] is an acyclic directed graph which represents a Boolean function $f(x_1, x_2, \ldots, x_n) : \{0, 1\}^n \to \{0, 1\}$. A BDD has two types of nodes, constant nodes and variable nodes. There are two constant nodes in a BDD, labeled with the constant values 0 and 1. Each variable node is labeled with a variable $x_i$ and has two output arcs, one labeled with the value 0 and the other with 1, implying that the given value of the variable equals the value associated with the arc. Each node represents a Boolean function whose value is determined by traversing, from that node, the arcs corresponding to the values of the variables associated with the visited nodes. Each variable appears at most once, and in the same order, in any traversal. A fixed order assures the uniqueness of the BDD representation of a function. The node representing a function is referred to as the initial or root node of the function. Multiple-output functions can be represented by placing an initial node for each output. The size of a BDD is defined as the number of nodes in the BDD. Several approaches for efficient representation of the BDD have been proposed, such as sharing equivalent subgraphs among multiple functions and using output inverters which invert the value [13]. The level of a variable $x_i$ in a BDD is defined by the reverse order of the traversal. The level of a variable node $v$ is defined as the level of the associated variable.
A multiple-valued decision diagram (MDD) is an extension of the BDD and is defined similarly, except that each input variable can take one of multiple values; that is, an MDD represents a function $f(x_1, x_2, \ldots, x_n) : \{0, 1, \ldots, r\}^n \to \{0, 1\}$. A literal of a variable is the variable itself or the negation of the variable. Given a set of binary variables $\{x_{i_1}, x_{i_2}, \ldots, x_{i_k}\}$, a minterm is a product of literals that includes a literal of every variable in the set. For example, $x_1$, $\bar{x}_1$, $x_2$, $\bar{x}_2$ are literals, and $x_1 x_2$, $\bar{x}_1 x_2$, $x_1 \bar{x}_2$, and $\bar{x}_1 \bar{x}_2$ are minterms for $\{x_1, x_2\}$.
For a Boolean function, a corresponding multiple-valued function is obtained by partitioning the binary variables of the function into sets of $n$ variables, assigning a distinct $2^n$-valued variable to each set, and assigning the values $\{1, \ldots, 2^n\}$ distinctly to the minterms of each set. An MDD is obtained from a BDD accordingly. If the number of binary variables is not a multiple of $n$, the shortage can be padded with dummy Boolean variables.
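The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `group_variables`, `minterm_value`, and `to_multivalued`, the `dummy` padding prefix, and the particular numbering of minterms (binary value plus one) are all hypothetical choices, since the text only requires that the values be assigned distinctly.

```python
from itertools import product

def group_variables(names, n, pad_prefix="dummy"):
    """Partition binary variable names into groups of n, padding the last group."""
    names = list(names)
    shortage = (-len(names)) % n
    names += [f"{pad_prefix}{i}" for i in range(shortage)]
    return [tuple(names[i:i + n]) for i in range(0, len(names), n)]

def minterm_value(bits):
    """Map one group's assignment (b1, ..., bn) to a value in 1..2**n."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value + 1

def to_multivalued(f, names, n):
    """Return (groups, g), where g takes one value in 1..2**n per group."""
    groups = group_variables(names, n)
    decode = {minterm_value(bits): bits for bits in product((0, 1), repeat=n)}

    def g(*values):
        assignment = {}
        for group, v in zip(groups, values):
            assignment.update(zip(group, decode[v]))
        # Dummy variables are dropped before calling the Boolean function.
        return f(**{k: assignment[k] for k in names})

    return groups, g

# Three binary variables grouped into two 4-valued variables (n = 2).
groups, g = to_multivalued(lambda x1, x2, x3: (x1 & x2) | x3, ["x1", "x2", "x3"], 2)
```

Here the second group is padded with one dummy variable, so two of its four values map to the same assignment of `x3`.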
Variable Reordering
The computational cost, in both memory and time, of manipulating a BDD depends primarily on the size of the BDD, and the size depends on the variable ordering. However, minimizing the number of BDD nodes is one of the "intractable" problems. Thus, several heuristic algorithms such as [12], [14] have been proposed. The primitive operation of these heuristics is to exchange two adjacent variables, because the exchange needs only a local reconstruction of the BDD within the two levels, i.e., no global reconstruction is needed.
A number of adjacent variables is referred to as a window. The window permutation algorithm [14] moves the window along the BDD and decides the order within the window at each position. The sifting algorithm [12] moves each variable along the BDD by iterative exchanges with the adjacent variable to decide the position of the variable. The operation is applied to all the variables in turn.
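The sifting idea can be illustrated with a small Python sketch. It is not the implementation of [12]: for brevity it recomputes the BDD size from a truth table after every adjacent exchange instead of reconstructing the two affected levels locally, so it is only usable for a handful of variables. The function names `bdd_size` and `sift` are hypothetical.

```python
from itertools import product

def bdd_size(f, order):
    """Count the variable nodes of the reduced ordered BDD of f under `order`.

    f takes a dict {name: 0/1}. A node exists at a level for every distinct
    subfunction that really depends on the variable of that level.
    """
    total = 0
    for i, var in enumerate(order):
        rest = order[i + 1:]
        seen = set()
        for prefix in product((0, 1), repeat=i):
            fixed = dict(zip(order[:i], prefix))

            def table(bit):
                return tuple(
                    f({**fixed, var: bit, **dict(zip(rest, tail))})
                    for tail in product((0, 1), repeat=len(rest))
                )

            lo, hi = table(0), table(1)
            if lo != hi:  # the subfunction depends on var
                seen.add((lo, hi))
        total += len(seen)
    return total

def sift(f, order, cost=bdd_size):
    """Move each variable through all positions and keep the best one."""
    order = list(order)
    for var in list(order):
        best_cost, best_order = cost(f, order), list(order)
        pos = order.index(var)
        # Walk var to the top, then down to the bottom, by adjacent exchanges.
        steps = [-1] * pos + [1] * (len(order) - 1)
        for step in steps:
            order[pos], order[pos + step] = order[pos + step], order[pos]
            pos += step
            c = cost(f, order)
            if c < best_cost:
                best_cost, best_order = c, list(order)
        order = best_order
    return order

f = lambda a: (a["x1"] & a["x2"]) | (a["x3"] & a["x4"])
bad = ["x1", "x3", "x2", "x4"]
good = sift(f, bad)
```

For this function the interleaved ordering needs six variable nodes, while the ordering found by the sketch needs four.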
Sum of Generalized Complex Terms (SGCT)
Consider a class of Boolean functions which can be mapped as a sum of cascaded LUTs. An $n$-variable tail function is an $(n+1)$-variable function $\{0, 1\}^n \times \{0, 1\} \to \{0, 1\}$, denoted by $F(X, x_{cin})$, where $X$ and $x_{cin}$ are a set of $n$ binary variables and a binary variable, respectively. The variable $x_{cin}$ is referred to as the cascade input of the function.
For simplicity, the cascade inputs of all tail functions are denoted by the same name $x_{cin}$, although the cascade input is distinct for each tail function. If $F$ does not depend on the cascade input $x_{cin}$, the tail function is denoted by $F(X, U)$, where $U$ is referred to as the universal term. An expression
$F_m(X_m, F_{m-1}(X_{m-1}, \cdots F_1(X_1, U) \cdots))$
is referred to as a generalized complex term (GCT), where $(X_1, X_2, \ldots, X_m)$ is a sequence of sets of variables and $(F_1, F_2, \ldots, F_m)$ is a sequence of tail functions. The sequence $(X_1, X_2, \ldots, X_m)$ is the variable ordering of the GCT. If $|X_i| \le n$ for every $i$, the GCT is called an nGCT. An nGCT can be embedded in a series $(\ell_1, \ell_2, \ldots, \ell_m)$ of cascaded $(n+1)$-input LUTs by assigning $X_i$ to the inputs of the LUT $\ell_i$ and programming $\ell_i$ to be $F_i$. Figure 2 shows the mapping of $f = F_2(X_2, F_1(X_1, F_0(X_0, U)))$ into the cascaded LUTs. Note that the positions of variables in a GCT mapped into the cascaded LUTs are not necessarily exchangeable, while those in a product term mapped into the "AND" array of a PLA are exchangeable.
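To make the cascade concrete, the following Python sketch evaluates a GCT stage by stage and builds the truth table that would be programmed into one $(n+1)$-input LUT. The names `eval_gct` and `lut_from_tail` are hypothetical, and feeding the first cascade input with the constant 1 in place of $U$ is an assumption of this sketch.

```python
from itertools import product

def eval_gct(tails, inputs, universal=1):
    """Evaluate F_m(X_m, ... F_1(X_1, U) ...) one cascaded stage at a time."""
    cascade = universal  # assumption: U is fed as the constant 1
    for tail, x in zip(tails, inputs):
        cascade = tail(x, cascade)
    return cascade

def lut_from_tail(tail, n):
    """Truth table of an (n+1)-input LUT programmed as the tail function."""
    return {
        (*x, c): tail(x, c)
        for x in product((0, 1), repeat=n)
        for c in (0, 1)
    }

# A single 2GCT realizing f = x1 x2 + x3 x4 with X1 = {x1, x2}, X2 = {x3, x4}.
F1 = lambda x, cin: x[0] & x[1] & cin
F2 = lambda x, cin: (x[0] & x[1]) | cin

def f(x1, x2, x3, x4):
    return eval_gct([F1, F2], [(x1, x2), (x3, x4)])
```

Each tail function here has two ordinary inputs and one cascade input, so each stage fits in a 3-input LUT with an 8-entry table.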
We introduce an operator "$\circ$" to denote the GCT by
$H \circ F_i = F_i(X_i, H)$,
where $H$ and $F_i$ are referred to as the head and the tail of the GCT, respectively. A subterm of a GCT is the GCT itself or any subterm of the head of the GCT. In this paper, "$U \circ$" is often omitted.
When a function $f$ can be expressed as $f = T_1 \diamond T_2 \diamond \cdots \diamond T_k$ by GCTs $T_i$ and any commutative binary operator "$\diamond$," the expression is referred to as a sum of generalized complex terms (SGCT). An SGCT consisting only of nGCTs is called an nSGCT, and the number of GCTs in an SGCT $f$ is denoted by $|f|$. When the variable ordering of every GCT included in an SGCT is $X = (X_1, X_2, \ldots, X_m)$, the SGCT is called an SGCT of the variable ordering $X$. In this paper, we assume that the operator is "OR" and that $X_i \cap X_j = \emptyset$ for $i \neq j$. In the following, an SGCT is often treated as the set of the GCTs in the SGCT for convenience.
The number of terms in an SGCT expressing a Boolean function strongly depends on the variable ordering. Note that our problem is a generalized version of that of multiple-valued PLAs [16], [18].
SGCT Graph and Construction Algorithm
We introduce a directed acyclic graph to generate nSGCTs. An SGCT graph is an MDD corresponding to the given Boolean function, where each node is labeled with an nSGCT of the Boolean function corresponding to the node. Figure 4 shows examples of a BDD (left) and an SGCT graph (right) for a Boolean function $f = x_1 x_2 + x_3 x_4$. In the SGCT graph, the variable ordering is $(\{x_4, x_3\}, \{x_2, x_1\})$, and the SGCTs associated with the nodes $v_0$ and $v_1$ are $\{U \circ x_2 x_1 x_{cin}\}$ and $\{U \circ x_4 x_3 x_{cin},\ U \circ x_2 x_1 x_{cin} \circ \overline{x_4 x_3}\, x_{cin}\}$, respectively, where $x_{cin}$ represents the cascade input. Given a completely specified Boolean function in the SBDD [13] form, our algorithm performs bottom-up construction of an SGCT graph together with the variable reordering. In order to support the partially constructed SGCT graph, the BDD in the current variable ordering is maintained during the construction, and each node in the SGCT graph and the corresponding node in the BDD are linked to each other.
The construction is performed according to the level of the nodes. At each level of the construction, the $n$ variables for the current level are decided by reordering the undecided variables and evaluating the number of GCTs. Figure 5 illustrates the framework of the algorithm.
At each level, the algorithm generates candidates for the set $V$ of $n$ variables by applying some reordering strategy to the BDD, for example, the sifting algorithm [12]. For each candidate $V$, the algorithm creates the nodes and arcs of the SGCT graph at the level. A node $v_{SGCT}$ of the SGCT graph is created for a node $v_{BDD}$ of the BDD if $v_{BDD}$ has input arcs from levels above the candidate. An output arc $a$ of $v_{SGCT}$ is created for each minterm of the candidate $V$ and connected to the node $u_{SGCT}$ associated with the node $u_{BDD}$, where $u_{BDD}$ is the node in the BDD reached from $v_{BDD}$ by traversing the arcs corresponding to the minterm.
For each node $v$, the SGCT $F$ is generated from the SGCTs of the nodes in the lower levels as follows. Let $C$ be the set of all the minterms of the variables in the candidate, $u_c$ be the node connected by the arc labeled with $c \in C$, $F_c$ be the SGCT of $u_c$, and $\{t_{c:1}, t_{c:2}, \ldots, t_{c:|F_c|}\}$ be the GCTs in $F_c$. Then, $c \cdot F_c = \{c \cdot t_{c:i} : 1 \le i \le |F_c|\}$ can be a part of the SGCT for the node $v$. This set of GCTs is obtained by adding the tail function $c\, x_{cin}$ to each $t_{c:i}$, as illustrated in Fig. 6. Consequently, the SGCT $F$ for the node $v$ is given by the sum of the GCTs for all the minterms, $\{c \cdot t_{c:i} : c \in C,\ 1 \le i \le |F_c|\}$. The algorithm repeatedly tries to reduce the number of GCTs in $F$ according to the following reduction rules.
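The step that extends the SGCTs of the child nodes by one level, before any reduction, might look as follows in Python. The data representation is an assumption of this sketch, not the authors' data structure: a GCT is a tuple of tails, each tail is the set of minterms $c$ for which the tail function $c\, x_{cin}$ passes the cascade input, the empty tuple stands for the universal term, and the name `extend` is hypothetical.

```python
def extend(children):
    """Build the unreduced SGCT of a node from the SGCTs of its children.

    children maps each minterm c of the current level to the SGCT F_c of the
    node reached by the arc labeled c. Every GCT t in F_c yields one GCT c.t,
    obtained by appending the tail function c * x_cin.
    """
    sgct = set()
    for c, child_sgct in children.items():
        for t in child_sgct:
            sgct.add(t + (frozenset({c}),))
    return sgct

ONE = {()}     # constant-1 node: the universal term alone
ZERO = set()   # constant-0 node: no terms

# f = x1 x2 + x3 x4, lower level over {x2, x1}, upper level over {x4, x3}.
v0 = extend({"00": ZERO, "01": ZERO, "10": ZERO, "11": ONE})
v1 = extend({"00": v0, "01": v0, "10": v0, "11": ONE})
```

In this example the upper node obtains four GCTs before reduction: one from the minterm leading to the constant 1 and three copies of the lower node's single GCT, which are the kind of terms the reduction rules are meant to combine.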
The sum of two GCTs which have the same head function can be replaced by one GCT as $(H \circ T) \diamond (H \circ T') = H \circ (T \diamond T')$.
The sum of two GCTs where the head function of one GCT is the negation of that of the other can be replaced by one GCT as $(H \circ T) \diamond (\bar{H} \circ T') = H \circ (T \diamond T'')$, where $T''(x) = T'(\bar{x})$.
The sum of a GCT and the universal term can be replaced by one GCT as $(H \circ T) \diamond (U \circ T') = H \circ (T \diamond T')$.
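The first rule alone can be sketched as a grouping step in Python. This covers only GCTs sharing a head and does not implement the negated-head or universal-term rules. As an assumption of the sketch, a GCT is a non-empty tuple of tails, each tail is a frozenset of minterms $c$ standing for the tail function $(\sum c)\, x_{cin}$, and `merge_same_head` is a hypothetical name.

```python
def merge_same_head(sgct):
    """Apply (H o T) + (H o T') = H o (T + T') to every group of equal heads."""
    by_head = {}
    for t in sgct:
        head, tail = t[:-1], t[-1]
        # OR-ing two tails of the form c * x_cin is the union of their minterms.
        by_head[head] = by_head.get(head, frozenset()) | tail
    return {head + (tail,) for head, tail in by_head.items()}

A = frozenset({"11"})
unreduced = {
    (A,),                      # head U, tail for minterm 11
    (A, frozenset({"00"})),    # three GCTs sharing the head (A,)
    (A, frozenset({"01"})),
    (A, frozenset({"10"})),
}
reduced = merge_same_head(unreduced)
```

The three GCTs with the common head collapse into one whose tail covers the minterms 00, 01, and 10, so four terms become two.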
The following example shows that five GCTs are combined into one GCT by the rules.
After all the candidates of the variables for the current level are generated, the candidate which gives the minimum total number of GCTs is selected for the level and all the other candidates are discarded. Note that minimizing the number of GCTs at the current level may not lead to the minimum number of GCTs at the later levels. By applying the procedure above to each level from the bottom to the top, an SGCT of the given function is obtained.
For more details, a pseudo-code of the algorithm above is shown in [1]. Note that the given function must be a completely specified function because the algorithm starts to construct SGCTs of subfunctions at the lowest level in a single BDD.
Strategies of Variable Reordering
Based on our algorithm to construct the SGCT graph, we present several variations for generating the SGCT. Before turning to the integrated approach, we discuss SGCT construction without variable reordering and variable reordering as a preprocess.
SGCT Construction without Reordering
If the generation of candidates for the variable ordering is omitted, our algorithm to construct the SGCT graph becomes an algorithm without variable reordering, i.e., for a given variable ordering. It serves as a reference case for assessing the potential of the algorithm. It can also be combined with a preprocess for variable reordering.
Reordering as Preprocess
As described in the introduction, [17] tries to reduce the number of GCTs by reordering variables as a preprocess of the mapping, where the size of the BDD is evaluated instead of the size of the SGCT. However, the size of the BDD seems to be a poor clue for the minimization of the SGCT. We therefore present two alternatives for the evaluation in the preprocess: the size of the MDD and the size of the cutset in the BDD. The variables in the same level are exchangeable both in the SGCT and in the MDD, while they are distinguished in the BDD. Thus, the MDD seems more suitable than the BDD for the purpose, and we apply the size of the MDD to the evaluation in the preprocess.
For a bi-partition of the nodes, the cutset is defined as the set of arcs between the two parts, and the size of the cutset is defined as the number of those arcs. For a level, consider the bi-partition of the nodes by the level, i.e., the nodes in the level or below and the nodes above the level. The cutset of the bi-partition is the set of all the arcs crossing between the upper levels and the lower levels. The size of the cutset indicates, in some sense, the quantity of information that needs to be transferred across the level. We therefore apply the size of the cutset in the BDD to the evaluation in the preprocess. Note that the size of the cutset in the MDD for the level is the same as that in the BDD.
There is a concept similar to the cutset, the width. The width is defined as the number of nodes to which the arcs across the level are connected. The difference is that a node is counted once for the width, while arcs to the same node are counted individually for the cutset. From the viewpoint of circuit design, this corresponds to the difference between counting the output signal of a net and counting its (possibly multiple) input terminals. The width represents the quantity of information that needs to be transferred across the level when connections of the signals can fork and reconverge. However, for our target, the SGCT, each term is generated independently and a subterm is duplicated if it is needed by multiple terms. Therefore, we adopt the cutset rather than the width.
In this paper, the sifting algorithm is applied for the variable reordering in both preprocesses.
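The difference between the two measures can be checked on a small graph with the following Python sketch. The function name `cutset_and_width`, the node names, and the encoding of arcs as (source, destination) pairs with terminals at level 0 are assumptions made for illustration.

```python
def cutset_and_width(arcs, level_of, level):
    """Return (cutset size, width) for the bi-partition at `level`.

    An arc crosses the level if its source lies above the level and its
    destination lies in the level or below.
    """
    crossing = [(s, d) for s, d in arcs if level_of[s] > level >= level_of[d]]
    cutset = len(crossing)              # every arc is counted
    width = len({d for _, d in crossing})  # every destination node counted once
    return cutset, width

# BDD of f = x1 x2 + x3 x4 with the ordering x1, x2, x3, x4 (root at level 4).
level_of = {"a": 4, "b": 3, "c": 2, "d": 1, "ZERO": 0, "ONE": 0}
arcs = [
    ("a", "c"), ("a", "b"),        # x1 = 0, x1 = 1
    ("b", "c"), ("b", "ONE"),      # x2 = 0, x2 = 1
    ("c", "ZERO"), ("c", "d"),     # x3 = 0, x3 = 1
    ("d", "ZERO"), ("d", "ONE"),   # x4 = 0, x4 = 1
]
result = cutset_and_width(arcs, level_of, 2)
```

At level 2, two arcs enter the same node `c`, so the cutset size is 3 while the width is 2.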
Variable Reordering
First, we present a local exhaustive search as a reference case. At each level, all the combinations of $n$ variables selected from the undecided variables are nominated as candidates. The resulting number of GCTs at the current level is the minimum achievable by any reordering strategy. Note that the minimum solution at each level may lead to a non-minimum solution of the original problem.
Since the exhaustive nomination may not be applicable to a large number of variables, the sifting algorithm is applied to our algorithm in this paper. In order to increase the chance of achieving a smaller number of GCTs, our algorithm is extended to examine extra levels at each level, as illustrated in Fig. 7. At the step for each level $\ell$, the algorithm constructs the SGCT graph up to the level $\ell + k - 1$ for each candidate of the variable ordering. The $n$ variables for the current level which give the minimum number of GCTs at the level $\ell + k - 1$ are selected for the level $\ell$.
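The level-by-level exhaustive nomination can be sketched in Python as a greedy loop over all $n$-subsets of the undecided variables. The evaluation of a candidate is abstracted into a callback `count_gcts(decided, candidate)`, which is a hypothetical stand-in for constructing the SGCT graph at the level and counting its GCTs; the toy cost used below is invented for the example only.

```python
from itertools import combinations

def greedy_ordering(variables, n, count_gcts):
    """Pick, level by level, the n-variable set with the fewest GCTs."""
    undecided = list(variables)
    ordering = []
    while undecided:
        k = min(n, len(undecided))
        # Exhaustive nomination: every k-subset of the undecided variables.
        best = min(
            combinations(undecided, k),
            key=lambda cand: count_gcts(ordering, cand),
        )
        ordering.append(best)
        undecided = [v for v in undecided if v not in best]
    return ordering

# Toy cost (an assumption): pairs that appear together in a product are cheap.
def toy_cost(decided, cand):
    return 1 if set(cand) in ({"x1", "x2"}, {"x3", "x4"}) else 3

ordering = greedy_ordering(["x1", "x3", "x2", "x4"], 2, toy_cost)
```

The number of candidates per level grows as a binomial coefficient in the number of undecided variables, which is why this strategy serves only as a reference case.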
Evaluation
Our algorithm is implemented in C++ with the BDD library CUDD (Colorado University Decision Diagram Package) [19] and evaluated by generating 2SGCTs for benchmark circuits provided by MCNC [20]. The platform is a PC equipped with a Pentium IV 1.8 GHz CPU, 512 MB of main memory, and Linux 2.4.20. The tested algorithms are as follows.
Tsu01: The algorithm in [4]. The variable ordering is the order in which the variables appear in the benchmark file.
NoOrd: Our algorithm to construct the SGCT graph without variable reordering. The variable ordering is the order in which the variables appear in the benchmark file.
Yua03: The algorithm in [17]. Variable reordering to reduce the size of the BDD, using the window permutation algorithm [14], is applied as the preprocess. After the reordering, [4] is applied.
MDD Size: Variable reordering based on the sifting algorithm [12] to reduce the size of the MDD is applied as the preprocess. After the reordering, our algorithm to construct the SGCT graph without reordering is applied.
Cutset: Variable reordering based on the sifting algorithm [12] to reduce the size of the cutset at each level is applied as the preprocess. After the reordering, our algorithm to construct the SGCT graph without reordering is applied.
LclExh: The local exhaustive nomination of candidates is applied to our algorithm for the reordering.
Sift: The sifting algorithm [12] is applied to our algorithm for the reordering.
Sift2: The sifting algorithm [12] is applied to our algorithm for the reordering. At each level, the SGCT graph is constructed up to one more level for each candidate.
Sift3: The sifting algorithm [12] is applied to our algorithm for the reordering. At each level, the SGCT graph is constructed up to two more levels for each candidate.
Table 1 summarizes the results for the size. The first three columns list the name, the number of inputs, and the number of outputs of each circuit, respectively. The remaining columns list the number of 2GCTs generated by each algorithm. Since our algorithm can handle only completely specified functions, we apply the algorithm to the lower-bound function (all the don't-cares are set to zero) and the upper-bound function (all the don't-cares are set to one) of each incompletely specified function. The results appear as "lower-bound/upper-bound" in the table. The smallest number of GCTs for each circuit among all the tested algorithms appears with an asterisk, and numbers near the smallest (the smallest plus one, or not exceeding 105 percent of the smallest) appear in the shaded areas. The circuits are sorted in increasing order of the smallest number of GCTs.
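The two completely specified functions derived from an incompletely specified one can be written down directly. In this Python sketch the truth table is a dictionary whose values are 0, 1, or `None` for a don't-care; this encoding and the name `bound_functions` are assumptions for illustration.

```python
def bound_functions(table):
    """Return (lower, upper): don't-cares set to 0 and to 1, respectively."""
    lower = {x: (0 if v is None else v) for x, v in table.items()}
    upper = {x: (1 if v is None else v) for x, v in table.items()}
    return lower, upper

# Two-input function with one don't-care entry.
spec = {(0, 0): 0, (0, 1): None, (1, 0): 1, (1, 1): 1}
lower, upper = bound_functions(spec)
```

Any completely specified function compatible with the specification lies between these two, which is why the table reports the results as a pair.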
Our approaches, especially Sift3, achieve a smaller number of GCTs for many circuits, with a maximum reduction of 71 percent for Sift3 applied to t481 compared to Yua03. Our integrated approaches achieve the smallest or nearly the smallest number for many circuits for which the size of the resultant SGCT is relatively small. On the other hand, the size increases compared to the previous approaches for many larger circuits, especially for incompletely specified functions. While our algorithm performs bottom-up construction, the previous algorithm (Tsu01) performs top-down decomposition of functions utilizing don't-cares. For larger circuits, the merit of the top-down strategy and don't-cares seems more significant. However, such a large circuit should be divided into smaller circuits suitable for SGCT expressions. Developing such a circuit division method is a difficult but challenging future work.
Table 2 summarizes the CPU time. The first three columns list the name, the number of inputs, and the number of outputs of each circuit, respectively. The next seven columns list the CPU time for each algorithm. The remaining two columns list the ratios of the CPU time of Sift3 to those of Yua03 and Tsu01, respectively. Ratios of 1.5 or more appear in shaded areas. The circuits are sorted in the same order as in Table 1.
All the algorithms complete their computation within seconds in almost all cases. Our core algorithm, NoOrd, is the fastest for all the circuits except misex3. The reason for the exception is considered to be that the number of GCTs generated by NoOrd for misex3, which affects the computational time, is much larger than that generated by Sift. It is observed that the order of speed is Sift, Sift2, and Sift3, according to the number of extra levels of the construction, but the difference is small. We conclude that Sift3 is the most effective one considering the resultant size and the computational time. Our algorithms are faster than the previous approaches, with a 10 to 50 percent reduction of the CPU time, in many cases.
Conclusion
An integrated approach to variable ordering and logic mapping into LUT-Array-Based PLDs is presented. Given a target Boolean function, our algorithms generate an expression of the function in the form of a sum of generalized complex terms by constructing the SGCT graph. During the construction, our algorithms perform the variable ordering and evaluate the sizes of SGCTs directly, while the previous approach performs the variable ordering as a preprocess and evaluates the ordering indirectly. Our integrated approaches achieve a smaller size than the previous approaches for smaller circuits, with a maximum reduction of 71 percent. However, the size increases compared to the previous approaches for many larger circuits. Such large circuits should be divided into smaller circuits, which is one of our future works. The computational time of our approaches is smaller, by 10 to 50 percent, than that of the previous approaches. Combination with some top-down strategy, enhancement to handle don't-cares, and comparisons with other LUT-based FPGA-mapping algorithms are also future works.
