Abstract: Programmable Logic Arrays (PLA's) provide a flexible and efficient way of synthesizing arbitrary combinational functions as well as sequential logic circuits. They are used in both LSI and VLSI technologies. The disadvantage of using PLA's is that most of them are very sparse, and this high sparsity results in a significant waste of silicon area.
I. INTRODUCTION
INTERACTIONS between the structured design techniques and the design of complex VLSI circuits have wide implications for the overall design cost and efficiency of digital circuits and systems. The use of regular structures facilitates the design process and eliminates tedious manual operations. Due to the regularity of the structure and the simplicity of the design, Programmable Logic Arrays (PLA's) have found widespread acceptance in the design of digital systems.
IEEE Log Number 9212365.
The PLA is a hardware form used for implementing two-level multiple-output combinational logic circuits. PLA design is easily automated because of a direct correspondence between the physical PLA layout and the personality matrix. The major disadvantage of the PLA is that most practical logic problems leave much of the PLA area unused. A straightforward physical design results in a significant waste of silicon area, which may be unacceptable. Also, speed and power become critical parameters as the size of the PLA increases [7]. The gate capacitances of the input signals carried by long polysilicon lines become the key factor in determining the timing (speed) performance. In moderate to large PLA's, the polysilicon resistance becomes as important a factor as the capacitance. The signal can be seriously degraded by the large resistance added to the line, no matter how large the drivers are. Further, if the PLA becomes large, the width of the power and ground lines should also be increased to avoid possible metal migration. Most PLA generators [15] automatically increase the width of the power and ground lines in the PLA, depending on the total current demand.
PLA optimization aims at minimizing the area occupied by the PLA and, as a result, addresses almost all of the disadvantages listed above. Two minimization techniques are commonly used to reduce the PLA area:
1) Logic minimization: Logic minimization seeks a logic representation with a minimal number of implicants. Reducing the number of implicants allows a PLA to be implemented in a smaller area.
2) Topological minimization: PLA folding is a technique which reclaims unused space without destroying the regular structure of the PLA. According to Egan and Liu [2], arbitrary Boolean functions produce sparse PLA's, in which typically 90% of the crosspoints are unused.
Folding achieves size reduction by compaction and removal of areas of unused crosspoints.
In this paper, we study the problem of PLA folding. There are many types of PLA folding, depending on the technology employed to implement a PLA. All PLA folding methods involve the merging of two or more columns (rows) of a PLA into a single column (row). The simplest form of folding, called Simple Column Folding [7], involves merging pairs of columns into single columns.
The objective of PLA folding is to find the maximum number of pairs of columns/rows that can be folded simultaneously. PLA folding has a complex functional dependence on the ordering of the rows. The optimal simple PLA column folding problem can be defined as:
Determine a permutation of the rows which allows a maximum set of column pairs to be implemented in such a way that each column of the folded PLA contains a pair of columns from the set.
The optimal folding problem has been shown to be NP-complete [6], [14]. Many algorithms and heuristics have been developed to solve this problem. The simplest one is the branch and bound algorithm [4], [13]. Although it is simple and able to find an optimal solution in theory, its practicality for large PLA's is questionable because it carries out an exhaustive search for an optimal solution. Therefore, many heuristics have been developed to find good, but nonoptimal, solutions. Hwang et al. [10] used a best-first search algorithm to find a near-optimal result. Ullman [20] used a graph algorithm to find a feasible solution with a time complexity no worse than $O(wc^2)$, where w is proportional to the number of rows and c is the number of columns. Hachtel et al. [7], [8] proposed algorithms for both row and column foldings, which find the folding pairs one by one. The PLA folding results thus obtained are only locally optimal and depend on the order in which the folding pairs are selected. For example, for column folding, they try to fold as many columns as possible and then determine the row permutation according to the folding set so found. In fact, each folding set corresponds implicitly to some row permutation order. Thus, after a folding set is selected, the next folding set is constrained by the row permutation orders.
In this paper, a new bipartite PLA folding algorithm based on matrix representation is presented. Before searching for a bipartite folding, the columns which do not satisfy certain constraints, and are hence nonfoldable, are pruned. This reduces the search space. During the search, heuristics are used to find an alternative folding. This speeds up the search process.
0278-0070/93$03.00 © 1993 IEEE
This paper is organized as follows. The advantages and constraints of PLA folding are given in Section II. PLA bipartite folding is also introduced in that section. Terms which are used in the proposed algorithm are defined in Section III. In Section IV, a bipartite folding algorithm is described. Results on the benchmark examples are presented in Section V. Section VI concludes the paper.
II. ADVANTAGES AND LIMITATIONS
For some small PLA's, folding increases the area, as shown in Table I. This is because the area needed to place the extra input decoders and output buffers in a folded PLA is greater than the area saved by folding.
Although PLA folding can reduce the area effectively, there are constraints, such as routing, on folding a PLA and on using a folded PLA. In a VLSI system design, these constraints should be taken into account. Two of the important constraints and their impacts are discussed below.
Routing: In a folded PLA, one of the folded input (output) signals must come from the top of the PLA and the other from the bottom. Since the inputs may be required anywhere and the outputs may go anywhere, folding often increases the complexity of routing. Furthermore, it is well known that the routing of signals often takes more silicon area than the logic blocks. Typically, 30% of the total design time and about 60% of the chip area are expended merely to interconnect the circuit elements [18]. Therefore, any estimate of the overhead that does not consider the routing area is often too optimistic. Thus, folded PLA design must consider the routing area of the interconnects and the time to complete the design.
Testing: The input decoders and output buffers of a testable PLA design are generally augmented so that they can control the columns of the PLA easily. For a folded PLA, transistors can be placed between the cuts of the columns, so that the PLA can be controlled by the augmented input decoder from one side only during testing. Otherwise, the input decoders at both sides would need to be augmented to test the PLA. Placing a transistor on the cut in a folded PLA can be costly in a CMOS design, because the layout of a PLA is very compact. For a simple column folded PLA, the cuts are at different levels, so the area added to place the transistors depends on the number of cuts in the folded PLA, which results in a significant waste of silicon area.
A bipartite folding is a folding in which all of the breaks (cuts) occur at the same level. The single break level of a bipartite folding splits a PLA into two regions [2]: an upper folding region (U), which contains those folded input and output lines that are above the break, and a lower folding region (L), which contains the folded input and output lines that are below the break. A column bipartite folding exists if every line in the upper folding region is disjoint from every line in the lower folding region.
There are several advantages for using bipartite folding:
Our experimental results show that the size of a bipartite folded PLA compares favorably to the size of the PLA obtained after single column folding. The area of a PLA can be further reduced by folding the upper folding region and lower folding region. The same algorithm can be applied recursively to the bipartite folded PLA.
The area required for inclusion of testability features in a bipartite folded PLA is much less than that of a single column folded PLA. Since all of the cuts in a bipartite folded PLA are at the same level, this PLA can, as argued earlier, be tested from one side alone with very little area overhead. The idea proposed in [19] can be used to obtain a Built-in Self-Test folded PLA by placing pass transistors between the cuts of a bipartite folded PLA.
III. PRELIMINARIES
An example of the AND plane of a PLA is shown in Fig. 1. In this figure, columns represent uncomplemented-complemented pairs of literals. A dot marks the placement of a transistor on an uncomplemented or complemented input. Each horizontal line of the PLA carries a product term. There are 13 inputs and 21 product terms in the example PLA. We will use this example AND plane throughout this section.
Definition 1: Two columns are disjoint (compatible) if they do not share a common product line.
More explicitly, two input lines are disjoint if the corresponding inputs do not occur together in any product term for any of the output functions. An input line is disjoint from an output line if the literal represented by this input line is not present in the function represented by the output line. Finally, two output lines are disjoint if they do not share a product term. For the example PLA, inputs 1 and 5 do not share any common product line, hence they are disjoint. Inputs 9 and 13 both appear on product line 1, therefore they are not disjoint.
The compatibility matrix CM records which pairs of columns are disjoint. CM is symmetric, since if column i is disjoint from column j, then j is disjoint from i. A column is not disjoint from itself.
Construction of the matrix CM from a given PLA is straightforward. The compatibility matrix of the PLA of Fig. 1 is shown in Fig. 2. The last column, labeled weight, will be explained later.
A column bipartite folding is a folding in which all the breaks (cuts) in the columns occur at the same level. Fig. 3 shows a bipartite folding of the example PLA. In Fig. 3, U contains inputs 7, 8, 9, 10, and 12, and L contains inputs 1, 2, 4, 5, and 6. The size of a bipartite folding PLA is the cardinality of either folding region. Since the breaks of a bipartite folding PLA are at the same level, the following lemma is evident and is given without proof.
FM is an $m \times m$ matrix where $2m \le n$, n being the total number of columns of the PLA and m being the number of folding pairs. Each column of FM corresponds to an element of the upper folding region, and each row to an element of the lower folding region. All entries of FM are 1.
An FM of the PLA in Fig. 1 is shown in Fig. 4. This FM corresponds to the bipartite folded PLA of the example.
The last column in Fig. 2 shows the weight of each column. Input 13 is expendable. An expendable column (row) can be deleted from the CM, since it is nondisjoint from all other columns. In other words, the expendable columns (rows) cannot appear in either the upper folding region or the lower folding region.
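The construction of CM and of the column weights can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format `usage` (one 0/1 row per product line, one entry per column) and the function names are hypothetical, and the weight of a column is assumed to be the number of columns from which it is disjoint, which is how the weights are used in the theorems of this section. The toy matrix is not the PLA of Fig. 1.

```python
from typing import List


def compatibility_matrix(usage: List[List[int]]) -> List[List[int]]:
    """Build CM from a column-usage matrix (hypothetical input format).

    usage[r][c] is 1 when column c has at least one transistor on
    product line r, and 0 otherwise.
    """
    n = len(usage[0])
    cm = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Columns i and j are disjoint if no product line uses both.
            if not any(row[i] and row[j] for row in usage):
                cm[i][j] = cm[j][i] = 1  # CM is symmetric
    return cm


def column_weights(cm: List[List[int]]) -> List[int]:
    # Assumed definition: weight = number of columns disjoint from this one.
    return [sum(row) for row in cm]


# Toy example: three product lines, four columns.
# Column 3 appears on every product line, so it is disjoint from nothing.
usage = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
cm = compatibility_matrix(usage)
weights = column_weights(cm)
expendable = [c for c, w in enumerate(weights) if w == 0]
print(weights, expendable)
```

In this toy case the first three columns are pairwise disjoint, while the last column has weight 0 and would be deleted from CM as expendable.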
Theorem 2: If a PLA contains M bipartite folding pairs then at least 2M columns in the CM have weights greater than or equal to M .
Proof: In a bipartite folding PLA with M pairs, each column in U can be folded with every column in L. Since there are M columns in L, each column in U must have weight at least M. Likewise, each of the columns in L must also have weight at least M. Hence, the total number of columns with weight M or more in a bipartite folded PLA is at least 2M.
This theorem can be used to reduce the search space for bipartite folding. In other words, if we are looking for a bipartite folding of size M, all candidate columns should have weights greater than or equal to M. For example, there are 12 columns in Fig. 2 after removing the expendable column 13. In the CM, the value of M cannot be 6, since there are only 9 columns with weights greater than or equal to 6. Hence, M is upper-bounded by 5 in the given example.
Theorem 3: Given 2M columns with M bipartite folded pairs, the number of companion pairs of these 2M columns is greater than or equal to $M^2$.
Proof: A column in U is disjoint from every column in L. Hence, the weight of each column in U is at least M. Likewise, every column in L has weight greater than or equal to M. Further, every column in U is a companion pair with every column in L, with weight($C_i$) + weight($C_j$) $\ge 2M$, where $C_i$ is a column in U and $C_j$ is a column in L. There are $M^2$ combinations of the columns. Hence, the total number of companion pairs in a bipartite folding is at least $M^2$.
Theorem 3 gives the basic constraint for bipartite folding for a given set of 2M columns. If Theorem 2 is used to find 2M columns as the candidates for a bipartite folding, Theorem 3 must be satisfied for these columns for a bipartite folding to exist.
The next section gives a column bipartite folding algorithm based on these two theorems.
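As a rough illustration of how the two theorems prune the search, the sketch below computes the upper bound on M implied by Theorem 2 and tests the counting condition of Theorem 3 on a chosen set of columns. The function names are hypothetical. The weight list is invented so that it only reproduces the counts quoted in the text (12 columns, 9 of them with weight at least 6); it is not the data of Fig. 2. A companion pair is assumed here to be a pair of mutually disjoint columns.

```python
from itertools import combinations
from typing import List, Sequence


def max_fold_bound(weights: Sequence[int]) -> int:
    """Largest M such that at least 2M columns have weight >= M (Theorem 2)."""
    best = 0
    for m in range(1, len(weights) // 2 + 1):
        if sum(1 for w in weights if w >= m) >= 2 * m:
            best = m
    return best


def satisfies_theorem3(cm: List[List[int]], cols: Sequence[int], m: int) -> bool:
    """Necessary condition of Theorem 3: at least M^2 companion pairs.

    Assumption: a companion pair is a pair of disjoint columns (CM entry 1).
    """
    pairs = sum(1 for i, j in combinations(cols, 2) if cm[i][j])
    return pairs >= m * m


# Hypothetical weights: 9 columns with weight >= 6, 3 columns with weight 5.
example_weights = [7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5, 5]
bound = max_fold_bound(example_weights)

# Toy CM in which columns {0, 1} are disjoint from columns {2, 3}.
toy_cm = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
ok = satisfies_theorem3(toy_cm, [0, 1, 2, 3], 2)
print(bound, ok)
```

Both checks are necessary conditions only: passing them does not guarantee that a bipartite folding of size M exists, which is why a search is still required.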
A column with a small weight can easily block the chance of placing the next column in L (U). For a sparse PLA, most columns have large weights. Hence, BIFOLDING can easily find a bipartite solution. The experimental results in Section V show that BIFOLDING can find a solution efficiently.
IV. THE FOLDING ALGORITHM AND HEURISTICS
The optimal bipartite folding problem was shown to be NP-complete [2]. Therefore, we propose a heuristic-based algorithm to obtain a "good" solution. To explain the basic ideas behind the algorithm, let us first introduce a frame of the algorithm. The algorithm frame is described in structured code.
The selection steps are important to produce a "good" folding. These steps have to be chosen so that U and L computed by the algorithm have cardinality as close as possible to the maximum. If we select columns with large degrees (weights), the number of pairs is large. Because we then have large freedom to arrange the folding pairs, the folding that follows is a much easier task. Therefore, it appears that a good heuristic is to select columns with maximum degree as candidates to be folded and place them in candidate FM.
However, selecting a column with large degree for inclusion in FM increases the difficulty of folding. In fact, many possible foldings are created by the choice of maximum degree columns. If we select columns with minimum degree, the number of possible alternate foldings is kept small. This argument seems to suggest that we should select columns with minimum degree from the candidates to be placed in FM.
Of course, these two selection rules are contradictory, so a tradeoff must be made. We decided to use the following selection rule: select columns with maximum degree as the candidates to be folded and place them in candidate FM, then select columns with minimum degree from candidate FM and place them into FM.
In BIFOLDING, there are several subprocedures. We now describe these subprocedures and the heuristics used in them.
The algorithm SELECT implements the idea of selecting 2M columns with maximum degree from CM. In CM, the columns are sorted in decreasing order of weight. Hence, SELECT selects the first 2M columns which have not been justified. Sorting CM saves the effort of searching for 2M columns with weights of at least M.
VSELECT is based on choosing a minimum degree column from the 2M selected columns. Algorithm INITIALFOLDING(V, U, L) is used to arrange the 2M columns into U and L. It proceeds as follows. At the beginning of the algorithm, a column v with least freedom is selected using VSELECT and put into, say, U. The algorithm then removes v from V and sets L to be an empty set. Then, for each element v in V, if v is not disjoint from every element in U (L), it cannot be placed in L (U), so it is placed in U (L) and removed from V. This step continues until every element in V has been justified.
Algorithm VSELECT ( V )
INITIALFOLDING reduces the search space by fixing some columns in U or L. These columns have the least freedom (weights), which limits the chance of folding the other columns.
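The two subprocedures described above might look roughly as follows in Python. This is a sketch of one reading of the prose, not the authors' code: the function signatures, the tie-breaking by column index, and the early `None` return when a column conflicts with both regions are all assumptions. A column that is not disjoint from some member of one region is forced into that same region, because it could not legally sit in the opposite one.

```python
from typing import List, Optional, Sequence, Set, Tuple


def vselect(v: Set[int], weight: Sequence[int]) -> int:
    """Pick the column in v with the smallest weight (least freedom)."""
    return min(v, key=lambda c: (weight[c], c))  # ties broken by index


def initial_folding(
    cols: Sequence[int], cm: List[List[int]], weight: Sequence[int]
) -> Optional[Tuple[Set[int], Set[int], Set[int]]]:
    """Fix the forced columns; return (U, L, remaining) or None on conflict."""
    v = set(cols)
    seed = vselect(v, weight)
    upper, lower = {seed}, set()
    v.remove(seed)
    changed = True
    while changed:
        changed = False
        for c in sorted(v):
            clash_u = any(not cm[c][u] for u in upper)
            clash_l = any(not cm[c][x] for x in lower)
            if clash_u and clash_l:
                return None  # assumption: no legal region for column c
            if clash_u:
                upper.add(c)  # c cannot go in L, so it is fixed in U
            elif clash_l:
                lower.add(c)  # c cannot go in U, so it is fixed in L
            else:
                continue
            v.remove(c)
            changed = True
    return upper, lower, v


# Toy CM: columns 0 and 1 share a product line, every other pair is disjoint.
cm = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]
weight = [sum(row) for row in cm]
result = initial_folding([0, 1, 2, 3], cm, weight)
print(result)
```

In the toy case, column 0 is the seed, column 1 is forced into the same region because it conflicts with column 0, and columns 2 and 3 remain free for the later search.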
The algorithm COMPATIBLE(v, R) determines the compatibility of a column v with a folding region R, where R is the upper or the lower folding region. If v is disjoint from every element in R, COMPATIBLE returns true; otherwise it returns false. For a given folding region R, if v is compatible with R, v can be placed in the opposite folding region.
CHECKFOLDING(V, U, L) tries to place the elements in V into U and L. It first selects the column with least freedom in V. If this element can be put in neither U nor L, the folding condition fails, and CHECKFOLDING returns false. Otherwise, this element is put into the corresponding folding region. Every element in V is tested using CHECKFOLDING.
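A minimal Python sketch of COMPATIBLE and CHECKFOLDING, under stated assumptions, is given here. The text does not say which region is chosen when a column fits in both, so this sketch assumes the smaller region is preferred in order to keep U and L balanced; that rule, the signatures, and the returned tuple are illustrative choices rather than the authors' method.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def compatible(c: int, region: Iterable[int], cm: List[List[int]]) -> bool:
    """True if column c is disjoint from every column in the region."""
    return all(cm[c][r] for r in region)


def check_folding(
    v: Iterable[int],
    upper: Set[int],
    lower: Set[int],
    cm: List[List[int]],
    weight: Sequence[int],
) -> Tuple[bool, Set[int], Set[int]]:
    """Try to place every column of v into U or L, least freedom first."""
    upper, lower = set(upper), set(lower)
    for c in sorted(v, key=lambda col: (weight[col], col)):
        # Compatible with one region means c may sit in the opposite one.
        fits_upper = compatible(c, lower, cm)
        fits_lower = compatible(c, upper, cm)
        if not fits_upper and not fits_lower:
            return False, upper, lower  # folding condition fails
        # Assumption: when both regions are possible, fill the smaller one.
        if fits_upper and (not fits_lower or len(upper) <= len(lower)):
            upper.add(c)
        else:
            lower.add(c)
    return True, upper, lower


# Toy CM: columns 0 and 1 share a product line, every other pair is disjoint.
cm = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]
weight = [sum(row) for row in cm]
found, upper, lower = check_folding([2, 3], {0, 1}, set(), cm, weight)
print(found, upper, lower)
```

With columns 0 and 1 already fixed in U, the remaining columns 2 and 3 are both placed in L, giving a bipartite folding with two pairs for the toy matrix.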
V. EXPERIMENTAL RESULTS
The algorithms have been implemented in the C programming language on a SUN SPARCstation 2. We have used our algorithm to find bipartite foldings for a number of PLA's of varying sizes. We have not analyzed the complexity of the algorithm, but its efficiency is evident from the results presented in this section. Table II summarizes the results, including the relative area with respect to arbitrary column folding.
Note that the RA's are computed from the "exact" area of the PLA, which includes the input decoders, pull-ups, and output buffers. The layout of a bipartite folded PLA is generated by a CMOS PLA generator developed by us.
The choice of 30 PLA's out of 56 PLA's is based on the area reduction (RA) of folding. As pointed out in Section II, placing the input decoders and output buffers on top of a small PLA may increase the area in spite of folding. The PLA's with increased area after folding are not listed in Table II. Besides, some of the 56 PLA's are so small that the search for a solution can be carried out exhaustively. These small PLA's are also not considered in Table II.
We notice from Table II that for most of the PLA's $f_a = f_{aM}$ and $f_o = f_{oM}$. Thus, we conclude that BIFOLDING provides an optimal solution in most cases.
From the results given in Tables I and II, it is evident that the areas of the PLA's obtained by BIFOLDING are comparable to those obtained by simple column folding. For many of the large sparse PLA's, the folding pairs found by bipartite folding are the same as those found by simple column folding. A dash (-) in the last column indicates that the simple column folding program was either unable to provide an answer or could not be run due to the large size of the PLA.
The CPU time in seconds is also shown in Table II. For most of the PLA's, the CPU time is less than 1 s. The reasons that BIFOLDING is fast are:
1) The folding constraints are considered before searching. Hence, no time is spent on those columns which cannot be folded.
2) The heuristics in BIFOLDING select those columns with maximum freedom and then search the candidates with minimum freedom. This increases the possibility of finding a folding while keeping the number of search alternatives as small as possible.
To compare the speed of our algorithm with that of "pleasure," BIFOLDING was also ported to a VAX 11/750 machine, where both "pleasure" and BIFOLDING could run. Table III shows the folding results and the CPU times of BIFOLDING and "pleasure." The last two columns show the CPU time in seconds for "pleasure" and BIFOLDING, respectively. This table substantiates the claims made about the speed of BIFOLDING.
To show the folding result of BIFOLDING, we use a small PLA, "alu1," which is listed among the 56 PLA's in Table I. Fig. 5(a) shows the personality matrix and Fig. 5(b) shows the PLA implementation of "alu1." "alu1" has 12 inputs, 8 outputs, and 19 product terms. The size of "alu1" is 608. In Fig. 5(a), a "1" ("0") in the AND plane shows the presence of the corresponding uncomplemented (complemented) input on a product term. Similarly, a "1" in the OR plane shows the presence of the corresponding product term in the output. The reason for choosing "alu1," instead of one of the PLA's in Table II, is that it is small and contains folding pairs. The folded PLA has folding pairs in both planes, including 4 folding pairs in the OR plane. For physical consistency, the input columns have been folded with input columns and output columns with output columns. A "!" is a normal contact to an uncomplemented input and split below, a "0" is a normal contact to a complemented input and split below, and a "=" means no contact but split below. Similarly, a "!" in the OR plane is a normal contact to an output and split below, and a "=" means no contact but split below.
The size of the folded PLA is 380. This implementation yields 37.5 % area reduction.
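The quoted sizes can be checked with a short calculation. The sketch below assumes a simple size model that is not stated in the text: size = product terms x columns, with two columns (uncomplemented and complemented) per input and one per output, and with each folding pair removing one input or one output column group. The function name and parameters are hypothetical. Under this model the unfolded size of 608 and the folded size of 380 are both reproduced when 4 input pairs and 4 output pairs are folded; the count of 4 AND-plane pairs is an inference from the model, not a figure taken from the text.

```python
def pla_size(inputs: int, outputs: int, terms: int,
             and_pairs: int = 0, or_pairs: int = 0) -> int:
    """Assumed size model: rows x columns of the personality matrix.

    Each input contributes two columns (true and complement); each
    folding pair merges two inputs, or two outputs, into one position.
    """
    columns = 2 * (inputs - and_pairs) + (outputs - or_pairs)
    return terms * columns


unfolded = pla_size(12, 8, 19)                          # 19 x 32
folded = pla_size(12, 8, 19, and_pairs=4, or_pairs=4)   # 19 x 20
reduction = (unfolded - folded) / unfolded

print(unfolded, folded, f"{100 * reduction:.1f}%")
```

This size model ignores the input decoders, pull-ups, and output buffers, so it corresponds to the core array only and not to the "exact" area used for the relative areas in the tables.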
VI. CONCLUSION
We proposed an efficient PLA folding algorithm applicable for column bipartite folding, based on matrix representation. The compatibility of the column pairs is found and stored in a compatibility matrix. This step discards "pairs" that cannot be in the final solution. Theorems are proved and invoked during this phase that help limit the search space. The BIFOLDING algorithm proposed in the paper makes use of heuristics to guide the search. The algorithm proposed in this paper yields nearly optimal results for almost all examples and in certain cases bipartite folded PLA's provide a better solution than arbitrary simple column folding as obtained by "pleasure." Generally, our algorithm provides PLA's that have areas comparable to single column folded PLA's but is much faster in providing the solution.
We have also outlined some of the advantages of bipartite folding. Many of these ideas, especially those relating to testability, have been incorporated in a program which generates testable PLA's.
