Manuscript received January 4, 1989; revised May 31, 1989. This work was supported in part by ERSO under Contract SF-C-010-1 and by the
Cleemput's layout style [12]. Our algorithm takes a transistor-level circuit schematic and outputs a minimum set of transistor chains. Possible diffusion abutments between the transistor pairs are modeled as a bipartite graph. A depth-first search algorithm is used to search for the optimal chaining. Theorems on the set of branches needed to be explored at each node of the search tree are derived. A theoretical lower bound on the size of the chain set is derived. This bound enables us to prune the search tree efficiently. The algorithm has been implemented and tested. It is able to find optimal solutions almost instantly for all the cases available to us from the literature.
V. CONCLUSION
We have presented an efficient deterministic solution for symbolic-cover minimization for fully specified FSM's. The effect of this algorithm on symbolic-cover minimization is described as the following competition between two state-assignment algorithms. Consider a large fully specified FSM (say with more than 100 states). First we carry out symbolic-cover minimization using the ideas presented in this paper, thus finding the (unique) cover in reasonable time. In contrast, a straightforward application of the symbolic-cover minimization algorithm of [2] for such a large FSM must be heuristic, thus most probably generating an inferior solution. It is this part of the state-assignment algorithm that determines the number of terms in the resulting PLA. Thus we can expect a PLA generated by the first algorithm to have fewer terms than a PLA generated by the second. Next, for both competitors, we continue with the constrained encoding algorithm of [2] which determines the number of columns in the resulting PLA. Clearly, we can now expect the first competitor to do at least as well as the second.
ACKNOWLEDGMENT
The author would like to thank the anonymous referees for their helpful suggestions and guidance.
I. INTRODUCTION
As CMOS VLSI technology [13] and cell-based layout methodology [1], [4] gain popularity, the automatic layout generation of CMOS functional cells becomes very important and attracts attention from many VLSI/CAD researchers. In [12], Uehara and van Cleemput proposed a paradigm for CMOS functional cell layout, which has inspired much research.
In the layout style of [12], the transistors are placed in two parallel rows, with all the P-type transistors in one row and all the N-type transistors in the other. Power rails are routed along the rows on the outside and intracell routing runs between the rows. Since the height of a cell is usually fixed, the primary concern is to place transistors in such a way that gate signals are aligned and the drain/source diffusions of adjacent transistors are abutted as much as possible, thereby minimizing the number of separations between diffusion strips, which in turn minimizes the layout area. Much research has been done to improve the original proposal [2], [6]-[10], [14]. In this paper, we propose a fast algorithm for the problem of chaining the transistor pairs using a minimum number of chains. The input of our algorithm is a CMOS circuit schematic at the transistor level. The output from the algorithm is a minimum set of chains, where each chain can be realized using only one P-type diffusion strip and one N-type diffusion strip.
We group transistors into pairs, with each pair consisting of a P-type and an N-type transistor, and then model the possible abutments between the pairs as a bipartite graph. On the graph, a depth-first search algorithm is used to find a maximum set of edges, which corresponds to a maximum number of realizable abutments. A tight upper bound on the number of realizable abutments, and hence a lower bound on the number of chains needed for an optimal solution, is derived. Theorems are proven that help reduce the size of the search tree.
In the next section, we survey some previous work. In Section III, we present the bipartite graph model. Section IV defines some terminology and derives a number of theorems which speed up the search process. A theoretical lower bound on the number of chains in an optimal solution is derived in Section V. Section VI describes the algorithm. Section VII presents our implementation and some experimental results. Concluding remarks and future work are discussed in Section VIII.
0278-0070/90/0700-0781$01.00 © 1990 IEEE
II. PREVIOUS WORK
A heuristic method for finding a good but not necessarily optimal solution was proposed in [12]. A CMOS gate is represented by two multigraphs (one for the P-network and the other for the N-network), where each vertex corresponds to a source/drain connection and each edge represents a transistor. The objective is to minimize the number of dual Euler paths (a P- and an N-Euler path that have identical labeling sequences) needed to cover all the transistors once.
Wimmer et al. [14] proposed an exhaustive method to solve the chaining problem in three steps: 1) transistor pairing, 2) chain formation, and 3) chain covering. Starting with chains of length 1 (i.e., an individual pair), it iteratively forms chains of length i by abutting a pair to a chain of length i - 1 until none of the chains can grow longer. A minimal set of chains that covers all the transistor pairs exactly once is then found by solving a minimum covering problem. Since the minimum covering problem is itself NP-complete, it is transformed into a maximal clique finding problem and then solved by using a heuristic method.
Attempts were made to apply rule-based programming techniques to the problem. TOPOLOGIZER [6] used very simple exchanging rules to achieve good results for small-sized problems. PLAY [7] solved the problem using a pattern-matching paradigm where useful patterns were extracted and stored in a knowledge base.
III. THE BIPARTITE GRAPH MODEL
This section describes a bipartite graph model for representing the possible abutments between pairs. If the input circuit is fully complementary, the members of a pair are the dual of each other. Otherwise, a pairing procedure to be described later is needed.
A transistor, $t$, is denoted as a quadruple (...).
We represent the possible abutments between pairs as a bipartite graph $G = (V_p \cup V_n, E)$. Each vertex in $V_p$ ($V_n$) corresponds to a set of pairwise abutable P-type (N-type) transistors, that is, transistors that share the same P-(N-)type drain/source diffusion net. An edge exists between two vertices if and only if an abutment is possible between two pairs which are composed of the transistors represented by these two vertices. The formal descriptions of $V_p$, $V_n$, and $E$ are:
$V_p = \{ v_p^i \mid v_p^i = \{ t_p \mid D(t_p) = i \text{ or } S(t_p) = i \} \}$, $V_n = \{ v_n^j \mid v_n^j = \{ t_n \mid D(t_n) = j \text{ or } S(t_n) = j \} \}$, and $E = \{ e_{ij}^{kl} \mid \{ t_p(p_k), t_p(p_l) \} \subseteq v_p^i, \{ t_n(p_k), t_n(p_l) \} \subseteq v_n^j, \text{ and ABUTABLE}(p_k, p_l) \}$.
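The construction of $V_p$, $V_n$, and $E$ can be sketched in Python as follows. This is a minimal illustration, not the authors' C implementation. It assumes that each pair is given only by the drain/source nets of its P-type and N-type transistor, and that ABUTABLE holds whenever two pairs share a P-net and an N-net; the function name `build_abutment_graph` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_abutment_graph(pairs):
    """Build the bipartite abutment graph (assumed data layout).

    pairs: list of (p_nets, n_nets); each entry is the (drain, source)
    net names of the P-type and the N-type transistor of one pair.
    Returns (vp, vn, edges). vp and vn map a net name to the set of pair
    indices whose P-/N-type transistor touches that net. edges holds
    tuples (i, j, k, l): pairs k and l can abut on P-net i and N-net j.
    """
    vp, vn = defaultdict(set), defaultdict(set)
    for k, (p_nets, n_nets) in enumerate(pairs):
        for net in p_nets:
            vp[net].add(k)
        for net in n_nets:
            vn[net].add(k)

    edges = set()
    for k, l in combinations(range(len(pairs)), 2):
        shared_p = set(pairs[k][0]) & set(pairs[l][0])
        shared_n = set(pairs[k][1]) & set(pairs[l][1])
        # Assumption: two pairs are abutable when both rows share a net.
        for i in shared_p:
            for j in shared_n:
                edges.add((i, j, k, l))
    return dict(vp), dict(vn), edges


# Two-input NAND: parallel P-transistors, series N-transistors.
nand_pairs = [(("vdd", "out"), ("out", "x")), (("vdd", "out"), ("x", "gnd"))]
nand_vp, nand_vn, nand_edges = build_abutment_graph(nand_pairs)
```

For the NAND example, the two pairs can abut with either `vdd` or `out` shared on the P-side and `x` shared on the N-side, so the graph has two edges.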
IV. DEFINITIONS AND THEOREMS
Definition 1: An essential abutment is an abutment which must appear in any solution derived from the current node of the search tree. In the bipartite graph, an edge that represents an essential abutment is an essential edge.
Definition 2: Two possible abutments are mutually exclusive if at most one of them can be in any solution. Otherwise, they are compatible. Two edges that represent two mutually exclusive possible abutments, respectively, are mutually exclusive edges.
Definition 3: The set of edges which are mutually exclusive with edge $e_{ij}^{kl}$ is denoted as $\mathrm{XOR}(e_{ij}^{kl})$.
Definition 4: The X function, $X(e_i, e_j)$, returns TRUE if $e_i \in \mathrm{XOR}(e_j)$, and FALSE otherwise. The C function is the complement of X.
Definition 5: A set of edges $\{ e_i \mid i = 1, 2, \ldots, n \}$ forms a mutually exclusive class (MEC) if $X(e_i, e_j)$ is TRUE for $i, j = 1, 2, \ldots, n$ and $i \neq j$.
Definition 6: A set of edges $\{ e_i \mid i = 1, 2, \ldots, n \}$ forms a mutually compatible class (MCC) if $C(e_i, e_j)$ is TRUE for $i, j = 1, 2, \ldots, n$ and $i \neq j$.
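Definitions 3 to 6 translate directly into small predicates. The sketch below assumes the XOR sets are already available as a dictionary `xor` mapping each edge to the set of edges mutually exclusive with it, and that this relation is symmetric; the edge labels in the example are arbitrary placeholders, not edges from Fig. 1.

```python
from itertools import combinations


def X(xor, a, b):
    """Definition 4: TRUE if edge a belongs to XOR(b)."""
    return a in xor[b]


def C(xor, a, b):
    """Complement of X: TRUE if the two edges are compatible."""
    return not X(xor, a, b)


def is_mec(xor, edges):
    """Definition 5: every two distinct edges are mutually exclusive."""
    return all(X(xor, a, b) for a, b in combinations(edges, 2))


def is_mcc(xor, edges):
    """Definition 6: every two distinct edges are compatible."""
    return all(C(xor, a, b) for a, b in combinations(edges, 2))


# Placeholder example: a conflicts with b, and b conflicts with c.
example_xor = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

Because the relation is assumed symmetric, checking each unordered pair once is enough. In the placeholder example, {a, b} is an MEC, {a, c} is an MCC, and {a, b, c} is neither.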
Lemma 1: Two edges, $e_{ij}^{kl} \in E(G)$ and $e_{i'j'}^{k'l'} \in E(G)$, are mutually exclusive ...
Proof: In Uehara and van Cleemput's layout style, at most two same-type transistors can share a common diffusion. Therefore, if two abutments involve a common diffusion, at most one of them can be realized.
Q.E.D.
For example, X ( e ; i , e:;) = TRUE and XOR ( e : ; ) = {e::, e g , e::} in Fig. 1(b).
Lemma 2: An edge, $e_{ij}^{kl} \in E(G)$, is essential if $\mathrm{XOR}(e_{ij}^{kl}) = \emptyset$.
Proof: If its mutually exclusive set is empty, an edge can be included in any solution without preventing other edges from being selected. Therefore, any solution without it will not be an optimal one. Hence, it is essential.
Q.E.D.
V. A LOWER BOUND ON THE NUMBER OF CHAINS
This section first derives an upper bound on the number of realizable abutments in a solution. The lower bound on the number of chains needed in an optimal solution then follows.
Theorem 1: A P-type vertex, $v_p^i \in V_p$, can contribute to at most $\min(\lfloor |v_p^i| / 2 \rfloor, \deg(v_p^i))$ realizable P-diffusion abutments. The same holds for any N-type vertex.
Proof: Since the transistors in a vertex $v$ are those whose drain or source is connected to the same net, the maximum number of abutments that can occur at that net is $\lfloor |v| / 2 \rfloor$. Also, the number of abutments cannot be greater than $\deg(v)$ because each edge represents only one possible abutment. Therefore, a vertex can contribute to a solution at most $\min(\lfloor |v| / 2 \rfloor, \deg(v))$ abutments.
We can easily derive two corollaries from the above Theorem as follows. 
Corollary 2: The lower bound on the number of chains needed in an optimal solution is $\max(1, \#pairs - \min(P_{abut}, N_{abut}))$.
For example, the lower bound on the number of chains for the circuit in Fig. 1 can be derived as follows. According to Corollary 2, the optimal solution cannot have fewer than $\max(1, \#pairs - \min(P_{abut}, N_{abut})) = \max(1, 5 - \min(5, 4)) = 1$ chain.
The only situation in which $\#pairs - \min(P_{abut}, N_{abut}) < 1$ is when the solution is a single circular chain.
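The bound of Corollary 2 is simple arithmetic and can be sketched as below. The helper `max_abutments` is an assumption: it takes $P_{abut}$ (or $N_{abut}$) to be the sum of the per-vertex bounds of Theorem 1 over one side of the graph, which is one natural reading but is not confirmed by the surviving text. Only `chain_lower_bound` follows the stated formula directly.

```python
def max_abutments(vertices):
    """Assumed aggregate of Theorem 1 over one side of the graph.

    vertices: iterable of (size, degree) tuples, one per vertex, where
    size is the number of transistors in the vertex and degree is the
    number of incident edges.
    """
    return sum(min(size // 2, degree) for size, degree in vertices)


def chain_lower_bound(num_pairs, p_abut, n_abut):
    """Corollary 2: max(1, #pairs - min(Pabut, Nabut))."""
    return max(1, num_pairs - min(p_abut, n_abut))


# Values quoted in the text for the circuit of Fig. 1.
fig1_bound = chain_lower_bound(5, 5, 4)
```

With the values quoted for Fig. 1 (5 pairs, $P_{abut} = 5$, $N_{abut} = 4$), the function returns 1, matching the worked example.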
VI. THE OPTIMAL CHAINING ALGORITHM
The problem we are concerned with is to chain all the transistor pairs using a minimum number of chains. In other words, we are to minimize the number of diffusion separation gaps. Our algorithm consists of three subtasks: pairing, bipartite graph construction, and chaining, as shown in Fig. 2. Its input is a CMOS cell circuit schematic at the transistor level, and its output is a minimum set of chains.
The pairing procedure groups the transistors into pairs, where each pair consists of a P-type transistor and an N-type transistor. Since the two transistors within a pair will be placed in the same column, it is desirable to pair the transistors in such a way that terminals of the same signal are vertically aligned as much as possible. For example, a dual pair of P- and N-type transistors within a CMOS compound (complex) gate should be paired because they share the same gate signal. Also, the two constituent transistors of a CMOS transmission gate are good candidates for pairing because both their drains and sources are aligned.
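One simple way to realize the gate-signal criterion is a greedy match on the gate net. The sketch below is only an illustration under that assumption: it ignores the drain/source alignment criterion mentioned for transmission gates, and the name `pair_by_gate` and the `(name, gate)` tuple layout are hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def pair_by_gate(p_transistors, n_transistors):
    """Greedily pair P- and N-type transistors that share a gate signal.

    Each transistor is a (name, gate) tuple. Returns (pairs, unpaired),
    where pairs is a list of (p_name, n_name) and unpaired lists the
    names of transistors left without a partner.
    """
    n_by_gate = defaultdict(list)
    for name, gate in n_transistors:
        n_by_gate[gate].append(name)

    pairs, unpaired = [], []
    for name, gate in p_transistors:
        if n_by_gate[gate]:
            # Take the first unused N-type transistor on the same gate.
            pairs.append((name, n_by_gate[gate].pop(0)))
        else:
            unpaired.append(name)
    # Any N-type transistors still waiting have no P-type partner.
    unpaired.extend(n for names in n_by_gate.values() for n in names)
    return pairs, unpaired


demo_pairs, demo_unpaired = pair_by_gate(
    [("p1", "a"), ("p2", "b"), ("p3", "c")],
    [("n1", "b"), ("n2", "a")],
)
```

In the demo, `p1` pairs with `n2` (gate `a`), `p2` pairs with `n1` (gate `b`), and `p3` is left unpaired.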
After pairing is done, a bipartite graph model as described in Section III is constructed using a matrix representation. In the graph, the set of edges represents all possible abutments between pairs.
ALGORITHM

G' = G; E(G') = E(G') - {e} - XOR(e);
chaining(G', B');
Ψ = Ψ - {e};
end;
end;
Fig. 3. The chaining algorithm.
Our objective is to find a maximum subset of mutually compatible elements among the set of all possible abutments. A pseudocode description of the proposed depth-first search algorithm is shown in Fig. 3. In the algorithm, $G$ and $B$ are the bipartite graph and the set of chosen abutments, respectively. $G$ is initially set to the bipartite graph while $B$ is set to $\emptyset$. If there are essential edges in $E(G)$, we confirm them and remove them from $E(G)$. Then, the tree is explored by selecting a set of edges $\Psi$ in $E(G)$ for expansion at this level. An edge $e_i$ and its mutually exclusive edge set $\mathrm{XOR}(e_i)$ constitute the edge set $\Psi$. The edges in $\Psi$ are queued in descending order according to the upper bound on the number of abutments after they are selected. Once an edge, $e_{ij}^{kl}$, is selected and confirmed, the reduced graph, $G'$ (with $e_{ij}^{kl}$ and $\mathrm{XOR}(e_{ij}^{kl})$ removed from $G$), and the confirmed set, $B'$ (with $e_{ij}^{kl}$ added to $B$), are passed down to the next level. This process continues in a depth-first fashion.
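The search just described can be sketched in Python as follows. This is a simplified illustration and not a transcription of Fig. 3: edges are opaque labels, the XOR sets are supplied as a symmetric dictionary `xor`, the expansion edge is the one with the fewest remaining exclusive edges, and pruning uses the trivial bound "confirmed edges plus remaining edges" in place of the bound from Theorem 1. All names are hypothetical.

```python
def chaining(edges, xor, chosen=frozenset(), best=frozenset()):
    """Depth-first search for a maximum set of mutually compatible edges.

    edges:  edges still available at this node of the search tree
    xor:    dict mapping each edge to the set of edges exclusive with it
    chosen: edges confirmed on the path from the root (the set B)
    best:   largest compatible set found so far
    """
    edges = set(edges)
    chosen = set(chosen)

    # Lemma 2: an edge with no remaining exclusive edge is essential.
    for e in [e for e in edges if not (xor[e] & edges)]:
        chosen.add(e)
        edges.discard(e)

    if not edges:
        return frozenset(chosen) if len(chosen) > len(best) else best

    # Simplified pruning (assumption): even taking every remaining edge
    # could not beat the best solution found so far.
    if len(chosen) + len(edges) <= len(best):
        return best

    # Expand an edge and its mutually exclusive set (the set Psi).
    e = min(edges, key=lambda x: (len(xor[x] & edges), x))
    psi = [e] + sorted(xor[e] & edges)
    for f in psi:
        reduced = edges - {f} - xor[f]
        best = chaining(reduced, xor, chosen | {f}, best)
    return best


# Placeholder conflict structure: a-b and b-c are mutually exclusive.
path_xor = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
path_solution = chaining(path_xor.keys(), path_xor)
```

For the placeholder structure, the search first expands `a`, after which `c` becomes essential, giving the compatible set {a, c}; the branch expanding `b` cannot improve on it.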
We prune the queue when its head has an upper bound on the number of realizable abutments no greater than that of the best solution returned so far. The pruning according to the upper bound on the number of realizable abutments is based on Theorem 1. We now show, in the following theorems, the correctness of exploring just a limited set (an edge and its XOR set) at each node.
Theorem 2: The algorithm will not miss any solutions by expanding out of each node an edge and its mutually exclusive set.
Proof: Because the order of confirmation has no effect on the chaining, we only need to prove that for any two edges, $e'$ and $e''$, if $C(e', e'')$, they must come down the tree from the same branch. Hence, the maximum pairwise compatible set of edges, which constitutes the optimal chaining solution, will appear in the same leaf node.
Let the expansion edges at a particular node be $e$ and $\mathrm{XOR}(e)$, and $\Psi = \{ e \} \cup \mathrm{XOR}(e)$.
If $\{ e', e'' \} \cap \Psi \neq \emptyset$, $e'$ and $e''$ will come down from the same branch.
The only situation that prevents $e'$ and $e''$ from coming down the same branch is when $\{ e', e'' \} \cap \Psi = \emptyset$, $\mathrm{XOR}(e') \supseteq \Psi'$, $\mathrm{XOR}(e'') \supseteq \Psi''$, and $\Psi' \cup \Psi'' = \Psi$. Now, we prove this situation will never occur.
Assume that the above situation exists. Then there must be $e \in \Psi'$ or $e \in \Psi''$. In the case $e \in \Psi'$, $e \in \mathrm{XOR}(e')$; therefore, we have $e' \in \mathrm{XOR}(e)$, which contradicts $\{ e', e'' \} \cap \Psi = \emptyset$. Similarly, if $e \in \Psi''$ then $e \in \mathrm{XOR}(e'')$; therefore, we have $e'' \in \mathrm{XOR}(e)$, which also contradicts $\{ e', e'' \} \cap \Psi = \emptyset$.
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, VOL. 9, NO. 7, JULY 1990
Theorem 3: The algorithm will produce redundant solutions if there are $e_i \in \mathrm{XOR}(e)$ and $e_j \in \mathrm{XOR}(e)$ such that $C(e_i, e_j)$.
Proof: If $C(e_i, e_j)$, there must be an edge set $E \supseteq \{ e_i, e_j \}$ such that $E$ is an MCC. Due to Theorem 2, $E$ must appear in solutions derived from both the branch expanding with $e_i$ and the branch expanding with $e_j$. Therefore, the redundancy exists.
Theorem 4: There will be no redundant solutions from the current node if the set of expansion edges $e$ and $\mathrm{XOR}(e)$ is an MEC.
Proof: If $\{ e \} \cup \mathrm{XOR}(e)$ is an MEC, the edge expanded from one branch will not go down via other branches. Therefore, there will be no redundancy introduced.
Q.E.D.
Fig. 4 shows how the algorithm works on the circuit in Fig. 1. The search process starts from the root, where $G_1 = G$ (the original bipartite graph) and $B_1 = \emptyset$. Since there is no essential edge in $G_1$, we form an edge set $\Psi$ which consists of the edge with the minimum number of mutually exclusive edges and the single member of its XOR set. The first level of the search tree therefore consists of two branches. Along the first branch, we add the selected edge to $B_1$ to form a new binding $B_2$ and delete it and its XOR set from $G_1$ to form $G_2$. Along the second branch, we add the other edge to $B_1$ to form $B_7$ and delete it and its XOR set (three edges) from $G_1$ to form $G_7$. By applying the recursive procedure, we obtain the whole search tree (Fig. 4(a)). Fig. 4(c) shows the matrix representations of $G_i$ and $B_i$ of the search tree. Fig. 4(b) shows the search tree when the theoretical lower bound is applied to prune the search. The tree is totally skewed because the solution, $S_1$, happens to be optimal.
VII. IMPLEMENTATION AND EXPERIMENTS
We have implemented the proposed algorithm using the C programming language. Table I lists the results. In the table, column 2 shows the number of transistors of each case. Column 3 denotes the total number of edges of the initial bipartite graph G. Column 4 is the theoretical lower bound on the number of chains. Column 5 shows the number of chains obtained by our algorithm. All of them except that of case 3 have optimal solutions with the number of chains equal to the lower bound. This shows that our lower bound on the number of chains needed is indeed very tight.
Columns 6 and 7 show the tree size (the number of nodes expanded) and the computation time (CPU seconds), respectively. Note that the tree size is not necessarily proportional to the circuit size (e.g., case 4) due to the presence of essential abutments. From the experiments, we conclude that our algorithm is able to find solutions almost instantly for all the test cases from the literature. Fig. 5 shows the circuit, the chaining result, and the symbolic layout of Case 9 (a static complementary full adder).
VIII. CONCLUSION AND FUTURE WORK
We have proposed a fast algorithm for the optimal transistor-chaining problem in CMOS functional cell layout. We have developed a bipartite graph model to represent the possible abutments between P-N transistor pairs. A depth-first tree search algorithm is used to find a minimum number of chains.
According to the bipartite graph model, a lower bound on the number of chains needed in an optimal solution has been derived. Experiments have shown that this bound is extremely tight. Theorems are proven to help eliminate unnecessary searching. The algorithm has been successfully implemented and tested. Excellent experimental results have been reported. With this algorithm, we can solve a large-sized problem of more than one hundred transistors ...
Currently, our algorithm does not exploit the logic-equivalence property as [2] did. Instead, it assumes that the given topology is fixed. In some applications where circuit performance is not as critical as silicon area consumption, our algorithm may produce inferior results. We are investigating the possibility of enhancing the graph model to accommodate this situation.
Our assumption that the number of P-type transistors must equal the number of N-type transistors may prevent our algorithm from being applied to circuits in which these numbers differ. One possibility is to add dummy transistors before the algorithm is run. Another possibility is to leave the singular transistors (those that cannot be paired) untouched during the chaining process and reinsert them afterward.
