Pass-Transistor Mapper (PTM), a logic synthesis tool specifically designed for a pass-transistor logic library that has only three basic cells, is reported. It exploits the close relationship between the BDD representation of logic and the structure of pass-transistor logic cells to ensure efficient technology mapping. BDD variable order optimization is achieved through a genetic algorithm with dynamic parameters. Unlike LEAP, the only previously reported system for pass-transistor logic, PTM integrates both synthesis and logic optimization in one step and can be used for large logic functions. Results of using PTM on a large set of benchmarks are compared with those from Berkeley's SIS using the MCNC CMOS cell library and are found to be promising.
Introduction
It has been demonstrated that CMOS pass-transistor logic can often yield high-speed, high-density circuits. For example, using pass-transistor logic, a 3.8-ns 0.5-μm CMOS 16×16-b multiplier was implemented in 1990 [1], a 1.5-ns 0.25-μm 32-b CMOS ALU was developed in 1993 [2], and a 4.4-ns 0.25-μm CMOS 54×54-b multiplier was designed in 1995 [3].
The most important difference between conventional CMOS cells and pass-transistor cells is that in conventional CMOS logic the inputs can only be connected to transistor gates, and all inputs are symmetrical. In pass-transistor logic, inputs can be connected to both gates and drains, and changing the input configuration yields different Boolean functions. Pass-transistor circuits are therefore more flexible. Unfortunately, traditional state-of-the-art logic synthesis tools cannot be used for pass-transistor logic design if its full potential is to be exploited. Owing to the lack of automatic design tools for general logic functions, pass-transistor logic is currently restricted mainly to the design of arithmetic macros [1–4]. A research group at Hitachi proposed a synthesis package for pass-transistor logic, LEAP (Lean Integration with Pass-transistors) [5]. LEAP uses a very simple pass-transistor cell library (Figure 1), in which Y1, Y2 and Y3, which are essentially multiplexers, form the basic logic cells. Simple inverters with different drive capabilities complete the library. The core idea of LEAP is (a) to express the required logic function as a reduced, shared BDD [6], and (b) to partition the BDD into smaller trees, each of which can be mapped onto one of the basic cells in the library. Although LEAP shows very positive results for small designs, it falls short in two respects: (1) it has no optimization capability, and (2) it cannot handle large logic functions.

¹ The authors are with the Dept. of Electrical and Electronic Engineering, Imperial College London, UK, SW7 2AZ.
In this paper, we describe an improved pass-transistor synthesis tool called Pass-Transistor Mapper (PTM). In PTM, we use the same cell library as [5], where Y1, Y2 and Y3 are used as the essential logic cells and inverters are used to realize the negation of input variables. Compared with LEAP, PTM has the following improved features:
(1) it employs optimized ROBDDs [7–9]; (2) it can handle large logic functions; (3) it exploits state-of-the-art logic synthesis techniques for pass-transistor technology mapping.
PTM is a sophisticated package composed of a group of algorithms. BDD ordering using a Genetic Algorithm is explained in Section 2. Pass-transistor technology mapping using Boolean matching and a greedy covering algorithm is proposed in Section 3. The inverter optimization algorithm, the netlist compiling algorithm and the PTM package implementation are described in Section 4. Section 5 discusses the test results for a large set of MCNC benchmarks, using Mentor Graphics' GDT environment for placement, routing and verification, and conclusions are given in Section 6.
2 BDD size minimization using Genetic Algorithm
Problem Formulation
It is well known that multiplexer-based structures are directly related to BDDs, and finding a good data-select variable ordering for a multiplexer network is equivalent to finding a good variable ordering for a BDD [10, 11]. Since the reduced ordered BDD (ROBDD) has been shown to be an efficient representation of Boolean functions for logic synthesis and verification [6], it is employed in our algorithm. The first step of our synthesis algorithm is to minimize the size of the ROBDD; the ROBDD is then mapped onto the pass-transistor cell library.
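To make the multiplexer/BDD correspondence concrete, the following minimal Python sketch (an illustration, not code from PTM) evaluates a BDD by treating every non-terminal vertex as a 2:1 multiplexer whose data-select input is that vertex's variable. The tuple encoding `(var, low, high)` and the example function f = x0·x1 + x2 are assumptions chosen for illustration.

```python
from itertools import product

# Illustrative encoding (an assumption): a non-terminal vertex is (var, low, high),
# where low/high are child vertices or the terminal constants 0 and 1.

def mux(sel, low, high):
    """2:1 multiplexer: pass `high` when the data-select input is 1, else `low`."""
    return high if sel else low

def evaluate(node, x):
    """Evaluate a BDD by treating every non-terminal vertex as a multiplexer."""
    if isinstance(node, int):          # terminal vertex 0 or 1
        return node
    var, low, high = node
    return mux(x[var], evaluate(low, x), evaluate(high, x))

# f = x0*x1 + x2 under the order (x0, x1, x2); the x2 vertex is shared
X2 = (2, 0, 1)
X1 = (1, X2, 1)
ROOT = (0, X2, X1)

for x in product((0, 1), repeat=3):
    print(x, evaluate(ROOT, x))
```

Each vertex maps to exactly one multiplexer, so the number of non-terminal vertices is also the number of 2:1 multiplexers before any cell-level merging.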
Definition 1: An OBDD is a rooted directed graph $G = (V, E)$. The vertex set $V$ is composed of two kinds of vertices, non-terminal and terminal. Each non-terminal vertex $v$ has as attributes an index $\mathrm{index}(v) \in \{1, 2, \ldots, n\}$ pointing to an input variable in the set $\{x_1, x_2, \ldots, x_n\}$, and two children $\mathrm{low}(v), \mathrm{high}(v) \in V$. Each terminal vertex $v$ has as attribute a value $\mathrm{value}(v) \in \{0, 1\}$.

Previous research indicated that GA methods can produce better results with reasonable CPU time for most benchmark circuits. In this paper, a novel GA [8] that uses dynamic parameters is employed to search for a near-optimal variable ordering.
Definition 4: For any Boolean function $f(x_1, x_2, \ldots, x_n)$, the ordering of the decision variables from the root to the terminals of the data structure can be defined by a vector $\mathit{Order}_n = \{x_{k_1}, x_{k_2}, \ldots, x_{k_n}\}$.
For a given $\mathit{Order}_n$, the size of the ROBDD, $\mathit{BDD\_Size}$, is the number of non-terminal vertices in the data structure. Our goal is to find a good variable ordering $\{x_{k_1}, x_{k_2}, \ldots, x_{k_n}\}$ for a given function $f(x_1, x_2, \ldots, x_n)$ such that $\mathit{BDD\_Size}$ is minimized.
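The dependence of BDD_Size on Order_n can be checked on small functions with the Python sketch below. It is an illustration, not PTM's implementation: for each level it counts the distinct cofactors (obtained by fixing the variables above that level) that still depend on that level's variable, which equals the number of ROBDD vertices labelled with it. The brute-force truth-table enumeration restricts it to a handful of variables; variables are indexed from 0.

```python
from itertools import permutations, product

def robdd_size(f, n, order):
    """BDD_Size: number of non-terminal ROBDD vertices of f under `order`.

    Vertices labelled order[level] correspond one-to-one to the distinct
    cofactors of f (after fixing order[:level]) that depend on order[level].
    """
    size = 0
    for level in range(n):
        fixed, rest = order[:level], order[level:]
        cofactors = set()
        for a in product((0, 1), repeat=level):
            x = [0] * n
            for v, val in zip(fixed, a):
                x[v] = val
            table = []
            for b in product((0, 1), repeat=len(rest)):  # rest[0] varies slowest
                for v, val in zip(rest, b):
                    x[v] = val
                table.append(f(x))
            half = len(table) // 2
            if table[:half] != table[half:]:  # cofactor depends on order[level]
                cofactors.add(tuple(table))
        size += len(cofactors)
    return size

def best_order_exhaustive(f, n):
    """Brute force over all n! orders; only feasible for very small n."""
    return min(permutations(range(n)), key=lambda o: robdd_size(f, n, o))

def f(x):  # f = x0*x1 + x2*x3 + x4*x5
    return (x[0] & x[1]) | (x[2] & x[3]) | (x[4] & x[5])

print(robdd_size(f, 6, (0, 1, 2, 3, 4, 5)))  # 6
print(robdd_size(f, 6, (0, 2, 4, 1, 3, 5)))  # 14
```

Keeping each product pair adjacent gives 6 vertices, while interleaving the pairs gives 14. Exhaustive search over all n! orders quickly becomes infeasible as n grows, which is why a heuristic search such as a GA is used instead.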
The Dynamic Parameter Genetic Algorithm
A near-optimal variable ordering for building the BDD is found with a Genetic Algorithm (GA), details of which can be found in [18]. A major factor that affects the performance of GAs is the size of the population from which new generations are evolved. A small population tends to confine the search to a limited area. A large population spreads the search over a wider area but reduces the chance of producing good offspring, as a result of the diversification of the parents.
Mutation is another important mechanism in GAs; it prevents the search from being trapped in a local minimum. However, too high a mutation rate can cause the algorithm to escape from solution areas that are already close to the global optimum, while too low a rate might leave it trapped in local minima.
In the new algorithm, we propose to use a dynamic population size and mutation rate; the constants in their update rules are determined experimentally. Whenever an improvement is found, the algorithm is exploring an area that is already promising. It is therefore reasonable to concentrate the search effort and avoid enlarging the exploration space, and also to reduce the chance of escaping from this area. If no improvement is found, on the other hand, it is worth enlarging the search area by increasing both the population size and the mutation rate, to provide a better chance of escaping local minima. The algorithm therefore avoids both too small and too large a search area, and both too many and too few random mutations.
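The update equations for the population size and mutation rate are not reproduced in this text, so the following Python sketch of a dynamic-parameter GA for BDD variable ordering uses hypothetical rules: after an improvement both parameters are multiplied by `shrink`, otherwise by `grow`, within the bounds `pop_min`/`pop_max` and `mut_min`/`mut_max`. Order crossover, swap mutation, truncation selection, elitism, and a stopping rule that extends the generation budget by `patience` after each improvement are likewise illustrative assumptions, not the operators of [8] or [18]. The fitness is BDD_Size, computed by brute-force cofactor counting, so the sketch only suits small functions.

```python
import random
from functools import lru_cache
from itertools import product

def robdd_size(f, n, order):
    """Count distinct cofactors (after fixing order[:level]) that depend on order[level]."""
    size = 0
    for level in range(n):
        fixed, rest = order[:level], order[level:]
        cofactors = set()
        for a in product((0, 1), repeat=level):
            x = [0] * n
            for v, val in zip(fixed, a):
                x[v] = val
            table = []
            for b in product((0, 1), repeat=len(rest)):
                for v, val in zip(rest, b):
                    x[v] = val
                table.append(f(x))
            half = len(table) // 2
            if table[:half] != table[half:]:
                cofactors.add(tuple(table))
        size += len(cofactors)
    return size

def order_crossover(p1, p2, rng):
    """OX: copy a slice of p1, fill the other positions in p2's relative order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    segment = set(p1[i:j + 1])
    fill = (g for g in p2 if g not in segment)
    return [g if g is not None else next(fill) for g in child]

def dynamic_ga(f, n, seed=0, pop_init=8, pop_min=4, pop_max=64,
               mut_init=0.2, mut_min=0.05, mut_max=0.9,
               shrink=0.8, grow=1.25, patience=10, max_gen=300):
    rng = random.Random(seed)

    @lru_cache(maxsize=None)
    def cost(order):                       # order is a tuple -> BDD_Size
        return robdd_size(f, n, order)

    pop = [tuple(rng.sample(range(n), n)) for _ in range(pop_init)]
    best = min(pop, key=cost)
    size, mut = pop_init, mut_init
    deadline, gen = patience, 0            # hypothetical adaptive stopping rule
    while gen < min(deadline, max_gen):
        gen += 1
        ranked = sorted(pop, key=cost)
        parents = ranked[:max(2, len(ranked) // 2)]   # truncation selection
        children = [best]                              # elitism
        while len(children) < size:
            a, b = rng.sample(parents, 2)
            child = order_crossover(a, b, rng)
            if rng.random() < mut:                     # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(tuple(child))
        pop = children
        gen_best = min(pop, key=cost)
        if cost(gen_best) < cost(best):    # improvement: concentrate the search
            best = gen_best
            size = max(pop_min, int(size * shrink))
            mut = max(mut_min, mut * shrink)
            deadline = gen + patience
        else:                              # no improvement: widen the search
            size = min(pop_max, int(size * grow) + 1)
            mut = min(mut_max, mut * grow)
    return list(best), cost(best)

def f(x):  # f = x0*x1 + x2*x3 + x4*x5
    return (x[0] & x[1]) | (x[2] & x[3]) | (x[4] & x[5])

order, bdd_size = dynamic_ga(f, 6)
print(order, bdd_size)
```

Memoizing the fitness keeps repeated orders cheap; with these hypothetical settings the result is a good order but is not guaranteed to be optimal.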
In conventional GA implementations, the algorithm is terminated either after a fixed number of generations or when no improvement is found within a fixed number of iterations. In our algorithm, the stopping criterion is instead allowed to adapt to the improvements found.

A greedy covering algorithm is employed in this package. For Boolean covering, the subject graph is not required to be a tree; a rooted DAG suffices. There is always at least one match for each vertex, because the library contains a multiplexer, which corresponds directly to a BDD vertex. When multiple matches exist at a given vertex, the algorithm selects the largest covering match, because, for the library cells in Figure 1, the larger the cell chosen, the more efficient it is in area and delay. The selection priority in the pass-transistor library is therefore Y3, Y2, then Y1. For example, vertex j in Figure 2 has three matches. Match 1, which corresponds to the Y1 cell, is smaller than matches 2 and 3, so it is ignored. Matches 2 and 3 are the same size, and the algorithm arbitrarily selects one of them. The complexity of the algorithm is linear in the size of the subject graph.
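The greedy, biggest-match-first idea can be sketched on a multiplexer DAG in Python as follows. Because Figure 1 is not reproduced here, the cell patterns are a hypothetical simplification: Y1 is taken to cover a single multiplexer vertex, Y2 a vertex plus one child, and Y3 a vertex plus both children, and a child may only be absorbed into its parent's cell when it has a single fanout and is not itself a circuit output. Each vertex is processed at most once, so the traversal stays linear in the size of the subject graph.

```python
from collections import Counter

# Hypothetical subject graph: vertex -> (select_var, low, high). Children that
# are not keys of the dict are leaves (constants or primary inputs).

# Simplified, hypothetical cell patterns: number of absorbed children -> cell
CELL_BY_ABSORBED = {0: "Y1", 1: "Y2", 2: "Y3"}

def greedy_cover(dag, roots):
    """Biggest-match-first covering of a rooted multiplexer DAG."""
    fanout = Counter(c for _, lo, hi in dag.values() for c in (lo, hi) if c in dag)
    outputs = set(roots)
    cover, done, pending = [], set(), list(roots)
    while pending:
        v = pending.pop()
        if v in done or v not in dag:      # already covered, or a leaf
            continue
        _, low, high = dag[v]
        # a child can be merged into v's cell only if v is its sole consumer
        absorbed = [c for c in (low, high)
                    if c in dag and fanout[c] == 1 and c not in outputs]
        covered = [v] + absorbed
        done.update(covered)
        cover.append((CELL_BY_ABSORBED[len(absorbed)], v, covered))
        # every input of the chosen cell must itself be covered later
        for u in covered:
            _, lo, hi = dag[u]
            pending.extend(c for c in (lo, hi) if c not in covered)
    return cover

# BDD of f = x1*x2 + x3*x4; vertex "c" is shared, so it must be a cell output
bdd = {"a": ("x1", "c", "b"), "b": ("x2", "c", "1"),
       "c": ("x3", "0", "d"), "d": ("x4", "0", "1")}
print(greedy_cover(bdd, ["a"]))
```

Shared vertices such as "c" are never absorbed, which avoids duplicating logic at the cost of sometimes choosing a smaller cell.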
PTM Implementation
After covering, a gate interconnection netlist based on the pass-transistor library has to be generated. To further optimize the synthesised circuit and to provide a suitable interface with external physical design tools, such as Mentor Graphics' GDT, the following synthesis and interface algorithms are used.
Inverter optimization
During the technology mapping described in the previous section, some redundant inverters are inevitably introduced. We use the algorithm red_inv_rem to remove these parallel and serial redundant inverters. If $i$ ($i > 1$) inverters fan out from the same node, the algorithm removes $i - 1$ parallel redundant inverters from the network. If an inverter's fan-in is also an inverter, the algorithm removes both inverters from the network. The most significant difference between red_inv_rem and reminv in UC Berkeley SIS [16] is that red_inv_rem avoids removing the buffer inverters associated with the pass-transistor library cells, because they are required to ensure correct drive capability in the circuit.
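The parallel and serial rules can be sketched in Python as follows. This is not the red_inv_rem source: the netlist encoding, the function name `remove_redundant_inverters`, and the `"buf_inv"` marker for protected buffer inverters are assumptions. One simplification to note: for a serial pair the sketch bypasses the outer inverter and deletes the inner one only once it has no remaining fanout, since the inner inverter may also drive other gates.

```python
def remove_redundant_inverters(gates, outputs):
    """Sketch of parallel/serial redundant-inverter removal.

    gates: name -> (kind, [fanin names]); kind "inv" is removable, while
    kind "buf_inv" marks a buffer inverter that must be kept.
    outputs: names of the signals that drive primary outputs.
    """
    gates = {name: (kind, list(fanins)) for name, (kind, fanins) in gates.items()}
    outputs = list(outputs)

    def redirect(old, new):
        # reconnect every consumer of `old` (gates and outputs) to `new`
        for _, fanins in gates.values():
            fanins[:] = [new if s == old else s for s in fanins]
        outputs[:] = [new if s == old else s for s in outputs]

    changed = True
    while changed:
        changed = False
        # parallel redundancy: keep one removable inverter per driving node
        kept = {}
        for name in list(gates):
            kind, fanins = gates[name]
            if kind != "inv":
                continue
            if fanins[0] in kept:
                redirect(name, kept[fanins[0]])
                del gates[name]
                changed = True
            else:
                kept[fanins[0]] = name
        # serial redundancy: inv(inv(x)) == x, so bypass the outer inverter
        for name in list(gates):
            kind, fanins = gates[name]
            src = fanins[0] if fanins else None
            if kind == "inv" and src in gates and gates[src][0] == "inv":
                redirect(name, gates[src][1][0])
                del gates[name]
                changed = True
        # sweep removable inverters whose fanout has become empty
        used = {s for _, fanins in gates.values() for s in fanins} | set(outputs)
        for name in [g for g, (k, _) in gates.items() if k == "inv" and g not in used]:
            del gates[name]
            changed = True
    return gates, outputs
```

Only gates of kind `"inv"` are ever merged, bypassed or swept, so buffer inverters survive even when they are logically parallel to an ordinary inverter.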
Phase optimization
The total number of inverters in the network can be further reduced by phase assignment. Here, we directly use the good_phase algorithm, which is available in UC Berkeley SIS [16], to perform phase optimization. good_phase determines, for each node, whether to implement the function or its complement so as to reduce the total number of inverters.
Interface algorithm
To establish a suitable interface with the commercial physical design software, Mentor Graphics' GDT, we use the algorithm interface_GDT. It converts the netlist based on the pass-transistor library into a netlist that GDT accepts: it transforms the netlist format "cell node_1 node_2 ... node_n" into the router file format "cell xx.node yy, cell xy.node yz, ..., cell zz.node xz" used by the automatic place-and-route package AutoCells.
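The core of this conversion is inverting a cell-oriented netlist (each instance lists the nets on its pins) into a net-oriented one (each net lists the instance pins it connects). A minimal Python sketch follows; the exact AutoCells router-file syntax is not given here, so numbering pins by position and the one-net-per-line text layout are assumptions for illustration.

```python
from collections import defaultdict

def to_net_centric(instances):
    """instances: iterable of (instance, cell, nets), nets[k] being the net on pin k+1."""
    nets = defaultdict(list)
    for inst, _cell, pins in instances:
        for pin, net in enumerate(pins, start=1):
            nets[net].append(f"{inst}.{pin}")
    return dict(nets)

def router_lines(nets):
    """Hypothetical text layout: one net per line, connections separated by ', '."""
    return [f"{net}: {', '.join(conns)}" for net, conns in nets.items()]

instances = [("U1", "Y1", ["s", "a", "b", "n1"]),
             ("U2", "INV", ["n1", "out"])]
for line in router_lines(to_net_centric(instances)):
    print(line)
```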
Verification
After synthesis, the network usually differs from the original, so it is necessary to verify the optimized network against the specification. We integrated Berkeley's formal verification algorithm, verify, into our package PTM.
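For small networks, the purpose of this check can be illustrated by exhaustive simulation, as in the Python sketch below. It is a swapped-in technique, not SIS's verify command: it enumerates all 2^n input vectors, so it is practical only for small n. The networks `good` and `bad` are hypothetical multiplexer mappings of f = x0·x1 + x2, the second deliberately faulty.

```python
from itertools import product

def find_counterexample(spec, impl, n):
    """Simulate both networks on all 2**n input vectors; return the first mismatch or None."""
    for x in product((0, 1), repeat=n):
        if spec(x) != impl(x):
            return x
    return None

def mux(sel, low, high):
    return high if sel else low

def spec(x):                # specification: f = x0*x1 + x2
    return (x[0] & x[1]) | x[2]

def good(x):                # mapped multiplexer network for f
    return mux(x[0], x[2], mux(x[1], x[2], 1))

def bad(x):                 # faulty mapping: loses x2 when x0 = 1
    return mux(x[0], x[2], mux(x[1], 0, 1))

print(find_counterexample(spec, good, 3))   # None
print(find_counterexample(spec, bad, 3))    # (1, 0, 1)
```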
The entire synthesis procedure for pass-transistor CMOS circuits is depicted in Figure 3. PTM includes all the algorithms within the rectangle and has been integrated into UC Berkeley's SIS environment. It accepts standard gate-level descriptions from HDL compilers, creates an optimized netlist based on pass-transistor cells, and generates data for the layout tools.
Results
The algorithms described in the previous sections were combined into PTM.
Since no other pass-transistor synthesis tool capable of handling general benchmark circuits has been published, the performance of PTM is compared with the state-of-the-art CMOS synthesis algorithms in UC Berkeley SIS. In SIS, a gate-level description is first optimized with the most robust technology-independent synthesis script, script.rugged. The optimized network is then mapped onto the MCNC CMOS standard-cell library, mcnc.genlib. good_phase is also used to perform phase optimization. To make the comparison fair, both the pass-transistor and the standard CMOS logic libraries were designed for the TSMC 0.6-μm process with a 5-V supply voltage.
Since the technology allows only two metal layers, metal1 is used for cell design, leaving metal2 for over-cell routing. The CMOS library has 19 different cells, whereas the pass-transistor cell library needs only 7, including inverters and buffers.
Although PTM does not use power dissipation as an optimisation parameter, a rough estimate of the power dissipation of the synthesised circuits was made. We concentrated on the capacitive switching power and neglected short-circuit, leakage and static power dissipation. The node activity of each input was assumed to be 50% and uncorrelated. Using the topology of the netlist, the activity of each cell output was calculated. The capacitive load on this output was derived from a database containing input-capacitance information for every cell type. These input capacitances include the internal capacitances of the cells, which are derived from a statistical analysis of their hidden internal nodes. The average power dissipation can be calculated using the following formula [17]:
$$P_{avg} = a_{0\rightarrow 1}\, C_L\, V_{dd}^2\, f_{clk}$$

where $a_{0\rightarrow 1}$ is the probability of a '0' to '1' transition on the node, $C_L$ is the load capacitance, and $V_{dd}$ and $f_{clk}$ are the supply voltage and operating frequency, respectively. Since $V_{dd}$ and $f_{clk}$ are constant for a given system, the average power of the entire circuit can be estimated by the activity-capacitance product:
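$$P_{avg} \propto \sum_{i} a_{0\rightarrow 1,i} \sum_{j} C_{ij}$$

where $a_{0\rightarrow 1,i}$ is the '0' to '1' transition probability of the $i$th node and $C_{ij}$ is the input capacitance of the $j$th fanout of the $i$th node, so that $\sum_j C_{ij}$ is the load capacitance $C_L$ of node $i$.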
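The activity-capacitance estimate can be sketched in Python for a netlist of multiplexers and inverters. The following are assumptions, not details from the text: the 50% input activity is read as a signal probability of 0.5; signals are treated as spatially and temporally independent, so a node with P(1) = p has a 0-to-1 transition probability of p(1 − p) and a multiplexer's output probability follows directly from its inputs; and the netlist encoding and capacitance values in the example are hypothetical.

```python
def mux_prob(p_sel, p_low, p_high):
    """P(out = 1) for out = high if sel else low, assuming independent inputs."""
    return (1 - p_sel) * p_low + p_sel * p_high

def transition_prob(p):
    """0->1 transition probability of a temporally uncorrelated signal with P(1) = p."""
    return p * (1 - p)

def c_active(netlist, fanout_caps, p_input=0.5):
    """Sum over nodes i of a_{0->1,i} * sum_j C_ij; netlist must be in topological order."""
    p = {}
    for node, (kind, *ins) in netlist.items():
        if kind == "in":
            p[node] = p_input
        elif kind == "inv":
            p[node] = 1 - p[ins[0]]
        elif kind == "mux":                    # ins = [sel, low, high]
            p[node] = mux_prob(p[ins[0]], p[ins[1]], p[ins[2]])
        else:
            raise ValueError(f"unknown cell kind {kind!r}")
    return sum(transition_prob(p[n]) * sum(caps) for n, caps in fanout_caps.items())

netlist = {            # hypothetical example, listed in topological order
    "s": ("in",), "a": ("in",), "b": ("in",),
    "y": ("mux", "s", "a", "b"),
    "z": ("inv", "y"),
}
fanout_caps = {"s": [0.01], "a": [0.01], "b": [0.01], "y": [0.02], "z": [0.05]}  # pF
print(c_active(netlist, fanout_caps))
```

Reconvergent fanout makes real node signals correlated, so the independence assumption makes this a rough estimate, in line with the approximate nature of the comparison.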
A large set of benchmark circuits was tested, and the results are shown in Table 1. The CMOS values are used as the reference, and the results for the pass-transistor (PTM) implementation are given as percentages of them.
Column "Area" gives the area in mm²; "Delay" gives the critical-path delay of the mapped, placed and routed circuits in ns; "C-active" gives the average node activity multiplied by the load capacitance in pF, as described above; and "CPU" gives the CPU time in seconds on a SUN Sparc 10 workstation. Table 1 shows that PTM designs are marginally larger in area than standard CMOS. However, the critical-path delay is in general better, on average by a factor of 1.4, with occasional exceptions. Power dissipation is also lower for PTM in most cases.
Conclusion
A CMOS pass-transistor logic synthesis package, PTM, which runs in UC Berkeley's SIS environment, is proposed in this paper. Since the cell library used is strongly related to the BDD data representation, our algorithm is not only memory efficient but also much faster than the Boolean matching method in [15]. To tackle the NP-complete problem of BDD variable ordering, a Genetic Algorithm with adaptive population size, mutation rate and number of generations was used. Since mapping is achieved using a cell library consisting of only three logic cells and four buffer-inverters, adding new technologies is extremely easy. Comparing the area, speed and power of the benchmark circuits synthesised by PTM for pass-transistor logic and by SIS for conventional CMOS logic, a better area-power-speed product is achieved in the majority of cases using PTM.
The current PTM only attempts to minimise the size of the mapped circuit. A future implementation will include power and timing in the optimisation cost function. 
