Reversible logic has applications in quantum computing, low power CMOS, nanotechnology, optical computing, and DNA computing. The most common reversible gates are the Toffoli gate and the Fredkin gate. Our synthesis algorithm first finds a cascade of Toffoli and Fredkin gates with no backtracking and minimal look-ahead. Next we apply transformations that reduce the size of the circuit. Transformations are accomplished via template matching. The basis for a template is a network with m gates that realizes the identity function. If a sequence in the network to be synthesized matches more than half of a template, then a transformation that reduces the gate count can be applied. In this paper we show that Toffoli and Fredkin gates behave in a similar manner. Therefore, some gates in the templates may not need to be specified: they can match a Toffoli or a Fredkin gate. We formalize this by introducing the box gate. All templates with fewer than six gates are enumerated and classified. We synthesize all three-input, three-output reversible functions and compare our results to those obtained previously.
INTRODUCTION
Reversible logic is an emerging research area. The synthesis of reversible circuits differs significantly from synthesis using traditional irreversible gates. Two restrictions are added for reversible networks, namely fan-outs and feedbacks are not allowed. The only possible structure for a reversible network is a cascade of reversible gates. The most frequently used gates are the Toffoli gate [9] and the Fredkin gate [3]. The Toffoli gate inverts a single bit if the AND of a set of control lines is 1. That is, it performs a controlled NOT. The Fredkin gate interchanges two bits if the AND of a set of control lines is 1. In other words, it performs a controlled swap. The formal definitions are given in Section 2.
Only a few synthesis methods have been proposed for reversible logic. Suggested methods include: using Toffoli gates to implement an ESOP (EXOR sum-of-products) [7], exhaustive enumeration [6], heuristic methods that iteratively make the function simpler (simplicity is measured by the Hamming distance [1] or by spectral means [5]), and transformation based synthesis [4], among others. Some methods use excessive search time, others are not guaranteed to converge, and some require many additional outputs (garbage).
We follow the two-step approach suggested in [6] and further investigated in [2]. The first paper [6] describes templates with Toffoli gates only. The second [2] introduces some templates with Fredkin gates; however, they are restricted to three inputs. First a network for the given function is found. The algorithm for this step is guaranteed to converge. In fact, the algorithm is very fast. Improvements on a naive algorithm are described in [6] (as they apply to Toffoli networks). The second step consists of applying transformations that reduce the number of gates. In this paper we describe and classify the templates used for such transformations in detail.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICCAD'03, November 11-13, 2003, San Jose, California, USA. Copyright 2003 ACM 1-58113-762-1/03/0011 ... $5.00.
DEFINITIONS
In this work we consider cascades of generalized Toffoli and generalized Fredkin [3] gates defined as follows.
For both gate types, C will be called the control set and T will be called the target set. The number of elements of the set of controls C defines the width of the gate. The set of generalized Toffoli and generalized Fredkin gates will be called the Fredkin-Toffoli family. For the control set $C = \{x_3, x_4, \ldots, x_{k+2}\}$ the pictorial representation of gate $TOF(C, x_2)$ is shown in Fig. 1a.
* In a correctly built circuit a box symbol will never be on the same line with EXOR or SWAP symbols.
If the setting for the box (Toffoli or Fredkin) is not specified, the box can be either one, assigned according to the above rules.
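The informal gate descriptions from the introduction (a Toffoli gate is a controlled NOT, a Fredkin gate is a controlled swap) can be sketched in a few lines. This is only an illustration: the function names `toffoli` and `fredkin`, the use of tuples of bits, and the zero-based line indices are assumptions made here, not notation from the paper.

```python
from itertools import product

def toffoli(bits, controls, target):
    """Invert bits[target] when every control line carries 1 (controlled NOT)."""
    out = list(bits)
    if all(bits[c] for c in controls):
        out[target] ^= 1
    return tuple(out)

def fredkin(bits, controls, t1, t2):
    """Swap bits[t1] and bits[t2] when every control line carries 1 (controlled swap)."""
    out = list(bits)
    if all(bits[c] for c in controls):
        out[t1], out[t2] = out[t2], out[t1]
    return tuple(out)

# Apply one gate of each type to every 3-bit pattern.
patterns = list(product((0, 1), repeat=3))
tof_image = [toffoli(p, controls=(0, 1), target=2) for p in patterns]
fre_image = [fredkin(p, controls=(0,), t1=1, t2=2) for p in patterns]

# Reversibility: each gate maps the set of patterns onto itself one-to-one.
print(sorted(tof_image) == patterns, sorted(fre_image) == patterns)
```

With an empty control set the same functions give the plain NOT and the plain SWAP, which matches the idea that the width of a gate is the size of its control set.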
THE ALGORITHM
In this section we consider the synthesis of completely specified reversible functions realized with gates from the Fredkin-Toffoli family. Before we describe the algorithm, we have to agree on the function representation. For our method it is best to think of the function to be realized as given by a truth table, which has input patterns on the left and output patterns on the right side. The input patterns are arranged in lexicographical order.
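As a small illustration of this representation, a completely specified function on n lines can be stored as the list of its output patterns, indexed by the input pattern in lexicographical order. The helper names and the sample function below are hypothetical and chosen only for the example.

```python
def is_reversible(outputs, n):
    """outputs[i] is the output pattern (as an integer) for input pattern i.

    The function is reversible exactly when the outputs form a
    permutation of all 2^n patterns.
    """
    return len(outputs) == 2 ** n and sorted(outputs) == list(range(2 ** n))

def truth_table(outputs, n):
    """Rows 'input output', with inputs in lexicographical order."""
    return [f"{i:0{n}b} {o:0{n}b}" for i, o in enumerate(outputs)]

# An arbitrary 3-variable example, used here only for illustration.
example = [1, 0, 3, 2, 5, 7, 4, 6]

if is_reversible(example, 3):
    print("\n".join(truth_table(example, 3)))
```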
The basic algorithm starts with an empty circuit (the identity function). At every step of the synthesis algorithm we add some gates from the Fredkin-Toffoli family to the end of the circuit. Since the reversible cascade can be built from either end, the basic algorithm starts from the output.
Basic algorithm.
Step 0. Idea: take the narrowest gates and arrange them in a cascade so that they bring the first output pattern to the first input pattern.
The first pattern in the truth ..., $x_n$) is less than the order of $(a_1, a_2, \ldots, a_n)$ and the set of controls, defined as the minimal subset of unit values of $(x_1, x_2, \ldots, x_n)$ such that this subset forms a Boolean pattern of an order higher than $(a_1, a_2, \ldots, a_n)$, is minimal. This can be easily done if "swaps" are done on the low order bits first.
Note that in this case the initial pattern $(b_1, b_2, \ldots, b_n)$ was greater than $(a_1, a_2, \ldots, a_n)$, so the most significant binary digit of $(b_1, b_2, \ldots, b_n)$ equal to one was greater than the most significant one digit of $(a_1, a_2, \ldots, a_n)$. Thus, it will be taken as the control (when a control is needed) for all corresponding Fredkin and Toffoli gates except the last Toffoli gate, for which the control will consist of all unit digits of $(a_1, a_2, \ldots, a_n)$.
If the number of ones in $(b_1, b_2, \ldots, b_n)$ is equal to the number of ones in $(a_1, a_2, \ldots, a_n)$, it is possible to transform one pattern into the other using the "swap" operation only. Controls are determined by the procedure described in the above case.
If the number of ones in $(b_1, b_2, \ldots, b_n)$ is greater than the number of ones in $(a_1, a_2, \ldots, a_n)$, apply "swaps" starting from the end of the pattern $(b_1, b_2, \ldots, b_n)$ and then apply the necessary Toffoli gates. All the necessary controls can be found using the procedure from the first case.
Step $2^n - 1$. When all $2^n - 1$ previous patterns are in place, the last pattern will automatically be correct.
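The row-by-row idea of the basic algorithm can be illustrated with a simplified sketch. The code below is not the Fredkin-Toffoli procedure described above: as an assumption made for brevity it uses Toffoli gates only, fixes truth-table row i by first raising the output pattern f[i] to (i OR f[i]) and then lowering it to i, and makes no attempt to choose the narrowest gates. The names `apply_gate`, `synthesize_toffoli` and `simulate`, and the encoding of patterns and control sets as integer bit masks, are hypothetical.

```python
def apply_gate(value, controls, target):
    """Toffoli gate on an integer pattern: flip the target bit if all control bits are 1."""
    return value ^ target if (value & controls) == controls else value

def synthesize_toffoli(f):
    """Return a Toffoli cascade, in input-to-output order, realizing the permutation f."""
    f = list(f)
    n = (len(f) - 1).bit_length()
    gates = []  # gates in the order they are appended at the output side

    def add(controls, target):
        gates.append((controls, target))
        for j in range(len(f)):
            f[j] = apply_gate(f[j], controls, target)

    for i in range(len(f) - 1):        # the last row falls into place by itself
        for k in range(n):             # raise f[i] to (i | f[i])
            bit = 1 << k
            if i & bit and not f[i] & bit:
                add(f[i], bit)         # controls: current ones of f[i]
        for k in range(n):             # lower (i | f[i]) down to i
            bit = 1 << k
            if f[i] & bit and not i & bit:
                add(i, bit)            # controls: ones of i
    # Gates were added at the output side, so the circuit reads them in reverse.
    return gates[::-1]

def simulate(gates, x):
    """Push the input pattern x through the cascade."""
    for controls, target in gates:
        x = apply_gate(x, controls, target)
    return x

f = [1, 0, 3, 2, 5, 7, 4, 6]           # arbitrary example function
circuit = synthesize_toffoli(f)
print(len(circuit), [simulate(circuit, x) for x in range(8)] == f)
```

Rows that are already in place are never disturbed: every control set used for row i contains a pattern of order at least i, so no earlier row can activate the gate.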
Motivation. Given a target technology, it usually happens that the narrower the gate, the less costly it is; thus we try to use the narrowest gates. However, choosing the narrowest gates at each step may lead to larger initial circuits which might not be simplified enough by the template tool. Another possibility is to choose the control set such that the remaining function is as simple as possible. To measure "simplicity" we use the Hamming distance as a heuristic.
It also happens that the template simplification tool is sensitive to the width of gates; therefore, by taking narrow gates we prepare the circuit for better template reduction.
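One plausible reading of the Hamming distance heuristic is sketched below: the "simplicity" of the remaining function is taken to be the total number of bit positions in which output patterns differ from their input patterns, so the identity scores 0. This exact formula is an assumption made for illustration; the text does not spell it out. The function name `hamming_to_identity` is hypothetical.

```python
def hamming_to_identity(f):
    """Sum over all rows of the Hamming distance between input i and output f[i]."""
    return sum(bin(i ^ o).count("1") for i, o in enumerate(f))

identity = list(range(8))
example = [1, 0, 3, 2, 5, 7, 4, 6]     # arbitrary example function

print(hamming_to_identity(identity), hamming_to_identity(example))
```

Under this reading, a candidate control set would be preferred when the function remaining after the gate is applied has a smaller score.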
from the output to input by adding the gates in one direction, starting from the end of the desired cascade and ending at its beginning. What if we were able to understand what happens when a gate is added to the beginning of the cascade? Then we would be able to construct the network from the two ends simultaneously by growing the number of gates from the two sides. The idea of the method is the same: apply gates to match the input and output parts of the truth table to each other, ensuring that at each step of the calculation we put at least one pattern in its place. It makes sense that such a bidirectional algorithm will on average converge faster.
TEMPLATE SIMPLIFICATION TOOL
Let a size m template be a sequence of m gates (a circuit) which realizes the identity function. A template of size m should also be independent of the templates of smaller size, i.e., for a given template of size m no application of any set of templates of smaller size can decrease the number of its gates. For a template $G_0 G_1 \ldots G_{m-1}$ its application is one of the two operations:
1. Forward application. A sequence of gates in the network which matches the sequence $G_i G_{(i+1) \bmod m} \ldots$
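The reason a template can shrink a circuit can be checked directly. If a cascade $G_0 G_1 \ldots G_{m-1}$ realizes the identity and every gate is its own inverse, then any run of p consecutive gates (indices taken mod m) computes the same function as the remaining m - p gates taken in reverse order, so a matched run with p > m/2 can be replaced by a shorter one. The sketch below verifies this identity on one Toffoli-only example. The five-gate template, the bit-mask gate encoding and the name `forward_replacement` are assumptions chosen for the illustration and are not taken from the paper.

```python
def apply_gate(x, controls, target):
    """Toffoli gate on an integer pattern; controls and target are bit masks."""
    return x ^ target if (x & controls) == controls else x

def run(gates, x):
    for controls, target in gates:
        x = apply_gate(x, controls, target)
    return x

def is_identity(gates, n):
    return all(run(gates, x) == x for x in range(2 ** n))

def forward_replacement(template, i, p):
    """Split the template, rotated to start at gate i, into a matched run of
    p gates and the equivalent replacement (the other gates, reversed)."""
    m = len(template)
    matched = [template[(i + k) % m] for k in range(p)]
    rest = [template[(i + k) % m] for k in range(p, m)]
    return matched, rest[::-1]

# Lines a, b, c are bit masks 1, 2, 4. The cascade
# CNOT(a,b) CNOT(b,c) CNOT(a,b) CNOT(b,c) CNOT(a,c) is the identity.
template = [(1, 2), (2, 4), (1, 2), (2, 4), (1, 4)]

matched, shorter = forward_replacement(template, 0, 3)
same = all(run(matched, x) == run(shorter, x) for x in range(8))
print(is_identity(template, 3), len(matched), len(shorter), same)
```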
The presented algorithm is an improved version of the algorithm proposed in [6]. The algorithm in [6] uses Toffoli gates only, and its basic structure was simple: for input pattern $(a_1, a_2, \ldots, a_n)$, use Toffoli gates to bring the pattern $(b_1, b_2, \ldots, b_n)$ to the form $(a_1 \vee b_1, a_2 \vee b_2, \ldots, a_n \vee b_n)$ (increase the order) using controls $x_i : b_i = 1$. Then use the Toffoli gates with controls $x_i : a_i = 1$ to bring $(a_1 \vee b_1, a_2 \vee b_2, \ldots, a_n \vee b_n)$ to the desired form $(a_1, a_2, \ldots, a_n)$.
For such an algorithm it was easy to construct the worst case function. In particular, for n = 3 such a function was constructed (it is unique) and called 3.17.pla. The cost of realizing this function was 17.
If the assignment was SWAP, the line with the box becomes the two lines where the symbol SWAP is put. Every occurrence of a control on the line with this box is substituted with two controls and every occurrence of the box symbol is substituted with SWAP. The EXOR symbol cannot appear in this line since, by the first item, had it been there, all the boxes would have been substituted with EXOR, and thus the SWAP substitution would have been incorrect from the start.
Further, if a box symbol in a circuit is not specified, it can be either EXOR or SWAP, which are substituted into the circuit by the above rules.
m=1. There are no templates of size 1, since every gate changes at least two input patterns.
m=2. There is one class of templates of size 2, the duplication deletion rule, AA, which is defined as $G(S_1, B_1)\, G(S_1, B_1)$.
This class is a generalization of the duplication deletion rule [6] and it holds for any two gates which perform a self-inverse transformation. In disjoint notation this class can be written as two formulas, one for two Toffoli gates and one for two Fredkin gates: $TOF(C_1, t_1)\, TOF(C_1, t_1)$ and $FRE(C_1, t_1 + t_2)\, FRE(C_1, t_1 + t_2)$, shown in Fig. 3.
There can be a shorter but less formal condition: the template $G(S_1, B_1)\, G(S_2, B_2)\, G(S_1, B_1)\, G(S_2, B_2)$ exists if for the first (if there are two with this property) line containing a control (dot) and a BOX, the BOX is SWAP, and the sets $B_1$, $B_2$ are either disjoint or equal. All the cases are shown in Fig. 4. The first part of the OR condition covers the first picture, the second OR condition describes the second. The third and fourth pictures illustrate the case when the third condition holds.
There is a regular procedure for finding all the templates of the form ABAB. Since ABAB is the identity, the circuit produced by the sequence of gates AB must be a self-inverse permutation. The search for templates of the form ABAB thus becomes equivalent to the search for self-inverse permutations that can be realized by two different gates.
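The search just described, together with the size 1 and size 2 observations, is small enough to run exhaustively on three lines. The sketch below enumerates every Toffoli and Fredkin gate on three lines as a permutation of the eight patterns and collects the pairs of different gates A, B for which AB is self-inverse. The gate encoding (bit masks for controls and targets) and all names are assumptions made for this illustration; it is not the authors' program.

```python
from itertools import combinations

N = 3
IDENTITY = tuple(range(2 ** N))

def toffoli_perm(controls, target):
    return tuple(x ^ target if (x & controls) == controls else x for x in IDENTITY)

def fredkin_perm(controls, t1, t2):
    def swap(x):
        # Swapping two differing bits is the same as flipping both of them.
        if (x & controls) == controls and bool(x & t1) != bool(x & t2):
            return x ^ t1 ^ t2
        return x
    return tuple(swap(x) for x in IDENTITY)

def compose(first, second):
    """Permutation of the cascade 'first, then second'."""
    return tuple(second[first[x]] for x in IDENTITY)

def submasks(mask):
    """All subsets of the bits set in mask, as bit masks."""
    sub = mask
    while True:
        yield sub
        if sub == 0:
            break
        sub = (sub - 1) & mask

lines = [1 << k for k in range(N)]
full = 2 ** N - 1
gates = {}
for t in lines:
    for c in submasks(full ^ t):
        gates[("TOF", c, t)] = toffoli_perm(c, t)
for t1, t2 in combinations(lines, 2):
    for c in submasks(full ^ t1 ^ t2):
        gates[("FRE", c, t1 | t2)] = fredkin_perm(c, t1, t2)

# m=1: no single gate is the identity. m=2: every gate is self-inverse (AA).
no_size_one = all(g != IDENTITY for g in gates.values())
all_self_inverse = all(compose(g, g) == IDENTITY for g in gates.values())

# ABAB: pairs of different gates whose product AB is a self-inverse permutation.
abab = []
for a, b in combinations(gates, 2):
    ab = compose(gates[a], gates[b])
    if compose(ab, ab) == IDENTITY:
        abab.append((a, b))

print(len(gates), no_size_one, all_self_inverse, len(abab))
```

Because each gate is its own inverse, AB is self-inverse exactly when AB = BA, so the pairs collected in `abab` are the pairs of gates that can be passed through each other.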
The following sets of templates can be treated as one, Fig. 4 where the first box is Fredkin and the second is Toffoli and the set CA is empty, is a template of a semi-passing group. The new templates added by this group are shown in Fig. 5. Note that some of the semi-passes leave the gate $G(S_2, B_2)$ unchanged. Also, if we take the set of all semi-passing group templates and subtract the set of all templates of the passing rule group, the resulting set will have the semi-passing group templates where the second gate always changes.
A group can be treated as the definition of the ... This class is illustrated in Fig. 8. The program which searches for the self dual functions of size three has found only those functions that are described by the presented template or circuits which can be simplified by other templates. Thus, we conclude that we built all the size 6 templates of the form ABCABC.
Applying the templates to the circuits shown in Fig. 2 results in the two circuits shown in Fig. 9. The circuit of size 12 (Fig. 2a) resulted in a circuit of size 7 (Fig. 9a). The second circuit (Fig. 9b) was reduced to size 6 from the original 7 (Fig. 2b).
RESULTS
We have written a program that synthesizes functions using the bidirectional algorithm and then applies the template tool as a primary circuit simplification procedure. We ran our program exhaustively for all reversible functions with 3 variables and compared the results of our algorithm to the results of optimal synthesis. Table 3
