Abstract-Additional lines are required to implement an irreversible function as a reversible circuit. The emphasis, particularly in automated synthesis methods, has been on using the minimal number of additional lines. In this paper, we show that circuit cost reductions can be achieved by adding lines beyond this minimum. We present an algorithm for line addition that can be targeted at reducing the quantum cost of a circuit or the transistor count for a CMOS implementation. Experimental results show that the cost reduction can be significant even if (1) only a small number of lines (even one) is added and (2) other circuit optimizations have already been applied.
I. INTRODUCTION
The amazing developments in microelectronic technology and the looming limitations are well known and well documented. Both transistor density and power dissipation are crucial issues in designing current high performance digital circuits. For these reasons, researchers expect that in 10-20 years current technologies will reach their limits (see e.g. [1]). It is thus important to consider alternative technologies. Reversible logic offers one such alternative due to its applications in the domains of low-power design, quantum computation, and bio-computing.
Landauer and Bennett showed in [2] , [3] , that power dissipation is not only caused by non-ideal behavior of transistors and materials, but also by a more fundamental reason. Each time information is erased (e.g. when the irreversible AND operation is computed), energy is dissipated as well. Only if computation is information-lossless, can zero energy dissipation be achieved. This holds for reversible logic since only bijective operations are allowed (i.e. each input pattern is mapped to a unique output pattern). Practical reversible circuits have been built that exploit these properties [4] . In fact, these circuits were powered by their input signals only and did not need additional power lines.
Quantum computation is of interest, since it potentially allows exponential speed-up of computation for many important problems. This is because qubits, in contrast to traditional bits, can assume a probabilistic superposition of the base states 0 and 1. As a result, a set of qubits can represent multiple states at the same time, enabling enormous computational speed-up. Quantum circuits have been developed working with 8, 16, and, recently, 28 qubits (see e.g. [5]). Reversible logic acts as a framework for quantum computation since all quantum gate operations are inherently reversible.
For the reasons cited above, reversible circuits and logic synthesis have in recent years become well studied topics. Many synthesis approaches for reversible logic have been proposed (e.g. [6] - [15] ). Except for very small cases, these methods do not produce minimal results. Thus, post-synthesis optimization is applied to reduce the cost of a circuit.
For example, template matching [11], [16], [17] is a search method which looks for gate sequences that can be replaced by alternative cascades of lower cost. The run-time increases with both the number of applied templates and the number of gates in the circuit and as a result can be quite high. But for many circuits substantial improvements are achieved.
As a second example, Zhong and Muzio [18] showed how analyzing cross-point faults can identify redundant control connections in reversible circuits. Removing such control lines reduces the cost of the circuit. However, the computation needed to determine such redundancies is extremely high.
In this paper, we propose a post-synthesis optimization technique which reduces the cost of the circuit by adding further signal lines to the circuit. The general idea is to use the new lines for "buffering" factors of gate control lines so that they can be reused by the other gates in the circuit. This reduces the size of these gates and thus decreases the cost of the circuit. Both quantum cost (used in quantum circuits) and transistor count (used in CMOS implementations) are considered.
A fast algorithm is presented along with results that show it can be quite effective even if only a small number of lines, even 1, are added. Since we consider the addition of lines to a circuit (not to the functional specification), our approach is applicable to circuits designed by any automated synthesis method and in fact to circuits designed by hand.
Experiments show that our approach can reduce circuit cost by up to 70% when only a single line is added. Thus, even for quantum realizations (where qubits and hence the number of circuit lines are limited) adding a further line is beneficial as it reduces the quantum cost significantly. So, the designer can trade off the additional expense of a further circuit line for a significant reduction of quantum cost. For CMOS realizations, the observed reductions are more moderate. However, here the cost of adding a new line to the circuit is negligible, so that the proposed optimization is worthwhile for such circuits as well.
In previous work, it has already been observed that more circuit lines usually lead to lower (quantum) cost (see e.g. [19] or recently [20] ). Moreover, the authors of [21] even showed that some functions cannot be synthesized for certain gate libraries unless one additional line is added. However, in this paper these observations are exploited for the first time by proposing a constructive post-synthesis optimization approach for reversible logic.
The remainder of this paper is structured as follows: Section II provides the necessary background on reversible circuits for this paper. The general idea of the proposed approach is introduced in Section III while Section IV describes the algorithm in detail. Experimental results are reported in Section V and the paper concludes with observations and suggestions for further research in Section VI.
II. REVERSIBLE LOGIC CIRCUITS
In this section, we provide the background on reversible circuits necessary to make this paper self-contained. Readers interested in a more extensive introduction to the subject should consult the literature, e.g. [22] .
Definition 1. A multiple-output Boolean function f: B^n → B^n is reversible if it maps each input pattern to a unique output pattern. Otherwise, the function is termed irreversible.
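The reversibility condition can be checked mechanically by enumerating all input patterns. The following is a minimal sketch; the helper name `is_reversible` and the two example functions are illustrative assumptions, not part of the paper.

```python
from itertools import product

def is_reversible(func, n):
    """Return True if func maps the 2**n input patterns to 2**n distinct n-bit outputs."""
    inputs = list(product((0, 1), repeat=n))
    outputs = {tuple(func(bits)) for bits in inputs}
    # Bijective on n-bit patterns: every output has n bits and no output repeats.
    return all(len(out) == n for out in outputs) and len(outputs) == len(inputs)

def and_gate(bits):
    # Two inputs, one output: information is lost, so this is irreversible.
    return (bits[0] & bits[1],)

def cnot(bits):
    # (a, b) -> (a, a XOR b): each input pattern has its own output pattern.
    return (bits[0], bits[0] ^ bits[1])

print(is_reversible(and_gate, 2))  # False
print(is_reversible(cnot, 2))      # True
```

Exhaustive enumeration is only practical for small n, which is sufficient for illustrating the definition.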
A reversible function can be realized by a circuit G = G_1 G_2 ... G_k comprised of a cascade of reversible gates G_i with no fan-out or feedback [22]. A reversible gate itself realizes a reversible function. Many reversible gates have been proposed [22]. In this paper, we concentrate on multiple control Toffoli and multiple control Fredkin gates, which are defined as follows:
Definition 2. A multiple control Toffoli gate (MCT) with target line x_j and control lines {x_{i1}, x_{i2}, ...,
Note that all control lines must be 1 for the target to be inverted; an MCT gate is thus a controlled inversion of the target line. An MCT gate with no control always inverts the target line and is the well-known NOT gate. An MCT gate with a single control line is called a controlled-NOT (CNOT) gate. The case of two control lines is the original gate defined by Toffoli. An MCT gate is denoted MCT(C; t), where C is the possibly empty set of control lines and t is the target line. An MCF gate is denoted MCF(C; t_p, t_q), where C is the possibly empty set of control lines and t_p and t_q are the target lines. Note that the control lines and unconnected lines pass through both types of gate unchanged. For drawing circuits, we follow the established convention of using the symbol ⊕ to denote the target of an MCT gate, × to denote the targets of an MCF gate, and solid black circles to indicate control connections.
Example 1. The circuit in Figure 1 realizes the irreversible benchmark function RD53 (taken from RevLib [23]) as a reversible circuit. Note that two constant inputs and four garbage outputs are required.
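The gate behaviour described above can be simulated directly. The sketch below is illustrative only: the function names and the tuple-based gate encoding are assumptions, and the MCF behaviour (swapping the two target lines when all controls are 1) is the standard Fredkin definition, assumed here because the formal definition is not reproduced in the text.

```python
def apply_mct(state, controls, target):
    """Invert `target` if every control line carries 1; all other lines pass through."""
    state = list(state)
    if all(state[c] for c in controls):   # an empty control set always fires (NOT gate)
        state[target] ^= 1
    return tuple(state)

def apply_mcf(state, controls, t_p, t_q):
    """Swap the two target lines if every control line carries 1 (assumed Fredkin semantics)."""
    state = list(state)
    if all(state[c] for c in controls):
        state[t_p], state[t_q] = state[t_q], state[t_p]
    return tuple(state)

def run_circuit(state, gates):
    """Apply a cascade of gates, each encoded as ("MCT", controls, t) or ("MCF", controls, t_p, t_q)."""
    for kind, *args in gates:
        state = apply_mct(state, *args) if kind == "MCT" else apply_mcf(state, *args)
    return state

print(apply_mct((1, 1, 0), [0, 1], 2))   # Toffoli gate: (1, 1, 1)
print(run_circuit((1, 0, 0), [("MCT", [0], 1), ("MCF", [0], 1, 2)]))  # (1, 0, 1)
```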
The number of gates in a cascade is a very poor measure of its complexity, since the costs of MCT and MCF gates depend on the number of control lines and the target technology. Thus, we use two distinct cost models in this paper: transistor cost and quantum cost. While the transistor cost model estimates the cost of the circuit in terms of the number of CMOS transistors, the quantum cost model estimates the cost of the circuit in terms of the number of elementary quantum operations.
Definition 4. Transistor cost model:
The transistor cost of an individual gate follows [24]. The transistor cost of a circuit composed of MCT and MCF gates is the sum of the transistor costs of the individual gates.
Definition 5. Quantum cost model:
The quantum cost of an MCT gate is given in Table I, where m is the number of control and target lines for the gate and n is the number of circuit lines. The quantum cost of a circuit is the sum of the quantum costs of the individual gates. The quantum cost of an MCF gate with m control and target lines is calculated as the cost for an MCT gate with that number of lines plus 2.
Even if these models provide better cost estimates than simple gate count they are still only approximations. In particular, they do not take into account transistor or elementary quantum operation reductions that can be made by combining the realizations of MCT and MCF gates that are adjacent, or can be moved to be adjacent, in the circuit. Achieving such reductions in a systematic manner is a complex optimization problem that we leave for future research. Instead we use the well-established cost metrics defined above.
Quite often, the objective in synthesizing a reversible circuit is not only to realize a reversible but also an irreversible Boolean function. This requires the irreversible function to be embedded into a reversible one which requires the addition of constant inputs and garbage outputs defined as follows: Definition 6. A constant input to a reversible circuit is one that is set to a fixed value to achieve the desired functionality.
Definition 7.
A garbage output from a reversible circuit is one which is a don't-care for all possible input conditions.
In [25], it was shown that at least g = ⌈log_2(µ)⌉ garbage outputs are required to embed a completely-specified irreversible function into a reversible function, where µ is the maximum number of times a single output pattern is repeated in the truth table of the irreversible function. Thus, the embedding of an n-input, m-output irreversible function has a total of m + g outputs, with m + g ≥ n. This requires the addition of c ≥ 0 inputs such that n + c = m + g. The new lines must be assigned constant input values. The interest here is what circuit cost savings can be achieved if more than the minimal number of lines are added to a circuit.
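The line counts above can be computed from a truth table. The sketch below is a hedged illustration: the helper `embedding_size` is a hypothetical name, and `rd53` assumes that RD53 outputs the number of ones among its five inputs as a 3-bit number. Under that assumption the result agrees with the two constant inputs and four garbage outputs noted in Example 1.

```python
from collections import Counter
from itertools import product

def embedding_size(func, n, m):
    """Return (mu, g, c) for an n-input, m-output function given as a callable."""
    counts = Counter(tuple(func(bits)) for bits in product((0, 1), repeat=n))
    mu = max(counts.values())        # most frequent output pattern
    g = (mu - 1).bit_length()        # equals ceil(log2(mu)) for mu >= 1
    c = (m + g) - n                  # constant inputs so that n + c = m + g
    return mu, g, c

def and_gate(bits):
    return (bits[0] & bits[1],)

def rd53(bits):
    # Assumption: RD53 encodes the number of ones in its 5 inputs on 3 output bits.
    ones = sum(bits)
    return ((ones >> 2) & 1, (ones >> 1) & 1, ones & 1)

print(embedding_size(and_gate, 2, 1))  # (3, 2, 1)
print(embedding_size(rd53, 5, 3))      # (10, 4, 2)
```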
III. GENERAL IDEA
Most reversible synthesis approaches produce circuits using the minimal number of signal lines, which is equal to (n + c) = (m + g), i.e. the number of inputs and outputs of the reversible function (reversible embedding) that is realized. Optimization approaches, such as the two noted in Section I, have also concentrated on using the minimal number of lines. In this section we show how extending the circuit by additional signal lines can improve the cost of a reversible circuit. In the following, such an added line is called a helper line.
Definition 8. Let G be a reversible circuit. A helper line is an additional circuit line
• whose input is set to a constant value 0 and
• whose output is used as a garbage output.
Having a helper line available, values can be "buffered" on this line so that they can be later reused by other gates. In doing so, control lines can be saved as shown by the following definition.
Definition 9. Let G be a reversible circuit and h be a helper line. Then, a gate MCT(C, t) of G can be replaced by the sequence MCT(F, h), MCT({h} ∪ Ĉ, t), MCT(F, h), where C = F ∪ Ĉ, F ∩ Ĉ = ∅, and F ≠ ∅. In the following, this replacement is referred to as factoring the initial gate, and F is a factor of MCT(C, t).
The terms "factoring" and "factor" are natural, since partitioning the control set C into F and Ĉ is essentially factoring the AND function for the control lines. This factoring depends on the fact that 0 ⊕ x_1 x_2 ... x_k = x_1 x_2 ... x_k, i.e. that the result of a factor can be "buffered" by a constant line assigned to 0.
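The factoring replacement can be verified by exhaustive simulation on a small example. The following sketch assumes a helper line appended as the last line with constant input 0; the function names are illustrative and not taken from the paper.

```python
from itertools import product

def apply_mct(state, controls, target):
    """Invert `target` if every control line carries 1."""
    state = list(state)
    if all(state[c] for c in controls):
        state[target] ^= 1
    return tuple(state)

def factoring_is_equivalent(n, controls, target, factor):
    """Compare MCT(C, t) with MCT(F, h), MCT({h} ∪ Ĉ, t), MCT(F, h) for all inputs, h = 0."""
    h = n                                    # index of the helper line
    rest = set(controls) - set(factor)       # Ĉ = C \ F
    for bits in product((0, 1), repeat=n):
        start = bits + (0,)                  # helper line initialised to constant 0
        original = apply_mct(start, controls, target)
        s = apply_mct(start, factor, h)      # buffer the AND of F on h
        s = apply_mct(s, rest | {h}, target) # smaller gate reuses the buffered value
        s = apply_mct(s, factor, h)          # restore h to 0
        if s != original:
            return False
    return True

print(factoring_is_equivalent(5, {0, 1, 2, 3}, 4, {0, 1}))  # True
```

The check relies on the target t not belonging to F, which holds because a gate's target is never one of its own control lines.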
By applying Definition 9 to gates in a circuit, control lines can be removed. Since the number of control lines directly influences the circuit cost, this may lead to less costly circuits. However, this is only the case if the total cost of the added gates is less than the cost saved by factoring the control lines. When only a single gate is substituted, this cannot happen for the transistor count cost model, but it can for the quantum cost model. If more than one gate can be substituted, higher cost savings are achieved (reductions for the transistor cost model are also observed).
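The single-gate case can be made concrete with a small calculation. The sketch below uses two stand-in cost functions that are assumptions and not the paper's cost tables: an exponential quantum cost of 2^(k+1) − 3 for k ≥ 2 controls (which matches the Toffoli cost of 5 used in Example 2), and a hypothetical linear transistor cost of 8 per control line.

```python
def quantum_cost(num_controls):
    # Assumed exponential model, not the paper's Table I.
    return 1 if num_controls <= 1 else 2 ** (num_controls + 1) - 3

def transistor_cost(num_controls):
    # Assumed linear model: 8 transistors per control line.
    return 8 * num_controls

def single_gate_saving(num_controls, factor_size, cost):
    """Cost of MCT(C, t) minus the cost of MCT(F, h), MCT({h} ∪ Ĉ, t), MCT(F, h)."""
    rest = num_controls - factor_size            # |Ĉ|
    factored = 2 * cost(factor_size) + cost(rest + 1)
    return cost(num_controls) - factored

print(single_gate_saving(5, 3, quantum_cost))     # 22: factoring pays off
print(single_gate_saving(5, 3, transistor_cost))  # -32: factoring costs more
```

Under the linear model the factored sequence has |C| + |F| + 1 control connections in total, so a single substitution always increases the cost; a saving requires the factor to be shared by several gates.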
These ideas are clarified in the following example.
Example 2. Consider the cascade of Toffoli gates depicted in Fig. 2(a). The gates in this cascade have a common control factor F = {x_0, x_1}. Hence, the cost of this circuit can be reduced as shown in Fig. 2(b) by adding an additional line h (at the top of the circuit) as well as the Toffoli gates TOF(F, h) before and after the cascade. This leads to additional quantum cost of 2 × 5 = 10. However, the factored gates reuse the result of F (dashed rectangle in Fig. 2(b)), leading to a reduction of one control line per gate. The removed control lines are shown as white circles. In total this reduces the quantum cost from 104 to 59 and the transistor count from 144 to 136, respectively. Note that the added line is set to constant input 0. Furthermore, the rightmost Toffoli gate operating on the added line is only needed if the line is to be used for a subsequent factor.
IV. ALGORITHM
Based on the ideas presented in the last section, we now propose an algorithm that adds one helper line and then employs a straightforward search procedure to use that line for optimizing the circuit. More precisely, we show how to extract factors from Toffoli and Fredkin gates in the circuit (the circuit may contain other types of gates). The algorithm can be applied repeatedly to add more than one helper line. It can also be iterated to add lines until adding a further line results in no cost reduction. The transistor cost model or the quantum cost model can be used, and in fact the algorithm is readily adapted to any other gate-based cost model.
Algorithm 1. Reversible Circuit Factoring
Consider a reversible circuit consisting of the cascade of gates G_1 G_2 ... G_k. Let C_i denote the set of control lines for G_i and let T_i denote the set of target lines for G_i.
1) Add a single helper line h.
2) Find the highest cost reducing factor across the circuit as follows: For each gate G_i (1 ≤ i ≤ k) that is an MCT or MCF gate and for which the helper line h is available, i.e. it is not being used by a previously applied factor at this point in the circuit: For every partitioning of C_i into {F, Ĉ} with F not empty
   a) Find the lowest j ≥ i such that j = k or (F ∩ (T_{j+1} ∪ h)) ≠ ∅, i.e. find the last gate G_j before the next gate that manipulates one of the lines in F, after which the value of the helper line cannot be reused any longer. If the outputs of the circuit are reached, use G_k instead.
   b) Determine the cost reduction that would result from applying this factor to all applicable gates between G_i and G_j, including the cost of introducing two instances of the factor gate MCT(F, h).
   c) Keep a record of the factor and the gate range that leads to the greatest cost reduction.
3) If no cost reducing factor is found in step 2, terminate.
4) Otherwise, apply the best factor found and repeat from step 2 on the revised circuit.
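One pass of the search in step 2 can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it handles MCT gates only, assumes the helper line is free throughout the circuit, always charges for both copies of MCT(F, h), and uses an assumed exponential cost function in place of the paper's cost tables. The names `best_factor` and `quantum_cost` are hypothetical.

```python
from itertools import combinations

def quantum_cost(num_controls):
    # Assumed exponential model, not the paper's Table I.
    return 1 if num_controls <= 1 else 2 ** (num_controls + 1) - 3

def best_factor(gates, cost=quantum_cost):
    """gates: list of (controls, target) pairs for MCT gates, in circuit order.
    Returns (saving, factor, i, j) with 0-based gate indices, or None."""
    best = None
    for i, (controls_i, _) in enumerate(gates):
        for size in range(1, len(controls_i) + 1):
            for combo in combinations(sorted(controls_i), size):
                factor = frozenset(combo)
                # Step a): extend the range until the next gate targets a line in F.
                j = i
                while j + 1 < len(gates) and gates[j + 1][1] not in factor:
                    j += 1
                # Step b): two copies of MCT(F, h) are paid for up front.
                saving = -2 * cost(len(factor))
                for controls, _ in gates[i:j + 1]:
                    if factor <= set(controls):
                        # F is replaced by the single control h on this gate.
                        saving += cost(len(controls)) - cost(len(controls) - size + 1)
                # Step c): remember the best cost-reducing factor seen so far.
                if saving > 0 and (best is None or saving > best[0]):
                    best = (saving, factor, i, j)
    return best

# Toy cascade (not taken from the paper's figures): five gates sharing controls 0 and 1.
circuit = [
    ({0, 1, 2}, 5),
    ({0, 1, 3}, 6),
    ({0, 1, 2, 3}, 7),
    ({0, 1, 2, 4}, 8),
    ({0, 1, 3, 4}, 9),
]
print(best_factor(circuit))  # (54, frozenset({0, 1}), 0, 4)
```

Enumerating every non-empty subset of a control set is exponential in the number of controls per gate, which is acceptable here because individual gates have few controls.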
Note that, as already mentioned above, the rightmost MCT(F, h) operating on the helper line is only added if the helper line is going to be used for another factor.
Example 3. Figure 3 shows the result of applying our algorithm to the realization of RD53 from Figure 1 using the quantum cost metric. The applied factors are highlighted by brackets at the bottom of the figures. While the original circuit has quantum cost of 128, that can be reduced with 1 helper line to 83 (top circuit) or with 2 helper lines to 66 (bottom circuit). Adding a third helper line does not reduce the quantum cost of this circuit further.
The order in which factors are considered typically has an effect. We apply the algorithm to the circuit as given and then to the circuit found by reversing the order of the original circuit. The better of the two final circuits is taken as the result. Thus, the presented algorithm is a heuristic. But as the experiments in the next section show, even this simple approach leads to good results.
V. EXPERIMENTAL RESULTS
This section provides experimental results for the proposed approach. To this end, the method described above has been implemented in C and applied to all benchmarks from the RevLib reversible logic benchmark website (www.revlib.org) [23] as of July, 2009. Our experiments were run on an AMD Athlon 3500+ with 1 GB of memory. The QMDD-based circuit equivalence checking method from [26] was used to verify all results.
Since some of the circuits in RevLib have already been optimized using various approaches (e.g. extensive template post-synthesis optimization, output permutation optimization, and other techniques), we provided a more even starting point by pre-optimizing the circuits using the approach described in [11] together with a basic set of 14 templates. Our new optimization method was then applied. This approach shows that even for circuits optimized by other means, further significant reductions can be achieved if helper lines and the algorithm introduced above are used.
Table II summarizes the obtained results for one and two helper lines, respectively. The first three columns give the name of the circuit (including the RevLib file ID), the number of circuit lines, and the number of gates of the initial (already optimized) circuit, respectively. In the following columns, the obtained results for the quantum cost and transistor count models are presented. The proposed approach has been applied with one and with two helper lines to both the circuit as given and the circuit in reversed order. The better of the two results is chosen. The percentage improvement is shown for each case relative to the initial circuit cost, which is the cost after template application where applicable. Finally, the last column gives the highest CPU time (in seconds) of a single run for each benchmark.
Space does not allow us to report the results for all circuits. Thus, small circuits that have fewer than 5 lines and fewer than 10 gates are omitted. Furthermore, for some circuits adding a helper line gave no improvement. Those benchmarks are listed at the bottom of Table II.
Considering quantum cost, for most of the circuits significant cost reductions can be observed, even if only a single line is added. Over all circuits (including the ones that gave no improvement), adding a single line reduces the quantum cost by 22.51% on average and in the best case (cycle17_3_112) by just over 69%.
This can be further improved if another line is added, leading to an average additional reduction of 5.10%. If transistor cost is considered, the reductions are somewhat smaller but still significant. When adding a single line, the transistor cost is reduced by 5.83% on average and in the best case (cycle17_3_112) by 37%. Adding a second line reduces the transistor count by a further 1.65%. Since the cost of two additional lines is negligible in CMOS technologies, this is a notable reduction as well. In addition, these optimizations can be achieved in very short run-time. Even for circuits including thousands of gates, our approach terminates within minutes, and in most cases within seconds.
Finally, we evaluated the improvement achieved when more than two helper lines are added. More precisely, we have applied our method with one to five helper lines to all the circuits on the RevLib website (including the small ones that have been omitted in Table II). Again, all these circuits were pre-optimized using templates as described above. A total of 95 of the 177 circuits show an improvement in quantum cost when a single helper line is added. Of the other 82 circuits, 64 have a very small number of lines (less than or equal to 5) and are already highly optimized due to their relatively small size. Fig. 4 shows the improvement in quantum cost.
Note to Table II: To provide an even basis for the evaluation, all circuits went through template optimization (using the approach described in [11] together with a basic set of 14 templates) before our approach was applied. For brevity, small circuits (i.e. circuits that have fewer than 5 lines and fewer than 10 gates) are omitted. The circuits 0410184_169, 0410184_170, 4gt13_91, cnt3-5_179, decod24-enable_126, ham7_105, ham7_106, hwb6_58, mod5adder_128, mod5adder_129, rd73_141, rd84_143, sym9_147 and sys6-v0_144 gave no improvement.
In summary, by applying the proposed approach significant cost reductions can be achieved if a single line is added to the circuit (even for already optimized realizations). Further (diminishing) improvements result if more than one helper line is added.
VI. CONCLUDING REMARKS
In this paper we showed that adding lines to a reversible circuit can reduce its cost and that the reduction can be quite significant even if only one or two lines are added. The reduction is, as expected, higher for the exponential quantum cost model than it is for the linear transistor count model.
The factoring method presented adds additional Toffoli gates to a circuit and would thus appear to increase the delay through the circuit. However, for the quantum model each MCT (including Toffoli) gate represents a cascade of quantum gates. By applying our approach we shorten the length of the corresponding cascades and thus reduce the total number of quantum gates. The actual delay change would have to be analyzed for each circuit. The same is true for the transistor model, and in that case the delay would also have to be analyzed, albeit in quite a different manner.
The most critical issue is the fact that additional lines (and in the quantum case qubits) must be added to enable the possible optimizations. Thus, the designer must weigh whether this additional expense is balanced by the subsequent implementation reductions. Since up to 70% of the quantum cost can be saved, this should be the case for many circuits.
For future work, alternative methods for choosing factors should be compared to the search procedure used in this work. Further, the possibility of introducing helper lines during the synthesis process rather than as a post-synthesis optimization should be investigated.
