Abstract-In the past, reversible logic has become an intensely studied research topic. This is mainly motivated by its applications in the domain of low-power design and quantum computation. Since reversible logic is subject to certain restrictions (e.g. fanout and feedback are not allowed), traditional synthesis methods are not applicable and specific methods have been developed. In this paper, we focus on synthesis of multiplier circuits in reversible logic. Three methods are presented that address the drawbacks of previous approaches. In particular, the large number of circuit lines in the resulting realizations as well as the poor scalability. Finally, we compare the results to circuits obtained by general purpose synthesis approaches.
I. INTRODUCTION
The number of elements integrated in digital circuits grows exponentially, leading to enormous challenges in Computer Aided Design (CAD). Due to this exponential growth physical boundaries will be reached in the near future. Furthermore, power consumption of circuits becomes a major issue. As a consequence, researchers expect that "traditional" technologies like CMOS will reach their limits in the near future [1] . To face this, alternative computation technologies are needed. This motivates research in the domain of reversible logic.
Reversible logic realizes bijections, i.e. one-to-one mappings of Boolean functions. The resulting reversibility allows promising applications e.g. in the domain of low-power design and quantum computation. In fact, it has been proven that zero power dissipation is only possible if computations are performed in an invertable manner [2] , [3] . Reversible circuits are driven by their input signals only (and accordingly without additional power supplies) have already been physically implemented [4] . Besides that, the application in quantum computation is triggered by the fact that every quantum operation inherently is reversible [5] . Quantum computers allow to solve practically relevant problems (e.g. factorization) much faster than traditional circuits [6] , [7] .
As a result, reversible logic has become an intensely studied research area. In particular, synthesis aspects are of interest since reversible circuits are subject to certain restrictions, e.g. fanout and feedback are not allowed [5] . In the past, several general purpose synthesis approaches have been introduced (e.g. [8] , [9] , [10] , [11] , [12] ). The realization of reversible circuits for arithmetic functions is of particular interest, since these functions occur naturally in circuit design.
In the following, we focus on synthesis of multiplier circuits in reversible logic. First reversible realizations of multiplier have already been introduced (see e.g. [13] , [14] , [15] ). However, they often rely on a very special type of reversible gates (e.g. the TSG gate, the HNG gate, or the PFAG gate) and additionally have been proposed for very small bit-width only (in fact, 4 × 4 multiplier have been introduced). Besides that, multiplier can also be realized with the help of the (general purpose) synthesis approaches mentioned above. But since multiplication is an irreversible function, thereby often circuits with a significant number of additional circuit lines result. Furthermore, these approaches do not scale very well in particular for the multiplication function.
In this paper, we present three methods that (partially) address these drawbacks. More precisely, an adjusted multiplication specification is introduced that enables synthesis of a multiplier with a significant lower number of circuit lines. Even if this specification still is only applicable for very small bit-widths, it provides an interesting insight in how to exploit properties of a multiplier while synthesizing them as reversible circuit. Additionally, two constructive approaches for synthesis of multipliers with very large bit-width are proposed. While the first one is a hierarchical method based on partial products, the second one is motivated by the divide-and-conquer-method of Karatsuba's algorithm [16] .
All proposed methods apply the well-established Toffoli gate library. The resulting circuits are evaluated with respect to number of circuit lines, number of gates, quantum cost, and transistor costs, respectively. Furthermore, we compare the results with circuits obtained by (general purpose) synthesis approaches. Overall, methods to generate reversible realizations for practical relevant multiplier with large bit-widths are presented and evaluated.
The remaining paper is structured as follows. Section II introduces the basics of reversible logic. Afterwards, the three methods to generate reversible multipliers are introduced in Section III. This also includes a brief discussion about the advantages and disadvantages of the respective approaches. Finally, experimental results are presented in Section IV and the paper is concluded in Section V, respectively.
II. PRELIMINARIES
To keep the paper self-contained, this section introduces the basics of reversible functions and reversible circuits. For more details we refer to the respective publications.
Definition 1: A multiple-output function f : B n → B m is a reversible function iff (1) its number of inputs is equal to the number of outputs (i.e. n = m) and (2) it maps each input pattern to a unique output pattern.
In other words, each reversible function is a bijection that permutes the set of input patterns. A function that is not reversible is termed irreversible. Quite often, (irreversible) multi-output Boolean functions should be represented by reversible circuits. This necessitates the irreversible function to be embedded into a reversible one which requires the addition of circuit lines leading to constant inputs (i.e. inputs that are assigned to a fixed value) and garbage outputs (i.e. outputs that are don't care for all possible input conditions). The minimal number of circuit lines to be added is determined by the number of the occurrences of the most frequent output pattern [17] .
To realize reversible functions some restrictions must be considered, e.g. fanouts and feedback are not allowed [5] . This is reflected in the definition of reversible circuits.
Definition 2: A reversible circuit G over inputs
g i where d is the number of gates 1 . In the literature, different reversible gates have been studied. The universal multiple control Toffoli gate [18] is the most commonly used gate.
Definition 3: Let X := {x i |1 ≤ i ≤ n} be the set of domain variables. A multiple control Toffoli gate has the form g(C, t), where C = {x ji |1 ≤ i ≤ k} ⊂ X is the set of control lines and t = x l with t ∈ C is the target line. It inverts the target line iff all control lines are assigned to 1, i.e. the gate maps T l−1 i=1
x i . If no control lines are given (C is empty), then the target line is inverted, i.e. the input vector of the gate is mapped to T l−1 i=1
In the following, we refer to multiple control Toffoli gates for brevity as Toffoli gates. Furthermore, a Toffoli gate with no control line (with k control lines) is also called NOT gate (C k NOT gate).
To determine the effort to realize a reversible circuit, the following cost models are applied depending on the target technology:
• Line count denotes the number of lines the circuit uses (in particular for the application in the domain of quantum computing this is an important cost criterion since the number of circuit lines correspond to the number of qubits -so far a very restricted resource).
• Gate count denotes the number of gates the circuit consists of (i.e. d).
• Quantum cost denotes the effort needed to transform a reversible circuit to a quantum circuit. Table I shows the quantum cost for a selection of Toffoli gate configurations as introduced in [19] and further optimized e.g. in [17] . 1 For tuples, we are using the T -symbol which is analogously defined to the sum-symbol
As can be seen, gates of larger size are considerably more expensive than gates of smaller size. The sum of the quantum cost for all gates defines the quantum cost of the whole circuit.
• Transistor cost denotes the effort needed to realize a reversible circuit in CMOS according to [20] . The transistor cost of a Toffoli gate is 8 · c where c is the number of control lines. Example 1: Fig. 1 shows a Toffoli circuit with 3 circuit lines, 6 gates, quantum cost of 10, and transistor cost of 56, respectively. The control lines are thereby denoted by G, while the target lines are denoted by ⊕. The annotated values illustrate the computation performed by this circuit.
Finally, the following definition introduces the denotation of controlled functions.
Definition 4: Let • : B n → B n be a reversible function that is realized by the reversible circuit
controlled function c
• is described by the circuit c G which is obtained by adding the circuit line c to the control set C of each gate g in the circuit G.
In this way, e.g. a controlled increaser c += can be obtained from an increaser += by adding the control bit c to every gate of the circuit, that represents this increaser.
III. REALIZING MULTIPLICATION IN REVERSIBLE LOGIC
In this section, we introduce methods to efficiently realize multiplication in reversible logic. In a straight-forward way, a multiplier can be synthesized by specifying the underlying function in terms of a truth table, binary decision diagram, exclusive-sum-of-products, or similar descriptions, respectively, and pass this to an appropriate synthesis approach (e.g. [9] , [11] , [12] ). Since multiplication is an irreversible function, thereby often circuits with a significant number of additional circuit lines result (even if in case of [9] circuits with minimal number of lines can be achieved). But for the particular case of multiplication, it is possible to reduce the number of circuit lines if an adjusted function specification is used. The corresponding approach is introduced in the first part of this section. However, even then multipliers can be realized only for very small bit-widths. Thus, two further approaches are introduced that enable synthesis of multipliers with scalable bit-widths. Finally, the newly proposed realizations are discussed.
A. Multiplier with Sub-minimal Circuit Lines
As mentioned above, usually additional circuit lines are needed to embed an irreversible function (like multiplication) into a reversible one. More precisely, lines must be added so that at least log 2 (µ) garbage outputs are available, where µ is the maximum number of times an output pattern is 
repeated in the truth table (see [17] ) 2 . Thus, keeping µ as small as possible helps reducing the number of circuit lines. In the following, an adjusted specification of the multiplication function is presented that exploits this observation.
The most frequent output pattern in a multiplier is the binary encoding of zero. The product is zero, iff (at least) one factor is zero. Thus, an n-bit multiplication produces µ = 2 · 2 n − 1 = 2 n+1 − 1 zeros. Hence, log 2 (2 n+1 − 1) = (n + 1) lines with garbage outputs are required. All other possible products (i.e. output patterns) are less frequent. The main idea of this approach is to reduce the occurrences of the zero-output by means of an additional indicator output. As a result, the value of µ is significantly decreased and the multiplier can be realized with less circuit lines.
More precisely, an indicator output is added, which becomes assigned to 1 iff the product of the multiplication is zero. In contrast, all primary outputs can be arbitrarily assigned in this case. In doing so, the result zero is obtained by measuring the indicator output, while all other results still can be obtained from the remaining primary outputs. Thus, the binary encoding of the zero is not longer applied to obtain the minimal number of garbage lines. Instead the next most frequent output pattern is used for that. Therewith, the number of garbage lines can be asymptotically reduced by one half.
Example 2: Consider a 3-bit multiplication. The possible products and their occurrences (i.e. µ) are depicted in Table II . The respective output patterns (ordered in terms of a truth table) are shown in Table III . Since zero is the most frequent product (in total occurring 15 times), at least log 2 (15) = 4 garbage lines are conventionally needed.
In contrast, encoding the zero output by a separate indicator output, 6 and 12 become the most frequent products -each with only 4 occurrences. Hence, only log 2 (4) = 2 additional garbage outputs are needed to realize this function. As a result, this encoding only requires 3 additional outputs (the indicator output and the two garbage outputs) in comparison to the 4 additional outputs required by the conventional method. Column sub-minimal in Table III shows a possible embedding exploiting this encoding.
Having this adjusted specification, any truth table-based synthesis approach (e.g. [9] ) can be applied to realize the multiplier. However, since this approach still relies on a 
B. Hierarchical Method
The realization of multiplier by a hierarchical method using controlled increasers is described in this section.
A common way to multiply two factors a = 
is, the respective bit of b j multiplied by the respective power of 2 is added to the product, iff the respective bit a i is assigned to 1. This can easily be realized by controlled functions (or more precisely controlled increasers as sketched at the end of Section II). Thus, using the hierarchical method an n-bit multiplier is realized by n single n-bit controlled increasers. The respective algorithm for an n-bit multiplication with the factors c i is depicted in Fig. 2 . Here, the ith controlled increaser is controlled by a i . It conditionally adds the value of b to T n−1+i j=i c i , i.e. to the n bits of the product c beginning from the i-th bit. The lower bits do not have to be considered, since the j-th bit of the product is only modified until the j-th controlled increaser. Beyond that, the value remains unchanged. Therefore, the controlled increasers can sequentially be realized by an implicit bit-shift after every increase and without concern for the lower bits. Also, the i-th increaser writes the carry-over in the n + i-th position of the product, which has not been used so far and, thus, holds the value 0. are needed to realize this multiplication. The first controlled increaser is controlled by a 0 . It conditionally adds the second factor b to the three least significant bits of the product c, i.e. to c 2 c 1 c 0 . It also writes the carry-over into c 3 . The second controlled increaser is controlled by a 1 . It conditionally adds b to c 3 c 2 c 1 . Again the carry-over is written into the next productbit c 4 . Finally, the third controlled increaser is controlled by a 2 . It conditionally adds the second factor b to c 4 c 3 c 2 . The carry-over is written into the most significant bit c 5 of the product.
C. Karatsuba Method
This section describes the realization of multipliers in reversible logic based on the divide-and-conquer-method of Karatsuba's algorithm [16] . The idea behind the Karatsuba algorithm is to realize the multiplication by multiplying two factors with smaller bit-width and additionally perform some less expensive operations. Consider an n-bit multiplication with n = 2 · k. Both factors (e.g. a = 2·k−1
With this representation the following equations are deducible:
These equations show that a (2 · k)-bit multiplication can be realized by three k-bit multiplications, some additions, subtractions, and bit-shifts, respectively. In reversible logic, additionally a number of circuit lines are required (as already shown in [21] ). However, the latter can be reduced by choosing appropriate targets of the intermediate results as well as good orderings of the respective operations. Fig. 3 shows an optimized approach to generate reversible multiplier based on this observation (which requires less additional circuit lines than the method of [21] ). The general idea is illustrated by the following example. Since the Karatsuba method is not applicable for very small bit-widths, the variable turningPoint denotes the bit-width below which the hierarchical multiplication is used. Beyond the turningPoint the Karatsuba method is used for multiplication.
Example 4: Consider an 8-bit Karatsuba multiplication and turning point t = 6. Since 8 is greater than t and even, the two conditionals (lines 1 and 4) do not hold and k = 4 is computed (line 7
) and c = a · b (line 10) are performed, respectively. Since these are 4-bit multiplications and the turning point in this example is t = 6, these multiplications are realized using the hierarchical approach described in the previous section. Afterwards, the result of these multiplication is directly assigned to the bits of the product. Furthermore, this result will be used later to modify the product of the sums that are computed next. These sums and e = b + b (line 12). They can be performed by copying the first summand to the (still uninitialized) target and then increasing it by the second summand. The results of these two sums are multiplied to get the third sub-product h = d · e (line 13). Again, since the turning point is greater than the size of the factors, this multiplication is performed by the hierarchical approach. After that, this third sub-product must be modified by subtracting the two earlier computed products from lines 9 and 10 as can be seen in lines 14 and 15 . Finally, the result is obtained by performing the addition of this value to the product using an implicit bitshift of k bits (line 16).
D. Discussion
As reviewed in Section II, there are different cost metrics to judge the quality of reversible circuits, namely the number of circuit lines, the number of gates, the quantum cost, and the transistor cost, respectively. In this section, the costs of the circuits for an n-bit multiplier obtained by the proposed methods are briefly discussed. Experimental results are afterwards given in Section IV.
The sub-minimal approach significantly reduces the minimal number of garbage lines using approximately n 2 garbage lines (instead of n + 1 garbage lines that the conventional approach minimally requires). The number of gates, the quantum cost, and the transistor costs depend on the applied synthesis method.
For the hierarchical approach, an n-bit adder without garbage lines is used, which needs (5 · n − 5) CNOT and (2 · n − 1) C 2 NOT gates [22] . Accordingly, a controlled adder consists of (5 · n − 5) C 2 NOT gates and (2 · n − 1) C 3 NOT gates. The hierarchical method realizes a multiplier with n controlled adders. The first controlled addition can therefore be replaced with a controlled duplication (since the product initially is assigned to 0 and, thus, nothing has to be added). Consequently, this implementation of the hierarchical method leads to circuits with (n − 1) controlled adders and one controlled duplication (which is done with n CNOT gates). In total, circuits result including (5 · n 2 − 9 · n + 5) C 2 NOT gates as well as (2·n 2 −3·n+1) C 3 NOT gates. Therefore, these circuits have quantum costs of 51·n 2 −84·n+38 and transistor costs of 128 · n 2 − 216 · n + 104, respectively. Furthermore, this approach uses separate lines for primary inputs (n lines for each factor) and primary outputs (2 · n lines for the product), i.e. in total 4 · n lines.
The Karatsuba multiplication needs (4 · n 2 + 5) garbage outputs per recursion step. Thus, if t is chosen as turning point and T t represents the number of circuit lines needed for a t-bit hierarchical multiplier, circuits with approximately
garbage outputs are required (additionally to the 2 · n lines for the primary inputs). The number of gates, the quantum cost, and the transistor costs are in O(n log2(3) ). In summary, the Karatsuba approach asymptotically has less quantum cost and transistor costs than the hierarchical approach. But, the hierarchical method requires less circuit lines. The sub-minimal approach leads to circuits with the lowest line count, but is limited by the truth table-based specification.
IV. EXPERIMENTS
The proposed synthesis methods for multiplier have been implemented in C++. In this section, we provide experimental results generated by these methods and compare them to realizations obtained by general purpose approaches (namely the transformation-based [9] , the ESOP-based [11] , and the BDD-based [12] approach, respectively). For the sub-minimal specification, we used the transformation-based approach [9] to synthesize the circuits. The turning point for the Karatsuba approach was set to t = 8. The timeout (denoted by TO) was set to 1000 CPU seconds.
The results are presented in Table IV . Besides the respective bit-width and the resulting number of primary inputs (PI), the line count (LC), the gate count (GC), the quantum cost (QC), and the transistor costs (TC) are listed for each method, respectively. Additionally, the time (in CPU seconds) required to generate the results is listed in column Time.
First of all, the results confirm that using the adjusted specification from Section III-A, multiplier with a lower number of circuit lines can be generated. But due to the truth tablebased description, the method works for very small bit-widths only. A similar behavior can be observed, if general purpose synthesis approaches based on ESOP and BDDs are applied. Indeed, somewhat larger multiplier can be synthesized, but practical relevant bit-widths (e.g. a 32-bit or 64-bit multiplier) cannot be generated. This can be explained by the fact that in particular for the multiplication no efficient representation as ESOP or as BDD, respectively, exists.
In contrast, the hierarchical method and the Karatsuba approach enable the synthesis of reversible multipliers for nearly arbitrary sizes. As discussed in the last section, the Karatsuba method requires more circuit lines, but leads to smaller realizations with respect to quantum cost and transistor cost (in particular for large bit-width). For a better overview, the results of both approaches are additionally illustrated in Fig. 4. V. CONCLUSIONS In this paper, we introduced three methods for multiplier synthesis that particularly address the drawbacks of previous approaches (e.g. the large number of circuit lines in the resulting realizations as well as the poor scalability). We showed that multiplier with a lower number of circuit lines can be obtained by using an adjusted specification of the underlying function. Besides that, two constructive approaches for synthesis of multipliers with very large bit-width are proposed. Experiments confirmed that using these methods, multipliers with large bit-widths can efficiently be synthesized, while previous approaches as well as general purpose synthesis methods do not scale very well on multiplication.
