Ancilla-Input and Garbage-Output Optimized Design of a Reversible
  Quantum Integer Multiplier by HV, Jayashree et al.
arXiv:1608.01228v1 [quant-ph] 3 Aug 2016
Ancilla-Input and Garbage-Output Optimized
Design of a Reversible Quantum Integer Multiplier
Jayashree HV∗, Himanshu Thapliyal†, Hamid R. Arabnia‡, V K Agrawal§
∗ Department of ECE, PES Institute of Technology, Bangalore, KA, India
† Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY, USA
‡ Department of Computer Science, University of Georgia, Athens, GA, USA.
§ Department of ISE, PES Institute of Technology, Bangalore, KA, India
Abstract—A reversible logic has application in quantum com-
puting. A reversible logic design needs resources such as ancilla
and garbage qubits to reconfigure circuit functions or gate
functions. Removing garbage qubits and ancilla qubits is essential
in designing an efficient quantum circuit. Multiple designs have
been proposed in the literature for the reversible multiplication
operation. Multiplication hardware is essential for the circuit
design of quantum algorithms, quantum cryptanalysis, and digital
signal processing (DSP) applications. The existing designs of
reversible quantum integer multipliers suffer from redundant
garbage qubits. In this work, we propose a reversible-logic-based,
garbage-free, and ancilla-qubit-optimized design of a quantum
integer multiplier. The proposed multiplier uses a novel add and
rotate methodology that is especially suitable for the reversible
computing paradigm. The methodology is a modified version of the
conventional shift and add method: an add or no-operation step,
selected by the multiplier qubit, is followed by a rotate right
operation. The proposed design produces zero garbage qubits and
shows an improvement ranging from 60% to 90% in ancilla qubit
count over the existing work on reversible quantum integer
multipliers.
Index Terms—Reversible Logic, Multiplier, Fredkin Gate,
Quantum Arithmetic.
I. INTRODUCTION
Reversible logic has applications in quantum computing.
Reversible circuits are required to have an equal number of
inputs and outputs. They are designed without any feedback
or fanout. Several parameters, or resource constraints, are used
to measure the performance of reversible circuits, namely
quantum cost (QC), garbage outputs (GO), ancilla inputs (AI),
gate count (GC), and delay (△). The quantum cost of a
reversible circuit is the number of 1x1 and 2x2 quantum gates
that are used to construct the circuit. Garbage outputs are
outputs that are neither primary outputs nor required for
further computation. Ancilla (constant) inputs are required to
derive a certain function and to retain the one-to-one mapping.
The delay corresponds to the number of primitive quantum gates
in the critical path of the circuit.
Multipliers are the major computational units that are used
frequently in DSP computations. Optimization is a major
objective in designing a multiplier with design constraints. In a
reversible circuit design, it is necessary to minimize the count
of garbage outputs and ancilla inputs in order to reduce the
total number of qubits; therefore, we present the design of
a reversible multiplier which produces zero garbage outputs
and minimizes the number of ancilla inputs compared to the
existing multiplier designs in the literature.
In this work, we present a modified version of the add and
shift method of multiplication. The basic components used
in our design are the add or NOP block and the rotate right (ROR)
block. To meet our design requirement, we also present a
modified version of the reversible ALU design presented in [1]. In
addition, we present a generalized circuit design methodology,
supported by a generalized behavioral model, for a
constant depth rotate right reversible circuit. This design
is motivated by the design presented in [2]. The reversible
multiplier design presented in this work outperforms existing
multiplier designs in terms of garbage outputs and ancilla
inputs. We also give an estimate of the gate count, quantum
cost, ancilla inputs, and delay for an NxN qubit multiplier.
Optimizing performance parameters always involves a trade-off:
optimizing one parameter affects the others. Here, our objective
is to optimize the ancilla inputs and garbage outputs, so the
remaining parameters are affected. To indicate the trade-off, all
the performance parameters are estimated for each block
used in designing an NxN reversible multiplier. The paper is
organized as follows: Section I introduces the work; Section II
gives the background on reversible logic gates; Sections III, IV,
and V cover existing designs, the behavioral model of the proposed
design, and the proposed circuit design methodology, respectively;
Sections VI, VII, and VIII cover the performance parameter
calculations, the comparison of results, and the conclusion,
respectively.
II. BACKGROUND ON REVERSIBLE LOGIC GATES
This section covers the basics of reversible logic gates. Any
N variable reversible system is built with NxN reversible
circuits. A few 1x1 and 2x2 primitive quantum gates are
used to construct large sized reversible gates and circuits. The
quantum cost of reversible gates used in this work can be found
in [3]. The reversible gates used in this work are Fredkin,
CNOT, Toffoli, and Swap gates [4, 5, 6].
A. CNOT Gate
A CNOT gate is also known as a Feynman gate (FG). It is
a 2x2 reversible gate. The inputs and outputs are denoted as
(A,B) and (P,Q), respectively. Here, A is treated as the control
qubit, while B is treated as the target qubit. The mapping of
inputs to outputs is P ↔ A, Q ↔ A ⊕ B. The
block diagram and symbol of the CNOT gate are shown in Fig.
1. The QC of FG is 1.
(a) CNOT gate (b) Symbol
Fig. 1: CNOT gate and its symbol
B. Toffoli Gate (TG)
This gate is also known as a C2NOT gate. The TG used
in our work is a 3x3 gate with inputs (A,B,C) and outputs
(P,Q,R), respectively. Here, A and B are the control qubits,
while C is the target qubit. The mapping between inputs and
outputs is given with the relation P ↔ A, Q ↔ B, R ↔
(A·B)⊕C. The block diagram and symbol of TG are presented
in Fig. 2. QC of TG is 5.
(a) Toffoli gate (b) Symbol
Fig. 2: Toffoli gate and its symbol
C. Fredkin Gate (FRG)
A Fredkin gate is commonly used as a controlled Swap gate.
In this paper, we use a 3x3 Fredkin gate. A, B, and C are the
inputs and P, Q, and R are the outputs. The mapping
of inputs to outputs depends on the value of A,
which is the control qubit. When A is high, Q ↔ C and
R ↔ B. When A is low, Q ↔ B and R ↔ C. Irrespective
of the value of A, P ↔ A. The block diagram and symbol of the FRG
gate are shown in Fig. 3. The QC of FRG is 5.
(a) Fredkin gate (b) Symbol
Fig. 3: Fredkin gate and its symbol
D. Swap Gate (SG)
The Swap gate is a 2x2 gate with inputs (A,B) and outputs (P,Q).
It exchanges its two qubits unconditionally. The mapping is
A ↔ Q and B ↔ P. The block diagram and symbol are shown in
Fig. 4. The QC of SG is 3.
(a) Swap gate (b) Symbol
Fig. 4: Swap gate and its symbol
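The four gate mappings above can be checked on classical bit values. The following Python sketch is only an illustration under the assumption that qubits are restricted to the basis values 0 and 1; the function names and the `is_reversible` helper are ours and do not appear in the paper.

```python
from itertools import product

def cnot(a, b):
    """Feynman gate: P = A, Q = A xor B."""
    return a, a ^ b

def toffoli(a, b, c):
    """Toffoli gate: P = A, Q = B, R = (A and B) xor C."""
    return a, b, (a & b) ^ c

def fredkin(a, b, c):
    """Fredkin gate: A is the control; B and C are swapped when A is 1."""
    return (a, c, b) if a else (a, b, c)

def swap(a, b):
    """Swap gate: P = B, Q = A."""
    return b, a

def is_reversible(gate, width):
    """A gate is reversible if distinct inputs give distinct outputs."""
    inputs = list(product((0, 1), repeat=width))
    outputs = {gate(*bits) for bits in inputs}
    return len(outputs) == len(inputs)

# Gate name -> (function, number of lines); illustrative bookkeeping only.
GATES = {"CNOT": (cnot, 2), "Toffoli": (toffoli, 3),
         "Fredkin": (fredkin, 3), "Swap": (swap, 2)}
REVERSIBLE = {name: is_reversible(g, w) for name, (g, w) in GATES.items()}
print(REVERSIBLE)
```

Each of the four mappings is a bijection on its input space, which is why none of the gates needs a garbage line when used on its own.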
III. EXISTING WORK
The research on reversible logic is being explored in the
domains of design, synthesis, and testing. Although there
are many synthesis techniques available to realize reversible
circuits, having dedicated designs of a reversible circuit com-
ponent gives flexibility in choosing the designs based on the
application requirement. Multiple ways of designing arithmetic
circuits have been explored in conventional, reversible and
quantum computing [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21]. Several interesting contributions have been
explored in the existing synthesis of reversible logic circuits
[3, 22, 23, 24, 25, 26, 27, 28, 29]. In this section, we discuss
the existing reversible multiplier designs that are important
arithmetic circuits in processing digital signals.
Several multiplier designs have been proposed with a view
to optimizing different performance parameters,
namely quantum cost, ancilla inputs, garbage outputs, logic
depth, and combinations of these parameters. The multipliers
proposed in [30, 31, 32, 33, 34] compute the product in two
phases. In the first phase, the partial products
are computed. In the second phase, the partial
products are summed to obtain the final product. In all the
designs mentioned above, the partial product generation and
summation stages are improved in terms of constant
inputs, quantum cost, or garbage outputs. We found that the
design in [34] gives better results than the other
existing work in terms of ancilla inputs and gate count. Apart
from the regular parallel or array multiplier designs, other
multiplication techniques such as Booth, Wallace, and Vedic
are proposed in [35, 36, 37]. All these designs
are illustrated for small operand widths. A designer, however,
needs to choose designs based on their performance
over a wide range of operand widths. We found that the recent
publication [38] discusses an NxN reversible
multiplier design. We considered the designs proposed in
[38, 39] and compared them with our work, as both
of these papers aim to reduce the number of constant inputs
and garbage outputs.
IV. PROPOSED REVERSIBLE MULTIPLIER BEHAVIORAL
MODEL
In this section, we present an algorithm for multiplying two
n qubit numbers A and B. The result is stored in the 2n qubit
product register P. There are two conventional variants of the
add and shift method of multiplying two numbers: shift the
multiplicand left and add it to the product register contents
iteratively, or add the multiplicand and shift the product register
contents right iteratively. In the first technique, the
multiplicand is no longer in its original form after the
computation is completed, and since it cannot be recovered,
it must be counted as garbage qubits. We choose the second
technique. The final result is the contents of the product
register and the multiplicand contents are unaltered, so
garbage outputs are not generated.
A. Behavioral Model of NxN Multiplier
Algorithm 1 Add and rotate method to model NxN Multiplier
function MULTIPLIER(|A_n⟩, |B_n⟩, |P_2n⟩ = |0_2n⟩)
  for i = 0 to n−2 do
    if |A[i]⟩ = |1⟩ then
      |P[2n−1:n−1]⟩ = |P[2n−1:n−1]⟩ + |B[n−1:0]⟩;
    end if
    |P[2n−1:0]⟩ = ROTATERIGHT(|P[2n−1:0]⟩);
  end for
  if |A[n−1]⟩ = |1⟩ then
    |P[2n−1:n−1]⟩ = |P[2n−1:n−1]⟩ + |B[n−1:0]⟩;
  end if
  return P;
end function
The multiplier algorithm is best explained using an example
of 4x4 multiplication with a dot diagram, as shown in Fig. 5.
Initially, the P register is loaded with ancilla 0 qubits. The
multiplicand B is added to the P register contents. For the
addition, the least significant position (LSP) of the P register
is taken to be n−1 and the positions run up to 2n−1. For the
B register, the LSP is 0 and the positions run up to n−1. The
multiplicand is added to the P register only if the corresponding
multiplier qubit is high; otherwise, only the rotate operation is
performed. The rotate right operation is performed irrespective
of the value of the multiplier qubit. While adding the
multiplicand to the P register, the (n−1)th position of the P
register is aligned to the 0th position of the B register. Due to
this alignment, one rotate operation is eliminated at the end of
the computation. The diagram in Fig. 5 illustrates each step of
the algorithm.
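As a cross-check of Algorithm 1, the following Python sketch simulates the add and rotate method on ordinary integers. It is a classical behavioral model only, not a reversible circuit; the names `rotate_right` and `multiply_add_rotate` are illustrative and do not come from the paper. It assumes that bit n−1 of P is aligned with bit 0 of B, as described above.

```python
def rotate_right(p, width):
    """Rotate a width-bit integer right by one; bit 0 wraps to the MSB."""
    return (p >> 1) | ((p & 1) << (width - 1))

def multiply_add_rotate(a, b, n):
    """Multiply n-bit a (multiplier) by n-bit b (multiplicand), as in Algorithm 1."""
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    width = 2 * n
    p = 0                                 # product register, all ancilla 0
    for i in range(n):
        if (a >> i) & 1:                  # ADD when A[i] is high, else NOP
            p += b << (n - 1)             # B[0] is aligned with P[n-1]
        if i < n - 1:                     # no rotation after the last step
            p = rotate_right(p, width)
    return p

print(multiply_add_rotate(13, 11, 4))     # 143
```

In this model the bit that wraps around during each rotation is always 0, because the low n−1 positions of P fill up only one position per iteration. The rotation therefore behaves like a right shift while remaining a reversible permutation of the register.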
B. Behavioral Model of Rotate Right operation
In this section, we present a rotate right operation for a data
width of 2n qubits. The multiplication technique of Algorithm 1
needs to rotate the contents of the P register, which is 2n qubits
wide, to the right. The rotate right operation is performed by
swapping qubits in two stages, and the circuit is designed to have
constant logic depth. As an illustrative example, Fig. 6 shows a
rotate right operation for a data width of 8 qubits. The numbers
0 to 7 represent the qubit positions, and the initial arrangement
of the qubits is shown in the leftmost part of Fig. 6. In the first
stage, the qubits in the position pairs (0,7), (1,6), (2,5), and
(3,4) are swapped [(0,7) indicates that 0 is swapped with 7]. In
the second stage, (0,6), (1,5), and (2,4) are swapped. The swaps
within each stage act on disjoint pairs and can run in parallel,
which reduces the logic depth compared to shifting the qubits
sequentially. The reversible circuit design of the rotate right
operation is discussed in Section V-B.
Fig. 5: 4x4 qubit multiplication dot diagram
Fig. 6: Rotate right with two sets of disjoint transpositions
Algorithm 2 gives generalized pseudo code for the method of
Fig. 6, which swaps the qubits in two stages and rotates the data
right by 1 qubit position. The code works for both even and odd
data widths, so Algorithm 2 is useful in any application where the
input data width is variable. For the multiplication technique
proposed in this paper, the width of the product register (P) is
always even; hence, the second half of the pseudo code is
redundant for the proposed work.
V. PROPOSED GARBAGELESS REVERSIBLE MULTIPLIER
CIRCUIT DESIGN
From the behavioral models in Algorithm 1 and Algorithm 2,
presented in Section IV, it is clear that computing
the product of two n qubit numbers requires
the following reversible circuits: (1) an n qubit addition or no
operation (ADD/NOP) circuit and (2) an uncontrolled rotate right
circuit. This section elaborates on the reversible
circuit design methodology of the ADD/NOP and rotate right
(ROR) blocks.
Algorithm 2 Pseudo Code for Rotate Right Operation
ROTATERIGHT(|P⟩)
  k = SIZEOF(|P⟩);                         ▷ k is an integer
  k1 = FLOOR(k/2);                         ▷ k1 is an integer
  if k mod 2 == 0 then                     ▷ For even number of qubits
    i = 0; j = k − 1;                      ▷ i and j are integers
    while i < k1 && j >= k1 do             ▷ First stage
      SWAP(|P[i]⟩, |P[j]⟩);
      i = i + 1; j = j − 1;
    end while
    i = 0; j = k − 2;
    while i < k1 − 1 && j >= k1 do         ▷ Second stage
      SWAP(|P[i]⟩, |P[j]⟩);
      i = i + 1; j = j − 1;
    end while
  else                                     ▷ For odd number of qubits
    i = 0; j = k − 1;
    while i < k1 && j >= k1 + 1 do         ▷ First stage
      SWAP(|P[i]⟩, |P[j]⟩);
      i = i + 1; j = j − 1;
    end while
    i = 0; j = k − 2;
    while i < k1 && j >= k1 do             ▷ Second stage
      SWAP(|P[i]⟩, |P[j]⟩);
      i = i + 1; j = j − 1;
    end while
  end if
  return P;
A. ADD or NOP Circuit Design
The ADD or NOP block has evolved from the ALU design
proposed in [1]. We have modified the original work to adapt it
to our garbageless multiplier design. The reversible circuit
design of the ADD/NOP block is shown in Fig. 7.
Fig. 7: Reversible ADD/NOP circuit
Inputs to the ADD/NOP block are:
(a) the (n+1) qubit segment of the product register |P[2n−1:n−1]⟩,
(b) the n qubit input operand |B[n−1:0]⟩, (c) the 1 qubit input
Zcin initialized with ancilla 0, and (d) the 1 qubit input operand
A[m], where m is the qubit position varying from 0 to n−1. Here,
A[m] acts as the control qubit; if it is high, the P and B register
contents are added. At the output, the B register contents are
unaltered, whereas the P register holds the sum of the
P and B contents. If A[m] is low, the B and P register
contents are regenerated at the output without modification.
The role of Zcin is to propagate the carry generated from
the previous qubit position. If the control qubit A[m] is high,
P[2n−1] receives the final carry out; otherwise, it
retains its value.
The computation of the ADD/NOP block is summarized below.
Here, the index of B varies from 0 to n−1 and the index of P
varies from n−1 to 2n−1, according to the requirement of the
multiplier design. The index of A is m, which ranges from
0 to n−1, where n is the size of the operands (multiplier and
multiplicand). Here, j indicates the qubit position in
the product register P.
1) Computation Phase 1:
Initialize Zcin with an ancilla 0 qubit. In further stages,
the same line propagates the carry generated by
the previous stage.
2) Step 1: Apply a 3x3 Toffoli gate at locations A[m], B[0],
and P[n−1]. After the computation, A[m] and B[0]
retain their values, and P[n−1] is transformed according
to the equation given below.
\[ |P_{[n-1]}\rangle = \left(|A_{[m]}\rangle \cdot |B_{[0]}\rangle\right) \oplus |P_{[n-1]}\rangle \tag{1} \]
3) Step 2a: For 0 ≤ i ≤ n−1 and n−1 ≤ j ≤ 2n−2, apply a
3x3 Fredkin gate at locations P[j], B[i], and Zcin. Here,
P[j] acts as the control line of the FRG gate and does not
change after the computation. The remaining lines B[i]
and Zcin are modified according to the equations
shown below.
\[ |Z_{cin}\rangle = \begin{cases} |B_{[i]}\rangle & \text{if } |P_{[j]}\rangle = 1 \\ |Z_{cin}\rangle & \text{if } |P_{[j]}\rangle = 0 \end{cases} \tag{2} \]
\[ |B_{[i]}\rangle = \begin{cases} |Z_{cin}\rangle & \text{if } |P_{[j]}\rangle = 1 \\ |B_{[i]}\rangle & \text{if } |P_{[j]}\rangle = 0 \end{cases} \tag{3} \]
4) Step 2b: For 1 ≤ i ≤ n−1 and n ≤ j ≤ 2n−2, apply a 3x3
Toffoli gate at locations A[m], B[i], and P[j]. After the
computation, A[m] and B[i] retain their values, and P[j]
is transformed according to the equation given
below.
\[ |P_{[j]}\rangle = \begin{cases} |P_{[j]}\rangle \oplus |B_{[i]}\rangle & \text{if } |A_{[m]}\rangle = 1 \\ |P_{[j]}\rangle & \text{if } |A_{[m]}\rangle = 0 \end{cases} \tag{4} \]
Step 2a and Step 2b execute concurrently.
5) Step 3: Apply a 3x3 Toffoli gate at locations A[m],
B[n−1], and P[2n−1]. This step is required in order to
store the final carry out after the n qubit addition. After the
computation, P[2n−1] stores the final carry out.
\[ |P_{[2n-1]}\rangle = \begin{cases} |B_{[n-1]}\rangle & \text{if } |A_{[m]}\rangle = 1 \\ |P_{[2n-1]}\rangle & \text{if } |A_{[m]}\rangle = 0 \end{cases} \tag{5} \]
In other words, P[2n−1] stores the final carry out only
if the control line corresponding to the multiplier
qubit A[m] is high; otherwise, it keeps the value it
already holds, that is, the carry value stored in P[2n−1]
by the previous computation.
6) Computation Phase 2:
The steps in this phase generate
the final product value P and regenerate the
contents of the multiplicand B.
7) Step 4a: For 2n−2 ≥ j ≥ n−1 and n−1 ≥ i ≥ 0,
apply a 3x3 Fredkin gate at locations P[j], B[i], and
Zcin. After the computation, the value at P[j] is
retained, whereas Zcin and B[i] are modified
according to Equations (2) and (3) of Computation Phase
1, Step 2a.
8) Step 4b: For 2n−2 ≥ j ≥ n−1 and n−1 ≥ i ≥ 0, apply
a 3x3 Toffoli gate at locations A[m], Zcin, and P[j]. At
the end of the computation, A[m] and Zcin retain their
values, whereas P[j] is modified according to the
equation given below.
\[ |P_{[j]}\rangle = \begin{cases} |Z_{cin}\rangle \oplus |P_{[j]}\rangle & \text{if } |A_{[m]}\rangle = 1 \\ |P_{[j]}\rangle & \text{if } |A_{[m]}\rangle = 0 \end{cases} \tag{6} \]
Steps 4a and 4b execute sequentially.
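The interplay of the Toffoli and Fredkin gates in the steps above can be illustrated on a single bit position. The Python sketch below is not the n qubit circuit of Fig. 7 and does not reproduce how the carry is wired between neighbouring positions. It only assumes one slice with lines A (control), B, P, and Z (carry in), and it reads Equation (5) as saying that the carry out appears on the B line after the Fredkin gate. The function names are ours.

```python
from itertools import product

def toffoli(a, b, c):
    """P = A, Q = B, R = (A and B) xor C."""
    return a, b, (a & b) ^ c

def fredkin(ctrl, x, y):
    """Swap x and y when ctrl is 1."""
    return (ctrl, y, x) if ctrl else (ctrl, x, y)

def one_bit_add_nop(a, b, p, z):
    """One-bit ADD/NOP slice; returns (b_out, p_out, z_out, carry_seen_on_b)."""
    _, _, p = toffoli(a, b, p)      # Phase 1: half sum, P <- P xor (A and B)
    _, b, z = fredkin(p, b, z)      # Phase 1: P controls a swap of B and Z
    carry = b                       # with A = 1 the B line now holds the carry
    _, b, z = fredkin(p, b, z)      # Phase 2: same Fredkin again restores B, Z
    _, _, p = toffoli(a, z, p)      # Phase 2: full sum, P <- P xor (A and Z)
    return b, p, z, carry

# Exhaustive check over all basis inputs of the slice.
for a, b, p, z in product((0, 1), repeat=4):
    b_out, p_out, z_out, carry = one_bit_add_nop(a, b, p, z)
    assert (b_out, z_out) == (b, z)               # B and Z are regenerated
    if a:
        assert p_out == b ^ p ^ z                 # ADD: full-adder sum
        assert carry == (b & p) | (b & z) | (p & z)
    else:
        assert p_out == p                         # NOP: P is unchanged
print("one-bit ADD/NOP slice behaves as a controlled full adder")
```

The slice shows why no garbage is left behind: the Fredkin gate is its own inverse, so applying it a second time in Phase 2 regenerates B and Z before the carry in is folded into P.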
B. Rotate Right Reversible Circuit Design
The reversible circuit for the rotate right operation is shown
in Fig. 8. The circuit takes no ancilla inputs and rotates the data
to the right, from the MSB toward the LSB, by 1 position
(ROR). The rotate circuit is built from Swap
gates and performs the rotation with constant delay. For
clarity, we show an 8 qubit ROR circuit
design. The product register qubits P[0] to P[7] are given as
input. After one rotate operation, P[0] occupies the position of
P[7], and the qubits P[7] down to P[1] each shift right by one
position. The quantum cost of a Swap gate is 3. The rotation
is decomposed into two sets of disjoint swaps, so its critical
path consists of two Swap gates in series and the delay is a
constant 6. The gates shown in the dotted
boxes are executed in parallel. The rotator design is motivated
by the property proven in [2]. According to the authors,
any permutation is the composition of two sets of disjoint
transpositions. This is illustrated in Fig. 6, which also shows
that the cycle is a composition of two reflections. The authors
have proven that any permutation of n qubits can be performed
in 4 layers (levels of logic depth) of CNOT gates with n
ancilla input qubits, or in 6 layers with no ancilla input qubits
(the delay of 2 Swap gates). Had we opted for the first
multiplication technique, in which the multiplicand (the n bit
operand) is shifted, the multiplicand would be left altered,
which in turn would yield garbage outputs.
The proposed rotate circuit performs an unconditional rotate
operation, in the sense that the rotation is performed
irrespective of the multiplier qubit value. Another option that
we explored was a conventional multiplication technique that
needs a conditional shift or rotate circuit. A controlled Swap
gate (Fredkin gate) can be used instead of a Swap gate to rotate
the qubits. The rotate operation is then controlled by the A[m]
qubit, and the same control qubit is used by all the Fredkin gates
in the rotate circuit, as shown in Fig. 9. Here, the computation
becomes sequential and the delay increases with the size of the
rotate circuit, unlike in our proposed design. Another reason for
rejecting the design shown in Fig. 9 is that the quantum cost
of the Fredkin gate is higher than that of the Swap gate. To
optimize the delay and quantum cost, we omitted this option. If
the delay has to be kept constant, one has to copy the control
line for these Fredkin gates and use the copies in parallel. This
again increases the number of ancilla lines, thus violating our
objective of minimizing the ancilla lines.
Fig. 8: Reversible rotate right circuit (ROR)
Fig. 9: Alternative reversible rotate circuit
C. Reversible Multiplier Circuit Design Methodology
In this section, we illustrate the design steps of a reversible
multiplier. The complete circuit is shown in Fig. 10.
Fig. 10: Reversible multiplier circuit
1) For m = 0 to n−2, repeat Step 1 and Step 2.
2) Step 1: ADD or NOP
Apply the data qubits A[m], Zcin, the product register
segment P[2n−1:n−1], and the multiplicand register contents
B[n−1:0] to the ADD/NOP block. After the computation,
the contents of A[m], Zcin, and B are restored, whereas
the P register contents are modified according
to the computation equations given in Section
V-A. The ADD/NOP circuit performs the addition on the P
and B register contents if A[m] is high; otherwise, the
contents of those registers are retained.
3) Step 2: Rotate Right (ROR)
Apply the P register contents to the ROR block (ROR-1
indicates rotate right by 1 position), which performs
a rotate right operation with a constant delay of 6. The
computation is carried out as follows:
|P[2n−1:0]⟩ ← ROTATERIGHT(|P[2n−1:0]⟩);
4) Step 3: Set m = n−1 and repeat Step 1.
VI. PERFORMANCE PARAMETERS CALCULATION
In this section, we discuss the performance parameters and
the calculation for each circuit used in the reversible multiplier
design. As a final part of the calculation, we show the overall
calculation of the reversible multiplier.
A. Performance Parameters of ADD/NOP Block
The equations below refer to the design shown in Fig. 7. The
quantum cost (QC) is calculated as follows.
\[
\begin{aligned}
QC(\text{ADD/NOP}) &= 5 \cdot \text{No. of TG} + 5 \cdot \text{No. of FRG} \\
&= 5(2n+1) + 5(2n) \\
&= 20n + 5
\end{aligned}
\tag{7}
\]
The ancilla inputs comprise the n+1 product register qubits,
which are initially set to ancilla 0, and Zcin, which is used for
carry propagation and is also initialized to ancilla 0. The
ancilla count of the n qubit ADD/NOP block is therefore
\[ AI(\text{ADD/NOP}) = n + 2 \tag{8} \]
The delay of the ADD/NOP block is its critical path delay.
To find the critical path, the design is divided into
stages that execute sequentially, marked by vertical lines in
Fig. 11. The stages are grouped into Phase 1 and Phase 2.
Phase 1 computes the half sum and the final carry out. Phase 2
computes the full sum and regenerates the contents of the B
register. There are 3n+2 stages in total, counting the stages
that involve Toffoli gates and Fredkin gates.
Since the delay of a stage is proportional to the number of
quantum gates in its reversible gate (5 for both the Toffoli and
the Fredkin gate), the total delay of the ADD/NOP circuit
is given by the equation shown below.
\[
\begin{aligned}
\triangle(\text{ADD/NOP}) &= \text{Delay of single stage} \cdot \text{No. of stages} \\
&= 5(3n+2) \\
&= 15n + 10
\end{aligned}
\tag{9}
\]
Fig. 11: Critical Path Computation
B. Performance Parameter of ROR Block
The performance parameters are estimated for the rotate
right reversible circuit (ROR). As discussed in the previous
sections, one rotation is carried out in two phases of swap
operations, and the swaps within each phase run in parallel.
The rotate circuit has no garbage outputs and no ancilla input
qubits. The quantum cost (QC) and delay (△) are calculated as
follows.
\[
\begin{aligned}
QC(\text{ROR}) &= 3 \cdot \text{No. of SG} \\
&= 3(n-1) \\
&= 3n - 3
\end{aligned}
\tag{10}
\]
The above equation is for a generalized rotate circuit on n
qubits. In our work, the 2n qubits of the product register
are fed to the rotate block. The modified equation is shown
below.
\[
\begin{aligned}
QC(\text{ROR}) &= 3 \cdot \text{No. of SG} \\
&= 3(2n-1) \\
&= 6n - 3
\end{aligned}
\tag{11}
\]
\[ \triangle(\text{ROR}) = 6 \tag{12} \]
C. Performance Parameters of NxN Reversible Multiplier
Block
The performance parameters of the NxN reversible
multiplier block are obtained by summing up the parameters
of the ADD/NOP and rotate right (ROR) components.
\[
\begin{aligned}
QC(\text{Mul}) &= n \cdot QC(\text{ADD/NOP}) + (n-1) \cdot QC(\text{ROR}) \\
&= n(20n+5) + (n-1)(6n-3) \\
&= 26n^2 - 4n + 3
\end{aligned}
\tag{13}
\]
Although the rotate right circuit needs no ancilla inputs of its
own, both the ADD/NOP block and the ROR circuit operate on the P
register, and all 2n qubit locations of the P register are
initialized with ancilla 0 qubits. Counting the full P register
together with Zcin gives the ancilla inputs of the multiplier.
\[
\begin{aligned}
AI(\text{Mul}) &= AI(\text{ADD/NOP}) + AI(\text{ROR}) \\
&= 2n + 1
\end{aligned}
\tag{14}
\]
The delay of the multiplier is the sum of the delays of the
ADD/NOP blocks and the rotate right circuits.
\[
\begin{aligned}
\triangle(\text{Mul}) &= n \cdot \triangle(\text{ADD/NOP}) + (n-1) \cdot \triangle(\text{ROR}) \\
&= n(15n+10) + (n-1) \cdot 6 \\
&= 15n^2 + 16n - 6
\end{aligned}
\tag{15}
\]
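The closed forms in Equations (7) to (15) can be checked numerically. The short Python sketch below encodes the per-block expressions exactly as stated in this section and compares their sums with the expanded polynomials; the function names are illustrative only.

```python
def qc_add_nop(n):
    """Eq. (7): 2n+1 Toffoli gates and 2n Fredkin gates, each of cost 5."""
    return 5 * (2 * n + 1) + 5 * (2 * n)

def delay_add_nop(n):
    """Eq. (9): 3n+2 sequential stages, each with a delay of 5."""
    return 5 * (3 * n + 2)

def qc_ror(width):
    """Eqs. (10)-(11): width-1 Swap gates of cost 3 each."""
    return 3 * (width - 1)

def qc_mul(n):
    """Eq. (13): n ADD/NOP blocks and n-1 rotations of the 2n-qubit register."""
    return n * qc_add_nop(n) + (n - 1) * qc_ror(2 * n)

def delay_mul(n):
    """Eq. (15): each rotation contributes a constant delay of 6."""
    return n * delay_add_nop(n) + (n - 1) * 6

def ai_mul(n):
    """Eq. (14): 2n product register qubits plus Zcin."""
    return 2 * n + 1

for n in (4, 8, 16):
    print(n, qc_mul(n), ai_mul(n), delay_mul(n))
```

For n = 4 this gives a quantum cost of 403, 9 ancilla inputs, and a delay of 298, in agreement with 26n² − 4n + 3 and 15n² + 16n − 6.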
VII. COMPARISON RESULTS
In the literature, most designs are presented for a
4x4 reversible multiplier. It is necessary, however, to design a
circuit that scales to any size. Hence, we compared our design
with the other NxN reversible designs available in the
literature. The designs proposed in [38] and [39] give
calculations for an NxN reversible multiplier. In both of these
papers, the authors report only the ancilla inputs and
garbage outputs, so Tables I and II list only those parameters
for these papers. It is clear from Table I that the percentage of
improvement increases with the operand size. Our proposed design
shows a clear improvement in ancilla inputs, which saves chip
area because the number of lines is reduced.

TABLE I: Ancilla inputs comparison of NxN Reversible Multiplier
(AI = ancilla inputs)
N      AI in [1]   AI in [2]   AI in [3]   % imp over [2]   % imp over [3]
4      9           23          28          60.86            67.85
8      17          83          120         79.51            85.83
16     33          303         496         89.10            93.34
32     65          1135        2016        94.27            96.77
64     129         4351        8128        97.03            98.41
128    257         16959       32640       98.48            99.21
256    513         66815       130816      99.23            99.60
512    1025        264959      523776      99.61            99.80
1024   2049        1054719     2096128     99.80            99.90
[1]-Proposed design, [2]-S. Kotiyal et al. [38], [3]-R. Zhou et al. [39]

We have listed the garbage outputs of the designs proposed
in [38] and [39] in Table II. Our design is 100% better than the
existing designs in terms of garbage outputs, since it produces
no garbage.

TABLE II: Garbage Outputs Comparison for NxN Reversible Multiplier
(GO = garbage outputs)
N      GO in [1]   GO in [2]   % imp over [1] & [2]
4      22          36          100
8      81          168         100
16     300         720         100
32     1131        2976        100
64     4346        12096       100
128    16953       48768       100
256    66808       195840      100
512    264951      784896      100
1024   1054719     3142656     100
[1]-S. Kotiyal et al. [38], [2]-R. Zhou et al. [39]

We also compared our design with another garbageless reversible
multiplier design, the Karatsuba multiplier that uses the
recursive scheme presented in [40]. The comparison
of gate count, ancilla inputs, and delay is shown in Table
III. The design of Karatsuba (1) shown in Table III follows
Bennett's first scheme: an extra register is used to store the
result and the circuit is run backward. The recursive calls are
made in parallel to reduce the time complexity. The design
of Karatsuba (2) also uses parallel recursive calls, but it
does not follow Bennett's first scheme; instead, it follows
a recursive garbage disposal scheme in which the multiplication
is computed in parallel with garbage disposal. The trade-off is
that the number of gates increases due to the additional design
blocks used in the garbage disposal process. The
design of Karatsuba (3) follows Bennett's first scheme for
garbage disposal. The only difference with respect to
Karatsuba (1) is that the recursive calls are sequential
rather than parallel. Karatsuba (4) is designed using a
recursive scheme similar to the one adopted in Karatsuba (2),
but its recursive calls are sequential. For the designs presented
in [40], we considered the minimum bound on the ancilla inputs.
It is observed from Table III that, with a slight
increase in the delay and gate count, the proposed design
improves on the ancilla inputs of all the Karatsuba
designs.

TABLE III: Comparison with Karatsuba Recursive Multiplier
Designs    Gate count        Ancilla inputs   Delay
K(1)       O(n^{log_2 3})    6n               O(n)
K(2)       O(n^{log_2 6})    4n               O(n^{log_2 6})
K(3)       O(n^{log_2 3})    5n + n/2 + 1     O(n^{log_2 3})
K(4)       O(n^{log_2 6})    3n + n/2         O(n^{log_2 6})
Proposed   O(n^2)            2n + 1           O(n^2)
K(1), K(2), K(3), and K(4) indicate Karatsuba designs
1, 2, 3, and 4 proposed in [40]
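The percentage columns of Table I can be re-derived from the ancilla counts. The Python sketch below copies the counts for [38] and [39] from Table I and takes 2n + 1 for the proposed design from Equation (14). The `improvement` helper and the observation that the listed percentages are truncated, not rounded, to two decimals are our own reading of the table.

```python
# N -> (ancilla inputs in [38], ancilla inputs in [39]), copied from Table I.
TABLE_I = {
    4: (23, 28), 8: (83, 120), 16: (303, 496), 32: (1135, 2016),
    64: (4351, 8128), 128: (16959, 32640), 256: (66815, 130816),
    512: (264959, 523776), 1024: (1054719, 2096128),
}

def proposed_ancilla(n):
    """Ancilla inputs of the proposed multiplier, Eq. (14)."""
    return 2 * n + 1

def improvement(proposed, existing):
    """Percentage reduction of `proposed` relative to `existing`."""
    return 100.0 * (existing - proposed) / existing

for n, (kotiyal, zhou) in TABLE_I.items():
    ours = proposed_ancilla(n)
    print(f"N={n:5d}  ours={ours:5d}  "
          f"vs [38]: {improvement(ours, kotiyal):6.3f}%  "
          f"vs [39]: {improvement(ours, zhou):6.3f}%")
```

The improvement grows with N because the proposed ancilla count is linear in n, whereas the counts listed for [38] and [39] grow roughly quadratically.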
VIII. CONCLUSION
In this work, we have proposed an add and rotate based
NxN reversible multiplier design. We presented the general
behavioral model of the design. The proposed multiplier is
compared with the relevant existing reversible multiplier
designs in the literature. We presented generalized equations
for the performance parameters of the proposed reversible
multiplier. The comparison results show that our
work outperforms the other designs in terms of ancilla
inputs and produces zero garbage outputs. The proposed design
can be integrated into larger data path subsystem designs where
reducing garbage outputs and ancilla inputs is a major
concern.
REFERENCES
[1] M. K. Thomsen, R. Glück, and H. B. Axelsen, “Reversible
arithmetic logic unit for quantum arithmetic,”
Journal of Physics A: Mathematical and Theoretical,
vol. 43, no. 38, p. 382002, 2010.
[2] N. Margolus, “Parallel quantum computation,” Complex-
ity, Entropy and the Physics of Information, Santa Fe
Institute Studies in the Sciences of Complexity, vol. 8,
pp. 273–287, 1990.
[3] D. Maslov, “Reversible logic synthesis benchmarks
page,” Online: http://www.cs.uvic.ca/~dmaslov, 2005.
[4] E. Fredkin and T. Toffoli, Conservative logic. Springer,
2002.
[5] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo,
N. Margolus, P. Shor, T. Sleator, J. A. Smolin, and
H. Weinfurter, “Elementary gates for quantum computation,”
Physical Review A, vol. 52, no. 5, p. 3457, 1995.
[6] J. A. Smolin and D. P. DiVincenzo, “Five two-bit quan-
tum gates are sufficient to implement the quantum fredkin
gate,” Physical Review A, vol. 53, no. 4, pp. 2855–2856,
1996.
[7] T. G. Draper, S. A. Kutin, E. M. Rains, and K. M. Svore,
“A logarithmic-depth quantum carry-lookahead adder,”
Quantum Information & Computation, vol. 6, no. 4, pp.
351–369, 2006.
[8] Y. Takahashi and N. Kunihiro, “A linear-size quantum
circuit for addition with no ancillary qubits,” Quantum
Information & Computation, vol. 5, no. 6, pp. 440–448,
2005.
[9] Y. Takahashi, “Quantum arithmetic circuits: A survey,”
IEICE TRANSACTIONS on Fundamentals of Electronics,
Communications and Computer Sciences, vol. 92, no. 5,
pp. 1276–1283, 2009.
[10] Y. Takahashi, S. Tani, and N. Kunihiro, “Quantum ad-
dition circuits and unbounded fan-out,” arXiv preprint
arXiv:0910.2530, 2009.
[11] B.-S. Choi and R. Van Meter, “On the effect of quantum
interaction distance on quantum addition circuits,” ACM
Journal on Emerging Technologies in Computing Systems
(JETC), vol. 7, no. 3, p. 11, 2011.
[12] V. Vedral, A. Barenco, and A. Ekert, “Quantum networks
for elementary arithmetic operations,” Physical Review A,
vol. 54, no. 1, p. 147, 1996.
[13] H. Thapliyal and N. Ranganathan, “Design of efficient
reversible logic-based binary and BCD adder circuits,”
ACM Journal on Emerging Technologies in Computing
Systems (JETC), vol. 9, no. 3, p. 17, 2013.
[14] H. Thapliyal, H. Jayashree, A. Nagamani, and H. R.
Arabnia, “Progress in reversible processor design: A
novel methodology for reversible carry look-ahead
adder,” in Transactions on Computational Science XVII.
Springer, 2013, pp. 73–97.
[15] H. Thapliyal, H. Arabnia, and A. P. Vinod, “Combined
integer and floating point multiplication architecture
(CIFM) for FPGAs and its reversible logic implementation,”
in Circuits and Systems, 2006. MWSCAS’06. 49th IEEE
International Midwest Symposium on, vol. 2. IEEE,
2006, pp. 438–442.
[16] H. Thapliyal, H. R. Arabnia, and M. Srinivas, “Reduced
area low power high throughput BCD adders for IEEE 754r
format,” arXiv preprint cs/0609036, 2006.
[17] H. Thapliyal and H. R. Arabnia, “Reversible programmable
logic array (RPLA) using Fredkin & Feynman
gates for industrial electronics and applications,” arXiv
preprint cs/0609029, 2006.
[18] H. Thapliyal, M. Srinivas, and H. R. Arabnia, “Reversible
logic synthesis of half, full and parallel subtractors,” in
ESA, 2005, pp. 165–181.
[19] H. Thapliyal, M. B. Srinivas, and H. R. Arabnia, “A Need
of Quantum Computing: Reversible Logic Synthesis of
Parallel Binary Adder-Subtractor,” in Embedded Systems
and Applications, 2005, pp. 60–68.
[20] H. Thapliyal, M. Srinivas, and H. R. Arabnia, “A reversible
version of 4 x 4 bit array multiplier with minimum
gates and garbage outputs,” in Embedded Systems
and Applications, 2005, pp. 106–116.
[21] H. Thapliyal, H. R. Arabnia, and M. Srinivas, “Efficient
reversible logic design of BCD subtractors,” in Transactions
on Computational Science III. Springer, 2009, pp.
99–121.
[22] A. K. Prasad, V. Shende, I. Markov, J. Hayes, and K. N.
Patel, “Data structures and algorithms for simplifying
reversible circuits,” ACM JETC, vol. 2, no. 4, pp. 277–293,
2006.
[23] O. Golubitsky and D. Maslov, “A study of optimal 4-bit
reversible Toffoli circuits and their synthesis,” Computers,
IEEE Transactions on, vol. 61, no. 9, pp. 1341–1353,
2012.
[24] D. Maslov and G. W. Dueck, “Reversible cascades with
minimal garbage,” Computer-Aided Design of Integrated
Circuits and Systems, IEEE Transactions on, vol. 23,
no. 11, pp. 1497–1509, 2004.
[25] G. Yang, X. Song, W. N. Hung, and M. A. Perkowski,
“Bi-directional synthesis of 4-bit reversible circuits,” The
Computer Journal, vol. 51, no. 2, pp. 207–215, 2008.
[26] D. Maslov and M. Saeedi, “Reversible circuit optimization
via leaving the Boolean domain,” Computer-Aided
Design of Integrated Circuits and Systems, IEEE
Transactions on, vol. 30, no. 6, pp. 806–816, 2011.
[27] D. Maslov, “On the advantages of using relative phase
Toffolis with an application to multiple control Toffoli
optimization,” arXiv preprint arXiv:1508.03273, 2015.
[28] M. Saeedi, M. S. Zamani, M. Sedighi, and Z. Sasanian,
“Reversible circuit synthesis using a cycle-based
approach,” J. Emerg. Technol. Comput. Syst., vol. 6, pp.
13:1–13:26, December 2010.
[29] W. N. Hung, X. Song, G. Yang, J. Yang, and
M. Perkowski, “Optimal synthesis of multiple output
Boolean functions using a set of quantum gates by
symbolic reachability analysis,” IEEE Trans. Computer-Aided
Design, vol. 25, no. 9, pp. 1652–1663, Sep.
2006.
[30] M. Zomorodi Moghadam and K. Navi, “Ultra-area-
efficient reversible multiplier,” Microelectronics Journal,
vol. 43, no. 6, pp. 377–385, 2012.
[31] H. Bhagyalakshmi and M. Venkatesha, “Optimized multiplier
using reversible multi-control input Toffoli gates,”
International Journal of VLSI Design & Communication
Systems, vol. 3, no. 6, 2012.
[32] V. K. Panchal and V. H. Nayak, “Analysis of multiplier
circuit using reversible logic,” International Journal for
Scientific Research and Development, vol. 1, no. 6, pp.
279–284, 2014.
[33] H. Rangaraju, A. B. Suresh, and K. Muralidhara, “Design
of efficient reversible multiplier,” in Advances in Computing
and Information Technology. Springer, 2013, pp.
571–579.
[34] S. Mamataj, B. Das, and S. Chandran, “An approach
for designing an optimized reversible parallel multiplier
by reversible gates,” in Computational Advancement in
Communication Circuits and Systems. Springer, 2015,
pp. 345–355.
[35] J. Sultana, S. K. Mitra, and A. R. Chowdhury, “On the
analysis of reversible Booth’s multiplier,” in VLSI Design
(VLSID), 2015 28th International Conference on. IEEE,
2015, pp. 170–175.
[36] H. Thapliyal and M. Srinivas, “Novel reversible multiplier
architecture using reversible TSG gate,” arXiv
preprint cs/0605004, 2006.
[37] A. Banerjee and D. K. Das, “The design of reversible
multiplier using ancient Indian mathematics,” in 2013
International Symposium on Electronic System Design
(ISED). IEEE, 2013, pp. 31–35.
[38] S. Kotiyal, H. Thapliyal, and N. Ranganathan, “Reversible
logic based multiplication computing unit using
binary tree data structure,” The Journal of Supercomputing,
pp. 1–26, 2015.
[39] R. Zhou, Y. Shi, H. Wang, and J. Cao, “Transistor
realization of reversible ZS series gates and reversible
array multiplier,” Microelectronics Journal, vol. 42, no. 2,
pp. 305–315, 2011.
[40] R. Portugal and C. Figueiredo, “Reversible Karatsuba’s
algorithm,” Journal of Universal Computer Science,
vol. 12, no. 5, pp. 499–511, 2006.
