Quantum Circuits for Toom-Cook Multiplication by Dutta, Srijit et al.
Quantum Circuits for Toom-Cook Multiplication
Srijit Dutta,1 Debjyoti Bhattacharjee,2, ∗ and Anupam Chattopadhyay2
1Department of Computer Science and Engineering, IIT Bombay, India
2School of Computer Science and Engineering, Nanyang Technological University, Singapore
(Dated: June 26, 2018)
In this paper, we report efficient quantum circuits for integer multiplication using the Toom-Cook
algorithm. By analyzing the recursive tree structure of the algorithm, we obtained a bound on
the count of Toffoli gates and qubits. These bounds are further improved by employing reversible
pebble games through uncomputing the intermediate results. The asymptotic bounds for different
performance metrics of the proposed quantum circuit are superior to the prior implementations of
multiplier circuits using schoolbook and Karatsuba algorithms.
I. INTRODUCTION
Quantum computing has gathered significant attention
by solving certain problems much faster than any known
classical algorithm. In contrast to Boolean logic, quan-
tum bits (qubits) not only represent the classical 0 and 1
states but also any complex combination or superposition
of both, leading to a significant speed-up in computing.
The Deutsch-Jozsa algorithm [1] and Shor’s factorization
algorithm [2] are well-known examples demonstrating the
power of quantum computing. This capability gives rise
to the bounded-error quantum polynomial time (BQP)
complexity class with an open quest among computer
scientists and mathematicians to establish the exact re-
lation between BQP and other complexity classes. In
order to accelerate scientific computing using the capa-
bilities of a quantum computer, efficient quantum circuits
for basic mathematical functions are needed. The effi-
ciency of a quantum circuit is measured by lower com-
putational space (number of qubits) and lower compu-
tational time (logical depth). For fault-tolerant, error-
protected quantum circuits to implement the quantum
algorithms, it is projected that a large number of physical
qubits are required for every logical qubit [3]. Naturally,
potential solutions to reduce the number of logical qubits
contribute to the overall efficiency of the quantum circuit.
Multiplication is one of the elementary mathematical
operations of arithmetic. Fast long integer arithmetic is
at the very core of many computer algebra systems. In
quantum computing, apart from being used as a block
in itself, integer multiplication is used as a sub-routine
in many applications such as Shor’s integer factorization
algorithm and in Newton iterations for calculating many
functions like the inverse [4].
In this paper, we present a quantum implementation of
the Toom-Cook multiplication algorithm [5, 6], which
attains better asymptotic complexity than both simple
schoolbook multiplication and Karatsuba-based integer
multiplication [7]. We further improve the resulting
resource bounds by analyzing pebble games on complete trees.
∗ debjyoti001@e.ntu.edu.sg
II. PRIOR WORKS
The problem of multiplication in the quantum domain
has been explored previously. For small numbers, the
naïve schoolbook multiplication works best, with a
run-time complexity of $O(n^2)$ that also translates to the
logical depth in a quantum circuit realization. Karatsuba
multiplication, implemented in quantum circuits [7], is
usually faster when the multiplicands are longer than
320-640 bits, and also provides an asymptotic improvement
in terms of Toffoli cost and Toffoli depth over
schoolbook multiplication. However, the number of
qubits required for the Karatsuba-based quantum
implementation is higher than for schoolbook multiplication.
In the realm of quantum circuits, the Schönhage-Strassen
method (using the fast Fourier transform) and the
Toom-Cook multiplication algorithm have not been reported
so far, even though it is known from classical
implementations that these algorithms result in better run
time when the operand size is much larger. We primarily
focus on Toom-Cook multiplication in this work. As
reported in our results, this leads to savings in all the
considered performance metrics (qubit count, Toffoli count
and Toffoli depth) relative to the Karatsuba-based circuit [7].
The family of Toom-Cook methods is an infinite set
of polynomial algorithms (Toom-2.5, Toom-3, Toom-4,
etc.) [8]. Instead of using the more common Toom-3
implementation, we present the work with Toom-2.5 to
avoid the division by 3 required by the former. This
reduces the overall circuit cost, as quantum division
is costlier in terms of Toffoli count and Toffoli depth than
simple addition or shift operations. Most of the higher
Toom implementations require a similar division by
constants that are not powers of 2. Implementing
such divisions incurs higher quantum cost, and therefore
we avoid them.
When moving to quantum domains, gate sets need to
go beyond classical to create the superposition effect of
the inputs. The standard universal quantum gate library
that efficiently implements fault-tolerant quantum error
correction codes is the Clifford+T library [9, 10]. In
this library, the cost of implementing a T -gate is suffi-
ciently high to customarily neglect the cost of other Clif-
ford group gates, while determining the total cost of the
quantum circuit. Therefore, the number of T-gates is a
metric to judge the cost of a quantum circuit. The
number of qubits used in a quantum circuit is another
important metric, since current quantum technologies
still struggle to achieve error-free computation with a
large number of qubits. A study of the space-time trade-off
can be performed [11] using these two metrics. Another
metric of importance is the T-depth, defined as
the number of T-stages in a quantum circuit, where each
such stage consists of one or more T or T† gates
performed concurrently on separate qubits. It is important
to note that an input circuit with continuous-parameter
gates (e.g., the z-rotation gate $R_z(\theta)$) is decomposed
using a set of discrete basis gates, typically from the
Clifford+T library. The exact number of Clifford+T gates
needed for such a continuous-parameter gate depends on the
desired accuracy, and the discrete gate set provides only an
approximation. In the context of the current work, we
consider the T-count and T-depth of the circuit to be
proportional to the Toffoli count and Toffoli depth respectively,
following the Toffoli decomposition proposed in [12].
III. TOOM-COOK MULTIPLIER
Given two large integers $n_1$ and $n_2$, the Toom-Cook
algorithm splits them into k smaller parts of length l. The
multiplication sub-operations are then computed
recursively using Toom-Cook multiplication again, until
another algorithm can be applied for the last stage of
recursion, or until the desired multiplier is reached. The
input numbers are divided into limbs of a given size, so
that each number takes the form of a polynomial with the
limb size used as the radix. Instead of multiplying the
obtained polynomials directly, they are evaluated at a set
of points and the values are multiplied together at those
points. Based on the products obtained at those points, the
product polynomial is computed by interpolation. The final
result is then obtained by substituting the radix.
In general, Toom-k runs in $\Theta(c(k)\, n^e)$, where n
denotes the input size, k is the number of parts that the input
operand is decomposed into, and $e = \log_k(2k - 1)$. $c(k)$
is the time spent on auxiliary additions and
multiplications by small constants. The Karatsuba algorithm [13]
is a special case of Toom-Cook multiplication (Toom-2),
where the input operand is split into two smaller
ones. It reduces 4 multiplications to 3 and so operates
at $\Theta(n^{\log_2 3})$. In general, Toom-k reduces $k^2$
multiplications to $2k - 1$. Ordinary long multiplication is
equivalent to Toom-1, with complexity $\Theta(n^2)$.
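As a quick numerical illustration of the exponent $e = \log_k(2k - 1)$ quoted above, the following Python sketch tabulates it for a few values of k. The function name `toom_exponent` is introduced here only for illustration and does not appear in the paper.

```python
import math


def toom_exponent(k: int) -> float:
    """Exponent e = log_k(2k - 1) in the Theta(c(k) * n^e) cost of Toom-k."""
    if k < 2:
        # log base 1 is undefined; Toom-1 is schoolbook multiplication, Theta(n^2).
        raise ValueError("the formula needs k >= 2")
    return math.log(2 * k - 1) / math.log(k)


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(f"Toom-{k}: e = {toom_exponent(k):.3f}")
```

For k = 2 the value coincides with the Karatsuba exponent $\log_2 3$, and the exponent decreases as k grows, while the constant $c(k)$ (not modelled here) increases.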
A. Implementation Details
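Let x and y be two n-bit numbers. To proceed with the
Toom-2.5 algorithm, we first decompose x and y into two
and three parts respectively. Express $x = x_1 2^{i} + x_0$ and
$y = y_2 2^{2i} + y_1 2^{i} + y_0$ with $i \ge 1$. Typically i is chosen as
\[
\max\left\{ \left\lfloor \frac{\lceil \log_2 x \rceil}{k} \right\rfloor ,
\left\lfloor \frac{\lceil \log_2 y \rceil}{k} \right\rfloor \right\},
\]
where $k = 2.5$ in our case.
We define the following four product terms:
\begin{align}
P &= x_0 y_0, \tag{1} \\
Q &= (x_0 + x_1)(y_0 + y_1 + y_2), \tag{2} \\
R &= (x_0 - x_1)(y_0 - y_1 + y_2), \tag{3} \\
S &= x_1 y_2. \tag{4}
\end{align}
Then, the product xy is evaluated as:
\begin{align}
xy &= A\, 2^{3i} + B\, 2^{2i} + C\, 2^{i} + D, \tag{5} \\
A &= S, \tag{6} \\
B &= -P + \tfrac{1}{2} Q + \tfrac{1}{2} R, \tag{7} \\
C &= \tfrac{1}{2} Q - \tfrac{1}{2} R - S, \tag{8} \\
D &= P. \tag{9}
\end{align}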
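The identities in Eqs. (1)-(9) can be checked classically with ordinary integers. The sketch below is only such a check, not the quantum circuit; the function name `toom25_product` and the use of Python's built-in multiplication for the four products are assumptions made for this illustration.

```python
def toom25_product(x: int, y: int, i: int) -> int:
    """Multiply non-negative x and y through Eqs. (1)-(9) with limb width i."""
    mask = (1 << i) - 1
    x0, x1 = x & mask, x >> i                                # x = x1*2^i + x0
    y0, y1, y2 = y & mask, (y >> i) & mask, y >> (2 * i)     # y = y2*2^(2i) + y1*2^i + y0

    # The four products of Eqs. (1)-(4).
    P = x0 * y0
    Q = (x0 + x1) * (y0 + y1 + y2)
    R = (x0 - x1) * (y0 - y1 + y2)
    S = x1 * y2

    # Interpolation, Eqs. (6)-(9). Q + R and Q - R are always even,
    # so the halvings are exact and reduce to one-bit shifts.
    A = S
    B = -P + (Q + R) // 2
    C = (Q - R) // 2 - S
    D = P

    # Recombination, Eq. (5).
    return (A << (3 * i)) + (B << (2 * i)) + (C << i) + D


if __name__ == "__main__":
    print(toom25_product(13, 27, 2))  # same value as 13 * 27
```

The only division that appears is by 2, which is why the scheme needs adders and shifts but no general divider.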
Note that only 4 multiplications are required to compute
the product. Also, each of these multiplications
involves numbers smaller than the original
problem size (bit-width): each smaller multiplication is
between one number of bit-width n/2 and another of
bit-width n/3. When the method is applied recursively
a second time, we assume for our analysis that the
limbs obtained from the number that was originally split
into 3 parts are now split into 2 parts and vice versa.
So after two steps, we get 16 smaller problems of
size n/6 each. Thus, we obtain the basic recurrence for
the number of steps T(n):
\[
T(n) = 16\, T\!\left(\frac{n}{6}\right). \tag{10}
\]
All additions (the intermediate ones as well as the final
ones) are performed by separate adders, which have
bounded cost.
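To see what the recurrence in Eq. (10) implies, the sketch below unrolls it for powers of 6 and compares the result with $n^{\log_6 16}$. The base case T(1) = 1 and the function names are assumptions made for this illustration; the additive adder terms are treated in the gate count analysis.

```python
import math


def subproblem_count(n: int) -> int:
    """Unroll T(n) = 16 * T(n / 6) for n a power of 6, assuming T(1) = 1."""
    return 1 if n <= 1 else 16 * subproblem_count(n // 6)


def closed_form(n: int) -> float:
    """n^(log_6 16), the solution of the recurrence above."""
    return n ** math.log(16, 6)


if __name__ == "__main__":
    for n in (6, 36, 216, 1296):
        print(n, subproblem_count(n), round(closed_form(n)))
```

The exponent $\log_6 16$ is slightly below the Karatsuba exponent $\log_2 3$, which is the source of the asymptotic improvement in Toffoli count.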
B. Gate count Analysis
For gate count analysis, we consider only the Toffoli
count required by the quantum circuit or sub-circuit.
This is because the other gates used (NOTs, CNOTs)
do not contribute to the T-count of the circuit in
the Clifford+T library. The designed circuit maps
$(x, y, 0, 0) \mapsto (x, y, g, xy)$, where g denotes the garbage
output produced while computing A, B, C and D. The
product is copied after the computation and the circuit
is then run backwards (uncomputed) to set the garbage
outputs back to 0.
In our circuit implementation, the Cuccaro adder is
used [14]. For the addition of two n-bit numbers, the Cuccaro
adder requires $2n - 1$ Toffoli gates. It is also established
that the cost $A_n$ of an in-place adder adding two n-bit
numbers is bounded by $2n$ Toffoli gates.
FIG. 1: Recursion tree structure of the
Toom-2.5 implementation.
Let $T_{n,n}$ denote the multiplication call to the Toom-2.5
circuit for calculating the product of two n-bit numbers, and
let $TC_n$ denote the number of Toffoli gates required for
implementing $T_{n,n}$. First, we need to calculate P, Q, R and
S. This requires 4 recursive calls to $T_{n/2,n/3}$. For
calculating the intermediate sums required as input for
$T_{n/2,n/3}$, we need four n/2-bit adders and six n/3-bit adders.
This also includes uncomputation of the intermediate garbage
results, i.e., the qubits used for storage of intermediate
results are returned to their initial states. The output of
each $T_{n/2,n/3}$ is a 5n/6-bit number. Finally, for computing
A, B, C and D, four 5n/6-bit adders are required. As
stated earlier, in the evaluation of $T_{n/2,n/3}$ we assume that
the n/2-bit number is split into 3 parts and vice versa. By
performing a similar analysis, we get the evaluation of
$T_{n/6,n/6}$, in terms of which the recursive relation is provided.
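\begin{align}
TC_n &= 16\, TC_{n/6} + 40 A_{n/6} + 22 A_{n/3} + 4 A_{n/2} + 4 A_{5n/6} \tag{11} \\
&= 16^{\log_6 n}\, TC_1 + 40\left(A_{n/6} + 16 A_{n/36} + \dots\right) \notag \\
&\quad + 22\left(A_{n/3} + 16 A_{n/18} + \dots\right) + 4\left(A_{n/2} + 16 A_{n/12} + \dots\right) \notag \\
&\quad + 4\left(A_{5n/6} + 16 A_{5n/36} + \dots\right) \tag{12}
\end{align}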
The base case is the multiplication of two 1-bit numbers,
which can be done by a single Toffoli gate. Therefore, $TC_1 = 1$.
Each of the adder gate count summations has
$\log_6 n$ terms. Evaluating the summations as geometric
progressions with the bound $A_m \le 2m$, and doubling the
cost to account for the aforementioned uncomputation, we get:
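\begin{align}
TC_n &= 2\left( 16^{\log_6 n} + 23.2\, n \left[ \left(\frac{16}{6}\right)^{\log_6 n} - 1 \right] \right) \tag{13} \\
&= 2 n^{\log_6 16} + 46.4\, n \left( n^{\log_6 (16/6)} - 1 \right) \le 49\, n^{\log_6 16} \tag{14}
\end{align}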
Note that all operations used in the circuit design are
implemented using only adders and shifts, without any
separate multiplication/division blocks.
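The step from Eq. (11) to Eq. (13) can be checked numerically. The sketch below evaluates the recurrence directly with the adder bound $A_m = 2m$ and $TC_1 = 1$, doubles it for uncomputation, and compares it with the closed form; the function names are illustrative, and n is assumed to be a power of 6 so that the recursion bottoms out exactly.

```python
import math


def adder_cost(m: float) -> float:
    """Upper bound A_m = 2m on the Toffoli count of an in-place m-bit adder."""
    return 2 * m


def toffoli_recurrence(n: int) -> float:
    """Eq. (11) for n a power of 6, with TC_1 = 1 (before doubling)."""
    if n <= 1:
        return 1.0
    adders = (40 * adder_cost(n / 6) + 22 * adder_cost(n / 3)
              + 4 * adder_cost(n / 2) + 4 * adder_cost(5 * n / 6))
    return 16 * toffoli_recurrence(n // 6) + adders


def toffoli_closed_form(n: int) -> float:
    """Eq. (13), including the factor 2 that accounts for uncomputation."""
    levels = math.log(n, 6)
    return 2 * (16 ** levels + 23.2 * n * ((16 / 6) ** levels - 1))


if __name__ == "__main__":
    for n in (6, 36, 216, 1296):
        bound = 49 * n ** math.log(16, 6)
        print(n, 2 * toffoli_recurrence(n), toffoli_closed_form(n), bound)
```

The coefficient 23.2 arises as (40/6 + 22/3 + 4/2 + 4·5/6)·2 divided by (16/6 − 1), i.e. the per-level adder cost divided by the geometric ratio minus one.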
C. Space-Time Trade-offs
The recursive nature of the problem gives rise to an
inherent tree structure, as shown in Fig. 1. The size of a node
is representative of the problem size at that level. For
example, the root level denotes the complete problem (n-bit
multiplication). According to the recursion presented in
Equation (10), each node has 16 child nodes, each
denoting a smaller problem (n/6-bit multiplication). For
the Toom-2.5 circuit with an input of size n, at any level x
in the tree there are $16^x$ nodes of size $n\, 6^{-x}$ each, for a
total cost of $n \left(\frac{16}{6}\right)^x$ at level x (level numbering
starting at 0 from the root). So, the space cost $Q_{orig}$ of the
complete tree is
\begin{align}
Q_{orig} &= n \sum_{x=0}^{N-1} \left(\frac{16}{6}\right)^x \tag{15} \\
&= n\, \frac{(16/6)^{\log_6 n} - 1}{(16/6) - 1} \tag{16} \\
&= O\!\left(n\, (8/3)^{\log_6 n}\right) = O\!\left(n^{1 + \log_6 (8/3)}\right) \tag{17} \\
&\approx O(n^{1.547}) \tag{18}
\end{align}
where the tree height is $N = \log_6 n$.
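The geometric sum in Eqs. (15)-(18) is easy to reproduce numerically. The following sketch, with illustrative function names and n assumed to be a power of 6, sums the per-level cost and compares it with the closed form and with the quoted exponent.

```python
import math


def tree_space(n: int) -> float:
    """Eq. (15): total node size over all N = log_6 n levels of the tree."""
    height = round(math.log(n, 6))
    return n * sum((16 / 6) ** x for x in range(height))


def tree_space_closed(n: int) -> float:
    """Eq. (16): the same sum written as a geometric series."""
    height = round(math.log(n, 6))
    return n * ((16 / 6) ** height - 1) / (16 / 6 - 1)


# Exponent of Eq. (17), which rounds to the 1.547 of Eq. (18).
SPACE_EXPONENT = 1 + math.log(8 / 3, 6)

if __name__ == "__main__":
    print(f"exponent = {SPACE_EXPONENT:.3f}")
    for n in (6, 36, 1296):
        print(n, tree_space(n), tree_space_closed(n))
```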
The reversible pebble game [15] is a combinatorial
game played on rooted Directed Acyclic Graphs (DAGs).
Each pebble represents some amount of space. The
rules are similar to those of the pebble game used to
model irreversible computation, except that pebbles
cannot simply be removed, because of the reversibility
constraint. There is a reverse computation for each
computation performed, implying that during the game
pebbles may still be removed, but only subject to the
same conditions that apply when placing them.
We use this reversible game to obtain better asymptotic
bounds on the number of qubits (space) needed to implement
the Toom-2.5 algorithm.
We want to find a level in the recursion tree such that
the size of each node’s sub-tree is approximately equal
to the sum of the sizes of all nodes at the chosen level
and above. Once all the nodes in the chosen level have
been computed, we uncompute all the sub-trees below
it. This is performed to minimize space: the size of
these sub-trees is chosen to be approximately equal to the
remaining size of the tree above them. Let the required
height be k from the leaves of the tree. The cost of all
height-k sub-trees is
\[
n \sum_{x=N-k}^{N-1} \left(\frac{16}{6}\right)^x .
\]
Therefore, the cost of a single height-k sub-tree is
\[
\frac{n}{16^{N-k}} \sum_{x=N-k}^{N-1} \left(\frac{16}{6}\right)^x
= \frac{n}{6^{N-k}} \sum_{x=0}^{k-1} \left(\frac{16}{6}\right)^x .
\]
FIG. 2: The quantum circuit for computing integer multiplication result using Toom-2.5 algorithm. The compute
blocks are then run backwards (uncomputed) to set the garbage outputs (g) back to 0 (not shown in the figure).
We want this to equal the cost of all nodes above the
kth level:
\[
n \sum_{x=0}^{N-k-1} \left(\frac{16}{6}\right)^x
= \frac{n}{6^{N-k}} \sum_{x=0}^{k-1} \left(\frac{16}{6}\right)^x . \tag{19}
\]
Simplifying, we obtain the bound $k \le \frac{N}{2 - \log_{16} 6}$. This
is since $k \le N$ and $\left(\frac{16}{6}\right)^{N-k} \ge \frac{16^k}{6^N}$. Using the above
technique, the qubit count is now optimized and bounded
by $Q_{opti}$:
\[
Q_{opti} = O\!\left( n \left(\frac{8}{3}\right)^{\frac{1}{2 - \log_{16} 6} \log_6 n} \right)
\approx O(n^{1.404}). \tag{20}
\]
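The exponent in Eq. (20) follows from the level bound $k \le N/(2 - \log_{16} 6)$. The sketch below evaluates it for a general recursion tree with a given branching factor and size reduction, so that the same helper reproduces both the Toom-2.5 and the Karatsuba qubit-count expressions of Table II; the function name and its parameters are assumptions made for this illustration.

```python
import math


def pebbled_space_exponent(branching: int, shrink: int) -> float:
    """Exponent of n in the optimized qubit count.

    `branching` is the number of children per node and `shrink` the factor by
    which the problem size drops per level (16 and 6 for the two-step
    Toom-2.5 recursion, 3 and 2 for Karatsuba).
    """
    level_fraction = 1 / (2 - math.log(shrink, branching))    # k / N
    return 1 + level_fraction * math.log(branching / shrink, shrink)


if __name__ == "__main__":
    print(f"Toom-2.5 : {pebbled_space_exponent(16, 6):.3f}")
    print(f"Karatsuba: {pebbled_space_exponent(3, 2):.3f}")
```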
The time complexity of a quantum circuit is effectively
equal to the depth of the circuit in terms of Toffoli gates.
Each node in the computation tree shown in Fig. 1 at
level k must be computed sequentially. At the kth level,
the number of sub-trees $ST_k$ and the corresponding depth
$D_k$ are defined as follows:
\begin{align}
ST_k &= 16^{\left(1 - \frac{\log 16}{2 \log 16 - \log 6}\right) \log_6 n} \tag{21} \\
D_k &= \frac{n}{6^{\left(1 - \frac{\log 16}{2 \log 16 - \log 6}\right) \log_6 n}} \tag{22} \\
ST_k * D_k &= n \left(\frac{8}{3}\right)^{\left(1 - \frac{\log 16}{2 \log 16 - \log 6}\right) \log_6 n} \approx n^{1.143} \tag{23}
\end{align}
The product $ST_k * D_k$ gives the overall depth for
computing the entire kth level of the recursion tree.
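Equations (21)-(23) can be evaluated for a concrete input size to confirm the quoted depth exponent. In the sketch below the function name `depth_terms` is illustrative, and the input size is an arbitrary power of 6.

```python
import math


def depth_terms(n: int):
    """Return (ST_k, D_k) of Eqs. (21)-(22) for the Toom-2.5 recursion tree."""
    fraction = 1 - math.log(16) / (2 * math.log(16) - math.log(6))
    levels = fraction * math.log(n, 6)
    return 16 ** levels, n / 6 ** levels


if __name__ == "__main__":
    n = 6 ** 6
    sub_trees, depth = depth_terms(n)
    # The exponent of n in the product ST_k * D_k does not depend on n.
    print(f"depth exponent = {math.log(sub_trees * depth) / math.log(n):.3f}")
```

Note that $\frac{\log 16}{2 \log 16 - \log 6}$ is the same quantity as $\frac{1}{2 - \log_{16} 6}$ in Eq. (20), written with a common logarithm base.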
The method proposed above is most efficient if both
the numbers to be multiplied are approximately of the
same bit-width. In case one of them is much bigger than
the other, it is better if the bigger number is repeatedly
divided into 3 parts in each turn, until the smaller parts of
both the numbers are roughly the same size. Following
this method, the asymptotic computational complexity
can be shown to be more efficient than that of the alter-
nating Toom-2.5 method adopted.
The circuit of the described implementation is shown
in Fig. 2. It describes the circuit for $T_{n,n}$ that
multiplies x (decomposed into $x_0, x_1$) and y (decomposed into
$y_0, y_1, y_2$). All symbols and variables hold the
same meanings as in the analysis above. The
adder, subtractor and shifting blocks are represented as
‘Adder’, ‘Sub’ and ‘Shift’ respectively. The $T_{n/2,n/3}$ blocks
denote Toom-2.5 sub-circuits of smaller bit-width.
TABLE I: Asymptotic performance analysis of the
quantum implementation of various multiplication
methods.
Method                QC               TC                   TD
Naïve [7]             $O(n)$           $O(n^2)$             $O(n^2)$
Naïve Improved [16]   $O(n)$           $O(n^2)$             $O(n \log n)$
Karatsuba [7]         $O(n^{1.427})$   $O(n^{\log_2 3})$    $O(n^{1.158})$
Toom-2.5              $O(n^{1.404})$   $O(n^{\log_6 16})$   $O(n^{1.143})$
Const. Mult. [17]     $O(n)$           $O(n^2)$             $O(n)$
QC: Qubit count, TC: Toffoli count, TD: Toffoli depth
TABLE II: Cost of quantum implementation of
multiplication.
Method              QC                                                TC                    TD
Naïve [7]           $4n + 1$                                          $4n^2 - 3n$           $4n^2 - 4n + 1$
Karatsuba [7]       $n (3/2)^{\frac{\log_2 n}{2 - \log_3 2}}$         $42\, n^{\log_2 3}$   $n (3/2)^{\left(1 - \frac{1}{2 - \log_3 2}\right) \log_2 n}$
Toom-2.5            $n (8/3)^{\frac{\log_6 n}{2 - \log_{16} 6}}$      $49\, n^{\log_6 16}$  $n (8/3)^{\left(1 - \frac{1}{2 - \log_{16} 6}\right) \log_6 n}$
Const. Mult. [17]   $3n + 1$                                          $4n(n + 1)$           $8n$
IV. RESULTS AND DISCUSSIONS
Table I presents the asymptotic results of implementation
of various multiplication methods while Table II provides
the exact constants involved. The naïve multiplication
method suggested in [7] allows implementation with the
lowest number of qubits asymptotically but fares badly
in terms of Toffoli count and depth.
In [16], implementations of logarithmic-depth
adders are provided. The naïve (shift-add) multiplier
can be improved in depth by using the logarithmic-depth
adder as a submodule. The n-bit adder has
a depth of order $O(\log n)$, thus the multiplier has
a depth of $O(n \log n)$. However, for both the ‘in place’ and
‘out of place’ adders described in [16], extra ancillae are
required for intermediate computation. Also, the Toffoli
count is greater than that of the Cuccaro adder.
Thus, the multiplier developed by extension, though
optimized in depth, will have greater asymptotic Toffoli and
qubit count, equal to $O(n^2)$. In Table I, we provide the
asymptotic complexity of such a multiplier. However,
in the absence of an explicit design, we are unable to
provide the exact constants involved in the cost metrics,
and hence the Naïve Improved method mentioned in Table I
is excluded from Table II. Toom-2.5 requires fewer
qubits than Karatsuba [7]. Toom-2.5 outperforms
both the naïve and Karatsuba methods in terms of Toffoli
count as well as Toffoli depth, highlighting the efficiency
of the proposed method.
Pavlidis et al. [17] presented a depth-optimized
multiplier for multiplication by a constant only. Therefore, it
cannot fairly be compared directly with our
implementation and the Karatsuba multiplication implementations
presented in [7]. It has a Toffoli depth of $8n$, a Toffoli
count of $4n(n + 1)$ and a qubit count of $3n + 1$.
The Clifford+T quantum gate library has garnered
much interest in the implementation of fault-tolerant
quantum circuits [18]. As mentioned in [19, 20], the
cost of a Toffoli gate is higher than that of the NOT and
CNOT gates. The Toffoli gate may be decomposed into
Clifford+T gates, which makes the cost metrics associated
with Toffoli gates important. Therefore, Toffoli count
and Toffoli depth are used as the primary performance
metrics. When Toffoli gates are mapped to the
Clifford+T fault-tolerant library, the T-count is upper
bounded by 7× the Toffoli count and the T-depth by 3× the
Toffoli depth [12]. Therefore, a fault-tolerant
implementation of the proposed multiplication
method would have a T-count of at most 7× the Toffoli
count and a T-depth of at most 3× the Toffoli depth
mentioned in Table I. It is
further possible to improve these values by the optimization
techniques proposed in [9, 12].
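As a rough illustration of these conversion factors, the sketch below applies the 7× and 3× bounds to the Toom-2.5 Toffoli count and Toffoli depth expressions of Table II. The function name is illustrative, and the figures it produces are upper bounds derived from those expressions, not measured circuit costs.

```python
import math


def toom25_t_bounds(n: int):
    """Upper bounds (T-count, T-depth) for an n-bit Toom-2.5 multiplier.

    Uses the Table II expressions for Toffoli count and Toffoli depth and the
    factors 7 (count) and 3 (depth) quoted from [12].
    """
    toffoli_count = 49 * n ** math.log(16, 6)
    fraction = 1 - 1 / (2 - math.log(6, 16))
    toffoli_depth = n * (8 / 3) ** (fraction * math.log(n, 6))
    return 7 * toffoli_count, 3 * toffoli_depth


if __name__ == "__main__":
    for n in (36, 216, 1296):
        t_count, t_depth = toom25_t_bounds(n)
        print(f"n = {n}: T-count <= {t_count:.0f}, T-depth <= {t_depth:.0f}")
```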
Fig. 3a presents a comparison of the Toffoli count
required by the various methods as the bit-width
of the inputs varies. The naïve multiplication method
performs better in terms of total Toffoli cost at smaller
input sizes (< 300 bits), but is outperformed by the
Karatsuba and Toom algorithms at higher bit-widths.
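The crossover mentioned above can be located from the Toffoli count expressions in Table II. The sketch below scans input sizes until each recursive method's bound drops below the naïve count; the function names are illustrative, and since the Karatsuba and Toom-2.5 expressions are upper bounds, the crossover it finds is indicative only.

```python
import math


def naive_toffoli(n: int) -> float:
    return 4 * n * n - 3 * n


def karatsuba_toffoli(n: int) -> float:
    return 42 * n ** math.log2(3)


def toom25_toffoli(n: int) -> float:
    return 49 * n ** math.log(16, 6)


def crossover(bound, limit: int = 10_000):
    """Smallest n >= 2 at which `bound(n)` is below the naive Toffoli count."""
    for n in range(2, limit):
        if bound(n) < naive_toffoli(n):
            return n
    return None


if __name__ == "__main__":
    print("Toom-2.5 beats naive from n =", crossover(toom25_toffoli))
    print("Karatsuba beats naive from n =", crossover(karatsuba_toffoli))
```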
Fig. 3c shows the variation in the qubit requirements
of the different implementations across a range of
input sizes. In this case the shift-and-add (naïve) method
outperforms both the recursive algorithms, as its qubit
count increases linearly. However, this low space
requirement leads to a higher depth, as demonstrated in
Fig. 3b on a logarithmic scale. Both the Toom-2.5 and the
Karatsuba implementations perform much better in this respect.
We also present a bound on the CNOT counts of the
considered implementations. In the proposed Toom-2.5
circuit shown in Fig. 2, CNOT gates are present in the
Cuccaro adders and copy blocks. It can be seen from [14]
that the number of CNOT gates in an n-bit adder can be
bounded by $5n$. Proceeding as in the Toffoli count
analysis, we get a recurrence relation of the same form as
the one presented in the Gate Count Analysis in Section III.
Let $CC_n$ denote the number of CNOT gates in $T_{n,n}$. Also,
let $A^c_n$ denote the number of CNOT gates in an in-place
n-bit adder.
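\begin{align}
CC_n &= 16\, CC_{n/6} + 40 A^c_{n/6} + 22 A^c_{n/3} + 4 A^c_{n/2} + 4 A^c_{5n/6} \tag{24} \\
&= 16^{\log_6 n}\, CC_1 + 40\left(A^c_{n/6} + 16 A^c_{n/36} + \dots\right) \notag \\
&\quad + 22\left(A^c_{n/3} + 16 A^c_{n/18} + \dots\right) + 4\left(A^c_{n/2} + 16 A^c_{n/12} + \dots\right) \notag \\
&\quad + 4\left(A^c_{5n/6} + 16 A^c_{5n/36} + \dots\right) + COPY_{cnot} \tag{25}
\end{align}
where $COPY_{cnot}$ denotes the number of CNOTs used
in the two copy blocks. However, the number of CNOT
gates arising from the copy blocks is of order $O(n)$
and is dominated by the terms of order $n^{\log_6 16}$. $CC_1 = 0$
because the 1-bit multiplier consists of just one Toffoli gate.
\begin{align}
CC_n &\approx 2\left( 58\, n \left[ \left(\frac{16}{6}\right)^{\log_6 n} - 1 \right] \right) \tag{26} \\
&= 116\, n \left( n^{\log_6 (16/6)} - 1 \right) \le 116\, n^{\log_6 16} \tag{27}
\end{align}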
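The coefficient 58 in Eq. (26) can be reproduced by evaluating the recurrence of Eq. (24) with the adder bound $A^c_m = 5m$ and $CC_1 = 0$. In the sketch below the function names are illustrative, the $O(n)$ copy-block term is ignored as in Eq. (26), and n is assumed to be a power of 6.

```python
import math


def cnot_recurrence(n: int) -> float:
    """Eq. (24) with A^c_m = 5m and CC_1 = 0 (copy blocks ignored)."""
    if n <= 1:
        return 0.0
    adders = 5 * (40 * n / 6 + 22 * n / 3 + 4 * n / 2 + 4 * 5 * n / 6)
    return 16 * cnot_recurrence(n // 6) + adders


def cnot_closed_form(n: int) -> float:
    """Eq. (26): the unrolled recurrence, doubled to cover uncomputation."""
    return 2 * 58 * n * ((16 / 6) ** math.log(n, 6) - 1)


if __name__ == "__main__":
    for n in (6, 36, 216):
        print(n, 2 * cnot_recurrence(n), cnot_closed_form(n),
              116 * n ** math.log(16, 6))
```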
By similar analysis, the CNOT count of the Karatsuba
multiplier can be bounded by $100\, n^{\log_2 3}$. For the naïve
method, controlled adders are considered as described
in [7]. Each such adder has $2n$ CNOTs and the multiplier
uses $n - 1$ such adders. Thus, the total CNOT count is
$2n^2 - 2n$. For the constant multiplier in [17], $2n$ CNOT
gates are employed. These observations are summarized
in Fig. 4.
In [20], it has been established that the T-gate is
at least 6 times costlier than the CNOT gate,
which emphasizes the importance of T-count and
T-depth. However, with increasing circuit size, the cost of
CNOT gates may take a dominant role if we follow the
analysis in terms of upper/lower bounds [19]. From that
perspective, the study of overall cost is important. As we
found for the case of multipliers, the CNOT count of the
Toom-2.5 multiplier grows at a slightly lower rate than
that of the Karatsuba multiplier with increasing input
size. Considering that the Toffoli count of the Toom-2.5
multiplier already outperforms Karatsuba at large input
sizes, the proposed design is more efficient in both gate counts.
FIG. 3: Comparison of the quantum multiplier implementations (Karatsuba, Toom-2.5, Naive, Const) against input size (bits), based on: (a) Toffoli count, (b) Toffoli depth (logarithmic scale) and (c) number of qubits.
FIG. 4: Variation in CNOT counts (logarithmic scale) across different
implementations (Karatsuba, Toom-2.5, Naive, Const) with increasing input size (bits).
V. CONCLUSION
Designing an efficient quantum circuit with low resource
requirements and fast run time is an important
challenge with significant repercussions across several
domains, such as scientific computing and security. In this
work, we reported an efficient quantum circuit for
integer multiplication based on the Toom-Cook algorithm. We
provided design results and techniques for lowering the
resource requirements. In terms of asymptotic complexity,
the presented implementation outperforms the
state-of-the-art results for multiple performance metrics.
[1] D. Deutsch and R. Jozsa, in Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, Vol. 439 (The Royal Society, 1992) pp. 553–558.
[2] P. W. Shor, in Foundations of Computer Science, 1994 Proceedings., 35th Annual Symposium on (IEEE, 1994) pp. 124–134.
[3] E. T. Campbell, B. M. Terhal, and C. Vuillot, Nature 549, 172 (2017).
[4] M. Soeken, M. Roetteler, N. Wiebe, and G. De Micheli, in 2017 Design, Automation & Test in Europe Conference & Exhibition (DATE) (IEEE, 2017) pp. 470–475.
[5] A. L. Toom, in Soviet Mathematics Doklady, Vol. 3 (1963) pp. 714–716.
[6] S. A. Cook and S. O. Aanderaa, Transactions of the American Mathematical Society 142, 291 (1969).
[7] A. Parent, M. Roetteler, and M. Mosca, arXiv preprint arXiv:1706.03419 (2017).
[8] D. E. Knuth, The Art of Computer Programming, Vol. 2 (Pearson Education, 1997).
[9] M. Amy, D. Maslov, and M. Mosca, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33, 1476 (2014).
[10] A. G. Fowler, A. M. Stephens, and P. Groszkowski, Physical Review A 80, 052312 (2009).
[11] R. Wille, M. Soeken, D. M. Miller, and R. Drechsler, Integration, the VLSI Journal 47, 284 (2014).
[12] N. Abdessaied, M. Amy, M. Soeken, and R. Drechsler, in Multiple-Valued Logic (ISMVL), 2016 IEEE 46th International Symposium on (IEEE, 2016) pp. 150–155.
[13] A. Karatsuba, in Sov. Phys. Dokl., Vol. 7 (1963) pp. 595–596.
[14] S. A. Cuccaro, T. G. Draper, S. A. Kutin, and D. P. Moulton, arXiv preprint quant-ph/0410184 (2004).
[15] C. H. Bennett, SIAM Journal on Computing 18, 766 (1989).
[16] T. G. Draper, S. A. Kutin, E. M. Rains, and K. M. Svore, arXiv preprint quant-ph/0406142 (2004).
[17] A. Pavlidis and D. Gizopoulos, arXiv preprint arXiv:1207.0511 (2012).
[18] Y. S. Weinstein, Physical Review A 87, 032320 (2013).
[19] D. Maslov, arXiv preprint arXiv:1602.02627 (2016).
[20] V. V. Shende and I. L. Markov, arXiv preprint arXiv:0803.2316 (2008).
