Efficient Construction of a Control Modular Adder on a Carry-Lookahead
  Adder Using Relative-phase Toffoli Gates by Oonishi, Kento et al.
Kento Oonishi,1, 2 Tomoki Tanaka,3, 4 Shumpei Uno,5, 4
Takahiko Satoh,4, 6 Rodney Van Meter,7, 4 and Noboru Kunihiro8
1Graduate School of Information Science and Technology, The University of Tokyo,
7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
2The Graduate School of Science and Technology, Keio University,
3-14-1 Hiyoshi, Kohoku, Yokohama, Kanagawa, 223-8522, Japan
3Mitsubishi UFJ Financial Group, Inc. and MUFG Bank, Ltd.,
2-7-1 Marunouchi, Chiyoda-ku, Tokyo, 100-8388, Japan
4Quantum Computing Center, Keio University,
3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan
5Mizuho Information & Research Institute, Inc.,
2-3 Kanda-Nishikicho, Chiyoda-ku, Tokyo, 101-8443, Japan
6Graduate School of Media and Governance, Keio University SFC,
5322, Endo, Fujisawa, Kanagawa 252-0882 Japan
7Faculty of Environment and Information Studies,
Keio University SFC, 5322, Endo, Fujisawa, Kanagawa 252-0882 Japan
8University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8573, Japan
(Dated: October 5, 2020)
Control modular addition is a core arithmetic function, and we must consider the computational
cost for actual quantum computers to realize efficient implementation. To achieve a low compu-
tational cost in a control modular adder, we focus on minimizing KQ, defined by the product of
the number of qubits and the depth of the circuit. In this paper, we construct an efficient control
modular adder with small KQ by using relative-phase Toffoli gates in two major types of quantum
computers: Fault-Tolerant Quantum Computers (FTQ) on the Logical layer and Noisy Intermediate-
Scale Quantum Computers (NISQ). We give a more efficient construction compared to Van Meter
and Itoh’s, based on a carry-lookahead adder. In FTQ, T gates incur heavy cost due to distillation,
which fabricates ancilla for running T gates with high accuracy but consumes a lot of specially
prepared ancilla qubits and a lot of time. Thus, we must reduce the number of T gates. We propose
a new control modular adder that uses only 20% of the number of T gates of the original. Moreover,
when we take distillation into consideration, we find that we minimize KQT (the product of the
number of qubits and T -depth) by running $\Theta(n/\sqrt{\log n})$ T gates simultaneously. In NISQ, CNOT
gates are the major error source. We propose a new control modular adder that uses only 35%
of the number of CNOT gates of the original. Moreover, we show that the KQCX (the product
of the number of qubits and CNOT-depth) of our circuit is 38% of the original. Thus, we realize
an efficient control modular adder, improving prospects for the efficient execution of arithmetic in
quantum computers.
I. INTRODUCTION
Recently, functional but imperfect quantum computers
have emerged, called Noisy Intermediate-Scale Quantum
Computers (NISQ) [1], with machines from IBM [2, 3],
Google [4], Rigetti [5], IonQ [6], and Honeywell [7] all
accessible via the web.
However, we cannot realize large-scale quantum com-
putation on NISQ, due to the high error rate. These er-
rors propagate as the calculation proceeds, and we cannot
extract the correct result. Thus, we must reduce the er-
ror rate in quantum computers. To realize computation
with high accuracy, research on Fault-Tolerant Quantum
Computers (FTQ) is proceeding [8–10].
Jones et al. [11] proposed a method for constructing
FTQ as a layered architecture. Specifically, we conduct
the accurate computation on the Logical layer, which is
achieved using large numbers of physical qubits with er-
rors.
However, T gates impose an additional cost when run
on FTQ. By the Gottesman-Knill theorem [12], we can
conduct classical simulation of quantum circuits com-
posed only of Clifford gates, but to realize universal quan-
tum computation, we require non-Clifford gates such as
a T gate, taking us into a realm that cannot be sim-
ulated classically. We achieve high-fidelity T gates by
incorporating distillation [13], which requires a lot of log-
ical qubits and a lot of time; research on optimization of
distillation is being carried out [14]. In FTQ, we may re-
alize large-scale quantum algorithms such as Shor’s algo-
rithm [15] and Grover’s algorithm [16]. Shor’s algorithm
is of particular interest if it can be implemented effectively,
because it solves the factorization problem or the discrete
logarithm problem in polynomial time, breaking the secu-
rity of current cryptosystems, such as RSA [17] or elliptic
curve [18, 19], whose security is based on the factorization
problem or the discrete logarithm problem, respectively.
[Figure 1: circuit diagram. The control qubit |x〉 controls a ModADD block acting on the
n-qubit register |b〉, which outputs |b + ax mod N〉.]
FIG. 1: Overview of a control modular adder. The
first register has a single qubit which is used as a control
bit. The second register has n qubits which are
used to store the result. a and N are n-bit classical
numbers.
In Shor’s algorithm, the control modular exponentiation
step dominates the total cost, leading many researchers
to study its construction [20–30]. One strategy
is realizing a control modular exponentiation by the re-
peated calling of control modular additions. Thus, if we
reduce the cost of a control modular adder, the total cost
of Shor’s algorithm will shrink. In this paper, we focus on
the efficient construction of a control modular addition.
A. Background
A control modular addition is defined by a control
qubit x and n-bit numbers a, b, and N . a and b sat-
isfy 0 ≤ a, b ≤ N −1, and a and N are classical numbers.
A control modular addition calculates
|x〉 |b〉 → |x〉 |b+ xa mod N〉 . (1)
An overview is shown in Figure 1.
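As a classical point of reference, eq. (1) can be sketched on computational basis states as follows. This is only an illustrative sketch: the function name `controlled_mod_add` is hypothetical, and the range checks simply restate the conditions 0 ≤ a, b ≤ N − 1 given above.

```python
def controlled_mod_add(x: int, b: int, a: int, N: int) -> int:
    """Classical reference for eq. (1): |x>|b> -> |x>|b + x*a mod N>.

    Returns the new content of the second register for basis-state inputs.
    """
    if x not in (0, 1):
        raise ValueError("the control bit x must be 0 or 1")
    if not (0 <= a < N and 0 <= b < N):
        raise ValueError("a and b must satisfy 0 <= a, b <= N - 1")
    # When x = 0 nothing is added; when x = 1 we add a modulo N.
    return (b + x * a) % N
```

For example, with N = 7, a = 4, and b = 5, the register keeps the value 5 when x = 0 and becomes 2 when x = 1.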
However, the optimal construction of a control mod-
ular adder is not obvious. A control modular adder is
constructed from simple adders [21, 22, 24, 27–30], and
there are many kinds of adders [28, 29, 31–35]. Previous
constructions follow similar overall structure, but differ
in detail. We need to determine which combination is
the best.
To evaluate the efficiency of the circuit, we use KQ [36]
as an index. Minimizing KQ benefits both FTQ [11] and
NISQ [37]. KQ is defined as the product of the number
of qubits and the depth of the circuit. One candidate
control modular adder with small KQ, proposed by Van
Meter and Itoh [28], uses three carry-lookahead adders.
Based on this, Van Meter et al. [38] and Jones et al. [11]
analyzed the computational cost of Shor’s algorithm on
FTQ. Importantly, Jones et al. showed that we must as-
sign a large fraction of our total resources for distillation.
However, this construction has room for further mini-
mization of the number of T gates. Thapliyal et al. [39]
proposed a means of minimizing the number of T gates
in a carry-lookahead adder. This method replaces some
Toffoli gates by Gidney’s relative-phase Toffoli gates [34].
Thus, we may reduce the number of T gates by applying
Gidney’s relative-phase Toffoli gates on the Van Meter-
Itoh construction.
FIG. 2: Abstract of our results. This figure shows the
two-level optimization of a control modular adder. In
first-level optimization, we optimize the construction of
a control modular adder. In second-level optimization,
we minimize KQ for FTQ or NISQ by using relative-
phase Toffoli gates.
In the above discussion, we consider execution on FTQ,
but it is also important to consider an efficient circuit for
NISQ because we are currently in the NISQ era. Cur-
rently, NISQ machines have higher error rates on CNOT
gates than on single qubit gates [2, 3]. Thus, we must
reduce KQ based on CNOT gates for NISQ. By using
relative-phase Toffoli gates composed of a smaller num-
ber of CNOT gates [40] compared to the standard Toffoli
gate, we may reduce the cost of a control modular adder
for NISQ.
B. Our Contribution
In this study, we propose a method for optimizing a
control modular adder based on a carry-lookahead adder,
with one construction each for FTQ and NISQ. We apply
two-level optimization to the original Van Meter-Itoh
construction [28], as shown in Figure 2.
In first-level optimization, we optimize the construc-
tion of a control modular adder (Section III). Specifically,
we optimize by focusing on the efficiency of the compara-
tor in a carry-lookahead adder and reduce some control
operations by taking advantage of the classicality of a
and N .
In second-level optimization, we minimize KQ for
FTQ or NISQ by using relative-phase Toffoli gates (Section IV).
In this study, we assume all qubits are connected,
without considering the physical or logical topology
[22, 41, 42]. First, we clarify the definition of KQ
for each device, because the cost of gates differs between
FTQ and NISQ. Specifically, we define KQT on FTQ and
KQCX on NISQ as the product of the number of qubits
and the T -depth or the CNOT-depth, respectively,
because these are the major costs on each device. Then,
we use Gidney’s relative-phase Toffoli gates [34] in FTQ
and Maslov’s relative-phase Toffoli gates [40] in NISQ,
instead of the standard Toffoli gates. However, the
construction for FTQ does not consider the cost of distillation,
and there is a trade-off between T -depth and the
number of T gates running simultaneously. We show that
we achieve the smallest KQT when we run $\Theta(n/\sqrt{\log n})$ T
gates simultaneously.
II. PRELIMINARIES
In this paper, we optimize a carry-lookahead adder by
replacing Toffoli gates with relative-phase Toffoli gates.
To maintain an accurate calculation, we must carefully
consider the role of each Toffoli gate. Moreover, we reduce
computational costs by decomposing Toffoli gates into
single-qubit gates and CNOT gates.
In subsection A, we explain the quantum gate set used
in this paper. Next, to clarify the role of Toffoli gates, we
review Draper et al.’s carry-lookahead adder [32] briefly
in subsection B. We explain T -minimization [39] by using
Gidney’s relative-phase Toffoli gates [34] in subsection C.
We review the general construction of a control modular
adder in subsection D.
A. Quantum Gate Set
In this paper, we use the following:
• Clifford gates: X gate, Y gate, Z gate, H gate,
S gate, CNOT gate
• non-Clifford gates: T gate
The CNOT gate is a two-qubit gate, and the others are
one-qubit gates. We express X gates as ⊕ in the circuit.
In this paper, we focus on two gates: T and CNOT,
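$$T = \begin{pmatrix} 1 & 0 \\ 0 & \exp\left(\frac{i\pi}{4}\right) \end{pmatrix}, \quad \mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \quad (2)$$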
B. Draper et al.’s Carry-lookahead Adder
First, we explain the calculation of a+ b when a and b
are n-bit numbers. We express a as an−1an−2 . . . a0 and b
as bn−1bn−2 . . . b0, where ai and bi are 0 or 1. To calculate
a + b, we introduce a carry ci. Carry ci is defined as an
overflow from the (i − 1)-th bit to the i-th bit. In more
detail, we define ci as
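$$c_i = \begin{cases} 0 & \text{if } i = 0, \\ \left\lfloor \dfrac{a_{i-1} + b_{i-1} + c_{i-1}}{2} \right\rfloor & \text{otherwise.} \end{cases} \quad (3)$$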
Then, (a+ b)i, the i-th bit of a+ b, is calculated as
(a+ b)i = ai ⊕ bi ⊕ ci. (4)
Thus, we need carries to calculate an addition.
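Eqs. (3) and (4) can be checked with a short classical sketch. The function names `carries` and `add_with_carries` are hypothetical, and treating $c_n$ as the n-th bit of the sum follows the role of $|c_n\rangle$ in Figure 3.

```python
def carries(a: int, b: int, n: int) -> list:
    """Return [c_0, ..., c_n] for n-bit a and b, following eq. (3)."""
    c = [0] * (n + 1)  # c_0 = 0
    for i in range(1, n + 1):
        a_prev = (a >> (i - 1)) & 1
        b_prev = (b >> (i - 1)) & 1
        c[i] = (a_prev + b_prev + c[i - 1]) // 2
    return c

def add_with_carries(a: int, b: int, n: int) -> int:
    """Rebuild a + b bit by bit using eq. (4)."""
    c = carries(a, b, n)
    total = 0
    for i in range(n):
        bit = ((a >> i) & 1) ^ ((b >> i) & 1) ^ c[i]
        total |= bit << i
    # The final carry c_n is the n-th bit of the sum.
    total |= c[n] << n
    return total
```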
Now, we give a brief explanation of a carry-lookahead
adder. Before calculating an addition, we determine the
propagation of a carry from the i-th bit to the j-th bit
as a function of the following three conditions:
• propagate: A carry is propagated from the i-th
bit to the j-th bit. Namely, cj = ci.
• generate: A carry is generated in the j-th bit,
namely cj = 1, regardless of the value of ci.
• kill: A carry is killed in the j-th bit, namely cj = 0,
regardless of the value of ci.
To calculate the propagation, we define two functions
p [i, j] , g [i, j] ∈ {0, 1}. p [i, j] is true when the carry from
the i-th bit to the j-th bit should be propagated. Similarly,
g [i, j] is true when the carry out at the j-th bit is
true independent of the value of the carry in at the i-th bit.
We do not need a separate function for kill, as its value
can be inferred from p and g. By using these functions,
we can calculate the propagation state over a wider span.
Specifically, when i < k < j,
p [i, j] = p [i, k] ∧ p [k, j] , (5)
g [i, j] = g [k, j]⊕ (g [i, k] ∧ p [k, j]) , (6)
where ∧ is Boolean AND, and ⊕ is Boolean XOR. By
using these properties, we calculate cj = g [0, j].
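The composition rules (5) and (6) can be sketched classically as below. This is a sketch under stated assumptions: the single-bit base case p[i, i+1] = a_i ⊕ b_i and g[i, i+1] = a_i ∧ b_i is the usual carry-lookahead definition and is assumed here, the split point k is taken as the midpoint for illustration only, and the names `pg` and `carry` are hypothetical.

```python
def pg(a: int, b: int, i: int, j: int):
    """Return (p[i, j], g[i, j]) by recursive use of eqs. (5) and (6)."""
    if j == i + 1:
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        # Assumed base case: propagate if exactly one bit is set,
        # generate if both bits are set.
        return a_i ^ b_i, a_i & b_i
    k = (i + j) // 2  # any k with i < k < j works; midpoint is an arbitrary choice
    p_ik, g_ik = pg(a, b, i, k)
    p_kj, g_kj = pg(a, b, k, j)
    p_ij = p_ik & p_kj            # eq. (5)
    g_ij = g_kj ^ (g_ik & p_kj)   # eq. (6)
    return p_ij, g_ij

def carry(a: int, b: int, j: int) -> int:
    """Carry into bit j, computed as c_j = g[0, j] (with c_0 = 0)."""
    return 0 if j == 0 else pg(a, b, 0, j)[1]
```

Here `carry(a, b, j)` agrees with the overflow of the low j bits of a and b, which is the value that eq. (3) defines recursively.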
Now, we explain Draper et al.’s carry-lookahead adder
for |a〉 |b〉 → |a〉 |b+ a〉. This requires an additional n
qubits for the carry register |c〉 and n qubits for register
|p〉, containing p [i, j]. Thus, a carry-lookahead adder
requires 4n qubits.
Now, we explain the implementation briefly. This im-
plementation consists of five phases, Initialization, P-
rounds, G-rounds, C-rounds, and inverse P-rounds. In
each round,
• Initialization: we calculate g [i, i+ 1] in |ci+1〉
and p [i, i+ 1] in |bi〉,
• P-rounds: we calculate the p-function and write
result in |p〉,
• G-rounds: we calculate |c2k〉 (k ∈ N) by calculat-
ing some g-function,
• C-rounds: we calculate all carry |c〉 by calculating
some g-function,
and we clean |p〉 in inverse P-rounds. After inverse P-
rounds, we calculate each bit of a + b by using these
carries |c〉. In this calculation, we run P-rounds and G-
rounds simultaneously, and we run C-rounds and inverse
P-rounds simultaneously. However, the values of the carries
remain on |c〉. Thus, we must clean |c〉 to |0〉 except
for $c_n$. Draper et al. found that the values of the carries
$c_i$ except $c_n$ in $a + b$ are the same as in $a + (2^n - 1 - a - b)$.
Therefore, we erase the carries by performing the addition
$a + (2^n - 1 - a - b)$ on the lower n − 1 qubits. The abstract
circuit is shown in Figure 3.
[Figure 3: block-level circuit diagram. The registers |a_i〉, |b_i〉, and |c_i〉 pass through the
blocks Initialization, P, G, C, Inverse P, and Erasing Carry; each |b_i〉 becomes |(a + b)_i〉,
|c_n〉 becomes |(a + b)_n〉, and each |a_i〉 is unchanged.]
FIG. 3: An abstract figure of Draper et al.’s carry-lookahead
adder. In this figure, we sort qubits from the
lowest qubits to the highest qubits, which is different
from Figure 1. |ci〉 is given as |0〉 at the beginning of
this circuit and these are cleared to |0〉 after Erasing
Carry. The detailed circuit is shown in Appendix A.
[Figure 4: two Toffoli-gate circuits. (a) Calculation circuit of p[i, j] as eq. (5): the controls
are |p [i, k]〉 and |p [k, j]〉, and the target |0〉 becomes |p [i, j]〉. (b) Calculation circuit of
g[i, j] as eq. (6): the controls are |g [i, k]〉 and |p [k, j]〉, and the target |g [k, j]〉 becomes
|g [i, j]〉.]
FIG. 4: Calculation circuit of p [i, j] and g [i, j].
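The carry-erasure property attributed to Draper et al. above can be checked classically. In this sketch we assume that 2^n − 1 − a − b is read as the bitwise complement of the low n bits of a + b; the function names are hypothetical.

```python
def carries(a: int, b: int, n: int) -> list:
    """Return [c_0, ..., c_n] for n-bit a and b, following eq. (3)."""
    c = [0] * (n + 1)
    for i in range(1, n + 1):
        a_prev = (a >> (i - 1)) & 1
        b_prev = (b >> (i - 1)) & 1
        c[i] = (a_prev + b_prev + c[i - 1]) // 2
    return c

def carries_match_after_complement(a: int, b: int, n: int) -> bool:
    """Check that c_1, ..., c_{n-1} of a + b equal those of a + (2^n - 1 - a - b)."""
    mask = (1 << n) - 1
    s = (a + b) & mask   # low n bits of the sum
    s_comp = mask - s    # assumed reading of 2^n - 1 - a - b on n bits
    return carries(a, b, n)[1:n] == carries(a, s_comp, n)[1:n]
```

Because the carries coincide, running the carry computation in reverse on the complemented sum returns the carry register to |0〉, which is the role of the Erasing Carry block.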
As noted above, a carry-lookahead adder is mainly con-
structed by a calculation on p and g. We calculate p
and g with eq. (5) or (6) respectively, and those are im-
plemented by Toffoli gates as shown in Figure 4. The
detailed explanation of Draper et al.’s adder, including
which p-function or g-function we calculate, is given in
Appendix A. In total, a carry-lookahead adder requires
10n Toffoli gates and 4n CNOT gates. Moreover, the
Toffoli depth is 4 log n.
Up to this point, we have explained the construction of
an adder. Draper et al. also proposed other operations,
such as a subtractor and a comparator, based on their
adder. The number of gates and the depth in a subtractor
is almost the same as those in an adder. In a comparator,
the number of gates is 60% of an adder and the depth is
50% of an adder. Draper et al. implement a comparator
using only Initialization, P-rounds, G-rounds, and their
inverses. More precisely, Draper et al. regard a and b as
$2^{\lceil \log n \rceil}$-bit numbers by padding 0 in the higher bits, but we
do not use these qubits. Where we would calculate p [i, j] or g [i, j]
with i ≤ n − 1 and j ≥ n, we calculate p [i, n] or g [i, n],
respectively, instead. Then, we obtain g [0, n] after G-rounds.
[Figure 5: three-qubit circuit diagram decomposing the Toffoli gate into H, T, T†, S, and CNOT gates.]
FIG. 5: Standard decomposition of Toffoli gate [12].
We call this decomposition ST. The control bits are the
first and second qubits, and the target bit is the third
qubit. This calculation preserves the phase.
[Figure 6: three-qubit circuit diagram built from H, T, T†, S, and CNOT gates.]
FIG. 6: Gidney’s relative-phase Toffoli gate [34] given
by the unitary matrix (7). We call this decomposition
GRT. The control bits are the first and second qubits,
and the target bit is the third qubit. This calculation
preserves the phase only when we input |0〉 on the target
qubit.
C. T -count Minimization of a Carry-lookahead
Adder
Thapliyal et al. [39] proposed T -count minimization
by using relative-phase Toffoli gates. The standard Tof-
foli gate (ST) [12] decomposition is given in Figure 5.
However, we can calculate correctly even if we replace
some Toffoli gates with Gidney’s relative-phase Toffoli
gate (GRT) or its inverse (IGRT) [34]. GRT is shown in
Figure 6, and the corresponding unitary matrix of GRT
in the computational basis is
$$\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & i & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -i & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -i & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & -i & 0
\end{pmatrix}, \quad (7)$$
and we calculate correctly when the target bit is |0〉.
IGRT is shown in Figure 7. In the carry-lookahead adder,
as in many circuits, we must clean our ancilla qubits,
returning them to a known, disentangled state, typically
|0〉. In this case, we can reduce our cost by measuring
the ancilla in IGRT, assuming the cost of measurement
is small. By using GRT and IGRT, the number of T gates
is reduced compared to using only ST.
[Figure 7: three-qubit circuit diagram of IGRT, built from an H gate, a measurement, and a controlled-Z gate.]
FIG. 7: Inverse of Gidney’s relative-phase Toffoli
gate [34]. We call this decomposition IGRT.
This calculation preserves the phase when we input
|000〉 , |010〉 , |100〉, or |111〉, which are outputs of GRT
having valid phase. Control-Z is a Clifford gate, and we
use no T gate.
[Figure 8: circuit diagram. A GRT writes into an ancilla prepared in |0〉, a CNOT copies the
ancilla onto |ci〉, and an IGRT clears the ancilla.]
FIG. 8: Replacing Toffoli gates in G-rounds and C-rounds
in a T -optimized carry-lookahead adder. We
call this decomposition PGRT. We replace the first Toffoli
gate with GRT and the second Toffoli gate with
IGRT. The third qubit is an ancilla qubit. This qubit is
measured in IGRT and will be |0〉 after running PGRT.
Thapliyal et al. proposed two constructions. The
first construction replaced Toffoli gates in Initialization
and P-rounds with GRT, and Toffoli gates in the inverse
rounds with IGRT. Other Toffoli gates are replaced with
ST. Thapliyal et al. call this construction qubit-optimize.
The number of qubits is 4n and the number of T gates
is 40n.
The second construction replaces all Toffoli gates with
GRT or IGRT at the cost of additional ancilla qubits. Thapliyal
et al. call this construction T -optimize. Specifically, we
replace Toffoli gates in Initialization, P-rounds, and the
inverse of these similarly as the first construction. More-
over, we replace Toffoli gates in G-rounds and C-rounds
by the pair of GRT and IGRT as in Figure 8. We call
these gates PGRT, where P is the abbreviation of “pair”.
In this construction, Thapliyal et al. claim that the num-
ber of qubits is 6n and the number of T gates is 20n.
However, we recalculated these results, and our results
differ from those in [39]. In our result, the number of
qubits is 4.5n and the number of T gates is 28n. The difference
in the number of qubits arises from our method
for preparing ancilla qubits. Thapliyal et al. prepare
new ancilla qubits for G-rounds and C-rounds respectively,
while they recycle ancilla qubits for P-rounds. We
apply this recycling to G-rounds and C-rounds similarly.
[Figure 9: circuit diagram. The registers CTRL |x〉, |b〉, and COMP |0〉 pass through the blocks
Comp, Add, and Comp in this order; |b〉 becomes |b + ax mod N〉, and CTRL and COMP
return as |x〉 and |0〉.]
FIG. 9: The general construction of a control modular
adder. Add means an adder, and Comp means a
comparator. CTRL has a single qubit which is used to
hold the value of the control. |b〉 has n qubits which are
used to hold the result of a control modular addition.
COMP has a single qubit which is used to hold the result
of a comparison. a and N are classical numbers.
D. The General Construction of a Control
Modular Adder
In this subsection, we explain the calculation of
|x〉 |b〉 |0〉 → |x〉 |b+ ax mod N〉 |0〉 . (8)
The general construction of a control modular adder is
shown in Figure 9. The first register has a single qubit
which is used to hold the value of the control. We call this
the CTRL qubit. The second register has n qubits which
are used to hold the result of a control modular addition.
The third register has a single qubit which is used to
hold the result of a comparison temporarily. We call this
the COMP qubit. Specifically, we determine whether
we subtract N or not based on COMP. We use a
comparator with one control qubit and an adder with
two control qubits, and we write these as a C-comparator
and a CC-adder, respectively.
To execute a control modular adder, we conduct oper-
ations in this order:
1. We compare the second register |b〉 and the classical
value N − a. If b ≥ N − a, namely a + b ≥ N , we
flip COMP.
2. If both CTRL and COMP are 1, we subtract N−a
from the second register. If CTRL is 1 and COMP
is 0, we add a. Otherwise, we add no value.
3. If the second register is strictly less than a, we flip
COMP.
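The three steps above can be traced classically to confirm that the second register ends in b + xa mod N and that COMP returns to 0. This is only a sketch on computational basis states; the function name `control_modular_add` is hypothetical, and both comparisons are taken to be controlled by CTRL, as in a C-comparator.

```python
def control_modular_add(x: int, b: int, a: int, N: int):
    """Classical trace of the general construction (steps 1 to 3).

    Returns (second register, COMP) after the three operations.
    """
    comp = 0
    # Step 1: C-comparator. Flip COMP if CTRL is 1 and b >= N - a.
    if x == 1 and b >= N - a:
        comp ^= 1
    # Step 2: CC-adder. Subtract N - a, add a, or add nothing.
    if x == 1 and comp == 1:
        b -= N - a
    elif x == 1:
        b += a
    # Step 3: C-comparator. Flip COMP if CTRL is 1 and the register is below a.
    if x == 1 and b < a:
        comp ^= 1
    return b, comp
```

For example, with N = 7, a = 4, b = 5, and x = 1, step 1 sets COMP to 1, step 2 leaves 5 − 3 = 2 in the register, and step 3 resets COMP because 2 < 4.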
III. FIRST-LEVEL OPTIMIZATION: OUR
CONSTRUCTION OF A CONTROL MODULAR
ADDER
In this section, we explain first-level optimization
on the original construction [28]. In the general construction,
a comparator has about 1/2 the depth of a
carry-lookahead adder. Thus, by constructing a control
modular adder using the general construction, the
depth is about the same as that of 2 adders, because a control
modular adder is composed of two comparators and
one adder. In the original construction, we use 3 adders.
Thus, we use only 2/3 of the KQ of the original construction.
[Figure 10: circuit diagram. The registers CTRL |x〉, |d〉 (initially |0〉), |b〉, and COMP |0〉 pass
through a C-comparator (Comp), a CC-adder made of the blocks Embed, Add, and Reset, and a
second C-comparator (Comp); |b〉 becomes |b + ax mod N〉, and |d〉 and COMP return to |0〉.]
FIG. 10: Our construction of a control modular adder
based on Figure 9. A CC-adder is constructed by embedding,
an adder, and resetting. Then, we add the
second register |d〉 as an n-qubit ancilla for embedding
the value based on CTRL. The carry register |c〉 with
n qubits and the p-function register |p〉 with n qubits
are not represented in this figure for visibility. In a C-comparator,
we do not use the second register. In total,
our control modular adder requires 4n + 2 qubits.
Then, we need to give the construction of
• C-comparator (subsection A)
• CC-adder (subsection B)
on a carry-lookahead adder. In this construction, we do
not decompose Toffoli gates, because the decomposition
of Toffoli gates differs between FTQ and NISQ.
Thus, we leave Toffoli gates as they are, and we consider
the decomposition of Toffoli gates in Section IV.
In our construction, we consider the classicality of a
and N as described by Markov and Saeedi [25] to realize
higher efficiency. Moreover, we treat the C-comparator
precisely, which the original construction does not.
In this way, we propose a circuit construction
of a control modular adder.
Based on Figure 9, we construct our circuit as shown in
Figure 10. We add the second n-qubit ancilla register for
embedding value with CTRL. In addition to these regis-
ters, we use the carry register |c〉 with n qubits and the
p-function register |p〉 with n qubits to realize the carry-
lookahead adder, not represented in Figure 10. Thus, our
control modular adder requires 4n+ 2 qubits. The num-
ber of gates and the depth is given in Table I, and the
breakdown of this is given in Table V in Appendix B.
Now, we explain the C-comparator and the CC-adder
briefly.
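[Figure 11: block-level circuit diagram. The registers CTRL, |d_i〉, |b_i〉, |c_i〉, and COMP pass
through the blocks Initialization, P, G, Inverse G, Inverse P, and Inverse Initialization; CTRL
and |cn〉 carry the control dots of the central gate.]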
FIG. 11: Block-level view of our construction of a C-
comparator. In this figure, we sort qubits from the low-
order qubits to the high-order qubits, top to bottom.
This circuit is symmetric about the Toffoli gate sur-
rounded by a dotted box. |ci〉 is given as |0〉 at the be-
ginning of this circuit and these are cleared back to |0〉
after the computation. The example circuits are shown
in Figure 23 or 25 in Appendix B.
A. Construction of a C-comparator
In a C-comparator, only COMP is changed and the other
qubits do not change. Thus, to implement a C-comparator,
it is sufficient to add control operations
only on the gates that act on COMP and to leave
the other gates unchanged.
In our construction of a control modular adder, we use
two types of C-comparators. In the first C-comparator,
we flip COMP if CTRL is 1 and b ≥ N − a. In the final
C-comparator, we flip COMP if CTRL is 1 and b < a.
In both cases, we judge whether b ≥ d or b < d with a
classical value of d.
We construct these operations taking advantage of the
classicality of d. The intuitive explanation of this operation
is that we calculate $b + (2^n - d)$ and check whether
there is an overflow in the n-th bit. Specifically,
$$b + (2^n - d) = 2^n + (b - d) \quad (9)$$
and there is an overflow when b ≥ d. This construction
is similar to previous constructions by Markov and
Saeedi [25], but differs slightly because our
construction does not require X gates on |b〉. The number
of gates and the depth is given in Table I. The detailed
construction is given in Appendix B. The abstract con-
struction of our C-comparator is given in Figure 11, and
the example circuits are shown in Figure 23 and 25.
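The overflow test of eq. (9) can be sketched classically as follows. The function name `geq_by_overflow` is hypothetical, and only the arithmetic is modeled, not the circuit.

```python
def geq_by_overflow(b: int, d: int, n: int) -> int:
    """Return 1 if b >= d, decided by the overflow bit of b + (2^n - d).

    b is an n-bit register value and d is a classical constant with 0 <= d < 2^n.
    """
    total = b + ((1 << n) - d)  # equals 2^n + (b - d), as in eq. (9)
    return (total >> n) & 1     # bit n is set exactly when b - d >= 0
```

With d = N − a this gives the condition of the first C-comparator, and taking the complement of the result with d = a gives the condition b < a of the final one.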
B. Construction of a CC-adder
TABLE I: Gate count and depth of our proposed control modular adder. The breakdown of this is shown in
Table V in Appendix B.
                        Count               Depth
Operation               Toffoli   CNOT      Toffoli    CNOT
C-comparator (twice)    4n        n         2 log n    O(1)
CC-adder                9.5n      4.75n     4 log n    2 log n
Total                   17.5n     6.75n     8 log n    2 log n
[Figure 12: block-level circuit diagram. Gates controlled by CTRL and COMP write either
$2^n + a - N$ or a onto |d〉.]
FIG. 12: Block-level diagram of the embedding circuit.
We omit |b〉 in Figure 10. We embed $2^n + a - N$ or a on
|d〉 based on CTRL and COMP. The example circuit of
the embedding is shown in Figure 24 in Appendix B.
In a CC-adder, we embed values before and after an
adder, similar to a C-adder [29]. Based on this construction,
we apply optimization by considering the classicality
of a and N . From this point forward, we mainly focus
on embedding on |d〉. In a CC-adder, we conduct the
following:
• If CTRL is 1 and COMP is 1, we add a and subtract
N . This operation can be realized by adding $2^n + a - N$
and disregarding the calculation of a carry $c_n$.
• If CTRL is 1 and COMP is 0, we add a.
• Otherwise, we add no value.
Thus, the embedding is conducted as in Figure 12. The
resetting is conducted by inverting the embedding circuit.
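The embedding rule above can be traced classically. In this sketch, which assumes N < 2^n, the names `embed` and `cc_add` are hypothetical, and disregarding the carry $c_n$ is modeled by keeping only the low n bits of the sum.

```python
def embed(ctrl: int, comp: int, a: int, N: int, n: int) -> int:
    """Value written to |d> by the embedding circuit."""
    if ctrl == 1 and comp == 1:
        return (1 << n) + a - N  # stands for "add a and subtract N"
    if ctrl == 1:
        return a
    return 0                     # CTRL = 0: add no value

def cc_add(ctrl: int, comp: int, b: int, a: int, N: int, n: int) -> int:
    """Add the embedded value to b, disregarding the carry c_n."""
    d = embed(ctrl, comp, a, N, n)
    return (b + d) & ((1 << n) - 1)
```

When COMP holds the comparison result b ≥ N − a, `cc_add(1, comp, b, a, N, n)` returns (a + b) mod N, because the discarded carry removes exactly the added 2^n.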
After embedding, we apply a standard adder. Then,
we conduct two optimizations as follows:
• disregarding gates including |g [0, n]〉.
• eliminating gates in Initialization where we know
the control bit is 0.
The number of gates and the depth is given in Table I.
The detailed construction is given in Appendix B. More-
over, we give the example circuit of a CC-adder in Fig-
ure 24 in Appendix B.
IV. SECOND-LEVEL OPTIMIZATION:
CONSTRUCTING A CONTROL MODULAR
ADDER FOR FTQ AND NISQ DEVICES
In this section, we explain our second-level optimization.
We evaluate the computational cost for both FTQ
on the Logical layer and NISQ, focusing on the decomposition
of Toffoli gates. We define KQ more specifically
for FTQ and NISQ and minimize this value. For FTQ,
we minimize the number of T gates by using Gidney’s
relative-phase Toffoli gates. However, this construction
does not take into consideration the cost of distillation.
We take into account the cost of distillation by finding
the maximal number of T gates which should be run
simultaneously, optimizing KQT . For NISQ, we apply
Maslov’s relative-phase Toffoli gates with a small number
of CNOT gates [40] and minimize KQCX. In this way,
we propose a control modular adder that is more
efficient than that of Van Meter and Itoh [28], which we call
the original construction in this section. In the following
discussion, we disregard the rounds with O(1) gates. We
explain the optimization for FTQ in subsection A and
the optimization for NISQ in subsection B.
[Figure 13: two-qubit circuit diagram. |ψ〉 and the ancilla |Y 〉 interact through two controlled
gates, with two H gates on the ancilla; the outputs are S |ψ〉 and |Y 〉.]
FIG. 13: Running an S gate [11]. The second qubit is
$|Y\rangle = (|0\rangle + i|1\rangle)/\sqrt{2}$. Assuming correct operation on
top of error correction, this ancilla passes through the
gate execution unmodified, allowing it to be reused.
A. Computational Cost on the FTQ Logical Layer
Next, we consider the optimal circuit for FTQ on
the Logical layer, using Jones et al.’s architecture as a
model [11]. This architecture, in common with other
error-corrected architectures, provides a fundamental gate
set consisting of X, Y , Z, CNOT, and H gates, and measurement;
here, we ignore qubit movement in the surface
code. To run an S gate, we prepare an ancilla qubit
$|Y\rangle = (|0\rangle + i|1\rangle)/\sqrt{2}$ and run the circuit shown in
Figure 13. An S† gate can be realized by the reverse circuit
of Figure 13.
To achieve universal computation, we also need a non-Clifford
gate; the choice of T is typical. To run a T gate,
we prepare an ancilla qubit $|A\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt{2}$
and run the circuit shown in Figure 14. To run a T † gate,
we apply an S† gate instead of an S gate. To realize
accurate T gates, we must prepare an accurate |A〉 state.
Preparing |A〉 is done by distillation, as shown in Figure 15.
This distillation circuit requires 15 qubits and 6 time steps,
even assuming all of the CNOT gates can be implemented
concurrently, which is difficult to realize. Distillation is an
expensive operation, and its optimization is an ongoing topic of
research [14]. Thus, a T gate is the greatest factor in the
cost of an FTQ circuit, leading us to focus on reducing
the number of T gates.
[Figure 14: two-qubit circuit diagram. |ψ〉 controls a gate on the ancilla |A〉 and is then measured
in the Z basis (MZ); an S gate appears on the ancilla line, which outputs T |ψ〉.]
FIG. 14: Running a T gate [11]. The second qubit is
$|A\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt{2}$; the |A〉 state is consumed in
the process, with the consequence that creation of high-fidelity
|A〉 states is one factor limiting performance.
[Figure 15: circuit diagram on 15 qubits prepared in |A〉. One qubit outputs the distilled |A〉;
of the others, four are measured in the X basis (MX) and ten in the Z basis (MZ).]
FIG. 15: A distillation circuit of |A〉 [13]. By this distillation
circuit, we reduce the error rate of |A〉 from p
to $35p^3$.
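To give a feel for the numbers, the error suppression stated in the caption of Figure 15 (from p to 35p³, using 15 qubits) can be evaluated with a short sketch. Repeating the circuit over several rounds, with each round consuming 15 outputs of the previous one, is an assumption made here for illustration and is not discussed in the text; the function name `distill` is hypothetical.

```python
def distill(p: float, rounds: int = 1):
    """Return (output error rate, number of raw |A> states consumed).

    Each round maps the error rate p to 35 * p**3 and consumes 15 input
    states per output state. Concatenating rounds is an assumption.
    """
    raw_states = 1
    for _ in range(rounds):
        p = 35 * p ** 3
        raw_states *= 15
    return p, raw_states
```

For instance, one round starting from p = 0.01 gives an error rate of about 3.5 × 10⁻⁵ at the price of 15 raw states, which illustrates why the space and time spent on distillation cannot be neglected.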
Now, we minimize the number of T gates in our control
modular adder. In our CC-adder, we adopt a construction
similar to that of Thapliyal et al., in particular replacing
Toffoli gates in G-rounds and C-rounds with PGRT. We further
minimize the number of T gates in a C-comparator. In a
C-comparator, we replace Toffoli gates in Initialization
and P-rounds with GRT and Toffoli gates in the inverse
rounds with IGRT, in the same way as Thapliyal et al.’s
construction. We then focus on G-rounds. The abstract circuit
of a C-comparator is shown in Figure 11, and we give
example circuits in Figures 23 and 25 in Appendix B. In these
figures, Toffoli gates in G-rounds and inverse G-rounds
are symmetric about the Toffoli gate surrounded by a
dotted box. The control qubits of corresponding Toffoli
gates in G-rounds and inverse G-rounds are the same.
Therefore, we can calculate with an accurate phase as in
Figure 16. This construction requires an additional n
qubits to preserve the GRT results. Fortunately, the
C-comparator does not use the n qubits of |d〉 in Figure 10.
Thus, we realize this construction without an overhead of qubits.
[Figure 16: circuit diagram. In the G-round a GRT writes into an ancilla prepared in |0〉 and a
CNOT acts on |ci〉; the ancilla is kept until the corresponding gate in the inverse G-round,
where a second CNOT and an IGRT clear it.]
FIG. 16: Our construction of G-rounds and inverse G-rounds
in a C-comparator. In Figure 8, we apply IGRT
after the first CNOT gate immediately in G-rounds and
inverse G-rounds. In our construction, we calculate the
result of GRT in the third ancilla qubit and preserve
this qubit until the corresponding Toffoli gate in inverse
G-rounds. Then, we clear this ancilla qubit by IGRT.
The computational cost of our control modular adder
is shown in Table II, and the breakdown of constructions
based on our construction is given in Table VI in Appendix D.
From Table II, our construction requires 43n
T gates. We call this construction a T -optimal control
modular adder. The original construction requires 30n
Toffoli gates, each implemented by ST with 7 T gates,
for 210n T gates in total. Thus, our construction requires
20% of the number of T gates of the original
construction.
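The ratio quoted above follows directly from the leading coefficients in Table II. The sketch below only redoes this arithmetic; the dictionary and function names are hypothetical, and lower-order terms are ignored.

```python
# Leading coefficients of the total T-count (in units of n) from Table II.
T_COUNT_PER_N = {
    "Van Meter and Itoh": 210.0,
    "Draper et al.": 122.5,
    "Thapliyal et al. (qubit-optimize)": 75.0,
    "Thapliyal et al. (T-optimize)": 51.0,
    "Ours": 43.0,
}

def ratio_to_original(name: str) -> float:
    """T-count of a construction relative to the original construction."""
    return T_COUNT_PER_N[name] / T_COUNT_PER_N["Van Meter and Itoh"]

# The original count is 30n Toffoli gates times 7 T gates per ST.
ORIGINAL_T_PER_N = 30 * 7
```

Here `ratio_to_original("Ours")` evaluates to 43/210, which is about 0.205 and corresponds to the 20% stated above.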
Now, we focus on KQ of a T -optimal control modular
adder. In this circuit, we use O(n) qubits and O(log n)
depth, giving a KQ of O(n log n). However, we do not
consider the computational costs for distillation in this
calculation. We can trade space for time, with substan-
tial flexibility, by allocating more qubits to ancilla “facto-
ries”, corresponding to increasing the number of T gates
that are in concurrent execution [10, 38]. For an accu-
rate estimate of the cost, and to enable fair comparison
with prior research, we must take into account the T gate
costs, including the space for distillation [11, 43].
However, it is difficult to calculate computational costs
for distillation precisely, because the cost depends on
many architecture-specific parameters. Instead of KQ,
we therefore introduce a new index KQT, defined as the product of
the number of logical qubits and the T-depth. We define
nT as the T-width, the upper bound on the number of
T gates running simultaneously. We assume that the distillation
step for each T gate requires a constant number cg of logical qubits.
By choosing the nT that minimizes KQT, we reduce the computational
cost of our control modular adder.
TABLE II: T-count of our control modular adder and prior work. The latter four constructions are based on our
construction proposed in Section III. The breakdown of the latter four constructions is shown in Table VI in Appendix D.
Construction                            #comparators  #adders  Total T-count
Van Meter and Itoh [28]                 0             3        210n
Draper et al. [32]                      2             1        122.5n
Thapliyal et al. (qubit-optimize) [39]  2             1        75n
Thapliyal et al. (T-optimize) [39]      2             1        51n
Ours                                    2             1        43n
In the above discussion, our control modular adder
uses 4n + 2 qubits for calculation, as explained in Section
III. In addition, we require ancilla qubits for running nT
T gates. Specifically, to run one T gate, we require one
qubit |Y〉 for running S gates and cg qubits for generating
|A〉. Thus, when we run nT T gates simultaneously,
we use the following qubits:
• |y〉 (contains |Y〉 states): nT qubits
• |g〉 (generates |A〉 states): cg nT qubits.
The number of qubits in |y〉 is nT, because we
consume one S gate in each T gate. Then, the total number
of qubits is
\[ 4n + (c_g + 1) n_T + 2. \tag{10} \]
Now, we calculate the T-depth. We assume that we run the
GRT gates with the same timing, and that each
GRT has T-depth 2, from Figure 6. The T-depth depends on
nT as illustrated in Figure 17, and is given by
\[ \frac{86n}{n_T} + 12 \log n_T - 12. \tag{11} \]
The detailed calculation is given in Appendix C.
FIG. 17: Calculating T -depth. Distill means distilla-
tion circuits. In the naive construction, we run as many
T gates as possible. In our construction, we restrict the
upper-bound of the number of simultaneous T gates to
nT . When we reduce nT , the total number of qubits is
smaller and T -depth is larger.
Now, we minimize KQT with respect to nT. KQT is
\[ \left( 4n + (c_g + 1) n_T + 2 \right) \left( \frac{86n}{n_T} + 12 \log n_T - 12 \right). \tag{12} \]
We minimize this over nT > 0.
Letting the expression in Eq. 12 be f(nT), we see that
\[ \frac{d^2 f(n_T)}{d n_T^2} > 0 \tag{13} \]
for nT > 0. Thus, f(nT) is a convex function and it
is sufficient to search for only one optimal value of nT.
Then, the optimal value is
\[ n_T = \sqrt{\frac{86}{3 (c_g + 1)}} \, \frac{n}{\sqrt{\log n}}. \tag{14} \]
Thus, a T-width of O(n / √(log n)) minimizes KQT. Plugging
this value into Eq. 12, we obtain
\[ 4n + (c_g + 1) n_T + 2 \sim 4n, \tag{15} \]
\[ \frac{86n}{n_T} + 12 \log n_T - 12 \sim 12 \log n. \tag{16} \]
Therefore, the dominant term of KQT is 48n log n.
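As a rough numerical illustration of this minimization, the sketch below evaluates the KQT expression of Eq. 12 over integer T-widths and compares the best value found with the closed-form estimate of Eq. 14. The base-2 logarithm, the sample size n = 1024, the value c_g = 10, and all function names are assumptions made only for this example; the paper does not fix them here.

```python
import math

def num_qubits(n, n_t, c_g):
    # Eq. 10: data qubits plus ancillae for n_t concurrent T gates.
    return 4 * n + (c_g + 1) * n_t + 2

def t_depth(n, n_t):
    # Eq. 11, with log taken as base 2 (an assumption of this sketch).
    return 86 * n / n_t + 12 * math.log2(n_t) - 12

def kqt(n, n_t, c_g):
    # Eq. 12: product of logical qubit count and T-depth.
    return num_qubits(n, n_t, c_g) * t_depth(n, n_t)

def closed_form_n_t(n, c_g):
    # Eq. 14: asymptotic estimate of the optimal T-width.
    return math.sqrt(86 / (3 * (c_g + 1))) * n / math.sqrt(math.log2(n))

def scan_best_n_t(n, c_g):
    # Brute-force search over integer T-widths from 1 to 4n.
    return min(range(1, 4 * n + 1), key=lambda n_t: kqt(n, n_t, c_g))

n, c_g = 1024, 10  # hypothetical sample values
best = scan_best_n_t(n, c_g)
estimate = closed_form_n_t(n, c_g)
print("scanned optimum:", best, "KQT:", kqt(n, best, c_g))
print("Eq. 14 estimate:", estimate, "KQT:", kqt(n, round(estimate), c_g))
print("leading term 48 n log n:", 48 * n * math.log2(n))
```

Because Eq. 14 keeps only the leading-order terms, the scanned optimum and the estimate need not coincide at finite n; the scan is included so the two can be compared for a chosen c_g.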
B. Optimization for NISQ
Now, we propose a form of the control modular adder
reducing CNOT gates. To reduce this number, we review
the decomposition of Toffoli gates into CNOT gates. We
use relative-phase Toffoli gates with differences in phase
as in Figures 18 and 19, proposed by Maslov [40]. The
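corresponding unitary matrix of Figure 18 in the computational
basis is
\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & i \\
0 & 0 & 0 & 0 & 0 & 0 & -i & 0
\end{pmatrix}. \tag{17}
\]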
This gate changes the phase when the input is
|1〉 |0〉 |1〉, |1〉 |1〉 |0〉, or |1〉 |1〉 |1〉. We call this relative-phase
Toffoli gate RT3, and we call its inverse IRT3.
[Figure 18: circuit diagram. The target qubit carries the gate sequence H, T, T†, T, T†, H, interleaved with three CNOT gates controlled by the two control qubits.]
FIG. 18: A relative-phase Toffoli gate with 3
CNOT gates (RT3). This gate changes the phase when
the input is |1〉 |0〉 |1〉, |1〉 |1〉 |0〉, or |1〉 |1〉 |1〉. We call the
inverse circuit of RT3 IRT3.
[Figure 19: circuit diagram. The target qubit carries the gate sequence H, T†, T, T†, T, H, interleaved with four CNOT gates controlled by the two control qubits.]
FIG. 19: A relative-phase Toffoli gate with 4
CNOT gates (RT4). This gate changes the phase when
both control bits are 1. We call the inverse circuit of
RT4 IRT4.
The corresponding unitary matrix of Figure 19 in the computational
basis is
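\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -i \\
0 & 0 & 0 & 0 & 0 & 0 & -i & 0
\end{pmatrix}. \tag{18}
\]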
This calculation changes the phase when both control
bits are 1. We call this relative-phase Toffoli gate RT4,
and its inverse IRT4. By using these relative-phase Tof-
foli gates, we reduce the number of CNOT gates. Next,
we address which Toffoli gates can be replaced with
relative-phase Toffoli gates.
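The statements about which inputs pick up a phase can be read off the two matrices mechanically. The following sketch uses plain Python lists, with the matrices copied from Eqs. 17 and 18 and the basis index taken as 4·(first control) + 2·(second control) + target, which is an ordering assumed here to match the |c1〉 |c2〉 |t〉 labels in the text. The helper names are illustrative only.

```python
# Matrices copied from Eq. 17 (RT3) and Eq. 18 (RT4); rows are outputs, columns inputs.
RT3 = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, -1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1j],
    [0, 0, 0, 0, 0, 0, -1j, 0],
]
RT4 = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, -1j],
    [0, 0, 0, 0, 0, 0, -1j, 0],
]

def dagger(m):
    # Conjugate transpose.
    size = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(size)] for i in range(size)]

def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def is_unitary(m):
    product = matmul(dagger(m), m)
    return all(abs(product[i][j] - (1 if i == j else 0)) < 1e-12
               for i in range(len(m)) for j in range(len(m)))

def toffoli_image(index):
    # An exact Toffoli flips the target bit only when both control bits are 1.
    return index ^ 1 if index >= 6 else index

def phased_inputs(m):
    # Inputs whose Toffoli image carries an amplitude other than exactly 1.
    result = []
    for index in range(8):
        amplitude = m[toffoli_image(index)][index]
        if abs(amplitude - 1) > 1e-12:
            result.append(format(index, "03b"))
    return result

print("RT3 phased inputs:", phased_inputs(RT3))
print("RT4 phased inputs:", phased_inputs(RT4))
```

Multiplying either matrix by its conjugate transpose gives the identity, which is why a gate paired with its inverse (RT3 with IRT3, RT4 with IRT4) leaves no residual phase when nothing changes on the affected inputs in between.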
First, we consider which Toffoli gates can be replaced
in a C-comparator. The structure of a C-comparator
is shown in Figure 11, and example circuits are given in
Figures 23 and 25 in Appendix B. In these figures, all Toffoli
gates are symmetric about the Toffoli gate surrounded by
a dotted box, in the middle of the circuit. Thus, we can
replace the Toffoli gates to the left of the dotted box by
RT3 and those to the right of the box by IRT3. Therefore,
we can replace all of the Toffoli gates in a C-comparator
except this middle one with RT3 or IRT3.
Next, we address which Toffoli gates can be replaced
in a CC-adder. Those in P-rounds can be
replaced by RT3 and those in inverse P-rounds by IRT3.
The other Toffoli gates calculate the values of carries,
and these carries are cleared after the calculation. In the
calculation of carries, the values of the control bits change
between calculating a carry and erasing it, which would
seem to rule out using anything but pure Toffoli gates.
However, looking more closely, we see that the value of
a carry changes at most once, namely when both control
bits are |1〉. Thus, if the gate acts correctly in the other
situations, we can calculate and clear carries correctly.
RT4 satisfies this condition. Therefore, we can replace Toffoli gates
by RT4 in the Initialization, G-rounds, and C-rounds,
and we can replace Toffoli gates by IRT4 in the inverse
rounds.
As a result, the cost of our control modular adder is
shown in Tables III and IV. The breakdown of the constructions based
on our construction is shown in Tables VII and VIII in
Appendix D. From Tables III and IV, our construction
is better in terms of both the number of CNOT gates
and the CNOT-depth. Now, we compare our circuit to the
original construction.
First, we compare the CNOT count. Our construction
requires 64.75n CNOT gates. The original construction
requires 30n Toffoli gates, each implemented by ST using 6
CNOT gates, plus an additional 4.5n CNOT gates
for embedding and resetting. Thus, the original construction
requires 184.5n CNOT gates in total. Therefore, our
construction reduces the number of CNOT gates to only
35% of the number in the original.
Next, we compare KQCX, defined as the product of
the number of qubits and the CNOT-depth. Our construction
requires a KQCX of 120n log n. The original construction
has Toffoli depth 12 log n, with each Toffoli gate implemented by ST
with CNOT-depth 6, and requires an additional CNOT-depth of 6 log n
for the embedding step. Thus, the original construction
requires a CNOT-depth of 78 log n and a KQCX of 312n log n.
Therefore, our construction requires only 38% of
the KQCX of the original construction.
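The two comparisons above reduce to a few lines of arithmetic on the leading coefficients. The sketch below recomputes them; the variable names are illustrative, and the coefficients are those stated in the text and in Tables III and IV.

```python
# CNOT count, as multiples of n.
original_cnot = 30 * 6 + 4.5          # 30n Toffoli gates x 6 CNOTs + 4.5n for embedding/resetting
ours_cnot = 64.75
cnot_ratio = ours_cnot / original_cnot

# CNOT-depth, as multiples of log n, and KQCX, as multiples of n log n.
original_depth = 12 * 6 + 6           # Toffoli depth 12 log n x CNOT-depth 6 + 6 log n embedding
original_kqcx = 4 * original_depth    # 4n qubits
ours_kqcx = 4 * 30                    # 4n qubits x CNOT-depth 30 log n (Table IV)
kqcx_ratio = ours_kqcx / original_kqcx

print(f"CNOT count: {ours_cnot}n vs {original_cnot}n ({cnot_ratio:.1%})")
print(f"KQCX: {ours_kqcx} n log n vs {original_kqcx} n log n ({kqcx_ratio:.1%})")
```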
V. CONCLUSION AND FUTURE WORK
In this study, we proposed a method of optimizing
a control modular adder based on a carry-lookahead
adder [32] and Van Meter and Itoh’s construction [28].
First, we showed that the general construction given in
Figure 9 requires about 2/3 of the KQ of the original construction.
Then, we constructed a more efficient circuit. We evaluated
the computational cost in FTQ and showed that our
circuit requires only about 20% of the T gates of the original.
Moreover, we showed that our circuit achieves its minimum
KQT when we run Θ(n / √(log n)) T gates simultaneously.
Finally, we proposed an efficient circuit for use in the NISQ
era, and showed that our circuit requires only 35% of
the CNOT gates and 38% of the KQCX of the original.
In this work, we have focused on optimizing Toffoli
gates by using relative-phase Toffoli gates. However, in
previous research [44, 45], other researchers have used
gates such as Fredkin and Peres gates. These gates also
may be simplified by replacing them with relative-phase
gates. Thus, we expect that those circuits would also
show an improvement with these techniques applied.
TABLE III: CNOT count of our control modular adder and prior work. The latter four constructions are based on
our construction proposed in Section III. The breakdown of the latter four constructions is shown in Table VII in
Appendix D.
Construction                            #comparators  #adders  Total CNOT count
Van Meter and Itoh [28]                 0             3        184.5n
Draper et al. [32]                      2             1        111.75n
Thapliyal et al. (qubit-optimize) [39]  2             1        88n
Thapliyal et al. (T-optimize) [39]      2             1        104n
Ours                                    2             1        64.75n
TABLE IV: KQCX of our control modular adder and prior work. The latter four constructions are based on our
construction proposed in Section III. The breakdown of the latter four constructions is shown in Table VIII in
Appendix D.
Construction                            #qubits  Depth of the circuit  KQCX
Van Meter and Itoh [28]                 4n       78 log n              312n log n
Draper et al. [32]                      4n       50 log n              200n log n
Thapliyal et al. (qubit-optimize) [39]  4n       50 log n              200n log n
Thapliyal et al. (T-optimize) [39]      4.5n     66 log n              297n log n
Ours                                    4n       30 log n              120n log n
In this paper, we have considered only a single control
modular addition. In future work, the
circuits that postpone and summarize multiple modular
arithmetic operations, as proposed by Van Meter and
Itoh [28], should be addressed using similar optimization
techniques. In addition, it is important to minimize KQ
by reordering gates [26, 46].
Our construction does not consider the architecture of
quantum computers, such as the linear nearest neighbor
architecture [22, 41, 42]. Thus, in the next step, we will consider
the appropriate architecture and the additional cost for our
construction.
Lastly, we focused only on the Logical layer of FTQ in
this study. In future work, we must consider the mapping
to physical qubits, as well as distillation protocols.
ACKNOWLEDGEMENT
The first author is supported by a JSPS Fellowship
for Young Scientists. This work was supported by
JSPS Grant-in-Aid for JSPS Fellows 20J11754, MEXT
Quantum Leap Flagship Program Grant Number
JPMXS0118067285, and JST CREST Grant Number
JPMJCR14D6, Japan.
[1] J. Preskill, Quantum computing in the NISQ era and
beyond, Quantum 2, 79 (2018).
[2] A. D. Córcoles, A. Kandala, A. Javadi-Abhari, D. T. McClure,
A. W. Cross, K. Temme, P. D. Nation, M. Steffen,
and J. Gambetta, Challenges and opportunities of
near-term quantum computing systems, arXiv preprint
arXiv:1910.02894 (2019).
[3] IBM Quantum Experience, https://quantum-computing.ibm.com.
[4] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin,
R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A.
Buell, et al., Quantum supremacy using a programmable
superconducting processor, Nature 574, 505 (2019).
[5] R. S. Smith, M. J. Curtis, and W. J. Zeng, A practi-
cal quantum instruction set architecture, arXiv preprint
arXiv:1608.03355 (2016).
[6] P. Murali, D. M. Debroy, K. R. Brown, and
M. Martonosi, Architecting noisy intermediate-scale
trapped ion quantum computers, arXiv preprint
arXiv:2004.04706 (2020).
[7] S. Moses, J. Pino, J. Dreiling, C. Figgatt, J. Gae-
bler, M. Allman, C. Baldwin, M. Foss-Feig, D. Hayes,
K. Mayer, et al., Demonstration of the QCCD trapped-
ion quantum computer architecture, Bulletin of the
American Physical Society (2020).
[8] S. J. Devitt, W. J. Munro, and K. Nemoto, Quantum
error correction for beginners, Reports on Progress in
Physics 76, 076001 (2013).
[9] J. Preskill, Fault-tolerant quantum computation, in In-
troduction to quantum computation and information
(World Scientific, 1998) pp. 213–269.
[10] A. M. Steane, Space, time, parallelism and noise require-
ments for reliable quantum computing, Fortschritte der
Physik: Progress of Physics 46, 443 (1998).
[11] N. C. Jones, R. Van Meter, A. G. Fowler, P. L. McMa-
hon, J. Kim, T. D. Ladd, and Y. Yamamoto, Layered
architecture for quantum computing, Physical Review X
2, 031007 (2012).
[12] M. A. Nielsen and I. Chuang, Quantum computation and
quantum information (2002).
[13] A. G. Fowler, A. M. Stephens, and P. Groszkowski, High-
threshold universal quantum computation on the surface
code, Physical Review A 80, 052312 (2009).
[14] C. Gidney and A. G. Fowler, Efficient magic state fac-
tories with a catalyzed |CCZ〉 → 2 |T 〉 transformation,
Quantum 3, 135 (2019).
[15] P. W. Shor, Polynomial-time algorithms for prime factor-
ization and discrete logarithms on a quantum computer,
SIAM review 41, 303 (1999).
[16] L. K. Grover, A fast quantum mechanical algorithm for
database search, in Proceedings of the twenty-eighth an-
nual ACM symposium on Theory of computing (1996)
pp. 212–219.
[17] R. L. Rivest, A. Shamir, and L. Adleman, A method
for obtaining digital signatures and public-key cryptosys-
tems, Communications of the ACM 21, 120 (1978).
[18] N. Koblitz, Elliptic curve cryptosystems, Mathematics of
computation 48, 203 (1987).
[19] V. S. Miller, Use of elliptic curves in cryptography, in
Conference on the theory and application of cryptographic
techniques (Springer, 1985) pp. 417–426.
[20] D. Beckman, A. N. Chari, S. Devabhaktuni, and
J. Preskill, Efficient networks for quantum factoring,
Physical Review A 54, 1034 (1996).
[21] J. Davies, C. J. Rickerd, M. A. Grimes, and D. Ö. Güney,
An n-bit general implementation of Shor’s quantum fac-
toring algorithm, Quantum Information & Computation
16, 700 (2016).
[22] A. G. Fowler, S. J. Devitt, and L. C. Hollenberg, Imple-
mentation of Shor’s algorithm on a linear nearest neigh-
bor qubit array, Quantum Information & Computation
4, 237 (2004).
[23] C. Gidney and M. Ekerå, How to factor 2048 bit RSA
integers in 8 hours using 20 million noisy qubits, arXiv
preprint arXiv:1905.09749 (2019).
[24] T. Häner, S. Jaques, M. Naehrig, M. Roetteler, and
M. Soeken, Improved quantum circuits for elliptic curve
discrete logarithms, in International Conference on Post-
Quantum Cryptography (Springer, 2020) pp. 425–444.
[25] I. L. Markov and M. Saeedi, Constant-optimized quan-
tum circuits for modular multiplication and exponen-
tiation, Quantum Information & Computation 12, 361
(2012).
[26] A. Pavlidis and D. Gizopoulos, Fast quantum modular
exponentiation architecture for Shor’s factoring
algorithm, Quantum Information & Computation 14, 649
(2014).
[27] M. Roetteler, M. Naehrig, K. M. Svore, and K. Lauter,
Quantum resource estimates for computing elliptic curve
discrete logarithms, in International Conference on the
Theory and Application of Cryptology and Information
Security (Springer, 2017) pp. 241–270.
[28] R. Van Meter and K. M. Itoh, Fast quantum modular
exponentiation, Physical Review A 71, 052320 (2005).
[29] V. Vedral, A. Barenco, and A. Ekert, Quantum networks
for elementary arithmetic operations, Physical Review A
54, 147 (1996).
[30] C. Zalka, Fast versions of Shor’s quantum factoring algo-
rithm, arXiv preprint quant-ph/9806084 (1998).
[31] S. A. Cuccaro, T. G. Draper, S. A. Kutin, and D. P.
Moulton, A new quantum ripple-carry addition circuit,
arXiv preprint quant-ph/0410184 (2004).
[32] T. G. Draper, S. A. Kutin, E. M. Rains, and K. M. Svore,
A logarithmic-depth quantum carry-lookahead adder,
Quantum Information & Computation 6, 351 (2006).
[33] T. G. Draper, Addition on a quantum computer, arXiv
preprint quant-ph/0008033 (2000).
[34] C. Gidney, Halving the cost of quantum addition, Quan-
tum 2, 74 (2018).
[35] Y. Takahashi, S. Tani, and N. Kunihiro, Quantum addi-
tion circuits and unbounded fan-out, Quantum Informa-
tion & Computation 10, 872 (2010).
[36] A. M. Steane, Overhead and noise threshold of fault-
tolerant quantum error correction, Physical Review A 68,
042322 (2003).
[37] T. Satoh, Y. Ohkura, and R. Van Meter, Subdivided
phase oracle for NISQ search algorithms, arXiv preprint
arXiv:2001.06575 (2020).
[38] R. Van Meter, T. D. Ladd, A. G. Fowler, and Y. Ya-
mamoto, Distributed quantum computation architecture
using semiconductor nanophotonics, International Jour-
nal of Quantum Information 8, 295 (2010).
[39] H. Thapliyal, E. Muñoz-Coreas, and V. Khalus, T-count
and qubit optimized quantum circuit designs of
carry lookahead adder, arXiv preprint arXiv:2004.01826
(2020).
[40] D. Maslov, Advantages of using relative-phase Toffoli
gates with an application to multiple control Toffoli op-
timization, Physical Review A 93, 022311 (2016).
[41] B.-S. Choi and R. Van Meter, A θ(√n)-depth quantum
adder on the 2D NTC quantum computer architecture,
ACM Journal on Emerging Technologies in Computing
Systems (JETC) 8, 1 (2012).
[42] Y. Hirata, M. Nakanishi, S. Yamashita, and
Y. Nakashima, An efficient conversion of quantum
circuits to a linear nearest neighbor architecture,
Quantum Information & Computation 11, 142 (2011).
[43] N. Isailovic, M. Whitney, Y. Patel, and J. Kubiatowicz,
Running a quantum circuit at the speed of data, ACM
SIGARCH Computer Architecture News 36, 177 (2008).
[44] T. Æ. Mogensen, Reversible in-place carry-lookahead ad-
dition with few ancillae, in International Conference on
Reversible Computation (Springer, 2019) pp. 224–237.
[45] H. Thapliyal, H. Jayashree, A. Nagamani, and H. R.
Arabnia, Progress in reversible processor design: a novel
methodology for reversible carry look-ahead adder, in
Transactions on Computational Science XVII (Springer,
2013) pp. 73–97.
[46] D. Maslov, G. W. Dueck, D. M. Miller, and C. Ne-
grevergne, Quantum circuit simplification and level com-
paction, IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems 27, 436 (2008).
Appendix A: Detailed Explanation of Draper et al.’s
Carry-lookahead Adder
Draper et al.’s carry-lookahead adder is given as fol-
lows:
Initialization (n Toffoli gates and n CNOT gates)
We calculate g[i, i+1] and p[i, i+1] (0 ≤ i ≤ n − 1)
as follows:
\[ g[i, i+1] = \begin{cases} 1 & \text{if } a_i = b_i = 1 \\ 0 & \text{otherwise} \end{cases} \tag{A1} \]
\[ p[i, i+1] = \begin{cases} 1 & \text{if } a_i + b_i = 1 \\ 0 & \text{otherwise} \end{cases} \tag{A2} \]
The circuit calculating these is shown in Figure 20.
[Figure 20: circuit diagram. Inputs |ai〉, |bi〉, and |0〉 (the qubit |ci+1〉); outputs |ai〉, |p[i, i+1]〉, and |g[i, i+1]〉.]
FIG. 20: A calculation circuit of g[i, i+1] and
p[i, i+1] (0 ≤ i ≤ n − 1). We use |ci+1〉 as the third
qubit. We can run these gates simultaneously for i = 0
to n − 1.
P-rounds (n Toffoli gates and log n Toffoli depth)
We calculate the p-function by using eq. (5). We
use a parameter $t_p$ representing the range of the
propagation of carry. We increase $t_p$ from 1 to
$\lfloor \log n \rfloor - 1$. In each $t_p$, we calculate $p[2^{t_p} i, 2^{t_p}(i+1)]$
($1 \le i \le \lfloor n/2^{t_p} \rfloor - 1$) by setting $|p[2^{t_p} i, 2^{t_p}(i+1/2)]\rangle$
and $|p[2^{t_p}(i+1/2), 2^{t_p}(i+1)]\rangle$ as the control qubits
in the Toffoli gate in Figure 4a. These Toffoli gates are
applied simultaneously in each $t_p$.
G-rounds (n Toffoli gates and log n Toffoli depth)
We calculate $|c_{2^k}\rangle$ ($k \in \mathbb{N} \cup \{0\}$) by using eq. (6). We
use a parameter $t_g$ in the same way as $t_p$ in P-rounds.
We increase $t_g$ from 1 to $\lfloor \log n \rfloor$. In each $t_g$, we
calculate $g[2^{t_g} i, 2^{t_g}(i+1)]$ ($0 \le i \le \lfloor n/2^{t_g} \rfloor - 1$) by
setting $|c_{2^{t_g} i + 2^{t_g - 1}}\rangle$ and $|p[2^{t_g}(i+1/2), 2^{t_g}(i+1)]\rangle$ as the
control qubits and $|c_{2^{t_g}(i+1)}\rangle$ as the target qubit in the
Toffoli gate in Figure 4b. These Toffoli gates are applied
simultaneously in each $t_g$. Moreover, G-rounds with $t_g$
can be run in parallel with the former P-rounds with $t_g + 1$.
C-rounds (n Toffoli gates and log n Toffoli depth)
We calculate all carries |c〉 by using eq. (6). We use a
parameter $t_c$ in the same way as $t_p$ in P-rounds.
We decrease $t_c$ from $\lfloor \log(2n/3) \rfloor$ to 1. In each $t_c$, we
calculate $|c_{2^{t_c} i + 2^{t_c - 1}}\rangle$
($1 \le i \le \lfloor (n - 2^{t_c - 1})/2^{t_c} \rfloor - 1$) by
setting $|c_{2^{t_c} i}\rangle$ and $|p[2^{t_c} i, 2^{t_c} i + 2^{t_c - 1}]\rangle$ as the control
qubits and $|c_{2^{t_c} i + 2^{t_c - 1}}\rangle$ as the target qubit in the Toffoli gate
in Figure 4b. These Toffoli gates are applied simultaneously
in each $t_c$.
Inverse P-rounds (n Toffoli gates and log n Toffoli
depth)
We apply the same gates as P-rounds in reverse order.
Rounds with tp can be run in parallel with former C-
round with tp + 1.
Calculating |a+ b〉 (n CNOT gates)
We calculate (a+ b)i (0 ≤ i ≤ n− 2) on |bi〉. We ap-
ply CNOT gates with the control qubit of |ci+1〉 and the
target qubit of |bi+1〉. These CNOT gates are applied
simultaneously.
Erasing Carry (5n Toffoli gates, 2n CNOT gates,
and 2 log n Toffoli depth)
We erase all carries by applying the inverse circuit of
a + (2^n − 1 − a − b) on the lower n − 1 bits, as shown
in Figure 21. We apply gates before P-rounds and after
inverse Initialization to erase carries. We call these gates
PE-rounds and inverse PE-rounds respectively.
[Figure 21: circuit diagram. Blocks labeled P, Inverse C, Inverse G, Inverse P, and Inverse Initialization act on the qubits |ai〉, |(a+b)i〉, and |ci+1〉.]
FIG. 21: Erasing |c〉. We apply gates only on the lower
n − 1 qubits of |a〉, |b〉, and |c〉. We apply the same
gates on the omitted qubits |ai〉, |(a+b)i〉, and |ci+1〉. The
P-rounds and inverse C-rounds can be run in parallel,
as can the inverse G-rounds and inverse P-rounds. We
define PE-rounds as the gates before P-rounds, and inverse
PE-rounds as the gates after inverse Initialization.
Figure 22 shows an example circuit of Draper et al.’s
carry-lookahead adder. In this example,
a and b are 6-bit values, and we calculate
|a〉 |b〉 → |a〉 |a+b〉. In Figure 22, in contrast to
Figure 9, qubits are sorted from low order to high order.
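The carry logic behind this adder can be simulated classically. The sketch below is not the round-by-round circuit schedule; it only assumes that eqs. (5) and (6) are the usual carry-lookahead recurrences of Draper et al., p[i, k] = p[i, j] ∧ p[j, k] and g[i, k] = g[j, k] ⊕ (g[i, j] ∧ p[j, k]), and combines spans by recursive halving. Function names are illustrative.

```python
def combine(left, right):
    # left = (g[i, j], p[i, j]), right = (g[j, k], p[j, k]) -> (g[i, k], p[i, k]).
    g_left, p_left = left
    g_right, p_right = right
    return (g_right ^ (g_left & p_right), p_left & p_right)

def span(leaves, i, k):
    # Generate/propagate pair for bit positions i..k-1, by recursive halving.
    if k - i == 1:
        return leaves[i]
    j = (i + k) // 2
    return combine(span(leaves, i, j), span(leaves, j, k))

def carry_lookahead_add(a, b, n):
    # Initialization (Eqs. A1 and A2): single-bit generate and propagate values.
    leaves = []
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        leaves.append((a_i & b_i, a_i ^ b_i))
    # Carry into position k is g[0, k]; there is no carry into position 0.
    carries = [0] + [span(leaves, 0, k)[0] for k in range(1, n + 1)]
    # Sum bit i is p[i, i+1] XOR c_i; the top bit is the carry out c_n.
    total = carries[n] << n
    for i in range(n):
        total |= (leaves[i][1] ^ carries[i]) << i
    return total

print(carry_lookahead_add(37, 22, 6))
```

In the quantum circuit the spans are computed in the P-, G-, and C-rounds with logarithmic depth and then uncomputed; the classical recursion here only checks that the recurrences yield a + b.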
[Figure 22: circuit diagram on the qubits |ai〉 and |bi〉 (0 ≤ i ≤ 5), |c1〉 to |c6〉, |p[2, 4]〉, and |p[4, 6]〉, with rounds labeled Init, P, G, C, IP, IC, IG, and IInit. The |bi〉 qubits and |c6〉 end as |(a+b)i〉.]
FIG. 22: An example of Draper et al.’s carry-lookahead
adder. This circuit adds two 6-bit numbers
a and b, namely |a〉 |b〉 → |a〉 |a+b〉. In this figure, we
sort qubits from the lowest qubits to the highest qubits.
The labels at the top are the rounds including Toffoli
gates. Init means Initialization. IP, IC, IG, and IInit
mean Inverse P-rounds, Inverse C-rounds, Inverse G-rounds,
and Inverse Initialization respectively.
Appendix B: Detailed Construction of Our Control
Modular Adder
In this section, we explain the details of our control modular
adder. We also show example figures of our control
modular adder.
1. A C-Comparator
Now, we explain the construction of a C-comparator
in more detail. In a C-comparator, we judge whether or
not b ≥ d, where b is a quantum value and d is a classical
value. As noted in Section III. A., we conduct this by
calculating the carry out of b + (2^n − d).
Our construction is given as follows:
Initialization
If we conduct Initialization naively, we apply a Toffoli
gate and a CNOT gate for each bit. However,
compiling a quantum algorithm often requires the
selection of the sequence of gates to be
adapted to the specific classical values that are inputs
to the overall algorithm. Because 2^n − d is a classical
value, we can convert some Toffoli gates to CNOT
gates and eliminate other gates. We calculate each
(2^n − d)_i (0 ≤ i ≤ n − 1). If (2^n − d)_i = 1,
1. We apply a CNOT gate with the control qubit |bi〉
and the target qubit |ci+1〉.
2. We apply an X gate on |bi〉.
These operations correspond to the Toffoli gates and the CNOT
gates, respectively, in the Initialization phase in Draper et al.’s
construction.
P-rounds and G-rounds
We conduct P-rounds and G-rounds in the same way as in Draper
et al.’s construction.
Writing result on the COMP qubit (O(1) gates
and O(1) depth)
If we want to flip COMP when b ≥ d, we apply a Toffoli
gate with the control qubits CTRL and |g[0, n]〉, and
with the target qubit COMP. If we want to flip COMP
when b < d, we apply the same Toffoli gate,
but we apply NOT gates on |g[0, n]〉 before and after the
Toffoli gate.
Resetting qubits
We conduct inverse G-rounds and inverse P-rounds
in the same way as in Draper et al.’s construction. Moreover, we
conduct the inverse of our Initialization. Then, we reset
all qubits except COMP to their initial values.
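The comparison rule used by the C-comparator, that b ≥ d exactly when adding the classical constant 2^n − d to b produces a carry out, can be checked classically. In the sketch below the carry is rippled bit by bit for brevity instead of using P- and G-rounds, and the function name is illustrative; the per-bit initialization follows the classical-constant rule described above.

```python
def comparator_carry_out(b, d, n):
    """Carry out of b + (2^n - d) over n bits, for a classical d with 1 <= d < 2^n."""
    constant = (1 << n) - d
    carry = 0
    for i in range(n):
        b_i = (b >> i) & 1
        k_i = (constant >> i) & 1
        # With a classical bit k_i, Initialization needs gates only when k_i = 1:
        # generate = b_i (a CNOT into c_{i+1}) and propagate = NOT b_i (an X on b_i).
        generate = b_i & k_i
        propagate = b_i ^ k_i
        carry = generate | (propagate & carry)
    return carry

# Example from Appendix B.3: d = N - a = 22 with n = 6, so the constant is 42.
print((1 << 6) - 22, comparator_carry_out(21, 22, 6), comparator_carry_out(22, 22, 6))
```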
2. A CC-adder
First, we explain the construction of embedding in
more detail. We want to embed as follows:
• If CTRL is 1 and COMP is 1, we embed 2^n + a − N.
• If CTRL is 1 and COMP is 0, we embed a.
• Otherwise, we embed no value.
Therefore, we embed on the second register in Figure 10
as follows:
• If CTRL is 1 and (2^n + a − N)_i = a_i = 1, the i-th
qubit is |1〉.
• If CTRL is 1, COMP is 1, (2^n + a − N)_i = 1, and
a_i = 0, the i-th qubit is |1〉.
• If CTRL is 1, COMP is 0, (2^n + a − N)_i = 0, and
a_i = 1, the i-th qubit is |1〉.
• Otherwise, we do nothing.
In the above conditions, the values of (2^n + a − N)_i and
a_i are classical information, and CTRL and COMP are
quantum information. Thus, embedding in the first condition
can be realized by CNOT gates with the control
qubit CTRL. Moreover, embedding in the second and
third conditions can be realized by Toffoli gates with the
control qubits CTRL and COMP. However, the sets
of i satisfying each classical condition do not overlap. Therefore,
once we embed the value for one i in a set, we can embed the
remaining values in that set with CNOT gates. Each set has
n/4 elements on average, requiring n/4 CNOT gates and O(1) additional
gates. Thus, the embedding can be implemented by
3n/4 CNOT gates. Moreover, because we can run these
simultaneously, embedding requires log n CNOT depth.
The reset of embedding can be implemented similarly.
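The per-bit embedding rule can be checked against the intended register contents with a short classical sketch. The function name and argument order are illustrative, and the code assumes 0 ≤ a < N < 2^n so that 2^n + a − N fits in n bits.

```python
def embed(ctrl, comp, a, modulus, n):
    """Value written to the second register by the per-bit embedding rule."""
    shifted = (1 << n) + a - modulus   # classical constant 2^n + a - N
    value = 0
    for i in range(n):
        s_i = (shifted >> i) & 1
        a_i = (a >> i) & 1
        if s_i == 1 and a_i == 1:
            bit = ctrl                 # first condition: depends on CTRL only
        elif s_i == 1 and a_i == 0:
            bit = ctrl & comp          # second condition: CTRL = 1 and COMP = 1
        elif s_i == 0 and a_i == 1:
            bit = ctrl & (1 - comp)    # third condition: CTRL = 1 and COMP = 0
        else:
            bit = 0                    # nothing is embedded on this qubit
        value |= bit << i
    return value

# Example from Appendix B.3: N = 59, a = 37, n = 6.
print(embed(1, 1, 37, 59, 6), embed(1, 0, 37, 59, 6), embed(0, 1, 37, 59, 6))
```

The three classical cases partition the bit positions, which is what allows one Toffoli result per case to be copied to the other positions of that case with CNOT gates.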
Next, we explain the optimization in an adder. In our
calculation, there is no carry for g[0, n] whether we subtract
N − a or add a. Thus, we can disregard the calculation
of the carry qubit g[0, n]. To realize this, we omit the calculation
of p[i, n] and g[i, n] (i < n). Moreover, because a and N are
classical, we know that we embed no value in
n/4 qubits on average in the second register of Figure 10.
On these qubits, we can omit Initialization, inverse Initialization,
and the CNOT gates with the control qubit
|ai〉 and the target qubit |bi〉 in erasing carry. With
these optimizations, we save n/2 Toffoli gates
and 3n/4 CNOT gates.
The gate count and depth are shown in Table V.
TABLE V: Gate count and depth of our proposed control modular adder. We omit the rounds whose gate count is
O(1) and whose depth is O(1). Rounds marked "(with ...)" share the depth entry of the round named.
Operation             Rounds                  Toffoli count  CNOT count  Toffoli depth  CNOT depth
C-comparator (twice)  Initialization          0              0.5n        0              O(1)
                      P                       n              0           log n          0
                      G                       n              0           (with P)       (with P)
                      Inverse G               n              0           log n          0
                      Inverse P               n              0           (with Inverse G)  (with Inverse G)
                      Inverse Initialization  0              0.5n        0              O(1)
                      Total                   4n             n           2 log n        O(1)
CC-adder              Embedding               O(1)           0.75n       O(1)           log n
                      Initialization          0.75n          0.75n       O(1)           O(1)
                      P                       n              0           log n          0
                      G                       n              0           (with P)       (with P)
                      C                       n              0           log n          0
                      Inverse P               n              0           (with C)       (with C)
                      Calculating |a+b〉       0              n           0              O(1)
                      PE                      0              0.75n       0              O(1)
                      P                       n              0           log n          0
                      Inverse C               n              0           (with P)       (with P)
                      Inverse G               n              0           log n          0
                      Inverse P               n              0           (with Inverse G)  (with Inverse G)
                      Inverse Initialization  0.75n          0.75n       O(1)           O(1)
                      Resetting               O(1)           0.75n       O(1)           log n
                      Total                   9.5n           4.75n       4 log n        2 log n
Total                                         17.5n          6.75n       8 log n        2 log n
3. Example of Our Control Modular Adder
We show an example of a 6-bit control modular adder
when N = 59 and a = 37. Circuits are given in
Figures 23–25.
In these example figures, registers are shown with
low-order qubits at the top, in contrast to Figure 10. In this
subsection, the register |b〉 contains a quantum value.
The algorithm proceeds in this order:
1. Conduct a C-comparator with the control qubit
CTRL. Compare |b〉 and N − a = 22. If b ≥
22, flip COMP. This is implemented by adding
2^6 − (N − a) = 42 and using the carry out.
2. Conduct a CC-adder. If both CTRL and COMP
are 1, subtract N − a = 22. This is implemented
by adding 2^6 − (N − a) = 42 without calculating
carry c6. If CTRL is 1 and COMP is 0, add a = 37;
otherwise, add no value.
3. Conduct a C-comparator with the control qubit
CTRL. Compare |b〉 and a = 37. If b < 37, flip
COMP. This is implemented by calculating the carry
of adding 2^6 − a = 27.
These steps correspond to Figures 23, 24, and 25,
respectively.
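The three steps can be traced classically to confirm that they compute (b + a) mod N and return COMP to 0. The sketch below works on integers rather than qubits; the function name is illustrative, and it assumes 0 ≤ b < N and 0 ≤ a < N as in the example.

```python
def control_modular_add(ctrl, b, a, modulus, n):
    """Classical trace of the comparator / adder / comparator sequence."""
    comp = 0
    # Step 1: C-comparator, flip COMP if b >= N - a.
    if ctrl and b >= modulus - a:
        comp ^= 1
    # Step 2: CC-adder. Subtract N - a by adding 2^n - (N - a) and dropping the carry,
    # or add a when COMP is 0; do nothing when CTRL is 0.
    if ctrl and comp:
        b = (b + (1 << n) - (modulus - a)) % (1 << n)
    elif ctrl:
        b = b + a
    # Step 3: C-comparator, flip COMP if b < a, which returns COMP to 0.
    if ctrl and b < a:
        comp ^= 1
    return b, comp

# Example from the text: N = 59, a = 37, n = 6.
for b in (0, 21, 22, 58):
    print(b, control_modular_add(1, b, 37, 59, 6))
```

The final comparison works because the result is smaller than a exactly when the subtraction branch was taken, so COMP is cleared in both branches.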
Appendix C: Detailed Calculation of T -depth
In this section, we analyze the T -depth of our T -
optimal control modular adder. We assume that we run
GRT with the same timing, and each GRT has T -depth
2 from Figure 6. We focus on the parts that can be run
concurrently. Except for Initialization, we run
• P-rounds and G-rounds simultaneously,
• C-rounds and inverse P-rounds simultaneously,
• P-rounds and inverse C-rounds simultaneously, and
• inverse G-rounds and inverse P-rounds simultane-
ously.
In the first and third steps, we run many T gates simulta-
neously at the start and fewer T gates as the calculation
progresses. In the second and fourth steps, we run only
a few T gates simultaneously initially and more as the
calculation progresses. Thus, there is a difference in the
number of T gates we can run simultaneously.
As noted in Section IV. A., we define nT as the upper-
bound of the number of T gates running simultaneously,
and we calculate T -depth based on nT as in Figure 17.
In each round, there are parts where we can run more
than nT T gates. However, by setting nT , we run these
T gates separately. Compared to this, in the parts having
less than nT T gates, we can run these T gates simulta-
neously.
16
Init P G IG IP IInit
CTRL •
|d0〉
|b0〉
|c1〉 • •
|d1〉
|b1〉 • • • •
|c2〉 • •
|d2〉
|b2〉 • •
|p [2, 4]〉 • •
|c3〉 • •
|d3〉
|b3〉 • • • • • •
|c4〉 • •
|d4〉
|b4〉 • •
|p [4, 6]〉 • •
|c5〉 • •
|d5〉
|b5〉 • • • • • •
|c6〉 •
COMP
FIG. 23: An example circuit of the first C-comparator
for flipping the COMP qubit if b ≥ 22. To achieve this,
We add 26 − 22 = 42 = 1010102 and use the COMP
qubit as the carry out of the adder. The Init phase con-
sists of pairs of gates, a CNOT and an X, on the sec-
ond, fourth, and sixth groups of qubits including |di〉,
|bi〉, and |ci+1〉 from the lowest bit. This circuit is sym-
metric about the Toffoli gate surrounded by a dotted
box. Init, IP, IG, and IInit mean Initialization, Inverse
P-rounds, Inverse C-rounds, Inverse G-rounds, and In-
verse Initialization respectively.
First, we consider the parts having fewer than
nT T gates, which happens when we run P-rounds and
G-rounds simultaneously, C-rounds and inverse P-rounds
simultaneously, P-rounds and inverse C-rounds simulta-
neously, and inverse G-rounds and inverse P-rounds si-
multaneously. In these rounds, if we have no restriction
on running T gates, patterns are given as follows:
Embedding Resetting
CTRL • • • • • •
|d0〉 • • • • • • •
|b0〉 • •
|c1〉 • • •
|d1〉 • • • • • • •
|b1〉 • • • •
|c2〉 • • • • •
|d2〉 • • • • •
|b2〉 • • • • • • • •
|p [2, 4]〉 • •
|c3〉 • • •
|d3〉 • • • • •
|b3〉 • • • • • • • •
|c4〉 • • •
|d4〉
|b4〉 • •
|p [4, 6]〉
|c5〉 •
|d5〉 •
|b5〉
|c6〉
COMP • • • •
FIG. 24: An example of the CC-adder. If both CTRL
and COMP are 1, we subtract N − a = 22. This is
implemented by adding 26 − (N − a) = 42 = 1010102
without calculating carry c6. If CTRL is 1 and COMP
is 0, we add a = 37 = 1001012. Based on these, we
conduct embedding and resetting. The remaining part
is an adder, and we omit the calculation of p[i, 6] and
g[i, 6] (i < 6).
• In the first and the third cases, the number of T
gates we can run simultaneously decreases by one
half as the calculation progresses. Thus, in the
latter part of the calculation, we run fewer than
nT T gates simultaneously. This part has T -depth
2 log nT and nT T gates in total.
• In the second and the fourth cases, the number of T gates we can run simultaneously doubles as the calculation progresses. Thus, in the former part of the calculation, we run fewer than n_T T gates simultaneously. This part has T-depth 2 log n_T and n_T T gates in total.

[Circuit diagram: stages Init, P, G, IG, IP, and IInit acting on CTRL, |d_i〉, |b_i〉, |c_{i+1}〉 (i = 0, ..., 5), |p[2, 4]〉, |p[4, 6]〉, and COMP.]
FIG. 25: An example of the last C-comparator. We flip the COMP qubit if b < 37. This is achieved by adding 2^6 − 37 = 27 = 011011_2 and using the carry out. First, we apply pairs of gates, a CNOT and an X gate, on the first, second, fourth, and fifth groups of qubits. In contrast to Figure 23, we apply X gates before and after the center Toffoli gate. This circuit is symmetric about the Toffoli gate surrounded by a dotted box. Init, IP, IG, and IInit mean Initialization, Inverse P-rounds, Inverse G-rounds, and Inverse Initialization, respectively.
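The constants quoted in the captions of Figures 24 and 25 can be checked with ordinary integer arithmetic. The following is a minimal classical sketch, not a circuit: the helper names are hypothetical, N = 59 is inferred from N − a = 22 with a = 37, and numbering the qubit groups from the lowest bit is an assumption.

```python
WIDTH = 6       # register width used in Figures 24 and 25
A = 37          # the constant a from the captions
N = 59          # inferred, not stated: N - a = 22 with a = 37


def to_bits(value, width=WIDTH):
    """Zero-padded binary string, most significant bit first."""
    return format(value, f"0{width}b")


def add_drop_carry(x, const, width=WIDTH):
    """Add a constant and discard the carry out (c_6 in Figure 24)."""
    return (x + const) % (1 << width)


def carry_out(x, const, width=WIDTH):
    """Carry out of x + const on a width-bit register."""
    return (x + const) >> width


def set_bit_positions(const, width=WIDTH):
    """1-based positions of the set bits, counted from the lowest bit (assumed)."""
    return [i + 1 for i in range(width) if (const >> i) & 1]


SUB_CONST = (1 << WIDTH) - (N - A)   # 2^6 - (N - a), added to subtract N - a
CMP_CONST = (1 << WIDTH) - A         # 2^6 - a, added to compare b with a
```

Under these assumptions, `SUB_CONST` is 42 and `CMP_CONST` is 27, adding 42 and dropping the carry subtracts 22 from any value of at least 22, and the carry out of b + 27 is 1 exactly when b ≥ 37, so its complement flags b < 37. The set bits of 27 fall on positions 1, 2, 4, and 5, which agrees with the groups named in the caption of Figure 25.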
We have six such parts, each running fewer than n_T T gates at a time, as follows:
• P-rounds and G-rounds in the first C-comparator,
• P-rounds and G-rounds in the CC-adder,
• C-rounds and inverse P-rounds in the CC-adder,
• P-rounds and inverse C-rounds in the CC-adder,
• inverse G-rounds and inverse P-rounds in the CC-adder, and
• P-rounds and G-rounds in the final C-comparator.
Thus, these parts consume T-depth 12 log n_T and 6n_T T gates in total.
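One reading that is consistent with these totals is sketched below. It assumes that n_T is a power of two, that the logarithm is base 2, that the number of T gates run in parallel follows the sequence n_T/2, n_T/4, ..., 1 (or its reverse), and that each step of that sequence contributes T-depth 2. None of these assumptions is restated in this appendix.

```latex
\begin{align*}
\text{T gates per part:}\quad & \sum_{k=1}^{\log_2 n_T} \frac{n_T}{2^k} = n_T - 1 \approx n_T, \\
\text{T-depth per part:}\quad & 2 \cdot \log_2 n_T, \\
\text{six parts:}\quad & 6 \cdot 2\log_2 n_T = 12\log_2 n_T, \qquad 6 \cdot n_T = 6 n_T.
\end{align*}
```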
Next, we consider the remaining parts. In these parts, we run n_T T gates at a time. The total number of T gates is 43n from Table II, so the remaining parts contain 43n − 6n_T T gates. Thus, the T-depth of these parts is given by

$$\frac{2(43n - 6n_T)}{n_T} = \frac{86n}{n_T} - 12. \tag{C1}$$

In conclusion, the total T-depth is given by

$$\frac{86n}{n_T} + 12\log n_T - 12. \tag{C2}$$
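The two contributions can be evaluated numerically. The following is a minimal sketch of Eqs. (C1) and (C2); the function name is hypothetical and the base-2 logarithm is an assumption.

```python
import math

T_COUNT_PER_BIT = 43   # total T-count is 43n
SMALL_PARTS = 6        # parts that run fewer than n_T T gates at a time


def t_depth(n, n_t):
    """T-depth of Eq. (C2): 86n/n_T + 12 log n_T - 12 (log base 2 assumed)."""
    # Six small parts, each with T-depth 2 log n_T.
    small = 2 * SMALL_PARTS * math.log2(n_t)
    # Eq. (C1): the remaining 43n - 6 n_T T gates, run n_T at a time.
    remaining = 2 * (T_COUNT_PER_BIT * n - SMALL_PARTS * n_t) / n_t
    return small + remaining
```

For example, `t_depth(1024, 64)` evaluates to 1436.0, which is 86 · 1024 / 64 + 12 · 6 − 12.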
Appendix D: Detailed Gate Count on FTQ or NISQ
In this section, we detail the gate counts on FTQ and NISQ. The detailed T-count on FTQ is shown in Table VI. The detailed CNOT gate count on NISQ is shown in Table VII. The detailed CNOT-depth and KQ_CX on NISQ are shown in Table VIII.
TABLE VI: The breakdown of the Toffoli count and T-count of our control modular adder. Tof means the number of Toffoli gates in each round. Gate means the type of relative-phase Toffoli gate used in each round. Cost means the number of T gates in each relative-phase Toffoli gate. Count means the T-count in each round. We omit the rounds whose T-count is O(1). Inv means Inverse, C-comp means a C-comparator, CC-add means a CC-adder, and Init means Initialization.

Toffoli decomposition by scheme:

| Operation | Rounds | Tof | Draper et al. [32] gate | cost | count | Thapliyal et al. [39] (qubit-optimize) gate | cost | count | Thapliyal et al. [39] (T-optimize) gate | cost | count | Ours gate | cost | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C-comp (twice) | P | n | ST | 7 | 7n | GRT | 4 | 4n | GRT | 4 | 4n | GRT | 4 | 4n |
| | G | n | ST | 7 | 7n | ST | 7 | 7n | PGRT | 4 | 4n | GRT | 4 | 4n |
| | InvG | n | ST | 7 | 7n | ST | 7 | 7n | PGRT | 4 | 4n | IGRT | 0 | 0 |
| | InvP | n | ST | 7 | 7n | IGRT | 0 | 0 | IGRT | 0 | 0 | IGRT | 0 | 0 |
| | Total | 4n | – | – | 28n | – | – | 18n | – | – | 12n | – | – | 8n |
| CC-add | Init | 0.75n | ST | 7 | 5.25n | GRT | 4 | 3n | GRT | 4 | 3n | GRT | 4 | 3n |
| | P | n | ST | 7 | 7n | GRT | 4 | 4n | GRT | 4 | 4n | GRT | 4 | 4n |
| | G | n | ST | 7 | 7n | ST | 7 | 7n | PGRT | 4 | 4n | PGRT | 4 | 4n |
| | C | n | ST | 7 | 7n | ST | 7 | 7n | PGRT | 4 | 4n | PGRT | 4 | 4n |
| | InvP | n | ST | 7 | 7n | IGRT | 0 | 0 | IGRT | 0 | 0 | IGRT | 0 | 0 |
| | P | n | ST | 7 | 7n | GRT | 4 | 4n | GRT | 4 | 4n | GRT | 4 | 4n |
| | InvC | n | ST | 7 | 7n | ST | 7 | 7n | PGRT | 4 | 4n | PGRT | 4 | 4n |
| | InvG | n | ST | 7 | 7n | ST | 7 | 7n | PGRT | 4 | 4n | PGRT | 4 | 4n |
| | InvP | n | ST | 7 | 7n | IGRT | 0 | 0 | IGRT | 0 | 0 | IGRT | 0 | 0 |
| | InvInit | 0.75n | ST | 7 | 5.25n | IGRT | 0 | 0 | IGRT | 0 | 0 | IGRT | 0 | 0 |
| | Total | 9.5n | – | – | 66.5n | – | – | 39n | – | – | 27n | – | – | 27n |
| Total | | 17.5n | – | – | 122.5n | – | – | 75n | – | – | 51n | – | – | 43n |
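The totals in Table VI follow from the per-round entries by summing Tof × cost and counting the C-comparator twice. The following is a minimal sketch that recomputes them; the data layout and names are hypothetical, and all counts are in units of n.

```python
from fractions import Fraction as F

SCHEMES = ("Draper", "Thapliyal (qubit-optimize)", "Thapliyal (T-optimize)", "Ours")

# Each round: (Tof in units of n, T cost per gate for each scheme in SCHEMES).
C_COMP = [
    (F(1), (7, 4, 4, 4)),     # P
    (F(1), (7, 7, 4, 4)),     # G
    (F(1), (7, 7, 4, 0)),     # InvG
    (F(1), (7, 0, 0, 0)),     # InvP
]
CC_ADD = [
    (F(3, 4), (7, 4, 4, 4)),  # Init
    (F(1), (7, 4, 4, 4)),     # P
    (F(1), (7, 7, 4, 4)),     # G
    (F(1), (7, 7, 4, 4)),     # C
    (F(1), (7, 0, 0, 0)),     # InvP
    (F(1), (7, 4, 4, 4)),     # P
    (F(1), (7, 7, 4, 4)),     # InvC
    (F(1), (7, 7, 4, 4)),     # InvG
    (F(1), (7, 0, 0, 0)),     # InvP
    (F(3, 4), (7, 0, 0, 0)),  # InvInit
]


def t_count(rounds):
    """T-count per scheme (units of n): sum of Tof * cost over the rounds."""
    return [sum(tof * cost[i] for tof, cost in rounds) for i in range(len(SCHEMES))]


# The C-comparator is used twice, the CC-adder once.
TOTAL = [2 * c + a for c, a in zip(t_count(C_COMP), t_count(CC_ADD))]
```

With these entries, `TOTAL` reproduces the last row of the table: 122.5n, 75n, 51n, and 43n.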
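TABLE VII: The breakdown of the Toffoli count and CNOT count of our control modular adder. Gate means the type of relative-phase Toffoli gate used in each round. Cost means the number of CNOT gates in each relative-phase Toffoli gate. Count means the CNOT count in each round. We do not show the rounds whose CNOT count is O(1). Inv means Inverse, C-comp means a C-comparator, CC-add means a CC-adder, Init means Initialization, Embed means Embedding, Calc means Calculating of |a + b〉, and Reset means Resetting.

Toffoli decomposition by scheme:

| Operation | Rounds | Type | Number | Draper et al. [32] gate | cost | count | Thapliyal et al. [39] (qubit-optimize) gate | cost | count | Thapliyal et al. [39] (T-optimize) gate | cost | count | Ours gate | cost | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C-comp (twice) | Init | CNOT | 0.5n | – | – | 0.5n | – | – | 0.5n | – | – | 0.5n | – | – | 0.5n |
| | P | Toffoli | n | ST | 6 | 6n | GRT | 6 | 6n | GRT | 6 | 6n | RT3 | 3 | 3n |
| | G | Toffoli | n | ST | 6 | 6n | ST | 6 | 6n | PGRT | 8 | 8n | RT3 | 3 | 3n |
| | InvG | Toffoli | n | ST | 6 | 6n | ST | 6 | 6n | PGRT | 8 | 8n | IRT3 | 3 | 3n |
| | InvP | Toffoli | n | ST | 6 | 6n | IGRT | 1 | n | IGRT | 1 | n | IRT3 | 3 | 3n |
| | InvInit | CNOT | 0.5n | – | – | 0.5n | – | – | 0.5n | – | – | 0.5n | – | – | 0.5n |
| | Total | | | – | – | 25n | – | – | 20n | – | – | 24n | – | – | 13n |
| CC-add | Embed | CNOT | 0.75n | – | – | 0.75n | – | – | 0.75n | – | – | 0.75n | – | – | 0.75n |
| | Init | Toffoli | 0.75n | ST | 6 | 4.5n | GRT | 6 | 4.5n | GRT | 6 | 4.5n | RT4 | 4 | 3n |
| | Init | CNOT | 0.75n | – | – | 0.75n | – | – | 0.75n | – | – | 0.75n | – | – | 0.75n |
| | P | Toffoli | n | ST | 6 | 6n | GRT | 6 | 6n | GRT | 6 | 6n | RT3 | 3 | 3n |
| | G | Toffoli | n | ST | 6 | 6n | ST | 6 | 6n | PGRT | 8 | 8n | RT4 | 4 | 4n |
| | C | Toffoli | n | ST | 6 | 6n | ST | 6 | 6n | PGRT | 8 | 8n | RT4 | 4 | 4n |
| | InvP | Toffoli | n | ST | 6 | 6n | IGRT | 1 | n | IGRT | 1 | n | IRT3 | 3 | 3n |
| | Calc | CNOT | n | – | – | n | – | – | n | – | – | n | – | – | n |
| | PE | CNOT | 0.75n | – | – | 0.75n | – | – | 0.75n | – | – | 0.75n | – | – | 0.75n |
| | P | Toffoli | n | ST | 6 | 6n | GRT | 6 | 6n | GRT | 6 | 6n | RT3 | 3 | 3n |
| | InvC | Toffoli | n | ST | 6 | 6n | ST | 6 | 6n | PGRT | 8 | 8n | IRT4 | 4 | 4n |
| | InvG | Toffoli | n | ST | 6 | 6n | ST | 6 | 6n | PGRT | 8 | 8n | IRT4 | 4 | 4n |
| | InvP | Toffoli | n | ST | 6 | 6n | IGRT | 1 | n | IGRT | 1 | n | IRT3 | 3 | 3n |
| | InvInit | Toffoli | 0.75n | ST | 6 | 4.5n | IGRT | 1 | 0.75n | IGRT | 1 | 0.75n | RT4 | 4 | 3n |
| | InvInit | CNOT | 0.75n | – | – | 0.75n | – | – | 0.75n | – | – | 0.75n | – | – | 0.75n |
| | Reset | CNOT | 0.75n | – | – | 0.75n | – | – | 0.75n | – | – | 0.75n | – | – | 0.75n |
| | Total | | | – | – | 61.75n | – | – | 48n | – | – | 56n | – | – | 38.75n |
| Total | | | | – | – | 111.75n | – | – | 88n | – | – | 104n | – | – | 64.75n |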
TABLE VIII: The breakdown of the Toffoli count and CNOT-depth of our control modular adder. Gate means the type of relative-phase Toffoli gate used in each round. Cost means the number of CNOT gates in each relative-phase Toffoli gate. Depth means the CNOT-depth in each round. We omit the rounds whose CNOT-depth is O(1). Inv means Inverse, C-comp means a C-comparator, CC-add means a CC-adder, Init means Initialization, Embed means Embedding, and Reset means Resetting. Rounds listed together in one row run simultaneously and share a single depth entry; their gates and costs are listed in the same order.

Toffoli decomposition by scheme:

| Operation | Rounds | Type | Number | Draper et al. [32] gate | cost | depth | Thapliyal et al. [39] (qubit-optimize) gate | cost | depth | Thapliyal et al. [39] (T-optimize) gate | cost | depth | Ours gate | cost | depth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C-comp (twice) | P, G | Toffoli | log n | ST, ST | 6, 6 | 6 log n | GRT, ST | 6, 6 | 6 log n | GRT, PGRT | 6, 8 | 8 log n | RT3, RT3 | 3, 3 | 3 log n |
| | InvG, InvP | Toffoli | log n | ST, ST | 6, 6 | 6 log n | ST, IGRT | 6, 1 | 6 log n | PGRT, IGRT | 8, 1 | 8 log n | IRT3, IRT3 | 3, 3 | 3 log n |
| | Total | | | – | – | 12 log n | – | – | 12 log n | – | – | 16 log n | – | – | 6 log n |
| CC-add | Embed | CNOT | log n | – | – | log n | – | – | log n | – | – | log n | – | – | log n |
| | P, G | Toffoli | log n | ST, ST | 6, 6 | 6 log n | GRT, ST | 6, 6 | 6 log n | GRT, PGRT | 6, 8 | 8 log n | RT3, RT4 | 3, 4 | 4 log n |
| | C, InvP | Toffoli | log n | ST, ST | 6, 6 | 6 log n | ST, IGRT | 6, 1 | 6 log n | PGRT, IGRT | 8, 1 | 8 log n | RT4, IRT3 | 4, 3 | 4 log n |
| | P, InvC | Toffoli | log n | ST, ST | 6, 6 | 6 log n | GRT, ST | 6, 6 | 6 log n | GRT, PGRT | 6, 8 | 8 log n | RT3, IRT4 | 3, 4 | 4 log n |
| | InvG, InvP | Toffoli | log n | ST, ST | 6, 6 | 6 log n | ST, IGRT | 6, 1 | 6 log n | PGRT, IGRT | 8, 1 | 8 log n | IRT4, IRT3 | 4, 3 | 4 log n |
| | Reset | CNOT | log n | – | – | log n | – | – | log n | – | – | log n | – | – | log n |
| | Total | | | – | – | 26 log n | – | – | 26 log n | – | – | 34 log n | – | – | 18 log n |
| Total | | | | – | – | 50 log n | – | – | 50 log n | – | – | 66 log n | – | – | 30 log n |
| #qubits | | | | – | – | 4n | – | – | 4n | – | – | 4.5n | – | – | 4n |
| KQ_CX | | | | – | – | 200n log n | – | – | 200n log n | – | – | 297n log n | – | – | 120n log n |
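The last three rows of Table VIII can be recomputed from the subtotals, using the definition of KQ as the number of qubits times the depth. The following is a minimal sketch; the dictionary layout and names are hypothetical, depths are coefficients of log n, and qubit counts are coefficients of n.

```python
from fractions import Fraction as F

# scheme -> (C-comp depth, CC-add depth, qubits), taken from the Total rows.
SUBTOTALS = {
    "Draper": (12, 26, F(4)),
    "Thapliyal (qubit-optimize)": (12, 26, F(4)),
    "Thapliyal (T-optimize)": (16, 34, F(9, 2)),
    "Ours": (6, 18, F(4)),
}


def total_depth(comp_depth, add_depth):
    """CNOT-depth coefficient of log n; the C-comparator is used twice."""
    return 2 * comp_depth + add_depth


def kq_cx(comp_depth, add_depth, qubits):
    """KQ_CX coefficient of n log n: number of qubits times CNOT-depth."""
    return qubits * total_depth(comp_depth, add_depth)


DEPTH = {name: total_depth(c, a) for name, (c, a, _) in SUBTOTALS.items()}
KQ = {name: kq_cx(c, a, q) for name, (c, a, q) in SUBTOTALS.items()}
```

Here `DEPTH` gives 50, 50, 66, and 30, and `KQ` gives 200, 200, 297, and 120, matching the table.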
