Quantum Addition Circuits and Unbounded Fan-Out by Takahashi, Yasuhiro et al.
arXiv:0910.2530v1 [quant-ph] 14 Oct 2009
Quantum Addition Circuits and Unbounded Fan-Out
Yasuhiro Takahashi∗ Seiichiro Tani∗† Noboru Kunihiro‡
Abstract
We first show how to construct an O(n)-depth O(n)-size quantum circuit for addition of two
n-bit binary numbers with no ancillary qubits. The exact size is 7n− 6, which is smaller than
that of any other quantum circuit ever constructed for addition with no ancillary qubits. Using
the circuit, we then propose a method for constructing an O(d(n))-depth O(n)-size quantum
circuit for addition with O(n/d(n)) ancillary qubits for any d(n) = Ω(log n). If we are allowed
to use unbounded fan-out gates with length O(n^ε) for an arbitrarily small positive constant ε,
we can modify the method and construct an O(e(n))-depth O(n)-size circuit with o(n) ancillary
qubits for any e(n) = Ω(log∗ n). In particular, these methods yield efficient circuits with depth
O(log n) and with depth O(log∗ n), respectively. We apply our circuits to constructing efficient
quantum circuits for Shor’s discrete logarithm algorithm.
1 Introduction
Since Shor’s discovery of quantum algorithms for factoring and discrete logarithm problems [1],
many studies have investigated ways of constructing quantum circuits for the algorithms [2, 3,
4, 5, 6, 7]. The resulting circuits are important not only for implementing the algorithms on a
quantum computer but also for understanding the computational power of small quantum circuits.
These studies have shown that addition of two binary numbers is a key operation for constructing
quantum circuits for Shor’s algorithms.
We consider the problem of constructing quantum circuits for addition of two binary numbers
with better complexity. The complexity measures of a quantum circuit are its size and depth, and
the number of qubits in it. Roughly speaking, the size and depth correspond to computation time,
while the number of qubits corresponds to the size of memory. We regard the number of qubits as
a primary consideration since it seems difficult to realize a quantum computer with many qubits.
It is not obvious whether the number of qubits in a quantum circuit for addition can be decreased
by using efficient classical ones, though the size or depth can be decreased simply by using them.
An unbounded fan-out gate on n + 1 qubits copies a classical source bit into n copies. In
particular, the gate on two qubits is a CNOT gate. If unbounded fan-out gates are available,
sublogarithmic-depth quantum circuits for various operations can be constructed [8, 9]. This is
because the gate performs the copy operation on an unbounded number of qubits in a constant
time. However, it seems difficult to realize such a gate practically. Thus, it is important to minimize
the number of target qubits of the gate in a circuit without increasing the complexity of the circuit.
When we use unbounded fan-out gates, we consider the complexity measures (size, depth, and the
number of qubits) for the number of target qubits of the gate. We call the number of target qubits
the length of an unbounded fan-out gate.
∗NTT Communication Science Laboratories, NTT Corporation
†Quantum Computation and Information Project, ERATO-SORST, JST
‡Graduate School of Frontier Sciences, The University of Tokyo

There have been many studies of efficient quantum circuits for addition of two n-bit binary
numbers. These circuits can be classified according to depth complexity. Draper’s and Takahashi
et al.’s circuits have depth O(n) and use no ancillary qubits [10, 11]. Takahashi et al.’s is more
efficient than Draper’s since the sizes of Takahashi et al.’s and Draper’s are O(n) and O(n^2),
respectively. Draper et al.’s and Takahashi et al.’s circuits have depth O(log n) [12, 13]. Draper
et al.’s uses O(n) ancillary qubits and its size is O(n). Takahashi et al. decreased the number
of ancillary qubits to O(n/ log n) without increasing the size asymptotically. Høyer et al. showed
that, if unbounded fan-out gates with length O(n) are available, an O(log∗ n)-depth circuit can be
constructed [9]. They have not analyzed the number of ancillary qubits or size.
In this paper, we first show how to construct an O(n)-depth O(n)-size quantum circuit for
addition with no ancillary qubits. The circuit is based on the ripple-carry approach. The exact
size is 7n−6, which is smaller than that of any other quantum circuit ever constructed for addition
with no ancillary qubits. Moreover, the circuit is more implementable than the previous circuits
with no ancillary qubits in the sense that the circuit can be used directly on a linear nearest
neighbor architecture [6], i.e., on a unidimensional array of qubits with nearest neighbor interactions
only. By combining the circuit with the carry-lookahead approach, we then propose a method for
constructing an O(d(n))-depth O(n)-size quantum circuit for addition with O(n/d(n)) ancillary
qubits for any d(n) = Ω(log n). The method is a generalized and simplified version of Takahashi
et al.’s method for constructing a logarithmic-depth circuit with a small number of qubits [13]. In
particular, for d(n) = log n, our method yields an O(log n)-depth O(n)-size circuit with O(n/ log n)
ancillary qubits. The number of ancillary qubits is exactly the same as that in Takahashi et al.’s
circuit and the size is less than half that of Takahashi et al.’s.
If we are allowed to use unbounded fan-out gates with length O(n^ε) for an arbitrarily small
positive constant ε, we can modify our method and construct an O(e(n))-depth O(n)-size circuit
with O(n log∗∗ n/e(n)) ancillary qubits for any e(n) = Ω(log∗ n), where log∗∗ n is a slowly-growing
function satisfying log∗∗ n = o(log∗ n). The main point of this modification is to decrease the
depth of the carry-lookahead part of our method by using a quantum version of Chandra et al.’s
constant-depth classical circuit for addition with unbounded fan-in and fan-out gates [14]. To
construct the quantum version, we require a quantum gate corresponding to an unbounded fan-in
gate. We use Høyer et al.’s small-depth quantum circuit for a generalized Toffoli operation with
unbounded fan-out gates [9] as the gate. In particular, for e(n) = log∗ n, the modified method
yields an O(log∗ n)-depth O(n)-size circuit with o(n) ancillary qubits. Though Høyer et al. have
constructed an O(log∗ n)-depth circuit for addition as mentioned above, our construction shows
that the number of ancillary qubits, size, and the length of an unbounded fan-out gate can be small
simultaneously.
This construction also shows that unbounded fan-out gates with a small length are sufficient to
construct a sublogarithmic-depth circuit. For example, if we are allowed to use unbounded fan-out
gates with length O(log n), we can construct an O(log n/ log log n)-depth O(n)-size circuit with o(n)
ancillary qubits. Such a sublogarithmic-depth circuit cannot be constructed by using a quantum
circuit only with gates on a bounded number of qubits [15] or by using a classical circuit only with
bounded fan-in and unbounded fan-out gates [16].
Using our circuits for addition, we construct efficient quantum circuits for Shor’s discrete
logarithm algorithm for elliptic curves over the prime field GF(p). This is done by simply using our
addition circuits in Proos et al.’s circuit for Shor’s discrete logarithm algorithm [5]. Since Proos et
al.’s circuit uses n ancillary qubits during addition, the use of our circuit with no ancillary qubits
decreases the n ancillary qubits without increasing the original depth or size asymptotically, where
n is the length of the binary representation for p. Moreover, we decrease the depth asymptotically
by adding o(n) ancillary qubits. Proos et al.’s circuit with our addition circuits is more efficient
than with the previous ones described above.
In contrast to the previous methods for constructing efficient quantum circuits for addition
[10, 11, 12, 13, 9], our method is general in the sense that it can yield various types of efficient
quantum circuits for addition. The generality allows us to construct quantum circuits appropriate
for the various situations we will have to consider in practice. For example, if we want to reduce the
number of qubits, we can obtain a qubit-efficient circuit by setting d(n) = n in our method. We
can decrease the depth by setting d(n) = log n. Moreover, we can choose an “intermediate” circuit
by setting d(n) = √n.
2 Circuit with Depth O(n)
2.1 Ripple-Carry Approach
We use the standard notation for quantum states and the standard diagrams for quantum circuits
[17]. As mentioned earlier, the measures of the complexity of a quantum circuit are the number
of qubits and its size and depth. The meaning of the number of qubits is obvious. The size of a
circuit is defined as the total number of elementary gates in it. The elementary gates are one-qubit
unitary gates, CNOT gates, controlled-R_t gates, and Toffoli gates, where $R_t|x\rangle = e^{2\pi i x/2^t}|x\rangle$ for
t ≥ 1 and x ∈ {0, 1}. In Section 4, we use the gate for an unbounded fan-out operation F_t as an
elementary gate, where F_t (on t + 1 qubits) is defined as
\[ F_t\left( |y\rangle \bigotimes_{i=0}^{t-1} |x_i\rangle \right) = |y\rangle \bigotimes_{i=0}^{t-1} |x_i \oplus y\rangle \]
for y, xi ∈ {0, 1}. The symbol ⊕ denotes addition modulo 2. The depth of a circuit is defined as
follows. Input qubits are considered to have depth 0. For each gate G, the depth of G is equal
to 1 plus the maximal depth of a gate on which G depends. The depth of a circuit is equal to
the maximal depth of a gate in it. Intuitively, the depth is the number of layers in the circuit,
where a layer consists of gates that can be performed simultaneously. A quantum circuit can use
ancillary qubits, which start and end in the state |0〉. We usually count the number of ancillary
qubits instead of the number of all qubits used in the circuit.
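As an illustration of the definition of F_t above, the following minimal sketch models its action on a classical basis state only (superpositions are not modeled). The function name `fan_out` is ours and does not appear in the paper.

```python
def fan_out(y, xs):
    """Action of F_t on a basis state |y>|x_0>...|x_{t-1}>.

    The source bit y is unchanged and every target becomes x_i XOR y.
    """
    return y, [x ^ y for x in xs]


# With all targets initialised to 0, the source bit is copied t times.
print(fan_out(1, [0, 0, 0]))
# For t = 1 the gate acts as a CNOT gate.
print(fan_out(1, [1]))
```

Applying `fan_out` twice returns the original bits, which reflects that F_t is its own inverse.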
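We consider the problem of constructing quantum circuits for the operation ADD_n defined as
\[ \left( \bigotimes_{i=0}^{n-1} |b_i\rangle|a_i\rangle \right) |z\rangle \rightarrow \left( \bigotimes_{i=0}^{n-1} |s_i\rangle|a_i\rangle \right) |z \oplus s_n\rangle, \]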
where an−1 · · · a0 and bn−1 · · · b0 are the input binary numbers, z ∈ {0, 1}, and sn · · · s0 is the sum
of the input binary numbers. Our linear-depth circuit and most of the previous ones with a small
number of qubits are based on the ripple-carry approach. To explain the approach, we define the
carry bit ci (0 ≤ i ≤ n) as follows:
\[ c_i = \begin{cases} 0 & i = 0, \\ \mathrm{MAJ}(a_{i-1}, b_{i-1}, c_{i-1}) & 1 \le i \le n, \end{cases} \]
where MAJ is the majority function for three bits defined as MAJ(a, b, c) = ab ⊕ bc ⊕ ca. In the
ripple-carry approach, the first step is to compute the carry bit c1 by using a0 and b0 and c0.
Then, c2 is computed by using a1 and b1 and c1. This procedure is repeated until all carry bits are
computed. After that, si (0 ≤ i ≤ n) is computed by the relationship
\[ s_i = \begin{cases} a_i \oplus b_i \oplus c_i & 0 \le i \le n-1, \\ c_n & i = n. \end{cases} \]
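The carry and sum relations above can be sketched classically as follows. This is only an illustration of the recurrences, not a quantum circuit; the helper names `ripple_carry`, `to_bits`, and `from_bits` are ours, and bit lists are assumed to be ordered from the least significant bit.

```python
def maj(a, b, c):
    """Majority of three bits: ab XOR bc XOR ca."""
    return (a & b) ^ (b & c) ^ (c & a)


def ripple_carry(a_bits, b_bits):
    """Return (carries c_0..c_n, sum bits s_0..s_n) for two n-bit inputs."""
    n = len(a_bits)
    c = [0]                                   # c_0 = 0
    for i in range(1, n + 1):                 # c_i = MAJ(a_{i-1}, b_{i-1}, c_{i-1})
        c.append(maj(a_bits[i - 1], b_bits[i - 1], c[i - 1]))
    s = [a_bits[i] ^ b_bits[i] ^ c[i] for i in range(n)]
    s.append(c[n])                            # s_n = c_n
    return c, s


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


carries, total = ripple_carry(to_bits(6, 3), to_bits(3, 3))
print(carries, total, from_bits(total))
```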
When the ripple-carry approach is used, the key issue for constructing a quantum circuit with
a small number of qubits is how to store carry bits. Cuccaro et al.’s circuits, which are based on
the approach, use one ancillary qubit to store $c_0 = 0$ [18]. The carry bit $c_i$ is stored in the qubit
initially storing $a_{i-1}$ for 1 ≤ i ≤ n. To do this, they defined the gate for MAJ depicted in Fig. 1,
which is the main component of their circuits. The gate maps $|c_i\rangle|b_i\rangle|a_i\rangle$ to $|c_i \oplus a_i\rangle|b_i \oplus a_i\rangle|c_{i+1}\rangle$.
Takahashi et al.’s circuit, which is also based on the ripple-carry approach, uses no ancillary qubits
[11]. All the carry bits are stored in the qubit initially storing z. The main component of their
circuit is also the MAJ gate. They use the property that the gate maps $|z \oplus b_i\rangle|z \oplus a_i\rangle|z \oplus c_i\rangle$ to
$|b_i \oplus c_i\rangle|a_i \oplus c_i\rangle|z \oplus c_{i+1}\rangle$.
Figure 1: The MAJ gate.
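The two mappings of the MAJ gate stated above can be checked on classical basis states. The sketch below assumes the usual wiring of the gate (two CNOT gates controlled by the third qubit, followed by a Toffoli gate targeting it); since Fig. 1 is not reproduced here, this wiring is an assumption, and the function name `maj_gate` is ours.

```python
from itertools import product


def maj(a, b, c):
    return (a & b) ^ (b & c) ^ (c & a)


def maj_gate(x, y, w):
    """Apply the assumed MAJ gate to the basis state |x>|y>|w>."""
    y ^= w          # CNOT: control w, target y
    x ^= w          # CNOT: control w, target x
    w ^= x & y      # Toffoli: controls x and y, target w
    return x, y, w


# Cuccaro et al.'s use: |c>|b>|a> -> |c^a>|b^a>|MAJ(a, b, c)>
for c, b, a in product((0, 1), repeat=3):
    assert maj_gate(c, b, a) == (c ^ a, b ^ a, maj(a, b, c))

# Takahashi et al.'s use: |z^b>|z^a>|z^c> -> |b^c>|a^c>|z^MAJ(a, b, c)>
for z, a, b, c in product((0, 1), repeat=4):
    assert maj_gate(z ^ b, z ^ a, z ^ c) == (b ^ c, a ^ c, z ^ maj(a, b, c))

print("both mappings hold on all basis states")
```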
2.2 Our Circuit
We store the carry bit ci in the qubit initially storing ai for 0 ≤ i ≤ n − 1 and store the high-
order bit cn in the qubit initially storing z. This would be difficult to do if we use the MAJ gate
directly. Our idea is to divide the MAJ gate into two parts. The first part consists of two CNOT
gates and the second one consists of one Toffoli gate. It is easy to verify that a Toffoli gate maps
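$|b_i \oplus a_i\rangle|a_i \oplus c_i\rangle|a_{i+1} \oplus a_i\rangle$ to $|b_i \oplus a_i\rangle|a_i \oplus c_i\rangle|a_{i+1} \oplus c_{i+1}\rangle$ for 1 ≤ i ≤ n − 1, where we regard $a_n$ as
z. Thus, using CNOT gates (the first parts of the MAJ gate) and a Toffoli gate, we first prepare
the state
\[ |b_1 \oplus a_1\rangle|a_1 \oplus c_1\rangle \left( \bigotimes_{i=2}^{n-1} |b_i \oplus a_i\rangle|a_i \oplus a_{i-1}\rangle \right) |z \oplus a_{n-1}\rangle. \]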
By applying Toffoli gates (the second parts of the MAJ gate), we can compute ci and store it in
the qubit initially storing ai. The final Toffoli gate computes cn and stores it in the qubit initially
storing z. The detailed construction is described below.
Let Ai and Bi denote the memory locations initially storing ai and bi, respectively, for 0 ≤ i ≤
n − 1. Let An be the memory location initially storing z. Location Ai (0 ≤ i ≤ n − 1) will store
ai, Bi (0 ≤ i ≤ n − 1) will store si, and An will store z ⊕ sn at the end of the computation. Our
circuit is constructed in the following six steps.
1. For i = 1, . . . , n− 1:
Apply a CNOT gate to a pair of memory locations Bi and Ai where Ai is used for the control
qubit.
2. For i = n− 1, . . . , 1:
Apply a CNOT gate to a pair of memory locations Ai and Ai+1 where Ai is used for the
control qubit.
3. For i = 0, . . . , n− 1:
Apply a Toffoli gate to a tuple of memory locations Bi, Ai and Ai+1, where Bi and Ai are
used for the control qubits.
4. For i = n − 1, . . . , 1:
Apply a CNOT gate to a pair of memory locations Bi and Ai where Ai is used for the control
qubit. Then, apply a Toffoli gate to a tuple of memory locations Bi−1, Ai−1 and Ai, where
Bi−1 and Ai−1 are used for the control qubits.
5. For i = 1, . . . , n − 2:
Apply a CNOT gate to a pair of memory locations Ai and Ai+1 where Ai is used for the
control qubit.
6. For i = 0, . . . , n − 1:
Apply a CNOT gate to a pair of memory locations Bi and Ai where Ai is used for the control
qubit.
The circuit for ADD5 is depicted in Fig. 2.
Figure 2: The circuit for ADD5.
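The six steps can be simulated on classical basis states as a sanity check. Because CNOT and Toffoli gates permute basis states, agreement on every basis state is enough to confirm the ADD_n mapping. This is a minimal sketch; the qubit labels `('A', i)` and `('B', i)` and the function names are ours.

```python
from itertools import product


def add_circuit(n):
    """Gate list of the six-step circuit as (controls, target) pairs.

    ('A', i) and ('B', i) label the locations A_i and B_i; ('A', n) holds z.
    """
    A = lambda i: ('A', i)
    B = lambda i: ('B', i)
    gates = []
    for i in range(1, n):                       # Step 1
        gates.append(((A(i),), B(i)))
    for i in range(n - 1, 0, -1):               # Step 2
        gates.append(((A(i),), A(i + 1)))
    for i in range(n):                          # Step 3
        gates.append(((B(i), A(i)), A(i + 1)))
    for i in range(n - 1, 0, -1):               # Step 4
        gates.append(((A(i),), B(i)))
        gates.append(((B(i - 1), A(i - 1)), A(i)))
    for i in range(1, n - 1):                   # Step 5
        gates.append(((A(i),), A(i + 1)))
    for i in range(n):                          # Step 6
        gates.append(((A(i),), B(i)))
    return gates


def run(n, a, b, z):
    """Run the circuit on the basis state encoding a, b and z."""
    state = {('A', i): (a >> i) & 1 for i in range(n)}
    state.update({('B', i): (b >> i) & 1 for i in range(n)})
    state[('A', n)] = z
    for controls, target in add_circuit(n):
        if all(state[q] for q in controls):
            state[target] ^= 1
    return state


def check(n):
    """Verify the ADD_n mapping on every basis state."""
    for a, b, z in product(range(2 ** n), range(2 ** n), (0, 1)):
        st = run(n, a, b, z)
        s = a + b
        assert all(st[('A', i)] == (a >> i) & 1 for i in range(n))
        assert all(st[('B', i)] == (s >> i) & 1 for i in range(n))
        assert st[('A', n)] == z ^ (s >> n)
    return True


print(check(5), len(add_circuit(5)))   # True 29
```

For n = 5 the gate list has 29 entries, which agrees with the stated size 7n − 6.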
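We describe the changes of the input state of ADD_n to show that the circuit works correctly.
In Step 1, the input state is transformed into
\[ |b_0\rangle|a_0\rangle \left( \bigotimes_{i=1}^{n-1} |b_i \oplus a_i\rangle|a_i\rangle \right) |z\rangle. \]
In Step 2, the state is transformed into
\[ |b_0\rangle|a_0\rangle|b_1 \oplus a_1\rangle|a_1\rangle \left( \bigotimes_{i=2}^{n-1} |b_i \oplus a_i\rangle|a_i \oplus a_{i-1}\rangle \right) |z \oplus a_{n-1}\rangle. \]
The first Toffoli gate in Step 3 transforms the state into
\[ |b_0\rangle|a_0\rangle|b_1 \oplus a_1\rangle|a_1 \oplus c_1\rangle \left( \bigotimes_{i=2}^{n-1} |b_i \oplus a_i\rangle|a_i \oplus a_{i-1}\rangle \right) |z \oplus a_{n-1}\rangle. \]
This is repeated by using a Toffoli gate. The state after Step 3 is
\[ |b_0\rangle|a_0\rangle \left( \bigotimes_{i=1}^{n-1} |b_i \oplus a_i\rangle|a_i \oplus c_i\rangle \right) |z \oplus s_n\rangle. \]
In Step 4, the state is transformed into
\[ |b_0\rangle|a_0\rangle|b_1 \oplus c_1\rangle|a_1\rangle \left( \bigotimes_{i=2}^{n-1} |b_i \oplus c_i\rangle|a_i \oplus a_{i-1}\rangle \right) |z \oplus s_n\rangle. \]
In Step 5, the state is transformed into
\[ |b_0\rangle|a_0\rangle \left( \bigotimes_{i=1}^{n-1} |b_i \oplus c_i\rangle|a_i\rangle \right) |z \oplus s_n\rangle. \]
Since $s_i = a_i \oplus b_i \oplus c_i$ for 0 ≤ i ≤ n − 1, the final step gives us the desired output state.

Table 1: Comparison of Our Circuit and Previous Circuits
Circuit                 Ancilla  Size                Toffoli  Depth   LNN
Cuccaro et al. [18]     1        6n + 1              2n       6n + 1  √
Cuccaro et al. [18]     1        9n − 8              2n − 1   2n + 4  √
Draper [10]             0        1.5n^2 + 4.5n + 2   0        5n + 3  —
Takahashi et al. [11]   0        10n − 9             4n − 5   8n − 7  —
Our Circuit             0        7n − 6              2n − 1   5n − 3  √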
2.3 Complexity Analysis
From the construction, it is obvious that our circuit uses no ancillary qubits. We compute the
depth and size of the circuit for n ≥ 3 precisely. In Step 1, the number of CNOT gates is n− 1 and
these gates can be performed simultaneously. Thus, the depth and size of Step 1 are 1 and n − 1,
respectively. In Step 2, the number of CNOT gates is n− 1 and thus the depth and size of Step 2
are n− 1. In Step 3, the number of Toffoli gates is n and thus the depth and size of Step 3 are n.
In Step 4, the number of CNOT gates is n− 1 and the number of Toffoli gates is n− 1. Thus, the
depth and size of Step 4 are 2n − 2. In Step 5, the number of CNOT gates is n − 2 and thus the
depth and size of Step 5 are n− 2. In Step 6, the number of CNOT gates is n and these gates can
be performed simultaneously. Thus, the depth and size of Step 6 are 1 and n, respectively. Thus,
the depth and size of the whole circuit are 5n− 3 and 7n− 6, respectively. The numbers of CNOT
and Toffoli gates are 5n− 5 and 2n− 1, respectively.
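The counts above can be reproduced by listing the qubits each gate touches and applying the depth definition from Section 2.1, where a gate depends on the latest earlier gate sharing one of its qubits. This is a rough sketch under that reading of "depends"; the names `gate_qubits` and `size_toffoli_depth` are ours.

```python
def gate_qubits(n):
    """Qubits touched by each gate of the six-step circuit, in order."""
    A = lambda i: ('A', i)
    B = lambda i: ('B', i)
    g = []
    g += [(A(i), B(i)) for i in range(1, n)]               # Step 1: CNOTs
    g += [(A(i), A(i + 1)) for i in range(n - 1, 0, -1)]   # Step 2: CNOTs
    g += [(B(i), A(i), A(i + 1)) for i in range(n)]        # Step 3: Toffolis
    for i in range(n - 1, 0, -1):                          # Step 4: CNOT, Toffoli
        g += [(A(i), B(i)), (B(i - 1), A(i - 1), A(i))]
    g += [(A(i), A(i + 1)) for i in range(1, n - 1)]       # Step 5: CNOTs
    g += [(A(i), B(i)) for i in range(n)]                  # Step 6: CNOTs
    return g


def size_toffoli_depth(n):
    """Return (size, number of Toffoli gates, depth) of the circuit."""
    gates = gate_qubits(n)
    line_depth = {}                       # depth of the last gate on each qubit
    for qubits in gates:
        d = 1 + max(line_depth.get(q, 0) for q in qubits)
        for q in qubits:
            line_depth[q] = d
    toffoli = sum(1 for qubits in gates if len(qubits) == 3)
    return len(gates), toffoli, max(line_depth.values())


for n in (3, 5, 8):
    print(n, size_toffoli_depth(n), (7 * n - 6, 2 * n - 1, 5 * n - 3))
```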
As discussed in [6], many proposed quantum computer architectures deal with a unidimensional
array of qubits with nearest neighbor interactions only. Thus, it is important for a circuit to work
on such a linear nearest neighbor (LNN) architecture. When the input and output binary numbers
are arranged on an LNN architecture in an interleaved manner (as in Fig. 2), our circuit can be
used directly on an LNN architecture in the sense that the circuit can be transformed into one on
an LNN architecture without increasing the size or depth asymptotically.
A comparison of our circuit and the previous ones with a small number of qubits is summarized
in Table 1. The symbol “√” in the LNN column means that the circuit can be used directly on
an LNN architecture in the sense described above. The symbol “—” means that we do not know
whether this is the case for the circuit. The size of our circuit is less than that of any other quantum
circuit ever constructed for ADDn with no ancillary qubits. When we regard the number of qubits
as a primary consideration, our circuit is more efficient than the previous circuits in Table 1.
Though there exists a size-efficient or depth-efficient circuit with one ancillary qubit [18], it
is worth noting that the difference between the total number of ancillary qubits used by parallel
applications of our circuit (as in the next section) and that of the previous circuit with one ancillary
qubit depends on the number of circuits applied in parallel and may become large. Moreover, since
Toffoli gates are on three qubits and thus may be harder to implement than the other gates (on
a smaller number of qubits), it is worth noting that the number of Toffoli gates in our circuit is
2n− 1, which is less than or equal to those of the previous circuits in Table 1 (excluding Draper’s
O(n^2)-size circuit).
3 General Method
3.1 Combination Method
The ripple-carry approach decreases the number of ancillary qubits but requires large depth. The
carry-lookahead approach decreases the depth but requires many qubits [12]. Our method is based
on the combination of these methods and is a generalized and simplified version of Takahashi et
al.’s method for constructing a logarithmic-depth circuit with a small number of qubits [13]. In
this section, we review the previous method. The carry-lookahead approach is described by using
two bits p[i, j] (1 ≤ i < j ≤ n) and g[i, j] (0 ≤ i < j ≤ n) [12]. The bit p[i, j] is 1 if a carry bit is
propagated from bit position i to bit position j, and g[i, j] is 1 if a carry bit is generated between
bit positions i and j. The p[i, j] and g[i, j] are computed by the following relations:
• For any i such that 1 ≤ i ≤ n− 1, p[i, i+ 1] = ai ⊕ bi.
• For any i, j such that 1 ≤ i < i+1 < j ≤ n, p[i, j] = p[i, t]p[t, j] for any t satisfying i < t < j.
• For any i such that 0 ≤ i ≤ n− 1, g[i, i + 1] = aibi.
• For any i, j such that 0 ≤ i < i+ 1 < j ≤ n, g[i, j] = g[i, t]p[t, j] ⊕ g[t, j] for any t satisfying
i < t < j.
It holds that g[0, j] = cj for all 1 ≤ j ≤ n.
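The relations for p[i, j] and g[i, j] can be evaluated recursively, which also makes it easy to check that g[0, j] equals the carry bit c_j regardless of the split point t. The sketch below is a classical illustration only; the function `lookahead` and its `split` parameter are ours.

```python
from itertools import product


def lookahead(a_bits, b_bits, split=lambda i, j: (i + j) // 2):
    """Return functions p(i, j) and g(i, j) built from the relations above.

    `split(i, j)` chooses the intermediate position t with i < t < j.
    """
    def p(i, j):
        if j == i + 1:
            return a_bits[i] ^ b_bits[i]
        t = split(i, j)
        return p(i, t) & p(t, j)

    def g(i, j):
        if j == i + 1:
            return a_bits[i] & b_bits[i]
        t = split(i, j)
        return (g(i, t) & p(t, j)) ^ g(t, j)

    return p, g


def carries(a_bits, b_bits):
    """Carry bits c_0..c_n from the ripple-carry recurrence."""
    c = [0]
    for a, b in zip(a_bits, b_bits):
        c.append((a & b) ^ (b & c[-1]) ^ (c[-1] & a))
    return c


n = 4
for bits in product((0, 1), repeat=2 * n):
    a_bits, b_bits = list(bits[:n]), list(bits[n:])
    c = carries(a_bits, b_bits)
    for split in (lambda i, j: (i + j) // 2, lambda i, j: i + 1, lambda i, j: j - 1):
        _, g = lookahead(a_bits, b_bits, split)
        assert all(g(0, j) == c[j] for j in range(1, n + 1))
print("g[0, j] = c_j for every input and every split tried")
```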
Draper et al.’s quantum carry-lookahead adder first computes p[i, i + 1] (1 ≤ i ≤ n − 1) and
g[i, i + 1] (0 ≤ i ≤ n − 1). Then, it computes g[0, i] (1 ≤ i ≤ n) by successively doubling the
sizes of the intervals under consideration. Lastly, it computes si (0 ≤ i ≤ n), where s0 = p[0, 1],
si = p[i, i + 1] ⊕ g[0, i] (1 ≤ i ≤ n − 1), and sn = g[0, n]. The key circuit is the one for the second
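step. We call this circuit the CARRY_1 gate. In general, the CARRY_l gate is a circuit for the
operation
\[ \bigotimes_{i=1}^{\lfloor n/2^{l-1} \rfloor - 1} |p_{l-1}[i]\rangle \bigotimes_{j=0}^{\lfloor n/2^{l-1} \rfloor - 1} |g_{l-1}[j]\rangle \rightarrow \bigotimes_{i=1}^{\lfloor n/2^{l-1} \rfloor - 1} |p_{l-1}[i]\rangle \bigotimes_{j=0}^{\lfloor n/2^{l-1} \rfloor - 1} |g[0, 2^{l-1}(j+1)]\rangle, \]
where $1 \le l \le \lfloor \log n \rfloor - 1$, $p_{l-1}[i] = p[2^{l-1} i, 2^{l-1}(i+1)]$, and $g_{l-1}[i] = g[2^{l-1} i, 2^{l-1}(i+1)]$ [13]. The
CARRY_l gate uses $\sum_{t=l}^{\lfloor \log n \rfloor - 1} (\lfloor n/2^t \rfloor - 1)$ ancillary qubits and its depth and size are O(log n − l)
and $O\left(\sum_{t=l}^{\lfloor \log n \rfloor - 1} (\lfloor n/2^t \rfloor - 1)\right)$, respectively. Draper et al.’s quantum carry-lookahead adder uses
O(n) ancillary qubits and its depth and size are O(log n) and O(n), respectively.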
In Takahashi et al.’s combination method, the input binary number $a_{n-1} \cdots a_0$ is divided into
n/k blocks of length k, where we assume that n is a power of two for simplicity and set $k = 2^{\lfloor \log\log n \rfloor}$
and $l = \lfloor \log\log n \rfloor + 1$. Note that k = Θ(log n) and n is divisible by k. That is, we consider a
k-bit binary number $a(j) = a_{(j+1)k-1} \cdots a_{jk}$ for 0 ≤ j ≤ n/k − 1. Similarly, we consider b(j) for
$b_{n-1} \cdots b_0$. Roughly speaking, the previous method is described as follows:
1. Compute the high-order bit of a(j) + b(j), which is $g_{l-1}[j] = g[jk, (j+1)k]$, using the
ripple-carry approach [11] for 0 ≤ j ≤ n/k − 1.
2. Compute the value $\bigwedge_{i=0}^{k-1} (a_{jk+i} \oplus b_{jk+i})$, which is $p_{l-1}[j] = p[jk, (j+1)k]$, using Barenco et
al.’s circuit for a generalized Toffoli operation T_k [19] for 0 ≤ j ≤ n/k − 1, where T_t (on t + 1
qubits) is defined as
\[ T_t\left( |y\rangle \bigotimes_{i=0}^{t-1} |x_i\rangle \right) = \left| y \oplus \bigwedge_{i=0}^{t-1} x_i \right\rangle \bigotimes_{i=0}^{t-1} |x_i\rangle. \]
3. Compute the carry bit cjk = g[0, jk] using the values computed in Steps 1 and 2 for 1 ≤ j ≤
n/k. This is done by using the CARRYl gate.
4. Compute the carry bit g[0, i] using the carry bits computed in Step 3 for 1 ≤ i ≤ n and obtain
si for 0 ≤ i ≤ n. This is done by a circuit based on the ripple-carry approach as in Step 1.
The whole circuit uses O(n/k) (= O(n/ log n)) ancillary qubits and its depth and size are O(k)
(= O(log n)) and O(n), respectively.
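The data flow of the four steps can be illustrated with a purely classical sketch: block-level generate and propagate bits, block carries, then ripple-carry addition inside each block. It says nothing about qubit counts or depth (the block carries are computed sequentially here, whereas the CARRY gate does this in logarithmic depth), and the function name `block_add` is ours.

```python
def maj(a, b, c):
    return (a & b) ^ (b & c) ^ (c & a)


def block_add(a, b, n, k):
    """Add two n-bit numbers using blocks of length k (k must divide n)."""
    assert n % k == 0
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    blocks = n // k

    # Steps 1 and 2: g[jk, (j+1)k] and p[jk, (j+1)k] for every block j.
    gen, prop = [], []
    for j in range(blocks):
        c, p = 0, 1
        for i in range(j * k, (j + 1) * k):
            c = maj(a_bits[i], b_bits[i], c)     # carry out with carry-in 0
            p &= a_bits[i] ^ b_bits[i]           # AND of the propagate bits
        gen.append(c)
        prop.append(p)

    # Step 3: block carries c_{jk} = g[0, jk], via g[0, t] p[t, j] XOR g[t, j].
    carry = [0]
    for j in range(blocks):
        carry.append((carry[j] & prop[j]) ^ gen[j])

    # Step 4: ripple-carry addition inside each block from its incoming carry.
    s = 0
    for j in range(blocks):
        c = carry[j]
        for i in range(j * k, (j + 1) * k):
            s |= (a_bits[i] ^ b_bits[i] ^ c) << i
            c = maj(a_bits[i], b_bits[i], c)
    return s | (carry[blocks] << n)


print(block_add(200, 100, 8, 2))   # 300
```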
3.2 Our Method
Our idea is to divide the input binary numbers into n/d(n) blocks of length d(n) in Takahashi
et al.’s method, where d(n) = Ω(log n). By using the CARRY_{log d(n)+1} gate, we can construct an
O(d(n))-depth O(n)-size circuit with O(n/d(n)) ancillary qubits. This is a simple generalization
of the previous method. Though this allows us to construct an O(d(n))-depth circuit for any
d(n) = Ω(log n) in contrast to the previous method, it, of course, does not improve the previous
O(log n)-depth circuit.
To obtain an efficient circuit, we simplify Steps 1, 2, and 4 in the previous method using the
circuit for addition in Section 2. The simplification of Step 4 is due to a direct application of
the circuit for addition. To simplify Steps 1 and 2, we use only the first halves of our circuit for
addition and Barenco et al.’s circuit for Tn [19]. The first half of the circuit for addition outputs the
high-order bit of a(j)+b(j) and appropriate inputs to Barenco et al.’s circuit. We use only the first
half and we can thus save Toffoli gates, but some qubits are left holding values we do not need. An important
point is that Barenco et al.’s circuit can use these qubits as uninitialized ancillary qubits. We use
the first half of Barenco et al.’s circuit and we can thus again save Toffoli gates, but some qubits
are again left holding values we do not need. This is not a problem since these qubits are reset to the initial values in later
steps. The details are described below.
To simplify Steps 1 and 2, since we need to compute only the two bits g[i, j] and p[i, j] for some
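i, j, it suffices to construct an efficient quantum circuit for the operation
\[ \left( \bigotimes_{i=0}^{w-1} |b_i\rangle|a_i\rangle \right) |0\rangle|0\rangle \rightarrow \left( \bigotimes_{i=0}^{w-1} |p[i, i+1]\rangle|r_i\rangle \right) |g[0, w]\rangle|p[0, w]\rangle, \]
where $a_{w-1} \cdots a_0$ and $b_{w-1} \cdots b_0$ are the input binary numbers, $r_0 = a_0$, and $r_i = a_i \oplus g[0, i] \oplus p[0, i]$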
(1 ≤ i ≤ w−1). Let Ai and Bi denote the memory locations initially storing ai and bi, respectively.
Let G and P be the memory locations initially storing 0. Location Ai will store ri, Bi will store
p[i, i + 1], G will store g[0, w], and P will store p[0, w] at the end of the computation. The circuit
is defined as follows:
1. Apply the first half of the circuit (for two w-bit binary numbers) in Section 2 to a tuple of
memory locations Ai (0 ≤ i ≤ w − 1) and Bi (0 ≤ i ≤ w − 1) and G.
2. Apply a CNOT gate to a pair of memory locations A0 and B0, where A0 is used for the
control bit.
3. Apply the first half of Barenco et al.’s circuit for Tw to a tuple of memory locations Ai
(0 ≤ i ≤ w− 1) and Bi (0 ≤ i ≤ w− 1) and P , where Ai is used as an uninitialized ancillary
memory location.
Step 1 writes the value g[0, w] into the memory location G. The memory location Ai stores the
value ri. Step 2 writes p[0, 1] into the memory location B0. Step 3 uses the memory location Ai as
an uninitialized ancillary memory location and writes the value p[0, w] into the memory location
P. The whole circuit uses no ancillary qubits and its depth and size are O(w). We call the circuit
the INITw gate. The INIT5 gate is depicted in Fig. 3.
Figure 3: The INIT5 gate. A dashed-line box represents the part for computing g[0, 5], which is
the first half of our circuit for addition in Section 2.
Figure 4: The SUM5 gate.
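To simplify Step 4, it suffices to construct an efficient quantum circuit for the operation
\[ \left( |c\rangle \bigotimes_{i=0}^{w-1} |b_i\rangle|a_i\rangle \right) \rightarrow \left( |c\rangle \bigotimes_{i=0}^{w-1} |t_i\rangle|a_i\rangle \right), \]
where c ∈ {0, 1}, $a_{w-1} \cdots a_0$ and $b_{w-1} \cdots b_0$ are the input binary numbers, $t_j = a_j \oplus b_j \oplus d_j$
(0 ≤ j ≤ w − 1), and $d_j$ is defined as
\[ d_j = \begin{cases} c & j = 0, \\ \mathrm{MAJ}(a_{j-1}, b_{j-1}, d_{j-1}) & 1 \le j \le w-1. \end{cases} \]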
We can directly apply the circuit in Section 2 to constructing such a circuit and thus omit the
details. The circuit uses no ancillary qubits and its depth and size are O(w). We call the circuit
the SUMw gate. The SUM5 gate is depicted in Fig. 4.
3.3 The Whole Circuit
We construct a quantum circuit for ADDn. For simplicity, we assume that n is a power of two.
Let d(n) = Ω(log n). We set $k = 2^{\lfloor \log d(n) \rfloor}$ and $l = \lfloor \log d(n) \rfloor + 1$. Note that k = Θ(d(n)) and
n is divisible by k. As described in Section 3.1, we consider k-bit binary numbers a(j) and b(j).
Let Ai and Bi denote the memory locations initially storing ai and bi, respectively. Let Z be the
memory location initially storing z ∈ {0, 1}. Location Ai will store ai, Bi will store si, and Z will
store z ⊕ sn at the end of the computation. We assume that there are ancillary memory locations
initially storing 0. The first half of our circuit is defined as follows:
1. Apply the INIT_k gate to memory locations storing a(j) and b(j) and to two ancillary memory
locations storing 0 for 0 ≤ j ≤ n/k − 1. The gate writes $g_{l-1}[j]$ and $p_{l-1}[j]$ into the ancillary
memory locations.
2. Apply the CARRY_l gate to memory locations storing all $g_{l-1}[j]$ and all $p_{l-1}[j]$ and to ancillary
memory locations storing 0. The gate writes $c_{(j+1)k}$ into the memory location storing $g_{l-1}[j]$
for 0 ≤ j ≤ n/k − 1.
3. Apply the gates in Step 1 in reverse order, where we exclude the gates applied to memory
locations storing $c_{(j+1)k}$ for 0 ≤ j ≤ n/k − 1 since we do not erase the value.
4. Apply the SUM_k gate to memory locations storing a(j + 1) and b(j + 1) and to a memory
location storing $c_{k(j+1)}$ to obtain $s_{k(j+1)}, \ldots, s_{k(j+2)-1}$ for 0 ≤ j ≤ n/k − 2. Apply a simplified
version of the SUM_k gate to memory locations storing a(0) and b(0) to obtain $s_0, \ldots, s_{k-1}$.
The second half deletes unnecessary carry bits using the fact that the carry bits generated for
computing a + s′ are the same as those for computing a + b, where s′ is the bitwise complement of s
[12].
5. Apply a NOT gate to $B_i$ to write $s_i \oplus 1$ into $B_i$ for 0 ≤ i ≤ n − k − 1.
6. Apply the first half of our circuit excluding Step 4 in reverse order, where we exclude the
gates applied to memory locations storing a(n/k − 1) and b(n/k − 1) since we do not erase the
last carry bit. The gate writes 0 into a memory location storing $c_{k(j+1)}$ for 0 ≤ j ≤ n/k − 1.
7. Apply a NOT gate to $B_i$ to write $s_i$ into $B_i$ for 0 ≤ i ≤ n − k − 1.
The whole circuit for d(n) = log n and n = 8 (and thus k = l = 2) is depicted in Fig. 5.
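The fact used in the second half of the circuit, that adding a to the bitwise complement of the sum produces the same carry bits as adding a to b, can be checked exhaustively for small n. The sketch below is a classical check only; it takes s′ to be the complement of the low n bits of the sum, and the function names are ours.

```python
def carries(a, b, n):
    """Carry bits c_0..c_n generated when adding the n-bit numbers a and b."""
    c = [0]
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c.append((ai & bi) ^ (bi & c[i]) ^ (c[i] & ai))
    return c


def complemented_sum(a, b, n):
    """Bitwise complement of the low n bits of a + b."""
    mask = (1 << n) - 1
    return ~(a + b) & mask


def same_carries(n):
    """True if a + s' and a + b generate identical carries for all n-bit a, b."""
    return all(
        carries(a, complemented_sum(a, b, n), n) == carries(a, b, n)
        for a in range(2 ** n)
        for b in range(2 ** n)
    )


print([same_carries(n) for n in range(1, 6)])
```

The reason is visible bit by bit: when $a_i = c_i$ the majority equals $a_i$ whatever the middle bit is, and when $a_i \neq c_i$ the complemented sum bit $s_i \oplus 1$ equals $b_i$.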
We compute the number of ancillary qubits, the depth, and the size precisely. For simplicity,
we count only Toffoli gates as in [12, 13]. Step 1 requires 2n/k ancillary qubits to use n/k INIT_k gates.
The INIT_n gate consists of 3n − 2 Toffoli gates for n ≥ 3. Thus, the depth and size of Step 1 are 3k − O(1)
and 3n − O(n/k), respectively. The CARRY_l gate in Step 2 uses n/k − O(log n) ancillary qubits and
its depth and size are 2 log(n/k) + O(1) and 4n/k + O(log n), respectively, where n/k ≥ 4 [13]. Step 3 is
the same as Step 1. Step 4 uses n/k SUM_k gates. The SUM_n gate consists of 2n − 2 Toffoli gates for n ≥ 3.
Thus, the depth and size of Step 4 are 2k − O(1) and 2n − O(n/k), respectively. The other steps are
the same as the above steps excluding Step 4. Our circuit uses 3n/k − O(log n) ancillary qubits and
its depth and size are 14k + 4 log(n/k) + O(1) and 14n − O(n/k), respectively, where n/k ≥ 4. Thus, the
circuit uses O(n/d(n)) ancillary qubits and its depth and size are O(d(n)) and O(n), respectively.
For example, for d(n) = log n and n ≥ 16, the number of ancillary qubits, the depth, and the size
are approximately 3n/ log n, 18 log n, and 14n, respectively. The corresponding previous bounds
are 3n/ log n, 30 log n, and 29n. That is, in this case, the number of ancillary qubits in our circuit
is the same as that in Takahashi et al.’s [13] and the leading coefficient of the expression of the size
in our circuit is less than half that in Takahashi et al.’s.
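To get a feel for these bounds, the leading terms can be evaluated numerically. The sketch below drops all O(·) corrections, so the numbers are rough estimates rather than exact counts; logarithms are taken base 2, and the function names `parameters` and `leading_terms` are ours.

```python
from math import floor, log2


def parameters(d):
    """Block length k = 2^floor(log d) and level l = floor(log d) + 1 for d = d(n)."""
    e = floor(log2(d))
    return 2 ** e, e + 1


def leading_terms(n, k):
    """Leading terms of the Toffoli-only bounds, ignoring O(.) corrections."""
    return {
        "ancillary qubits": 3 * n / k,
        "depth": 14 * k + 4 * log2(n / k),
        "size": 14 * n,
    }


# d(n) = log n with n = 8 gives k = l = 2, as in Fig. 5.
print(parameters(log2(8)))

# A larger instance: n = 1024 and d(n) = log n = 10, so k = 8.
k, l = parameters(log2(1024))
print(k, l, leading_terms(1024, k))
```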
Figure 5: The circuit for ADD8, where d(n) = log n. The first and third dashed-line boxes represent
the carry-lookahead part [12, 13]. The second one represents the parallel applications of the SUM2
gate.
4 Circuit with Depth o(log n)
4.1 Chandra et al.’s Classical Circuit
If we use only one-qubit and two-qubit gates as elementary gates, we cannot construct an o(log n)-
depth circuit for ADDn. This is simply shown by using the logarithmic lower bound for the depth
of the circuit for Fn [15]. To construct an o(log n)-depth circuit, we decrease the depth of the
carry-lookahead part of our method in Section 3 by using a quantum version of Chandra et al.’s
efficient classical circuit for addition with (classical) unbounded fan-out gates [14]. We assume that
we have unbounded fan-out gates (described in Section 2) as elementary gates. We first consider
the simple case where we have unbounded fan-out gates with a long length and then reduce the
length.
Chandra et al.’s method for constructing the circuit is a generalization of the carry-lookahead
approach. Besides the (classical) unbounded fan-out gates, the circuit uses unbounded fan-in gates
that compute logical AND (or OR) of an unbounded number of input bits. The depth and size of
the circuit for two m-bit binary numbers are O(1) and O(m log∗∗m), respectively, where
\[
\log^{**} t = \min\{\, j \mid \overbrace{\log^{*} \cdots \log^{*}}^{j}\, t \le 1 \,\}, \qquad
\log^{*} t = \min\{\, j \mid \overbrace{\log \cdots \log}^{j}\, t \le 1 \,\}.
\]
It can be shown that log∗∗m = o(log∗m). Though the definition of the depth of a classical circuit
is similar to that of a quantum circuit, the definition of the size of a classical circuit in [14] is
different from that of a quantum circuit. More precisely, a classical circuit is defined as a directed
acyclic graph and the size is the number of edges in the circuit and the depth is the length of a
longest path from an input node to an output node. Chandra et al. give a tighter bound on the
size of the circuit, but we use the above bound since it is sufficient for showing that our circuits in
Sections 4.2 and 4.3 use a sublinear number of ancillary qubits.
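The two iterated-logarithm functions can be evaluated directly from their definitions. The sketch below is only an illustration: it assumes base-2 logarithms, and the names `log_star` and `log_star_star` are hypothetical.

```python
import math

def log_star(t: float) -> int:
    """Smallest j such that applying log2 j times to t gives a value <= 1."""
    j = 0
    while t > 1:
        t = math.log2(t)
        j += 1
    return j

def log_star_star(t: float) -> int:
    """Smallest j such that applying log_star j times to t gives a value <= 1."""
    j = 0
    while t > 1:
        t = log_star(t)
        j += 1
    return j

if __name__ == "__main__":
    for t in (2, 16, 65536, 2 ** 1000):
        print(t.bit_length() - 1, log_star(t), log_star_star(t))
```

Both functions grow so slowly that the asymptotic relation between them is not visible for inputs of practical size; the sketch is meant only to make the definitions concrete.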
4.2 Simple Case
4.2.1 Quantum Version of Chandra et al.’s Circuit
We transform Chandra et al.’s classical circuit for two m-bit binary numbers into its quantum
version. Since the size (that is, the number of edges) of the circuit is O(m log∗∗m), it suffices to
consider an unbounded fan-out gate with length O(m log∗∗m) and a Tt gate (corresponding to an
unbounded fan-in gate with t inputs in the classical circuit) with t = O(m log∗∗m). We assume that
we have unbounded fan-out gates with length O(m log∗∗m). If we have one-qubit gates, CNOT
gates, Tt gates, and unbounded fan-out gates with length O(m log∗∗m), Chandra et al.’s classical
circuit can be simply transformed into its quantum version. Note that an OR gate in Chandra
et al.’s circuit is transformed into a Tt gate with NOT gates. However, in our setting, we have
only one-qubit gates, CNOT gates, and unbounded fan-out gates with length O(m log∗∗m). Thus,
we require a quantum circuit for Tt (consisting of one-qubit gates, CNOT gates, and unbounded
fan-out gates with length O(m log∗∗m)). We use Høyer et al.’s circuit for the Tt operation (defined
in Section 3.1) as the Tt gate [9]. They showed that, if unbounded fan-out gates with length O(t)
are available, an O(log∗ t)-depth O(t)-size quantum circuit for Tt can be constructed. We can show
that Høyer et al.’s circuit uses O(t) ancillary qubits. Since we have unbounded fan-out gates with
length O(m log∗∗m), we can directly use Høyer et al.’s circuit for Tt with t = O(m log∗∗m). Thus,
we obtain a quantum version of Chandra et al.’s circuit. We call the circuit the GCLAm circuit,
which stands for the generalized carry-lookahead approach for two m-bit binary numbers.
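The replacement of an OR gate by a Tt gate with NOT gates follows from De Morgan's law. The sketch below checks this on classical basis states only. It assumes that the Tt operation flips the target bit exactly when all t control bits are 1 (the definition is given in Section 3.1, outside this passage); the function names are hypothetical.

```python
from itertools import product

def toffoli_t(controls, target):
    """Assumed action of T_t on basis states: target ^= AND(controls)."""
    return target ^ int(all(controls))

def or_via_toffoli(inputs, target):
    """Compute target ^= OR(inputs) using NOT gates around one T_t gate."""
    flipped = [b ^ 1 for b in inputs]       # NOT on every input bit
    target ^= 1                             # NOT on the target bit
    target = toffoli_t(flipped, target)     # target ^= AND(NOT inputs)
    # A second layer of NOT gates would restore the inputs; on basis states
    # this only undoes `flipped`, so it is omitted here.
    return target

if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        print(bits, or_via_toffoli(bits, 0))
```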
The complexity of the GCLAm circuit is analyzed as follows. To compute the depth of the
circuit, since the depth of the original circuit is O(1), it suffices to consider a Tt1 gate, where t1 is
the maximum number of inputs of Tt gates in the GCLAm circuit. The depth of the Tt1 gate is
O(log∗ t1). Since t1 = O(m log∗∗m), the depth of the Tt1 gate is O(log∗(m log∗∗m)) and thus the
depth of the GCLAm circuit is O(log∗(m log∗∗m)). To compute the size of the circuit, we define
At as the number of unbounded fan-in gates with t inputs in Chandra et al.’s circuit, which is
equal to the number of Tt gates in the GCLAm circuit. Since the size of Chandra et al.’s circuit
is O(m log∗∗m), we have ∑_t t·At = O(m log∗∗m). The size of a Tt gate is O(t). The number of the other
gates in the GCLAm circuit is O(m log∗∗m) (and the size of each gate is 1). Thus, the size of
the GCLAm circuit is O(∑_t t·At + m log∗∗m) = O(m log∗∗m). A similar argument shows that the
number of ancillary qubits in the GCLAm circuit is O(m log∗∗m). That is, the GCLAm circuit
uses O(m log∗∗m) ancillary qubits and its depth and size are O(log∗(m log∗∗m)) and O(m log∗∗m),
respectively.
4.2.2 Modification of Our Method
We modify our method in Section 3.3 by using the GCLAm circuit as the CARRYl gate. Let
e(n) = Ω(log∗ n). We set k and l as in Section 3.3. Note that k = 2l−1 = Θ(e(n)). We assume that
we are allowed to use unbounded fan-out gates with length O(n). Chandra et al.’s circuit for two
⌊n/2l−1⌋-bit binary numbers is directly applied to perform the operation performed by the CARRYl
gate. Thus, we set m = ⌊n/2l−1⌋. In this case, O(m log∗∗m) = O(n log∗∗(n/2l−1)/2l−1), which is
bounded by O(n). Since we have unbounded fan-out gates with length O(n), we can use the com-
plexity analysis described in Section 4.2.1. The GCLAm circuit, which is the CARRYl gate, uses
O(n log∗∗(n/2l−1)/2l−1) ancillary qubits and its depth and size are O(log∗(n log∗∗(n/2l−1)/2l−1))
and O(n log∗∗(n/2l−1)/2l−1), respectively. For simplicity, we consider slightly weaker bounds;
it uses O(n log∗∗ n/2l−1) ancillary qubits and its depth and size are O(log∗(n log∗∗ n/2l−1)) and
O(n log∗∗ n/2l−1), respectively.
The complexity of the whole circuit obtained by the modified method is analyzed as in the orig-
inal method. Step 1 uses O(n/k) ancillary qubits and its depth and size are O(k) and O(n), respec-
tively. Step 2 uses O(n log∗∗ n/k) ancillary qubits and its depth and size are O(log∗(n log∗∗ n/k))
and O(n log∗∗ n/k), respectively. Step 4 requires no new ancillary qubits and its depth and size
are O(k) and O(n), respectively. The other steps are similar to the above steps. Thus, the whole
circuit uses O(n log∗∗ n/e(n)) (= o(n)) ancillary qubits and its depth and size are O(e(n)) and
O(n), respectively. In particular, for e(n) = log∗ n, the modified method yields an O(log∗ n)-depth
O(n)-size circuit with O(n log∗∗ n/ log∗ n) (= o(n)) ancillary qubits.
4.3 Reduction of the Length of an Unbounded Fan-Out Gate
We prove that the length of an unbounded fan-out gate can be restricted to O(nε) in the modified
method without increasing the complexity of the circuit, where ε is any small positive constant.
Suppose that we are allowed to use unbounded fan-out gates with length f(n). An unbounded
fan-out gate with length t = O(m log∗∗m) (and m = ⌊n/2l−1⌋) can be simply simulated by using
an O(log t/ log f(n)+1)-depth O(t/f(n)+1)-size circuit with no ancillary qubits that consists only
of unbounded fan-out gates with length f(n). In the following, using this simulation, we reconsider
the complexity of the Tt gate, the GCLAm circuit, and the circuit our method in Section 4.2 yields.
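One natural way to realize such a simulation is a tree of short fan-out gates. The sketch below counts layers and gates for this tree. It is an illustration under a stated assumption, not necessarily the construction intended here: it treats the copying case, where the t targets start in state 0, so that every qubit already holding the bit can serve as the control of a further gate. The function name is hypothetical.

```python
def fanout_tree_cost(t: int, f: int):
    """Layers and gates needed to copy one bit onto t zero-initialized targets
    using fan-out gates that act on at most f targets each.

    Returns (layers, gates). In each layer every qubit that already holds the
    bit controls at most one gate, so the gates of a layer act in parallel.
    """
    if t < 0 or f < 1:
        raise ValueError("need t >= 0 and f >= 1")
    holders = 1          # qubits currently holding the bit (initially the source)
    remaining = t        # targets still to be filled
    layers = gates = 0
    while remaining > 0:
        filled = min(remaining, holders * f)
        gates += -(-filled // f)             # ceil(filled / f) gates in this layer
        holders += filled
        remaining -= filled
        layers += 1
    return layers, gates

if __name__ == "__main__":
    print(fanout_tree_cost(100, 10))         # two layers, ten gates
```

The number of holders grows by a factor of at most f + 1 per layer, which gives the O(log t/ log f(n) + 1) layer count, and each gate except possibly one per layer fills f targets, which gives the O(t/f(n) + 1) gate count.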
4.3.1 Tt gate
The Tt gate, which is Høyer et al.’s circuit for the Tt operation, is constructed as follows:
1. Construct an O(1)-depth O(t log t)-size circuit with O(t log t) ancillary qubits for reducing
the computation of OR of t bits to that of O(log t) bits.
2. Using the circuit in Step 1, for any d > 0, construct an O(d + log∗ t)-depth O(dt log^(d) t)-size
circuit for Tt with O(dt log^(d) t) ancillary qubits, where log^(d) t is the d-times iterated
logarithm log · · · log t.
3. Using the circuit in Step 2, construct an O(log∗ t)-depth O(t)-size circuit for Tt with O(t)
ancillary qubits.
We can modify the above steps using unbounded fan-out gates with length f(n) as follows:
1. Construct an O(log t/ log f(n)+1)-depth O(t log t)-size circuit with O(t log t) ancillary qubits
for reducing the computation of OR of t bits to that of O(log t) bits.
2. Using the circuit in Step 1, for any d > 0, construct an O(d + log∗ t + log t/ log f(n) +
d log log t/ log f(n))-depth O(dt log^(d) t)-size circuit for Tt with O(dt log^(d) t) ancillary qubits.
3. Using the circuit in Step 2, construct an O(log t/ log f(n) + log∗ t)-depth O(t)-size circuit for
Tt with O(t) ancillary qubits.
To see this, we first analyze Step 1 in Høyer et al.’s construction. In this step, an unbounded fan-
out gate with length O(log t) is used in parallel to make O(log t) copies of each of the t input bits.
Moreover, an unbounded fan-out gate with length O(t) is used in parallel to prepare appropriate
ancillary qubits O(log t) times. As described above, an unbounded fan-out gate with length O(log t)
can be simulated by using an O(log log t/ log f(n)+ 1)-depth O(log t/f(n) + 1)-size circuit with no
ancillary qubits. Similarly, an unbounded fan-out gate with length O(t) can be simulated by
using an O(log t/ log f(n) + 1)-depth O(t/f(n) + 1)-size circuit. Thus, the depth of the circuit in Step 1 is
O(log t/ log f(n) + 1). The size is O(t · (log t/f(n) + 1) + (log t) · (t/f(n) + 1)) = O(t log t). These
simulations do not require any ancillary qubits. That is, in Step 1, the number of ancillary qubits
and size remain unchanged even if we consider unbounded fan-out gates with length f(n). Thus,
they also do so in Steps 2 and 3. Step 2 of Høyer et al.’s construction is done by using Step 1
O(log∗ t) times to reduce the computation of OR of t bits to that of a constant number of bits. Step
3 is done by reducing the computation of OR of t bits to that of t/ log∗ t bits and by using Step
2 with d = log∗ t. These procedures can be simply applied to the case where we use unbounded
fan-out gates with length f(n) and imply the desired depth bound.
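The repeated use of Step 1 in Step 2 can be illustrated by counting how many reduction rounds are needed before only a constant number of bits remain. The sketch below is a rough illustration: it assumes that one round maps an OR of t bits to an OR of exactly ⌈log2(t + 1)⌉ bits, whereas the text only states O(log t), and it stops at two bits. The function name is hypothetical.

```python
def or_reduction_rounds(t: int) -> int:
    """Number of rounds t -> ceil(log2(t + 1)) until at most 2 bits remain.

    For t >= 1, ceil(log2(t + 1)) equals t.bit_length(), so integer
    arithmetic suffices.
    """
    rounds = 0
    while t > 2:
        t = t.bit_length()
        rounds += 1
    return rounds

if __name__ == "__main__":
    for t in (4, 65536, 2 ** 65536):
        print(t.bit_length() - 1, or_reduction_rounds(t))
```

The round count grows like the iterated logarithm log∗ t, which is the source of the log∗ t term in the depth bounds above.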
4.3.2 The GCLAm circuit
To compute the depth of the GCLAm circuit, it suffices to consider a Tt1 gate for some t1 and an
unbounded fan-out gate with some length t2. The depth of the Tt1 gate is O(log t1/ log f(n) + log∗ t1)
and the depth of an unbounded fan-out gate with length t2 is O(log t2/ log f(n) + 1). Since t1 and
t2 cannot be greater than the size of Chandra et al.’s circuit, the depth of the GCLAm circuit is
O(log m/ log f(n) + log∗(m log∗∗m)). To compute the size, we define Bt as the number of unbounded
fan-out gates with length t used (implicitly) in Chandra et al.’s original circuit, which is equal to
the number of unbounded fan-out gates with length t (that are not used in Ts gates for any s) in the
GCLAm circuit. Since the size of Chandra et al.’s circuit is O(m log∗∗m), we have ∑_t t·Bt = O(m log∗∗m). If
t ≥ f(n), an unbounded fan-out gate with length t can be simulated by an O(t/f(n))-size circuit.
Thus, the size related to unbounded fan-out gates with length greater than or equal to f(n) in
the GCLAm circuit (that is, ∑_{t≥f(n)} (t/f(n))·Bt) is O(m log∗∗m) since ∑_t t·Bt = O(m log∗∗m).
The size related to the Tt gates (that is, O(∑_t t·At)) is O(m log∗∗m). The number of the other
gates is O(m log∗∗m) (and the size of each gate is 1). Thus, the size of the GCLAm circuit is
O(m log∗∗m). The number of ancillary qubits is the same as the size. That is, the GCLAm circuit
uses O(m log∗∗m) ancillary qubits and its depth and size are O(logm/ log f(n) + log∗(m log∗∗m))
and O(m log∗∗m), respectively. Since m = ⌊n/2l−1⌋, the circuit uses O(n log∗∗(n/2l−1)/2l−1)
ancillary qubits and its depth and size are O(log(n/2l−1)/ log f(n)+log∗(n log∗∗(n/2l−1)/2l−1)) and
O(n log∗∗(n/2l−1)/2l−1), respectively. For simplicity, we consider slightly weaker bounds; it uses
O(n log∗∗ n/2l−1) ancillary qubits and its depth and size are O(log n/ log f(n)+log∗(n log∗∗ n/2l−1))
and O(n log∗∗ n/2l−1), respectively.
4.3.3 Our Circuit
We set f(n) = nε and use the GCLAm circuit as the CARRYl gate, where ε is any small positive
constant. In this case, log n/ log f(n) = 1/ε = O(1), so the CARRYl gate uses O(n log∗∗ n/2l−1) ancillary qubits and its depth and
size are O(log∗(n log∗∗ n/2l−1)) and O(n log∗∗ n/2l−1), respectively. This is the same situation as
that in Section 4.2 except that the length of an unbounded fan-out gate in the CARRYl gate is
at most nε. Thus, the whole circuit uses O(n log∗∗ n/e(n)) (= o(n)) ancillary qubits and its depth
and size are O(e(n)) and O(n), respectively. If we set e(n) = log∗ n, we obtain an O(log∗ n)-depth
O(n)-size circuit with o(n) ancillary qubits.
It is worth noting that the above method for constructing a circuit for ADDn yields an o(log n)-
depth O(n)-size circuit with o(n) ancillary qubits using unbounded fan-out gates with a small
length. For example, we set f(n) = log n and d(n) = log n/ log log n. In this case, the CARRYl
gate uses O(n log∗∗ n log log n/ log n) ancillary qubits and its depth and size are O(log n/ log log n)
and O(n log∗∗ n log log n/ log n), respectively. This yields an O(log n/ log log n)-depth O(n)-size
circuit with O(n log∗∗ n log log n/ log n) ancillary qubits. Such an o(log n)-depth circuit cannot be
constructed by using a quantum circuit only with gates on a bounded number of qubits [15] or
by using a classical circuit only with bounded fan-in and unbounded fan-out gates [16]. Hence,
unbounded fan-out gates even with a small length are useful for constructing efficient quantum
circuits for addition.
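The effect of the fan-out length f(n) on the depth of the carry part can be seen by evaluating the leading term log n/ log f(n) + log∗ n. The sketch below is only indicative: it drops all constants, uses base-2 logarithms, and replaces log∗(n log∗∗ n/2l−1) by the simpler log∗ n; the function names are hypothetical.

```python
import math

def log_star(t: float) -> int:
    """Smallest j such that applying log2 j times to t gives a value <= 1."""
    j = 0
    while t > 1:
        t = math.log2(t)
        j += 1
    return j

def carry_depth_term(n: int, f: float) -> float:
    """Leading depth term log n / log f + log* n for fan-out length f > 1."""
    return math.log2(n) / math.log2(f) + log_star(n)

if __name__ == "__main__":
    n = 2 ** 64
    print(carry_depth_term(n, 2 ** 32))        # f(n) = n^(1/2): 1/eps + log* n
    print(carry_depth_term(n, math.log2(n)))   # f(n) = log n: log n / log log n + log* n
```

With f(n) = n^ε the first term is the constant 1/ε, so the log∗ n term dominates; with f(n) = log n the first term log n/ log log n dominates.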
5 Application
We consider the prime field GF(p) for some prime p > 3. An elliptic curve E over GF(p) is the set
of points (x, y) ∈ GF(p) × GF(p) satisfying y^2 = x^3 + ax + b, where the constants a, b ∈ GF(p) and
4a^3 + 27b^2 ≠ 0, together with the point at infinity O. It is known that the addition operation in E
can be defined and that E with the addition operation forms an abelian group with O serving as
its identity [20]. Let P ∈ E, 〈P 〉 be the subgroup of E generated by P , and |〈P 〉| be the order of
〈P 〉. The discrete logarithm problem over the elliptic curve E with respect to the base P is defined
as follows: Given a point Q ∈ 〈P 〉, find the integer 0 ≤ d ≤ |〈P 〉| − 1 such that Q = dP . Shor’s
discrete logarithm algorithm solves the problem in time polynomial in the length of the binary
representation for |〈P 〉| with high probability [1]. As in [5], we assume that the length of the binary
representation for |〈P 〉| is equal to that of the binary representation for p.
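For concreteness, the group law and the discrete logarithm problem can be tried on a very small curve. The sketch below is a classical toy illustration only: the curve y^2 = x^3 + 2x + 2 over GF(17) and the base point P = (5, 1) are a standard textbook example chosen here for illustration, not parameters from this paper, and the exhaustive search has nothing to do with Shor's algorithm. It requires Python 3.8 or later for modular inverses via `pow`.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]   # None stands for the point at infinity O

P_MOD, A, B = 17, 2, 2              # toy curve y^2 = x^3 + 2x + 2 over GF(17)

def ec_add(p1: Point, p2: Point) -> Point:
    """Add two points of the curve using the usual chord-and-tangent rule."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                  # P + (-P) = O
    if p1 == p2:                                     # tangent slope
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:                                            # chord slope
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def ec_mul(d: int, p: Point) -> Point:
    """Compute dP by double-and-add."""
    result: Point = None
    while d > 0:
        if d & 1:
            result = ec_add(result, p)
        p = ec_add(p, p)
        d >>= 1
    return result

def discrete_log(base: Point, q: Point) -> int:
    """Find the smallest d >= 0 with d * base == q by exhaustive search."""
    acc: Point = None
    d = 0
    while acc != q:
        acc = ec_add(acc, base)
        d += 1
        if acc is None:
            raise ValueError("q is not in the subgroup generated by base")
    return d

if __name__ == "__main__":
    P = (5, 1)
    Q = ec_mul(7, P)
    print(Q, discrete_log(P, Q))
```

The exhaustive search takes time proportional to |〈P〉|, which is exponential in the length of its binary representation; this is the cost that Shor's algorithm avoids.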
Proos et al. constructed an efficient quantum circuit for Shor’s discrete logarithm algorithm
for elliptic curves over GF(p) [5]. Let n be the length of the binary representation for p. The
depth and size of the circuit are O(n^3). The dominant cost is O(n^2) applications of an O(n)-depth
O(n)-size quantum circuit for ADDn with n ancillary qubits. For counting the number of qubits
in the circuit, it suffices to count the number of qubits in the circuit for division in GF(p) that
maps |x〉|y〉 to |x〉|y/x〉 for x (≠ 0), y ∈ GF(p). The circuit for division in GF(p) uses about 5n
qubits: 2n qubits are used for the input register and about 3n qubits are used in the circuit for
the extended Euclidean algorithm. In the circuit for the extended Euclidean algorithm, about 2n
qubits are used for the input binary numbers and intermediate results, and n qubits are used for
ancillary qubits during ADDn.
By simply replacing Proos et al.’s circuit for ADDn with our circuit in Section 2, we can eliminate
the n ancillary qubits during ADDn since our circuit for ADDn does not use any ancillary qubits.
The resulting circuit uses about 4n qubits. Since Proos et al. do not describe the precise depth
or size of their circuit for ADDn, we cannot compare the depth or size of the resulting circuit
with that of the original one precisely. However, the depth and size of our circuit for ADDn are
asymptotically the same as those of Proos et al.’s. Thus, the depth and size of the resulting circuit
are asymptotically the same as those of the original circuit.
By adding o(n) ancillary qubits to the circuit obtained above, we can decrease the depth asymp-
totically. As shown in Section 3, for any d(n) = Ω(log n), we have an O(d(n))-depth O(n)-size
circuit for ADDn with O(n/d(n)) ancillary qubits. If we use this circuit as above, we obtain an
O(n^2 d(n))-depth O(n^3)-size circuit for Shor’s discrete logarithm algorithm with 4n + O(n/d(n))
qubits. Moreover, as shown in Section 4, if we are allowed to use unbounded fan-out gates with
length O(n^ε) for an arbitrary small positive constant ε, we have an O(e(n))-depth O(n)-size circuit
for ADDn with o(n) ancillary qubits for any e(n) = Ω(log∗ n). This circuit yields an O(n^2 e(n))-depth
O(n^3)-size circuit for Shor’s discrete logarithm algorithm with 4n + o(n) qubits. We can also
use the previous circuits for ADDn to improve Proos et al.’s circuit. However, they do not yield
more efficient quantum circuits for Shor’s discrete logarithm algorithm than our circuits described
above. This is simply because our circuits for ADDn are more efficient than the previous ones.
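The qubit counts discussed in this section can be tabulated for concrete input lengths. The sketch below uses only the approximate leading terms quoted in the text (about 5n and about 4n qubits) and, for the logarithmic-depth adder, the approximate ancillary count 3n/ log n from the example in Section 3; base-2 logarithms are assumed and the function names are hypothetical.

```python
import math

def qubits_proos(n: int) -> int:
    """About 5n qubits: the adder keeps n ancillary qubits."""
    return 5 * n

def qubits_no_ancilla_adder(n: int) -> int:
    """About 4n qubits: the adder uses no ancillary qubits."""
    return 4 * n

def qubits_log_depth_adder(n: int) -> int:
    """About 4n + 3n / log n qubits for the O(log n)-depth adder."""
    return 4 * n + math.ceil(3 * n / math.log2(n))

if __name__ == "__main__":
    for n in (160, 256, 512):
        print(n, qubits_proos(n), qubits_no_ancilla_adder(n),
              qubits_log_depth_adder(n))
```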
6 Conclusions and Future Work
We constructed an O(n)-depth O(n)-size quantum circuit for ADDn with no ancillary qubits. The
size is less than that of any other quantum circuit ever constructed for ADDn with no ancillary
qubits. Using the circuit, we proposed a method for constructing a small-size quantum circuit for
ADDn with a small number of qubits that has a given depth. In particular, we showed that, if we are
allowed to use unbounded fan-out gates with length O(nε) for an arbitrary small positive constant
ε, we can construct an O(log∗ n)-depth O(n)-size circuit with o(n) ancillary qubits. We applied
our circuits to constructing efficient quantum circuits for Shor’s discrete logarithm algorithm.
Interesting challenges would be to find ways of improving the quantum circuits described in this
paper. For example, can we construct an O(log n)-depth O(n)-size quantum circuit for ADDn with
O(1) ancillary qubits? Can we construct an O(1)-depth O(n)-size quantum circuit for ADDn with
O(n) ancillary qubits using unbounded fan-out gates? In the classical case, we cannot construct
an O(1)-depth O(n)-size (that is, the number of edges) circuit for addition with unbounded fan-in
and fan-out gates [21].
Acknowledgments
The authors thank Yasuhito Kawano and Go Kato for their helpful comments.
References
[1] P. W. Shor (1994), Algorithms for quantum computation: discrete logarithms and factoring,
In Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science,
pages 124–134.
[2] V. Vedral, A. Barenco, and A. Ekert (1996), Quantum networks for elementary arithmetic
operations, Physical Review A, 54(1):147–153.
[3] C. Zalka (1998), Fast versions of Shor’s quantum factoring algorithm, quant-ph/9806084.
[4] S. Beauregard (2003), Circuit for Shor’s algorithm using 2n+ 3 qubits, Quantum Information
and Computation, 3(2):175–185.
[5] J. Proos and C. Zalka (2003), Shor’s discrete logarithm quantum algorithm for elliptic curves,
Quantum Information and Computation, 3(4):317–344.
[6] A. G. Fowler, S. J. Devitt, and L. C. L. Hollenberg (2004), Implementation of Shor’s algorithm
on a linear nearest neighbour qubit array, Quantum Information and Computation, 4(4):237–
251.
[7] Y. Takahashi and N. Kunihiro (2006), A quantum circuit for Shor’s factoring algorithm using
2n+ 2 qubits, Quantum Information and Computation, 6(2):184–192.
[8] F. Green, S. Homer, C. Moore, and C. Pollett (2002), Counting, fanout, and the complexity
of quantum ACC, Quantum Information and Computation, 2(1):35–65.
[9] P. Høyer and R. Špalek (2005), Quantum fan-out is powerful, Theory of Computing, 1(5):81–
103.
[10] T. G. Draper (2000), Addition on a quantum computer, quant-ph/0008033.
[11] Y. Takahashi and N. Kunihiro (2005), A linear-size quantum circuit for addition with no
ancillary qubits, Quantum Information and Computation, 5(6):440–448.
[12] T. G. Draper, S. A. Kutin, E. M. Rains, and K. M. Svore (2006), A logarithmic-depth quantum
carry-lookahead adder, Quantum Information and Computation, 6(4&5):351–369.
[13] Y. Takahashi and N. Kunihiro (2008), A fast quantum circuit for addition with few qubits,
Quantum Information and Computation, 8(6&7):636–649.
[14] A. K. Chandra, S. Fortune, and R. Lipton (1983), Unbounded fan-in circuits and associative
functions, In Proceedings of the 15th Annual ACM Symposium on Theory of Computing,
pages 52–60.
[15] M. Fang, S. Fenner, F. Green, S. Homer, and Y. Zhang (2006), Quantum lower bounds for
fanout, Quantum Information and Computation, 6(1):46–57.
[16] N. Pippenger (1987), The complexity of computations by networks, IBM Journal of Research
and Development, 31(2):235–243.
[17] M. A. Nielsen and I. L. Chuang (2000), Quantum Computation and Quantum Information,
Cambridge University Press.
[18] S. A. Cuccaro, T. G. Draper, S. A. Kutin, and D. P. Moulton (2005), A new quantum ripple-
carry addition circuit, The Eighth Workshop on Quantum Information Processing. Also on
quant-ph/0410184.
[19] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator,
J. A. Smolin, and H. Weinfurter (1995), Elementary gates for quantum computation, Physical
Review A, 52(5):3457–3467.
[20] D. Hankerson, A. Menezes, and S. Vanstone (2003), Guide to Elliptic Curve Cryptography,
Springer.
[21] D. Dolev, C. Dwork, N. Pippenger, and A. Wigderson (1983), Superconcentrators, generaliz-
ers and generalized connectors with limited depth, In Proceedings of the 15th Annual ACM
Symposium on Theory of Computing, pages 42–51.
