We study the quantum complexity class $\mathrm{QNC}^0_f$ of quantum operations implementable exactly by constant-depth polynomial-size quantum circuits with unbounded fan-out gates. Our main result is that the quantum OR operation is in $\mathrm{QNC}^0_f$, which is an affirmative answer to the question posed by Høyer and Špalek. In sharp contrast to the strict hierarchy of the classical complexity classes, $\mathrm{NC}^0 \subsetneq \mathrm{AC}^0 \subsetneq \mathrm{TC}^0$, our result together with Høyer and Špalek's implies the collapse of the hierarchy of the corresponding quantum classes: $\mathrm{QNC}^0_f = \mathrm{QAC}^0_f = \mathrm{QTC}^0_f$. Then, we show that there exists a constant-depth subquadratic-size quantum circuit for the quantum threshold operation. This allows us to obtain a better bound on the size difference between the $\mathrm{QNC}^0_f$ and $\mathrm{QTC}^0_f$ circuits for implementing the same operation. Lastly, we show that, if the quantum Fourier transform modulo a prime is in $\mathrm{QNC}^0_f$, there exists a polynomial-time exact classical algorithm for a discrete logarithm problem using a $\mathrm{QNC}^0_f$ oracle. This implies that, under a plausible assumption, there exists a classically hard problem that is solvable exactly by a $\mathrm{QNC}^0_f$ circuit with gates for the quantum Fourier transform.
Introduction
cc 25 (2016)

In 2001, Moore and Nilsson initiated the study of the quantum complexity class QNC (Moore & Nilsson 2001), the quantum analog of the classical complexity class NC. It consists of quantum operations implementable by polylogarithmic-depth polynomial-size quantum circuits. This line of research is important for understanding the difference between efficient parallel quantum computation and its classical counterpart. In particular, parallel quantum computation in constant time, i.e., computation by constant-depth polynomial-size quantum circuits, has been studied and found to be somewhat surprisingly powerful in various settings (Fenner et al. 2005; Green et al. 2002; Høyer & Špalek 2005), compared with its classical counterpart (Vollmer 1999). It is of great interest to study the computational power of such quantum circuits further.
In this paper, we analyze the computational power of constant-depth polynomial-size quantum circuits, which also allows us to analyze that of polylogarithmic-depth ones. The elementary gates are one-qubit gates, CNOT gates, and unbounded fan-out gates (unless otherwise stated). The unbounded fan-out gate is an analog of the classical one, which is normally assumed to be an elementary gate in the study of the computational power of small-depth classical circuits (Vollmer 1999). The gate on $n + 1$ qubits makes $n$ copies of a classical source bit in a superposition; when $n = 1$, it is a CNOT gate. It is interesting to treat this gate as elementary since its use clarifies many differences between quantum and classical circuits (Green et al. 2002; Høyer & Špalek 2005) and connects the quantum circuit model with the one-way model.
There are three important settings for studying constant-depth classical circuits. All of them allow the use of (classical) unbounded fan-out gates. The first setting deals with constant-depth polynomial-size classical circuits consisting of NOT gates and of OR and AND gates with bounded fan-in. The classical complexity class $\mathrm{NC}^0$ is the class of problems solvable by (uniform families of) such circuits. The second setting is the first one augmented with OR and AND gates with unbounded fan-in, which defines the class $\mathrm{AC}^0$. The third setting is the second one augmented with threshold gates with unbounded fan-in, which defines the class $\mathrm{TC}^0$. The threshold gate implements the threshold function, which outputs 1 if and only if the Hamming weight of the input is at least a predetermined threshold. These classes form a strict hierarchy: $\mathrm{NC}^0 \subsetneq \mathrm{AC}^0 \subsetneq \mathrm{TC}^0$ (Furst et al. 1984; Vollmer 1999).
We regard the following settings as the quantum counterparts of the above settings (Green et al. 2002), where all the settings allow the use of unbounded fan-out gates. The first setting deals with constant-depth polynomial-size quantum circuits consisting of one-qubit and CNOT gates. The quantum complexity class $\mathrm{QNC}^0_f$, which corresponds to $\mathrm{NC}^0$, is the class of quantum operations implementable exactly by (uniform families of) such circuits. The second setting is the first one augmented with a quantum version of OR gates with unbounded fan-in, which defines the class $\mathrm{QAC}^0_f$, corresponding to $\mathrm{AC}^0$. The third setting is the second one augmented with a quantum version of threshold gates with unbounded fan-in, which defines the class $\mathrm{QTC}^0_f$, corresponding to $\mathrm{TC}^0$.
Høyer and Špalek considered the quantum complexity classes $\mathrm{BQNC}^0_f$, $\mathrm{BQAC}^0_f$, and $\mathrm{BQTC}^0_f$, which are the bounded-error versions of $\mathrm{QNC}^0_f$, $\mathrm{QAC}^0_f$, and $\mathrm{QTC}^0_f$, respectively (Høyer & Špalek 2005). They showed that the hierarchy of these classes collapses, i.e., $\mathrm{BQNC}^0_f = \mathrm{BQAC}^0_f = \mathrm{BQTC}^0_f$. On the other hand, although they showed that $\mathrm{QNC}^0_f \subseteq \mathrm{QAC}^0_f = \mathrm{QTC}^0_f$, it has not been known whether the hierarchy of the classes in the exact setting collapses. It is important to consider the exact setting since the computational power of exact quantum circuits is not well understood and there might be a significant difference in computational power between such circuits and their bounded-error versions.
First, in order to study the relationship between $\mathrm{QNC}^0_f$ and $\mathrm{QAC}^0_f$, we consider the question posed by Høyer and Špalek (Høyer & Špalek 2005) as to whether an $O(1)$-depth $\mathrm{poly}(n)$-size quantum circuit can be constructed for the quantum operation $\mathrm{OR}_n$, which computes the OR function on $n$ bits. They showed that there exists an $O(\log^* n)$-depth $O(n \log n)$-size quantum circuit for $\mathrm{OR}_n$. It is a repetition of the OR reduction, which is an $O(1)$-depth circuit that exactly reduces the computation of the OR function on $n$ bits to that on $O(\log n)$ bits. We give an affirmative answer to the question:
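Theorem 1.1. There exists an $O(1)$-depth $O(n \log n)$-size quantum circuit for $\mathrm{OR}_n$.

Theorem 1.1 implies that $\mathrm{OR}_n$ is in $\mathrm{QNC}^0_f$; combined with Høyer and Špalek's result $\mathrm{QNC}^0_f \subseteq \mathrm{QAC}^0_f = \mathrm{QTC}^0_f$, this gives $\mathrm{QNC}^0_f = \mathrm{QAC}^0_f = \mathrm{QTC}^0_f$. This is in sharp contrast to the strict hierarchy of the corresponding classical classes: $\mathrm{NC}^0 \subsetneq \mathrm{AC}^0 \subsetneq \mathrm{TC}^0$. More generally, Theorem 1.1 with Høyer and Špalek's result immediately implies that the hierarchy of polylogarithmic-depth exact quantum circuits collapses, i.e., $\mathrm{QNC}^k_f = \mathrm{QAC}^k_f = \mathrm{QTC}^k_f$.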
The collapse of the hierarchy obtained by Theorem 1.1 means that $\mathrm{QNC}^0_f$ circuits are at least as powerful as $\mathrm{TC}^0$ circuits. This is because $\mathrm{QNC}^0_f$ circuits are as powerful as $\mathrm{QTC}^0_f$ circuits, and $\mathrm{QTC}^0_f$ circuits can directly simulate $\mathrm{TC}^0$ circuits. Thus, for example, many arithmetic operations, including those useful for Shor's factoring and discrete logarithm algorithms (Shor 1997) such as iterated multiplication, are in $\mathrm{QNC}^0_f$ since they are in $\mathrm{TC}^0$ (Siu et al. 1993). This allows us to study the relationship between $\mathrm{QNC}^0_f$ and efficient classical computation, as described in the last part of this paper.
cc 25 (2016) Constant-depth exact quantum circuits 853

Our idea for constructing the circuit in Theorem 1.1 is that, after we apply Høyer and Špalek's OR reduction, we compute the OR function on $O(\log n)$ bits in depth $O(1)$ and with size exponential in $\log n$. The exponential-size circuit is based on the representation of the OR function as an $\mathbb{R}$-linear combination of exponentially many parity functions. Hoban et al. used this representation to provide a procedure for computing the OR function in a restricted model of measurement-based quantum computation (Hoban et al. 2011). Roughly speaking, our exponential-size circuit is a simulation of that procedure. The proof of Theorem 1.1 depends on the fact that, in a $\mathrm{QNC}^0_f$ circuit, an unbounded fan-out gate can be used as a parity gate, which implements the parity function (Green et al. 2002). We note, however, that the relationship $\mathrm{QNC}^0_f = \mathrm{QAC}^0_f$ cannot be derived only from the computational power of parity gates added to the corresponding classical circuit, i.e., to the $\mathrm{NC}^0$ circuit. This is because the OR function is not in $\mathrm{NC}^0$ even if parity gates are allowed in the $\mathrm{NC}^0$ circuit (Høyer & Špalek 2005).
Second, we derive a precise relationship between $\mathrm{QNC}^0_f$ and $\mathrm{QTC}^0_f$ circuits as an application of Theorem 1.1. To do this, we consider the problem of constructing an $O(1)$-depth small-size quantum circuit for the quantum operation $\mathrm{TH}^t_n$, which computes the threshold function on $n$ bits with an integer threshold $1 \le t \le n$. Theorem 1.1 directly yields an $O(1)$-depth $O(tn \log n)$-size quantum circuit for $\mathrm{TH}^t_n$ with $1 \le t \le n/2$ and an $O(1)$-depth $O((n - t + 1)n \log n)$-size circuit with $n/2 \le t \le n$. We show that, for any $t$ such that $\min\{t, n - t\}$ is non-constant, Theorem 1.1 yields a smaller circuit:
Theorem 1.2. There exist the following $O(1)$-depth quantum circuits for $\mathrm{TH}^t_n$:
• An $O(n \log n)$-size circuit for any $1 \le t \le \log n$ or $n - \log n \le t \le n$.
• An $O(n \sqrt{t} \log n)$-size circuit for any $\log n \le t \le n/2$.
• An $O(n \sqrt{n - t} \log n)$-size circuit for any $n/2 \le t \le n - \log n$.
854 Takahashi & Tani cc 25 (2016)

As described above, $\mathrm{QNC}^0_f$ circuits are as powerful as $\mathrm{QTC}^0_f$ circuits, up to polynomial size (and constant depth). The existence of the smaller circuits in Theorem 1.2 means that we can obtain a better bound on the size difference between the $\mathrm{QNC}^0_f$ and $\mathrm{QTC}^0_f$ circuits for implementing the same quantum operation. Let $U_n$ be a quantum operation on $n$ qubits, $C_n$ an optimal-size $\mathrm{QTC}^0_f$ circuit for $U_n$, and $s(n)$ a polynomial representing the size of $C_n$. Similarly, let $t(n)$ $(\ge s(n))$ be the optimal $\mathrm{QNC}^0_f$ circuit size. Although the definition of $\mathrm{QNC}^0_f$ only implies that $t(n)$ is $\mathrm{poly}(n)$, Theorem 1.2 tells us more: $t(n) = O(s(n)\sqrt{s(n)} \log n)$. To show this, we consider a $\mathrm{QNC}^0_f$ circuit for $U_n$ obtained by replacing every threshold gate in $C_n$ with the $\mathrm{QNC}^0_f$ circuit in Theorem 1.2. Then, the size of the resulting circuit is $O(s(n)\sqrt{s(n)} \log n)$ since it is the total size of each gate $G$ on $m_G$ qubits in the resulting circuit and the size of $G$ is bounded above by
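One way to complete this estimate is sketched below; it is our reconstruction of the counting argument, not the authors' text. We assume a constant $c$ such that every gate $G$ of $C_n$ on $m_G$ qubits is replaced by a circuit of size at most $c\,m_G\sqrt{m_G}\log(m_G+1)$: threshold gates qualify by Theorem 1.2 (since $\sqrt{t}, \sqrt{m_G - t} \le \sqrt{m_G}$), OR gates by Theorem 1.1, and one-qubit, CNOT, and fan-out gates are left unchanged. The remaining facts used are $m_G \le s(n)$, $\sum_G m_G = s(n)$ by the definition of size, and $\log s(n) = O(\log n)$ because $s(n)$ is a polynomial.

```latex
\begin{aligned}
t(n) &\le \sum_{G} c\, m_G \sqrt{m_G}\, \log(m_G + 1)
      \le c \sqrt{s(n)}\, \log\bigl(s(n) + 1\bigr) \sum_{G} m_G \\
     &= c\, s(n) \sqrt{s(n)}\, \log\bigl(s(n) + 1\bigr)
      = O\bigl(s(n)\sqrt{s(n)}\,\log n\bigr).
\end{aligned}
```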
A key ingredient of the circuits in Theorem 1.2 is an $O(1)$-depth $O(n^2)$-size quantum circuit for the quantum operation that computes the counting function on $n$ bits, which outputs the binary representation of the Hamming weight of the input. Our idea for constructing the circuit is that, after we apply Høyer and Špalek's OR reduction, we implement a particular type of quantum Fourier transform (QFT) on $O(\log n)$ qubits in depth $O(1)$ and with size exponential in $\log n$.
Lastly, we derive a relationship between $\mathrm{QNC}^0_f$ and efficient classical computation as an application of Theorem 1.1. More concretely, we study the existence of a classically hard problem that is solvable exactly by a $\mathrm{QNC}^0_f$ circuit, where a problem is said to be classically hard if it cannot be solved by a polynomial-time bounded-error classical algorithm. To do this, we consider the question of whether a polynomial-time exact classical algorithm using a $\mathrm{QNC}^0_f$ oracle can be constructed for a discrete logarithm problem (DLP) that seems classically hard. Here, the $\mathrm{QNC}^0_f$ oracle solves, in classical constant time, a problem that is solvable exactly by a $\mathrm{QNC}^0_f$ circuit. The existence of such an algorithm for the DLP means that $\mathrm{QNC}^0_f$ is strictly stronger than efficient classical computation in terms of solving a particular problem under a plausible assumption. More precisely, it implies that, under the plausible assumption that the DLP is classically hard, there exists a classically hard problem that is solvable exactly by a $\mathrm{QNC}^0_f$ circuit. This is because the algorithm for the DLP, combined with a polynomial-time bounded-error classical simulation of the $\mathrm{QNC}^0_f$ oracle, would imply that the DLP is not classically hard.
Based on Shor's polynomial-time bounded-error quantum algorithm for the general DLP (Shor 1997), Høyer and Špalek showed that there exists a polynomial-time bounded-error classical algorithm using a bounded-error version of the $\mathrm{QNC}^0_f$ oracle (Høyer & Špalek 2005). It is, however, difficult to transform that algorithm into an exact one. Based on Theorem 1.1 and van Dam's polynomial-time exact quantum algorithm for the general DLP (van Dam 2003), we show that, under an assumption about the QFT, the desired algorithm exists for a particular type of DLP that seems classically hard:

Theorem 1.3. Let $q$ be a safe prime, i.e., a prime of the form $2p + 1$ for some prime $p$, and let $n = \lceil \log q \rceil$. If the QFT modulo $p$ is in $\mathrm{QNC}^0_f$, there exists a $\mathrm{poly}(n)$-time exact classical algorithm for the DLP over the multiplicative group of integers modulo $q$ using the $\mathrm{QNC}^0_f$ oracle.
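To make the problem in Theorem 1.3 concrete, the sketch below enumerates small safe primes and solves the DLP by exhaustive search. It assumes the standard formulation (given a generator $g$ of $\mathbb{Z}_q^*$ and an element $h$, find $x$ with $g^x \equiv h \pmod q$); the helper names are hypothetical. The search takes time exponential in $n = \lceil \log q \rceil$ and only illustrates the problem statement, not the algorithm with the $\mathrm{QNC}^0_f$ oracle. The generator test relies on $\mathbb{Z}_q^*$ having order $2p$ when $q = 2p + 1$.

```python
def is_prime(n):
    """Trial division; adequate for the tiny moduli used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_safe_prime(q):
    """q is a safe prime if q = 2p + 1 with both p and q prime."""
    return q % 2 == 1 and is_prime(q) and is_prime((q - 1) // 2)


def is_generator(g, q):
    """For a safe prime q = 2p + 1 the group Z_q^* has order 2p, so g generates it
    exactly when g^2 != 1 and g^p != 1 (mod q)."""
    p = (q - 1) // 2
    return pow(g, 2, q) != 1 and pow(g, p, q) != 1


def brute_force_dlog(g, h, q):
    """Exhaustive search for x in [0, q - 2] with g^x = h (mod q).
    Runs in time exponential in n = ceil(log2 q); for illustration only."""
    acc = 1
    for x in range(q - 1):
        if acc == h % q:
            return x
        acc = acc * g % q
    return None


print(brute_force_dlog(5, 8, 23))  # -> 6
```

Exhaustive search is used only because it is obviously correct on toy moduli; it says nothing about the hardness assumption in the text.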
As in the cryptographic literature, we assume that there exist infinitely many safe primes. Since we require the assumption about the QFT modulo $p$, Theorem 1.3 does not imply the existence of the above-mentioned problem. However, it implies that, under the plausible assumption that the DLP in Theorem 1.3 is classically hard, there exists a classically hard problem that is solvable exactly by a $\mathrm{QNC}^0_f$ circuit with gates for the QFT modulo $p$. Theorem 1.3 suggests the following key problem for further understanding the relationship between $\mathrm{QNC}^0_f$ and efficient classical or quantum computation: Is the QFT modulo $p$ in $\mathrm{QNC}^0_f$? If so, Theorem 1.3 implies the existence of a classically hard problem that is solvable exactly by a $\mathrm{QNC}^0_f$ circuit (under a plausible assumption). If not, $\mathrm{QNC}^0_f$ is strictly weaker than efficient quantum computation; more precisely, it is strictly contained in the class of quantum operations implementable approximately (or even exactly) by polynomial-size quantum circuits. This is because the QFT modulo $p$ is in the latter class (Hales & Hallgren 2000; Høyer & Špalek 2005; Mosca & Zalka 2004). We leave the problem about the QFT modulo $p$ as an open problem.
The main components of van Dam's exact quantum algorithm for the DLP in Theorem 1.3 are the QFT modulo $p$, arithmetic operations such as modular exponentiation, and an amplitude amplification procedure (Brassard & Høyer 1997; Brassard et al. 2002). Our rigorous analysis of the algorithm shows that these components, except for the QFT, can be implemented by using the OR functions and iterated multiplications with values precomputed by polynomial-time exact classical algorithms. This analysis with Theorem 1.1 implies Theorem 1.3.
The remainder of this paper is organized as follows. In Section 2, we give some definitions and the idea of the OR reduction to describe our results precisely. In Sections 3 and 4, we describe the circuits in Theorems 1.1 and 1.2, respectively. In Section 5, we describe the algorithm in Theorem 1.3. In Section 6, we conclude this paper and give some open problems.
Preliminaries
2.1. Quantum circuits and complexity classes. We use the standard notation for quantum states and the standard diagrams for quantum circuits (Nielsen & Chuang 2000). A quantum circuit consists of elementary gates, where the elementary gates are one-qubit gates, CNOT gates, and unbounded fan-out gates (unless otherwise stated). A fan-out gate on $k + 1$ qubits implements the quantum operation defined as
$$|y\rangle|x_1\rangle \cdots |x_k\rangle \mapsto |y\rangle|x_1 \oplus y\rangle \cdots |x_k \oplus y\rangle,$$
where $y, x_j \in \{0, 1\}$, $k \ge 1$, and $\oplus$ denotes addition modulo 2. The first input qubit, i.e., the qubit in state $|y\rangle$, is called the control qubit. When $k = 1$, the gate is a CNOT gate. When a fan-out gate is allowed to be applied on a non-constant number of qubits, it is called an unbounded fan-out gate. Since an unbounded fan-out gate makes copies of a classical source bit, we may say "copy" when we apply this gate.
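The observation of Green et al. (2002), recalled in Section 1, that a fan-out gate can serve as a parity gate follows from conjugating it by Hadamard gates on all of its qubits: in the Hadamard basis the roles of control and targets swap, so the former control qubit receives the XOR of the targets. The minimal pure-Python state-vector sketch below checks this identity for small $k$. The helper names are ours, and qubit 0 (bit 0 of a basis index) plays the role of the control qubit $|y\rangle$.

```python
import math


def apply_hadamard_all(state, num_qubits):
    """Apply H to every qubit of a state vector (qubit q is bit q of the index)."""
    s = 1 / math.sqrt(2)
    for q in range(num_qubits):
        new = [0j] * len(state)
        for b, amp in enumerate(state):
            if amp == 0:
                continue
            bit = (b >> q) & 1
            new[b & ~(1 << q)] += s * amp                    # <0|H|bit> = 1/sqrt(2)
            new[b | (1 << q)] += (-s if bit else s) * amp    # <1|H|bit> = (-1)^bit/sqrt(2)
        state = new
    return state


def apply_fanout(state, num_qubits):
    """Fan-out with qubit 0 as control: |y>|x_1..x_k> -> |y>|x_1+y..x_k+y> (mod 2)."""
    targets = ((1 << num_qubits) - 1) & ~1                 # every qubit except the control
    new = [0j] * len(state)
    for b, amp in enumerate(state):
        new[b ^ targets if b & 1 else b] += amp
    return new


def fanout_conjugated_is_parity(k, tol=1e-9):
    """Check that H^(k+1) FANOUT H^(k+1) maps |y>|x> to |y + x_1 + ... + x_k>|x>."""
    n = k + 1
    dim = 1 << n
    for b in range(dim):
        state = [0j] * dim
        state[b] = 1
        out = apply_hadamard_all(apply_fanout(apply_hadamard_all(state, n), n), n)
        expected = b ^ (bin(b >> 1).count("1") % 2)      # flip qubit 0 iff targets are odd
        if any(abs(amp - (1 if c == expected else 0)) > tol for c, amp in enumerate(out)):
            return False
    return True


print(fanout_conjugated_is_parity(3))  # -> True
```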
The complexity measures of a quantum circuit are its size and depth. The size of a quantum circuit is defined as the total size of all elementary gates in it, where the size of an elementary gate is defined as the number of qubits on which the gate acts. The depth of a quantum circuit is defined as follows. Input qubits are considered to have depth 0. The depth of each gate $G$ is equal to 1 plus the maximum depth of a gate on which $G$ depends. The depth of a quantum circuit is defined as the maximum depth of a gate in it. Intuitively, the depth is the number of layers in the circuit, where a layer consists of gates that can be applied in parallel. A quantum circuit can use ancillary qubits initialized to $|0\rangle$.
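The definitions of size and depth can be made concrete with a small helper. In the sketch below, which is ours, a circuit is a list of gates in the order they are applied, each gate given by the tuple of qubits it acts on; we take "$G$ depends on $G'$" to mean that $G'$ is an earlier gate sharing a qubit with $G$, directly or through a chain of such gates. Tracking the depth of the last gate on each qubit captures such chains, because depths strictly increase along each qubit.

```python
def size_and_depth(gates):
    """Size and depth of a circuit given as a sequence of gates.

    Each gate is a tuple of the qubit indices it acts on, listed in application
    order. Size is the total number of qubits acted on by all gates; a gate's
    depth is 1 plus the largest depth of an earlier gate sharing a qubit with
    it (input qubits have depth 0).
    """
    qubit_depth = {}          # depth of the last gate that touched each qubit
    size = depth = 0
    for qubits in gates:
        size += len(qubits)
        d = 1 + max((qubit_depth.get(q, 0) for q in qubits), default=0)
        for q in qubits:
            qubit_depth[q] = d
        depth = max(depth, d)
    return size, depth


# A fan-out on qubits 0..3, then two Hadamards and a CNOT that can run in parallel.
print(size_and_depth([(0, 1, 2, 3), (1,), (2,), (0, 4)]))  # -> (8, 2)
```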
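For any $a = a_0 \cdots a_{n-1} \in \{0, 1\}^n \setminus \{0^n\}$, the parity function with value $a$ on $n$ bits, denoted as $\mathrm{PA}^a_n$, is defined as $\mathrm{PA}^a_n(x) = \bigoplus_{j=0}^{n-1} a_j x_j$ for $x = x_0 \cdots x_{n-1} \in \{0, 1\}^n$.
We denote $\mathrm{PA}^{1^n}_n$ as $\mathrm{PA}_n$. For example, $\mathrm{PA}^{10}_2(x) = x_0$, $\mathrm{PA}^{01}_2(x) = x_1$, and $\mathrm{PA}^{11}_2(x) = \mathrm{PA}_2(x) = x_0 \oplus x_1$. For any integer $1 \le t \le n$, the threshold function with a threshold $t$ on $n$ bits, denoted as $\mathrm{TH}^t_n$, is defined as $\mathrm{TH}^t_n(x) = 1$ if $|x| \ge t$ and $0$ otherwise, where $x = x_0 \cdots x_{n-1} \in \{0, 1\}^n$ and $|x| = \sum_{j=0}^{n-1} x_j$ is the Hamming weight of $x$. The OR function on $n$ bits, denoted as $\mathrm{OR}_n$, is defined as $\mathrm{TH}^1_n$. The AND function on $n$ bits, denoted as $\mathrm{AND}_n$, is defined as $\mathrm{TH}^n_n$. The exact function with value $t$ on $n$ bits, denoted as $\mathrm{EX}^t_n$, is defined in the same way as $\mathrm{TH}^t_n$ except that $|x| \ge t$ is replaced with $|x| = t$. The function $\mathrm{EX}^0_n$ is defined as the negation of $\mathrm{OR}_n$. The quantum operation for $\mathrm{PA}^a_n$ is defined as
$$|x_0\rangle \cdots |x_{n-1}\rangle|z\rangle \mapsto |x_0\rangle \cdots |x_{n-1}\rangle|z \oplus \mathrm{PA}^a_n(x)\rangle,$$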
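where $x_j, z \in \{0, 1\}$ and $x = x_0 \cdots x_{n-1}$. The last qubit is called the output qubit. This operation is also denoted as $\mathrm{PA}^a_n$. The quantum operations $\mathrm{TH}^t_n$, $\mathrm{OR}_n$, $\mathrm{AND}_n$, and $\mathrm{EX}^t_n$ are defined similarly. For any integer $m > 0$, the quantum Fourier transform modulo $m$, denoted as $F_m$, is the quantum operation on $\lceil \log m \rceil$ qubits defined as
$$F_m|x\rangle = \frac{1}{\sqrt{m}} \sum_{y=0}^{m-1} \omega_m^{xy}|y\rangle,$$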
where $0 \le x \le m - 1$ and $\omega_m = e^{2\pi i/m}$. The base of the logarithm in this paper is 2. The quantum complexity class $\mathrm{QNC}^0_f$ is the class of quantum operations implementable exactly by (uniform families of) constant-depth polynomial-size quantum circuits consisting of the elementary gates described above. The class $\mathrm{QAC}^0_f$ is defined in the same way as $\mathrm{QNC}^0_f$ except that quantum circuits can use a gate for $\mathrm{OR}_k$ as an elementary gate for any $k$ bounded above by an arbitrary $\mathrm{poly}(n)$ for input length $n$. The class $\mathrm{QTC}^0_f$ is defined in the same way as $\mathrm{QAC}^0_f$ except that quantum circuits can use a gate for $\mathrm{TH}^t_k$ as an elementary gate for any $k$ bounded above by an arbitrary $\mathrm{poly}(n)$ and $1 \le t \le k$. Although some authors assume that quantum circuits can use only a bounded number of distinct one-qubit gates (Høyer & Špalek 2005), we do not assume this since we consider the exact setting. Thus, the complexity classes in this paper are equal to or larger than those in the papers that considered only a bounded number of distinct one-qubit gates. We note, however, that the only one-qubit gates used in our circuits in Sections 3 and 4 are Hadamard gates $H$ and $Z(\pm\pi/2^k)$ gates for any integer $k \ge 0$, where
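The Boolean functions defined above transcribe directly into code. The sketch below is ours; bit strings such as $a = a_0 \cdots a_{n-1}$ and $x$ are represented as tuples of 0/1 integers, and the function names are hypothetical. It also checks the three $\mathrm{PA}$ examples given in the text.

```python
from itertools import product


def pa(a, x):
    """PA^a_n(x): XOR of the bits x_j with a_j = 1 (a and x are 0/1 tuples)."""
    return sum(aj & xj for aj, xj in zip(a, x)) % 2


def th(t, x):
    """TH^t_n(x) = 1 iff the Hamming weight |x| is at least t."""
    return int(sum(x) >= t)


def ex(t, x):
    """EX^t_n(x) = 1 iff |x| = t; EX^0_n is the negation of OR_n."""
    return int(sum(x) == t)


def or_n(x):
    return th(1, x)


def and_n(x):
    return th(len(x), x)


# The PA examples from the text, checked on all inputs of length 2.
for x in product((0, 1), repeat=2):
    assert pa((1, 0), x) == x[0] and pa((0, 1), x) == x[1]
    assert pa((1, 1), x) == x[0] ^ x[1]
```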
2.2. Høyer and Špalek's OR reduction.
The OR reduction mentioned in Section 1 is an $O(1)$-depth $O(n \log n)$-size quantum circuit for exactly reducing the problem of computing $\mathrm{OR}_n$ to that of computing $\mathrm{OR}_m$, where $m = \lceil \log(n + 1) \rceil$. We explain the idea of the circuit, which will be used in our circuits. We want to compute $\mathrm{OR}_n$ and let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state, where
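for any $0 \le k \le m - 1$. If $|x| = |x_0 \cdots x_{n-1}| = 0$, then $H|\varphi_k\rangle = |0\rangle$ for any $0 \le k \le m - 1$, and thus the output state is $|0\rangle^{\otimes m}$. If $|x| \ge 1$, there exist $0 \le a \le m - 1$ and $b \ge 0$ such that $|x| = 2^a(2b + 1)$. A direct calculation shows that $H|\varphi_a\rangle = |1\rangle$, and thus the output state is orthogonal to $|0\rangle^{\otimes m}$. Therefore, the circuit exactly reduces the problem of computing $\mathrm{OR}_n$ to that of computing $\mathrm{OR}_m$.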
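These claims can be checked numerically for small $n$. The sketch below is ours and assumes the form $|\varphi_k\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{\pi i |x|/2^k}|1\rangle)$, which is consistent with both calculations above; the circuit that prepares these states is not reproduced. Because the reduced register is a product state, the probability of reading $|0\rangle^{\otimes m}$ is a product of single-qubit probabilities, and it should be 1 when $|x| = 0$ and 0 otherwise.

```python
import cmath
import math


def phi(weight, k):
    """Assumed |phi_k> = (|0> + e^{i*pi*weight/2^k}|1>)/sqrt(2), as (amp0, amp1)."""
    s = 1 / math.sqrt(2)
    return (s, s * cmath.exp(1j * math.pi * weight / 2 ** k))


def hadamard(v):
    s = 1 / math.sqrt(2)
    return (s * (v[0] + v[1]), s * (v[0] - v[1]))


def prob_all_zero(weight, n):
    """Probability that H|phi_0> ... H|phi_{m-1}> is measured as |0...0>."""
    m = n.bit_length()                   # equals ceil(log2(n + 1)) for n >= 1
    p = 1.0
    for k in range(m):
        p *= abs(hadamard(phi(weight, k))[0]) ** 2
    return p


# |x| = 12 = 2^2 * 3, so H|phi_2> should be |1>.
print(round(abs(hadamard(phi(12, 2))[1]), 9))  # -> 1.0
```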
Circuit for the OR function
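3.1. Exponential-size circuit. For any Boolean function $f_n : \{0, 1\}^n \to \{0, 1\}$ satisfying $f_n(0^n) = 0$, there exists a set of real numbers $\{r_a\}_{a \in \{0,1\}^n \setminus \{0^n\}}$ such that, for any $x \in \{0, 1\}^n$,
$$f_n(x) = \sum_{a \in \{0,1\}^n \setminus \{0^n\}} r_a \,\mathrm{PA}^a_n(x).$$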
This is shown by using the Fourier expansion of $f_n$ (O'Donnell 2008), more precisely, by replacing the Fourier basis in the Fourier expansion of $f_n$ with a basis consisting of the parity functions $\mathrm{PA}^a_n$. In particular, the following representation of $\mathrm{OR}_n$ can be obtained by using the Fourier expansion of $\mathrm{OR}_n$:

Lemma 3.1. For any $x \in \{0, 1\}^n$, $\mathrm{OR}_n(x) = \frac{1}{2^{n-1}} \sum_{a \in \{0,1\}^n \setminus \{0^n\}} \mathrm{PA}^a_n(x)$.
Proof. We show this by induction on $n$ (without using the Fourier expansion of $\mathrm{OR}_n$ explicitly). It is obvious that the lemma holds when $n = 1$. We assume that it holds when $n = k$. For any
where the induction hypothesis implies the second equation. The value is equal to $\mathrm{OR}_{k+1}(x_0 \cdots x_k)$, and thus the lemma holds when $n = k + 1$, as desired.
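Lemma 3.1 can also be seen by counting: for $x \neq 0^n$, exactly half of all $a \in \{0,1\}^n$, namely $2^{n-1}$ of them, give $\bigoplus_j a_j x_j = 1$, while for $x = 0^n$ every parity is 0. The short sketch below (helper names are ours) verifies the identity exactly, in integer arithmetic, for small $n$.

```python
from itertools import product


def parity(a, x):
    return sum(ai & xi for ai, xi in zip(a, x)) % 2


def check_lemma_3_1(n):
    """Exact integer check of 2^(n-1) * OR_n(x) == sum over a != 0^n of PA^a_n(x)."""
    nonzero = [a for a in product((0, 1), repeat=n) if any(a)]
    for x in product((0, 1), repeat=n):
        if sum(parity(a, x) for a in nonzero) != 2 ** (n - 1) * int(any(x)):
            return False
    return True


print([check_lemma_3_1(n) for n in range(1, 6)])  # -> [True, True, True, True, True]
```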
Using the above-mentioned representation of $f_n$, Hoban et al. provided a procedure for computing $f_n$ in a restricted model of measurement-based quantum computation, in which the adaptivity of measurements is removed (Hoban et al. 2011). Roughly speaking, a simulation of the procedure yields an $O(1)$-depth $O(n2^n)$-size quantum circuit for $\mathrm{OR}_n$. More concretely, when the input $x$ is given, we compute $\mathrm{PA}^a_n(x)$ for every $a$ in parallel and prepare the state $\frac{1}{\sqrt{2}}\bigl(|0\rangle^{\otimes(2^n-1)} + (-1)^{\mathrm{OR}_n(x)}|1\rangle^{\otimes(2^n-1)}\bigr)$ based on the representation of $\mathrm{OR}_n$. Applying an unbounded fan-out gate and a Hadamard gate to this state gives the desired state $|\mathrm{OR}_n(x)\rangle$. The point is that there exists an $O(1)$-depth $O(|a|)$-size quantum circuit for $\mathrm{PA}^a_n$, as depicted in Figure 3.1 (Green et al. 2002). The details are as follows:

Lemma 3.2. There exists an $O(1)$-depth $O(n2^n)$-size quantum circuit for $\mathrm{OR}_n$.

Proof. Let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. The circuit is described as follows:
1. Copy the input state $|x\rangle$ and apply the circuit for $\mathrm{PA}^a_n$ to each copy for every $a \in \{0, 1\}^n \setminus \{0^n\}$ in parallel to prepare the state $\bigotimes_{a \in \{0,1\}^n \setminus \{0^n\}} |\mathrm{PA}^a_n(x)\rangle$.
2. Apply a Hadamard gate and an unbounded fan-out gate to ancillary qubits (initialized to $|0\rangle$) to prepare the $(2^n - 1)$-qubit state $\frac{1}{\sqrt{2}}\bigl(|0\rangle^{\otimes(2^n-1)} + |1\rangle^{\otimes(2^n-1)}\bigr)$.
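3. Apply controlled-$Z(\pi/2^{n-1})$ gates in parallel to the states in Steps 1 and 2 to prepare the state
$$\frac{1}{\sqrt{2}}\Bigl(|0\rangle^{\otimes(2^n-1)} + e^{\frac{\pi i}{2^{n-1}}\sum_{a} \mathrm{PA}^a_n(x)}|1\rangle^{\otimes(2^n-1)}\Bigr) = \frac{1}{\sqrt{2}}\Bigl(|0\rangle^{\otimes(2^n-1)} + (-1)^{\mathrm{OR}_n(x)}|1\rangle^{\otimes(2^n-1)}\Bigr),$$
where the sum is over $a \in \{0,1\}^n \setminus \{0^n\}$ and Lemma 3.1 implies the equation.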
4. Apply an unbounded fan-out gate and a Hadamard gate to the state in Step 3 to prepare the desired state $|\mathrm{OR}_n(x)\rangle$.
For any $0 \le j \le n - 1$, let $e(j) = e_0 \cdots e_{n-1} \in \{0, 1\}^n$ be such that $e_k = 1$ if $k = j$ and $0$ otherwise. In Step 1, since the input state $|x_j\rangle$ equals $|\mathrm{PA}^{e(j)}_n(x)\rangle$, it suffices to prepare the state $|\mathrm{PA}^a_n(x)\rangle$ for every $a \in \{0, 1\}^n$ such that $|a| \ge 2$. To prepare these states in parallel, we require the state $|x_j\rangle^{\otimes(2^{n-1}-1)}$ for any $0 \le j \le n - 1$. Thus, before applying the circuit for $\mathrm{PA}^a_n$, we apply an unbounded fan-out gate to the input qubit in state $|x_j\rangle$ and $2^{n-1} - 1$ ancillary qubits for every $0 \le j \le n - 1$ in parallel. In Step 2, we apply an unbounded fan-out gate to the ancillary qubits in state $H|0\rangle \otimes |0\rangle^{\otimes(2^n-2)}$. In Step 3, we use the qubit in state $|\mathrm{PA}^a_n(x)\rangle$ as the control qubit of the controlled-$Z(\pi/2^{n-1})$ gate. In Step 4, we first apply an unbounded fan-out gate to the state in Step 3 to disentangle the last $2^n - 2$ qubits and obtain the state
$$\frac{1}{\sqrt{2}}\bigl(|0\rangle + (-1)^{\mathrm{OR}_n(x)}|1\rangle\bigr) \otimes |0\rangle^{\otimes(2^n-2)}.$$
Thus, the Hadamard gate outputs the desired state. By construction, the depth of the whole circuit does not depend on $n$.
Since Step 1 is the dominant part and uses $n$ unbounded fan-out gates on $2^{n-1}$ qubits, the size of the whole circuit is $O(n2^n)$.
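The phase bookkeeping in Steps 3 and 4 can be checked classically: on a basis input, the branch $|1\rangle^{\otimes(2^n-1)}$ collects one factor $e^{\pi i/2^{n-1}}$ for each parity equal to 1, and Lemma 3.1 turns the total into $(-1)^{\mathrm{OR}_n(x)}$. The pure-Python sketch below, with hypothetical helper names, tracks only that relative phase rather than the full state, assuming $Z(\theta)$ multiplies $|1\rangle$ by $e^{i\theta}$. It also counts the copies of each $x_j$ that Step 1 needs, which should be $2^{n-1} - 1$.

```python
import cmath
import math
from itertools import product


def parity(a, x):
    return sum(ai & xi for ai, xi in zip(a, x)) % 2


def or_via_phase_kickback(x):
    """Follow the Lemma 3.2 circuit on a basis input x; return Pr[output = 1].

    Step 3 multiplies the |1...1> branch of the (2^n - 1)-qubit GHZ state by
    e^{i*pi/2^(n-1)} once for each a != 0^n with PA^a_n(x) = 1. Step 4 leaves
    (|0> + phase|1>)/sqrt(2) on one qubit, and a Hadamard gate finishes.
    """
    n = len(x)
    theta = math.pi / 2 ** (n - 1)               # angle of each controlled-Z gate
    phase = 1 + 0j
    for a in product((0, 1), repeat=n):
        if any(a) and parity(a, x):
            phase *= cmath.exp(1j * theta)
    amp_one = (1 - phase) / 2                    # <1| H (|0> + phase|1>)/sqrt(2)
    return abs(amp_one) ** 2


def copies_needed(n, j):
    """How many parities PA^a_n with |a| >= 2 read x_j (copies made in Step 1)."""
    return sum(1 for a in product((0, 1), repeat=n) if a[j] and sum(a) >= 2)
```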
3.2. Proof of Theorem 1.1. We show Theorem 1.1 by combining Lemma 3.2 with Høyer and Špalek's OR reduction:
Proof of Theorem 1.1. Let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. The circuit is described as follows:
1. Apply Høyer and Špalek's OR reduction to the input state $|x\rangle$ to prepare the $m$-qubit state $\bigotimes_{k=0}^{m-1} H|\varphi_k\rangle$, where $m = \lceil \log(n + 1) \rceil$.
2. Apply the circuit in Lemma 3.2 to the state in Step 1 to prepare the desired state $|\mathrm{OR}_n(x)\rangle$.
Since Step 1 exactly reduces the problem of computing $\mathrm{OR}_n$ to that of computing $\mathrm{OR}_m$ in depth $O(1)$ and with size $O(n \log n)$, Step 2 outputs the desired state. Since the size of the input to Step 2 is $m$, the depth and size of the circuit in Step 2 are $O(1)$ and $O(m2^m) = O(n \log n)$, respectively, where we use $2^m < 2(n + 1)$. Thus, the depth and size of the whole circuit are $O(1)$ and $O(n \log n)$, respectively.
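Theorem 1.1 immediately implies that $\mathrm{OR}_n$ is in $\mathrm{QNC}^0_f$, and thus, together with $\mathrm{QAC}^0_f = \mathrm{QTC}^0_f$ (Høyer & Špalek 2005), the following relationship holds:
$$\mathrm{QNC}^0_f = \mathrm{QAC}^0_f = \mathrm{QTC}^0_f.$$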
By applying the method for decreasing the size of a quantum circuit for $\mathrm{OR}_n$ (Høyer & Špalek 2005), for any integer constant $c \ge 1$, the size of the circuit in Theorem 1.1 can be decreased to $O(n \log^{(c)} n)$ without increasing the depth asymptotically, where $\log^{(c)} n$ is the $c$-times iterated logarithm $\log \cdots \log n$. To show this, we divide the $n$ input qubits into $n/\log n$ blocks of $\log n$ qubits. For each block, we apply the circuit in Theorem 1.1 to compute $\mathrm{OR}_{\log n}$. We then have $n/\log n$ output qubits, and we apply the circuit again to these output qubits to compute $\mathrm{OR}_{n/\log n}$, which yields the desired output. The depth of the whole circuit is $O(1)$, and its size is $O(n \log^{(2)} n)$: the blocks cost $(n/\log n) \cdot O(\log n \cdot \log\log n) = O(n \log^{(2)} n)$ in total, and the final circuit costs $O((n/\log n)\log n) = O(n)$. We repeat this size-reduction procedure $c - 1$ times and obtain an $O(n \log^{(c)} n)$-size circuit.
A small-depth circuit for $\mathrm{OR}_n$ yields a small-depth circuit for $\mathrm{EX}^t_n$ (Høyer & Špalek 2005). In fact, the circuit in Theorem 1.1 yields a constant-depth circuit for $\mathrm{EX}^t_n$. To construct the circuit, it suffices to prepare $Z(-t\pi/2^k)|\varphi_k\rangle$ in place of $|\varphi_k\rangle$ in Høyer and Špalek's OR reduction and to negate the final output of the circuit in Theorem 1.1. This is done by only adding a $Z(-t\pi/2^k)$ gate for every $0 \le k \le m - 1$ and a NOT gate. Thus, the depth and size of the resulting circuit are asymptotically the same as those in Theorem 1.1. This yields an $O(1)$-depth $O(n \log n)$-size quantum circuit for $\mathrm{EX}^t_n$ for any $0 \le t \le n$.
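The effect of the extra $Z(-t\pi/2^k)$ gates can be checked with the same single-qubit bookkeeping as for the OR reduction. In the sketch below (ours), we again assume $|\varphi_k\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{\pi i |x|/2^k}|1\rangle)$ and $Z(\theta) = \mathrm{diag}(1, e^{i\theta})$, so each phase becomes $e^{\pi i (|x| - t)/2^k}$. Since $\bigl||x| - t\bigr| \le n < 2^m$, a nonzero difference has 2-adic valuation below $m$, so the reduced register reads $|0\rangle^{\otimes m}$, and the negated OR output is 1, exactly when $|x| = t$.

```python
import cmath
import math


def prob_all_zero_shifted(weight, t, n):
    """Pr[|0...0>] after the reduction modified for EX^t_n, assuming
    |phi_k> = (|0> + e^{i*pi*weight/2^k}|1>)/sqrt(2) and Z(theta) = diag(1, e^{i*theta}).
    The extra Z(-t*pi/2^k) turns each phase into e^{i*pi*(weight - t)/2^k}."""
    m = n.bit_length()                           # ceil(log2(n + 1))
    p = 1.0
    for k in range(m):
        phase = cmath.exp(1j * math.pi * (weight - t) / 2 ** k)
        p *= abs((1 + phase) / 2) ** 2           # |<0| H Z(-t*pi/2^k) |phi_k>|^2
    return p                                     # = Pr[negated OR output is 1]
```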
Circuit for the threshold function
First, we describe a constant-depth circuit for $\mathrm{TH}^t_n$ based on the above circuits for $\mathrm{EX}^k_n$. Then, we describe another constant-depth circuit for $\mathrm{TH}^t_n$ based on a counting circuit. We combine these two circuits to show Theorem 1.2.
4.1. Exact-function-based circuit.
The first circuit for $\mathrm{TH}^t_n$ is based on the circuits for $\mathrm{EX}^k_n$:
Lemma 4.1. There exist the following $O(1)$-depth quantum circuits for $\mathrm{TH}^t_n$:
• An $O(tn \log n)$-size circuit for any $1 \le t \le n/2$.
• An $O((n - t + 1)n \log n)$-size circuit for any $n/2 \le t \le n$.
Proof. Let $1 \le t \le n/2$ and let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. The circuit is described as follows:
1. Copy the input state $|x\rangle$ and apply the circuit for $\mathrm{EX}^k_n$ to each copy for every $0 \le k \le t - 1$ in parallel to prepare the state $\bigotimes_{k=0}^{t-1} |\mathrm{EX}^k_n(x)\rangle$.
2. Apply the circuit for $\mathrm{PA}_t$ and a NOT gate to the state in Step 1 to prepare the state $\bigl|\bigoplus_{k=0}^{t-1} \mathrm{EX}^k_n(x) \oplus 1\bigr\rangle$.
If $|x| \ge t$, then $\mathrm{EX}^k_n(x) = 0$ for every $0 \le k \le t - 1$. If $|x| < t$, there exists exactly one $0 \le k \le t - 1$ such that $\mathrm{EX}^k_n(x) = 1$. Thus, the state in Step 2 is the desired state $|\mathrm{TH}^t_n(x)\rangle$. The depth and size of the circuit in Step 1 are $O(1)$ and $O(tn \log n)$, respectively. The depth and size of the circuit for $\mathrm{PA}_t$ are $O(1)$ and $O(t)$, respectively. Thus, the depth and size of the whole circuit are $O(1)$ and $O(tn \log n)$, respectively. When $n/2 \le t \le n$, we modify the circuit so that it prepares the state $\bigl|\bigoplus_{k=t}^{n} \mathrm{EX}^k_n(x)\bigr\rangle$ in Step 2.
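The Boolean identity behind Lemma 4.1 can be checked exhaustively for small $n$. The sketch below (names are ours) computes only the classical function that the circuit evaluates, not the circuit itself: the XOR of $\mathrm{EX}^0_n, \ldots, \mathrm{EX}^{t-1}_n$ followed by a NOT when $t \le n/2$, and the XOR of $\mathrm{EX}^t_n, \ldots, \mathrm{EX}^n_n$ otherwise. At most one exact function in each range equals 1, which is why XOR works as OR here.

```python
from itertools import product


def ex(k, x):
    return int(sum(x) == k)


def th_from_exact(t, x):
    """TH^t_n(x) assembled from exact functions as in the proof of Lemma 4.1."""
    n = len(x)
    acc = 0
    if t <= n / 2:
        for k in range(t):                   # EX^0_n, ..., EX^{t-1}_n: at most one is 1
            acc ^= ex(k, x)
        return acc ^ 1                       # the final NOT gate
    for k in range(t, n + 1):                # EX^t_n, ..., EX^n_n
        acc ^= ex(k, x)
    return acc
```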
When $t$ is an integer constant, the size is $O(n \log n)$. On the other hand, when $t = n/2$, in other words, for the majority function, the size is $O(n^2 \log n)$.
4.2. Counting-function-based circuit. The counting function $\mathrm{CO}_n$ on $n$ bits is defined as $\mathrm{CO}_n(x) = s_0 \cdots s_{m-1}$, where $x \in \{0, 1\}^n$, $s_j \in \{0, 1\}$, $m = \lceil \log(n + 1) \rceil$, and $|x| = \sum_{j=0}^{m-1} s_j 2^j$. It computes the binary representation of the Hamming weight of the input. The quantum operation for $\mathrm{CO}_n$ maps $|x\rangle|0\rangle^{\otimes m}$ to $|x\rangle|s_0\rangle \cdots |s_{m-1}\rangle$. This operation is also denoted as $\mathrm{CO}_n$.
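We first construct a constant-depth circuit for $\mathrm{CO}_n$. Let $|x\rangle$ be an input state. Since $|x| = \sum_{j=0}^{m-1} s_j 2^j$, the state $|\varphi_k\rangle$ in Høyer and Špalek's OR reduction is
$$|\varphi_k\rangle = \frac{1}{\sqrt{2}}\Bigl(|0\rangle + e^{\pi i \sum_{j=0}^{m-1} s_j 2^{j-k}}|1\rangle\Bigr).$$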
This implies that $|\varphi_0\rangle \cdots |\varphi_{m-1}\rangle = F_{2^m}|s_{m-1}\rangle \cdots |s_0\rangle$. Thus, to obtain the desired state $|s_{m-1}\rangle \cdots |s_0\rangle$, it suffices to implement the following type of inverse QFT:
$$|x\rangle \otimes F_{2^m}|s_{m-1}\rangle \cdots |s_0\rangle \mapsto |x\rangle|s_{m-1}\rangle \cdots |s_0\rangle.$$
To implement this operation, we apply $HZ(\theta)$ to many $|\varphi_k\rangle$'s in parallel for appropriate angles $\theta$ and copy the resulting states. Some of them are computational basis states, which yield each $s_k$.
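The identity $|\varphi_0\rangle \cdots |\varphi_{m-1}\rangle = F_{2^m}|s_{m-1}\rangle \cdots |s_0\rangle$ is the usual product form of the QFT. The pure-Python sketch below compares both sides numerically, assuming $|\varphi_k\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{\pi i s/2^k}|1\rangle)$ with $s = |x|$, and reading the register $|s_{m-1}\rangle \cdots |s_0\rangle$ as the integer $s$ with $s_{m-1}$ as the most significant bit. The function names are ours.

```python
import cmath
import math


def qft_basis_state(s, m):
    """F_{2^m}|s> as a list of 2^m amplitudes (index y, first qubit most significant)."""
    dim = 2 ** m
    return [cmath.exp(2j * math.pi * s * y / dim) / math.sqrt(dim) for y in range(dim)]


def phi_product(s, m):
    """|phi_0>|phi_1>...|phi_{m-1}> with the assumed
    |phi_k> = (|0> + e^{i*pi*s/2^k}|1>)/sqrt(2); |phi_0> is the most significant qubit."""
    amps = []
    for y in range(2 ** m):
        amp = 1 + 0j
        for k in range(m):
            bit = (y >> (m - 1 - k)) & 1     # value of qubit k in basis state y
            amp *= (cmath.exp(1j * math.pi * s / 2 ** k) if bit else 1) / math.sqrt(2)
        amps.append(amp)
    return amps
```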
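As an example, consider the case where $m = 3$. In this case, we first prepare the state $|\varphi_0\rangle|\varphi_1\rangle^{\otimes 2}|\varphi_2\rangle^{\otimes 4}$ with a slightly modified version of Høyer and Špalek's OR reduction, where $|\varphi_k\rangle = \frac{1}{\sqrt{2}}\bigl(|0\rangle + e^{\pi i (s_0 + 2s_1 + 4s_2)/2^k}|1\rangle\bigr)$ for $0 \le k \le 2$.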
We transform these states into the following states:
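• $|\psi^{\varepsilon}_0\rangle = H|\varphi_0\rangle$,
• $|\psi^0_1\rangle = H|\varphi_1\rangle$, $|\psi^1_1\rangle = HZ(-\frac{\pi}{2})|\varphi_1\rangle$,
• $|\psi^{00}_2\rangle = H|\varphi_2\rangle$, $|\psi^{10}_2\rangle = HZ(-\frac{\pi}{4})|\varphi_2\rangle$, $|\psi^{01}_2\rangle = HZ(-\frac{\pi}{2})|\varphi_2\rangle$, $|\psi^{11}_2\rangle = HZ(-\frac{3\pi}{4})|\varphi_2\rangle$.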
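Which of these states are computational basis states, and which bit each one carries, can be checked numerically for all eight values of $s = s_0 + 2s_1 + 4s_2$. The sketch below is ours; it assumes $|\varphi_k\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{\pi i s/2^k}|1\rangle)$ and $Z(\theta) = \mathrm{diag}(1, e^{i\theta})$, hard-codes the angles from the list above, and reports `None` for a genuine superposition.

```python
import cmath
import math


def h_z_phi(s, k, theta):
    """H Z(theta)|phi_k> as (amp0, amp1), with the assumed
    |phi_k> = (|0> + e^{i*pi*s/2^k}|1>)/sqrt(2) and Z(theta) = diag(1, e^{i*theta})."""
    p = cmath.exp(1j * (math.pi * s / 2 ** k + theta))
    return ((1 + p) / 2, (1 - p) / 2)


def as_bit(v, tol=1e-9):
    """0 or 1 for a computational basis state, None for a genuine superposition."""
    if abs(v[1]) < tol:
        return 0
    if abs(v[0]) < tol:
        return 1
    return None


# Angles of the Z gates in the m = 3 list, keyed by (k, superscript of psi).
ANGLES = {
    (0, ""): 0.0,
    (1, "0"): 0.0, (1, "1"): -math.pi / 2,
    (2, "00"): 0.0, (2, "10"): -math.pi / 4,
    (2, "01"): -math.pi / 2, (2, "11"): -3 * math.pi / 4,
}


def psi(s, k, label):
    return h_z_phi(s, k, ANGLES[(k, label)])


# s = 5, i.e. s_0 = 1, s_1 = 0, s_2 = 1, so |psi_2^{10}> should be |s_2> = |1>.
print(as_bit(psi(5, 2, "10")))  # -> 1
```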
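It holds that $|s_0\rangle = |\psi^{\varepsilon}_0\rangle$,
$$|s_1\rangle = \begin{cases} |\psi^0_1\rangle, & \text{if } s_0 = 0,\\ |\psi^1_1\rangle, & \text{if } s_0 = 1, \end{cases} \qquad |s_2\rangle = \begin{cases} |\psi^{00}_2\rangle, & \text{if } s_0 = s_1 = 0,\\ |\psi^{10}_2\rangle, & \text{if } s_0 = 1,\ s_1 = 0,\\ |\psi^{01}_2\rangle, & \text{if } s_0 = 0,\ s_1 = 1,\\ |\psi^{11}_2\rangle, & \text{otherwise.} \end{cases}$$
If we have sufficiently many copies of the above states, these relationships allow us to prepare the states $|s_1\rangle$ and $|s_2\rangle$ in parallel as follows. To prepare the state $|s_1\rangle$, we apply two $\mathrm{AND}_2$ gates (on three qubits) and the circuit for $\mathrm{PA}_2$ (on three qubits) as follows: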
1. Apply an $\mathrm{AND}_2$ gate to the states $X|\psi^{\varepsilon}_0\rangle$, $|\psi^0_1\rangle$, and $|0\rangle$, where $X$ is a NOT gate and the third qubit is an ancillary qubit, which is the output qubit of this gate.
2. Apply an $\mathrm{AND}_2$ gate to the states $|\psi^{\varepsilon}_0\rangle$, $|\psi^1_1\rangle$, and $|0\rangle$, where the third qubit is an ancillary qubit (different from the one in Step 1), which is the output qubit of this gate.
3. Apply the circuit for $\mathrm{PA}_2$ to the above output qubits and an ancillary qubit (different from the ones in Steps 1 and 2), which is the output qubit of this circuit.
When $s_0 = 0$, the gate in Step 1 outputs the state $|s_1\rangle$. In this case, the state $|\psi^1_1\rangle$ is a superposition state, but the gate in Step 2 does nothing because its input $|\psi^{\varepsilon}_0\rangle$ is $|0\rangle$, and thus the state of the output qubit in Step 2 is $|0\rangle$. When $s_0 = 1$, the gate in Step 1 does nothing because $X|\psi^{\varepsilon}_0\rangle = |0\rangle$, and the gate in Step 2 outputs the state $|s_1\rangle$. Thus, the circuit in Step 3 outputs the desired state $|s_1\rangle$. Similarly, to prepare the state $|s_2\rangle$, we apply four $\mathrm{AND}_3$ gates and the circuit for $\mathrm{PA}_4$. The circuit for preparing $|s_2\rangle$, excluding the circuit for $\mathrm{PA}_4$, is depicted in Figure 4.1. The state $|\psi\rangle$ in Figure 4.1 satisfies the following relationship:
Thus, the circuit for $\mathrm{PA}_4$ outputs the desired state $|s_2\rangle$. We note that the $\mathrm{AND}_2$ gates for preparing $|s_1\rangle$ and the $\mathrm{AND}_3$ gates for preparing $|s_2\rangle$ can be applied in parallel. Moreover, the circuits for $\mathrm{PA}_2$ and $\mathrm{PA}_4$ can be applied in parallel.
By generalizing the above idea, we can prepare the state $|s_k\rangle$ for every $1 \le k \le m - 1$ in parallel. To prepare the state $|s_k\rangle$, we apply $2^k$ $\mathrm{AND}_{k+1}$ gates and the circuit for $\mathrm{PA}_{2^k}$. Each of the $\mathrm{AND}_{k+1}$ gates is associated with some $y = y_0 \cdots y_{k-1} \in \{0, 1\}^k$. More concretely, for any $y = y_0 \cdots y_{k-1} \in \{0, 1\}^k$, we apply an $\mathrm{AND}_{k+1}$ gate to the states $X^{1+y_j}|\psi^{y_0 \cdots y_{j-1}}_j\rangle$ $(0 \le j \le k - 1)$, $|\psi^y_k\rangle$, and $|0\rangle$, where $|\psi^{\varepsilon}_0\rangle = H|\varphi_0\rangle$, $|\psi^{y_0 \cdots y_{j-1}}_j\rangle = HZ\bigl(-\pi \sum_{h=0}^{j-1} \frac{y_h}{2^{j-h}}\bigr)|\varphi_j\rangle$ for any $1 \le j \le k$, and $y_0 \cdots y_{j-1}$ is regarded as $\varepsilon$ when $j = 0$. Only one $\mathrm{AND}_{k+1}$ gate, namely the one associated with $y = s_0 \cdots s_{k-1}$, outputs the state $|s_k\rangle$, and the other $\mathrm{AND}_{k+1}$ gates do nothing. Thus, the circuit for $\mathrm{PA}_{2^k}$ outputs the desired state $|s_k\rangle$. The $\mathrm{AND}_{k+1}$ gates for every $1 \le k \le m - 1$ can be applied in parallel. Similarly, the circuits for $\mathrm{PA}_{2^k}$ for every $1 \le k \le m - 1$ can be applied in parallel. The remaining problem is to prepare the copies of the states obtained from the state $|\varphi_0\rangle|\varphi_1\rangle^{\otimes 2}|\varphi_2\rangle^{\otimes 4}$, such as $|\psi^{\varepsilon}_0\rangle$, $|\psi^0_1\rangle$, and $|\psi^1_1\rangle$. A simple way of preparing the copies is to repeatedly use the circuit for preparing the state $|\varphi_0\rangle|\varphi_1\rangle^{\otimes 2}|\varphi_2\rangle^{\otimes 4}$, but this yields a large-size quantum circuit, and thus we avoid this repetition. In general, the above states are not computational basis states, and thus we cannot use unbounded fan-out gates to prepare the copies. However, in our circuit for preparing $|s_k\rangle$, all but one of the $\mathrm{AND}_{k+1}$ gates do nothing, and the remaining gate is always applied to computational basis states. This means that we do not require copies of superposition states, i.e., it suffices to prepare copies of computational basis states only. Thus, we can use unbounded fan-out gates.
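The selection argument can be simulated classically for small $m$, because every $\mathrm{AND}_{k+1}$ gate either sees an input that is surely $|0\rangle$ (and then does nothing) or sees only computational basis states. The sketch below is ours: it assumes $|\varphi_j\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{\pi i s/2^j}|1\rangle)$ and $Z(\theta) = \mathrm{diag}(1, e^{i\theta})$, marks superpositions as `None`, asserts the claim that the firing gate sees only basis states, and XORs the gate outputs as the $\mathrm{PA}_{2^k}$ circuit would.

```python
import cmath
import math
from itertools import product


def psi(s, j, prefix):
    """|psi_j^{y_0...y_{j-1}}> = H Z(-pi * sum_h y_h / 2^{j-h}) |phi_j>, assuming
    |phi_j> = (|0> + e^{i*pi*s/2^j}|1>)/sqrt(2); returns (amp0, amp1)."""
    theta = -math.pi * sum(y / 2 ** (j - h) for h, y in enumerate(prefix))
    p = cmath.exp(1j * (math.pi * s / 2 ** j + theta))
    return ((1 + p) / 2, (1 - p) / 2)


def as_bit(v, tol=1e-9):
    if abs(v[1]) < tol:
        return 0
    if abs(v[0]) < tol:
        return 1
    return None                              # genuine superposition


def counting_bits(s, m):
    """Recover s_0, ..., s_{m-1} with the AND_{k+1} / PA_{2^k} construction."""
    bits = [as_bit(psi(s, 0, ()))]           # |s_0> = |psi_0^eps>
    for k in range(1, m):
        out_parity = 0
        for y in product((0, 1), repeat=k):
            # Inputs X^{1+y_j}|psi_j^{y_0..y_{j-1}}>: a NOT is applied iff y_j = 0.
            ins = []
            for j in range(k):
                b = as_bit(psi(s, j, y[:j]))
                ins.append(None if b is None else b ^ (1 - y[j]))
            last = as_bit(psi(s, k, y))
            if 0 in ins:
                out = 0                      # an input is surely |0>: the gate does nothing
            else:
                # Claim from the text: the firing gate sees only basis states.
                assert None not in ins and last is not None
                out = last
            out_parity ^= out                # the PA_{2^k} circuit
        bits.append(out_parity)
    return bits


print(counting_bits(6, 3))  # -> [0, 1, 1]
```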
For example, consider the circuit depicted in Figure 4.1, where we assume that the initial state is $|\psi_0^{\varepsilon}\rangle|0\rangle^{\otimes 3}|\psi_1^{0}\rangle|\psi_1^{1}\rangle|0\rangle^{\otimes 2}|\psi_2^{00}\rangle|\psi_2^{10}\rangle|\psi_2^{01}\rangle|\psi_2^{11}\rangle$, which is obtained directly from $|\phi_0\rangle|\phi_1\rangle^{\otimes 2}|\phi_2\rangle^{\otimes 4}$ together with ancillary qubits. We apply three fan-out gates. The first one is applied to $|\psi_0^{\varepsilon}\rangle|0\rangle^{\otimes 3}$, where the first qubit is the control qubit. Similarly, the second one is applied to $|\psi_1^{0}\rangle|0\rangle$ and the third one to $|\psi_1^{1}\rangle|0\rangle$. The state $|\psi_0^{\varepsilon}\rangle$ is always a computational basis state, and thus the first fan-out gate always outputs $|\psi_0^{\varepsilon}\rangle^{\otimes 4}$. When $s_0 = s_1 = 0$, the state $|\psi_1^{0}\rangle$ is a computational basis state and $|\psi_1^{1}\rangle$ is not. Thus, the second fan-out gate outputs $|\psi_1^{0}\rangle^{\otimes 2}$, but the third one does not output $|\psi_1^{1}\rangle^{\otimes 2}$. The resulting state of all the qubits differs from that in Figure 4.1, but the circuit still works: the $\mathrm{AND}_3$ gates applied to superposition states do nothing, and only the first $\mathrm{AND}_3$ gate, which is applied to computational basis states, outputs the desired state $|s_2\rangle$. The other cases are similar. Thus, in our circuit, it suffices to apply unbounded fan-out gates to the states whose copies may actually be used.
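Why fan-out copies only computational basis states can be seen in a small state-vector sketch (our illustration, not part of the paper's construction). An unbounded fan-out gate is modeled as XOR-ing the control bit into every target; states are dictionaries from bit tuples to amplitudes, and the helper names are hypothetical. Applied to $|1\rangle|000\rangle$ the gate produces $|1\rangle^{\otimes 4}$, whereas applied to $|+\rangle|000\rangle$ it produces the entangled state $(|0000\rangle + |1111\rangle)/\sqrt{2}$ rather than $|+\rangle^{\otimes 4}$.

```python
import math
from itertools import product


def fan_out(state):
    """Unbounded fan-out: |c>|t_1 ... t_r> -> |c>|t_1 xor c ... t_r xor c>.

    `state` maps bit tuples (control bit first) to amplitudes.
    """
    out = {}
    for bits, amp in state.items():
        c = bits[0]
        image = (c,) + tuple(t ^ c for t in bits[1:])
        out[image] = out.get(image, 0) + amp
    return out


def tensor_power(single, r):
    """|v>^{tensor r} for a one-qubit state given as [amp0, amp1]."""
    state = {}
    for bits in product((0, 1), repeat=r):
        amp = 1
        for b in bits:
            amp *= single[b]
        if amp != 0:
            state[bits] = amp
    return state


def with_ancillas(single, r):
    """|v>|0>^{tensor r}: one data qubit followed by r ancillary qubits."""
    return {(b,) + (0,) * r: single[b] for b in (0, 1) if single[b] != 0}


one = [0, 1]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]

print(fan_out(with_ancillas(one, 3)) == tensor_power(one, 4))    # basis state: copied
print(fan_out(with_ancillas(plus, 3)) == tensor_power(plus, 4))  # superposition: not copied
```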
Based on this idea, we can show the following lemma.

Proof. Let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. The circuit is described as follows:
1. Apply a slightly modified version of the OR reduction to the input state $|x\rangle$ to prepare the state $\bigotimes_{k=0}^{m-1}|\phi_k\rangle^{\otimes 2^k}$.
2. Apply an $H$ gate to $|\phi_0\rangle$, and an $HZ\bigl(-\pi\sum_{h=0}^{k-1} y_h/2^{k-h}\bigr)$ gate to a separate copy of $|\phi_k\rangle$ for every $1 \le k \le m-1$ and $y = y_0\cdots y_{k-1} \in \{0,1\}^k$, in parallel, to prepare the states $|\psi_0^{\varepsilon}\rangle$ and $|\psi_k^{y}\rangle$.
3. Apply an unbounded fan-out gate to $|\psi_0^{\varepsilon}\rangle$ with ancillary qubits to prepare the state $|\psi_0^{\varepsilon}\rangle^{\otimes(2^m-1)}$.
4. Apply an unbounded fan-out gate to $|\psi_k^{y}\rangle$ with ancillary qubits, for every $1 \le k \le m-1$ and $y = y_0\cdots y_{k-1} \in \{0,1\}^k$ in parallel, to prepare a state that is $|\psi_k^{y}\rangle^{\otimes(2^{m-k}-1)}$ if $|\psi_k^{y}\rangle$ is a computational basis state and some (unimportant) superposition state otherwise.
5. Apply the following gates for every $1 \le k \le m-1$ and $y = y_0\cdots y_{k-1} \in \{0,1\}^k$:
(a) An $X^{1+y_j}$ gate to the following qubit, for every $0 \le j \le k-1$: the qubit to which an unbounded fan-out gate is applied in Step 3 or 4 to prepare $|\psi_j^{y_0\cdots y_{j-1}}\rangle$, where the qubit varies depending on $y$.
(b) An $\mathrm{AND}_{k+1}$ gate (constructed from the circuit for $\mathrm{OR}_{k+1}$ in Section 3) to the $k$ qubits in (a) and the following two qubits: the qubit to which an unbounded fan-out gate is applied in Step 4 to prepare $|\psi_k^{y}\rangle$, and an ancillary qubit, which is the output qubit, where the qubits vary depending on $y$.
6. Apply the circuit for $\mathrm{PA}_{2^k}$ (as in Figure 3.1) to the output qubits of the $\mathrm{AND}_{k+1}$ gates associated with $y \in \{0,1\}^k$ in Step 5 and an ancillary qubit, which is the output qubit, for every $1 \le k \le m-1$.
In Step 5, all the $X$ gates can be applied in parallel, and so can all the AND gates. It is easy to show that, for any $1 \le k \le m-1$, the $\mathrm{AND}_{k+1}$ gate associated with $y = s_0\cdots s_{k-1}$ outputs $|s_k\rangle$ and the other gates do nothing: if $y \ne s_0\cdots s_{k-1}$ and $j$ is the smallest index with $y_j \ne s_j$, then $|\psi_j^{y_0\cdots y_{j-1}}\rangle = |s_j\rangle$ and $X^{1+y_j}|s_j\rangle = |0\rangle$, so that gate's output qubit stays $|0\rangle$. Thus, the circuits in Step 6, all of which can be applied in parallel, output the desired state. By construction, the depth of the whole circuit does not depend on $n$. Step 1 is the dominant part, and the state in Step 1 can be prepared with a circuit of size $O\bigl(n\sum_{k=0}^{m-1}2^k\bigr) = O(n(2^m-1)) = O(n^2)$, since $2^m = O(n)$. Hence the size of the whole circuit is $O(n^2)$.
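The copy counts in Steps 3 and 4 can be checked by bookkeeping alone. The sketch below (a hypothetical helper, not from the paper) counts, for each $|\psi_j^{y_0\cdots y_{j-1}}\rangle$, how many AND gates in Step 5 read one of its copies. For $k \ge 1$ the count matches the $2^{m-k}-1$ copies prepared in Step 4 exactly. $|\psi_0^{\varepsilon}\rangle$ is read $2^m-2$ times, so one of its $2^m-1$ copies is left over, which we read as the output qubit holding $|s_0\rangle$ (our interpretation). The total number of AND gates is $\sum_{k=1}^{m-1}2^k = 2^m-2$.

```python
from itertools import product


def reads_per_state(m):
    """Count how often Step 5 reads a copy of each |psi_j^prefix>.

    Keys are (j, prefix), where prefix = (y_0, ..., y_{j-1}) is a tuple of j bits.
    Also returns the total number of AND_{k+1} gates over 1 <= k <= m-1.
    """
    reads = {}
    n_and_gates = 0
    for k in range(1, m):
        for y in product((0, 1), repeat=k):
            n_and_gates += 1
            # Step 5(a): one X-gated input per 0 <= j <= k-1, holding |psi_j^{y_0..y_{j-1}}>.
            for j in range(k):
                key = (j, y[:j])
                reads[key] = reads.get(key, 0) + 1
            # Step 5(b): the input holding |psi_k^y>.
            reads[(k, y)] = reads.get((k, y), 0) + 1
    return reads, n_and_gates


reads, gates = reads_per_state(4)
print(reads[(0, ())], reads[(1, (0,))], reads[(3, (1, 0, 1))], gates)
```

For $m = 4$ this prints `14 7 1 14`: $2^4 - 2$ reads of $|\psi_0^{\varepsilon}\rangle$, $2^3 - 1$ of $|\psi_1^{0}\rangle$, $2^1 - 1$ of $|\psi_3^{101}\rangle$, and $2^4 - 2$ AND gates.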
Remark 4.3. Lemma 4.2 yields an $O(1)$-depth $O(n^2)$-size quantum circuit for $\mathrm{TH}_n^t$. To construct the circuit, it suffices to add a circuit that compares $t$ with the output of the circuit for $\mathrm{CO}_n$. We can obtain an $O(1)$-depth $\mathrm{poly}(m)$-size quantum circuit for the comparison using the addition circuit in Chandra et al. (1983).
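Remark 4.3 reduces $\mathrm{TH}_n^t$ to comparing $t$ with the bits $s_0\cdots s_{m-1}$ of $|x|$ output by the circuit for $\mathrm{CO}_n$. The sketch below only illustrates the classical function that the comparison circuit must compute and how an adder yields it: assuming $1 \le t < 2^m$, we have $|x| \ge t$ exactly when $|x| + (2^m - t)$ carries out of $m$ bits. The ripple-carry loop is sequential and is not the constant-depth construction of Chandra et al. (1983); the function name is hypothetical.

```python
def threshold_from_count_bits(s_bits, t):
    """Return 1 iff |x| >= t, given s_bits = [s_0, ..., s_{m-1}] (lowest-order bit first).

    Uses |x| >= t  <=>  |x| + (2^m - t) >= 2^m, i.e. the carry out of an m-bit addition.
    Assumes 1 <= t < 2^m so that 2^m - t fits in m bits.
    """
    m = len(s_bits)
    if not 1 <= t < 2**m:
        raise ValueError("need 1 <= t < 2^m")
    complement = 2**m - t
    carry = 0
    for i in range(m):  # ripple-carry addition, lowest-order bit first
        total = s_bits[i] + ((complement >> i) & 1) + carry
        carry = total >> 1
    return carry


count, m = 11, 4
bits = [(count >> i) & 1 for i in range(m)]
print([threshold_from_count_bits(bits, t) for t in (10, 11, 12)])  # [1, 1, 0]
```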
Remark 4.4. In the extended abstract of this paper (Takahashi & Tani 2013), we constructed the circuit for $\mathrm{CO}_n$ in Lemma 4.2 using intermediate measurements. In the present version, we avoid such measurements by directly transforming them into unitary operations. One of the referees of this paper suggested another way of removing intermediate measurements, based on the fact that each bit of the output of $\mathrm{CO}_n$ can be represented as the OR of the outputs of the $\mathrm{EX}_n^k$ functions for appropriate values of $k$. The resulting circuit for $\mathrm{CO}_n$ has size $O(n^2\log n)$.
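The identity behind the suggestion in Remark 4.4 is easy to check classically: bit $i$ of $|x|$ is 1 exactly when $|x| = k$ for some $k$ whose bit $i$ is 1, so it is the OR of $\mathrm{EX}_n^k(x)$ over those $k$. The brute-force check below is ours (function names are hypothetical) and says nothing about circuit size.

```python
from itertools import product


def ex(x, k):
    """EX_n^k(x): 1 iff exactly k of the input bits are 1."""
    return int(sum(x) == k)


def count_bit_via_or(x, i):
    """Bit i of |x|, written as the OR of EX_n^k(x) over all k in [0, n] with bit i set."""
    n = len(x)
    return int(any(ex(x, k) for k in range(n + 1) if (k >> i) & 1))


x = (1, 0, 1, 1, 0, 1)  # |x| = 4 = 100 in binary
print([count_bit_via_or(x, i) for i in range(3)])  # [0, 0, 1]
```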
Combination of the two circuits.
A careful combination of the circuits in Lemmas 4.1 and 4.2 yields a smaller circuit for $\mathrm{TH}_n^t$. We explain the idea for the case $1 \le t \le n/2$. Given the input $x$, we first compute some low-order bits of the binary representation of $|x|$ with the circuit in Lemma 4.2. Since we then know these low-order bits, it is not necessary to check whether $\mathrm{EX}_n^k(x) = 1$ for every $0 \le k \le t-1$ as in Lemma 4.1. It suffices to consider those $0 \le k \le t-1$ whose low-order bits agree with the ones computed by the circuit in Lemma 4.2. Since this decreases the number of values of $k$ to consider, it decreases the size of the whole circuit as well. The details are as follows:
Lemma 4.5. There exist the following $O(1)$-depth quantum circuits for $\mathrm{TH}_n^t$:
• An $O(2^l n + 2^{-l}tn\log n + n\log n)$-size circuit for any $1 \le t \le n/2$ and $0 \le l \le \lceil\log(t+1)\rceil$.
• An $O(2^l n + 2^{-l}(n-t)n\log n + n\log n)$-size circuit for any $n/2 \le t \le n$ and $0 \le l \le \lceil\log(n-t+1)\rceil$.
Proof. Let $1 \le t \le n/2$, $0 \le l \le \lceil\log(t+1)\rceil$, and $|x\rangle = |x_0\rangle\cdots|x_{n-1}\rangle$ be an input state. The value $l$ is at most the length of the binary representation of $t$. Let $t_0\cdots t_{l-1}$ be the $l$ low-order bits of the binary representation of $t$, where $t_0$ is the lowest-order bit. Note that $t - \sum_{j=0}^{l-1} t_j 2^j$ is nonnegative and a multiple of $2^l$. The first circuit is described as follows:
1. Apply the circuit in Lemma 4.2 to the input state $|x\rangle$, where we regard $m$ in the proof of Lemma 4.2 as $l$. Let $|s_0\rangle\cdots|s_{l-1}\rangle$ be the output. In other words, $s_0\cdots s_{l-1}$ are the $l$ low-order bits of the binary representation of $|x|$, where $s_0$ is the lowest-order bit.
2. Apply the first circuit in Lemma 4.1 to the input state $|x\rangle$, where we consider only those $0 \le k \le t-1$ whose $l$ low-order bits are equal to $s_0\cdots s_{l-1}$. More concretely, $k = M2^l + \sum_{j=0}^{l-1} s_j 2^j$ for any integer $M$ satisfying $0 \le M2^l + \sum_{j=0}^{l-1} s_j 2^j \le t-1$.
We note that, before Step 2, we prepare the binary representations of all $k$ satisfying the above conditions by applying unbounded fan-out gates and NOT gates to ancillary qubits. As in the proof of Lemma 4.1, the circuit outputs the desired state $|\mathrm{TH}_n^t(x)\rangle$, and the depth of the whole circuit does not depend on $n$. The sizes of the circuits in Steps 1 and 2 are $O(2^l n)$ and $O(2^{-l}tn\log n + n\log n)$, respectively, since $M \le t/2^l$ and the number of values of $k$ to consider is at most $t/2^l + 1$. Thus, the depth and size of the whole circuit are $O(1)$ and $O(2^l n + 2^{-l}tn\log n + n\log n)$, respectively. To construct the second circuit, we use the second circuit in Lemma 4.1, where we consider only those $t \le k \le n$ whose $l$ low-order bits are $s_0\cdots s_{l-1}$. The number of such $k$ is at most $(n-t)/2^l + 2$, and thus the depth and size of the whole circuit are $O(1)$ and $O(2^l n + 2^{-l}(n-t)n\log n + n\log n)$, respectively.
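The counting argument in this proof can be sanity-checked with a classical simulation. In the sketch below (hypothetical helper names; classical stand-ins, not the quantum circuits), Step 1 is replaced by reading off the $l$ low-order bits of $|x|$, and Step 2 by testing $|x| = k$ only for the reduced set of candidates. We assume, as our reading of Lemma 4.1 (not reproduced here), that its first circuit outputs 1 exactly when no $0 \le k \le t-1$ has $\mathrm{EX}_n^k(x) = 1$, and its second outputs 1 exactly when some $t \le k \le n$ has $\mathrm{EX}_n^k(x) = 1$.

```python
def low_bits(value, l):
    """The l low-order bits of value, lowest-order bit first."""
    return [(value >> j) & 1 for j in range(l)]


def candidates(lo, hi, l, s):
    """All k with lo <= k <= hi whose l low-order bits equal s = [s_0, ..., s_{l-1}]."""
    r = sum(bit << j for j, bit in enumerate(s))
    start = lo + ((r - lo) % 2**l)  # smallest k >= lo with k = r (mod 2^l)
    return list(range(start, hi + 1, 2**l))


def threshold_first(x, t, l):
    """Classical model of the first circuit (1 <= t <= n/2)."""
    s = low_bits(sum(x), l)                           # Step 1
    ks = candidates(0, t - 1, l, s)                   # k = M 2^l + sum_j s_j 2^j
    return 0 if any(sum(x) == k for k in ks) else 1   # Step 2


def threshold_second(x, t, l):
    """Classical model of the second circuit (n/2 <= t <= n)."""
    s = low_bits(sum(x), l)
    ks = candidates(t, len(x), l, s)
    return 1 if any(sum(x) == k for k in ks) else 0


x = [1, 1, 0, 1, 0, 0, 0, 0]  # n = 8, |x| = 3
print(threshold_first(x, 4, 2), candidates(0, 3, 2, low_bits(3, 2)))  # 0 [3]
```

With $t = 4$ and $l = 2$, only $k = 3$ survives out of the four values $0 \le k \le 3$.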
By setting l appropriately depending on t, Lemma 4.5 implies Theorem 1.2:
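Proof of Theorem 1.2. For any $1 \le t \le \log n$, we set $l = \lceil\log(t+1)\rceil$ in the first circuit in Lemma 4.5, which yields an $O(n\log n)$-size circuit. For any $\log n \le t \le n/2$, it holds that $0 \le \lceil\log\sqrt{t\log n}\,\rceil \le \lceil\log(t+1)\rceil$, and thus we set $l = \lceil\log\sqrt{t\log n}\,\rceil$ in the first circuit in Lemma 4.5. This yields an $O(n\sqrt{t\log n})$-size circuit, since $\sqrt{t\log n} \le 2^l < 2\sqrt{t\log n}$ makes both $2^l n$ and $2^{-l}tn\log n$ at most $2n\sqrt{t\log n}$, and $n\log n \le n\sqrt{t\log n}$ when $t \ge \log n$. For any $n/2 \le t \le n-\log n$, it holds that $0 \le \lceil\log\sqrt{(n-t)\log n}\,\rceil \le \lceil\log(n-t+1)\rceil$, and thus we set $l = \lceil\log\sqrt{(n-t)\log n}\,\rceil$ in the second circuit in Lemma 4.5. By the same calculation, this yields an $O(n\sqrt{(n-t)\log n})$-size circuit. For any $n-\log n \le t \le n$, we set $l = \lceil\log(n-t+1)\rceil$ in the second circuit in Lemma 4.5 and obtain an $O(n\log n)$-size circuit.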
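The choice of $l$ in the middle range balances the first two terms of the bound in Lemma 4.5. The sketch below evaluates that bound with all hidden constants set to 1 (an assumption made only for illustration) and base-2 logarithms. For $\log n \le t \le n/2$ it lets us check that $l = \lceil\log\sqrt{t\log n}\,\rceil$ lies in the allowed range and that the bound stays within a factor of 4 of $n\sqrt{t\log n}$. The function names are hypothetical.

```python
import math


def lemma45_first_bound(n, t, l):
    """2^l n + 2^{-l} t n log n + n log n, i.e. Lemma 4.5 with all constants set to 1."""
    log_n = math.log2(n)
    return 2**l * n + t * n * log_n / 2**l + n * log_n


def choose_l(n, t):
    """l = ceil(log sqrt(t log n)), intended for log n <= t <= n/2."""
    return math.ceil(math.log2(math.sqrt(t * math.log2(n))))


n = 2**20
for t in (20, 1000, n // 2):
    l = choose_l(n, t)
    ratio = lemma45_first_bound(n, t, l) / (n * math.sqrt(t * math.log2(n)))
    print(t, l, round(ratio, 2))
```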
The size of the circuit for $\mathrm{TH}_n^{n/2}$ in Lemma 4.1 is $O(n^2\log n)$, and it can be decreased to $O(n^2)$ by Lemma 4.2. Theorem 1.2 with $t = n/2$ yields an even smaller circuit, of size $O(n\sqrt{n\log n})$.

Let $q > 5$ be a safe prime, i.e., a prime of the form $q = 2p+1$ for some prime $p > 2$. In the following, as in the cryptographic literature, we assume that there exist infinitely many safe primes. Let $G_q = (\mathbb{Z}/q\mathbb{Z})^*$, the multiplicative group of integers modulo $q$. It is known that there exists a generator $1 < g_q \le q-1$ of $G_q$, and thus $G_q = \{g_q^0 = 1, g_q^1, \ldots, g_q^{q-2}\}$ and $g_q^{q-1} \equiv 1 \pmod q$. The discrete logarithm problem (DLP) over $G_q$ (with respect to given $q$ and $g_q$) is to find $0 \le l_q \le q-2$ such that $g_q^{l_q} \equiv x_q \pmod q$ for an input $x_q \in G_q$, where the problem size is $n = \log q$, and the order of $G_q$, i.e., $q-1$, and its decomposition $2p$ are known. Since it seems difficult to reduce the DLP over $G_q$ to DLPs over groups of sufficiently small order, it is plausible that the DLP over $G_q$ cannot be solved by a polynomial-time bounded-error classical algorithm, in other words, that it is classically hard.
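For small parameters, the objects in this definition can be explored directly. The sketch below uses hypothetical helper names, trial division, and exhaustive search, so it is exponential in $n$ and meant for illustration only. It lists safe primes $q = 2p+1$ with $p > 2$, tests whether $g$ generates $G_q$ using the known decomposition $q-1 = 2p$ (an element $g \in G_q$ generates it iff $g^2 \not\equiv 1$ and $g^p \not\equiv 1 \pmod q$, because its order divides $2p$), and solves the DLP by brute force.

```python
def is_prime(n):
    """Trial division; fine for the tiny numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_safe_prime(q):
    """q = 2p + 1 with q and p prime and p > 2, as in the text."""
    if q % 2 == 0 or not is_prime(q):
        return False
    p = (q - 1) // 2
    return p > 2 and is_prime(p)


def is_generator(g, q):
    """For a safe prime q = 2p + 1 and g in [1, q-1]: g generates G_q iff g^2, g^p != 1 (mod q)."""
    p = (q - 1) // 2
    return pow(g, 2, q) != 1 and pow(g, p, q) != 1


def find_generator(q):
    return next(g for g in range(2, q) if is_generator(g, q))


def brute_force_dlog(g, x, q):
    """The l in [0, q-2] with g^l = x (mod q); exponential in n = log q."""
    for l in range(q - 1):
        if pow(g, l, q) == x:
            return l
    raise ValueError("x is not in G_q")


print([q for q in range(6, 60) if is_safe_prime(q)])  # [7, 11, 23, 47, 59]
g = find_generator(23)
print(g, brute_force_dlog(g, 10, 23))  # 5 3
```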
Although we could consider the DLP over $G_q$ directly, for simplicity we consider the simpler DLPs obtained by the reduction method of Pohlig & Hellman (1978). Since the order of $G_q$ is $2p$ and $\gcd(2, p) = 1$, the DLP over $G_q$ with input $x_q$ can be reduced to the following two DLPs by a $\mathrm{poly}(n)$-time exact classical algorithm. One is the DLP over the group generated by $g_q^p$ with input $x_q^p$, which is solvable by a $\mathrm{poly}(n)$-time exact classical algorithm since the order of $g_q^p$ is $2$. The other is the DLP over the group $G$ generated by $g = g_q^2$ with input $x = x_q^2$. Thus, to show Theorem 1.3, it suffices to show that, if $F_p$ is in $\mathrm{QNC}^0_f$, there exists a $\mathrm{poly}(n)$-time exact classical algorithm for the DLP over $G$ using the $\mathrm{QNC}^0_f$ oracle, which solves, in classical constant time, a problem that is solvable exactly by a $\mathrm{QNC}^0_f$ circuit.
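The Pohlig–Hellman step for the order $2p$ can be written out in a few lines. In the sketch below (hypothetical names), the order-2 part is solved exactly by inspecting $x_q^p$. The order-$p$ part, the DLP over $G$ with $g = g_q^2$ and $x = x_q^2$, is solved by exhaustive search purely as a stand-in for the algorithm that uses the $\mathrm{QNC}^0_f$ oracle. The two residues are then combined by the Chinese remainder theorem; since $p$ is odd, exactly one of $b$ and $b + p$ has the required parity.

```python
def dlog_in_subgroup(h, y, order, q):
    """Smallest e in [0, order) with h^e = y (mod q); brute force stands in for the oracle."""
    for e in range(order):
        if pow(h, e, q) == y:
            return e
    raise ValueError("y is not in the subgroup generated by h")


def pohlig_hellman_safe_prime(g_q, x_q, q):
    """Solve g_q^l = x_q (mod q) for a safe prime q = 2p + 1 and a generator g_q."""
    p = (q - 1) // 2
    # Order-2 subproblem: g_q^p has order 2 and x_q^p = (g_q^p)^(l mod 2).
    a = dlog_in_subgroup(pow(g_q, p, q), pow(x_q, p, q), 2, q)
    # Order-p subproblem: group G generated by g = g_q^2, input x = x_q^2 = g^(l mod p).
    b = dlog_in_subgroup(pow(g_q, 2, q), pow(x_q, 2, q), p, q)
    # CRT: l = a (mod 2) and l = b (mod p), with 0 <= l <= 2p - 1 = q - 2.
    return b if b % 2 == a else b + p


q, g_q = 23, 5  # 5 generates (Z/23Z)^*
print(pohlig_hellman_safe_prime(g_q, 10, q))  # 3, since 5^3 = 125 = 10 (mod 23)
```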
Proof of Theorem 1.3.
We consider a slightly modified version of van Dam's exact algorithm for the DLP. The main differ-
