Quantum Lower Bounds for Fanout by Fang, Maosen et al.
arXiv:quant-ph/0312208v1 28 Dec 2003
Quantum Lower Bounds for Fanout
M. Fang∗ S. Fenner† F. Green‡ S. Homer∗ Y. Zhang†
November 2, 2018
Abstract: We prove several new lower bounds for constant-depth quantum circuits. The main result is that parity (and hence fanout) requires log-depth circuits when the circuits are composed of single-qubit and arbitrary-size Toffoli gates and use only constantly many ancillæ. Under this constraint, the bound is close to optimal. In the case of a non-constant number $a$ of ancillæ, we give a trade-off between $a$ and the required depth that results in a non-trivial lower bound for fanout when $a = n^{1-o(1)}$.
1 Introduction
There has been significant recent progress in understanding the power of constant-depth quantum circuits. Such circuits are of considerable interest, since the first quantum circuits to be built will certainly be small circuits with limited gates and constant depth. Much of the progress in this area has been in showing that constant-depth quantum circuits are more powerful than their classical counterparts. However, these and other upper bounds seem to require the presence of a (reversible) quantum fanout gate. A fanout gate takes an arbitrary number of bits and fans out one of them by XORing it into each of the others. Here we consider whether fanout gates are necessary for these upper bounds. We prove several lower bounds showing that fanout cannot be computed in constant depth using only generalized (i.e., unbounded-size) Toffoli gates and single-qubit gates when the number of extra work bits (ancillæ) that the circuit uses is limited.
Fanout gates have proved to be unexpectedly powerful. Moore [5] first observed that, in the presence of single-qubit gates and using 0 ancillæ, fanout gates and parity gates are equivalent up to depth 3. This was extended by Green et al. [2]: fanout is even equivalent to any $\mathrm{MOD}_q$ function (for $q \ge 2$), which determines whether the number of 1s in the input is not divisible by $q$. Here the equivalence is again up to constant depth, but uses $O(n)$ ancillæ. One may interpret this result by defining quantum circuit classes analogous to classical constant-depth circuit classes. For example, a reasonable analog of the classical unbounded fan-in and fanout class $\mathrm{AC}^0$ is $\mathrm{QAC}^0_{\mathrm{wf}}$, the class of constant-depth quantum circuit families
∗Computer Science Department, Boston University, Boston, MA 02215, {heroes|homer}@bu.edu
†Dept. of CS and Eng., University of South Carolina, Columbia, SC 29208, {fenner|zhang29}@cse.sc.edu
‡Dept. of Math and CS, Clark University, Worcester, MA 01610, fgreen@black.clarku.edu
composed of single-qubit, generalized Toffoli, and fanout gates. (Here the subscript “wf” denotes “with fanout.”) Similarly, one may define quantum analogs of $\mathrm{ACC}(q)$ (called $\mathrm{QACC}(q)$) and $\mathrm{ACC}$ (called $\mathrm{QACC}$). Thus the equivalence of fanout with $\mathrm{MOD}_q$ implies that, for any $q \ge 2$, $\mathrm{QAC}^0_{\mathrm{wf}} = \mathrm{QACC}(q) = \mathrm{QACC}$. Contrast this with the fact that $\mathrm{AC}^0 \ne \mathrm{ACC}$ and, for any distinct primes $q$ and $p$, $\mathrm{ACC}(q) \ne \mathrm{ACC}(p)$ [7, 8]. More recently, Høyer and Spalek [3] have improved these results by proving that the same $\mathrm{QAC}^0_{\mathrm{wf}}$ circuits can compute threshold functions. Thus $\mathrm{QAC}^0_{\mathrm{wf}} = \mathrm{QTC}^0$, an even sharper contrast with the classical classes. Indeed, this result implies that we can approximate the quantum fast Fourier transform in constant depth using fanout. Thus the “quantum part” of Shor’s renowned quantum factoring algorithm can be carried out with a quite simple, constant-depth quantum circuit that uses the fanout operator.
These results suggest the following question: is fanout really necessary to perform the quantum Fourier transform in constant depth? While so much can be “reduced” to fanout, it is far from clear how much can be reduced to fan-in, even in what appears to be its weakest form (i.e., the generalized Toffoli gate). Although generalized Toffoli gates can involve just as many bits as fanout gates, they may be more feasible to implement, and it is instructive to investigate their power in constant-depth circuits. Note that Cleve and Watrous [1] proved that with only one- and two-qubit gates it is not possible to approximate the quantum Fourier transform in less than log depth, but no similar lower bounds are known against quantum circuits containing gates of unbounded size.
Our main result, proved in Section 4, is that one cannot compute parity (and hence fanout) with $\mathrm{QAC}^0$ circuits (i.e., in constant depth, without fanout) using a constant number of ancillæ. This is the first hard evidence that $\mathrm{QAC}^0$ and $\mathrm{QAC}^0_{\mathrm{wf}}$ may be different, and that fanout may be necessary for all the upper bound results mentioned above (it certainly is if we can get by with only constantly many ancillæ). The necessity of ancillæ in quantum computations is a murky issue. It is generally accepted that a limited number of them (polynomially many in the number of inputs) are needed. This seems reasonable, as it allows polynomially much extra space in which to carry out a computation. However, it is possible to approximate any unitary operator with a small set of universal gates without ancillæ (although one apparently needs circuits of great depth and size in order to do so). Furthermore, to our knowledge, no systematic investigation into the absolute necessity of ancillæ has been done. Ancillæ play a crucial role in the present result: we find the lower bound difficult to obtain when more than sublinearly many ancillæ are allowed. To help clarify this problem, we provide a proof (implicitly claimed, but omitted, in Cleve and Watrous [1]) that quantum circuits with gates of bounded size must have log depth to compute parity (and hence fanout) exactly. In particular, we carefully address the problem of including ancillæ, and show that in this case the depth of the circuit must be at least $\log n$ to compute parity, no matter how many ancillæ are used. This is given in Section 3.
In Section 4, we allow circuits to include Toffoli gates of unbounded size. It is easiest to
see the log-depth lower bound in the case of zero ancillæ, so this result is given first, in
Theorem 4.3. We then explain how the proof yields a depth/ancillæ trade-off, showing that
with fewer ancillæ one needs greater depth to compute fanout.
We end with some open questions.
2 Preliminaries
In this section we set down most of our notational conventions and the circuit elements we
use. Some acquaintance with quantum computational complexity as described in [6] or [4]
is assumed.
The following notation and terminology will be convenient. Let $\mathcal{H}$ denote the 2-dimensional Hilbert space spanned by the computational basis states $|0\rangle, |1\rangle$. Let $\mathcal{H}_1, \ldots, \mathcal{H}_n$ be $n$ copies of $\mathcal{H}$. By $\mathcal{B}_{\{1,\ldots,n\}}$ (or simply $\mathcal{B}_n$ when the set notation is clearly understood) we denote the $2^n$-dimensional Hilbert space $\mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_n$ spanned by the usual set of computational basis states of the form $|x_1, \ldots, x_n\rangle$, where each $x_i \in \{0, 1\}$. We also consider “quotient spaces of $\mathcal{B}_{\{1,\ldots,n\}}$ over $m$ bits,” defined as $\mathcal{B}_{\{i_1,\ldots,i_m\}} = \mathcal{H}_{i_1} \otimes \cdots \otimes \mathcal{H}_{i_m}$, where $\{i_1, \ldots, i_m\} \subseteq \{1, \ldots, n\}$; such a space obviously has dimension $2^m$. A “state over a set of $m$ bits” is a state in such a quotient space. A quantum gate $G$ corresponds to a unitary operator (also denoted $G$) acting on some quotient space $\mathcal{B}_{\{i_1,\ldots,i_m\}}$ of $\mathcal{B}_n$. We say that $G$ involves the bits $i_1, \ldots, i_m$. We freely identify $G$ with any “extension by the identity” that acts on a bigger quotient space $\mathcal{B}_A$ for any set of bits $A \supseteq \{i_1, \ldots, i_m\}$; that is, $G$ can be identified with the operator $G \otimes I$, where $I$ is the identity on $\mathcal{B}_{A - \{i_1,\ldots,i_m\}}$. If we fix a state $|\Psi_m\rangle$ over $m$ bits $\{i_1, \ldots, i_m\}$, we are effectively restricting $\mathcal{B}_{\{1,\ldots,n\}}$ to the $2^{n-m}$-dimensional linear subspace $|\Psi_m\rangle \otimes \mathcal{B}_{\{1,\ldots,n\} - \{i_1,\ldots,i_m\}}$. The space $\mathcal{B}_{\{1,\ldots,n\} - \{i_1,\ldots,i_m\}}$ is referred to as the quotient space of $\mathcal{B}_{\{1,\ldots,n\}}$ complementary to $|\Psi_m\rangle$.
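To make “extension by the identity” concrete, the following pure-Python sketch builds the $2^n \times 2^n$ matrix of a gate $G$ that involves a chosen list of bit positions, with the identity on all other bits. It is our own illustration, not code from the paper: the helper names (`bits`, `index`, `extend`, `kron`), the 0-based positions, and the convention that position 0 is the leftmost (most significant) bit of $|x_1, \ldots, x_n\rangle$ are assumptions made for this example.

```python
def bits(i, n):
    """Basis index i -> tuple (x_1, ..., x_n); x_1 is the most significant bit."""
    return tuple((i >> (n - 1 - k)) & 1 for k in range(n))


def index(bs):
    """Inverse of bits(): a sequence of 0/1 values -> basis index."""
    i = 0
    for b in bs:
        i = 2 * i + b
    return i


def extend(gate, qubits, n):
    """Matrix of G (x) I on n bits: `gate` acts on the positions in `qubits`
    (0-based, in the order listed) and the identity acts on every other bit."""
    dim = 2 ** n
    rest = [k for k in range(n) if k not in qubits]
    full = [[0] * dim for _ in range(dim)]
    for r in range(dim):
        rb = bits(r, n)
        for c in range(dim):
            cb = bits(c, n)
            # The identity factor forces every untouched bit to be unchanged.
            if any(rb[k] != cb[k] for k in rest):
                continue
            full[r][c] = gate[index([rb[q] for q in qubits])][index([cb[q] for q in qubits])]
    return full


def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    m = len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(len(a) * m)]
            for i in range(len(a) * m)]


X = [[0, 1], [1, 0]]
I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# X on bit 1 of 2 bits is I (x) X; X on bit 0 is X (x) I.
assert extend(X, [1], 2) == kron(I2, X)
assert extend(X, [0], 2) == kron(X, I2)
# A CNOT whose control is bit 0 and target is bit 2 maps |100> to |101>.
assert extend(CNOT, [0, 2], 3)[index((1, 0, 1))][index((1, 0, 0))] == 1
```

Listing the positions in a different order, e.g. `extend(CNOT, [2, 0], 3)`, swaps the roles of control and target, which is why the sketch keeps `qubits` as an ordered list rather than a set.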
A single-qubit gate is a $2 \times 2$ unitary matrix (e.g., acting in $\mathcal{B}_{\{1\}}$). For example, the Hadamard gate $H$ is the single-qubit gate
$$H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$
A generalized Toffoli gate, which we refer to in this paper simply as a Toffoli gate $T$, transforms computational basis states as follows:
$$T|x_1, \ldots, x_n, b\rangle = |x_1, \ldots, x_n, b \oplus \textstyle\bigwedge_{i=1}^n x_i\rangle.$$
A generalized Z-gate, which we refer to as a Z-gate for brevity, has the following effect:
$$Z|x_1, \ldots, x_n\rangle = (-1)^{\bigwedge_{i=1}^n x_i}|x_1, \ldots, x_n\rangle.$$
It is not hard to show that $T = HZH$, where the Hadamard gate $H$ in this equation is applied to the target bit of $T$ and $Z$ acts on all the bits of $T$, including the target. Hence we may substitute Z-gates for T-gates in any circuit that allows Hadamard gates (which will be true throughout the paper). Z-gates are useful for our purposes since they do not permute computational basis states, and thus have no preferred target bit.
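A quick numerical check of $T = HZH$ can be sketched in plain Python (a check for small cases, not a proof). The sketch assumes the target is the last bit, so the Toffoli gate on all its bits exchanges only $|1\cdots10\rangle$ and $|1\cdots11\rangle$, while $Z$ negates only $|1\cdots1\rangle$; the function names are hypothetical.

```python
import math


def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    m = len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(len(a) * m)]
            for i in range(len(a) * m)]


def matmul(a, b):
    """Product of two square matrices of the same size."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def identity(dim):
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def z_gate(nbits):
    """Generalized Z on nbits bits: phase -1 on |1...1>, +1 elsewhere."""
    zmat = identity(2 ** nbits)
    zmat[-1][-1] = -1.0
    return zmat


def toffoli(nbits):
    """Generalized Toffoli on nbits bits; the last bit is the target."""
    t = identity(2 ** nbits)
    # Only |1...10> (index -2) and |1...11> (index -1) are exchanged.
    t[-2][-2] = t[-1][-1] = 0.0
    t[-2][-1] = t[-1][-2] = 1.0
    return t


S2 = 1 / math.sqrt(2)
H = [[S2, S2], [S2, -S2]]


def hzh(nbits):
    """Z conjugated by a Hadamard on the target (last) bit only."""
    h_target = kron(identity(2 ** (nbits - 1)), H)
    return matmul(h_target, matmul(z_gate(nbits), h_target))


def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


for nbits in range(1, 5):   # 0 to 3 control bits plus the target
    assert close(toffoli(nbits), hzh(nbits))
```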
The fanout gate $F$ and the parity gate $P$ are defined, respectively, by
$$F|x_1, \ldots, x_n, b\rangle = |b \oplus x_1, \ldots, b \oplus x_n, b\rangle,$$
$$P|x_1, \ldots, x_n, b\rangle = |x_1, \ldots, x_n, b \oplus \textstyle\bigoplus_{i=1}^n x_i\rangle.$$
There is no obvious a priori relation between these operators, but, as was observed by Moore, $F$ is conjugate to $P$ via an $(n+1)$-fold tensor product of Hadamard gates applied to all the bits:
$$F = H^{\otimes(n+1)} \, P \, H^{\otimes(n+1)}. \tag{1}$$
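Eq. (1) can likewise be spot-checked numerically for small $n$. In the sketch below (our illustration; helper names are hypothetical), $F$ and $P$ are built as permutation matrices directly from their action on basis states, with $b$ as the last bit, and compared with $H^{\otimes(n+1)} P H^{\otimes(n+1)}$.

```python
import math


def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    m = len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(len(a) * m)]
            for i in range(len(a) * m)]


def matmul(a, b):
    """Product of two square matrices of the same size."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def basis_matrix(action, nbits):
    """Matrix sending basis state |c> to |action(c)> (a permutation matrix)."""
    dim = 2 ** nbits
    m = [[0.0] * dim for _ in range(dim)]
    for c in range(dim):
        m[action(c)][c] = 1.0
    return m


def parity_matrix(n):
    """P|x_1..x_n, b> = |x_1..x_n, b XOR x_1 XOR ... XOR x_n>, with b the last bit."""
    def act(i):
        x, b = i >> 1, i & 1
        return (x << 1) | (b ^ (bin(x).count("1") % 2))
    return basis_matrix(act, n + 1)


def fanout_matrix(n):
    """F|x_1..x_n, b> = |x_1 XOR b, ..., x_n XOR b, b>, with b the last bit."""
    all_ones = (1 << n) - 1
    def act(i):
        x, b = i >> 1, i & 1
        return ((x ^ all_ones if b else x) << 1) | b
    return basis_matrix(act, n + 1)


def hadamard_all(nbits):
    """H tensored with itself nbits times."""
    h = 1 / math.sqrt(2)
    out = [[1.0]]
    for _ in range(nbits):
        out = kron(out, [[h, h], [h, -h]])
    return out


def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


for n in range(1, 4):
    hh = hadamard_all(n + 1)
    assert close(fanout_matrix(n), matmul(hh, matmul(parity_matrix(n), hh)))
```

The identity holds because conjugating a CNOT by Hadamards on both of its bits exchanges control and target, and $P$ is a product of commuting CNOTs from each $x_i$ into $b$.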
Recall that the Hadamard, phase, CNOT (the Toffoli gate with $n = 1$), and $\pi/8$ gates form a universal set of gates, in that any unitary operator can be approximated to arbitrary precision with them. Our lower bound techniques work against arbitrary sets of single-qubit gates combined with Z-gates, which is also a universal set by the above discussion.
A quantum circuit is constructed out of layers. Each layer $L$ is a tensor product of gates from a certain fixed set (in our main theorems, these will be single-qubit gates and Z-gates). A circuit $C$ is simply a (matrix) product of layers $L_1 L_2 \cdots L_d$. (Observe that the “last” layer $L_d$ is actually the one that is applied directly to the inputs, and $L_1$ is the output layer.) The number of layers $d$ is called the depth of $C$. A circuit $C$ over $n$ qubits is then a unitary operator on the $2^n$-dimensional Hilbert space $\mathcal{B}_{\{1,\ldots,n\}}$. Clearly, $C$ computes a unitary operator $U$ exactly if $C|x_1, \ldots, x_n\rangle = U|x_1, \ldots, x_n\rangle$ for all computational basis states. This is in general too restrictive, however. One must allow for the presence of “work bits,” called ancillæ, that make extra space available in which to do a computation. In that case, to compute the operator $U$ exactly we extend the Hilbert space on which $C$ acts to the $2^{n+m}$-dimensional space spanned by computational basis states $|x_1, \ldots, x_n, a_1, \ldots, a_m\rangle$, where again $x_i, a_i \in \{0, 1\}$, the $a_i$ serving as ancillæ. Then we say that $C$ cleanly computes $U$ if, for any $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$,
$$\langle y_1, \ldots, y_n, 0, \ldots, 0|C|x_1, \ldots, x_n, 0, \ldots, 0\rangle = \langle y_1, \ldots, y_n, 0, \ldots, 0|(U \otimes I)|x_1, \ldots, x_n, 0, \ldots, 0\rangle,$$
where $I$ is the identity on the ancillæ and the number of 0s in each state above is $m$. That is, $C$ does a clean computation if the ancillæ begin and end all as 0s.
We assume all of our circuits perform clean computations. This is a reasonable constraint,
since only then is it easy to compose the circuits.
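The definition of a clean computation can be turned into a direct test on matrices. The sketch below is an illustration under our own conventions (bits ordered $|x_1, \ldots, x_n, a_1, \ldots, a_m\rangle$, hypothetical function names): it compares $\langle y, 0^m|C|x, 0^m\rangle$ with $\langle y|U|x\rangle$ for all basis states. Because $U$ is unitary, these equalities already force $C|x, 0^m\rangle = (U|x\rangle) \otimes |0^m\rangle$. The example uses one ancilla as scratch space to apply the phase $(-1)^x$, i.e. the Pauli $Z$ on one bit.

```python
def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    m = len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(len(a) * m)]
            for i in range(len(a) * m)]


def matmul(a, b):
    """Product of two square matrices of the same size."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def cleanly_computes(C, U, n, m, tol=1e-9):
    """True if <y, 0^m| C |x, 0^m> == <y| U |x> for all n-bit basis states x, y.
    Bits are ordered |x_1..x_n, a_1..a_m>, so |x, 0^m> has index x * 2**m."""
    for x in range(2 ** n):
        for y in range(2 ** n):
            if abs(C[y << m][x << m] - U[y][x]) > tol:
                return False
    return True


I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
# CNOT with the data bit (bit 0) as control and the ancilla (bit 1) as target.
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# Copy x into the ancilla, apply the phase there, then uncompute the copy.
clean = matmul(CNOT, matmul(kron(I2, Z), CNOT))
# Same, but without uncomputing: the ancilla is left holding x.
dirty = matmul(kron(I2, Z), CNOT)

assert cleanly_computes(clean, Z, n=1, m=1)
assert not cleanly_computes(dirty, Z, n=1, m=1)
```

The second circuit fails because the ancilla is left holding a copy of $x$, which is exactly what the clean-computation requirement rules out.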
Lastly, all circuits should be understood to be elements of an infinite family of circuits $\{C_n \mid n \ge 0\}$, where $C_n$ is a quantum circuit for $n$ qubits.
3 Fanout Requires Log Depth with Bounded Size Gates
It is easy to see that, by an obvious divide-and-conquer strategy, we can compute parity in logarithmic depth using just CNOT gates and 0 ancillæ. In this section we prove that this is optimal for any bounded-size multi-qubit gates, and furthermore that no number of ancillæ helps to reduce the depth of the circuit.

[Figure 1: Decomposition of the layers of the circuit C.]
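One concrete divide-and-conquer schedule can be sketched as follows (our choice of construction, not necessarily the one the authors had in mind). Putting $b$ on wire 0 and $x_1, \ldots, x_n$ on wires $1, \ldots, n$, a CNOT reduction tree with $\lceil \log_2(n+1) \rceil$ layers XORs every wire into wire 0; since wire 0 is never a control, replaying the tree's other gates in reverse restores the inputs. The result computes $P$ exactly with 0 ancillæ in at most $2\lceil \log_2(n+1) \rceil - 1$ layers, i.e., in $O(\log n)$ depth. The code checks this by classical simulation, which suffices here because CNOT circuits merely permute computational basis states.

```python
import math
from itertools import product


def parity_schedule(n):
    """CNOT layers on wires 0..n; wire 0 holds b and wires 1..n hold x_1..x_n.
    Each gate is a (control, target) pair; gates in one layer are disjoint."""
    wires = n + 1
    levels = math.ceil(math.log2(wires))
    reduce_layers = []
    for level in range(levels):
        step = 1 << level
        # Fold the block starting at j + step into the block starting at j.
        reduce_layers.append([(j + step, j) for j in range(0, wires, 2 * step)
                              if j + step < wires])
    # Wire 0 is never a control, so undoing the gates that do not target wire 0,
    # in reverse order, restores x_1..x_n without touching wire 0.
    undo_layers = [[g for g in layer if g[1] != 0] for layer in reversed(reduce_layers)]
    return reduce_layers + [layer for layer in undo_layers if layer]


def run(layers, wires):
    """Classically simulate CNOT layers on a list of 0/1 wire values."""
    wires = list(wires)
    for layer in layers:
        for control, target in layer:
            wires[target] ^= wires[control]
    return wires


for n in range(1, 10):
    layers = parity_schedule(n)
    for layer in layers:
        touched = [w for gate in layer for w in gate]
        assert len(touched) == len(set(touched))      # a valid layer
    assert len(layers) <= 2 * math.ceil(math.log2(n + 1)) - 1
    for state in product((0, 1), repeat=n + 1):
        b, xs = state[0], state[1:]
        out = run(layers, state)
        assert out[0] == b ^ (sum(xs) % 2) and tuple(out[1:]) == xs
```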
Let $C = L_1 \cdots L_d$ consist entirely of arbitrary two-qubit gates and single-qubit gates. (The extension to gates of arbitrary but fixed size is straightforward.) Further suppose that $M$ is an observable on a single qubit at the output of the circuit, i.e., after the output layer $L_1$. Let $L'_1$ denote the gate whose output $M$ is measuring; $L'_1$ could be a two-qubit or a single-qubit gate. In either case, $L_1 = L'_1 \otimes R_1$, where $R_1$ is the tensor product of all the other gates in that layer, if any. More generally, we decompose layer $i$ similarly, writing $L_i = L'_i \otimes R_i$, where $L'_i$ is a transformation that acts on some subset of the bits and $R_i$ acts on the rest.
Lemma 3.1. For each $d$, there are layers $L'_1, \ldots, L'_d$ such that
$$L_d^\dagger L_{d-1}^\dagger \cdots L_1^\dagger M L_1 \cdots L_{d-1} L_d = {L'_d}^\dagger {L'_{d-1}}^\dagger \cdots {L'_1}^\dagger M L'_1 \cdots L'_{d-1} L'_d,$$
where, for each $i$, $L'_i$ acts on at most $2^i$ bits. Furthermore, for each $i$, $L'_i$ acts on bits with indices in some set $S_i$ such that $S_d \supseteq S_{d-1} \supseteq \cdots \supseteq S_1$.
Figure 1 makes the notation a little clearer. Note that the input will, as usual, be on the
left, but it doesn’t enter the claim (or the following argument) at all.
Proof: The proof of Lemma 3.1 is by induction on $d$. First consider $d = 1$ and the operator $L_1^\dagger M L_1$. By the observations above, we may write $L_1 = L'_1 \otimes R_1$, where $L'_1$ is either a single-qubit or a two-qubit gate. So
$$L_1^\dagger M L_1 = ({L'_1}^\dagger \otimes R_1^\dagger) M (L'_1 \otimes R_1) = {L'_1}^\dagger M L'_1,$$
by virtue of the fact that $M$ and $R_1$ act on disjoint bits, so that $M$ commutes with $R_1$ and $R_1^\dagger R_1 = I$. Since $L'_1$ acts on at most 2 qubits, this establishes the result for $d = 1$.
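The base case can be checked numerically on a small example. In the sketch below (our illustration; names are hypothetical), layer $L_1 = L'_1 \otimes R_1$ consists of a CNOT on bits 0 and 1 and an arbitrary single-qubit unitary $R_1$ on bit 2, and $M$ is the Pauli $Z$ observable on bit 1. Conjugating $M$ by the whole layer gives the same operator as conjugating by $L'_1$ alone, which here is $Z \otimes Z$ on bits 0 and 1, tensored with the identity on bit 2.

```python
import math


def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    m = len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(len(a) * m)]
            for i in range(len(a) * m)]


def matmul(a, b):
    """Product of two square matrices of the same size."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def dagger(a):
    """Conjugate transpose of a square matrix."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a))]


def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


I2 = [[1, 0], [0, 1]]
PZ = [[1, 0], [0, -1]]                       # Pauli Z, used as the observable M
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
t = 0.7                                      # arbitrary angle for R_1
R1 = [[math.cos(t), -1j * math.sin(t)], [-1j * math.sin(t), math.cos(t)]]

layer = kron(CNOT, R1)                       # L_1 = L'_1 (bits 0, 1) (x) R_1 (bit 2)
m_full = kron(kron(I2, PZ), I2)              # M measures bit 1, an output of L'_1

lhs = matmul(dagger(layer), matmul(m_full, layer))
reduced = matmul(dagger(CNOT), matmul(kron(I2, PZ), CNOT))   # L'_1^dagger M L'_1
rhs = kron(reduced, I2)

assert close(lhs, rhs)                       # R_1 has cancelled out
assert close(reduced, kron(PZ, PZ))          # the observable now spans 2 bits
```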
Now suppose that we can write
$$L_d^\dagger L_{d-1}^\dagger \cdots L_1^\dagger M L_1 \cdots L_{d-1} L_d = {L'_d}^\dagger {L'_{d-1}}^\dagger \cdots {L'_1}^\dagger M L'_1 \cdots L'_{d-1} L'_d,$$
where, for each $i$, $L'_i$ acts on at most $2^i$ bits. In particular, note that $L'_d$ acts on at most $2^d$ bits. Suppose that $L'_d$ acts on indices in the set $S_d$ (where $|S_d| \le 2^d$). Now by the induction hypothesis,
$$L_{d+1}^\dagger L_d^\dagger \cdots L_1^\dagger M L_1 \cdots L_d L_{d+1} = L_{d+1}^\dagger {L'_d}^\dagger \cdots {L'_1}^\dagger M L'_1 \cdots L'_d L_{d+1},$$
and $S_d \supseteq S_{d-1} \supseteq \cdots \supseteq S_1$.
The gates in $L'_d$ involve at most the bits in $S_d$. Since the circuit contains only gates on at most two qubits, the gates in $L_{d+1}$ that involve bits in $S_d$ together act on at most $2^{d+1}$ bits. Let the tensor product of these gates be denoted by $L'_{d+1}$, and let $S_{d+1}$ denote the set of bits on which $L'_{d+1}$ acts. Clearly $S_{d+1} \supseteq S_d$. Then for some tensor product $R_{d+1}$ of single- and two-qubit gates we may write $L_{d+1} = L'_{d+1} \otimes R_{d+1}$. Since $R_{d+1}$ acts on bits not in $S_{d+1}$, it commutes with $M$ and with all the $L'_i$, which act only on bits inside $S_{d+1}$. Hence $R_{d+1}$ “cancels out” and we have the desired relation.
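The set growth in Lemma 3.1 is a backward “light cone,” and it is easy to simulate. The sketch below is our illustration, with hypothetical names and randomly generated layers of one- and two-qubit gates on 64 bits: it starts from the measured bit and, layer by layer from the output side, adds every bit that shares a gate with the current set. It checks that $|S_i| \le 2^i$ and that, when $d < \log_2$ of the number of bits, some bit is left outside $S_d$, which is the fact used in the proof of Theorem 3.2.

```python
import random


def random_layer(nbits, rng):
    """A random layer covering every bit: mostly disjoint two-qubit gates,
    with the leftover bits given single-qubit gates."""
    order = list(range(nbits))
    rng.shuffle(order)
    layer, i = [], 0
    while i < nbits:
        if i + 1 < nbits and rng.random() < 0.8:
            layer.append((order[i], order[i + 1]))
            i += 2
        else:
            layer.append((order[i],))
            i += 1
    return layer


def light_cone_sizes(layers, output_bit):
    """Return [|S_1|, ..., |S_d|] for layers listed from the output side (L_1 first):
    S_i is S_{i-1} plus every bit that shares a gate in L_i with a bit of S_{i-1}."""
    s, sizes = {output_bit}, []
    for layer in layers:
        grown = set(s)
        for gate in layer:
            if s.intersection(gate):
                grown.update(gate)
        s = grown
        sizes.append(len(s))
    return sizes


rng = random.Random(2003)
nbits = 64
for _ in range(50):
    sizes = light_cone_sizes([random_layer(nbits, rng) for _ in range(8)], output_bit=0)
    assert all(size <= 2 ** (i + 1) for i, size in enumerate(sizes))   # |S_i| <= 2^i
    assert sizes[4] < nbits   # depth 5 < log2(64) = 6 leaves some bit outside S_5
```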
Theorem 3.2. Let $C$ be a quantum circuit on $n$ inputs of depth $d$, consisting of single-qubit and two-qubit gates, with any number of ancillæ, that cleanly computes parity exactly. Then $d \ge \log n$. If $C$ computes fanout in the same way, then $d \ge \log n - 2$.
Proof: Let $C = L_1 \cdots L_d$ as in Lemma 3.1. Suppose $C$ uses $m$ ancillæ and cleanly computes the parity operator $P$ in depth $d < \log n$. It follows that for any $x_1, \ldots, x_n, b$ and any measurement operator $M$ on the target bit,
$$\langle x_1, \ldots, x_n, b, 0, \ldots, 0|C^\dagger M C|x_1, \ldots, x_n, b, 0, \ldots, 0\rangle = \langle x_1, \ldots, x_n, b|P M P|x_1, \ldots, x_n, b\rangle. \tag{2}$$
By Lemma 3.1,
$$C^\dagger M C = L_d^\dagger L_{d-1}^\dagger \cdots L_1^\dagger M L_1 \cdots L_{d-1} L_d = {L'_d}^\dagger {L'_{d-1}}^\dagger \cdots {L'_1}^\dagger M L'_1 \cdots L'_{d-1} L'_d,$$
where the operator $L'_1 \cdots L'_d$ acts on at most $2^d$ bits. Since $2^d < n$, there is an input bit $x_i$ on which that operator does not act. Hence the value on the left-hand side of eq. (2) remains unchanged if we flip $x_i$. However, the outcome of measuring the target of the parity gate on the right-hand side depends on every input, which is a contradiction.
The second assertion in the theorem follows from eq. (1): surrounding a fanout circuit of depth $d$ with two layers of Hadamard gates yields a parity circuit of depth $d + 2$, so $d + 2 \ge \log n$.
It is clear that a similar proof works for any family of circuits that uses a fixed set of multi-qubit gates whose arity is independent of $n$. Thus we have the following corollary of the proof of Theorem 3.2:
Corollary 3.3. Let C be a quantum circuit on n inputs of depth d, consisting of single-
qubit and multi-qubit gates of size O(1), with any number of ancillæ, that cleanly computes
parity, or fanout, exactly. Then d = Ω(log n).
4 Parity Requires Log Depth with Few Ancillæ
In this section we treat circuits that contain Toffoli gates or, equivalently, Z-gates, of arbitrary size (i.e., size that can depend on $n$). The technique of the preceding section does not work in this case, because the large gates in general do not cancel: they may not commute with the measurement operator $M$.
To see how to proceed, it is useful to briefly consider classical circuits with similar constraints. Suppose we have a classical circuit with NOT gates and unbounded fan-in AND and OR gates, but that we do not allow any fanout. Once inputs (or outputs of other gates) are used in either an AND or an OR gate, they cannot be used again. It is obvious that if such a circuit has constant depth, it cannot compute functions such as parity. The AND and OR gates can be killed off by restricting a small set of inputs, resulting in a constant function, while parity depends on all the inputs.
In the quantum case, it appears again that the only thing to do is to attempt to “kill
off” the large Toffoli gates. However, the quantum case is much more subtle since we must
face the fact that intermediate states are a superposition of computational basis states,
and furthermore that the Z-gates, in combination with the single-qubit gates, may cause
entanglement.
As before, write $C = L_1 L_2 \cdots L_d$. Thus the circuit $C$ transforms the state $|\Psi\rangle$ to $L_1 \cdots L_d|\Psi\rangle$. We assume without loss of generality that each layer $L_i$ is a tensor product of Z-gates and single-qubit gates. Further assume without loss of generality that a specific bit (say, the $n$th bit) of $C$ serves as the output or target bit (which eventually is supposed to agree with the output bit of a parity gate).
Our main technical lemma is easiest to see in the case that $C$ has no ancillæ, which we assume until later in the section:
Lemma 4.1. Let $C$ be a circuit as described above, with no ancillæ. Then for each $1 \le k \le d$, there exists a state $|\Psi_k\rangle$ over at most $2^k$ bits such that for any state $|R\rangle$ in the quotient space of $\mathcal{B}_n$ complementary to $|\Psi_k\rangle$, the state $L_1 L_2 \cdots L_k(|R\rangle \otimes |\Psi_k\rangle)$ has a 0 in the target position of $C$.
Proof: The proof is by induction on $k$. First let $k = 1$. There are two cases:
1. In layer $L_1$, the target is the output of a single-qubit gate $S$. Then let $|\Psi_1\rangle = S^\dagger|0\rangle$ over the $n$th bit. Now we may write $L_1 = L'_1 \otimes S$, where $L'_1$ acts on the quotient space $\mathcal{R}$ complementary to $|\Psi_1\rangle$. No matter what state $|R\rangle \in \mathcal{R}$ we choose over the bits $\{1, \ldots, n-1\}$, it follows that $L_1(|R\rangle \otimes |\Psi_1\rangle) = (L'_1|R\rangle) \otimes (S|\Psi_1\rangle) = (L'_1|R\rangle) \otimes |0\rangle$ has a 0 in the $n$th position.
2. In layer $L_1$, the target is the output of a Z-gate. Write $L_1 = L'_1 \otimes G$, where $G$ is this Z-gate. In this case, we choose $|\Psi_1\rangle = |0\rangle$ over the $n$th bit. Now $G$ acts both on $|\Psi_1\rangle$ and on the complementary quotient space $\mathcal{R}$ (via extension by the identity). But since $G$ involves a bit that is 0 (namely, the $n$th bit), $G$ acts as the identity on $\mathcal{R} \otimes |\Psi_1\rangle$. Hence for any state $|R\rangle \in \mathcal{R}$, $L_1(|R\rangle \otimes |\Psi_1\rangle) = (L'_1 \otimes G)(|R\rangle \otimes |\Psi_1\rangle) = (L'_1|R\rangle) \otimes |0\rangle$ again has a 0 in the $n$th position.
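The mechanism in case 2, and in the “killing” of Z-gates below, is that a Z-gate with one of its bits in $|0\rangle$ can never see the all-ones pattern, so it acts as the identity no matter what the other bits hold, even if they are entangled. The sketch below (our illustration; names are hypothetical, and position 0 is the leftmost bit) checks this on a random 4-bit state and contrasts it with the case where that bit is $|1\rangle$.

```python
import random


def apply_z(state, qubits, nbits):
    """Generalized Z on the listed bit positions (0 = leftmost): negate every
    amplitude whose basis index has all of those bits equal to 1."""
    out = list(state)
    for i in range(len(state)):
        if all((i >> (nbits - 1 - q)) & 1 for q in qubits):
            out[i] = -out[i]
    return out


def insert_bit(rest, position, value, nbits):
    """Tensor the basis state |value> at bit `position` into `rest`, a state on
    the other nbits - 1 bits (kept in their original order)."""
    low_width = nbits - 1 - position          # bits to the right of `position`
    out = [0j] * (2 ** nbits)
    for r, amp in enumerate(rest):
        high, low = r >> low_width, r & ((1 << low_width) - 1)
        out[(high << (low_width + 1)) | (value << low_width) | low] = amp
    return out


rng = random.Random(7)
nbits = 4
rest = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2 ** (nbits - 1))]
norm = sum(abs(amp) ** 2 for amp in rest) ** 0.5
rest = [amp / norm for amp in rest]          # a generic, typically entangled 3-bit state

z_bits = [0, 2, 3]                           # a Z-gate that involves bit 2
killed = insert_bit(rest, position=2, value=0, nbits=nbits)
assert apply_z(killed, z_bits, nbits) == killed       # bit 2 is |0>: identity

live = insert_bit(rest, position=2, value=1, nbits=nbits)
assert apply_z(live, z_bits, nbits) != live           # bit 2 is |1>: not identity
```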
[Figure 2: The sets $K_k$ and $R_k$. A Z gate that involves bits in both sets is shown.]
Now suppose the assertion is true for $k-1$, where $k > 1$. We will show that it remains true for $k$. Suppose the state $|\Psi_{k-1}\rangle$ in the assertion is a state over the (at most) $2^{k-1}$ bits in the set $K_{k-1}$. Let $R_{k-1}$ denote the rest of the bits, $\{1, \ldots, n\} - K_{k-1}$. Thus $|\Psi_{k-1}\rangle$ is a state in $\mathcal{B}_{K_{k-1}}$, and the quotient space complementary to $|\Psi_{k-1}\rangle$ is $\mathcal{B}_{R_{k-1}}$, which for convenience we denote by $\mathcal{R}_{k-1}$. We specify the state $|\Psi_k\rangle$ as follows. Start with $K_k := K_{k-1}$ and $R_k := R_{k-1}$. If a Z-gate $G$ in $L_k$ involves bits both in $K_k$ and in $R_k$, we remove a single bit on which $G$ acts from $R_k$, add it to $K_k$, declare the gate $G$ killed, and remove $G$ from further consideration. Continue until all such Z-gates have been killed. Since each bit in $K_{k-1}$ can be involved in at most one Z-gate in $L_k$, the number of bits added to $K_k$ (and removed from $R_k$) in this process is at most $2^{k-1}$. Let $L_k^{(K)}$ denote the gates in $L_k$ that involve the bits in $K_k$, excluding the Z-gates that have been killed. Finally, we define the state $|\Psi_k\rangle$ as the tensor product of ${L_k^{(K)}}^\dagger|\Psi_{k-1}\rangle$ with the state in which all the bits in $K_k - K_{k-1}$ are 0.
Note that $|\Psi_k\rangle$ is a state over at most $2 \cdot 2^{k-1} = 2^k$ bits, as seen in Figure 2.
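Let $\mathcal{R}_k$ denote the quotient space complementary to $|\Psi_k\rangle$. Clearly, $\mathcal{R}_k = \mathcal{B}_{R_k}$. Now let $|R\rangle$ be any state in $\mathcal{R}_k$ (equivalently, over the bits in $R_k$), and apply $L_k$ to $|R\rangle \otimes |\Psi_k\rangle$. Let $L_k^{(R)}$ denote the gates in $L_k$ acting in $\mathcal{R}_k$, again excluding the Z-gates that have been killed. Note that any Z-gate in layer $L_k$ that involves bits in $K_k$ as well as $R_k$ acts as the identity on $\mathcal{R}_k \otimes |\Psi_k\rangle$, since by the construction of $|\Psi_k\rangle$ one of its bits is 0. Thus we have eliminated these gates from $L_k$ without any loss of generality, and
$$L_k(|R\rangle \otimes |\Psi_k\rangle) = (L_k^{(R)} \otimes L_k^{(K)})(|R\rangle \otimes |\Psi_k\rangle) = (L_k^{(R)}|R\rangle) \otimes (L_k^{(K)}|\Psi_k\rangle).$$
Now $L_k^{(K)}|\Psi_k\rangle$ is the tensor product of $|\Psi_{k-1}\rangle$ with a number of $|0\rangle$ states, one for each bit of $K_k - K_{k-1}$. Since $R_{k-1} = R_k \cup (K_k - K_{k-1})$, we conclude that $L_k(|R\rangle \otimes |\Psi_k\rangle)$ is of the form $|R'\rangle \otimes |\Psi_{k-1}\rangle$ for some state $|R'\rangle \in \mathcal{R}_{k-1}$, namely $L_k^{(R)}|R\rangle$ together with those $|0\rangle$ bits. Then
$$L_1 L_2 \cdots L_{k-1} L_k(|R\rangle \otimes |\Psi_k\rangle) = L_1 L_2 \cdots L_{k-1}(|R'\rangle \otimes |\Psi_{k-1}\rangle).$$
By the induction hypothesis, the right-hand side of the above equation has a 0 target bit, which proves the lemma.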
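The bookkeeping in this proof (which bits end up in $K_k$) can be simulated without tracking the states themselves. The sketch below is our illustration under stated assumptions: layers are listed from the output side ($L_1$ first), each gate is a pair (kind, bits) with hypothetical kinds `"Z"` and `"U"` (single-qubit), and the induction is seeded at $k = 0$ with $K_0$ equal to the target alone. That seeding gives the same $2^k$ bound; the base case in the text is slightly tighter. The code checks $|K_k| \le 2|K_{k-1}|$ and $|K_k| \le 2^k$ on random circuits.

```python
import random


def random_layer(nbits, rng):
    """A random layer of disjoint gates covering every bit: ("Z", bits) for
    groups of 2-4 bits and ("U", (bit,)) for single-qubit gates."""
    order = list(range(nbits))
    rng.shuffle(order)
    layer, i = [], 0
    while i < nbits:
        chunk = tuple(order[i:i + rng.choice([1, 2, 3, 4])])
        layer.append(("Z" if len(chunk) > 1 else "U", chunk))
        i += len(chunk)
    return layer


def committed_sets(layers, target):
    """Bookkeeping from the proof of Lemma 4.1 (bit sets only, no states).
    layers[0] is L_1, the output layer. Returns [K_0, K_1, ..., K_d]."""
    k_set = {target}
    history = [set(k_set)]
    for layer in layers:
        previous = set(k_set)
        for kind, gate in layer:
            outside = [b for b in gate if b not in k_set]
            if kind == "Z" and outside and len(outside) < len(gate):
                # G straddles K_k and R_k: kill it by moving one of its
                # R-bits into K_k, where that bit will be fixed to |0>.
                k_set.add(min(outside))
        # Gates in a layer are disjoint and each killed gate touches K_{k-1},
        # so at most |K_{k-1}| bits can have been added.
        assert len(k_set) <= 2 * len(previous)
        history.append(set(k_set))
    return history


rng = random.Random(1)
nbits = 200
for _ in range(20):
    layers = [random_layer(nbits, rng) for _ in range(6)]
    history = committed_sets(layers, target=nbits - 1)
    assert all(len(k) <= 2 ** i for i, k in enumerate(history))
    assert len(history[-1]) < nbits          # some input bit is left free
```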
Remark. With a bit more careful analysis, Lemma 4.1 can be improved to the following:
Lemma 4.2. Let $C$ be a circuit as described above. Then for each $1 \le k \le d$, there exists a state $|\Psi_k\rangle$ over at most $2^{k/2}$ bits such that for any state $|R\rangle$ in the quotient space of $\mathcal{B}_n$ complementary to $|\Psi_k\rangle$, the state $L_1 L_2 \cdots L_k(|R\rangle \otimes |\Psi_k\rangle)$ has a 0 in the target position of $C$.
The difference is that now $|\Psi_k\rangle$ is over only $2^{k/2}$ bits instead of $2^k$. Instead of giving a formal proof, we will just sketch the reasons for Lemma 4.2. When some bit (the $i$th bit, say) is moved from $R_k$ to $K_k$, it is set to the $|0\rangle$ state. Consider the gate $G$ (if any) in $L_{k+1}$ that involves this bit. If $G$ is a single-qubit gate, then no Z-gate involving the $i$th bit is killed, so no additional bit needs to be added to $K_{k+1}$ for the sake of the $i$th bit. If $G$ is a Z-gate, then the $i$th bit alone is enough to kill $G$, since this bit is already 0. So again, no additional bit must be added to $K_{k+1}$ to kill $G$. Thus $k$ must increase by 2 for the size of $K_k$ to double. Note that we handled the base case of Lemma 4.1 this way, obtaining a state over $1 = 2^0$ bits.
Theorem 4.3. Let $C$ be a circuit of depth $d$ that consists of single-qubit gates and Z-gates and uses 0 ancillæ. If $d < 2\log n$, then $C$ cannot compute $P$.
Proof: Suppose C = P . Then for any input state, the target bit of C is 0 iff the target
bit of P is 0. By Lemma 4.2, there exists a state |Ψ〉 on at most 2d/2 < n bits such that, for
any state |R〉 on the remaining n− 2d/2 bits, C(|R〉⊗ |Ψ〉) has a 0 value for the target. First
let |R〉 be the state with 0s in all n−2d/2 positions (since n−2d/2 > 0, such positions exist).
Then P (|R〉⊗ |Ψ〉) has a 0 target. This is only possible if the state |Ψ〉 is in a quotient space
of Bn spanned by computational basis states in which an even number of the variables are
1. Now change one of the bits of |R〉 from 0 to 1. The target of C(|R〉 ⊗ |Ψ〉) still has the
value 0, but the target of P (|R〉 ⊗ |Ψ〉) must change to 1, which contradicts the assumption
that C = P .
Since fanout and parity are equivalent up to depth 3 (with 0 ancillæ), we immediately have the following.
Corollary 4.4. Let $C$ be a circuit of depth $d$ that consists of single-qubit gates and Z-gates and uses 0 ancillæ. Then, if $d < 2\log n - 2$, $C$ cannot compute the fanout operation.
We now consider the case in which our circuit has a non-zero number of ancillæ. First, it is clear that Lemmas 4.1 and 4.2 still work if we set the target and all the ancillæ to 0 at the same time. If there are $a$ ancillæ, then we are setting $a + 1$ “outputs.” The conclusion of the analogous lemma for $a$ ancillæ would then be that the state $|\Psi\rangle$ is over at most $(a+1)2^{d/2}$ bits (since the number of “committed” bits doubles with every second layer, as in Lemma 4.2). These bits may include all the ancillæ, and, assuming that $C$ does a clean computation, $|\Psi\rangle$ will be 0 on the ancillæ (since they must all start out as 0 in order to return to their final value of 0). Therefore, if $n > (a+1)2^{d/2}$, the state $|R\rangle$ does not involve any of the ancillæ and is thus free to take on any value. Thus, if $n > (a+1)2^{d/2}$, the output of $C$ is insensitive to changes in at least one of the inputs, and hence the circuit is defeated as before. Note that we obtain a depth/ancillæ trade-off as a result. We thus have the following corollary of the proof of Theorem 4.3:
Corollary 4.5. Let $C$ be a circuit of depth $d$ consisting of single-qubit gates and Z-gates. Then, if $C$ cleanly computes the parity function with $a$ ancillæ, $d \ge 2\log(n/(a+1))$.
We conjecture that $d$ must be at least $2\log n$ no matter what $a$ is.
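The trade-off in Corollary 4.5 is easy to tabulate. The helper below (a hypothetical name, our illustration) returns the smallest integer $d$ with $(a+1)\,2^{d/2} \ge n$, i.e. the least depth not excluded by the argument above; squaring both sides keeps the computation in exact integer arithmetic.

```python
def depth_lower_bound(n, a):
    """Smallest integer d with (a + 1) * 2**(d / 2) >= n, i.e. d >= 2 log2(n / (a + 1)).
    Both sides are squared to stay in exact integer arithmetic:
    (a + 1)**2 * 2**d >= n**2."""
    d = 0
    while (a + 1) ** 2 * 2 ** d < n ** 2:
        d += 1
    return d


# With n = 2**20 inputs: 40 layers with no ancillae, 20 with 2**10 - 1 of them.
assert depth_lower_bound(2 ** 20, 0) == 40
assert depth_lower_bound(2 ** 20, 2 ** 10 - 1) == 20
# Once a + 1 >= n the bound says nothing.
assert depth_lower_bound(1000, 999) == 0
```

The bound shrinks by 2 every time the number of ancillæ (plus one) doubles, which is why it stays non-trivial only while $a$ is sublinear in $n$.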
We offer an alternative interpretation of our result that arose out of conversations with L. Longpré. Let us say that a quantum circuit $C$ robustly computes a unitary operator $U$ if $C$ computes $U$ cleanly and, in addition, its output is insensitive to the initial state of the ancillæ. Thus the ancillæ of $C$ can start out in any state whatsoever; the circuit $C$ is guaranteed to return the ancillæ to that state at the end, and always gives the same answer. This of course puts a much stronger constraint on the circuit (since in the usual model we only insist on a clean computation when the ancillæ are initialized to 0), but such circuits can be useful (e.g., see Exercise 8.5 in Kitaev et al. [4]). It is not hard to see that in this case, if $C$ consists only of single-qubit and Toffoli gates, then it must have depth $\Omega(\log n)$ to compute parity, regardless of the number of ancillæ.
5 Conclusions and Open Problems
Following the line of earlier work of Green et al. [2], Høyer and Spalek [3], and Cleve and Watrous [1], our main result gives an optimal $\Omega(\log n)$ lower bound on the depth of QAC-type circuits computing fanout in the presence of a limited (slightly sublinear) number of ancillæ. It would clearly be desirable to extend our result to obtain the same conclusion when polynomially many (or an unlimited number of) ancillæ are allowed, and thus to prove that $\mathrm{QAC}^0 \ne \mathrm{QAC}^0_{\mathrm{wf}}$.
The role of ancillæ in quantum computation has not received much detailed attention. Our considerations here prompt several interesting questions. One issue is the necessity of ancillæ for specific quantum computations or classes of quantum computations. Is there a problem that can be solved in constant depth with ancillæ but that requires $\log n$ depth without ancillæ? Similarly, are there computational problems for which $\log n$ depth is possible with ancillæ but polynomial depth is needed without them? In general, how many ancillæ are needed for specific problems? Is there a general trade-off that can be proved between the number of ancillæ and circuit depth?
While much has recently been learned about constant-depth circuit classes, a few interesting questions still remain. It would be worthwhile to be able to distinguish among the powers of different quantum gates of unbounded arity. We have seen that Toffoli and Z-gates (which are equivalent up to constant depth) are weaker than parity and fanout (which are equivalent not only to each other but also, for all intents and purposes, to other mod gates, threshold gates, and the quantum Fourier transform). Are there other natural types of gates that lie between these two classes, or is every gate equivalent, up to constant depth, to single-qubit and CNOT gates, to Toffoli gates, or to parity? It would also be of interest to characterize exactly what can be computed in constant depth using only single-qubit and CNOT gates, since, even very optimistically, this is the kind of circuit that might be built in the not too distant future.
6 Acknowledgements
We thank Luc Longpré for helpful discussions and comments on this paper. This work
was supported in part by the National Security Agency (NSA) and Advanced Research
and Development Agency (ARDA) under Army Research Office (ARO) contract numbers
DAAD 19-02-1-0058 (for M. Fang, S. Homer, and F. Green) and DAAD 19-02-1-0048 (for
S. Fenner and Y. Zhang).
References
[1] R. Cleve and J. Watrous, “Fast parallel circuits for the quantum Fourier transform,”
Proceedings of the 41st Annual Symposium on Foundations of Computer Science (2000),
526–536.
[2] F. Green, S. Homer, C. Moore, and C. Pollett, “Counting, Fanout and the Complexity of Quantum ACC,” Quantum Information and Computation 2 (2002) 35–65.
[3] P. Høyer and R. Spalek, “Quantum circuits with unbounded fan-out,” 20th STACS Conference, 2003, LNCS 2607, 234–246.
[4] A. Yu. Kitaev, A. H. Shen, and M. N. Vyalyi, Classical and Quantum Computation,
American Mathematical Society, 2002.
[5] C. Moore, “Quantum Circuits: Fanout, Parity, and Counting,” Los Alamos preprint archive (1999), quant-ph/9903046.
[6] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information,
Cambridge University Press, 2001.
[7] A. A. Razborov, “Lower bounds on the size of bounded depth networks over a complete basis with logical addition,” Matematicheskie Zametki 41 (1987) 598–607. English translation in Mathematical Notes of the Academy of Sciences of the USSR 41 (1987) 333–338.
[8] R. Smolensky, “Algebraic methods in the theory of lower bounds for Boolean circuit complexity,” Proceedings of the 19th Annual ACM Symposium on Theory of Computing (1987) 77–82.
