Quantum Lower Bounds for Fanout

M. Fang*, S. Fenner†, F. Green‡, S. Homer*, Y. Zhang†

November 26, 2003

Boston University, OpenBU (http://open.bu.edu)
Computer Science, CAS: Computer Science: Technical Reports, 2004-01-12
https://hdl.handle.net/2144/1530
Abstract: We prove several new lower bounds for constant depth quantum circuits. The
main result is that parity (and hence fanout) requires log depth circuits, when the circuits are
composed of single qubit and arbitrary size Toffoli gates, and when they use only constantly
many ancillæ. Under this constraint, this bound is close to optimal. In the case of a
non-constant number $a$ of ancillæ and $n$ input qubits, we give a tradeoff between $a$ and the
required depth, that results in a non-trivial lower bound for fanout when $a = n^{1-o(1)}$.
1 Introduction
There has been significant recent progress in understanding the power of constant depth
quantum circuits. Such circuits are of considerable interest as the first quantum circuits will
certainly be small circuits with limited gates and constant depth. Much of the progress in this
area has been in showing that constant depth circuits are more powerful than their classical
counterparts. However, these and other upper bounds seem to require the presence of a
(reversible) quantum fanout gate. A fanout gate takes an arbitrary number of bits and fans
out one of them by taking its XOR with each of the others. Here we consider the question of
whether fanout gates are necessary for these upper bounds. We prove several lower bounds
showing that fanout cannot be computed using only generalized (i.e., unbounded size) Toffoli
and single qubit gates when the number of extra work bits (ancillæ) that the circuit uses is
limited.
Fanout gates have proved to be unexpectedly powerful. Moore [7] first observed that
fanout gates and parity gates, in the presence of single qubit gates using 0 ancillæ, are
equivalent up to depth 3. This was extended by Green et al. [4]: fanout is even equivalent
to any $\mathrm{MOD}_q$ function (for $q \ge 2$), which determines if the number of 1's in the
input is not divisible by $q$. Here the equivalence is again up to constant depth, but using
$O(n)$ ancillæ. One may interpret this result by defining quantum circuit classes analogous to
classical constant-depth circuit classes. For example, a reasonable analog of the classical
unbounded fanin and fanout class $\mathrm{AC}^0$ is $\mathrm{QAC}^0_{wf}$, the class of
constant depth quantum circuit families composed of single qubit, generalized Toffoli, and
fanout gates. (Here the subscript "wf" denotes "with fanout.") Similarly one may define quantum
analogs of $\mathrm{ACC}(q)$ (called $\mathrm{QACC}(q)$) and $\mathrm{ACC}$ (called
$\mathrm{QACC}$). Thus the equivalence of fanout with $\mathrm{MOD}_q$ implies that, for any
$q > 2$, $\mathrm{QAC}^0_{wf} = \mathrm{QACC}(q) = \mathrm{QACC}$. Contrast this with the fact
that $\mathrm{AC}^0 \neq \mathrm{ACC}$ (Furst, Saxe and Sipser [3]), and, for any distinct primes
$q, p$, $\mathrm{ACC}(q) \neq \mathrm{ACC}(p)$ [9, 11]. More recently, Høyer and Špalek [5] have
improved these results by proving these same $\mathrm{QAC}^0_{wf}$ circuits can compute
threshold functions. Thus $\mathrm{QAC}^0_{wf} = \mathrm{QTC}^0$, an even sharper contrast
with the classical classes. Indeed, this result implies that we can approximate the quantum
fast Fourier transform in constant depth using fanout. Thus the "quantum part" of Shor's
renowned quantum factoring algorithm [10] can be carried out with a quite simple, constant
depth quantum circuit that uses the fanout operator.

* M. Fang and S. Homer: Computer Science Department, Boston University, Boston, MA 02215, {heroes|homer}@bu.edu
† S. Fenner and Y. Zhang: Dept. of CS and Eng., University of South Carolina, Columbia, SC 29208, {fenner|zhang29}@cse.sc.edu
‡ F. Green: Dept. of Math and CS, Clark University, Worcester, MA 01610, fgreen@black.clarku.edu

These results suggest the following question: Is fanout really necessary to do the quantum
Fourier transform in constant depth? While so much can be "reduced" to fanout, it is far from
clear how much can be reduced to fanin, even in what appears to be its weakest form (i.e., the
generalized Toffoli gate). Although generalized Toffoli gates can involve just as many bits as
fanout gates, they may be more feasible to implement and it is instructive to investigate their
power in constant-depth circuits. Note that Cleve and Watrous [1] proved that with only
one and two qubit gates it is not possible to approximate the quantum Fourier transform in
less than log depth, but no similar lower bounds against quantum circuits containing gates
of unbounded size are known.
Our main result, proved in Section 4, is that one cannot compute parity (and hence
fanout) with $\mathrm{QAC}^0$ circuits (i.e., in constant depth, without fanout) using a constant
number of ancillæ. This is the first hard evidence that $\mathrm{QAC}^0$ and
$\mathrm{QAC}^0_{wf}$ may be different, and that fanout may be necessary for all the upper bound
results mentioned above (it certainly is if we limit our computations to only constantly many
ancillæ). The issue of the necessity of ancillæ in quantum computations is a murky one. It is
generally accepted that a limited number (polynomially many relative to the number of inputs)
are needed. This seems reasonable as it allows polynomially extra space in which to carry out a
computation. However, it is possible to approximate any unitary operator with a small set of
universal gates without ancillæ (although one apparently needs circuits of great depth and size
in order to do so). Furthermore, to our knowledge, no systematic investigation into the absolute
necessity of ancillæ has been done. They play a crucial role in the present result, in which we
find the lower bound to be difficult to obtain when more than sublinearly many ancillæ are
allowed.

To help clarify this problem, in Section 3 we provide a proof (implicitly stated in Cleve
and Watrous [1]) that quantum circuits with gates of bounded size must be of log depth to
compute parity (and hence fanout) exactly. In particular, we carefully address the problem
of including ancillæ, and show that in this case the depth of the circuit must be at least
$\log n$ to compute parity, no matter how many ancillæ are used. This proof serves as a
revealing, though considerably simpler, warm-up to our main theorem in Section 4. In Section 4
we consider circuits which include Toffoli gates of unbounded size. It is easiest to see the
log-depth lower bound in the case of zero ancillæ, so this result is given first, in Theorem 4.3.
We then explain how the proof yields a depth/ancillæ trade-off, showing that with fewer
ancillæ one needs greater depth to compute fanout.

We end with some open questions.
2 Preliminaries
In this section we set down most of our notational conventions and the circuit elements we
use. Some acquaintance with quantum computational complexity as described in [8] or [6]
is assumed.
The following notation and terminology will be convenient. Let $\mathcal{H}$ denote the
2-dimensional Hilbert space spanned by the computational basis states $|0\rangle, |1\rangle$.
Let $\mathcal{H}_1, \ldots, \mathcal{H}_n$ be $n$ copies of $\mathcal{H}$. By
$\mathcal{B}_{\{1,\ldots,n\}}$ (or simply "$\mathcal{B}_n$" when the set notation is clearly
understood) we denote the $2^n$-dimensional Hilbert space
$\mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_n$ spanned by the usual set of computational
basis states of the form $|x_1, \ldots, x_n\rangle$, where each $x_i \in \{0, 1\}$. We also
consider "quotient spaces of $\mathcal{B}_{\{1,\ldots,n\}}$ over $m$ bits," defined as
$\mathcal{B}_{\{i_1,\ldots,i_m\}} = \mathcal{H}_{i_1} \otimes \cdots \otimes \mathcal{H}_{i_m}$,
where $\{i_1, \ldots, i_m\} \subseteq \{1, \ldots, n\}$, which obviously have dimension $2^m$.
A "state over a set of $m$ bits" is a state in such a quotient space. A quantum gate $G$
corresponds to a unitary operator (also denoted $G$) acting on some quotient space
$\mathcal{B}_{\{i_1,\ldots,i_m\}}$ of $\mathcal{B}_n$. We will say that $G$ involves the bits
$i_1, \ldots, i_m$. We will freely identify $G$ with any "extension by the identity" that acts
on a bigger quotient space $\mathcal{B}_A$ for any set of bits
$A \supseteq \{i_1, \ldots, i_m\}$, that is, $G$ can be identified with the operator
$G \otimes I$, where $I$ is the identity on $\mathcal{B}_{A - \{i_1,\ldots,i_m\}}$. If we fix a
state $|\Psi_m\rangle$ over $m$ bits $\{i_1, \ldots, i_m\}$, we are effectively restricting
$\mathcal{B}_{\{1,\ldots,n\}}$ to the $2^{n-m}$-dimensional linear subspace
$|\Psi_m\rangle \otimes \mathcal{B}_{\{1,\ldots,n\} - \{i_1,\ldots,i_m\}}$. The space
$\mathcal{B}_{\{1,\ldots,n\} - \{i_1,\ldots,i_m\}}$ is referred to as the quotient space of
$\mathcal{B}_{\{1,\ldots,n\}}$ complementary to $|\Psi_m\rangle$.
A single-qubit gate is a $2 \times 2$ unitary matrix (e.g., acting in $\mathcal{B}_{\{1\}}$).
For example, the Hadamard gate $H$ is the single-qubit gate,
\[ H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \]
A generalized Toffoli gate, which we refer to in this paper as simply a Toffoli gate $T$,
transforms computational basis states as follows:
\[ T|x_1, \ldots, x_n, b\rangle = |x_1, \ldots, x_n, b \oplus \textstyle\bigwedge_{i=1}^{n} x_i\rangle. \]
A generalized Z-gate, which we refer to as a Z-gate for brevity, has the following effect:
\[ Z|x_1, \ldots, x_n\rangle = (-1)^{\bigwedge_{i=1}^{n} x_i} |x_1, \ldots, x_n\rangle. \]
It is not hard to show that $T = HZH$, where the Hadamard gate $H$ in this equation is
applied to the target bit $b$ of $T$. Hence we may substitute Z-gates for T-gates in any circuit
that allows Hadamards (which will be true throughout the paper). Z-gates are useful for
our purposes since they are bosonic (that is, completely symmetric over their bits), and thus
have no preferred target bit.
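The relation $T = HZH$ can be checked numerically for small $n$. The following is a minimal sketch in plain Python, not part of the original paper; the helper names (`toffoli`, `z_gate`, `hadamard_on_target`, `check_t_equals_hzh`) and the convention that the target bit $b$ is the lowest-order bit of the basis-state index are assumptions made for illustration.

```python
import math

def matmul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def toffoli(n):
    """Generalized Toffoli on controls x_1..x_n and target b (b is the lowest bit)."""
    dim = 2 ** (n + 1)
    controls = dim - 2                      # bitmask with every control bit set
    T = [[0.0] * dim for _ in range(dim)]
    for s in range(dim):
        t = s ^ 1 if (s & controls) == controls else s   # flip b iff all x_i = 1
        T[t][s] = 1.0
    return T

def z_gate(m):
    """Generalized Z-gate on m bits: multiplies |1...1> by -1, fixes all other basis states."""
    dim = 2 ** m
    Z = [[0.0] * dim for _ in range(dim)]
    for s in range(dim):
        Z[s][s] = -1.0 if s == dim - 1 else 1.0
    return Z

def hadamard_on_target(n):
    """Identity on the n control bits tensored with H on the target (lowest) bit."""
    dim = 2 ** (n + 1)
    h = 1 / math.sqrt(2)
    M = [[0.0] * dim for _ in range(dim)]
    for s in range(dim):
        for t in (s & ~1, s | 1):           # same controls, target bit 0 or 1
            M[t][s] = -h if (s & 1 and t & 1) else h
    return M

def check_t_equals_hzh(n):
    """True if H Z H (H on the target only) equals the Toffoli gate on n controls."""
    Ht = hadamard_on_target(n)
    hzh = matmul(matmul(Ht, z_gate(n + 1)), Ht)
    T = toffoli(n)
    return max(abs(a - b) for ra, rb in zip(hzh, T) for a, b in zip(ra, rb)) < 1e-12
```

Calling `check_t_equals_hzh(n)` for a few small values of `n` exercises the identity. The check works because $H Z H = X$ on the target whenever all controls are 1, and $H I H = I$ otherwise.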
The fanout gate $F$ and the parity gate $P$ are defined, respectively, by
\[ F|x_1, \ldots, x_n, b\rangle = |b \oplus x_1, \ldots, b \oplus x_n, b\rangle, \]
\[ P|x_1, \ldots, x_n, b\rangle = |x_1, \ldots, x_n, b \oplus \textstyle\bigoplus_{i=1}^{n} x_i\rangle. \]
There is no obvious a priori relation between these operators, but as was observed by Moore,
$F$ is conjugate to $P$ via an $(n+1)$-fold tensor product of Hadamards applied to all the bits:
\[ F = H^{\otimes(n+1)} P H^{\otimes(n+1)}. \tag{1} \]
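Equation (1) can likewise be confirmed by brute force on a few qubits. The sketch below is an illustration only, under the assumption that the bit $b$ is stored in the lowest-order bit of the basis-state index; the function names are hypothetical and do not come from the paper.

```python
def matmul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def hadamard_all(m):
    """H tensored m times: entry (r, c) is (-1)^(r . c) / 2^(m/2)."""
    dim = 2 ** m
    norm = 2 ** (-m / 2)
    return [[norm * (-1) ** bin(r & c).count("1") for c in range(dim)] for r in range(dim)]

def permutation(dim, image):
    """Matrix of the basis-state map s -> image(s)."""
    M = [[0.0] * dim for _ in range(dim)]
    for s in range(dim):
        M[image(s)][s] = 1.0
    return M

def parity_gate(n):
    """P|x, b> = |x, b xor (x_1 xor ... xor x_n)>, with b in the lowest bit."""
    return permutation(2 ** (n + 1), lambda s: s ^ (bin(s >> 1).count("1") & 1))

def fanout_gate(n):
    """F|x, b> = |b xor x_1, ..., b xor x_n, b>, with b in the lowest bit."""
    x_mask = 2 ** (n + 1) - 2               # every bit except b
    return permutation(2 ** (n + 1), lambda s: s ^ x_mask if s & 1 else s)

def check_identity(n):
    """True if F equals H^{(n+1)} P H^{(n+1)} up to rounding error."""
    H = hadamard_all(n + 1)
    lhs = fanout_gate(n)
    rhs = matmul(matmul(H, parity_gate(n)), H)
    return max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb)) < 1e-9
```

The underlying reason is that conjugating a CNOT by Hadamards on both of its bits exchanges control and target, and $P$ is a product of CNOTs that all share the target $b$.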
Recall that Hadamard, phase, CNOT (Toffoli gates for $n = 1$), and $\pi/8$ gates are a
universal set of gates in that any unitary operator can be approximated to an arbitrary
degree of precision with them. Our lower bound techniques work against arbitrary single-qubit
gates combined with Z-gates, which together also form a universal set by the above
discussion.
A quantum circuit is constructed out of layers. Each layer $L$ is a tensor product of a
certain fixed set of gates (in our main theorems, these will consist of single-qubit and
Z-gates). A circuit is simply a (matrix) product of layers $L_1 L_2 \cdots L_d$. (Observe that
the "last" layer $L_d$ is actually the one that is applied directly to the inputs, and $L_1$ is
the output layer.) The number of layers $d$ is called the depth of $C$. A circuit $C$ over $n$
qubits is then a unitary operator in the $2^n$-dimensional Hilbert space
$\mathcal{B}_{\{1,\ldots,n\}}$. Clearly, $C$ computes a unitary operator $U$ exactly if for all
computational basis states, $C|x_1, \ldots, x_n\rangle = U|x_1, \ldots, x_n\rangle$. This is
in general too restrictive, however. One must allow for the presence of "work bits," called
ancillæ, that make extra space available in which to do a computation. In that case, in
order to exactly compute the operator $U$ we extend the Hilbert space in which $C$ acts to the
$2^{n+m}$-dimensional space spanned by computational basis states
$|x_1, \ldots, x_n, a_1, \ldots, a_m\rangle$, where again $x_i, a_i \in \{0, 1\}$, the $a_i$
serving as ancillæ. Then we say that $C$ cleanly computes $U$ if, for any $x_1, \ldots, x_n$ and
$y_1, \ldots, y_n$,
\[ \langle y_1, \ldots, y_n, 0, \ldots, 0|\,C\,|x_1, \ldots, x_n, 0, \ldots, 0\rangle
   = \langle y_1, \ldots, y_n, 0, \ldots, 0|\,(U \otimes I)\,|x_1, \ldots, x_n, 0, \ldots, 0\rangle, \]
where $I$ is the identity in the subspace that acts on the ancillæ, and the number of 0's in
each state above is $m$. That is, $C$ does a clean computation if the ancillæ begin and end
all as 0's. We assume all of our circuits perform clean computations. This is a reasonable
constraint, since only then is it easy to compose the circuits.

Lastly, all circuits should be understood to be elements of an infinite family of circuits
$\{C_n \mid n \ge 0\}$, where $C_n$ is a quantum circuit for $n$ qubits.
3 Fanout Requires Log Depth with Bounded Size Gates
It is easy to see that, by an obvious divide-and-conquer strategy, we can compute parity in
depth $\log n$ using just CNOT gates and 0 ancillæ. In this section we prove this is optimal
for any bounded size multi-qubit gates, and furthermore that no number of ancillæ helps to
reduce the depth of the circuit.
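The divide-and-conquer strategy can be sketched as a classical simulation, since CNOT gates map computational basis states to computational basis states. The code below is one possible construction and not necessarily the one the authors have in mind: bit 0 plays the role of the target $b$, bits $1, \ldots, n$ are the inputs, and the inputs are restored by replaying earlier layers in reverse. This simple version uses at most $2\lceil \log_2(n+1) \rceil - 1$ layers, which is $O(\log n)$ but a constant factor away from the depth quoted above. The names `parity_layers`, `run` and `verify` are hypothetical.

```python
from itertools import product
from math import ceil, log2

def parity_layers(n):
    """CNOT layers, each a list of (control, target) pairs on bits 0..n.

    After all layers, bit 0 holds b xor x_1 xor ... xor x_n and bits 1..n are unchanged.
    """
    m = n + 1
    compute = []
    s = 1
    while s < m:
        # pair bit i with bit i + s; the pairs within one layer are disjoint
        compute.append([(i + s, i) for i in range(0, m - s, 2 * s)])
        s *= 2
    # undo every change made to bits other than bit 0, replaying layers in reverse order
    uncompute = [[(c, t) for (c, t) in layer if t != 0] for layer in reversed(compute[:-1])]
    return [layer for layer in compute + uncompute if layer]

def run(layers, bits):
    """Apply the CNOT layers to a classical bit string and return the result."""
    bits = list(bits)
    for layer in layers:
        touched = [q for gate in layer for q in gate]
        assert len(touched) == len(set(touched))   # a layer acts on disjoint bits
        for control, target in layer:
            bits[target] ^= bits[control]
    return bits

def verify(n):
    """Check correctness on every basis state and the depth bound stated above."""
    layers = parity_layers(n)
    for bits in product((0, 1), repeat=n + 1):
        out = run(layers, bits)
        if out[0] != sum(bits) % 2 or out[1:] != list(bits[1:]):
            return False
    return len(layers) <= 2 * ceil(log2(n + 1)) - 1
```

For example, `parity_layers(7)` consists of three layers that accumulate the XOR tree into bit 0 followed by two layers that restore the inputs.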
The intuition behind the proof of the next Lemma seems quite obvious. Namely, if a
depth $d$ circuit is composed of only one- and two-qubit gates, then any output qubit of the
circuit can depend on at most $2^d$ input qubits. However, as is often the case in this field, a
formal proof of this fact is less obvious than first appears, and the techniques we use here
form the basis for the proof of the lower bound theorem of the next section.
Let $C = L_1 \cdots L_d$ consist entirely of arbitrary two-qubit gates and single-qubit gates.
(The extension to arbitrary, but fixed, size gates is straightforward.) Further suppose that
$M$ is an observable on a single qubit in the last layer. Let $L'_1$ denote the gate whose output
$M$ is measuring. $L'_1$ could be a two-qubit or a single-qubit gate. In either case,
$L_1 = L'_1 \otimes R_1$, where $R_1$ is the tensor product of all the other gates in that layer,
if any. More generally, we decompose layer $i$ similarly, writing $L_i = L'_i \otimes R_i$,
where $L'_i$ is a transformation that acts on some subset of the bits, and $R_i$ acts on the rest.
Lemma 3.1. For each $d$, there are layers $L'_1, \ldots, L'_d$ such that
\[ L_d^{\dagger} L_{d-1}^{\dagger} \cdots L_1^{\dagger} M L_1 \cdots L_{d-1} L_d
   = L'^{\dagger}_d L'^{\dagger}_{d-1} \cdots L'^{\dagger}_1 M L'_1 \cdots L'_{d-1} L'_d \]
where, for each $i$, $L'_i$ acts on at most $2^i$ bits. Furthermore, for each $i$, $L'_i$ acts on
bits with indices in some set $S_i$ such that
$S_d \supseteq S_{d-1} \supseteq \cdots \supseteq S_1$.
Figure 1 makes the notation a little clearer. Note that the input will, as usual, be on the
left, but it doesn't enter the claim (or the following argument) at all.

Figure 1: Decomposition of the layers of the circuit $C$. (Diagram not reproduced; its labels
are $L_1$, $L_d$, $L'_1$, $L'_d$, $R_1$, $R_d$ and $M$.)

Proof: The proof of Lemma 3.1 is by induction on $d$. First consider $d = 1$. Then consider
the operator $L_1^{\dagger} M L_1$. By the observations above, we may write
$L_1 = L'_1 \otimes R_1$, where $L'_1$ is either a single or two-qubit gate. So,
\[ L_1^{\dagger} M L_1 = (L'^{\dagger}_1 \otimes R_1^{\dagger})\, M\, (L'_1 \otimes R_1)
   = L'^{\dagger}_1 M L'_1, \]
by virtue of the fact that $M$ and $R_1$ commute. Since $L'_1$ only depends on $\le 2$ qubits,
this establishes the result for $d = 1$.
Now suppose that we can write,
\[ L_d^{\dagger} L_{d-1}^{\dagger} \cdots L_1^{\dagger} M L_1 \cdots L_{d-1} L_d
   = L'^{\dagger}_d L'^{\dagger}_{d-1} \cdots L'^{\dagger}_1 M L'_1 \cdots L'_{d-1} L'_d \]
where, for each $i$, $L'_i$ acts on at most $2^i$ bits. In particular, note that $L'_d$ acts on
at most $2^d$ bits. Suppose that $L'_d$ acts on indices in the set $S_d$ (where $S_d$ has size
$\le 2^d$). Now by the induction hypothesis,
\[ L_{d+1}^{\dagger} L_d^{\dagger} \cdots L_1^{\dagger} M L_1 \cdots L_d L_{d+1}
   = L_{d+1}^{\dagger} L'^{\dagger}_d \cdots L'^{\dagger}_1 M L'_1 \cdots L'_d L_{d+1}, \]
and $S_d \supseteq S_{d-1} \supseteq \cdots \supseteq S_1$.
The gates in $L'_d$ involve at most the bits in $S_d$. Since the circuit only contains at most
two-qubit gates, all the gates in $L_{d+1}$ involving bits in $S_d$ can act on at most $2^{d+1}$
bits. Let the tensor product of these gates be denoted by $L'_{d+1}$, and let $S_{d+1}$ denote
the set of bits on which $L'_{d+1}$ acts. Clearly $S_{d+1} \supseteq S_d$. Then for some tensor
product of single and two-qubit gates $R_{d+1}$ we may write
$L_{d+1} = L'_{d+1} \otimes R_{d+1}$. Since $R_{d+1}$ acts on bits not in $S_{d+1}$, it
commutes with all the $L'_i$ and $M$, which only act on bits inside $S_{d+1}$. Hence $R_{d+1}$
"cancels out" and we have the desired relation.
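The sets $S_1 \subseteq S_2 \subseteq \cdots \subseteq S_d$ built in this proof can be computed mechanically from the wiring of a circuit. The sketch below is an illustration under assumed conventions: `layers[0]` is the output layer $L_1$, each gate is represented only by the tuple of qubit indices it acts on, and `light_cone` is a hypothetical name. It tracks which bits the reduced layers $L'_i$ touch, not the operators themselves.

```python
def light_cone(layers, measured):
    """Return [S_1, ..., S_d] for an observable on the qubit `measured`.

    layers[0] is the output layer L_1; each gate is a tuple of qubit indices.
    """
    cone = {measured}
    cones = []
    for layer in layers:
        grown = set(cone)
        for gate in layer:
            if cone.intersection(gate):     # gate touches a bit the observable depends on
                grown.update(gate)
        cone = grown
        cones.append(set(cone))
    return cones

# A depth-3 binary tree of two-qubit gates on 8 qubits reaches every input ...
tree = [[(0, 4)], [(0, 2), (4, 6)], [(0, 1), (2, 3), (4, 5), (6, 7)]]
tree_sizes = [len(s) for s in light_cone(tree, 0)]

# ... while a depth-2 circuit on the same qubits can reach at most 4 of them.
shallow = [[(0, 1), (2, 3)], [(1, 2), (3, 4)]]
shallow_sizes = [len(s) for s in light_cone(shallow, 0)]
```

With two-qubit gates the cone can at most double per layer, which is the bound $|S_i| \le 2^i$ used in the lemma; the tree circuit attains it.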
Theorem 3.2. Let $C$ be a quantum circuit on $n$ inputs of depth $d$, consisting of single-qubit
and two-qubit gates, with any number of ancillæ, that cleanly computes parity exactly. Then
$d \ge \log n$. If $C$ computes fanout in the same way, then $d \ge \log n - 2$.
Proof: Let $C = L_1 \cdots L_d$ as in Lemma 3.1. Suppose $C$ uses $m$ ancillæ, and that it
cleanly computes the parity operator $P$ in depth $d < \log n$. It follows that for any
$x_1, \ldots, x_n, b$ and any measurement operator $M$ on the target bit,
\[ \langle x_1, \ldots, x_n, b, 0, \ldots, 0|\,C^{\dagger} M C\,|x_1, \ldots, x_n, b, 0, \ldots, 0\rangle
   = \langle x_1, \ldots, x_n, b|\,P M P\,|x_1, \ldots, x_n, b\rangle. \tag{2} \]
By Lemma 3.1,
\[ C^{\dagger} M C = L_d^{\dagger} L_{d-1}^{\dagger} \cdots L_1^{\dagger} M L_1 \cdots L_{d-1} L_d
   = L'^{\dagger}_d L'^{\dagger}_{d-1} \cdots L'^{\dagger}_1 M L'_1 \cdots L'_{d-1} L'_d, \]
where the operator $L'_1 \cdots L'_d$ acts on at most $2^d$ inputs. Since $2^d < n$, there is an
input $x_i$ on which that operator does not act. Hence the value on the left hand side of
eq. (2) remains unchanged if we flip that $x_i$. However, the outcome of the measurement on the
parity gate on the right hand side depends on every input, which is a contradiction.

The second assertion in the Theorem follows from eq. (1).
It is clear that if we have a family of circuits that use a fixed set of multi-qubit gates
with arity independent of $n$, a similar proof will work. Thus we have the following as a
corollary of the proof of Theorem 3.2:

Corollary 3.3. Let $C$ be a quantum circuit on $n$ inputs of depth $d$, consisting of
single-qubit and multi-qubit gates of size $O(1)$, with any number of ancillæ, that cleanly
computes parity, or fanout, exactly. Then $d = \Omega(\log n)$.
4 Parity Requires Log Depth with Few Ancillæ
In this section we treat circuits that contain Toffoli gates or, equivalently, Z-gates of
arbitrary size (i.e., that can depend on $n$). The technique of the preceding section does not
work in this case. This is because the large gates in general do not cancel, since they may not
commute with the measurement operator $M$.
To see how to proceed, it is useful to briefly consider classical circuits with similar
constraints. Suppose we have a classical circuit with NOT gates and unbounded fan-in AND
and OR gates, but that we do not allow any fanout. Once inputs (or outputs of other gates)
are used in either an AND or an OR gate, they can not be used again. It is obvious that
if such a circuit has constant depth, it cannot compute such functions as parity. The AND
and OR gates can be killed off by restricting a small set of inputs, resulting in a constant
function, while parity depends on all the inputs.

In the quantum case, it appears again that the only thing to do is to attempt to "kill
off" the large Toffoli gates. However, the quantum case is much more subtle since we must
face the fact that intermediate states are a superposition of computational basis states,
and furthermore that the Z-gates, in combination with the single-qubit gates, may cause
entanglement.
As before, write $C = L_1 L_2 \cdots L_d$. Thus the circuit $C$ transforms the state
$|\Psi\rangle$ to $L_1 \cdots L_d |\Psi\rangle$. We assume without loss of generality that each
layer $L_i$ is a tensor product of Z-gates and single-qubit gates. Further assume without loss
of generality that a specific bit (say, the $n$th bit) of $C$ serves as the output or target bit
(which eventually is supposed to agree with the output bit of a parity gate).

Our main technical lemma is easiest to see in the case that $C$ has no ancillæ, which we
assume until later in the section.
Lemma 4.1. Let $C$ be a circuit as described above, with no ancillæ. Then for each
$1 \le k \le d$, there exists a state $|\Psi_k\rangle$ over at most $2^k$ bits such that for any
state $|R\rangle$ in the quotient space of $\mathcal{B}_n$ complementary to $|\Psi_k\rangle$,
the state $L_1 L_2 \cdots L_k (|R\rangle \otimes |\Psi_k\rangle)$ has a 0 in the target
position of $C$.
Proof: The proof is by induction on $k$. First let $k = 1$. There are two cases:

1. In layer $L_1$, the target is the output of a single-qubit gate $S$. Then let the state
   $|\Psi_1\rangle = S^{\dagger}|0\rangle$ over the $n$th bit. Now we may write
   $L_1 = L'_1 \otimes S$, where $L'_1$ acts on the quotient space $\mathcal{R}$ complementary
   to $|\Psi_1\rangle$. No matter what state $|R\rangle \in \mathcal{R}$ we choose over the bits
   $\{1, \ldots, n-1\}$, it follows that
   $L_1(|R\rangle \otimes |\Psi_1\rangle) = (L'_1|R\rangle) \otimes (S|\Psi_1\rangle) = (L'_1|R\rangle) \otimes |0\rangle$
   has a 0 in the $n$th position.

2. In layer $L_1$, the target is the output of a Z-gate. Write $L_1 = L'_1 \otimes G$, where
   $G$ is this Z-gate. In this case, we choose $|\Psi_1\rangle = |0\rangle$ over the $n$th bit.
   Now $G$ acts both on $|\Psi_1\rangle$ as well as the complementary quotient space
   $\mathcal{R}$ (via extension by the identity). But since $G$ involves a bit that is 0 (i.e.,
   the $n$th bit), $G$ is equivalent to the unit matrix in $\mathcal{R}$. Hence for any state
   $|R\rangle \in \mathcal{R}$,
   $L_1(|R\rangle \otimes |\Psi_1\rangle) = (L'_1 \otimes G)(|R\rangle \otimes |\Psi_1\rangle) = (L'_1|R\rangle) \otimes |0\rangle$
   again has a 0 in the $n$th position. (Note that $L'_1|R\rangle$ is well defined by extending
   $L'_1$ by the identity.)

Now suppose the assertion is true for $k-1$ where $k > 1$. We will show that it remains true
for $k$. Suppose the $|\Psi_{k-1}\rangle$ in the assertion is a state over the (at most)
$2^{k-1}$ bits in the set $K_{k-1}$. Let $R_{k-1}$ denote the rest of the bits
$\{1, \ldots, n\} - K_{k-1}$. Thus $|\Psi_{k-1}\rangle$ is a state in $\mathcal{B}_{K_{k-1}}$,
and the quotient space complementary to $|\Psi_{k-1}\rangle$ is $\mathcal{B}_{R_{k-1}}$, which
for convenience we denote by $\mathcal{R}_{k-1}$. We specify the state $|\Psi_k\rangle$ as
follows: Start with $K_k := K_{k-1}$ and $R_k := R_{k-1}$. If a Z-gate $G$ in $L_k$ involves
bits both in $K_k$ and in $R_k$, we remove a single bit from $R_k$ involved with $G$, add it to
$K_k$, declare the gate $G$ killed, and remove $G$ from further consideration. Continue until
all such Z-gates have been killed. Since each bit in $K_{k-1}$ can be involved with at most one
Z-gate in $L_k$, the number of bits added to $K_k$ (and removed from $R_k$) in this process is
at most $2^{k-1}$. Let $L_k^{(K)}$ denote the gates in $L_k$ that involve the bits in $K_k$,
excluding the Z-gates that have been killed. Then finally, we define the state $|\Psi_k\rangle$
as the tensor product of $L_k^{(K)\dagger}|\Psi_{k-1}\rangle$ with the state in which all the
bits in $K_k - K_{k-1}$ are 0.
Figure 2: The sets $K_k$ and $R_k$. A Z gate that involves bits in both sets is shown.
(Diagram not reproduced.)

Note that $|\Psi_k\rangle$ is a state over at most $2 \cdot 2^{k-1} = 2^k$ bits, as seen in
Figure 2. Let $\mathcal{R}_k$ denote the quotient space complementary to $|\Psi_k\rangle$.
Clearly, $\mathcal{R}_k = \mathcal{B}_{R_k}$. Now let $|R\rangle$ be any state in
$\mathcal{R}_k$ (equivalently, over the bits in $R_k$), and apply $L_k$ to
$|R\rangle \otimes |\Psi_k\rangle$. Let $L_k^{(R)}$ denote the gates in $L_k$ acting in
$\mathcal{R}_k$, again excluding the Z-gates that have been killed. Note that any Z-gate in
layer $L_k$ that involves bits in $K_k$ as well as $R_k$ acts as the identity on
$\mathcal{R}_k \otimes |\Psi_k\rangle$, by the construction of $|\Psi_k\rangle$. Thus we have
eliminated these gates from $L_k$ without any loss of generality. Thus,
\[ L_k(|R\rangle \otimes |\Psi_k\rangle)
   = (L_k^{(R)} \otimes L_k^{(K)})(|R\rangle \otimes |\Psi_k\rangle)
   = (L_k^{(R)}|R\rangle) \otimes (L_k^{(K)}|\Psi_k\rangle). \]
Now $L_k^{(K)}|\Psi_k\rangle$ is the tensor product of $|\Psi_{k-1}\rangle$ with a number of
$|0\rangle$ states. So we conclude that $L_k(|R\rangle \otimes |\Psi_k\rangle)$ is of the form
$|R'\rangle \otimes |\Psi_{k-1}\rangle$ for some state $|R'\rangle \in \mathcal{R}_{k-1}$. Then,
\[ L_1 L_2 \cdots L_{k-1} L_k (|R\rangle \otimes |\Psi_k\rangle)
   = L_1 L_2 \cdots L_{k-1} (|R'\rangle \otimes |\Psi_{k-1}\rangle). \]
By the induction hypothesis, the right hand side of the above equation has a 0 target bit,
which proves the lemma.
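The bookkeeping in the induction step, namely how the set $K_k$ of committed bits grows from layer to layer, can be sketched as follows. This is an illustration under assumed conventions, not the authors' code: `layers[0]` is the output layer $L_1$, a gate is the tuple of bits it involves, tuples of length 1 stand for single-qubit gates, longer tuples stand for Z-gates, and `committed_sets` is a hypothetical name. The sketch tracks only the index sets $K_k$, not the states $|\Psi_k\rangle$.

```python
def committed_sets(layers, target):
    """Return [K_1, ..., K_d] following the killing procedure of Lemma 4.1.

    layers[0] is the output layer L_1; each gate is a tuple of bit indices.
    """
    K = {target}                            # K_1: only the target bit is committed
    history = [set(K)]
    for layer in layers[1:]:                # layers L_2, ..., L_d
        previous = set(K)                   # K_{k-1}
        for gate in layer:
            outside = sorted(q for q in gate if q not in previous)
            straddles = 0 < len(outside) < len(gate)
            if len(gate) >= 2 and straddles:
                K.add(outside[0])           # commit one bit of R_k to 0: the gate is killed
        history.append(set(K))
    return history

# Hypothetical 6-qubit example with target bit 5.
example = [
    [(0, 1, 2, 3, 4, 5)],                   # L_1
    [(0, 1, 2), (3, 4, 5)],                 # L_2
    [(5, 0), (1, 2, 3, 4)],                 # L_3
]
K_sets = committed_sets(example, 5)
```

Because the gates of a layer act on disjoint bits, each bit of $K_{k-1}$ lies in at most one Z-gate of $L_k$, so at most $|K_{k-1}|$ bits are added per layer and the sizes in `K_sets` can at most double.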
Remark. With a bit more careful analysis, Lemma 4.1 can be improved to the following:

Lemma 4.2. Let $C$ be a circuit as described above. Then for each $1 \le k \le d$, there exists
a state $|\Psi_k\rangle$ over at most $2^{\lceil k/2 \rceil}$ bits such that for any state
$|R\rangle$ in the quotient space of $\mathcal{B}_n$ complementary to $|\Psi_k\rangle$, the
state $L_1 L_2 \cdots L_k (|R\rangle \otimes |\Psi_k\rangle)$ has a 0 in the target position
of $C$.

The difference is that now $|\Psi_k\rangle$ is over only $2^{\lceil k/2 \rceil}$ bits instead of
$2^k$. Instead of giving a formal proof, we will just sketch the reasons for Lemma 4.2. When
some bit (the $i$th bit, say) is moved from $R_k$ to $K_k$, it is set to the $|0\rangle$ state.
Consider the gate $G$ (if any) in $L_{k+1}$ that involves this bit. If $G$ is a single-qubit
gate, then no Z-gate in $L_{k+1}$ is killed involving the $i$th bit, so no additional bit needs
to be added to $K_{k+1}$ for the sake of the $i$th bit. If $G$ is a Z-gate, then the $i$th bit
alone is enough to kill $G$, since this bit is already 0. So again, no additional bit must be
added to $K_{k+1}$ to kill $G$. Thus $k$ must increase by 2 for the size of $K_k$ to double.
Note that we handled the base case of Lemma 4.1 this way, obtaining a state over $1 = 2^0$ bits.
Theorem 4.3. Let $C$ be a circuit of depth $d$ that consists of single-qubit gates and Z-gates
and uses 0 ancillæ. If $d < 2 \log n$, then $C$ cannot compute $P$.
Proof: Suppose $C = P$. Then for any input state, the target bit of $C$ is 0 iff the target
bit of $P$ is 0. By Lemma 4.2, there exists a state $|\Psi\rangle$ on at most
$2^{\lceil d/2 \rceil} < n$ bits such that, for any state $|R\rangle$ on the remaining
$n - 2^{\lceil d/2 \rceil}$ bits, $C(|R\rangle \otimes |\Psi\rangle)$ has a 0 value for the
target. First let $|R\rangle$ be the state with 0's in all $n - 2^{\lceil d/2 \rceil}$ positions
(since $n - 2^{\lceil d/2 \rceil} > 0$, such positions exist). Then
$P(|R\rangle \otimes |\Psi\rangle)$ has a 0 target. This is only possible if the state
$|\Psi\rangle$ is in a quotient space of $\mathcal{B}_n$ spanned by computational basis states
in which an even number of the variables are 1. Now change one of the bits of $|R\rangle$ from
0 to 1. The target of $C(|R\rangle \otimes |\Psi\rangle)$ still has the value 0, but the target
of $P(|R\rangle \otimes |\Psi\rangle)$ must change to 1, which contradicts the assumption
that $C = P$.
Since fanout and parity are equivalent up to depth 3 (with 0 ancillæ), we have immediately
the following.

Corollary 4.4. Let $C$ be a circuit of depth $d$ that consists of single-qubit gates and
Z-gates and uses 0 ancillæ. Then, if $d < 2 \log n - 2$, $C$ cannot compute the fanout operation.
We now consider the case in which our circuit has a non-zero number of ancillæ. Firstly,
it is clear that Lemmas 4.1 and 4.2 work if we set a target and all ancillæ to 0 at the same
time. If there are $a$ many ancillæ, then we are setting $a + 1$ "outputs." The conclusion of
the analogous Lemma for $a$ ancillæ would then be that the state $|\Psi\rangle$ is over
$(a+1)2^{\lceil d/2 \rceil}$ bits (since the number of "committed" bits doubles with each
second layer, as in Lemma 4.2). These bits may include ancillæ, and assuming that $C$ does a
clean computation, $|\Psi\rangle$ will be 0 on the ancillæ (since they must all start out as 0
in order to return to their final value of 0). Therefore, if $n > (a+1)2^{\lceil d/2 \rceil}$,
the state $|R\rangle$ is over at least one bit but no ancillæ and is thus free to take on any
value. Thus if $n > (a+1)2^{\lceil d/2 \rceil}$, the output of $C$ is insensitive to changes in
at least one of the inputs, and hence the circuit is defeated as before. Note we have a
depth/ancillæ trade-off as a result. We thus have the following corollary of the proof of
Theorem 4.3:
Corollary 4.5. Let $C$ be a circuit of depth $d$ consisting of single-qubit gates and Z-gates.
If $C$ cleanly computes the parity function with $a$ ancillæ, then $d \ge 2 \log(n/(a+1))$.

We conjecture that $d$ must be at least $2 \log n$ no matter what $a$ is.
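To see how the bound of Corollary 4.5 behaves as the number of ancillæ grows, it may help to parametrize $a$. The following worked calculation is a sketch under stated assumptions: logarithms are taken base 2, rounding is ignored, and the function $\varepsilon(n)$ is introduced here purely for illustration and does not appear in the original text.

```latex
% Assumption: a + 1 = n^{1 - \varepsilon(n)} with 0 \le \varepsilon(n) \le 1,
% where \varepsilon is a hypothetical parameter used only in this illustration.
\begin{align*}
d \;\ge\; 2\log\frac{n}{a+1}
  \;=\; 2\log\frac{n}{n^{1-\varepsilon(n)}}
  \;=\; 2\log n^{\varepsilon(n)}
  \;=\; 2\,\varepsilon(n)\log n .
\end{align*}
% Constant a (a + 1 = O(1)):          d \ge 2\log n - O(1).
% \varepsilon(n) = 1/\log\log n:      d \ge 2\log n / \log\log n, which is unbounded in n.
% a + 1 \ge n:                        the right-hand side is \le 0, so the bound is vacuous.
```

Under these assumptions the bound exceeds every constant exactly when $\varepsilon(n)\log n \to \infty$, which is compatible with $\varepsilon(n) = o(1)$, that is, with $a = n^{1-o(1)}$.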
We offer an alternative interpretation of our result that arose out of conversations with
Luc Longpré. Let us say that a quantum circuit $C$ robustly computes a unitary operator $U$
if $C$ computes $U$ cleanly and, in addition, if its output is insensitive to the initial state
of the ancillæ. Thus the ancillæ of $C$ can start out in any state whatsoever; the circuit $C$
is guaranteed to return the ancillæ to that state in the end, and always gives the same answer.
This of course puts a much stronger constraint on the circuit (since in the usual model we
only insist on a clean computation when the ancillæ are initialized to 0), but such circuits
can be useful (e.g., see exercise 8.5 in Kitaev et al. [6]). It is not hard to see that in this
case, if $C$ consists only of single-qubit and Z-gates, then it must have depth $2 \log n$ to
compute parity, regardless of the number of ancillæ.
5 Conclusions and Open Problems
Following the line of earlier work of Green et al., Høyer and Špalek, and Cleve and Watrous
[4, 5, 1], our main result gives an optimal, $\Omega(\log n)$ lower bound on the depth of
QAC-type circuits computing fanout, in the presence of limited (slightly sublinear) numbers of
ancillæ. It would clearly be desirable to extend our result to obtain the same conclusion
when polynomially many (or an unlimited number of) ancillæ are allowed, and thus to prove
that $\mathrm{QAC}^0 \neq \mathrm{QAC}^0_{wf}$.
The role of ancillæ in quantum computation has not received much detailed attention.
Prompted by our considerations here, there are several interesting questions that arise. One
issue is the necessity of ancillæ for specific quantum computations or classes of quantum
computations. Is there a problem that can be done in constant depth with ancillæ but
which requires $\log n$ depth without ancillæ? Similarly, are there computational problems for
which $\log n$ depth is possible with ancillæ but without ancillæ, polynomial depth is needed?
In general, how many ancillæ are needed for specific problems? Is there a general tradeoff
that can be proved between numbers of ancillæ and circuit depth?
While much has recently been learned concerning constant depth circuit classes, a few
interesting questions still remain. It would be worthwhile to be able to distinguish between
the power of quantum gates of unbounded arity. We have seen that Toffoli and Z gates
(which are equivalent up to constant depth) are weaker than parity and fanout (which are
equivalent not only to each other but also, for all intents and purposes, to other mod gates,
threshold gates and the quantum Fourier transform). Are there other natural types of gates
that lie between these two classes, or is every gate equivalent, up to constant depth,
either to single qubit and CNOT gates, or to Toffoli gates, or to parity? It would also be of
interest to characterize exactly what can be computed in constant depth using only single
qubit and CNOT gates as, even from an optimistic point of view, this is the kind of circuit
that might be built in the not too distant future. A further study of these limited quantum
circuits can be found in Fenner et al. [2]. One result proved there is that one cannot compute
generalized Toffoli gates by circuits in the class $\mathrm{QNC}^0$, which consists of constant
depth circuits composed of single qubit and CNOT gates, and hence this class differs from
$\mathrm{QAC}^0$.
6 Acknowledgements
We thank Luc Longpre for helpful discussions and comments on this paper. This work
was supported in part by the National Security Agency (NSA) and Advanced Research
and Development Agency (ARDA) under Army Research OÆce (ARO) contract numbers
DAAD 19-02-1-0058 (for M. Fang, S. Homer, and F. Green) and DAAD 19-02-1-0048 (for
S. Fenner and Y. Zhang).
References
[1] R. Cleve and J. Watrous, "Fast parallel circuits for the quantum Fourier transform,"
Proceedings of the 41st Annual Symposium on Foundations of Computer Science (2000),
526–536.
[2] S. Fenner, F. Green, S. Homer and Y. Zhang, "Bounds on the Power of Constant Depth
Quantum Circuits," manuscript.
[3] M. Furst, J. B. Saxe, and M. Sipser, "Parity, circuits, and the polynomial-time hierarchy,"
Math. Syst. Theory 17 (1984) 13–27.
[4] F. Green, S. Homer, C. Moore and C. Pollett, "Counting, Fanout and the Complexity of
Quantum ACC," Quantum Information and Computation 2 (2002) 35–65.
[5] P. Høyer and R. Špalek, "Quantum circuits with unbounded fan-out," 20th STACS
Conference, 2003, LNCS 2607, 234–246.
[6] A. Yu. Kitaev, A. H. Shen, and M. N. Vyalyi, Classical and Quantum Computation,
American Mathematical Society, 2002.
[7] C. Moore, "Quantum Circuits: Fanout, Parity, and Counting," Los Alamos
Preprint archives (1999), quant-ph/9903046.
[8] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information,
Cambridge University Press, 2001.
[9] A. A. Razborov, "Lower bounds on the size of bounded depth networks over a complete
basis with logical addition," Matematicheskie Zametki 41 (1987) 598–607. English
translation in Mathematical Notes of the Academy of Sciences of the USSR 41 (1987) 333–338.
[10] P. W. Shor, "Polynomial-time algorithms for prime number factorization and discrete
logarithms on a quantum computer," SIAM J. Computing 26 (1997) 1484–1509.
[11] R. Smolensky, "Algebraic methods in the theory of lower bounds for Boolean circuit
complexity," in Proceedings of the 19th Annual ACM Symposium on Theory of Computing
(1987) 77–82.
