Linear Depth Stabilizer and Quantum Fourier Transformation Circuits with
  no Auxiliary Qubits in Finite Neighbor Quantum Architectures by Maslov, D.
arXiv:quant-ph/0703211v2 15 Nov 2007
Linear Depth Stabilizer and Quantum Fourier Transformation Circuits with
no Auxiliary Qubits in Finite Neighbor Quantum Architectures
Dmitri Maslov
Institute for Quantum Computing,
University of Waterloo, Waterloo, ON, Canada,
N2L 3G1. Email: dmitri.maslov@gmail.com.
(Dated: November 4, 2018)
In this paper we investigate how quantum architectures affect the efficiency of the execution of the Quantum
Fourier Transform (QFT) and linear transformations, which are essential parts of the stabilizer/Clifford group
circuits. In particular, we show that in most common and realistic physical architectures including Linear
Nearest Neighbor (LNN), 2D lattice, and bounded degree graph (containing a chain of length n), n-qubit QFT
and n-qubit stabilizer circuits can be parallelized to linear depth using no auxiliary qubits. We construct lower
bounds that show the efficiency of our approach.
PACS numbers: 03.67.Lx, 03.67.Pp
I. INTRODUCTION
Quantum computation has attracted attention because it appears
to reduce the computational complexity of certain calculations;
see, for example, [1, 2]. For the quantum circuit model
of computation, there exist a number of physical quantum
information processing implementations, such as liquid NMR
(up to 12 qubits at a time) [3] and trapped ions (8 qubits) [4].
Generally, a large number of qubits is required for computa-
tional purposes. In this work, we do not allow for any auxil-
iary qubits to be used in order to reflect the apparent hardness
of scaling up quantum information processing devices.
Quantum circuits have been optimized to require less space,
fewer gates and smaller depth. This is important from the
point of view of the efficient potential realization of the quan-
tum algorithms. As discussed in the first paragraph, we ad-
dress the issue of space minimization by restricting the num-
ber of auxiliary qubits to zero. Our next focus is on depth
minimization. This is because a small depth circuit does not
only mean a fast computation, but also helps reduce the effect
of decoherence. For instance, it is possible to construct re-
alistic examples in which a smaller depth circuit will require
fewer levels of error correction, and each error correction code
concatenation step is a very expensive operation [5].
In this paper, the depth of a circuit is defined as the number
of logic levels in it. Each logic level is a set of non-intersecting
“elementary” gates. It is generally accepted that in a prac-
tical quantum information processing approach, it should be
possible to execute independent gates in parallel. The gate
libraries considered in the relevant literature include a set of
single-qubit and CNOT gates (which is most likely an arti-
fact of the well known result showing the completeness of
this gate set, however, CNOT may not necessarily be a natu-
ral gate for some quantum information processing proposals),
and any two-qubit operation. Indeed, given a Hamiltonian,
any two-qubit operation can be efficiently implemented [6].
For the sake of completeness, this paper discusses how the results
apply to both gate sets. Circuit depth, as defined above,
is an upper bound on the circuit runtime: the runtime may be lower
when the next logic level can start as soon as its qubits become
available, before execution of all gates from the previous
level has completed. Practically, this means that some of our
upper bounds may not be tight (which is advantageous in the
sense that the implementation we construct may in fact have
a smaller runtime than predicted by the formulas evaluating
depth).
Quantum algorithms and their circuits are usually formu-
lated without considering the physical limitations imposed by
different architectures. We believe that circuit and algorithm
designs need to be modified to account for possible architec-
tures. In particular, in realistic architectures, it is not possible
to establish direct interactions between every pair of qubits
[3, 4, 7]. A study of quantum computing architectures for
the existing and emerging quantum technologies shows that
the fastest possible direct interactions form a bounded degree
graph (e.g., liquid NMR quantum information processing),
and 1D or 2D (sub)lattices [8]. A mixed architecture, where
values of stationary qubits may be teleported with the help of
flying qubits to where they are desired was studied in [9]. In
this work, the role of stationary qubits is played by the spins
of phosphorus atoms embedded in silicon, known as the Kane
proposal [10], and the flying qubits are photons, with the in-
formation being teleported via EPR states. Other proposals for
state transfer between either stationary or both flying and sta-
tionary qubits, and discussions of mixed architectures, can be
found, for example, in [11, 12, 13] and the references therein.
However, an architecture that allows interconversion between
stationary and flying qubits cannot in general be realized in
any technology. In addition, it was shown that teleportation
of a single value (simultaneous teleportation of many qubits
may be less efficient) in the Kane architecture is only efficient
if compared to more than 2-4 levels of SWAPs [9]. A similar
effect is likely to take place in other mixed architecture pro-
posals. The latter is important for this work since we are only
using depth-1 swapping of multiple qubits via SWAP gates.
Generally speaking, due to the spatial constraints it seems
unrealistic to believe that a direct scalable implementation of
the unrestricted (where every two qubits are neighbors) archi-
tecture, or, more generally, unbounded neighbor architecture,
will ever be found. Furthermore, in classical computation the
number of neighbors is limited, and there is no obvious reason
to believe that the quantum world is different. Thus, the
complexity of the circuit designs must be refined to take into
account the limitations of possible quantum computing
architectures.
The linear nearest neighbor (LNN) architecture, also known
as chain nearest neighbor, is often considered as a good (and,
in fact, very restrictive) approximation to what a scalable
quantum architecture may be. Mathematically, in an LNN
architecture with n qubits q1, q2, . . . ,qn, two-qubit gates are
allowed between any qubits whose subscript values differ by
one. The LNN architecture describes 1D lattices. It misses
possible direct interactions in 2D lattices and may restrict the
number of useful interactions in connected graphs. However,
if one can show that a circuit can be efficiently reorganized to
be executed in the LNN architecture, such a circuit could be
run efficiently in many other architectures.
The Quantum Fourier Transformation (QFT) is an analogue
of the classical discrete Fourier transformation, however, in
the quantum case the transformation is applied to the ampli-
tudes. The QFT serves as a basis for a number of efficient
quantum algorithms. Most notably, it is at the heart of inte-
ger factorization and the discrete logarithm polynomial time
quantum algorithms [2]. Therefore, efficient implementation
of the QFT is important. This is why this topic has been stud-
ied extensively [14, 15, 16]. Researchers presented linear and
logarithmic depth circuits using a number of auxiliary qubits.
Known circuits for the QFT have a regular structure [5, 16].
However, they require direct interaction between every two
qubits, which makes such circuits especially inconvenient for
quantum architectures where only a finite number of neigh-
bors is allowed. In an architecture with a finite number of
neighbors, such as LNN, state transfer down the chain may
require up to (n − 1) SWAP gates. We refer to this observation
as the locality constraint in the discussions involving
lower bound arguments. A linear depth QFT circuit implemented
in the LNN architecture has been reported in [17]. We
reconstruct this circuit with our generalized technique and we
also study lower bounds.
Stabilizer circuits (also known as unitary stabilizer circuits
or Clifford group circuits) were introduced and studied for
their use in the encoding, decoding and error detection stages
of quantum error-correction codes [18, 19]. They can be
defined as arbitrary quantum circuits composed with single-
qubit Hadamard and Phase gates and two-qubit controlled-
NOT gates. It turns out that stabilizer circuits can be effi-
ciently simulated [20] as an 11-stage sequence of Hadamard
(H), Phase (P) and linear reversible circuits (C) as H-C-P-C-
P-C-H-P-C-P-C. Each P and H stage is a depth-1 computation
composed with single-qubit gates. The depth of stabilizer cir-
cuits is, thus, defined by the depth of a circuit realizing some
linear reversible function. Efficient circuits for linear func-
tions are, therefore, of great importance. In this paper we
show that every stage C can be parallelized to linear depth
in the LNN architecture. Thus, the entire stabilizer circuit re-
quires at most linear time to be executed.
A very recent study shows that a size s stabilizer circuit
in an unrestricted architecture can be parallelized to a depth
$O(\log n)$ circuit, but requires $O(s^3 + n)$ auxiliary qubits ([21],
Proposition 8.9). Combining the results of [20, 21, 22], this
gives a depth $O(\log n)$ circuit in unrestricted architectures
using $O(n^6/\log^3 n)$ auxiliary qubits to realize any stabilizer circuit.
Since a depth d circuit built on q qubits in unrestricted
architectures may become as large as depth $O(qd)$ in the LNN
architecture (every depth-2 computation can be adversarially chosen
to define the complete interaction pattern of the LNN
architecture, and two depth-2 non-commuting stages can be
defined so as to require a linear depth qubit permutation
between them), the benefit of logarithmic depth quickly
disappears. However, a large number of auxiliary qubits remains.
Our approach thus appears more practical.
The remainder of the paper is organized as follows.
We start by introducing a concept of skeleton circuits and
studying their properties. In Subsections II A and II B the
lessons learned are applied to show that QFT and linear re-
versible/stabilizer circuits can be parallelized to linear depth
in the LNN architecture. Section III reports lower bounds for
a class of skeleton circuits which appears to be very important.
Concluding remarks can be found in Section IV.
II. SKELETON CIRCUITS
Any quantum circuit composed with single-qubit and two-
qubit gates can be thought of as a circuit composed of generic
two-qubit operations each of which consists of a two-qubit
gate of the initial circuit with the surrounding gates absorbed
into it (the trivial case when only single-qubit gates are applied
to a specific qubit throughout an entire computation is ignored
as not interesting). We call this a skeleton circuit. Obviously,
the complexity of a skeleton circuit defines the complexity
of the initial circuit (assuming that any two-qubit gate has a
finite cost) and vice versa. We next study skeleton circuits of a
certain type and apply the lessons learned to construct circuits
for QFT and linear reversible/stabilizer circuits of linear depth
in the LNN architecture.
The basic skeleton circuit we consider is illustrated in Fig.
1(a). Mathematically, the skeleton circuit SC is defined as
$$SC := G_1^{i_1}(q_1,q_2)\, G_2^{i_2}(q_1,q_3) \cdots G_{n-1}^{i_{n-1}}(q_1,q_n)\, G_n^{i_n}(q_2,q_3) \cdots G_{n(n-1)/2}^{i_{n(n-1)/2}}(q_{n-1},q_n), \qquad (1)$$
where $G_*$ ($*$ is reserved to represent any possible existing
value of subscript) is a two-qubit gate that operates on the
qubits indicated in brackets, $i_*$ take Boolean values, and for a
gate $G$, $G^1$ is the gate $G$ itself, whereas $G^0 = \mathrm{Id}$ (identity, i.e.,
this gate is not applied). In other words, $i_*$ are used to indicate
whether a gate is present or not.
Since all quantum gates that operate on non-intersecting
sets of qubits commute, the SC circuit can be executed in
parallel in (2n − 3) computational stages $L_1, L_2, \ldots, L_{2n-3}$ defined
as follows: $L_1 := G_1^{i_1}$, $L_2 := G_2^{i_2}$, $L_3 := G_3^{i_3} G_n^{i_n}$,
$L_4 := G_4^{i_4} G_{n+1}^{i_{n+1}}$, $L_5 := G_5^{i_5} G_{n+2}^{i_{n+2}} G_{2n-2}^{i_{2n-2}}$, ...,
$L_{2n-3} := G_{n(n-1)/2}^{i_{n(n-1)/2}}$. This is illustrated
in Fig. 1(b) in the case n = 5.
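The stage structure above can be sketched in a few lines of Python. This is an illustrative sketch, not the author's code: it assumes (as the listed stages suggest) that stage $L_k$ collects exactly the gates on pairs $(q_a, q_b)$, $a < b$, with $a + b = k + 2$; the function names `skeleton_stages` and `gate_index` are introduced here for illustration only.

```python
from itertools import combinations


def skeleton_stages(n):
    """Return [L_1, ..., L_{2n-3}]; L_k lists the pairs (a, b), a < b, with a + b = k + 2."""
    stages = []
    for k in range(1, 2 * n - 2):
        stages.append([(a, k + 2 - a) for a in range(1, n + 1) if a < k + 2 - a <= n])
    return stages


def gate_index(a, b, n):
    """Subscript of the gate on (q_a, q_b) in Eq. (1), where gates are listed
    as (1,2), (1,3), ..., (1,n), (2,3), ..., (n-1,n)."""
    return (a - 1) * n - a * (a - 1) // 2 + (b - a)


if __name__ == "__main__":
    n = 5
    for k, stage in enumerate(skeleton_stages(n), start=1):
        labels = [f"G_{gate_index(a, b, n)}(q{a},q{b})" for a, b in stage]
        print(f"L_{k}:", " ".join(labels))
    # Every pair of qubits appears in exactly one stage.
    flat = sorted(p for stage in skeleton_stages(n) for p in stage)
    assert flat == sorted(combinations(range(1, n + 1), 2))
```

Each stage touches every qubit at most once, so its gates can run in parallel when any two qubits may interact.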
FIG. 1: Reorganizing an n-qubit skeleton circuit, illustrated for n = 5. (a) Original circuit with at most n(n−1)/2 gates. Each of the gates in this
skeleton circuit may or may not be present. (b) Linear (2n − 3) depth circuit possible to run in the “sea-of-qubits” architecture. (c) Version
of (b) ready for execution in the LNN architecture. (d) This table illustrates how swapping stages $S_*$ are constructed and inserted between the
computational stages $L_*$.
Next, the circuit can be adapted to the LNN architecture
through inserting SWAP gates SWAP($q_s$,$q_t$) after each gate
$G_k^{i_k}(q_s,q_t)$. This is illustrated in Fig. 1(c) and (d) in the case
n = 5. In the gate library containing all possible 2-qubit
unitaries, the upper bound for depth is (2n − 3). We next use this
result to achieve linear depth circuits for QFT and stabilizer
circuits. These are fairly tight upper bounds. With the best
known asymptotic result requiring $\Theta(n^2)$ gates for the QFT, it
can be shown that QFT cannot be computed in less than linear
depth even in an unrestricted architecture. A counting
argument applied to linear circuits [22] shows that there exists a
stabilizer circuit that requires at least $\Theta(n^2/\log n)$ gates, meaning
that it is impossible to find a circuit for it with depth less than
$\Theta(n/\log n)$ even if the architecture is unrestricted. Lower bounds
in restricted architectures (all of which turn out to be linear,
and thus having the same asymptotic as the upper bound that
follows from our construction) are studied in Section III.
Let us note that the skeleton circuit that we consider can
be parallelized to linear depth in the LNN architecture for any
initial permutation of the input and return the output in any
desired order. For that, at most a linear depth swapping stage
before and after the circuit is required, which does not change
the overall linearity of the depth. The circuit illustrated in
Fig. 1(c) not only allows execution in the LNN architecture, it
also does not change the LNN connectivity pattern (q1−q2−
. . .− qn), and thus such circuits can be applied one after the
other with no swapping in between. This observation will be
used in Subsection II B. If the circuit in Fig. 1(c) is the last
computational stage before the measurement is done, the last
SWAP need not be applied.
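The effect of following every gate with a SWAP on the same two qubits can be simulated directly. The sketch below is only an illustration under stated assumptions: it takes stage $L_k$ to be the pairs $(q_a, q_b)$ with $a + b = k + 2$, starts from the chain $q_1 - q_2 - \ldots - q_n$, and checks that each required pair is adjacent when its stage is reached. The name `run_on_lnn` is hypothetical.

```python
def run_on_lnn(n):
    """Simulate the skeleton circuit on a chain, swapping after every gate.

    Returns the final left-to-right order of qubit labels.
    Raises ValueError if some gate would act on non-adjacent qubits.
    """
    chain = list(range(1, n + 1))  # chain[p] = label of the qubit at position p
    for k in range(1, 2 * n - 2):  # stages L_1 .. L_{2n-3}
        pairs = [(a, k + 2 - a) for a in range(1, n + 1) if a < k + 2 - a <= n]
        for a, b in pairs:
            pa, pb = chain.index(a), chain.index(b)
            if abs(pa - pb) != 1:
                raise ValueError(f"stage {k}: q{a} and q{b} are not neighbours")
            # Gate G(q_a, q_b) acts here; the SWAP that follows exchanges positions.
            chain[pa], chain[pb] = chain[pb], chain[pa]
    return chain


if __name__ == "__main__":
    print(run_on_lnn(5))
```

In this simulation the chain ends in reversed order, which is the same nearest-neighbour connectivity pattern read from the other end; this is consistent with the remark that such circuits can be applied one after another with no swapping in between.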
A. QFT in the LNN architecture
A circuit that realizes the QFT and requires no ancilla
qubits is illustrated in Fig. 2(a). Its skeleton circuit (Fig.
2(b)) is obviously of the type considered in the previous
section with all $i_* = 1$. Therefore, the QFT can be parallelized to
linear depth. This is, however, a known result, as [17] reports a
construction that is equivalent to ours. It can also be observed
that the approximate QFT circuit, where controlled rotations
of the QFT circuit with small parameters are ignored, may
be executed in linear depth in the LNN architecture. Lower
bounds are discussed in Section III, and they apply directly to
the QFT circuit.
FIG. 2: (a) Circuit for n-qubit QFT ([5], page 219), illustrated for
n = 6. The two-qubit gates are controlled-Z rotations with parameter
$1/2^k$, where k is the subscript in the gate notation. The single-qubit
gates are Hadamard gates. (b) Skeleton circuit of the QFT circuit in
(a) composed of generic two-qubit gates.
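As a rough illustration of how the QFT rotations fall into the linear-depth stages, the following sketch lists, per stage, the controlled rotations $R_r$ and the qubits they act on. It assumes the rotation between $q_a$ and $q_b$ ($a < b$) is $R_{b-a+1}$, as read off the rows of Fig. 2(a), and that stage $k$ holds the pairs with $a + b = k + 2$. The function name `qft_lnn_schedule` and the cutoff parameter `max_r` (modelling the approximate QFT by marking small-parameter rotations as absent, i.e. $i_* = 0$) are hypothetical. Hadamard gates are omitted, since they are absorbed into the generic two-qubit gates of the skeleton.

```python
def qft_lnn_schedule(n, max_r=None):
    """Return a list of stages; each stage is a list of (a, b, r, present).

    (a, b) are the qubit labels, r is the rotation subscript (R_r), and
    present is False when the rotation is dropped by the cutoff max_r.
    """
    schedule = []
    for k in range(1, 2 * n - 2):
        stage = []
        for a in range(1, n + 1):
            b = k + 2 - a
            if a < b <= n:
                r = b - a + 1
                present = max_r is None or r <= max_r
                stage.append((a, b, r, present))
        schedule.append(stage)
    return schedule


if __name__ == "__main__":
    for k, stage in enumerate(qft_lnn_schedule(6, max_r=3), start=1):
        print(k, stage)
```

Dropping rotations only changes which gates are present; the number of stages stays at 2n − 3, which matches the observation that the approximate QFT also runs in linear depth.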
B. Stabilizer/linear circuits
Synthesis of efficient linear circuits has been studied in
[22]. The authors report a synthesis algorithm capable of
producing a circuit with $O(n^2/\log n)$ CNOT gates. It was also proven
that their synthesis is asymptotically optimal in that there
exists a linear function that requires $\Theta(n^2/\log n)$ CNOT gates. In
this paper, the goal is different. We target minimization of the
depth as opposed to the number of gates used. The depth of
our circuit is linear in the number of qubits n, and it is
upper bounded by 18n + O(1) CNOTs (assuming every SWAP
is substituted with a suitable 3-CNOT implementation) or
6n + O(1) generic two-qubit gates. We also prove asymptotic
optimality, which in our case is straightforward.
Every reversible linear function of n variables $\vec{q} =
(q_1, q_2, \ldots, q_n)^t$ can be written as matrix multiplication $A\vec{q}$,
where A is an n × n Boolean non-singular matrix. Synthesizing
such a function is equivalent to composing a sequence of
gate operations that transforms matrix A into its reduced
echelon form. Due to reversibility, the reduced echelon form of A
is the identity matrix. A standard technique for transforming a
matrix A to the identity is to apply the Gauss-Jordan
elimination algorithm. In the following, we illustrate the application
of the Gauss-Jordan elimination algorithm and then modify its
circuit to allow it to be executed with a linear number of
computational stages. Parameters $i_*$ and $p_*$ take Boolean values and
they are used to indicate whether the gate has been applied (1)
or not (0). Parameters $p_*$ are reserved for the gates applied to
update values of the diagonal elements of the matrix A during
Gauss-Jordan elimination.
• Step 1. Make sure that the pivot element $a_{1,1} \neq 0$. If
$a_{1,1} \neq 0$ assign $p_1 := 0$. Otherwise choose $a_{j,1} \neq 0$,
apply gate CNOT($q_j$,$q_1$) and make assignment $p_1 := 1$.
• Steps s = 2..n. Transform each $a_{s,1}$ to 0 through
application (if needed) of the gate CNOT($q_1$,$q_s$). If at step s
a gate was applied set $i_s := 1$, otherwise, $i_s := 0$.
• Step n+1. Make sure that the pivot element $a_{2,2} \neq 0$. If
$a_{2,2} \neq 0$ do nothing ($p_2 := 0$), otherwise choose $a_{j,2} \neq 0$,
apply gate CNOT($q_j$,$q_2$) and set $p_2 := 1$.
• Steps s = (n+2)..(2n−1). Transform each $a_{s−n+1,2}$
to 0 through application (if needed) of the gate
CNOT($q_2$,$q_{s−n+1}$). If at step s a gate was applied set
$i_s := 1$, otherwise, $i_s := 0$.
. . .
• Step n(n+1)/2 − 2. Make sure that the pivot element
$a_{n−1,n−1} \neq 0$. If $a_{n−1,n−1} \neq 0$ do nothing ($p_{n−1} := 0$),
otherwise apply gate CNOT($q_n$,$q_{n−1}$) and make
assignment $p_{n−1} := 1$. After this step, all parameters $p_*$ must
be set.
• Step n(n+1)/2 − 1. Transform $a_{n,n−1}$ to 0 through
application (if needed) of the gate CNOT($q_{n−1}$,$q_n$).
If the gate was applied set $i_{n(n+1)/2−1} := 1$, otherwise,
$i_{n(n+1)/2−1} := 0$. At this point, the set of applied
transformations has reduced matrix A to the upper triangular form
with ones on the diagonal. The remainder of the algorithm
eliminates non-zero elements above the diagonal.
• Steps s = n(n+1)/2..($n^2$ − 1). If $a_{k,l} \neq 0$, apply
CNOT($q_l$,$q_k$) for k = (l−1)..1 inside for l = n..2 and set $i_s$
to one iff a gate has been applied.
FIG. 3: Application of Gauss-Jordan elimination algorithm to the
synthesis of a reversible network. Gates with controls marked “?” indicate
a single CNOT each with the control at (exactly) one of the positions
marked “?”.
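The elimination steps above can be sketched as a short routine over GF(2). This is a minimal sketch under stated assumptions, not the author's implementation: the matrix is a list of 0/1 rows, qubits are 0-indexed, CNOT(c, t) is modelled as the row operation "row t := row t XOR row c", and the pivot row is always chosen below the diagonal so earlier columns stay cleared. The name `synthesize_cnots` is hypothetical.

```python
def synthesize_cnots(A):
    """Reduce a non-singular Boolean matrix A to the identity by row additions.

    Returns the list of applied CNOT(control, target) pairs, 0-indexed.
    Raises StopIteration if A is singular (no pivot can be found).
    """
    n = len(A)
    M = [row[:] for row in A]
    gates = []

    def cnot(c, t):
        M[t] = [x ^ y for x, y in zip(M[t], M[c])]
        gates.append((c, t))

    # Forward pass: fix the pivot, then clear the column below the diagonal.
    for col in range(n):
        if M[col][col] == 0:
            j = next(r for r in range(col + 1, n) if M[r][col] == 1)
            cnot(j, col)
        for r in range(col + 1, n):
            if M[r][col]:
                cnot(col, r)
    # Backward pass: clear the entries above the diagonal, last column first.
    for col in range(n - 1, 0, -1):
        for r in range(col - 1, -1, -1):
            if M[r][col]:
                cnot(col, r)
    assert M == [[int(i == j) for j in range(n)] for i in range(n)]
    return gates


if __name__ == "__main__":
    A = [[0, 1, 1], [1, 1, 0], [1, 0, 0]]
    print(synthesize_cnots(A))
```

Because each CNOT is its own inverse, applying the returned gates in reverse order to the identity matrix rebuilds A. The number of gates never exceeds $n^2 - 1$, matching the step count above.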
We next use the gate commutation rule (two CNOT gates
commute iff the target of one gate is not equal to the control
of the other) and the circuit identity CNOT(a,c)CNOT(c,b) =
CNOT(c,b)CNOT(a,b)CNOT(a,c) to move all (n − 1) gates
CNOT(a,c) with parameter $p_*$ to the front of the network.
Note that every time the commutation rule is used, the gates
just change their position, and every time the circuit identity
is applied we introduce a new gate CNOT(a,b). However,
such a gate can always be commuted to the closest
CNOT(a,b) on its left, and this is accounted for by the updates to the
$i_*$ gate presence indicators. The circuit gets transformed to the
one illustrated in Fig. 4. Parameters $i_*$ are changed through
XORing each $i_j$, $j < n(n+1)/2$, with $p_k$, for $k < n$ such that $q_k$ is
the target of the gate used at step j. The constructed circuit
consists of three parts marked I-III in Fig. 4. The skeleton of
each of these parts is described by equation (1), which is
obvious for parts II and III and requires a short explanation for part
I. Divide the skeleton circuit (Fig. 1(a)) into (n − 1) parts, with
the first containing the first (n − 1) gates, the second containing
the next (n − 2) gates, and so on, with the last, (n − 1)st, part containing
the one last gate. Then, gate $G_i$ for i = 1..n − 1 from part I of
the circuit in Fig. 4 can be matched (via “skeletonization”) to
some gate in the ith part of the skeleton circuit SC. Thus,
every linear reversible function can be computed by a circuit of
depth at most 3(2n − 3) = 6n + O(1). Furthermore, since each
SWAP-CNOT pair can be rewritten as two CNOTs (Fig. 5)
and SWAP requires no more than 3 CNOT gates, the overall
depth in terms of CNOTs can be upper bounded by the
expression 18n + O(1). We note that in some quantum
information processing proposals a CNOT-SWAP pair can be executed
more efficiently than a single CNOT or a single SWAP, such
as in [23], Fig. 1. Due to the locality constraint our upper
bound has the same asymptotic as a lower bound, and thus
our circuits are asymptotically optimal. Using the
H-C-P-C-P-C-H-P-C-P-C decomposition for stabilizer circuits [20], these
upper bounds directly translate to a circuit of depth at most 30n + O(1)
composed with generic two-qubit gates, or a circuit of depth at most
90n + O(1) in the library with single-qubit and CNOT
gates.
FIG. 4: Gauss-Jordan elimination algorithm network with rearranged
gates.
FIG. 5: 2-CNOT circuit equivalent to a SWAP-CNOT pair.
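Both circuit identities used in this argument can be checked exhaustively on computational basis states, which suffices because CNOT circuits permute basis states. The sketch below is illustrative only: circuits are lists of (control, target) pairs applied in list order, SWAP is expanded into the usual three CNOTs, and the particular gate orientation chosen for the SWAP-CNOT pair is one instance consistent with the caption of Fig. 5, not necessarily the one drawn there. The helper name `run` is hypothetical.

```python
from itertools import product


def run(circuit, bits):
    """Apply a list of CNOT(control, target) gates to a tuple of classical bits."""
    state = list(bits)
    for control, target in circuit:
        state[target] ^= state[control]
    return tuple(state)


def equivalent(c1, c2, width):
    """True iff the two CNOT circuits agree on every basis state."""
    return all(run(c1, x) == run(c2, x) for x in product((0, 1), repeat=width))


# CNOT(a,c) CNOT(c,b) = CNOT(c,b) CNOT(a,b) CNOT(a,c), with a, b, c = 0, 1, 2.
a, b, c = 0, 1, 2
identity_holds = equivalent([(a, c), (c, b)], [(c, b), (a, b), (a, c)], 3)

# A CNOT followed by a SWAP on the same qubits equals two CNOTs.
swap = [(0, 1), (1, 0), (0, 1)]
pair_holds = equivalent([(0, 1)] + swap, [(1, 0), (0, 1)], 2)

if __name__ == "__main__":
    print(identity_holds, pair_holds)
```

For these particular circuits the three-gate identity holds whether the gate list is read in circuit order or in operator order.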
1. Encoding and error syndrome circuits for CSS codes
Encoding and error syndrome circuits for CSS codes are of
great practical importance due to the clever error correcting
properties of the CSS codes. Such circuits include those
illustrated in Fig. 6(a) (encoding; [24]) and Fig. 6(b) (error
syndrome; [5]), where single-qubit Hadamard gates are not
illustrated since their contribution to the total depth is only
a constant, and the controlled gates, each of which may or
may not be present (which is defined by the form of the parity
check matrices of the corresponding classical codes), are
either controlled-NOT or controlled-Z. Our circuit parallelization
technique described in the previous subsection applies
directly to such circuits since each of them has a skeleton as
described by Eq. (1) with n = s + t + 1 for the encoding
circuit and n = s + t for the error syndrome circuit. This
allows us to execute the encoding circuit in (2s + 2t − 1) stages
and the error syndrome circuit in (2s + 2t − 3) stages (in both cases,
s, t ≥ 1) composed of generic two-qubit gates.
However, a better approach is possible. The following construction
is, essentially, a part of the algorithm used to execute SC.
Consider the encoding circuit (Fig. 6(a)). Prepare the qubits in
the following LNN connectivity pattern: $a_1 − a_2 − \ldots − a_s −
b − c_t − c_{t−1} − \ldots − c_1$. At each level i apply gates whose
targets intersect with the sloping lines marked “level i” shown
in Fig. 6(a). Each such level is followed by the level of
SWAPs applied to the same qubits as the gates from the
previous level to allow for the next set of gates to get executed
in the LNN architecture. For example, for s = 3 and t = 4
level 3 will be composed of the gates G($b$,$c_2$), G($a_3$,$c_3$), and
G($a_2$,$c_4$), followed by the swaps SWAP($b$,$c_2$), SWAP($a_3$,$c_3$),
and SWAP($a_2$,$c_4$). Thus, the total depth of the encoding
circuit executable in the LNN architecture will be equal to
(s + t + 1) if it is allowed to be composed of generic two-qubit
gates. This is almost half of what was expected if this
circuit were matched to the SC first. This translates to a depth
2(s + t + 1) circuit with controlled-NOT, controlled-Z and
SWAP gates. Similarly, the depth of the error syndrome
circuit composed with generic gates and executable in the LNN
architecture is (s + t − 1).
FIG. 6: General structure of the (a) encoding and (b) syndrome
detection circuits for CSS quantum error correcting codes.
Application of the technique described in this subsection to
executing the error syndrome circuit for Steane’s code ([5],
Fig. 10.16) in the LNN architecture shows that this can be
done in 12 stages composed of generic two-qubit gates or
26 (= 2 · 12 + 2) stages composed of Hadamard,
controlled-NOT, controlled-Z, and SWAP gates. We can show that the
encoding circuit of [24], Fig. 8b, can be executed in 23
stages composed of generic two-qubit gates or, alternatively,
68 (= 3 · 23 + 1 − 2: pairs CNOT-SWAP must be combined,
we need an extra level for Hadamard gates, but do not need to
apply the last SWAP) stages composed of CNOT and Hadamard
gates in the LNN architecture. Our result for the depth, 68, is
notably better than the 177 found by the automated procedure of
[24].
III. LOWER BOUNDS
In this section we study lower bounds on the depth of the
skeleton circuit SC defined in equation (1) assuming all gates are
present (i.e., each $i_* = 1$). We further assume that a pair of
gates G($q_i$,$q_j$)SWAP($q_i$,$q_j$) requires two units of the
execution time, one for each of the gates. In practice, a direct
implementation of the pair G($q_i$,$q_j$)SWAP($q_i$,$q_j$) may be more
efficient [6], but the particulars of such a construction depend
on the specific Hamiltonian, which is unknown in the
general case. The depth of the circuit illustrated in Fig. 1(c) is thus
(4n − 6). The lower bounds achieved below are directly
applicable to the QFT circuit.
To prove lower bounds, we need to restrict the set of
possible computations. We define two circuit type quantum
computational models A and B. We require that for each of them
in order to compute the SC (equation (1)) all n(n−1)/2 two-qubit
gates need to be executed, and no ancilla qubits may be used.
Furthermore,
• in model A we assume that the gates required to be
executed in SC cannot be commuted (other than trivially:
a pair of gates operating on non-intersecting sets of
qubits always commutes);
• in model B we allow the gates to be executed in
any order (i.e., this lets us obtain bounds that
allow commuting gates through the circuit, without
worrying about which gates actually commute, and what
kind of corrections are needed in case they do not
commute).
The architectures considered in this paper are LNN, 2D square
lattice, and bounded degree graph with the degree of each
vertex no more than k. We next prove a number of lower bounds;
see Table I.
TABLE I: Lower bounds on the depth of the SC in models A and B in
the LNN, 2D square lattice, and bounded degree graph architectures.
          LNN            2D square lattice   bounded degree graph
model A   10n/3 + O(1)   3n + O(1)           (2 + 2/k)n + O(1)
model B   3n/2 + O(1)    5n/4 + O(1)         (1 + 1/k)n + O(1)
3 + O(1) bound in LNN, model A. First, denote each
depth-1 computational stage (logic level) by L and each depth-
1 swapping stage by S. Every three stages of the SC have a
single fixed qubit that interacts with three other qubits. This
is either q1, q2, or qn. Thus, every three logic levels have to
be separated by a round of SWAPs, each having depth at least
1, i.e. each sequence LLL must be replaced by LSLL or LLSL
to be able to run the circuit in the LNN architecture. We call
this 3L→ 1S requirement. With the 3L→ 1S requirement, the
total depth must be at least 2n−3+ ⌈ 12(2n−5)⌉= 3n+O(1)
logic levels. Therefore, using just the 3L → 1S requirement
proves that our circuit is at most factor 43 off the optimum. We
now improve this bound to 10n3 +O(1) by showing that every
4 computational stages must be separated by at least depth-2
swapping stage (4L → 2S requirement). 4L → 2S is slightly
more restrictive than 3L → 1S. The difference between the
two is that in one LLSLL is allowed, but not in the other. We
next prove that depth-1 level does not suffice in separating
some two computational stages from the following two by ex-
ploring the properties of SC and the LNN architecture.
Assume all 4 computational stages $L_i$, $L_{i+1}$, $L_{i+2}$, and $L_{i+3}$
are solely in the first half of SC. The second half is
symmetric to the first half and thus a similar proof holds for it.
We do not prove the boundary case (where one part of the
4-stage computation is in the first half of the SC and the other
part is in the second half) because its contribution to the final
figure is only a constant. Next, assume i is odd. The proof
for even values of i is analogous. Name the qubits $q_1, q_2, \ldots, q_n$
top to bottom. The computational stages $L_i$ and $L_{i+1}$ use
interactions $q_{i+2} − q_1$, $q_1 − q_{i+1}$, $q_{i+1} − q_2$, ..., $q_{(i+1)/2} − q_{(i+3)/2}$,
which in the LNN architecture can only be aligned as follows:
$q_{i+2} − q_1 − q_{i+1} − q_2 − \ldots − q_{(i+1)/2} − q_{(i+3)/2}$. The computational
stages $L_{i+2}$ and $L_{i+3}$ use interactions $q_{i+4} − q_1$, $q_1 − q_{i+3}$,
$q_{i+3} − q_2$, ..., $q_{(i+3)/2} − q_{(i+5)/2}$. In particular, stages $L_{i+2}$ and $L_{i+3}$
require interaction $q_{(i+3)/2} − q_{(i+7)/2}$, and qubit $q_{(i+7)/2}$ is used both in
$L_{i+2}$ and $L_{i+3}$. However, we know that after completion of
stages $L_i$ and $L_{i+1}$, the architecture allows interactions in the
following order: $q_{(i+3)/2} − q_{(i+1)/2} − q_{(i+5)/2} − q_{(i−1)/2} − q_{(i+7)/2}$. The LNN
architecture distance between $q_{(i+3)/2}$ and $q_{(i+7)/2}$ is 4. A depth-1
swapping reduces the architectural distance between these
qubits by at most 2, which is not enough for the desired
interaction to be allowed. Thus, the depth of swapping must be at
least 2. This concludes the proof of the 4L → 2S requirement.
We finalize the proof of the 10n/3 + O(1) lower bound by
observing that for a circuit with 2n + O(1) stages L we
need to have at least 4n/3 + O(1) stages S to satisfy the 4L → 2S
requirement. Thus, the total number of stages required to
execute SC in LNN is at least 10n/3 + O(1). This implies that the circuit
we constructed explicitly (Fig. 1(c)) must be within a factor of
6/5 from the optimum.
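The counting behind these figures can be written out as follows. This is a sketch under one reading of the two requirements, not a derivation taken from the paper: $g_j \ge 0$ (a symbol introduced here) denotes the depth of swapping inserted between $L_j$ and $L_{j+1}$, and the 4L → 2S requirement is read as a constraint on every three consecutive gaps.

```latex
\begin{align*}
  g_j + g_{j+1} + g_{j+2} &\ge 2
      && \text{(4L $\to$ 2S, for every $j$)} \\
  \sum_j g_j &\ge \tfrac{2}{3}\bigl(2n + O(1)\bigr) = \tfrac{4n}{3} + O(1)
      && \text{(sum over disjoint triples of gaps)} \\
  \text{depth} &\ge \bigl(2n + O(1)\bigr) + \bigl(\tfrac{4n}{3} + O(1)\bigr)
      = \tfrac{10n}{3} + O(1) \\
  \frac{4n - 6}{10n/3 + O(1)} &\longrightarrow \frac{6}{5}
      && \text{(constructed depth over lower bound, } n \to \infty\text{)}
\end{align*}
```

The periodic gap pattern 1, 1, 0 satisfies both this constraint and the 3L → 1S constraint $g_j + g_{j+1} \ge 1$, so no larger total follows from these two requirements alone.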
3n + O(1) lower bound in 2D square lattice, model A. We
prove that every three computational stages $L_{i−2}$, $L_{i−1}$, and $L_i$,
where i = 2k + 1 and k = 1..$\lceil (n−2)/2 \rceil$ (this means that all
computational stages are in the first part of SC; the proof for the
symmetric second part is similar) must contain at least one
swapping stage if run in the 2D square lattice architecture. We
prove this by finding three interactions that form a loop.
Vertices in such a loop cannot be isomorphically mapped to the
vertices of a 2D square lattice. The interactions that form such a
loop, assuming qubits are named $q_1, q_2, \ldots, q_n$ top to bottom,
are $q_{(i−1)/2} − q_{(i+1)/2}$ in $L_{i−2}$, $q_{(i−1)/2} − q_{(i+3)/2}$ in $L_{i−1}$, and
$q_{(i+1)/2} − q_{(i+3)/2}$ in $L_i$.
This proves that for every possible value of k it is required
to have at least one swapping stage, which results in the
3n + O(1) lower bound.
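The two facts this argument relies on can be checked mechanically. The sketch below is only an illustration: it assumes the gate on $(q_a, q_b)$ sits in stage $L_{a+b-2}$ (the indexing consistent with the stage definitions earlier), and it brute-forces a small grid to confirm that a square lattice contains no three mutually adjacent sites. The function names are hypothetical.

```python
from itertools import combinations


def stage_of(a, b):
    """Index of the computational stage containing the gate on (q_a, q_b)."""
    return a + b - 2


def triangle_for(i):
    """Three interactions from stages L_{i-2}, L_{i-1}, L_i (i odd, i >= 3)."""
    if i < 3 or i % 2 == 0:
        raise ValueError("i must be odd and at least 3")
    lo, mid, hi = (i - 1) // 2, (i + 1) // 2, (i + 3) // 2
    return [(lo, mid), (lo, hi), (mid, hi)]


def grid_has_triangle(rows, cols):
    """Brute-force search for three mutually adjacent sites in a square lattice."""
    sites = [(r, c) for r in range(rows) for c in range(cols)]

    def adjacent(u, v):
        return abs(u[0] - v[0]) + abs(u[1] - v[1]) == 1

    return any(adjacent(u, v) and adjacent(v, w) and adjacent(u, w)
               for u, v, w in combinations(sites, 3))


if __name__ == "__main__":
    print(triangle_for(5), [stage_of(a, b) for a, b in triangle_for(5)])
    print(grid_has_triangle(4, 4))
```

The lattice check reflects the fact that a square lattice is bipartite (checkerboard colouring), so it has no odd cycles.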
The lower bound that we just proved may be interesting
to those experimentalists working on implementing 2D
architectures for quantum information processing. The lower
bound shows that, with certain restrictions, the QFT in 2D
square lattices cannot in principle be parallelized any more
efficiently than to a depth at least 3/4 of the depth of the QFT
circuit executable in the LNN architecture.
$3n/2 + O(1)$ lower bound in LNN, model B. Recall that the
number of gates in SC is $n(n-1)/2$ and they all require different
qubit-to-qubit interactions to be available. Next, note that in
the LNN architecture the application of a single SWAP can make
at most two new interactions available for a gate to be
applied on. Since $n-1$ interactions are available in the chain
before any SWAP is applied, the total number of SWAPs that one must
execute in a circuit to go through all $n(n-1)/2$ possible
interactions is at least
$\lceil (n(n-1)/2 - (n-1))/2 \rceil = \lceil (n-1)(n-2)/4 \rceil$.
This means that the total number of gates to be executed in the LNN
architecture to compute SC must be at least
$n(n-1)/2 + \lceil (n-1)(n-2)/4 \rceil = \lceil (3n-2)(n-1)/4 \rceil$.
At most $\lfloor n/2 \rfloor$ gates can be executed in parallel.
Thus, the depth of the circuit is at least the minimum
total number of gates to be executed divided by the maximum
number of gates that can be executed simultaneously,
i.e. $3n/2 + O(1)$.
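The counting argument above is easy to evaluate for concrete $n$. The sketch below follows the quantities in the text step by step; the function name and the returned tuple are illustrative choices, and $n \geq 2$ is assumed so that at least one two-qubit gate fits in a stage.

```python
def model_b_lnn_bound(n):
    """Return (gates, min_swaps, total_gates, depth_bound) for n >= 2 qubits."""
    if n < 2:
        raise ValueError("need at least two qubits")
    gates = n * (n - 1) // 2              # all pairwise interactions in SC
    initially_available = n - 1           # neighbors in the chain before any SWAP
    # Each SWAP exposes at most two new interactions (ceiling division).
    min_swaps = -(-(gates - initially_available) // 2)
    total_gates = gates + min_swaps
    max_parallel = n // 2                 # disjoint two-qubit gates per stage
    depth_bound = -(-total_gates // max_parallel)
    return gates, min_swaps, total_gates, depth_bound

for n in (4, 10, 100):
    gates, swaps, total, depth = model_b_lnn_bound(n)
    print(n, gates, swaps, total, depth, 3 * n / 2)
```

For every $n$ tried, the computed depth bound stays within a small constant of $3n/2$, which is the $O(1)$ term in the statement.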
This lower bound is constructed based on the assumption
that all gates in SC need to be executed, and does not take into
account that the order they are executed in is important. Thus,
the restriction on the form of the computation is significantly
weaker than that for model A, and the proven lower bound is
looser.
Generalizing the above techniques, it can be shown that in
an architecture where each qubit has a finite number of neighbors
bounded by a number $k$:
• the lower bound for executing SC is $(2 + 2/k)n + O(1)$ in
model A;
• the lower bound for executing SC is $(1 + 1/k)n + O(1)$ in
model B.
The $5n/4 + O(1)$ lower bound announced in Table I follows
from the second of these two statements. Given the linearity
of the proven lower and upper bounds, we have just shown the
asymptotic optimality of the depth of our skeleton circuit in
the restricted architectures considered in this paper.
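The two general formulas can be tabulated for the degrees that appear in this paper. This is a small sketch using exact fractions; the function name is hypothetical, and taking $k = 2$ for LNN and $k = 4$ for the 2D square lattice is an assumption based on the number of neighbors each qubit has in those architectures.

```python
from fractions import Fraction

def lower_bound_coefficient(k, model):
    """Leading coefficient c of the c*n + O(1) lower bound for neighbor bound k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if model == "A":
        return 2 + Fraction(2, k)
    if model == "B":
        return 1 + Fraction(1, k)
    raise ValueError("model must be 'A' or 'B'")

for k in (2, 4):
    print(k, lower_bound_coefficient(k, "A"), lower_bound_coefficient(k, "B"))
```

With $k = 2$ the model B coefficient is $3/2$, matching the LNN bound proved above, and with $k = 4$ it is $5/4$. In model A the general formula gives $3$ and $5/2$, which is weaker than the dedicated $10n/3$ and $3n$ bounds proved above for LNN and the 2D square lattice.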
IV. CONCLUSION
In this paper we studied the complexity of the execution of
the quantum Fourier transformation and stabilizer circuits in
restricted architectures.
We reconstructed the depth $4n + O(1)$ circuit (composed
of SWAP and controlled-Z gates) for the QFT, initially reported
in [17], which is implemented in the LNN architecture. With
the application of our generalized technique we showed how
the approximate QFT circuit can be executed in linear depth in
the LNN architecture. We proved a number of lower bounds
for the depth of the QFT circuit, which are all a constant factor
away (ranging from $1/4$ to $5/6$, depending on the computational
model and assumptions made) from the above upper
bound. Some of our lower bounds can be used by experimentalists
working on implementing advanced architectures as a
guide to how complex architectures may need to be for particular
types of computations. For instance, we proved that, with
certain restrictions, the QFT circuit in 2D square lattices cannot
in principle be parallelized to a depth smaller than
$3/4$ of the depth of the QFT circuit executable in the LNN
architecture.
More importantly, we presented a constructive algorithm
for synthesizing linear depth stabilizer circuits in the LNN
architecture. In particular, we showed that any stabilizer circuit
can be executed in at most $30n + O(1)$ stages, each composed
of generic two-qubit gates, which in the library with CNOT
and single-qubit gates translates to a circuit of depth at most
$90n + O(1)$. This upper bound is asymptotically optimal. We
considered specific stabilizer circuits and showed how these
circuits can be executed faster than reported by previous
researchers [24].
Acknowledgments
I would like to thank Prof. Michele Mosca from the University
of Waterloo for his help in the preparation of this
manuscript and for useful discussions. I wish to thank Jacob D.
Biamonte from the University of Oxford and Donny Cheung
from the University of Waterloo for their help in preparing
and proofreading this manuscript. This work was supported
by a PDF grant from the Natural Sciences and Engineering
Research Council of Canada.
[1] L. K. Grover. A fast quantum mechanical algorithm for
database search. Proceedings of 28th Annual ACM Sym-
posium on the Theory of Computing, pages 212-219, 1996,
quant-ph/9605043.
[2] P. W. Shor. Polynomial-time algorithms for prime factorization
and discrete logarithms on a quantum computer. SIAM Journal
on Computing, 26:1484–1509, 1997, quant-ph/9508027.
[3] C. Negrevergne, T. S. Mahesh, C. A. Ryan, M. Ditty, F. Cyr-
Racine, W. Power, N. Boulant, T. Havel, D. G. Cory, and
R. Laflamme. Benchmarking quantum control methods on a
12-qubit system. Physical Review Letters, 96(170501), 2006,
quant-ph/0603248.
[4] H. Häffner, W. Hänsel, C. F. Roos, J. Benhelm, D. Chek-al-kar,
M. Chwalla, T. Körber, U. D. Rapol, M. Riebe, P. O. Schmidt,
C. Becher, O. Gühne, W. Dür, and R. Blatt. Scalable multiparticle
entanglement of trapped ions. Nature 438:643–646, December
2005, quant-ph/0603217.
[5] M. Nielsen and I. Chuang. Quantum Computation and Quan-
tum Information. Cambridge University Press, 2000.
[6] J. Zhang, J. Vala, S. Sastry, and K. B. Whaley. Geometric
theory of nonlocal two-qubit operations. Physical Review A,
67(042313), 2003, quant-ph/0209120.
[7] L. M. K. Vandersypen, M. Steffen, G. Breyta, C. S. Yannoni,
M. H. Sherwood, and I. L. Chuang. Experimental realization
of Shor’s quantum factoring algorithm using Nuclear Magnetic
Resonance, Nature 414:883–887, December 2001.
[8] R. Van Meter and M. Oskin. Architectural implications of quan-
tum computing technologies. ACM Journal on Emerging Tech-
nologies in Computing Systems, 2(1):31–63, 2006.
[9] D. Copsey, M. Oskin, F. Impens, T. Metodiev, A. Cross, F.
T. Chong, I. L. Chuang, and J. Kubiatowicz. Toward a scalable
silicon-based quantum computing architecture. IEEE Journal
of Selected Topics in Quantum Electronics, 9(6):1552–1569,
2003.
[10] A. Skinner, M. Davenport, and B. Kane. Hydrogenic spin quan-
tum computing in silicon: a digital approach. Physical Review
Letters, 90(087901), 2003, quant-ph/0206159.
[11] J. Sherson, H. Krauter, R. K. Olsson, B. Julsgaard, K. Ham-
merer, I. Cirac, E. S. Polzik. Quantum teleportation between
light and matter. Nature 443:557–560, 2006.
[12] G. Burkard and A. Imamoglu. Ultra-long distance interaction
between spin qubits. Physical Review B 74(041307), 2006,
cond-mat/0603119.
[13] A. M. Steane and D. M. Lucas. Quantum computing with
trapped ions, atoms and light. Fortschritte der Physik, 48(9-
11):839–858, 2000, quant-ph/0004053.
[14] R. Cleve and J. Watrous. Fast parallel circuits for the quan-
tum Fourier transform. In IEEE Symposium on Foundations of
Computer Science, pages 526–536, 2000, quant-ph/0006004.
[15] D. Coppersmith. An approximate Fourier transform useful in
quantum factoring. Technical Report RC19642, IBM, 1994.
[16] C. Moore and M. Nilsson. Parallel quantum computation and
quantum codes, 1998, quant-ph/9808027.
[17] A. G. Fowler, S. J. Devitt, and L. C. L. Hollenberg. Implemen-
tation of Shor’s algorithm on a linear nearest neighbor qubit
array. Quantum Information and Computation, 4(4):237–251,
2004, quant-ph/0402196.
[18] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin, and W. K. Woot-
ters. Mixed-state entanglement and quantum error correction.
Physical Review A, 54(3824), 1996, quant-ph/9604024.
[19] D. Gottesman. A class of quantum error-correcting codes sat-
urating the quantum Hamming bound. Physical Review A
54(1862), 1996, quant-ph/9604038.
[20] S. Aaronson and D. Gottesman. Improved simulation of
stabilizer circuits. Physical Review A, 70(052328), 2004,
quant-ph/0406196.
[21] A. Broadbent and E. Kashefi. Parallelizing Quantum Circuits,
April 2007, arXiv:0704.1736.
[22] K. N. Patel, I. L. Markov and J. P. Hayes. Efficient synthe-
sis of linear reversible circuits. In International Workshop on
Logic Synthesis, Temecula Creek CA, June 2004, pp. 470-477,
quant-ph/0302002.
[23] A. G. Fowler, C. D. Hill and L. C. L. Hollenberg. Quantum er-
ror correction on linear nearest neighbor qubit arrays. Physical
Review A, 69(042314), 2004, quant-ph/0311116.
[24] T. Metodi, D. D. Thaker, A. W. Cross, F. T. Chong, and I. L.
Chuang. Scheduling physical operations in a quantum informa-
tion processor. In Proceedings of the SPIE, Volume 6244, pp.
62440T, 2006.
