Constant-Factor Optimization of Quantum Adders on 2D Quantum
  Architectures by Saeedi, Mehdi et al.
ar
X
iv
:1
30
4.
04
32
v1
  [
qu
an
t-p
h]
  1
 A
pr
 20
13
Constant-Factor Optimization of Quantum Adders on 2D Quantum Architectures
Mehdi Saeedi∗, Alireza Shafaei, Massoud Pedram
Department of Electrical Engineering, University of Southern California,
Los Angeles, CA 90089-2562
August 26, 2018
Abstract
Quantum arithmetic circuits have practical applications in various quantum algorithms.
In this paper, we address quantum addition on 2-dimensional nearest-neighbor architectures
based on the work presented by Choi and Van Meter (JETC 2012). To this end, we propose
new circuit structures for some basic blocks in the adder, and reduce communication overhead
by adding concurrency to consecutive blocks and also by parallel execution of expensive Toffoli
gates. The proposed optimizations reduce total depth from 140
√
n + k1 to 92
√
n + k2 for
constants k1, k2 and affect the computation fidelity considerably.
1 Introduction
Quantum algorithms are often described in the quantum circuit model of computation, where for
a quantum circuit with n qubits, any pairs of qubits can interact. However, current advances in
physical quantum technologies can only allow qubit interactions in one-, two-, or three-dimensional
spaces. Restricting interactions to only linear dimension results in O(n) overhead. On the other
hand, working with 2D (or 3D) quantum architectures where each qubit can interact with 4 (or
6) neighboring qubits provides more flexibility.
For a given quantum circuit C one can construct an interaction graph GC = (VC , EC), the
nodes of which represent qubits in C with edges between them when a gate in C involves the
related qubits. Additionally, the architecture (or fabric) of a quantum computing system can be
described by a simple connected graph GQ = (VQ, EQ) where vertices VQ represent qubits and
edges EQ represent adjacent qubit pairs that gates can be applied on [1]. Accordingly, the problem
of mapping a quantum circuit C with arbitrary interactions between qubits onto a quantum
architecture with limited interaction distance can be mapped to the problem of embedding graph
GC into graph GQ.
In general, the graph embedding problem is NP-hard. However, optimal embedding methods
with polynomial time complexities for several classes of graphs have been proposed [2]. In [3],
the concept of dilation in graph embedding has been applied to find a depth lower bound for
a quantum circuit after embedding. In this case, dilation is defined as the maximum distance
between adjacent nodes of the graph after embedding. Working with proven properties of log-
depth binary trees and considering the fact that log-depth quantum addition circuits exist, Choi
and Van Meter [3] showed that the depth lower bound of the exact quantum addition circuit
on a k-dimensional quantum architecture is Ω( k
√
n). In [4], the authors examined the minimum
overhead in depth for emulating a circuit C by a circuit C′ subject to the constraints imposed by
the interaction constraints and showed that this overhead is O(n) for 1D, O(
√
n) for 2D, O(log2 n)
or O(log n) (depending on the approach) for hypercube.
Exploring an efficient realization of a given quantum algorithm or quantum circuit for a re-
stricted architecture has been followed by a number of researchers during the recent years. Physical
implementations of the quantum Fourier transform (QFT) [5, 6], Shor’s factorization algorithm
[7, 8, 9], quantum error correction [10], and general reversible circuits [11] for 1D/2D architectures
∗msaeedi@usc.edu
1
• • • T • • • • T •
• • T  T †  • ×  T  T †  × T  T † 
H  T †  T  T †  T H H  T † × • × T H
Figure 1: Decomposition of the Toffoli gate into one-qubit and six CNOT gates [24] and the
implementation with adjacent qubits.
have been explored in the past. Worst-case synthesis cost of a general/Boolean unitary matrix
under the 1D restriction has been discussed in [12, 13, 14, 15]. In [16, 17, 18] heuristic methods
for converting an arbitrary quantum circuit to its equivalent circuit on 1D architectures have been
proposed.
Quantum adder and its modular version have applications in different quantum algorithms
including Shor’s factoring algorithm. In [19], a quantum adder with Θ(
√
n) depth on 2D quantum
architectures was proposed which has 140
√
n− 72 depth, in terms of one- and two-qubit quantum
gates. Asymptotically, the depth of the proposed adder is optimal. However, constant-factor
optimization is possible and in fact desirable. Besides the effect of reducing circuit size/depth
on physical realization, any additional gate in the circuit longest path can reduce circuit fidelity
to some extent. Based on the analysis done in [20] for fault-tolerant error correction with a
concatenated 7-qubit CSS code [21], nearest-neighbour communication overhead results in 175x
reduction in error threshold. Improving error threshold is costly and may include using a more
sophisticated quantum control protocol to have gates with higher fidelities or applying a more
robust error correction code. Therefore, reducing unnecessary communication overhead for a useful
quantum computation is vital. Because of the effect of addition on e.g., modular multiplication
and modular exponentiation circuits [9, 22, 23], reducing communication overhead for quantum
adder by circuit optimization — the focus of this work — is of particular interest.
In this paper, we show how 140
√
n+const depth in [19] can be further improved to 92
√
n+const.
For this purpose, we reconsider the basic blocks in the suggested quantum adder and introduce
some constant-factor optimizations in communication overhead in different stages. To physically
implement a given circuit, one needs to decompose all gates into primitive one- and two-qubit
gates. To decompose a 3-qubit Toffoli (T ) gate, we use Clifford+T gates which are universal and
have fault-tolerant (FT) implementation [21]. Figure 1 shows the decomposition of the Toffoli gate
into one- and two-qubit gates. To consider depth, we report circuit depth in terms of single-qubit,
CNOT (C) and SWAP (S) gates. The rest of this paper is organized as follows. In Section 2, the
method in [19] is discussed. We introduce the reduction techniques in Section 3. The result of
the proposed reductions is analyzed in Section 4 and Section 5. We finally conclude the paper in
Section 6.
2 Quantum Addition on 2D Architectures
In this section, we describe the circuit structure in [19] for quantum addition on 2D architectures.
For an n-qubit quantum circuit, the method in [19] arranges the qubits in
√
n×√n arrays where
each qubit can interact with its four neighboring qubits with no additional cost. Additionally, the
circuit was divided into 3 phases which are executed sequentially. In the first phase, ripple-carry
addition is performed on the first column, and carry-lookahead addition is performed on the other√
n − 1 columns. In the second phase, carry propagation is performed between columns, and
finally in phase 3 carry generation and summation are performed.
In the first phase, after using a half-adder and
√
n− 1 full-adders output carries c2, · · · c√n+1
will be available. It is done in 32
√
n− 17 unit-time steps in [19]. The carry-lookahead addition in
other columns produces
gk
√
n+j = ak
√
n+j · bk√n+j (1)
pk
√
n+j = ak
√
n+j ⊕ bk√n+j (2)
for 1 ≤ k ≤ √n− 1 and 1 ≤ j ≤ √n. After computing gi and pi values in all columns in parallel,
G[i, j] and P [i, j] are computed in serial based on (3) and (4) for 1 ≤ k ≤ √n−1, and 2 ≤ j ≤ √n
2
Table 1: Basic blocks in 2D adder [19] and their depths in terms of unit-cost gates. The last term
(i.e., 3) in total depth represents 2 NOTs and one CNOT gate used to construct the final output
in [19].
Name #steps: gate sequence Circuit
H, T, CNOT (C), SWAP (S) 1
Toffoli (T (a,b,0)) 14: 2 S+ 12 1-qubit H(0)C(b,0)T†(0)S(b,0)C(a,b)T(b)C(0,b)
T†(b)C(a,b)S(b,0)T(b)T(0)C(a,b)H(0)
T(a)T†(b)C(a,b)
Half-adder(a,b,0) 15: 1 T + 1 C T (a,b,0)T (a,b)
Full-adder(c,a,b,0) 32: 2 T + 2 C+ 2 S T (a,b,0)T (a,b)S(c,a)T (a,b,0)T (a,b)
S(c,a)
g,p(a,b,0) 15: 1 T + 1 C T (a,b,0)T (a,b)
G,P(P,G,a,p,g,0) 34: 2 T + 6 S S(G,a)S(P,G)T(a,p,g)S(G,a)S(g,0)
T(a,p,g)S(G,a)S(P,G)S(G,a)
Column carry(P,G,C) 18: 1 T + 4 S S(P,G)T (C,G,P)S(G,C)S(P,G)S(G,C)
Carry(P,G,a,p,C) 18: 1 T + 4 S S(P,G)S(p,C)S(a,p)T (a,G,P)S(G,a)
S(P,G)
Carry1(p,g,c) 16: 1 T + 2 S S(g,c)T (p,g,c)S(p,g)
SUM(c,P,a,p) 5 : 1 C+ 4 S S(c,P)S(P,a)T (a,p)S(P,a)S(c,P)
SUM1(c,a,p) 3 : 1 C+ 2 S S(c,a)T (a,p)S(c,a)
SUM2(p,c) 1 : 1 C T (c,p)
phase 1 34
√
n− 19: g,p + (√n− 1)G,P
phase 2 18
√
n− 18: (√n − 1) Column carry
phase 3 18
√
n+ 1: (
√
n − 1) Carry + Carry1 + SUM1
clearing ancillae 70
√
n− 39: phase 1 + phase 2 + phase 3 - SUM1
total depth 140
√
n− 72: phase 1 + phase 2 + phase 3 + clearing ancillae + 3
where G[k
√
n + 1, k
√
n + 1] = gk
√
n+1 and P [k
√
n + 1, k
√
n + 1] = pk
√
n+1. This part takes
34
√
n− 19 time steps in [19]. Accordingly, the first phase in [19] results in 34√n− 19 time steps.
G[k
√
n+ 1, k
√
n+ j] = gk
√
n+j ⊕ pk√n+j ·G[k
√
n+ 1, k
√
n+ j − 1] (3)
P [k
√
n+ 1, k
√
n+ j] = pk
√
n+j · P [k
√
n+ 1, k
√
n+ j − 1] (4)
In the second phase, column-level carries are computed as shown in (5) for 1 ≤ k ≤ √n− 1 in
18
√
n− 18 time steps.
c(k+1)
√
n+1 = G[k
√
n+ 1, (k + 1)
√
n]⊕ ck√n+1 · P [k
√
n+ 1, (k + 1)
√
n] (5)
In phase 3 output carries are calculated sequentially as (6) for 1 ≤ k ≤ √n − 1 and j =√
n− 1, ..., 1.
ck
√
n+j+1 = G[k
√
n+ 1, k
√
n+ j]⊕ ck√n+1 · P [k
√
n+ 1, k
√
n+ j] (6)
Finally, addition outputs are calculated as shown in (7) for 1 ≤ k ≤ √n− 1 and 1 ≤ j ≤ √n.
Altogether, operations in phase 3 can be performed in 18
√
n+ 1 time steps.
sk
√
n+j = ak
√
n+j ⊕ bk√n+j ⊕ ck√n+j (7)
Considering the three subcircuits for phase 1, phase 2, and phase 3 in sequence leads to
70
√
n − 36 time steps in [19]. Applying the inverse circuit to clear ancillae leads to 140√n − 72
time steps for the complete adder.
Based on the equations (1)-(7), Table 1 reports circuit depth in different blocks. In this table,
we used the same notation in [19] for circuit blocks — g,p to compute gi, pi values in (1) and (2);
G,P to compute G[i, j] and P [i, j] values in (3) and (4); Column carry to compute column-level
carries in (5); Carry & Carry1 to compute carries in (6); and SUM, SUM1 & SUM2 to compute
final outputs in (7).
3 The Proposed 2D Adder
In this section, we revise the basic blocks in [19] and introduce additional parallelism in var-
ious parts to reduce circuit depth. Basically, the proposed optimizations are based on (1) new
3
circuit structures for CARRY and SUM basic blocks (2) reducing communication overhead in Col-
umn carry, (3) parallel execution of expensive Toffoli gates in G,P blocks as well as in Full-adders,
and (4) reducing interaction overhead by adding concurrency to consecutive blocks.
3.1 New Circuits
Working with the same circuit structures in [19] for Half-adder, g,p, and G,P blocks as reported
in Table 1, we define several new structures for the other blocks.
• Full-adder: The first T and C gates in the Full-adder blocks in [19] can be executed in
parallel with the gates in the Half-adder circuit. This saves one T and one C for all √n− 1
Full-adders.
• Column Carry: Figure 4 shows the new structure of Column Carry block. In this circuit,
c[k
√
n+1] is from the previous column (e.g., c4 in Figure 2). After the computation, the new
carry, e.g., c7, is moved down, to be used by the next Column Carry block. The previous
carry, e.g., c4 is placed near to the Carry module. This new structure saves 1 SWAP gate.
• Carry: Figure 5 shows the new structure for Carry block. Since c[k√n + 1] is required to
compute all carries in different rows, c[k
√
n + 1] is moved up in this figure. On the other
hand, the generated carry is required to compute sum values, and hence is moved down.
This new circuit uses 5 SWAP gates (vs. 4 in [19]).
• SUM: Applying the proposed circuit for Carry results in adjacent c[k√n + j + 1] and
p[k
√
n+ j+1] values (see Figure 5). Based on (7) sum outputs can be computed by a single
CNOT gate. This saves 4 SWAP gates in [19]. In order to construct si values on bi qubits,
one needs to add one SWAP gate S(p[k√n+1], c[k√n+ 1]). However, this SWAP gate can
be removed because of an identical SWAP gate in the Carry circuit. Accordingly, we define
another circuit block Carry1 with excluding the SWAP on c[k
√
n+1] and P [k
√
n+1][k
√
n+j]
(for j = 1) qubits. We do not need to use SUM1 and SUM2 blocks in the proposed 2D adder
structure.
3.2 Reducing Communication Overhead
To use adjacent gates in the 2D quantum adder, we use a set of SWAP gates inside each circuit
block. The added SWAP gates are used for communication between those gates required for the
computation. In other words, the added SWAP gates are not required for the computation, and
should be reduced as much as possible. Independent optimization of different blocks can reduce
communication overhead inside each subcircuit, but has no view about the neighboring subcircuits.
In this section, we consider consecutive circuit blocks to reduce communication overhead further.
Note that the optimizations given in this section are based on the new circuit blocks given in
Section 3.1.
• G,P ⇛ Carry: Reconsider (3), (4), and (6) and note that the result of Column carry in
(5), i.e., c[k
√
n + 1], is constructed on the last qubit in the Carry block (see Figure 4 and
Figure 5). Figure 6 shows the blocks in sequence. To simplify the circuit, note that the
last three SWAP gates in G,P can be moved to right. Next, the resulting circuit can be
reconstructed as shown in Figure 6(b). Accordingly, three SWAP gates in each G,P block
can be saved. Figure 7 shows the new circuits for Carry and Carry1. Note that some of G,P
blocks are directly connected to the Carry (or Carry1) blocks without any interaction with
Column carry blocks. For such cases, we can apply the same mechanism.
• G,P ⇛ G,P: Each G,P block constructs two outputs based on (4) and (3) where G[k√n+
1, k
√
n + j] depends on G[k
√
n + 1, k
√
n + j − 1] and P [k√n + 1, k√n + j] depends on
P [k
√
n + 1, k
√
n + j − 1]. Since G[k√n+ 1, k√n + j] is constructed first, we can use it to
construct G[k
√
n+ 1, k
√
n+ j + 1] in parallel to construction of P [k
√
n+ 1, k
√
n+ j − 1].
This can save one Toffoli and one SWAP. Figure 8 shows the result of this optimization.
4
a1
HA
•
C−1
a1
b1    s1
0
FA
0
a2 • a2
b2    s2
0
FA
0
a3 • a3
b3    s3
0
ColCarry
0
a4
g, p
• a4
b4
G,P
CARRY1
SUM
   s4
0 0
a5
g, p
• a5
b5
SUM
   s5
0
G,P
CARRY
0
0 0
a6
g, p
• a6
b6
SUM
   s6
0 0
0
ColCarry
0
a7
g, p
• a7
b7
G,P
CARRY1
SUM
   s7
0 0
a8
g, p
• a8
b8
SUM
   s8
0
G,P
CARRY
0
0 0
a9
g, p
 •  a9
b9
SUM
 s9
0 0
0 0
Figure 2: The revised block diagram of a 2D 9-bit adder in [19] based
on the blocks used in this paper. The critical path in this circuit is
g,p99KG,P99KColCarry99KColCarry99KCARRY99KCARRY199KSUM. The C−1 block is the
reverse of the circuit shown in the dashed box. This reverse circuit with the NOTs and CNOTs
shown are applied to clear ancillae in [19]. Except for ColCarry (Column carry), the number
of inputs and outputs for other modules are the same as the ones shown in this figure. In
Column carry, the number of inputs/outputs is 3 — i.e., the first line and the last two lines are
actual inputs and outputs. Note that these three lines are neighbor in the 2D layout. The qubit
placement for this 2D grid and their values during the computation (up to clearing ancillae) are
given in Figure 3.
4 Depth Analysis
In this section, we analyze the circuit depth of a 2D n-bit quantum adder based on the circuit
structures proposed for each block.
• Phase 1 — Half-adder+Full-adder: We can execute Half-adder and the first two gates
(T +C) in all Full-adders in parallel. This results in 1T +1C+(√n − 1)(2S+1C+1T ) time
steps.
• Phase 1 — g,p+G,P: Each g,p block includes one Toffoli gate and one CNOT gate. Except
for the first G,P block, the other
√
n − 2 G,P blocks include 3 SWAPs and 1 Toffoli. The
first G,P block includes two Toffoli and two SWAP gates. Altogether, circuit depth can be
calculated as (1T +1C)+(2T+2S)+(√n− 2)(3S+1T ).
• Phase 2 — Column carry: There are √n − 1 Column carry blocks in cascade. This
results in
√
n− 1(1T +3S) time steps.
• Phase 3 — Carry + SUM: There are √n− 2 Carry blocks followed by one Carry1 block
and one SUM block. Therefore, circuit depth is (
√
n− 2)(1T +4S)+(3S+1T )+1C.
5
0b3
a3
0
b2
a2
0
b1
a1
-
-
0
0
b6
a6
0
0
b5
a5
0
b4
a4
0
0
b9
a9
0
0
b8
a8
0
b7
a7 - a4 a7
- b4, p4, s4 b7, p7, s7
a1 0, g4, c4 0, g7, c7
b1, s1 a5 a8
0, c2 b5, p5, s5 b8, p8, s8
a2 0, g5, P [4, 5], c4, c5 0, g8, P [7, 8], c7, c8
b2, s2 0, G[4, 5], P [4, 5] 0, G[7, 8], P [7, 8]
0, c3 a6 a9
a3 b6, p6, s6 b9, p9, s9
b3, s3 0, g6, P [4, 6], c4, c6 0, g9, P [7, 9], c7, c9
0, c4 0, G[4, 6], c7 0, G[7, 9], c10
Figure 3: The qubit placement for the 2D grid in Figure 2 and their values during the computa-
tion.
c[k
√
n+ 1] • × • × × P [k√n+ 1][(k + 1)√n] P [k√n+ 1][(k + 1)√n] • ×
P [k
√
n+ 1][(k + 1)
√
n] • × ≡ • × c[k√n+ 1] G[k√n+ 1][(k + 1)√n]  ×××
G[k
√
n+ 1][(k + 1)
√
n]   ××× c[(k + 1)√n+ 1] c[k√n+ 1] • × ×
(a) (b) (c)
Figure 4: (a) Circuit structure for Column carry based on (5). Note that c[(k − 1)√n + 1] and
P [(k− 1)√n+1][k√n] are not adjacent (see Figure 2). (b) Circuit in (a) with adjacent gates. (c)
Circuit in (b) with relabelled qubits to show adjacent qubits.
Table 2 reports circuit depth for each component and the total depth in the proposed 2D
quantum adder. As can be seen in this table, circuit depth is improved by a factor of 2635 (i.e.,
%24).
In [25], a new circuit for Peres with depth=5C+3 has been proposed (Figure 10(a)). After
inserting one CNOT (to have Toffoli) and two SWAP gates to have adjacent gates, one can use
the new circuit with depth=6C+2S+4 in order to further optimize the proposed 2D adder. Note
that in [25], a circuit structure for Toffoli gate with depth=6C+2 has been proposed too, Figure
9. However, working with Peres gate results in a more compact circuit in terms of the number
of SWAP gates. Following this path results in depth=92
√
n+const for the proposed 2D quantum
adder. Table 3 compares circuit depth based on different costs for Toffoli and SWAP gates.
5 Error Correction
To protect quantum information from errors due to e.g., noise or decoherence, quantum error
correction (QEC) should be used in any large-scale quantum computation. In the recent years,
various models for QEC have been proposed [21]. A common technique, known as concatenated
quantum code, is to encode a logical qubit into the state of several physical qubits (e.g., 7 in
Steane code and 9 in Bacon-Shor code [21], both for one level of concatenation).
Let assume each unitary operation should be followed by quantum error correction for proper
computation. This results in an aggressive quantum error correction mechanism. In some cir-
cumstances, one may insert error correction after several operations, instead of each operation.
Consider a quantum computation U with NU logical operations which include only FT quantum
gates. Moreover, assume that error correction for each FT gate requires NE physical instruc-
tions. NE includes SWAPs required for communication. Normally, NE differs for various logical
operations; however, we can consider the worst-case value among all FT gates. Working with
concatenated quantum error correction techniques, the total physical gate count at concatenation
level L can be estimated as NL = NL−1+NL−1×NE or NL ≈ NL−1×NE. We haveN0 = NU , and
P [k
√
n+ 1][k
√
n+ j] • ×× • × c[k√n+ 1]
G[k
√
n+ 1][k
√
n+ j]  ×  ×× P [k√n+ 1][k√n+ j]
a[k
√
n+ j + 1] ≡ × • ×× a[k√n+ j + 1]
p[k
√
n+ j + 1] ×× ×× p[k√n+ j + 1]
c[k
√
n+ 1] • × × × c[k√n+ j + 1]
Figure 5: Circuit structure for Carry based on (6). Inputs a[k
√
n+ j + 1] and p[k
√
n+ j + 1] are
not used in the computation.
6
• × × • × ×
· · · · · ·
P [k
√
n+ 1][k
√
n+ j] × × • × × ×
G[k
√
n+ 1][k
√
n+ j] × × × ×××  ×× ≡ × × ×  ×××
a[k
√
n+ j + 1] × • × • × × × • ×× × • × • • ××××
p[k
√
n+ j + 1] • • ×× ×× • • × • × ××
g[k
√
n+ j + 1]  ×  • × × ×  ×  • × × ×
0 ×  ××× ×  ×××
(a) (b)
Figure 6: (a) G,P, Column carry, and Carry blocks in cascade. The three rightmost SWAP gates
in G,P can be merged with gates in the Carry block to construct a new circuit shown in (b).
P [k
√
n+ 1][k
√
n+ j] × × c[k√n+ 1]
G[k
√
n+ 1][k
√
n+ j]  ×××  ××× P [k√n+ 1][k√n+ j]
a[k
√
n+ j + 1] • ×××× • ××× a[k√n+ j + 1]
p[k
√
n+ j + 1] × • × ×× × • ×× p[k√n+ j + 1]
c[k
√
n+ 1] × × × × c[k√n+ j + 1]
(a) (b)
Figure 7: New circuit structures for Carry (a) and Carry1 (b) based on the optimization shown
in Fig 7. Note that the first SWAP gate can be executed in parallel with gates in the previous
block (see Figure 7).
therefore, NL = NU (NE)
L. Accordingly, besides the effect of the proposed approach on circuit
depth, one can implement the proposed 2D adder with fewer gates — the reduction factor is 2435 .
6 Conclusion
We considered a quantum adder on 2D quantum architectures. Our work is based on the results
reported in [19] with several improvements. In particular, we optimized the building blocks of
the 2D adder with focus on reducing the communication overhead required in 2D quantum archi-
tectures. Having optimized consecutive blocks, the proposed adder can execute expensive Toffoli
gates concurrently in several locations. The suggested optimizations improve depth=140
√
n+ k1
in [19] to 92
√
n+ k2 for constants k1 and k2.
References
[1] D. Cheung, D. Maslov, and S. Severini. Translation techniques between quantum circuit
architectures. Workshop on Quant. Inf. Proc., Dec 2007.
[2] J. Dı´az, J. Petit, and M. J. Serna. A survey of graph layout problems. ACM Comput. Surv.,
34(3):313–356, 2002.
Table 2: Circuit depth for our blocks in 2D adder. Circuit depths for CNOT (C), SWAP (S), and
Toffoli (T ) gates are considered as 1, 1, and 14 as done in [19].
Block Circuit Ours [19]
Half-adder 1T +1C 15 15
Full-adder 2S+1C+1T 17 32
g,p 1T +1C 15 15
G,P (first) 2T +2S 30 34
G,P (others) 3S+1T 17 34
Column carry 1T +3S 17 18
Carry 1T +4S 18 18
Carry1 3S+1T 17 18
SUM 1C 1 5
Phase1-1 1T +1C+(√n− 1)(2S+1C+1T ) 17√n− 2 32√n− 17
Phase1-2 (1T +1C)+(2(T +2S)+(√n− 2)(3S+1T ) 17√n+ 11 34√n− 19
Phase2 (
√
n− 1)(1T +3S) 17√n− 17 18√n− 18
Phase3 (
√
n− 2)(1T +4S)+(3S+1T )+1C 18√n− 18 18√n+ 1
clearing ancillae Phase1-2+Phase2+Phase3-SUM 52
√
n− 24 70√n− 39
2D Adder Phase1-2+Phase2+Phase3+clearing ancillae+3 104
√
n− 46 140√n− 72
7
× ×
× × × × × ×
× • × • × • × •
• • • •
 ×  × ≡  ×  × × × ×
× × × × × • × × × • × × ×
× • × • × • ×× • × • ×× •
• • × ×  ×× • ××  ×× •
 ×  × ××  × ×× 
× × ×
Figure 8: Construction of G[k
√
n + 1, k
√
n + j + 1] can be done in parallel to construction of
P [k
√
n+ 1, k
√
n+ j − 1] in two consecutive G,P blocks. The right circuit shows the new circuit
structure for G,P (except for the first G,P block).
a • a a T †  T  T †  T  a
b • b ≡ b T † • • • • b
c  c c H •  T  • T † H c
Figure 9: Toffoli decomposition with depth 6C+2 [25].
[3] B.-S. Choi and R. Van Meter. On the effect of quantum interaction distance on quantum
addition circuits. J. Emerg. Technol. Comput. Syst., 7(3):11:1–11:17, August 2011.
[4] R. Beals et al. Efficient distributed quantum computing, arXiv:1207.2307v2, 2012.
[5] Y. Takahashi, N. Kunihiro, and K. Ohta. The quantum Fourier transform on a linear nearest
neighbor architecture. Quant. Inf. Comput., 7:383–391, 2007.
[6] D. Maslov. Linear depth stabilizer and quantum Fourier transformation circuits with no
auxiliary qubits in finite neighbor quantum architectures. Phys. Rev. A, 76, 2007.
[7] A. G. Fowler, S. J. Devitt, and L. C. L. Hollenberg. Implementation of Shor’s algorithm on
a linear nearest neighbour qubit array. Quant. Inf. Comput., 4:237–245, 2004.
[8] S. A. Kutin. Shor’s algorithm on a nearest-neighbor machine. Asian Conf. on Quant. Inf.
Sci., 2007.
[9] P. Pham and K. M. Svore. A 2D nearest-neighbor quantum architecture for factoring.
arXiv:1207.6655, 2012.
[10] A. G. Fowler, C. D. Hill, and L. C. L. Hollenberg. Quantum error correction on linear nearest
neighbor qubit arrays. Phys. Rev. A, 69:042314.1–042314.4, 2004.
[11] M. Arabzadeh, M. Saheb Zamani, M. Sedighi, and M. Saeedi. Depth-optimized reversible
circuit synthesis. Quant. Inf. Proc., 12(4):1677–1699, 2013.
[12] M. Mo¨tto¨nen and J. J. Vartiainen. Decompositions of general quantum gates. Ch. 7 in Trends
in Quant. Comput. Research, NOVA Publishers, New York, 2006.
[13] V. V. Shende, S. S. Bullock, and I. L. Markov. Synthesis of quantum-logic circuits. IEEE
Trans. CAD, 25(6):1000–1010, Jun 2006.
[14] M. Saeedi, M. Arabzadeh, M. Saheb Zamani, and M. Sedighi. Block-based quantum-logic
synthesis. Quant. Inf. Comput., 11(3-4):0262–0277, 2011.
a • • a a T • • • a a T • • • • a
b •  b ≡ b T  T †  T  T † b b T ×  T ×  T  T †  b
c  c c H •  T †  T • H c c H • × T † × T • H c
(a) (b)
Figure 10: (a) Peres decompositions with depth 5C+3 [25], (b) Toffoli with adjacent gates based
on Peres decomposition (depth=6C+2S+4).
8
Table 3: Circuit depth for the proposed adder and the one in [19] considering different costs for
Toffoli and SWAP gates.
T -depth=14,S-depth=1 T -depth=14,S-depth=3 T -depth=12,S-depth=3 T -depth=12,S-depth=1
Ours [19] Ours [19] Ours [19] Ours [19]
104
√
n 140
√
n 144
√
n 176
√
n 132
√
n 160
√
n 92
√
n 124
√
n
[15] M. Saeedi, M. Saheb Zamani, M. Sedighi, and Z. Sasanian. Reversible circuit synthesis using
a cycle-based approach. J. Emerg. Technol. Comput. Sys., 6(4):13:1–13:26, 2010.
[16] M. Saeedi, R. Wille, and R. Drechsler. Synthesis of quantum circuits for linear nearest
neighbor architectures. Quant. Inf. Proc., 10(3):355–377, 2011.
[17] Y. Hirata, M. Nakanishi, S. Yamashita, and Y. Nakashima. An efficient conversion of quantum
circuits to a linear nearest neighbor architecture. Quant. Inf. Comput., 11(1–2):0142–0166,
2011.
[18] A. Shafaei, M. Saeedi, and M. Pedram. Optimization of quantum circuits for interaction
distance in linear nearest neighbor architectures. In Design Autom. Conf., 2013.
[19] B.-S. Choi and R. Van Meter. A
√
n-depth quantum adder on the 2D NTC quantum computer
architecture. J. Emerg. Technol. Comput. Syst., 8(3):24:1–24:22, Aug 2012.
[20] T. Szkopek et al. Threshold error penalty for fault-tolerant quantum computation with
nearest neighbor communication. IEEE Trans. Nano., 5(1):42-49, 2006.
[21] M. Nielsen and I. Chuang. Quantum Computation and Quantum Information. Cambridge
Univ. Press, 2000.
[22] I. L. Markov and M. Saeedi. Constant-optimized quantum circuits for modular multiplication
and exponentiation. Quant. Info. Comput., 12(5-6):361–394, May 2012.
[23] I. L. Markov and M. Saeedi. Faster quantum number factoring via circuit synthesis. Phys.
Rev. A, 87:012310, Jan 2013.
[24] V. V. Shende and I. L. Markov. On the CNOT-cost of TOFFOLI gates. Quant. Inf. Comput.,
9(5-6):461–486, May 2009.
[25] M. Amy, D. Maslov, M. Mosca, and M. Ro¨tteler. A meet-in-the-middle algorithm for fast
synthesis of depth-optimal quantum circuits. IEEE Trans. CAD, arXiv:1206.0758v3, 2013.
Acknowledgements
Authors were supported by the Intelligence Advanced Research Projects Activity (IARPA) via
Department of Interior National Business Center contract number D11PC20165. The U.S. Gov-
ernment is authorized to reproduce and distribute reprints for Governmental purposes notwith-
standing any copyright annotation thereon. The views and conclusions contained herein are those
of the authors and should not be interpreted as necessarily representing the official policies or
endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.
9
a1 • • a1 •
C−1
a1
b1 •  s1 s1    s1
0  c2 × × 0
a2 • • × • • × a2 • a2
b2 •  •  s2 s2    s2
0   c3 × × 0
a3 • • × • • × a3 • a3
b3 •  •  s3 s3    s3
0   c4 • × ×P[4,6] 0
a4 • • a4 • a4
b4 •  p4 × ×  s4    s4
0  g4 × × ×  ×××c4 • 0
a5 • • × • × • • ××× a5 • a5
b5 •  p5 • • × • ××  s5    s5
0  g5  ×  P[4,5] × × × c4× ×c5 • 0
0 × G[4,5] • × × ×  ××× 0
a6 • • × • × × • • ×××× a6 • a6
b6 •  p6 × ×  × × • × • × ××  s6    s6
0  g6 × × ×  P[4,6] • × c4 × ×c6 c6 • 0
0 × G[4,6]  ××× c7 • × ×P[7,9] 0
a7 • • a7 • a7
b7 •  p7 × ×  s7    s7
0  g7 × × ×  ×××c7 • 0
a8 • • × • × • • ××× a8 • a8
b8 •  p8 • • × • ××  s8    s8
0  g8  ×  P[7,8] × × × c7× ×c8 • 0
0 × G[7,8] • × × ×  ××× 0
a9 • • × • × × • • ×××× a9 • a9
b9 •  p9 × ×  × × • × • × ××c9  s9    s9
0  g9 × × ×  P[7,9] • × c7× × c9 • 0
0 × G[7,9]  ××× c10 0
Figure 11: A 9-bit adder based on the proposed blocks. Carry, Gi,j , pi, and gi values are shown in this figure. The C
−1 block is the reverse of the circuit
shown in the dashed box applied with the NOTs and CNOTs shown to clear ancillae. All gates use adjacent gates in the 2D layout. For qubit locations see
the table in Figure 3.
1
0
