T-count and Qubit Optimized Quantum Circuit Designs of Carry Lookahead
  Adder by Thapliyal, Himanshu et al.
arXiv:2004.01826v1 [quant-ph] 4 Apr 2020
T-count and Qubit Optimized Quantum Circuit
Designs of Carry Lookahead Adder
Himanshu Thapliyal, Edgard Muñoz-Coreas, Vladislav Khalus
Abstract—Quantum circuits of arithmetic operations such
as addition are needed to implement quantum algorithms in
hardware. Quantum circuits based on Clifford+T gates are used
as they can be made tolerant to noise. The tradeoff of gaining
fault tolerance from using Clifford+T gates and error correcting
codes is the high implementation overhead of the T gate. As a
result, the T-count performance measure has become important
in quantum circuit design. Due to noise, the risk for errors
in a quantum circuit computation increases as the number of
gate layers (or depth) in the circuit increases. As a result, low
depth circuits such as quantum carry lookahead adders (QCLAs)
have caught the attention of researchers. This work presents
two QCLA designs each optimized with emphasis on T-count
or qubit cost respectively. In-place and out-of-place versions of
each design are shown. The proposed QCLAs are compared
against the existing works in terms of T-count. The proposed
QCLAs for out-of-place addition achieve average T gate savings
of 54.34% and 37.21%, respectively. The proposed QCLAs for
in-place addition achieve average T gate savings of 72.11% and
35.87% respectively.
I. INTRODUCTION
Quantum circuits of arithmetic operations are needed to de-
sign quantum hardware for implementing quantum algorithms
such as integer factoring, searching and quantum mechanical
simulation. Quantum circuits for addition are fundamental
building blocks crucial to implementing these quantum al-
gorithms [1] [2] [3] [4] [5]. Thus, researchers have invested
considerable effort in designing quantum adders such as ripple
carry adders or carry lookahead adders (QCLAs) [6] [7] [8]
[9] [10] [11].
Quantum circuits possess properties that make them distinct
from circuits for other technologies. For example, there is a
one-to-one relationship between the inputs and outputs in a
quantum circuit. As a result, the quantum circuit designer will
face additional sources of circuit overhead such as ancillae and
garbage output. Ancillae are constant inputs to the quantum
circuit. Garbage output is any circuit output that is neither a
circuit input nor a needed output. To make full use of the qubit
resources of the quantum machine, the garbage output will
need to be cleared. This process will add to the overall qubit
cost and gate cost of a quantum circuit and is discussed in
[12].
Himanshu Thapliyal, Edgard Muñoz-Coreas and Vladislav Khalus are
with the Department of Electrical and Computer Engineering, University of
Kentucky, Lexington, KY, USA.
Email: hthapliyal@uky.edu
Physical quantum computers are prone to noise errors [13]
[14] [15]. Quantum circuits based on Clifford+T gates have
caught the attention of researchers because they can be made
fault tolerant with quantum error correcting codes permitting
reliable and scalable quantum computation [15] [16]. This set
is universal and has been used to realize basic reversible logic
gates and larger functional blocks [17] [18] [16] [19] [20].
However, fault tolerance comes with increased implementation
overhead especially the overhead associated with implement-
ing the T gate [16] [21] [22]. The high cost to implement
the T gate has caused the measures of T-count and T-depth to
become important measures to evaluate quantum circuit cost
[17] [14] [23] [9].
QCLAs have caught the interest of researchers because
they perform the addition operation in order O(log(n)) circuit
depth (while the ripple carry adder has a depth of O(n)). Low-
depth circuits such as QCLAs have use in quantum hardware
applications where longer computation time increases the risk
of errors [24] [25] [26]. Several QCLA designs have been
proposed in the literature such as [27] or [8]. Designs that
return the sum on ancillae (or out-of-place QCLA) and designs
that replace one of the primary inputs with the sum (or
in-place QCLA) have been proposed. Table I and Table II
summarize the existing work. Table I presents existing out-of-place
QCLAs and Table II presents in-place QCLAs. Further
discussion on the existing work is in Section VII.
TABLE I: Details of Existing Out-of-Place QCLAs
Design | T-count | Qubits | Gates used | Garbage?
Draper et al. ([28]) | O(35 · n) | O(4 · n) | CNOT, Toffoli | no
Babu et al.* ([29]) | O(54 · n) | O(12 · n) | PFA^1, CNOT, MIG^2, NFT^3 | yes
Trisetyarso et al. ([30]) | O(35 · n) | O(4 · n) | NOT, CNOT, Toffoli, Hadamard, DCZ^4 | no
Lisa et al.* ([31]) | O(26 · n) | O(6 · n) | CNOT, RPA^5, Fredkin^7 | yes
Thapliyal et al. ([32]) | O(35 · n) | O(4 · n) | CNOT, Toffoli, Peres^6 | no
1 The PFA gate decomposes into two V gates, a V+ gate and six CNOT
gates. Each V gate and V+ gate has a T-count of 3. [29] [33] [18]
2 The MIG gate decomposes into two V gates, a V+ gate and four CNOT
gates. [33]
3 The NFT gate decomposes into two V gates, a V+ gate and seven CNOT
gates. [34]
4 The DCZ or double controlled Z gate has a T-count of seven. [35]
5 The RPA gate decomposes into a V gate, a V+ gate and three CNOT
gates. [31]
6 The Peres gate has a T-count of seven. [18]
7 The Fredkin gate has a T-count of seven. [18]
* T-count and qubit cost are for the circuit after being modified to remove
garbage output. We use the methodology outlined in [12] to remove the
garbage outputs.
TABLE II: Details of Existing In-Place QCLAs
Design | T-count | Qubits | Gates used | Garbage?
Draper et al. ([28]) | O(70 · n) | O(4 · n) | NOT, CNOT, Toffoli | no
Trisetyarso et al. ([30]) | O(70 · n) | O(4 · n) | NOT, CNOT, Toffoli, Hadamard, DCZ^1 | no
Thapliyal et al. ([32]) | ≈ O(51 · n) | O(4 · n) | NOT, CNOT, Toffoli, Peres^2, TR^3 | no
Takahashi et al. ([8]) | ≈ O(196 · n) | ≈ O(5 · n) | NOT, CNOT, Toffoli | no
Takahashi et al. ([36]) | ≈ O(49 · n) | ≈ O(5 · n) | NOT, CNOT, Toffoli | no
Cheng et al. ([27]) | O(n^3) | O(3 · n) | CNOT, Toffoli, MCT^4 | no
Mogenson 1 ([37]) | ≈ O(42 · n) | O(3 · n) | CNOT, Toffoli, Fredkin^5 | no
Mogenson 2 ([37]) | ≈ O(42 · n) | O(3 · n) | CNOT, Toffoli, Fredkin^5 | no
1 The DCZ or double controlled Z gate has a T-count of seven. [35]
2 The Peres gate has a T-count of seven. [18]
3 The TR gate decomposes into two V gates, a V+ gate and a CNOT gate.
Each V gate and V+ gate has a T-count of 3. [38] [18]
4 The MCT or Multiple-Control Toffoli gate decomposes into 2 · n − 3
Toffoli gates. The T-count is 14 · n − 21.
5 The Fredkin gate has a T-count of seven. [18]
To overcome shortcomings in existing works, this work
proposes a family of designs for QCLAs. The first design (FT-
QCLA1) is optimized with an emphasis on T-count. The second
design (FT-QCLA2) is optimized with an emphasis on number
of qubits. All proposed designs enjoy reduced T gate cost and
qubit cost compared to the existing works. In-place QCLA
implementations (In-FT-QCLA1 and In-FT-QCLA2) and out-
of-place QCLA implementations (Out-FT-QCLA1 and Out-FT-
QCLA2) for each design are shown. All proposed QCLAs
can be made fault tolerant with error correcting codes. The
proposed QCLA designs are based on the NOT gate, CNOT
gate, Toffoli gate, logical AND gate and uncomputation gate.
The logical AND gate and uncomputation gate are presented
in [19] and [23]. The proposed QCLAs are compared against
existing works and shown to be superior in terms of qubit cost
and T-count.
This work is organized as follows: Section II provides
background on the Clifford+T quantum gate set, introduces
the logical AND gate and the uncomputation gate. Section
III presents the proposed out-of-place QCLAs. Each design
(Out-FT-QCLA1 and Out-FT-QCLA2) is addressed in its
own subsection within Section III. Section IV illustrates the
comparison of the proposed out-of-place QCLAs against ex-
isting out-of-place QCLAs. Section V presents the proposed
in-place QCLAs. Each design (In-FT-QCLA1 and In-FT-QCLA2)
is addressed in its own subsection within Section V.
Section VI illustrates the comparison of the proposed in-place
QCLAs against existing in-place QCLAs. Section VII provides
a review of the existing work in quantum carry lookahead
addition.
II. BACKGROUND
A. Quantum Gates
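CLIFFORD+T GATE SET
Hadamard gate: $H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$
T gate: $T = \begin{bmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{bmatrix}$
Hermitian of T gate: $T^{\dagger} = \begin{bmatrix} 1 & 0 \\ 0 & e^{-i\pi/4} \end{bmatrix}$
Phase gate: $S = \begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}$
Hermitian of phase gate: $S^{\dagger} = \begin{bmatrix} 1 & 0 \\ 0 & -i \end{bmatrix}$
NOT gate: $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
Feynman (CNOT) gate: $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$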
Fig. 1: The quantum gate set used in this work.
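The single-qubit matrices of the gate set can be checked numerically. The following is a minimal sketch in plain Python, not part of the original work; the helper names `matmul` and `close` are illustrative assumptions. It confirms a few standard identities: two T gates compose to the phase gate S, and each gate composed with its Hermitian conjugate gives the identity.

```python
import cmath
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """Entry-wise comparison with a small tolerance for rounding error."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

R = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
H = [[R, R], [R, -R]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
T_DAG = [[1, 0], [0, cmath.exp(-1j * math.pi / 4)]]
S = [[1, 0], [0, 1j]]
S_DAG = [[1, 0], [0, -1j]]
NOT = [[0, 1], [1, 0]]

# Each entry maps a description to the result of the numerical check.
checks = {
    "T * T == S": close(matmul(T, T), S),
    "T * T_dag == I": close(matmul(T, T_DAG), I2),
    "S * S_dag == I": close(matmul(S, S_DAG), I2),
    "H * H == I": close(matmul(H, H), I2),
    "NOT * NOT == I": close(matmul(NOT, NOT), I2),
}

for name, ok in checks.items():
    print(name, ok)
```

The identity T · T = S shows why T is the costly non-Clifford element of the set: S, H, NOT and CNOT are Clifford gates, while T is the gate whose count the later sections minimize.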
The proposed QCLA circuits in this work are based on the
Clifford+T quantum gate set shown in Figure 1. The Clif-
ford+T gates have caught the interest of researchers because
the gate set can be made tolerant to noise errors [13] [14] [15].
The Clifford+T gate set is universal in nature permitting the
fault tolerant quantum realization of any function of interest
[39] [17]. However, the fault-tolerance comes with the trade-
off of the high implementation costs of the T gate relative to
the Clifford gates. Details on the fault tolerant implementation
of the T gate (and its associated costs) are in [21] [22]. The T
gate is required because the Clifford gates by themselves are
not universal [17] [14] [39] [16]. The high implementation cost
of the T gate has made T-count and T-depth metrics of interest
to evaluate quantum circuit performance. Existing quantum
computers (such as those shown in [25]) have a small number
of qubits available. As a result, the number of qubits in a
quantum circuit is an important cost measure. We now define
the T-count, T-depth and qubit cost resource measures.
• T-count: T-count is the total number of T gates used in
the quantum circuit.
• T-depth: T-depth is the number of T gate layers in the
circuit, where a layer consists of quantum operations that
can be performed simultaneously.
• Qubit cost: Qubit cost is the total number of qubits
required to design the quantum circuit.
The Clifford+T gates can be combined to build logic gates
which can in turn be used to implement quantum circuits such
as the proposed QCLAs. The proposed QCLA circuits are
based on the NOT gate, CNOT gate, Toffoli gate, temporary
logical-AND gate and uncomputation gate. The NOT gate
and CNOT gate are members of the Clifford+T gate set. The
Toffoli gate, temporary logical-AND gate and uncomputation
gates must be constructed from Clifford+T gates. We use the
Clifford+T implementation of the Toffoli gate designed in
[18] for this work (see Figure 3). Figure 2 shows the temporary
logical-AND gate and the uncomputation gate used in this
work. The Clifford+T implementation and the graphical image
are shown for each gate. The design of the logical-AND gate
used in this work is presented in [23]. The T-count for these
logical gates is as follows: the Toffoli gate implementation
designed in [18] has a T-count of 7, the temporary logical-AND
gate has a T-count of 4 and the uncomputation gate has
a T-count of 0.
(a) The temporary logical-AND gate and its Clifford+T gate
implementation. This Clifford+T gate implementation of the
temporary logical-AND gate has a T-count of 4. $|A\rangle$ is an
ancilla in the state $\frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$. Source: [23]
(b) The uncomputation gate and its Clifford+T gate
implementation. This Clifford+T gate implementation of the
uncomputation gate has a T-count of 0. Source: [23]
Fig. 2: Clifford+T gate implementation of the uncomputation
gate and logical-AND gate used in this work.
Fig. 3: The Clifford+T gate implementation of the Toffoli gate
design presented in [18]. This fault tolerant Clifford+T gate
implementation of the Toffoli gate has a T-count of 7.
B. Carry Look Ahead Addition
Carry lookahead addition (CLA) has caught the attention
of researchers because it can perform
addition in O(log(n)) time while alternative adders such
as the ripple carry adder require O(n) time. Thus a CLA
circuit can complete the addition operation with a critical path
depth of order O(log(n)) whereas the critical path for ripple
carry addition has a depth of order O(n). The decrease in
computation depth comes at the cost of added circuitry.
Given two n bit inputs a and b, the CLA circuit generates
the sum of a and b by executing the CLA algorithm illustrated
in Figure 4. The CLA algorithm is an established technique to
quickly perform addition [40] [41]. The CLA algorithm can
be divided into two basic steps: (i) compute the generate ($g_i$)
and propagate ($p_i$) bits for each bit of the inputs a and b and
(ii) compute the sum s by computing $c_{i+1} = (p_i \wedge c_i) \vee g_i$ for
$1 \le i \le n$.
Algorithm 1: Carry lookahead Addition
Function CLA(a, b)
Requirements:
  // Takes two n-bit values a and b as input.
  // Returns the sum as an (n+1)-bit number s.

  // Create carry generate and carry propagate bits.
  For i = 0 to n−1
    p_i = a_i ⊕ b_i
    g_i = a_i ∧ b_i
  End

  // Synthesize sum.
  c_0 = g_0  // no carry-in
  s_0 = p_0
  For i = 1 to n−1
    c_i = (p_{i−1} ∧ c_{i−1}) ∨ g_{i−1}
    s_i = c_i ⊕ a_i ⊕ b_i
  End
  s_n = (p_{n−1} ∧ c_{n−1}) ∨ g_{n−1}

  Return s
Fig. 4: The carry lookahead addition algorithm implemented
by the circuits in this work.
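The recurrence in Figure 4 can be exercised classically. The following is a minimal sketch, not part of the original work: the function name `cla_add` and the integer bit-packing are illustrative assumptions. It evaluates the carries sequentially as the figure writes them, so it shows only the logic of the generate and propagate bits and not the O(log(n)) depth of the circuit.

```python
def cla_add(a, b, n):
    """Add two n-bit integers using the generate/propagate recurrence of Fig. 4."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]

    # Carry propagate and carry generate bits.
    p = [a_bits[i] ^ b_bits[i] for i in range(n)]
    g = [a_bits[i] & b_bits[i] for i in range(n)]

    c = [0] * n
    s = [0] * (n + 1)
    c[0] = g[0]          # no carry-in
    s[0] = p[0]
    for i in range(1, n):
        c[i] = (p[i - 1] & c[i - 1]) | g[i - 1]
        s[i] = c[i] ^ a_bits[i] ^ b_bits[i]
    s[n] = (p[n - 1] & c[n - 1]) | g[n - 1]

    # Pack the (n+1) sum bits back into an integer, least significant bit first.
    return sum(bit << i for i, bit in enumerate(s))

print(cla_add(5, 3, 4))
```

Since $p_0 \wedge g_0 = 0$, initializing $c_0 = g_0$ yields $c_1 = g_0$, which is the carry into bit 1 when there is no carry-in.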
III. PROPOSED DESIGNS OF OUT-OF-PLACE QCLA
CIRCUITS
We now show our proposed out-of-place QCLA circuits.
The proposed QCLA circuits have lower T-count and qubit
cost than existing works. The QCLA designs save T gates by
using the temporary logical-AND gate, the existing uncom-
putation gate or proposed uncomputation gate where possible.
The out of place FT-QCLA1 (or Out-FT-QCLA1) is optimized
for T-count. We also propose an out of place FT-QCLA2 (or
Out-FT-QCLA2) which is optimized for qubit cost. The design
methodologies of the proposed QCLAs are generic and each
can be used to implement a QCLA circuit of any size.
The proposed out-of-place QCLA circuits operate as follows:
the circuits take two values a and b (each n bits wide) stored in
quantum registers A and B, as well as n + 1 ancillae stored
in register X. $X_0$ is set to 0 and the remaining locations
are set to the state $|A\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$.
Lastly, Out-FT-QCLA2 requires a register Z with
$n - w(n) - \lfloor \log(n) \rfloor$ elements, all set to $|A\rangle$, where
$w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$ is the number
of ones in the binary expansion of n. Out-FT-QCLA1 requires a
register Z with $3 \cdot n - 2 \cdot w(n) - 2\lfloor \log(n) \rfloor$ ancillae set to $|A\rangle$. At
the end of computation, A and B are restored to their initial
values and X will contain the sum of the addition of a and
b. Lastly, Z is transformed into a register of classical states.
These qubits can be restored to computational basis values for
reuse as ancillae.
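The quantity w(n) and the resulting sizes of register Z are simple to tabulate. The following is a minimal sketch, not part of the original work; the function names are illustrative assumptions, and log is taken to be base 2. It evaluates w(n) directly from the sum in the text and checks that it matches the number of ones in the binary expansion of n.

```python
def w(n):
    """w(n) = n - sum_{y>=1} floor(n / 2^y), evaluated term by term."""
    total, y = 0, 1
    while (n >> y) > 0:          # terms vanish once 2^y exceeds n
        total += n >> y          # n >> y equals floor(n / 2^y)
        y += 1
    return n - total

def floor_log2(n):
    """floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1

def z_register_size(n, design):
    """Number of ancillae in register Z, using the expressions from the text."""
    if design == "Out-FT-QCLA1":
        return 3 * n - 2 * w(n) - 2 * floor_log2(n)
    if design == "Out-FT-QCLA2":
        return n - w(n) - floor_log2(n)
    raise ValueError("unknown design: " + design)

for n in (4, 8, 16):
    print(n, w(n), z_register_size(n, "Out-FT-QCLA1"), z_register_size(n, "Out-FT-QCLA2"))
```

For n = 8 this gives w(8) = 1, so register Z holds 16 ancillae for Out-FT-QCLA1 and 4 for Out-FT-QCLA2.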
This Section is organized as follows: The proposed design
of Out-FT-QCLA1 is shown in Section III-A. The proposed
design of Out-FT-QCLA2 is shown in Section III-B.
A. Proposed Out-of-Place FT-QCLA1 Design (Out-FT-
QCLA1)
The out-of-place FT-QCLA1 (Out-FT-QCLA1) is optimized
for T-count. The proposed QCLA is based on the quantum
NOT gate, CNOT gate, logical AND gate and uncomputation
gate. The steps of the proposed design methodology to realize
Out-FT-QCLA1 are shown along with an illustrative example
of the QCLA circuit in Figure 5.
• Step 1: For i = 0 to n−1, apply the logical AND gate
at locations A[i], B[i] and an ancilla such that locations
A[i] and B[i] are unchanged. The ancilla will have the
result of computation and will be renamed to
the value g[i, i+1].
• Step 2: For i = 1 to n−1: at locations A[i] and
B[i] apply a CNOT gate such that A[i] is unchanged
and location B[i] will have the result of computation.
Location B[i] will be renamed p[i, i+1].
• Step 3 (P-rounds): We use a nested loop in this step.
For $t = 1$ to $\lfloor \log(n) \rfloor - 1$ and for $m = 1$ to $\lfloor n/2^{t} \rfloor - 1$:
At locations p[j, l], p[l, k] and an ancilla apply a
logical AND gate such that locations p[j, l] and p[l, k]
are unchanged and the ancilla will have the result of
computation. The ancilla will be renamed p[j, k]. The
equations for the indexes are $j = 2^{t} \cdot m$, $k = 2^{t} \cdot m + 2^{t}$ and
$l = 2^{t} \cdot m + 2^{t-1}$.
• Step 4 (G-rounds): We use a nested loop in this step.
For $t = 1$ to $\lfloor \log(n) \rfloor$ and for $m = 0$ to $\lfloor n/2^{t} \rfloor - 1$:
At locations g[j, l], p[l, k] and g[l, k] apply a logical
AND gate and uncomputation gate pair such that locations
g[j, l] and p[l, k] are unchanged and location g[l, k]
will have the result of computation. Location g[l, k]
is renamed to g[j, k]. The equations for the indexes are
$j = 2^{t} \cdot m$, $k = 2^{t} \cdot m + 2^{t}$ and $l = 2^{t} \cdot m + 2^{t-1}$.
• Step 5 (C-rounds): We use a nested loop in this step.
For $t = \lfloor \log(2 \cdot n/3) \rfloor$ down to 1 and for $m = 1$ to $\lfloor (n - 2^{t-1})/2^{t} \rfloor$:
At locations g[0, l], p[l, k] and g[l, k] apply a logical
AND gate and uncomputation gate pair such that locations
g[0, l] and p[l, k] are unchanged and location g[l, k]
will have the result of computation. The equations for the
indexes are $l = 2^{t} \cdot m$ and $k = 2^{t} \cdot m + 2^{t-1}$.
• Step 6 (P-erase-rounds): We use a nested loop in this
step. For $t = \lfloor \log(n) \rfloor - 1$ down to 1 and for $m = 1$ to $\lfloor n/2^{t} \rfloor - 1$:
At locations p[j, l], p[l, k] and p[j, k] apply an
uncomputation gate such that locations p[j, l] and p[l, k] are
unchanged and location p[j, k] will be restored to its
original value. The equations for the indexes are $j = 2^{t} \cdot m$,
$k = 2^{t} \cdot m + 2^{t}$ and $l = 2^{t} \cdot m + 2^{t-1}$.
• Step 7: This step has two sub-steps:
– Sub-step 1: For i = 1 to n−1, at locations p[i, i+1]
and g[0, i] apply a CNOT gate such that location
p[i, i+1] is unchanged and g[0, i] has the result
of computation. After this step, location g[0, i] will
have the sum bit $s_i$. The sum bit $s_n$ is at location
g[0, n].
– Sub-step 2: At locations p[0, 1] and an ancilla
apply a CNOT gate such that both location p[0, 1] and
the ancilla will have the value B[0].
• Step 8: This step has two sub-steps:
– Sub-step 1: For i = 1 to n−1, at locations p[i, i+1]
and A[i] apply a CNOT gate such that A[i] is
unchanged and p[i, i+1] is restored to its original
value ($b_i$).
– Sub-step 2: At locations Z and A[0] apply a CNOT
gate such that A[0] is unchanged and Z will have the
sum bit $s_0$.
Fig. 5: Proposed out-of-place FT-QCLA1 (Out-FT-QCLA1)
circuit for the case of adding two 8 bit values a and b.
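The loop bounds and index equations of the P-rounds, G-rounds and C-rounds can be enumerated directly. The following is a minimal sketch, not part of the original work; the function names are illustrative assumptions, and log is taken to be base 2. It lists the index tuples each round touches and uses integer arithmetic for $\lfloor \log(2n/3) \rfloor$ to avoid floating-point rounding.

```python
def floor_log2(n):
    """floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1

def p_rounds(n):
    """(j, l, k) triples of Step 3: p[j,k] is formed from p[j,l] and p[l,k]."""
    return [(2**t * m, 2**t * m + 2**(t - 1), 2**t * m + 2**t)
            for t in range(1, floor_log2(n))        # t = 1 .. floor(log n) - 1
            for m in range(1, n // 2**t)]           # m = 1 .. floor(n / 2^t) - 1

def g_rounds(n):
    """(j, l, k) triples of Step 4: g[l,k] becomes g[j,k]."""
    return [(2**t * m, 2**t * m + 2**(t - 1), 2**t * m + 2**t)
            for t in range(1, floor_log2(n) + 1)    # t = 1 .. floor(log n)
            for m in range(0, n // 2**t)]           # m = 0 .. floor(n / 2^t) - 1

def c_rounds(n):
    """(l, k) pairs of Step 5, visited with t descending."""
    t_max = 0
    while 3 * 2**(t_max + 1) <= 2 * n:              # largest t with 2^t <= 2n/3
        t_max += 1
    return [(2**t * m, 2**t * m + 2**(t - 1))
            for t in range(t_max, 0, -1)
            for m in range(1, (n - 2**(t - 1)) // 2**t + 1)]

n = 8
print("P-rounds:", p_rounds(n))
print("G-rounds:", g_rounds(n))
print("C-rounds:", c_rounds(n))
```

For n = 8 the three lists have 4, 7 and 4 entries, which agrees with the gate counts $n - w(n) - \lfloor \log(n) \rfloor$, $n - w(n)$ and $n - \lfloor \log(n) \rfloor - 1$ used in the T-count analysis of Section IV.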
B. Proposed Out-of-Place FT-QCLA2 Design (Out-FT-
QCLA2)
Out-FT-QCLA2 is optimized for qubit cost. A trade-off for
the reduced qubit count is an increase in the T gate cost
of Out-FT-QCLA2. We reduce the qubit cost by replacing
logical AND gate and uncomputation gate pairs with alter-
native Toffoli gate implementations such as the design in [18]
where appropriate. To retain a low T-count, Out-FT-QCLA2
does still incorporate logical AND gate and uncomputation
gate pairs. We save qubits because the logical AND gate
and uncomputation gate pairs have a qubit cost of 4 while
implementations of the Toffoli gate (like the design in [18])
have a qubit cost of 3. However, alternative Toffoli gate
implementations have higher T-counts (such as the design in
[18] with a T-count of 7). The design methodology for Out-FT-
QCLA2 is identical to the methodology for Out-FT-QCLA1
except Step 4 and Step 5. In Step 4 and Step 5, Toffoli gates
based on the design in [18] are used. An illustrative example
of Out-FT-QCLA2 is in Figure 6.
Fig. 6: Proposed out-of-place FT-QCLA2 (Out-FT-QCLA2)
for the case of adding two 8 bit values a and b.
IV. PERFORMANCE OF PROPOSED OUT-OF-PLACE QCLA
CIRCUITS
A. T-count analysis of Out-FT-QCLA1
The T-count of Out-FT-QCLA1 is shown for each step of
the proposed design methodology. Total T-count is determined
by summing the T-count for each step of the proposed
design methodology. This design uses logical AND gate and
uncomputation gate pairs to implement the Toffoli gate as
shown in [19] and [23]. The pair has a T-count of 4. The
total T-count is $16 \cdot n - 8 \cdot w(n) - 8 \cdot \lfloor \log(n) \rfloor - 4$ where
$w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$.
• Step 1 uses n logical AND gates. The T-count for this
step is 4 · n.
• Step 2 does not need T gates.
• Step 3 uses n−w(n)−⌊log(n)⌋ logical AND gates. The
T-count for this step is 4 · (n− w(n) − ⌊log(n)⌋).
• Step 4 uses n−w(n) logical AND gate and uncomputa-
tion gate pairs. The T-count for this step is 4 ·(n−w(n)).
• Step 5 uses n − ⌊log(n)⌋ − 1 logical AND gate and
uncomputation gate pairs. The T-count for this step is
4 · (n− ⌊log(n)⌋ − 1).
• Steps 6 through 8 do not need T gates.
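The per-step counts above can be summed and compared with the stated total. The following is a minimal sketch, not part of the original work; the function names are illustrative assumptions, log is taken to be base 2, and w(n) is computed as the number of ones in the binary expansion of n.

```python
def w(n):
    """Number of ones in the binary expansion of n."""
    return bin(n).count("1")

def floor_log2(n):
    """floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1

def out_ft_qcla1_steps(n):
    """T-count of each step of Out-FT-QCLA1, as listed in the text."""
    L = floor_log2(n)
    return {
        1: 4 * n,                    # n logical AND gates
        2: 0,
        3: 4 * (n - w(n) - L),       # P-rounds
        4: 4 * (n - w(n)),           # G-rounds
        5: 4 * (n - L - 1),          # C-rounds
        6: 0, 7: 0, 8: 0,
    }

def out_ft_qcla1_total(n):
    """Closed-form total T-count stated in the text."""
    return 16 * n - 8 * w(n) - 8 * floor_log2(n) - 4

for n in (8, 16, 32, 64):
    steps = out_ft_qcla1_steps(n)
    print(n, sum(steps.values()), out_ft_qcla1_total(n))
```

For n = 8 the steps contribute 32 + 16 + 28 + 16 = 92 T gates, which matches the closed form.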
B. T-count analysis of Out-FT-QCLA2
The T-count of Out-FT-QCLA2 is shown for each step of
the proposed design methodology. Total T-count is determined
by summing the T-count for each step of the proposed design
methodology. This design uses the Toffoli gate implementation
in [18] which has a T-count of 7. The total T-count is
$22 \cdot n - 11 \cdot w(n) - 11 \cdot \lfloor \log(n) \rfloor - 7$ where
$w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$.
• Step 1 uses n logical AND gates. The T-count for this
step is 4 · n.
• Step 2 does not need T gates.
• Step 3 uses n−w(n)−⌊log(n)⌋ logical AND gates. The
T-count for this step is 4 · (n− w(n) − ⌊log(n)⌋).
• Step 4 uses n−w(n) Toffoli gates. The T-count for this
step is 7 · (n− w(n)).
• Step 5 uses n− ⌊log(n)⌋ − 1 Toffoli gates. The T-count
for this step is 7 · (n− ⌊log(n)⌋ − 1).
• Steps 6 through 8 do not need T gates.
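The cost of trading qubits for T gates can be read off the two totals. The following is a minimal sketch, not part of the original work; the function names are illustrative assumptions and log is taken to be base 2. It sums the per-step counts of Out-FT-QCLA2 and shows that the extra T gates relative to Out-FT-QCLA1 equal 3 per Toffoli gate used in Step 4 and Step 5, since each Toffoli gate costs 7 T gates instead of 4.

```python
def w(n):
    """Number of ones in the binary expansion of n."""
    return bin(n).count("1")

def floor_log2(n):
    """floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1

def toffoli_count(n):
    """Toffoli gates in Step 4 and Step 5 of Out-FT-QCLA2."""
    return (n - w(n)) + (n - floor_log2(n) - 1)

def out_ft_qcla2_from_steps(n):
    """Sum of the per-step T-counts of Out-FT-QCLA2 listed in the text."""
    logical_and_gates = n + (n - w(n) - floor_log2(n))   # Step 1 and Step 3
    return 4 * logical_and_gates + 7 * toffoli_count(n)

def out_ft_qcla2_total(n):
    """Closed-form total T-count of Out-FT-QCLA2 stated in the text."""
    return 22 * n - 11 * w(n) - 11 * floor_log2(n) - 7

def out_ft_qcla1_total(n):
    """Closed-form total T-count of Out-FT-QCLA1 stated in Section IV-A."""
    return 16 * n - 8 * w(n) - 8 * floor_log2(n) - 4

for n in (8, 16, 32):
    extra = out_ft_qcla2_total(n) - out_ft_qcla1_total(n)
    print(n, out_ft_qcla2_from_steps(n), out_ft_qcla2_total(n), extra, 3 * toffoli_count(n))
```

For n = 8, Out-FT-QCLA2 uses 125 T gates, 33 more than Out-FT-QCLA1, corresponding to its 11 Toffoli gates.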
C. Cost Comparison of Proposed Out-of-place QCLAs
1) Cost Comparison in Terms of T-count: Table III in-
dicates that all proposed out-of-place QCLAs have T-count
costs of order O(n). The existing works also have T-counts
of order O(n). All proposed QCLAs have a reduced T-count
compared to the existing work. Of the proposed QCLAs, Out-FT-QCLA1
requires the fewest T gates.
Out-FT-QCLA1 requires roughly 70.37% fewer T gates
than the design by Babu et al., 38.46% fewer T gates than the
design by Lisa et al. and 54.29% fewer T gates than the designs
by Draper et al., Thapliyal et al. and Trisetyarso et al.
The proposed Out-FT-QCLA2 requires 59.26% fewer T
gates than the design by Babu et al., 15.38% fewer T gates
than the design by Lisa et al. and 37.14% fewer T gates than the
designs by Draper et al., Thapliyal et al. and Trisetyarso et al.
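The percentages above can be reproduced from the leading coefficients of the T-count expressions. The following is a minimal sketch, not part of the original work. It assumes that each saving is computed as 1 − (proposed coefficient)/(existing coefficient), ignoring lower-order terms; this assumption reproduces the quoted figures, but the text does not state the exact method.

```python
# Leading T-count coefficients (T gates per input bit) from Tables I and III.
EXISTING = {
    "Draper et al.": 35,
    "Trisetyarso et al.": 35,
    "Thapliyal et al.": 35,
    "Babu et al.": 54,
    "Lisa et al.": 26,
}
PROPOSED = {"Out-FT-QCLA1": 16, "Out-FT-QCLA2": 22}

def savings_percent(proposed, existing):
    """Relative T gate saving in percent, based on leading coefficients only."""
    return 100.0 * (1.0 - proposed / existing)

for design, coeff in PROPOSED.items():
    per_work = {name: savings_percent(coeff, c) for name, c in EXISTING.items()}
    average = sum(per_work.values()) / len(per_work)
    print(design, {k: round(v, 2) for k, v in per_work.items()}, round(average, 2))
```

Under this assumption the averages over the five existing designs are 54.34% for Out-FT-QCLA1 and 37.21% for Out-FT-QCLA2, matching the values reported in the abstract.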
TABLE III: Cost Comparison of Out-of-place QCLAs
Design | T-count Equation | Qubit Equation
Draper et al. ([28]) | 35n − 21w(n) − 21⌊log(n)⌋ − 7 | 4n − w(n) − ⌊log(n)⌋ + 1
Trisetyarso et al. ([30]) | 35n − 21w(n) − 21⌊log(n)⌋ − 7 | 4n − w(n) − ⌊log(n)⌋ + 1
Thapliyal et al. ([32]) | 35n − 14 | 4n + 1
Babu et al. ([29])^1 | 54 · n | 12 · n + 1
Lisa et al. ([31])^1 | 26 · n | 6 · n + 1
Out-FT-QCLA1^2 | 16n − 8w(n) − 8⌊log(n)⌋ − 4 | 6n − 2w(n) − 2⌊log(n)⌋
Out-FT-QCLA2^3 | 22n − 11w(n) − 11⌊log(n)⌋ − 7 | 4n − w(n) − ⌊log(n)⌋ + 1
$w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$
1 Circuits modified to remove garbage output. We use the methodology
in [12] to remove the garbage output.
2 Out-FT-QCLA1 is optimized emphasizing T-count.
3 Out-FT-QCLA2 is optimized emphasizing qubit cost.
2) Cost Comparison in Terms of Qubits: Table III indicates
that all proposed out-of-place QCLAs have qubit costs of
order O(n). The existing works also have qubit costs of order
O(n). Out-FT-QCLA1 requires 2 · n − w(n) − ⌊log(n)⌋ − 1
additional ancillae compared to Out-FT-QCLA2, which is the
difference between the two qubit equations in Table III. The added
qubit cost of Out-FT-QCLA1 illustrates the trade-off between
qubits and T gates that occurs when logical AND gate and
uncomputation gate pairs are replaced by Toffoli gates in the
QCLA implementation.
Out-FT-QCLA2 requires w(n)+⌊log(n)⌋ fewer qubits than
the design by Thapliyal et al. and has the same qubit cost as
the designs by Draper et al. and Trisetyarso et al. Further, the
proposed Out-FT-QCLA2 requires 8·n+w(n)+⌊log(n)⌋ fewer
qubits than the design by Babu et al. and 2·n+w(n)+⌊log(n)⌋
fewer qubits than the design by Lisa et al. In contrast, the out-
of-place Out-FT-QCLA1 requires 6·n+2·w(n)+2·⌊log(n)⌋+1
fewer qubits than the design by Babu et al. and 2 ·w(n) + 2 ·
⌊log(n)⌋+ 1 fewer qubits than the work by Lisa et al.
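The qubit differences quoted against the existing designs follow from the qubit equations in Table III. The following is a minimal sketch, not part of the original work; the dictionary `QUBITS` and the helper names are illustrative assumptions, and log is taken to be base 2. It evaluates each equation and prints the savings of the proposed designs for n = 8.

```python
def w(n):
    """Number of ones in the binary expansion of n."""
    return bin(n).count("1")

def lg(n):
    """floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1

# Qubit equations as listed in Table III.
QUBITS = {
    "Draper et al.": lambda n: 4 * n - w(n) - lg(n) + 1,
    "Trisetyarso et al.": lambda n: 4 * n - w(n) - lg(n) + 1,
    "Thapliyal et al.": lambda n: 4 * n + 1,
    "Babu et al.": lambda n: 12 * n + 1,
    "Lisa et al.": lambda n: 6 * n + 1,
    "Out-FT-QCLA1": lambda n: 6 * n - 2 * w(n) - 2 * lg(n),
    "Out-FT-QCLA2": lambda n: 4 * n - w(n) - lg(n) + 1,
}

def fewer_qubits(proposed, existing, n):
    """How many fewer qubits the proposed design needs than the existing one."""
    return QUBITS[existing](n) - QUBITS[proposed](n)

n = 8
for proposed in ("Out-FT-QCLA1", "Out-FT-QCLA2"):
    for existing in ("Thapliyal et al.", "Babu et al.", "Lisa et al."):
        print(proposed, "vs", existing, fewer_qubits(proposed, existing, n))
```

A negative value would mean the proposed design needs more qubits than the existing one; for example, Out-FT-QCLA1 is larger than the design by Thapliyal et al. at n = 8.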
V. PROPOSED IN-PLACE QCLA CIRCUITS
We now show our proposed in-place quantum carry looka-
head (QCLA) circuits. The proposed QCLA circuits have
lower T-count and qubit costs than existing works. The pro-
posed in-place FT-QCLA1 (or In-FT-QCLA1) and the pro-
posed in-place FT-QCLA2 (or In-FT-QCLA2) save T gates by
using the temporary logical-AND gate and uncomputation gate
where possible. In-FT-QCLA2 is optimized for qubit cost and
the In-FT-QCLA1 is optimized for T gates. The methodologies
of the proposed in-place QCLAs are generic and each can be
used to implement a QCLA circuit of any size.
The proposed QCLA circuits operate as follows: the circuits take two
values a and b (each n bits wide) stored in quantum registers
A and B, and an n bit register Z of ancillae set to the state
$|A\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$.
Lastly, In-FT-QCLA2 requires an
$n - w(n) - \lfloor \log(n) \rfloor$ bit register of ancillae (X) set to
$|A\rangle$, where $w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$ is
the number of ones in the binary expansion of n. In contrast, register X
of In-FT-QCLA1 has $3 \cdot n - 2 \cdot w(n) - 2\lfloor \log(n) \rfloor$ ancillae set
to $|A\rangle$. At the end of computation, A is restored to its initial
values and B will contain sum bits 0 through n − 1 of the
addition of a and b. For each QCLA, at the end of computation,
Z will contain the sum bit $s_n$ and the remaining locations in
Z are transformed into a register of classical states. These
qubits can be restored to initial values for reuse as ancillae.
In-FT-QCLA1 and In-FT-QCLA2 will transform the remaining
locations in X to a register of classical states. These qubits can
be restored to initial values for reuse as ancillae. This Section
is organized as follows: In-FT-QCLA1 is shown in Section
V-A. In-FT-QCLA2 is shown in Section V-B.
A. Methodology of the Proposed In-Place FT-QCLA1 (In-FT-
QCLA1)
The T gate optimized fault tolerant QCLA is targeted for
quantum hardware that can support the resource requirements
needed for fault tolerant quantum computation. Because of
the high implementation cost of the fault tolerant T gate, this
QCLA is optimized for T-count. The proposed QCLA is based
on the quantum NOT gate, CNOT gate, logical AND gate and
the uncomputation gate presented in [23]. The steps of the
methodology to implement the proposed In-FT-QCLA1 are
shown along with an illustrative example of the QCLA circuit
in Figure 7.
• Step 1: For i = 0 to n-1, apply the logical AND gate
at locations A[i], B[i] and an ancillae such that locations
A[i] and B[i] are unchanged. The ancillae will have the
result of computation. The ancillae will be renamed to
the value g[i, i +1].
• Step 2: For i = 1 to n-1: At locations of A[i] and B[i]
apply a CNOT gate such that location A[i] is unchanged
and location B[i] will have the result of computation.
Location B[i] is renamed to the value p[i, i+1].
• Step 3 (P-rounds): We use a nested loop in this step.
For t = 1 to ⌊log(n)⌋ − 1 and For m = 1 to ⌊ n2t ⌋ − 1:
At locations p[j,l], p[l, k] and an ancillae apply a logical
AND gate such that locations p[j,l] and p[l, k] are un-
changed. The ancillae will have the result of computation
and will be renamed to the value p[j,k]. The equations
for the indexes are j = 2t · m, k = 2t · m + 2t and
l = 2t ·m+ 2t−1.
• Step 4 (G-rounds): We use a nested loop in this step.
For t = 1 to ⌊log(n)⌋ and For m = 0 to ⌊ n2t ⌋ − 1:
At locations g[j, l], p[l, k], and g[l, k] apply a Toffoli gate
such that locations g[j, l] and p[l, k] pass through un-
changed. Location g[l, k] holds the result of computation
7|a0〉 •• • |a0〉
|b0〉 • • |s0〉
|A〉 • • • • •
|a1〉 •• • •• |a1〉
|b1〉 • • • • • • |s1〉
|A〉 •
|A〉 •
|A〉 • • • • • • • • •
|a2〉 •• • •• |a2〉
|b2〉 • • • • • •• • • • |s2〉
|A〉 • • • •
|A〉 •
|A〉 •
|A〉 • • • • •
|a3〉 •• • •• |a3〉
|b3〉 • • • • • • • •• • |s3〉
|A〉 •
|A〉 •
|A〉 •
|A〉 •
|A〉 • •• •• • • • •• •
|a4〉 •• • •• |a4〉
|b4〉 • • • • • •• • • • |s4〉
|A〉 • • • • • •
|A〉 •
|A〉 •
|A〉 • • • • •
|a5〉 •• • •• |a5〉
|b5〉 • • • • • • • •• • |s5〉
|A〉 • •
|A〉 •
|A〉 •
|A〉 •
|A〉 •
|A〉 • • • • • • •
|a6〉 •• • •• |a6〉
|b6〉 • • • • • • • • |s6〉
|A〉 • • • •
|A〉 •
|A〉 •
|A〉 • • •
|a7〉 •• |a7〉
|b7〉 • • • • • |s7〉
|A〉 •
|A〉 •
|A〉 •
|A〉 |s8〉
Fig. 7: In-place FT-QCLA1 (In-FT-QCLA1) for the case of
adding two 8 bit values a and b.
and will be renamed to the value g[j, k]. The equations
for the indexes are j = 2t · m, k = 2t · m + 2t, and
l = 2t ·m+ 2t−1.
• Step 5 (C-rounds): We use a nested loop in this step.
For t = ⌊log
(
2n
3
)
⌋ to 1 and For m = 1 to ⌊n−2
t−1
2t ⌋:
At locations g[0, l], p[l,k], and g[l,k] apply a Toffoli
gate such that locations g[0, l] and p[l,k] are unchanged.
Location g[l,k] will hold the result of computation and
will be renamed to the value g[0, k]. The equations for
the indexes are l = 2t ·m and k = 2t ·m+ 2t−1.
• Step 6 (P-erase-rounds): We use a nested loop in this
step. For t = ⌊log(n)⌋−1 to 1 and For m = 1 to ⌊ n2t ⌋−
1:
At locations p[j, l], p[l, k] and p[j, k] apply a uncom-
putation gate such that locations p[j, l] and p[l, k] are
unchanged. Location p[j, k] will be restored to its original
value. The equation for the indexes are j = 2t · m,
k = 2t ·m+ 2t and l = 2t ·m+ 2t−1.
• Step 7: For i = 1 to n-1: at locations p[i, i +1] and
g[0, i] apply a CNOT gate such that location g[0, i] is
unchanged. Location p[i, i +1] will have the result of
computation.
• Step 8: This step has the following two sub-steps:
– Sub-step 1: At location B[0] apply a NOT gate. The
location B[0] will be renamed to the value p[0, 1]
– Sub-step 2: For i = 1 to n-2: At location p[i, i +1]
apply a NOT gate.
• Step 9: For i = 1 to n-2: At locations A[i] and p[i, i +1]
apply the CNOT gate such that location A[i] is unchanged
and p[i, i +1] has the result of computation.
• Step 10 (Reverse of P-erase-rounds): We use a nested
loop in this step. For t = 1 to ⌊log(n)⌋ − 1 and For
m = 1 to ⌊ n2t ⌋ − 1:
At locations p[j,l], p[l,k] and an ancillae apply a logical
AND gate such that locations p[j,l] and p[l,k] are un-
changed. The ancillae will hold the result of computation
and the ancillae is renamed to the value p[j,k]. The
equations for the indexes are j = 2t ·m, k = 2t ·m+2t,
and l = 2t ·m+ 2t−1.
• Step 11 (Reverse of C-rounds): We use a nested loop
in this step. For t = 1 to ⌊log
(
2n
3
)
⌋ and For m =
1 to ⌊ (n−2
t−1)
2t ⌋:
At locations g[0,l], p[l,k], and g[0,k] apply a Toffoli gate
such that locations g[0,l] and p[l,k] are unchanged. Lo-
cation g[0,k] will have the result of computation and the
location will be renamed to the value g[l,k]. The equations
for the indexes are l = 2t ·m and k = 2t ·m+ 2t−1.
• Step 12 (Reverse of G-rounds): We use a nested loop in
this step. For t = ⌊log(n)⌋ to 1 and For m = 0 to ⌊ n2t ⌋−
1:
At locations g[j,l], p[l,k] and g[j,k] apply a Toffoli gate
such that locations g[j,l] and p[l,k] are unchanged. Loca-
tion g[j,k] will have the result of computation and the
location is renamed to the value g[l,k. The equations
for the indexes are j = 2t · m, k = 2t · m + 2t and
l = 2t ·m+ 2t−1.
• Step 13 (Reverse of P-rounds): We use a nested loop
in this step. For $t = \lfloor \log(n) \rfloor - 1$ to 1 and for
$m = 1$ to $\lfloor (n-1)/2^t \rfloor - 1$:
At locations p[j,l], p[j,k], and p[l,k] apply an uncomputation
gate such that locations p[j,l] and p[l,k] are
unchanged. Location p[j,k] will be restored to its original
value. The equations for the indexes are $j = 2^t \cdot m$,
$k = 2^t \cdot m + 2^t$ and $l = 2^t \cdot m + 2^{t-1}$.
• Step 14: For i=1 to n-2: At locations A[i] and p[i, i
+1], apply a CNOT gate such that location A[i] would
be unchanged and location p[i, i +1] will have the result
of computation.
• Step 15: For i=0 to n-2: At locations A[i], p[i, i +1], and
g[i, i +1], apply an uncomputation gate such that locations
A[i] and p[i, i +1] are unchanged and g[i, i +1] will be
restored to its original value.
• Step 16: Location p[n-1, n] has the sum bit $s_{n-1}$ and g[0,
n] has the sum bit $s_n$. For i=0 to n-2: At location p[i, i
+1] apply a NOT gate. Location p[i, i +1] will have the
sum bit $s_i$.
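The nested loops in Steps 12 and 13 can be easier to follow once the index triples are written out. The following is a minimal Python sketch that enumerates (j, l, k) using the loop bounds and index equations exactly as stated in those two steps. It assumes that log denotes the base-2 logarithm; the function names `floor_log2`, `g_round_triples` and `p_round_triples` are hypothetical and chosen only for this illustration.

```python
def floor_log2(n: int) -> int:
    """Floor of the base-2 logarithm of a positive integer (assumed base)."""
    return n.bit_length() - 1


def g_round_triples(n: int) -> list:
    """Index triples (j, l, k) visited by the Step 12 loop, in the stated order."""
    triples = []
    for t in range(floor_log2(n), 0, -1):        # t = floor(log n) down to 1
        for m in range(0, n // 2**t):            # m = 0 .. floor(n / 2^t) - 1
            j = 2**t * m
            k = 2**t * m + 2**t
            l = 2**t * m + 2**(t - 1)
            triples.append((j, l, k))
    return triples


def p_round_triples(n: int) -> list:
    """Index triples (j, l, k) visited by the Step 13 loop, in the stated order."""
    triples = []
    for t in range(floor_log2(n) - 1, 0, -1):    # t = floor(log n) - 1 down to 1
        for m in range(1, (n - 1) // 2**t):      # m = 1 .. floor((n-1) / 2^t) - 1
            j = 2**t * m
            k = 2**t * m + 2**t
            l = 2**t * m + 2**(t - 1)
            triples.append((j, l, k))
    return triples


if __name__ == "__main__":
    for j, l, k in g_round_triples(8):
        print(f"Step 12: g[{j},{l}], p[{l},{k}] -> g[{j},{k}]")
    for j, l, k in p_round_triples(8):
        print(f"Step 13: p[{j},{l}], p[{l},{k}] -> p[{j},{k}]")
```

Each triple names the two control locations and the target location of one gate, so the length of each list is the number of gates the corresponding loop applies for the given bounds.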
B. Proposed In-Place FT-QCLA2 Design (In-FT-QCLA2)
In-FT-QCLA1 is optimized for T-count. The trade-off for
the reduced T-count of In-FT-QCLA1 is an increased
qubit cost. We can reduce the qubit cost by
replacing logical AND gate and uncomputation gate pairs with
alternative Toffoli gate implementations (such as the design
in [18]) where appropriate. We save qubits because a logical
AND gate and uncomputation gate pair has a qubit cost of 4,
while implementations like the design in [18] have a qubit cost
of 3. However, alternative Toffoli gate implementations have
higher T-counts (the design in [18], for example, has a T-count
of 7).
The steps of the methodology to implement In-FT-QCLA2
are identical to those for In-FT-QCLA1.
To implement In-FT-QCLA2, replace each logical AND gate
and uncomputation gate pair with a Toffoli gate in Step 4, Step
5, Step 11 and Step 12. An illustrative example of the In-FT-QCLA2
circuit is in Figure 8.
[Circuit diagram: qubit lines |a0〉 to |a7〉, |b0〉 to |b7〉 and ancillae |A〉, with the operand a and the sum bits on the output side.]
Fig. 8: In-place FT-QCLA2 (In-FT-QCLA2) for the case of
adding two 8 bit values a and b.
VI. PERFORMANCE OF PROPOSED IN-PLACE QCLA
CIRCUITS
A. T-count analysis of In-FT-QCLA1
The T-count of In-FT-QCLA1 is shown for each Step of the
proposed design methodology. Total T-count is determined by
summing the T-count for each Step of the proposed design
methodology. The total T-count is
$20n - 8w(n) - 4\lfloor\log(n)\rfloor - 8w(n-1) - 4\lfloor\log(n-1)\rfloor - 8$,
where $w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^y \rfloor$.
• Step 1 uses n logical AND gates. The T-count for this
step is 4 · n.
• Step 2 does not need T gates.
• Step 3 uses n−w(n)−⌊log(n)⌋ logical AND gates. The
T-count for this step is 4 · (n− w(n) − ⌊log(n)⌋).
• Step 4 uses n−w(n) Toffoli gates. The T-count for this
step is 4 · (n− w(n)).
• Step 5 uses n− ⌊log(n)⌋ − 1 Toffoli gates. The T-count
for this step is 4 · (n− ⌊log(n)⌋ − 1).
• Steps 6 through 9 do not need T gates.
• Step 10 uses n − 1 − w(n − 1) − ⌊log(n − 1)⌋ logical
AND gates. The T-count for this step is 4 ·(n−1−w(n−
1)− ⌊log(n− 1)⌋).
• Step 11 uses n − ⌊log(n − 1)⌋ − 2 Toffoli gates. The
T-count for this step is 4 · (n− ⌊log(n− 1)⌋ − 2).
• Step 12 uses n− 1−w(n− 1) Toffoli gates. The T-count
for this step is 4 · (n− 1− w(n− 1)).
• Steps 13 through 16 do not need T gates.
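To make the closed-form total concrete, the sketch below evaluates the stated expression $20n - 8w(n) - 4\lfloor\log(n)\rfloor - 8w(n-1) - 4\lfloor\log(n-1)\rfloor - 8$ for a given n. It assumes base-2 logarithms and n ≥ 2 (so that log(n − 1) is defined). The helper names are hypothetical. The function `w` follows the definition given in the text, which coincides with the number of ones in the binary expansion of n.

```python
def floor_log2(n: int) -> int:
    """Floor of the base-2 logarithm of a positive integer (assumed base)."""
    return n.bit_length() - 1


def w(n: int) -> int:
    """w(n) = n - sum_{y >= 1} floor(n / 2^y), as defined in the text."""
    total = 0
    y = 1
    while (n >> y) > 0:          # terms are zero once 2^y exceeds n
        total += n >> y
        y += 1
    return n - total


def tcount_in_ft_qcla1(n: int) -> int:
    """Closed-form total T-count of In-FT-QCLA1 as stated in the text."""
    if n < 2:
        raise ValueError("the expression needs n >= 2 so that log(n - 1) exists")
    return (20 * n - 8 * w(n) - 4 * floor_log2(n)
            - 8 * w(n - 1) - 4 * floor_log2(n - 1) - 8)


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, tcount_in_ft_qcla1(n))
```

Because w(n) is at most about log(n), the 20n term dominates for large operand sizes, which is why the later comparison works with leading coefficients.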
B. T-count analysis of In-FT-QCLA2
The T-count of In-FT-QCLA2 is shown for each Step of the
proposed design methodology. Total T-count is determined by
summing the T-count for each Step of the proposed design
methodology. The total T-count is
$40n - 11w(n) - 11\lfloor\log(n)\rfloor - 11w(n-1) - 11\lfloor\log(n-1)\rfloor - 32$,
where $w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^y \rfloor$.
• Step 1 uses n logical AND gates. The T-count for this
step is 4 · n.
• Step 2 does not need T gates.
• Step 3 uses n−w(n)−⌊log(n)⌋ logical AND gates. The
T-count for this step is 4 · (n− w(n) − ⌊log(n)⌋).
• Step 4 uses n−w(n) Toffoli gates. The T-count for this
step is 7 · (n− w(n)).
• Step 5 uses n− ⌊log(n)⌋ − 1 Toffoli gates. The T-count
for this step is 7 · (n− ⌊log(n)⌋ − 1).
• Steps 6 through 9 do not need T gates.
• Step 10 uses n − 1 − w(n − 1) − ⌊log(n − 1)⌋ logical
AND gates. The T-count for this step is 4 ·(n−1−w(n−
1)− ⌊log(n− 1)⌋).
• Step 11 uses n − ⌊log(n − 1)⌋ − 2 Toffoli gates. The
T-count for this step is 7 · (n− ⌊log(n− 1)⌋ − 2).
• Step 12 uses n− 1−w(n− 1) Toffoli gates. The T-count
for this step is 7 · (n− 1− w(n− 1)).
• Steps 13 through 16 do not need T gates.
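The per-step T-counts listed for In-FT-QCLA2 can be summed mechanically and compared with the closed-form total. The sketch below does this under the assumptions that log is the base-2 logarithm and n ≥ 2. The function names are hypothetical and used only for this illustration.

```python
def floor_log2(n: int) -> int:
    """Floor of the base-2 logarithm of a positive integer (assumed base)."""
    return n.bit_length() - 1


def w(n: int) -> int:
    """w(n) = n - sum_{y >= 1} floor(n / 2^y), as defined in the text."""
    return n - sum(n >> y for y in range(1, n.bit_length() + 1))


def step_tcounts_in_ft_qcla2(n: int) -> dict:
    """T-count of every step of In-FT-QCLA2 that uses T gates, keyed by step number."""
    lg, lg1 = floor_log2(n), floor_log2(n - 1)
    return {
        1: 4 * n,                              # n logical AND gates
        3: 4 * (n - w(n) - lg),                # logical AND gates
        4: 7 * (n - w(n)),                     # Toffoli gates at T-count 7
        5: 7 * (n - lg - 1),                   # Toffoli gates at T-count 7
        10: 4 * (n - 1 - w(n - 1) - lg1),      # logical AND gates
        11: 7 * (n - lg1 - 2),                 # Toffoli gates at T-count 7
        12: 7 * (n - 1 - w(n - 1)),            # Toffoli gates at T-count 7
    }


def closed_form_in_ft_qcla2(n: int) -> int:
    """Closed-form total T-count of In-FT-QCLA2 as stated in the text."""
    return (40 * n - 11 * w(n) - 11 * floor_log2(n)
            - 11 * w(n - 1) - 11 * floor_log2(n - 1) - 32)


if __name__ == "__main__":
    for n in (8, 16, 32):
        steps = step_tcounts_in_ft_qcla2(n)
        print(n, sum(steps.values()), closed_form_in_ft_qcla2(n))
```

For every n ≥ 2 the two printed numbers agree, since the closed form is the term-by-term sum of the seven step expressions.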
C. Cost Comparison of Proposed In-Place QCLAs
1) Cost Comparison in Terms of T-count: Table IV indicates
that all proposed in-place QCLAs have T-count costs of
order O(n). The existing works also have a T-count of order
O(n) with the exception of Cheng et al., where the T-count is
of order $O(n^3)$. Of the proposed designs, In-FT-QCLA1 has
the lowest T-count.

TABLE IV: Cost Comparison of In-Place QCLAs
| Design | T-count Equation | Qubit Equation |
|---|---|---|
| Draper et al. ([28]) | 70n − 21w(n) − 21⌊log(n)⌋ − 21w(n−1) − 21⌊log(n−1)⌋ − 49 | 4n − w(n) − ⌊log(n)⌋ + 1 |
| Trisetyarso et al. ([30]) | 70n − 21w(n) − 21⌊log(n)⌋ − 21w(n−1) − 21⌊log(n−1)⌋ − 49 | 4n − w(n) − ⌊log(n)⌋ + 1 |
| Thapliyal et al. ([32]) | (203/4)n − 28 | 4n + 1 |
| Takahashi et al. ([8]) | ≈ 196n | ≈ 5n |
| Takahashi et al. ([36]) | ≈ 49n | ≈ 5n |
| Cheng et al. ([27]) | (14/6)n³ + (21/6)n² − (49/6)n | 3n + 1 |
| Mogensen (Design 1)¹ ([37]) | ≈ 84n − 56 | 3n − 1 |
| Mogensen (Design 2)² ([37]) | ≈ 84n − 56 | 3n − log(n) − 1 |
| In-FT-QCLA1³ | 20n − 8w(n) − 8w(n−1) − 4⌊log(n)⌋ − 4⌊log(n−1)⌋ − 8 | 6n − 2w(n) − 2⌊log(n)⌋ |
| In-FT-QCLA2⁴ | 40n − 11w(n) − 11⌊log(n)⌋ − 11w(n−1) − 11⌊log(n−1)⌋ − 32 | 4n − w(n) − ⌊log(n)⌋ + 1 |
$w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^y \rfloor$
¹ Mogensen (Design 1) can accept a carry in bit c0.
² Mogensen (Design 2) does not accept a carry in bit.
³ In-FT-QCLA1 is optimized emphasizing T-count.
⁴ In-FT-QCLA2 is optimized emphasizing qubit cost.
In-FT-QCLA2 requires roughly 79.59% fewer T gates than
the design by Takahashi et al. in [8], 18.37% fewer T gates
than the design by Takahashi et al. in [36], 52.38% fewer
T gates than the designs by Mogensen, 42.86% fewer T
gates than the designs by Draper et al. and Trisetyarso et al.,
and 21.18% fewer T gates than the design by Thapliyal et al.
The proposed In-FT-QCLA1 requires roughly
89.80% fewer T gates than the design by Takahashi et al. in
[8], 59.18% fewer T gates than the design by Takahashi et al.
in [36], 76.19% fewer T gates than the designs by Mogensen,
71.43% fewer T gates than the designs by Draper et al.
and Trisetyarso et al., and 60.59% fewer T gates than
the design by Thapliyal et al. In-FT-QCLA1 and In-FT-QCLA2
also achieve an order of magnitude reduction in T
gates compared to the design by Cheng et al.
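The quoted percentages are reproduced when only the leading (linear in n) coefficients of the T-count equations in Table IV are compared, which is what "roughly" appears to mean here. The sketch below works under that assumption. The dictionary keys and the function name `savings_percent` are hypothetical labels for this illustration.

```python
from fractions import Fraction

# Leading coefficients of the T-count equations in Table IV (coefficient of n).
EXISTING = {
    "Takahashi [8]": Fraction(196),
    "Takahashi [36]": Fraction(49),
    "Mogensen [37]": Fraction(84),
    "Draper [28] / Trisetyarso [30]": Fraction(70),
    "Thapliyal [32]": Fraction(203, 4),
}
PROPOSED = {
    "In-FT-QCLA1": Fraction(20),
    "In-FT-QCLA2": Fraction(40),
}


def savings_percent(proposed: str, existing: str) -> float:
    """Percentage of T gates saved, using leading coefficients only."""
    ratio = PROPOSED[proposed] / EXISTING[existing]
    return round(float(100 * (1 - ratio)), 2)


if __name__ == "__main__":
    for p in PROPOSED:
        for e in EXISTING:
            print(f"{p} vs {e}: {savings_percent(p, e):.2f}% fewer T gates")
```

The lower-order terms in w(n), log(n) and the constants change these figures slightly for small n, so the values should be read as large-n estimates.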
2) Cost Comparison in Terms of Qubits: Table IV indicates
that the proposed in-place QCLAs have qubit costs of order
O(n). The existing works also have qubit costs of order O(n).
In-FT-QCLA2 requires w(n) + ⌊log(n)⌋ fewer qubits than the
design by Thapliyal et al. and has the same qubit cost as the
designs by Draper et al. and Trisetyarso et al. Further, In-FT-QCLA2
requires n + w(n) + ⌊log(n)⌋ − 1 fewer qubits than
the designs by Takahashi et al. in [8] and [36]. In-FT-QCLA1
achieves its T gate savings with only an order O(1) increase
in qubits compared to the existing work.
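The qubit differences quoted above follow directly from the qubit equations in Table IV. The sketch below evaluates those equations and checks the two stated differences. It assumes base-2 logarithms and treats the approximate Takahashi et al. cost as exactly 5n. All function names are hypothetical.

```python
def floor_log2(n: int) -> int:
    """Floor of the base-2 logarithm of a positive integer (assumed base)."""
    return n.bit_length() - 1


def w(n: int) -> int:
    """w(n) = n - sum_{y >= 1} floor(n / 2^y), as defined in the text."""
    return n - sum(n >> y for y in range(1, n.bit_length() + 1))


def qubits_in_ft_qcla1(n: int) -> int:
    return 6 * n - 2 * w(n) - 2 * floor_log2(n)


def qubits_in_ft_qcla2(n: int) -> int:
    return 4 * n - w(n) - floor_log2(n) + 1


def qubits_thapliyal(n: int) -> int:
    return 4 * n + 1


def qubits_takahashi(n: int) -> int:
    return 5 * n  # Table IV lists this cost as approximately 5n


if __name__ == "__main__":
    n = 8
    print("In-FT-QCLA1:", qubits_in_ft_qcla1(n))
    print("In-FT-QCLA2:", qubits_in_ft_qcla2(n))
    print("saved vs Thapliyal et al.:", qubits_thapliyal(n) - qubits_in_ft_qcla2(n))
    print("saved vs Takahashi et al.:", qubits_takahashi(n) - qubits_in_ft_qcla2(n))
```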
VII. EXISTING WORK
Quantum carry lookahead adders (QCLAs) have caught the
attention of researchers, who have contributed many
designs to the literature. Several designs such as [29] [31]
target reversible computing and, therefore, produce significant
garbage output. Other designs such as [42] present promising
designs but they are not generic, prohibiting scaling the
designs to alternative input qubit lengths. Designs that can
be implemented on quantum hardware include [29] and [28].
In-place and out-of-place QCLAs have been proposed. We
discuss the existing work for in-place and out-of-place QCLAs
in separate sections.
A. Out-of-Place QCLAs
Existing out-of-place QCLAs that can be implemented
on quantum hardware include [29] [31] [28] [30] [32]. Ta-
ble I summarizes important performance measures for these
QCLAs. T-count, qubit cost, gates used and if the design
produces garbage output are shown. The designs in [29] and
[31] produce garbage output and must be made garbageless
before use in quantum algorithms. As a result, the T-count
cost is doubled and the qubit cost is increased by at least
n + 1. The reported T-count and qubit cost reflect the added
cost from removing garbage outputs. The designs in [28] [30]
and [32] have no garbage outputs and can be used as is. The
design in [31] offers the lowest T-count in exchange for an
O(1) increase in qubit cost compared to existing works. The
designs in [28] [30] and [32] achieve the lowest qubit cost
with only a modest O(1) increase in T-count compared to
more T-count efficient works such as [31]. The design in [29]
has the highest resource costs of the existing works. These
are all interesting works that offer options with reduced qubit
or T gate costs. However, with advances such as the recent T
gate efficient Toffoli gate implementations shown in [23], we
have designed QCLAs that have reduced T-count
compared to these works. Further, we can design
QCLAs that offer T-count savings yet maintain comparable
qubit costs.
B. In-Place QCLAs
Existing in-place QCLAs that can be implemented on
quantum hardware include [28] [8] [30] [32] [27] and [37].
Table II summarizes important performance measures for these
QCLAs. T-count, qubit cost, gates used and if the design
produces garbage output are shown. All the designs have
no garbage outputs and can be used as is. The in-place
QCLA in [27] is based on CNOT, Toffoli gates and Multiple
Control Toffoli gates. To decompose a multiple control Toffoli
gate into quantum gates, first the multiple control Toffoli
gate must be decomposed into Toffoli gates. One multiple
control Toffoli gate translates into 2 · n − 3 Toffoli gates
which has a corresponding T-count of 14 · n − 21 (using the
Toffoli gate implementation in [18]). As a consequence, the
design in [27] has a T-count of order O(n3) which becomes
prohibitively costly for large n. The other existing works have
more reasonable T gate costs of order O(n). Of these works,
the designs by Mogensen have the lowest T-count and can
perform their operation with only about 3n qubits. Among the
works with T gate costs of order O(n), the design in [8] is
the most costly in terms of T-count and qubit costs. Prior
works such as [30] [32] and [37] present QCLAs that offer
low T-count, low qubit cost and no garbage outputs. However,
with the recent T gate efficient Toffoli gate implementations
shown in [23], we have designed QCLAs that have
reduced T-count compared to these works. Further, we have
designed QCLAs that offer T-count reduction with
only a modest O(1) qubit cost increase compared to existing
work.
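The cost quoted above for a multiple control Toffoli gate is a two-stage conversion: first into Toffoli gates, then into T gates. The sketch below spells out that arithmetic. It takes n with the same meaning as in the text and uses the T-count of 7 per Toffoli gate quoted for the implementation in [18]. The function names are hypothetical.

```python
T_PER_TOFFOLI = 7  # T-count of the Toffoli implementation in [18], as quoted in the text


def mct_toffoli_count(n: int) -> int:
    """Toffoli gates needed for one multiple control Toffoli gate, per the text."""
    return 2 * n - 3


def mct_tcount(n: int) -> int:
    """T-count of one multiple control Toffoli gate after both decomposition stages."""
    return T_PER_TOFFOLI * mct_toffoli_count(n)


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, mct_toffoli_count(n), mct_tcount(n))
```

Since each such gate already costs a number of T gates linear in n, a design that applies many of them accumulates a T-count that grows faster than linearly.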
VIII. CONCLUSION
In this work, we propose quantum circuits for carry lookahead
addition. We present designs for in-place
QCLAs and out-of-place QCLAs. We present three designs
for in-place QCLAs and three designs for the out-of-place
QCLAs. The in-place FT-QCLA1 (In-FT-QCLA1) and out-of-
place FT-QCLA1 (Out-FT-QCLA1) are optimized for T-count.
The in-place FT-QCLA2 (In-FT-QCLA2) and out-of-place FT-
QCLA2 (Out-FT-QCLA2) are optimized for qubit cost while
providing low T gate cost. The proposed QCLAs are based
on NOT gates, CNOT gates, Toffoli gates, logical AND gates,
uncomputation gates as well as a proposed uncomputation gate
for near term quantum hardware. These designs are compared
and shown to have reduced T gate and qubit costs compared
to the existing work. We conclude that the proposed in-
place QCLAs and out-of-place QCLAs can be used in larger
quantum data-path circuits where gate count and/or qubit cost
is of concern. We also conclude that the proposed QCLAs
can be used to increase the amount of computation possible
on quantum hardware with limited coherence times.
REFERENCES
[1] A. M. Childs and N. Wiebe, “Hamiltonian simulation using linear combi-
nations of unitary operations,” Quantum Information and Computation,
vol. 12, no. 11-12, pp. 901–924, 2012.
[2] P. Shor, “Algorithms for quantum computation: discrete logarithms and
factoring,” in Proceedings 35th Annual Symposium on Foundations of
Computer Science. IEEE Comput. Soc. Press, 1994, pp. 124–134.
[3] L. Novo and D. Berry, “Improved hamiltonian simulation via a truncated
taylor series and corrections,” Quantum Information and Computation,
vol. 17, no. 7-8, pp. 623–635, 2017.
[4] S. Hallgren, “Polynomial-time quantum algorithms for pell’s
equation and the principal ideal problem,” J. ACM, vol. 54,
no. 1, pp. 4:1–4:19, Mar. 2007. [Online]. Available:
http://doi.acm.org/10.1145/1206035.1206039
[5] A. M. Childs and G. Ivanyos, “Quantum computation of discrete
logarithms in semigroups,” Journal of Mathematical Cryptology, vol. 8,
no. 4, 2014.
[6] L. Ruiz-Perez and J. Garcia-Escartin, “Quantum arithmetic with the
quantum fourier transform,” Quantum Information Processing, vol. 16,
no. 6, pp. 1–14, 2017.
[7] S. A. Cuccaro, T. G. Draper, S. A. Kutin, and D. Petrie Moulton, “A
new quantum ripple-carry addition circuit,” arXiv e-prints, Oct 2004.
[Online]. Available: https://arxiv.org/abs/quant-ph/0410184
[8] Y. Takahashi and N. Kunihiro, “A fast quantum circuit for addition with
few qubits,” Quantum Information and Computation, vol. 8, no. 6-7, pp.
636–649, 2008.
[9] E. Muñoz-Coreas and H. Thapliyal, “Quantum circuit design of a t-count
optimized integer multiplier,” IEEE Transactions on Computers,
vol. 68, no. 5, pp. 729–739, May 2019.
[10] P. Selinger et al., The Quipper System, 2016, available at:
http://www.mathstat.dal.ca/ selinger/quipper/doc/.
[11] D. Wecker et al., Language-Integrated Quantum Operations:
LIQUi|〉, 2016, available at:
https://www.microsoft.com/en-us/research/project/language-integrated-quantum-operations-liqui/.
[12] C. H. Bennett, “Logical reversibility of computation,” IBM J. Res.
Dev., vol. 17, no. 6, pp. 525–532, Nov. 1973. [Online]. Available:
http://dx.doi.org/10.1147/rd.176.0525
[13] S. Bravyi and A. Kitaev, “Universal quantum computation
with ideal clifford gates and noisy ancillas,” Phys. Rev.
A, vol. 71, p. 022316, Feb 2005. [Online]. Available:
https://link.aps.org/doi/10.1103/PhysRevA.71.022316
[14] M. Amy, D. Maslov, and M. Mosca, “Polynomial-time t-depth optimiza-
tion of clifford+t circuits via matroid partitioning,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, vol. 33,
no. 10, pp. 1476–1489, Oct 2014.
[15] A. Paler, I. Polian, K. Nemoto, and S. J. Devitt, “Fault-tolerant, high-
level quantum circuits: form, compilation and description,” Quantum
Science and Technology, vol. 2, no. 2, p. 025003, 2017. [Online].
Available: http://stacks.iop.org/2058-9565/2/i=2/a=025003
[16] X. Zhou, D. W. Leung, and I. L. Chuang, “Methodology for quantum
logic gate construction,” Phys. Rev. A, vol. 62, p. 052316, Oct 2000. [On-
line]. Available: https://link.aps.org/doi/10.1103/PhysRevA.62.052316
[17] D. Gosset, V. Kliuchnikov, M. Mosca, and V. Russo, “An
algorithm for the t-count,” Quantum Info. Comput., vol. 14,
no. 15-16, pp. 1261–1276, Nov. 2014. [Online]. Available:
http://dl.acm.org/citation.cfm?id=2685179.2685180
[18] M. Amy, D. Maslov, M. Mosca, and M. Roetteler, “A meet-in-the-
middle algorithm for fast synthesis of depth-optimal quantum circuits,”
IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, vol. 32, no. 6, pp. 818–830, June 2013.
[19] C. Jones, “Low-overhead constructions for the fault-tolerant toffoli
gate,” Phys. Rev. A, vol. 87, p. 022328, Feb 2013. [Online]. Available:
https://link.aps.org/doi/10.1103/PhysRevA.87.022328
[20] D. Miller, M. Soeken, and R. Drechsler, “Mapping ncv circuits to opti-
mized clifford+t circuits,” in Reversible Computation, ser. Lecture Notes
in Computer Science, S. Yamashita and S.-i. Minato, Eds. Springer
International Publishing, 2014, vol. 8507, pp. 163–175.
[21] S. J. Devitt, A. M. Stephens, W. J. Munro, and K. Nemoto, “Require-
ments for fault-tolerant factoring on an atom-optics quantum computer,”
Nature Communications, vol. 4, 2013.
[22] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland,
“Surface codes: Towards practical large-scale quantum computation,”
Phys. Rev. A, vol. 86, p. 032324, Sep 2012. [Online]. Available:
https://link.aps.org/doi/10.1103/PhysRevA.86.032324
[23] C. Gidney, “Halving the cost of quantum addition,”
Quantum, vol. 2, p. 74, Jun. 2018. [Online]. Available:
https://doi.org/10.22331/q-2018-06-18-74
[24] R. V. Meter and M. Oskin, “Architectural implications of quantum
computing technologies,” J. Emerg. Technol. Comput. Syst., vol. 2, no. 1,
pp. 31–63, Jan. 2006.
[25] IBM, Quantum Computing - IBM Q, 2017, available at:
https://www.research.ibm.com/ibm-q/.
[26] C.-J. Yu, M. J. Graham, J. M. Zadrozny, J. Niklas, M. D.
Krzyaniak, M. R. Wasielewski, O. G. Poluektov, and D. E.
Freedman, “Long coherence times in nuclear spin-free vanadyl
qubits,” Journal of the American Chemical Society, vol. 138, no. 44,
pp. 14 678–14 685, 2016, pMID: 27797487. [Online]. Available:
https://doi.org/10.1021/jacs.6b08467
[27] K.-W. Cheng and C.-C. Tseng, “Quantum plain and
carry look-ahead adders,” 2002. [Online]. Available:
https://arxiv.org/abs/quant-ph/0206028
[28] T. G. Draper, S. A. Kutin, E. M. Rains, and K. M. Svore, “A logarithmic-
depth quantum carry-lookahead adder,” Quantum Information and Com-
putation, vol. 6, no. 4-5, pp. 351–369, 2006.
[29] H. Babu, L. Jamal, and N. Saleheen, “An efficient approach for designing
a reversible fault tolerant n-bit carry look-ahead adder.” IEEE Computer
Society, 2013, pp. 98–103.
[30] A. Trisetyarso and R. Van Meter, “Circuit design for a measurement-based
quantum carry-lookahead adder,” International Journal of Quantum
Information, vol. 8, no. 05, pp. 843–867, 2009.
[31] N. J. Lisa and H. M. H. Babu, “Design of a compact reversible
carry look-ahead adder using dynamic programming,” in 2015 28th
International Conference on VLSI Design, vol. 2015-, no. February.
IEEE, 2015, pp. 238–243.
[32] H. Thapliyal, H. Jayashree, A. Nagamani, and H. Arabnia, “Progress
in reversible processor design: A novel methodology for reversible
carry look-ahead adder,” Lecture Notes in Computer Science (including
subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 7420, pp. 73–97, 2013.
[33] L. Jamal, M. M. Rahman, and H. M. H. Babu, “An optimal design of
a fault tolerant reversible multiplier,” in 2013 IEEE International SOC
Conference, Sep. 2013, pp. 37–42.
[34] S. K. Mitra and A. R. Chowdhury, “Minimum cost fault tolerant
adder circuits in reversible logic synthesis,” in 2012 25th International
Conference on VLSI Design, Jan 2012, pp. 334–339.
[35] P. Selinger, “Quantum circuits of t-depth one,” Phys. Rev.
A, vol. 87, p. 042302, Apr 2013. [Online]. Available:
https://link.aps.org/doi/10.1103/PhysRevA.87.042302
[36] Y. Takahashi, S. Tani, and N. Kunihiro, “Quantum addition circuits and
unbounded fan-out,” Quantum Information and Computation, vol. 10,
no. 9-10, pp. 872–890, 2010.
[37] T. Æ. Mogensen, “Reversible in-place carry-lookahead addition with few
ancillae,” in Reversible Computation, M. K. Thomsen and M. Soeken,
Eds. Cham: Springer International Publishing, 2019, pp. 224–237.
[38] H. Thapliyal and N. Ranganathan, “Design of efficient reversible
logic-based binary and bcd adder circuits,” J. Emerg. Technol. Comput.
Syst., vol. 9, no. 3, pp. 17:1–17:31, Oct. 2013. [Online]. Available:
http://doi.acm.org/10.1145/2491682
[39] P. Boykin, T. Mor, M. Pulver, V. Roychowdhury, and F. Vatan, “A new
universal and fault-tolerant quantum basis,” Information Processing
Letters, vol. 75, no. 3, pp. 101 – 107, 2000. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0020019000000843
[40] O. Spaniol, Computer arithmetic : logic and design, ser. Wiley series in
computing. Chichester [West Sussex] ; New York: Wiley, 1981.
[41] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs.
New York: Oxford University Press, 2000.
[42] A. N. Nagamani, C. K. Kavyashree, R. M. Saraswathy, C. H. V. Kartika,
and V. K. Agrawal, “Design of reversible floating point adder for dsp
applications,” in Proceedings of the International Conference on Signal,
Networks, Computing, and Systems, D. K. Lobiyal, D. P. Mohapatra,
A. Nagar, and M. N. Sahoo, Eds. New Delhi: Springer India, 2016,
pp. 123–135.
