T-COUNT OPTIMIZATION OF QUANTUM CARRY LOOK-AHEAD ADDER by Khalus, Vladislav Ivanovich
University of Kentucky 
UKnowledge 
Theses and Dissertations--Electrical and 
Computer Engineering Electrical and Computer Engineering 
2019 
T-COUNT OPTIMIZATION OF QUANTUM CARRY LOOK-AHEAD 
ADDER 
Vladislav Ivanovich Khalus 
University of Kentucky, khalus.vlad@gmail.com 
Digital Object Identifier: https://doi.org/10.13023/etd.2019.227 
Recommended Citation 
Khalus, Vladislav Ivanovich, "T-COUNT OPTIMIZATION OF QUANTUM CARRY LOOK-AHEAD ADDER" 
(2019). Theses and Dissertations--Electrical and Computer Engineering. 141. 
https://uknowledge.uky.edu/ece_etds/141 
This Master's Thesis is brought to you for free and open access by the Electrical and Computer Engineering at 
UKnowledge. It has been accepted for inclusion in Theses and Dissertations--Electrical and Computer Engineering by 
an authorized administrator of UKnowledge. For more information, please contact UKnowledge@lsv.uky.edu. 
STUDENT AGREEMENT: 
I represent that my thesis or dissertation and abstract are my original work. Proper attribution 
has been given to all outside sources. I understand that I am solely responsible for obtaining 
any needed copyright permissions. I have obtained needed written permission statement(s) 
from the owner(s) of each third-party copyrighted matter to be included in my work, allowing 
electronic distribution (if such use is not permitted by the fair use doctrine) which will be 
submitted to UKnowledge as Additional File. 
I hereby grant to The University of Kentucky and its agents the irrevocable, non-exclusive, and 
royalty-free license to archive and make accessible my work in whole or in part in all forms of 
media, now or hereafter known. I agree that the document mentioned above may be made 
available immediately for worldwide access unless an embargo applies. 
I retain all other ownership rights to the copyright of my work. I also retain the right to use in 
future works (such as articles or books) all or part of my work. I understand that I am free to 
register the copyright to my work. 
REVIEW, APPROVAL AND ACCEPTANCE 
The document mentioned above has been reviewed and accepted by the student’s advisor, on 
behalf of the advisory committee, and by the Director of Graduate Studies (DGS), on behalf of 
the program; we verify that this is the final, approved version of the student’s thesis including all 
changes required by the advisory committee. The undersigned agree to abide by the statements 
above. 
Vladislav Ivanovich Khalus, Student 
Dr. Himanshu Thapliyal, Major Professor 
Dr. Aaron Cramer, Director of Graduate Studies 
T-COUNT OPTIMIZATION OF
QUANTUM CARRY LOOK-AHEAD ADDER
THESIS
A thesis submitted in partial fulfillment
of the requirements for the degree of
Master of Science in Electrical Engineering
in the College of Engineering
at the University of Kentucky
By
Vladislav Ivanovich Khalus
Lexington, Kentucky
Director: Dr. Himanshu Thapliyal, Assistant Professor of Electrical and Computer
Engineering
Lexington, Kentucky
2019
Copyright © Vladislav Ivanovich Khalus 2019
ABSTRACT OF THESIS
T-COUNT OPTIMIZATION OF QUANTUM CARRY LOOK-AHEAD ADDER
The emergence of quantum physics and computer science in the 20th century gave birth to a new field that can solve very difficult problems much faster than classical computing, or solve problems that classical computing cannot solve at all. In the 21st century, quantum computing is needed to solve hard problems in engineering, business, medicine, and other fields that demand results as quickly as possible. To make this dream come true, engineers in the semiconductor industry need to make quantum circuits a reality.
To realize quantum circuits and make them scalable, they need to be fault tolerant; therefore, the circuits need to be built from Clifford+T gates. The main issue is that, in the Clifford+T gate set, T gates are expensive to implement.
Carry Look-Ahead addition circuits have caught the interest of researchers because the number of gate layers encountered by a given qubit in the circuit (the circuit's depth) is logarithmic in the input size n. Therefore, this thesis focuses on optimizing previous designs of out-of-place and in-place Carry Look-Ahead Adders to decrease the T-count, the sum of all T and T Hermitian transpose gates in a quantum circuit.
KEYWORDS: Quantum Computing, Fault Tolerant, T-count, Out-of-place Carry
Look-Ahead Adder, In-place Carry Look-Ahead Adder
Vladislav Ivanovich Khalus
May 28, 2019
T-COUNT OPTIMIZATION OF
QUANTUM CARRY LOOK-AHEAD ADDER
By
Vladislav Ivanovich Khalus
Dr. Himanshu Thapliyal
(Director of Thesis)
Dr. Aaron Cramer
(Director of Graduate Studies)
May 28, 2019
(Date)
Table of Contents
1 Introduction
1.1 Contribution of Thesis
1.2 Outline of Thesis
2 Background
2.1 Quantum bit
2.1.1 Introduction
2.1.2 Operations on Qubits
2.1.3 Entanglement
2.1.4 Measurement Gate
2.2 Quantum Gates
2.2.1 1-qubit Gates
2.2.2 2-qubit Gates
2.2.3 3-qubit Gates
2.3 Logical-AND Computation and Uncomputation Gates
2.3.1 Logical-AND Computation Gate
2.3.2 Logical-AND Uncomputation Gate
2.3.3 Comparison between the two Methods
2.4 Carry Look-Ahead Addition Review
2.4.1 Classical CLA
2.4.2 Literature Review
3 Design of Proposed Out-of-place QCLA
3.1 Procedure
3.2 T-count
4 Design of Proposed In-place QCLA
4.1 Procedure
4.2 T-count
5 Simulation Results
5.1 Out-of-place QCLA
5.2 In-place QCLA
6 Conclusion
References
Vita
List of Figures
2.1 The Measurement gate
2.2 The Clifford+T gates for a 1-qubit
2.3 The CNOT gate
2.4 The Controlled-Z gate
2.5 The Toffoli gate
2.6 The logical-AND computation gate
2.7 The logical-AND uncomputation gate: Measure-and-Fixup
2.8 Testing Measure-and-Fixup gate
2.9 The logical-AND uncomputation gate: Computation-Reversal
2.10 Testing Computation-Reversal gate
3.1 Out-of-place QCLA: low T-count
4.1 In-place QCLA: low T-count
4.2 In-place QCLA: high speed
5.1 Out-of-place QCLA simulation
5.2 In-place QCLA simulation
List of Tables
2.1 Clifford+T 1-qubit gates
2.2 Truth Table of logical-AND
3.1 Equations for Out-of-place T-count
4.1 Equations for In-place T-count
Chapter 1
Introduction
Quantum computing promises to perform computationally intensive tasks and offers unique capabilities for processing information [1]. Many algorithms have been proposed to solve difficult problems in communication, computing, and sensing [1]. To make these quantum algorithms a reality, quantum hardware devices, or more specifically quantum circuits, need to be built to perform quantum logic, carry out sequences of elaborate calculations, and perform quantum information encoding [1].
In 1961, R. Landauer showed that kT ln 2 joules of energy are dissipated for every bit of information that is lost, where k is Boltzmann's constant and T is the absolute temperature [2]. In 1973, Bennett showed that, to avoid this kT ln 2 J of energy loss, a circuit has to be reversible [2].
A reversible circuit is built from reversible gates. Reversible gates have a one-to-one mapping between inputs and outputs [2]. As a result, the circuit does not waste energy or information, and the inputs can be recovered from the outputs [2]. In quantum computing, it is necessary for quantum gates to be reversible.
To further improve the efficiency of computation, quantum gates exploit superposition and entanglement, which can greatly speed up the computation [3]. Instead of processing one bit of information at a time, quantum computing and its gates can process many bit values at once.
To make quantum circuits reliable and scalable, they need to be fault tolerant [4]. For quantum circuits to be fault tolerant, they need to tolerate noise errors [5]. Fault-tolerant quantum circuits are built from Clifford+T gates. The Clifford+T gate set includes the NOT, Hadamard, T, T Hermitian transpose, Phase, Phase Hermitian transpose, and CNOT gates [5]. With the help of these gates and quantum error-correcting codes, a quantum circuit can tolerate noise errors [5]. The main issue in implementing a fault-tolerant quantum circuit is that the T gate is expensive to implement [6]. The T gate is costly in practice compared to the Clifford gates because it is not transversal for many quantum error-correcting codes [6]; therefore, T gates should be used only when necessary.
In fault-tolerant computing, quantum gates that are not in the Clifford group are difficult to realize. In many quantum error-correcting codes, such as the surface code in [7], a single T gate requires around 100 times the circuit volume of an H gate or a CNOT gate [8]. The implementation cost is therefore a major issue in fault-tolerant computing. It cannot simply be avoided, because the T gate is what makes the Clifford+T gate set universal, so it appears throughout quantum computing and its algorithms. Therefore, to make quantum computing reliable and scalable, this issue needs to be resolved.
Because the T gate is expensive to implement, reducing the T-count has become one of the key optimization goals in the literature [5]. The T-count is defined as the number of T and T Hermitian transpose gates in a quantum circuit [6].
Quantum circuits that implement arithmetic operations are a necessity for realizing quantum algorithms [4]. Examples of quantum algorithms are Peter Shor's factoring algorithm [4], Grover's search algorithm [9], the Quantum Fourier Transform [10], Simon's algorithm [11], and the triangle finding algorithm [4].
Many arithmetic operations in quantum circuits, such as binary addition [12], binary and BCD addition [13], floating-point addition [14], adder-subtractor and subtractor [15], integer multiplication [16], modular multiplication [17], floating-point multiplication [18], and integer division [4], have received attention from researchers working on quantum computing. The most important of these arithmetic operations is addition, because quantum adders form key components in subtraction, multiplication, and division.
Carry Look-Ahead addition circuits have caught the interest of researchers be-
cause the number of gate layers encountered by a given qubit in the circuit (or the
circuit’s depth) is logarithmic in terms of the input size n. As a result, designs of
Quantum Carry Look-Ahead Adders (QCLA) have been proposed in literature in
[19][20][21][22][23]. For example, designs in [19][20][23] are interesting but they suffer
from high T gate cost. To overcome the limitations of the existing designs, this thesis
presents quantum circuits for Carry Look-Ahead addition based on Clifford+T gates.
1.1 Contribution of Thesis
This thesis introduces integer arithmetic designs for use in quantum computing. The main focus is on Quantum Carry Look-Ahead Adders (QCLAs):
1. Quantum out-of-place CLA (low T-count).
2. Quantum out-of-place CLA (high speed).
3. Quantum in-place CLA (low T-count).
4. Quantum in-place CLA (high speed).
5. Quantum in-place CLA (high speed + low T-count).
1.2 Outline of Thesis
This thesis will be broken down into five main Chapters. Chapter 2 gives a back-
ground of a quantum bit, quantum gates, the logical-AND computation and uncom-
putation gates and an overview of the classical CLA and a literature review of the
existing QCLAs. Then Chapter 3 introduces new design for the out-of-place QCLA
with a low T-count and a faster version of it. Next, Chapter 4 goes into the new
design for the in-place QCLA, which has three versions: the low T-count, the high
speed, and a mixture of both. Chapter 5 shows the simulation results for the out-of-
place and in-place QCLAs. Finally, the thesis will end with a Conclusion in Chapter
6.
Chapter 2
Background
This chapter introduces the basics of quantum computing and the reversible quantum
gates. It is highly recommended to get an understanding of the quantum world before
moving on to the other Chapters of this thesis.
2.1 Quantum bit
To begin with quantum computing, one needs to know a quantum bit, or a qubit.
2.1.1 Introduction
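A qubit is a vector in a two-dimensional complex vector space, $\mathbb{C}^2$ [24]. A generic vector $|\psi\rangle$ in $\mathbb{C}^2$ can be written as $|\psi\rangle = \alpha|c_1\rangle + \beta|c_2\rangle$ [25]. The vector is written using a symbol called a ket, $|\psi\rangle$. This notation for qubits is called Dirac bra-ket notation [24], and it is used throughout quantum information processing [3].
In $|\psi\rangle = \alpha|c_1\rangle + \beta|c_2\rangle$, $|\psi\rangle$ is called a state, and $\alpha$ and $\beta$ are complex numbers called amplitudes [25]. The vectors $|c_1\rangle$ and $|c_2\rangle$ are defined as $|c_1\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $|c_2\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. In ket form, $|0\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $|1\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Therefore, $|c_1\rangle = |0\rangle$ and $|c_2\rangle = |1\rangle$.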
With |ψ〉 = α|c1〉 + β|c2〉, the qubit is in superposition: it is in both states |0〉 and |1〉 at once [3], not just |0〉 or |1〉. The state |ψ〉 is a linear combination of states [24], which is the actual definition of superposition. With the help of superposition, a large number of classical bit values can be processed in a single run of the quantum computer.
One thing to note is that a qubit does not start out in superposition; it is usually in the spin-up state |0〉 or the spin-down state |1〉, which correspond to the classical bits 0 and 1. When an operation is performed on a qubit, such as passing it through a quantum gate, the qubit can end up in superposition, and in a multi-qubit system the qubits can become entangled.
2.1.2 Operations on Qubits
Before going further into quantum gates, one needs to know how they are used with qubits. Two forms can be used to manipulate qubits and gates: matrix form and Dirac bra-ket notation. For example, the matrix $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ has the equivalent bra-ket form |0〉〈0| + |1〉〈1| [3]. The bra form 〈ψ| and the combination of the ket and bra forms, |ψ1〉〈ψ2|, will not be used in this thesis, because they describe a gate, and a gate is better understood in matrix form. The ket form |ψ〉 describes a qubit, and a qubit can be written interchangeably in ket form or in matrix form.
As previously explained, the states of one qubit are $|0\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $|1\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. A single qubit therefore has the possible states $\{|0\rangle, |1\rangle\}$, which are called computational basis states [24]. In a 2-qubit quantum system, there are four computational basis states, $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$, and the state can be written as $|\psi\rangle = c_{0,0}|00\rangle + c_{0,1}|01\rangle + c_{1,0}|10\rangle + c_{1,1}|11\rangle$ [24]. In matrix form, the computational basis states look like:
\[
|00\rangle = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
|01\rangle = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
|10\rangle = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad
|11\rangle = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}.
\]
Lastly, when the two qubits go through a gate, the operation looks like this:
\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}.
\]
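The matrix-vector product above can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper names `apply` and `basis_state` are not from the thesis, and the basis order |00〉, |01〉, |10〉, |11〉 is assumed to match the column vectors listed earlier.

```python
# The 4x4 matrix from the example above (the CNOT gate), in the assumed
# basis order |00>, |01>, |10>, |11>.
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]


def apply(gate, state):
    """Multiply a square gate matrix by a state column vector."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]


def basis_state(bits):
    """Column vector of a computational basis state such as '11'."""
    vec = [0] * (2 ** len(bits))
    vec[int(bits, 2)] = 1
    return vec


result = apply(CNOT, basis_state("11"))
print(result)  # [0, 0, 1, 0], the column vector of |10>
```

The same two helpers work for any number of qubits, as long as the gate matrix has 2^n rows and columns.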
In a 3-qubit quantum system, there are eight computational basis states: $\{|000\rangle, |001\rangle, |010\rangle, |011\rangle, |100\rangle, |101\rangle, |110\rangle, |111\rangle\}$. The state can be written as $|\psi\rangle = c_{0,0,0}|000\rangle + c_{0,0,1}|001\rangle + c_{0,1,0}|010\rangle + c_{0,1,1}|011\rangle + \dots + c_{1,1,1}|111\rangle$ [24]. In matrix form, the computational basis states are the following column vectors (written as transposed rows to save space):
\[
\begin{aligned}
|000\rangle &= [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^{T}, & |001\rangle &= [0\ 1\ 0\ 0\ 0\ 0\ 0\ 0]^{T}, \\
|010\rangle &= [0\ 0\ 1\ 0\ 0\ 0\ 0\ 0]^{T}, & |011\rangle &= [0\ 0\ 0\ 1\ 0\ 0\ 0\ 0]^{T}, \\
|100\rangle &= [0\ 0\ 0\ 0\ 1\ 0\ 0\ 0]^{T}, & |101\rangle &= [0\ 0\ 0\ 0\ 0\ 1\ 0\ 0]^{T}, \\
|110\rangle &= [0\ 0\ 0\ 0\ 0\ 0\ 1\ 0]^{T}, & |111\rangle &= [0\ 0\ 0\ 0\ 0\ 0\ 0\ 1]^{T}.
\end{aligned}
\]
When three qubits go through a gate, the operation looks like this:
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\]
Writing qubits as matrices gets tedious, so Dirac bra-ket notation, or more precisely the ket form |ψ〉, is used to simplify things. There is also a bra-ket notation for gates, but it is omitted for simplicity. Therefore, an operation between a quantum gate and one or more qubits is written as X|1〉 or CNOT|11〉, or it is described in words, for example by saying that a CNOT gate was applied on qubits one and three.
2.1.3 Entanglement
Entangled states are crucial pieces in quantum computation [24]. They are a uniquely quantum phenomenon with no classical counterpart [3]. In a multi-qubit system, most states are entangled. Entanglement is important in quantum teleportation, Bell's inequality, and superdense coding [24].
A 2-qubit entangled state |ψ〉 has the property that there do not exist two 1-qubit states |c1〉 and |c2〉 such that |ψ〉 = |c1〉|c2〉. In other words, for an entangled state |ψ〉 ≠ |c1〉|c2〉, because an entangled state cannot be written as the product of two states [24]. As the number of qubits increases, the complexity of entangled states is expected to grow exponentially.
To make an ordinary state entangled, there has to be an interaction between the qubits through gates. In a 2-qubit system, a 2-qubit gate is required to entangle the qubits; this can be achieved with the help of the CNOT gate [25].
An example of a non-entangled state is $|\psi\rangle = \frac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle)$, which is the product of $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. An example of an entangled state is $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, which is also one of the Bell states used for quantum teleportation [24].
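For two qubits, the product-state condition can be tested directly on the amplitudes. Expanding (a|0〉 + b|1〉)(c|0〉 + d|1〉) gives c00 = ac, c01 = ad, c10 = bc, c11 = bd, so a product state always satisfies c00·c11 = c01·c10, and a 2-qubit pure state that satisfies this equality can be factored. The sketch below applies that test to the two example states; the function name `is_product_state` is only illustrative and is not from the thesis.

```python
import math


def is_product_state(c00, c01, c10, c11, tol=1e-12):
    """True when a 2-qubit pure state factors into two 1-qubit states.

    Uses the condition c00*c11 == c01*c10, which holds exactly for
    product states of two qubits.
    """
    return abs(c00 * c11 - c01 * c10) < tol


r = 1 / math.sqrt(2)
uniform = (0.5, 0.5, 0.5, 0.5)  # (1/2)(|00> + |01> + |10> + |11>)
bell = (r, 0.0, 0.0, r)         # (1/sqrt(2))(|00> + |11>)

print(is_product_state(*uniform))  # True: not entangled
print(is_product_state(*bell))     # False: entangled
```

This shortcut is specific to two qubits; larger systems need a more general test.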
As an informal analogy for the correlation in an entangled state, let the first qubit encode night (0) or day (1), and let the second qubit encode whether the stars can be seen (1) or not (0). Each of the states |01〉 (night, stars visible) and |10〉 (day, stars not visible) is a product state on its own. In the superposition $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$, however, the two qubits cannot be described separately from each other: learning whether it is night or day immediately fixes whether the stars are seen. In that sense, the stars-seen qubit is entangled with the night-or-day qubit.
2.1.4 Measurement Gate
Just as the user of a classical computer can retrieve the contents of memory, the user of a quantum computer can read out a qubit. The measurement gate shown in Figure 2.1 is the only way to get information about the qubit [3]. Using the measurement gate, one classical bit of information can be retrieved from the qubit [3]. This means that one needs to be careful where to place the measurement gate, because it restricts the amount of information that can be retrieved from the qubit [3]. Another restriction is that a state cannot be cloned so that one copy is measured while the other is kept [3]; once the state is measured, the measurement yields a classical bit of information [3]. In Figure 2.1, the single line is the qubit and the double line is the classical bit.
With the qubit in the state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, a measurement yields the classical bit 0 with probability $|\alpha|^2$ and the classical bit 1 with probability $|\beta|^2$, which means that $|\alpha|^2 + |\beta|^2 = 1$ [24].
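As a small worked example of this rule, the sketch below computes both probabilities for a state with one real and one imaginary amplitude. The function name and the chosen amplitudes are illustrative assumptions, not values from the thesis.

```python
import math


def measurement_probabilities(alpha, beta):
    """Return (P(0), P(1)) for the state alpha|0> + beta|1>."""
    p0 = abs(alpha) ** 2
    p1 = abs(beta) ** 2
    # A valid qubit state must be normalized.
    if not math.isclose(p0 + p1, 1.0, abs_tol=1e-9):
        raise ValueError("state is not normalized")
    return p0, p1


# Example state: (sqrt(3)/2)|0> + (i/2)|1>
p0, p1 = measurement_probabilities(math.sqrt(3) / 2, 0.5j)
print(p0, p1)  # approximately 0.75 and 0.25
```

Note that the phase of an amplitude (here the factor i on |1〉) does not affect the measurement probabilities; only the magnitudes matter.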
Lastly, combining entanglement and measurement: when two qubits are entangled and one of them is measured, the result also tells the user about the other qubit, even though the unmeasured qubit was not touched.
Figure 2.1: The Measurement gate
2.2 Quantum Gates
Before building any quantum circuit, such as the Carry Look-Ahead Adder, an understanding of quantum gates is needed. Note that in quantum computing all the gates are reversible. In this thesis, only the quantum gates used to build the quantum circuits in Chapters 3 and 4 are discussed. These and other quantum gates can be found in [24].
2.2.1 1-qubit Gates
The 1-qubit Clifford+T gates, which are described in more detail and more clearly in [26] than in [24], have one input and one output. Figures 2.2(a) through 2.2(f) show these gates, which are used to build a fault-tolerant quantum circuit. Table 2.1 lists the matrix of each gate and what happens when a qubit goes through it.
2.2.2 2-qubit Gates
CNOT gate/Feynman Gate
The Controlled NOT gate, or CNOT gate for short, is a 2x2 reversible gate (two inputs, two outputs) and a Clifford gate, with inputs |A〉 and |B〉 and outputs |P〉 and |Q〉. The first output equals the first input, therefore |P〉 = |A〉. For the second output, the first input is EXOR'ed with the second input, therefore |Q〉 = |A⊕B〉. The CNOT gate can be drawn in the EXOR-Down or EXOR-Up configuration. These are shown in Figures 2.3(a), 2.3(b), and 2.3(c).
Figure 2.2: The Clifford+T gates for a 1-qubit: (a) The NOT gate, (b) The Hadamard gate, (c) The T gate, (d) The T† gate, (e) The S gate, (f) The S† gate
Figure 2.3: The CNOT gate: (a) Graphical Representation, (b) Quantum Representation, (c) Matrix Representation
Controlled-Z Gate
The Controlled-Z gate, or CZ gate, is a 2x2 reversible gate with two inputs, |A〉 and |B〉, and two outputs, |P〉 and |Q〉. The first output equals the first input, therefore |P〉 = |A〉. The second output is $|Q\rangle = (-1)^{A \cdot B}|B\rangle$, which means that when both |A〉 and |B〉 are |1〉, |Q〉 = −|1〉; otherwise |Q〉 = |B〉. The CZ gate is shown in Figures 2.4(a), 2.4(b), and 2.4(c).
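One standard way to build the CZ gate from Clifford gates is to place a Hadamard gate on the second qubit before and after a CNOT gate. Whether this is the exact decomposition drawn in Figure 2.4(b) is an assumption here; the sketch below only checks the matrix identity numerically, with the helper names `kron`, `matmul`, and `close` chosen for illustration.

```python
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]


def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def close(a, b, tol=1e-12):
    """Entry-wise comparison within a small tolerance."""
    return all(abs(x - y) < tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


IH = kron(I2, H)                       # Hadamard on the second qubit only
built = matmul(IH, matmul(CNOT, IH))   # H, then CNOT, then H
print(close(built, CZ))  # True
```

The identity follows from HXH = Z: conjugating the target of a CNOT by Hadamard gates turns the controlled X into a controlled Z.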
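Table 2.1: Clifford+T 1-qubit gates
Type of gate | Symbol | Matrix | Input $|A\rangle$ to output $|P\rangle$
NOT | X or ⊕ | $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ | $|0\rangle \to |1\rangle$; $|1\rangle \to |0\rangle$
Hadamard | H | $\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$ | $|0\rangle \to \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$; $|1\rangle \to \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$
T gate | T | $\begin{bmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{bmatrix}$ | $|0\rangle \to |0\rangle$; $|1\rangle \to e^{i\pi/4}|1\rangle$
T gate Hermitian Transpose | T† | $\begin{bmatrix} 1 & 0 \\ 0 & e^{-i\pi/4} \end{bmatrix}$ | $|0\rangle \to |0\rangle$; $|1\rangle \to e^{-i\pi/4}|1\rangle$
Phase | S | $\begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}$ | $|0\rangle \to |0\rangle$; $|1\rangle \to i|1\rangle$
Phase Hermitian Transpose | S† | $\begin{bmatrix} 1 & 0 \\ 0 & -i \end{bmatrix}$ | $|0\rangle \to |0\rangle$; $|1\rangle \to -i|1\rangle$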
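A few relations between the gates in Table 2.1 can be checked numerically: applying T twice gives S, and each gate followed by its Hermitian transpose gives the identity. The sketch below uses plain nested lists; the helper names `matmul` and `close` are illustrative and not part of the thesis.

```python
import cmath

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
T = [[1, 0], [0, cmath.exp(1j * cmath.pi / 4)]]
T_DAG = [[1, 0], [0, cmath.exp(-1j * cmath.pi / 4)]]
S = [[1, 0], [0, 1j]]
S_DAG = [[1, 0], [0, -1j]]


def matmul(a, b):
    """Plain matrix product of two 2x2 nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def close(a, b, tol=1e-12):
    """Entry-wise comparison within a small tolerance."""
    return all(abs(x - y) < tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


print(close(matmul(T, T), S))        # True: two T gates make one S gate
print(close(matmul(T, T_DAG), I2))   # True: T-dagger undoes T
print(close(matmul(S, S_DAG), I2))   # True: S-dagger undoes S
print(close(matmul(X, X), I2))       # True: NOT is its own inverse
```

The first identity is one reason T gates dominate the cost model: S and Z can be obtained from T, but T cannot be obtained from the Clifford gates.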
Figure 2.4: The Controlled-Z gate: (a) Quantum Representation, (b) Clifford+T Representation, (c) Matrix Representation
2.2.3 3-qubit Gates
Toffoli Gate:
The Toffoli gate, or Controlled-Controlled NOT gate, shown in Figures 2.5(a), 2.5(b), 2.5(c), and 2.5(d), is a 3x3 reversible gate with three inputs, |A〉, |B〉, |C〉, and three outputs, |P〉, |Q〉, |R〉. The first output equals the first input, therefore |P〉 = |A〉. The same is true for the second output, meaning |Q〉 = |B〉. The last output is the AND of the first and second inputs, EXOR'ed with the third input, therefore |R〉 = |(A·B) ⊕ C〉. The Toffoli gate can be drawn in three configurations: EXOR-Down, EXOR-Up, or EXOR-Middle. The Clifford+T implementation shown in Figure 2.5(c) was taken from [24].
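On computational basis states, the Toffoli gate acts like a classical reversible function on three bits. The sketch below enumerates that truth table and checks the two properties used in this chapter: the mapping is one-to-one, and applying the gate twice restores the inputs. The function name `toffoli` is illustrative.

```python
from itertools import product


def toffoli(a, b, c):
    """Classical action of the Toffoli gate: (A, B, C) -> (A, B, (A AND B) XOR C)."""
    return a, b, (a & b) ^ c


inputs = list(product((0, 1), repeat=3))
outputs = [toffoli(*bits) for bits in inputs]

for bits, out in zip(inputs, outputs):
    print(bits, "->", out)

# One-to-one: the outputs are a permutation of the inputs.
print(sorted(outputs) == inputs)  # True
# Self-inverse: applying the gate twice returns the original bits.
print(all(toffoli(*toffoli(*bits)) == bits for bits in inputs))  # True
```

Only the inputs (1, 1, 0) and (1, 1, 1) are changed by the gate, which matches the swapped last two rows of the matrix representation.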
2.3 Logical-AND Computation and Uncomputation Gates
One of the last steps before building the main quantum circuits in Chapters 3 and 4 is understanding three gates: the logical-AND computation gate, the logical-AND uncomputation Measure-and-Fixup gate, and the logical-AND uncomputation Computation-Reversal gate, all of which were discussed by Gidney in [27].
2.3.1 Logical-AND Computation Gate
Many addition circuits require the AND of two qubits, and this gate is a good way to implement it. As can be seen in Figures 2.6(a) and 2.6(b), the first two inputs are the two qubits of interest, and the third input is an ancilla prepared in the state $|T\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$, which is obtained by applying one Hadamard gate and one T gate to |0〉. In terms of T-count, this gate uses two T gates (one to prepare the ancilla and one inside the gate) and two T† gates; therefore the T-count is four.
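The T-count of four can be reproduced by listing the gates and counting the T and T† entries. The gate-list encoding below (a name plus the qubits it acts on, with qubit 2 as the ancilla) is a hypothetical format chosen for illustration; the gate order follows the step-by-step walkthrough in this section, and the control and target assignments of the CNOT gates are inferred from that walkthrough.

```python
from collections import Counter

# Hypothetical encoding: (gate name, (qubits)). For CNOT the tuple is
# (control, target). Qubits 0 and 1 are the inputs, qubit 2 is the ancilla.
LOGICAL_AND = [
    ("H", (2,)), ("T", (2,)),                    # prepare the ancilla in |T>
    ("CNOT", (0, 2)), ("CNOT", (1, 2)),          # steps 1 and 2
    ("CNOT", (2, 1)), ("CNOT", (2, 0)),          # steps 3 and 4
    ("Tdg", (0,)), ("Tdg", (1,)), ("T", (2,)),   # step 5
    ("CNOT", (2, 1)), ("CNOT", (2, 0)),          # steps 6 and 7
    ("H", (2,)),                                 # step 8
    ("S", (2,)),                                 # step 9
]


def t_count(circuit):
    """Number of T and T-dagger gates in a gate list."""
    counts = Counter(name for name, _ in circuit)
    return counts["T"] + counts["Tdg"]


print(t_count(LOGICAL_AND))  # 4
```

Clifford gates such as H, S, and CNOT do not contribute to the count, which is why the T-count is a useful single-number cost metric.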
Figure 2.5: The Toffoli gate: (a) Graphical Representation, (b) Quantum Representation, (c) Clifford+T Representation, (d) Matrix Representation
Figure 2.6: The logical-AND computation gate: (a) Graphical Representation, (b) Quantum Representation
The analysis of the logical-AND computation gate is divided into four cases to show that the gate computes the logical AND of two qubits. Table 2.2 lists all possible combinations of |A〉 and |B〉.
Table 2.2: Truth Table of logical-AND
|A〉 |B〉 |A&B〉
|0〉 |0〉 |0〉
|0〉 |1〉 |0〉
|1〉 |0〉 |0〉
|1〉 |1〉 |1〉
Starting with the first case: |A〉 = |0〉, |B〉 = |0〉
0. Initial values:
$|0\rangle|0\rangle\frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right) = |0\rangle|0\rangle\frac{1}{\sqrt{2}}\left(|0\rangle + \left(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}i\right)|1\rangle\right) = \frac{1}{\sqrt{2}}|000\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|001\rangle$, where $\frac{1}{\sqrt{2}}e^{i\pi/4} = \frac{1}{2} + \frac{1}{2}i$
1. Applying the CNOT gate on the first qubit and the third:
$\frac{1}{\sqrt{2}}|000\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|001\rangle$
2. Applying the CNOT gate on the second qubit and the third:
$\frac{1}{\sqrt{2}}|000\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|001\rangle$
3. Applying another CNOT gate on the second qubit and the third:
$\frac{1}{\sqrt{2}}|000\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|011\rangle$
4. Applying the CNOT gate on the first and third qubit:
$\frac{1}{\sqrt{2}}|000\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|111\rangle$
5. Applying the T† gates on the first and second qubits and T gate on the third qubit:
$\frac{1}{\sqrt{2}}|000\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}\,e^{-i\pi/4}\,e^{-i\pi/4}\,e^{i\pi/4}|111\rangle = \frac{1}{\sqrt{2}}|000\rangle + \frac{1}{\sqrt{2}}|111\rangle$
6. Applying the CNOT gate on qubits two and three:
$\frac{1}{\sqrt{2}}|000\rangle + \frac{1}{\sqrt{2}}|101\rangle$
7. Applying the CNOT gate on first and third qubits:
$\frac{1}{\sqrt{2}}|000\rangle + \frac{1}{\sqrt{2}}|001\rangle$
8. Applying Hadamard gate on qubit three:
$\frac{1}{\sqrt{2}}|00\rangle\frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) + \frac{1}{\sqrt{2}}|00\rangle\frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right) = \frac{1}{2}|000\rangle + \frac{1}{2}|001\rangle + \frac{1}{2}|000\rangle - \frac{1}{2}|001\rangle = |000\rangle$
9. Applying the S gate on the third qubit:
$|000\rangle$
Finally, measuring: $(1)^2 = 1$, so there is a 100% probability of measuring $|000\rangle$.
Going into the second case: |A〉 = |1〉, |B〉 = |0〉
0. Initial values:
$|1\rangle|0\rangle\frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right) = |1\rangle|0\rangle\frac{1}{\sqrt{2}}\left(|0\rangle + \left(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}i\right)|1\rangle\right) = \frac{1}{\sqrt{2}}|100\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|101\rangle$
1. Applying the CNOT gate on the first qubit and the third:
$\frac{1}{\sqrt{2}}|101\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|100\rangle$
2. Applying the CNOT gate on the second qubit and the third:
$\frac{1}{\sqrt{2}}|101\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|100\rangle$
3. Applying another CNOT gate on the second qubit and the third:
$\frac{1}{\sqrt{2}}|111\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|100\rangle$
4. Applying the CNOT gate on the first and third qubit:
$\frac{1}{\sqrt{2}}|011\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|100\rangle$
5. Applying the T† gates on the first and second qubits and T gate on the third qubit:
$\frac{1}{\sqrt{2}}\,e^{-i\pi/4}\,e^{i\pi/4}|011\rangle + \frac{1}{\sqrt{2}}\,e^{i\pi/4}\,e^{-i\pi/4}|100\rangle = \frac{1}{\sqrt{2}}|011\rangle + \frac{1}{\sqrt{2}}|100\rangle$
6. Applying the CNOT gate on qubits two and three:
$\frac{1}{\sqrt{2}}|001\rangle + \frac{1}{\sqrt{2}}|100\rangle$
7. Applying the CNOT gate on first and third qubits:
$\frac{1}{\sqrt{2}}|101\rangle + \frac{1}{\sqrt{2}}|100\rangle$
8. Applying Hadamard gate on qubit three:
$\frac{1}{\sqrt{2}}|10\rangle\frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right) + \frac{1}{\sqrt{2}}|10\rangle\frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) = \frac{1}{2}|100\rangle - \frac{1}{2}|101\rangle + \frac{1}{2}|100\rangle + \frac{1}{2}|101\rangle = |100\rangle$
9. Applying the S gate on the third qubit:
$|100\rangle$
Finally, measuring: $(1)^2 = 1$, so there is a 100% probability of measuring $|100\rangle$.
Going into the third case: |A〉 = |0〉, |B〉 = |1〉
0. Initial values:
$|0\rangle|1\rangle\frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right) = |0\rangle|1\rangle\frac{1}{\sqrt{2}}\left(|0\rangle + \left(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}i\right)|1\rangle\right) = \frac{1}{\sqrt{2}}|010\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|011\rangle$
1. Applying the CNOT gate on the first qubit and the third:
$\frac{1}{\sqrt{2}}|010\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|011\rangle$
2. Applying the CNOT gate on the second qubit and the third:
$\frac{1}{\sqrt{2}}|011\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|010\rangle$
3. Applying another CNOT gate on the second qubit and the third:
$\frac{1}{\sqrt{2}}|001\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|010\rangle$
4. Applying the CNOT gate on the first and third qubit:
$\frac{1}{\sqrt{2}}|101\rangle + \frac{1}{\sqrt{2}}e^{i\pi/4}|010\rangle$
5. Applying the T† gates on the first and second qubits and T gate on the third qubit:
$\frac{1}{\sqrt{2}}\,e^{-i\pi/4}\,e^{i\pi/4}|101\rangle + \frac{1}{\sqrt{2}}\,e^{i\pi/4}\,e^{-i\pi/4}|010\rangle = \frac{1}{\sqrt{2}}|101\rangle + \frac{1}{\sqrt{2}}|010\rangle$
6. Applying the CNOT gate on qubits two and three:
$\frac{1}{\sqrt{2}}|111\rangle + \frac{1}{\sqrt{2}}|010\rangle$
7. Applying the CNOT gate on first and third qubits:
$\frac{1}{\sqrt{2}}|011\rangle + \frac{1}{\sqrt{2}}|010\rangle$
8. Applying Hadamard gate on qubit three:
$\frac{1}{\sqrt{2}}|01\rangle\frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right) + \frac{1}{\sqrt{2}}|01\rangle\frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) = \frac{1}{2}|010\rangle - \frac{1}{2}|011\rangle + \frac{1}{2}|010\rangle + \frac{1}{2}|011\rangle = |010\rangle$
9. Applying the S gate on the third qubit:
$|010\rangle$
Finally, measuring: $(1)^2 = 1$, so there is a 100% probability of measuring $|010\rangle$.
Going into the last case: |A〉 = |1〉, |B〉 = |1〉
0. Initial values:
|1〉|1〉 1√
2
(
|0〉+ e iπ4 |1〉
)
∴ |1〉|1〉 1√
2
(
|0〉+ ( 1√
2
+ 1√
2
i)|1〉
)
∴ |1〉|1〉 1√
2
|0〉+ |1〉|1〉 1√
2
(
1√
2
+ 1√
2
i
)
|1〉 ∴ 1√
2
|1〉|1〉|0〉 + 1√
2
(
1√
2
+ 1√
2
i
)
|1〉|1〉|1〉
19
1. Applying the CNOT gate on the first qubit and the third:
1√
2
|1〉|1〉|1〉 + 1√
2
(
1√
2
+ 1√
2
i
)
|1〉|1〉|0〉
2. Applying the CNOT gate on the second qubit and the third:
1√
2
|1〉|1〉|0〉 + 1√
2
(
1√
2
+ 1√
2
i
)
|1〉|1〉|1〉
3. Applying another CNOT gate on the second qubit and the third:
1√
2
|1〉|1〉|0〉 + 1√
2
(
1√
2
+ 1√
2
i
)
|1〉|0〉|1〉
4. Applying the CNOT gate on the first and third qubit:
1√
2
|1〉|1〉|0〉 + 1√
2
(
1√
2
+ 1√
2
i
)
|0〉|0〉|1〉
5. Applying the T † gates on the first and second qubits and T gate on the third
qubit:
1√
2
(
1√
2
- 1√
2
i
)(
1√
2
- 1√
2
i
)
|1〉|1〉|0〉 + 1√
2
(
1√
2
+ 1√
2
i
)(
1√
2
+ 1√
2
i
)
|0〉|0〉|1〉 ∴ 1√
2
(
-
i
)
|1〉|1〉|0〉 + 1√
2
i |0〉|0〉|1〉
6. Applying the CNOT gate on qubits two and three:
1√
2
(
-i
)
|1〉|1〉|0〉 + 1√
2
i |0〉|1〉|1〉
7. Applying the CNOT gate on first and third qubits:
\[ \frac{1}{\sqrt{2}}(-i)\,|1\rangle|1\rangle|0\rangle + \frac{1}{\sqrt{2}}\,i\,|1\rangle|1\rangle|1\rangle \]
8. Applying Hadamard gate on qubit three:
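\[ \frac{1}{\sqrt{2}}(-i)\,|1\rangle|1\rangle\frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) + \frac{1}{\sqrt{2}}\,i\,|1\rangle|1\rangle\frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right) \]
\[ \therefore -\frac{i}{2}|1\rangle|1\rangle|0\rangle - \frac{i}{2}|1\rangle|1\rangle|1\rangle + \frac{i}{2}|1\rangle|1\rangle|0\rangle - \frac{i}{2}|1\rangle|1\rangle|1\rangle \therefore -i\,|1\rangle|1\rangle|1\rangle \]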
9. Applying the S gate on the third qubit:
\[ -i\,|1\rangle|1\rangle\,(i)\,|1\rangle \therefore 1\,|1\rangle|1\rangle|1\rangle \]
Finally, measuring the value gives $(1)^2 = 1$, i.e. a 100% probability of $|1\rangle|1\rangle|1\rangle$.
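The hand derivation above can be cross-checked with a small state-vector sketch in plain Python. It follows steps 0 through 9 as written, under two assumptions that are not stated explicitly in the text: in steps 3, 4, 6, and 7 the third qubit is taken as the control (this is inferred from how the written states change), and the S gate is modeled as a phase of $i$ on $|1\rangle$. The helper names are illustrative only.

```python
import cmath
import math

def apply_cnot(state, control, target):
    """Flip `target` in every basis state where `control` is 1."""
    new = {}
    for bits, amp in state.items():
        b = list(bits)
        if b[control]:
            b[target] ^= 1
        new[tuple(b)] = new.get(tuple(b), 0) + amp
    return new

def apply_phase(state, qubit, angle):
    """Multiply the amplitude by e^{i*angle} where `qubit` is 1 (T, T-dagger, S)."""
    return {bits: amp * (cmath.exp(1j * angle) if bits[qubit] else 1)
            for bits, amp in state.items()}

def apply_hadamard(state, qubit):
    """Apply a Hadamard gate to `qubit`."""
    new = {}
    r = 1 / math.sqrt(2)
    for bits, amp in state.items():
        for out in (0, 1):
            b = list(bits)
            b[qubit] = out
            sign = -1 if (bits[qubit] and out) else 1
            new[tuple(b)] = new.get(tuple(b), 0) + sign * r * amp
    return new

def logical_and(a, b):
    """Run steps 0-9 on |a>|b>|T> and return the non-zero amplitudes."""
    t = cmath.exp(1j * math.pi / 4)
    state = {(a, b, 0): 1 / math.sqrt(2), (a, b, 1): t / math.sqrt(2)}  # step 0
    state = apply_cnot(state, 0, 2)               # step 1
    state = apply_cnot(state, 1, 2)               # step 2
    state = apply_cnot(state, 2, 1)               # step 3 (assumed control: third qubit)
    state = apply_cnot(state, 2, 0)               # step 4 (assumed control: third qubit)
    state = apply_phase(state, 0, -math.pi / 4)   # step 5: T-dagger on first qubit
    state = apply_phase(state, 1, -math.pi / 4)   # step 5: T-dagger on second qubit
    state = apply_phase(state, 2, math.pi / 4)    # step 5: T on third qubit
    state = apply_cnot(state, 2, 1)               # step 6
    state = apply_cnot(state, 2, 0)               # step 7
    state = apply_hadamard(state, 2)              # step 8
    state = apply_phase(state, 2, math.pi / 2)    # step 9: S on third qubit
    return {bits: amp for bits, amp in state.items() if abs(amp) > 1e-9}

for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b))
```

Under these assumptions, each of the four inputs ends in the single basis state $|a\rangle|b\rangle|a \,\&\, b\rangle$, which agrees with the outcomes derived by hand.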
2.3.2 Logical-AND Uncomputation Gate
Measure-and-Fixup Method
This gate contains no T or T† gates, so its T-count is 0. A reversible implementation of the logical-AND function takes two inputs, the classical binary values 0 and 1 represented as qubits, and at the end of the computation the two original inputs must be returned as classical values. Since classical 0s and 1s carry no phase, any phase picked up during the computation has to be removed. Phase correction is the purpose of this uncomputation gate. As the third qubit (the ancilla) in figures such as Figure 2.8(a) shows, measuring the ancilla collapses it from a superposition to a classical bit. To restore the ancilla to $|T\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$ for another computation, the classical bit needs to be cleared and a Hadamard gate followed by a T gate needs to be applied.
Instead of working through all the cases by hand, the quantum circuit simulator Quirk [28] was used to verify the correct operation of the Measure-and-Fixup method. Four test cases were simulated (see Figures 2.8(a), 2.8(b), 2.8(c), and 2.8(d)); most importantly, the two top qubits did not change in any of them. The circuits in Figures 2.8(a), 2.8(b), 2.8(c), and 2.8(d) have four steps: first the ancilla (the third qubit) is set, then the inputs (the first and second qubits) are adjusted to one of the four test cases, next the qubits go through the logical-AND computation gate and then the Measure-and-Fixup gate, and finally the qubits are measured.
Note that applying the uncomputation gate in the middle of a quantum circuit places a measurement in the middle of that circuit: the measured qubit loses its quantum properties, and the quantum circuit becomes slower. The gate is shown in Figures 2.7(a) and 2.7(b). As the name suggests, Measure refers to measuring the third qubit, which is in a superposition, and Fixup refers to the controlled-Z gate fixing the phase errors on the first two qubits.
Figure 2.7: The logical-AND uncomputation gate: Measure-and-Fixup
(a) Graphical Representation
(b) Quantum Representation
Computation-Reversal Method
An alternative to the Measure-and-Fixup method is to reverse the logical-AND computation gate, as shown in Figures 2.9(a) and 2.9(b). This gate has two T gates and one T† gate, so its T-count is three. Unlike the Measure-and-Fixup method, which returns a classical bit on the third output, the Computation-Reversal method returns the ancilla value $|T\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$.
Instead of working through all the cases by hand, the quantum circuit simulator Quirk [28] was used to verify the correct operation of the Computation-Reversal method. Figure 2.10(a) shows the original ancilla value $|T\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$. Four test cases were simulated (see Figures 2.10(b), 2.10(c), 2.10(d), and 2.10(e)) to verify that, in the end, the ancilla stays at its original superposition value and, most importantly, that the two top qubits do not change. The circuits in Figures 2.10(b), 2.10(c), 2.10(d), and 2.10(e) have five steps: first the ancilla (the third qubit) is set, then the inputs (the first and second qubits) are adjusted to one of the four test cases, next the qubits go through the logical-AND computation gate, then through the Computation-Reversal gate, and finally the qubits are measured.
Figure 2.8: Testing Measure-and-Fixup gate
(a) Both inputs are |0〉
(b) The first input is a |1〉
(c) The second input is a |1〉
(d) Both inputs are a |1〉
2.3.3 Comparison between the two Methods
When comparing these two gates, the Measure-and-Fixup uncomputation gate is the resource-efficient choice when implemented with error correcting codes such as surface codes. Quantum circuits based on the Computation-Reversal uncomputation gate are resource efficient in cases where error correcting codes cannot be or are not used, such as with near-term devices. Each gate is now examined to show why it is the best suited option for its respective implementation.
Figure 2.9: The logical-AND uncomputation gate: Computation-Reversal
(a) Graphical Representation
(b) Quantum Representation
When implemented with error correcting codes such as surface codes, the logical-AND computation gate saves resources because it uses an ancilla $|A\rangle$ set to $\frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$ as opposed to a logical T gate. To realize a logical T gate, ancillas set to $|A\rangle$ and $|Y\rangle$ (where $|Y\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/2}|1\rangle\right)$) must be created with one or more rounds of state distillation [29] [30]. According to [31] [29] [30], $|A\rangle$ distillation requires at least 15 logical qubits, 15 T gates, 15 measurements, and arrays of CNOT gates, and $|Y\rangle$ distillation requires at least 7 logical qubits, 7 S gates, 7 measurements, and arrays of CNOT gates. If the surface code scheme in [31] is used, 3600 physical qubits are required per logical qubit. As a result, generating a $|Y\rangle$ state needs 25200 physical qubits and generating an $|A\rangle$ state needs 54000 physical qubits. Thus, the logical-AND gate saves qubits and quantum gates. The uncomputation gate with Measure-and-Fixup does not require T gates, avoiding the costly $|A\rangle$ and $|Y\rangle$ state distillations. Considering that a single logical T gate requires at least 22 measurements, the penalty of one additional measurement in the Measure-and-Fixup uncomputation gate is negligible and is offset by the overall resource savings from avoiding logical T gates.
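The qubit and measurement totals quoted above follow from simple multiplication. The short sketch below reproduces them from the per-state figures given in the text; the constant and function names are illustrative, not taken from [29], [30], or [31].

```python
# Figures quoted in the text for the surface code scheme in [31].
PHYSICAL_PER_LOGICAL = 3600
A_STATE = {"logical_qubits": 15, "measurements": 15}
Y_STATE = {"logical_qubits": 7, "measurements": 7}

def physical_qubits(distillation):
    """Physical qubits needed for one round of the given distillation."""
    return distillation["logical_qubits"] * PHYSICAL_PER_LOGICAL

a_qubits = physical_qubits(A_STATE)   # 15 * 3600
y_qubits = physical_qubits(Y_STATE)   # 7 * 3600
t_gate_measurements = A_STATE["measurements"] + Y_STATE["measurements"]

print(a_qubits, y_qubits, t_gate_measurements)
```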
Figure 2.10: Testing Computation-Reversal gate
(a) Ancilla |T 〉 result
(b) Both inputs are |0〉
(c) The first input is a |1〉
(d) The second input is a |1〉
(e) Both inputs are a |1〉
For situations where error correcting codes are not used, such as near-term quantum technologies that do not support error correcting schemes (such as surface codes), the ancilla $|A\rangle$ used in the logical-AND computation gate is realized by applying a Hadamard gate and then a T gate to an ancilla set to $|0\rangle$. The uncomputation gate with Computation-Reversal has the advantage in computation speed in cases where fault tolerant schemes are not used. Computation time is important because qubits can only maintain superposition states for a finite time; that is, they are limited by their coherence times. For instance, the IBM quantum machine has a coherence time of 100 µs according to [32], and coherence times of up to half an hour have been reported in the literature [33]. Thus, to maximize the amount of computation, time-intensive operations should be used sparingly. Operation times for gates and for the measurement operation have been reported in the literature [34] [35] [32]. For example, in [34], a one-qubit gate (such as a T gate) has a computation time of 1 µs, a two-qubit gate (such as a CNOT) has a computation time of 10 µs, and a measurement has a computation time of 200 µs. To estimate the computation time of each gate in Figures 2.6(b), 2.7(b), and 2.9(b), a three-step algorithm is used: (i) calculate the number of circuit layers (the depth), (ii) calculate the maximum time needed to perform each gate layer, and (iii) sum the results. The values in [34] are used for the computation time estimates. The logical-AND computation gate and the Computation-Reversal uncomputation gate both have a depth of 7 and a total computation time of 43 µs. The Measure-and-Fixup uncomputation gate has a depth of 4 and a total computation time of 212 µs. The Computation-Reversal uncomputation gate is therefore roughly 5 times faster and permits roughly 5 times as much computation within a given coherence time.
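The three-step estimate can be sketched in a few lines of Python. The gate times are the values quoted from [34]. The layer-by-layer breakdown of each gate is an assumption made here for illustration: it is chosen only so that it matches the stated depths (7 and 4) and totals (43 µs and 212 µs), and is not read off Figures 2.6(b), 2.7(b), or 2.9(b).

```python
# Gate times in microseconds, as quoted from [34] in the text.
GATE_TIME_US = {"one_qubit": 1, "two_qubit": 10, "measure": 200}

def circuit_time_us(layers):
    """Sum, over all layers, the slowest operation in each layer."""
    return sum(max(GATE_TIME_US[gate] for gate in layer) for layer in layers)

# Assumed layering: four two-qubit layers and three one-qubit layers (depth 7).
computation_layers = [
    ["two_qubit"], ["two_qubit"], ["two_qubit"],
    ["one_qubit", "one_qubit", "one_qubit"],
    ["two_qubit"], ["one_qubit"], ["one_qubit"],
]
# Assumed layering: two one-qubit layers, one measurement, one two-qubit layer (depth 4).
measure_and_fixup_layers = [["one_qubit"], ["measure"], ["two_qubit"], ["one_qubit"]]

reversal_time = circuit_time_us(computation_layers)
fixup_time = circuit_time_us(measure_and_fixup_layers)
print(reversal_time, fixup_time, fixup_time / reversal_time)
```

With these assumed layers the ratio of the two totals is a little under 5, which is where the "roughly 5 times faster" figure comes from.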
2.4 Carry Look-Ahead Addition Review
2.4.1 Classical CLA
The Full Adder adds two 1-bit numbers, or inputs, and a Carry in if applicable. It produces two outputs: Sum and Carry out. To add two 2-bit, 3-bit, or n-bit numbers, the Full Adders have to be connected in parallel [36], where the Carry out of the first stage goes into the Carry in of the second stage, and so on. This parallel arrangement of Full Adders is called a Ripple Carry Adder [36].
The problem with the Ripple Carry Adder is that the previous Carry out must be known before the next Sum and the next Carry out can be computed. The Carry out values therefore have to propagate all the way to the last stage before the final Sum and Carry out hold the right values. This creates a propagation delay, defined as the number n of Full Adders in the Ripple Carry Adder times the time each Full Adder takes to produce its Sum and Carry out values [36].
The propagation delay of the Ripple Carry Adder can be reduced, at the cost of a more complex circuit, by a circuit called the Carry Look-Ahead Adder. The Carry Look-Ahead Adder relies on two values, carry propagate and carry generate [36]. Carry propagate is defined as $p_i = a_i \oplus b_i$ and is responsible for propagating the Carry in to the Carry out. Carry generate is defined as $g_i = a_i \,\&\, b_i$; it produces the Carry out $c_{i+1}$ when both $a_i$ and $b_i$ are set to 1, regardless of the Carry in $c_i$. The Sum $s_i$ is defined as $s_i = a_i \oplus b_i \oplus c_i$; substituting $p_i$ gives $s_i = p_i \oplus c_i$. The Carry out $c_{i+1}$ is defined as $c_{i+1} = p_i \,\&\, c_i \mid g_i$.
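An example is shown to illustrate the Carry Look-Ahead addition of two 4-bit numbers.
Using $c_{i+1} = p_i \,\&\, c_i \mid g_i$:
$i = 0$: $c_1 = p_0 \,\&\, c_0 \mid g_0$
$i = 1$: $c_2 = p_1 \,\&\, c_1 \mid g_1 = p_1 \,\&\, (p_0 \,\&\, c_0 \mid g_0) \mid g_1 = p_1 \,\&\, p_0 \,\&\, c_0 \mid p_1 \,\&\, g_0 \mid g_1$
$i = 2$: $c_3 = p_2 \,\&\, c_2 \mid g_2 = p_2 \,\&\, p_1 \,\&\, p_0 \,\&\, c_0 \mid p_2 \,\&\, p_1 \,\&\, g_0 \mid p_2 \,\&\, g_1 \mid g_2$
$i = 3$: $c_4 = p_3 \,\&\, c_3 \mid g_3 = p_3 \,\&\, p_2 \,\&\, p_1 \,\&\, p_0 \,\&\, c_0 \mid p_3 \,\&\, p_2 \,\&\, p_1 \,\&\, g_0 \mid p_3 \,\&\, p_2 \,\&\, g_1 \mid p_3 \,\&\, g_2 \mid g_3$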
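The expanded carry equations can be checked directly in software. The sketch below evaluates the four carries from the propagate and generate bits exactly as expanded above and forms the sum bits with $s_i = p_i \oplus c_i$. The function name and the little-endian bit ordering are choices made here for illustration.

```python
def cla_add_4bit(a, b, c0=0):
    """Add two 4-bit numbers using the expanded carry look-ahead equations."""
    a_bits = [(a >> i) & 1 for i in range(4)]   # a_bits[0] is the least significant bit
    b_bits = [(b >> i) & 1 for i in range(4)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # carry propagate
    g = [x & y for x, y in zip(a_bits, b_bits)]  # carry generate

    # Every carry depends only on p, g and c0, never on another carry.
    # In Python, & binds more tightly than |, matching the text's notation.
    c1 = p[0] & c0 | g[0]
    c2 = p[1] & p[0] & c0 | p[1] & g[0] | g[1]
    c3 = p[2] & p[1] & p[0] & c0 | p[2] & p[1] & g[0] | p[2] & g[1] | g[2]
    c4 = (p[3] & p[2] & p[1] & p[0] & c0 | p[3] & p[2] & p[1] & g[0]
          | p[3] & p[2] & g[1] | p[3] & g[2] | g[3])

    carries = [c0, c1, c2, c3]
    sum_bits = [p[i] ^ carries[i] for i in range(4)]
    return sum(bit << i for i, bit in enumerate(sum_bits)) | (c4 << 4)

print(cla_add_4bit(0b1011, 0b0110))
```

Because no carry waits on the previous one, all four can be evaluated at the same time, which is the source of the reduced propagation delay.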
2.4.2 Literature Review
A literature review was carried out for the Quantum Carry Look-Ahead Adder. In the later chapters, the three works in [19], [20], and [23] are used because they are suitable for comparison with the proposed work in Chapters 3 and 4. The remaining two articles, [21] and [22], were read to gain a better understanding of Carry Look-Ahead addition.
In the first work [19], the authors improve the efficiency of the Quantum Carry Look-Ahead Adder, achieving a depth of $O(\log n)$ with $O(n)$ ancillary qubits. The work covers out-of-place and in-place circuits, as well as extensions of these two circuits such as comparison and subtraction. The circuit is optimized for ancillas, size, and depth.
In the next work [20], the authors focus on the delay, gate count, and quantum cost of two circuits, the out-of-place and in-place Carry Look-Ahead Adders, which are built from reversible gates such as the CNOT, Peres, TR, and Toffoli gates. The purpose was to optimize the circuits in these three parameters of delay, gate count, and quantum cost.
In the third work [21], the authors focus on reducing the number of qubits to $O(n/\log n)$ with a depth of $O(\log n)$ and a size of $O(n)$ for the Quantum Carry Look-Ahead Adder. This circuit contains only Toffoli gates. The main issue the authors face is that decreasing the number of ancillary qubits increases the depth and size of the Carry Look-Ahead Adder.
The fourth work [22] aims to reduce the quantum cost, delay, garbage outputs, and number of gates of the Quantum Carry Look-Ahead Adder.
In the final work [23], the authors improve the requirements and performance of the out-of-place and in-place Quantum Carry Look-Ahead Adder using the measurement-based method. They also describe another Quantum Carry Look-Ahead Adder, called graph-state, whose size and depth were compared to logical qubits to obtain a good comparison between the new Quantum Carry Look-Ahead Adders and the existing ones.
Chapter 3
Design of Proposed Out-of-place QCLA
Two designs are proposed: the first circuit is called low T-count and the second is called high speed. For simplicity, the first proposed circuit is described, and the second proposed circuit is noted wherever it differs. The proposed (low T-count) out-of-place QCLA circuit is shown in Figure 3.1 for the case of adding two eight-qubit numbers $|a\rangle$ and $|b\rangle$. Out-of-place means that the Sum qubits are realized on ancilla qubits. At the end of the computation, location A with $|a\rangle$ and location B with $|b\rangle$ are unchanged. The sum bits $s_1$ through $s_n$ are realized on n ancillas initialized to the value $\frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$, while $s_0$ is realized on an ancilla initialized to $|0\rangle$. These ancillas are stored in location Z. Another set of ancillas is stored in location X. They are also initialized to $\frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$ and are used in the P-rounds step. The number of ancillas in location X is $n - w(n) - \lfloor \log n \rfloor$, where $w(n)$ is the number of ones in the binary expansion of n [19], with $w(n) = n - \sum_{y=1}^{\infty} \left\lfloor \frac{n}{2^y} \right\rfloor$. All ancillas except the one used for $s_0$ are initialized to $\frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$ because the logical-AND computation gates used in the proposed circuit require these ancillas to function correctly. In the first proposed circuit, $n - w(n) - \lfloor \log n \rfloor$ ancillas are turned into classical bits at the end of the computation and need to be reset before they can be used in other computations. In the second proposed circuit (high speed), these ancillas can be reused in later computations because they are restored to their initial value (this is described in step 6). Finally, the remaining ancillas, those in location Z, hold the sum.
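The ancilla count for location X is easy to evaluate numerically. The sketch below computes $w(n)$ both as a direct count of ones and through the floor-sum identity, and then the number of X ancillas. The function names are illustrative, and the logarithm is taken to be base 2, which is an assumption the text leaves implicit.

```python
def ones_count(n):
    """w(n): number of ones in the binary expansion of n."""
    return bin(n).count("1")

def ones_count_via_floor_sum(n):
    """w(n) = n - sum over y >= 1 of floor(n / 2^y); the sum stops once terms are 0."""
    total, y = 0, 1
    while n >> y:
        total += n >> y   # n >> y equals floor(n / 2^y)
        y += 1
    return n - total

def x_ancilla_count(n):
    """n - w(n) - floor(log2 n), the number of ancillas in location X."""
    return n - ones_count(n) - (n.bit_length() - 1)

print(x_ancilla_count(8))
```

For the eight-qubit example of Figure 3.1 this gives four ancillas, matching the wires labeled X0 through X3.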
The proposed (low T-count) out-of-place QCLA circuit is based on the NOT gate, the CNOT gate, and the Toffoli gate, along with the logical-AND computation gate and the logical-AND Measure-and-Fixup uncomputation gate presented in [27]. An algorithm based on the design methodology presented in [19] is used to implement an out-of-place QCLA from these quantum gates. By using logical-AND computation gates and logical-AND Measure-and-Fixup uncomputation gates, the proposed design methodology saves T gates.
To manipulate the values in locations A and B, two 2-D arrays are required: one for propagation, p[j, l], and one for generation, g[j, k]. The variables i, j, k, and l are the indices that identify the location of the propagate, generate, and carry values.
3.1 Procedure
The steps in designing the proposed (low T-count) out-of-place QCLA are:
1. For i = 0 to n-1, apply the logical-AND computation gate to A[i], B[i], and Z[i+1]. This generates three outputs: the first is A[i], the second is B[i], and the third is A[i] & B[i], whose location is labeled g[i, i+1].
2. For i = 1 to n-1, apply the CNOT gate to locations A[i] and B[i]. The value at A[i] is maintained, while the value at B[i] changes to B[i] = A[i] ⊕ B[i], and that location is labeled p[i, i+1].
3. P-rounds: The logical-AND computation gate has three inputs and three outputs. The first and last inputs pass through to the first and last outputs. The location of the first input and output is p[j, l], and the location of the third input and output is p[l, k]. The second input is an ancilla stored at location X[d]. The second output is p[j, l] & p[l, k], and the value is saved to the output location p[j, k]. The indices are $j = 2^t m$, $k = 2^t m + 2^t$, and $l = 2^t m + 2^{t-1}$. They are determined by t and m, where t runs from 1 to $\lfloor \log n \rfloor - 1$ and, nested in the t loop, $1 \le m < \lfloor n/2^t \rfloor$. The index d is determined by the sum j + k: the sums are sorted from lowest to highest, the lowest receiving index 0 and the highest receiving index $\sum_{y=1}^{\infty} \left\lfloor \frac{n}{2^y} \right\rfloor - \lfloor \log n \rfloor - 1$.
4. G-rounds: This step uses Toffoli gates, which have three inputs and three outputs. The locations of the three inputs are g[j, l], p[l, k], and g[l, k], respectively. The first and second outputs are the same as the inputs, so the values pass through and the locations are unchanged. The third output is g[l, k] ⊕ (g[j, l] & p[l, k]), and the value is saved to the output location g[j, k]. The indices are $j = 2^t m$, $k = 2^t m + 2^t$, and $l = 2^t m + 2^{t-1}$. They are determined by t and m, where t runs from 1 to $\lfloor \log n \rfloor$ and, nested in the t loop, $0 \le m < \lfloor n/2^t \rfloor$.
5. C-rounds: This step uses Toffoli gates, which have three inputs and three outputs. The locations of the three inputs are g[0, l], p[l, k], and g[l, k], respectively. The first and second outputs are the same as the inputs, so the values pass through and the locations are unchanged. The third output is g[l, k] ⊕ (g[0, l] & p[l, k]), and the value is saved to the output location g[0, k]. The indices are $l = 2^t m$ and $k = 2^t m + 2^{t-1}$. They are determined by t and m, where t starts at $\lfloor \log(2n/3) \rfloor$ and decreases to 1 and, nested in the t loop, $1 \le m \le \lfloor (n - 2^{t-1})/2^t \rfloor$.
6. P-erase-rounds: The logical-AND Measure-and-Fixup uncomputation gate has three inputs and three outputs. The first and last inputs pass through to the first and last outputs. The location of the first input and output is p[j, l], and the location of the third input and output is p[l, k]. The middle input is at location p[j, k], which holds the value stored by the logical-AND computation; this value is uncomputed and the output is a classical bit (proposed circuit 1: low T-count). For proposed circuit 2 (high speed), the Measure-and-Fixup gates are replaced with Computation-Reversal gates, so the middle output is the original X[d] ancilla value, and the location of the ancilla value is labeled Xout[d]. For both circuits, the indices are $j = 2^t m$, $k = 2^t m + 2^t$, and $l = 2^t m + 2^{t-1}$. They are determined by t and m, where t starts at $\lfloor \log n \rfloor - 1$ and decreases to 1 and, nested in the t loop, $1 \le m < \lfloor n/2^t \rfloor$. The definition of index d is the same as in P-rounds.
7. For i = 1 to n-1, apply the CNOT gate to locations p[i, i+1] and g[0, i] to obtain $s_i$. For i = 0, apply the CNOT gate so that the ancilla value in Z is set to Z = B[i] ⊕ Z.
8. Finally, for i = 0, $s_i$ is equal to A[i] ⊕ Z. For i = 1 to n-1, p[i, i+1], which sits in location B[i], is XORed with A[i], which returns the original $b_i$ value. For i = n, $s_i$, the last Sum value, is taken from g[0, i].
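The index bookkeeping in steps 1 through 8 can be sanity-checked with a purely classical sketch. It is not a quantum simulation: each logical-AND gate is replaced by its classical action (writing an AND into a fresh bit) and each Toffoli gate by an XOR into its target, with p and g held in dictionaries keyed by index pairs. All function names are illustrative, the logarithms are assumed to be base 2, and the check covers only the small n used in the test.

```python
def floor_log2(x):
    """Floor of log2(x) for x >= 1; returns -1 for x == 0."""
    return x.bit_length() - 1

def p_round_schedule(n):
    """(j, l, k) triples of the P-rounds (step 3), in execution order."""
    triples = []
    for t in range(1, floor_log2(n)):
        for m in range(1, n >> t):
            j = (1 << t) * m
            triples.append((j, j + (1 << (t - 1)), j + (1 << t)))
    return triples

def x_index(n):
    """Ancilla index d for each p[j, k], obtained by ranking the sums j + k."""
    ranked = sorted(p_round_schedule(n), key=lambda jlk: jlk[0] + jlk[2])
    return {(j, k): d for d, (j, l, k) in enumerate(ranked)}

def qcla_out_of_place_classical(a, b, n):
    """Classical reference for the out-of-place procedure; returns the (n+1)-bit sum."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    g = {(i, i + 1): A[i] & B[i] for i in range(n)}        # step 1
    p = {(i, i + 1): A[i] ^ B[i] for i in range(1, n)}     # step 2
    for j, l, k in p_round_schedule(n):                    # step 3: P-rounds
        p[j, k] = p[j, l] & p[l, k]
    for t in range(1, floor_log2(n) + 1):                  # step 4: G-rounds
        for m in range(0, n >> t):
            j = (1 << t) * m
            l, k = j + (1 << (t - 1)), j + (1 << t)
            g[j, k] = g[l, k] ^ (g[j, l] & p[l, k])
    for t in range(floor_log2((2 * n) // 3), 0, -1):       # step 5: C-rounds
        for m in range(1, (n - (1 << (t - 1))) // (1 << t) + 1):
            l = (1 << t) * m
            k = l + (1 << (t - 1))
            g[0, k] = g[l, k] ^ (g[0, l] & p[l, k])
    # Step 6 only uncomputes the X ancillas, so it has no classical effect.
    s = [A[0] ^ B[0]]                                      # steps 7-8 for i = 0
    s += [p[i, i + 1] ^ g[0, i] for i in range(1, n)]      # step 7
    s.append(g[0, n])                                      # step 8 for i = n
    return sum(bit << i for i, bit in enumerate(s))

print(x_index(8))
print(qcla_out_of_place_classical(0b10110101, 0b01101110, 8))
```

For n = 8 the ranking assigns p[2,4], p[4,6], p[4,8], and p[6,8] to d = 0, 1, 2, 3, which agrees with the X wire labels in Figure 3.1.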
3.2 T-count
The T-count for the first proposed circuit (low T-count) Carry Look-Ahead Adder shown in Figure 3.1 is $22n - 11w(n) - 11\lfloor \log n \rfloor - 7$.
• The first step has n logical-AND computation gates, each with a T-count of 4.
• The second step has a T-count of 0.
• The third step has a T-count of $4(n - w(n) - \lfloor \log n \rfloor)$.
• Step 4 has $n - w(n)$ Toffoli gates with a T-count of $7(n - w(n))$.
• Step 5 has $n - \lfloor \log n \rfloor - 1$ Toffoli gates with a T-count of $7(n - \lfloor \log n \rfloor - 1)$.
• For steps 6, 7, and 8, the T-count is 0.
Table 3.1 shows the equations for the T-count for the two proposed circuits and
the three in literature.
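The per-step contributions listed above can be summed and compared against the closed-form expression. The sketch below does this for the low T-count circuit, and also evaluates the high speed expression under the assumption, taken from Section 2.3.2, that each Computation-Reversal gate in step 6 adds a T-count of 3. Function names are illustrative and the logarithm is assumed to be base 2.

```python
def floor_log2(n):
    """Floor of log2(n) for n >= 1."""
    return n.bit_length() - 1

def w(n):
    """Number of ones in the binary expansion of n."""
    return bin(n).count("1")

def prop1_step_counts(n):
    """T-count contributed by each step of the low T-count out-of-place circuit."""
    return {
        "step 1": 4 * n,
        "step 3": 4 * (n - w(n) - floor_log2(n)),
        "step 4": 7 * (n - w(n)),
        "step 5": 7 * (n - floor_log2(n) - 1),
    }

def prop1_closed_form(n):
    return 22 * n - 11 * w(n) - 11 * floor_log2(n) - 7

def prop2_closed_form(n):
    return 25 * n - 14 * w(n) - 14 * floor_log2(n) - 7

n = 8
print(prop1_step_counts(n), prop1_closed_form(n), prop2_closed_form(n))
```

For n = 8 the four contributions are 32, 16, 49, and 28, which sum to the closed-form value of 125.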
Table 3.1: Equations for Out-of-place T-count
Design    T-count Equation
1         $35n - 21w(n) - 21\lfloor \log n \rfloor - 7$
2         $35n - 21w(n) - 21\lfloor \log n \rfloor - 7$
3         $35n - 14$
Prop1     $22n - 11w(n) - 11\lfloor \log n \rfloor - 7$
Prop2     $25n - 14w(n) - 14\lfloor \log n \rfloor - 7$
1 is the design in [19]
2 is the design in [23]
3 is the design in [20]
Prop1 is the design in Figure 3.1
Prop2 is the design in Figure 3.1 with Step 6 replaced by the Computation-Reversal gate
Figure 3.1: Out-of-place QCLA: low T-count (circuit diagram for n = 8 not reproduced)
Wire labels (input, intermediate, output):
Z0, Z0, s0; Z1 through Z8 carry g[0,1], g[0,2], g[2,3], g[0,4], g[4,5], g[4,6], g[6,7], g[0,8] and end as s1 through s8
a0 through a7 are unchanged throughout
b0 stays b0; b1 through b7 carry p[1,2], p[2,3], p[3,4], p[4,5], p[5,6], p[6,7], p[7,8] and end as b1 through b7
X0, X1, X2, X3 carry p[2,4], p[4,6], p[4,8], p[6,8]
Step 1,3 Computation gate
Step 2,7,8 CNOT gate
Step 4,5 Toffoli gate
Step 6 Uncomputation gate
Chapter 4
Design of Proposed In-place QCLA
Three designs are proposed: the first circuit is called low T-count, the second is a mixture of low T-count and high speed, and the third is called high speed. For simplicity, the first proposed circuit is described, and the second and third proposed circuits are noted wherever they differ. The proposed (low T-count) in-place QCLA circuit is shown in Figure 4.1 for the case of adding two eight-qubit numbers $|a\rangle$ and $|b\rangle$. In-place means that each Sum qubit is realized on an input qubit. At the end of the computation, location A with $|a\rangle$ is unchanged and location B with $|b\rangle$ contains the sum bits $s_0$ through $s_{n-1}$. The proposed in-place QCLA circuit also requires n ancillas initialized to the value $\frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$. These ancillas are stored in location Z. In the first proposed circuit, n ancillas are turned into classical bits at the end of the computation and need to be reset before they can be used in other computations. In the third proposed circuit (high speed), the ancillas can be reused in later computations because they are restored to their initial value. For all the proposed circuits, another set of ancillas is stored in location X. These ancillas are also initialized to $\frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$ and are used in the P-rounds and the reverse of P-erase-rounds steps. The number of ancillas in location X is $n - w(n) - \lfloor \log n \rfloor$, where $w(n)$ is the number of ones in the binary expansion of n [1], with $w(n) = n - \sum_{y=1}^{\infty} \left\lfloor \frac{n}{2^y} \right\rfloor$. All ancillas are initialized to the value $\frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$ because the logical-AND computation gates used in the proposed circuit require these ancillas to function properly.
The proposed (low T-count) in-place QCLA circuit is based on the NOT gate, the CNOT gate, and the Toffoli gate, along with the logical-AND computation gate and the logical-AND Measure-and-Fixup uncomputation gate presented in [27]. An algorithm based on the methodology presented in [19] is used to implement the in-place QCLA from these gates. By using the logical-AND computation and logical-AND Measure-and-Fixup uncomputation gates, the proposed design methodology saves T gates.
4.1 Procedure
The steps in designing the new (low T-count) in-place QCLA are:
1. For i = 0 to n-1, apply the logical-AND computation gate to the three inputs at locations A[i], B[i], and Z[i], respectively. This generates three outputs. The first output is A[i] and the second output is B[i], so the first and second input values do not change. The third output is A[i] & B[i], and its location is labeled g[i, i+1].
2. For i = 0 to n-1, apply the CNOT gate to locations A[i] and B[i]. The value at A[i] is maintained, while the value at B[i] changes to A[i] ⊕ B[i], and that location is labeled p[i, i+1].
3. P-rounds: The logical-AND computation gate has three inputs and three outputs. The first and last inputs pass through to the first and last outputs. The location of the first input and output is p[j, l], and the location of the third input and output is p[l, k]. The second input is an ancilla stored at location X[d]. The second output is p[j, l] & p[l, k], and the value is saved to the output location p[j, k]. The indices are $j = 2^t m$, $k = 2^t m + 2^t$, and $l = 2^t m + 2^{t-1}$. They are determined by t and m, where t runs from 1 to $\lfloor \log n \rfloor - 1$ and, nested in the t loop, $1 \le m < \lfloor n/2^t \rfloor$. The index d is determined by the sum j + k: the sums are sorted from lowest to highest, the lowest receiving index 0 and the highest receiving index $\sum_{y=1}^{\infty} \left\lfloor \frac{n}{2^y} \right\rfloor - \lfloor \log n \rfloor - 1$.
4. G-rounds: This step uses Toffoli gates, which have three inputs and three outputs. The locations of the three inputs are g[j, l], p[l, k], and g[l, k], respectively. The first and second outputs are the same as the inputs, so the values pass through and the locations are unchanged. The third output is g[l, k] ⊕ (g[j, l] & p[l, k]), and the value is saved to the output location g[j, k]. The indices are $j = 2^t m$, $k = 2^t m + 2^t$, and $l = 2^t m + 2^{t-1}$. The indices j, k, and l for the locations in p and g are determined by t and m, where t runs from 1 to $\lfloor \log n \rfloor$ and, nested in the t loop, $0 \le m < \lfloor n/2^t \rfloor$.
5. C-rounds: This step uses Toffoli gates, which have three inputs and three outputs. The locations of the three inputs are g[0, l], p[l, k], and g[l, k], respectively. The first and second outputs are the same as the inputs, so the values pass through and the locations are unchanged. The third output is g[l, k] ⊕ (g[0, l] & p[l, k]), and the value is saved to the output location g[0, k]. The indices are $l = 2^t m$ and $k = 2^t m + 2^{t-1}$. They are determined by t and m, where t starts at $\lfloor \log(2n/3) \rfloor$ and decreases to 1 and, nested in the t loop, $1 \le m \le \lfloor (n - 2^{t-1})/2^t \rfloor$.
6. P-erase-rounds: The logical-AND Measure-and-Fixup uncomputation gate has three inputs and three outputs. The first and last inputs pass through to the first and last outputs. The location of the first input and output is p[j, l], and the location of the third input and output is p[l, k]. The middle input is at location p[j, k], which holds the value stored by the logical-AND computation; this value is uncomputed and the output is a classical bit (proposed circuit 1). For proposed circuit 3 (high speed), the output is the original X[d] ancilla value, and the location of the ancilla value is labeled Xout[d]. For all the proposed circuits, the indices are $j = 2^t m$, $k = 2^t m + 2^t$, and $l = 2^t m + 2^{t-1}$. They are determined by t and m, where t starts at $\lfloor \log n \rfloor - 1$ and decreases to 1 and, nested in the t loop, $1 \le m < \lfloor n/2^t \rfloor$. The definition of index d is the same as in P-rounds.
7. For i = 1 to n-1, apply the CNOT gate on locations p[i, i +1] and g[0, i ] so
that location g[0, i ] would contain the same value but the value of p[i, i +1]
would change to g[0, i ] ⊕ p[i, i +1] and placed in location p[i, i +1].
8. For i = 0 to n-2, apply the NOT gate. For i = 0, apply the NOT gate on
location B[i] and put the value in location p[i, i+1]. For i = 1 to n-2, apply
the NOT gate on location p[i, i+1] and put the negated value back in location
p[i, i+1].
9. For i = 1 to n-2, apply the CNOT gate on locations A[i] and p[i, i+1].
Location A[i] keeps its value, while the value of location p[i, i+1] changes to
A[i] ⊕ p[i, i+1] and is placed back in location p[i, i+1].
10. Reverse of P-erase-rounds: This step uses the logical-AND computation gate,
which has three inputs and three outputs. The first and last inputs pass
through to the first and last outputs. The first input and output are located
at p[j, l], and the third input and output are located at p[l, k]. The second
input is an ancilla stored at location X[d], from which the second output is
calculated. Put simply, the second output is p[j, l] & p[l, k], and the value
is saved to the output location p[j, k]. The indices are $j = 2^t m$,
$k = 2^t m + 2^t$, and $l = 2^t m + 2^{t-1}$. They are determined by t and m∗,
where t runs from 1 to $\lfloor \log(n-1) \rfloor - 1$ and, nested inside the t
loop, $1 \le m < \lfloor (n-1)/2^t \rfloor$. The definition of index d is the
same as in P-rounds.
11. Reverse of C-rounds: This step uses Toffoli gates, each with three inputs
and three outputs. The three inputs are located at g[0, l], p[l, k], and
g[0, k], respectively. The first and second outputs equal the corresponding
inputs, so those values pass through and their locations are unchanged. The
third output is computed as g[0, k] ⊕ (g[0, l] & p[l, k]), and the value is
saved to the output location g[l, k]. The indices are $l = 2^t m$ and
$k = 2^t m + 2^{t-1}$. They are determined by t and m∗, where t runs from 1 to
$\lfloor \log(2(n-1)/3) \rfloor$ and, nested inside the t loop,
$1 \le m \le \lfloor ((n-1) - 2^{t-1})/2^t \rfloor$.
12. Reverse of G-rounds: This step uses Toffoli gates, each with three inputs
and three outputs. The three inputs are located at g[j, l], p[l, k], and
g[j, k], respectively. The first and second outputs equal the corresponding
inputs, so those values pass through and their locations are unchanged. The
third output is computed as g[j, k] ⊕ (g[j, l] & p[l, k]), and the value is
saved to the output location g[l, k]. The indices are $j = 2^t m$,
$k = 2^t m + 2^t$, and $l = 2^t m + 2^{t-1}$. They are determined by t and m∗,
where t starts at $\lfloor \log(n-1) \rfloor$ and decreases to 1 and, nested
inside the t loop, $0 \le m < \lfloor (n-1)/2^t \rfloor$.
13. Reverse of P-rounds: Apply the logical-AND Measure-and-Fixup uncomputation
gate on three inputs located at p[j, l], p[j, k], and p[l, k], respectively.
This generates three outputs. The first output is p[j, l] and the third output
is p[l, k]; the first and third values do not change. The second input value
is uncomputed, and the output is a classical bit (proposed circuit 1). For
proposed circuit 3 (high speed), the output is the original X[d] ancilla value,
and the location of the ancilla value is set to Xout[d]. For all the proposed
circuits, the indices are $j = 2^t m$, $k = 2^t m + 2^t$, and
$l = 2^t m + 2^{t-1}$. They are determined by t and m∗, where t starts at
$\lfloor \log(n-1) \rfloor - 1$ and decreases to 1 and, nested inside the t
loop, $1 \le m < \lfloor (n-1)/2^t \rfloor$. The definition of index d is the
same as in P-rounds.
14. For i = 1 to n-2, apply the CNOT gate on locations A[i] and p[i, i+1].
Location A[i] keeps its value, while the value of location p[i, i+1] changes to
A[i] ⊕ p[i, i+1] and is placed back in location p[i, i+1].
15. For i = 0 to n-2, apply the Measure-and-Fixup uncomputation gate on
locations A[i], p[i, i+1], and g[i, i+1]. Locations A[i] and p[i, i+1] keep
their values. The third input value is uncomputed, and the output is a
classical bit (proposed circuit 1). For proposed circuit 3 (high speed), the
output is the original Z[i] ancilla value, and the location of the ancilla
value is set to Zout[i].
16. For i = 0 to n-2, apply the NOT gate at location p[i, i+1], so that the
value is inverted and saved in location $s_i$, where the Sum is stored. For
i = n-1, $s_i$, the second-to-last bit of the Sum, is stored at location
p[i, i+1]. For i = n, the last bit of the Sum is stored at location g[0, i].
∗To preserve the last carry value, $s_n$, the gates and circuits in steps 10 to
13 that operate on the last carry value need to be deleted so that they do not
interfere with it. Therefore, the equations for t and m were adjusted so that
the index ends at n-1 instead of n.
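The round indices used in the steps above can be illustrated with a purely classical sketch. It assumes the usual carry look-ahead starting values g[i, i+1] = a_i AND b_i and p[i, i+1] = a_i XOR b_i, then applies forward P-rounds (p[j, k] = p[j, l] & p[l, k]), G-rounds, and C-rounds with the index ranges given in the steps, so that g[0, k] ends up holding the carry into bit k. The function names are hypothetical, and the sketch models only bit values, not the ancillae or the uncomputation steps.

```python
def floor_log2(x: int) -> int:
    """floor(log2(x)) for a positive integer x."""
    return x.bit_length() - 1


def carries(a: int, b: int, n: int) -> list:
    """Return [g[0,1], ..., g[0,n]], the carries of the n-bit sum a + b."""
    g, p = {}, {}
    # Assumed starting values (usual carry look-ahead definitions).
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g[i, i + 1] = ai & bi  # generate
        p[i, i + 1] = ai ^ bi  # propagate

    log_n = floor_log2(n)

    # P-rounds: t = 1 .. floor(log n) - 1, 1 <= m < floor(n / 2^t).
    for t in range(1, log_n):
        for m in range(1, n >> t):
            j = m << t
            l, k = j + (1 << (t - 1)), j + (1 << t)
            p[j, k] = p[j, l] & p[l, k]

    # G-rounds: t = 1 .. floor(log n), 0 <= m < floor(n / 2^t).
    for t in range(1, log_n + 1):
        for m in range(n >> t):
            j = m << t
            l, k = j + (1 << (t - 1)), j + (1 << t)
            g[j, k] = g[l, k] ^ (g[j, l] & p[l, k])

    # C-rounds: t = floor(log(2n/3)) down to 1,
    # 1 <= m <= floor((n - 2^(t-1)) / 2^t).
    t_max = floor_log2(2 * n // 3) if n >= 2 else 0
    for t in range(t_max, 0, -1):
        half = 1 << (t - 1)
        for m in range(1, ((n - half) >> t) + 1):
            l = m << t
            k = l + half
            g[0, k] = g[l, k] ^ (g[0, l] & p[l, k])

    return [g[0, k] for k in range(1, n + 1)]


print(carries(0b1011, 0b0110, 4))  # → [0, 1, 1, 1]
```

In the example, 1011 + 0110 = 10001, so the carries into bits 2, 3, and 4 are all 1 while the carry into bit 1 is 0.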
Two more proposed Carry Look-Ahead Adders can be obtained by modifying the
design steps. The second proposed (low T-count + high speed) circuit is made
by replacing the Computation-Reversal gates in steps 13 and 15 of Figure 4.2
with Measure-and-Fixup gates. The third proposed (high speed) Carry Look-Ahead
Adder, shown in Figure 4.2, is made by replacing all the uncomputation gates
with Computation-Reversal gates.
4.2 T-count
The T-count for the proposed (low T-count) In-place QCLA shown in Figure 4.1 is
$40n - 11w(n) - 11\lfloor \log n \rfloor - 11w(n-1) - 11\lfloor \log(n-1) \rfloor - 32$.
• Step 1 has n logical-AND computation gates, each with a T-count of 4.
• Step 2 has a T-count of 0.
• Step 3 has a T-count of $4(n - w(n) - \lfloor \log n \rfloor)$.
• Step 4 has $n - w(n)$ Toffoli gates with a T-count of $7(n - w(n))$.
• Step 5 has $n - \lfloor \log n \rfloor - 1$ Toffoli gates with a T-count of
$7(n - \lfloor \log n \rfloor - 1)$.
• Steps 6, 7, 8, and 9 have a T-count of 0.
• Step 10 has a T-count of $4((n-1) - w(n-1) - \lfloor \log(n-1) \rfloor)$.
• Step 11 has $(n-1) - \lfloor \log(n-1) \rfloor - 1$ Toffoli gates with a
T-count of $7((n-1) - \lfloor \log(n-1) \rfloor - 1)$.
• Step 12 has $(n-1) - w(n-1)$ Toffoli gates with a T-count of
$7((n-1) - w(n-1))$.
• Steps 13, 14, 15, and 16 have a T-count of 0.
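As a check on the arithmetic, the per-step contributions listed above can be summed term by term. The derivation below only regroups those contributions and introduces no new quantities.

```latex
\begin{align*}
T_{\text{Prop1}}
  &= \underbrace{4n}_{\text{step 1}}
   + \underbrace{4\bigl(n - w(n) - \lfloor \log n \rfloor\bigr)}_{\text{step 3}}
   + \underbrace{7\bigl(n - w(n)\bigr)}_{\text{step 4}}
   + \underbrace{7\bigl(n - \lfloor \log n \rfloor - 1\bigr)}_{\text{step 5}} \\
  &\quad
   + \underbrace{4\bigl((n-1) - w(n-1) - \lfloor \log(n-1) \rfloor\bigr)}_{\text{step 10}}
   + \underbrace{7\bigl((n-1) - \lfloor \log(n-1) \rfloor - 1\bigr)}_{\text{step 11}}
   + \underbrace{7\bigl((n-1) - w(n-1)\bigr)}_{\text{step 12}} \\
  &= \bigl(22n - 11w(n) - 11\lfloor \log n \rfloor - 7\bigr)
   + \bigl(18(n-1) - 11w(n-1) - 11\lfloor \log(n-1) \rfloor - 7\bigr) \\
  &= 40n - 11w(n) - 11\lfloor \log n \rfloor - 11w(n-1)
     - 11\lfloor \log(n-1) \rfloor - 32 .
\end{align*}
```

The constant term collects $-7$ from step 5, $-18$ from expanding $18(n-1)$, and $-7$ from step 11.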
Table 4.1 shows the equations for the T-count for the three proposed circuits and
the three existing works.
Table 4.1: Equations for In-place T-count
Design   T-count Equation
1        $70n - 21w(n) - 21\lfloor \log n \rfloor - 21w(n-1) - 21\lfloor \log(n-1) \rfloor - 49$
2        $70n - 21w(n) - 21\lfloor \log n \rfloor - 21w(n-1) - 21\lfloor \log(n-1) \rfloor - 49$
3        $\frac{203}{4}n - 28$
Prop1    $40n - 11w(n) - 11\lfloor \log n \rfloor - 11w(n-1) - 11\lfloor \log(n-1) \rfloor - 32$
Prop2    $43n - 14w(n) - 14\lfloor \log n \rfloor - 11w(n-1) - 11\lfloor \log(n-1) \rfloor - 32$
Prop3    $49n - 14w(n) - 11\lfloor \log n \rfloor - 14w(n-1) - 14\lfloor \log(n-1) \rfloor - 38$
1 is the design in [19]
2 is the design in [23]
3 is the design in [20]
Prop1 is the design in Figure 4.1
Prop2 is the design in Figure 4.2 with steps 13 and 15 replaced by Measure-and-Fixup gates
Prop3 is the design in Figure 4.2
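To get a feel for the size of the savings, the equations in Table 4.1 can be evaluated for concrete operand widths. The sketch below does this for designs 1 and 2, Prop1, and Prop2 only. It assumes that w(n) is the number of 1 bits in the binary expansion of n, as in [19], and that log is the base-2 logarithm; the function and dictionary names are hypothetical.

```python
def w(n: int) -> int:
    """Number of 1 bits in n (assumed meaning of w(n))."""
    return bin(n).count("1")


def floor_log2(n: int) -> int:
    """floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1


# Coefficients (c_n, c_w, c_log, c_w1, c_log1, c_0) of
# c_n*n - c_w*w(n) - c_log*floor(log n) - c_w1*w(n-1) - c_log1*floor(log(n-1)) - c_0
DESIGNS = {
    "1 and 2": (70, 21, 21, 21, 21, 49),
    "Prop1": (40, 11, 11, 11, 11, 32),
    "Prop2": (43, 14, 14, 11, 11, 32),
}


def t_count(design: str, n: int) -> int:
    """Evaluate the Table 4.1 equation of a design for n >= 2 bits."""
    c_n, c_w, c_log, c_w1, c_log1, c_0 = DESIGNS[design]
    return (c_n * n - c_w * w(n) - c_log * floor_log2(n)
            - c_w1 * w(n - 1) - c_log1 * floor_log2(n - 1) - c_0)


for n in (4, 8, 16, 32):
    existing = t_count("1 and 2", n)
    prop1 = t_count("Prop1", n)
    saving = 100 * (existing - prop1) / existing
    print(f"n={n:2d}  existing={existing}  Prop1={prop1}  saving={saving:.1f}%")
```

For n = 8, for example, the equations give 322 for designs 1 and 2, 189 for Prop1, and 201 for Prop2.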
[Circuit diagram: 8-bit in-place QCLA. Wires a0 to a7, b0 to b7, Z0 to Z7, and
X0 to X3. The a wires return a0 to a7, the b wires output the sum bits s0 to
s7, and the Z7 wire outputs s8. Steps S.1 to S.16 run from left to right.]
Figure 4.1: In-place QCLA: low T-count
Step 1,3,10 Computation gate
Step 2,7,9,14 CNOT gate
Step 4,5,11,12 Toffoli gate
Step 6,13,15 Uncomputation gate
Step 8,16 NOT gate
[Circuit diagram: 8-bit in-place QCLA. Wires a0 to a7, b0 to b7, Z0 to Z7, and
X0 to X3. The a wires return a0 to a7, the b wires output the sum bits s0 to
s7, the ancilla wires Z0 to Z6 and X0 to X3 return their original values, and
the Z7 wire outputs s8. Steps S.1 to S.16 run from left to right.]
Figure 4.2: In-place QCLA: high speed
Step 1,3,10 Computation gate
Step 2,7,9,14 CNOT gate
Step 4,5,11,12 Toffoli gate
Step 6,13,15 Uncomputation gate
Step 8,16 NOT gate
Chapter 5
Simulation Results
For the simulation of the out-of-place and in-place QCLA circuits, the hardware
description language (HDL) used was Verilog 2001. In Verilog, quantum circuits
can be simulated once they are decomposed into logic gates such as AND and
XOR. The simulation output was viewed in ISim, the Xilinx ISE Simulator. For
both circuits, the inputs changed every 10 ns.
5.1 Out-of-place QCLA
The first quantum circuit designed was the out-of-place QCLA. It was tested for
correctness from the 4-bit case to the 32-bit case. It was also tested
exhaustively, meaning with all $2^n$ possible cases, where n ranges from 4 bits
to 32 bits.
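The idea of exhaustive testing can be sketched in a few lines. This is not the Verilog testbench used in the thesis: `ripple_adder` is a hypothetical stand-in for the design under test, built only from AND and XOR operations, and `exhaustive_check` sweeps every value of both operands and compares the result with ordinary integer addition. The sketch keeps n small so that the sweep finishes quickly.

```python
from itertools import product


def ripple_adder(a: int, b: int, n: int) -> int:
    """Hypothetical design under test: n-bit adder built from AND and XOR."""
    carry, total = 0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) ^ (carry & (ai ^ bi))
    return total | (carry << n)  # bit n holds the final carry


def exhaustive_check(adder, n: int) -> list:
    """Return every (a, b) pair for which the adder disagrees with a + b."""
    values = range(1 << n)  # each operand takes all 2^n values
    return [(a, b) for a, b in product(values, repeat=2)
            if adder(a, b, n) != a + b]


failures = exhaustive_check(ripple_adder, 4)
print("all cases pass" if not failures else f"{len(failures)} failures")
```

Any function with the same `(a, b, n)` signature can be passed to `exhaustive_check` in place of `ripple_adder`.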
Figure 5.1(a) shows a simulation of an 8-bit out-of-place QCLA. The inputs are
the first four variables: A_in, B_in, X_in, and Z. The outputs are the last
four variables: A_out, B_out, X_out, and Sum. This simulation shows all
possible test cases of each input A_in and B_in from 0 to $2^8 - 1$, or 255.
In Figure 5.1(b), a specific time frame was selected to give a closer look at
the correct output for A_out, B_out, X_out, and Sum. Note that the
Measure-and-Fixup and Computation-Reversal uncomputation gates function the
same when implemented in Verilog, because Verilog operates in the classical
domain, not the quantum domain. Therefore, only one simulation was required to
test both the low T-count and the high speed quantum circuits.
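The reason a single classical simulation covers both variants can be seen from a Boolean model of the gates. The sketch below is an illustration under the assumption that the ancilla enters the computation gate as 0; the function names are hypothetical. On classical bits, either uncomputation gate simply returns the stored AND value to 0, so one function models both.

```python
def logical_and_compute(a: int, b: int, ancilla: int = 0) -> tuple:
    """Classical view of the logical-AND computation gate.

    The ancilla (assumed to start at 0) leaves holding a AND b.
    """
    return a, b, ancilla ^ (a & b)


def logical_and_uncompute(a: int, b: int, target: int) -> tuple:
    """Classical view shared by Measure-and-Fixup and Computation-Reversal.

    The stored AND value is cleared, so the target leaves as 0.
    """
    return a, b, target ^ (a & b)


for a in (0, 1):
    for b in (0, 1):
        _, _, stored = logical_and_compute(a, b)
        _, _, cleared = logical_and_uncompute(a, b, stored)
        print(a, b, stored, cleared)
```

The two gates differ in their quantum cost and in what happens to the ancilla wire, which a Boolean model like this one cannot capture.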
Figure 5.1: Out-of-place QCLA simulation
(a) Full View
(b) Specified View
5.2 In-place QCLA
Just like the out-of-place QCLA, the in-place QCLAs were tested for correctness
from the 4-bit case to the 32-bit case, and tested exhaustively with all $2^n$
possible test cases.
Figure 5.2(a) shows a simulation of an 8-bit in-place QCLA. The inputs are the
first four variables: A_in, B_in, X_in, and Z. The outputs are the last four
variables: A_out, Z_out, X_out, and Sum. This simulation shows all possible
test cases of each input A_in and B_in from 0 to $2^8 - 1$, or 255. In
Figure 5.2(b), a specific time frame was selected to give a closer look at the
correct output for A_out, Z_out, X_out, and Sum. Note that the
Measure-and-Fixup and Computation-Reversal uncomputation gates function the
same when implemented in Verilog, because Verilog operates in the classical
domain, not the quantum domain. Therefore, only one simulation was required to
test the low T-count, the high speed, and the mixed in-place QCLAs.
Figure 5.2: In-place QCLA simulation
(a) Full View
(b) Specified View
Chapter 6
Conclusion
In this thesis, five novel designs of quantum Carry Look-Ahead addition
circuits having logarithmic depth were introduced. The proposed designs of the
quantum Carry Look-Ahead Adder consist of logical-AND computation gates and
Measure-and-Fixup and Computation-Reversal uncomputation gates. These
logical-AND computation and uncomputation gates were verified by hand or by
using the Quirk simulator [28]. The QCLA circuits themselves were simulated and
tested with all possible test cases using Verilog HDL. The main focus of this
thesis, however, was T-count efficient circuits for the out-of-place and
in-place QCLAs.
Compared with the existing works, the proposed designs achieve significant
T-count savings. The proposed QCLA circuits can be used in a larger quantum
circuit where T-count is of primary concern.
The five designs introduced in this thesis can be the groundwork for future
improvements. One improvement to the QCLA circuits is to change all the Toffoli
gates (in G-rounds, C-rounds, and their reverses) to the gates proposed by
Jones in [37]. This would give each Toffoli gate a T-count of 4 instead of 7.
The drawback of the Jones proposal is that the proposed QCLA circuits would
need more ancilla inputs and outputs. A second proposition is to use the
proposed designs in quantum circuits such as multiplication and division to
improve their efficiency. The last proposition is to apply the logical-AND
computation and uncomputation gates to quantum circuits other than the QCLAs
where T-count is a major concern.
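The effect of the first proposed improvement on the low T-count in-place QCLA can be estimated with a short sketch. It reuses the per-step gate counts from Section 4.2, assumes that w(n) is the number of 1 bits in n and that log is base 2, and treats the per-gate T-count as a parameter. The function names are hypothetical, and the extra ancillae required by the gates in [37] are not modeled.

```python
def w(n: int) -> int:
    """Number of 1 bits in n (assumed meaning of w(n))."""
    return bin(n).count("1")


def floor_log2(n: int) -> int:
    """floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1


def gate_counts(n: int) -> tuple:
    """(logical-AND gates, Toffoli gates) from the steps in Section 4.2, n >= 2."""
    logical_and = (n                                             # step 1
                   + (n - w(n) - floor_log2(n))                  # step 3
                   + ((n - 1) - w(n - 1) - floor_log2(n - 1)))   # step 10
    toffoli = ((n - w(n))                                        # step 4
               + (n - floor_log2(n) - 1)                         # step 5
               + ((n - 1) - floor_log2(n - 1) - 1)               # step 11
               + ((n - 1) - w(n - 1)))                           # step 12
    return logical_and, toffoli


def t_count(n: int, toffoli_cost: int = 7, and_cost: int = 4) -> int:
    """Total T-count for a given per-gate cost."""
    logical_and, toffoli = gate_counts(n)
    return and_cost * logical_and + toffoli_cost * toffoli


print(t_count(8))                  # Toffoli T-count of 7 → 189
print(t_count(8, toffoli_cost=4))  # Toffoli T-count of 4 → 132
```

With the default costs the function reproduces the closed-form T-count of the low T-count design; lowering the Toffoli cost to 4 saves 3 T gates for each of the Toffoli gates counted above.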
References
[1] T. S. Humble, H. Thapliyal, E. Muñoz-Coreas, F. A. Mohiyaddin, and R. S.
Bennink. Quantum computing circuits and devices. IEEE Design & Test, pages
1–1, 2019.
[2] H.R. Bhagyalakshmi and M Venkatesha. Toffoli cascade synthesis of an optimized
two-bit comparator. Lecture Notes in Electrical Engineering, 248:779–787, 09
2014.
[3] Eleanor Rieffel and Wolfgang Polak. Quantum computing a gentle introduction.
The MIT Press, 2014.
[4] H. Thapliyal, E. Muñoz-Coreas, T. S. S. Varun, and T. Humble. Quantum circuit
designs of integer division optimizing T-count and T-depth. IEEE Transactions
on Emerging Topics in Computing, pages 1–1, 2019.
[5] Edgard Muñoz-Coreas and Himanshu Thapliyal. T-count and qubit optimized
quantum circuit design of the non-restoring square root algorithm. ACM Journal
on Emerging Technologies in Computing Systems (JETC), 14(3):36, 2018.
[6] David Gosset, Vadym Kliuchnikov, Michele Mosca, and Vincent Russo. An algo-
rithm for the t-count. Quantum Info. Comput., 14(15-16):1261–1276, November
2014.
[7] Austin Fowler, Ashley M. Stephens, and Peter Groszkowski. High threshold
universal quantum computation on the surface code. Physical Review A, 80, 03
2008.
[8] N. Cody Jones, Rodney Van Meter, Austin G. Fowler, Peter L. McMahon,
Jungsang Kim, Thaddeus D. Ladd, and Yoshihisa Yamamoto. Layered archi-
tecture for quantum computing. Phys. Rev. X, 2:031007, Jul 2012.
[9] K.-A. Brickman, P. C. Haljan, P. J. Lee, M. Acton, L. Deslauriers, and C.
Monroe. Implementation of Grover's quantum search algorithm in a scalable
system. Phys. Rev. A, 72:050306, Nov 2005.
[10] Y. S. Weinstein, M. A. Pravia, E. M. Fortunato, S. Lloyd, and D. G. Cory.
Implementation of the quantum Fourier transform. Phys. Rev. Lett.,
86:1889–1891, Feb 2001.
[11] M. S. Tame, B. A. Bell, C. Di Franco, W. J. Wadsworth, and J. G. Rarity.
Experimental realization of a one-way quantum computer algorithm solving
Simon's problem. Phys. Rev. Lett., 113:200501, Nov 2014.
[12] Steven A Cuccaro, Thomas G Draper, Samuel A Kutin, and David Petrie
Moulton. A new quantum ripple-carry addition circuit. arXiv preprint quant-
ph/0410184, 2004.
[13] Himanshu Thapliyal and Nagarajan Ranganathan. Design of efficient reversible
logic-based binary and bcd adder circuits. J. Emerg. Technol. Comput. Syst.,
9(3):17:1–17:31, October 2013.
[14] M. Nachtigal, H. Thapliyal, and N. Ranganathan. Design of a reversible floating-
point adder architecture. In 2011 11th IEEE International Conference on Nan-
otechnology, pages 451–456, Aug 2011.
[15] Rasha Montaser, Ahmed Younes, and Mahmoud Abdel-Aty. New design of re-
versible full adder/subtractor using r gate. International Journal of Theoretical
Physics, 58(1):167–183, 2019.
[16] E. Muñoz-Coreas and H. Thapliyal. Quantum circuit design of a T-count optimized
integer multiplier. IEEE Transactions on Computers, 68(5):729–739, May 2019.
[17] Rich Rines and Isaac Chuang. High performance quantum modular multipliers.
arXiv preprint arXiv:1801.01081, 2018.
[18] M. Nachtigal, H. Thapliyal, and N. Ranganathan. Design of a reversible single
precision floating point multiplier based on operand decomposition. In 10th IEEE
International Conference on Nanotechnology, pages 233–237, Aug 2010.
[19] Thomas G Draper, Samuel A Kutin, Eric M Rains, and Krysta M Svore.
A logarithmic-depth quantum carry-lookahead adder. arXiv preprint quant-
ph/0406142, 2004.
[20] Himanshu Thapliyal, HV Jayashree, AN Nagamani, and Hamid R Arabnia.
Progress in reversible processor design: a novel methodology for reversible carry
look-ahead adder. In Transactions on Computational Science XVII, pages 73–97.
Springer, 2013.
[21] Yasuhiro Takahashi and Noboru Kunihiro. A fast quantum circuit for addition
with few qubits. Quantum Information & Computation, 8(6):636–649, 2008.
[22] Neela Shirisha, M Tech, P Kalyani, and D Nageshwar Rao. Design of a reversible
carry look-ahead adder using reversible gates.
[23] Agung Trisetyarso and Rodney Van Meter. Circuit design for a measurement-
based quantum carry-lookahead adder. International Journal of Quantum Infor-
mation, 8(05):843–867, 2010.
[24] Michael A. Nielsen and Isaac L. Chuang. Quantum computation and quantum
information: 10th Anniversary Edition. Cambridge University Press, 2010.
[25] Giuliano Benenti, Giulio Casati, and Giuliano Strini. Basic concepts. World
Scientific, 2008.
[26] D Michael Miller, Mathias Soeken, and Rolf Drechsler. Mapping NCV circuits to
optimized Clifford+T circuits. In International Conference on Reversible
Computation, pages 163–175. Springer, 2014.
[27] Craig Gidney. Halving the cost of quantum addition. arXiv preprint
arXiv:1709.06648, 2017.
[28] Craig Gidney. Quirk: Quantum circuit simulator, https://algassert.com/quirk.
[29] Austin G. Fowler and Simon J. Devitt. A bridge to lower overhead quantum
computation. arXiv e-prints, Sep 2012.
[30] Simon J. Devitt, Ashley M. Stephens, William J. Munro, and Kae Nemoto.
Requirements for fault-tolerant factoring on an atom-optics quantum computer.
Nature Communications, 4, 2013.
[31] Austin G. Fowler, Matteo Mariantoni, John M. Martinis, and Andrew N. Cle-
land. Surface codes: Towards practical large-scale quantum computation. Phys.
Rev. A, 86:032324, Sep 2012.
[32] Kristel Michielsen, Madita Nocon, Dennis Willsch, Fengping Jin, Thomas Lip-
pert, and Hans De Raedt. Benchmarking gate-based quantum computers. Com-
puter Physics Communications, 220:44 – 55, 2017.
[33] Kamyar Saeedi, Stephanie Simmons, Jeff Z Salvail, Phillip Dluhy, Helge Rie-
mann, Nikolai V Abrosimov, Peter Becker, Hans-Joachim Pohl, John J L Mor-
ton, and Mike L W Thewalt. Room-temperature quantum bit storage exceed-
ing 39 minutes using ionized donors in silicon-28. Science (New York, N.Y.),
342(6160):830–833, 2013.
[34] Darshan Thaker, Tzvetan Metodi, Andrew Cross, Isaac Chuang, and Frederic
Chong. Quantum memory hierarchies: Efficient designs to match available par-
allelism in quantum computing. In Proceedings of the 33rd annual international
symposium on computer architecture, volume 2006 of ISCA ’06, pages 378–390.
IEEE Computer Society, 2006.
[35] Daniel Kudrow, Kenneth Bier, Zhaoxia Deng, Diana Franklin, Yu Tomita, Ken-
neth Brown, and Frederic Chong. Quantum rotations: a case study in static
and dynamic machine-code generation for quantum computers. In Proceedings
of the 40th Annual International Symposium on computer architecture, ISCA ’13.
ACM, 2013.
[36] Stephen D. Brown and Zvonko G. Vranesic. Fundamentals of digital logic with
Verilog design. McGraw-Hill Higher Education, 2008.
[37] Cody Jones. Novel constructions for the fault-tolerant Toffoli gate. Physical
Review A, 87, 12 2012.
Vita
Vladislav Ivanovich Khalus
Education
University of Kentucky
Bachelor of Science in Electrical Engineering, May 2016
Bachelor of Science in Computer Engineering, May 2016
Experience
Embedded Software Engineer
June 2018 - Current
KPIT Technologies Inc.
Novi, MI
Graduate Research Assistant
August 2016-May 2017
University of Kentucky
Lexington, KY
Publication
Vladislav Khalus, Edgard Muñoz-Coreas, and Himanshu Thapliyal. "T-count
Optimized Quantum Circuit for Logarithmic Addition." 22nd Annual Conference on
Quantum Information Processing, Boulder, January 2019.
