Methods for Efficient Synthesis of Large Reversible Binary and Ternary Quantum Circuits and Applications of Linear Nearest Neighbor Model by Hawash, Maher Mofeid
Portland State University
PDXScholar
Dissertations and Theses
Spring 5-30-2013
Part of the Other Electrical and Computer Engineering Commons
Recommended Citation
Hawash, Maher Mofeid, "Methods for Efficient Synthesis of Large Reversible Binary and Ternary Quantum Circuits and Applications
of Linear Nearest Neighbor Model" (2013). Dissertations and Theses. Paper 1090.
10.15760/etd.1090
Methods for Efficient Synthesis of Large Reversible Binary and Ternary Quantum
Circuits and Applications of Linear Nearest Neighbor Model

by

Maher Mofeid Hawash

A dissertation submitted in partial fulfillment of the
requirements for the degree of

Doctor of Philosophy
in
Electrical and Computer Engineering

Dissertation Committee:
Marek Perkowski, Chair
Douglas Hall
Malgorzata Chrzanowska-Jeske
Christof Teuscher
John Caughman

Portland State University
2013

Abstract 
This dissertation describes the development of automated synthesis algorithms that
construct reversible quantum circuits for reversible functions with a large number of
variables.  Specifically, the research focuses on reversible, permutative and fully
specified binary and ternary specifications and on the applicability of the resulting
circuits to the physical limitations of existing quantum technologies.
Automated synthesis of arbitrary reversible specifications is an NP-hard,
multi-objective optimization problem, where 1) the amount of time and computational
resources required to synthesize the specification, 2) the number of primitive quantum
gates in the resulting circuit (quantum cost), and 3) the number of ancillary qubits
(variables added to hold intermediate calculations) are all minimized while 4) the number
of variables is maximized.  Some of the existing algorithms in the literature ignored
objective 2 by focusing on the synthesis of a single solution without the addition of any
ancillary qubits, while others attempted to explore every possible solution in the search
space in an effort to discover the optimal solution (i.e., sacrificed objectives 1 and 4).
Other algorithms resorted to adding a huge number of ancillary qubits (counter to
objective 3) in an effort to minimize the number of primitive gates (objective 2).
In this dissertation, I first introduce the MMDSN algorithm, which is capable of
synthesizing binary specifications of up to 30 variables, does not add any ancillary
variables, produces better quantum cost (8-50% improvement) than algorithms that
limit their search to a single solution, and runs in a small fraction of the time needed by
algorithms that perform exhaustive search (seconds vs. hours).  The MMDSN
algorithm introduces an innovative method of using the Hasse diagram to construct
candidate solutions that are guaranteed to be valid and then selects the solution with the
minimal quantum cost out of this subset.
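The Hasse-based construction of candidate orders can be illustrated with a minimal Python sketch. It assumes, as the caption of Figure 4-3 states, that each level of the Hasse diagram groups the minterms with the same sum of binary digits and that every minterm of a lower level is processed before any minterm of the next level. The function names (`hasse_levels`, `random_level_order`, `respects_levels`, `count_level_orders`) are hypothetical and chosen for illustration only; the sketch generates and checks candidate orders and does not perform the synthesis or the quantum cost selection itself.

```python
import math
import random


def hasse_levels(n):
    """Group the minterms 0..2**n - 1 by their sum of binary digits."""
    levels = [[] for _ in range(n + 1)]
    for minterm in range(2 ** n):
        levels[bin(minterm).count("1")].append(minterm)
    return levels


def random_level_order(n, rng=random):
    """Build one candidate order: shuffle inside each level, keep levels bottom-up."""
    order = []
    for level in hasse_levels(n):
        shuffled = level[:]
        rng.shuffle(shuffled)
        order.extend(shuffled)
    return order


def respects_levels(order):
    """True if no minterm is processed before a minterm of a lower level."""
    weights = [bin(minterm).count("1") for minterm in order]
    return all(a <= b for a, b in zip(weights, weights[1:]))


def count_level_orders(n):
    """Number of distinct orders that keep the levels bottom-up."""
    return math.prod(math.factorial(len(level)) for level in hasse_levels(n))


if __name__ == "__main__":
    print(hasse_levels(3))
    print(respects_levels([0, 4, 1, 2, 5, 3, 6, 7]))  # order quoted for Figure 4-3
    print(respects_levels([0, 2, 1, 3, 4, 6, 5, 7]))  # order quoted for Figure 4-4
    print(count_level_orders(3))
```

Under these assumptions, the order quoted for Figure 4-3 passes the check, while the order quoted for Figure 4-4 does not, because minterm 3 precedes minterm 4. That agrees with the caption's remark that the latter lies outside the subset MMDSN constructs.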
I then introduce the Covered Set Partitions (CSP) algorithm, which expands the search
space of valid candidate solutions and allows for exploring solutions outside the range of
MMDSN.  I show a method of subdividing the expansive search landscape into smaller
partitions and demonstrate the benefit of focusing on partition sizes that are around half
of the number of variables (15% to 25% improvement over MMDSN for functions with fewer
than 12 variables, and more than 1000% improvement for functions with 12 and 13
variables).  For a function of n variables, the CSP algorithm theoretically requires n
times as long as MMDSN to synthesize; however, by focusing on the middle
k (k<n) partitions, the CSP algorithm requires only k times the amount of time required
by MMDSN while typically yielding a lower quantum cost.  I also show that using a Tabu
search for selecting the next set of candidates from the CSP subset results in discovering
solutions with even lower quantum costs (up to 10% improvement over CSP with random
selection).
In Chapters 9 and 10 I question the predominant methods of measuring quantum cost
and their applicability to the physical implementation of quantum gates and circuits.  I
counter the prevailing literature by introducing a new standard for measuring the
performance of quantum synthesis algorithms: enforcing the Linear Nearest Neighbor Model
(LNNM) constraint, which is imposed by today's leading implementations of quantum
technology.  In addition to enforcing physical constraints, the new LNNM quantum cost
(LNNQC) allows for a level comparison amongst all methods of synthesis; specifically,
between methods that add a large number of ancillary variables and methods that add no
additional variables.  I show that, when LNNM is enforced, the quantum cost for methods
that add a large number of ancillary qubits increases significantly (up to 1200%).
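To give a feel for why a nearest-neighbor constraint penalizes circuits spread over many lines, the following Python sketch estimates the cost of two-qubit gates on a linear array. It is a deliberately naive model and not the LNNQC metric defined in this dissertation. It assumes that a gate between non-adjacent lines is made adjacent by a chain of SWAP gates, that the lines are swapped back afterwards, and that each SWAP costs three CNOT gates. The function names and the default parameters are hypothetical.

```python
def lnn_two_qubit_cost(control, target, swap_cost=3, gate_cost=1):
    """Naive cost of one two-qubit gate on a linear array of qubits.

    Assumption: the control line is moved next to the target with
    (distance - 1) SWAPs and moved back with the same number of SWAPs.
    """
    distance = abs(control - target)
    if distance == 0:
        raise ValueError("control and target must be different lines")
    swaps_one_way = distance - 1
    return gate_cost + 2 * swaps_one_way * swap_cost


def lnn_circuit_cost(gates):
    """Sum the naive cost over a list of (control, target) line pairs."""
    return sum(lnn_two_qubit_cost(c, t) for c, t in gates)


if __name__ == "__main__":
    compact = [(0, 1), (1, 2), (0, 1)]   # every interaction is between neighbors
    spread = [(0, 5), (1, 6), (0, 7)]    # same gate count, distant lines
    print(lnn_circuit_cost(compact), lnn_circuit_cost(spread))
```

In this simplified model the gate count of the two example circuits is identical, yet the circuit whose interacting lines are far apart costs many times more. This is the qualitative effect reported above for methods that add many ancillary qubits; constructions that are more careful than swapping back and forth can reduce the overhead.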
I also extend the Hasse-based method to ternary logic and demonstrate synthesis of
specifications of up to 9 ternary variables (compared to the 3 ternary variables
previously reported in the literature).  I introduce the concept of ternary precedence
order and its implications for the construction of the Hasse diagram and of valid
candidate solutions.  I also provide a case study comparing the performance of ternary
logic synthesis of large functions using both a CUDA graphics processor with 1024 cores
and an Intel i7 processor with 8 cores.  In the process of exploring large ternary
functions I introduce, to the literature, eight families of ternary benchmark functions
along with a Multiple Valued file specification (the Extended Quantum Specification
XQS).  I also introduce a new composite quantum gate, the multiple valued Swivel gate,
which swaps the information of qubits around a centrally located pivot point.
In summary, my research objectives are as follows: 
• Explore and create automated synthesis algorithms for reversible circuits, both
in binary and ternary logic, for a large number of variables.
• Study the impact of enforcing the Linear Nearest Neighbor Model (LNNM)
constraint for every interaction between qubits for reversible binary
specifications.
• Advocate for a revised metric for measuring the cost of a quantum circuit in
concordance with LNNM, where, on one hand, such a metric would provide a
way for balanced comparison between the various flavors of algorithms, and on
the other hand, would represent a realistic cost of a quantum circuit with respect
to an ion trap implementation.
• Establish an open source repository for sharing the results, software code and
publications with the scientific community.
With expectations for a new lifeline for silicon-based technologies dwindling,
quantum computation has the potential to become the future workhorse of
computing.  Similar to the automated CAD tools of classical logic, my work lays the
foundation for creating automated tools for constructing quantum circuits from reversible
specifications.

Dedication 
I dedicate this dissertation to 
 
My loving parents Mofeid and Raefah Hawash; 
My wife Lisa for her unwavering support; 
My wonderful children Jared, Sarra and Amani; 
My brother Shaher; 
My sisters Shaheera, Maheera, Mohaya & Majdoleen;
And my family and friends around the globe.

Acknowledgments 
I offer my sincere gratitude to many individuals for their support, advice and
motivation to help fulfill my aspiration to attain a PhD in Electrical and Computer
Engineering.  I am deeply indebted to Dr. Marek A. Perkowski for his transcendent
patronage and mentorship throughout the years of work on my PhD and for his
interminable motivation and inspiration to help me blaze forward.  I am greatly
impressed by the tremendous effort he expends for the sake of his students, especially the
time and energy he spends in nurturing middle and high school students in quantum
computing and robotics.  I have personally benefited throughout the years from the
lectures and discussions in his weekly Friday meetings and from working directly with
him in developing ideas, reviewing results and drafting papers for publication.  In
addition to absorbing his well of knowledge, I am committed to following in Dr.
Perkowski's footsteps by mentoring young students to attain their own scientific
achievements.
I would also like to extend my deepest appreciation to the committee members who
have given me valuable feedback for improving the quality of results from my simulation
of quantum logic synthesis and for introducing me to new concepts.  I am grateful to Dr.
Douglas Hall for the multiple roles that he took as my graduate advisor and a supportive
member of my committee.  Dr. Malgorzata Chrzanowska-Jeske has been extremely
helpful in reviewing my dissertation and giving me valuable feedback regarding
organization and content.  I am greatly appreciative of her insightful questions and
constructive advice during my comprehensive exam and dissertation proposal.  I cannot
thank Dr. Christof Teuscher enough for expanding my field of vision by introducing me
to the stimulating and promising field of nanotechnology, which I hope to dedicate more
time to learning and exploring in the future.  Since my first encounter with Dr. John
Caughman, I have been greatly impressed by, and slightly jealous of, his fluency in one of
my favorite childhood topics, Mathematics.  I am grateful for the spark that he and
Dr. Steven Bleiler ignited, which allowed me to develop the CSP algorithm.
My mother, brother and sisters have been a beacon of guidance, a fount of love and a
whisper of encouragement.  I would be ungrateful to overlook the arduous
burden that I have levied upon my wife and children, from whom I thieved the time meant
for them for the sake of my PhD and who have tolerated my rants and unpleasantness as I
struggled to finish some code or papers.  Their love and support account for nine
tenths of my PhD.  I close by thanking my friends Naveed and Rania Ali, Steven
McGeady, Stephen Houze, Leora Gregory, Rohan and Kathryn Coelho and my fellow
co-workers at Renewable Funding for their unrelenting support and encouragement.

TABLE OF CONTENTS 
Abstract i  
Dedication v  
Acknowledgments vi  
List of Tables xv  
List of Figures xix  
Glossary of Terms xxxii  
Introduction 1  
Summary of Research Objectives 4  
My Contributions 10  
Chapter 1 Introduction 18
  1.1 Included in this research 26
  1.2 Excluded from this research 26
  1.3 Approach 27
  1.4 Organization of Chapters 28
Chapter 2 Background 30
  2.1 Feasibility of Quantum Computation 32
Chapter 3 Binary and Ternary Quantum Gates 37
  3.1 Introduction 37
  3.2 Binary Quantum Gates 37
    3.2.1 Inverter NOT gate 38
    3.2.2 Feynman or Controlled NOT gate 38
    3.2.3 Toffoli and MCT gates 39
    3.2.4 Fredkin or Swap gate 41
    3.2.5 Square Root of Not (V) gate 42
    3.2.6 Hadamard Gate 43
  3.3 Ternary (and Multiple Valued) Quantum Logic gates 43
    3.3.1 Galois Field 3 Logic – GF(3) 44
  3.4 Mathematical representation of qubit state 46
    3.4.1 Bloch Sphere and Quantum State 47
    3.4.2 Evolution of Quantum State 48
  3.5 History of Ion Trap 50
    3.5.1 Operation of the trap 53
    3.5.2 One Qubit Operation 55
    3.5.3 Entanglement 57
    3.5.4 CNOT Gate 58
    3.5.5 Quantum Cost Calculation 61
    3.5.6 Ternary Quantum gates in an ion-trap system 61
  3.6 Methods of Quantum Logic Synthesis 63
    3.6.1 Heuristic methods 63
    3.6.2 Cycle Decomposition 65
    3.6.3 Hierarchical Diagrams 67
    3.6.4 Linear Nearest Neighbor Model (LNNM) 68
    3.6.5 Measurement Quality of Reversible Gates 69
Chapter 4 MMDSN and MP algorithms 73
  4.1 Introduction 73
  4.2 Explanation of MMDs main idea 77
  4.3 MMDS and MMDSN Orderings 80
    1) Proof of Convergence: 84
  4.4 Multi-Pass Algorithm 85
  4.5 Results of the MMDSN/MP for more than four Variables 88
  4.6 Analysis and Conclusion 90
Chapter 5 Covered Set Partitions 93
  5.1 Introduction 93
  5.2 MMD Style Algorithms 95
  5.3 Partial Covering Set Partitions 103
  5.4 CSP Algorithm 104
  5.5 Experimental Results 105
  5.6 Analysis and Conclusion 108
Chapter 6 Covered Set Partition with Evolutionary Algorithms 115
  6.1 Introduction 115
  6.2 MMD Style Algorithms 116
  6.3 Anatomy of Covered Set Partition Algorithm 116
    6.3.1 Structure 116
    6.3.2 Steps for creating valid sequences 118
  6.4 Algorithmic Contest 120
    6.4.1 Objective function using Quantum Cost 120
    6.4.2 Method 1: A Random Skip and Hop 121
    6.4.3 Method 2: Genetic Algorithm 122
    6.4.4 Method 3: Tabu Search 124
  6.5 Experimental Results 126
  6.6 Conclusion and Analysis 131
Chapter 7 Other attempts at discovering better sequences 133
  7.1 Hamming Distance Predictor 133
  7.2 Absolute Distance Predictor 135
  7.3 Vector Length Predictor 136
Chapter 8 Linear Nearest Neighbor Model for Binary 139
  8.1 Introduction 139
  8.2 Toffoli Gate Cost 141
  8.3 Count of CNOT and CV Gates for MCT Gate 143
  8.4 LNN Quantum Cost of MCTn Gates 148
  8.5 Circuit Depth 153
  8.6 Experimental Results 154
  8.7 Conclusion and Analysis 155
Chapter 9 Multi-Dimensional LNNM Architecture 157
  9.1 Introduction 157
  9.2 MCT4 in 2-D 158
  9.3 MCT5 in 2-D 160
  9.4 Conclusion and Analysis 162
Chapter 10 Synthesis of Ternary Quantum Circuits 165
  10.1 Prologue 165
  10.2 Ternary Logic System 168
    10.2.1 Measurement of a qubit 168
    10.2.2 Trits and Ternary States 169
    10.2.3 Reversible Operations 170
  10.3 Ternary Reversible Operators 173
  10.4 Synthesis by Example 176
  10.5 Ternary Logic Synthesis Algorithm 179
    10.5.1 Control Line Blocking 180
    10.5.2 Ternary Hasse Input Sequence 181
    10.5.3 Construction of Input Sequence 183
    10.5.4 Hasse Precedence Quandary 186
  10.6 Convergence of Algorithm 188
  10.7 Convergence of Triangular Hasse Precedence Orders 195
  10.8 Selection through Genetic algorithm 197
    10.8.1 Objective function using Quantum Gate Count 199
    10.8.2 Genetic Algorithm 200
    10.8.3 Genotype and Valid Operators 201
  10.9 Experimental Results 202
  10.10 Acceleration with CUDA 205
  10.11 Conclusion and Analysis 210
  10.12 Derivation of Equation 212
Chapter 11 MV Benchmarks and Extensible Quantum Specification (XQS) 214
  11.1 Introduction 214
  11.2 Reversible Multiple-Valued Logic Functions 215
  11.3 Extensible Quantum Specification (XQS) 222
    11.3.1 Structure of YAML 224
  11.4 Example of Extensible Quantum Specification 224
    11.4.1 Hybrid Multiple-Valued Reversible Function 227
    11.4.2 Multiple-Valued Finite State Machine 228
  11.5 Benchmarks Organizational Description 229
  11.6 Conclusion 231
  11.7 Quantum Lib Function Generators 232
Chapter 12 Generalized Multiple Valued Swivel Gate 234
  12.1 Introduction 234
  12.2 Binary Swivel Gate 235
  12.3 Quantum cost of binary swivel gate 238
  12.4 MV Ternary SWAP Gate 238
  12.5 Ternary Swivel Gate 239
  12.6 Quantum Gate Count of Ternary Swivel Gate 240
  12.7 MV Swap Gate 241
  12.8 MV Swivel Gate 241
  12.9 Quantum Gate Count of Ternary Swivel Gate 242
  12.10 Conclusion 243
Chapter 13 Final Conclusion 244
  13.1 Accomplishments of my research 247
    13.1.1 Automated Logic Synthesis of Large Binary Specifications 247
    13.1.2 MMDSN Algorithm: 248
    13.1.3 Covered Set Partition Algorithm 249
    13.1.4 Redefinition of Benchmarks Measurement 250
    13.1.5 Ternary Quantum Synthesis 251
    13.1.6 Extended Quantum Specification and MV function generator 253
    13.1.7 Generalized Multiple Valued Swivel Gate 254
  13.2 Future Work 255
List of Publications 256
Bibliography 258

List of Tables
Table 3-1 Galois Field 3 operations (modulo 3) for a single qutrit with addition operator 
(a) and product operator in (b).  The addition operator uses the same symbol as 
the binary XOR symbol (⊕) since, mathematically, the binary XOR operation is a 
Galois Field 2 (modulo 2) operation. .................................................................... 45  
Table 3-2 Ternary operators based on GF(3) addition operator where each gate (operator) 
is a mathematical bijection with one-to-one and onto mapping from the input to 
the output. ............................................................................................................. 45  
Table 4-1 MMD method illustrated with truth tables of intermediate functions. Notation
a → c means c = c ⊕ a, which "inverts c if a=1". Control lines are underlined
and shaded minterms are completed and should not be modified.  The goal of the
algorithm is to insert quantum gates in order to transform the output vector
(column ABC) into the input vector (column abc = column 6).  Starting from the
top, the algorithm skips the first row since abc = ABC and processes the second
row where the output minterm '100' is transformed into the input minterm '001'
by placing two conditional inverters on qubits 'A' and 'C'.  The process continues
until column abc = column 6. ............................................................................... 78
Table 4-2 Comparison of MMDSN to both MMD and MMDS with respect to  quantum 
cost and duration of synthesis ............................................................................... 92  
Table 5-1 Results of synthesizing 16 functions with different numbers of variables for
every possible partition size for the specific function.  Each table includes the average
quantum cost over 5 runs, along with the standard deviation for the runs and the
ratio of the standard deviation to the mean (a measure of variance in results).  The
majority of results show consistency between runs with a variance below 3%,
while a few exhibited a higher error of around 9%.  The shaded number represents the
partition size with the lowest average quantum cost.  For functions below 12 bits,
all functions witnessed the best quantum cost where the partition size is around
half the number of variables. .............................................................................. 111
Table 6-1 Quantum Cost of elementary gates ................................................................. 121  
Table 6-2 Comparison for five large functions between four different selection methods
(random, GA with single point crossover, GA with double point crossover, and
Tabu search).  By far, the Tabu search has outperformed all other methods when
compared for the same partition size, as shown by the lightly shaded areas.  For
each function, the partition size also had an impact on the outcome: as with
the random selection, partition sizes around the midpoint of the number of
variables yield the best results (shaded in dark gray). ........................................ 129
Table 7-1 Hamming distance between input (ab) and output (AB) vectors.  In (a), the
HD=4 when the qubits are arranged in this manner; however, if the outputs are
swapped (BA), then the Hamming distance=0.  I hypothesized that rearranging the
qubits and synthesizing the modified function might yield better
results.  However, preliminary experimentation with different functions did not
indicate that this would be a successful strategy. ............................................... 134
Table 7-2 Absolute Distance |d| of minterms between input and output vectors.  In (a) the 
input minterm (01) is one position away from its location in the output vector.  
Similarly, the (10) input minterm is one position away from its location in the 
output vector, which combined, results in |d|=2.  Similarly for (b) where |d|=4. 136  
Table 8-1 LNNQC for MCT3 to MCT9 gates. ................................................................ 152  
Table 8-2 Current quantum cost used in literature compared to LNNQC. ..................... 153  
Table 8-3 RevLib Benchmarks with LNNQC [30] ......................................................... 156  
Table 10-1 Shows the band number where each minterm appears where the value of x is 
shown in the upper header.  For row (2x), the minterms (20, 21, 22) appear in 
bands (2,3,4) consecutively which is guaranteed by the structure of the Hasse 
diagram.  I rely on this ordering for the proof of convergence. .......................... 192
Table 10-2 All possible permutations vs. total permutations for Hasse based sequences
where all possible solutions are shown in column 3, while the number of solutions
constructed using the Hasse structure are shown in column 4. ........................... 199
Table 10-3 Comparison between using the natural order of the input vector vs. using the 
Hasse structure to construct valid input vector arrangements.  As the search space 
increases, the probability of discovering better solutions decreases while the time 
required to discover such solutions increases drastically. .................................. 203  
Table 10-4 Performance times for the HWT5 function which shows that, for CUDA, at
full thread capacity, it is able to execute at 55 microseconds per sample compared
to 304 microseconds for the CPU. ...................................................................... 206
Table 10-5 Structure of GA algorithm on both CPU and CUDA implementations. ....... 207
Table 10-6 Performance times for the HWT6 function where, at full capacity, CUDA
takes 376 microseconds per sample compared to the CPU at 645 710
microseconds per sample.   The CUDA speedup is lower than for the HWT5
function since the dataset for HWT6 is larger, forcing some of the data to exist in
CUDA memory buffers shared amongst multiple threads.  For the HWT5
function, all the data was able to fit in local memory buffers, each dedicated
exclusively to one processor core. ...................................................................... 208
Table 11-1 Example of multiple-valued (ternary) reversible function where AB are the 
inputs and PQ are the outputs. ............................................................................ 216  
Table 11-2 Matrix representation of ternary (a-c) and binary (d) quantum gates. ......... 217  
Table 11-3 Generalized Ternary Gates based on Galois Field 3 (GF3) with the gate name
given in the first column and the mathematical equation for calculating the
outputs based on the current input.  The third column shows the output for the
three possible inputs shown in the header, and finally the gate symbol is shown in
the last column. ................................................................................................... 219
Table 11-4 Superposition gates for self-inverting ternary (a-c) and binary (d) logic. ..... 219
Table 11-5 Ternary Square Root of [+1] and [+2] gates. ............................................... 221
Table 11-6 Basic Data Types supported by YAML language which can be nested to as 
many levels as needed.  The first column shows the YAML syntax of the type 
shown in the second column.  The last column represents the parsed YAML in 
most modern programming languages ................................................................ 224  
Table 11-7 Sample benchmark functions available on MV Benchmark Repository: 
http://quantumlib.cecs.pdx.edu. .......................................................................... 229  
Table 12-1 Quantum Ternary Operators based on GF(3) logic. ..................................... 238  
Table 12-2 Number of gates of swivel gates for binary, ternary and multiple valued radix-
r basis of computation. ........................................................................................ 242  
Table 13-1 Comparison of my binary synthesis algorithms, MMDSN and CSP, to the
work of others in the field.  My algorithm offers the best compromise for this
multiple optimization problem, where the number of variables is maximized while
the time of synthesis and the quantum cost are minimized without the addition of any
ancillary variables. .............................................................................................. 248

List of Figures
Figure 3-1 The classical inverter in (a) is unidirectional where it inverts the value of the
input 'a' while the quantum inverter in (b) is bidirectional (reversible) which
transforms 'a' into its complement 'ā' and vice versa. .......................................... 38
Figure 3-2 Controlled NOT gate where the ‘b’ qubit is inverted when the ‘a’ qubit = 1.  
Otherwise, the ‘b’ qubit remains unmodified.  The value of the ‘b’ qubit 
represents the XOR relation which is equivalent to the XOR classical gate.  The 
classical XOR gate is not reversible while the CNOT gate is reversible. ............. 39  
Figure 3-3 Demonstration of quantum reversibility of the CNOT gate.  An application of 
the first CNOT gate to the ‘ab’ inputs produces a⊕ b (middle table) on the ‘b’ 
qubit.  Applying the second CNOT gate reverses the operation and restores the 
original value of ‘b’.  The value of qubit ‘a’ remains unchanged throughout the 
entire process. ....................................................................................................... 39  
Figure 3-4 Equivalent implementation of the classical AND and NAND logic gates (a)
using the Toffoli quantum gate (b).  The NAND gate is implemented by strapping
the 'c' qubit to '1' which results in (1 ⊕ ab) = ¬(ab). ......................................... 40
Figure 3-5 Universal Toffoli gate (C2NOT) which is classically implemented by the 
AND/XOR gates (a), however, the quantum Toffoli gate (b) is reversible where, 
similar to the CNOT gate above, application of two Toffoli gates restores the state 
of qubit ‘c’ to its original value. ............................................................................ 41  
Figure 3-6 (a) Fredkin or Swap gate is a composite structure which is implemented using 
three CNOT gates arranged according to the figure in (b).  In classical logic, a 
swap of two bits is accomplished through the exchange of physical wires but in 
quantum technology a set of gates are used to exchange the information of two 
qubits. .................................................................................................................... 41  
Figure 3-7 Symbol of the Hadamard gate and its matrix representation.  The Hadamard
gate, which has no classical equivalent, places the qubit in a state of
superposition between the pure states of |0⟩ and |1⟩.  The gate is also reversible
where an application of two Hadamard gates back to back restores the original
value of the qubit. ................................................................................................. 43
Figure 3-8 Bloch sphere illustrating a qubit state ψ(t) where the North and South poles
are designated as the binary basis states of |0⟩ and |1⟩.  Unlike classical logic, the
qubit could assume any state between the basis states of |0⟩ and |1⟩ where such
states are symbolized by all the points on the surface of the unit sphere. ............ 46
Figure 3-9 Ion trap apparatus showing a vacuum chamber in the middle where ions are 
linearly aligned at the center of the narrow cavity. ............................................... 51  
Figure 3-11 Pictorial representation of the Paul ion-trap with ions trapped along the 
center z-axis.  Each ion acts as a quantum register (qubit) where finely detuned 
laser beams (shown as two arrows pointing upwards) represent the quantum gate.  
The laser beams are used to affect the state of each individual ion (a single qubit 
gate), or the state of two neighboring ions in an entangled state. ......................... 52  
Figure 3-14 Binary quantum system showing two energy levels, the ground |g⟩ and excited |e⟩
states, where an electron is transitioned between the two states with a finely
detuned laser pulse with the exact energy necessary to cause such transition. ..... 57
Figure 3-15 Ion's electronic excitation state coupled to its vibrational motion.  Each
electronic transition at frequency ω0 has sideband frequencies as a result of the
vibrational motion ω1. ........................................................................................... 59
Figure 3-16 Decomposition of the composite Toffoli gate into the five primitive 2-qubit
gates (CNOT and CV gates).  The Toffoli gate is assumed to have a quantum cost
of 5 which is the number of one and two qubit primitive gates necessary to
implement it. ......................................................................................................... 61
Figure 3-17 Implementation of ternary energy states of an ion-trap system.  In (a), transitions
between any two states are possible with a single pulse, which requires three
distinctly detuned laser systems for each qubit, while in (b) only transitions
between neighboring energy levels are possible using two distinctly detuned laser
systems but requiring two pulses for transitions between the states |0⟩ and |2⟩. .... 62
Figure 3-19 Cycle decomposition of 3-bit binary function shown in (a) as a table, and (b) 
in a Karnaugh map where the function is represented by the 3-tuple cycle (0,1,2) 
which is equivalent to the cycles (1,2,0) and (2,0,1).  The synthesis process 
decomposes each of the 3-tuple cycles into their equivalent 2-tuple cycles (P1, P2, 
P3) as shown in (b).  A cascade of 2-tuple cycles is then substituted with the 
corresponding gate cascade as shown in (c), (d) and  (e) and the solution with the 
least quantum cost is selected as the best solution. ............................................... 66  
Figure 3-20 Reed Muller Reversible Logic Synthesis algorithm [60] where the algorithm
starts from a set of equations rather than a truth table.  A solution is found by
recursively factorizing the set of equations in every possible way (at each node),
until all inputs and outputs are exactly the same.  The number of nodes for this
method grows exponentially and requires a huge amount of resources (memory,
computation, and time) and, as a result, is limited to a small number of variables. 68
Figure 3-21 The universal Toffoli gate is known to have a quantum cost of 5 which is the 
number of primitive two-qubit gates needed to implement (a); however, the first 
CV gate performs remote interaction between qubits ‘a’ and ‘c’ which violates the 
LNNM architecture.  An LNNM compliant composition of the Toffoli gate is 
shown in (b) at a cost of 11 two-qubit primitive gates which is further reduced to 
9 two-qubit gates as shown in section 8.2 below. ................................................. 69  
Figure 4-1 The solution circuit found from MMD in Table 4-1, drawn and created from
outputs to inputs.  The arrow shows the flow of signal from inputs to outputs. This
method is possible because each reversible gate used in this figure is its own
inverse. .................................................................................................................. 80
Figure 4-2 Algorithm and examples of control line blocking detection ........................... 82  
Figure 4-3 (a) MMDSN orders are created by first constructing a Hasse diagram where 
each level corresponds to the sum of digits at that level. (b) shows all possible 
transitions which have a single bit difference between the two minterms in the 
transition. (c) Finally, an MMDSN order is constructed from the Hasse diagram 
where all the minterms of lower levels have to be processed before any terms in 
the succeeding level. In this example, the order of levels is ({0}, {4,1,2}, {5,3,6}, 
{7}) which yields the input vector of the following order of minterms 
{0,4,1,2,5,3,6,7}. ................................................................................................... 83  
Figure 4-4 A valid MMDS order {0,2,1,3,4,6,5,7} for MMD-like binary synthesis which 
is shown to be algorithmically convergent according to MMDS convergence rule.  
This algorithm is outside the subset that MMDSN algorithm creates since the 
minterm ‘3’ (on the third level) is taken before the minterm ‘4’ on the second 
level. ...................................................................................................................... 84  
Figure 4-5 Quantum cost comparison of MMDSN to both MMD and MMDS for 50 
AHP functions which are randomly generated reversible specifications.  Each 
function has three points, each corresponding to one of the algorithms.  The dotted line 
represents a linear trendline which demonstrates that, on average, the MMDS 
algorithm gives the lowest quantum cost while the MMD yields the 
highest quantum cost.  The MMDSN results in quantum costs that are close to the 
MMDS and much better than MMD. .................................................................... 88  
Figure 4-6 Comparison of the performance of MMD’s algorithm to the MMDSN 
algorithm described in this chapter.  For MMD the single sample with natural 
binary order was processed.  For MMDSN the average quantum cost of 5 runs 
over 100,000 samples for each run is reported along with the standard deviation of 
the samples and the percentage of deviation compared to the mean.  The first 
improvement column reports the percentage improvement/degradation of 
MMDSN over MMD and the final column reports the final result of MMDSN 
which always considers the MMD sequence in the process. ................................ 89  
Figure 5-2 Row 2 illustrates control line blocking where the output minterm ‘10’ is to be 
mapped to the input minterm ‘01’.  Following the MMD method of synthesis, I 
can invert the lower bit of ‘10’ while using the upper bit for control.  Doing 
so, however, would alter the completed minterm in the first row from ‘11’ to ‘10’ 
(effectively restoring the original state of the first row).  Altering any completed 
minterm is a violation of the MMD algorithm which indicates that the algorithm 
is blocked and will never converge. ...................................................................... 98  
Figure 5-3 Construction of Binary Hasse Diagram for any 3-bit function starts with the 
minterm with all zeros first in the first level (level 0).  Every subsequent level 
includes all minterms which have the same sum of their digits.  Level 1, for 
example, has the minterms {001, 010, 100} since the digits of each minterm add 
up to ‘1’.  The last minterm has all digits set to ‘1’. ........................................... 100  
Figure 5-4 (a) MMDSN orders are created by first constructing a Hasse diagram; valid 
transitions from one minterm to another are shown in (b). A valid MMDSN order 
is constructed from the Hasse diagram where all the minterms of lower levels are 
processed before any terms in the succeeding level. .......................................... 102  
Figure 5-5 The input sequence of a four variable (B3B2B1B0) can be partitioned in 
multiple ways.  Using the upper variable (B3) only would create two partitions 
(visually represented as the two cubes) and the remaining bits (B2B1B0) are 
arranged according to a Hasse diagram (the levels of the Hasse diagram are 
shown as diagonal lines – described in earlier sections).  Using the upper two bits 
(B3B2) for partitioning would create 4 partitions (shown as the four planes) where 
the remaining two bits of each partition would also be arranged according to the 
Hasse structure.  A CSP input sequence is then constructed by arranging the 
minterms according to the natural order of the bits forming the partition then for 
each partition the remaining bits are arranged according to the MMDSN sequence 
described in the previous chapter. ....................................................................... 104  
Figure 5-6 Normalized quantum cost for 16 benchmark functions ranging from 5 to 13 
variables.  In order to visualize the outcome for functions of different numbers of 
variables, I scaled the quantum cost for all functions to the range between 0 and 
100% (y-axis) and scaled the partition size between 0 and 1 (x-axis).  The 
normalized quantum costs for functions of the same number of variables were 
averaged and plotted here.  Notice that the lowest quantum cost (normalized) for 
all functions is found for normalized partition sizes between 0.4 and 0.6; hence, 
the best quantum costs are for partition sizes which are close to the half-way point 
relative to the number of variables.  Notice the trendlines clearly show the 
minimums for each dataset are close to the middle partition. ............................. 106  
Figure 5-7 The graph plots the normalized partition size where the best normalized 
quantum cost was discovered from Table 5-1, in which, the partition size and best 
average quantum cost are shaded in gray.  Notice that for functions of 11 
variables or less, the best quantum cost occurred around the midpoint of the 
normalized partition size (between 0.4 and 0.6) which indicates that for functions 
of this number of variables, a partition size in the vicinity of ½ the number of 
variables typically yields the best results. ........................................................... 107  
Figure 5-8 Results for functions of 12 and 13 variables.  Similar to the other functions, 
the midpoint of the partition size appears to have the best normalized quantum 
cost.  The left edge represents the MMDSN algorithm which, in this case, suffers 
greatly in discovering the quantum cost. ............................................................ 109  
Figure 6-1 Covering Set Partitions with partition size=1 using bit 3 to create two 
partitions of 3 bits each (upper and lower cubes), or partition size=2 using bits 3-2 
to create 4 partitions of 2 bits each (four planes).  The dark line separating the 
partitions is referred to as the partition boundary where minterms are not allowed 
to cross such boundary in the process of rearranging minterms to create different 
input sequences. .................................................................................................. 119  
Figure 6-2 Genotype of a valid CSP input sequence showing valid and invalid mutations 
and crossover operations.  A valid crossover can only occur at the partition 
boundary for each specific partition and at the band boundary as stipulated by the 
Hasse diagram.  Valid mutations, in this case swap, can only swap elements 
within the band to assure that the Hasse order is not violated. ........................... 124  
Figure 6-3 Quantum Cost for the URF9 (9 bit) function for the four selection methods at 
each partition point.  Notice that, in general, the Tabu search performs the best 
compared to the other selection methods and that the Random selection method is 
always the worst.  Also notice that the best quantum cost for all methods happens 
around the midpoint of the number of variables (here at partition size = 4). ..... 127  
Figure 6-4 Plots of four functions with quantum cost vs. partition size for the four 
selection methods.  Similar to the urf5 function above, the Tabu search performs 
the best and all selection methods have lower quantum cost around the midpoint 
of the size of function. ........................................................................................ 132  
Figure 7-1 For the same sequence, swapping bit positions in the output and calculating 
the bit-to-bit Hamming distance vs. the quantum cost of the resultant circuit.  From 
the plot, it is clear that using a preprocessor to determine the arrangement of 
variables with the lowest hamming distance does not give a good predictor of the 
result. ................................................................................................................... 134  
Figure 7-2 Cumulative Positional Distance (CPD) of minterms vs. quantum cost.  I 
hypothesized that a preprocessor could, for each minterm, calculate the 
Absolute Distance |d| between its position in the input vector and its position in the 
output vector, and then accumulate all these distances to determine the CPD.  
The CPD could then, supposedly, be used as a predictor of which 
input/output sequence would yield better quantum costs.  However, as seen from 
the plot, no clear pattern emerges to foretell such a result. ................................. 135  
Figure 7-3 Difference in Euclidean vector length between input and output vectors as a 
predictor of best quantum cost does not appear to show a clear pattern of which 
candidate solutions could yield better quantum costs.  The vector length is 
determined by treating the value of each minterm as a decimal distance 
away from zero and calculating the Euclidean distance for the multi-dimensional 
input and output vectors.  The absolute difference between such 
values is plotted vs. the quantum cost. ................................................................ 137  
Figure 8-1 Toffoli gate decomposed into a set of 5 primitive gates.  Notice the remote 
interaction of the first CV gate which violates the LNNM architecture. ............ 142  
Figure 8-2 Toffoli gate with qubit interaction constrained to LNNM where the qubits of 
the first CV gate are brought next to one another through a set of swap gates.  
The first 3 CNOT gates act as a swap gate which brings qubit ‘a’ next to ‘c’, and 
the next set of CNOT gates restores qubit ‘a’ to its original 
location.  The quantum cost here is 11 but since the two highlighted CNOT gates 
would cancel each other, the quantum cost of the LNNM Toffoli gate is 9. ...... 142  
Figure 8-3 Decomposition of the 4-bit MCT gate without consideration of LNNM.  Most 
algorithms assume this gate to cost 13 primitive gates, which ignores the fact that 
there are five gates which interact remotely in violation of the LNNM 
architecture. ......................................................................................................... 144  
Figure 8-4 Decomposition of the 5-bit MCT gate into primitive 2 qubit gates without 
consideration of the LNNM architecture. ........................................................... 145  
Figure 8-5 LNNM equivalent of CNOT gate acting on qubits ‘a’ and ‘d’ which are two 
qubits apart.  Rather than a quantum cost of 1, the LNNMQC = 13 (a) which is 
further reduced to 9 (b) once the shaded gates cancel one another. ................... 149  
Figure 8-6 4-qubit swivel gate which pivots the qubits around the center point between 
them (abcd → dcba).  The binary swivel gate of ‘n’ variables requires n² − 1 
CNOT gates to implement. ................................................................................. 150  
Figure 8-7 Method for minimal swapping of distant two-qubit gates within a 
decomposed MCTn gate.  In stage 0, rather than adding 4 swap gates to bring 
qubit ‘a’ next to ‘f’ and then 4 swap gates to bring it back, I opted to bring ‘f’ next 
to ‘a’, and slowly bring ‘f’ back through each stage.  Qubit ‘f’ is the target line 
and it interacts with all qubits in each of the stages.  This method reduces the 
number of swap gates necessary to implement this MCT6 gate.  Also notice that 
recursive patterns emerge in every stage which build on the structure of the 
previous stage. ..................................................................................................... 151  
Figure 9-1 MCT4 in two dimensional grid layout where, now, qubits ‘a’ and ‘d’ are 
considered neighbors and the first CV gate is compliant with the LNNM model (a 
cost of 1) compared to the linear arrangement which requires 8 additional CNOT 
gates to bring it into compliance.  Notice that the pairs ‘ac’ and ‘bd’ are still one 
qubit apart. .......................................................................................................... 158  
Figure 9-2 MCT4 in a planar layout with qubit ‘a’ duplicated into two additional ancilla 
qubits in order to bring pairs ‘ac’ and ‘bd’ next to one another.  Notice that with this 
arrangement, ‘b’ and ‘c’ are now the only pair which is one qubit apart.  Using 
this arrangement reduced the quantum cost further to 21 (from 29) at a cost of 2 
ancilla qubits. ...................................................................................................... 159  
Figure 9-3 MCT5 gate with a star-shaped 2-D layout centered around the target qubit ‘e’.  
The LNNMQC for this arrangement (85) is actually worse than the linear 
arrangement of qubits (77) since now qubits ‘a’, ‘b’, ‘c’, and ‘d’ are each remote 
to one another which is not the case for the linear arrangement. ........................ 160  
Figure 9-4 A modified planar arrangement of the MCT5 gate with the duplication of 
qubit ‘a’ into additional ancilla qubits.  Since qubits ‘a’ and ‘e’ interact with all 
other qubits, our strategy is to bring these two qubits as close to the others as 
possible.  Since qubit ‘a’ never acts as a target qubit, it is easy to just mirror its 
value upfront to other ancilla qubits which will interact in a ‘neighborly’ manner 
with the other qubits.  This pattern reduced the quantum cost to 53 compared to 
the linear arrangement of 77. .............................................................................. 161  
Figure 9-5 Another improvement of the planar arrangement of the MCT5 gate where 
qubit ‘b’ was transported to another location which makes it next to both qubits 
‘c’ and ‘d’.  This shuttling of qubit ‘b’ can only occur after the first stage once 
qubit ‘b’ is no longer a target.  The LNNM quantum cost is further reduced to 43 
from 53. ............................................................................................................... 162  
Figure 10-1 Measurement of a qubit in an optical quantum system can be performed by 
placing horizontal and vertical filters along with photon detectors on two 
orthogonal axes for detecting the number of photons at each point.  One of the 
detectors is assigned the state |0⟩ and the other the state |1⟩. ............................. 169  
Figure 10-2 (a) Functions f and g are non-reversible, neither separately nor combined (fg), since 
the value 01 is repeated twice on the output and there is no way to reconstruct the 
input; (b) ax is a reversible function since it is possible to determine the input 
minterm for each output minterm. ...................................................................... 171  
Figure 10-3 Symbols for quantum binary Identity (wire) and NOT gates. .................... 172  
Figure 10-4 Generalized Ternary Gates with the gate name given in the first column and 
the mathematical equation for calculating the outputs based on the current input.  
The third column shows the output for the three possible inputs shown in the 
header, and finally the gate symbol is shown in the last column. ....................... 173  
Figure 10-5 Ternary energy states of an ion-trap system.  In (a) transitions between any 
two states is possible with a single laser pulse which would require three finely 
detuned laser rays targeted at each ion.  In (b) only transitions between 
neighboring energy levels are possible but only two laser rays are required. .... 174  
Figure 10-6  (a) Ternary Inverter vs. (b) binary Inverter; (c)  ternary controlled op (C-OP) 
vs. (d) Feynman CNOT; and (e) ternary (C2-OP) vs. (f) Toffoli gate (C2-NOT).
 ............................................................................................................................. 175  
Figure 10-7 Ternary Synthesis Example transforming output vector AB (column 2) into 
input vector ab (columns 1 and 8).  The [[+1]] in the header represents the ternary 
gate inserted at that point which is also shown pictorially in the circuit below the 
column. The shaded values indicate the values impacted by the gate; the [[+1 1]] 
in the header of column 5 indicates that the lower qutrit is used as a control of 
value ‘1’ while the upper qutrit has the [[+1]] gate as shown in the pictorial below 
the column. .......................................................................................................... 177  
Figure 10-8 The algorithm reaches the shaded minterm, which needs to be 
transformed from 20 to 02.  If the upper trit is used as control to avoid changing 
the second minterm, any gate applied to change the lower trit from 0 to 2 will also 
change the first minterm.  The same thing will happen if I use the lower trit (0) as 
control where in this case, the second minterm will be altered.  Either way, this is 
a violation of the algorithm and the algorithm is blocked at this point .............. 181  
Figure 10-9 Structure of the ternary Hasse diagram for a 2-variable function.  Each level 
(band) contains the set of minterms which has the same mathematical sum of 
digits. The sum of digits represents the band number shown on the right hand 
side. ..................................................................................................................... 183  
Figure 10-10 Construction of the Ternary Hasse Diagram for a 2-variable ternary 
function.  (a) Starting from the bottom with 00, I add two new minterms by adding 
1 to each digit resulting in (01, 10).  (b) For each new minterm (e.g. 01), I repeat 
the process where I add 1 to each digit resulting in (02, 11).  (c) For each digit, the 
process is repeated until the upper value ‘2’ is reached. ..................................... 184  
Figure 10-11 Three possible Ternary Value Precedence diagrams where the ternary 
values can be arranged with different precedence orders.  (a) indicates a natural 
mathematical order of precedence.  (b) indicates that the values 0 and 1 are 
equivalent in precedence and are lower than 2.  (c) indicates that 1 and 2 are at the 
same level and are both above 0.  The precedence order plays a role in the 
structure of the Hasse diagram and the consequent construction of ternary input 
sequences. ........................................................................................................... 187  
Figure 10-12 Precedence order of low and high bits in a ternary Hasse diagram.  This 
Hasse structure follows the natural precedence order (0→1→2) which is also 
evident for each specific digit as it moves from one level to a higher level.  The 
lower digit exhibits the same pattern in the NW direction, while the upper bit 
exhibits the pattern in the NE direction. ............................................................. 191  
Figure 10-13 Example for case 4c.  In an attempt to change 111 to 120, the two lower 
trits have to be modified one at a time.  Were I to attempt changing the lower trit 
from 1 to 0 first, I would have had to use the ‘11’ control pattern which was 
encountered before, and would have modified the first minterm (110). But by 
always doing ‘low to high’ transitions first (pass 1), in this case the middle trit 
(from 1 to 2), the output minterm will change properly to allow for new control 
values to emerge which would not affect previously completed minterms.  In pass 
2, I can safely modify the lower trit from 1 to 0 since the ‘new controlling value 
(12)’ will be guaranteed not to modify any previously completed minterms. .... 195  
Figure 10-14 (a) One of the triangular precedence orders ({0}→{1,2}) can be represented 
by the binary equivalent class ({0}→{1}) since the {1,2} are of the same 
precedence level.  (b) The ternary Hasse diagram based on this triangular 
precedence order can be reduced to the binary equivalent Hasse diagram shown to 
the right. .............................................................................................................. 196  
Figure 10-15 Number of permutations for all possible input vectors (all possible 
solutions) vs. Hasse based sequences (valid solutions constructed by this 
algorithm). ........................................................................................................... 198  
Figure 10-16 Genotype of a valid input sequence showing valid and invalid mutations 
and crossover operations.  A valid crossover can only occur at the band boundary 
as stipulated by the Hasse diagram.  Valid mutations, in this case swap, can only 
swap elements within the band to assure that the Hasse order is not violated .... 202  
Figure 10-17 Time required to synthesize the family of Hidden Weighted Trit functions.
 ............................................................................................................................. 205  
Figure 11-1 Example of multiple-valued reversible function represented as a circuit.  In 
this illustration, the A=P qubit is used as a controlling value for the gate [[02]].  
When A is set, the [[02]] gate is active, otherwise, it is a passthrough. ............. 217  
Figure 11-2 Operation of Ternary Inverters: (a), (b) and (c) show three self-inverters, while (d) 
shows the two complementary inverters (+1 and +2). (e) shows substitution of +1 
to (12, 02) and +2 to (02,12) where the middle 02 gates cancel one another, and 
then the 12 gates cancel one another yielding identity; hence, +1 and +2 are 
complementary inverters. .................................................................................... 221  
Figure 11-3 (a) Example of binary reversible quantum circuit and (b) its specification in 
the PLA format. .................................................................................................. 223  
Figure 11-4 Pseudo-code of an XQS specification describing a Full Adder with 2 inputs 
and 1 output.  The implementation section shows a solution for this adder. ...... 225  
Figure 11-5 XQS specification of a binary/ternary hybrid function where the radix is 
defined for each qubit (322 is ternary for the upper qubit, and binary for the other 
two).  “Don’t care” is shown as ‘-‘ in the output variable definition. ................. 227  
Figure 11-6 XQS specification of Moore State Machine. .............................................. 228  
Figure 11-8 Index page of quantum function generator. ................................................ 232  
Figure 11-9 Details of function along with downloadable specification ......................... 233  
Figure 12-1 (a) LNNM rendition of distant CNOT gate (on far left) which is brought into 
LNNM compliance with the aid of swap gates (block of 3 CNOTs), (b) minimized 
version of (a) where the shaded CNOT gates cancel one another. ..................... 235  
Figure 12-2 Pictorial of swivel gate with (a) odd number of qubits, (b) even number of 
qubits. .................................................................................................................. 236  
Figure 12-3 Four-bit binary swivel gate composed of a set of CNOT gates.  The upper 
right triangle shows the well-known swap gate composed of three CNOT gates.
 ............................................................................................................................. 237  
Figure 12-5 4-qubit ternary swivel gate which has a structure similar to that of its binary 
counterpart with the addition of the 2x gate and the double return gates at the far 
right. .................................................................................................................... 240  
Figure 12-7 Four variable Multiple Valued (radix = r) Swivel Gate which has the same 
structure as the ternary and binary counterparts with the use of bias gates (r-1)x 
and the (r-1) cascade of controlled gates at the far right. .................................... 242  
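
The level-by-level ordering described in the captions of Figures 4-3, 4-4 and 5-3 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used in this dissertation: the function names `digit_sum`, `hasse_levels` and `respects_levels` are hypothetical, and the check covers only the level constraint (every minterm of a lower level is processed before any minterm of the next level), not the full MMDS convergence rule.

```python
def digit_sum(minterm: int, n_vars: int, radix: int = 2) -> int:
    """Sum of the digits of `minterm` written with n_vars digits in `radix`."""
    total = 0
    for _ in range(n_vars):
        total += minterm % radix
        minterm //= radix
    return total


def hasse_levels(n_vars: int, radix: int = 2) -> list[list[int]]:
    """Group all minterms by digit sum; index k of the result is level k."""
    levels = [[] for _ in range(n_vars * (radix - 1) + 1)]
    for m in range(radix ** n_vars):
        levels[digit_sum(m, n_vars, radix)].append(m)
    return levels


def respects_levels(order: list[int], n_vars: int, radix: int = 2) -> bool:
    """True if no minterm appears before a minterm of a lower level."""
    sums = [digit_sum(m, n_vars, radix) for m in order]
    return all(a <= b for a, b in zip(sums, sums[1:]))


# Order quoted in the caption of Figure 4-3 (respects the levels).
print(respects_levels([0, 4, 1, 2, 5, 3, 6, 7], 3))
# Order quoted in the caption of Figure 4-4 (minterm 3 precedes minterm 4).
print(respects_levels([0, 2, 1, 3, 4, 6, 5, 7], 3))
```

Under these assumptions the first order passes the level check and the second does not, which matches the captions: the second order is a valid MMDS order but lies outside the subset that MMDSN generates. Passing `radix=3` gives the bands of the ternary Hasse diagram in the same way.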
  
 
 
Glossary of Terms 
Ancilla Qubits Qubits which are introduced for the sake of holding intermediate 
results of calculations and are not part of the original specification.  
They are also referred to as garbage bits. 
Ancillary Ratio The ratio of synthesized circuit width to the number of input 
variables which reflects the magnitude of increase of circuit width 
due to the addition of ancilla qubits.  An ancillary ratio of one(1) 
indicates no ancilla bits. 
Application Specific Quantum Circuit (ASQC) Also known as a quantum oracle; refers to the automated 
synthesis of a quantum circuit, using the set of primitive quantum 
gates, for any given specification. 
Band Used in the construction of input vector sequences from a Hasse 
diagram where band n holds all minterms whose individual digits 
add up to the decimal number n.  For example, the minterms (011, 
110, 101) belong to band 2. 
Bijective A bijection, or one-to-one correspondence, is a function giving 
an exact pairing of the elements of two sets.  Every element of one 
set is paired with exactly one element of the other set, 
and every element of the other set is paired with 
exactly one element of the first set. (There are no 
unpaired elements.) 
Bloch Sphere In quantum mechanics, the Bloch sphere is a 
geometrical representation of the pure state space 
of a 2-level quantum system represented as points 
on the surface of the unit sphere in three 
dimensions.  Alternatively, it is the pure state space of a 1 qubit 
quantum register. 
CⁿNOT Controlled NOT, or Multi Control Toffoli (MCT), gate with n 
control qubits and a single target qubit. 
Completely Mapped Pair For an MMD style algorithm, an input/output pair which has 
already been mapped and should not be altered by any subsequent 
synthesis step. 
Control line A qubit whose value represents a conditional switch to enable or 
disable the operation of a quantum gate of another qubit.  In binary 
circuits, typically the value of one(1) allows the controlled (target) 
gate to operate while a value of zero(0) disables the target gate. 
Control Line Blocking A condition that occurs during the synthesis process which 
prevents MMD style algorithms from converging because later 
synthesis steps cannot proceed without altering completely mapped 
pairs, which is a violation of the convergence criteria. 
Convergence The ability of an algorithm to always compute a solution 
without suffering from the control line blocking syndrome. 
Crossover In genetic algorithms, crossover is a genetic operator used to 
vary the programming of a chromosome or chromosomes from one 
generation to the next.  In other words, crossover is a process of 
taking more than one parent solution and producing a child 
solution from them. 
CSP Acronym for Covered Set Partition which, for an n-variable 
function, is a method of subdividing the input vector into partitions 
of size k-bits, k < n where partitions are processed in sequential 
order according to the natural order of the k-bits. 
CV Controlled V gate, which enables the V gate when the control 
qubit is one(1) and disables it when the control qubit is zero(0).  
Cycle Decomposition A process where a permutative specification S → S is redefined 
as the product of subsets each consisting of a disjoint cycle. 
FPRM Fixed Polarity Reed Muller decomposition 
Fredkin Gate Invented by Edward Fredkin, is the controlled swap gate of three 
qubits which swaps the last two bits when the control bit is one(1).  
It is a universal gate which means that it can be used to construct 
any quantum binary logic circuit. 
Genetic Algorithm A heuristic search algorithm used whenever an exhaustive 
search is impractical or impossible.  A genetic algorithm mimics 
the biological process of natural evolution employing procedures of 
inheritance, mutation and random selection. 
Genotype In a genetic algorithm, the genotype represents the structure of a 
chromosome (solution). 
Grover’s database search algorithm Invented by Lov Grover in 1996, is a quantum algorithm for 
searching an unsorted database with N entries in O(N^(1/2)) time and 
using O(log N) storage space. 
Hasse Diagram By Helmut Hasse (1898-1979), is a type of mathematical 
diagram used to represent a finite partially ordered set in a form of 
directed acyclic graph representing the relation (S, ≤) where 
vertical position represents the relation ≤ between elements of S 
such that a ≤ b when a is below b. 
Heuristic Refers to experience based techniques for problem solving, 
learning and discovery where an exhaustive search is impractical. 
Hidden Weighted Bit/Trit functions Reversible binary/ternary functions where the output minterm 
is generated by circularly shifting the input minterm by the number 
of its non-zero bits/trits. 
Ion Trap Special apparatus which confines charged particles into a 
definite region in space, typically in a linear array of particles 
uniformly lined like a tightly held string of beads.  An ion-trap 
provides the environment for quantum computation where, on one 
hand, qubits must be isolated from the surrounding environment in 
order to minimize decoherence (loss of internal state), while on the 
other hand, qubits must be individually addressable by external 
laser pulses to transform their internal state according to the 
quantum gate under operation. 
 
Kronecker Product Used to assemble the composite operator spanning multiple 
qubits when control lines are not present between the qubits. 
 
For example: 
 
LNNM Linear Nearest Neighbor Model which constrains interaction 
among qubits to their nearest neighbors; complying with this 
constraint requires swapping qubits. 
MCTn Multi-Control Toffoli Gate of n variables. 
MMD algorithm Refers to Miller, Maslov and Dueck’s algorithm for synthesis of 
binary quantum circuits 
MMDS algorithm Stedman’s improvement over MMD where he discovered the 
formula for detecting convergent sequences. 
MMDSN algorithm Nouraddin’s (and Hawash’s) improvement, which limits the generation of input 
sequences to those constructed using the Hasse diagram. 
Mutation Genetic algorithm operator used to imbue diversity into the 
population by altering the phenotype of a solution. 
NCT library NOT, Controlled-NOT, Toffoli library of gates. 
NMR Nuclear Magnetic Resonance is used to manipulate the state of 
qubits in order to implement quantum primitive gates. 
Qubit Binary Quantum digit 
Qudit d-valued Quantum digit 
Qutrit Ternary Quantum digit 
Recombination Genetic algorithm operator where two genotypes (solutions) are 
used to breed an offspring through n-point crossover. 
Reversible Specification Represents a function in the form of a truth table or equation 
where every input pattern maps to a unique output pattern. 
Tabu Search A metaheuristic local search algorithm which iteratively moves 
from one potential solution x to an improved solution x′ in the 
neighborhood of x until some stopping criterion is satisfied.  
Target line (qudit) The qubit on which a quantum gate applies its operation or 
produces its results. 
Ternary 
Precedence 
Order 
Precedence hierarchy amongst the three members of the ternary 
domain {0, 1, 2}, which form an ordered graph {(a, b) | a, b ∈ {0, 1, 2}, a ≤ b}. 
Toffoli Gate Invented by Tommaso Toffoli, a universal reversible logic gate 
of two control qubits (a , b) and a single target qubit (c) where c = 
ab ⨁ c.  
 
 
  
 
Introduction 
This dissertation is devoted to quantum computing and information processing from 
an engineering perspective, with a specific focus on automated and efficient synthesis of 
quantum logic circuits with a relatively large number of variables.  My research topic 
assumes that there exists a set of universal reversible quantum gates capable of operating 
as binary and ternary logic gates, which have been experimentally demonstrated to 
operate in accordance with the principles of quantum mechanics [15, 18].  The physical 
design of such quantum gates and the fabrication of quantum circuits, however, are 
outside the scope of this dissertation.   
The entire body of my research is centered on developing and analyzing algorithms to 
synthesize quantum logic specifications with a large number of variables.  My work has 
progressed through four successive and complementary stages: 
Stage 1: I started by focusing on the problem of synthesizing binary quantum 
specifications and attempted to automatically construct quantum circuits while 
minimizing the number of primitive quantum gates used.  In this stage, I adhered to 
the prevalent theoretical assumption that any two qubits can interact remotely 
regardless of the number of other qubits between them.  The artifacts of this effort 
are a set of algorithms capable of synthesizing any arbitrary reversible binary 
specification, which attempt to minimize the objective function measuring quantum 
cost by employing classical techniques in data structures, random selection, genetic 
algorithms and Tabu search. 
Stage 2: I later realized, however, that certain technological constraints exist in a 
physical implementation of a quantum circuit.  For example, Ion Trap and Nuclear 
Magnetic Resonance (NMR) technologies limit interaction amongst qubits to their 
nearest neighbors, and as a result, the information held within two distant qubits has 
to be relocated to two neighboring qubits in order to facilitate the interaction (i.e., 
computation), and the result is then relocated back to the source qubits.  I realized 
that such a restriction greatly impacts the yardstick employed in 
measuring circuit performance, and therefore, I shifted my focus to the Linear 
Nearest Neighbor Model (LNNM).  Since then, I have continued to calculate and promote 
a set of standard benchmarks based on the LNNM architecture.  As part of my work, 
I have recalculated the cost of the most commonly used gate, the Multi-Control 
Toffoli (MCT) gate, and suggested an efficient LNNM quantum cost for the MCT 
gate family.  I then recalculated the performance of my latest binary synthesis 
algorithms against the LNNM metric. 
Stage 3: Given the higher information-carrying capacity of a ternary quantum digit 
(qutrit), I also examined synthesis of quantum ternary specifications and applied methods 
similar to my binary work from the first stage to the ternary domain.  As I was trying to 
synthesize ternary functions larger than those available in the existing literature, I 
defined and introduced a set of new benchmark functions into the literature.  I 
initially introduced the Hidden Weighted Trit (HWT) [19] functions, which extend 
the well-known Hidden Weighted Bit (hwb) binary benchmark functions [20].  I 
later introduced seven additional Multiple-Valued (MV) functions and built an 
online multiple-valued function generator capable of creating any of the defined 
functions for any radix and any number of variables.  In the process of introducing 
these functions, I also realized that the existing file formats used to define quantum 
specifications, the PLA and REAL formats, are inadequate for defining 
multiple-valued and hybrid quantum functions.  As a result, I invented and introduced a new 
file specification, the Extensible Quantum Specification (XQS), which is specifically 
designed for multiple-valued and hybrid quantum specifications. 
Stage 4 (current work): As the number of variables increases, by definition of the 
specification or as a result of the introduction of ancillary qubits, the number of 
control lines required to implement the circuit grows.  For a Multi-Control Toffoli 
(MCT) gate, every additional control qubit effectively doubles the number of 
primitive gates necessary to simulate the MCT gate, i.e., doubles the quantum cost.  
In this stage of my research, I am exploring post-processing methods in order to 
group gates with similar control patterns and partition their control functionality into 
subgroups through the introduction of a minimal number of ancillary qubits. 
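The doubling effect described in Stage 4 can be illustrated with a short sketch. It assumes the ancilla-free cost model commonly quoted in the reversible-logic benchmark literature, in which a Toffoli gate with c >= 2 controls costs 2^(c+1) - 3 primitive gates (5 for the standard Toffoli gate, 13 for three controls). This formula and the function name `mct_quantum_cost` are assumptions made for illustration; they are not the equations derived in this dissertation.

```python
def mct_quantum_cost(controls: int) -> int:
    """Ancilla-free quantum cost of a Toffoli gate with `controls` control lines.

    Assumed cost model (not taken from the dissertation): NOT and CNOT cost 1,
    and a gate with c >= 2 controls costs 2**(c + 1) - 3 primitive gates.
    """
    if controls < 0:
        raise ValueError("number of controls must be non-negative")
    if controls <= 1:
        return 1  # NOT (0 controls) and CNOT (1 control) are primitive gates
    return 2 ** (controls + 1) - 3


if __name__ == "__main__":
    previous = None
    for c in range(2, 9):
        cost = mct_quantum_cost(c)
        growth = "" if previous is None else f"  (x{cost / previous:.2f} vs. previous)"
        print(f"{c} controls: cost {cost}{growth}")
        previous = cost
```

Under this assumed model the growth factor approaches 2 as controls are added, which is why splitting a wide control pattern across ancillary qubits can pay off.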
  
 
Summary of Research Objectives 
I. Binary Logic Synthesis 
Problem Statement: Automated synthesis of binary specifications into quantum 
circuits using the Not, C-Not and Toffoli (NCT) library. 
Description: Because quantum logic synthesis is a combinatorial problem by nature, 
synthesis algorithms have been limited to a low number of variables (3-4) due to the 
enormous computing resources required to explore all (2^n)! possible solutions for the 
set of Not, Controlled-Not and Toffoli gates (NCT library).  The solution space is much 
larger, in fact, if we choose to include other sets of gates for the synthesis process. 
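To give a sense of scale, the following sketch evaluates the factorial of 2^n, read here as the number of permutations of the 2^n input patterns of an n-variable reversible specification. The function name `search_space_size` is hypothetical and the snippet is only an arithmetic illustration.

```python
import math


def search_space_size(num_variables: int) -> int:
    """Number of permutations of the 2**n input patterns of n binary variables."""
    return math.factorial(2 ** num_variables)


if __name__ == "__main__":
    for n in range(1, 6):
        size = search_space_size(n)
        print(f"n = {n}: (2^{n})! has {len(str(size))} decimal digits")
```

Already at n = 3 the count is 40320, and each additional variable squares the number of input patterns being permuted, which is why exhaustive exploration stalls at a handful of variables.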
Objective: Devise methods for exploring the enormous search space of a large 
number of binary variables (9+) and discover valid solutions with low quantum cost.  
In order to compare the results of different synthesis algorithms, I use the de facto 
benchmark metrics found in the literature, where the quantum cost of the circuit is 
measured by the number of one- and two-qubit gates needed to implement the circuit. 
Completed Work: I have completed work in this area with the publication of 4 papers: 
1. MMDSN algorithm [21, 22]. 
2. CSP-R algorithm with Random selection [23]. 
3. CSP-EV algorithm with selection using evolutionary algorithms (Genetic algorithm and 
Tabu search) [24].  
  
 
II. Binary Logic Synthesis with LNNM 
Problem Statement: Automated synthesis of binary quantum circuits with 
consideration of the technological constraint of nearest neighbor interaction between 
qubits. 
Description: The majority of logic synthesis algorithms calculate the complexity of 
the resultant circuit using an idealistic cost of a quantum gate without regard to the 
limitations of such gates from a physics point of view.   
For example, Ion trap and NMR (both leading technologies for quantum gates) 
constrain interaction between qubits to their nearest neighbors; yet, the majority of 
existing algorithms disregard this fact when calculating the cost. 
It is not realistic to compare results amongst different methods of synthesis where 
some algorithms consider controlled gates with a large distance between the control 
and target qubits to be equivalent in cost to the primitive gates where the target and 
control qubits are next to one another.  
Objective: (1) Analyze and (2) compare the impact of enforcing the nearest neighbor 
model upon the quantum cost of a circuit and (3) advocate the LNNM architecture as 
the foundation for establishing new metrics for calculating the cost of binary 
quantum circuits. 
Completed Work: I calculated the LNNM quantum cost of the well-known 
Multiple Control Toffoli (MCT) gates and derived a set of equations that determine a 
minimized quantum cost based on the LNNM architecture.  I also proposed a 
two-element tuple, (lnnqc, ancillary ratio), as a definition of quantum cost which 
considers both the width and depth of a circuit.  I believe that this definition will 
allow for a level comparison amongst different methods of quantum logic synthesis [25].  
Finally, I re-implemented my latest binary algorithm (CSP) to calculate the 
LNNQC measure of circuit performance.  I am also currently exploring 
post-processing algorithms for sharing control lines among consecutive gates in an effort 
to drastically reduce the LNNQC. 
  
 
III. Multiple Valued (Ternary) Logic Synthesis 
Problem Statement: Automated and efficient synthesis of ternary specifications 
into quantum circuits using the Permutative Ternary Gate (PTG) library based on 
Galois Field 3 logic. 
Description: Quantum mechanical systems are inherently suitable for multi-valued 
logic with much higher information density per quantum digit.  However, the majority of 
research today has focused on binary systems, and the limited research in 
multi-valued logic synthesis has been demonstrated only for functions of two 
variables. 
Objective: Explore logic synthesis algorithms for multiple-valued systems with a 
large number of variables relative to the radix of the logic considered.  For ternary 
systems, the search space grows as (3^n)!, and 5 ternary variables carry roughly the 
same information as 8 binary variables (3^5 = 243 versus 2^8 = 256).  
Completed Work: HP3 algorithm [26, 27, 28, 29]: 
1. Synthesis algorithm for up to 9 ternary variables. 
2. Introduced 8 new multiple-valued functions into the literature. 
3. Published an online multiple-valued function generator and a repository of 
MV functions at http://quantumlib.cecs.pdx.edu . 
4. Introduced the Extensible Quantum Specification (XQS) file format, specifically 
designed for multiple-valued and hybrid quantum functions.  
  
 
IV. Collaborate 
Motivation: The following is a personal objective that I do not intend to be a 
requirement for my PhD.  Unlike my work in industry, which dictates secrecy, I 
intend for all my academic work to be available to anyone interested in using it 
for the advancement of this field.  I firmly believe in the spirit of collaboration and 
the principles of open source. 
Objective: I realized that there is a unique opportunity for establishing an online 
forum for discussion and hosting an open source repository for source code, 
publications and references. I also believe that it would be beneficial for me 
personally, and for others, to build a compendium of existing synthesis methods in 
the field of quantum logic synthesis. 
Completed Work: 
1. Established an open source repository containing all source code for all 
variants of the binary and ternary algorithms: http://github.com/quhub 
2. I relocated my quantumlib.org site to the Portland State University’s 
subdomain http://quantumlib.cecs.pdx.edu. 
3. QuGenerator: I implemented a new function generator which is capable of 
generating specifications for quantum functions with any radix and number of 
variables: http://quantumlib.cecs.pdx.edu/specifications.  
  
 
4. QuCircuit: I created an online tool for converting a quantum circuit definition 
to a circuit diagram: http://quantumlib.cecs.pdx.edu/circuits. 
5. The majority of my work has been in collaboration with professors and 
students at Portland State University [21, 22, 25, 23, 26], in Palestine [24, 23] 
and in Japan [28, 27]. 
  
  
 
My Contributions 
MMDSN/MP Algorithms Collaborators: N. Alhagi 
Reference: Chapter 4 MMDSN and MP algorithms 
Motivation: Alhagi designed this algorithm, based on the Hasse diagram, to discover 
and synthesize quantum binary circuits.  He attempted to implement the algorithm in C 
but lacked proficiency in programming, so he resorted to demonstrating its operation 
manually, including a manual random number generator and hand calculation.  Although 
he hypothesized correctly that the algorithm would converge, he could not provide 
experimental results to demonstrate it. 
The algorithm was intended to extend the MMD algorithm to address other possible 
sequences, as MMDS did, but to do so in a reasonable amount of time and to perform 
the synthesis for larger specifications.  MMDS was computationally limited to 5 bits 
since it attempted to synthesize every possible permutation of sequences.  The 
MMDSN/MP algorithm proved successful in providing better results than MMD and 
tremendously faster results than MMDS, and in handling functions of up to 32 bits. 
My Contribution: Dr. Perkowski recommended that I collaborate with Alhagi to help 
him with this algorithm and to learn about quantum logic synthesis.  I learned the details 
of the algorithm, implemented the entire algorithm in C++ and demonstrated that the 
algorithm converges for any arbitrary reversible binary function.  I also calculated 
benchmarks demonstrating that this algorithm provides better quantum cost than MMD 
and takes a fraction of the time that MMDS does. 
I then edited (with Dr. Perkowski), finalized and submitted the paper for the ISMVL 
2010 conference and for publication in the FACTA UNIVERSITATIS journal. 
 
  
  
 
CSP Algorithm Collaborators: Amjad Hawash 
Reference: Chapter 5 Covered Set Partitions 
Motivation: Dr. Perkowski and I met with Drs. Caughman and Bleiler to explore 
other possible ways of exploring the search space.  I realized that our MMDSN algorithm 
is skipping certain search spaces and wanted to attempt additional methods of 
constructing valid sequences that explore other segments of the search landscape.  I used 
one of the methods from that meeting to design and implement the CSP algorithm. 
My implementation of the CSP algorithm, with the ability to adjust the partition size, 
has proved useful in discovering solutions with superior quantum cost.  Depending on the 
number of partitions attempted, this algorithm takes a bit longer than MMDSN but performs 
better, and it runs much faster than MMDS.  I was also able to synthesize 
functions of up to 32 bits, but this takes a long time to complete. 
I also saw an opportunity to involve some of the professors from the University of 
Najah, in my hometown of Nablus, in my research for the sole purpose of providing them 
with an opportunity to learn, research and contribute.  My cousin, Professor Amjad 
Hawash, showed interest, and I set up a weekly meeting on Skype where I introduced him 
to quantum logic synthesis and reviewed my CSP code in detail.  He worked with me in 
collecting and analyzing the data and reviewing the paper. 
My Contribution: I designed the CSP algorithm and transformed the MMDSN 
program to support the CSP method of generating sequences.  I added support for a 
variable partition size in order to test the impact of the size of partition on the result.  I 
also added support for multi-threading on the CPU, and purchased a new PC with Intel i7 
with 8 cores, in order to process a larger number of sequences.  I also modified the 
MMDSN algorithm for multi-threading and compared results from both algorithms.  
I recently updated the logic for calculating the quantum cost and automated the 
process of running the algorithm for all possible partitions of a specification based on 
the number of variables.  I ran this modified algorithm on a larger number of functions 
from the RevLib site. 
  
 
I wrote a paper about CSP and presented it at ITNG 2011 with the initial results and 
will be revising the paper with the new results for submission later this year. 
CSP-EV Algorithm Collaborators: Amjad Hawash, Baker Abdalhaq 
Reference: Chapter 6 Covered Set Partition with Evolutionary Algorithms 
Motivation: Up to this point, I had been using random generators in selecting CSP 
sequences, so I decided to explore whether the method of selecting CSP sequences has 
an impact on discovering sequences with better quantum cost.  I decided to implement a 
genetic algorithm (GA) using Roulette Wheel selection to decide on the next set of CSP 
sequences to synthesize.   
I also invited another professor from the University of Najah, B. Abdalhaq, to join my 
research, and he suggested exploring Tabu Search as well.  In my weekly meetings with 
them, I discussed various ideas and challenges in their effort and gave them advice and 
feedback on how to proceed.  Abdalhaq added the Tabu search to my source code and 
Amjad collected the data from my implementation of the GA.  Amjad also experimented 
with various derivations of the GA, and both of them helped in reviewing the paper. 
My Contribution: I updated my CSP code and implemented the genetic algorithm to 
select CSP sequences.  I designed the mutation and crossover operators to preserve the 
internal structure of the CSP.  I chose the Roulette Wheel method of selection in 
order to give the offspring of better-fit solutions a higher priority for selection.  I also 
implemented both single-point and two-point crossover and experimented with uniform 
crossover. 
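Roulette Wheel selection, as named above, can be sketched as follows. This is a generic illustration and not the CSP-EV implementation: the fitness definition (the reciprocal of quantum cost, so that cheaper circuits receive a larger slice of the wheel), the function name `roulette_select` and the sample data are all assumptions.

```python
import random


def roulette_select(population, costs, rng):
    """Pick one individual with probability proportional to 1 / cost.

    Assumes every cost is strictly positive; lower cost means higher fitness.
    """
    fitness = [1.0 / cost for cost in costs]
    pick = rng.random() * sum(fitness)  # where the wheel stops
    cumulative = 0.0
    for individual, slice_width in zip(population, fitness):
        cumulative += slice_width
        if pick < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demonstration is repeatable
    sequences = ["seq_a", "seq_b", "seq_c"]  # hypothetical candidate sequences
    quantum_costs = [40, 20, 10]             # hypothetical quantum costs
    draws = [roulette_select(sequences, quantum_costs, rng) for _ in range(7000)]
    for name in sequences:
        print(name, draws.count(name))
```

With these costs the expected selection shares are 1/7, 2/7 and 4/7, so the cheapest sequence is chosen most often while the others keep a nonzero chance, which preserves diversity in the population.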
I then wrote the paper for the ISIICT conference in Amman, Jordan which was 
presented by B. Abdalhaq. 
 
  
  
 
LNNM Algorithm  
Reference: Chapter 8 and Chapter 9 Linear Nearest Neighbor Model for Binary 
Motivation: Learning about the nearest neighbor constraints that the underlying 
technologies impose upon the design of a circuit, I realized that the de facto quantum cost 
metric used in the literature is not realistic.  I decided to explore the Linear Nearest 
Neighbor Model as a realistic alternative.  I also realized that attempts by other 
researchers to address the nearest neighbor constraint were incomplete, since they 
addressed the gaps between gates but did not address the gaps within a gate (specifically 
MCT gates).  None provided a comprehensive set of equations to calculate the LNNM 
quantum cost. 
My Contribution: I first derived a set of equations which calculate the number of 
swap gates necessary to bring any binary quantum circuit into LNNM compliance.  Inserting 
swap gates was the common method of enforcing LNNM in the literature, but existing work 
ignored the internal structure of the MCT gates.  I then derived equations to calculate the 
number of swap gates necessary to bring any of the MCT gates into compliance with 
LNNM.  Finally, I implemented a program to calculate the LNNM quantum cost for all 
existing benchmark functions on the RevLib [30] and Maslov [20] websites. 
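The swap-insertion idea mentioned above can be illustrated for the simplest case, a single CNOT gate whose control and target lines are not adjacent. The sketch below uses the naive scheme of swapping the control next to the target and swapping it back afterwards, with each SWAP counted as three CNOT gates. It is not the set of equations derived in this work, and the names `swaps_needed` and `lnn_cnot_cost` are hypothetical.

```python
SWAP_COST = 3  # a SWAP gate decomposes into three CNOT gates


def swaps_needed(control: int, target: int) -> int:
    """SWAP gates needed to make a CNOT nearest-neighbor compliant.

    Lines are numbered along the linear array. The control is moved next to
    the target (distance - 1 swaps) and then moved back (distance - 1 swaps).
    """
    distance = abs(control - target)
    if distance == 0:
        raise ValueError("control and target must be different lines")
    return 2 * (distance - 1)


def lnn_cnot_cost(control: int, target: int) -> int:
    """Quantum cost of one CNOT under the naive swap-insertion scheme."""
    return SWAP_COST * swaps_needed(control, target) + 1


if __name__ == "__main__":
    for target in range(1, 6):
        print(f"CNOT(0 -> {target}): {swaps_needed(0, target)} swaps, "
              f"cost {lnn_cnot_cost(0, target)}")
```

Under this scheme the cost of a single two-qubit gate grows linearly with the distance between its lines, whereas the idealistic metric counts it as 1 regardless of distance.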
I wrote and submitted a paper with the results of comparing the idealistic vs. the 
LNNM quantum costs and proposed that the LNNQC should be used for comparison 
because it is compliant with technology and that the LNNQC brings evenness to 
comparison amongst different methods of synthesis. 
I then derived a set of equations to recalculate the LNNM quantum cost for the family 
of MCT gates by reducing the number of CNOT gates needed for enforcing the LNNM 
architecture.  I modified my program to calculate the new LNNM quantum cost and 
applied it to the functions from the RevLib and Maslov websites. 
I also investigated arranging the qubits on a 2-D grid and calculated the quantum cost of 
some of the MCT gates when arranged in a two-dimensional plane (rather than a straight 
line). 
  
 
Finally, I am in the process of updating my previous paper with the new results and 
plan on submitting it to future conferences and journals. 
Ternary Synthesis Algorithm  
Reference: Chapter 10 Multi-Dimensional LNNM Architecture 
Motivation: The majority of literature in quantum synthesis focused on the binary 
domain, and few research papers addressed the ternary and multiple-valued domain.   
Even then, such papers demonstrated manual calculations on functions of 2 to 3 
variables.   
I saw an opportunity to address ternary functions with a large number of variables and 
decided to apply the knowledge of constructing Hasse diagrams in the ternary domain. 
My Contribution: I designed three models for constructing the Hasse diagrams in the 
ternary domain and implemented a synthesis application in C++ to synthesize any 
arbitrary reversible ternary circuit.  Since the literature lacked a set of ternary 
benchmarks for a large number of variables, I created and introduced a new set of 
functions, HWT, for any number of variables.  I was able to run my implementation on 
functions of up to 9 ternary variables.  
I wrote a paper with the results and effectively set the yardstick for future algorithms.  
In the paper I highlighted the importance of precedence order amongst the ternary literals 
(0,1,2) and provided a proof of convergence for the three prominent orders.  I also 
described the process of constructing Hasse structures for each of the orders. 
I wrote and published a paper in ISMVL 12. 
I then purchased a CUDA graphics card with 1024 cores in an effort to speed up this 
time intensive algorithm and compared the speed up to running the same algorithm on the 
host CPU.  I wrote and published all my results as a chapter in the book: GPU Computing 
with Applications in Digital Logic. 
 
  
  
 
MV Functions and XQS file format Collaborator: Martin Lukac 
Reference: Chapter 11 MV Benchmarks and Extensible Quantum Specification 
(XQS) 
Motivation: During my work in the ternary domain I introduced the HWT family of 
functions in order to verify my results because ternary functions with large number of 
variables were not available.  I worked with Dr. Martin Lukac from Japan to create a set 
of general-purpose multiple-valued functions and introduce them into the literature.  In 
the process of defining these functions, I realized that the current file format used to 
define quantum binary functions becomes limiting when I move to higher levels of 
computation, and as a result, I also introduced the XQS file format.  
My Contribution:  Martin and I explored some of the existing binary benchmark 
functions from RevLib and decided on seven of them.  I wrote a set of generators in the 
Ruby language which are capable of generating any of the seven functions based on the 
radix of computation and number of variables.  I then integrated these generators into the 
Portland Quantum Logic Group website and provided them online for anyone to use. 
As for the XQS file format, I suggested the architecture of the YAML format as the 
structure for the XQS format, and suggested the specific format which will accommodate 
multiple-valued and hybrid quantum specifications. 
I jointly wrote a paper on these results which I will be presenting at ISMVL 13. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
 
 
Multiple Valued Swivel Function  
Reference: Chapter 12 Generalized Multiple Valued Swivel Gate 
Motivation: In my work on the LNNM architecture, I recognized a pattern of gates which 
reverse the order of qubits around a central axis.  I then derived a generalized pattern 
which is applicable to any multiple-valued domain and decided to share these results. 
My Contribution: I designed the swivel gate pattern for multiple-valued 
specifications (binary, ternary, etc.) and provided a set of equations to calculate the 
number of primitive gates necessary to implement it.  The swivel gate can be used as a 
functional block in a circuit.  I have a paper accepted for the Reed-Muller workshop 2013. 
 
  
  
 
 
 
PART I 
 
 
 
 
 
INTRODUCTION 
TO 
QUANTUM COMPUTING  
  
 
 
Chapter 1  Introduction
Although it appears that the probability of making quantum computing devices 
available for the masses is minuscule, the imminent end of Moore’s law continues to kindle 
the fire for exploring and laying a foundation for such a futuristic computing device.  
Today, when the number of quantum bits (qubits) employed for computing is extremely 
limited, automated logic synthesis (CAD) tools which aim at reducing the number of such 
resources are of extreme importance.  When I started my research into quantum logic 
synthesis, I found that the vast body of research in quantum logic synthesis used an 
abstract, mathematical model of quantum gates and, by and large, ignored inherent physical 
constraints imposed by the underlying technology used to construct quantum circuits.  I 
also implemented my first set of algorithms using the abstract model.  However, as I 
realized that there are physical constraints imposed by the technology used to construct 
quantum gates and circuits, I shifted the focus of my research to take these constraints 
into account and redefined the quantum cost metrics accordingly.  The abstract 
model of a quantum gate is useful at the conceptual level of quantum CAD; however, 
from a practical engineering perspective, considering the constraint of the Linear Nearest 
Neighbor Model (LNNM) dictated by technology yields better estimates of the width and 
depth of a quantum circuit.  Since a quantum gate requires a finite amount of time to 
execute, the depth of the circuit determines the length of time to execute the specification 
implemented by said circuit.  The width of a quantum circuit dictates the number of 
additional gates necessary to bring an arbitrary quantum circuit into compliance with the 
LNNM architecture.  Depending on the specification at hand, every additional qubit can 
result in doubling the number of gates needed to implement the function. 
A drastic shift of thinking and comprehension had to take place as I started working in 
this field.  Quantum mechanics seemed the strangest field I had ever encountered, 
judged by the foundations of engineering acquired in my previous academic 
pursuits and professional career.  Basic concepts of quantum computing made little sense, 
as they violated well-known and solid engineering principles.  So the question continues 
to be asked: is quantum computing feasible, and do the postulates of quantum mechanics 
hold water?  I can now answer both questions with a resounding affirmation and can 
point to the body of experimental results demonstrating quantum computation in the lab 
setting.  For instance, ion trap is one of the leading quantum computing technologies 
used today to build quantum circuits.  Quantum registers with up to 14 qubits, trapped in 
a straight line, have demonstrated the ability to perform computations with high 
repeatability and fidelity [15].  In these experiments, the set of single- and two-qubit 
primitive gates has been demonstrated.  In classical logic, a NAND gate is considered 
universal, such that any logic circuit can be constructed purely from a set of NAND gates.  
In quantum computing, the Toffoli gate is one of many equivalent universal quantum 
gates which have been demonstrated in the lab and can also be used to construct any 
quantum logic specification [31, 32, 33]. 
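The link between the two notions of universality can be made concrete on classical (permutative) bit values. The sketch below models the Toffoli gate as the map (a, b, c) -> (a, b, c XOR ab) and shows that fixing the target input to 1 yields NAND on the target output. It only illustrates the gate's classical truth table; the function names are hypothetical.

```python
from itertools import product


def toffoli(a: int, b: int, c: int):
    """Toffoli gate on classical bits: the target c is flipped when a = b = 1."""
    return a, b, c ^ (a & b)


def nand(a: int, b: int) -> int:
    """NAND obtained from a Toffoli gate whose target input is fixed to 1."""
    return toffoli(a, b, 1)[2]


if __name__ == "__main__":
    print("a b c -> a b c'")
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "->", *toffoli(a, b, c))
    print("NAND truth table:",
          [(a, b, nand(a, b)) for a, b in product((0, 1), repeat=2)])
```

Because every input pattern maps to a distinct output pattern and applying the gate twice restores the input, the gate is reversible, unlike an ordinary NAND gate.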
My first exposure to quantum logic synthesis started by aiding N. Alhagi in his 
implementation of the Multi-Pass (MP) and MMDSN algorithms, described shortly, 
which allowed for synthesis of any arbitrary quantum reversible circuit [21, 22].  As part 
of his PhD endeavors, Alhagi was exploring quantum logic synthesis and had developed 
a heuristic algorithm to find minimal solutions for arbitrary reversible quantum 
specifications.  He had developed the MMDSN algorithm as an extension of one of the 
well-known quantum logic synthesis algorithms, MMD [34], and of another algorithmic 
extension introduced by Stedman, MMDS [11].  MMD operates on all possible minterms 
of a quantum specification by organizing them in a truth table format in ascending natural 
binary order of the input vector.  The algorithm then implements the input/output 
mapping of each minterm pair by inserting the necessary quantum gates to the circuit, 
and using control lines to avoid modifying previously synthesized minterm pairs. 
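The procedure summarized above can be sketched as a basic, unidirectional transformation-based synthesis pass. This is a simplified illustration under stated assumptions, not the published MMD implementation and not the code developed for this dissertation: gates are encoded as hypothetical (control_mask, target_bit) pairs, gates are applied on the output side of the truth table, and no post-synthesis optimization is attempted.

```python
def apply_gate(value, control_mask, target):
    """Flip bit `target` of `value` when every bit in `control_mask` is 1."""
    if (value & control_mask) == control_mask:
        return value ^ (1 << target)
    return value


def mmd_synthesize(spec):
    """Return gates [(control_mask, target), ...] realizing the permutation `spec`.

    spec[x] is the output pattern for input pattern x, with rows visited in
    ascending natural binary order of the input.
    """
    size = len(spec)
    n = (size - 1).bit_length()
    if size != 1 << n or sorted(spec) != list(range(size)):
        raise ValueError("spec must be a permutation of range(2**n)")

    table = list(spec)
    found = []

    def emit(control_mask, target):
        found.append((control_mask, target))
        for row in range(size):
            table[row] = apply_gate(table[row], control_mask, target)

    for i in range(size):
        # Set bits that are 1 in i but 0 in the current output of row i.
        # Controls are the 1-bits of the current output, which is >= i,
        # so rows already fixed (outputs smaller than i) are untouched.
        for bit in range(n):
            if (i >> bit) & 1 and not (table[i] >> bit) & 1:
                emit(table[i], bit)
        # Clear bits that are 1 in the output but 0 in i, controlled by
        # the 1-bits of i (for row 0 this yields uncontrolled NOT gates).
        for bit in range(n):
            if (table[i] >> bit) & 1 and not (i >> bit) & 1:
                emit(i, bit)

    found.reverse()  # gates were found from the output side
    return found


def run_circuit(gates, value):
    """Apply the gate list to one input pattern."""
    for control_mask, target in gates:
        value = apply_gate(value, control_mask, target)
    return value


if __name__ == "__main__":
    spec = [1, 0, 3, 2, 5, 7, 4, 6]  # example 3-variable reversible function
    gates = mmd_synthesize(spec)
    print("gates (control_mask, target):", gates)
    print("verified:", all(run_circuit(gates, x) == spec[x] for x in range(8)))
```

The control masks are what protect previously synthesized rows: every gate emitted while processing row i fires only on patterns that are numerically at least i, so earlier rows keep their already-correct mapping.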
Stedman correctly hypothesized that changing the order in which the input/output 
minterms are treated would produce different, and ultimately better, results.  For 
a specification of n variables, Stedman calculated that the solution space consists of 
(2^n)! unique ways of ordering the input vector.  He was able to implement his MMDS 
algorithm to visit each ordering of the input vector (each solution) and calculate its 
quantum cost.  Although Stedman was able to demonstrate that better solutions do exist, 
his algorithm could only handle a small number of variables, as the number of solutions 
grows super-exponentially with each additional variable, and hence his algorithm stalled 
at 5 variables. 
Building on the work of Stedman, Alhagi realized that the majority of solutions that 
MMDS explores do not algorithmically converge, and thus must be discarded.  He 
developed an algorithm to construct different orderings of the input vector which are 
guaranteed to converge and demonstrated the ability of the algorithm to discover better 
solutions than MMD, though not as optimal as those of MMDS.  Yet, his algorithm would 
easily work for a larger number of variables and discover solutions better than MMD.  
Lacking proficiency in computer programming, Alhagi had trouble implementing the 
algorithm in order to provide simulation results and compare them with existing 
algorithms.  He had done the majority of his calculations on a small number of variables 
and had done the simulation by hand.  I quickly learned the details of the algorithm, 
implemented it and collected results for functions of up to 16 variables.  I then 
implemented the MultiPass (MP) algorithm, which builds on the MMDSN work by 
operating on binary equations rather than truth tables.  Using binary equations is superior 
to using truth tables as the number of variables increases, because the memory required to 
store a truth table doubles with each additional qubit.  With the MP algorithm, I was able 
to synthesize functions with up to 32 variables, which is infeasible for both MMD and MMDSN. 
I built on my work with Alhagi and developed the Covered Set Partition (CSP) 
algorithm, which, in addition to exploring different orders, splits the function into 
sub-groups or partitions and then reorders the terms within a partition.  Through 
simulation, I was able to demonstrate that creating partitions of the input vector always 
results in discovering better solutions than MMDSN and that the partition width also 
affects the outcome.  For this algorithm, I also experimented with different methods of 
selecting input vector orderings in an attempt to study the impact of selection on the 
result.  I implemented random selection, selection through a genetic algorithm and Tabu 
Search.  Although both evolutionary methods produced slightly better results than random 
selection, the evolutionary algorithms, given their overhead, did not prove highly 
superior to random selection. 
But, how does it really work?  I initially had trouble internalizing how a quantum gate 
or circuit works and could not conceptualize what a quantum gate looks like nor how it 
operates.  I took an introductory course in quantum mechanics in the physics department 
in hopes of understanding the concepts of superposition, entanglement and the 
technicalities of quantum computation.  From that course, I was able to understand the 
framework of quantum mechanics that helped me understand the internal state of a qubit 
and how a qubit could exist in a state of superposition (anywhere between the values 0 
and 1).  I was also able to reason out the ability of a qubit to assume multiple basis states 
for the implementation of multiple-valued systems, and I was able to better comprehend 
the concept of entanglement.  It was clear to me at the time that, in quantum 
computation, a qubit represents the storage unit to hold the information necessary for 
computation.  It was still unclear, however, what a quantum gate is and how it operates 
until I started learning about NMR and ion-trap technologies.  Ion-trap, specifically, 
made it clear that the state of qubits, suspended in a vacuum chamber which isolates 
them from the external environment, could be manipulated with a highly focused laser beam 
of a specific wavelength for a specific period of time.  The paper “Ion-Trap Quantum 
Computation” by Holzscheiter [15] gave a good description of the Paul ion trap and its 
operation and detailed the implementation of NOT and CNOT quantum gates.  It 
became clear that linearly trapped ions limit their interaction to their immediate neighbors 
[6].  Nonetheless, quantum logic synthesis algorithms in the literature have largely 
ignored such physical constraints and have made the following invalid assumptions: 
1. The possibility of any two qubits to interact in an operation regardless of the 
physical distance and barriers between them, 
2. The possibility for a quantum gate to operate on a large number of qubits, 
3. The feasibility of using thousands of qubits in a quantum circuit despite the 
demonstrated hurdles of attaining 10 qubits to date. 
Only recently have researchers turned their attention to the Linear Nearest 
Neighbor Model (LNNM) as it applies to quantum logic synthesis.   Still, the approach 
is often naive: the LNNM is applied only to the pathways between 
quantum gates while the gates themselves violate the same LNNM principle.  I shifted 
my research focus to the LNNM and applied it from a holistic point of view 
by considering all aspects of the problem of quantum logic synthesis.   I directed my 
effort to redefining the yardstick used to quantify the performance of automated quantum 
synthesis algorithms for the following reasons: 
1. Application of the LNNM constraint to any synthesized quantum circuit is a 
superior predictor of the complexity of the circuit as it considers the natural 
constraints imposed by the underlying technology, 
2. Uniform measurement of a quantum circuit using the LNNM architecture allows 
for a normalized benchmark for comparing different solutions that use 
different methods to synthesize the circuit.  Notably, benchmarks of algorithms 
which disregard the cost of interaction between distant qubits, or which add additional qubits for 
holding intermediate results, cannot be compared to solutions which adhere to the 
physical realization of quantum gates. 
I am confident that my contribution in this field, along with the work of other research 
institutions around the world, will lay the foundation for the next generation of automated 
logic synthesis CAD tools which aim to construct application specific quantum circuits 
(ASQC).  Kindled by the spirit of collaboration, I continue to work with other researchers 
in the field to explore alternative methods of quantum logic synthesis and share our 
findings with the rest of the world.  Guided by my personal belief that fruitful 
collaborations are the direct outcome of open and transparent disposition amongst 
researchers in the community, I established an online portal to share the results of our 
research, and provide an online open repository for sharing algorithms and source code.  
  
 
Existing internet portals for quantum benchmarks, namely RevLib [30] and Maslov’s 
Reversible Logic Synthesis Benchmark [20] pages, continue to promote the theoretical, 
unrealistic embodiment of a quantum gate as it favors their methods of logic synthesis; 
i.e., lower quantum cost.  As I will soon demonstrate, applying the LNNM 
architecture to existing solutions exposes that the methods some researchers use 
result in a very high cost when LNNM is considered. 
I also extended my binary algorithms to the ternary domain and discovered a new set of 
challenges as the level of computation increases.  I explored the benefit of trying 
different ternary orderings to discover solutions with lower quantum gate count.  In the 
binary case, I only had two permutative gates to work with, a wire and an inverter.  In the 
ternary domain, I have five permutative gates to choose from, which adds 
dimensions to the solution space.  For each additional variable, the increase in the 
number of solutions is even larger for the ternary space where, for a function with n 
variables, there exist (3ⁿ)! possible orderings of the input vector (compared to (2ⁿ)! for the 
binary domain).  Convergence of the ternary algorithm takes center stage: I 
provide multiple proofs of convergence of my ternary algorithm where each proof 
depends on the relative precedence of the values 0, 1 and 2.  This topic is discussed 
further in Chapter 10. 
  
 
1.1 Included in this research
My research is mainly concerned with the problem of logic synthesis of permutative 
quantum circuit specifications¹ which are represented as a set of equations or a truth 
table.  A quantum circuit specification maps every n-input vector to an 
n-output vector such that the output vectors are a permutation of the 
input vectors; i.e., the mapping is one-to-one and onto (a bijection).  The field of research will 
explore quantum logic synthesis of binary and ternary specifications and will 
progressively focus on the applicability of synthesized solutions to the technology used to 
construct the quantum circuit: namely, application of Linear Nearest Neighbor Model 
(LNNM) to the synthesized circuit.  A brief overview of the LNNM, the shortcoming of 
existing synthesis algorithms, and the benefit of applying LNNM architecture to logic 
synthesis can be found in sections 3.6.4 and 0 below. 
The research field is further limited to the set of reversible functions which are 
completely specified; i.e., there are no terms which are designated as a do not care. 
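As a minimal sketch of what "permutative and completely specified" means in practice, the following Python snippet checks whether a truth table is a bijection on the set of all input vectors.  The function name `is_permutative` and the dictionary representation of the truth table are illustrative assumptions and are not the data structures or file formats used in this research.

```python
from itertools import product

def is_permutative(table, n, radix=2):
    """Return True if `table` maps every n-digit input vector to a unique
    n-digit output vector (a bijection on the radix**n vectors)."""
    domain = set(product(range(radix), repeat=n))
    # Completely specified: every input vector has an entry (no do-not-cares).
    if set(table.keys()) != domain:
        return False
    # One-to-one and onto: the outputs are exactly the same set of vectors.
    return set(table.values()) == domain

# Hypothetical 2-variable example: a CNOT-like mapping (b becomes a XOR b).
cnot_like = {(a, b): (a, a ^ b) for a, b in product((0, 1), repeat=2)}
# A classical AND padded to two outputs is not reversible: outputs repeat.
and_like = {(a, b): (a, a & b) for a, b in product((0, 1), repeat=2)}

print(is_permutative(cnot_like, 2))  # True
print(is_permutative(and_like, 2))   # False
```

The same check applies to ternary specifications by passing `radix=3`.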
1.2 Excluded from this research
The research will not address the technical implementation of a quantum gate or the 
fabrication of synthesized quantum circuits.  Discussion of quantum mechanical 
principles and allegories is also excluded, along with some of the advanced mathematical 
derivations and manipulation of quantum states, gates and the propagation of a quantum bit 
from one state to another.  I follow the path of classical CAD tools which rely on the 
supposition that basic universal primitive gates exist at the abstract level of Boolean logic 
(NOT, AND, OR, XOR), and the goal of such tools is to arrange and interconnect such 
gates to implement the desired function.  I make a similar assumption where a basic set of 
quantum primitives exists (in this case NOT, Controlled NOT, Toffoli, and Fredkin gates) 
which when placed appropriately would implement the desired specification.  With the 
exception of the LNNM approach, I assume that the synthesis algorithm is oblivious to 
the technology implementing a quantum circuit. 
¹ A permutative binary specification for an n-variable circuit represents the set S consisting of 2ⁿ input 
minterms where every input minterm maps to an output minterm of the same set S and each mapping is 
unique.   In group theory, a permutation of a set S is defined as a bijection from S to itself, S → S.  Such a 
specification is clearly reversible due to the one-to-one and unique mapping of input and output minterms. 
1.3 Approach
Motivated by the potential of quantum computing and other reversible technologies, 
synthesis of reversible specifications (circuits) has been intensely pursued both in 
academic and professional circles.  Some researchers based their solutions solely on 
a purely mathematical model of reversible circuits while others constrained the 
mathematical model to the limits of quantum mechanics, viz. nature.  Methods for 
finding solutions with minimal quantum gate cost typically require an exploration of the 
entire search space, which limits such algorithms to a small number of variables.  
Exhaustive depth and breadth first traversal algorithms [35], group theory [36] and 
Boolean satisfiability (SAT) [37] methods suffer a similar fate where the exploration 
space grows exponentially with each additional variable, and as a result, are limited to a 
maximum of six binary variables. 
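To give a sense of why exhaustive methods stall at a handful of variables, the sketch below counts the distinct reversible (permutative) specifications of n variables, which is (2ⁿ)! in the binary case.  The helper name is hypothetical, and the count is only an indication of how quickly the space grows; the cited algorithms search over circuits rather than over this set directly.

```python
from math import factorial

def count_reversible_functions(n, radix=2):
    """Number of bijections on the radix**n input vectors: (radix**n)!."""
    return factorial(radix ** n)

for n in range(1, 5):
    print(n, count_reversible_functions(n))
# 1 2
# 2 24
# 3 40320
# 4 20922789888000
```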
1.4 Organization of Chapters
Part I of this dissertation provides background information about quantum computing 
in general and quantum logic synthesis in particular. Chapter 2 starts with information 
about and feasibility of quantum computation. Chapter 3 presents some of the quantum 
gates typically used for constructing quantum circuits and are the main building blocks 
used in this research.  The chapter also contains a description of Ion trap technology 
which has been used to demonstrate quantum gates and quantum computation.  Chapter 4 
describes some of the quantum logic synthesis methods used in the literature, 
demonstrates their operation and discusses their strengths and weaknesses. 
The remainder of this dissertation is organized in three parts.  Part II covers my work 
in the binary logic synthesis domain and includes the MMDSN/MP (Chapter 5), CSP 
(Chapter 6) and CSP-EV (Chapter 7) algorithms which are based on the idealistic cost of 
quantum gates.   Chapter 8 covers some of my other experiments in the binary synthesis 
domain which yielded little success. 
Part III shifts focus to the Linear Nearest Neighbor Model in the binary domain and 
demonstrates methods for calculating the quantum cost for the MCT gates and for any 
quantum circuit (Chapter 9).  Chapter 10 discusses the application of the LNNM to a 2-D 
arrangement of qubits and the potential for reduced quantum cost using such an 
arrangement. 
Part IV is focused on the ternary and multiple valued domain.  Starting with Chapter 
11, I cover my work in synthesizing ternary quantum specifications with a relatively large 
number of variables and introduce a set of ternary specifications as benchmark functions.  
In Chapter 12 I introduce seven more multiple valued benchmark functions and 
a new file format (XQS) specifically designed for multiple valued functions.  In Chapter 
13, I introduce a new functional quantum block, the generalized swivel gate, and 
demonstrate how to calculate the cost for any multiple valued computational basis. 
 
 
  
  
 
Chapter 2 Background
Practically, computation has been the single driving force behind modern technology 
over the last few decades. The theoretical foundation of modern information processing 
was established by Church and Turing in the 1930s and has become one of the foundational 
principles of modern computer science.  The Church-Turing thesis outlined a classical 
deterministic model of computation and set the stage for computing feasibility and 
complexity of computations.  I soon came to realize that, for a certain class of problems 
(NP problems), a vast number of computations become necessary, and impossible to 
perform, as the solution space for such problems grows exponentially.  All the more 
surprising was the recent claim that there exists a more efficient way of computing, 
which can, for a certain class of problems, result in exponentially faster algorithms.  The 
newly discovered method depends on the quantum mechanical description of information 
which gained the appropriate name of quantum computing. 
While a classical bit of information can assume one of two states, the quantum 
mechanical description of an equivalent two level system, a qubit, is represented by a unit 
vector in a 2-dimensional complex vector space which is represented graphically by the Bloch 
Sphere (described in 3.4.1).  Qubits combine via a tensor product where n qubits are 
represented by a unit vector in a 2ⁿ-dimensional complex vector space².   A transistor based 
memory cell or register can hold a single value at any moment of time (0 or 1 for binary 
registers).  A quantum register, a string of qubits, on the other hand, can be a 0, 1, or 
anything in between.  Similarly, a classical gate expects a single discrete value on each of 
its inputs, and produces a single discrete value on each of its outputs.  Assuming binary 
logic, a quantum gate accepts any value between 0 and 1, inclusive, and produces a value 
in the same range.   When the input values are in a superposition state, halfway between 0 
and 1, a quantum gate calculates the results of applying all possible combinations of 
inputs at the same time. 
² The geometrical representation of the Bloch sphere is naturally three dimensional; however, by 
constraining the vector length to 1, the state of a qubit resides only on the surface of the unit sphere; hence, 
only two dimensions are required to pinpoint the state of the qubit on the surface of the Bloch sphere 
(analogous to latitude and longitude). 
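The growth of the register description can be sketched with a few lines of plain Python.  The snippet below builds a register state as the tensor (Kronecker) product of single-qubit states and confirms that three qubits require 2³ = 8 amplitudes.  The helper names (`kron`, `register`) and the list-of-amplitudes representation are assumptions made for illustration only.

```python
from math import sqrt, isclose

def kron(u, v):
    """Tensor (Kronecker) product of two state vectors given as lists."""
    return [a * b for a in u for b in v]

ZERO = [1 + 0j, 0j]                    # basis state |0>
ONE = [0j, 1 + 0j]                     # basis state |1>
PLUS = [1 / sqrt(2), 1 / sqrt(2)]      # equal superposition of |0> and |1>

def register(*qubits):
    """Combine single-qubit states into one register state vector."""
    state = [1 + 0j]
    for q in qubits:
        state = kron(state, q)
    return state

def norm_squared(state):
    """Sum of squared amplitude magnitudes; equals 1 for a valid state."""
    return sum(abs(a) ** 2 for a in state)

reg = register(PLUS, PLUS, ZERO)        # 3 qubits -> 2**3 = 8 amplitudes
print(len(reg))                         # 8
print(isclose(norm_squared(reg), 1.0))  # True
```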
Despite the fact that our knowledge of quantum mechanics and its potential 
applicability to computation has spanned more than a century, advances in quantum 
computing technology have been modest.  Such technologies and experiments, 
however, have been instrumental in validating the concepts of quantum mechanics and 
have further demonstrated through the construction of quantum gates that the concepts of 
quantum computing are valid.  Quantum computation requires the setup of a very special 
physical environment where numerous operations must be performed on the register of 
qubits while those qubits are in a total isolation from external environmental stimuli, 
events or disturbances; yet, the same set of qubits can interact with, and affect the state 
of, one another as manipulated by an accurate source of laser or magnetic force.  
Additionally, once the qubits are initialized to a state, computation must be performed in 
quick succession as to prevent the loss of information through decoherence - a loss of 
energy which results in weakening, and eventual ruin of, the interlocking relationship 
amongst qubits in the register. 
  
 
In 1958, the German researcher Wolfgang Paul [38] proposed a design principle for 
isolating single ions in a linear trap and, in 1995, Ignacio Cirac and Peter Zoller [16] 
proposed the architecture of a quantum computing machine based on the Paul ion-trap.  
The Linear Paul ion-trap uses a linear radio-frequency quadrupole placed within a 
vacuum chamber at very low temperature where the positively charged ions are 
suspended, and confined, at the center of the trap through opposing and alternating 
current applied to the four poles of the trap.  The positively charged ions maintain an 
even distance between one another under the influence of the Coulomb repulsive 
force.  A set of laser pulses detuned to a very narrow bandwidth is then used to alter the 
state of a specific ion, or cause it to enter a state of entanglement with another ion.  In this 
arrangement, the set of suspended ions represents the information vessel, the qubits, and the 
bandwidth, sequence and length of the laser pulses represent the quantum gates.  The operation 
of the Paul trap is detailed further in section 3.4.1. 
2.1 Feasibility of Quantum Computation
While the fundamental operations required for quantum computation have been 
demonstrated successfully in the lab, the efforts to scale these experiments to a large 
scale quantum computer are still in their infancy.  The first generation of quantum 
computers might not be suited for general purpose computing nor malleable to be 
software driven.  Rather, quantum computing devices would be designed, initially, to 
implement custom algorithms for solving a specific problem of a higher class of 
complexity.   Such algorithms, however, would still require thousands of qubits to 
implement, with possibly larger numbers of qubits for error correction in order to achieve 
an acceptable level of fidelity. 
There are a number of physical systems which have been proposed for the realization 
of a quantum computer.  For example, enormous progress has been made in 
implementing qubits using Josephson junctions, semiconductor quantum dots, nuclear 
magnetic spins, trapped ions and many others.  DiVincenzo [39, 40, 41], of IBM, outlined 
the necessary requirements for implementing a quantum computing apparatus as: 
1. An extensible set of registers (qubits), 
2. Ability to initialize registers to a known initial state, 
3. Universal set of quantum gates, 
4. Ability to transfer quantum information between spatially separated registers, 
5. Ability to extract (measure) the state of registers, 
6. Decoherence time of qubits longer than the duration of a quantum gate. 
The system which has met most of the requirements deemed necessary for a functional 
quantum computer uses trapped atomic ions.   In particular, state readout, high fidelity 
and universality of one and two qubit gates, teleportation and error correction have all 
been demonstrated [41].   In 2009, NIST demonstrated a 2-qubit quantum computer using 
trapped ions where laser pulses were used to manipulate the state of the qubits [42].   In 
2000, Chuang’s team at IBM demonstrated a five (5) qubit system for solving the 
problem of “order finding”, which finds the period of a particular function [43], and in 
the same year, scientists at Los Alamos National Laboratory announced that they 
had achieved quantum coherence in a seven-qubit molecule [43]. 
As for our technology of choice, the ion trap, Monz et al. have recently demonstrated the 
largest number of entangled qubits in a confined linear Paul ion trap consisting of 14 
⁴⁰Ca⁺ ions where each ion represents a qubit [32].  The original linear Paul trap proposed 
by Wolfgang Paul [38] confines a set of charged particles (ions) to a definite region of 
space with magnetic or electric fields.  Raizen et al. [44] and Walther proposed the linear 
radio-frequency quadrupole (RFQ) trap, known as the linear Paul trap, which uses 
time-varying electric fields to suspend a set of positively charged ions in free space in a 
straight line.  In their trap, Raizen et al. demonstrated the ability to trap 15 and 33 ions 
suspended in space forming a line, mimicking a string of beads. 
I now point our attention to a list of concrete examples where quantum computation 
has been developed and implemented: 
Quantum Cryptography was one of the earliest, and is so far the most developed, 
application of quantum information. Bennett and Brassard proposed a 
communication method which relies on the impossibility of measuring a quantum system 
without disturbing it [45].   Quantum cryptography is different from traditional 
cryptographic systems in that it relies more on physics, rather than mathematics, as a key 
aspect of its security model.  Unlike classical CMOS, sampling or measuring a quantum 
system in the midst of computation, or teleportation, cannot be done undetected.   Any 
act of measurement will always leave a mark on the system which makes it impossible 
for an eavesdropper to listen in, i.e. sample the information, without leaving a trace.  
Amazingly, the protocol can be proven to be secure, regardless of the resources available to a 
potential eavesdropper. Today, there exist commercial implementations of this protocol 
where it has been used to build secure networks to transmit election data, truly random 
number generators and quantum RADAR [46]. 
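The claim that an eavesdropper cannot sample the information without leaving a trace can be illustrated with a small classical Monte Carlo sketch of the Bennett and Brassard scheme (commonly known as BB84).  The model below is a simplification under stated assumptions: each bit is sent in one of two bases, a measurement in the wrong basis yields a random result, and the eavesdropper uses a simple intercept-and-resend strategy.  All function names and parameters are hypothetical.

```python
import random

def measure(bit, prep_basis, meas_basis, rng):
    """Measuring in the preparation basis returns the bit; measuring in the
    other basis returns a uniformly random result."""
    return bit if prep_basis == meas_basis else rng.randint(0, 1)

def sifted_error_rate(n_bits, eavesdrop, seed=0):
    """Fraction of mismatched bits among rounds where sender and receiver
    happened to choose the same basis."""
    rng = random.Random(seed)
    errors = kept = 0
    for _ in range(n_bits):
        bit, a_basis = rng.randint(0, 1), rng.randint(0, 1)
        sent_bit, sent_basis = bit, a_basis
        if eavesdrop:
            # Intercept-resend: the eavesdropper measures in a random basis
            # and resends what she observed, prepared in her own basis.
            e_basis = rng.randint(0, 1)
            sent_bit = measure(bit, a_basis, e_basis, rng)
            sent_basis = e_basis
        b_basis = rng.randint(0, 1)
        received = measure(sent_bit, sent_basis, b_basis, rng)
        if b_basis == a_basis:          # sifting: keep matching bases only
            kept += 1
            errors += received != bit
    return errors / kept

print(sifted_error_rate(4000, eavesdrop=False))  # 0.0
print(sifted_error_rate(4000, eavesdrop=True))   # noticeably above zero
```

Under these assumptions the intercept-and-resend attack corrupts roughly a quarter of the sifted bits, which the two parties can detect by comparing a sample of their keys.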
Quantum Simulations, proposed by Feynman [47] and Deutsch [48], provided the 
initial impetus for studying quantum computation.  A full description of a quantum 
system consisting of n 2-level systems requires O(2ⁿ) complex numbers, and as a 
result, classical simulations of quantum systems can be performed only for small systems, due to 
the memory requirements involved in tracking the variables.   However, if the 
information is encoded in qubits, only storage of the order O(n) is required, reducing the 
complexity exponentially.  The ability to simulate quantum systems could help resolve a 
number of important problems in condensed matter and particle physics. 
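A back-of-the-envelope sketch of the memory requirement: assuming each complex amplitude is stored as two 64-bit floating point numbers (16 bytes, an assumption of this example), the classical storage for the full state of n two-level systems is 2ⁿ × 16 bytes.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory to store all 2**n complex amplitudes classically.
    16 bytes assumes two 64-bit floats per complex number."""
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (10, 30, 50):
    print(n, state_vector_bytes(n))
# 10 16384
# 30 17179869184
# 50 18014398509481984
```

Thirty two-level systems already need about 16 GiB, and fifty need about 16 PiB, whereas the quantum register itself grows only linearly with n.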
Quantum Algorithms such as Shor’s factoring algorithm [49] and Grover’s 
database search algorithm [50] pushed quantum computation into the mainstream of 
physics and computer science.  The best known classical algorithms for factoring a large 
number N need $O\left(e^{c\,(\log N)^{1/3}(\log\log N)^{2/3}}\right)$ operations for a constant c, while the quantum equivalent uses 
only $O\left((\log N)^{3}\right)$, an exponential improvement in speed. Factoring turns out to be very 
important for cryptography, and a realization of a polynomial time factoring code would 
render most current encryption schemes useless.  
  
 
These advantages do not come without a cost. The increased size of the description 
required for quantum bits is a result of the importance of relative qubit phases.  In an 
experimental realization of quantum information processing, these phases must be 
carefully preserved throughout all performed operations.  However, any interaction with 
the environment may alter the energy states of the qubits, resulting in uncontrolled phase 
shifts, or decoherence. Such interactions are inevitable, though, and for a while it seemed 
that complex computations would not be possible to practically realize.  The discovery of 
quantum error correction has dramatically altered the landscape, demonstrating that as 
long as the errors are small enough, they can be reduced further by constructing “logical 
qubits” out of a number of physical ones [51, 52]: storage qubits plus error correcting 
qubits.  The current theoretical estimates place the necessary threshold of maximum 
failure probability at O(10⁻³), which is likely within experimental reach for an ion trap 
implementation.  Ion traps feature a high trap depth which allows for long storage times, 
and hence lower decoherence (>10 min have been observed) [40].  Because the 
charges are confined in free space, away from any objects, the trapped ions remain 
undisturbed, for the most part, allowing for precise control of individual qubits.  
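The idea that redundancy reduces errors only when the physical error rate is already small can be illustrated with a classical analogue: three copies of a bit decoded by majority vote.  This is a deliberately simplified sketch and not one of the quantum codes of [51, 52]; the independent error probability p and the function name are assumptions of the example.

```python
def logical_error_rate(p):
    """Probability that a majority vote over three independent copies fails,
    i.e. that two or three of the copies are flipped."""
    return 3 * p**2 * (1 - p) + p**3

for p in (0.6, 0.1, 1e-3):
    print(p, logical_error_rate(p))
```

For this toy code the encoded error rate is lower than p only when p is below one half, and the improvement becomes dramatic as p shrinks: at p = 10⁻³ the encoded failure probability is about 3 × 10⁻⁶.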
  
 
Chapter 3 Binary and Ternary Quantum Gates
3.1 Introduction
This chapter provides background information about quantum gates, circuits, 
technologies and how the performance of a quantum algorithm is measured.   I will start by 
describing the basic quantum primitive gates used to construct more complex gates which 
are further used to construct logic blocks (oracles) or circuits implementing a functional 
specification.  I will also give details of ion-trap technology as it is relevant to our work 
in the space of Linear Nearest Neighbor Model architecture.  
3.2 Binary Quantum Gates
Similar to classical logic, binary quantum gates are given representative symbols to 
designate their functionality.  The literature describes many quantum gates, or operators, 
where some operate on a single qubit and others operate on two qubits or more.  There is, 
however, a limited set of primitive quantum gates which are used to construct composite 
gates of greater complexity.  The NOT, Controlled-NOT and Controlled-V gate represent 
the NCV library of primitive quantum gates which are commonly used in quantum logic 
synthesis and are the building blocks of composite gates.  Even though the Fredkin 
(swap) gate is composed of a cascade of CNOT gates, it is also typically considered as a 
primitive gate and included in the NCVS library of gates.  Similarly, the Toffoli 3-qubit 
gate is considered as a functional primitive and is included in the NCT library which is 
used in this dissertation. 
  
 
Two of the commonly used gates are the 3-variable Toffoli gate and the Multiple-Control 
Toffoli (MCT) gate.  Although these gates are ubiquitous in the literature, 
they are composite gates built from the NCV library gates as shown in section 8.2. 
3.2.1 Inverter NOT gate
Figure 3-1 shows the classical inverter (NOT) gate, on the left hand side, and its 
equivalent quantum inverter symbolized with the XOR symbol used in binary logic 
equations, on the right hand side.  Unlike its classical counterpart, the quantum NOT gate 
is reversible by definition allowing information to flow in either direction. 
 
Figure 3-1 The classical inverter in (a) is unidirectional where it inverts the value of the input ‘a’ while 
the quantum inverter in (b) is bidirectional (reversible) which transforms ‘a’ into ‘ā’ and vice versa. 
 
3.2.2 Feynman or Controlled NOT gate
Figure 3-2 shows the two qubit Controlled NOT gate (CNOT) which is also known as 
the Feynman gate after the well-known physicist Richard Feynman.  The CNOT gate 
allows for conditional modification of the lower qubit based on the value of the upper 
qubit.  A value of zero on qubit a renders the lower part of the gate as pass-through, while 
a value of one on qubit a enables the inverter on the lower part of the gate.  From the truth 
table, the CNOT gate performs the same function as the classical XOR gate.  Again, 
unlike its classical counterpart, the CNOT gate is reversible.  In Figure 3-3 for example, 
placing two CNOT gates back to back returns both qubits to their original state.  
 
Figure 3-2 Controlled NOT gate where the ‘b’ qubit is inverted when the ‘a’ qubit = 1.  Otherwise, the 
‘b’ qubit remains unmodified.  The value of the ‘b’ qubit represents the XOR relation which is equivalent to 
the XOR classical gate.  The classical XOR gate is not reversible while the CNOT gate is reversible. 
 
Figure 3-3 Demonstration of quantum reversibility of the CNOT gate.  An application of the first CNOT 
gate to the ‘ab’ inputs produces a⊕ b (middle table) on the ‘b’ qubit.  Applying the second CNOT gate 
reverses the operation and restores the original value of ‘b’.  The value of qubit ‘a’ remains unchanged 
throughout the entire process. 
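The behavior described in Figures 3-2 and 3-3 can be checked on the four basis states with a short sketch.  The function below models only the permutative (truth table) action of the CNOT gate on classical basis values; the name `cnot` and the tuple representation are illustrative assumptions.

```python
from itertools import product

def cnot(a, b):
    """Controlled NOT: the control a passes through, the target becomes a XOR b."""
    return a, a ^ b

for a, b in product((0, 1), repeat=2):
    once = cnot(a, b)       # after the first gate
    twice = cnot(*once)     # after the second gate: original values restored
    print((a, b), "->", once, "->", twice)
```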
3.2.3 Toffoli and MCT gates
Figure 3-4 shows one of the universal gates, the Toffoli gate, which, similar 
to the NAND gate in classical logic, can be used to implement any reversible Boolean circuit.  The Toffoli gate 
enables the inverter on qubit c only when both qubits a and b hold the value one. We can 
easily implement the quantum equivalents of both the AND and NAND gates by 
initializing qubit c to either zero or one as shown in Figure 3-4.   
  
 
 
Figure 3-4 Equivalent implementation of the classical AND and NAND logic gates (a) using the Toffoli 
quantum gate (b).  The NAND gate is implemented by strapping the ‘c’ qubit to ‘1’ which 
results in $1 \oplus ab = \overline{ab}$. 
The Toffoli gate is not considered a primitive element of quantum gates as it 
decomposes to a set of CNOT and Controlled-V gates, discussed in section 8.2.  The 
symbolism is extended further where I define the Multiple Control Toffoli gates (MCTn) 
as follows:  
Definition 3-1: A multiple-control Toffoli (MCT) gate of n inputs with target line $x_t$ 
and control lines $x_{c1}, x_{c2}, x_{c3}, \ldots, x_{c(n-1)}$ maps all control lines to their original values and the 
target $x_t$ to $(x_{c1} \cdot x_{c2} \cdot x_{c3} \cdots x_{c(n-1)}) \oplus x_t$.  In essence, all control lines must be active for the 
target $x_t$ to be inverted. 
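Definition 3-1 translates almost directly into code.  The sketch below applies an MCT gate to a tuple of basis values and then uses the 3-input case (the Toffoli gate) to reproduce the AND and NAND behavior described above.  The function names and the indexing of lines by position are assumptions made for this example.

```python
def mct(bits, controls, target):
    """Apply a multiple-control Toffoli gate to a tuple of bits.
    The target is inverted only when every control line holds 1."""
    out = list(bits)
    if all(bits[c] for c in controls):
        out[target] ^= 1
    return tuple(out)

def toffoli(a, b, c):
    """3-input MCT gate: lines 0 and 1 control line 2."""
    return mct((a, b, c), controls=(0, 1), target=2)

# With c strapped to 0 the target carries AND(a, b); with c = 1, NAND(a, b).
for a in (0, 1):
    for b in (0, 1):
        print(a, b, toffoli(a, b, 0)[2], toffoli(a, b, 1)[2])
```

With an empty control list the gate reduces to the NOT gate, and with a single control it reduces to the CNOT gate.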
 
 
  
 
 
Figure 3-5 Universal Toffoli gate (C2NOT) which is classically implemented by the AND/XOR gates 
(a), however, the quantum Toffoli gate (b) is reversible where, similar to the CNOT gate above, application 
of two Toffoli gates restores the state of qubit ‘c’ to its original value.  
3.2.4 Fredkin or Swap gate
The Fredkin gate swaps the values of two qubits as shown in Figure 3-6.  Unlike 
classical registers, where swapping bits of information involves transporting values over 
a physical connection, in a quantum register no such wires exist, and swapping here 
refers to exchanging the information between the two qubits in a one for one manner.  
The no-cloning theorem of quantum mechanics prohibits a one-way delivery (copy) of data; 
instead, an exchange must 
occur.  Figure 3-6a shows the symbolic representation of the Fredkin gate, while Figure 
3-6b shows the composition of the gate using the CNOT primitive gate. 
 
Figure 3-6 (a) Fredkin or Swap gate is a composite structure which is implemented using three CNOT 
gates arranged according to the figure in (b).  In classical logic, a swap of two bits is accomplished 
through the exchange of physical wires but in quantum technology a set of gates are used to exchange the 
information of two qubits.  
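The three-CNOT construction in Figure 3-6(b) can be verified on basis values with a few lines of Python.  The sketch assumes the usual arrangement in which the middle CNOT has its control and target reversed relative to the outer two; the helper names are hypothetical.

```python
def cnot(bits, control, target):
    """XOR the control bit into the target bit; all other bits pass through."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)

def swap(bits, i, j):
    """Exchange bits i and j using three CNOT gates with alternating roles."""
    bits = cnot(bits, i, j)
    bits = cnot(bits, j, i)
    return cnot(bits, i, j)

for a in (0, 1):
    for b in (0, 1):
        print((a, b), "->", swap((a, b), 0, 1))
```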
  
 
A Controlled Swap gate also exists where the operation of the swap gate is active only 
when all control qubits are active. 
3.2.5 Square Root of Not (V) gate
This is one of the special gates of quantum computing and has no analogue in binary 
logic.  The V gate transitions a qubit from one of its pure states (0 or 1) to a state of 
superposition halfway between 0 and 1.   Such states of equal superposition 
are represented on the Bloch sphere, described shortly, by the circle around the 
equator of the sphere; the V gate places a qubit in one such state, 
which is represented as a single point on the equator. 
The V gate is also referred to as the Square Root of NOT because multiplying its 
matrix representation by itself, i.e. squaring it, yields the matrix representing the NOT 
gate.  The effect of the V gate can be visualized on the Bloch sphere as rotating the unit 
vector pointing to the zenith (state |0⟩) or nadir (state |1⟩) points by 90° in the 
clockwise direction.  The V† gate is the inverse gate which rotates the unit vector around 
the Bloch sphere by -90°.  In effect V·V† = I.  The V gate is represented mathematically 
by the matrix: 
$$V = \sqrt{NOT} = \begin{pmatrix} \frac{1+i}{2} & \frac{1-i}{2} \\ \frac{1-i}{2} & \frac{1+i}{2} \end{pmatrix}$$
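Both properties stated above, V·V = NOT and V·V† = I, can be checked numerically.  The sketch below assumes the standard matrix form of the square root of NOT, with entries (1+i)/2 on the diagonal and (1−i)/2 off the diagonal, and uses small hand-written helpers rather than any particular linear algebra library.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    """Element-wise comparison within a small tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

V = [[(1 + 1j) / 2, (1 - 1j) / 2],
     [(1 - 1j) / 2, (1 + 1j) / 2]]
NOT = [[0, 1], [1, 0]]
IDENTITY = [[1, 0], [0, 1]]

print(close(matmul(V, V), NOT))               # True
print(close(matmul(V, dagger(V)), IDENTITY))  # True
```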
  
 
Similar to the NOT and CNOT gates, the V and V† gates also exist with a controlling 
qubit where such gates are referred to as the Controlled-V and Controlled-V† gates (CV 
and CV†).  
3.2.6 Hadamard Gate
The Hadamard Gate (H) acts on single qubits and maps the basis state |0⟩ to the 
superposition state (|0⟩ + |1⟩)/√2 and the state |1⟩ to the superposition state (|0⟩ − |1⟩)/√2, which 
represents a π rotation about the axis midway between the x- and z-axes of the Bloch sphere.  The gate is represented by the 
symbol and Hadamard matrix shown in Figure 3-7. 
 
Figure 3-7 Symbol of the Hadamard gate and its matrix representation.  The Hadamard gate, which 
has no classical equivalent, places the qubit in a state of superposition between the pure states |0⟩ and |1⟩.  The gate is also reversible where an application of two Hadamard gates back to back 
restores the original value of the qubit. 
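As a small numerical sketch of the caption above, the snippet below applies the Hadamard matrix (assumed here in its standard form, with entries ±1/√2) to the two basis states and then applies it a second time to confirm that the original state is restored.  The `apply` helper is a hypothetical name introduced for the example.

```python
from math import sqrt, isclose

H = [[1 / sqrt(2), 1 / sqrt(2)],
     [1 / sqrt(2), -1 / sqrt(2)]]

def apply(gate, state):
    """Multiply a 2x2 gate matrix by a 2-element state vector."""
    return [sum(gate[i][k] * state[k] for k in range(2)) for i in range(2)]

zero, one = [1.0, 0.0], [0.0, 1.0]
plus = apply(H, zero)       # (|0> + |1>)/sqrt(2)
minus = apply(H, one)       # (|0> - |1>)/sqrt(2)
restored = apply(H, plus)   # back to |0>, up to floating point rounding

print(plus, minus)
print(all(isclose(x, y, abs_tol=1e-12) for x, y in zip(restored, zero)))  # True
```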
3.3 Ternary (and Multiple Valued) Quantum Logic gates
Ternary logic is a closed logic system with ternary operators that operate on 
three logic values {0, 1, 2}.  In quantum mechanics, the three ternary values could 
correspond to the different polarizations of a photon, the alignment of nuclear spin in a 
uniform magnetic field, or different energy levels in an ion-trap system.  To date, Nuclear 
Magnetic Resonance (NMR) and Ion Trap are the most promising technologies which 
have been used to demonstrate the “quantum circuit model” of quantum computation. 
  
 
In addition to the basis states mentioned above, a qutrit can exist in a superposition of
two or three basis states (including a phase component).  The Dirac notation describes the
state of a qutrit as follows:
$$|\Psi\rangle = \alpha|0\rangle + \beta|1\rangle + \gamma|2\rangle$$
where $|\alpha|^2 + |\beta|^2 + |\gamma|^2 = 1$.  The α, β, and γ components are referred to as
the probability amplitudes, and their squared magnitudes $|\alpha|^2$, $|\beta|^2$, $|\gamma|^2$
represent the probability of each state; meaning $|\alpha|^2$ is the probability that the qutrit
will be measured in the $|0\rangle$ state.
An n-qutrit quantum register has $3^n$ computational basis states denoted as
$|00\ldots0\rangle, |00\ldots1\rangle, \ldots, |22\ldots2\rangle$.  At any moment, a register of
n qutrits will be in a composite superposition state composed of the superposition states of
each qutrit.
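A minimal sketch, assuming arbitrary illustrative amplitudes, of how the normalization
constraint and the count of basis states can be checked numerically.  The helper names
(measurement_probabilities, basis_labels) are hypothetical and are not part of this work.

```python
import itertools
import math

def measurement_probabilities(amplitudes):
    """Return |a|^2 for each complex amplitude after checking normalization."""
    probs = [abs(a) ** 2 for a in amplitudes]
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-12):
        raise ValueError("state is not normalized")
    return probs

def basis_labels(n, radix=3):
    """List the radix**n computational basis labels of an n-digit register."""
    return ["".join(map(str, digits))
            for digits in itertools.product(range(radix), repeat=n)]

# Assumed example qutrit: alpha = 1/sqrt(2), beta = i/2, gamma = 1/2.
qutrit = [1 / math.sqrt(2), 0.5j, 0.5]
probs = measurement_probabilities(qutrit)
labels = basis_labels(2)

print(probs)        # probabilities of measuring 0, 1 and 2
print(len(labels))  # number of basis states of a two-qutrit register
```

The phase of β (here i) does not change the measurement probabilities, only the squared
magnitudes do.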
3.3.1 Galois Field 3 Logic – GF(3)
A Galois Field is a field that contains a finite number of elements; its order is always p or
$p^k$, where p is a prime number and k is a positive integer.  In the ternary case, GF(3) is
a prime Galois Field of three elements {0,1,2} which are closed under the addition and
multiplication modulo 3 operators.  Table 3-1 demonstrates the closure of GF(3) under
the addition and multiplication operators [53].
  
  
 
Table 3-1 Galois Field 3 operations (modulo 3) for a single qutrit with the addition operator in (a)
and the product operator in (b).  The addition operator uses the same symbol as the binary XOR (⊕)
since, mathematically, the binary XOR operation is a Galois Field 2 (modulo 2) operation.

 ⊕ | 0  1  2          ∘ | 0  1  2
 0 | 0  1  2          0 | 0  0  0
 1 | 1  2  0          1 | 0  1  2
 2 | 2  0  1          2 | 0  2  1
      (a)                  (b)
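As a minimal sketch, the two tables can be regenerated with ordinary integer arithmetic
modulo 3.  The function names gf_add and gf_mul are illustrative only.

```python
P = 3  # field order; GF(3) arithmetic is integer arithmetic modulo 3

def gf_add(a, b):
    """Addition in GF(3)."""
    return (a + b) % P

def gf_mul(a, b):
    """Multiplication in GF(3)."""
    return (a * b) % P

# Row index is the left operand, column index is the right operand.
add_table = [[gf_add(a, b) for b in range(P)] for a in range(P)]
mul_table = [[gf_mul(a, b) for b in range(P)] for a in range(P)]

for row in add_table:
    print(row)
for row in mul_table:
    print(row)
```

Every entry of both tables is again an element of {0,1,2}, which is the closure property
that Table 3-1 illustrates.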
The modulo 3 addition operator can be used as the foundation for the set of ternary
quantum operators shown in Table 3-2 [54].  Notice that the six gates correspond to all
3! = 6 permutations of the basis states {0,1,2}.  A thorough discussion of these gates
is found in Chapter 10 and Chapter 11.
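The count of six can be illustrated with a short sketch that enumerates every bijection on
{0,1,2}.  The tuple encoding (entry x holds the image of x) and the names plus_one and
plus_two are assumptions made for this illustration; they are not the gate names used in
Table 3-2.

```python
from itertools import permutations

BASIS = (0, 1, 2)

# Each tuple g is one single-qutrit permutative gate: input x maps to g[x].
all_gates = list(permutations(BASIS))

# The two non-trivial shifts built directly from GF(3) addition.
plus_one = tuple((x + 1) % 3 for x in BASIS)
plus_two = tuple((x + 2) % 3 for x in BASIS)

for gate in all_gates:
    print(gate)
print(plus_one in all_gates, plus_two in all_gates)
```

Each enumerated tuple contains every basis value exactly once, which is what makes the
corresponding gate one-to-one and onto.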
Table 3-2 Ternary operators based on GF(3) addition operator where each gate (operator) is a 
mathematical bijection with one-to-one and onto mapping from the input to the output. 
 
  
 
3.4 Mathematical representation of qubit state
Classical multiple valued logic is trivially simple to represent as a finite set of
literals representing the state of a classical bit.  For example, the set {0,1} represents all
possible states of a binary bit, while the set {0,1,2} represents all possible states of a
ternary bit.  In quantum logic, the most common representation of a qubit is the Bloch
sphere, which is used to visualize the current state of a qubit.  The Bloch sphere represents
a continuous landscape for the qubit’s state to occupy rather than the discrete set of a
classical multiple valued system.  The Bloch sphere represents both the magnitude and the
phase of the internal state of the qubit; for an optical quantum system, for example, these
describe the magnitude and phase of the photons employed as the qubit.
 
Figure 3-8 Bloch sphere illustrating a qubit state ψ(t) where the North and South poles are designated
as the binary basis states $|0\rangle$ and $|1\rangle$.  Unlike classical logic, the qubit can assume
any state between the basis states $|0\rangle$ and $|1\rangle$, where such states are symbolized by all
the points on the surface of the unit sphere.
In quantum computing technologies, measuring the state of a qubit is a probabilistic
event.  Rather than measuring an exact zero, one or two from a single computation,
an ensemble of computing registers (possibly thousands or millions) are all constructed in
the same manner, prepared in the same manner, and excited with the same set of inputs.
The answer to the computation is the answer with the highest probability.
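A minimal sketch of this ensemble readout, assuming that each copy of the register is
measured independently and that the outcome probabilities are known.  The probabilities
used below and the function name run_ensemble are illustrative assumptions only.

```python
import random
from collections import Counter

def run_ensemble(probabilities, copies, seed=0):
    """Measure `copies` identically prepared registers and tally the outcomes."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    outcomes = rng.choices(range(len(probabilities)),
                           weights=probabilities, k=copies)
    return Counter(outcomes)

# Assumed outcome probabilities for the values 0, 1 and 2.
counts = run_ensemble([0.2, 0.7, 0.1], copies=10_000)
answer = counts.most_common(1)[0][0]  # the most frequently observed value

print(counts)
print(answer)
```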
3.4.1 Bloch Sphere and Quantum State
A qubit’s state can be visualized as a unit vector on the well-known Bloch sphere
shown in Figure 3-8, which can be represented mathematically as:
$$|\psi\rangle = e^{i\gamma}\left(\cos\frac{\theta}{2}\,|0\rangle + e^{i\phi}\sin\frac{\theta}{2}\,|1\rangle\right)$$
where θ, ϕ and γ are real numbers, and 0 ≤ θ ≤ π and 0 ≤ ϕ ≤ 2π are the only two
angles necessary to define any point on the sphere (similar to the longitude and latitude
lines on a map).  The $e^{i\gamma}$ term is a global phase which has no observable effects and can
be ignored, resulting in the equation:
$$|\psi\rangle = \cos\frac{\theta}{2}\,|0\rangle + e^{i\phi}\sin\frac{\theta}{2}\,|1\rangle$$
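The parametrization without the global phase can be sketched numerically as follows.  The
function name bloch_state is illustrative, and the three sample points are chosen only to
show the poles and one point on the equator.

```python
import cmath
import math

def bloch_state(theta, phi):
    """Amplitudes (alpha, beta) of cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    alpha = complex(math.cos(theta / 2))
    beta = cmath.exp(1j * phi) * math.sin(theta / 2)
    return alpha, beta

north = bloch_state(0.0, 0.0)            # basis state |0>
south = bloch_state(math.pi, 0.0)        # basis state |1>, up to rounding
equator = bloch_state(math.pi / 2, 0.0)  # equally weighted superposition

for alpha, beta in (north, south, equator):
    # The squared magnitudes always sum to one (unit vector length).
    print(abs(alpha) ** 2, abs(beta) ** 2)
```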
In fact, the Bloch sphere merely extends the two dimensional complex (x + iy) plane
into the third dimension z and limits the radius (vector length) to 1.  By doing so, only
two variables (θ and ϕ) are necessary to locate any point on the surface of the unit sphere.
For a two dimensional complex plane, the length of the vector $v = x + iy$ can be found by:
$$|v| = \sqrt{v^{*}v} = \sqrt{(x - iy)(x + iy)} = \sqrt{x^2 + y^2} = 1$$
which is the equation for the unit circle.  In polar form, the vector can be represented
as:
$$v = r(\cos\theta + i\sin\theta) = re^{i\theta}$$
which for the unit circle would become:
$$v = \cos\theta + i\sin\theta = e^{i\theta}$$
Similarly, the general qubit state can be written in the Dirac notation as:
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$$
where α and β are complex numbers which, for a pure quantum state, satisfy the unit
length constraint:
$$\langle\psi|\psi\rangle = 1 \;\rightarrow\; |\alpha|^2 + |\beta|^2 = 1$$
In quantum mechanics, the coefficients α and β represent the probability amplitudes of
the corresponding states; meaning, $|\alpha|^2$ is the probability of measuring the qubit in state
$|0\rangle$ and $|\beta|^2$ is the probability of measuring state $|1\rangle$.
3.4.2 Evolution of Quantum State
In the following section on the ion trap, I will show that an application of a finely
tuned laser pulse can transition a qubit’s state from the ground to the excited state.  The
length of time that the laser pulse is applied has an impact on the final state of the qubit.
Using the Bloch sphere as a visualization aid, the qubit’s state can be thought of as a vector
touching a single point on the Bloch sphere, and the application of the laser pulse
moves that point around the surface of the sphere.  For a binary system, the north and
the south poles of the sphere represent the basis states $|0\rangle$ and $|1\rangle$ respectively.
Any point in between is said to be in a superposition state between the two basis states.  I
describe such a state by the application of a $\frac{\pi}{2}$ pulse, which is symbolically
represented by the Hadamard gate (H) and which evolves either of the basis states of a qubit
to a state of superposition.  In essence, we can visualize the application of a quantum gate
as a rotation (σx, σy, σz) of the qubit’s state vector around the three orthogonal axes.  The
set of rotation operators, known as the Pauli matrices, are shown below:
$$\sigma_x = X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad \sigma_y = Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \qquad \sigma_z = Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
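The Hadamard gate can be written as a normalized sum of two Pauli matrices as follows:
$$H = \frac{1}{\sqrt{2}}(X + Z) = \frac{1}{\sqrt{2}}\left[\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\right] = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$
Notice that the above matrices are all unitary matrices which preserve the unit length
of the vector within the Bloch sphere.  By definition, a matrix U is unitary if
$$U \cdot U^{\dagger} = U^{\dagger} \cdot U = I$$
The NOT gate is the same as the X gate and can be simulated with HZH as follows:
$$X = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$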
Similarly a CNOT gate can be simulated with the following primitive gates: 
  
 
 
Multiplying the Hadamard gate matrix by itself results in the identity matrix which 
indicates that, similar to the CNOT gate, the Hadamard gate is a self-inverting gate. 
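The matrix identities of this section can be checked with a short sketch in plain Python.
The helpers matmul, dagger and close are written here only for illustration.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(m):
    """Conjugate transpose of a matrix."""
    return [[m[j][i].conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]

def close(a, b, tol=1e-12):
    """Element-wise comparison within a small tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
s = 1 / math.sqrt(2)
H = [[s * (X[i][j] + Z[i][j]) for j in range(2)] for i in range(2)]  # (X + Z)/sqrt(2)

hzh_is_x = close(matmul(matmul(H, Z), H), X)   # NOT simulated by the HZH cascade
hh_is_identity = close(matmul(H, H), I2)       # Hadamard is self-inverting
all_unitary = all(close(matmul(u, dagger(u)), I2) for u in (X, Y, Z, H))

print(hzh_is_x, hh_is_identity, all_unitary)
```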
3.5 History of Ion Trap
Ignacio Cirac and Peter Zoller originally proposed the use of ion traps for quantum
computation in 1995 [16].  An ion trap is a special apparatus which, as the name implies,
confines positively charged particles to a definite region in space, typically in a linear
array of particles uniformly aligned like a tightly held string of beads.  An ion trap provides
a suitable environment for quantum computation where, on one hand, qubits must be isolated
from the surrounding environment in order to minimize decoherence (loss of internal
state), while on the other hand, qubits must be individually addressable by external laser
pulses that transform their internal state according to the quantum gate being applied.
Figure 3-9 shows the Radio-Frequency Quadrupole (RFQ) trap, also known as the linear Paul
trap [44], which utilizes a radio-frequency time-varying electric field along four long poles
to suspend the ions in a vacuum chamber and effectively isolate them from the
environment.  Each ion, which serves as a single quantum register (a qubit), is individually
addressable with a sharply focused laser beam targeted at its exact location in space.
 
 
Figure 3-9 Ion trap apparatus showing a vacuum chamber in the middle where ions are linearly 
aligned at the center of the narrow cavity. 
Figure 3-10 and Figure 3-11 illustrate the construction of an RFQ ion trap with four
rods (poles) where each pair of opposing rods is connected to the same source (polarity)
and neighboring rods are of opposite polarity.  In a theoretically symmetrical trap, the
opposing positive pair of rods exerts repelling forces upon the positively charged ion that
are of the same magnitude but in opposite directions.  At the same time, the negatively
charged rods attract the positively charged ion with forces of the same magnitude but in
opposite directions.  However, minuscule differences in timing or in the construction of the
trap distort the ideal symmetry and the ions tend to leak out of the trap.  By
switching the polarity of the poles, the ion will reverse its attempt to escape and move in
the opposite direction (back to the center of symmetry).  When an alternating source
with the appropriate switching frequency is applied, the charged particles remain within the
trap along the axis of symmetry.
 
Despite their confinement along the axis of symmetry, ions will attempt to escape in
either direction along that axis.  The Coulomb repulsion force of similarly charged
particles typically maintains a distance between neighboring ions; however, due to
the constant vibration of each ion, this force will eventually push the ions at the edge out
of the trap.  To avoid such an incident, a static DC voltage (end cap electrodes) is applied
on both ends of the axis as shown in Figure 3-11.
 
Figure 3-11 Pictorial representation of the Paul ion-trap with ions trapped along the center z-axis.  
Each ion acts as a quantum register (qubit) where finely detuned laser beams (shown as two arrows 
pointing upwards) represent the quantum gate.  The laser beams are used to affect the state of each 
individual ion (a single qubit gate), or the state of two neighboring ions in an entangled state.   
Figure 3-10 Paul ion-trap with four electrical rods having identical characteristics (material,
radius, length, etc.) placed in parallel at equal distances.  The two opposite electrical rods are
connected together to a time-varying alternating electric field of the same charge.  The other
set of poles carries the same electric field but with opposite charge (180° phase shift).  Positively
charged ions are trapped in the center of the four rods along the central z-axis by the Coulomb
forces of the alternating electric field, which alternates at a high enough frequency to keep
pushing and pulling on the ions along the two orthogonal axes (x and y).  The ions are kept from
escaping from the open ends (along the z-axis) by placing positive static charges at both ends of
the z-axis.
  
 
It is also essential to reduce the vibrational motion of individual ions in order to ensure
their exact spatial position within the trap, such that, when a laser beam is applied, it can
be applied to a specific ion.  Therefore, in order to individually address each ion, the
motion along the trap must be below the wavelength of the laser beam while the distance
between neighboring ions along the axis of symmetry must be larger than the wavelength
of the beam.  The vibrational motion of the ions can be reduced by cooling the ions to an
extremely low temperature (below 10 K), which is effectively accomplished by a multi-step
process of laser pumping, Doppler cooling and broadband cooling [15].
3.5.1 Operation of the trap
A string of ions confined within a Paul trap, even cooled ions, will vibrate within a
sphere of motion centered around a single point in space, and therefore, a string of N ions
has 3N vibrational modes.  Along the axis of symmetry, ions will vibrate back and forth
pushing one another along that axis.  Similar to a set of strongly coupled pendulums
connected by springs, the motion of a single pendulum will cause all others to move in
a similar pattern.  Unlike a spring, however, vibrations exhibited by a string of ions are
quantized, where the amplitude of motion depends on the number of quanta (phonons) in
the vibrational mode.  At the lowest energy, or absence of individual energy phonons, all
ions will vibrate back and forth in unison in the same direction.  This is known as the
common mode of vibration and represents the common-mode quantum state (stationary =
$|0\rangle$ and vibrating = $|1\rangle$), analogous to a modulated wave in a microwave signal.
  
 
In addition to the common vibrational mode, each ion exhibits its own vibrational
mode which can be addressed individually by a laser beam of an appropriate wavelength.
As a result, the vibrational state of each ion can represent another vessel of quantum
information (a qubit), the logical qubit state.  The state of a string of ions is therefore
given as:
$$|q_1, q_2, \ldots, q_j\rangle\,|n\rangle$$
The first ket, $|q_1, q_2, \ldots, q_j\rangle$, represents the logical qubit states of the j ions in
the trap, and the second ket, $|n\rangle$, represents the common-mode vibrational state
(n = 0, 1, 2, etc.), where n represents the number of energy phonons in the common mode.  For
the lowest energy common mode (n = 0), ions are not vibrating along the axis of symmetry, which
represents a Common-Mode Quantum State (CMQS) of $|0\rangle$, and when all ions are
vibrating in unison along the same axis, the CMQS = $|1\rangle$ (see Figure 3-12).  The common
mode vibrational state is typically used for establishing a state of entanglement between
two ions.
 
Figure 3-12 Common Mode Quantum State: (a) all ions are stationary → state $|0\rangle$; (b) a single
energy phonon excites all ions to vibrate in unison in the same direction → state $|1\rangle$.
  
 
3.5.2 One Qubit Operation
A single qubit operation can be implemented by affecting the internal state of each 
individual ion or affecting the vibration state of the entire trap.  The latter, however, will 
only allow for the realization of a single qubit regardless of the number of ions in the trap 
(a single quantum of information) as they all share the common mode vibrational state.  
Consequently, the internal state of an ion is used as the carrier of information, while the 
vibrational state is used to entangle multiple qubits which is used for implementing 
multiple qubit gates – entanglement will be discussed shortly. 
3.5.2.1 The electron model – a preview 
The electron model proposed by Rutherford and Bohr, with the electrons orbiting
the nucleus at discrete distances (the planetary model), was superseded by the cloud
model proposed by Erwin Schrödinger.  By proposing the probability function for the
Hydrogen atom, he surmised that the electron does not follow a specific orbit, but rather, 
it can be found within a region.  Heisenberg proposed his landmark uncertainty principle 
declaring that simultaneous measurement of both the exact location of an electron and its 
precise speed and direction is not possible; and hence, it is not possible to measure the 
path a specific electron takes as it orbits the nucleus which invalidates the Bohr model of 
the atom. Scientists can, however, determine the area an electron will probably occupy, 
and the probability of finding the electron at some place inside this area. A map of this 
area and its probabilities forms a cloudlike pattern known as an orbital.  Each orbital can
contain two electrons, but these electrons cannot have identical properties, so they must
spin in opposite directions.  Orbitals are grouped into a set of shells around the nucleus.
Each shell can contain a limited number of orbitals, which means that each shell can
contain a limited number of electrons.  Each shell corresponds to a certain level of energy,
and all the electrons in the shell have this same level of energy (Figure 3-13).
 
Within a shell, the s-orbital has the lowest energy level followed by the p, d and f-
orbitals consecutively.  Electrons fill orbitals with lower energy levels first followed by 
higher energy levels of orbitals.  Exceptions occur where, for example, the 4s-orbital (4th 
shell) has a lower energy level than the 3d-orbital, which results in electrons occupying 
the 4s-orbital before the 3d-orbital. An injection of a photon at the appropriate frequency 
can cause an electron to move from a lower energy orbital to a higher energy orbital or 
vice versa. 
Figure 3-13 Electron orbitals around the atom showing the cloudlike regions that the
electron is most likely to occupy depending on the energy level that it carries.  An electron can
be excited to move to a higher energy level through the injection of a photon of a specific
wavelength (finely detuned laser beam).  Since multiple energy levels are available for the
electron to occupy at each orbital shell level, multiple valued logic can be implemented by moving
the electron between multiple distinct energy levels.
  
 
 
Figure 3-14 Binary quantum system showing two energy levels, the ground $|g\rangle$ and excited
$|e\rangle$ states, where an electron is transitioned between the two states with a finely detuned
laser pulse with the exact energy necessary to cause such a transition.
3.5.2.2 Implementation 
The ion’s electronic ground state (s-orbital) can be used to represent the $|0\rangle$ state while
one of the higher energy states (p- and d-orbitals) is used to represent the $|1\rangle$ state
(Figure 3-14).  In an ion trap, a sharply focused laser beam of specific amplitude and duration,
detuned to the transition frequency (ω0), moves the electron from its ground state to the
long-lived excited state (1 second).  A pulse of the duration needed for this transition is
known as a π-pulse.  An application of the same laser beam for half the time, a $\frac{\pi}{2}$
pulse, places the electron in an equally weighted superposition of the $|0\rangle$ and $|1\rangle$
states, $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ (a Hadamard gate).
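The dependence of the final state on the pulse duration can be sketched with an idealized
two-level rotation.  This is an assumption made for illustration: the rotation below is the
textbook resonant rotation about the x-axis, whose -i phase factor is a convention that does
not appear in the text above, and the function name apply_pulse is hypothetical.

```python
import math

def apply_pulse(state, pulse_area):
    """Rotate the amplitudes (c_g, c_e) by an ideal resonant pulse of the given area."""
    c_g, c_e = state
    c, s = math.cos(pulse_area / 2), math.sin(pulse_area / 2)
    return (c * c_g - 1j * s * c_e, -1j * s * c_g + c * c_e)

ground = (1 + 0j, 0j)  # electron in the ground state |g> = |0>

after_pi = apply_pulse(ground, math.pi)           # full transfer to |e> = |1>
after_half_pi = apply_pulse(ground, math.pi / 2)  # equally weighted superposition

p_pi = [abs(a) ** 2 for a in after_pi]
p_half = [abs(a) ** 2 for a in after_half_pi]
print(p_pi)
print(p_half)
```

Applying the π/2 pulse twice is equivalent to one π-pulse in this model, which mirrors the
statement that half the duration yields the half-way (superposition) state.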
3.5.3 Entanglement
Entanglement was initially a mysterious and controversial aspect of quantum
mechanics which led Einstein, Podolsky and Rosen (EPR) [55] to reject the notion that
the universe may have a fundamentally non-local interpretation, and they expressed as much
in the so-called EPR paradox.  Entanglement is the property whereby a measurement on one
part of a quantum system affects the measurement outcome of another.  Later, however,
Bell, Clauser, Horne, Shimony and Holt [55] were able to confirm, experimentally, that
the concept of entanglement is a part of Nature and an integral part of quantum
mechanics.
Entanglement is the cornerstone of implementing two qubit gates.  It is also at the heart of
quantum many-body physics, where many-body phenomena such as quantum phase
transitions are harnessed to entangle many particles together at absolute zero
temperature.  The implementation of the CNOT gate in an ion-trap system relies on
entangling the two ions in order to achieve the conditional control of one qubit over the
other.
3.5.4 CNOT Gate
While we can interact with each qubit separately by affecting its electronic state, a laser
can also be used to affect the vibrational mode of the ions in the trap along the axis of
alignment.  Manipulating the vibrational state along with the logical electronic state
allows for the implementation of the CNOT gate which, along with the one qubit gates
mentioned above, can be used to implement any logic circuit.
I mentioned earlier that the vibrational mode acts as the modulated wave of a
microwave signal.  In this case, the individual ion’s electronic state acts as the carrier
wave of a modulated microwave signal exhibiting a dual sideband spectrum.  Figure 3-15
shows the vibrational sideband spectrum of an ion’s electronic state of frequency ω0
which is much higher than the common mode vibrational frequency ω1.
 
Figure 3-15 Ion's electronic excitation state coupled to its vibrational motion.  Each electronic
transition at frequency ω0 has sideband frequencies as a result of the vibrational motion ω1.
Assume two ions whose electronic and vibrational states have both been initialized to
their ground states $|0\rangle$ (i.e., s-orbital and no vibrational motion, Figure 3-12a):
$$|q_1 q_2\rangle\,|v\rangle = |00\rangle\,|0\rangle$$
In order for the ion to transition between states (either electronic or vibrational), a
quantized amount of energy must be injected into the system by a laser beam of a specific
wavelength.  Application of a laser beam detuned to the blue (higher) sideband (ω0 + ω1)
to the first ion for the π-pulse duration will transition its electronic state to
$|1\ldots\rangle|\ldots\rangle$ due to ω0, and the vibrational mode of all ions in the trap to the
$|\ldots\rangle|1\rangle$ state due to the energy of ω1, leaving the system in the state:
$$|q_1 q_2\rangle\,|v\rangle = |10\rangle\,|1\rangle$$
To implement a CNOT gate, however, we would apply the blue sideband frequency
for half the duration, a $\frac{\pi}{2}$ pulse, which would leave the system in the entangled state:
$$|q_1 q_2\rangle\,|v\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle\,|0\rangle + |10\rangle\,|1\rangle\right)$$
Notice that the first ion’s electronic state and the vibrational state of all ions are now
entangled.  If we now address the second ion, still in its ground electronic state, with a laser
beam detuned to the red sideband (ω0 - ω1), the energy of the beam is below the
transition frequency ω0, which is not enough to cause the transition.  However, if the exact
deficit in energy (ω1) is borrowed from the common vibrational mode, the second ion has enough
energy to transition from its ground to its excited state.  Borrowing that phonon of energy,
however, will also transition the trap’s vibrational state back to its ground state (no
vibration):
$$|q_1 q_2\rangle\,|v\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right)|0\rangle$$
Notice that the final electronic state of the second ion depends on the state of the 
electronic state of the first ion where q2 is negated only when q1 is   1 , and hence, the 
CNOT gate. 
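The two-pulse sequence described above can be traced with a minimal sketch.  Several
assumptions are made for illustration: the vibrational mode is truncated to zero or one
phonon, the pulses are ideal, and the phase factors of real sideband pulses are dropped so
that all amplitudes stay real.  The names BLUE, RED and sideband_pulse are hypothetical.

```python
import math

BLUE = ((0, 0), (1, 1))  # blue sideband couples |0>|n=0> with |1>|n=1>
RED = ((0, 1), (1, 0))   # red sideband couples |0>|n=1> with |1>|n=0>

def sideband_pulse(state, ion, area, coupling):
    """Rotate the coupled pair of (ion level, phonon number) by the pulse area."""
    lower, upper = coupling
    c, s = math.cos(area / 2), math.sin(area / 2)
    new_state = {}
    for (q1, q2, n), amp in state.items():
        qubits = [q1, q2]
        local = (qubits[ion], n)
        if local == lower:
            branches = [(lower, c), (upper, s)]
        elif local == upper:
            branches = [(upper, c), (lower, -s)]
        else:
            branches = [(local, 1.0)]  # the pulse does not couple to this component
        for (level, phonons), factor in branches:
            updated = qubits.copy()
            updated[ion] = level
            key = (updated[0], updated[1], phonons)
            new_state[key] = new_state.get(key, 0.0) + factor * amp
    # Drop components whose amplitude is only floating-point noise.
    return {k: a for k, a in new_state.items() if abs(a) > 1e-12}

state = {(0, 0, 0): 1.0}                             # |00>|0>
state = sideband_pulse(state, 0, math.pi / 2, BLUE)  # entangle ion 1 with the phonon
state = sideband_pulse(state, 1, math.pi, RED)       # move the phonon into ion 2
print(state)
```

In this model the component in which the first ion stayed in its ground state carries no
phonon, so the red sideband pulse leaves it untouched; only the component with one phonon
excites the second ion, and the vibrational mode ends in its ground state.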
  
 
3.5.5 Quantum Cost Calculation
The quantum cost of a circuit is the number of one- and two-qubit gates within the
circuit, where each gate physically corresponds to the effect of a single laser pulse that
performs one of the Pauli rotations described above.  The NOT gate, for example, requires a
single rotation around the x-axis; namely, a σx(π) rotation, so the quantum cost of a NOT
gate is one (1).
Although we’ve demonstrated a CNOT gate with the HZH cascade, a CNOT gate can 
easily be implemented with a single Controlled-X rotation requiring a single pulse.  A 
Toffoli gate, on the other hand requires five pulses as shown in Figure 3-16 which 
accounts for the five primitive CV and CNOT gates each requiring a single pulse (Note 
that, for this calculation, I am disregarding the LNNM criteria described in section 0.) 
 
Figure 3-16 Decomposition of the composite Toffoli gate into the five primitive 2-qubit gates (CNOT
and CV gates).  The Toffoli gate is assumed to have a quantum cost of 5, which is the number of one and
two qubit primitive gates necessary to implement it.
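A five-gate cascade of this kind can be verified with a small state-vector sketch.  The
ordering used below (CV, CNOT, CV†, CNOT, CV) is one well-known arrangement and is an
assumption here; the ordering drawn in Figure 3-16 may differ.  The qubit labels a, b, c
and all function names are illustrative.

```python
V = [[(1 + 1j) / 2, (1 - 1j) / 2], [(1 - 1j) / 2, (1 + 1j) / 2]]
V_DAG = [[V[j][i].conjugate() for j in range(2)] for i in range(2)]
X = [[0, 1], [1, 0]]
N_QUBITS = 3  # qubit 0 = a, qubit 1 = b, qubit 2 = c (the target)

def bit(index, qubit):
    """Value of `qubit` in the basis state `index` (qubit 0 is the high bit)."""
    return (index >> (N_QUBITS - 1 - qubit)) & 1

def apply_controlled(state, u, control, target):
    """Apply the 2x2 matrix u to `target` on components where `control` is 1."""
    out = [0j] * len(state)
    mask = 1 << (N_QUBITS - 1 - target)
    for index, amp in enumerate(state):
        if bit(index, control) == 0:
            out[index] += amp
            continue
        old = bit(index, target)
        for new in (0, 1):
            new_index = (index & ~mask) | (mask if new else 0)
            out[new_index] += u[new][old] * amp
    return out

# (matrix, control, target): CV(b,c), CNOT(a,b), CV-dagger(b,c), CNOT(a,b), CV(a,c)
CASCADE = [(V, 1, 2), (X, 0, 1), (V_DAG, 1, 2), (X, 0, 1), (V, 0, 2)]

def run_cascade(index):
    """Propagate one basis state through the five-gate cascade."""
    state = [0j] * 8
    state[index] = 1 + 0j
    for u, control, target in CASCADE:
        state = apply_controlled(state, u, control, target)
    return state

def toffoli(index):
    """Reference behaviour: c is inverted only when a = b = 1."""
    a, b, c = (bit(index, q) for q in range(3))
    return (a << 2) | (b << 1) | (c ^ (a & b))

matches = all(abs(run_cascade(i)[toffoli(i)] - 1) < 1e-12 for i in range(8))
print(matches, len(CASCADE))
```

Counting one unit per primitive gate in CASCADE gives the quantum cost of 5 quoted for the
Toffoli gate.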
3.5.6 Ternary Quantum gates in an ion-trap system
In an ion-trap system, a ternary system can be implemented in a similar manner to the one I
described in section 3.5.2 above, where I outlined the operation of a binary system with
the two basis states $|0\rangle$ and $|1\rangle$ represented as the energy levels $|g\rangle$ and
$|e\rangle$.  The ternary energy states shown in Figure 3-17 can be used to implement the ternary
quantum states $|0\rangle$, $|1\rangle$ and $|2\rangle$.  Ternary gates can be implemented by the
application of the appropriately detuned laser beam to perform state transitions from one state
to the next.  For example, the [[01]] gate can be performed by the application of a laser beam of
ω01 frequency, which will transition qutrits in state $|0\rangle$ to state $|1\rangle$ and qutrits
in state $|1\rangle$ to state $|0\rangle$ while leaving qutrits in state $|2\rangle$ unchanged.
The [[12]] gate can be implemented in a similar manner with the application of a laser pulse of
ω12 frequency.
 
Figure 3-17 Implementation of ternary energy states of an ion-trap system: (a) transitions between any
two states are possible with a single pulse, which requires three distinctly detuned laser systems for
each qutrit, while in (b) only transitions between neighboring energy levels are possible, using two
distinctly detuned laser systems but requiring two pulses for transitions between the states
$|0\rangle$ and $|2\rangle$.
Implementing the [[02]] gate depends on the construction of the ion-trap.  Figure 3-17a
shows an ion-trap construction with three laser beams allowing a direct implementation
of the [[02]] gate with a single application of a laser pulse of ω02 frequency.  The
implementation shown in Figure 3-17b requires two pulses to implement the [[02]] gate,
going through the $|1\rangle$ state.
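The action of the neighboring-level gates can be sketched as simple lookup tables.  The
dictionary encoding and the names GATE_01, GATE_12 and apply_gates are assumptions made for
this illustration.

```python
GATE_01 = {0: 1, 1: 0, 2: 2}  # [[01]]: swaps levels 0 and 1, leaves level 2 unchanged
GATE_12 = {0: 0, 1: 2, 2: 1}  # [[12]]: swaps levels 1 and 2, leaves level 0 unchanged

def apply_gates(value, gates):
    """Apply a sequence of single-qutrit permutative gates, first gate first."""
    for gate in gates:
        value = gate[value]
    return value

# With only neighboring transitions available (as in Figure 3-17b), a qutrit
# in level 0 reaches level 2 in two pulses by passing through level 1.
up = apply_gates(0, [GATE_01, GATE_12])
down = apply_gates(2, [GATE_12, GATE_01])
print(up, down)
```

Each table is its own inverse, so a second application of the same pulse restores the
original level.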
  
 
3.6 Methods of Quantum Logic Synthesis
Of particular interest to our field of study are the following categories of reversible 
logic synthesis: 
3.6.1 Heuristic methods
Although simple in concept, heuristic methods are able to synthesize functions with a
large number of variables.  Starting from a truth table representation of a reversible
binary circuit specification, the transformation-based method proposed by Miller, Maslov 
and Dueck (henceforth MMD) [56] compares every input minterm to its corresponding 
output minterm.  For every mismatched qubit, the algorithm adds an inverter (NOT gate) 
to correct the mismatch, and hence, construct a cascade of quantum gates to implement 
the specification.  A single guiding principle of the algorithm prohibits any operation 
from altering minterms which have already been ‘corrected’, which, by design, allows 
this algorithm to converge for any arbitrary binary specification.  Although the synthesis 
process is remarkably fast for large specifications of up to 16 binary variables, the 
algorithm’s demand for memory resources increases exponentially, which, as a result, 
limits its ability to process functions with larger number of variables. 
  
 
 
Galvanized by the above limitation of the MMD algorithm, I realized and successfully 
demonstrated that exploring alternative sequences of the input and output minterms could 
yield circuits with lower quantum cost [22].  Before I delve into our discovery however, 
it is appropriate at this time to illustrate the mechanics of the MMD algorithm as it 
represents the foundation of some of our work. 
Consider the truth table of an arbitrary reversible specification in Figure 3-18, where the
input vector ab shown in column 1 maps to the output vector AB of column 2.  Since the
the output vector to the input vector by inserting the appropriate gates between the input 
and output endpoints.  The algorithm first examines each input/output pair starting from 
the top row where, in this case, it detects a mismatch between the high bits of the input 
and output minterms 𝑎 ≠ 𝐴.  The algorithm inserts an inverter on the A line to remedy 
the mismatch, and in turn, inverts the A bit of all rows of the truth table (shaded bits in 
column 3).  A corresponding inverter is inserted below column 3 to provide a graphical 
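col 1   col 2   col 3   col 4   col 5
 ab      AB
 00      10      00      00      00
 01      01      11      01      01
 10      11      01      11      10
 11      00      10      10      11
(circuit diagram: lines a and b on the input side, A and B on the output side)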
Figure 3-18 Demonstration of Miller, Maslov and Dueck (MMD) Algorithm for a 2-variable 
binary function.  The first column ‘ab’ is the input vector and the second column ‘AB’ is the output 
vector.  Gates are placed between the input and output to transform the output vector ‘AB’ to be a 
replica of the input vector ‘ab’ (column 1 = column 5).  The resulting cascade of gates represents 
the synthesized circuit.   Starting from the top minterm, in column 3, an inverter is placed on the 
‘A’ qubit in order to transform the output minterm AB=10 to match the input minterm ab=00. The 
process is repeated for all minterm pairs until column 1 = column 5.   The synthesized circuit is 
shown below the table. 
  
 
demonstration of the circuit’s construction.  Once the first row has been corrected; i.e., 
the input and output minterms become an exact match, the algorithm proceeds to examine 
the next set of input/output minterms of the second row.  Again, the algorithm detects a 
mismatch, in the upper bit, between the input minterm (01) and the recently modified 
output minterm (11).  To correct this difference, however, the algorithm must use the 
lower bit (1) as a control line (conditional variable) while inverting the upper bit.  In 
essence, a Feynman gate, also known as Controlled-Not gate, is inserted where bit (a) is 
inverted only if bit (b) equals 1.  The algorithm continues the process until the output 
vector (last column) is an exact match to the input vector (ab) signaling that the synthesis 
process is complete, and that a solution (circuit) has been built which implements the 
initial specification.  Using the control line is necessary to preserve the previously 
completed synthesis of the first pair of minterms in compliance with the guiding principle 
of MMD mentioned above. 
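The procedure walked through above can be sketched in a few lines.  This is a minimal
illustration of the basic output-side transformation only, under the assumption that each
gate is recorded as a pair (control mask, target bit); it is not the full algorithm of
[56], and the names mmd_synthesize and simulate are hypothetical.  Bit 1 stands for
variable a and bit 0 for variable b.

```python
def mmd_synthesize(outputs, n):
    """Return gates (control_mask, target_bit) that realize the permutation `outputs`."""
    f = list(outputs)   # working copy of the output column
    applied = []        # gates in the order they are applied on the output side

    def apply_gate(controls, target):
        # Invert the target bit in every row whose control bits are all 1.
        for row in range(len(f)):
            if (f[row] & controls) == controls:
                f[row] ^= 1 << target
        applied.append((controls, target))

    for i in range(2 ** n):
        # Set bits that are 1 in the input minterm but 0 in the output minterm.
        for t in range(n):
            if (i >> t) & 1 and not (f[i] >> t) & 1:
                apply_gate(f[i], t)
        # Clear bits that are 1 in the output minterm but 0 in the input minterm.
        for t in range(n):
            if (f[i] >> t) & 1 and not (i >> t) & 1:
                apply_gate(i, t)
    # Gates were found from the output side, so the cascade that maps inputs
    # to outputs applies them in reverse order.
    return applied[::-1]

def simulate(circuit, value):
    """Run one input minterm through the gate cascade."""
    for controls, target in circuit:
        if (value & controls) == controls:
            value ^= 1 << target
    return value

spec = [0b10, 0b01, 0b11, 0b00]  # ab -> AB for the rows 00, 01, 10, 11
circuit = mmd_synthesize(spec, 2)
print(circuit)
print([simulate(circuit, x) for x in range(4)] == spec)
```

For this specification the gates are found in the order described in the text: an inverter
on a, a controlled inversion of a with b as the control line, and a controlled inversion of
b with a as the control line.  The control masks are chosen so that rows which have already
been corrected are never altered.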
3.6.2 Cycle Decomposition
The mathematical concept of cycle permutations can be applied in quantum logic 
synthesis due to the fact that quantum specifications are bijective functions where the 
minterms of output vector are a permutation of the minterms of the input vector.   Using 
this method, any reversible specification can be decomposed into a set of 2-cycle 
transpositions and each 2-cycle can be further decomposed into an ordered set of 
distance-one cycles. 
  
 
 
Figure 3-19 Cycle decomposition of 3-bit binary function shown in (a) as a table, and (b) in a 
Karnaugh map where the function is represented by the 3-tuple cycle (0,1,2) which is equivalent to the 
cycles (1,2,0) and (2,0,1).  The synthesis process decomposes each of the 3-tuple cycles into their 
equivalent 2-tuple cycles (P1, P2, P3) as shown in (b).  A cascade of 2-tuple cycles is then substituted with 
the corresponding gate cascade as shown in (c), (d) and  (e) and the solution with the least quantum cost is 
selected as the best solution. 
Figure 3-19a shows a specification with a single 3-cycle (0,1,2) where 0→1, 1→2, 
2→0 and all other minterms are self-mapping.  The 3-cycle transposition can be 
decomposed to a set of 2-cycle transpositions as shown in the Karnaugh map of Figure 
3-19b-e.   In the data flow graph of Figure 3-19c for example, the 2-cycle permutation 𝑃! = 0,1 0,2  first swaps the location of the minterms 0 and 1, then swaps the location 
of 0 and 2.  Tracing the input minterms 0, 1 and 2 shows that the product of the 2-cycle 
transpositions (0,1) and (0,2) represents the original cycle (0, 1, 2): hence,  0→1, 1→2, 
2→0.  Notice that this operation is not commutative, as the product (0,2)(0,1) is not 
equivalent to (0,1)(0,2).  The lower half of Figure 3-19c shows the cascade of gates 
which implement the specification through the decomposed cycles (0,1)(0,2).  Figure 
3-19d-e shows two possible cycle decompositions, P2 and P3 which implement the same 
specification.  Notice that the P1 has less number of gates than the other two.  The saving 
  
 
67 
in this case could be related to the lower Hamming distance of cycles (0,1) and (0,2) 
(HD=1) as compared to the cycle (1,2) (HD=2) found in both P2 and P3. 
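The tracing argument above can be checked mechanically.  The following Python sketch is illustrative only: the helper names `compose` and `hamming` are introduced here and do not come from the cited algorithms, and transpositions are applied left to right to match the reading of (0,1)(0,2) used in the text.  The two alternative products are derived from the rotations (1,2,0) and (2,0,1) of the same cycle; that they coincide with P2 and P3 of Figure 3-19 is an assumption.

```python
def compose(transpositions, size=8):
    """Apply 2-cycles left to right; image[m] is the final image of minterm m."""
    image = list(range(size))
    for a, b in transpositions:
        # Exchange a and b in every image computed so far.
        image = [b if v == a else a if v == b else v for v in image]
    return image


def hamming(a, b):
    """Hamming distance between two minterms given as integers."""
    return bin(a ^ b).count("1")


target = [1, 2, 0, 3, 4, 5, 6, 7]      # the 3-cycle 0->1, 1->2, 2->0
p1 = [(0, 1), (0, 2)]                  # decomposition traced in the text
alt_a = [(1, 2), (0, 1)]               # from the rotation (1,2,0), assumed alternative
alt_b = [(0, 2), (1, 2)]               # from the rotation (2,0,1), assumed alternative

for name, product in (("p1", p1), ("alt_a", alt_a), ("alt_b", alt_b)):
    distances = [hamming(a, b) for a, b in product]
    print(name, compose(product) == target, distances)
```

Only the first product consists entirely of distance-one transpositions, which is consistent with the Hamming distance remark above.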
Algorithms using cycle decomposition typically explore the various sets of 2-cycle 
decomposition and attempt to discover a sequence of cycle permutations with the least 
quantum cost.  The problem, however, is that the number of different 2-cycle 
permutations increases exponentially as the number of variables increases which makes it 
extremely difficult to find solutions with better quantum cost than could be discovered by 
some of the other methods presented herein.  
3.6.3 Hierarchical Diagrams
Following the classical approach to logic synthesis, hierarchical diagrams can be 
used to decompose a quantum circuit specification, given as a set of binary equations, into a set 
of sums or products of minterms.  Some of the typical approaches to hierarchical decision 
diagrams are the BDD [57], positive and negative Davio expansions [57, 58, 59, 60], 
Cosine Sine Decomposition [61], and Quantum Multiple-Valued Decision Diagrams 
(QMDD) [62].  Such decompositions are typically portrayed as a binary tree diagram 
which expands exponentially at each level of the decomposition.  Similar to the cycle 
decomposition method above, algorithms using hierarchical diagrams are limited to a small 
number of variables because the resources required to store and compute such 
diagrams become enormous very quickly, as shown in Figure 3-20. 
  
 
 
Figure 3-20 Reed Muller Reversible Logic Synthesis algorithm [60] where the algorithm starts from a 
set of equations rather than a truth table.  A solution is found by recursively factorizing the set of equations 
in every possible way (at each node), until all inputs and outputs are exactly the same.  The number of 
nodes for this method grows exponentially and requires a huge amount of resources (memory, computation, 
and time) and, as a result, the method is limited to a small number of variables. 
3.6.4 Linear Nearest Neighbor Model (LNNM)
Quantum architectures which adhere to the LNNM limit the interaction of any 
qubit to its nearest neighbors.  For example, Figure 3-21 shows the de facto embodiment of the 
well-known Toffoli gate as a cascade of five 2-qubit gates.  The first V gate, however, 
requires remote interaction between the qubits a and c, which, according to the LNNM 
architecture, is not permissible.  To calculate the LNNM quantum cost of executing such a 
  
 
gate, the information of one of the qubits (a in this example) has to be transferred to the 
location of qubit b with the aid of swap gates (each represented by a three-gate block).  The 
V gate is then applied between a and c, and a is returned to its original position.  
 
 
Figure 3-21 The universal Toffoli gate is known to have a quantum cost of 5 which is the number of 
primitive two-qubit gates needed to implement (a); however, the first CV gate performs remote interaction 
between qubits ‘a’ and ‘c’ which violates the LNNM architecture.  An LNNM compliant composition of the 
Toffoli gate is shown in (b) at a cost of 11 two-qubit primitive gates which is further reduced to 9 two-qubit 
gates as shown in section 8.2 below. 
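The cost of 11 quoted in the caption can be reproduced with a short calculation.  The sketch below is a simplified model, not the procedure of Chapter 8: it assumes that qubits a, b, c sit at positions 0, 1, 2 of a line, that each swap costs three two-qubit gates, and that every remote gate naively moves one operand next to the other and then moves it back.  The name `lnnm_cost` and the position-pair encoding of gates are hypothetical.

```python
SWAP_COST = 3  # assumption: one swap is built from three two-qubit gates


def lnnm_cost(gates):
    """Naive LNNM cost of a cascade of two-qubit gates.

    Each gate is a pair of line positions.  A gate whose operands are d
    positions apart needs d - 1 swaps to bring them together and d - 1
    swaps to restore the original layout.
    """
    total = 0
    for i, j in gates:
        swaps_one_way = abs(i - j) - 1
        total += 1 + 2 * SWAP_COST * swaps_one_way
    return total


# Five two-qubit gates of the Toffoli cascade; only the first one acts on
# the non-adjacent pair (a, c).  Gate types do not matter for the count.
toffoli = [(0, 2), (1, 2), (0, 1), (1, 2), (0, 1)]

print(len(toffoli))        # idealized quantum cost
print(lnnm_cost(toffoli))  # cost under the naive swap-in / swap-out model
```

Under these assumptions the remote gate alone contributes 1 + 3 + 3 = 7, and the four adjacent gates contribute one each.  The model does not cancel or merge neighboring swaps, so it does not capture the reduction to 9 mentioned in the caption.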
 
3.6.5 Measurement Quality of Reversible Gates
3.6.5.1 Quantum Cost 
This is the most commonly used measure to evaluate the performance of a synthesis 
algorithm.  Quantum cost is defined as the number of primitive single-qubit and two-qubit 
quantum gates needed to implement the reversible specification.  Primitive gates include 
the single-qubit NOT, V and V†, and their two-qubit controlled counterparts: CNOT, CV 
and CV† gates.  Some of the commonly used composite gates, such as the Multiple 
Control Toffoli (MCT) gates, the swap gate and the Miller gate, are typically decomposed into 
  
 
their primitive gates and the cumulative quantum cost of the primitive gates is used as 
the quantum cost of such composite gates. 
The majority of the literature, however, uses an idealistic quantum cost assuming that 
any two qubits, regardless of other qubits between them, are able to interact, and in turn 
assigns the same quantum cost to a CNOT which spans 100 qubits as to one which involves 
two neighboring qubits.  As I will discuss later in this dissertation, technological 
constraints imposed by the physical implementation of quantum gates limit interaction 
between qubits to their nearest neighbors.  In Chapter 8 and Chapter 9, I introduce a new 
set of MCT quantum costs which are compliant with the Linear Nearest Neighbor Model 
(LNNM) and report them to the literature as a benchmark yardstick. 
3.6.5.2  Ancilla Bits 
A typical quantum specification of n variables has the same number of inputs and 
outputs.  Some quantum algorithms require the addition of new qubits to hold 
intermediate calculations which are used later in the construction of the synthesized 
circuit.  Such additional bits are typically referred to as garbage or ancilla bits.  Some 
algorithms resort to the addition of ancilla bits in an effort to reduce the number of gates necessary to 
implement the specification and, in turn, achieve a better quantum cost. 
It is desirable to keep the number of ancilla bits to a minimum because the technical 
cost increases with the addition of each new bit (laser beams, detectors, etc.).  Some 
specifications, however, require the addition of ancilla bits, which cannot be avoided.  For 
  
 
example, incomplete specifications require the addition of ancilla bits in order to convert 
them to complete reversible specification suitable for quantum synthesis and 
computation. 
As I have indicated in the previous section, remote interactions between qubits might 
not be feasible, and as such, algorithms that add ancilla bits and allow for interaction 
between distant qubits provide a faulty measurement of their performance.  As I explain in 
Chapter 8, forcing LNNM compliance on such algorithms results in a huge increase in their 
LNNM quantum cost compared to their reported quantum cost.  In an effort to normalize 
comparison amongst diverse algorithms, I added a secondary measurement of 
performance in my LNNM proposal to measure the width of the circuit.  The 
ancillary ratio, defined as the rate of increase in the circuit width relative to the original 
set of variables, should also be reported by algorithm designers in order to provide an 
even comparison amongst them. 
  
  
 
 
 
PART II 
 
 
 
RESEARCH OBJECTIVE #1 
 
QUANTUM LOGIC SYNTHESIS 
OF 
BINARY SPECIFICATIONS  
  
 
Chapter 4 MMDSN and MP algorithms
4.1 Introduction
Quantum circuits are a subcategory of the general class of reversible circuits, which 
are mathematically referred to as bijections, where the set of output minterms (output 
vector) is simply a permutation of the input vector.  Mathematically, a quantum gate (and 
likewise a full quantum circuit) is represented as a unitary matrix which, when applied 
to an input quantum state, produces the resulting output quantum state.  The inherent 
reversibility of a quantum circuit implies that, if the output state is applied to the output 
end of a gate, then the quantum gate will yield the original input quantum state.   Similar 
to the universal NAND gate in classical logic, physicists have been able to define a set of 
universal reversible quantum gates which serve as the building blocks of quantum 
circuits.  For example, the equivalent representation of the classical NAND gate can be 
constructed from the quantum Toffoli gate.  Unlike classical circuits, however, quantum 
circuits cannot fan in or out because doing so would violate the no-cloning principle 
and would clearly render a quantum circuit irreversible.  Additionally, observation, 
sampling or measurement of a quantum circuit is prohibited during computation, as 
measurement disrupts the internal state of the qubit and causes it to collapse, which, as a 
result, forces the qubit to lose its quantum properties of superposition and entanglement. 
In general, automated quantum logic synthesis starts from a specification, given as a table or 
a set of equations, mapping each possible input to its corresponding output.  A fully 
specified reversible mapping represents a reversible function which, algorithmically, is 
  
 
considered prime (or ready) for automated synthesis.   A partially specified function is 
inherently irreversible and typically requires the addition of ancillary (extra) variables 
to make it reversible before it is suitable for automated synthesis.   A synthesis 
algorithm typically constructs a cascade of quantum gates mapping each input minterm to 
its corresponding output minterm.  It is of course imperative that the specification 
remains reversible at every point in the synthesis process. 
There are currently two types of algorithms to synthesize reversible circuits: (T1) 
those like MMD [63, 64, 65, 66, 67, 68, 56] that start from a reversible specification, and 
(T2) those like [37, 69, 70, 71, 11, 72] that start from a non-reversible specification and 
create ancilla bits. The second type of method has been successful for large functions 
[37, 30, 73] but solves basically a different problem. The MMD algorithm [20] (Miller, 
Maslov and Dueck) is currently the leading reversible logic synthesizer if no ancilla bits 
are used. Mathematically, the problem is to decompose the large permutation of the circuit’s 
specification into the small permutations of the reversible gates that are used.  MMD uses the 
permutation vector-like reversible function specification as its input with an internal data 
structure which corresponds to a truth table which must be stored and processed in 
memory.  Since it is intrinsically bound by the natural binary order of minterms, and 
hence does not use search, MMD cannot be enhanced through better search algorithms or 
iterative/recursive routines.  Since MMD processes only a single order of the input 
vector, it is reasonably fast and it distinguishes itself among other programs of this type 
because it achieves (theoretical) 100% convergence regardless of the problem size [20]. 
  
 
Practically, however, it can be applied to at most 8-qubit reversible functions, and very 
few reversible functions with more than 8 variables were presented as MMD benchmarks 
in the literature. It was found in our research, and by other researchers, that the 
complexity of both the synthesis process and the average circuit sizes synthesized by 
MMD grow very quickly above 8 qubits, herein “large circuits”. In our research, it was 
difficult to evaluate the quality of our results for large circuits from reversible 
specifications chiefly due to the lack of a single solution for comparison.  Consequently, 
with this research, I set the benchmark for future research.  (Observe that standard 
non-reversible specifications are used in recent papers [74, 72, 30], whereas we need reversible 
functions such as those specified by permutations.) At the time, the MMD program was the 
established benchmark for the evaluation of programs for reversible circuit synthesis with 
no ancilla bits.  A strong asset of the philosophy used in MMD, in contrast to those used 
in other programs, is that MMD gives a guarantee of convergence if the data is small 
enough for MMD to keep it in memory.  Due to the known fact that the 
quality of MMD may be very low for functions where the exact minimal solution is 
known, several research groups are constantly attempting to improve on the MMD 
algorithm.  Agrawal and Jha’s algorithm [63, 37] uses the number of terms in the Positive 
Polarity Reed-Muller (PPRM) expansion of synthesized functions as its cost function. As 
a PPRM can be stored as an expression that is shorter than 2^n, their algorithm could, in 
theory, minimize larger functions. On the other hand, this algorithm has to store many 
PPRM equations as it represents a tree-search algorithm (discussed in 3.6.3: Hierarchical 
  
 
Diagrams). Also, non-factorized PPRMs may in many cases be of similar complexity to 
truth tables, for instance for the function f=a’b’c’d’. Some of the algorithm variants from 
[63, 64, 60] have trouble with convergence and there is a trade-off between provable 
convergence and the size of circuits that can be minimized.  A challenge thus still exists to 
create an algorithm that could trade off quality for time, but with provable convergence 
for every function.  I will present such an algorithm in this chapter. 
After many failed attempts at creating better minimizers based on other search 
strategies [75, 70, 71], I decided to improve MMD. The main weakness of MMD is that it 
is limited to functions whose truth table (of exponential size) can fit in 
memory. This practically limits MMD's approach to about 13 variables. Because of its 
design principle, even with a big speed penalty MMD just cannot minimize larger 
functions. Thus an improved algorithm has to use an entirely different representation. 
When it was decided to use an internal representation other than a truth table or a 
spectrum with 2^n minterms, the problem was “what is the best representation that would 
still guarantee convergence?”  Kerntopf [76] used a new type of decision diagram but 
did not prove convergence and, as a result, his method only worked for 3 variables.  
In some of our unpublished research I used ESOPs and FPRMs rather than PPRM but I 
was not able to find a heuristic that would work better than the variants from [63, 64]. 
Other cascade types have also been proposed in the newer versions of composition-based 
search approaches [67, 70, 71, 75] but there were troubles with either the size of solutions 
or convergence.  Here I present a search method that is convergent, allows for 
  
 
synthesis of large functions, and produces near minimal solutions. This algorithm 
includes variants which are various generalizations of MMD. 
4.2 Explanation of MMD's main idea
To make this chapter self-contained I give a brief overview of MMD.  More can be 
found in [65, 68, 56]. The main idea of all algorithms for reversible circuit synthesis of 
type T1 is to transform a reversible function, bit by bit, into the identity function.  
Example 4-1: Table 4-1 illustrates the basic flow of the MMD algorithm. The first 
column lists all input minterms of the function in the natural numerical order (linear): 0, 
1, 2, 3, etc. The second column in Table 4-1 lists the values of the output vectors that 
correspond to the input vectors from the first column.  For instance, the input minterm 
abc = 000 is mapped to the output minterm ABC = 000 and input 001 is mapped to the 
output minterm 100. Self-mapping minterms are minterms with matching input and 
output values (e.g., minterm 000 above). The synthesis process applies successive gates 
to the output column (ABC), bit by bit, to generate the corresponding minterm of the 
input column (abc).    Recall that Toffoli and Feynman gates are also self-inverse gates 
(M^-1 = M), so they process information the same way from inputs to outputs and from 
outputs to inputs.  The MMD algorithm shown here is thus the “backward searching” or 
“output to input searching” algorithm.  Since the first minterm is self-mapping, MMD 
skips to the second minterm, applying a Feynman (controlled-NOT) gate to bit c (shaded), 
conditional on bit a being set (underscored).   After the application of each gate, the output 
  
 
column minterms (of intermediate functions) become more and more similar to the first 
column – the column of input vectors.  The question is “what does it mean to be more 
similar?” It is an advantage of general search methods that various measures of 
complexity, coincidence or similarity can be used [64, 76, 67, 71].  This may lead to 
better and faster solutions but it is hard or impossible to prove convergence.  The MMD 
algorithm has, however, a very simple and workable solution to this problem.  It requires 
that intermediate columns remain exactly the same as the input column in some subset of 
rows from the top. The completed rows start from row 0, then row 1, row 2, etc., up to the 
minterm under construction.  When some subset of rows from the top is completed, those rows are 
not allowed to change (shown in shaded areas in Table 4-1), which is guaranteed by the 
selection of proper control bits.  The final circuit is shown in Figure 4-1. 
Table 4-1 MMD method illustrated with truth tables of intermediate functions. The notation a → c means 
c = c ⊕ a, which “inverts c if a=1”. Control lines are underlined, and shaded minterms are completed and 
should not be modified.  The goal of the algorithm is to insert quantum gates in order to transform the output 
vector (column ABC) into the input vector (column abc = column 6).  Starting from the top, the algorithm 
skips the first row since abc = ABC and processes the second row, where the output minterm ‘100’ is 
transformed into the input minterm ‘001’ by placing two conditional inverters on qubits ‘A’ and ‘C’.  The 
process continues until column abc = column 6. 
 
  
 
This is the main idea of the MMD algorithm and actually the only algorithmic idea of this 
method (excluding templates). The proof that this algorithm is convergent is straightforward, as 
every step creates one more bit, in a row from the top, that is the same in the intermediate 
column as in the first column.  This way, after at most n * 2^n - 1 steps (intermediate 
columns) the last column becomes exactly the same as the first column, and thus 
the remaining function to be realized is an identity function.  Obviously, the strength of this 
algorithm is definite convergence, but since the complexity is exponential, MMD is 
limited in application to a small number of bits. So far, however, MMD continues to 
represent the benchmark to meet as no better algorithm has been proposed.  The symbol 
a → c in column 1 means that whenever a = 1 in the previous column, the bit c is 
flipped from 0 to 1 or from 1 to 0.  Hence, this transition from column to column 
executes the Feynman (single-control Toffoli) gate c = c ⊕ a. The reader may check that the 
number of completed rows is either the same or larger from column to column.  In this example the upper 
complexity bound is n * 2^n - 1, which for our 3-bit example yields (3 * 2^3 - 1) = 23 
gates.   Note that our example simulation resulted in only 6 gates, so in this case 
MMD happened to produce a good result.  But there are examples [75] where the gate 
number is close to the upper bound although the known minimal number of gates is 
lower. 
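The row-by-row procedure described above can be sketched in a few lines of Python.  This is a minimal illustration written for this section, not the MMD authors' implementation: minterms are stored as integers, a gate is a hypothetical pair `(controls, target)` of bit masks, control selection is the simplest safe choice rather than a minimal one, and no template matching is applied.  The sample permutation is arbitrary and is not the function of Table 4-1.

```python
def apply_gate(value, controls, target):
    """Multiple-control Toffoli: flip the target bit when all controls are 1."""
    return value ^ target if (value & controls) == controls else value


def mmd_basic(perm):
    """Transform the output column into the identity, one row at a time."""
    n = (len(perm) - 1).bit_length()
    out = list(perm)          # current intermediate column
    gates = []                # gates in the order they were applied to outputs

    def emit(controls, target):
        gates.append((controls, target))
        for r in range(len(out)):
            out[r] = apply_gate(out[r], controls, target)

    for i in range(len(out)):             # natural binary order of rows
        for j in range(n):                # first set bits that must become 1
            t = 1 << j
            if (i & t) and not (out[i] & t):
                emit(out[i], t)           # controls: 1-bits of the current output
        for j in range(n):                # then clear bits that must become 0
            t = 1 << j
            if (out[i] & t) and not (i & t):
                emit(i, t)                # controls: 1-bits of the row index
    assert out == list(range(len(out)))   # the column is now the identity
    return gates[::-1]                    # self-inverse gates: reverse the order


def simulate(circuit, x):
    for controls, target in circuit:
        x = apply_gate(x, controls, target)
    return x


perm = [1, 0, 3, 2, 5, 7, 4, 6]           # arbitrary 3-bit reversible function
circuit = mmd_basic(perm)
print(all(simulate(circuit, x) == perm[x] for x in range(len(perm))))
```

Each row needs at most n gates in this sketch and the last row is completed automatically, so the gate count stays within the n * 2^n - 1 bound quoted above.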
  
 
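[Figure 4-1: circuit diagram on lines a, b, c with intermediate signals (a ⊕ b) and (a ⊕ c) and output labels A, B, C; an arrow marks the direction of signal flow.]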
Figure 4-1 The solution circuit found by MMD in Table 4-1, drawn and created from outputs to 
inputs.  The arrow shows the flow of signal from inputs to outputs. This method is possible because each 
reversible gate used in this figure is its own inverse. 
4.3 MMDS and MMDSN Orderings
The main concept of MMD, that the natural binary minterm ordering is the only 100% 
convergent order, was challenged in [11].  It was found that MMD’s minterm ordering 
falls into a subset of orderings that do not exhibit a certain important property 
called “control line blocking”.  This observation led to the creation of the “MMDS 
ordering” [11]. To make this chapter self-contained, all these ideas will be defined below, 
but first I need to motivate the new concepts. Without any backtracking, 
bi-directional search or template matching, the MMDS orderings used exhaustively were 
superior for 3-bit circuits [11]. The MMDS orderings can be used with any number of 
inputs and have larger gains compared to MMD when the number of inputs increases.   
However, the number of MMDS orderings is too high to use all these orderings for 
synthesis. In this chapter, I introduce an algorithm which uses a subset of the MMDS 
orderings, herein MMDSN orderings, which greatly reduces the number of orderings 
examined while providing near-minimal solutions superior to MMD. 
  
 
MMD stipulates that the function is arranged in natural binary code order by input 
assignments.  Each iteration adds a gate in order to correctly transform the outputs to 
match the inputs without changing any of the previously completed (from top row) output 
minterms.  Other innovative algorithms use a greedy approach where gates are chosen 
to reduce a cost function from input to output.  For example, Hamming distance 
determines the choice of gates to transform the output function to the original function or 
to the identity function.  Such algorithms do not always converge, unlike MMD, which, although 
it might give the worst solutions, always converges. The question, however, is: how can 
the two main ideas, the natural ordered search of MMD and greedy search, be 
combined to improve the quality of results and always achieve convergence?   Such a 
combination is the goal of this chapter, part of which is discussed here. 
A good ordering should not conflict with MMD’s main idea [9, 56] of not 
changing any previously set outputs.  This idea is also what guarantees MMD’s 
convergence.  
Definition 4-1: Control Line Blocking condition occurs when all control lines of the 
current minterm are a subset of the control lines of a previously completed minterm in 
the input order. 
When the condition of Definition 4-1 occurs it makes it impossible to change any 
output bits during the current iteration without altering the output bits which have been 
previously completed.  Occurrence of this condition hinders convergence.  
  
 
control_line_blocking := false; 
for terms i=1..N-1                      // N = number of minterms in the order
 for terms j=0..i-1                     // every minterm placed before term[i]
  if ((term[i] & term[j]) == term[i]) then 
   control_line_blocking := true; 
  end if 
 end for 
end for 
 
Example 1: minterm 101 comes after 111. 
  Since (101 == (101 & 111)) then control line blocking exists. 
Example 2: minterm 101 comes after 011. 
  Since (101 <> (101 & 011)) then no control line blocking exists. 
Figure 4-2 Algorithm and examples of control line blocking detection 
Therefore, any ordering of inputs that does not lead to the occurrence of the blocking 
condition can be used in an improved MMD algorithm. A method to find all 
non-blocking permutations for any number of inputs is given in [11].  The absence of control line 
blocking seems to be a very restrictive rule.  For a three-input function there are initially 
8! (40,320) permutations, but since 000 and 111 are assigned to the first and last 
locations respectively, the number of permutations reduces to 6! (720).  Using the software, 48 
permutations, called MMDS orders, were found to exhibit no control line blocking for all 
3*3 reversible functions – see Figure 4-2.  Included in this set is, of course, the original 
MMD natural ordering. 
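The detection routine of Figure 4-2 and the count of 48 can be reproduced with a brute-force enumeration.  The Python sketch below is an illustration only; the function names are hypothetical, and the enumeration fixes 000 first and 111 last as described in the text, so it examines 6! = 720 candidate orders.

```python
from itertools import permutations


def has_control_line_blocking(order):
    """True if some minterm's 1-bits are a subset of an earlier minterm's 1-bits."""
    for i in range(1, len(order)):
        for j in range(i):
            if (order[i] & order[j]) == order[i]:
                return True
    return False


def count_non_blocking(n):
    """Count non-blocking orders of all n-bit minterms by exhaustive search."""
    last = 2**n - 1
    return sum(
        1
        for middle in permutations(range(1, last))
        if not has_control_line_blocking((0, *middle, last))
    )


print(has_control_line_blocking([0b111, 0b101]))   # Example 1 of Figure 4-2
print(has_control_line_blocking([0b011, 0b101]))   # Example 2 of Figure 4-2
print(count_non_blocking(3))
```

The exhaustive search is only practical for three variables; for four variables the number of candidates is 14!, which calls for a constructive enumeration instead.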
  
 
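[Figure 4-3: Hasse diagram of the 3-bit minterms, levels listed from top to bottom. (a) Binary labels: 111; 011, 101, 110; 001, 010, 100; 000. (b) Decimal labels: 7; 3, 5, 6; 1, 2, 4; 0. (c) The same diagram with the order 0, 4, 1, 2, 5, 3, 6, 7 read from it.]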
 
Figure 4-3 (a) MMDSN orders are created by first constructing a Hasse diagram where each level 
corresponds to the sum of digits at that level. (b) shows all possible transitions which have a single bit 
difference between the two minterms in the transition. (c) Finally, an MMDSN order is constructed from 
the Hasse diagram where all the minterms of lower levels have to be processed before any terms in the 
succeeding level. In this example, the order of levels is ({0}, {4,1,2}, {5,3,6}, {7}) which yields the input 
vector of the following order of minterms {0,4,1,2,5,3,6,7}. 
The binary vectors of cells (minterms) of a 3 * 3 reversible function can be 
represented as a well-known Hasse diagram [77], where a bit-by-bit domination relation 
(1 ≥ 0, 1 ≥ 1, 0 ≥ 0) is used as an ordering relation (see Figure 4-3a, b).  While binary 
vectors are used in Figure 4-3a, Figure 4-3b uses natural numbers being counterparts of 
these binary vectors. The rule says “never to take a dominating node (number) before a 
dominated node”. Thus 5 cannot be taken before 1, for instance. As we see, MMD order 
satisfies these rules. Another set of good orders are shown in Figure 4-3c and Figure 4-4. 
As the number of input lines increases, the number of non-blocking orderings 
increases exponentially.  For functions with four inputs, Stedman [11] reported that 
78,880 different non-blocking permutations exist.  I, however, discovered that 1,680,382 
such non-blocking permutations exist.  As the number of non-blocking orders increases, so 
does the quality of the best solution found over the MMDS orderings and, as a result, the time 
required to synthesize.  With the MMDSN order, a set of rules was created to distill the best possible 
control choices from the set of all possible control line choices, as follows: 
  
 
• The target bit cannot be used to control the current transformation, 
• Use the minimal number of control bits necessary to flip the target bit, 
• No past outputs can be changed, 
• Process 0 → 1 transitions first to maximize availability of control lines, and hence 
guarantee convergence. 
The control possibilities are then sent to the gate choice function to produce a circuit.  
Currently gate choice is based on Hamming distance but it can be any cost function [63, 
75, 76, 67, 69]. Using control line blocking as the only rule, a subset of all input orders 
can easily be found, and it can be easily proven that all non-blocking input orders will 
converge for all output permutations. 
 
Figure 4-4 A valid MMDS order {0,2,1,3,4,6,5,7} for MMD-like binary synthesis which is shown to be 
algorithmically convergent according to the MMDS convergence rule.  This order is outside the subset 
that the MMDSN algorithm creates since the minterm ‘3’ (on the third level) is taken before the minterm ‘4’ on 
the second level. 
Theorem 4-1: All non-blocking input orders converge for all output permutations.  
1) Proof of Convergence: 
Convergence is guaranteed in MMD and MMDS because at any given point in the 
algorithms all following output bits are able to be changed without altering any 
  
 
previously set outputs.  This is guaranteed because the input orders do not exhibit control 
line blocking.  With MMD and MMDS’ methodical approaches,  as long as all output bits 
can be changed without altering any previously set outputs these algorithms will 
converge every time. 
The MMDS set of orders is a superset of MMD's.  Our improved algorithm uses multiple 
MMDS input orders that exhibit no control line blocking.  Included in these orders is the 
MMD natural binary order.  The MMDS ordering algorithm performs the same bit 
manipulating strategy for all non-blocking input orders and keeps the best result, so its 
circuit is never larger than that of the standard MMD algorithm.  This outcome follows because the 
MMD order is one of the MMDS orders, so MMDS can perform no worse than MMD. 
Definition 4-2: MMDSN order is one in which the minterm 00…0 is generated first, 
followed by all minterms with a single one(1) in random order, followed by all minterms 
with two ones (1’s) in random order, and so on, successively incrementing the number of 
ones (1's) in each band until we finally reach the minterm 11…1. 
Example 4-2: for 3 variables: MMDSN order is for instance: 000, 100, 010, 001, 110, 
101, 011, 111. This is also a MMDS order but not MMD order. 
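Definition 4-2 translates directly into a band-by-band shuffle.  The Python sketch below is one possible reading of the definition, with hypothetical function names; it also counts how many distinct MMDSN orders exist for n variables, which under this definition is the product of the factorials of the band sizes C(n, k).

```python
import math
import random


def mmdsn_order(n, rng=random):
    """Return one random MMDSN order of all n-bit minterms (Definition 4-2)."""
    bands = [[] for _ in range(n + 1)]
    for minterm in range(2**n):
        bands[bin(minterm).count("1")].append(minterm)   # band = number of ones
    order = []
    for band in bands:            # bands are visited in increasing number of ones
        rng.shuffle(band)         # random order inside a band
        order.extend(band)
    return order


def mmdsn_count(n):
    """Number of distinct MMDSN orders: product of (band size)! over all bands."""
    return math.prod(math.factorial(math.comb(n, k)) for k in range(n + 1))


order = mmdsn_order(3, random.Random(0))
print([format(m, "03b") for m in order])
print(mmdsn_count(3), mmdsn_count(4))
```

Because every minterm in a band has the same number of ones, no minterm can be dominated by a different minterm taken earlier, so an order generated this way never exhibits control line blocking.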
4.4 Multi-Pass Algorithm
Earlier attempts at improving MMD algorithm resulted in very good/minimal 
solutions for some circuits or non-converging/incorrect circuits for others [75, 76, 67, 
  
 
71].  Thus the order of selecting outputs to be covered by gates was found experimentally 
to be more important than the gate heuristics to choose gates.  For larger number of 
variables, a variant of our algorithm was created based on the following principles:  
(1)  Rather than maintaining a set of tables mapping inputs to outputs, the algorithm 
creates these columns implicitly, simulating minterms one-by-one.  The simulator uses 
the equations from the specification together with the part of the already constructed 
reversible circuit.  To demonstrate the concept, imagine two circuits similar to Figure 4-1 
cascaded back to back and simulated from inputs at each stage of minterm 
transformation. The first circuit, described by equations, represents the function under 
synthesis, and the second circuit is the outcome of synthesis (in reverse order of gates). 
When the synthesis process completes, two equivalent circuits, one mirror of another 
exist, where the first circuit is specified by equations, and the second by reversible gates 
(in reverse order of gates). When we simulate this composed circuit, for every input 
minterm, the same minterm is obtained at the outputs of the concatenated circuits, and 
hence, the concatenated circuits together are a reversible identity. Since the circuits 
mirror one another, the solution is represented by the second circuit of the concatenated 
whole. 
(2) A number (k) of randomly selected MMDSN orders are generated which represent 
the function under synthesis. The solution with optimal cost is selected with the 
possibility of backtracking if the temporary cost exceeds the minimum cost determined 
earlier in the process. 
  
 
(3) When possible, template-matching method from MMD is used on the result for 
post-processing to further improve the quantum cost.  
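Principles (1) and (2) can be sketched together in Python.  The code below is a simplified illustration under stated assumptions, not the implementation evaluated in this chapter: the specification is a callable `spec` standing in for the equations, each minterm is simulated through the gates found so far instead of being read from a stored table, gate count stands in for quantum cost, an order is abandoned as soon as it can no longer beat the best result, and template matching (principle 3) is omitted.  All function names are hypothetical.

```python
import random


def apply_gates(value, gates):
    """Run a cascade of (controls, target) bit-mask gates on one minterm."""
    for controls, target in gates:
        if (value & controls) == controls:
            value ^= target
    return value


def random_mmdsn_order(n, rng):
    bands = [[] for _ in range(n + 1)]
    for m in range(2**n):
        bands[bin(m).count("1")].append(m)
    order = []
    for band in bands:
        rng.shuffle(band)
        order.extend(band)
    return order


def synthesize(spec, order, n, limit=None):
    """One pass over a non-blocking order; no truth table is stored."""
    gates = []                                  # applied on the output side
    for m in order:
        y = apply_gates(spec(m), gates)         # simulate this minterm only
        for j in range(n):                      # 0 -> 1 transitions first
            t = 1 << j
            if (m & t) and not (y & t):
                gates.append((y, t))
                y ^= t
        for j in range(n):                      # then 1 -> 0 transitions
            t = 1 << j
            if (y & t) and not (m & t):
                gates.append((m, t))
                y ^= t
        if limit is not None and len(gates) >= limit:
            return None                         # cannot improve on the best
    return gates[::-1]                          # circuit from inputs to outputs


def multi_pass(spec, n, k, seed=0):
    rng = random.Random(seed)
    best = synthesize(spec, range(2**n), n)     # the MMD natural order
    for _ in range(k):
        candidate = synthesize(spec, random_mmdsn_order(n, rng), n, len(best))
        if candidate is not None:
            best = candidate
    return best


perm = [1, 0, 3, 2, 5, 7, 4, 6]                 # arbitrary example function
circuit = multi_pass(lambda x: perm[x], n=3, k=50)
print(all(apply_gates(x, circuit) == perm[x] for x in range(8)))
```

The passes are independent of one another, so they can be distributed over threads or machines, and a larger k can only keep or improve the best circuit found.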
For functions of four variables, I created a set of randomly generated four-bit 
reversible functions, AHP1-AHP50, and synthesized them using the original MMD, 
MMDS and our MMDSN orders.  For MMDS and MMDSN, I tested the AHP functions 
against all possible permutations and calculated the minimum possible gate count as 
shown in Figure 4-5 and Table 4-2.  It is evident that our selective order consistently 
produces superior results compared to the single MMD order for a negligible time 
penalty. Notice, however, that although the MMDSN order did not reach the optimal 
gate count generated by MMDS, the time advantage of MP is huge at 4 bits, and would 
be astronomical at a greater number of bits.  Even at a higher number of bits, the MMDSN order 
consistently produces better results than MMD within tolerable time.  For example, at 9 
bits, MMDSN was able to explore 100,000 solutions within 13 minutes.  Although the 
current implementation utilizes parallel processing on an 8-core i7 processor, the 
algorithm is well suited for massive parallelization in a cloud infrastructure, on CUDA graphics 
processors or on HPC supercomputers.  Such capability would allow 
selecting a larger number of iterations (k) and thus synthesizing even larger circuits.  The reader 
should note that in this study, neither MMD nor MP used local optimization techniques, 
e.g. template matching, which would ideally reduce the number of gates even further.  
Although MP would run even slower with template matching, its suitability for 
parallelization would easily minimize such an impact.  An additional advantage of MP 
  
 
approach is that we can have a trade-off – the longer we run the new combined algorithm 
the better is potentially our result. This property is missing in both MMD and 
Agrawal/Jha approaches. 
 
Figure 4-5 Quantum cost comparison of MMDSN to both MMD and MMDS for 50 AHP functions, 
which are randomly generated reversible specifications.  Each function has three points, each corresponding to 
one of the algorithms.  The dotted line represents a linear trendline which demonstrates that, on average, 
the MMDS algorithm gives the lowest quantum cost while MMD yields the highest quantum 
cost.  MMDSN results in quantum costs that are close to those of MMDS and much better than those of MMD.   
4.5 Results of the MMDSN/MP for more than four Variables
Figure 4-6 shows the results with k=100,000 for functions up to 7 variables and 
k=10,000 for 8- and 9-variable functions.  The algorithm was implemented with 
multiple threads running on an Intel i7 processor with 8 execution cores.  The application 
allows the user to set k to any value in order to trade off between 
  
 
89 
synthesis time and improvement of quantum cost.   For example, I selected a lower value 
of k=10,000 for 8 and 9 variable functions because each sequence has many minterms 
which require a long time to synthesize, and in turn we sacrifice quality of the solution in 
exchange for reasonable time.  With the Multiple-Pass (MP) variant of the algorithm, I 
am able to synthesize functions of up to 30 variables, which is not possible, at this time, 
using either MMD or MMDSN because of the amount of resources required by both 
algorithms (2^30 rows in memory).   To understand the limitation of the MP algorithm for 
very large functions I created a sample reversible function, AHP30_1, of 30 variables 
[75], which was input as a separate equation for each bit, as it would require a huge amount 
of memory to represent as a truth table.  The synthesis generated a quantum array of 4496 
gates and took 2 hours and 45 minutes to complete.  The function was a simple cascade 
of Toffoli gates where each variable controls its immediate successor.  Our choice of a 
simplistic function, at this time, sets for us a foundation for future research. 
 
Figure 4-6 Comparison of the performance of MMD’s algorithm to the MMDSN algorithm described in 
this chapter.  For MMD the single sample with natural binary order was processed.  For MMDSN the 
average quantum cost of 5 runs over 100,000 samples for each run is reported along with the standard 
deviation of the samples and the percentage of deviation compared to the mean.  The first improvement 
column reports the percentage improvement/degradation of MMDSN over MMD and the final column 
reports the final result of MMDSN which always considers the MMD sequence in the process.  

                  MMD                    MMDSN                                                           Improvement
# bits  function  time (ms)  avg quCost  time (s)  avg quCost  std. deviation  % dev to mean  # samples  w/o MMD  w/MMD
3       ham3      0.003      20          28        9           0               0.00           100000     55%      55%
4       aj-e11    0.004      165         43        79          0               0.00           100000     52%      52%
4       hwb4      0.008      144         58        86.2        4.38            5.08           100000     40%      40%
5       hwb5      0.014      834         123       517.2       43.02           8.32           100000     38%      38%
6       hwb6      0.019      4208        452       2832        109.14          3.85           100000     33%      33%
6       mod5addr  0.023      576         562       535.6       8.96            1.67           100000     7%       7%
7       hwb7      0.033      15206       987       14105.6     409.50          2.90           100000     7%       7%
7       ham7      0.032      17201       1023      13874.6     250.96          1.81           100000     19%      19%
8       hwb8      0.039      60655       2345      62958.8     777.24          1.23           10000      -4%      0%
9       hwb9      0.042      251478      3643      262997.6    2130.32         0.81           10000      -5%      0%

4.6 Analysis and Conclusion
In this chapter I presented the MMDSN/MP algorithm for quantum logic synthesis of 
binary specifications with a large number of variables.  I stated earlier that this 
algorithm is motivated by the attempt to optimize three conflicting objectives: 1) minimizing the 
amount of time required for synthesis; 2) maximizing the number of variables of the 
specification under synthesis; and 3) reducing the quantum cost of the resulting circuit.  
While the MMD algorithm produces results very quickly because it only operates on the 
single natural order of the input vector, it does not produce the least quantum cost.  
MMDS on the other hand explores every possible solution and produces the best 
quantum cost, yet, it takes a very long time to synthesize and, consequently, is severely 
limited to six variables.  By arranging the input vector according to the Hasse diagram, 
the MMDSN algorithm constructs a specific subset of the entire search space which is 
guaranteed to be algorithmically convergent.  It produces better results than MMD 
within a tiny fraction of the time that MMDS takes, and it processes functions of up to 14 
variables.  The MP variant of the MMDSN algorithm is capable of synthesizing even 
larger functions, up to 30 variables, by starting from a set of equations rather than a truth 
table. 
In Figure 4-6 the MMDSN algorithm proved to be consistent when it ran multiple 
times for the same function.  Examining the standard deviation, we notice that the results 
from the 5 runs yielded quantum costs close to one another and to the average quantum 
cost.  The column labeled percent deviation to mean calculates the ratio of the standard 
deviation to the mean to determine how far the samples stretch away from the average 
quantum cost.  Except for the hwb4 and hwb5 functions (5.08% and 8.32%), all the results 
are below 5%.  Notice that the 
percentage improvement (in the last two columns) illustrates the superiority of 
MMDSN relative to MMD except for specifications larger than seven variables.  
Although 100,000 samples is a large number, it is a minuscule percentage of the solution 
space for functions of eight variables or higher.  A larger number of samples is necessary 
in order to discover better solutions, which would, of course, require increasingly longer 
time as the number of variables increases. 
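
As a quick check of how the two derived columns of Figure 4-6 can be reproduced, the sketch below computes the percent deviation to the mean and the percentage improvement.  The function names are hypothetical, and the improvement formula (MMD cost minus MMDSN cost, divided by MMD cost, floored at zero when the MMD sequence is included) is an assumption that agrees with the tabulated values rather than a formula stated in the text. 

```python
def percent_dev_to_mean(std_dev, mean):
    """Ratio of the standard deviation to the mean, as a percentage."""
    return 100.0 * std_dev / mean


def percent_improvement(mmd_cost, mmdsn_cost, include_mmd=False):
    """Relative quantum-cost reduction of MMDSN over MMD (assumed formula)."""
    improvement = 100.0 * (mmd_cost - mmdsn_cost) / mmd_cost
    if include_mmd:
        # With the MMD sequence among the candidates the result cannot be worse than MMD.
        improvement = max(0.0, improvement)
    return improvement


print(round(percent_dev_to_mean(4.38, 86.2), 2))    # hwb4 row
print(round(percent_improvement(20, 9)))            # ham3 row
print(round(percent_improvement(60655, 62958.8)))   # hwb8 row, without MMD
```

The values are taken from the hwb4, ham3 and hwb8 rows of Figure 4-6. 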
Finally, by using the Hasse diagram as the only foundation for constructing valid 
solutions, the MMDSN algorithm ignores a huge swath of the solution space which 
includes other convergent solutions.  In the next chapter, I will examine an extension to 
the MMDSN algorithm which allows for exploring additional subsets of the search space 
which also consist of solutions that are guaranteed to converge.  

Table 4-2 Comparison of MMDSN to both MMD and MMDS with respect to  
quantum cost and duration of synthesis 
Function | MMDSN: # Gates, Q-Cost, Time (ms) | MMD: # Gates, Q-Cost, Time (ms) | MMDS: # Gates, Q-Cost, Time (ms) 
AHP-0 18 102 8.393 20 144 1.074 15 55 178,097 
AHP-10 16 68 6.991 29 209 0.022 14 42 182,428 
AHP-100 22 150 8.04 25 149 0.018 18 98 205,910 
AHP-102 21 109 7.653 28 192 0.019 19 103 362,359 
AHP-104 19 99 7.408 28 192 0.02 17 73 392,670 
AHP-106 21 129 7.567 24 116 0.016 17 77 438,121 
AHP-108 20 108 8.078 21 129 0.015 17 77 464,066 
AHP-1000 16 80 7.497 19 111 0.014 14 54 468,883 
AHP-1002 21 113 7.513 31 223 0.014 18 78 526,966 
AHP-1004 20 136 7.056 23 167 0.029 15 79 539,691 
AHP-1006 17 93 7.495 24 172 0.03 17 109 575,764 
AHP-1008 19 95 6.682 31 215 0.024 18 90 593,118 
AHP-1010 18 74 6.953 30 230 0.028 17 85 621,180 
AHP-1012 23 131 7.146 28 168 0.031 18 70 626,634 
AHP-1014 23 139 8.069 27 179 0.031 19 75 639,966 
AHP-1016 18 126 6.748 23 167 0.03 15 79 646,605 
AHP-1018 17 105 6.939 25 197 0.03 15 63 408,780 
AHP-1020 18 106 7.317 25 193 1.803 16 96 284,467 
AHP-1022 19 111 7.697 24 156 0.153 14 54 268,481 
AHP-1024 22 138 6.622 30 218 0.148 16 76 253,849 
AHP-1026 14 66 7.252 17 113 0.154 14 66 229,625 
AHP-1028 14 86 7.343 20 148 0.157 13 81 222,084 
AHP-1030 21 137 7.776 27 167 0.124 16 80 211,866 
AHP-1032 20 108 6.726 27 187 0.106 17 93 214,853 
AHP-1034 19 123 7.132 22 138 0.102 15 71 220,812 
AHP-1036 19 107 7.257 26 186 0.093 17 81 206,786 
AHP-1038 18 106 7.927 18 106 0.083 13 65 210,267 
AHP-1040 16 96 6.478 22 174 0.078 11 39 217,464 
AHP-1042 22 146 7.263 25 173 0.08 19 99 204,661 
AHP-1044 19 107 7.325 23 159 0.096 16 92 196,889 
AHP-1046 19 107 7.739 23 147 0.092 15 71 210,829 
AHP-1048 18 94 6.484 20 120 0.096 17 89 201,351 
AHP-1050 23 123 7.325 34 230 0.083 19 83 219,222 
AHP-1052 18 110 7.557 26 166 0.08 16 84 241,366 
AHP-1068 22 134 8.606 21 97 0.017 19 99 272,891 
AHP-1070 18 106 7.707 22 158 0.018 16 80 386,639 
AHP-1072 20 112 7.611 23 159 0.019 16 72 313,911 
AHP-1074 22 126 8.236 26 194 0.017 18 106 263,204 
AHP-1076 21 121 8.644 28 184 0.02 18 74 264,143 
AHP-1078 21 105 7.69 30 222 0.021 18 78 277,411 
AHP-1080 21 145 7.879 21 109 0.016 17 93 289,429 
AHP-1082 21 133 8.109 29 233 0.016 16 92 230,252 
AHP-1084 23 119 8.797 31 187 0.014 20 104 263,490 
AHP-1086 20 116 7.367 27 195 0.014 17 85 232,918 
  
 
Chapter 5 Covered Set Partitions
5.1 Introduction
The algorithm developed by Miller, Maslov and Dueck (MMD) [20] led the way in 
logic synthesis for reversible functions without requiring additional ancillary bits. 
Stedman and Perkowski [26] presented an algorithm capable of producing circuits with 
a lower number of gates by exploring permutations of input vector ordering other than the 
natural ordering used by MMD.  Stedman’s method, however, stalled at a large number of 
variables as it required an exorbitant amount of time to compute.  Alhagi, Hawash and 
Perkowski [21, 22] followed up with a synthesis method which explores a subset of 
Stedman’s orderings that produce near optimal circuits within a reasonable amount of 
time.  Mathematically, the problem of reversible function synthesis is the decomposition 
of the large permutation given by the circuit’s specification into small permutations of reversible gates.  
MMD uses the vector-like reversible function specification as its input which 
corresponds to a truth table that is explicitly used in the synthesis process and, thus, is 
stored and processed in memory. Additionally, since it is intrinsically bound by the 
natural binary order of input terms, MMD’s algorithm does not utilize any search facility, 
and hence, cannot be enhanced through better search or iterative/recursive routines.  On 
the positive side, MMD is reasonably fast to synthesize the function since it only 
processes a single ordering of the input vector, the natural ordering.  In addition, MMD 
distinguishes itself among other methods of this type because it achieves (theoretical) 
100% convergence regardless of the problem size [56].  Practically speaking, however, 
MMD is limited to functions of at most 8 variables due to the limitation of memory 
resources, as is evident from the limited set of benchmarks presented in the literature.  
Clearly, MMD set the benchmark for small reversible functions with no 
ancillary bits. However, I discovered through this research that the complexity of both the 
synthesis process and the average circuit sizes synthesized by MMD grow very quickly 
above 8 qubits, herein “large circuits”.  Because the quality of MMD results 
may be very low for functions where the exact minimal solution is known, several 
research groups have repeatedly attempted to improve on the MMD algorithm.   
Agrawal and Jha’s algorithm [63, 60] uses the number of terms in the Positive Polarity 
Reed-Muller (PPRM) expansion of synthesized functions as its cost function.  As a PPRM 
can be stored by an expression that is shorter than 2^n, their algorithm could, in theory, 
minimize larger functions.  On the other hand, this algorithm has to store many PPRM 
equations as it is a tree-search algorithm.  Also, non-factorized PPRMs may in 
many cases be of similar complexity to truth tables, which imposes the same resource 
constraints as MMD.  Additionally, some variants of the algorithm [63, 64, 60] have 
trouble with convergence, where a trade-off is stipulated between provable convergence 
and the size of circuits that can be minimized.  Although Stedman improved on the results of 
MMD by synthesizing all permutations of the input vector to produce optimal results, 
Stedman’s approach hits the ceiling for a relatively small number of bits (5) due to the 
time it takes to synthesize all permutations of the input vector.  Alhagi, et al [21] 
presented the best challenge to both MMD and Stedman through synthesis of an 
orchestrated subset of Stedman’s input orderings which consistently produced equal or 
better results than MMD and completed in a short amount of time while able to 
synthesize considerably large functions (30 bits).  This chapter extends the effort from 
the previous chapter by combining the mathematical concept of covering sets with Hasse 
diagrams to construct a new set of input vector orderings, still a subset of Stedman’s, which 
would produce lower cost circuits than all previous efforts, yet still complete 
within a reasonable amount of time. 
5.2 MMD Style Algorithms
In their paper, A Transformation Based Algorithm for Reversible Logic Synthesis, 
Miller, et al. [56] outlined a simple, yet powerful, synthesis method for reversible circuits 
(herein MMD).  This algorithm observes an essential guiding principle: 
a completely mapped pair can never be altered by succeeding mapping 
calculations.  This important rule allows MMD to always converge, which is an essential 
property for synthesizing arbitrary reversible circuits.  Some definitions are in order 
before I illustrate the algorithm with an example. 
Definition 5-1: An n-variable mapping specification is a set of n variable input/output 
pairs, typically represented as a table, indicating the required functionality of a logic 
circuit or algorithm. 
Definition 5-2: An n-variable input/output pair describes the expected output 
sequence for its corresponding input sequence. 
  
 
Definition 5-3: A completely mapped pair is a pair where, at some point in the logic 
synthesis, a set of logic gates have been specified to map its n-variable input to its 
corresponding n-variable output. 
Definition 5-4: A self-mapping pair is a pair whose n-variable input sequence is equal 
to its corresponding output sequence. 
Synthesizing by brute force, the MMD algorithm starts with a mapping specification 
of a fully specified reversible function and creates a cascade of primitive reversible gates 
to map all input/output pairs.  Figure 5-1 shows a mapping specification of a two-variable 
function where the inputs are designated with (ab) and the outputs with (AB).  The 
algorithm synthesizes the function as follows: 
• Considering the inherent reversibility of the function, the algorithm starts 
synthesis from the output column (AB) towards the input column (ab). 
• Starting with the first pair (00 → 10), the MMD algorithm realizes that an 
inverter on line (A) would correctly map the 00 input to the 10 output.  
Essentially, any value presented on the (A) line will be inverted, as shown in 
the bolded text of the third column.  At this stage, the first input/output pair is 
completely mapped, and according to the guiding principle mentioned above, 
such a pair should never be modified by later transformations. 
• In order to observe such a rule, the algorithm uses control lines for all 
subsequent synthesis as shown in the last two columns.  Row two of the third 
column shows the pair (01 → 11) which requires an inverter on the (A) line 
with line (B) as a control line (shaded).  As a result, only the bolded digits of the 
second and third rows are affected. 
• Similarly, the third pair (10 → 11) is synthesized with an inverter on line (B) 
which is controlled by a value of one (1) on line (A). 
• At this stage, the algorithm realizes that the mapping circuit is complete as the 
first and last columns are identical. 
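
The walkthrough above can be sketched in a few lines of Python.  This is a minimal sketch of the basic transformation-based idea in natural input order, not the implementation used in this work.  The names mmd_synthesize and run_circuit are hypothetical, gates are encoded as (control mask, target bit) pairs, and the example permutation [2, 1, 3, 0] is inferred from the prose description of Figure 5-1 (A is the upper bit), which is an assumption. 

```python
def mmd_synthesize(perm):
    """Return gates (control_mask, target_bit) that turn the output column into the input column."""
    out = list(perm)                     # out[i] is the current output for input i
    gates = []

    def emit(controls, target):
        gates.append((controls, target))
        for row, value in enumerate(out):
            if (value & controls) == controls:
                out[row] = value ^ target

    for i in range(len(perm)):           # natural binary order of the inputs
        bit = 1
        while bit < len(perm):           # set ones required by i but missing in the output
            if (i & bit) and not (out[i] & bit):
                emit(out[i], bit)        # controlled by the ones currently in the output
            bit <<= 1
        bit = 1
        while bit < len(perm):           # clear ones present in the output but not in i
            if (out[i] & bit) and not (i & bit):
                emit(i, bit)             # controlled by the ones of i
            bit <<= 1
    return gates


def run_circuit(gates, value):
    """Apply the synthesized cascade to an input; gates were found from the output side."""
    for controls, target in reversed(gates):
        if (value & controls) == controls:
            value ^= target
    return value


gates = mmd_synthesize([2, 1, 3, 0])
print(gates)  # NOT on A, then A inverted under control of B, then B inverted under control of A
```

Because each gate is controlled by a pattern of ones that no earlier (smaller) row contains, completed rows are never disturbed, which is the guiding principle stated above. 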
 
Figure 5-1  MMD synthesis method with truth tables holding intermediate functions. The resulting 
circuit is shown below the table with each gate under its corresponding synthesis step. Control bits are 
shaded and target bits are bolded.  The first column ‘ab’ is the input vector and the second column 
‘AB’ is the output vector.  Gates are placed between the input and output to transform the output 
vector ‘AB’ into a replica of the input vector ‘ab’ (column 1 = column 5).  The resulting cascade of 
gates represents the synthesized circuit.   Starting from the top minterm, in column 3, an inverter is 
placed on the ‘A’ qubit in order to transform the output minterm AB=10 to match the input minterm 
ab=00. The process is repeated for all minterm pairs until column 1 = column 5.    

Inspired by MMD, Stedman, et al. [11], were able to successfully synthesize the same 
functions used by MMD and yield circuits with fewer gates.  Stedman, et al, 
questioned a limitation of the MMD algorithm requiring the use of the natural ordering 
for the input sequence in the mapping specification, column (ab) in Figure 5-1.  By 
analyzing all permutations of input/output pairs, Stedman, et al, realized that a finite set 
of input sequences suffers from control line blocking which results in a violation of 
MMD's guiding principle: never altering previously completed mapped pairs.  Such 
blocking sequences never converge.  Figure 5-2 shows an input sequence in column (ab) 
which illustrates control line blocking where the first pair is easily synthesized with an 
inverter on line (B).  However, the second pair (01 → 10) requires inverting the same 
output bit (B) from 0 to 1 in order to match the (b) bit of input term (01).  Using (A=1) as 
a control line, however, would surely violate MMD’s aforementioned guiding principle by 
altering the first completely mapped pair (11) back to (10).   Consequently, the input 
sequence in Figure 5-2 suffers from control line blocking and is rejected, for it will never 
converge. 
 
Figure 5-2 Row 2 illustrates control line blocking where the output minterm ‘10’ is to be mapped to the 
input minterm ‘01’.  Following the MMD method of synthesis, I can invert the lower bit of ‘10’ while 
using the upper bit for control.  Doing so, however, would alter the completed minterm in the first row from 
‘11’ to ‘10’ (effectively restoring the original state of the first row).  Altering any completed minterm is a 
violation of the MMD algorithm, which indicates that the algorithm is blocked and will never converge. 
Stedman, et al, were able to distill their discovery of blocking sequences to the 
following definition: 
Definition 5-5: Control Line Blocking condition occurs when all control lines of the 
current minterm are a subset of the control lines of a previously completed minterm for a 
given input order. 
Stedman, et al, condensed the process of detecting control line blocking of any input 
order to the following algorithm: 

for terms i=1..n 
 for terms j=0..i-1 
  if ((term[i] & term[j]) = term[i]) then 
   Control Line Blocking Detected; 
  end if 
 end for 
end for 

Armed with such a clear formula, Stedman, et al, launched a laborious quest for the 
optimal circuit by exhaustively synthesizing the input/output function using all valid, i.e. 
non-blocking, input sequences (herein, MMDS orders).  For a given input/output 
specification, the algorithm tests all permutations of input orderings for valid sequences 
which are then synthesized into a set of gates.  The circuit with the minimum quantum 
gate cost is selected as the result of the algorithm.  Because the natural order used by MMD 
is itself a valid sequence, this algorithm always finds solutions equal to or better than those 
of the MMD algorithm.  A mathematician 
trained in ordered sets would quickly realize that the concept harnessed by MMDS 
derives from covering graph theory where, for convergent input orderings, earlier 
terms in the ordering are covered by later terms.  The covering concept is enforced by the 
AND (&) operation in the algorithm, which tests that, for convergent input orderings, the 
pattern of ones of the term under synthesis is never contained in the pattern of ones of a 
previously completed term. 
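
The blocking test can be written directly in Python.  The sketch below is a straightforward transcription of the pseudocode, assuming the input ordering is given as a list of integers whose bits are the minterm bits; the name is_blocking is hypothetical. 

```python
def is_blocking(order):
    """Return True if some term's ones are contained in the ones of an earlier term."""
    for i in range(1, len(order)):
        for j in range(i):
            # term[i] AND term[j] equals term[i] exactly when term[j] covers all ones of term[i]
            if (order[i] & order[j]) == order[i]:
                return True
    return False


print(is_blocking([3, 1, 2, 0]))              # ordering 11, 01, 10, 00 of Figure 5-2
print(is_blocking([0, 4, 1, 2, 5, 3, 6, 7]))  # a Hasse-level ordering
```

The check is quadratic in the number of terms, which is one reason why testing every permutation quickly becomes impractical as the number of variables grows. 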
  
 
In the previous chapter, and in Alhagi, et al. [21], I demonstrated the use of the covering 
graph concept to devise a mechanism for generating valid orderings.  I realized that as the 
number of input variables of a specification increased, the level of computations for the 
MMDS algorithm swelled exponentially, rendering it impracticably slow.  In the 
MMDSN algorithm, I focused on creating a subset of the MMDS orderings, which trades 
off optimality of the resultant circuit for a reasonable synthesis time.  
Consequently, rather than tediously discovering and rejecting blocking sequences, 
I opted for creating a process to systematically build a subset of MMDS’ input 
orderings which assures convergence.  Relying on covering graph theory, I 
discovered a process to mechanically create a subset of the valid MMDS orderings by 
arranging the input terms in a Hasse diagram.   
Definition 5-6: a Hasse diagram is a type of mathematical diagram used to represent 
a finite partially ordered set, in the form of a graph where, for the relation {(x,y) | x ≤ y; 
x,y ∈ S}, each element of S is a vertex in the plane and a line segment or curve is drawn 
upward from x to y whenever y covers x. 
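
For binary minterms ordered by inclusion of their ones, y covers x exactly when y equals x with one additional bit set.  The following sketch enumerates the edges of such a Hasse diagram under that assumption; the name hasse_edges is hypothetical. 

```python
def hasse_edges(n):
    """Return (x, y) pairs where y covers x: y is x with exactly one more bit set."""
    edges = []
    for x in range(1 << n):
        for b in range(n):
            bit = 1 << b
            if not (x & bit):
                edges.append((x, x | bit))
    return edges


edges = hasse_edges(3)
print(len(edges))  # the 3-bit diagram is a cube, so it has 12 edges
```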
 
Figure 5-3 Construction of the binary Hasse diagram for any 3-bit function starts with the 
all-zeros minterm in the first level (level 0).  Every subsequent level includes all minterms which have the same 
sum of their digits.  Level 1, for example, has the minterms {001, 010, 100} since the digits of each minterm 
add up to ‘1’.  The last minterm has all digits set to ‘1’. 
  
 
Figure 5-3 (a,b) illustrates the process of creating a Hasse diagram for a 3-bit sequence 
which is then used to create an MMDS compliant ordering as shown in Figure 5-3c.  As 
demonstrated in the last chapter, I used the following procedure to create the MMDSN 
sequences: 
1. Create a Hasse diagram (Figure 5-3): 
a. Start with the base term consisting of all zeros, 
b. Draw a line from the above term to new terms constructed by placing a one for 
each zero in the base term (the covering property), 
c. Repeat step (b) for each new term to construct a new layer of the diagram, 
d. Stop when the term consisting of all ones is reached. 
2. Generate an MMDSN ordering (Figure 5-4): 
a. Start from the base level of the Hasse diagram which consists of the base term 
of all zeros, 
b. Randomly permute terms of the next level consisting of a single one and place 
them at the end of the ordering [e.g., 4,1,2 in Figure 5-4 (c)],  
c. Repeat step (b) for each level afterwards where each level includes all terms 
consisting of an additional one compared to the level before it [e.g., 5,3,6 in 
Figure 5-4 (c)], 
d. Place the last term of all ones at the end of the ordering, 
e. The result is the valid input sequence {0,4,1,2,5,3,6,7}. 
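
The ordering procedure above amounts to grouping minterms by their number of ones and shuffling within each group.  A minimal sketch, assuming minterms are represented as integers and using the hypothetical name mmdsn_order: 

```python
import random


def mmdsn_order(n, rng=None):
    """Return one random level-by-level ordering of all n-bit minterms."""
    rng = rng or random.Random()
    levels = [[] for _ in range(n + 1)]
    for term in range(1 << n):
        levels[bin(term).count("1")].append(term)   # level = number of ones
    order = []
    for level in levels:
        rng.shuffle(level)                           # random permutation inside a level
        order.extend(level)
    return order


print(mmdsn_order(3, random.Random(0)))
```

Every ordering produced this way starts with 0 and ends with the all-ones term; the sequence {0,4,1,2,5,3,6,7} is one of the 36 possible 3-bit outcomes. 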
  
 
 
Figure 5-4 (a) MMDSN orders are created by first constructing a Hasse diagram, where valid transitions 
from one minterm to another are shown in (b). (c) A valid MMDSN order is constructed from the Hasse 
diagram where all the minterms of lower levels are processed before any terms in the succeeding level. 
The reader can easily surmise that sequences arranged in a Hasse diagram are a subset 
of the Stedman set of orderings, and, as a result, are convergent.  The ultimate benefit of 
this algorithm is a circuit with fewer quantum gates than produced by MMD.  While not as 
optimal as MMDS, this algorithm always completes in real time for circuits with eight 
variables or less, and in a reasonably tolerable time for larger circuits. For a small number 
of variables, it is possible to compute all permutations at each level and discover the 
synthesis that produces the optimal circuit with an MMD type of algorithm.  
Notice that the random aspect of step (2.b) above becomes necessary as the number of 
variables and the computation time increase.  As a result, the number of orderings is 
limited to a user selected upper bound (k) of randomly created MMDSN orderings, and 
the best result among them is selected.  In summary, MMDSN devised a method of 
constructing a set of input term orderings which are guaranteed to converge and then 
synthesizing a subset of these orderings to discover the best solution out of the subset.  
In contrast, MMDS had to grudgingly try each permutation of the input orderings and 
synthesize the ones which would converge before discovering the optimal solution. 
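
The best-of-k selection can be sketched as follows.  This is a minimal sketch under stated assumptions, not the implementation used in this work: gate count stands in for quantum cost, gates are (control mask, target bit) pairs, and the names level_order, synthesize, run_circuit and best_of_k are hypothetical.  The natural (MMD) order is used as the starting candidate so the result is never worse than MMD under this cost proxy. 

```python
import random


def level_order(n, rng):
    """One random ordering that visits minterms level by level (by number of ones)."""
    levels = [[] for _ in range(n + 1)]
    for term in range(1 << n):
        levels[bin(term).count("1")].append(term)
    order = []
    for level in levels:
        rng.shuffle(level)
        order.extend(level)
    return order


def synthesize(perm, order):
    """MMD-style synthesis that processes the input terms in the given order."""
    out, gates = list(perm), []

    def emit(controls, target):
        gates.append((controls, target))
        for row, value in enumerate(out):
            if (value & controls) == controls:
                out[row] = value ^ target

    for term in order:
        bit = 1
        while bit < len(perm):                      # add ones missing from the output
            if (term & bit) and not (out[term] & bit):
                emit(out[term], bit)
            bit <<= 1
        bit = 1
        while bit < len(perm):                      # remove ones not present in the term
            if (out[term] & bit) and not (term & bit):
                emit(term, bit)
            bit <<= 1
    return gates


def run_circuit(gates, value):
    """Evaluate the cascade on one input (gates were collected from the output side)."""
    for controls, target in reversed(gates):
        if (value & controls) == controls:
            value ^= target
    return value


def best_of_k(perm, n, k, seed=0):
    """Keep the cheapest of the natural order and k random level orders."""
    rng = random.Random(seed)
    best = synthesize(perm, list(range(1 << n)))
    for _ in range(k):
        gates = synthesize(perm, level_order(n, rng))
        if len(gates) < len(best):
            best = gates
    return best
```

A call such as best_of_k([1, 0, 3, 2, 5, 7, 4, 6], 3, 100) returns the shortest cascade found; each of the k iterations is independent, which is what makes the approach easy to parallelize. 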
(a)      111          (b)     7         (c)     7
     011 101 110            3 5 6             3 5 6
     001 010 100            1 2 4             1 2 4
         000                  0                 0
                                        order: 0 4 1 2 5 3 6 7

5.3 Partial Covering Set Partitions
I realized that partitioning the original set of input vector permutations while 
maintaining the properties of a covering graph could prove fruitful.   The 
MMDSN algorithm could be sidestepping some of the Stedman search space, so I 
wanted to construct another structure which is capable of complementing MMDSN 
and exploring other portions of the MMDS search space; hence, the Covered Set 
Partition algorithm (CSP). Figure 5-5 shows an illustration of covering set partitions for 3 
and 4 bit functions.  The lower half of the graph represents the lower partition of a four 
bit function where the highest bit, bit 3, is 0, and the upper half is the partition where bit 3 
= 1. Considering a three bit function, represented by one of the cubes, the lower plane 
represents the partition where the highest bit, bit 2, is 0, and the upper plane represents the 
partition where bit 2 = 1.  
In the case of the four bit function, notice that it is possible to partition on the upper bit 
only, or on the upper two bits, and process the remaining lower bits with a Hasse diagram.  
For example, a one bit partitioning model would split the sequence into the two halves 
shown and construct two Hasse diagrams, one per half, over the remaining lower three bits, as 
shown by the diagonal dashed lines, following the same process as MMDSN.  Using the 
two upper bits for partitioning would yield four partitions with the lowest shaded plane 
where (b3b2=00) followed by the lower middle plane (b3b2=01) followed by the upper 
middle plane (b3b2=10) and lastly the upper plane (b3b2=11). 
  
 
 
Figure 5-5 The input sequence of a four variable function (B3B2B1B0) can be partitioned in multiple ways.  Using 
the upper variable (B3) only would create two partitions (visually represented as the two cubes) and the 
remaining bits (B2B1B0) are arranged according to a Hasse diagram (the levels of the Hasse diagram are 
shown as diagonal lines, as described in earlier sections).  Using the upper two bits (B3B2) for partitioning 
would create 4 partitions (shown as the four planes) where the remaining two bits of each partition would 
also be arranged according to the Hasse structure.  A CSP input sequence is then constructed by arranging 
the minterms according to the natural order of the bits forming the partition; then, for each partition, the 
remaining bits are arranged according to the MMDSN sequence described in the previous chapter. 
5.4 CSP Algorithm
The following process outlines the steps for creating CSP sequences for an n-variable 
function using the p upper bits for partitioning: 
• Create k = 2^p partitions, j = 0...k-1, where p is the number of upper bits used for 
partitioning. 
• To construct an input sequence, proceed from the lowest partition to the highest 
partition where, for each partition, the lower (n – p) bits are used to create a Hasse 
diagram. 
• For each Hasse diagram, construct an MMDSN order as follows: 
a. Start from the base level of the Hasse diagram consisting of all zeros, 
b. Randomly permute terms of the next band consisting of single ones and 
place them at the end of the ordering, 
c. Repeat step (b) for each band that follows in consecutive order, where 
each band has an additional one compared to the band before it, 
d. Place the last term consisting of all ones at the end of the sequence. 
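
The steps above can be sketched as follows.  This is a minimal sketch, assuming minterms are integers whose upper p bits select the partition; the names level_order and csp_order are hypothetical and a fresh random level ordering is drawn for every partition, which is one possible reading of the procedure. 

```python
import random


def level_order(bits, rng):
    """Random level-by-level (MMDSN style) ordering of all minterms on `bits` bits."""
    levels = [[] for _ in range(bits + 1)]
    for term in range(1 << bits):
        levels[bin(term).count("1")].append(term)
    order = []
    for level in levels:
        rng.shuffle(level)
        order.extend(level)
    return order


def csp_order(n, p, rng):
    """Partitions in natural order on the upper p bits, level order on the lower n - p bits."""
    low_bits = n - p
    order = []
    for j in range(1 << p):                       # partitions j = 0 .. 2^p - 1
        for term in level_order(low_bits, rng):   # new random ordering per partition
            order.append((j << low_bits) | term)
    return order


print(csp_order(4, 1, random.Random(0)))
```

With p = 0 the sketch reduces to a plain MMDSN ordering, and with p = n - 1 (or p = n) it reduces to the natural order used by MMD. 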
5.5 Experimental Results
To study the impact that the partition size has on the quantum cost of the synthesized 
circuit, I experimented with synthesizing all possible partition sizes for 16 benchmark 
functions of different numbers of variables and reported the results in 
Table 5-1.  I selected 16 different functions ranging from 5 variables to 13 variables, 
where for each function of n variables, I constructed sequences for partition sizes from 
zero to n-1, and for each partition size I repeated the synthesis five times for 100,000 
samples each.  Consequently, for a function of n variables, the number of solutions 
visited is 5 × 100,000 × n = 500,000n. 
 
Figure 5-6 Normalized quantum cost for 16 benchmark functions ranging from 5 to 13 variables.  In 
order to visualize the outcome for functions of different numbers of variables, I scaled the quantum cost for 
all functions to the range between 0 and 100% (y-axis) and scaled the partition size between 0 and 1 
(x-axis).  The normalized quantum costs for functions of the same number of variables were averaged and 
plotted here.  Notice that the lowest quantum cost (normalized) for all functions is found for a normalized 
partition size between 0.4 and 0.6; hence, the best quantum costs are for partition sizes which are close to 
the half-way point relative to the number of variables.  Notice that the trendlines clearly show the minimums for 
each dataset are close to the middle partition. 
Figure 5-6 shows a plot of the normalized quantum cost for the sixteen functions 
against the normalized partition size.  For each individual function I first scaled the 
quantum cost to 100% and scaled the partition size to be between 0 and 1.  I then grouped 
the functions by the number of variables, averaged the normalized quantum cost for 
each normalized partition size and plotted them on the graph.  Notice that for each 
function of n variables, there are n points on the graph and that, for the 12 and 13 variable 
functions, only the first two points are visible since their normalized quantum cost drops 
quickly below 50%; I will discuss these functions shortly.  The graph clearly shows that 
for all functions below 12 bits, the normalized quantum cost is lower around the midpoint 
of the normalized partition size.  Figure 5-7 demonstrates this finding by plotting the 
normalized partition size where the best quantum cost occurred for the 16 functions.  
These points represent the lowest points of Figure 5-6; according to Figure 5-7, the 
best quantum cost occurs when the normalized partition size is between 0.4 and 0.6, 
excluding functions with 12 or more variables. 
 
Figure 5-7 The graph plots the normalized partition size where the best normalized quantum cost was 
discovered from Table 5-1, in which, the partition size and best average quantum cost are shaded in gray.  
Notice that for functions of 11 variables or less, the best quantum cost occurred around the midpoint of the 
normalized partition size (between 0.4 and 0.6) which indicates that for functions of this number of 
variables, a partition size in the vicinity of ½ the number of variables typically yields the best results. 
  
 
5.6 Analysis and Conclusion
In this chapter I introduced the Covered Set Partitions (CSP) algorithm which builds 
on the previous work with the MMDSN algorithm and extends its reach to explore more 
patches of the solution space which are outside the range of MMDSN and MMD.  I 
maintain the same objective as MMDSN, where I continue to explore quantum logic 
synthesis for binary reversible functions of a large number of variables.  The motivation 
remains consistent with the previous chapter where I attempt to optimize three conflicting 
objectives (time, quantum cost, number of variables).  Compared to the MMDSN 
algorithm, however, the CSP algorithm sacrifices more time in favor of discovering better 
quantum costs.  For an n variable function, the CSP will take, on average, n times more 
time to complete as it repeats the synthesis process for each partition size (n partition sizes).  
However, based on the observations above where the best quantum cost occurs around 
the midpoint, a possible optimization is to explore a limited number of partition sizes 
which are centered around the midpoint of the number of variables.  For functions of less 
than 12 variables, the CSP has discovered the best quantum cost in the region where the 
normalized partition size is between 0.4 and 0.6.  Assuming we choose k < n partition sizes to 
explore, the total time would be k times greater than MMDSN with a high probability of 
discovering better solutions than MMDSN or MMD.   
The two end points of the graph, where the partition size is 0 or n-1, represent the 
MMDSN and MMD algorithms respectively.  When the partition size is zero, the Hasse 
diagram is used to construct the sequence on all the variables of the input vector, which is 
exactly what the MMDSN algorithm does (see the previous chapter for details).  On the other 
extreme, when the partition size equals n-1, the entire input vector is arranged 
according to the natural order, which is exactly what the MMD algorithm does.  
Consequently, the two algorithms are special cases of the CSP algorithm presented in this 
chapter. 
 
Figure 5-8 Results for functions of 12 and 13 variables.  As with the other functions, partition sizes near 
the midpoint tend to give the best normalized quantum cost.  The left edge represents the MMDSN 
algorithm which, in this case, performs poorly at discovering low quantum costs. 
For functions of 12 variables or more, the CSP algorithm again discovers lower 
quantum costs in the interior than at the edges; however, the left edge, which represents 
the MMDSN algorithm (partition size = 0), suffers the most, as seen in Figure 5-8.  Table 5-1 
shows that, for these functions, partition sizes around the midpoint still give a low 
quantum cost, but not always the lowest.  For the 12-bit function plus63mod4096, for 
example, the best quantum cost, 686522.2, occurs at partition size 10; however, at 
partition size 6, the quantum cost of 713982.2 is only 4% higher than the best quantum 
cost discovered.  The 13-bit function plus127mod8192, however, has a higher percentage 
difference at the midpoint when compared to the best quantum cost. 
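As a check on the plus63mod4096 figures quoted above (both values are taken from Table 5-1), the relative difference between the cost at partition size 6 and the best cost works out as:

```latex
\[
\frac{713982.2 - 686522.2}{686522.2}
  = \frac{27460.0}{686522.2}
  \approx 0.040 = 4.0\%
\]
```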
For a partition size of zero, the MMDSN case, the subset of the search space explored 
by the algorithm is so large that the random algorithm has difficulty discovering low 
quantum costs.  The results are consistent over multiple runs, which indicates that a 
partition size of zero is not the best place to start when exploring functions of a large 
number of variables.  When the search space is partitioned into smaller areas, the CSP 
algorithm has better success in discovering lower quantum costs, since the algorithm 
explores a smaller subset of solutions for each partition.  This observation has been 
consistent for all functions, as has the finding that partition sizes around the midpoint 
of the number of variables are the best place to start the search.     
  
  
 
Table 5-1 Results of synthesizing 16 functions with different numbers of variables for every possible 
partition size of each function.  Each table includes the average quantum cost over 5 runs, along 
with the standard deviation of the runs and the ratio of the standard deviation to the mean (a measure of 
the variation between runs).  The majority of results show consistency between runs, with a ratio below 3%, 
while a few exhibit a higher ratio of around 9%.  The shaded number represents the partition size with the 
lowest average quantum cost.  For all functions below 12 bits, the best quantum cost occurs 
where the partition size is around half the number of variables. 
 
Partition Size  Average QC  Std. Deviation  % std. dev from mean  Partition Size  Average QC  Std. Deviation  % std. dev from mean
0 517.20 43.02 8.32 0 59769.00 1489.74 2.49
1 480.00 30.14 6.28 1 56143.60 492.84 0.88
2 465.60 28.65 6.15 2 54689.60 652.62 1.19
3 647.00 0.00 0.00 3 52522.20 254.56 0.48
4 834.00 0.00 0.00 4 51714.60 831.78 1.61
5 51910.60 600.00 1.16
0 2832.00 109.14 3.85 6 53287.60 768.40 1.44
1 2786.80 64.12 2.30 7 66142.00 0.00 0.00
2 2483.40 220.61 8.88
3 2557.40 86.15 3.37 0 16847.00 636.93 3.78
4 2698.00 51.43 1.91 1 17100.40 997.13 5.83
5 4208.00 0.00 0.00 2 15668.60 833.24 5.32
3 14166.60 841.58 5.94
0 575.60 8.96 1.67 4 14532.20 376.44 2.59
1 573.00 0.00 0.00 5 14734.00 1390.56 9.44
2 563.40 20.85 3.70 6 15227.00 1267.06 8.32
3 602.00 0.00 0.00 7 14923.40 1318.49 8.84
4 576.00 0.00 0.00
5 576.00 0.00 0.00 0 62958.80 777.24 1.23
1 59006.40 693.12 1.17
0 14105.60 409.50 2.90 2 55714.20 290.94 0.52
1 13423.60 367.42 2.74 3 53742.60 835.05 1.55
2 12567.60 374.13 2.98 4 53043.80 489.49 0.92
3 12648.00 218.45 1.73 5 53797.80 648.31 1.21
4 12271.40 391.41 3.19 6 53745.60 722.37 1.34
5 12574.20 240.06 1.91 7 60655.00 0.00 0.00
6 15206.00 0.00 0.00
0 13874.60 250.96 1.81
1 12851.20 111.62 0.87
2 11939.80 327.81 2.75
3 11749.20 307.94 2.62
4 12315.80 393.93 3.20
5 12486.60 149.29 1.20
6 17201.00 0.00 0.00
mlp4 (8 bits)
ham8 (8 bits)
hwb5 (5 bits)
hwb6 (6 bits)
mod5addr (6 bits)
hwb7 (7 bits)
ham7 (7 bits)
urf2 (8 bits)
  
 
 
Partition Size  Average QC  Std. Deviation  % std. dev from mean  Partition Size  Average QC  Std. Deviation  % std. dev from mean
0 94677.60 1647.98 1.74 0 4185570.60 3021.68 0.07
1 87661.60 1887.98 2.15 1 3759816.20 16189.25 0.43
2 85417.80 1098.08 1.29 2 3509468.60 14335.49 0.41
3 83607.00 870.24 1.04 3 3408426.60 14354.58 0.42
4 80754.20 1660.82 2.06 4 3327920.60 19046.78 0.57
5 81495.40 1622.21 1.99 5 3270174.20 9570.76 0.29
6 83701.00 1880.12 2.25 6 3273673.20 17588.71 0.54
7 86658.80 1028.81 1.19 7 3286444.80 12309.80 0.37
8 98221.00 0.00 0.00 8 3283356.00 4912.80 0.15
9 3297517.40 18489.35 0.56
0 262997.60 2130.32 0.81 10 3531787.00 0.00 0.00
1 242097.80 1730.92 0.71
2 229557.20 1691.04 0.74 0 8039679.40 124163.35 1.54
3 222400.40 1444.06 0.65 1 7946022.80 123583.65 1.56
4 217330.00 1255.85 0.57 2 5564745.40 80074.53 1.44
5 217870.00 1602.04 0.74 3 3572569.20 83341.23 2.33
6 217573.80 1649.06 0.76 4 2252990.40 32991.30 1.46
7 218820.00 1981.44 0.91 5 1347609.40 6575.66 0.49
8 251478.00 0.00 0.00 6 713982.20 28507.76 3.99
7 698602.20 24738.06 3.54
0 253058.20 1753.67 0.69 8 691560.20 24986.72 3.61
1 234130.00 2098.63 0.90 9 698068.80 3953.90 0.57
2 222287.20 1507.57 0.68 10 686522.20 10372.66 1.51
3 212919.80 2554.46 1.20 11 850847.00 0.00 0.00
4 211623.00 3891.74 1.84
5 211727.60 2098.67 0.99 0 40960807.60 398797.49 0.97
6 209847.80 994.75 0.47 1 39614392.40 136085.65 0.34
7 208544.20 2491.61 1.19 2 27007772.20 180973.44 0.67
8 247324.00 0.00 0.00 3 16858437.80 141189.68 0.84
4 10053987.40 51895.44 0.52
0 752606.60 8630.68 1.15 5 5682551.20 51999.14 0.92
1 693328.20 6224.36 0.90 6 3046166.80 29255.60 0.96
2 662626.40 6182.74 0.93 7 2914814.40 27722.12 0.95
3 645025.60 2902.58 0.45 8 2862730.60 32650.13 1.14
4 635811.00 9533.07 1.50 9 2786472.20 106196.33 3.81
5 632504.00 3865.58 0.61 10 2807440.20 28871.99 1.03
6 635001.20 3213.31 0.51 11 2823347.00 21410.62 0.76
7 633250.80 5911.52 0.93 12 3216450.00 0.00 0.00
8 641412.00 3245.25 0.51
9 714802.00 0.00 0.00 0 24097980.80 326637.20 1.36
1 23650079.80 253220.60 1.07
2 16671353.60 342702.58 2.06
3 10735756.20 317902.05 2.96
4 7087538.60 44568.30 0.63
5 4415358.00 64362.46 1.46
6 1350258.00 46635.86 3.45
7 1425990.60 45021.62 3.16
8 2666584.40 48088.33 1.80
9 1382421.00 25670.54 1.86
10 1372139.80 32484.41 2.37
11 1390089.00 16832.92 1.21
12 1754858.00 0.00 0.00
urf4 (11 bits)
urf5 (9 bits)
hwb9 (9 bits)
plus63mod8192 (13 bits)
plus63mod4096 (12 bits)
plus127mod8192 (13 bits)
urf1 (9 bits)
urf3 (10 bits)
  
 
 Covered Set Partition with Evolutionary Algorithms Chapter 6
  
 Introduction 6.1
The algorithm presented herein avoids the addition of extraneous output bits and does 
not take the LNNM model into consideration.  This chapter reports our latest milestone in 
the chain of algorithms based on the Miller, Maslov and Dueck (MMD) [56] approach to 
quantum logic synthesis.  Stedman and Perkowski [11] presented an algorithm capable of 
producing circuits with a lower number of gates by exploring permutations of the input 
vector ordering other than the natural ordering used by MMD.  Stedman’s method, however, 
stalls at a large number of variables, as it requires an exorbitant amount of time to compute.  
Alhagi, Hawash and Perkowski [22] followed up with a synthesis method that explores a 
subset of Stedman’s orderings and produces near-optimal circuits within a reasonable 
amount of time.  Hawash, et al. [23] explored alternative convergent sets of Stedman’s 
orderings, dubbed Covering Set Partitions, which were able to discover solutions of 
lower quantum gate cost.  This chapter explores the impact of partition depth on quantum 
cost. 
The main topics of this chapter are: 
• The impact on quantum gate cost of using a Genetic Algorithm and Tabu search 
compared to random selection of valid CSP sequences, 
• A comparison of the performance (with respect to quantum gate cost) of 
variants of the genetic algorithm (single and double crossover) and Tabu 
search, 
• The impact on quantum cost of varying the depth of the CSP partition used to 
generate valid sequences. 
 MMD Style Algorithms 6.2
In the paper A Transformation Based Algorithm for Reversible Logic Synthesis, 
Miller, et al. [56] outlined a simple yet powerful method for synthesizing reversible circuits.  
The algorithm observes an essential guiding principle: a completely mapped pair can 
never be altered by succeeding mapping calculations.  This rule, along with the inherent 
attributes of the natural binary ordering of the input vector, allows MMD to always 
converge, which is essential for synthesizing arbitrary reversible circuits.  The issue of 
convergence has been treated fully in [21, 23, 11]; for the sake of setting the context for 
convergence as it relates to CSP, the reader is encouraged to review [23].  Some 
definitions are in order before I illustrate the algorithm with an example. 
 Anatomy of Covered Set Partition Algorithm 6.3
 Structure 6.3.1
I hinted earlier that MMD [56] uses the natural binary order to arrange the minterms of 
the input vector and that such an arrangement ensures convergence.  Stedman [11], 
Alhagi [22] and the current authors [23] documented the advantage of exploring 
alternative sequencings of the input vector.  Stedman outlined an algorithm for detecting 
convergent input orderings, and Alhagi devised a systematic algorithm, based on the 
Hasse diagram, for constructing valid input orderings for any number of bits and 
demonstrated the ability to produce circuits at lower quantum cost within a reasonable 
period of time.  In our attempt to improve on Alhagi’s work, I construct a different set of 
sequences based on the mathematical concept of partially ordered sets described below.  
Definition 6-1: a Hasse diagram is a type of mathematical diagram used to represent 
a finite partially ordered set (S, ≤) in the form of a graph: each element of S is drawn as a 
vertex in the plane, and a line segment or curve goes upward from x to y whenever y 
covers x (that is, whenever x < y and there is no z such that x < z < y). 
Figure 6-1 displays graphical illustrations of two variants of the covering set partitions 
method for a function of four variables.  The table to the left of the graph sets a partition 
depth of 1 bit, which is depicted graphically by the upper and lower regions (labeled b3:0 
and b3:1).  The lower half of the graph represents the partition where the highest bit, b3, 
equals 1, and the upper half represents the partition where b3 = 0.  For the remaining three 
bits (b2-b0), the algorithm uses the Hasse structure to create the sequence for each of the 
two halves.  Notice that the Hasse diagram levels are represented by the diagonal lines of 
the top half; see [22], [23] for more information about creating the Hasse sequence.   
The following ordered set represents the order of the minterms in the sequence for a 
partition depth of one (the partition bit is the leftmost bit of each minterm). 
{{0000}, {0001, 0010, 0100}, {0011, 0101, 0110}, {0111},  
  {1000}, {1001, 1010, 1100}, {1011, 1101, 1110}, {1111}} 
Alternatively, a valid sequence could be constructed using a partition depth of 2, which 
is represented graphically by the four planes of the upper and lower surfaces of the cube 
and shown in the table on the right.  In this case, terms with b3b2=00 are placed at the 
beginning of the sequence, followed by b3b2=01, b3b2=10 and finally b3b2=11.  The 
remaining two bits are still ordered according to the Hasse sequence.  The following 
ordered set is a valid sequence for a partition depth of two: 
{{0000}, {0001, 0010}, {0011}, {0100}, {0110, 0101}, {0111},  
  {1000}, {1001, 1010}, {1011}, {1100}, {1110, 1101}, {1111}} 
 Steps for creating valid sequences 6.3.2
Definition 6-2: For a binary function of n variables, a band within a Hasse diagram is 
the set of minterms $(b_{n-1} \ldots b_1 b_0)$ which have the same number of ones; i.e., 
$\{x = b_{n-1} \ldots b_1 b_0 \mid \forall x,\ \sum_{i=0}^{n-1} b_i \text{ is the same}\}$.   
Corollary 1: An n-variable binary function has a total of n+1 bands. 
The following process outlines the steps for creating CSP sequences for an n-variable 
function using the p upper bits for partitioning: 
• Create $k = 2^p$ partitions, where p is the partition depth, i.e., the number of 
upper bits used for partitioning; the partitions are numbered N=0..k-1. 
• To construct an input sequence, place all the terms sequentially according to 
their partition number N=0..k-1.   
• Within each partition, use the Hasse diagram on the remaining n-p lower bits to 
arrange the minterms as follows: 
  (a) Start from the base level of the Hasse diagram, consisting of all zeros, 
  (b) Randomly permute, i.e., shuffle, the terms of the next band, consisting of single 
  ones, and place them at the end of the ordering, 
  (c) Repeat step (b) for each band that follows in consecutive order, where each 
  band has an additional one compared to the band before it (two ones, three 
  ones, … ), 
  (d) Place the last term, consisting of all ones (n-p ones), at the end of the sequence. 
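The steps above can be sketched in a few lines of Python.  This is a minimal illustration under stated assumptions rather than the dissertation's implementation: the function name `csp_sequence`, the integer encoding of minterms, and the optional `rng` argument are all introduced here for the example.

```python
import random


def csp_sequence(n, p, rng=None):
    """Build one CSP input ordering for n variables and partition depth p.

    Minterms are encoded as integers 0 .. 2**n - 1 (assumption).
    """
    if not 0 <= p <= n:
        raise ValueError("partition depth must satisfy 0 <= p <= n")
    if rng is None:
        rng = random.Random()
    low_bits = n - p                      # bits ordered by the Hasse diagram
    sequence = []
    for part in range(2 ** p):            # partitions in natural order
        for ones in range(low_bits + 1):  # bands: 0 ones, 1 one, 2 ones, ...
            band = [(part << low_bits) | x
                    for x in range(2 ** low_bits)
                    if bin(x).count("1") == ones]
            rng.shuffle(band)             # random order inside the band only
            sequence.extend(band)
    return sequence


if __name__ == "__main__":
    seq = csp_sequence(4, 1, random.Random(0))
    print([format(x, "04b") for x in seq])
```

With n = 4 and p = 1 every generated sequence starts with 0000, continues with some shuffle of {0001, 0010, 0100}, and never moves a minterm across the partition boundary.  Setting p = 0 applies the Hasse bands to all n bits, and p = n-1 reproduces the natural binary order.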
 
Figure 6-1 Covering Set Partitions with partition size=1 using bit 3 to create two partitions of 3 bits 
each (upper and lower cubes), or partition size=2 using bits 3-2 to create 4 partitions of 2 bits each (four 
planes).  The dark line separating the partitions is referred to as the partition boundary; minterms 
are not allowed to cross this boundary in the process of rearranging minterms to create different input 
sequences.  
   
 Algorithmic Contest  6.4
In section 6.3.2 above, I outlined the steps for creating a single valid sequence using 
the CSP algorithm.  I stipulated then that there exist many valid sequences in an 
exponentially expanding search space.  In [23] I employed a random process in 
constructing the sequences and kept the one with the best cost found up to that point.  It 
was shown then that, for a small number of variables, CSP performed well compared 
to earlier attempts by [22, 56]; yet as the number of variables increased, the ability to find 
better solutions diminished sharply.  I concluded then that the vastness of the search space 
hindered our ability to discover the proverbial needle in the haystack.  In this effort, I 
present the results of exploring two alternative selection methods for the input 
vector sequence and compare the performance of the three methods: Random, Genetic 
Algorithm and Tabu search.  It is noteworthy that I was careful to provide a fair 
comparison by stipulating that each method selects and synthesizes the same number of 
sequences; only the method of constructing valid sequences varies. 
 Objective function using Quantum Cost 6.4.1
The quality of a solution is measured by quantum cost, which represents the number of 
elementary quantum gates used to implement the specification.  For an arbitrary quantum 
circuit C with k quantum NCT gates, the quantum cost $Q_c$ is calculated as follows: 
$$Q_c = \sum_{i=1}^{k} G_{qc}(i)$$
where $G_{qc}(i)$ is the quantum cost of the i-th gate in the cascade and can be calculated 
according to Table 6-1. 
Table 6-1 Quantum Cost of elementary gates 
Gate Type                          Quantum Cost 
NOT, C^1NOT                        1 [78] 
C^2NOT (Toffoli)                   5 [79] 
C^mNOT (3 ≤ m ≤ n/2)               12m - 22 [34] 
C^mNOT (n/2 + 1 ≤ m ≤ n - 2)       24m - 64 [34] 
C^(n-1)NOT                         2^n - 3 [79] 
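The cost function can be evaluated mechanically from Table 6-1.  The sketch below is illustrative only: the names `gate_cost` and `circuit_cost` are hypothetical, a circuit is represented simply as a list of control counts, and n/2 is rounded up for odd n, which is an assumption made here so that every control count falls in exactly one row of the table.

```python
def gate_cost(m, n):
    """Quantum cost of a NOT gate with m controls on an n-line circuit."""
    if not 0 <= m <= n - 1:
        raise ValueError("need 0 <= m <= n - 1 controls")
    if m <= 1:                 # NOT and C^1NOT (CNOT)
        return 1
    if m == 2:                 # C^2NOT (Toffoli)
        return 5
    if m == n - 1:             # controls on every other line
        return 2 ** n - 3
    half = (n + 1) // 2        # n/2 rounded up (assumption, see lead-in)
    if m <= half:
        return 12 * m - 22
    return 24 * m - 64         # half + 1 <= m <= n - 2


def circuit_cost(control_counts, n):
    """Sum the per-gate costs of a cascade, i.e. Q_c = sum of G_qc(i)."""
    return sum(gate_cost(m, n) for m in control_counts)


if __name__ == "__main__":
    # A 6-line cascade: NOT, CNOT, Toffoli, and one gate with 3 controls.
    print(circuit_cost([0, 1, 2, 3], n=6))
```

The example cascade costs 1 + 1 + 5 + 14 = 21 under these assumptions.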
 Method 1: A Random Skip and Hop 6.4.2
In order to discover a solution with the lowest quantum cost, a set of k valid solutions 
is created randomly according to the steps outlined above.  Notice that because each 
band is shuffled in a random manner, step (b) above, potential solutions are selected for 
examination in a blind manner; i.e., past solutions have no influence on the structure of 
new solutions.  A new solution is saved only if its quantum cost is lower than the current 
lowest cost; otherwise, the solution is purged, and a new solution is randomly 
constructed.  The following pseudo-code demonstrates the operation of random selection: 
 
cost ≔ MAX_INTEGER 
k ≔ number of solutions to examine 
for i ≔ 1 to k 
 solution ≔ initialize(); 
 if (evaluate(solution) < cost) 
  cost ≔ evaluate(solution); 
  best_solution ≔ solution; 
 end if 
end for  
 
Because the search space is so large, growing as (2^n)!, there is only a very small 
possibility that a solution is visited more than once.  More dramatic, however, 
is that the odds of finding solutions with low quantum cost are equivalent to the odds of 
hitting the jackpot of the grand lottery. 
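A runnable counterpart of the random selection loop is sketched below.  It is a generic illustration, not the dissertation's code: `random_search` accepts any candidate generator and cost function, and `toy_cost` is a hypothetical stand-in for the synthesis step, which is far too large to reproduce here.

```python
import random


def random_search(make_candidate, evaluate, k):
    """Examine k blindly generated candidates and keep the cheapest one."""
    best_solution, best_cost = None, float("inf")
    for _ in range(k):
        candidate = make_candidate()      # no memory of earlier candidates
        cost = evaluate(candidate)
        if cost < best_cost:              # keep only strict improvements
            best_solution, best_cost = candidate, cost
    return best_solution, best_cost


def toy_cost(sequence):
    """Hypothetical stand-in for quantum cost: number of displaced entries."""
    return sum(1 for i, v in enumerate(sequence) if v != i)


if __name__ == "__main__":
    rng = random.Random(1)
    best, cost = random_search(lambda: rng.sample(range(8), 8), toy_cost, k=200)
    print(best, cost)
```

In an actual run, `make_candidate` would produce a valid CSP sequence and `evaluate` would synthesize the circuit and return its quantum cost.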
 Method 2: Genetic Algorithm 6.4.3
Rather than bouncing randomly around the search space, a genetic algorithm follows a 
set of directed probabilistic steps where new solutions are the offspring of existing good 
solutions.  The following block exhibits the standard structure of a genetic algorithm: 
 
g ≔ number of generations; 
initialize(P(g)); 
do 
 evaluate(P(g)); 
 P1(g), P2(g) ≔ select(P(g));   // select set of parents 
 g ≔ g - 1; 
 P(g) ≔ recombine(P1(g), P2(g));   // crossover → offspring 
 P(g) ≔ mutate(P(g));         // mutate selected child 
while(g > 0); 
 
The initialization and evaluation steps are exactly the same steps used in the random 
algorithm in 6.4.2 above.  The roulette wheel selection process was used to randomly select 
two parents from the current generation for recombination.  Single and double crossover 
operators were used to create the offspring, with certain limitations on the position of the 
crossover, discussed below.  Finally, mutation is probabilistically applied to each 
offspring in order to continuously maintain population diversity and avoid premature 
convergence to local minima. 
   
6.4.3.1 Genotype and Valid Operators 
As discussed earlier and shown in [22], [23], the band structure of Definition 6-2 
above must be maintained to ensure algorithmic convergence.  Consequently, 
recombination operators are limited to band boundaries and mutation operators are 
limited to intra-band alterations.   
Figure 6-2 illustrates the structure of a chromosome for a three-variable binary 
function with a CSP partition depth of one (1).  In order to ensure that a child is a valid 
CSP sequence, the crossover point(s) must fall on a band boundary.  Had 
the invalid crossover point been taken in Figure 6-2, the resultant child would have been 
invalid, as it would have included the term 001 twice and lacked the term 010.  Of course, 
a repair process could have detected and corrected such a defect; this would surely yield 
different results and could be the subject of future exploration.  The reader might correctly 
surmise that limiting crossover to band boundaries could potentially result in 
stale members within each band, leading to premature convergence to local minima.  
This problem is treated with random mutation within a band at a probability higher than 
the mutation probability suggested for standard genetic algorithms.  A high mutation 
probability, I theorized, would inject diversity into the children, allowing them to escape 
such premature local minima. 
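The band-respecting operators described above can be illustrated with a short sketch.  It assumes a chromosome is stored as a list of bands, each band being a list of minterm strings; this representation and the function names are choices made for the example and are not taken from the dissertation's implementation.

```python
import random


def single_point_crossover(p1, p2, cut):
    """Child takes bands [0, cut) from p1 and bands [cut, end) from p2."""
    return [list(b) for b in p1[:cut]] + [list(b) for b in p2[cut:]]


def double_point_crossover(p1, p2, a, b):
    """Child takes bands [a, b) from p2 and every other band from p1."""
    return ([list(x) for x in p1[:a]] + [list(x) for x in p2[a:b]]
            + [list(x) for x in p1[b:]])


def mutate(chromosome, pm, rng):
    """With probability pm per band, swap two terms inside that band."""
    child = [list(b) for b in chromosome]
    for band in child:
        if len(band) >= 2 and rng.random() < pm:
            i, j = rng.sample(range(len(band)), 2)
            band[i], band[j] = band[j], band[i]
    return child


def same_bands(a, b):
    """True if both chromosomes hold the same terms in every band."""
    return [sorted(x) for x in a] == [sorted(x) for x in b]


if __name__ == "__main__":
    p1 = [["000"], ["001", "010"], ["011"], ["100"], ["110", "101"], ["111"]]
    p2 = [["000"], ["010", "001"], ["011"], ["100"], ["101", "110"], ["111"]]
    child = single_point_crossover(p1, p2, cut=4)
    print(child, same_bands(child, p1))
```

Because crossover exchanges whole bands and mutation only swaps terms inside one band, every child keeps exactly the band membership of its parents, so no repair step is needed.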
   
 
Figure 6-2 Genotype of a valid CSP input sequence showing valid and invalid mutations and cross over 
operations.  A valid crossover can only occur at the partition boundary for each specific partition and at 
the band boundary as stipulated by the Hasse diagram.  Valid mutations, in this case swap, can only swap 
elements within the band to assure that the Hasse order is not violated. 
 Method 3: Tabu Search 6.4.4
I also implemented the Tabu (or taboo) search [80, 81, 82] to examine whether it would 
discover solutions with lower quantum cost than either the genetic algorithm or random 
methods.  Tabu search is a meta-heuristic search algorithm which, during the selection 
process, forbids search moves to solutions already visited in the past m steps.  As a result, 
the algorithm is willing to accept, temporarily, solutions with inferior quantum cost in 
order to skip possibly better solutions which were just investigated.  Such an approach, I 
assumed, should provide protection against the trap of falling into local minima early.  
The pseudo-code listing that follows the selection criteria below describes the Tabu search. 
Figure 6-2 chromosomes: 
Parent 1:  {000} {001,010} {011} {100} {110,101} {111} 
Parent 2:  {000} {010,001} {011} {100} {101,110} {111} 
Offspring: {000} {001,010} {011} {100} {101,110} {111} 
   
 
Unlike the genetic algorithm, where an initial population of m solutions is created, 
followed by generations of solutions produced through interbreeding and mutation, the Tabu 
search starts with a single randomly generated solution, θ.  At each step of the process, n 
mutations are serially performed to create a neighborhood of n solutions, using the same 
probabilistic intra-band swap operator as the genetic algorithm above.  The selection 
criteria for new solutions are: 
• When a solution is selected for synthesis, it is added to the Tabu list, T, which is 
used to reject future encounters of the same solution. 
• When a solution λ with a better cost than θ is found: 
  (a) Select λ only if it is not in the Tabu list T or it has not been visited for τ 
  iterations.   
  (b) Otherwise, select the next best solution in the neighborhood N(θ) according to 
  the same criterion (a) above. 
• If all neighbor solutions are in the Tabu list, a new set of neighbors is 
generated. 
C ≔ {15, 20, 25, 30};    // constant for different runs 
θ ≔ initialize(); 
τ ≔ bands / 2 + C(runs);    // Tabu tenure 
T ≔ {}; 
do 
 evaluate(θ); 
 N(θ) ≔ sort( neighborhood(θ) );    // neighborhood set, sorted by cost 
 λ ≔ select(N(θ)) { λ ∉ T OR age of λ in T > τ }; 
 T ≔ T ∪ {θ};         // add to top of taboo set 
 θ ≔ λ;               // continue the search from the selected neighbor 
while (not termination-condition); 
   
Rather than generating a fixed number of neighbors, the algorithm determines the size 
of the neighborhood based on the size of the band selected for mutation: 
Neighborhood Size =β × length(band); // β ∈ {!! , !! , !!"} 
The factor β is inversely proportional to the number of variables and was introduced as 
a trade-off to speed up computation by reducing the neighborhood size as the number of 
variables increases.  For the sake of reducing memory usage and increasing speed of 
comparison, I chose to store the checksum of the input vector rather than saving the entire 
vector in the list. 
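The following sketch shows one way the Tabu loop, the band-dependent neighborhood size and the checksum-based Tabu list could fit together.  It is a simplified illustration under several assumptions: chromosomes are lists of bands, `zlib.crc32` stands in for the unspecified checksum, each neighbor is a single intra-band swap of the current solution (rather than a serial chain of mutations), and `tenure`, `beta` and `steps` are plain parameters supplied by the caller.

```python
import random
import zlib


def checksum(chromosome):
    """Compact fingerprint of a sequence (CRC32 is an assumption here)."""
    flat = ",".join(str(x) for band in chromosome for x in band)
    return zlib.crc32(flat.encode())


def neighbor(chromosome, band_index, rng):
    """Copy the chromosome and swap two terms inside the chosen band."""
    child = [list(b) for b in chromosome]
    band = child[band_index]
    i, j = rng.sample(range(len(band)), 2)
    band[i], band[j] = band[j], band[i]
    return child


def tabu_search(start, evaluate, steps, tenure, beta, rng):
    """Walk from neighbor to neighbor, skipping recently visited solutions."""
    current = start
    best, best_cost = start, evaluate(start)
    last_visit = {checksum(start): 0}     # checksum -> step of last visit
    swappable = [i for i, b in enumerate(start) if len(b) >= 2]
    if not swappable:
        return best, best_cost
    for step in range(1, steps + 1):
        band_index = rng.choice(swappable)
        size = max(1, round(beta * len(current[band_index])))
        candidates = [neighbor(current, band_index, rng) for _ in range(size)]
        # A candidate is allowed if never visited or visited > tenure steps ago.
        allowed = [c for c in candidates
                   if step - last_visit.get(checksum(c), float("-inf")) > tenure]
        if not allowed:
            continue                      # draw a fresh neighborhood next step
        current = min(allowed, key=evaluate)   # may be worse than before
        last_visit[checksum(current)] = step
        cost = evaluate(current)
        if cost < best_cost:
            best, best_cost = current, cost
    return best, best_cost
```

Storing only the checksum keeps the Tabu list small, at the price of a small chance that two different sequences collide and one of them is rejected needlessly.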
 Experimental Results 6.5
For the purposes of this experiment, I selected five benchmark functions of 9 to 11 
variables [83] and, similar to the previous chapter, collected the quantum cost for each 
partition size along with the standard deviation, as shown in Table 6-2.  In this experiment 
I answer two questions: what is the impact of the selection method on the quantum cost, 
and does the partition size continue to affect the result?  In order to keep a balanced 
comparison, the following steps were observed: 
• The same synthesis algorithm was used once the input sequence was 
constructed. 
• All algorithms processed the same number of input sequences for each partition 
size (100,000 sequences), regardless of the chance that the same sequence 
could have been selected repeatedly.   
• For each partition size, and for each flavor of the algorithm, 5 runs were executed, 
with the average quantum cost being reported along with the ratio of the 
standard deviation to the mean as a measure of the variation between runs.  
Consequently, for a function of n variables, the algorithm explores 2,000,000n 
input sequences (4 × 5 × 100,000 × n). 
• For the genetic algorithm, I selected a mutation probability of 0.2 and a 
recombination probability of 0.8, as they appeared to provide the best results in 
the process of tuning the probability parameters. 
• The roulette wheel method is used to select two parents for the creation of new 
offspring. 
• The comparison was performed for different partition sizes, and the results for each 
partition size are reported separately. 
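Roulette wheel selection, as used in the setup above, can be sketched as follows.  Because a lower quantum cost is better, the sketch weights each individual by the inverse of its cost; this mapping from cost to fitness is an assumption, since the text does not state which one was used, and the function name `roulette_select` is hypothetical.

```python
import random


def roulette_select(population, costs, rng, k=2):
    """Pick k parents with probability proportional to 1 / cost."""
    if len(population) != len(costs) or not population:
        raise ValueError("population and costs must be non-empty and aligned")
    weights = [1.0 / c for c in costs]    # cheaper circuits get larger slices
    # random.Random.choices draws with replacement, so the same individual
    # may be returned twice.
    return rng.choices(population, weights=weights, k=k)


if __name__ == "__main__":
    rng = random.Random(0)
    parents = roulette_select(["seq_a", "seq_b", "seq_c"],
                              [80754.2, 94677.6, 98221.0], rng)
    print(parents)
```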
 
Figure 6-3 Quantum Cost for the URF9 (9 bit) function for the four selection methods at each partition 
point.  Notice that, in general, the Tabu search performs the best compared to the other selection methods 
and that the Random selection method is always the worst.  Also notice that the best quantum cost for all 
methods happens around the midpoint of the number of variables (here at partition size = 4). 
   
Table 6-2 shows a comparison amongst four different selection methods for the input 
sequence: random, genetic algorithm with single crossover, genetic algorithm with 
double crossover, and Tabu search.  For a function of n variables, the four selection 
methods are applied for partition sizes 0 to n-2.  The partition size n-1 is not processed 
since it represents the MMD sequence, which uses the natural order.  From Table 6-2 and 
Figure 6-3, the Tabu search has consistently outperformed the other selection methods, 
with few exceptions where the genetic algorithm with single or double crossover won 
over the Tabu search.  Overall, however, utilizing an evolutionary algorithm in the 
selection process of input sequences has been shown to be superior to random selection. 
The results for the GA and Tabu search are also consistent with the results for the 
random selection with respect to the partition size.  For the Tabu search, partition 
sizes around the midpoint of the number of variables yielded the best quantum cost for all 
functions except for urf1.  Even for that specific function, the quantum cost around the 
midpoint (193544) is only 3% off the best cost (188523). 
 
  
   
Table 6-2 Comparison for five large functions between four different selection methods (random, GA with 
single point crossover, GA with double point crossover, and Tabu search).  The Tabu search 
outperformed the other methods in most cases when compared at the same partition size, as shown by the 
lightly shaded areas.  For each function, the partition size also had an impact on the outcome: similar to the 
random selection, partition sizes around the midpoint of the number of variables yield the best results 
(shaded in dark gray). 
 
                Random               GA (single crossover)  GA (double crossover)  Tabu search
Partition Size  Average QC  SD/Mean  Average QC  SD/Mean    Average QC  SD/Mean    Average QC  SD/Mean
0 94677.60 1.74 92405.34 1.71 91269.21 1.74 93068.08 1.71
1 87661.60 2.15 86083.69 2.11 85031.75 2.12 79333.75 2.12
2 85417.80 1.29 83624.03 1.27 83453.19 1.24 82855.27 1.29
3 83607.00 1.04 82102.07 1.02 81182.40 1.02 82185.68 1.01
4 80754.20 2.06 79785.15 2.03 77685.54 1.99 76070.46 3.12
5 81495.40 1.99 80109.98 2.30 79702.50 1.93 77094.65 1.95
6 83701.00 2.25 82278.08 2.21 82696.59 2.21 81608.48 2.25
7 86658.80 1.19 85445.58 1.17 84492.33 1.16 79812.75 1.14
0 262997.60 0.81 257737.65 0.81 253792.68 0.78 258000.65 0.78
1 242097.80 0.72 237013.75 0.71 234108.57 0.68 225393.05 0.72
2 229557.20 0.74 224277.38 0.71 221063.58 0.71 211422.18 0.72
3 222400.40 0.65 220176.40 0.64 218841.99 0.63 219064.39 0.63
4 217330.00 0.57 212766.07 0.55 211244.76 0.58 197770.30 0.56
5 217870.00 0.74 214166.21 0.72 215037.69 0.73 197608.09 0.74
6 217573.80 0.76 213222.32 0.74 212352.03 0.74 215398.06 0.74
7 218820.00 0.91 214224.78 0.90 210286.02 0.91 202627.32 0.89
0 253058.20 0.69 247490.92 0.67 244454.22 0.70 247743.98 0.70
1 234130.00 0.90 228745.01 0.89 228745.01 0.90 227340.23 0.89
2 222287.20 0.68 218952.89 0.66 213618.00 0.66 204504.22 0.67
3 212919.80 1.20 208022.64 1.15 210151.84 1.19 193544.10 1.19
4 211623.00 1.84 206755.67 1.78 208448.66 1.79 194481.54 1.82
5 211727.60 0.99 209186.87 0.98 208128.23 1.00 195424.57 0.97
6 209847.80 0.47 206280.39 0.47 205231.15 0.46 196207.69 0.46
7 208544.20 1.20 206041.67 1.19 203539.14 1.18 188523.96 1.16
0 752606.60 1.15 736049.25 1.16 738307.07 1.12 737554.47 1.13
1 693328.20 0.90 683621.61 0.90 668368.38 0.87 659355.12 0.91
2 662626.40 0.93 648711.25 0.92 656000.14 0.94 652687.00 0.89
3 645025.60 0.45 638575.34 0.45 637285.29 0.45 632125.09 0.45
4 635811.00 1.50 628817.08 1.45 627545.46 1.51 620551.54 1.44
5 632504.00 0.61 623016.44 0.61 610998.86 0.61 605306.33 0.59
6 635001.20 0.51 626746.18 0.50 581026.10 0.50 616918.29 0.49
7 633250.80 0.93 619319.28 0.90 626918.29 0.92 595889.00 0.90
8 641412.00 0.51 630508.00 0.51 632432.23 0.50 582402.10 0.51
0 4185571 3021.68 4118601 3026.128 4039076 2965.573 3938622 2974.07
1 3759816 16189.25 3699659 16383.29 3710939 16455.92 3466551 15965.63
2 3509469 14335.49 3425241 14243.74 3463846 14209.42 3228711 14531.16
3 3408427 14354.58 3350483 14055.13 3370934 14237.91 3077809 14226.18
4 3327921 19046.78 3251378 18385.19 3268018 18781.08 3018424 18576.38
5 3270174 9570.76 3230932 9550.853 3142637 9457.442 3149178 9339.837
6 3273673 17588.71 3198379 17747.03 3234389 16990.69 2982316 17381.48
7 3286445 12309.8 3243721 12601.05 3224002 12436.27 3154987 12122.89
8 3283356 4912.8 3237389 4909.479 3018323 4750.678 3217688 4911.208
9 3297517 18489.35 3228270 18242 3205187 18208.61 3050204 17864.71
urf4.(11.bits)
Random Single.Crossover Double.Crossover Tabu
urf5.(9.bits)
hwb9.(9.bits)
urf1.(9.bits)
urf3.(10.bits)
   
 
  
  
   
6.6 Conclusion and Analysis
By using heuristic-based selection of future sequences, guided by the quality of
already visited solutions, both the genetic algorithm and Tabu search methods were able
to discover input vector sequences which produce superior results compared to purely
random selection (~ 10%). In this experiment, for a function of n variables, 2,000,000n
solutions were explored in order to analyze each function, which is 20n times higher than
the time needed for MMDSN alone. However, this experiment shows that the Tabu search
consistently provides the best results, which are typically found around partition sizes
of ½n. Assuming the search is limited to Tabu search and explores k partitions around the
midpoint, the time necessary to find the best solution is only k times larger than the
time required for MMDSN. This, of course, is the same conclusion I reached in the last
chapter when I limited the experiment to discovering the impact of partition size.
Using the random selection method, the algorithm takes random jumps around the
huge search space and finds the best solution within the visited subset.
However, once the search space is partitioned into smaller sections, the random algorithm
jumps within the bounds of each partition and is able to find even better solutions
since the search is more focused. Adding a genetic algorithm or Tabu search increases
the quality of selection since the selection algorithm no longer jumps randomly
around the partitioned search space. For the genetic algorithm flavor, the two best
parents are used to create the next set of solutions, which allows the algorithm to visit a
set of children which are not far away from the parents. For the Tabu search, the
selection of the next set of candidates is even more constrained: a mutation operator
with a small probability (0.25) is used to create all candidate solutions from the single best
solution of the previous run. Additionally, the Tabu list reduces the chance that the same
solution is synthesized repeatedly, which allows the method to explore more unique
solutions than any of the other methods. Combined, these qualities give the Tabu
search the best chance of discovering the lowest quantum cost when compared
to both the random and genetic algorithm flavors for selecting the next set of candidates.
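
The Tabu selection loop described above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function names, the candidate count, the number of rounds, and in particular toy_cost (a hypothetical stand-in for synthesizing a sequence and measuring its quantum cost) are assumptions made for illustration. The sketch also keeps the best solution found so far as the parent of each round, which is one possible reading of "the single best solution of the previous run".

```python
import random

def mutate(seq, rng, p=0.25):
    """Return a copy of seq in which each position is, with probability p,
    swapped with a randomly chosen position (the result stays a permutation)."""
    child = list(seq)
    for i in range(len(child)):
        if rng.random() < p:
            j = rng.randrange(len(child))
            child[i], child[j] = child[j], child[i]
    return tuple(child)

def tabu_search(cost, start, rounds=50, candidates=20, p=0.25, seed=0):
    """Minimise cost over orderings of start without re-evaluating visited ones."""
    rng = random.Random(seed)
    best = tuple(start)
    best_cost = cost(best)
    tabu = {best}                      # every sequence that was already evaluated
    for _ in range(rounds):
        pool = []
        for _ in range(candidates):
            child = mutate(best, rng, p)
            if child not in tabu:      # skip anything on the Tabu list
                tabu.add(child)
                pool.append((cost(child), child))
        if pool:
            round_cost, round_best = min(pool)
            if round_cost < best_cost:
                best, best_cost = round_best, round_cost
    return best, best_cost

def toy_cost(seq):
    # Hypothetical cost: NOT a quantum cost, only something cheap to minimise.
    return sum(abs(a - b) for a, b in zip(seq, seq[1:]))

start = (3, 0, 6, 1, 7, 2, 5, 4)
best, best_cost = tabu_search(toy_cost, start)
```

Because the Tabu list is a set of already evaluated orderings, no ordering is ever costed twice, which is the property the text credits for the larger number of unique solutions explored.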
 
  
  
Figure 6-4 Plots of four functions with quantum cost vs. partition size for the four selection methods.  
Similar to the urf5 function above, the Tabu search performs the best and all selection methods have lower 
quantum cost around the midpoint of the size of function. 
  
   
Chapter 7 Other attempts at discovering better sequences
My attempts to discover methods for optimal logic synthesis have not all been
successful. I have thus far shown that the complexity of discovering solutions with
low quantum costs increases exponentially with the number of variables. I thought
that there might exist a method to predict the subset of input/output orderings which
consistently yield better circuit cost. Consequently, I attempted a few
preprocessing methods in an effort to predict patterns which could potentially lead to
good solutions. All such efforts, unfortunately, were unsuccessful: no obvious
pattern emerged as a predictor of good solutions.
7.1 Hamming Distance Predictor
The first method calculates the cumulative Hamming distance between all input and
output minterms. Such a calculation can be performed extremely quickly compared to a
full synthesis of a single input/output sequence. I hypothesized that the cumulative
Hamming distance between the input and output bits of each minterm could be a predictor
of good solutions, and if so, I could quickly eliminate solutions predicted to be of high
quantum cost. For example, the binary specification in Table 7-1a yields a total
Hamming distance of four (highlighted) between the input and output minterms. I
conjectured that a possible swap of qubit locations, Table 7-1b, could yield circuits at
lower cost, and that by calculating the Hamming distance of both situations, I could easily
focus on the sequences of Table 7-1b. In this artificial example, adding a swap gate
between the two qubits brings the Hamming distance to zero, as shown in Table 7-1b, and
hence, the entire circuit consists of a single swap gate.
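
As a small illustration of the predictor, the Python sketch below computes the cumulative Hamming distance of a two-qubit specification before and after exchanging the output bits. The specification used here (the output is the input with its two bits exchanged) is an assumed example chosen to reproduce the distances of four and zero quoted above; the function names are likewise illustrative.

```python
def cumulative_hamming(inputs, outputs):
    """Sum of bit-wise Hamming distances between paired input/output minterms."""
    return sum(bin(i ^ o).count("1") for i, o in zip(inputs, outputs))

def swap_two_bits(word):
    """Exchange the two bits of a 2-bit word (AB -> BA)."""
    return ((word & 1) << 1) | (word >> 1)

inputs = [0b00, 0b01, 0b10, 0b11]
outputs = [0b00, 0b10, 0b01, 0b11]      # assumed specification: a pure qubit swap

hd_before = cumulative_hamming(inputs, outputs)
hd_after = cumulative_hamming(inputs, [swap_two_bits(o) for o in outputs])
```

Here hd_before evaluates to 4 and hd_after to 0, matching the artificial example in which the whole circuit reduces to a single swap gate.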
Table 7-1 Hamming distance between input (ab) and output (AB) vectors. In (a), HD=4 when the
qubits are arranged in this manner; however, if the outputs are swapped (BA), then the Hamming
distance=0. I assumed that rearranging the qubits and synthesizing the modified
function might yield better results. However, preliminary experimentation with different functions did not
indicate that this would be a successful strategy.
 
 
Figure 7-1 Bit-to-bit Hamming distance vs. the quantum cost of the resultant circuit when bit
positions in the output of the same sequence are swapped. The plot shows that using a
preprocessor to determine the arrangement of variables with the lowest Hamming distance does not give a
good predictor of the result.
Another example is shown in Table 7-1c-d where swapping A and B reduces the
bit-to-bit Hamming distance from four to two. For this experiment I had
hoped to find that a low cumulative Hamming distance could be an indicator of sequences
which yield better results by swapping variables as shown above. The results, however,
did not show that either low or high Hamming distances are predictors of lower cost
circuits. Figure 7-1 shows the results of synthesizing a single sequence where each bit in
the output sequence was swapped with all other bits, followed by calculating the quantum
cost.
 
Figure 7-2 Cumulative Positional Distance (CPD) of minterms vs. quantum cost. I hypothesized a
preprocessor which, for each minterm, calculates the Absolute Distance |d| between its position in the
input vector and its position in the output vector, and then accumulates all these distances to determine the
CPD. The CPD could then, supposedly, be used as a predictor of which input/output sequence
could yield better quantum costs. However, as seen from the plot, no clear sign emerges to foretell such a
result.
7.2 Absolute Distance Predictor
I then imagined a relationship between quantum cost and the cumulative distance from each
minterm's location in the input vector to its location in the output vector. Table 7-2a,
for example, shows the minterm (10) in rows 2 and 3 of the input and output vectors
respectively, a distance of one apart. Minterm (01) also has a distance of 1, for a
cumulative distance of two over all minterms of the input and output vectors. Table 7-2b,
on the other hand, shows a different function with an absolute distance of 4
[d(10) = 2, d(01) = 1, d(11) = 1]. Again, I had hoped that calculating the absolute
distance would provide a clue for identifying potentially good solutions; however, no
clear indication emerged from this exercise. Figure 7-2 shows the same function used in
the Hamming Distance Predictor above with each output bit swapped with all other output
bits, followed by calculation of the quantum cost vs. the cumulative positional distance
of minterms.
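
The cumulative positional distance can be sketched in a few lines of Python. The input and output vectors below are an assumed example in which minterms (01) and (10) are each displaced by one row, giving the cumulative distance of two described for Table 7-2a; the function name is illustrative.

```python
def cumulative_positional_distance(inputs, outputs):
    """Sum over all minterms of |row in the input vector - row in the output vector|."""
    row_in_output = {minterm: row for row, minterm in enumerate(outputs)}
    return sum(abs(row - row_in_output[minterm])
               for row, minterm in enumerate(inputs))

inputs = [0b00, 0b01, 0b10, 0b11]
outputs = [0b00, 0b10, 0b01, 0b11]      # assumed example vectors

cpd = cumulative_positional_distance(inputs, outputs)
```

Like the Hamming predictor, this quantity costs only one pass over the truth table, which is why it was attractive as a filter before full synthesis.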
Table 7-2 Absolute Distance |d| of minterms between input and output vectors.  In (a) the input minterm 
(01) is one position away from its location in the output vector.  Similarly, the (10) input minterm is one 
position away from its location in the output vector, which combined, results in |d|=2.  Similarly for (b) 
where |d|=4. 
 
7.3 Vector Length Predictor
I also hypothesized that the sets of $2^n$ input and output minterms could be portrayed as
vectors in a $2^n$-dimensional Euclidean space and that the difference in the length of such
vectors could be a predictor of quantum cost. Again, however, the results in Figure 7-3
illustrate the fallacy in my assumption, where the results appear to closely mimic the
absolute distance predictor.
   
 
Figure 7-3 Difference in Euclidean vector length between input and output vectors as a predictor of
best quantum cost. The plot does not show a clear pattern of which candidate solutions could yield better
quantum costs. The vector length is determined by treating the value of each minterm as a decimal
distance away from zero and calculating the Euclidean distance for the multi-dimensional input and
output vectors. The absolute difference between these values is plotted vs. the quantum cost.
 
  
   
PART III
QUANTUM LOGIC SYNTHESIS OF BINARY FUNCTIONS IN COMPLIANCE WITH LINEAR NEAREST NEIGHBOR MODEL
  
   
Chapter 8 Linear Nearest Neighbor Model for Binary
8.1 Introduction
Algorithms for automated synthesis of quantum and reversible circuits have generally
measured their performance according to an established set of costs of primitive quantum
gates [4, 84, 21, 56, 23, 24, 85]. The foundation for those costs, however, is built on the
assumption that any two qubits are able to interact directly and independently of any other
qubits. However, current realizations of quantum technologies indicate that certain
intrinsic physical limitations exist which inhibit such distant interaction amongst qubits
[15, 16, 17]. For instance, in the One-Dimensional Ion Trap technology introduced by
Cirac and Zoller [16, 15], a two-qubit CNOT gate is created by entangling neighboring
qubits. In quantum optics, qubits also interact by proximity using optical wires or
crystals [86].
Ignoring the physical nearest neighbor limitation, some algorithms take the liberty of
adding many ancillary qubits with permissive interaction amongst distant qubits for the
sake of reducing the overall quantum cost of the resultant circuit [57, 59, 87]. Recent
attempts to account for the LNN model either focused on swapping qubits to bring the
qubits of a gate next to one another, or used template matching to replace non-LNN
compliant sections of the circuit with LNN compliant ones [87, 88]. In [25], I
demonstrated a mathematical decomposition for determining the worst case scenario for
applying the LNN restriction to a quantum circuit which also considers the internal structure
of MCT gates. The authors in [88] described a method for synthesizing quantum circuits
for the LNN model using template matching and qubit reordering strategies, and paper [87]
showed the benefit of the LNN method for symmetrical layout of quantum circuits.
Regardless of the algorithm used to synthesize the circuit, a physically realizable
measure of quantum cost is necessary in order to accurately determine the actual cost of
implementing a quantum circuit. In this dissertation I propose a universal measure of
LNN Quantum Cost which accounts for the cost of bringing interacting qubits next to one
another for MCT gates, and the cost of enforcing the LNN restriction on the decomposition
of MCT gates. In this analysis, the method only allows single-qubit (V, V† and NOT) and
two-qubit (CV, CV† and CNOT) gates. Multiple Control Toffoli (MCT) gates and Multiple
Control V/V† gates are not allowed in the calculations [89]; higher levels of MCV gates
require exponentially smaller degrees of rotation, which are hard to implement.
I first analyze the Linear Nearest Neighbor Quantum Cost (LNNQC) of the well-known
Toffoli gate. I then extend the analysis to Multiple Control Toffoli (MCT) gates
and calculate the number of primitive single and two-qubit gates necessary to construct
such a gate with consideration of the LNN restriction. I also calculate the cost of bringing
two-qubit gates with separated target and control lines next to one another and introduce
an algorithm for finding the optimal LNNQC for MCT gates. I introduce a new cost
component, ancillary ratio, which accounts for the increase of circuit depth, and finally I
present the proposed LNNQC cost for a set of MCT gates and portray the impact on the
quantum cost of some of the commonly used benchmark functions.
   
8.2 Toffoli Gate Cost
The well-known NCT library (NOT, Controlled-NOT and Toffoli) is one of the most widely
used sets of gates in quantum logic synthesis [4, 84, 22, 56, 23, 24]. The Toffoli gate, in
particular, is considered a universal gate because, similar to NAND in classical
computing, it can be used to construct any other binary gate, which in turn can be used to
construct any binary quantum circuit. The n-bit Multiple Controlled Toffoli (MCT)
gates are a theoretical extension of the Toffoli gate where additional control lines are used
to control an XOR gate on the target qubit. Barenco et al. [84] provided a cogent
mathematical construction for deriving any n-variable MCT gate from the set of one- and
two-qubit elementary gates: NOT, CV, CV† and CNOT. Throughout the literature,
existing quantum synthesis algorithms [24, 21, 23, 20, 34, 57, 59] calculate a circuit's
quantum cost as the number of single- and two-qubit elementary gates in a synthesized
circuit.
Figure 8-1 shows the de facto decomposition of the Toffoli gate currently used in the
literature with a quantum cost of five (5) elementary gates [84, 68]. My description of this
as a hypothetical model for calculating quantum cost refers to the assumption that the long
range (or distant) CV gate (shown within the dashed rectangle in Figure 8-1) has a quantum
cost of one (1) [84, 68]. The assumption that any pair of qubits separated by an arbitrary
distance can interact without penalty is purely theoretical. From a
technological point of view, however, it is more reasonable to assume that qubits can
only interact with their nearest neighbor, which is consistent with existing physical
realizations of ion traps [90], superconducting qubits [39] or optical lattices [91].
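
The five-gate decomposition can be checked numerically. The Python sketch below applies two CV gates, one CV† gate and two CNOT gates to every basis state of three qubits (a, b, c) and confirms that the target c is flipped only when a = b = 1. The gate ordering used is the commonly cited ordering of the Barenco et al. construction and is an assumption here, since it may differ from the left-to-right ordering drawn in Figure 8-1; the helper names are illustrative.

```python
import itertools

# V is a square root of NOT (V * V = X); V_DAG is its conjugate transpose.
V = [[(1 + 1j) / 2, (1 - 1j) / 2],
     [(1 - 1j) / 2, (1 + 1j) / 2]]
V_DAG = [[V[c][r].conjugate() for c in range(2)] for r in range(2)]
X = [[0, 1], [1, 0]]

def controlled(state, control, target, u):
    """Apply the 2x2 gate u to `target` wherever `control` is 1.
    `state` maps bit tuples (a, b, c) to complex amplitudes."""
    out = {}
    for bits, amp in state.items():
        if bits[control] == 0:
            out[bits] = out.get(bits, 0) + amp
            continue
        for new in (0, 1):
            moved = bits[:target] + (new,) + bits[target + 1:]
            out[moved] = out.get(moved, 0) + u[new][bits[target]] * amp
    return out

A, B, C = 0, 1, 2
# Assumed gate order: CV(b,c), CNOT(a,b), CV-dagger(b,c), CNOT(a,b), CV(a,c).
SEQUENCE = [(B, C, V), (A, B, X), (B, C, V_DAG), (A, B, X), (A, C, V)]

def run_decomposition(bits):
    state = {bits: 1 + 0j}
    for control, target, u in SEQUENCE:
        state = controlled(state, control, target, u)
    # Drop numerically zero amplitudes so only the surviving basis state remains.
    return {b: amp for b, amp in state.items() if abs(amp) > 1e-9}

results = {bits: run_decomposition(bits)
           for bits in itertools.product((0, 1), repeat=3)}
```

The last entry of SEQUENCE is the long range gate between a and c, the one that is charged a cost of one in the hypothetical model even though its qubits are not adjacent.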
   
While some technologies support interaction between qubits along a 2-dimensional 
plane, I limit the discussion to a linear ion trap model where a set of ions are arranged 
along a straight line like a tightly stretched string of beads [15].  As such, the discussion 
herein is limited to the application of Linear Nearest Neighbor Model (LNNM) constraint 
to synthesis of binary quantum circuits.  When I apply the LNNM to the long range CV 
gate of Figure 8-1, I would utilize swap gates to bridge the gap between the distant qubits 
(a and c) which would physically enable interaction between them.  Figure 8-2 shows the 
most precise decomposition of the Toffoli gate where interaction between any two qubits 
is constrained to their nearest neighbor.  In the figure, a swap gate is inserted before the 
CV gate to bring the information of qubits a and c next to one another, and a mirror swap 
gate is inserted after the CV gate to restore qubit a to its original position, for a total cost 
of eleven (11) elementary gates.  Notice that the two shaded CNOT gates cancel one 
another, and as a result, reduce the total quantum cost for this embodiment from 11 to 
nine (9) elementary gates [59].  
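
The cost arithmetic above rests on the fact that three alternating CNOT gates exchange two neighboring qubits. A minimal Python check on classical basis states, with illustrative names, is:

```python
def cnot(bits, control, target):
    """Classical action of a CNOT gate on a tuple of bits."""
    bits = list(bits)
    bits[target] ^= bits[control]
    return tuple(bits)

def swap_via_cnots(bits, i, j):
    """Exchange bits i and j using three alternating CNOT gates."""
    for control, target in ((i, j), (j, i), (i, j)):
        bits = cnot(bits, control, target)
    return bits

swapped = {(a, b): swap_via_cnots((a, b), 0, 1) for a in (0, 1) for b in (0, 1)}

TOFFOLI_GATES = 5        # elementary gates in the unconstrained decomposition
CNOTS_PER_SWAP = 3
cost_before_cancellation = TOFFOLI_GATES + 2 * CNOTS_PER_SWAP   # swap in, swap out
cost_after_cancellation = cost_before_cancellation - 2          # two CNOTs cancel
```

The two totals reproduce the eleven and nine elementary gates stated in the text.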
 
Figure 8-1 Toffoli gate decomposed to set of 5 primitive gates.  Notice the remote interaction of the first 
CV gate which violates the LNNM architecture. 
 
Figure 8-2 Toffoli gate with qubit interaction constrained to LNNM where the qubits of the first CV 
gates are brought next to one another through a set of swap gates.  The first 3 CNOT gates act as a swap 
gate which brings the a qubit next to c, and the next set of CNOT gates restore the location of the a qubit 
back to its original location.  The quantum cost here is 11 but since the two highlighted CNOT gates would 
cancel each other, the quantum cost of the LNNM Toffoli gate is 9.  
   
8.3 Count of CNOT and CV Gates for MCT Gate
I now turn my attention to the quantum cost of Multiple Control Toffoli (MCT) gates
with more than 2 control lines. Barenco et al. [84], Lemma 7.1, used a Gray code
construction to demonstrate that an n-bit MCT gate, where n is the number of qubits of the
MCT gate, requires a combination of $2^{n-1}-2$ CNOT gates plus $2^{n-1}-1$ 2-bit controlled
CV/CV† gates. For the sake of brevity, I shall refer to the set of CV/CV† gates as CV from
this point forward. However elegant, the analysis did not take into account the physical
requirement of LNNM and did not calculate the true cost of realizing the gate in
technologies subject to the constraint imposed by the LNN model.
Definition 8-1: An n-bit multiple control Toffoli (MCT) gate is a reversible gate with
a set of n-1 control lines $C = \{c_i \mid 1 \le i < n\}$ and a single target line x where:
$$f(x) = \begin{cases} \bar{x} & \text{if } c_i = 1 \ \ \forall c_i \in C \\ x & \text{if } \exists\, c_i \in C : c_i = 0 \end{cases}$$
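
Definition 8-1 can be read directly as a classical bit operation. The following Python sketch, with illustrative names and an assumed 4-bit example (three controls and one target), builds the truth table of such a gate:

```python
from itertools import product

def mct(bits, controls, target):
    """Flip bits[target] when every control bit is 1; otherwise return bits unchanged."""
    bits = list(bits)
    if all(bits[c] == 1 for c in controls):
        bits[target] ^= 1
    return tuple(bits)

n = 4
controls, target = (0, 1, 2), 3        # assumed line assignment for the example
table = {b: mct(b, controls, target) for b in product((0, 1), repeat=n)}
```

The table is a permutation of the 16 input patterns in which only the two rows with all controls set differ from the identity, and applying the gate twice restores every input, which is what makes the gate reversible.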
To the extent of our knowledge, the vast majority of publications in the field of binary 
quantum synthesis have used a faulty model for aggregating the quantum cost of the 
resultant circuit by ignoring the LNN restriction.  In this chapter, I provide a 
comprehensive set of equations for calculating the true cost of any MCT gate based on a) 
the number of control qubits and b) the gaps between the target bit and its nearest control 
qubit. 
   
 
Figure 8-3 Decomposition of the 4-bit MCT gate without consideration of LNNM.  Most algorithms 
assume this gate to cost 13 primitive gates, which ignores the fact that there are five gates which interact 
remotely in violation of the LNNM architecture.  
 
Lemma 8-1: An n-Bit Multi Control Toffoli gate contains the following number of 2-qubit
CV/CV† gates:
$$\#\,CV, CV^{\dagger}\ \text{gates for}\ MCT_n = \sum_{k=1}^{n-1} CV_k = 2^{n-1} - 1 \qquad (eq.\ 1)$$
Where:
 $CV_k = 2^{n-k-1}$
 n: number of qubits | n>2
 k: distance between target and control qubits
Proof:  I rely on a visual comparison between Figure 8-1 and Figure 8-3 to illustrate 
how to construct an MCT4 gate by starting from an MCT3 gate, as follows: 
• Starting from an MCT3 gate, stretch all existing CV gates by a distance of one 
qubit and insert a new qubit line between the target qubit and its nearest control 
qubit.  The new qubit will represent the new control line of an MCT4. 
• Add an additional $2^{3-1}$ CV1/CV1† gates, maintaining the alternating pattern of
CV and CV† gates, with the newly added qubit as the control line (inside the
dotted rectangle).
• In order to determine LNNM cost, I calculate the aggregate for each set of CVk
gates separately according to the distance (k) between the target and control
qubits, as follows:
$CV_1: 4 = 2^{4-1-1}$, $CV_2: 2 = 2^{4-2-1}$, $CV_3: 1 = 2^{4-3-1}$
 
Figure 8-4 Decomposition of the 5-bit MCT gate into primitive 2 qubit gates without consideration of 
the LNNM architecture. 
Performing an analogous comparison between the MCT4 gate (Figure 8-3) and the
MCT5 gate (Figure 8-4), I observe that the left half of Figure 8-4 is exactly the same as the
entirety of Figure 8-3 where all CV gates have been lengthened by an additional qubit.
Again, the dotted rectangle containing the right side of Figure 8-4 adds an additional $2^{4-1}$
CV1 gates. I now have 15 CV gates with distinct aggregates as follows:
$CV_1: 8 = 2^{5-1-1}$, $CV_2: 4 = 2^{5-2-1}$, $CV_3: 2 = 2^{5-3-1}$, $CV_4: 1 = 2^{5-4-1}$
 
Clearly the pattern for the total number of CVk gates for an MCTn gate is $2^{n-k-1}$ where
k ∈ [1, n-1]. Notice that the exact pattern repeats for every increment of a control line for
higher orders of MCT gates. It is only a matter of mathematical dexterity to demonstrate
that the total number of CV gates in equation (1) is consistent with Lemma 7.1 of
Barenco et al. [84] mentioned above (see footnote 4). ∎
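
The counts in Lemma 8-1 are easy to tabulate. The short Python sketch below (function names are illustrative) lists the number of CV gates at each distance k and lets the total be compared with the closed form of equation 1:

```python
def cv_gates_at_distance(n, k):
    """Number of CV/CV-dagger gates spanning distance k in an n-bit MCT gate."""
    return 2 ** (n - k - 1)

def cv_profile(n):
    """Map each distance k = 1 .. n-1 to its CV gate count."""
    return {k: cv_gates_at_distance(n, k) for k in range(1, n)}

profile_mct4 = cv_profile(4)
profile_mct5 = cv_profile(5)
```

For MCT4 and MCT5 the profiles sum to 7 and 15 gates respectively, in line with the aggregates listed in the proof.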
Lemma 8-2: An n-Bit Multi Control Toffoli gate consists of the following number of 2-bit
CNOT gates:
$$\#CNOT = 2\cdot(n-2)\ CNOT_1 + \sum_{k=2}^{n-2} CNOT_k = 2^{n-1} - 2 \qquad (eq.\ 2)$$
Where:
 n: number of qubits of the MCT gate
 k: distance between target and control qubits
 $CNOT_k = (n-k-1)\cdot 2^{k-1}$
Proof: Taking a second glance at Figure 8-1, Figure 8-3 and Figure 8-4, notice that two
additional CNOT1 gates are needed for every new control qubit of an MCTn gate. For
example, starting from an MCT3 with 2 CNOT1 gates, an MCT4 requires 4 CNOT1 gates,
an MCT5 requires 6, and so on. This pattern accounts for the first term of the above
equation, $2\cdot(n-2)\ CNOT_1$, as follows:
# qubits    3    4    5    6    n
#CNOT1      2    4    6    8    $2\cdot(n-2)$
 
Let's now consider the set of CNOT2 gates within the dotted line of Figure 8-3 and
Figure 8-4. In the case of Figure 8-3, we trailed every CNOT1 gate with a CNOT2 gate
for a total of two CNOT2 gates and two CNOT1 gates. Let's now focus our attention on
the shaded dotted block of Figure 8-4. Notice that, in the rightmost dotted block, we
added the same number of CNOT2 gates as in the previous block, which is the same pattern
for every additional control qubit, as follows:
# qubits    3    4    5    6    n
#CNOT2      0    2    4    6    $(n-2-1)\cdot 2^{2-1}$

Footnote 4:
$S_n = 2^{n-1} + 2^{n-2} + \dots + 2^{n-n+1} + 2^{n-n}$
$2 \cdot S_n = 2 \cdot (2^{n-1} + 2^{n-2} + \dots + 2^{n-n+1} + 2^{n-n}) = 2^{n} + 2^{n-1} + \dots + 2^{n-n+2} + 2^{n-n+1}$
$2 \cdot S_n - S_n = S_n = 2^{n} - 1$
Consider the new CNOT3 gates in the rightmost dotted rectangle in Figure 8-4.  Notice 
that we append a CNOT3 gate to each of the CNOT{1,2} gates in that block for a total of 
four (4) CNOT3 gates.  Similar to the previous two cases, every new block, for higher 
order MCT gates, will gain 4 CNOT3 gates as follows: 
# qubits    3    4    5    6    n
#CNOT3      0    0    4    8    $(n-3-1)\cdot 2^{3-1}$
 
The recursive pattern evidently repeats for every MCTn gate, where a number of new
CNOTn-1 gates are added, equal in quantity to the last previously added block of CNOT
gates for the preceding MCTn-1 gate (i.e., the shaded block in Figure 8-4).
Evidently, the count of $CNOT_k$ gates is determined by the index k (shown with borders).
The pattern for calculating the number of CNOTk gates for an n-bit MCT gate is:
$$CNOT_k = (n-k-1)\cdot 2^{k-1}$$
which, when summed over all cases for k=2 to n-2, concludes the proof of Lemma 8-2. ∎
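
The counting argument of Lemma 8-2 can be cross-checked with a few lines of Python. The function names are illustrative; the k = 1 case uses the 2·(n-2) count from the first table of the proof and the remaining distances use the general $CNOT_k$ expression.

```python
def cnot_gates_at_distance(n, k):
    """Number of CNOT gates spanning distance k in an n-bit MCT gate (Lemma 8-2)."""
    if k == 1:
        return 2 * (n - 2)
    return (n - k - 1) * 2 ** (k - 1)

def cnot_profile(n):
    """Map each distance k = 1 .. n-2 to its CNOT gate count."""
    return {k: cnot_gates_at_distance(n, k) for k in range(1, n - 1)}

profile_mct5 = cnot_profile(5)
total_mct5 = sum(profile_mct5.values())
```

Adding the $2^{n-1}-1$ CV gates of Lemma 8-1 to these totals gives the familiar unconstrained costs of 5 gates for MCT3 and 13 gates for MCT4.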
Table 8-1 summarizes the number of elementary 2-qubit CV and CNOT gates for
MCT3 to MCT9 where the last column provides a general formula for calculating the
count for each specific gate based on the distance between the control and target qubits.
The literature has been unanimous in using the numbers in the last row, labeled Gate
Count, as the quantum cost of an MCT gate without regard to the distance between the
target and control qubits. I mentioned earlier that some algorithms considered the
LNNM constraint between the cascade of MCT gates, but they largely ignored the
application of the same LNNM principle to the set of distant elementary gates within the
decomposed MCT gate itself (gates similar to the leftmost gate in Figure 8-1) [59, 88, 89].
8.4 LNN Quantum Cost of MCTn Gates
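Theorem 8-1: The maximum quantum cost of an MCTn gate, where n>2, is:
$$\sum_{k=1}^{n-1} LNNQC_{CV_k} + \sum_{k=1}^{n-2} LNNQC_{CNOT_k} \qquad (eq.\ 3)$$
$$= \sum_{k=1}^{n-1} 2^{n-k-1}\cdot[4\cdot(k-1)+1] + \sum_{k=1}^{n-2} (n-k-1)\cdot 2^{k-1}\cdot[4\cdot(k-1)+1]$$
where, following Lemma 8-2, the k = 1 term of the second sum is taken with the $2\cdot(n-2)$ count of CNOT1 gates.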
Proof: In the previous section I determined the number of CV and CNOT gates
needed to decompose any MCTn gate, which I will now use to calculate the linear nearest
neighbor quantum cost (LNNQC). I demonstrated earlier, in Figure 8-2, that the target and
control qubits of a distant gate can be brought together by a cascade of swap gates
applied to one of the qubits, followed by a mirror cascade of swap gates to place it back
in its original position. Figure 8-5a shows the process of bringing qubit a closer to qubit d
by inserting two sets of swap gates (6 Feynman gates) followed by a mirror set to bring
qubit a back to its original position. Notice that with this arrangement, the set of shaded
CNOT gates will cancel each other, resulting in the reduced template of Figure 8-5b for
enforcing the nearest neighbor constraint on any two-qubit gate. The following equation
calculates the true cost of any distant CVk or CNOTk gate, where k represents the distance
between the control and target bit (k = 1 for adjacent qubits):
$$\text{2-qubit } LNNQC(k) = 1 + 2\cdot 2\cdot(k-1) \qquad (eq.\ 4)$$
Multiplying LNNQC(k) by the number of CVk and CNOTk gates at each distance k
proves the equations of Theorem 8-1. ∎
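
Equation 4 and the worst case cost of Theorem 8-1 can be evaluated with the Python sketch below. It is a sketch under one stated assumption: the nearest neighbor CNOT1 gates are counted as 2·(n-2), as in Lemma 8-2, and the general $CNOT_k$ expression is used only for k ≥ 2. The function names are illustrative.

```python
def lnnqc_two_qubit(k):
    """Equation 4: cost of one two-qubit gate whose qubits are k lines apart."""
    return 1 + 2 * 2 * (k - 1)

def worst_case_lnnqc(n):
    """Cost of an n-bit MCT gate when every distant gate is swapped individually."""
    cv = sum(2 ** (n - k - 1) * lnnqc_two_qubit(k) for k in range(1, n))
    # Assumption: CNOT1 count taken from Lemma 8-2 rather than the k >= 2 formula.
    cnot_near = 2 * (n - 2) * lnnqc_two_qubit(1)
    cnot_far = sum((n - k - 1) * 2 ** (k - 1) * lnnqc_two_qubit(k)
                   for k in range(2, n - 1))
    return cv + cnot_near + cnot_far

toffoli_cost = worst_case_lnnqc(3)
mct6_cost = worst_case_lnnqc(6)
```

Under this assumption the sketch returns 9 for the Toffoli gate, matching Figure 8-2, and 349 for the MCT6 gate, matching the Theorem 8-1 figure quoted later in this chapter.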
 
Figure 8-5 LNNM equivalent of CNOT gate acting on qubits ‘a’ and ‘d’ which are two qubits apart.  
Rather than a quantum cost of 1, the LNNMQC = 13 (a) which is further reduced to 9 (b) once the shaded 
gates cancel one another. 
However, applying equation 4 to each of the two-qubit distant gates individually is an
expensive proposition which will result in an exponential increase of LNNQC for circuits
with a large number of qubits [20]. In Figure 8-6, for example, the individual costs of
enforcing LNN through equation 4 for the CV gates labeled 1, 2 and 3 are 17, 13 and 13
respectively. Now, I will use the same Figure 8-6 to demonstrate a better method for
enforcing the LNN model on the distant CV gates within the decomposed template of any
MCTn gate. A set of swap gates is first used to bring the target qubit f next to its
control qubit a, facilitating LNN interaction between them. Rather than returning f
to its original location through a mirror swap cascade, I only need to insert a swap gate to
bring b back to its original location and f next to b. At this stage, both CV gates 2 and 3
are compliant with the LNN model as they interact with qubit b. Similarly, each dashed
stage needs only a single swap gate to propagate the target qubit f of all CV gates downward,
allowing interaction with the control qubit in compliance with the LNN model.
According to Theorem 8-1, independent swapping of the CV gates of Figure 8-6 requires
119 CNOT gates (at 2 CNOT per swap) while the method outlined only requires 16
CNOT gates. For any MCTn gate, the number of CNOT gates required to enforce the LNN
model for the CV gates is:
$$\#CNOT(MCT_n)_{CV\ gates} = 4\cdot(n-2) \qquad (eq.\ 5)$$
I turn my attention now to the distant CNOT gates in the top right triangle of Figure
8-6, which, if handled naïvely, would require a large number of swap gates. Instead, I
describe a method that minimizes the number of CNOT swap gates needed to
enforce the LNN model. I first note that the symbol ( ) represents a reflection template (herein
reflection gate), which reverses the order of the qubits of the gate (analyzed shortly).
Stage 2 has the first encounter of a distant CNOT gate whose control and target bits are
brought together through a set of reflection gates (which reduce to regular swap gates for
stage 2). Two internal swap gates are also needed to allow qubit b to interact with qubit c
according to the LNN model. Stages 3 and 4 each have a pair of reflection gates of the
same size as the stage number. The number of internal swap gates for stage k is:
$$N_{swap}(k) = 2\cdot N_{swap}(k-1) + 2\cdot(k-1) \qquad (eq.\ 6)$$
 
 
Figure 8-6 4-qubit swivel gate which pivots the qubits around the center point between them
(abcd → dcba). The binary swivel gate of 'n' variables requires $n^2 - 1$ CNOT gates to implement.
   
 151 
Each stage k also requires a pair of swivel gates of size k.  Figure 8-7 shows the 
construction and operation of a reflection gate.  Each reflection gate is constructed from a 
total number of CNOT gates according to the following equation: 
 
$CNOT_{reflection}(k) = k^2 - 1$                (eq. 7)
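The per-stage counts of equations 5 to 7 can be tabulated with a short sketch. The Python below is illustrative only: the function names are hypothetical, the base case N_swap(2) = 2 is taken from the statement of Theorem 8-2, and equation 6 is read here as 2·N_swap(k−1) + 2(k−1).

```python
def cnot_for_cv_gates(n):
    """Eq. 5: CNOT gates needed to bring the CV gates of an MCTn gate into LNN compliance."""
    return 4 * (n - 2)


def n_swap(k):
    """Eq. 6 (as read here): internal swap gates for stage k, with N_swap(2) = 2."""
    if k < 2:
        raise ValueError("stages start at k = 2")
    if k == 2:
        return 2  # base case quoted with Theorem 8-2
    return 2 * n_swap(k - 1) + 2 * (k - 1)


def cnot_reflection(k):
    """Eq. 7: CNOT gates in a reflection (swivel) gate of size k."""
    return k * k - 1
```

For the MCT6 gate discussed in the text, `cnot_for_cv_gates(6)` reproduces the 16 CNOT gates quoted for the CV gates.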
 
Figure 8-7  Method for minimal swapping of distant two-qubit gates within a decomposed MCTn gate.  
In stage 0, rather than adding 4 swap gates to bring qubit ‘a’ next to ‘f’ and then 4 swap gates to bring it 
back, I opted to bring ‘f’ next to ‘a’, and slowly bring ‘f’ back through each stage.  Qubit ‘f’ is the target 
line and it interacts with all qubits in each of the stages.  This method reduces the number of swap gates 
necessary to implement this MCT6 gate.  Also notice that recursive patterns emerge in every stage which 
build on the structure of the previous stage. 
Theorem 8-2: The number of CNOT and CV gates required to construct an optimal 
MCTn gate compliant with the LNN model are: 
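   Total number of CV gates is given in equation 1, 
and
$\text{Total number of CNOT gates} = \underbrace{(2^{n-1} - 2)}_{eq.\ 2} + \underbrace{4(n - 2)}_{eq.\ 5} + \underbrace{2\sum_{k}\left[2\,N_{swap}(k-1) + 2(k-1)\right]}_{eq.\ 6} + \underbrace{2\sum_{k}\left(k^{2} - 1\right)}_{eq.\ 7}$

    where the sums are taken over the stages k and $N_{swap}(2) = 2$. ∎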
   
  According to Theorem 8-2, for example, the MCT6 gate in Figure 8-6 requires 140 
CNOT gates and 31 CV gates for an LNNQC of 171; however, according to Theorem 
8-1, the same MCT6 gate requires 318 CNOT gates and 31 CV gates for a total LNNQC 
of 349. 
Table 8-1 LNNQC for MCT3 to MCT9 gates. 
 
Table 8-1 shows the LNNQC for the MCT3 to MCT9 gates, calculated according to 
the equations in Theorem 8-2.  Notice how the LNNQC increases drastically as the 
distance between the control and target bits increases, which in turn, makes gates with 
larger gaps between the target and control qubits unfavorable.  Clearly, the application of 
LNNQC introduces a level of normalization across various methods of automated 
quantum synthesis.  More specifically, solutions that attempt to reduce quantum cost, as 
measured today, through the introduction of many ancillary qubits would suffer a penalty 
for each distant gate introduced.  With this proposal, I hope that such a penalty would 
   
bring uniformity to quantum cost calculation, and would allow for evenhanded 
comparison amongst various classes of synthesis algorithms. 
Table 8-2 Current quantum cost used in literature compared to LNNQC. 
 
8.5 Circuit Depth
Although the near neighbor quantum cost can be orders of magnitude larger 
than the currently used quantum cost, it represents a realistic measure of physically 
implementing a quantum circuit which must adhere to the nearest neighbor constraint.   
In general, the LNN model levels the field of comparison amongst the various methods 
used by synthesis algorithms.   For example, some algorithms tend to increase the circuit 
width through the addition of ancillary bits in order to gain the advantage of a lower number of 
elementary gates, and hence, lower quantum cost.  In such cases, these algorithms assume 
that any 2-qubit elementary gate has a quantum cost of one regardless of the distance 
between the endpoints of the gate.  Some synthesis algorithms [20, 30] produce solutions 
which contain gates with distances of more than 200 qubits between the target and 
control qubits, which, as shown in the tables below, results in an enormous increase in 
LNNQC, and would render such functions untenable. 
Consider for a moment that, to this date, the most advanced ion trap apparatus is only 
capable of lining up to 8 ions within the trap, and that every additional ion (qubit) 
represents a milestone in technical advancement [15].  Although, in essence, the LNNM 
   
model balances the scale for the sake of comparison, one of its most important aspects is 
its compliance with the physical reality of a quantum machine.  To further enhance the 
comparison amongst different classes of algorithms, it would be useful to understand the 
rate of increase in the circuit width.  In this dissertation, I propose that the LNNQC be 
accompanied by another metric to highlight the increase of a circuit’s width (I call this 
metric the ancillary ratio).  So instead of a single number to measure quantum cost, I 
would use a tuple of two numbers (LNNQC, AR) where a) LNNQC is the number of 
linear nearest neighbor elementary gates, and b) AR is the ratio of the total circuit width 
(including ancillary bits) to the original width of the circuit.  An ancillary ratio of 1 represents 
a solution without any ancillary qubits; a higher ancillary ratio reflects the proportion of 
ancillary bits added to the circuit relative to its original width.  See Table 8-3 for 
examples.
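As a minimal sketch of the proposed two-number metric, the Python below builds the (LNNQC, AR) tuple, taking AR as the total circuit width divided by the original width so that a value of 1 means no ancillary qubits were added. The class and function names are hypothetical, and the sample numbers are illustrative.

```python
from typing import NamedTuple


class QuantumCost(NamedTuple):
    lnnqc: int              # number of linear nearest neighbor elementary gates
    ancillary_ratio: float  # total circuit width / original circuit width


def ancillary_ratio(circuit_width, original_width):
    """Ratio of the circuit width (including ancillary lines) to the original width."""
    if original_width <= 0 or circuit_width < original_width:
        raise ValueError("expected circuit_width >= original_width > 0")
    return circuit_width / original_width


def quantum_cost(lnnqc, circuit_width, original_width):
    """Bundle the gate count and the ancillary ratio into one tuple."""
    return QuantumCost(lnnqc, ancillary_ratio(circuit_width, original_width))


# Illustrative values: a 21-line circuit realizing a 5-variable function.
example = quantum_cost(8562, circuit_width=21, original_width=5)
```

Reporting both numbers keeps a low gate count obtained by widening the circuit visible in the comparison.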
8.6 Experimental Results
 I analyzed the impact of enforcing the LNN model on all MCT library functions in 
RevLib [30] and documented some of the results in Table 8-3 below.  In the table, the 
“Number of qubits” column indicates the width of the circuit including any ancillary bits.  
The “Ancillary Ratio” column reports the ratio of the circuit width to the number of the 
original input variables, where, as mentioned above, an ancillary ratio of one (1) indicates 
no ancillary bits added to the circuit.  Notice that some of the solutions quadrupled the 
circuit’s width to achieve a low quantum cost as shown in the column labeled “RevLib”.  
This cost represents the number of 1 and 2-qubit elementary gates (NOT, CV or CNOT) 
regardless of the distance between the end points.  The results of applying our LNNQC 
   
using the naïve method of Theorem 8-1 (LNN1) and the optimized method of Theorem 8-2 
(LNN2) are shown in the table as well.  
When I compare the RevLib quantum cost to the optimized LNNQC in column 5 
(LNN2), I notice that the cost of some functions increased drastically (e.g., 517779% for 
the function frg1).  This is clearly due to the use of many distant gates or large MCT gates 
which, when decomposed, result in many large-distance gates.  The optimized LNN2 cost 
is still much better than that of the naïve approach of Theorem 8-1, with savings of up to 
1264%. 
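The two percentage columns of Table 8-3 appear to be plain ratios expressed as percentages: LNN2 over the RevLib cost for the increase, and LNN1 over LNN2 for the savings. The sketch below encodes that reading with hypothetical function names and checks it against the frg1 and C17 rows of the table.

```python
def percent_increase(lnn2_cost, revlib_cost):
    """'% Increase' column: optimized LNNQC relative to the RevLib quantum cost."""
    return round(100 * lnn2_cost / revlib_cost)


def percent_savings(lnn1_cost, lnn2_cost):
    """'% Savings' column: naive LNNQC relative to the optimized LNNQC."""
    return round(100 * lnn1_cost / lnn2_cost)


# Rows from Table 8-3 as (RevLib cost, LNN1, LNN2).
frg1 = (15265, 999144550, 79038956)
c17 = (99, 407, 337)

for name, (revlib, lnn1, lnn2) in (("frg1", frg1), ("C17", c17)):
    print(name, percent_increase(lnn2, revlib), percent_savings(lnn1, lnn2))
```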
8.7 Conclusion and Analysis
A huge effort has already been invested in the field of automated quantum logic 
synthesis that is years ahead of the feasibility of any quantum computing devices.  Yet, 
although some notable achievements have been observed in demonstrating the feasibility 
of quantum computation, technical challenges and constraints exist.  One of the most 
notable constraints is the nearest neighbor restriction to interaction amongst qubits 
involved in a computing paradigm.  As the field of quantum logic synthesis continues to 
be in its infancy, solutions using different methodologies continue to emerge.  In this 
chapter, by imposing natural technological constraints, I propose a quantum cost metric 
which would level the playing field amongst diverse types of quantum logic synthesis 
algorithms, and would allow for a more reasonable comparison amongst them.  I hope 
that this body of work will inspire the scientific community to bring solutions that would 
one day be applicable to a future computing machine. 
   
Table 8-3 RevLib Benchmarks with LNNQC [30] 
 
  
| Function | # of qubits | Non-LNNM metric (RevLib) | LNN1 (Theorem 8-1) | LNN2 (Theorem 8-2) | Ancillary Ratio | % Increase (LNN2 / QC) | % Savings (LNN1 / LNN2) |
|---|---|---|---|---|---|---|---|
| apex4 | 22 | 3,438 | 189,594 | 53,742 | 2.20 | 1563% | 353% |
| apex4 | 28 | 237,963 | 7,945,022 | 2,676,178 | 3.11 | 1125% | 297% |
| C17 | 7 | 99 | 407 | 337 | 1.40 | 340% | 121% |
| C7552 | 21 | 1,728 | 12,476 | 8,562 | 4.20 | 495% | 146% |
| cm150a | 22 | 1,096 | 17,937 | 7,765 | 1.05 | 708% | 231% |
| cm152a | 12 | 252 | 1,208 | 888 | 1.09 | 352% | 136% |
| cm163a | 29 | 756 | 35,547 | 10,991 | 1.81 | 1454% | 323% |
| cm42a | 14 | 377 | 1,787 | 1,379 | 3.50 | 366% | 130% |
| cm85a | 14 | 2,252 | 52,923 | 19,537 | 1.27 | 868% | 271% |
| cmb | 20 | 910 | 622,786 | 98,190 | 1.25 | 10790% | 634% |
| con1 | 9 | 206 | 1,151 | 833 | 1.29 | 404% | 138% |
| cordic | 25 | 349,522 | 2,963,275,837 | 327,591,467 | 1.09 | 93726% | 905% |
| cu | 25 | 1,148 | 151,076 | 33,484 | 1.79 | 2917% | 451% |
| dc1 | 11 | 416 | 1,863 | 1,549 | 2.75 | 372% | 120% |
| dc1 | 11 | 416 | 1,863 | 1,549 | 2.75 | 372% | 120% |
| dc2 | 15 | 1,886 | 33,663 | 13,727 | 1.88 | 728% | 245% |
| decod | 21 | 1,728 | 12,476 | 8,562 | 4.20 | 495% | 146% |
| dist | 13 | 7,601 | 246,881 | 75,643 | 1.63 | 995% | 326% |
| dk17 | 21 | 1,559 | 96,013 | 26,759 | 2.10 | 1716% | 359% |
| dk27 | 18 | 248 | 3,614 | 1,792 | 2.00 | 723% | 202% |
| ex1 | 6 | 7 | 47 | 63 | 1.20 | 900% | 75% |
| ex1010 | 20 | 155,534 | 18,207,859 | 4,341,197 | 2.00 | 2791% | 419% |
| ex2 | 6 | 141 | 617 | 407 | 1.20 | 289% | 152% |
| ex2 | 6 | 141 | 617 | 407 | 1.20 | 289% | 152% |
| ex3 | 6 | 79 | 297 | 219 | 1.20 | 277% | 136% |
| example2 | 16 | 5,654 | 356,857 | 90,947 | 1.60 | 1609% | 392% |
| f2 | 8 | 255 | 1,117 | 815 | 2.00 | 320% | 137% |
| f51m | 22 | 37,400 | 21,966,577 | 3,524,427 | 1.57 | 9424% | 623% |
| frg1 | 31 | 15,265 | 999,144,550 | 79,038,956 | 1.11 | 517779% | 1264% |
| in0 | 26 | 20,031 | 14,814,870 | 2,381,782 | 1.73 | 11890% | 622% |
| in2 | 29 | 23,802 | 11,908,647 | 2,095,751 | 1.53 | 8805% | 568% |
| inc | 16 | 2,140 | 24,229 | 11,943 | 2.29 | 558% | 203% |
| life | 10 | 6,766 | 130,289 | 39,383 | 1.11 | 582% | 331% |
| majority | 6 | 136 | 624 | 374 | 1.20 | 275% | 167% |
| max46 | 10 | 5,444 | 151,559 | 45,249 | 1.11 | 831% | 335% |
| misex1 | 15 | 982 | 7,735 | 4,857 | 1.88 | 495% | 159% |
| misex3 | 28 | 119,177 | 95,456,974 | 15,130,206 | 2.00 | 12696% | 631% |
| misex3c | 28 | 115,190 | 67,750,817 | 11,376,499 | 2.00 | 9876% | 596% |
| mlp4 | 16 | 3,753 | 98,689 | 34,415 | 2.00 | 917% | 287% |
| mux | 22 | 1,078 | 17,835 | 7,693 | 1.05 | 714% | 232% |
| parity | 17 | 32 | 512 | 544 | 1.06 | 1700% | 94% |
| pcler8 | 21 | 327 | 4,890 | 2,334 | 1.31 | 714% | 210% |
| pm1 | 14 | 377 | 1,787 | 1,379 | 3.50 | 366% | 130% |
| radd | 13 | 676 | 5,096 | 3,494 | 1.63 | 517% | 146% |
| rd32 | 5 | 29 | 67 | 95 | 1.67 | 328% | 71% |
| rd32 | 5 | 18 | 56 | 86 | 1.67 | 478% | 65% |
| rd32 | 5 | 116 | 292 | 252 | 1.67 | 217% | 116% |
| rd53_68 | 8 | 265 | 1,283 | 949 | 1.60 | 358% | 135% |
| rd73_69 | 10 | 1,143 | 9,750 | 5,338 | 1.43 | 467% | 183% |
| rd84 | 12 | 2,749 | 55,313 | 19,917 | 1.50 | 725% | 278% |
| root | 13 | 3,443 | 113,383 | 34,723 | 1.63 | 1009% | 327% |
| ryy6 | 17 | 4,292 | 3,646,800 | 515,126 | 1.06 | 12002% | 708% |
| sao2 | 14 | 7,670 | 675,960 | 152,650 | 1.40 | 1990% | 443% |
| sqn | 10 | 2,122 | 20,964 | 9,416 | 1.43 | 444% | 223% |
| sqr6 | 18 | 1,033 | 9,545 | 5,735 | 3.00 | 555% | 166% |
| sqrt8 | 12 | 622 | 8,234 | 3,784 | 1.50 | 608% | 218% |
| squar5 | 13 | 442 | 2,983 | 2,151 | 2.60 | 487% | 139% |
| sym10 | 11 | 25,866 | 828,384 | 197,384 | 1.10 | 763% | 420% |
| t481 | 17 | 237 | 1,293 | 995 | 1.06 | 420% | 130% |
| table3 | 28 | 80,039 | 105,133,378 | 15,706,518 | 2.00 | 19624% | 669% |
| tial | 22 | 56,203 | 21,409,199 | 3,933,821 | 1.57 | 6999% | 544% |
| wim | 11 | 217 | 923 | 769 | 2.75 | 354% | 120% |
| x2 | 17 | 625 | 9,768 | 4,244 | 1.70 | 679% | 230% |
| xor5 | 6 | 7 | 47 | 63 | 1.20 | 900% | 75% |
| z4ml | 11 | 642 | 4,326 | 2,762 | 1.57 | 430% | 157% |
   
Chapter 9 Multi-Dimensional LNNM Architecture
9.1 Introduction
Jordan, in his thesis [55], described an algorithm for computing the ground state of 
an infinite 2D quantum lattice system and the utility of a 2D lattice in quantum computation.   
Jordan described a lattice structure where neighboring qubits, arranged in rows and 
columns, would interact in compliance with the nearest neighbor model.   Choi, et al [92] 
provided an algorithm for constructing a 2-D NTC (Nearest Neighbor Two qubit gate) 
and demonstrated the construction of a full adder using this architecture.  The proposed 
architecture lays out the qubits in a lattice pattern where, in addition to neighboring 
qubits interacting, Choi et al. allows interaction in the diagonal direction as well. 
In this chapter, I chose to follow the construction of Jordan [55] allowing interaction 
only between neighboring qubits in the same row or column, but not across diagonals.  In 
the description of the Paul trap in section 3.5.2.1, where I discussed the construction of 
a CNOT gate in the linear ion-trap, I described that the two qubits are entangled together 
through the common vibrational mode of qubits along the axis of the ion-trap.   In order 
to accomplish entanglement in a 2-D ion-trap, a similar common mode would have to 
occur independently in both row and column direction in order to allow for 2-qubit gate 
interaction.  Unless demonstrated otherwise, I believe that the likelihood of attaining two 
common vibrational modes in the orthogonal directions of rows and columns of a 2-D 
ion-trap lattice is higher than the possibility of adding two additional common vibrational 
modes in the diagonal directions as proposed by Choi [92]. 
   
 The discussion in this chapter is limited to the layout of a Multiple Control Toffoli 
(MCT) gate in a two dimensional grid and the calculation of the Planar Nearest Neighbor 
Quantum Cost (PNNQC) using such layout. 
 
9.2 MCT4 in 2-D
 
Figure 9-1 MCT4 in two dimensional grid layout where, now, qubits ‘a’ and ‘d’ are considered 
neighbors and the first CV gate is compliant with the LNNM model (a cost of 1) compared to the linear 
arrangement which requires 8 additional CNOT gates to bring it into compliance.  Notice that the pairs 
‘ac’ and ‘bd’ are still one qubit apart. 
Figure 9-1 shows the internal layout of the MCT4 gate from the previous chapter 
where the qubits are laid out in a 2-D grid pattern.  In a linear ion-trap, the qubit pairs 
(a,c), (a,d) and (b,d) are considered distant pairs and require swapping of values to bring 
them next to one another for interaction.  In the two dimensional grid layout, in 
accordance with the model outlined by Jordan [55], the qubits of the pair (a,d) become 
neighbors and can directly interact.  I calculated the Linear Nearest Neighbor Quantum Cost (LNNQC) 
for the MCT4 at 31 single- and two-qubit gates.  The two dimensional (Planar) 
Nearest Neighbor Quantum Cost (PNNQC) is shown in Figure 9-1 as 29 primitive gates, 
which, at 9%, is not a significant saving.   With this arrangement, interaction between the 
pairs (a,c) and (b,d) requires the addition of a single set of swap gates to bring them next to 
   
one another, and another set to bring their values back to position (at a cost of 4 CNOT 
gates). 
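The neighbor relations described for the 2-D layout can be checked with a small sketch. The coordinates below are an assumption inferred from the description of Figure 9-1 (a and d adjacent, the pairs ac and bd on the diagonals), the helper names are hypothetical, and the cost of 2 CNOT gates per swap follows the counting used in this chapter.

```python
# Assumed 2x2 placement: a b
#                        d c
GRID = {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (1, 0)}
CNOTS_PER_SWAP = 2  # counting convention used in the text


def distance(q1, q2, grid=GRID):
    """Manhattan distance between two qubits (rows and columns only, no diagonals)."""
    (r1, c1), (r2, c2) = grid[q1], grid[q2]
    return abs(r1 - r2) + abs(c1 - c2)


def are_neighbors(q1, q2, grid=GRID):
    """Two qubits can interact directly when they share a row or column edge."""
    return distance(q1, q2, grid) == 1


def round_trip_cnot_cost(q1, q2, grid=GRID):
    """CNOT gates spent swapping q1 next to q2 and swapping it back afterwards."""
    swaps_one_way = max(0, distance(q1, q2, grid) - 1)
    return 2 * swaps_one_way * CNOTS_PER_SWAP
```

Under this placement the diagonal pairs (a,c) and (b,d) each cost 4 CNOT gates for the round trip, which matches the figure quoted above.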
Examining the relationship between the qubits, I noticed that qubits a and d interact 
the most with the other qubits.  I also noticed that qubit a always interacts as a control 
line and never as a target line.  Qubit d on the other hand is always a target line and never 
a control line, while all other bits in between interact as both control and target.  Figure 
9-2 shows another two dimensional arrangement of the qubits where I added two ancilla 
qubits and copied the value of qubit a into both of them.  The idea here is 
that since qubit a does not change throughout the operation, it should be safe to copy it 
and bring its value close to the other qubits in order for it to interact in accordance with 
the nearest neighbor constraint.  In this arrangement, only the pair (c, d) is two qubits 
away, requiring a set of swap gates (along with their mirrors) to interact.  The PNNQC in 
this case is 21 primitive gates, which represents a 34% saving.  In this case, of course, I 
have to add two ancilla qubits, which might not be desirable. 
 
Figure 9-2 MCT4 in a planar layout with qubit ‘a’ duplicated into two additional ancilla qubits in order 
to bring pairs ‘ac’ and ‘bd’ next to one another.  Notice that with this arrangement, ‘b’ and ‘c’ are now the 
only pair which are one qubit apart.  Using this arrangement reduced the quantum cost further to 21 (from 
29) at a cost of 2 ancilla qubits. 
   
9.3 MCT5 in 2-D
I now extend this line of reasoning to the MCT5 gate as shown in Figure 9-3.  In this 
arrangement, the target qubit e is placed at the center of a 2-D grid because this qubit has 
interaction with all other qubits and it is the final recipient of all transforms.  This 
arrangement, however, puts a distance between the pairs (a,b), (b,c), (c,d) which are 
typically neighbors in the one dimensional arrangement of the previous chapter.  These 
distant interactions have a negative impact on the PNNQC (85) as compared to the 
LNNQC (77), a 10% increase in quantum cost. 
 
 
Figure 9-3 MCT5 gate with a star-shaped 2-D layout centered around the target qubit ‘e’.  The 
LNNMQC for this arrangement (85) is actually worse than the linear arrangement of qubits (77) since now 
qubits ‘a’, ‘b’, ‘c’, and ‘d’ are each remote to one another which is not the case for the linear arrangement. 
Figure 9-4 shows another arrangement of the MCT5 gate where, similar to the MCT4 
gate above, three additional ancilla qubits are added and the a qubit value is copied into 
them.  In this case, the a qubit is a neighbor to all other qubits and all interactions require 
a single primitive gate.  The PNNQC cost in this case is 53 primitive gates which 
represents a 32% saving over the LNNM model of the previous chapter. 
   
 
Figure 9-4 A modified planar arrangement of the MCT5 gate with the duplication of qubit ‘a’ into 
additional ancilla qubits.  Since qubits ‘a’ and ‘e’ interact with all other qubits, our strategy is to bring 
these two qubits as close to the others as possible.  Since qubit ‘a’ never acts as a target qubit, it is easy to 
just mirror its value upfront to other ancilla qubits which will interact in a ‘neighborly’ manner with the 
other qubits.  This pattern reduced the quantum cost to 53 compared to the linear arrangement of 77. 
In the previous arrangement I noted that qubit b interacts with both qubits c and d 
remotely because of the position it occupied.  Once the interaction of a and b is 
concluded after the first few gates, qubit b assumes a similar role to qubit a where it only 
acts as a control.  At this stage, I can transport qubit b to a new location near 
both qubits c and d as shown in Figure 9-5.  Since the original location of qubit b is a 
distance of 3 from the new position, it takes 6 CNOT gates to transport it to the new 
location.  Once transported, all interactions of qubit b with both qubits c and d require a 
single primitive gate at a cost of one (1).  At this rate, the final PNNQC of this 
arrangement is 43 primitive gates, which represents a saving of 44% over the LNNQC.  
The cost of this arrangement, however, is three additional ancilla qubits. 
   
 
Figure 9-5 Another improvement of the planar arrangement of the MCT5 gate where qubit ‘b’ was 
transported to another location which makes it next to both qubits ‘c’ and ‘d’.  This shuttling of qubit ‘b’ 
can only occur after the first stage once qubit ‘b’ is no longer a target.  The LNNM quantum cost is further 
reduced to 43 from 53.  
For larger size of the MCT gate, a similar analysis could be followed for a layout of 
the gate in a two dimensional grid with the ability to relocate qubits midway in order to 
facilitate near neighbor interaction amongst qubits while reducing quantum cost.  The 
number of ancilla bits to be added, however, will grow with the size of the gate which 
would increase the ancillary ratio discussed in the previous chapter. 
9.4 Conclusion and Analysis
I introduced and analyzed the application of the Nearest Neighbor Model to a two 
dimensional grid, which proved to be beneficial with respect to quantum cost.  However, 
the greatest benefit of a two dimensional arrangement typically comes at the cost of adding 
ancilla bits which is not always desired as it would increase the cost of constructing the 
system.  It is possible, of course, to add additional ancilla bits to the linear model of the 
last chapter, and also reduce the quantum cost in a manner similar to what was done 
using the 2-D layout.  The facts that the 2-D layout has yet to be implemented in the lab 
and that it is also possible to benefit from the addition of ancilla bits in the linear 
   
arrangement lead me to believe that the linear arrangement is still the most useful method to 
pursue for future LNNM compliant synthesis algorithms. 
 
  
   
 
PART IV 
 
 
 
QUANTUM LOGIC SYNTHESIS 
OF 
MULTIPLE VALUED FUNCTIONS 
  
   
Chapter 10 Synthesis of Ternary Quantum Circuits
10.1 Prologue
In the realm of classical technology, the irreversibility of digital logic gates results in 
information loss, which manifests itself as heat dissipation.  Landauer proved that using 
irreversible logic gates yields a rate of energy loss proportional to kT [1].  Essentially, 
information equals energy, and the loss of it equals heat loss.  Computations that preserve 
information are considered reversible and gates that perform reversible computation are 
designated as reversible gates.  Bennett [3] showed that a near-zero energy dissipation is 
possible when a computer can operate near its thermodynamic equilibrium and further 
showed that such a stasis state can be achieved through reversible components. Toffoli 
[4] showed that quantum logic gates are inherently reversible and demonstrated a set of 
universal quantum binary primitives capable of implementing any logic circuit, namely 
the NCT library (Not, Controlled-Not and Toffoli gates).  The qubit came to represent the 
quantum analogy of the classical symbol of information carrier: the bit. Possibly years 
before the feasibility of mass production of quantum computers, researchers have been 
laying the foundation for manufacturing such a computing device by exploring automated 
synthesis algorithms of quantum logic circuits.  In this chapter I tackle the problem of 
quantum logic synthesis for ternary quantum logic. 
Ternary-valued logic represents information in a base 3 system with three base states 
{0, 1 and 2} where a qutrit (trit for short) is a quantum unit of information with three 
basis states.  Qutrit is the ternary equivalent of the binary qubit.  A ternary-valued 
m-variable reversible logic function maps each of the $3^m$ input terms to a unique output 
   
term; or mathematically speaking, it is an onto and one-to-one function or a bijection.  
The problem of synthesizing a reversible circuit is the process of constructing a cascade 
of ternary reversible gates, which maps each input term to its corresponding unique 
output term.  Mathematically, quantum circuit synthesis represents a decomposition of 
the circuit’s specification into a number of small permutations of reversible gates. 
In their work, Miller, Maslov and Dueck [10], henceforth MMD3, presented 
exhaustive results for all (9!) permutations of two-variable ternary reversible functions.  
They further illustrated a synthesis example of the inherently irreversible 3-trit full adder 
by adding a single ancillary trit to create a 4-trit reversible function, and then applying 
their synthesis algorithm to such function. 
M. M. Khan, et al. [93] presented a method for synthesizing ternary GF(3) based 
reversible logic circuits while avoiding the addition of ancillary trits.  M. H. Khan, et al. 
[54] presented another method of synthesis of ternary circuits based on the Galois field 
sum of products (GFSOP) using cascades of multiple input ternary Toffoli and swap 
gates.  Al-Rabadi [94, 95] proposed a Galois field based approach to ternary logic 
synthesis using fast spectral transforms and fast permutation transforms.   
Our major contributions in this chapter are the following: 
1. Demonstrate the advantage of exploring different input vector sequences on the 
synthesized circuit quantum gate cost. 
2. Outline an algorithm for constructing valid input vector sequences using a ternary 
Hasse structure and provide a proof of algorithmic convergence for such sequences. 
   
3. Establish a set of benchmark cost numbers of synthesis of large ternary functions 
up to 9 variables.  Alhagi, et al [22], defined large binary functions as those 
consisting of eight (8) or more binary qubits whose information could easily be 
contained within 5 trits.  A nine trit register carries the equivalent of 14.25 binary 
bits of information. 
4. Introduce the set of Hidden Weighted Trit (HWT) functions into the literature as a 
set of benchmark functions for ternary logic synthesis.  HWT functions are an 
extension of the Hidden Weighted Bit (HWB) benchmark functions [20, 30, 96] 
extensively used in the literature as one of the harder sets of binary benchmark functions. 
5. Establish an open source repository [29] for the scientific community to hold and 
share the set of benchmarks for ternary functions. 
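The information capacity quoted in item 3 can be checked with a one-line conversion: each trit carries log2(3) bits. The sketch below uses a hypothetical helper name and gives about 14.26 bits for a nine-trit register, close to the 14.25 quoted above.

```python
import math


def trits_to_bits(n_trits):
    """Equivalent number of binary bits carried by a register of n_trits trits."""
    return n_trits * math.log2(3)


print(f"{trits_to_bits(9):.2f} bits in a nine-trit register")
```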
For the sake of self-containment, section 10.2 introduces the domain of ternary logic and 
reversible gates followed by an example of the operation of the algorithm for a two 
variable ternary function in section 10.4.  Section 10.5 provides a detailed explanation of 
our ternary logic synthesis algorithm where it describes the concept of control line 
blocking, and demonstrates how to construct Hasse diagrams and generate input vector 
sequences from such diagrams.   Sections 10.6 and 10.7 provide proof of convergence of 
the algorithm for three classes of ternary precedence orders.  Section 10.8 gives details of 
the genetic algorithm followed by an analysis of the experimental results and conclusion.  
   
10.2 Ternary Logic System
10.2.1 Measurement of a qubit
The theory of quantum mechanics depicts the qubit as a quantum state which could 
exist in a state of superposition between the two basis states of {0, 1}. However, upon 
observing the state of such a register, i.e., measuring its value, the qubit loses its state of 
superposition and collapses to one of the two basis states.   Due to the quantized nature of 
the particle used for computation (polarization of a photon, presence of an electron 
...etc.), a detector monitoring a value of zero(0) would either observe the particle (a one) 
or not (a zero), but nothing in between.  For example, placing a set of two orthogonal 
polarization filters in the path of a stream of photons would polarize some of the photons 
along one axis (basis state zero) and the rest along the other axis (basis state one).  In 
essence, measurement of a qubit in the superposition state $\alpha|0\rangle + \beta|1\rangle$ forces the 
qubit to collapse to a zero with a probability of $|\alpha|^2$ or to a one with a probability of $|\beta|^2$.  
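As a brief numerical illustration of this rule, the sketch below computes the two outcome probabilities from a pair of complex amplitudes. The function name is hypothetical, and the amplitudes are normalized first in case the caller passes an unnormalized state.

```python
def measurement_probabilities(alpha, beta):
    """Probabilities of reading 0 and 1 from the state alpha|0> + beta|1>."""
    norm = abs(alpha) ** 2 + abs(beta) ** 2
    if norm == 0:
        raise ValueError("at least one amplitude must be non-zero")
    return abs(alpha) ** 2 / norm, abs(beta) ** 2 / norm


# Equal-weight superposition with a relative phase on the second amplitude.
p_zero, p_one = measurement_probabilities(1, 1j)
```

The relative phase does not affect the outcome probabilities, which depend only on the magnitudes of the amplitudes.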
Visually, Figure 10-1 depicts measurement as a projection of the vector representing 
the superposed state onto the vectors representing the two basis states {0, 1}.  Similar to 
any probabilistic computation, a quantum computation is typically performed on an 
assembly of quantum systems consisting of a large number (N) of identical quantum 
circuits, all initialized to the identical set of input values.  Measurement is performed by 
exposing half of the particles to the zero detector and the rest to the one detector, where, 
probabilistically, the sensor detecting the highest number of hits reflects the internal state 
of the quantum register.  Such a binary detection system could be extended to ternary 
logic where three measurements are sought after, {0, 1, 2}. 
   
 
Figure 10-1 Measurement of a qubit in an optical quantum system can be performed by placing 
horizontal and vertical filters along with photon detectors on two orthogonal axes for detecting the number 
of photons on each point.  One of the detectors is assigned the state $|0\rangle$ and the other the state $|1\rangle$.   
10.2.2 Trits and Ternary States
Ternary logic is the closed logic system with certain ternary operators that operate on 
three logic values {0, 1, 2}.  In quantum mechanics, the three ternary values could 
correspond to the different polarization of a photon or alignment of nuclear spin in a 
uniform magnetic field.  To date, Nuclear Magnetic Resonance (NMR) and Ion Trap are 
the most promising technologies which were used to demonstrate “quantum circuit 
model” of quantum computation. 
Definition 10-1: A ternary quantum bit, a trit, is a ternary quantum system defined 
over the Hilbert space $\mathcal{H}_3$ with basis states $\{|0\rangle, |1\rangle, |2\rangle\}$, which are represented with 
the following Heisenberg vector notation: 
$|0\rangle = \begin{bmatrix}1\\0\\0\end{bmatrix},\quad |1\rangle = \begin{bmatrix}0\\1\\0\end{bmatrix},\quad |2\rangle = \begin{bmatrix}0\\0\\1\end{bmatrix}$
A two-variable register consists of two trits and has an information capacity of $3^2$ (9 
possible distinct states), represented as follows:  
   
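$|00\rangle = |0\rangle \otimes |0\rangle = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^T$
$|01\rangle = |0\rangle \otimes |1\rangle = [0\ 1\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^T$
$|02\rangle = |0\rangle \otimes |2\rangle = [0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0]^T$
$|10\rangle = |1\rangle \otimes |0\rangle = [0\ 0\ 0\ 1\ 0\ 0\ 0\ 0\ 0]^T$
$|11\rangle = |1\rangle \otimes |1\rangle = [0\ 0\ 0\ 0\ 1\ 0\ 0\ 0\ 0]^T$
$|12\rangle = |1\rangle \otimes |2\rangle = [0\ 0\ 0\ 0\ 0\ 1\ 0\ 0\ 0]^T$
$|20\rangle = |2\rangle \otimes |0\rangle = [0\ 0\ 0\ 0\ 0\ 0\ 1\ 0\ 0]^T$
$|21\rangle = |2\rangle \otimes |1\rangle = [0\ 0\ 0\ 0\ 0\ 0\ 0\ 1\ 0]^T$
$|22\rangle = |2\rangle \otimes |2\rangle = [0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 1]^T$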
where the ⨂ symbol represents the mathematical tensor (Kronecker) product of the two 
trits.  Consequently, an n-trit ternary register is a vector of n ternary trits with a capacity 
of $3^n$ states that are represented by the following equation: 
$|\Psi(t)\rangle = \bigotimes_{i=0}^{n} |\varphi_i\rangle$    (2) 
where $|\Psi(t)\rangle$ represents the state of the system at time (t) and $|\varphi_i\rangle$ is the state of trit (i). 
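The tensor-product construction of a register state can be reproduced with a few lines of plain Python. This is a minimal sketch with hypothetical helper names; vectors are stored as flat lists and only basis states are built.

```python
def basis_state(value, dim=3):
    """Unit vector for a single trit in basis state `value` (0, 1 or 2)."""
    if not 0 <= value < dim:
        raise ValueError("basis value out of range")
    return [1 if i == value else 0 for i in range(dim)]


def kron(u, v):
    """Kronecker (tensor) product of two vectors stored as flat lists."""
    return [x * y for x in u for y in v]


def register_state(*trits):
    """State vector of a register whose trits hold the given basis values."""
    state = [1]
    for t in trits:
        state = kron(state, basis_state(t))
    return state


state_12 = register_state(1, 2)  # the two-trit basis state |12>
```

The single non-zero entry of `register_state(1, 2)` sits at index 1·3 + 2 = 5, in agreement with the vector listed for the state 12 above.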
10.2.3 Reversible Operations
Definition 10-2: A k-variable ternary reversible gate (operator) is a bijective, 
one-to-one and onto, mapping of the $3^k$ input patterns to a permutation of those patterns. 
Unlike classic logic, quantum logic circuits are inherently reversible and can only be 
constructed from reversible logic gates.  Logical reversibility is the ability to reconstruct 
the input of a function from its output, and vice versa.  The definition above stipulates 
such reversibility with the one-to-one mapping where each input term is mapped to a 
single element of the output, and vice versa.  The onto stipulates that all elements of the 
output set are used, and hence, there are the same number of elements in the input and 
   
output sets.  The third requirement is of closure where the range and domain of the 
function are identical sets.  The reader can easily deduce that, with Definition 10-2, the set of 
output terms is simply a permutation of the input terms, where each set consists of 
unique elements. 
 
Figure 10-2 (a) Functions f and g are non-reversible, both separately and combined (fg), since the value 01 
is repeated twice on the output and there is no way to reconstruct the input; (b) ax is a reversible function 
since it is possible to determine the input minterm for each output minterm. 
To illustrate, Figure 10-2 (a) shows the two logical functions AND (f) and OR (g), 
which are, separately and jointly, irreversible.  Both functions map the two inputs, a and 
b, to a single output, f or g, making it impossible to reconstruct the input pair from the 
single output.  Since these gates only have a single output, one of their inputs has 
effectively been erased and the information it carries has been lost.  Figure 10-2 (b), by 
contrast, represents the logical XOR function x which, taken alone, is unidirectional and 
irreversible.  However, when x is combined with a copy of input a, the combined pair a∙x 
represents a reversible function where each input term, a∙b, maps to a single unique 
output term, a∙x, and vice versa. ■  
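The reversibility argument above amounts to checking that the output terms form a permutation of the input terms. A minimal sketch of that check follows; the function and gate names are hypothetical, and each gate returns its outputs as a tuple.

```python
from itertools import product


def is_reversible(func, n_inputs, radix=2):
    """True if func maps the radix**n_inputs input terms onto a permutation of themselves."""
    inputs = list(product(range(radix), repeat=n_inputs))  # lexicographic order
    outputs = [tuple(func(*term)) for term in inputs]
    return sorted(outputs) == inputs


def and_gate(a, b):
    return (a & b,)          # single output f


def and_or_pair(a, b):
    return (a & b, a | b)    # outputs f and g taken jointly


def copy_xor(a, b):
    return (a, a ^ b)        # input a kept alongside x = a XOR b
```

Here `and_gate` and `and_or_pair` fail the check, the latter because the output 01 occurs twice, while `copy_xor` passes.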
In binary logic, only two single-variable reversible gates (operators) exist: a wire representing identity, 
and an inverter representing negation.  As with all reversible gates, these binary operators 
have the same number of input and output variables, one variable, which uniquely map 
each input value to an output value as shown in Figure 10-3.  
   
Notice that, mathematically, the two functions represent the ordered set of all 
permutations of the input values, 0 and 1: {wire: (0, 1), inverter: (1, 0)}. By corollary, 
the ternary values {0, 1, 2} can be fully permuted into six unique sequences yielding a 
total of six unique ternary operators representing the ordered set of the ternary values: 
{(0,1,2),  (0,2,1), (1,0,2), (1,2,0), (2,0,1), (2,1,0)}.  
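The enumeration of single-variable operators as permutations can be generated directly. The sketch below uses Python's standard `itertools.permutations`; the variable names are hypothetical.

```python
from itertools import permutations

# Each tuple lists the outputs for the inputs 0, 1, 2 in that order.
binary_operators = list(permutations((0, 1)))
ternary_operators = list(permutations((0, 1, 2)))

for outputs in ternary_operators:
    print(outputs)
```

The two binary tuples correspond to the wire and the inverter, and the six ternary tuples are the six single-trit operators.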
 
Figure 10-3 Symbols for quantum binary Identity (wire) and NOT gates. 
Input (a) | 0 | 1
Wire (a) | 0 | 1
NOT (a) | 1 | 0
   
10.3 Ternary Reversible Operators
Figure 10-4 lists the six ternary operators for a single ternary variable with their names 
in the first column, mathematical equation in the second column, truth table in the third 
column, and the symbolic notation in the last column.  Clearly, the +0 operator is the 
analogy of a wire.  The +1 and +2 operators are ternary inverters which add 1 and 2, 
respectively, to the gate's input modulo 3.  The ⟦12⟧, ⟦02⟧ and ⟦01⟧ 
operators swap their namesake input values without affecting the third ternary value.  For 
example, the ⟦12⟧ gate swaps the values one and two while leaving zeros alone.  
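The behavior of the six operators can be written out as small functions. This is an illustrative sketch only: the helper names and the dictionary keys are hypothetical, with the double-bracket swap operators spelled in ASCII.

```python
def plus(k):
    """Shift operator +k: adds k to the input modulo 3."""
    return lambda x: (x + k) % 3


def swap_values(p, q):
    """Operator that exchanges the values p and q and leaves the third value alone."""
    def gate(x):
        if x == p:
            return q
        if x == q:
            return p
        return x
    return gate


OPERATORS = {
    "+0": plus(0), "+1": plus(1), "+2": plus(2),
    "[[01]]": swap_values(0, 1),
    "[[02]]": swap_values(0, 2),
    "[[12]]": swap_values(1, 2),
}


def truth_table(gate):
    """Outputs of a single-trit gate for the inputs 0, 1, 2."""
    return tuple(gate(x) for x in (0, 1, 2))
```

Taken together, the six truth tables are exactly the six permutations of the ternary values.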
 
Figure 10-4 Generalized Ternary Gates with the gate name given in the first column and the 
mathematical equation for calculating the outputs based on the current input.  The third column shows the 
output for the three possible inputs shown in the header, and finally the gate symbol is shown in the last 
column. 
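As a minimal sketch of the six operators (not code from the dissertation), each operator can be written in Python as a lookup tuple whose entry i is the output for input i.  The names OPERATORS, apply_op and compose are illustrative assumptions.

```python
from itertools import permutations

# Each single-trit reversible operator is a permutation of (0, 1, 2):
# entry i of the tuple is the output produced for input i.
OPERATORS = {
    "+0": (0, 1, 2),  # wire: x -> x
    "+1": (1, 2, 0),  # x -> (x + 1) mod 3
    "+2": (2, 0, 1),  # x -> (x + 2) mod 3
    "12": (0, 2, 1),  # swap 1 and 2, leave 0 alone
    "02": (2, 1, 0),  # swap 0 and 2, leave 1 alone
    "01": (1, 0, 2),  # swap 0 and 1, leave 2 alone
}

def apply_op(name, x):
    """Return the output of operator `name` for the trit value x."""
    return OPERATORS[name][x]

def compose(first, second):
    """Truth table obtained by applying `first` and then `second`."""
    return tuple(apply_op(second, apply_op(first, x)) for x in range(3))

# The six tables are exactly the 3! permutations of the ternary values.
assert set(OPERATORS.values()) == set(permutations(range(3)))
print(compose("+1", "+2"))  # (0, 1, 2)
```

Composing ⟦+1⟧ with ⟦+2⟧ gives the wire table (0, 1, 2), and each swap operator is its own inverse.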
In an ion-trap system, a ternary system can be implemented in a manner similar to the one 
described in section 3.5.2 above, where I outlined the operation of a binary system with 
the two basis states |0⟩ and |1⟩ represented as the energy levels |g⟩ and |e⟩.  The 
ternary energy states shown in Figure 10-5 can be used to implement the ternary quantum 
states |0⟩, |1⟩ and |2⟩.  Ternary gates can be implemented by applying an 
appropriately detuned laser beam to perform state transitions from one state to the next.  
For example, the ⟦01⟧ gate can be performed by applying a laser beam of frequency w01, 
which will transition qutrits in state |0⟩ to state |1⟩ and qutrits in state |1⟩ to 
state |0⟩ while leaving qutrits in state |2⟩ unchanged.  The ⟦12⟧ gate can be 
implemented in a similar manner by applying a laser pulse of frequency w12. 
 
Figure 10-5 Ternary energy states of an ion-trap system.  In (a), transitions between any two states are 
possible with a single laser pulse, which requires three finely detuned laser beams targeted at each ion.  
In (b), only transitions between neighboring energy levels are possible, but only two laser beams are required. 
Implementing the ⟦12⟧ gate depends on the construction of the ion-trap.  Figure 
10-5a shows an ion-trap construction with three laser beams allowing a direct 
implementation of the ⟦12⟧ gate with a single application of a laser pulse of frequency 
w12.  The implementation shown in Figure 10-5b requires two pulses to implement 
the ⟦12⟧ gate, going through the |1⟩ state. 
Definition 10-3: A controlled gate is a logic gate consisting of n variables where the 
values of (n-1) control variables enable the operation on the target variable (n). 
Definition 10-4: Control lines are independent variables where a specific pattern of 
values on the set of control lines affects the operation on a dependent variable (target 
line). 
   
Definition 10-5: A target line is a dependent variable where a specific operation on 
the line is enabled iff a set of control lines matches a specific pattern; otherwise, the 
signal on the line is passed through unchanged. 
 
 
Figure 10-6  (a) Ternary Inverter vs. (b) binary Inverter; (c)  ternary controlled op (C-OP) vs. (d) 
Feynman CNOT; and (e) ternary (C2-OP) vs. (f) Toffoli gate (C2-NOT). 
Figure 10-6 (a) shows the ternary extension of the binary inverter gate of Figure 10-6 (b).  
In the ternary case, however, the ⟦+1⟧ and ⟦+2⟧ operators act as inverses of one another: 
a ⟦+1⟧ followed by a ⟦+2⟧ can be visualized as a cumulative rotation of 
360° around the Bloch sphere, bringing the atomic particle back to its original orientation. 
Figure 10-6 (c) shows the ternary extension of the binary C-NOT gate in Figure 10-6 (d).  In this case, a value of one (1) on control line (a) activates the ⟦+2⟧ operator on line (b), 
while other values on line (a) pass (b) through unchanged.  Of course, any of the five 
non-identity ternary operators could be used on the target line (b), resulting in five representations 
of the controlled gate.  Additionally, the control line (a) could theoretically utilize any of the ternary values {0, 1, 2} as a control signal, resulting in a total of fifteen (15) representatives of the Feynman gate in the ternary domain.  Similarly, Figure 
10-6 (e) shows a ternary equivalent of the Toffoli gate where the operator ⟦12⟧ affects the target line (c) only if (a) and (b) are both one (1).  The reader can easily deduce, 
through the same argument (five operators times nine control patterns), that a total of 45 representatives of the Toffoli gate 
exist in the ternary space.  Al-Rabadi [95] and Khan [93] considered several multi-valued 
(MV) and ternary gates including the mod-sum gates used in this chapter.  Miller et al. 
[10] used ternary gates identical to the gates shown in Figure 10-6, and they limited the 
control line to the value of one (1). 
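The counts of 15 and 45 can be reproduced with a short enumeration.  The following Python sketch is illustrative only: the function names, and the convention that the target is the last trit of the state, are assumptions made for this example.

```python
from itertools import product

# Truth tables of the five non-identity single-trit operators
# (entry i is the output for input i).
NON_IDENTITY_OPS = {
    "+1": (1, 2, 0),
    "+2": (2, 0, 1),
    "01": (1, 0, 2),
    "02": (2, 1, 0),
    "12": (0, 2, 1),
}

def controlled_variants(num_controls):
    """List every (control pattern, operator name) pair for a gate with
    num_controls control lines, each control taking a value in {0, 1, 2}."""
    return [(pattern, name)
            for pattern in product(range(3), repeat=num_controls)
            for name in NON_IDENTITY_OPS]

def apply_controlled(state, pattern, name):
    """Apply operator `name` to the last trit of `state` only when the
    leading trits equal `pattern`; otherwise pass the state through."""
    *controls, target = state
    if tuple(controls) == tuple(pattern):
        target = NON_IDENTITY_OPS[name][target]
    return (*controls, target)

print(len(controlled_variants(1)))           # 15 single-control gates
print(len(controlled_variants(2)))           # 45 two-control gates
print(apply_controlled((1, 0), (1,), "+2"))  # (1, 2): control matches
print(apply_controlled((0, 0), (1,), "+2"))  # (0, 0): control does not match
```

The last two calls mirror Figure 10-6 (c): a value of one on line (a) activates ⟦+2⟧ on line (b), and any other control value leaves (b) unchanged.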
10.4 Synthesis by Example
Before I delve into the details of the ternary synthesis algorithm, it would be helpful to 
start with an illustration of the process as described by Miller, et al [10].  Figure 10-7 
shows a reversible ternary function of two variables composed of 9 terms (3^2).  Column 
(ab) represents the input vector and column (AB) represents the corresponding output 
vector.  The objective of the synthesis is to create a cascade of primitive reversible 
ternary gates to map all input minterms to their corresponding output minterms.  The 
algorithm terminates when all terms of the output vector (AB) map to their corresponding 
terms of the input vector (ab) – compare column 8 to column 1.  The algorithm processes 
the terms one trit at a time and places gates only when the input/output trits of the same 
position mismatch.  The algorithm observes a simple yet essential guiding principle: 
a completely mapped pair should never be altered by succeeding mapping 
calculations.  This important rule ensures that the algorithm will always converge, which 
is an essential criterion for synthesizing arbitrary reversible circuits. 
 
Figure 10-7 Ternary Synthesis Example transforming output vector AB (column 2) into input vector ab 
(columns 1 and 8).  The ⟦+1⟧ in the header represents the ternary gate inserted at that point, which is also 
shown pictorially in the circuit below the column. The shaded values indicate the values impacted by the 
gate; the ⟦+1 1⟧ in the header of column 5 indicates that the lower qutrit is used as a control of value ‘1’ 
while the upper qutrit has the ⟦+1⟧ gate, as shown in the pictorial below the column.  
Figure 10-7 illustrates the step by step synthesis process along with the circuit diagram 
for each transformation, as follows: 
• Considering the inherent reversibility of the function, the algorithm starts synthesis 
from the output column (AB) towards the input column (ab). 
• Starting with the first pair (00 → 12), the algorithm realizes that a ⟦+1⟧ inverter on line 
(a) would correctly map the upper trit 0 to 1.  Essentially, any value presented on the 
(a) line will be incremented by 1 modulo 3, as shown in the shaded text of the third 
column. 
Row   1: ab   2: AB   3: ⟦+1⟧   4: ⟦+2⟧   5: ⟦+1 1⟧   6: ⟦+2 2⟧   7: ⟦2 +2⟧   8: ⟦12⟧
0     00      12      02        00        00          00          00          00
1     01      20      10        11        01          01          01          01
2     02      01      21        22        22          02          02          02
3     10      21      11        12        12          22          20          10
4     11      02      22        20        20          20          21          11
5     12      10      00        01        21          21          22          12
6     20      22      12        10        10          10          10          20
7     21      00      20        21        11          11          11          21
8     22      11      01        02        02          12          12          22
(Circuit diagram for Figure 10-7: lines A/a and B/b carrying the gates of columns 3 to 8, with the control line values marked.)
• A second gate, ⟦+2⟧ is placed on line (B) bringing its value from two to zero 
matching the corresponding input line (b).  Notice that due to their unconditional 
nature, the above two gates affect all the terms of the output vector as demonstrated 
by the shaded values. 
• The synthesis process continues with the second term, where the upper trit of input 
term (01) mismatches the newly realized output term (11) in column 4.  The 
algorithm places a ⟦+1⟧ gate on line (A) to perform a 1 → 0 transformation only if 
line (B) has the value of one; hence, a controlled gate.  As stated before, the control 
values (highlighted with thick borders) are used to ensure that completely mapped 
pairs are never modified by a later step. 
• The third input term (02) now maps to (22).  A [[+2]] gate on line (a) controlled by a 
value of (2) on line (b) remedies the mismatch of the upper trit. 
• The fourth input term (10) now maps to (22) requiring two gates to correct.  The first 
gate is a controlled [[+2]] on line (b) controlled by a value of (2) on line (a). 
• To correct the upper trit, I realize that a ⟦12⟧ swap gate on line (a) would correctly 
map that line and the remaining minterms of the function. 
• Realizing that column 1 is identical to column 8, the synthesis algorithm terminates 
with a successful synthesis of the specification. 
Taking a deeper look at the synthesis example, the reader will surely discover a 
discrepancy between the operator indicated on top, e.g., ⟦+1⟧, and what the table 
indicates as the result of such an operation (shaded).  For example, with the pair (00, 12), 
performing ⟦+1⟧ on the trit value of (A=1) would surely yield a value of (A=2), not the 
(A=0) indicated in the first row of the table, column 3.  To resolve this disagreement, 
remember that I started the synthesis process from the output vector (AB), column 
2, heading towards the input vector (ab), replicated in column 8.  And why not? This is a 
reversible circuit after all!  So, in truth, the operation ⟦+1⟧ answers the question, “what 
do I need to make the input trit of value (a=0) into the output trit of value (A=1)?” A ⟦+1⟧, of course! And that is exactly what I have specified.  I could just as easily have started 
from the input vector (ab) and synthesized the circuit by adding gates to match the output 
in a similar manner, and would have ended up with a circuit functionally equivalent 
to the one shown above. 
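The direction argument can be checked mechanically.  The sketch below, written for this section rather than taken from the dissertation, replays the six gates of Figure 10-7: undoing them one by one on column 2 (AB) reproduces column 1 (ab), and running them forward in the opposite order maps ab back to AB.  The gate encoding (operator, target line, optional control) and the line indices 0 = upper trit, 1 = lower trit are assumptions of the example.

```python
INPUTS  = ["00", "01", "02", "10", "11", "12", "20", "21", "22"]  # column 1 (ab)
OUTPUTS = ["12", "20", "01", "21", "02", "10", "22", "00", "11"]  # column 2 (AB)

# Truth tables: entry i is the output for input i.
TABLES = {"+1": (1, 2, 0), "+2": (2, 0, 1), "12": (0, 2, 1)}

def inverse(table):
    """Return the table of the inverse permutation."""
    inv = [0, 0, 0]
    for x, y in enumerate(table):
        inv[y] = x
    return tuple(inv)

# (operator, target line, control) with control = None or (line, value).
# Listed in the order of columns 3 to 8 of Figure 10-7.
GATES = [
    ("+1", 0, None),
    ("+2", 1, None),
    ("+1", 0, (1, 1)),
    ("+2", 0, (1, 2)),
    ("+2", 1, (0, 2)),
    ("12", 0, None),
]

def apply_gate(term, gate, undo=False):
    """Apply (or undo) one gate on a two-trit term given as a string."""
    op, target, control = gate
    trits = [int(c) for c in term]
    if control is None or trits[control[0]] == control[1]:
        table = inverse(TABLES[op]) if undo else TABLES[op]
        trits[target] = table[trits[target]]
    return "".join(map(str, trits))

# Synthesis direction: walk from the output vector towards the input vector.
column = OUTPUTS
for gate in GATES:
    column = [apply_gate(t, gate, undo=True) for t in column]
assert column == INPUTS

# Circuit direction: the same gates, traversed the other way, realize ab -> AB.
result = INPUTS
for gate in reversed(GATES):
    result = [apply_gate(t, gate) for t in result]
assert result == OUTPUTS
```

Both assertions hold for the table of Figure 10-7, which is why the header shows ⟦+1⟧ while the shaded column shows the effect of its inverse.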
10.5 Ternary Logic Synthesis Algorithm
I was inspired by our research in the binary domain [22] which revealed that the size 
of the resulting circuit is greatly influenced by the arrangement of terms of the input 
vector.  In the above example, the terms of the input vector (ab) are arranged in their 
natural ternary order.  The remainder of this chapter answers the questions: Is the size of 
the circuit influenced by the arrangement of the input vector?  Do such arrangements 
affect algorithmic convergence? 
I will demonstrate below that the answer to both questions is in the affirmative where 
the arrangement of the input vector influences the number of gates required to implement 
the circuit.  I will also demonstrate through an example the concept of control line 
blocking where, for a subset of such input orderings, the algorithm becomes trapped in an 
endless loop.  It is worth mentioning, for the sake of completeness, that our algorithm for 
synthesizing reversible circuits adheres to the basic assumptions for quantum circuits as 
outlined by Toffoli [4] below: 
1. No fan-out is permitted between gates, 
2. Loops are not permitted, 
3. Permutations of connections between gates are permitted.  
10.5.1 Control Line Blocking
Some minterm orderings of the input vector violate the bedrock principle of never 
altering previously completed mapped pairs, and force the algorithm into an infinite 
loop.  Using the same steps of section 10.4 above, Figure 10-8 lists the synthesis of the 
first three minterms of the input sequence: 00, 22, 02.  Once the second minterm is 
mapped correctly, trying to map the third pair (20 → 02) becomes impossible without 
altering the first two completely mapped pairs.  For example, in order to map the lower 
trit correctly (0 → 2), I would normally use the upper trit (2) as a control signal and 
possibly apply the swap gate ⟦02⟧ to provide the correct mapping.  However, such an 
operation would surely alter the completely mapped pair of the second row (22 → 20), 
and in effect violate the aforementioned principle.  The reader can easily deduce that 
going back and attempting to correct the term of the second row will result in an infinite 
loop.  Similarly, attempting to use the lower trit (0) as a control signal is also destructive 
because, in this case, the first completely mapped pair would be altered. 
   
Definition 10-6: Control Line Blocking condition occurs when all control lines of the 
current minterm are a subset of the control lines of a previously completed minterm for a 
given input order. 
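The blocking condition in the example of this section can be tested with a small search.  The sketch below is an illustration under stated assumptions, not the dissertation's implementation: it considers only the first gate placed for the current term, assumes that gate is controlled by all the other trits of the current output term, and uses the hypothetical name safe_first_moves.

```python
from itertools import permutations

def safe_first_moves(current, wanted, completed):
    """Return (target position, operator table) pairs that fix one
    mismatched trit of `current` without altering any completed term.

    Terms are strings of trits; a table is a tuple whose entry i is the
    output for input i.  The gate is assumed to be controlled by every
    trit of `current` other than the target.
    """
    moves = []
    for j in range(len(current)):
        have, want = int(current[j]), int(wanted[j])
        if have == want:
            continue
        controls = current[:j] + current[j + 1:]
        # Completed terms that show the same control pattern.
        sharing = [t for t in completed if t[:j] + t[j + 1:] == controls]
        for table in permutations(range(3)):
            if table[have] != want:
                continue
            if all(table[int(t[j])] == int(t[j]) for t in sharing):
                moves.append((j, table))
    return moves

# Input order 00, 22, 02: the third output term is 20 and must become 02.
print(safe_first_moves("20", "02", ["00", "22"]))        # [] -> blocked
# Natural order 00, 01, 02: the third output term is 22 and must become 02.
print(len(safe_first_moves("22", "02", ["00", "01"])))   # 2 safe gates
```

An empty list means every candidate gate on every mismatched trit would disturb a completed pair, which is the situation shown in Figure 10-8.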
It is possible, of course, to programmatically detect when a previously mapped pair 
has been altered, and, consequently, reject the input sequence.  Going through all possible 
permutations of input arrangements would surely guarantee the discovery of the 
optimal solution.  Even for functions with a small number of ternary variables, however, 
attempting all permutations is impractical.  A lowly 3-variable ternary function consists of 
27 (3^3) minterms, resulting in 27! (approximately 10^28) possible permutations: not an easy feat even for 
our most powerful computing machines.  The question then is: would it be possible to 
focus the search only on sequences which are guaranteed to converge?  
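For a sense of scale, the number of input orderings can be computed directly.  This is a small arithmetic sketch; the function name minterm_orderings is introduced here for illustration.

```python
import math

def minterm_orderings(n):
    """Number of ways to order the 3**n minterms of an n-variable ternary function."""
    return math.factorial(3 ** n)

print(minterm_orderings(2))           # 362880
print(f"{minterm_orderings(3):.2e}")  # 1.09e+28
```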
 
Figure 10-8 The algorithm reaches the shaded minterm, which needs to be transformed from 20 to 
02.  If the upper trit (2) is used as control, any gate applied to change 
the lower trit from 0 to 2 will also change the second minterm (22).  If the lower 
trit (0) is used as control instead, the first minterm (00) will be altered.  Either way, this is a violation of 
the algorithm's guiding principle, and the algorithm is blocked at this point. 
1: ab   2: AB   3: ⟦+1⟧   4: ⟦+2⟧   5: ⟦+2 1⟧   6: ⟦2 +2⟧
00      12      02        00        00          00
22      20      10        11        21          22
02      01      21        22        22          20
10.5.2 Ternary Hasse Input Sequence
Mathematics comes to the rescue!  It is possible to construct a subset of all possible 
convergent sequences using the mathematical concept of Hasse diagrams and covering 
graphs.  Rather than cycling through the entire set of permutations, I could easily 
construct a number of such valid input sequences and discover the ones which provide 
the circuits with the lowest quantum cost. 
Definition 10-7: A Hasse or Poset diagram is a type of mathematical diagram used to 
represent a finite partially ordered set in the form of a graph where, for the relation 
{(x, y) | x ≤ y; x, y ∈ S}, each element of S is a vertex in the plane and a line 
segment or curve is drawn upward from x to y whenever y covers x (that is, whenever x < 
y and there is no z such that x < z < y). The relations < and ≤ represent a precedence 
hierarchy between the operands and are not necessarily analogous to the mathematical 
inequality relations on real numbers. 
I will start with a demonstration of constructing a Hasse diagram for a two variable 
ternary function (see Figure 10-9).  Starting from the smallest valued minterm (00), I draw a 
line to each of the two minterms which satisfy the relation of partially ordered sets: {(x, y) | 
x ≤ y; x, y ∈ S}.  Loosely speaking, I find all minterms which are trit-wise larger than 
the minterm at hand.  For 00, adding a 1 to the lower trit yields 01, and adding a 1 to 
the upper trit yields 10, as shown in Figure 10-10 (a). 
In a similar fashion, the 01 minterm yields 02 by adding a 1 to the lower trit, and 
11 by incrementing the upper trit, as shown in Figure 10-10 (b).  The process repeats for all the 
terms in the set until the highest number, 22, is reached.  Consider for a moment the upper 
trit of term 02 in Figure 10-10 (c).  Notice that for the branch (00 → 01 → 02), each 
transition only affects a single trit and that, for the sake of maintaining closure within the 
ternary domain, the process stops once a trit reaches the value of 2.  Now that the lower 
trit for minterm 02 has reached the ceiling of 2, the upper trit can transition through its 
stages (02 → 12 → 22). 
 
Figure 10-9 Structure of the ternary Hasse diagram for a 2-variable function.  Each level (band) 
contains the set of minterms which has the same mathematical sum of digits. The sum of digits represents 
the band number shown on the right hand side. 
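The covering relation used to draw the diagram can be sketched in a few lines of Python.  The function name hasse_covers and the string encoding of minterms are assumptions of this example; the rule itself (raise exactly one trit by one, never past 2) is the one described above for the 0 → 1 → 2 order.

```python
from itertools import product

def hasse_covers(n):
    """Map each n-trit minterm to the minterms that cover it, i.e. those
    obtained by raising exactly one trit by 1 (a trit equal to 2 stays put)."""
    covers = {}
    for trits in product(range(3), repeat=n):
        term = "".join(map(str, trits))
        ups = []
        for j, t in enumerate(trits):
            if t < 2:
                raised = trits[:j] + (t + 1,) + trits[j + 1:]
                ups.append("".join(map(str, raised)))
        covers[term] = ups
    return covers

covers = hasse_covers(2)
print(covers["00"])  # ['10', '01']
print(covers["01"])  # ['11', '02']
print(covers["22"])  # []
```

For two variables this yields 12 upward edges in total, and the top element 22 has no cover.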
10.5.3 Construction of Input Sequence
Once I have constructed the Hasse structure, I group all minterms at the same level 
together, within a set of bands, as shown graphically in Figure 10-9. 
Definition 10-8: A Hasse Ternary Band is the set of terms at the same level in a 
ternary Hasse diagram where the sum of trits of each term equals the zero-based 
numerical order of the band. 
The column on the right of Figure 10-9 shows the sum of the trits of each term in a band.  For 
example, band 3 has the terms 12 and 21, whose trits both add up to 3; hence, they are in 
band 3. 
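Grouping minterms by their trit sum gives the bands directly.  The following is a minimal sketch with an assumed function name, hasse_bands; bands are indexed from 0 to 2n.

```python
from itertools import product

def hasse_bands(n):
    """Group the 3**n minterms of an n-variable ternary function into
    bands, where band k holds the terms whose trits sum to k."""
    bands = [[] for _ in range(2 * n + 1)]
    for trits in product(range(3), repeat=n):
        bands[sum(trits)].append("".join(map(str, trits)))
    return bands

for k, band in enumerate(hasse_bands(2)):
    print(k, band)
# 0 ['00']
# 1 ['01', '10']
# 2 ['02', '11', '20']
# 3 ['12', '21']
# 4 ['22']
```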
   
 
Figure 10-10 Construction of the Ternary Hasse Diagram for a 2-variable ternary function.  (a) Starting 
from the bottom with 00, I add two new minterms by adding 1 to each digit, resulting in (01, 10).  (b) For 
each new minterm (e.g. 01), I repeat the process, adding 1 to each digit, resulting in (02, 11).  (c) For 
each digit, the process is repeated until the upper value ‘2’ is reached. 
Corollary 2: A ternary function with n variables has 2n+1 Hasse Ternary Bands. 
Since the highest band has a single minterm consisting of n copies of the digit 2, the sum of 
all its digits is clearly 2n. Since the band order is zero based, according to Definition 10-8, the 
number of bands is 2n+1.∎ 
At this stage, I am able to use the Hasse Ternary Bands to construct input vectors 
which are guaranteed to converge.  The following pseudo-code outlines the process: 
01  inputSequence := {}
02  for index := 0 to 2*n
03      bandSequence := Permutation (Band[index])
04      Append (inputSequence, bandSequence)
05  end for
The above pseudo-code can be described by the following steps:  
• Step 02: Start at the lower band consisting of all zeros (0…0), and stop at the 
upper band consisting of all twos (2…2), 
• Steps 03, 04: For each band, append any permutation of the terms within the 
band to the end of the sequence.  
Observe that, for a two variable function, the combined permutations of the bands will 
result in 24 valid input sequences (2!·3!·2!). The following two vectors are examples of 
valid input sequences:  
• S1 = {00, 01, 10, 11, 20, 02, 21, 12, 22}  
• S2 = {00, 10, 01, 20, 02, 11, 12, 21, 22}  
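The band-by-band construction can be turned into a runnable sketch.  The Python below is one possible rendering, not the dissertation's code: hasse_bands, hasse_input_sequences and is_hasse_sequence are names chosen for this example.

```python
from itertools import permutations, product

def hasse_bands(n):
    """Band k holds the n-trit minterms whose trits sum to k."""
    bands = [[] for _ in range(2 * n + 1)]
    for trits in product(range(3), repeat=n):
        bands[sum(trits)].append("".join(map(str, trits)))
    return bands

def hasse_input_sequences(n):
    """Yield every input sequence that exhausts band k before band k + 1."""
    bands = hasse_bands(n)
    for choice in product(*(permutations(band) for band in bands)):
        yield [term for band in choice for term in band]

def is_hasse_sequence(seq):
    """Check that seq lists every minterm once, in non-decreasing band order."""
    n = len(seq[0])
    sums = [sum(int(c) for c in term) for term in seq]
    return len(set(seq)) == len(seq) == 3 ** n and sums == sorted(sums)

S1 = ["00", "01", "10", "11", "20", "02", "21", "12", "22"]
S2 = ["00", "10", "01", "20", "02", "11", "12", "21", "22"]

print(sum(1 for _ in hasse_input_sequences(2)))      # 24
print(is_hasse_sequence(S1), is_hasse_sequence(S2))  # True True
```

The natural ternary order 00, 01, 02, 10, ... fails the check because 02 (band 2) precedes 10 (band 1), even though section 10.4 shows it converging for that example; the Hasse sequences are a subset of the convergent orderings.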
The alert reader will readily notice that, in constructing the input sequence, the 
0 → 1 → 2 precedence order, defined below, is not necessarily obeyed as prescribed in 
section 10.5.2 above.  For instance, the vector S2 includes the 
term 10 with a high trit of one (1), followed by 01 with a high trit of zero (0); yet, I 
consider this sequence valid.  The reason is that the precedence criterion (0 → 1 → 2) 
applies only to the construction of the Hasse diagram, and not to the construction of the 
input vector from the Hasse diagram.  The only restriction for constructing the input 
vector from a Hasse diagram is: all minterms of a lower band must be used before any 
minterms in the next higher band.  Clearly, the algorithm described above satisfies this 
condition. 
Definition 10-9: Precedence order refers to the mathematical binary relation between 
a set of elements in a partially ordered set (Poset) where one element precedes the other. 
“Partial orders” reflect the fact that not every pair of elements of a Poset need be 
related. 
   
Now that an input vector has been constructed, I apply the same synthesis process 
detailed in the “Synthesis by Example” section above.  What are the advantages of this algorithm then?  
With this algorithm, I am now able to systematically construct multiple input vector 
arrangements which are guaranteed to converge, without searching through every 
possible permutation of the input vector.  This allows us to examine a large number of 
input vector arrangements in order to discover the circuit with the best quantum cost 
among the input vectors.  Most likely, however, such a solution will not be the optimal 
circuit realization, as the number of possible input vector arrangements grows 
exponentially. 
10.5.4 Hasse Precedence Quandary
In our attempt to construct Hasse diagrams for the ternary space, I was confronted 
with a dilemma regarding precedence in the form of the question: What precedence exists 
amongst the three ternary values {0, 1 and 2}? Figure 10-11, for example, shows three 
possible arrangements of the ternary values establishing the precedence of one constant 
over the other(s).  Do I treat the literal one (1) as equal to, less than or greater than the 
other two?  Intrinsically, there is no natural precedence among the three constants, but 
rather, a symbolic primacy born out of our choice of mathematical manipulations.  Figure 
10-11 clearly demonstrates that I conveniently, yet arbitrarily, designated the symbol |0⟩ 
to align at 0°, while the other two symbols are 120° away, and hence, there are no natural 
physical phenomena dictating precedence amongst the three constants.  In the algorithm 
described herein, I have artificially set precedence for the convenience of implementation 
where, as described shortly, I opted to perform low-to-high trit transitions first followed 
by high-to-low transitions, and as a result, introduced an artificial algorithmic prejudice in 
favor of the value two (2).  Our choice was largely driven by the procedure for 
constructing the Hasse diagram, described above, where I have favored the constant two 
over one and the latter over zero.  Such favorable treatment results in delaying the 
appearance of terms containing the constant two while forcing the terms containing 
zeros to appear earlier during the synthesis process.  Such delay allows us to avoid 
control line blocking by relying on the fact that the value two will always appear later in 
the sequence, and hence, can be used as a control variable.  I shall revisit this topic 
further in the discussion about convergence below. 
 
Figure 10-11 Three possible Ternary Value Precedence diagrams where the ternary values can be 
arranged with different precedence orders.  (a) indicates a natural mathematical order of precedence.  (b) 
indicates that the values 0 and 1 are equivalent in precedence and are lower than 2.  (c) indicates that 1 
and 2 are at the same level and are both above 0.  The precedence order plays a role in the structure of the 
Hasse diagram and the consequent construction of ternary input sequences. 
Figure 10-11 (b) shows that the constant two (2) has precedence over the other two 
values, and Figure 10-11 (c) shows that the two constants one (1) and two (2) are of equal 
precedence and that both are higher than the value zero (0).  Algorithmically, it is feasible, 
of course, to swap the symbolic values zero (0) and two (2), which would grant 
preference to the value zero (0) over the value two (2).  Following the same thought 
process, the reader can quickly ascertain that, at most, there exist 12 unique precedence 
orders:  
1. Six for precedence order (a) representing the six unique permutations of {0,1,2}: 
Oa = {0→1→2, 0→2→1, 1→0→2, 1→2→0, 2→0→1, 2→1→0}, 
2. Three for precedence order (b), Ob = {2→{0, 1}, 1→{0, 2}, 0→{1, 2}}, 
3. Three for precedence order (c), Oc = {{1, 2}→0, {0, 1}→2, {0, 2}→1}.   
Mathematically speaking, members within each of the three sets Oa, Ob, Oc are 
described as equivalence classes where a single element acts as a representative of the 
entire group.  In this chapter I limit the discussion to the three precedence orders shown 
in Figure 10-11 and treat them as representatives of the twelve possible precedence 
orders.  The precedence order of Figure 10-11 (a) will be explained immediately after the 
discussion about algorithmic convergence. 
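The count of twelve can be reproduced by enumeration.  The sketch below is illustrative: it encodes each order as a list of levels from lowest to highest precedence, following the shapes described in the caption of Figure 10-11, and that encoding is an assumption of the example.

```python
from itertools import permutations

VALUES = (0, 1, 2)

# Shape (a): a chain of three distinct levels.
chains = [[{a}, {b}, {c}] for a, b, c in permutations(VALUES)]

# Shape (b): two values tied at the lower level, one value on top.
tied_low = [[set(VALUES) - {top}, {top}] for top in VALUES]

# Shape (c): one value at the bottom, the other two tied above it.
tied_high = [[{bottom}, set(VALUES) - {bottom}] for bottom in VALUES]

print(len(chains), len(tied_low), len(tied_high))      # 6 3 3
print(len(chains) + len(tied_low) + len(tied_high))    # 12
```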
Definition 10-10: Given a set S and an equivalence relation ~ on S, the equivalence 
class of an element a in S is the subset of all elements in S which are equivalent to a, 
represented as: [a] = {x ∈ S | x ~ a}. 
10.6 Convergence of Algorithm
The bedrock rule for all MMD-like algorithms states that, in order to converge, 
the algorithm should never alter any of the completely mapped pairs.  I have 
demonstrated that control variables must be employed in order to satisfy this rule and 
illustrated the concept of control line blocking.  I have thus far claimed that a Hasse 
style ordering could remedy control line blocking but have yet to substantiate this 
claim.  Proving that the synthesis algorithm converges for any of the input 
sequences generated through the Hasse structure is the underpinning of the algorithm; 
without the certainty of convergence, the algorithm would not have any advantage 
over a search of the entire set of permutations.  I initially 
ventured through a brute force programming approach to demonstrate the convergence of 
ternary input sequences constructed through a ternary Hasse diagram. The effort entailed 
synthesizing every valid ternary Hasse input vector against all possible permutations of 
output vectors.  I was able to demonstrate convergence for all two variable input vectors, 
24 in total, against all permutations of the output vectors, a total of 362,880 (9!).  
However, as I stepped up to ternary circuits of three variables, the number of output 
permutations quickly inflated to 27!, an astronomical number even for the fastest 
supercomputer.  Proving convergence of a two variable ternary problem, however, contributes 
nothing over the existing effort of MMD [10].  Our experience in the binary domain [22] 
fueled our determination to prove convergence through an old fashioned constructive 
proof with a convincing logical argument.  I shall first provide a proof for input vectors 
based on the first precedence order of Figure 10-11 (a) and defer the proof of the other 
two precedence orders to a later section. 
Theorem 10-1: All input vectors concordant with the Ternary 0→1→2 precedence 
order will converge for all possible output permutations. 
   
Foundation for proof:  The theorem asserts that any input vector constructed through 
a Hasse diagram with the 0→1→2 precedence relationship amongst the ternary values 
will converge for all permutations of the output vector.  Notice that the stipulation about 
all possible output permutations validates that the algorithm is able to process all possible 
circuits (input/output mappings).  It is worth highlighting the following 
points: 
1. The algorithm blindly processes each input/output pair and neither maintains history 
nor peeks ahead in the course of synthesis. 
2. For an n variable circuit, the algorithm processes one trit at a time, the target trit, and 
uses the remaining n-1 trits as control variables.  Hence, any pattern of n-1 control values will 
be encountered exactly three times in the course of synthesis. 
3. For each input/output pair transformation, the algorithm processes low to high 
transitions first, followed by high to low transitions.  This stipulation makes the 
maximum number of control lines available, which allows us to avoid altering any 
completely mapped pair and, in turn, facilitates convergence. 
4. The algorithm uses the three ternary values {0, 1, 2} for control variables.   
5. For any band b, the algorithm processes all terms within the band before processing 
any terms in band b+1. 
Lemma 10-1: For any control pattern, the minterms exhibiting the control pattern 
will always appear in three consecutive bands according to the 0→1→2 precedence 
order of their target trit.   
   
 
Figure 10-12 Precedence order of low and high trits in a ternary Hasse diagram.  This Hasse structure 
follows the natural precedence order (0→1→2), which is also evident for each specific digit as it moves 
from one level to a higher level.  The lower digit exhibits the pattern in the NW direction, while the 
upper digit exhibits the pattern in the NE direction.  
Proof of Lemma 1: The second point above embodies the essence of the proof of 
Lemma 1 and, consequently, of the proof of convergence of all input sequences constructed 
on the basis of the 0→1→2 precedence order.  I will utilize the ternary 
Hasse diagram for a two variable circuit in Figure 10-12 to illustrate the second point.  In this discussion, 
I will use the symbol x to represent a don't care value.  Now, suppose that we use the 
high order trit as a control variable during the synthesis process; then each of the patterns 
0x, 1x, and 2x will occur exactly three times throughout the synthesis process, as 
shown by the dark solid lines travelling in the north western direction.  For example, the 
control pattern 0x includes the terms 00, 01, 02, which ascend the left branch of the 
Hasse diagram in the north western direction.  Notice that the three terms 
appear in three consecutive bands (bands 0, 1, 2) and that they appear in the order of the target 
trit (the lower trit in this case).  Similarly, the terms of pattern 1x appear in the three 
consecutive bands 1, 2 and 3, and the terms of pattern 2x appear in bands 2, 3 and 4.  
Likewise, for patterns x0, x1, and x2, where the lower trit is used as a control variable, each 
of the 3-tuples of terms appears in three consecutive bands, as shown by the gray lines 
travelling to the north east in Figure 10-12.  Table 10-1 lists the band number where each 
minterm appears according to the control pattern used.  The third row, for example, 
shows that the terms 20, 21 and 22 appear in bands 2, 3 and 4 consecutively. ■ 
(Footnote 5, attached to the pattern 0x above: underlined digits are control line values.) 
Table 10-1 Shows the band number where each minterm appears where the value of x is shown in the 
upper header.  For row (2x), the minterms (20, 21, 22) appear in bands (2,3,4) consecutively which is 
guaranteed by the structure of the Hasse diagram.  I rely on this ordering for the proof of convergence.  
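The band numbers described for Table 10-1 can be regenerated from the definition of a band.  The sketch below is an illustration with an assumed function name, band_table; it also checks the consecutive-band property of Lemma 10-1 for three variables.

```python
from itertools import product

def band_table(n, target):
    """For each control pattern (all trits except position `target`),
    list the band numbers of its three minterms for x = 0, 1, 2."""
    table = {}
    for controls in product(range(3), repeat=n - 1):
        left, right = controls[:target], controls[target:]
        bands = [sum(left + (x,) + right) for x in range(3)]
        pattern = "".join(map(str, left)) + "x" + "".join(map(str, right))
        table[pattern] = bands
    return table

print(band_table(2, target=1))
# {'0x': [0, 1, 2], '1x': [1, 2, 3], '2x': [2, 3, 4]}
print(band_table(2, target=0))
# {'x0': [0, 1, 2], 'x1': [1, 2, 3], 'x2': [2, 3, 4]}

# Lemma 10-1 for n = 3: every pattern occupies three consecutive bands.
for target in range(3):
    for bands in band_table(3, target).values():
        assert bands == [bands[0], bands[0] + 1, bands[0] + 2]
```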
 
Proof of Theorem 1: In the discussion above, notice that the expressions "appearance 
of terms" and "target trit" both referred to the terms of the input vector.  The reader might 
still recall from the synthesis example of section 10.4 above that the synthesis process 
transforms the output vector to match the input vector and, in turn, makes the input 
vector the destination (or target) of the transformation.  With this in mind, I provide the 
following constructive proof of Theorem 1: 
1) At any point in the synthesis, all completely mapped pairs are identical; i.e., input and 
output minterms are equal and are arranged according to the ternary Hasse diagram, 
2) At the kth term in the synthesis process, the kth input term could not have appeared in 
the previously completely mapped pairs because, for a bijective function, it should 
only exist once in the input vector, 
   
3) For any mismatched jth trit of the kth input term, the corresponding n-1 trits of the 
output term represent the control pattern, 
4) The first appearance of the input control pattern (Ci_{n-1} … Ci_1) occurs when the target 
trit Ti_j = 0 (see Lemma 10-1).  I now consider three possible situations based on the 
output control pattern (Co_{n-1} … Co_1):  
a) ∀ y = 1..n−1 | Ci_y = Co_y: With the exception of the target trit, all other trits are 
identical in both the input and output minterms.  In this situation, I can safely use 
the control pattern without altering any completely mapped pairs since, according 
to Lemma 10-1, a zero in the target trit points to the first encounter of the control 
pattern. 
b) ∃ y = 1..n−1 | Ci_y ≠ Co_y and the output control pattern Co_{n-1} … Co_1 has 
not been encountered before (in terms 1..k-1): Again, I can safely use this 
control pattern without altering any completely mapped pairs, 
c) ∃ y = 1..n−1 | Ci_y ≠ Co_y and the output control pattern Co_{n-1} … Co_1 has 
been encountered before: This situation will occur only if the previously 
encountered completely mapped term (Fo) with the same control pattern has a 
zero in the jth position (the target trit), i.e. Fo_j(k) = 0, which makes To_j(k) ∈ {1, 2}.  
Trying to match To_j(k) to its corresponding input trit Ti_j(k) = 0 would surely alter 
the trit Fo_j(k), and hence, violate the bedrock rule of preserving previously 
completely mapped pairs. In the Foundation for proof above, stipulation 3 
guarantees that To_j(k) will not change on the first pass of low to high transitions, 
since I want to transform a {1 or 2} into a {0}.  However, as other trits within 
term k are transformed within the first pass from a lower to a higher value, the 
control pattern Co_{n-1} … Co_1 of the term k, on the second pass (high to low), is 
guaranteed to be different and to have larger trit values, which would, 
most likely, match minterms in some of the later bands.  Now that the control 
pattern of the kth term in pass II, {Co_{n-1} … Co_1}, is different from the control pattern of 
term Fo, I can safely alter To_j(k) without altering any previously mapped minterm 
(situation 4b above), and hence, remain in compliance with MMD's bedrock 
rule for convergence.  Figure 10-13 shows an example where the control values 
Co = 11x are skipped in pass 1, allowing trit 2 to transform 1 → 2, and on the 
second pass, a new control value Co = 12x is used to safely transform T0 (1 → 0). 
d) Upon full synthesis of the kth term, the output term will be identical to the input 
term. 
5) The second encounter of the control pattern Ci_{n-1} … Ci_1 will immediately follow in the
next consecutive band, at the mth term, with a value of one (1) in the target trit.  In the
case of a mismatch, the corresponding output target trit would surely be a two (2)
since the zero (0) has already appeared in step 4) above.  A controlled ⟦12⟧ operation
would safely do the transformation without affecting any completely mapped terms.
6) As a result of the above construction, it is clear that the bedrock lemma of never
altering a completely mapped pair holds true throughout the above process; and
hence, the algorithm will always converge for any Hasse compliant input sequence
constructed using the 0→1→2 precedence order.  This concludes the proof of
theorem 1. ■
 
Figure 10-13 Example for case 4c.  In an attempt to change 111 to 120, the two lower trits have to be
modified one at a time.  Were I to attempt changing the lower trit from 1 to 0 first, I would have had to use the
‘11’ control pattern, which was encountered before, and would have modified the first minterm (110). But
by always doing ‘low to high’ transitions first (pass 1), in this case the middle trit (from 1 to 2), the output
minterm changes so that new control values emerge which do not affect previously
completed minterms.  In pass 2, I can safely modify the lower trit from 1 to 0 since the new controlling
value (12) is guaranteed not to modify any previously completed minterms.
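The two-pass ordering of case 4c can be illustrated with a small Python sketch. It only tracks the order in which trits change (low-to-high changes in pass 1, high-to-low changes in pass 2); it does not model gates or control patterns. The function name `two_pass_steps` and the left-to-right visiting order within a pass are assumptions made for illustration, not part of the algorithm as stated above.

```python
def two_pass_steps(output_term, input_term):
    """List the intermediate terms when turning output_term into input_term.

    Pass 1 applies every low-to-high trit change, pass 2 every high-to-low
    change. ASSUMPTION: within a pass, trits are visited left to right.
    """
    current = list(output_term)
    steps = []
    for raising in (True, False):  # pass 1 (raise), then pass 2 (lower)
        for pos, target in enumerate(input_term):
            needs_raise = current[pos] < target
            needs_lower = current[pos] > target
            if (raising and needs_raise) or (not raising and needs_lower):
                current[pos] = target
                steps.append("".join(current))
    return steps


# The Figure 10-13 example: the output term 111 must become 120.
steps = two_pass_steps("111", "120")
print(steps)
```

For the Figure 10-13 example the sketch first raises the middle trit (111 to 121) and only then lowers the last trit (121 to 120), so the previously used ‘11’ control pattern is never needed for the high-to-low change.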
10.7 Convergence of Triangular Hasse Precedence Orders
Theorem 2: All input vectors concordant with the ternary 0→{1,2} and {0,1}→2
precedence orders will converge for all possible output permutations.
Proof: The theorem asserts that any input vector constructed through a Hasse diagram
with one of the triangular precedence orders 0→{1, 2} or {0, 1}→2 will converge for all
permutations of the output vector.  Figure 10-14 (b) shows the banded Hasse diagram for the
0→{1,2} precedence order.  Mathematically speaking, notice that the ternary values 1
and 2 belong to an equivalence class (see definition 14 above) as they both have equal
precedence over the zero but no precedence between them.  As a result, I could easily
select one of the two values to represent the behavior of the entire equivalence class as
shown in Figure 10-14 (a).  Arbitrarily designating the constant one (1) as the
representative of the equivalence class, I can remap the banded Hasse diagram to its
binary representation as shown to the right of Figure 10-14 (b).  For this example, the
ternary band number one {01, 10, 02, 20} simply becomes {01, 10}, and the set in band
two simply becomes {11}.  Since I have already proved convergence for reversible
circuit synthesis using a binary Hasse diagram in [22], I can conclude that all
ternary Hasse diagrams with the triangular precedence order 0→{1, 2} will also
converge.  Using the same reasoning for the last triangular order {0, 1}→2, the values
{0, 1} are members of the same equivalence class which, again, reduces the problem to the
binary Hasse diagram as seen above.  This concludes the proof of theorem 2. ■
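The reduction used in this proof can be sketched in Python. The sketch assumes that, under the 0→{1,2} order, the band of a minterm is its number of non-zero trits, which is what replacing both 1 and 2 by the representative value 1 yields; the helper names `triangular_bands` and `binary_image` are illustrative only.

```python
from itertools import product


def triangular_bands(n):
    """Group all n-trit minterms into bands under the 0→{1,2} order.

    ASSUMPTION: the band index is the number of non-zero trits.
    """
    bands = {}
    for trits in product("012", repeat=n):
        band = sum(t != "0" for t in trits)
        bands.setdefault(band, []).append("".join(trits))
    return bands


def binary_image(minterm):
    """Replace every non-zero trit with the class representative 1."""
    return "".join("0" if t == "0" else "1" for t in minterm)


bands = triangular_bands(2)
# Collapse each ternary band onto its binary equivalent.
reduced = {b: sorted({binary_image(m) for m in terms}) for b, terms in bands.items()}
print(bands[1], "->", reduced[1])
print(bands[2], "->", reduced[2])
```

For two variables, band one {01, 02, 10, 20} collapses to {01, 10} and band two collapses to {11}, which is the binary Hasse diagram referred to in the proof.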
 
Figure 10-14 (a) One of the triangular precedence orders ({0}→{1,2}) can be represented by the
binary equivalent class ({0}→{1}) since the {1,2} are of the same precedence level.  (b) The ternary Hasse
diagram based on this triangular precedence order can be reduced to the binary equivalent Hasse diagram
shown to the right.
10.8 Selection through Genetic Algorithm
A ternary function with n variables has 3^n minterms in the input vector, which makes
the number of possible permutations of an input vector an astounding (3^n)!.  Our method of
constructing convergent input vector sequences constructs a subset of all convergent
sequences.  According to Corollary 2, a ternary Hasse diagram consists of 2n+1 bands
where each band b consists of N(b) minterms (see the derivation in Section 10.12).
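$$N(b) = \sum_{j=0}^{\lfloor b/2 \rfloor} \binom{n}{m-j} \cdot \binom{n-(m-j)}{2j + (b \bmod 2)}, \qquad m = \lfloor b/2 \rfloor \tag{3}$$
where $\binom{\cdot}{\cdot}$ is the combination operator and mod is the modulo operator.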
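As a sanity check on equation 3, the following Python sketch evaluates the closed form (reading m as ⌊b/2⌋, as in the derivation of Section 10.12) and compares it with a brute-force count of minterms whose trits sum to the band number. The function names are illustrative and not part of the original work.

```python
from itertools import product
from math import comb


def band_size_closed_form(n, b):
    """Evaluate equation 3 with m = floor(b / 2)."""
    m = b // 2
    return sum(
        comb(n, m - j) * comb(n - (m - j), 2 * j + (b % 2))
        for j in range(m + 1)
    )


def band_size_brute_force(n, b):
    """Count the n-trit minterms whose trits sum to b."""
    return sum(1 for trits in product(range(3), repeat=n) if sum(trits) == b)


n = 5
sizes = [band_size_closed_form(n, b) for b in range(2 * n + 1)]
print(sizes)
```

For n = 5 the closed form gives 45 minterms in band 4 and 51 in band 5, matching the worked example in Section 10.12, and the band sizes add up to 3^5 = 243.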
In the process of constructing the input vector, step 0 of the pseudo-code in
section 10.5.3 above selects a single sequence out of all permutations of the minterms in a
band.  Consequently, the total number of possible input vectors is the product of the
numbers of permutations of all bands, stated as:
$$T(n) = \prod_{b=0}^{2n} N(b)! \tag{4}$$
 
Figure 10-15 Number of permutations for all possible input vectors (all possible solutions) vs. Hasse 
based sequences (valid solutions constructed by this algorithm). 
Clearly, as the number of variables increases (Table 10-2), the number of possible input
vectors generated by our algorithm still grows exponentially, albeit orders of
magnitude more slowly than the number of all possible input vectors (see Figure 10-15).  For a 3
trit function, I could easily examine all 6,494 Hasse based input vectors and select the
one which yields the best quantum cost.  Functions of 4 trits or more, however, quickly
grow beyond the capacity of our best computers.  Confronted with such daunting
roadblocks, and borrowing from our experience in the binary domain [24], I opted to
employ a genetic algorithm to construct potential input vector arrangements based on the
results of previously synthesized input vectors.
Table 10-2 All possible permutations vs. total permutations for Hasse based sequences, where all
possible solutions are shown in column 3, while the number of solutions constructed using the Hasse
structure is shown in column 4.
 
10.8.1 Objective function using Quantum Gate Count
In their analysis of the simple 2 trit function, the authors of [10] arbitrarily assigned a 
cost of one for unconditional ternary gates, and a cost of two for controlled ternary gates.  
Practically, however, there are no existing physical implementations of ternary quantum 
systems, and hence, no realistic cost could be assumed.  For the purposes of this chapter, 
I use the number of gates as the measure of fitness, or objective function, for the genetic 
algorithm.  For an arbitrary ternary quantum circuit C with k quantum gates, the quantum 
cost Qc is calculated as follows: 
$$Q_c = \sum_{i=1}^{k} G_{qc}(i)$$
where: G_qc(i) is the quantum cost for each gate which I assume to be one for the
purposes of our analysis.  Actual gate cost would be used once physical implementation
of ternary gates is realized.
10.8.2 Genetic Algorithm
Rather than bouncing randomly around the search space, a genetic algorithm follows a
set of directed probabilistic steps where new solutions are the offspring of existing good
solutions.  The following block exhibits the standard structure of a genetic algorithm:
g ← number of generations;
initialize(P(g));
do
    evaluate(P(g));                   // fitness (quantum cost) of each individual
    P1(g), P2(g) ← select(P(g));      // set of parent pairs
    g ← g - 1;
    P(g) ← recombine(P1(g), P2(g));   // crossover → children
    P(g) ← mutate(P(g));              // mutate children
while (g > 0);
The initialization step, initialize(P(g)), randomly creates a set of valid input vector
sequences (the initial population) using the ternary Hasse diagrams where, for each band,
the set of minterms is randomly shuffled, and the resulting band arrangement is
concatenated to the input vector under construction (step 0 of section 10.5.3 above).
Synthesis of the initial population gives the fitness (quantum cost) of each input vector
arrangement, which is used to determine the next generation (offspring) of solutions to
examine.  A roulette wheel selection process is then used to randomly select two parents from
the current generation for recombination.  For this research, I studied both single and
double crossover operators to create the next generation, with special consideration for
the position of the crossover, discussed shortly.  The final step of the genetic algorithm
applies a mutation operator in order to continuously maintain population diversity and
avoid premature convergence to local minima.
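The initialization and selection steps described above can be sketched in Python. The sketch builds bands by trit sum (the 0→1→2 order), shuffles each band independently and concatenates them. The roulette wheel weighting of 1/cost is an assumption made here so that cheaper circuits are favoured; the text does not state how cost is mapped to selection probability, and all function names are illustrative.

```python
import random
from itertools import product


def hasse_bands(n):
    """Bands of the 0→1→2 order: band b holds minterms whose trits sum to b."""
    bands = [[] for _ in range(2 * n + 1)]
    for trits in product(range(3), repeat=n):
        bands[sum(trits)].append("".join(map(str, trits)))
    return bands


def random_input_vector(n, rng):
    """Shuffle each band independently, then concatenate in band order."""
    vector = []
    for band in hasse_bands(n):
        shuffled = band[:]
        rng.shuffle(shuffled)
        vector.extend(shuffled)
    return vector


def roulette_select(population, costs, rng):
    """Pick two parents with probability proportional to 1/cost.

    ASSUMPTION: inverse cost is used as the fitness weight. Selection is
    with replacement, so both parents may be the same individual.
    """
    weights = [1.0 / c for c in costs]
    return rng.choices(population, weights=weights, k=2)


rng = random.Random(0)
population = [random_input_vector(2, rng) for _ in range(4)]
costs = [12, 9, 15, 10]  # hypothetical gate counts, for illustration only
parents = roulette_select(population, costs, rng)
```

Every vector produced this way contains each minterm exactly once and keeps the bands in ascending order, which is the property the convergence proof relies on.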
10.8.3 Genotype and Valid Operators
As discussed earlier and shown in [22, 23], the band structure, defined above, must be 
faithfully preserved in order to assure algorithmic convergence.  As a result of the banded 
structure of the algorithm, recombination operators are limited in their application to the 
boundaries of a band in order to avoid a minterm jumping from one band to another.  In a 
similar fashion, mutation operators are constrained to swapping minterms intra-band 
which will also preserve the certitude of convergence. 
Figure 10-16 illustrates the structure of a chromosome, i.e. input vector arrangement,
for a two variable ternary function consisting of five bands.  As hinted earlier, in order to
ensure that an offspring is a valid input vector sequence, the crossover point(s) must
occur at either end of a band, but not in the middle of a band.  Had the invalid crossover
point been taken in Figure 10-16, the resultant child would have been invalid as it would
have included the minterm 01 twice and lacked the term 10.  Of course a repair process
could have detected and corrected such a defect which, depending on the repair process,
could yield a different, yet valid, input vector.  The reader might correctly surmise that
the choice of limiting crossover to band boundaries could potentially result in stale
members within each band, leading to premature convergence to local minima.  In
general, genetic algorithms introduce mutations as a remedy for premature convergence,
where mutation typically acts as a background operator with a low probability of
occurrence.  For this study, however, I intentionally elevated the probability of applying
the mutation operator to a level higher than suggested by standard genetic algorithms, in
order to counteract the limitations imposed on the recombination operator (band
boundary only).  A high level of mutation probability, I theorized, would inject diversity
within children, allowing them to escape such a hasty race to the nearest local minima.
 
Figure 10-16 Genotype of a valid input sequence showing valid and invalid mutations and cross over 
operations.  A valid crossover can only occur at the band boundary as stipulated by the Hasse diagram.  
Valid mutations, in this case swap, can only swap elements within the band to assure that the Hasse order 
is not violated 
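A minimal Python sketch of the two band-preserving operators follows, for the two variable case with five bands. The band sizes are derived from the trit-sum banding; the function names, the random choice of the crossover boundary and the two example parents are assumptions made for illustration.

```python
import random
from itertools import accumulate

BAND_SIZES = [1, 2, 3, 2, 1]  # two-trit function: bands 0..4 by trit sum


def band_boundaries(sizes):
    """Interior indices at which one band ends and the next begins."""
    return list(accumulate(sizes))[:-1]


def single_crossover(parent_a, parent_b, sizes, rng):
    """Cut both parents at the same randomly chosen band boundary."""
    cut = rng.choice(band_boundaries(sizes))
    return parent_a[:cut] + parent_b[cut:]


def swap_mutation(vector, sizes, rng):
    """Swap two minterms inside one band that has at least two members."""
    starts = [0] + band_boundaries(sizes)
    candidates = [(s, n) for s, n in zip(starts, sizes) if n >= 2]
    start, size = rng.choice(candidates)
    i, j = rng.sample(range(start, start + size), 2)
    child = vector[:]
    child[i], child[j] = child[j], child[i]
    return child


# Two hypothetical valid parents (same bands, different intra-band order).
parent_a = ["00", "01", "10", "02", "11", "20", "12", "21", "22"]
parent_b = ["00", "10", "01", "20", "11", "02", "21", "12", "22"]
rng = random.Random(1)
child = single_crossover(parent_a, parent_b, BAND_SIZES, rng)
child = swap_mutation(child, BAND_SIZES, rng)
```

Because both parents hold the same band at the same positions, cutting at a band boundary can never duplicate or drop a minterm, and an intra-band swap leaves every minterm in its band.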
10.9 Experimental Results
For the purposes of this study, I have limited the experiments to the set of Hidden
Weighted Trit (HWT) benchmark functions, which were introduced into the literature for
the first time in my papers [26, 27].  HWT functions are an extension of their binary
counterpart, the Hidden Weighted Bit (HWB) functions, which were first introduced by
Prasad et al. [97] and are heavily cited as one of the harder benchmarks for reversible
binary logic synthesis [21, 24, 23, 20].
Definition 10-11: Hidden Weighted Trit (HWT) functions are reversible ternary 
functions where the output minterm is generated by circularly shifting the input minterm 
by the number of its non-zero trits.  
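Definition 10-11 can be turned into a short Python sketch that generates the HWT truth table. The definition does not state the direction of the circular shift, so the left rotation used below is an assumption, as are the function names.

```python
from itertools import product


def hwt(minterm):
    """Circularly shift a trit string by its number of non-zero trits.

    ASSUMPTION: the shift is a left rotation; the direction is not
    specified in the definition.
    """
    n = len(minterm)
    shift = sum(t != "0" for t in minterm) % n
    return minterm[shift:] + minterm[:shift]


def hwt_truth_table(n):
    """Map every n-trit input minterm to its HWT output minterm."""
    inputs = ["".join(t) for t in product("012", repeat=n)]
    return {x: hwt(x) for x in inputs}


table = hwt_truth_table(3)
print(table["100"], table["120"], table["222"])
```

A circular shift does not change the number of non-zero trits, so shifting back by the same amount recovers the input; this is why the function is reversible regardless of the rotation direction chosen.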
For the sake of a balanced comparison, I used the same synthesis algorithm for both
the natural and Hasse based input sequences and the same method for calculating the
cost.  The only independent variable, in this case, is the input vector arrangement, which
represents the crux of the experiment reported in this chapter.  The use of a genetic
algorithm is merely an aid for discovering Hasse based input vectors with lower circuit
cost.
Table 10-3 Comparison between using the natural order of the input vector vs. using the Hasse 
structure to construct valid input vector arrangements.  As the search space increases, the probability of 
discovering better solutions decreases while the time required to discover such solutions increases 
drastically. 
 
Table 10-3 shows the results of synthesis of the HWT functions of 4 to 9 trits using
the natural and the Hasse based input vector ordering.  Clearly the Hasse based ordering
has produced better results for all functions, with savings ranging from 60% for the HWT-4
function down to 7% for the HWT-9 function.  Of course the results should not be surprising, as
the genetic algorithm processed a total of 600,000 arrangements of the input vector
consisting of 100 generations of 100 individuals each for two variants of recombination
methods (single & double crossover), and 30 combinations of probabilities of
recombination and mutation.  The fact that I am able to freely construct convergent input
vector sequences, at our whims, is the strong point of this algorithm, and hence, our main
contribution to the research area.  Notice that the percentage of savings shrinks
dramatically as the number of variables increases, which can be easily explained with a
quick glance at Table 10-3.  Even though the search space is greatly reduced with the
Hasse based algorithm, a nine trit function has a search space of the order of 10^9,615, for
which an exploration of 600,000 elements is like a drop in a colossal ocean.
To exacerbate matters further, the time to synthesize functions with a larger number of
variables increases exponentially.  Although our implementation of the genetic algorithm
took advantage of multithreading on an 8 core Intel® i7 processor, the nine variable
function consumed more than two hours to yield a 7% improvement by visiting 600,000
solutions.  Figure 10-17 demonstrates an exponentially increasing curve depicting time vs.
number of variables for the class of HWT functions.  For a four trit function, the 600,000
visited solutions represent 6x10^-10 % of the Hasse based search space, where a saving of
60% is a great achievement.  For a nine variable function, however, covering a similar
ratio of the search space requires visiting close to 1x10^9606 potential solutions, which is
beyond the possibilities of all existing computing power on earth.
 
Figure 10-17 Time required to synthesize the family of Hidden Weighted Trit functions. 
10.10 Acceleration with CUDA
In order to reduce the time required for finding Hasse sequences with the lowest
cost, several experiments using GPU-CPU mapping have been performed. The
results of these experiments are shown in Table 10-4 and Table 10-6 for the HWT5 and
HWT6 functions respectively. The columns in both tables are, in order: case
is the type of experiment, cores is the number of cores (CPU or GPU) the
algorithm is running on, repetitions is the number of times the sequences are computed,
samples is the number of different sequences computed, Total Time is the time required
for the whole computation, and Time/Sample is the unit time required to compute and
evaluate a single sequence.
[Figure 10-17 plot: vertical axis Time (sec), 0 to 8000; horizontal axis hwt4, hwt5, hwt6, hwt7, hwt8, hwt9.]
Table 10-4 Performance times for the HWT5 function, which show that CUDA, at full thread
capacity, is able to execute at 55 microseconds per sample compared to 304 microseconds for the CPU.
 
The first case in Table 10-4 and Table 10-6 shows the Genetic Algorithm running on
an Intel i7 960 processor with 8 cores at 3.2GHz each. The system has 12 GByte of DDR3
memory. The GA runs through 72 sets of parameters of 300 generations each, where each
generation consists of 512 individuals. For HWT5, I realize a 317% speedup on the GPU relative to
the CPU.  For the HWT6 function, however, such a speedup diminishes to a mere 10%
advantage for the GPU.  This big difference in performance, and the decrease in
performance when 512 cores are compared to 1024 cores, is a result of the transfer time
required to send the sequences from the CPU to the GPU.  This can be confirmed by
observing that the time/sample remains unchanged between the two scenarios (i.e., the
processing happens at the same speed).  The reason for observing a speedup for HWT5 is
that the algorithm is CPU bound, and having more CPU cores helped demonstrate the
value of CUDA (when we are CPU bound).  The smaller speedup for HWT6 arises because the
difference in the number of minterms to be synthesized between the five and six variable
functions is very large, which takes a lot of time to transfer between the CPU and GPU.
As a result, the GPU based approach is ideal when the amount of data transfer between
the host and GPU is minimal.
Table 10-5 shows the distribution of work between CPU and GPU (column 2) as
compared to a CPU only approach (column 1).
Table 10-5 Structure of the GA on both CPU and CUDA implementations.
 
In order to understand the detailed operation of the algorithm, I ran a few experiments to
verify our conclusion about data transfer issues impeding performance.  Cases 1 and 2 in
Table 10-4 and Table 10-6 show the results for running a single thread on the GPU, where the
loop over the available sequences is implemented directly in the GPU.  In case 1, a
single sample is fed to a single CUDA thread, and the same sample is synthesized 1000
times.  Case 2 is the same, with the addition of synchthreads() outside the loop and only
for 100 samples. They both gave essentially the same results for synthesizing a single
sample.  From this experiment I conclude that the GPU did not require explicit
synchronization with the host and that our time measurement is accurate, in that I am
measuring the time it takes the GPU to compute all sequences.
In case 3 I placed the loop on the CPU and made the call to CUDA 1000 times,
each time calling a single thread to do the synthesis. As expected, adding a second thread
(effectively synthesizing two sequences) did not affect the time, as both CUDA threads
are working in parallel. Similarly, running 512 threads (the number of cores on a single
device) took the same time as a single thread.  This particular configuration
shows the efficiency and the usefulness of the GPU acceleration. The pseudo code of
case 3 is shown below:
CPU: for 1000 times
    CUDA: Synthesize(sample)
Case 4 is the same algorithm as Case 3, where 1024 sequences are fed to two GPU
devices (two distinct graphics cards), each with 512 threads.  The loop is repeated 512
times.  For the HWT5 function, the time per sample remained about the same.  For the
HWT6 function, however, the time per sample dropped by almost 30%, indicating issues
related to data transfer.
Table 10-6 Performance times for the HWT6 function where, at full capacity, CUDA takes 376
microseconds per sample compared to the CPU at 645 710 microseconds per sample.  The CUDA speedup
is lower than for the HWT5 function since the dataset for HWT6 is larger, forcing some of the data to reside
in CUDA memory buffers shared amongst multiple threads.  For the HWT5 function, all the data was able
to fit in local memory buffers, each exclusively dedicated to one processor core.
In order to get both devices to work in parallel, a special memory mode in the host
memory was used.  It is called the pinned host memory mode, which page locks a region
of memory on the host and makes it visible to the CUDA device.  The locked memory
holds the input and output sequences while the results were kept in CUDA’s local
memory.  I ensured that the algorithm on the CUDA device avoids memory conflicts
between threads by copying the input/output vectors to local memory, and by allocating the
output buffers in local memory as well.  When shared memory was used for the output
buffers, it took much longer to synthesize the sequences due to bank conflicts between
the threads trying to read (and especially write) to the shared memory.
Notice that cases 1 and 2 produced unexpected results. I expected that placing the loop
inside the CUDA core would yield the best results; however, I measured a 100%
degradation in performance when the loop was placed inside CUDA compared to
placing it on the CPU (case 3).  I observed the same anomaly for both the HWT5 and
HWT6 functions.
Finally, to summarize the results of the GPU acceleration, considerable acceleration of
circuit computation is achieved if the following conditions are satisfied:
• Minimize the CPU-to-GPU data transfer,
• Minimize the GPU global-to-local memory transfer,
• Split the data so that different running GPU cores do not obstruct each other by
blocking the global memory access.
10.11 Conclusion and Analysis
In this chapter I compared the synthesis process using a natural ternary arrangement of 
the input vector versus a subset of all possible arrangements and successfully 
demonstrated the benefit of the latter.  In the process, I introduced a synthesis algorithm 
capable of synthesizing any arbitrary ternary function with a large number of variables.  
Since the option of exploring the entire search space is unfeasible, our unique method of 
constructing input vector arrangements, with guaranteed convergence, becomes an 
essential component of the search algorithm. 
Building a Hasse based ternary structure provides protection against the trap of
control line blocking and allows our synthesis algorithm to access any element within the
limited Hasse based search space in a random manner.  Still, rather than randomly
hopping throughout the search space, I constructed new sets of candidates (offspring)
based on the best solutions found up to that point.  I have shown in section 6.5 that genetic
algorithm and Tabu search have consistently resulted in better quantum cost, compared to
random selection, and I believe that it is the case here as well.  The search space for the
ternary domain grows at a larger magnitude than in the binary domain, which makes it
even harder for a random selection method to discover better solutions.  One question that
has not been answered by our work in the ternary domain, however, is whether
partitioning the search space, similar to our CSP algorithm, would yield better results for
different partitions.
Although the CUDA performance for the hwt5 and hwt6 functions has shown
superior results compared to the CPU, the benefit in performance is only marginal
considering that 1024 processing cores were used compared to the 8 cores of the CPU.
Considering that a CUDA core runs at 1GHz compared to the CPU core at 3.7 GHz, I had
expected a performance boost of 30 to 50 times what the CPU can do.  Both the CPU and
CUDA devices utilize DDR3 memory at 3GHz, so memory speed does not explain the slowdown.
Memory contention is always an issue with any multiprocessing algorithm, so I made
sure that for the CUDA implementation, every core has access to completely separate
memory spaces, including constants and stack.  Since CUDA provides three memory
levels, local, shared and global, I made sure to fit data as close to the processing core as
possible (local first, shared next, then global).  For the hwt5 function, the entire dataset was able
to fit within the local memory, which is exclusive to the specific core, which resulted in a
decent speedup compared to the CPU.  However, for the hwt6 function, the shared
memory had to be used in order to keep the data as close to the processing core as
possible.  The shared memory, however, is slower and requires more time to execute.
Of course, the CPU L1 and L2 cache infrastructure is superior (in size and speed) to
what CUDA provides, and the entire dataset for both functions can easily fit within the
L1 cache.  Even though the CPU is running only 8 cores, its access to the L1 cache is
so fast that it makes up the time when it faces off with 1024 CUDA cores.
10.12 Derivation of Equation
Equation 3 calculates the number of minterms in each band in a ternary Hasse diagram
using the precedence order 0→1→2.  According to definition 8, for any minterm in a
band (b), the mathematical sum of its trits equals the band number.  To demonstrate the
derivation of equation 3, shown below, I will first start with an example where, for a
function of five variables, I calculate the number of minterms at band levels 4 and 5.
Notice that, according to corollary 1, a function with five variables has 11 bands (0 to
10).
$$N(b) = \sum_{j=0}^{\lfloor b/2 \rfloor} \binom{n}{m-j} \cdot \binom{n-(m-j)}{2j + (b \bmod 2)}, \qquad m = \lfloor b/2 \rfloor \tag{3}$$
Band 4 (even): To calculate the number of elements in a band, I find all combinations
where the sum of digits is equal to 4, which is the sum of the following arrangements:
j | Arrangement | Total
2 | (Combination of selecting 2 twos of 5 variables)∙(Combination of selecting 0 ones of 3 variables) | $\binom{5}{2} \cdot \binom{3}{0} = 10$
1 | (Combination of selecting 1 twos of 5 variables)∙(Combination of selecting 2 ones of 4 variables) | $\binom{5}{1} \cdot \binom{4}{2} = 30$
0 | (Combination of selecting 0 twos of 5 variables)∙(Combination of selecting 4 ones of 5 variables) | $\binom{5}{0} \cdot \binom{5}{4} = 5$
The three arrangements add up to N(4) = 10 + 30 + 5 = 45 minterms.
 
Band 5 (odd): Similarly, I find all combinations where the sum of digits is equal to 5,
which is the sum of the following arrangements:
j | Arrangement | Total
2 | (Combination of selecting 2 twos of 5 variables)∙(Combination of selecting 1 ones of 3 variables) | $\binom{5}{2} \cdot \binom{3}{1} = 30$
1 | (Combination of selecting 1 twos of 5 variables)∙(Combination of selecting 3 ones of 4 variables) | $\binom{5}{1} \cdot \binom{4}{3} = 20$
0 | (Combination of selecting 0 twos of 5 variables)∙(Combination of selecting 5 ones of 5 variables) | $\binom{5}{0} \cdot \binom{5}{5} = 1$
The three arrangements add up to N(5) = 30 + 20 + 1 = 51 minterms.
 
In equation 3, with m = ⌊b/2⌋, a band b in an n variable function has m + 1 terms which are added
together ($\sum_{j=0}^{\lfloor b/2 \rfloor}$).  For each index j = 0..m, I first select the (m-j) digits of twos out of n
possible digits, $\binom{n}{m-j}$.  Since the sum of digits at each band equals the band level b,
the remainder must be selected from digits which have a value of one.  The total number
of such digits would be:
Number of ones = b − 2(m−j) = b − 2⌊b/2⌋ + 2j = 2j + (b mod 2)
Since I have already used (m-j) digits for selecting the twos (in the first term above), I
have n-(m-j) digits to select the ones from.  Consequently, the second term of equation 3,
$\binom{n-(m-j)}{2j + (b \bmod 2)}$, represents the combination of selecting 2j + (b mod 2) ones out of n − (m−j) digits.
Notice that when the band number b is even, b mod 2 = 0, while for an odd band number
b, the term b mod 2 = 1, which essentially selects an additional one out of the
remaining digits. ■
 
Chapter 11 MV Benchmarks and Extensible Quantum Specification (XQS)
11.1 Introduction
Existing literature in multiple-valued reversible circuit synthesis tends to focus on
ternary circuits with a rather small number of variables using an array of different
specifications [13],[16], [8], [17], [14], [10], [19], [15], [9], [18], [20], [23], [11],[12], 
[22]. In turn, it becomes hard, if not impossible, to formulate an objective comparison 
amongst various algorithms. Binary logic synthesis, on the other hand, has an established 
set of benchmark functions which are commonly used by most researchers to measure the 
performance of their algorithms. Some of the most common binary functions are 
supported by online resources such as RevLib [24] and Maslov’s Benchmark 
Specifications [5] where quantum specifications of different classes along with some of 
the best discovered circuits are available. 
Binary reversible functions and circuits already use the standard Programmable Logic 
Array (PLA) file format to describe the specification as input/output pairs and the set of 
gates which represent the synthesized circuit. Both RevLib and Maslov’s websites 
contain various reversible benchmark functions using the PLA format and some functions 
are specified with a truth table. RevLib, for example, uses the truth table format (called 
SPEC) for specifying embedding of irreversible functions, while the PLA format is used 
for specifying larger functions [24]. Extending the classical logic PLA specification to 
quantum binary logic has proven successful since the level and complexity of 
computation is somewhat similar to their classical binary counterparts. 
   
Delving into multiple-valued computation, however, increases the complexity of
representing functions. It is feasible for the quantum unit of computation, the qubit, to
utilize multiple bases of computation within a single implementation, e.g., hybrid
quantum circuits (HQC). It is also feasible to design a circuit where different qubits
utilize different bases of computation throughout the circuit. The PLA specification is
not well suited for representing multiple-valued functions with such complexity and
would require a major overhaul in order to do so.
In this chapter I present a foundational framework for an initial set of multiple-valued 
benchmark functions which will standardize the yardstick for measuring performance 
amongst algorithms in the MV domain. I also propose a new file format that is designed 
specifically for multiple-valued and hybrid functions. The new eXtensible Quantum 
Specification (XQS) is based on the universal YAML file format [6] which is a human 
friendly data serialization standard for modern programming languages. The YAML file 
format, and hence XQS, allows for encapsulation of data structures (name spacing), 
inheritance, expressivity and extensibility. YAML is supported by the majority of modern 
programming languages where packaged libraries for the specific language are available 
on the Internet [6]. Both benchmark functions and XQS specification are available on [4]. 
11.2 Reversible Multiple-Valued Logic Functions
A reversible multiple-valued function is a bijection B : I àO from Nk to Nk, with N 
being the number of inputs (outputs) and k is the radix of computation basis.  Table 11-1 
shows the truth table of a two variable multiple-valued reversible function of radix 3 (i.e., 
ternary).  Similar to their binary counterparts, quantum  
   
 216 
Table 11-1 Example of multiple-valued (ternary) reversible function where AB are the inputs and PQ 
are the outputs. 
AB PQ 
00 00 
01 01 
02 02 
10 10 
11 11 
12 12 
20 22 
21 21 
22 20 
circuits for multiple-valued specifications are constructed from a set of reversible 
quantum logic gates which are capable of evolving the state of the qubit between the MV 
states (based on the radix of operation).  For example, a ternary quantum gate transitions 
the state of the qubit amongst the states $|0\rangle$, $|1\rangle$, and $|2\rangle$; therefore, 
these gates are referred to as multiple-valued reversible logic gates.  Figure 11-1 shows 
the quantum circuit (in this case a single quantum gate) that realizes the function from 
Table 11-1.  The gate shown in the figure is the Controlled-[02] and belongs to the class 
of Controlled-U (C-U) reversible logic gates.  The C-U family is a well-known class of 
quantum logic gates which have been physically implemented and demonstrated for both 
binary and ternary logic.  In Boolean quantum circuits, the C-U gate applies the function 
U on qubit B if the control state is $|A\rangle = |P\rangle = |1\rangle$, hence 
$|Q\rangle = U|B\rangle$.  Otherwise, the value of qubit $|B\rangle$ is unchanged, 
$|Q\rangle = |B\rangle$.  The Controlled-U set of gates for the ternary computational 
basis was introduced in [21]. 
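As an illustration, the truth table of Table 11-1 can be checked against the Controlled-[02] behaviour described above. This is a minimal sketch; the names `TABLE_11_1`, `controlled_02` and `is_bijection` are hypothetical helpers introduced only for this example.

```python
from itertools import product

# Table 11-1 as a mapping from input digits (A, B) to output digits (P, Q).
TABLE_11_1 = {
    (0, 0): (0, 0), (0, 1): (0, 1), (0, 2): (0, 2),
    (1, 0): (1, 0), (1, 1): (1, 1), (1, 2): (1, 2),
    (2, 0): (2, 2), (2, 1): (2, 1), (2, 2): (2, 0),
}

SWAP_02 = {0: 2, 1: 1, 2: 0}  # the single-qubit [02] permutation


def controlled_02(a, b, radix=3):
    """Apply [02] to the target b only when the control a equals radix - 1."""
    return (a, SWAP_02[b]) if a == radix - 1 else (a, b)


def is_bijection(table):
    """A truth table is reversible when its outputs are a permutation of its inputs."""
    return sorted(table.values()) == sorted(table.keys())


inputs = list(product(range(3), repeat=2))
assert is_bijection(TABLE_11_1)
assert all(controlled_02(a, b) == TABLE_11_1[(a, b)] for a, b in inputs)
```

The check passes for all nine input patterns, so the table is both reversible and equal to a single Controlled-[02] gate whose control fires on the value 2.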
  
   
 
Figure 11-1 Example of multiple-valued reversible function represented as a circuit.  In this 
illustration, the A=P qubit is used as a controlling value for the gate [[02]].  When A is set, the [[02]] gate 
is active, otherwise, it is a passthrough. 
Definition 12-1: (Multiple-Valued Controlled-U Gates) The multiple-valued 
controlled-U gates are activated by the highest value of the radix: when 
$|A\rangle = |P\rangle = |r-1\rangle$ (with r being the radix of the control qubit), the 
target qubit is modified according to $|Q\rangle = U|B\rangle$. Such a function is 
illustrated in Table 11-1; the target qubit is modified using a multiple-valued operator 
so that $|22\rangle = U|20\rangle$ and $|20\rangle = U|22\rangle$. The matrix of this U 
operation is shown in eq. 12.1(a). 
As can be expected, because of the higher radix in multiple-valued quantum reversible 
functions, the number of available operators is larger than in the binary case.  For 
instance, while in the binary case only one CNOT gate exists (Table 11-2(d)), in ternary 
circuits at least three equivalent gates exist (Table 11-2(a-c)). 
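Table 11-2 Matrix representation of ternary (a-c) and binary (d) quantum gates (eq. 12.1). 
$$[02] = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \quad \text{(a)}$$
$$[12] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \quad \text{(b)}$$
$$[01] = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{(c)}$$
$$\mathrm{CNOT} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{(d)}$$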
   
Definition 12-2: (Multiple-Valued Gates) For multiple-valued logic with basis 
states of radix r, the number of permutative unitary quantum gates equals the number of 
orderings (permutations) of the set {0, 1, 2, …, r − 1}; that is, the number of radix-r 
permutation gates is r!. 
For example, the set of permutations for binary gates, r = 2, is the 2-element set 
{(0, 1), (1, 0)}, whose members represent the wire and the inverter respectively.  
Likewise, the set of ternary values {0, 1, 2} can be fully permuted into the six elements 
of the 3-tuple set {(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)}, which 
represent the single-qubit ternary gates shown symbolically in Table 11-3. 
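The counting argument can be reproduced with a few lines of Python. This is only a sketch: each gate is written as a tuple whose entry i is the output for input value i, and the helper names `permutation_gates` and `is_self_inverse` are introduced here for illustration.

```python
from itertools import permutations
from math import factorial


def permutation_gates(radix):
    """Return every single-qubit permutation gate as a tuple of output values.

    Entry i of a tuple is the output produced for input value i.
    """
    return list(permutations(range(radix)))


def is_self_inverse(gate):
    """True when applying the gate twice restores every input value."""
    return all(gate[gate[x]] == x for x in range(len(gate)))


binary = permutation_gates(2)
ternary = permutation_gates(3)
print(binary)                         # [(0, 1), (1, 0)]
print(len(ternary) == factorial(3))   # True
print([g for g in ternary if is_self_inverse(g) and g != (0, 1, 2)])
# [(0, 2, 1), (1, 0, 2), (2, 1, 0)]
```

The three self-inverse tuples printed last correspond to the [12], [01] and [02] gates; the remaining two non-identity permutations are the cyclic shifts.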
This increase in resources in multiple-valued circuit design, as compared to the 
binary case, requires extra care in the design process, because the selection of a 
particular set of operators is in fact a selection of a specific algebra.  The ternary 
operators described here are an example of the Galois Field algebra, which defines 
arithmetic modulo 3 [7].  
   
Table 11-3 Generalized ternary gates based on Galois Field 3 (GF3).  The first column gives 
the gate name and the second the mathematical equation for calculating the output from the 
current input.  The third column shows the output for the three possible inputs listed in 
the header, and the last column shows the gate symbol. 
 
In the binary case, the number of operators (gates) is increased when the qubit is 
placed into a state of superposition with a V gate, where the qubit is effectively 
halfway between the states $|0\rangle$ and $|1\rangle$.  For instance, the binary [CNOT] 
gate has the equivalent decomposition [CNOT] = [V][V] = [V†][V†], where the application 
of a single V or V† gate rotates the qubit into the superposition state.  The V gate 
applies a rotation of 90° in one direction while the V† gate applies a rotation of 90° in 
the opposite direction around the qubit real axis, as envisioned on the Bloch sphere.  
A similar decomposition can be envisaged for multiple-valued gates.  For example, the 
ternary gates of eq. 12.1 above can be decomposed into V and V† gates as shown in 
eq. 12-2 below:  
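Table 11-4 Superposition gates for self-inverting ternary (a-c) and binary (d) logic (eq. 12.2). 
$$V_{02} = \sqrt{[02]} = \begin{pmatrix} \frac{1+i}{2} & 0 & \frac{1-i}{2} \\ 0 & 1 & 0 \\ \frac{1-i}{2} & 0 & \frac{1+i}{2} \end{pmatrix} \quad \text{(a)}$$
$$V_{12} = \sqrt{[12]} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1+i}{2} & \frac{1-i}{2} \\ 0 & \frac{1-i}{2} & \frac{1+i}{2} \end{pmatrix} \quad \text{(b)}$$
$$V_{01} = \sqrt{[01]} = \begin{pmatrix} \frac{1+i}{2} & \frac{1-i}{2} & 0 \\ \frac{1-i}{2} & \frac{1+i}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{(c)}$$
$$V = \sqrt{\mathrm{CNOT}} = \begin{pmatrix} \frac{1+i}{2} & \frac{1-i}{2} \\ \frac{1-i}{2} & \frac{1+i}{2} \end{pmatrix} \quad \text{(d)}$$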
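The square-root property of these gates can be checked numerically. The sketch below assumes that the entries of the [V02] matrix in eq. 12-2(a) are (1 + i)/2 and (1 − i)/2, as in the standard binary V gate; the helpers `matmul`, `dagger` and `close` are introduced only for this example.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-12):
    """Element-wise comparison with a small numerical tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


p, m = (1 + 1j) / 2, (1 - 1j) / 2          # assumed entries of eq. 12-2(a)
V02 = [[p, 0, m], [0, 1, 0], [m, 0, p]]
GATE_02 = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

assert close(matmul(V02, V02), GATE_02)                   # V * V = [02]
assert close(matmul(dagger(V02), dagger(V02)), GATE_02)   # V† * V† = [02]
assert close(matmul(V02, dagger(V02)), IDENTITY)          # V is unitary
```

Under that assumption, two applications of [V02] (or of its adjoint) reproduce the [02] permutation, while [V02] followed by its adjoint is the identity.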
 
Definition 12-2 states that the number of possible permutative quantum gates, using the 
radix r as the basis of computation, is r!, which grows factorially with r.  For the 
binary basis there exists a single inverter, represented by the NOT gate (Table 
11-2(d)).  In the ternary computation basis, however, there exist three self-inverters 
([01], [02], and [12]), each of which yields the identity matrix when two copies are placed 
back to back (self-reversible gates), Figure 11-2(a-c).  The other two ternary gates ([+1] 
and [+2]) are complementary inverters, where placing one of each back to back restores the 
original state, Figure 11-2(d).  These last two gates can be implemented with the 
self-inverting gates where, in Figure 11-2(e), the gate [+1] = [12]·[02] and the gate 
[+2] = [02]·[12].  As mentioned above, in the binary basis of computation the 
square-root-of-NOT gates, V and V†, produce quantum states of superposition 
(eq. 12-2(d)), which rest halfway between the $|0\rangle$ and $|1\rangle$ basis states. 
Similarly, it is feasible to define an equivalent set of gates for the ternary basis of 
computation, which suspend the qubit in a superposition state between the three basis 
states $|0\rangle$, $|1\rangle$, and $|2\rangle$.  Notice that, in this case, the square 
root of a ternary quantum gate creates superposition between coefficients that are not on 
the diagonal of the unitary matrix representing the gate.  Thus, the matrix representation 
for $[V_{02}] = \sqrt{[02]}$, shown in eq. 12-2(a), creates a superposition for the input 
states $|0\rangle$ and $|2\rangle$, while if the input state is $|1\rangle$ the output is 
an observable deterministic quantum state.  However, the gates [+1] and [+2] create a 
different type of superposition, given by the unitary matrices shown in eq. 12-3.  
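Table 11-5 Ternary square root of the [+1] and [+2] gates (eq. 12.3). 
$$V_{+1} = \sqrt{[+1]} = \begin{pmatrix} \frac{2}{3} & -\frac{1}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & \frac{2}{3} \end{pmatrix} \quad \text{(a)}$$
$$V_{+2} = \sqrt{[+2]} = \begin{pmatrix} \frac{2}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & -\frac{1}{3} & \frac{2}{3} \end{pmatrix} \quad \text{(b)}$$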
 
 
Figure 11-2 Operation of ternary inverters: (a, b, c) show the three self-inverters, while 
(d) shows the two complementary inverters (+1 and +2). (e) shows the substitution of +1 by 
(12, 02) and +2 by (02, 12), where the middle 02 gates cancel one another and then the 12 
gates cancel one another, yielding identity; hence, +1 and +2 are complementary inverters.   
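The inverter relations can be verified by composing the gates as permutations. The sketch below assumes that a product such as [12]·[02] is read like a matrix product, with the rightmost gate acting first; `compose` is a hypothetical helper written for this example.

```python
def compose(*gates):
    """Compose permutation gates written as an operator product.

    compose(g, f) applies f first and then g, mirroring matrix multiplication.
    """
    def apply(x):
        for gate in reversed(gates):
            x = gate[x]
        return x
    return tuple(apply(x) for x in range(len(gates[0])))


IDENT = (0, 1, 2)
G01, G02, G12 = (1, 0, 2), (2, 1, 0), (0, 2, 1)
PLUS1 = tuple((x + 1) % 3 for x in range(3))  # (1, 2, 0)
PLUS2 = tuple((x + 2) % 3 for x in range(3))  # (2, 0, 1)

# Each self-inverter cancels itself when placed back to back.
assert all(compose(g, g) == IDENT for g in (G01, G02, G12))
# The cyclic shifts decompose into two self-inverters.
assert compose(G12, G02) == PLUS1
assert compose(G02, G12) == PLUS2
# [+1] and [+2] undo one another in either order.
assert compose(PLUS1, PLUS2) == IDENT == compose(PLUS2, PLUS1)
```

If the product were instead read left to right (circuit order), the two decompositions would exchange roles, but the cancellation argument of Figure 11-2(e) would be unchanged.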
These quantum gates have properties similar to those of V and V†: for instance, [V02] 
and [V02]† satisfy: 
[V02][V02] = [V02]†[V02]† = [02],    [V02][V02]† = I                        (eq. 4) 
Finally, a consequence of the greater variety of multiple-valued reversible gates is 
that the possible functions can be more complex and can be represented in more complex or 
more user-convenient manners. For instance, a ternary majority function can take the 
three values 0, 1 and 2 to express the comparison of the output value to a particular 
threshold t such that: 
𝑚𝑎𝑗! 𝑎, 𝑏, 𝑐 =    0 𝑖𝑓  𝑚𝑎𝑗! 𝑎, 𝑏, 𝑐 < 11 𝑖𝑓  𝑚𝑎𝑗! 𝑎, 𝑏, 𝑐 = 12 𝑖𝑓  𝑚𝑎𝑗! 𝑎, 𝑏, 𝑐 > 1                               
  (eq. 5) 
11.3 Extensible Quantum Specification (XQS)
Binary quantum function specifications and circuits have been adequately 
represented through existing classical standard specifications such as the PLA file 
format (used by Espresso [3]), the Berkeley Logic Interchange Format (BLIF) [1], [2], 
and the RevLib [24] file format.  RevLib, which is used primarily to describe binary 
input specifications of quantum circuits, outlines two methods for describing a 
quantum specification: the PLA Sum of Products (SOP) format shown in Figure 11-3, 
and the truth table format known as SPEC.  The RevLib website describes the REAL 
format in detail and how it can be used for describing the components of a 
quantum circuit, as shown to the right of Figure 11-3. 
The complexity of describing a quantum circuit increases for multiple-valued logic, 
where a qubit could be in more than two states (multiple valued) or, in the 
case of a hybrid circuit, different qubits could use different bases of computation 
(binary, ternary, quaternary, etc.) at the same time. It might be feasible to 
shoehorn additional information into the existing PLA and SPEC formats in an effort 
to represent higher bases of computation; however, such extensions to these 
specifications would become complex, repetitive and highly incompatible. 
   
 
Figure 11-3 (a) Example of binary reversible quantum circuit and (b) its specification in the PLA 
format. 
In order to deal with complex specifications and the increased complexity of 
functions, as in the case of multiple-valued logic, I chose to introduce a new 
extensible file format specifically designed for multiple-valued quantum logic.  The 
new format is based on the extensible YAML file format specification [6], which is a 
human-friendly data serialization standard for programming languages that allows 
for encapsulation of data (scoping or name spacing), inheritance, expressivity and 
extensibility.  YAML support for the majority of modern programming languages 
exists through packaged libraries available on the Internet. 
  
   
 
11.3.1 Structure of YAML
YAML (see footnote 6) supports three basic primitives: mappings (hashes and 
dictionaries), sequences (arrays and lists), and scalars (strings and numbers).  Each of 
these data types can be nested within any of the supported data types to any depth, which 
is a necessary vehicle for scoping or name-spacing specification elements.  Table 11-6 
shows examples of the three data types in YAML format along with their parsed abstract 
representation.  Notice in the last row that a set of sequences is nested within a 
mapping element.  The strings and numbers represent the scalars in YAML. 
Table 11-6 Basic data types supported by the YAML language, which can be nested to as many 
levels as needed.  The first column shows the YAML syntax of the type shown in the second 
column.  The last column represents the parsed YAML in most modern programming languages. 
YAML Input        Type                 Parsed into language data structure 
- Mark McGuire    Array                ["Mark McGuire", "Sammy Sosa", "Ken Griffey"] 
- Sammy Sosa      {Sequence} 
- Ken Griffey 
Home runs: 65     Hash                 {"Home runs": 65, "Average": 0.89, "Total Runs": 147} 
Average: 0.89     {Mapping} 
Total Runs: 147 
American:         Sequences            { 
  - Red Sox       nested within          "American": ["Red Sox", "Tigers", "Yankees"], 
  - Tigers        Mappings               "Canadian": ["Mounties"] 
  - Yankees                            } 
Canadian: 
  - Mounties 
6 YAML: stands for Yet Another Markup Language or YAML Ain't Markup Language 
11.4 Example of Extensible Quantum Specification
The XQS specification in Figure 11-4 for a 2 digit ternary full adder has four primary 
sections: 
Signature: Holds the signature of this function for human identification and reporting 
purposes. 
Defaults: This section represents a base set of parameters which are shared among 
sections in the specification and implementation subsections. For example, the defaults 
section is inherited by the specification::inputs section, which is itself inherited by the 
specification::outputs section. In this case, both inputs and outputs have the value radix 
= 3. 
Specifications: This section holds one or more representations of the same 
specification. In this example, a truth table representation is used to represent a 
2-digit ternary full adder function. 
Implementation: This section holds the set of quantum gates implementing the 
specified function. Similarly, this section allows for different representations of the 
same implementation. 
 
Figure 11-4 Pseudo-code of an XQS specification describing a Full Adder with 2 inputs and 1 output.  
The implementation section shows a solution for this adder. 
   
The example described in Figure 11-5 shows a complete specification represented 
with a truth table mapping each input minterm to its corresponding output minterm.  With 
the XQS format, it is easy to include multiple representations of the same 
specification in the same file (e.g., PLA, BLIF, etc.). For example, a new element (e.g., 
Equation Format) can be nested under the specification element and use ternary logic 
equations to describe the same function.  Similarly, multiple implementation formats 
could be specified as shown in the figure, where both the RevLib and Typed formats are 
specified. The Typed format, for example, will be parsed into an array of terms for each 
gate, while the RevLib format will be parsed as a list of strings. Such flexibility provides 
a vehicle for interoperability between applications by allowing different algorithms to 
parse the specifications in a manner compatible with their internal data structures. 
Element inheritance is shown on the line [inputs: &defaults]. In this case, all the 
attributes defined in the defaults element are used (inherited) as the base definition of 
the inputs element, and can then be overridden by a specific definition within the inputs 
element. In this case, the inputs element will contain both the radix = 3 and variables = 2 
parameters when parsed by the YAML parser. The outputs element will contain radix = 3 
and variables = 1. 
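The effect of this inheritance on the parsed data can be sketched with plain Python dictionaries. The snippet does not parse XQS or YAML; it only mimics the result a parser would produce, and the helper `inherit` and the dictionary layout are assumptions made for illustration.

```python
# Hypothetical parsed form of the defaults element.
defaults = {"radix": 3}


def inherit(base, **overrides):
    """Return a new element that starts from `base` and applies local overrides."""
    element = dict(base)       # copy so the shared defaults stay untouched
    element.update(overrides)  # locally defined attributes win
    return element


inputs = inherit(defaults, variables=2)   # inputs inherits from defaults
outputs = inherit(inputs, variables=1)    # outputs inherits from inputs

print(inputs)   # {'radix': 3, 'variables': 2}
print(outputs)  # {'radix': 3, 'variables': 1}
```

Because each element is built from a copy of its parent, overriding `variables` in the outputs element leaves both the inputs element and the defaults unchanged.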
   
 
Figure 11-5 XQS specification of a binary/ternary hybrid function where the radix is defined for each 
qubit (322 is ternary for the upper qubit, and binary for the other two).  “Don’t care” is shown as ‘-‘ in the 
output variable definition. 
11.4.1 Hybrid Multiple-Valued Reversible Function
The XQS specification provides the ability to represent hybrid quantum circuit (HQC) 
specifications utilizing multiple bases of computation (e.g., binary and ternary).  Figure 
11-5 shows the input and output definition of a hybrid circuit using both binary and 
ternary registers (qubits). There are three input variables (abc), where variable a uses 
ternary logic while b and c are both binary. The output also defines three variables, where 
the least significant digit represents a binary function while the other digits are 
insignificant (don’t care).  
HQC implementations, where gates of different basis states operate on the same qubit, 
are presented in the same sequential format used to specify a cascade of gates 
representing the implementation. In this case, gates of different bases would be listed 
back to back, representing the specific implementation.  
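A mixed-radix input space such as the "322" register of Figure 11-5 can be enumerated as follows. This is an illustrative sketch only; `minterms`, `to_index` and the most-significant-digit-first ordering are assumptions, not part of the XQS definition.

```python
from itertools import product


def minterms(radices):
    """Return an iterator over every input pattern of a mixed-radix register."""
    return product(*(range(r) for r in radices))


def to_index(digits, radices):
    """Convert a mixed-radix digit pattern to its row index in the truth table."""
    index = 0
    for digit, radix in zip(digits, radices):
        index = index * radix + digit
    return index


RADICES = (3, 2, 2)  # a is ternary, b and c are binary
rows = list(minterms(RADICES))
print(len(rows))                     # 12
print(rows[0], rows[-1])             # (0, 0, 0) (2, 1, 1)
print(to_index((2, 0, 1), RADICES))  # 9
```

The register has 3 × 2 × 2 = 12 input patterns rather than the 8 of a purely binary or the 27 of a purely ternary three-variable function, which is why the radix has to be recorded per qubit.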
   
 
Figure 11-6 XQS specification of Moore State Machine. 
11.4.2 Multiple-Valued Finite State Machine
As a final example of the XQS, I describe a small Finite State Machine (FSM). Figure 
11-6 and Figure 11-7 show an odd-number-of-ones detector as a Moore and a Mealy 
machine respectively. The FSM is a two-state machine with a single input (x), which 
detects whether an odd number of ones came through a sequence of binary digits. Notice 
that the FSM is specified by the two major sub-headings under the specification section: 
the transitions and the implementation. The implementation represents the state encoding, 
while the transitions define the state transition and output generation function (for the 
Mealy FSM). For the case of the Mealy FSM, the output is defined for each next state, as 
shown in brackets. Notice that in the transitions section, you can define any arbitrary 
function, such as a threshold function, by following the same methods used for defining 
classical state machines.  
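The behaviour of the two machines can be sketched directly in Python. The state names "even" and "odd" and the dictionary layout are assumptions made for this example; they are not taken from the XQS files in Figures 11-6 and 11-7.

```python
def moore_odd_ones(bits):
    """Moore machine: the output depends only on the current state."""
    state = "even"
    transitions = {("even", 0): "even", ("even", 1): "odd",
                   ("odd", 0): "odd", ("odd", 1): "even"}
    output = {"even": 0, "odd": 1}
    trace = [output[state]]           # output of the initial state
    for x in bits:
        state = transitions[(state, x)]
        trace.append(output[state])
    return trace


def mealy_odd_ones(bits):
    """Mealy machine: each transition carries (next state, output)."""
    state = "even"
    transitions = {("even", 0): ("even", 0), ("even", 1): ("odd", 1),
                   ("odd", 0): ("odd", 1), ("odd", 1): ("even", 0)}
    trace = []
    for x in bits:
        state, out = transitions[(state, x)]
        trace.append(out)
    return trace


print(moore_odd_ones([1, 0, 1, 1]))  # [0, 1, 1, 0, 1]
print(mealy_odd_ones([1, 0, 1, 1]))  # [1, 1, 0, 1]
```

The Moore trace has one extra leading entry because the initial state already produces an output, whereas the Mealy machine emits an output only on each transition.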
   
 
11.5 Benchmarks Organizational Description
In this section, I introduce a set of multiple-valued benchmark function generators 
available through an online website at http://quantumlib.cecs.pdx.edu.  The site currently 
generates the list of functions found in Table 11-7. 
Table 11-7 Sample benchmark functions available on MV Benchmark Repository: 
http://quantumlib.cecs.pdx.edu. 
Function | Inputs | Outputs | Description | Examples 
r_vgtec | v digits | 1 binary variable | The function sets the output to 1 whenever the minterm >= the constant c. | 3_4gte5: four variable ternary function where the output is 1 whenever the value >= 5. 
r_v1gtev2 | v1 + v2 digits | 1 binary variable | The function sets the output to 1 whenever the first v1 digits are >= the last v2 digits. | 3_4gte2: six variable ternary function where the output is 1 whenever the value of the upper 4 digits >= the value of the lower 2 digits. 
r_vcount1r | v digits | (r-1)*ceil(logr(v)) | Counts the number of repeated instances of the digits 1 to r-1. | 3_4count12: ternary digit counter of 4 variables counting the number of ones and twos. 
r_vfulladder | v digits | ceiling(logr(v*r)) | Adds all digits. | 3_4fulladder: adder of 4 ternary digits. 
r_vSwivel | v digits | v digits | Reverses the order of digits where abc => cba. | 3_5swivel: ternary swivel gate of five variables. 
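As an illustration of the first entry, a generator for the r_vgtec family might look like the sketch below. The function `gte_constant` is hypothetical and is not the code behind the web site; it assumes that the "value" of a minterm is obtained by reading the digits as a radix-r number, most significant digit first.

```python
from itertools import product


def gte_constant(radix, variables, constant):
    """Truth table of a hypothetical r_vgtec generator.

    Each input pattern is read as a radix-`radix` number (most significant
    digit first); the single binary output is 1 when that value >= constant.
    """
    table = {}
    for digits in product(range(radix), repeat=variables):
        value = 0
        for d in digits:
            value = value * radix + d
        table[digits] = int(value >= constant)
    return table


table = gte_constant(radix=3, variables=4, constant=5)  # "3_4gte5"
print(len(table))            # 81
print(table[(0, 0, 1, 1)])   # 0, since the value is 4
print(table[(0, 0, 1, 2)])   # 1, since the value is 5
print(sum(table.values()))   # 76
```

The resulting function is not reversible on its own (81 input patterns map to a single binary output), which is why a reversible embedding with ancilla digits is needed before synthesis.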
 
Figure 11-7 XQS specification of Mealy State Machine. 
   
The web site allows users to select and group benchmarks according to their own criteria, 
with the goal of making the evaluation and comparison of results easier.  The following 
outlines some of the features and functionality of the site: 
• Function Generation On Demand: 
• Allows for specification of radix, number of variables and function type. Unlike 
other web sites, the benchmark functions are generated in real time, which allows 
the user to build functions suited to his/her needs.  For instance, a user might 
want to synthesize adders for 5, 6 and 7 bits; all such functions are generated in 
real time and provided to the user in the XQS file format. 
• Select whether the function should be made fully reversible, even for incomplete 
functions. For such functions, the web site adds the necessary number of 
ancilla bits to both the input and output vectors and produces a reversible 
specification representing the same function. 
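One textbook way to make an irreversible function reversible is to keep the inputs and add the function value to a set of ancilla digits. The sketch below uses that construction purely as an illustration; it is an assumption, not a description of how the web site chooses its ancilla bits, and `embed_reversible` and `digit_sum` are hypothetical names.

```python
from itertools import product


def embed_reversible(f, radix, n_inputs, n_outputs):
    """Embed f: digits^n_inputs -> digits^n_outputs into a bijection.

    Uses the mapping (x, y) -> (x, y + f(x) mod radix) with n_outputs
    ancilla digits y, so every input pattern has a unique output pattern.
    """
    table = {}
    for x in product(range(radix), repeat=n_inputs):
        fx = f(x)
        for y in product(range(radix), repeat=n_outputs):
            out = tuple((yi + fi) % radix for yi, fi in zip(y, fx))
            table[x + y] = x + out
    return table


def digit_sum(x):
    """Irreversible example: the modulo-3 sum of two ternary digits."""
    return ((x[0] + x[1]) % 3,)


table = embed_reversible(digit_sum, radix=3, n_inputs=2, n_outputs=1)
assert sorted(table.values()) == sorted(table.keys())  # the table is a bijection
assert table[(1, 2, 0)] == (1, 2, 0)                   # 1 + 2 = 0 (mod 3)
assert table[(2, 2, 0)] == (2, 2, 1)                   # 2 + 2 = 1 (mod 3)
```

With the ancilla digit initialized to 0, the last output digit carries the original function value, while the unchanged input digits guarantee that the mapping can be inverted. This construction is simple but generally uses more ancilla digits than the minimum.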
The resulting format of the functions is the XQS but other formats can be requested 
and will be generated in real-time.  Additional functions of the web site provide methods 
for ordering and retrieving previous results. 
• Sorting: 
a. Functions can be sorted by the number of ancilla bits or by the cost of the 
gates.  Additional methods of sorting allow the user to cascade sorting 
functions to obtain the best function realization based on specified criteria.  
For instance, a search for a full adder on n input variables can be ordered by 
increasing gate cost and then by the number of ancilla bits. 
11.6 Conclusion
In this chapter I introduced a set of multiple-valued benchmark functions for general use 
in automated synthesis algorithms, which are generated on the QuantumLib web site. I also 
introduced an extensible file format specifically designed for MV quantum specifications 
and circuits, which allows for simple specification of both fixed- and mixed-radix 
functions.  Since existing formats are definitions of binary classical circuits conveniently 
reused for the quantum binary domain, they are limited in their ability to define some of 
the new concepts of quantum computing, including multiple-valued and hybrid logic.  The 
XQS format, however, is specifically designed to accommodate these new concepts and 
allows a level of flexibility for the algorithm designer to define ancillary information as 
they see fit. 
The extensibility of the XQS definition can also be a danger to interoperability 
between different algorithms if each algorithm defines its own structure.  At this stage of 
the maturity of quantum computing, the extensibility allows freedom in defining the 
structure; however, as the field matures, the structure has to be defined by the research 
and professional community to allow interoperability between disparate systems and 
algorithms. 
   
11.7 QuantumLib Function Generators
The function generator on the QuantumLib website provides a list of functions already 
generated, along with the parameters used to generate them.  The site also provides 
search, sorting and pagination functionality to make it easy to find classes of functions 
based on their names, radix or number of variables. 
 
Figure 11-8 Index page of quantum function generator. 
The site allows the user to view existing functions or generate new functions, with the 
ability to download the function specification in the new eXtensible Quantum 
Specification (XQS) format. 
   
 
Figure 11-9 Details of a function along with its downloadable specification. 
 
  
   
 
Chapter 12 Generalized Multiple Valued Swivel Gate
12.1 Introduction
The majority of quantum logic synthesis algorithms have typically cited their 
performance according to an established set of quantum cost numbers counting the 
number of single- and two-qubit primitive quantum gates [1,2,3,4,5,6,7].  Recently, a 
growing number of researchers are taking into account the implications of the LNNM 
physical constraints imposed by the technology used to construct quantum circuits.  So 
rather than calculating a quantum cost which ignores such technological constraints, 
algorithms have utilized the well-known universal swap gate to bring interacting qubits 
next to one another, in turn enforcing the LNNM constraint [8, 9, 10, 11].  In a typical 
application of the LNNM constraint, the information of a qubit is transported 
over a long range of qubits through a set of swap gates in order to bring it next to the 
associated qubit of the gate.  Moreover, the same number of swap gates is used to 
restore the qubit to its initial location. 
Khan et al. [13] introduced the set of ternary gates based on Galois Field logic.  This 
branch of logic is a subset of finite field logic where the field order is either a prime, 
GF(p), or a power of a prime, GF(p^r).  For the purposes of this ternary logic, GF(p) is 
typically used, where GF(p) is the finite field of residue classes modulo p.  In their 
paper, Khan et al. also introduced the ternary swap gate, which I will discuss later and 
build upon to derive the ternary and multiple-valued swivel gates. 
   
The chapter starts with a discussion of LNNM logic and binary swap gates, 
demonstrates the extension to binary swivel gates in section 12.2, and calculates the cost 
of the binary swivel gate in section 12.3.  The analogous construction of the ternary swivel 
gate is presented next, followed by the derivation of its cost, in sections 12.4 
through 12.6.  Finally, sections 12.7 through 12.9 show the derivation of the 
multiple-valued swap gate, the swivel gate, and the calculation of the number of gates 
for each.  
12.2 Binary Swivel Gate
Figure 12-1(a) shows a set of swap gates which bring the distant qubits (a and d) next 
to one another, facilitating the LNNM interaction.  A second set of swap gates 
immediately follows (mirrors) to restore qubit a to its original place in the circuit.  
Due to the introduction of the mirror set of swap gates, the gray CNOT gates effectively 
cancel one another, and hence it is possible to use 2 CNOT gates to represent the 
functionality of a swap in the cascade, as shown in Figure 12-1(b).  
 
Figure 12-1 (a) LNNM rendition of distant CNOT gate (on far left) which is brought into LNNM 
compliance with the aid of swap gates (block of 3 CNOTs), (b) minimized version of (a) where the shaded 
CNOT gates cancel one another. 
The swivel gate operates on a number of qubits (n) and transforms the set of qubits 
by symmetrically swapping the content of qubits around the center.  In the case of an odd 
number of qubits, the qubit in the center represents the pivot point, and in the case of an 
even number of qubits, the gap between the two central qubits represents the pivot point 
(see Figure 12-2). 
Definition 1: An n-variable swivel gate Sv(n) operating on the qubits (x1, x2, 
…, xn) swaps the values of the qubits around a central pivot point in a symmetrical 
arrangement.  In essence, the swivel gate transforms the input (x1, x2, …, xn-1, xn) to (xn, 
xn-1, …, x2, x1).  
Figure 12-3 shows the block diagram of a swivel gate presented in [12].  Notice that 
the well-known binary swap gate (shown within the dotted box) is a special case of the 
generalized swivel gate introduced in this chapter.  The 4-bit binary swivel gate shown in 
the figure exhibits a regular recursive structure of CNOT gates which repeats in multiple 
stages.  The pattern of repetition can be visualized as a set of down ladders which start 
from the left of the gate where each successive CNOT ladder has one less qubit.  Once 
the last pair of qubits interacts through a two gate CNOT ladder, a rising ladder 
completes the gate.  
 
Figure 12-2 Pictorial of swivel gate with (a) odd number of qubits, (b) even number of qubits. 
The down ladder pattern propagates the upper qubit (a) down to the lowest line (4) 
while shifting all other qubits (b-d) up by one position.  A similar pattern applied to lines 
1-3 propagates the qubit on line 1 (b) down to the lowest line (3).  Lastly, the pattern 
applied to lines 1-2 is the 2-qubit reduced swap gate pattern seen in Figure 12-1 
above.  At this stage, line 1 has the value of qubit (d), while all other lines hold their 
final qubit value in addition to some remainder term resulting from the down ladders.  In 
order to remove the remainder terms, I successively apply a set of CNOT gates between 
every two neighboring qubits.  Notice that the remainder term for each qubit is exactly the 
same as the value of the line immediately above it.  For example, the last two lines hold 
the terms: 
$$\text{line } 3 = b \oplus c \oplus d, \qquad \text{line } 4 = a \oplus b \oplus c \oplus d$$
By applying a CNOT operation from line 3 to line 4, I eliminate the 
remainder term as follows: 
$$\text{line } 4 = (a \oplus b \oplus c \oplus d) \oplus (b \oplus c \oplus d) = a$$
Repeating the same process for each pair of qubits removes all remainder terms 
and yields the result expected of the swivel gate.  
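The construction can be simulated on classical bit patterns to confirm that it reverses the qubit order. The sketch below is one reading of the ladder description above: the exact gate ordering inside each ladder is an assumption, lines are numbered from 0, and `binary_swivel_gates` and `run` are hypothetical helper names.

```python
from itertools import product


def binary_swivel_gates(n):
    """Return the CNOT cascade as (control, target) pairs on lines 0..n-1."""
    gates = []
    for width in range(n, 1, -1):      # down ladders on n, n-1, ..., 2 lines
        for i in range(width - 1):
            gates.append((i, i + 1))   # add the moving value into the line below
            gates.append((i + 1, i))   # leave the neighbour's value on the line above
    for i in range(n - 2, -1, -1):     # rising ladder removes the remainder terms
        gates.append((i, i + 1))
    return gates


def run(gates, state):
    """Apply each CNOT (target ^= control) to a tuple of classical bits."""
    state = list(state)
    for control, target in gates:
        state[target] ^= state[control]
    return tuple(state)


for n in range(2, 7):
    gates = binary_swivel_gates(n)
    assert len(gates) == n * n - 1
    assert all(run(gates, s) == s[::-1] for s in product((0, 1), repeat=n))
```

For n = 2 the cascade reduces to the familiar three-CNOT swap, and for n = 4 it uses 15 CNOT gates, all of which act on neighboring lines.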
 
Figure 12-3 Four-bit binary swivel gate composed of a set of CNOT gates.  The upper right 
triangle shows the well-known swap gate composed of three CNOT gates. 
   
12.3 Quantum cost of binary swivel gate
The quantum cost of a binary swivel gate can be easily calculated from the number of 
CNOT primitive gates used to construct this composite swivel gate. 
$$\#\mathrm{CNOT}_n = 2 \cdot \sum_{k=1}^{n-1} k + (n - 1) = n(n - 1) + (n - 1) = n^2 - 1 \qquad (1)$$
Equation (1) also represents the binary LNNM quantum cost of the swivel gate. 
12.4 MV Ternary SWAP Gate
I now consider the construction of a swivel gate for multiple-valued logic, and I start by 
demonstrating the concept in the ternary domain.  Table 12-1 shows the set of ternary 
gates based on Galois Field Sum of Products (GFSOP) logic for basis 3, GF(3).  
Table 12-1 Quantum Ternary Operators based on GF(3) logic. 
Input (X)   0   1   2   GF(3) 
X+0         0   1   2   X 
X+1         1   2   0   X + 1 
X+2         2   0   1   X + 2 
X12         0   2   1   2·X 
X02         2   1   0   2·X + 2 
X01         1   0   2   2·X + 1 
 
And the following set represents the GF(3) basic literals for variable x:  
𝐺𝐹 3 𝐿𝑖𝑡𝑒𝑟𝑎𝑙𝑠 = 1,2,𝑋!!,𝑋!!,𝑋!!,𝑋!",𝑋!",𝑋!",𝑋!  
  
 
Figure 12-4 Ternary swap gate, which requires 5 primitive gates to implement and uses Galois Field 3 algebra. 
   
Khan et al. [13] discuss GFSOP gates in further detail.  In the same paper, Khan 
introduced the ternary swap gate shown in Figure 12-4.  The symbol ⨁, which represents the 
XOR gate in binary logic, is extended to represent modular addition; in the 
ternary domain, A⨁B represents (A+B) modulo 3.  In Figure 12-4, line 2 has 
the value a⊕b after the first CMOD3 (Controlled Modulo 3) gate.  Were I to follow the 
exact pattern of the binary swap gate and place an inverted CMOD3 gate from line 
b to a, I would end up with a⊕a⊕b = 2a⊕b on the upper qubit, which is not desired.   
In order to remedy this condition, I place the gate [[X12]], symbolized as 2x in [13], 
which performs a doubling operation on qubit (a).  Now when I place the inverted 
CMOD3 gate after the [[X12]] gate, I end up with the value of (b) on the upper qubit as 
follows: 
$$\text{upper qubit} = (2 \cdot a + a + b) \bmod 3 = (3a + b) \bmod 3 = b$$
Similarly, the lower qubit has the term a⨁b instead of simply a.  Adding the last two 
CMOD3 gates eliminates the b term as follows: 
$$\text{lower qubit} = (a + b + b + b) \bmod 3 = (a + 3b) \bmod 3 = a$$
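The five-gate sequence can be checked exhaustively over the nine ternary input pairs. In this sketch the helper names `cmod` and `double` are introduced for illustration; `cmod` models the CMOD3 gate and `double` models the [[X12]] gate.

```python
from itertools import product

R = 3  # ternary radix


def cmod(state, control, target):
    """Controlled modulo-3 addition: target <- (target + control) mod 3."""
    state[target] = (state[target] + state[control]) % R


def double(state, line):
    """The [X12] gate, i.e. multiplication by 2 in GF(3)."""
    state[line] = (2 * state[line]) % R


def ternary_swap(a, b):
    state = [a, b]
    cmod(state, 0, 1)   # lower = a + b
    double(state, 0)    # upper = 2a
    cmod(state, 1, 0)   # upper = 2a + a + b = b
    cmod(state, 0, 1)   # lower = a + 2b
    cmod(state, 0, 1)   # lower = a + 3b = a
    return tuple(state)


assert all(ternary_swap(a, b) == (b, a) for a, b in product(range(R), repeat=2))
```

Removing the `double` step leaves 2a + b on the upper line, which reproduces the undesired result discussed above.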
12.5 Ternary Swivel Gate
Figure 12-5 shows a 4-variable ternary swivel gate which follows a pattern similar to 
its binary counterpart.  I start by constructing the left-hand down ladder, which carries 
the first qubit down to line 4 while shifting all other qubits (b-d) one position up.  
Notice the placement of the [[X12]] operator, shown by its mathematical equivalent 
[[2x]], which, analogous to the ternary swap gate, eliminates the remainder term returned 
by the inverted gate.  The recursive down ladder pattern repeats in a number of stages 
until the qubit value of the lowest line propagates to the top line.  Similar to the binary 
swivel gate, for a function of n variables there exist n-1 down patterns. 
 
Figure 12-5: 4-qubit ternary swivel gate which has similar structure as its binary counterpart with the 
addition of the 2x gate and the double return gates at the far right. 
At this stage of construction, the ternary swivel gate has a structure similar to its 
binary counterpart, and each line holds the same expression as in the binary case, now 
evaluated in GF(3).  Line 4, in Figure 12-5, has the value a⨁b⨁c⨁d instead of the desired 
value of qubit (a).  In order to eliminate the remainder term b⨁c⨁d, I place two CMOD3 
gates rather than one, which removes the remainder term according to GF(3) logic as follows: 
$$\text{line } 4 = (a \oplus b \oplus c \oplus d) \oplus 2 \cdot (b \oplus c \oplus d) = (a + 3b + 3c + 3d) \bmod 3 = a$$
A similar up ladder with double gates is applied to the remaining pairs of qubits to 
eliminate the rest of the remainder terms. 
12.6 Quantum Gate Count of Ternary Swivel Gate
In addition to the gates found in the binary case, I introduced a set of [[X12]] gates to 
correct for the returning remainder term, and another set of up ladder CMOD3 gates to 
eliminate the remainder terms at the end of the cascade in accordance with GF(3) logic.  
For a swivel gate of n variables, the number of required gates is: 
$$\text{number of gates} = (n^2 - 1) + \frac{n^2 - n}{2} + (n - 1) = \frac{3n^2 + n - 4}{2} \qquad (2)$$
 MV Swap Gate 12.7
I now extend this concept to higher computational bases, following the same 
construction and using GF(r) mathematics, where r represents the basis of computation.  
Note that, for the modulo-r arithmetic used here to form a Galois field, r must be a prime number.   
Figure 12-6 shows a multiple valued swap gate using the GF(r) logic.  Note that in 
order to eliminate remainder terms in the final values of the qubits, a set of (r-1) CMODr 
gates is needed for each rung of the up ladder. 
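The behavior of the MV swap gate can be illustrated with a short simulation.  This is a sketch under stated assumptions: the exact gate order of Figure 12-6 is not reproduced in the text, so the sequence below is one arrangement that is consistent with the description (a controlled add, an (r-1)x gate, a second controlled add, and r-1 controlled adds on the far right) and with the count of r + 2 gates listed for two qubits in Table 12-2.  The function name `mv_swap` is hypothetical.

```python
def mv_swap(a, b, r):
    """Swap two radix-r values using modulo-r gates; returns (top, bottom, gate_count)."""
    x, y = a % r, b % r
    gates = 0
    y = (x + y) % r            # controlled add: bottom line becomes a + b
    gates += 1
    x = ((r - 1) * x) % r      # (r-1)x gate: top line becomes -a
    gates += 1
    x = (x + y) % r            # controlled add: top line becomes -a + a + b = b
    gates += 1
    for _ in range(r - 1):     # r-1 controlled adds: bottom becomes a + b + (r-1)b = a
        y = (y + x) % r
        gates += 1
    return x, y, gates

if __name__ == "__main__":
    for r in (3, 5, 7):
        ok = all(mv_swap(a, b, r)[:2] == (b, a) for a in range(r) for b in range(r))
        print(r, ok, mv_swap(1, 2, r)[2])
```

The final loop is where the (r-1) CMODr gates are needed: the bottom line holds a + b, and adding b a further r-1 times yields a + r·b, which reduces to a modulo r.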
         
 
12.8 MV Swivel Gate
Figure 12-7 shows a generalization of the ternary swivel gate for multiple valued logic 
of basis (r).  Building on the MV swap gate, a set of multiplication gates with a constant 
multiplier (r-1) is used to eliminate remainder terms for moving qubits up in the down 
ladder.  The up ladder requires a cascade of (r-1) controlled GF(r) gates between each 
two neighboring lines in order to remove the final remainder terms.  
Figure 12-6 Multiple valued swap gate using the (r-1)x gate for bias and (r-1) controlled gates on the far right. 
 
Figure 12-7 Four variable Multiple Valued (radix = r) Swivel Gate which has the same structure as the 
ternary and binary counterparts with the use of bias gates (r-1)x and the (r-1) cascade of controlled gates 
at the far right. 
12.9 Quantum Gate Count of MV Swivel Gate
I can now calculate the number of gates necessary for the MV swivel gate as follows: 
$$\text{number of gates} = 3 \sum_{k=1}^{n-1} k + (r - 1)(n - 1) = \frac{3n(n - 1)}{2} + (r - 1)(n - 1) = (n - 1)\left(\frac{3n}{2} + r - 1\right) \qquad (3)$$
Table 12-2 shows the number of quantum gates along with the general formula for 
calculating the number of quantum gates needed to construct a swivel gate of any basis. 
Table 12-2 Number of gates of swivel gates for binary, ternary and multiple valued radix-r basis of 
computation. 
| # qubits | Binary | Ternary | MV Basis (r) |
|---|---|---|---|
| 2 | 3 | 5 | r + 2 |
| 3 | 8 | 13 | 2r + 7 |
| 4 | 15 | 24 | 3r + 15 |
| 5 | 24 | 38 | 4r + 26 |
| 6 | 35 | 55 | 5r + 40 |
| n | n² − 1 | (3n² + n − 4)/2 | (n − 1)(3n/2 + r − 1) |
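The closed forms can be checked against the numeric rows of Table 12-2.  The sketch below assumes the formulas n² − 1 (binary), (3n² + n − 4)/2 (ternary) and 3n(n − 1)/2 + (r − 1)(n − 1) (radix r), which are the readings that reproduce every listed row; the function names are illustrative only.

```python
def binary_swivel_gates(n):
    """Gate count of the n-qubit binary swivel gate: n^2 - 1."""
    return n * n - 1

def ternary_swivel_gates(n):
    """Gate count of the n-qubit ternary swivel gate: (3n^2 + n - 4) / 2."""
    return (3 * n * n + n - 4) // 2

def mv_swivel_gates(n, r):
    """Gate count of the n-qubit radix-r swivel gate: 3n(n-1)/2 + (r-1)(n-1)."""
    return 3 * n * (n - 1) // 2 + (r - 1) * (n - 1)

if __name__ == "__main__":
    for n in range(2, 7):
        print(n, binary_swivel_gates(n), ternary_swivel_gates(n), mv_swivel_gates(n, 5))
```

Setting r = 3 in the radix-r formula reproduces the ternary column.  Setting r = 2 does not reproduce the binary column, because the binary gate needs no (r-1)x bias gates.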
 
 
   
12.10 Conclusion
In this chapter I introduced the generalized multiple valued swivel gate for n 
variables, which is applicable to any basis of computation.  I have also calculated the 
number of GF(r) logic gates necessary to implement this logical block.  The family of 
binary swivel gates has proven useful for implementation of LNNM compliant MCTn 
gates as shown in section 8.4.  I believe that this family of composite gates will be useful 
for other applications because it follows a regular structure and is scalable with respect to 
the number of variables and to the basis of computation. 
   
 
Chapter 13 Final Conclusion
My earliest thought as I first started to learn about quantum computation and quantum 
mechanics was one of bewilderment and skepticism at the fact that such phenomena are 
feasible.  Einstein’s famous references to quantum mechanics as “spooky action at a 
distance” and his remark that “God does not play dice” provided a level of affirmation to my 
skepticism, yet at the same time piqued my curiosity to dive deeper into this subject and 
learn about the new understanding of nature.  As I started to learn about reversible logic, 
quantum computing and quantum logic synthesis, I initially had trouble 
understanding the advantages that such a technology could hold over our increasingly fast 
processors.  I also wondered whether this path of research was mostly theoretical and would 
never have any value, or whether, one day, my research would be useful in advancing 
this burgeoning field.  Reversible logic was also a new topic which I had not encountered 
in my academic or professional career, and I was not sure why reversible logic is a topic 
of importance since a typical circuit has a one directional flow of information: inputs to 
outputs. 
I was already aware of Gordon Moore’s famous prophecy and I was convinced that 
this prophecy had an end date that was approaching quickly.  Issues of heat dissipation 
from processors, hard drives, and all our technologies were common knowledge as well.  I 
started to grasp the importance of reversibility and its relation to heat dissipation from 
Landauer [1], who proved that binary logic circuits built using traditional irreversible 
gates inevitably lead to energy dissipation, regardless of the technology used to realize 
the gates.  He argued that the loss of information in a classical gate manifests as heat 
because information is energy and energy cannot be lost; rather, it transforms from one 
form to another.  For example, a classical OR gate has two inputs and a single output.  
Assuming that it takes an amount of energy µ to maintain the information for each 
input, the total energy fed into the OR gate equals 2µ.  However, since a single output 
is produced by the gate at an energy level of µ, the remaining energy exits the OR gate in 
the form of heat.   
As modern silicon technology packs more and more transistors within smaller 
volumes, heat dissipation becomes even more of a problem.  Zhirnov et al. [98] showed 
that power dissipation in CMOS technology will eventually make it impossible to remove 
the heat efficiently and projected 2020 as the milestone when classical CMOS 
technology will reach such a roadblock.  Bennett [3] also argued that the use of 
reversible gates will eliminate or greatly reduce the loss of heat and confirmed the earlier 
prediction by Landauer that the preservation of information will preserve power and 
eliminate heat loss.  Reversibility is a simple concept to comprehend: a 
reversible computing block exhibits a one-to-one mapping between its inputs and outputs.  
In theory, one can trace back the input value if the output value is known, and vice versa.  
Permutative specifications are the most common example of reversible circuits, where the 
minterms of the output vector are a permutation of the minterms of the input vector.   
Quantum gates exhibit such reversibility: quantum computation utilizes quantum 
mechanical phenomena to perform computations in a closed 
system in which the total energy of the system must be preserved.  Specifically, the 
angular momentum of an electron, which represents the 
electron spin, must be preserved; an electron spin (up or down) represents the basis states 
|0⟩ and |1⟩ of a quantum system.  Nielsen and Chuang [78] showed that a binary 
permutative specification complies with these quantum principles.  As I noted 
earlier, a permutative specification is reversible, which is consistent with Bennett’s [3] conclusion 
that such circuits will not exhibit heat loss. 
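The one-to-one property of a permutative specification can be tested mechanically.  The following is a minimal sketch, assuming a specification is given as a list in which index i holds the output minterm for input minterm i; the helper names are illustrative and are not taken from the synthesis tools described in this work.

```python
def is_permutative(truth_table):
    """True if every output minterm appears exactly once (a one-to-one mapping)."""
    return sorted(truth_table) == list(range(len(truth_table)))

def invert(truth_table):
    """Trace outputs back to inputs; only meaningful for a permutative table."""
    inverse = [0] * len(truth_table)
    for inp, out in enumerate(truth_table):
        inverse[out] = inp
    return inverse

if __name__ == "__main__":
    cnot = [0, 1, 3, 2]        # index = 2*control + target; target flips when control is 1
    or_gate = [0, 1, 1, 1]     # two-input OR: three inputs collapse onto one output
    print(is_permutative(cnot), is_permutative(or_gate))
    print(invert(cnot))
```

The OR table fails the check because three distinct inputs share one output, which is the loss of information discussed above.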
Still, is the field of quantum logic purely theoretical, or is there empirical evidence that such 
principles are technologically feasible for future computations?  Nielsen and Chuang 
[78], Stock [90], Cirac [16], Holzscheiter [15] and others described real implementations 
of technologies based on quantum mechanics used to perform actual computations and 
measurements in the lab.  The leading technologies are Ion-Trap and Nuclear 
Magnetic Resonance (NMR), which have been used to initialize qubits with information 
and then perform computations using either laser or EMF pulses. 
In conclusion, transistor-based technologies are reaching their limits and the 
urgency to discover new methods of computation is real.  Quantum computation is one 
of the promising technologies: it skirts the issues of heat loss, is inherently suitable for 
massively parallel computations, and has been shown to be feasible.  As with existing 
technologies, automated logic synthesis is one of the essential tools for implementing 
large systems, and I chose to focus my research on the developing field of automated 
quantum logic synthesis of permutative specifications for large numbers of variables.  
Although other research has been done in this space, I found that the focus has been on 
small circuits, on methods that add many ancillary qubits, and on methods that do not 
account for the physical implementation of quantum gates.  I wanted to provide 
algorithms to automate the synthesis of large circuits and redefine the yardstick used to 
measure the performance of an algorithm in order to account for the physical constraints 
imposed by the technology, specifically the linear nearest neighbor model.  I also wanted 
to explore logic synthesis algorithms in the ternary and other multiple valued logic 
domains. 
13.1 Accomplishments of my research
13.1.1 Automated Logic Synthesis of Large Binary Specifications
I created a succession of algorithms that accept a permutative binary specification and 
automatically generate a quantum cascade of gates that implements the circuit.  Although 
similar algorithms existed at the time with the same general goal, they lacked a focus 
on large numbers of variables, took a long time to synthesize, added a large number of 
ancillary qubits or did not explore other solutions in the search space.   The following 
table provides a comparison of my contributions (MMDSN and CSP, the first two rows) to others: 

Table 13-1 Comparison of my binary synthesis algorithms, MMDSN and CSP, to the work of others in 
the field.  My algorithms offer the best compromise for this multi-objective optimization problem, where the 
number of variables is maximized while the synthesis time and quantum cost are minimized without the 
addition of any ancillary variables. 

| Algorithm | Quality of Results | Speed | Size of Function | Scalability | Complexity | Convergence |
|---|---|---|---|---|---|---|
| MMDSN [21] | Low quantum cost, no ancilla bits | Slow (hours for 12+ bits) | 30 | Multithreaded, encodes values in bits | k·O(2^n) | Guaranteed |
| CSP with Tabu Search [24] | Lowest quantum cost, no ancilla bits | Slow (hours for 11+ bits) | 30 | Multithreaded | n·k·O(2^n) | Guaranteed |
| MMD [56] | Medium, only explores a single solution | Fast (minutes for 12+ bits) | 12 | None | O(2^n) | Guaranteed |
| Agrawal/Jha [63] | Medium, storage space quickly exhausted, and adds ancilla bits | Very slow (hours for 6+ bits) | 16 | Limited by the amount of memory | n·O(2^n) | Not guaranteed |
| Shende [61] | Exact for small functions | Very slow | 3 | Limited by memory and time | n·O((2^n)!) | Guaranteed |
| Patino [99] | Good, some backtracking | Very slow (hours for 7+ bits) | 18 | Limited by memory and time | n·O(2^n) | Guaranteed |
| Stedman [11] | Best MMD style search solution | Very slow (days for 7+ bits) | 5 | Limited by time | O((2^n)!) | Guaranteed |
| Wille/Drechsler [57, 59] | Medium, adds many ancillary variables | Slow (hours for 9+ bits) | 16 | Limited by memory | n·O(2^n) | Guaranteed |

13.1.2 MMDSN Algorithm
I collaborated with Alhagi on the implementation of this algorithm which, unlike 
MMD, explores a large subset of the search space and, unlike Stedman, finishes in a 
short amount of time and is capable of synthesizing functions of up to 30 bits.  Compared to 
MMD, MMDSN achieves improvements of up to 55% for smaller functions.  
However, as with all other algorithms in this space, as the number of variables 
increases, the quality of the quantum cost degrades and the synthesis time 
grows.  Stedman, on the other hand, synthesizes every possible 
solution in the search space and performs well compared to MMDSN, but quickly halts at 5 
variables, as the number of solutions equals (2^n)!.  MMDSN establishes a balance 
between the time it takes to synthesize and the number of solutions it explores.  In my 
experiments, the MMDSN algorithm synthesized 10,000 samples for the 
9-variable hwb9 function in approximately 1 hour.  In comparison, Stedman takes almost 
12 hours to synthesize a function of 5 variables.  Unlike the algorithms by Wille and 
Agrawal, MMDSN also does not add any ancilla bits. 
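To see why an exhaustive search stalls at so few variables, the size of the search space can be computed directly.  This is a small sketch, assuming the count quoted above is the factorial of the number of minterms, (2^n)! in the binary case; the function name is illustrative.

```python
import math

def exhaustive_orderings(n_vars, radix=2):
    """Number of orderings of all radix**n_vars minterms."""
    return math.factorial(radix ** n_vars)

if __name__ == "__main__":
    for n in range(2, 6):
        count = exhaustive_orderings(n)
        # Print the number of decimal digits rather than the full value.
        print(f"n={n}: (2^{n})! has {len(str(count))} decimal digits")
```

The same function with `radix=3` gives the corresponding count for ternary functions.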
I published this work at the ISMVL 2010 conference and in the Facta Universitatis 
journal. 
13.1.3 Covered Set Partition Algorithm
Although MMDSN provided better results than similar algorithms, its 
performance lagged as the number of variables increased.  I realized that, although the 
Hasse structure provides a means of constructing converging solutions, that structure only 
allowed a subset of the valid solution space to be explored.  I developed the Covered 
Set Partition (CSP) algorithm, which includes the MMD sequence and all MMDSN 
sequences and which also includes only valid solutions.  The CSP algorithm divides the 
search space into a set of partitions where the partition boundary is adjustable to any of 
the bit boundaries.  By partitioning the search space in this manner, the CSP algorithm 
has a better chance of discovering solutions of lower quantum cost, since each partition is 
smaller by several orders of magnitude when compared with the entire search space.  I 
discovered that partition sizes in the vicinity of the half point of the number of 
variables typically result in the best quantum cost.  I also discovered that the CSP 
algorithm provided the best results with the aid of Tabu search.   
   
One of the drawbacks of this algorithm is that it takes longer to execute.  However, by 
focusing on the few partition sizes around the center point, it is likely to 
discover solutions better than those of MMD, MMDSN and some of the other similar search 
algorithms.  For functions below 12 bits, the CSP algorithm performed better than 
MMDSN and MMD by 25%, while for 12- and 13-bit functions, the CSP performed more 
than 1000% better than MMDSN.  This suggests that breaking the search space 
into smaller subsets allows the CSP to find better solutions. 
I also studied the impact of changing the selection process of candidate solutions and 
compared the results among random selection, two variants of a genetic algorithm and 
Tabu search.  I discovered that Tabu search combined with partitioning 
produces better results than either on its own. 
I reported my findings at the International Conference on Information Technology: New 
Generations (ITNG) in 2011 and the IEEE International Symposium on Innovation in 
Information & Communication Technology (ISIICT) in 2011. 
13.1.4 Redefinition of Benchmarks Measurement
As I learned about some of the physical implementations of quantum computing using 
Ion-Trap and NMR, I realized that the de facto quantum cost metric used in the literature 
is not realistic.  I decided to explore the Linear Nearest Neighbor Model as a realistic 
alternative which would bring benchmark measurement closer to physical 
implementation.  Although I found several attempts by other researchers to address the 
nearest neighbor constraint, I realized that their contribution was still incomplete since 
they did not address the internal structure of the composite gates used to construct the 
circuit.  In addition, none of them provided a comprehensive set of equations to calculate 
the LNNM quantum cost. 
I initially derived a set of equations which calculate the number of swap gates 
necessary to bring any binary quantum circuit into LNNM compliance.  Inserting swap gates 
was the common method of enforcing LNNM in the literature, and prior work ignored the 
internal structure of the MCT gates.  I then derived equations to calculate the number of 
swap gates necessary to bring any of the MCT gates into compliance with LNNM.  
Finally, I implemented a program to calculate the LNNM quantum cost for all existing 
benchmark functions on RevLib [30] and Maslov’s [20] websites. 
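The swap-insertion method mentioned above can be illustrated for a single two-qubit gate.  The sketch below is not the set of equations derived in this work; it assumes the common naive scheme in which the control is moved next to the target with adjacent swaps and then moved back, and it assumes each binary swap costs 3 CNOT gates.  The function names are hypothetical.

```python
def swaps_for_two_qubit_gate(control, target):
    """Adjacent swaps needed to bring two lines together and then restore the order."""
    distance = abs(control - target)
    if distance == 0:
        raise ValueError("control and target must be on different lines")
    return 2 * (distance - 1)

def lnn_cnot_cost(control, target, swap_cost=3):
    """Cost of one CNOT under the linear nearest neighbor constraint."""
    return 1 + swap_cost * swaps_for_two_qubit_gate(control, target)

if __name__ == "__main__":
    for target in range(1, 5):
        print(0, target, swaps_for_two_qubit_gate(0, target), lnn_cnot_cost(0, target))
```

A gate on neighboring lines needs no swaps, while the cost grows linearly with the distance between the two lines, which is why the choice of line ordering matters under this model.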
I then derived a set of equations which optimize the LNNM structure for the most 
commonly used family of MCT gates by reducing the number of CNOT gates needed for 
enforcing LNNM architecture.  I also investigated arranging the qubits on a 2-D grid and 
calculated the quantum cost of some of the MCT gates when arranged in a two-dimensional 
plane (rather than a straight line). 
I published a paper in ULSI 2012 where I reported my results and proposed that the 
LNNQC should be used for comparison because it is compliant with technology and 
brings evenness to comparison among different methods of synthesis.  I am 
also in the process of preparing a paper to report the improved quantum cost and 
the 2-D model. 
13.1.5 Ternary Quantum Synthesis
I also extended my work into the ternary domain with the hope that using the same 
number of qubits to hold a higher concentration of information could potentially be 
important in the development of quantum computing.  I realized that the research in the 
ternary domain was minimal compared to its binary counterpart, and that few 
researchers addressed logic synthesis for large functions.  I decided to extend my work in 
constructing valid solutions using the Hasse diagram into the ternary domain and reported 
the results of my findings. 
The ternary domain introduced a new set of challenges that were not applicable in the 
binary domain.  The issue of precedence among the three digits extended the potential 
search space by several orders of magnitude and introduced many possibilities for constructing 
the Hasse diagram.  I addressed this issue and demonstrated the construction of the Hasse 
diagram using multiple precedence orders and provided a proof that for each order the 
synthesis algorithm will find a solution.   
Miller, Maslov and Dueck [MMD] reported their logic synthesis for a 2-variable 
ternary function using the natural ternary order for their input sequence.  Al 
Rabadi reported various methods of synthesis without a particular set of benchmark 
measures.  As I did not find a set of benchmarks for ternary logic for larger numbers of 
variables, I introduced the Hidden Weighted Trit family of functions, which defines a 
general set of functions for up to 9 variables.  I used my ternary synthesis algorithm and 
demonstrated that it is capable of synthesizing up to 9 ternary variables and that the 
results are superior to using the single natural order. 
One of the challenges with any logic synthesis algorithm is the amount of time it takes 
to synthesize, and this is particularly important in the ternary domain since the search space 
grows at the rate of (3^n)!.  I constructed a set of experiments to study the performance 
benefit from using a CUDA graphics processor with 1024 processing cores vs. an Intel 
i7 processor with 8 cores.  For functions of five ternary variables, CUDA provided a 
300% speedup compared to the Intel processor, which dropped to a 100% speedup for a 
six-variable ternary function.  Despite the number of execution cores, CUDA might 
not be ideal for quantum logic synthesis as the number of variables increases because the 
local memory buffer (fastest memory access in CUDA) is smaller than such functions, 
which would eventually result in a large degradation of performance.  One of the main 
advantages of the Intel processor is the structure and the size of its L1 and L2 caches, 
which are large enough to hold the local dataset, and its data streaming and prefetching 
policies are well suited to quantum logic synthesis. 
I reported my findings in ISMVL 2012 and published a chapter about the work on 
CUDA in the book “GPU Computing with Applications in Digital Logic”, TICSP 2012. 
13.1.6 Extended Quantum Specification and MV function generator
In addition to introducing the HWT family of functions, I introduced six more 
multiple valued functions into the literature and created a website to generate 
such functions.  The function generator is available from the Portland Quantum Logic 
Group website: quantumlib.cecs.pdx.edu.  Using the website, a user can request the 
generation of one of the following functions: Counter of non-zero literals, Reflection 
(Swivel) gate, Full Adder, Comparator against a constant, Comparator against a 
variable, and the Hidden Weighted Digit.  The user can specify the radix of computation 
(binary, ternary, etc.), the number of variables and whether the function should be 
converted to a reversible specification by adding ancillary digits. 
   
In the process of developing these multiple valued functions, I realized that the current 
file formats (PLA, REAL) used to specify binary quantum specifications are insufficient 
for specifying multiple-valued and hybrid quantum functions.  In turn, I introduced a 
new file format, the Extended Quantum Specification (XQS), to accommodate the 
necessary definition of such functions.  The XQS file format is a specific implementation 
of the commonly used YAML file format. 
I will be reporting this work in the upcoming conference, ISMVL 2013. 
13.1.7 Generalized Multiple Valued Swivel Gate
In the process of optimizing the Multiple Control Toffoli gates to the Linear Nearest 
Neighbor Model, I came across a specific pattern that swaps the values of a set of qubits 
around a pivot point.  I realized that this pattern also applies to the commonly used binary 
two variable swap gate.  After spending time to understand the structure of such a pattern, 
I was able to generalize the construction of such a gate to any number of variables in the 
binary domain and derived a set of equations to calculate the cost of such a pattern, which I 
dubbed the Swivel gate. 
I then explored extending the same pattern to multiple-valued logic and was able to 
create a generalized construction of the composite swivel gate for any radix of 
computation.  I also provided a generalized set of equations to calculate the cost of this 
composite gate based on the radix and the number of variables.  The swivel gate is 
LNNM compliant by design, which makes it well suited to synthesis algorithms that adhere to 
the LNNM architecture. 
I will be introducing this family of gates in the Reed-Muller workshop in May 2013.  
   
13.2 Future Work
I am currently collaborating with Dr. Martin Lukac in Japan on a post processing 
algorithm aiming at further reduction of the synthesized circuit.  This effort attempts to 
use intermediate ancillary qubits to share control lines among neighboring Multiple 
Control Toffoli (MCT) gates within a circuit.  A secondary post processing stage will 
further reduce these added ancillary qubits through template matching and substitution of 
common gate patterns with more efficient representations. 
I also plan on pursuing further work in the LNNM binary domain and revise the CSP 
algorithm to use the LNNM architecture to calculate its quantum cost.  I also plan on 
researching the potential of automated synthesis which takes account of LNNM during 
the process of synthesis rather than a post processing or calculation stage. 
 
  
   
List of Publications 
1.  “Synthesis of Reversible Circuits with no Ancilla Bits for Large 
Reversible Functions Specified with Bit Equations”, Alhagi N., Hawash 
M., Perkowski M., published: ISMVL 2010 [22] 
2. “Synthesis of Reversible Circuits for Large Reversible Functions”, 
Alhagi N., Hawash M., Perkowski M., published: FACTA 
UNIVERSITATIS (NIŠ). [21] 
3. “Reversible Function Synthesis of Large Reversible Functions 
With No Ancillary Bits Using Covering Set Partitions”, M. Hawash., M. 
Perkowski, S. Bleiler, J. Caughman, A. Hawash,  Published ITNG 2011 
[23] 
4. “Application of Genetic Algorithm for Synthesis of Large Reversible 
Circuits using Covered Set Partitions”, Hawash M., Abdalhaq B, 
Hawash, A., Perkowski, M., Published ISIICT 2011, Amman [24] 
5. “Proposal for Normalized Quantum Cost through Compliance with 
Linear Nearest Neighbor Model”, Hawash M., Perkowski M., Published: 
ULSI 2012, Victoria, B.C. [25] 
6. “Synthesis of Ternary Quantum Circuits Utilizing Hasse Diagrams”, 
Hawash M., Perkowski M.: Published: ISMVL 2012, Victoria, BC. [26] 
7.  “Synthesis of Ternary Quantum Circuits Using Hasse Diagrams and 
Genetic Algorithms”, Hawash M., Perkowski M., Lukac M.: Book 
Publication: GPU Computing with Applications in Digital Logic, 
Tampere International Center for Signal Processing, TICSP Series #62, 
2012. 
8. “Multiple Valued Reversible Benchmarks and Extensible Quantum 
Specifications (XQS) formats”, Hawash M., Lukac M., Kameyama M., 
Perkowski M.: Accepted ISMVL 2013. 
9. “Generalized Multiple Valued Swivel Gate”, Hawash M., Perkowski 
M.: Accepted to Reed-Muller 2013. 
  
   
Bibliography 
  
[1] R. Landauer, "Irreversibility and heat generation in the computing process," IBM 
Journal of Research and Development, vol. 5, pp. 183-191, 1961.  
[2] S. K. Moore, "Landauer Limit Demonstrated," May 2012. [Online]. Available: 
http://spectrum.ieee.org/computing/hardware/landauer-limit-demonstrated. 
[3] C. Bennett, "Logical Reversibility of computation," IBM Journal of Research and 
Development, pp. 525-532, 1973.  
[4] T. Toffoli, "Reversible Computing," Feb 1980. 
[5] B. Desoete and A. De Vos, "A reversible carry-look-ahead adder using control gates," 
Integration, the VLSI journal, vol. 33, no. 1-2, pp. 89-104, 2002.  
[6] R. Van Meter and M. Oskin, "Architectural Implications of Quantum Computing 
Technologies," ACM Journal on Emerging Technologies in Computing Systems, 
vol. 2, no. 1, pp. 31-63, Jan 2006.  
[7] L. Mearian, Feb 2012. [Online]. Available: 
http://spectrum.ieee.org/computing/hardware/landauer-limit-demonstrated. 
[8] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information, Cambridge 
University Press, 2000.  
[9] D. Miller and G. Dueck, "Spectral Techniques for Reversible Logic Synthesis," 
Reed-Muller Workshop, 2003. 
[10] D. Miller, D. Maslov and G. Dueck, "Synthesis of quantum multiple-valued 
circuits," Journal of Multiple-Valued Logic and Soft Computing, vol. 12, no. 5-6, 
pp. 431-450, 2006.  
[11] C. Stedman, B. Yen and M. Perkowski, "Synthesis of Reversible Circuits with 
Small Ancilla Bits for Large Irreversible Incompletely Specified Multi-Output 
Boolean Functions," in Proc. 14th International Workshop on Post-Binary ULSI 
Systems, 2005.  
[12] M. Szyprowski and P. Kerntopf, "An Approach to Quantum Cost Optimization in 
Reversible Circuits," in IEEE International Conference on Nanotechnology, 
Portland, OR, 2011.  
[13] N. Wu, M. Kamada, A. Natori and H. Yasunaga, "Quantum Computer Using 
Coupled Quantum Dot Molecules," arXiv:quant-ph/9912036, 1999. 
[14] P. Liu, J. Santana, Q. Dai, X. Wang, P. Dowben and J. Tang, "Sign of the 
superexchange coupling between next-nearest neighbors in EuO," Phys. Rev. B, 
vol. 86, no. 22, p. 408, 2012.  
[15] M. Holzscheiter, "Ion-Trap Quantum Computation," Los Alamos Science, vol. 27, 
pp. 264-281, 2002.  
[16] J. Cirac and P. Zoller, "Computations with Cold Trapped Ions," Physics Review 
Letters, vol. 74, p. 4091, 1995.  
[17] A. Fowler, S. Devitt and L. Hollenberg, "Implementation of Shor's algorithm on a 
linear nearest neighbor qubit array," Quantum Information and Computing, vol. 4, 
no. 4, pp. 237-251, 2004.  
[18] R. Feynman, The Feynman Lectures On Physics, Vol III, Definitive Edition ed., A. 
Black, Ed., San Francisco: Pearson Addison Wesley, 2006.  
[19] M. Hawash, "Portland Quantum Logic Group," Portland State University, 2010. 
[Online]. Available: quantumlib.cecs.pdx.edu. 
[20] D. Maslov, 2011. [Online]. Available: http://webhome.cs.uvic.ca/~dmaslov. 
[21] N. Alhagi, M. Hawash and M. Perkowski, "Synthesis of Reversible Circuits for 
Large Reversible Functions," FACTA UNIVERSITATIS (NIŠ), vol. 23, no. 4, pp. 
273-286, December 2010.  
[22] N. Alhagi, M. Hawash and M. Perkowski, "Synthesis of Reversible Circuits with 
No Ancilla Bits for Large Reversible Functions," in International Symposium on 
Multiple-Valued Logic, Barcelona, Spain, 2010.  
[23] M. Hawash, M. Perkowski, S. Bleiler, J. Caughman. and A. Hawash, "Reversible 
Function Synthesis of Large Reversible Functions With No Ancillary Bits Using 
Covering Set Partitions," in 8th International Conference on Information 
Technology- New Generation, Las Vegas, NV, 2011.  
[24] M. Hawash, B. Abdalhaq, A. Hawash and M. Perkowski, "Application of Genetic 
Algorithm for Synthesis of Large Reversible Circuits using Covered Set 
Partitions," in The Fourth IEEE International Symposium on Innovation in 
Information & Communication Technology (ISIICT 2011), Amman, Jordan, 2011.  
   
[25] M. Hawash and M. Perkowski, "Proposal for Normalized Quantum Cost through 
Compliance with Linear Nearest Neighbor Model," in ULSI, Victoria, B.C., 2012.  
[26] M. Hawash and M. Perkowski, "Synthesis of Ternary Quantum Circuits Utilizing 
Hasse Diagrams," in IEEE International Symposium on Multiple-Valued Logic 
(ISMVL), Victoria, BC, 2012.  
[27] M. Hawash, M. Perkowski and M. Lukac, "Synthesis of Ternary Quantum Circuits 
using Hasse Diagrams and Genetic Algorithms," in GPU Computing with 
Applications in Digital Logic, Tampere, Finland, TICSP, 2012, pp. 161-188. 
[28] M. Hawash, M. Lukac, M. Kameyama and M. Perkowski, "Multiple-Valued 
Reversible Benchmarks and Extensible Quantum Specification (XQS) format," in 
ISMVL, Toyama, Japan, 2013.  
[29] M. Hawash, 2011. [Online]. Available: quantumlib.cecs.pdx.edu. 
[30] [Online]. Available: http://revlib.org/function_details.php?id=12. 
[31] T. Monz, K. Kim and W. Haensel, "Realization of the quantum Toffoli gate with 
trapped ions," Physics Review Letters, no. 040501, p. 102, 2009.  
[32] T. Monz, P. Schindler, J. Barreiro, M. Chwalla, D. Nigg, W. Coish, M. Harlander, 
W. Hansel, M. Hennerih and R. Blatt, "14-qubit entanglement: creation and 
coherence," Physical Review Letters, vol. 106, no. 13, p. 130506, March 2011.  
[33] A. Fedorov, L. Steffen, M. Baur, M. da Silva and A. Wallraff, "Implementation of a 
Toffoli gate with superconducting circuits," Nature, vol. 481, no. 7380, pp. 170-172, 
2012.  
[34] D. Maslov, C. Young, D. Miller and G. Dueck, "Quantum circuit simplification 
using templates," in Design, Automation and Test in Europe (DATE), Europe, 
2005.  
[35] V. V. Shende, A. K. Prasad, I. L. Markov and J. P. Hayes, "Synthesis of reversible logic circuits," 
IEEE Transactions on CAD, vol. 22, no. 6, pp. 710-722, 2003.  
[36] G. Yang, X. Song, W. N. N. Hung and M. A. Perkowski, "Fast synthesis of exact minimal reversible 
circuits using group theory," in ASP Design Automation, Asia and South Pacific, 
2005.  
[37] D. Große, R. Wille, G. W. Dueck and R. Drechsler, "Exact multiple control Toffoli network 
synthesis with SAT techniques," IEEE Transaction on CAD, vol. 28, no. 5, pp. 
703-715, 2009.  
   
[38] W. Paul and H. Stone, "A new mass spectrometer without magnetic field," 
Zeitschrift Naturforschung, vol. 8, p. 448, 1953.  
[39] D. DiVincenzo, "Quantum computers, factoring, and decoherence," Science, no. 
270, p. 255, 1995.  
[40] T. Heinrichs and R. Hughes, "A Quantum Information Science and Technology 
Roadmap," 2004. 
[41] J. Labaziewicz, "High Fidelity Quantum Gates with Ions in Cryogenic 
Microfabricated Ion Traps," 2008. 
[42] M. Brooks, "Quantum computers are coming," NewScientist Tech, vol. 203, no. 
2726, pp. 42-45, Sept 2009.  
[43] IBM, Aug 2000. [Online]. Available: http://www-
03.ibm.com/press/us/en/pressrelease/1587.wss. 
[44] M. G. Raizen, J. M. Gilligan, J. Bergquist, W. M. Itano and D. J. Wineland, "Ionic 
Crystals in a Linear Paul Trap," Physics Review A, vol. 45, no. 9, p. 6493, May 
1992.  
[45] C. Bennett and G. Brassard, "Quantum Cryptography: Public key distribution and 
coin tossing.," in IEEE International Conference on Computers, Systems and 
Signal Processing, Bangalore, India, 1984.  
[46] E. Allen and M. Karageorgis, "Radar systems and methods using entangled 
quantum particles". USA Patent 7,375,802, 20 May 2008. 
[47] R. P. Feynman, "Simulating Physics with Computers," International Journal of 
Theoretical Physics, vol. 21, p. 467, 1982.  
[48] D. Deutsch, "Quantum theory, the church-turing principle and the universal 
quantum computer," Royal Society London Series A, vol. 400, p. 97, 1985.  
[49] P. Shor, "Algorithms for quantum computation: Discrete log and factoring," in The 
35th Annual Symposium on the Foundations of Computer Science, Los Alamitos, 
CA, 1994.  
[50] L. K. Grover, "A fast quantum mechanical algorithm for database search," in The 
28th Annual ACM Symposium on the Theory of Computing, New York, NY, 1996.  
[51] A. M. Steane, "Error correcting codes in quantum theory," Physics Review Letters, 
no. 77, p. 793, 1996.  
   
[52] P. W. Shor, "Scheme for reducing decoherence in quantum computer memory,"
Physical Review A, vol. 52, p. R2493, 1995.
[53] Wolfram MathWorld, "Finite Field," [Online]. Available:
http://mathworld.wolfram.com/FiniteField.html.
[54] M. Khan, M. Perkowski and P. Kerntopf, "Multi-Output Galois Field Sum of 
Product Synthesis with New Quantum Cascades," in ISMVL, 2003.  
[55] J. Jordan, "Studies of Infinite Two-Dimensional Quantum Lattice Systems with
Projected Entangled Pair States," The University of Queensland, Queensland,
Australia, 2011.
[56] D. M. Miller, D. Maslov and G. W. Dueck, "A transformation based algorithm for
reversible logic synthesis," in Design Automation Conference, Anaheim, 2003.
[57] R. Wille and R. Drechsler, "Effect of BDD optimization on synthesis of reversible 
and quantum logic," in Workshop on Reversible Computation, 2009.  
[58] M. Soeken, R. Wille and R. Drechsler, "Hierarchical synthesis of reversible circuits 
using positive and negative Davio decomposition," in 5th International Design and 
Test Workshop, Abu Dhabi, 2010.  
[59] R. Wille, M. Saeedi and R. Drechsler, "Synthesis of reversible functions beyond
gate count and quantum cost," in Int'l Workshop on Logic Synthesis, 2009.
[60] P. Gupta, A. Agrawal and N. Jha, "An Algorithm for Synthesis of Reversible Logic
Circuits," 2010.
[61] V. Shende, S. Bullock and I. Markov, "Synthesis of Quantum-Logic Circuits,"
IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, vol. 25, no. 6, p. 1000, 2006.
[62] D. Miller and M. Thornton, "QMDD: A Decision Diagram Structure for Reversible 
and Quantum Circuits," in ISMVL 2006, 2006.  
[63] A. Agrawal and N. Jha, "Synthesis of Reversible Logic," in DATE, Paris, 2004.  
[64] J. Donald and N. Jha, "Reversible Logic Synthesis with Fredkin and Peres gates,"
in ACM Journal on Emerging Technologies in Computing Systems, 2008.
[65] G. Dueck and D. Maslov, "Reversible Function Synthesis with Minimum Garbage 
Outputs," in 6th International Symposium on Representations and Methodology of 
Future Computing Technologies; RM, Trier, Germany, 2003.  
   
[66] P. Gupta, A. Agrawal and N. Jha, "An algorithm of synthesis of reversible logic 
circuits," in IEEE Transactions on CAD, 2006.  
[67] A. Khlopotine, M. Perkowski and P. Kerntopf, "Reversible Logic Synthesis by gate 
Composition," in Proc. IWLS, 2002.  
[68] D. Maslov and G. Dueck, "Improved quantum cost for k-bit Toffoli gates,"
Electronics Letters, vol. 39, no. 25, pp. 1790-1791, 2003.
[69] M. Kumar, B. Year, N. Metzger, Y. Wang and M. Perkowski, "Realization of
incompletely specified functions in minimized reversible circuits," in Proceedings
Reed-Muller, 2007.
[70] M. Kumar, "Realization of Incompletely Specified Reversible Functions," in 
Thesis, Portland State University, 2008.  
[71] A. Mishchenko and M. Perkowski, "Fast Heuristic Minimization of Exclusive Sum
of Products," in Proc. Reed-Muller Workshop, Starkville, 2001.
[72] M. Saeedi, M. Sedighi and M. Zamani, "A Novel Synthesis Algorithm for 
Reversible Circuits," in Proc. ICCAD, San Jose, 2007.  
[73] R. Wille and R. Drechsler, "BDD-based synthesis of reversible logic for large
functions," in Proc. DAC, San Francisco.
[74] J. Rice, K. Fazel, M. Thornton and K. Kent, "Toffoli Gate Cascade Generation 
using ESOP Minimization and QMDD-based Swapping," in Reed-Muller 
workshop, 2009.  
[75] N. Alhagi, "A Synthesis method to synthesize reversible functions using Arbitrary
Gates and Relational Specifications," Portland State University, PhD Thesis, 2010.
[76] P. Kerntopf, "A new heuristic algorithm for reversible logic synthesis," Proc. DAC,
2004.
[77] "Wikipedia," [Online]. Available: http://en.wikipedia.org/wiki/Hasse_diagram. 
[78] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information,
Cambridge University Press, 2009.
[79] A. Barenco et al., "Elementary gates for quantum computation," Physical Review A,
vol. 52, pp. 3457-3467, 1995.
[80] F. Glover, "Future paths for integer programming and links to artificial
intelligence," Computers and Operations Research, vol. 13, no. 5, pp. 533-549,
1986.
[81] F. Glover and M. Laguna, "Tabu Search," in Modern Heuristic Techniques for
Combinatorial Problems, John Wiley & Sons, 1993.
[82] B. Abdalhaq, "A Methodology to Enhance the Prediction of Forest Fire
Propagation," Universitat Autònoma de Barcelona Report No. ISBN: 8468877816.
[83] M. Saeedi, "Unstructured Reversible Function 4 (urf4)," [Online]. Available: 
http://www.informatik.uni-bremen.de/rev_lib/function_details.php?id=89. 
[84] A. Barenco et al., "Elementary gates for quantum computation," Physical Review A,
vol. 52, pp. 3457-3467, 1995.
[85] X. Song, G. Yang, J. Yang, M. Perkowski and W. N. N. Hung, "Optimal Synthesis 
of Multiple Output Boolean Functions using a set of Quantum Gates by Symbolic 
Reachability Analysis," in CAD-ICS, 2006.  
[86] J. O'Brien, G. Pryde, A. White, T. Ralph and D. Branning, "Demonstration of an
all-optical quantum controlled-NOT gate," Nature, vol. 426, pp. 264-267, 2003.
[87] M. Perkowski, M. Lukac, D. Shah and M. Kameyama, "Synthesis of quantum 
circuits in Linear Nearest Neighbor model using Positive Davio Lattices," Facta 
Universitatis (NIS), vol. 24, no. 1, pp. 73-89, April 2011.  
[88] M. Saeedi, R. Wille and R. Drechsler, "Synthesis of Quantum Circuits for Linear
Nearest Neighbor Architectures," Quantum Information Processing, vol. 10, no. 3,
pp. 355-377, 2011.
[89] Z. Sasanian and D. M. Miller, "Transforming MCT Circuits to NCVW Circuits," 
Reversible Computing, Third International Workshop, pp. 77-88, 2011.  
[90] R. Stock and D. James, "A scalable, high-speed measurement-based quantum
computer using trapped ions," Physical Review Letters, vol. 102, no. 17, 2009.
[91] D. Jaksch, "Optical Lattices, ultracold atoms and quantum information processing," 
Contemporary Physics, vol. 45, no. 5, pp. 367-381, 2004.  
[92] B. Choi and R. Van Meter, "A Θ(√n)-Depth Quantum Adder on the 2D NTC Quantum
Computer Architecture," ACM Journal on Emerging Technologies in Computing
Systems, vol. 8, no. 3, 2012.
[93] M. M. Khan, A. Biswas, S. Chowdhury, M. Hasan and A. Khan, "Synthesis of
GF(3) Based Reversible/Quantum Logic Circuits Without Garbage Output," in 39th
International Symposium on Multiple-Valued Logic (ISMVL'09), Naha, Okinawa,
2009.
[94] A. Al-Rabadi, L. Casperson, M. Perkowski and Z. Song, "Multiple Valued 
Quantum Logic".  
[95] A. Al-Rabadi, "Quantum Circuit Synthesis Using Classes of GF(3) Reversible Fast
Spectral Transforms," in Proceedings of the 34th International Symposium on
Multiple Valued Logic (ISMVL'04), Toronto, 2004.
[96] B. Bollig, M. Lobbing, M. Sauerhoff and I. Wegener, "On The Complexity of the 
Hidden Weighted Bit Function for Various BDD Models," Informatique Theorique 
et Applications, vol. 33, no. 2, pp. 103-116, 1999.  
[97] A. Prasad, V. Shende, I. Markov, J. Hayes and K. Patel, "Data Structures and
Algorithms for Simplifying Reversible Circuits," ACM Journal on Emerging
Technologies in Computing Systems (JETC), vol. 2, no. 4, pp. 277-293, October
2006.
[98] V. Zhirnov, R. K. Cavin, J. Hutchby and G. Bourianoff, "Limits to Binary Logic
Switch Scaling - A Gedanken Model," Proc. of the IEEE, vol. 91, no. 11,
pp. 1934-1939, 2003.
[99] A. Patino, Reversible Logic Synthesis Using a Non-blocking Order Search,
Portland OR: Portland State University, 2010.
[100] M. Perkowski et al., "A Hierarchical Approach to Computer-Aided Design of
Quantum Circuits," 2002.
[101] A. Chakrabarti and S. Sur-Kolay, "Nearest Neighbour based Synthesis of Quantum 
Boolean Circuits," Engineering Letters, vol. 15, no. 2, Nov 2007.  
[102] A. Younes and J. Miller, "Representation of Boolean Quantum Circuits as Reed
Muller Expressions," arXiv: quant-ph/0304134, May 2003.
[103] K. Fazel, M. Thornton and J. Rice, "ESOP-based Toffoli Gate Cascade 
Generation," in IEEE Pacific Rim Conference on Communications, Victoria, BC, 
2007.  
[104] S. A. Cuccaro, T. G. Draper, S. A. Kutin and D. P. Moulton, "A new quantum
ripple-carry addition circuit," Quantum Physics, 2004.
[105] A. Al-Rabadi, "Reversible fast permutation transforms for quantum circuit
synthesis," in Proceedings of the 34th International Symposium on Multiple-Valued
Logic (ISMVL'04), Toronto, 2004.
 
 
 
 
