RSFQ digital circuit design automation and optimisation by Muller, Louis C.




Dissertation presented for the degree of Doctor of Philosophy
in Electronic Engineering in the Faculty of Engineering at
Stellenbosch University
Promoter:
Prof. Coenrad J. Fourie





By submitting this dissertation electronically, I declare that the entirety of the
work contained therein is my own, original work, that I am the sole author
thereof (save to the extent explicitly otherwise stated), that reproduction and
publication thereof by Stellenbosch University will not infringe any third party
rights and that I have not previously in its entirety or in part submitted it for
obtaining any qualiﬁcation.
Date: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Copyright c© 2015 Stellenbosch University
All rights reserved.
i
Stellenbosch University  https://scholar.sun.ac.za
Abstract
In order to facilitate the creation of complex and robust RSFQ digital logic
circuits an extensive library of electronic design automation (EDA) tools is a
necessity. It is the aim of this work to introduce various methods to improve
the current state of EDA in RSFQ circuit design.
Firstly, Monte Carlo methods such as Latin Hypercube sampling and Sobol
sequences are applied for their variance reduction abilities in approximating
circuit yield. In addition, artiﬁcial neural networks are also investigated for
their applicability in modeling the parameter-yield space.
Secondly, a novel technique for circuit functional testing using automated
state machine extraction is presented, which greatly simpliﬁes the logical ver-
iﬁcation of a circuit. This method is also used, along with critical timing
extraction, to automatically generate Hardware Description Language(HDL)
models which can be used for high level circuit design.
Lastly, the Greedy Local search, Simulated Annealing and Genetic Algo-
rithm meta-heuristics were statistically compared in a novel manner using a
yield model provided by artiﬁcial neural networks. This is done to ascertain
their performance in optimising RSFQ circuits in relation to yield.
The variance reduction techniques of Latin Hypercube Sampling and Sobol
sequences were shown to be beneﬁcial for the use with RSFQ circuits. For
optimisation purposes the use of Simulated Annealing and Genetic Algorithms
were shown to improve circuit optimisation for possible multi-modal search
spaces. An HDL model is also successfully generated from a complex RSFQ
circuit for use in high level circuit design which includes critical timing and
propagation latency.
All the techniques presented in this study form part of a software library
that can be further reﬁned and extended in future work.
ii
Stellenbosch University  https://scholar.sun.ac.za
Acknowledgements
My gratitude must always go foremost to the Almighty for granting me the
ability and perseverance to complete this project while being employed as an
engineer in the industry.
I would also like to express my sincerest gratitude to the following people:
To my unbelievable wife who continued to encourage me through late
nights, early mornings and looming work deadlines, all the while raising a
very busy Little Man. My gratitude and love for you cannot be expressed in
prose or the most complex of quantum mechanical equations.
Prof. C. J. Fourie, my supervisor at the University of Stellenbosch, for
his patience and guidance throughout this whole project. Thank you, espe-
cially, for understanding when work sometimes interfered with studies and for
granting me the freedom to investigate various weird and wonderful research
avenues (which might not all have been successful).
The various open source projects that allowed the investigation and soft-
ware creation for this project: CodeLite, Boost, FANN, Matplotlib and the
work of S. Joe and F. Y. Kuo on Sobol sequences.
Thank you also to Dr. N. J. Els at the University of Stellenbosch for
creating the LATEXtemplates that have been used to compile this thesis.
To Tristan Goss, my employer, thank you for allowing me the time to
complete this project.
For those members of my family who have already passed on. Your inﬂu-
ence on my life, though far too short, can never be understated. Your example
will remain with me forever.
iii






List of Figures vii
List of Tables x
Nomenclature xii
1 Introduction 1
1.1 Current state of RSFQ Electronic Design Automation . . . . . . 2
1.2 Document Layout . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Background Information 6
2.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Properties of Superconductors . . . . . . . . . . . . . . . . . . . 6
2.3 Josephson Junctions . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Rapid Single Flux Quantum Circuits . . . . . . . . . . . . . . . 12
3 Monte Carlo Methods for Yield Estimation 16
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 Latin Hypercube Sampling . . . . . . . . . . . . . . . . . . . . . 18
3.3 Sobol Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 Practical Example . . . . . . . . . . . . . . . . . . . . . . . . . 23
4 Artiﬁcial Neural Networks 30
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 Artiﬁcial Neural Network Fundamentals . . . . . . . . . . . . . 30
4.2.1 The Building Blocks of Artiﬁcial Neural Networks . . . . 31
4.2.2 Training through the Back-Propagation method . . . . . 34
iv
Stellenbosch University  https://scholar.sun.ac.za
CONTENTS v
4.3 Practical Application . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5 State Machine and Timing Extraction 52
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2 Method Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3 Flux Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4 Detailed Method Description . . . . . . . . . . . . . . . . . . . . 58
5.4.1 User Inputs . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.4.2 SPICE parsing, ﬂattening and graph creation . . . . . . 60
5.4.3 Inductive Loop Identiﬁcation . . . . . . . . . . . . . . . 60
5.4.4 State Machine Extraction . . . . . . . . . . . . . . . . . 62
5.4.5 Critical Timing Extraction . . . . . . . . . . . . . . . . . 64
5.4.6 HDL Generation . . . . . . . . . . . . . . . . . . . . . . 72
6 Yield Optimisation of RSFQ Circuits 81
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.2 Meta-Heuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2.1 Greedy Local Search . . . . . . . . . . . . . . . . . . . . 83
6.2.2 Simulated Annealing . . . . . . . . . . . . . . . . . . . . 85
6.2.3 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . 88
6.3 Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.4 Search space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.5 Solution Representation . . . . . . . . . . . . . . . . . . . . . . 96
6.6 Testing Methodology . . . . . . . . . . . . . . . . . . . . . . . . 97
6.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.7.1 Greedy Local Search . . . . . . . . . . . . . . . . . . . . 98
6.7.2 Simulated Annealing . . . . . . . . . . . . . . . . . . . . 100
6.7.3 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . 102
6.8 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7 Conclusion and Future Work 106
Appendices 109
A RSFQ Automation and Optimisation Library 110
A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
A.2 Library Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 111
A.2.1 App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
A.2.2 SPICE File . . . . . . . . . . . . . . . . . . . . . . . . . 112
A.2.3 Functional Test . . . . . . . . . . . . . . . . . . . . . . . 112
A.2.4 Merit Test . . . . . . . . . . . . . . . . . . . . . . . . . . 113
A.2.5 ANN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
A.2.6 Greedy Local Search . . . . . . . . . . . . . . . . . . . . 114
Stellenbosch University  https://scholar.sun.ac.za
CONTENTS vi
A.2.7 Simulated Annealing Algorithm . . . . . . . . . . . . . . 114
A.2.8 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . 115
A.3 Basic User Guide . . . . . . . . . . . . . . . . . . . . . . . . . . 115
A.3.1 Main Conﬁguration File . . . . . . . . . . . . . . . . . . 116
A.3.2 Functional Conﬁguration File . . . . . . . . . . . . . . . 118
A.3.3 Greedy Local Search Conﬁguration File . . . . . . . . . . 118
A.3.4 Simulated Annealing Conﬁguration File . . . . . . . . . . 120
A.3.5 Genetic Algorithm Conﬁguration File . . . . . . . . . . . 122
A.3.6 Artiﬁcial Neural Network Conﬁguration File . . . . . . . 124
A.3.7 Conﬁguration File Examples . . . . . . . . . . . . . . . . 126
B Results of ANN Training Runs 129
Bibliography 134
Stellenbosch University  https://scholar.sun.ac.za
List of Figures
2.1 Superconductor as an ideal inductor. . . . . . . . . . . . . . . . . . 7
2.2 Superconductor as a ﬂux insulator. . . . . . . . . . . . . . . . . . . 7
2.3 Weak link in a superconductor. . . . . . . . . . . . . . . . . . . . . 8
2.4 RCSJ model of Josephson junction with approximation . . . . . . . 10
2.5 Illustration of Josephson junction pendulum analogy . . . . . . . . 12
2.6 Circuit schematic of a Josephson transmission line. . . . . . . . . . 13
2.7 Circuit schematic of an RSFQ Destructive-Flip-Flop . . . . . . . . 14
2.8 Circuit schematic of an RSFQ Pulse Splitter . . . . . . . . . . . . . 15
3.1 Obtaining Latin Hypercube samples from a Gaussian distribution. . 19
3.2 Example of Latin Hypercube sampling of a Gaussian random variable. 20
3.3 Illustration of pseudo-random vs quasi-random sampling. . . . . . . 21
3.4 Circuit schematic of an un-optimised RSFQ AND gate. . . . . . . . 24
3.5 Circuit schematic of an un-optimised RSFQ XOR Gate. . . . . . . . 25
3.6 Monte Carlo method variance comparison for an RSFQ AND Gate. 27
3.7 Monte Carlo Method Variance Comparison of RSFQ XOR Gate . . 28
4.1 The McCulloch-Pitts neuron model. . . . . . . . . . . . . . . . . . . 31
4.2 ADALINE and Perceptron elements. . . . . . . . . . . . . . . . . . 32
4.3 Single element sample space of the AND function. . . . . . . . . . . 32
4.4 Single element sample space of the XOR function. . . . . . . . . . . 33
4.5 Example of a multi-layer ANN topology. . . . . . . . . . . . . . . . 33
4.6 A Sigmoid function. . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.7 Multi-layer ANN topology as back-propagation example. . . . . . . 36
4.8 Circuit schematic of an un-optimised RSFQ AND Gate. . . . . . . 38
4.9 Circuit schematic of an un-optimised RSFQ OR Gate. . . . . . . . 38
4.10 Training strategy ﬂow diagram for artiﬁcial neural networks. . . . . 42
4.11 Sobol sequence yield map of junction B2 and inductor L2 for the
RSFQ AND gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.12 ANN yield map of junction B2 and inductor L2 using network 3
run 2 for the RSFQ AND gate. . . . . . . . . . . . . . . . . . . . . 45
4.13 ANN Yield map of junction B2 and inductor L2 using network 5
run 1 for the RSFQ AND gate. . . . . . . . . . . . . . . . . . . . . 45
vii
Stellenbosch University  https://scholar.sun.ac.za
LIST OF FIGURES viii
4.14 ANN yield map of junction B2 and inductor L2 using network 9
run 1 for the RSFQ AND gate. . . . . . . . . . . . . . . . . . . . . 46
4.15 Sobol sequence yield map of junction B7 and inductor L6 for the
RSFQ OR gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.16 ANN yield map of junction B7 and inductor L6 for network 3 run
1 for the RSFQ OR gate. . . . . . . . . . . . . . . . . . . . . . . . . 48
4.17 ANN yield map of junction B2 and inductor L2 using network 6
run 2 for the RSFQ OR gate. . . . . . . . . . . . . . . . . . . . . . 49
4.18 ANN yield map of junction B2 and inductor L2 using network 9
run 4 for the RSFQ OR gate. . . . . . . . . . . . . . . . . . . . . . 50
5.1 Calculation of change in magnetic ﬂux. . . . . . . . . . . . . . . . . 56
5.2 Example of a ﬂux calculation exception. . . . . . . . . . . . . . . . 57
5.3 Circuit schematic of an RSFQ D-Flip-Flop. . . . . . . . . . . . . . 59
5.4 Un-directed graph of D-Flip-Flop components. . . . . . . . . . . . . 60
5.5 Finding the smallest loop containing inductor L1. . . . . . . . . . . 61
5.6 State machine extraction ﬂow diagram. . . . . . . . . . . . . . . . . 62
5.7 Illustration of set-up and hold times. . . . . . . . . . . . . . . . . . 64
5.8 A simpliﬁed RSFQ AND gate with Mealy state diagram. . . . . . . 65
5.9 Empirical critical timing extraction algorithm. . . . . . . . . . . . . 67
5.10 Current stability critical timing extraction algorithm. . . . . . . . . 71
5.11 Example error veriﬁcation VHDL code . . . . . . . . . . . . . . . . 72
5.12 Schematic circuit of an unoptimised reconﬁgurable DFF-JTL . . . . 73
5.13 Extracted DFF-JTL Mealy state machine representation. . . . . . . 76
5.14 Error veriﬁcation VHDL code for the DFF-JTL circuit using the
empirical extraction method. . . . . . . . . . . . . . . . . . . . . . . 77
5.15 Error veriﬁcation VHDL code for the DFF-JTL circuit using the
current stability method. . . . . . . . . . . . . . . . . . . . . . . . . 78
5.16 Error veriﬁcation VHDL code for the DFF-JTL circuit using the
current stability method continued. . . . . . . . . . . . . . . . . . . 79
5.17 VHDL state machine implementation of the DFF-JTL circuit. . . . 80
5.18 Functional waveforms of DFF-JTL VHDL simulation. . . . . . . . . 80
5.19 SPICE transient simulation of DFF-JTL circuit. . . . . . . . . . . . 80
6.1 Greedy local search algorithm ﬂow diagram. . . . . . . . . . . . . . 84
6.2 Graph of probability of selecting an inferior solution using simulated
annealing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.3 Simulated annealing algorithm ﬂow diagram. . . . . . . . . . . . . . 87
6.4 Genetic algorithm ﬂow diagram. . . . . . . . . . . . . . . . . . . . . 89
6.5 Genetic algorithm crossover illustration. . . . . . . . . . . . . . . . 90
6.6 RSFQ-AND circuit schematic. . . . . . . . . . . . . . . . . . . . . . 93
6.7 RSFQ-OR circuit schematic. . . . . . . . . . . . . . . . . . . . . . . 93
6.8 Sobol sequence yield map of B2 and L2 for the RSFQ AND gate. . 95
6.9 Sobol sequence yield map of B7 and L6 for the RSFQ OR gate. . . 96
Stellenbosch University  https://scholar.sun.ac.za
LIST OF FIGURES ix
A.1 App Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
A.2 SPICE File Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
A.3 Functional Test Class . . . . . . . . . . . . . . . . . . . . . . . . . . 113
A.4 Merit Test Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
A.5 Merit Test Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
A.6 Greedy Local Search Class . . . . . . . . . . . . . . . . . . . . . . . 114
A.7 Simulated Annealing Class . . . . . . . . . . . . . . . . . . . . . . . 115
A.8 Merit Test Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
A.9 Main Conﬁguration Example . . . . . . . . . . . . . . . . . . . . . 126
A.10 Linked Components Example . . . . . . . . . . . . . . . . . . . . . 126
A.11 Functional Conﬁguration Example . . . . . . . . . . . . . . . . . . 126
A.12 Tolerance Conﬁguration Example . . . . . . . . . . . . . . . . . . . 127
A.13 Junction Conﬁguration Example . . . . . . . . . . . . . . . . . . . 127
A.14 Genetic Algorithm conﬁguration Example . . . . . . . . . . . . . . 128
A.15 Artiﬁcial Neural Network conﬁguration Example . . . . . . . . . . . 128
Stellenbosch University  https://scholar.sun.ac.za
List of Tables
2.1 Analogy between Josephson junction parameters and pendulum pa-
rameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1 Random Samples across Two Gaussian Random Variables . . . . . 19
3.2 Sobol sequence calculation: values for mk and vk in binary. . . . . . 23
3.3 Values for x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Un-optimised RSFQ AND gate circuit parameters. . . . . . . . . . 25
3.5 Un-optimised RSFQ XOR gate circuit parameters. . . . . . . . . . 26
4.1 Sample space deﬁnition for yield modelling. . . . . . . . . . . . . . 39
4.2 Selected FANN parameters for yield modelling. . . . . . . . . . . . 41
4.3 Topologies of the investigated artiﬁcial neural networks. . . . . . . . 43
4.4 ANN yield modelling results for the RSFQ AND gate. . . . . . . . 43
4.5 ANN yield modelling results for the RSFQ OR gate. . . . . . . . . 46
5.1 DFF-JTL circuit parameters. . . . . . . . . . . . . . . . . . . . . . 74
6.1 IPHT 1 kA junction parameters. . . . . . . . . . . . . . . . . . . . 92
6.2 Un-optimised RSFQ AND gate parameters. . . . . . . . . . . . . . 94
6.3 Un-optimised RSFQ OR gate parameters. . . . . . . . . . . . . . . 94
6.4 ANN topologies chosen for optimisation investigation. . . . . . . . . 97
6.5 Yield optimisation search space bounds. . . . . . . . . . . . . . . . 98
6.6 Greedy local search parameter samples. . . . . . . . . . . . . . . . . 98
6.7 GLS statistical results for the RSFQ AND gate. . . . . . . . . . . . 99
6.8 GLS statistical results for the RSFQ OR gate. . . . . . . . . . . . . 100
6.9 Simulated annealing parameter samples. . . . . . . . . . . . . . . . 101
6.10 SA statistical results for the RSFQ AND gate. . . . . . . . . . . . . 101
6.11 SA statistical results for the RSFQ OR gate. . . . . . . . . . . . . . 102
6.12 Genetic algorithm parameter samples. . . . . . . . . . . . . . . . . 102
6.13 GA statistical results for the RSFQ AND gate. . . . . . . . . . . . 103
6.14 GA statistical results for the RSFQ OR gate. . . . . . . . . . . . . 104
A.1 Main Conﬁguration File Parameters . . . . . . . . . . . . . . . . . 116
A.2 Main Conﬁguration File Parameter Cont. . . . . . . . . . . . . . . . 117
A.3 Functional Conﬁguration File Parameter Cont. . . . . . . . . . . . 118
x
Stellenbosch University  https://scholar.sun.ac.za
LIST OF TABLES xi
A.4 Greedy Local Search Parameters . . . . . . . . . . . . . . . . . . . 118
A.5 Greedy Local Search Paramaters Cont. . . . . . . . . . . . . . . . . 119
A.6 Simulated Annealing Parameters . . . . . . . . . . . . . . . . . . . 120
A.7 Simulate Annealing Parameters Cont. . . . . . . . . . . . . . . . . . 121
A.8 Genetic Algorithm Parameters . . . . . . . . . . . . . . . . . . . . . 122
A.9 Genetic Algorithm Parameters Cont. . . . . . . . . . . . . . . . . . 123
A.10 Artiﬁcial Neural Network Parameters . . . . . . . . . . . . . . . . . 124
A.11 Artiﬁcial Neural Network Parameters Cont. . . . . . . . . . . . . . 125
B.1 ANN Results for AND gate Structure 1 . . . . . . . . . . . . . . . . 129
B.2 ANN Results for AND gate Structure 2 . . . . . . . . . . . . . . . . 129
B.3 ANN Results for AND gate Structure 3 . . . . . . . . . . . . . . . . 129
B.4 ANN Results for AND gate Structure 4 . . . . . . . . . . . . . . . . 130
B.5 ANN Results for AND gate Structure 5 . . . . . . . . . . . . . . . . 130
B.6 ANN Results for AND gate Structure 6 . . . . . . . . . . . . . . . . 130
B.7 ANN Results for AND gate Structure 7 . . . . . . . . . . . . . . . . 130
B.8 ANN Results for AND gate Structure 8 . . . . . . . . . . . . . . . . 131
B.9 ANN Results for AND gate Structure 9 . . . . . . . . . . . . . . . . 131
B.10 ANN Results for OR gate Structure 1 . . . . . . . . . . . . . . . . . 131
B.11 ANN Results for OR gate Structure 2 . . . . . . . . . . . . . . . . . 131
B.12 ANN Results for OR gate Structure 3 . . . . . . . . . . . . . . . . . 132
B.13 ANN Results for OR gate Structure 4 . . . . . . . . . . . . . . . . . 132
B.14 ANN Results for OR gate Structure 5 . . . . . . . . . . . . . . . . . 132
B.15 ANN Results for OR gate Structure 6 . . . . . . . . . . . . . . . . . 132
B.16 ANN Results for OR gate Structure 7 . . . . . . . . . . . . . . . . . 133
B.17 ANN Results for OR gate Structure 8 . . . . . . . . . . . . . . . . . 133
B.18 ANN Results for OR gate Structure 9 . . . . . . . . . . . . . . . . . 133
Stellenbosch University  https://scholar.sun.ac.za
Nomenclature
Abbreviations
RSFQ Rapid Single Flux Quantum
FPGA Field Programmable Gate Array
IC Integrated Circuit
ASIC Application Speciﬁc Integrated Circuit
HDL Hardware Description Language
EDA Electronic Design Automation
HTS High Temperature Superconductor
LTS Low Temperature Superconductor
FF Flip-Flop
DC Direct Current
SFQ Single Flux Quantum
JTL Josephson Transmission Line
SQUID Superconduting Quantum Interference Device
CMC Crude Monte Carlo
ANN Artiﬁcial Neural Network
ADALINE Adaptive Linear Elements
RTL Register Transfer Level




Stellenbosch University  https://scholar.sun.ac.za
Chapter 1
Introduction
The semiconductor industry has been advancing at a staggering rate since
integrated circuits became a commercial reality, mostly fuelled by an ever
increasing human need for processing power. In order to serve this need, world
leaders in integrated technology such as Intel and AMD have been pushing the
limits in terms of circuit performance. The competition for the highest clock
frequency might have been set aside due to the constraints of physics, but the
process node size continues to shrink, allowing more transistors to consume less
power on the same die size. Understandably, this progress has not been simple,
with current leaders, Intel, spending billions of dollars on ﬁnalising their latest
14 nm FinFET technology while already working on a 10 nm node size. One
might wonder how long the current node reduction trend can continue before
a replacement for the silicon-based devices becomes a necessity.
Possible replacements are the various digital logic families that use super-
conductive state of materials. One of the most promising of these families is
the Rapid Single Flux Quantum (RSFQ) [1] devices which exhibit the novel
behaviour of using the smallest unit of magnetic ﬂux as its unit of information
storage and propagation. Complex circuits employing this logic family and op-
erating at frequencies far above the capabilities of semiconductor devices have
already been tested [2, 3]. There are, however, immense challenges facing the
RSFQ community before this technology can become part of the mainstream.
The key obstacle remains the superconductive condition itself. Although
superconductivity has been shown to occur in materials at above 100 Kelvin
[4], the materials used for these achievements are not easily manufactured in
integrated circuits. Low temperature superconductors, such as Niobium, with
critical temperatures of around 4 Kelvin, are therefore still predominantly
employed in the superconducting digital logic industry. Moreover, expensive
cryogenic equipment is necessary to retain the circuits in their operating region.
Research in higher temperature superconductors and cryogenic equipment is
still ongoing.
Another obstacle facing the RSFQ community is high operating frequency
itself. At the tens of gigahertz where these circuits operate, clock skew evolves
1
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 1. INTRODUCTION 2
into a substantial problem. In eﬀect, the circuit operates at frequencies that
are comparable to the propagation delay of the information carriers themselves,
which makes timing closure very diﬃcult. However, some novel approaches to
timing and asynchronous methodologies [5] have been proposed which might
be more suited to this technology.
The challenges faced when designing timing critical, complex circuits are
not unique to RSFQ though, it is a problem that has already been researched
for decades in the semiconductor industry. In order to alleviate such design
challenges, various pieces of software, called electronic design automation soft-
ware, were developed. These software packages permit the automation of
various steps in the design process in order to allow designers to create more
complex circuitry without the need to manually follow through with the com-
plete implementation. Such software packages are currently an essential part
of the circuit design process, be it in the arena of the Field Programmable
Gate Array(FPGA) or the Application Speciﬁc Integrated Circuit (ASIC).
The aim of the presented work is to investigate the current state of RSFQ
electronic design automation in order to propose various fundamental improve-
ments to the design methodology. A secondary deliverable is also provided in
the form of a software library that contains the methodological improvements
presented in this work.
1.1 Current state of RSFQ Electronic Design
Automation
Although the design of superconductor electronic circuits has matured signiﬁ-
cantly in the past few decades, most of the steps still necessitate a substantial
amount of human input. This is in stark contrast to the current state of circuit
design in the semi-conductor industry where a very high level of abstraction
is available to designers so that they can focus on the speciﬁc function being
implemented rather than the details of the implementation itself. An obvious
example is the high level synthesis provided by Xilinx [6] where applications
can be written in C, C++ or SystemC and the provided software facilitates a
direct implementation to the Hardware Description Language(HDL) level, and
in this case, the targeted FPGA. The MathWorks [7] arguably takes the pro-
cess a step further by supplying a block diagram-based environment (Simulink)
from which an HDL implementation can be generated and after which a vendor
speciﬁc synthesis process can commence.
It was shown in [8] that the state of Electronic Design Automation(EDA)
software used for superconducting circuit development tends to be substan-
tially fragmented, with many institutions using in-house solutions that have
not been updated for a relatively long period of time. Therefore, opportunities
abound to improve the current state of EDA software in the RSFQ industry.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 1. INTRODUCTION 3
In order to keep the scope of the presented work tractable, two areas that are
in need of improvement were selected for further investigation. The ﬁrst area
was that of circuit yield optimisation. Yield designates the percentage of man-
ufactured circuits that are deemed to be functional. The optimisation of yield
is therefore crucial for the creation of robust circuits that can form part of a
larger design. Yield is generally calculated by using a Monte Carlo method
that relies on many simulation runs. The yield calculations and optimisation
therefore tend to be time-consuming.
The current state of EDA software in the superconductor electronics in-
dustry employed in yield optimisation can be summarised as follows:
• In order to calculate the circuit yield, pass or fail criteria need to be
created. This is generally achieved manually (usually using a SPICE
test bench).
• To ascertain whether a circuit is functional or not, results from the pre-
viously mentioned test bench are analysed in terms of junction switches
and output pulses. This is achieved either manually or by using a pro-
gramming script.
• A Crude Monte Carlo approximation is generally used for yield calcula-
tions which necessitates many simulation runs in order to obtain a usable
conﬁdence interval.
• Numerous yield optimisation algorithms are actively used, mostly with
a stochastic foundation. It is therefore very challenging to compare dif-
ferent algorithms and diﬀerent variants of the same algorithm.
The second area under investigation was the creation of Hardware De-
scription Language (HDL) models for high level circuit design. A Hardware
Description Language is a programmatic input method used to describe circuit
functionality and circuit interconnections. The usage of HDL has become a
part of the mainstream in the semiconductor industry due to the ease with
which complex circuitry can be described. The circuit HDL description along
with a robust cell library lends itself to a fast and repeatable design cycle. As
is indicated in [8], many methods of HDL modeling have already been pro-
posed. However, these models tend to be constructed using various diﬀerent
methodologies which can be time consuming. Furthermore, no standard exist
for the creation of these models making the inter-institution exchange of cell
libraries diﬃcult.
The current state of EDA software in the superconductor electronics in-
dustry employed in HDL model creation is as follows:
• State machine representations are created manually or via script by mon-
itoring circuit junction switches and output pulse generation.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 1. INTRODUCTION 4
• Timing characterisation is achieved either manually or via script by mon-
itoring circuit junction switches and output pulse generation.
• Model construction methodologies tend to be institution speciﬁc.
The aim of this work is to propose various methods to improve the state
of EDA software in the superconducting electronics industry by focusing on
automating various steps in the design cycle and proposing algorithmic im-
provements to reduce the design time.
The aims are thus:
• To automate the creation of pass or fail test benches and interpret the
results.
• To propose various methods for faster yield analysis.
• To propose a method for the comparison of diﬀerent optimisation algo-
rithms and the comparison of diﬀerent variants of the same algorithms.
• To propose a method for the automated creation of an HDL model with
its accompanying timing characterisation in a repeatable, standardised
manner.
1.2 Document Layout
In Chapter 2 of this document a high level explanation of the phenomenon
of superconductivity is presented. This is followed by a discussion on the
functional behaviour of RSFQ circuits.
Chapter 3 proposes various Monte Carlo methods for approximating the
process yield of an RSFQ circuit. Here, Latin Hypercube sampling and Sobol
sequences are compared with Crude Monte Carlo in terms of their conﬁdence
interval.
Chapter 4 introduces artiﬁcial neural networks speciﬁcally for approximat-
ing circuit yield. The results of this section are discussed in Chapter 6, with
its applicability to circuit optimisation.
Chapter 5 presents an automated state machine extraction algorithm which
can be applied to RSFQ circuits for failure testing and HDL model generation.
The timing characterisation of the circuits are also investigated for use with
the HDL models.
In Chapter 6 various meta-heuristic optimisation algorithms are compared
by using an artiﬁcial neural network for yield approximation. The artiﬁcial
neural network is employed to facilitate vast numbers of optimisation runs in
order to investigate the statistical performance of the meta-heuristics.
The conclusion and possible future work are presented in Chapter 7. An
appendix is also attached which contains a broad overview of the software
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 1. INTRODUCTION 5
library of the algorithms presented in this document and the results of the
artiﬁcial neural network investigation.




Superconductivity as a physical phenomena was ﬁrst discovered in 1911 by a
Dutch physicist named Heike Kammerlingh Onnes [4], a pioneer in cryogen-
ics. His aim was to observe the properties of metals in liquid helium. The
understanding at the time was that the resistance should decrease by some
continues function as the temperature decreases. When testing mercury how-
ever, the physicist identiﬁed a very sharp drop in temperature, down to zero
ohm at around 4 Kelvin, in eﬀect a discontinuity that was not observed be-
fore. Soon many other metals exhibiting the same behaviour was found. The
temperature where this sudden transition to zero ohm occurred was dubbed
the critical temperature Tc. For a long time the metal with the highest Tc
was Niobium, with a value of 9 Kelvin. However, in 1987 a new type of su-
perconductor material was identiﬁed with Tc values far higher than previously
recorded. Materials, such as Y Ba2Cu3O7 with Tc values of around 92 Kelvin
became known as High Temperature Superconductors (HTS) while the previ-
ously found materials were known as Low Temperature Superconductors(LTS).
As discussed in this document, it is evident that the usage of LTS materials
is still very prevalent in the manufacturing of digital logic circuits. This is due
to the ease of manufacturing with materials such as Niobium. In contrast,
the manufacturing of HTS circuitry tends to pose various challenges which
fall outside the scope of this work. It is therefore still necessary to employ
relatively expensive cryo-coolers in order to test the LTS circuits.
2.2 Properties of Superconductors
The currently accepted theory behind superconductors, called the BCS theory
(named after the J. Bardeen, L. Cooper and J. Shrieﬀer), makes heavy use of
quantum mechanics to describe how and why superconductors function as they
do. In this work, a more intuitive method is used to describe their behaviour by
6
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 2. BACKGROUND INFORMATION 7
using some of the properties of the BCS theory and following the explanations
in [4]. Interested readers are directed to [9] for a more thorough investigation
of the theory behind superconductivity.
Figure 2.1: Superconductor as an ideal inductor.
A useful analogy to use when dealing with superconductors is to model
the superconductor as an ideal inductor. An example is illustrated in Figure
2.1. Assuming that S2 is open and S1 is closed for a short period, energy will
be stored in the circuit inductor as a magnetic ﬁeld as long as these switch
conﬁgurations are held constant. If, for example, the switch positions are
reversed (S1 open and S2 closed), the current in the circuit will continue to ﬂow
and the stored energy will eventually be completely dissipated in the resistor.
In superconductors, though, the resistance value is zero, which means that
due to the stored magnetic ﬁeld the current will continue to ﬂow forever. This
has been shown to be the case in practise, where no substantial changes were
found when using superconductors in experiments where the current value was
tested over a period of years. It is noteworthy, though, that superconductors
have a current limit called the critical current, Ic. If a current level above this
value is applied through the superconductor, the latter will enter its resistive
state, and hence behave like a normal conductor.
Figure 2.2: Superconductor as a ﬂux insulator.
Another way of interpreting the superconductive state is to rather investi-
gate the change in magnetic ﬂux. In Figure 2.2 where the loop is of a material
in a supeconductive state, the voltage source will continually add magnetic
ﬂux to the circuit due to V = dΦ/dt which is stored in the loop. The only
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 2. BACKGROUND INFORMATION 8
way for the ﬂux to escape the loop is through a resistance which is zero due
to the superconductive state. There is therefore no other means for the ﬂux
to leave the loop. If, for example, the voltage source is shorted and a magnet
is brought close to the loop, the change in magnetic ﬁeld due to the magnet
will cause an opposing current to form in an attempt to counteract the new
magnetic component (as is the principal behind electric generators). In a nor-
mal metal, as soon as the magnet stops moving (and the change in magnetic
ﬁeld returns to zero), this opposing current will decay with a time constant of
L/R. In superconducting circuits, however, the resistance value is zero which
indicates that the opposing current will continue to ﬂow even if the applied
magnetic ﬁeld is static. The shielding current will therefore perpetually cause
a counteractive magnetic ﬁeld to nullify the ﬂux component of the magnet. A
superconductor can therefore also be regarded as a "ﬂux insulator".
Figure 2.3: Weak link in a superconductor.
There are, however, ways in which ﬂux can enter and leave a supercon-
ducting loop under certain conditions. For instance, assuming that a super-
conducting loop, as shown in Figure 2.3, contains a weak link. This weak link
could, for example, be a narrowing of the superconducting wire, or a thin insu-
lator connecting the superconductive wires. Under normal circumstances, the
loop will hold its magnetic ﬂux indeﬁnitely, as explained above. If, however,
the current ﬂowing in the loop is larger than the critical current of the weak
link, ﬂux will be able to escape from the loop due to the weak link entering its
resistive state. What is interesting to note is that instead of a continuous loss
of ﬂux, the ﬂux will rather decrease by one ﬂuxon at a time. A ﬂuxon being
the smallest quantised measure of magnetic ﬂux with relation Φ = h/2e where
h is the Plank constant and e the electron charge. Each time a ﬂux leaves the
loop, a small voltage pulse is created due to V = dΦ/dt.
Another important concept in superconducting circuits is the quantisation
of ﬂux storage. Through the BCS theory, it has been established that the
charge carrying particles in a superconductor are Cooper pairs rather than
normal electrons. These are pairs of electrons coupled together as a result
of electron-phonon interactions [9]. The theory states that not only should
all these Cooper pairs be in the same energy state, but they should also be
governed by the same quantum mechanical wave equation with Ψ = Ψ0exp(jθ).
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 2. BACKGROUND INFORMATION 9
In this equation, Ψ20 is the number density of the Cooper pairs and θ is the
phase of the wave function. In eﬀect, all the wave functions of each Cooper
pair add up coherently to form a macroscopic particle. As described in [4],
when a voltage pulse with an integral equal to that of a single ﬂuxon is applied
to a superconducting loop, such as seen in Figure 2.3, a phase change of 2pi
is introduced around the loop so that the start and end phases of the wave
function align. In eﬀect, a continuous phase relationship is present around the
loop without any discontinuous phase jumps. Any discontinuities in this phase
relationship is energetically unfavourable given that the same wave function is
used for all the Cooper pairs in the loop. It is thus evident from this discussion
that only modula 2pi phase changes are allowed inside the superconducting loop
which is associated with the storage or loss of a single magnetic ﬂux quantum.
2.3 Josephson Junctions
From the previous discussion it is evident that in order to connect two super-
conducting wires, the phases of the wave functions must align. A deviation
from this condition might occur if a weak link, as depicted in Figure 2.3, is
formed between the wires. Consequently, the wave functions will be weakly
coupled and allow a diﬀerence in the phase across the link [4]. Such "weak
links" are called Josephson junctions, taking their name from B.D. Josephson
who ﬁrst theorised that electron pairs will be able to tunnel between closely
spaced superconductors [9]. The super current ﬂowing through the junction is
related to the diﬀerence in the phase of the two wave functions as presented
in equation 2.3.1
I = Icsinφ (2.3.1)
where φ = θ2 − θ1. The change in junction phase is related to the voltage










From equation 2.3.2, it can be surmised that if a constant voltage is applied
across the junction, an oscillating current will be created with frequency fJ =
2eV/~ = V/Φ0.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 2. BACKGROUND INFORMATION 10
Figure 2.4: RCSJ model of Josephson junction with approximation
A popular method of modelling a Josephson junction is the use of the
RCSJ model as depicted in Figure 2.4. In this model three current paths are
illustrated. The resistive path is the normal current when the instantaneous
current is larger than the junction critical current while the capacitor is formed
from the capacitive structure ,superconductor-insulator-superconductor, of the
junction itself. The total instantaneous current can therefore be written as [4]:

















where equation 2.3.2 is used to replace V . Since equation 2.3.3 is a non-
linear second-order diﬀerential equation, a linearisation around φ is used to











where φ′ = Φ0φ/2pi, V = dΦ′/dt. LJ0 = Φ0/2pi = ~/2eIc is the limit of the
junction inductance as Is << Ic as was derived in [4].
As is presented in [4], this describes a current-driven parallel RCL resonator
with resonant frequency (sometimes called the plasma frequency in supercon-
ducting literature) of:




and a Q value of






Analogous to the resonator, the Q value also describes how "damped" the
junction is, which plays a major role in the dynamic behaviour of the junction
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 2. BACKGROUND INFORMATION 11
as will be explained below. The Q value itself is rarely used in superconducting
literature but rather re-derived as Bc as indicated in equation 2.3.7.




It is interesting to note that equation 2.3.3 is analogous to that of a pen-





= Ta − ksin(φ)− ν dφ
dt
(2.3.8)
where Ta is the applied torque, µ = ml2 is the moment of inertia, d2φ/dt2
is the angular acceleration and k = mgl is the restoring constant. For each
of these components of the pendulum there is an analogous component in the
junction model of equation 2.3.3. This is presented in Table 2.1 which has
been reproduced from [4].
Figure 2.5 illustrates the pendulum analogy.
Junction Components Pendulum Components
Phase diﬀerence φ Angle from vertical φ
Voltage V = (~/2e)dφ/dt Angular Velocity dφ/dt
Capacitance C Moment of inertia µ = ml2
Conductance 1/R Viscosity ν
Critical Current Ic Restoring constant k = mgl
Applied Current I Applied Torque Ta
Plasma Frequency ω0 = 1/
√
LJ0C Oscillating Frequency ω0 =
√
k/µ
Table 2.1: Analogy between Josephson junction parameters and pendulum
parameters
Bearing in mind the value of Q (or BC), an intuitive understanding of the
junction behaviour can now be gained. Assuming that the pendulum starts at
rest with φ = 0, then the applied torque will increase the angular velocity and
commence the rotation of the pendulum around its anchor point. If less torque
than the restoring constant is applied and removed, gravity and viscosity will
ensure that the pendulum returns to its nominal position. This might include
some oscillations around the origin depending on the Q value of the pendulum
system. If Q >> 1/2, the pendulum will oscillate substantially before coming
to rest (under-damped). If Q << 1/2, the pendulum will not oscillate at
all (over-damped). If the applied torque pulse is larger than the restoring
constant, the pendulum will swing around and rotate around its anchor point,
oscillating again depending on the Q value.
The behaviour of the Josephson junction is very similar. If a current weaker
than the critical current value is applied and removed through the junction, the
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 2. BACKGROUND INFORMATION 12
Figure 2.5: Illustration of Josephson junction pendulum analogy
junction phase will increase and decrease back to its original value, possibly
oscillating depending on the value of Bc. If, however, a current above the
critical current value is applied, the junction will "switch" thus changing its
phase around 2pi exactly like the pendulum equivalent. As indicated previously,
this 2pi phase shift is equivalent to a single ﬂuxon passing over the junction.
2.4 Rapid Single Flux Quantum Circuits
Some of the ﬁrst implementations of logical circuits using the Josephson junc-
tion as active elements attempted to emulate the voltage state logic as used
in the semiconductor industry. A very high proﬁle project, headed by IBM
[10] attempted to implement a Josephson computer using voltage state logic.
The implementation employed under-damped junctions exhibiting hysteresis
in their resistive state. For example, when a current above the critical current
value is applied to the junction, the junction remains in its resistive state un-
til the current is removed completely (not only reduced to a level below the
critical current). The junction therefore forms a simple switch that can be set
and reset through the bias current applied to the junction. A logical level of 1
was then associated with the junction being in its resistive state and a logical
level of 0 was associated with the superconducting state. This project was
eventually terminated due to implementation diﬃculties as well as due to the
advancement of the semiconductor industry. Various other voltage state logic
families were also tested and implemented with better results [11, 12].
In contrast to these voltage-based logical families, Lihkarev et al. [1] pro-
posed a logical family that rather used the absence or presence of single ﬂux
quantum pulses as indication of a logical 1 or 0. Thus instead of propagat-
ing voltage levels through the system, ﬂuxons were used to change circuit
behaviour and propagate information. This is intuitively a very pleasing solu-
tion since, as explained earlier, the switching of a Josephson junction releases
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 2. BACKGROUND INFORMATION 13
Figure 2.6: Circuit schematic of a Josephson transmission line.
SFQ pulses via its normal switch operation. Instead of using under-damped
junctions with their hysteric behaviour, over-damped junctions are employed.
To understand the reasoning begin this, one needs to consider the pendulum
analogy as depicted in Figure 2.5. Assuming that a DC bias current is ap-
plied to the junction (a constant torque in the pendulum analogy), this will
ensure that the pendulum is situated at a certain position of φ > 0. If a
SFQ pulse is then applied to the junction, the small associated current pulse
could be suﬃcient to drive the current level over its critical threshold. In the
pendulum analogy this would signify the application of a small torque that is
suﬃcient to rotate the pendulum above the critical angle of φ > pi, causing
the pendulum to undergo a revolution. As described earlier, for the junction
this indicates that another SFQ pulse was generated and is free to propagate
onwards through the system. Due to the over-damped nature of the system,
the junction quickly returns to its biased position (caused by the DC bias cur-
rent) without the need to remove the bias current, thus leaving the junction
ready to propagate the next SFQ pulse. It is therefore evident that a junction
can be used to propagate SFQ pulses throughout a circuit. This logical family
is called the Rapid-Single-Flux-Quantum (RSFQ) family.
An example of such a circuit is illustrated in Figure 2.6.
In the circuit I1 is used to bias junctions B1 and B2 to around 70% of
their critical current value. The shunt resistors RS1 and RS2 are used to alter
the Bc value shown in equation 2.3.7 to approximately 1, thereby forming a
over-damped junction. If an SFQ pulse arrives at the input, suﬃcient current
is applied to junction B1 to temporarily switch it to its resistive state. This
causes a ﬂuxon (SFQ pulse) to be propagated into the circuit. The current
direction created as a result of the propagated ﬂuxon is then clock-wise in the
loop B1L2L3B2. This newly released SFQ pulse will have an additive
eﬀect on the current through B2, causing it to switch temporarily to its resistive
state as well. Subsequently, another SFQ pulse will be released through B2
and propagate to the output. Due to this type of behaviour this circuit is
generally called the Josephson transmission line (JTL) and forms the core of
RSFQ circuit interconnections.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 2. BACKGROUND INFORMATION 14
Figure 2.7: Circuit schematic of an RSFQ Destructive-Flip-Flop
Having explained the propagation of SFQ pulses, the next function which
requires attention is the storage element. For this, another two junction circuit,
called a DC SQUID, is used. These circuits are depicted in Figure 2.7 where
the shunt resistances were omitted for the sake of clarity.
In Figure 2.7 junction B3 is once again biased to the 70% mark in order to
receive and propagate an SFQ pulse. Junction B4 on the other hand, has a low
bias given that the current is not equally split from I1 due to the inductor L3.
If an SFQ pulse arrives at InputA, a switch occurs in B3 due to its current
biasing, as was described for the JTL implementation. Note that Φ = LI,
therefore the ﬂux is equal to the inductance and current in the loop. Since
the amount of ﬂux entering the loop is known (that of a single ﬂuxon), the
amount of current generated in the loop is dependent on the loop inductance.
If the selected value of L3 is large enough, then a clock-wise current ﬂow will
be generated in the loop B3L3B4 that is not suﬃcient to switch B4 to
its resistive state. The ﬂuxon is therefore trapped inside the loop. In order
to release this ﬂuxon, a pulse at InputB is necessary. The current caused
by the SFQ pulse at InputB is additive to the clock-wise current in the loop
due to the trapped ﬂuxon. Together these two currents are then suﬃcient to
switch B4 to its resistive state and propagate a ﬂuxon to the output. The
stored ﬂuxon value is lost through the operation, hence the circuit is named a
destructive-ﬂip-ﬂop.
In Figure 2.7, series junctions are also introduced. Such junctions are used
to protect the circuit under all logical conditions. For example, if no ﬂuxon is
stored, B1 should be biased to around 70% of its critical current. When a pulse
arrives at InputB this should cause B1 to switch, therefore "throwing out" the
entered SFQ pulse from the circuit. If a ﬂuxon is stored in the loop, some of
the clock-wise current due to the stored ﬂuxon is shunted in the direction from
the loop to InputB. This has the subtractive eﬀect of lowering the bias current
on B1. A pulse at InputB now does not supply suﬃcient current for B1 to
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 2. BACKGROUND INFORMATION 15
Figure 2.8: Circuit schematic of an RSFQ Pulse Splitter
switch to its resistive state, therefore the pulse is allowed to be applied to the
loop in order to release the trapped ﬂuxon. Similarly, B2 is used to protect
against a subsequent pulse at InputA in instances where a ﬂuxon is already
stored in the loop.
Another circuit of interest is the pulse splitter depicted in Figure 2.8. Shunt
resistances are again omitted for the sake of clarity.
An arriving pulse once again switches B2, allowing a ﬂuxon to enter the
circuit. Assuming that B1 and B3 is biased slightly below their critical current
value, and that L1, L4 and L5 are suﬃciently small, the current associated
with the arriving ﬂuxon could be suﬃcient to switch both B1 and B3, thereby
generating two SFQ pulses, one at each output. The junction can therefore
be used to amplify the arrived ﬂuxon, that was split between two junctions in
order to generate two output pulses. At this point, the output pulses will each
consist of a single ﬂuxon.
Hence, using this fundamental behaviour, many diﬀerent types of logical
circuits can be constructed. The interested reader is pointed to [1] for more
circuitry details.
Stellenbosch University  https://scholar.sun.ac.za
Chapter 3
Monte Carlo Methods for Yield
Estimation
3.1 Introduction
In the manufacturing of integrated electronic circuits, either semiconductor
or superconducting circuits, some statistical variations are present that can
inﬂuence the functionality or performance of a circuit. In the semiconduc-
tor industry, the variations of interest can include changes to the transistor
construction, such as oxide thickness, doping value and geometric variations.
In low temperature superconducting electronics, the variations usually entail
the geometry of the Josephson junctions, that is, the junction area and the
geometry of other passive elements, such as resistors and inductors [13].
The percentage of circuits created that are functional and fall within the
accepted performance bound is called the circuit yield. Manufacturers of such
integrated circuits normally gather a large amount of practical data after which
a statistical analysis is performed to ascertain a distribution and variance for
each of the parameters. In general the distributions tend to be Gaussian with
3-sigma or 6-sigma information rendered as a variance measure.
It is the responsibility of the designer to maximise the yield of the designed
circuit by using various optimisation algorithms and techniques, some of which
are investigated in Chapter 6 of this document. Common to each of these
algorithms, though, is the method employed to obtain the yield of the circuit
given the current circuit parameters.
Analytically the circuit yield can be calculated as follows: Assuming that
the circuit parameters under investigation form the random vector Y , each
with its respective distribution (possibly Gaussian), so as to create a joint
distribution function f(·), the circuit yield will then be the multi-dimensional
integral of the joint distribution function across the acceptance region Ωa.
This equates to the expected value of the joint distribution function and can
theoretically be stated as follows:
16
Stellenbosch University  https://scholar.sun.ac.za





The challenge that designers face is that no analytical mapping generally
exists between selected circuit parameters and the acceptance region Ωa. De-
signers are therefore compelled to make use of simulations, which are usually
expensive in terms of time when testing and measuring the functionality and
performance of a circuit under test. A Monte Carlo method is therefore used
to serve as an unbiased estimator of equation 3.1.1.







where N refers to the number of samples drawn from the joint probability
distribution. Due to the law of large numbers [14], the approximation converges
to the result presented in equation 3.1.1 with a probability of 1 in the limit
where N →∞.
It is valuable to deﬁne a ﬁgure of merit for an approximation algorithm as
this allows one to gain conﬁdence in the algorithm itself as well as to compare
it with other approximation algorithms. For unbiased approximates such as
Monte Carlo, the variance of the solution can be used as a ﬁgure of merit [14].
The equation to calculate the associated variance is presented in equation 3.1.3.
σ2s = E [Y − Y ]2 (3.1.3)
In eﬀect, Y is in itself a random variable since multiple runs using the
same number of samples (Equation 3.1.2) will result in diﬀerent approxima-
tions. The use of the central limit theorem [14, 15] demonstrates that the
variance in equation 3.1.3 decreases at a rate of O(N− 12 ) which implies that
four times as many samples are required from the distribution in order to
decrease the variance by a factor of 2 (and therefore the error of the approxi-
mation method). The aim of investigating various Monte Carlo methods is to
ﬁnd an algorithm that decreases its error faster than the Crude Monte Carlo
(CMC) approximation presented in equation 3.1.2.
In this chapter, two Monte Carlo methods, commonly used in various en-
gineering and ﬁnancial ﬁelds are investigated for their applicability to RSFQ
circuit yield estimation. The ﬁrst method, Latin Hypercube Sampling, makes
use of sample space stratiﬁcation while the second, Sobol sequences, uses a low
discrepancy sequence to lower the variance of the approximation. The aim of
both of these methods is to draw samples from the joint distribution function
in a more orderly fashion than a pseudo-random number generator, generally
used with CMC, would be able to. In the following two sections, Latin Hy-
percube Sampling and Sobol sequences are explained in detail, followed by a
practical example of each of the two methods.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 18
3.2 Latin Hypercube Sampling
Stratiﬁed sampling constitutes a method of sub-dividing a sample space into
disjoint regions called strata. Samples are then drawn according to the condi-
tional probability of each strata. Therefore, assuming that one can sub-divide
the joint distribution function into m distinct strata, each of which is selected
with the probability pi where i = [1...m], this selection will then represent a
random variable which can be called Z. With the use of the tower property of
expected values, the stratiﬁed expected value of equation 3.1.1 can be written
as follows [15]:
Y = EE [Y |Z] =
m∑
i=1
piE [Y |Z = i] (3.2.1)
Subsequently, the expected value of T can be calculated by using the ex-
pected values of the conditional distributions of the strata. Stratiﬁcation there-
fore ensures that the samples are more evenly spaced than those generated
using Crude Monte Carlo sampling from equation 3.1.2.
However, although very useful, stratiﬁed sampling tends to become too
computationally intensive for large dimensional sample spaces while the cal-
culation of the cumulative distribution functions can become cumbersome.
Hence another method, Latin Hypercube Sampling, was proposed by [16] to
circumvent this dimensional explosion.
Latin Hypercube Sampling modiﬁes the stratiﬁed sampling method by di-
viding each dimensional range into regions of equal marginal probability. A
single point is then sampled from each of these strata. Finally, these points
are randomly matched between dimensions to form samples for investigation.
For example, given two random variables, both of which are described by
normal distributions forming a joint probability function from which samples
could be drawn, the ﬁrst step would be to stratify the marginal probabilities of
each random variable. Supposing further that 8 strata per marginal probability
are selected by using a uniform distribution U between 0 and 1 as the starting
point, this would result in each strata having an equal probability of pi = 0.125.
A random sample is then drawn from each of the distributions associated with
the strata. A possible sampling is depicted in Table 3.1
V AR1U in the table are the random, uniformly distributed, samples drawn
from each strata of the marginal distribution of the ﬁrst random variable. It is
evident that the samples are generally equally spaced. V ar1G is the equivalent
Gaussian sample drawn by using the inverse transform method with a mean
of 0 and a standard deviation of 1. V AR2U and V AR2G are the equivalent
samples for the second random variable that were drawn by using the same
Gaussian distribution parameters.
Figure 3.1 graphically illustrates the sampling of the ﬁrst random variable.
The ﬁgure also indicates the cumulative distribution function of a Gaussian
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 19
V AR1U V AR1G V AR2U V AR2G
0.1 -1.28155 0.08 -1.40507
0.15 -1.03643 0.2 -0.841621
0.29 -0.553385 0.32 -0.467699
0.45 -0.125661 0.43 -0.176374
0.6 0.253347 0.58 0.201893
0.69 0.49585 0.65 0.38532
0.8 0.841621 0.83 0.954165
0.91 1.34076 0.96 1.75069
Table 3.1: Random Samples across Two Gaussian Random Variables



















Figure 3.1: Obtaining Latin Hypercube samples from a Gaussian distribution.
distribution with a mean of 0 and a standard deviation of 1. The x-axis in
Figure 3.1 indicates the probability that a sampled number from the distribu-
tion could fall below the respective y-axis value of the cumulative distribution
graph. The x-axis is graphically stratiﬁed and a random sample represented
by the vertical lines within each strata, is drawn. The corresponding Gaussian
distributed value for each sample is then calculated (shown as the horizontal
lines) and stored for later use.
Finally, random permutations of the columns V AR1G and V AR2G are se-
lected without replacement in order to generate the sample points for the two
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 20
dimensions of the sample space.
A possible selection is depicted in Figure 3.2 where the horizontal and
vertical lines represent the strata limits and the points indicate the sample
point selections.










Randomised Latin Hypercube Samples













Figure 3.2: Example of Latin Hypercube sampling of a Gaussian random vari-
able.
An example of the LHS method employed with the RSFQ circuit yield
approximation is presented in Section 4 of this chapter.
3.3 Sobol Sequences
Another sampling method frequently employed with Monte Carlo simulations
is quasi-random numbers. These sequences are in fact not random at all but are
rather deterministically generated and exhibit the kind of behaviour of being
generally more evenly spaced through the sample space than are independent
and identically distributed values. Mathematically, the sequence is said to
possess a low discrepancy [17]. For instance, if E is a collection of subsets of
[0, 1)d, where d is the dimension of the normalised sample space, and P =
{u1, ..., uN} is a set of points in [0, 1)d the d-dimensional unit cube, then the
star-discrepancy of the set of points can be stated as:









Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 21


























Figure 3.3: Illustration of pseudo-random vs quasi-random sampling.
where sup denotes the supremem operator, the sum term is the proportion
of points in C, and the integral is the volume of C in proportion to the unit
cube volume. The measure of low discrepancy in equation 3.3.1 returns the
worst-case (highest) discrepancy value of the sets of points. The discrepancy
returns the worst error if the sum estimate is used to estimate the volume of
the sets of points when the sample space is divided into volumes of the same
size. Too few or too many points in a sample volume would cause a signiﬁcant
error and therefore also a signiﬁcant discrepancy. Hence it follows that a low
discrepancy sequence is generally equi-distributed, that is, a volume in a low
discrepancy sequence contains a number of points proportional to the volume
size. Figure 3.3 graphically illustrates the diﬀerence between pseudo-random
numbers and low discrepancy sequences. The sample points on the left of the
ﬁgure were generated with the use of the Mersenne Twister method (a popular
pseudo number generator available in the Boost C++ library [18]) while those
on the right were generated with the use of a Sobol sequence. It is noticeable
that the quasi-random sequence tends to be more uniformly spaced throughout
the sample space than the pseudo-random samples.
The Sobol sequence is a digital sequence proposed in [19]. Although only
the generation of the sequence according to [20] is described, the interested
reader is referred to [19] for the theoretical underpinnings of the sequence.
The ﬁrst step is to freely select a primitive polynomial of degree d in Z2 =
{0, 1} in order to generate a one-dimensional Sobol sequence as is indicated in
equation 3.3.2.
xd + a1x
d−1 + ...+ ad−1 + 1 (3.3.2)
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 22
Next, a sequence of positive integers is deﬁned by using the following re-
currence relation for the kth number in the sequence:
mk = 2a1mk−1 ⊕ 22a2mk−2 ⊕ 2d−1ad−1mk−d+1 ⊕ 2dmk−d ⊕mk−d (3.3.3)
The recurrence relation in equation 3.3.3 requires initial values mk with
k = [1..d] which can be freely selected as long as the values are odd and less
than 2k. The initial values for equation 3.3.3 and the primitive polynomials
can be obtained by using the algorithms described in [21] which in turn were
used to publish the values obtainable from [22].
Direction numbers, presented in equation 3.3.4 are then generated and used





The ith point in the sequence can subsequently be generated as follows:
xi = i1v1 ⊕ i2v2 ⊕ ... (3.3.5)
where ik is the kth digit from the right when i is written in binary form.
For example, commencing with a primitive polynomial of degree 4 with
a1 = 0, a2 = 0 and a3 = 1 will result in the primitive polynomial presented in
equation 3.3.6.
x4 + x3 + 1 (3.3.6)
Commencing with the initial direction numbers m1 = 1, m2 = 3, m3 = 7,
m4 = 5, the next six values can be generated by using equation 3.3.3 as
demonstrated below where all mk are presented in their binary format:
m5 = 2
3m2 ⊕ 24m1 ⊕m1
= 23(0011)2 ⊕ 24(0001)2 ⊕ (0001)2
= (1000)2 ⊕ (0000)2 ⊕ (0001)2
= (1001)2
= 9 (3.3.7)
Likewise, m6 to m10 can also be calculated as observed from the results in
Table 3.2.
Subsequently, with the use of equation 3.3.5, the numbers of the sequence
can be calculated for x5 as demonstrated in equation 3.3.8.
Stellenbosch University  https://scholar.sun.ac.za












Table 3.2: Sobol sequence calculation: values for mk and vk in binary.
x5 = i1v1 ⊕ i2v2 ⊕ i3v3...
= (1)v1 ⊕ (0)v2 ⊕ (1)v3
= (0.1)2 ⊕ (0.111)2
= (0.011)2
= 0.3750 (3.3.8)
Values for the ﬁrst 10 elements of the sequence are listed in Table 3.3.
i xi
1 (0.1)2 = 0.5
2 (0.11)2 = 0.75
3 (0.01)2 = 0.25
4 (0.111)2 = 0.875
5 (0.011)2 = 0.375
6 (0.001)2 = 0.125
7 (0.101)2 = 0.625
8 (0.0101)2 = 0.3125
9 (0.1101)2 = 0.8125
10 (0.1001)2 = 0.5625
Table 3.3: Values for x
3.4 Practical Example
An AND gate and an XOR gate were selected to investigate the eﬀects of the
Monte Carlo methods on the yield calculation of RSFQ circuits. The AND
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 24
gate is presented in Figure 3.4 with its component values listed in Table 3.4
while the XOR gate is presented in Figure 3.5 with its components listed in
Table 3.5. For both circuits the shunt resistances and parasitic elements were
omitted for the sake of clarity. All input pulses are applied through a DC-SFQ
circuit connected to the circuit under test using a Josephson Transmission Line
while the outputs are terminated using a 2Ω resistor connected to the circuit
also through a Josephson Transmission Line.
Figure 3.4: Circuit schematic of an un-optimised RSFQ AND gate.
The AND gate consists of two ﬂuxon storage loopsB2L2B3 andB13L12B14.
SFQ pulses entering DataA and DataB are stored in these loops respectively.
A Clock input is used to release both of these stored ﬂuxons simultaneously,
which changes the bias currents of B6 and B10 so that B8 switches before B6
and can remove the SFQ pulses from the circuit. This causes an output pulse
to be generated. If only one ﬂuxon is stored, a Clock pulse will cause either
B6 or B10 to switch before B8, thereby throwing the pulse out of the circuit
without generating an output. A pulse splitter is formed through B5 and B9
to apply the input clock to the ﬂuxon storage loops. B1 and B12 serve as series
junctions to remove any input data pulses when a respective ﬂuxon is already
stored while B4 and B11 are used to remove the Clock pulses if no ﬂuxon is
stored in the storage loops.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 25
Junctions Inductors Bias Currents
B1 : 200µm
2 L1 : 2.47 pH I1 : 187µA
B2 : 250µm
2 L2 : 9.4 pH I2 : 256µA
B3 : 325µm
2 L3 : 2.75 pH I3 : 460µA
B4 : 275µm
2 L4 : 3.01 pH I4 : 187µA
B5 : 175µm
2 L5 : 0.69 pH
B6 : 150µm
2 L6 : 1.25 pH
B7 : 300µm
2 L7 : 0.81 pH
B8 : 350µm
2 L8 : 1.1 pH
B9 : 175µm
2 L9 : 0.69 pH
B10 : 150µm
2 L10 : 3.01 pH
B11 : 275µm
2 L11 : 2.47 pH
B12 : 200µm
2 L12 : 9.4 pH
B13 : 250µm
2 L13 : 2.75 pH
B14 : 325µm
2
Table 3.4: Un-optimised RSFQ AND gate circuit parameters.
Figure 3.5: Circuit schematic of an un-optimised RSFQ XOR Gate.
The XOR gate consists of a single storage loop which is constructed from
diﬀerent components depending on which data input pulse arrived ﬁrst. The
loop consists of B2B3L4B5B6 if a pulse on DataA arrives ﬁrst and
B8B7L7B5B6 if a pulse on DataB arrives ﬁrst. Junctions B3 and B7
serve as escape junctions for the non-receiving input in order to ensure that no
pulses are propagated backwards through the system. If a pulse has already
arrived, by means of DataA for example, a follow-up pulse on DataB will
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 26
Junctions Inductors Bias Currents
B1 : 150µm
2 L1 : 2.48 pH I1 : 100µA
B2 : 150µm
2 L2 : 1 pH I2 : 100µA
B3 : 150µm
2 L3 : 2.48 pH I3 : 100µA
B4 : 150µm
2 L4 : 4.12 pH
B5 : 200µm
2 L5 : 4.05 pH
B6 : 200µm
2 L6 : 2.48 pH
B7 : 150µm
2 L7 : 4.12 pH
B8 : 150µm
2
Table 3.5: Un-optimised RSFQ XOR gate circuit parameters.
cause B5 to switch thereby releasing the already stored ﬂuxon. Consequently,
the XOR logical operation is implemented. The arrival of a Clock pulse will
release a stored ﬂuxon through B6, generating an output pulse. If no ﬂuxon is
stored, B4 is used as the escape junction for the Clock pulse.
An automated state machine extraction method, which is discussed in de-
tail in Chapter 5 of this document, was used to test the functionality of the
AND and XOR gates. This method tests a perturbed circuit (such as is em-
ployed in yield calculations) against the nominal state machine representation
of the circuit. No performance measurement (latency, etc.) was, however,
carried out during the functionality testing.
To simplify the investigation, the local and global process variations of
the relevant parameters described in [13] were reduced to a single normal
distribution for each parameter, each with a standard deviation of 0.13. As a
benchmark, a Crude Monte Carlo run was executed with 20000 samples which
returned a yield value of 62.580% for the AND gate with given parameters and
variations. The equivalent yield for the XOR gate after 20000 was 43.55%.
A robust approach for comparing the three methods is to conduct a com-
parison of how the variance reduces for each method versus the number of
samples drawn. Nine sample steps were investigated, commencing with 100
samples and ending with 500 samples, with each step adding 50 more samples
to the investigation. For each sample step, each algorithm was run 100 times
in order to quantify the variance. The variance of each method for each sample
point could be calculated, by means of equation 3.1.3 and using a yield value
of 62.580% for the AND gate and 43.55% for the XOR gate.
In order to investigate the variance of a Sobol sequence some element of
randomisation needs to be added. This is due to the Sobol sequence being
deterministically deﬁned. In eﬀect, given the same primitive polynomial and
initial direction vectors, the same sequence will always be generated. Hence,
a simple shue method was used for randomisation. All the Sobol sequence
points were pre-created after which each parameter sequence was shued by
employing a pseudo-random number generator. For these tests a Mersenne
Twister pseudo-random number generator, available in the C++ boost library
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 27
[18], was applied. Various other randomisation methods are also available and
should be investigated in future studies [14, 15].


















Figure 3.6: Monte Carlo method variance comparison for an RSFQ AND Gate.
The results of the variance comparison for the RSFQ AND gate are pre-
sented in Figure 3.6. It is evident that both the Latin Hypercube and the
randomised Sobol methods tend to perform better than the Crude Monte
Carlo method. Some assumptions can be made due to the fact that Latin
Hypercube Sampling performs better than Crude Monte Carlo sampling. As
discussed in [14], Latin Hypercube Sampling tends to only perform better than
Crude Monte Carlo sampling when the function to be sampled contains some
parameters which aﬀect the variance independently of the other parameters.
In eﬀect, the sampled function can be reduced to a summation of functions
that vary the yield over one dimension and functions that vary over more
than one dimension. Future work should therefore include an investigation
into the eﬀective dimensionality of RSFQ circuits. For semiconductor circuits
the eﬀective dimensionality tends to be substantially lower than the number
of parameters [14] which leads to substantial speed-ups for circuit modelling.
The same might be true for RSFQ circuits.
Sobol sequences also tend to reduce the variance of functions which contain
one-dimensional components [14]. It was also mentioned that Sobol sequences
have a lower discrepancy for the lower (l < 10) dimensions. It would therefore
be preferable to use the lower dimensional Sobol points for the more sensitive
parameters in the circuit. An interesting future test would be to initiate a
margin analysis on the components of an RSFQ circuit and explicitly use the
lower dimensional Sobol points for these parameters.
To appreciate the advantages suggested in Figure 3.6, one can calculate the
conﬁdence interval as indicated in equation 3.4.1:
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 28






The equation is governed by the estimated standard error [15], σ√
N
. When
one examines the variance samples at 500 runs in Figure 3.6, it is evident
that the values 5.888, 3.382 and 2.682 for CMC, Latin Hypercube and Sobol
respectively are obtained. This leads to estimated standard errors of 0.108,
0.082 and 0.073 respectively. In practical terms, to achieve the same level of
conﬁdence in the yield approximation, one would require 57% of the number
of runs needed by CMC when using Latin Hypercube and 45% of that needed
by CMC for the Sobol sequence.
The results of the XOR gate are depicted in Figure 3.7.




















Figure 3.7: Monte Carlo Method Variance Comparison of RSFQ XOR Gate
Once again it is evident that both the Latin Hypercube sampling and the
Sobol sequence provides a substantial reduction in variance. The diﬀerence
in the XOR gate is more evident when a lower number of samples are drawn.
The values of the samples at 450, which subjectively seems to be following the
trend in the ﬁgure most closely, are 5.5126, 4.4662 and 4.597 for the CMC,
Latin Hypercube and Sobol sequences respectively, which lead to standard
error values of 0.107, 0.0996 and 0.101. This provides the same conﬁdence
interval in 81% of the runs for the CMC samples required by Latin Hypercube
sampling and in 83% of the runs for the CMC samples required by the Sobol
sequence.
Considering Figure 3.6 and Figure 3.7, it is evident that Latin Hypercube
sampling as well as Sobol sequences can provide a measurable variance reduc-
tion when used in conjunction with RSFQ yield approximations. However, the
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 3. MONTE CARLO METHODS FOR YIELD ESTIMATION 29
noise in the data samples does encumber the calculation of a numeric value for
the variance reduction. Future work should therefore include more than 100
yield approximations per number of samples so that a more accurate numerical
value of the variance reduction can be calculated.




Another method for increasing the speed of yield calculations is to employ
a function approximator. One of the most celebrated, and most researched,
approximators is that of artiﬁcial neural networks(ANN). These networks make
use of a very basic model of the neural connections in the human brain, along
with training data, to iteratively learn to approximate a given function. If a
neural network can be trained to approximate the yield search space accurately
by using a limited number of Monte Carlo samples, any future required samples
can be gathered through subsequent runs of the network. This could save a
signiﬁcant amount of time for optimisation algorithms that depend on multiple
Monte Carlo runs. Artiﬁcial neural networks occupy a vast ﬁeld of research.
Therefore only the most documented network, the multi-layer Perceptron, was
investigated regarding yield estimation. In addition, numerous research has
also been conducted on yield estimation via artiﬁcial neural networks in the
semiconductor industry [23, 24, 25, 26]. However, to the best knowledge of the
author this technique has not yet been applied for yield estimation of RSFQ
circuits.
Section 2 of this chapter encompasses the fundamentals of artiﬁcial neural
networks, speciﬁcally regarding multi-layer Perceptron networks. Section 3
then applies the artiﬁcial neural network to an RSFQ digital logic circuit to
ascertain whether it can serve as a viable approximator.
4.2 Artiﬁcial Neural Network Fundamentals
A simpliﬁed description of a biological neuron pertaining to artiﬁcial neural
networks can be found in [27]. The neuron consists of a cell body, containing
the nucleus and is surrounded by dendrites that receive stimuli from other neu-
rons. Stimuli leave the neuron through the axon, which connects the neuron
to downstream neurons. Stimuli are transferred by means of an electric charge
30
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 31
exchange created by the diﬀusion of ions. However, not all of the interconnec-
tions entering the neuron are equally weighted as some of them have priority
over others. Interconnections can also inhibit the transmission of stimuli.
4.2.1 The Building Blocks of Artiﬁcial Neural Networks
Most of the artiﬁcial neural networks investigated currently are based on a
biological model proposed by [28]. An illustration of the proposed model can
be seen in Figure 4.1 [27].
Figure 4.1: The McCulloch-Pitts neuron model.
In this ﬁgure, inputs designated by an E are excitation inputs and inputs
designated by an I are inhibitory inputs. Furthermore, a threshold value T
is associated with the neuron. If the number of the active excitation inputs
is higher than the threshold value at any one time, the output of the neuron
will be active. However, if the threshold value is not reached or if any of the
inhibitory inputs are active, the neuron output will be inactive. It was also
demonstrated that basic logical operations such as AND, OR and NOT could
be implemented using the proposed model.
Further reﬁnements to the neuron model were proposed by [29, 30, 31],
eventually leading to the ADALINE (Adaptive Linear Neuron and later Adap-
tive Liner Elements) and the Perceptron, whose elements are shown in Figure
4.2.
Structurally there is very little diﬀerence between the ADALINE and the
Perceptron. Both eventually contained the improvements of replacing the ex-
citation and inhibitory inputs with a numerical value (1 and 0 or −1 and
1). Both of these modelling methods also investigated the inﬂuence of dif-
ferent threshold functions. Each of the inputs were also associated with a
weight value to indicate the relative priority of the input. A bias term, that
is, an input whose value is ﬁxed at 1 (or -1) was also added to shift the eﬀec-
tive threshold value. The transfer function of such an element for a Signum
threshold function is presented in equation 4.2.1.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 32
Figure 4.2: ADALINE and Perceptron elements.
O = 1 when
n∑
k=i
ikwk + w0 > T
= −1 otherwise (4.2.1)
It is evident from equation 4.2.1, that each element forms a linear discrim-
inator. The sample space of these elements consist of two regions, which are
divided by a linear boundary. For example, in the sample space in Figure 4.3,
a discriminator implementing the AND function is presented where the output
is 1 above and -1 below the division line.
Figure 4.3: Single element sample space of the AND function.
A possible implementation of the discriminator would thus be a two input
element with a Signum threshold function, weights w1 = 1, w2 = 1 and a bias
weight of w0 = 0.5 when the bias input is −1. When the inputs are both 1, the
sum term plus the bias of equation 4.2.1 is 1.5 which causes the output to be
active. Any other input combination would cause the output to be suppressed.
The limitation of the single Perceptron or ADALINE (or a single layer
for that matter) is that these elements can only represent linearly separable
functions. An example of this limitation is illustrated in Figure 4.4.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 33
Figure 4.4: Single element sample space of the XOR function.
In this ﬁgure, the requisite sample space conﬁguration is depicted in order
to implement an XOR function. However, it is evident that it is impossible to
draw a straight line on the ﬁgure to separate the portions where the output
should be 1 from the portions where the output should be −1. The XOR
function is therefore not implementable with a single Perceptron/ADALINE
element or with a single layer of the elements.
The solution to the linear separation limitation as discussed in [27, 32,
33] is therefore to create multiple layers of Perceptron/ADALINE elements
connected in a feed-forward topology as illustrated in Figure 4.5.
Figure 4.5: Example of a multi-layer ANN topology.
In this topology, any layer not connected to an output is called a hidden
layer. Also notable is the fact that information always propagates forward
to the next layer in the network, hence the designation feed-forward network.
The processing elements (PE) shown in the ﬁgure can either be Perceptron
elements or ADALINE elements. As [32] indicates, a network with only three
layers is suﬃcient to approximate any input-output mapping given a ﬁnite
number of processing elements in each layer, a sigmoid threshold function (to
be explained below) and successful training.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 34
4.2.2 Training through the Back-Propagation method
The next step under investigation is the training of the network itself. Training
of artiﬁcial neural networks can be divided into two categories: supervised or
unsupervised. However, in this study, only supervised training is investigated
for yield estimation purposes. The aim of the supervised training algorithm
is to supply the artiﬁcial neural network with training data consisting of in-
put values with expected output values. The network should then adjust the
weights of the system (the free variables) to minimise an error function. One of
the major diﬀerences between the Perceptron and the ADALINE is the error
function deﬁnition.




(td − od)2 (4.2.2)
The error is therefore the squared sum of the diﬀerences between the current
output and the desired output of the Perceptron. The ADALINE, on the other











In eﬀect, the ADALINE error function is the sum of squared diﬀerences
between the desired response and the sum of the weight-input pairs of the
respective elements. The diﬀerence is therefore the "feedback" of the error
function (before and after the threshold function). It is notable that the train-
ing algorithm (as shown below) employs a gradient method to adjust the weight
parameters. This, however, causes a problem for equation 4.2.2 since if the
Signum function is used as a threshold, the diﬀerential does not exist due to
the discontinuous nature of the function. The solution to this dilemma was to
employ a Sigmoid function, depicted in Figure 4.6, with the equation presented







= y(1− y) (4.2.5)
The training algorithm for the Perceptron network is called the back-
propagation algorithm and is explained by means of the derivation found in
[32]. The update rule for the input weights of the Perceptron uses the gradient
of the error function (as presented in equation 4.2.2). The update rule is shown
in equation 4.2.6:
∆wi = −k ∂E
∂wi
(4.2.6)
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 35















Figure 4.6: A Sigmoid function.
where k is the step size per iteration.
First the update equation of a single Perceptron and then the generalised
equation of the back-propagation algorithm are derived. For the single Per-
ceptron, xi are the inputs, dp is be the desired output and yp is the actual
Perceptron output across all the training data p.
Using equation 4.2.4, the output of a processing element (single Perceptron










The summation is over all the inputs entering the respective processing
element. Using ep = (dp − yp)2 as the error of an individual Perceptron in
conjunction with the above equations, the gradient function can be written as
follows:
Stellenbosch University  https://scholar.sun.ac.za








































(dp − yp)yp(1− yp)xip (using equation 4.2.5) (4.2.9)
(4.2.10)






xipyp(dp − yp)(1− yp) (4.2.11)
Hence the back-propagation algorithm itself can be explained with a single
example. Considering Figure 4.7, which consists of a three layer network, the
ﬁrst layer is merely used to propagate the inputs to all the processing elements.
Figure 4.7: Multi-layer ANN topology as back-propagation example.
Each input (and output) has been named xl(i), where l is the layer number
and i is the net number in the respective layer. The weights of each processing
element are designated wl(i, k), where l is once again the layer, i is the input
index and k is the processing element index in the layer. For the network
illustrated here, the range of i is [0..2]. The 0th element of each weight is the
weight associated with the bias input, which is not explicitly presented in the
ﬁgure. For the sake of clarity the squared error will be used for a single training
input instead of equation 4.2.2 which calculates the mean squared error over
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 37
all the training data. For example, the error function of the output Perceptron
in Figure 4.7 is E = (d− x3(1))2.
Supposing that the aim is to update weight w2(1, 2), then a partial deriva-
tive of the error function with respect to w2(1, 2) will be required in order to
use equation 4.2.6. Furthermore, error derivatives can be expanded from x3(1)
















Expanding each of the chain rule derivatives:
∂E
∂x3(1)




= w3(2, 1)x3(1)(1− x3(1)) using equation 4.2.5 (4.2.14)
∂x2(2)
∂w2(1, 2)
= x1(1)x2(2)(1− x2(2)) using equation 4.2.5 (4.2.15)
Substituting the above equations in equation 4.2.12 results in:
∂E
∂w2(1, 2)
= −2x1(1)x2(2)(1− x2(2))w3(2, 1)x3(1)(1− x3(1))(d− x3(1))
(4.2.16)
which leads to the following update rule for w2(1, 2):
∆w2(1, 2) = 2kx1(1)x2(2)(1− x2(2))w3(2, 1)x3(1)(1− x3(1))(d− x3(1))
(4.2.17)
With the use of the aforementioned method the change in weight after each
training run can now be calculated, starting from w3(i, 1) to w2(i, j). Training
can be achieved batch wise, where all the training data are used simultaneously
to calculate the error function and update the weights, or incrementally, where
the weights are updated after each individual run.
4.3 Practical Application
The ability to approximate the yield sample space by means of artiﬁcial neural
networks was added to the software library that forms part of this work. The
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 38
artiﬁcial neural networks were implemented with the Fast Neural Network
Library [34] by interfacing with the various yield calculation and functional
testing functions of the software library.
The RSFQ AND and OR gates are investigated here in order to demon-
strate the yield estimation ability of artiﬁcial neural network. The gates are
again illustrated in ﬁgures 4.8 and 4.9 with the circuit parameters discussed in
Chapter 3. All input pulses are applied through a DC-SFQ circuit connected
to the circuit under test using a Josephson Transmission Line while the out-
puts are terminated using a 2Ω resistor connected to the circuit also through
a Josephson Transmission Line.
Figure 4.8: Circuit schematic of an un-optimised RSFQ AND Gate.
Figure 4.9: Circuit schematic of an un-optimised RSFQ OR Gate.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 39
A neural network has many variables that can be modiﬁed and have an
eﬀect on its behaviour. Since it is very diﬃcult to apply the theoretical frame-
work of artiﬁcial neural networks to a speciﬁc problem, the variables are in-
vestigated empirically.
The ﬁrst step in the creation of an artiﬁcial neural network is the generation
of training data. This entails the sampling of the function to be approximated.
In this case, it is the mapping of circuit parameters of the logical gate to a
yield value. First a sampling strategy is selected and then a valid range of
parameters is deﬁned. The ranges of the components used with both circuits
are listed in Table 4.1.
L B I
Abs Min value 0.5 pH 100µm2 100µA
Abs Max value 15 pH 400µm2 600µA
Deviation 150 % 150 % 150 %
Table 4.1: Sample space deﬁnition for yield modelling.
Instead of using an absolute search space, each inductor, Josephson junc-
tion and current source was deviated by plus 150 % and minus 150 % of its
nominal value to create the allowable range for each particular component.
Absolute minimum and maximum values were also imposed on the deviated
values to ensure that components with very large or very small nominal values
remain in a practical implementable range.
An automated state machine extraction algorithm (which is discussed in
Chapter 5) was used to test each deviated circuit for basic functionality. This
entails extracting the state machine representation of the nominal circuit and
testing it against the state machine representation of the deviated circuit. If
functional, a Sobol sequence approximation (described in Chapter 3) with 500
runs was used to calculate the approximate yield. A standard deviation of 0.13
was used for all components when calculating circuit yield which was found
to generally provide yield values between 40 % and 95 % for the circuits under
test.
It is important to note that the AND gate consists of two identical arms,
each of which are used to store and release a ﬂuxon. In general, when designing
such a circuit one would use the same parameter values for both of these
arms. One would, for example refrain from using vastly diﬀerent values for B2
and B13. In terms of the sample space approximation, and the optimisation
algorithms described in Chapter 6, the user of the software library has the
ability to deﬁne linked elements. The user would therefore inform the program
that B2 and B13 are linked which in turn would remove one free variable from
the search space. This action has a dramatic eﬀect on the performance of the
algorithms. For the AND gate illustrated in Figure 4.8, linked elements were
created between the two arms leading to 19 free variables in the circuit. A BC
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 40
value of 1 was used to calculate the shunt resistance of each junction. In an
equivalent fashion, the two arms of the OR logical gate shown in Figure 4.9
can also be linked resulting in 16 free variables.
For the samples themselves a Latin Hypercube sampling strategy was em-
ployed due to the possibility that Crude Monte Carlo sampling would miss
certain areas of the sample space. These missed areas could prove critical to
the correct approximation of the search space. On the other hand, a Latin
Hypercube sampling scheme (as discussed in Chapter 4) ensures that samples
are distributed more evenly across the search space. A Sobol sequence would
also be appropriate for this type of sampling but was not investigated for this
work. A further reﬁnement was also made to the Latin Hypercube sampling
to include extra localised samples. When a functional (in terms of state ma-
chine representation) sample is found, another Latin Hypercube is generated
around the sample point. A deviation of 5 % of the functional sample point
was used for the boundaries of the hypercube with 20 extra localised sam-
ples drawn. This ensures that each functional sample point has neighbouring
samples which would indicate the function dynamics in that area. The more
information the networks receive regarding the function dynamics, the better
the network should be able to interpolate. This resulted in 7750 samples being
drawn for the AND gate, including non-functional samples, and 16330 samples
for the OR gate. Notablye all the input and output values were scaled between
0 and 1 before being applied to the ANN.
The Fast Artiﬁcial Neural Network(FANN) library, used to implement the
artiﬁcial neural networks, allows a wide variety of options. These include acti-
vation functions, type of training algorithm, steepness of activation functions
and learning momentum. For the activation function various Sigmoid-like
functions are available for which a steepness can be set using the appropriate
parameter. Generally it was found that the Elliot activation function pro-
vided slightly better interpolation than the Sigmoid function when used with
yield approximation. Mathematically, the Elliot function can be described as
presented in equation 4.3.1.
y =
x/2
1 + |x| + 0.5 (4.3.1)
Furthermore, in terms of the training strategy it was found that batch
training was not very eﬀective when training the network for yield approxima-
tion. Incremental training was therefore used for all investigations. The FANN
library also provided two more advanced training methods called RPROP and
QUICKPROP. These methods were not investigated in this work. The param-
eters for the artiﬁcial neural networks are shown in Table 4.2.
Generally, the largest inﬂuence on the yield approximation was found to
be the structure of the networks. The investigation therefore focused on the
number of hidden layers as well as the number of processing elements in each
layer.
Stellenbosch University  https://scholar.sun.ac.za





Table 4.2: Selected FANN parameters for yield modelling.
The application of the full set of training data is called an epoch in ANN
literature. It is diﬃcult to know how many epochs one should train the network
before using its approximation abilities. One option is to allow the network to
keep on training until the mean squared error of the outputs compared to the
training data stops reducing. This is not always the best strategy as the net-
work might become "over-trained". This indicates that the network becomes
so adept at approximating the training data that it loses the ability to interpo-
late new data (generalise). One way to mitigate this, as mentioned in [33], is
to create another small set of test data that was generated independently from
the training data. After each training epoch, the mean squared error of the
test data is calculated to ascertain whether or not the network is any better
at approximating untrained data. If the mean squared error increases after a
training epoch, one can assume that the network is starting to over-train and
is losing its ability to interpolate.
Bearing in mind the eﬀect of over-training, an investigation methodology
was created as shown in Figure 4.10.
Along with the training data, test samples were created independently
using the same Latin Hypercube method. For the AND gate this consisted
of 1780 samples and for the OR gate 6220 samples. Throughout the training
phase, the current lowest mean squared error of both the training data and the
test data were saved. The aim of this method is to ﬁnd both network weights
where the mean squared error of the test data and that of the training data
are at a minimum. The minimum is selected if the lowest error value remains
constant for the 1500 epochs.
The algorithm starts by creating a new network with the parameters shown
in Table 4.2. The initial mean squared errors for the training data and the test
data are then calculated and saved as the new lowest values. The counters are
also reset to zero. A new training epoch now commences. If the lowest error
of the test data has not yet been found, the test data is applied to the current
network in a feed-forward manner (meaning that the weights are not updated
and only outputs are generated) and the outputs are compared to the known
outputs of the test data so as to calculate the mean squared error. If this new
error value is lower than the previously stored value, the new value is stored
as a new reference. The network is also saved in case the lowest error value
had been reached. If, on the other hand, the new error value is higher than
the saved error value, the counter is incremented. If this counter reaches the
maximum number (in this case 1500), a ﬂag is set to indicate that the test
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 42
Figure 4.10: Training strategy ﬂow diagram for artiﬁcial neural networks.
data phase has been completed and that the network is possibly over-trained.
The last saved set of network weights will then have produced the lowest mean
squared error for the test data.
The same method is applied to the training data in parallel. The new
mean squared error is calculated and compared to the previous best mean
squared error of the training data. If a better network is found, the saved
error is updated and the network is saved for future use. If, however, no better
network is found, the training counter is incremented and after 1500 counts
have been reached, a ﬂag is set to indicate that the training data portion have
reached its lowest error value.
The aim is therefore to save the network weights at the state where the
mean squared error of the test data is at its lowest and also where the training
method cannot reduce the mean squared error of the training data any further.
These two networks can then be compared by visualising circuit parameter
sweeps as explained below.
It is noteworthy that the training algorithm itself is a randomised optimi-
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 43
sation algorithm since the initial weights of the system are randomly selected.
It may therefore be necessary to run the network more than once to obtain
useful values. Every network structure investigated in this study was therefore
run ﬁve times. The hidden layers of the investigated structures for the two
circuits are listed in Table 4.3. The input layers always constitute the number
of inputs and the output layer is equal to one due to the fact that only one
variable is approximated.
Network 1 (10) Network 2 (20)
Network 3 (40) Network 4 (10,10)
Network 5 (20,20) Network 6 (40,40)
Network 7 (10,10,10) Network 8 (20,20,20)
Network 9 (40,40,40)
Table 4.3: Topologies of the investigated artiﬁcial neural networks.
Notably all the networks are fully connected, meaning that each input is
applied to each processing element and each processing element is applied to
every neighbouring processing element in the next hidden layer. A summary
of the results are presented below, with the complete set of results available in
Appendix B.
In Table 4.4, the results of the AND gate runs are presented. The lowest
mean squared error for each structure in terms of the training data and the
independent test data is presented. The run that was responsible for each of
these values is also presented.
Number Structure Lowest Train MSE Run Lowest Test MSE Run
1 (10) 0.0039 Run 1 0.0113 Run 3
2 (20) 0.0023 Run 1 0.0090 Run 1
3 (40) 0.0020 Run 1,2,5 0.0087 Run 5
4 (10,10) 0.0033 Run 3 0.0085 Run 3
5 (20,20) 0.0017 Run 2 0.0089 Run 4
6 (40,40) 0.0011 Run 1,3 0.0097 Run 3
7 (10,10,10) 0.0028 Run 1 0.0121 Run 4
8 (20,20,20) 0.0013 Run 1 0.0060 Run 3
9 (40,40,40) 0.00057 Run 4 0.0090 Run 4
Table 4.4: ANN yield modelling results for the RSFQ AND gate.
The value of the mean squared error of the training data varies substantially
depending on which structure is used to approximate the yield. It is evident
that any of the structures that only make use of ten processing elements in a
hidden layer generally perform worse than their larger counterparts. In general,
it would seem that the mean squared error of the training data is reduced when
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 44
more processing elements per hidden layer and more hidden layers are present.
It is very interesting to note, though, that this trend is not visible in the
mean squared error of the training data. In fact, both network numbers 6 and
9 seemed to perform worse in comparison with their 20 processing element
counter parts. This might be due to the optimisation of the weights being
more diﬃcult in the presence of more free variables which could in turn have
necessitated more runs to ﬁnd an optimal weight distribution.
It is challenging to intuitivly know how these mean squared error numbers
correlate with the ability to approximate the yield. In order to gain insight into
the meaning of the errors, a visual inspection was made regarding the yield
value over two free variables. This inspection entails the selection of two circuit
variables after which a yield map is created by varying these variables over a
speciﬁed range and calculating the yield using the Sobol sequence method
desribed in Chapter 3 as well the artiﬁcial neural network approximation. B2
and L2 were chosen as free variables due to these components forming part of
the ﬂuxon storage loop which is crucial to the operation of the circuit. The
range of each of the component values were divided into 25 equally spaced
points and each combination of values were fed into the network while retaining
the other components at nominal values. For comparison purposes, the two
variables were ﬁrst investigated using a Sobol sequence with 500 steps per
point. A three-dimensional yield map was therefore generated. The result of



















B2 Area in µm
2







Figure 4.11: Sobol sequence yield map of junction B2 and inductor L2 for the
RSFQ AND gate.
All the networks were compared to the yield map in Figure 4.11. The best
comparison (subjectively chosen) for each hidden layer step is shown below.
For example, the best comparison of all the runs of networks 1, 2 and 3 are
indicated below. A single representative was therefore also chosen for networks
4, 5 and 6 and networks 7, 8 and 9.
Stellenbosch University  https://scholar.sun.ac.za


















B2 Area in µm
2











Figure 4.12: ANN yield map of junction B2 and inductor L2 using network 3
run 2 for the RSFQ AND gate.

















B2 Area in µm
2











Figure 4.13: ANN Yield map of junction B2 and inductor L2 using network 5
run 1 for the RSFQ AND gate.
The training error for Network 5 Run 1 was 0.0020 with a test data set
error of 0.0097.
Stellenbosch University  https://scholar.sun.ac.za


















B2 Area in µ m
2











Figure 4.14: ANN yield map of junction B2 and inductor L2 using network 9
run 1 for the RSFQ AND gate.
The training error for Network 9 Run 1 was 0.00075 with a test data set
error of 0.0103.
From a visual inspection of the yield spaces in comparison to the Sobol
equivalent, it can be conﬁrmed that a low number of processing elements gen-
erally does not provide adequate results. Furthermore, a single hidden layer
tends to give a general, very broad approximation of the yield space, whereas
more hidden layers are likely to add further detail to the approximation. There
seems to be very little correlation between the visually pleasing approximations
and these approximations that performed better numerically.
The same testing methodology was applied to the RSFQ OR gate and the
results are indicated in Table 4.5.
Number Structure Lowest Train MSE Run Lowest Test MSE Run
1 (10) 0.0208 Run 2 0.0310 Run 2
1 (20) 0.0176 Run 5 0.0272 Run 2
1 (40) 0.0158 Run 3 0.0278 Run 4
1 (10,10) 0.0176 Run 2 0.0283 Run 2
1 (20,20) 0.0115 Run 1,3 0.0274 Run 3
1 (40,40) 0.0065 Run 1 0.0258 Run 5
1 (10,10,10) 0.0157 Run 3 0.0259 Run 3
1 (20,20,20) 0.0109 Run 5 0.0283 Run 4
1 (40,40,40) 0.0043 Run 5 0.0276 Run 4
Table 4.5: ANN yield modelling results for the RSFQ OR gate.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 47
The error values of the OR gate are generally higher than that of the AND
gate. This could be due to the larger number of samples that were available for
both the training data and the test data. Once again the data conﬁrm that a
small number of processing elements tend to render sub-standard results, which
also tends to be the case for both the training and the test data. It is interesting
too that the change from 20 to 40 processing elements in general produced a
signiﬁcant drop in the mean squared error of the training data. This could
indicate that the OR gate, more so than the AND gate, needs more processing
elements to correctly model its yield behaviour. This could further indicate
that there is no real correlation between the number of processing elements
and the number of input parameter values since the AND gate, with more
input parameters, was suﬃciently approximated with 20 processing elements.
A visual inspection of the yield map of B7 and L6 was also undertaken in
relation to the Sobol sequence method presented in Chapter 3. These elements
were once again selected due to their ﬂuxon storage behaviour. A represen-
tative yield map is illustrated below for each hidden layer size, in the same
manner as seen for the AND gate. For comparison purposes, a Sobol sequence
method was used to generate the map shown in Figure 4.15 with each point

















B7 Area in µm
2











Figure 4.15: Sobol sequence yield map of junction B7 and inductor L6 for the
RSFQ OR gate.
Stellenbosch University  https://scholar.sun.ac.za

















B7 Area in µm
2











Figure 4.16: ANN yield map of junction B7 and inductor L6 for network 3 run
1 for the RSFQ OR gate.
The training error for Network 3 Run 1 was 0.0163 with a test data set
error of 0.0279.
Stellenbosch University  https://scholar.sun.ac.za

















B7 Area in µm
2











Figure 4.17: ANN yield map of junction B2 and inductor L2 using network 6
run 2 for the RSFQ OR gate.
The training error of Network 6 Run 2 was 0.0072 with a test data set error
of 0.0277.
The training error for Network 9 Run 4 was 0.0049 with a test data set
error of 0.0267.
Once again it is evident that single hidden layer networks can only approx-
imate the general outline of the sample space. Also, visually there seems to
be only a slight diﬀerence between the two and three hidden layer networks.
In general, though, the three hidden layer network does seem to attenuate the
yield values more compared to the Sobol sampling.
4.3.1 Conclusion
From the investigation discussed in this chapter it can be ascertained that the
choice in network structure has a very signiﬁcant inﬂuence on the performance
of the artiﬁcial neural network. What is also evident, which can be surmised by
investigating the full set of results in Appendix B, is that the back-propagation
algorithm can result in diverse network behaviours for multiple training runs of
the same network structure. It can therefore be surmised that the search space
traversed by the back-propagation algorithm when training for yield might also
be multi-modal (have more than one optimum). It is therefore crucial that
networks are run multiple times to ﬁnd the optimal solution. Alternatively,
meta-heuristic optimisation algorithms could be investigated as a replacement
Stellenbosch University  https://scholar.sun.ac.za



















B7 Area in µm
2











Figure 4.18: ANN yield map of junction B2 and inductor L2 using network 9
run 4 for the RSFQ OR gate.
for back-propagation because of their ability to ﬁnd optimum points in multi-
modal search spaces.
The usage of test data to ascertain the success of approximation tends to
be inconclusive. The results do, however, show a slight indication that the
networks which performed well in terms of the mean squared error and visual
comparison to the Sobol sequence method do tend to have a lower test mean
squared error, but no obvious trend is visible.
It was useful to visually inspect the network behaviour so as to ascertain
whether the general yield space shape of the network output is close to what
could be expected from a Sobol sequence Monte Carlo equivalent. Naturally
this method is not conclusive, since not every part of the yield space can be
investigated in this manner.
Given the results, the following guidelines should be followed when aiming
to approximate the circuit yield of an RSFQ circuit with an artiﬁcial neural
network:
• Gather training data that is well spread out across the search space;
• Gather a smaller independent set of test data;
• Start with a network structure with two hidden layers and around 20
processing elements per hidden layer. If the results are not favourable,
increase the processing elements ﬁrst before adding another hidden layer;
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 4. ARTIFICIAL NEURAL NETWORKS 51
• Inspect the mean squared error of the test data set as well as the visual
yield maps for signs of over-training; and
• Run the back-propagation algorithm multiple times.
It can be surmised that artiﬁcial neural networks have the ability to accu-
rately model the circuit yield of an RSFQ circuit. The procedure to obtain a
satisfactory approximation, though, is not very well understood and still relies
on trial and error. The amount of training data required also tend to make
the use of the network as a replacement for the Monte Carlo methods (such as
discussed in Chapter 3) unlikely. This might be alleviated by using a diﬀerent
sampling strategy or a diﬀerent network structure with a diﬀerent interconnec-
tion matrix. More research is therefore necessary to ascertain whether artiﬁcial
neural networks can be used to replace Monte Carlo methods in an eﬃcient
manner.
The networks do tend to render a fairly accurate overall representation of
the yield space. The network can therefore be used as an investigative tool for
algorithms that employ a large number of yield samples, such as optimisation
algorithms. The speed at which yield samples can be generated using the
trained network allows large amounts of statistical data to be gathered in
order to quantify the performance of these types of algorithms.
This is the approach that is used to investigate various meta-heuristics in
Chapter 6.
Stellenbosch University  https://scholar.sun.ac.za
Chapter 5
State Machine and Timing
Extraction
5.1 Introduction
The main aim of design centering is to ensure that a circuit is functional given
all the possible manufacturing process variations. However, the meaning of
"functional" needs to be explored before an attempt can be made to solve this
problem. For digital logic circuits, such as those under investigation in this
work, it is only natural that the implemented logical function should also be
the primary functional attribute. For example, if the designer aims to imple-
ment a logical AND gate, the circuit response needs to be tested against the
known logical functionality of an AND gate. Apart from the logical function
implemented by the circuit, a designer might also deﬁne a functional bound on
other parameters. This could include circuit timing (which is investigated in
detail later in this chapter) and leakage current (which falls outside the scope
of this work). Timing can inﬂuence the functionality of the circuit itself as
well as adjacent logic circuits in a system composed of multiple logical cells.
In pure asynchronous designs, such as [35], timing is a crucial functional spec-
iﬁcation for adjacent cells to function properly. Leakage current can also have
a direct inﬂuence on biasing of adjacent circuits which could cause further
circuit failure due to process variations.
In this chapter, a method consisting of various algorithms is presented that
characterises RSFQ digital logic circuits in terms of logical functionality and
timing. These characteristics can be used for the functional testing of circuits
and for high-level circuit modelling.
Fundamentally, the logical operation of digital circuits can be represented
by their respective state machine descriptions. This representation can either
be based on a Moore machine or a Mealy machine. In a Moore machine,
the output of the circuit is dependent only on the current state of the circuit
whereas in a Mealy machine, the output is dependent on the current state
52
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 53
and the current input values of the circuit. For RSFQ circuits, with their pico-
second pulse propagation, a Mealy representation tends to be more natural. In
order to robustly test the logical functionality of any digital logic circuit, the
designer needs to test each of the states in the state machine representation
of the circuit for each of the possible circuit inputs. This is analogous with
current FPGA and ASIC designs where circuit functionality is proven, given
the complete circuit coverage from an applied test bench. In eﬀect, every
functional part of the circuit is exercised and tested against expected values.
One of the most prevalent methods of currently testing RSFQ circuit func-
tionality is by manually creating a test bench in a SPICE simulator that en-
sures circuit coverage. Outputs of such a test bench are observed manually,
or by means of computer script, in order to ascertain whether the circuit is
functionally correct. Creating test benches manually can become a very time-
consuming portion of circuit design for large circuits. This can be alleviated
by using the method of state machine and timing extraction proposed in this
chapter.The results of the extraction can then be examined for correctness by
the designer or tested against other extracted state machines. The possibility
of automatically generating state machine representations was suggested by
[36], but to the best knowledge of the author, no known publication detailing
this method has yet emerged.
Large parts of digital logic design in the semiconductor industry make use
of hardware description languages to create complex designs. These languages,
for example VHDL and Verilog, allow the designer to work on a higher level of
abstraction in order to create designs that would be cumbersome to implement
in instances where the full electrical representation of the circuit needs to be
considered. In FPGA or ASIC designs, a designer usually creates system com-
ponents by using register transfer level (RTL) or component instantiation in
their selected hardware description language (HDL), before testing the logical
functionality of their design by means of behavioural simulation. This is a
simulation that does not take any circuit latencies into account and usually
involves the designer creating a test bench of desired stimuli that will test
the full range of circuit functionality. After the functionality has been con-
ﬁrmed, a timing analysis can be performed which can be static or dynamic.
In a static timing analysis, the propagation paths for the clock, data and
asynchronous signals are calculated throughout the design. The synchronous
signals are checked against known timing requirements for the various sequen-
tial elements, such as ﬂip-ﬂops. Asynchronous signals, on the other hand,
are tested against recover-removal requirements. It is important to note that
the asynchronous signals generally refer more to asynchronous set-resets of se-
quential circuit elements or to input signals from an unknown clock domain,
rather than a fully asynchronous design. A static timing analysis therefore
does not consider the logical functionality of the circuit, but rather only tests
the sequential elements for timing violations. In contrast, a dynamic timing
analysis depends on an input test bench created by the designer to logically
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 54
test the circuit by considering the appropriate component and path latencies.
For FPGAs this is achieved by ﬁrst synthesising the circuit to be tested which
then translates the RTL and instantiation code to a net list of physically im-
plementable constructs in the FPGA. The latencies through these constructs
are known to the simulator which subsequently allows the simulation to test
for timing violations. Further timing information can be obtained after the
place-and-route cycle when the various components of the net list are placed
in the FPGA fabric, and path latencies can be conﬁrmed. Generally, a dy-
namic timing analysis is more accurate than a static timing analysis due the
extra logical veriﬁcation, but it does take longer to simulate and is also more
cumbersome to implement since the designer ﬁrst needs to create a test bench.
The same general design ﬂow applies to RSFQ circuits and the semiconduc-
tor industry. Some studies have been undertaken by various researchers on the
synthesis [37, 38] and place-and-route [39] stages of the RSFQ circuit design
process. There are, however, some fundamental diﬀerences between RSFQ and
semiconductor technology that should be considered. Clock timing in RSFQ
circuits tends to be very challenging due to the picosecond pulses used for
information passing and also the high operation frequencies [40, 41]. Due to
this design challenge, it has not yet been conclusively established whether a
synchronous or an asynchronous design methodology would be most beneﬁcial
for large scale RSFQ integration. It is therefore important that asynchronous
methodologies, such as reviewed by [5], as well as synchronous methodologies
should be possible with whatever high level design constructs are available.
Various researchers also proposed using high-level HDL modelling for RSFQ
circuit design, such as in [36, 42, 43, 44, 45] [46]. The presented work, however,
oﬀers a method for the automatic generation of these models given the SPICE
representation of a circuit and other user inputs. This model can be used for
an asynchronous or a synchronous circuit design.
5.2 Method Overview
A method is presented in [47] to automatically extract the state machine repre-
sentation of an RSFQ digital logic circuit, along with its timing characteristics
in order to alleviate the cumbersome nature of test-bench generation for circuit
functionality veriﬁcations and to generate HDL representations for a high level
circuit design. As explained later in this section, the method revolves around
identifying the inductive loops that store ﬂuxons, and monitoring these loops
as inputs are exhaustively applied in all circuit states. The method is imple-
mented in C++ and forms part of the larger software library which is expanded
upon in Appendix A.
A functional SPICE deck and its input and output nodes are supplied to
the program at start-up. Currently, a practical circuit settling time should
also be supplied. This is the time (which need not be accurate) that should
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 55
lapse after each input pulse application to allow all circuit transients to settle.
In future versions of the method this requirement will most likely be omitted
due to the ability of the program to identify the passing of transients in the
circuit. The various other input options that are available to the designer are
elaborated on in the detailed method discussion.
The input SPICE deck is ﬁrst parsed and ﬂattened (if applicable) after
which an undirected graph is constructed for loop identiﬁcation. Then the
undirected graph is used to identify the shortest possible loop containing each
circuit element. These loops are all monitored in the initial state machine
extraction process before the method veriﬁes which loops are in fact used for
ﬂuxon storage loops.
After the completion of the loop identiﬁcation process, the extraction pro-
cess can commence. An SFQ pulse is applied, one at a time, to each of the
circuit inputs. After the circuit transients settle, the identiﬁed loops are mon-
itored for ﬂuxon storage or loss. A change in stored ﬂuxons indicates that
the circuit underwent a change in state. This new state (and the stimulus
that led to the new state) is stored for later investigation. After a pulse is
applied to each of the circuit inputs, the method selects one of the new states
to investigate. The method then applies the stimulus that led to the state
under investigation after which pulses are once again applied to the inputs
individually. All transitions in the circuit states are therefore investigated.
Throughout the extraction process the output nodes are investigated for
the generation of output pulses, which is also stored with the state informa-
tion. Timing characteristics can also be extracted once the state machine
representation is known. This is necessary in instances where dynamic HDL
timing models need to be constructed for a high-level design. The timing char-
acterisation builds on the deﬁnitions of critical and delay timings as proposed
by [43]. Two diﬀerent timing characterisation methods are available to the
designer: a method that focusses on empirical timing information extracted
by means of repeated SPICE runs, and another that makes use of identifying
circuit transients. These timing methods are discussed in more detail later in
this chapter.
The designer can elect to use the generated state machine representation
for failure testing or for HDL generation. For failure testing, the state machine
representation of derivative circuits (changed due to Monte Carlo variations
or optimisation techniques) can be compared to the nominal state machine
representation to prove logical equivalence. For HDL generation, the state
machine representation and the timing characteristics can be employed by the
method to automatically generate an HDL model.
The assumption is made that the correct functionality of the circuit does
not depend upon the simultaneous arrival of SFQ pulses. This is, however,
not a major limitation given the diﬃculty in aligning the picosecond wide
SFQ pulses in practical circuits. If this functionality is required in the future,
further expansions of the method would be necessary.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 56
5.3 Flux Calculations
The core of the method is the quantisation of ﬂux in RSFQ circuits. The only
way for an RSFQ digital logic circuit to change its behaviour is by either storing
or releasing a ﬂuxon. This coincides with the state machine representation of
these circuits. It follows that by investigating the inductive loops of a circuit
for the storage or loss of ﬂuxons due to the eﬀects of the application of input
pulses, these state machine representations can be constructed.
The ﬁrst step entails the ability to calculate the ﬂux change in inductive
loops. The magnetic ﬂux through a loop can be calculated using the current




In × Ln (5.3.1)
where n is the number of components in the loop.
Considering Figure 5.1, if one would like to identify the instantaneous
amount of ﬂux in loop B1L1B2 then by using a SPICE simulation one
could easily log the simulated current through the three respective elements
in order to derive the value of L1. The inductance of a Josephson junction,
however, changes depending on the amount of current passing through the
junction [9]. The inductance can be approximated as follows, depending on
the instantaneous current:













with Φ0 = 2.0679e−15Wb.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 57
The loops can be investigated for ﬂuxon storage by calculating equation
5.3.1 after each input application, assuming that the previously mentioned loop
is currently storing a ﬂuxon. It is important to note that when investigating
L1 for ﬂux storage, loop B1L1L2B3 should have the same ﬂux value
as the mentioned storage loop. This is due to the quantisation of ﬂux in
superconducting circuits. The increase in current through B2 due to the stored
ﬂuxon causes a net change in ﬂux in loop B2L2B3. If this ﬂux does
not equate to a single ﬂux quantum, a shielding current will form in L2B3
to expel the change in ﬂux. This shielding ﬂux will therefore be equal and
opposite to the contribution of the ﬂux caused by B2 in loop B2L2B3.
For loop B1L1L2B3 this will therefore add the same contribution as
B2. This is an important observation since one can therefore investigate any
formed loop containing L1 in order to ascertain whether this component is part
of a ﬂuxon storage loop. Practically, though, the smallest loop possible will
be used for the investigation process in order to minimise calculation times.
An interesting exception to the above mentioned calculation methodology
can be seen in Figure 5.2. Supposing that the loop inductance ofB1L1L3B2
is not suﬃcient to store a ﬂuxon but loop B1L1L2 is able to do so, then
when a ﬂuxon is stored in the said loop, a shielding current forms through
the elements L3B2 to counteract the net ﬂux gain in loop L2L3B2 as
previously explained. An exception occurs when the amount of shielding cur-
rent required to counteract the eﬀect of L2 in the loop is greater than the
critical current of B2. Through simulation it can be veriﬁed that B2 does
not enter its resistive state, but rather that the excess current is shunted
through L4B3. This phenomenon does not cause a problem for the ﬂuxon
identiﬁcation method, though, since the sum of the net ﬂux gain in loops
B1L1L3 B2 and B2L4B3 still equates to that of a single ﬂux quan-
tum. In eﬀect, the method investigates adjacent loops if a ﬂux change is
identiﬁed that is either a single ﬂux quantum or no ﬂux change after an input
application.
Figure 5.2: Example of a ﬂux calculation exception.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 58
5.4 Detailed Method Description
This section describes the details of the extraction method.
5.4.1 User Inputs
The following user inputs should be supplied for the correct extraction of the
state machine representation:
• A functioning SPICE deck containing the circuit under investigation in a
format parsable by JSIM [48] or JSPICE. Other simulation formats can
relatively easily be incorporated into the software library.
• The input nodes of the circuit. These are the nodes that will be used for
the application of input pulses.
• A Josephson junction in the input path of each input node from which
circuit latency is calculated. Generally a junction just before the sup-
plied input node is used for this purpose to ensure that the true circuit
latency will always be slightly less than the calculated latency. This
junction would normally be situated in an adjacent logic cell or in a DC-
SFQ circuit used to generate stimuli. In future implementations of the
method, this requirement can be reinvestigated. It should be possible
to use the created undirected graph to automatically identify the ﬁrst
parallel junction before the supplied input node.
• The output junctions that should be monitored for output generation
and for delay (latency) characterisation. In future implementations of
the method an output node can ﬁrst be supplied and then a parallel
junction can automatically be identiﬁed for the monitoring process.
• The time after each input that is required for all the circuit transients
to pass. This requirement can also be removed in future versions of the
implementations by initially monitoring the current passing through the
last element in the longest possible path in the generated undirected
graph.
• Any supplied user rules (optional). To understand the need for user rules
consider Figure 5.3.
In general a circuit is designed so that any sequence of input pulses can be
applied without inﬂuencing the logical functionality of the circuit, regardless
of the circuit state. For example, in Figure 5.3 the series junction B2 is used
to "throw-out" input pulses if a ﬂuxon is already stored in loop B3L3B4.
The designer can, however, elect not to include B2 if a decision is reached that
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 59
Figure 5.3: Circuit schematic of an RSFQ D-Flip-Flop.
the circuit is unable to receive two subsequent input pulses without a clock
pulse arriving ﬁrst.
In order to facilitate this requirement, the user can supply rules to the
extraction method. For instance, a rule for Figure 5.3 could be that a clock
pulse has to arrive after an input pulse before another input pulse can be
applied. Practically this would result in a maximum rule which should be
supplied to the method as follows:
rule max in_node 1 reset clk_node (5.4.1)
where in_node would be the input node number and clk_node the clock
node number. The number after the node to which the maximum rule applies
designates the number of pulses allowed before the rule is violated. The reset
keyword designates that the following node resets the maximum rule.
In general the following keywords are available:
• max A B : maximum number of sequential pulses on node A is B.
• after A : the previously supplied maximum rule only applies after a
pulse on A.
• reset A : the previously supplied maximum rule should be reset after a
pulse on A.
• override A : the previously supplied maximum rule is ignored if a pulse
on node A arrives after a pulse supplied with the after keyword.
In future implementations of the method, functional diﬀerences could be
supplied with the input node numbers. Input nodes can, for example, be
classiﬁed as clock inputs for synchronous circuits which can automatically be
used to reset input pulse rules.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 60
5.4.2 SPICE parsing, ﬂattening and graph creation
The next step in the method is to read and parse the SPICE deck. Any
hierarchical instantiations are removed by creating a unique component (with
unique connected nodes) for each component within a sub-circuit instance.
This will ensure that loops that span multiple instanced cells are correctly
identiﬁed. The designer is also able to specify sub-circuit instances to ignore
for the ﬂattening and extraction process. This is useful for peripheral circuits
that are used to excite the circuit under test but does not form part of the
functionality of the said circuit. Circuits such as DC-SFQ cells and JTLs
are usually ignored for the extraction process as they are generally used to
generate stimuli and do not form part of the tested circuit functionality. An
undirected graph is created from all the ﬂattened components, while ignoring
any components in sub-circuit instances that appear in the ignore list. Any
component that only has one connected node due to the peripheral cells being
ignored (such as input and output inductors) is excluded from the graph.
The nodes of the undirected graph correspond to the nodes of the SPICE
representation while the vertices of the graph correspond to the two terminal
components (inductors or Josephson junctions).
An undirected graph of the D-Flip-Flop presented in Figure 5.3 is depicted
in Figure 5.4.
Figure 5.4: Un-directed graph of D-Flip-Flop components.
5.4.3 Inductive Loop Identiﬁcation
The aim of the inductive loop identiﬁcation is to ﬁnd the smallest loop for
each inductor in the circuit under test. This is achieved by ﬁrst selecting an
un-investigated inductor and arbitrarily assigning one of its nodes as a target
node and the other as a source node. The investigation takes place in parallel
from both of these nodes, in a direction away from the component itself. In
each iteration of the algorithm each path (target and source) adds a single
unselected vertex and node pair to its path list. If more than one vertex is
selectable (for example a node connecting three components), a new path is
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 61
created by copying the path that caused the current node to be reached and
adding the remaining unselected vertex. In the next iteration of the algorithm
a new vertex is selected for the new path as well as for the initial paths. After
each iteration, the identiﬁed nodes are compared to ascertain whether two of
the paths have reached a common node. The ﬁrst pair of paths that reach a
common node is necessarily the shortest path (though other paths might exist
that contain the same number of vertices). This is due to the fact that each
path only adds one vertex-node pair during each iteration, therefore keeping
all the paths the same length. It is important to note that once a loop has
been identiﬁed, none of the loop components need to be investigated again
since the identiﬁed loop for each of these components will contain at least the
same number of elements as the initial loop.
Figure 5.5: Finding the smallest loop containing inductor L1.
For example, when one considers Figure 5.5 where the aim is to ascertain
the smallest inductive loop containing L1, the target and source node can be
selected as N1 and N3 respectively. At the ﬁrst iteration, three paths are
created; one path starting from N1 and two paths from N3 since the last
named node has two unselected vertices. In the next iteration each path adds
one component away from the component under investigation. Path 1 adds L2
and ﬁnds node N2 while path 2 adds component L4 and ﬁnds the ground node.
Path 3 adds component L5 as its vertex and also ﬁnds node N2. After each
iteration the identiﬁed nodes are investigated in a search for commonality. In
this case both path 1 and path 3 found node N2. The components determined
by path 1 and path 3 can now be added together to form the shortest inductive
loop containing L1, that is, L1L2L5.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 62
5.4.4 State Machine Extraction
After the loop identiﬁcation process, the state machine extraction process can
commence. Initially, it is not known which of the identiﬁed loops are used as
storage loops by the circuit. Therefore, all the components that form part of
one of these identiﬁed loops are investigated through the extraction process.
For the purpose of failure testing, any subsequent extraction process can only
investigate the loop components that are known to form part of a storage loop.
This provides a necessary processing performance increase when performing
taxing investigations, such as Monte Carlo runs.
Start
Create Save List 
Create Initial Input
Sequence 
Create Initial State 
Choose Next
Uninvestigated State 
Run Spice Deck 
With Input Sequence 
New State 
Found? 
Create New State 
Add Pulse On Untested 
Input To Input Sequence 












Figure 5.6: State machine extraction ﬂow diagram.
The extraction process is described in conjunction with the ﬂow-diagram
in Figure 5.6, as presented in [47].
The ﬁrst step of the algorithm is to create a list of components to be inves-
tigated in the extraction process. This list includes all the components which
form part of the identiﬁed inductive loops and the input and output junctions
supplied by the designer. The transient current values of the loop components
are stored for ﬂux calculations while the input and output junction ﬂux values
are stored in order to identify output pulse generation and to calculate the
latency from the input to the output pulses. An initial input sequence and
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 63
initial state are then created. The input sequence contains the sequence of
pulses that caused the circuit to reach the state under investigation as well as
any new stimuli that should be applied to the state. For example, an input
sequence could have been a pulse on InputA at time X followed by a pulse
on InputB at time Y, and so on. In examining the ﬂow-diagram, it is evident
that after a state investigation phase has been completed a newly identiﬁed
state that has not yet been investigated is selected as the test state. The input
sequence is then populated with the nominal sequence of input pulses that
caused the state under test to be reached. At start-up, only the initial state is
available for testing which by deﬁnition requires no input pulses to be reached.
Hence the input sequence of the initial state would be empty.
Nominal ﬂux values are required for each state in order to calculate ﬂux
changes caused by input pulses. Since the initial state was not identiﬁed by the
algorithm through the normal input application process as explained below,
an initial SPICE run was used to ascertain what the nominal ﬂux values of
this state are.
The process then continues by selecting an untested input to the circuit.
Thereafter, a pulse is added to the input sequence and applied a certain amount
of time after the previous pulse application in the sequence so as to ensure
that all circuit transients have dissipated. This amount of time is the settling
time provided by the user during the set-up procedure. A SPICE deck is
then automatically generated with the applied input sequence consisting of
the current and phase values of the previously created save list which are
identiﬁed as variables to log.
Once the SPICE run is completed the new ﬂux values are calculated by
using the last stored value of each current measurement. Here the assumption
is that the settling time supplied by the user was suﬃcient to dissipate any
current transients in the circuit. In future versions of the implementation a
test could be added to calculate the ﬁnite diﬀerence of the last stored value
in a measurement. If this ﬁnite diﬀerence is above a certain value (indicating
that transients are still active), the designer can be notiﬁed or the SPICE deck
can be rerun for a longer period of time. The calculation process is done using
equation 5.3.1 to ascertain whether any ﬂuxons have been stored or lost due
to the application of the input pulse.
The input and output junction measurements are also processed for junc-
tion switching. The stored phase values are stepped through, and if 80% of
a 2pi phase shift is detected, a switch event is logged. The time of the input
and output switched values are also logged. The diﬀerence between the output
switching time and the input switching time is used as the latency for output
pulse generation.
After the calculation phase has been completed all the previously identiﬁed
states are searched so as to compare them with the changed ﬂuxon values (if
any). If a previous state is identiﬁed or if no ﬂuxon change has occurred
(indicating that no state change took place), the state-under-test transition
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 64
table is updated to log the transition. If no previous state is identiﬁed, a new
state is created containing the ﬂuxon identiﬁers that were determined during
the simulation process. This state is then added to the un-investigated state
list for later investigation.
After the state-under-test transition table has been updated, the last ap-
plied input is removed from the input sequence and marked as tested. Another
untested input is then selected and added to the input sequence after which
the SPICE deck is once again generated and the measured data are processed.
After all the inputs of the state-under-test have been investigated, the state
testing is ﬂagged as completed. Next, an untested state is selected for the
input application.
The assumption is that the initial state did not contain any stored ﬂuxons.
If a ﬂuxon was being stored in this state at the time, a subsequent state would
have been identiﬁed to gain a negative ﬂuxon. After the extraction process has
been completed, a normalisation process is performed to remove the negative
ﬂuxon from the relevant states and to add the same ﬂuxon to all the other
states.
When no further untested states are identiﬁed the extraction process is
complete.
5.4.5 Critical Timing Extraction
In general, the critical timing of a circuit is deﬁned in terms of the minimum
(or maximum) time allowed between related input pulses. The relationship
between these pulses indicates that both have an eﬀect on the same functional
elements in a circuit. In semiconductor devices the critical times are usually
described in terms of set-up and hold times. However, due to the prevalence
of synchronous circuits in semiconductor technology, these times are always
regarded in relation to the clock signal. As depicted in Figure 5.7, the set-up
time is the time before the clock edge (usually rising) when the data have to
be stable.
Figure 5.7: Illustration of set-up and hold times.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 65
The hold time is the time after the clock edge when the data have to remain
stable. In the case of ﬂip-ﬂops, a timing violation might either sample the data
pulse incorrectly or cause the ﬂip-ﬂop to become metastable.
In terms of RSFQ circuits, critical times remain a relationship between
input pulses. The main diﬀerence is that the critical times may diﬀer substan-
tially depending on which state the circuit is currently in. This eﬀect is caused
by the variability of inductance of a Josephson junction, depending on the bias
current as shown in equation 5.3.2. A stored ﬂuxon leads to a ripple eﬀect,
generating shielding currents to expel any non-quantised ﬂux. These shielding
currents can therefore have an eﬀect on the bias currents of junctions that are
not situated close to the ﬂuxon storage loop. The change in junction induc-






It is therefore essential to investigate critical timings in terms of the state
of the current circuit as proposed by [43].
Two approaches were investigated for the extraction of critical timings.
The ﬁrst empirically tests the minimum delay between subsequent pulses by
using SPICE runs to ascertain when the functionality of the circuit is aﬀected.
These tests are performed for each pulse pair and for each circuit state.
Figure 5.8: A simpliﬁed RSFQ AND gate with Mealy state diagram.
When considering the AND gate and its Mealy state machine represen-
tation in Figure 5.8, it is evident that all the shunt resistances and parasitic
elements were removed for the sake of clarity. The four possible states are gov-
erned by the storage (or lack of storage) of ﬂuxons in the two storage loops of
the circuit, B2L1B3 and B10L4B11. For each of these states, the min-
imum allowable time between input pulse pairs needs to be calculated. This
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 66
can be achieved relatively easily since the correct behaviour is known due to
the state machine extraction method described earlier. For example, assuming
that the circuit is in State 0, then no ﬂuxons are stored. If the relation be-
tween the DataAClock sequences is investigated, the stored state machine
is aware that it should move to State 1 under functional circumstances and
then return to State 0 without generating an output pulse. The ﬁrst step in
locating the minimum allowable delay is to ascertain whether a timing rela-
tionship exists at all. In order to test this, Clock is applied to the circuit
for the shortest time possible (usually a picosecond) after DataA. Depending
on the component values and manufacturing process, the currents in B3 and
B4 might not have stabilised suﬃciently by the time the Clock pulse reaches
B4, possibly throwing the Clock out of the circuit through B4 switching. The
state transitions identiﬁed by the state machine extraction in this scenario
would be only a transition to State 1 which would not compare correctly to
the nominal state machine representation. The algorithm can therefore as-
sume that a relationship exists between the DataA → Clock sequence which
should be investigated further. The investigation now returns the time delay
between the input pulses to that supplied by the designer after which a binary
search ensues. The time delay is therefore initially halved. If the circuit is
still functional after half the delay, half the diﬀerence between the last func-
tional time (in this case half the supplied time) and the last non-functional
time (the 1 picosecond delay) is subtracted from the current delay. If not,
half the diﬀerence between the last functional (the supplied time) and the last
non-functional time (in this case half the supplied time) is added to the delay.
The algorithm iteratively updates the functional and non-functional times by
either adding or subtracting half this value from the current delay in order
to create a window around the critical time. The designer is able to supply
a acceptable uncertainty (windows size) after which the algorithm will return
the last working value as the critical time relationship between the two pulses.
It is important to note that for these two inputs the sequence Clock →
DataA also has to be investigated. In this sequence the nominal state machine
remains in State 0 after the application of the Clock pulse and then transitions
to State1 after the arrival ofDataA. This is exactly what was determined from
the 1 ps delay example which therefore indicates that no relationship exists
between the input pair in this sequence.
The main assumption of this method is that all the critical timing informa-
tion of the circuit can be described in terms of the relationship of input pairs
only. Input triplets and n-tuples in general are not investigated at all. For
example, it is assumed that when the sequence DataA→ DataB → Clock is
applied to the circuit, the delay between DataA → DataB does not have an
eﬀect on the critical timing between DataB → Clock. This is a very diﬃcult
assumption to prove. In eﬀect, to ensure that this is not the case, every sig-
nal input in an n-tuple sequence needs to be investigated so as to create an
exhaustive critical timing model which would incur a signiﬁcant performance
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 67
penalty.
Figure 5.9: Empirical critical timing extraction algorithm.
An overview of the ﬁrst method is presented in Figure 5.9. At start-up,
an untested input pair of an untested state is selected for investigation. First
a SPICE run is executed to test the shortest delay possible between the two
input pulses (usually 1 ps). The resulting state can then be tested against the
nominal extracted state machine representation for circuit functionality. If the
run was found to be functional, no critical timing relationship is present and
the next untested input pair is investigated. If the run was non-functional,
a new delay is selected starting at half the nominal delay provided by the
designer. A binary search algorithm is then applied as explained above. After
each run the current critical time uncertainty is tested (window size). This is
the value of the last function delay minus the last non-functional delay. If this
diﬀerence is smaller than the acceptable uncertainty, the last working delay
is accepted as the critical time for the input pair. Once all pairs in all states
have been tested, the algorithm is complete.
The second method investigates the currents in the system to ascertain
when all transients have passed after the application of an input pulse. Be-
fore applying such a method, the term "stability" needs to be deﬁned for the
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 68
current application. Currently equation 5.4.3 is used to ascertain whether a
current under investigation is stable. In eﬀect, a ﬁnite diﬀerence is calculated
by using the diﬀerence between subsequent current values (normalised to the
ﬁrst sample) and testing them against a threshold. At present no theoretical
approach exists to identify the threshold value. This value depends on the
SPICE simulation time-step, the circuit under investigation and the manufac-
turing process being modeled by the current simulation. For example, it was
found that using the 1kA/cm2 process of the Institute of Photonic Technology
(IPHT) [49], a value of 0.01 tended to render very repeatable results when
using a SPICE time-step of 1 ps.
x(t− 1)− x(t)
x(t− 1) < threshold (5.4.3)
One step of stability as described in equation 5.4.3 also does not ensure
that the circuit is in fact stable. The algorithm has considered the possibility
of slow varying signals that could still cause instability in the circuit operation.
Equation 5.4.3 was therefore modiﬁed to function with a number of samples
as presented in equation 5.4.4 in order to mitigate this situation.
x(t− 1)− x(nt)
x(t− 1) < threshold should be true all n where n = 1...k (5.4.4)
For this equation, the initial sample is held as a reference and normalisation
ratio but the inequality needs to be met for a number of subsequent samples
before the algorithm acknowledges that stability has been reached. A value
of 5 for n was found to be robust for the manufacturing process and SPICE
time-step indicated previously.
The next variable to investigate when using the stability method is the
decision about which components to test for current stability. The most ro-
bust solution is to test every component in the circuit. This, however, incurs
a signiﬁcant processing penalty on the algorithm and for many circuits this
will result in large critical times. Considering the AND gate in Figure 5.8
once again, it is apparent that critical times, as stated before, are a relational
attribute between input pulses. If, for example, one investigates the critical
time between DataA and DataB, one recognises that the circuit was designed
so that these two pulses can arrive at the same time without aﬀecting the
functionality of the circuit. The stability algorithm testing all the components
would, however, assume that the stability of all the components could possibly
have an eﬀect on DataB after the arrival of DataA. The critical time would
therefore be calculated as the time that the last component took to stabilise
(possibly B8 in the circuit). This would therefore result in unnecessary timing
violations and a large performance penalty if these critical timings are to be
taken into account for further circuit design.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 69
On the other hand, a sub-set of components containing the storage loops
might generate a more realistic critical time. The algorithm could be modiﬁed
to only investigate the storage loops that change their storage characteristics
(gain or lose a ﬂuxon) due to the application of the second input pulse in
the critical timing pair. This information is readily available due to the state
machine extraction process described earlier. For example, consider the critical
time between the input pairDataA→ Clock. An input pulse onDataA causes
a ﬂuxon to be stored in its associated storage loop thereby transitioning the
state machine from State 0 to State 1. A subsequent Clock pulse should release
the previously mentioned stored ﬂuxon and return the state machine to State
0. Since the state machine fully describes the functionality of the circuit, and
also since the state machine is implemented by means of the ﬂuxon storage
loops, one might argue that the critical timing should only account for settling
times of relevant storage loop components. In the example the algorithm
would therefore monitor components B2, L1 and B3 for current stability before
allowing the Clock pulse to be safely applied to the circuit.
The ﬂaw in this argument, though, is that the eﬀects of series junctions
are ignored. If the critical timing for the input pair DataA → DataA is
to be investigated, the algorithm would notice that a subsequent pulse on
DataA will not aﬀect the current state after a ﬂuxon is already stored in loop
B2L1B3 due to the eﬀects of the ﬁrst pulse on DataA. The algorithm
would then erroneously assume that no critical time exists between subsequent
applications of pulses on DataA. In terms of this example the current stability
of the series junctionB1 needed to be considered as well. This method therefore
does not take into account the eﬀects of the storage or loss of a ﬂuxon on other
parts of the circuit.
Given the arguments thus far a possible generalisation could be stated as
follows:
A critical timing relationship exists between two input pulses if the ﬁrst
input pulse causes the circuit to change state, and if the circuit behaves dif-
ferently due to the application of the second pulse in the two respective states
[47].
The input pair DataA→ Clock ﬁts the above generalisation for a critical
timing relationship. The application of a pulse on DataA causes the circuit to
change state, and a subsequent application of a pulse on Clock has a diﬀerent
eﬀect on the circuit before and after the state change. More speciﬁcally the
change in circuit behaviour is such that before the arrival of DataA, a Clock
pulse would cause B4 to switch and after the application of DataA, a Clock
pulse causes B3 and B5 to switch.
The input pairDataA→ DataA also ﬁts the generalisation. The ﬁrst pulse
on DataA causes the state to change due to the ﬂuxon storage. The eﬀects
of the subsequent DataA pulse diﬀers before and after the state transition.
In this case, B2 switched before the state change while B1 switched after the
state transition.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 70
The input pair DataA → DataB does not have a critical timing relation-
ship which is also indicated by the generalisation. Even though a pulse on
DataA causes the circuit state to change, a subsequent pulse on DataB has
no diﬀerent eﬀect on the circuit before and after the state change.
Since the changes in the circuit are dictated by certain junctions in the
circuit, a natural conclusion is to rather investigate the stability of these com-
ponents in order to ascertain their critical timing relationships. For the input
pair DataA → Clock the diﬀerence in elements are B4, B3 and B5. There-
fore the critical timing relationship is the time required for the last of these
components to stabilise after the application of DataA.
Note that the eﬀects of Clock on the three critical junctions are not instan-
taneous after the application of the Clock pulse itself. The voltage pulse has
a certain delay before its eﬀects are applied to the critical junctions. In the
algorithm this time is also calculated and subtracted from the previously cal-
culated critical time so as not to unnecessarily lengthen the established critical
time.
A further reﬁnement was also required that is not stated in [47]. It is also
noteworthy that the generalisation does not hold for the input pair Clock →
Clock. If we start the application in State 0 there will be no change in the
state after the ﬁrst application of Clock. Hence, based on the generalisation,
the algorithm will assume that there is no relationship between the input
pairs. This, however, might not be the case since the application of two Clock
pulses with a very small delay might have functional eﬀects on the circuit. It
was therefore decided to rather test pulses that do not ﬁt the generalisation
by using the empirical timing characterisation method explained previously.
For example, when the algorithm notices that the input pair does not ﬁt the
generalisation, a SPICE run is performed where the pair is applied to the
circuit in the quickest possible succession (usually 1 ps delay). If the correct
ﬁnal state is reached by using this application, the algorithm indicates that no
critical timing relationship exists between this input pulse pair. If the correct
state was not reached, the empirical method needs to be applied to the input
pair so as to determine the correct relationship, as explained previously.
Even with the reﬁnements, it cannot be claimed that the method will work
under all circumstances. It cannot, for example, be conclusively proven that
junctions whose behaviour remains constant before and after a state change
cannot have any functional eﬀect due to transient eﬀects on the circuit be-
haviour. The proposed method does, however, function robustly for current
use cases.
An overview of the stability critical timing extraction algorithm is pre-
sented in Figure 5.10. At the start of the algorithm an untested input pair
in an untested state is selected for investigation. The known state machine
representation is used to ascertain whether the ﬁrst pulse in the pair causes a
state change in the system. If not, a shortest possible delay between the pulses
is tested to identify a possible critical timing relationship. If a relationship is
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 71
Figure 5.10: Current stability critical timing extraction algorithm.
identiﬁed, the empirical extraction method described previously is employed
to calculate the critical time of the input pair. If the ﬁrst pulse in the pair
does in fact cause a state change, a SPICE run is used to ascertain whether
the second pulse causes a diﬀerence in any junction switching between the
two respective states. If no diﬀerences are identiﬁed, the empirical method is
once again employed to test for a critical time relationship. If diﬀerences in
the junction switches were found, the ﬁrst pulse is applied to the circuit and
the last transient time of all the critical junctions is logged. The second pulse
is then used to log the ﬁrst possible eﬀect of the input on the critical junc-
tions (in both states reached). The diﬀerence between the two times are then
logged as the critical time relationship between the input pair. The algorithm
is completed after all the input pairs for all the states have been investigated.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 72
p_time : PROCESS ( in_A , in_B) IS
BEGIN
−− I n e s t i g a t e f o r v i o l a t i o n s
IF r i s ing_edge ( in_A) AND (
s0_in_B_in_A_err = '1 ' OR
s1_in_A_in_A_err = '1 ' ) THEN
REPORT "Timing V io l a t i on " SEVERITY ERROR;
END IF ;
IF r i s ing_edge ( in_B) AND (
s1_in_A_in_B_err = '1 ' ) THEN
REPORT "Timing V io l a t i on " SEVERITY ERROR;
END IF ;
−− Update v i o l a t i o n s i g n a l s
CASE current_state IS
WHEN state_0 =>
IF r i s ing_edge ( in_B) THEN
s0_in_B_in_A_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 42 PS ;
END IF ;
WHEN state_1 =>
IF r i s ing_edge ( in_A) THEN
s1_in_A_in_A_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 5 PS ;
END IF ;
IF r i s ing_edge ( in_B) THEN




Figure 5.11: Example error veriﬁcation VHDL code
5.4.6 HDL Generation
Both the state machine representation and the critical timing information is
necessary for the generation of an HDL model. There is no limitation as to
which HDL modelling language should be used for the model generation step
since all the commonly used languages (VHDL, Verilog, System Verilog etc.)
contain the necessary constructs for the behavioural and timing implementa-
tion. Currently the software library only allows for the generation of VHDL
models although other language support can be added in future.
Since VHDL (and other HDL languages) primarily make use of voltage
state logic, the presense of an SFQ pulse is modelled as a picosecond long
pulse. The rising edge of this pulse is used to trigger the sequential HDL logic.
The implementation of the critical timing checks are discussed in relation to
Figure 5.11.
Every critical timing relationship has an associated critical time signal.
If the signal value is a logical 1, this signiﬁes that the arrival of the second
pulse in the critical time pair will result in a timing violation. In the exam-
ple code, the circuit being modelled has two inputs (in_A and in_B). The
critical timing extraction method identiﬁed three relationships. The ﬁrst and
second relationship is in_B → in_A and in_A → in_A when in State
0, and in_A → in_B when in State 1. The associated error signals are
s0_in_B_in_A_err, s0_in_A_in_A_err and s1_in_A_in_B_err re-
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 73
spectively. The dynamics of the error signals are modelled by using VHDL
TRANSPORT delays. This ensures that the error signal returns to a logical 0
after the critical timing period has passed. The updating of the error signals
are implemented by means of a CASE construct.
The implementation of the state machine makes use of standard VHDL
methods to implement sequential logic and will not be elaborated on.
5.4.6.1 HDL Generation Example
A reconﬁgurable Destructive-Flip-Flop(DFF)-JTL circuit was selected to demon-
strate the HDL generation methodology. The circuit schematic is illustrated
in Figure 5.12 with component parameters available in Table 5.1. The shunt
resistors were removed for the sake of clarity. All input pulses are applied
through a DC-SFQ circuit connected to the circuit under test using a Joseph-
son Transmission Line while the outputs are terminated using a 2Ω resistor
connected to the circuit also through a Josephson Transmission Line.
Figure 5.12: Schematic circuit of an unoptimised reconﬁgurable DFF-JTL
The circuit can function either as a D-Flip-Flop or a JTL depending on
which of the two set-up inputs, Set_To_JTL or Set_To_DFF , last received
an input pulse. At start-up, the circuit functions as a D-Flip-Flop due to a lack
of a ﬂuxon being stored in the loop B14L20L21L17B11. The lack of a
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 74
Junctions Inductors Bias Currents
B1 : 100µm
2 L1 : 2.5 pH Ib1 : 81µA
B2 : 200µm
2 L2 : 2 pH Ib2 : 300µA
B3 : 100µm
2 L3 : 2.2 pH Ib3 : 300µA
B4 : 100µm
2 L4 : 2 pH Ib4 : 150µA
B5 : 225µm
2 L5 : 2.1 pH Ib5 : 63µA
B6 : 200µm
2 L6 : 0.1 pH Ib6 : 300µA
B7 : 300µm
2 L7 : 4.6 pH Ib7 : 300µA
B8 : 100µm
2 L8 : 1.8 pH Ib8 : 104µA
B9 : 200µm
2 L9 : 2 pH
B10 : 200µm
2 L10 : 0.5 pH
B11 : 125µm
2 L11 : 2.5 pH
B12 : 200µm
2 L12 : 2.5 pH
B13 : 225µm
2 L13 : 0.1 pH
B14 : 250µm
2 L14 : 0.5 pH
L15 : 2.3 pH
L16 : 3.9 pH
L17 : 0.8 pH
L18 : 2, 5 pH
L19 : 2.9 pH
L20 : 2.9 pH
L21 : 3.5 pH
L22 : 1.8 pH
L23 : 1.8 pH
L24 : 1.8 pH
L25 : 1.8 pH
Table 5.1: DFF-JTL circuit parameters.
stored ﬂuxon inﬂuences the bias current through B4, which in turn aﬀects the
behaviour of input pulses on Data_In. An input pulse on Data_In causes a
ﬂuxon to be stored in the loop B3L3L4B1L9B7. A subsequent input
pulse on the Clock input will cause B4 to switch allowing the stored ﬂuxon to
propogate throughData_Out. If noData_In pulse arrived prior to the Clock
arrival, the Clock pulse is removed from the circuit by the switching of B1.
This circuit demonstrates the ﬂuxon storage exception that was explained pre-
viously. Normally the stored ﬂuxon in loop B3L3L4B1L9B7 would
cause a shielding current through L5, L6 and B4 to counteract the net ﬂux
gain in loop B3L3L4L5L6B4 . The size of B4 is, however, relatively
small and this shielding current would cause B4 to exceed its critical current.
As explained earlier, this will cause some of the shielding current to rather
shunt through L7 and B5. In this case, the extraction algorithm either in-
dicates that B3L3L4B1L9B7 is a ﬂuxon storing inductive loop, or
that the combination of B3L3L4L5L6B4 and B4L7B5 are used
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 75
for the storage. Both of these choices will result in the correct state machine
representation.
An input on Set_To_JTL will have the eﬀect of storing a ﬂuxon in
B14L20L21L17B11 and therefore also an associated eﬀect on the bias
current of B4. Any subsequent inputs on Data_In will now rather switch
B4 before ﬂuxon storage can occur. This will then propagate the pulse to
Data_Out.
If the circuit is in the D-Flip-Flop mode with a ﬂuxon being stored, an input
on Set_To_JTL will cause the stored ﬂuxon to be propagated to Data_Out
before returning to JTL mode.
The Data_In input does not have a series junction to guard against two
subsequent pulses before a Clock pulse arrives. If this should happen, the
second Data_In pulse will cause B4 to switch without the need of a Clock
pulse. In order to indicate to the algorithm that this situation should not be
investigated, a user deﬁned rule is necessary. The appropriate rule is indicated
in equation 5.4.5 where node 17 is Data_In, node 16 is Clock, node 15 is
Set_To_DFF and node 14 is Set_To_JTL.
rule max 17 1 after 15 reset 16 override 14 (5.4.5)
The rule states that a maximum of one pulse can be applied to node 17.
This rule is only valid when the "after" node receives an input after the "over-
ride" node. In this case, the rule is only valid when the Set_To_DFF pulse
arrives after the Set_To_JTL pulse. The rule is reset after the arrival of a
pulse on node 16.
All the inputs of the circuit-under-test were connected to the output of a
DC-SFQ and JTL pair in order to generate a representative SFQ pulse from
a piece-wise linear SPICE input. The output was connected to a JTL before
terminating in a 2Ω resistor.
The extracted state machine representation is illustrated in Figure 5.13
which was reproduced from [47].
The system has two ﬂuxon storage loops which equates to a maximum of
four states. However, only three states are reachable since both loops never
store a ﬂuxon simultaneously. State 0 is associated with the DFF functionality
where no ﬂuxon is stored in B14L20L21L17B11. State 2 is associated
with the storage of a ﬂuxon in B14L20L21L17B11 after the arrival
of a pulse on Data_In. State 1 is associated with the storage of a ﬂuxon
in B14L20L21L17B11. The circles on the transition arrows indicate
when an output pulse is generated. It is important to note that an additional
Data_In pulse in State 2 is not allowable due to the lack of a series junction
in the input path.
After the state machine is generated the critical timings of the circuit are
extracted before the VHDL model is constructed. For the purposes of com-
parison both of the critical timing extraction methods were applied to the
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 76
Figure 5.13: Extracted DFF-JTL Mealy state machine representation.
DFF-JTL circuit. The VHDL error code for the empirical extraction method
is shown in Figure 5.14 where the allowable timing uncertainty was selected
as 4 ps. The critical timings by using the current stability method is shown in
Figure 5.15 and in Figure 5.16.
The inputs of the VHDL model is associated with the input junctions that
were used to calculate timing parameters. The input junctions were selected as
the last junctions in the connected DC-SFQ-JTL pair. In the model, X6_B2
is associated with the Set_To_JTL input, X7_B2 is associated with the
Set_To_DFF input, X8_B2 is associated with the Clock input and X9_B2
is associated with Data_In. Entity and architectural deﬁnitions were omitted
for the sake of clarity.
From the ﬁgures presented above, it is evident that there is a substantial
diﬀerence in the models of the two methods. The current stability method
always errs on the side of caution since it is clear that many of the critical
timings generated due to overlapping current dynamics do not empirically
cause a circuit failure. One could argue, however, that the stability method
results in a far more robust circuit. The stability method can further be
reﬁned to rather change the time selected for the start and end of the current
transients to a certain percentage of the maximum current change.
The state machine implementation itself is equivalent between the methods
and is presented in Figure 5.17.
VHDL and SPICE simulation results are illustrated in ﬁgures 5.18 and
5.19 which were reproduced from [47] to illustrate the equivalence between the
SPICE circuit and the generated HDL model.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 77
p_time : PROCESS ( in_X8_B2 , in_X6_B2 , in_X7_B2) IS
BEGIN
IF r i s ing_edge (X8_B2) AND (s0_X9_B2_X8_B2_err = '1 ' ) THEN
REPORT "Timing V io l a t i on " SEVERITY ERROR;
END IF ;
IF r i s ing_edge (X6_B2) AND (
s0_X7_B2_X6_B2_err = '1 ' OR
s2_X6_B2_X6_B2_err = '1 ' OR
s2_X7_B2_X6_B2_err = '1 ' ) THEN
REPORT "Timing V io l a t i on " SEVERITY ERROR;
END IF ;
IF r i s ing_edge (X7_B2) AND (
s1_X6_B2_X7_B2_err = '1 ' OR
s1_X9_B2_X7_B2_err = '1 ' ) THEN




IF r i s ing_edge (X7_B2) THEN
s0_X7_B2_X6_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 8 PS ;
END IF ;
IF r i s ing_edge (X9_B2) THEN
s0_X9_B2_X8_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 8 PS ;
END IF ;
WHEN state_1 =>
IF r i s ing_edge (X6_B2) THEN
s1_X6_B2_X7_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 5 PS ;
END IF ;
IF r i s ing_edge (X9_B2) THEN
s1_X9_B2_X7_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 10 PS ;
END IF ;
WHEN state_2 =>
IF r i s ing_edge (X6_B2) THEN
s2_X6_B2_X6_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 45 PS ;
END IF ;
IF r i s ing_edge (X7_B2) THEN




Figure 5.14: Error veriﬁcation VHDL code for the DFF-JTL circuit using the
empirical extraction method.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 78
p_time : PROCESS ( in_X8_B2 , in_X6_B2 , in_X7_B2 , in_X9_B2) IS
BEGIN
IF r i s ing_edge (X8_B2) AND (
s0_X9_B2_X8_B2_err = '1 ' OR
s2_X6_B2_X8_B2_err = '1 ' OR
s2_X8_B2_X8_B2_err = '1 ' ) THEN
REPORT "Timing V io l a t i on " SEVERITY ERROR;
END IF ;
IF r i s ing_edge (X6_B2) AND (
s0_X6_B2_X6_B2_err = '1 ' OR
s0_X7_B2_X6_B2_err = '1 ' OR
s0_X9_B2_X6_B2_err = '1 ' OR
s1_X7_B2_X6_B2_err = '1 ' OR
s2_X6_B2_X6_B2_err = '1 ' OR
s2_X7_B2_X6_B2_err = '1 ' OR
s2_X8_B2_X6_B2_err = '1 ' ) THEN
REPORT "Timing V io l a t i on " SEVERITY ERROR;
END IF ;
IF r i s ing_edge (X7_B2) AND (
s0_X6_B2_X7_B2_err = '1 ' OR
s1_X6_B2_X7_B2_err = '1 ' OR
s1_X7_B2_X7_B2_err = '1 ' OR
s1_X9_B2_X7_B2_err = '1 ' OR
s2_X6_B2_X7_B2_err = '1 ' ) THEN
REPORT "Timing V io l a t i on " SEVERITY ERROR;
END IF ;
IF r i s ing_edge (X9_B2) AND (
s0_X6_B2_X9_B2_err = '1 ' OR
s1_X7_B2_X9_B2_err = '1 ' OR
s2_X6_B2_X9_B2_err = '1 ' OR
s2_X8_B2_X9_B2_err = '1 ' ) THEN
REPORT "Timing V io l a t i on " SEVERITY ERROR;
END IF ;
Figure 5.15: Error veriﬁcation VHDL code for the DFF-JTL circuit using the
current stability method.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 5. STATE MACHINE AND TIMING EXTRACTION 79
CASE current_state IS
WHEN state_0 =>
IF r i s ing_edge (X6_B2) THEN
s0_X6_B2_X6_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 42 PS ;
END IF ;
IF r i s ing_edge (X6_B2) THEN
s0_X6_B2_X7_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 43 PS ;
END IF ;
IF r i s ing_edge (X6_B2) THEN
s0_X6_B2_X9_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 20 PS ;
END IF ;
IF r i s ing_edge (X7_B2) THEN
s0_X7_B2_X6_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 8 PS ;
END IF ;
IF r i s ing_edge (X9_B2) THEN
s0_X9_B2_X8_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 46 PS ;
END IF ;
IF r i s ing_edge (X9_B2) THEN
s0_X9_B2_X6_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 49 PS ;
END IF ;
WHEN state_1 =>
IF r i s ing_edge (X6_B2) THEN
s1_X6_B2_X7_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 5 PS ;
END IF ;
IF r i s ing_edge (X7_B2) THEN
s1_X7_B2_X6_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 45 PS ;
END IF ;
IF r i s ing_edge (X7_B2) THEN
s1_X7_B2_X7_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 48 PS ;
END IF ;
IF r i s ing_edge (X7_B2) THEN
s1_X7_B2_X9_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 25 PS ;
END IF ;
IF r i s ing_edge (X9_B2) THEN
s1_X9_B2_X7_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 10 PS ;
END IF ;
WHEN state_2 =>
IF r i s ing_edge (X6_B2) THEN
s2_X6_B2_X8_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 77 PS ;
END IF ;
IF r i s ing_edge (X6_B2) THEN
s2_X6_B2_X6_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 82 PS ;
END IF ;
IF r i s ing_edge (X6_B2) THEN
s2_X6_B2_X7_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 86 PS ;
END IF ;
IF r i s ing_edge (X6_B2) THEN
s2_X6_B2_X9_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 62 PS ;
END IF ;
IF r i s ing_edge (X7_B2) THEN
s2_X7_B2_X6_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 12 PS ;
END IF ;
IF r i s ing_edge (X8_B2) THEN
s2_X8_B2_X8_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 66 PS ;
END IF ;
IF r i s ing_edge (X8_B2) THEN
s2_X8_B2_X6_B2_err = TRANSPORT ' 1 ' , ' 0 ' AFTER 65 PS ;
END IF ;
IF r i s ing_edge (X8_B2) THEN




Figure 5.16: Error veriﬁcation VHDL code for the DFF-JTL circuit using the
current stability method continued.
Stellenbosch University  https://scholar.sun.ac.za




IF r i s ing_edge (in_X6_B2) THEN
current_state <= state_1 ;
END IF ;
IF r i s ing_edge (in_X9_B2) THEN
current_state <= state_2 ;
END IF ;
WHEN state_1 =>
IF r i s ing_edge (in_X7_B2) THEN
current_state <= state_0 ;
END IF ;
IF r i s ing_edge (in_X9_B2) THEN
out_X11_B1_sig <= TRANSPORT ' 1 ' AFTER 43 ps , ' 0 ' AFTER 44 ps ;
END IF ;
WHEN state_2 =>
IF r i s ing_edge (in_X8_B2) THEN
current_state <= state_0 ;
out_X11_B1 <= TRANSPORT ' 1 ' AFTER 34 ps , ' 0 ' AFTER 35 PS ;
END IF ;
IF r i s ing_edge (in_X6_B2) THEN
current_state <= state_1 ;
out_X11_B1 <= TRANSPORT ' 1 ' AFTER 43 ps , ' 0 ' AFTER 44 PS ;
END IF ;
IF r i s ing_edge (in_X9_B2) THEN
REPORT " I l l e g a l  Input  Sequence" SEVERITY ERROR;
END IF ;
END CASE;
Figure 5.17: VHDL state machine implementation of the DFF-JTL circuit.
Figure 5.18: Functional waveforms of DFF-JTL VHDL simulation.
Figure 5.19: SPICE transient simulation of DFF-JTL circuit.
Stellenbosch University  https://scholar.sun.ac.za
Chapter 6
Yield Optimisation of RSFQ
Circuits
6.1 Introduction
Optimisation is the minimisation (or maximisation) of a function given certain
variable constraints. Mathematically, optimisation can be described as follows
[50]:
minimise
x ∈ Rn fi(x), (i = 1...M) (6.1.1)
with φj(x) = 0 (j = 1...J) (6.1.2)
υk(x) ≤ 0 (j = 1...K) (6.1.3)
In equations 6.1.1, x is the input vector (also called the design vector)
and fi is the function to be minimised(maximised), also known as the cost or
merit function. The subscript indicates that more than one objective can be
used in the optimisation process. The constraints can generally be classiﬁed
as equalities (for φ) and bounds (for υ) which are applicable to the input
vector. In the equation, the input vector is designated as being in the set of
real numbers. The equations are, however, applicable to integer sets as well.
In this section the main focus falls on the optimisation of circuit yield.
Circuit yield is the percentage of manufactured circuits that are classiﬁed
as functional, given their designer speciﬁcations and manufacturing process
variations. Various methods of circuit yield calculations were investigated in
Chapter 3 of this study.
The need for optimisation in terms of circuit yield is self-evident. Commer-
cially speaking, less circuit failures equate to more proﬁt. In terms of research,
more robust circuits equate to the possible design of more complex circuitry.
In the semiconductor industry, yield optimisation (also known as design
centering) is still an active research area, even after decades of maturity and
81
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 82
experience in the manufacturing of such circuits. The reason for this appears
to be the constant shrinking of node sizes with associated changes in the man-
ufacturing process.
There is no reason to believe that this will be any diﬀerent for RSFQ cir-
cuits (or for any other next-generation digital logic circuitry). RSFQ circuit
design is still in its infancy, even after a few decades of active research. This
includes manufacturing processes that cannot be compared with the complex-
ity of processes used in the semiconductor industry. In order to stimulate
research and funding in the superconducting digital logic ﬁeld, complex and
useful circuits need to be designed, shown to be functional, shown to be com-
parable and, hopefully, superior to their semiconductor counterparts. Yield
optimisation forms an integral part of this process to create complex, robust
circuits.
Substantial research has already been conducted in the ﬁeld of RSFQ yield
optimisation. Many of these methods were inspired by the semiconductor in-
dustry, given the overlapping similar problem of process spread. Currently,
the optimisation methods investigated for RSFQ circuits fall into two cate-
gories: geometric and heuristic. Geometric methods attempt to model the
feasible area in the possible search space by using multi-dimensional geometric
structures. The feasible area is the area in the search space where the circuit
is deemed functional. Once a geometric model is generated, a mathematical
method can be used to ﬁnd the centre of this structure, which then indicates
the optimal design. It follows that the highest yield can be found at this point
since it allows for the maximum amount of deviation in all dimensions while
still maintaining the functionality of the circuit. An example of this method
using hyper-spheres can be found in [51].
The second type of optimisation that is employed to maximise circuit yield
is heuristic and meta-heuristic methods. A heuristic can be thought of as an
experience-based technique that can be applied to solve a particular problem.
A meta-heuristic is a technique that can be applied to many diﬀerent types
of problems. Generally, heuristics do not guarantee optimal solutions, but
rather tend to produce good solutions for diﬃcult optimisation problems. A
staple of modern heuristics is also a random element which tends to make any
rigorous mathematical proof of convergence a very diﬃcult undertaking. On
the other hand, it allows these methods to be used to solve problems that
cannot be approached analytically at all. Some of these methods can be found
in [52, 53, 54]. As is evident, many of these heuristics are inspirated by natural
phenomena in order to address the optimisation problem.
Comparing heuristics is a diﬃcult task due to their intrinsic random nature.
One can, for example, do a few runs of each heuristic and illustrate the ﬁnal
merit value that is achieved. However, these results might never be repeatable
due to the random nature of the algorithms. A more robust approach for
comparing this probabilistic data would be to perform many runs and obtain
statistical information on the data provided. This implies an investigation into
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 83
the best merit achieved, the mean of all the runs, the standard deviation of the
runs, and so on, in order to ascertain whether there really is an improvement
of any particular algorithm over another. The problem is exacerbated by the
amount of time it takes to perform an optimisation run for a single, very
small circuit. Depending on the conﬁdence level of the requires values, an
optimisation run can take anything from a few minutes to a few days which
renders the gathering of vast amounts of data very time-consuming.
The alternative is to use a model of the merit space that can be sampled
in a fraction of the time a SPICE run would take. This ability is provided
by the artiﬁcial neural networks discussed in Chapter 4. Once the network
has been trained it can be reused for the testing of all the heuristics. Bearing
in mind though that the training of an artiﬁcial neural network is not an
exact science. One might, for example, miss subtle behaviours in the merit
function either because no training data were available at those locations, or
the network parameters were not set up correctly to incorporate the correct
behaviour. Either way, as is presented below, the networks will serve as an
approximation of the shape of the search space which serves as a tool to test
optimisation runs thousands of times in order to extract statistical data.
In this section three heuristics are investigated. These are the greedy local
search, simulated annealing and genetic algorithm. All three heuristics are
investigated by means of artiﬁcial neural networks so as to determine their
merit function approximation.
The presented optimisation algorithms form part of the software library
described in Appendix A, which can easily be extended to include other algo-
rithms.
The algorithms are presented in the next section, along with their param-
eters that will be varied in order to investigate their behaviour. The RSFQ
circuits that form part of the investigation are then presented along with an
investigation of their yield search spaces. The solution representation is then
explained after which the results of the optimisation algorithms are presented
along with a conclusion.
6.2 Meta-Heuristics
6.2.1 Greedy Local Search
Greedy Local Search [55] is a very simple randomised heuristic that serves as a
control for comparison purposes. It can be classiﬁed as a local search methodol-
ogy. It commences with a single solution and steps through the neighbourhood
using random perturbations of this solution. A ﬂow diagram of the algorithm
is shown in Figure 6.1.
The algorithm is initiated by creating a viable solution situated in the
feasible region of the search space. The merit value of the solution is then
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 84
Figure 6.1: Greedy local search algorithm ﬂow diagram.
calculated using the chosen merit function. With this initial solution, a vi-
able neighbouring solution is then selected by randomly varying the value of
a component (or multiple components). What exactly constitutes the neigh-
bourhood is explored later in this section. The merit value of the neighbouring
solution is then calculated and compared to that of the initial solution. If the
merit value of the neighbouring solution is higher (for maximisation optimi-
sation), then this solution is selected as the current solution and the initial
solution is discarded. However, if the merit value of the neighbouring solution
is lower than that of the initial solution, the neighbouring solution is discarded
and the initial solution retained as the current solution. A new neighbour is
then created from the current solution and compared in terms of merit value.
The algorithm ends when a certain stop criteria has been reached. This criteria
can generally be the number of tested solutions, an adequate merit value being
reached or no merit increase being found in a certain number of neighbours
generated.
The only variable that can inﬂuence the behaviour of a greedy local search
algorithm is the neighbourhood deﬁnition of a solution. The neighbourhood
is described in the variation length (or neighbourhood size) and the number
of variable dimensions. The variation length designates the range (from the
nominal solution value) from which the neighbouring solution can be selected.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 85
For the solution representation, the scope can vary from one step (in a positive
or negative direction) up to the whole range of the component, depending on
the variation length parameter. Another possible variable may determine the
number of dimensions (or components) across which perturbation should be
allowed.
The choice of these parameters can have a profound eﬀect on the workings
of the algorithm. For example, by choosing 1 as the allowable step selection,
the algorithm stands a very good chance of getting stuck in a local optimum
(if one exists). However, if a large step size is selected, the algorithm could
"jump" over a local optimum thus allowing a global optimum to be found. The
selection of multiple variable dimensions also allows the algorithm to move to
better solutions for problems that have a signiﬁcant inﬂuence on the parame-
ters.
The eﬀects of the variation length (neighbourhood size) and variable di-
mensions (neighbourhood dimension) are investigated below.
6.2.2 Simulated Annealing
The simulated annealing algorithm takes its name from the annealing process
in solids [56]. A crystalline solid is heated and slowly cooled to allow the crys-
tal lattice conﬁguration to reach its most regular (least energy) state. This
leads to superior structural integrity of the solid. The basics of the optimi-
sation algorithm itself is closely related to that of a local search algorithm
(such as the Greedy Local Search). The algorithm commences with a viable
solution from which a neighbouring solution is created. The diﬀerence lies in
the acceptance criteria of this new solution. Unlike with a greedy local search,
a worse solution can be selected for the current solution depending on a cer-
tain, variable probability. This probability is dependent on the diﬀerence in
merit values of the two solutions and on a parameter called the temperature.
The higher the temperature value, the more likely the algorithm will be to
select a worse solution. The change in the temperature variable is governed
by the cooling schedule of the algorithms, which is elaborated on below. The
Metropolis acceptance criteria [56] is generally used to deﬁne the probability
of selecting a worse solution. These criteria are presented in equation 6.2.1.
The probability of selecting an inferior solution that has a yield value of 20%
less than the comparative solution is depicted in Figure 6.2.
The acceptance criteria provide a possible method for the algorithm to
avoid a local optimum by allowing for inferior steps. As illustrated in Figure
6.2, at the start of the run, when the temperature value is high, the probability
of selecting an inferior solution is high too. As the temperature decreases,
however, less and less inferior solutions are accepted. When the temperature
variable reaches zero, the algorithm reduces to a simple greedy local search as
explained earlier. Nonetheless, it is noticeable that this methodology does not
guarantee that simulated annealing will always perform better than its greedy
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 86































Figure 6.2: Graph of probability of selecting an inferior solution using simu-
lated annealing.
local search counterpart. Simulated annealing only provides the mechanics for
the algorithm to possibly settle near the global optimum before commencing
a greedy search. The greedy local search algorithm, on the other hand, is
completely dependent on its starting value, and the size and dimensionality of
the deﬁned neighbourhood.
P (Accept ω′) =
{
exp[(f(ω′)− f(ω))/tk] if f(ω′) < f(ω)
1 if f(ω′) ≥ f(ω) (6.2.1)
It is interesting to note that simulated annealing is one of the few meta-
heuristics that indicate proof of convergence. The interested reader can inves-
tigate this convergence proof in [56].
The algorithm is further explained in Figure 6.3. It commences with the
creation of an initial, random solution in the feasible region of the search
space after which the merit value of this solution is calculated. A neighbour-
ing solution is then constructed using the initial solution followed by the merit
calculation of the neighbouring solution. If the merit value of the neighbouring
solution is higher (given a maximisation problem) than that of the initial solu-
tion, the neighbouring solution is selected as the current solution. If the merit
value of the neighbouring solution is lower than that of the initial solution, the
acceptance criteria in equation 6.2.1 is used to either select the neighbouring
solution as the new current solution, or to discard it depending on a sample
from a uniformly distributed random variable.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 87
Figure 6.3: Simulated annealing algorithm ﬂow diagram.
The stop criteria can be associated with a certain merit value, an absolute
number of runs, or a stability criterion. Generally, the strategy allows the
temperature variable to reduce to zero, after which a greedy implementation
is run until no improvements are found for a certain number of samples. If
the number of stable samples are large, the possibility of a local (or global)
optimum being found is high.
The cooling strategy of the algorithm deﬁnes how the temperature variable
varies as the algorithm progresses. This normally consists of maintaining the
temperature variable at a stable rate for a number of steps after which the
temperature is decreased with a certain step size. In general, the decrease
in the temperature variable is linear or geometric. The cooling strategy is
discussed in detail below.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 88
Simulated annealing has the same neighbourhood deﬁnition as the greedy
local search heuristic. The same neighbourhood parameters are therefore avail-
able as variables. These are the number of dimensions that can be perturbed
when seeking a neighbouring solution and also the maximum step size that
can be taken from the initial solution.
A unique feature of the simulated annealing algorithm is its cooling strat-
egy. This strategy governs how the temperature variable changes as the al-
gorithm progresses. For the purpose of heuristic comparison, a linear cooling
strategy is used. This entails maintaining the temperature at a stable rate for
a number of steps after which the temperature is reduced by a ﬁxed step size.
For this implementation three free variables are evident: the ﬁrst variable is
the starting temperature, the second is the number of iterations per temper-
ature step and the last variable is the temperature step itself. As explained
previously, the aim is to allow the algorithm to settle in an advantageous po-
sition within the search space before running a greedy implementation with a
temperature variable that had reached zero.
It was decided to reduce the three variables to two by using a ﬁxed allowable
number of samples until the temperature reached zero. The number of steps
that the cooling strategy can take from its starting point until it reaches zero,
multiplied by the number of iterations per temperature step are held constant.
This is realised to ensure that one set-up does not perform better than another
because it was allowed to use more samples. For example, if 1000 samples are
selected as a constant and 10 steps are chosen for the starting temperature
value to reach zero, 100 iterations per temperature should be applied.
The free variables that are investigated for the simulated annealing algo-
rithm reduce to the starting value and the number of temperature steps that
the cooling strategy will take.
6.2.3 Genetic Algorithm
Genetic algorithms [57] are meta-heuristics inspired by the genetics of evolu-
tion, with the concept of survival of the ﬁttest playing a large part in the algo-
rithm. The notion behind this heuristic is that a ﬁt (good) solution (genome)
carries some features that are the cause of the ﬁtness. This might, for exam-
ple, either be the presence of a certain parameter (gene) value in the genome
or the presence of a group of interrelated parameter values. The methodol-
ogy is to attempt to pass these good features on to the next generation by
"breeding" the solution with another (usually also good) genome. A mutation
strategy usually also forms part of this heuristic. This entails changes to one
or more structures of a single genome in order to add new genetic information
into the generation that was not available in the gene pool before as well as
to perform a rudimentary local search. An "elitist" strategy is also usually
employed. This entails that a certain percentage of the best genomes in the
current generation will be directly transferred to the new generation. This af-
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 89
Figure 6.4: Genetic algorithm ﬂow diagram.
fords a further opportunity to incorporate the good features of these genomes
into future generations.
The algorithm is explained further in Figure 6.4.
The algorithm commences by creating an initial generation which consists
of random, but usually viable genomes after which the ﬁtness (merit value)
of each genomes is calculated. After each ﬁtness calculation process the stop
criteria is checked, which is explained below.
If the stop criteria was not reached, a new generation of genomes is cre-
ated from the current generation (illustrated by the shaded area in the ﬁgure).
The composition of the new generation is variable based on the decision of the
designer. Normally a percentage, called the crossover percentage, is used to
deﬁne what percentage of the new generation should be created via "breeding"
(or crossover as referred to in genetic algorithm literature). Before these new
genomes can be constructed, a gene pool is ﬁrst created. This is a list of the
genomes in the current generation that are elected to take part in the crossover
process. Generally this list contains the best individuals of the current gener-
ation. The percentage of the ordered list (best to worst) of individuals in the
current generation that are ellected to take part in the crossover is called the
threshold percentage. Finally the diﬀerence between the number of individu-
als in a generation and the number of individuals created by the crossover, is
made up of the best individuals of the current generation thus incorporating an
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 90
Figure 6.5: Genetic algorithm crossover illustration.
elitist strategy which ensures that the gene pool contains a suﬃcient amount
of good genomes.
The stop criteria of the algorithm can once again either be the number of
runs (or the number of generations in this case), a certain merit (ﬁtness) value
being reached, or a certain number of runs for which no improvements were
found.
Genetic algorithms are easy to implement but hard to use optimally. This
is due to the number of variable parameters in the algorithm that have an
eﬀect on its behaviour. The inherent power of the algorithm emanates from its
parallel implementation. In genetic algorithms, unlike with greedy local search
and simulated annealing algorithms a large diverse area of the search space is
investigated in parallel which improves the chances of identifying promising
search areas. The way in which these areas are searched is governed by the
parameters of the algorithm which are discussed below.
The ﬁrst parameter of interest is the size of the generation. This is the
number of individuals in each generation. It is generally favourable to have
a large generation size as this ensures that various parts of the search space
is represented in the generation. This does, however, add substantially to the
processing time necessary to calculate the generational ﬁtness.
The next signiﬁcant parameters are related to how any subsequent genera-
tion is constructed. The truncation percentage is the percentage of the current
generation that will form part of the gene pool. This is the list of viable par-
ents for the crossover process. The next parameter of interest is the crossover
percentage. This is the percentage of a new generation that should be created
using the crossover operator.
The crossover itself can be deﬁned in many diﬀerent ways. The imple-
mented crossover strategy is illustrated in Figure 6.5
A number of crossover points are deﬁned for the algorithm which map to
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 91
indices in the solution representation. In the ﬁgure, each block of a solution
corresponds to a parameter (for example a circuit component) and the value
inside the block is the current value of the component in the allowable range.
The crossover points therefore map to indices of the solution representation.
The user of the algorithm only deﬁnes the number of crossover points and
from then onwards the points themselves are randomly selected. The crossover
process then steps through the solution representation assigning gene values
from the parent to the child. At the start of the process, the gene values of
parent 1 will be assigned to the child. As soon as a crossover point is reached,
the source of points is switched to another parent, who then continues to
assign values. When another crossover point is reached the source switches
back to the ﬁrst parent, and so on. In this manner there is a possibility that
components that function well together, stay together in the genome of the
child.
The last parameter of interest is the mutation ratio. This relates to the
probability that an individual in the newly created generation will undergo
mutation. The mutation process itself is akin to the neighbourhood search
methods explained earlier. A number of component points (called dimensions
previously) are selected along with a range over which these dimensions may
vary. These components are then randomly assigned new values in the range.
The same eﬀects caused by the neighbourhood deﬁnition explained earlier are
prevalent at the mutation stage.
In order to keep the investigation tractable, three parameters were selected
to vary. These are the generation size, the number of crossover points and
the mutation ratio. The generation size should in general have a signiﬁcant
eﬀect on the algorithm given that this would increase the diversiﬁcation and
therefore increase the opportunity to ﬁnd a good solution. The number of
crossover points could also make an interesting contribution to the algorithm
since the components in a circuit generally tend to have some correlation with
regards to their values. The number of crossover points inﬂuence the amount
of neighboring components that will remain together throughout the crossover
process. The mutation ratio controls the localised search capability of the
algorithm, and is therefore also important.
6.3 Circuits
The RSFQ AND and OR circuits were investigated for yield optimisation.
These circuits were already introduced in earlier chapters in this document
and are added here again for easy reference. Values for the junction capaci-
tances and shunt resistances were obtained using the IPHT 1 kA process [49].
The respective values are listed in Table 6.1. All the shunt resistances were cal-
culated for a βc value of approximately 1. It is important to note that for the
optimisation investigation smaller step sizes were used than those presented
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 92
in Table 6.1. A simple linear interpolation method was used to generate ca-
pacitance and shunt resistance values that are not shown. All input pulses are
applied through a DC-SFQ circuit connected to the circuit under test using
a Josephson Transmission Line while the outputs are terminated using a 2Ω
resistor connected to the circuit also through a Josephson Transmission Line.
Junction Area Junction Capacitance Shunt Resistance
100µm2 0.498988 pF 2.576 Ω
125µm2 0.624488 pF 2.057 Ω
150µm2 0.785522 pF 1.695 Ω
175µm2 0.883018 pF 1.456 Ω
200µm2 1.00852 pF 1.274 Ω
225µm2 1.13151 pF 1.135 Ω
250µm2 1.26153 pF 1.022 Ω
275µm2 1.38502 pF 0.9278 Ω
300µm2 1.50550 pF 0.8529 Ω
325µm2 1.62999 pF 0.7879 Ω
350µm2 1.75851 pF 0.7308 Ω
375µm2 1.87848 pF 0.6842 Ω
400µm2 2.01704 pF 0.6371 Ω
425µm2 2.15234 pF 0.5934 Ω
450µm2 2.28123 pF 0.5435 Ω
Table 6.1: IPHT 1 kA junction parameters.
The AND gate in Figure 6.6 consists of two ﬂuxon storage loops that are
used to change the behaviour of the circuit. These loops are B2L2B3 and
B13L12B14. Inputs on DataA and DataB are respectively stored in these
loops. A pulse on Clock causes both of these stored ﬂuxons to be released and,
due to the biasing of B6 and B10 causes B8 to switch, thereby producing an
output pulse. The storage of the ﬂuxons inﬂuences the biasing of B6 and B10.
If only one loop holds a ﬂuxon, an arriving clock pulse will cause B6 or B10 to
switch before B8, therefore no output pulse is produced. Junctions B1 and B12
serve as escape junctions for consecutive data pulses without an interleaved
Clock pulse. Junctions B5 and B9 form a pulse splitter for the application of
the Clock pulse to the storage loops while junctions B4 and B11 are used as
escape junctions for the Clock pulses if no ﬂuxons are present in the respective
loops. The circuit parameters are again listed in Table 6.2.
It is also important to note that the two input data arms of the AND
gate are identical. It is therefore unnecessary to optimise the components of
these two arms independently since their parameter values will generally be
identical. This information can be passed to the optimisation algorithm via a
linked component list.
The next circuit under test is the RSFQ OR gate, depicted in Figure 6.7.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 93
Figure 6.6: RSFQ-AND circuit schematic.
Figure 6.7: RSFQ-OR circuit schematic.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 94
Junctions Inductors Bias Currents
B1 : 200µm
2 L1 : 2.47 pH I1 : 187µA
B2 : 250µm
2 L2 : 9.4pH I2 : 256µA
B3 : 325µm
2 L3 : 2.75 pH I3 : 460µA
B4 : 275µm
2 L4 : 3.01 pH I4 : 187µA
B5 : 175µm
2 L5 : 0.69 pH
B6 : 150µm
2 L6 : 1.25 pH
B7 : 300µm
2 L7 : 0.81 pH
B8 : 350µm
2 L8 : 1.1 pH
B9 : 175µm
2 L9 : 0.69 pH
B10 : 150µm
2 L10 : 3.01 pH
B11 : 275µm
2 L11 : 2.47 pH
B12 : 200µm
2 L12 : 9.4 pH
B13 : 250µm
2 L13 : 2.75 pH
B14 : 325µm
2
Table 6.2: Un-optimised RSFQ AND gate parameters.
The OR gate consists of a single ﬂuxon storage loop with components
B6L6B5B7. An input pulse on DataA or DataB will be stored in this
loop. Junctions B3 and B8 are used as escape junctions for the data pulses
to ensure that an SFQ pulse is not propagated back into the system via the
non-receiving input data port. When two consecutive pulses arrive at DataA
and DataB without an interleaved Clock pulse, B5 will switch, thus ensuring
that only one pulse is stored in the storage loop. The arrival of a Clock pulse
will release the stored ﬂuxon through B7 to generate an output pulse. B2 serve
as escape junction for the incoming Clock pulse if no ﬂuxon is present in the
storage loop. Parameters for the OR gate can be found in Table 6.3.
Junctions Inductors Bias Currents
B1 : 200µm
2 L1 : 1.27 pH I1 : 160µA
B2 : 200µm
2 L2 : 1.77 pH I2 : 520µA
B3 : 300µm
2 L3 : 1.27 pH
B4 : 325µm
2 L4 : 0.65 pH
B5 : 250µm
2 L5 : 3.09 pH
B6 : 100µm
2 L6 : 6 pH
B7 : 250µm
2 L7 : 3, 47 pH
B8 : 300µm
2 L8 : 1.27 pH
B9 : 325µm
2 L9 : 0.65 pH
Table 6.3: Un-optimised RSFQ OR gate parameters.
Stellenbosch University  https://scholar.sun.ac.za



















B2 Area in µm
2







Figure 6.8: Sobol sequence yield map of B2 and L2 for the RSFQ AND gate.
6.4 Search space
By looking at the search space of the optimisation problem, one might as-
certain which of the algorithms should be superior. The problem faced when
optimising circuits, though, is the number of free variables in the system, which
make it impossible to visualise the search space as a whole. One can, however,
investigate two dimensions of the search space (with an associated merit axis).
Consequently, one of the most important features that can be determined is
whether the search space is concave (or convex for minimisation problems).
This might indicate that a normal greedy heuristic can be used to merely it-
eratively move through the search space until the global optimum is reached.
If the space is not convex, one is more than likely dealing with a multi-modal
search space. This is a search space that contains multiple optima which, in
general, diﬀer from their optimal values. In such situations more advanced
heuristics (such as simulated annealing and genetic algorithms) can assist in
ﬁnding the global optimum.
Visualisations of the search space were already presented for the critical
components in the storage loops in the investigation of artiﬁcial neural net-
works in Chapter 4. These visualisations are again presented below for refer-
ence purposes.
From the visualisations it would appear that the yield search space of the
two gates are mostly concave, there being only one global maximum. Given
the concave structure, the greedy local search algorithm should prove suﬃcient
for the optimisation process. Naturally this is only a two-dimensional (from
19 possible dimensions) view, and the information should be treated as such.
It is also important to bear in mind that only the optimisation of one
function, that is the yield value, is considered here. When another merit
Stellenbosch University  https://scholar.sun.ac.za

















B7 Area in µm
2











Figure 6.9: Sobol sequence yield map of B7 and L6 for the RSFQ OR gate.
function, for example latency, is also considered, a weighted sum is usually
employed to garner a ﬁnal merit value. This could easily result in a multi-
modal search space since it is highly likely that the latency of the circuit could
have reached an optimum at a position where the yield is not optimal.
6.5 Solution Representation
One of the ﬁrst steps a designer faces after selecting an optimisation method-
ology is how to represent the problem to the optimisation algorithm. For the
proposed optimisation methods, each component parameter was discretised
depending on the upper and lower bounds of the component and the number
of steps to be investigated. The bounds and steps are provided by the designer.
The step size can then easily be calculated using equation 6.5.1.
StepSize = ((UpperBound)− (LowerBound))/Steps (6.5.1)
The representation of the components are therefore reduced to a list of
integer values. Each integer value corresponds to a component with the range
of the integer value being the number of steps for that particular component.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 97
6.6 Testing Methodology
As explained earlier, it is very diﬃcult to compare meta-heuristics due to the
randomisation used in the algorithms. An inherently sub-standard algorithm
might provide good results simply because of a randomly good starting po-
sition. If meta-heuristics are to be compared, it would be crucial to execute
a very large number of runs in order to statistically compare the outcomes.
This tends to be quite diﬃcult for the optimisation of RSFQ circuits due to
the number of SPICE runs needed to sample a single yield point in the search
space.
In order to mitigate this situation, it was decided to use the yield space
models approximated by artiﬁcial neural networks as discussed in Chapter 4.
However, as previously illustrated, the approximation of the yield space is not
perfect, but the networks do provide a generally good approximation of the
overall shape of the search space. Since a comparison of the manner in which
the diﬀerent heuristics move through the search space is of more interest to
this study, it is believed that the artiﬁcial neural networks can still be a useful
tool. Yet, it is still very important to note that the global maximum of the
approximation might not be the same as that of the circuit when investigating
the search space using the Monte Carlo methods. The artiﬁcial neural networks
do provide the possibility to execute a very large number of optimisation runs
in a fraction of the time that it would have taken the Monte Carlo methods
to generate the same data. It is therefore, according to the best knowledge of
the author, the only tool available that can statistically compare the methods
in a meaningful way and within a practical time frame.
Regarding the networks, it was decided to use Network 5 Run 1 for the
AND gate and Network 6 Run 2 for the OR gate as presented in Table 6.4.
The reason for this decision was that these networks generally provided good
mean squared errors for both training data and test data as well as a good
approximation of the two-variable yield maps that were visually investigated.
Gate Structure Run Training MSE Test MSE
RSFQ-AND (20,20) 1 0.0020 0.0097
RSFQ-OR (40,40) 2 0.0072 0.0277
Table 6.4: ANN topologies chosen for optimisation investigation.
It was decided not to make use of networks with three hidden layers even
though the mean squared error of their training data tends to be better than
that of their two hidden layer counterparts. As pointed out in Chapter 4,
networks with more hidden layers tend to be more diﬃcult to optimise given
the larger number of free variables in the system. Hence it would be more
likely to ﬁnd a generally good approximation by using two hidden layers even
though some details would not be approximated correctly.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 98
Artiﬁcial neural networks are easily incorporated as yield approximators
for optimisation algorithms. A solution with a structure as depicted in Table
6.4 is normalised between 0 and 1 using the training data limits of the network
as normalisation factors. This normalised solution can then be passed on to
the network in a feed forward manner which returns a normalised yield value
between 0 and 1.
Each parameter step under investigation was run 1000 times in order to
identify any statistical trends.
The search space investigated for the optimisation was the same as that
used for the artiﬁcial neural network in order to ease the input normalisation.
Each component therefore had a deviation of plus 150% and minus 150% with
absolute bounds to ensure that the circuit was practically implementable. The
sample space bounds are listed in Table 6.5 below.
L B I
Abs Min value 0.5 pH 100µm2 100µA
Abs Max value 15 pH 400µm2 600µA
Deviation 150 % 150 % 150 %
Table 6.5: Yield optimisation search space bounds.
6.7 Results
6.7.1 Greedy Local Search
As stated earlier, the investigation regarding the greedy local search consists
of the neighbourhood size and the neighbourhood dimensions. The parameter
points investigated are shown below in Table 6.6.








Table 6.6: Greedy local search parameter samples.
Each of the optimisation set-ups were run 1000 times. A stability stopping
criteria was used. If no better solution was found for 100 samples, the algorithm
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 99
assumed that an optimum had been reached and the algorithm was ended.
Each component range deﬁned in Table 6.5 was sub-divided into 100 positions
for the generation of the solutions representation.
The results of the greedy local search on the AND gate are shown below.
The number of required samples were also recorded in order to ascertain how
long each of the 1000 runs took to reach its ﬁnal stable position. The number
of samples that reached a "global" optimum above 90% are also indicated.
This should oﬀer a rough indication of how adept an algorithm is at ﬁnding
the global optimum.
Number Yield Mean Yield Max Yield Std Mean Samples Above 90%
1 84.3% 90.8% 4.23% 6887 145
2 84.3% 90.8% 4.8% 4072 153
3 83.9% 90.7% 3.8% 2714 95
4 83.9% 90.7% 4.9% 2699 123
5 83.5% 90.7% 4.4% 1994 91
6 83.5% 90.7% 4.1% 1334 106
7 81.3% 90.5% 5.4% 463 12
Table 6.7: GLS statistical results for the RSFQ AND gate.
It is evident that an increase in neighbourhood size had a signiﬁcant eﬀect
on the number of samples required to obtain an optimum point. This is un-
derstandable given the larger "leaps" that solutions are able to take. A slight
increase in good solutions (above 90%) were visible using a neighbourhood size
of 5. This trend did not continue with a neighbourhood size of 10 though. It
could be surmised that the selection caused large leaps to be taken therefore
missing small localised areas where the yield is better. This is also revealed in
the lower standard deviation of the size 10 neighbourhood which could indicate
that localised areas were not suﬃciently investigated.
The increase in dimension size had an even larger eﬀect on the number of
samples that were needed before a value would settle. It would seem, though,
that a global optimum is less likely to be found using this methodology. This
makes sense considering that multiple changes to a solution might cause a
global point to be missed altogether.
By increasing both the neighbourhood size and the dimensions, the eﬀect
on the number of required samples is cumulative. It would seem, though, that
the algorithm then loses its ability to ﬁnd optimum solutions eﬃciently due to
the more random nature of the neighbourhood.
The results of the OR gate are listed in Table 6.8.
The ﬁrst observation of the results indicate a very large standard deviation
at the relatively low mean. This implies that the search space of the OR
gate could be substantially diﬀerent from that of the AND gate. The OR
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 100
Number Yield Mean Yield Max Yield Std Mean Samples Above 90%
1 56.9% 95.2% 40% 3891 263
2 60.2% 95.6% 40% 2700 376
3 56.3% 95.7% 40.8% 1735 261
4 56.5% 95.5% 40.9% 1699 263
5 57.5% 95.5% 40.1% 1264 231
6 59.8% 95.5% 39.8% 818 251
7 64.3% 95.3% 36.8% 330 186
Table 6.8: GLS statistical results for the RSFQ OR gate.
gate search space possibly has quite a few local optima which tend to lead to
sub-optimal solutions being produced.
The results of the OR gate exhibit the same trends as those of the AND
gate. Once again, an increase in neighbourhood step size has the eﬀect of
decreasing the required samples without a reduction in performance. It would
therefore appear that a neighbourhood step size of 5 is a good selection for
both the AND gate and the OR gate.
The change in dimensions had the same eﬀect on the samples (reduction)
with very little eﬀect on the other metrics. This is slightly in contrast to what
was found for the AND gate, which again highlights the diﬀerence in the search
space of the two gates.
It is interesting to note that to increase both the neighbourhood step and
the neighbourhood dimensions (Samples 6 and 7) an overall reduction in the
standard deviation of the yield is visible. This could indicate that the local
optima in the OR search space are relativy small (geometrically) which allows
a more random set-up to jump over these local optima and search further on
in the search space.
Generally it would seem that a good option is to use a single dimensional
neighbourhood with a step size of 5.
6.7.2 Simulated Annealing
In order to minimise the number of free variables under investigation, the re-
sults of the greedy local search were used. The neighbourhood of the simulated
annealing algorithm was therefore selected as a single dimensional size with a
neighbourhood step of 5.
The free variables of the algorithm comprise of the starting temperature
and the temperature steps to zero. As previously explained, the temperature
steps to zero multiplied by the iterations per temperature step were selected
as a constant. The value of 2000 was selected for this purpose. For example, if
a set-up is selected that requests 10 temperature steps before the temperature
reaches zero, this would result in 2000/10 = 200 iterations per temperature
step.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 101
Once again 1000 optimisation runs were completed for each sample point,
using the same stopping criteria as for the greedy local search tests.
The sample points that were investigated are presented in Table 6.9.








Table 6.9: Simulated annealing parameter samples.
The results for the AND gate are shown below in Table 6.10.
Number Yield Mean Yield Max Yield Std Mean Samples Above 90%
1 83.8% 90.7% 4.1% 4689 114
2 83.7% 90.8% 4.8% 4743 119
3 83.4% 90.8% 4.2% 4637 88
4 83.9% 90.8% 4.2% 4727 123
5 84.1% 90.8% 4.4% 4761 136
6 83.5% 90.7% 4.4% 4677 99
7 83.9% 90.8% 4.4% 4689 124
Table 6.10: SA statistical results for the RSFQ AND gate.
These results are very similar to that of the greedy local search test. How-
ever, there tends to be a slight increase in the mean yield value when more
temperature steps are taken, as was the case for samples 5 and 6. One pos-
sibility for these similar results is that the search space of the AND gate is
very uni-modal which means that it is unlikely for a random greedy search
not to ﬁnd the optimum regularly. Hence, the ability of simulated annealing
algorithms to ﬁnd a global optimum in a multi-modal search space would not
be necessary.
The results of the OR gate are shown in Table 6.11.
The results of the OR gate using simulated annealing exhibit substantially
better solutions than the greedy local search, in contrast to the AND gate. The
mean in general tends to be higher than that of the greedy local search, show-
ing the possibility that the simulated annealing algorithm tends to ﬁnd optima
more regularly in this search space. The most signiﬁcant improvement, how-
ever, was demonstrated when the starting temperature was decreased. This
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 102
Number Yield Mean Yield Max Yield Std Mean Samples Above 90%
1 63.5% 95.5% 38.7% 3677 312
2 66.2% 95.4% 37.4% 3630 329
3 70.5% 95.4% 34.9% 3559 355
4 63.6% 95.6% 38.4% 3668 298
5 64.6% 95.4% 37.7% 3606 295
6 66.1% 95.6% 37.5% 3596 327
7 69.7% 95.6% 35.4% 3562 349
Table 6.11: SA statistical results for the RSFQ OR gate.
had the eﬀect of reducing the chance that a non-improving solution was ac-
cepted prematurely in the optimisation run.
The increase in temperature steps also seemed to add a slight improvement
to the algorithm as again demonstrated in samples 5 and 6, but the eﬀect is
not as signiﬁcant as the reduction in the starting temperature.
It was suggested that the search space of the OR gate might be more multi-
modal than that of the AND gate. This could very well be why the simulated
annealing algorithm performs substantially better on the OR gate given the
ability of this algorithm to step out of the local optima.
6.7.3 Genetic Algorithm
The variables under test for the genetic algorithm is the generation size, the
number of crossover points and the mutation ratio. The mutation operation
was chosen to mimic the same neighbourhood search as the previous two al-
gorithms for the sake of consistency. Therefore when a mutation operation
occurs, one dimension is varied for the ﬁrst ﬁve steps. The same stopping
criteria were applied for the algorithm. If no better solution was found after
100 steps (generations in this case), the algorithm was declared to be com-
plete. The truncation ratio was ﬁxed at 0.5 for all the sample set-ups and the
crossover ratio was selected as 0.7.
The samples set-ups of the algorithm are listed in Table 6.12.
Number Generation Size Crossover Points Mutation Ratio
1 40 4 0.25
2 40 4 0.5
3 40 4 0.75
4 40 2 0.5
5 40 8 0.5
6 20 4 0.5
7 80 4 0.5
Table 6.12: Genetic algorithm parameter samples.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 103
As is evident from Table 6.12, a nominal sample set-up with a generation
size of 40, 4 crossover points and a mutation ratio of 0.5 was selected. Each
variable was then varied one step up and down from this nominal.
The results for the AND gate are shown in Table 6.14.
Number Yield Mean Yield Max Yield Std Generations Above 90%
1 86.1% 90.8% 3.3% 867 191
2 86.2% 90.8% 3.3% 718 175
3 86.6% 90.8% 3.3% 1082 266
4 86% 90.8% 3.4% 767 188
5 86.3% 90.8% 3.3% 710 203
6 84.9% 90.8% 3.9% 1008 129
7 87.6% 90.8% 2.6% 573 330
Table 6.13: GA statistical results for the RSFQ AND gate.
In general, the genetic algorithm was able to reach better results than
both the greedy local search and simulate annealing algorithms for the AND
gate. This is evident in the higher mean and lower standard deviation values.
Increasing the mutation ratio above the 0.5 tended to produce slightly better
results as can be seen from the number of runs above 90% in sample 3. This
could be due to the ability of the local searches implemented by means of
mutation to ﬁnd local optimum points. Reducing the crossover points to 2
from the nominal 4 had the eﬀect of a slight decrease in performance as is
evident from the lower mean and higher standard deviation values. It is also
interesting to note that the same maximum value was found in each sample,
indicating that the algorithm is very adept at ﬁnding the global optimum.
The most signiﬁcant improvement, though, is seen with the increase in the
genomes per generation, which also exhibited an increase in the yield mean
and a decrease in deviation. It would seem that the extra diversity supplied
by more genomes does tend to produce better results.
The number of generations undertaken before a stable value was reached
tended to be very high in general. In order to roughly compare the genera-
tion count to the sample counts of the greedy local search and the simulated
annealing algorithms one has to investigate the number of yield points sam-
pled. For each generation this would be in the order of Genomes_Per_Gen×
Crossover_Ratio since the genomes that are not part of the crossover opera-
tion are taken from the current generation and therefore need not have their
ﬁtness calculated. As an example, equivalent yield samples for sample 1 are
in the range 867× 40× 0.7 = 24276 which is substantially more than that of
the other two algorithms. In order to mitigate this large range, it is suggested
that a yield threshold should rather be used as a stopping criteria. After the
yield threshold is reached a pure local search can be implemented to ﬁnd the
local optima of each genome.
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 104
Number Yield Mean Yield Max Yield Std Generations Above 90%
1 91% 95.6% 2.3% 673 700
2 91.1% 95.6% 2.3% 578 720
3 91.6% 95.6% 2% 866 800
4 90% 95.6% 2.2% 605 709
5 91.2% 95.6% 2.2% 568 741
6 90.4% 95.6% 2.6% 780 623
7 91.8% 95.6% 1.8% 446 852
Table 6.14: GA statistical results for the RSFQ OR gate.
The values obtained for the OR gate are also substantially better than
those of both of the other algorithms. In is evident that the genetic algorithm
is capable of successfully navigating a multi-modal search space (which is sus-
pected to be the case for the OR gate). The change in mutation ratio showed a
small improvement by lowering the standard deviation when increased to 0.75.
This is in correlation with what was seen for the AND gate. When the number
of crossover points were decreased to 2 a slight drop in the mean was visible,
providing slightly poorer results. Similar results are evident in the AND gate
investigation.
The most signiﬁcant inﬂuence is once again the genomes per generation
which caused a substantial decline when reduced to 20 and supplied a better
result when increased to 80. The number of generations needed for the algo-
rithm to run were still high but the large increase in the mean implies that it is
likely that the genetic algorithm only has to be run once to obtain favourable
results while for the other two algorithms more runs may be necessary.
6.8 Comparison
What is evident from the tests is that the algorithms can have diﬀerent eﬀects
on diﬀerent search spaces. For the AND gate a normal greedy local search
seems to suﬃce while for the OR gate both the simulated annealing algorithm
and the genetic algorithm tend to perform substantially better. As was sug-
gested, this might indicate that the search space of the OR gate is multi-modal
given the large standard deviation in its the local search techniques.
It is an open question whether the search space of the OR gate truly is
multi-modal or whether the artiﬁcial neural network implementation caused
the multi-modality. The answer, though, is not crucial to the relevance of these
results. What is demonstrated here is that it is highly likely that simulated
annealing and genetic algorithms will perform better for search spaces in which
normal local searches exhibit a large standard deviation (therefore possibly
being multi-modal). Given the complexity of RSFQ circuits, the likelihood of
multi-modal search spaces is high. If multi-objective metric values (such as
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 6. YIELD OPTIMISATION OF RSFQ CIRCUITS 105
yield and latency) are added to the algorithms, multi-modality becomes even
more likely.
In general, when optimising an RSFQ circuit for yield, it is suggested to
commence with a greedy local search or simulated annealing algorithm if avail-
able. If after a few runs the standard deviation seems to be very high, a strong
indication of a multi-modal search space have been found. The usage of op-
timisation algorithms designed to circumvent local-optima, such as simulated
annealing or genetic algorithms, are then suggested.
The possibility of using a greedy local search to identify a possible multi-
modal search space for RSFQ optimisation should be further investigated in
future work. The usage of meta-heuristics such as genetic algorithms generally
add a substantial time to the computation of an optimal solution and should
therefore be used only when necessary. It is suggested that more complex
circuitry are also optimised using the methods introduced in this chapter to
further ascertain the usability of these algorithms.
Stellenbosch University  https://scholar.sun.ac.za
Chapter 7
Conclusion and Future Work
Various design automation and optimisation techniques were introduced in this
work which have been shown to be beneﬁcial for the development of RSFQ
circuits.
In Chapter 3 it was demonstrated that Latin Hypercube Sampling and
Sobol sequences can successfully be utilised as variance reduction techniques
for yield estimation. A substantial amount of research can still to be under-
taken to gain the full beneﬁt of statistical analysis of the circuit parameters on
yield. A next step would be to include a principal component analysis (PCA)
of the circuit parameters in terms of yield. This analysis technique investigates
the correlation between components while also identifying which components
are responsible for the most variance. These principal components can then be
used for yield approximation, thereby reducing the dimensionality of the sam-
ple space. The semiconductor and ﬁnancial industries also provide many other
examples of variance reduction techniques which could be beneﬁcial when used
with RSFQ circuits.
In Chapter 4 the possibility of yield estimation by means of artiﬁcial neu-
ral networks was investigated. It was found that these networks can be used
to model the shape of the yield sample space. There is, however, substantial
room for improvement since it was determined that the artiﬁcial neural net-
work approximation does not perfectly ﬁt the Sobol based sample space used
for the comparison. Various choices in network structure, activation function,
parameter settings and training algorithms should be further investigated to
ascertain the optimum topology for RSFQ yield approximations. Artiﬁcial
neural networks constitute an extensive ﬁeld of research and provide many
other avenues of exploration for yield estimation. Artiﬁcial neural networks
can also be directly used for optimisation, as can be seen with the Hopﬁeld
networks investigated in [58]. Currently though, the amount of training data
needed might dissuade users from using normal multi-level Perceptron net-
works for direct yield approximation. The usage of such a network as a tool
for comparing the eﬃcacy of optimisation algorithms was, however, demon-
strated to be of great use as was described in Chapter 6. Stochastic optimi-
106
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 7. CONCLUSION AND FUTURE WORK 107
sation algorithms generally have many parameters that aﬀect the outcome of
the optimisation run. Using an artiﬁcial neural network approximation allows
optimisation runs to complete in a fraction of the time needed by Monte Carlo
approximations. This provides a statistical method to compare algorithms and
algorithm variants.
In Chapter 5 an automated state machine extraction algorithm was intro-
duced that can be used for functional circuit testing as well as for the creation
of HDL models. In terms of functional testing, the algorithm was shown to be
successful and was the primary method used to prove state machine equiva-
lence to nominal circuits for all optimisation and Monte Carlo methods used
in this work. Various improvements can still be done to further reﬁne the
method. In terms of automation, improvements could include the automatic
identiﬁcation of ﬂuxon storage loops (without an exhaustive search). Theoret-
ical justiﬁcation can also be investigated for the constants used in the method
that currently only have an empirical foundation.
Two approaches to critical timing extraction was also presented in Chap-
ter 5, one which makes use of empirically identifying the point at which the
circuit fails and another based on current stability. Further work can be used
to validate which critical timing model match practical circuit failures more
closely. A further study should also include a Monte Carlo of the critical tim-
ing parameters to ascertain how these values change under process variations.
Statistical timing information could also be incorporated in future implemen-
tations as was proposed in [59].
The HDL models created should also be practically tested when combined
to form larger circuits. This could identify further work in leakage current
modeling which could also be included in the HDL models.
In Chapter 6 various optimisation algorithms were compared for their ap-
plicability for RFSQ circuits. A novel technique was used to compare the
algorithms using an artiﬁcial neural network approximation of the yield sam-
ple space as elaborated upon in Chapter 4. Given the considerable diﬀerence
in optimisation results one can surmise that the yield sample spaces of the
two gates investigated are substantially diﬀerent. Both genetic algorithms and
simulated annealing are known to perform better when dealing with multi-
modal problems, therefore one can surmise that the OR gate sample space
contains many local optima. The possibility was pointed out that the variance
of results obtained using a greedy algorithm can be used as an indication of a
multi-modal search space. This could be due to the search algorithm failing to
leave a local optimum and thus ﬁnding substantially diﬀerent results depend-
ing on the algorithm's starting position. When a large variance in optimisation
results is obtained it would be beneﬁcial to make use of algorithms which are
more apt at traversing a multi-modal search space. By using the artiﬁcial neu-
ral network approximation the eﬀects of algorithm parameter variation was
also successfully compared. It is clear that certain parameters, such as genera-
tion size for genetic algorithms, have the largest inﬂuence on the optimisation
Stellenbosch University  https://scholar.sun.ac.za
CHAPTER 7. CONCLUSION AND FUTURE WORK 108
results. This method can therefore be used in future to ascertain the inﬂuence
of algorithmic parameters.
Future work in optimisation can also include further reﬁnements to artiﬁcial
neural network models for algorithm comparison and also the investigation
of multi-objective optimisation (latency, leakage current), which have a high
possibility of creating multi-modal search spaces.
This work introduced various algorithms and methodologies that further
the state of electronic design automation in RSFQ circuits. In terms of yield
approximations, automated techniques were proposed to simplify the creation
of exhaustive test benches for the use in Monte Carlo runs. Various vari-
ance reduction techniques were shown to also reduce the necessary amount
of samples needed for a particular conﬁdence interval and in doing so, reduce
the time of yield calculations. Artiﬁcial neural networks were shown to be
valid approximators of the yield sample space which is of signiﬁcant use in
statistically quantifying the performance of diﬀerent optimisation algorithms.
Further work is, however, necessary before the Monte Carlo methods can be
completely replaced as the yield approximators used in everyday RSFQ circuit
design.
There is a high possibility that HDL models will play a crucial role in
the creation of complex RSFQ circuits in the same manner that these models
form a critical part in the design cycle of semiconductor circuits. This work
proposed an automated method of extracting HDL models in an exhaustive
and repeatable manner for the creation of high-level cell libraries. With further
reﬁnements in leakage current, these models can form the core of complex
RSFQ circuitry.
All the methods presented in this work forms part of a software library that
can be integrated with existing development environments or form a basis for
future work.
Stellenbosch University  https://scholar.sun.ac.za
Appendices
109





The functionality described in this work forms part of a extensible software
library ,written in C++, which can be used for general RSFQ circuit inves-
tigation as well as optimisation (mulit-objective or otherwise). Currently the
functionality provided is:
• Automated extraction of the state machine representation of an RSFQ
circuit provided in SPICE format.
• Automated generation of a VHDL model by using the extracted state
machine representation and a timing extracted technique (either empiric
or stability).
• Yield estimation of circuit for which a state machine has been extracted.
The estimation can be using crude Monte Carlo, Latin Hypercube Sam-
pling or a Sobol sequence.
• The creation of an artiﬁcial neural network to approximate single or
multi-modal objective functions.
• The optimisation (in terms of yield, latency or both) of an RSFQ circuit
provided in SPICE format.
• Various miscellaneous abilities useful for investigating a circuit's search
space.
This chapter will explain classes, their interconnections and their methods
in order to allow future work to be done on the software library. Note that the
library has been designed to function with a Linux based operating system.
110
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 111
There are however only a few operating speciﬁc sections (such as the multi-
core processing etc.) which should portable to other operating systems. Note
also that their does not currently exist a graphical user interface to facilitate
easy use of the library. Currently all commands are passed via text ﬁles. It
should be relatively easy to create a graphical front-end to improve usability.
The dependencies are as follow:
• Boost C++ library 1.56 or higher. http://www.boost.org
• JSIM or JSPICE available from various resources
• Sobol initial values for yield calculation. http://web.maths.unsw.edu.au/ fkuo/-
sobol/
The project was created using the free CodeLite IDE under Ubuntu 14.04.
In the next section, an overview of the classes and their relationships will
be provided. A brief description of each method will also be provided. After
the descriptions a detailed discussion will explain the usage of the program via
the text ﬁles and the options available.
Keep in mind that this library is under active development, and will more
than likely be restructured in a more organised way in the near future.
A.2 Library Overview
In this section each class will be investigated with it's interfaces to other classes.
Note that a solid line in the ﬁgures indicate ownership. The class object being
pointed to was therefore created by the base class and will be responsible for
its cleanup. A dotted line means that a class makes use of a certain object
but does not own it. Please note that these are not UML diagrams. Since
the library is under active development, only the classes as a whole will be
discussed.
A.2.1 App
The App class is the main entry point for the library. It contains the methods
for invoking all the functionality available to the library. The App class is con-
trolled via a conﬁguration text ﬁle created by the user. This will be explained
later.
The App will always create a SPICE_File objects from the supplied SPICE
ﬁle and will also automatically try to extract the state machine representation.
The state machine representation is necessary for all the other classes to func-
tion correctly. Once the SPICE_File object is created the user's required
operation is executed.
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 112
Figure A.1: App Class
Figure A.2: SPICE File Class
A.2.2 SPICE File
The SPICE_File class is used to parse the spice ﬁle and create a ﬂat hierarchy
for the extraction of possible inductive loops.
The parsing process creates sub-circuit objects containing the component
deﬁnitions of each component found. For each Josephson junction found an
additional junction model is also created and associated with the relevant
component deﬁnition In order to extract the state machine representation the
SPICE hierarchy needs to be ﬂat. For each instance of the sub-circuits under
test a unique ﬂat component if create that uses unique node numbers. For
each junction ﬂat component a unique junction model is also created. With
the ﬂat hierarchy created, a graph is created using the ﬂat nodes as nodes and
the ﬂat components as vertexes. The graph is created using the Boost C++
Graph library. After the graph's creation, all the possible loops in the circuit
is identiﬁed and stored using the algorithm described in Chapter 5.
A.2.3 Functional Test
The functional test class is used to extract the state machine representation
of the parsed SPICE ﬁle. This is done using the methodology explained in
Chapter 5. The state transitions, outputs generated and state identiﬁers are
all stored in the State objects. After the initial state machine extraction, it
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 113
Figure A.3: Functional Test Class
Figure A.4: Merit Test Class
is known which of the loops identiﬁed by the SPICE File opbject are actually
used for ﬂuxon storage. These values are then saved in the Functional Test
object for use with merit testing. After the State Machine extraction process
a critical timing analysis can be done on the circuit as described in Chapter 5
after which an HDL model can be generated.
A.2.4 Merit Test
The merit test class is used to calculate merit values for yield analysis and
also optimisation purposes. The circuit yield is calculated via either crude
Monte Carlo, Latin Hypercube sampling or Sobol sequences. The supplied
SPICE ﬁle is used as the source for the yield estimation and might contain
component values diﬀerent to the nominal SPICE ﬁle when calculating the
yield for optimisation algorithms. The option to return all the latencies of the
circuit also exists. The merit function itself does not assign a weighting value
to the multiple criteria if used.
A.2.5 ANN
The ANN class governs the creation of an artiﬁcial neural network. This
normally necessitates the generation of training data. Using the SPICE File
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 114
Figure A.5: Merit Test Class
Figure A.6: Greedy Local Search Class
various sampling techniques can be used to test the sample space of the cir-
cuit. The merit value of the sample point can then be found using the merit
function. After training data has been created, the artiﬁcial neural network
can be trained via user deﬁned parameters to be described below.
The trained ANN can then be used by optimisation algorithms to approx-
imate the yield value without calling the merit function.
A.2.6 Greedy Local Search
The greedy local search class implements a local search algorithm as explained
in Chapter 6. The SPICE File to optimise is supplied to the algorithm which
makes a copy of the spice object. The merit values can be calculated using
Monte Carlo Methods or a artiﬁcial neural network.
A.2.7 Simulated Annealing Algorithm
The Simulated annealing class implements the simulated annealing optimisa-
tion algorithm. As was done for the LGS algorithm, a SPICE File is supplied
which is used as source for the optimisation. Once again merit values can be
obtained via Monte Carlo methods or from a trained ANN.
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 115
Figure A.7: Simulated Annealing Class
Figure A.8: Merit Test Class
A.2.8 Genetic Algorithm
The GA class implements the Genetic Algorithm optimisation algorithm as
explained in Chapter 6. The SPICE File which needs to be optimised is pro-
vided to the GA of which a copy is made in order to vary values at will. The
merit value of the SPICE File under test can be gathered using the Monte
Carlo based merit function or the trained artiﬁcial neural network.
A.3 Basic User Guide
The program functions by passing a conﬁguration ﬁle containing instructions,
other conﬁguration ﬁles and parameters as a command line argument. The
various conﬁguration ﬁles will be discussed in this section. All the conﬁgura-
tion ﬁles makes use of a parameter keyword on a new line followed by space
separated values. The full user guide will be available on [60] as the library
reaches as stable release.
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 116
A.3.1 Main Conﬁguration File
Tables A.1 and A.2 contains the commands and parameters that can be set
using the main conﬁguration ﬁle.
Parameter Options Comment
COMMAND
EXTRACT_STATE_MACHINE Extract the state ma-
chine of the supplied
SPICE ﬁle
GENERATE_HDL Generate a VHDL
representation of the
supplied SPICE ﬁle
YIELD_ANALYSIS Do a yield analysis on
the supplied SPICE
ﬁle
RUN_ANN Create/Train a Artiﬁ-
cial Neural Network
RUN_GLS Run Greedy Local
Search on supplied
SPICE ﬁle
RUN_SA Run the Simulated
Annealing Algorithm
on supplied SPICE ﬁle




JSIM Use JSIM as simulator
JSPICE Use JSPICE as simu-
lator












ANN_CONFIG_FILE (path to ﬁle) Path to ﬁle contain-
ing information used
by the ANN function-
ality
Table A.1: Main Conﬁguration File Parameters
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 117
OPT_CONFIG_FILE (path to ﬁle) Path to ﬁle containing optimisa-
tion information
OUTPUT_FILE (path to ﬁle) Path to the output ﬁle that
should contain the relevant re-
sults
MONTE_TYPE
CMC Select crude Monte Carlo as yield
estimator
LHS Select Latin Hypercube Sam-
pling as yield estimator
SOBOL Select Sobol sequence as yield es-
timator
MONTE_RUNS (unsigned integer number) Number of Monte Carlo samples
for yield estimation
LHS_SAMPLES (unsigned integer number) Number of sample points for LHS
yield estimation
SOBOL_POINTS (unsigned integer number) Number of points used by Sobol
sequence for yield estimation
SOBOL_RANDOM (1 or 0) If one randomise the Sobol se-
quence by shuing the entries
SOBOL_INIT_FILE (path to ﬁle) Path to ﬁle containing Sobol ini-
tialisation values
SPICE_FILE (path to ﬁle) Path to the SPICE ﬁle under in-
vestigation
COMP_LINK_FILE (path to ﬁle) Path to ﬁle indicating component
links
AUTO_ADD_RS (1 or 0) If 1 automatically calculate junc-
tion shunt resistances for a Bc of
1
IGNORE_SUB_CIRS (list of sub-circuit names) List of sub-circuits in the SPICE
ﬁle that should be ignored
Table A.2: Main Conﬁguration File Parameter Cont.
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 118
A.3.2 Functional Conﬁguration File
Table A.3 contains the available set-up parameters used by the state machine
extraction algorithm.
INPUTS (integer list) Space separated list of nodes that serve as
inputs to the circuit
INPUTS_JJ (string list) Space separated list of full junction names
that is associated with the input above in
INPUTS
OUTPUTS (string list) Space separated list of full junction names
that serve as outputs to the circuit
TIME_OFFSET (intege value) The time between pulses for the extraction
of the state machine representation
RULE (list of rule string) Applies a rule for the extraction process. See
Chapter 5 for details
Table A.3: Functional Conﬁguration File Parameter Cont.
A.3.3 Greedy Local Search Conﬁguration File
Table A.4 and A.5 contains the optimisation parameters available when run-
ning the greedy local search algorithm.
OUTPUTS (string list) Space separated list of full junc-
tion names that are the outputs
of the circuit
TIME_OFFSET (integer) An integer value representing the
number of pico seconds before
new stimulus can be applied to
the circuit
COMPONENTS
(list of component names) SPICE components to be opti-
mised
ALL_B | ALL_L | ALL_I Used to indicate that all of a
type of component should be op-
timised
SAMPLE_TYPE RELATIVE | ABSOLUTE Create the search space relative
to the current component values
or use the absolute values sup-
plied
MIN_VALUES (list of component values) The absolute minimum value of
the search space associated with
the COMPONENTS listed
Table A.4: Greedy Local Search Parameters
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 119
MAX_VALUES (list of component values) The absolute maxi-
mum value of the
search space associ-
ated with the COM-
PONENTS listed
COMP_DEV (list of deviation values) The deviations (+ and
-) that should be ap-
plied to create the
search space if RELA-
TIVE is chosen
B_STEPS (unsigned integer) The number of steps
in the search space for
the junction range
L_STEPS (unsigned integer) The number of steps
in the search space for
the inductor range
I_STEPS (unsigned integer) The number of steps
in the search space for
the bias current range
NEIGHBORHOOD_DIMS (unsigned integer) The number of dimen-
sions used to create a
neighbor
NEIGHBORHOOD_SIZE (unsigned integer) The maximum num-
ber of steps used to
create a neighbor
(randomised)
MONTE_STOP_VALUE (value between 0 and 1) Stop the algorithm
when this normalised
yield value is reached
SAMPLES_STOP (unsigned integer) Stop the algorithm
when this number of
steps were taken
MERIT_METHOD ANN | MONTE Use the current ANN
for merit values or
use the set-up Monte
Carlo method
Table A.5: Greedy Local Search Paramaters Cont.
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 120
A.3.4 Simulated Annealing Conﬁguration File
Table A.6 and A.7 contains the optimisation parameters available when run-
ning the simulated annealing algorithm.
COMPONENTS
(list of component names) SPICE components to be
optimised
ALL_B | ALL_L | ALL_I Used to indicate that all of
a type of component should
be optimised
SAMPLE_TYPE RELATIVE | ABSOLUTE Create the search space rel-
ative to the current compo-
nent values or use the abso-
lute values supplied
MIN_VALUES (list of component values) The absolute minimum
value of the search space
associated with the COM-
PONENTS listed
MAX_VALUES (list of component values) The absolute maximum
value of the search space
associated with the COM-
PONENTS listed
COMP_DEV (list of deviation values) The deviations (+ and -)
that should be applied to
create the search space if
RELATIVE is chosen
B_STEPS (integer value) The number of steps in the
search space for the junction
range
L_STEPS (integer value) The number of steps in the
search space for the induc-
tor range
I_STEPS (integer value) The number of steps in the
search space for the bias
current range
Table A.6: Simulated Annealing Parameters
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 121
NEIGHBORHOOD_DIMS (integer value) The number of dimen-
sions used to create a
neighbor
NEIGHBORHOOD_SIZE (integer value) The maximum num-
ber of steps used to
create a neighbor
(randomised)
MONTE_STOP_VALUE (value between 0 and 1) Stop the algorithm
when this normalised
yield value is reached
SAMPLES_STOP (integer value) Stop the algorithm
when this number of
steps were taken
MERIT_METHOD ANN | MONTE Use the current ANN
for merit values or
use the setup for the
Monte Carlo method
START_TEMP (ﬂoat value) Speciﬁes the starting
temperature for the
algorithm
ITER_PER_TEMP (integer value) Speciﬁes to the num-
ber of iterations per
temperature step
TEMP_STEP (ﬂoat value) Speciﬁes the tempera-
ture step for the algo-
rithm
Table A.7: Simulate Annealing Parameters Cont.
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 122
A.3.5 Genetic Algorithm Conﬁguration File
Table A.8 and A.9 contains the optimisation parameters available when run-
ning the genetic algorithm.
COMPONENTS
(list of component names) SPICE components to
be optimised
ALL_B | ALL_L | ALL_I Used to indicate that
all of a type of com-
ponent should be op-
timised
SAMPLE_TYPE RELATIVE | ABSOLUTE Create the search
space relative to
the current compo-
nent values or use
the absolute values
supplied
MIN_VALUES (list of component values) The absolute mini-
mum value of the
search space associ-
ated with the COM-
PONENTS listed
MAX_VALUES (list of component values) The absolute maxi-
mum value of the
search space associ-
ated with the COM-
PONENTS listed
COMP_DEV (list of deviation values) The deviations (+ and
-) that should be ap-
plied to create the
search space if RELA-
TIVE is chosen
B_STEPS (integer value) The number of steps
in the search space for
the junction range
L_STEPS (integer value) The number of steps
in the search space for
the inductor range
I_STEPS (integer value) The number of steps
in the search space for
the bias current range
Table A.8: Genetic Algorithm Parameters
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 123
NEIGHBORHOOD_DIMS (integer value) The number of dimen-
sions used to mutate a
genome
NEIGHBORHOOD_SIZE (integer value) The maximum num-
ber of steps used to
mutate a genome
(randomised)
MONTE_STOP_VALUE (value between 0 and 1) Stop the algorithm
when this normalised
yield value is reached
GENERATIONS_STOP (integer value) Stop the algorithm
when this number of
generations have been
created
MERIT_METHOD ANN | MONTE Use the current ANN
for merit values or
use the setup for the
Monte Carlo method
GENOMES_PER_GEN (integer value) Number of genomes
that should be created
per generation
TRUNC_RATIO (ﬂoat value between 0 and 1) The ratio of the cur-
rent generation that
forms part of the gene
pool
MUTATE_RATIO (ﬂoat value between 0 and 1) The probability of
mutating a speciﬁc
genome
CROSSOVER_POINTS (integer value) The number of points
used in the crossover
operation




Table A.9: Genetic Algorithm Parameters Cont.
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 124
A.3.6 Artiﬁcial Neural Network Conﬁguration File
Table A.10 and A.11 contains the parameters available when creating a artiﬁ-
cial neural network.
COMPONENTS
(list of component names) SPICE components to be
optimised
ALL_B | ALL_L | ALL_I Used to indicate that all of
a type of component should
be optimised
SAMPLE_TYPE RELATIVE | ABSOLUTE Create the search space rel-
ative to the current compo-
nent values or use the abso-
lute values supplied
MIN_VALUES (list of component values) The absolute minimum
value of the search space
associated with the COM-
PONENTS listed
MAX_VALUES (list of component values) The absolute maximum
value of the search space
associated with the COM-
PONENTS listed
COMP_DEV (list of deviation values) The deviations (+ and -)
that should be applied to
create the search space if
RELATIVE is chosen
NETWORK_STRUCTURE (list of integer) Indicate the hidden layers of
the ANN
TRAIN_ANN (0 or 1) 1 indicates the created,
or loaded ANN should be
trained given training data
TRAIN_DATA_METHOD CMS | LHS | LOCAL_LHS Choose the method that
will be used to generate
training data
TRAIN_DATA_STEPS (integer value) The number of training
data samples to generate
SAVE_TRAIN_DATA (0 or 1) 1 indicates that the train-
ing data should be saved for
later use
Table A.10: Artiﬁcial Neural Network Parameters
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 125
READ_TRAIN_DATA (0 or 1) 1 indicates that
the training data
should be read from
the ﬁle speciﬁed
TRAIN_DATA_FILENAME (path to ﬁle) The ﬁle where
the training data
should be saved or
restored from
TRAIN_TEST_DATA_FILENAME (0 or 1) The ﬁle where the
test data is located
for training pur-
poses
READ_ANN (0 or 1) 1 indicates that
the ANN should be
read from the ﬁle
speciﬁed
SAVE_ANN (0 or 1) 1 indicates that
the created ANN
should be saved to
the ﬁle speciﬁed
ANN_SAVE_FILENAME (path to ﬁle) File where ANN
should be saved or
read from
Table A.11: Artiﬁcial Neural Network Parameters Cont.
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX A. RSFQ AUTOMATION AND OPTIMISATION LIBRARY 126
A.3.7 Conﬁguration File Examples

























Figure A.10: Linked Components Example
INPUTS 1 6 10
INPUTS_JJ X3_B2 X7_B2 X1_B2
OUTPUTS X6_B1
TIME_OFFSET 100
Figure A.11: Functional Conﬁguration Example
Stellenbosch University  https://scholar.sun.ac.za























Figure A.13: Junction Conﬁguration Example
Stellenbosch University  https://scholar.sun.ac.za







COMPONENTS ALL_L ALL_B ALL_I
MIN_VALUES 0.5p 100u 100u
MAX_VALUES 15p 400u 600u










Figure A.14: Genetic Algorithm conﬁguration Example
COMPONENTS ALL_L ALL_B ALL_I
MIN_VALUES 0.5p 100u 100u
MAX_VALUES 15p 400u 600u
SAMPLE_TYPE relative













Figure A.15: Artiﬁcial Neural Network conﬁguration Example
Stellenbosch University  https://scholar.sun.ac.za
Appendix B
Results of ANN Training Runs
This chapter contains the results of the ANN training procedure for the RSFQ
AND and OR gates.
Structure Run Lowest Train MSE Lowest Test MSE
(10) 1 0.0039 0.0125
(10) 2 0.0040 0.0131
(10) 3 0.0044 0.0113
(10) 4 0.0041 0.0024
(10) 5 0.0040 0.0117
Table B.1: ANN Results for AND gate Structure 1
Structure Run Lowest Train MSE Lowest Test MSE
(20) 1 0.0023 0.0090
(20) 2 0.0027 0.0099
(20) 3 0.0026 0.0129
(20) 4 0.0027 0.0111
(20) 5 0.0027 0.0117
Table B.2: ANN Results for AND gate Structure 2
Structure Run Lowest Train MSE Lowest Test MSE
(40) 1 0.0020 0.0120
(40) 2 0.0020 0.0091
(40) 3 0.0024 0.0113
(40) 4 0.0021 0.0115
(40) 5 0.0020 0.0087
Table B.3: ANN Results for AND gate Structure 3
129
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX B. RESULTS OF ANN TRAINING RUNS 130
Structure Run Lowest Train MSE Lowest Test MSE
(10,10) 1 0.0034 0.0106
(10,10) 2 0.0034 0.0112
(10,10) 3 0.0033 0.0085
(10,10) 4 0.0037 0.0108
(10,10) 5 0.0034 0.0096
Table B.4: ANN Results for AND gate Structure 4
Structure Run Lowest Train MSE Lowest Test MSE
(20,20) 1 0.0020 0.0097
(20,20) 2 0.0017 0.0097
(20,20) 3 0.0019 0.0121
(20,20) 4 0.0020 0.0089
(20,20) 5 0.0017 0.0197
Table B.5: ANN Results for AND gate Structure 5
Structure Run Lowest Train MSE Lowest Test MSE
(40,40) 1 0.0011 0.0119
(40,40) 2 0.0083 0.0110
(40,40) 3 0.0011 0.0097
(40,40) 4 0.0099 0.0106
(40,40) 5 0.0012 0.0113
Table B.6: ANN Results for AND gate Structure 6
Structure Run Lowest Train MSE Lowest Test MSE
(10,10,10) 1 0.0028 0.0133
(10,10,10) 2 0.0035 0.0124
(10,10,10) 3 0.0029 0.0136
(10,10,10) 4 0.0030 0.0121
(10,10,10) 5 0.0032 0.0098
Table B.7: ANN Results for AND gate Structure 7
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX B. RESULTS OF ANN TRAINING RUNS 131
Structure Run Lowest Train MSE Lowest Test MSE
(20,20,20) 1 0.0013 0.0084
(20,20,20) 2 0.0016 0.0097
(20,20,20) 3 0.0015 0.0060
(20,20,20) 4 0.0014 0.0112
(20,20,20) 5 0.0016 0.0083
Table B.8: ANN Results for AND gate Structure 8
Structure Run Lowest Train MSE Lowest Test MSE
(40,40,40) 1 0.00075 0.0103
(40,40,40) 2 0.00074 0.0120
(40,40,40) 3 0.00076 0.0099
(40,40,40) 4 0.00057 0.0090
(40,40,40) 5 0.00059 0.0107
Table B.9: ANN Results for AND gate Structure 9
Structure Run Lowest Train MSE Lowest Test MSE
(10) 1 0.0211 0.0334
(10) 2 0.0208 0.0310
(10) 3 0.0221 0.0319
(10) 4 0.0225 0.0311
(10) 5 0.0230 0.0346
Table B.10: ANN Results for OR gate Structure 1
Structure Run Lowest Train MSE Lowest Test MSE
(20) 1 0.0177 0.0293
(20) 2 0.0178 0.0272
(20) 3 0.0180 0.0293
(20) 4 0.0204 0.0314
(20) 5 0.0176 0.0289
Table B.11: ANN Results for OR gate Structure 2
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX B. RESULTS OF ANN TRAINING RUNS 132
Structure Run Lowest Train MSE Lowest Test MSE
(40) 1 0.0163 0.0279
(40) 2 0.0162 0.0296
(40) 3 0.0158 0.0316
(40) 4 0.0167 0.0278
(40) 5 0.0159 0.0282
Table B.12: ANN Results for OR gate Structure 3
Structure Run Lowest Train MSE Lowest Test MSE
(10,10) 1 0.0184 0.0306
(10,10) 2 0.0176 0.0283
(10,10) 3 0.0186 0.0286
(10,10) 4 0.0179 0.0321
(10,10) 5 0.0191 0.0308
Table B.13: ANN Results for OR gate Structure 4
Structure Run Lowest Train MSE Lowest Test MSE
(20,20) 1 0.0115 0.0283
(20,20) 2 0.0116 0.0285
(20,20) 3 0.0115 0.0274
(20,20) 4 0.0118 0.0277
(20,20) 5 0.0120 0.0281
Table B.14: ANN Results for OR gate Structure 5
Structure Run Lowest Train MSE Lowest Test MSE
(40,40) 1 0.0065 0.0295
(40,40) 2 0.0072 0.0277
(40,40) 3 0.0073 0.0285
(40,40) 4 0.0066 0.0279
(40,40) 5 0.0066 0.0258
Table B.15: ANN Results for OR gate Structure 6
Stellenbosch University  https://scholar.sun.ac.za
APPENDIX B. RESULTS OF ANN TRAINING RUNS 133
Structure Run Lowest Train MSE Lowest Test MSE
(10,10,10) 1 0.0193 0.0324
(10,10,10) 2 0.0200 0.0335
(10,10,10) 3 0.0157 0.0259
(10,10,10) 4 0.0197 0.0311
(10,10,10) 5 0.0182 0.0315
Table B.16: ANN Results for OR gate Structure 7
Structure Run Lowest Train MSE Lowest Test MSE
(20,20,20) 1 0.0114 0.0298
(20,20,20) 2 0.0111 0.0284
(20,20,20) 3 0.0118 0.0294
(20,20,20) 4 0.0112 0.0283
(20,20,20) 5 0.0109 0.0286
Table B.17: ANN Results for OR gate Structure 8
Structure Run Lowest Train MSE Lowest Test MSE
(40,40,40) 1 0.00049 0.0290
(40,40,40) 2 0.00046 0.0286
(40,40,40) 3 0.00046 0.0286
(40,40,40) 4 0.00049 0.0267
(40,40,40) 5 0.00043 0.0276
Table B.18: ANN Results for OR gate Structure 9
Stellenbosch University  https://scholar.sun.ac.za
Bibliography
[1] K. Likharev and V. Semenov, Rsfq logic/memory family: a new
josephson-junction technology for sub-terahertz-clock-frequency digital
systems, Applied Superconductivity, IEEE Transactions on, vol. 1, no. 1,
pp. 3 28, mar 1991.
[2] M. Dorojevets, C. Ayala, N. Yoshikawa, and A. Fujimaki, 8-bit asyn-
chronous sparse-tree superconductor rsfq arithmetic-logic unit with a
rich set of operations, Applied Superconductivity, IEEE Transactions on,
vol. 23, no. 3, pp. 1 700 1041 700 104, June 2013.
[3] V. Michal, E. Baggetta, M. Aurino, S. Bouat, and J.-C. Villegier, Super-
conducting rsfq logic: Towards 100ghz digital electronics, in Radioelek-
tronika (RADIOELEKTRONIKA), 2011 21st International Conference,
April 2011, pp. 18.
[4] A. Kadin, Introduction to Superconducting Circuits. John Wiley and
Sons, 1999.
[5] L. Müller, H. Gerber, and C. Fourie, Review and comparison of rsfq asyn-
chronous methodologies, Journal of Physics: Conference Series, vol. 97,
no. 1, 2008.
[6] Xilinx programmable devices, http://www.xilinx.com/.
[7] Mathworks technical computing, http://www.mathworks.com/.
[8] C. J. Fourie and M. H. Volkmann, Status of superconductor electronic
circuit design software, Applied Superconductivity, IEEE Transactions
on, vol. 23, no. 3, june 2013.
[9] T. Van Duzer and C. W. Turner, Principles of Superconductive Devices
and Circuits. Prentice Hall, 1998.
[10] W. Anacker, Josephson computer technology, special issue of IBM J.
Res. Devel, vol. 24, pp. 105264, March 1980.
[11] N. Fujimaki, S. Kotani, T. Imamura, and S. Hasuo, Josephson modiﬁed
variable threshold logic gates for use in ultra-high-speed lsi, Electron
Devices, IEEE Transactions on, vol. 36, no. 2, pp. 433446, Feb 1989.
134
Stellenbosch University  https://scholar.sun.ac.za
BIBLIOGRAPHY 135
[12] W. Perold, M. Jeﬀery, Z. Wang, and T. Van Duzer, Complementary
output switching logic-a new superconducting voltage-state logic family,
Applied Superconductivity, IEEE Transactions on, vol. 6, no. 3, pp. 125
131, Sept 1996.
[13] C. Fourie, W. Perold, and H. Gerber, Complete monte carlo model de-
scription of lumped-element rsfq logic circuits, Applied Superconductivity,
IEEE Transactions on, vol. 15, no. 2, pp. 384387, June 2005.
[14] A. Singhee and R. Rutenbar, Why quasi-monte carlo is better than
monte carlo or latin hypercube sampling for statistical circuit analysis,
Computer-Aided Design of Integrated Circuits and Systems, IEEE Trans-
actions on, vol. 29, no. 11, pp. 17631776, Nov 2010.
[15] D. P. Kroese, T. Taimre, and Z. I. Botev, Handbook of Monte Carlo
Methods. Wiley, 2011.
[16] M. Mckay, R. Beckman, and W. Conover, A comparison of three methods
for selecting values of input variables in the analysis of output from a
computer code, Technometrics, vol. 42, no. 1, Special 40th Anniversary
Issue, pp. 5660, Feb 2000.
[17] I. Dalal, D. Stefan, and J. Harwayne-Gidansky, Low discrepancy se-
quences for monte carlo simulations on reconﬁgurable platforms, in
Application-Speciﬁc Systems, Architectures and Processors, 2008. ASAP
2008. International Conference on, July 2008, pp. 108113.
[18] Boost c++ library, http://www.boost.org/.
[19] I. Sobol, Uniformly distributed sequences with an additional uniform
property, USSR Computational Mathematics and Mathematical Physics,
vol. 16, no. 5, pp. 236242, Jan 1975.
[20] P. Bratley and B. L. Fox, Algorithm 659: Implementing sobol's
quasirandom sequence generator, ACM Trans. Math. Softw., vol. 14,
no. 1, pp. 88100, Mar. 1988. [Online]. Available: http://doi.acm.org/10.
1145/42288.214372
[21] S. Joe and F. Kuo, Remark on algorithm 659: Implementing sobol's
quasirandom sequence generator, ACM Trans. Math. Softw., vol. 29,
no. 1, Mar 2003.
[22] Sobol sequence generator, http://web.maths.unsw.edu.au/~fkuo/
sobol/.
[23] R. Pratap, P. Sen, C. Davis, R. Mukhophdhyay, G. May, and J. Laskar,
Neurogenetic design centering, Semiconductor Manufacturing, IEEE
Transactions on, vol. 19, no. 2, pp. 173182, May 2006.
Stellenbosch University  https://scholar.sun.ac.za
BIBLIOGRAPHY 136
[24] L. Xieting and D. Guojie, Neural network for yield estimation, in Cir-
cuits and Systems, 1991. Conference Proceedings, China., 1991 Interna-
tional Conference on, Jun 1991, pp. 298300 vol.1.
[25] L.-I. Tong, W.-I. Lee, and C.-T. Su, Using a neural network-based ap-
proach to predict the wafer yield in integrated circuit manufacturing,
Components, Packaging, and Manufacturing Technology, Part C, IEEE
Transactions on, vol. 20, no. 4, pp. 288294, Oct 1997.
[26] G. Creech, J. Zurada, and P. Aronhime, Feedforward neural networks for
estimating ic parametric yield and device characterization, in Circuits
and Systems, 1995. ISCAS '95., 1995 IEEE International Symposium on,
vol. 2, Apr 1995, pp. 15201523 vol.2.
[27] D. Graupe, Principles of Artiﬁcial Neural Networks. World Scientiﬁc,
2007, vol. 6.
[28] W. McCulloch and W. Pitts, A logical calculus of the ideas imminent in
nervous activity, Bulletin of Mathematical Biophysics, no. 6, pp. 115133,
1943.
[29] J. Von Neumann, Probabilistic logics and the synthesis of reliable organ-
isms from unreliable components, Automata Studies, pp. 4398, 1956.
[30] B. Widrow and M. Hoﬀ, Adaptive switching circuits, 1960 URE
WESCON Convention Record, pp. 96104, 1960.
[31] F. Rosenblatt, The perceptron: a probabilistic model of information stor-
age and organization in the brain, Psychological Review, no. 65, pp. 368
408, 1958.
[32] P. Picton, Introduction To Neural Networks. Macmillan, 1994.
[33] J. C. Principe, E. N.R., and W. Lefebvre, Neural and Adaptive Systems:
Fundamentals Through Simulation. Wiley, 2000.
[34] Fast artiﬁcial neural networks library, http://leenissen.dk/fann/wp/.
[35] H. Gerber, C. Fourie, W. Perold, and L. Muller, Design of an asyn-
chronous microprocessor using rsfq-at, Applied Superconductivity, IEEE
Transactions on, vol. 17, no. 2, pp. 490 493, june 2007.
[36] P. Bunyk and P. Litskevitch, Case study in rsfq design: fast pipelined
parallel adder, Applied Superconductivity, IEEE Transactions on, vol. 9,
no. 2, pp. 3714 3720, jun 1999.
Stellenbosch University  https://scholar.sun.ac.za
BIBLIOGRAPHY 137
[37] V. Adler, C.-H. Cheah, K. Gaj, D. Brock, and E. Friedman, A cadence-
based design environment for single ﬂux quantum circuits, Applied Su-
perconductivity, IEEE Transactions on, vol. 7, no. 2, pp. 3294 3297, jun
1997.
[38] Y. Kameda, S. Yorozu, and Y. Hashimoto, Automatic single-ﬂux-
quantum (sfq) logic synthesis method for top-down circuit design, Jour-
nal of Physics: Conference Series, vol. 43, pp. 1179 1182, 2006.
[39] , A new automatic placement and routing design technology for
large-scale single-ﬂux- quantum logic circuits, Extended Abstract of ISEC
03, Sydney, Australia, July 2003.
[40] K. Gaj, E. Friedman, and M. Feldman, Timing of multi-gigahertz rapid
single ﬂux quantum digital circuits, Journal of VLSI Signal Processing,
vol. 16, pp. 247 276, 1997.
[41] J.-C. Lin and V. Semenov, Timing circuits for rsfq digital systems, Ap-
plied Superconductivity, IEEE Transactions on, vol. 5, no. 3, pp. 3472
3477, Sept 1995.
[42] K. Gaj, C.-H. Cheah, E. Friedman, and M. Feldman, Functional mod-
eling of rsfq circuits using verilog hdl, Applied Superconductivity, IEEE
Transactions on, vol. 7, no. 2, pp. 3151 3154, jun 1997.
[43] S. Intiso, I. Kataeva, E. Tolkacheva, H. Engseth, K. Platov, and
A. Kidiyarova-Shevchenko, Time-delay optimization of rsfq cells, Ap-
plied Superconductivity, IEEE Transactions on, vol. 15, no. 2, pp. 328 
331, june 2005.
[44] F. Matsuzaki, N. Yoshikawa, M. Tanaka, A. Fujimaki, and Y. Takai, A
behavioral-level hdl description of sfq logic circuits for quantitative per-
formance analysis of large-scale sfq digital systems, Physica C: Super-
conductivity, vol. 392â396, Part 2, no. 0, pp. 1495  1500, 2003.
[45] H. Toepfer, T. Harnisch, J. Kunert, S. Lange, and H. Uhlmann, For-
mal description of the functional behavior of rsfq logic circuits for design
and optimization purposes, Applied Superconductivity, IEEE Transac-
tions on, vol. 7, no. 2, pp. 3630 3633, jun 1997.
[46] M. Dorojevets, C. Ayala, and A. Kasperek, Data-ﬂow microarchitecture
for wide datapath rsfq processors: Design study, Applied Superconduc-
tivity, IEEE Transactions on, vol. 21, no. 3, pp. 787791, June 2011.
[47] L. Muller and C. Fourie, Automated state machine and timing char-
acteristic extraction for rsfq circuits, Applied Superconductivity, IEEE
Transactions on, vol. 24, no. 1, pp. 312, Feb 2014.
Stellenbosch University  https://scholar.sun.ac.za
BIBLIOGRAPHY 138
[48] J. E. Fang and T. Van Duzer, A josephson integrated circuit simulator
(jsim) for superconductive electronic applications, Extended Abstracts of
4thnd International Superconductive Electronics Conference(ISEC), pp.
407410, 1989.
[49] Institute of Photonic Technology (IPHT), Jena, Germany, http://www.
ipht-jena.de/.
[50] X. Yang, Engineering Optimization. John Wiley and Sons, 2010.
[51] Q. Kerr and M. Feldman, Multiparameter optimization of rsfq circuits
using the method of inscribed hyperspheres, Applied Superconductivity,
IEEE Transactions on, vol. 5, no. 2, pp. 33373340, June 1995.
[52] T. Harnisch, J. Kunert, H. Toepfer, and H. Uhlmann, Design centering
methods for yield optimization of cryoelectronic circuits, Applied Super-
conductivity, IEEE Transactions on, vol. 7, no. 2, pp. 34343437, June
1997.
[53] C. Fourie and W. Perold, Comparison of genetic algorithms to other
optimization techniques for raising circuit yield in superconducting digital
circuits, Applied Superconductivity, IEEE Transactions on, vol. 13, no. 2,
pp. 511  514, june 2003.
[54] Y. Tukel, A. Bozbey, and C. Tunc, Development of an optimization tool
for rsfq digital cell library using particle swarm, Applied Superconductiv-
ity, IEEE Transactions on, vol. 23, no. 3, pp. 1 700 8051 700 805, June
2013.
[55] M. Resende and C. Ribeiro, Handbook of Metaheuristics, F. Glover and
G. Kochenberger, Eds. Kluwer Academic, 2003.
[56] D. Henderson, S. Jacobson, and A. Johnson, Handbook of Metaheuristics,
F. Glover and G. Kochenberger, Eds. Kluwer Academic, 2003.
[57] C. Reeves, Handbook of Metaheuristics, F. Glover and G. Kochenberger,
Eds. Kluwer Academic, 2003.
[58] J. Potvin and K. Smit, Handbook of Metaheuristics, F. Glover and
G. Kochenberger, Eds. Kluwer Academic, 2003.
[59] M. Celik and A. Bozbey, A statistical approach to delay, jitter and timing
of signals of rsfq wiring cells and clocked gates, Applied Superconductivity,
IEEE Transactions on, vol. 23, no. 3, pp. 1 701 3051 701 305, June 2013.
[60] Inductex website, http://www.inductex.info.
Stellenbosch University  https://scholar.sun.ac.za
