Development of RSFQ circular memories for arbitrary AC waveform synthesisers by Tolkacheva, Elena et al.
  
PROCEEDINGS 11-15 September 2006 
 
 
 
 
 
FACULTY OF ELECTRICAL ENGINEERING 
AND INFORMATION SCIENCE 
 
 
 
INFORMATION TECHNOLOGY AND 
ELECTRICAL ENGINEERING - 
DEVICES AND SYSTEMS, 
MATERIALS AND TECHNOLOGIES 
FOR THE FUTURE 
 
 
 
 
 
Startseite / Index: 
http://www.db-thueringen.de/servlets/DocumentServlet?id=12391 
51. IWK 
Internationales Wissenschaftliches Kolloquium 
International Scientific Colloquium 
Imprint

Publisher: The Rector of the Technische Universität Ilmenau
 Univ.-Prof. Dr. rer. nat. habil. Peter Scharff

Editorial office: Referat Marketing und Studentische Angelegenheiten
 Andrea Schneider

 Fakultät für Elektrotechnik und Informationstechnik
 Susanne Jakob
 Dipl.-Ing. Helge Drumm

Editorial deadline: 7 July 2006

Technical realisation (CD-ROM edition):
 Institut für Medientechnik an der TU Ilmenau
 Dipl.-Ing. Christian Weigel
 Dipl.-Ing. Marco Albrecht
 Dipl.-Ing. Helge Drumm

Technical realisation (online edition):
 Universitätsbibliothek Ilmenau
 Postfach 10 05 65
 98684 Ilmenau

Publishing house:
 Verlag ISLE, Betriebsstätte des ISLE e.V.
 Werner-von-Siemens-Str. 16
 98693 Ilmenau

© Technische Universität Ilmenau (Thür.) 2006

This publication and all contributions and figures contained in it are protected by
copyright. Except in the cases permitted by law, any use without the consent of the
editorial office is a punishable offence.

ISBN (print edition): 3-938843-15-2
ISBN (CD-ROM edition): 3-938843-16-0
 
 
51st Internationales Wissenschaftliches Kolloquium 
Technische Universität Ilmenau 
 September 11 – 15, 2006 
 
E. Tolkacheva, M. Khabipov, D. Hagedorn, F.-Im. Buchholz, J. Kohlmann,
J. Niemeyer
 
 
Development of RSFQ circular memories for arbitrary AC 
waveform synthesisers 
 
 
Abstract - We present designs of circular Rapid Single Flux Quantum (RSFQ) shift registers to be
used as local memories in an arbitrary AC waveform synthesiser for metrology applications. Two
different optimisation approaches were applied to the circuits so that their outcomes could be
compared. The first approach is based on fast logic simulations and results in 22 Josephson
junctions per double-bit register stage, operating at a maximum clock frequency of 12 GHz at 0.75 of
the optimum global bias, and at 17 GHz and 21 GHz at the bias points 1.0 and 1.25, respectively. The
second approach is based on solving the corresponding integro-differential equations and results in
11 junctions per double-bit stage at a maximum clock frequency of 17.6 GHz for the same value of the
Stewart-McCumber parameter (βC = 2), with bias current margins of ±30%. This figure should be
compared with the maximum operating frequency of the logic simulation approach at the bias point of
0.75.
 
1. Introduction 
 
Recently, a concept for on-chip integrated driving RSFQ circuitry for a pulse-driven 
programmable Josephson voltage standard has been introduced and developed 
[1], [2]. The purpose of this device is to synthesise arbitrary AC waveforms for 
metrology applications. One of its basic features is on-chip digitising of the desired 
waveform and its consequent synthesis by means of Josephson junction arrays. Key 
modules are circular shift registers which are used as local memories. The SFQ 
pulses delivered at the output of the registers are doubled and converted to voltage 
pulses by SFQ/DC interfaces, and then amplified by voltage drivers. To drive large 
Josephson junction arrays, the output of the voltage drivers has to be further 
amplified by semiconductor circuitry. The design of and the experimental results for 
voltage drivers have recently been reported in [3] and have been further improved in 
most recent experiments. The basic structure of the circular memory has been 
presented in [2] and [4]. In the following, the latest results of further developments of 
RSFQ circular memory devices will be presented. 
 
2. Design approaches 
 
The shift registers under development are of the standard z-1-register type [5]. In this
configuration, the data line of the register provides two simultaneous modes of SFQ
pulse propagation: co-flow and counter-flow relative to the pulse propagation along
the clock line. In the co-flow branch, several buffer junctions are inserted between
adjacent memorising cells to provide sufficient time delay and thus avoid race
conditions between the clock and the data paths. The counter-flow branch needs no
such delays because of the low complexity of the memorising cells and the consequent
absence of race conditions in counter-flow clocking.
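
The co-flow race condition described above can be illustrated with a minimal timing sketch. The
Python code below is not part of the original design flow; the function names and all delay values
(in ps) are hypothetical. It only shows why buffer junctions are needed in the co-flow branch: the
new bit launched from one cell must reach the next cell later than the clock pulse that reads out
the old bit there.

```python
def coflow_hold_slack(t_clk_hop, t_clk_to_out, t_data_hop, t_hold):
    """Slack (ps) of the co-flow race condition between two adjacent cells.

    The clock pulse that triggers cell k reaches cell k+1 after t_clk_hop and
    needs t_hold to read out the old bit. The new bit leaves cell k after
    t_clk_to_out and travels t_data_hop to cell k+1. A positive slack means
    the old bit is read out before the new one arrives.
    All parameter names are illustrative assumptions.
    """
    return (t_clk_to_out + t_data_hop) - (t_clk_hop + t_hold)


def buffers_needed(t_clk_hop, t_clk_to_out, t_hold, t_buffer):
    """Smallest number of buffer junctions that gives a positive slack."""
    if t_buffer <= 0:
        raise ValueError("t_buffer must be positive")
    n = 0
    while coflow_hold_slack(t_clk_hop, t_clk_to_out, n * t_buffer, t_hold) <= 0:
        n += 1
    return n


# Hypothetical delays in ps, chosen only for illustration.
n_buffers = buffers_needed(t_clk_hop=12, t_clk_to_out=6, t_hold=4, t_buffer=4)
```

In counter-flow clocking the clock reaches the receiving cell before the sending cell, so the
corresponding slack is positive without any added delay; the price is a longer minimum clock period.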
 
Two approaches were used in the optimisation of the circular shift registers: a
physical-based simulation approach and a logic-based one.

The physical simulation approach is an optimisation procedure based on solving
the integro-differential equations that describe the dynamics of Josephson junctions,
which implies an exact simulation of the whole circuit in the time domain. In our case
it is performed with the PSCAN software [6], which is capable of simulating
small- and medium-sized circuits with Josephson junctions. With increasing circuit
complexity it becomes time-inefficient, because the system of equations describing
the whole circuit grows accordingly. For higher-complexity circuits, the logic
simulation approach is more efficient.
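
To give a feeling for what a time-domain simulation of junction dynamics involves, the following
Python sketch integrates a single junction. It assumes the standard resistively and capacitively
shunted junction (RCSJ) model in normalised form, βC·φ'' + φ' + sin φ = i, which the text does not
spell out; it is not the PSCAN model or input format, and the function name and step sizes are
illustrative only.

```python
import math


def settle_phase(i_bias, beta_c, dt=0.01, steps=20000):
    """Integrate beta_c*phi'' + phi' + sin(phi) = i_bias from rest.

    Time is measured in units of Phi0 / (2*pi*Ic*R), current in units of the
    critical current Ic. A semi-implicit Euler scheme is used for simplicity.
    Returns the final phase and its time derivative (the normalised voltage).
    """
    phi, v = 0.0, 0.0
    for _ in range(steps):
        acc = (i_bias - math.sin(phi) - v) / beta_c
        v += acc * dt      # update the normalised voltage first
        phi += v * dt      # then advance the phase with the new voltage
    return phi, v


# Bias below the critical current: the junction stays superconducting and the
# phase settles at arcsin(i_bias) with zero voltage.
phi_final, v_final = settle_phase(i_bias=0.5, beta_c=2.0)
```

A full circuit couples many such equations through inductances and bias sources, which is why the
cost of this approach grows quickly with the number of junctions.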
 
The logic simulation approach for RSFQ circuits is based on extracting the circuit
timings (critical timings and output delays) from physical simulations of the circuits
and then using them in logic simulation with the help of a hardware description
language (e.g. VHDL or Verilog [7]). Critical timings are the restricted delays between
incoming signals at different circuit inputs; output delays are the delays between an
incoming signal and the corresponding output signal. For details see [8]. This
approach has been adapted for RSFQ design from semiconductor digital design and is
becoming increasingly popular among RSFQ design groups because of the resulting
fast turnaround of the design process [9], [10].
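
The role of the two extracted quantities can be sketched with a small behavioural model. The Python
class below is only an illustration of the idea, written in Python rather than VHDL or Verilog; the
class name, its methods and the timing values are hypothetical and do not reproduce any cell of the
library.

```python
class ClockedCell:
    """Behavioural model of a clocked storage cell with extracted timings."""

    def __init__(self, output_delay, critical_timing):
        self.output_delay = output_delay        # clock-to-output delay
        self.critical_timing = critical_timing  # forbidden data/clock separation
        self.state = 0
        self.last_data = None
        self.last_clock = None

    def _check(self, t_new, t_other):
        # A pulse arriving too close to a pulse on the other input is an error.
        if t_other is not None and abs(t_new - t_other) < self.critical_timing:
            raise ValueError("critical timing violated")

    def data(self, t):
        """A data pulse at time t sets the cell."""
        self._check(t, self.last_clock)
        self.last_data = t
        self.state = 1

    def clock(self, t):
        """A clock pulse at time t reads out and resets the cell.

        Returns the time of the output pulse, or None if the cell was empty.
        """
        self._check(t, self.last_data)
        self.last_clock = t
        out = t + self.output_delay if self.state else None
        self.state = 0
        return out


cell = ClockedCell(output_delay=6, critical_timing=3)
cell.data(0)
first_output = cell.clock(10)   # stored bit is released 6 time units later
second_output = cell.clock(20)  # cell is empty now, so no output pulse
```

Because such a model only compares pulse times, it avoids the differential equations entirely,
which is where the speed advantage of the logic approach comes from.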
 
The set of basic circuits from which every complex RSFQ device is composed, and
which is used in the logic approach, constitutes the Standard Cells Library. These
cells are optimised for acceptable parameter margins and for matching each other,
i.e. in all test benches (standalone, surrounded by standard Josephson
transmission lines (JTLs), and in connection with other cells of the library) each cell
has parameter margins not smaller than the accepted fabrication-tolerant set. We
optimise the circuits for variations of the global bias current XI = ±30%, the global
inductance XL = ±40%, the global critical current density XJ = ±25%, the individual
junctions' critical currents XJi = ±30%, the individual bias currents XIi = ±30% and
the individual inductances XLi = ±40%. In addition, the spread of each output delay
over all test benches must not exceed ±5% at each of the three global bias points
XI = 0.75, 1.0 and 1.25. This is achieved by constructing all cells with additional
buffer junctions (single-junction standard JTLs) at all of their inputs and outputs,
which effectively separate the cores of the cells from the environment [8]. The result
is stable operation margins, independent of the load, and stable output delays.
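
The acceptance criteria listed above can be written down as a simple check. The Python sketch below
is illustrative only: the required values are taken from the text, but the function names, the
assumption of symmetric margins, and the definition of the delay spread as the largest deviation
from the mean over all test benches are assumptions made for this example.

```python
# Required margins in percent, as listed in the text.
REQUIRED_MARGINS = {"XI": 30, "XL": 40, "XJ": 25, "XJi": 30, "XIi": 30, "XLi": 40}
MAX_DELAY_SPREAD = 5  # percent, at each global bias point


def meets_margins(margins):
    """True if every simulated margin (percent) reaches the required value."""
    return all(margins.get(name, 0) >= req for name, req in REQUIRED_MARGINS.items())


def delay_spread_percent(delays):
    """Largest deviation from the mean delay, in percent of the mean (assumed definition)."""
    mean = sum(delays) / len(delays)
    return max(abs(d - mean) for d in delays) / mean * 100


# Hypothetical cell: margins in percent and one output delay (ps) in three test benches.
cell_margins = {"XI": 32, "XL": 45, "XJ": 25, "XJi": 31, "XIi": 30, "XLi": 41}
cell_delays = [9.6, 10.0, 10.4]
cell_ok = meets_margins(cell_margins) and delay_spread_percent(cell_delays) <= MAX_DELAY_SPREAD
```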
 
An example of a standard RS flip-flop optimised in the logic framework and used
in the shift registers under development is presented in Fig. 1. A standard JTL is
also added at the clock input of each clocked logic cell, because the clock tree is
based on splitters that are designed without an output buffer junction in order to
reduce the signal propagation delay through the clock tree. On the other hand, the
additional junction at the clock input of the cell adds only a minor delay, or no delay
at all, to the clock cycle in most complex circuit architectures, owing to the
RSFQ-inherent capability of deep pipelining; see, for example, ref. [11].
 
Since the logic simulation does not involve solving the corresponding
integro-differential equations, it is some tens of times faster than the physical one.
The disadvantage of this approach is, however, the increased number of Josephson
junctions due to the buffer junctions in each standard cell. This results in up to
twice the minimum number of junctions required to construct the circuit, and also
in a reduced maximum speed of the device. Thus, higher speed and lower
complexity are sacrificed for a fast turnaround of the design cycle. For
medium-sized circuits (up to 100 junctions, which can still be simulated by means of
the physical simulator), there is a trade-off between the two approaches. For large
circuits (complex-stage systolic arrays, large Application-Specific Integrated Circuits
(ASICs)), physical simulation is virtually impossible and the logic simulation approach
becomes the only acceptable one.

Fig. 1. RS flip-flop from the Standard Cells Library. Dashed areas show buffer junctions.
 
As the circular shift registers under development can be considered medium-sized
circuits, it is still possible to simulate the whole structure by means of the
physical simulator. Therefore, both approaches were used for circuit optimisation;
they resulted in different numbers of Josephson junctions per stage and different
operation speeds.
 
3. Logic simulation approach 
 
Logic optimisation of circular shift registers was performed on the basis of the 
Standard Cells Library developed at Chalmers University of Technology [12]. All the 
cells in the library are optimised at a value of the Stewart-McCumber parameter of 
βC=2. This library is not complete as it was developed for a specific ASIC, hence one 
more basic element that is required for the circular shift register was optimised and 
added to the library, i.e., the element of a DC current-controlled switch, see Fig. 2. It 
enables two modes of operation depending on the control current I1: data rotating in 
the register and resetting the register to the zero state/loading new data. 
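
The two modes of operation can be illustrated with a purely behavioural model. The Python sketch
below ignores the double-bit stages, the co-flow and counter-flow branches and all SFQ timing; the
class name and the boolean `rotate` flag, which stands in for the control current I1, are
hypothetical. Which value of I1 selects which mode is not stated in the text.

```python
class CircularShiftRegister:
    """Bit-level model of a circular shift register with a mode switch."""

    def __init__(self, n_bits):
        self.bits = [0] * n_bits
        self.rotate = True  # stands in for the control current I1 of the switch

    def clock(self, data_in=0):
        """Shift by one position and return the bit leaving the last cell.

        In rotate mode the outgoing bit is fed back to the first cell.
        Otherwise data_in is shifted in, which loads new data (or resets
        the register if data_in is kept at 0).
        """
        out = self.bits[-1]
        feed = out if self.rotate else data_in
        self.bits = [feed] + self.bits[:-1]
        return out


reg = CircularShiftRegister(4)

# Load mode: shift a pattern into the register.
reg.rotate = False
for bit in [1, 0, 1, 1]:
    reg.clock(bit)

# Rotate mode: the stored pattern now repeats at the output indefinitely.
reg.rotate = True
first_pass = [reg.clock() for _ in range(4)]
second_pass = [reg.clock() for _ in range(4)]
```

In this model the pattern appears at the output in the order in which it was loaded, and it repeats
on every further pass, which is the behaviour required of a local waveform memory.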
 
 
 
Fig. 2. DC current-controlled switch, optimised for the Standard Cells Library. 
[Fig. 3 block diagram labels: splitter, merger, switch; ports CLKin, clkin, clkout, DATAin, DATAout, coflin, coflout, cntflin, cntflout]
 
 
Fig. 3. Block diagram of the circular shift register (3 double-bit stages) used in logic optimisation.     
 
Fig. 3 shows a block diagram of the shift register as used in the logic optimisation
approach. Each double-bit stage consists of two splitters in the clock line and two RS
flip-flops (d1 in the Standard Cells Library), see Fig. 4. The delays that prevent race
conditions in the co-flow line (standard JTL lines, shown as circles in Fig. 3) are
optimised separately in VHDL in this approach by adjusting the number of JTL stages
and the value of βC, resulting in a 4-junction standard JTL with βC = 1.04.
 
The maximum clock frequencies f of the shift register at the three basic global bias
current points resulting from the optimisation are presented in Table I. The clock
frequency f corresponds to the technology parameters of the PTB 4-µm Nb SIS
fabrication process with a critical current density of jC = 1 kA/cm² and a
characteristic voltage of VC = 250 µV [13]. The limiting factor for the frequency
performance of the circuit appears to be the delay of the data path between the last
and the first bit through the switch and the merger, see Fig. 3.
  
 
Fig. 4. Schematic diagram of a single stage of the circular shift register.

Table I. Maximum operation frequencies f at different basic global bias current points XI (logic
optimisation).

XI      f (GHz)
0.75    12
1.0     17
1.25    21

The shift registers were designed in configurations with 32 and 8 double-bit stages
(64 and 16 bits, respectively), see Fig. 5. One regular stage comprises 22
junctions and has a size of 560 × 103 µm²; the core of the 32-stage version has a
size of 560 × 3500 µm².

Fig. 5. Design of 64-bit (left) and 16-bit (right) circular shift registers based on the Standard Cells
Library.
  
Fig. 6. Schematic diagram of one regular stage of the circular shift register optimised by physical 
simulations. 
 
4. Physical simulation approach 
 
The circular shift register optimised only by physical simulations comprises 11
Josephson junctions per regular double-bit stage, including the data-input delay of
the co-flow branch, see Fig. 6. It was optimised for two values of the
Stewart-McCumber parameter, βC = 1 and βC = 2. Because there are no buffer
junctions in this approach, the merger and the switch circuits have fewer junctions
than in the previous approach. Since the control of circuit switching as
described in the standard sfqhdl script is not always sufficient to ensure the correct
operation of the whole circuit [14], a modification of the well-known technique of
control check-sums was incorporated into the sfqhdl script of the shift registers and
applied during the optimisation.
 
The circuit optimisation was performed incrementally, first on the device with 3
double-bit stages, then on devices comprising 6, 12 and 16 stages. A decrease of the
margins for the global bias current and the critical current density by about 2 to 6%
during the transitions between the first three iterations was corrected by a slight
adjustment of the parameters. These corrections did not reduce the margins of the
registers with fewer stages. During the transition to the 16-stage version these
margins decreased by only 1 to 2%, which leads to the estimate that a further
increase in circuit complexity will not decrease the margins by more than
2 to 3% in total, assuming the first target of 64 stages.
 
The maximum operation frequencies obtained for the two values of βC,
assuming ±30% margins on the global bias, are presented in Table II. These values
should be compared with those from the logic optimisation approach at the bias point
of 0.75 of the optimum. The maximum frequencies at the optimum bias point are
higher, but they are not used during physical optimisation. They can be estimated
from the bias dependence of the maximum operation frequency found for the logic
approach, which should be virtually the same for both approaches given the similar
architecture [15]. It can be seen from Tables I and II that a circuit optimised by
physical simulation operates at higher speed even at the lower value of βC, owing to
the lower number of junctions. In this approach, the clock frequency was limited by
the different relative stretching, with global bias, of the signal propagation delays
along the clock and data paths in the co-flow branch.

Table II. Maximum operation frequencies f at the global bias current point 0.75 of the optimum value
for different values of the Stewart-McCumber parameter βC (physical optimisation).

βC      f (GHz)
1       14.6
2       17.6
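
The estimate suggested in the text, namely transferring the bias dependence of the logic design to
the physically optimised one, can be sketched as follows. This is a rough extrapolation under the
assumption that both designs scale identically with bias; the resulting numbers are not reported
results, and the function name is hypothetical. The input values are those of Tables I and II.

```python
# Table I: maximum frequency (GHz) of the logic design versus global bias point XI.
LOGIC_F = {0.75: 12.0, 1.0: 17.0, 1.25: 21.0}

# Table II: maximum frequency (GHz) of the physical design at XI = 0.75, keyed by beta_C.
PHYSICAL_F_AT_075 = {1: 14.6, 2: 17.6}


def estimate_physical_f(beta_c, bias):
    """Scale the XI = 0.75 value by the bias dependence of the logic design.

    Assumes the relative bias dependence is identical for both designs.
    """
    return PHYSICAL_F_AT_075[beta_c] * LOGIC_F[bias] / LOGIC_F[0.75]


estimates_at_optimum = {bc: estimate_physical_f(bc, 1.0) for bc in PHYSICAL_F_AT_075}
```

Under this assumption the physically optimised register would reach roughly 20.7 GHz (βC = 1) and
24.9 GHz (βC = 2) at the optimum bias point.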
 
The circuits were designed for fabrication in configurations of 128-bit (64 double-bit
stages) and 32-bit (16 double-bit stages) shift registers, see Fig. 7. One regular stage
has a size of 370 × 80 µm²; the 128-bit device has a size of 370 × 5400 µm². The
circuits were designed and fabricated with the same technology process as those
described in the previous section [13].
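
The per-stage figures quoted for the two designs can be compared with a few lines of arithmetic.
The Python sketch below uses only numbers given in the text; the dictionary layout and key names are
illustrative, and the junction totals cover the regular stages only, not the merger and the switch.

```python
# Per-stage figures quoted in the text (sizes in micrometres).
DESIGNS = {
    "logic":    {"junctions_per_stage": 22, "width": 560, "height": 103, "stages": 32},
    "physical": {"junctions_per_stage": 11, "width": 370, "height": 80, "stages": 64},
}


def summarise(design):
    """Derive stage area, bit count and junction total for one design."""
    area = design["width"] * design["height"]
    return {
        "stage_area_um2": area,
        "area_per_bit_um2": area / 2,  # one stage stores two bits
        "bits": 2 * design["stages"],
        "junctions_in_regular_stages": design["junctions_per_stage"] * design["stages"],
    }


SUMMARY = {name: summarise(design) for name, design in DESIGNS.items()}
area_ratio = SUMMARY["logic"]["stage_area_um2"] / SUMMARY["physical"]["stage_area_um2"]
```

With these figures a regular stage of the logic design occupies 57680 µm² against 29600 µm² for the
physical design, a ratio of about 1.95, and the 64-bit logic register and the 128-bit physical
register both contain 704 junctions in their regular stages.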
 
 
 
Fig. 7. Design of a 128-bit circular shift register optimised by physical simulations. 
 
 
5. Conclusion 
 
We have designed RSFQ circular shift registers for application in arbitrary AC 
waveform synthesisers. They were optimised within two different design frameworks 
according to a logic-based simulation approach and a physical-based one. The logic 
approach, on the one hand, resulted in a higher number of Josephson junctions (22 
junctions per double-bit stage) and a lower speed (12 GHz at the bias point value of 
0.75 of the optimum), which meant sacrificing higher speed and lower complexity for 
achieving a fast design cycle. The physical approach, on the other hand, took about 
20 times longer but resulted in only 11 junctions per double-bit stage and a maximum 
operating frequency of 17.6 GHz for the same value of βC at the same bias point 
value. The fabrication yield of these approaches is under investigation.  

Acknowledgement
 
This work is supported by the BMBF project QuaSy (13N8412), Germany, and the EU
project RSFQubit (FP6-502807), and was carried out partly within the framework of a
bilateral cooperation between the PTB and the IREE, Moscow, Russia.
 
References:    
[1] M. Khabipov, W. Kessel, H.-G. Meyer, J. Niemeyer, G. Wende, 43rd Int. Scient. Coll. TU Ilmenau, Conf. Proc., No. 3, pp. 224-227, 1998.
[2] M. Khabipov, F.-Im. Buchholz, J. Niemeyer, H.-G. Meyer, G. Wende, Proc. of the 6th European Conf. on Applied Supercond. (EUCAS 2003), No. 181, pp. 3459-3464.
[3] M. Khabipov, D. Hagedorn, F.-Im. Buchholz, J. Kohlmann, F. Maibaum, M. Schilling, J. Niemeyer, in print: Proc. of the 7th European Conf. on Applied Supercond. (EUCAS 2005).
[4] F.-Im. Buchholz, W. Kessel, M.I. Khabipov, R. Dolata, J. Niemeyer, A.Yu. Kidiyarova-Shevchenko, Proc. of the 3rd European Conf. on Applied Supercond. (EUCAS 1997), No. 158, pp. 433-436.
[5] K. Likharev, V. Semenov, IEEE Trans. on Appl. Supercond., vol. 1, pp. 3-28, 1991.
[6] S. Polonsky, P. Shevchenko, A. Kirichenko, D. Zinoviev, A. Rylyakov, IEEE Trans. on Appl. Supercond., vol. 7, No. 2, pp. 2685-2689, 1997.
[7] www.vhdl-online.de, www.cadence.com.
[8] S. Intiso, I. Kataeva, E. Tolkacheva, H. Engseth, K. Platov, A. Kidiyarova-Shevchenko, IEEE Trans. on Appl. Supercond., vol. 15, No. 2, pp. 328-331, 2005.
[9] S. Yorozu, Y. Kameda, H. Terai, A. Fujimaki, T. Yamada, S. Tahara, Physica C, vol. 378-381, pp. 1471-1474, 2002.
[10] F. Matsuzaki, N. Yoshikawa, M. Tanaka, A. Fujimaki, Y. Takai, Physica C, vol. 392-396, pp. 1495-1500, 2003.
[11] E. Tolkacheva, Lic. Thesis, available online: www.mc2.chalmers.se/mc2/fte/HFDE/new/2005-04-01.xml
[12] Available online: www.mc2.chalmers.se/mc2/fte/HFDE/mat/cadint.xml
[13] R. Dolata, M. Khabipov, F.-Im. Buchholz, W. Kessel, J. Niemeyer, Proc. of the 2nd European Conf. on Applied Supercond. (EUCAS 1995), No. 148, pp. 1709-1712.
[14] E. Tolkacheva, MS diploma work, Moscow State University, 2003, in Russian, available from: elena@dnph.phys.msu.su.
[15] E. Tolkacheva, H. Engseth, I. Kataeva, K. Platov, A. Kidiyarova-Shevchenko, Proc. of the 6th European Conf. on Applied Supercond. (EUCAS 2003), No. 181, pp. 3486-3492.
 
Authors:  
Elena Tolkacheva  PTB 
Dr. Marat Khabipov  PTB/IREE 
Dr. Daniel Hagedorn  PTB 
Dr. Friedrich-Im. Buchholz PTB 
Dr. Johannes Kohlmann  PTB 
Prof. Jürgen Niemeyer  PTB 
  
Physikalisch-Technische Bundesanstalt  
Bundesallee 100, PF 3345 
38023 Braunschweig 
Phone: +49 (531) 592 24 21  
Fax: +49 (531) 592 24 05 
E-mail: marat.khabipov@ptb.de 
