A novel global self-timing methodology for BSFQ circuits by Teh CK et al.
IEEE TRANSACTIONS ON APPLIED SUPERCONDUCTIVITY, VOL. 13, NO. 2, JUNE 2003 543
A Novel Global Self-Timing Methodology
for BSFQ Circuits
Chen Kong Teh and Yoichi Okabe, Member, IEEE
Abstract—Recently we have proposed Boolean single-flux-
quantum (BSFQ) circuits, which like CMOS circuits directly
support Boolean primitives, and do not require local synchroniza-
tion for their elementary cells as well as for their combinational
cells. However, only the cell-level timing description of the BSFQ
circuits was considered, which did not specify their global timing
strategy in a system-level design. In this paper, we present a
novel global self-timing methodology, dual encoding hierarchical
pipelining (DEHP), for the locally asynchronous BSFQ circuits. In
circuit implementation, a nonvolatile memory cell named ND-DFF
and a volatile memory cell named D-DFF have been designed.
Index Terms—Asynchronous design, Boolean primitive, BSFQ,
pipelining, self-timing, SFQ circuits.
I. INTRODUCTION
Recently, we proposed Boolean single-flux-quantum (BSFQ) circuits [1], [2], which, like CMOS
circuits, directly support Boolean primitives and do not require
local synchronization for either their elementary cells or their
combinational cells. Moreover, BSFQ circuits are also charac-
terized by their simple design, low latency, high modularity, and
high flexibility in terms of easily interfacing with other types
of SFQ circuits. However, despite their locally asynchronous
feature, BSFQ circuits, similar to CMOS circuits, necessitate
a global timing signal for a system with sequential circuits. In
our previous work, we only considered BSFQ circuits as inde-
pendent circuits themselves, and did not specify their global
timing strategy in a system-level design. Therefore, in order to
provide BSFQ circuits with a globally asynchronous feature,
we have developed a novel global self-timing methodology,
named dual encoding hierarchical pipelining (DEHP).
In the context of global timing strategy, BSFQ logic can use
both the synchronous and the asynchronous timing schemes used by
CMOS logic. However, a synchronous timing scheme demands careful
attention to clock skew at multi-gigahertz operating frequencies,
and restricts a system to worst-case performance. On the other
hand, although it allows average-case performance, a CMOS
asynchronous timing scheme requires rather bulky hardware and a
complicated design for the control logic subsystem. These facts
motivate us to develop an alternative globally asynchronous timing
scheme, which enables a BSFQ system to have high throughput and
low latency without excessive complexity overhead.
Manuscript received August 6, 2002. This work was supported in part by a
grant from the Superconductivity Research Laboratory, Japan.
C. K. Teh is with the Electronic Engineering Department, University of
Tokyo, Tokyo 113-8656, Japan (e-mail: teh@okabe.rcast.u-tokyo.ac.jp).
Y. Okabe is with the University of Tokyo, Tokyo 113-8656, Japan (e-mail:
okabe@okabe.rcast.u-tokyo.ac.jp).
Digital Object Identifier 10.1109/TASC.2003.813931
Fig. 1. (a) BSFQ local encoding scheme. (b) Basic diagram of a BSFQ cell.
(c) Use of the BSFQ local encoding scheme, illustrated by the dynamics of an
AND gate.
The first idea of the DEHP methodology is to use two suitable
encoding schemes, which provide the required timing information
for combinational circuits and sequential circuits, respectively.
The second idea is to use self-timed hierarchical pipelines,
composed of micro-pipelines and meta-pipelines, for building an
arbitrary BSFQ system.
II. CELL-LEVEL TIMING SCHEME
SFQ circuits operate differently from CMOS circuits: the former
use voltage pulses for data transmission, while the latter use
voltage levels. In SFQ circuits, since an SFQ pulse arrives at a
single point in time, synchronizing signals are needed to
establish timing windows for data signals. In CMOS circuits, on
the other hand, the time interval of a level signal acts as its
own timing window, and an overall timing window is obtained by
overlapping the timing windows produced by all input signals.
With this in mind, BSFQ logic adopts the level-processing
methodology of CMOS logic on top of its pulse-based operating
mode. In BSFQ logic, a Boolean level signal is represented by a
“set” SFQ pulse at the rising edge of the signal and a “reset”
SFQ pulse at the falling edge of the signal [Fig. 1(a)]. In the
DEHP methodology, this encoding approach is called the BSFQ local
encoding scheme. An example of an AND operation using this local
encoding scheme is shown in Fig. 1(c). Note that the AND gate
operates asynchronously.
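The local encoding scheme can be illustrated with a small behavioral sketch. It is not a circuit model: time is discretized into steps, the function names (encode_local, decode_local, and_gate) are hypothetical, and the AND gate is modeled simply by reconstructing the levels, taking their conjunction, and re-encoding the result.

```python
from typing import List, Tuple

# A pulse is (time step, rail); the rail is "S" (set) or "R" (reset).
Pulse = Tuple[int, str]


def encode_local(levels: List[int], initial: int = 0) -> List[Pulse]:
    """Emit a set pulse at each rising edge and a reset pulse at each falling edge."""
    pulses: List[Pulse] = []
    prev = initial
    for t, level in enumerate(levels):
        if level and not prev:
            pulses.append((t, "S"))
        elif prev and not level:
            pulses.append((t, "R"))
        prev = level
    return pulses


def decode_local(pulses: List[Pulse], length: int, initial: int = 0) -> List[int]:
    """Rebuild the Boolean level signal from its set/reset pulses."""
    events = dict(pulses)
    state = initial
    levels: List[int] = []
    for t in range(length):
        if t in events:
            state = 1 if events[t] == "S" else 0
        levels.append(state)
    return levels


def and_gate(a: List[Pulse], b: List[Pulse], length: int) -> List[Pulse]:
    """Behavioral AND: the output changes only when the conjunction of levels changes."""
    la = decode_local(a, length)
    lb = decode_local(b, length)
    return encode_local([x & y for x, y in zip(la, lb)])


if __name__ == "__main__":
    a = encode_local([0, 1, 1, 1, 0, 0])
    b = encode_local([0, 0, 1, 1, 1, 0])
    print(and_gate(a, b, 6))  # [(2, 'S'), (4, 'R')]
```

No clock appears anywhere in the sketch: the output pulses are produced purely by changes in the input levels, which is the sense in which the gate operates asynchronously.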
Encoded set-reset pulses are transferred over a dual-rail
Josephson transmission line (JTL) toward BSFQ cells [Fig. 1(b)],
where operations are performed. In a special cell such as a BSFQ
gate, these set-reset pulses are converted into superconducting
flux levels, and the Boolean operation is performed according to
the gate's preset flux-level threshold [2]. The results are output
again in the form of set-reset pulses, which propagate to other
BSFQ cells. For interfacing with an external voltage-level
environment, a BSFQ level-to-pulse converter and a BSFQ
pulse-to-level converter are used.
Fig. 2. (a) BSFQ global encoding scheme. (b) An error canceller cell,
an example of a BSFQ elementary cell. No converter is required
to interchange encodings, since there are escape junctions in all BSFQ
elementary cells. Note that the BSFQ error canceller is similar to the RSFQ RS
flip-flop [5], except that the former has an Out(R) rail.
1051-8223/03$17.00 © 2003 IEEE
III. SYSTEM-LEVEL TIMING SCHEME
In the BSFQ local encoding scheme, process timing is obtained
only when the Boolean state encoded in the data signals changes.
If the state remains high or low, the process timing is unknown.
Therefore, although BSFQ combinational circuits operate
asynchronously, BSFQ sequential circuits require a global clock
signal for synchronization. To make these sequential circuits
operate asynchronously as well, we add a new encoding scheme, the
BSFQ global encoding scheme [Fig. 2(a)], for channels requiring
timing information that the local encoding scheme cannot provide.
This global encoding scheme is similar to that of dual-rail logic
[3], [4], where a data signal is sent on either a set rail or a
reset rail at every operation event. No converter is required for
interchanging between global encoding and local encoding, which
reduces the hardware required.
Consider a series of pulses, in the order shown in Fig. 2(a),
arriving at the input of the BSFQ error canceller shown in
Fig. 2(b). When two consecutive set pulses arrive, the latter
pulse is thrown off by an escape junction, maintaining the state
“one” in the cell. Likewise, when two consecutive reset pulses
arrive, the cell outputs only the first pulse and throws away the
latter pulse through an escape junction. Hence, after processing
by the error canceller cell, the output pulse sequence is just
like that of Fig. 1(a). Since such escape junctions are present in
all BSFQ circuits, no additional converter is required for the
interchange of encoding.
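The conversion from global to local encoding performed by the error canceller can be sketched behaviorally as follows. The function name error_canceller is hypothetical, and the sketch assumes the cell starts in state “zero,” so that a reset pulse arriving first is also discarded; the text does not state the initial state.

```python
from typing import List


def error_canceller(rails: List[str], initial: int = 0) -> List[str]:
    """Pass a pulse only when it changes the stored state.

    `rails` is a globally encoded stream: one "S" or "R" per operation event.
    A pulse that repeats the current state is dropped, which models the
    redundant pulse leaving through an escape junction.
    """
    state = initial
    out: List[str] = []
    for rail in rails:
        if rail == "S" and state == 0:
            state = 1
            out.append("S")
        elif rail == "R" and state == 1:
            state = 0
            out.append("R")
        elif rail not in ("S", "R"):
            raise ValueError(f"unknown rail: {rail!r}")
        # Otherwise the pulse is redundant and is discarded.
    return out


if __name__ == "__main__":
    print(error_canceller(["S", "S", "R", "R", "R", "S"]))  # ['S', 'R', 'S']
```

The output contains only alternating set and reset pulses, which is exactly the form of the local encoding scheme.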
IV. HIERARCHICAL PIPELINING
In BSFQ logic, as in CMOS logic, the simplest single-stage
pipeline consists of a combinational logic block and a register.
In conventional SFQ logic such as RSFQ logic [5], however, the
simplest pipeline is limited to an elementary cell, since the
elementary cell requires a local clock to obtain a processed
result. BSFQ logic therefore has the advantage that one can build
a large combinational logic block in the pipeline, as long as no
feedback is required in the block. This makes BSFQ combinational
circuits simpler in design and lower in latency, since neither
local clock distribution nor time margins for local clock events
are required. To give BSFQ sequential circuits this advantage as
well, we have developed a hierarchical pipelining technique that
encapsulates BSFQ sequential circuits in asynchronous functional
blocks, so that these blocks can in turn compose a combinational
functional block.
Fig. 3. (a) DEHP micro-pipeline. The variables m, n, and p show the bus width of the
related channel. (b) The initiator is built from an RSFQ DC/SFQ converter and
a splitter block. The output rails of the initiator are connected to the reset rails of the
feedback channel. (c) An example of a micro-pipeline (m-PL) network.
A. Micro-Pipeline: Structure
A DEHP micro-pipeline is the simplest sequential module,
which has a feedback channel with an initiator, a combinational
logic block, a BSFQ nondestructive D flip-flop (ND-DFF)
block, and a Muller C-element block [Fig. 3(a)]. The ND-DFF
block is similar to a CMOS register, which temporarily stores
processed results. The Muller C-element block is a comple-
tion-waiting functional block, which outputs a ready signal if
and only if all input signals are received.
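The completion-waiting behavior of the Muller C-element block can be sketched as a small class. This is a behavioral illustration only: the class name CElementBlock is hypothetical, and the sketch assumes that a repeated pulse on the same input is ignored and that the block re-arms itself after releasing a ready signal, neither of which is specified in the text.

```python
class CElementBlock:
    """Releases a ready signal if and only if every input has received a signal."""

    def __init__(self, n_inputs: int) -> None:
        if n_inputs < 1:
            raise ValueError("at least one input is required")
        self.n_inputs = n_inputs
        self._arrived = set()

    def pulse(self, index: int) -> bool:
        """Record an arrival on input `index`; return True when the block fires."""
        if not 0 <= index < self.n_inputs:
            raise IndexError(f"input index out of range: {index}")
        self._arrived.add(index)
        if len(self._arrived) == self.n_inputs:
            self._arrived.clear()  # assumed: re-arm for the next operating cycle
            return True
        return False


if __name__ == "__main__":
    block = CElementBlock(3)
    print([block.pulse(i) for i in (0, 2, 1)])  # [False, False, True]
```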
The BSFQ local encoding scheme is used only for communication
inside the combinational logic block and between the combinational
logic block and the ND-DFF block. The other parts of the
micro-pipeline, including its inputs and outputs, communicate
using the BSFQ global encoding scheme.
B. Micro-Pipeline: Operation
1) Initialization: At the system initialization stage, the
initiator in each micro-pipeline produces an SFQ pulse for each
reset rail in the feedback channel. These pulses reset the related
input rails of the combinational logic block, and are presented as
ready signals to the Muller C-element block. The initiator
consists of RSFQ DC/SFQ cells and a splitter block [Fig. 3(b)]. It
is activated by raising the input current of the DC/SFQ cells,
which is controlled by either an external current source or an
on-chip current source. The design is simple, since the input
current path is independent of inductance, which makes the
required location easy to access.
2) Normal Routine: When input signals arrive at the
micro-pipeline, they are processed asynchronously by the
combinational logic block, and the results are stored in the
ND-DFF block. The input signals also branch out to the Muller
C-element block, where their arrival is acknowledged. Once all
input rails of the Muller C-element block have received signals,
an output signal is released to the ND-DFF block, from which the
stored results are output to either the next micro-pipeline or the
feedback channel. The whole process repeats after the feedback
signals return to the inputs of the combinational logic block.
Fig. 4. DEHP pipeline. The initiator and the DC/SFQ converter are used only in
the initialization of the pipeline. The diamond-shaped node indicates that all
outputs of the initiator are connected to the reset rails (R) of the feedback channel.
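The operating cycle of a micro-pipeline can be summarized in a short behavioral sketch. The function run_micro_pipeline and its parameters are hypothetical; the sketch abstracts each channel to a single integer and, for illustration only, forwards every stored result both to the output and to the feedback channel.

```python
from typing import Callable, Iterable, List


def run_micro_pipeline(
    inputs: Iterable[int],
    logic: Callable[[int, int], int],
    initial_feedback: int = 0,
) -> List[int]:
    """Run one micro-pipeline over a stream of input events.

    `logic(data, feedback)` stands in for the combinational logic block.
    `initial_feedback` models the state left by the initiator, which pulses
    the reset rails of the feedback channel (value 0).
    """
    feedback = initial_feedback
    outputs: List[int] = []
    for data in inputs:
        # Input and feedback signals are both present, so the Muller
        # C-element block would release its ready signal in this cycle.
        stored = logic(data, feedback)  # result held in the ND-DFF block
        outputs.append(stored)          # read out toward the next micro-pipeline
        feedback = stored               # and returned through the feedback channel
    return outputs


if __name__ == "__main__":
    # Running parity: each output is the XOR of the input and the previous result.
    print(run_micro_pipeline([1, 0, 1, 1], lambda d, f: d ^ f))  # [1, 1, 0, 1]
```

Each loop iteration corresponds to one operating cycle: the next input is consumed only after the previous result has returned through the feedback channel.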
C. Micro-Pipeline: Requirement
1) Muller C-Element Block: In layout design, the shortest time
interval for input pulses to propagate from the IN port to the
ND-DFF block through the Muller C-element block is adjusted by
using several stages of JTLs. The adjusted time interval should
equal the sum of the time interval for signals traveling from the
IN port to the ND-DFF block through the combinational logic block
and the time margin for process variation. This can be easily
calculated from logic simulation. If $T_{CL}$ is the latency of the
combinational logic block, $T_{FB}$ is the latency of the feedback
channel, $T_{AUX}$ is the latency of the other auxiliary circuits,
and $T_{M}$ is the time margin for process variation, the total
latency of one operating cycle of a micro-pipeline is
$T_{CL} + T_{FB} + T_{AUX} + T_{M}$.
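As a worked illustration of this timing budget, the sketch below assumes that the cycle latency is the sum of the four terms just listed. The function names and the numerical values are hypothetical and are not taken from the paper.

```python
def c_element_path_delay_ps(t_logic_ps: float, t_margin_ps: float) -> float:
    """Target delay of the IN -> C-element -> ND-DFF path, set with JTL stages."""
    return t_logic_ps + t_margin_ps


def cycle_latency_ps(
    t_logic_ps: float, t_feedback_ps: float, t_aux_ps: float, t_margin_ps: float
) -> float:
    """Total latency of one micro-pipeline operating cycle (assumed to be a sum)."""
    return t_logic_ps + t_feedback_ps + t_aux_ps + t_margin_ps


def max_cycle_rate_ghz(cycle_ps: float) -> float:
    """Upper bound on the cycle rate implied by the cycle latency."""
    return 1000.0 / cycle_ps


if __name__ == "__main__":
    # Hypothetical latencies in picoseconds.
    cycle = cycle_latency_ps(40.0, 10.0, 20.0, 5.0)
    print(c_element_path_delay_ps(40.0, 5.0), cycle, max_cycle_rate_ghz(cycle))
```

The sketch makes the trade-off explicit: a larger margin protects against process variation but lengthens both the C-element path and the operating cycle.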
2) Feedback Channel: Low-latency passive transmission lines may
be used instead of active JTLs for the feedback channel, so that
the penalty of waiting for feedback signals is reduced.
D. Meta-Pipeline
Substituting a network of micro-pipelines for the combinational
logic block in Fig. 3(a), we obtain a DEHP 1st-level
meta-pipeline. No feedback is allowed in the micro-pipeline
network, just as none is allowed in a combinational logic block.
An example of a micro-pipeline network is illustrated in
Fig. 3(c). Likewise, substituting a network of 1st-level
meta-pipelines for the micro-pipeline network, we obtain a DEHP
2nd-level meta-pipeline. By using this hierarchical pipelining
method, we can construct an arbitrary FIFO sequential circuit.
E. Pipeline
A DEHP pipeline is the highest-level meta-pipeline, where
backward communication of vacancy status or data requests can no
longer be omitted. Fig. 4 illustrates the structure of a pipeline,
which is similar to that in Fig. 3(a), but has a request channel
connecting to the adjacent pipelines, and a DC/SFQ converter for
initializing the request channel directed from the succeeding
pipeline. Examples of a pipeline are a multiplier, an address
calculator, and an instruction decoder. Thus, a DEHP pipeline is
like a single-stream FIFO executor of a specific job, and the
backward request communication ensures synchronization of the
pipeline with adjacent pipelines at the moment of reloading data.
A demonstration of the design and high-speed testing of a DEHP
pipeline has been carried out; the results will be presented in a
separate paper.
Fig. 5. (a) BSFQ nondestructive D flip-flop (ND-DFF) cell. Nominal values:
I1 = 0.23 mA, I2 = 0.12 mA, J1 = J8 = 0.14 mA, J2 = 0.23 mA,
J3 = J9 = 0.20 mA, J4 = J5 = J7 = 0.13 mA, J6 = 0.12 mA,
J10 = 0.26 mA, J11 = J12 = 0.17 mA, L1 = 9.7 pH, L2 = 1.2 pH,
L3 = 0.2 pH, L4 = 1.1 pH, L5 = L13 = 0.4 pH, L6 = 7.9 pH,
L7 = 0.6 pH, L8 = 1.0 pH, L9 = L10 = 1.6 pH, L11 = 5.3 pH,
L12 = 1.9 pH, R1 = 3.7 Ω. Parasitic inductances are omitted here but
included in the optimization. (b) Moore diagram of an ND-DFF cell.
V. KEY CIRCUITS FOR DEHP METHODOLOGY
A. ND-DFF Circuit
A BSFQ nondestructive D flip-flop (ND-DFF) is a nonvolatile
memory cell that stores an SFQ fluxon as its internal state; the
internal state is unchanged after a read-out process is carried
out. In the read-out process, a timing signal is sent to the
ND-DFF, producing a set pulse or a reset pulse according to its
internal state.
The schematic of an ND-DFF is shown in Fig. 5(a). The ND-DFF is
constructed using the idea of the RSFQ B flip-flop template [6].
This cell has three applications. First, it can be used as a data
cache in a DEHP pipeline. Second, it can be used as a data encoder
for the BSFQ global encoding scheme: when switched by timing
signals, it produces a series of set or reset pulses at its
outputs, according to the Boolean signal input on the In(S) and
In(R) rails. Third, it can be used as a direction switcher for a
single data rail, where signals input on the Clk rail are directed
to either the Out(S) rail or the Out(R) rail, as controlled by the
input Boolean signal.
The parameter margins of the ND-DFF are sensitive to parasitic
inductance: a larger parasitic inductance leads to a narrower
margin. Thus, after the initial design of the cell, the parasitic
inductances are extracted and the whole cell is re-optimized. The
optimized BSFQ ND-DFF, with parasitic inductance taken into
account, has a simulated global bias margin ranging from -29% to
+26% at 10 GHz. In simulation, the critical current density, the
IcRn product, and the McCumber parameter of a shunted Josephson
junction are assumed to be 2.5 kA/cm², 0.37 mV, and unity,
respectively, where Ic is the critical current and Rn is the
normal resistance of a shunted Josephson junction.
Fig. 6. (a) BSFQ destructive D flip-flop (D-DFF) cell. Nominal values:
I1 = 0.20 mA, J1 = J5 = J9 = 0.16 mA, J2 = J4 = 0.21 mA, J3 = 0.24 mA,
J6 = J7 = 0.27 mA, J8 = 0.14 mA, J10 = 0.29 mA, J11 = 0.13 mA,
J12 = 0.12 mA, L1 = 9.7 pH, L2 = 1.2 pH, L3 = 0.2 pH, L4 = 1.1 pH,
L5 = 7.9 pH, L6 = L7 = 1.3 pH, L8 = 2.6 pH, L9 = L10 = 1.5 pH,
R1 = R2 = 3.7 Ω. Parasitic inductances are omitted here but included in the
optimization. (b) Moore diagram of a D-DFF cell.
A Moore diagram for ND-DFF is shown in Fig. 5(b). For un-
derstanding operating mechanism of ND-DFF, first we consider
the internal state of this cell is “zero.” When a signal arrives at
In(S) rail, it switches junction and , and the internal state
of the cell becomes “one.” Successive set pulses will be thrown
away by escape junction , for maintaining the state “one” in
the cell. Then, if a reset pulse arrives at In(R), it will switch junc-
tion , , and , and restore the internal state of the cell.
Successive reset pulses will be thrown away by escape junction
. On the other hand, if a timing signal arrives at Clk rail when
the cell is in state “zero,” it will switch junction , and ,
and produce a reset pulse at the Out(R) rail. Switching of junc-
tion maintains the state “zero” of the cell. On the contrary, if
a timing pulse arrives at Clk rail when the cell is in state “one,”
it will switch junction , and , consecutively, and
produce a set pulse to the Out(S) rail. Switching of junction
maintains the state “one” for the cell, ensuring nondestructive
read-out of stored information.
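The state behavior described above can be captured in a small behavioral model. The class NDDFF and the helper encode_global are hypothetical names; the model tracks only the stored state and the output rail, not the junction dynamics, and the encoder is one possible reading of the data-encoder application of the cell.

```python
from typing import List


class NDDFF:
    """Behavioral model of the ND-DFF: read-out does not change the state."""

    def __init__(self) -> None:
        self.state = 0  # internal state "zero"

    def pulse_in(self, rail: str) -> None:
        """Apply a pulse on In(S) or In(R); redundant pulses have no effect."""
        if rail == "S":
            self.state = 1
        elif rail == "R":
            self.state = 0
        else:
            raise ValueError(f"unknown rail: {rail!r}")

    def clk(self) -> str:
        """Timing pulse on Clk: emit on Out(S) or Out(R) and keep the state."""
        return "S" if self.state else "R"


def encode_global(levels: List[int]) -> List[str]:
    """Use an ND-DFF as a data encoder: one output pulse per timing event."""
    cell = NDDFF()
    previous = 0
    rails: List[str] = []
    for level in levels:
        if level != previous:  # locally encoded input: a pulse only on a change
            cell.pulse_in("S" if level else "R")
            previous = level
        rails.append(cell.clk())
    return rails


if __name__ == "__main__":
    print(encode_global([1, 1, 0, 0, 1]))  # ['S', 'S', 'R', 'R', 'S']
```

Because clk() leaves the state untouched, repeated timing pulses keep producing the same rail, which is what allows the cell to supply a pulse at every operation event.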
B. D-DFF Circuit
A BSFQ destructive D flip-flop (D-DFF) is a volatile memory cell
whose internal state returns to its initial state “zero” when it
is switched by a timing signal. This cell is mainly used to build
a BSFQ shift register. The optimized D-DFF, with parasitic
inductance in the layout taken into account, has simulated bias
margins ranging from -30% to +32% at 10 GHz. The schematic of a
D-DFF is shown in Fig. 6(a), and its Moore diagram is shown in
Fig. 6(b).
Consider the cell is at state “zero.” When a signal arrives at
In(S) rail, it switches junction and , resulting in the cell
turning into state “one.” Then, if a reset pulse arrives at In(R),
it will switch junction and , and restore the internal state
of the cell to state “zero” again. On the other hand, if a timing
signal arrives at Clk rail when the cell is in state “zero,” it will
switch junction , , and , and produce a reset pulse at the
Out(R) rail. On the contrary, if a timing pulse arrives at Clk rail
when the cell is in state “one,” it will switch junction , ,
and , consecutively, and produce a set pulse to the Out(S)
rail. The internal state of D-DFF restores to state “zero” after
the switching of junction .
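The destructive read-out can be modeled in the same behavioral style. The class DDFF and the function shift are hypothetical; in particular, the shift-register wiring below (each stage read out into the next, starting from the output end) is an assumed arrangement for illustration, since the text does not describe how the shift register is connected.

```python
from typing import List


class DDFF:
    """Behavioral model of the D-DFF: read-out restores the state to zero."""

    def __init__(self) -> None:
        self.state = 0

    def pulse_in(self, rail: str) -> None:
        """Apply a pulse on In(S) or In(R)."""
        if rail == "S":
            self.state = 1
        elif rail == "R":
            self.state = 0
        else:
            raise ValueError(f"unknown rail: {rail!r}")

    def clk(self) -> str:
        """Timing pulse on Clk: emit on Out(S) or Out(R), then return to zero."""
        rail = "S" if self.state else "R"
        self.state = 0
        return rail


def shift(stages: List[DDFF], rail_in: str) -> str:
    """Advance an assumed D-DFF shift register by one position."""
    rail_out = stages[-1].clk()
    # Read each stage into its successor, from the output end backwards,
    # so that no stage is overwritten before it has been read.
    for i in range(len(stages) - 2, -1, -1):
        stages[i + 1].pulse_in(stages[i].clk())
    stages[0].pulse_in(rail_in)
    return rail_out


if __name__ == "__main__":
    register = [DDFF() for _ in range(3)]
    print([shift(register, r) for r in ["S", "R", "S", "R", "R", "R"]])
    # ['R', 'R', 'R', 'S', 'R', 'S']
```

In the usage example the input sequence S, R, S reappears at the output three shifts later, as expected for a three-stage register.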
VI. CONCLUSION
We have presented a novel global self-timing methodology, DEHP,
for constructing a globally asynchronous, locally asynchronous
BSFQ system. In this methodology, two encoding schemes are used to
represent a Boolean signal: one provides sufficient timing
information for BSFQ local cells, and the other provides
sufficient information for global functional blocks. In addition,
the methodology uses hierarchical pipelining, in which self-timed
micro-pipelines, meta-pipelines, and pipelines are used to build
an arbitrary asynchronous BSFQ system. A nonvolatile memory cell
named ND-DFF and a volatile memory cell named D-DFF have been
constructed to achieve this goal.
REFERENCES
[1] H. Kodaka, T. Hosoki, and Y. Okabe, “Single flux quantum level cir-
cuit using new DC/SFQ,” IEEE Trans. Appl. Supercond., vol. 9, pp.
3729–3732, Jun. 1999.
[2] C. K. Teh and Y. Okabe, “New BSFQ circuit designs with wide mar-
gins,” IEEE Trans. Appl. Supercond., vol. 11, no. 1, pp. 970–973, Mar.
2001.
[3] Z. J. Deng, S. R. Whiteley, and T. Van Duzer, “Data-driven self-timing
of RSFQ digital integrated circuits,” in Ext. Abst. of 5th Int. Supercond.
Electron. Conf., 1995, pp. 189–191.
[4] M. Maezawa, I. Kurosawa, M. Aoyagi, H. Nakagawa, Y. Kameda, and T.
Nanya, “Pulse-driven dual-rail logic gate family based on rapid single-
flux-quantum (RSFQ) devices for asynchronous circuits,” in Proc. of
2nd Int. Symp. on Adv. Res. in Asynchronous Circ. and Syst., 1996, pp.
134–142.
[5] K. K. Likharev and V. K. Semenov, “RSFQ logic/memory family: A
new Josephson-junction digital technology for sub-terahertz-clock-fre-
quency digital systems,” IEEE Trans. Appl. Supercond., vol. 1, pp. 3–28,
Mar. 1991.
[6] S. V. Polonsky, V. K. Semenov, and A. F. Kirichenko, “Single flux
quantum B flip-flop and its possible applications,” IEEE Trans. Appl.
Supercond., vol. 4, pp. 9–18, Mar. 1994.
