Abstract: Recently, we have proposed novel Boolean Single-Flux-Quantum (BSFQ) circuits, which, like CMOS circuits, support Boolean primitives directly and do not require local synchronization for each operation cell. However, the previous BSFQ AND, OR, and XOR cells suffered from narrow margins: their critical margins hardly exceeded ±10% due to low flux gain. Furthermore, although suitable for combinational circuits, the previous BSFQ NOT cell had initialization problems in sequential circuits. In this paper, new versions of these circuits with simulated margins beyond ±30% are proposed. Moreover, a Muller C-element, an error canceller, a destructive read-out (DRO) cell, and a demultiplexer are newly created. The operation time, parameter margins, and circuit size of these BSFQ cells are comparable to those of conventional RSFQ cells.
I. INTRODUCTION

The distinct differences between the Boolean Single-Flux-Quantum (BSFQ) logic system and other single-flux-quantum (SFQ) logic systems lie in how they implement operation timing and in the type of operation primitives they support. In the BSFQ logic system, a Boolean signal is represented by a "set" SFQ pulse at the rising edge of the signal and a "reset" SFQ pulse at the falling edge of the signal [1]. These set and reset pulses are transferred over a dual-rail Josephson transmission line (JTL) to BSFQ cells, where they are converted into superconducting flux levels for performing Boolean operations; the results are output again as set-reset pulses. Thus, BSFQ cells need no local synchronization, and Boolean primitives are supported directly, just as in CMOS logic.
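To make the set/reset encoding concrete, here is a minimal behavioral sketch in Python; it is not a circuit-level model. It assumes a discrete-time level waveform that starts at '0', and the names `encode` and `decode` are hypothetical. Encoding emits a set pulse at each rising edge and a reset pulse at each falling edge; decoding recovers the level by latching the most recent pulse type, loosely mirroring how a BSFQ cell turns incoming pulses into a stored flux level.

```python
from typing import List, Tuple

# A pulse event: (rail, time step), where rail is "S" (set) or "R" (reset).
Pulse = Tuple[str, int]


def encode(levels: List[int]) -> List[Pulse]:
    """Emit a set pulse at each rising edge and a reset pulse at each falling edge.

    The signal is assumed to be at level 0 before the first sample.
    """
    pulses: List[Pulse] = []
    prev = 0
    for t, level in enumerate(levels):
        if level == 1 and prev == 0:
            pulses.append(("S", t))
        elif level == 0 and prev == 1:
            pulses.append(("R", t))
        prev = level
    return pulses


def decode(pulses: List[Pulse], length: int) -> List[int]:
    """Rebuild the level waveform by latching the most recent pulse type."""
    events = {t: rail for rail, t in pulses}
    level, levels = 0, []
    for t in range(length):
        if t in events:
            level = 1 if events[t] == "S" else 0
        levels.append(level)
    return levels


print(encode([0, 1, 1, 0, 0, 1, 0]))  # [('S', 1), ('R', 3), ('S', 5), ('R', 6)]
```

Only level changes generate pulses, so a signal that holds its value costs no SFQ pulses in this sketch.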
In conventional Rapid Single-Flux-Quantum (RSFQ) logic, each operation cell requires a clock signal to implement a timing window for its data signals [2]. Compared to other SFQ circuits, RSFQ circuits require few Josephson junctions. However, clock skew may become a problem in large-scale circuits operating at clock speeds approaching a terahertz, since process variations and thermal fluctuations introduce timing uncertainty into the clock distribution tree. Dual-rail asynchronous SFQ circuits eliminate the need to supply global clock signals to the local operation cells, since timing information is encoded in the dual-rail data signals. However, the number of Josephson junctions required to implement these asynchronous circuits is comparatively large, since the timing information must be decoded from the data signals and then directed to the local operation cells. As a result, the circuits are bulky and their layouts are complicated.
Manuscript received September
There are other types of asynchronous logic systems that do not require local synchronization at all, such as Delay-Insensitive (DI) logic [5]. These circuits can potentially offer high performance, since they operate at average-case rather than worst-case speed. However, DI logic is event-based, which is unusual in modern VLSI technology. Thus, DI logic might be unable to exploit the vast knowledge base of today's VLSI technology, which is built on Boolean primitives. Moreover, DI circuits require large layout areas, and their placement is complicated, since many branches of data signals converge at the same location.
The BSFQ logic system combines the merits of all these logic systems: while requiring only a comparable number of Josephson junctions, BSFQ circuits do not require local synchronization, and they support Boolean primitives directly. Moreover, since BSFQ circuits use SFQ pulses to transfer level information, they can draw on the know-how of the leading RSFQ technology. In fact, the BSFQ logic system can share circuits with RSFQ, so BSFQ may be considered an extension of the RSFQ logic system.
II. PROBLEMS OF PREVIOUS BSFQ CIRCUITS

A. Narrow Margin Problem
In BSFQ circuits, AND and OR cells share the same structure (Fig. 1) [6]. Boolean operations are performed based on a threshold value of the flux level across L2.
However, if we consider a flux quantum trapped inside the loop J1-L1-J2-L2, we find that the flux level across L2 is much smaller than a flux quantum $\Phi_0$, since the inductance L2 must be much smaller than L1 for stable circuit operation. Consequently, the parameters of the following stage have small margins due to the low flux gain across L2. In simulation, the flux gain hardly exceeds $0.1\Phi_0$. As a result, the critical margins (J1, J3) of the previous BSFQ AND, OR, and XOR cells were below ±10%.

B. Incompleteness of Previous NOT Cell

The previous BSFQ NOT cell is constructed by crossing the set and reset rails of the dual-rail JTL. However, unlike in other logic systems using dual rails, a BSFQ inverter must be initialized: in the initial state, when no input pulse has arrived, the output of the cell is in the '0' state instead of the '1' state. Two initialization methods were proposed previously. One is to send an initial set pulse to the output of the cell through a merger [6]. However, this makes the circuit inefficient and complicates the placement of the cell. The alternative is to send a series of set-reset pulses to the dual-rail input of the system [7]. These pulses change the internal flux levels of the cells connected downstream of each inverter. It was shown that this method can initialize all inverters in arbitrary combinational circuits. However, the inverters were found not to work well in sequential circuits, since the initial pulses might not reach some inverters in the circuit. Moreover, for a black-box system, the number of initial pulses required to initialize all inverters is indefinite.
III. NEW BSFQ CIRCUITS
In this section, the critical current density, the $I_cR_n$ product, and the McCumber parameter of a shunted Josephson junction are assumed to be 2.5 kA/cm², 0.37 mV, and unity, respectively, where $I_c$ is the critical current and $R_n$ is the normal resistance of the shunted Josephson junction.
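As a worked example of these assumptions, the characteristic time $\tau_0 = \Phi_0/(2\pi I_c R_n)$, which sets the latency scale quoted for the cells in Table II, follows directly from the assumed $I_cR_n$ product. The sketch below assumes the CODATA value of $\Phi_0$; `tau0` and `IC_RN` are hypothetical names.

```python
import math

PHI_0 = 2.067833848e-15  # magnetic flux quantum in Wb (V*s), CODATA value


def tau0(ic_rn_volts: float) -> float:
    """Characteristic time tau_0 = Phi_0 / (2*pi*Ic*Rn), in seconds."""
    return PHI_0 / (2 * math.pi * ic_rn_volts)


IC_RN = 0.37e-3  # IcRn product assumed in the text, in volts
print(f"tau0     = {tau0(IC_RN) * 1e12:.3f} ps")      # tau0     = 0.889 ps
print(f"3 * tau0 = {3 * tau0(IC_RN) * 1e12:.2f} ps")  # 3 * tau0 = 2.67 ps
```

Under these assumptions, the roughly $3\tau_0$ latency of one standard JTL stage corresponds to about 2.7 ps.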
A. Error Canceller
In BSFQ circuits, the use of dual-rail JTLs to transfer level information can cause two problems. First, error pulses occasionally occur in the circuits due to thermal fluctuations. Such a pulse arriving between a set pulse and a reset pulse puts the subsequent data signals out of order. Second, pulses on one rail may overtake pulses on the other rail at some point in the circuit, especially when high-speed SFQ pulses propagate along a long dual-rail JTL.
Formerly, two escape junctions were added to the inputs of the AND/OR cells to solve the first problem [7]. However, some cells, such as the XOR cell, cannot be modified to include these escape junctions. Thus, we created an error canceller cell for removing unwanted pulses in arbitrary BSFQ cells. To solve the second problem, the two rails of the dual-rail JTL are placed as close together as possible to minimize the effect of process variations, and error cancellers are used to break up long JTLs. Fig. 2 illustrates the schematic of an error canceller cell.
This cell behaves like a JTL if set and reset pulses arrive alternately. However, if two set pulses (two reset pulses) arrive in succession, junction J3 (J7) switches and throws the second pulse out of the rail. In simulation, this cell has a critical margin as wide as ±32% (Table I).
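The error canceller's filtering rule can be sketched as a small state machine. This is an idealized behavioral model on a sequence of pulse labels, not a description of the junction dynamics; the function name is hypothetical, and the assumption that the cell starts out waiting for a set pulse (so a leading reset pulse is dropped) is ours.

```python
from typing import Iterable, List


def error_canceller(pulses: Iterable[str]) -> List[str]:
    """Idealized error canceller on a sequence of "S"/"R" pulse labels.

    Alternating set/reset pulses pass through unchanged, like a JTL. A pulse
    that repeats the type of the previously passed pulse is discarded,
    modeling J3 (set rail) or J7 (reset rail) switching and throwing it out.
    """
    out: List[str] = []
    expected = "S"  # assumed initial state: the cell is waiting for a set pulse
    for p in pulses:
        if p == expected:
            out.append(p)
            expected = "R" if p == "S" else "S"
        # otherwise the pulse is a duplicate and is dropped
    return out


print(error_canceller(["S", "S", "R", "R", "S"]))  # ['S', 'R', 'S']
```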
B. NOT Cell
This new NOT cell is similar to the error canceller cell, except that the set rail and reset rail are interchanged and the bias current I2 is variable. Starting from the initial state, I2 is raised to a higher level and then returned to its initial level. When I2 is raised, junction J5 switches exactly once, and three SFQ pulses occur. One of them propagates to the set rail of the output, and one is trapped inside the loop J5-L7-J9 as the internal flux level of the inverter. The third SFQ pulse propagates toward the input of the cell and is then thrown out of the rail through the buffer junction J2. After initialization, the NOT cell operates in the same way as the error canceller cell. The simulated critical margin of this cell is as wide as ±32%.

C. AND/OR/Muller C-element Cell

Fig. 4 illustrates the new BSFQ AND, OR, and Muller C-element cells. They are constructed from two RSFQ D flip-flops and a dc SQUID. Mutual coupling is used to raise the flux gain of inductance L8. As a result, the flux gain reaches $0.25\Phi_0$ when a flux quantum is trapped inside one of the J4-L4-J8 loops. In the AND cell, junction J9 switches when the flux level across the two inductances L8 is raised to $0.5\Phi_0$, and junction J12 switches when the flux level returns to its initial condition. The inductances L4 and L8 and the mutual inductance between them are extracted from the cell layout using our inductance calculation tool [8]. The calculated critical margin of the AND cell is ±31%. For the OR cell, however, the critical margin is ±21%, which is still too narrow for use in large-scale circuits. Hence, instead of an OR cell, it is better to use a combination of NOT and AND cells for this purpose.
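The flux-threshold behavior of the AND cell can be sketched at the level of Boolean waveforms. This is an idealized Python model, not a circuit simulation: it takes from the text that each input at '1' contributes $0.25\Phi_0$ and that the AND output switches at $0.5\Phi_0$, while the OR threshold of $0.25\Phi_0$ and the names `threshold_gate`, `to_levels`, `bsfq_not`, and `or_via_not_and` are hypothetical. The last function illustrates the suggested alternative to the OR cell via De Morgan's law, a OR b = NOT(NOT a AND NOT b), with an already-initialized NOT cell modeled as plain level inversion.

```python
from typing import List, Tuple

Pulse = Tuple[str, int]  # (rail, time step); rail is "S" (set) or "R" (reset)

FLUX_PER_INPUT = 0.25  # flux contributed by each input at '1', in units of Phi_0
AND_THRESHOLD = 0.5    # AND output switches at 0.5 Phi_0 (from the text)
OR_THRESHOLD = 0.25    # hypothetical: one active input is enough for OR


def threshold_gate(a: List[int], b: List[int], threshold: float) -> List[Pulse]:
    """Emit set/reset pulses whenever the thresholded output level changes."""
    pulses: List[Pulse] = []
    out = 0
    for t, (x, y) in enumerate(zip(a, b)):
        flux = FLUX_PER_INPUT * (x + y)
        new = 1 if flux >= threshold else 0
        if new != out:
            pulses.append(("S" if new else "R", t))
            out = new
    return pulses


def to_levels(pulses: List[Pulse], length: int) -> List[int]:
    """Recover a level waveform by latching the most recent pulse type."""
    events = {t: rail for rail, t in pulses}
    level, levels = 0, []
    for t in range(length):
        if t in events:
            level = 1 if events[t] == "S" else 0
        levels.append(level)
    return levels


def bsfq_not(levels: List[int]) -> List[int]:
    """Idealized, already-initialized NOT cell: inverts the level."""
    return [1 - x for x in levels]


def or_via_not_and(a: List[int], b: List[int]) -> List[int]:
    """a OR b = NOT(NOT a AND NOT b), built from NOT and AND cells only."""
    and_out = threshold_gate(bsfq_not(a), bsfq_not(b), AND_THRESHOLD)
    return bsfq_not(to_levels(and_out, len(a)))


a = [0, 1, 1, 0]
b = [0, 0, 1, 1]
print(threshold_gate(a, b, AND_THRESHOLD))  # [('S', 2), ('R', 3)]
print(or_via_not_and(a, b))                 # [0, 1, 1, 1]
```

In this model the OR function only needs the wide-margin AND threshold, which is the motivation the text gives for avoiding the narrow-margin OR cell.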
Muller C-element cells are required for constructing self-timed circuits. In this cell, a set pulse is released only after each of its inputs has received a set pulse, and a reset pulse is released only after each of its inputs has received a reset pulse.
The critical margin of the Muller C-element cell was calculated to be as wide as ±30%.
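The release rule of the Muller C-element can be sketched on level waveforms: the output rises only when both inputs are '1', falls only when both are '0', and otherwise holds its state. This is an idealized behavioral model; the function name and the initial output level of '0' are assumptions.

```python
from typing import List, Tuple

Pulse = Tuple[str, int]  # (rail, time step); rail is "S" (set) or "R" (reset)


def muller_c(a: List[int], b: List[int]) -> List[Pulse]:
    """Idealized Muller C-element on level waveforms.

    The output rises (set pulse) only once both inputs are '1' and falls
    (reset pulse) only once both inputs are '0'; otherwise it holds its state.
    """
    pulses: List[Pulse] = []
    out = 0  # assumed initial output level
    for t, (x, y) in enumerate(zip(a, b)):
        if x == y and x != out:  # both inputs agree and differ from the output
            out = x
            pulses.append(("S" if out else "R", t))
    return pulses


print(muller_c([0, 1, 1, 1, 0, 0], [0, 0, 1, 0, 0, 1]))  # [('S', 2), ('R', 4)]
```

The hold behavior (no pulse while the inputs disagree) is what makes the cell useful as a rendezvous point in self-timed circuits.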
D. XOR Cell
A new BSFQ XOR cell is constructed from two error cancellers, two mergers, and a modified RSFQ B flip-flop cell (Fig. 5). The error cancellers and mergers line up the input set and reset pulses in alternating order and ensure that the first pulse is a set pulse. The modified B flip-flop then releases an SFQ pulse alternately into the set or reset rail whenever its inputs receive an SFQ pulse. The critical margin of the XOR cell is as wide as ±32%.
Fig. 5. Implementation of BSFQ demultiplexer.
Fig. 6. Implementation of BSFQ DRO cell.
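The XOR principle can be sketched behaviorally: because every input pulse marks a level change, toggling the output on every merged pulse yields the XOR of the two input levels. This idealized Python model abstracts away the exact roles of the two mergers; it assumes each input already alternates set and reset pulses (the job of the error cancellers) and that no two pulses arrive at the same time step. The name `bsfq_xor` is hypothetical.

```python
from typing import List, Tuple

Pulse = Tuple[str, int]  # (rail, time step); rail is "S" (set) or "R" (reset)


def bsfq_xor(a: List[Pulse], b: List[Pulse]) -> List[Pulse]:
    """Idealized XOR: merge both inputs, then toggle the output on every pulse.

    Assumes each input already alternates S, R, S, ... and that no two
    pulses arrive at the same time step.
    """
    # Merging: combine both inputs into one stream ordered by arrival time;
    # which rail a pulse came from no longer matters.
    merged = sorted(a + b, key=lambda p: p[1])
    out: List[Pulse] = []
    next_rail = "S"  # the first output pulse is a set pulse
    for _, t in merged:
        # Modified B flip-flop: alternate between the set and reset output rails.
        out.append((next_rail, t))
        next_rail = "R" if next_rail == "S" else "S"
    return out


print(bsfq_xor([("S", 1), ("R", 4)], [("S", 2), ("R", 6)]))
# [('S', 1), ('R', 2), ('S', 4), ('R', 6)]
```

Simultaneous input pulses are excluded by assumption, since a merger receiving two pulses at once could emit only one and break the parity.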
E. Demultiplexer
Implementing a demultiplexer is easier in BSFQ than in other dual-rail logic. The BSFQ demultiplexer is constructed from two RSFQ T flip-flops (Fig. 5). It splits consecutive pairs of set-reset pulses into two groups, such that adjacent pairs go to different output rails. This cell has only 8 Josephson junctions, which makes it smaller than its counterparts in conventional asynchronous circuits.
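The routing rule of the demultiplexer can be sketched as follows. This idealized model reproduces only the input/output behavior described above (pairs alternate between outputs) and does not model the two T flip-flops themselves; the name `bsfq_demux` is hypothetical, and the input is assumed to be a clean alternating S, R, S, R stream starting with a set pulse.

```python
from typing import List, Tuple

Pulse = Tuple[str, int]  # (rail, time step); rail is "S" (set) or "R" (reset)


def bsfq_demux(pulses: List[Pulse]) -> Tuple[List[Pulse], List[Pulse]]:
    """Route consecutive set-reset pairs alternately to two outputs."""
    out1: List[Pulse] = []
    out2: List[Pulse] = []
    target = out1
    for rail, t in pulses:
        target.append((rail, t))
        if rail == "R":  # a reset pulse closes the current pair
            target = out2 if target is out1 else out1
    return out1, out2


stream = [("S", 0), ("R", 1), ("S", 2), ("R", 3), ("S", 4), ("R", 5)]
first, second = bsfq_demux(stream)
print(first)   # [('S', 0), ('R', 1), ('S', 4), ('R', 5)]
print(second)  # [('S', 2), ('R', 3)]
```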
F. DRO Cell

The BSFQ DRO (destructive read-out) cell is implemented using an RSFQ D2 flip-flop cell (Fig. 6). A write pulse raises the flux level of the inductance, and when the cell then receives a read pulse, it releases a set pulse to the output. If the flux level of the inductance is low, the arrival of a read pulse results in the output of a reset pulse.

IV. BASIC CHARACTERISTICS OF BSFQ CELLS

Table II shows the basic characteristics of the BSFQ cells. The latency of BSFQ cells is small compared to that of conventional asynchronous circuits. Note that the latency of one stage of a standard JTL is about $3\tau_0$, where $\tau_0$ is defined as

$$\tau_0 = \frac{\Phi_0}{2\pi I_c R_n},$$

where $I_c$ is the critical current and $R_n$ is the normal resistance of a shunted Josephson junction.

Only the AND, OR, and Muller C-element cells have been designed and tested at this moment. However, since the other cells do not use mutual coupling structures and are not much different from conventional RSFQ cells, there is no reason to doubt that they will work in practice. Figs. 8, 9, and 10 show the experimental results of low-frequency tests of the new AND, OR, and Muller C-element cells, respectively. The test chip was fabricated by NEC Corporation using their standard Nb/AlOx/Nb process [9]. Set-reset input pulses were generated by a BSFQ level-to-pulse converter, and dc-voltage outputs were obtained by a BSFQ pulse-to-level converter. The measured global bias margins for the AND, OR, and Muller C-element cells were ±13%, ±6%, and ±10%, respectively.

VI. CONCLUSION

The BSFQ logic system is a flux-level logic system that directly supports Boolean primitives, just as CMOS logic does, and does not require local synchronization for each operation cell. The new BSFQ fundamental cells offer wide margins, which increase their suitability for large-scale circuits and for circuits whose switching speeds approach a terahertz. The hardware complexity and latency of a BSFQ cell are small compared to those of conventional asynchronous circuits.
Further work is in progress to develop a global self-timed scheme for the BSFQ logic system.
