Abstract-We have developed a Data-Driven Self-Timed (DDST) Rapid-Single-Flux-Quantum (RSFQ) demultiplexer (demux) for the interface between on-chip high-speed RSFQ circuits and off-chip low-speed circuits. To eliminate the timing problems of a synchronous clocking system, we employed the DDST architecture, in which the clock signal is localized within a 2-bit basic demux module and dual-rail lines transfer the timing information between modules. A larger demux can be produced simply by connecting the 2-bit modules in a tree structure. The DDST demux was designed for 10 Gb/s operation with sufficient dc bias margin using the HYPRES 1 kA/cm² Nb process. We have successfully tested operation of the 2-bit demux up to 18 GHz using the DDST on-chip high-speed test system developed in our group.
In the on-chip high-speed test system, input signals are first stored in the input buffer at low frequency. A high-speed test is then performed by triggering the on-chip clock generator, which provides a high-speed clock signal to the circuit under test and to the input and output buffers. The output signal is then read out by triggering the output buffer at low frequency. We have built the on-chip high-speed test system from DDST circuits and demonstrated its correct functionality up to 20 GHz [9].
In this paper, we investigate the high-speed performance of the DDST demux using the DDST on-chip high-speed test system.
I. INTRODUCTION
Because of the ultra-high operating frequency and very low output voltage level of rapid single flux quantum (RSFQ) digital circuits [1], there is strong demand for a demultiplexer (demux) that converts high-speed serial data from the RSFQ circuits into low-speed parallel data that semiconductor circuits can process. So far, two types of architecture have been proposed for the demux: a binary tree architecture [2], [3], and a shift-and-dump architecture [4]. Though correct operation has been confirmed at relatively low frequency for both architectures, building a larger system appears difficult because of the timing design required between the clock and the data signal at very high frequency.
We have developed a demux using a data-driven self-timed (DDST) architecture [5], [6]. In this asynchronous architecture, no global clock is used; a localized clock is used only within the DDST basic modules. Between modules, complementary data carry the timing information over the dual data line. We have reported successful functional operation of 4-bit and 8-bit DDST demux systems at low frequency with sufficient bias margin [7]. Functional testing of RSFQ circuits at frequencies above several gigahertz is, however, extremely difficult because the output voltage level of an RSFQ circuit is very small and precise control of the skew between the signals and the clock becomes very hard. One way to solve these problems is to use an on-chip high-speed test system [8], [9], where a high-speed clock generator and input/output buffers are integrated on the chip with the circuit under test.

Manuscript received September 15, 1998.
II. CIRCUIT DESIGN OF DEMULTIPLEXER
As shown in Fig. 1, the basic 2-bit demux module consists of two modified D flip-flops (D2FF) with complementary outputs, a T flip-flop (TFF), a confluence buffer, and four pulse splitters [7]. Complementary input data "In" and "$\overline{\mathrm{In}}$" are applied to the module, where an SFQ pulse on "In" ("$\overline{\mathrm{In}}$") represents logical 1 (0). These data are stored in both D2FFs, and the TFF triggers the D2FFs alternately, so that the complementary data appear alternately on ("Out1", "$\overline{\mathrm{Out1}}$") and ("Out2", "$\overline{\mathrm{Out2}}$"). A larger demux system can be produced simply by connecting the 2-bit demux modules in a tree structure. A similar demux architecture based on dual-rail lines has been reported in [10].
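The logical behavior of the module and of the tree connection can be illustrated with a short behavioral sketch. It models only the bit routing, not SFQ pulses, dual-rail signaling, or timing; the function names `demux2` and `demux_tree` are hypothetical and are introduced here for illustration only.

```python
def demux2(bits):
    """Route a serial bit stream alternately to two outputs.

    The alternation stands in for the TFF that triggers the two D2FFs in turn.
    """
    out1, out2 = [], []
    for index, bit in enumerate(bits):
        (out1 if index % 2 == 0 else out2).append(bit)
    return out1, out2


def demux_tree(bits, stages):
    """Model a 1-to-2**stages demux as a binary tree of 2-bit modules.

    Leaves are returned in tree order, which is bit-reversed with respect
    to the arrival order of the serial input bits.
    """
    streams = [list(bits)]
    for _ in range(stages):
        next_streams = []
        for stream in streams:
            next_streams.extend(demux2(stream))
        streams = next_streams
    return streams


if __name__ == "__main__":
    pattern = [int(c) for c in "1111101010111110"]
    print(demux2(pattern))
    print(demux_tree(range(8), 2))
```

For a two-stage tree, each leaf carries every fourth input bit, so each added stage halves the data rate seen by the following stage.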
One of the key components of the DDST demux, which determines the margins and the maximum operating frequency, is the D2FF [7]. Its circuit diagram is shown in Fig. 2. Though we have previously reported a parameter set for the D2FF [7], we optimized it again using the Monte Carlo circuit optimization method [11] in order to improve the yield. We used MJSIM [12] to estimate the theoretical circuit yield and to optimize the circuit parameters. MJSIM generates a number of netlists for JSIM [13] in which all parameter values of the circuit (the critical current $I_c$, the resistance $R$, and the inductance $L$) are varied randomly, in the same way as fabrication process variations, and automatically calculates the circuit yield. It can also find an improved parameter set using an optimization algorithm based on the center-of-gravity method [14]. In this algorithm, an improved parameter set is derived by averaging the circuit parameter vectors that lie inside the operational region. The advantage of this method is its low computational cost compared with other optimization methods as the number of circuit parameters increases. Our new parameter set is given in the caption of Fig. 2; the 24-dimensional vector representing the circuit parameter set was optimized after 43 evaluations of the Monte Carlo circuit yield.
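The center-of-gravity update can be sketched as follows. This is a minimal illustration and not MJSIM itself: the pass/fail predicate `works` is a hypothetical stand-in for a JSIM transient simulation, and the two-parameter toy circuit, the Gaussian spread, and the trial counts are all assumed values.

```python
import random


def works(params):
    # Hypothetical pass/fail test standing in for a transient circuit
    # simulation: the toy circuit operates when every normalized
    # parameter lies in [0.8, 1.4].
    return all(0.8 <= p <= 1.4 for p in params)


def center_of_gravity_step(nominal, rng, sigma=0.1, trials=1000):
    """Run one Monte Carlo yield evaluation.

    Returns the estimated yield and the average of the passing parameter
    vectors, which serves as the improved nominal parameter set.
    """
    passing = []
    for _ in range(trials):
        sample = [p * (1.0 + rng.gauss(0.0, sigma)) for p in nominal]
        if works(sample):
            passing.append(sample)
    yield_estimate = len(passing) / trials
    if not passing:
        return yield_estimate, list(nominal)
    improved = [sum(column) / len(passing) for column in zip(*passing)]
    return yield_estimate, improved


def optimize(nominal, evaluations=10, seed=0):
    """Repeat the update; return the final nominal set and the yield history."""
    rng = random.Random(seed)
    history = []
    for _ in range(evaluations):
        yield_estimate, nominal = center_of_gravity_step(nominal, rng)
        history.append(yield_estimate)
    return nominal, history


if __name__ == "__main__":
    final, history = optimize([0.9, 0.9])
    print(final, history)
```

Each evaluation costs one batch of simulations regardless of the number of parameters, which is the source of the low computational cost noted above.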
Circuit parameters of the 2-bit demux were optimized for an input data rate of 10 Gb/s, assuming a Nb Josephson process with a critical current density of 1 kA/cm². The simulated dc-bias margin is ±28% for global bias changes. This value is slightly smaller than the previous value (±29%) because the circuit is optimized for maximum circuit yield. Figure 3 shows the dependence of the operating range of the 2-bit demux for global bias changes on the operating frequency. The lower margin gradually decreases as the frequency increases, while the upper margin does not depend on the frequency. The degradation of the margin arises from a discrepancy in the relative timing between the clock and the data at low bias voltages. Figure 4 shows the dependence of the theoretical circuit yield of the 2-bit demux on the local 3σ spread of the critical current $I_c$, where we assume a 10% global (chip-to-chip) 3σ spread on $I_c$ and $L$, a 15% global 3σ spread on $R$, and a 10% local (component-to-component) 3σ spread on $R$ and $L$ [11]. A 100% circuit yield is obtained for a 10% local 3σ spread on $I_c$.
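One way to draw a single chip instance under these spread assumptions is sketched below. The 3σ values are those quoted above, with the local $I_c$ spread set to 10%; combining the global and local deviations multiplicatively, using Gaussian distributions, and the component names and nominal values in the example are assumptions made only for illustration.

```python
import random

# 3-sigma spreads as fractions of the nominal value.
GLOBAL_3SIGMA = {"Ic": 0.10, "L": 0.10, "R": 0.15}   # chip-to-chip
LOCAL_3SIGMA = {"Ic": 0.10, "L": 0.10, "R": 0.10}    # component-to-component


def sample_chip(nominal, global_3s=GLOBAL_3SIGMA, local_3s=LOCAL_3SIGMA, rng=random):
    """Return one randomized parameter set for a chip.

    nominal maps a component name to (kind, value), with kind in
    {"Ic", "L", "R"}. One global deviation per kind is shared by the whole
    chip; each component also receives its own local deviation.
    """
    # A 3-sigma spread s corresponds to a standard deviation of s / 3.
    global_dev = {kind: rng.gauss(0.0, s / 3.0) for kind, s in global_3s.items()}
    chip = {}
    for name, (kind, value) in nominal.items():
        local_dev = rng.gauss(0.0, local_3s[kind] / 3.0)
        chip[name] = value * (1.0 + global_dev[kind]) * (1.0 + local_dev)
    return chip


if __name__ == "__main__":
    # Hypothetical component names and nominal values.
    nominal = {"J1": ("Ic", 0.25), "L1": ("L", 4.0), "R1": ("R", 2.0)}
    print(sample_chip(nominal, rng=random.Random(0)))
```

The circuit yield is then the fraction of such sampled chips for which the simulated circuit operates correctly.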
III. DDST ON-CHIP HIGH-SPEED TEST SYSTEM
The block diagram of the DDST on-chip high-speed test system for the 2-bit demux is shown in Fig. 5. The system has an input 4-bit DDST shift register for loading data, two output 4-bit DDST shift registers for reading data, and a 4-bit 20 GHz clock generator. First, data "1010" are loaded into the input shift register at low frequency, and the initial values "00" in the output shift registers are shifted out on both Out1 and Out2. Then, one initial signal is sent to the clock generator, which produces four bits of "0" to drive the loaded data "1010" out of the input shift register and into the demux under test at very high frequency. Simultaneously, the high-speed output signal from the demux is stored in the two output shift registers, and "0"s are shifted out on Out1 and Out2. Because this output signal is too fast, no output is observed by the low-speed measurement system.
Finally, when the next data "1011 1110" are applied at low frequency, the data stored in the output registers are read out as "00 11" on Out1 and "00 00" on Out2. The resulting input and output signals are shown in Fig. 5b. Note that 12 bits of "0" are loaded after the above data sequence to reset all shift registers. When no initial signal is applied, on the other hand, the output signals from the demux are simply shifted by 6 bits by the input and output shift registers. Observe the difference between the output data patterns with and without the initial signal, which is indicated by shaded regions in the figure.
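The load, burst, and readout sequence can be followed with a small behavioral model. It is a sketch under simplifying assumptions: each DDST shift register is treated as a 4-bit first-in first-out register in which every incoming bit ejects the oldest stored bit, all registers start at "0", and only the true outputs Out1 and Out2 are modeled. The class names `ShiftRegister` and `OnChipTester` are hypothetical.

```python
from collections import deque


class ShiftRegister:
    """Data-driven shift register: each bit pushed in ejects the oldest bit."""

    def __init__(self, depth):
        self.cells = deque([0] * depth)

    def push(self, bit):
        self.cells.append(bit)
        return self.cells.popleft()


class OnChipTester:
    """Input register -> 2-bit demux -> two output registers."""

    def __init__(self):
        self.sr_in = ShiftRegister(4)
        self.sr_out = [ShiftRegister(4), ShiftRegister(4)]
        self.toggle = 0  # demux state: output branch that receives the next bit

    def _feed(self, bit):
        """Push one bit in; return the branch used and the bit it ejected."""
        to_demux = self.sr_in.push(bit)
        branch = self.toggle
        self.toggle ^= 1
        return branch, self.sr_out[branch].push(to_demux)

    def load_slow(self, bits):
        """Low-frequency input; the ejected output bits are observable."""
        seen = ([], [])
        for b in bits:
            branch, out = self._feed(int(b))
            seen[branch].append(out)
        return seen

    def burst(self, n=4):
        """High-speed burst of n '0' bits; the ejected bits are not observed."""
        for _ in range(n):
            self._feed(0)


def run(with_initial):
    system = OnChipTester()
    system.load_slow("1010")            # load data at low frequency
    if with_initial:
        system.burst()                  # initial signal starts the clock generator
    return system.load_slow("10111110")  # read out at low frequency


if __name__ == "__main__":
    print(run(with_initial=True))
    print(run(with_initial=False))
```

With the initial signal, the model returns "0011" on Out1 and "0000" on Out2 during readout, in agreement with the sequence described above; without it, only "0"s appear in this readout window because the data are still inside the registers.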
IV. FABRICATION AND TESTING
The 2-bit DDST demux with the on-chip high-speed test system was fabricated using the standard HYPRES Nb Josephson technology with a critical current density of 1 kA/cm². A micrograph of the on-chip high-speed test system is shown in Fig. 6. The 2-bit demux measures about 390 µm × 550 µm. Figure 7 shows test results of the 2-bit demux at low frequency (1 kHz) with an input data pattern of (1111 1010 1011 1110). Fully correct operation is observed, with output data patterns of (11 11 11 11) in Out1 and (11 00 01 10) in Out2. The measured dc-bias voltage margin of this circuit is ±4.0% at low frequency. The degradation of the dc-bias margin relative to the previously reported circuit [7] is thought to be due to large local variation of the junction parameters and some flux trapping in the circuit.

Fig. 8. Test results of a 2-bit demux at high frequency (16 GHz). (a) Input and output data patterns with the initial signal. (b) Data patterns without the initial signal. Each transition corresponds to an input or output of one SFQ pulse.

Test results of the 2-bit demux at high frequency (16 GHz) are shown in Fig. 8. Fig. 8a is the data pattern with the initial signal, and Fig. 8b is the data pattern without the initial signal. The data pattern into the on-chip high-speed test system is (1010 1011 1110 0000 0000 0000), as described in Fig. 5b. Fully correct operation is observed, with complementary output data patterns of (11 11 00 11 00 00) in $\overline{\mathrm{Out1}}$ and (11 11 11 11 10 01) in $\overline{\mathrm{Out2}}$ with the initial signal, and (11 11 11 00 00 00) in $\overline{\mathrm{Out1}}$ and (11 11 11 11 10 01) in $\overline{\mathrm{Out2}}$ without the initial signal.
Correct output operations are also observed in Out1 and Out2.
We have measured the dependence of the dc-bias margin of the circuit on the frequency of the clock generator. The measured results are plotted as circles. The circuit operates correctly up to 18 GHz, though the dc-bias margin is small. It should be noted that we estimate the frequency of the clock generator by computer simulation. Good agreement between the clock generator frequencies calculated from simulation and those measured with a spectrum analyzer has already been verified in our previous testing [9].
V. CONCLUSION
We have demonstrated proper operation of the 2-bit DDST demux up to 18 GHz using the DDST on-chip high-speed test system. By employing the DDST architecture, the difficulty of designing the timing between the clock and the data signals in the high-speed RSFQ circuit and the test system was dramatically reduced.
