Abstract-We investigated a single flux quantum (SFQ) multi-input merger composed of Josephson transmission lines (JTLs), a dc-SQUID stack magnetically coupled to the JTLs, and a dc/SFQ converter. The new merger can merge many input signals more efficiently than a conventional merger circuit, which is a two-input SFQ confluence buffer (CB). In this paper, we designed and optimized the multi-input merger using analog circuit simulation. The simulation results show that mergers with up to 16 inputs operate correctly. We implemented a test circuit and demonstrated high-speed operation of a four-input merger at input frequencies of up to 23.3 GHz. We evaluated the delay time and the circuit scale of a practical multi-input merging circuit for an SFQ memory system, comparing the newly designed merger with the conventional merging circuit. If we design a 4096-input merging circuit using a tree of 16-input mergers, we can reduce the delay time, the number of Josephson junctions (JJs), and the total power dissipation compared with a merging circuit based on a conventional CB tree. The reductions in the delay, the number of JJs, and the total power are approximately 11%, 35%, and 53%, respectively.
Index Terms-Confluence buffer, dc/SFQ, dc-SQUID, merger, SFQ, shift register memory.
I. INTRODUCTION

SINGLE flux quantum (SFQ) circuits have been widely studied and are expected to be one of the candidates for next-generation integrated circuit technology owing to their high-speed operation and low power dissipation [1]. Thus far, many important systems using SFQ circuits, such as microprocessors [2], [3], analog-to-digital converters [4], and a readout system for a multi-channel superconducting detector array [5], have been proposed and implemented. In such applications, the function of merging many signals plays an important role. For example, in the readout electronics of the superconducting single photon detector (SSPD), signals from a large number of SSPDs are merged into one output to detect the photon input [5], [6]. In the SFQ shift register (SR) memory system [7] and cache memory [8] for SFQ microprocessors, the output data from each register are merged and read out at a high frequency. In such systems, conventional two-input confluence buffers (CBs) [1] have been used as the merging circuit. However, the number of inputs of a CB is limited to two because of constraints on the physical layout and the operating margin. Therefore, as the number of signals to be merged increases, the delay time and the circuit scale of a merging circuit composed of a two-input CB tree increase drastically. These merging-circuit problems could become a bottleneck for the whole system.
To overcome the above-mentioned problems, we investigated, designed, and tested a novel multi-input merger that uses a SQUID stack, which detects the input SFQ signal, and a dc/SFQ converter. We also evaluated the performance of a merging circuit for practical applications assuming the use of the new multi-input merger. With this merger, many systems can be designed more efficiently in terms of delay, circuit scale, and power.
II. MULTI-INPUT MERGER USING SQUID STACK
Fig. 1 shows the circuit schematic of a multi-input merger using a SQUID stack. The multi-input merger is composed of Josephson transmission lines (JTLs) corresponding to the input channels, a dc-SQUID array magnetically coupled to each input JTL, and a dc/SFQ converter, which is specially designed for the multi-input merger.
The serially connected dc-SQUIDs are current-biased by a bias current $I_b$ close to the threshold current. When an SFQ signal is input to one of the JTLs, the flux quantum that propagates in the JTL applies a magnetic field to the SQUID via the mutual inductance, and the SQUID switches to its voltage state. The McCumber parameter of the Josephson junctions (JJs) in the input JTL is set to 0.1 to obtain a long coupling time. The SQUID switching generates an output current pulse. The dc/SFQ converter detects the current pulse from the dc-SQUID stack and outputs an SFQ pulse to the output terminal.
Determining the circuit parameters of the dc-SQUID is important in designing the multi-input merger because the operating margin and frequency of the merger are limited by the threshold characteristic and resetting time of the dc-SQUID. We optimized the $\beta_L$ value of the SQUID, which is expressed as
$$\beta_L = \frac{2 L I_c}{\Phi_0}$$
where $L$ is the loop inductance, $I_c$ is the critical current of the JJ, and $\Phi_0$ ($= 2.07 \times 10^{-15}$ Wb) is the flux quantum. All inductances of the merger, including the mutual inductance between the input JTL and the dc-SQUID, were extracted from the circuit layout using the three-dimensional inductance extraction tool InductEx [9]. The optimized $\beta_L$ value was 3.08.
The operating frequency of the merger circuit is roughly limited by the rise time and fall time of the output current from the dc-SQUID stack. The corresponding time constants, $\tau_{RC}$ and $\tau_{RL}$, are set by $C_{\mathrm{stack}}$ and $L_{\mathrm{stack}}$, the total capacitance and inductance of the SQUID stack, respectively. As the number of inputs increases, both $C_{\mathrm{stack}}$ and $L_{\mathrm{stack}}$ become larger and the maximum operating frequency of the multi-input merger deteriorates. The maximum number of inputs of the merger circuit therefore depends on the frequency required by the application.
The pulse height of the output current from the dc-SQUID stack was 60 µA, and this input current could not drive the dc/SFQ converter in the conventional cell library [10]. Therefore, we optimized the dc/SFQ converter to obtain a higher current sensitivity. Fig. 2(a) shows the circuit configuration and circuit parameters of the optimized dc/SFQ converter. The circuit simulation result is shown in Fig. 2(b). We used JSIM [11] to simulate and optimize the dc/SFQ converter and the multi-input merger. The dc/SFQ converter has an offset current $I_{\mathrm{offset}}$ to obtain a higher input sensitivity. JJs $J_1$ and $J_2$ are biased close to their critical currents by $I_{\mathrm{offset}}$. When a current pulse is input to the dc/SFQ converter, $J_1$ and $J_2$ switch, and the output SFQ pulse is obtained.
[Fig. 2 caption fragment: ... 11 pH, $L_6$ = 3.42 pH, and $I_{\mathrm{offset}}$ = 350 µA.]
The simulation results indicate that the minimum current pulse height required to operate the dc/SFQ converter is 40 µA.
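As a numerical illustration of the SQUID design constraint in this section, the following minimal sketch assumes the usual dc-SQUID screening-parameter definition $\beta_L = 2 L I_c / \Phi_0$ and shows how the reported optimum $\beta_L = 3.08$ fixes the product $L I_c$. The critical current used in the example (100 µA) is a hypothetical value chosen only for illustration; it is not given in the text, and the function names are likewise illustrative.

```python
PHI_0 = 2.07e-15  # flux quantum in Wb, value quoted in the text


def beta_l(loop_inductance_h: float, critical_current_a: float) -> float:
    """Screening parameter beta_L = 2 * L * I_c / Phi_0 (dimensionless)."""
    return 2.0 * loop_inductance_h * critical_current_a / PHI_0


def loop_inductance_for(beta: float, critical_current_a: float) -> float:
    """Loop inductance (H) that yields the given beta_L at a given I_c."""
    return beta * PHI_0 / (2.0 * critical_current_a)


if __name__ == "__main__":
    beta_opt = 3.08      # optimized value reported in the text
    ic_example = 100e-6  # hypothetical I_c in A; not given in the text
    l_example = loop_inductance_for(beta_opt, ic_example)
    print(f"L * I_c product: {beta_opt * PHI_0 / 2:.3e} Wb")
    print(f"L for the example I_c: {l_example * 1e12:.1f} pH")
```

Under these assumptions, any pair of $L$ and $I_c$ with the same product gives the same $\beta_L$, so a smaller junction critical current would call for a proportionally larger loop inductance.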
We simulated a four-input merger using a SQUID stack assuming the use of the National Institute of Advanced Industrial Science and Technology (AIST) 2.5-kA/cm² Nb standard process 2 (STP2) [12], [13]. The number of JJs in the four-input merger is 19. The parasitic capacitance between the wiring layer of the SQUID stack and the ground plane, which is 15 fF/channel, was taken into account in all simulations in this study. Fig. 3 shows the input and output waveforms of the four-input merger using a SQUID stack. Each input SFQ pulse is correctly merged, and four output SFQ pulses are obtained from the output terminal. When two SFQ signals are input to two channels of the merger with a time difference of less than 19.7 ps at a bias voltage of 2.5 mV, the merger outputs only one SFQ pulse. This behavior is the same as that of the conventional confluence buffer. The normalized bias voltage margin of the four-input merger is 78%-134% at a low operating frequency. The delay time of the four-input merger is 36.6 ps when the applied bias voltage is 2.5 mV, which is the standard voltage of the CONNECT cell library [13]. The margin of $I_b$, normalized by the designed value, is 99.2%-103.9%. The simulated maximum operating frequency of the four-input merger using the SQUID stack is 43 GHz.
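The merging behavior described above can be sketched as a simple timing model. This is only a behavioral illustration, not a circuit simulation: it uses the simulated delay (36.6 ps) and the coincidence window (19.7 ps) quoted in the text, and it assumes that a pulse arriving within the window of the last accepted pulse is absorbed. The function name and that absorption rule are assumptions made for this sketch.

```python
from typing import Iterable, List, Optional

DELAY_PS = 36.6         # simulated delay of the four-input merger at 2.5 mV (from the text)
MERGE_WINDOW_PS = 19.7  # pulses closer than this yield a single output (from the text)


def merge_pulses(channels: Iterable[Iterable[float]],
                 delay_ps: float = DELAY_PS,
                 window_ps: float = MERGE_WINDOW_PS) -> List[float]:
    """Return output pulse times (ps) for input pulse times given per channel."""
    # The merger does not distinguish channels, so pool all arrivals in time order.
    arrivals = sorted(t for channel in channels for t in channel)
    outputs: List[float] = []
    last_accepted: Optional[float] = None
    for t in arrivals:
        # Assumption: a pulse within the window of the last accepted pulse is absorbed.
        if last_accepted is None or t - last_accepted >= window_ps:
            outputs.append(t + delay_ps)
            last_accepted = t
    return outputs


if __name__ == "__main__":
    # One pulse per channel, well separated: four output pulses are expected.
    print(merge_pulses([[0.0], [100.0], [200.0], [300.0]]))
    # Two pulses 10 ps apart on different channels: merged into one output.
    print(merge_pulses([[0.0], [10.0], [], []]))
```

The model ignores the frequency dependence of the margins and the recovery time of the SQUID stack, so it is suitable only for reasoning about pulse counts at low input rates.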
We also simulated 8- and 16-input mergers. The numbers of JJs in the 8- and 16-input mergers are 35 and 67, respectively. Fig. 4 compares the bias voltage margins of the 4-, 8-, and 16-input mergers as a function of the operating frequency. The operating margins deteriorate as the operating frequency increases because of the finite resetting time of the dc-SQUID stack. Although the time constants of the SQUID stack are the main factor that restricts the recovery time of the bias current flowing in the SQUID stack, the operating frequency of the mergers was limited not only by the time constants of the stack but also by the switching time of the Josephson junctions in the dc/SFQ converter.
III. MEASUREMENT OF THE FOUR-INPUT MERGER USING THE SQUID STACK
We designed the four-input merger and its test circuit using the AIST STP2. The cell size and the number of JJs of the four-input merger were 160 µm × 80 µm and 19, respectively. Fig. 5 shows the block diagram and a microphotograph of the test circuit. We employed the on-chip high-speed test technique [14] in this test circuit, which allowed us to perform both low- and high-speed tests. To measure the circuit at a high input frequency, a 4-bit high-speed SFQ signal train was generated by the on-chip clock generator (CG) by applying the "CG_in" signal in Fig. 5, and was input to one of the input channels of the merger, i.e., In1. We chose this test pattern because the circuit simulation indicated that it is the most severe test sequence, requiring the longest recovery time. The high-speed output signals from the merger were first stored in the 4-bit SR. By reading out the stored data in the SR using low-frequency readout signals, represented by "Clk" in Fig. 5, from the pulse pattern generator at room temperature, the number of output SFQ pulses can be measured.
We confirmed correct operation with a normalized bias voltage margin of 93.0%-136.2% in the low-speed test. Fig. 6 shows the high-speed test result of the four-input merger using a SQUID stack, obtained by the on-chip high-speed test. Fig. 7 shows the dependence of the measured bias voltage margin on the input frequency. We obtained correct operation up to an input frequency of 23.3 GHz. Above 23.3 GHz, we could not measure the merger because the CG malfunctioned.
Fig. 7. Comparison of the measured and simulated results of the dependence of the bias voltage of the four-input merger using a SQUID stack on its operating frequency. In this measurement, $I_{\mathrm{offset}}$ of 380 µA was supplied.
IV. PERFORMANCE ESTIMATION OF MULTI-INPUT MERGING CIRCUIT USING THE NEW MERGER
Using the investigated multi-input merger, we can efficiently design a merging circuit for practical large-scale applications. To quantitatively evaluate the effectiveness of the multi-input merger, we estimated the total delay and the circuit scale of the merging circuit as a function of the number of inputs, assuming the use of 16-input mergers with a delay of 42.0 ps arranged in a tree structure. In this evaluation, we assumed 16-input mergers with a maximum operating frequency of 26 GHz. This value is a typical operating frequency of SFQ digital circuits that require the merging function, such as an SFQ shift register memory system [7]. Fig. 8 shows the dependence of the delay of the merging circuit on the number of inputs. Fig. 8 also shows the values for the conventional merging circuit composed of a two-input CB tree. These estimates assume the use of passive transmission line (PTL) wiring [15] for both the two-input CB tree and the 16-input merger circuits. The sharp increase in the delay of the 16-input-merger-based circuit from $2^4$ to $2^5$ inputs is caused by the increase in the number of stages in the merger tree from 1 to 2. As the number of inputs increases, the delay of the PTL becomes the dominant factor in the total delay of the merging circuit.
The number of JJs required for a merging circuit using 16-input mergers can be expressed as 70 × n, where 70 is the number of JJs of a 16-input merger together with a driver and a receiver for PTL wiring [15], and n is the number of 16-input mergers in the merging circuit. Similarly, when we use a two-input CB tree, the number of JJs can be expressed as 10 × n, where 10 is the total number of JJs of a CB with a driver and a receiver, and n is the number of CBs in the merging circuit. When the number of inputs is 4,096, the total numbers of JJs of the merging circuits composed of 16-input mergers and of the conventional two-input CB tree are 19,110 and 40,950, respectively.
The advantage of using the merger composed of the dc-SQUID stack is enhanced with the increase in the number of the merging circuit inputs in terms of the number of JJs.
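The JJ counts quoted above follow from counting the nodes of each tree. The following sketch reproduces that arithmetic under the assumption that every stage uses the minimum number of nodes and that a partially filled node still costs a full node; the function name is illustrative, while the per-node JJ counts (70 and 10) are the values given in the text.

```python
from typing import Tuple


def tree_stats(n_inputs: int, fan_in: int, jj_per_node: int) -> Tuple[int, int, int]:
    """Return (stages, nodes, total JJs) for a merging tree with the given fan-in."""
    stages = 0
    nodes = 0
    signals = n_inputs
    while signals > 1:
        # Ceiling division: number of nodes needed to merge the current signals.
        signals = -(-signals // fan_in)
        nodes += signals
        stages += 1
    return stages, nodes, nodes * jj_per_node


if __name__ == "__main__":
    # 16-input merger tree: 70 JJs per merger including PTL driver and receiver.
    print(tree_stats(4096, 16, 70))
    # Conventional two-input CB tree: 10 JJs per CB including driver and receiver.
    print(tree_stats(4096, 2, 10))
```

For 4,096 inputs this gives 273 mergers in 3 stages (19,110 JJs) and 4,095 CBs in 12 stages (40,950 JJs), matching the totals stated in the text. It also shows the step from one stage to two when the number of inputs goes from $2^4$ to $2^5$.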
In addition to the above estimation, we calculated the static power $P_s$ and the dynamic power $P_d$ and compared the total power dissipation of the 4096-input merging circuits composed of a two-input CB tree and of 16-input merger circuits. The static power $P_s$ and the dynamic power $P_d$ were calculated following [16]. Assuming an operating frequency of 20 GHz, the calculated $P_s$ and $P_d$ of the merging circuit using a 16-input merger circuit tree are 3.4 mW and 0.14 mW, whereas $P_s$ and $P_d$ of the conventional CB-based merging circuit are 8.1 mW and 0.31 µW, respectively.
According to this delay, circuit scale and power estimation, we can reduce the delay time, the number of JJs and the total power dissipation of a 4096-input merging circuit, which can be used for a 32 × 4k bit SFQ SR memory system, by 11%, 35%, and 53%, respectively, compared with the merging circuit composed of the conventional CB tree.
V. CONCLUSION
We have investigated a multi-input merger that uses a dc-SQUID stack and a dc/SFQ converter. We have optimized the circuit configuration and parameters of the SQUID stack and the dc/SFQ converter according to the circuit simulation results for high-speed and stable operation. We implemented the test circuit of a four-input merger using the AIST 2.5-kA/cm² Nb STP2. We demonstrated high-speed operation of the four-input merger up to an input frequency of 23.3 GHz. Assuming the use of a tree of 16-input mergers using a SQUID stack in the design of a 4096-input merging circuit, we can reduce the delay time, the number of JJs, and the total power dissipation by approximately 11%, 35%, and 53%, respectively, compared with the merging circuit composed of a conventional two-input CB tree. Using multi-input mergers, we can design various systems more efficiently in terms of the delay time, circuit scale, and power dissipation.
ACKNOWLEDGMENT
The circuits were fabricated in the clean room for analog-digital superconductivity (CRAVITY) of the National Institute of Advanced Industrial Science and Technology (AIST) with the standard process 2 (STP2). The AIST-STP2 is based on the Nb circuit fabrication process developed at the International Superconductivity Technology Center (ISTEC).
