I. INTRODUCTION

CRYOPACKAGING is necessary to utilize ultra-high-speed single flux quantum (SFQ) circuits [1] in practical systems. High-speed, extremely low-power operation of SFQ digital and mixed-signal circuits, such as analog-to-digital converters [2], network switches [3], [4], and microprocessors [5], has been demonstrated at clock frequencies over 10 GHz, and DC measurements of simple flip-flops have demonstrated the potential of SFQ circuits to operate at over 100 GHz [6]. Additionally, recent progress in chip-to-chip SFQ pulse transmission [7]–[9] has raised the possibility of ultra-high-speed multi-chip modules (MCMs) that can operate at the speed of single chips. Most of these high-speed demonstrations, however, were done using low-speed input/output (I/O) testing methods or DC measurements in liquid helium.
For practical use, SFQ circuits need cryocoolers with high-speed I/O links to room-temperature (RT) systems [10]–[12]. Cryocooled systems are successfully being developed for small-scale applications such as digital receivers [10], [11], which require relatively narrow digital bandwidth (from hundreds of Mbps/port up to the Gbps/port range). However, cryocooled systems are much more difficult to develop for large-scale, high-throughput applications such as network switches, which require many (100 to 1000) I/O links with wider digital bandwidth (tens of Gbps/port). In fact, although a 4 × 4 switch was demonstrated in a closed-cycle refrigerator with 24 high-speed electrical I/O links and two optical fiber inputs, its operating frequency was limited to 4 Gbps, well below its target of 10 Gbps [12]. Operation of SFQ circuits in cryocooled systems with multiple I/O links at bandwidths on the order of 10 Gbps/port has yet to be reported, so further effort is required to develop cryopackaging technology for high-throughput SFQ digital systems.
Two years ago, we began a study on cryopackaging technology. The primary purpose was to develop a cryocooled system prototype to demonstrate our switch circuits [3], [4] at the system level. For our first system demonstration, the prototype was designed to have 32 I/O links between the SFQ MCM and the RT systems, with a bandwidth of 10 Gbps/port [13]. We developed elemental cryopackaging technologies and components, including an MCM [8], [9], a semiconductor cryogenic amplifier [13], a wide-bandwidth multi-pin cryo-probe [13], and a superconductor voltage driver. Last year, we built the cryocooled system prototype by integrating these elemental cryogenic components with I/O cables, magnetic and thermal shields, and a cryocooler. In this paper, we report on the design, implementation, and experimental results of the system.
Fig. 1 shows the general scheme of our cryocooled system. The cryocooler has two temperature stages, namely, the 4-K and 40-K stages. The SFQ MCM is located on the 4-K stage. The MCM consists of SFQ logic circuit chips, voltage driver chips, and an MCM carrier; the chips are flip-chip bonded on the carrier using solder bumps. The MCM allows a suitable fabrication process to be chosen for each chip. Typically, the logic circuit chips are designed with well-established standard library cells [14] and fabricated with NEC's standard Nb process [15], in which the junction critical current density Jc is 2.5 kA/cm², while the driver chips are fabricated using a higher-Jc process (typically 10 kA/cm²) to generate enough output voltage for low bit-error-rate (BER) operation at high speed.
II. SYSTEM INTEGRATION
A. System Design
Chip-to-chip communications use SFQ pulses. Both the pulse driver and the receiver are single-junction circuits [8], [9]. The transmission line impedance for on-chip and chip-to-chip communication is 2 to 4 Ω for Jc of 2.5 to 10 kA/cm². We have already demonstrated high-speed chip-to-chip communication up to 60 Gbps with Jc = 2.5 kA/cm² and up to 117 Gbps with Jc = 10 kA/cm² by using ring-shaped circuits [8], [9]. The outputs of the SFQ logic circuit chips are transferred to the voltage driver chips and converted into 2-mV-level signals on a 50-Ω line. The outputs of the driver chips are then amplified to about 50 mV by semiconductor cryogenic amplifiers at 40 K.
Fig. 1. General scheme of our cryocooled system. D/S, NRZ/RZ, D, and R denote DC-to-SFQ converter, NRZ-to-RZ converter, pulse driver, and pulse receiver, respectively. More than one SFQ logic circuit chip and driver chip may be used.
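As a rough check of the signal levels along this chain, a voltage ratio can be expressed in decibels as 20 log10(V_out/V_in). The sketch below is illustrative only: it assumes equal (50-Ω) impedances at every interface, treats the amplifier as ideal and linear, and does not model cable loss. The function names are hypothetical.

```python
import math


def voltage_gain_db(v_out: float, v_in: float) -> float:
    """Voltage ratio in dB, assuming equal (e.g., 50-ohm) impedances at both ports."""
    return 20.0 * math.log10(v_out / v_in)


def amplify(v_in: float, gain_db: float) -> float:
    """Output amplitude of an ideal, linear amplifier with the given gain in dB."""
    return v_in * 10.0 ** (gain_db / 20.0)


# Driver output of ~2 mV raised to ~50 mV at 40 K (figures from the text).
net_gain = voltage_gain_db(50e-3, 2e-3)
print(f"2 mV -> 50 mV corresponds to {net_gain:.1f} dB")

# An ideal 30-dB stage (the gain quoted for the GaAs amplifier) would give ~63 mV.
# The gap to ~50 mV could come from cable and connector loss, among other factors;
# this sketch does not model any of them.
print(f"2 mV through an ideal 30-dB stage: {amplify(2e-3, 30.0) * 1e3:.0f} mV")
```

In other words, the roughly 25-fold amplitude increase from the driver to the 40-K amplifier output corresponds to a net gain of about 28 dB.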
The I/O link between the MCM and the RT system is electrical, and its data rate was designed to be 10 Gbps/port. For the first demonstration of our switches, the number of I/O links was designed to be 32, so the system bandwidth is 32 × 10 Gbps/port = 320 Gbps. […] on the first and second stages. Hence, we measured the load map of our cryocooler. The measured load map (Fig. 3) shows, for example, that if the heat loads are 40 W on the first stage and 1 W on the second stage, the temperatures are 42 K and 4.2 K, respectively.
B. Implementation
The SFQ MCM is packaged on the 4-K sample stage (Fig. 2b). The MCM and the sample stage measure 16 mm × 16 mm and 60 mm × 60 mm, respectively. A flexible thermal link connects the sample stage to the second stage in order to damp mechanical oscillations from the cryocooler. The MCM is surrounded by double mu-metal magnetic shields and a 40-K radiation shield, as shown in Fig. 2b.
On the first stage, GaAs cryogenic amplifiers [13], customized for operation in cryogenic environments, are installed to amplify the output of the SFQ MCM. Placing the amplifiers on the first stage shortens the cables between the SFQ MCM and the amplifiers, and hence reduces their electrical loss, resulting in a higher signal-to-noise ratio and a lower BER. The gain of the amplifier is about 30 dB in a frequency range of 60 kHz to 25 GHz at 40 K, which is about 9 dB higher than its gain at RT. The amplifier consumes 1 W and measures […].
One of the challenges of cryocooled system integration is reducing the heat load while keeping the electrical loss as small as possible, because these requirements conflict with each other [10]. Additionally, the system should be easy to handle and robust against thermal cycles. To meet these requirements, we used three different cables for each I/O link (Table III). From RT to the first stage, we used relatively thick (2.2-mm-diameter) Cu coaxial cables to keep the electrical loss low. Although such thick Cu cable is also a good thermal conductor, the ample cooling capacity of the first stage allowed us to use it. These Cu cables were thermally anchored at the first stage. In contrast, the cooling capacity of the second stage is only 1 W. Hence, between the first and second stages we used relatively thin (1.19-mm-diameter) coaxial cables made of phosphor bronze (PB), whose thermal conductivity is much lower than that of Cu, to keep the heat inflow low. The PB cables were thermally anchored at the second stage. From the second stage to the SFQ MCM, we again used Cu coaxial cables to keep the electrical loss low. Although Cu is a good thermal conductor, heat inflow through these cables is negligible because the temperature of the sample stage is almost the same as that of the second stage. Here we used thin (1.19-mm-diameter) flexible cables so that the cryo-probe (Fig. 2b) can easily be attached to and detached from the SFQ MCM. To improve electrical conductance for high-frequency signals, the surfaces of the inner and outer conductors of all the cables were silver plated. Additionally, porous polytetrafluoroethylene (PTFE), instead of normal PTFE, was used as the insulator for all the cables, because porous PTFE expands and contracts less with temperature than normal PTFE and therefore makes the I/O cables mechanically robust against thermal cycles. The total length of an I/O link from the RT I/O port to the SFQ MCM is 700 mm, and the length from the SFQ MCM to the cryogenic amplifier is 400 mm. The total heat load to the sample stage was estimated to be 1.01 W (Table IV).
Another challenge was to develop a reliable, wide-bandwidth, multi-pin cryo-probe. The probe had to be detachable so that it could be used repeatedly, so we employed mechanical pressure to establish electrical contact. After several improvements, we developed a reliable probe (Fig. 4). The 32-pin probe consists of four 8-pin probes. The body of each 8-pin probe is made of CuMo, and eight probe heads are inserted into it. Each probe head is made of BeCu to ensure reliable contact at 4 K, and it has a signal finger and two ground (GND) fingers that form a coplanar waveguide. As shown in Fig. 4a, the SFQ MCM is set on a CuMo plate on the Cu sample stage. Then, the four 8-pin probes are aligned and fixed on the CuMo plate with screws. The height of each probe head can be adjusted with an individual adjuster if needed. We used CuMo for the plate and the probe body because its thermal expansion coefficient is closer to that of Si than that of Cu is; thus, the electrical contact at 4 K is more reliable. Additionally, CuMo is a hard material, so the probe body resists abrasion. Note that the probe is scalable: larger MCMs with more than 32 pins can easily be packaged by using more 8-pin probes.
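The cable choice described above follows from steady-state conduction along a uniform conductor, Q = (A/L) ∫ k(T) dT, so the heat inflow scales linearly with the thermal-conductivity integral of the material and with the metal cross-section (roughly with the square of the cable diameter for a geometrically similar cable). The sketch below illustrates only this scaling. The conductivity function, the 0.1-m length, and the "metal fraction" of the cross-section are hypothetical placeholders, not properties of the cables used here.

```python
import math
from typing import Callable


def conductivity_integral(k: Callable[[float], float], t_cold: float,
                          t_hot: float, steps: int = 1000) -> float:
    """Trapezoidal estimate of the integral of k(T) dT from t_cold to t_hot, in W/m."""
    h = (t_hot - t_cold) / steps
    total = 0.5 * (k(t_cold) + k(t_hot))
    for i in range(1, steps):
        total += k(t_cold + i * h)
    return total * h


def conductive_heat_load(area_m2: float, length_m: float,
                         k: Callable[[float], float],
                         t_cold: float, t_hot: float) -> float:
    """Steady-state heat flow (W) along a uniform conductor: Q = (A / L) * integral of k dT."""
    return area_m2 / length_m * conductivity_integral(k, t_cold, t_hot)


def metal_area(diameter_m: float, metal_fraction: float) -> float:
    """HYPOTHETICAL: metal cross-section taken as a fixed fraction of the cable's outer disc."""
    return metal_fraction * math.pi * diameter_m ** 2 / 4.0


def k_placeholder(t: float) -> float:
    """Placeholder conductivity of 1 W/(m K), NOT a real material property."""
    return 1.0


# Same material, length, and metal fraction: compare 2.2-mm and 1.19-mm cables
# running between 40 K and 4 K. Only the ratio is meaningful here.
q_thick = conductive_heat_load(metal_area(2.2e-3, 0.5), 0.1, k_placeholder, 4.0, 40.0)
q_thin = conductive_heat_load(metal_area(1.19e-3, 0.5), 0.1, k_placeholder, 4.0, 40.0)
print(f"thin/thick heat ratio: {q_thin / q_thick:.3f}")
```

Under these assumptions, the thinner cable alone carries about 29% of the heat of the thicker one, and switching from Cu to PB lowers the heat further by the ratio of their conductivity integrals. Real conductivity integrals, which depend strongly on alloy and purity, would have to be taken from a materials database.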
After the compressor is turned on, the system reaches 4 K in four to five hours. The minimum achievable sample-stage temperature is 3.85 K, which indicates that the actual heat load is lower than the estimate (Table IV). The second-stage temperature oscillates with an amplitude of several hundred mK. It has been pointed out that such temperature oscillations may cause fluctuations in temperature-dependent circuit parameters [10]. In our system, however, the thermal resistance and thermal capacitance between the second stage and the sample stage act as a thermal low-pass filter, so the oscillation amplitude at the sample stage is damped to the order of mK.
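The damping described above can be pictured as a first-order RC low-pass filter acting on temperature, with time constant τ = R_th C_th and amplitude ratio |H(f)| = 1/√(1 + (2πfτ)²). The sketch below is a hedged illustration rather than a model of this system. The ~1 Hz cryocooler cycle frequency and the reduction factor of about 100 (a few hundred mK down to a few mK) are assumptions chosen for the example, and no actual R_th or C_th values are implied.

```python
import math


def thermal_lowpass_attenuation(freq_hz: float, tau_s: float) -> float:
    """|H(f)| of a first-order thermal RC filter with time constant tau = R_th * C_th."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * tau_s) ** 2)


def tau_for_attenuation(freq_hz: float, attenuation: float) -> float:
    """Time constant needed to reduce an oscillation at freq_hz by the given factor (< 1)."""
    return math.sqrt(1.0 / attenuation ** 2 - 1.0) / (2.0 * math.pi * freq_hz)


# ASSUMPTIONS for illustration only: ~1 Hz cryocooler cycle and a reduction from
# a few hundred mK to a few mK, i.e., a factor of about 100.
f_cycle = 1.0
tau = tau_for_attenuation(f_cycle, 0.01)
print(f"required R*C ~ {tau:.1f} s")
print(f"300 mK at the second stage -> "
      f"{300 * thermal_lowpass_attenuation(f_cycle, tau):.1f} mK at the sample stage")
```

Under these assumptions, a thermal time constant of roughly 16 s between the second stage and the sample stage would be enough. Because the attenuation grows with frequency, the cryocooler's periodic oscillation is suppressed much more strongly than slow drifts are.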
III. EXPERIMENTAL RESULTS
A. Frequency Characteristics
We designed and fabricated a test chip for S-parameter measurements of our system (Fig. 5a). The chip size was 16 mm × 16 mm, and it contained sixteen 50-Ω microstrip lines (MSLs). As shown in Fig. 5b, the probing pad was coplanar in order to match the probe head (Fig. 4b), and it was designed to have a 50-Ω impedance, taking into account the dielectric constant of the Si wafer. We cooled the test chip in our cryocooled system and measured the S parameters with a vector network analyzer. Fig. 6 plots the measured S21. S21 was measured for an electrical path from one RT I/O port to another RT I/O port through an MSL on the test chip packaged on the 4-K sample stage, so the insertion loss of one I/O link was half of the measured loss in dB. The analog bandwidth (BW), defined as the frequency at which this insertion loss reaches 3 dB, was therefore 23 GHz, which is much wider than 10 GHz.
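The bandwidth extraction above can be sketched as follows. Because the measured path is a round trip through two I/O links, the one-link loss is −S21/2 (in dB), and the BW is the first frequency at which that loss reaches 3 dB, which corresponds to S21 = −6 dB. This sketch assumes that the two links are identical. It attributes the (small) on-chip MSL loss to the links, which errs toward underestimating the BW, and it uses linear interpolation between frequency points. The trace below is synthetic, not measured data.

```python
from typing import Optional, Sequence


def link_loss_db(s21_db: float) -> float:
    """Insertion loss (positive dB) of ONE link, assuming the path is two identical links."""
    return -s21_db / 2.0


def analog_bandwidth(freqs_hz: Sequence[float], s21_db: Sequence[float],
                     limit_db: float = 3.0) -> Optional[float]:
    """First frequency at which the one-link loss reaches limit_db (linear interpolation)."""
    losses = [link_loss_db(s) for s in s21_db]
    if losses[0] >= limit_db:
        return freqs_hz[0]
    for i in range(1, len(freqs_hz)):
        if losses[i] >= limit_db:
            f0, f1 = freqs_hz[i - 1], freqs_hz[i]
            l0, l1 = losses[i - 1], losses[i]  # l0 < limit_db <= l1, so l1 > l0
            return f0 + (limit_db - l0) * (f1 - f0) / (l1 - l0)
    return None  # limit never reached within the measured span


# Synthetic, monotonic trace (NOT measured data): S21 falls by 0.25 dB per GHz.
freqs = [i * 1e9 for i in range(41)]
s21 = [-0.25 * f / 1e9 for f in freqs]
print(f"BW = {analog_bandwidth(freqs, s21) / 1e9:.1f} GHz")
```

For a real S21 trace with ripple, the first crossing found this way may lie below frequencies where the loss dips back under 3 dB, which is the conservative choice for a bandwidth figure.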
In designing the MCM, the bonding pads for the I/O signals have to be designed carefully to ensure good impedance matching. Fig. 7a shows an ordinary bonding pad connected to a 50-Ω MSL, whose width is 1.5 µm. This type of bonding pad has parasitic capacitance between the pad and the ground plane, which causes impedance mismatch. Hence, we also tried a coplanar waveguide (CPW) bonding pad (Fig. 7b), designed to be 50 Ω by taking into account the dielectric constant of the Si wafer. To compare these two types of bonding pad, we fabricated a test module (Fig. 8) in which a 5 mm × 5 mm MSL chip was flip-chip bonded on a 16 mm × 16 mm MCM carrier using InSn solder bumps. The chip contained two 15-mm-long 50-Ω MSLs; one was connected to ordinary bonding pads, and the other to CPW bonding pads (Fig. 8). We cooled the test module in our cryocooled system and measured the S parameters. As shown in Fig. 9, the MSL with the ordinary bonding pads exhibited a resonance at about 5 GHz and its harmonics. The resonance frequency corresponds to the MSL length, which means that the mismatch at the ordinary pads is significant. In contrast, the MSL with the CPW pads did not exhibit severe resonance. At the resonance frequencies, S21 for the CPW pads was 2 to 10 dB higher than that for the ordinary pads, which shows that the CPW pad is effective.
Fig. 8. Schematic of the test module used to compare two types of bonding pad: the ordinary type and the CPW type.
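One way to read "the resonance frequency corresponds to the MSL length" is to treat the 15-mm line, mismatched at both pads, as a half-wave resonator with f_n = n c₀ / (2 L √ε_eff). This interpretation is ours, not an analysis from the measurement. The effective permittivity ε_eff here is a lumped parameter: for a superconducting microstrip it also absorbs the slowing caused by kinetic inductance. The sketch shows that a 5-GHz fundamental on a 15-mm line implies ε_eff ≈ 4, which is of the same order as the relative permittivity of SiO2 (about 3.9).

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def half_wave_resonances(length_m: float, eps_eff: float, n_modes: int = 3) -> list[float]:
    """f_n = n * c0 / (2 * L * sqrt(eps_eff)) for a line mismatched at both ends."""
    f1 = C0 / (2.0 * length_m * math.sqrt(eps_eff))
    return [n * f1 for n in range(1, n_modes + 1)]


def implied_eps_eff(length_m: float, f1_hz: float) -> float:
    """Effective permittivity implied by a fundamental resonance f1 on a line of length L."""
    return (C0 / (2.0 * length_m * f1_hz)) ** 2


L_MSL = 15e-3  # 15-mm MSL from the text
print(f"eps_eff implied by ~5 GHz: {implied_eps_eff(L_MSL, 5e9):.2f}")
print([f"{f / 1e9:.2f} GHz" for f in half_wave_resonances(L_MSL, 4.0)])
```

The evenly spaced harmonics predicted by this model are consistent with the "resonance at about 5 GHz and its harmonics" pattern. A well-matched pad suppresses these resonances because it removes the reflections at both ends of the line.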
B. Cryocooled Operation of SFQ Module
To demonstrate stable cryocooled operation of SFQ circuits at high speed, we designed an SFQ test module consisting of a 5 mm × 5 mm SFQ chip and a 16 mm × 16 mm MCM carrier (Fig. 10). The chip was flip-chip bonded on the carrier with InSn solder bumps. It contained a test circuit consisting of an NRZ/RZ converter, Josephson transmission lines (JTLs), splitters, and a 16-stage SQUID-stack voltage driver (Fig. 11). In the voltage driver, each SQUID was coupled to an RS flip-flop (RSFF), as in the circuit proposed in [12]. The test circuit comprised 591 junctions, and its bias current was 90 mA.
The test circuit operates as follows. When the data dat_in, in non-return-to-zero (NRZ) format, and the clock clk_in are applied to the circuit, the NRZ/RZ converter converts dat_in into SFQ pulses in return-to-zero (RZ) format. Each pulse is then divided into two: one is used as a set pulse and the other as a reset pulse for the RSFFs, with a delay of 50 ps added to the reset pulse. The set pulse is split into 16 pulses, one applied to each RSFF, and the reset pulse is likewise split into 16 pulses applied to the RSFFs. Thus, when a "1" arrives at the circuit, the RSFFs store it for 50 ps, and while they store it, the 16 SQUIDs generate voltage.
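The set/reset timing described above can be captured in an idealized behavioral sketch. We assume that the NRZ/RZ converter emits one set pulse at the clock edge of every "1" bit and that the reset copy arrives exactly 50 ps later. Junction delays, jitter, and the 16-way splitting are not modeled, and the function names are hypothetical. The check on `hold_ps` expresses the requirement that the reset arrive before the next bit's set pulse; the 50-ps hold satisfies this at both 10 Gbps (100-ps bit period) and 12.5 Gbps (80-ps bit period).

```python
from typing import List, Tuple


def rz_windows(bits: List[int], bit_period_ps: float, hold_ps: float = 50.0,
               t0_ps: float = 0.0) -> List[Tuple[float, float]]:
    """Idealized set/reset timing: each '1' sets the RSFFs at its clock edge and the
    delayed copy resets them hold_ps later. Returns (start, end) voltage-state intervals."""
    if hold_ps >= bit_period_ps:
        raise ValueError("reset must arrive before the next bit's set pulse")
    windows = []
    for k, bit in enumerate(bits):
        if bit:
            t_set = t0_ps + k * bit_period_ps  # set pulse (no junction delays modeled)
            windows.append((t_set, t_set + hold_ps))  # reset pulse delayed by hold_ps
    return windows


def driver_on(t_ps: float, windows: List[Tuple[float, float]]) -> bool:
    """True while the SQUID stack would be in the voltage state."""
    return any(start <= t_ps < end for start, end in windows)


# NRZ data 1,0,1,1 at 10 Gbps (100-ps bit period): consecutive 1s become separate pulses.
w = rz_windows([1, 0, 1, 1], bit_period_ps=100.0)
print(w)
print([driver_on(t, w) for t in (25, 75, 225, 299, 325)])
```

The model shows why the output is in RZ format: even for the consecutive "1"s, the driver returns to the zero-voltage state between bits.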
The voltage driver is a key circuit for high-speed, low-BER cryocooled operation of SFQ circuits. In principle, to obtain a wide bias margin and a high output voltage, the critical current of the SQUIDs should be increased: while the SQUIDs generate voltage, part of their bias current flows into the 50-Ω load, and a larger critical current reduces the impact of this bias-current reduction. Nevertheless, we used the smallest available junctions to minimize circuit size and power, and optimized the driver (Fig. 12). Additionally, an aggressive design rule, in which the minimum line width, minimum space, and alignment margin were 1 µm, 1 µm, and 0.3 µm, respectively, was used in the layout of the driver to minimize its size. The test chip was fabricated using the same process as NEC's standard Nb process [15], except that Jc was increased to 10 kA/cm² to generate enough voltage (2 mV for our cryogenic amplifier [13]) for low-BER operation at 10 Gbps. The shunt resistance was designed to be […]. GND plane holes were made below the coupling inductances (Fig. 12) to increase the coupling constant, which was measured to be 0.46. The output voltage of the driver was 2 to 2.5 mV in low-speed measurements.
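The bias-current reduction mentioned above can be sized with Ohm's law. While the stack is in the voltage state, the current diverted into the load is V_out / 50 Ω. This sketch assumes a purely resistive 50-Ω load and a series stack whose 16 SQUIDs share the output voltage equally; both are idealizations. The SQUID critical current is not stated in the text, so the sketch does not compare the two.

```python
def load_current(v_out: float, r_load: float = 50.0) -> float:
    """Current (A) diverted into the load while the stack is in the voltage state."""
    return v_out / r_load


def per_squid_voltage(v_out: float, n_stages: int = 16) -> float:
    """Average voltage per SQUID, assuming a series stack sharing the output equally."""
    return v_out / n_stages


for v in (2.0e-3, 2.5e-3):  # measured output range from the text
    print(f"V_out = {v * 1e3:.1f} mV: I_load = {load_current(v) * 1e6:.0f} uA, "
          f"per SQUID = {per_squid_voltage(v) * 1e6:.0f} uV")
```

For the measured 2 to 2.5 mV, roughly 40 to 50 µA is drawn from the SQUID bias. This is why a larger SQUID critical current would make the driver less sensitive to the load.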
The MCM carrier was fabricated using the same process as that for the chips, except that it contained no junctions. The carrier had an Nb GND plane and two Nb wiring layers. InSn solder bumps were formed by a simple immersion process [17] on Au/Pd/Ti/Nb bonding pads of the chip and the carrier, with Ti, Pd, and Au thicknesses of 50 nm, 100 nm, and 200 nm, respectively. The CPW bonding pads (Fig. 7b) were used for I/O signals, and ordinary bonding pads (Fig. 7a) were used for biases.
We packaged and cooled the test module in our cryocooled system. First, we performed high-speed functional tests. We applied a test pattern and a clock from a pulse pattern generator (PPG) at RT to the test module and observed the module's output on a digital sampling oscilloscope. We confirmed correct operation: the test pattern dat_in (in NRZ format) was converted into RZ format by the on-chip NRZ/RZ converter, and the output of the module was amplified by the GaAs cryogenic amplifier at 40 K (Fig. 13). A maximum output voltage of about 50 mV was obtained at the optimum driver bias at 10 Gbps. Correct operation was confirmed up to 12.5 Gbps; this maximum was limited by the PPG.
We also measured the BER by applying a 10-Gbps pseudorandom bit sequence (PRBS) and a clock from the PPG to the test module. Errors were counted with an error detector at RT. The output of the system was amplified again to about 1 V by a semiconductor amplifier at RT so that it exceeded the threshold of the error detector. Fig. 14 shows the measured eye diagram and BER curve. The BER was less than […] over a driver bias margin of 1.4 mA ± 4.3%.
IV. CONCLUSION
We have developed a cryocooled system prototype with 32 high-speed I/O links between an SFQ MCM and RT systems. S-parameter measurements showed that the analog BW of the I/O link was 23 GHz. The cryocooled system, including 32 I/O cables, a 32-pin cryo-probe, an SFQ MCM with a superconductor voltage driver, and a customized GaAs cryogenic amplifier, operated stably at bit rates up to 12.5 Gbps. The BER for a PRBS was less than […] at 10 Gbps, which shows that the prototype performs well enough for high-speed system demonstrations of not only our switches but also other SFQ digital circuits. Our cryopackaging technology and cryogenic components, such as the MCM, cryo-probe, cryogenic amplifier, and voltage driver, can be used generally to integrate high-throughput cryocooled SFQ digital systems.
