Introduction
The ultra-low power consumption, ultra-high-speed switching, and ballistic signal transmission available in rapid-single-flux-quantum (RSFQ) logic circuits are expected to open the way to next-generation high-speed electronics, because the semiconductor integrated circuits that support present high-speed electronics face severe problems of increasing power density and interconnect delay. Since the formulation of logic circuits using a single flux quantum (SFQ) as the information carrier [1], the integration level of RSFQ circuits has risen, and their superiority over semiconductor devices has been demonstrated in small-scale circuits. However, RSFQ circuits have not yet been used in actual applications. The main reasons are thought to be disadvantages such as the cooling cost, the bulky system, and the difficulty of packaging.
To offset these disadvantages, higher speed and a higher integration level are strongly required even in relatively small-scale applications. Although small-scale applications exploit other advantages such as low noise, high sensitivity, and high precision, higher speed and higher integration usually lead to improved performance and functionality. We therefore started to develop design technology for RSFQ large-scale integrations (RSFQ-LSIs), because a reliable fabrication process called "the NEC standard process" [2], based on Nb/Al-AlOx/Nb Josephson junctions (JJs), had already been established. Of course, a fabrication process with more wiring layers and a higher critical current density should also be developed. Advances in the fabrication technology are described elsewhere [3].
We employed a cell-based design methodology [4], in which the physical layouts, logic functions, and schematics of all logic elements, including wiring, were registered. The cells were optimized to enlarge the operating margins against variations in the circuit parameters and to avoid interference from adjacent cells when connected. We obtained bias-current-dependent timing parameters, such as delay time and set-up time, numerically and registered them in the computer-aided design tool. Recently, we introduced a shielding structure that shields the magnetic field generated by the bias current, because this field grows in proportion to the bias current and reduces the operating margins of the circuits [5]. With this design technology, the integration level has increased to 10k JJs. RSFQ microprocessors [5] and high-end router components [6] have been demonstrated at operating frequencies of 20-40 GHz so far.
However, the timing deviation from the designed value grows as the integration level increases. The deviation should therefore be measured in actual circuits, and its origin investigated, before further integration. In this article, we describe experimental analyses of the timing deviation and our design approach to suppressing its effect. We also discuss applications based on RSFQ-LSIs.
Measurement of the Timing Deviation
The deviation of the SFQ arrival timing from its designed value is thought to originate from parameter spreads, thermal noise, and external noise, including the magnetic flux generated by supply currents. First, we determined the oscillation periods of four ring oscillators by measuring their average voltages with a nano-voltmeter. Each oscillator is composed of a different number of Josephson transmission lines (JTLs), splitters, and confluence buffers. By solving the resulting simultaneous equations, we obtain the delay time of each element. Fig. 1 shows the obtained delay time of a single JJ in a JTL as a function of the bias current. We tested five identical circuits fabricated on different Si wafers. The solid line is the numerically obtained delay time. It agrees with the experimental data, and this agreement underpins the demonstration of relatively large circuits such as microprocessors. However, the standard deviation of the chip-to-chip variation (1-σ global spread) was measured to be 0.2 ps (6%), although it depended on the bias current. This value is probably consistent with the 1-σ global spread in the critical currents, inductances, and resistances.
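The extraction of element delays from ring-oscillator voltages can be sketched as follows. This is only an illustration under stated assumptions: each ring is taken to carry one circulating SFQ pulse, so its period follows from the average voltage through the Josephson relation T = Φ0/V, and the period is modeled as a sum of per-element delays. The element counts, the assumed delays used to synthesize the "measured" voltages, and the extra fixed term common to all rings are hypothetical and are not taken from the measurement.

```python
# Sketch: recover per-element delays from ring-oscillator average voltages.
# All counts, voltages, and the fixed common term are hypothetical values.

PHI0 = 2.067833848e-15  # magnetic flux quantum h/(2e), in Wb


def period_from_voltage(v_avg):
    """Period (s) of a ring carrying one circulating SFQ pulse: T = PHI0 / V."""
    return PHI0 / v_avg


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("ring configurations are not independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Columns: JJs in JTLs, splitters, confluence buffers, fixed common part.
COUNTS = [
    [40, 1, 1, 1],
    [80, 1, 1, 1],
    [40, 3, 1, 1],
    [40, 1, 3, 1],
]

# Synthetic "measurement": assumed delays (s) converted to average voltages (V).
ASSUMED = [3.3e-12, 6.0e-12, 7.0e-12, 20.0e-12]
periods = [sum(n * t for n, t in zip(row, ASSUMED)) for row in COUNTS]
voltages = [PHI0 / p for p in periods]

# Analysis step: voltages -> periods -> per-element delays.
delays = solve_linear(COUNTS, [period_from_voltage(v) for v in voltages])
print([round(d * 1e12, 3) for d in delays])  # delays in ps
```

The four rings must differ in their element counts so that the system is non-singular; otherwise the solver raises an error rather than returning meaningless delays.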
An increased number of JJs requires large supply currents, which enhance the magnetic fields generated by the return currents flowing in the ground plane, even with the shielding structure. To investigate the effect of the return current quantitatively, we measured the oscillation period of a ring oscillator while applying large currents to the ground plane. Two identical ring oscillators, Ring A and Ring B, were examined. The positions of the two oscillators and the three paths of the ground currents are illustrated in Fig. 2 (b). The obtained delay times as a function of the ground-plane current are shown in Fig. 2 (a). Evidently, closer current paths have a larger effect on the delay time. Usually, when we feed a bias current into a certain line, a current of the same magnitude is drawn out through the adjacent line, so the magnetic fields generated by the bias currents are suppressed. However, this suppression is not perfect. We have to pay special attention to the balance between the bias currents and their returns in large-scale circuits.
Time jitter may limit the integration level of our current RSFQ circuits, because the jitter is proportional to the square root of the number of junctions in a signal transmission path. We built a time-to-digital converter (TDC) to measure the time jitter experimentally. The TDC is composed of a delay line and delay flip-flops (DFFs) serving as timing arbiters. The timing accuracy and resolution of the TDC are determined by the delay τ of the passive transmission line (PTL). In this experiment, we set the delay to 0.7 ps. We examined five chips, each with five JTLs made up of different numbers of JJs. Fig. 3 (a) shows the schematic diagram of the circuit for measuring the time jitter, and Fig. 3 (b) shows the values obtained at 4.2 K. From the slope of the data, the time jitter is calculated to be 0.035 ps for a single junction. Although this value seems small enough, the jitter will reach about 1 ps in a JTL composed of 1000 JJs, which may produce errors in large-scale circuits. According to the same experiment with a higher critical current density, the origin of the time jitter is thought to be thermal noise generated in the shunt resistors.
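The square-root scaling of the jitter can be checked with a few lines of arithmetic. The sketch below assumes that each junction contributes an independent timing fluctuation, so that the variance grows linearly with the number of junctions. The function names, the jitter budget, and the junction counts used for the fit are hypothetical; only the 0.035 ps per-junction value comes from the text.

```python
import math

SIGMA_JJ_PS = 0.035  # per-junction jitter quoted in the text, in ps


def path_jitter_ps(n_junctions, sigma_jj_ps=SIGMA_JJ_PS):
    """Accumulated jitter of a path, assuming independent junction noise."""
    return sigma_jj_ps * math.sqrt(n_junctions)


def max_junctions(budget_ps, sigma_jj_ps=SIGMA_JJ_PS):
    """Largest junction count whose accumulated jitter stays within budget."""
    return math.floor((budget_ps / sigma_jj_ps) ** 2)


def fit_sigma_jj(n_values, jitter_values_ps):
    """Per-junction jitter from the slope of variance versus junction count.

    Least-squares line through the origin: slope = sum(n*var) / sum(n*n).
    """
    num = sum(n * s * s for n, s in zip(n_values, jitter_values_ps))
    den = sum(n * n for n in n_values)
    return math.sqrt(num / den)


print(f"{path_jitter_ps(1000):.2f} ps for 1000 junctions")
print(max_junctions(1.0), "junctions fit in a 1 ps budget")

# Hypothetical JTL lengths with ideal (noise-free) jitter values.
lengths = [100, 200, 400, 800, 1600]
ideal = [path_jitter_ps(n) for n in lengths]
print(f"{fit_sigma_jj(lengths, ideal):.3f} ps per junction recovered")
```

With the quoted per-junction value, 1000 junctions give about 1.1 ps, consistent with the estimate in the text.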
Introduction of Flexible Passive Transmission Lines
To reduce the timing deviation described above, we need to reduce the number of JJs in signal transmission paths and the number of superconducting loops while keeping the functionality of the circuits. The most effective way is the full-scale introduction of PTLs with a micro-strip line (MSL) or strip line (SL) structure. However, the PTLs available in current RSFQ circuits are not very flexible: they are used only for port-to-port transmission, without any via-contacts connecting an MSL and an SL.
We optimized the via-contact structure with the 3-dimensional electromagnetic field simulator Ansoft HFSS so as to minimize reflection due to impedance mismatch and to keep a sufficient operating margin up to a data rate of 50 Gb/s [7]. Fig. 4 illustrates the optimized structure of the via-contact. We placed a 10 µm square signal contact (SC) together with a ground contact (GC) between the upper and lower ground planes, surrounding the SC. The isolation distance is key to keeping sufficient operating margins and is set to 1 µm in our case. According to the simulation, the reflection (S11) at the via-contact is suppressed below -20 dB.
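To put the -20 dB figure in perspective, the following sketch converts S11 between decibels, reflection-coefficient magnitude, and reflected power fraction, and evaluates the reflection at a simple junction between two lossless lines with real characteristic impedances. The impedance values are hypothetical; the actual via-contact was analyzed with a 3-dimensional field simulation, which this two-impedance model does not replace.

```python
import math


def s11_db_to_magnitude(s11_db):
    """Reflection-coefficient magnitude |Gamma| from S11 in dB."""
    return 10.0 ** (s11_db / 20.0)


def reflected_power_fraction(s11_db):
    """Fraction of incident power reflected, |Gamma|**2."""
    return 10.0 ** (s11_db / 10.0)


def reflection_coefficient(z_load, z_line):
    """Gamma at a step from a line of impedance z_line into z_load (real ohms)."""
    return (z_load - z_line) / (z_load + z_line)


def s11_db(z_load, z_line):
    """S11 in dB for the same impedance step."""
    return 20.0 * math.log10(abs(reflection_coefficient(z_load, z_line)))


print(f"|Gamma| at -20 dB: {s11_db_to_magnitude(-20.0):.2f}")
print(f"reflected power at -20 dB: {reflected_power_fraction(-20.0):.1%}")
# Hypothetical step from a 9-ohm line into an 11-ohm line.
print(f"S11 of a 9 -> 11 ohm step: {s11_db(11.0, 9.0):.1f} dB")
```

A reflection below -20 dB therefore means that less than 1% of the incident power is reflected. In this simple model that corresponds to an impedance ratio no larger than about 1.22.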
We made several kinds of ring oscillators including PTLs and studied their SFQ propagation properties. Fig. 5 shows the bias margins of the driver (DRV) and the receiver (REC) as a function of the data rate for a ring oscillator with a 13-mm-long PTL and 54 via-contacts. Good properties are confirmed up to 50 Gb/s. Although slight degradation in the propagation property is observed compared with a ring oscillator having no via-contacts, its origin is found to be the length of the PTL.
Increasing the fan-out at the driver is one of the most important issues in improving the flexibility of PTL wiring. We optimized the circuit parameters of the driver (Tx) and receiver (Rx) so as to maintain sufficient bias margins even at the resonant frequencies [8]. To enhance the driving ability of the Tx, we increased the critical currents of its JJs while keeping the LIc product constant. We designed four kinds of ring oscillators, each with a PTL and a (multicasting) driver with a fan-out of 1, 2, 3, or 4. Fig. 6 shows the experimentally obtained bias margins as a function of the data rate. We confirmed sufficient margins up to 50 Gb/s even for the Tx with a fan-out of 4. From these experimental results, we consider that the flexible wiring technique using PTLs is now available in our current RSFQ design.
Fig. 7 summarizes possible applications of RSFQ integrated circuits together with the required integration level. Although high-performance computers and high-end routers based on RSFQ technology require a large number of JJs, their performance will not be achievable in semiconductor technology because of the heat problem. Digital SQUIDs can be applied to bio-magnetism and non-destructive evaluation. Superconductive sensors or detectors, such as the superconducting tunnel junction (STJ) and the transition-edge sensor, have high potential in sensitivity, signal-to-noise ratio (SNR), and so on compared with non-superconductive sensors. A multiple-superconductive-sensor system will be a powerful tool in fields such as THz-wave imaging, protein mass analysis, nuclear physics, and radio astronomy. In such a system, multiple sensors are combined digitally and real-time signal processing is carried out. The key RSFQ components are an analog-to-digital converter (ADC) and a multiple-input digital signal processor (MI-DSP). Software-defined radio (SDR) applications and the readout systems of superconductive qubits can be classified into the same application field.
Ongoing RSFQ Applications
RSFQ-based ADCs are being widely studied for the digital-RF receiver, one of the SDR applications [9]. Multiple RF signals with different modulation schemes are digitized by a broadband bandpass ADC [10]. Down-conversion, channel selection, and demodulation are then performed in the RSFQ-DSP. The digital-RF receiver will thus offer ultimate flexibility.
There is increasing interest in multiple-detector systems. In particular, an imager based on THz-wave irradiation is an emerging application. Fig. 8 shows a block diagram of a multiple-detector system for THz-wave imaging. The same diagram applies to an X-ray detector system. Although recent studies reveal that the THz-wave imager is a powerful tool for semiconductor LSI inspection, DNA observation, material identification, defect observation, etc., the data acquisition time is very long because present detectors have long response time, small dynamic range, low sensitivity, and small detecting area. STJs, on the other hand, serve as quantum detectors with short response time, high dynamic range, high sensitivity, and high energy resolution (for X-rays). We can therefore readily shorten the acquisition time by employing STJs as the detectors. When multiple detectors are used, however, the increased heat inflow to the STJs becomes another problem. In addition, the room-temperature (RT) electronics becomes bulky as the number of STJs increases. To overcome these problems, we place ∆Σ- or ∆-modulators, which are the front-end parts of oversampled ADCs, at the 4 K stage and quantize the output signals of the STJs there, as shown in Fig. 8. After quantization, a 1-bit data stream reflecting the output signal of each STJ is obtained. Digital signal processing, such as multiplexing, automatic intensity adjustment, averaging for noise reduction, and spatial differentiation for a dark-field image, is carried out on the data while it remains a 1-bit stream. Decimation is done at the back-end part of the DSP, and the digital data are sent to the RT electronics.
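The idea of carrying a detector signal as a 1-bit stream and decimating it only at the back end can be illustrated with a purely behavioral model. The sketch below is a textbook first-order ∆Σ modulator followed by block averaging. It is not a model of the RSFQ circuits, and the input level, stream length, and decimation ratio are hypothetical.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator for inputs in [-1, 1].

    The integrator accumulates the error between the input and the fed-back
    1-bit output, so the bit density tracks the input level.
    """
    integrator = 0.0
    bits = []
    for x in samples:
        bit = 1 if integrator >= 0.0 else 0   # 1-bit quantizer
        feedback = 1.0 if bit else -1.0       # 1-bit DAC level
        integrator += x - feedback            # accumulate the error
        bits.append(bit)
    return bits


def decimate_by_averaging(bits, ratio):
    """Average each block of `ratio` bits and map the result back to [-1, 1]."""
    out = []
    for start in range(0, len(bits) - ratio + 1, ratio):
        ones = sum(bits[start:start + ratio])
        out.append(2.0 * ones / ratio - 1.0)
    return out


# Hypothetical constant detector level, oversampled and then decimated.
level = 0.3
stream = delta_sigma_1bit([level] * 4096)
decimated = decimate_by_averaging(stream, 256)
print(decimated[:4])
```

Because the integrator state stays bounded, each block average differs from the input level by at most 4 divided by the decimation ratio in this model, so a longer averaging window gives a finer estimate at a lower output rate.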
We started to develop a multiple-detector system for X-rays in collaboration with the RIKEN group as a first step, because the signal processing required for X-ray detection is easier than that for a THz imager. Fig. 9 shows a microphotograph of our ADC, composed of two identical ∆-modulators and a first-order decimation filter [11]. The ADC is enclosed in a magnetic shield can and installed in the actual setup for X-ray detection. Operation is confirmed even in the external magnetic field applied to suppress the Josephson current. Although degradation in the SNR is observed, the performance is expected to improve once the external noise is reduced.
We have also been developing RSFQ microprocessors. While the main purpose of this development is to evaluate the design tools on complicated, large-scale integrated circuits, we also try to identify the technical merits for supercomputer applications. Fig. 10 (a) shows the micro-architecture of our latest microprocessor, CORE1β. The CORE1β has seven pipeline stages and two arithmetic logic units (ALUs) to approach the performance of state-of-the-art semiconductor microprocessors, although bit-serial processing is employed in the data path of the ALUs. The two ALUs, connected in cascade, allow two calculations to be performed on serial data with a register-register instruction. According to logic simulation, the peak performance of the CORE1β microprocessor is estimated to be 1500 million operations per second at a power consumption of 3.3 mW.
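The two simulated figures imply an energy per operation, which is a convenient single number for comparison. The short calculation below uses only the quoted peak performance and power consumption. The variable names are illustrative, and anything beyond those two numbers, such as refrigeration overhead, is outside this sketch.

```python
PEAK_OPS_PER_S = 1500e6   # 1500 million operations per second (from the text)
POWER_W = 3.3e-3          # 3.3 mW (from the text)

energy_per_op_j = POWER_W / PEAK_OPS_PER_S    # joules per operation
ops_per_joule = PEAK_OPS_PER_S / POWER_W      # operations per joule

print(f"{energy_per_op_j * 1e12:.2f} pJ per operation")
print(f"{ops_per_joule:.2e} operations per joule")
```

This gives about 2.2 pJ per operation at peak performance.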
Summary
We have described recent progress in the fundamental design technology for RSFQ LSIs. The SFQ propagation delay in a single JJ easily changes on the order of 0.1 ps because of time jitter, parameter spread, variation of bias currents, and magnetic fields. To suppress the effect of these timing fluctuations, we developed an advanced PTL technology that improves interconnect flexibility remarkably. Based on this design technology, we are developing ADCs and microprocessors for multiple-detector systems and high-performance computers. For further integration and high-speed, reliable operation, we need to solve the flux-trapping problem and to develop a current-recycling technique.
