Adiabatic quantum-flux-parametron (AQFP) logic is a promising technology for future energy-efficient high-performance information processing systems. Its static power is zero because of ac flux bias, and its dynamic power is considerably reduced thanks to the adiabatic switching of the junctions. The lack of high-density memories in AQFP logic, however, makes it challenging to realize large-scale information processing systems with pure AQFP circuits. We have been developing a Josephson-CMOS hybrid memory to overcome the memory bottleneck in AQFP digital systems. By utilizing the high sensitivity of the AQFP gate, the output current from CMOS memories can be significantly decreased, which reduces the power consumption. In this article, we design and fabricate a low-power area-efficient AQFP-CMOS hybrid field-programmable gate array (FPGA), where a CMOS memory is used as a rewritable read-only memory to control the AQFP circuits. The AQFP circuit for the AQFP-CMOS hybrid FPGA is composed of logic blocks, switch blocks, and connection blocks, which are clocked by four-phase excitation currents. The AQFP-CMOS hybrid FPGA is fabricated by using the AIST 10 kA/cm² Nb high-speed standard process and the Rohm 0.18 µm CMOS process. The area and power consumption of the two-by-two AQFP logic-cell system are estimated to be approximately 6.56 mm² and 12.4 nW at 5-GHz operation, respectively. The power consumption of the CMOS memory is estimated to be 1.02 µW assuming a CMOS source voltage of 3 mV. We demonstrate the operation of the AQFP-CMOS hybrid FPGA at low speed by combining the AQFP logic and the CMOS memory.
I. INTRODUCTION
The performance of semiconductor integrated circuits has improved continuously and rapidly owing to advances in microfabrication technology. In recent years, however, microfabrication technology is considered to be approaching its limits [1]. Low-power superconductor logic families are attractive as a fundamental technology for future high-performance information processing systems. Various superconductor logic families such as rapid single-flux-quantum (RSFQ) logic [2] have been extensively studied [3]-[5]. Adiabatic quantum-flux-parametron (AQFP) logic [6] is an energy-efficient superconductor logic based on the quantum-flux-parametron [7], [8]. The static power of AQFP logic is zero because of ac flux biasing. In addition, the dynamic power of AQFP logic is considerably reduced thanks to adiabatic switching of the junctions [9], [10]. As a result, the bit energy of AQFP logic is about six orders of magnitude lower than that of semiconductor logic [11]. Recently, an average energy dissipation of 1.5 zJ per junction was demonstrated in an 8-bit carry-lookahead adder at 5 GHz [12].
The lack of high-density memories in AQFP logic, however, makes it challenging to realize large-scale computing systems using pure AQFP circuits. One way to overcome the memory bottleneck is to hybridize superconducting logic with CMOS logic [13]-[15]. Recently, we demonstrated the fully functional operation of 64-kb Josephson-CMOS hybrid memories [16]. In this system, small signals from RSFQ circuits are amplified by Josephson latching drivers and CMOS amplifiers; the signals are then provided to a CMOS memory composed of 8-transistor (8-T) static memory cells. The output signals from the CMOS memory cells are detected by RSFQ-based current sensors. Because of the high current sensitivity of the RSFQ current sensor, high-speed memory access can be expected. One drawback of this system is the relatively high power consumption needed to amplify the small signals from the RSFQ circuits and to decode the address data at high speed. It was estimated that approximately 93% of the power is consumed by the amplifiers and decoders in the hybrid memory [16]. This high power consumption can be avoided by using the hybrid memory as a rewritable read-only memory (ROM), where write operations happen relatively infrequently. Moreover, the CMOS bias voltage for reading the memory data can be reduced independently of the bias voltage of the rest of the CMOS circuit to decrease the power consumption of the memory. Because AQFP gates are sensitive to currents of a few µA, the power consumption of the hybrid memory can be considerably decreased.
A field-programmable gate array (FPGA) [17] is a kind of programmable logic device, an integrated circuit that can realize arbitrary logic functions. Although FPGAs were traditionally used to test the functionality of prototype systems, they have recently been used as accelerators in high-performance computing systems because of their reconfigurability and large-scale hardware parallelism. High-performance FPGAs composed of more than 1 M logic cells have been manufactured by using 16-nm FinFET technology [18]. In high-performance FPGAs, the performance per watt is one of the essential figures of merit. However, a large portion of the power of the latest CMOS FPGAs is consumed by the leakage currents of CMOS devices, which limits the device performance. Thus, superconducting integrated circuits are attractive for realizing FPGAs because of their energy-efficient operation. The first superconducting FPGA using RSFQ circuits was proposed in [19], in which an FPGA with two-by-two logic cells was designed and its hardware cost was estimated. Recently, an RSFQ-based FPGA using magnetic Josephson junction memories was proposed for the implementation of area-efficient switches [20]. Nevertheless, both of these studies are theoretical, and no actual system has been demonstrated so far.
In this article, we propose an AQFP-CMOS hybrid FPGA that uses a CMOS memory as a rewritable ROM. The power consumption of the CMOS memory can be significantly reduced by decreasing the supply voltage of the CMOS memory. We designed and implemented an AQFP-CMOS hybrid FPGA with two-by-two logic cells and estimated its hardware cost. Its functionality was examined at low speed.
II. AQFP-CMOS HYBRID FPGA
Fig. 1 shows a conceptual diagram of an AQFP-CMOS hybrid FPGA using a CMOS memory. The system is composed of an AQFP FPGA circuit and a CMOS memory. The AQFP FPGA circuit is a two-dimensional array of logic cells composed of logic blocks (LBs), switch blocks (SBs), and connection blocks (CBs). I/O blocks are connected to the periphery of the logic cell array. The CMOS memory defines the function of the AQFP FPGA circuit by applying µA-level currents to each AQFP gate. The CMOS memory is composed of a CMOS decoder and 8-T SRAM cells. The decoder and memory cells are driven by a typical CMOS supply voltage V_DD at the volt level, while the readout transistors are driven by a much lower supply voltage V_RDD at the mV level. The data of the CMOS memory are written directly from the room-temperature electronics through the CMOS decoder. The detailed design of the CMOS decoder and 8-T SRAM cells is described in [16].
Fig. 2 shows a circuit diagram of an 8-T SRAM cell and an AQFP buffer gate that detects the current from the 8-T SRAM cell. By separating the supply voltage V_RDD of the readout transistors from the supply voltage V_DD of the rest of the CMOS circuits, the readout current, and hence the power consumption, can be considerably reduced. The dependence of the measured output current I_RBL of the 8-T SRAM cell on the supply voltage V_RDD of the readout transistors is shown in Fig. 3. We used Rohm 0.18 µm CMOS devices, whose typical supply voltage is 1.8 V. The width of the readout transistor is 5 µm. As the current sensitivity of AQFP buffer gates is at the level of a few µA, we can evaluate the V_RDD necessary for reading the memory data. When we assume that the output current of the memory cells is 10 µA, we can choose V_RDD to be 3 mV, and the power consumption of the memory is estimated to be 30 nW/bit.
Each logic cell is composed of four circuit blocks: an LB, an SB, an input connection block (iCB), and an output connection block (oCB). The LB is a programmable logic gate with two inputs and one output, whose function is defined by a 4-bit look-up table (LUT). The SB is a programmable switch gate with four inputs and four outputs, where each output signal at the bottom and left nodes is selected from either the top or right nodes. The iCB is a programmable switch gate with two inputs and four outputs, where each output signal at the bottom nodes is selected from either of the right nodes. The oCB is a programmable switch gate with three inputs and two outputs, where the input signal from the top nodes is switched to the output nodes. To reduce the hardware cost, the direction of the signal flow in the routing channel (RC) is restricted as shown in Fig. 4.
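The 30 nW/bit readout power quoted above can be checked with a short worked equation. This is a sketch that assumes the per-bit power is simply the product of the readout supply voltage and the readout current, with leakage and decoder power neglected; the symbols P_bit and R_eff (the effective series resistance implied by the chosen operating point) are introduced here for illustration and do not appear in the original text.

```latex
\begin{align}
P_{\mathrm{bit}} &= V_{RDD}\, I_{RBL}
  = (3\ \mathrm{mV})(10\ \mu\mathrm{A}) = 30\ \mathrm{nW/bit}, \\
R_{\mathrm{eff}} &= \frac{V_{RDD}}{I_{RBL}}
  = \frac{3\ \mathrm{mV}}{10\ \mu\mathrm{A}} = 300\ \Omega .
\end{align}
```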

III. DESIGN OF AN AQFP FPGA CIRCUIT
Each AQFP circuit block was designed by using the AQFP logic cell library for the AIST 10 kA/cm² Nb high-speed standard process [21] based on the minimalist design approach [22]. Fig. 5 shows a circuit diagram of the LB, which consists of a 4-to-1 multiplexer (MUX) and a feedback-type delay latch [23]. Control data for the 4-to-1 MUX are supplied from a CMOS memory, which stores the LUT data for the LB. The circuit is driven by four-phase excitation clocks I_x1 through I_x4 so that the input data are output two clock cycles later. The circuit area and the junction number are estimated to be 345 × 520 µm² and 86, respectively. Fig. 6 shows circuit diagrams of the SB, iCB, and oCB. Their main circuit element is a 2-to-1 multiplexer (2M), which selects one of two input signals depending on a control signal from a CMOS memory. All the circuits are driven by four-phase excitation clocks so that the input data are output at the next clock cycle. The circuit areas of the SB, iCB, and oCB are 315 × 600, 315 × 350, and 315 × 600 µm², respectively. Their junction numbers are 96, 60, and 50, respectively. Fig. 7 shows a photomicrograph of a two-by-two AQFP FPGA circuit fabricated by using the AIST 10 kA/cm² Nb high-speed standard process. The system is composed of four LBs, nine SBs, four iCBs, and four oCBs. The total circuit area is estimated to be 1960 × 3350 µm². The junction number and the number of CMOS memory cells are listed in Table I. The four-phase excitation currents I_x1 through I_x4 are generated by combining two ac currents, I_ac1 and I_ac2, and one dc bias current, I_dc, whose designed amplitudes are 782, 782, and 1070 µA, respectively. The power consumption of the AQFP circuit is estimated to be 12.4 nW at 5 GHz, whereas that of the CMOS memories is 1.02 µW assuming V_RDD = 3 mV.
IV. MEASUREMENT OF AN AQFP-CMOS HYBRID FPGA
Fig. 8 shows a photograph of a two-by-two AQFP-CMOS hybrid FPGA using a CMOS memory.
In this article, we use a CMOS memory to control four AQFP LUTs. The total number of control lines from the CMOS memory is 16 because each LUT has four control lines. The CMOS memory is composed of a 4-to-16 decoder and sixteen 8-T SRAM cells, which were fabricated by using the Rohm 0.18 µm CMOS process. The Josephson chip and the CMOS chip are connected by Al wire bonding. Fig. 9 shows an example of the circuit configuration of the AQFP-CMOS hybrid FPGA in a functional test. Because the bottom two logic blocks malfunctioned, we tested the function of the top two logic blocks. By applying the control signals to the RC from room-temperature electronics, we defined the circuit configuration shown in Fig. 9. Fig. 10 shows an example of measured waveforms. In the first test sequence, we first loaded the data "0001" and "0110" into the CMOS memory by varying the address successively to set the functions of LB1 and LB2 to AND and XOR, respectively. Then, V_RWL was enabled to send the data to the AQFP circuits. In this test, V_RDD was set to 15 mV to obtain a large memory output current. After that, the input signals I_inA and I_inB were applied, and the output signals V_outC and V_outD were measured. One can see that the AND and XOR functions are properly obtained in LB1 and LB2. It should be noted that the output data from LB1 appear five cycles earlier than those from LB2. This is because the lengths of the RCs are different, as shown in Fig. 9. In the second test sequence, the data "1110" and "0111" were loaded into the CMOS memory to set the functions of LB1 and LB2 to NAND and OR, respectively. One can see that the correct functions were obtained in V_outC and V_outD. The measured operation margins for the excitation currents I_ac1, I_ac2, and I_dc are ±39% (439-990 µA), ±38% (439-969 µA), and ±24% (745-1224 µA), respectively.
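The LUT contents used in the two test sequences can be illustrated with a small behavioral model of a two-input LB. This is a minimal sketch, not the authors' design: it assumes that the character at position 2·A + B of the 4-bit string is the output for inputs (A, B), which is consistent with the four strings quoted above but cannot be distinguished from the swapped input order because all four functions are symmetric. The names `make_lut` and `truth_table` are hypothetical.

```python
def make_lut(bits):
    """Return a 2-input logic function defined by a 4-character bit string.

    Assumption (not stated in the paper): bits[2*a + b] is the output
    for inputs (a, b), so bits[0] is the (0, 0) entry and bits[3] the (1, 1) entry.
    """
    if len(bits) != 4 or any(ch not in "01" for ch in bits):
        raise ValueError("LUT data must be a string of four 0/1 characters")
    table = [int(ch) for ch in bits]

    def logic_block(a, b):
        # The 4-to-1 multiplexer selects one stored bit using the two inputs.
        return table[2 * a + b]

    return logic_block


def truth_table(lut):
    """List the LUT outputs for inputs (0,0), (0,1), (1,0), (1,1)."""
    return [lut(a, b) for a in (0, 1) for b in (0, 1)]


# LUT strings quoted in the functional test, paired with the expected function.
and_table = truth_table(make_lut("0001"))
xor_table = truth_table(make_lut("0110"))
nand_table = truth_table(make_lut("1110"))
or_table = truth_table(make_lut("0111"))
```

Under this bit ordering, the four strings reproduce AND, XOR, NAND, and OR, matching the functions reported for LB1 and LB2.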
V. DISCUSSIONS
The junction number, circuit area, memory cell number, and power consumption of the AQFP-CMOS hybrid FPGAs with two-by-two and 32-by-32 logic cells are listed in Table II. Note that these parameters increase in proportion to the number of logic cells as the system is scaled up. In the estimation of the power consumption of the CMOS memory, we assumed V_RDD to be 3 mV. The circuit area can be further reduced by revising the sparse layout of the current design. Furthermore, the use of a multilayer Josephson integrated process [24] will reduce the area of the circuits significantly. Our estimation indicates that the cell size could be reduced to approximately 40% of that in the current design. The double active-layered process, where Josephson junctions are fabricated in different metal layers, also helps to increase the circuit density [25].
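The proportional scaling stated for Table II can be sketched as a simple calculation. This is only an illustration under the assumption that every quantity scales strictly with the number of logic cells; the baseline values are the two-by-two estimates quoted in this article, the function name `scale_by_logic_cells` is hypothetical, and the resulting figures are not taken from Table II.

```python
def scale_by_logic_cells(values_2x2, n):
    """Scale two-by-two estimates to an n-by-n array of logic cells.

    Assumption: each quantity is strictly proportional to the number of
    logic cells, so the factor is (n * n) / (2 * 2).
    """
    factor = (n * n) / (2 * 2)
    return {name: value * factor for name, value in values_2x2.items()}


# Two-by-two estimates quoted in the text.
baseline = {
    "area_mm2": 6.56,        # total AQFP circuit area
    "aqfp_power_nw": 12.4,   # AQFP circuit power at 5 GHz
    "cmos_power_uw": 1.02,   # CMOS memory power at V_RDD = 3 mV
}

scaled_32x32 = scale_by_logic_cells(baseline, 32)
```

For a 32-by-32 array the scaling factor is 256 under this assumption.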
Although we use wire bonding to connect the AQFP and CMOS chips in the current demonstration, we plan to use flip-chip bonding with a pad pitch of approximately 150 µm in the next step, which will increase the number of AQFP-CMOS connections to several hundred. However, denser connections based on a Josephson/CMOS monolithic fabrication process have to be developed to realize larger systems. The estimation of the power consumption in Table II shows that about a hundred times more power is consumed in the CMOS memories than in the AQFP FPGA circuits. It should be noted that the power consumption of the CMOS memory can be further reduced by decreasing the ON-state resistance of the MOS device. This can be achieved simply by increasing the device width or by using a newer CMOS fabrication process. In this case, the lowest possible ON-state resistance is limited by the thermal noise current given by $I_N = (4 k_B T \Delta f / R)^{1/2}$, where k_B is the Boltzmann constant, T is the operating temperature, Δf is the bandwidth, and R is the ON-state resistance of the MOS device. For R = 1 Ω, Δf = 5 GHz, and T = 4.2 K, we can estimate the thermal noise current to be I_N = 1.1 µA, which is still smaller than the output signal current (∼10 µA) from the MOS device. For R = 1 Ω, the power consumption of the CMOS memory is estimated to be about 0.1 nW/bit, which would make the CMOS memory power values in Table II about two orders of magnitude smaller.
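The noise and power estimates above can be reproduced numerically. The sketch below evaluates the thermal noise current formula from the text and assumes that the per-bit readout power is the Joule dissipation I²R of the signal current in the ON-state resistance; the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def thermal_noise_current(resistance_ohm, bandwidth_hz, temperature_k):
    """Thermal noise current I_N = sqrt(4 k_B T df / R), in amperes."""
    return math.sqrt(4.0 * K_B * temperature_k * bandwidth_hz / resistance_ohm)


def readout_power(current_a, resistance_ohm):
    """Assumed per-bit readout power P = I^2 R, in watts."""
    return current_a ** 2 * resistance_ohm


# Values used in the text: R = 1 ohm, df = 5 GHz, T = 4.2 K, I = 10 uA.
noise_current = thermal_noise_current(1.0, 5e9, 4.2)
power_per_bit = readout_power(10e-6, 1.0)
signal_to_noise = 10e-6 / noise_current
```

With these inputs the noise current evaluates to about 1.08 µA, consistent with the 1.1 µA quoted above, and the per-bit power to 0.1 nW.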
As stated in Section III, the latency of each wiring block (SB, iCB, and oCB) is one cycle, whereas that of the logic block (LB) is two cycles. Therefore, the wiring latency between neighboring logic blocks is at least two cycles, and it increases in proportion to the distance between logic blocks. This interconnection latency is large compared with that of conventional AQFP circuits, whose maximum interconnection length is approximately 1 mm, and the disadvantage must be compensated by the system's reconfigurability in practical applications.
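The latency accounting described above can be sketched as a sum over the blocks a signal traverses. This is an illustrative model only: the per-block latencies are those stated in the text, while the example paths, the function names, and the conversion of one cycle to one clock period at 5 GHz are assumptions made here.

```python
# Latency of each block in clock cycles, as stated in the text.
LATENCY_CYCLES = {"LB": 2, "SB": 1, "iCB": 1, "oCB": 1}


def path_latency_cycles(path):
    """Total latency in cycles of a signal path given as a list of block names."""
    return sum(LATENCY_CYCLES[block] for block in path)


def cycles_to_seconds(cycles, clock_hz=5e9):
    """Convert cycles to seconds, assuming one cycle is one clock period."""
    return cycles / clock_hz


# Shortest wiring between neighboring logic blocks: one oCB and one iCB.
neighbor_wiring = path_latency_cycles(["oCB", "iCB"])

# Hypothetical longer route that also passes through two switch blocks,
# including the logic blocks at both ends.
long_route = path_latency_cycles(["LB", "oCB", "SB", "SB", "iCB", "LB"])
long_route_seconds = cycles_to_seconds(long_route)
```

Each additional SB on a route adds one cycle, which is why the latency grows with the distance between logic blocks.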
VI. CONCLUSION
In this article, we proposed a low-power area-efficient AQFP-CMOS hybrid FPGA, where a CMOS memory was used as a rewritable ROM. The energy consumption of the CMOS memory was considerably reduced by decreasing the supply voltage to the MOS device for reading out. The proposed AQFP FPGA circuit is composed of four circuit blocks: logic blocks, switch blocks, input connection blocks, and output connection blocks, to which control signals are supplied from the CMOS memory cells. We designed and implemented an AQFP-CMOS hybrid FPGA with 2 × 2 logic cells by using the AIST 10 kA/cm² Nb high-speed standard process and the Rohm 0.18 µm CMOS process. It was shown that the power consumption of the AQFP FPGA circuit is 12.4 nW at 5 GHz, whereas the power consumption of the CMOS memory was estimated to be 1.02 µW assuming a CMOS source voltage of 3 mV. We confirmed the correct operation of the AQFP-CMOS hybrid FPGA at low speed.
