Introduction
The project "Advanced Single Flux Quantum Devices" originated as a sequel to the large-scale DoD URI program "Advanced Superconductor Digital Electronics". The goal of the present effort is the further development of ultrafast "Rapid Single-Flux-Quantum" (RSFQ) devices and circuits, which store, transfer, and process digital bits encoded as single quanta of magnetic flux. This approach offers several key advantages over other digital superconductor technologies, most importantly extremely high operating speed and extremely low power consumption.
This report describes the progress achieved during the project. Because the two projects mentioned above overlap substantially, some of the research described below should be considered as supported in part by the URI program as well. Additional leverage was provided by the project "Superconductor Technology for HTMT Computer Architecture", supported by DARPA, NSA, and NASA via JPL.
Major Research Results
A. Fundamentals and Logistics
Fluctuation Effects on RSFQ Circuits (K. Likharev and A. Rylyakov [1])
We have analyzed how fluctuations affect the relation between the ultimate speed, power consumption, and bit error rate (BER) of RSFQ circuits. The theoretical analysis was carried out within two simple models that take into account both thermal and quantum fluctuations. Experimentally, we have measured the BER of an XOR gate (in the range from 10^-9 to 10^-13) as a function of bias point position (near the margins of the operating range), clock rate (from 0 to 25 GHz), and dc bias voltage (which determines the static power dissipation). The results of this analysis have been used to improve the design of complex RSFQ circuits (see Sec. B below). We have also developed a test circuit that allows the study of random parameter spreads of a large number of Josephson junctions and other key components of RSFQ circuits, as well as the effects of thermal fluctuations, using just a few contact pads. The circuit is based on the sequential propagation of flux quanta along an RSFQ shift register. This approach is very convenient for characterizing emerging superconductor integrated-circuit fabrication technologies.
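The BER range quoted above (down to 10^-13) also sets the length of each measurement. As a rough illustration of our own, not necessarily the procedure used in [1], the sketch below assumes independent bit errors (Poisson statistics) and one tested bit per clock cycle, and computes how many error-free bits, and hence how much test time at the 25 GHz upper clock rate, are needed to bound the BER at 95% confidence. The function names are hypothetical.

```python
import math


def bits_needed(ber_bound, confidence=0.95):
    """Error-free bits needed to claim BER < ber_bound at the given confidence.

    Assumes independent errors (Poisson statistics): observing zero errors in
    n bits bounds the BER by -ln(1 - confidence) / n.
    """
    return -math.log(1.0 - confidence) / ber_bound


def duration_s(ber_bound, clock_hz, confidence=0.95):
    """Wall-clock time to accumulate those bits at one bit per clock cycle."""
    return bits_needed(ber_bound, confidence) / clock_hz


for ber in (1e-9, 1e-11, 1e-13):
    print(f"BER < {ber:.0e}: {duration_s(ber, 25e9):10.2f} s at 25 GHz")
```

At 25 GHz the 10^-9 bound takes about 0.12 s of error-free operation, while 10^-13 takes about 20 minutes; each additional decade of BER multiplies the test time by ten.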
Credit-Based Flow Control in RSFQ Micropipelines (D. Zinoviev and M. Maezawa [3])
Micropipelines are used for the asynchronous delivery of data and control signals. Their operation is based on a request/acknowledge mechanism that does not require a global clock; this advantage is of key importance for RSFQ circuits because of their extremely high speed. Traditional RSFQ micropipelines are simple and reliable, but their throughput is limited by the round-trip flight time between two consecutive micropipeline stages. This time can be large, especially for across-the-chip or chip-to-chip transfers. We have proposed an RSFQ version of the simple credit-based flow control mechanism first applied by Kung et al. to ATM networks. This mechanism can partly hide the round-trip latency and significantly improve micropipeline throughput. Implementing N-credit flow control requires N-1 additional buffers per micropipeline stage and a credit pool that reflects the free storage available in the next stage.
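The throughput argument can be illustrated with a cycle-level sketch of a single link (our own simplified model, not the RSFQ circuit of [3]). The sender starts with N credits, spends one per item sent, and regains it one flight time after the receiving stage frees the slot; the receiver is assumed to free slots immediately, and the function name is hypothetical. With N = 1 the model reduces to a plain request/acknowledge handshake limited to one item per round trip, while N credits raise the throughput toward min(1, N/RTT).

```python
from collections import deque


def credit_link_throughput(credits, one_way_delay, cycles):
    """Fraction of cycles in which an item is delivered across one link.

    credits: initial credit count N (free slots in the next stage).
    one_way_delay: flight time in cycles for data and, symmetrically, credits.
    The receiving stage is assumed to free a slot as soon as an item arrives.
    """
    data_in_flight = deque()     # arrival times of items at the receiver
    credits_in_flight = deque()  # arrival times of credits back at the sender
    delivered = 0
    for t in range(cycles):
        # Returned credits become available to the sender.
        while credits_in_flight and credits_in_flight[0] <= t:
            credits_in_flight.popleft()
            credits += 1
        # The sender issues at most one item per cycle, and only with a credit.
        if credits > 0:
            credits -= 1
            data_in_flight.append(t + one_way_delay)
        # Items reaching the receiver free a slot, whose credit flies back.
        while data_in_flight and data_in_flight[0] <= t:
            data_in_flight.popleft()
            delivered += 1
            credits_in_flight.append(t + one_way_delay)
    return delivered / cycles


for n in (1, 2, 4, 8):
    print(n, credit_link_throughput(n, one_way_delay=4, cycles=800))
```

With a one-way delay of 4 cycles (round trip 8), the sketch gives 0.125, 0.25, 0.5, and 0.995 items per cycle for N = 1, 2, 4, 8; the last value falls slightly short of 1 only because of the start-up latency.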
Delay Insensitive RSFQ Circuits with Negligible Static Power Dissipation (S. Polonsky [4])
The total power dissipation in RSFQ circuits consists of two parts, dynamic and static. Dynamic power is dissipated in Josephson junctions performing useful logic and data transmission operations. This dissipation is fundamental and proportional to the data rate (at 4 K, of the order of 10^-18 J per bit). Static power is dissipated in the resistors that RSFQ circuits use to distribute dc bias current among Josephson junctions. This part of the power dissipation is not intrinsic to RSFQ circuits and can in principle be eliminated. We have shown that some Delay Insensitive (DI) RSFQ primitives suggested earlier by our group can be modified so that resistors are no longer required in the dc power distribution network, and on-chip static power dissipation is essentially absent.
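The relative size of the two contributions can be estimated with a short calculation. The sketch uses the common rough estimate of Ic*Phi0 dissipated per junction switching, and static power equal to bias current times supply voltage; the critical current, bias fraction, supply voltage, and clock rate below are hypothetical illustrative values, not parameters of the circuits in [4].

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum h/2e, in Wb (V*s)


def dynamic_power_w(ic_a, switch_rate_hz):
    """Dynamic power of one junction: roughly Ic * Phi0 per 2*pi switching."""
    return ic_a * PHI0 * switch_rate_hz


def static_power_w(bias_a, supply_v):
    """Static power burnt in the bias network: bias current times supply voltage."""
    return bias_a * supply_v


# Hypothetical values, chosen only for illustration.
ic = 100e-6          # junction critical current, 100 uA
bias = 0.7 * ic      # dc bias current fed through a resistor
supply = 2.5e-3      # dc supply voltage, 2.5 mV
rate = 20e9          # one switching per cycle at 20 GHz

p_dyn = dynamic_power_w(ic, rate)
p_stat = static_power_w(bias, supply)
print(f"dynamic {p_dyn:.2e} W, static {p_stat:.2e} W, ratio {p_stat / p_dyn:.0f}")
```

With these assumed numbers the bias network dissipates roughly 40 times more than the switching itself, which illustrates why eliminating the bias resistors is attractive.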
VHDL Simulation of RSFQ Circuits (P. Bunyk, P. Litskevich, and D. Yu. Zinoviev)
The physical-level simulation of large RSFQ circuits (more than a thousand Josephson junctions) is prohibitively time-consuming, so logical-level simulation is required. For this purpose we have adapted our design methodology to an industry-standard hardware description language, VHDL. Every RSFQ cell is modeled as a finite-state Mealy machine described in VHDL; this description is generated automatically from the machine's graphical representation. Each transition of the machine (induced by a single flux quantum arriving at a cell input) is tagged with a transition time. This time is calculated by interpolating the results of physical-level simulation, and is parametrized by global parameters such as critical current density or dc bias voltage. A fair interpolation can be obtained with a rational function fitted to the delays calculated at the central point and at the boundaries of the operating region. Special precautions ensure that RSFQ cells optimized individually preserve their timing properties and operating margins when connected together. This approach has been applied, for example, to the simulation of a 32-bit carry-lookahead pipelined RSFQ adder (see Sec. B below).
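The modeling scheme can be sketched in Python (the group's VHDL models are not reproduced here). The example assumes an RSFQ D flip-flop as a two-state Mealy machine and a single normalized global parameter x, with x = -1 and x = +1 at the boundaries of the operating region and x = 0 at its center. The rational form (a + b*x)/(1 + c*x), the three delay values, and all class and function names are hypothetical choices for illustration.

```python
from dataclasses import dataclass


def fit_rational(points):
    """Fit d(x) = (a + b*x) / (1 + c*x) through exactly three (x, d) points.

    Each point gives the linear equation a + b*x - c*x*d = d, so the three
    coefficients follow from a 3x3 system, solved here by Cramer's rule.
    """
    rows = [[1.0, x, -x * d] for x, d in points]
    rhs = [d for _, d in points]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    base = det3(rows)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        coeffs.append(det3(m) / base)
    a, b, c = coeffs
    return lambda x: (a + b * x) / (1 + c * x)


@dataclass
class Transition:
    next_state: int
    emits_pulse: bool
    delay_ps: float


class DffCell:
    """Hypothetical RSFQ D flip-flop as a two-state Mealy machine.

    State 1 means a flux quantum is stored. A 'data' pulse stores a quantum
    without output; a 'clk' pulse emits an output pulse only if a quantum was
    stored and always returns the cell to state 0.
    """

    def __init__(self, clk_to_out):
        self.state = 0
        self.clk_to_out = clk_to_out  # callable: global parameter x -> delay in ps

    def step(self, event, x):
        if event == "data":
            # A repeated data pulse leaves the state at 1 (the extra quantum is
            # assumed to escape); the input-side delay is not modeled here.
            t = Transition(1, False, 0.0)
        elif event == "clk":
            t = Transition(0, self.state == 1, self.clk_to_out(x))
        else:
            raise ValueError(f"unknown input {event!r}")
        self.state = t.next_state
        return t


# Hypothetical clock-to-output delays (ps) at the lower boundary, center,
# and upper boundary of the operating region.
delay = fit_rational([(-1.0, 6.0), (0.0, 5.0), (1.0, 4.5)])
cell = DffCell(delay)
cell.step("data", 0.5)
t = cell.step("clk", 0.5)
print(t.emits_pulse, round(t.delay_ps, 3))
```

A three-point rational fit reproduces the simulated delays exactly at the center and both boundaries, and, unlike a parabola, can follow the steep rise of delay typically seen toward one edge of the margins.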
B. Development of Complex RSFQ Circuits and Systems
A/D Converter (V. Semenov, Yu.A. Polyakov, and T. Filippov [5])
We have designed several fully operational superconductor analog-to-digital converters (ADCs) with performance comparable to or higher than that of the best semiconductor counterparts. Each device consists of two basic parts: a "fundamental" ADC operating at a sampling rate of about 20 GHz, and a digital decimation filter that attenuates high-frequency noise components and then reduces the sampling rate to match the bandwidth of the input signal. Each part exists in a few interchangeable versions, so about half a dozen different superconductor chips were designed, fabricated, and tested. The chips differ in input bandwidth (from 10 to 100 MHz) and expected accuracy (12 to 17 bits). All the chips, which use 2,000 to 3,000 Josephson junctions, were fabricated at HYPRES, Inc. with the standard 1000 A/cm^2 Nb-trilayer technology. The chips have been successfully tested at low frequency.
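The report does not specify the structure of the decimation filter. As one common possibility, the sketch below uses a cascaded integrator-comb (sinc^K) decimator on a single-bit stream. The first-order modulator producing that stream is a hypothetical stand-in for the "fundamental" ADC, and the decimation ratio r = 64 and order 2 are arbitrary illustrative choices.

```python
def first_order_modulator(x, n):
    """Hypothetical stand-in for the 'fundamental' ADC: a first-order
    single-bit modulator driven by a constant input 0 <= x <= 1."""
    acc, bits = 0.0, []
    for _ in range(n):
        acc += x
        if acc >= 1.0:
            bits.append(1)
            acc -= 1.0
        else:
            bits.append(0)
    return bits


def cic_decimate(bits, r, order):
    """Cascaded integrator-comb (sinc^order) filter with decimation ratio r.

    Integrators run at the input rate, one sample in r is kept, and the combs
    (differential delay 1) run at the output rate. Outputs are divided by the
    dc gain r**order so that they are on the same 0..1 scale as the input bits.
    """
    integrators = [0] * order
    comb_delay = [0] * order
    out = []
    for n, b in enumerate(bits):
        v = b
        for k in range(order):
            integrators[k] += v
            v = integrators[k]
        if (n + 1) % r == 0:  # decimation: keep one sample in r
            y = v
            for k in range(order):
                y, comb_delay[k] = y - comb_delay[k], y
            out.append(y / r ** order)
    return out


stream = first_order_modulator(0.25, 64 * 8)  # input for 8 output samples
print(cic_decimate(stream, r=64, order=2))
```

After the first output, which is a start-up transient, every decimated sample equals 0.25, the mean of the single-bit stream; the integrator/comb split lets the high-rate part use only additions.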
D/A Converters (V. Semenov, Yu.A. Polyakov, and P.N. Shevchenko [6])
We are in the process of testing several prototypes of a digital-to-analog converter (DAC) with input data rates up to 12 MHz. The converters consist of a set of dc voltage multipliers (VMs), each longer (and hence with higher gain) than the previous one by a factor of 8. The VMs are connected in series for dc output and are independently controlled by a digital RSFQ circuit. For each dc VM, this circuit generates a train of SFQ pulses with frequency (and hence average voltage) proportional to the corresponding 3 bits of the input binary code. Because the VM gains differ by successive factors of 8, the total average output voltage is exactly proportional to the input code. The estimated parameters of the device are: output (load) current margins of ±0.1 mA, 256,000 quantization levels, and a bandwidth over 1 MHz. We are working on two versions of the device with different output voltage magnitudes (up to 10 mV and up to 500 mV). Our major goal now is to demonstrate the outstanding accuracy of the conversion, which is expected to be comparable to that of Josephson dc voltage standards.
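The proportionality argument can be checked numerically. The sketch idealizes each VM as a chain obeying the Josephson relation V = N*Phi0*f, with the k-th VM containing n0*8^k junctions and driven at f_step times its 3-bit digit; n0, f_step, and the function names are hypothetical parameters, not values from the prototypes.

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum, Wb


def octal_digits(code, n_vm):
    """Split a binary input code into n_vm 3-bit groups, least significant first."""
    if not 0 <= code < 8 ** n_vm:
        raise ValueError("code does not fit in the available VMs")
    return [(code >> (3 * k)) & 0b111 for k in range(n_vm)]


def dac_voltage(code, n_vm, n0, f_step_hz):
    """Total dc output of n_vm series-connected voltage multipliers."""
    v = 0.0
    for k, digit in enumerate(octal_digits(code, n_vm)):
        n_junctions = n0 * 8 ** k             # VM length, and hence gain, grows by 8
        pulse_rate = digit * f_step_hz        # SFQ pulse rate set by this VM's 3 bits
        v += n_junctions * PHI0 * pulse_rate  # Josephson relation V = N*Phi0*f
    return v


code = 0o4517  # a hypothetical 12-bit input code for four VMs
print(dac_voltage(code, n_vm=4, n0=1, f_step_hz=1e9), PHI0 * 1e9 * code)
```

Both printed values agree up to floating-point rounding, because the weighted digit sum reconstructs the input code exactly.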
Digital SQUID (V. Semenov and Yu.A. Polyakov [7])
We are testing the first full-scale digital SQUID based on RSFQ technology, which is expected to outperform its analog counterparts. The SQUID, which will be described in a separate report, incorporates a 4-loop pickup coil and an analog-to-digital converter. The device is fabricated using the standard 1000 A/cm^2 Nb-trilayer technology and occupies an area of 5x10 mm^2 (including the 4x5 mm^2 pickup coil). The single-bit digital signal produced by the quantizer at a 20 GHz sampling frequency is digitally filtered and decimated to the Nyquist frequency of 200 kHz. At this output frequency, 24 useful bits of the differential code are transferred to room-temperature electronics serially via a single digital channel using phase-shift keying (PSK). The performance goals of the device are a field resolution of 2 fT/Hz^1/2 and a dynamic range in excess of 200 dB for a 1 kHz signal bandwidth.
Digital Autocorrelator (A. Rylyakov and Yu.A. Polyakov [8])
We are testing a fully integrated all-digital one-bit RSFQ autocorrelator for short-millimeter and submillimeter wave spectrometry applications. The 16-channel device, complete with a 16x9 array of binary counters, an on-chip double-oversampling quantizer, and an on-chip clock, is operational at clock speeds of up to 11 GHz. The design contains 1672 Josephson junctions, with an estimated total power dissipation of less than 0.1 mW. For high-speed testing of the device we have developed a specialized 16-channel room-temperature interface capable of real-time data acquisition at an output rate of 16 Mbps per channel. Extensive high-speed on-chip testing of the autocorrelator has also been performed in both analog and digital modes, the latter with an additional on-chip clock controller. We have also developed the concept of a 128-channel autocorrelator system built from 8 identical independent chips with an estimated count of about 2,500 Josephson junctions per chip, i.e., within reach of present-day fabrication technology.
Integer Adder
We have carried out the parallel design of three versions of the first large-scale RSFQ functional unit for high-performance computing: an integer adder based on the Kogge-Stone carry-lookahead architecture. The adder versions differ in signal representation and timing: (a) the standard RSFQ approach with "Clock Follows Data" timing, (b) a data-driven dual-rail approach, and (c) a delay-insensitive dual-rail approach.
A critical comparison of the simulation results has shown that choosing among these versions lets one trade simplicity for higher throughput, lower latency, and greater robustness to fabrication spreads. Thus the final choice of version should depend on how far the fabrication technology has progressed.
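For readers unfamiliar with the Kogge-Stone scheme, the sketch below models only its logic: bitwise generate and propagate signals are combined in a parallel-prefix tree in which each level merges groups twice as far apart as the previous one. It says nothing about RSFQ timing or the dual-rail encodings compared above, and the function name is hypothetical.

```python
def kogge_stone_add(a, b, width=32):
    """Add two unsigned integers with a Kogge-Stone parallel-prefix carry tree.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    # Bitwise generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        # Each level merges bit i's group with the group ending dist bits lower.
        G, P = (
            [G[i] | (P[i] & G[i - dist]) if i >= dist else G[i] for i in range(width)],
            [P[i] & P[i - dist] if i >= dist else P[i] for i in range(width)],
        )
        dist *= 2
    # G[i] is now the carry out of bits 0..i, i.e. the carry into bit i + 1.
    carries = [0] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, G[width - 1]


print(kogge_stone_add(12345, 67890))
```

For a 32-bit width the loop runs five levels (dist = 1, 2, 4, 8, 16), so the carry depth grows only as log2 of the width, which is what makes the architecture attractive for a deeply pipelined adder.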
Floating-Point Adder (P. Litskevitch)
We have carried out a preliminary design of an even more complex RSFQ functional unit: a floating-point adder based on the Oberman-Flynn architecture. The results indicate that the floating-point adder may operate at the same rate as the integer adder, with slightly longer latency.
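The Oberman-Flynn architecture itself is not reproduced here. The sketch below shows only the basic align-add-normalize sequence that a floating-point adder performs for effective addition of two positive operands, using a hypothetical 8-bit significand and truncation instead of IEEE rounding; the function names are illustrative.

```python
P = 8  # hypothetical significand width in bits


def fp_add_positive(x, y):
    """Add two positive normalized numbers given as (significand, exponent).

    A pair (m, e) stands for m * 2**e with 2**(P - 1) <= m < 2**P.
    Effective addition only; the result is truncated rather than rounded.
    """
    (mx, ex), (my, ey) = x, y
    if ex < ey:  # swap so that x carries the larger exponent
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    aligned = my >> (ex - ey)  # alignment: shift the smaller operand right
    total = mx + aligned       # significand addition
    if total >= 1 << P:        # sum in [2**P, 2**(P + 1)): renormalize by one bit
        total >>= 1
        ex += 1
    return total, ex


def to_float(v):
    m, e = v
    return m * 2.0 ** e


one = (128, -7)           # 1.0
one_and_half = (192, -7)  # 1.5
quarter = (128, -9)       # 0.25
print(to_float(fp_add_positive(one_and_half, quarter)))
```

A complete adder must also handle effective subtraction, where cancellation can require a long left normalization shift; multi-path adder designs treat that case on a separate path, which is one reason a floating-point adder can match the integer adder's rate at the cost of extra latency.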
C. Reviews
Besides the original work described above, a review of RSFQ technology was written during the report year [11]. Its short version will be presented as an invited talk at the forthcoming Applied Superconductivity Conference [12].
