Abstract-We have devised a digital time-division multiplexing (TDM) scheme for minimizing the circuit complexity required for an array of sensors. As a proof of concept, we have designed, fabricated, and tested a four-channel digital TDM readout circuit. The proposed scheme can be generalized to a larger number of channels. The readout circuit comprises an array of ADCs to digitize sensor outputs, a multiplexing unit, a clock controller, and a counter with a parallel-to-serial output interface. For demonstration purposes, we employed low-pass phase modulation-demodulation ADCs running in a synchronous mode of operation. To facilitate independent verification of circuit operation, we also placed an on-chip pattern generator to apply a unique pattern to each channel. The multiplexing unit is based on an array of sequentially triggered switches, each controlling the flow of data from a single ADC to a common output bus. In our scheme, the switches are realized using RS flip-flop cells with nondestructive readout (RSN), with only one RSN cell turned on at a time. Multiplexed output data were stored in a ripple counter based on T flip-flops and read out to room-temperature electronics using a serial interface. The chip was immersed in liquid helium at 4.2 K and extensively evaluated at sampling frequencies up to 12.8 GHz. By means of the embedded pattern generators, we verified the correct operation of each channel and of all four channels combined. We were also able to reconstruct a signal applied to an individual ADC. The chip was fabricated using HYPRES' 4.5-kA/cm² process with four Nb metal layers. We briefly discuss the scalability of the proposed scheme to fabrication processes with higher current density and smaller feature size.
I. INTRODUCTION
Arrays of cryogenic detectors and sensors find a variety of applications in astronomy, high-energy and nuclear physics instrumentation, and sensitive imaging. Superconductor mixed-signal integrated circuits (ICs) offer a compelling solution to the needs of such cryogenic detector arrays. Digitizing detector outputs at low temperature, close to the detectors, ensures naturally noise-immune digital transport to room temperature. This is especially important as the number of detectors increases, since analog signal transport is susceptible to crosstalk and noise pick-up. These ICs, which monolithically integrate analog and digital circuits, offer lower power consumption, higher sensitivity, and much higher radiation hardness than cryogenic semiconductor circuitry.
The advantages of using time-division, frequency-division, and code-division multiplexing in optimizing communication resources are well known. Naturally, these techniques have all been considered and implemented for cryogenic detector readout with superconductor circuits, mostly in the analog domain [1]-[4]. Frequency-division multiplexing with an array of resonators of different frequencies coupled to detectors is particularly attractive for microwave kinetic inductance detectors (MKIDs) [5], [6]. Time- and code-division multiplexing with SQUIDs have been demonstrated in the analog domain with great success for arrays of transition edge sensors (TESs) [7], [8].
Manuscript received September 6, 2016; accepted November 19, 2016. Date of publication December 8, 2016; date of current version December 28, 2016. This work was supported in part by the U.S. Department of Energy under Contract DE-SC0007659.
The authors are with HYPRES, Inc., Elmsford, NY 10523 USA (e-mail: asahu@hypres.com; tfil@hypres.com; masoud@hypres.com; dkir@hypres.com; gupta@hypres.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TASC.2016.2637336
The same concepts can be realized with a SQUID-based digitizer, where the sensor output is converted into a stream of single flux quantum (SFQ) pulses and digital logic is used to perform multiplexing and further processing. If the bandwidth of the analog signal is much smaller than the maximum sampling rate, one can take advantage of oversampling to simplify the digitizer front-end at the expense of including a digital filter to reduce the output sampling rate while increasing the number of bits of resolution. Since SFQ circuits are fast, the digitizer front-end can be simplified to a single SQUID [9] operating in asynchronous mode, or to a single-junction 1-bit quantizer [10] operating with a large oversampling ratio. By digitally multiplexing these 1-bit oversampled representations of sensor data into a common digital filter, one can greatly reduce the complexity and the power consumption of the overall digital readout scheme; the switching energy of Josephson junctions used in digital logic is in the $10^{-19}$ J range for 4-K operation. The digitizers and digital logic can be efficiently integrated on a Nb Josephson junction (JJ) integrated circuit chip.
In this paper, we describe the design and experimental results of a representative digital time-division multiplexing readout circuit chip. The chip demonstrated full functionality at clock frequencies above 12 GHz using a synchronous single-bit ADC 4-channel front-end.
II. DESIGN OF DIGITAL TDM CIRCUIT
Multiplexing permits sharing of resources, such as processing hardware and data communication channels, among multiple data sources. Time-division multiplexing is particularly well suited when the source data vary slowly compared to the speed of the digitizing and digital processing hardware, which is the case for readout of cryogenic sensors (such as TESs) with SFQ digital circuits. Time-division multiplexing involves interleaving samples from multiple such digitizers through a common data output channel. Fig. 1 depicts a digitized pulse where the dashed vertical lines represent digital samples approximating the analog amplitude. Let us assume that the signal is changing slowly enough that one can get a sufficiently accurate digital representation by computing over an interval that is $N$ times shorter than the sampling interval. This allows $N$ samples to be multiplexed; $N = 4$ in the example shown in Fig. 1.
1051-8223 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
A. Circuit Architecture
Both asynchronous [9] and synchronous [10], [11] quantizers suitable for cryogenic sensor readout, followed by digital counters that perform lowpass filtering, have been realized with superconductor circuitry. In our implementation, the multiplexer circuit is inserted between the quantizers and the common digital counter to maximize sharing of hardware resources. This is consistent with our digital-RF receiver architecture, in which we insert the digital switch matrix [12] between the analog-to-digital converter (ADC) front-ends and the digital processors (digital filter, channelizer, etc.) [13], [14].
The TDM circuit comprises a set of switches that controls routing of a set of inputs to a common output bus; only one switch is closed during any given time slot to avoid data collisions. In general, time slots may be assigned among the set of inputs according to the requirements of a communication system. While our switch elements, realized with non-destructive readout cells with set and reset inputs (called RSN), are fully programmable, we have simplified the multiplexing scheme to a cyclic sequence of successively reading out the inputs (see Fig. 2). This scheme is well suited for an array of sensors without additional information on preferential selectivity. With the goal of accommodating both synchronous and asynchronous digitizers, the multiplexer unit is preceded by a synchronizer.
A lowpass phase-modulation-demodulation (LP PMD) delta ADC [10] was chosen as the digitizer front-end (see Fig. 3) to demonstrate the multiplexing circuit. Its input transformer was slightly modified in the current design to provide input coupling of 7 μA per LSB. This is a synchronous (clocked) digitizer. Therefore, we have further synchronized four such front-ends using a common clock. Additionally, provisions were made to test the multiplexer with four easily recognizable patterns derived from the master clock. For example, these four patterns could be submultiples of the clock: $f_C/8$, $f_C/4$, $f_C/2$, and $f_C$.
B. Multiplexer Design
The 4:1 multiplexer circuit (see Fig. 4) successively selects each of four input channels for a predetermined time slot and merges their data onto a common output data bus. Four time slots make up a frame, which repeats. Channel selection is done using four sequentially triggered RSN switches. When an RSN is set, the switch is turned on and passes all SFQ pulses arriving at its input to its output, whereas none are passed if it is in the reset (off) state.
The circuit that we implemented sets the time slot for each channel at 256 master clock cycles. Therefore, a select clock ($f_S = f_C/2^8$) is derived by the clock controller and used to define the operation of the multiplexer. This clock is further divided by a factor of 4 to derive the frame clock ($f_F = f_S/4 = f_C/2^{10}$). The frame clock pulse proceeds from one channel to the next with a delay equal to the select clock period, thus setting the corresponding RSN switch. All switches are reset at select clock intervals just before one of them is set with the frame clock pulse.
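The cyclic selection described above can be sketched behaviorally in Python. This is only an illustrative model under stated assumptions, not the SFQ circuit itself: the names `selected_channel` and `multiplex` are hypothetical, channels are indexed 0 to 3, and the frame is assumed to start with channel 0.

```python
SLOT_CYCLES = 256   # master clock cycles per time slot (f_S = f_C / 2**8)
N_CHANNELS = 4      # time slots per frame (f_F = f_S / 4)


def selected_channel(cycle):
    """Return the index of the channel whose switch is on at this clock cycle.

    Assumption: the frame starts with channel 0 at cycle 0.
    """
    return (cycle // SLOT_CYCLES) % N_CHANNELS


def multiplex(streams, n_cycles):
    """Merge per-channel 1-bit streams onto one bus, one channel per slot.

    streams[ch][cycle] is the bit produced by channel ch at that cycle.
    Only the selected channel's bit reaches the bus; the others are dropped,
    mimicking switches that are in the reset (off) state.
    """
    return [streams[selected_channel(c)][c] for c in range(n_cycles)]


# Example: only channel 2 produces pulses, so only its slot carries ones.
streams = [[1 if ch == 2 else 0] * 1024 for ch in range(N_CHANNELS)]
bus = multiplex(streams, 1024)
```

In this model one frame spans 1024 master clock cycles, matching $f_F = f_C/2^{10}$.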
C. Integrated Circuit (IC) Design
A 10 mm × 10 mm readout chip (see Figs. 5 and 6), incorporating the 4:1 multiplexing scheme, was designed for fabrication in HYPRES' 4-layer 4.5 kA/cm² commercial foundry process. The four ADC front-ends (FEs) constitute a block, which is connected to the multiplexer section through delay-matched passive transmission lines (PTLs). The output of the multiplexer proceeds to a binary ripple counter (accumulator), consisting of a chain of 8 toggle flip-flops with destructive readout (TD cells [13]) to match the time slot ($\tau_S = 1/f_S = 256/f_C$) assigned to each input channel. The counter is emptied (read out destructively) at select clock intervals into a register that is serially read out using a read clock ($f_R = 8 f_S$) that is 8 times faster than the select clock. This register with serial readout is depicted as a parallel-to-serial (P2S) converter in Figs. 5 and 6. Standard SQUID-stack drivers [14] were used for two serial output data streams (T and RS types, see Fig. 6) and the three output clocks (frame, select, and read).
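As a rough behavioral illustration of the counter and P2S stage, the sketch below accumulates the pulses of one time slot in an 8-bit counter and shifts the result out serially. The function names are hypothetical, and the serial bit order (MSB first by default) is an assumption, since the text does not state it.

```python
COUNTER_BITS = 8  # eight toggle flip-flops, matching the 256-cycle time slot


def count_slot(bits):
    """Count the pulses in one time slot.

    Returns (value, carries): the 8-bit counter content at readout and the
    number of carry pulses that left the most significant stage.
    """
    total = sum(bits)
    return total % (1 << COUNTER_BITS), total >> COUNTER_BITS


def serialize(value, msb_first=True):
    """Shift an 8-bit word out one bit per read-clock cycle.

    The bit order is an assumption made for this sketch.
    """
    if msb_first:
        order = range(COUNTER_BITS - 1, -1, -1)
    else:
        order = range(COUNTER_BITS)
    return [(value >> i) & 1 for i in order]


# A slot with a pulse on every fourth clock cycle accumulates 64 counts.
value, carries = count_slot([1, 0, 0, 0] * 64)
word = serialize(value)
```

Each slot yields 8 serial bits, which is why the read clock runs 8 times faster than the select clock.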
Since a synchronizer was included in front of the multiplexer circuit, the ADC block can easily be replaced with a different set of synchronous or asynchronous digitizers that are best matched to the sensor needs. The clock controller (CC) produces the select clock ($f_S = f_C/2^8$), the frame clock ($f_F = f_C/2^{10}$), and the serial read clock ($f_R = f_C/2^5$).
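The divider ratios above translate directly into clock frequencies. The short sketch below evaluates them for a given master clock; `derived_clocks` is a hypothetical helper name, and the 12.8 GHz input is simply the highest test frequency quoted in this paper.

```python
def derived_clocks(f_c_hz):
    """Return the clocks produced by the clock controller, in Hz."""
    return {
        "select": f_c_hz / 2**8,   # f_S: one pulse per 256-cycle time slot
        "frame": f_c_hz / 2**10,   # f_F: one pulse per four time slots
        "read": f_c_hz / 2**5,     # f_R: eight read pulses per time slot
    }


clocks = derived_clocks(12.8e9)
for name, freq in clocks.items():
    print(f"{name:>6}: {freq / 1e6:g} MHz")
```

At a 12.8 GHz master clock this gives a 50 MHz select clock, a 12.5 MHz frame clock, and a 400 MHz read clock.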
III. MULTIPLEXER CHIP OPERATION
The multiplexer operation was verified first at low clock frequency (∼1 MHz) by applying test patterns to the four channels and observing the low-frequency monitors (see Fig. 6), which are toggle-flip-flop type SFQ/DC converters. One set of test patterns, shown in Fig. 7, was the master clock ($f_C$) and its submultiples ($f_C/2$, $f_C/4$, and $f_C/8$). The raw data, representing the time-multiplexed data at the input of the accumulator (ripple counter), show four distinct periodic patterns at the respective frequencies. When the pattern is at the clock frequency, the count is exactly 256, resulting in a single carry pulse output. The frame pulse, occurring once every four select pulses, is branched off at the input of the 4th RSN cell (see Fig. 4) for monitoring and appears at the beginning of the channel 4 time slot. Each channel could be set to produce one of three test patterns that are shown at the top-left corner of Fig. 7.
Next, the operation of the multiplexer, together with the counter and parallel-to-serial converter, was verified through the five high-frequency output drivers (HFDs) shown in Fig. 6. The outputs for one of the test pattern combinations are shown in Fig. 8. Here, the first channel has no signal (NS), while channels 2-4 have submultiples of the clock frequency ($f_C/8$, $f_C/4$, and $f_C/2$, respectively). The resultant counts in the four channels are 0, $2^5$, $2^6$, and $2^7$. The corresponding serialized output data stream shows a single "1" at the correct position. For example, channel 3 has a pattern at $f_C/4$, causing the count to be $2^8/4 = 2^6$ and the single "1" to be at the 7th bit of the 8-bit counter, counting from the least significant bit. The frame clock output through the HFD occurs at the input of channel 3 in Figs. 8-10. The HFDs permit the outputs to be observed at higher clock frequencies. Fig. 9 shows operation of the circuit at 6.4 GHz clock frequency for the same input test vector as in Fig. 8. The circuit was tested up to a maximum clock frequency of 12.8 GHz (see Fig. 10). Correct operation of the digitizer through the multiplexed readout scheme was also verified by applying a sinusoidal signal, acquiring the serial output, and plotting the digitized data, which represent the first derivative of the signal (see Fig. 11).
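The expected counts for this test vector follow from the 256-cycle time slot and can be checked with a few lines of Python. The helper names below are hypothetical, and bit positions are reported zero-based from the least significant bit.

```python
SLOT_CYCLES = 256  # master clock cycles per time slot


def expected_count(divider):
    """Pulses per time slot for a pattern at f_C / divider.

    A divider of None stands for the no-signal (NS) case.
    """
    return 0 if divider is None else SLOT_CYCLES // divider


def set_bit_index(count):
    """Zero-based position of the highest '1' bit, or None for a zero count."""
    return count.bit_length() - 1 if count else None


# Test vector of Fig. 8: NS, f_C/8, f_C/4, f_C/2 on channels 1-4.
dividers = [None, 8, 4, 2]
counts = [expected_count(d) for d in dividers]
bits = [set_bit_index(c) for c in counts]
```

A pattern at the full clock rate gives `expected_count(1) == 256`, which does not fit in 8 bits and therefore appears as the single carry pulse mentioned for the low-frequency test.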
IV. DISCUSSIONS
The multiplexed readout circuit features fast programmable digital switches (RSNs) between two parts of the basic digital readout circuit: the low-complexity front-end quantizer and the higher-complexity digital counter with a serializer. This enables reuse of the higher-complexity circuitry, lowering the overall power consumption. This architecture may be expanded in several ways. First, one can replace the shared digital circuitry with more functionally complex logic blocks, such as a digital decimation filter [13]. Second, the front-ends may be changed to other synchronous (clocked) quantizers, such as quasi-one-junction SQUID comparators [11] and their variants [15], [16], or to asynchronous quantizers, such as SQUIDs (or Bi-SQUIDs [17]) producing a frequency-modulated SFQ pulse stream. The choice of a particular front-end, with its sensitivity and bandwidth, can be tailored to a specific sensor and its application. For example, using a transformer turns ratio of 120, an input sensitivity of 4 nA per LSB was obtained [18]. Third, the use of fast RSN switches makes the multiplexer rapidly programmable using a control vector. Thus, any input combination can be implemented and dynamically reconfigured. Finally, the scheme could be extended to larger multiplexing factors and even to multiple levels of multiplexing.
Power consumption is an important parameter for all cryogenic applications. The integrated circuit described here was implemented with Rapid Single Flux Quantum (RSFQ) logic. The most complex digital part, the counter with serial readout, was previously implemented with ERSFQ logic [19], which has zero static power consumption. A comparable version having a 9-bit counter and parallel-to-serial converter was estimated to consume 6.4 μW at a clock frequency of 32 GHz. Converted to ERSFQ, all the digital parts will consume only about 10 μW. The current high-frequency drivers (HFDs) consume significant power, 100-200 μW each. These can be replaced with SFQ/DC converters to minimize on-chip power consumption [14]. In that case, transporting digitized data to room temperature will require additional cryogenic amplification, which could be distributed among warmer temperature stages of the cryogenic system [20], [21]. Three such SFQ/DC drivers would consume about 10 μW. So, the total power consumption of the digital readout part could be less than 20 μW, which would be shared among all the channels that are being multiplexed. The multiplexer, and the synchronizer preceding it, will indeed scale with the number of channels. The digitizer front-end designs vary considerably depending on the requirements, and their power consumption could be as low as 1 μW per channel.
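The budget above can be written as a simple back-of-the-envelope model. The sketch below is only illustrative: `readout_power_uw` is a hypothetical helper, and `per_channel_mux_uw` is a placeholder for the multiplexer and synchronizer power per channel, which the text does not quantify (it defaults to zero, so the result is a lower bound).

```python
def readout_power_uw(n_channels,
                     frontend_uw=1.0,          # "as low as 1 uW per channel"
                     per_channel_mux_uw=0.0,   # placeholder, not given in text
                     shared_digital_uw=10.0,   # digital parts in ERSFQ
                     shared_drivers_uw=10.0):  # three SFQ/DC output drivers
    """Estimate total chip power in microwatts for n_channels inputs."""
    shared = shared_digital_uw + shared_drivers_uw
    per_channel = frontend_uw + per_channel_mux_uw
    return shared + n_channels * per_channel


# Lower-bound estimates for the 4-channel case and for 100 channels.
p4 = readout_power_uw(4)
p100 = readout_power_uw(100)
```

The shared 20 μW term is amortized over all channels, so the per-channel terms dominate as the multiplexing ratio grows.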
The maximum number of channels that can be multiplexed depends on various factors, such as the speed of the sensor response, the requirements on amplitude resolution, and the maximum clock frequency ($f_C = 1/\tau_C$) of the digital measurement system. Let us assume that it is sufficient to sample the signal pulse $k$ times during its width ($\tau_{pulse}$) with $m$-bit resolution to acquire adequate information about its shape and time integral. As a result, the signal pulse is represented by $k$ points with $m$-bit resolution (see Fig. 1). Since each $m$-bit sample requires counting over $2^m$ clock periods, the maximum multiplexing ratio is $N_{max} = \tau_{pulse}/(2^m k \tau_C)$. If we assume $k = 8$, $m = 8$, $\tau_{pulse} = 10$ μs, and $f_C = 20$ GHz, then we can estimate $N_{max}$ to be around 100. We can also assess the limitations imposed by the hardware that can be accommodated on a single superconductor IC. For estimating hardware complexity, we will consider scaling our designs to an advanced process with finer lithography and more superconductor layers. For analysis, we take our current standard digital cell libraries for HYPRES' 4-layer 4.5 kA/cm² process [22] and MIT Lincoln Laboratory's 8-layer 10 kA/cm² process [23], called SFQ5ee. The area of the main digital blocks in the chip shown in Fig. 5, implemented in HYPRES' 4-layer 4.5 kA/cm² process, is about 8 mm². Laid out in the SFQ5ee process, this area will shrink by a factor of 9 to less than 1 mm². This will allow most of the chip area to be used for digitizer front-ends and the interconnect pads that couple individual sensors to them. We estimate the areas of the pad and the sensitive dc-SQUID front-end to be about 0.08 and 0.06 mm², respectively. This would permit well over 100 such front-ends, together with the multiplexer and the shared digital counter, to be accommodated on a 5 mm × 5 mm SFQ5ee IC. The power consumption of such a 100-channel digital multiplexed readout chip could be in the 200-500 μW range.
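Both estimates in this paragraph can be reproduced numerically. The sketch below uses hypothetical function names; the area estimate assumes that everything outside the shared digital blocks (taken conservatively as 1 mm²) is available for pads and front-ends, ignoring wiring and bias overhead.

```python
import math


def max_mux_ratio(tau_pulse_s, f_c_hz, k, m):
    """N_max = tau_pulse / (2**m * k * tau_C), with tau_C = 1 / f_C."""
    return tau_pulse_s * f_c_hz / (2**m * k)


def frontends_that_fit(chip_mm2=25.0, digital_mm2=1.0,
                       pad_mm2=0.08, squid_mm2=0.06):
    """Number of pad-plus-front-end pairs that fit in the remaining area."""
    return math.floor((chip_mm2 - digital_mm2) / (pad_mm2 + squid_mm2))


# Values quoted in the text: k = 8, m = 8, 10 us pulse, 20 GHz clock.
n_max = max_mux_ratio(10e-6, 20e9, k=8, m=8)
n_fit = frontends_that_fit()
```

With these inputs `n_max` evaluates to about 97.7, matching the estimate of around 100, and `n_fit` evaluates to 171, in line with "well over 100" front-ends on a 5 mm × 5 mm chip.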
V. CONCLUSION
We have demonstrated a 4-channel digital multiplexed readout chip, implemented with RSFQ logic, operating at clock frequencies up to 12.8 GHz. The chip includes four synchronous 1-bit LP PMD ADCs as digitizer front-ends, a digital counter with serial readout, and a time-division multiplexing circuit placed between the ADC front-ends and the counter. This architecture minimizes hardware complexity and power consumption, especially when realized with lower-power logic families, such as ERSFQ.
