Abstract-While high-speed analog-to-digital converter (ADC) front-ends in serial link receivers enable flexible and powerful digital signal processing-based (DSP-based) equalization, the robustness and power consumption of these ADCs can limit overall receiver energy efficiency. This paper presents a 25 GS/s 6b 8-way time-interleaved multi-bit search ADC that employs a soft-decision selection algorithm to relax track-and-hold (T/H) settling requirements and improve ADC metastability tolerance. T/H bandwidth is also improved with a new shared-input double-tail three-latch structure. Fabricated in general purpose 65 nm CMOS, the ADC occupies 0.24 mm² total area. A signal-to-noise and distortion ratio (SNDR) of 29.6 dB is achieved at Nyquist while consuming 88 mW from a 1 V supply, translating into a figure-of-merit of 143 fJ/conversion step. A measured <10⁻¹⁰ metastability error rate demonstrates the effectiveness of the soft-decision selection algorithm.
of more bandwidth-efficient modulation schemes, such as four-level pulse-amplitude modulation (PAM4). Serial links, which utilize an analog-to-digital converter (ADC) receiver front-end (Fig. 1), offer a potential solution, as they enable more powerful and flexible digital signal processing (DSP) for equalization and symbol detection [1]-[4] and can easily support advanced modulation schemes. Moreover, the DSP back-end provides robustness to process, voltage, and temperature variations, and benefits from improved area and power with CMOS technology scaling. However, most of the state-of-the-art ADC-based receiver implementations [1]-[4] display higher power relative to their mixed-signal counterparts [5] because of the significant power consumed by conventional multi-GS/s ADC implementations. This motivates exploration of energy-efficient ADC designs with moderate resolution and very high sampling rates to support data rates at or above 50 Gb/s.
Time-interleaving architectures with multiple unit ADCs working at a lower sampling rate are generally employed to achieve sampling rates larger than 10 GS/s, with flash and successive approximation register (SAR) converters often utilized [6]-[9]. Flash ADCs [6], [7] can operate at high sampling rates with a relatively small number of unit ADCs. However, because all of the comparators switch in every conversion, the parallel conversion approach of a flash ADC results in high power consumption as the resolution approaches 6 bits. Conversely, SAR ADCs offer excellent power efficiency with a minimal number of comparators performing a binary search conversion. Unfortunately, it is challenging to push the unit ADC sampling speed significantly beyond 1 GS/s, resulting in a very high channel count to obtain a high aggregate sampling rate [8], [9].
Binary search ADCs [10] adopt an energy-efficient binary search conversion algorithm similar to that of SAR ADCs, but without any digital-to-analog converter (DAC) settling time or SAR logic delay. While this allows for a potentially higher conversion speed, the multi-stage operation can still limit the achievable sampling rate. This issue has been addressed in SAR ADCs that employ a multi-bit per stage conversion [11]. However, as shown in Fig. 2, the multiple DACs and comparators required to enable multi-bit conversion in a SAR ADC result in significant area and power overhead. Fortunately, multi-bit search ADCs can be implemented by directly applying this multi-bit per stage conversion algorithm to binary search ADCs with minimal hardware overhead to enable higher sampling rates at excellent power efficiency.
However, key challenges exist in the efficient implementation of a multi-bit search ADC. One issue is that the multi-stage operation is inherently prone to metastability errors, which can dramatically degrade ADC signal-to-noise and distortion ratio (SNDR) [12] and system performance in serial I/O receivers [13]. Multi-bit search ADCs also suffer from an exponential hardware complexity similar to that of flash ADCs, resulting in a large load capacitance for the track-and-hold (T/H). Although reference prediction techniques [14] can reduce comparator count, achieving the maximum benefit of this approach involves the use of relatively slow unit ADCs, which employ multiple single-bit stages.
This paper presents an 8-channel time-interleaved (TI) 25 GS/s, 6b ADC with 3.125 GS/s unit ADCs, employing a two-stage asynchronous multi-bit search structure that consists of a 2b first-stage and a 4b second-stage that addresses these issues [15] . In order to improve T/H bandwidth and ADC metastability, a soft-decision selection algorithm is introduced and analyzed in Section II. Section III presents the ADC architecture and key circuit blocks, including a shared-input double-tail three-latch structure utilized to reduce the comparator loading of the T/H circuit. Experimental results from a general purpose (GP) 65 nm CMOS prototype are presented in Section IV. Finally, Section V concludes this paper.
II. SOFT-DECISION SELECTION ALGORITHM
A conventional asynchronous multi-bit search algorithm works in a decision ripple fashion, where the search space at each stage other than the first-stage depends on the decisions from the previous conversion stages. While an efficient search operation is achieved by each conversion stage generating hard decisions and having non-overlapping search spaces for the following stages, decision errors occurring at a certain stage other than the final stage result in an erroneous subsequent search space and produce conversion errors at the ADC output. A redundant SAR algorithm [16] tolerates these hard decision errors by overlapping the search space of the following stages, such that the errors can be recovered from the redundancy introduced in the overlapped region. However, increasing the search space translates into more triggered comparators and results in degraded power efficiency for low-to-medium resolution ADCs. In addition, for a conventional asynchronous multi-bit search algorithm, each conversion stage will not be triggered until the previous stage decisions ripple to the current stage. Therefore, if a decision stage experiences metastability and takes an excessive amount of time to generate the ripple signal, the following stages will not have enough time for conversion and will also result in conversion errors at the ADC output. T/H settling errors are a major source of decision errors at the critical first conversion stage. As shown in Fig. 3 , a conventional T/H circuit consists of a bootstrapped switch followed by a buffer to drive the ADC. This buffer is implemented to isolate the ADC input loading from the sampling switch, since the bandwidth of the front-end sampling switch should be larger than the ADC bandwidth to maintain a good dynamic range when the ADC input is close to the Nyquist frequency. 
Ideally, the buffer output V_TH should track the sampling switch output V_SW with minimal phase delay and generate a sampled input at the instant when the sampling switch is turned off. However, the large capacitive loading from the ADC and routing can result in a high-power design in order to preserve a high-bandwidth buffer output node. When the sampling switch is turned off, the buffer output will settle to the voltage held at the switch output at a rate determined by the settling time constant. Reducing the buffer's output bandwidth to save power results in the T/H output tracking the switch output with a phase and gain error, as shown by the black and blue V_TH curves of Fig. 3. In a conventional multi-bit search ADC, during the hold phase, the T/H output should settle within 0.5 LSB when the ADC starts conversion. A slower T/H settling time results in less time for ADC conversion and a lower conversion speed.
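The 0.5 LSB settling requirement can be related to the buffer time constant with a short calculation. The following is a minimal sketch that assumes a single-pole buffer output and a worst-case full-scale step at the start of the hold phase; both are simplifying assumptions rather than the simulated behavior of the implemented T/H, and the function name is hypothetical.

```python
import math


def settling_time_constants(n_bits: int, error_lsb: float = 0.5) -> float:
    """Number of time constants needed for a full-scale step to settle
    to within `error_lsb` LSB, assuming single-pole (exponential) settling.

    The residual error after time t is FS * exp(-t / tau), and FS equals
    2**n_bits LSB, so t / tau = ln(2**n_bits / error_lsb).
    """
    full_scale_lsb = 2 ** n_bits
    return math.log(full_scale_lsb / error_lsb)


if __name__ == "__main__":
    n_tau = settling_time_constants(6)
    print(f"6b, 0.5 LSB settling: {n_tau:.2f} time constants")
```

Under these assumptions a 6b converter needs about 4.85 time constants, which is why any tolerance to a larger residual settling error translates directly into a slower, lower-power buffer.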
In order to demonstrate the potential for conversion errors with incomplete T/H settling, Fig. 4 shows this scenario for a multi-bit search ADC, with 2 and 4 bits converted in the first and second stages, respectively. For simplification, the lines represent the comparators with thresholds at the corresponding reference levels. The T/H output (ADC input) is assumed to be either slightly smaller or larger than 1/2 V_REF when the first-stage comparators are triggered. In the worst case scenario, the T/H output has not fully settled and continues to settle
This paper introduces a soft-decision selection multi-bit search algorithm that creates redundancy to tolerate decision errors without the need to overlap search spaces. Relative to a redundant binary search algorithm [16], this improves the ADC critical timing path to relax T/H bandwidth requirements and reduce metastability errors. The redundancy from the soft-decision selection algorithm offers tolerance to T/H settling errors, such that the ADC can start conversion even if the T/H output has not settled to within 0.5 LSB. Fig. 4 shows that the soft-decision selection search algorithm introduces auxiliary decision information, represented by the dashed lines in between the first-stage comparators (solid lines), to select the triggered second-stage comparators. These additional decisions are generated by set-reset (SR) latches comparing the rising edges of the decision outputs from adjacent first-stage comparators, which create interpolated levels in the time domain [17].
In order to quantify the performance improvement from the soft-decision selection algorithm, simulations are performed to examine the impact of T/H buffer and comparator time constants on ADC SNDR and metastability error rate (MER). Assuming a 3.125 GS/s two-stage 2b-4b ADC with a 160 ps 50% hold phase period and 35 ps logic delay in between the two stages, Fig. 6(a) shows the effect of the T/H buffer time constant with a 15 ps comparator time constant. The 4 LSB redundancy from the soft-decision selection search algorithm allows relaxing of the T/H buffer time constant by 2× relative to a conventional 2b-4b multi-bit search algorithm when SNDR is kept close to the ideal 37.6 dB. Assuming an allocation of 40 ps for T/H settling, Fig. 6(b) shows that the soft-decision selection search algorithm allows an increase in the comparator time constant by more than 50%. Moreover, the MER with the error threshold of 4 LSB is kept below 10⁻¹⁰ with the soft-decision selection scheme when the latch time constant is smaller than 20 ps, where the MER is limited by the assumed σ = 0.5 LSB rms noise. The MER with the soft-decision scheme is a weak function of the latch time constant up to 20 ps, whereas the MER grows exponentially as the latch time constant increases from 5 ps in a conventional multi-bit search structure.
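Both the time-domain interpolation by the SR latches and the exponential dependence of MER on the latch time constant can be illustrated with the textbook regenerative-latch model, in which the decision time is tau * ln(V_logic / |ΔV|). The sketch below is a simplified behavioral illustration under that assumption, not the simulation setup used for Fig. 6; the function names, the 1 V logic swing, and the example time budget and input range are hypothetical.

```python
import math


def regeneration_delay(delta_v: float, tau: float, v_logic: float = 1.0) -> float:
    """Time for a latch to regenerate an initial |delta_v| up to v_logic."""
    if delta_v == 0.0:
        return math.inf  # ideal latch never resolves a zero input
    return tau * math.log(v_logic / abs(delta_v))


def soft_decision_above_midpoint(vin: float, thr_low: float, thr_high: float,
                                 tau: float = 15e-12) -> bool:
    """Model an SR latch racing the outputs of two adjacent comparators.

    The comparator whose threshold is farther from vin resolves first, so
    the race outcome tells whether vin lies above or below the midpoint of
    the two thresholds: an interpolated level in the time domain.
    """
    t_low = regeneration_delay(vin - thr_low, tau)
    t_high = regeneration_delay(vin - thr_high, tau)
    return t_low < t_high


def metastability_probability(t_avail: float, tau: float,
                              v_half_range: float, v_logic: float = 1.0) -> float:
    """Probability that the latch is unresolved after t_avail, for an input
    uniformly distributed within +/- v_half_range of the threshold."""
    return min(1.0, v_logic * math.exp(-t_avail / tau) / v_half_range)


if __name__ == "__main__":
    # Thresholds at 1/4 and 1/2 of a normalized V_REF = 1.
    print(soft_decision_above_midpoint(0.45, 0.25, 0.5))
    for tau_ps in (5, 10, 15, 20):
        p = metastability_probability(60e-12, tau_ps * 1e-12, v_half_range=0.125)
        print(f"tau = {tau_ps} ps -> P(unresolved) = {p:.2e}")
```

In this model the unresolved probability scales as exp(-t/tau), so a fixed time budget makes the error rate grow exponentially with the latch time constant, which matches the qualitative trend described for the conventional structure.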
The hardware overhead of implementing the soft-decision selection search algorithm includes the two dummy comparators at the full scale reference levels and the four SR latches in between the first-stage comparators, as well as three comparators at the second-stage with reference 1/4 V_REF
III. ADC ARCHITECTURE AND KEY CIRCUITRY

A. Time-Interleaved Architecture
In order to prove the concept of the search algorithm introduced in Section II, an 8-channel 25 GS/s, 6b ADC is implemented with 3.125 GS/s unit ADCs employing the soft-decision selection algorithm. Fig. 7 shows the TI multi-bit search ADC timing and block diagrams. The ADC input consists of eight front-end T/Hs, one per unit ADC, clocked by eight phases of 3.125 GHz 50% duty cycle clocks with 40 ps spacing. Calibration DACs are included for both sampling clock skew correction for the eight front-end T/H sampling phases and for the comparators' offset correction/threshold generation in all eight unit ADCs.
B. Unit ADC With Soft-Decision Selection Search Algorithm
[17]. This enables a high output at the SR latch to select the second-stage comparators 29-35 instead of 13-19 and creates redundancy at the second-stage. The second-stage selection logic is skewed intentionally to have a faster enable path delay than reset path delay to increase the available conversion time. All the comparator thresholds are set with a 3b reference ladder providing coarse input references and offset calibration DACs setting the equivalent references to the full 6b resolution. Finally, a MUX-based encoder converts the thermometer output from the second-stage comparators to the final 6b binary output.
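The idea of a MUX-based thermometer-to-binary encoder can be sketched in software as a binary search through the thermometer code, where each output bit acts as the select signal for the next, smaller multiplexer. This is a generic illustration that assumes an ideal, bubble-free 15-level thermometer code for a 4b stage; it is not the encoder netlist of the implemented ADC, and the function name is hypothetical.

```python
from typing import List


def mux_thermometer_to_binary(thermo: List[int]) -> int:
    """Encode a thermometer code of length 2**n - 1 into an n-bit value.

    thermo[i] is 1 when the input exceeds the i-th threshold (ascending).
    The middle element gives the current bit; that bit then selects the
    upper or lower half of the code, as a 2:1 MUX would in hardware.
    """
    value = 0
    code = list(thermo)
    while code:
        mid = len(code) // 2
        bit = code[mid]
        value = (value << 1) | bit
        # Bit = 1 keeps the upper half, bit = 0 keeps the lower half.
        code = code[mid + 1:] if bit else code[:mid]
    return value


if __name__ == "__main__":
    example = [1] * 5 + [0] * 10  # five thresholds exceeded
    print(mux_thermometer_to_binary(example))
```

A MUX-tree encoder of this kind only inspects one comparator output per bit, which keeps the logic depth at n stages for an n-bit result.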
C. Front-End T/H
The front-end T/H schematic is shown in Fig. 8 . It consists of a bootstrapped switch clocked at 3.125 GHz followed by a source follower with an additional high-pass path for bandwidth extension. The bootstrapped switch improves the bandwidth and the linearity at the critical sampling node and the front-end T/H architecture allows for a large input sampling bandwidth, as the sampling capacitor is just the input capacitance of the pseudo-differential PMOS source-follower buffer stage. This buffer drives the 360 fF loading capacitance of the core ADC, which consists of 220 fF capacitance from routing and 140 fF from the comparator loading, and provides isolation from kick-back noise. Simulation results show that the T/H output node has a 3 dB bandwidth of 6 GHz. With a 300 mV input common-mode voltage and a 500 mV input swing, the bootstrapped switch achieves up to a 12.5 GHz input bandwidth with a 3.125 GHz sampling clock.
D. Multi-Phase Clock Generation and Skew Calibration
As shown in Fig. 9 , eight equally spaced sampling phases for the front-end T/H are generated from a 12.5 GHz differential input clock. A pseudo-differential self-biased input stage buffers the 12.5 GHz differential clock to drive a CML latchbased divide-by-4 stage which creates eight 3.125 GHz clock phases spaced at 40 ps. Delay lines with digitally controlled MOS capacitor banks are employed in the eight-phase distribution network to calibrate the phase mismatches between the eight critical sampling phases. Measurement results verify that the clock skew calibration has a resolution of about 150 fs and allows for a maximum tuning range of 20 ps per phase.
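The clocking numbers above are mutually consistent, as the following short calculation shows. It is a sketch using only the figures quoted in the text; the round-robin assignment of samples to channels is an assumption about ordering, and the helper names are hypothetical.

```python
F_INPUT_CLOCK_HZ = 12.5e9   # differential input clock
DIVIDE_RATIO = 4            # CML latch-based divider
N_PHASES = 8                # one sampling phase per unit ADC


def phase_clock_hz() -> float:
    """Frequency of each of the eight sampling phases."""
    return F_INPUT_CLOCK_HZ / DIVIDE_RATIO


def phase_spacing_s() -> float:
    """Ideal time spacing between adjacent sampling phases."""
    return 1.0 / phase_clock_hz() / N_PHASES


def aggregate_rate_hz() -> float:
    """Aggregate sampling rate of the time-interleaved ADC."""
    return N_PHASES * phase_clock_hz()


def channel_for_sample(n: int) -> int:
    """Unit ADC that takes sample n, assuming round-robin interleaving."""
    return n % N_PHASES


if __name__ == "__main__":
    print(f"phase clock : {phase_clock_hz() / 1e9:.3f} GHz")
    print(f"spacing     : {phase_spacing_s() * 1e12:.1f} ps")
    print(f"aggregate   : {aggregate_rate_hz() / 1e9:.1f} GS/s")
```

With a 20 ps tuning range per phase, the calibration can cover up to half of the ideal 40 ps spacing.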
E. Shared-Input Double-Tail Three Latch
In order to reduce T/H loading, Fig. 10(a) shows the schematic of a shared-input double-tail dynamic three-latch structure utilized in the second-stage of the unit ADCs. Each input stage is followed by three regenerative latches calibrated with 1 LSB difference in threshold levels. Since the input transistors are often sized for a specific input offset variation level, the three-latch structure reduces both the comparators' contribution to the T/H loading and the kick-back noise by approximately 3×, and the increased load at the first-stage output node does not significantly impact comparator performance. A 2b shared capacitive DAC at the first-stage output node and an independent 6b resistive DAC at each regeneration stage allow setting of the comparators' threshold with a worst case 2.4 mV resolution (0.3 LSB) and ±60 mV tuning range relative to the coarse input reference level from the 3b reference ladder with ±10% supply and 0°C-70°C temperature variation. Fig. 10(b) shows the Monte Carlo offset simulation of the latch structure with a 3σ value around 30 mV, which is covered by the comparator offset tuning range.
routing from the eight-phase clock generator. The differential 12.5 GHz clock input signal is distributed to the divider-based phase generator via an on-die differential transmission line. Local decoupling capacitors are placed with the reference ladders in each unit ADC to reduce the impact of kick-back noise on the reference voltages.
Comparator offset/reference calibration and phase skew calibration are both done in the foreground as shown in Fig. 12. During the comparator offset/reference calibration, ideal dc reference levels are generated off-chip, and the corresponding comparator output is selected by MUXs and monitored via LabVIEW from the real-time scope. A comparator's output is averaged and the calibration DAC code is adjusted automatically until this average reaches 0.5, which implies that the comparator is metastable and generating 50% 0's and 1's. The foreground skew calibration procedure is done in two steps. First, coarse phase tuning is performed by manually monitoring the muxed eight-phase clock output on the scope. Then, a sinewave-input FFT-based foreground method [18] is employed for fine phase tuning.
Fig. 13 shows that after calibrating the comparator references among the eight unit ADCs and the phase errors of the eight sampling clocks, the 25 GS/s ADC with nominal 1 V supply achieves 32.5 dB low-frequency maximum SNDR and 29.6 dB SNDR at the 12.5 GHz Nyquist, which translates to 5.10 and 4.62 bits ENOB, respectively. The ADC achieves similar SNDR and SFDR performance with 1.1 V supply and suffers from a 2 and 3 dB degradation in SNDR and SFDR, respectively, with 0.9 V supply. At 15 and 20 GS/s sampling rate, the ADC SNDR performance at Nyquist is comparable with that at 25 GS/s and mainly limited by jitter. The ENOB at Nyquist is primarily limited by the 350 fs rms jitter from the frequency synthesizer used as the input clock source and the estimated additional 200 fs rms jitter from the on-chip clock divider and distribution. Fig. 14 shows the ADC output spectrum with 12.49 GHz −1 dBFS input before and after reference and skew calibration, which provides 10.1 dB SNDR improvement.
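The ENOB values and the jitter limitation quoted above can be cross-checked with the standard relations ENOB = (SNDR − 1.76)/6.02 and SNR_jitter = −20·log10(2π·f_in·σ_t). The sketch below assumes the two jitter contributions are uncorrelated and therefore add in a root-sum-square sense; that assumption and the function names are not taken from the measurements.

```python
import math


def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits for a given SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def jitter_limited_snr_db(f_in_hz: float, jitter_rms_s: float) -> float:
    """SNR ceiling set by rms sampling jitter for a full-scale sine input."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * jitter_rms_s)


if __name__ == "__main__":
    # Root-sum-square of the synthesizer jitter and the on-chip estimate.
    total_jitter = math.hypot(350e-15, 200e-15)
    print(f"ENOB at low frequency: {enob_from_sndr(32.5):.2f} bits")
    print(f"ENOB at Nyquist      : {enob_from_sndr(29.6):.2f} bits")
    print(f"total jitter         : {total_jitter * 1e15:.0f} fs rms")
    print(f"jitter-limited SNR   : {jitter_limited_snr_db(12.5e9, total_jitter):.1f} dB")
```

Under these assumptions the jitter-only SNR ceiling at 12.5 GHz comes out close to 30 dB, which is consistent with the statement that the measured 29.6 dB Nyquist SNDR is primarily jitter-limited.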
A sinewave histogram technique [19] is utilized for ADC static characterization. Fig. 15 shows that the maximum DNL and INL of each unit ADC after reference and phase calibration are +0.64/−0.62 and +0.59/−0.60 LSB, respectively. Fig. 12 also shows the metastability measurement setup where an FMC XM105 debug card is connected to a Xilinx ML623 Virtex-6 FPGA used for data acquisition. A 100 kHz sinusoidal input is applied to the ADC, such that consecutive samples have a difference less than 1 LSB. The ADC MER characterization results are shown in Fig. 16. The measured MER follows the erfc⁻¹ curve instead of the natural log curve, which implies that noise, rather than metastability errors, is limiting the MER results. This demonstrates the effectiveness of the metastability tolerance with the soft-decision selection algorithm.
Table I summarizes the ADC performance and compares this work against recent 6b ADCs with sample rates ranging from 16 to 46 GS/s. The ADC consumes 88 mW power from a 1 V supply, of which 63.9% (56.2 mW) is dissipated by the core ADC, 14.8% (13 mW) by the clock phase generation and 21.3% (18.8 mW) by the T/H, achieving a 143 fJ/conv.-step FOM. Relative to the flash converters, this design achieves significant FOM improvement over the 16 GS/s 65 nm design [6] and 25% faster conversion speed and comparable performance to the 20 GS/s 32 nm SOI design [7]. Similar metastability tolerance is achieved at a lower FOM relative to the 28 nm FDSOI SAR design which employs back-end hardware for metastability correction [8]. While the advanced 32 nm SAR architecture of [20] achieves a better FOM, Fig. 17 shows that the performance of the presented 65 nm prototype ADC falls near the 32 nm design trend and achieves around 10× efficiency improvement compared with the 65 nm design trend.
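The quoted figure-of-merit and power breakdown can be reproduced from the numbers in the text with the commonly used Walden definition, FOM = P / (2^ENOB · f_s). The sketch below assumes that this definition, evaluated with the Nyquist SNDR, is the one behind the 143 fJ/conv.-step value; the clock generation power is taken as 13 mW so that the three parts sum to the 88 mW total.

```python
def walden_fom_fj(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """Walden figure-of-merit in fJ per conversion step."""
    enob = (sndr_db - 1.76) / 6.02
    return power_w / (2.0 ** enob * fs_hz) * 1e15


# Power breakdown in mW, as listed in the text.
POWER_MW = {
    "core ADC": 56.2,
    "clock phase generation": 13.0,
    "T/H": 18.8,
}


if __name__ == "__main__":
    total_mw = sum(POWER_MW.values())
    for block, p_mw in POWER_MW.items():
        print(f"{block:24s} {p_mw:5.1f} mW  {100.0 * p_mw / total_mw:4.1f}%")
    print(f"total                    {total_mw:5.1f} mW")
    print(f"FOM: {walden_fom_fj(88e-3, 29.6, 25e9):.0f} fJ/conv.-step")
```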
V. CONCLUSION
This paper has presented an 8-channel 25 GS/s, 6b time-interleaved ADC with the unit ADCs employing a 2b-4b two-stage multi-bit search structure to achieve an increased sampling rate. A soft-decision selection search algorithm is implemented with very low overhead to relax T/H bandwidth requirements and improve ADC metastability performance. T/H loading and kick-back noise are reduced with a shared-input double-tail three-latch structure. Measurements verify that the soft-decision selection search algorithm delivers robust ADC performance with a relaxed T/H and comparator design. Overall, the presented design achieves good power efficiency, making it a suitable architecture for a 50 Gb/s PAM4 wireline receiver.
