I. INTRODUCTION

Digital receivers for high-bit-rate communications are pushing up the conversion rates of analog-to-digital converters (ADCs). State-of-the-art disk-drive read channels and high-speed Ethernet use partial-response signaling, which requires 6-b resolution at conversion rates of 1 GHz and beyond [1]. Furthermore, continuous DVD playback and certain Ethernet applications leave no idle time in which the linearity of the ADC could be self-calibrated by autozeroing [2]. Fig. 1 shows the block diagram of an ADC embedded in the front end of a 1-Gbit/s partial-response maximum-likelihood (PRML) read channel. In this architecture, the amplified read waveform is sampled and digitized for detection. Accurate timing and gain control require a dynamic range higher than 45 dB, and the analog delay line can degrade the signal dynamic range by 10 dB. Therefore, the dynamic range at the sampler output should be higher than 55 dB, which defines the linearity requirement of the ADC's track-and-hold (T/H). Converter latency is not important here, because the ADC is not part of the timing and gain control loops. For PRML waveforms, the 6-b ADC should convert waveforms occupying half the Nyquist bandwidth with 5.5-effective-bit linearity.
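The 55-dB requirement is a simple sum, and the paper quotes resolution both in decibels and in effective bits. The sketch below is a Python illustration, not part of the original work, that makes both conversions explicit using the standard ideal-quantizer relation SNDR = 6.02N + 1.76 dB, which assumes a full-scale sine input; the function and constant names are hypothetical.

```python
# Dynamic-range budget of the read-channel front end, plus the ideal-quantizer
# relation SNDR = 6.02*N + 1.76 dB (full-scale sine input assumed).

def sndr_for_bits(bits: float) -> float:
    """SNDR (dB) of an ideal N-bit quantizer driven by a full-scale sine."""
    return 6.02 * bits + 1.76

def effective_bits(sndr_db: float) -> float:
    """Effective number of bits implied by an SNDR in dB."""
    return (sndr_db - 1.76) / 6.02

TIMING_GAIN_DR_DB = 45.0   # dynamic range needed by timing and gain control
DELAY_LINE_LOSS_DB = 10.0  # degradation introduced by the analog delay line

sampler_dr_db = TIMING_GAIN_DR_DB + DELAY_LINE_LOSS_DB  # required at the sampler output

print(f"required sampler dynamic range: {sampler_dr_db:.0f} dB "
      f"(about {effective_bits(sampler_dr_db):.1f} bits)")
print(f"5.5 effective bits corresponds to an SNDR of {sndr_for_bits(5.5):.1f} dB")
print(f"35 dB -> {effective_bits(35.0):.2f} bits, 32 dB -> {effective_bits(32.0):.2f} bits")
```

The last line shows the same conversion applied to the 35-dB and 32-dB SNDR values reported in Section V.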
The literature describes the flash architecture and its variations, which are used in many high-speed low-resolution ADCs. The fastest reported 6-b ADC, implemented in a GaAs HBT technology, operates at 4 Gsample/s using a folding architecture with an on-chip T/H [3]. Folding architectures lead to low latency and power, but the requisite analog preprocessing of the input signal degrades dynamic performance [4]. If a T/H is not present, autozeroing needs series input capacitors that limit the resolution bandwidth, and it also requires "idle time" for offset cancellation [5], [6]. Background autozeroing enables continuous conversion, but requires complex clocking that is incompatible with high-speed operation [7]. An input T/H improves the resolution bandwidth if wideband signals are to be digitized with the specified resolution [3], [8]. Instead of acquiring samples of the input waveform, a distributed T/H array [9] captures the zero crossings at the outputs of a flash-like array of preamplifiers. The T/H circuits in this arrangement need not be linear, which simplifies the circuit, but like any parallel scheme, they suffer dynamic inaccuracies due to skews in the clock signals distributed across the array. At high input frequencies, a single front-end T/H with sufficient linearity will outperform the distributed T/H [9], [10].
Manuscript received April 10, 2001; revised June 19, 2001. The authors are with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095-1594 USA (e-mail: abidi@icsl.ucla.edu). Publisher Item Identifier S 0018-9200(01)09318-0.
This paper reports on a 6-b ADC without autozeroing or self-calibration, which digitizes a 630-MHz input tone with a linearity of 5.5 effective bits at 1 Gsample/s, and a 650-MHz input with five effective bits at 1.3 Gsample/s. This performance for a CMOS ADC is enabled by a simple yet accurate T/H circuit and by analog techniques that improve accuracy at high conversion rates and wide input-signal bandwidths. Section II presents the architecture and the key ideas for improving the sampling rate and resolution bandwidth. Section III describes resistive averaging to improve linearity and speed. Circuit implementations of the major building blocks are described in Section IV. Finally, experimental results are discussed in Section V.
II. ADC ARCHITECTURE
As Fig. 2 shows, this is a flash ADC. Balanced circuits in the T/H, reference ladder, and preamplifiers largely cancel second-order distortion. The T/H is designed for 8-b linearity with the highest possible bandwidth. The preamplifier array, with a gain of 3, senses the difference between the differential input signal and the differential threshold and drives the latched comparators. However, the preamplifiers in the array suffer from random offsets. Collective averaging of the preamplifier outputs across a properly designed resistor network lowers the impact of these offsets, improving the accuracy of the threshold comparison without degrading bandwidth [13].
0018-9200/01$10.00 © 2001 IEEE
A comparator must recover quickly from large overdrive when the input voltage changes rapidly and approaches the comparator's threshold from far away. To aid this, the gain of the comparator's input differential pair is about one, which widens its bandwidth. The resulting net amplification of 3 is not sufficient to overcome the dynamic random offsets of the regenerative latch at 6-b accuracy, so resistor averaging is also used within the comparators to lower the impact of these offsets. Interpolation is not used in this work because it degrades bandwidth [12], [13]. The second comparator array provides rail-to-rail logic swing for the digital back end, which consists of the SR latches and the ROM-based digital encoder.
Fig. 3 shows a preamplifier array with averaging resistors connecting adjacent output nodes. In this work, offset averaging is analyzed and optimized using the concept of spatial filtering [13], [14]. With a load resistor $R_0$ at each output node and an averaging resistor $R_1$ between adjacent nodes, the network forms a spatial filter whose impulse response, for an array long enough that its ends can be neglected, is
$$h(n) = h(0)\,\lambda^{|n|}, \qquad \lambda = 1 + \frac{R_1}{2R_0} - \sqrt{\left(1 + \frac{R_1}{2R_0}\right)^{2} - 1} \qquad (1)$$
where $n$ is the index of the differential pair in the array. The ratio $R_1/R_0$ determines the width of the impulse response: the smaller the ratio, the wider the response. An array of differential pairs injects two types of stimulus currents into the averaging network.
First, differential signal currents, limited by the linear region of the differential pairs, enter the network. Second, "noise" currents due to random transistor mismatch also enter the averaging network. A well-designed averaging network should filter out the random currents without losing valuable signal current and without lowering bandwidth. Averaging is optimum when the number of differential pairs in the unclipped linear region of their characteristic is greater than the width of the network's impulse response [13]. Dummy preamplifiers are inserted to compare the input signal with thresholds beyond the actual full scale, so that the tails of the averaging network's impulse response remain undistorted at the ends of the full scale. A circular arrangement with sufficient dummy preamplifiers maintains the translational symmetry of the array across the input full scale.
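The spatial-filter view can be checked numerically. The sketch below is a simplified model under stated assumptions, not the design procedure of [13], [14]: it takes the ladder to have R_0 from every output node to AC ground and R_1 between neighbours, treats the impulse response as h(n) = h(0)·λ^|n| for a long array, and assumes every pair inside the response is in its linear region, so that the offset reduction equals Σh/√(Σh²). The value R_1/R_0 = 0.19 is hypothetical, chosen because it gives roughly a 3× reduction. A nodal solution of a finite ladder is included as a cross-check of λ.

```python
import math

def decay_factor(r1_over_r0: float) -> float:
    """Per-node decay lambda of h(n) = h(0) * lambda**|n| for an infinite
    ladder with R0 from each node to AC ground and R1 between neighbours."""
    a = 1.0 + r1_over_r0 / 2.0
    return a - math.sqrt(a * a - 1.0)

def offset_reduction(lam: float) -> float:
    """Random-offset reduction relative to no averaging (h = delta gives 1).

    The signal slope at the zero crossing scales with sum(h); uncorrelated
    mismatch currents add in power, so their rms scales with sqrt(sum(h**2)).
    Assumes every pair within the impulse response is in its linear region.
    """
    s1 = (1 + lam) / (1 - lam)            # sum over all n of lam**|n|
    s2 = (1 + lam**2) / (1 - lam**2)      # sum over all n of lam**(2|n|)
    return s1 / math.sqrt(s2)

def ladder_response(n_nodes: int, r0: float, r1: float, inject_at: int) -> list:
    """Node voltages of a finite open-ended ladder for 1 A injected at one
    node, from nodal analysis solved with the Thomas (tridiagonal) algorithm."""
    g0, g1 = 1.0 / r0, 1.0 / r1
    diag = [g0 + (1 if i in (0, n_nodes - 1) else 2) * g1 for i in range(n_nodes)]
    off = -g1                                 # identical sub- and super-diagonal
    rhs = [1.0 if i == inject_at else 0.0 for i in range(n_nodes)]
    c, d = [0.0] * n_nodes, [0.0] * n_nodes
    c[0], d[0] = off / diag[0], rhs[0] / diag[0]
    for i in range(1, n_nodes):               # forward elimination
        m = diag[i] - off * c[i - 1]
        c[i] = off / m
        d[i] = (rhs[i] - off * d[i - 1]) / m
    v = [0.0] * n_nodes
    v[-1] = d[-1]
    for i in range(n_nodes - 2, -1, -1):      # back substitution
        v[i] = d[i] - c[i] * v[i + 1]
    return v

lam = decay_factor(0.19)                      # hypothetical R1/R0 = 0.19
v = ladder_response(101, r0=1.0, r1=0.19, inject_at=50)
print(f"lambda = {lam:.3f}, numeric ratio v[51]/v[50] = {v[51] / v[50]:.3f}")
print(f"offset reduction = {offset_reduction(lam):.2f}x, h(9)/h(0) = {lam**9:.3f}")
```

With these assumptions λ is about 0.65, and h(n) has fallen to about 2% of its peak nine nodes from the center, roughly consistent with the 18-node impulse-response width quoted in Section III. The sketch ignores clipped pairs and edge effects, which the dummy preamplifiers and circular arrangement described above are meant to address.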
III. OFFSET AVERAGING
A. Accuracy Improvement With Averaging Network
B. Speed Improvement With Averaging Network
It is by now well known that averaging improves accuracy. What is not as well appreciated, as explained below, is that averaging also improves speed.
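The input random offset voltage is inversely proportional to the square root of the transistor gate area [15]. Requiring the offset to be a fixed fraction of one LSB, which scales as $2^{-N}$ for resolution $N$, relates the preamplifier FET size to the target ADC resolution:
$$\sigma_{V_{os}} \propto \frac{1}{\sqrt{WL}} \propto 2^{-N} \;\Longrightarrow\; WL \propto 2^{2N} \qquad (2)$$
References [13], [14] show that the optimum averaging network lowers random offset by up to 3×, with a spatial impulse response 18 nodes wide. With this network connected to the outputs of a preamplifier array, a given accuracy is maintained with 9× smaller FETs than if no averaging were used. The bandwidth set by the output pole of this differential pair, shown in Fig. 4, is
$$\omega_p = \frac{1}{R_L\,(C_w + C_j)} \qquad (3)$$
where $R_L$ is the load resistance and $C_w$ and $C_j$ are the wiring and junction capacitances at the output node. With averaging, the wiring and junction capacitances are 9× lower due to transistor scaling, and $g_m$ is 3× lower. For the same bias current, the load resistance must therefore be 3× larger for equal voltage gain, which means that
$$\omega_{p,avg} = \frac{1}{3R_L\left(\dfrac{C_w + C_j}{9} + C_{avg}\right)} \qquad (4)$$
where $C_{avg}$ is the parasitic capacitance of the averaging network. Comparing (3) and (4),
$$\frac{\omega_{p,avg}}{\omega_p} = \frac{3}{1 + 9C_{avg}/(C_w + C_j)} \qquad (5)$$
The term $9C_{avg}/(C_w + C_j)$ in (5) is much less than 1, so the denominator is close to 1. Therefore, use of the averaging network raises preamplifier bandwidth by about 3×.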
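The scaling argument for the preamplifier bandwidth reduces to a few ratios. The Python sketch below assumes, as the text implies, that the FET area shrinks by k² at fixed channel length and bias current, so g_m drops by k, R_L rises by k to hold the gain, and the wiring and junction capacitance fall by k². The function name and capacitance values are hypothetical.

```python
import math

def averaged_bandwidth_ratio(k: float, c_par: float, c_avg: float) -> float:
    """omega_p,avg / omega_p under the scaling argument for the output pole.

    k     : offset reduction provided by the averaging network (3 in the text)
    c_par : wiring + junction capacitance of the unscaled preamplifier (F)
    c_avg : parasitic capacitance added by the averaging network (F)
    """
    area_scale = 1.0 / k**2           # offset ~ 1/sqrt(WL): k-fold averaging allows k^2 less area
    gm_scale = math.sqrt(area_scale)  # W shrinks at fixed L and bias current; gm ~ sqrt(W)
    r_scale = 1.0 / gm_scale          # raise R_L to keep the gain gm*R_L unchanged
    c_scaled = c_par * area_scale + c_avg
    # omega_p = 1/(R_L*C); the unscaled R_L cancels in the ratio
    return c_par / (r_scale * c_scaled)

for c_avg in (0.0, 1e-15, 2e-15):     # hypothetical values with c_par = 100 fF
    ratio = averaged_bandwidth_ratio(3, 100e-15, c_avg)
    print(f"C_avg = {c_avg * 1e15:.0f} fF -> bandwidth x{ratio:.2f}")
```

With C_avg = 0 the ratio is exactly k = 3. Each femtofarad of averaging-network capacitance counts nine times against the scaled node capacitance, which is why the network parasitics must stay small.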
IV. CIRCUIT DESCRIPTIONS
A. Track-and-Hold
An input T/H improves the dynamic performance of an ADC. By holding the analog sample static during digitization, the T/H largely removes errors due to skews in clock delivery to a large number of comparators, limited input bandwidth prior to latch regeneration, signal-dependent dynamic nonlinearity, and aperture jitter [11]. The on-chip T/H circuit shown in Fig. 5 precedes the flash quantizer. This simple circuit, consisting of a passive nMOS switch connected to a sampling capacitor through a dummy switch [16], [17], offers just the required linearity. The dummy switch lowers the common-mode jump after the track-to-hold transition. The main sources of distortion are signal-dependent charge injection when the switch opens, nonlinearity of the source follower, and dynamic current into the signal-dependent input capacitance of the quantizer.
FET charge injection and clock feedthrough cause distortion by adding charge to, or removing it from, the hold capacitor when the switch disconnects the signal source [18]. A dummy switch driven by the complement of the switch clock lowers these effects [19]. Because both the source and drain of the dummy switch are connected to the hold node, its W/L is initially chosen as half that of the switch and then tuned through simulation. A low input common-mode voltage, 0.5 V, enables the use of only nMOSFETs for the sampling and dummy switches. Unlike a complementary switch, an nMOS-only (or pMOS-only) switch and dummy pair cancels charge feedthrough effectively, even with process variations.
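A back-of-the-envelope model shows why a half-width dummy cancels both effects. The sketch assumes the common idealization that the main switch releases half of its channel charge onto the hold node, ignores the body effect (constant V_T), and uses hypothetical device values; the real split depends on source impedance and clock fall time, which is why the dummy size is tuned in simulation.

```python
def hold_error(v_in, split=0.5, with_dummy=True, w=20e-6, l=0.35e-6,
               cox=4.6e-3, c_ov=0.3e-9, v_dd=3.3, v_t=0.6, c_hold=1e-12):
    """Hold-capacitor voltage error (V) at the track-to-hold transition.

    Hypothetical first-order model (SI units; cox in F/m^2, c_ov in F/m):
      * the main switch releases `split` of its channel electrons onto the
        hold node, and its overlap couples the falling gate edge (-v_dd);
      * a half-width dummy, source and drain both on the hold node, turns on
        at the same time, drawing its whole channel charge from the hold node,
        and couples the rising complementary edge (+v_dd) through two overlaps.
    """
    v_ov = v_dd - v_in - v_t                  # gate overdrive at the hold node
    q_main = w * l * cox * v_ov               # magnitude of main channel charge
    dq = -split * q_main - w * c_ov * v_dd    # injection + feedthrough (main)
    if with_dummy:
        w_d = w / 2.0
        dq += w_d * l * cox * v_ov            # dummy channel absorbs electrons
        dq += 2.0 * w_d * c_ov * v_dd         # complementary edge, two overlaps
    return dq / c_hold

for v in (0.3, 0.5, 0.7):
    print(f"v_in = {v} V: no dummy {hold_error(v, with_dummy=False) * 1e3:6.1f} mV, "
          f"dummy {hold_error(v) * 1e3:6.2f} mV, "
          f"dummy with 60/40 split {hold_error(v, split=0.6) * 1e3:6.2f} mV")
```

Under the 50/50 assumption the residual is zero at every input level; with a 60/40 split a signal-dependent residual remains, which is the distortion mechanism the tuning step targets.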
Input-dependent timing of the switch opening acts as signal-dependent sampling jitter and is another source of distortion. For a differential sine input, the output is described by (6) [20] in terms of the maximum clock voltage, the transition time of a clock edge, the input frequency, and the input amplitude. For a maximum clock voltage of 3.3 V, a clock transition time of 0.15 ns, a 500-MHz input, and an input amplitude of 0.4 V, the total harmonic distortion is −68 dB for the differential output and −25 dB for the single-ended output.
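A first-order model, our own simplification rather than the expressions in [20], illustrates why the differential output is so much cleaner. With a linear clock fall of duration t_f from V_max, the switch opens when the gate reaches v_in + V_T, so the sampling instant moves by t_f·v_in/V_max, a phase error of m·sin(ωt) with m = 2π·f_in·t_f·A/V_max. Even-order products appear with opposite sign on the two halves and cancel in the difference. The function names are hypothetical.

```python
import cmath
import math

def modulation_index(f_in, t_f, amp, v_max):
    """m = 2*pi*f_in*t_f*A/V_max: peak phase shift (rad) of the sampling instant."""
    return 2 * math.pi * f_in * t_f * amp / v_max

def harmonics(samples, n_harm):
    """Amplitudes of harmonics 1..n_harm of one period of samples (single-bin DFT)."""
    n = len(samples)
    return [2 * abs(sum(s * cmath.exp(-2j * math.pi * h * k / n)
                        for k, s in enumerate(samples))) / n
            for h in range(1, n_harm + 1)]

def thd_db(samples, n_harm=9):
    """Total harmonic distortion (dB) relative to the fundamental."""
    a = harmonics(samples, n_harm)
    return 20 * math.log10(math.sqrt(sum(x * x for x in a[1:])) / a[0])

m = modulation_index(500e6, 0.15e-9, 0.4, 3.3)
N = 256
phases = [2 * math.pi * k / N for k in range(N)]
# A larger instantaneous input opens the switch earlier: phase error -m*v/A.
pos = [math.sin(x - m * math.sin(x)) for x in phases]    # + half, v = +A sin x
neg = [-math.sin(x + m * math.sin(x)) for x in phases]   # - half, v = -A sin x
diff = [p - q for p, q in zip(pos, neg)]

print(f"single-ended THD = {thd_db(pos):.1f} dB, differential THD = {thd_db(diff):.1f} dB")
```

With the parameters quoted above, the model gives about −68 dB for the differential output, matching the text. Its single-ended estimate (about −31 dB) is milder than the −25 dB quoted, so treat it only as an illustration of the even-order cancellation.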
The low input common-mode voltage of 0.5 V allows a larger gate overdrive to turn on the switch, which lowers track-mode distortion due to nonlinear channel resistance. In the pMOS source follower, the source is tied to the well to eliminate the nonlinear body effect. Simulations show that when acquiring samples of a 250-MHz full-scale sine wave at 1 Gsample/s, the T/H delivers samples to the quantizer input with third-harmonic distortion of about −60 dBc.
B. Preamplifier
The preamplifier stage should be wideband and provide sufficient gain to overcome comparator offsets. It should also recover from large overdrive within one clock cycle. An open-loop low-gain amplifier stage is gain-bandwidth limited and therefore unsuitable for high-speed operation because of its poor overdrive recovery [6], [21]–[23].
The following analysis addresses the fundamental limitation of an open-loop single-pole amplifier stage in overdrive recovery and shows quantitatively that an amplifier with a reset switch meets the speed requirement. Consider the simple one-stage single-pole amplifier shown in Fig. 4. The preamplifier is completely unbalanced at $t = 0$, so its differential output starts at $-I_T R_L$. When a step input is applied to the preamplifier stage, the output transient is as shown in Fig. 6. The step response of the amplifier is given by
$$v_{out}(t) = A_v\,\Delta V_{in} - \left(A_v\,\Delta V_{in} + I_T R_L\right)e^{-t/\tau} \qquad (7)$$
where $A_v$ is the voltage gain, $\Delta V_{in}$ is the voltage difference between the input and the reference tap, $I_T$ is the tail current, $R_L$ is the load resistance, and $\tau$ is the time constant. The overdrive recovery time $t_{rec}$ is calculated by setting $v_{out}$ in (7) to zero:
$$0 = A_v\,\Delta V_{in} - \left(A_v\,\Delta V_{in} + I_T R_L\right)e^{-t_{rec}/\tau} \qquad (8)$$
Solving (8) for $t_{rec}$ results in
$$t_{rec} = \tau \ln\left(1 + \frac{I_T R_L}{A_v\,\Delta V_{in}}\right) \qquad (9)$$
To determine the relationship between $t_{rec}$ and bandwidth with fixed bias current and transistor size, substitute $A_v = g_m R_L$ and $\tau = R_L C_L$ into (9), where $C_L$ is the load capacitance:
$$t_{rec} = \frac{A_v C_L}{g_m}\,\ln\left(1 + \frac{I_T}{g_m\,\Delta V_{in}}\right) \qquad (10)$$
This gives the overdrive recovery time needed for $v_{out}$ to change polarity; a bandwidth of about 1 GHz is required to recover from overdrive in 0.35 ns. Now assume that the output step response does not settle within one sampling period $T$, as shown in Fig. 6. The gain-bandwidth requirement to obtain the desired gain $A_d$ can be derived from (7). The output voltage at $t = T$ is given by
$$A_d\,\Delta V_{in} = A_v\,\Delta V_{in} - \left(A_v\,\Delta V_{in} + I_T R_L\right)e^{-T/\tau} \qquad (11)$$
Solving (11) for $\tau$ gives
$$\tau = \frac{T}{\ln\left[\dfrac{A_v\,\Delta V_{in} + I_T R_L}{(A_v - A_d)\,\Delta V_{in}}\right]} \qquad (12)$$
The gain-bandwidth product required to obtain a gain of 3 ($A_d = 3$) within one period of a 1.3-GHz sampling clock, $\mathrm{GBW} = A_v/(2\pi\tau)$, is obtained by multiplying the bandwidth $1/(2\pi\tau)$ from (12) by the dc gain $A_v$; it is shown in Fig. 8 for several parameter values. The required GBW is lowest when the dc gain is about 4. For the 0.1-V case, GBW should be higher than 2.5 GHz. However, the GBW is limited by the FET channel length as follows:
$$\mathrm{GBW} = \frac{g_m}{2\pi k\,C_{gs}} \approx \frac{3\mu\,(V_{GS} - V_T)}{4\pi k\,L^{2}} \qquad (13)$$
where $k$ is a constant that defines the load capacitance as a multiple of $C_{gs}$. Fig. 9 shows the achievable GBW based on extracted simulations. Therefore, the GBW specified in Fig. 8 cannot be met in 0.35-µm CMOS.
However, it is possible to surmount this apparently fundamental limit by exploiting the clocked nature of this circuit. A reset switch is inserted between the two output nodes, as shown in Fig. 10. While the T/H is in track mode, the reset switch is turned ON to erase the residual voltage from the previous sample. When the T/H is in hold mode, the reset switch is turned OFF. Following the previous analysis, the step response of the amplifier with the reset switch is given by
$$v_{out}(t) = A_v\,\Delta V_{in}\left(1 - e^{-t/\tau}\right) \qquad (14)$$
Because the amplifier is in reset mode for half the period, the desired voltage gain must be available at $t = T/2$. The output voltage at $t = T/2$ is given by
$$A_d\,\Delta V_{in} = A_v\,\Delta V_{in}\left(1 - e^{-T/(2\tau)}\right) \qquad (15)$$
Solving (15) for $\tau$ gives
$$\tau = \frac{T}{2\ln\left[A_v/(A_v - A_d)\right]} \qquad (16)$$
The GBW required to obtain amplification of 3, 4, and 5 at the end of a half period of a 1.3-GHz sampling clock is shown in Fig. 11. The higher the dc gain, the lower the required GBW. However, the offset averaging network needs a well-controlled load resistor. In addition, raising the load resistance for higher voltage gain lowers the output common-mode voltage, which can push the input transistors into the triode region. For preamplification by 3 within a half period at 1.3 GHz, a dc voltage gain of 4 is selected, which needs a GBW of 2.2 GHz. As shown in Fig. 9, this GBW is achievable.
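The overdrive-recovery and GBW expressions for the plain and reset-switched stages are easy to evaluate. The sketch below does so in Python under one extra, hypothetical assumption for the no-reset case: a square-law input pair with g_m = I_T/V_ov, so that I_T·R_L = A_v·V_ov, with V_ov/ΔV_in = 4. That ratio is our choice, not a value from the text; it happens to put the optimum dc gain near 4 and the required GBW near 2.5 GHz, as in Fig. 8. The reset case needs no such assumption. Likewise, the ratio I_T·R_L/(A_v·ΔV_in) = 8 in the recovery-time check is chosen only to illustrate the 1-GHz, 0.35-ns pairing.

```python
import math

def recovery_time(tau, swing_over_step):
    """Overdrive recovery time: tau*ln(1 + I_T*R_L/(A_v*dV_in))."""
    return tau * math.log(1.0 + swing_over_step)

def gbw_without_reset(a_v, a_d, f_s, vov_over_dv):
    """Required GBW (Hz): a stage starting fully unbalanced must reach a_d*dV_in
    after one period. Hypothetical square-law assumption: I_T*R_L = A_v*V_ov."""
    t = 1.0 / f_s
    swing = a_v * vov_over_dv                    # I_T*R_L in units of dV_in
    tau = t / math.log((a_v + swing) / (a_v - a_d))
    return a_v / (2 * math.pi * tau)             # GBW = A_v * 1/(2*pi*tau)

def gbw_with_reset(a_v, a_d, f_s):
    """Required GBW (Hz): output starts at zero and must reach a_d*dV_in at T/2."""
    t = 1.0 / f_s
    tau = t / (2 * math.log(a_v / (a_v - a_d)))
    return a_v / (2 * math.pi * tau)

f_s = 1.3e9
tau_1ghz = 1 / (2 * math.pi * 1e9)
print(f"recovery, 1-GHz bandwidth, ratio 8: {recovery_time(tau_1ghz, 8) * 1e9:.2f} ns")

gains = [3.1 + 0.1 * i for i in range(50)]
best = min(gains, key=lambda a: gbw_without_reset(a, 3, f_s, 4.0))
print(f"no reset: optimum A_v ~ {best:.1f}, "
      f"GBW = {gbw_without_reset(best, 3, f_s, 4.0) / 1e9:.2f} GHz")
for a_v in (4, 5, 6):
    print(f"reset, A_v = {a_v}: GBW = {gbw_with_reset(a_v, 3, f_s) / 1e9:.2f} GHz")
```

The reset-case result for A_v = 4 at 1.3 GHz comes out near 2.3 GHz, close to the 2.2-GHz figure above, and it falls as A_v rises, as in Fig. 11.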
C. First-Stage Comparator
Fig. 12 shows the first-stage comparator. This circuit consists of an input differential pair and a latch pair, both sharing diode-connected MOS loads. Unlike a MOS resistor biased in the triode region, this load largely decouples the signal gain from the output common-mode voltage. Because of the circular and symmetric nature of the network, nonlinearity of the diode loads does not shift the zero crossings at the output of the resistive averaging network.
In reset mode, the regeneration clock is low and the differential output follows the differential input. While the comparator's input is being sampled to its output, a shorting switch lowers the voltage gain of the first comparator to less than one and erases memory of the previous decision. A similar comparator reported previously [24], [25] applies a short pulse to the reset switch to improve voltage gain. However, it is impractical to generate a short pulse at 1.3 GHz, when the clock transitions alone consume 40% of the period.
This comparator uses the standard clock to turn on the shorting switch. Because the gain-bandwidth product is fixed, the low voltage gain during the comparator's reset mode gives a high output bandwidth for fast erasure. However, the gain should be large enough to overcome the dynamic offset of the regenerative latch [6], [26]–[29], which can be as large as 40 mV rms. Rather than raise the voltage gain and degrade speed, we have found that resistor averaging across an array of regenerating latches lowers dynamic offsets as well. Averaging in the latch stage increases the regeneration time constant, which necessitates cascaded comparators to lower the bit-error rate. However, the overall power reduction and sampling-rate increase due to averaging outweigh this minor drawback, as discussed in the following section. In the ideal case with no transistor mismatch, the comparator output changes polarity when a slow ramp input crosses the corresponding threshold. With random variations added to the transistors and 10% mismatch in the load capacitance, Fig. 14 shows that the instant at which the comparator changes polarity is randomly dispersed with respect to the instant at which the input crosses zero. For each Monte Carlo simulation, the input voltage is recorded when the output changes polarity. The sigma of the input-referred random offset is obtained by fitting these input voltages to a Gaussian distribution. Fig. 14 is a composite plot of 60 Monte Carlo transient analyses.
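The Monte Carlo offset extraction can be mimicked in a few lines. In this sketch the transistor-level transient is replaced by a hypothetical ideal comparator whose only imperfection is a Gaussian input-referred offset; the ramp is stepped in 10-µV increments, and the Gaussian fit is the maximum-likelihood one (sample mean and standard deviation) rather than a histogram curve fit. Names and the 11-mV true sigma are illustrative assumptions.

```python
import math
import random

def flip_voltage(offset, v_start=-0.1, dv=1e-5):
    """Input level at which a comparator with the given input-referred
    offset flips as a slow ramp (steps of dv volts) rises through zero."""
    v = v_start
    while v < offset:      # output stays low until the ramp exceeds the offset
        v += dv
    return v

def fit_gaussian(samples):
    """Maximum-likelihood Gaussian fit: sample mean and (biased) std."""
    n = len(samples)
    mu = sum(samples) / n
    return mu, math.sqrt(sum((x - mu) ** 2 for x in samples) / n)

random.seed(2001)
TRUE_SIGMA = 0.011         # hypothetical 11-mV offset, comparable to Fig. 15(a)
runs = [flip_voltage(random.gauss(0.0, TRUE_SIGMA)) for _ in range(60)]
mu, sigma = fit_gaussian(runs)
print(f"60 runs: mean = {mu * 1e3:.2f} mV, sigma = {sigma * 1e3:.2f} mV")
```

With only 60 runs, the estimated sigma has a standard error of roughly 9% (about σ/√120), so sigmas taken from a 60-run composite such as Fig. 14 carry comparable uncertainty.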
D. Two Stages of Averaging
Without averaging, the input-referred random offset in one comparator chain is about 11 mV, as shown in Fig. 15(a) . With offset averaging applied only to the preamplifier stage, shown in Fig. 15(b) , the offset sigma is reduced to 9 mV. This is not sufficient for 6-b resolution. The dominant offset source is the dynamic offset in the first latch. With offset averaging applied both to the preamplifier and comparator stages, as shown in Fig. 15(c) , the input-referred random offset sigma value is lowered to 3.7 mV, which is sufficient for 6-b linearity. This sigma value can be derived analytically as follows.
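The three simulated values are mutually consistent with a root-sum-square combination of offsets. The sketch below assumes that independent preamplifier and latch offsets add in power, with the latch term divided by the preamplifier gain of 3 and each term divided by its averaging factor of 3; it backs out the two unknown offsets from the 11-mV and 9-mV points and predicts the case with both stages averaged. The inferred offsets are outputs of this assumption, not simulated values from the paper, and the names are hypothetical.

```python
import math

def input_offset(sig_pre, sig_latch, a_pre, alpha_pre=1.0, alpha_latch=1.0):
    """Input-referred offset (V): RSS of the averaged preamplifier offset and
    the averaged latch offset divided by the preamplifier gain."""
    return math.hypot(sig_pre / alpha_pre, sig_latch / (alpha_latch * a_pre))

A_PRE, ALPHA = 3.0, 3.0
S_NONE, S_PRE_ONLY = 11e-3, 9e-3            # simulated points, Fig. 15(a) and (b)

# Back out the two unknown offsets from the two simulated points (RSS assumed).
sig_pre = math.sqrt((S_NONE**2 - S_PRE_ONLY**2) / (1.0 - 1.0 / ALPHA**2))
sig_latch = A_PRE * math.sqrt(S_NONE**2 - sig_pre**2)

both = input_offset(sig_pre, sig_latch, A_PRE, ALPHA, ALPHA)
# Preamplifier gain that would give the same total without latch averaging.
gain_needed = sig_latch / math.sqrt(both**2 - (sig_pre / ALPHA) ** 2)
print(f"sigma_pre ~ {sig_pre * 1e3:.1f} mV, sigma_latch ~ {sig_latch * 1e3:.1f} mV")
print(f"both stages averaged: {both * 1e3:.2f} mV; gain needed otherwise: {gain_needed:.1f}")
```

The prediction comes out at 11/3 ≈ 3.7 mV, matching the simulated value, and the gain that would be needed without latch averaging comes out at 9.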
$$\sigma_{in} = \sqrt{\left(\frac{\sigma_{pre}}{\alpha_{pre}}\right)^{2} + \left(\frac{\sigma_{latch}}{\alpha_{latch}\,A_{pre}}\right)^{2}} \approx 3.7\ \text{mV} \qquad (19)$$
with the simulated preamplifier random offset $\sigma_{pre}$, the simulated latch random offset $\sigma_{latch}$, both averaging factors $\alpha_{pre}$ and $\alpha_{latch}$ equal to 3, and preamplifier gain $A_{pre} = 3$. We conclude that averaging after the preamplifier array and after the dynamic comparator array reduces the overall input-referred random offset by about 3×. Suppose, hypothetically, that averaging is applied only to the preamplifier array and that the preamplifier gain is raised to overcome the latch dynamic offsets. This gain must be
$$A'_{pre} = \alpha_{latch}\,A_{pre} = 9 \qquad (20)$$
This has important repercussions on power consumption. To raise the gain by 3× in a resistor-loaded single-stage amplifier whose output pole frequency, set by the junction capacitance, is to be kept fixed, the FET $g_m$ must be raised by 3×. This implies 9× higher current. Therefore, cascaded averaging saves almost an order of magnitude in preamplifier power.
E. Second-Stage Comparator
Fig. 16 shows the second-stage comparator. A single-phase clock is applied from this stage throughout the digital back end; multiple phases, with the need to accommodate skews, would limit the highest clock frequency [30]. The second stage provides a rail-to-rail output to the SR latch. As with the preamplifier and the first-stage comparator, overdrive recovery time also limits the highest ADC clock frequency. During reset mode, the output is reset through two parallel discharge paths for fast overdrive recovery. In the next half clock cycle of regeneration, the differential-pair nMOSFETs, configured from cross-coupled CMOS inverters, steer the tail current from one side to the other, speeding up regeneration. Without this coupling, one output node would need to fall far enough to turn off the corresponding nMOS transistor. The SR latch has the same structure as the second-stage comparator without the tail current source.
F. Digital Encoder
Many digital encoding schemes have been developed to suppress glitch errors caused by comparator metastability [31], [32] and bubble errors in the thermometer code [7], [34]–[38]. Metastability errors occur when nonbinary comparator levels drive the digital encoder and produce erroneous outputs. Bubble errors result from three major sources. First, an overall input-referred random offset greater than 0.5 LSB can swap the order of two adjacent thresholds and create a bubble error. Second, in comparator arrays without a single front-end T/H, slew-dependent sampling and clock dispersion [39] increase the bubble-error probability. Finally, propagation-delay variations through the preamplifier stage, caused by limited bandwidth at high input frequencies [7], [25], worsen the bit-error rate due to bubble errors.
In this work, the cascaded regeneration times of the two ranks of comparators and the SR latch guarantee a negligible metastability error rate at 1 Gsample/s. The front-end T/H eliminates bubble errors caused by clock skew. The reset switches in the preamplifier and first-stage comparators suppress bubble errors caused by inadequate overdrive recovery.
With these three major bubble-error sources removed, a ROM-based encoder maps the thermometer-code output of the flash array into a quasi-Gray code (Fig. 17) [40], [41]. This arrangement requires only two-input NAND gates, yet suppresses bubble errors as well as a conventional Gray encoder, which uses slower three-input NAND gates. To operate at a 1-GHz sampling rate and beyond, the D flip-flops are implemented with true single-phase clocked (TSPC) circuits [42], [43]. The encoded output is decimated by 16 on chip to enable easy acquisition by a logic analyzer. The output buffers are differential circuits driving 100-Ω loads.
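The benefit of Gray-type ROM encoding under a bubble can be shown with a plain Gray code. The sketch below is a generic illustration, not the quasi-Gray encoder of [40], [41]: each ROM row fires on a 1-to-0 transition in the thermometer code, the ROM output is the wired OR of all firing rows, and a single bubble at comparator 30 makes rows 30 and 32 fire together. Function names are hypothetical.

```python
def thermometer(level, n_comparators=63):
    """Ideal thermometer code: comparators 0..level-1 output 1."""
    return [1 if i < level else 0 for i in range(n_comparators)]

def rom_rows(thermo):
    """Indices of ROM rows that fire: row k sees a 1 at comparator k-1 and a 0
    at comparator k (virtual 1 below the array, virtual 0 above it)."""
    padded = [1] + list(thermo) + [0]
    return [k for k in range(len(thermo) + 1) if padded[k] == 1 and padded[k + 1] == 0]

def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Inverse of gray(): XOR of all right shifts."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

def encode(thermo, use_gray):
    """Wired-OR ROM: the output word is the bitwise OR of every row that fires."""
    word = 0
    for k in rom_rows(thermo):
        word |= gray(k) if use_gray else k
    return gray_to_binary(word) if use_gray else word

clean = thermometer(32)
bubbly = list(clean)
bubbly[30] = 0                      # one comparator below the level reads 0

print(encode(clean, False), encode(clean, True))    # 32 32
print(encode(bubbly, False), encode(bubbly, True))  # 62 33
```

The binary ROM turns the one-comparator bubble into a 30-LSB error (62 instead of 32), while the Gray-coded ROM is off by only 1 LSB (33).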
G. Clock Generator
In this ADC, clock jitter randomly modulates the periodic sampling instants of the T/H. Nonuniform sampling raises the noise floor of the digitized signal, degrading the signal-to-noise ratio (SNR). Clock jitter becomes increasingly important as the clock period of high-speed ADCs shrinks. Given a sinusoidal input waveform with amplitude $A$ and radian frequency $\omega$, the SNR due to clock jitter alone is
$$\mathrm{SNR} = 20\log_{10}\frac{1}{\omega\,\sigma_t} \qquad (21)$$
where $\sigma_t$ is the rms clock jitter [44]; the amplitude $A$ cancels. According to (21), to obtain an SNR of 38 dB (the ideal SNR for 6-b quantization of a sine wave) at an input frequency of 650 MHz, the rms clock jitter should be less than 3 ps. Low-noise methods taken from analog circuit design are applied to the clock generator (Fig. 18). The circuits convert a differential sine-wave input, terminated in 50 Ω to suppress off-chip reflections that may upset the duty cycle, into the two on-chip clock signals [14]. Each CMOS inverter buffer chain is sized for its clock load: 5.9 pF for one clock and 1.5 pF for the other. When this ADC's measured SNDR (35 dB with a 650-MHz input, Fig. 23) is applied to (21), the deduced clock jitter is less than 4.3 ps.
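The jitter-limited SNR, −20·log10(2π·f_in·σ_t), can be inverted to give the largest tolerable rms jitter. The Python sketch below reproduces the two jitter figures quoted in this subsection; the function names are hypothetical.

```python
import math

def jitter_snr_db(f_in, sigma_t):
    """Jitter-limited SNR (dB) of a sine input: -20*log10(2*pi*f_in*sigma_t)."""
    return -20.0 * math.log10(2.0 * math.pi * f_in * sigma_t)

def max_rms_jitter(f_in, snr_db):
    """Largest rms sampling jitter (s) compatible with snr_db at f_in."""
    return 10.0 ** (-snr_db / 20.0) / (2.0 * math.pi * f_in)

ideal_6b_snr = 6.02 * 6 + 1.76            # about 37.9 dB, rounded to 38 dB in the text
print(f"38-dB target at 650 MHz: {max_rms_jitter(650e6, 38.0) * 1e12:.2f} ps rms")
print(f"35-dB measured SNDR at 650 MHz: {max_rms_jitter(650e6, 35.0) * 1e12:.2f} ps rms")
```

The results are about 3.1 ps for the 38-dB target and about 4.35 ps for the measured 35-dB SNDR. Because the measured SNDR also includes quantization noise and distortion, the jitter-only SNR must be at least that high, which is why the second value is an upper bound on the jitter.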
V. PERFORMANCE EVALUATION
The chip occupies 2.2 × 2.2 mm², with an active area of 0.8 mm², and was fabricated through ST Microelectronics using only the MOSFETs of the 0.35-µm four-metal single-poly BiCMOS6M technology. The process offers double-metal linear capacitors and precise poly resistors. More than half of the pads (33 of 64) are assigned to power and ground to lower series inductance. Digital and analog supplies are separately connected to on-chip decoupling capacitors (1.2 nF for the digital supply and 0.4 nF for the analog supply). Extracted simulation shows that a total decoupling capacitance of 600 pF with 5.6-nH inductance causes 100 mV of power and ground bounce; the larger on-chip decoupling capacitance used here (1.6 nF in total) avoids unexpected performance degradation due to power and ground bounce. Fig. 19 shows a microphotograph of the ADC layout. The averaging network connects the two edge preamplifiers in the array. The folded reference ladder provides differential reference voltages to the preamplifier array so that the two preamplifiers at the edges are positioned next to each other. This equalizes the parasitic capacitance at each node in the array. The first comparator array and its averaging network are laid out in the same way. The digital back end, pipelined for high-speed operation, occupies about 50% of the active chip area.
The entire chip was extracted and simulated using mixed-signal techniques, and the reported performance is from first silicon. Fig. 20 shows the test setup of the ADC. The die is mounted in a 64-pin TQFP for all tests. To measure dynamic performance accurately, the ADC is clocked with a …
Integral nonlinearity (INL) and differential nonlinearity (DNL) are calculated from histograms of the ADC output data. When digitizing a 630-MHz sinusoid at 1 Gsample/s, the peak DNL is less than 0.2 LSB and the peak INL is below 0.35 LSB (Fig. 21). Fig. 22 shows the reconstructed spectra. With a 630-MHz input at 1 Gsample/s, the ADC achieves 5.5 effective bits; with a 650-MHz input at 1.3 Gsample/s, it achieves five effective bits. Both figures are consistent with 6-b linearity and indicate that cascaded resistor averaging effectively suppresses threshold errors.
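Histogram-based INL and DNL extraction with a sine input relies on the arcsine distribution of sine-wave samples, as described, for example, in IEEE Std 1241. The sketch below applies that method to a hypothetical 6-b quantizer with known threshold errors and checks that the estimates recover them. It assumes the sine amplitude and offset are known, and it uses quasi-uniform golden-ratio phases in place of a real incoherent input so that a modest record length suffices; all names and values are illustrative.

```python
import bisect
import math

def transition_levels(codes, n_codes, amplitude, offset=0.0):
    """Transition levels T_1..T_{n-1} from a sine-wave code histogram.

    Arcsine law: the fraction of samples below T_k is the cumulative count
    Hc[k-1]/S, so T_k = offset - A*cos(pi*Hc[k-1]/S). Amplitude and offset
    are assumed known here; in a real test they are estimated first.
    """
    hist = [0] * n_codes
    for c in codes:
        hist[c] += 1
    total, cum, levels = len(codes), 0, []
    for k in range(1, n_codes):
        cum += hist[k - 1]
        levels.append(offset - amplitude * math.cos(math.pi * cum / total))
    return levels

def dnl_inl(levels):
    """DNL of inner codes and endpoint-fit INL at each transition, in LSB."""
    widths = [b - a for a, b in zip(levels, levels[1:])]
    lsb = (levels[-1] - levels[0]) / len(widths)
    dnl = [w / lsb - 1.0 for w in widths]
    inl = [(t - (levels[0] + i * lsb)) / lsb for i, t in enumerate(levels)]
    return dnl, inl

N_CODES = 64
LSB = 2.0 / N_CODES                      # full scale [-1, 1)
# Hypothetical thresholds with an alternating +/-0.1-LSB error.
true_levels = [-1.0 + k * LSB + 0.1 * LSB * (-1) ** k for k in range(1, N_CODES)]

AMP = 1.02                               # slight overdrive so the end codes are hit
PHI = (math.sqrt(5.0) - 1.0) / 2.0       # golden-ratio phase step (quasi-uniform)
codes = [bisect.bisect_right(true_levels, AMP * math.sin(2 * math.pi * ((i * PHI) % 1.0)))
         for i in range(200_000)]

dnl_est, inl_est = dnl_inl(transition_levels(codes, N_CODES, AMP))
dnl_true, inl_true = dnl_inl(true_levels)
print(f"peak DNL = {max(map(abs, dnl_est)):.3f} LSB, "
      f"peak INL = {max(map(abs, inl_est)):.3f} LSB")
```

In a real measurement the amplitude and offset are first estimated from the histogram itself, and the record must be long enough to populate the least-hit codes near mid-scale.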
The ADC's dynamic performance shows an effective resolution bandwidth exceeding the Nyquist frequency. The signal-to-noise-plus-distortion ratios (SNDRs) at 1.025- and 1.312-Gsample/s conversion rates are plotted versus input frequency in Fig. 23(a). The SNDR of the reconstructed output is a flat 35 dB up to a 630-MHz input at 1 Gsample/s, and 32 dB up to a 650-MHz input at 1.3 Gsample/s. Fig. 23(b) shows the SNDR with a fixed low input frequency (100 MHz) and a swept conversion rate: the SNDR stays above 5.5 effective bits up to a maximum conversion rate of 1.34 Gsample/s.
The bit-error rate is measured by digitizing, at 1 Gsample/s, a full-scale low-frequency sine wave that changes by at most 1 LSB between successive samples. Ten thousand records of one million output samples each are captured by a logic analyzer. A bit error is flagged when the digitized output changes by more than 2 LSB between successive samples. No errors were detected in these 10^10 samples, meeting the bit-error-rate requirement of the partial-response maximum-likelihood (PRML) read channel.
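Observing no errors in a finite record bounds the error rate rather than proving it is zero. A common estimate, assuming independent errors and a Poisson model, is that zero errors in n trials bound the rate below −ln(1 − c)/n at confidence c, about 3/n at 95%. The sketch applies it to the 10^10 samples captured here; the function name is hypothetical.

```python
import math

def ber_upper_bound_zero_errors(n_observed, confidence=0.95):
    """Upper confidence bound on the error rate when no errors are seen in
    n_observed independent trials (Poisson approximation): -ln(1-c)/n."""
    return -math.log(1.0 - confidence) / n_observed

n_samples = 10_000 * 1_000_000          # ten thousand records of one million samples
print(f"BER < {ber_upper_bound_zero_errors(n_samples):.1e} at 95% confidence")
```

The result is about 3 × 10^−10 at 95% confidence.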
Excluding output buffers, this ADC consumes about 500 mW from 3.3 V at 1 Gsample/s, and 10% more at 1.3 Gsample/s. The logic circuits and clock buffers consume half the power, and this portion of the total power will scale down with technology. The performance is summarized in Table I. Reset switches in the preamplifier and comparator arrays give fast overdrive recovery. The second comparator is the fastest such CMOS circuit with rail-to-rail output. In this work, averaging networks serve three purposes: first, to lower the impact of preamplifier static offset; second, to lower the latch dynamic offset; and third, to improve the preamplifier bandwidth by 3×. These ideas have led to a compact CMOS ADC with unprecedented dynamic performance.
