Abstract: Averaging networks suppress the random mismatch between the comparators in flash A/D converters (ADCs), but neither an analytical understanding nor a method for optimal design exists. This paper unifies various forms of offset averaging and derives the optimum by treating the averaging network as a spatial filter. At the optimum, the offsets in terms of LSB are minimized with almost no loss in comparator unity-gain bandwidth. This work leads to an easy-to-follow procedure to design efficient and fast CMOS flash-type ADCs.
I. INTRODUCTION
Communications receivers for QAM, disk read channels, and gigabit Ethernet extract the clock and data in a digital signal processor (DSP) operating on samples of the received waveform digitized to a resolution of 6-8 b. At this relatively low resolution, the straightforward full-flash architecture seems best suited for the high-speed analog-to-digital converter (ADC). However, in CMOS, flash ADCs suffer greatly from random offsets in the comparators, which can easily exceed the least significant bit (LSB); in fact, offsets can limit the achievable linearity to less than 6 b. The spread in threshold voltages scales down inversely as the square root of the size of the comparator preamps, but at the expense of higher power consumption and capacitance in these circuits, both of which grow with FET size. There is also the dynamic offset arising from clock switching in the regenerative latch that follows the comparator preamp, which can be much larger in magnitude than the spread in threshold voltages. Preamplification can lower the relative effect of latch offsets, but only within limits set by the gain-bandwidth tradeoff; that is, at higher gains the preamp bandwidth can shrink to the point that it limits the ADC's overall conversion rate. The key to good resolution at high speeds therefore lies in efficient methods to combat random device mismatches in the preamplifiers and comparators.
Offset averaging is one such method that may be applied to arrays of preamplifiers and comparators in a flash ADC. Various forms of averaging to smooth out the random mismatch across the comparator preamp array have been described previously [1] - [5] . However, these implementations are not as effective as they might be, or they compromise other properties such as signal gain and unity-gain bandwidth. In this paper we show that offset averaging is in essence spatial filtering; unless an implementation exploits this insight, there is no guarantee that it will be optimal. The rigorous analysis presented here unifies offset averaging and interpolation. It shows how to optimally lower random offset, and it covers the associated costs in signal gain and unity-gain bandwidth, edge effects, tail current mismatch, correlated offsets, and the benefits of cascaded averaging. Averaging used well leads, ideally, to savings in comparator power and area with negligible loss in bandwidth. The optimal averaging has been applied to the sub-ADCs in a 2-stage 12-b pipeline ADC [6] and to a 6-b, 1.3-GSample/s flash ADC realized in 0.35-µm CMOS [7] .
Our analysis shows that, in the course of averaging, signal lost in the comparator preamp array degrades the system signal-to-noise ratio (SNR), weakening the averaging effect and raising the input-referred offsets contributed by the comparator latches. If the notion of SNR is not used, as is the case in a recent analysis of offset averaging [8] , it is easy to focus on noise reduction only and miss the fact that the resulting nonoptimal averaging network degrades signal gain and therefore actually worsens input-referred offset.
Following a brief review of the averaging techniques in Section II, the concept of spatial filtering is explained in Section III to deal with the ideal case of an infinite array of comparators with translational symmetry, and a figure of merit is proposed to quantify the effectiveness of offset averaging. Section IV discusses the problems that arise at the boundary of comparator arrays, and how to lower the integral nonlinearity (INL) there. Section V gives a step-by-step procedure to realize the optimum. Section VI highlights some specific cases of interest and extends averaging to a broader scope. Section VII validates the theory with experimental results. Table I summarizes the notation used in this paper.
II. BACKGROUND AND SYNOPSIS
Offset averaging was first presented [1] in an array of comparator preamps comprising a flash ADC, whose load resistors connect to nearest neighbors through lateral resistors [ Fig. 1(a) ]. The authors found that although the lateral connections lowered the total load resistance at each comparator, and therefore the gain to the random input offsets (the "noise"), they appeared open circuit to the zero-crossing "signal" which defines the comparator's useful output. In later work [2] , current sources (i.e., infinite load impedance) were used in place of the finite load resistors to restore gain to the zero-crossing signal. As we show in this paper, with the correct lateral resistors there is almost no loss in signal gain and bandwidth. Although lateral resistors do lower the load impedance of a single isolated preamp, when an array of preamps compares an input with closely spaced thresholds the lateral connections convey signal currents from adjacent comparator cells into the critical cell sensing the zero crossing, reinforcing the signal current and enhancing the signal. Under the right conditions, this exactly makes up for the lower impedance; effectively, the lateral resistors appear open circuit to the signal. On the other hand, current source loads push the averaging to an extreme, which we will show is not optimum. At this extreme, gain-bandwidth is lost, edge effects worsen, sensitivity to global mismatches such as those in tail bias currents is heightened, and implementation is complicated.
Predating their use in averaging, lateral resistors were used to interpolate zero-crossings (ZXs) between a coarsely spaced array of comparator preamps or ZX generators [9] - [11] . However, this requires that buffers drive the resistors to suppress interactions between the preamp outputs, which might otherwise shift the true position of ZXs [10] . As we will show, in averaging this very interaction is beneficial and distinguishes it from interpolation. Indeed, if the interaction is uniform at the output of every ZX generator across the array, the ZX points will not shift from their true positions. At the edge of the full scale, however, this necessitates a number of dummy preamps to extend the array beyond the input full scale. Furthermore, it transpires that as the load resistance grows relative to the lateral resistance, the interaction between neighbors gets stronger; but this does not necessarily lead to better averaging, because more dummies are now required, which leaves a smaller voltage full scale for the analog input at a given supply voltage. Casting these tradeoffs into an analytical form leads to the optimal averaging.
Averaging exploits the higher SNR in the sum of many samples of a signal corrupted by uncorrelated noise (the random offsets). As an analogy, consider the random charges trapped in the gate oxide of a MOSFET and the random variation in the depletion charge density which cause the MOSFET threshold voltage to fluctuate randomly [12] , [13] . These fluctuations may be averaged out by connecting many identical MOSFETs in parallel, equivalent to scaling up the MOSFET size. Here the "noise" is the random charge variation; the "signal" is the externally applied charge on the gate capacitance. As the number of MOSFETs increases, the uncorrelated noise adds up as the root-mean-square while the correlated signal adds in magnitude, resulting in an input-referred voltage whose variance shrinks inversely with MOSFET area. It is counterintuitive that increasing the bulk doping worsens the mismatch when the charge fluctuation relative to is reduced by averaging. The absolute charge fluctuation, which is the "noise" associated with mismatch, rises with , but the , which corresponds to the "signal" in this case is almost independent of . This shows the value in correctly identifying both the signal and noise involved in averaging.
In short, averaging requires the summation of quantities spread into two or more samples in either time or space. In the MOSFET example, the parallel connection spreads the noisy charges. Similarly in the preamp array of a flash ADC the output current spreads through the lateral connections in the averaging or interpolation resistor network. Other averaging methods include current spreading with split transistors [14] , and charge spreading with lateral capacitors [3] . Summation is automatic when physical quantities such as currents and charges merge at a node in accordance with Kirchhoff's laws.
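The square-root improvement that averaging buys can be checked numerically. The following is only a minimal Monte Carlo sketch, assuming independent Gaussian offsets of unit standard deviation; the function name, trial count, and seed are illustrative choices and do not come from the paper.

```python
import numpy as np

def averaged_offset_sigma(n_devices, n_trials=20000, seed=0):
    """Standard deviation of the mean offset of n_devices parallel devices.

    Assumption: each device has an independent, unit-sigma Gaussian offset.
    """
    rng = np.random.default_rng(seed)
    offsets = rng.standard_normal((n_trials, n_devices))
    # Summation spreads the noise; dividing by N refers it back to one device.
    return offsets.mean(axis=1).std()

for n in (1, 4, 16, 64):
    print(n, averaged_offset_sigma(n), 1.0 / np.sqrt(n))
```

Under these assumptions the simulated spread should track 1/sqrt(N), which is the same statement as a variance that shrinks inversely with the number of devices summed.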
The impulse response (IR) of the spreading network, which forms a spatial filter, characterizes the extent of spreading. Usually the wider the IR (i.e., the narrower the filter passband), obtained in the example above by increasing the number of parallel MOSFETs, the higher the SNR. However, as discussed later, there are limits to this in uses such as offset averaging, where a wider IR also filters out signal. To correctly optimize the SNR, the properties of both signal and noise must be taken into account to find the best IR. Therefore, the averaging network must be designed as a matched filter [19] .
III. SPATIAL FILTERING
All properties of an infinite array of ZX generators, as shown in Fig. 1(a) , are the same at every node, that is, the properties are translation-invariant (the spatial dual of time-invariance). The resistor network in the upper part, modeled as a linear system or spatial filter [15] , is subjected to current stimulus from the lower part. In Fig. 1(b) , the current flowing in each defines the filter "output", in response to stimulus from the differential pairs (i.e., the stages) of the ZX generators at nodes of the filter. In the following analysis, we neglect the resistor mismatch in the spatial filter and use current instead of voltage as the system variable.
Although the spatial filter shown here is a first-order one-dimensional (1-D) differential resistive network, it may equally well be any higher-order network, which might not even be resistive. The stimuli might take forms other than current. We define the order of a lateral connecting element in a linear network as the number of network nodes it spans. Each resistor here is labeled with its order. For example, the lateral resistor in Fig. 1 is labeled with order 1, because it spans adjacent nodes; the load resistor, which spans no node, is labeled with order 0. The highest resistor order defines the network order. For simplicity, we base the following discussion on a linear first-order resistive network stimulated by currents, and we analyze an actual differential configuration as single-ended. Quantities after averaging (that is, with the lateral resistors present) are labeled with a prime.
A. Impulse Response
The IR of the spatial filter is found by injecting a unit stimulus current at one node (Fig. 2) and noting the resulting distribution of current in the load resistors at the other nodes. For a stimulus current injected into an arbitrary node, Kirchhoff's current law (KCL) gives (1), in which the third and fourth terms are the currents flowing into that node from its two neighboring nodes. Using the z-transform simplifies analysis [15] . Applying the z-transform to (1) gives (2).
The inverse z-transform yields [15] the spatial impulse response (3). If the IR were rectangular, it would be fully characterized by its width. Although it is impossible to realize a brick-wall IR with a resistive network, it greatly simplifies discussion if we represent the actual IR with an equivalent rectangle whose width spans, roughly speaking, the same number of nodes across which the magnitude of the actual IR is significant. A precise definition of this equivalence may be, for instance, the rectangular IR that yields the same averaging effect as the actual IR. We define this equivalent width on the basis of the boundary conditions in Section IV.
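The impulse response of a first-order resistive ladder can also be obtained numerically by solving KCL at every node. The sketch below assumes a single-ended network with an ideal unit current source at the center node; the names `r0` (load resistor) and `r1` (lateral resistor) are chosen here for illustration. The closed form it is compared against is a two-sided geometric sequence derived under the same assumptions for an infinite ladder, and it is not necessarily written in the same notation as (3).

```python
import numpy as np

def ladder_impulse_response(r0, r1, half_width=60):
    """Load-resistor currents for a unit current injected at the center node."""
    n = 2 * half_width + 1
    g0, g1 = 1.0 / r0, 1.0 / r1
    G = np.zeros((n, n))                 # nodal conductance matrix
    for k in range(n):
        G[k, k] += g0                    # load resistor from node k to ground
    for k in range(n - 1):               # lateral resistor between k and k+1
        G[k, k] += g1
        G[k + 1, k + 1] += g1
        G[k, k + 1] -= g1
        G[k + 1, k] -= g1
    stimulus = np.zeros(n)
    stimulus[half_width] = 1.0
    v = np.linalg.solve(G, stimulus)     # KCL at every node: G v = i
    return v * g0                        # current in each load resistor

def closed_form_impulse_response(r0, r1, half_width=60):
    """h(k) = (1 - b)/(1 + b) * b**|k| for an infinite ladder (assumed form)."""
    a = r0 / r1
    b = ((1.0 + 2.0 * a) - np.sqrt(1.0 + 4.0 * a)) / (2.0 * a)
    k = np.arange(-half_width, half_width + 1)
    return (1.0 - b) / (1.0 + b) * b ** np.abs(k)

h = ladder_impulse_response(1.0, 0.1)
print(h.sum(), h.max())
```

The load currents sum to the injected current, and the decay factor `b` approaches 1 (a wider IR) as `r1` shrinks relative to `r0`. The truncated array is long enough here that its open ends have a negligible effect on the result.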
B. Stimuli
Consider an input voltage equal to the threshold at node . The differential current flowing in the load resistor of the zeroth ZX generator that compares with [ Fig. 1(a) ] may not be zero if an offset current is injected to any node of the spatial filter. Offset currents arise from mismatch [12] , [13] in the differential pairs and fluctuations in the tail currents of the ZX generators. Since offsets are usually within the linear region of the transistors, we have , where and are the transconductance and input referred voltage offset of the th diffpair, and is the error current injected to node due to random spreads in tail currents across the array. We should point out that changes with the state of the associated diffpair: if the pair is at balance, is zero, whereas if the pair is clipped, reaches its maximum (random) value. Now we shift away from by until the output current of the zeroth ZX generator returns to zero. As drives the input of all differential pairs, the consequent current stimuli are injected to each node . If the lateral resistor were absent, . Resistor allows the offset currents to flow into the zeroth from the adjacent ZX generators, causing the randomly opposite current to partly cancel, which may result in rms rms (rms). In terms of spatial filtering, corresponds to incremental "signal" stimulus, while becomes "noise." Fig. 3 plots signal and noise across the array. is differential output current and is the difference between the input and the th threshold. Only the ZX generators operating in the differential pair's -transition region will present signal stimuli while those which are clipped will not. This amounts to windowing an array of current sources with , that is, . If the transition region were linear, this window would be rectangular.
To simplify the ensuing discussion, we will assume that a rectangular signal window of width spans the active preamps in the nonclipped active, or transition, region, i.e., for , otherwise, . For a nonrectangular signal window we can define the equivalent width following the same methodology as for the impulse response. When CMOS differential amplifiers comprise the preamp array and assuming square-law FET characteristics, we can say that , where when the differential pair is biased at balance. Noise currents arising from are also windowed. The ZX generators clip outside the window. However, the clipping levels themselves fluctuate due to spreads in the tail currents. If we assume that the random offset at each node is uncorrelated and the RMS noise current due to differential pair mismatch is roughly comparable to the RMS spreads in tail current, then the "noise" is of the same strength everywhere across the array, both inside and outside the window defined by the transition region. The constant noise spectral density amounts to white noise. If there were no spread in the tail currents, the noise window width , which represents the number of offset current stimuli, would equal . This is the case if the tail current mismatch is minimized using very large transistors, however, this goes against the spirit of shrinking overall chip area with offset averaging.
C. Matched Filter
The decaying IR of the averaging network indicates lowpass filtering in spatial frequency. The windowed signal stimulus sequence occupies a limited band of spatial frequencies centered at dc. The lowpass filter must be wide enough to capture the signal in its passband. The noise spectrum depends on the statistics of and . If it is white, as we have assumed in the previous section, we know from communication theory [19] , that the optimal filter maximizes SNR by matching the passband shape to the signal spectrum. In the case of a rectangular IR and rectangular signal window, the filter is matched when (4)
D. Input Referred Offset
As explained in Section II, when two or more preamps interact, offsets are averaged and random fluctuations in their outputs are smoothed out. Fig. 4 shows the large-signal output currents before and after interaction. Due to offsets, the dots on the figure representing the individual output currents disperse around the nominal curve. The standard deviation of this dispersion, shown by the thick shaded curve, is assumed uniform across all preamps. Normalized by the transconductance at the zero crossing, it appears at the differential pair's input as an equivalent input voltage offset. Three things are now apparent. First, the thickness of the shaded curve along the vertical axis shrinks when fluctuations in the output currents smooth out as a result of offset averaging; the noise is now scaled by a noise "gain". Second, the slope of the transfer characteristic at the zero crossing, or the effective signal transconductance, diminishes when the interaction involves preamps beyond those in the transition region (i.e., when the impulse response is wider than the stimulus window). This lowers the signal gain and also the unity-gain bandwidth at the output of each preamp. The number of interacting preamps defines the impulse response width, while the number of preamps in the active region defines the stimulus width. Third, the input-referred offset voltage after averaging is the ratio of the averaged noise current to the effective signal transconductance. As the impulse response width sweeps past the stimulus width, the denominator of this ratio diminishes faster than the numerator. Therefore, the input-referred offset is least when the two widths are equal, and we have constructed the matched filter.
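The matched-filter argument can be illustrated with idealized rectangular shapes. The sketch below assumes a unit-area rectangular IR, a rectangular signal window of unit height, and white unit-variance noise; the function names and the widths used are illustrative. It evaluates the improvement in signal-to-noise ratio at the zero-crossing node as the ratio of signal gain to rms noise gain.

```python
import numpy as np

def rect(width, length):
    """Centered rectangular window of odd `width` in an array of odd `length`."""
    w = np.zeros(length)
    c = length // 2
    w[c - width // 2 : c + width // 2 + 1] = 1.0
    return w

def ecf_rectangular(w_ir, w_sig, length=201):
    """SNR improvement for a rectangular IR and a rectangular signal window."""
    h = rect(w_ir, length) / w_ir            # unit-area impulse response
    signal = rect(w_sig, length)             # active preamps contribute signal
    signal_gain = np.dot(h, signal)          # filter output at the center node
    noise_gain = np.sqrt(np.sum(h ** 2))     # rms gain seen by white noise
    return signal_gain / noise_gain

for w_ir in (3, 9, 27):
    print(w_ir, ecf_rectangular(w_ir, 9))
```

With a 9-wide signal window, the improvement grows as the square root of the IR width until the IR is as wide as the window, and falls beyond that point because signal gain is lost faster than noise.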
E. Error Correction Factor (ECF)
A figure of merit of the spatial filter is the amount by which it lowers input-referred offset. This is the ratio of the offset RMS before averaging (without the lateral resistors) to that after averaging (with the lateral resistors), and we call it the ECF. Thus (5) where the SNRs are defined as the ratio of the incremental signal current to the offset current at the critical zero-crossing node. An earlier work on averaging [1] also defines an ECF, but in terms of differential nonlinearity (DNL). This is the improvement in spreads of the difference between successive thresholds, whereas our definition applies to the absolute error in each threshold. Ours is the more stringent and practically useful definition because successive thresholds along the averaging network are highly correlated, which means that averaging lowers DNL substantially more than it does the absolute error. Now let us derive the ECF for a spatial filter with the IR given by (3). We assume that it is driven by a rectangular signal window of height and width , which approximates the actual diffpair -characteristic as a piecewise linear function. Since the output of any filter is found by convolving its input with the impulse response, we have (6) and (7) For a rectangular noise window of width , the output noise current is given by (8) which leads to the standard deviation (9) Taking from (3) and substituting (6) and (9) into (5), it can be shown that (10) To find the reduction in DNL, we first derive the spread in the difference between adjacent offset currents (11) which leads to the standard deviation given in (12), and then to the DNL error correction factor (13) The percentage ECF as defined in [1] is precisely our . By setting , respectively, equal to and infinity, (10) and (13) will give both absolute and DNL ECF for windowed and white noise. Fig. 5 plots , and the two ECFs with respect to the filter resistor ratio, .
The trends found by analysis are consistent with the earlier qualitative reasoning. The ECF for windowed noise reaches a maximum as the resistor ratio approaches 0 (i.e., toward the current-source-load extreme), while in the case of white noise the optimum lies at some nonzero value, as is expected from a matched filter. From the plot, the DNL improves sharply as the ratio approaches zero, where the correlation between adjacent offsets approaches 1. For white noise, the ECF is not very sensitive to the ratio; however, the signal gain rolls off rapidly to zero as the ratio falls below the optimum, indicating a sharp loss in (unity-gain) bandwidth.
Figure caption: Cross-connection as a way to terminate a differential averaging network [2].
These results are consistent with expressions for ECF that have been proposed in earlier publications. The expressions are all based on simplified circuit analysis that applies to extreme cases such as (or ) [1] , [18] , or [2] . In contrast, the results derived here, which converge asymptotically to these earlier expressions, hold over the entire range of the key design parameters ( , and ) thus enabling global optimization of both INL and bandwidth. The authors of [18] derive ECF for arbitrary using superposition [8] similar to (9) but under the restrictive assumption that the entire preamp array is linear (i.e., and ) as shown in Fig. 1 of [8] . In fact the ECF expressed in (5) and plotted in Fig. 3 of [8] is nothing more than our . In summary, we have cast an averaging resistor network as a spatial filter with an associated impulse response and lowpass characteristic. This formulation predicts the filter response, respectively, to the zero-crossing outputs of the preamp array, which we call the "signal", and to the uncorrelated random offsets, which is "noise". Averaging improves the signal to noise ratio by an amount depending on . This improvement we call the error correction factor.
IV. THE OPTIMUM FILTER

A. Boundary Condition and Edge Effects
The finite extent of the preamp array poses unique problems at the boundaries of an averaging flash ADC. Usually the preamp array comes to an end at the upper and lower limits of the analog full scale. This disrupts averaging at the last few preamps, because at the extreme nodes there is no longer an equal number of stimuli into the resistor network from both left and right. As shown in Fig. 6(a), the vertical line that crosses the zero-crossing point at the right-hand extreme of FS intercepts ZXs only in the upper half plane if there are no dummy ZXs. In the presence of the lateral resistors, those intercept points contribute positive currents to the zero-crossing node and effectively pull the ZX toward the center of the array. Unless the positive intercept points are balanced with negative ones from dummies or by distorting reference taps, all the ZXs within the interaction range at each edge are pulled, resulting in a systematic INL curvature. This may explain why earlier work on averaging does not disclose measurements of INL near the boundaries [2] .
Linearity across the full scale requires that no distortion should build up at the edges. A straightforward solution is to add preamps on either edge of the array to extend the averaging network by at least , the extent of the interaction range. Therefore, dummy preamps numbering are attached at each end, such that (14) Each dummy preamplifier drives the averaging network by comparing the analog input signal with uniformly spaced threshold voltages that extend beyond the actual full scale. However, there is no need to latch the outputs of the dummy preamplifiers, because their sole purpose is to remove edge effects in averaging. To guarantee perfectly uniform IR, the -network must not only span the dummy ZX generators, it should finally terminate in a resistor to ground [17] , which, like a transmission line terminated in its characteristic impedance, gives the impression of infinite length.
is obtained from the following relation:
The accuracy of this expression is easily checked with the special case of , when the network becomes the well known ladder. Equation (15) correctly predicts that the termination for this ladder is a resistor to ground. It is interesting to note that the authors of the very first report on averaging [1] obtain the same expression for as an intermediate variable in an expression, but do not give its physical significance.
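The transmission-line analogy suggests one way to compute such a termination: choose the resistance equal to what the rest of an infinite ladder would present. The sketch below solves that characteristic-resistance condition for a single-ended first-order ladder. The names `r0` (load) and `r1` (lateral), and the placement of the termination from each end node to ground, are assumptions made for illustration, and the formula is not necessarily written in the same form as (15).

```python
import numpy as np

def termination_resistance(r0, r1):
    """Resistance R_T that satisfies R_T = r1 + (r0 in parallel with R_T)."""
    return 0.5 * (r1 + np.sqrt(r1 * r1 + 4.0 * r0 * r1))

def terminated_ladder_response(r0, r1, n_nodes, inject_at):
    """Load-resistor currents of a finite ladder terminated at both end nodes."""
    g0, g1 = 1.0 / r0, 1.0 / r1
    gt = 1.0 / termination_resistance(r0, r1)
    G = np.diag(np.full(n_nodes, g0))    # load resistor at every node
    for k in range(n_nodes - 1):         # lateral resistor between k and k+1
        G[k, k] += g1
        G[k + 1, k + 1] += g1
        G[k, k + 1] -= g1
        G[k + 1, k] -= g1
    G[0, 0] += gt                        # termination stands in for the
    G[-1, -1] += gt                      # missing infinite continuation
    stimulus = np.zeros(n_nodes)
    stimulus[inject_at] = 1.0
    return np.linalg.solve(G, stimulus) * g0

print(termination_resistance(2.0, 1.0))
print(terminated_ladder_response(1.0, 0.1, 8, 0))
```

For `r0 = 2 * r1` the function returns `2 * r1`, the familiar termination of an R-2R ladder. With this termination, a unit current injected even at the very first node produces the same geometric decay as in an unbounded network, which is the sense in which the array appears infinitely long.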
It is also possible to design an active termination to the network that eliminates the need for many dummy preamps [18] . This can be complicated and does not necessarily work well, because the termination circuit must balance as many as ZXs at each edge. [8] describes another, simpler, active termination, but this is useful only for small , which leads to narrow IR and limited averaging (see Fig. 5 ). The authors of [8] discuss how to extend this to the case of , which however needs more dummies. Briefly, to correct INL curvature, the averaging resistance tapers down toward the terminations. Approaching the boundary, the growing influence of the termination preamps requires fewer dummies to balance the preamps on the other side of zero-crossing. However, with this termination scheme the circuit is no longer translation invariant.
In [8] , it is questionable whether the ECF expression (5) derived for an infinite resistive network can be directly substituted into (16) to yield Figs. 11 and 12 for the finite array with terminations that do not preserve the uniformity within the input full scale. Averaging termination is not only a matter of correcting the INL curvature; it should also preserve key circuit properties across the ADC's full scale, including the ECFs. Also, [8] does not consider the reduction of input FS by the termination over range. Taking these tradeoffs into account, it remains unclear from the analysis of [8] whether the net benefit, if any, of these termination schemes is significant compared to the straightforward approach of using a sufficient number of dummy preamps.
Based on this discussion and (14), we can now define for a nonrectangular IR an equivalent width of the impulse response . We start with a number of active preamps that is much larger than the number of dummies; then we adjust the number of dummies and monitor the deviation of the ZX at the edges of the input FS until the INL is just acceptable; at this point, we define the impulse response width as (16)
In practice, the ADC and the averaging network are almost always differential circuits. This offers the possibility of constructing what is effectively an unbounded averaging network, a seamless loop of resistors whose one edge cross-connects to the other edge [2] , as shown in Fig. 6(b) . However, it would be wrong to think that because this averaging network has no boundary, it needs no dummies. The input diffpairs stimulate it with a finite array of stimuli that begin and end at the two extremes of the full scale, and at these extremes dummies are still necessary. In other words, the unbounded network is subject to a bounded span of stimuli, and edge effects still arise at the boundaries of the stimuli. Fortunately, some simplifications are possible. Dummies may be shared across the cross-connection by using the clipped outputs of ZX generators at one end of the array as dummy stimuli at the other end, because ideally both positive and negative clipping levels are equal in magnitude. The actual number of dummies now need only be (17) In the case of a nonrectangular signal window generated by a real preamp array, we define an equivalent window width as the minimum that holds the INL to an acceptable level in a differential network with seamless cross-connection, and the of the network is much larger than (i.e., ) then (18)
B. Optimum
Consider the outputs of a preamp array in an -bit flash CMOS A/D converter averaged by a network whose impulse response and signal window are both rectangular. The standard deviation of offset after uniform averaging of the random preamp offsets across the window may be calculated from , the sum of the input FET gate areas of the dummies. Recalling that to avoid edge effects, the dummies must span this very window, it follows (19) Here is the mismatch coefficient for FET threshold voltage per unit gate area [12] .
When (16) and (18) describe of an actual averaging network and the of a practical preamp array (Fig. 1) , averaging across the preamps in the window is weighted by a nonrectangular IR and a nonrectangular signal window. To account for departure from the ideal rectangular window, we introduce a shape factor (where for a rectangle, ). Equation (19) is then revised as follows: (20) where can be thought of equally well as the ratio of the gate area of a lumped single FET that yields the same RMS offset to the total aggregate FET area covered by the averaging window . In terms of , where corresponds to the rectangular IR and signal window of the same widths, and , respectively. Let , where is the total number of preamps, including dummies. Then , where is the aggregate FET gate area of all preamps. As the dummy preamps compare the ADC input with thresholds beyond the actual analog input full scale FS, it follows that , where is the largest possible full scale across which the comparator preamp can operate properly at a given supply voltage. In effect, the parameter sets the upper limit of possible averaging in the array to meet a certain specification on INL. If the 4-sigma limit of fluctuations in the ZX thresholds is to be held to less than one LSB, that is, , and the preamp gain suppresses comparator latch offsets so that they no longer matter, then using (20) and we deduce that the ADC can resolve at most bits, where (21) Let us fix the total FET gate area in the array and also the supply voltage. Equation (21) shows that averaging is optimum when the term reaches a maximum. This happens when , and implies the following number of dummies: (22) That is, for strongest averaging the dummies should comprise one third of the total number of preamplifiers in the array. Combining (22) with the optimal relations derived in Section III, we obtain the overall optimum for two cases:
The inequality in (24) is consistent with (4) . Here the optimal boundary condition forces . Whatever the distribution of the signal and noise outside the IR window, it has no effect on the spatial filtering and does not alter the optimum. When the signal window and IR are not rectangular, the equivalent and defined by the shape factor must satisfy (23) or (24). In Case (a), the signal window limits [ Fig. 7(a) ], [2] . In Case (b) the signal is almost constant within the [ Fig. 7(b) ], so the IR limits . Due to these two similar limitations on , averaging in Case (a) and Case (b) lowers offset to a comparable degree. However, Case (b) also preserves preamp bandwidth and suppresses offsets across the tail currents. Therefore, (24) specifies the practical optimum.
The expression is quite flat around the maximum, which occurs when the dummies make up one third of the array. For instance, as the dummy fraction ranges from 1/6 to 1/2, its magnitude changes from 0.340 to 0.354, departing from the peak value of 0.385 by at most 12%. The broad optimum means that in practice we can gain most of the benefit with dummies comprising as little as 1/6 of the total number of preamps, or even less.
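The flatness of this optimum is easy to tabulate. The sketch below assumes that the term being maximized has the form (1 - x)·sqrt(x), where x is the fraction of preamps used as dummies. This form is an inference from the quoted peak of 0.385 and from the one-third optimum stated in the summary; it is not copied from (21), and the function name is illustrative.

```python
import numpy as np

def averaging_term(dummy_fraction):
    """Assumed trade-off term: full scale shrinks as (1 - x), and the offset
    reduction grows as sqrt(x), where x is the dummy fraction."""
    x = np.asarray(dummy_fraction, dtype=float)
    return (1.0 - x) * np.sqrt(x)

grid = np.linspace(0.01, 0.99, 9801)
best = grid[np.argmax(averaging_term(grid))]
print(best, averaging_term(best))
for x in (1 / 6, 1 / 3, 1 / 2):
    print(x, averaging_term(x))
```

Under this assumed form the peak lies at a dummy fraction of one third, and the values at fractions of 1/6 and 1/2 fall within roughly 12% of the peak, which matches the figures quoted in the text.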
C. Accuracy of Resistor Ratio
How accurate must the ratio of the lateral and load resistors be to deliver near-optimum benefits? Consider an example with a nominal ratio of 0.1, which gives close to the maximum ECF. The ratio is chosen a little larger than at the maximum point to prevent a steep rolloff in signal gain. If the ratio falls by 50% to 0.05, the ECF increases from 3.3 to 3.6 but the signal gain drops from 0.93 to 0.85. If the ratio rises by 50% to 0.15, the ECF falls to 3.1 but the signal gain rises to 0.96. We may conclude that a threefold change in the ratio (from 0.05 to 0.15) causes only about a 15% change in ECF and gain-bandwidth product. Thus, a network with only roughly the right resistor ratio should yield close to optimum averaging.
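This sensitivity can be explored with a few lines of code. The sketch below assumes white noise, a geometric impulse response for an infinite first-order ladder, and a rectangular signal window 17 preamps wide. The window width is an assumption chosen because it reproduces the quoted figures; the function and parameter names are illustrative.

```python
import numpy as np

def gain_and_ecf(ratio, window=17, half_width=400):
    """Signal gain and white-noise ECF versus the lateral/load resistor ratio.

    Assumptions: infinite first-order ladder, rectangular signal window of
    odd width `window`, uncorrelated offsets of equal strength at every node.
    """
    a = 1.0 / ratio                               # load / lateral resistance
    b = ((1.0 + 2.0 * a) - np.sqrt(1.0 + 4.0 * a)) / (2.0 * a)
    k = np.arange(-half_width, half_width + 1)
    h = (1.0 - b) / (1.0 + b) * b ** np.abs(k)    # impulse response
    gain = h[np.abs(k) <= window // 2].sum()      # signal kept by the filter
    noise = np.sqrt(np.sum(h ** 2))               # rms gain for white noise
    return gain, gain / noise

for ratio in (0.05, 0.10, 0.15):
    gain, ecf = gain_and_ecf(ratio)
    print(f"ratio={ratio:.2f}  gain={gain:.2f}  ECF={ecf:.1f}")
```

With these assumptions the sketch gives a signal gain near 0.93 and an ECF near 3.3 at a ratio of 0.1, and it moves toward 0.85 and 3.6 at 0.05 and toward 0.96 and 3.1 at 0.15. A different window width shifts the numbers but not the conclusion that the optimum is broad.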
D. Fundamental Limitation
We will now show that the boundary condition imposes a fundamental limitation to offset averaging. To see this, substitute the peak magnitude of , into (21) and set to the highest possible value, 1. Then after averaging, the flash ADC can at best resolve the following number of bits:
With no averaging, in (21) can be set to , and using the same comparators the resolution must fall to (26) Equations (25) and (26) show that averaging changes the relation between the total gate area of the input FETs and the resolution from the 3rd to the 2nd power of . This gate area determines the ADC input capacitance . Using some typical numbers with mV·µm [13] , V and of 1000 µm² , while limiting the input capacitance to a few pF, we see that bits with averaging, whereas bits without averaging. That is, at the same supply voltage and in the same technology, averaging improves resolution by 2.1 bits. To obtain the same resolution without the benefit of averaging, the RMS offsets in the comparators would have to be lowered by , which requires preamps with larger gate area, which in turn raises the ADC input capacitance proportionally. This clearly illustrates the benefits of averaging, namely smaller active area, lower power consumption, and lower input capacitance at a given resolution.
In summary, we have shown that averaging worsens nonlinearity at the edges of a finite array of preamps spanning only the ADC's input full scale. This undesirable effect may be suppressed by extending the averaging network at both ends, and driving the extra nodes in the network with dummy preamps which compare the analog input with thresholds beyond the full scale. However, adding more dummies within a fixed supply voltage shrinks the voltage left to accommodate the input full scale, which means that for a given resolution the LSB also shrinks. This implies the existence of an optimum. We have found that at the mathematical optimum, the dummies consume one third of the voltage span of all preamps. The optimum is rather broad, so fewer dummies will usually suffice.
V. DESIGN GUIDE
The following is a systematic procedure to design a practical CMOS flash ADC with the optimal offset averaging network.
Step 1) (Choose -): Design a unit-size differential amplifier to meet specifications on BW and dc gain. Implement with polysilicon or other resistive material, or even with a MOSFET in triode region [6] or diode connected [7] ; need not be very linear. Choose the tail current MOSFET to be similar in size to the input pair, and bias it at comparable . Use the largest input full-scale voltage that the supply voltage and the circuit allow, and adjust so that the transition region of the differential amplifier's -curve is at least as large as , where lies in the range -. If this is too large, the differential amplifiers far away from the zero crossing may have too much output voltage swing for a given dc gain. If the output voltage clips, becomes zero, disturbing offset averaging.
Step 2) (Create -): Now construct an array of these differential amplifiers by distributing their thresholds uniformly across the entire . In this array, preamps will span the input full-scale, , and the rest will act as dummies.
Step 3) (Create ): Add lateral averaging resistors between preamp outputs, and terminate the -network either with the resistance given by (15) , or in a differential circuit by cross-connection of the endpoints. Simulate and plot zero crossings across the array. Sweep until the INL at the two ends of FS is good enough. The plots in Fig. 5 provide a good starting point for selecting .
Step 4) (Tradeoff between performance and INL): Compare the simulated BW and dc gain of the preamplifiers located at the extremes of the array with those of a single isolated amplifier (no lateral resistor ). If averaging worsens either parameter by more than 10%, increase .
Step 5) (Scaling): Scale the entire array to trade off the total input capacitance (i.e., power dissipation) against INL. Verify with Monte Carlo simulation the improvement in INL due to averaging, which should be close to the ECF in Fig. 5 . This design procedure preserves the dc and dynamic performance of individual preamps while lowering INL to the greatest extent possible.
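As a starting point for the sweep in Step 3, it can help to estimate how far the impulse response of the resistor network reaches for a candidate resistor ratio. The sketch below assumes an idealized, infinitely long ladder with a lateral resistor between adjacent nodes and an identical load resistor at every node; `lateral_over_load` and the 5% settling criterion are illustrative choices, not values from this paper. The decay factor follows from Kirchhoff's current law at an undriven node.

```python
import math


def ladder_decay_factor(lateral_over_load):
    """Per-node decay factor b of an infinite resistor ladder's impulse response.

    At any undriven node, Kirchhoff's current law gives
        V[n-1] - 2*V[n] + V[n+1] = a * V[n],   a = R_lateral / R_load,
    whose decaying solution is V[n] ~ b**|n| with b + 1/b = 2 + a.
    """
    a = lateral_over_load
    return 1.0 + a / 2.0 - math.sqrt(a + a * a / 4.0)


def nodes_to_fraction(b, fraction=0.05):
    """Number of nodes away from the stimulus at which the response has
    decayed to the given fraction of its peak (one-sided reach)."""
    return math.ceil(math.log(fraction) / math.log(b))


for a in (0.05, 0.10, 0.15):
    b = ladder_decay_factor(a)
    print(f"ratio {a:.2f}: decay per node {b:.3f}, "
          f"reach to 5% = {nodes_to_fraction(b)} nodes")
```

A smaller lateral-to-load ratio gives a decay factor closer to one and hence a wider impulse response, which needs more dummies at each end; the simulated sweep in Step 3 remains the authority for the final choice.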
VI. DISCUSSION
A. Extreme Case of
When is replaced by a current source [2] , the network becomes a chain of equal resistors in series, which must be terminated by the cross connection shown in Fig. 6(b) to preserve uniform IR across the preamp array. The resulting IR is linear as shown in Fig. 7(a) and spans the entire averaging network of width , i.e., . From (22), the number of dummies and the signal window must satisfy for optimum averaging. The arrangement suffers from three drawbacks. First, there is less freedom in implementation. To avoid large INL at the edges, the number of dummies now dictates the transition region of the differential pair comparators . There is also a pernicious side effect: at infinite , the spatial filter is (spatially) unstable [17] and the output voltages may easily clip in practice. By contrast, following the design procedure in Section V decouples offset averaging from circuit design.
Second, since the IR extends over the entire network, random offsets everywhere including tail current mismatch in the network now exercise an influence over the critical ZX. In the case of the optimal averaging network constructed according to this design guide, only offsets inside the compact IR window matter.
Third, signal currents lost to nodes outside the signal window centered at the critical differential pair are now no longer available to charge and discharge capacitance at the critical zero crossing node. The fraction of current lost is the same as the fractional area under the IR not covered by the signal window. As we will now show, this lowers bandwidth.
For a triangular IR of width and a rectangular signal window of width , it can be shown that the dc voltage gain is lower by a factor of compared to stimulating the network at a single node (i.e., ). To maintain the same voltage gain, must be scaled up by the inverse of this factor. As the IR is wider than the span of the preamp's transition region, the total node capacitance that the stimuli must drive is also larger by the factor . The time constant of a distributed averaging network is half the product of total and ; the result is (27), where the time constant applies to the averaging network, and to the case with no averaging. This is exactly the ratio of the total area under the triangular IR to the portion covered by the rectangular signal window.
In the optimal averaging network when . This means that when averaging with infinite , the time constant is 80% larger than when there is no averaging, or equivalently the preamp bandwidth is about 44% lower (a factor of 1/1.8).
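The area-ratio interpretation of the time constant can be checked numerically. The sketch below assumes a unit-height triangular IR and a centered rectangular window whose width is one third of the full IR width; that one-third figure is our inference from the one-third optimum stated in Section IV and from the quoted 80% increase, since the symbols in the relevant expressions are not legible here. The function name is illustrative.

```python
from fractions import Fraction


def time_constant_ratio(window_over_ir):
    """Area under a triangular IR divided by the part of that area covered
    by a centered rectangular window.

    window_over_ir is r = (window width) / (full IR base width), 0 < r <= 1.
    The IR base is normalized to 1 and its peak height to 1.
    """
    r = Fraction(window_over_ir)
    total_area = Fraction(1, 2)      # triangle with base 1 and height 1
    covered_area = r - r * r / 2     # integral of the triangle over the window
    return total_area / covered_area


ratio = time_constant_ratio(Fraction(1, 3))
bandwidth_loss = 1 - 1 / ratio

print(f"time constant ratio = {ratio} = {float(ratio):.2f}")
print(f"bandwidth loss      = {float(bandwidth_loss):.1%}")
```

With a window one third as wide as the IR the ratio is exactly 9/5, which reproduces the 80% increase in time constant; the corresponding loss in bandwidth is 4/9, or about 44%.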
B. Averaging versus Interpolation
Averaging an array of preamplifiers is intimately related to interpolating the outputs of a subarray of those preamplifiers with resistors. To understand this, consider a flash A/D converter consisting of a preamplifier array and an averaging network. Measure a distance equal to half of the averaging window from one end of the full scale into the array. Merge all the ZX generators that lie within its span into a single ZX generator of the same aggregate size, as shown in Fig. 8 . The transition region of this ZX generator spans some portion of the averaging network; attach the preamplifier output to the center node of that portion. Next, step along the array by half of the width of the averaging window, attach the output of another aggregate ZX generator, and repeat this procedure until the entire array is traversed. The resistor network will now reconstruct, at the taps that lie between the ZX generators, all the ZX signals of the original flash ADC. In this way averaging turns into interpolation. The total FET size in the preamps remains unchanged. As the shape factor is 1 for lumped ZX generators, INL is not adversely affected.
The correspondence between averaging and interpolation illustrates a basic principle: that what determines the RMS offset is not the number of ZX generators, rather it is the aggregate gate area of the comparator FETs in the entire array. For a given total gate area, whether many comparators comprising small, inaccurate FETs are distributed at every node of an averaging network, or a few comparators comprising large, accurate FETs are uniformly distributed along an interpolating network, the final spread in offsets is almost the same. This is also confirmed by measurements on actual A/D converters, as we show next.
CMOS flash ADCs using interpolation (and folding) had not advanced beyond a resolution of 8 b when offset averaging was first applied to a 10 b folding ADC [2] . Although the designers of that circuit expected an ECF of , with 4.5 pF input capacitance and 2 Vpp input full scale they actually obtained an effective number of bits (ENOB) of 8.7 b. Shortly before this work, an 8 b folding ADC with interpolation but no averaging was realized in the same technology (0.5-µm CMOS) [11] , and with 2 pF input capacitance and 1.6 Vpp full scale it yielded 7.5 b ENOB. After normalizing for the differences in input capacitance and full scale between the two, this comparison shows that offset averaging in [2] gives roughly the same ENOB as interpolation in [11] . This shows that the two are, in effect, variants of one ADC.
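One way to carry out the normalization mentioned above is sketched below. It is our own reading, not a procedure stated in the text: it assumes that ENOB is limited by random offset, that RMS offset scales inversely with the square root of the input capacitance (as a proxy for gate area), and that the LSB scales with the full-scale voltage. The function name is hypothetical.

```python
import math


def normalized_enob(enob, c_in_pf, full_scale_v, c_ref_pf, fs_ref_v):
    """Rescale an offset-limited ENOB to a reference input capacitance
    and full scale.

    Assumptions (ours): RMS offset ~ 1/sqrt(C_in), LSB ~ full scale, so the
    number of resolvable levels scales as full_scale * sqrt(C_in).
    """
    level_gain = math.sqrt(c_ref_pf / c_in_pf) * (fs_ref_v / full_scale_v)
    return enob + math.log2(level_gain)


# Interpolating ADC [11]: 7.5 b at 2 pF and 1.6 Vpp, rescaled to the
# 4.5 pF and 2 Vpp of the averaging ADC [2], which measured 8.7 b.
interp_at_ref = normalized_enob(7.5, 2.0, 1.6, 4.5, 2.0)
averaging_enob = 8.7

print(f"interpolating ADC normalized to 4.5 pF, 2 Vpp: {interp_at_ref:.2f} b")
print(f"averaging ADC as measured                    : {averaging_enob:.2f} b")
```

Under these assumptions the normalized interpolating converter lands within a few tenths of a bit of the averaging converter, consistent with the statement that the two give roughly the same ENOB.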
However, there is one difference and this is in their respective bandwidths. In averaging, an active element drives each node of the resistor string, so to the first order its bandwidth is higher than in an interpolating resistor ladder whose intermediate nodes are passively loaded by capacitance only. On the other hand, the interpolation nodes require fewer wiring connections, which lessens stray capacitance. This suggests that averaging combined with a moderate interpolation of or even will likely yield the highest bandwidth of all. Another difference is that interpolation requires less or possibly even no overrange. This is an advantage when a large input full scale is being used to overcome noise.
With optimal averaging, none of the active signal current generated in the critical differential pair is lost through to inactive nodes; all of it flows into the attached to that differential pair. Fig. 8 shows signal current flowing away from the zero-crossing node and current flowing into that node from the adjacent preamps. The two currents are equal because the IR responsible for current spreading exactly covers all neighbor differential pairs operating in their transition region. Therefore, to the signal stimuli it appears that no current is lost into the lateral connections. This current-conserving property of optimal averaging also lends itself to other uses. For example, in [7] , optimal averaging is applied to an array of regenerative latches to average out their static and dynamic offsets.
C. Cascaded Averaging
Can averaging help if it is applied repeatedly in a cascade of preamp arrays? The answer is: sometimes, and then only mildly. Whereas the offsets in individual FETs are uncorrelated, the averaged output consists of the weighted sum of all the offsets that lie within the span of the IR. Thus, the averaged offset changes only a little from node to node, which means that DNL is strongly suppressed but the averaged offsets are also highly correlated. In the spatial frequency domain, this means that averaging transforms the white noise-like spectrum of originally uncorrelated offsets into a colored noise with a strong peak at low frequencies. If the colored noise and the signal are once again averaged, both will lie in the low-frequency passband of the second averaging network, which will therefore yield little or no additional benefit in improving SNR. Offsets in the second array, when referred to the ADC input, are suppressed by the gain of the first array, and therefore are not very important.
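The diminishing return from a second averaging stage can be illustrated with a simple filter calculation. The sketch below is a deliberately simplified model: it replaces the actual impulse response with a 9-tap rectangular (box) kernel, an arbitrary choice made only for illustration, and treats the first-stage offsets as unit-variance white noise. For white noise, the RMS at the output of a linear filter equals the square root of the sum of the squared taps.

```python
import math


def box(n_taps):
    """Rectangular averaging kernel with unit dc gain."""
    return [1.0 / n_taps] * n_taps


def convolve(a, b):
    """Full discrete convolution of two tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def rms_gain(h):
    """RMS of the filter output for unit-variance white noise at the input."""
    return math.sqrt(sum(x * x for x in h))


def lag1_correlation(h):
    """Correlation between adjacent output samples for white-noise input."""
    return sum(h[k] * h[k + 1] for k in range(len(h) - 1)) / sum(x * x for x in h)


first = box(9)
cascade = convolve(first, first)      # the same offsets averaged a second time

rms_first = rms_gain(first)
rms_cascade = rms_gain(cascade)
extra_benefit = rms_first / rms_cascade
neighbor_corr = lag1_correlation(first)

print(f"RMS after first averaging  : {rms_first:.3f}")
print(f"RMS after second averaging : {rms_cascade:.3f}")
print(f"extra benefit of 2nd stage : {extra_benefit:.2f}x")
print(f"adjacent-node correlation  : {neighbor_corr:.3f}")
```

In this toy model the first stage lowers the RMS offset threefold, while averaging the already-correlated result again improves it by only about 1.2 times, which matches the qualitative argument in the text.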
We can conclude that in a cascade of preamplifiers, averaging is only useful in the first array. There is also the practical matter that only the preamps in the first stage allow for a large transition region across which averaging can take place. Referred to the ADC input, the transition region of preamplifiers in later stages shrinks by the preceding gain, to the point that it can no longer conveniently span of the full scale, as is required for optimal averaging. However, if the second stage preamplifiers can be scaled down even by a little bit, averaging in a serial cascade will give some benefit of lower capacitive loading on the first stage.
D. Spatial Frequency Upconversion
If, for some reason, offsets are correlated across the input preamp array, INL errors will accumulate in one direction and produce a low-frequency peak in the spatial spectrum of the "noise". This correlation can arise from parasitic coupling between adjacent cells, nonuniform layout, and cell asymmetry. When most of the correlated noise spectrum falls in the passband of the averaging network viewed as a filter, there is no longer any benefit. This problem may be overcome if by some means the noise peak is translated to high spatial frequencies, while the signal spectrum is left unchanged. Flipping the polarity of both the input and output of the neighboring differential preamps across the array accomplishes just this: it scrambles the signs of the gradient of offsets, shifting the associated noise spectrum to high frequencies (Fig. 9) . However, the signal polarity at the output remains the same. Now an averaging network can deliver a benefit.
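The effect of polarity flipping on a gradient of offsets can be illustrated with a toy calculation. The sketch below assumes a purely linear offset gradient across 63 preamps, models the flipping as multiplication of the offset at node n by (-1)^n, and uses a 9-tap rectangular averaging kernel in place of the actual network response; the gradient value and kernel length are arbitrary illustrative choices.

```python
def moving_average(x, n_taps):
    """Box-filter x, keeping only the nodes where the window fits entirely."""
    half = n_taps // 2
    return [sum(x[i - half:i + half + 1]) / n_taps
            for i in range(half, len(x) - half)]


def rms(x):
    """Root-mean-square value of a sequence."""
    return (sum(v * v for v in x) / len(x)) ** 0.5


n_preamps = 63
gradient_mv = 0.2                      # hypothetical offset gradient per node
center = (n_preamps - 1) / 2

# Correlated offsets: a linear gradient, i.e. a low spatial frequency error.
offsets = [gradient_mv * (n - center) for n in range(n_preamps)]

# Alternate-polarity preamps: the offset sign alternates from node to node,
# which moves the gradient to the highest spatial frequency.
flipped = [(-1) ** n * v for n, v in enumerate(offsets)]

rms_plain = rms(moving_average(offsets, 9))
rms_flipped = rms(moving_average(flipped, 9))

print(f"RMS after averaging, no flipping  : {rms_plain:.3f} mV")
print(f"RMS after averaging, with flipping: {rms_flipped:.3f} mV")
print(f"improvement                       : {rms_plain / rms_flipped:.1f}x")
```

Without flipping, the averaging filter passes the gradient untouched; with flipping, the same filter attenuates it by the number of taps in this idealized case.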
With this capability there is no longer a need to strive for precise uniformity in layout, which can now be more aggressive. Thus, the spatial mixer and averaging network filter provide a complete solution to both the first and the second class of MOSFET mismatch as described by Pelgrom et al. [12] .
VII. VALIDATING ANALYSIS
The concepts developed in this paper were validated on a 6-b, 1.3-GS/s flash A/D converter that uses averaging [7] . We examine the preamplifiers, which compare a differential input voltage across an array of differential thresholds by subtracting the output currents of two differential pairs. The size of each NMOSFET in the differential preamp is m m. Using known statistics of process spreads, we find with 100 runs of Monte Carlo simulations that the RMS offset voltage of an isolated preamp is 7.5 mV.
The transition region of each differential pair, , spans about 19 LSB thresholds, i.e., . Referring to Fig. 5(b) , averaging the output of an array of these preamps by a resistor network with the ratio should lower effective RMS offset by a factor of , at the small cost of lowering the gain-bandwidth, which is proportional to the signal gain , by no more than 10%. Simulating the preamp array with averaging on 100 Monte Carlo runs, we find that the RMS offset is lowered to 2.4 mV, that is, by a factor of . The results of this statistical simulation are very close to what we expect from analysis.
Next we compare the power savings that averaging brings to this A/D converter. Consider two 6 b preamp arrays, where one consists of 63 noninteracting preamps, and the other of 63 preamps embedded in an averaging network and extended by 18 dummy preamplifiers to cover the impulse response at the ends. The two arrays are designed for equal input-referred RMS offsets. The nMOS gate area of each averaged preamp can be scaled down by , and assuming FETs are biased at constant the power will go down in proportion. Thus, the power consumed by the averaged preamp array, including dummies, scales down by a factor relative to the power in the unaveraged array. This is a savings in power. This falls short of the savings that (25) predicts for the simple case of because the impulse response of the actual averaging network is exponential (3), not rectangular, and because the soft saturation of the differential pair defines the signal window, not an ideal piecewise linear characteristic. In fact, this ADC's ideality factor is almost the highest practically achievable.
Finally, let us turn to savings in preamplifier active area. In the actual layout of the 0.35-µm CMOS prototype [7] , one preamp cell including its supply rails consumes 24 × 48 µm² . Associated with each cell is a pair of averaging resistors which take an area of 24 × 10 µm² . This means that including the dummies, 81 preamps occupy a total of 112 752 µm² . With no averaging, the width of the NMOS FETs and their tail bias current FETs must be scaled up by ; the supply rails must also be widened by the same factor so that stray drops across the array do not induce error gradients. This means that the area of each preamplifier cell, including wiring, will grow almost by . Even though there are now only 63 preamps in the flash array instead of 81, the total area rises dramatically to 653 184 µm² . Therefore, averaging can lower the total preamp active area by . In sum, optimal averaging used in a practical 6 b A/D converter lowers the power dissipation in the preamplifiers by and their active area by almost . As CMOS technology scales down, the area and power consumed by the digital latches and encoding logic shrink quadratically, and the analog preamplifiers soon dominate. Averaging, used well, will now deliver compelling benefits.
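The area figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers given in the text; the derived per-cell growth and overall reduction are our own back-calculation and assume that the unaveraged array carries no averaging resistors. Variable names are illustrative.

```python
# Layout figures quoted in the text, all in square micrometers.
cell_area = 24 * 48            # one preamp cell including supply rails
resistor_area = 24 * 10        # pair of averaging resistors per cell
n_averaged = 81                # 63 preamps plus 18 dummies
n_unaveraged = 63              # flash array without averaging
total_unaveraged = 653_184     # quoted total area without averaging

# Total area of the averaged array, resistors included.
total_averaged = n_averaged * (cell_area + resistor_area)

# Back-calculated growth of each cell when averaging is not used
# (assumes no averaging resistors in the unaveraged array).
cell_growth = total_unaveraged / (n_unaveraged * cell_area)

# Overall reduction in preamp area obtained with averaging.
area_reduction = total_unaveraged / total_averaged

print(f"averaged array area  : {total_averaged} um^2")
print(f"implied cell growth  : {cell_growth:.1f}x")
print(f"total area reduction : {area_reduction:.2f}x")
```

The first result reproduces the quoted total for the averaged array, and the other two show what the quoted totals imply for the per-cell growth and for the overall area saving.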
VIII. CONCLUSION
Averaging resistor networks attached to the outputs of arrays of comparators, or ZX generators, are shown to act as lowpass spatial filters. These filters pass the useful zero crossing signal, but suppress broadband (in the sense of spatial frequency) noise arising from uncorrelated offsets in the comparators. As these offsets limit the resolution of a nonautozeroed flash ADC, averaging improves resolution at lower power, area, and input capacitance than is otherwise possible. In addition, DNL improves as a result of correlation among the averaged offsets.
No matter how offset averaging is implemented in a flash ADC, the boundaries of the flash array pose special problems. The array must be augmented with as many dummy preamplifiers as the total number of preamplifiers involved in averaging, which is optimally one third of the total number of cells in the comparator array. Averaging changes the relation between the total size of the comparator cells and the ADC resolution of bits from the third to the second power of . The optimum infinite averaging network is a matched filter whose impulse response is as wide as the given preamp transition region. However, for a finite averaging network, the impulse response must not span more than the total number of dummies. This compact impulse response suppresses errors arising far away from tail-current mismatch and edge effects; it also preserves the active current at the critical preamp and therefore its unity-gain bandwidth. The optimized spatial filter, combined with a spatial mixer, is a preferred alternative to high-order interpolation (say more than ) and a complete solution to transistor mismatch, whether random or correlated, in ADC design.
Though offset averaging was first used to improve the DNL of a bipolar flash ADC [1] , it delivers marked benefits in CMOS ADCs where the differential pair easily offers a transition region as wide as half of the input full scale for optimal averaging. If the ADC input capacitance is a few pF, averaging enables resolution of about 9 bits without autozeroing, which is 2 b more than if there is no averaging.
