Index Terms-Analog-to-digital converter (ADC), asymmetric digital subscriber line (ADSL), MASH, sigma-delta modulation, switched-capacitor circuits.
I. INTRODUCTION

SUPPORTED by considerable commercial success, wireline solutions for broad-band access and home networking are evolving to provide ever-increasing data rates and more functionality. An asymmetric digital subscriber line (ADSL) is an example of such applications, and extensions of it such as ADSL (with doubled number of channels) or very-high-data-rate digital subscriber line (VDSL, providing video-rate reception) are just around the corner. As this trend continues, the demand for highly linear, fast analog front-ends challenges mixed-signal designers to achieve accuracies of 12-15 b for signal bandwidths ranging from 1.1 to 12 MHz [1].
Although these specifications seem a priori better suited to Nyquist architectures, such as pipeline analog-to-digital converters (ADCs) [2], these architectures do not exhibit enough linearity for some telecom applications, especially in low-voltage implementations, unless the power consumption is significantly increased. For this reason, oversampled ADCs have gained ground in this frequency range. Specifically, sigma-delta modulators (ΣΔMs) [3], [4] exhibit high intrinsic linearity while using relatively simple analog circuitry, which makes them worth exploring for broad-band wireline and baseband radio-frequency communications [5]-[22].
Given the high signal bandwidths required in wireline communication, only ΣΔMs with a low oversampling ratio $M$ are feasible. In order to keep the resolution levels with these low values of $M$, the well-known formulas for the dynamic range DR and the effective number of bits ENOB [3]

$$\mathrm{DR} = \frac{3}{2}\,\frac{2L+1}{\pi^{2L}}\,\left(2^{B}-1\right)^{2} M^{2L+1}, \qquad \mathrm{ENOB} = \frac{\mathrm{DR(dB)}-1.76}{6.02} \tag{1}$$

dictate that either high-order loop filtering (increasing the order $L$) or multibit quantization (increasing the resolution $B$ of the quantizer), or both, must be used. However, these strategies raise issues that jeopardize the robustness of highly oversampled, low-order single-bit ΣΔMs. On the one hand, high-order loops are prone to instability, and the stabilization methods proposed have resulted in complex architectures whose DR is degraded with respect to that in (1) [3]. This degradation is more pronounced for single-bit quantizers, so that the combination of high-order loops with single-bit quantization is not a good choice for high-frequency designs [3]. On the other hand, multibit conversion entails extreme sensitivity to the nonlinearity of the digital-to-analog converter (DAC) in the feedback path and forces the use of correction/calibration techniques [23]-[25]. Unfortunately, since DACs cannot be efficiently linearized to an arbitrarily large resolution, the use of low-order multibit modulation may not be enough to obtain a given DR.
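The trade-off among order, oversampling ratio, and quantizer resolution can be explored numerically. The following is a minimal sketch that assumes the standard textbook expressions for the ideal DR and ENOB of an order-L, B-bit modulator with oversampling ratio M; the function names and the three design points are illustrative assumptions, not values taken from this work.

```python
import math

def ideal_dr_db(order, osr, bits):
    """Ideal dynamic range (dB), assuming the textbook white-noise model."""
    dr = (1.5 * (2 * order + 1) / math.pi ** (2 * order)
          * (2 ** bits - 1) ** 2 * osr ** (2 * order + 1))
    return 10.0 * math.log10(dr)

def enob(dr_db):
    """Effective number of bits from a dynamic range given in dB."""
    return (dr_db - 1.76) / 6.02

# Hypothetical design points: (order, oversampling ratio, quantizer bits)
for order, osr, bits in [(2, 16, 1), (4, 16, 1), (4, 16, 3)]:
    dr_db = ideal_dr_db(order, osr, bits)
    print(f"L={order} M={osr} B={bits}: "
          f"DR={dr_db:.1f} dB, ENOB={enob(dr_db):.1f} b")
```

At a fixed low oversampling ratio, the printout shows how raising the order or the number of quantizer bits recovers resolution, which is the argument made above. These are ideal figures; circuit nonidealities reduce them.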
A direct solution to this problem is to increase both the modulator order and the internal quantizer resolution, giving rise to moderate-order (3-5), multibit architectures. In fact, the use of multibit quantization (typically up to 4 b) in single-loop high-order ΣΔMs inherently improves their stability [3], so that these are good candidates to obtain high-resolution, high-frequency operation, provided that the nonlinearity problem is solved [8], [9], [13], [21]. With the same objective, the combination of high-order cascade (MASH) architectures [26] with multibit quantization has been proposed [18], [27]. These modulators combine the unconditional stability of cascade modulators (only second- and/or first-order stages are used) with the advantages of multibit quantization, while relaxing the requirements for the linearity of the DAC. The feasibility and efficiency of this approach, which needs no correction/calibration mechanisms, have already been proven [10]-[12], [17]-[20].
In this paper, we present the design of a ΣΔM for ADSL applications in a 2.5-V, 0.25-μm CMOS process. With this goal, a family of ΣΔM architectures capable of achieving high resolution with a low oversampling ratio is devised in Section II. Section III studies the impact of deep-submicrometer features on the architecture selection, providing optimized architecture parameters for the specifications considered. Circuit implementation and related design considerations are explained in Sections IV-VI. Finally, Section VII shows experimental results of the ΣΔM and compares its performance with state-of-the-art designs.

II. LOW-OVERSAMPLING CASCADE MODULATORS

Fig. 1 shows the generic block diagram of a family of high-order cascades. It is an $L$th-order modulator formed by a second-order stage followed by $L-2$ identical first-order stages. The values of the integrator weights are given in (2). As in all cascade ΣΔMs [26], the outputs of the stages are processed in the digital domain through simple operators and combined to cancel out the quantization noise generated in each stage but the last one. Additionally, a pseudomultibit operation [18], [27] is achieved by including multibit quantization only in the last stage, while the remaining stages are single bit. Linearized $z$-domain analysis shows that the modulator output can be expressed as in (3) [4], in which the input signal is simply delayed, the last-stage quantization error is shaped by an $L$th-order function, and a further term represents the nonlinearity error of the last-stage DAC. Note that, since the last-stage quantization error is generated in a $B$-bit quantizer, the modulator response equals that of an ideal $L$th-order $B$-bit ΣΔM, except for the factor 2. The aim of this factor is to compensate for the signal scaling required to avoid premature overloading of the modulator.

By integrating the error terms in (3) over the signal band, the in-band quantization error power is obtained [4] as given in (4), where the quantities in (5) are the total powers associated with the last-stage quantization error and the DAC nonlinearity error, respectively, with INL being the DAC integral nonlinearity relative to the input full scale. Since the factor 2 in (3) quadruples the in-band power of these errors, a 1-b systematic loss of resolution is generated. However, this loss is small when compared to other cascade ΣΔMs and, more importantly, it is constant, regardless of the number of stages. In fact, the most appealing feature of this architecture (with the set of coefficients proposed) is that it can be easily set to any order just by changing the number of identical first-order stages. As shown in Fig. 2, correct operation is maintained with a constant overloading point, regardless of the overall order.
The coefficients in (2) also have the following interesting properties.
1) The output swing required in all integrators is only the quantizer full-scale.
2) By proper sharing of the switched-capacitor (SC) input stages, they can be implemented with just two-branch integrators, which minimizes the number of unitary capacitors.
3) All first-order stages, except the last one when multibit quantization is used, contain the same coefficients, so that they can be electrically identical. This considerably simplifies the electrical and physical implementation of the modulator.
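The cancellation principle of cascade modulators can be checked with a short behavioral simulation. The sketch below is a textbook 2-1 cascade with unity integrator weights, single-bit quantizers, and an assumed interstage scaling of 1/2 that is compensated by a digital factor of 2. It does not use the coefficient set of (2), and all function names are illustrative. It verifies that, for ideal integrators, the combined output equals the delayed input plus twice the third-order-shaped last-stage error.

```python
import numpy as np

def delay(s, k=1):
    """Delay a causal sequence by k samples, padding with zeros."""
    return np.concatenate((np.zeros(k), s[:len(s) - k]))

def diff(s):
    """First difference (1 - z^-1) of a causal sequence."""
    return s - delay(s)

def cascade_2_1(x):
    """Ideal 2-1 cascade; returns (combined output, last-stage error)."""
    n = len(x)
    y1, y2, e2 = np.zeros(n), np.zeros(n), np.zeros(n)
    v1 = v2 = v3 = 0.0  # integrator states, initially at rest
    for k in range(n):
        y1[k] = 1.0 if v2 >= 0.0 else -1.0  # first-stage 1-b quantizer
        e1 = y1[k] - v2                     # first-stage quantization error
        y2[k] = 1.0 if v3 >= 0.0 else -1.0  # second-stage 1-b quantizer
        e2[k] = y2[k] - v3                  # last-stage quantization error
        # Delaying integrators; the right-hand sides use the old states.
        v1, v2 = v1 + x[k] - y1[k], v2 + v1 - 2.0 * y1[k]
        v3 = v3 + 0.5 * e1 - y2[k]          # assumed interstage scaling 1/2
    # Digital cancellation logic: z^-1 * Y1 - 2 * (1 - z^-1)^2 * Y2
    return delay(y1) - 2.0 * diff(diff(y2)), e2

n = 4096
x = 0.5 * np.sin(2 * np.pi * 5 * np.arange(n) / n)
y, e2 = cascade_2_1(x)
residual = y - delay(x, 3)                  # remove the delayed input
expected = -2.0 * diff(diff(diff(e2)))      # shaped last-stage error only
print("max deviation:", np.max(np.abs(residual - expected)))
```

The first-stage error does not appear in the residual because the analog and digital transfer functions match exactly here. With finite amplifier gain or capacitor mismatch they no longer match, and part of the first-stage error leaks to the output.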
A. Nonideal Performance
SC implementations of cascade modulators suffer from certain nonideal behaviors more than their single-loop counterparts, namely, finite (and nonlinear) amplifier dc gain and capacitor mismatch [4]. Both nonidealities modify the ideal integrator $z$-domain transfer function, thus altering the quantization error transfer function. Since this variation is not matched by corresponding changes in the cancellation logic, a mismatch appears between the analog and digital processing that precludes perfect cancellation of the low-order quantization error. To a first-order approximation, the in-band power of the error leakages is independent of $L$, because they are generated in the modulator first stage, which is the same whatever the value of $L$ [4]. This power is given in (6), whose two terms depend on the first-stage amplifier dc gain and on the capacitor ratio standard deviation, respectively. If we compare (4) and (6), it is clear that for certain parameter values these effects may dominate the in-band error power, thus imposing an upper bound on the practical values of $L$. In order to estimate this limit under realistic circuit imperfections, Fig. 3(a) shows the simulated half-scale SNDR as a function of the amplifier dc gain for . Fig. 3(b) shows the SNDR histograms obtained from Monte Carlo simulation assuming a 0.1% sigma in capacitor ratios (0.05% is currently featured by metal-insulator-metal (M-i-M) capacitors in CMOS processes). Under these conditions, mainly because of the matching sensitivity, the seventh-order architecture seems not worth implementing for . Nevertheless, the sixth-order modulator provides a 90-dB worst-case SNDR with a dc gain of 2500. Especially robust is the fifth-order cascade, which requires a dc gain of 1000 to achieve 80-dB worst-case SNDR with . It is important to remark that these gains are basically needed for the first-stage amplifiers. The dc-gain requirements for the integrators in the remaining stages of the cascade are much more relaxed.
This is also applicable to other circuit imperfections such as electronic noise, finite dynamics, nonlinearity, and mismatch. This allows us to use simpler circuit topologies and layouts for these stages, thus saving area and power consumption. Likewise, in practice, the number of bits in the last-stage quantizer cannot be arbitrarily large. As shown in Fig. 4, for a given , the evolution of the overall effective resolution with $B$ tends to saturate due to the presence of leakage. Nevertheless, depending on the signal bandwidth, the reduction in oversampling ratio that can be achieved by resorting to multibit quantization may define the border between feasible and infeasible implementations. As we will show further on, proper selection of the three main design parameters ($L$, $M$, and $B$) is the key to really efficient implementations.
III. DEEP-SUBMICROMETER DESIGN CONSIDERATIONS
Viability of cascade multibit ΣΔMs in deep-submicrometer CMOS is related to two main process features: supply voltage and capacitor performance. The supply voltage, through the selection of the reference voltages, defines the available dynamic range, but also has an impact on the selection of the amplifier topology and its capability to trade open-loop dc gain, speed, and output swing [28]. An empirical upper bound for feasible reference voltages is given by (7), which involves the saturation voltage of the amplifier output devices and the number of transistors in the output branch, the latter depending on the specific amplifier topology. If a single-stage amplifier is used, cascode devices will be required to achieve enough dc gain, which adds transistors to the output branch. This common choice is not adequate in low-voltage implementations, where an excessive number of output-branch transistors will result in an impractically small reference voltage. Among the alternatives are two-stage amplifiers [28], whose output branch can contain only two transistors while still producing a large open-loop dc gain. This allows us to increase the reference voltage up to useful levels at the price of an increased power dissipation. Apart from the amplifiers, the performance of the switches with supply voltages below 2.5 V needs careful control, especially for dynamic distortion considerations [29]. For broad-band ΣΔMs, solutions lie in clock-boosting strategies [30] or in the use of high-voltage devices available in double-oxide processes, with the subsequent increase in price, circuit complexity, and power dissipation.
The second most relevant technology feature has to do with the quality of the capacitor structures. According to the results shown in Section II, typical capacitor matching requirements range from 0.1% to 0.2% standard deviation. Low parasitics are also of extreme importance for an efficient implementation of a high-frequency modulator, and finally, we have the capacitor linearity requirements, which are less demanding provided that symmetrical fully-differential circuitry is used. Fortunately, M-i-M capacitors are now available in CMOS processes. They exhibit an excellent matching and linearity, with very small bottom parasitics.
In order to quantitatively evaluate previous assumptions, we have developed an analytical procedure to estimate the power consumption of different cascade single-bit and/or multibit Ms. In the underlying expressions, detailed in the Appendix, both architecture and technological features are contemplated, together with simplifying assumptions inspired in practical design solutions. The aim here is not only to draw conclusions about architectural choices, but also to track their evolution under technology changes. To this end, the following figure-of-merit (FOM) has been used [31] :
where DOR stands for the digital output rate, i.e., the Nyquist rate.
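As an illustration of how such a figure of merit is evaluated, the sketch below assumes the widely used energy-per-conversion form, power divided by $2^{\mathrm{ENOB}} \cdot \mathrm{DOR}$, scaled to pJ per conversion, for which smaller is better. The exact expression used in this work is not reproduced here, so both the formula and the function name are assumptions. The inputs are the 13 b, 4.4 MS/s, and power figures quoted elsewhere in this paper.

```python
def fom_pj_per_conversion(power_w, enob_bits, dor_sps):
    """Assumed FOM: power / (2**ENOB * DOR), expressed in pJ per conversion."""
    return power_w / (2 ** enob_bits * dor_sps) * 1e12

# Modulator alone (55 mW) and modulator plus auxiliary blocks (65.8 mW),
# both at 13 b and a 4.4-MS/s digital output rate.
for label, power in [("modulator", 55e-3), ("total", 65.8e-3)]:
    fom = fom_pj_per_conversion(power, 13, 4.4e6)
    print(f"{label}: {fom:.2f} pJ/conversion")
```

With this definition, one extra effective bit at constant power and output rate halves the FOM, which is why resolution weighs as heavily as power in the comparison.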
In a first comparison step, the triads ($L$, $M$, $B$) describing specific cascades have been evaluated along the curve in the resolution-speed plane shown in Fig. 5 (dashed line). Although this particular resolution-speed relationship is arbitrary, it fits the usual requirements for wireline telecom ADCs: integrated services digital network (ISDN), ADSL, VDSL, etc., which have been placed in the figure for illustration. For each section of the resolution-speed curve, the architecture with the minimum FOM has been noted. Observe that, as the output rate increases, the oversampling ratio decreases and, simultaneously, the number of bits in the multibit quantizer increases to compensate for the oversampling reduction. Note that the 4.4-MS/s DOR employed in ADSL falls into the region led by the architecture ($L$, $M$, $B$) = (4, 16, 3), i.e., a fourth-order 2-1-1 cascade with 3-b quantization in the last stage and an oversampling ratio of 16, which will be our choice.
In a second step, we take advantage of the fact that some technology features enter the above formulation to predict how the performance of the cascade ΣΔMs will evolve under technology changes. Fig. 6 shows the estimated evolution of the FOM of three cascade topologies, namely , , and , aimed at obtaining 14 b at 4.4 MS/s. These are typical specifications for ADSL modems. Two facts are noticeable.
• Despite the reduction of the supply voltage, overall, the power dissipation does not decrease below 0.18 μm. This is basically due to the reduction in supply voltages, which imposes a reduction in the reference voltage and, hence, a compensating increase in the sampling capacitors. Since the incomplete settling error power must also be kept constant, this mechanism leads to an increased current absorption, which makes the overall power consumption increase below 0.18 μm. The location of the inflection point depends on the converter specifications. For instance, if for the same speed, the resolution is to be increased, the inflection point moves to the right in

into binary by a read-only memory (ROM) that generates the corresponding bitstreams.
The modulator operation is controlled by two nonoverlapped clock phases. The integrator input signals are sampled during phase . During phase , the algebraic operations are performed and results are accumulated in the feedback capacitors. In order to attenuate the signal-dependent clock-feedthrough, delayed versions of the two phases ( and ) are also provided. This delay is incorporated only to the falling edges of the signals (switches turn off), while the rising edges are synchronized in order to increase the effective time-slot for the modulator operations [15] . The comparators and the last-stage ADC are activated at the end of -using as strobe-to avoid any possible interference due to the transient response of the integrators at the beginning of sampling.
V. SPECIFICATIONS FOR THE BUILDING BLOCKS
The converter specifications have been mapped onto basic building block requirements by following an optimization process supported by behavioral simulations [4]. Table I summarizes the modulator sizing achieving 13 b at 4.4 MS/s. Five groups of specifications are enclosed: modulator, front-end integrator, amplifier, comparator, and A/D/A converter. In this procedure, the worst-case performance has been evaluated in the presence of variations in the process (for instance, changes in the capacitor absolute value), temperature, and supply. Table II shows a summary of the most significant contributions to the in-band error power. The main considerations made for this sizing are described next.

TABLE I. MODULATOR SIZING

TABLE II. MAIN IN-BAND ERROR CONTRIBUTIONS
The first step of the modulator sizing is the selection of . In this selection, both the overloading characteristics of the modulator and the nature of the signal being converted must be considered. In our case, the overloading point is nearly 5 dB (see Fig. 2 ), while the largest input is the 15-dB discrete multitone (DMT) signal shown in Fig. 8(a) . Note that, although its power is not too high, large peaks appear from time to time, thus yielding the high crest factor [32] peculiar to DMT signals (5.4 in our case). Fortunately, the duration of these peaks is short enough not to overload the modulator. In order to illustrate this, Fig. 8(b) shows behavioral simulation results of the modulator SNDR for such an input signal as a function of the reference voltage. In spite of the presence of a signal peak of approximately 1 V, the modulator SNDR is correct up to V (note that this would never be the case for a 1-V amplitude input sinewave, since it would be inside the modulator overloaded region with 1.3-V reference). In order to provide a safety margin, V was taken. Returning to (7), this reference voltage gives us a margin of 500 mV per output transistor in a two-stage fully differential amplifier supplied with 2.5 V. As shown in Fig. 7 , is implemented using differential references, so that . In Table II , the in-band error power of quantization error has been split up in its four contributions associated to: the ideal quantization error [first term in (4)], finite dc gain [first term in (6)], capacitor mismatch [second term in (6)], and last-stage DAC nonlinearity [second term in (4)]. Note that the quantization error leakage will be dominated by capacitor mismatch. Although M-i-M capacitors exhibit good matchingfor 1-pF caps-the use of small unitary capacitors (0.66 pF) for dynamic considerations increases the sensitivity of the cascade, so that we have assumed twice that value for . 
The contribution of the 3-b DAC nonlinearity is 6 dB below the ideal quantization noise for INL FS, which is easily achievable without calibration. The noise leakage due to the amplifier dc gain is almost negligible for . However, as we explain further on, this value will not be further relaxed in order to avoid excessive distortion due to dc-gain nonlinearity.
Following the discussion in Section III and in the Appendix, a small sampling capacitor pF is used in order to reduce the capacitive load of the integrators and, hence, their power dissipation. As a result, white circuit noise becomes the dominant error source. A more exact expression (than the one used in the Appendix) for its in-band error power is given in (9) [3], in terms of the amplifier input-referred white noise and the effective amplifier gain-bandwidth product (in Hz). During integration, the latter can be approximated as in (10), which combines the amplifier gain-bandwidth product GB (in Hz), the switch on-resistance, and the pole associated with the time constant of the SC branch during integration. The first contribution in (9) yields a worst-case value of 86.0 dB (for a maximum temperature of 110 °C and 20% tolerance in the capacitor value). On the other hand, for GB and the switch on-resistance fixed according to settling considerations to 265 MHz and 150 Ω, respectively, the effective GB will be 250 MHz. An equivalent thermal noise at the amplifier input of 6 nV/√Hz is therefore enough to obtain a noise contribution similar to that of the noise (87.5 dB). Besides, the worst-case amplifier white noise contribution corresponds to the largest effective GB, which varies along the process corners. Assuming that it can be as large as twice its nominal value (i.e., 500 MHz), this worst-case contribution yields 84.5 dB.
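The order of magnitude of the sampled thermal noise can be estimated with a short calculation. This is a sketch under stated assumptions rather than the expression in (9): it takes the in-band kT/C noise power as $4kT/(M C_S)$ (a prefactor of 4 is assumed for a fully differential two-branch SC integrator), a nominal sampling capacitor equal to the 0.66-pF unitary capacitor, and expresses the result in dB relative to 1 V². The function and parameter names are illustrative.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def ktc_inband_db(c_sample, osr, temp_celsius, prefactor=4.0):
    """In-band kT/C noise power in dB re 1 V^2 (assumed prefactor of 4)."""
    temp_kelvin = temp_celsius + 273.15
    power = prefactor * K_BOLTZMANN * temp_kelvin / (osr * c_sample)
    return 10.0 * math.log10(power)

# Worst case: maximum temperature and capacitor 20% below nominal.
worst = ktc_inband_db(c_sample=0.8 * 0.66e-12, osr=16, temp_celsius=110.0)
nominal = ktc_inband_db(c_sample=0.66e-12, osr=16, temp_celsius=27.0)
print(f"worst case: {worst:.1f} dB, nominal: {nominal:.1f} dB")
```

Under these assumptions the worst-case estimate is about -86 dB, which can be compared with the worst-case value quoted above for the first contribution in (9). Doubling the sampling capacitor would lower the estimate by 3 dB at the cost of a larger integrator load.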
The limited amplifier GB introduces basically a gain error in the integrator transfer function. This error is especially important in the integrators of the first stage of the cascade, because the quantization error of this stage will leak to the modulator output. For the architecture considered operating with , the amplifier must fulfill GB to avoid degradation of the modulator performance due to incomplete settling, being the sampling frequency.
If the finite on-resistance of the switch is also considered, the effective amplifier response is slowed down, as stated in (10). This effect is illustrated in Fig. 9(a), which shows behavioral simulation results for the in-band error power as a function of the normalized amplifier GB, for different values of the switch on-resistance. The corresponding values of the normalized pole are also depicted. Note that, as the pole decreases, the amplifier GB must be increased in order to compensate for the slowdown. A switch resistance of 150 Ω is fixed for this design. On the one hand, as we show further on, this resistance can be obtained using standard CMOS transmission gates, without clock boosting. On the other hand, the amplifier GB must be increased just to GB in order to maintain the modulator performance. Assuming that approximately 85% of the clock period is left for the integrator operation (after ensuring nonoverlapping and delay in the clock-phase signals), the required GB is approximately 265 MHz.

The required amplifier slew rate (SR) is established by guaranteeing that the slew-rate-limited evolution at the beginning of integration and sampling [33] is fast enough for the subsequent linear dynamics to settle to the desired accuracy. For this modulator, a normalized SR SR is sufficient to ensure correct performance. However, since the operation of the front-end integrator is partially SR limited, the dynamics will also be partially nonlinear and appreciable harmonic distortion may arise. This effect is illustrated in Fig. 9(b), where behavioral simulation results are shown for the modulator in-band error power versus the normalized amplifier SR, for different amplitudes of a sinewave input. Note that, for the correct conversion of an input sinewave of maximum amplitude (0.85 V), the normalized SR must be increased up to 6.5. Assuming that 85% of the clock period is left for the integrator operation, the required SR is approximately 800 V/μs.
Thanks to oversampling, some specifications in Table I referring to the front-end integrator can be relaxed for the rest of the integrators. Specifically, the value of the sampling capacitor in those integrators can be progressively scaled down, since their contributions to the overall noise are attenuated in the signal band. Nevertheless, matching considerations and reliability preclude using very small capacitors. In this design, the scaling of the nominal value (0.66 pF) is limited to 32%, which means that 0.45-pF unitary capacitors are used in the rest of the integrators. In contrast, the input-referred white noise of the amplifiers at the modulator back-end can be considerably increased without jeopardizing performance.
A more aggressive reduction can be applied to the other circuit requirements. For instance, the amplifier dc gain of the third and fourth integrators can be reduced to 600, because the in-band powers of the respective quantization error leakages are proportional to and , and the effect of their nonlinearity is negligible when compared to that in the front-end integrator. Moreover, the SR can be relaxed to 350 V/μs, as their settling behaviors are not so important. Table III summarizes the specifications for the four integrators in the cascade after scaling.
VI. DESIGN OF THE BUILDING BLOCKS
A. Amplifiers
The triple tradeoff among dc gain, dynamics, and output swing, always present in an amplifier [28], becomes tighter in a low-voltage implementation. We have already shown that the selection of the reference voltage and the topology of the front-end amplifier are interrelated in deep-submicron cascade ΣΔMs, the reason being that a large enough reference voltage requires two-stage amplifiers in order to achieve the dc gain and dynamic requirements. Fortunately, this is not the case for the amplifiers at the modulator back-end, whose dc gain can be largely relaxed, so that a single-stage amplifier may be enough. Therefore, in order to avoid over-sizing and optimize the power consumption, two different amplifiers have been designed: a high-dc-gain, high-speed amplifier for the first stage (OPA), and a modest-dc-gain, high-speed amplifier for the third and fourth integrators (OPB).
OPA is implemented using a two-stage two-path compensated architecture, shown in Fig. 10(a). It uses a telescopic first stage and both Miller and Ahuja compensation [34] through capacitors and , respectively. The common-mode feedback nets (CMFB) employed in the first and second stages are dynamic, because they have no static consumption and help to circumvent voltage range problems. A p-type input scheme has been preferred, the main reason being the possibility of cancelling the body effect in the pMOS devices, which is one of the mechanisms for substrate noise coupling [35]. Another reason for this choice is that, in the target technology, the flicker (1/f) noise of nMOS devices is considerably larger than that of pMOS ones. Although 1/f noise usually plays a secondary role in telecom converters, since it normally does not alias and the low-frequency region of the spectrum is commonly out of the signal band, the noise power spectral density (PSD) of very small devices can be huge [36] and sometimes poorly modeled, thus deserving special attention in deep-submicrometer implementations. This trend precludes using minimal-length transistors, even more noticeably than if only matching considerations are taken into account. In our case, the devices contributing most to the amplifier noise are , , , and . In order to make the noise contribution negligible, the length of those devices was increased up to 0.5 μm for the pMOS and 2 μm for the nMOS. In the worst case, the in-band error power due to the noise of the front-end amplifier is -103.6 dB, low enough not to degrade the performance.
OPB is implemented using a folded-cascode architecture, shown in Fig. 10(b), which is enough to meet the moderate dc gain requirement with reduced power dissipation. An SC CMFB is also employed. Table IV shows the features of OPA and OPB obtained by electrical simulation after full sizing. The results summarized correspond to the worst-case value of each parameter in a corner analysis considering fast and slow device models, 5% variation in the 2.5-V supply, and temperatures in the range (40 °C, 110 °C).
The amplifier nonlinear features (mainly nonlinear dc gain and dynamics) deserve special attention in a low-voltage implementation. When the amplifier output voltage swings, the drain-to-source voltage of the output transistors changes, and so does the output impedance. This effect, illustrated in Fig. 11 for OPA, translates into a dependence of the open-loop dc gain on the output voltage, so that the dc gain reaches its maximum at the central point and decreases as the output approaches the rails. Such a nonlinearity is traditionally modeled by a second-order polynomial dependence of the gain on the output voltage [4] , but this is only valid for small voltage excursions around the central point. On the contrary, in a 2.5-V implementation, it is expected that small-gain regions of the dc curve (shadowed areas in Fig. 11 ) are often visited during normal operation of the modulator. In order to accurately account for this nonlinearity in behavioral simulations, we have resorted to a table look-up procedure from amplifier dc curves obtained by electrical simulation. A similar approach has been employed for validating the actual transient response of the front-end integrator. This step is aimed at avoiding inaccuracies of the single-pole SR limited behavioral model employed [33] when applied to the two-stage amplifier with nonconstant SR in the first integrator.
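A table look-up model of this kind can be sketched in a few lines. The example below interpolates a hypothetical dc-gain curve (the table values are invented for illustration and are not the simulated OPA data) and uses it in the usual finite-gain model of a delaying SC integrator, with the gain evaluated at the previous output as an explicit approximation. All names and numbers are assumptions.

```python
import numpy as np

# Hypothetical dc-gain table: |differential output| (V) -> open-loop dc gain.
VOUT_TABLE = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
GAIN_TABLE = np.array([3000.0, 2800.0, 2000.0, 800.0, 100.0])

def dc_gain(v_out):
    """Look up the dc gain by linear interpolation (clamped at table ends)."""
    return float(np.interp(abs(v_out), VOUT_TABLE, GAIN_TABLE))

def integrate(v_prev, v_in, weight):
    """One step of a delaying SC integrator with output-dependent dc gain."""
    a = dc_gain(v_prev)  # gain evaluated at the previous output (assumption)
    return ((1.0 + 1.0 / a) * v_prev + weight * v_in) / (1.0 + (1.0 + weight) / a)

ideal = 1.0 + 0.5 * 0.5              # ideal integrator: v_prev + weight * v_in
actual = integrate(1.0, 0.5, 0.5)    # same step with the tabulated gain
print(f"ideal {ideal:.6f} V, with finite gain {actual:.6f} V")
```

Because the gain drops as the output approaches the rails, the integrator error grows with signal swing, which is the mechanism that a second-order polynomial model only captures near the central point.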
B. Switches
The design of the CMOS switches has been tackled with two main considerations in mind. First, the nonzero on-resistance heavily affects the integrator dynamics, slowing down its transient response. Second, the switch on-resistance can be highly dependent on voltage in low-voltage implementations. The sampling process with such a nonlinear resistance causes dynamic distortion [29] at the ΣΔM front-end, which becomes more evident as the signal frequency increases. Among the solutions to these problems, resorting to larger aspect ratios increases parasitics and power dissipation, whereas including clock-boosting [30] increases complexity and leads to a less robust design.
According to settling considerations, resistances in the range of 150 Ω can be tolerated in combination with the amplifier dynamics. In our process, such a value can be obtained using standard-threshold CMOS transmission gates, with no need for clock boosters. The sizes of the pMOS and nMOS devices were selected to equalize their transconductances, keeping the resistance of the transmission gate as linear as possible. Fig. 12 shows its nominal dc characteristic.
In order to evaluate the distortion, the nonlinear sampling has been extensively simulated using the differential circuitry in Fig. 13. Note that the distortion will be mainly determined by switches and (connected to the input), whereas and are connected to the central voltage, which is constant. Electrical simulations have been performed to compute the first five in-band harmonics for a 0.85-V, 366-kHz input sinewave. Also, the DMT signal in Fig. 8(a) has been considered. Fig. 14 shows the worst-case results obtained for both types of inputs during the corner analysis: the worst-case total harmonic distortion (THD) is 96 dB for the input sinewave and the maximum multitone power ratio (MTPR) [32] of the converted DMT signal is 81 dB. Both figures are small enough for our application, so that clock-boosting is not required.
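The choice of test tone and the THD computation can be illustrated as follows. The sketch assumes a 2.2-MHz signal band (half the 4.4-MS/s digital output rate) and uses placeholder harmonic amplitudes, not simulated data; the function names are illustrative.

```python
import math

def in_band_harmonics(f_in, bandwidth):
    """Frequencies of the harmonics (2nd and up) that fall inside the band."""
    harmonics, order = [], 2
    while order * f_in <= bandwidth:
        harmonics.append(order * f_in)
        order += 1
    return harmonics

def thd_db(fundamental, harmonic_amplitudes):
    """THD in dB: harmonic power relative to the fundamental power."""
    power = sum(h * h for h in harmonic_amplitudes)
    return 10.0 * math.log10(power / fundamental ** 2)

tones = in_band_harmonics(366e3, 2.2e6)
print([f / 1e3 for f in tones], "kHz")

# Placeholder amplitudes (V) for the five harmonics of a 0.85-V tone.
print(f"THD = {thd_db(0.85, [1e-5] * len(tones)):.1f} dB")
```

With these assumptions, a 366-kHz tone places exactly five harmonics (2nd to 6th) inside the band, the last one just below the band edge, so the test exercises the whole signal band.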
C. Quantization Blocks
The resolution specifications for the comparators in the first and second stage are not very demanding: offset and hysteresis smaller than 10 and 20 mV, respectively. However, the maximum comparison time is only 3 ns, a quarter of the worst-case clock period. For this reason, the latched comparator in Fig. 15 has been adopted. It includes a differential-pair input transconductor [37], which attenuates the impact of common-mode interferences, a regenerative stage, and an SR latch. In this circuit, the small voltage imbalance created across the nMOS switch controlled by during the reset phase is rail-to-rail regenerated during the positive-feedback comparison phase. The latter starts when goes high, thus making the latch react before the integrator output changes at the beginning of . This strategy avoids using an extra SC stage at the comparator front-end. Separate supply paths are used for the preamplifier and the regenerative latch in order to reduce the sensitivity to digital switching noise and supply bouncing.
The 3-b A/D/A converter in the last stage has been implemented with a flash ADC and a resistive-ladder DAC, as shown in Fig. 16 [18] . The resistive-ladder is also used to generate the voltage references of the ADC. The latter has a fully differential flash architecture, where the thermometer output code is translated into a 1-of-8 code using AND gates. For improving robustness against common-mode interferences, seven differential comparators, similar to those in the first and second stage of the cascade, form the ADC front-end, each of them with two input pairs to perform the subtraction of the two differential signals being compared. Apart from this, the only difference with those in the first-and second-stage comparators is that the input transistors have been reduced in size in order to decrease the capacitive load of the fourth integrator.
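The thermometer-to-1-of-8 translation can be sketched in software. The exact gate arrangement is not detailed here, so the example assumes the common scheme in which each output is the AND of one comparator decision and the inverse of the next higher one, with the two ends tied to constant logic levels. The function name is illustrative.

```python
def thermometer_to_one_hot(thermo):
    """Convert 7 comparator outputs (lowest threshold first) to a 1-of-8 code."""
    # Pad with a constant 1 below the lowest and a constant 0 above the highest
    # comparator so that the two end codes are decoded by the same rule.
    padded = [1] + list(thermo) + [0]
    # Output i is high where the thermometer code changes from 1 to 0.
    return [padded[i] & (1 - padded[i + 1]) for i in range(len(thermo) + 1)]

for level in range(8):
    thermo = [1] * level + [0] * (7 - level)
    print(thermo, "->", thermometer_to_one_hot(thermo))
```

Each valid thermometer code activates exactly one of the eight outputs, which can then select the corresponding tap of the resistive ladder.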
The DAC consists of 14 segments of 50-Ω poly resistors, the most important source of INL being resistor mismatch, which decreases as the device area increases. Thus, in order to guarantee that INL FS, each of the 50-Ω resistors is obtained by connecting larger devices in parallel.

Fig. 17 shows the clock driver that generates the nonoverlapped clock phases from an external clock signal. Delayed versions of the phases are also generated to avoid signal-dependent clock-feedthrough. As shown in Fig. 7, the delay is applied only to the turn-off of the switches (falling edges of the signals) in order to increase the time slot available for sampling and integration [15]. Complementary versions of the phases are also generated to control the CMOS switches. All signals are properly driven at the output using a buffer tree that equalizes the differences in capacitive load from phase to phase. After ensuring reliable nonoverlapping time and phase delay, the worst-case effective phase eye is 6 ns, which means that approximately 85% of the clock period is left for the modulator operation.
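As a quick sanity check of the timing figures quoted above, the following sketch computes the clock period and the fraction of it covered by the phase eyes. It assumes the nominal 70.4-MHz sampling frequency reported in the measurements and two phases per clock period; the function name `clock_budget` is ours and purely illustrative.

```python
def clock_budget(f_s_hz, phase_eye_s, n_phases=2):
    """Return (clock period in s, fraction of the period covered by the phase eyes).

    Assumption: each clock period holds `n_phases` nonoverlapped phases,
    each offering `phase_eye_s` seconds of effective (usable) time.
    """
    period = 1.0 / f_s_hz
    usable_fraction = n_phases * phase_eye_s / period
    return period, usable_fraction


# Nominal sampling frequency (70.4 MHz) and worst-case phase eye (6 ns)
period, usable = clock_budget(70.4e6, 6e-9)
print(f"clock period    = {period * 1e9:.2f} ns")
print(f"usable fraction = {usable:.1%}")
```

Under these assumptions the two 6-ns eyes cover a little under 85% of the nominal period, in line with the figure given in the text.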
D. Auxiliary Blocks
The reference voltages required for the modulator operation, namely V and V, together with the central voltage, are generated on-chip by the circuit shown in Fig. 18. Its main design considerations are fast settling and that the output impedance of the and lines must be low enough to avoid dynamic distortion at the integrators [38]. In our case, a maximum output impedance of 7 Ω is obtained along the signal band through the combined use of an on-chip resistive amplifier and two large external capacitors. An extra external capacitor is connected between the reference voltages, valued according to the pad-wire-lead-pin parasitics, so that the spurious components around half the sampling frequency are removed from the differential reference voltage.
A second-order passive antialiasing filter is also included on-chip. Its bandwidth can be programmed to accomplish ... 2.78 mm² and dissipates 65.8 mW (55 mW corresponding to the ΣΔM itself) from a 2.5-V supply. Apart from the modulator described here, other blocks pertaining to the final application (not shown) were included in the prototype chip, among them a phase-locked loop (PLL) and a decimation filter. These blocks were arranged so that the ΣΔM could be tested as a stand-alone block or in combination with the PLL and the digital filter. The latter configuration is aimed at reducing the switching activity of the digital buffers at the pad-pin level, a major source of performance degradation. Also, with the objective of attenuating the impact of switching noise, the following mixed-signal recipes, valid for nonepi high-ohmic substrates [35], were adopted in the prototype: 1) increased distance between analog and digital blocks; 2) use of separate analog, mixed, and digital supplies, which are distributed on-chip through separate low-impedance paths; 3) placement of guard-rings (with dedicated pads and pins) surrounding the different chip sections, in order to avoid spreading of switching noise and provide a quiet substrate for the sensitive analog blocks; 4) preserved layout symmetry and extensive use of common-centroid techniques aimed at gaining insensitivity to common-mode interferences; 5) shielding of the clock buses along the chip in order to reduce cross-talk and provide a low-impedance return path; 6) extensive use of on-chip decoupling, including a mixed on/off-chip decoupling scheme for the analog supply [39]; and 7) multiple bonding for reducing wiring inductance.
In order to avoid socket parasitics, each prototype sample was mounted onto a dedicated four-layer printed circuit board (PCB), including typical measures for signal integrity, such as separate analog, mixed, and digital ground planes, intensive decoupling and filtering, and proper impedance termination [40]. The input signal was provided by a high-resolution, 100 dB THD, sinusoidal source with floating differential output, its common-mode voltage referenced to the on-chip generated central voltage. The output samples, either from the modulator bitstreams or from the digital filter, were acquired by a digital tester that also provided the master clock stimulus and the supply voltages.

Fig. 20 shows the total in-band error power versus the modulator sampling frequency, for the nominal oversampling ratio and twice this value. For each value of the oversampling ratio, two curves are plotted, corresponding to output samples acquired at the clock rate (ΣΔM alone) and to decimated output samples (PLL + ΣΔM + decimator). Note that in the former case the in-band error power increases as the sampling frequency increases. This effect, explained by the increasing switching noise injected by the I/O buffers, causes a degradation of around 9 dB in performance at the nominal sampling frequency (70.4 MHz). Nevertheless, when both the PLL and the decimator are used, so that the switching frequencies of the input and output buffers are divided by 4 and 16, respectively, the loss of performance is reduced to 3 dB. Note also that although the digital filter activity generates an increase of the in-band error power at intermediate sampling frequencies, its impact is largely suppressed at the nominal rate, thus demonstrating the validity of the decoupling schemes used, especially at the reference voltages. A similar behavior is obtained for twice the nominal oversampling ratio.

Fig. 21 shows a 16 384-point fast Fourier transform (FFT) of the decimated converter output for a 0.5-V, 160-kHz input sinewave.
Despite the large signal level, no significant harmonic distortion is observed. In fact, in Fig. 21, the spurious-free dynamic range (SFDR) is 90 dB, whereas the THD computed up to the fifth harmonic is 87 dB, so that the signal-to-noise ratio (SNR) almost coincides with the SNDR. The latter is shown in Fig. 22 for both oversampling ratios (the nominal value and twice this value), the error power being computed in the ADSL+ band (from 30 kHz to 2.2 MHz) and in the ADSL band (from 30 kHz to 1.1 MHz), respectively. The dynamic range is 78 dB (12.7 b) for the nominal oversampling ratio and 85 dB (13.8 b) for twice this value, with SNDR peaks of 72.7 dB and 80 dB, respectively.

The good linearity of the converter also manifests as low integral and differential nonlinearity (INL and DNL, respectively). Both curves are shown in Fig. 23. They have been obtained by applying the code-histogram method [41] to 89 output data records, each one containing 8192 consecutive output samples for a 0.8-V, 59.62-kHz input sinewave. The only difference among the data records is the phase of the sinewave, which cannot be controlled. However, this fact helps to hit the converter output codes in a more uniform way, thus requiring fewer samples than in the case in which all were taken consecutively. The INL, DNL units in Fig. 23 ...

To conclude, the ΣΔM presented here has been compared with other recently reported designs featuring DOR MS/s, whose performances are summarized in Table V. Their effective resolution (ENOB) and FOM value, defined in (8), have been plotted as a function of DOR in Fig. 24(a) and (b), respectively. In spite of the performance loss due to switching noise (around half a bit), the modulator presented here achieves one of the lowest FOMs reported so far. In particular, the FOMs obtained are surpassed by only two CMOS designs with supply voltage equal to or below 2.5 V [5], [6].
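The resolution and oversampling figures quoted above can be cross-checked with a few lines of code. The sketch below uses the standard relation ENOB = (DR − 1.76)/6.02 and the usual definition of the oversampling ratio, f_s/(2·BW); the function names are ours, and the 70.4-MHz sampling frequency is the nominal value reported in the measurements.

```python
def enob_from_dr(dr_db):
    """Effective number of bits from a dynamic range given in dB
    (standard relation for a full-scale sinewave)."""
    return (dr_db - 1.76) / 6.02


def oversampling_ratio(f_s_hz, bandwidth_hz):
    """Oversampling ratio for a signal band extending up to `bandwidth_hz`."""
    return f_s_hz / (2.0 * bandwidth_hz)


F_S = 70.4e6  # nominal sampling frequency, Hz

for label, band_hz, dr_db in (("2.2-MHz band", 2.2e6, 78.0),
                              ("1.1-MHz band", 1.1e6, 85.0)):
    osr = oversampling_ratio(F_S, band_hz)
    enob = enob_from_dr(dr_db)
    print(f"{label}: OSR = {osr:.0f}, DR = {dr_db:.0f} dB, ENOB = {enob:.1f} b")
```

With the measured dynamic ranges this gives 12.7 b and 13.8 b, and oversampling ratios of 16 and 32 for the two signal bands, consistent with the values in the text.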
APPENDIX
POWER ESTIMATOR FOR CASCADE ΣΔMs
In the presence of circuit imperfections, the dynamic range of a ΣΔM can be roughly expressed as follows [4]:
where , , and are the in-band powers of quantization error, white circuit noise or thermal noise, and incomplete settling error, respectively.
For the sake of simplicity, we will assume for now that the incomplete settling error can be controlled by design so that , . An approximate expression for is obtained by adding up (4) and (6). Concerning , it will be usually dominated by white noise injected by the switches and the front-end amplifier, whose PSD is folded back over the baseband by undersampling. A conservative expression for the in-band power of white noise can be derived [3] as (12) where is the value of the sampling capacitor. Equations (4)-(7) and (12) show that the dynamic range of a cascade ΣΔM can be roughly expressed as a function of the following design parameters:
, , , , , and , to which we have to add and INL if the last-stage quantizer is multibit. So, for given values of , , and INL, the minimum value of the capacitor required to obtain a given DR can be obtained as a function of , , and . Once is known, the equivalent load for the amplifier in the integrator can be estimated as [33] (13)
where , the integrator feedback capacitance, is related to through the integrator weight, , and and stand for the integrator summing node and output parasitics, respectively. Estimating the latter two capacitances is a difficult task because of their extreme dependence on the actual amplifier design.
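To illustrate how a target dynamic range translates into a minimum sampling capacitor, the sketch below assumes that the in-band white noise takes the common kT/C form, P_n ≈ γ·kT/(OSR·C_S). The factor γ (`noise_factor`), the full-scale amplitude `v_fs`, and the share of the error budget assigned to thermal noise (`noise_share`) are all assumptions of this example and are not taken from (12).

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K


def min_sampling_cap(dr_db, osr, v_fs, temp_k=300.0,
                     noise_factor=2.0, noise_share=1.0):
    """Minimum sampling capacitor (F) for a thermal-noise-limited DR.

    Assumed model (illustrative only):
        P_noise = noise_factor * k * T / (osr * C_S)
        DR      = (v_fs**2 / 2) / P_error, with P_noise <= noise_share * P_error
    """
    dr_lin = 10.0 ** (dr_db / 10.0)
    p_signal = v_fs ** 2 / 2.0                     # full-scale sinewave power
    p_noise_allowed = noise_share * p_signal / dr_lin
    return noise_factor * K_B * temp_k / (osr * p_noise_allowed)


# Hypothetical example: 80-dB target, OSR = 16, 1-V full-scale amplitude
c_s = min_sampling_cap(80.0, 16, 1.0)
print(f"C_S >= {c_s * 1e12:.3f} pF")
```

In this model every additional 10 dB of dynamic range costs a tenfold increase in capacitance, whereas doubling the oversampling ratio halves it, which is why the sampling capacitor, and with it the amplifier load, grows quickly in low-oversampling designs.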
Usually, the main contribution to is the amplifier input parasitics. In a fully differential topology, this is formed by the input transistor gate-to-source capacitance (both channel and overlap contributions) and its overlap gate-to-drain capacitance amplified by the Miller effect [28]. Thus, neglecting , we have (14) where is the gate oxide capacitance density and stands for the lateral diffusion of drain/source regions below the gate, both technology-dependent parameters. Apart from the input transistor dimensions , the other unknown variable in (14) is its input-to-output gain . This is equal to the complete amplifier gain for single-stage amplifiers or to the first-stage gain if multistage topologies are used. It can even be around unity if cascode devices are used, such as in folded- or telescopic-cascode amplifiers [28]. Now, making use of the well-known (though inaccurate) square-law expression for the input transistor drain current, we have (15) where is the input transistor overdrive voltage.
The other unknown capacitance in (13), , has two main contributions: the first one is due to the bottom parasitic of the integration capacitor , and the second one is due to the amplifier itself. The former contribution can vary a lot, depending on the type of capacitors. With modern M-i-M structures it turns out to be very small, ranging from less than 1% to 5% of . Because of this, tends to be dominated by the amplifier output parasitic load, which strongly depends on the actual output devices and, above all, on the amplifier topology. Even the supply voltage, via output swing and dc gain requirements, has an impact on the transistor sizes and hence on . For a given amplifier schematic, the latter influence makes slightly increase under technology scaling and shrinking supply voltages, because wider output devices are required to accommodate similar output swings. All things considered, a reliable estimation of this capacitance prior to sizing the amplifier is not possible. Based on previous design experience, we will assume a constant value equal to 2.5 pF.
Returning to the settling error power , an accurate estimation would involve cumbersome calculations. For example, even for a single-pole amplifier model, complicated expressions are derived [4] if nonlinear (SR-limited) settling is considered.
Further complexity arises from considering incomplete charge transfer in both sampling and integration, as well as the contribution of the nonzero switch on-resistance [33]. Hence, we will simplify our treatment assuming that the slew-rate of the amplifier is large enough and the switch on-resistance small enough to neglect their impact on the integrator transient response, so that the settling is linear with time constant equal to . This being the case, it takes a number of time constants to settle within ENOB resolution, that is, the following relation should be fulfilled: (16) where is the sampling period. Note that we have added an extra bit in order to make room for the inaccuracy of this simplified model. The above expression can be used to estimate the minimum value of the transconductance parameter as (17) where is the sampling frequency. This is the transconductance required for a single-stage amplifier with equivalent output load . For multistage amplifiers, the previous relation must be handled with care because both parameters, total transconductance and equivalent output load, lose control of the dynamics. However, provided that the main pole of the amplifier is set by the input stage and an eventual inter-stage compensation capacitor, (17) can still be used to determine the input stage transconductance, which is related to the input transistor current as follows: (18) Equations (13), (15), (17), and (18) can be handled in an iterative way to determine the current required through the input transistors of the amplifier, whose actual topology sets the power consumption. Whenever possible, a single-stage amplifier should be used for its better performance/power figure. However, as technologies scale down and supply voltages shrink, two-stage amplifiers are gaining ground. Moreover, in practice, two gain stages are not enough to achieve the overall gain requirement, so that the first one often includes cascode devices in a telescopic-cascode configuration.
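The iterative procedure just described can be sketched as follows. Since the exact forms of (13) and (16)-(18) are not reproduced here, the example relies on stated assumptions: the usual SC-integrator equivalent load, C_eq = C_S + C_P + C_L(1 + (C_S + C_P)/C_I); linear settling to ENOB + 1 bits within half a sampling period; the square-law relation g_m = 2I/V_ov; and an input parasitic proportional to the input current through a lumped, hypothetical coefficient `cp_per_amp`. All numerical values other than the 70.4-MHz sampling frequency and the 2.5-pF output load are illustrative.

```python
import math


def equivalent_load(c_s, c_i, c_p, c_l):
    """Assumed SC-integrator equivalent load during integration."""
    return c_s + c_p + c_l * (1.0 + (c_s + c_p) / c_i)


def min_gm(enob, c_eq, f_s):
    """Assumed settling condition: (ENOB + 1) * ln(2) * (c_eq / gm) <= 1 / (2 * f_s)."""
    return 2.0 * (enob + 1.0) * math.log(2.0) * c_eq * f_s


def input_current(gm, v_ov):
    """Square-law relation gm = 2 * I / v_ov, solved for the drain current."""
    return gm * v_ov / 2.0


def solve_input_current(enob, f_s, c_s, weight, c_l, v_ov, cp_per_amp,
                        tol=1e-9, max_iter=100):
    """Fixed-point iteration on the input-transistor current.

    `cp_per_amp` (F/A) is a hypothetical coefficient modeling how the
    amplifier input parasitic grows with the input-transistor current.
    """
    c_i = c_s / weight               # feedback capacitor from the integrator weight
    i_in = 0.0                       # start with no input parasitic
    for _ in range(max_iter):
        c_p = cp_per_amp * i_in
        c_eq = equivalent_load(c_s, c_i, c_p, c_l)
        i_new = input_current(min_gm(enob, c_eq, f_s), v_ov)
        if abs(i_new - i_in) <= tol * max(i_new, 1e-12):
            return i_new, c_eq
        i_in = i_new
    raise RuntimeError("current estimate did not converge")


# Illustrative values only
i_in, c_eq = solve_input_current(enob=13, f_s=70.4e6, c_s=0.66e-12,
                                 weight=0.25, c_l=2.5e-12, v_ov=0.2,
                                 cp_per_amp=5e-10)
print(f"I_in = {i_in * 1e3:.3f} mA, C_eq = {c_eq * 1e12:.2f} pF")
```

The iteration converges quickly whenever the parasitic added per unit of current is small compared with the fixed part of the load, which is the situation assumed here; otherwise the loop reports the failure instead of returning a meaningless value.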
Let us consider this topology as an archetype in modern deep-submicrometer technologies. The current through the first stage has been already estimated as . Assuming for the sake of simplicity a fixed ratio between the currents flowing through the input and output branches, the total current through the amplifier can be estimated as (19) where an extra is added to account for the biasing stage. With (19), the power dissipation of the first amplifier can be estimated. That of the remaining amplifiers in the cascade stages can be decreased, following the scaling rule commonly applied to the amplifier requirements in ΣΔMs. This power reduction may come from either a relaxed set of specifications or the subsequent simplification of the amplifier topology. Sometimes, even when a two-stage amplifier may be required for the first integrator, it is possible to use a single-stage topology for the rest of the integrators. So, we can write (20) where is the ratio of the current consumption of the th amplifier to that of the first one. From this, the static power dissipated in amplifiers is (21) Besides this static consumption, which usually accounts for 80% of the total power, there are other contributing blocks, namely:
• Latched comparators used as single-bit quantizers and those in the last-stage multibit quantizer, usually implemented by a flash ADC, i.e., more latches. This consumption must include the static power dissipated in a convenient preamplifying stage.
• Last-stage multibit DAC (if ). The relaxed requirements for this block allow us to implement it with a resistor ladder. Its main design considerations are resistor matching and linearity (both causing INL) and the fact that it must drive enough current to provide a good settling. The current requirement scales with the sampling frequency and the capacitive load involved. The latter can be considered almost constant because the last-stage capacitors should be set to the minimum required to achieve a certain level of matching (thermal noise playing a secondary role). So, we can empirically write (22) where is the current through the DAC required for operating at a certain reference frequency .
• Dynamic power in SC stages. The dynamic power dissipated to switch a capacitance between the reference voltages at a frequency can be estimated as , which tends to increase in high-speed, high-resolution converters. Its actual value depends on the integrator weights used. In our case, the following expression provides a good estimate: (23) where the factor 2 comes from the differential implementation, is the unitary capacitor in the first integrator, whereas is the one in the rest of the integrators, usually smaller than .
• Small digital blocks: flip-flops, gates, cancellation logic, etc. Apart from being small, they do not make any difference for the architectures considered and will be neglected here. Of course, this does not apply to the decimation filter, whose power consumption is comparable to that of the ΣΔM. Moreover, since the order of the digital filter must equal , high-order ΣΔMs require more complex filters than low-order ones. However, an increase of the modulator order entails a decrease of the oversampling ratio and the filter can be operated at a lower frequency, dissipating less power. For our purposes, we can consider an essentially constant decimation filter power consumption.
By adding up all the contributions, the power dissipation of the ΣΔM can be estimated as Power (24)
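The bookkeeping behind this estimate can be sketched as a simple sum of contributions. The functions below are a minimal illustration under stated assumptions and do not reproduce (19)-(24): the amplifier currents are scaled through a list of ratios relative to the first amplifier, the DAC current is assumed to scale linearly with the sampling frequency, and the dynamic power is taken as C·V²·f. All names and numerical values are hypothetical.

```python
def amplifier_static_power(v_dd, i_first, ratios):
    """Static power of all amplifiers; `ratios` holds the current of each
    remaining amplifier relative to the first one."""
    return v_dd * i_first * (1.0 + sum(ratios))


def dac_current(i_ref, f_s, f_ref):
    """DAC current, assumed to scale linearly with the sampling frequency."""
    return i_ref * f_s / f_ref


def dynamic_power(c_switched, v_span, f_s):
    """Power needed to switch `c_switched` across `v_span` at rate `f_s`."""
    return c_switched * v_span ** 2 * f_s


def modulator_power(v_dd, i_first, ratios, p_comparators,
                    i_dac, c_switched, v_span, f_s):
    """Sum of the contributions considered in this sketch (small digital
    blocks and the decimation filter are left out)."""
    p_amps = amplifier_static_power(v_dd, i_first, ratios)
    p_dac = v_dd * i_dac
    p_dyn = dynamic_power(c_switched, v_span, f_s)
    total = p_amps + p_comparators + p_dac + p_dyn
    return total, p_amps / total


# Illustrative values only
F_S = 70.4e6
total, amp_share = modulator_power(
    v_dd=2.5, i_first=4e-3, ratios=[0.5, 0.5, 0.5],
    p_comparators=3e-3,
    i_dac=dac_current(i_ref=0.5e-3, f_s=F_S, f_ref=35.2e6),
    c_switched=10e-12, v_span=1.0, f_s=F_S)
print(f"total = {total * 1e3:.1f} mW, amplifiers = {amp_share:.0%} of total")
```

With these made-up inputs the amplifiers take roughly 80% of the estimated power, which matches the typical share mentioned in the text, although the split obviously depends on the values chosen.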
