Abstract: An algorithm for architecture-level exploration of the ΣΔ A/D converter (ADC) design space is presented. Starting from the desired specification, the algorithm finds an optimal solution by exhaustively exploring both single-loop and cascaded architectures, with a single-bit or multibit quantizer, over a range of oversampling ratios. A fast filter-level step evaluates the performance of all loop-filter topologies and passes the accepted solutions to the architecture-level optimization step, which maps the filters onto feasible architectures and evaluates their performance. The power consumption of each accepted architecture is estimated, and the ten best solutions in terms of the ratio of peak signal-to-noise+distortion ratio to power consumption are further optimized for yield. Experimental results for two different design targets are presented. They show that previously published solutions are among the best architectures for a given target, but that better solutions can be designed as well.
I. INTRODUCTION
Among the many architectures of analog-to-digital converters (ADCs), ΣΔ designs are used in a large class of applications, ranging from low-frequency [1] and audio [2] to down-converted intermediate-frequency [3] and digital video [4]. Their ability to trade speed for accuracy makes them increasingly attractive in the context of present CMOS technology evolution [5]. The spread of ΣΔ designs and the absence of an accurate analytical model for their nonlinear behavior led to a rapid evolution of dedicated simulation software. A number of simulators are readily available, some as free software toolboxes [6] and some developed in universities [7], [9]. However, even with fast, dedicated simulators, it is all but impossible for a designer to explore the entire range of topologies and design parameters that can yield the optimal solution for a target dynamic range (DR) or peak signal-to-noise+distortion ratio (SNDR). CAD tools for the design and optimization of ΣΔ ADCs have also been reported. In [10], a tool is described that helps the designer choose the optimal solution from a predefined set of topologies with fixed filter coefficients; the result may be only a local optimum in the entire ΣΔ ADC design space. The tool also determines minimum values for circuit-level parameters such as operational amplifier gain and bandwidth. The tool in [11] also requires the designer to input topology specifications, and a simulated annealing optimization based on topology-specific equations is used to find the architecture coefficients and circuit parameter values. This paper presents a global optimization approach that outputs a list of the best ΣΔ ADC architectures in terms of the ratio of peak SNDR to power consumption. Unlike the design automation software reported so far, optimality is guaranteed by an exhaustive search of the entire design space. The search is conducted in two steps.
In the first step, called filter-level design, the loop filters are analyzed using the linear model approximation [13] for all possible combinations of loop order, number of cascaded loops, number of bits in the quantizer, oversampling ratio, and peak gain of the noise transfer function (NTF). The DR of each solution is evaluated [6], and the solutions that satisfy the input DR specification are passed to the next step. The second step, called architecture-level design, maps the qualified filter-level solutions onto possible system architectures. Each architecture is designed and its performance is analyzed by time-domain simulations. Accepted solutions are then subjected to power consumption estimation, assuming switched-capacitor (SC) differential circuits for discrete-time (DT) designs [2], [14] and active-RC differential circuits for continuous-time (CT) designs. A reduced set of solutions is selected based on the ratio of peak SNDR to power consumption. Only these solutions are subjected to Monte Carlo analysis to optimize their yield with respect to process-induced mismatch of the architecture coefficients. The resulting designs are then returned to the designer in ranked order.
The paper is organized as follows. Section II contains a description of the filter-level design step. Section III presents the architecture-design step with details on the performance tests and architecture-level power estimation. Experimental results are shown in Section IV for two different targets, an audio and an xDSL ADC. The conclusions are presented in Section V.
II. FILTER-LEVEL DESIGN
A generic DT representation of a ΣΔ ADC, given in Fig. 1, is best used to explain the functioning of the ADC modeled as a linear system. The loop filter has two sections, a forward filter H(z) and a feedback filter G(z). The input signal X(z) is compared with the output signal fed back through G(z); the difference is filtered by H(z) and quantized to give the digital output Y(z). The quantization introduces an error E(z), which is modeled as independent of the input signal and added directly to the output at the quantizer (represented as a summation point).

Two transfer functions are defined for this system: the signal transfer function (STF), which characterizes the transfer from the input X(z) to the output, and the noise transfer function (NTF), which characterizes the transfer from the quantization error E(z) to the output:

$$Y(z) = \mathrm{STF}(z)\,X(z) + \mathrm{NTF}(z)\,E(z). \qquad (1)$$

These functions can be defined independently because a linear model is assumed for the quantizer, which makes the whole system linear so that superposition applies. The two transfer functions are defined in terms of the two sections of the loop filter:

$$\mathrm{STF}(z) = \frac{H(z)}{1 + H(z)G(z)}, \qquad \mathrm{NTF}(z) = \frac{1}{1 + H(z)G(z)}. \qquad (2)$$

The order of the loop is given by the order of the product H(z)G(z). A higher loop order improves the rejection of in-band quantization noise, thus increasing the DR. Another way to increase the DR is to increase the number of bits in the quantizer, which reduces the power of the quantization noise E(z).
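As a quick numerical check of (1) and (2), the following Python sketch forms the STF and NTF from the two loop-filter sections. It is an illustration under stated assumptions, not the tool's code: H(z) and G(z) are assumed to be given as numerator/denominator coefficient arrays in descending powers of z, and the function name `loop_transfer_functions` is hypothetical.

```python
import numpy as np

def loop_transfer_functions(h_num, h_den, g_num, g_den):
    """Return (stf_num, ntf_num, common_den) for STF = H/(1+HG) and NTF = 1/(1+HG).

    All polynomials are coefficient arrays in descending powers of z.
    With H = Nh/Dh and G = Ng/Dg:
        STF = Nh*Dg / (Dh*Dg + Nh*Ng)
        NTF = Dh*Dg / (Dh*Dg + Nh*Ng)
    """
    den = np.polyadd(np.polymul(h_den, g_den), np.polymul(h_num, g_num))
    stf_num = np.polymul(h_num, g_den)
    ntf_num = np.polymul(h_den, g_den)
    return stf_num, ntf_num, den

# First-order example: H(z) = z^-1 / (1 - z^-1) = 1 / (z - 1), G(z) = 1.
stf_num, ntf_num, den = loop_transfer_functions([1.0], [1.0, -1.0], [1.0], [1.0])
```

For this first-order loop the sketch recovers the familiar NTF(z) = (z - 1)/z = 1 - z^-1 and STF(z) = 1/z = z^-1.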
A. Exploration Algorithm
A fast design space exploration is performed first, using a linearized model of the ΣΔ ADC. The search algorithm is shown in Fig. 2. A distinction is made between the topology design space and the parameter design space. The topology design space is defined by the order, the number of loops (one for single-loop designs; for cascades, the number of cascaded loops), and the number of bits (a vector whose length equals the number of loops). The parameter design space is defined by the oversampling ratio (OSR) and the peak NTF magnitude. Other design parameters, such as the input signal bandwidth or the reference level, are not useful at this design level since they are only scaling quantities. All dimensions of the search space are browsed with constant steps, linearly in the topology space and exponentially in the parameter space [15].
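A minimal sketch of how the two search spaces could be enumerated is shown below. The function names, the restriction that each loop contributes at least first order (so the number of loops never exceeds the order), and the use of powers of two for the OSR are assumptions made for illustration; stepping the peak NTF magnitude uniformly in dB is one way to realize exponential stepping.

```python
import itertools
import numpy as np

def topology_space(max_order, max_loops, max_bits):
    """Topology dimensions, stepped linearly: (order, loops, bits-per-loop tuple)."""
    for order in range(1, max_order + 1):
        for loops in range(1, min(max_loops, order) + 1):   # assumed: each loop >= 1st order
            for bits in itertools.product(range(1, max_bits + 1), repeat=loops):
                yield order, loops, bits

def parameter_space(osr_min, osr_max, ntf_min_db, ntf_max_db, ntf_step_db):
    """Parameter dimensions, stepped exponentially: OSR in powers of two and
    peak |NTF| in constant dB steps (i.e., constant ratios)."""
    osr = osr_min
    while osr <= osr_max:
        for g_db in np.arange(ntf_min_db, ntf_max_db + 1e-9, ntf_step_db):
            yield osr, 10.0 ** (g_db / 20.0)                # dB -> linear magnitude
        osr *= 2
```

Keeping the two generators separate mirrors the split between topology and parameter spaces: the outer search can iterate over topologies and, for each one, walk the parameter grid.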
At each step, a set of two filter transfer functions are generated, the NTF and the STF. The NTF has Chebyshev poles and reduced ripple at high frequencies. The software can optimize the NTF's in-band zeros by linear search from dc to the upper limit of the signal bandwidth to reduce the total in-band noise power [6] . Off-band STF zeros can also be optimized by linear search from the upper limit of the signal bandwidth to half the sampling frequency to reduce the in-band STF gain ripple (useful especially in audio applications) and to increase the rejection of off-band input spectral components (useful when no signal conditioning is available in front of the ADC). At this stage, the two filters are not connected to any particular loop topology; they only share the same poles.
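The zero-optimization step can be illustrated for the simplest case. The sketch below assumes a second-order, FIR-only NTF, 1 - 2cos(θ)z^-1 + z^-2, with a complex-conjugate zero pair at e^{±jθ}, and ignores the NTF poles; the function names are hypothetical. A linear search over θ from dc to the band edge π/OSR minimizes the in-band integral of |NTF|², using |NTF(e^{jω})|² = 4(cos ω - cos θ)².

```python
import numpy as np

def inband_noise_2nd(theta, w_band, n_points=2000):
    """In-band integral of |NTF|^2 for NTF(z) = 1 - 2 cos(theta) z^-1 + z^-2."""
    w = (np.arange(n_points) + 0.5) * w_band / n_points  # midpoints over [0, w_band]
    mag2 = 4.0 * (np.cos(w) - np.cos(theta)) ** 2          # |NTF(e^{jw})|^2
    return float(np.sum(mag2) * w_band / n_points)

def optimize_zero(osr, n_grid=1000):
    """Linear search for the zero angle from dc to the band edge pi/OSR."""
    w_band = np.pi / osr
    thetas = np.linspace(0.0, w_band, n_grid)
    costs = np.array([inband_noise_2nd(t, w_band) for t in thetas])
    i_best = int(np.argmin(costs))
    gain = inband_noise_2nd(0.0, w_band) / costs[i_best]   # improvement over zeros at dc
    return float(thetas[i_best]), float(gain)
```

For this second-order case the search lands near θ = ω_B/√3 and lowers the in-band noise by about 3.5 dB relative to placing both zeros at dc.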
After the two filter transfer functions are generated, the in-band noise power is calculated from the NTF to estimate the DR (see Section II-B). A test is applied with two thresholds derived from the target DR (in dB):

$$\mathrm{DR}_{\mathrm{target}} + \Delta\mathrm{DR}_{\mathrm{low}} \le \mathrm{DR} \le \mathrm{DR}_{\mathrm{target}} + \Delta\mathrm{DR}_{\mathrm{high}}. \qquad (3)$$

Both $\Delta\mathrm{DR}_{\mathrm{low}}$ and $\Delta\mathrm{DR}_{\mathrm{high}}$ are positive numbers expressed in dB. The lower limit tests whether the converter still reaches the target DR after the circuit (white) noise is added; a typical value for $\Delta\mathrm{DR}_{\mathrm{low}}$ is 6 dB. The upper limit rejects solutions that offer much more quantization-noise DR than actually needed at the expense of the overloading level (OVL); a typical value for $\Delta\mathrm{DR}_{\mathrm{high}}$ is 12 dB. If the test is passed, the solution is saved in a database.
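Test (3) reduces to a two-sided window check. In the sketch below, the default margins are the typical 6-dB and 12-dB values quoted above; the function name is hypothetical.

```python
def dr_window_ok(dr_est_db, dr_target_db, margin_low_db=6.0, margin_high_db=12.0):
    """Filter-level acceptance test (3): target + low margin <= DR <= target + high margin."""
    return dr_target_db + margin_low_db <= dr_est_db <= dr_target_db + margin_high_db
```

For a 98-dB target, for instance, only estimated DR values between 104 and 110 dB would be accepted.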
The algorithm investigates all possible solutions characterized by (order, bits, loops). For every single-loop solution, the search stops when a combination of minimal OSR and NTF gain is found that satisfies the DR requirement in (3). For cascaded designs, however, because significant architectural details are not available at the filter level, the algorithm continues searching for solutions even after finding the first valid one, until the entire (OSR, NTF) space has been explored for every topology. This ensures that no valid solution is prematurely rejected.
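The stopping rule described above can be sketched as follows. This is an illustration, not the published implementation: `estimate_dr_db` stands for any caller-supplied DR estimator (for instance the method of Section II-B), topologies are (order, loops, bits) tuples, and the window test (3) is applied with the typical margins.

```python
def explore(topologies, osr_values, ntf_peaks, estimate_dr_db, dr_target_db,
            margin_low_db=6.0, margin_high_db=12.0):
    """Filter-level sweep: single loops stop at the first passing (OSR, peak NTF) pair,
    cascades keep every passing pair because architectural details are still unknown."""
    solutions = []
    for order, loops, bits in topologies:
        done = False
        for osr in sorted(osr_values):             # lowest OSR first
            for g in sorted(ntf_peaks):            # lowest peak NTF gain first
                dr = estimate_dr_db(order, loops, bits, osr, g)
                if dr_target_db + margin_low_db <= dr <= dr_target_db + margin_high_db:
                    solutions.append((order, loops, bits, osr, g, dr))
                    if loops == 1:                 # single loop: minimal pair found
                        done = True
                        break
            if done:
                break
    return solutions
```

Sorting both parameter axes in ascending order is what makes the first passing single-loop point the minimal (OSR, peak NTF) combination.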
B. Dynamic Range Evaluation
A fast yet accurate method is used to estimate the DR. It overcomes the drawbacks both of time-domain behavioral simulations, which are slow because the loop must first be designed, and of the classical formula [13]

$$\mathrm{DR} = 10\log_{10}\!\left[\frac{3}{2}\left(2^{\mathrm{BITS}}-1\right)^2\frac{2\,\mathrm{ORDER}+1}{\pi^{2\,\mathrm{ORDER}}}\,\mathrm{OSR}^{2\,\mathrm{ORDER}+1}\right] \qquad (4)$$

which is inaccurate for high-order loops. The method is as follows. First, the magnitude of the NTF is calculated in $n$ points equally distributed from dc to $f_s/2$ as

$$\left|\mathrm{NTF}_{\mathrm{ORDER}}(f_k)\right| = \left|\mathrm{NTF}_{\mathrm{ORDER}}\!\left(e^{j2\pi f_k/f_s}\right)\right| \qquad (5)$$

using

$$f_k = \frac{k}{n}\,\frac{f_s}{2}, \qquad k = 0, 1, \ldots, n-1. \qquad (6)$$

The total integrated power of the quantization noise is assumed to be [13]

$$P_Q = \frac{\Delta^2}{12} \qquad (7)$$

where $\Delta$ is the quantizer step, equal to the reference level divided by the number of quantizer steps, $\Delta = V_{\mathrm{ref}}/(2^{\mathrm{BITS}}-1)$. This yields a white quantization noise power of $P_Q/n$ in each spectrum bin from dc to $f_s/2$, referred to $V_{\mathrm{ref}}$, as if the quantizer had BITS bits [15]. The NTF magnitude given by (5), expressed in decibels, is added to this per-bin level, and a curve showing the quantization noise amplitude in each bin is drawn. Its integral over the band of interest, $f_k < f_B$, yields the estimated DR value:

$$\mathrm{DR}_{\mathrm{est}} = 10\log_{10}\frac{P_{\mathrm{FS}}}{\sum_{f_k<f_B}\left|\mathrm{NTF}_{\mathrm{ORDER}}(f_k)\right|^2 P_Q/n} \qquad (8)$$

where $P_{\mathrm{FS}}$ is the power of a full-scale sine wave and $f_B$ is the signal bandwidth.

Fig. 3 shows the error of the DR calculated with (4) and the error of the DR estimated with (5)-(8), compared with the ideal, time-domain (behavioral) simulated value for a set of single-loop designs. The error of the calculated DR is significantly larger than that of the estimated DR, and it increases with the loop order: at high loop orders, the approximation of a stable NTF by an ideal high-pass discrete filter [used to derive (4)] becomes less accurate than at low orders. The error of the DR estimation method used in our approach, however, is always lower than 3 dB. The estimation also yields more accurate results than the classical formula for cascaded designs, as can be seen in Fig. 4, where the errors of the calculated and estimated DR are compared for a large set of fifth-order ΣΔ ADCs with two, three, and four loops in cascade. The calculated DR is largely in error, especially for a large number of quantizer bits in the last loop.
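The estimation procedure of (5)-(8) can be sketched in a few lines of NumPy. The sketch assumes a full-scale sine of peak V_ref/2 (power V_ref²/8) and uses bin centres rather than bin edges; the helper names are hypothetical. For the ideal NTF (1 - z^-1)^ORDER, which is the assumption behind (4), both functions should agree closely; the estimate is useful precisely because it also accepts realistic NTFs with poles, for which (4) is inaccurate.

```python
import numpy as np

def classical_dr_db(order, bits, osr):
    """Classical formula (4), valid for an ideal NTF (1 - z^-1)^ORDER."""
    levels = 2 ** bits - 1
    return 10 * np.log10(1.5 * levels ** 2 * (2 * order + 1)
                         * osr ** (2 * order + 1) / np.pi ** (2 * order))

def estimated_dr_db(ntf_num, ntf_den, bits, osr, n=8192, v_ref=1.0):
    """Bin-wise estimate: |NTF|^2 times white quantization noise, summed in band."""
    w = np.pi * (np.arange(n) + 0.5) / n            # n bin centres from dc to fs/2
    z = np.exp(1j * w)
    mag2 = np.abs(np.polyval(ntf_num, z) / np.polyval(ntf_den, z)) ** 2
    delta = v_ref / (2 ** bits - 1)                 # quantizer step
    p_bin = delta ** 2 / 12 / n                     # (7) spread evenly over n bins
    p_noise = np.sum(mag2[w < np.pi / osr]) * p_bin # in-band quantization noise
    p_signal = v_ref ** 2 / 8                       # full-scale sine, peak v_ref/2 (assumed)
    return 10 * np.log10(p_signal / p_noise)

# Ideal second-order NTF (z - 1)^2 / z^2
order = 2
ntf_num = np.poly(np.ones(order))                   # coefficients of (z - 1)^order
ntf_den = np.r_[1.0, np.zeros(order)]               # z^order
```

Passing a Chebyshev-type NTF (numerator and denominator from a filter design) to `estimated_dr_db` works the same way, which is where the two estimates start to diverge.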
Estimation of the DR value is very fast because only polynomial calculations are required, as opposed to behavioral time-domain simulations, which would require the ADC to be designed first, a process implying many additional behavioral simulations. A typical design space search at the filter level evaluates a few thousand solutions in less than 10 min on a 1-GHz computer.
C. Peak NTF Magnitude Database
The one parameter of ΣΔ ADCs that cannot be predicted accurately by linear modeling is the overloading level (OVL) [13]. In the algorithm presented here, the overloading levels of each single-loop ΣΔ ADC over the entire range of loop orders and quantizer bits are computed once from behavioral time-domain simulations and stored in a database, which is then used during design space exploration. This one-time OVL computation is done as follows. The peak magnitude of the NTF is varied from 1.0 (0 dB) to 16.0 (24 dB). For each peak NTF magnitude, each single-loop ΣΔ ADC is designed and its coefficients are optimized to bound the integrator outputs over a range of input signal amplitudes from below the target DR up to 0 dB. The overloading level is detected as the input signal amplitude that causes at least one integrator to clip even if the integrator coefficient is close to zero. Fig. 5 contains part of the entries in the peak NTF database for the single-bit, fourth-order ΣΔ ADC designed with DT and CT loop filters, respectively. It shows the variation of the overloading level (OVL) and of the coefficient of the first integrator over the NTF range where the loop can be stabilized. The CT loop filters are generated from DT equivalents, so the NTF ranges of the two filter types can be compared. The stable NTF range proves to be larger for DT loop filters than for their CT equivalents. The lower value of the first integrator's coefficient also indicates better stability of the ΣΔ ADC designed with DT loop filters compared to the CT equivalent. The stable NTF range also depends on the number of bits in the quantizer, as can be seen in Fig. 6 for a fifth-order loop with 1- to 8-b quantizers. The stability ranges for DT designs increase quickly with the number of quantizer bits, while those of the CT equivalents are always lower and increase much more slowly. The results shown in Figs. 5 and 6 are obtained for nonreturn-to-zero (NRZ) DAC pulses in the CT design. If return-to-zero (RZ) DACs are used in the CT solutions, a different peak NTF database has to be generated and used.
Because there is a tight relationship between the OVL value and the ratio of the clipping voltage to the reference voltage, $V_{\mathrm{clip}}/V_{\mathrm{ref}}$, the database has to be generated for each different value of this ratio. This takes about 24 h on a 1-GHz computer, but the same database can be used for any target DR (SNDR) design as long as the ratio does not change. During the testing of our software, only three databases were needed, for $V_{\mathrm{clip}}/V_{\mathrm{ref}}$ of 0.7, 1.0, and 1.3. For example, if the actual ratio is 0.8, using the database for 0.7 does not reduce the optimality of the final design, since the database is only used to verify that a stable solution exists for a given peak NTF magnitude.
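One possible organization of the peak NTF databases is sketched below. The layout (a dictionary keyed by V_clip/V_ref, holding for each (order, bits, DT/CT) entry the range of peak NTF magnitudes for which the loop could be stabilized) and the function names are assumptions. Selecting the database generated for the largest ratio not exceeding the actual one is an assumed rule that reproduces the 0.8 to 0.7 example given above.

```python
from bisect import bisect_right

# Hypothetical layout: {v_clip/v_ref: {(order, bits, "DT" or "CT"): (ntf_min, ntf_max)}}
def select_database(databases, clip_ratio):
    """Pick the database generated for the largest ratio not exceeding clip_ratio."""
    ratios = sorted(databases)
    i = bisect_right(ratios, clip_ratio)
    if i == 0:
        raise ValueError("no database generated for a ratio this small")
    return databases[ratios[i - 1]]

def stable_solution_exists(db, order, bits, kind, peak_ntf):
    """True if the stored stable peak-NTF range of this loop contains peak_ntf."""
    entry = db.get((order, bits, kind))
    return entry is not None and entry[0] <= peak_ntf <= entry[1]
```

Because the lookup only answers a yes/no stability question, a coarse set of ratios is sufficient.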
III. ARCHITECTURE-LEVEL DESIGN

A. Exploration Algorithm
The architecture-level exploration algorithm evaluates the performance of the remaining filter-level solutions, mapped on a specific architecture. Two different algorithms are used for single-loop and cascaded solutions, respectively.
The algorithm in Fig. 7 is applied when a single-loop solution is processed. Starting from a filter-level solution, an architecture is generated and its coefficients are calculated. The feedforward and feedback connectivity should be specified by the user according to project-specific requirements; it is not explored as an additional design space dimension, since it is not expected to cause major differences in the performance of the ΣΔ ADC. A wide range of input signal amplitudes is then used to detect the overloading level. The input signal applied at this stage is a pulse (a busy signal [13]) with a fundamental frequency three times lower than the signal bandwidth. The next step is coefficient optimization, performed with an input signal amplitude equal to the previously detected OVL, which sizes all loop coefficients so that the integrator outputs stay within a range defined by the designer. The SNDR and DR as functions of the input signal level are then obtained by behavioral simulation. The two curves are tested for performance (see Section III-B), and passing solutions have their power estimated and are then saved in an architecture-level solutions pool for further processing.
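The overload-detection sweep can be illustrated on a small behavioral model. The sketch below is a stand-in, not the tool's simulator: it assumes a second-order, 1-bit loop with integrator gains of 0.5 (a Boser-Wooley-style topology), a sine input instead of the busy pulse signal, and a designer-chosen clipping level `clip`. The OVL is taken as the largest tested amplitude for which neither integrator output exceeds ±clip.

```python
import numpy as np

def simulate_mod2(u, b1=0.5, b2=0.5):
    """Second-order, 1-bit loop; returns output bits and peak |integrator| values."""
    x1 = x2 = 0.0
    v_out = np.empty(len(u))
    peak1 = peak2 = 0.0
    for n, un in enumerate(u):
        v = 1.0 if x2 >= 0.0 else -1.0      # 1-bit quantizer, reference normalized to 1
        x1 += b1 * (un - v)                  # first integrator
        x2 += b2 * (x1 - v)                  # second integrator
        v_out[n] = v
        peak1, peak2 = max(peak1, abs(x1)), max(peak2, abs(x2))
    return v_out, (peak1, peak2)

def detect_ovl(amplitudes, clip, n_samples=4096, cycles=7):
    """Largest tested amplitude for which no integrator output exceeds +/-clip."""
    t = np.arange(n_samples)
    ovl = None
    for a in sorted(amplitudes):
        u = a * np.sin(2 * np.pi * cycles * t / n_samples)
        _, peaks = simulate_mod2(u)
        if max(peaks) > clip:
            break                            # first clipping amplitude found
        ovl = a
    return ovl
```

In a full flow, the coefficient-scaling step would then rescale the integrator gains at this amplitude before the SNDR and DR sweeps.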
The algorithm in Fig. 8 is applied to cascaded solutions. In this case, the only architectural details predefined at the filter level are the number of loops and the number of quantizer bits in the last loop. Therefore, the filter orders of the individual cascaded loops are generated as an additional design space dimension, named ORDERS in Fig. 8. Another new design space dimension is the number of BITS in the first loop of the cascade. For simplicity, all loops of the cascade except the first have the same number of bits as the last one. Each derivative of the input solution in the extended design space, as built from ORDERS and BITS by the "Loop to MASH" step, is analyzed as an independent solution. Each loop in the cascade is designed following a procedure similar to that used for single-loop solutions, and the coupling coefficients along with the digital filter gains are calculated. Behavioral time-domain simulations are performed to analyze the performance of each architectural variant, and passing variants are saved in the solutions pool after their power consumption is estimated. If the architectural variant currently processed is not the last one in the (ORDERS, BITS) subspace, the next one is generated for the same filter-level solution; otherwise, the next filter-level solution is processed.
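The extended (ORDERS, BITS) subspace built by the "Loop to MASH" step can be enumerated as below. The sketch assumes that every loop is at least first order and that the first-loop resolution is swept from 1 b up to a user-chosen maximum; the function names are hypothetical. All loops after the first reuse the last loop's resolution, as described above.

```python
def compositions(total, parts):
    """All ordered splits of `total` into `parts` positive integers (per-loop orders)."""
    if parts == 1:
        yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def mash_variants(order, loops, last_bits, max_first_bits):
    """(ORDERS, BITS) variants of one cascaded filter-level solution."""
    for orders in compositions(order, loops):
        for b1 in range(1, max_first_bits + 1):
            yield orders, (b1,) + (last_bits,) * (loops - 1)
```

For a fifth-order, three-loop solution with 3-b last loops, this yields six order splits, among them 2-2-1, each combined with every first-loop resolution, so the 2-2-1, 5-3-3 variant is one of the candidates.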
B. Performance Test
Performance testing is based on characteristics of the SNDR and DR curves as functions of the input signal level. One linear regression is performed on each curve, over a range from the SNDR zero crossing to the overloading level, defined as the level of the peak SNDR. The slope of the SNDR curve is tested to be within 10% of the desired conversion gain (typically unity). A slope outside this range indicates a strong dependency of the quantization noise power on the input signal level, which is undesirable. The mean of the DR curve is then tested against the target DR value to verify whether the target DR is attained. Next, the peak regression residual of the DR curve is tested to be lower than 6 dB (1 bit) to verify whether the required integral nonlinearity (INL) is attained. Finally, the overloading level is tested to be larger than a minimum value (in dB). As an example, the SNDR and DR curves for an architecture rejected by the performance test algorithm are shown in Fig. 9. The dotted lines are the linear regression fits to both simulated curves. The drop in DR at high input levels shows that the peak NTF value needed to reach the target DR is too high, so premature clipping occurs [6]. The peak SNDR that still keeps a good overall INL is about 85 dB, instead of the almost 95 dB given by its absolute peak value. However, detecting this effect would require a set of linear regressions, each using a different point of the curve as the upper limit, which would increase the computation time tenfold. Instead, because the decrease in peak SNDR already disqualifies this solution, the simple yet effective criterion of limiting the peak regression residual is used as the rejection criterion. The slope of the fitted DR line shows that the SNDR slope test also works toward rejecting the solution, even if the curves would pass the DR mean value test.
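The four acceptance checks can be written compactly with NumPy least-squares fits. This is a sketch under assumptions, not the published test: input levels are in dBFS and sorted in ascending order, the DR curve is taken as SNDR minus input level, the OVL is the level of peak SNDR, and `min_ovl_db` is the minimum overloading level supplied by the caller.

```python
import numpy as np

def performance_test(level_db, sndr_db, dr_target_db, min_ovl_db,
                     gain=1.0, slope_tol=0.10, max_residual_db=6.0):
    """Slope, mean-DR, residual (INL) and OVL checks on simulated SNDR curves."""
    level_db = np.asarray(level_db, dtype=float)   # input levels in dBFS, ascending
    sndr_db = np.asarray(sndr_db, dtype=float)
    i_peak = int(np.argmax(sndr_db))               # OVL taken at the peak SNDR
    i_zero = int(np.argmax(sndr_db >= 0.0))        # first level with SNDR >= 0 dB
    x, s = level_db[i_zero:i_peak + 1], sndr_db[i_zero:i_peak + 1]
    if x.size < 3:
        return False
    slope = np.polyfit(x, s, 1)[0]                 # conversion-gain check
    dr = s - x                                     # assumed DR curve: SNDR minus input level
    fit = np.polyval(np.polyfit(x, dr, 1), x)
    residual = np.max(np.abs(dr - fit))            # peak regression residual (INL proxy)
    return bool(abs(slope - gain) <= slope_tol * gain
                and np.mean(dr) >= dr_target_db
                and residual < max_residual_db
                and level_db[i_peak] > min_ovl_db)
```

A single fit per curve keeps the cost constant per candidate, which matches the motivation above for using the residual limit instead of a sweep of regressions.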
C. Power Estimation
The power of each solution accepted by the performance tests explained in Section III-B is evaluated, considering that SC circuits are used for DT solutions and active-RC circuits for CT solutions. Fully differential circuits are considered, with independent paths for input and DAC signal integration, as shown in Figs. 10 and 11 .
The power consumption of the SC integrator shown in Fig. 10 can be expressed as a function of the amplifier's input-stage transconductance $g_m$, which is designed to meet the settling requirements for a specific linearity performance [2], [14]. The capacitive load driven by $g_m$ is derived from the noise performance of each integrator. Considering a one-stage amplifier, the total noise power at the input of the integrator is given by (9), with the load given by (10), in which a fraction of the integration capacitance, namely the parasitic capacitance between the bottom plate of the integration capacitor and the grounded chip substrate, is connected at the op-amp output. The first part of the noise power in (9) is the noise of the switch on-resistance, and the second part is the noise of the op-amp input-stage $g_m$. This power is referred to the input of the converter by (11), which depends on the OSR and on the order of the integrator in the converter (its position in the loop). With a given noise budget, (9) can be simplified so that it depends only on the integration capacitor and the loop coefficients, and it is used to calculate good starting values for all capacitors. The $g_m$ is then calculated from the required settling performance, considering a slewing-followed-by-settling model [2], as expressed in (12),
where $I_B$ is the bias (tail) current of the op-amp MOS input pair and $T$ is the clock period. The residual voltage at the op-amp input after the passive charge redistribution [2] is calculated in the worst case as (13). The number of time constants needed in (12) is given by the settling time required to reach the required number of bits of linearity [14]. The required linearity of each integrator is determined by allocating equal distortion power to all integrators and using the loop gains to refer the individual distortion powers to the input, with the sum of the distortion powers set 3 dB (0.5 b) below the target DR.
Assuming that the input-stage MOS transistors of the op-amp operate in weak inversion, a compact expression for the bias current $I_B$ is obtained, (14). After $I_B$ is calculated, the noise budget can be adjusted for the remaining integrators to accommodate the (typically slight) increase in the budget of the integrator so designed.
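To make the slewing-followed-by-settling budget concrete, the sketch below derives a tail current for an SC integrator. It is a simplified stand-in for (12)-(14), under explicit assumptions: a feedback factor of one, an input pair in weak inversion so that I_B/g_m = 2nU_T (n is the subthreshold slope factor), slewing until the error falls to I_B/g_m, then linear settling with τ = C_L/g_m to an error of half an LSB at the target resolution, all within half a clock period. The parameter values and function name are illustrative.

```python
import math

def sc_tail_current(c_load, f_clk, v_step, bits, n_slope=1.5, u_t=0.0259):
    """Tail current so that slewing plus linear settling fits in half a clock period.

    Assumptions: feedback factor 1, weak-inversion input pair (I_B / g_m = 2 n U_T),
    slewing until the error drops to I_B / g_m, then settling with tau = C_L / g_m
    to an error of v_step * 2^-(bits + 1).
    """
    v_lin = 2.0 * n_slope * u_t                   # I_B / g_m in weak inversion
    eps = v_step * 2.0 ** -(bits + 1)             # allowed final settling error
    e0 = min(v_step, v_lin)                       # error when linear settling starts
    n_tau = max(math.log(e0 / eps), 0.0)          # number of time constants needed
    t_avail = 0.5 / f_clk                         # integration phase: half a period
    # t_slew = (v_step - v_lin) * C_L / I_B and t_lin = n_tau * v_lin * C_L / I_B
    return c_load * (max(v_step - v_lin, 0.0) + v_lin * n_tau) / t_avail
```

Under these assumptions the slewing time is (V_step - 2nU_T)C_L/I_B and the linear-settling time is N_τ·2nU_T·C_L/I_B, so the required current is linear in C_L and in the clock frequency, which is the kind of compact weak-inversion expression referred to above.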
The power consumption of the active-RC integrator shown in Fig. 11 is also determined by the $g_m$ of the operational amplifier. It can be calculated so that the nonlinearity introduced by the residual voltage stays below the limit derived from the target SNDR [16].
With a one-stage amplifier, the noise power introduced by the CT integrator is proportional to the signal bandwidth, (15), and is referred to the input of the loop by an expression similar to (11). From the noise performance, the input and DAC resistors are calculated, and their values are used to evaluate the nonlinearity introduced in the integration current (single-ended case), (16), by the nonlinear residual voltage at the op-amp input. The calculation requires a large-signal expression of the op-amp input stage and yields two different values of $g_m$, for MOS transistors operated in weak inversion and in strong inversion, respectively, as given in (17), where the target third-order harmonic amplitude corresponds to the required number of bits of linearity [13]. By assuming a linear dependency between $g_m$ and the bias current in weak inversion, a more compact expression can be obtained. However, the expression for strong inversion cannot be simplified further.

Fig. 12. Power consumption for $g_m$ as a function of the noise power ratio R.
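The weak-inversion case can be made concrete with the large-signal characteristic of a differential pair, i_out = I_B·tanh(v/(2nU_T)). The sketch below inverts the small-signal HD3 approximation x²/12 to obtain the largest tolerable residual amplitude and then the tail current. It assumes that the residual voltage is roughly the peak integration current divided by g_m, and it assumes a linearity target of HD3 = 2^-bits; neither assumption is taken from the text above, and the function names are hypothetical.

```python
import math

def max_residual_amplitude(hd3_target, n_slope=1.5, u_t=0.0259):
    """Peak residual voltage at a weak-inversion differential pair for a given HD3.

    Large-signal pair: i_out = I_B * tanh(v / (2 n U_T)). For x = A / (2 n U_T) << 1,
    tanh(x sin wt) ~= x sin wt - (x^3 / 3) sin^3 wt, so HD3 ~= x^2 / 12.
    """
    return 2.0 * n_slope * u_t * math.sqrt(12.0 * hd3_target)

def ct_tail_current(i_peak, bits, n_slope=1.5, u_t=0.0259):
    """Tail current for an active-RC integrator whose op-amp must absorb i_peak."""
    hd3 = 2.0 ** -bits                   # assumed linearity target: HD3 = 2^-bits
    v_res = max_residual_amplitude(hd3, n_slope, u_t)
    g_m = i_peak / v_res                 # assumed: residual voltage ~ i_peak / g_m
    return g_m * 2.0 * n_slope * u_t     # weak inversion: g_m = I_B / (2 n U_T)
```

Note that n and U_T cancel in the final current, I_B = i_peak/√(12·HD3), which illustrates why the weak-inversion expression can be made compact while the strong-inversion one cannot.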
The noise power of each integrator is allocated based on an exploration of the noise power distribution across the converter. A fraction $R$ of the noise power of the previous integrator is allocated to the next one in the loop,

$$P_{n,i+1} = R\,P_{n,i},$$

and the value of $R$ (smaller than one) is chosen to minimize the total power consumption. Fig. 12 illustrates the dependence of the power consumption on the noise power allocation through the ratio $R$. The top curve is the power (expressed as supply current) consumed for the total $g_m$ of a fifth-order, three-loop, 2-2-1 architecture with 5 b in the first loop and 3 b in the other two loops. It operates at an OSR of 16 with a signal bandwidth of 2 MHz, as in [4]. Note the 25% reduction in current consumption obtained by optimizing $R$, and that the minimum current of 14 mA correlates well with the reported consumption of 36 mA, given that folded-cascode amplifiers are used [2]. The second curve in Fig. 12 is the current consumption of a fourth-order, one-bit, single-loop ΣΔ ADC for audio applications (20-kHz bandwidth). The supply current of 200 µA estimated in the best case also matches the design reported in [2].
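The one-dimensional sweep over R can be sketched as below. The cost model is hypothetical: each integrator is assumed to draw a fixed minimum current plus a current proportional to its capacitance, and the capacitance is assumed to scale as the inverse of its noise budget multiplied by the squared gain from the converter input to that integrator. Only the geometric allocation P_{i+1} = R·P_i is taken from the text above.

```python
def geometric_allocation(total_noise, n_int, r):
    """Split an input-referred noise budget so that P[i+1] = r * P[i] and sum(P) = total_noise."""
    weights = [r ** i for i in range(n_int)]
    p1 = total_noise / sum(weights)
    return [p1 * w for w in weights]

def total_current(alloc, gains, k_cap, i_min):
    """Hypothetical cost model: per integrator, i_min plus a current proportional to a
    capacitance that scales as 1 / (input-referred noise * gain^2 from the input)."""
    return sum(k_cap / (p * g ** 2) + i_min for p, g in zip(alloc, gains))

def best_ratio(total_noise, gains, ratios, k_cap=1.0, i_min=0.0):
    """Sweep candidate ratios R and return (best R, its total current)."""
    costs = [(total_current(geometric_allocation(total_noise, len(gains), r),
                            gains, k_cap, i_min), r) for r in ratios]
    cost, r = min(costs)
    return r, cost
```

With input-to-integrator gains of 1, 10, and 100 and candidate ratios 0.05, 0.1, 0.2, 0.5, and 1.0, this model selects R = 0.1, illustrating why an optimum below one exists when later integrators see larger preceding gains.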
D. Yield-Based Optimization
From the remaining architecture-level solutions pool, the ten best-performing ones are selected based on their ratio of peak SNDR to power consumption. Only these selected solutions are passed to time-consuming, advanced behavioral simulations, which are used to find the limits for further circuit-level parameters, such as amplifier dc gains, clock jitter for CT DACs, or multibit DAC mismatch. The "top-ten" solutions are then passed to an even more time-consuming Monte Carlo analysis, which varies the ΣΔ ADC coefficients using a user-supplied distribution. For CT designs, a process-induced spread (as large as 35% in CMOS processes) is also considered from the early design stages, since it reduces the available integrator output range. For each integrator coefficient, the user can specify a spread (initial accuracy) value along with the coefficient-to-coefficient mismatch value. A minimal capacitor and a maximal resistor are specified that guarantee this coefficient-to-coefficient mismatch in a given IC process. A few hundred Monte Carlo simulation steps are run for each of the top-ten solutions, and the performance tests explained in Section III-B are applied. Single-loop solutions can be designed from the early stages to give 100% yield with relaxed matching requirements. This involves placing the in-band quantization noise power sufficiently below the circuit (white) noise that the latter dominates the in-band noise floor. This margin should be large enough to guarantee the yield under process variations, as shown in Fig. 13. With a noise margin of only 1 dB, the yield of the single-loop design can drop to 70% when the coefficient mismatch is 5%. Even with a 1% mismatch, the yield is only 88%, below the lowest yield considered acceptable at this abstraction level, which is 90%.

Fig. 13. Yield of single-loop solutions for different noise margins.
For cascaded solutions there is another parameter that can be used to optimize their yield. The number of bits in the first loop in the cascade is increased for solutions which do not attain 90% yield and another Monte Carlo yield analysis iteration is started for the improved solution. If the maximal number of bits is reached and the yield is still not large enough, the solution is dropped.
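The Monte Carlo yield loop and the first-loop bit increase for cascades can be sketched as follows. The sketch assumes multiplicative coefficient errors made of one Gaussian spread term shared by all coefficients and an independent Gaussian mismatch term per coefficient, with a default of 300 runs standing in for "a few hundred"; `passes`, `design_for_bits`, and `passes_for` are hypothetical caller-supplied callbacks standing for the redesign step and the performance tests of Section III-B.

```python
import numpy as np

def mc_yield(coeffs, passes, spread=0.0, mismatch=0.01, runs=300, seed=0):
    """Fraction of Monte Carlo samples of the loop coefficients that pass `passes`."""
    rng = np.random.default_rng(seed)
    c = np.asarray(coeffs, dtype=float)
    n_ok = 0
    for _ in range(runs):
        common = rng.normal(0.0, spread)            # process spread shared by all coefficients
        local = rng.normal(0.0, mismatch, c.shape)  # coefficient-to-coefficient mismatch
        if passes(c * (1.0 + common + local)):
            n_ok += 1
    return n_ok / runs

def optimize_cascade_yield(design_for_bits, passes_for, first_bits, max_bits,
                           target=0.90, **mc_options):
    """Raise the first-loop resolution until the yield target is met; None means drop."""
    for b in range(first_bits, max_bits + 1):
        y = mc_yield(design_for_bits(b), passes_for(b), **mc_options)
        if y >= target:
            return b, y
    return None
```

Returning None corresponds to dropping the solution once the maximal number of bits is reached without attaining the 90% target.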
IV. EXPERIMENTAL RESULTS
The algorithms described so far were implemented in a combination of C and Fortran and compiled on Linux computers. The filter-level code has a total of 3700 lines. The architecture-level code has 5000 lines and also makes heavy use of functions written in the filter-level code. Since no user interface is bound to the code, porting it to other Unix computers is straightforward. Two examples are presented to show the effectiveness of global optimization through exhaustive design-space exploration. The first is an audio ΣΔ ADC supplied from 1.5 V with rail-to-rail input and a 1.5-V reference voltage. A designed circuit has been reported [2] that consumes 0.95 mW (analog part only, without voltage reference buffers) for a DR of 98 dB and a peak SNR of 89 dB at a signal bandwidth of 20 kHz. The second is a ΣΔ ADC for xDSL applications, supplied from 2.5 V, also with rail-to-rail input and a reference voltage equal to the supply voltage. The signal bandwidth is 2 MHz. A design has been reported [4] that consumes 90 mW in the analog circuits to attain 95 dB DR and 90 dB peak SNR.
The designs mentioned above are state-of-the-art examples. The results presented here show that an architectural exploration program can find architectures other than those reported above that offer better figures of merit (FOM), but that the published designs are still among the best options. The FOM that we used is defined as the ratio of peak SNDR to the power consumption (assuming one-stage amplifiers).
The yield optimizations have been conducted assuming a 1% (fair capacitor/capacitor matching) coefficient-to-coefficient mismatch for single-loop architectures and a 0.5% (good capacitor/capacitor matching) for cascaded architectures, based on the properties of the technologies in the two reference designs [2] and [4] .
A. Audio Delta-Sigma ADC
The search for an optimal audio ΣΔ ADC was first performed in the entire design space to find the global optimum. A power-based histogram of the complete set of possible solutions is shown in Fig. 14. It shows that, of all 207 solutions that meet the target specifications, more than half have a power consumption around and a large number of bits in the first loop, which increases the overloading level. Since more complex quantizers and DACs add no power penalty in the simplified power model used in our tool, the FOM of these solutions is also higher.
In order to compare the results with the reported state-of-the-art solution, restricted sets in terms of number of loops and number of bits have been analyzed. Fig. 16 contains the optimization results for a set of solutions restricted by the number of loops . The best solution is the third-order, 4-b loop, again working at OSR . This solution is also remarkable for its low number of bits compared to the other top performers. It shows that, for audio frequencies, a high OSR is still a good option.
Further restricting the design space to 1-b single-loop architectures yields only the state-of-the-art fourth-order, single-loop solution with OSR reported in [2]. The 1-b DAC is often preferred for its inherent linearity, but it requires a larger OSR. The solution is chosen from a set of four possible solutions, three of which do not pass the yield test. The bandwidths of the one-stage operational amplifiers used here in the power model match those reported in [2], but the power consumption reported in [2] is larger since the first integrator contains a two-stage amplifier.
Note that the ability of the exploration tool to let the designer incrementally constrain the search space adds to its flexibility in practical use.
B. Delta-Sigma ADC for xDSL Applications
The results of the global optimization for a 4-MS/s ADC are shown in Fig. 17 .
Again, a third-order solution with OSR and 4-b quantizer has good performance, but most of the solutions operate at 16 times oversampling. They also have a large number of bits in the (first-loop) quantizer to attain high overloading levels.
To avoid solutions like the ones requiring 32 times OSR, the designer can conduct the search in a limited space, for example for (ORDER , OSR ). Furthermore, the number of bits can be limited, for example, to 6 to keep a low DAC complexity. The solutions for this search are shown in Fig. 18 . They are all cascaded ADCs except two which, even with their low FOM, can be good choices for low-voltage, mismatch-tolerant designs.
The best are the three-loop designs with a 2-2-1 configuration. The state-of-the-art solution reported so far [4] is among them, (2-2-1, 5-3-3) on the axis, with an FOM of 101. During the initial optimization stages, only 3 b were needed in the first loop, but yield optimization reached the 5-b solution reported in [4], the increase being needed to accommodate capacitor/capacitor mismatch effects. The supply current needed for the total $g_m$ is found here to be 14 mA, which correlates well with the reported analog supply current of 36 mA, considering that folded-cascode amplifiers are used in the reported design.
V. CONCLUSION
An exhaustive architectural design-space exploration algorithm for the global optimization of ΣΔ ADC designs has been presented. The algorithm examines all possible architectural solutions in two steps to increase overall efficiency. The large total number of possible solutions is first explored by the fast filter-level design step, which rejects those that cannot attain the required dynamic range. A reduced set of solutions is then forwarded to the architecture-level design step, which evaluates their behavior by time-domain simulations. The power consumption of each solution is estimated, and the ten most promising solutions are subjected to the time-consuming yield analysis. Generating the entire set of possible solutions for a particular set of specifications takes approximately 24 h on a 1-GHz computer. Yield analysis can be run on subsets of solutions to find only those architectures that have certain required properties. Experimental results show that previously published state-of-the-art designs are among the best but not necessarily the best, and that our method returns other, sometimes better, architectural solutions as well.
