Ramón Tortosa (email: castro@imse.cnm.es) was with CSIC-IMSE-CNM, Sevilla, Spain, and is currently with Analog Devices, Ireland.
I. Introduction
The ever-shrinking minimum feature size of CMOS technologies has triggered a revolution in integrated designs, from application-specific integrated circuits to entire systems on a single chip. Nevertheless, a critical design productivity lag has been reported [1]. With a productivity growth rate of 21%, compared to a 58% complexity growth rate, design cost is increasing rapidly. Given ever-increasing time-to-market pressures, this picture is clearly worrisome. For analog and/or mixed-signal design the situation is even worse for many reasons, the most significant being the lack of commercial computer-aided design tools and methodologies that efficiently support analog design. In spite of their advantages, continuous-time (CT) ΣΔ modulators are more sensitive than discrete-time (DT) modulators to some circuit errors, namely, clock jitter, excess loop delay, and technology parameter variations [2]. The latter are especially critical for the realization of cascaded architectures. This explains the use of single-loop topologies in most reported silicon prototypes [4]-[6]. Although single-loop CT topologies have potentially lower sensitivity to technological process variations than cascade CT topologies, the possibility of avoiding stability problems in the latter makes them especially appealing for high-resolution, high signal bandwidth operation.
Most developments in systematic design methods and tools have focused on DT ΣΔ modulators [7]-[13], probably due to their widespread use, but also due to their easier design. Early developments of methods and tools for CT ΣΔ modulators have addressed specific problems, such as efficient behavioral simulation [12], [14], [15] or topological synthesis by discrete-time to continuous-time transformation [16], possibly with coefficient scaling using simulation of ideal modulators [17].
This paper introduces a complete design methodology to assist the designer in the implementation of CT cascade ΣΔ modulators. The main components of this systematic methodology, introduced in section II, are the following:
a) Performance modeling of dominant error sources at the modulator level (described in section III)
b) A high-level topological synthesis method directly in the continuous-time domain (described in section IV)
c) An efficient behavioral simulator with variable levels of modeling accuracy for architectural synthesis, specification transmission, and hierarchical verification (described in section V)
d) A global and local optimization core for topology exploration and specification transmission (described in section II)
e) Specification transmission driven by bottom-up information flow in the form of Pareto-optimal fronts (described in section VI)
The design of a 12-bit 20 MHz CT ΣΔ modulator in a 1.2 V 130 nm CMOS technology is used in section VII as an illustrative example of each step.
II. Systematic Design Methodology
Synthesis of high-speed CT ΣΔ modulators is a complex task which requires systematic design methods and customized tools. The objective of the synthesis process is to design a CT ΣΔ modulator able to meet the performance specifications, with minimum power consumption and minimum occupation of silicon area.
The synthesis procedure is schematically shown in Fig. 2 . In the three main stages of this design flow, design space exploration and specification transmission rely on the interaction of some kind of performance evaluator (such as equations and behavioral simulation with models at some level of abstraction) with an optimizer. The cornerstones of this process are an adequate formulation of a cost function, which quantifies the degree of compliance of the design with the targeted performance; a fast yet accurate method to evaluate the cost function; and an efficient technique to generate the next movement over the design space.
The optimization core used has two steps. In the first step, global optimization techniques are applied, whereas deterministic techniques are applied for local optimization in the second step [18] . Our experience is that adaptive simulated annealing algorithms are more efficient for global optimization addressing some specific design constraints, whereas other popular global optimization algorithms, such as evolutionary algorithms, are more powerful to explore trade-offs between performance specifications.
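The two-step strategy described above can be illustrated with a minimal, self-contained sketch. It is not the authors' optimization core: plain simulated annealing with a linear cooling schedule stands in for adaptive simulated annealing, a coordinate pattern search stands in for the deterministic local step, and the function name `anneal_then_refine`, the step sizes, and the toy cost function are all assumptions made for illustration.

```python
import math
import random


def anneal_then_refine(cost, x0, bounds, n_iter=2000, t0=1.0, seed=0):
    """Global stochastic search followed by a deterministic local refinement."""
    rng = random.Random(seed)
    x, fx = list(x0), cost(x0)
    best, fbest = list(x), fx

    # Step 1 (global): simulated annealing with a linearly decreasing temperature.
    for k in range(n_iter):
        temp = t0 * (1.0 - k / n_iter) + 1e-9
        cand = [min(hi, max(lo, xi + rng.gauss(0.0, 0.1 * (hi - lo))))
                for xi, (lo, hi) in zip(x, bounds)]
        fc = cost(cand)
        # Always accept improvements; accept worse points with a probability
        # that shrinks as the temperature decreases.
        if fc < fx or rng.random() < math.exp((fx - fc) / temp):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx

    # Step 2 (local): deterministic coordinate pattern search around the best point.
    step = [0.05 * (hi - lo) for lo, hi in bounds]
    while max(step) > 1e-6:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step[i], -step[i]):
                trial = list(best)
                trial[i] = min(hi, max(lo, best[i] + delta))
                ft = cost(trial)
                if ft < fbest:
                    best, fbest, improved = trial, ft, True
        if not improved:
            step = [s / 2.0 for s in step]  # shrink the pattern when stuck
    return best, fbest


def toy_cost(x):
    # Hypothetical two-variable cost with its minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, fbest = anneal_then_refine(toy_cost, [4.0, 4.0], [(-5.0, 5.0), (-5.0, 5.0)])
```

In a real sizing flow the cost evaluation would call an equation set or a simulator rather than a closed-form toy function, so the number of cost evaluations, not the optimizer bookkeeping, dominates the run time.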
The optimization problem is mathematically stated as
where y_oi(x) stands for the value of the i-th design objective (for example, the power consumption to be minimized); y_rj(x) is the value of the j-th design constraint (for example, an SNR larger than 70 dB); Y_rj is the targeted value of that design specification; and x is the vector of design variables. Design objectives, constraints, and variables depend on the optimization task at hand. For instance, block non-idealities (such as amplifier gain) are design variables for high-level sizing, but they are design constraints for circuit-level sizing. It is important to highlight the difference between a constraint and a design objective. Constraints define the set of valid designs (also called the feasible design space), whereas design objectives, such as power consumption or area occupation, characterize the optimality of the design and show the trade-off between valid solutions. The sizing engine carries out the optimization by using a single cost function. For those points of the design space that do not satisfy the design constraints, the cost function is defined as
where w j is the weight associated to the j-th constraint. For those points of the feasible design space, the cost function is defined as ( ) ( ) log( )
where w_i is the weight associated with the i-th design objective. The inputs to the architectural synthesis stage (see Fig. 2) are the required performance specifications of the CT ΣΔ modulator and the technology process information. The methodology starts with an architectural exploration, which seeks candidate architectures, each defined by the order of the modulator L, the number of bits of the quantizer(s) B, and the oversampling ratio M, that allow a given SNR specification to be met. This architectural exploration is performed by using analytical expressions that model the dominant error sources limiting the achievable SNR, in combination with the optimization core previously outlined. The modeling of these error sources is discussed in section III. The output of this architectural exploration is a set of candidate architectures that can potentially meet the modulator performance specifications.
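The single cost function with separate definitions for infeasible and feasible points, described earlier in this section, can be sketched as follows. The functional forms here are assumptions, not the authors' expressions: violated constraints are penalized through a weighted logarithm of the target-to-value ratio, feasible points are ranked by a weighted sum of logarithms of the objectives, and a hypothetical constant `INFEASIBLE_OFFSET` keeps every infeasible point above any feasible one.

```python
import math

# Assumption: large enough that no feasible cost in the problem at hand exceeds it.
INFEASIBLE_OFFSET = 1.0e3


def cost(objectives, obj_weights, constraints, con_weights):
    """Two-regime cost.

    objectives:  positive values to be minimized (e.g. power in watts).
    constraints: (value, target) pairs, satisfied when value >= target > 0.
    """
    violations = [w * math.log(target / value)
                  for (value, target), w in zip(constraints, con_weights)
                  if value < target]
    if violations:
        # Infeasible region: only constraint violations drive the search.
        return INFEASIBLE_OFFSET + sum(violations)
    # Feasible region: only the design objectives drive the search.
    return sum(w * math.log(y) for y, w in zip(objectives, obj_weights))


# Hypothetical design points: (power in W) with an SNR constraint of 70 dB.
feasible = cost([0.07], [1.0], [(72.0, 70.0)], [1.0])
infeasible = cost([0.05], [1.0], [(65.0, 70.0)], [1.0])
```

With this construction a lower-power design that misses the SNR target always ranks worse than any design that meets it, which mirrors the distinction between constraints and objectives made in the text.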
The following step is the topological synthesis, that is, the definition of the cascade architecture, the intra- and inter-stage loop filter transfer functions, and the cancellation logic functions. A direct synthesis method in the CT domain is used here instead of the more conventional DT-to-CT transformation of an equivalent DT topology. The direct topological synthesis method is described in section IV.
The input to the high-level sizing stage is the structural description of the topology being synthesized. The subsequent automated sizing process uses the behavioral simulator (see the non-idealities considered in section V) together with global optimization procedures to find out the maximum values of non-idealities of the different building blocks that can be tolerated while still meeting the modulator performance specifications. At this level, power consumption estimates are much more detailed as relationships with each building block specifications can be established [7] . The modulator performance with the transmitted building block specifications is then verified under all operating conditions (process, temperature, and supply variations) by using the behavioral simulator (section V). If this verification shows that some performance specification degrades beyond certain limits, the high-level synthesis and/or the architectural synthesis are performed again under harder constraints.
The last step of the synthesis procedure is the sizing of the building blocks. The inputs to the circuit-level sizing stage are the performance requirements for each building block (for instance, DC gain and bandwidth of amplifiers, or hysteresis and offset for comparators). This sizing is performed by combining an electrical simulator with the global optimization procedure previously outlined [18]. The implementation of the optimization core is flexible enough to incorporate valuable design knowledge of each building block. At the optimization level, this design knowledge delimits the feasible design space, which narrows the exploration space and makes the synthesis process more efficient, thereby improving the optimization results.
With all blocks sized, a final verification of the complete modulator at the electrical level at a limited number of operating conditions is performed, namely, at the nominal point and a few critical process corners. This verification is complemented by a more exhaustive verification (all process, temperature, and supply variations) at the behavioral level with information extracted at the electrical level. Performance degradations beyond tolerable margins induce redesign iterations at the circuit and/or modulator levels.
III. High-Level Performance Modeling and Architectural Exploration
As shown in section II, design space exploration and specification transmission rely on the iterative interaction between a global optimizer and a fast performance evaluator. At a high level of abstraction, modulator performance is modeled by a set of closed-form equations, which are relatively inaccurate but carry essential information on the design parameters dominating the system behavior. The signal to noise ratio of a ΣΔ modulator is given by
where A represents the magnitude of the input signal, and P ε represents the in-band error power. Ideally, the in-band error power only contains the quantization noise P εq :
Here, X_FS is the full-scale range of the quantizer, B is the number of bits of the quantizer, f_s is the sampling frequency, NTF(·) is the noise transfer function, and B_W is the signal bandwidth. However, in practice, the error power contains terms due to quantization error power enlargement, digital-to-analog converter (DAC) non-linearities, capacitor mismatching, thermal noise, clock jitter, finite amplifier gain, incomplete amplifier settling, and so on. Therefore, the in-band error power becomes

$$P_{\varepsilon} = P_{\varepsilon q} + P_{\mathrm{thermal}} + P_{\mathrm{jitter}} + P_{\mathrm{DAC}} + P_{\mathrm{settling}} + \cdots \qquad (3)$$
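As a point of reference, the textbook forms of the SNR and of the ideal in-band quantization noise power, written with the symbols defined above, are sketched below. They assume a sinusoidal input of amplitude A and a white, uniformly distributed quantization error with step X_FS/(2^B − 1); the exact expressions and normalization used by the authors may differ.

```latex
\mathrm{SNR} = 10\log_{10}\!\left(\frac{A^{2}/2}{P_{\varepsilon}}\right),
\qquad
P_{\varepsilon q} \approx \frac{X_{FS}^{2}}{12\,\left(2^{B}-1\right)^{2}}\,
\frac{2}{f_s}\int_{0}^{B_W}\left|NTF(f)\right|^{2}\,df
```

The factor 2/f_s reflects a quantization error power spread uniformly over the interval from −f_s/2 to f_s/2 and integrated over both the positive and the negative signal band.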
Unlike other types of ΣΔ modulators, a dominant error source in high-speed CT modulators is the error power due to clock jitter. For this reason, closed-form modeling of the influence of jitter is of paramount importance. The error power due to clock jitter in CT ΣΔ modulators with non-return-to-zero (NRZ) DACs can be expressed as in [19] 
where A and ω i are the amplitude and frequency of the input signal, and
is a function arising from the state-space representation of the noise transfer function of the modulator and depends on the modulator order. It can be seen that the jitter error power in (4) has two terms. The first term depends on the modulator input and decreases with the sampling frequency, and the other depends on the modulator architecture and increases with the sampling frequency. For illustration's sake, Fig. 3 represents the two terms in brackets in (4) for a 5-bit third-order modulator with a 20 MHz input frequency as a function of the sampling frequency. Notice that there is a frequency range dominated by the signal-dependent term and another range dominated by the modulator-dependent term. Therefore, there is an optimum sampling frequency which minimizes the in-band jitter noise power and, hence, maximizes the SNR.
The use of the dominant error power terms in (3) (shown in (2) and (4)) in combination with the optimization core allows candidate architectures to be extracted (each represented by a triad of values of order, number of bits, and oversampling ratio {L, B, M}) with better performance in terms of distribution of the noise transfer function (NTF) zeroes and insensitivity to clock jitter.
Usually, several triads are retained for later stages, for several reasons. First, the modeling equations are very approximate and, therefore, there is no guarantee that the selected architecture will continue to meet the performance specifications when more accurate models containing the non-idealities of the particular physical implementation are used. Second, the optimal architecture is the one that, while meeting the performance constraints, minimizes objectives like power consumption or area occupation. Exploration criteria at the architectural level include considerations like order minimization, minimization of the oversampling ratio to avoid sampling frequencies that are infeasible in terms of power consumption, and minimization of the number of bits in the quantizer to avoid the use of linearization techniques [4], [19]. The power or area minimization criteria that can be considered at this level are therefore qualitative in nature, and any ranking of candidate architectures may change significantly when progressing through the synthesis process. However, this is not very critical at this stage because the desired result is just a set of candidate topologies, which will be pruned when more detailed models are considered in subsequent design steps.
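The extraction of candidate {L, B, M} triads can be illustrated with a brute-force enumeration. The sketch below is an assumption-laden stand-in for the authors' equation set and optimizer: it uses only the textbook ideal peak-SNR expression for an L-th order modulator whose NTF zeros all sit at DC, it ignores jitter and every other error term in (3), and the function names are hypothetical. Because of the all-zeros-at-DC assumption, it is pessimistic with respect to NTFs with optimally distributed zeros.

```python
import math


def ideal_snr_db(order, bits, osr):
    """Ideal peak SNR (dB) of an L-th order, B-bit modulator with NTF zeros at DC."""
    levels = 2 ** bits - 1
    ratio = (1.5 * (2 * order + 1) * levels ** 2
             * osr ** (2 * order + 1) / math.pi ** (2 * order))
    return 10.0 * math.log10(ratio)


def candidate_triads(target_db, orders, bit_range, osrs):
    """Return every (L, B, M) triad whose ideal SNR meets the target."""
    return [(L, B, M)
            for L in orders for B in bit_range for M in osrs
            if ideal_snr_db(L, B, M) >= target_db]


# Hypothetical search grid and a 74 dB (about 12-bit) target.
candidates = candidate_triads(74.0, [3, 4, 5], [3, 4, 5], [6, 8, 10, 12])
```

Replacing `ideal_snr_db` with a model that also includes the jitter term would turn this enumeration into a search with an optimum sampling frequency, as discussed above.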
IV. Topological Synthesis
Cascade CT ΣΔ modulator architectures are usually synthesized by first synthesizing a ΣΔ modulator with the same performance specifications in the DT domain and then applying a DT to CT transformation that keeps the same digital cancellation logic [16] . However, obtaining a functional CT modulator from this transformation and keeping the cancellation logic requires every state variable and DAC output to be connected to the integrator input of subsequent stages as Fig. 4 shows for a 2-1-1 architecture. This means a larger number of analog components (transconductors and amplifiers), which translates into larger area, higher power consumption, and higher sensitivity to circuit tolerance.
To avoid this, we have developed a synthesis method that works directly in the continuous-time domain, which is presented in this section. Let us consider the general case of a cascaded CT ΣΔ modulator with m stages as shown in Fig. 1. Let us denote the transfer function from y_i(s) to the input of the j-th quantizer as F_ij(s).
The synthesis method starts by optimally placing the poles of the single-loop transfer functions F_ii(s) at the positions which minimize the NTF in the signal bandwidth [21]. Their numerators are obtained by combining behavioral simulation with the optimization core. Starting from the nominal values required to place the zeros of the corresponding NTF, the modulator performance is optimized in terms of dynamic range and stability. For this purpose, the numerator coefficients are varied in a range around their nominal values in order to maximize the SNR while maintaining stability. Then, the inter-stage transfer functions F_ij(s), i ≠ j, are automatically determined by the inter-stage integrating paths.
If the modulator input, x(t), is set to zero, it can be shown that the output of each stage y k (z) can be written as
where Z stands for the Z-transform and L^{-1} is the inverse Laplace transform. The output y_o of the modulator can be written as
where CL k (z) represents the partial cancellation logic transfer function of the k-th stage. The partial cancellation logic transfer functions can be calculated by imposing the cancellation of the transfer function of the first m-1 quantization errors E k (z) in (7). This gives
where the partial cancellation logic transfer function of the last stage, CL_m(z), can be chosen to be the simplest form that preserves the required noise shaping. By using this method, the 2-1-1 architecture in Fig. 5 is synthesized. The circuitry is significantly less complex than that shown in Fig. 4. Another positive consequence is a lower sensitivity to parameter tolerances.
V. Behavioral Modeling and Simulation
Specification transmission and verification require performance evaluation mechanisms with much higher accuracy than that provided by approximate equations such as (2) to (4). Moreover, this performance evaluation is frequently performed within an iterative optimization process; therefore, simulation efficiency is critical for the synthesis process.
Because ΣΔ modulators are strongly non-linear sampled-data circuits, simulation of their main performance specifications has to be carried out in the time domain. Due to their oversampling nature, long transient simulations are necessary to evaluate their main figures of merit. Therefore, transistor-level simulations yield excessively long computation times. An appropriate trade-off between simulation accuracy and efficiency is accomplished by using behavioral simulation. In this approach, the modulator is partitioned into sub-blocks (integrators, quantizers, and so on), which are modeled by a set of equations containing the main sub-block functionality and the most important non-idealities. For the implementation platform, we selected Matlab/Simulink [22].
Behavioral models of the continuous-time building blocks are described by a set of continuous-time state-space equations which are integrated by Matlab solvers. To increase simulation efficiency, we make extensive use of S-functions [23]. This mechanism allows non-idealities to be modeled by embedding C-code routines instead of interconnecting numerous Simulink elementary blocks. The basic building blocks modeled in the behavioral simulator, as well as their non-idealities, are summarized in Table 1.
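The idea of a behavioral block model, a state equation that carries the main functionality plus a dominant non-ideality, can be illustrated outside Simulink with a short Python sketch. It is an assumption-based toy, not the authors' toolbox: a first-order, single-bit CT loop with an NRZ DAC, whose only non-ideality is the finite DC gain of the integrator, and whose input is assumed constant within each clock period so that the one-state model can be integrated exactly instead of calling a numerical solver. The function name and parameter values are hypothetical.

```python
import math


def simulate_first_order(inputs, dc_gain=1000.0, k=1.0):
    """Return the +/-1 output sequence of a first-order single-bit CT loop.

    inputs:  one input value per clock period (held constant within the period).
    dc_gain: finite DC gain of the integrator (the modeled non-ideality).
    k:       integrator gain normalized to the sampling frequency.
    """
    leak = k / dc_gain               # pole of the lossy integrator (normalized)
    decay = math.exp(-leak)          # state decay over one clock period
    drive = dc_gain * (1.0 - decay)  # exact response to a constant input
    v = 0.0                          # integrator output (state variable)
    bits = []
    for x in inputs:
        y = 1.0 if v >= 0.0 else -1.0    # comparator decision at the clock edge
        bits.append(y)
        v = decay * v + drive * (x - y)  # NRZ DAC holds y for the whole period
    return bits


# A DC input of 0.3 (full scale is +/-1) should give a bitstream mean near 0.3.
bits = simulate_first_order([0.3] * 20000)
mean_output = sum(bits) / len(bits)
```

Lowering `dc_gain` in this toy model makes the integrator leak visible as a growing offset between the input and the bitstream mean, which is the kind of sensitivity a behavioral simulator lets the designer quantify quickly.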
The developed toolbox includes several libraries of CT building blocks (integrators and resonators) considering different circuit implementations: gm-C, gm-MC, active-RC, and MOSFET-C. Examples of the application of this behavioral simulator to synthesis and verification can be found in section VII.
VI. High-Level Sizing Using POFs
As stated in section II, the combination of the behavioral simulator in section V with the optimization tool allows the high-level sizing of the ΣΔ modulator to be efficiently performed, that is, getting the maximum non-idealities of the building blocks which can be tolerated while meeting the modulator specifications.
This high-level specification transmission has some drawbacks. First, there is no a priori information about the realizability of the building blocks with the transmitted specifications, and this may induce redesign iterations. Second, even if the specifications are realizable, the results may be suboptimal in terms of area or power consumption due to an inappropriate balance among the specifications demanded for different building blocks.
Specification transmission can be made more efficient if Pareto-optimal fronts of candidate architectures are available. These fronts represent trade-off hypersurfaces between the different types of circuit performance [24]-[26]. For illustration's sake, Fig. 6(a) shows the trade-offs between dc gain, gain-bandwidth (GB) product, and power for the folded-cascode operational amplifier shown in Fig. 7 in a 130 nm CMOS technology. This Pareto-optimal front was generated by combining a genetic algorithm [27] with an electrical simulator. The projection on the XY plane in Fig. 6(b) makes it easy to visualize the best trade-off between dc gain and GB that the circuit at hand can achieve. Pareto fronts usually have higher dimensionality, including all specifications of interest of the building block, which allows exploration of the trade-offs among them, including the power and area budget.
Although Pareto fronts can also be used to directly provide the sizes of the building blocks (each point in the Pareto front represents a design point), in our case, we have used them only to guarantee the feasibility of specifications and to provide estimates of area and power for higher hierarchical levels. The reason is that Pareto fronts are usually generated by using evolutionary algorithms, where a population of individuals evolves towards the best performance trade-offs. Therefore, the number of points of the Pareto fronts is necessarily limited [26] . Restricting the search space to those points would lead to suboptimal solutions; therefore, better results are obtained if circuit sizing using the statistical optimization techniques discussed in section II is applied with the circuit specifications obtained in the high-level sizing.
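The notion of a Pareto-optimal front used in this section can be made concrete with a small non-dominated filter. This is only a sketch of the definition, not the genetic algorithm of [27]: the design points are hypothetical numbers, and dc gain and GB are negated so that all three quantities are minimized.

```python
def dominates(p, q):
    """True when p is no worse than q in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical amplifier designs: (dc gain in dB, GB in MHz, power in mW).
designs = [(60.0, 200.0, 5.0), (70.0, 150.0, 5.0),
           (60.0, 150.0, 6.0), (75.0, 300.0, 12.0)]

# Gain and GB are to be maximized, so they are negated before filtering.
as_minimized = [(-gain, -gb, power) for gain, gb, power in designs]
front = pareto_front(as_minimized)
front_designs = [(-g, -b, p) for g, b, p in front]
```

Here the third design is discarded because the first one offers the same gain with a higher GB and lower power; the remaining three are mutually non-dominated and represent the trade-offs available to the high-level sizing.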
VII. Case Study
The target specifications for the CT ΣΔ modulator are 12 bits with 20 MHz signal bandwidth for a wireline communication application, to be implemented in a 130 nm CMOS technology. As a result of the different steps of the architectural exploration process, several fifth-order (L=5) cascade ΣΔ modulators were selected: 2-1-1-1, 2-2-1, and 3-2 topologies. Only the topology retained for the final synthesis steps is conceptually shown in Fig. 8(a). It is a 2-2-1 topology clocked at f_s = 240 MHz (M=6) with B=4 and NRZ DACs in all stages in order to minimize the effect of jitter.
The intra- and inter-stage transfer functions F_ij can be written as

[Equation (9): transfer functions F_ij(s) expressed in terms of the coefficients b_ij and the pole frequencies ω_p1 and ω_p2]

where ω_p1,2 denotes the optimal placement of the pole frequencies. Coefficients b_ij in (9) are found through a ...

[Equation: expressions involving the sampling period T_s]
where T_s = 1/f_s is the sampling period. Figure 8(b) shows the conceptual circuit implementation of the modulator. The results of the optimization process are summarized in Table 2, which includes the values of the loop filter coefficients, k_i (implemented as transconductances), and the capacitances, C_i, used in the modulator. The modulator was then high-level sized; that is, the system-level specifications (12 bits at 20 MHz) were mapped onto building-block specifications using global optimization for design parameter selection, Pareto-optimal fronts of the sub-blocks, and behavioral simulation for evaluation. The result of this sizing process is summarized in Table 3, showing the maximum (minimum) values of the non-idealities (at the building block level) that can be tolerated to fulfil the required modulator performance. Notice that not all non-idealities that were presented in Table 1 are listed in Table 3. In this table, only specifications for those non-idealities that have a significant impact on the modulator performance are collected. Thermal noise of amplifiers, for instance, is not critical for the design at hand (it is usually less critical than the integrator's thermal noise from the RC elements, whose in-band noise power, in this case, is around 85 dB). The building blocks, including a front-end opamp, loop filter transconductors, quantizers, and DACs, are designed by applying a cell-level sizing tool [18]. Due to space limitations, only the synthesis results of the front-end opamp sizing are shown here.
The schematic of the front-end operational amplifier used together with its common-mode feedback circuit is shown in Fig. 7. It is a fully differential folded-cascode topology with gain boosting. After a simulator-in-the-loop optimization process, the resulting sized circuit achieves the electrical performance shown in Table 4. A similar sizing is applied for the other building blocks.

Table 4. Electrical performance of the front-end opamp.
The modulator performance before and after layout generation has been extensively verified. Figure 9 shows the layout of the modulator. The total occupied area is 2.33 mm² (pads included) with a power dissipation of 70 mW from a single 1.2 V supply voltage. As an illustration, Fig. 10 shows the output spectrum for an input sine wave with -6.5 dBV amplitude and 1.76 MHz frequency. The maximum signal-to-(noise+distortion) ratio (SNDR) is 80 dB (about 13 bits). These results correspond to a full electrical-level post-layout simulation (thermal noise is included in the simulation). Due to the long simulation time, this type of verification is only feasible for a limited set of simulation conditions. A more exhaustive verification (including process, supply, and temperature variations) is performed by using the behavioral simulator with data obtained from the electrical simulation of the building blocks. This allows, for instance, the application of Monte Carlo simulation to evaluate the influence of mismatch on performance deviations. In the present case, even in the worst-case mismatch, a maximum SNDR larger than 74 dB is obtained. Finally, Fig. 11 shows the results from a two-tone input signal test. The performance degradation due to the third-order intermodulation distortion (IM3) is well below the required resolution.
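The resolution and oversampling figures quoted in this case study can be cross-checked with two standard relations: the effective number of bits for a sinusoidal input, ENOB = (SNDR − 1.76)/6.02, and the oversampling ratio M = f_s/(2·B_W). The helper names below are hypothetical; the numbers are those given in the text.

```python
def enob(sndr_db):
    """Effective number of bits for a full-scale sine wave and a given SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def oversampling_ratio(fs_hz, bw_hz):
    """Oversampling ratio M = fs / (2 * BW)."""
    return fs_hz / (2.0 * bw_hz)


m = oversampling_ratio(240e6, 20e6)  # clock rate and signal bandwidth of the design
enob_nominal = enob(80.0)            # post-layout maximum SNDR
enob_worst_case = enob(74.0)         # worst-case mismatch bound from Monte Carlo
```

These values agree with M=6, with the "about 13 bits" quoted for 80 dB, and with the 12-bit target still being met at the 74 dB worst-case bound.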
VIII. Conclusion
This paper has presented a complete design methodology supporting the design of CT ΣΔ modulators. An appropriate combination of design knowledge, behavioral simulation, synthesis methods, and optimization tools eases the complex design of these high-performance modulators.
