Abstract: Design of a high-speed capacitive digital-to-analog converter (SC DAC) in 65 nm CMOS technology is presented. An SC pipeline architecture is used, followed by an output driver. For GHz-frequency operation with an output voltage swing suitable for wireless applications (300 mVpp), the DAC performance is shown to be limited by clock feed-through and settling effects in the SC array rather than by capacitor mismatch or kT/C noise, which appear negligible in this application. As we show, it is possible to design a highly linear output driver with HD3 < -70 dB and HD2 < -90 dB over the 0.5 to 5 GHz band, yet the maximum SFDR of the SC DAC is 45 dB with 8-bit resolution and Nyquist sampling at 3 GHz. The capacitor array is designed based on a DAC design area defined in terms of the switch size and the unit capacitance value. A tradeoff between the DAC bandwidth and resolution, accompanied by SFDR, is demonstrated. High linearity of the output driver is attained by combining two techniques: derivative superposition (DS) and resistive source degeneration. In simulations, the complete DAC achieves an SFDR of 45 dB with 8-bit resolution for a signal bandwidth of 1.36 GHz with Nyquist sampling. With 6-bit resolution and 5.5 GHz bandwidth, an SFDR of 33 dB is attained. The total power consumption of the SC DAC is 90 mW with a 1.2 V supply and a 3 GHz clock frequency.
I. INTRODUCTION
On the path towards broadband connectivity in wireless telecommunication systems, increasingly high demands are placed on the performance and speed of data converters. Ever higher data rates require wider channel bandwidths and advanced DSP techniques such as OFDM or multi-bit QAM, which, in terms of design, translate into speed, dynamic range, and linearity specifications. Data converters implemented in CMOS technology have proven able to meet the broadband communication challenges, owing in particular to the speed of CMOS devices and to their low manufacturing cost.
Among D/A converters (DACs), current-steering architectures [1] [2] [3] [4] have recently received most of the attention owing to their simplicity and good performance. In contrast, capacitive DACs (SC DACs) have drawn little attention, except for two recent publications [5, 6].
In fact, SC DACs have some advantages over current-steering architectures [5, 6]. Specifically, capacitor arrays are easier to match, which makes SC DACs in principle more linear. For n-bit resolution, a pipeline SC DAC requires only n identical stages, as opposed to 2^n unit cells in a current-steering DAC. The power consumption is also low because their operation is based on charge redistribution. On the other hand, SC DACs are poor at driving an off-chip load, so a suitable output driver is necessary. The driver is critical for the linearity of a capacitive DAC unless the required output voltage swing is low. One solution to this problem is a closed-loop architecture in which the circuit linearity is improved by negative feedback [5]. This technique, however, does not work well at high frequencies (> 1 GHz). Another solution is an open-loop driver, which can be supported by interleaving and a pre-distortion technique [6]. However, pre-distortion is usually limited by the assumption that the nonlinear object has minimal memory effects.
In this paper we present the design of a simple pipeline SC DAC with a highly linear output driver in 65 nm CMOS technology. The main objective is to achieve the maximum data rate in this architecture. The circuit is optimized under the linearity and noise constraints imposed by the capacitor array and the output driver. We define the design area of the SC array in terms of the unit capacitance and the switch size for a given resolution and clock frequency. The SFDR and SNR analysis shows that the DAC bandwidth is limited by incomplete settling and feed-through effects rather than by kT/C noise and capacitor mismatch. A tradeoff between the DAC bandwidth and resolution, accompanied by SFDR, is demonstrated.
The DAC output driver is an open-loop wideband amplifier patterned after [22]. To attain high linearity, a combination of the derivative superposition [12, 14] and resistive source degeneration techniques is used. We estimate the driver nonlinearity using the Volterra series model [23]. The analysis, verified by simulation, shows that for signal bandwidths up to 5 GHz and an output voltage swing of 300 mVpp, HD3 better than -72 dB is attained, which corresponds to 12-bit resolution. Hence, the SC array, rather than the driver, is the dominant contributor to HD3 and SFDR of the SC DAC.
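The correspondence between a distortion level and a resolution used above can be sketched with the ideal-quantizer rule SQNR = 6.02n + 1.76 dB. This is a rough yardstick we assume for illustration (treating the |HD3| level as the dynamic range to be matched), not a formula taken from this paper, and the helper name `equivalent_bits` is hypothetical.

```python
def equivalent_bits(level_db: float) -> float:
    """Hypothetical helper: resolution n for which 6.02*n + 1.76 dB equals |level_db|.

    Uses the ideal full-scale-sine SQNR rule as a rough yardstick for
    mapping a distortion level (e.g. HD3 in dBc) onto a DAC resolution.
    """
    return (abs(level_db) - 1.76) / 6.02


if __name__ == "__main__":
    for hd3_db in (-72.0, -80.0):
        print(f"HD3 = {hd3_db:.0f} dB -> about {equivalent_bits(hd3_db):.1f} bits")
```

With this rule, -72 dB maps to about 11.7 bits (quoted as 12 bits) and -80 dB to about 13.0 bits.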
Simulation results of the complete SC DAC with 8-bit resolution and Nyquist sampling show an SFDR of 45 dB for a 1.36 GHz signal bandwidth. For 6-bit resolution and a signal bandwidth of 5.5 GHz, an SFDR of 33 dB is attained. The corresponding maximum data rate of 72 Gbps compares favorably with the state of the art.
The paper is arranged as follows. In Section II the pipeline SC DAC architecture and its operating principle are presented. Analysis of the SC array in terms of noise, capacitor mismatch, clock feed-through, and SC settling time is provided in Section III. Based on this, we introduce the DAC design area model to identify the maximum data rates and SFDR feasible in 65 nm CMOS technology. Comparisons with a possible 28 nm CMOS implementation are provided in the Appendix. In Section IV we present the output driver design, supported by a Volterra-model linearity analysis and a noise analysis. Section V provides simulation results of the complete SC DAC, followed by a discussion. Conclusions are drawn in the last section.
II. SC DAC ARCHITECTURE
The pipeline capacitive DAC architecture shown in Fig. 1 was first proposed in [18]. The 12-bit input is divided into 4 segments of three bits each. One clock cycle is needed to complete the 3-bit conversion in each 3-bit SC segment. The 3-bit DFFs are employed to synchronize the signal using a non-overlapping three-phase clock. The output driver serves as a buffer driving a standard off-chip load.

Figure 2. Three-bit switched-capacitor segment.

Figure 2 depicts the 3-bit SC segment, where a three-phase clock (Φ1, Φ2 and Φ3) is used. In the first SC segment, an extra switch is necessary to reset the voltage at node A in the pre-charge step. While Φ1 = 1, the capacitor at A is discharged to zero and the capacitor at B is either charged to Vref if $b_{1+k} = 1$ or discharged if $b_{1+k} = 0$. When Φ1 goes low and Φ2 goes high, the voltages at A and B become equal to $V_{ref}\,b_{1+k}/2$. The capacitor at node C is controlled by its data bit in the same way as the capacitor at B during Φ1. Next, when Φ3 = 1, the voltages at B and C reach $V_{ref}(b_{1+k}/2 + b_{2+k})/2$. Similarly, when Φ1 = 1 again, the voltages at C and D become $V_{ref}(b_{1+k}/4 + b_{2+k}/2 + b_{3+k})/2$. This is, in fact, the output voltage of the first SC segment, and the same process continues in the following segments.
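The charge-redistribution sequence described above can be checked with a few lines of Python. This is a minimal behavioral sketch, assuming ideal switches, equal unit capacitors and no parasitics; the function names are hypothetical. Each phase averages a freshly charged unit capacitor with the one holding the running voltage, so bits injected earlier are halved more often and end up as the less significant ones.

```python
from itertools import product


def sc_pipeline_output(bits, vref=1.0):
    """Ideal SC pipeline: bits are applied in injection order b1, b2, ..., bn.

    Each step shares charge between a unit capacitor pre-charged to
    vref*b and an equal unit capacitor holding the running voltage,
    so the node voltage becomes their average. Parasitics are ignored.
    """
    v = 0.0  # node A is reset to zero in the first segment
    for b in bits:
        v = (v + vref * b) / 2.0
    return v


def weighted_sum(bits, vref=1.0):
    """Closed form implied by the recursion: b_i carries weight 2**(i - n - 1)."""
    n = len(bits)
    return vref * sum(b * 2.0 ** (i - n - 1) for i, b in enumerate(bits, start=1))


if __name__ == "__main__":
    # One 3-bit segment, (b1, b2, b3) = (1, 0, 1): vref * (1/8 + 1/2)
    print(sc_pipeline_output((1, 0, 1)))
    # Exhaustive check for the 12-bit (4 x 3-bit) pipeline
    assert all(sc_pipeline_output(c) == weighted_sum(c)
               for c in product((0, 1), repeat=12))
```

All values involved are dyadic fractions, so the floating-point comparison is exact here.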
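When all parasitic capacitances of the circuit are neglected, the output voltage of the SC pipeline DAC depends linearly on the input data according to

$$V_{out} = V_{ref}\sum_{i=1}^{n} b_i\,2^{\,i-n-1}$$

where $b_1$ is the first bit injected into the pipeline (the LSB) and $b_n$ the last one (the MSB).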
III. SC ARRAY DESIGN AND ANALYSIS

Noise analysis:
The noise contribution of all switches to the output can be estimated using the superposition principle. Essentially, only the hold portion of the noise is considered, since the track noise is not transferred, except for the two switches in the last section of the pipeline. For switch Φ1 of the first section (Fig. 3):
Moreover, we observe that the transferred noise voltage remains on the output capacitor for only two thirds of the clock period; otherwise the capacitor is discharged. In the frequency domain, this is reflected by a sinc function that shapes the noise spectrum accordingly. Hence, the respective (one-sided) noise PSD contribution to the output at low frequency can be found from
The contribution of switch Φ10 is the same as that of Φ1. For the noise of the Φ2 switch, we find the incident capacitance to be Cu/2 and the respective held voltage to be shared by two Cu capacitors in series. The transfer function to the output is
For switch Φ3, only the transfer function is changed. For the last two switches in the pipeline (Fig. 4), their track noise should also be taken into account:
where, for the Φ1,last contribution, both the hold and the track time are 1/(3fs). For Φ3,last a resistance of 2Ron is assumed (two switches in series). The total hold noise of the floating switches Φ2, Φ3, Φ1, … can be estimated as
The total noise PSD at the output including the track noise can be found from
For practical design values, the hold noise (kT/C) in (7) matters, while the track noise term in (7) can be omitted. The estimate from (7) was verified against a circuit simulation model in SpectreRF. The simulation results are obtained by transient noise simulation followed by spectrum averaging at low frequencies. Pnoise analysis yields the same result.
Generally, the SC noise should be less than the quantization noise of the DAC, whose one-sided PSD can be calculated from

$$S_Q = \frac{1}{12\,BW}\left(\frac{V_{FS}}{2^{n}}\right)^{2}$$
where VFS is the full-scale voltage, n is the number of bits, and BW = fs/2. Then, using (7), we obtain the lower bound (10) on Cu due to kT/C noise. For example, for n = 12, VFS = 1.3 V and T = 290 K, (10) gives Cu > 191 fF. Observe that for the minimum value of Cu the total noise level of the DAC is doubled, so the SNR is 3 dB below the SQNR.
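The quantization side of this comparison, and why equal SC and quantization noise costs 3 dB, can be sketched as follows. This assumes an ideal quantizer with uniform error over one LSB (power VLSB²/12 spread over BW = fs/2); (7) and (10) are not reproduced, and the function names are hypothetical. The last helper additionally assumes that the kT/C bound scales as 1/VLSB², i.e. as 4^n at fixed VFS, and anchors it at the quoted 191 fF for n = 12.

```python
import math


def v_lsb(v_fs, n):
    """LSB voltage of an n-bit DAC with full-scale range v_fs."""
    return v_fs / 2 ** n


def quantization_noise_power(v_fs, n):
    """Total quantization noise power (V^2) of an ideal quantizer, VLSB**2 / 12."""
    return v_lsb(v_fs, n) ** 2 / 12.0


def quantization_noise_psd(v_fs, n, fs):
    """One-sided PSD (V^2/Hz) when that power is spread over BW = fs / 2."""
    return quantization_noise_power(v_fs, n) / (fs / 2.0)


def snr_penalty_db(noise_ratio):
    """SNR loss vs. SQNR when extra noise equals noise_ratio times the quantization noise."""
    return 10.0 * math.log10(1.0 + noise_ratio)


def ktc_bound_fF(n, ref_bound_fF=191.0, ref_bits=12):
    """Assumed scaling of the kT/C bound on Cu from the 12-bit point: proportional to 4**n."""
    return ref_bound_fF * 4.0 ** (n - ref_bits)


if __name__ == "__main__":
    print(f"VLSB = {v_lsb(1.3, 12) * 1e6:.1f} uV")
    print(f"quantization noise = {quantization_noise_power(1.3, 12):.3e} V^2")
    print(f"SNR penalty at equal noise = {snr_penalty_db(1.0):.2f} dB")
    for n in (7, 8, 11):
        print(f"n = {n:2d}: Cu > {ktc_bound_fF(n):.2f} fF (scaled, VFS = 1.3 V)")
```

Under this scaling assumption, the bound drops below 1 fF for n = 7 or 8, far below the unit capacitances of hundreds of fF considered later.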
Capacitor mismatch analysis:
Following the analysis in [21], a similar estimate of the voltage error due to capacitor mismatch can be obtained:
where VFS is the full-scale output range and VDNL is the differential nonlinearity of the DAC. The standard deviation σ(ΔCu/Cu) of the capacitor mismatch for a typical metal-insulator-metal (MIM) capacitor can be found from [24]

$$\sigma\!\left(\frac{\Delta C_u}{C_u}\right) = \frac{K_\sigma}{\sqrt{A}}$$
where A is the capacitor area, whereas Kσ (matching coefficient) and KC (capacitance density) are technology constants provided by the chip manufacturer. From (11) and (12), the standard deviation of the voltage error caused by capacitor mismatch is found as
To ensure DAC accuracy, it is common to keep 3σ(VDNL) below half an LSB. Hence, the lower bound on the unit capacitor due to mismatch in an n-bit DAC is
For the 65 nm CMOS process we have Kσ = 0.5% µm and KC = 5 fF/µm², which for a 12-bit DAC give Cu > 5 fF. As already shown, the limit due to kT/C noise calculated from (10) for VFS = 1.3 V is Cu > 191 fF, so it clearly prevails, and the limit due to capacitor mismatch can be neglected. To verify the estimates by simulation, we refer to the more accurate equations in [21]
The standard deviation of VDNL would be
Substituting (17) into (16) yields (18). For the STM 65 nm CMOS process, Kσ = 0.5% µm and KC = 5 fF/µm². Fig. 6 compares the standard deviation of VDNL obtained with model (18) and with SpectreRF simulations. The simulation results are obtained by statistical Monte-Carlo analysis (100 runs) focusing on capacitor variation only, with the switches assumed ideal. A very low clock frequency was used to verify the static VDNL with input data producing an MSB transition [21]. Using the usual 3σ criterion, 3σ(VDNL) should be less than half of VLSB. The limitation line shown in Fig. 6 is therefore one sixth of VLSB of a 12-bit DAC, which is 53 µV for VFS = 1.3 V. The VDNL standard deviations estimated from (18) are in the nanovolt range, and those from simulation are one order of magnitude lower. The simulated values are lower than the estimated ones because the calculation assumes the total mismatch variation to be maximal [21], whereas in the Monte-Carlo simulation the unit-capacitor mismatch is randomly distributed. Importantly, both values are orders of magnitude below the limitation voltage of the 12-bit DAC. Therefore the limit due to capacitor mismatch can be neglected, while the one due to kT/C noise prevails in this case.
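The two numbers used in this subsection can be reproduced from σ(ΔCu/Cu) = Kσ/√A with A = Cu/KC, and from the 3σ < VLSB/2 criterion. A minimal sketch using the quoted 65 nm constants; the function names are hypothetical, and model (18) itself is not reproduced.

```python
import math


def cap_mismatch_sigma(cu_fF, k_sigma_um=0.005, kc_fF_per_um2=5.0):
    """Relative mismatch sigma(dCu/Cu) = K_sigma / sqrt(A), with area A = Cu / KC.

    k_sigma_um = 0.005 encodes K_sigma = 0.5 % um; kc_fF_per_um2 is KC.
    """
    area_um2 = cu_fF / kc_fF_per_um2
    return k_sigma_um / math.sqrt(area_um2)


def dnl_limit(v_fs, n):
    """Largest allowed sigma(VDNL) under the 3-sigma < VLSB/2 criterion, i.e. VLSB/6."""
    return v_fs / 2 ** n / 6.0


if __name__ == "__main__":
    for cu in (5.0, 191.0):
        print(f"Cu = {cu:5.0f} fF: sigma(dCu/Cu) = {100 * cap_mismatch_sigma(cu):.3f} %")
    print(f"limit line for 12 bits, VFS = 1.3 V: {dnl_limit(1.3, 12) * 1e6:.1f} uV")
```

The limit line evaluates to about 52.9 µV, the 53 µV shown in Fig. 6.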
Clock feed-through effect:
To discuss the feed-through effect, we consider three groups of switches in the SC array (Fig. 2): data switches, charge/discharge switches, and charge redistribution switches. The clock feed-through caused by the data switches is not significant because there is always a low time-constant path to either Vref or GND. In contrast, the feed-through related to the charge switches changes the capacitor voltage when they are switched off, because the respective capacitors see a high impedance during this period. Since the switches are designed as transmission gates, this effect is inherently partially cancelled, because the rising and falling edges are applied at the same time to the PMOS and NMOS devices, respectively. This cancellation still requires correction by dummy transistors [19], since the PMOS devices are three times larger than the NMOS devices (to balance their resistance).
The charge redistribution switches require compensation of the feed-through effect not only when switching off but also when switching on, and therefore have a stronger effect on the transfer function and the DAC linearity. For this reason, the redistribution switches require more precise feed-through cancellation, which also entails an extra design constraint. Limiting the feed-through voltage to half an LSB, we have
where Cp(W) is the parasitic capacitance of a switch as a function of the switch size, while Kcomp is a constant representing the feed-through compensation by the dummy devices. This constant can be identified during optimization of the charge redistribution switches. Assuming that the total mismatch between the two single-ended branches of the differential topology (including mismatch of switches, capacitors, jitter, and skew) is Kdif, the following limit for Cu follows from (19):
Settling time analysis:
Due to imperfections of the transistor switches (finite Roff and nonzero Ron), leakage and incomplete settling in the capacitor array cause voltage errors that degrade DAC linearity [18]. Using the schematic in Fig. 2, we can consider the error at node B for Φ1 = 0 and Φ2 = 1. For an incident bit value of 1, a leakage current flows from Vref through Roff into the capacitance 2Cu seen from node B. In this case, the on-resistance of switch Φ2 can be neglected since Ron << Roff. The respective voltage change is given by (21), where Roff is the off-resistance of a charge switch, Ts = 1/fs is the clock period, and tpulse is the pulse width. Importantly, tpulse < 1/(3fs) to guarantee non-overlapping three-phase clock operation. To keep the voltage error (21) below half an LSB, we find the lower bound of Roff
Similarly, we can find the upper bound of the switch on-resistance. Fig. 7(b) shows the charge phase with the incident input bit equal to 1, for which the respective settling-time error can be found. Since (27) is more relaxed than (24), the upper bound for Ron is set by (24). The settling conditions (22) and (24) are stringent for a high-frequency clock. For illustration, consider fs = 3 GHz, tpulse = 50 ps (tpulse < 1/(3fs)), n = 7, and Cu = 200 fF. In this case the lower bound of Roff is 30 kΩ and the upper bound of Ron is 19.5 Ω. Fig. 8 depicts the suitable switch (transmission gate) sizes in 65 nm CMOS with channel length L = 0.06 µm that satisfy (22) and (24). However, for higher DAC resolutions the available design range is reduced, and for n > 8 it vanishes unless the half-LSB accuracy requirement is relaxed.
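Why these conditions tighten with n can be seen from a generic first-order settling estimate: settling within half an LSB takes (n + 1)·ln 2 time constants. This is a textbook single-pole sketch, not the paper's (22) and (24); converting the time constant into a bound on Ron would require the effective capacitance of Fig. 7(b), which we do not assume here. Function names are hypothetical.

```python
import math


def max_pulse_width(fs):
    """Upper limit on each phase pulse of a non-overlapping 3-phase clock: 1/(3*fs)."""
    return 1.0 / (3.0 * fs)


def time_constants_for_half_lsb(n):
    """Number of RC time constants for single-pole settling within half an LSB.

    Solves exp(-t/tau) < 2**-(n + 1) for t/tau.
    """
    return (n + 1) * math.log(2.0)


def max_time_constant(t_pulse, n):
    """Largest RC time constant that still settles to half an LSB within t_pulse."""
    return t_pulse / time_constants_for_half_lsb(n)


if __name__ == "__main__":
    print(f"1/(3 fs) at 3 GHz: {max_pulse_width(3e9) * 1e12:.1f} ps")
    for n in (7, 8, 9):
        tau = max_time_constant(50e-12, n)
        print(f"n = {n}: {time_constants_for_half_lsb(n):.2f} tau needed, tau < {tau * 1e12:.2f} ps")
```

With tpulse = 50 ps and n = 7, the time constant must stay below about 9 ps, and every extra bit removes another ln 2 of the budget.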
Bounds for unit capacitor:
Having mapped the resistances Roff and Ron onto the width of the switching devices, we can identify the functions Roff(W) and Ron(W) and redefine our design task by combining (22) and (24) into (28). In this case, the lower and upper bounds for Cu are defined in terms of the switch size W, the pulse time tpulse, and the number of bits n. In fact, tpulse also depends on W, because the parasitic capacitance of a switch and the driving source resistance reduce the pulse time. Estimating the rise/fall time trf from 10% to 90% voltage change, we find
where Rd is the driver resistance and Cg(W) is the gate capacitance with respect to the switch width W. Then tpulse time appears as
where tn is a latency time that guarantees a non-overlapping clock. The additional reduction by 4trf is due to the rise and fall times, while the Cg value is doubled due to a dummy switch (to compensate for the feed-through effect [19]). Based on this, (28) can be rewritten as
This result is illustrated in Fig. 9 for clock frequency fs = 3 GHz, tn = 30 ps, and n = 7, with solid lines representing the upper and lower bounds for Cu. The W-Cu design area is additionally limited by the feed-through condition (20), shown as the straight (green) lower bound. The conditions due to kT/C noise and capacitor mismatch are negligible in this case. For W < 570 µm the Cu lower bound due to Roff applies; for 570 µm < W < 780 µm the lower bound due to the feed-through limit applies.
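The dependence of tpulse on the switch size mentioned above (the 1/(3fs) window reduced by the latency tn and by four rise/fall times, with Cg doubled by the dummy switch) can be sketched as follows. This is a minimal sketch under two assumptions of ours: the budget has the form tpulse = 1/(3fs) - tn - 4trf, and trf is the 10%-90% time ln(9)·Rd·2Cg of a single RC pole. The driver resistance and the Cg values are hypothetical.

```python
import math


def rise_fall_time(r_d, c_g):
    """10%-90% edge of a single RC pole: ln(9) * R * C, with C = 2*Cg.

    Assumption: the dummy switch doubles the gate load seen by the driver.
    """
    return math.log(9.0) * r_d * (2.0 * c_g)


def pulse_time(fs, t_n, t_rf):
    """Assumed budget: tpulse = 1/(3*fs) - tn - 4*trf."""
    return 1.0 / (3.0 * fs) - t_n - 4.0 * t_rf


if __name__ == "__main__":
    fs, t_n = 3e9, 30e-12                    # values from the Fig. 9 example
    r_d = 50.0                               # hypothetical driver resistance (ohm)
    for c_g in (10e-15, 50e-15, 100e-15):    # hypothetical Cg(W) values (F)
        t_rf = rise_fall_time(r_d, c_g)
        print(f"Cg = {c_g * 1e15:5.0f} fF: trf = {t_rf * 1e12:5.1f} ps, "
              f"tpulse = {pulse_time(fs, t_n, t_rf) * 1e12:5.1f} ps")
```

A large enough gate load drives tpulse to zero or below, which is one way to see why a wide switch eventually closes the design area.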
The dashed lines show that the design area vanishes for 8-bit resolution (the upper bound falls below the lower bound). If the half-LSB accuracy requirement is relaxed, the design area can be restored. Interestingly, for 1-LSB accuracy and n-bit resolution the same design area is obtained as for (n-1)-bit resolution with half-LSB accuracy, so the design area shown in Fig. 9 for the 7-bit DAC is also valid for the 8-bit DAC provided the accuracy constraint is changed to 1 LSB. According to (31), the design area also shrinks considerably with increasing clock frequency. This is illustrated in Fig. 10.
In this case, the feed-through bound is unchanged, but the other two bounds due to (31) move down, largely reducing the design area.
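The equivalence between 1-LSB accuracy at n bits and half-LSB accuracy at (n-1) bits follows directly from the LSB size at a fixed full-scale range VFS:

```latex
\frac{1}{2}\,\mathrm{LSB}_{n-1}
  = \frac{1}{2}\cdot\frac{V_{FS}}{2^{\,n-1}}
  = \frac{V_{FS}}{2^{\,n}}
  = 1\,\mathrm{LSB}_{n}
```

Any bound written as "error below half an LSB of an (n-1)-bit DAC" is therefore identical to "error below one LSB of an n-bit DAC".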
To increase the speed of this SC DAC, we have to reduce its resolution. For example, when the clock frequency is increased to fs = 12 GHz, a convenient design area for a 6-bit DAC is available, as shown in Fig. 11. Then, with the corresponding Nyquist bandwidth < 6 GHz, the estimated maximum data rate amounts to 72 Gbps.
In the analysis and simulations, the reference voltage was assumed to be an ideal source. However, previous work such as [4] has shown that the reference voltage can, for example, be generated by an on-chip low-dropout regulator (LDO).
In principle, the tradeoff between resolution and bandwidth (or data rate) in this design can be moved towards higher values of both by using newer CMOS technologies. For illustration, in Appendix C we show the DAC design areas for a Fully Depleted Silicon-on-Insulator (FD SOI) 28 nm CMOS process, which enables a significant increase in resolution and bandwidth. Clearly, the tradeoff between them still exists. However, other features, such as better element matching compared to current-steering solutions, make SC DACs even more appealing.
IV. OUTPUT DRIVER DESIGN
Problem and solution:
The output voltage of high-speed DACs for wireless and wired LAN communications can also exceed 300 mVpp. In practice, this voltage has to be treated as a large signal, as in the power amplifier case. Several linearization techniques are well established in power amplifier design [8]. Predistortion, one of them, has already been applied in a high-speed DAC to compensate for the non-linearity of its output driver [6]. Using a main and an auxiliary amplifier/driver for non-linearity cancellation [9, 10] is another possible solution. As additional inspiration, the techniques developed for low noise amplifiers (LNAs) can also be considered [11] [12] [13] [14] [17]. In particular, the third-order distortions can be cancelled using various linearization techniques, such as negative feedback, harmonic termination, optimum biasing, feed-forward, or derivative superposition (DS).
In this design, we target high SFDR and bandwidth (> 1 GHz), which makes the driver design challenging. Fig. 12(a) shows a circuit suitable for third-order non-linearity cancellation by the DS method, which has been recognized in both small- and large-signal applications, such as LNAs [12] and power amplifiers [16], respectively. This technique is based on cancelling the third-order transconductance g3 (d³Id/dVgs³) of the main transistor with g3 of the auxiliary one, for properly biased devices. As shown in Fig. 12(b), if Vgs is biased within [0.52, 0.67] V, the output current has a negligible third-order transconductance. The suitable input swing is thus about 150 mVpp, which, however, is not sufficient for most applications. Moreover, the auxiliary transistor has to be biased in weak inversion to generate a positive g3B value [14]. A larger input swing tends to push this transistor into the off region for part of the input cycle, raising the distortion. With a 300 mVpp swing, this circuit cannot provide HD3 better than -50 dB.
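Why cancelling g3 matters can be seen from a memoryless power-series model of the drain current driven by a single tone. This is a textbook sketch, not the Volterra model (33)-(38) of this paper; all coefficient values are hypothetical, and the coefficients are power-series coefficients (the i-th derivative divided by i!, which keeps the sign of d³Id/dVgs³). In DS, the auxiliary device contributes a g3 of opposite sign, so the sum g3A + g3B shrinks.

```python
import math


def harmonic_distortion_db(g1, g2, g3, amp):
    """HD2 and HD3 (dBc) of i(v) = g1*v + g2*v**2 + g3*v**3 for v = amp*cos(wt).

    g1..g3 are power-series coefficients (the i-th derivative divided by i!).
    Uses cos^2 = (1 + cos 2wt)/2 and cos^3 = (3 cos wt + cos 3wt)/4.
    """
    fundamental = abs(g1 * amp + 0.75 * g3 * amp ** 3)
    second = abs(0.5 * g2 * amp ** 2)
    third = abs(0.25 * g3 * amp ** 3)

    def to_dbc(h):
        return 20.0 * math.log10(h / fundamental) if h > 0 else float("-inf")

    return to_dbc(second), to_dbc(third)


if __name__ == "__main__":
    amp = 0.075      # 150 mVpp input swing, as quoted for the DS-only circuit
    g1 = 40e-3       # A/V, hypothetical combined first-order coefficient
    g3_main = -0.05  # A/V^3, hypothetical main-device coefficient
    g3_aux = 0.045   # A/V^3, hypothetical auxiliary-device coefficient (opposite sign)
    _, hd3_main = harmonic_distortion_db(g1, 0.0, g3_main, amp)
    _, hd3_ds = harmonic_distortion_db(g1, 0.0, g3_main + g3_aux, amp)
    print(f"HD3 main only: {hd3_main:.1f} dBc, with DS: {hd3_ds:.1f} dBc")
```

Reducing the net g3 tenfold improves HD3 by about 20 dB in this model. Because HD3 grows with amp², the benefit erodes quickly once a larger swing pushes the auxiliary device out of the region where its g3 compensates.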
To overcome these problems, we propose a combination of simple inherent feedback and the DS technique, as illustrated in Fig. 13. Negative feedback helps to reduce both the second- and third-order non-linearity of a system [14]. In particular, we avoid inductors and the complexity of the complementary circuits required by other solutions.
As for inductive source degeneration [14], in this case AIIP3 improves approximately by the ratio (1 + gm1RS)^{3/2}. Moreover, the source resistor provides wideband matching to the driver load. A large value of gm1RS is also useful to maximize the closed-loop voltage gain of the driver, ideally approaching one. Since the effective RS value is limited by the 50 Ω load, more gain can only be achieved by increasing gm1 at the expense of power consumption. Fig. 13(a) shows one half of the proposed output driver, which is designed as a differential circuit. The voltage at the output of the SC pipeline can be large enough to provide around 300 mVpp at the driver output even for a moderate gain gm1RS. Since we use a 1.2 V supply to limit the power consumption of the driver, little voltage headroom is left to keep the signal within the linear region of the input/output characteristics. Inevitably, the signal is exposed to the non-linear behavior of the transistors, and hence sufficiently good cancellation of the non-linear effects is necessary.
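The approximate AIIP3 improvement (1 + gm1RS)^{3/2} is an amplitude ratio, so in decibels it becomes 30·log10(1 + gm1RS). A minimal sketch with a hypothetical helper name. The value 1.13 is the open-loop gain reported for the final driver in Section V; treating it as gm1RS is our simplification.

```python
import math


def aiip3_improvement_db(gm_rs):
    """Approximate AIIP3 gain from degeneration: (1 + gm*RS)**1.5 as an amplitude ratio.

    20*log10((1 + x)**1.5) simplifies to 30*log10(1 + x).
    """
    return 30.0 * math.log10(1.0 + gm_rs)


if __name__ == "__main__":
    for gm_rs in (0.5, 1.0, 1.13, 2.0):
        print(f"gm1*RS = {gm_rs:4.2f}: AIIP3 improves by about "
              f"{aiip3_improvement_db(gm_rs):.1f} dB")
```

A loop gain near one already buys roughly 9 to 10 dB of AIIP3, consistent with pairing degeneration with DS rather than relying on either alone.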
As shown in Fig. 13(b), the cancelling range is much wider than in the previous case. Both transistors operate in saturation, which makes the circuit less sensitive to a large input swing. Larger bias voltages are also allowed, providing larger transconductance values gm1 and lower HD3 at the same time.
Volterra series model:
The advantage of the DS technique combined with resistive feedback is discussed using the Volterra series model [23]. We focus on the third-order distortions, as they usually prevail over the other distortions. Evaluating (33) and (34) with these values shows the HD3 of the proposed circuit to be 26.8 dB better than that of the DS-only circuit. This result is very close to the simulation result shown in Fig. 14. For a 300 mVpp output swing and a 50 Ω resistive load, HD3 < -80 dB is attained, which corresponds to 13-bit DAC resolution.
Within the same bias voltage range, the second-order distortion term can also be partly cancelled. Comparing the second-order Volterra terms (36) and (37), the proposed driver shows HD2 15 dB better than its counterpart. This result is also verified by the simulations presented in Fig. 15. As the driver is designed as a differential circuit, a mismatch between its two branches was introduced to model HD2. HD3 proves immune to the mismatch. For high-frequency operation, the driver circuit model is refined as shown in Fig. 16. The parasitic resistances and capacitances roA, roB, CdsA and CdsB of the two transistors can be merged as $Z_{out} = Z_S \parallel r_{oA} \parallel r_{oB} \parallel (1/sC_{dsA}) \parallel (1/sC_{dsB})$, with $Z_S = R_S \parallel (1/sC_{load})$ and $C_{gs} = C_{gsA} + C_{gsB}$. Since roA, roB >> RS and Cload >> CdsA, CdsB, the driver output impedance can be approximated as $Z_{out} \approx Z_S = R_S \parallel (1/sC_{load})$.
HD3 obtained by the Volterra series model (38) and by SpectreRF simulations is depicted in Fig. 17 for two Cgs values. HD3 tends to degrade with frequency, starting from the low cut-off frequency of Zout. At higher frequencies the ratio $Z_{out}(j\omega_1 + j\omega_2 + j\omega_3)/Z_{out}(j\omega_1) \to 1/3$. The other numerator-to-denominator ratios also tend to settle, and this effect is more pronounced for the larger Cgs value, which provides better HD3 at higher frequencies at the expense of reduced voltage gain.
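For equal tones (ω1 = ω2 = ω3 = ω), the impedance ratio above reduces to Z(3ω)/Z(ω). Assuming the single-pole approximation Zout ≈ RS ∥ 1/(sCload), the sketch below shows the ratio moving from 1 at low frequency to 1/3 well above the pole. The values RS = 65 Ω and Cload = 100 fF are borrowed from Section V purely for illustration, and the function names are hypothetical.

```python
import math


def z_parallel_rc(omega, r, c):
    """Impedance of a resistor r in parallel with a capacitor c."""
    return r / (1.0 + 1j * omega * r * c)


def hd3_impedance_ratio(f_hz, r, c):
    """|Z(j*3w)| / |Z(j*w)|: the output-impedance ratio entering HD3 at tone frequency f."""
    omega = 2.0 * math.pi * f_hz
    return abs(z_parallel_rc(3.0 * omega, r, c)) / abs(z_parallel_rc(omega, r, c))


if __name__ == "__main__":
    r_s, c_load = 65.0, 100e-15  # illustrative values borrowed from Section V
    for f in (0.1e9, 1e9, 10e9, 100e9, 1e12):
        print(f"f = {f / 1e9:7.1f} GHz: ratio = {hd3_impedance_ratio(f, r_s, c_load):.3f}")
```

Above the pole, the capacitor dominates both Z(ω) and Z(3ω), so their ratio is simply ω/(3ω) = 1/3.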
Thermal noise analysis:
Using the schematic shown in Fig. 13, we can replace the transistors MA and MB by one device with transconductance gm = gmA + gmB and the corresponding input-referred voltage noise PSD of 4kTγ/gm. By simple calculation, the input-referred noise of the RS resistor is found to be 4kTRS/(gmRS)², whereas the total input-referred noise PSD would be
This can be compared to the SC noise (7)
For example, using Cu = 200 fF, fS = 3 GHz, γ = 1, gm = 40 mS, and RS = 50 Ω, from (40) we find RB ≈ 600 Ω, which can be considered the upper bound of the bias resistance.
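The two driver noise terms described above can be expressed as equivalent noise resistances, so that each PSD equals 4kT·R. A minimal sketch with hypothetical function names, using γ = 1, gm = 40 mS, RS = 50 Ω and T = 290 K. The bias-resistor term of (39) and (40) is not reproduced, so the sketch does not recover the RB ≈ 600 Ω bound.

```python
BOLTZMANN = 1.380649e-23  # J/K


def driver_noise_resistances(gm, rs, gamma=1.0):
    """Equivalent input noise resistances (ohm), so that each PSD term equals 4kT*R.

    Device term: 4kT*gamma/gm       -> gamma/gm.
    RS term:     4kT*RS/(gm*RS)**2  -> 1/(gm**2 * RS).
    """
    return gamma / gm, 1.0 / (gm ** 2 * rs)


def psd(r_eq, temp_k=290.0):
    """Thermal-noise PSD (V^2/Hz) of an equivalent resistance r_eq."""
    return 4.0 * BOLTZMANN * temp_k * r_eq


if __name__ == "__main__":
    r_dev, r_src = driver_noise_resistances(gm=40e-3, rs=50.0, gamma=1.0)
    total = psd(r_dev + r_src)
    print(f"device: {r_dev:.1f} ohm, RS: {r_src:.1f} ohm, total PSD: {total:.2e} V^2/Hz")
```

The transistors and RS together behave like about 37.5 Ω of input-referred thermal noise.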
V. SIMULATION RESULTS
The circuit has been simulated in Cadence using RF transistor models (65 nm CMOS). The driver output is to be connected to an off-chip load (50 Ω), and parasitic capacitances caused by wire-bonding cannot be avoided. In simulation, we assume this parasitic capacitance to be 100 fF. For signal frequencies up to 5 GHz, the gain drop remains within 0.5 dB, as shown in Fig. 18. With the 1.2 V supply, the supply current is distributed between the main and the auxiliary transistor as 15.4 mA (providing g1A = 37.1 mS) and 1.2 mA (providing g1B = 2.9 mS), respectively. The total current through Rs is 16.6 mA, with a total gm of 40 mS. With Rs = 65 Ω and Rload = 50 Ω, the open-loop gain can be calculated as (Rload ∥ Rs)(g1A + g1B) = 1.13, which results in a closed-loop voltage gain of -5.5 dB. Fig. 18 shows the voltage gain of the driver close to -5.5 dB over the 0.1 to 5 GHz band. It changes by only 0.5 dB at 5 GHz, and this drop is mainly due to the intentionally added 100 fF parasitic capacitance. The corresponding cut-off frequency of the driver is approximately 20 GHz. Within the 0.5 to 4 GHz band, HD2 < -90 dB and HD3 < -66 dB are attained, as shown in Fig. 19. The dependence of HD2 and HD3 on the bias voltage is demonstrated in Fig. 20: HD3 < -70 dB and HD2 < -90 dB are achieved for bias voltages from 0.95 V to 1.25 V. The temperature variation is also verified, as shown in Fig. 21. HD2 is not sensitive to temperature, whereas HD3 remains below -70 dB in the range of -10 to 100 °C. The power supply rejection ratio (PSRR) is verified for a sinusoidal disturbance imposed on Vdd over a wide frequency range, as shown in Fig. 22. With an intentionally introduced 5% mismatch between the two branches of the differential circuit, PSRR is always greater than 40 dB.
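The gain figures quoted above can be checked with the open-loop gain T = (Rload ∥ Rs)(g1A + g1B) and the source-follower-style closed-loop gain T/(1 + T). A minimal sketch; the function names are hypothetical, and the closed-loop relation is our reading of how -5.5 dB follows from 1.13.

```python
import math


def parallel(*resistances):
    """Parallel combination of resistances (ohm)."""
    return 1.0 / sum(1.0 / r for r in resistances)


def driver_gains(rs, r_load, gm_total):
    """Open-loop gain T = (Rload || Rs) * gm and closed-loop gain T/(1+T) in dB."""
    t = parallel(rs, r_load) * gm_total
    closed_loop = t / (1.0 + t)
    return t, 20.0 * math.log10(closed_loop)


if __name__ == "__main__":
    t, gain_db = driver_gains(rs=65.0, r_load=50.0, gm_total=37.1e-3 + 2.9e-3)
    print(f"open-loop gain {t:.2f}, closed-loop gain {gain_db:.1f} dB")
```

With Rs = 65 Ω, Rload = 50 Ω and gm = 40 mS this gives T ≈ 1.13 and a closed-loop gain of about -5.5 dB.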
The SC array is designed based on the available W-Cu design area for a given resolution and signal bandwidth. The simulation results of the complete SC DAC for 7- and 8-bit resolution are shown in Fig. 23 for OSR = 1.1. For frequencies above 3.5 GHz, the SFDR drops due to violation of the settling-time condition. The attained SFDR is not affected by the output driver, owing to its high linearity and bandwidth. For 6-bit resolution with a 12 GHz clock frequency, the unit capacitance value must be largely reduced according to the available design area shown in Fig. 11. In this case we choose Cu = 30 fF while keeping the same switch size (W = 64 µm).
The simulation results of the complete SC DAC with 6-bit resolution are shown in Fig. 24 for OSR = 1.1. The attained SFDR approaches 34 dB, but for clock frequencies below 8 GHz it deteriorates strongly due to leakage effects caused by a too-low time constant Roff·Cu. In fact, this can be seen as the lower and upper bounds on Cu shown in Fig. 11 moving up at lower clock frequencies, while the actual operating point (Cu, W) remains unchanged and ultimately falls out of the design area.
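The bandwidths and data rates quoted in this paper can be related to the clock frequency with two simple relations: a signal bandwidth of fs/(2·OSR) and a raw data rate of n·fs. A minimal sketch with hypothetical function names, using OSR = 1.1 as in the simulations above.

```python
def signal_bandwidth(fs, osr):
    """Signal bandwidth for sampling rate fs and oversampling ratio osr: fs / (2*osr)."""
    return fs / (2.0 * osr)


def data_rate(n_bits, fs):
    """Raw data rate in bit/s for n_bits per sample at sampling rate fs."""
    return n_bits * fs


if __name__ == "__main__":
    for n_bits, fs in ((8, 3e9), (6, 12e9)):
        bw = signal_bandwidth(fs, osr=1.1)
        print(f"{n_bits}-bit at {fs / 1e9:.0f} GHz: BW = {bw / 1e9:.2f} GHz, "
              f"data rate = {data_rate(n_bits, fs) / 1e9:.0f} Gbps")
```

This gives about 1.36 GHz for the 8-bit case at 3 GHz, and about 5.45 GHz with 72 Gbps for the 6-bit case at 12 GHz, in line with the figures quoted for these two designs.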
The DAC power consumption is mostly due to the output driver and the digital clock part (75 mW @ fs = 3 GHz). The pipeline SC array consumes much less power (15 mW @ fs = 3 GHz), since its total capacitance is relatively low and its operation is based on charge redistribution. A summary of the SC DAC performance is presented in Table I.
where N is the number of bits, BW (or BWN) is the signal bandwidth, Ptotal is the total power consumption, and Vswing and SFDR stand for the differential output swing and the spurious-free dynamic range of the DACs, respectively.
VI. CONCLUSION
In this paper we have presented a pipeline SC DAC design in 65 nm CMOS for high-speed applications. With a 300 mVpp output voltage range and GHz bandwidth, the DAC is well suited for contemporary wideband wireless transmitters. Having designed a highly linear output driver, we have found that the SC array limits the DAC performance, mostly due to settling and clock feed-through effects rather than kT/C noise and capacitor mismatch. With precise cancellation of the clock feed-through, the SFDR was shown to follow the resolution well up to 8 bits for GHz signal bandwidths (45 dB for n = 8 and BW = 1.36 GHz, or 33 dB for n = 6 and BW = 5.5 GHz) without extra means of correction. Thereby, the tradeoff between DAC bandwidth and resolution, accompanied by SFDR, has been demonstrated. Higher SFDR can be attained by reducing the clock frequency for an increased number of bits and redesigning the SC array accordingly. However, from the analysis of the available (Cu, W) design area, we find that increasing the resolution from 8 to 9 bits, with the corresponding SFDR improvement, is feasible only for a clock frequency below 0.5 GHz. In such a case, the data rate can be raised by employing the interleaving technique [6] or a newer CMOS technology.
The main contribution of this paper is to show an alternative way to design high-speed DACs besides the widely used current-steering technique. Existing SC DACs have been shown to be competitive mostly in terms of power consumption, since their SC circuits draw no static current. Even more advantages can be attained in deep-submicron technologies. Using 28 nm FD SOI, for example, SC circuits with their capacitors and on-resistance switches can achieve better matching than their current-steering counterparts, where the transistors operate in the saturation region. The output impedances of current cells decrease with scaling due to the lower supply, which also results in more distortion at the DAC output, whereas SC arrays do not have this problem.
APPENDIX A DERIVATION OF THE VOLTERRA OPERATORS FOR THE PROPOSED DRIVER AT LOW FREQUENCY
For the circuit shown in Fig. 13 (a) the respective currents and voltages can be expressed as
where giA and giB are the i-th-order coefficients of MA and MB, respectively, obtained by taking the derivatives of the drain dc current IDS with respect to the gate-source voltage VGS at the dc bias point
Then, from (A.9) and (A.10), the functions G1, G2, and G3 are obtained as
In a similar way, for the DS-only driver we find

Fig. 8 and Fig. 25, where the latter shows [Wmin, Wmax] = [63, 3000] µm. However, the real advantage is that higher frequencies and resolutions can be attained with this technology.
In FD SOI technology, the oxide capacitance Cox is much smaller than in standard bulk CMOS, so Cg(W) is smaller, and the channel length L = 30 nm results in a lower Ron as well. That is why the upper bounds for Cu are higher than those obtained with the 65 nm technology. A smaller Cox also results in a smaller Cp(W) in (20), which makes the Cu lower bound (due to the feed-through effect) more relaxed as well.
In particular, at a 3 GHz clock frequency, as demonstrated in Fig. 26, it is possible to design an 11-bit SC DAC using (W, Cu) = (125 µm, 500 fF), with a sufficient design area around this point to guarantee circuit robustness. This unit capacitance is larger than the Cu > 50 fF required for n = 11 by the kT/C noise condition (14). Observe that the lower Cu bound is, in fact, dictated by the feed-through condition (20) (green line) and not by Roff in (31). The figure also shows that a 12-bit DAC at 3 GHz is not feasible (green dotted line) unless the accuracy requirement is relaxed to one LSB.
Additionally, Fig. 27 shows the speed capability of a 9-bit SC DAC which, with a tight design area, demonstrates a theoretical data rate of 135 Gbps.
