Abstract-In order to achieve high speed and high resolution in switched-capacitor (SC) digital-to-analog converters (DACs), a split-segmented SC DAC architecture is proposed. The design considerations of kT/C noise, capacitor mismatch, settling time and simultaneous switching noise (SSN) are mathematically analyzed and modelled. The design area W-Cu is defined based on that analysis. It is used not only to identify the maximum speed and resolution but also to find the design point (W, Cu) for a given speed and resolution of the SC DAC topology. The segmentation effects are also considered. An implementation example of this type of DAC is a 12-bit 6-6 split-segmented SC DAC designed in 65 nm CMOS. A linear open-loop output driver utilizing the derivative superposition (DS) technique for nonlinearity cancellation is used to drive the off-chip load for the SC array without compromising its performance. The measured results show that the SC DAC achieves a 44 dB spurious-free dynamic range (SFDR) within a 1 GHz input signal bandwidth at 5 GS/s while consuming 50 mW from 1 V digital and 1.2 V analog supplies. The measured overall performance is poorer than expected due to a lower power supply rejection ratio (PSRR) in the fabricated chip. This DAC can be used in the transmitter baseband for wideband wireless communications.
I. INTRODUCTION
HIGH-SPEED digital-to-analog converters (DACs) are in demand for many modern electronic systems, including instrumentation applications, radar systems, and wideband reconfigurable radios. High-data-rate communication requires increasingly larger bandwidths, such as the recent wideband standards for UWB [1] and 60-GHz [2] radios.
The current-steering DAC architecture has received much attention for wideband high-speed applications due to its simplicity and good performance among DACs [3] [4] [5] [6] [7] [8] [9]. Another class of DACs with good potential in wideband high-speed applications is the switched-capacitor DAC (SC DAC). However, it has not drawn much attention except for a few recent publications [10] [11] [12]. Fig. 1 shows the conceptual architecture of SC DACs. It consists of the SC array and an output driver to drive the off-chip load (50 Ω). Table I shows the overall comparison table of current
Manuscript received January 24, 2017. This work was supported by the Swedish Foundation for Strategic Research.
The authors are with the Dept. of Electrical Engineering, Linköping University, SE-581 83, Sweden (email: quoctaiduongameyabhi@gmail.com, atila@isy.liu.se).
Figure 2. The SC DAC with track-and-hold circuit in [11].
requirement. The SC arrays thus have supply-voltage scaling potential for more advanced deep sub-micron technology nodes. However, the output driver still needs headroom and does not scale well with technology. The power consumption can be low since the operation is based on charge redistribution.
The first part of a capacitive DAC is the SC array. Previous implementations of high-speed capacitive DACs use the so-called pipeline architecture [10, 11]. Additionally, a time-interleaved topology of the pipeline SC array was utilized to improve the speed of the DAC [11]. However, it can only work up to 800 MS/s due to the finite bandwidth of the track-and-hold circuit, as shown in Fig. 2. In order to meet the kT/C noise requirements, a large capacitor is required for the hold operation, which limits speed. Moreover, our earlier study found that the pipeline capacitor array suffers from clock feed-through in high-resolution (12-bit) DACs [12]. Fig. 3 shows an example of the clock feed-through signal propagating to the output. When switch Φ2 turns off, a spike appears at node B. Switch Φ3 then turns on and transmits it to node C, since Φ1 is off at that time. The spike is added to the feed-through signal caused by Φ3 at node C. In the same way, the feed-through voltage accumulates, propagates to the output, and causes distortion. The pipeline SC DAC also needs a precise multi-phase clock, which is generated on-chip and consumes additional power. The analysis in [12] shows that it is not feasible to design a DAC with more than 8-bit resolution at 3 GS/s.
Another important part of a capacitive DAC is the output driver. The SC array has poor ability to drive an off-chip load, so a suitable output driver is necessary.
This driver is critical for the linearity of a capacitive DAC when a high output voltage swing is required [11]. One solution to this problem is a closed-loop architecture in which the circuit linearity is improved by negative feedback [10]. However, the performance of this technique cannot be maintained at high frequencies (> 1 GHz) due to the loop bandwidth limitation. Another solution is an open-loop driver supported by pre-distortion calibration [11]. However, dynamic pre-distortion requires a huge lookup-table memory. For example, it would need 562 TB of memory for 12-bit DAC calibration [13]. Moreover, only static nonlinearities can be calibrated using a low-speed but accurate ADC in the feedback loop, while the presence of significant dynamic nonlinearities would force the ADC to sample at least at the DAC update rate [14].
A split-segmented SC array topology can overcome the above limitations of pipeline SC DAC arrays, such as the clock feed-through problem, since no pipelined charge propagates to the output. As shown in Fig. 4, any clock feed-through voltage is charged/discharged to supply/ground since either switch Ui or Ubi is on at any time. Moreover, this SC array works based on charge/voltage division, so it does not need a track-and-hold circuit. Also, by avoiding multi-phase clocks in the proposed SC array, power is saved and the associated timing challenges are avoided. The split-segmented SC array can work at even higher speed (5 GS/s) since, without the track-and-hold circuit, it is not limited by kT/C noise.
This work proposes a split-segmented SC DAC architecture to overcome the clock feed-through issue of the pipeline SC DAC. The design considerations of kT/C noise, capacitor mismatch, settling time and simultaneous switching noise (SSN) are mathematically analyzed and modelled to define the design area W-Cu. It is used not only to identify the maximum speed and resolution but also to find a suitable design point (W, Cu) for a given speed and resolution of the SC DAC topology. The segmentation effects are also considered.
In order to design a driver that works at high frequency (> 1 GHz) without any form of calibration, this work also proposes a linear open-loop driver which uses the derivative superposition (DS) technique [15] to cancel both second- and third-order harmonic distortion when driving the proposed split-segmented SC array. Since the output swing of the DAC is low, which is suitable for wireless applications, the nonlinearity of the driver is found not to be the main limiting factor.
The most recent implementations of SC DACs have reported a maximum sampling rate of 800 MS/s [11] for a BW of 400 MHz. Hence, this work aims to extend the signal BW of SC DACs up to 1 GHz using the proposed split-segmented architecture and an open-loop output driver. Additionally, the design and analysis of the output driver using Volterra series [16] are presented. The experimental prototype shows that the SC DAC can work at more than 3 GS/s. However, SFDR is limited by SSN, which requires sufficient PSRR. SFDR can potentially be improved because SSN can be reduced further with more advanced packaging techniques such as flip-chip or 3-D packaging. Another way to reduce the SSN effect is to use additional circuits that improve the PSRR of the SC arrays.
The remainder of this paper is arranged as follows. In Section II the split-segmented SC DAC architecture is presented. Analyses of the SC array are provided in Section III. The definition of the design area W-Cu and how to find the design point (W, Cu) are also described in that section. Section IV presents the output driver design and analysis. Section V provides the chip implementation and measured results of the complete 12-bit SC DAC. Conclusions are formulated in the last section.
II. A SPLIT-SEGMENTED SC DAC ARCHITECTURE
The proposed capacitive DAC architecture is shown in Fig. 5. A segmentation of m most significant bits (MSBs), which are thermometer coded, and (n-m) least significant bits (LSBs), which are binary coded, is considered for an n-bit SC DAC. Moreover, two attenuation capacitors in the middle, which reduce the capacitance spread [17], create two split-binary segments of k1 bits and k2 bits. A mux-based thermometer decoder can be used to convert the m MSBs of the input to (2^m - 1) unary bits since it is more suitable for high speed than a logic-based decoder [18]. It can be designed based on the previous 5-to-32 decoder in [5] with some modifications according to the particular specifications. The SC cell is based on a capacitor/voltage divider, and its output depends on the reference voltage Vref, the unit capacitance Cu and the output capacitance Cfilter according to (1) and (2).
The charge-divider based SC array topology shown in Fig. 4 is used to eliminate the clock feed-through issue. The feed-through voltage at the internal node A between switches Ui and Ubi in Fig. 5 is immediately charged/discharged to Vref/GND, since whenever one of these switches is off the other is on. The design of the unit capacitance Cu and the switch size W is described in Section III. The linear open-loop output driver, described in detail in Section IV, is used to drive the off-chip load (50 Ω, matched to the measurement equipment). Neglecting all parasitic capacitances for simplicity, the output voltage of the SC array is linearly dependent on the values of the input data Ui, bi, the supply Vref, the unit capacitance Cu, and the output capacitance Cfilter as
and the attenuation capacitances Catten1 and Catten2 are proportional to the unit capacitance Cu [17]:
$$C_{atten1} = \frac{2^{k_1}}{2^{k_1}-1}\,C_u, \qquad C_{atten2} = \frac{2^{k_2}}{2^{k_2}-1}\,C_u.$$
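The attenuation-capacitor relation above can be checked numerically. The following is a minimal Python sketch; the function name and the example values k1 = k2 = 3 are assumptions made for illustration only (the split of the six binary bits is hypothetical), while Cu = 15 fF is the design point discussed in Section III.

```python
def attenuation_capacitance(k_bits: int, c_unit: float) -> float:
    """Return C_atten = 2^k / (2^k - 1) * C_u for a k-bit split-binary segment."""
    if k_bits < 1:
        raise ValueError("k_bits must be at least 1")
    scale = 2 ** k_bits
    return scale / (scale - 1) * c_unit


if __name__ == "__main__":
    c_unit = 15e-15  # unit capacitance in farads (design point of Section III)
    for k in (3, 3):  # hypothetical k1 and k2
        c_att = attenuation_capacitance(k, c_unit)
        print(f"k = {k}: C_atten = {c_att * 1e15:.2f} fF")
```

With these assumed values each attenuation capacitor is only about 14% larger than Cu, which illustrates why the split topology keeps the capacitance spread small.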
III. THE ANALYSES OF SC ARRAY
In order to identify the maximum speed and resolution at which the SC DAC can operate, and to find suitable values of the switch size and unit capacitance for a given speed and resolution, this section analyzes kT/C noise, capacitor mismatch, settling time and simultaneous switching noise (SSN) in the n-bit split-segmented SC array. The design area W-Cu and the design point (W, Cu) are defined based on that analysis.
A. kT/C noise
Noise contribution from all switches to the output can be estimated using the superposition principle.
1) Noise due to thermometer SC arrays
Fig. 6 shows the equivalent circuit used to calculate the noise contribution from the thermometer segment, where n denotes the number of bits of the DAC, m the number of bits of the thermometer segment, k1 the number of bits of the first split-binary segment, and k2 the number of bits of the second split-binary segment. From A.1-A.7 in Appendix A, the total admittance of the thermometer segment, the two binary segments and the load (Cfilter) in Fig. 6 is found as
The noise power spectral density (PSD) at the output due to each thermometer branch in the thermometer segment is
The total noise PSD at the output due to the thermometer segment is calculated as
2) Noise due to the first binary segment
Similar to the above analysis, the total noise power spectral density (PSD) at the output due to the first binary segment is
where AV1 is the voltage gain from the noise source of a switch in the first binary segment to the output.
3) Noise due to the second binary SC arrays
Similarly, the total noise power spectral density (PSD) at the output due to the second binary segment is
where AV2 is the voltage gain from the noise source of a switch in the second binary segment to the output. The summation in (6) and (7) is divided by 2^i since the on-resistance of the binary switches is approximately Ron/2^i, as shown in Fig
4) Noise due to all binary and thermometer SC arrays
The total power spectral density (PSD) noise at output is sum of all individual noise sources as
(8)
Generally, the kT/C noise should be less than the quantization noise of a DAC of a given resolution. For an n-bit DAC, the quantization noise PSD can be calculated from
where VFS is the full-scale voltage, n is the number of bits, and BW = fs/2 for Nyquist rate. The minimum value of the unit capacitance Cu can then be estimated using (8) and (9) as
B. Capacitor mismatch
Similar to the capacitor mismatch analysis in [12], the standard deviation of VDNL is described as
where VFS is the full-scale output range, VDNL is the differential nonlinearity of the DAC, Cu = KC·A, A is the capacitor area, n is the DAC resolution, and Kσ (matching coefficient) and KC (capacitance density) are technology constants provided by the chip manufacturer.
The 3σ criterion can be used, meaning that 3σ(VDNL) should be less than half of VLSB. For simplicity, the lower bound for the unit capacitor due to mismatch in an n-bit DAC is estimated as
With a 65 nm CMOS process and metal-to-metal capacitors, we have Kσ = 0.5% µm and KC = 0.39 fF/µm^2; for example, a 12-bit DAC requires a unit capacitor Cu > 0.4 fF. Fig. 7 compares the VDNL standard deviation obtained with model (11) and with SpectreRF simulations for 12 bits. The simulation results are obtained by statistical Monte-Carlo (MC) analysis with 100 runs, considering only capacitor variation and assuming ideal switches. A very low clock frequency was used to verify the static VDNL with input data at the MSB transition [27]. Since 3σ(VDNL) should be less than half of VLSB, the limit line shown in Fig. 7 is one sixth of VLSB of a 12-bit DAC, which is 12 µV with VFS = 0.3 V. The estimated VDNL standard deviations are in the 0.1-1 nV range, while the simulated values track them at around half of that. The simulated results are lower than the estimated ones because the calculation assumes the maximum total mismatch variation [27], whereas the unit-capacitor mismatch in the MC simulation is randomly distributed. Importantly, both values are orders of magnitude below the limit voltage of a 12-bit DAC. Therefore the limit due to capacitor mismatch can be neglected for 12-bit resolution with metal-to-metal unit capacitors.
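The 12 µV limit line quoted above follows directly from the 3σ criterion. A minimal Python sketch of that arithmetic is given below; the function names are illustrative and not taken from the paper.

```python
def lsb_voltage(v_fs: float, n_bits: int) -> float:
    """LSB voltage of an n-bit DAC with full-scale range v_fs."""
    return v_fs / 2 ** n_bits


def dnl_sigma_limit(v_fs: float, n_bits: int) -> float:
    """Largest allowed sigma(V_DNL) when 3*sigma must stay below V_LSB / 2."""
    return lsb_voltage(v_fs, n_bits) / 6.0


if __name__ == "__main__":
    v_fs = 0.3  # full-scale voltage in volts, as stated in the text
    n_bits = 12
    print(f"V_LSB        = {lsb_voltage(v_fs, n_bits) * 1e6:.1f} uV")
    print(f"sigma limit  = {dnl_sigma_limit(v_fs, n_bits) * 1e6:.1f} uV")
```

For VFS = 0.3 V and 12 bits this gives an LSB of about 73 µV and a limit of about 12 µV, in line with the value used for Fig. 7.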
C. Settling time
Due to imperfections of the transistor switches (finite Roff and non-zero Ron), leakage and incomplete settling in the capacitor array result in voltage errors which degrade DAC linearity [19]. First, the upper bound of the switch on-resistance is estimated. When the switch is on, the time constant is RonCu because Cfilter ≫ Cu, as shown in Fig. 8. The maximum charging/discharging voltage error due to incomplete settling should be less than half an LSB of the n-bit DAC
where ton ≈ Ts/2 = 1/(2fs) is the pulse width within the clock cycle Ts, Vref is the supply voltage of the SC array, and Cu is the unit capacitance. Therefore, assuming Vref = VFS, an upper bound of Ron can be found as
For example, with n = 12, Cu = 10 fF, fs = 3 GS/s and Cfilter = 2 pF, the on-resistance of the switches should be less than 1.85 kΩ.
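The 1.85 kΩ figure can be reproduced with a first-order settling model. The sketch below assumes the error condition Vref·exp(-ton/(Ron·Cu)) < VFS/2^(n+1) with Vref = VFS, which gives Ron < 1/(2·fs·Cu·(n+1)·ln 2); this closed form is an assumption reconstructed from the surrounding text, and Cfilter does not appear in it because Cfilter ≫ Cu.

```python
import math


def ron_upper_bound(n_bits: int, c_unit: float, f_s: float) -> float:
    """Upper bound of the switch on-resistance for half-LSB settling.

    Assumes single-pole settling with time constant Ron*Cu during
    t_on = 1 / (2 * f_s) and V_ref equal to the full-scale voltage.
    """
    t_on = 1.0 / (2.0 * f_s)
    # exp(-t_on / tau) < 2^-(n+1)  ->  tau < t_on / ((n + 1) * ln 2)
    tau_max = t_on / ((n_bits + 1) * math.log(2.0))
    return tau_max / c_unit


if __name__ == "__main__":
    r_max = ron_upper_bound(n_bits=12, c_unit=10e-15, f_s=3e9)
    print(f"Ron must stay below about {r_max:.0f} ohm")
```

Under these assumptions the bound evaluates to roughly 1.85 kΩ, matching the example in the text.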
The circuit model for the off-resistance effect is shown in Fig. 8. If the data is high, the transistor switches to the supply Vref as shown in Fig. 8a, while if the data is low the switch is on to GND as shown in Fig. 8b. In both cases, the voltage error due to leakage is the same
We have to keep this voltage error below half an LSB of the n-bit DAC, with the same assumption of Cfilter ≫ Cu, as
From this condition, the lower bound of off-resistance can be obtained
However, this voltage error does not cause distortion at the DAC output since it has the same effect on all branches, assuming that all switches have the same Ron and Roff. Instead, it reduces the output swing due to the voltage drop on Ron caused by the leakage current through Roff. Simulation shows that Roff has a negligible effect on the linearity of the DAC.
D. Capacitance parasitic effects
Parasitic capacitances Cparp and Cparn of the PMOS and NMOS switches in Fig. 9 cause distortion by adding capacitance and thus lengthening the settling time. When bk = 1, the node at Vref_k is supposed to charge to Vref. In the worst case, assuming the voltage at Vout is zero, the total capacitance to be charged is (Ck + Cparn_k) = (Cu + Cpar), assuming Cparp = Cparn = Cpar. The parasitic capacitance is added to Cu in equation (14) for the upper bound of Ron
The upper bound of Cu is calculated from (18) as
It can be seen in (19) that the parasitic capacitance Cpar narrows the range of the unit capacitance Cu.
Figure 9. Equivalent circuit model for parasitic capacitances.
E. Simultaneous switching noise (SSN)
The linearity of high-resolution, high-speed SC DACs is significantly limited by simultaneous switching noise (SSN) caused by the parasitic inductance of the bonding wires shown in Fig. 10a. Decoupling capacitors (de-caps) Cdecap, with equivalent parasitic resistance Rdecap, are used locally on-chip to keep the internal supply stable. Each supply has two bonds, VDD and VSS, which can be modelled as RLC networks [20] [21] [22] [23]. Typically, the inductances LDD and LSS are dominant and cause ripple/noise on the internal supply, since the series resistances (RSS, RDD) and package capacitances (CDD, CSS) have less effect.
The self-inductances (LDD, LSS) of a round wire can be estimated as [23] 
where L stands for the inductance per unit length, R for the radius of the conductor, l for the length of the bond wire, and µ0 for the permeability of free space. For typical bond wires, L is approximately 1 nH/mm. The maximum SSN voltage is estimated in [21] as
where m is the number of drivers, n is the device transconductance value near the switching threshold voltage
For sub-micron MOSFETs, alpha (α) is in the range of 1-2, and tr is the rise time. Using the constant k = 3 according to [21], formula (21) can be rewritten as
There are several ways to reduce SSN. Separating the supplies, as shown in Fig. 10b, prevents noise coupling between different supplies. The deep-NWell (DNW) layer and guard rings can also be used to eliminate the body effect and to improve isolation, respectively. Based on the voltage error model in (22), SSN can be reduced by minimizing the inductance L. Using multiple bond wires for each supply, n wires for example, lowers the effective inductance by a factor of 1/√n due to mutual effects. Large de-caps (MIM and MOS capacitors) can be used to suppress noise locally as far as area allows. Many parallel de-caps with different values, and hence different self-resonant frequencies (SRF) and equivalent series resistances (ESR), are soldered on-board to filter out noise at different frequencies. Damping resistors can also be used to suppress noise [20] [21]. Another damping/filtering technique, applied on the signal path, was introduced and used in this design as shown in Fig. 11. Simulation shows that it improves SSN by around 6 dB since the ripple voltage on the signal path is roughly halved. From another perspective, this filtering technique improves the power supply rejection ratio (PSRR) of the SC DAC. Low values of R = 10 Ω and C = 600 fF are used to avoid signal loss, which would reduce the voltage swing of the DAC. Since the voltage drop on R is negligible and the cut-off frequency is high (fC = 53 GHz), this technique has no side effect on the output. The RC filter improves DAC performance not by filtering the signal but by suppressing SSN through the damping resistor R, hence improving PSRR.
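The bond-wire figures used in this subsection can be checked with a short calculation. The sketch below uses the common low-frequency expression for a straight round wire, L = (µ0·l/2π)·(ln(2l/r) - 3/4), which is assumed here and may differ from the exact expression in [23]; the 12.5 µm wire radius is a hypothetical but typical value. The 1/√n scaling for n parallel wires is taken from the text above.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space in H/m


def wire_self_inductance(length: float, radius: float) -> float:
    """Low-frequency self-inductance (H) of a straight round wire."""
    return MU_0 * length / (2.0 * math.pi) * (math.log(2.0 * length / radius) - 0.75)


def parallel_bond_inductance(l_single: float, n_wires: int) -> float:
    """Effective inductance of n parallel bond wires, using the 1/sqrt(n) rule."""
    return l_single / math.sqrt(n_wires)


if __name__ == "__main__":
    l_one = wire_self_inductance(length=1e-3, radius=12.5e-6)  # 1 mm wire
    print(f"single 1 mm wire : {l_one * 1e9:.2f} nH")
    print(f"four wires       : {parallel_bond_inductance(l_one, 4) * 1e9:.2f} nH")
```

For a 1 mm wire this gives a little under 0.9 nH, consistent with the rule of thumb of about 1 nH/mm.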
The SSN on the supply of the switch driver is the most critical since larger switches have been used for fast switching. The upper bound of the transistor sizes can be found based on (22). This maximum noise voltage should be less than half an LSB of the n-bit DAC
From (22) and (23), the upper bound of the switch sizes can be found as
Based on formula (24), the transistor sizes of the switches in the switch driver can be chosen depending on the clock rise time tr, the supply (VDD), the output swing (VFS), the DAC resolution (n), the parasitic inductance of the bonding wires (LSS) and the technology (µnCox). Fig. 12 shows the SFDR versus SSN voltage, assuming a signal voltage of 400 mVpp (the output swing of the SC array) and a PSRR of the SC DAC equal to 0 dB. An SSN voltage of 0.1 mVrms is required for 60 dB SFDR, which is very hard to achieve. It is therefore necessary to have a reference voltage generator that provides sufficient power supply rejection ratio (PSRR). One simple way is to use the reference generator shown in Fig. 13.
F. Define design area W-Cu
In order to identify the maximum resolution and sampling rate for which an SC DAC can be designed, this section defines the design area W-Cu based on the above analytical considerations of kT/C noise, capacitor mismatch, settling time and SSN. The parameters W and Cu are the width of the switches with minimum length and the value of the unit capacitance, respectively. This methodology is also used to select an appropriate design point (W, Cu) for a given resolution and speed of the SC DAC.
First, kT/C noise (8) is considered. Fig. 14 shows the quantization and kT/C noise versus switch size and unit capacitance at 5 GS/s for different resolutions. Fig. 14a shows that kT/C noise does not limit the performance of 12-bit (6-6 segmentation) DACs since it is always less than the quantization noise (assuming an output swing of the overall DAC of 0.3 Vpp). Theoretically, it is possible to design 13-bit (7-6 segmentation) DACs if either Cu > 30 fF for small W (≈ 1 µm) or Cu > 40 fF for large W (≈ 40 µm) is chosen, as shown in Fig. 14b. The kT/C noise is almost independent of clock frequency.
Second, simultaneous switching noise (SSN) (22) is considered. Fig. 15 shows SSN versus switch size for different resolutions at 5 GS/s and PSRR ≈ 40 dB. SSN depends mainly on the size of the switch drivers and is independent of Cu (22). Here it is assumed that the main switch is two times larger than the switch driver. A PSRR of 40 dB can be achieved with the reference generator, large decoupling capacitors and the techniques described in the SSN section above. By comparing the SSN voltage with the half-LSB voltage, the bounds W < 3.5 µm for 12 bits and W < 0.3 µm for 13 bits are found in Fig. 15. This upper bound of the switch size changes with PSRR: a larger range of W can be achieved with higher PSRR, and vice versa.
SSN also depends on clock frequency. Fig. 16 shows SSN versus switch size at sampling rates of 5 GS/s and 8 GS/s for PSRR ≈ 40 dB and 12-bit resolution. By comparing the SSN voltage with the half-LSB voltage, the bounds W < 3.5 µm for 5 GS/s and W < 2.2 µm for 8 GS/s are found in Fig. 16.
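The half-LSB comparison used for these bounds can be made concrete with a small calculation. The sketch below assumes that PSRR simply scales the supply ripple seen at the output (a simplification introduced here for illustration) and uses VFS = 0.3 V from the kT/C discussion; the function names are hypothetical.

```python
def half_lsb(v_fs: float, n_bits: int) -> float:
    """Half of the LSB voltage of an n-bit DAC."""
    return v_fs / 2 ** (n_bits + 1)


def max_supply_ripple(v_fs: float, n_bits: int, psrr_db: float) -> float:
    """Supply ripple that maps to half an LSB at the output for a given PSRR."""
    return half_lsb(v_fs, n_bits) * 10 ** (psrr_db / 20.0)


if __name__ == "__main__":
    for n_bits in (12, 13):
        limit = max_supply_ripple(v_fs=0.3, n_bits=n_bits, psrr_db=40.0)
        print(f"{n_bits} bits: half LSB = {half_lsb(0.3, n_bits) * 1e6:.1f} uV, "
              f"tolerable supply ripple = {limit * 1e3:.2f} mV")
```

Each extra bit halves the tolerable ripple, while every additional 20 dB of PSRR relaxes it by a factor of ten, which is why the bound on W moves with both resolution and PSRR.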
Design area W-Cu:
The design area W-Cu is defined by the lower and upper bounds of W and Cu imposed by kT/C noise, SSN, settling time and capacitor mismatch. For example, Fig. 17 shows the design area W-Cu of 12-bit (6-6 segmentation) and 13-bit (7-6 segmentation) SC DACs for a clock frequency fs = 5 GHz. The solid lines, which consist of SSN (22), settling time (19) and mismatch (12), are the upper/lower bounds for the 12-bit SC DAC. The mismatch coefficients in (12) are taken from the 65 nm CMOS process. The parasitic capacitance of each switch in (18) is assumed to be 5 fF at most. The kT/C noise is not considered for 12-bit resolution since it meets the requirement, as shown in Fig. 14a. The area bounded by W < 3.5 µm from SSN, Cu > 0.4 fF from capacitor mismatch, and Cu below the solid settling-time line is the design area W-Cu of the 12-bit SC DAC at 5 GS/s. The design point (W, Cu) = (2 µm, 15 fF) can be chosen for implementation, as shown in Fig. 17. Similarly, the design area W-Cu of the 13-bit SC DAC at 5 GS/s can be defined. However, this area shrinks considerably and lies close to the origin since the SSN condition pushes W < 0.3 µm. Moreover, the effect of kT/C noise appears for the 13-bit DAC, as shown in Fig. 14b. The design area then no longer exists, since another lower bound of Cu ≈ 35 fF from kT/C noise appears in Fig. 17.
Another example considers the effect of the clock frequency on the design area W-Cu, which is shown in Fig. 18. It is bounded only by the effects of SSN, settling time and capacitor mismatch since kT/C noise is always less than the quantization noise (shown in Fig. 14a). Fig. 18 shows that it is theoretically possible to design a 12-bit SC DAC at 8 GS/s.
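The way the three 12-bit bounds combine into a design area can be sketched in a few lines of Python. The SSN and mismatch limits (W < 3.5 µm, Cu > 0.4 fF) and Cpar = 5 fF are taken from the text; the settling bound is written in the assumed first-order form Cu < ton/(Ron·(n+1)·ln 2) - Cpar, and the on-resistance model Ron = 800 Ω·µm / W is purely hypothetical, standing in for the technology data behind Fig. 17.

```python
import math

F_S = 5e9            # sampling rate in Hz
N_BITS = 12          # DAC resolution
C_PAR = 5e-15        # parasitic capacitance per switch (from the text)
W_MAX_UM = 3.5       # SSN bound on switch width in um (from the text)
CU_MIN = 0.4e-15     # mismatch bound on unit capacitance (from the text)
RON_TIMES_W = 800.0  # ohm*um, HYPOTHETICAL on-resistance model Ron = 800 / W


def cu_settling_limit(w_um: float) -> float:
    """Upper bound of Cu from half-LSB settling (assumed first-order model)."""
    r_on = RON_TIMES_W / w_um
    t_on = 1.0 / (2.0 * F_S)
    return t_on / (r_on * (N_BITS + 1) * math.log(2.0)) - C_PAR


def in_design_area(w_um: float, c_unit: float) -> bool:
    """True if (W, Cu) satisfies the SSN, mismatch and settling bounds."""
    return 0.0 < w_um < W_MAX_UM and CU_MIN < c_unit < cu_settling_limit(w_um)


if __name__ == "__main__":
    for w_um, c_unit in [(2.0, 15e-15), (4.0, 15e-15), (0.5, 15e-15)]:
        print(f"W = {w_um} um, Cu = {c_unit * 1e15:.0f} fF ->",
              in_design_area(w_um, c_unit))
```

With the hypothetical on-resistance model, the chosen point (2 µm, 15 fF) lies inside the area, a wider switch violates the SSN bound, and a narrower switch violates the settling bound; the actual boundary in Fig. 17 depends on the real Ron(W) of the process.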
Segmentation effects:
The design area W-Cu is affected by segmentation ratio since kT/C and SSN are dependent on number of thermometer bits m. Segmentation does not have much effect on settling time and mismatch since time-constant RC is normalized for every switch and only the worst case is considered, respectively.
From Fig. 14b and Fig. 17 , it can be seen that it is not 
IV. OUTPUT DRIVER DESIGN AND ANALYSIS

A. Design
The output driver is used to drive the off-chip load (50 Ω) of the capacitive DAC since the SC array does not have the driving capability [10] [11]. The driver is the most critical circuit causing distortion in capacitive DACs unless the required output voltage swing is low. One solution to this problem is a closed-loop architecture in which the circuit linearity is improved by negative feedback [10]. This technique, however, does not work well at high frequencies (> 1 GHz). An open-loop output driver supported by a pre-distortion technique [11] is another solution. However, the latter is usually limited by an assumption of minimum memory effect of the nonlinear object and requires a highly precise ADC in the feedback path [13] [14].
To avoid complicated calibration and to work at high frequency (GHz range), an open-loop output driver with a nonlinearity cancellation technique can be used [12]. It shows potential for wireless applications, which need around 300 mVpp of DAC output to drive mixers. In this output driver design, we also use an open-loop topology and the derivative superposition (DS) technique to achieve high linearity and low power with a standard supply of 1.2 V.
The conventional source-follower (SF) buffer is shown in Fig. 21a. The optimal bias range for minimizing the third-order transconductance g3 is very narrow, around Vbias = 0.9 V, as shown in Fig. 21b. Moreover, g2 is quite high at the same bias point. The derivative superposition (DS) technique introduced in [15] was used to cancel the third-order harmonic distortion of a radio frequency (RF) amplifier. Using the DS technique, the proposed output driver achieves cancellation of both g3 and g2 over the same input range (0.8-1.2 V), as shown in Fig. 22.
B. Analysis using Volterra series model
The equivalent model of the output driver is shown in Fig. 23. The parasitic resistances and capacitances roA, roB, CgsA and CsgB of the two transistors can be merged as Zout = (Zload ∥ roA ∥ roB) and Cgs = CgsA + CsgB.
The harmonic distortion of the output driver can be estimated using the Volterra series model [16]. A two-tone (ωA and ωB) sinusoidal signal with the same amplitude A for each tone is applied at the driver input. We focus on the third-order distortion as it prevails over the other distortions where ω1 = ±ωA or ±ωB. Using (25)
where (ω1 + ω2) = (±ωA ± ωB), (ω1 + ω2 + ω3) = (±ωA ± ωB ± ωB) or (±ωA ± ωB ± ωA) in general [16]. The comparisons of (26) is supposed to be close to zero at low frequency in the working bias range in Fig. 22b. It is a function of frequency, which degrades HD3 at high frequency in Fig. 24. Looking at the third term in (26), we can see that Cgs helps to improve HD3 at high frequency because its denominator is proportional to 3ω while the numerator is a function of ω. This can be seen in the simulation verification shown in Fig. 24. Both the analytical and simulated curves with Cgs = 2 pF are better than those with smaller Cgs.
C. Thermal noise analysis
The output noise due to the thermal noise of the output driver is calculated based on the previous work [12] as
where gm = g1A + g1B = 129 mS is the total transconductance of the driver, ROUT = 50 Ω is the load resistance, RB = 1 kΩ is the bias resistance, and γ ≈ 2/3. It is approximately -173 dB/Hz. The total noise integrated over a bandwidth of 5 GHz, for example, is -76 dB, which is less than the -72 dB required for 12-bit resolution.
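The step from the -173 dB/Hz density to the -76 dB total can be reproduced by integrating a flat noise density over the bandwidth. The sketch below assumes a white spectrum over 5 GHz and approximates the 12-bit requirement as 6.02 dB per bit, which is an interpretation of the -72 dB figure rather than a formula stated in the paper.

```python
import math


def integrated_noise_db(psd_db_per_hz: float, bandwidth_hz: float) -> float:
    """Total noise in dB for a flat PSD (dB/Hz) integrated over bandwidth_hz."""
    return psd_db_per_hz + 10.0 * math.log10(bandwidth_hz)


def required_noise_floor_db(n_bits: int) -> float:
    """Assumed requirement of about 6.02 dB per bit below full scale."""
    return -6.02 * n_bits


if __name__ == "__main__":
    total = integrated_noise_db(-173.0, 5e9)
    limit = required_noise_floor_db(12)
    print(f"integrated noise = {total:.1f} dB, requirement = {limit:.1f} dB")
    print("meets 12-bit requirement:", total < limit)
```

The result is about -76 dB against a requirement of about -72 dB, so the driver noise leaves roughly 4 dB of margin under these assumptions.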
D. Simulation results
With a specification of 300 mV output swing and assuming 5 % mismatch between the two branches of the pseudo-differential circuitry, the output driver is simulated in 65 nm CMOS technology. Within the band of 0.5-4 GHz, HD2 < -82 dB and HD3 < -78 dB are attained, as shown in Fig. 25. The dependence of HD2 and HD3 on the bias voltage is demonstrated in Fig. 26. HD3 < -81 dB and HD2 < -82 dB are achieved when the bias voltage changes from 0.85 V to 1.25 V. The effect of temperature variation on the linearity of the driver is also verified and shown in Fig. 27. HD2 improves with higher temperature, whereas HD3 is always less than -80 dB in the range of -20 °C to 100 °C.
Under different process corners, as shown in Table II, both HD2 and HD3 are better than -74 dB, which is good enough for driving the 12-bit SC DAC.
V. CHIP IMPLEMENTATION AND MEASURED RESULTS
The measurement setup with on-chip memory for SC DAC testing is shown in Fig. 28. An integrated circuit (IC) was fabricated in a standard 65-nm CMOS technology and bonded to an RF PCB. In order to generate the 12-bit input data for full-speed testing of the DAC, a 1-Kbit memory has been implemented on-chip [24]. It is written serially at low speed (1 MHz) by an FPGA and then read at full speed internally during DAC operation. The 32-bit data is split into two 16-bit streams representing odd and even data, then multiplexed using the fs/2 clock to obtain two 8-bit data. However, this design uses only 12 of the 16 bits. Therefore the memory allows only a 64-point deep signal to be tested, and hence the minimum frequency bin spacing in the input signal is fs/64. The sinusoidal differential signal is provided by a signal generator and converted to a rectangular wave by the internal clock drivers. The overall clock distribution strategy is shown in Fig. 29. The two-channel interleaving of 16 bits uses fs/2 for reading data out. An H-tree topology is used to distribute the pseudo-differential clocks to the switch drivers of the switched-capacitor arrays.
The total chip occupies 1.2 × 1.7 mm^2, as shown in Fig. 30. Almost half of the chip area is occupied by the memory and de-caps. Metal-to-metal capacitors are used for both the unit and attenuation capacitors since they have better matching. Moreover, MIM capacitors cannot be used since their minimum value is limited by the foundry process. In order to reduce the parasitic capacitance to the substrate, the bottom plate of Catten is connected towards the output. To obtain a denser layout and save area, multi-layer (M2-M3, M3-M4, M4-M5) metal-to-metal capacitors were used.
Similar to the current-steering DACs having limiting factors such as device noise, output impedance, signal swing and switching speed described in [28] , design of layout to meet the target specifications mentioned in section III for switchedcapacitor DACs is challenging. The main layout challenges can be described. For high resolution DAC, 12 bit for example, there are a large number of unit cells distributed in large layout areas. In order to provide clock to them with minimum skew and mismatch is one of the main challenges. To implement clock distribution as shown in Fig. 29 optimization of numbers of buffers and their sizes is required, especially at high frequency (> 1 GHz). Parasitic of routings could affect unit capacitance and onresistance so it is necessary to position unit cells in a good way and use high layers of metals for minimizing parasitic.
The output-signal connections of the DAC also require care. The top metal layers and a tree architecture are used to equalize and minimize parasitics. The supply routing to the switch drivers and unit cells follows a similar strategy to reduce supply effects. To reduce mismatch and parasitics, the active devices have to be placed close together, which limits the space available for the internal decoupling capacitors that keep the internal supplies stable.
The SC DAC operates at up to 5 GS/s with a 1 GHz bandwidth (BW) and achieves 44 dB SFDR, as shown in Fig. 31. In the fabricated chip, higher SSN and worse PSRR than expected degrade the linearity of the DAC. The SFDR could be improved by reducing SSN with more advanced packaging techniques such as flip-chip or 3-D packaging. Another way to reduce the SSN effect is to add circuits that improve the PSRR of the SC arrays. The nonlinearity of the output driver is not the most critical factor here, since the output swing is low (300 mVpp) and linearization techniques are applied. At lower frequency (fs = 1 GS/s, BW = 16 MHz), a better SFDR of 64 dB is achieved, as shown in Fig. 32.
As mentioned above, because the memory holds only a 64-point-deep signal, the minimum frequency bin spacing in the input signal is fs/64. Fig. 33 shows the measured IM3 of -38 dBc at fs = 5 GS/s for two tones at 860 MHz and 1.02 GHz. The measured SFDR and IM3 of this work, together with a comparison with other works, are shown in Fig. 34. An SFDR > 44 dB is achieved for input-signal bandwidths up to 1 GHz. The SFDR is better than 50 dB for input frequencies up to 800 MHz, while IM3 is lower than -50 dBc for fin < 700 MHz. This design achieves the highest speed among previously reported SC DACs of the same type. The total power consumption is 50 mW from 1 V digital (40 mW) and 1.2 V analog (10 mW) supplies. The power and area breakdown of the SC DAC is shown in Table III. The total core area including clock distribution is approximately 0.2 mm².
To demonstrate the scalability of this SC DAC to more advanced technologies, it is also measured at a lower supply of 0.9 V. The SFDR achieved at the low supply is similar to that at the standard 1.2 V supply, as shown in Fig. 35. The power dissipation and output swing are lower in this case.
The measured SFDR versus output frequency fout for different clock frequencies fs is shown in Fig. 36. The SFDR from low frequency to Nyquist is plotted for fs = 1 GHz, 3 GHz and 5 GHz. SFDR degrades with both BW and clock frequency. However, BW appears to be more critical to the SFDR degradation in the higher-frequency region (the 5 GHz case). Additionally, the simultaneous switching noise (SSN) described in Section III has a larger effect at high frequencies (> 3 GHz) and contributes more to that degradation. Fig. 37 shows the measured IM3 versus tone spacing for different output frequencies fout at fs = 5 GHz.
The higher the clock frequency, the wider the tone spacing that can be tested. With fs = 5 GHz, the tone spacings are integer multiples of fs/64 = 78.125 MHz. In general, IM3 does not vary much unless the spacing is too wide (> 1 GHz), as shown in Fig. 37.
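The bin grid also fixes where the two test tones and their third-order intermodulation products fall. The Python sketch below is a hedged illustration: it assumes that the quoted 860 MHz and 1.02 GHz tones are rounded values of the nearest fs/64 bins, and the helper names are hypothetical.

```python
# Illustrative sketch: two-tone placement on the fs/64 grid and the
# resulting IM3 product frequencies. Helper names are hypothetical.

DEPTH = 64  # samples per period of the replayed record


def nearest_bin(f_hz, fs_hz, depth=DEPTH):
    """Index of the grid bin closest to the requested frequency."""
    return round(f_hz * depth / fs_hz)


def im3_bins(k1, k2):
    """Bins of the third-order products 2*f1 - f2 and 2*f2 - f1."""
    return 2 * k1 - k2, 2 * k2 - k1


fs = 5e9
df = fs / DEPTH                    # 78.125 MHz per bin
k1 = nearest_bin(860e6, fs)        # assumed to be the bin behind "860 MHz"
k2 = nearest_bin(1.02e9, fs)       # assumed to be the bin behind "1.02 GHz"
lo, hi = im3_bins(k1, k2)

print(k1, k2)                      # 11 13
print(k1 * df / 1e6, k2 * df / 1e6)  # 859.375 1015.625
print(lo * df / 1e6, hi * df / 1e6)  # 703.125 1171.875
```

Under this assumption the tone spacing is two bins (156.25 MHz), and both IM3 products land on grid bins below Nyquist, so they can be read directly from the measured spectrum.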
This SC DAC can be used in the baseband transmitter for wideband wireless communications such as UWB and 60-GHz radios. To show the potential of this design for 60-GHz radio, the spectral mask of the 16-QAM modulation scheme of the IEEE 802.11ad (WiGig) standard is measured. Single-carrier 16-QAM encoded random data with a frequency bin spacing of approximately 80 MHz between 0 and 880 MHz is first generated and pulse-shaped in Matlab with an 18th-order root-raised-cosine (RRC) filter having a 0.25 roll-off factor [25]. Fig. 38 shows the measured spectral mask under these conditions at 5.28 GS/s operation. It can be observed that the mask of the IEEE 802.11ad (WiGig) standard is met and that the out-of-band quantization noise of the SC DAC is not a limiting factor. The measured INL and DNL are 0.1 LSB and 0.08 LSB, respectively, at the 12-bit level. This shows that the linearity of the DAC is limited not by static nonlinearity but by dynamic nonlinearity, mainly from SSN.
The measurement results show a total output noise of -72 dBm, equivalent to a PSD of -169 dBc/Hz at 5 GHz sampling frequency. The thermal noise at the DAC output is dominated by the output driver, since this number is close to the one in the output driver section (PSD = -173 dBc/Hz). The kT/C noise of the SC arrays is therefore negligible, which agrees with the analytical results shown in Fig. 14a. For comparison with other works, we use the figures of merit (FOMs) listed in [5]. Of the five FOM definitions, only FOM3 is generic, i.e. suitable for any type of DAC:
$$\mathrm{FOM}_3 = \frac{V_{swing} \cdot BW \cdot 10^{SFDR/20}}{P_{total}}$$
where BW is the signal bandwidth, Ptotal is the total power consumption, and Vswing and SFDR are the differential output swing and spurious-free dynamic range of the DAC, respectively. Table IV shows a comparison with recent state-of-the-art works. For SC DACs, only two designs have recently been reported [10, 11]. The clock frequencies in those designs do not exceed 800 MHz and the signal bandwidths do not exceed 400 MHz. This design achieves the highest speed among previously reported SC DACs of the same type. The FOM3 of this work is calculated at 5 GS/s for two input frequencies fin of 800 MHz and 1 GHz, where the SFDR is 50 dB and 44 dB, respectively.
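The two FOM3 operating points can be compared with a small Python sketch. The formula used here, Vswing · BW · 10^(SFDR/20) / Ptotal, is an assumption based on the symbols defined above and may differ in form or units from the definition in [5]. The 0.3 V swing is taken from the 300 mVpp output swing quoted earlier and is assumed to be the differential value. Only the ratio between the two points is examined, so the absolute scale does not matter.

```python
# Illustrative sketch: relative FOM3 at the two reported operating points.
# The FOM3 expression is an ASSUMED form, not confirmed against [5].


def fom3(v_swing_v, bw_hz, sfdr_db, p_total_w):
    """Assumed FOM3: swing * bandwidth * 10^(SFDR/20) / total power."""
    return v_swing_v * bw_hz * 10 ** (sfdr_db / 20) / p_total_w


V_SWING = 0.3     # V, from the 300 mVpp output swing (assumed differential)
P_TOTAL = 50e-3   # W, total measured power

fom_800m = fom3(V_SWING, 800e6, 50, P_TOTAL)  # fin = 800 MHz, SFDR = 50 dB
fom_1g = fom3(V_SWING, 1e9, 44, P_TOTAL)      # fin = 1 GHz,  SFDR = 44 dB

# Ratio of the two points; independent of the units chosen for FOM3.
print(round(fom_800m / fom_1g, 2))
```

Under this assumed form, the 6 dB SFDR loss between 800 MHz and 1 GHz outweighs the 25% bandwidth gain, so the 800 MHz point gives the larger FOM3.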
The linearity of this design is poorer than expected because of higher SSN (worse PSRR) in the fabricated chip. The SFDR could be improved by reducing SSN with more advanced packaging techniques such as flip-chip or 3-D packaging. Another way to reduce the SSN effect is to add circuits that improve the PSRR of the SC arrays.
VI. CONCLUSION
This work proposes the split-segmented SC DAC topology, which is more suitable for high-resolution and high-speed DACs than pipeline SC DACs. Detailed analyses of kT/C noise, capacitor mismatch, settling time, and switching noise in the SC array are presented to define the design area W-Cu and to find a suitable design point (W, Cu). It also proposes an open-loop output driver, in which both third- and second-order distortions are cancelled, to drive the SC DAC arrays. Among reported capacitive DACs, this design achieves the highest clock frequency (5 GS/s) and the highest BW (1 GHz), with an SFDR of 44 dB. SSN is the most critical factor degrading the linearity of SC DACs in high-frequency operation. There is room to improve the linearity of the SC DAC by improving PSRR. This design has potential for use in the baseband of transmitters for wideband wireless communications such as UWB and 60-GHz radios.
The total impedance of the second binary segment and Catten2 is found
The total impedance of the first, second binary segments and Catten1, Catten2 is
