This paper presents a 6-bit 4 GS/s current-steering digital-to-analog converter (DAC) for wideband systems. A 4-2 segmented structure is adopted for glitch reduction, and a dynamic decoder is proposed to maintain low power consumption and small area. To improve the high-frequency dynamic linearity, the forward-bias technique is employed to reduce the device sizes, and a compact one-dimensional (1-D) current source unit is used to further minimize the parasitic capacitance. The DAC is fabricated in a 40-nm low-leakage CMOS process and occupies an active area of 0.036 mm². Over the entire Nyquist range, measurement results show a spurious-free dynamic range (SFDR) of >39 dB at a 2 GS/s sampling rate and >29 dB at 4 GS/s. The DAC consumes 28 mW from a 1.1 V supply.
Introduction
High-speed digital-to-analog converters (DACs) are in high demand in modern wideband communication systems, such as ultra-wideband (UWB) systems [1, 2, 3]. DACs with medium resolution (∼6 bits) and sampling rates beyond GS/s are preferred for better signal processing. The current-steering topology is widely adopted in GS/s DACs thanks to the inherent advantages of its high-speed open-loop structure and low-impedance switched nodes [1]. High dynamic linearity is critical for high-speed DACs in wideband applications.
The binary structure [3, 4] is often employed to simplify the digital building blocks and save power, but timing misalignment between the weighted cells may occur, which deteriorates the dynamic performance [5, 6]. The pseudo-segmented binary architecture [1] was proposed to balance the propagation delays of the different signals among the cells. However, large glitches at mid-code transitions still exist in both full-binary architectures. Current-steering DACs therefore often employ a segmented structure to improve performance. Fig. 1 compares the linearity of different segmentation schemes for 6-bit DACs. The figure shows that adding bits to the LSB binary section increases the DNL errors and degrades both static and dynamic linearity. The distortion due to non-ideal glitch errors increases by a factor of two with every additional binary bit [5]. Thus, the full-binary scheme is seldom used because of its large transient glitches. The full-unary scheme seems to be the best choice. However, the growing digital complexity of the unary decoder dramatically increases power consumption and area. Moreover, the conventional unary implementation lowers the DAC speed, which limits operation beyond GS/s. The target here is to design a DAC with high speed, wideband linearity, low power consumption and small area. Since more unary bits give better linearity, the proposed design employs the 4(unary)-2(binary) segmented scheme instead of the 3-3 segmented scheme [2, 7, 8] to achieve better dynamic performance. To relax the design constraints caused by the high level of segmentation, a decoder based on true-single-phase-clock (TSPC) [9] dynamic pipelined decoding logic is proposed.
Thanks to the improved decoder, the DAC enables high-speed operation as well as low power and small area.
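The trade-off between segmentation schemes can be illustrated with a small numerical sketch. The code below is not taken from the paper: it assumes, as a crude proxy for glitch energy, that the disturbance at a code transition scales with the total current weight (in LSBs) of the cells that toggle. The function names `split_code`, `switched_weight` and `worst_case_step_weight` are hypothetical.

```python
def split_code(code, unary_bits=4, binary_bits=2):
    """Split a code into (number of unary cells on, value of the binary LSBs)."""
    total_bits = unary_bits + binary_bits
    if not 0 <= code < (1 << total_bits):
        raise ValueError("code out of range")
    unary_count = code >> binary_bits               # each unary cell weighs 2**binary_bits LSBs
    binary_value = code & ((1 << binary_bits) - 1)  # binary-weighted LSB section
    return unary_count, binary_value


def switched_weight(prev_code, new_code, unary_bits, binary_bits):
    """Total weight (in LSBs) of all cells that toggle between two codes.

    Assumption: used here only as a rough proxy for glitch size.
    """
    prev_unary, prev_binary = split_code(prev_code, unary_bits, binary_bits)
    new_unary, new_binary = split_code(new_code, unary_bits, binary_bits)
    unary_weight = abs(new_unary - prev_unary) * (1 << binary_bits)
    toggled = prev_binary ^ new_binary
    binary_weight = sum(1 << i for i in range(binary_bits) if (toggled >> i) & 1)
    return unary_weight + binary_weight


def worst_case_step_weight(unary_bits, binary_bits):
    """Largest switched weight over all single-LSB code steps."""
    n_codes = 1 << (unary_bits + binary_bits)
    return max(
        switched_weight(c, c + 1, unary_bits, binary_bits)
        for c in range(n_codes - 1)
    )


if __name__ == "__main__":
    for unary_bits, binary_bits in [(0, 6), (3, 3), (4, 2), (6, 0)]:
        worst = worst_case_step_weight(unary_bits, binary_bits)
        print(f"{unary_bits}-{binary_bits} segmentation: worst-case switched weight = {worst} LSB")
```

Under this proxy, each binary bit moved into the unary section roughly halves the worst-case switched weight, which is in line with the factor-of-two trend quoted from [5]. The sketch says nothing about decoder power or area, which is the cost side of the trade-off.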
For GS/s DACs, the current source (CS) unit should provide sufficient output impedance (Zout) up to the Nyquist frequency to achieve good dynamic linearity. At high frequency, the harmonic distortion induced by the signal-dependent Zout of the CS unit greatly degrades performance. Meanwhile, at frequencies beyond a GHz, the cascode structure used in the CS unit may be less effective because of serious parasitic effects [10]. The parasitic capacitance at both the source and the drain of the cascode device in the CS unit should be minimized to suppress the distortion. To alleviate this problem, a 1-dimensional (1-D) CS unit is first used to achieve a compact layout with minimum parasitic capacitance. The forward-bias technique [11] is further employed to reduce the device sizes and maintain a large voltage headroom. The reduction of device sizes minimizes the parasitic capacitance in the CS unit, so a better spurious-free dynamic range (SFDR) can be obtained over a wide frequency range.
Based on the above techniques, this paper presents a 6-bit 4-2 segmented current-steering DAC in 40-nm CMOS technology, and Fig. 2 depicts its architecture. The DAC core is composed of a digital decoding block, switch drivers and an analog current source array (CSA). To test the DAC easily and flexibly, a design-for-test (DFT) block is integrated on chip to generate high-speed digital patterns. Section 2 describes the building blocks in detail, and section 3 shows the measurement results. Finally, conclusions are drawn in section 4.
Dynamic decoder
In current-steering DAC design, the segmented structure provides a balanced trade-off among area, power, speed, complexity and linearity. Conventional row-column decoders based on CMOS logic gates, as in Fig. 3(a), are often used, but their long critical paths induce delay mismatches among different signals, which dramatically degrades linearity [6]. In Fig. 3(b), the long logic paths are decomposed into pipelined parts for high-speed operation. With the help of flip-flops, the pipelined approach enables conversion rates beyond GS/s and synchronizes the signals. The high-speed flip-flops are often implemented with TSPC logic [9] to save power and area. It is worth noting that the combinational logic between the flip-flops is not strictly necessary: embedding the decoding logic into the dynamic flip-flops removes the inherent logic paths. Based on this consideration, this paper proposes the improved TSPC dynamic pipelined decoding logic shown in Fig. 3(c).
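The row-column decoding idea can be sketched functionally as follows. This is a behavioral model under stated assumptions, not the circuit of Fig. 3: it assumes the 4 unary bits are split into 2 row bits and 2 column bits, that the first stage converts each pair into a thermometer code, and that the second stage combines them with one AND-OR gate per cell. In the pipelined version each stage would be registered by flip-flops, which is not modeled here. The names `binary_to_thermometer_2bit` and `row_column_decode` are hypothetical.

```python
def binary_to_thermometer_2bit(msb, lsb):
    """First stage: 2-bit binary to 3-bit thermometer code.

    Uses only OR, BUFFER and AND: [value >= 1, value >= 2, value >= 3].
    """
    return [msb or lsb, msb, msb and lsb]


def row_column_decode(code4):
    """Second stage: AND-OR per cell, giving 15 unary cell controls."""
    if not 0 <= code4 < 16:
        raise ValueError("expected a 4-bit code")
    b3, b2, b1, b0 = (bool((code4 >> i) & 1) for i in (3, 2, 1, 0))

    # row[r] is True when the row value (b3 b2) is >= r; row[0] is always
    # True and row[4] is always False so that edge cells need no special case.
    row = [True] + binary_to_thermometer_2bit(b3, b2) + [False]
    # col[c] is True when the column value (b1 b0) is > c.
    col = binary_to_thermometer_2bit(b1, b0) + [False]

    cells = []
    for k in range(15):                 # cell k sits at row r, column c
        r, c = divmod(k, 4)
        # Cell is on if a higher row is selected, or this row is selected
        # and the column value exceeds c (AND-OR gate).
        cells.append(row[r + 1] or (row[r] and col[c]))
    return cells


if __name__ == "__main__":
    for code in (0, 5, 15):
        bits = "".join("1" if on else "0" for on in row_column_decode(code))
        print(f"code {code:2d} -> {bits}")
```

For every 4-bit input the number of active cells equals the input value, and the cells turn on in order, which is the thermometer property the unary section relies on.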
TSPC logic enables higher operating speed than static CMOS logic and consumes less power than CML logic. Fig. 4(a) depicts a general TSPC logic stage. The basic elements include the TSPC p/n-latch (P-C²MOS/N-C²MOS) and the n/p-precharge/evaluation stages (N-block/P-block), which operate in a dynamic pipelined way. Fig. 4(b) shows that the TSPC p-latch works in pass mode when CK = 0 and in latch mode when CK = 1, and vice versa for the n-latch. Embedding the static decoding logic of Fig. 3(b) into the TSPC logic yields decoding flip-flops (DeFFs). This is feasible because the logic functions in the 2-stage row-column decoder are AND, BUFFER, OR and AND-OR. Since a P-block causes more delay than an N-block of the same size, it is beneficial to leave it as a passing stage. Fig. 4(c) shows the DeFFs built with the p/n-latch and the N-block. A group of DeFFs is obtained by putting the above gates into the N-blocks. The dynamic decoder allows the DAC to be implemented with an optimum 4-2 segmented structure, so less glitch energy and higher linearity can be achieved than with the binary scheme. Compared with a CML implementation [8, 9], the dynamic decoder draws less current and consumes less power. Overall, future high-speed DACs can benefit from the presented implementation.
The conventional 2-D CS unit in Fig. 6(a) separates the current source cell from the cascode and switch cells for good matching yield [12]. Both the drain (C1) and the source (C0) parasitic capacitances of Mcas lead to distortion at high frequency. C1 is dominated by the parasitic source capacitance (Csrsw) of Msw and the drain capacitance (Cdrcas) of Mcas. In the 2-D CS unit, C0 mainly consists of the parasitic capacitance (Cwire) of the interconnection wires and the intrinsic capacitance (Cdrcs) of the current source device. Since C0 may appear at the output of the DAC through the parasitics of Mcas (Cgs, Cgd, Cdb) [10], not only C1 but also C0 should be minimized for better dynamic linearity, especially at sampling rates beyond GS/s.
A compact 1-D CS unit structure, shown in Fig. 6(b) and similar to [4], is first employed to eliminate Cwire. The switch pair, cascode transistor and current source are arranged in a compact strip layout, which removes the large Cwire and greatly reduces C0. Secondly, the dimensions of the transistors in the 1-D CS unit are further reduced using the forward-bias technique.
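The sensitivity of the high-frequency output impedance to parasitic capacitance can be seen from a first-order sketch. The model below is a deliberate simplification and not the paper's analysis: it lumps the parasitics into a single capacitance in parallel with a low-frequency output resistance, and all component values (`R0_OHM`, `C_LARGE_F`, `C_SMALL_F`) are hypothetical placeholders rather than extracted values.

```python
import math


def zout_magnitude(freq_hz, r0_ohm, c_par_farad):
    """|Z| of a resistance r0 in parallel with a capacitance c_par.

    Z = r0 / (1 + j*omega*r0*c), so |Z| = r0 / sqrt(1 + (omega*r0*c)**2).
    """
    omega = 2.0 * math.pi * freq_hz
    return r0_ohm / math.sqrt(1.0 + (omega * r0_ohm * c_par_farad) ** 2)


# Hypothetical values, chosen only to illustrate the trend.
R0_OHM = 1e6          # assumed low-frequency output resistance
C_LARGE_F = 20e-15    # assumed lumped parasitic with long wiring
C_SMALL_F = 10e-15    # assumed lumped parasitic after compaction

if __name__ == "__main__":
    for freq_hz in (1e8, 1e9, 2e9):
        z_large = zout_magnitude(freq_hz, R0_OHM, C_LARGE_F)
        z_small = zout_magnitude(freq_hz, R0_OHM, C_SMALL_F)
        print(f"{freq_hz / 1e9:.1f} GHz: |Z| = {z_large:.0f} ohm vs {z_small:.0f} ohm")
```

Well above the pole frequency the impedance is set almost entirely by the capacitance, so in this simplified picture halving the parasitic capacitance roughly doubles |Zout| at GHz frequencies, regardless of how large the low-frequency resistance is.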
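On the one hand, the size of Mcs can be reduced to diminish Cdrcs. The size of the current source transistor needed to guarantee the matching requirement [12] is calculated with Eq. (1), written here in the standard mismatch-based form:

$$ (WL)_{\min} = \frac{1}{2\left(\sigma_I/I\right)^2}\left[A_\beta^2 + \frac{4A_{VT}^2}{V_{OV}^2}\right] \tag{1} $$

where σ_I/I is the required relative standard deviation of the unit current, A_β and A_VT are the process mismatch parameters, and V_OV = V_GS − |V_th| is the overdrive voltage. The threshold voltage is given by Eq. (2):

$$ |V_{th}| = |V_{th0}| + \gamma\left(\sqrt{2|\phi_f| - V_{bs}} - \sqrt{2|\phi_f|}\right) \tag{2} $$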
where |V_th0| is |V_th| at zero source-bulk voltage, γ is the body-effect coefficient, and |φ_f| is the bulk Fermi potential. From Eq. (2), one way to reduce the threshold voltage of a MOS transistor is to forward bias the bulk-source junction. In the conventional NMOS CS unit, V_s and V_b are both tied to ground, so no body bias is applied. With the forward-bias technique, the threshold voltage decreases as V_bs increases. According to Eq. (1), the overdrive voltage V_OV is then increased without consuming extra voltage headroom, which allows the CS area to be reduced at constant gate voltage and unit current. On the other hand, the body voltages of the cascode device (Vbcas) and the switch pair (Vbsw) in the conventional CS unit are connected to ground as well. However, the source voltages of Mcas and Msw are both larger than 0, typically several hundred millivolts during normal operation. According to Eq. (2), V_bs < 0 increases the threshold voltage V_th and hence the transistor sizes. With the forward-bias technique, the bodies of the cascode device and the switch pair are also biased, which preserves a large voltage headroom and reduces their sizes. Fig. 7(a) shows the improved CS unit with the forward-bias technique, and Fig. 8(a) depicts the area reduction of the current source for different forward-bias junction voltages V_bs. The area is normalized to the zero-bias case (V_bs = 0). The figure clearly shows that forward biasing the bulk-source junction of the NMOS current source reduces the required area. Monte Carlo simulation is carried out to maintain the 99.7% INL yield. The body voltages can be tuned separately in the deep N-well (DNW). The forward-bias voltage of the current source, Vbcs, is set to 0.3 V to avoid turning on the p-n junction, and Fig. 7(b) shows the bias generation circuits. A replica of the CS unit generates the bias voltages Vbcas and Vbsw, which set V_bs of Mcas and Msw to 0.
This configuration avoids the extra source parasitic capacitance induced by a direct connection to the local substrate. Compared with the conventional CS unit with V_b = 0 for each transistor, the sizes of the improved 1-D CS unit can be reduced thanks to the increased voltage headroom, and C1 and C0 can be further diminished. Fig. 8(b) shows the frequency response of the output impedance of the CS unit, where the grey line represents the conventional 2-D CS unit in Fig. 6(a). With the above techniques, the frequency response of the proposed 1-D CS unit becomes the blue line. With the minimized parasitic capacitances C0* and C1*, the output impedance is improved at high frequency. The feed-through effect is also alleviated owing to the size reduction of the switch pair. Overall, better high-frequency dynamic linearity can be obtained with the presented 1-D CS unit.
Fig. 7. Improved 1-D CS unit with bias generation circuits: (a) improved 1-D CS unit; (b) bias generation circuits.
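The effect of forward body bias on threshold voltage and current source area can be sketched numerically. The code assumes the standard body-effect expression for Eq. (2) and a Pelgrom-type sizing rule, area ∝ A_β² + 4A_VT²/V_OV², for Eq. (1). Only the 385 mV nominal NMOS threshold comes from the text; the body-effect coefficient, Fermi potential, gate-source voltage and mismatch parameters are hypothetical placeholders, so the printed numbers indicate a trend and are not a reproduction of Fig. 8(a).

```python
import math

VTH0_V = 0.385        # nominal NMOS threshold quoted in the measurement section
GAMMA = 0.3           # assumed body-effect coefficient, in V**0.5
TWO_PHI_F_V = 0.8     # assumed 2*|phi_f|, in volts
VGS_V = 0.6           # assumed fixed gate-source voltage
A_VT = 3.5e-3         # assumed threshold mismatch parameter, V*um
A_BETA = 0.01         # assumed current-factor mismatch parameter, um


def threshold_voltage(vbs_v):
    """Body-effect expression: Vth = Vth0 + gamma*(sqrt(2phi - Vbs) - sqrt(2phi))."""
    if vbs_v > TWO_PHI_F_V:
        raise ValueError("Vbs beyond the range where the expression is defined")
    return VTH0_V + GAMMA * (math.sqrt(TWO_PHI_F_V - vbs_v) - math.sqrt(TWO_PHI_F_V))


def relative_area(vbs_v):
    """Required gate area at bias vbs, normalized to the zero-bias case."""
    def area_term(vbs):
        v_ov = VGS_V - threshold_voltage(vbs)   # overdrive at constant VGS
        return A_BETA ** 2 + 4.0 * A_VT ** 2 / v_ov ** 2
    return area_term(vbs_v) / area_term(0.0)


if __name__ == "__main__":
    for vbs in (-0.3, 0.0, 0.1, 0.2, 0.3):
        print(f"Vbs = {vbs:+.1f} V: Vth = {threshold_voltage(vbs) * 1e3:.0f} mV, "
              f"relative area = {relative_area(vbs):.2f}")
```

The same expression covers both cases discussed above: a positive V_bs lowers the threshold and shrinks the required area, while the negative V_bs seen by a grounded-body cascode or switch raises the threshold.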
Measurement results
The presented 6-bit DAC is implemented in a 40-nm low-leakage CMOS process with nominal Vth of 385 mV/450 mV for NMOS/PMOS, respectively. Fig. 9 shows the micrograph of the chip and the corresponding layout view. The DAC core occupies 0.036 mm². Single-tone and two-tone tests are performed to characterize the dynamic performance. A transformer converts the differential DAC output to the single-ended input of the spectrum analyzer. Fig. 11(a) shows the output spectrum for a near-Nyquist signal at a sampling rate of 2 GS/s, and Fig. 11(b) shows the spectrum for an input signal frequency of 1.5 GHz at 4 GS/s. The measured SFDR/SNDR are 39.9 dB/33.8 dB and 36.7 dB/29 dB, respectively. As the signal frequency increases, nonlinear distortion is induced by the poor matching of the output traces on the test board and the parasitic effects of the bonding wires. The SFDR and SNDR as functions of signal frequency at conversion rates of 2 GS/s and 4 GS/s are summarized in Fig. 12. Over the entire Nyquist range, the DAC achieves an SFDR of >39 dB at 2 GS/s and >29 dB at 4 GS/s. The measured two-tone spectrum with tones at 1.05 GHz and 1.064 GHz at an update rate of 4 GS/s is shown in Fig. 13, and the third-order intermodulation distortion (IMD3) is less than −50 dB. Comparisons between the presented DAC and state-of-the-art high-speed DACs are shown in Table I. Although the 3-3 segmented DACs in [7, 8] are fabricated in a more advanced 28-nm CMOS technology, the proposed DAC shows lower power consumption and comparable area with 4-2 segmentation, demonstrating the benefits of the improved decoder. The proposed work shows an SFDR similar to that of [7] but at a higher sampling rate. The DAC in [8] achieves a higher conversion rate, but with poor dynamic linearity across the Nyquist band. The 6-bit binary DAC in [4] is not competitive in terms of power and area because of its current source calibration scheme.
Even with 4-2 segmentation, the presented DAC with the forward-bias technique avoids redundant calibration circuits and obtains better performance in terms of power, area and speed. The pseudo-segmented 6-bit DAC in [1] achieves the highest dynamic linearity; however, the present work shows a higher sampling rate and competitive area with 4-2 segmentation. The DAC in [3] achieves the highest operating speed, but its resolution is only 4 bits. To compare these high-speed DACs, the figure of merit (FoM) in [1] is adopted. The table shows that the presented DAC occupies a smaller area and obtains a lower FoM than the other GS/s DACs in the comparison.
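The reported figures can be put in context with two textbook relations, sketched below: the effective number of bits, ENOB = (SNDR − 1.76)/6.02, and the locations of the third-order intermodulation products, 2f1 − f2 and 2f2 − f1. These relations are general and are not stated in the paper; the helper names `enob`, `ideal_snr_db` and `imd3_products` are hypothetical. The FoM of [1] is not reproduced because its definition is not given in the text.

```python
def ideal_snr_db(bits):
    """Ideal quantization-limited SNR of a full-scale sine, in dB."""
    return 6.02 * bits + 1.76


def enob(sndr_db):
    """Effective number of bits from a measured SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def imd3_products(f1_hz, f2_hz):
    """Frequencies of the two third-order intermodulation products."""
    return 2 * f1_hz - f2_hz, 2 * f2_hz - f1_hz


if __name__ == "__main__":
    print(f"ideal 6-bit SNR: {ideal_snr_db(6):.2f} dB")
    for rate, sndr_db in (("2 GS/s", 33.8), ("4 GS/s", 29.0)):
        print(f"{rate}: SNDR {sndr_db} dB -> ENOB {enob(sndr_db):.2f} bits")

    # Two-tone test frequencies quoted in the text, in Hz.
    low, high = imd3_products(1_050_000_000, 1_064_000_000)
    print(f"IMD3 products at {low / 1e9:.3f} GHz and {high / 1e9:.3f} GHz")
```

With the quoted two-tone frequencies, both IMD3 products fall within 14 MHz of the tones and well inside the 2 GHz Nyquist band at 4 GS/s, so they cannot be removed by filtering and are a meaningful in-band linearity measure.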
Conclusions
The paper presents a 6-bit current-steering DAC for wideband systems. The optimum 4-2 segmented scheme is adopted using the proposed dynamic decoder for low power consumption and small area. The 1-D compact CS structure with forward-bias technique further improves the dynamic linearity. Measurement results show the presented DAC achieves >29 dB SFDR over the entire Nyquist range at 4 GS/s sampling rate. The proposed techniques can be beneficial for future wideband DAC designs. 
