This work presents a 12-bit Nyquist-rate current-steering CMOS digital-to-analog converter (DAC), an essential block in the baseband section of wireless transmitter circuits. Exploiting the oversampling ratio (OSR) of the proposed DAC avoids the need for an active analog reconstruction filter. The optimum segmentation (75%) is used to obtain the best DNL and to reduce glitch energy; this segmentation ratio also guarantees monotonicity. Higher performance is achieved with a new 3-D thermometer decoding method, which reduces the area, power consumption, and number of control signals of the digital section. Using two digital channels in parallel helps reach a 1-GSample/s update rate. Simulation results show that the spurious-free dynamic range (SFDR) at the Nyquist rate is better than 64 dB for sampling frequencies up to 1 GSample/s. The analog supply voltage is 3.3 V, while the digital part of the chip operates at only 2.4 V. Total power consumption at the Nyquist rate is 144.9 mW. The chip has been processed in a standard 0.35 µm CMOS technology. The active area of the chip is 1.37 mm².
Introduction
Rapid progress in wireless communications and image signal processing requires designers to put an increasing amount of effort into integrating digital and analog systems on a single chip (SoC). High-performance DACs find applications in wireless transceivers such as Wireless Local Area Networks (WLAN) and Wireless Metropolitan Area Networks (WMAN), in image signal processors such as High Definition Television (HDTV), in digital signal synthesizers, and elsewhere. CMOS current-mode DACs are the natural candidates for such applications because of their high speed, low power, and cost effectiveness [1]. The number of WLAN products on the market is increasing. WLAN infrastructure, such as access points connected to the internet, is now found everywhere in homes, offices, and public spaces such as WLAN hotspots. New services and applications are being created by connecting various kinds of WLAN products to this infrastructure. Figure 1 shows the typical structure of a direct conversion (zero-IF) transmission chain for wireless applications.
Two DACs are needed to convert the I and Q digitally modulated signals coming from the digital signal processor (DSP) into analog waveforms, which are smoothed by the low-pass reconstruction filters that follow. These baseband signals are then shifted to radio frequency (RF) by two quadrature mixers and summed to obtain the final waveform, which is transmitted at the antenna after amplification by the power amplifier (PA) [2]. The baseband section of such telecom-standard transmitters typically consists of a cascade of a digital-to-analog converter (DAC), which receives the DSP bit-stream, and an analog reconstruction filter, which has to suppress the DAC spectral images. A digital interpolation filter is placed between the DSP (which typically operates at the Nyquist frequency) and the DAC to raise the data rate to the desired value. The design of such a baseband section for wideband wireless communication systems has to optimize the trade-off between two possible approaches. A low DAC conversion frequency implies a low-power interpolation filter but demands a high-order, power-hungry analog reconstruction filter. A high DAC conversion frequency implies a digital filter with a high interpolation factor, which relaxes the required performance of the analog smoothing filter. This trade-off is presently optimized with a DAC data rate of about 8-10 times the signal bandwidth and a 4th- to 6th-order analog reconstruction filter. For instance, in the case of the WLAN IEEE 802.11a standard (whose signal bandwidth is equal to 10 MHz), the DAC data rate is around 100 MHz, as illustrated in Figure 2 [2] [3] [4].
With the upcoming higher data rate standards (IEEE 802.16 and 802.11n, for instance), future implementations will face several critical issues in this baseband section architecture. As the new standards will present a larger signal bandwidth (25 MHz for the upcoming IEEE 802.16, for instance [5]), the use of traditional transmission (TX) baseband architectures will make the design of the analog filters increasingly critical, since their cut-off frequency has to be increased (with an increasing sensitivity to the lower CMOS gain and to the non-dominant poles) [6]. Figure 3 shows the approach of this work, which exploits the DAC oversampling ratio (OSR) to avoid the use of an active analog reconstruction filter [2]. To this end, the DAC conversion frequency is increased up to 1 GHz.
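To illustrate why a higher conversion frequency relaxes the reconstruction filter, the following sketch computes where the first spectral image begins and how much an ideal zero-order-hold output already attenuates it. The zero-order-hold (sinc) model and the function names are assumptions made for illustration; the text does not state the output pulse shape of the DAC. The two cases use the bandwidths and rates quoted above.

```python
import math

def first_image_hz(bandwidth_hz, fs_hz):
    """Lowest frequency occupied by the first spectral image of a baseband signal."""
    return fs_hz - bandwidth_hz

def zoh_attenuation_db(f_hz, fs_hz):
    """Gain in dB (0 or negative) of an ideal zero-order hold at f_hz.

    Assumption: the DAC output is an ideal non-return-to-zero staircase,
    whose frequency response is sin(x)/x with x = pi * f / fs.
    """
    x = math.pi * f_hz / fs_hz
    if x == 0.0:
        return 0.0
    return 20.0 * math.log10(abs(math.sin(x) / x))

cases = [
    ("10 MHz band at 100 MSample/s", 10e6, 100e6),
    ("25 MHz band at 1 GSample/s", 25e6, 1e9),
]
for label, bandwidth, fs in cases:
    f_img = first_image_hz(bandwidth, fs)
    gain = zoh_attenuation_db(f_img, fs)
    print(f"{label}: first image starts at {f_img / 1e6:.0f} MHz, "
          f"zero-order-hold gain there is {gain:.1f} dB")
```

Under this model the image moves further from the signal band and is attenuated more strongly by the hold function as the update rate rises, which is the effect the architecture relies on.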
High Speed Conventional Current-Steering DACs
Binary Weighted Architecture vs. Unary Decoded Architecture
Current-steering DACs are based on an array of matched current sources which are either unary decoded or binary weighted [7]. As shown in Figure 4, the reference source is replicated in each branch of the DAC, and each branch current is switched on or off according to the input code. In the binary version, the reference current is multiplied by a power of two, creating larger currents to represent the more significant bits. In the unit-element version, each branch produces an equal current, and thus $2^N$ current source elements are needed. The performance of the DAC is specified through static parameters (Integral Non-Linearity (INL), Differential Non-Linearity (DNL) and parametric yield) and dynamic parameters (glitch energy, settling time and SFDR) [8]. Static performance is mainly dominated by systematic and random errors. Systematic errors, caused by slowly varying process, temperature and electrical gradients, are almost cancelled by proper layout techniques [9]. Random errors are determined solely by mismatch due to fast variation gradients. Advantages and disadvantages of these structures are 
Segmented DAC Structure
Usually, to combine the clear advantages of the thermometer-coded architecture with a small area, a compromise is found by using segmentation [10]. The DAC is divided into two sub-DACs, one for the MSBs and one for the LSBs. Thermometer coding is used for the MSBs, where accuracy is needed most. Because of the reduced number of bits in this section, its size is considerably smaller than that of a fully thermometer-coded design. The LSB section can use either the binary-weighted or the thermometer-coded approach. We refer to a fully binary-weighted design as 0% segmented and to a fully thermometer-coded design as 100% segmented. The design of a current-steering DAC starts with an architectural selection to find the optimum segmentation ratio (m over n) that minimizes the overall digital and analog area [10] [11] [12]. The INL is independent of the segmentation ratio and depends only on the mismatch, provided the output impedance is made large enough [7]. The DNL specification depends on the segmentation ratio, but it is always satisfied for reasonable segmentation ratios provided that the INL is below 0.5 LSB. The glitch energy is determined by the number of binary bits b, so the optimum architecture in this sense is a fully unary DAC. However, this is infeasible in practice because of the large area and delay that the thermometer decoder would exhibit. The glitch energy is therefore minimized at the circuit level, through the design and layout of the switch and latch array and the current source cell [13]. The optimum segmentation has been found to be 75% in [10, 12], so we have used this segmentation to achieve the best performance in a high-speed design. Thus 9 bits are thermometer coded and 3 bits are binary weighted. Figure 6 shows a typical block diagram of an n-bit segmented current-steering DAC, which combines the advantages of both architectures.
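The 9 + 3 segmentation described above can be sketched behaviourally as follows. This is a minimal illustration, not the authors' circuit: the function names are hypothetical, and the sketch only models which ideal current sources are switched on for a given code, with each unary source weighted as 8 LSB.

```python
N_BITS = 12
BINARY_BITS = 3                      # b: binary-weighted LSBs
UNARY_BITS = N_BITS - BINARY_BITS    # m = n - b: thermometer-coded MSBs

def binary_to_thermometer(value, bits):
    """Return 2**bits - 1 thermometer lines; line k is 1 when value > k."""
    return [1 if value > k else 0 for k in range((1 << bits) - 1)]

def segment(code):
    """Split a 12-bit code into (thermometer MSB lines, binary LSB bits, LSB first)."""
    if not 0 <= code < (1 << N_BITS):
        raise ValueError("code out of range for a 12-bit DAC")
    msb = code >> BINARY_BITS
    lsb = code & ((1 << BINARY_BITS) - 1)
    lsb_bits = [(lsb >> i) & 1 for i in range(BINARY_BITS)]
    return binary_to_thermometer(msb, UNARY_BITS), lsb_bits

def output_in_lsb(code):
    """Sum the ideal source currents, in units of one LSB current."""
    thermo, lsb_bits = segment(code)
    unary_weight = 1 << BINARY_BITS  # each unary source carries 8 LSB
    binary_part = sum(bit << i for i, bit in enumerate(lsb_bits))
    return unary_weight * sum(thermo) + binary_part

def sources_toggled(code_a, code_b):
    """Count the current sources that change state between two codes."""
    thermo_a, lsb_a = segment(code_a)
    thermo_b, lsb_b = segment(code_b)
    return (sum(x != y for x, y in zip(thermo_a, thermo_b))
            + sum(x != y for x, y in zip(lsb_a, lsb_b)))

# Mid-scale transition 011111111111 -> 100000000000
print(sources_toggled(2047, 2048), "sources toggle in the 9 + 3 segmented DAC")
```

In a fully binary-weighted 12-bit DAC all twelve sources would change state at the same mid-scale transition, which is why the number of binary bits drives the glitch energy.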
The input word is split into the b least significant bits, which switch a binary-weighted array, and the m = n − b most significant bits, which control a unary current source array. The m input bits are thermometer decoded to switch each of the unary sources individually [14] [15] [16]. A dummy decoder is placed in the binary-weighted input path to equalize the delay. A latch is placed just before the switch transistors of each current source to minimize any timing error [10]. Figure 7 shows a block diagram of a conventional row and column decoded 12-bit current-steering DAC. In this block diagram, the least significant bits are applied to a dummy decoder [17]. This decoder introduces a delay matched to that of the binary-to-thermometer decoder, so that the signals arrive at the switches synchronously. Of the nine thermometer-coded bits, the five lower bits are column decoded and the four upper bits are row decoded. The column decoder is a 5-input, 31-output binary-to-thermometer decoder and the row decoder is a 4-input, 15-output binary-to-thermometer decoder. The outputs of the decoders control 511 current cells in the main matrix. Considering the structure of a binary-to-thermometer decoder, however, increasing the decoder input by β bits increases the area, complexity, number of control signals, and power consumption of the decoder by a factor of $2^\beta$. In fact, power and area double with each additional input bit, and we can write:
New Thermometer Decoding Architecture
Thus:
where BTD denotes a binary-to-thermometer decoder, P is the power consumption of the decoder, and A is the active area that the decoder occupies. Now consider Figure 8, which shows a 3-D decoding architecture. This block diagram uses three BTDs: three bits for height, three bits for row, and three bits for column, so every cell is selected by three parameters (R, C and H). In effect, three 3-to-7 BTDs replace the 5-to-31 BTD and the 4-to-15 BTD, so the power consumption and area of the decoding circuit are improved by a factor of two because:
And for area we have:
In this structure, the 3 LSBs of the thermometer-coded word are column decoded, the 3 middle bits are row decoded, and the 3 MSBs are height decoded. Moreover, only 21 control signals are needed instead of 46, a reduction of about 54 percent, which benefits both speed and performance.
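The comparison between the conventional and the 3-D decoding schemes can be checked with a few lines of arithmetic. The sketch below assumes, as stated in the text, that the power and area of a binary-to-thermometer decoder double with every additional input bit, so the relative cost is taken as 2 raised to the number of input bits; the function names are hypothetical.

```python
def btd_outputs(bits):
    """Number of control signals produced by a bits-input thermometer decoder."""
    return (1 << bits) - 1

def btd_relative_cost(bits):
    """Relative power or area, assuming the cost doubles per input bit."""
    return 1 << bits

conventional = [5, 4]    # column (5-to-31) and row (4-to-15) decoders
proposed = [3, 3, 3]     # column, row and height (3-to-7) decoders

conv_signals = sum(btd_outputs(b) for b in conventional)
prop_signals = sum(btd_outputs(b) for b in proposed)
conv_cost = sum(btd_relative_cost(b) for b in conventional)
prop_cost = sum(btd_relative_cost(b) for b in proposed)

improvement = conv_cost / prop_cost
signal_reduction_pct = 100.0 * (conv_signals - prop_signals) / conv_signals

print(f"control signals: {conv_signals} -> {prop_signals} "
      f"({signal_reduction_pct:.1f}% fewer)")
print(f"relative power/area: {conv_cost} -> {prop_cost} "
      f"(improved by a factor of {improvement:.1f})")
```

Both decoding schemes still address the same nine thermometer-coded bits; only the way those bits are partitioned among the decoders changes.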
The Current Cell, Latch and Driver
The static and dynamic performance of current-steering DACs is mostly determined by the accuracy of the current sources, the finite output impedance, and the switching time. Figure 9 shows a current source transistor $M_{CS}$, an additional cascode transistor $M_{CAS}$ that increases the output impedance, and two complementary switch transistors $M_{SW}$. The figure shows the cascode current source and switch structure for 1 LSB; the unary current source cell (8 LSB) uses the same structure with 8 parallel transistors. In the proposed 12-bit DAC, three bits are binary weighted and use the current source of Figure 9, while the remaining 9 bits are thermometer decoded and need unary current sources. Since two D/A converters processed in the same technology do not necessarily have the same specifications because of technological variations, it is of the utmost importance to know the relationship between the specifications of the circuit and the matching properties of the technology used. For a current-steering D/A converter, the INL is mainly determined by the matching behavior of the current sources. A parameter that is well suited to expressing this relation between technology and DAC specification is the INL yield [16]. The INL yield is defined as the ratio of the number of D/A converters with an INL smaller than 1 LSB to the total number of tested D/A converters. As defined by Pelgrom, mismatch "is the process that causes time-independent random variations in physical quantities of identically designed devices" [18]. Pelgrom's paper has become the de facto standard for the analysis of transistor matching, and thus his formula for the standard deviation of the saturation current of two identically sized devices has been used for the design. This formula is:
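where each parameter variance has the form $\sigma^2(\Delta P) = A_P^2/(WL) + S_P^2 D^2$, with $W$ and $L$ the device width and length and $D$ the distance between the two devices.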
Most of these variables are process-dependent constants. Using these results, an equation for the minimum size device that still provides a reasonable current standard deviation can be determined [13] :
where $A_\beta$ and $A_{VT}$ are process mismatch parameters, $V_{GS}$ and $V_T$ are the gate-source and threshold voltages, $I$ is the current generated by a given source, and $\sigma_I/I$ is the relative standard deviation of one current source.
where inv_normal is the inverse cumulative normal distribution. The $M_{CS}$ transistor size is found by:
where $\mu_n C_{OX}$ is the MOS transistor gain factor and $\Delta V = (V_{GS} - V_T)$. Applying Equations (12) and (13) (12) and (13) and use only mismatch Equation (10) to reach a minimum sizing of current cell. With this method, the switching speed is high and INL < 0.5 LSB is still satisfied. The small-signal output impedance for the current source topology of Figure 9 is given by:
The optimum $M_{SW}$ and $M_{CAS}$ gate bias voltages with respect to the output impedance are found by differentiating $R_{out}$ with respect to $V_{gSW}$ and $V_{gCAS}$. The SW and CAS gate bias voltages that maximize the output impedance are found as:

Figure 10 shows the biasing scheme for the cascoded current sources. The PMOS sections of the biasing circuits are labeled Global biasing, while the NMOS sections are labeled Local biasing. In the actual implementation, the global biasing is realized with a common-centroid layout to reduce the effects of gradients. The local biasing is separated into four quadrants, with no direct connection between any two quadrants. This improves both DNL and INL performance [10].

A driver circuit with a reduced swing, placed between the latch and the switch, also reduces the clock feed-through to the output node [19, 20]. Figure 11(a) shows a current source, switch, latch and driver cell. A new swing-reduced-device (SRD) circuit has been designed, shown in Figure 11(b). The complementary output levels and non-symmetrical crossing point of the latch circuit are designed to minimize glitches [13]. The waveforms at the different nodes are shown in Figure 11(c) without the SRD circuit and in Figure 11(d) with it. Signals with a symmetrical crossing point are fed from the left, and the SRD produces a non-symmetrical crossing point, which considerably reduces the spike at node $V_X$. In the SRD circuit, $M_{SRD1}$ is always on; when $M_{SRD2}$ is off, $V_{gSW}$ approaches 2.4 V (the supply voltage of the digital part). When $M_{SRD2}$ is on, $V_{gSW}$ can be set to the desired value by proper sizing of $M_{SRD2}$, because in this case $V_{gSW}$ equals the $V_{SG}$ of the $M_{SRD2}$ transistor. In this circuit, complete switching of the $M_{SW}$ transistors requires a 350 mV differential voltage, so the $V_{SG}$ of $M_{SRD2}$ is set to 2.05 V. For a non-symmetrical crossing point, it is sufficient to make $M_{SRD1}$ larger than $M_{SRD2}$.
The sizes of $M_{SRD1}$ and $M_{SRD2}$ are given in Table 1, and the SRD output waveforms and their effect in reducing the spike at node $V_X$ are shown in Figure 11(d). The capacitive coupling to the analog output is minimized by limiting the amplitude of the control signals to a level just high enough to switch the tail current completely to the desired output branch of the differential pair. In addition, the switch transistors are kept relatively small in order to avoid large parasitic capacitances.
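The current source sizing flow discussed earlier in this section (INL yield, then allowed relative current spread, then minimum gate area, then W and L) can be sketched numerically. The expressions used are the forms commonly cited in the current-steering DAC literature for an INL bound of 0.5 LSB; since the equations themselves are not reproduced in this text, treating them as identical to the paper's Equations (10) to (13) is an assumption. All numeric process and bias values below are hypothetical placeholders, not data from this design.

```python
import math
from statistics import NormalDist

def max_relative_sigma(n_bits, inl_yield):
    """Largest sigma_I/I meeting the INL yield: 1 / (2 * C * sqrt(2**N))."""
    c = NormalDist().inv_cdf(0.5 + inl_yield / 2.0)  # inv_normal in the text
    return 1.0 / (2.0 * c * math.sqrt(2 ** n_bits))

def min_gate_area_um2(sigma_rel, a_beta_um, a_vt_v_um, overdrive_v):
    """Minimum W*L from the mismatch model:
    WL = (A_beta^2 + 4*A_VT^2 / (V_GS - V_T)^2) / (2 * (sigma_I/I)^2)."""
    return (a_beta_um ** 2 + 4.0 * a_vt_v_um ** 2 / overdrive_v ** 2) / (
        2.0 * sigma_rel ** 2)

def aspect_ratio(current_a, mu_cox_a_per_v2, overdrive_v):
    """W/L from the square-law model I = (mu*Cox/2) * (W/L) * (V_GS - V_T)^2."""
    return 2.0 * current_a / (mu_cox_a_per_v2 * overdrive_v ** 2)

# Hypothetical placeholder values, chosen only to exercise the formulas.
A_BETA = 0.019        # 1.9 % * um
A_VT = 0.0095         # 9.5 mV * um, expressed in V * um
OVERDRIVE = 0.5       # V_GS - V_T in volts
MU_COX = 170e-6       # A / V^2
I_LSB = 5e-6          # A per LSB current source

sigma_rel = max_relative_sigma(12, 0.997)
area = min_gate_area_um2(sigma_rel, A_BETA, A_VT, OVERDRIVE)
ratio = aspect_ratio(I_LSB, MU_COX, OVERDRIVE)
width = math.sqrt(area * ratio)    # W = sqrt(WL * W/L)
length = math.sqrt(area / ratio)   # L = sqrt(WL / (W/L))

print(f"sigma_I/I <= {100 * sigma_rel:.3f} %")
print(f"W*L >= {area:.1f} um^2, W/L = {ratio:.3f}")
print(f"W = {width:.2f} um, L = {length:.2f} um")
```

A tighter yield target or a higher resolution lowers the allowed spread, and the required gate area grows with the inverse square of that spread, which is why the matching requirement dominates the analog area.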
Layout and a Few Techniques to Achieve High Speed
Distributing a 1 GHz clock and capturing data at this speed are both very difficult, so two channels are used for the digital section. Each channel works at 500 MHz, and the results of the two channels are combined at the input of the switch to obtain 1 GHz. Figure 12 shows the structure used for the digital section of the DAC. Channel 1 samples the input data with clock and channel 2 samples the input data with clock-not. A buffer just before the switch combines the outputs of the two digital channels: it passes the output of digital channel 1 with clock and the output of digital channel 2 with clock-not to the input of the switch. In effect, two samples of the input code are taken in one clock period, and at the output the circuit appears to work at 1 GHz. In addition, master-slave operation and pipelining are used in all digital circuits, so overall the digital circuit sees only one gate delay. As an example, the structure of one of the 3-input, 7-output binary-to-thermometer decoders is shown in Figure 13. The layout of the entire digital section has been done manually to guarantee the best speed, low power and minimum area. Figure 14 shows the complete layout of the DAC. The latches and switches are grouped in a separate array placed between the decoders and the current source arrays, to isolate these noisy digital circuits from the sensitive analog circuits that generate the current. A guard ring separates the analog section from the digital section. The layout of the decoder circuit has been drawn manually with pipelining to reach the maximum speed, and the parasitic capacitances and transistor sizes have been optimized through simulation. To reduce systematic errors, each unary current source is divided into 16 sub-current sources and the Q² Random Walk distribution scheme is applied [21].
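The two-channel scheme above amounts to time interleaving: each channel handles every second sample at half the update rate, and the buffer before the switch multiplexes them back in order. The sketch below is a purely behavioural illustration with hypothetical function names; it ignores the decoding done inside each channel and all timing effects.

```python
def split_channels(samples):
    """Channel 1 latches on clock (even slots), channel 2 on clock-not (odd slots)."""
    return samples[0::2], samples[1::2]

def combine_channels(channel_1, channel_2):
    """Multiplex the two half-rate streams back into one full-rate stream."""
    combined = []
    for i in range(max(len(channel_1), len(channel_2))):
        if i < len(channel_1):
            combined.append(channel_1[i])   # passed during the clock phase
        if i < len(channel_2):
            combined.append(channel_2[i])   # passed during the clock-not phase
    return combined

F_CHANNEL_HZ = 500e6                 # rate of each digital channel
F_OUTPUT_HZ = 2 * F_CHANNEL_HZ       # effective update rate at the switches

codes = [3, 1, 4, 1, 5, 9, 2]        # arbitrary example input codes
ch1, ch2 = split_channels(codes)
print("channel 1:", ch1)
print("channel 2:", ch2)
print("recombined equals input:", combine_channels(ch1, ch2) == codes)
print(f"effective update rate: {F_OUTPUT_HZ / 1e9:.0f} GSample/s")
```

Each channel then has two output periods to process one sample, which is what relaxes the clock distribution and the timing of the decoder logic.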
Simulation Results
Simulations have been performed with a differential 50-Ω load. The internal node interconnection capacitance has been estimated at 400 fF and the output capacitance at 1 pF. The analog supply voltage is 3.3 V, while the digital part of the chip operates at only 2.4 V. Total power consumption at the Nyquist rate is 144.9 mW. The SFDR is better than 64 dB at the Nyquist rate. Figure 15 shows the differential output spectrum with the DAC running at 1 GSample/s and an input signal near the Nyquist frequency (495 MHz), with a 1 mV (rms) noise voltage on the analog power supply. Figures 16 and 17 show the differential output spectra at 1 GSample/s for input signals at 100 MHz and 25 MHz, respectively. The measured SFDR for both was better than 70 dB. Figure 18 shows the measured SFDR versus input frequency for the proposed DAC at a 1 GHz sampling frequency. Figure 19 shows a dual-tone SFDR measurement: two sinusoidal signals around 15 MHz with 5-MHz spacing were applied to the D/A converter at an update rate of 1 GSample/s, and the SFDR equals 71 dB. To simulate the glitch energy, the input code was switched from 011111111111 to 100000000000, giving a glitch energy of 2.3 pV·s. Figures 20 and 21 show the DNL and INL characteristics of the designed DAC for increasing input code. The accompanying table summarizes some of the important performance parameters of the DAC.
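For reference, DNL and INL curves such as those in Figures 20 and 21 are conventionally obtained from the output levels at successive codes. The sketch below uses the standard endpoint-fit definitions; the function name and the example levels are hypothetical and are not results from this design.

```python
def dnl_inl(levels):
    """Endpoint-fit DNL and INL, in LSB, from output levels for codes 0..K-1."""
    if len(levels) < 2:
        raise ValueError("need at least two output levels")
    # One LSB is the average step between the first and last levels.
    lsb = (levels[-1] - levels[0]) / (len(levels) - 1)
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1.0
           for k in range(len(levels) - 1)]
    inl = [(levels[k] - levels[0]) / lsb - k for k in range(len(levels))]
    return dnl, inl

def is_monotonic(dnl):
    """A DAC is monotonic when no step is negative, i.e. every DNL > -1 LSB."""
    return all(step > -1.0 for step in dnl)

# Hypothetical output levels (arbitrary units) with one misplaced level.
example_levels = [0.0, 1.0, 2.5, 3.0, 4.0]
dnl, inl = dnl_inl(example_levels)
print("DNL (LSB):", dnl)
print("INL (LSB):", inl)
print("monotonic:", is_monotonic(dnl))
```

With these definitions the INL at any code is the running sum of the DNL values below it, so a design that keeps the INL within 0.5 LSB also bounds the accumulated step errors.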
Conclusion
In this article, a 3.3 V, 12-bit, current-steering digital-to-analog converter with a 9 + 3 segmented architecture has been presented for the baseband section of wireless transmitter circuits. A new 3-D thermometer decoding scheme has been used in the digital section, which considerably reduces the area, power consumption and number of control signals.
Simulations have been performed to analyze and address some of the important dynamic linearity limitations. Two digital channels are used in parallel, one operating with clock and the other with clock-not, to reach a sampling rate of 1 GS/s while each channel operates at only 500 MHz. This clocking strategy makes clock distribution much easier. The analog switches and SRD circuits have been optimized not only for minimum area and maximum speed but also to improve the dynamic behavior of the DAC. The 75% segmentation considerably decreases DNL error and glitch energy and provides the needed improvement in SFDR. Separate power supplies have been used for the digital and analog parts, with the digital section operating at a lower supply voltage than the analog part. This increases the speed and reduces the power consumption of the digital part, and at the same time decreases power supply noise and improves the performance of the analog part. The technology used is a 0.35 µm, single-poly, four-metal, 3.3 V, standard TSMC mixed-mode CMOS process. The active area of the DAC, as shown in Figure 14, is 1052 µm × 1306 µm.
