I. INTRODUCTION
OVERSAMPLED ΔΣ digital-to-analog converters (DACs) have been reported in all-digital [1], "almost" digital [2], and RFDAC [3], [4] transmitters for UMTS, WLAN, and WiMAX standards. Their use has been driven by the goal of a software-defined radio wherein the bulk of the processing is moved into the digital domain in order to reduce the analog complexity and enable easy reconfiguration. The digital-to-analog conversion is pushed closer to the antenna and hence operates at a higher frequency compared with traditional transmitters. ΔΣ DACs thus become attractive in digital-centric transmitters for two reasons. First, by using digital processing, they greatly reduce the number of analog cells required compared with Nyquist DACs. Second, the oversampling relaxes the order of the reconstruction filter after the DAC. The example in Fig. 1 shows the difference between a traditional transmitter and a digital baseband transmitter that uses a ΔΣ DAC. It can be seen that the ΔΣ DAC, with far fewer current cells than a Nyquist DAC, relaxes the analog matching requirements. It extends the digital-analog boundary closer to the upconversion mixer and operates at a frequency that depends on the radio standard, the signal bandwidth, and the required oversampling ratio (OSR).
High-speed, high-data-rate communication requires increasingly large bandwidths. For example, the recent wideband standards for UWB [5] and 60-GHz [6] radios have RF bandwidths of 528 MHz and 1.7 GHz per channel, respectively. However, the bandwidth of ΔΣ DACs reported so far has been limited to 100 MHz (i.e., 200-MHz RF bandwidth when used in the I and Q paths of transmitters) [3]. Aiming to improve the bandwidth of ΔΣ DACs for wideband communication, this brief presents a ΔΣ DAC with 200-MHz bandwidth in 65-nm CMOS.
Orthogonal frequency-division multiplexing (OFDM) modulation is commonly used for transmission at high data rates. If an OFDM signal is used with a modulation scheme such as QPSK, 16-QAM, or 64-QAM, then a signal-to-noise ratio (SNR) in the range of 25-35 dB is required from the DAC [3], [7]. Arriving at such an SNR for bandwidths over 200 MHz requires DAC sampling rates exceeding 8 GS/s. The speed of the digital ΔΣ modulator (DDSM) in the DAC then becomes the limiting factor for achieving this sampling rate when a conventional implementation is used.
Time-interleaved DDSM-based DACs are therefore required in order to relax the input data rate and the speed of the logic in the modulator. Fig. 2 shows this concept for an M-channel interleaved DDSM DAC. The digital processing contains M modulator sections operating at a relaxed 1/Mth of the desired sampling rate. The multiplexing of the channels just before the DAC is still at the full sampling rate [8].
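The data flow of Fig. 2 can be illustrated with a minimal Python sketch. It models only the splitting of a full-rate stream into M slower streams and the full-rate multiplexing before the DAC; the modulator sections themselves are not modeled, and the function names `deinterleave` and `multiplex` are hypothetical, introduced here for illustration only.

```python
def deinterleave(samples, m):
    """Split a full-rate stream into m streams, each at 1/m of the rate.

    Channel k receives samples k, k + m, k + 2m, ...
    """
    return [samples[k::m] for k in range(m)]


def multiplex(channels):
    """Recombine the channel streams at the full rate, one sample per channel in turn.

    Assumes all channels have the same length (zip stops at the shortest).
    """
    out = []
    for group in zip(*channels):
        out.extend(group)
    return out


full_rate = list(range(16))
even_odd = deinterleave(full_rate, 2)   # two half-rate streams (M = 2)
restored = multiplex(even_odd)          # full-rate stream at the MUX output
```

For M = 2, the two streams are the even and odd samples of the original sequence, and only the final `multiplex` step has to run at the full sampling rate.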
1549-7747/$31.00 © 2013 IEEE
Eight-channel interleaved DDSMs that operate at 2.5 GHz have been previously described in [9] and [10]. However, the work in [9] covers only the digital modulator and does not address the DAC design, and in [10], the time interleaving of the ΔΣ DAC is demonstrated only by simulations. Furthermore, in [9] and [10], interleaving has been used with the aim of relaxing the logic speed, while its impact on the final full-speed multiplexing has not been considered due to the moderate speed of 2.5 GHz. However, for the >8-GHz speed targeted in this work, a high degree of interleaving complicates the routing, and more importantly, it increases the timing complexity of the multiplexing, which then requires accurate multiphase clock generation. Therefore, a two-channel interleaving strategy is utilized in this work, which enables a simplified multiplexer (MUX) by allowing the use of a single clock for the logic, as well as the multiplexing, while pushing the speed of the logic in each channel to 4 GHz.
This brief is organized as follows: Section II motivates the architecture choice. Section III describes the circuit design of the proposed ΔΣ DAC. The test chip and measurement results are presented in Section IV, followed by the conclusion in Section V.
II. ΔΣ DAC ARCHITECTURE
A MASH-architecture-based ΔΣ DAC is suited for high-speed implementation because it is inherently stable, has a short integrator critical path, and easily allows pipelining as compared with other typical modulator architectures [9]. MATLAB simulations show that a MASH 1-1 DDSM DAC with seven thermometer-coded current cells (3-bit output) can provide an SNDR of 45 dB at 200-MHz bandwidth and 8 GS/s (OSR = 20) with a current mismatch (σ) of 3.6%. A second-order modulator presents only a moderate filtering complexity for suppressing the generated out-of-band quantization noise [2], [3]. For the chosen 1-1 MASH modulator, simulations show that a second-order low-pass filter can satisfy the spectral mask for the UWB standard in [5]. With these relaxed analog constraints, the task mainly becomes a digital problem of designing a DDSM for such a high speed. For a conventional MASH 1-1 DDSM that uses static CMOS logic, the 8-GHz speed is found to be outside the capability of a standard 65-nm technology node at 1 V even when extensive pipelining with 1-bit integrators is utilized [4]. Hence, a time-interleaved MASH DDSM is necessary to relax the critical path and the clock rate.
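The second-order noise shaping of a MASH 1-1 modulator can be illustrated with a short behavioral sketch in Python. This is a generic, non-interleaved textbook model and not the circuit of this brief: it assumes a 12-bit unsigned input whose lower 10 bits feed two cascaded accumulators and whose upper 2 bits pass straight through, and it does not model the mapping of the output levels onto the seven current cells or any mismatch. The function name `mash11` and the parameter `frac_bits` are hypothetical.

```python
def mash11(samples, frac_bits=10):
    """Behavioral MASH 1-1 modulator built from two accumulators (assumed model).

    Each input word is split into MSBs (passed through) and frac_bits LSBs.
    Stage 1 accumulates the LSBs; stage 2 accumulates the stage-1 residue.
    The stage-2 carry is differentiated before it is added to the output.
    """
    modulus = 1 << frac_bits
    acc1 = acc2 = 0
    prev_c2 = 0
    out = []
    for x in samples:
        msb, frac = divmod(x, modulus)
        c1, acc1 = divmod(acc1 + frac, modulus)   # first-stage carry and residue
        c2, acc2 = divmod(acc2 + acc1, modulus)   # second stage runs on the new residue
        out.append(msb + c1 + (c2 - prev_c2))     # noise-cancelling combination
        prev_c2 = c2
    return out


dc_input = [1234] * 4096        # hypothetical constant 12-bit input code
y = mash11(dc_input)
```

Because the quantization error appears at the output only as a second difference, the running sum of the output tracks the running sum of the input to within one LSB of the coarse output, which is what the long-run average of `y` reflects. Note that the second accumulator consumes the first accumulator's result within the same cycle, which is the kind of back-to-back addition that sets the critical path in such a structure.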
In the 2.5-GHz DDSM described in [9], eight-channel interleaving was chosen with the aim of relaxing the timing of the logic while minimizing the area. The MUX was not a bottleneck at this speed and was easily implemented using a standard parallel-to-serial conversion. Fig. 3(a) shows this commonly used scheme where the eight individual channels working at a speed of F_s/8 are retimed to F_s. It can be noted that the clock divider and channel select logic still operate at the full sampling rate F_s. At 8 GHz, this multiplexing scheme becomes a bottleneck because meeting the timing of the clock divider and channel select logic paths becomes a challenge with full-swing CMOS logic. In order to use this scheme at 8 GHz, a reduced-swing logic is required instead, e.g., current mode logic (CML). However, the reduced swing in the DAC switch drivers would then affect the DAC linearity [11].
Another multiplexing style [see Fig. 3(b)], described in [12] for four channels, uses CML and analog phase rotators for setting the clock phases. In this MUX, the final latches are moved into the current cell itself and two current cells per bit are used. This scheme requires a modified current cell and a complicated routing with three times the number of wires going into the cell. The number of current cells also doubles, resulting in an overall increase in analog complexity.
Fig. 3. (a) Eight-channel parallel-to-serial multiplexing [9]. (b) Four-channel multiplexing with phase rotators and dual current cells [12]. (c) Two-channel multiplexing in this work.
The two MUX choices show that the increased relaxation in the logic timing resulting from an increased degree of interleaving comes at the cost of complicated MUX and current cell circuits. Hence, two-channel interleaving [see Fig. 3(c)] is chosen in this work by giving a higher priority to the simplification of the MUX and the current cells. This two-channel approach has the following benefits. First, the logic and the MUX now work with a single half-rate clock, and second, it allows multiplexing with rail-to-rail swing and the use of traditional current cells. However, this requires the logic in each channel to operate at 4 GHz, which is still a challenging task. This speed has been achieved by a careful integrator design that is tailored for the interleaved structure and is described in the following section.
The inputs x_0[11:0] and x_1[11:0] form the two half-rate channels and are the divided even and odd streams of the original data rate. The 10-bit back-to-back integrators/adders, i.e., the feedback path in each of the first-order MASH DDSMs, form the critical path in the design (shown with a dashed rectangle). The carry generated from the integrators is added to the remaining two most significant input bits x_0[11:10] and x_1[11:10]. Being in a forward path, this addition and also the end processing are not critical paths. Therefore, the main design problem becomes that of two 10-bit back-to-back adders that must operate at over 4 GHz (250-ps cycle time).
III. ΔΣ DAC CIRCUIT DESIGN

A. DDSM Design
The aim of this work has also been to enable the high sampling rate by utilizing only robust static logic so as to avoid the lower noise margins that are associated with faster logic styles such as precharged-domino, ratioed, or pass-transistor logic. Pipelining is essential for meeting the speed requirement. Furthermore, in order to maximize the speed of the pipeline and the number of bits that can be accommodated in each pipeline stage, an important observation about the two adders can be made. In the first adder (Integrator 0), both the carry chain and the sum delays are in the critical path, but in the second adder (Integrator 1), only the carry chain is in the critical path. Hence, using the same adder cells for both the adders does not result in the fastest speed. Using this observation, all the noncritical nodes of both the adders are slowed down by reducing their drive strength. This helps to reduce the capacitance on the critical path and speed it up.
[...] adders previously mentioned, all four carry-select adders are sized differently. There are two critical paths in each pipeline stage, from [...] FF (Path 1) and [...]. Table I shows the postlayout delay contributions from both paths at the typical corner, 1-V supply, and 75 °C. The table shows that this optimized 2-bit structure supports a frequency greater than 4 GHz, and the delays in the two paths are nearly equalized.
Adder A1 is the slowest in the path as it generates both the sum and co (carry) complementary outputs with equal delay. It also internally generates the complementary inputs, which slows it by one additional inverter delay. In comparison, A2 has a faster ci → sum delay as its ci → co delay is noncritical. Along the same lines, A3 has a faster a → co delay as its a → sum delay is noncritical. A4 equalizes both the co and sum delays similar to A1 but is faster due to its lower fan-out. The NOR gate at the end of the adder path is an additional overhead required to synchronously reset the integrator at startup.
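The carry-select principle used in the adders can be sketched at the behavioral level in Python. This is a generic illustration under assumed parameters (a 10-bit word split into 2-bit blocks), not a model of the sized cells A1-A4 or their delays: each block computes its result for both possible carry-in values, and the incoming carry only selects between the two precomputed results. The function names are hypothetical.

```python
def block_add(a, b, carry_in, width):
    """Add two width-bit blocks plus a carry-in; return (sum, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, width=10, block=2):
    """Behavioral carry-select adder (assumed 2-bit blocks, width divisible by block).

    Both candidate results of every block are computed without waiting for
    the carry; the carry from the previous block only drives the selection.
    """
    mask = (1 << block) - 1
    carry = 0
    result = 0
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        sum0, c0 = block_add(a_blk, b_blk, 0, block)   # candidate for carry-in = 0
        sum1, c1 = block_add(a_blk, b_blk, 1, block)   # candidate for carry-in = 1
        s, carry = (sum1, c1) if carry else (sum0, c0)
        result |= s << shift
    return result, carry


total, carry_out = carry_select_add(700, 500)   # 10-bit operands; the sum wraps at 1024
```

In hardware, the benefit is that the carry path through each block reduces to a multiplexer selection, which is why the carry and sum paths can be sized separately depending on which one is critical.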
B. Clock Distribution
In order to keep the TGFFs compact in size by avoiding a clock inverter inside the flip-flops (FFs), both the clock phases are provided from outside the FF. While the global clock distribution is single ended, the pseudo-differential clock driver in Fig. 6 distributes the clocks with a 30-ps slope locally to the FFs. One such clock driver is used for every 18 FFs (fan-out of 3) in a 70 μm × 45 μm area. This driver also minimizes the overlap between clk and its complement as both the clock phases have an equal load.
C. Final Multiplexer and Current Cell Design
Fig. 7 shows the static 2:1 MUX per cell. The data from the second channel are shifted by half a cycle prior to the MUX. The MUX is single ended, and the complementary DAC switch signals are generated with a 15-ps slope after the MUX by a center-crossing pseudo-differential switch driver. The switch driver uses the same circuit as the local clock driver in Fig. 6. A high-crossing switch driver [11] that is modified for this MUX structure is required in the DAC, but such a driver is a challenge at 8 GS/s due to the high capacitance and contention at its cross-coupled nodes. Hence, the center-crossing driver is chosen to meet the speed and the fast slope. The DAC is sensitive to a mismatch in driver rise and fall delays and requires a careful driver design. However, due to the pseudo-differential nature of the driver, a 3-ps mismatch is the smallest that could be achieved in this design, and this is found to degrade the SNDR by 4 dB in postlayout simulation. The clock to the MUX bypasses the main clock distribution and uses a minimized buffering of four stages. The MUX is sensitive to the clock duty cycle, and simulations show an SNDR reduction of 4 dB for every 1% variation from the desired 50% duty cycle.
The DAC current cell, also shown in Fig. 7, is designed for a 0.3-V peak-to-peak differential swing with a 100-Ω passive load and 1.2-V supply. The current source dimensions for the 3.6% mismatch requirement are derived using foundry matching parameters. The current cell has M4-M5 transistors as the cascode pair on top of the switching M2-M3 pair that is biased in the linear region and M1 as the current source. The seven current cells are laid out in a single column with dummies on either side to simplify the route matching between the switch driver and the switches.
IV. MEASUREMENT RESULTS
The proposed ΔΣ DAC is implemented in a standard 65-nm CMOS process and uses only general-purpose low-V_T (DDSM and DAC) and standard-V_T (DAC) devices. Fig. 8 shows the chip photograph. The active area of the DAC is 0.13 mm². The test chip is directly wire bonded to an FR4 PCB. Fig. 9 shows the measurement setup used. A 12-bit input of 8192 samples with frequency F_in is sent into the chip at rate F_bb using a Tektronix AWG5012C pattern generator. Internal to the chip, these data are upsampled to the F_s rate by a zero-order hold operation. In addition, the two DDSM channel inputs are shorted together. The zero-order hold and the shorted inputs together result in the upsampling of the input data by 2F_s/F_bb since the DAC effectively works at a 2F_s sampling rate. An FIR low-pass filter that would remove the upsampling images at the DDSM input is not implemented. With this simplified setup, these unfiltered images also appear at the DAC output, but they lie outside the band of interest. In a real application, however, the upsampling images are filtered out prior to the DDSM. By keeping F_in ≤ F_bb/4, the unfiltered images do not result in intermodulation products in the band of interest (dc to F_in). The duty cycle of clock F_s is tuned off-chip before the measurements.
[...] single-tone input consuming 68 mW (40-mW clocking, 23-mW logic/FF, and 5-mW analog) from 1-V digital and 1.2-V analog supplies. The measured IMD3 is −57 dBc (see Fig. 11) for two 6-dBFS tones near 200 MHz placed 2 MHz apart. Fig. 12 shows the simulated DAC output spectrum, which has an SNDR of 42 dB. The measured SNDR is found to be lower than the simulated value primarily due to a 10-dB loss resulting from a test setup limitation in synchronizing the F_s and F_bb clocks. The two clock sources have been locked to a common 50-MHz reference, but they are still not truly synchronous. This results in an upsampling uncertainty in the zero-order hold operation and restricts the SNDR. The higher noise floor up to 800 MHz in the measured spectrum results from this synchronization problem. Table II shows the comparison of this work with previous ΔΣ DACs having a >2.5-GHz sampling rate.
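The zero-order hold upsampling used in the test setup, and the location of the resulting unfiltered images, can be illustrated with a small Python sketch. The frequency values below are hypothetical and chosen only to satisfy F_in ≤ F_bb/4; they are not the measurement settings of this work, and the function names are introduced here for illustration. The sketch lists only where the images of a single tone fall; it does not model the modulator or the intermodulation behavior.

```python
def zero_order_hold(samples, factor):
    """Upsample by repeating every input sample `factor` times."""
    return [s for s in samples for _ in range(factor)]


def image_frequencies(f_in, f_bb, f_out):
    """List image tones at k*f_bb - f_in and k*f_bb + f_in up to f_out/2.

    f_in is the tone frequency, f_bb the input sample rate, and f_out the
    rate after upsampling (all in the same unit).
    """
    images = []
    k = 1
    while k * f_bb - f_in <= f_out / 2:
        images.append(k * f_bb - f_in)
        if k * f_bb + f_in <= f_out / 2:
            images.append(k * f_bb + f_in)
        k += 1
    return images


held = zero_order_hold([1, 2, 3], 4)

# Hypothetical values in MHz: tone at 100, input rate 400, output rate 8000.
images = image_frequencies(100, 400, 8000)
lowest_image = min(images)
```

With F_in ≤ F_bb/4, the lowest image at F_bb − F_in is at least 3F_in, so it lies well above the dc-to-F_in band of interest.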
V. CONCLUSION
This brief has presented an 8-GS/s 200-MHz bandwidth ΔΣ DAC with 26-dB SNDR, −57-dBc IMD3, and 68-mW power consumption in 65-nm CMOS. The high sampling rate has been achieved by a two-channel interleaved 4-GHz MASH 1-1 modulator structure. This allowed a single-clock solution, thus simplifying the timing complexity of the final full-rate multiplexing. Using only seven current cells with relaxed matching requirements, this work has demonstrated the potential of this predominantly digital DAC for use in the baseband of trans-
