Abstract-This work presents an 11 GS/s 1.1 GHz bandwidth interleaved ΔΣ DAC in 65 nm CMOS for the 60 GHz radio baseband. The high sample rate is achieved by using a two-channel interleaved MASH 1-1 architecture with a 4 bit output resulting in a predominantly digital DAC with only 15 analog current cells. Two-channel interleaving allows the use of a single clock for the logic and the multiplexing which requires each channel to operate at half sampling rate of 5.5 GHz. To enable this, a look-ahead technique is proposed that decouples the two channels within the integrator feedback path thereby improving the speed as compared to conventional loop-unrolling. Measurement results show that the ΔΣ DAC achieves a 53 dB SFDR, -49 dBc IM3 and 39 dB SNDR within a 1.1 GHz bandwidth while consuming 117 mW from 1 V digital/1.2 V analog supplies. Furthermore, the proposed ΔΣ DAC can satisfy the spectral mask of the IEEE 802.11ad WiGig standard with a second order reconstruction filter.
I. INTRODUCTION
THE increasing demand for high-data-rate short-range wireless communication has led to the evolution of the unlicensed 60 GHz radio band (57.2-65.8 GHz), which has a continuous bandwidth of 9 GHz. This has resulted in the development of recent standards, such as WiGig (IEEE 802.11ad) [1], ECMA-387 [2] and WirelessHD [3]. These standards have divided the 60 GHz band into four channels, each having a 1.76 GHz (I+Q paths) RF channel bandwidth (BW).
Digital-to-analog converters (DACs) form a part of the transmitter baseband and are required to have a wide bandwidth greater than 880 MHz (in both the I and Q paths, to enable the 1.76 GHz channel BW) and a resolution greater than 6-8 bits to support the different modulation schemes of these standards [4]-[8]. Most of the DACs reported in literature for 60 GHz radio have so far used a conventional approach with a 2× digital interpolation of the baseband, followed by a Nyquist current-steering DAC and a fourth or fifth order passive LC analog anti-aliasing filter, which then connects to an up-conversion mixer [4], [7]. This approach is shown in Fig. 1(a). The passive filters occupy a large on-chip area and have a low quality factor [8]. While some low-area high-order wideband active filters have also been recently reported [9], [10], they are challenging to design and impact the transmitter linearity.
With the advances in CMOS scaling, there is a trend of using digital processing to move the analog functionality of the RF transceivers to the digital domain for easy configurability and for relaxing the analog circuit requirements. Some examples of these techniques in transmitters include oversampling/interpolation filtering to reduce the anti-aliasing filter order and the use of ΔΣ modulation to reduce the number of DAC unit cells [11]-[15]. However, these techniques have been applied only for relatively low channel-bandwidth standards ( MHz), e.g., WLAN, WiMAX, UMTS, WCDMA and U-NII bands, where the carrier frequencies are only a few gigahertz.
Applying similar techniques to 60 GHz radio is challenging due to its large BW which results in a very high speed requirement from the digital processing. Nevertheless, there is an emerging trend towards digital architectures for the 60 GHz band. A 7 bit oversampling interpolation digital FIR filter before the DAC operating at 9.6 GS/s is presented in [8] . This oversampling filter, along with the sinc response of a Nyquist DAC, can satisfy the spectral mask of the WiGig standard without an anti-aliasing filter and allow the DAC to directly connect to the mixer. This digital oversampling based architecture is shown in Fig. 1(b) . However, this architecture now requires a Nyquist DAC with at least 6-8 bits of resolution and operating at a high sample rate of GS/s. The design of this DAC is challenging as this may require the use of analog techniques such as use of sub-DACs or dual-current cells with higher matching requirements, special DAC switching schemes e.g., quad-switching, clocking schemes with extensive phase calibration and threshold voltage calibration of the switch to correct timing errors [16] - [18] .
A third potential architecture that still uses a digital oversampling filter but uses a ΔΣ DAC instead of the Nyquist DAC is shown in Fig. 1(c). In this scenario, the ΔΣ DAC can further enable this trend towards digital architectures by using digital processing to reduce the number of DAC unit cells and hence the overall DAC complexity. However, ΔΣ DACs have the drawback of a large out-of-band shaped quantization noise which needs to be filtered out to meet the spectral mask of the WiGig standard. If the order of the filtering can be restricted to a first or a second order, then a good trade-off between the high filter order of the conventional transmitter [Fig. 1(a)] and the large complexity of a high-speed Nyquist DAC in the interpolation-based architecture [Fig. 1(b)] can be achieved.

[Fig. 1: (a) Conventional architecture used in [4], [5] and [6]. (b) Oversampling filter used in [8] that requires a high-speed Nyquist DAC.]

Based on the foregoing discussion, the ΔΣ DAC is required to work at GS/s and provide a MHz. ΔΣ DACs have not been targeted for this high bandwidth and sample rate because of the speed limitation of the integrator (feedback path) in conventional digital ΔΣ modulators (DSM). Hence, time-interleaved ΔΣ modulators (TIDSM) that use a poly-phase decomposition (loop-unrolling) of the integrator are required to relax the critical path in the modulator [19]-[23]. Using this concept, MASH based TIDSMs that achieve 8 GS/s [20], [23] and a ΔΣ DAC with 200 MHz BW [20] have been previously reported. However, these loop-unrolled architectures are eventually limited by the critical path of the integrator and the final full-rate multiplexing, which makes a speed greater than 10 GS/s very challenging. Hence, this work presents a two-channel MASH look-ahead time-interleaved ΔΣ modulator (LA-TIDSM) that reduces the critical path of a conventional loop-unrolled MASH TIDSM by modifying the execution order of the computations, while the two-channel architecture allows a single clock design with a simplified final multiplexing.
A two-channel LA-TIDSM MASH 1-1 DAC with an 8 bit digital input and a 4 bit DAC that achieves 11 GS/s and 1.1 GHz bandwidth is presented in this work. This DAC along with a second order low pass filter can support the spectral mask of the IEEE 802.11ad WiGig standard for the 60 GHz band. The remainder of this paper is organized as follows. Sections II and III describe the modulator choice and the LA-TIDSM architecture, respectively. Sections IV and V describe the implementation of the LA-TIDSM DAC and the testing methodology using an on-chip testing memory. Finally, the measurement results and the conclusions are presented in Sections VI and VII, respectively.
II. MODULATOR ARCHITECTURE
In order to support modulation schemes from BPSK to 16-QAM, the DAC is targeted for a dB SNDR in a bandwidth of 880 MHz [11]. The DSM should operate at a multiple of the reference sampling rate, which is 1.76 GHz in Single Carrier (SC) mode for the WiGig standard. The order of the DSM also affects the out-of-band quantization noise and hence the filter order required to meet the spectral mask. In addition to these constraints, the number of channels in the TIDSM based DAC and the choice of the final full-rate-multiplexing (serializer) strategy also influence the choice of the DSM order and the achievable SNDR.

Fig. 2 shows a generic two-channel TIDSM architecture that is obtained by loop-unrolling a conventional DSM. The two-channel TIDSM shown in Fig. 2 implements a and operates at a relaxed half-sampling rate of . The DSM is implemented as a 2×2 block digital filter that contains the two poly-phase components of [19]. The two outputs are then multiplexed by the same half-rate-clock to the full sampling rate of . While a larger number of channels can further relax the critical path in the DSM [21], the final full-rate-multiplexing then requires accurate multiphase clock generation, which is challenging at high frequencies [16], [23]. Hence, two-channel TIDSM DACs are of particular interest as they use only a single half-rate-clock for the DSM and the multiplexing, thus keeping a low clocking complexity while still relaxing the DSM critical path.

The multiplexing and the overall DAC performance of this two-channel architecture are sensitive to the duty cycle of the clock. A duty cycle error (DCE) in the clock, i.e., a variation from 50% duty cycle, directly impacts the SNDR of the TIDSM DAC since this results in a timing skew between the two channels. The SNDR loss results from the folding of the high-frequency shaped noise between and back into the BW of interest that lies between 0 and .
It has been shown in [24] that the loss in SNDR, from this noise folding in a TIDSM DAC due to a DCE of is given by (1) where , represents the DSM order and is the oversampling ratio. Although duty cycle correction or a double frequency clock that is divided down to achieve a 50% duty cycle can be employed to mitigate this problem, there still exists some residual DCE [16] , [25] . This suggests that an increased SNDR is required as a margin to accommodate some amount of DCE. It can further be noted that the DCE does not affect the SFDR of the DAC. The interleaving spurs resulting from the band between 0 and appear in the band between and . Hence, these tones are also filtered out by the anti-aliasing filter. Table I shows the different possible alternatives for the TIDSM DAC in the presence of above mentioned constraints. The SNDR is estimated for a 0 dBFS sine wave at 880 MHz and a DCE error of 1% i.e., the clock duty cycle is between 49% and 51% (a 2 ps timing error at 10 GS/s). In order to estimate the filter order needed, the baseband signal is assumed to be first up-sampled and pulse shaped with a 0.25 roll-off root-raised-cosine (RRC) filter prior to the TIDSM [7] . The TIDSM uses an 8 bit input data from the filter and an . It can firstly be seen from Table I that the fourth option with an OSR of 7 and a first order filter is the most desirable option but the 12.32 GS/s sample rate is very challenging. The third option with 10.56 GS/s, 4 bit DAC and a second order filter is the next best that can achieve the 40 dB SNDR. It can be further seen from Table I and (1) that a third order DSM does not yield a better SNDR in the presence of 1% DCE. Thus, the second order TIDSM with a 4 bit DAC and operating at GS/s is chosen as the design target. The unit cell current matching ( ) for the 4 bit thermometer coded DAC was chosen such that the SNDR loss due to mismatch is less than that produced by a 1% DCE. 
Monte-Carlo simulations showed that a unit cell current matching of 1.1% satisfies this requirement. Fig. 3 shows the WiGig spectral mask, which can be met with this chosen DAC option and a second order filter.
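The arithmetic behind the sample-rate options discussed above can be checked with a short sketch. The function names are hypothetical, and the interpretation of the DCE as a fraction of the half-rate clock period is an assumption; it is used here because it reproduces the 2 ps figure quoted for 10 GS/s.

```python
REF_RATE_GHZ = 1.76    # WiGig single-carrier reference sampling rate (from the text)
SIGNAL_BW_GHZ = 0.88   # per-path baseband bandwidth (from the text)


def sample_rate_ghz(multiple):
    """Sample rate obtained as an integer multiple of the reference rate."""
    return REF_RATE_GHZ * multiple


def oversampling_ratio(fs_ghz, bw_ghz=SIGNAL_BW_GHZ):
    """Conventional OSR definition: fs / (2 * BW)."""
    return fs_ghz / (2.0 * bw_ghz)


def dce_timing_error_ps(fs_ghz, dce_fraction):
    """Timing error, assuming the DCE is a fraction of the half-rate clock period."""
    half_rate_period_ps = 1e3 / (fs_ghz / 2.0)
    return dce_fraction * half_rate_period_ps


if __name__ == "__main__":
    for multiple in (6, 7):
        fs = sample_rate_ghz(multiple)
        print(f"x{multiple}: fs = {fs:.2f} GS/s, OSR = {oversampling_ratio(fs):.1f}")
    print(f"1% DCE at 10 GS/s = {dce_timing_error_ps(10.0, 0.01):.1f} ps")
```

Under these assumptions, multiples of 6 and 7 give the 10.56 GS/s and 12.32 GS/s rates named in the text.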
III. PROPOSED LOOK-AHEAD TIME-INTERLEAVED MODULATOR
The traditional MASH DSM architecture that consists of a cascade of first-order error-feedback (EFB) DSMs (Fig. 4) is a very attractive candidate for high-speed implementation for two main reasons [11], [12]. Firstly, the critical path is the shortest, corresponding to one adder delay, and restricted within each of the individual modulators. Any critical path spanning across the different cascade stages can be pipelined as this is a forward path [21]. Secondly, a cascade of first-order modulators is inherently stable. A conventional first-order EFB modulator with the integrator critical path is shown in Fig. 5, wherein the LSBs of the input signal enter the integrator. The carry generated from the integrator is then added to the remaining MSBs of the input. The integrator bit-width is determined by the number of DAC bits required. Fig. 6 shows the first-order loop-unrolled two-channel TI EFB DSM operating at half the speed, but the critical path is now a two adder delay (Adders A and B). The two adders, A and B, can be optimized to achieve a very high speed; nevertheless, they ultimately limit the modulator speed [20]. An effective 10 GHz speed cannot be met with this two-channel architecture in a standard 1 V 65 nm CMOS technology if purely static CMOS logic with its robust noise margins and 1 bit per pipeline stage is to be used. The main reason for the speed limitation of this first order EFB TIDSM is the fact that adder B has to wait for the computation from adder A, i.e., the two adders (or channels) are coupled [shown in Fig. 7(a)]. If the two channels/adders could be decoupled, then the two additions can happen in parallel within the integrator, thus speeding it up [Fig. 7(b)]. To achieve this decoupling, a pre-computation that corresponds to the intermediate computed value of Fig. 7(a) is performed prior to the loop. If this pre-computation (or look-ahead) turns out to be incorrect, then a post-decode block corrects this after the integrator.
In summary, this involves moving a part of the computation out from the integrator feedback loop to before (look-ahead) and after the integrator (post-decode).
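The coupling between adders A and B described above can be illustrated with a behavioral sketch of a full-rate first-order EFB modulator and its two-channel loop-unrolled form. This is a software model under stated assumptions, not the circuit of Figs. 5 and 6: the function names are hypothetical, the integrator is assumed to start at zero, and an even number of input samples is assumed.

```python
def efb_first_order(samples, lsb_bits):
    """Full-rate first-order error-feedback modulator (behavioral model).

    The LSBs of each sample are accumulated; the carry out of the
    accumulator is added to the MSBs of the same sample.
    """
    mask = (1 << lsb_bits) - 1
    acc = 0  # assumed initial integrator state
    out = []
    for x in samples:
        total = (x & mask) + acc
        acc = total & mask           # sum stays in the integrator
        carry = total >> lsb_bits    # carry leaves the loop
        out.append((x >> lsb_bits) + carry)
    return out


def efb_two_channel_unrolled(samples, lsb_bits):
    """Two-channel loop-unrolled version processing two samples per step."""
    mask = (1 << lsb_bits) - 1
    acc = 0
    out = []
    for n in range(0, len(samples), 2):
        x0, x1 = samples[n], samples[n + 1]
        t0 = (x0 & mask) + acc               # adder A
        s0, c0 = t0 & mask, t0 >> lsb_bits
        t1 = (x1 & mask) + s0                # adder B must wait for adder A
        s1, c1 = t1 & mask, t1 >> lsb_bits
        acc = s1
        out.append((x0 >> lsb_bits) + c0)
        out.append((x1 >> lsb_bits) + c1)
    return out


if __name__ == "__main__":
    data = [(37 * n + 11) % 256 for n in range(16)]  # arbitrary 8 bit test data
    print(efb_first_order(data, 4) == efb_two_channel_unrolled(data, 4))
```

The two functions produce the same output sequence; the unrolled form halves the iteration rate at the cost of two dependent additions per iteration, which is the two adder critical path discussed in the text.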
In order to arrive at the proposed LA solution, the first-order EFB two-channel TIDSM of Fig. 6 must be considered again. The DSM has an input width of bits, of which the LSBs enter the feedback path i.e., the integrator. The two carry signals ( , ) generated from the integrator are then added to the MSBs of the two channels, respectively, to obtain the noise-shaped output. Let and be the lower bits of the two-channel entering the integrator. Then, the following equations can be written for the th sample of the two generated sum ( , ) and carry ( , ) signals.
where denotes a floor operation and can take the value of 0 or 1 in this case. Using (2) in (3), we get (6) Equation (6) represents the two coupled adders. This equation is commutative in nature if any carry generated is ignored and can be rewritten as (7) Equation (7) shows that the first addition part of the equation, i.e., can be pre-computed in advance (look-ahead) before entering the feedback loop since the two inputs are readily available i.e., can be computed independent of . Rewriting (7) as, (8) where and represents only the sum part from this addition i.e., lower bits (and not the carry generated from the addition). From (2) and (8), it can be seen that the computation of and is possible in parallel, thus making it possible to decouple the two adders, A and B. The parallel computation of and results in the improvement of the operating speed by reducing the critical path to that of only one adder as compared to (6) . Fig. 8 demonstrates the proposed LA-TIDSM that implements (2) and (8) in parallel by moving the pre-computation of the intermediate partial sum, to before the loop. However, this modified order of executing the additions compared to the loop-unrolled TIDSM (Fig. 6) for computing results in an incorrect carry being generated from the loop for the second channel (CH1) in some cases. If the carry generated from [(8) ] in the LA-TIDSM is called , then , where [ (5)] is the correct expected carry for CH1 of Fig. 6 . Note that carry of CH0 is not affected by this change in order of the additions, i.e.,
. Hence, for the modulators of Figs. 6 and 8 to be functionally equivalent, the expected carry must be correctly decoded before passing it on to the final addition with the MSB bits.

[Table II: Truth table to compute the correct value of the carry.]
In order to decode the correct value of , the carry generated by the pre-addition of and is also propagated forward (Fig. 8) A numerical example explaining the LA-TIDSM is also presented in Appendix A. The proof for arriving at this truth  table for that results in the functional equivalency between the TIDSM and the proposed LA-TIDSM is provided in Appendix B.
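The look-ahead and post-decode steps can be checked exhaustively in a small behavioral model. The sketch below is an illustration under stated assumptions: the names are hypothetical, and the post-decode is written in the closed form `c1 = cp XOR c1_la XOR c0`, which is derived here from the carry balance of the two addition orders rather than copied from Table II (the table body is not legible in this text).

```python
from itertools import product


def conventional_step(x0, x1, acc, bits):
    """Loop-unrolled two-channel step: adder B depends on adder A."""
    mask = (1 << bits) - 1
    t0 = x0 + acc
    s0, c0 = t0 & mask, t0 >> bits
    t1 = x1 + s0
    s1, c1 = t1 & mask, t1 >> bits
    return s0, c0, s1, c1


def look_ahead_step(x0, x1, acc, bits):
    """Look-ahead step: both in-loop adders read the stored value directly."""
    mask = (1 << bits) - 1
    tp = x0 + x1                     # pre-computation outside the loop
    p, cp = tp & mask, tp >> bits
    t0 = x0 + acc                    # CH0 adder, unchanged
    t1 = p + acc                     # CH1 adder, independent of CH0
    s0, c0 = t0 & mask, t0 >> bits
    s1, c1_la = t1 & mask, t1 >> bits
    c1 = cp ^ c1_la ^ c0             # post-decode (assumed closed form)
    return s0, c0, s1, c1, (c0, cp, c1_la)


def check_equivalence(bits):
    """Compare both steps for every input; return the carry triples that occur."""
    seen = set()
    for x0, x1, acc in product(range(1 << bits), repeat=3):
        s0, c0, s1, c1, triple = look_ahead_step(x0, x1, acc, bits)
        assert (s0, c0, s1, c1) == conventional_step(x0, x1, acc, bits)
        seen.add(triple)
    return seen


if __name__ == "__main__":
    print(sorted(check_equivalence(2)))
```

In this model only six of the eight possible `(c0, cp, c1_la)` combinations ever occur, which agrees with the statement in Appendix B that two of the eight cases cannot happen.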
The delay of the pre-computation in (8) is one adder delay similar to that of the integrator while the delay required to implement the post-decoding of in (9) is less than one adder delay. This technique can be extended to any number of channels. While the critical path of a conventional M-channel TIDSM is M adders, for an LA-TIDSM it always remains one adder delay, independent of the number of channels. In a M-channel LA-TIDSM, M-1 look-ahead additions are performed prior to the integrator i.e., and carry signals resulting from each addition are propagated forward. The expression for the correct carry of the th channel can be generalized for an M-channel LA-TIDSM as (10) where . It can be recollected that is always correctly generated and requires no post-correction.
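A possible M-channel generalization can be sketched in the same behavioral style. The decode rule used below, `C_i = Cp_i + C'_i - C'_(i-1)`, is derived here from the carry balance and is an assumption; it is not claimed to be the exact form of (10), which is not legible in this text. All names are hypothetical.

```python
from itertools import product


def conventional_block(xs, acc, bits):
    """M-channel loop-unrolled block: a chain of M dependent additions."""
    mask = (1 << bits) - 1
    carries = []
    s = acc
    for x in xs:
        t = x + s
        s = t & mask
        carries.append(t >> bits)
    return s, carries


def look_ahead_block(xs, acc, bits):
    """M-channel look-ahead block with M-1 pre-additions and M parallel loop adders."""
    mask = (1 << bits) - 1
    # Look-ahead: running prefix sums of the inputs and their carries.
    prefix, pre_carry = [xs[0]], [0]
    for x in xs[1:]:
        t = prefix[-1] + x
        prefix.append(t & mask)
        pre_carry.append(t >> bits)
    # In-loop: every channel adds its prefix sum to the same stored value.
    totals = [p + acc for p in prefix]
    loop_carry = [t >> bits for t in totals]
    # Post-decode (assumed rule): channel 0 needs no correction.
    carries = [loop_carry[0]]
    for i in range(1, len(xs)):
        carries.append(pre_carry[i] + loop_carry[i] - loop_carry[i - 1])
    return totals[-1] & mask, carries


def blocks_agree(channels, bits):
    """Exhaustively compare both blocks for all inputs and integrator states."""
    values = range(1 << bits)
    for acc in values:
        for xs in product(values, repeat=channels):
            if conventional_block(xs, acc, bits) != look_ahead_block(xs, acc, bits):
                return False
    return True


if __name__ == "__main__":
    print(blocks_agree(channels=3, bits=2))
```

With this structure a two-channel block uses 1 + 2 = 3 adders and a three-channel block uses 2 + 3 = 5, which is consistent with the adder counts quoted for the LA-TIDSM in the text.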
Alternative implementations of the LA-TIDSM are also possible. Referring to (6) where , there exists another way of computing instead of the post-decode block. It is observed that is not required within the loop and hence can be calculated by replicating the operation outside the loop. However, this technique is inefficient as it requires an extra adder and does not help to improve the critical path within the loop.
The TIDSM structure of Fig. 2 and its enhancement, the LA-TIDSM in Fig. 8, are obtained by a TI/poly-phase decomposition of the delay element in the integrator transfer function. An alternative TI implementation of the MASH architecture has been recently proposed in [23] by using a poly-phase decomposition of the full integrator transfer function instead. This implementation also has a one adder critical path within the loop, but results in an inefficient carry generation logic for and . For a two-channel implementation of a first order modulator, the LA-TIDSM uses only 3 adders while [23] requires 8 adders. As the number of channels increases, the hardware savings are larger, e.g., for 3 channels, the LA-TIDSM uses only 5 adders while [23] requires 21 adders.
IV. HIGH-SPEED LA-TIDSM DAC DESIGN

A. Modulator Design
An 8 bit input two-channel LA-TIDSM with 4 bit output is implemented in a MASH 1-1 configuration consisting of a cascade of two first-order EFB DSMs. Each of the two EFB DSMs is pipelined into 2 bit sections as shown in Fig. 9 . Only purely static CMOS custom designed logic is used. The FFs used are conventional Static Transmission Gate Flip-flops (TGFF) while the 2 bit additions are carried out using 1 bit carry-select full adders (FA). A NOR gate for synchronously resetting the integrator is also used at the end of the addition. Since the NOR gate is inverting, Adder 2 generates and . On the other hand, Adder 1 generates and carry. However, this requirement of different output polarities from the two adders has no impact on the total delay. Table III shows the post-layout simulated delay contributions from the various components in the critical path formed by the feedback. The simulations are carried out at 1 V, 75 for a typical corner and 110 for a slow corner in a standard 65 nm CMOS process using general purpose (GP) transistors and maximum RC extracted layout. Adder 1 is inherently slower than Adder 2 because it produces the complementary inputs/outputs and has a two gate delay. Adder 2 on the other hand, receives complementary carry inputs, does not need to produce complementary outputs and has only a one gate delay. The output FF for is replicated so that one copy of the output goes to the next MASH stage while one copy goes back into the feedback loop. It is seen that the total delay of 181 ps at the typical corner implies a maximum half-clock frequency of 5.52 GHz and an effective rate of 11.05 GS/s. Comparing this to the 2 bit TIDSM pipeline of [20] , this represents a 37 ps improvement in the delay or a 17% speed up in the critical path. Fig. 10 shows the 2:1 final full-rate multiplexing (MUX) scheme and the switch driver. The 4 bit output of the LA-TIDSM is converted to a 15 bit thermometer code prior to the final multiplexing. 
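The critical-path timing figures quoted for the modulator can be cross-checked with simple arithmetic. The sketch assumes that the clock period equals the total critical-path delay with no extra margin, and that the 17% speed-up is measured relative to the implied 218 ps baseline; the names are hypothetical.

```python
LA_DELAY_PS = 181.0      # total critical-path delay at the typical corner (from the text)
IMPROVEMENT_PS = 37.0    # reported improvement over the 2 bit pipeline of [20] (from the text)


def max_clock_ghz(critical_path_ps):
    """Maximum clock frequency if the period equals the critical-path delay."""
    return 1e3 / critical_path_ps


half_rate_ghz = max_clock_ghz(LA_DELAY_PS)       # half-rate clock
effective_gsps = 2.0 * half_rate_ghz             # two interleaved channels
baseline_ps = LA_DELAY_PS + IMPROVEMENT_PS       # implied delay of the reference design
speed_up = IMPROVEMENT_PS / baseline_ps          # relative reduction of the critical path

if __name__ == "__main__":
    print(f"half-rate clock: {half_rate_ghz:.2f} GHz")
    print(f"effective rate:  {effective_gsps:.2f} GS/s")
    print(f"speed-up:        {100 * speed_up:.0f}%")
```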
The CH1 thermometer encoding is moved to the clock falling edge through a half-cycle path shifting of the CH1 output from the LA-TIDSM. There is a half-cycle path at the input of the MUX which has a 70 ps delay and hence easily meets the timing. Since the switch driver is required to generate complementary outputs, this pseudo-differential multiplexing with the cross-coupled inverters, and helps to nominally equalize the delays of the complementary outputs. The switch driver is made high-crossing through the use of two cross-coupled NMOS, and [26] . The cross-over point is set at 0.7 V as setting it any higher yields no further improvement in the dynamic performance of the whole DAC. The switch driver is designed for 15 ps rise and fall times when connected to the current-steering DAC. The MUX utilizes two 1 V power supplies, one for the clock distribution and one for the switch driver. Each of these rails use an on-chip decoupling of 100 pF. Fig. 11 shows the DAC current cell used. The current source utilizes a low-low-power (LP) NMOS and is designed for 0.6% current mismatch [27] with an overdrive voltage of 360 mV. The matching is over-designed compared to the requirement of 1.1% from Section II because the DAC also supports a modulator bypass mode that allows the DAC to be driven directly from the memory by a 4 bit data of any other NTF for testing purposes. The switches and use the fast low-GP devices and operate in the linear region. The cascodes, and on top of the switches are sized for an output impedance that gives a greater than 50 dB SFDR performance. The cascodes also use 1.2 V low-LP NMOS which grants some additional headroom compared to the 1 V GP devices. Cascoding on top of the switches is used to avoid the coupling of the switch driver signals with the DAC output. 
For measurement purposes, the DAC has a differential 100 Ω on-chip source termination and is interfaced to a spectrum analyzer with an off-chip 1.1 GHz bandwidth 2:1 center-tapped transformer. This setup ensures proper impedance matching for the DAC. Deep n-well structures have been extensively used in order to reduce the substrate noise coupling from the digital blocks. The MUX and the switch driver NMOS devices are also placed in small distributed deep n-wells, while the 4 bit DAC consisting of only NMOS is placed in a separate large deep n-well. The 15 current cells are laid out in a single column with the odd and even numbered cells placed on either side of the center, respectively, to mitigate the gradient errors. The clock distribution to the 15 MUX switch driver cells is carefully matched with an H-tree, and the NMOS of the distribution buffers are also placed in small distributed deep n-wells.
B. Final Multiplexer and DAC Current Cell Design
V. CHIP IMPLEMENTATION AND TESTING METHODOLOGY
A prototype IC is fabricated in a standard 65 nm CMOS technology and mounted on a JLCC-68 package. It integrates an 8 bit two-channel LA-TIDSM with a 4 bit DAC and a 1 Kbit memory to enable full speed testing of the DAC. Fig. 12 shows the chip photograph while Fig. 13 shows the overall testing methodology. The memory is designed using static TGFFs and laid out in a 32 b × 32 b aspect ratio with each location being 8 bit wide. The memory is written into serially at a low speed and then read at full speed internally during the DAC operation. This is achieved by first fetching four memory locations incrementally using a lower frequency clock. This 32 bit data is split into two 16 bit streams representing odd and even data. These two streams are then multiplexed using the clock to obtain two 8 bit data that are fed to the LA-TIDSM. The memory allows a 128-point deep signal to be tested and hence the minimum frequency bin spacing in the input signal is . For all the SFDR and IM3 measurements, a dithered input signal is used so that the non-linearity components are not masked, while no dithering is used during SNDR measurement. The entire chip including the pads occupies an area of 1.5 mm × 0.9 mm. The high-speed clock is sent into the chip as a sinusoidal differential signal and amplified to rail-to-rail within the chip. Static CMOS pseudo-differential clock distribution is used. Fig. 14 shows the overall clock distribution strategy for the IC using the pseudo-differential clock inverter (CI) as a building block. The short clock path to the MUX comprising only 7 inverter stages with an H-tree (mentioned earlier in Section IV-B) is also shown in the same figure. The duty cycle is set by the cross-coupled inverters in the clock distribution and hence no external duty cycle calibration of the input clock is performed.
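The coherent test frequencies available from a 128-point memory can be sketched as follows. The sketch assumes that the stored record is replayed periodically, so that only integer multiples of fs/128 are available; the function names are hypothetical.

```python
MEMORY_DEPTH = 128  # number of 8 bit samples in the test memory (from the text)


def bin_spacing_mhz(fs_gsps, depth=MEMORY_DEPTH):
    """Frequency spacing between coherent tones for a periodically replayed record."""
    return fs_gsps * 1e3 / depth


def coherent_tone_mhz(bin_index, fs_gsps, depth=MEMORY_DEPTH):
    """Frequency of the tone that completes bin_index cycles per record."""
    return bin_index * bin_spacing_mhz(fs_gsps, depth)


if __name__ == "__main__":
    fs = 11.0  # GS/s
    print(f"bin spacing: {bin_spacing_mhz(fs):.4f} MHz")
    for k in (11, 13, 33):
        print(f"bin {k}: {coherent_tone_mhz(k, fs):.1f} MHz")
```

At 11 GS/s, bins 11 and 13 land at about 945 MHz and 1117 MHz, which matches the two-tone frequencies reported in Section VI.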
VI. MEASUREMENT RESULTS
The LA-TIDSM DAC achieves an effective sample rate of 11 GS/s. Since the 3 dB bandwidth of the transformer is 1.1 GHz, all the measurements are restricted to this bandwidth. Fig. 15 shows the measured wideband spectrum and the noise shaping at 11 GS/s with a 1.1 GHz input tone. Fig. 16 shows that the measured SNDR is 39 dB in a 1.1 GHz bandwidth. Fig. 17 shows a measured IM3 of -49 dBc with two -6 dBFS tones located at 945 MHz and 1117 MHz, respectively. Due to the limited depth of the testing memory, the closest possible distance between two coherently sampled sinusoidal tones is 170 MHz. To measure the harmonic distortion, a 428 MHz tone is the highest frequency whose HD2 and HD3 lie close to the 0-1.1 GHz band. The measured HD2/HD3 is 56 dB/53 dB, respectively, and shown in Fig. 18. Fig. 19 shows a sweep of the input frequency versus the measured SFDR (0-1.1 GHz band), SNDR (0-input frequency) and IM3 (center frequency) at 11 GS/s. The figure shows that a greater than 53 dB SFDR and smaller than -49 dBc IM3 performance is achieved in the 0-1.1 GHz band. The measured SNDR is 42 dB (ENOB 6.8 bits) for the WiGig 880 MHz BW and 39 dB (ENOB 6.2 bits) in a 1.1 GHz BW. The total measured power consumption is 117 mW from 1 V digital (90 mW) and 1.2 V analog (27 mW) supplies. The power and area breakdown of the DAC is shown in Table IV. In order to evaluate only the final MUX and estimate the DCE in the DAC, the 4 bit DAC is configured as a wideband Nyquist DAC that is directly driven from the memory by using the modulator bypass path in the chip. A 4 bit unshaped single tone signal at 2.83 GHz ( ) is used. This results in a measured interleaving spur of -36.9 dBc at 2.67 GHz ( ) as shown in Fig. 20. The timing error, is then calculated using [18] (11)
This yields or an estimated DCE of 0.88%. Using (1), the DCE is found to contribute to a 1.2 dB relative SNDR loss for the IEEE 802.11ad 880 MHz BW and a 0.6 dB loss for the 1.1 GHz BW.
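The DCE estimate can be approximately reproduced with a small-skew model of the interleaving spur. The relation used below, spur amplitude ratio ≈ π·f_in·Δt, is an assumption and is not claimed to be the exact form of (11) from [18]; it is used here because it reproduces the reported 0.88% when the timing error is referred to the 5.5 GHz half-rate clock period. All names are hypothetical.

```python
import math


def timing_error_from_spur(spur_dbc, f_in_hz):
    """Timing error in seconds from the spur level, assuming ratio = pi * f_in * dt."""
    ratio = 10.0 ** (spur_dbc / 20.0)
    return ratio / (math.pi * f_in_hz)


def duty_cycle_error(dt_s, f_clk_hz):
    """Timing error expressed as a fraction of the clock period."""
    return dt_s * f_clk_hz


dt = timing_error_from_spur(-36.9, 2.83e9)   # measured spur and tone frequency (from the text)
dce = duty_cycle_error(dt, 5.5e9)            # half-rate clock at 11 GS/s

if __name__ == "__main__":
    print(f"timing error: {dt * 1e12:.2f} ps")
    print(f"estimated DCE: {100 * dce:.2f}%")
```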
In order to measure the IEEE 802.11ad spectral mask, single-carrier 16-QAM encoded random data with a frequency bin spacing of MHz between 0 and 880 MHz is first generated and pulse-shaped in Matlab with an 18th-order RRC filter having a 0.25 roll-off factor. This data is loaded into the memory for the mask measurement. The filtering is achieved from a combination of the 1.1 GHz interfacing transformer, bonding wire inductance, JLCC socket capacitance and the PCB track. It is seen that this overall combination provides a 1.5th-order low-pass response between 0.95-1.9 GHz and a 2.3rd-order low-pass filter response between 1.9-3 GHz. Fig. 21 shows the measured spectral mask under these conditions at 10.56 GS/s operation. It can be observed that the mask of the IEEE 802.11ad (WiGig) standard is met and the out-of-band quantization noise from the second-order modulator is found not to be a limiting factor. Table V shows the comparison of this LA-TIDSM DAC with previously reported ΔΣ DACs having a sample rate GHz. It is seen that this work represents an improvement of over five times in the measured bandwidth and is the first ΔΣ DAC to achieve a sample rate greater than 10 GS/s and a BW greater than 1 GHz. High-speed DSMs have also been used in hybrid DACs (a combination of Nyquist and ΔΣ DACs) [12], [23] and frequency synthesizers [21]. Table VI shows a comparison with these previously reported high-speed digital ΔΣ modulators having greater than 5 GHz speed. The table shows that the high speed modulator space is dominated by the MASH architecture and this LA-TIDSM achieves the highest speed.
Since the aim of this LA-TIDSM DAC is to provide a third alternative to the traditional Nyquist DAC based architecture [ Fig. 1(a) ] and the oversampled high-speed Nyquist DAC architecture [ Fig. 1(b) ], it is of interest to compare the performance of this DAC with other previously reported DACs with these characteristics and a similar resolution. Table VI shows this comparison. For high-speed DACs reported in [18] and [28] , performance in the 0-1.1 GHz bandwidth has been extracted so a comparison with similar bandwidths can be made. It can be seen that the overall SFDR in this work shows a similar performance as these Nyquist DACs. The overall figure-of-merit (FOM) [29] is found to be comparable to the other Nyquist DACs. Since 75% of the power in DAC comes from the digital part, this DAC can benefit from further CMOS scaling which can further improve its FOM. An area comparison of this DAC with [4] and [30] is easier because these DACs are also designed in 65 nm CMOS. The DAC in this work has 1.6 times more area than the Nyquist DAC presented in [4] . In [30] , although a very compact DAC is presented, a high performance analog transistor with 1.5 times better matching parameter, is used. If normal low-Vt low-power transistors are used, then the DAC would have 2 times larger area than the Nyquist DAC of [30] . This indicates that the two-channel TI-DAC has a larger area consumption as compared to Nyquist DACs due to the increased digital processing. If the area is a constraint, then a TIDSM with larger number of channels can help to reduce the area [21] .
The DAC clock spurs can be a concern in transceivers utilizing frequency-division duplexing (FDD) where transmit and receive operations occur simultaneously in bands that are close to each other, such as LTE or W-CDMA standards. The DAC clock can leak through the antenna duplexer into the receiver band degrading its performance [31] . IEEE 802.11ad compliant 60 GHz radio transceivers, on the other hand, use time-division duplexing (TDD) where transmit and receive operations are in the same band with separate antennas and no duplexer [4] , [5] . Thus, the receiver performance is less affected by the DAC clock spurs.
VII. CONCLUSION
This work has presented an 11 GS/s 1.1 GHz bandwidth time-interleaved MASH 1-1 ΔΣ DAC in 65 nm CMOS that is suitable for the 60 GHz radio baseband. Consisting of only fifteen analog current cells (4 bit DAC), the highly digital DAC achieves a dynamic performance of 53 dB SFDR, -49 dBc IM3 and 39 dB SNDR in a 1.1 GHz bandwidth while consuming 117 mW of power. The high sample rate and bandwidth are enabled by a two-channel architecture allowing a single half-rate-clock for the logic and the multiplexing. This requires the logic to operate at half of the sampling rate, which is achieved through a look-ahead technique that reduces the critical path of the modulator to one adder only. The DAC has the potential for use in digital architectures for wideband transmitters.
APPENDIX A
Numerical Example of the LA-TIDSM
An example of the look-ahead approach is presented here using decimal numbers in order to explain the post-decode block. Assume that the integrator can hold values between 0 and 9. Let the value stored in the integrator, . Let the two channel inputs and be 6 and 8, respectively. Then, using (2)-(5), the following result is obtained for the conventional TIDSM of Fig. 6 : , , and . Now considering the LA-TIDSM of Fig. 8 , we get and . Moving into the integrator, the following result is obtained: , , and . It is seen that the value of and are correctly calculated. Also, while . Hence, the correct value of has to be predicted looking at , and i.e., the truth table in Table II . For , and , we get from the table which is the correct expected value in a conventional TIDSM.
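The decimal example can be replayed in a few lines. The channel inputs 6 and 8 and the 0 to 9 integrator range are taken from the text; the stored integrator value of 3 is an assumption chosen for illustration, since the original value is not legible here, and the post-decode is written arithmetically as `c1 = cp + c1_la - c0`. The names are hypothetical.

```python
BASE = 10  # the integrator holds values 0..9 (from the text)


def conventional_decimal(x0, x1, acc):
    """Conventional TIDSM order: the second addition uses the first sum."""
    t0 = x0 + acc
    s0, c0 = t0 % BASE, t0 // BASE
    t1 = x1 + s0
    s1, c1 = t1 % BASE, t1 // BASE
    return {"s0": s0, "c0": c0, "s1": s1, "c1": c1}


def look_ahead_decimal(x0, x1, acc):
    """LA-TIDSM order: pre-add the inputs, then add the stored value in parallel."""
    tp = x0 + x1
    p, cp = tp % BASE, tp // BASE          # look-ahead sum and its carry
    t0 = x0 + acc
    s0, c0 = t0 % BASE, t0 // BASE
    t1 = p + acc
    s1, c1_la = t1 % BASE, t1 // BASE      # loop carry for CH1, possibly incorrect
    c1 = cp + c1_la - c0                   # post-decode (assumed arithmetic form)
    return {"s0": s0, "c0": c0, "s1": s1, "c1": c1, "cp": cp, "c1_la": c1_la}


if __name__ == "__main__":
    acc = 3  # assumed stored integrator value
    print(conventional_decimal(6, 8, acc))
    print(look_ahead_decimal(6, 8, acc))
```

For these values the loop carry of CH1 in the look-ahead order is 0 while the expected carry is 1; the decode step recovers 1 from the three propagated carries.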
APPENDIX B
Proof of Equivalency Between TIDSM and LA-TIDSM
The critical part of LA-TIDSM is arriving at the truth table for correctly decoding (Table II) that results in a functional equivalency with the TIDSM. In this section, only the LSB's of and are used and hence the LSB suffix for these variables is dropped. Consider the sequencing of operations in a TIDSM (Fig. 6) . Let the integrator output in the previous clock be called for the remainder of this section. Then, the value of the carry is calculated in the TIDSM by combining (2), (3) and (5) and re-writing them as (12) (13) Now, looking at the LA-TIDSM in Fig. 8 , needs to be correctly predicted from , and i.e., must be estimated for the eight different cases. The following two identities are used in the proof for any two bit unsigned numbers, and . (14) (15) Only two of the eight cases from 
Using (20) in (22), we have (23) Now, using (21) in (23), we get , which cannot be true. Hence, this condition cannot occur implying . Extending this proof similarly to the remainder of the six cases results in the truth table of Table II. 
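A compact way to see why the two excluded cases cannot occur is to balance the carries of the two addition orders. The derivation below is a sketch in assumed notation, not the case-by-case proof of this appendix: $x_0$, $x_1$ are the LSB inputs, $S$ is the previous integrator output, $B = 2^k$ is the integrator modulus, $P$ and $C_P$ are the look-ahead sum and carry, and $C_1'$ is the CH1 carry produced inside the LA-TIDSM loop.

```latex
\begin{align*}
\text{TIDSM:}\quad    & x_0 + S = C_0 B + S_0, \qquad x_1 + S_0 = C_1 B + S_1 \\
\text{LA-TIDSM:}\quad & x_0 + x_1 = C_P B + P, \qquad P + S = C_1' B + S_1' \\
\text{Adding each pair:}\quad
  & x_0 + x_1 + S = (C_0 + C_1)\,B + S_1 = (C_P + C_1')\,B + S_1' \\
\text{Since } 0 \le S_1, S_1' < B:\quad
  & S_1' = S_1, \qquad C_0 + C_1 = C_P + C_1' \\
\Rightarrow\quad & C_1 = C_P + C_1' - C_0
\end{align*}
```

Because $C_1 \in \{0, 1\}$, the combinations $(C_0, C_P, C_1') = (1, 0, 0)$ and $(0, 1, 1)$, which would give $C_1 = -1$ and $C_1 = 2$, cannot occur. For the remaining six combinations the result equals $C_0 \oplus C_P \oplus C_1'$.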
