A Wideband 2×13-bit All-Digital I/Q RF-DAC
Morteza S. Alavi
CONSUMER electronic devices such as smart phones, tablets, and laptops are continuously evaluated in regard to three key criteria: low-cost, high power efficiency, and support of multi-mode/multi-band communication standards such as Wi-Fi, Bluetooth, and fourth generation (4G) of 3GPP cellular. An RF transmitter (TX) is considered the most power-consuming circuitry of the entire radio system, thus constituting a hindrance in extending the battery lifetime of portable wireless devices. Recently, intensive research has been directed toward realization of digitally intensive and all-digital RF TXs that provide high output power at high efficiency while being highly reconfigurable.
In consideration of this, an RF TX modulator, being the nearest to the antenna as it converts digital baseband modulation samples into an RF waveform, is considered the most critical building block of the TX, and it can take on either a polar [1]-[5], Cartesian in-phase/quadrature-phase (I/Q) [6]-[17], or outphasing [18] topology. For wide modulation bandwidths, Cartesian modulators prove to be a better choice than their polar or outphasing counterparts [19]-[21], because they sum the I and Q signals directly and linearly and thus avoid bandwidth expansion. Reference [6] proposed a digitally controlled I/Q modulator that utilizes current sources to isolate the orthogonal I and Q paths. The utilization of the current sources, however, deteriorates the far-out noise. Additionally, in order to meet the required RF output power, that approach employs an external power amplifier. Later, an I/Q direct digital RF modulator was introduced in [13], in which a finite impulse response (FIR)-based filter is embedded to suppress the quantization noise in the receiver frequency band. Implemented in 130-nm CMOS, it also employed arrays of current sources to isolate the orthogonal paths as well as to set the proper coefficient values for the FIR filtering operation.
An all-digital orthogonal I/Q modulator concept was first proposed in [14], where a 2×3-bit static I/Q implementation achieved a maximum RF output power of 12.6 dBm. The effective resolution of the modulating samples is the most important parameter, as it directly impacts the achievable dynamic range, linearity, error vector magnitude (EVM), noise floor, and out-of-band spectral emission. We therefore recently proposed [22] an all-digital I/Q RF digital-to-analog converter (RF-DAC) with 2×13-bit resolution that can provide a peak output power beyond 22 dBm. Due to its versatility, high efficiency, wide bandwidth, and fine resolution, while requiring only a small chip area, the proposed solution is a very promising candidate for future multi-mode/multi-band TXs. In this paper, we elaborate in more detail on the system- and circuit-level design considerations, as well as on the digital calibration and the associated digital predistortion techniques.
This paper is organized as follows. Section II provides an overview of the concept of the digital I/Q RF-DAC along with system-level design considerations. The digital differential I/Q switch-array power generation stage and its related power-combining network are discussed thoroughly in Section III. The implementation is unveiled in Section IV. The digital I/Q calibration and digital predistortion techniques are addressed in Section V. Extensive measurement results are presented in Section VI.
II. CONCEPT OF DIGITAL I/Q TX
[...] their composite I and Q digital vectors. Their code resolution must be high enough to cover all I/Q points of the corresponding trajectory connecting the symbols. This indicates that, to support only a constellation diagram with a given number of symbols, the resolution of the digital I/Q modulator must be at least the value given by (1). In addition, the resolution also affects the subsequent quantization noise, which is discussed in more detail in Section II-B. An important issue related to any transmit modulator is its agility in traversing from one I/Q point to another. As graphically depicted by the two paths in Fig. 1(a), traversing along one trajectory instead of the other incites a more rapid complex baseband modulation, and consequently, the modulator must manage a wider bandwidth, as supported by a higher sampling rate. To do so, based on the idealized block diagram in Fig. 1(b), the in-phase and quadrature-phase digital baseband signals are up-sampled and interpolated. This process ensures that the spectral images are attenuated and located far away from the carrier, so that they can be easily filtered out. The up-sampled I and Q signals are multibit digital signals, which should be directly up-converted to their continuous-time reconstructed real-valued RF output signal. As a result, these signals are applied to a pair of digital-to-RF-amplitude converters (DRACs), each comprising an array of 1-bit unit cell mixers and 1-bit unit cell digital power amplifiers (DPAs).
The DRACs are clocked by the four differential quadrature up-converting clocks. According to Fig. 1(a), the four quadrants of the constellation diagram must be covered by the modulator. This can be achieved by swapping the two clocks of the in-phase differential pair or of the quadrature differential pair according to the sign bits of the up-sampled I and Q signals. The DRAC outputs are connected to a power combiner that facilitates the conversion of the up-converted digital signals into the reconstructed RF output. In fact, the digital I/Q modulator represents an RF-DAC. In this approach, however, the primary challenge is the orthogonal summing of the I and Q DRAC outputs in order to reliably reconstruct the modulated RF signal.
A. Orthogonal Summing Operation of RF-DAC
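The I/Q RF-DAC of Fig. 1(b) has two signal paths, namely, an in-phase path and a quadrature path. Each path up-converts its up-sampled baseband signal with the corresponding clock, as expressed by (2) and (3), respectively. The final signal is generated by the vectorial summation of (2) and (3), which yields (4).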
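As a sketch of the operations referred to as (2)-(4), the two path signals and their sum could be written as follows. The symbol names ($I_{up}$ and $Q_{up}$ for the up-sampled baseband signals, $LO_I$ and $LO_Q$ for the up-converting clocks, and $RF_I$, $RF_Q$, $RF_{out}$ for the path and output signals) are assumptions chosen for illustration and are not taken from the original equations.

```latex
\begin{align*}
RF_I(t)     &= I_{up}(t)\, LO_I(t) \\
RF_Q(t)     &= Q_{up}(t)\, LO_Q(t) \\
RF_{out}(t) &= RF_I(t) + RF_Q(t)
\end{align*}
```

In this form, the sign of each baseband word selects which clock of the differential pair is used, which is how all four quadrants are reached.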
The summing operation must be orthogonal, and there should be no interaction or correlation between the in-phase and quadrature paths; otherwise, the EVM and bit error rate (BER) degrade and spectral regrowth emerges. If the duty cycle of the up-converting clocks is 50% [17], an overlap between the in-phase and quadrature clocks will always exist. Mathematically, their orthogonality can be verified using the dot-product operation of (5), which is evaluated over one clock period with the clocks assumed to be of unity amplitude. According to Fig. 2(a) and (5), the 50% duty-cycle in-phase and quadrature clocks are not orthogonal. The idealized digital I/Q modulator depicted in Fig. 2(c) is simulated with these clocks. According to its SPICE-simulated constellation diagram in Fig. 2(d), the related EVM at 16-dBm RF output power is -21 dB. Hence, to improve linearity, a sophisticated digital predistorter would be required [17]. Moreover, the drain efficiency of its composite DPA is low because the maximum conduction angle is 75% of the RF clock cycle.
To perform an orthogonal summation, the duty cycle of the up-converting clocks is selected as 25% to avoid any interaction between the in-phase and quadrature clocks. Based on Fig. 2(b), the overlap between them is now zero. Thus, they are orthogonal.
Employing the aforementioned 25% duty-cycle up-converting clocks in the digital I/Q modulator of Fig. 2(c), the circuit-level simulated constellation diagram of Fig. 2(e) is obtained. Its corresponding EVM at 16-dBm RF output power is -32 dB. As a result, this system only requires a very simple digital predistortion (DPD) [22], and more importantly, the related drain efficiency of its composite DPA is higher due to the 50% maximum conduction angle. Note that, according to Fig. 2(d) and (e), the I/Q RF-DAC can address the entire four-quadrant constellation diagram.
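The orthogonality argument of this subsection can be checked numerically. The following minimal sketch assumes ideal unity-amplitude rectangular clocks with the quadrature clock delayed by a quarter period; the function names and the sampling resolution `N` are hypothetical choices for illustration. It evaluates the normalized dot product of the two clocks and their combined conduction angle.

```python
def clock(phase, duty, n):
    """One period of a unity-amplitude rectangular clock sampled at n points.

    phase and duty are given as fractions of the clock period.
    """
    start = round(phase * n)
    width = round(duty * n)
    return [1 if (i - start) % n < width else 0 for i in range(n)]


def dot(a, b):
    """Discrete version of (1/T) * integral of a(t) * b(t) over one period."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def conduction(a, b):
    """Fraction of the period during which at least one clock is high."""
    return sum(1 for x, y in zip(a, b) if x or y) / len(a)


N = 360  # samples per clock period (assumed resolution)
for duty in (0.50, 0.25):
    lo_i = clock(0.00, duty, N)  # in-phase clock
    lo_q = clock(0.25, duty, N)  # quadrature clock, delayed by T/4
    print(f"duty={duty:.0%}: overlap={dot(lo_i, lo_q):.2f}, "
          f"conduction={conduction(lo_i, lo_q):.2f}")
```

Under these assumptions, the 50% clocks overlap for a quarter of the period and conduct for 75% of it, whereas the 25% clocks have zero overlap and conduct for 50% of the period, in line with the values quoted in the text.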
B. System Design Considerations
The dynamic performance of the all-digital I/Q RF-DAC strongly depends on the interpolation rate of the up-sampled I/Q baseband signals and on their resolution. Since, in this prototype, the digital signal processing, including the I/Q baseband interpolation, is performed in MATLAB and subsequently uploaded into two on-chip static random access memories (SRAMs) via a universal asynchronous receiver/transmitter (UART), the memory length (SRAM capacity) also affects the RF-DAC performance. Fig. 3(a) exhibits the system-level simulation setup that reflects the dependency of the dynamic performance on these parameters. First, the I and Q baseband signals are interpolated in software at the rate of the baseband sampling clock, which is generated by an integer division of the RF carrier local oscillator (LO) clock. Thus, the sampling clock and the up-sampled baseband signals are synchronized to the LO clock. Next, the up-sampled signals are quantized and then uploaded into the SRAM. Subsequently, the SRAM is read out using the sampling clock and directly fed to the DRAC block. Since the sampling clock is slower than the LO clock, the DRAC performs as a zero-order hold (ZOH) to balance the speed of the up-sampled baseband signals with the LO clock. For the sake of signal-processing clarity, the ZOH is depicted as a separate block between the memory and the DRAC. Note that all simulations in Fig. 3 are performed under the assumption that the DRAC resolution is identical to that of the quantizer; the carrier frequency is 2.4 GHz. As a result, the three yet-to-be-defined variables of Fig. 3(a) are the sampling-clock frequency, the DRAC resolution, and the memory length, which should be appropriately selected. The lower limit of the sampling-clock frequency is determined by the highest operational bandwidth of the baseband signals. At present, the bandwidth of baseband communication signals does not exceed 160 MHz. On the other hand, the upper limit could be as high as the LO frequency. Note that, in this case, the frequency divider would be redundant. In reality, running the sampling clock at the LO rate could consume too much power, thus reducing the overall system efficiency. Fig. 3(b) exhibits the simulations in which the sampling-clock frequency is swept from 150 to 600 MHz in increments of 150 MHz while the baseband inputs are 64-tone/80-MHz signals. The subsequent RF power spectrum is shaped by the sinc function of the ZOH interpolation.
The ZOH operation creates spectral replicas at integer multiples of the sampling-clock frequency away from the carrier. In conclusion, the upsampling and synchronization operations represent a ZOH that performs like a sinc filter whose zeros are located at integer multiples of the sampling-clock frequency. As such, the spectral images are notched by the ZOH operation. Note that doubling the sampling-clock frequency not only reduces the out-of-band emissions, but also decreases the spectral replicas by 6 dB. If the sampling-clock frequency were 150 MHz, it would be infeasible to support the 160-MHz baseband signals. On the other hand, a 600-MHz clock consumes twice the power of a 300-MHz clock. Furthermore, an SRAM in low-power 65-nm CMOS would not be feasible at 600 MHz. Therefore, the sampling-clock frequency is selected as 300 MHz, which is generated using a ÷8 divider.
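The 6-dB reduction of the spectral replicas can be illustrated with the textbook ZOH response, |sin(πx)/(πx)| with x equal to the frequency offset divided by the sampling frequency. The sketch below is not the simulation setup of Fig. 3; the single 10-MHz tone offset and the function names are assumptions made for illustration. It compares the level of the first replica relative to the wanted tone for sampling clocks obtained by dividing the 2.4-GHz LO by 16, 8, and 4.

```python
import math


def zoh_gain_db(f_offset_hz, fs_hz):
    """Magnitude of the ZOH (sinc) response in dB at an offset from the carrier."""
    x = f_offset_hz / fs_hz
    if x == 0:
        return 0.0
    return 20 * math.log10(abs(math.sin(math.pi * x) / (math.pi * x)))


def first_replica_db(f_bb_hz, fs_hz):
    """Level of the first replica (at fs - f_bb) relative to the tone at f_bb."""
    return zoh_gain_db(fs_hz - f_bb_hz, fs_hz) - zoh_gain_db(f_bb_hz, fs_hz)


F_LO = 2.4e9   # carrier frequency stated in the text
F_BB = 10e6    # assumed baseband tone offset, for illustration only

for divider in (16, 8, 4):
    fs = F_LO / divider  # 150, 300, and 600 MHz
    print(f"fs = {fs / 1e6:.0f} MHz: first replica at "
          f"{first_replica_db(F_BB, fs):.1f} dBc")
```

For a tone that is much lower in frequency than the sampling clock, the replica level scales approximately with the ratio of the two, so each doubling of the sampling clock lowers the replica by about 6 dB.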
Another simulation is performed by sweeping the bandwidth (two-tone frequency spacing) of the baseband signals from 20 to 80 MHz. According to Fig. 3(c), the wider band signals produce higher out-of-band spectra while the spectral replicas are larger (6 dB/octave). This is merely a limitation of the present implementation and is entirely due to the limited sample-storing memory relative to the signal period. Fig. 3(d) further illustrates that doubling the memory length improves the noise floor, although this would not be a limitation in practical TXs. Since, in this work, the upsampled baseband signals residing in the SRAM are furnished to the DRAC, this configuration performs as a fast Fourier transform (FFT) executor. Consequently, a greater number of FFT points results in a lower out-of-band spectrum. In this work, however, the memory length is selected as 8 kword (every word is 16 bits) to save chip area. We should emphasize that the SRAM storage of modulating samples was selected rather than a real-time reception of the baseband data in order to emulate the environment of contemporary single-chip radios in which the RF transceiver is integrated with the digital baseband. This affords the benefit of avoiding contamination of the sensitive RF spectrum from the wideband modulating digital data through bond pads, bond wires, and the electrostatic discharge (ESD) ring. As discussed earlier, the lower limit of the resolution is determined by (1). However, it should be much higher than that in order to meet the quantization noise requirements of practical communication standards. As with any digital-to-analog converter (DAC), increasing the resolution improves the dynamic range of the RF-DAC. Based on Fig. 3(e), every extra bit improves the out-of-band spectrum by 6 dB. In this work, the resolution is selected as 13 bits (the most significant bit (MSB) is the sign bit) to support the most stringent communication standards. [...] band DACs of a conventional analog I/Q TX.
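The 6-dB-per-bit trend can be reproduced with a simple quantization experiment. The sketch below is not the simulation of Fig. 3(e); it assumes an ideal uniform quantizer, a full-scale sine at an arbitrary non-coherent frequency, and a signed code in which one bit is the sign. The function name and the test-tone frequency are hypothetical.

```python
import math


def sqnr_db(n_bits, n_samples=4096):
    """Signal-to-quantization-noise ratio of a full-scale sine, in dB.

    The code is signed: one bit is the sign, the rest encode the magnitude.
    """
    levels = 2 ** (n_bits - 1) - 1  # largest magnitude code
    signal_power = 0.0
    noise_power = 0.0
    for k in range(n_samples):
        x = math.sin(2 * math.pi * 0.1234567 * k)  # arbitrary test tone
        q = round(x * levels) / levels             # ideal uniform quantizer
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)


for bits in (11, 12, 13):
    print(f"{bits} bits: SQNR = {sqnr_db(bits):.1f} dB")
```

Each additional bit halves the quantization step, which lowers the quantization noise power by a factor of four, i.e., by approximately 6 dB.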
Moreover, I/Q calibration can be easily performed at baseband while its band- [...]
The peak voltage swing at the two differential drain nodes could be more than 2.4 V, which can cause device breakdown if the switchable cascode structure is not employed. Using the cascode, though, increases the on-resistance of the unit cell switches [see Fig. 5(b)], which subsequently causes higher power loss as well as lower drain efficiency. Note that all simulations of Fig. 5(b)-(e) are performed with a channel length of 60 nm and a width of 500 nm. As stated, the DRAC resolution is 12 bits, which requires 4096 switch-array unit cells in each of the four orthogonal paths. In this work, the targeted maximum RF output power is more than 22 dBm while keeping the supply voltage at 1.2 V. Therefore, the maximum RF power of each orthogonal path should be one quarter of the total output power, i.e., about 6 dB below it. According to simulations, utilizing 500-nm switches in the 2×13-bit RF-DAC configuration ensures that each orthogonal path provides more than 16 dBm. Fig. 5(c) indicates that the drain efficiency of the cascode switch is lower than that of a simple switch, such as the one in Fig. 2(c), due to its higher on-resistance. In this simulation, the power-combining network is lossless, which would result in 100% drain efficiency if the on-resistance were, hypothetically, zero. Increasing the number of on-switches from 512 to 4096 improves the drain efficiency because the larger number of turned-on switches lowers the overall power loss. Note that the cascode switch not only mitigates the related breakdown problem, but is also exploited as an up-converting unit cell mixer. By controlling each cascode transistor unit according to its related up-sampled baseband data, the equivalent on-resistance of the switch array is modified. Therefore, this can modulate the amplitude and phase of the reconstructed RF output signal. Finally, perhaps the most significant advantage of this cascode structure is that it effectively isolates the I and Q paths, which results in improved EVM and linearity.
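The per-path power budget quoted above follows from simple decibel arithmetic. The short sketch below, with hypothetical helper names, converts the 22-dBm target to milliwatts, divides it over the four orthogonal paths, and also counts the unit cells implied by a 12-bit magnitude.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def mw_to_dbm(p_mw):
    """Convert a power level from milliwatts to dBm."""
    return 10 * math.log10(p_mw)


TOTAL_DBM = 22.0  # targeted maximum RF output power from the text
PATHS = 4         # number of orthogonal paths
DRAC_BITS = 12    # magnitude resolution of each DRAC from the text

per_path_dbm = mw_to_dbm(dbm_to_mw(TOTAL_DBM) / PATHS)
unit_cells = 2 ** DRAC_BITS

print(f"per-path power: {per_path_dbm:.2f} dBm")
print(f"unit cells per path: {unit_cells}")
```

One quarter of the total power corresponds to a reduction of about 6 dB, which is why each path must deliver roughly 16 dBm.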
In addition to its on-resistance, the cascode MOS switch also exhibits a considerable gate/drain capacitance that is proportional to its channel width [see Fig. 5(d)]. Choosing wider cascode switches in order to achieve higher efficiency, unfortunately, increases the power consumption of the preceding RF clock buffers, which subsequently reduces the overall system efficiency. As a result, the selected channel width of 500 nm appears to be a good compromise between the overall system efficiency and the maximum RF power. Note that the drain capacitance also depends on the drain voltage. Fig. 5(e) demonstrates that the drain capacitance changes by almost a factor of two between the two simulated drain voltages. Therefore, turning on the switches as well as varying the drain voltage modifies the drain capacitance, which eventually results in AM-AM and AM-PM nonlinearities. As a result, the selected power-combining network must also manage the drain capacitors.
The power-combining network is an important part of the RF-DAC, as it determines its output power, efficiency, and quadrature accuracy. Its significance is verified using load-pull simulations and demonstrated in Fig. 6(a). Note that, for simplicity, the load-pull simulation is only performed for a single drain node, and its related drain efficiency, power, and modulation error contours are plotted. The modulation error is defined as the deviation of the modulated RF output signal from its ideal position. The load-pull simulation of Fig. 6(a) indicates that the orthogonality is diminished for loads corresponding to the high efficiency and power contours. This reveals that utilizing up-converting clocks with a 25% duty cycle is a necessary, but not sufficient, condition for the orthogonal operation. The explanation lies in the fact that, at low RF power, the I and Q paths barely interact with each other. At higher RF power, however, the on-resistance is lower and the drain capacitance is higher (lower capacitive reactance); therefore, the I and Q paths begin loading each other's matching network.
Note that, according to the simulated load-pull contours, one of three possible loads could be selected: the load for maximum efficiency, for maximum power, or for minimum modulation error. Fig. 6(b) illustrates the simulated modulation error versus the number of turned-on switches for these three load scenarios. This simulation confirms that the most appropriate selection for a modulation accuracy better than -28 dB is the load based on the minimum modulation error, which is indicated in Fig. 6(a). This load affords the best modulation accuracy and reasonable efficiency (exceeding 50%), as well as generating the desired RF output power. By doing so, the digital predistortion would be simpler. In conclusion, to maintain the four up-converted paths as orthogonal at all RF power levels, the circuit elements of the power-combining network must also be included in all I and Q paths.
In order to achieve high efficiency at high RF power, and considering the four up-converting clocks as digital clock signals of rectangular pulse shape, the class-E type matching network [23], [24] is adopted. Furthermore, the class-E matching can absorb the drain capacitance of the cascode switches into the shunt capacitor of the matching network [see Fig. 5(a)]. It should be mentioned that, due to the electrical summation of the I and Q path signals, the overall duty cycle at the differential drain nodes in Fig. 5(a) is 50% at equal component power levels. In addition, in a class-E matching network, the loading condition for an RF signal with a 25% duty cycle is entirely different from that at a 50% duty cycle [24]. This explains why the efficiency/power contours of Fig. 6(a) significantly differ from the modulation error contours.
Based on the above considerations, the orthogonal power-combining network is designed as four identical parallel class-E type matching networks, which are distinctly illustrated in Fig. 5(a). In this idealized power combiner, a dc-feed inductor provides the required dc current of the DRAC, and a dc-blocking capacitor decouples the drain node from the output. Three components remain to be defined, and their values are calculated in this section. As mentioned earlier, each orthogonal path generates more than 16 dBm of RF power from the 1.2-V supply. This requirement leads to (8), which contains a unitless function that depends only on the duty cycle [21], [24]. Based on (8), (9) is derived. According to [21, eqs. (11) and (12)], the two remaining component values strongly depend on the value obtained from (9), through two further unitless functions that depend only on the duty cycle. Hence, these two values follow once the duty cycle is fixed. The idealized power combiner of Fig. 5(a) is rather impractical. It should be modified such that it does not contain bulky components such as the dc-feed inductor and the dc-blocking capacitor. Moreover, the eventual RF output must drive a single-ended 50-Ω load.
To achieve these design goals, a balun is incorporated into the power-combining network, as exhibited in Fig. 7(a). Accordingly, the transformer is modeled by a leakage inductor, a magnetizing inductor, and an ideal transformer with a given turns ratio [25]. By comparing the idealized power-combining network of Fig. 5(a) with the more practical one of Fig. 7(a), the element values of the practical network are derived [21]. Moreover, a resonance condition that involves the bond-pad capacitance must be satisfied, whereas the bond-wire inductor only slightly affects the power-combining network. Generally, the desired inductance determines the size and structure of the selected transformer, which subsequently determines the remaining transformer parameters for a given value of the magnetic coupling factor. To conclude, the balun decouples the drain dc condition from the load (eliminating the dc-blocking capacitor) and converts the differential signal to a single-ended output [21], [26]. Furthermore, the balun provides a dc bias path for the DRAC transistor switches (eliminating the dc-feed inductor) and transforms the 50-Ω load to the desired impedance at the drain nodes of the DRAC. As noted previously, the targeted output power for this design is 22 dBm. Based on the required inductances, the transformer size is selected as 450 × 450 µm with a 1:2 turns ratio. The transformer windings are 12-µm wide with 3-µm gaps between them. The balun must manage high currents of up to 360 mA. To do so, it employs three parallel traces in the primary winding that are interdigitated with the secondary winding in order to satisfy the electromigration rules of the technology [4]. Based on ADS Momentum simulations, the related coupling factor up to 6 GHz is 0.84. Moreover, according to Momentum and circuit-level simulations, the insertion loss of the balun is 1 dB, which causes the drain efficiency of the modulator to drop from almost 55% to approximately 44%.
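The efficiency drop caused by the balun can be checked with a one-line calculation: a passive insertion loss scales the delivered RF power, and therefore the drain efficiency, by the same linear factor. The sketch below uses only the figures quoted in the text; the variable and function names are hypothetical.

```python
def apply_insertion_loss(efficiency, loss_db):
    """Scale an efficiency by the linear power factor of an insertion loss."""
    return efficiency * 10 ** (-loss_db / 10)


EFF_LOSSLESS = 0.55  # drain efficiency with a lossless combiner (from the text)
BALUN_LOSS_DB = 1.0  # simulated balun insertion loss (from the text)

eff_with_balun = apply_insertion_loss(EFF_LOSSLESS, BALUN_LOSS_DB)
print(f"drain efficiency with balun: {eff_with_balun:.1%}")
```

A loss of 1 dB corresponds to a power factor of about 0.79, so 55% becomes roughly 44%, consistent with the simulated values.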
The shunt input and output capacitors of the transformer balun are employed to fine-tune the amplitude and phase relationship of the I/Q modulator for the desired frequency. For this purpose, two 4-bit binary-weighted capacitor banks are added at the primary and secondary sides [see Fig. 7(b) and (c)]. Since the entire design is achieved using 1.2-V standard thin-oxide [...] Based on the simulations, the primary capacitance varies from 4.8 to 7.8 pF, while the secondary capacitance varies from 1.9 to 2.7 pF. In addition, the reliability of the RF-DAC is simulated with the assistance of Fig. 7(e). The peak voltage at the drain node is less than 2.4 V, which indicates that breakdown will not occur. Moreover, the minimum drain voltage is approximately 0.25 V, which results in an appropriate drain efficiency. Fig. 7(f) demonstrates the RF output signal. Its related RF output power is more than 22.6 dBm while the drain efficiency exceeds 44%. Also, the desirable modulation accuracy of the I/Q RF-DAC can be quickly ascertained from Fig. 7(e) and (f). Based on these simulations, the I/Q signal is the result of the orthogonal summation of the I and Q signals. Table I summarizes the design parameters of the power-combining network.
IV. IMPLEMENTATION OF DIGITAL I/Q TX
Fig. 8 reveals the block diagram of the implemented TX based on the proposed 2×13-bit RF-DAC. In the remainder of this section, its building blocks will be sequentially disclosed and their circuit design techniques described.
A. Clock Input Transformer
An off-chip single-ended clock at four times the carrier frequency is applied to an on-chip transformer, which converts it to differential clock signals. The transformer size is selected as 150 × 150 µm with a 1:1 turns ratio. The center tap of the secondary winding is connected to a common-mode bias voltage. The windings are 6-µm wide with 3-µm gaps between them. Per Momentum simulations, the coupling factor is [...] in the range of 7-13 GHz. Note that the simulated coupling factor is related to each differential segment of the transformer. Based on that, the circuit simulations indicate that the transformer converts a 4-V single-ended signal to a 1.2-V differential clock that swings around the common-mode voltage. Due to nonidentical differential layout traces that introduce varying parasitic capacitance, the differential signals could arrive at the following ÷2 divider misaligned in phase, which might corrupt its operation. Therefore, the phases of the clocks are realigned employing back-to-back inverters.
B. High-Speed Rail-to-Rail Dividers
The differential clock is applied to two cascaded ÷2 dividers to generate the desired LO at the carrier frequency, as shown in Fig. 9(a) and (b). The ÷2 divider is implemented as a flip-flop-based frequency divider, which consists of four C²MOS latches [27] arranged in a loop [see Fig. 9(c)]. This topology produces four differential quadrature clock signals [see Fig. 9(a)] that operate at half the frequency of the divider input clock. The back-to-back inverters of Fig. 9(c) ensure that no illegal states will occur. They also align the differential clock phases. The input and output nodes of the C²MOS latches experience rail-to-rail voltage swing. Consequently, they exhibit a superior noise performance over the low-swing current-mode logic (CML) latches. On the other hand, due to the large current bias and lower voltage swing of the CML latches, their operational frequency can be much higher than that of C²MOS. Since the noise performance and power consumption are crucial design considerations, the C²MOS latches are adopted here. The clock signals, however, could be as high as 7 GHz, and the divider should be operational for all process, voltage, and temperature (PVT) conditions, which might be difficult to achieve. Dissipating more current (e.g., by employing wider transistors while keeping the same supply level) could improve the speed of the C²MOS latches. However, this would increase their power consumption and thus decrease the overall system efficiency of the TX.
In this work, however, in lieu of increasing the power, the data and clock inputs of the C²MOS latch are swapped [see Fig. 9(d)]. By doing so, the propagation delay of the latch to its Q output, and subsequently, the overall loop time period of the divider decreases. Based on simulations and confirmed through measurements, the RF-DAC frequency of operation can be as high as 3.5 GHz at the nominal supply voltage. Note that all other ÷2 divider circuits also utilize an identical structure. The transistor sizing, however, is adjusted based on their operational frequency. For instance, the width of all transistors in the next ÷2 divider in both the main RF clock path (÷2) as well as the baseband clock path (÷16/32) of Fig. 8 is reduced by a factor of 2. Furthermore, every other differential output clock of the first divider is applied to the next divide-by-2 circuits. By doing so, all C²MOS latches experience identical loading conditions. Thus, their fanouts are equal.
Note that all clocks in the digital baseband circuitry, as well as the four final RF fundamental clocks, are synchronized. The amplitude and phase imbalances of the I and Q paths would deteriorate the I/Q image and leakage performance of the TX; thus, they should be calibrated. The baseband and RF phase synchronization makes the I/Q calibration much simpler. Furthermore, employing two cascaded ÷2 dividers (i.e., a divide-by-4 circuit) improves the quadrature accuracy of the fundamental clocks, since all phases of the fundamental clocks are derived from the same rising edge of the master clock even in the event of a non-50% duty cycle.
C. Complementary Quadrature Sign Bit
As depicted in Fig. 8, the second ÷2 divider is followed by the sign bit circuitry. As shown in Fig. 10(a), it is implemented as two pseudo-differential (i.e., complementary) NAND-gate-based multiplexers whose input selection control signals are the sign bits (bit [12]) of the up-sampled I and Q words. Based on this 2-bit (i.e., four-state) selection control, the differential clock pair of the I path or of the Q path can be swapped, and thus the entire four-quadrant constellation diagram can be covered [see Fig. 10(b)]. In contrast to our previous scheme in [21], the sign bit circuitry is located between the second divider and the 25% duty-cycle generator. In this new arrangement, the sign bit circuitry manages the 50% duty-cycle clock instead of the 25% one, which reduces power consumption. Moreover, a simple back-to-back inverter pair [see Fig. 10(a)] is employed for further phase alignment, which was not feasible in [21]. As a result, by exploiting smaller devices, faster rise/fall times are achievable. Moreover, compared to the transmission-gate-based multiplexer employed in [21], the NAND-based multiplexer produces faster rise/fall times. This is because, in the transmission gate, the control logic transistors are placed between two floating nodes, so the charging/discharging of the MOS channel is slowed down.
D. Differential Quadrature 25% Duty Cycle Generator
The four clock signals delivered by the sign bit circuitry are applied to a 25% duty-cycle generator [see Fig. 11(a)]. As stated previously, the orthogonal summing of the I and Q paths is achieved by employing the differential quadrature clocks with a 25% duty cycle. As a result, the 25% duty-cycle generator is one of the most important building blocks of the clock generator chain.
The circuit utilized in [21] provides unmatched narrow/wide clock pulses. For example, the duty cycle for one pulse might be 31%, while it might be 27% for the others. In this work, however, the 25% duty-cycle generator of [11] is adopted. It is conceptually illustrated in Fig. 11(a). Based on this approach, the 25% duty-cycle clocks at the carrier frequency are generated by the AND operation between two sets of 50% duty-cycle clocks, which operate at twice the carrier frequency and at the carrier frequency, respectively. Thus, the 50% duty-cycle clocks at twice the carrier frequency are utilized as a reference pulsewidth for generating the four output clocks. Namely, the output pulsewidth is identical to that of the double-frequency clock, while the output clocks run at the carrier frequency. Hence, the circuit creates clocks with a precise 25% duty cycle. The AND operation of the 25% duty-cycle generator as well as the sign bit are accomplished utilizing the circuit in Fig. 11(b). This circuit is asymmetric with respect to the gates of its two input transistors: the gate capacitance of one is smaller than that of the other due to the series configuration (switchable cascode). The two input clocks are therefore assigned to these gates such that the AND gate consumes less power. Note that the desired 25% duty-cycle clocks could also be generated using the AND operation of every two adjacent quadrature clocks at the carrier frequency. The disadvantage would be the asymmetric AND inputs that create unmatched wide/narrow pulses. Thus, the circuit illustrated in Fig. 11(a) is the preferred approach.
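The AND-based generation of the 25% duty-cycle clocks can be illustrated with an idealized, delay-free logic model. In the sketch below, the pairing of each quadrature carrier-frequency clock with the double-frequency clock or its complement is an assumption made for illustration, as are all names and the number of samples per period; the actual pairing is defined by Fig. 11(a).

```python
N = 8  # samples per carrier period (assumed; must be a multiple of 8)


def square(period, shift, n_total):
    """50% duty-cycle clock with the given period and delay, both in samples."""
    return [1 if (i - shift) % period < period // 2 else 0
            for i in range(n_total)]


def and_gate(a, b):
    """Sample-wise logical AND of two clock waveforms."""
    return [x & y for x, y in zip(a, b)]


ck2 = square(N // 2, 0, N)   # clock at twice the carrier frequency
ck2b = [1 - v for v in ck2]  # its complement
# Quadrature clocks at the carrier frequency: 0, 90, 180, and 270 degrees.
ck1 = [square(N, ph * N // 4, N) for ph in range(4)]

# Assumed pairing: even phases use ck2, odd phases use its complement.
pulses = [and_gate(ck2 if ph % 2 == 0 else ck2b, ck1[ph]) for ph in range(4)]

for ph, p in enumerate(pulses):
    print(f"phase {90 * ph:3d} deg: {p}  duty = {sum(p) / N:.2f}")
```

In this model, every output pulse inherits its width from the high phase of the double-frequency clock, so the four outputs have identical 25% duty cycles and never overlap.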
E. Floorplanning of 2×13-bit DRAC
As mentioned previously, the targeted TX is an all-digital RF-DAC with 2×13-bit (including sign bit) resolution. The up-sampled I and Q signals represent binary digital codes, which must be converted to thermometer encoding in order to avoid nonmonotonic behavior and midcode transition glitches [28], [29]. The use of pure thermometer encoding, however, would increase the complexity of the encoders, the chip area, the interconnect parasitics, and the power consumption. Thus, a segmented approach is adopted here [30].
The segmentation is selected such that the 8 most significant bits (MSBs) and the 4 least significant bits (LSBs) of the binary input are encoded separately. Therefore, the DRAC implementation requires 256 MSB and 16 LSB units. The design of such a complex RF-DAC requires several iterations between the schematic and layout design phases. The 256 MSB units are further split into two sections, while the clock generator circuits are situated in the middle [see Fig. 12(a)]. Moreover, the 128 MSB units of each part are arranged in eight rows and 16 columns (8 × 16). Subsequently, the I/Q segmented thermometer code requires in-phase and quadrature-phase baseband row and column thermometer codes, which are generated by row and column encoders. The right MSB unit bank addresses the low thermometer code values (i.e., 0-127), while the remaining ones (i.e., 128-255) are managed by the left bank. Furthermore, the LSB unit comprises 16 small DRAC unit cells, which occupy only one row (1 × 16) at the bottom of the right MSB DRAC unit bank. The MSB DRAC units in each row must be situated in close proximity to each other. Moreover, dummy DRAC cells are placed at the beginning and end of each row, which globally improves the matching of the DRAC unit cells with respect to each other. In addition, odd rows begin from the left side while even rows begin from the right side. This "snake" traverse movement is indicated with arrowed lines in Fig. 12(a). By doing so, the MSB thermometer units are continuously traversed from an odd to an even row and vice versa. As a result, the differential nonlinearity (DNL) of the entire RF-DAC, as well as the glitch related to the dynamic switching of DRAC units, is kept below one LSB. Note that the clock trees (clock generating blocks) force the DRAC to split into two sections, which could possibly introduce considerable glitches.
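The segmentation described above can be summarized in a few lines of code. The sketch below assumes that the 12-bit magnitude is already separated from the sign bit and that the row and column indices are logical positions that ignore the snake direction; the function names are hypothetical.

```python
MSB_BITS = 8       # thermometer-coded segment
LSB_BITS = 4       # fine segment
COLS = 16          # columns per MSB bank
BANK_CELLS = 128   # MSB cells per bank (8 rows x 16 columns)


def segment(code):
    """Split a 12-bit magnitude into its MSB and LSB fields."""
    if not 0 <= code < 2 ** (MSB_BITS + LSB_BITS):
        raise ValueError("code must be a 12-bit magnitude")
    return code >> LSB_BITS, code & (2 ** LSB_BITS - 1)


def locate_msb(index):
    """Bank and logical (row, column) of the MSB cell with the given index."""
    bank = "right" if index < BANK_CELLS else "left"
    row, col = divmod(index % BANK_CELLS, COLS)
    return bank, row, col


def output_level(code):
    """Total weight of the activated cells: MSB cells weigh 16 LSB cells."""
    msb_on, lsb_on = segment(code)
    return msb_on * 2 ** LSB_BITS + lsb_on


print(segment(2047), locate_msb(127), locate_msb(128))
```

With this split, a code activates up to 255 MSB cells and up to 15 LSB cells, and the weighted sum of the activated cells reproduces the input code.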
To justify this arrangement, Fig. 12(b) and (c) compares two traverse scenarios from the right bank to the left one: a continuous traverse and an intentionally noncontinuous one. As indicated in Fig. 12(a) , the continuous traverse is the direct path between cells 127 and 128, which is the nearest possible path. The noncontinuous traverse, on the other hand, is the hypothetical path between cells 127 and 255. Fig. 12(c) illustrates that the noncontinuous movement generates a significant number of spurs and should thus be avoided. Therefore, as exhibited in Fig. 12(a) , the transition from the right bank to the left must be performed gradually. In conclusion, a prudent layout with a continuous traverse, together with the use of dummy cells, almost entirely eliminates the dynamic glitch problems.
F. Thermometer Encoders of 3-to-7 and 4-to-15
Based on the above segmented arrangement, two 3-to-7 and three 4-to-15 (including the LSB encoder) binary-to-thermometer encoders are employed (five in total) and placed at the left, right, and bottom sides of the DRAC [see Fig. 12(a) ]. The encoders are implemented based on a 2-to-3 binary-to-thermometer encoder depicted in Fig. 12(d) . In this approach, the LSB and MSB of the thermometer code are produced by OR and AND operations of the two input binary bits ( and ), respectively. Moreover, the middle bit of the thermometer code is equal to the input MSB . The 3-to-7 encoder, however, is implemented in two increments. First, the intermediate 3-bit thermometer codes of Fig. 12(d) 
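The 2-to-3 encoder logic stated above can be checked with a minimal Python sketch. The names `encode_2to3` and `encode_reference` are hypothetical; the reference encoder is purely behavioral and is not the two-increment 3-to-7 construction of the paper.

```python
def encode_2to3(b1, b0):
    """2-bit binary (b1 = MSB, b0 = LSB) to 3-bit thermometer code.

    Returns (t0, t1, t2) with t0 the thermometer LSB.
    """
    t0 = b1 | b0  # thermometer LSB: OR of the two input bits
    t1 = b1       # middle bit: equal to the input MSB
    t2 = b1 & b0  # thermometer MSB: AND of the two input bits
    return (t0, t1, t2)

def encode_reference(value, width):
    """Behavioral n-bit binary-to-thermometer encoder with 2**n - 1 outputs.

    Output k is set whenever the input value exceeds k.
    """
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit in the given width")
    return tuple(1 if value > k else 0 for k in range(2 ** width - 1))
```

For every 2-bit input, the gate-level function agrees with the behavioral reference, and the number of asserted outputs equals the input value, which is what makes the code monotonic.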
G. MSB DRAC Unit Cell
The DRAC design was fully described in Section III. In this section, the DRAC unit cell is explained in more detail. The MSB DRAC unit is illustrated in Fig. 13(a) . This unit consists of four equal and well-matched subsections (sub-DRAC), each comprising its own data and clock inputs. The quadrature input clocks are , , , and , and based on these signals, the sub-DRACs are referred to as , , , and , respectively. Moreover, as mentioned earlier, the related input data thermometer bits are Row , Col , Row , and Col along with two extra control bits of Row and Row in which they guarantee that all DRAC unit cells of the previous rows are activated. The sub-DRAC section comprises two parts; a pure digital (logic) and a digital-to-RF conversion part. The logic part consists of a decoding logic (AND-OR) and a time synchronizer flip-flop. Based on logic condition of its inputs, the AND-OR decoder [see Fig. 13(b) ] determines whether or not the sub-DRAC cell should be activated. The master/slave edge triggered flip-flop is employed for synchronizing all DRAC unit cells to its input clock, namely, , , , and , in order to reduce undesirable harmonic distortion related to early-late arrival of the input data of each DRAC unit cell. Additionally, this flip-flop also behaves as a ZOH interpolator. It comprises two cascaded multiplexer based latches, as indicated in Fig. 13(c) . In the sense mode of operation, the input clocks are low/high, and consequently, the input data passes through the "lower" pass-gate logic of and is subsequently buffered by the cascaded inverters of and . It signifies that the path between and is transparent. In the store mode, on the other hand, are high/low, and as a result, the "top" pass-gate logic of is transparent, and the "lower" one is opaque. Therefore, the two inverters of and are cross-coupled with each other and latch the digital input signal. 
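The AND-OR decoding can be sketched behaviorally in Python. This is an assumption-laden illustration rather than the circuit of Fig. 13(b): it assumes that a cell is activated when the next-row control bit is set (its whole row is already filled) or when both its own row and column thermometer bits are set, and it ignores the snake direction by treating the column index as the position along the traverse. All names are hypothetical.

```python
ROWS, COLS = 8, 16  # one MSB bank

def cell_active(row_bit, col_bit, next_row_bit):
    """Assumed AND-OR decoder: (row AND col) OR next_row."""
    return next_row_bit | (row_bit & col_bit)

def active_cells(n):
    """Activation map of one bank when n (0..128) units must be on."""
    if not 0 <= n <= ROWS * COLS:
        raise ValueError("n must be within 0..128")
    full_rows, rem = divmod(n, COLS)
    # Row thermometer code, with one extra bit acting as the next-row
    # control of the last row.
    row = [1 if r <= full_rows else 0 for r in range(ROWS + 1)]
    # Column thermometer code within the partially filled row.
    col = [1 if c < rem else 0 for c in range(COLS)]
    return [[cell_active(row[r], col[c], row[r + 1]) for c in range(COLS)]
            for r in range(ROWS)]

def count_active(n):
    return sum(sum(r) for r in active_cells(n))
```

Under these assumptions, exactly n cells are active for every n, and a cell that is on at n remains on at n + 1, so the unit array behaves as a monotonic thermometer-coded DAC.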
All transistors of both the AND-OR decoder logic and flip-flop circuit are implemented with the most minimal aspect ratio in 65-nm CMOS, i.e., m m to minimize area and power consumption. As depicted in Fig. 13(a) , the flip-flop output of the sub-DRAC cell is buffered and subsequently connected to the cascode transistor ( , , , or ) to tolerate the input gate capacitance, and consequently, to improve the rise/fall time performance. As stated previously in Section IV-D, the gate capacitance of the cascode transistor with an aspect ratio of m m is much lower than the input capacitance of with the same transistor sizing. Therefore, utilizing a moderated buffer size is sufficient enough to satisfy the required data transition conditions. The buffer sizing is indicated in Fig. 13(c) .
The digital-to-RF conversion part consists of a gated cascode switch ( , , , and ) that yields the up-converting 1-bit mixer operation. Furthermore, it is perceived as a sub digital power cell. The switchable cascode transistor ( , , , and ) alleviates the reliability issue related to the high voltage swing that appears on the output nodes Drain Drain . Moreover, the cascode configuration also increases the output impedance, which results in the improved isolation between the I and Q paths that facilitates the orthogonal combination. The unit cell of the digital quadrature mixer is formed by electrically combining the outputs of two individual quadrature mixers (the upside -and downside -of Fig. 13 ) that are driven by quadrature input clocks (which also act as four sub digital power cells). Consequently, the entire RF-DAC is now created by simply connecting together the corresponding drain nodes of 256 MSB with 16 LSB DRAC unit cells.
As stated, each DRAC unit cell consists of , , , and unit cells, and their layout arrangement affects the performance of the entire RF-DAC. Fig. 13(d) illustrates one possible solution in which each quadrature sub-DRAC pair, i.e., and , is juxtaposed in two different sub-rows, which indicates that the DRAC unit cell is expanded horizontally. In this arrangement, the high-frequency 25% duty cycle quadrature clock pairs of and are laid out alongside each other. This, subsequently, increases the parasitic coupling capacitance of these clock lines, and as a result, deteriorates the clock rise/fall times. Moreover, since the position of clock lines are different than , their line capacitances also vary. Thus, and clock pulses are narrower and wider, respectively. Post-layout circuit simulations of Fig. 13(d) reveal the rise/fall time, as well as narrow/wide pulse problems related to the horizontal layout. The better solution, however, is to expand the DRAC unit cell vertically and place , , , and sub-DRAC unit cells in four sub-rows, as illustrated in Fig. 13(e) . In this arrangement, the parasitic coupling capacitance between the clock lines are almost negligible. The clock lines are also situated in the same positions and are sandwiched between the same sub-DRAC cells. Hence, their related rise/fall time and pulsewidth are well matched. Post-layout simulations in Fig. 13(e) substantiate that the vertical expansion is the most appropriate selection. To compensate for the extra vertical area related to the vertical expansion of the DRAC unit cell, the entire DRAC, as stated previously, comprises eight rows and 16 columns. Thus, left/right MSB DRAC banks become "squarish," which is beneficial for improved area efficiency and shorter clock distribution, which leads to less power dissipation.
V. DIGITAL I/Q CALIBRATION AND DPD TECHNIQUES
The proposed digital I/Q RF-DAC based TX, just as a typical I/Q TX [31] , requires an I/Q calibration to balance the I path with respect to the Q path in order to mitigate issues associated with an LO leakage and I/Q image. Moreover, as stated above, the I/Q RF-DAC comprises the efficient DPA arrays, which produce more than 22 dBm of saturated RF power. Otherwise stated, as depicted in Fig. 5(b) , the of the turn-on switches changes nonlinearly with respect to the input code and thus creates the AM-AM nonlinearity. Specifically, the AM-AM nonlinearity is the result of the code-dependent conductance of the drain node [32] . Furthermore, as stated in Section III, turning on the switches, as well as varying the drain voltage changes the drain-bulk capacitance of the digital power switches [see Fig. 5(c) and (d) ]. These varying capacitances in combination with the code-dependent switch conductance cause a large impedance shift at their related drain nodes, which subsequently leads to the AM-PM nonlinearity. Fig. 14 illustrates the shifting of the load reflection coefficient -of the related DRAC's drain node while sweeping the turn-on switches. Note that both and contribute to the AM-AM and AM-PM nonlinearities. In addition, as elaborated above, due to the fact that the passive power-combining network affects the RF-DAC's orthogonality, the imperfect orthogonal summing of the I and Q quadrature paths, as a result of inaccurate components of the passive combining network, leads to spectral regrowth [31] . Consequently, the RF-DAC must be digitally predistorted to meet the spectral mask of the chosen communication standard. To address these issues, techniques to manage these nonidealities are presented here.
A. IQ Image and Leakage Suppression
To improve the LO leakage and I/Q image suppression, the I/Q RF-DAC should be calibrated. First, (4) is rewritten according to clock pulses of , , , and ,
where is the baseband frequency, and represents a 25% duty cycle rectangular pulse clocked at . Moreover, , , , and are amplitudes of , , , and , respectively. In an ideal condition, their amplitudes are identical and equal to 1. As a result, after some iterations and the elimination of the higher harmonics, (15) is rewritten as . Note that, as stated in Section IV-B, due to the phase synchronization between the RF and baseband paths, as well as the precise quadrature clock generation utilizing divide-by-4 circuitry, the phase imbalance between and is 0. This is one of the significant advantages of the proposed I/Q RF-DAC. In reality, however, because of mismatches between , , , and , after some iterations and simplifications, changes to the following equation: -(16) in which and are the carrier image and leakage, respectively. To cancel , a proper dc value (i.e., ) is added to the original complex-valued baseband signal. Moreover, exploiting a very simple algorithm, the amplitudes of and ( , , , and ) change such that decreases. As a result, the calibration algorithm improves the LO leakage and I/Q image. To prove that a simple I/Q calibration algorithm can be effective, a 2.234-MHz I/Q baseband signal is applied to the TX. Fig. 15 illustrates that the simple calibration algorithm can significantly suppress the LO leakage and I/Q image. In this scenario, GHz while the output power is 19.62 dBm. Based on this measurement, the I/Q image suppression exceeds 58 dBc after five iterations while the LO leakage converges to better than 80 dBc.
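The idea of the calibration loop can be illustrated with a simplified numerical sketch in Python. It does not reproduce the paper's algorithm or (16): the model below lumps the four clock-path amplitudes into one I gain and one Q gain, represents the LO leakage by a complex constant, and uses made-up impairment values. Each iteration measures the leakage (dc bin) and the image (negative-frequency bin) of a single-tone test signal, then updates a baseband dc offset and a Q-path amplitude trim.

```python
import cmath
import math

N, K = 64, 3  # samples per record and baseband tone cycles per record

def dft_bin(y, k):
    """Single normalized DFT bin of the complex sequence y."""
    return sum(v * cmath.exp(-2j * math.pi * k * n / len(y))
               for n, v in enumerate(y)) / len(y)

def transmit(trim_q, dc, g_i=1.00, g_q=0.95, leak=0.004 - 0.003j):
    """Hypothetical impaired TX: path gains g_i, g_q and a leakage term."""
    out = []
    for n in range(N):
        th = 2 * math.pi * K * n / N
        i = math.cos(th) + dc.real
        q = trim_q * math.sin(th) + dc.imag
        out.append(g_i * i + 1j * g_q * q + leak)
    return out

def dbc(x, ref):
    return 20 * math.log10(max(abs(x), 1e-20) / abs(ref))

def calibrate(iterations=5):
    trim_q, dc, history = 1.0, 0j, []
    for _ in range(iterations):
        y = transmit(trim_q, dc)
        want, image, lo = dft_bin(y, K), dft_bin(y, -K), dft_bin(y, 0)
        history.append((dbc(lo, want), dbc(image, want)))
        dc -= lo                               # cancel leakage with a dc offset
        trim_q *= 1 + 2 * (image / want).real  # first-order amplitude correction
    return trim_q, dc, history
```

In this idealized model the image and leakage levels drop rapidly from one iteration to the next; on hardware, noise and measurement accuracy set the floor, as reflected by the measured values quoted above.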
Furthermore, in the quest to improve the RF-DAC transfer function linearity, eight integrated circuit (IC) chips have been measured and two well-known DPD algorithms have been employed. 
B. DPD Based on AM-AM and AM-PM Profiles
In this approach, a two-tone sinusoidal signal is applied at the baseband input, and the AM-AM and AM-PM profiles of the I/Q RF-DAC are evaluated [33] . First, the LO leakage and I/Q image are calibrated, and the down-converted envelope and phase of the probed RF output are subsequently collected. After rearranging the measured envelope and phase signals based on the signed 12-bit baseband code range, i.e., from -4095 to 4095, the AM-AM and AM-PM characteristics are obtained and depicted in Fig. 16 . According to these characteristics, the inverse functions of the envelope, i.e., and phase, i.e., , are applied to the input baseband code. Based on Fig. 16 , applying the AM-AM predistorted profile makes the desired AM-AM transfer function a straight line, i.e., . Moreover, the desired AM-PM characteristic is a constant line, i.e., .
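The inverse-function step can be sketched in Python as follows. The compressive AM-AM curve and the quadratic AM-PM curve in `pa_model` are hypothetical stand-ins for the measured profiles of Fig. 16, only code magnitudes (0 to 4095) are modeled, and a lookup table is used here instead of a polynomial fit.

```python
import bisect
import cmath
import math

FULL_SCALE = 4095

def pa_model(code):
    """Hypothetical static profile: normalized amplitude and phase shift (rad)."""
    a = code / FULL_SCALE
    amp = math.tanh(1.5 * a) / math.tanh(1.5)  # saturating AM-AM
    phase = math.radians(10.0) * a * a         # AM-PM growing with code
    return amp, phase

# "Measured" profiles obtained by sweeping the code magnitude.
AMP, PHASE = zip(*(pa_model(c) for c in range(FULL_SCALE + 1)))

def predistort(target_amp, target_phase):
    """Invert AM-AM by table search, then pre-rotate by the AM-PM value."""
    code = min(bisect.bisect_left(AMP, target_amp), FULL_SCALE)
    return code, target_phase - PHASE[code]

def transmit(code, phase):
    """Pass a (code, phase) pair through the nonlinear model."""
    amp, dphi = pa_model(code)
    return cmath.rect(amp, phase + dphi)
```

For any target amplitude in [0, 1] and any target phase, `transmit(*predistort(r, p))` returns a phasor close to `cmath.rect(r, p)`, i.e., the cascade behaves as a straight AM-AM line with a constant AM-PM, up to the one-code quantization of the table.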
C. DPD Based on I/Q Code Mapping
The second predistortion approach is performed employing a constellation-mapping based DPD algorithm [34] - [36] . This paper, however, proposes a very simple modified constellation-mapping DPD that is based on 1-D mapping of -and -. As stated previously, the complex modulated baseband data, --, is applied to the DRAC. Thus, the modulated RF output of the RF-DAC is ideally expressed as in (17). Nonetheless, because the RF-DAC is a nonlinear TX, (17) is not valid, and the RF output of the RF-DAC becomes
where -and -are the corresponding nonlinear complex profiles of -and -in which they are normalized to their related input codes. These profiles are indicated in Fig. 17(a) . In practice, these nonlinear characteristics are acquired as follows. First, due to the orthogonal operation of the RF-DAC, and are individually swept from -4095 to 4095. The subsequent RF output is down-converted, and the related baseband complex signals, i.e., -and -, are obtained. Next, the inverse functions of -and -are evaluated and depicted in Fig. 17(b) . The in-phase and quadrature-phase DPD profiles are as follows:
Otherwise stated, the following relationships are established between -and , as well as -and :
Therefore, in this DPD process, -and -are individually mapped to and , respectively, -
Specifically, this DPD process can be inferred as 1-D mapping of two individual signals of -and -. In particular, since and are orthogonal, the DPD does not require a 2-D exhaustive search of the entire constellation diagram, which is required in [17] . Consequently, due to orthogonality, the subsequent and are obtained as follows:
Fig. 17(c) illustrates the open-loop 1-D mapping DPD. Note that the DPD profiles of and are obtained only at the beginning of the measurement operation and remain unchanged afterwards. Fig. 18(a) depicts the constellation mapping measurement setup structure. Using MATLAB, I and Q randomized symbols ( and ) are generated and supplied to the I/Q baseband modulator. This block creates quadrature amplitude modulation (QAM) signals of and . To confine the modulation bandwidth, and are then pulse-shaped by a root-raised cosine (RRC) interpolation filter and upsampled to as high as the rate, which is (see also Fig. 8) . Afterwards, -and -are mapped utilizing (23)- (26) and Fig. 17(b) . Next, the predistorted signals ( and ) are uploaded into two designated on-chip SRAMs. Thereafter, the up-converted RF signal is down-converted utilizing a vector signal analyzer (VSA), and the subsequent down-converted digital in-phase and quadrature-phase signals are fed back to MATLAB. Three important steps should be followed. First, the measurement time delay should be calibrated. The subsequent complex signal phase, i.e., , should then be rotated such that the eventual phase, i.e., , is the same as the original complex phase, i.e., -- 
D. Verification of DPD I/Q Code Mapping
Examining this approach, a 256-symbol modulation is created. Based on the Fig. 18(b) concept, the constellation diagram is continuously swept from the top-left to top-right in a "snake"-like manner and traversed back again to its original point in order to preserve continuity. Note that, for simplicity, Fig. 18(b) only illustrates a 16-symbol constellation diagram, as well as their time-domain representations. Next, -and -, whose I/Q trajectories are exhibited in Fig. 18(c) , are predistorted ( and ) using the lookup table of Fig. 17 (b) and loaded into two on-chip SRAMs. Fig. 18(d) shows the effect of the I/Q DPD mapping on the I/Q trajectories of the original modulated signals. The RF output signal is down-converted, and its corresponding I/Q trajectories are depicted in Fig. 18(e) , which demonstrates a good agreement with the original I/Q trajectories of Fig. 18(c) .
and are then down-sampled and decimated to create the measured constellation diagram [see Fig. 18(f) ]. Its related EVM is 32 dB.
Note that, due to the limited data length of (i.e., 8192), which are repeatedly fed to the DRAC circuit from the first data point to the last, any discontinuity between the first data point and the last one creates an undesirable spectral jump. To alleviate this issue and to preserve the continuity, the data length of and are doubled and applied to the RRC interpolation filter, thereby only half of the data length of the subsequent -and -are exploited and applied to the DPD lookup table. This technique is referred to as a wraparound process. As a result, the starting points of the I/Q trajectories of Fig. 18(c) -(e), indicated with circles, have been shaped in such a way as to ensure the continuity of the I/Q signals.
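The effect of the wraparound process can be demonstrated with a small Python sketch. It assumes that the doubling is a plain repetition of the record and that the second half of the filtered output is the one retained, neither of which is spelled out above; the short low-pass tap set is a placeholder and not an RRC design, and upsampling is omitted for brevity.

```python
def fir(x, h):
    """Causal linear FIR filter; samples before the record start are zero."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def wraparound_filter(x, h):
    """Filter the record repeated twice and keep only the second half."""
    if len(h) > len(x) + 1:
        raise ValueError("filter is too long for this record length")
    y = fir(x + x, h)
    return y[len(x):]

def circular_fir(x, h):
    """Reference: circular convolution, i.e., the record treated as periodic."""
    n_rec = len(x)
    return [sum(h[k] * x[(n - k) % n_rec] for k in range(len(h)))
            for n in range(n_rec)]

x = [1.0, -1.0, 3.0, 2.0, 0.0, -2.0, 1.0, 1.0]  # placeholder data record
h = [0.25, 0.5, 0.25]                            # placeholder low-pass taps
looped = wraparound_filter(x, h)
```

The retained half equals the circular convolution of the record with the filter, so replaying it in a loop from the SRAM produces no filter start-up transient at the wrap point, unlike filtering the single record directly.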
VI. MEASUREMENT RESULTS
The proposed 2 × 13-bit all-digital I/Q RF-DAC is implemented in a TSMC 65-nm LP CMOS process technology. Fig. 19(a) exhibits the chip micrograph. The chip occupies 1.27 × 2 mm² with an active area of 0.45 × 1 mm². Moreover, the designated SRAMs occupy an area of 1.27 × 1 mm², while the remainder is occupied by decoupling capacitors and I/O pads. The RF-DAC employs only standard "Vt" transistors. All pads, including the single-ended RF input clock and RF output, are wire-bonded directly to the FR4 board.
The RF-DAC ground plane is improved utilizing the following approach. First, all ground pads are wire-bonded using flat bond wire, which decreases the equivalent inductance of the bond wire by approximately four times. Second, the chip is situated in a 300-μm-deep hole. This makes the bond wires shorter, and as a result, the interconnecting inductance is smaller. For the measurements, as depicted in Fig. 19(b) , the chip requires five different supply voltages, namely, -V for the balun center-tap node, -V for the RF-DAC core, V for the input transformer center-tap node, -V for the SRAMs and UART interface, and finally, -V for I/O supply voltages. They are generated employing on-board regulators, ADP225ACPZ-R7 from Analog Devices, which use a common input supply voltage of 4.5 V. This configuration allows the entire I/Q RF-DAC chip to be tested with only a single battery or supply voltage. Moreover, due to employing the on-chip input transformer, the input 4 RF clock is a single-ended signal. In addition, as stated previously, all required clock signals, including the baseband upsampling clock and the up-converting RF carriers, are generated via the on-chip frequency dividers. Thus, the I/Q RF-DAC only requires one external clock generator, which results in a very simple board design and test setup.
To verify the design through measurements, as was fully explained in Section II-B, first, the and baseband signals are upsampled and interpolated in software (PC-MATLAB). These upsampled signals, -and -, are subsequently loaded via UART into two SRAMs. Earlier simulations demonstrate that the achievable maximum drain efficiency of the I/Q RF-DAC output stage should be well above 44%. Due to the low power arrangement of the foregoing clocking and pre-driver circuitry, the overall system efficiency of the realized monolithic TX should be able to achieve 37% at 2.4 GHz for a peak output power level of 22.6 dBm at 1.2 V. Experimental verification demonstrates that, without using any correction for the printed circuit board (PCB) and SMA connector losses, the peak overall system efficiency occurs at 2.1 GHz and achieves 31.5% with a related peak output power of 22.3 dBm at 1.2 V. Although the TX was verified to work properly from 60 MHz to 3.5 GHz, the best performance is achieved in the frequency range of 1.36-2.51 GHz, where measurements illustrate the output power and overall system efficiency of more than 21 dBm and 21%, respectively (see Fig. 20 ). For this measurement, the carrier frequency is swept from 1.35 to 2.63 GHz in steps of 2 MHz. The supply voltage is also swept from 0.6 to 1.3 V. Fig. 20(a) and (b) only indicate the measurement results for 1.2-1.3 V. Based on these results, the peak output power is 22.8 dBm, while its related drain efficiency and system efficiency are 42% and 34%, respectively. These results emphasize the wideband operation of the realized on-chip output balun. Since the resolution of the RF-DAC is 2 × 13 bits, the input baseband codes are swept from -4095 to 4095, and the output power with its related voltage and phase are measured. The measurement results are demonstrated in Fig. 20(c) and (d) . Based on Fig. 20(c) , the static carrier leakage level is more than 70 dB lower than the achievable maximum power. Fig. 20(d) exhibits the RF-DAC efficiency versus RF output power. The drain and system efficiencies at the 6-dB back-off are 19% and 14%, respectively.
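As a rough cross-check of the reported peak numbers, the following Python sketch converts the peak output power to milliwatts and derives the implied dc power levels. It assumes the usual definitions, i.e., drain efficiency as RF output power over the dc power of the output stage and system efficiency as RF output power over the total dc power; the helper names are hypothetical and the derived dc powers are implied values, not measured ones.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

def dc_power_mw(p_rf_dbm, efficiency):
    """DC power implied by an RF power and an efficiency (0..1)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be within (0, 1]")
    return dbm_to_mw(p_rf_dbm) / efficiency

p_rf = dbm_to_mw(22.8)              # peak RF output power
p_drain = dc_power_mw(22.8, 0.42)   # implied output-stage dc power
p_total = dc_power_mw(22.8, 0.34)   # implied total dc power
p_overhead = p_total - p_drain      # implied remainder (clocking, drivers, ...)
```

Under these assumptions, 22.8 dBm corresponds to roughly 190 mW of RF power, about 454 mW drawn by the output stage, and about 560 mW in total, leaving on the order of 100 mW for the remaining circuitry.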
The static AM-AM nonlinearity of the digital I/Q TX is illustrated in Fig. 21(a) . As expected, at lower absolute codes (center of the curve), the output voltage changes linearly with respect to the input code. In contrast, at higher codes, the curve begins to saturate. Moreover, Fig. 21(b) and (c) indicates the static AM-PM nonlinearity profiles. Based on the measurement results of Fig. 21(b) , the maximum phase deviation of individual I and Q codes from lower to higher codes is less than 10°. Fig. 21(c) indicates that, by changing only the -or -, not only the output amplitude changes, but also the output phase, thus revealing the AM-PM distortion of the RF-DAC. By applying the lookup table of Fig. 17(b) , the static I/Q constellation for a 256-symbol case is measured and depicted in Fig. 21(d) . Its related EVM is better than 30 dB while the maximum RF power is higher than 22 dBm. Note that the measurement results of Fig. 21(b)-(d) are obtained as follows [21] . The time-domain RF output signals are captured and saved. The FFT of these signals is subsequently calculated, and the amplitudes and phases are plotted to obtain the static constellation diagram of Fig. 21(d) .
The static phase noise of RF-DAC is measured for various carrier frequencies between 1.5-2.5 GHz, and the noise floor is ascertained to lie below 160 dBc/Hz. Fig. 22(a) exhibits the RF-DAC phase noise at 2.4 GHz. The maximum baseband code for and is 4095 which produces 21.54 dBm of RF power. It should be noted that, at 200-MHz frequency offset, the phase noise is 160 dBc/Hz. The figure also indicates two "spurs" at 300 and 600 MHz, which are actually the spectral replicas discussed previously. In this aspect, the ZOH filter operation ensures that these replica levels are below 70 dBc/Hz. Moreover, the RF-DAC phase noise performance is reexamined for lower codes (e.g., 32). Based on Fig. 22(b) , its related RF power and noise floor reduce to 14 dBm and 165 dBm/Hz, respectively.
Dynamic measurements have also been extensively performed. First, LO leakage and I/Q image suppression are examined. For this experiment, the LO frequency is set to 2.1 GHz, and the baseband frequency of -and -signals are approximately 2.05 MHz. Fig. 23(a) demonstrates that, even without applying any I/Q calibration, the LO leakage and image levels are 62 and 51 dBc, respectively, at an output power of 20.03 dBm. As such, these numbers are sufficient to meet the specifications of most communication standards. The low image level indicates the superior matching of I and Q paths. Moreover, the use of a divide-by-4 circuit instead of a divide-by-2 also proves to be beneficial in improving the quadrature operation. Applying the I/Q calibration technique of Section V-A, the image signal is further reduced by 14 dB [see Fig. 23(b) ].
The RF-DAC linearity significantly improves by applying either of the two DPD approaches discussed previously in Sections V-B and V-C. First, starting with the AM-AM/AM-PM profiles of Section V-B and applying only a fourth-order memoryless polynomial approximation, the linearity of the RF-DAC improves more than 25 dBc. Fig. 23(c) and (d) demonstrates the two-tone test measurement results before and after applying the DPD discussed in Section V-B. The tone spacing is set to 2.2 MHz, and the total RF power is measured above 16 dBm. The leakage level is below 55 dBm ( 68 dBc) and the third-order intermodulation product is improved to better than 50.4 dBc. Since only the fourth-order polynomial is used, the nonlinearities of higher intermodulation products do not reduce as much as . Although the DPD improves the linearity of the lower order odd intermodulation products (i.e., 3rd-7th), it deteriorates the odd higher order products, thus causing a bit of spectral regrowth. Comparing Fig. 23(c) and (d) , 9th-15th intermodulation products worsen.
Furthermore, employing the constellation-mapping DPD approach of Section V-C, a variety of I/Q signals have been tested. Fig. 24(a) exhibits the measured spectrum in combination with its related constellation diagram of a single-carrier "7-MHz 4-QAM" signal with and without the DPD. Utilizing the DPD improves the RF-DAC linearity by more than 19 dB. The adjacent channel power ratio (ACPR) is better than 47 dBc, while the alternate channel power ratio is better than 49 dBc. The measured EVM is 38 dB while its mean RF power and related drain efficiency are 18 dBm and 24.9%, respectively. Additionally, a single-carrier "22 MHz 64-QAM" signal is measured, and the corresponding spectrum and constellation diagram are depicted in Fig. 24(b) . Its corresponding ACPR is better than 43 dBc, while its related EVM is 28 dB.
Moreover, the chip is tested using a multi-carrier "20-MHz, 256-QAM, orthogonal frequency division multiplexing (OFDM)" signal. The close-in and far-out spectrum measurements are depicted in Fig. 25(a) and (b) . The close-in linearity exceeds 50 dB, therefore, it can pass the close-in spectral mask by a large margin. Nonetheless, due to the ZOH operation, its far-out spectrum contains replicas, which are discernible in Fig. 25(b) . According to the measured amplitude probability distribution depicted in Fig. 25(c) , the average power is 10.25 dBm, while the related peak-to-average-ratio (PAR) is as high as 8.6 dB.
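The relation between the average power, the PAR, and the peak envelope power quoted above can be made explicit with a short Python sketch. The function names are hypothetical, and the PAR routine operates on arbitrary complex envelope samples rather than on the measured data of Fig. 25(c).

```python
import math

def par_db(samples):
    """Peak-to-average power ratio (dB) of complex envelope samples."""
    powers = [abs(s) ** 2 for s in samples]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))

def peak_dbm(avg_dbm, par):
    """Peak envelope power (dBm) from the average power and the PAR (dB)."""
    return avg_dbm + par

ofdm_peak_dbm = peak_dbm(10.25, 8.6)  # values quoted for the 256-QAM OFDM test
```

With the quoted values, the peak envelope power of the OFDM signal is about 18.85 dBm, which lies a few decibels below the 22.8-dBm peak output power reported for the static measurements.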
The chip performance is examined for other single-carrier QAM signals with various modulation constellations and bandwidths. Fig. 26(a)-(c) exhibits the spectra of single-carrier "44-MHz 256-QAM," "88-MHz 256-QAM," and "154-MHz 1024-QAM," respectively. Since the operational bandwidth of our available VSA is limited to 20 MHz, it was not feasible to measure the EVM related to Fig. 26(a)-(c) . However, it is evident that the simple DPD lookup table of Fig. 17(b) still works up to 40 MHz. The RF-DAC shows memory effects, but only for high frequency offsets, and as a result, the DPD lookup table should be amended.
Additionally, as discussed in Section II-B, signals with wider bandwidths exhibit higher out-of-band spectra [see Fig. 17(d) ]. The explanation for such an artifact lies in the limited SRAM memory (8-kword in our implementation): with the fixed upsampling clock rate of , the "effective" over-sampling rate of wider band signals is lower than for narrower band signals; therefore, the noise floor will go up. Fig. 26(d) also reveals the spectral replicas of the ZOH operation. Section II-B suggests that increasing the upsampling clock rate, e.g., or even higher, would be a straightforward solution for decreasing the noise floor and spectral replicas.
Table II summarizes the implementation and performance of the proposed I/Q RF-DAC. Table III compares our work against the relevant publications [8]- [11] , [16] , [17] . The proposed RF-DAC and the Mediatek work [17] are evidently the most prominent in achieving superior performance. However, [17] operates on the duty cycle of LO clocks, 40-nm CMOS technology, supply voltage of 1.8 V, upsampling clock rate of 804 MHz, and most importantly, requires a very sophisticated DPD algorithm. On the contrary, this work employs a very simple DPD lookup table facilitated by the novel technique to orthogonally combine the I and Q vectors using the LO clocks and the adapted power-combining network. The drain efficiency of our work is higher than in [17] , and if our RF-DAC were to be designed in a finer technology node, the drain efficiency would be even higher. Note that the Intel system-on-chip (SoC) work [16] achieved higher RF power, but with lower drain efficiency due to incorporating a conventional DAC, low-pass filter, passive quadrature mixer, and class-AB PA. In contrast, the proposed 2 × 13-bit RF-DAC provides reasonably high RF output power with higher efficiency using a simpler architecture. In addition, Table III also presents the best performance numbers of recently published polar [5] and outphasing TXs [18] . As evidenced, the I/Q TXs can manage very wideband signals along with more effective EVM.
Table III footnotes: Bit resolution with its corresponding architecture of the TX. EVM is reported at the maximum reported measurable bandwidth, which is either 5 or 20 MHz. The average power is reported; perhaps the peak is 9 dBm with 7% drain efficiency (off-chip balun). They only reported their system efficiency; note that their power-combining network is off-chip.
VII. CONCLUSION
In this paper, based on the concept of an RF-DAC, we proposed a high-power high-resolution wideband all-digital I/Q TX. It employs 25% duty-cycle differential quadrature clocks to directly up-convert interpolated I and Q baseband signals and orthogonally combine them to their RF continuous-time representation. It is constructed through digital I/Q cascoded switch array unit cells connected to an on-chip low-loss transformer-based power-combining network. The TX is realized in 65-nm CMOS and produces 22.8-dBm peak output power, with 34% total system efficiency within the 1.36-2.51-GHz frequency range. EVM for 64- and 256-symbol constellations is better than 32 dB. The entire system design considerations, as well as the circuit-level techniques, are thoroughly discussed. The TX can manage up to 154-MHz baseband signals. The constellation-mapping DPD is applied to the RF-DAC, and it improves linearity by more than 19 dB. These numbers indicate that this innovative concept is a viable option for the next generation of multi-band/multi-standard TXs. The realized demonstrator can perform as an energy-efficient RF-DAC in a standalone digital TX directly [e.g., for wireless local area network (WLAN)] or as a pre-driver for high-power basestation PAs.
