I. INTRODUCTION
Dynamic element matching (DEM) continues to be a prime choice to improve the static nonlinearity of high-resolution DACs [1], [2]. By scrambling the order of the DAC elements on a sample-by-sample basis, DEM converts the distortion caused by static amplitude, timing, and pulse-shape mismatches into white or spectrally shaped noise. Because DEM is a purely digital algorithm, its implementation becomes increasingly attractive in the latest CMOS technology nodes, where high-speed low-cost digital logic can be successfully exploited to compensate for the larger analog device mismatches.
One important application of high-performance DEM DACs is in radio transmitters for wireless communications [3]-[5], where the aforementioned digitalization trend has strongly emerged during the past two decades. Modern radio standards utilize complex modulation schemes, wide channel bandwidths, and stringent out-of-band (OOB) emission limits, thus imposing formidable speed and linearity requirements on the D/A conversion. An additional design aspect in radio transmitters is the wide range of static signal power control required at the antenna, which can be as much as 75 dB for 3G user equipment. It is common to distribute such large gain control through the entire transmit signal chain, with 30-50 dB typically implemented in the digital domain [6]-[8]. Hence, in addition to high speed and linearity, the D/A conversion should also be optimized for signals with digital back-off, i.e., with an amplitude level smaller than the full-scale input range of the DAC.
Although the general idea of controlling a shuffler based on the signal amplitude was recently reported in one patent [9], to the authors' best knowledge no details have been published on how to optimize an existing DEM algorithm for the scenario of digital back-off. In all previous circuit implementations, the DEM encoder always operates by assuming a full-scale input signal, thus scrambling all DAC elements although only a fraction of them would be required to perform the conversion. As explained in this letter, such nonoptimal operation leads to wasted power, reduced effectiveness of the DEM algorithm, and higher mismatch noise at the DAC output. To solve these problems, we introduce a new power-scalable DEM technique, based on the segmented tree-structure encoder [10], where parts of the digital logic and DAC elements can be efficiently bypassed and disabled in back-off mode. The concept is experimentally validated for a prototype 9-bit I/Q RF-DAC, with block diagram shown in Fig. 1. Measurement results demonstrate that DEM operates effectively with up to -18 dB of static digital back-off, while achieving a reduction in system power consumption of up to 72% compared to the full-scale scenario. Even though this letter discusses power-scalable DEM in the context of digital-intensive RF transmitters, the proposed method can be straightforwardly adopted in any DEM DAC application that utilizes digital gain control.
This letter is organized as follows. Section II introduces the concept of power-scalable DEM and its hardware-efficient realization. Section III describes the circuit-level implementation of the DEM RF-DAC. Measurement results are presented in Section IV, and Section V concludes this letter.
2573-9603 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
II. POWER-SCALABLE DEM
Fig. 2(a) shows the block diagram of a 6-bit tree-structure DEM encoder with signed data input [5], segmented into four unary- [...] (1) for segmenting and nonsegmenting SBs, and [...] (2)
for first-layer SBs.
In a scenario of digital back-off, the input signal amplitude spans only a subset of the encoder's full range {−32, . . . , +32}, leading to the following issues. First, as the signal can still propagate through any SB, the entire tree encoder must stay enabled and therefore wastes dynamic power. Moreover, in a differential DAC, a small-amplitude signal is formed by setting nearly half of the always-active DAC elements to +1 and the rest to −1, meaning that the DAC power consumption does not scale with digital back-off either. Second, the low signal amplitude forces the s[n] sequences constrained by (1) to be zero most of the time. If the sequences are spectrally shaped, such a constraint decreases the effective quantizer gain in the loops used for sequence generation. This in turn leads to degradation of the closed-loop transfer function or even instability, thus compromising the mismatch-shaping action of DEM. Third, because all DAC elements are active, their mismatch-noise contribution to the analog output is constant regardless of the decreased signal power, leading to a nonoptimal signal-to-noise ratio.
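The first issue can be made concrete with a small counting exercise. The sketch below is only an illustration under simplifying assumptions: it models a differential unary DAC as a bank of always-active elements that each contribute +1 or −1, and the helper name `element_split` is hypothetical, not taken from the letter.

```python
def element_split(value: int, num_elements: int) -> tuple:
    """Return (count at +1, count at -1) for a differential unary DAC in
    which every element is active and contributes either +1 or -1."""
    if abs(value) > num_elements or (value - num_elements) % 2:
        raise ValueError("value is not representable with these elements")
    plus = (num_elements + value) // 2
    return plus, num_elements - plus


# Illustrative values only: 16 unit elements, shrinking signal amplitude.
for value in (16, 4, 2, 0):
    plus, minus = element_split(value, 16)
    print(f"value {value:+3d}: {plus:2d} at +1, {minus:2d} at -1, "
          f"{plus + minus} elements drawing current")
```

Whatever the signal level, all 16 elements stay active in this model, which is the behavior that power-scalable DEM is meant to avoid.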
All of the above challenges can be solved simultaneously as follows. Consider, for example, a digital back-off of -6 dB. Because this corresponds to 1/2 of the full-scale amplitude, the 6-bit input signal can be converted with a 5-bit DEM DAC without causing overflows. Thanks to the modularity of the tree encoder, a 5-bit DEM DAC with 3+2 segmentation can be readily reconfigured from the original 6-bit system by rerouting the connections as in Fig. 3(a). The resulting 5-bit tree encoder still operates as explained in [10], thus scrambling the active DAC elements to ensure conversion of static mismatches into pseudorandom noise. Furthermore, the excluded SBs can be disabled by means of conventional digital power-saving techniques (e.g., clock gating), while the unused DAC elements can also be turned off if their internal circuitry allows it. The described method can be trivially extended to a digital back-off of -12 dB [Fig. 3(b)] or any multiple of -6 dB.
In practice, a direct implementation of the approach of Fig. 3 would require inserting multiplexers along the signal path. This is undesirable because such multiplexers would prevent the synthesis software from inferring a single optimized datapath cell for multiple SB adders, yielding a lower quality of results (QoR). Fortunately, the extra multiplexers can be avoided altogether by utilizing the trick displayed in Fig. 4. By multiplying the input of a nonsegmenting SB by 2^N, the signal is passed untouched across the following N SBs, since all the s[n] sequences are forced to zero by (1) and the factor 2^N is compensated by the 1/2 gain blocks. By varying N, the signal is effectively rerouted to a different SB in the tree according to the programmed level of digital back-off, as in Fig. 3. Note that the factor 2^N can be realized by left-shifting the signal by N positions, so no high-speed multiplier is needed on the signal path.
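The relation between back-off level, shift amount N, and remaining DEM resolution can be sketched numerically. The code below is an illustration, not the authors' encoder: the function `scaling_mode`, the cap of N at 3, and the count of active unary elements (16 halved per mode, extrapolated from the 3+2 segmentation quoted for -6 dB) are assumptions made for this example.

```python
import math

FULL_SCALE = 32   # peak magnitude of the signed 6-bit encoder input
MAX_MODE = 3      # assumed: four modes, for 0, -6, -12, and -18 dB


def scaling_mode(peak: int) -> int:
    """Largest N in 0..MAX_MODE such that peak <= FULL_SCALE / 2**N."""
    if not 0 <= peak <= FULL_SCALE:
        raise ValueError("peak lies outside the encoder input range")
    n = 0
    while n < MAX_MODE and peak <= (FULL_SCALE >> (n + 1)):
        n += 1
    return n


for peak in (32, 16, 8, 4):
    n = scaling_mode(peak)
    backoff_db = 20 * math.log10(peak / FULL_SCALE)
    print(f"peak {peak:2d} ({backoff_db:6.2f} dB): N = {n}, "
          f"{6 - n}-bit DEM, {16 >> n} unary elements active")

# Multiplying by 2**N is a plain left shift, also for negative samples.
sample = -5
assert sample << 2 == sample * 2**2
```

Each halving of the peak amplitude (about 6.02 dB) frees one bit of resolution, which is why the modes fall on multiples of -6 dB.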
III. CIRCUIT IMPLEMENTATION
Without loss of generality, this letter demonstrates the effectiveness of power-scalable DEM in the context of a 9-bit I/Q RF-DAC, which is part of an all-digital transmitter (Fig. 1). In the implemented system, the ΣΔ modulator and DEM encoder realize a bandstop transfer function with center frequency programmable over the entire Nyquist range. The purpose is to create a notch in the OOB spectral density at a specific frequency offset from the transmit band, in order to ease coexistence with nearby receivers [3], [5].
Fig. 5 depicts the top-level block diagram of a single quadrature branch. The 9-bit RF-DAC is segmented into four unary-weighted MSBs (i.e., 16 elements with weight 32) and five binary-weighted LSBs. To reduce the complexity of the digital circuitry, only the six MSBs are scrambled by the power-scalable DEM encoder, whereas the three LSBs directly drive the corresponding RF-DAC elements, since their mismatch-noise contribution has been verified through simulations to be negligible. Pipeline registers are inserted along the signal path, enabling a sample rate equal to 1/4 of the carrier frequency. The encoder supports four power-scaling modes for digital back-off levels smaller than 0, -6, -12, and -18 dB, with extensive clock-gating logic used to reduce the power consumption of the disabled SBs. Digital power control is performed by adjusting the gain signal P_CTRL (also visible in Fig. 1) with the wanted resolution, followed by selecting the correct DEM power-scaling mode as shown in Fig. 5. Even though the range of P_CTRL can be as large as 50 dB [8], no additional modes for back-off levels lower than -18 dB are implemented, since the practical benefits of power-scalable DEM would be negligible at such low signal amplitudes.
Because power control in a transmitter system is adjusted between radio frames, when no signal is being transmitted, any transient caused by switching the DEM encoder to a different power-scaling mode (e.g., supply noise) is not critical.
Fig. 6 discloses the implementation details of the sequence generator internal to each SB. It resembles a ΣΔ modulator without signal input, where the special quantizer is designed to satisfy (1) or (2). By modeling the quantizer as additive random error, it can be shown that the circuit generates a pseudorandom sequence shaped by
$$\mathrm{NTF}(z) = \frac{1 - \alpha z^{-1} + z^{-2}}{1 - r\alpha z^{-1} + r^{2} z^{-2}} \tag{3}$$
which is a bandstop transfer function with programmable coefficients α and r. The value of α = 2 cos(2π f_0/F_s) determines the notch offset f_0 for a given sample rate F_s, while r mainly affects the stability of the loop through the maximum magnitude of NTF(z) in the passband [3]. All measurements in this letter use r = 0.75, which is a good compromise between stability, spectral performance, and ease of implementation. Note that (3) is also the noise transfer function implemented by the ΣΔ modulator preceding the DEM encoder, which decreases the signal wordlength from 14 to 9 bits. Thanks to the constraint s[n] ∈ {−1, 0, +1}, the −(1 − r)α and 1 − r^2 programmable taps in Fig. 6 are replaced by simple three-way multiplexers with precomputed inputs. The remaining arithmetic is arranged into a single multiply-accumulate operation, thus providing the best QoR in terms of propagation delay and hardware complexity.
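The role of α and r can be checked numerically. The sketch below assumes the usual second-order bandstop form, with zeros on the unit circle at the notch frequency and poles at radius r on the same angle. That form is an assumption made here, chosen because it is consistent with the −(1 − r)α and 1 − r^2 taps mentioned above, which are the z^-1 and z^-2 coefficients of numerator minus denominator. The function names are hypothetical, and the 850-MHz sample rate and 100-MHz offset are borrowed from the measurement section.

```python
import cmath
import math


def alpha_for_notch(f0: float, fs: float) -> float:
    """alpha = 2 cos(2 pi f0 / Fs), which places the notch at f0."""
    return 2.0 * math.cos(2.0 * math.pi * f0 / fs)


def ntf(f: float, f0: float, fs: float, r: float = 0.75) -> complex:
    """Assumed second-order bandstop NTF, evaluated at frequency f."""
    alpha = alpha_for_notch(f0, fs)
    zinv = cmath.exp(-2j * math.pi * f / fs)      # z^-1 on the unit circle
    num = 1 - alpha * zinv + zinv**2              # zeros on the unit circle
    den = 1 - r * alpha * zinv + (r * zinv)**2    # poles at radius r
    return num / den


def feedback_taps(f0: float, fs: float, r: float = 0.75):
    """Coefficients of z^-1 and z^-2 in (numerator - denominator)."""
    alpha = alpha_for_notch(f0, fs)
    return -(1 - r) * alpha, 1 - r**2


FS = 850e6   # sample rate reported in the measurements
F0 = 100e6   # one of the measured notch offsets

notch = abs(ntf(F0, F0, FS))
peak = max(abs(ntf(k * FS / 2000, F0, FS)) for k in range(1001))
print(f"|NTF| at the notch: {notch:.2e}")
print(f"max |NTF| over 0..Fs/2: {peak:.3f}")
print("taps:", feedback_taps(F0, FS))
```

Sweeping r toward 1 in this model narrows the notch and lowers the peak out-of-notch gain, while smaller values widen it, which is one way to explore the stability trade-off described in the text.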
The 9-bit RF-DAC implements a high-speed current-steering architecture. As shown in Fig. 7(a), each unity-weighted conversion element consists of a digital mixing circuit, two AND gates for 25% duty-cycle generation, and two symmetrical current-steering branches, each built of a current source in series with a switch [5]. An element with weight W_i is formed by the parallel connection of W_i units. The operating principle is illustrated by the waveforms in Fig. 7(b), where the 4× relation between sample period T_s and carrier period T_c can be observed. Depending on the selected power-scaling mode, the ith DEM encoder output b_i[n] either switches between +1 and −1, or is constant 0 (Fig. 3). When b_i[n] = ±1, the element is enabled and performs digital-to-RF conversion normally. When b_i[n] = 0, both current-steering branches are disabled, thus significantly decreasing the RF-DAC current consumption in digital back-off.
IV. MEASUREMENT RESULTS
The I/Q RF-DAC is integrated with an on-chip balun as part of a prototype cellular system-on-chip, targeted for the 3.4-4.9-GHz frequency band. The chip is fabricated in a 16-nm FinFET process, with micrograph shown in Fig. 8. The circuit utilizes three separate supply domains: 0.8 V for the synthesized digital part, 1.0 V for the digital mixers, and 1.2 V for the current-steering RF-DACs. The overall linearity of the transmitter is verified to comply with the specifications of the supported cellular standards. All measurements reported in this letter are performed with an LTE20 carrier at various levels of digital back-off from the full-scale output power of +3 dBm.
Fig. 9 shows the OOB spectra of the I/Q RF-DAC in three configuration modes: 1) linear quantization (both ΣΔ modulation and DEM bypassed); 2) ΣΔ modulation without DEM; and 3) ΣΔ modulation with DEM. The latter two cases are measured with the stopband tuned to 45- and 100-MHz offset from the 3.4-GHz transmit band. The results validate the basic operation of the system [3], highlighting a noise-floor improvement of nearly 10 dB enabled by ΣΔ modulation and DEM compared to linear quantization. The notch depth is limited by phase noise originating from the long on-chip clock paths, which could be improved by utilizing stronger buffering.
Fig. 9. OOB spectra in different configuration modes, when transmitting a full-scale (+3 dBm) LTE20 carrier at 3.4 GHz.
Fig. 10. OOB spectra for various levels of back-off from the full-scale output power of +3 dBm, when transmitting an LTE20 carrier at 3.4 GHz.
Operation in digital back-off is demonstrated by the OOB spectra of Fig. 10, with the stopband offset tuned to 100 MHz. Because of the smaller number of active RF-DAC elements, the averaged noise floor over 18-MHz bandwidth decreases by up to 4.5 dB from the full-scale value. This is in contrast to the case of linear quantization, where the noise floor is constant regardless of the back-off level. The corresponding power consumptions of the I + Q signal paths are reported in Fig. 11, where the relative reductions from total full-scale power are also highlighted. The ΣΔ+DEM blocks consume 19.7 mW with no back-off, when clocked at 0.85 GS/s. This marks a 63% improvement compared to [5] (53 mW @ 0.9 GS/s), which has been achieved through several system and arithmetic optimizations as well as porting the design to a 16-nm technology. Thanks to the power-scalable DEM encoder, the digital consumption decreases to 9.1 mW for -18 dB of digital back-off (-54%), while the RF-DAC consumption also drops from 267 to 71 mW (-73%) without any bias tuning or gain control in the analog domain.
Table I compares the measured performance with the state-of-the-art RF-DACs. This letter demonstrates programmable bandpass ΣΔ modulation and DEM at the highest carrier frequency and output power, while achieving the largest reduction in power consumption by utilizing only digital back-off.
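The percentages quoted above can be rechecked with a few lines of arithmetic. The sketch below uses only the power figures stated in the text. The helper `reduction_percent` is a hypothetical name, and the combined figure assumes that total system power is simply the sum of the digital and RF-DAC contributions.

```python
def reduction_percent(before_mw: float, after_mw: float) -> float:
    """Relative power reduction, in percent of the starting value."""
    return 100.0 * (before_mw - after_mw) / before_mw


DIGITAL_FULL, DIGITAL_BACKOFF = 19.7, 9.1    # mW at 0 dB and -18 dB
RFDAC_FULL, RFDAC_BACKOFF = 267.0, 71.0      # mW at 0 dB and -18 dB
PRIOR_DIGITAL = 53.0                         # mW, from [5] at 0.9 GS/s

digital = reduction_percent(DIGITAL_FULL, DIGITAL_BACKOFF)
rfdac = reduction_percent(RFDAC_FULL, RFDAC_BACKOFF)
total = reduction_percent(DIGITAL_FULL + RFDAC_FULL,
                          DIGITAL_BACKOFF + RFDAC_BACKOFF)
versus_prior = reduction_percent(PRIOR_DIGITAL, DIGITAL_FULL)

print(f"digital: {digital:.1f}%  RF-DAC: {rfdac:.1f}%  total: {total:.1f}%")
print(f"digital power versus [5]: {versus_prior:.1f}% lower")
```

Under that summing assumption, the combined reduction rounds to the 72% system-level figure given in the introduction. Note that the comparison with [5] is made at slightly different sample rates.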
