Abstract-We propose a new transmitter architecture for ultra-low-power radios in which the most energy-hungry RF circuits operate at a supply just above the threshold voltage of CMOS transistors. An all-digital PLL (ADPLL) employs a digitally controlled oscillator (DCO) with switching current sources to reduce supply voltage and power without sacrificing its startup margin. This oscillator also reduces 1/f noise and supply pushing, thus allowing the ADPLL, after settling, to reduce its sampling rate or shut it off entirely during direct DCO data modulation. The switching power amplifier integrates its matching network while operating in class-E/F2 to maximally enhance its efficiency at low voltage. The transmitter is realized in 28 nm digital CMOS and satisfies all metal density and other manufacturing rules. It consumes 3.6 mW/5.5 mW while delivering 0 dBm/3 dBm RF power in Bluetooth Low-Energy mode.
I. INTRODUCTION
ULTRA-LOW-POWER (ULP) radios underpin short-range communications for the wireless Internet of Things (IoT) [1]-[12]. Yet, the IoT system lifetime still tends to be severely limited by the transmitter power consumption and the available battery technology. Fig. 1 shows the system lifetime for various battery choices as a function of current consumption. State-of-the-art Bluetooth Low Energy (BLE) radios [1]-[3] consume ∼7 mW and thus can operate continuously for no more than 40 hours on a single SR44 battery, which has comparable dimensions to the radio module. This triggers inconvenient battery replacements at least every few months, which limits their market attractiveness. The lifetime could easily be extended with larger batteries, but that comes at the price of increased weight and size and clearly goes against the vision of IoT miniaturization.
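As a rough cross-check of the 40-hour figure, the lifetime can be estimated from the stored battery energy. This is a minimal sketch: the SR44 capacity and nominal voltage used below are assumed typical values, not numbers taken from the text, and conversion losses and self-discharge are ignored.

```python
def lifetime_hours(capacity_mah, cell_voltage_v, load_power_mw):
    """Continuous-operation lifetime, ignoring regulator losses and self-discharge."""
    energy_mwh = capacity_mah * cell_voltage_v  # stored energy in mWh
    return energy_mwh / load_power_mw

# Assumed SR44 figures (hypothetical, not from the text): ~180 mAh at 1.55 V nominal.
SR44_CAPACITY_MAH = 180.0
SR44_VOLTAGE_V = 1.55

# ~7 mW is the state-of-the-art BLE radio consumption quoted in the text.
ble_radio_hours = lifetime_hours(SR44_CAPACITY_MAH, SR44_VOLTAGE_V, 7.0)
print(f"7 mW radio on one SR44: {ble_radio_hours:.0f} h")
```

Under these assumptions the estimate lands close to the quoted 40 hours; a different assumed capacity scales the result proportionally.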
Energy harvesting from the surrounding environment can enable and further spur IoT applications by significantly extending their lifetime. Solar cells offer the highest harvested power per area, as can be gathered from Fig. 2 [10], [13]. However, they provide much lower voltages (0.25-0.75 V) than the nominal deep-nanoscale CMOS supply of ∼1 V. Hence, boost converters are typically used to bring the supply level up to the required ∼1 V. As evident from Table I, the relatively poor efficiency (≤ 80%) of state-of-the-art boost converters wastes the harvested energy, thus worsening the system-level efficiency, in addition to increasing the hardware complexity and introducing switching ripple. Consequently, it would be highly desirable for ULP radios to operate directly from the harvested voltage.
In this paper, several new system and circuit techniques are exploited to enhance the ULP transmitter efficiency. First, the most energy-hungry circuitry, such as the digitally controlled oscillator (DCO) and the output stage of the power amplifier (PA), can operate directly at the low voltage of harvesters. Second, a new switching current-source oscillator reduces power and supply voltage without compromising the robustness of its start-up. Third, thanks to the low wander of the DCO, the digital power consumption of the rest of the all-digital PLL (ADPLL) is saved by scaling down the rate of the sampling clock, to the point of stopping it completely. Last, a fully integrated differential class-E/F2 switching PA is utilized to achieve high power-added efficiency (PAE) at a low output power of 0-3 dBm.
The paper is organized as follows. Section II introduces a new RF oscillator topology that is suitable for ultra-low voltage/power applications. The tradeoffs between the output power, matching network insertion loss, drain efficiency and power-added efficiency of the class-E/F2 PA are investigated in Section III. The ADPLL-based TX architecture is discussed in Section IV. Section V experimentally verifies our approach.
II. SWITCHING CURRENT-SOURCE DCO
0018-9200 © 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
RF system designers can better optimize the power budget of the various IoT radio blocks by understanding the characteristics of a BLE transient power profile. Fig. 3 illustrates such a profile for a commercial CC2541 IC from Texas Instruments during a single connection event [17], which can serve as a rough guide. We infer that the frequency synthesizer is active for at least 3× longer than the PA. Furthermore, the PLL power consumption is generally known to be merely 3-4× lower than that of the PA at the maximum BLE output power of 1 mW. This ratio gets even lower as the TX output power reduces. Considering both factors, the energy consumption of the frequency synthesizer could even be larger than that of the PA. Consequently, RF oscillators, as some of the most power-hungry circuitry in a BLE transceiver, must be very power efficient and preferably operate directly at the energy harvester output [18].
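The energy argument can be made concrete with a one-line ratio. The following is only an illustrative sketch: it takes the "at least 3×" activity ratio and the "3-4×" power ratio quoted for the connection event and treats both as constant over the event, which is a simplifying assumption.

```python
def synth_to_pa_energy_ratio(activity_ratio, pa_to_synth_power_ratio):
    """E_synth / E_PA = (t_synth / t_PA) / (P_PA / P_synth)."""
    return activity_ratio / pa_to_synth_power_ratio

# Activity ratio of 3 (synthesizer on 3x longer than the PA),
# PA power 3x or 4x higher than the synthesizer power.
ratios = {p: synth_to_pa_energy_ratio(3.0, p) for p in (3.0, 4.0)}
for power_ratio, energy_ratio in ratios.items():
    print(f"P_PA/P_synth = {power_ratio:.0f}: E_synth/E_PA = {energy_ratio:.2f}")
```

With these inputs the synthesizer energy is between 0.75× and 1× that of the PA, and it exceeds the PA energy as soon as the activity ratio grows beyond 3 or the TX output power is backed off.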
A. Oscillator Power Consumption Tradeoffs
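Phase noise (PN) and figure of merit (FoM) of any RF oscillator at an offset frequency Δω from its resonating frequency ω_0 = 2πf_0 can be expressed by

$$\mathcal{L}(\Delta\omega) = 10\log_{10}\left(\frac{KT}{2Q_t^2\,\alpha_I\,\alpha_V\,P_{DC}}\cdot F\cdot\left(\frac{\omega_0}{\Delta\omega}\right)^2\right)$$

and

$$\mathrm{FoM} = 10\log_{10}\left(\frac{10^3\,KT}{2Q_t^2\,\alpha_I\,\alpha_V}\cdot F\right) \qquad (1)$$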
where K is Boltzmann's constant, T is the absolute temperature, and Q_t is the LC-tank quality factor; α_I is the current efficiency, defined as the ratio of the fundamental current harmonic I_ω0 to the oscillator DC current I_DC; and α_V is the voltage efficiency, defined as the ratio of the single-ended oscillation amplitude, V_osc/2, to the supply voltage V_DD [21]-[25]. F is the oscillator's effective noise factor, given by (2),
where φ = ω_0 t, i²_n,i(φ) is the white current noise power density of the ith noise source, and Γ_i is the corresponding impulse sensitivity function (ISF) of the ith device noise [26]. Finally, R_in is the equivalent differential input parallel resistance of the tank's losses. The oscillator I_DC may be estimated by (3).
As a result, the RF oscillator's P_DC is given by (4).
Considering the BLE blocking profile in [19], the oscillator's PN shall be better than −105 dBc/Hz at Δf = 3 MHz offset from an f_0 = 2.45 GHz carrier [6], [9]. Hence, the PN requirements are quite relaxed for IoT applications and can easily be met by LC oscillators as long as the Barkhausen start-up criteria are satisfied over process, voltage and temperature (PVT) variations. Consequently, maximally reducing the oscillator's power consumption, P_DC, at a low V_DD is the ultimate goal in IoT applications. Eq. (4) indicates that the minimum achievable P_DC can be expressed in terms of a set of optimization parameters, such as R_in, and a set of topology-dependent parameters, such as minimum V_DD, α_V and α_I.
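The dependence of P_DC on these parameters can be sketched numerically. The sketch below assumes that (4) takes the form P_DC = 2·V_DD²·α_V/(α_I·R_in), which follows from the definitions of α_I and α_V together with V_osc = I_ω0·R_in; the function name and the numerical values are placeholders chosen for illustration, not values from the paper's tables.

```python
import math

def oscillator_dc_power(v_dd, alpha_v, alpha_i, r_in):
    """P_DC in watts, assuming V_osc = 2*alpha_V*V_DD and I_DC = (V_osc/R_in)/alpha_I."""
    v_osc = 2.0 * alpha_v * v_dd   # differential oscillation amplitude
    i_fund = v_osc / r_in          # fundamental current into the tank loss resistance
    i_dc = i_fund / alpha_i        # DC current via the current-efficiency definition
    return v_dd * i_dc

# Placeholder operating point (illustrative only).
base = oscillator_dc_power(v_dd=0.5, alpha_v=0.3, alpha_i=1.0, r_in=500.0)
doubled_rin = oscillator_dc_power(v_dd=0.5, alpha_v=0.3, alpha_i=1.0, r_in=1000.0)

# With Q_t, alpha_I, alpha_V and F unchanged, (1) gives PN proportional to 1/P_DC.
pn_penalty_db = 10.0 * math.log10(base / doubled_rin)
print(f"P_DC: {base * 1e3:.3f} mW -> {doubled_rin * 1e3:.3f} mW, PN penalty {pn_penalty_db:.2f} dB")
```

Under these assumptions, doubling R_in halves P_DC and costs about 3 dB of phase noise, while a lower V_DD or a higher α_I reduces P_DC directly.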
A lower P_DC is typically achieved by scaling up R_in = L_p ω_0 Q_t, simply via a large multi-turn inductor, as in [27]. For example, while maintaining a constant Q_t, doubling L_p would theoretically double R_in, which would halve P_DC but with a 3 dB PN degradation. However, at some point, that tradeoff stops due to a dramatic drop in the inductor's self-resonant frequency and Q-factor. Fig. 4(a) shows the simulated Q-factor of several multi-turn inductors in TSMC 28 nm CMOS versus their inductance. As the inductor enlarges, the magnetic and capacitive coupling to the low-resistivity substrate increases, such that the tank Q-factor drops almost linearly with L_p. As evident from Fig. 4(b), this constraint sets an upper limit on R_in, which is chiefly a function of the technology node. The parasitic capacitance of the inductor windings, g_m-devices, switchable capacitors and oscillator routing determines a minimum floor of the tank's capacitance, which appears to be ∼250 fF at f_0 = 4.8 GHz. This places another restriction on L_p and R_in(max), of ∼4.5 nH and ∼1.3 kΩ, respectively, and sets a lower limit on P_DC of each oscillator structure. Under this condition, the tank's Q-factor drops to ≤ 9. This explains the poor FoM of RF oscillators in modern BLE transceivers [1]-[3].
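The inductance and resistance limits quoted above can be approximated from the capacitance floor. This is a minimal sketch assuming an ideal parallel LC resonance, ω_0² = 1/(L_p·C), and the relation R_in = L_p·ω_0·Q_t from the text; the function names are illustrative.

```python
import math

def max_tank_inductance(f0_hz, c_min_f):
    """Largest inductance that still resonates at f0 with the minimum tank capacitance."""
    omega0 = 2.0 * math.pi * f0_hz
    return 1.0 / (omega0 ** 2 * c_min_f)

def tank_parallel_resistance(l_p_h, f0_hz, q_t):
    """R_in = L_p * omega_0 * Q_t."""
    return l_p_h * 2.0 * math.pi * f0_hz * q_t

l_max = max_tank_inductance(4.8e9, 250e-15)         # capacitance floor of ~250 fF at 4.8 GHz
r_in_max = tank_parallel_resistance(l_max, 4.8e9, 9.0)  # Q_t of 9, the quoted upper bound
print(f"L_p(max) = {l_max * 1e9:.2f} nH, R_in(max) = {r_in_max:.0f} ohm")
```

The ideal-resonance estimate gives roughly 4.4 nH and about 1.2 kΩ, the same order as the ∼4.5 nH and ∼1.3 kΩ limits quoted from simulation.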
The topology-dependent parameters also play an important role in reducing P_DC. Eq. (4) favors structures that offer a higher α_I or can sustain oscillation with a smaller V_DD and α_V. On the other hand, α_V · α_I should be maximized to avoid any penalty on FoM [22], [28], as evident from (1). Consequently, to efficiently reduce P_DC without disproportionately worsening the FoM, it is desirable to employ structures with a higher α_I and a lower minimum V_DD. To give better insight, Fig. 5 shows such effects for the traditional cross-coupled NMOS-only (OSC_N) and complementary push-pull (OSC_NP) structures [30], [31]. Owing to less transistor stacking, the V_DD,min of OSC_N can go 40% lower than that of OSC_NP. However, α_I of OSC_NP is doubled due to the switching of the tank current direction every half period. Its oscillation swing, and thus α_V, is also 50% smaller. Hence, OSC_NP offers ∼3× lower α_V/α_I. However, both structures demonstrate a similar α_V · α_I product [32]. Consequently, each of them has its own set of advantages and drawbacks, such that their minimum achievable P_DC and FoM are almost identical, as shown in [22], [33]. This is in line with the FoM optimization but works against the P_DC reduction, as evident from (1) and (4). Furthermore, while maintaining the same R_in, class-F3 operation does not reduce P_DC of traditional oscillators, since its minimum V_DD, α_V and α_I are identical to those of OSC_N [24].
A push-pull class-C oscillator appears to be an excellent choice for ULP applications due to its largest α_I and smallest α_V [34], as per Table II. However, it needs additional complex biasing circuitry (e.g., an opamp) to guarantee proper oscillator start-up and to keep the transistors in saturation during the on-state. There are also strong mutual tradeoffs between the biasing circuit's P_DC and the oscillator's amplitude stability and PN, much intensified in ULP applications where the tank capacitance tends to be smaller [35]. As a consequence, the biasing circuitry can end up consuming power comparable to that of the ULP oscillator itself. On the other hand, the V_DD of class-D oscillators can go below the threshold voltage, V_t. However, due to hard switching of the core transistors, their α_V and α_I are respectively higher and lower than in other structures [36], as shown in Table II. According to (4), this trend works against the P_DC reduction. Consequently, the existing oscillator structures have difficulty reaching ultra-low power and ultra-low voltage operation simultaneously.
In this work, we propose to convert the fixed current source of the traditional low-voltage NMOS topology into a structure with alternating current sources such that the tank current direction can change every half-period. Consequently, the benefits of the low supply of the OSC_N topology and the higher α_I of the OSC_NP structure are combined to reduce power consumption further than practically possible in the traditional oscillators.
B. Switching Current-Source Oscillator
Fig. 6 shows an evolution towards the switching current-source oscillator. The OSC_N topology is chosen as a starting point due to its low-V_DD capability. To reduce P_DC further, it is desirable to switch the direction of the LC-tank current in each half period, which will double α_I. Consequently, we propose to split the fixed current source M_1 in Fig. 6(a) into two switchable "current sources" M_1 and M_2, as suggested in Fig. 6(b). This allows the tank to be disconnected from the V_DD feed and moved in between the upper and lower NMOS transistor pairs, giving rise to an H-bridge configuration. In the next step, the passive voltage gain blocks, A_0, are added to the NMOS gates, as shown in Fig. 6(c). The upper and lower NMOS pairs should each individually demonstrate synchronized positive feedback to realize the switching of the tank current direction. The "master" positive feedback enforces the differential-mode operation and is realized by the lower-pair transistors configured in a conventional cross-coupled manner; since this pair is voltage-biased, it presents a negative conductance to the tank.
On the upper side, the differential-mode oscillation of the tank is reinforced by the M_3,4 devices, which realize the second positive feedback. The expression for the negative conductance seen by the tank looking into the upper pair clearly indicates that the voltage gain block is necessary and that A_0 must be safely larger than 1 to present a negative conductance to the tank, thus enabling the H-bridge switching. By merging the redundant voltage gain blocks, the proposed switching current-source oscillator is arrived at in Fig. 6(d).
Figs. 7-8 illustrate the proposed oscillator schematic and simulated waveforms indicating the various operational regions of the M_1-4 transistors. The two-port resonator consists of a step-up 1:2 transformer and tuning capacitors, C_1,2, at its primary and secondary windings. The current-source transistors M_1,2 set the oscillator's DC current. Along with M_3,4, they play a vital role in switching the tank current direction. The V_DD of the proposed oscillator can be as low as V_OD1 + V_OD3 ≈ V_t, which is extremely small given the capability of switching the tank current direction. Note that the oscillation swing cannot exceed V_OD1,2 at the DA/DB nodes and is chosen as 150 mV to satisfy the PN requirements with a margin. However, it is the bias voltage V_B ≈ V_OD1 + V_gs3 that limits the minimum supply. Hence, M_3,4 should work in weak inversion, keeping V_gs3 < V_t, to achieve a lower V_DD,min. However, the transistor's cut-off frequency f_max drops dramatically in subthreshold operation. Note that f_max should be at least 3-4× higher than the operating frequency f_0 = 4.8 GHz to guarantee the oscillator start-up over PVT variations. This constraint limits V_gs3 to ≈ 0.3 V for V_OD3 ≈ 150 mV, as inspected from Fig. 9. Consequently, even considering the tougher V_B requirement, the proposed structure can operate at V_DD as low as 0.5 V, on par with OSC_N.
Such a low V_DD and swing could easily lead to start-up problems in the traditional oscillators. They would also increase the power consumption, P_buf, of the following buffer, which would require more gain to provide a rail-to-rail swing when clocking the subsequent ÷2 divider. Fortunately, the transformer gain enhances the oscillation swing at the M_1,2 gates to even beyond V_DD, thus guaranteeing the oscillator start-up and a reduction of P_buf. Consequently, the oscillator buffer is connected to the secondary winding.
As evident from Fig. 8, the M_3,4 transistors operate in a class-C manner as in a Colpitts oscillator, meaning that they deliver somewhat narrow-and-tall current pulses. However, their conduction angle is quite wide, ∼π, due to the low overdrive voltage in subthreshold operation. On the other hand, M_1,2 operate in a class-B manner like cross-coupled oscillators, meaning that they deliver square-shaped current pulses. Hence, the shapes of the drain currents are quite different for the lower and upper pairs. However, their fundamental components have the same amplitude and phase, so that they add constructively to build the oscillation voltage across the tank. The higher drain-current harmonics obviously show different characteristics, but they are filtered out by the tank's selectivity. Note that the current through a transistor of the upper pair has two paths to ground: through the corresponding transistor of the lower pair and through the single-ended capacitors. Consequently, the single-ended capacitors sink the higher current harmonics of the M_3,4 transistors.
C. Thermal Noise Upconversion in the Proposed Oscillator
To calculate a closed-form PN equation, the proposed oscillator model is simplified in Fig. 10. At the resonant frequency, the transformer-based tank can be modeled by an equivalent LC-tank of elements L_eq, C_eq and R_in (the interested reader is directed to [41] for accurate closed-form equations of L_eq, C_eq and R_in). On the other hand, the M_1-4 transistors, together with the passive voltage gain of the transformer, are decomposed into two nonlinear time-variant conductances. The first one is always negative to compensate for the circuit losses.
The second one is always positive, G_ds(φ) = 0.25 g_ds,1:4(φ), modeling the equivalent channel conductance of M_1-4. The noise sources of M_1-4 are uncorrelated and always find a path through the tank and via C_par to ground. To give better insight, the equivalent noise due to channel conductance, i²_n,Gds(φ) = 4KT G_ds(φ), and the noise due to transconductance are considered separately.
Footnote 6: Calculated following the method in [37].
It is well known that the relevant impulse sensitivity function of noise sources associated with a sinusoidal waveform oscillator, V_osc · cos φ, may be estimated by Γ = sin(φ) [26], [30]. By exploiting (2), the effective noise factor due to resistive losses of the oscillator is given by (5),
where G_DS[k] describes the kth Fourier coefficient of the instantaneous G_ds(φ). To give better insight, different components of the above equation are graphically illustrated in Fig. 11(a)-(c). The literature interprets the R_in G_DSEF term in (5) as the tank loading effect. In our design, M_1 and M_2 alternately enter the triode region for part of the oscillation period and exhibit a large channel conductance. As shown in Fig. 11(a), the simulated 0.5 R_in G_DS1EF can be as large as 0.6 for the lower-pair transistors. However, M_3,4 work only in saturation and demonstrate a small channel conductance for their entire on-state operation, as evident from Fig. 11(a). Hence, the simulated value of 0.5 R_in G_DS4EF is as low as ∼0.17. Note that both NMOS and PMOS pairs of the OSC_NP structure simultaneously enter the triode region for part of the oscillation period and load the tank from both sides. In the proposed structure, however, only one side of the tank is connected to the AC ground when either M_1 or M_2 is in triode, while the other side sees a high impedance. Hence, this structure at least preserves the charge of the differential capacitors over the entire oscillation period. Consequently, compared to the traditional oscillators, the tank loading effect is somewhat reduced here.
To sustain the oscillation, the average power dissipated in the oscillator's resistive loss, R_in + 1/G_ds(φ), must equal the average power delivered by the negative conductance, G_n(φ), of the active devices. As proved in [37], this energy conservation requirement results in (6).
As with OSC_NP [31], [37], both upper and lower feedback mechanisms should exhibit an almost identical, i.e., ∼50%, contribution to the compensation of oscillator losses, which leads to (7). By exploiting (2), the effective noise factor due to transconductance gain is then given by (8).
To give better insight, different components of the above equation are graphically illustrated in Fig. 11(d)-(e). Merging (7) into (8) yields (9).
Consequently, the upper transistors should work harder and compensate 1/(2(A_0 − 1)) of the oscillator loss. As (9) indicates, the G_M noise contribution of the lower pair is therefore smaller. However, its effect on F_loss is larger, such that both pairs make more or less the same contribution to the oscillator PN [see Fig. 11(f)]. Finally, the total oscillator effective noise factor is given by (10).
Considering γ_1 = γ_4 = 1.4 and A_0 = 2.15, the noise factor of the proposed oscillator is ∼5.3 dB, which is just 1.5 dB higher than the ideal value of (1 + γ) despite the aforementioned practical issues of designing ultra-low voltage and power oscillators. The phase noise and FoM of the proposed oscillator can be calculated by substituting (10) into (1).
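The 1.5 dB margin can be verified directly from the quoted numbers. The sketch below only converts the ideal noise factor (1 + γ) to decibels and subtracts it from the quoted ∼5.3 dB; it does not evaluate (10) itself.

```python
import math

def to_db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)

gamma = 1.4                       # excess noise coefficient quoted in the text
ideal_noise_factor_db = to_db(1.0 + gamma)
quoted_noise_factor_db = 5.3      # noise factor of the proposed oscillator, from the text
excess_db = quoted_noise_factor_db - ideal_noise_factor_db
print(f"ideal: {ideal_noise_factor_db:.2f} dB, excess over ideal: {excess_db:.2f} dB")
```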
D. 1/f Noise Upconversion in the Proposed Oscillator
Several techniques have been exploited to lower the oscillator's 1/f noise upconversion. First, dynamically switching the bias-setting devices M_1,2 reduces their flicker noise, as also demonstrated in [38]. It also lessens the DC component of their effective ISF [26]. Second, as suggested in [39] and [40], 1/f noise upconversion can be alleviated by realizing an auxiliary resonance at 2ω_0 such that the 2nd-harmonic current flows into an equivalent resistance of the tank, which avoids disturbing the rise and fall symmetry of the waveform. Since common-mode signals, e.g., the 2nd harmonic of the drain current, cannot see the tuning capacitance at the transformer's secondary [21], the auxiliary 2ω_0 resonance can be realized without a die area penalty by adjusting the single-ended capacitance at the transformer's primary [39]. The last source of 1/f noise is M_B1 in the biasing circuitry. By utilizing long-channel devices for the M_B1/B2 biasing, their power consumption becomes negligible. Furthermore, their large W·L area generates less 1/f noise. Consequently, based on the aforementioned techniques, a lower 1/f³ PN corner is expected than in the traditional oscillators.
E. Optimizing Transformer-Based Tank
The transformer-based tank's input equivalent resistance, R_in, and voltage gain, A_0, should be maximized for the best system efficiency. They are a strong function of ζ = L_2 C_2/(L_1 C_1) [18], [41], as shown in Fig. 12. R_in may be estimated by (11),
where ω_s² = 1/(L_2 C_2), and Q_1 and Q_2 are respectively the Q-factors of the transformer's primary and secondary windings. It can be shown that R_in reaches its maximum at ζ = ζ_Rmax, given by (12). Note that the tank Q-factor is maximized at a different value, ζ = Q_2/Q_1 [24]. The maximum R_in is obtained by inserting (12) into (11), which yields (13).
Consequently, the transformer's coupling factor k_m enhances R_in by a factor of ∼(1 + k_m²) at ζ_Rmax. For this reason, the switched-capacitor banks are distributed between the transformer's primary and secondary to roughly satisfy (12). For k_m ≥ 0.5, the voltage gain of the transformer-based tank may be estimated by (14).
As shown in Fig. 12(c), A_0 increases with larger ζ. Note that larger R_in and A_0 are desired to reduce P_DC and P_buf, respectively. To consider both scenarios, the trans-impedance term R_21 = R_in · A_0 is defined and depicted in Fig. 12(d). R_21 reaches its maximum at ζ = 1 for Q_1 ≈ Q_2, which is reasonable for monolithic transformers. We also define the maximum of R_21 as the transformer FoM.
Consequently, the transformer dimensions and winding spacing are chosen to maximize this term.
III. CLASS-E/F2 SWITCHED-MODE POWER AMPLIFIER
The second most energy-hungry block in a BLE transceiver is the PA. Designing a fully integrated PA optimized for low output power (P_out < 3 dBm) with high power-added efficiency (PAE > 40%) is very challenging, especially when the spurious harmonic level must be below −41 dBm to fulfill the FCC 15.247 regulation. To deliver such a low P_out with the highest PAE to the R_L = 50 Ω load, the equivalent resistance r_L seen by the PA switching transistors must be scaled up by the PA's output matching network.
A single-ended (SE) class-D PA generates the lowest P_out among the various flavors of switched-mode PAs for the same V_DD and r_L. Hence, the impedance transformation ratio, ITR = r_L/R_L, and therefore the insertion loss of its matching network, can theoretically be the lowest, making the class-D PA an attractive choice for fully integrated BLE transmitters, as also gathered from [1]-[3]. However, the 2nd-harmonic emission of SE class-D PAs is quite poor, and thus an additional feedback structure is needed to adjust the PA's conduction angle to ∼π in order to suppress even-order harmonics [1]-[3]. That circuitry worsens the system power consumption, die area and complexity. Furthermore, the loaded Q-factor of a class-D series LC matching network, Q_L = L_s ω_0/R_L, is quite low (∼1 for L_s as large as 3.5 nH). Hence, its filtering would not be able to suppress the 3rd harmonic to ≤ −41 dBm. As a consequence, an additional on-chip [2], [3] or off-chip [1] low-pass filter is required. This approach dramatically increases the matching network insertion loss and area, such that the original benefits of SE class-D PAs are lost and the BLE system efficiency is limited to ≤ 20% in state-of-the-art publications [1]-[3].
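The quoted loaded Q-factor of about 1 can be reproduced from Q_L = L_s·ω_0/R_L. In this sketch the 2.45 GHz carrier frequency is an assumption taken from the BLE carrier used earlier in the text; L_s = 3.5 nH and R_L = 50 Ω are the quoted values.

```python
import math

def series_loaded_q(l_s_h, f0_hz, r_load_ohm):
    """Loaded Q of a series LC matching network: Q_L = L_s * omega_0 / R_L."""
    return l_s_h * 2.0 * math.pi * f0_hz / r_load_ohm

# f0 = 2.45 GHz is assumed here as a representative BLE carrier frequency.
q_loaded = series_loaded_q(3.5e-9, 2.45e9, 50.0)
print(f"Q_L = {q_loaded:.2f}")
```

A Q_L this close to unity gives very little attenuation at the 3rd harmonic, which is consistent with the need for an additional low-pass filter in the class-D designs.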
In this work, a fully integrated differential class-E/F2 PA [Fig. 13(a)] is exploited to address the aforementioned issues. Its characteristics and its matching network are optimized in the following subsections.
A. Efficiency and Selectivity Tradeoff in Transformer-Based Matching Network
Fig. 14 illustrates a general schematic of a transformer-based matching network of a switched-mode PA, which simultaneously performs m-series (i.e., voltage) and p-parallel (i.e., current) combining [42], [43]. As proven in Appendix B, the matching network efficiency η_p can be calculated as shown in (15).
η_p is a strong function of the effective inductance seen by the load, mL_s/p, and of C_L. Hence, for the sake of simplicity, Q_L is defined as the loaded Q-factor of the secondary side of the matching network. The efficiency η_p reaches its local maximum at L_s = L_s(opt), given by (16).
By exploiting the Q_L definition above and substituting L_s(opt) from (16) for L_s in (15), lengthy algebra shows that the local maximum of η_p may be estimated by (17). Fig. 15(a) shows the maximum possible passive efficiency η_p(opt) versus Q_L. As can be seen, there exists a global optimum Q_L that maximizes the transformer-based matching network efficiency at a given frequency. The efficiency η_p reaches its global maximum under the condition in (18).
As a result, the global optimum load capacitance, C_L(opt), can be estimated. Note that both (16) and (18) are more general and accurate than in [42]. Using the optimum ξ and Q_L, the maximum η_p is given by (19), which is the same result as in [42]. As gathered from Fig. 15(a), there is a strong tradeoff between the frequency selectivity and the efficiency of the transformer-based matching network. Combined with the fact that the effective matching network's Q improves almost linearly with Q_L, it is therefore desirable to choose C_L ≈ 2C_L(opt) to double the frequency selectivity at the price of a negligible, i.e., ≤ 5%, efficiency drop.
B. Impedance Transformation
The matching network should also realize the required load resistance, r_L, and series inductance, L_ser, for proper zero-voltage and zero-slope switching (ZVS and ZdVS) operation of the class-E/F PA. As shown in Appendix B, r_L may be estimated by (20).
To deliver the relatively low P_out ≤ 3 dBm to the antenna, realizing a larger r_L is desired. Unfortunately, as can be gathered from (20), the voltage summation (m > 1) and imperfect magnetic coupling k_m have the opposite effect of reducing r_L. The p-way current combining enhances r_L but at the price of (p − 1) extra transformers and thus a dramatic increase in the PA die area [43], [44]. Hence, parallel combining is not considered in this work. Eq. (20) further indicates that a step-down transformer (1:n) with a small turns ratio (n < 1) could be used to enhance r_L. However, the Q-factor of the transformer windings, and thus its efficiency, drops dramatically as n reduces. Consequently, a turns ratio of 1:1/2 was chosen in consideration of both the r_L enhancement and the η_p optimization. P_out is further reduced by using V_DD = 0.5 V (i.e., roughly half the nominal supply) for the drains of the switching transistors, with the side effect of ∼6 dB lower power gain for the PA's transistors. However, the power gain of 28 nm NMOS devices is high enough at the relatively low frequency of 2.4 GHz that the 6 dB power gain penalty has a negligible effect on the total system efficiency. Furthermore, the drain voltage peak of the switching transistors is ≤ 1.5 V, thus alleviating reliability issues due to gate-oxide breakdown [21], [45].
As shown in Appendix B, the equivalent series inductance, L_ser, seen from the transformer's primary is given by (21).
Note that switched-mode PAs typically need a large L_ser to satisfy the ZVS/ZdVS criteria, which leads to a large inductor with a reduced Q-factor. As can be gathered from (21) and Fig. 15(b), L_ser increases with a larger Q_L for ξ ≥ 1. More interestingly, L_ser can even be larger than the primary inductance, which helps to reduce both the matching network dimensions and the insertion loss. Unfortunately, r_L reduces with C_L, and thus the peak efficiency occurs at a higher output power. Consequently, it is again desirable to choose C_L ≈ 2C_L(opt) by considering the tradeoff between the r_L and L_ser enhancement factors. Fig. 13(b) illustrates an equivalent circuit of the PA matching network in the differential mode at the fundamental frequency ω_0. At all higher odd harmonics, L_ser presents a high impedance, and thus the only load seen by the switch is its parallel capacitance C_s, just as in traditional class-E PAs.
C. Class-E/F2 Operation
As illustrated in Fig. 16, the step-down 2:1 transformer acts differently on common-mode (CM) and differential-mode (DM) input signals. When the transformer's primary is excited by a CM signal [Fig. 16(b)], the magnetic flux within the primary's two turns cancels itself out [46]. Consequently, the transformer's L_p is negligible and no current is induced at the transformer's secondary (k_m,CM ≈ 0). Hence, R_L, L_s and C_L cannot be seen by the even harmonics of the drain current.
Furthermore, the CM inductance, 2L_cm, seen by the switching transistors is mainly determined by the dimension of the trace between the transformer's center-tap and the decoupling capacitors at the V_DD node. Together with C_s, 2L_cm realizes a CM resonance, ω_cm. Note that P_out of the class-E PA can be reduced by ∼2 dB at the same r_L and V_DD by means of an additional open circuit acting as the switches' effective load at ∼2ω_0 (i.e., class-E/F2 operation [47]), as supported by the power factor, K_p, column in Table III. Consequently, this PA needs a smaller ITR for P_out < 3 dBm, which results in a lower insertion loss for its matching network and thus a higher system efficiency. However, in practice, the limited value of the equivalent parallel resistance of the CM resonance, R_cm, leads to a power loss at the second harmonic and thus a penalty on the PA's efficiency if ω_cm is set at precisely 2ω_0. Consequently, in this design, we adjust the CM resonance slightly lower (i.e., to ∼1.8ω_0) to benefit from the lower K_p of semi class-E/F2 operation, while avoiding the additional power loss at even harmonics.
In terms of the power factor, the required load resistance is r_L = K_p (V_DD − V_Dsat)²/P_out, where V_Dsat represents the transistor's average V_DS in the on-state. As explained in [44], V_Dsat is a strong function of the switch size, technology and topology-dependent parameters, and it is set to ∼0.12 V to maximize the PAE of the proposed PA. The shunt capacitance, C_s, and series inductance, L_ser, may be estimated from the K_C and K_L definitions.
Now, the transformer geometry should be designed to realize the required r_L and L_ser by (20)-(21) while optimizing the matching network efficiency via (16)-(19). In this work, the circuit variables are as follows:
IV. ALL-DIGITAL PHASE-LOCKED LOOP AND TRANSMITTER ARCHITECTURE
Fig. 17 shows a block diagram of the proposed ultra-low-power (ULP) all-digital PLL (ADPLL), whose architecture is adapted from a high-performance cellular 4G ADPLL disclosed in [48]. Due to the relaxed PN requirements of BLE, the DCO ΣΔ dithering [49] was removed thanks to the fine switchable capacitance of the tracking bank varactors, which produces a fine step size of 4 kHz. The DCO features two separate tracking banks (TB): 1) phase-error correction, and 2) direct FM modulation. Each bank is segmented with LSB (i.e., 1× ≡ 4 kHz) and MSB (i.e., 8×) unit-weights. Each TB range is 4 kHz × (8 + 8 × 64) = 2.08 MHz.
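The tracking-bank range follows from the segmentation arithmetic. The sketch below reads the expression 4 kHz × (8 + 8 × 64) as 8 LSB units of weight 1 plus 64 MSB units of weight 8; that split of unit counts is an interpretation of the formula, and the parameter names are illustrative.

```python
def tracking_bank_range_hz(step_hz, n_lsb_units, n_msb_units, msb_weight):
    """Total tuning range of a segmented tracking bank."""
    total_lsb_equivalents = n_lsb_units + msb_weight * n_msb_units
    return step_hz * total_lsb_equivalents

# Assumed reading of "4 kHz x (8 + 8 x 64)": 8 LSB units, 64 MSB units of weight 8.
tb_range_hz = tracking_bank_range_hz(step_hz=4e3, n_lsb_units=8, n_msb_units=64, msb_weight=8)
print(f"Tracking bank range: {tb_range_hz / 1e6:.2f} MHz")
```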
The DCO clock is divided by two to generate four phases of a variable carrier clock, CKV 0−3, in the Bluetooth frequency range of f V = 2402-2478 MHz. Two of its phases, CKV 0,2, are fed as differential clock signals to the digital PA (DPA) in Fig. 13(a). The four CKV 0−3 phases are routed to the phase detection circuitry, which selects the phase whose rising clock edge is expected to be the closest to the rising clock edge of the frequency reference (FREF) clock. This prediction is based on the two MSBs of the fractional part of the reference phase, i.e., the accumulated frequency command word (FCW). By means of this prediction, the selected TDC input clock CKV' spans a quarter of the originally required TDC range, i.e., T V /4, where T V is the CKV clock period. This way, the long string of 417 ps/12 ps ≈ 35 TDC inverters is shortened by 4x, improving INL and power consumption by the same amount.
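The quadrant prediction and the resulting TDC shortening can be sketched as follows. This is a minimal behavioral model under stated assumptions, not the authors' implementation: the function names are hypothetical, the mapping from quadrant index to a particular CKV phase is left out because the text does not give it, and a 2.4 GHz carrier with a 12 ps inverter delay is assumed for the chain-length estimate.

```python
import math
from fractions import Fraction


def tdc_inverters(f_v_hz, t_inv_s, phases=1):
    """Inverters needed to cover T_V / phases with a per-stage delay t_inv_s."""
    t_v = 1.0 / f_v_hz
    return math.ceil(t_v / phases / t_inv_s)


def predicted_quadrant(fcw, k):
    """Two MSBs of the fractional part of the reference phase.

    The reference phase is modeled as the FCW accumulated over k FREF cycles;
    the returned index (0..3) says which quarter of T_V the FREF edge is
    expected to fall in.
    """
    phase = fcw * k
    frac = phase - math.floor(phase)
    return int(frac * 4)


if __name__ == "__main__":
    # Hypothetical example: 2402 MHz channel with a 40 MHz FREF.
    fcw = Fraction(2402, 40)
    print([predicted_quadrant(fcw, k) for k in range(1, 9)])
    print(tdc_inverters(2.4e9, 12e-12), tdc_inverters(2.4e9, 12e-12, phases=4))
```

Exact rational arithmetic (`Fraction`) is used here only to avoid floating-point rounding at quadrant boundaries; a hardware accumulator would simply take the two top fractional bits of its fixed-point word.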
The TDC output, after decoding, is normalized to T V by the Δ T DC /T V multiplier and the quadrant estimation, normalized to T V /4, is added to produce the phase error φ E . The DCO tuning word is updated based on φ E . The φ E [k] is fed to the type-II loop filter (LF) with 4th-order IIR. The LF is dynamically switched during frequency acquisition to minimize the settling time while keeping phase noise (PN) at optimum. The built-in DCO gain, K DCO , and TDC gain, K T DC , calibrations are autonomously performed to ensure the wideband FM response.
The following architectural innovations allow the ADPLL to support ULP operation (highlighted in blue): The effective sampling rate of the phase detector and its related DCO update is dynamically controlled by scaling down the FREF clock and simultaneously adjusting the LF coefficients in order to keep the same bandwidth and LF transfer-function characteristics. During the ADPLL settling, the full FREF rate is used, but afterwards the rate can be substantially reduced (e.g., 8x), or FREF can be shut down completely, thus saving power in the digital circuitry. The resulting in-band PN degradation is tolerable due to the low PN of the DCO. In fact, freezing FREF would incur sufficiently low frequency drift during the BLE 376 μs packets, while keeping in operation only the bare minimum of circuitry, highlighted in red.
V. EXPERIMENTAL RESULTS
Fig. 18 shows the die photo of the ULP TX in TSMC 1P9M 28 nm CMOS. Both DCO and PA transformers' windings are realized with the top ultra-thick metal. However, they include a lot of dummy metal pieces on all metal layers (M1-M9) to satisfy the very strict minimum metal density manufacturing rules of advanced (≤ 28 nm) technology nodes [48].
Fig. 19 displays the phase noise of the proposed oscillator at the lowest and highest tuning frequencies for V DD = 0.5 V and 0.8 V, while R in ≈ 310 Ω. The measured PN is −111 dBc/Hz at 1 MHz offset from a 5.1 GHz carrier while consuming ∼0.35 mW at 0.5 V. As justified in Section II-D, the 1/f 3 PN corner of the oscillator is extremely low (i.e., ≤ 100 kHz) across the tuning range (TR) of 22% (i.e., from 4.1 to 5.1 GHz). Its average FoM is 189 dBc and varies ±1 dB across the TR. For the supply frequency pushing measurements, the oscillator's V DD supply is swept within 0.4-0.6 V while the off-chip bias resistor R Bias (see Fig. 7) is removed and V B is directly connected to an external reference voltage.⁹ Contrary to the OSC NP structure, V DD perturbations here cannot directly modulate V gs and thus the oscillator's DC current and nonlinear C gs of M 1−4 devices. Consequently, the worst-case supply frequency pushing is very low, 10-12 MHz/V across the TR, thus making the oscillator suitable for direct connection to solar cells and integration with a PA.
⁹ Since V B biasing does not consume any DC current, the current consumption of its internal biasing circuit is extremely low; therefore, realizing an on-chip V B voltage reference with a good PSRR would be quite straightforward.
Fig. 20 plots the measured phase noise at different configurations for both integer-N and fractional-N BLE channels. When used as an LO at undivided 40 MHz FREF, the ADPLL consumes 1.4 mW with an integrated PN of 0.87° (yellow line in Fig. 20). It exhibits in-band PN of −101 dBc/Hz, which corresponds to an average TDC resolution of ∼12 ps. Thanks to the low wander of the DCO, digital power consumption of the rest of the ADPLL can be saved by scaling the sampling clock rate down to 5 MHz. However, the in-band PN then increases by 10 log 10 (40/5) = 9 dB to −92 dBc/Hz, with an integrated PN of 1.08° (blue line in Fig. 20).
Fig. 21 shows a representative spectrum of the ADPLL at integer-N and fractional-N channels and summarizes the worst-case spur for each BLE channel. The reference spur is −80 dBc and the worst-case fractional spur is −60 dBc. The open-loop spurs are not visible above the −90 dBc noise floor of our equipment. Fig. 22 shows the TX spectra for 1 Mb/s GFSK modulation at different modulation indexes and its burst modulation quality. All spectral mask requirements are fulfilled, while the FSK error is 2.7%.
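Two of the measured figures above, the TDC-limited in-band PN and the oscillator FoM, can be cross-checked numerically. The sketch below is not taken from the paper: it assumes the commonly used TDC quantization-noise expression L = ((2π)²/12)·(Δt/T V)²/f R and the conventional oscillator FoM definition, and it assumes a mid-band 2.44 GHz carrier. All function names are hypothetical.

```python
import math


def tdc_inband_pn_dbc_hz(t_res_s, f_v_hz, f_ref_hz):
    """In-band PN set by TDC quantization (assumed textbook expression).

    L = ((2*pi)^2 / 12) * (t_res / T_V)^2 / f_ref, returned in dBc/Hz.
    """
    t_v = 1.0 / f_v_hz
    lin = (2 * math.pi) ** 2 / 12 * (t_res_s / t_v) ** 2 / f_ref_hz
    return 10 * math.log10(lin)


def osc_fom_db(pn_dbc_hz, f0_hz, offset_hz, p_dc_w):
    """Conventional oscillator FoM: |PN| + 20log10(f0/df) - 10log10(P_DC/1 mW)."""
    return (-pn_dbc_hz
            + 20 * math.log10(f0_hz / offset_hz)
            - 10 * math.log10(p_dc_w / 1e-3))


if __name__ == "__main__":
    pn_40 = tdc_inband_pn_dbc_hz(12e-12, 2.44e9, 40e6)  # full-rate FREF
    pn_5 = tdc_inband_pn_dbc_hz(12e-12, 2.44e9, 5e6)    # FREF scaled by 8x
    print(f"in-band PN: {pn_40:.1f} dBc/Hz at 40 MHz, {pn_5:.1f} dBc/Hz at 5 MHz")
    print(f"FoM at 5.1 GHz: {osc_fom_db(-111, 5.1e9, 1e6, 0.35e-3):.1f} dB")
```

Under these assumptions, a 12 ps resolution gives roughly −101.5 dBc/Hz at 40 MHz, the 8x rate reduction costs about 9 dB, and the 5.1 GHz operating point lands within the stated ±1 dB of the 189 dB average FoM.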
To achieve simultaneous fast locking and power savings, the loop bandwidth is dynamically controlled via a gearshift technique [49]. During frequency acquisition, the loop operates in type-I with a wide bandwidth of 2 MHz. When it enters the tracking mode, it is switched to type-II with a 4th-order IIR filter and a 500 kHz bandwidth. Finally, the loop bandwidth is reduced to 200 kHz to optimize the ADPLL integrated jitter. The measured lock-in time is less than 15 μs for f REF of 40 MHz, as shown in Fig. 23(a). Thanks to the low flicker noise, frequency pushing and pulling of the DCO, its frequency drift is extremely small, as demonstrated in Fig. 23(b). Consequently, the rest of the ADPLL can be shut down during the modulation to improve the power efficiency of the BLE transmitter. The maximum difference between the 0/1-symbol frequency at the start of the BLE packet and the 0/1 frequencies within the packet payload should be less than ±50 kHz. This specification is satisfied with over an order-of-magnitude margin even in open-loop operation, as shown in Fig. 23(b) and (c).
The PA output level is digitally adjustable from −5 to +3 dBm and reaches a peak PAE of 41%, which includes the power consumption of two stages of PA drivers [see Fig. 24(a)]. The measured TX harmonic emissions are shown in Fig. 24(b). Due to the differential operation, proper 2nd-harmonic termination, and trading a negligible efficiency loss for a higher loaded Q-factor of the PA's matching network, the 2nd and 3rd harmonics remain well below the −41 dBm regulatory limit. The proposed TX consumes 3.6/5.5 mW during the open-loop 1 Mb/s GFSK BLE modulation at 0/3 dBm output, resulting in η T X = 28/36% total TX efficiency. The power consumption would increase by 0.8 mW with the TDC, variable counter and digital circuitry turned on when the ADPLL is clocked at 40 MHz FREF. Thus, even in the closed loop, with η T X = 23/32% at 0/3 dBm, it is still more power efficient than the prior record [6] (also [50] but at 13.5 dBm output). The TX power breakdown is also illustrated in Fig. 24(c). Table IV summarizes the performance and compares it with leading ULP transmitters. Among them, the proposed ULP TX achieves the lowest power consumption and phase noise.
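The quoted TX efficiencies follow directly from the output power and the DC consumption. As a quick check, the sketch below converts dBm to mW and takes the ratio; the function names are hypothetical, and the closed-loop figures simply add the 0.8 mW mentioned above to the open-loop consumption.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def tx_efficiency(p_out_dbm, p_dc_mw):
    """Total TX efficiency: RF output power over total DC power."""
    return dbm_to_mw(p_out_dbm) / p_dc_mw


if __name__ == "__main__":
    ADPLL_OVERHEAD_MW = 0.8  # extra power with the loop closed (from the text)
    for p_out_dbm, p_dc_mw in [(0, 3.6), (3, 5.5)]:
        open_loop = tx_efficiency(p_out_dbm, p_dc_mw)
        closed_loop = tx_efficiency(p_out_dbm, p_dc_mw + ADPLL_OVERHEAD_MW)
        print(f"{p_out_dbm} dBm: open loop {open_loop:.0%}, "
              f"closed loop {closed_loop:.0%}")
```

Rounded to whole percent, this reproduces the 28/36% open-loop and 23/32% closed-loop values.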
VI. CONCLUSION
We have proposed an ultra-low power (ULP) Bluetooth Low Energy (BLE) transmitter that demonstrates the best-ever reported system efficiency and phase purity, while abiding by the strict 28 nm CMOS technology manufacturing rules. A new switching current-source oscillator combines advantages of low supply voltage of the conventional NMOS cross-coupled oscillator with high current efficiency of the complementary push-pull oscillator to reduce the oscillator supply voltage and dissipated power further than practically possible in the traditional oscillators. Due to the low wander of DCO, digital power consumption of ADPLL can be significantly saved by scaling down the rate of sampling clock after settling or even shutting it down entirely during direct DCO data modulation. A fully integrated differential class-E/F 2 switching PA is utilized to improve system efficiency at low output power of 0-3 dBm while fulfilling all in-band and out-of-band emission masks. Its required matching network was realized by exploiting different behaviors of a 2:1 step-down transformer in differential and common-mode excitations. Furthermore, for both the proposed oscillator and power amplifier, accurate key analytical equations are derived to provide useful design insights.
APPENDIX A
Consider the switching current-source oscillator of Fig. 7. Since M 3−4 transistors work only in weak inversion and saturation during their on-state, short-channel modulation effects should be considered in the G DS4EF calculation in (5). It is well known that g ds4 (φ) = λ · I M 4 (φ), where I M 4 and λ are, respectively, the drain current and channel-length modulation coefficient of M 4. As a result, G DS4EF is estimated as
where I M 4,H2 is the 2nd harmonic of I M 4 . By considering λ = 4.8 V −1 , I M 4,H2 /I DC = 0.33, and I DC = 750 μA, the calculated G DS4EF becomes 1.2 mS, which agrees fairly well with the simulation results in Fig. 11(a) .
On the other hand, since M 1 works in saturation only for a short part of the oscillation cycle and its channel conductance, g ds1 , is much larger in the triode region, a square-law behavior in the G DS1EF calculation in (5) seems a good assumption. As a result, g ds1 may be estimated by
where 
By exploiting the G DS1EF definition and carrying out a lengthy algebra, we obtain
By replacing the oscillator's circuit parameters (V B = 0.45 V, …) in (A.3) and (A.4), the calculated G DS1EF is equal to 3.81 mS, which is in good agreement with the simulations [see Fig. 11(a)].
APPENDIX B
Consider the transformer-based matching network shown in Fig. 14(b) . The current through the secondary and primary windings of the ideal transformer can be respectively calculated by
, and
Furthermore, the voltage across the magnetizing inductance,
Consequently, the current through the leakage inductance,
, is calculated by
As a result, the total power dissipated in the transformers' secondary and primary is respectively estimated: 
The matching network efficiency, η p , is the ratio of power delivered to the load, P L , over total power: η p = P L /(P L + P rp + P rs ). By exploiting (B.4) and (B.5), (15) is obtained. On the other hand, the load Z L seen from the input ports of the matching network (see Fig. 14) can be calculated by
(B.6)
As a result, the equivalent series inductance and load resistance seen from the transformer's primary can be respectively estimated by
2 .
(B.8)
By exploiting Q L and ξ definitions, we have 
Robert Bogdan Staszewski

