Abstract-We propose a new transmitter (TX) architecture for ultra-low power radios. An all-digital PLL employs a digitally controlled oscillator with switching current sources to reduce supply voltage and power without sacrificing its phase noise and startup margins. It also reduces 1/f noise allowing the ADPLL, after settling, to reduce its sampling rate or shut it off entirely during direct DCO data modulation. The switching power amplifier integrates its matching network while operating in class-E/F2 to maximally enhance its efficiency. The transmitter is realized in 28nm CMOS and satisfies all metal density and other manufacturing rules. It consumes 3.6 mW/5.5 mW while delivering 0 dBm/3 dBm RF power in Bluetooth Low-Energy.
I. INTRODUCTION
Ultra-low power (ULP) transmitters are key subsystems for wireless sensor networks and the internet-of-things (IoT). However, the system lifetime is severely limited by their power consumption and by available battery technology. Energy harvesting can enable further applications, but it provides lower supply voltages (on-chip solar cells: 0.2-0.8 V) than the 1 V supply of deep-nanoscale CMOS. Although dc-dc converters can boost the voltage, their poor efficiency (≤80%) introduces a significant power penalty. Hence, the following new techniques are exploited in this work to enhance the ULP transmitter (TX) efficiency. First, the most power-hungry circuitry, such as the digitally controlled oscillator (DCO) and the output stage of the power amplifier (PA), can operate directly at the low voltage provided by harvesters. Second, a new switching current source oscillator optimized for 28 nm CMOS reduces power and supply voltage without compromising the robustness of the oscillator start-up or loading its tank quality factor. Third, thanks to the low wander (i.e., low flicker noise) of the DCO, the digital power consumption of the rest of the all-digital PLL (ADPLL) is saved by scaling the rate of the sampling clock, down to the point of its complete shut-down. Last, a fully integrated differential class-E/F2 switching PA is utilized to achieve high power-added efficiency (PAE) at a low output power of 0-3 dBm.
II. SWITCHING CURRENT SOURCE OSCILLATOR
The phase noise (PN) specifications are relaxed for IoT applications and can be easily met by LC oscillators as long as the Barkhausen start-up criterion is satisfied over process, voltage and temperature (PVT) variations. Consequently, reducing the DCO's power consumption (P_DCO) is the primary design goal for IoT applications. P_DCO can be calculated by [1]

$$P_{DCO} = V_{DD}\, I_{DC} = \frac{V_{DD}^{2}}{R_p} \cdot \frac{\alpha_V}{\alpha_I} \qquad (1)$$
where R_p is an equivalent input parallel resistance of the tank modeling its losses; α_I is the current efficiency, defined as the ratio of the fundamental component of the tank's current I_ω0 over the oscillator DC current I_DC; and α_V is the voltage efficiency, defined as the ratio of the drain oscillation amplitude V_osc (single-ended) over the supply voltage V_DD. Lower P_DCO is typically addressed by scaling up R_p = L_p·ω_0·Q_t simply via a large multi-turn inductor [2]. However, (1) indicates that topology parameters such as the oscillator's α_I and minimum supply voltage (V_DDmin) can also play an important role in the minimum achievable P_DCO.

Fig. 1 illustrates the proposed oscillator, which combines the best features of the traditional cross-coupled NMOS oscillator (i.e., low V_DD) and the complementary push-pull oscillator (i.e., high α_I) from the ULP standpoint. As can be gathered from Fig. 1, the oscillation voltage at G_B is high within the first half-period. Hence, only the M2 and M3 transistors are on and the current flows from the left to the right side of the tank's primary inductor 2L_p. However, M1 and M4 are turned on for the second half-period and the tank's current direction is reversed. Consequently, as in the push-pull structure, the tank current flow is reversed every half-period, thus doubling the oscillator's α_I to 4/π.
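The effect of α_I on the DC power can be checked numerically. The sketch below uses only the relations implied by the definitions above (V_osc = α_V·V_DD and I_ω0 = α_I·I_DC) plus the assumption that the tank swing equals R_p·I_ω0. The values chosen for `r_p` and `alpha_v` are hypothetical placeholders, not measured data from this work; only the 2/π versus 4/π comparison follows from the text.

```python
import math


def dco_power(v_dd, r_p, alpha_i, alpha_v):
    """DC power of an LC oscillator from its efficiency factors.

    Assumes V_osc = alpha_v * V_DD and V_osc = R_p * I_w0,
    with I_w0 = alpha_i * I_DC (definitions from the text).
    """
    v_osc = alpha_v * v_dd   # single-ended oscillation amplitude
    i_fund = v_osc / r_p     # fundamental tank current I_w0
    i_dc = i_fund / alpha_i  # DC current needed to supply I_w0
    return v_dd * i_dc       # P = V_DD * I_DC


# Hypothetical operating point (illustrative only).
V_DD = 0.5      # V, lowest supply quoted in the text
R_P = 500.0     # ohm, assumed tank input resistance
ALPHA_V = 0.3   # assumed voltage efficiency

p_single = dco_power(V_DD, R_P, 2 / math.pi, ALPHA_V)  # current reversed once per period
p_switch = dco_power(V_DD, R_P, 4 / math.pi, ALPHA_V)  # current reversed every half-period

print(f"alpha_I = 2/pi: {p_single * 1e3:.3f} mW")
print(f"alpha_I = 4/pi: {p_switch * 1e3:.3f} mW")
print(f"power ratio   : {p_single / p_switch:.2f}")
```

For a fixed swing and tank, doubling α_I halves the DC current and therefore the power, regardless of the placeholder values chosen.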
The minimum V_DD is determined by the bias voltage V_B = V_OD1 + V_gs3. Hence, M3,4 should work in weak inversion, keeping V_gs3 < V_t, to achieve a lower V_DDmin. Since M1-4 have the same DC gate voltage, the sub-threshold operation of M3,4 also provides enough overdrive V_OD for the switching current source devices M1,2 to operate in the saturation region at the DC operating point. Hence, unlike in traditional oscillators, the M3,4 devices must be a few times (i.e., 8×) larger than the current source devices to guarantee their weak-inversion operation. Furthermore, the oscillation swing at the DA/DB nodes cannot exceed V_OD1,2, which is chosen to be ~150 mV to satisfy the system's phase noise specification with a few dB of margin. Consequently, as with the cross-coupled NMOS oscillator, the proposed structure can operate at a V_DD as low as 0.5 V.
The transformer voltage gain (A) enhances the oscillation swing at the M1-4 gates to even above V_DD and guarantees start-up over PVT variations. Furthermore, the combination of A and the effective transconductance gain of M1-4 must compensate the tank losses. Hence, the contribution of M1-4 to the oscillator PN is reduced by A, which compensates the effect of the lower voltage efficiency (α_V) of this structure on the oscillator PN and FoM.
A larger tank input impedance R_p is also beneficial for reducing the oscillator's power consumption.
Hence, the PVT tuning capacitors are divided in the transformer's primary and secondary to roughly satisfy this criterion.
Switching the bias of the M1-4 devices reduces both their 1/f noise and the DC component of their effective impulse sensitivity function (ISF). Consequently, a much lower 1/f^3 PN corner is expected than in traditional oscillators [1].
III. CLASS-E/F2 POWER AMPLIFIER
Designing a fully integrated PA optimized for low output power (P_out < 3 dBm) with PAE > 40% is very challenging, especially when a differential structure is needed to satisfy the stringent 2nd-harmonic emission limits. To realize such a low P_out, one needs to employ a matching network with a large impedance transformation ratio (ITR) to increase the load resistance, r_L, seen by the PA transistor drains of Fig. 2.
Unfortunately, the differential structure and the imperfect magnetic coupling factor k_m of the matching network's transformer have the reverse effect of reducing r_L and thus the ITR. Hence, the transformer turns ratio (n:1) should be large (n > 4) to compensate for them. However, the Q-factor of the transformer windings, and thus its efficiency, drops dramatically for n > 2. Consequently, the PAE of published integrated PAs is relatively low (< 30%), or off-chip components are used in their matching networks [2]-[6].
The drain efficiency η_D of a class-E/F switch-mode PA can be calculated by [7]
where C_1 is the PA's required shunt capacitance to satisfy the class-E/F zero-voltage and zero-slope (ZVS) switching conditions [8]. R_on and C_out are, respectively, the on-state channel resistance and the output capacitance of the M1 transistor. Note that R_on × C_out is constant for a given technology and invariant to changes in the width of M1. F_I is defined as the ratio of the RMS value over the DC value of the M1 drain current. F_C is the PA waveform factor. Both F_I and F_C are merely a function of the matching network strategy and do not change over technology or PVT variations [8].
A smaller P_out can also be realized by using a lower V_DD for the PA's drains (e.g., 0.5 V) without any degradation of η_D, as gathered from (3). As a consequence, the required ITR will be smaller, which results in a better efficiency of the PA's output matching network. Furthermore, the drain voltage of the switching transistor is also limited to ≤1.5 V, alleviating reliability issues due to gate-oxide breakdown [9]. Eq. (3) also indicates that switching amplifiers with a smaller F_C·F_I^2 (see Table I) inherently demonstrate higher efficiency. For example, the class-E PA efficiency can be improved by realizing an additional open circuit as the PA switches' effective load at the 2nd harmonic 2ω_0 (i.e., class-E/F2 operation). Furthermore, class-E/F2 shows a better tolerance to C_out variations [7] due to the role of 2nd-harmonic tuning in smoothing the drain voltage waveform [8]. These benefits come at the expense of ~3× lower power gain for the PA transistors compared to that in the conventional class-E setup. However, the power gain of 28 nm NMOS devices is high enough at the relatively low frequency of 2.4 GHz that a 4.5 dB power gain penalty has a negligible effect on the total system efficiency.

Fig. 2(b) shows the equivalent circuit of the PA matching network in the differential mode. The transformer's secondary inductance 2L_s and capacitor C_2 resonate at ω_0 to optimize the matching network efficiency. Furthermore, the transformer's leakage inductance L_p(1 − k_m^2) and primary capacitor C_1 respectively realize the required series inductance and shunt capacitance of a class-E/F PA to satisfy its ZVS switching criteria.
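The link between a lower drain supply and a smaller ITR can be illustrated with a rough scaling sketch. It assumes the common switch-mode PA relation P_out = k·V_DD²/r_L, where `k` is a waveform-dependent constant (0.577 is the textbook value for an ideal single-ended class-E stage and is used here only as a placeholder; the class-E/F2 value differs), and it assumes a 50 Ω antenna load. Neither assumption comes from this work, and only the ratio between the two supply cases is meaningful.

```python
def dbm_to_watt(p_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)


def required_load_resistance(p_out_w, v_dd, k=0.577):
    """Load resistance r_L at the PA drains for a target output power.

    Assumes P_out = k * V_DD**2 / r_L; k is a placeholder waveform constant.
    """
    return k * v_dd ** 2 / p_out_w


R_ANT = 50.0                  # ohm, assumed antenna impedance
P_TARGET = dbm_to_watt(0.0)   # 0 dBm output power

for v_dd in (1.0, 0.5):
    r_l = required_load_resistance(P_TARGET, v_dd)
    itr = r_l / R_ANT         # impedance transformation ratio
    print(f"V_DD = {v_dd:.1f} V: r_L = {r_l:6.1f} ohm, ITR = {itr:4.1f}")
```

Under these assumptions, halving V_DD lowers the required r_L, and hence the ITR, by a factor of four, which is what allows a low turns ratio with better winding Q.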
As illustrated in Fig. 3, the step-down 2:1 transformer acts differently on the common-mode (CM) and differential-mode (DM) input signals. When the transformer's primary is excited by a CM signal at 2ω_0 [Fig. 3(b)], the magnetic flux excited within the two turns of the primary winding cancels itself out. Consequently, the transformer's primary inductance is negligible and no current is induced at the transformer's secondary (k_m-CM ≈ 0). Hence, R_L, L_s and C_2 cannot be seen by the 2ω_0 component of the drain current. Furthermore, the CM inductance seen by the PA transistors is mainly determined by the dimension of the trace between the transformer center-tap and the decoupling capacitors at the V_DD node, which must roughly resonate with C_1 at 2ω_0 to realize the class-E/F2 operation (see Fig. 2(c)).
IV. ALL-DIGITAL PHASE-LOCKED LOOP ARCHITECTURE
Fig. 4 shows a block diagram of the proposed ultra-low power (ULP) all-digital PLL (ADPLL), adapted from a high-performance cellular 4G ADPLL [10]. [...] its output by 2 to create a CKV clock vector, CKV0...3. A phase predictor ensures that the TDC input CKV' is < T_V/4 by selecting the CKV phase that is closest to FREF. The TDC output, after decoding, is normalized to T_V by the Δ_TDC/T_V multiplier, and the octal estimation, normalized to T_V/4, is added to produce the phase error φ_E. The DCO tuning word is updated based on φ_E.
The following architectural innovations (highlighted in blue) allow the ADPLL to support ULP operation. The effective sampling rate of the phase detector and its related DCO update is dynamically controlled by scaling down the frequency reference (FREF) clock and simultaneously adjusting the loop gain. During ADPLL settling, the full FREF rate is used, but afterwards the rate can be substantially reduced (e.g., by 8×), thus saving power in the digital circuitry. The resulting in-band PN degradation is tolerable due to the low PN of the DCO. In fact, freezing FREF would incur a sufficiently low frequency drift during Bluetooth 625 μs packets, while keeping in operation only the bare minimum of circuitry (highlighted in red).
V. MEASUREMENT RESULTS
Fig. 5 shows the die photo of the 0.75 mm² ULP TX in TSMC 1P9M 28 nm CMOS. The windings of both the DCO and PA transformers are realized in the top ultra-thick metal. However, they include many dummy metal pieces on all metal layers (M1-M9) to satisfy the very strict minimum metal density manufacturing rule of advanced (≤28 nm) technology nodes. [...] (blue line in Fig. 6). The reference spur is -80 dBc and the worst-case fractional spur is -60 dBc at BT-LE channels, as shown in Fig. 7(a). Fig. 7(b) shows the Bluetooth-LE 1 Mb/s GFSK modulation provided by the ADPLL, which fulfills all spectrum mask requirements with sufficient margin.
The PA output level is adjustable from -5 to +3 dBm and reaches a peak PAE of 41% (see Fig. 8(a)). The proposed TX consumes 3.6/5.5 mW during open-loop 1 Mb/s GFSK BT-LE modulation at 0/3 dBm output, resulting in η_TX = 28/36% total TX efficiency. The power consumption would increase by 0.8 mW with the TDC, variable counter and digital circuitry turned on when the ADPLL is clocked at 40 MHz FREF. Thus, even in closed loop, with η_TX = 23/32% at 0/3 dBm, it is still 1.5× better than the prior record. The TX power breakdown is also illustrated in Fig. 8(b). Table II summarizes the performance and compares it with leading ULP transmitters. The proposed ULP TX achieves the lowest power consumption and PN among them.
VI. CONCLUSION
We have proposed an ultra-low power Bluetooth LE transmitter that demonstrates the best reported power efficiency and phase purity, while abiding by the strict 28 nm CMOS technology manufacturing rules. A new switching current source oscillator combines the low supply voltage of the conventional NMOS cross-coupled oscillator with the high current efficiency of the complementary push-pull oscillator to reduce the oscillator supply voltage and dissipated power further than is practically possible in traditional oscillators. Furthermore, due to the low wander of the DCO, the digital power consumption of the ADPLL is reduced by scaling the rate of the sampling clock, down to the point of its complete shut-down. A fully integrated differential class-E/F2 switching PA is utilized to improve the system efficiency at a low output power of 0-3 dBm. Its required matching network is realized by exploiting the different behaviors of a 2:1 step-down transformer under differential-mode and common-mode excitations.
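The total TX efficiency figures quoted in the measurement results can be reproduced from the reported output powers and DC consumption. The sketch below is only a consistency check; the `fref_divider` parameter and the assumption that the 0.8 mW digital overhead scales linearly with the FREF rate are illustrative additions, not results reported in this work.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def tx_efficiency(p_out_dbm, p_open_loop_mw, p_digital_mw=0.0, fref_divider=1):
    """Total TX efficiency: RF output power over total DC power.

    p_digital_mw is the extra closed-loop power at the full FREF rate.
    Dividing it by fref_divider assumes purely dynamic power that scales
    linearly with clock rate (an assumption, ignoring leakage).
    """
    p_dc_mw = p_open_loop_mw + p_digital_mw / fref_divider
    return dbm_to_mw(p_out_dbm) / p_dc_mw


# Reported operating points: (output power in dBm, open-loop DC power in mW).
points = [(0.0, 3.6), (3.0, 5.5)]

for p_out, p_dc in points:
    open_loop = tx_efficiency(p_out, p_dc)
    closed_loop = tx_efficiency(p_out, p_dc, p_digital_mw=0.8)
    scaled = tx_efficiency(p_out, p_dc, p_digital_mw=0.8, fref_divider=8)
    print(f"{p_out:+.0f} dBm: open {open_loop:.1%}, "
          f"closed {closed_loop:.1%}, FREF/8 (assumed) {scaled:.1%}")
```

The open-loop and closed-loop cases round to the 28/36% and 23/32% values given in the text; the FREF/8 case is a hypothetical intermediate point between them.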
