A divider-less PLL exploits a phase detector that directly samples the VCO with a reference clock. No VCO sampling buffer is used, while dummy samplers keep the VCO spur below -56dBc. A modified inverter with low short-circuit current acts as a power-efficient reference clock buffer. The 2.2GHz PLL in 0.18μm CMOS achieves -125dBc/Hz in-band phase noise with only 700μW loop-components power.
Introduction
Clock multiplication PLLs with very low jitter have recently been proposed based on sub-sampling [1, 2] and injection locking [3, 4]. In a PLL, the VCO dominates the out-of-band phase noise while the loop components dominate the in-band phase noise. The sub-sampling (SS) PLL [1, 2] can achieve very low in-band phase noise because: 1) divider noise is eliminated; 2) the phase detector (PD) and charge pump (CP) noise is not multiplied by N², where N is the frequency multiplication factor. This paper describes a new SSPLL design aiming to drastically reduce the loop-components power while maintaining its superior in-band phase noise performance.
Proposed Low Power SSPLL
Fig. 1(a) shows the low power SSPLL architecture. A sub-sampling phase detector (SSPD) samples the VCO with a reference clock Ref and converts the VCO phase error into a sampled voltage variation. A CP converts the sampled voltage to current. A Pulser controls the CP gain and simplifies the SSPD design to a track-and-hold [1]. A frequency locked loop ensures correct frequency locking and is disabled after locking to save power. In an SSPLL the PD and CP noise contributions are low and thus their power can be scaled down progressively. The VCO and Ref buffers for the SSPD then become the bottlenecks for low power. In [1], they account for 30% and 60% of the total loop-components power, respectively. In this design, we propose two techniques to alleviate these bottlenecks: 1) direct sampling of the VCO without a buffer while keeping the disturbance to the VCO low; 2) power-efficient Ref buffering with drastically reduced short-circuit current.
Fig. 2 shows the LC VCO and SSPD schematic. Unlike in [1], no buffer is used between the VCO and the SSPD samplers. This saves power, as buffers running at f_VCO consume considerable power. The samplers use PMOS switches since the VCO DC level is high. A concern with this buffer-less direct VCO sampling is the disturbance to the VCO operation. When Ref turns the sampling switch on and off, the VCO is loaded and unloaded by the sampling capacitors C_sam. The VCO load, and thus f_VCO, changes accordingly, resulting in binary frequency shift keying (BFSK) and causing spurs at integer multiples of f_ref.
In order to reduce this effect, dummy samplers are added as [...] figure) compensates the inverter delay. Due to the complementary switching of the sampler and its dummy, the VCO load does not change over time and the BFSK effect is compensated. In reality, the compensation is not perfect due to the capacitor mismatch ΔC_sam between the sampler and its dummy. Since ΔC_sam scales with the value of C_sam, it is desirable to have a small C_sam for a low spur level. However, a smaller C_sam means more sampler noise.
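The trade-off between spur level and sampler noise can be illustrated with a rough open-loop estimate. The sketch below is not the authors' analysis: it assumes a small-signal LC tank (|Δf| ≈ f·ΔC/(2C)), narrowband FM with a rectangular frequency modulation at f_ref, and plain kT/C sampling noise. The tank capacitance `C_TANK`, the relative mismatch `MISMATCH`, and the function names are hypothetical values chosen only for illustration; f_VCO and f_ref are taken from the text.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def freq_step_hz(f_vco_hz, c_tank_f, delta_c_f):
    """Frequency step of an LC tank for a small capacitance change.

    Uses f = 1/(2*pi*sqrt(L*C)), so |delta_f| ~= f * delta_C / (2*C).
    """
    return f_vco_hz * delta_c_f / (2.0 * c_tank_f)


def bfsk_spur_dbc(freq_step, f_ref_hz, duty=0.5):
    """First reference spur (dBc) for rectangular FM at f_ref (narrowband FM).

    freq_step is the total frequency jump between the two VCO states.
    """
    # Fundamental Fourier amplitude of a rectangular wave of height freq_step.
    dev = (2.0 * freq_step / math.pi) * math.sin(math.pi * duty)
    beta = dev / f_ref_hz          # modulation index of the fundamental
    return 20.0 * math.log10(beta / 2.0)


def ktc_noise_vrms(c_sam_f, temp_k=300.0):
    """RMS sampled thermal noise voltage sqrt(kT/C)."""
    return math.sqrt(K_BOLTZMANN * temp_k / c_sam_f)


F_VCO, F_REF = 2.2e9, 55e6   # values from the text
C_TANK = 1e-12               # hypothetical tank capacitance
MISMATCH = 0.01              # hypothetical delta_C_sam / C_sam

for c_sam in (20e-15, 10e-15, 5e-15):
    step = freq_step_hz(F_VCO, C_TANK, MISMATCH * c_sam)
    print(f"C_sam={c_sam * 1e15:4.0f} fF  "
          f"spur={bfsk_spur_dbc(step, F_REF):6.1f} dBc  "
          f"kT/C noise={ktc_noise_vrms(c_sam) * 1e6:5.0f} uVrms")
```

Under these assumptions, halving C_sam lowers the estimated spur by about 6 dB while raising the kT/C noise voltage by about 3 dB, which is the tension described above. The locked loop and the actual Ref duty cycle are ignored here.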
With the phase domain model in Fig. 1(b), the in-band phase noise due to the samplers can be derived as [...] Fig. 1(b). Thus large inverters need to be used, at the expense of power. As the input slew rate (SR) is low and the output SR is high, power is wasted due to the "short-circuit" current caused by simultaneous conduction of the NMOS and PMOS transistors during switching.
In a sampling process, only one of the two clock edges is used as the sampling edge. In this SSPD design (Fig. 2) [...] Fig. 3 shows the proposed Ref buffer, which exploits this property to drastically reduce power. A similar circuit has been used in [2] to control the Ref duty cycle. Here we exploit it to achieve low power. The idea is to directly convey the critical edge and re-position the other, non-critical edge at a convenient place to avoid the short-circuit current. The buffer core is an inverter with an NMOS N1 and a PMOS P1. N1 is directly connected to XO as in a conventional inverter, while a timing control circuit (TCC) is inserted between P1 and XO. The TCC consists of two delay cells Δt_1 and Δt_2 and a few standard logic gates. It generates a narrow pulse V_GP from the XO and controls the gate of P1. As shown in Fig. 3, Δt_1 and Δt_2 are set such that the time when V_GP is low (P1 conducts) and the time when XO is higher than the threshold of N1 (N1 conducts) are non-overlapping. Since f_ref is low, this timing plan is easy to achieve. In this way, N1 and P1 never conduct simultaneously, thereby eliminating the short-circuit current. Since the Ref rising edge is the critical sampling edge, N1 is kept large to maintain a low sampling edge noise, while the TCC and P1 use small sizes to save power as they only add noise to the non-critical edge. The first block Inv1 in the TCC is a conventional inverter and has the slow XO as its input. It thus still has short-circuit current, but its contribution to the total buffer power is negligible as its size is small. The proposed buffer thus greatly reduces power while maintaining the critical edge's noise performance.
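The non-overlap condition on the two delays can be checked numerically. The sketch below is only an illustration under stated assumptions, not the circuit in Fig. 3: XO is modelled as an ideal sine wave, Inv1 is assumed to trip when XO falls through its mid-level, Δt_1 is assumed to set the delay from that trip point to the start of the V_GP low pulse, and Δt_2 is assumed to set the pulse width. The threshold value and all function names are hypothetical.

```python
import math


def n1_off_window(f_ref_hz, v_mid, amp, vth_n):
    """Time window (s) within a period where XO < vth_n, i.e. N1 is off.

    XO is modelled as v_mid + amp * sin(2*pi*f_ref*t) (assumed ideal sine).
    """
    s = (vth_n - v_mid) / amp
    if not -1.0 < s < 1.0:
        raise ValueError("threshold must lie strictly inside the XO swing")
    a = math.asin(s)
    period = 1.0 / f_ref_hz
    t_off_start = (math.pi - a) / (2.0 * math.pi) * period
    t_off_end = (2.0 * math.pi + a) / (2.0 * math.pi) * period
    return t_off_start, t_off_end


def p1_on_window(f_ref_hz, dt1_s, dt2_s):
    """Time window (s) where V_GP is low, i.e. P1 conducts.

    Assumption: the pulse starts dt1_s after XO falls through mid-level
    (t = T/2 in this model) and lasts dt2_s.
    """
    t_trip = 0.5 / f_ref_hz
    return t_trip + dt1_s, t_trip + dt1_s + dt2_s


def non_overlapping(f_ref_hz, v_mid, amp, vth_n, dt1_s, dt2_s):
    """True if P1 conducts only while N1 is off (no short-circuit path)."""
    off_start, off_end = n1_off_window(f_ref_hz, v_mid, amp, vth_n)
    p_start, p_end = p1_on_window(f_ref_hz, dt1_s, dt2_s)
    return off_start <= p_start and p_end <= off_end


# 55MHz, 1.8V p-p XO from the text; the 0.5V threshold is hypothetical.
print(non_overlapping(55e6, 0.9, 0.9, 0.5, dt1_s=2e-9, dt2_s=3e-9))
print(non_overlapping(55e6, 0.9, 0.9, 0.5, dt1_s=0.5e-9, dt2_s=3e-9))
```

With an 18ns reference period, the window in which N1 is off spans several nanoseconds in this model, which is consistent with the remark that the timing plan is easy to achieve at low f_ref.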
Experimental results
The 2.2GHz PLL was fabricated in standard 1.8V 0.18μm CMOS with an active area of 0.4 × 0.5 mm² (Fig. 4). Measured in-package with a 1.8V p-p 55MHz XO as input, the in-band phase noise ℒ_in-band at 200kHz is -125dBc/Hz, as shown in Fig. 4. The jitter integrated from 10kHz to 100MHz is 0.16ps rms. The PLL loop components consume 0.7mW and the VCO consumes 1.8mW. The worst-case reference spur measured from 20 chips while changing the Ref duty cycle is -56dBc. Fig. 5 summarizes the PLL performance and benchmarks it against other low-jitter PLLs. This design has the best PLL FOM. Note that we directly used a 55MHz sine-wave XO as the PLL input, while [3] used a 50MHz square wave and [4] used a 1GHz sine wave. Compared with [1], the loop-components power is 8x lower while ℒ_in-band is only 1dB worse. Compared with [2], the loop-components power is 3x lower while ℒ_in-band is 4dB better.
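The reported numbers can be combined in a short calculation. The text does not state the FOM formula, so the sketch below assumes the commonly used jitter-power figure of merit, FOM = 10·log10[(σ_t / 1s)² · (P / 1mW)], and assumes that P is the sum of the loop-components and VCO power. It also evaluates the multiplication factor N = f_VCO / f_ref and the 20·log10(N) factor by which PD and CP noise would be multiplied in a divider-based PLL. The function and variable names are hypothetical.

```python
import math


def pll_fom_db(jitter_rms_s, power_w):
    """Jitter-power FOM in dB (assumed definition, see lead-in)."""
    return 10.0 * math.log10((jitter_rms_s / 1.0) ** 2 * (power_w / 1e-3))


# Measured values quoted in the text.
jitter_rms_s = 0.16e-12
loop_power_w = 0.7e-3
vco_power_w = 1.8e-3
f_vco_hz = 2.2e9
f_ref_hz = 55e6

total_power_w = loop_power_w + vco_power_w   # assumption: FOM uses total power
mult_ratio = f_vco_hz / f_ref_hz             # N
n2_factor_db = 20.0 * math.log10(mult_ratio)  # N^2 expressed in dB

print(f"N = {mult_ratio:.0f}, N^2 factor = {n2_factor_db:.1f} dB")
print(f"Total power = {total_power_w * 1e3:.1f} mW")
print(f"FOM = {pll_fom_db(jitter_rms_s, total_power_w):.1f} dB")
```

A more negative FOM indicates a better jitter-power trade-off; the value computed here depends on the assumed definition and may differ from the entry in Fig. 5.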
