Abstract: This paper presents several design techniques to widen the operating frequency range and increase the locking range of dynamic current-mode latch-based inductor-less millimeter-wave frequency dividers. A self-calibration technique is introduced to guarantee frequency locking over a wide frequency range with low input amplitude and over process, voltage, and temperature variations, thereby optimizing power consumption and ensuring robustness. Three divide-by-four prototypes incorporating these techniques are designed to cover a frequency range exceeding 16-67 GHz. Fabricated in a 65-nm CMOS technology, the prototypes achieve min-average-max increases of 10%-47.5%-121.4% in fractional bandwidth over the state of the art. The first and third prototypes consume 3.7/6.2 mA (min/max) from a 1-V supply and achieve a figure of merit of 4.1/7.15 GHz/mW (min/max), while the second prototype consumes 3.9/6.7 mA (min/max) from a 1.1-V supply and achieves a figure of merit of 3/6.98 GHz/mW (min/max). The prototypes occupy active areas of 11 × 42/11 × 53 µm² (min/max). Finally, with the self-calibration scheme, the dividers operate with input power less than −10 dBm and supply voltage variation of ±100 mV.
I. INTRODUCTION
The millimeter-wave (mm-wave) frequency bands have gained wide acceptance in current applications, such as short-range wireless personal and local area networks, and automotive radar. Mm-wave standards are also being developed to serve new applications, including cellular communication and wireless backhaul, in several new bands including 28, 36, 45, 73, and 79 GHz. Frequency synthesizers with wide tuning range, high resolution, low cost, and low power consumption are essential components in such applications [1]-[6], and are therefore of high current interest. In mm-wave transceivers, common approaches to local oscillator (LO) generation include fundamental frequency LO generation [7]-[10] and two or three times frequency multiplication [11], [12]. Also, automotive mm-wave radar transceivers typically use direct frequency modulation using fundamental frequency synthesizers [4].
In all such transceivers, the high-speed frequency divider is a key constituent block that poses serious design challenges in terms of power/area consumption and operating range at mm-wave frequencies. Injection-locked frequency dividers (ILFDs) [13]-[16] can operate at high frequency with low power consumption, but suffer from large area, limited division ratio, and narrow locking range, which is further reduced at higher division ratios [17], [18]. Current-mode logic (CML) dividers [19]-[21] are another popular choice, but are very power hungry at mm-wave frequencies. The Miller divider [22], [23] is another candidate that can operate at high frequency with low power consumption, but suffers from a smaller locking range compared to CML dividers. In [24], a new class of high-speed inductor-less frequency dividers based on dynamic CML (DCML) latches [Fig. 1(a)] was introduced to achieve wide operating range and low power consumption with small die area. In [25] and [26], the load modulation technique was introduced into the DCML latches to improve their locking range and extend the minimum operating frequency (F min ). However, both these dividers suffer from several shortcomings. First, the leakage (OFF-current) of the tail current source (M N1 ) and finite OFF-resistances of the load transistors (M P1 , M P2 ) significantly increase F min , thus degrading the locking range. Second, the operating frequency range (F max − F min ) is extremely sensitive to parasitic capacitance and resistance, and to process, voltage, and temperature (PVT) variations. Third, the operating characteristic of the divider is divided into several narrow "sensitivity bands" [27, Fig. 8]; wideband operation is achieved by tuning the bias currents and the loads separately for each band. This adjustment is performed manually in [24] and [27], which is impractical in an mm-wave frequency synthesizer.
Fourth, the sensitivity bands in the topology of [24] and [27] become narrower at lower input amplitudes that are typically produced by the LO buffer in a practical mm-wave synthesizer. Nonetheless, it is worth noting that the performance of this divider improves with technology scaling [26] , [27] .
This paper introduces several design techniques to increase the operating range under small input amplitude conditions, and demonstrates their effectiveness through three prototype topologies. In Section II, an accurate time-domain analysis of current DCML dividers [24] , [27] is presented in order to elucidate their limitations. Based on these insights, several new design techniques are proposed and applied to three different topologies, as described in Section III. In the first topology, current bleeding and inter-latch source coupling are introduced to increase F max and decrease F min . The second topology judiciously mixes devices with different threshold voltages to maximize the locking range. The third topology introduces bulk modulation of the tail transistor and adaptive bulk biasing of the load transistors to increase locking range and F max , respectively. Despite these improvements, several bands with individually optimized biases and loads are necessary to cover a wide frequency range. In order to address this issue, a calibration technique is introduced that automatically adjusts bias and load conditions, thereby enabling the divider to lock over an extremely wide frequency range while minimizing power consumption and enhancing robustness over PVT variations. This is described in Section IV. Characterization of the three 65-nm CMOS prototypes, presented in Section V, proves robust self-calibrated operation from 16 to 67 GHz with input amplitude as low as 100 mV (−10 dBm). Section VI compares the proposed divider topologies, and Section VII concludes this paper.
II. LIMITATIONS OF THE CONVENTIONAL INDUCTOR-LESS FREQUENCY DIVIDER

A. Effect of Leakage Current (I OFF ) and Finite OFF-Resistance (R OFF )
The load-modulated latch used in the DCML divider [26] is shown in Fig. 1(b). For a given input amplitude, the bias voltages of the tail current source (V BN ) and the PMOS loads (V BP ) determine the maximum injection current (I ON ) and the minimum ON-resistance (R ON ) of the PMOS loads, respectively. The baseline DCML divider using load-modulated latches was analyzed in [26] under the assumption that the tail current source and output PMOS transistors are completely turned OFF during the hold phase (i.e., R OFF = ∞ and I OFF = 0). In this section, we present a more accurate time-domain analysis including these effects that provides additional insights necessary to further enhance the operating frequency range.
Waveforms of a load-modulated latch in a locked DCML divider [ Fig. 1(a) ] at low-input frequency are shown in Fig. 2(a) by dotted lines for the ideal case, and by solid lines for the case with finite I OFF and R OFF . Each latch has two read phases when the tail current source is on and the load capacitances (C L ) (dis)charge causing the outputs to change according to the input data; for the first and third latches in Fig. 1(a) , the read phases are (t 0 − t 1 ) and (t 2 − t 3 ), during which CLK is high. During the read phases, assuming that the input differential voltage is sufficiently large to commutate the current in the differential pair completely, the differential output voltage (V P − V N ) tends toward an amplitude I ON R ON with time constant R ON C L . Each latch also has two hold phases, during which the tail current source and PMOS loads are turned OFF and the load capacitances maintain their output voltages; for the first and third latches, the hold phases are (t 1 − t 2 ) and (t 3 − t 4 ), during which CLK is low.
To ensure correct operation of the divider, the differential output voltage at the end of each read and hold phase should be higher than the voltage V SW required to switch the following latch. In addition, to achieve lock at high frequency with a practically feasible input amplitude, the bias voltage V BN should be set sufficiently high to increase the injection current during the read phases, while V BP should be small enough to decrease the equivalent resistance of the load transistors. However, for a given input amplitude, this results in a significantly larger I OFF and lower R OFF during the hold phase, which in turn degrades (i.e., increases) F min . It can be observed from Fig. 2(a) that the output waveforms with finite I OFF and R OFF are similar to the ideal case during the read phases, albeit with different voltage levels. However, during the second hold phase (t 3 − t 4 ), when the inputs exchange states, the differential output voltage (V P − V N ) decreases toward −I OFF R OFF with time constant R OFF C L , thus leading to an increase in F min . An expression for F min can be derived with reference to Fig. 2(b), where the second hold phase (t 3 , t 4 ) is divided into two regions: 1) the switching region (t 3 , t 3x ), during which (V INP − V INN ) changes from −V SW to V SW and the output voltages (V P , V N ) remain approximately constant and 2) the decaying region (t 3x , t 4 ), where (V P − V N ) starts to decay toward −I OFF R OFF instead of remaining at I ON R ON as in the ideal case. F min can be expressed in terms of the switching time T SW and the decaying time T decay as F min = 0.5/(T decay + T SW ); T SW and T decay can be found using first-order RC equivalent circuit analysis (see the Appendix). Assuming that T decay ≫ T SW at low input frequency, and that V SW is between 0.25I ON R ON and 0.75I ON R ON [26], which are both realistic assumptions in the DCML latch, an approximate expression for F min can be written as follows:
Fig. 3(a) compares the analytical expression (1) against simulations of a divide-by-four divider using idealized DCML latches, wherein the PMOS loads were switched dynamically between explicit resistors R ON (during the read phase) and R OFF (during the hold phase), the tail current source was switched dynamically between ideal current sources I ON (during the read phase) and I OFF (during the hold phase), and actual transistors were used for the input transistors (M n2 , M n3 ). It can be observed that the maximum error between the analytical expression and the simulation is ±7.1%.
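The first-order RC reasoning behind F min can be illustrated numerically. The sketch below is not the paper's closed-form expression; it only evaluates the decay described in the text, assuming the held output starts at I ON R ON and relaxes toward −I OFF R OFF with time constant R OFF C L until it reaches V SW. The function names, the values of I ON , R ON , R OFF , and V SW , and the default T SW = 0 are illustrative assumptions; the 58-fF load and the two I OFF values are taken from numbers quoted elsewhere in this paper.

```python
import math


def decay_time(i_on, r_on, i_off, r_off, c_load, v_sw):
    """Time for the held output to fall from I_ON*R_ON to V_SW.

    First-order RC model (an assumption of this sketch): the differential
    output relaxes toward -I_OFF*R_OFF with time constant R_OFF*C_L.
    """
    v_start = i_on * r_on      # output level at the start of the decaying region
    v_final = -i_off * r_off   # asymptote set by leakage and OFF-resistance
    if not v_final < v_sw < v_start:
        raise ValueError("V_SW must lie between -I_OFF*R_OFF and I_ON*R_ON")
    tau = r_off * c_load
    return tau * math.log((v_start - v_final) / (v_sw - v_final))


def f_min(i_on, r_on, i_off, r_off, c_load, v_sw, t_sw=0.0):
    """F_min = 0.5 / (T_decay + T_SW); T_SW defaults to 0 (T_decay >> T_SW)."""
    return 0.5 / (decay_time(i_on, r_on, i_off, r_off, c_load, v_sw) + t_sw)


# Hypothetical operating point: only c_load and the I_OFF values come from the text.
params = dict(i_on=3e-3, r_on=150.0, r_off=1e3, c_load=58e-15, v_sw=0.225)
f_more_leakage = f_min(i_off=1.22e-3, **params)
f_less_leakage = f_min(i_off=0.80e-3, **params)
print(f"F_min with I_OFF = 1.22 mA: {f_more_leakage / 1e9:.1f} GHz")
print(f"F_min with I_OFF = 0.80 mA: {f_less_leakage / 1e9:.1f} GHz")
```

Under this model, a larger I OFF pulls the asymptote further below V SW , shortens T decay , and therefore raises F min , which is the trend discussed above. The absolute numbers depend entirely on the assumed component values.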
B. Effect of PVT Variations
Equations (1) and (2) show that the operating range of the DCML divider is highly sensitive to the parasitic capacitance and resistance of the load transistors. The former is exacerbated by modeling/extraction uncertainties, while the latter varies with supply voltage, threshold voltage, and process parameters. Fig. 4(a) shows the simulated variation in self-oscillation frequency (F OSC ) of the DCML divider (Fig. 1) with supply voltage and load capacitance (C L ). The bias voltages of the tail current source (V BN ) and the PMOS loads (V BP ) are set to 650 and 325 mV, respectively. Load capacitance is swept with the supply voltage held at 1 V, and supply voltage is swept with an extra 15 fF (as an estimate of layout parasitics) added to the 43-fF intrinsic capacitance at the output nodes. It can be seen that variations as small as a few millivolts in supply voltage or a few femtofarads in C L cause shifts of several GHz in F OSC . In addition, for a given input amplitude, process and temperature variations affect the threshold voltage, which in turn causes variations in the maximum injection current, the switching threshold V SW of the differential pair, and the equivalent resistance of the output transistors. Fig. 4(b) shows that, according to simulations, the sensitivity curve can shift by more than 25 GHz over process and typical supply voltage variations (100 mV). This underscores the need for automatic calibration. Furthermore, it is seen that the operating range becomes narrower at higher V DD and at the fast-fast corner; this is due to: 1) higher I OFF of the tail current source and load transistors, which increases F min and 2) higher V OV of the differential pair, which limits F max . Moreover, PVT and mismatch variations degrade the input duty cycle, which in turn degrades both F max and F min by decreasing the time available for the read or hold phases.
III. PROPOSED WIDE LOCKING RANGE FREQUENCY DIVIDER TOPOLOGIES
A. First Topology (DIV1): Current Bleeding and Source Coupling
Fig. 5(a) shows the schematic of the DCML divide-by-four circuit employing source-coupled DCML latches, each of which [Fig. 5(b)] can be configured in one of four modes: 1) baseline mode (DIV1-0) with both M S1 and M S2 off; 2) source-coupled mode (DIV1-SC) with M S1 on and M S2 off; 3) current bleeding mode (DIV1-CB) with M S1 off and M S2 on; and 4) combined current bleeding and source-coupled mode (DIV1-CB+SC) with both M S1 and M S2 on. The baseline DIV1-0 latch and the resulting divider provide a reference against which the proposed dividers can be compared in the same technology node. DIV1-0 has been designed to maximize locking range for a given power consumption; the methodology described in [26] and [27] is used to size the output loads, tail current source, and input differential pair.
As noted in Section II, the time required for the differential output voltage to exceed V SW during the read phase (t 0 − t 1 ) should be reduced in order to increase the operating frequency. This can be done by increasing the injection current (I ON ), but in order to do so without increasing the overall current consumption, the sources of the differential pairs of alternate latches driven by the same clock (i.e., L 1 − L 3 and L 2 − L 4 ) are coupled as shown in Fig. 5(a). The operation of this technique can be understood by examining the idealized waveforms of latch L 3 during its critical read phase (t 0 − t 1 ), as shown in Fig. 6(a). During this period, L 1 is in its non-critical read phase, while L 2 and L 4 are in their hold phases. In essence, source coupling uses a fraction of the tail current of L 1 to provide the injection current in L 3 during the critical read phase of L 3 (t 0 − t 1 ). This is achieved at the expense of less injection current in L 3 during its non-critical read phase (t 2 − t 3 ).
Conceptually, there are two reasons why source coupling increases injection current and therefore F max , as described next. To understand the first reason, refer to V IN3 ) . Therefore, in the baseline divider where the sources are not coupled, the voltage at node X L1 is higher than that at node X L3 . Hence, the injection current in L 1 is higher than that in L 3 . However, this simply increases power consumption without increasing F max since L 1 is in its non-critical read phase while L 3 is in its critical read phase. On the other hand, in DIV1-SC where the source nodes X L1 and X L3 are coupled to each other, the voltage at the coupled node attains a level between the levels of X L1 and X L3 in the un-coupled case. Therefore, for the same power consumption, higher injection current is available to L 3 compared to the baseline case. This, in turn, increases F max .
To understand the second reason, refer to (V IN2 ) . Therefore, in the baseline divider, node X L4 is at a higher voltage than X L2 . When X L2 and X L4 are coupled as in DIV1-SC, the voltage at the coupled node is lower than the voltage at X L4 in the baseline divider. Therefore, the average current flowing into L 4 (which is in its second hold phase) during (t 0 −t 1 ) is lower than the baseline case, while the average current flowing into L 2 (which is in its first hold phase) is higher than the baseline case. Therefore, the differential output of L 2 (differential input of L 3 ) increases faster than in the baseline case, which increases the source voltage of input pairs of L 3 . This, in turn, leads to higher injection current in L 3 due to channel length modulation of the tail transistor in L 3 . The insights gained above are verified by simulation. Fig. 7(a) compares the rms on-current I L3 during first and second read phases of DIV1-0 and DIV1-SC when the tail current source is biased at 600 mV and driven by 66-GHz signal. As predicted by the above analysis, the source coupled case achieves higher injection current in the critical read phase, and lower injection current in the non-critical read phase, thereby leading to an increase in F max without increasing overall power consumption.
The source coupling technique nonetheless suffers from a drawback at low input frequencies. Due to faster switching in the first read phase, the differential output voltage at the end of this phase is higher in DIV1-SC than in DIV1-0, as shown in Fig. 6(a). Since this is the second hold phase of the following latch (L 4 in this discussion), it exacerbates the effect of I OFF for the following latch (Fig. 2). However, as discussed in the previous paragraph, the OFF-current that flows through the source of the input pairs of any latch in DIV1-SC (i.e., L 4 in this discussion) decreases during the second hold phase; this partially alleviates the impact of source coupling on F min . The net result of these two effects is a small increase in F min in DIV1-SC.
In order to extend the operating frequency range downward, a current bleeding transistor M P3 is introduced next (DIV1-CB). During the read phase, with CLK n low and CLK p high, M P3 is turned OFF and the latch operates as before. During the hold phase, M P3 is turned on and absorbs most of I OFF . This helps maintain the correct output state, thereby decreasing F min . The size of M P3 has an important effect on the operating frequency range. Increasing the size of M P3 helps to absorb a larger I OFF , which lowers F min . However, this increases the leakage current through M P3 during the read phase and adds capacitance at the source of the input pair, which absorbs a part of the injection current during the read phase. Post-layout simulations of DIV1-CB conducted with different sizes for M P3 resulted in an optimum device size of 4 μm/60 nm.
In order to improve both F max and F min , source coupling and current bleeding were combined, resulting in DIV1-CB+SC. Fig. 7(b) compares the simulated average rms current flowing in the input pair during the hold phases of the four DIV1 configurations. It can be observed that the current bleeding technique reduces I OFF significantly and this reduction increases with increasing input amplitude. This in turn decreases F min . For example, examination of Fig. 7 (b) together with Fig. 3(a) reveals that at 100-mV input amplitude, current bleeding decreases I OFF from 1.22 mA to 800 μA, resulting in a decrease in F min of more than 8 GHz.
B. Second Topology (DIV2): Mixed V t Design
The second proposed latch topology, shown in Fig. 8, uses devices with different threshold voltages. Low-V t (LVT) devices are used in: 1) the tail current source to increase the injection current and 2) the input differential pair to achieve complete current commutation faster than with regular-V t (RVT) devices. From (2), this increases F max for a given input amplitude and input capacitance. In a mm-wave divider, the bias voltage V BP of the PMOS loads is usually made small (e.g., <300 mV) to decrease the load time constant. However, for practical input amplitudes, the PMOS loads do not turn completely OFF, resulting in a relatively small equivalent resistance during the hold phase. This degrades F min and results in a narrow locking range. To mitigate this degradation, high-V t (HVT) devices are used for the PMOS loads. Fig. 9 compares simulated OFF and ON resistances of different V t devices versus gate voltage; the OFF and ON resistances are calculated with V DS of 1 V and 150 mV, respectively. For the same gate voltage, it can be seen that the OFF resistance of HVT devices is much higher, at the expense of slightly higher ON resistance. Thus, using HVT loads, F min decreases significantly while F max decreases slightly. To compensate for the degradation in F max , higher injection current or higher supply voltage can be used at the expense of power consumption. Alternatively, as shown in Fig. 8, PMOS bulk adaptation (V BL ) is proposed to change the effective R ON and R OFF of the PMOS loads. It can be seen from (1) and (2) that the ratio F max /F min is proportional to the ratio R OFF /R ON of the PMOS loads [26]. Fig. 11(a) compares R OFF /R ON of different V t devices versus their bulk voltages. It can be seen that HVT devices have a higher R OFF /R ON ratio for the same F max . For example, with V BLR = 1.1 V and V BLH = 0.7 V, the R OFF /R ON of an HVT device is 1.27 times that of an RVT device, which results in a wider locking range. Figs. 10 and 11(a) show that, for a divider with any of the aforementioned latch topologies and for a given power budget, bulk biasing can be used to increase F max at the expense of a narrower locking range, or to widen the locking range for lower operating frequencies.
C. Third Topology (DIV3): Bulk Modulation and Bulk Control
As discussed in Section III.B, for a given tail transistor size and bias, using an LVT device increases F max . On the other hand, F min also increases due to the large leakage current during the hold phase. To overcome this tradeoff, bulk modulation is introduced in the tail current source, as shown in Fig. 12. Here, the clock is applied to the bulk of the tail transistor with a bias voltage V BLn , which is different from the gate bias V BN , as shown in Fig. 12(c). During the read phase, with CLKn high, the bulk voltage of the tail transistor increases, which decreases the threshold voltage and in turn increases the maximum injection current. During the hold phase, with CLKn low, the threshold voltage increases, thus reducing the leakage current. The post-layout simulated threshold voltage variation was found to be −200 mV/V.
The bulk modulation technique improves both F min and F max . Fig. 11(b) compares the average ON and OFF currents of the tail current source (10 μm/60 nm), biased in saturation, with and without bulk modulation when it is driven by a 40-GHz clock. For example, as shown in Fig. 11(b), with 150-mV input amplitude, bulk modulation increases the average I ON from 3.23 to 3.79 mA, and decreases the average I OFF from 690 to 495 μA. From (1) and (2), this improves F max and F min by 8 and 4.2 GHz, respectively. Note that the average power consumption of the divider with and without bulk modulation is approximately the same. Additionally, the designed DIV3 incorporates bulk control (V BL ) of the PMOS loads, as discussed in Section III.B.
IV. SELF-CALIBRATION AND CURRENT MINIMIZATION
Despite the enhancement achieved by the aforementioned techniques in the frequency range covered by each sensitivity band, setting an appropriate bias current and load bias voltage is necessary to enable ultra-wideband coverage. Moreover, as discussed in Section II, the operating frequency and locking ranges of all proposed divider topologies are extremely sensitive to PVT variations, which necessitate different optimal bias settings depending on PVT conditions. These challenges are exacerbated when the input amplitude to the divider is small; in practice, this amplitude, produced by a voltage-controlled oscillator buffer, will also vary with PVT. In order to facilitate practical applicability of the divider in a wide variety of usage scenarios, a self-calibration scheme that sets optimal bias conditions regardless of input frequency and PVT conditions is proposed in this section. The calibration circuit, shown in Fig. 13, implements two algorithms, frequency calibration and current minimization, and comprises a high-speed frequency detector, a reference frequency divider, resistive digital-to-analog converters (DACs), and a fully synthesized digital controller. Frequency detection is realized using an 8-bit synchronous counter which generates an 8-bit digital word representing the ratio of the divider output frequency to the reference frequency (F Ref ). Fig. 14(a) and (b) shows the implementation of the 8-bit counter and its timing diagram, respectively. The 8-bit high-speed counter is based on a binary ripple-carry incrementer. The incrementer is split into two smaller incrementers (mod-4 and mod-64) to achieve sufficient timing margin over PVT variations [28]. The first incrementer generates the trigger of the higher incrementer once its count reaches "10", as shown in the truth table (Fig. 14), to add extra timing margin [28] for the counter's critical path (i.e., the gating logic in Fig. 14). The true single-phase clock flip-flops in the counter are periodically reset by a signal generated by the reference divider, which also generates the reference signal. The counter and reference divider consume 2.6 mA from a 1-V supply.
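The frequency detector can be pictured with a short behavioral model. The sketch below is only a functional illustration, assuming that the counter is clocked by the divider output and read once per reference period; it models the split into a mod-4 stage and a mod-64 stage but not the early trigger at count "10" or any circuit timing. The class name, function name, and the 100-MHz reference used in the example are hypothetical.

```python
class SplitCounter:
    """8-bit counter modeled as a mod-4 stage feeding a mod-64 stage."""

    def __init__(self):
        self.low = 0    # 2-bit stage, advanced on every divider output cycle
        self.high = 0   # 6-bit stage, advanced when the low stage wraps

    def reset(self):
        """Periodic reset, issued by the reference divider in the real circuit."""
        self.low = 0
        self.high = 0

    def clock(self):
        self.low = (self.low + 1) % 4
        if self.low == 0:
            self.high = (self.high + 1) % 64

    @property
    def value(self):
        return (self.high << 2) | self.low


def frequency_word(f_out_hz, f_ref_hz):
    """Count divider output cycles during one reference period."""
    counter = SplitCounter()
    counter.reset()
    for _ in range(int(f_out_hz // f_ref_hz)):
        counter.clock()
    return counter.value


# Hypothetical example: 50-GHz input, divide-by-four output, 100-MHz reference.
word = frequency_word(50e9 / 4, 100e6)
print(word)
```

The two stages together behave as a single modulo-256 counter; the split only shortens the carry path that must settle within one output cycle.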
A. Frequency Calibration and Current Minimization Algorithms
The frequency calibration and current minimization algorithms are illustrated in the flowcharts shown in Fig. 15(a) and (b). Initially, the digital words controlling the bias voltages V BP and V BN of the PMOS loads and tail current sources (Fig. 5) are preset to their highest value, thereby setting V BP and V BN to their minimum and maximum values, respectively. This initializes the divider to operate in the sensitivity curve with the highest frequency and maximum injection current. For frequency calibration (load modulation state), the difference between the current and previous frequency words generated by the high-speed counter is compared with a programmable value (Abs_error) to detect a change in output frequency. The frequency change is averaged over a large programmable number (Max_cycles) of reference cycles. The calibration engine detects the frequency-unlocked state if the averaged frequency change exceeds a certain level (Max_Errors). If unlocked, the digital word controlling V BP is decreased and the other calibration state variables are reset. The above procedure is repeated until the divider locks to the right sensitivity curve and the frequency change becomes negligible (i.e., less than the reference frequency). Then, before switching to the current minimization algorithm, the frequency comparison is repeated for Max_lock reference cycles to ensure locking.
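One possible reading of the lock detection step is sketched below. It is an interpretation, not the synthesized controller: it assumes that a change between consecutive frequency words larger than Abs_error counts as one error, and that the divider is declared unlocked when the number of errors within a window of Max_cycles words exceeds Max_Errors. The function name and the example word sequences are hypothetical.

```python
def is_unlocked(words, abs_error, max_errors):
    """Flag loss of lock from a window of consecutive frequency words.

    words      -- counter outputs collected over Max_cycles reference cycles
    abs_error  -- tolerated change between consecutive words (Abs_error)
    max_errors -- tolerated number of out-of-tolerance changes (Max_Errors)
    """
    errors = 0
    for previous, current in zip(words, words[1:]):
        if abs(current - previous) > abs_error:
            errors += 1
    return errors > max_errors


# Locked divider: the word toggles by at most one count around its value.
steady = [125, 126, 125, 125, 126, 125, 125, 126, 125, 125, 126, 125, 125, 125]
# Free-running divider: the word keeps drifting from cycle to cycle.
drifting = list(range(100, 140, 3))

print(is_unlocked(steady, abs_error=1, max_errors=8))
print(is_unlocked(drifting, abs_error=1, max_errors=8))
```

A tolerance on both the size and the number of changes lets the detector ignore the small toggling of the counter output that occurs even when the divider is locked.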
In the current minimization phase, the digital word corresponding to V BN is decreased and frequency lock detection is repeated. If the divider still remains in lock after (Max_lock × Max_cycles) reference cycles, V BN is again decreased until a frequency unlocked state is detected. Finally, to return to the locked state with minimum current consumption, the digital word controlling V BN is increased by one step, whereafter the calibration engine resets and resumes monitoring the output frequency variation. The operation of the self-calibration is described in detail below for two cases under different PVT conditions.
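The two calibration phases can be summarized in a few lines of control logic. The sketch below is a simplified model under stated assumptions: lock detection is abstracted into a caller-supplied function `is_locked(vbp_word, vbn_word)`, the control words are assumed to be 4 bits wide (maximum value 15), and the Max_lock confirmation cycles are omitted. None of these names or widths are specified in this form in the text.

```python
from typing import Callable, Optional, Tuple


def calibrate(is_locked: Callable[[int, int], bool],
              max_word: int = 15) -> Optional[Tuple[int, int]]:
    """Return (vbp_word, vbn_word) after calibration, or None if no lock is found."""
    # Both words start at their highest value (highest band, maximum current).
    vbp_word = max_word
    vbn_word = max_word

    # Frequency calibration: step the load word down until the divider locks.
    while not is_locked(vbp_word, vbn_word):
        if vbp_word == 0:
            return None          # swept every band without locking
        vbp_word -= 1

    # Current minimization: lower the bias word until lock is lost,
    # then step back up by one to recover the last locked setting.
    while vbn_word > 0:
        vbn_word -= 1
        if not is_locked(vbp_word, vbn_word):
            vbn_word += 1
            break
    return vbp_word, vbn_word


def toy_lock_model(vbp_word: int, vbn_word: int) -> bool:
    """Hypothetical divider: locks only in band 7 with a bias word of at least 12."""
    return vbp_word == 7 and vbn_word >= 12


print(calibrate(toy_lock_model))
```

With the toy lock model, the search settles at the lowest bias word that still holds lock in the only band that locks, which mirrors the minimum-current objective of the second phase.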
Case 1 (Abrupt Frequency Decrease): Fig. 16 shows the simulation results of the calibration scheme when a 50 GHz, 100-mV input signal is applied initially to DIV1-CB with supply voltage of 1 V, temperature of 27°C, and typical process corner. Table I shows the nominal values of the programmable calibration parameters. The calibration operation can be divided into several regions, as shown in Fig. 17 .
Initially, the divider operates at the highest sensitivity curve and calibration begins in the load modulation mode. The calibration engine modulates the output load by decreasing the bias voltage V BP to move across the sensitivity curves, as shown by (1) in Fig. 17(a). Once locking is achieved, the calibration engine switches to the current minimization mode. In this mode, shown by (2) in Fig. 17(b), the bias voltage V BN is progressively decreased, thereby decreasing the injection current in the divider and effectively narrowing the sensitivity curve. Eventually, the divider loses lock, whereupon the calibration engine switches V BN back to its previous value, causing lock to be achieved once again, as shown by (3) in Fig. 17(b). This results in minimum power consumption at 50 GHz under the specific PVT condition. Next, the input frequency is switched to 28 GHz, which causes the divider to lose lock. The calibration engine responds by setting the injection current to its highest value and updating V BP to modulate the load until lock is achieved once again [(4) and (5) in Fig. 17(c)]. Thereafter, the calibration engine switches to the current minimization mode as described previously.
Case 2 (Duty Cycle Variation and Abrupt Frequency Increase): In this case, the duty cycle of the input waveform is reduced to below 50%, followed by an abrupt increase in frequency. The operation of the calibration is illustrated for divider DIV3, which is assumed to operate initially with a 36-GHz, 150-mV input at 85°C in the fast-fast process corner with a supply voltage of 0.9 V. Simulation results are shown in Fig. 18. The divider is assumed to have initially settled into an "optimally" locked condition with digital words V BP = 7 and V BN = 12 [i.e., Point X in Fig. 18(b)]. When the duty cycle of the input signals is switched from 50% to 40%, the sensitivity curves become narrower and the divider becomes unlocked. The calibration engine responds by setting the bias current to its maximum value (by changing V BN ) and then updates V BP and V BN until lock is achieved once again [(1) in Fig. 18(b)]. When the input frequency is switched abruptly to 60 GHz [region (2) in Fig. 18(c)], the calibration engine searches for the optimal sensitivity curve by first decreasing the digital word controlling V BP down to its lowest value and then resetting it to its maximum value.
[Figure caption: Simulation results of background calibration scheme for Case 1: DIV1-CB (typical process corner, Temp = 27°C, and V DD = 1 V).]
B. Calibration Parameters
Calibration accuracy depends on the parameters Abs_error and Max_Errors. Ideally, when the divider is locked, these values should converge to zero. In practice, due to PVT variations and supply voltage noise in the frequency detector, the counter output toggles about the correct value. Therefore, the programmable parameters Abs_error and Max_Errors are used to increase the margin for PVT variation. Extensive post-layout simulations of the divider with a semi-behavioral model of the calibration loop (where only the high-speed counter is replaced with an extracted circuit) under PVT variations have been used to set the values of Abs_error and Max_Errors; the calibration has been verified to operate correctly for settings of 1/8, 2/4, and 4/2.
In addition, to improve calibration operation and reduce the effect of the parameters on calibration performance, the impact of supply variation can be alleviated by using an on-chip supply regulator for the frequency detection circuit and by increasing the averaging time (Max_cycles). The calibration time can be reduced by decreasing the number of averaging cycles and locking cycles, or by increasing the reference frequency. However, this reduction comes at the expense of calibration accuracy, power consumption, and complexity of the fully synthesized digital controller. Calibration performance can also be improved by adapting Max_Errors after locking is detected (i.e., decreasing the value of Max_Errors during locking to improve calibration accuracy). Finally, it is noted that this calibration scheme can be scaled to accommodate dividers with moduli greater than four by changing the number of bits in the high-speed counter. The minimum number of bits for the frequency detector can be expressed as
where F in and N are the input frequency of the divider and the division ratio, respectively.
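As a rough sizing aid, the counter width can be estimated from the largest count it must hold. The sketch below is not the expression referred to above; it assumes that the counter accumulates divider output cycles over one reference period, so that the largest count is F in /(N F Ref ). The function name and the 100-MHz reference in the example are hypothetical.

```python
def min_counter_bits(f_in_hz, division_ratio, f_ref_hz):
    """Bits needed to hold the count of output cycles in one reference period.

    Assumes (for this sketch only) that the largest count is
    f_in / (division_ratio * f_ref).
    """
    max_count = int(f_in_hz / (division_ratio * f_ref_hz))
    return max(1, max_count.bit_length())


# Hypothetical example: 64-GHz input, divide-by-four, 100-MHz reference.
bits = min_counter_bits(64e9, 4, 100e6)
print(bits)
```

Under this assumption the required width depends only on the ratio of the divider output frequency to the reference frequency, so either a different division ratio or a different reference frequency changes the number of bits needed.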
V. CHARACTERIZATION AND DISCUSSION
A chip (Fig. 19) with three prototypes using the proposed techniques was fabricated in a 65-nm CMOS process. The core of each divider has dimensions of 11 μm × 43 μm. A top-level schematic representative of all three prototypes is shown in Fig. 20. The input is provided externally via an on-chip balun. The divider outputs are converted to CMOS levels by a two-stage CML-to-CMOS converter, which comprises a differential amplifier followed by CMOS inverters that are coupled using back-to-back inverters to maintain 50% duty cycle. A buffer chain is used to drive 50-Ω measurement loads. The supply voltage for the divider core is provided by an on-chip low-dropout regulator, while the supply voltage for the digital circuits is provided from an external dc source. All measurements were performed using on-wafer probing.
It is noted that a separate "conventional" divider (Fig. 1), with devices M S1, M S2, and M P3 removed from the layout, was not implemented on the test chip in order to save silicon area. However, to verify that DIV1-0 provides a fair reference for comparison, the locking ranges of DIV1-0 and the conventional DCML divider were compared using post-layout simulation. The input capacitance and the capacitance at node X [Fig. 5(b)] of DIV1-0 were estimated to be 80 and 156 fF, compared to 77 and 151 fF, respectively, for a conventional divider. The locking ranges of DIV1-0 and the conventional divider were estimated from post-layout simulation as 41.75-71 GHz and 41.5-71.25 GHz, respectively, at 0-dBm input power. Their locking ranges at −10-dBm input power were 51.5-64.5 GHz and 51.4-64.8 GHz, respectively.

Fig. 21(a) compares measurements and post-layout simulations of the self-oscillation frequency F osc of three samples of the baseline topology DIV1-0. The simulations (conducted on an RC-extracted divider core with EM-modeled supply, ground, and output signal paths) indicate that by changing the PMOS bias voltage V BP, F osc can be varied from 5 to 20 GHz, which indicates an F max of 80 GHz. Fig. 21(b) compares the measured and post-layout simulated sensitivity curves of DIV1-0 for different bias voltages V BP. A standalone on-chip balun was characterized for the purpose of de-embedding test setup losses. The maximum/minimum losses of the overall test setup were measured to be 24/18.2 dB at 16/45 GHz, and they have been de-embedded. The measured sensitivity curves, which are in good agreement with simulations, span 18-64.2 GHz with fractional bandwidths of 51.7%, 56.1%, and 35.6%, respectively. At the low end, the measurement is limited by the bandpass nature of the on-chip balun.
At the high end, although the divider was not characterized beyond 67 GHz due to signal generator limitations, the good agreement between the simulated and measured F osc indicates that the divider is capable of operation beyond 67 GHz.

Table II compares the locking range and fractional bandwidth of DIV1 for the sensitivity curves reported in Fig. 22(a). It can be observed that the locking range is improved by more than 5 GHz when the current bleeding or source coupling techniques are used separately. Examination of Table II reveals that with current bleeding alone (DIV1-CB), F min decreases significantly, while F max is not significantly affected. On the other hand, source coupling alone (DIV1-SC) enables a significant increase in F max while affecting F min only slightly. These observations are consistent with the analysis in Section II. Combining current bleeding and source coupling (DIV1-CB+SC) improves the locking range by 4.75 and 3.25 GHz at input powers of −5 and −10 dBm, respectively. The measured locking range improvement of the DIV1-CB+SC configuration is observed to be less than the analytical prediction of Sections II and III; this is due to unaccounted capacitance at node X in Fig. 5 resulting from devices M n1, M S1, and M S2. This capacitance absorbs a fraction of the injection current and hence decreases the operating range.

Fig. 22(b) compares DIV2 with DIV1-0 for different bias voltages V BP. A 1.1-V V DD is used in DIV2 so that it operates in roughly the same frequency bands as DIV1-0 for a given V BP; this increases the power consumption of DIV2 from 6.2 to 7 mW. In the low/mid/high sensitivity bands, DIV2 extends the locking range by 4.75/8.25/2.75 GHz and achieves fractional bandwidths of 59.4%/68.2%/39.1%, compared to 51.7%/54.1%/35.6% for DIV1-0. Note that the improvements in the highest and lowest curves in Fig. 22(b) are limited by signal generator and on-chip balun limitations, respectively.

Fig. 23(a) compares the measured sensitivity curves of DIV3 and DIV1-0 for different bias voltages V BP. It is seen that bulk modulation improves both F min and F max, as discussed in Section III, and extends the locking range by 4.5/5.75/6.25 GHz for the low/medium/high curves; correspondingly, the fractional bandwidth is improved from 51.7%/55.1%/35.6% to 67.9%/70%/47.1% in the three bands. The measured locking range and fractional bandwidth of DIV1-0, DIV2, and DIV3 are summarized in Table III; examination of this table highlights the large improvements that are enabled by the proposed circuit design techniques. Fig. 23(b) shows the variation in F OSC of DIV3 with the bulk bias voltage V BL for V BP = 350 mV. The sensitivity of F OSC to V BL is −5.3 GHz/V; this provides another method to shift the sensitivity curve by more than 20 GHz for a given current consumption.
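Several of the figures quoted above can be cross-checked with a few lines of Python. The sketch below assumes that fractional bandwidth is defined as the locking range divided by its center frequency, and that an input-referred frequency equals the division ratio times the corresponding self-oscillation frequency. The function names are illustrative, and the 1-V span of V BL is an assumption used only to show the scale of the shift.

```python
N = 4  # division ratio of the prototypes


def fractional_bandwidth(f_min_ghz, f_max_ghz):
    """Locking range over center frequency, in percent (assumed definition)."""
    center = 0.5 * (f_min_ghz + f_max_ghz)
    return 100.0 * (f_max_ghz - f_min_ghz) / center


def input_referred(f_osc_ghz, n=N):
    """Input frequency corresponding to a self-oscillation frequency."""
    return n * f_osc_ghz


# Post-layout locking ranges of DIV1-0 quoted in the text.
print(round(fractional_bandwidth(41.75, 71.0), 1))  # 0-dBm input
print(round(fractional_bandwidth(51.5, 64.5), 1))   # -10-dBm input

# An F osc tunable up to 20 GHz corresponds to the quoted F max estimate.
print(input_referred(20.0))

# Magnitude of the F OSC sensitivity to V BL is 5.3 GHz/V; over an assumed
# 1-V span, the input-referred shift of the sensitivity curve is N * 5.3 GHz.
print(round(input_referred(5.3 * 1.0), 1))
```

With these assumptions, the F osc tuning limit reproduces the 80-GHz F max estimate, and the input-referred shift over a 1-V bulk-bias span slightly exceeds 20 GHz.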
The phase noise at the input and the output of the divider (DIV2) was measured using a Keysight E5052B signal source analyzer and is shown in Fig. 24(a) and (b), respectively, for a 52-GHz input frequency. Fig. 25 and Table IV compare the measured phase noise of the proposed architectures at an input frequency of 44 GHz. Since the DCML divider is a synchronous divide-by-four, it does not suffer from jitter accumulation. The output phase noise is reduced by roughly 12 dB compared to the input, consistent with the ideal 20 log10(4) ≈ 12 dB reduction of a divide-by-four. This reduction extends to about 2-MHz offset from the carrier frequency, beyond which amplitude noise from the buffers starts to dominate. As expected, the phase noise of the proposed DIV2 and DIV3 is better than that of DIV1-0, since the phase noise of a DCML latch is proportional to $C_L/I_{\mathrm{ON}}^2$ [27].

Fig. 26 shows the operation of the calibration engine. Calibration was characterized for all divider topologies and behaves as expected; however, characterization results are presented herein only for DIV1-CB+SC. Fig. 26(a) shows the locked output spectrum at 1-V V DD for a 52-GHz, −10-dBm input. When V DD is decreased by 100 mV, the divider loses lock, as shown in Fig. 26(b). The calibration circuit is then enabled, whereupon the divider regains lock by changing the bias voltage V BP and moving to a different sensitivity curve; this is shown in Fig. 26(c). The input frequency is then changed abruptly to 24 GHz; Fig. 26(d) shows the output spectrum as the calibration engine enables the divider to regain lock.

Measurement results and benchmarks are summarized in Table V. For the dividers with multiple sensitivity curves, the figure of merit (FOM) and fractional bandwidth (FBW) of each curve are calculated at the maximum reported input power of each benchmark.
In order to make a fair comparison with dividers with moduli other than four, and to explicitly account for the fact that achieving a target locking range becomes harder at low-input amplitude, it was necessary to define a new figure-of-merit as follows:
Three main conclusions are drawn from Table V: 1) the proposed techniques significantly enhance performance and robustness in practical applications; 2) in contrast to previous designs, the proposed designs demonstrate little locking range degradation at low-input amplitude compared to the maximum achievable locking range at high-input amplitude; and 3) using the calibration scheme, the proposed dividers cover the widest locking range reported to date (16-67 GHz) without any manual tuning, which results in the highest FOM P to date.
It is worth noting that the divide-by-two ILFD in [16] features the highest FOM in Table V . This is due to the combined advantage of injection locking to a resonant circuit using a strong second harmonic rather than a weak fourth harmonic in divide-by-four. However, this advantage vanishes when FOM P is used for comparison, as seen in Table V .
Table V also underscores the scaling-friendly nature of the DCML approach, as evidenced by the fact that the divide-by-four designs [26], [27] with superior FOM are designed in more advanced CMOS technologies. Note, however, that advanced technologies (especially 32 nm and below) vary widely in terms of process type and flavor. Such considerations, as well as the availability of process features chosen for a particular application, can heavily influence design choices. Therefore, the circuit techniques proposed herein (or combinations thereof) and the self-calibration scheme are highly relevant even in more advanced technologies.
VI. COMPARISON AND EXTENSIONS OF DIVIDER TOPOLOGIES
In this section, the proposed divider topologies are compared based on the analysis and measurements presented in Section II and Section V, respectively. Several observations and conclusions can be drawn as follows.
1) In low-cost CMOS designs without additional process options such as multiple V t devices or triple well, source coupling alone can be used to increase F max, while current bleeding alone can be used to decrease F min. Combining the two techniques improves F max and F min despite a small increase in the input capacitance.
2) In designs using technologies where devices with different V t and/or triple well are available, DIV2 and DIV3 can improve locking range at the expense of slightly higher power consumption and input capacitance, respectively.
3) Certain combinations of the proposed techniques can be used to further improve the locking range. For example, source coupling can be combined with HVT loads to improve both F max and F min with approximately the same input capacitance and power consumption. Bulk modulation and current bleeding can be combined to further reduce F min.
4) Combining all of the above techniques can further improve locking range; however, a price is paid in terms of higher input capacitance and power consumption. In other words, overall performance improvement can be achieved if the input amplitude is low, but such improvement is diminished when higher input amplitude can be delivered. For example, the effectiveness of combining current bleeding, HVT loads, and bulk modulation in reducing I OFF is diminished when the input amplitude is increased. Similarly, the effectiveness of combining source coupling, LVT tail transistor, and bulk modulation in increasing I ON is diminished at higher input amplitude. Nevertheless, the improvements offered by combining all techniques can result in a reduction of overall system power, especially at high mm-wave frequencies, since the swing at the output of the LO buffer can now be smaller, thereby reducing its current consumption.
VII. CONCLUSION
We have proposed several new design techniques to enhance the operating frequency range and locking range of inductorless mm-wave frequency dividers. These techniques are informed by a more refined analysis of the divider than is available in the literature. The proposed design techniques, namely source coupling, current bleeding, multi-V t design, adaptive bulk biasing, and bulk modulation, are demonstrated via three prototype designs. A calibration scheme is proposed to adaptively tune and optimize bias settings for a given input frequency, thereby enhancing robustness, reducing power consumption, and resulting in a practically usable divider.

In addition, during the decaying region (t 3x, t 4), when the differential output voltage (V P − V N) starts to decay toward −I OFF R OFF instead of remaining at I ON R ON as in the ideal case, the latch outputs can be expressed as
In order to maintain frequency lock, the differential output voltage (V P − V N) should be higher than V SW at the end of the decaying region. In other words, the following condition should be satisfied
where T decay, the maximum allowable decay time that ensures proper operation, can be derived as
By defining the minimum operating frequency F min as 0.5/(T decay + T SW ), and by deriving T SW = t 3x − t 3 from (1), F min can be expressed as
