# Design of Low-Voltage Power Efficient Frequency Dividers in Folded MOS Current Mode Logic

Francesco Centurelli<sup>1</sup>, *Member, IEEE*, Giuseppe Scotti<sup>1</sup>, *Senior Member, IEEE*, Alessandro Trifiletti<sup>1</sup>, and Gaetano Palumbo<sup>2</sup>, *Fellow, IEEE* 

Abstract—In this paper we propose a methodology to design high-speed, power-efficient static frequency dividers based on the low-voltage Folded MOS Current Mode Logic (FMCML) approach. A modeling strategy is introduced to analyze how the propagation delay and power consumption of the divide-by-2 (DIV2) cell depend on its bias currents. We demonstrate that the behavior of the FMCML DIV2 cell differs both from that of the conventional MCML DFF (D-type Flip-Flop) and from that of the FMCML DFF without a level shifter. An analytical strategy is then presented to optimize the divider in different design scenarios: maximum speed, minimum power-delay product (PDP) or minimum energy-delay product (EDP). The possibility of scaling the bias currents through the divider stages without affecting the speed performance is also investigated. The proposed analytical approach provides deep insight into the circuit behavior and allows the different design tradeoffs to be comprehensively optimized.

The derived models and design guidelines are validated against transistor-level simulations in a commercial 28nm FDSOI CMOS process. Several divide-by-8 circuits, each following a different optimization strategy, have been designed in the same 28nm CMOS technology, showing the effectiveness of the proposed methodology.

*Index Terms*—Current-Mode Logic, frequency divider, logic design, nanometer CMOS, delay model.

# I. INTRODUCTION

**F**REQUENCY dividers are fundamental building blocks in many applications, such as frequency synthesizers [1]-[3], clock generators [4], [5], high-speed SerDes subsystems [6]-[11] and time-interleaved data converters [12]-[15]. These applications generally show a trend towards deep submicron CMOS technologies, which provide ever higher frequency performance (with transition frequencies up to 350/200 GHz for NMOS/PMOS devices [16]) and require low supply voltages of around 1V or less, with much lower power consumption than their bipolar counterparts.

Minimizing power consumption is a key factor in many applications, to ease portability and limit overheating, which also simplifies packaging and heat-dissipation design. Several architectures are available for high-speed frequency dividers, such as the static frequency divider (SFD) [17]-[21], the regenerative frequency divider (RFD) [17], [22]-[24] and the injection-locked frequency divider (ILFD) [17], [25]-[27]. SFDs have the advantage of operating from dc to very high frequencies and are composed only of standard digital blocks: this simplifies the design and enables design reuse and application in reconfigurable systems, making SFDs the most commonly used frequency divider architecture unless extremely high frequencies are required. Conflicting requirements are posed on the design of frequency dividers: low power consumption and a small area footprint are important to ease integration in Systems-on-Chip, while a low supply voltage, which can be mandated by technology limits or system specifications, stresses the noise margin and hence the output swing.

For high-frequency applications, logic families based on a differential approach and on current steering are typically preferred, since they offer fast switching, low sensitivity to common-mode disturbances and low power-supply switching noise; the latter is a great advantage in mixed-signal applications, where logic circuits share the same chip with highly sensitive analog blocks. All this comes at the cost of higher power consumption than standard CMOS. In MOS technology, the reference logic family is thus MOS Current-Mode Logic (MCML) [28]. Conventional MCML exploits series gating to implement logic functions. In this logic style, even when the number of stacked devices is limited to two, the minimum supply voltage cannot be very low because of the cascade of gate-source voltages. Indeed, for a standard MCML 2-input logic gate a minimum supply voltage of at least

$$V_{DD,min} = 2V_{TH} + 3V_{OV} + V_R \tag{1}$$

is required, where  $V_{TH}$  is the threshold voltage of the devices,  $V_{OV} = V_{GS} - V_{TH}$  is the overdrive voltage and  $V_R$  is the DC voltage drop across the load resistor, whose value is constrained by the need to fully switch the differential pairs<sup>1</sup>.

Manuscript received May 01, 2020.

Authors <sup>1</sup> are with the DIET Dept. of the University of Rome "La Sapienza" Italy (e-mail: <u>francesco.centurelli@uniroma1.it</u>).

Author<sup>2</sup> is with the DIEEI Dept. of the University of Catania, Italy (e-mail: <u>gaetano.palumbo@dieei.unict.it</u>).

Copyright (c) 2008 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

<sup>1</sup> $V_R + 3V_{OV}$ is the theoretical minimum supply voltage, but (1) takes into account the constraints arising from cascading MCML stages with interstage level shifters.

To further reduce the supply voltage, in order to comply with technology constraints and simplify interfacing with lower-frequency blocks implemented in standard CMOS, several modifications of the basic MCML family have been proposed in the literature [29]-[32]. Among these solutions, whose drawbacks are discussed in [33] and [34], the Folded MCML (FMCML) approach [33], which allows

$$V_{DD,min} = V_{TH} + 2V_{OV} + V_R \tag{2}$$

appears particularly promising. In particular, [33] and [34] have shown the advantages of the FMCML logic style for very low-voltage implementations of the D-latch and of the DFF (D-type Flip-Flop) in a mixed-signal environment, which requires low supply switching noise and high immunity to noise and disturbances. These advantages readily extend to the SFD, which uses the DFF as its basic building block.
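As a quick numerical illustration of (1) and (2), the Python sketch below evaluates both minimum supply voltages. $V_{TH}$ is taken from Table I and $V_R$ from the 300 mV swing $\Delta V$ of Table II, while the 100 mV overdrive is an assumed, illustrative value rather than a design parameter of this work.

```python
def vdd_min_mcml(v_th, v_ov, v_r):
    """Eq. (1): two-level MCML gate, including interstage level shifters."""
    return 2 * v_th + 3 * v_ov + v_r


def vdd_min_fmcml(v_th, v_ov, v_r):
    """Eq. (2): Folded MCML gate."""
    return v_th + 2 * v_ov + v_r


# V_TH from Table I, V_R = Delta V from Table II; V_OV = 0.1 V is an assumption.
V_TH, V_OV, V_R = 0.35, 0.10, 0.30
mcml = vdd_min_mcml(V_TH, V_OV, V_R)
fmcml = vdd_min_fmcml(V_TH, V_OV, V_R)
print(f"MCML : {mcml:.2f} V")
print(f"FMCML: {fmcml:.2f} V")
print(f"saving: {mcml - fmcml:.2f} V (= V_TH + V_OV)")
```

With these numbers the sketch prints 1.30 V and 0.85 V. Whatever values are chosen, the difference between (1) and (2) is always $V_{TH} + V_{OV}$, independent of $V_R$.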

In this paper we present design criteria for low-voltage, high-speed and energy-efficient static frequency dividers implemented in deep submicron CMOS technologies and based on the FMCML approach. Although the design of MCML frequency dividers has been addressed before [35]-[36], the previous strategies are not suited to the case under consideration. Indeed, [35] uses the conventional MCML style, and its core latch, even when loaded by a level shifter, does not exhibit a delay that is constant versus the bias current, as happens in our logic style. This difference strongly affects the design procedure, which in [35], unlike our case, can assume a constant level shifter bias current. As for [36], the design is carried out considering MCML cells without level shifters, which are mandatory in a low-voltage domain with FMCML.

The paper is organized as follows. In section II we describe the proposed frequency divider architecture, which exploits the FMCML D-latch as its basic building block. In section III we present a complete analysis of the clock-to-output propagation delay of the basic FMCML divide-by-2 (*DIV2*) cell, which is then exploited to derive design guidelines for multistage frequency dividers. Validation of the proposed models and design case studies referring to a 28nm FDSOI CMOS technology are reported in sections IV and V, respectively. Finally, some remarks and the conclusions are given in section VI.

# II. THE FREQUENCY DIVIDER ARCHITECTURE

In applications where high frequency is the main requirement, the $2^N$ static frequency divider is typically implemented by cascading *N* divide-by-2 (*DIV2*) stages, thus obtaining what is called an asynchronous frequency divider. In the simplest implementation, each stage is a Toggle Flip-Flop (*TFF*) with the *T* input set to logic-1 (so that it toggles at every rising edge of the clock signal), as shown in Fig. 1a, and the output of the *i*-th divide-by-2 stage is applied as the clock input of the following stage. In CML logic, where the *DFF* can be easily implemented and has complementary inputs and outputs, the *TFF* is realized by a *DFF* whose *D* input is connected to the inverted output (i.e., the $\overline{Q}$ output is connected to the *D* input), as shown in Fig. 1b.

For very low-voltage applications, standard MCML cannot be used, and a suitable alternative is the FMCML logic style. In FMCML, the 2-level series gating is implemented by exploiting both NMOS and PMOS differential pairs. In particular, the lower level of a standard MCML D-latch is implemented through a PMOS differential pair, whereas the steered currents are folded by NMOS current mirrors to NMOS differential pairs, which realize the upper level of a conventional NMOS MCML D-latch. Finally, the currents are recombined in the output load, made up of resistors or triode-biased PMOS devices, thus implementing a wired-*OR* function. With this topology, the minimum supply voltage is reduced with respect to a standard 2-level MCML logic gate and becomes equal to that of a simple inverter [33]-[34]. However, a level shifter may still be required to adapt the output DC common-mode level to the input DC level of the PMOS input pair.



Fig. 1. Static frequency divider: a) based on TFF; b) based on DFF.



Fig. 2. Topology of a D-latch in Folded MCML logic style.

The topology of an FMCML D-latch is shown in Fig. 2: the clock signal is applied to the PMOS differential pair $M_1$-$M_2$, which steers the tail current $I_{TAIL}$ towards the current mirror $M_3$-$M_7$ or $M_4$-$M_8$, thus enabling either the track differential pair $M_9$-$M_{10}$ or the latch differential pair $M_{11}$-$M_{12}$. Transistors $M_5$ and $M_6$ are inserted to match the drain-source voltages in the current mirrors, thus improving the mirroring accuracy (channel-length modulation is usually very strong in deep submicron technologies).

The *DFF* in Fig. 1b is implemented with the master-slave approach by cascading two D-latches with counter-phase clock signals. When the clock is high, the first latch tracks the input signal, and the second one holds the previous input. When the clock is low, the first latch holds the input and the second latch tracks the output of the first one, thus making the output equal to the input read in the previous phase.
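The divider operation described above can be checked with an idealized, zero-delay behavioral model. This is a sketch under simplifying assumptions, not a circuit-level model: the master latch tracks $D = \overline{Q}$ while the clock is high, the slave tracks the master while the clock is low, and the $Q$ output of each stage clocks the next one, as in the asynchronous divider of Fig. 1b. The function names are illustrative.

```python
def master_slave_div2(clk):
    """Behavioral DIV2 cell: master-slave DFF with D = Q-bar.

    clk is a list of 0/1 samples. The master latch tracks while clk is high,
    the slave latch tracks while clk is low. Returns the Q (slave) samples.
    """
    master, slave = 0, 0
    q = []
    for level in clk:
        if level:
            master = 1 - slave   # master tracks D = Q-bar
        else:
            slave = master       # slave tracks the master output
        q.append(slave)
    return q


def divider_chain(clk, n_stages):
    """Asynchronous 2**n divider: the Q output of each stage clocks the next."""
    outputs = []
    for _ in range(n_stages):
        clk = master_slave_div2(clk)
        outputs.append(clk)
    return outputs


def rising_edges(wave):
    """Indices where the waveform goes from 0 to 1."""
    return [i for i in range(1, len(wave)) if wave[i - 1] == 0 and wave[i] == 1]


clk = [1, 0] * 64               # 64 input periods, 2 samples per period
outs = divider_chain(clk, 3)    # divide-by-8
for k, q in enumerate(outs, start=1):
    e = rising_edges(q)
    print(f"stage {k}: output period = {e[1] - e[0]} samples")
```

With two samples per input period, the printed output periods are 4, 8 and 16 samples, i.e., division by 2, 4 and 8. Since the model ignores propagation delays, it says nothing about the maximum toggle frequency, which is addressed by the delay model of Section III.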

It is worth noting that the FMCML logic style allows an efficient low-voltage implementation of the *DFF* [33], since a single PMOS pair, shared by both latches, is used thanks to an additional output branch of the current mirrors, as shown in Fig. 3. When the input clock signal steers the current to transistor $M_1$ ($M_2$) and to the current mirror $M_3$-$M_7$-$M_{7A}$ ($M_4$-$M_8$-$M_{8A}$), the track (latch) pair of the first latch and the latch (track) pair of the second latch are enabled, and the fully differential nature of the structure avoids the need for an additional PMOS pair. Therefore, an FMCML divider can be advantageous with respect to a standard MCML one not only because of the lower supply voltage, but also because of the lower number of current branches; on the other hand, the NMOS current mirror can reduce the maximum operating frequency, since it heavily loads the input pair and introduces an additional pole, as will be discussed in the next section.



Fig. 3. Topology of a D-type flip flop in Folded MCML logic style.

As noted before, Fig. 3 also includes an input source follower acting as level shifter: although it is not required for a single divide-by-2 stage, it is mandatory when several stages are cascaded, as in a divider. Hence, it is considered in the rest of the paper, including the derived design guidelines.

# III. DELAY MODEL OF THE FMCML FOR STATIC FREQUENCY DIVIDER

In this section we present a complete analysis of the clock-to-output propagation delay of the basic FMCML frequency divider cell, which is then exploited to derive design guidelines for multistage frequency dividers.

## A. Time constants

Static frequency dividers are usually characterized by their sensitivity curve [37], which shows the minimum amplitude of the full-rate input clock for which the divider works properly as a function of the frequency of the full-rate signal. This curve allows the divider self-oscillation frequency (SOF) to be identified, which is (twice) the oscillation frequency of the divider in a ring oscillator configuration (i.e., in feedback and without clock).

However, when the divider is implemented in deep submicron CMOS technologies, other figures of merit are usually adopted, such as the maximum input frequency the divider can handle or the *DFF* maximum toggle frequency. According to [35] and [38], these frequencies are related to the D-latch clock-to-output propagation delay<sup>2</sup> $t_{CKQ}$, which is much easier to measure and to model. The propagation delay $t_{CKQ}$ can hence be used as a metric to assess the divider performance and to derive design guidelines (note that, for a master-slave *DFF*, the clock-to-output propagation delay equals the clock-to-output propagation delay $t_{CKQ}$ of the slave latch).



Fig. 4. Small-signal equivalent circuit of the level shifter (parameters with subscript LS refer to  $M_{LS1,2}$  in Fig. 3, and  $G_G$  and  $C_G$  model the admittance of the current source  $I_B$ ).

The propagation delay of the FMCML D-latch, $t_{CKQ}$, can be estimated by linearizing the circuit and applying the open-circuit time-constant method [39]. In particular, it is useful to separate the level shifter contribution, $t_{pLS}$, from the D-latch core contribution, $t_{pLATCH}$, which depend on the bias currents $I_B$ and $I_{TAIL}$, respectively. Thus, we can write

$$t_{CKQ} = t_{pLS} + t_{pLATCH} \tag{3}$$

<sup>2</sup> The propagation delay is defined as the time taken by the output to reach 50% of its full swing, measured from the instant at which the input reaches 50% of its final value.



Fig. 5. Small-signal equivalent circuit to calculate the time constants of the latch.

Concerning the level shifter, from the small-signal circuit reported in Fig. 4 we get the following transfer function:

$$F(s) = \frac{v_o}{v_i} \simeq \frac{g_{mLS}}{g_{mLS} + g_{mbLS} + G_G} \frac{1 + s\tau_z}{1 + s\tau_p} \tag{4}$$

where

$$\tau_p \approx \frac{C_{gsLS} + C_{sbLS} + C_G + C_{LB}}{g_{mLS} + g_{mbLS} + G_G} \tag{5}$$

and

$$\tau_z = \frac{C_{gsLS}}{g_{mLS}} \tag{6}$$

are the pole and zero time constants, $Y_G = G_G + sC_G$ is the output admittance of the current source $I_B$, $C_{LB}$ is the load capacitance seen by the level shifter, mainly due to the input capacitance of the PMOS differential pair of the latch core, and the remaining terms are the usual small-signal parameters of transistors $M_{LS1,2}$ in Fig. 3.

From (4), the level shifter propagation delay can be calculated as

$$t_{pLS} = \tau_p \ln\left(2\frac{\tau_p - \tau_z}{\tau_p}\right) \simeq \tau_p \ln 2 \tag{7}$$

where the approximation holds since $\tau_p$ is usually at least an order of magnitude larger than $\tau_z$ (although the denominator of $\tau_p$ is slightly larger than that of $\tau_z$, due to $g_{mbLS}$ and $G_G$, the numerator of $\tau_p$ is several times larger than that of $\tau_z$, due to $C_{sbLS}$, $C_G$ and $C_{LB}$).
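The closed form in (7) can be cross-checked numerically. The sketch below assumes the single-pole/single-zero model of (4), with the gain factor dropped since it does not affect the 50% crossing, and compares the exact expression, the simplified $\tau_p \ln 2$, and a bisection search on the normalized step response. Time constants are in arbitrary units and the function names are illustrative.

```python
import math


def t_pls_exact(tau_p, tau_z):
    """Exact form of Eq. (7): 50% crossing of the step response of
    (1 + s*tau_z)/(1 + s*tau_p). Requires tau_z < tau_p / 2."""
    return tau_p * math.log(2.0 * (tau_p - tau_z) / tau_p)


def t_pls_approx(tau_p):
    """Simplified form of Eq. (7), valid for tau_p >> tau_z."""
    return tau_p * math.log(2.0)


def step_response(t, tau_p, tau_z):
    """Normalized step response (final value 1) of the pole-zero pair in (4)."""
    return 1.0 - (1.0 - tau_z / tau_p) * math.exp(-t / tau_p)


def t50_bisection(tau_p, tau_z, iters=200):
    """Numerical 50% crossing, used only to cross-check the closed form."""
    lo, hi = 0.0, 50.0 * tau_p
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if step_response(mid, tau_p, tau_z) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


tau_p = 10.0  # arbitrary time unit (e.g., ps)
for ratio in (0.0, 0.05, 0.1):
    tau_z = ratio * tau_p
    print(f"tau_z/tau_p = {ratio:4.2f}: exact = {t_pls_exact(tau_p, tau_z):.3f}, "
          f"bisection = {t50_bisection(tau_p, tau_z):.3f}, "
          f"tau_p*ln2 = {t_pls_approx(tau_p):.3f}")
```

The exact form only makes sense for $\tau_z < \tau_p/2$, otherwise the response starts above 50%. The printed values show how the simplified form converges to the exact one as $\tau_z/\tau_p$ decreases.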

According to [33], the propagation delay of the latch core,  $t_{pLATCH}$ , can be computed as the sum of three time constant contributions,

$$t_{pLATCH} = (\tau_1 + \tau_2 + \tau_3) \ln 2 \tag{8}$$

where $\tau_1$, $\tau_2$ and $\tau_3$ account for three sections along the signal path of the latch core; for practical values of the device parameters, some of them may be negligible. In particular, referring to the small-signal differential half-circuit in Fig. 5, $\tau_1$, $\tau_2$ and $\tau_3$ are the time constants of:

- the PMOS differential pair (from  $v_i$  to  $v_D$ );
- the current mirror output branch (from  $v_D$  to  $v_S$ );
- the track differential pair (from the source of $M_9$-$M_{10}$ to the output, i.e., from $v_S$ to $v_o$).

Assuming equal  $M_3$ ,  $M_5$  and  $M_7$ , the time constants are given respectively by

$$\tau_1 = \frac{C_{gdp} + C_{dbp} + C_{diode} + 2C_{gsn} + 3C_{gdn}}{G_{diode}} \tag{9a}$$

$$\tau_2 = \frac{C_{gdn} + C_{dbn} + 2C_{gsDP} + 2C_{sbDP}}{g_{mDP} + g_{mbDP}} \tag{10}$$

$$\tau_3 = R_D \left( C_{gdDP} + C_{dbDP} + C_{LATCH} + C_{RD} + C_L \right) \tag{11}$$

where $C_{RD}$ is the parasitic capacitance of the resistive load $R_D$ [28], $C_L$ is the load capacitance, and $Y_{diode} = G_{diode} + sC_{diode}$ is the admittance of the input branch of the current mirrors loading the PMOS differential pair, made up of transistors $M_{3,4}$ and $M_{5,6}$ ($M_3$ and $M_5$ in Fig. 3 are equivalent to a diode-connected MOS), with

$$G_{diode} \approx g_{mn} \tag{12}$$

$$C_{diode} \approx C_{dbn} + C_{gsn} + 2C_{gdn} \tag{13}$$

The other parameters have the usual meaning of MOS small-signal parameters (suffix *p* refers to $M_1$ and $M_2$, suffix *DP* to $M_9$-$M_{12}$ and $M_{9A}$-$M_{12A}$, and suffix *n* to $M_7$-$M_8$ and $M_{7A}$-$M_{8A}$). Moreover, the load due to the latch differential pair $M_{11}$-$M_{12}$ is

$$C_{LATCH} = C_{gDP} + C_{sbDP} \tag{14}$$

where $C_{gDP}$ is the gate capacitance of a DP transistor carrying no bias current.

Note that if we consider a single FMCML D-latch instead of the FMCML *DFF* in Fig. 3, the current mirror has only one output branch and therefore  $\tau_1$  in (9a) can be rewritten as follows:

$$\tau_{1,\text{latch}} = \frac{C_{gdp} + C_{dbp} + C_{diode} + C_{gsn} + 1.5C_{gdn}}{G_{diode}} \tag{9b}$$
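To visualize the cost of the additional mirror output branch, the sketch below evaluates (9a) and (9b) with $G_{diode}$ and $C_{diode}$ from (12) and (13), using $G_{diode}$ in the denominator of both expressions. The numerical values are hypothetical placeholders chosen only to exercise the formulas, not parameters of the 28nm design.

```python
from dataclasses import dataclass


@dataclass
class MirrorParams:
    """Small-signal parameters used in (9a), (9b), (12) and (13).

    Capacitances in fF and conductance in uS, so C/G is in ns.
    """
    c_gdp: float   # PMOS pair gate-drain capacitance
    c_dbp: float   # PMOS pair drain-bulk capacitance
    c_gsn: float   # NMOS mirror gate-source capacitance
    c_gdn: float   # NMOS mirror gate-drain capacitance
    c_dbn: float   # NMOS mirror drain-bulk capacitance
    g_mn: float    # NMOS mirror transconductance

    @property
    def g_diode(self):
        return self.g_mn                                  # Eq. (12)

    @property
    def c_diode(self):
        return self.c_dbn + self.c_gsn + 2 * self.c_gdn   # Eq. (13)


def tau1_dff(p):
    """Eq. (9a): mirror with two output branches (shared PMOS pair in the DFF)."""
    num = p.c_gdp + p.c_dbp + p.c_diode + 2 * p.c_gsn + 3 * p.c_gdn
    return num / p.g_diode


def tau1_latch(p):
    """Eq. (9b): single D-latch, one mirror output branch."""
    num = p.c_gdp + p.c_dbp + p.c_diode + p.c_gsn + 1.5 * p.c_gdn
    return num / p.g_diode


# Hypothetical placeholder values, for illustration only.
p = MirrorParams(c_gdp=0.3, c_dbp=0.1, c_gsn=0.2, c_gdn=0.18, c_dbn=0.05, g_mn=50.0)
print(f"tau1 (DFF)   = {1e3 * tau1_dff(p):.1f} ps")
print(f"tau1 (latch) = {1e3 * tau1_latch(p):.1f} ps")
```

With these placeholder values the sketch prints 39.0 ps and 29.6 ps. The difference is always $(C_{gsn} + 1.5C_{gdn})/G_{diode}$, i.e., the load added by the extra branch $M_{7A}$-$M_{8A}$.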

## B. Core latch current-domain behavior

In order to derive *DIV2* design guidelines for the divider, the dependence of the circuit parameters on the bias currents has to be made explicit, so that the clock-to-output propagation delay and the power consumption can be analyzed and optimized. Note that all the small-signal parameters in the previous equations are proportional to the gate width of the respective devices (minimum gate length is assumed).

We start by selecting appropriate values for the voltage swing

$$V_{swing} = 2\Delta V = 2R_D I_{TAIL} \tag{15}$$

and the noise margin, that can be expressed as [28]

$$NM = \Delta V \left( 1 - \frac{\beta}{A_V} \right) \tag{16}$$

where $\beta$ is a factor that depends on the model adopted to describe the MOS behavior (ranging from $\sqrt{2}$ for the quadratic model to 1 for a submicron linear model), and $A_V$ is the small-signal gain of the considered path (clock input to output),

$$A_V = g_{mp} R_D. \tag{17}$$
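Inverting (16) gives the minimum gain $A_V = g_{mp}R_D$ needed for a target noise margin. The sketch below assumes $\Delta V = 300$ mV as in Table II and an illustrative target of 150 mV, and evaluates both limiting values of $\beta$; the function names are hypothetical.

```python
import math


def noise_margin(delta_v, a_v, beta):
    """Eq. (16): NM = Delta V * (1 - beta / A_V)."""
    return delta_v * (1.0 - beta / a_v)


def min_gain_for_nm(delta_v, nm_target, beta):
    """Eq. (16) solved for A_V = g_mp * R_D (Eq. 17)."""
    return beta / (1.0 - nm_target / delta_v)


DELTA_V = 0.3      # V, Table II (swing 2*Delta V = 0.6 V from Eq. 15)
NM_TARGET = 0.15   # V, illustrative target (assumed)
for beta, model in ((math.sqrt(2), "quadratic"), (1.0, "linear")):
    a_v = min_gain_for_nm(DELTA_V, NM_TARGET, beta)
    print(f"{model:9s} model: A_V >= {a_v:.2f}, "
          f"NM check = {1e3 * noise_margin(DELTA_V, a_v, beta):.0f} mV")
```

For a target equal to $\Delta V/2$ the required gain is simply $2\beta$, i.e., about 2.83 for the quadratic model and 2.00 for the linear one.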

By using the  $\alpha$ -power model of the MOS transistors, from relationships (15)-(17) we can express the gate width of the PMOS devices  $W_p$  as a function of the tail current [28]

$$W_p = \frac{2^{\alpha - 1}}{K} \left(\frac{A_V}{\alpha \Delta V}\right)^{\alpha} I_{TAIL} \tag{18}$$

where $K$ and $\alpha$ are technology-dependent parameters ($\alpha$ and $K$ asymptotically tend to 1 and $v_{sat}C_{ox}$, respectively, for short-channel devices, whereas they are equal to 2 and $\mu_p C_{ox}/2L_p$ for long-channel devices).
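The sizing rule (18) can be verified by plugging the resulting width back into the $\alpha$-power law. The sketch below assumes $I_D = K W V_{ov}^{\alpha}$ per device (with $K$ absorbing the $1/L$ dependence and $W$ in µm), a balanced pair with $I_D = I_{TAIL}/2$, and $R_D = \Delta V/I_{TAIL}$ from (15). The values of $K$ and $\alpha$ are hypothetical, not fitted to the 28nm process.

```python
def pmos_width(i_tail, delta_v, a_v, k, alpha):
    """Eq. (18): PMOS input-pair width for a target gain A_V and swing Delta V."""
    return 2 ** (alpha - 1) / k * (a_v / (alpha * delta_v)) ** alpha * i_tail


def gain_check(w, i_tail, delta_v, k, alpha):
    """Recompute A_V = g_mp * R_D from the alpha-power law I_D = K*W*V_ov**alpha,
    with I_D = I_TAIL/2 (balanced pair) and R_D = Delta V / I_TAIL (Eq. 15)."""
    i_d = i_tail / 2
    v_ov = (i_d / (k * w)) ** (1 / alpha)
    g_m = alpha * i_d / v_ov          # alpha-power transconductance
    return g_m * delta_v / i_tail


# Hypothetical alpha-power parameters (W in um); not fitted to the 28nm process.
K, ALPHA = 150e-6, 1.3
for i_tail in (5e-6, 10e-6, 20e-6):
    w = pmos_width(i_tail, delta_v=0.3, a_v=3.0, k=K, alpha=ALPHA)
    print(f"I_TAIL = {i_tail * 1e6:4.0f} uA -> W_p = {w:.3f} um, "
          f"A_V check = {gain_check(w, i_tail, 0.3, K, ALPHA):.3f}")
```

The recovered gain equals the target at every current, and $W_p$ doubles when $I_{TAIL}$ doubles; this linear width-current relation is what makes $\tau_1$, $\tau_2$ and $\tau_{3MOS}$ independent of $I_{TAIL}$ in the following.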

Similarly, once an appropriate value for the overdrive voltage has been selected, the gate widths of the transistors with suffix n and of those with suffix DP can be written as follows:

$$W_n = \frac{I_{TAIL}}{2K_n V_{ov,n}^{\alpha}} \tag{19a}$$

$$W_{DP} = \frac{I_{TAIL}}{4K_{DP}V_{ov,DP}^{\alpha}} \tag{19b}$$

and the gate width of the NMOS transistors in the level shifter is given by:

$$W_{LS} = \frac{I_B}{K_{Buf} V_{ov,LS}^{\alpha}} \tag{20}$$

Substituting (18)-(20) into (9a), (10) and (11), the first two time constants, $\tau_1$ and $\tau_2$ (i.e., the propagation delay from the PMOS input to the sources of the NMOS track pair), turn out to be independent of the bias current $I_{TAIL}$, since all the terms in their numerators and denominators are proportional to the width of a latch-core device, and hence to $I_{TAIL}$. The third time constant, $\tau_3$, requires a more in-depth investigation; from (11) it can be rewritten as:

$$\tau_3 = \tau_{3MOS} + \tau_{RD} + R_D C_L \tag{21}$$

where

$$\tau_{3MOS} = R_D \Big( C_{gdDP} + C_{dbDP} + C_{LATCH} \Big) \tag{22}$$

$$\tau_{RD} = R_D C_{RD}. \tag{23}$$

Like $\tau_1$ and $\tau_2$, $\tau_{3MOS}$ is independent of $I_{TAIL}$ (from (17) and (18), $R_D$ is inversely proportional to $I_{TAIL}$, while the capacitances in (22) are proportional to it). The intrinsic time constant of the pull-up load, $\tau_{RD}$, instead exhibits a dependence on $I_{TAIL}$ that changes with the type of load adopted [40]. In particular, for a resistive load, $\tau_{RD}$ decreases as $1/I_{TAIL}^2$ up to high bias current values, whereas, for a triode-biased PMOS load, the behavior of $\tau_{RD}$ versus the bias current depends on how the value of the equivalent resistance is set. If the equivalent resistance is set by modifying the channel length, $\tau_{RD}$ decreases as $1/I_{TAIL}^2$, whereas if it is set by modifying the gate bias voltage $V_{GATE}$, $\tau_{RD}$ decreases as $1/I_{TAIL}$. In both cases, this behavior holds only up to a fairly low bias current, above which $\tau_{RD}$ remains constant (this current corresponds to the PMOS triode resistance obtained with a minimum-size device [40]).
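The current dependence of $\tau_{RD}$ described above can be sketched with two deliberately simple load models, which are assumptions of this example rather than the models of [40]: a fixed-width resistor whose parasitic capacitance grows in proportion to its resistance, and a minimum-size triode PMOS tuned through $V_{GATE}$ (fixed capacitance) that has to be widened once $R_D = \Delta V/I_{TAIL}$ falls below the lowest resistance it can provide. The channel-length-tuned PMOS case is not modeled, and all numerical values are hypothetical.

```python
def tau_rd_resistor(i_tail, delta_v, c_per_ohm):
    """Resistive load, assuming a fixed-width resistor (C_RD grows with R_D):
    R_D = Delta V / I_TAIL and C_RD = c_per_ohm * R_D, so tau_RD ~ 1/I_TAIL**2."""
    r_d = delta_v / i_tail
    return r_d * (c_per_ohm * r_d)


def tau_rd_pmos_vgate(i_tail, delta_v, c_min, r_min):
    """Triode PMOS tuned via V_GATE, assumed minimum-size with fixed capacitance
    c_min: tau_RD = c_min * R_D ~ 1/I_TAIL while R_D >= r_min (lowest resistance
    of the minimum-size device). Above that current the device is widened, R and
    C scale inversely, and tau_RD stays at c_min * r_min."""
    r_d = delta_v / i_tail
    return c_min * max(r_d, r_min)


# Hypothetical values: 0.02 fF per kOhm of resistor; 0.3 fF / 5 kOhm PMOS load.
DV, C_PER_OHM, C_MIN, R_MIN = 0.3, 2e-20, 0.3e-15, 5e3
for i_ua in (5, 10, 20, 60, 120):
    i = i_ua * 1e-6
    print(f"I_TAIL = {i_ua:3d} uA: resistor {tau_rd_resistor(i, DV, C_PER_OHM) * 1e12:6.2f} ps, "
          f"PMOS (V_GATE) {tau_rd_pmos_vgate(i, DV, C_MIN, R_MIN) * 1e12:5.2f} ps")
```

Halving the current quadruples $\tau_{RD}$ for the resistor and doubles it for the $V_{GATE}$-tuned PMOS below the knee current ($\Delta V/r_{min}$, 60 µA with these values), while above the knee the PMOS value no longer changes.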

In order to compute the load capacitance $C_L$, recall that our divider core is based on a *DFF* with unity feedback, as shown in Fig. 1b. Therefore, $C_L$ is the sum of the input capacitance $C_{in}$ of the track NMOS differential pair and a second term which, when a level shifter is used as *DIV2* input, is equal to the input capacitance $C_{iLS}$ of the level shifter in the following stage (see Fig. 3). Hence, (8) can be rewritten as:

$$t_{pLATCH} = \ln 2 \left( \tau_{MOS} + \tau_{RD} + \frac{C_{iLS}\Delta V}{I_{TAIL}} \right) \tag{24}$$

where $\tau_{MOS} = \tau_1 + \tau_2 + \tau_{3MOS} + R_D C_{in}$, which collects all the contributions of the MOS devices that are independent of $I_{TAIL}$, is generally dominant over the other terms of (24) (because of the capacitive load of the current mirror, the largest contribution to $\tau_{MOS}$ comes from $\tau_1$).

As for $\tau_{RD}$, in typical VLSI applications, where silicon area has to be minimized and a PMOS triode load is therefore adopted, it is practically independent of the current, except at very low tail currents. Moreover, even with a resistive load, since $\tau_{RD}$ is inversely proportional to $I_{TAIL}^2$, its contribution to the overall propagation delay can be neglected (especially if high-resistivity polysilicon resistors are used) [40].

In conclusion, relationship (24) shows that the propagation delay of the latch core, $t_{pLATCH}$, can be assumed independent of $I_{TAIL}$, except at very low current values. Thus, in an FMCML latch loaded by a level shifter, the optimal tradeoff between high speed and low power-delay product (*PDP*) is usually achieved at a low bias current since, to a very good approximation, the maximum speed is independent of the current and the delay is set by the intrinsic time constant of the gate, $\tau_{MOS}$.

It is worth noting that this behavior differs both from that of conventional MCML [28], [36], [41] and from that of the FMCML latch without a level shifter load [33], [34]. Indeed, in the case considered here, the time constant due to the current mirror, which is absent in conventional MCML, makes the $I_{TAIL}$-independent contribution $\tau_{MOS}$ much more dominant over the other terms, while the presence of the level shifter makes the latch load negligible.
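A minimal numerical sketch of (24), assuming hypothetical values for $\tau_{MOS}$, $\tau_{RD}$ (taken as constant, as for a PMOS triode load) and $C_{iLS}$, illustrates how quickly the latch delay settles towards $\ln(2)\tau_{MOS}$ as $I_{TAIL}$ increases. The values are illustrative and are not extracted from the 28nm design.

```python
import math


def t_platch(i_tail, tau_mos, tau_rd, c_ils, delta_v):
    """Eq. (24): latch-core delay versus tail current."""
    return math.log(2) * (tau_mos + tau_rd + c_ils * delta_v / i_tail)


# Hypothetical values: tau_MOS = 40 ps, tau_RD = 2 ps (constant, PMOS triode
# load), C_iLS = 0.1 fF, Delta V = 0.3 V.
TAU_MOS, TAU_RD, C_ILS, DV = 40e-12, 2e-12, 0.1e-15, 0.3
for i_ua in (5, 10, 20, 40):
    t = t_platch(i_ua * 1e-6, TAU_MOS, TAU_RD, C_ILS, DV)
    print(f"I_TAIL = {i_ua:2d} uA: t_pLATCH = {t * 1e12:5.2f} ps "
          f"({t / (math.log(2) * TAU_MOS):.3f} x ln2*tau_MOS)")
```

Since the power of the latch core grows linearly with $I_{TAIL}$ while the delay barely changes above the lowest currents, the latch-core PDP is minimized by keeping $I_{TAIL}$ low, in line with the conclusion drawn from (24).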

## C. Level shifter current-domain behavior

Let us now consider the propagation delay of the level shifter. By inspection of (5), all the capacitances except $C_{LB}$ (which is the input capacitance of a PMOS pair and is therefore proportional to the core-latch tail current $I_{TAIL}$) are directly proportional to $W_{LS}$, hence to the level shifter bias current $I_B$ through (20); the same holds for the conductances in the denominator of (5). Thus, the propagation delay is given by

$$t_{pLS} = \left(\tau_B + \frac{\tau_{LB}}{K_{LS}}\right) \ln(2) \tag{25}$$

where

$$K_{LS} = \frac{I_B}{I_{TAIL}} \tag{26}$$

is the ratio of the level shifter bias current to the tail current of the latch core (i.e., the level shifter bias current normalized to that of the core latch). Moreover, to highlight the dependence on the bias current ratio $K_{LS}$, we have rewritten $\tau_p$ as the sum of two time constant contributions:

•  $\tau_B$, which accounts for all the capacitances except $C_{LB}$ and is independent of the bias current ratio;

•  $\tau_{LB}$, the time constant due to the level shifter load $C_{LB}$ evaluated at $K_{LS} = 1$, which is divided by $K_{LS}$ in (25) (i.e., its contribution is inversely proportional to the level shifter bias current).

Note that the additional factor in the exact expression of (7) is not only negligible but, since $\tau_z$ is independent of the level shifter bias current, it also has a negligible dependence on the ratio $K_{LS}$<sup>3</sup>.

From (25) it is apparent that, by setting $K_{LS}$ sufficiently high (i.e., the level shifter bias current $I_B$ sufficiently higher than the tail current $I_{TAIL}$), the contribution of the level shifter load $C_{LB}$ becomes negligible with respect to the other capacitive contributions, thus reducing the level shifter propagation delay down to its asymptotic minimum value, $\tau_B \ln(2)$.
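Since (25) decreases monotonically towards $\tau_B \ln(2)$, one can ask how large $K_{LS}$ must be for $t_{pLS}$ to come within a fraction $\varepsilon$ of that asymptote; from (25), this requires $K_{LS} \geq \tau_{LB}/(\varepsilon\tau_B)$. The sketch below evaluates this bound using, as an assumption, the rough ratio $\tau_{LB}/\tau_B \approx 1.5$ given by the estimates (37a)-(37b) of Section IV.

```python
import math


def t_pls(k_ls, tau_b, tau_lb):
    """Eq. (25): level-shifter delay versus K_LS = I_B / I_TAIL."""
    return (tau_b + tau_lb / k_ls) * math.log(2)


def k_ls_for_excess(eps, tau_b, tau_lb):
    """Smallest K_LS keeping t_pLS within a fraction eps of tau_B*ln(2)."""
    return tau_lb / (eps * tau_b)


TAU_B, TAU_LB = 1.0, 1.5   # normalized to tau_n, rough estimates of (37a)-(37b)
for eps in (0.5, 0.2, 0.1):
    k = k_ls_for_excess(eps, TAU_B, TAU_LB)
    print(f"within {eps:.0%} of tau_B*ln2: K_LS >= {k:.1f}")
```

Under these assumptions the bound is 3, 7.5 and 15, so approaching the asymptote within 10% would require $I_B \approx 15\,I_{TAIL}$, which illustrates why maximizing the level shifter speed quickly becomes expensive in terms of current.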

# IV. FMCML DFF AND DIV2 DESIGN

In this section we focus on the design of the basic frequency divider cell and present design guidelines for the frequency divider under various design constraints.

## A. DIV2 design strategy

In a $2^N$ frequency divider targeting high-speed performance, the first *DIV2* cell, which is clocked at the full input frequency, has to be designed for the minimum possible propagation delay $t_{CKQ}$. Thus, considering that the *DIV2* clock-to-Q propagation delay is equal to

$$
\begin{aligned}
t_{CKQ} &= t_{pLS} + t_{pLATCH} \\
&= \ln(2) \left[ \left( \tau_B + \frac{\tau_{LB}}{K_{LS}} \right) + \left( \tau_{MOS} + \tau_{RD} + \frac{C_{iLS}\Delta V}{I_{TAIL}} \right) \right] \\
&\approx \ln(2) \left( \tau_{MOS} + \tau_B + \frac{\tau_{LB}}{K_{LS}} \right),
\end{aligned}
\tag{27}
$$

we find that $I_{TAIL}$ has practically no effect on the delay. We can therefore choose a low value for this current while still obtaining $t_{pLATCH}$ almost equal to $\ln(2)\tau_{MOS}$. This choice also guarantees the minimum power-delay product (*PDP*) of the latch core. Concerning the level shifter design, a sufficiently high $K_{LS}$ value should be set to minimize the propagation delay of the level shifter, and hence of the whole *DIV2*. However, this choice may clearly lead to an excessive current consumption.

In particular, considering the DIV2 power consumption

$$P_{DIV2} = (2I_B + 3I_{TAIL})V_{DD} = (2K_{LS} + 3)I_{TAIL}V_{DD}, \tag{28}$$

and multiplying it by $t_{CKQ}$ in (27), we find that the *DIV2 PDP* has a hyperbolic behavior with a minimum at

$$K_{LS\min,PDP} = \sqrt{\frac{3}{2} \frac{\tau_{LB}}{\tau_B + \tau_{MOS}}} \tag{29}$$

Since $\tau_{LB}$ is typically much smaller than $\tau_B + \tau_{MOS}$, $K_{LSmin,PDP}$ in (29) is well below one, and hence much lower than the value that allows the maximum speed performance.
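The optimum (29) can be cross-checked by brute force. The sketch below drops the constant factor $\ln(2)\,I_{TAIL}V_{DD}$ from the product of (27) and (28), writes $a = \tau_B + \tau_{MOS}$ and $b = \tau_{LB}$ (normalized to $\tau_n$), and compares (29) with a grid search; $a = 20$ is an assumed value, while $b = 1.5$ follows the rough estimate (37b).

```python
import math


def pdp(k_ls, a, b):
    """Normalized DIV2 PDP, (2K+3)*(a + b/K): product of (27) and (28) without
    the constant ln(2)*I_TAIL*V_DD; a = tau_B + tau_MOS, b = tau_LB."""
    return (2 * k_ls + 3) * (a + b / k_ls)


def k_ls_min_pdp(a, b):
    """Closed-form optimum of Eq. (29)."""
    return math.sqrt(1.5 * b / a)


a, b = 20.0, 1.5   # units of tau_n; a is assumed, b follows (37b)
k_closed = k_ls_min_pdp(a, b)
# Brute-force grid over K_LS = 0.010 ... 5.000 in steps of 0.001
k_grid = min((n / 1000 for n in range(10, 5001)), key=lambda k: pdp(k, a, b))
print(f"Eq. (29): K_LS = {k_closed:.3f}, grid search: K_LS = {k_grid:.3f}")
```

Both approaches give $K_{LS} \approx 0.335$ for these values, well below unity as stated above.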

To find the minimum energy-delay product (EDP), which represents the optimum tradeoff between energy per operation and speed, we start from the following expression of the EDP

$$EDP = PDP \cdot t_{CKQ} = [\ln(2)]^{2} (2K_{LS} + 3) \left(\tau_{B} + \tau_{MOS} + \frac{\tau_{LB}}{K_{LS}}\right)^{2} I_{TAIL} V_{DD}, \tag{30}$$

<sup>3</sup> Its expression versus $K_{LS}$ is $1 - \tau_z/\tau_p = 1 - \tau_z/(\tau_B + \tau_{LB}/K_{LS})$.

then, setting the derivative of (30) with respect to $K_{LS}$ to zero in order to find the minimum, we obtain

$$2(\tau_B + \tau_{MOS})^2 K_{LS}^3 - 2[\tau_{LB} + 3(\tau_B + \tau_{MOS})]\tau_{LB} K_{LS} - 6\tau_{LB}^2 = 0. \tag{31}$$

Neglecting the last term, we then find the $K_{LS}$ value that achieves the minimum *EDP*

$$K_{LS\min,EDP} = \sqrt{\left(\frac{\tau_{LB}}{\tau_B + \tau_{MOS}}\right)^2 + 3\frac{\tau_{LB}}{\tau_B + \tau_{MOS}}} \approx \sqrt{3\frac{\tau_{LB}}{\tau_B + \tau_{MOS}}} = \sqrt{2}K_{LS\min,PDP} \tag{32}$$
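Similarly, the approximation (32) can be checked by minimizing (30) numerically. The sketch below applies a golden-section search to $(2K_{LS}+3)(a + b/K_{LS})^2$ with the same normalized, assumed values $a = \tau_B + \tau_{MOS} = 20$ and $b = \tau_{LB} = 1.5$, and compares the result with both forms of (32) and with $\sqrt{2}K_{LSmin,PDP}$.

```python
import math


def edp(k_ls, a, b):
    """Normalized DIV2 EDP from Eq. (30), without ln(2)**2 * I_TAIL * V_DD;
    a = tau_B + tau_MOS, b = tau_LB."""
    return (2 * k_ls + 3) * (a + b / k_ls) ** 2


def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(x1) < f(x2):
            hi, x2 = x2, x1          # minimum lies in [lo, x2]
            x1 = hi - g * (hi - lo)
        else:
            lo, x1 = x1, x2          # minimum lies in [x1, hi]
            x2 = lo + g * (hi - lo)
    return 0.5 * (lo + hi)


a, b = 20.0, 1.5        # units of tau_n; assumed values
r = b / a
k_num = golden_section_min(lambda k: edp(k, a, b), 0.01, 10.0)
k_32 = math.sqrt(r ** 2 + 3 * r)   # Eq. (32), first form
k_32s = math.sqrt(3 * r)           # Eq. (32), simplified form
k_pdp = math.sqrt(1.5 * r)         # Eq. (29)
print(f"numerical: {k_num:.3f}, Eq. (32): {k_32:.3f}, "
      f"sqrt(3r): {k_32s:.3f}, sqrt(2)*K_PDP: {math.sqrt(2) * k_pdp:.3f}")
```

Setting the derivative of (30) to zero and factoring out the positive term $(\tau_B+\tau_{MOS})K_{LS} + \tau_{LB}$ leaves $(\tau_B+\tau_{MOS})K_{LS}^2 - \tau_{LB}K_{LS} - 3\tau_{LB} = 0$, whose positive root is $r/2 + \sqrt{r^2/4 + 3r}$ with $r = \tau_{LB}/(\tau_B+\tau_{MOS})$. The numerical optimum (about 0.513 here) matches this root, while (32) gives about 0.480, slightly lower because the constant term of (31) is neglected.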

## B. DIV2 approximate estimation and remarks

To make a rough comparison among the design strategies presented above, let us take as reference time constant $\tau_n = C_n/g_{mn}$, i.e., the ratio of the NMOS input capacitance $C_n$ (approximately equal to $C_{gs}+C_{gd}$) to the NMOS transconductance. Similarly, for PMOS transistors we define $\tau_p = C_p/g_{mp}$, i.e., the ratio of the PMOS input capacitance $C_p$ to the PMOS transconductance. As an additional hypothesis, we assume that the PMOS transistors are sized for the same transconductance as the NMOS devices (i.e., $g_{mn} = g_{mp} = g_m$). With this design choice, the PMOS input capacitance is related to the NMOS one through the factor $\mu_n/\mu_p$ (i.e., $C_p = (\mu_n/\mu_p)C_n$). We also assume $W_n = W_{DP}$, which implies that their transconductances differ if the overdrive voltage is the same (see (19)).

Considering for simplicity  $\mu_n/\mu_p = 2$  and  $C_{gs} = C_{gd}$ , and neglecting the junction contribution, we can rewrite (9a) and (10) as follows:

$$\tau_1 \approx 4\tau_n + \tau_p/2 = 5\tau_n \tag{33}$$

$$\tau_2 \approx 1.5\sqrt{2}\tau_{\rm n}.\tag{34}$$

Moreover, from (22) and (17) and assuming  $C_{LATCH} = 2C_n$ , we get

$$\tau_{3MOS} \approx 1.5 A_V \tau_n. \tag{35}$$

Finally, for the term $R_D C_{in}$, which is due to the feedback connection to the *D* input, we get

$$R_D C_{in} \approx 1.5 A_V \tau_n. \tag{36}$$

Regarding the level shifter, adopting the same simplifying assumptions and computing $C_{LB}$ as the sum of the gate-source contribution and twice the gate-drain contribution (due to the Miller effect), from (5) we can approximate the terms in (25) as

$$\tau_B \approx \tau_n \tag{37a}$$

$$\tau_{LB} \approx 1.5 \tau_n. \tag{37b}$$

Thus, substituting (33)-(37) into (27) we get

$$t_{CKQ} \approx \ln(2) \left( 8.12 + 3A_V + \frac{1.5}{K_{LS}} \right) \tau_n. \tag{38}$$
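For reference, the constant in (38) can be traced by collecting (33)-(37) term by term, with $1.5\sqrt{2} \approx 2.12$; the following worked expansion uses only the approximations stated above:

$$
\begin{aligned}
\tau_{MOS} &= \tau_1 + \tau_2 + \tau_{3MOS} + R_D C_{in} \approx \left(5 + 1.5\sqrt{2} + 1.5A_V + 1.5A_V\right)\tau_n \approx \left(7.12 + 3A_V\right)\tau_n \\
\tau_B + \tau_{MOS} &\approx \left(8.12 + 3A_V\right)\tau_n \\
\frac{\tau_{LB}}{K_{LS}} &\approx \frac{1.5}{K_{LS}}\,\tau_n
\end{aligned}
$$

Inserting these three terms into the approximate form of (27) gives (38).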

Relationship (38) can be used to roughly estimate the propagation delay increase of the minimum-PDP and minimum-EDP designs. In particular, evaluating (38) with $K_{LSmin,PDP}$ and $K_{LSmin,EDP}$ under the above approximations, i.e., equal to 0.33 and 0.47, respectively, we find a propagation delay increase with respect to the minimum propagation delay (i.e., (38) evaluated with $K_{LS}=2$) of about 25% and 16% for the minimum-PDP and minimum-EDP designs, respectively. From the power consumption perspective, we can compare the optimum-PDP and optimum-EDP designs with the case $K_{LS}=2$, which approaches the minimum propagation delay: in this case, we find a power reduction of 48% and 44%, respectively. Even when compared with the design case $K_{LS}=1$ (which is farther from the minimum propagation delay), the optimum-PDP and optimum-EDP designs reduce the power consumption by 27% and 21%, respectively.
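The power figures quoted above follow directly from (28). The sketch below normalizes the DIV2 power to $I_{TAIL}V_{DD}$ and reproduces the percentages for the $K_{LS}$ values 0.33 and 0.47 given in the text; the delay side is not included, because it depends on the value of $A_V$ assumed in (38), which is not specified here.

```python
def relative_power(k_ls):
    """DIV2 power from Eq. (28), normalized to I_TAIL * V_DD."""
    return 2 * k_ls + 3


designs = {"min PDP": 0.33, "min EDP": 0.47}   # K_LS values quoted in the text
for k_ref in (2.0, 1.0):
    for name, k in designs.items():
        saving = 1 - relative_power(k) / relative_power(k_ref)
        print(f"{name} vs K_LS = {k_ref:g}: power saving = {saving:.0%}")
```

The printed savings are 48% and 44% with respect to $K_{LS}=2$, and 27% and 21% with respect to $K_{LS}=1$.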

TABLE I. MAIN PROCESS PARAMETERS OF THE 28 NM FD-SOI CMOS TECHNOLOGY

| Parameter        | Value      |
|------------------|------------|
| $\mu_n C_{ox}$   | 210 μA/V²  |
| $\mu_p C_{ox}$   | 78 μA/V²   |
| $V_{TH}^*$       | 0.35 V     |
| $W_{min}$        | 80 nm      |
| $L_{min}$        | 28 nm      |

\*In FDSOI processes  $V_{TH}$  can be adjusted by means of body bias. In our design the body of NMOS and PMOS devices has been connected to ground and  $V_{DD}$  respectively.

TABLE II. DESIGN PARAMETERS FOR THE FMCML DFF IN FIG. 3

| Parameter              | Folded DFF in Fig. 3 |
|------------------------|----------------------|
| $L$                    | 28 nm                |
| $\Delta V$             | 300 mV               |
| $I_{TAIL}$             | 10 µA                |
| $I_B$                  | 20 µA                |
| $R_D$                  | 30 kΩ                |
| $W_D/V_{GATE}$         | 150 nm / 120 mV      |
| $W_{1,2}$              | 1000 nm              |
| $W_{3,4,5,6}$          | 250 nm               |
| $W_{7,8,7A,8A}$        | 250 nm               |
| $W_{9,10,11,12}$       | 500 nm               |
| $W_{9A,10A,11A,12A}$   | 500 nm               |
| $W_{LS1,2}$            | 250 nm               |

### C. Simulation results and validation

In order to validate the above analysis and considerations, we simulated in Cadence Virtuoso the FMCML *DFF* in Fig. 3 and the resulting *DIV2* cell in a commercial 28nm FD-SOI CMOS technology from STMicroelectronics [42], whose main technology parameters are reported in Table I. It is worth noting that in this technology $C_{sb} \approx C_{db}$ is about 15% of $C_{gs}$, while $C_{gd}$ is about 90% of $C_{gs}$<sup>4</sup>.

Following the design strategy suggested above and, in particular, setting all devices with minimum gate length to minimize parasitic capacitances, and gate widths according to the required noise margin and static gate-source voltages, we find the transistor dimensions reported in Table II for a reference  $I_{TAIL}$  and  $I_B$  of  $10\mu A$  and  $20\mu A$  respectively. Gate widths have then been scaled with the currents  $I_B$  and  $I_{TAIL}$  to keep the biasing conditions as constant as possible (also the number of gate fingers has been scaled with the currents). Moreover, the minimum value for both the  $I_B$  and  $I_{TAIL}$  currents to avoid operation in sub-threshold region is about  $5\mu A$ .

The behavior of the time constants $\tau_1$, $\tau_2$, and $\tau_{3MOS} + R_D C_{in}$ versus $I_{TAIL}$ is plotted in Fig. 6. As expected, all three time constants remain almost constant as the core latch tail current increases; among them, $\tau_1$ is the largest, while $\tau_2$ is significantly lower than $\tau_1$ and also lower than $\tau_{3MOS} + R_D C_{in}$.



Fig. 6. Latch time constants behavior versus current  $I_{TAIL}$ .



Fig. 7. Dependence of level shifter time constants on the bias current  $I_B$ : a) pole time constant; b) pole-zero time constant ratio.

Regarding the level shifter, Fig. 7a shows the pole time constant $\tau_p$ versus the level shifter bias current, while Fig. 7b shows the ratio $\tau_p/\tau_z$ versus the same current, to confirm that the zero time constant $\tau_z$ can be neglected in (7). From Fig. 7a, it is apparent that the level shifter contribution can significantly affect the *DFF* time response: its pole time constant can be lower than $\tau_2$ for sufficiently high bias currents, but its contribution to the clock-to-Q propagation delay $t_{CKQ}$ becomes appreciable for lower bias currents.

<sup>4</sup> The weight of $C_{gd}$ with respect to $C_{gs}$ can be similar in other nanometer technologies; for example, in a 65-nm CMOS process $C_{gd}$ is about 75% of $C_{gs}$.



Fig. 8. Propagation delay, *PDP* and *EDP* of the *DIV2* cell vs. the bias current ratio  $I_B/I_{TAIL}$ .



Fig. 9. Clock-to-output propagation delay and PDP vs $I_{TAIL}$, for $K_{LS}$=0.33 and $K_{LS}$=1.



Fig. 10. Estimated maximum input frequency of the divider vs  $I_{TAIL}$ , for  $K_{LS}$ =0.33 and  $K_{LS}$ =1.

The $t_{CKQ}$, the power-delay product *PDP* and the energy-delay product *EDP* of the *DIV2* cell are plotted in Fig. 8 versus the current ratio $K_{LS}$. As expected, the minimum *PDP* and the minimum *EDP* in Fig. 8 occur at $K_{LS}$ values related by a factor of about $\sqrt{2}$, close to the values estimated in the previous subsection.

To further validate the analysis, Fig. 9 reports $t_{CKQ}$ and the *PDP* versus the core latch bias current $I_{TAIL}$ for two key $K_{LS}$ values: 0.33, which corresponds to the minimum *PDP*, and 1, which gives a speed performance close to the ideal maximum. Fig. 10 shows the estimated maximum input frequency of the divider, $f_{MAX} = 1/(2t_{CKQ})$, as a function of $I_{TAIL}$ for the same two values of $K_{LS}$.

To validate the clock-to-Q propagation delay model in (27), we compared the simulated propagation delay (shown in Fig. 9a) with that obtained from (27) using the time constants in Figs. 6 and 7, for $K_{LS}=1$. For $I_{TAIL}$ ranging from 5 to 50 $\mu$A, the average relative error is 1.76% and the maximum error is below 7.9%. With the simplified model that neglects the effect of the zero in the level shifter, the average and maximum errors are 3% and 9.16%, respectively.
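The error figures above can be computed as in the minimal Python sketch below, which assumes that the errors are taken relative to the simulated delay at each $I_{TAIL}$ point. The function name and the two sample delays are hypothetical and are not the data of Fig. 9a.

```python
from typing import Sequence, Tuple

def relative_errors(model: Sequence[float], simulated: Sequence[float]) -> Tuple[float, float]:
    """Mean and maximum relative error of model values with respect to simulated ones."""
    if len(model) != len(simulated) or len(model) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    errors = [abs(m - s) / abs(s) for m, s in zip(model, simulated)]
    return sum(errors) / len(errors), max(errors)

# Hypothetical delays in ps, for illustration only (not the data of Fig. 9a)
mean_err, max_err = relative_errors([70.0, 75.0], [72.0, 75.0])
print(f"average {mean_err:.2%}, maximum {max_err:.2%}")
```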

We have also studied the effect of process, supply voltage and temperature (PVT) variations on the clock-to-Q delay. For $I_{TAIL}=I_B=10\ \mu\text{A}$, the delay remains between 72 and 79 ps over a -20 °C to 120 °C temperature range and a ±10% supply voltage variation. Among the process corners, the critical cases are those with opposite deviations for NMOS and PMOS devices, which give a delay from 64 to 87 ps.

# V. FREQUENCY DIVIDER DESIGN

In this section we focus on the design of the frequency divider following two main approaches. The first, and simpler, uses identical *DIV2* cells, whereas the second adopts optimized *DIV2* cells with different bias currents in the different stages.

### A. Design with equal DIV2

If the $2^N$ frequency divider is made up of N identical *DIV2* stages, three design strategies can immediately be identified: maximum speed, minimum *PDP* or minimum *EDP*.

In particular, if we want the $2^N$ frequency divider with the highest speed performance, all the *DIV2* cells have to be designed for the minimum $t_{CKQ}$. As shown in section IV, $t_{CKQ}$ is minimized in a power-conscious way by setting $I_{TAIL}$ at its minimum value, since $t_{CKQ}$ is almost constant with $I_{TAIL}$, and by setting the level shifter bias current $I_B$ to at least twice $I_{TAIL}$, so that the level shifter delay contribution becomes negligible. This strategy, of course, results in a higher power consumption than the other two.

For the other two design cases we simply have to set $I_B$ lower than $I_{TAIL}$, according to (29) or (32) for the minimum *PDP* or the minimum *EDP*, respectively. Since all the *DIV2* cells are equal, the speed and power consumption reductions are those estimated in section IV.B.

### B. Frequency divider design with customized DIV2

A more optimized design strategy, especially when both maximum speed and minimum power dissipation are sought, exploits the fact that each *DIV2* cell operates at half the frequency of the previous one. Hence, using again the considerations of the previous section, we can adapt the level shifter bias current $I_B$ of each *DIV2* cell with respect to that of the previous cell. More specifically, we can design *DIV2* cell *i* with its $K_{LS,i}$ set to allow a $t_{CKQ}$ twice that of *DIV2* cell *i*-1. Thus, from (27) we can write

$$\tau_{MOS} + \tau_B + \frac{\tau_{LB}}{K_{LS,i}} = 2\left(\tau_{MOS} + \tau_B + \frac{\tau_{LB}}{K_{LS,i-1}}\right), \tag{39}$$

that provides

$$K_{LS,i} = \frac{\tau_{LB}}{\tau_{MOS} + \tau_B + 2\frac{\tau_{LB}}{K_{LS,i-1}}} = \frac{\tau_{LB}}{\tau_{MOS} + \tau_B} \alpha_i \tag{40}$$

where the coefficients

$$\alpha_i = \frac{\alpha_{i-1}}{2 + \alpha_{i-1}} \tag{41}$$

have the numerical values reported in Table III when the divider is designed for the maximum speed performance, which requires $K_{LS,1} \rightarrow \infty$ and thus $\alpha_2=1$.

TABLE III. $\alpha_i$ VALUES

| $i$        | 2 | 3     | 4     | 5      |
|------------|---|-------|-------|--------|
| $\alpha_i$ | 1 | $1/3$ | $1/7$ | $1/15$ |
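The recursion (41) and the scaling (40) are easy to tabulate. The Python sketch below uses exact fractions to regenerate Table III from $\alpha_2 = 1$; element 0 of each returned list corresponds to cell $i = 2$. The ratio $\tau_{LB}/(\tau_{MOS} + \tau_B)$ is left as an input because its value depends on the design, and the function names are illustrative.

```python
from fractions import Fraction

def alpha_coefficients(n_stages: int) -> list:
    """alpha_i for i = 2..n_stages from recursion (41), starting at alpha_2 = 1.

    alpha_2 = 1 is the limit of (41) for K_LS,1 -> infinity (first cell at maximum speed).
    By induction the recursion gives alpha_i = 1 / (2**(i-1) - 1).
    """
    alphas = [Fraction(1)]
    for _ in range(3, n_stages + 1):
        prev = alphas[-1]
        alphas.append(prev / (2 + prev))  # alpha_i = alpha_{i-1} / (2 + alpha_{i-1})
    return alphas

def k_ls_values(ratio: float, n_stages: int) -> list:
    """K_LS,i = ratio * alpha_i from (40), with ratio = tau_LB / (tau_MOS + tau_B)."""
    return [ratio * float(a) for a in alpha_coefficients(n_stages)]

print([str(a) for a in alpha_coefficients(5)])  # ['1', '1/3', '1/7', '1/15'], i.e. Table III
```

The closed form in the docstring follows by induction: if $\alpha_{i-1} = 1/(2^{i-2}-1)$, then $\alpha_i = 1/(2/\alpha_{i-1} + 1) = 1/(2^{i-1}-1)$, so each further cell needs roughly half the $K_{LS}$ of the previous one.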

Following this strategy while pursuing the best speed performance, we can save a significant amount of power. For example, for a static frequency divider with N=3, the optimized strategy reduces the power consumption by about 38% with respect to the implementation with equal *DIV2* cells, without degrading the maximum speed. Indeed, from (28) and considering equal *DIV2* cells designed for the maximum speed performance, the power consumption is

$$P_T = 7NI_{TAIL}V_{DD}, \tag{42}$$

whereas if only the first cell is designed for the best speed performance, again from (28) and using (40) and (41), which allow the level shifter power consumption of *DIV2* cells 2 to N to be neglected, we find

$$P_T = \left(7 + 3(N - 1)\right)I_{TAIL}V_{DD}. \tag{43}$$
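Comparing (42) and (43) directly gives the saving quoted above. The Python sketch below evaluates both expressions normalized to $I_{TAIL}V_{DD}$; the function names are illustrative.

```python
def power_equal(n: int) -> float:
    """(42): P_T / (I_TAIL * V_DD) for N identical maximum-speed DIV2 cells."""
    return 7.0 * n

def power_customized(n: int) -> float:
    """(43): first cell at maximum speed, level shifter power of cells 2..N neglected."""
    return 7.0 + 3.0 * (n - 1)

for n in (2, 3, 4):
    ratio = power_customized(n) / power_equal(n)
    print(f"N = {n}: P_T ratio {ratio:.3f}, saving {1.0 - ratio:.0%}")
```

For N = 3 the ratio is 13/21, about 0.62, i.e. a saving of about 38%. As N grows the ratio of (43) to (42) tends to 3/7, although in practice the benefit saturates for the reasons noted in the text.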

Note, however, that the procedure is useful only up to the third or fourth *DIV2* cell, beyond which the level shifter bias current becomes negligible<sup>5</sup>.

### C. Simulation results and validation

To validate the proposed design strategies, we have applied them to the design of a divide-by-8 frequency divider implemented with the FMCML logic style in the same 28-nm FDSOI CMOS technology considered in the previous section.

In the first case study (I) we have designed a frequency divider made up of N=3 identical *DIV2* stages for maximum speed performance. As a first design step we have set $I_{TAIL} = 10 \,\mu A$, which is close to the minimum value that keeps $t_{CKQ}$ almost constant with $I_{TAIL}$ (i.e., a latch propagation delay almost equal to $\ln(2)\tau_{MOS}$). Then, according to section V.A, the level shifter bias current has been set to $I_B = 20 \,\mu A$ (twice $I_{TAIL}$) to make the delay contribution of the level shifter negligible. The resulting maximum operating frequency of this divider is 13.8 GHz, with a power consumption of 168.5 $\mu W$.

It is worth noting that, since our divider core is based on a unity-feedback *DFF* (see Fig. 1b), the clock-to-output propagation delay of the basic divider cell, which sets the reciprocal of the maximum *DIV2* input frequency, is $2t_{CKQ}$. Thus, the expected maximum divider clock input frequency should be $1/(2t_{CKQ})$. However, according to simulation results, the divider can operate at clock input frequencies approaching (though slightly lower than) $1/t_{CKQ}$, at the cost of a reduced output swing in the first and second *DIV2* cells, as shown in Fig. 11. This is possible because the full output swing exceeds the minimum required to fully switch a differential pair<sup>6</sup>.

The second and third case studies are again designed with N=3 identical *DIV2* stages and $I_{TAIL} = 10 \mu A$, but with (II) $I_B = 3.3 \mu A$ (i.e., $K_{LS} = 0.33$) and (III) $I_B = 4.7 \mu A$ (i.e., $K_{LS} = 0.47$), to achieve minimum *PDP* and minimum *EDP*, respectively. The resulting maximum operating frequencies are 10.9 GHz and 11.8 GHz, with a power consumption of $87.5 \mu$W and $93.7 \mu$W, respectively. Hence, with respect to the maximum-frequency case, the minimum input clock period increases by 28% and 18%, while the power consumption is reduced by 48% and 44%, respectively. These results agree with the estimates of section IV.B (25% and 16% delay increase, 48% and 44% power reduction), with a particularly high accuracy on the power consumption reduction.

<sup>5</sup> It is also of little use to reduce the core latch bias current further, since even minimum-sized MOS transistors would then operate in the sub-threshold region, where the derived relationships are no longer accurate.

<sup>6</sup> The latches in the divider are able to switch with a lower swing, also thanks to the positive feedback in the hold pair, whereas the propagation delay is calculated under the hypothesis of a full swing. This justifies a maximum clock frequency larger than predicted by $1/(2t_{CKQ})$. To define the maximum allowable input frequency, we use the criterion that the output swing of the first *DIV2* cell must be large enough to allow correct operation of the following cells of the divide-by-8 divider.

The fourth case study (IV) concerns the frequency divider design with customized *DIV2* cells to achieve the maximum speed at a reduced power consumption. In particular, setting $I_{TAIL}$ equal to 10 µA for all the stages and, according to (40), $I_B$ equal to 20 µA, 2 µA and 1 µA for the first, second and third *DIV2*, respectively, we obtain a 14.9 GHz maximum operating frequency with a 110.2 µW power consumption.

With the customized design, the maximum operating frequency is even higher than in the maximum-speed case with all equal *DIV2* cells, owing to the slightly lower load on the first *DIV2* cell. Moreover, as expected, this best speed performance is achieved with a 35% reduction in power consumption.

Two other design cases, which pay a small price in speed but further reduce the power consumption with respect to the designs with all equal *DIV2* cells, are obtained by changing only the first *DIV2* to target a minimum *PDP* or *EDP*. In particular, changing only the $I_B$ of the first *DIV2* to 3.3 $\mu$A (V) or 4.7 $\mu$A (VI), we obtain a maximum operating frequency of 10.9 GHz (-27%) or 12.1 GHz (-18%), with a power consumption of 79.6 $\mu$W (-28%) or 81.7 $\mu$W (-26%), respectively, where the percentages are relative to case IV.

All the cases designed and analyzed are summarized in Table IV. The *DIV2* output waveforms of the frequency divider designed with customized *DIV2* cells at the maximum operating frequency are reported in Fig. 11 for an input clock signal with a period $t_{CK}$ equal to 67 ps and an amplitude of 0.3 V.

TABLE IV. SUMMARY OF CASE STUDIES

| Case study | $I_{LATCH}$ (µA) | $I_{B1} / I_{B2} / I_{B3}$ (µA) | $P_d$ (µW) | $t_{CK,MIN}$ (ps) | $f_{max}$ (GHz) |
|------------|------------------|---------------------------------|------------|-------------------|-----------------|
| I          | 10               | 20 / 20 / 20                    | 168.5      | 72                | 13.8            |
| II         | 10               | 3.3 / 3.3 / 3.3                 | 87.5       | 92                | 10.9            |
| III        | 10               | 4.7 / 4.7 / 4.7                 | 93.7       | 85                | 11.8            |
| IV         | 10               | 20 / 2 / 1                      | 110.2      | 67                | 14.9            |
| V          | 10               | 3.3 / 2 / 1                     | 79.6       | 92                | 10.9            |
| VI         | 10               | 4.7 / 2 / 1                     | 81.7       | 83                | 12.1            |

Fig. 11. Input and output waveforms of the divider designed with customized DIV2 cells at maximum speed, for an input $t_{CK}$ of 67 ps.
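As a bookkeeping check, the Python sketch below recomputes, from the Table IV entries alone, $1/t_{CK,MIN}$ (which matches $f_{max}$ within rounding) and the relative changes in period, frequency and power discussed in the text. The data are copied from Table IV; the script is not part of the design method.

```python
# Entries copied from Table IV:
# case: ((I_B1, I_B2, I_B3) in uA, P_d in uW, t_CK,MIN in ps, f_max in GHz)
TABLE_IV = {
    "I":   ((20, 20, 20), 168.5, 72, 13.8),
    "II":  ((3.3, 3.3, 3.3), 87.5, 92, 10.9),
    "III": ((4.7, 4.7, 4.7), 93.7, 85, 11.8),
    "IV":  ((20, 2, 1), 110.2, 67, 14.9),
    "V":   ((3.3, 2, 1), 79.6, 92, 10.9),
    "VI":  ((4.7, 2, 1), 81.7, 83, 12.1),
}

def rel_change(new: float, ref: float) -> float:
    """Signed relative change of `new` with respect to `ref`."""
    return new / ref - 1.0

# f_max should equal 1/t_CK,MIN within rounding (1/ps = 1000 GHz)
for case, (_, _, t_ck, f_max) in TABLE_IV.items():
    print(f"{case:>3}: 1/t_CK,MIN = {1e3 / t_ck:5.2f} GHz, table f_max = {f_max} GHz")

p = {case: row[1] for case, row in TABLE_IV.items()}
t = {case: row[2] for case, row in TABLE_IV.items()}
f = {case: row[3] for case, row in TABLE_IV.items()}

# Equal cells (II, III) versus case I
print("II, III period change vs I:", f"{rel_change(t['II'], t['I']):+.0%}", f"{rel_change(t['III'], t['I']):+.0%}")
print("II, III power change vs I: ", f"{rel_change(p['II'], p['I']):+.0%}", f"{rel_change(p['III'], p['I']):+.0%}")
# Customized cells: IV versus I, then V and VI versus IV
print("IV power change vs I:      ", f"{rel_change(p['IV'], p['I']):+.0%}")
print("V, VI f_max change vs IV:  ", f"{rel_change(f['V'], f['IV']):+.0%}", f"{rel_change(f['VI'], f['IV']):+.0%}")
print("V, VI power change vs IV:  ", f"{rel_change(p['V'], p['IV']):+.0%}", f"{rel_change(p['VI'], p['IV']):+.0%}")
```

The recomputed percentages agree with those quoted in the text to within about one percentage point, the residual differences being due to the rounding of the tabulated values.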

# VI. CONCLUSION

In this paper a methodology to design high-speed, power-efficient static frequency dividers based on the low-voltage Folded MOS Current Mode Logic (FMCML) approach has been introduced. The method is based on the analytical modeling of the propagation delay and power consumption of the DIV2 cell as a function of the bias currents $I_{TAIL}$ and $I_B$. Design guidelines for the simple case of equal DIV2 cells and for the more optimized case, which adopts customized DIV2 cells with different bias currents in the different stages, have been analytically derived for three different design scenarios: maximum speed, minimum power-delay product (*PDP*) or minimum energy-delay product (*EDP*).

The FMCML logic style is well suited for very low-voltage applications in a mixed-signal environment, and its topology with a folding current mirror introducing a large time constant makes the existing design guidelines for MCML frequency dividers unsuitable to maximize speed and minimize power consumption.

Six case studies involving the design of a divide-by-8 circuit have been carried out in a 28nm FDSOI CMOS technology. The results, summarized in Table IV, show good agreement between predicted and simulated speed and power consumption reductions. They also confirm the effectiveness of the proposed approach, which allowed the design of a divide-by-8 frequency divider with 14.9 GHz maximum operating frequency and 110.2 $\mu$W power consumption when targeting maximum speed, and with 10.9 GHz maximum operating frequency and 79.6 $\mu$W power consumption when optimizing for minimum *PDP*.

# REFERENCES

- [1] C. Lin, T. Chien and C. Wey, 'A 5.5-GHz 1-mW full-modulus-range programmable frequency divider in 90-nm CMOS process,' *IEEE Trans. Circuits and Systems Part II*, vol. 58, no. 9, pp. 550-554, Sept. 2011.
- [2] A. I. Hussein, S. Vasadi, J. Paramesh, 'A 450 fs 65-nm millimeter-wave time-to-digital converter using statistical element selection for all-digital PLLs,' *IEEE J. Solid-State Circ.*, vol. 53, no. 2, pp. 357-374, Feb. 2018.
- [3] Z. Chen, Z. Wang, H. Liu, and W. Wang, 'A 26-44 GHz programmable frequency divider for wideband mm-wave,' EDSSC 19 IEEE Int. Conf. Electron Devices and Solid-State Circuits, pp. 1-3, Jun. 2019.
- [4] G. S. Jeong, W. Kim, J. Park, T. Kim, H. Park, and D.-K. Jeong, 'A 0.015mm2 inductorless 32-GHz clock generator with wide frequency-tuning range in 28-nm CMOS technology,' *IEEE Trans. Circuits and Systems Part II*, vol. 64, no. 6, pp. 655-659, Jun. 2017.
- [5] A. Zandieh, N. Weiss, T. L. Nguyen, D. Harame, and S. P. Voinigescu, '128-GS/s ADC front-end with over 60-GHz input bandwidth in 22-nm Si/SiGe FDSOI CMOS,' BCICTS 18 IEEE BiCMOS Compound Semic. Integrated Circuits Technol. Symp., Oct. 2018.
- [6] A. Yazdi, M. M. Green, 'A 40-Gb/s full-rate 2:1 MUX in 0.18µm CMOS,' *IEEE Trans. Microwave Theory Techn.*, vol. 59, no. 11, pp. 2879-2887, Nov. 2011.
- [7] H. Won, T. Yoon, J. Han, J.-Y. Lee, J.-H. Yoon, T. Kim, J.-S. Lee, S. Lee, K. Han, J. Lee, J. Park, and H.-M. Bae, 'A 0.87 W transceiver IC for 100 Gigabit Ethernet in 40 nm CMOS,' *IEEE J. Solid-State Circ.*, vol. 50, no. 2, pp. 399-413, Feb. 2015.
- [8] F. T. Chen, J.-M. Wu, and M.-C. F. Chang, '40-Gb/s 0.7-V 2:1 MUX and 1:2 DEMUX with transformer-coupled technique for SerDes interface,' *IEEE Trans. Circuits and Systems Part I*, vol. 62, no. 4, pp. 1042-1051, Apr. 2015.
- [9] J. Lee, P. Chiang, P. Peng, L. Chen, and C. Weng, 'Design of 56 Gb/s NRZ and PAM4 SerDes transceivers in CMOS technologies,' *IEEE J. Solid-State Circ.*, vol. 50, no. 9, pp. 2061-2073, Sept. 2015.
- [10] G. Shu, W. S. Choi, S. Saxena, M. Talegaonkar, T. Anand, A. Elkholy, A. Elshazly, and P. K. Hanumolu, 'A 4-to-10.5 Gb/s continuous-rate digital clock and data recovery with automatic frequency acquisition,' *IEEE J. Solid-State Circuits*, vol. 51, no. 2, pp. 428-439, Feb. 2016.
- [11] S. Han, T. Kim, J. Kim, J. Kim, 'A 10 Gbps SerDes for wireless chip-to-chip communication,' *ISOCC 15 Int. SoC Design Conference*, pp. 17-18, 2015.

- [12] T. Alpert, F. Lang, D. Ferenci, M. Grözing and M. Berroth, 'A 28GS/s 6b pseudo segmented current steering DAC in 90nm CMOS,' *IMS 11 IEEE Int. Microwave Symp.*, pp. 1-4, Jun. 2011.
- [13] H. Huang, J. Heilmeyer, M. Grözing, M. Berroth, J. Leibrich and W. Rosenkranz, 'An 8-bit 100-GS/s distributed DAC in 28-nm CMOS for optical communications,' *IEEE Trans. Microwave Theory Techn.*, vol. 63, no. 4, pp. 1211-1218, Apr. 2015.
- [14] M. Grözing, D. Ferenci, F. Lang, T. Alpert, H. Huang, J. Boem, T. Veigel, M. Berroth, 'High-speed CMOS DACs and ADCs for broadband communication,' *IMS 13 IEEE MTT-S Int. Microwave Symp.*, 2013.
- [15] L. Kull, D. Luu, C. Menolfi, M. Brändli, P. A. Francese, T. Morf, M. Kossel, A. Cevrero, I. Ozkaya, and T. Toifl, 'A 24-72-GS/s 8-b time-interleaved SAR ADC with 2.0-3.3-pJ/conversion and >30 dB SNDR at Nyquist in 14-nm CMOS FinFET,' *IEEE J. Solid-State Circ.*, vol. 53, no. 12, pp. 3508-3516, Dec. 2018.
- [16] R. L. Schmid, A. Ç. Ulusoy, S. Zeinolabedinzadeh and J. D. Cressler, 'A comparison of the degradation in RF performance due to device interconnects in advanced SiGe HBT and CMOS technologies,' *IEEE Trans. Electron Devices*, vol. 62, no. 6, pp. 1803-1810, Jun. 2015.
- [17] U. Singh, M. M. Green, 'High-frequency CML clock dividers in 0.13-μm CMOS operating up to 38 GHz,' *IEEE J. Solid-State Circ.*, vol. 40, no. 8, pp. 1658-1661, Aug. 2005.
- [18] P. Heydari, R. Mohanavelu, 'A 40-GHz flip-flop-based frequency divider,' *IEEE Trans. Circuits and Systems Part II*, vol. 53, no. 12, pp. 1358-1362, Dec. 2006.
- [19] L. Li, P. Reynaert, M. Steyaert, 'A 60GHz 15.7mW static frequency divider in 90nm CMOS,' ESSCIRC 10 Eur. Solid-State Circuits Conf., pp. 246-249, 2010.
- [20] A. I. Hussein, J. Paramesh, 'Design and self-calibration techniques for inductor-less millimeter-wave frequency dividers,' *IEEE J. Solid-State Circ.*, vol. 52, no. 6, pp. 1521-1541, Jun. 2017.
- [21] X. Zhao, Y. Chen, P.-I. Mak, R. P. Martins, 'A 0.0018mm2 153% locking-range CML-based divider-by-2 with tunable self-resonant frequency using an auxiliary negative-gm cell,' *IEEE Trans. Circuits and Systems Part I*, vol. 66, no. 9, pp. 3330-3339, Sep. 2019.
- [22] J. Lee, B. Razavi, 'A 40-GHz frequency divider in 0.18-µm CMOS technology,' *IEEE J. Solid-State Circ.*, vol. 39, no. 4, pp. 594-601, Apr. 2004.
- [23] A. Safarian, S. Anand, P. Heydari, 'On the dynamics of regenerative frequency dividers,' *IEEE Trans. Circuits and Systems Part II*, vol. 53, no. 12, pp. 1413-1417, Dec. 2006.
- [24] Y.-H. Lin, H. Wang, 'A 35.7-64.2 GHz low power Miller divider with weak inversion mixer in 65 nm CMOS,' *IEEE Microwave Wireless Components Lett.*, vol. 26, no. 11, pp. 948-950, Nov. 2016.
- [25] C.-C. Chen, H.-W. Tsao, H. Wang, 'Design and analysis of CMOS frequency dividers with wide input locking ranges,' *IEEE Trans. Microw. Theory Techn.*, vol. 57, no. 12, pp. 3060-3069, Dec. 2009.
- [26] L. Wu, H. C. Luong, 'Analysis and design of a 0.6 V 2.2 mW 58.5-to-72.9 GHz divide-by-4 injection-locked frequency divider with harmonic boosting,' *IEEE Trans. Circuits and Systems Part I*, vol. 60, no. 8, pp. 2001-2008, Aug. 2013.
- [27] S.-L. Jang, W.-C. Lai, G.-Z. Li, Y.-W. Chen, 'High even-modulus injection-locked frequency dividers,' *IEEE Trans. Microw. Theory Techn.*, vol. 67, no. 12, pp. 5069-5079, Dec. 2019.
- [28] M. Alioto, G. Palumbo, 'Power-aware design techniques for nanometer MOS current-mode logic gates: a design framework,' *IEEE Circuits and Systems Mag.*, vol. 61, no. 4, pp. 40-59, Sep. 2006.
- [29] B. Razavi, 'The role of PLLs in future wireline transmitters,' *IEEE Trans. Circuits and Systems Part I*, vol. 56, no. 8, pp. 1786-1793, Aug. 2009.
- [30] P. Payandehnia, H. Maghami, S. Sheikhaei, A. Abbasfar, B. Forouzandeh, and K. Nanbakhsh, 'High speed CML latch using active inductor in 0.18μm CMOS technology,' *ICEE 11 Iranian Conf. Electrical Eng.*, pp. 1-4, May 2011.
- [31] K. Gupta, N. Pandey, M. Gupta, 'Analysis and design of MOS current mode logic exclusive-OR gate using triple-tail cells,' *Microelectron. J.*, vol. 44, no. 6, pp. 561-567, 2013.
- [32] K. P. Sai Pradeep, S. Suresh Kumar, 'Design and development of high performance MOS current mode logic (MCML) processor for fast and power efficient computing,' *Cluster Computing*, vol. 22, pp. 13387-13395, 2019.
- [33] G. Scotti, D. Bellizia, A. Trifiletti, G. Palumbo, 'Design of low-voltage high-speed CML D-latches in nanometer CMOS technologies,' *IEEE Trans. VLSI Systems*, vol. 25, no. 12, pp. 3509-3520, Dec. 2017.

- [34] G. Scotti, A. Trifiletti, G. Palumbo, 'A novel 0.5 V MCML D-flip-flop topology exploiting forward body bias threshold lowering,' *IEEE Trans. Circuits and Systems Part II*, vol. 67, no. 3, pp. 560-564, Mar. 2020.
- [35] M. Alioto, R. Mita, G. Palumbo, 'Design of high-speed power-efficient MOS current-mode logic frequency dividers,' *IEEE Trans. Circuits and Systems Part II*, vol. 53, no. 11, pp. 1165-1169, Nov. 2006.
- [36] R. Nonis, E. Palumbo, P. Palestri, L. Selmi, 'A design methodology for MOS current-mode logic frequency dividers,' *IEEE Trans. Circuits and Systems Part I*, vol. 54, no. 2, pp. 245-254, Feb. 2007.
- [37] U. Singh, M. Green, 'Dynamics of high-frequency CMOS dividers,' ISCAS 02 IEEE Int. Symp. Circuits and Systems, vol. 5, pp. 421-424, 2002.
- [38] W. Fang, A. Brunnschweiler, P. Ashburn, 'An analytical maximum toggle frequency expression and its application to optimizing high-speed ECL frequency dividers,' *IEEE J. Solid-State Circ.*, vol. 25, no. 4, pp. 920-931, Aug. 1990.
- [39] A. Hajimiri, 'Generalized time- and transfer-constant circuit analysis,' *IEEE Trans. Circuits and Systems Part I*, vol. 57, no. 6, pp. 1105-1121, Jun. 2010.
- [40] F. Centurelli, G. Scotti, A. Trifiletti, G. Palumbo, 'Delay models and design guidelines for MCML gates with resistor or PMOS load,' *Microelectron. J.*, vol. 99, paper 104755, May 2020.
- [41] M. Alioto, G. Palumbo, 'Power-delay optimization of D-latch/MUX source coupled logic gates,' *Int. J. Circuit Theory Appl.*, vol. 33, pp. 65-86, 2005.
- [42] D. Golanski, P. Fonteneau, C. Fenouillet-Beranger, A. Cros, F. Monsieur, N. Guillard, C.-A. Legrand, A. Dray, C. Richier, H. Beckrich, P. Mora, G. Bidal, O. Weber, O. Savod, J.-R. Manouvrier, P. Galy, N. Planes, and F. Arnaud, 'First demonstration of a full 28nm high-k/metal gate circuit transfer from Bulk to UTBB FDSOI technology through hybrid integration,' VLSI 13 IEEE Symp. VLSI Circuits, pp. 124-125, Jun. 2013.



**Francesco Centurelli** was born in Roma in 1971. He received the laurea degree (cum laude) and the Ph.D. degree in Electronic Engineering from the University of Roma "La Sapienza", Roma, Italy, in 1995 and 2000 respectively.

In 2006 he became an Assistant Professor at the DIET department of the University of Roma La Sapienza.

His research interests were initially focused on system-level analysis and design of clock recovery circuits and high-speed analog integrated circuits, and now concern the design of analog-to-digital converters and very low-voltage circuits for analog and RF applications. He has published more than 100 papers in international journals and refereed conferences, and has also been involved in R&D activities held in collaboration between Università "La Sapienza" and some industrial partners.



Giuseppe Scotti was born in Cagliari, Italy, in 1975. He received the M.S. and Ph.D. degrees in electronic engineering from the University of Rome "La Sapienza", Rome, Italy, in 1999 and 2003, respectively. In 2010, he became a Researcher (Assistant Professor) at the DIET department of the University of Rome "La Sapienza" and in 2015 he was appointed Associate Professor in the same department. He teaches undergraduate and graduate courses on basic electronics and microelectronics. His research activity was mainly concerned with integrated circuit design and focused on design methodologies able to guarantee robustness with respect to parameter variations in both analog circuits and digital VLSI circuits. In the context of analog design, his research activity was concerned with circuit topologies for the realization of low-voltage analog building blocks using ultra-short channel CMOS technology, whereas in the context of cryptographic hardware his focus has been on novel PAAs methodologies and countermeasures. He has also been involved in R&D activities held in collaboration between "La Sapienza" University and some industrial partners, which led, between 2000 and 2015, to the implementation of 13 ASICs. He has coauthored more than 45 publications in international journals and about 70 contributions in conference proceedings, and is the co-inventor of 2 international patents.



Alessandro Trifiletti was born in Rome (Italy) on October 4, 1959. In 1991 he joined the Electronic Engineering Department of "La Sapienza" University in Rome as research assistant, where he was involved in research activities dealing with analogue, RF and microwave IC design. In 2001 he became assistant professor, in 2005 he got the position of associate professor and in 2019 the position of Full Professor at the Engineering Faculty of the same University. Prof. Trifiletti has worked in the field of Microelectronics, both from the point of view of design methodologies and circuit topologies. On these subjects, Prof. Trifiletti has (co-)authored over 210 publications, of which about 80 were published in international journals and the others in the proceedings of major international Conferences (a large part of these sponsored by the IEEE). He is presently a reviewer for several IEE and IEEE journals, among them: IEEE Transactions on Microwave Theory and Techniques, IEEE Transactions on Circuits and Systems (part I and II), IEE Proceedings on Circuits, Devices and Systems, and IEE Electronics Letters. In the last 20 years he has been engaged in the coordination of research teams from DIET (previously DIE) in the framework of national and international programs, involving both industrial and academic partners. From an industrial perspective, Prof. Trifiletti's expertise covers topics about analogue and RF microelectronics, Radar and ESM systems, high-speed communication systems, security issues in cryptographic algorithms implementation, and embedded system design.



**Gaetano Palumbo** (F'07) was born in Catania, Italy, in 1964. He received the Laurea degree in Electrical Engineering in 1988 and the Ph.D. degree in 1993 from the University of Catania. In 1994 he joined the University of Catania, where he is full professor. His primary research interests are in analog and digital circuits.

He was co-author of four books by Kluwer Academic Publishers and Springer, in 1999, 2001, 2005 and 2014, respectively, and of a textbook on electronic devices in 2005. He is the author of more than 400 scientific papers in refereed international journals (190+) and conferences. Moreover, he has co-authored several patents.

He served as an Associate Editor of the *IEEE Transactions on Circuits and Systems part I* in 1999-2001, 2004-2005 and 2008-2011, and of the *IEEE Transactions on Circuits and Systems part II* in 2006-2007.

In the period 2011-2013 he served as a member of the Board of Governors of the IEEE CAS Society.

In 2005 he was one of the 12 panelists in the scientific-disciplinary area 09 - industrial and information engineering of the CIVR (Committee for Italian Research Assessment). In 2003 he received the Darlington Award.

In 2015 he was a panelist of GEV (Group of Evaluation Experts) in the scientific area 09 - *industrial and information engineering* of the ANVUR for the Assessment of Italian Research Quality in 2011-2014.