I. INTRODUCTION
ACCURATE multiphase clock generation (MPCG) is essential for applications such as time-interleaved analog-to-digital converters [1] and wireless transceivers with high image rejection and harmonic rejection [2]. Phase errors degrade performance, e.g., by generating spurious tones [3] or limiting the achievable image and harmonic rejection [4].
Phase errors originate from delay deviations in MPCG blocks, e.g., delay elements in a delay-locked loop or flip-flops (FFs) in a shift register or divider-based MPCG [5]. Delay deviation can originate from the intrinsic properties (mismatch and noise) of the FF in a divider itself or can be caused by external influences like supply noise. To reduce the effect of supply noise, current mode logic (CML) is often used. However, if the power supply noise can be adequately reduced by regulation and decoupling capacitors, the question is which type of FF offers the lowest jitter for a given amount of power. At the International Solid-State Circuits Conference, dynamic transmission gate (DTG) and standard CMOS logic dividers are increasingly used in phase-locked loops and other jitter-critical applications (e.g., [6] and [7]). The good results achieved make it plausible that the supply decoupling problem can be solved to a sufficient degree. Among the intrinsic error sources, the timing errors due to mismatch are much larger than those from device noise [8]. As mismatch is static, it only adds a skew to a one-phase clock. However, if multiple clock phases contribute to one output at different moments in time, deterministic "mismatch jitter" results [5], [9]. Although mismatch jitter can be reduced by digital calibration, this adds considerable cost and complexity. As discussed in [5] and [9], putting identical circuits in parallel (W-scaling and admittance/impedance scaling) reduces mismatch jitter at the cost of higher area and power consumption. Therefore, just comparing mismatch jitters without considering power will give a highly sizing-dependent result. Hence, we normalize jitter variance to power consumption, as in [5], and use the jitter-power figure-of-merit (FOM)

$$\mathrm{FOM} = \sigma_{tm}^{2} \cdot P_d \qquad (1)$$
where $\sigma_{tm}^{2}$ is the timing variance due to mismatch and $P_d$ is the power dissipation. This FOM has a fundamental basis and allows differently sized circuits to be compared fairly, similar to normalizing oscillator phase noise or filter SNR to power. In [7], DTG FFs (DTG-FFs) achieved very low phase errors at much lower power consumption than CML. Explorative simulations in [10] confirmed that DTG-FFs have significant advantages over CML FFs (CML-FFs) for MPCG. However, we would like to understand under which conditions (frequency and number of phases) this is true and how technology affects the conclusions. Although speed, power, and power-delay product have been analyzed extensively for several FF topologies (e.g., [11] and [12]), little work addresses the optimization of jitter-power performance. This brief hence derives analytical equations to estimate jitter, power, and FOM for both DTG-FFs and CML-FFs. Such analytical equations provide insight and can guide the initial design of FFs.
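The size independence of this FOM can be checked with a few lines. The sketch below assumes the W-scaling behavior described above: placing k identical circuits in parallel divides the mismatch-jitter variance by k and multiplies the power by k. The function names and the starting values are illustrative and are not taken from the paper.

```python
def jitter_power_fom(sigma_tm_s: float, power_w: float) -> float:
    """Jitter-power FOM: mismatch-jitter variance times power (s^2 * W)."""
    return sigma_tm_s ** 2 * power_w


def scale_in_parallel(sigma_tm_s: float, power_w: float, k: int):
    """Assumed W-scaling: k parallel copies average out mismatch.

    The variance drops by k (so sigma drops by sqrt(k)); the power rises by k.
    """
    return sigma_tm_s / k ** 0.5, power_w * k


# Hypothetical starting point: 1 ps rms mismatch jitter at 1 mW.
sigma0, p0 = 1e-12, 1e-3
for k in (1, 2, 4, 8):
    sigma_k, p_k = scale_in_parallel(sigma0, p0, k)
    print(k, sigma_k, p_k, jitter_power_fom(sigma_k, p_k))
```

Under this assumption the jitter and the power each change with k while their FOM product stays the same, which is why the FOM rather than the raw jitter is used for comparison.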
In Section II, the mismatch jitter and power consumption are modeled for DTG-FFs and CML-FFs, and in Section III, the jitter-power FOMs are compared and verified by simulations. Section IV draws conclusions.
II. FF POWER AND MISMATCH-JITTER MODELING
We will now model the mismatch jitter and power consumption for an $N$-phase MPCG/divider implemented using DTG-FFs and CML-FFs as depicted in Figs. 1 and 2 for the case $N = 4$. The differential divider outputs (e.g., pair I+, I−) will be analyzed, so that a fair comparison can be made with a CML-FF that has a differential output. To provide insight, we keep the equations simple and use first-order device equations rather than the more complicated short-channel models. Evaluating (1) for an MPCG with $N$ DTG-FFs gives the FOM of the complete MPCG in (2). For comparing FF types, an "FOM per FF" is more convenient. Thus, we divide (2) by $N$ to obtain the FF FOM in (3), assuming all FFs are identical and contribute the same mismatch jitter.
For an MPCG with CML-FFs as in Fig. 2, however, only $N/2$ FFs are required because differential outputs are already available. Thus, its FOM per FF, given in (4), is obtained by dividing by $N/2$ instead of $N$.
We assume that the presence of start-up initialization switches can be neglected and that all FFs are triggered by the same edge of a shared clock. Thus, a deterministic time shift in that clock edge is common to all the FFs and does not contribute to phase errors between clock phases. Consequently, even if a large number of cascaded buffers is used in front of an FF to drive $N$ big FFs, buffer timing errors drop out and the phase error is dominated by the FF. In contrast, if buffers are added after the FF, both the FF and the buffer contribute mismatch errors.
To minimize total jitter, buffers should be added before the FF in case it has to drive a large capacitive load. As such, buffers are generally scaled up ("tapered buffer chain"), and the overall power consumption is dominated by the FFs and the last buffer preceding the FF, justifying just one clock buffer stage in the FOM model. In a master-slave FF, the slave latch drives the load, and thus, its delay variation renders mismatch jitter. To improve FOM, the master latch can be scaled down compared to the slave latch. As this is possible for both logic families, for simplicity, we keep the master and slave latches identical.
We derive FOM equations for DTG-FFs in Section II-A and for CML-FFs in Section II-B.
A. FOM of a DTG-FF
The mismatch jitter of a DTG-FF [Fig. 1(b)] is the variation of its clock-to-output delay. The critical delay path is drawn in Fig. 3. First, we model the transmission gate (TG) delay by its equivalent RC time constant [13]. Here, we take a simplified first-order TG delay where the equivalent resistance is assumed to be constant over the transition range [14, Fig. 6.48]. Using the simple square-law MOS transistor model, the equivalent TG resistance can be obtained. From this resistance, the delay from the 50% input level to the 50% output level is given by (5),
where $C_L$ is the output capacitance, $K = \mu C_{ox} W/L$, $V_T$ is the threshold voltage, and suffixes n and p refer to nMOS and pMOS transistors, respectively. Equation (5) is valid for both high-to-low (H-L) and low-to-high (L-H) transitions. We modify the equivalent resistance by adding the driving inverter resistance (see Fig. 3) to better estimate the delay. For an L-H output transition, the pMOS in the inverter is active and operates in the triode region. The same is true for the nMOS for an H-L transition. Adding these resistances, the delay for the differential (antiphase) output, which is the average of an H-L and an L-H delay, is given by (6),
where $C_{int}$ is the load capacitance due to the TG itself. The last two resistance terms in (6) model inverter triode resistances for equally sized inverter and TG transistors. In practice, the approximation in (7) also applies. Using (7) and the ratios defined below, (6) can be written as (8),
where $r_l$ is the loading ratio of an FF, i.e., its $C_L$ expressed in terms of its input capacitance. Ratio $r_\mu$ is the pMOS-to-nMOS width ratio ($W_p/W_n$, typically 2.5, equal to the electron-to-hole mobility ratio), $\mu$ is the mobility of an nMOS transistor, and $\gamma_c$ is the drain-to-gate capacitance ratio of a MOS transistor (bias independent for simplicity). Although the delay equation (8) neglects the effect of finite rise/fall time, it gives a reasonable estimate (see Fig. 4).
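As a rough numerical illustration of this kind of first-order estimate, the sketch below computes the 50% delay of a single-pole RC stage, t = ln(2)·R·C, with the switch resistance approximated by the square-law deep-triode value 1/(K·(V_DD − V_T)). These are textbook approximations and not the paper's equations (5)-(8); the function names and all parameter values are assumptions chosen only for illustration.

```python
import math


def triode_resistance(k_a_per_v2: float, vdd: float, vt: float) -> float:
    """Deep-triode on-resistance of a square-law MOS switch: 1 / (K * (VDD - VT))."""
    return 1.0 / (k_a_per_v2 * (vdd - vt))


def rc_delay_50(r_ohm: float, c_farad: float) -> float:
    """Time for a single-pole RC step response to reach 50% of its final value."""
    return math.log(2.0) * r_ohm * c_farad


# Hypothetical values: K = 10 mA/V^2, VDD = 1.2 V, VT = 0.4 V, C = 10 fF.
r_on = triode_resistance(10e-3, 1.2, 0.4)
t_d = rc_delay_50(r_on, 10e-15)
print(f"R_on = {r_on:.1f} ohm, delay = {t_d * 1e12:.3f} ps")
```

In this simplified picture, any deviation in K or V_T changes the resistance and therefore the delay, which is the mechanism behind mismatch jitter.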
Mismatch jitter is now obtained by taking partial derivatives of (6). Applying approximation (7), and after some algebra, we obtain the mismatch-jitter variance in (9).
Here, $\sigma_{VTn}$ and $\sigma_{Kn}$ are the standard deviations of $V_{Tn}$ and $K_n$ mismatch, respectively, assumed to be the same for a pMOS. The overdrive is $V_{OD} = V_{DD} - V_{Tn}$, and $d_o$ is the normalized overdrive ratio of $V_{OD}$ with respect to $V_{DD}$. As the total equivalent device size at the output node is bigger than the FF size, its capacitive mismatch is less important than $K$ mismatch and is neglected. When used inside an MPCG, each FF's output drives another FF along with the external load. To take this into account, we replaced $r_l$ by $(r_l + 1)$ in (9). The power consumption of a CMOS inverter can be approximated in terms of its nMOS gate capacitance $C_{gn}$ as in (10),
where $f_o$ is the output clock frequency. This assumes that the dynamic charging/discharging power is dominant over short-circuit power and leakage power. The dynamic power consumption of a DTG-FF [Fig. 1(b)] is given by (11),
and the input buffer power consumption per FF is given by (12), where the input clock frequency $f_i$ is expressed as $N f_o$. With the help of (1), (3), (9), (12), and some algebra, we obtain the DTG-FF FOM in (13),
where $A_{VTn}$ and $A_{Kn}$ are the technology-dependent mismatch constants and $F_{DTG}(r_l)$ is a function that depends on the circuit topology used in the DTG-FF. For Fig. 1, it is given by (14).
Here, we approximate the MOS gate capacitance as $C_{ox} \cdot W \cdot L$. As the FOM is, by definition, independent of admittance scaling, it only makes sense to optimize the FOM of the FF by changing width ratios such as $r_\mu$ and $r_l$. We used $r_\mu = 2.5$ to match the rise and fall delays of the FF. The clock-buffer size is chosen to be close to its optimum 2.5 [15] for minimum power and mismatch-jitter product.
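For reference, the partial-derivative step behind a mismatch-jitter variance of this kind is ordinary first-order error propagation. The generic form below is a sketch, not the paper's expression: it assumes a delay $t_d$ that depends on small, independent $V_{Tn}$ and $K_n$ deviations, and it assumes the common area-scaling (Pelgrom-type) model for the mismatch constants $A_{VTn}$ and $A_{Kn}$.

```latex
\sigma_{t_d}^{2} \approx
  \left(\frac{\partial t_d}{\partial V_{Tn}}\right)^{2} \sigma_{VTn}^{2}
  + \left(\frac{\partial t_d}{\partial K_n}\right)^{2} \sigma_{Kn}^{2},
\qquad
\sigma_{VTn} = \frac{A_{VTn}}{\sqrt{W L}},
\qquad
\frac{\sigma_{Kn}}{K_n} = \frac{A_{Kn}}{\sqrt{W L}}
```

Under these assumptions, and for W-scaling at fixed $L$ with the load scaled along, the variance falls as $1/W$ while the gate capacitance and hence the dynamic power grow as $W$, so the width cancels in the variance-power product. This is consistent with the statement that only ratios such as $r_\mu$ and $r_l$ are left to optimize.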
To optimize the FOM, we see that lowering $V_{DD}$ is very effective, while short channels (small $L$) are also very beneficial. When $N$ is increased, the FOM increases via $F_{DTG}$ according to (13), assuming that $f_o$ is constant. This is expected since, for constant $f_o$ and higher $N$, $f_i$ goes up, increasing dynamic power proportionally, whereas mismatch jitter remains the same according to (11). However, if we keep $f_i$ constant and increase $N$, $f_o$ will decrease and thereby decrease the dynamic power and, hence, the FOM.
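The two scaling scenarios can be tabulated directly from the relation f_i = N·f_o. The sketch below only tracks the clock frequencies; it does not evaluate (13). The function name is illustrative, and the example frequencies (100 MHz output, 4 GHz input) are the ones used for the comparison later in this brief.

```python
def clock_frequencies(n_phases: int, f_in_hz=None, f_out_hz=None):
    """Return (f_i, f_o) for an N-phase MPCG using f_i = N * f_o.

    Exactly one of f_in_hz / f_out_hz must be given; the other is derived.
    """
    if (f_in_hz is None) == (f_out_hz is None):
        raise ValueError("specify exactly one of f_in_hz or f_out_hz")
    if f_out_hz is not None:
        return n_phases * f_out_hz, f_out_hz
    return f_in_hz, f_in_hz / n_phases


for n in (4, 8, 16):
    fi_a, fo_a = clock_frequencies(n, f_out_hz=100e6)  # constant output frequency
    fi_b, fo_b = clock_frequencies(n, f_in_hz=4e9)     # constant input frequency
    print(f"N={n:2d}  const f_o: f_i={fi_a / 1e9:.1f} GHz   "
          f"const f_i: f_o={fo_b / 1e6:.0f} MHz")
```

With the output frequency held constant, the input clock and its dynamic power rise with N; with the input frequency held constant, the output frequency falls with N instead.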
B. FOM of a CML-FF
The CML-FF in Fig. 2 (top) consists of two identical CML master-slave latches. The delay variation of a CML-FF, which is the same as that of a CML latch (Fig. 2, bottom), is derived in analogy to that of a CML buffer as in [5]. For cascaded CML buffers, the output load capacitance is dominated by the input transistors of the next CML stage. To minimize the load capacitance for the previous stage, the width of the nMOS has to be just enough to flip the bias current from one load resistor $R_b$ to the other (see the CML buffer in Fig. 2, bottom). In that case, the input transistor overdrive voltage is the same as the voltage swing $V_S$. Thus, the bias current of a CML buffer ($I_B$) or a CML latch ($I_L$) in a CML-FF can be related to its voltage swing ($V_S$) as in (15),
where $W_B$ and $W_L$ are the widths of the input transistors of the buffer and the latch, respectively. Using (15), the mismatch jitter of the FF given in [5] can be rewritten as (16). As the power consumption of a CML buffer is $V_{DD} I_B$, we obtain the CML buffer FOM from (16) in terms of basic technology, design, and mismatch parameters as (17),
where $r_{RM}$ is the ratio of the resistor and input nMOS device areas and $A_R$ is a resistor mismatch constant. Here, we ignore load capacitance mismatch (see Section II-A) for simplicity. The load capacitance is modeled via the load ratio $r_l$.
To get the FOM of a CML-FF used in an MPCG, we need to know the relation between $I_L$ and $I_B$. The ratio of $I_L$ and $I_B$ is designed such that both buffer and FF have the same output slew rate, which distributes mismatch jitter equally among cascaded stages. The clock buffer drives $N/2$ CML-FFs or $N$ CML latches, and the latch drives $(r_l + 2 + \gamma_c)$ times its total input capacitance. Thus, the buffer-to-CML-FF input transistor width ratio (also the current ratio) is given by (18).
Hence, the total power consumption per FF is given by (19).
Adapting (17) to the load condition of a CML-FF in an MPCG and using (4) and (19), we obtain the CML-FF FOM in (20),
where $F_{CML}$ is a function of $r_l$ specific to the CML MPCG, given by (21).
Two design choices can improve the FOM in (20): increasing the voltage swing (which reduces the $V_T$ mismatch effect) and reducing the load ratio (which reduces the load capacitance and delay). We simulated with a 1.2-V power supply in a 90-nm CMOS technology and used a 0.4-V voltage swing, which keeps all transistors more or less in saturation. The load ratio affects FF delay and the mismatch-jitter variance in a similar manner, so that FOM and delay are proportional when $V_{DD}$ and $V_S$ are fixed. Thus, low delay is preferred as in [5].
To compare the model with simulation, we calculated the power, mismatch jitter, and FOM using the values in Table I for a 90-nm CMOS process. We simulated four-phase MPCGs for an input frequency $f_i = 4$ GHz and a slew rate of 48 V/ns. The DTG-FF nMOS width is 16 μm, and the CML-FF ($R = 67$ Ω and $I_L = 6$ mA) input device is 55 μm so that the input capacitance, considering ratio $r_\mu$, is equal for both FFs.
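The quoted CML operating point can be sanity-checked with basic circuit relations. The sketch below assumes that the swing equals I_L·R, that static power is V_DD times the bias current, that an FF consists of two latches with equal bias current, and that f_o = f_i/N. Variable names are illustrative, and clock-buffer power is not included.

```python
V_DD = 1.2        # supply voltage (V), from the text
R_LOAD = 67.0     # CML load resistor (ohm), from the text
I_L = 6e-3        # CML latch bias current (A), from the text
F_IN = 4e9        # input clock frequency (Hz), from the text
N_PHASES = 4      # number of clock phases, from the text
SLEW = 48e9       # slew rate (V/s), i.e. 48 V/ns from the text

swing = I_L * R_LOAD       # assumed: full bias current steered into one resistor
p_latch = V_DD * I_L       # static power of one latch
p_ff = 2 * p_latch         # assumed: one FF is two latches with equal bias current
f_out = F_IN / N_PHASES    # from f_i = N * f_o
t_edge = swing / SLEW      # time to traverse the swing at the quoted slew rate

print(f"swing   = {swing:.3f} V")
print(f"P_latch = {p_latch * 1e3:.1f} mW, P_FF = {p_ff * 1e3:.1f} mW")
print(f"f_o     = {f_out / 1e9:.1f} GHz")
print(f"t_edge  = {t_edge * 1e12:.1f} ps")
```

The computed swing of about 0.40 V agrees with the 0.4-V swing stated above, which suggests the quoted R and I_L values are mutually consistent under these assumptions.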
For mismatch jitter, we ran Monte Carlo simulations with 100 iterations for "only mismatch" variations. The power consumption ($P_d$) and mismatch-jitter (Mj) model results are compared with simulation results in Fig. 5(a) and (b), respectively, for changing load capacitance. The power consumption deviates somewhat from the model due to the inaccuracy of the square-law model. Simulated DTG-FF Mj is less than modeled, as we assumed equal $A_{VT}$ for the pMOS and nMOS (actually, pMOS mismatch is less). In contrast, simulated CML-FF Mj is more than modeled due to the approximate first-order delay equation used. We accept these model errors to keep the model equations simple. The simulated delay, power, Mj, and FOM are shown in Table II for $r_l = 1$ for both FFs. A column for $r_l = 1/8$ is added for DTG, where the device width is increased while keeping the load capacitance the same. It demonstrates that DTG Mj can be pushed down by W-scaling at the cost of power, at relatively constant FOM.
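A behavioral analogue of such a mismatch-only Monte Carlo run can be sketched in a few lines. The model below is a toy under stated assumptions, not the simulated circuits: delay follows a square-law RC estimate, only V_T mismatch is drawn, sigma_VT follows area scaling A_VT/sqrt(W·L), load capacitance and power are taken proportional to width, and all numeric constants and names are hypothetical. It illustrates the W-scaling trend: jitter drops, power rises, and the FOM stays roughly constant.

```python
import math
import random
import statistics

A_VT = 4e-9          # hypothetical mismatch constant (V*m), i.e. 4 mV*um
L_GATE = 0.1e-6      # hypothetical gate length (m)
V_DD, V_T0 = 1.2, 0.4
K_PER_M = 500.0      # hypothetical K per unit width (A/V^2 per m)
C_PER_M = 1e-9       # hypothetical load capacitance per unit width (F/m)
P_PER_M = 50.0       # hypothetical power per unit width (W/m)


def delay(width_m: float, vt: float) -> float:
    """Toy RC delay: ln2 * C / (K * (VDD - VT)); C and K both scale with width."""
    return math.log(2) * (C_PER_M * width_m) / (K_PER_M * width_m * (V_DD - vt))


def monte_carlo_fom(width_m: float, runs: int, seed: int = 1):
    """Return (sigma_delay, power, FOM) from a VT-mismatch-only Monte Carlo."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sigma_vt = A_VT / math.sqrt(width_m * L_GATE)
    delays = [delay(width_m, rng.gauss(V_T0, sigma_vt)) for _ in range(runs)]
    sigma_t = statistics.stdev(delays)
    power = P_PER_M * width_m
    return sigma_t, power, sigma_t ** 2 * power


for w_um in (2, 16, 128):
    s, p, fom = monte_carlo_fom(w_um * 1e-6, runs=5000)
    print(f"W={w_um:4d} um  sigma={s:.3e} s  P={p:.3e} W  FOM={fom:.3e}")
```

The sketch uses 5000 runs rather than 100 because, for Gaussian data, a variance estimated from 100 samples itself scatters by roughly 14% (one sigma), which would blur the constant-FOM trend.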
As both the power and mismatch comparisons depend on device size, we compare FOMs to get a size-independent comparison. The FOMs are compared in Fig. 5(c). Deviations in FOM of up to about a factor of two exist; however, the difference between the two logic families is significantly larger than the model error. Expressed analytically, the FOM ratio can be found from (13) and (20) as (22),
taking into account that a DTG-MPCG needs $N$ FFs whereas the CML-MPCG needs only $N/2$ [see Figs. 1(a) and 2]. The ratio in (22), say $R_{FOM}$, can also be written as (23),
where $f_T$ is the nMOS unity-gain frequency, defined in (24).
In (23), the ratio of the FOMs can be separated into three parts. The first part has a strong technology dependence and is proportional to $f_T$. With CMOS technology downscaling, $f_T$ increases, and so does the ratio, explaining why DTG MPCGs become relatively better compared to CML in scaled CMOS technologies. The second part, the $F$-ratio term, is a function of design parameters related to circuit topology. The low capacitance in a DTG-FF, as there is no cross-coupled pair, and its fast path from clock to output help boost the ratio through a smaller $F_{DTG}$. The third part in (23) is a function of the mismatch parameters and is close to one, so it does not affect the comparison result significantly. Fig. 6(a) shows this advantage over a wide output frequency range. In this case, the simulation was done for a load capacitance of 10 fF and $r_l = 1$. When we change the number of phases, we can keep either the input frequency or the output frequency constant. From (23), FOM ratios for both scenarios are plotted in Fig. 6(b) for $f_o = 100$ MHz and $f_i = 4$ GHz. The DTG-FF performs better (ratio > 1). In Fig. 6(c), we compare the simulated FOM for changing FF sizes, with fixed input (at INCLK+ and INCLK− in Figs. 1 and 4) and output capacitances; it also shows an order of magnitude better FOM for DTG. In this case, extra buffers have been added in the clock path when larger FF devices are used. Although the CML-FF FOM is more robust to temperature (∼5% for −10 °C to 85 °C) and process variations (∼15%) than the DTG-FF (∼10% and 55%, respectively), a big advantage remains.
Therefore, for low power and jitter performance, DTG logic is preferred for wideband operation, e.g., for flexible software-defined radio applications. This is because its power and FOM are automatically reduced for lower frequency [first term in (13)] whereas CML always dissipates the current that is required at the highest frequency of operation.
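The frequency dependence behind this preference can be illustrated with a toy comparison of dynamic and static power. The sketch assumes that DTG power follows C_sw·V_DD²·f and that CML power is the constant V_DD·I_bias sized for the highest frequency. The switched capacitance is a hypothetical value, so only the trends are meaningful, not the absolute numbers; V_DD and the bias current reuse the values quoted earlier.

```python
V_DD = 1.2          # supply voltage (V)
I_BIAS = 6e-3       # CML bias current (A), set by the highest operating frequency
C_SW = 100e-15      # hypothetical total switched capacitance of the DTG path (F)


def dtg_power(f_hz: float) -> float:
    """Dynamic power, assumed C_sw * VDD^2 * f, so it falls with frequency."""
    return C_SW * V_DD ** 2 * f_hz


def cml_power(f_hz: float) -> float:
    """Static power VDD * I_bias, independent of the operating frequency."""
    return V_DD * I_BIAS


for f in (0.1e9, 1e9, 4e9):
    print(f"f={f / 1e9:.1f} GHz  DTG={dtg_power(f) * 1e3:.3f} mW  "
          f"CML={cml_power(f) * 1e3:.3f} mW")
```

In this simplified model, lowering the clock frequency reduces the DTG power in proportion, while the CML power stays fixed.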
IV. CONCLUSION
DTG-FFs and CML-FFs have been compared fundamentally with respect to their potential to realize accurate multiphase clocks in a power-efficient way. The comparison is based on an FOM which quantifies the product of mismatch-induced timing jitter variance and power dissipation, normalized for admittance scaling effects. First-order analytical expressions to model mismatch jitter, power dissipation, and jitter-power FOM are derived and confirmed by simulations. The analytical expressions are used to compare FFs and also to design them for low FOM.
A comparison shows that DTG-FFs outperform CML in jitter-power FOM in 90-nm CMOS technology. This is mainly because DTG-FFs only consume power during switching. Moreover, they have less capacitance (no need for a cross-coupled pair), which reduces both power and jitter. The advantage scales roughly with $f_T/f_o$ [see (23)], so technology scaling benefits DTG logic compared to CML. These equations can be useful in selecting an FF for multiphase generation, for different technologies and frequencies of operation.
