Low-Jitter Multi-phase Clock Generation: a Comparison between DLLs and Shift Registers by Gao, X. et al.
Xiang Gao, Eric A.M. Klumperink, Bram Nauta 
CTIT Research Institute, IC Design Group, University of Twente 
7500AE, Enschede, The Netherlands 
E-mail: X.Gao@utwente.nl 
 
 
 
 
Abstract—This paper shows that, for a given power budget, a 
shift register based multi-phase clock generator (MPCG) 
generates less jitter than a delay-locked loop (DLL) equivalent 
when both are realized with current mode logic (CML) circuits 
and white noise is assumed.  This is due to the fact that the 
shift register MPCG has no jitter accumulation from one clock 
phase to the other as in the DLL based MPCG.  For N-phase 
clock generation, the shift register MPCG needs a reference 
clock with N times higher frequency and thus requires a VCO 
with higher frequency than the DLL counterpart. However, we 
can show that this does not lead to additional power 
consumption.  
I. INTRODUCTION  
Multi-phase clocks are useful in many applications. In high-
speed serial link applications [1], multi-phase clocks are used to 
process data streams at a bit rate higher than the internal clock 
frequencies. In wideband wireless communication systems like a 
software defined radio, multi-phase clocks can be used for 
cancelling unwanted harmonics and sidebands without using 
filters [2].  
For multi-phase clock generation, delay-locked loops (DLLs) 
are often used [3]. Other than a DLL, a shift register can also be 
used to generate multi-phase clocks [2]. Compared with a DLL 
multi-phase clock generator (MPCG), a shift register based MPCG 
uses N times higher frequency for N-phase clock generation and at 
first glance seems to have more power consumption. However, a 
shift register MPCG does not have jitter accumulation from one 
clock phase to the other clock phase as in a DLL equivalent, which 
should be taken into account for a fair comparison. This work 
makes a solid comparison between MPCGs using a DLL and a shift 
register, primarily based on their power and jitter performance.  
The rest of the paper is arranged as follows. Section II describes 
the architecture of a DLL MPCG and analyses its stochastic jitter. 
Section III examines the stochastic jitter of a shift register based 
MPCG. These two structures are then compared in Section IV. In 
Section V, the simulation results are presented and Section VI 
discusses the conclusions. 
 
Figure 1. (a) DLL based MPCG architecture (b) CML delay unit schematic 
II. DLL BASED MPCG JITTER  
A.  DLL Based MPCG Architecture  
The architecture of a DLL based MPCG is shown in Fig.1(a). 
It consists of a voltage controlled delay line (VCDL) which has N 
identical delay units (DUs) and a control loop which is formed by a 
phase detector (PD), a charge pump (CP) and a loop filter (LF). In 
the DLL, a reference clock generated by a VCO, CLKref, with the 
wanted frequency f is propagated through the VCDL. The loop 
compares the phase of the last output of the VCDL CLKN with 
CLKref and controls the VCDL so that the total delay time is one 
reference clock period. Therefore, the outputs of the N DUs 
CLK1~CLKN are multi-phase clocks with ideally 2π/N phase shift 
in between.  
Because of better supply noise and substrate bounce rejection, 
current mode logic (CML) delay units are often used in DLL 
designs. To compare the output jitter of a DLL based MPCG and a 
shift register based MPCG, we assume CML circuits are used in 
both structures. The simplified schematic of a CML delay unit is 
shown in Fig.1(b). It is based on an NMOS source coupled pair 
driving the resistive load RL and biased by a current source IB. Due 
to the full switching of the tail current, the differential output 
swing VSW is determined by RL and IB as: 
$$V_{SW} = I_B \cdot R_L \qquad (1)$$
The amount of delay td is primarily determined by the load 
resistance RL and the load capacitance CL at the output node. If td is 
measured from the input clock crossing point to the output clock 
crossing point, it can be approximated as [4]: 
$$t_d = \ln 2 \cdot R_L C_L = \ln 2 \cdot (V_{SW}/I_B) \cdot C_L \qquad (2)$$
1-4244-0921-7/07 $25.00 © 2007 IEEE.
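As a quick numerical illustration of (1) and (2), the sketch below evaluates the swing and delay of a single CML stage. The component values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def cml_swing(i_b, r_l):
    """Differential output swing of a CML stage, Eq. (1): V_SW = I_B * R_L."""
    return i_b * r_l

def cml_delay(r_l, c_l):
    """Crossing-to-crossing delay of a CML stage, Eq. (2): t_d = ln2 * R_L * C_L."""
    return math.log(2) * r_l * c_l

# Hypothetical example values (assumptions, not from the paper)
I_B = 1e-3     # tail current in A
R_L = 600.0    # load resistance in ohm
C_L = 50e-15   # load capacitance in F

v_sw = cml_swing(I_B, R_L)
t_d = cml_delay(R_L, C_L)
# Second form of Eq. (2), written in terms of V_SW and I_B
t_d_alt = math.log(2) * (v_sw / I_B) * C_L

print(f"V_SW = {v_sw:.3f} V, t_d = {t_d * 1e12:.2f} ps")
```

Both forms of (2) give the same delay, since V_SW/I_B is simply RL.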
B. DLL Based MPCG Output Jitter  
In this work, we assume all noise sources are white and analyze 
absolute jitter performance, for simplicity. 
In a DLL based MPCG, there are three jitter sources: the 
reference clock, the VCDL and the PD/CP/LF control loop. The 
control loop jitter is usually relatively small [5] and thus ignored 
hereafter. It has been shown in [5][6] that for an optimal DLL 
design, its output jitter is defined by the reference clock and the 
jitter of a free-running VCDL. The DLL renders no improvement 
on the reference and VCDL jitter. 
For a free-running VCDL, the jitter accumulates from one delay 
unit to the next. If we define the jitter variance of one delay unit 
as $\sigma^2_{\Delta t_d,DU}$, the jitter variance at the output of the 
nth delay unit (CLKn), $\sigma^2_{\Delta t_d,DLL,n}$, will be [5]: 
$$\sigma^2_{\Delta t_d,DLL,n} = n \cdot \sigma^2_{\Delta t_d,DU} \qquad (3)$$
For multi-phase clock applications like the poly-phase multi-path 
technique for harmonic rejection [2], the jitter of every clock 
phase is equally relevant. To measure the jitter performance of a 
group of clocks, the average jitter variance 
$\sigma^2_{\Delta t_d,DLL,avgN}$ is used. Averaging (3) over the N 
phases gives: 
$$\sigma^2_{\Delta t_d,DLL,avgN} = \frac{1}{N}\sum_{n=1}^{N} n \cdot \sigma^2_{\Delta t_d,DU} = \frac{N+1}{2} \cdot \sigma^2_{\Delta t_d,DU} \qquad (4)$$
In (4), only the VCDL jitter is taken into account; the reference 
clock jitter will be discussed in Section IV. 
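The accumulation in (3) and the average in (4) can be checked with exact arithmetic. The sketch below is only an illustration; the function names are hypothetical and the per-unit variance is normalized to one.

```python
from fractions import Fraction

def dll_phase_variances(n_phases, var_du=Fraction(1)):
    """Eq. (3): the variance at the output of the n-th delay unit is n * var_du."""
    return [n * var_du for n in range(1, n_phases + 1)]

def dll_average_variance(n_phases, var_du=Fraction(1)):
    """Average of Eq. (3) over all N phases; Eq. (4) predicts (N + 1) / 2 * var_du."""
    variances = dll_phase_variances(n_phases, var_du)
    return sum(variances) / n_phases

for n in (2, 4, 8, 16):
    avg = dll_average_variance(n)
    # Compare the explicit average with the closed form of Eq. (4)
    assert avg == Fraction(n + 1, 2)
    print(f"N = {n:2d}: average variance = {avg} x var_DU")
```

The average grows linearly with N, which is the penalty of jitter accumulation in the VCDL.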
III. SHIFT REGISTER BASED MPCG JITTER  
A. Shift Register Based MPCG Architecture 
The architecture of a shift register based MPCG is shown in 
Fig.2(a). It consists of a divide-by-N followed by a D-FlipFlop 
(DFF) chain with N DFFs¹. In the shift register MPCG, a reference 
clock generated by a VCO, CLKref, with a frequency of N·f, is fed 
into the divide-by-N and the DFF chain. The divide-by-N 
generates a clock at the frequency of interest f which is then fed into 
the DFF chain at its D input. Since a DFF is sensitive to rising 
edges, the Q output of each DFF will be delayed from the previous 
DFF Q output by one period of CLKref which is equivalently a 2π/N 
phase delay at the wanted frequency f. Therefore, the output clocks 
CLK1~CLKN of the DFF chain are N-phase clocks with ideally 2π/N 
phase shift in between. 
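The behavior described above can be sketched with a small cycle-based model, in which every loop iteration represents one rising edge of CLKref. This is a purely logical illustration under stated assumptions (even N, a divider output with 50% duty cycle, all DFFs reset to 0); it does not model jitter or analog delay, and the function name is hypothetical.

```python
def shift_register_mpcg(n_phases, n_ref_cycles):
    """Behavioral model of a divide-by-N followed by a chain of N DFFs.

    Returns one list per output phase CLK1..CLKN, holding one sample per
    rising edge of CLKref.
    """
    q = [0] * n_phases                       # DFF outputs, all reset to 0
    outputs = [[] for _ in range(n_phases)]
    for cycle in range(n_ref_cycles):
        # Divide-by-N output at frequency f, modeled with 50% duty cycle
        # (an assumption; N is taken to be even here).
        d_in = 1 if (cycle % n_phases) < n_phases // 2 else 0
        # All DFFs sample on the same CLKref edge: each one takes the
        # value its predecessor held before this edge.
        q = [d_in] + q[:-1]
        for k in range(n_phases):
            outputs[k].append(q[k])
    return outputs

phases = shift_register_mpcg(n_phases=4, n_ref_cycles=16)
for k, wave in enumerate(phases, start=1):
    print(f"CLK{k}: " + "".join(str(bit) for bit in wave))
```

Each printed waveform is the previous one delayed by exactly one CLKref period, which corresponds to the 2π/N phase step at the wanted frequency f.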
B. Shift Register Based MPCG Output Jitter  
For a proper operation of a DFF, its D input signal must be 
stable before the input clock starts to switch. When the input clock 
switches, the logic value on the D node is transferred to the Q 
output. In other words, the timing of the DFF output is determined 
by the DFF input clock. The D input signal only acts as an 
“enabler” of a transition and will not affect the jitter of the DFF 
output.  
In Fig.2(a), the divide-by-N is an “enabler” for the first DFF 
and will not affect the jitter of the DFFs. Therefore, the power 
consumption of the divide-by-N can be made small. In addition, it 
is not an indispensable block. Its function can be incorporated into 
the DFF chain, e.g., with a simple AND gate feedback as in [2]. 
Therefore, the jitter and power of the divide-by-N are not taken into 
account in the following calculations. There are now two jitter 
sources in the shift register based MPCG: the reference clock and 
the DFF chain. Since the shift register MPCG is an open loop 
system, it does not improve the reference or DFF chain jitter.  
____________________ 
¹ The use of a DFF chain here assumes that only the clock rising edges 
are used. If both rising and falling edges can be used, i.e., with a 50% 
duty cycle reference clock, the DFFs can be replaced with simpler 
latches [2], which halves the shift register power consumption. 

Figure 2. (a) Shift register based MPCG architecture (b) DFF block 
schematic 
In Fig.2(a), the output jitter of a DFF is not affected by the jitter 
of the previous DFF since the previous DFF only acts as an 
“enabler”. Therefore, there is no jitter accumulation from one clock 
phase to the next as in the DLL MPCG. If we define the jitter 
variance of one DFF as $\sigma^2_{\Delta t_d,DFF}$, the average jitter 
variance for the set of N-phase clocks generated by the DFF chain, 
$\sigma^2_{\Delta t_d,SR,avgN}$, can be calculated as: 
$$\sigma^2_{\Delta t_d,SR,avgN} = \sigma^2_{\Delta t_d,DFF} \qquad (5)$$
In (5), only the DFF chain jitter is taken into account; the 
reference clock jitter will be discussed in Section IV. 
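The difference between accumulating and non-accumulating jitter can be illustrated with a small Monte Carlo experiment. The sketch below is not the simulation used in this paper: it assumes white Gaussian timing errors and, purely for illustration, gives a delay unit and a DFF the same rms jitter. The function names are hypothetical.

```python
import random

def variance(samples):
    """Population variance of a list of numbers."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / len(samples)

def average_phase_variance(n_phases, sigma, trials, seed=0):
    """Return (DLL average variance, shift register average variance)."""
    rng = random.Random(seed)
    dll_err = [[] for _ in range(n_phases)]
    sr_err = [[] for _ in range(n_phases)]
    for _ in range(trials):
        accumulated = 0.0
        for k in range(n_phases):
            # DLL: the edge of phase k+1 has passed k+1 delay units,
            # so their independent timing errors add up.
            accumulated += rng.gauss(0.0, sigma)
            dll_err[k].append(accumulated)
            # Shift register: every phase is retimed by CLKref, so only
            # the jitter of its own DFF appears at the output.
            sr_err[k].append(rng.gauss(0.0, sigma))
    dll_avg = sum(variance(e) for e in dll_err) / n_phases
    sr_avg = sum(variance(e) for e in sr_err) / n_phases
    return dll_avg, sr_avg

dll_avg, sr_avg = average_phase_variance(n_phases=8, sigma=1.0, trials=20000)
print(f"DLL average variance: {dll_avg:.2f}  (Eq. (4) predicts {(8 + 1) / 2:.2f})")
print(f"SR  average variance: {sr_avg:.2f}  (Eq. (5) predicts 1.00)")
```

With equal per-stage jitter, the DLL average is close to (N+1)/2 times the shift register average. The power and delay differences between the two structures are treated separately in Section IV.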
IV. COMPARISON BETWEEN DLL AND SHIFT REGISTER  
It is clear from Sections II and III that a DLL MPCG has two 
major jitter sources: the reference clock and the VCDL, while a 
shift register MPCG also has two jitter sources: the reference clock 
and the DFF chain. The comparison starts with the reference 
clock jitter. Then the VCDL jitter and DFF chain jitter are 
compared.  
A. Comparing the Reference Clock Jitter 
From the above analysis, we see that neither the DLL nor the shift 
register based MPCG improves the reference jitter.  
However, the shift register MPCG requires a reference clock with N 
times higher frequency than that of the DLL equivalent. The VCO 
in the shift register MPCG then has to work at an N times higher 
frequency. This may raise the concern that the shift register puts 
tighter requirements on the VCO. 
To compare VCOs running at different frequencies, the so-called 
Normalized Phase Noise Figure of Merit $FOM_{Nor\text{-}PN}$ can be 
used [7]: 
$$FOM_{Nor\text{-}PN} = 10\log\left(L(f_m) \cdot \frac{f_m^2}{f_{OSC}^2}\right) + 10\log\left(\frac{P_{DC}}{1\,\mathrm{mW}}\right) \qquad (6)$$
in which L(fm) is single side band phase noise to carrier ratio of the 
VCO at an offset frequency fm, fOSC is the VCO working frequency 
and PDC is the VCO power dissipation.  
For the shift register MPCG, fOSC is N times higher. However, 
with the same phase noise requirement at the frequency of interest, 
L(fm) can be $N^2$ times higher since the shift register MPCG also 
functions as a divide-by-N, which theoretically improves L(fm) by a 
factor of $N^2$. These two effects cancel each other when substituted 
into (6). In other words, although the VCO in the shift register MPCG 
runs at an N times higher frequency, it consumes the same power as 
the VCO in the DLL MPCG for the same quality of design. If LC VCOs 
are used, a higher working frequency may even be preferred, since 
Q tends to be higher and the required area for inductors is smaller.  

Figure 3. (a) Schematic of CML latch at the switching instant. (b) 
Simplified schematic for jitter analysis. 
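The cancellation in (6) can be made concrete with a short calculation. The sketch below inverts (6) for the VCO power at a fixed figure of merit; all numerical values (FOM, phase noise, offset and output frequency) are hypothetical and serve only to show that the two factors of N² cancel.

```python
import math

def fom_nor_pn(l_fm, f_m, f_osc, p_dc_mw):
    """Eq. (6); l_fm is the linear (not dB) phase-noise-to-carrier ratio."""
    return (10 * math.log10(l_fm * f_m**2 / f_osc**2)
            + 10 * math.log10(p_dc_mw))

def required_power_mw(fom_db, l_fm, f_m, f_osc):
    """Eq. (6) solved for the DC power (in mW) at a given figure of merit."""
    return 10 ** ((fom_db - 10 * math.log10(l_fm * f_m**2 / f_osc**2)) / 10)

# Hypothetical example values (assumptions, not from the paper)
N = 8            # number of clock phases
F = 1e9          # wanted output frequency in Hz
F_M = 1e6        # offset frequency in Hz
L_DLL = 1e-12    # allowed L(fm) at frequency F, linear (-120 dBc/Hz)
FOM_DB = -180.0  # same VCO design quality assumed for both cases

# DLL: the VCO runs at F and must meet L_DLL directly.
p_dll = required_power_mw(FOM_DB, L_DLL, F_M, F)
# Shift register: the VCO runs at N*F, but L(fm) may be N^2 times higher
# because the divide-by-N action improves it by N^2 at the output.
p_sr = required_power_mw(FOM_DB, N**2 * L_DLL, F_M, N * F)

print(f"VCO power for the DLL MPCG:            {p_dll:.3f} mW")
print(f"VCO power for the shift register MPCG: {p_sr:.3f} mW")
```

Both cases yield the same power, because the N² relaxation in L(fm) offsets the N² increase in fOSC² inside the logarithm.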
B. Comparing the VCDL and DFF Chain Jitter 
A DFF can be designed with two master/slave connected 
latches as shown in Fig.2(b). The output jitter of a DFF is not 
affected by the jitter of the first latch, since the first latch is only an 
“enabler”. The CML implementation of a latch is shown in Fig.3(a). 
At the zero crossings of the input clock CLK, the differential stage 
M1 and M2 is balanced. For a proper operation of the DFF, the D 
signals of the latch are already stable before the CLK starts to 
switch. Therefore, one of the transistors M3 and M4 is off and the 
other one is in triode region. The same happens to the transistors 
M5 and M6. Neglecting the transistor’s on resistance, the schematic 
of the latch can be simplified as shown in Fig.3(b)[8], with RL and 
CL the effective load resistance and capacitance. Fig.3(b) is exactly 
the same as the schematic of the CML delay unit as shown in 
Fig.1(b).  Therefore, we can apply the same jitter analysis for a 
delay unit and a DFF. 
The jitter variance of the circuit shown in Fig.3(b) can be 
predicted using the analysis presented in [8] as: 
$$\sigma^2_{\Delta t_d} = \left(1 + \gamma + \frac{1}{2}\gamma_T \cdot g_{mT} R_L\right) \cdot \frac{2kTC_L}{I_B^2} \qquad (7)$$
where γ and γT are respectively the noise factor of the transistors of 
the differential pair and the tail bias transistor. gmT is the 
transconductance of the tail bias transistor. For a correct circuit 
operation, the tail bias transistor should be ensured to work in the 
saturation region. The overdrive voltage VOV,T should be smaller 
than the allowable voltage headroom VS,T for the tail transistor. 
Therefore, gmT can be written as: 
$$g_{mT} = \frac{\alpha I_B}{V_{OV,T}} \geq \frac{\alpha I_B}{V_{S,T}} \qquad (8)$$
with α the transistor model parameter which is equal to two for the 
square-law model. In order to achieve low jitter, gmT should be set at 
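its minimum value. Using (1) and (8), the jitter will then be: 
$$\sigma^2_{\Delta t_d} = \left(1 + \gamma + \frac{1}{2}\gamma_T \cdot \frac{\alpha I_B R_L}{V_{S,T}}\right) \cdot \frac{2kTC_L}{I_B^2} = \left(1 + \gamma + \frac{1}{2}\gamma_T \cdot \frac{\alpha V_{SW}}{V_{S,T}}\right) \cdot \frac{2kTC_L}{I_B^2} \qquad (9)$$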
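With (2), (9) can be re-written as:  
$$\sigma^2_{\Delta t_d} = \left\{\left(1 + \gamma + \frac{1}{2}\gamma_T \cdot \frac{\alpha V_{SW}}{V_{S,T}}\right) \cdot \frac{2kT}{V_{SW} \cdot \ln 2}\right\} \cdot \frac{t_d}{I_B} \qquad (10)$$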
 
Figure 4. Timing diagram for (a) a DU, (b) a latch 
When VSW is chosen, VS,T is a constant for a speed optimized 
CML circuit design and can be derived as [9]:  
$$V_{S,T} = V_{DD} - V_T - V_{SW}\left\{\frac{1}{2} + \left(\frac{1}{2}\right)^{1/\alpha}\right\} \qquad (11)$$
Therefore, the bracketed part of (10) is a constant once VSW is 
chosen and can be represented by a constant c: 
$$\sigma^2_{\Delta t_d} = c \cdot t_d / I_B \qquad (12)$$
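As a consistency check on the chain from (7) to (12), the sketch below evaluates the jitter variance once through (7), with gmT at its minimum value from (8), and once through the constant c of (10) and (12). It assumes the noise factor in (7) has the form 1 + γ + γT·gmT·RL/2; all device values are hypothetical and chosen only for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def jitter_var_eq7(gamma, gamma_t, g_mt, r_l, c_l, i_b, temp):
    """Jitter variance following Eq. (7), in s^2."""
    noise_factor = 1 + gamma + 0.5 * gamma_t * g_mt * r_l
    return noise_factor * 2 * K_B * temp * c_l / i_b**2

def constant_c(gamma, gamma_t, alpha, v_sw, v_st, temp):
    """Bracketed part of Eq. (10), i.e. the constant c of Eq. (12)."""
    noise_factor = 1 + gamma + 0.5 * gamma_t * alpha * v_sw / v_st
    return noise_factor * 2 * K_B * temp / (v_sw * math.log(2))

# Hypothetical example values (assumptions, not from the paper)
gamma, gamma_t, alpha = 1.0, 1.0, 2.0   # noise factors, square-law alpha
i_b, r_l, c_l = 1e-3, 600.0, 50e-15     # A, ohm, F
v_st, temp = 0.3, 300.0                 # tail headroom in V, temperature in K

v_sw = i_b * r_l                        # Eq. (1)
t_d = math.log(2) * r_l * c_l           # Eq. (2)
g_mt_min = alpha * i_b / v_st           # Eq. (8) with V_OV,T = V_S,T

var_eq7 = jitter_var_eq7(gamma, gamma_t, g_mt_min, r_l, c_l, i_b, temp)
var_eq12 = constant_c(gamma, gamma_t, alpha, v_sw, v_st, temp) * t_d / i_b

print(f"rms jitter via (7):  {math.sqrt(var_eq7) * 1e15:.2f} fs")
print(f"rms jitter via (12): {math.sqrt(var_eq12) * 1e15:.2f} fs")
```

The two results coincide, which confirms that (12) is only a rewriting of (7) in terms of td and IB once VSW is fixed.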
In most of the clock generator designs, jitter and power are the 
two most important parameters. Via admittance level scaling [10], 
noise power and hence jitter variance can always be reduced at the 
cost of increasing power consumption. In order to take this tradeoff 
into account, a (1 mW) power normalized jitter variance 
$\sigma^2_{\Delta t_d,NorP}$ is defined to make a fair comparison: 
$$\sigma^2_{\Delta t_d,NorP} = \sigma^2_{\Delta t_d} \cdot (P / 1\,\mathrm{mW}) \qquad (13)$$
where $\sigma_{\Delta t_d}$ is the rms jitter and P is the power 
consumption. For the same circuit, applying admittance level scaling 
does not change the value of its $\sigma^2_{\Delta t_d,NorP}$. A design 
with a smaller $\sigma^2_{\Delta t_d,NorP}$ generates less jitter for 
the same amount of power.  
For a CML circuit, the total power consumption is dominated 
by the static power. Therefore, $\sigma^2_{\Delta t_d,NorP}$ can be 
derived with (12) and (13) as:   
$$\sigma^2_{\Delta t_d,NorP} = (c \cdot t_d / I_B) \times (I_B \cdot V_{DD} / 1\,\mathrm{mW}) = c \cdot (V_{DD} / 1\,\mathrm{mW}) \times t_d \qquad (14)$$
which indicates that $\sigma^2_{\Delta t_d,NorP}$ is proportional to the 
amount of delay for a CML delay unit once VSW is chosen. 
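The invariance of the power normalized jitter variance under admittance level scaling can be illustrated numerically. In the sketch below a scaling factor k multiplies IB and CL and divides RL, so that VSW and td stay the same; the value of the constant c, the supply voltage and the component values are hypothetical.

```python
import math

def cml_metrics(i_b, r_l, c_l, v_dd, c_const):
    """Return (t_d, jitter variance, power in mW, power normalized variance)."""
    t_d = math.log(2) * r_l * c_l      # Eq. (2)
    var = c_const * t_d / i_b          # Eq. (12)
    power_mw = i_b * v_dd / 1e-3       # static power, expressed in mW
    return t_d, var, power_mw, var * power_mw   # last entry: Eq. (13)

def scale_admittance(i_b, r_l, c_l, k):
    """Scale all admittances by k: currents and capacitances up, resistances down."""
    return i_b * k, r_l / k, c_l * k

# Hypothetical example values (assumptions, not from the paper)
V_DD = 1.2        # supply voltage in V
C_CONST = 8e-20   # constant c of Eq. (12) in A*s, fixed once V_SW is chosen
BASE = (1e-3, 600.0, 50e-15)   # I_B in A, R_L in ohm, C_L in F

for k in (1, 2, 4):
    i_b, r_l, c_l = scale_admittance(*BASE, k)
    t_d, var, p_mw, var_norp = cml_metrics(i_b, r_l, c_l, V_DD, C_CONST)
    print(f"k = {k}: P = {p_mw:.1f} mW, variance = {var:.3e} s^2, "
          f"normalized variance = {var_norp:.3e} s^2")
```

Doubling k halves the jitter variance and doubles the power, so their product, the quantity defined in (13), does not change.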
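If we focus on the jitter generated by the DLL and the shift 
register, the $\sigma^2_{\Delta t_d,NorP}$ of the two structures can be 
compared with (4), (5), (12) and (13) as: 
$$\frac{\sigma^2_{\Delta t_d,NorP,SR,avgN}}{\sigma^2_{\Delta t_d,NorP,DLL,avgN}} = \frac{c \cdot \dfrac{t_{d,Latch}}{I_{B,Latch}} \times \dfrac{2N \cdot I_{B,Latch} \cdot V_{DD}}{1\,\mathrm{mW}}}{\dfrac{N+1}{2} \cdot c \cdot \dfrac{t_{d,DU}}{I_{B,DU}} \times \dfrac{N \cdot I_{B,DU} \cdot V_{DD}}{1\,\mathrm{mW}}} = \frac{4}{N+1} \cdot \frac{t_{d,Latch}}{t_{d,DU}} \qquad (15)$$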
where the parameters with subscripts DU and Latch are related to a 
DU in the DLL and a latch in the shift register, respectively. 
Although the simplified jitter analysis schematic for the DU and 
the latch is the same, there are some differences between them.  
Fig.4 shows a timing diagram for the DU and the latch. For the DU, 
its output is a clock phase and its input is the previous clock phase. 
Therefore, the delay of the DU is functionally required to be: 
$$t_{d,DU} = \frac{T}{N} = \frac{1}{N \cdot f} \qquad (16)$$
where T and f are the period and frequency of the wanted N-phase 
clocks. 
Figure 5. Simulation results for (a) a CML delay unit (b) DLL and shift 
register FOM comparison.  

For the latch, its output is one clock phase and its input is the 
reference clock. There is no delay requirement such as (16). For 
the DFF to work properly, it should satisfy: 
$$t_{d,Latch} + t_{su} = (1 + a) \cdot t_{d,Latch} < \frac{T}{N} = \frac{1}{N \cdot f} \qquad (17)$$
where tsu is the setup time required by the DFF and a is the ratio 
of tsu to td,Latch (tsu = a·td,Latch), which is design dependent and 
larger than zero. Equation (17) shows that technology limitations 
bound how high the working frequency of a shift register MPCG can 
be. Defining the maximum working frequency of the shift register 
MPCG as fmax,SR, the latch delay will have its minimum value 
td,Latch,min at fmax,SR. With (17), td,Latch,min can be derived as: 
$$t_{d,Latch,min} = \frac{1}{1 + a} \cdot \frac{1}{N \cdot f_{max,SR}} \qquad (18)$$
Since a small delay is preferred for a small 
$\sigma^2_{\Delta t_d,NorP}$, the latch delay can be set to its minimum 
value as in (18). For a DU, the delay is fixed by (16). Taking this 
into account, (15) can be re-written as: 
$$\frac{\sigma^2_{\Delta t_d,NorP,SR,avgN}}{\sigma^2_{\Delta t_d,NorP,DLL,avgN}} = \frac{1}{1 + a} \cdot \frac{f}{f_{max,SR}} \cdot \frac{4}{N + 1} \qquad (19)$$
The first factor of (19) is smaller than one since the DFF needs a 
finite setup time. The second factor is smaller than or equal to 
one as long as the shift register can work at the wanted frequency. 
The third factor is also smaller than or equal to one if the 
wanted number of clock phases is larger than 2, which is normally 
the case. Therefore, (19) is smaller than one, which means that the 
shift register MPCG has a smaller $\sigma^2_{\Delta t_d,NorP}$ than the 
DLL MPCG as long as it can work at the wanted frequency. Equation 
(19) also indicates that the advantage of the shift register based 
MPCG will be larger in applications where clocks with a larger 
number of phases at lower frequencies are needed. 
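To give a feeling for the size of the ratio in (19), the sketch below evaluates it for N = 8 and a = 0.5, the values quoted for the simulation in Section V. The frequency points, expressed as a fraction of fmax,SR, are hypothetical, and the function name is only illustrative.

```python
def jitter_ratio_sr_over_dll(n_phases, a, f, f_max_sr):
    """Eq. (19): power normalized jitter variance of the shift register
    MPCG divided by that of the DLL MPCG."""
    if not 0 < f <= f_max_sr:
        raise ValueError("f must be positive and not exceed f_max_sr")
    return (1 / (1 + a)) * (f / f_max_sr) * (4 / (n_phases + 1))

N_PHASES = 8   # number of clock phases, as in Section V
A = 0.5        # ratio of setup time to latch delay, as in Section V

# Hypothetical operating points, as a fraction of f_max,SR
for fraction in (1.0, 0.5, 0.25, 0.1):
    ratio = jitter_ratio_sr_over_dll(N_PHASES, A, fraction, 1.0)
    print(f"f = {fraction:.2f} x f_max,SR: ratio = {ratio:.3f}")
```

Even at f = fmax,SR the ratio stays below one for these parameters, and it decreases in proportion to f, in line with the observation that the advantage grows at lower frequencies and for more phases.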
Besides a smaller $\sigma^2_{\Delta t_d,NorP}$, a shift register based 
MPCG also has other advantages such as shorter settling time, wider 
tuning range and ease of design. However, the maximum achievable 
working frequency is lower than that of a DLL MPCG since the shift 
register MPCG works at a frequency N times higher than the frequency 
of interest and needs a VCO with higher frequency. This may pose 
design issues. 
V. SIMULATION RESULTS 
In order to verify the conclusion in (14) that the 
$\sigma^2_{\Delta t_d,NorP}$ of a CML delay unit is proportional to the 
amount of delay, simulations were done in Spectre with time domain 
Pnoise analysis. The technology used is 0.13 µm CMOS. The delay is 
tuned by varying the load capacitance. VSW is chosen to be half of 
VDD. The simulation results are shown in Fig.5(a). The simulated 
values fit the calculated curve well and clearly show that 
$\sigma^2_{\Delta t_d,NorP}$ increases with increasing delay time. 
To verify the comparison results for the DLL based MPCG and 
the shift register based MPCG, i.e., (19), simulation is done for a 
free running VCDL and a DFF chain with clean reference clock and 
N equal to 8. For the CML DFF used in the simulation, a is about 
0.5. The simulation results are shown in Fig.5(b). The simulated 
values fit the calculated curve well. 
VI. CONCLUSION 
Analysis shows that, for a given power budget, a shift register 
based MPCG generates less jitter than a DLL equivalent when both 
are realized with current mode logic circuits. The reason is that the 
shift register MPCG has no jitter accumulation from one clock 
phase to the other as in the DLL MPCG. Although the shift register 
MPCG requires a reference clock with higher frequency, it does 
not lead to additional power consumption on the VCO. In addition, 
the MPCG using a shift register has the degree of freedom to 
reduce the delay time while jitter generation of a CML circuit is 
proportional to its delay time for a given power budget. The 
advantage of the shift register based MPCG will be larger in 
applications where clocks with larger number of phases at lower 
frequencies are needed. 
 
REFERENCES 
[1] C. K. K. Yang, M. A. Horowitz, “A 0.8-µm CMOS 2.5 Gb/s 
oversampling receiver and transmitter for serial links”, IEEE Journal 
of Solid-State Circuits, vol. 31, pp. 2015 - 2023, December 1996. 
[2] R. Shrestha, E. Mensink, E. A. M. Klumperink, G. J. M. Wienk, 
B.Nauta, "A Multipath Technique Cancelling Harmonics and 
Sidebands in a Wideband Power Upconverter", IEEE International 
Solid-State Circuits Conference (ISSCC), San Francisco, February 6-
8, 2006. ISSCC Digest, pp. 452-453 
[3] C. C. Chung and C. Y. Lee, “A New DLL-Based Approach for 
All-Digital Multiphase Clock Generation”, IEEE Journal of 
Solid-State Circuits, vol. 39, no. 3, March 2004. 
[4] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, “Digital Integrated 
Circuits (A design perspective)”, Prentice Hall, 2003 (Second 
Edition)  
[5] R. C. H. van de Beek, E. A. M. Klumperink, C. S. Vaucher, and 
B. Nauta, “Low-Jitter Clock Multiplication: A Comparison Between 
PLLs and DLLs”, IEEE Transactions on Circuits and Systems II: 
Analog and Digital Signal Processing, vol. 49, no. 8, Aug. 2002. 
[6] B. Kim, T.C. Weigandt and P.R. Gray, “PLL/DLL System Noise 
Analysis for Low Jitter Clock Synthesizer Design,” in Proc. Int. 
Symp. On Circuits and Systems, June 1994. 
[7] A. Wagemans, P. Baltus, R. Dekker, A. Hoogstraate, H. Maas, A. 
Tombeur, J. van Sinderen, “A 3.5mW 2.5GHz diversity receiver and 
a 1.2mW 3.6GHz VCO in silicon-on-anything”, IEEE International 
Solid-State Circuits Conference, vol. XLI, pp. 250 - 251, February 
1998. 
[8] S. Levantino, L. Romano, S. Pellerano, C. Samori, A. L. Lacaita, 
“Phase noise in digital frequency dividers”, IEEE Journal of 
Solid-State Circuits, vol. 39, no. 5, pp. 775 - 784, May 2004. 
[9] R. C. H. van de Beek, “High-Speed Low-Jitter Clock Multiplication 
in CMOS ”, PhD thesis, University of Twente, 2004 ISBN 90-365-
1989-6. (http://doc.utwente.nl/41485) 
[10] E. A. M. Klumperink, B. Nauta, “Systematic Comparison of HF 
CMOS Transconductors”, IEEE Transactions on Circuits and Systems 
II: Analog and Digital Signal Processing, vol. 50, no. 10, 
pp. 728 - 741, Oct. 2003. 
[Figure 5 axis labels: (a) $\sigma^2_{\Delta t_d,NorP}$ [ps²]; (b) $\sigma^2_{\Delta t_d,NorP,SR,avgN} / \sigma^2_{\Delta t_d,NorP,DLL,avgN}$]
