Agile All-digital Clock Generators with Spread-Spectrum Capabilities in 28nm technology by Tessitore, Fabio
PhD THESIS
UNIVERSITÀ DEGLI STUDI DI NAPOLI “FEDERICO II”
DEPARTMENT OF ELECTRICAL ENGINEERING
AND INFORMATION TECHNOLOGY
PhD PROGRAMME IN
ELECTRONIC AND TELECOMMUNICATIONS ENGINEERING

AGILE ALL-DIGITAL CLOCK GENERATORS WITH SPREAD-SPECTRUM
CAPABILITIES IN 28nm TECHNOLOGY
FABIO TESSITORE

Coordinator of the PhD Programme
Prof. Niccolò RINALDI

Supervisor
Prof. Davide DE CARO

Academic Year 2013-2014
 
Contents

Introduction

Chapter 1 Spread Spectrum Clocking techniques
1.1 Spectrum of Frequency Modulated Signal
      1.1.1 Continuous Frequency Modulation
      1.1.2 Discontinuous Frequency Modulation
1.2 Spread Spectrum Clock Generator
      1.2.1 PLLs/FLLs techniques
      1.2.2 Ad-hoc all-digital techniques
1.3 References

Chapter 2 All-Digital SSC Generator in 28nm CMOS Supporting Discontinuous Modulation Profile
2.1 Modulator
2.2 Delay Line Block
      2.2.1 NAND-based Digitally Controlled Delay Lines (DCDL)
      2.2.2 MUX-based Digital Delay Interpolator
      2.2.3 Delay Line Block timing
2.3 Measurement Unit
2.4 Circuit analysis and sizing
      2.4.1 Measurement circuit sizing and limitations
      2.4.2 Jitter analysis and circuit sizing
2.5 On-chip measurements
      2.5.1 Verification of the Measurement Unit
      2.5.2 Verification without modulation
      2.5.3 Verification with modulation
      2.5.4 Comparison with the state of the art
2.6 References

Chapter 3 SSC Generator with injection locking Digitally Controlled Oscillator (DCO)
3.1 Circuit architecture
      3.1.1 DCO-block
      3.1.2 Edge counter architecture
      3.1.3 Modulator
      3.1.4 Measurement Unit
3.2 Circuit analysis and sizing
      3.2.1 Measurement sizing and limitations
      3.2.2 Jitter analysis and circuit sizing
3.3 Post-layout simulation and future works

Conclusions
Acknowledgments

Introduction
Electromagnetic interference (EMI) is an energy disturbance
that degrades the performance of electrical and electronic circuits
through radiated electromagnetic fields or through disturbances
conducted along the power supply. In high-speed digital systems, EMI
reduction has become an essential part of design considerations. In
fact, the high-speed clocks employed in digital chips radiate
electromagnetic noise over a wide frequency band, which may interfere
with the equipment that generates it or with other electronic
equipment in its proximity. EMI also increases noise and degrades the
jitter performance of the integrated circuit (IC).
Regulatory agencies, such as the FCC in the United States and the
CISPR, have established EMI measurement standards, which prescribe
the maximum allowable emission levels as well as precise measurement
procedures to evaluate compliance with regulations.
Many methods for reducing the EMI level have been developed in the
last twenty years. Shielding is a traditional method for preventing
EMI emissions. It consists in covering, fully or partially,
the emission locations with grounded conductive shields.
Unfortunately, for many systems, shielding is the least
desirable method for reducing EMI emissions.
Line filtering is another well-known method for reducing the
EMI level. In this case the EMI level is mitigated with the help of
low-pass filters that eliminate the high-order harmonics. In
particular, this method slows down the rise and fall transitions of
the signal in order to reduce the radiated EMI. However, in high-speed
systems the filtering reduces the critical setup and hold margins and
increases the amount of signal overshoot. This results in a worsening of
the jitter performance. Moreover, the line filter method is not systemic
and only produces a limited, local effect.
Spread spectrum clocking (SSC) is an established, effective and
efficient technique to reduce the radiated EMI. This method reduces
the EMI level of digital circuits by intentionally sweeping the
frequency of the clock signal (frequency modulation) within a certain
frequency range, so as to evenly spread the energy of each clock
harmonic over a given bandwidth and thereby reduce the peak
power level of the radiated electromagnetic interference.
In this work, a novel all-digital Spread Spectrum Clock Generator
(SSCG) prototype is presented in all its aspects: design, simulation
and post-fabrication measurements. The architecture and the first
post-layout simulation results of a second new all-digital SSCG are
presented in the second part of this thesis.
The thesis is structured as follows:
• Chapter 1 deals with the SSC techniques for digital
circuits. In particular, the performance achievable with
continuous frequency modulated signals and with
discontinuous frequency modulated signals in spread spectrum
applications is presented. SSC is commonly
implemented using a phase-locked loop (PLL). The most
recent approaches implement the SSCG by using all-digital PLLs,
frequency-locked loops (FLLs) or ad hoc all-digital
techniques. A brief description of such approaches is given
in the second part of this chapter.
• The design of an all-digital Spread Spectrum Clock Generator
(SSCG) prototype is presented in Chapter 2. This circuit is
based on an all-digital architecture which does not require any
loop to implement frequency synthesis and spreading. The
developed SSCG is realized with a design flow completely
based on standard cells, which simplifies design and porting to new
technologies, and is able to perform both discontinuous
frequency modulation and complex modulation profiles.
In the first part of this chapter the whole SSCG architecture is
explained in detail. Afterwards, the circuit sizing details are
given and the major components of deterministic jitter are
discussed. Finally, the on-chip measurement results are
reported and compared with the state of the art.
• The implementation of a second novel all-digital SSCG with
injection locking digitally controlled oscillator (DCO) is
presented in Chapter 3. The circuit supports clock
multiplication and uses the injection locking
technique in order to reduce the jitter of the circuit.
First, the whole SSCG architecture is described in detail.
Afterwards, the circuit implementation details are
given and the major sources of deterministic jitter are
discussed. Finally, the first post-layout simulation results are
presented.
 
Chapter 1
Spread Spectrum Clocking techniques
Spread-spectrum clocking (SSC) [1], also known as “clock
dithering” or “clock frequency modulation”, is a well-known approach
to reduce the electromagnetic interference (EMI) level produced by
digital chips. This method generates a clock signal with a frequency
that is intentionally swept (frequency modulated) within a given
frequency range with a predetermined modulation waveform
(modulation profile). In particular, SSC spreads the energy
of each clock harmonic over a certain bandwidth in order to mitigate
the peak power level at each harmonic.
Figure 1.1. Frequency domain representation at a harmonic 
of a trapezoidal clock signal with and without SSC [1].
This is demonstrated in figure 1.1, where a harmonic of a clock signal
with and without spreading is shown. SSC is nowadays used in a
number of applications, like high-speed serial links [2],[3],
high-performance digital circuits [4]-[8] and switching power
converters [9]-[12].
The peak power level reduction, also known as modulation gain,
generally depends on the frequency deviation (modulation depth),
on the modulation frequency (fm) and on the modulation profile
(modulation waveform). The procedures to evaluate the peak power level of
frequency-modulated waveforms are prescribed by the EMI
measurement standards [13]-[16]. Please note that these procedures
require the analysis of the signal with a spectrum analyzer in
swept-frequency mode with a prescribed resolution bandwidth (RBW) and
a peak-type detector.
The spectrum of a frequency-modulated signal is studied in [1] by using
the Fourier series. However, it is generally difficult to obtain an
analytic form of the spectrum of a frequency-modulated clock signal by
using the Fourier transform. Moreover, the Fourier analysis does not take into
account the combined effect of the spectrum analyzer RBW and the
peak-type detector. This results in a remarkable difference between
the experimental data and the spectrum mathematically obtained
through the Fourier transform. However, this property is lost when the
effects of the detector and of the spectrum analyzer RBW are considered.
An empirically derived optimal modulation waveform, which takes the
spectrum analyzer RBW and the peak-type detector into account, is
presented in [17].
An efficient analytical method for determining the spectrum of
a modulated signal by considering the effect of the spectrum analyzer
RBW and of the detector is presented in [18] and [19]. In particular, in
these papers it is theoretically demonstrated for the first time that the
optimal modulation frequency is close to the spectrum analyzer RBW
when the peak-type detector is taken into account. Moreover, [18]
and [19] show that the optimal modulating waveform can be
obtained as the solution of a simple differential equation. However,
following this approach the optimal waveform depends on the
spectrum analyzer filter shape. Therefore, if different instruments with
slightly different filter shapes are used, slightly different results
can be observed. Furthermore, the basic formulae for analyzing a
frequency-modulated clock signal and its spectrum derived in [18] and
[19] assume the continuity of the modulating waveform and provide
only a coarse approximation in the case of waveforms with
discontinuous frequency modulation.
In [20] the analysis of [18] and [19] is extended to discontinuous
frequency modulation. In that paper, the
peak-level reduction of the spectrum achievable with discontinuous
frequency-modulated signals in spread spectrum clocking applications
is investigated for the first time. In particular, [20] shows that
discontinuous frequency waveforms can achieve higher modulation gains
than continuous frequency signals. In addition, the
spectrum of a discontinuous frequency-modulated signal, measured
by a spectrum analyzer with a given RBW, is obtained in closed form.
The optimal discontinuous frequency-modulated waveform is also
obtained, under some assumptions, as the solution of a
first-order differential equation.
The analysis presented in [20] is very important since the recent
all-digital SSC generators [5],[7] allow the synthesis of clock signals
with a discontinuous frequency behaviour.
This chapter is organized as follows. Section 1.1 recalls the
results of [18], [19] and [20] and shows simulation results in which
discontinuous frequency modulation improves the
modulation gain with respect to continuous frequency modulation.
Afterwards, a brief introduction to the state of the art of SSC
generators is presented in section 1.2.
1.1 Spectrum of Frequency Modulated Signal
1.1.1 Continuous Frequency Modulation
Let a function u(t) represent the waveform of a clock signal with a 
constant fundamental frequency f0. This signal can be written by using 
the Fourier series as:
$$u(t) = \sum_{k=-\infty}^{+\infty} \frac{I_{0k}}{2} \exp\left[ j 2\pi k f_0 t \right] \tag{1.1}$$
The waveform of the frequency-spread clock signal us(t) is obtained
by replacing the variable t with the variable t' as follows:
$$u_s(t) = u(t') \tag{1.2}$$
where:
$$t' = t + \frac{\delta}{f_m} \int_{-\infty}^{f_m t} V(\tau)\, d\tau \tag{1.3}$$
and V(τ) is a periodic function of unit period with values in [-1,+1]. By
using equations (1.2) and (1.3) the signal us(t) can be written as:
$$u_s(t) = \sum_{k=-\infty}^{+\infty} \frac{I_{0k}}{2} \exp\left\{ j 2\pi k f_0 \left[ t + \frac{\delta}{f_m} \int_{-\infty}^{f_m t} V(\tau)\, d\tau \right] \right\} = \sum_{k=-\infty}^{+\infty} \frac{I_k(t)}{2} = \frac{I_{00}}{2} + \sum_{k=1}^{+\infty} \mathrm{Re}\left[ I_k(t) \right] \tag{1.4}$$
Therefore, the instantaneous frequency f(t) of this waveform is given 
by:
$$f(t) = \frac{1}{2\pi} \frac{d}{dt} \left\{ 2\pi f_0 \left[ t + \frac{\delta}{f_m} \int_{-\infty}^{f_m t} V(\tau)\, d\tau \right] \right\} = f_0 \left( 1 + \delta\, V(f_m t) \right) \tag{1.5}$$
Equation (1.5) shows that the parameters δ and fm correspond,
respectively, to the relative frequency deviation (δ=Δf/f0) and to the
modulation frequency of the frequency-spread clock waveform.
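As an illustration of (1.5), the following Python sketch evaluates the instantaneous frequency for two possible profiles V. The profile shapes (a triangular wave and a rising sawtooth, both of unit period with values in [-1,+1]) and the 1 GHz nominal frequency are assumptions made here for the example; Δf=5MHz and fm=40kHz are the values used in the examples of this chapter.

```python
def triangular(x):
    """Unit-period triangular profile with values in [-1, +1]."""
    p = x % 1.0
    return 4.0 * p - 1.0 if p < 0.5 else 3.0 - 4.0 * p


def sawtooth(x):
    """Unit-period rising sawtooth in [-1, +1), discontinuous at integers."""
    return 2.0 * (x % 1.0) - 1.0


def instantaneous_frequency(t, f0, delta, fm, profile):
    """Equation (1.5): f(t) = f0 * (1 + delta * V(fm * t))."""
    return f0 * (1.0 + delta * profile(fm * t))


f0 = 1.0e9          # assumed nominal clock frequency, Hz
df = 5.0e6          # frequency deviation, Hz
delta = df / f0     # relative frequency deviation
fm = 40.0e3         # modulation frequency, Hz

# Sample one modulation period at four points.
for frac in (0.0, 0.25, 0.5, 0.75):
    t = frac / fm
    f_tri = instantaneous_frequency(t, f0, delta, fm, triangular)
    f_saw = instantaneous_frequency(t, f0, delta, fm, sawtooth)
    print(frac, f_tri, f_saw)
```

With these profiles the frequency stays within f0±Δf; the triangular profile is continuous, while the sawtooth jumps from f0+Δf back to f0-Δf at every multiple of the modulation period.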
In this section the possible frequency overlap of the
modulated spectra of neighbouring harmonics is neglected,
assuming that k·Δf<<f0. Therefore, only the
spectrum of a single modulated harmonic Ik(t) is analyzed.
The modulation gain (Gain) is defined as follows:
$$Gain = \frac{I_{0k}}{\max\left[ S(f_c) \right]} \tag{1.6}$$
where I0k and max[S(fc)] are, respectively, the amplitude of the
unmodulated harmonic and the peak value of the spectrum S(fc) of a
frequency-modulated harmonic Ik(t).
S(fc) is the spectrum measured by a swept-frequency
spectrum analyzer in peak-hold mode (see figure 1.2), in accordance
with the EMI measurement standards [13]-[16].
Figure 1.2. Model of a swept-frequency spectrum analyzer 
in a peak-hold mode.
Figure 1.2 shows the model of a swept-frequency spectrum analyzer
in peak-hold mode. It is realized by using a band-pass filter with
impulse response h(t,fc) centered around the frequency fc, followed by
a peak detector. The filter impulse response h(t,fc) can be written as:
$$h(t, f_c) = h_0(t) \exp\left( j 2\pi f_c t \right) \tag{1.7}$$
where h0(t) is a low-pass impulse response with a 3dB bandwidth
RBW. The parameter RBW corresponds to the so-called resolution
bandwidth of the spectrum analyzer. The output signal of the filter can
be written as:
$$I_b(t, f_c) = \int_{-\infty}^{+\infty} I_k(t - \tau)\, h(\tau, f_c)\, d\tau \tag{1.8}$$
The peak detector evaluates the maximum absolute value of Ib(t,fc),
named S(fc). The spectral component S(fc) is plotted by the spectrum
analyzer at frequency fc. In [18], the criterion bandwidth is defined as:
$$B_{sw} = \left( 4 k\, \Delta f\, f_m \right)^{1/2} \tag{1.9}$$
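As a quick numerical check of (1.9), the sketch below evaluates the criterion bandwidth for the parameter values used in the examples of this chapter (k=1, Δf=5MHz, fm=40kHz) and compares it with RBW=100kHz. The function name is introduced here only for illustration.

```python
import math


def criterion_bandwidth(k, df, fm):
    """Equation (1.9): B_sw = sqrt(4 * k * df * fm), in Hz."""
    return math.sqrt(4.0 * k * df * fm)


rbw = 100.0e3                                   # resolution bandwidth, Hz
b_sw = criterion_bandwidth(k=1, df=5.0e6, fm=40.0e3)
print(f"B_sw = {b_sw / 1e3:.1f} kHz, B_sw / RBW = {b_sw / rbw:.2f}")
```

For these values Bsw is close to 894 kHz, roughly nine times the RBW, so the spreading regime RBW<<Bsw is the relevant one.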
In [18] and [19], it is demonstrated that if RBW>>Bsw then no
reduction of the peak spectral components can be obtained. However,
in the applications of interest here, namely frequency-spreading
applications, the condition RBW<<Bsw can be considered valid. In
[18] and [19] this hypothesis leads to an approximate expression of
the output of the band-pass filter in figure 1.2:
$$I_b(t, f_c) \approx \sum_{n} \left[ j k f'(t_n) \right]^{-1/2} h(t - t_n, f_c)\, I_k(t_n) \tag{1.10}$$
where tn denotes the time at which the instantaneous harmonic
frequency k·f(t) coincides with the filter center frequency fc, that is
k·f(tn)=fc.
Let us consider an example of a clock signal that is frequency
modulated with a simple triangular waveform. In particular, the output
of the band-pass filter of figure 1.2 is schematically shown in figure
1.3 for two different center frequencies fc1 and fc2. As can be seen, the
frequency fc1 is close to the harmonic center frequency k·f0. In this
case, the impulse response h(t-tn,fc1) does not interfere with the
successive impulse response h(t-tn+1,fc1). This implies that the
maximum value of Ib(t,fc) becomes independent of fc and depends
only on the maximum value of h(t,fc) and on the first derivative of
the instantaneous frequency f(t).
Figure 1.3. Schematic illustration of the output of the bandpass filter Ib(t,fc)
considering a triangular modulation for two different filter center frequencies.
In [20] the simulated spectrum of the first harmonic (k=1)
of a clock signal with a triangular modulation with Δf=5MHz,
fm=40kHz and RBW=100kHz is shown (see figure 1.4). This figure shows that
the spectrum is almost flat for fc close to f0 and that the modulation
gain in the middle is about 15.5 dB.
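The value of about 15.5 dB can be cross-checked against (1.10) with a short calculation. When the pulses are isolated, (1.10) and (1.6) give a gain equal to sqrt(k·|f'|)/max|h0(t)|. The sketch below assumes a Gaussian RBW filter with unit pass-band gain (the filter shape is an assumption made here, the text does not state it) and uses the sweep rate of a triangular profile, |f'|=4·Δf·fm.

```python
import math


def gaussian_h0_peak(rbw):
    """Peak of h0(t) for a Gaussian low-pass response with unit DC gain and
    3 dB bandwidth rbw (Gaussian shape is an assumption of this sketch)."""
    return rbw * math.sqrt(math.pi / (2.0 * math.log(2.0)))


def flat_region_gain_db(k, sweep_rate, rbw):
    """Gain predicted by (1.10) for non-overlapping pulses, in dB."""
    ratio = math.sqrt(k * abs(sweep_rate)) / gaussian_h0_peak(rbw)
    return 20.0 * math.log10(ratio)


df, fm, rbw = 5.0e6, 40.0e3, 100.0e3
sweep_rate = 4.0 * df * fm        # |f'(t)| of a triangular profile, Hz/s
gain_db = flat_region_gain_db(k=1, sweep_rate=sweep_rate, rbw=rbw)
print(f"{gain_db:.2f} dB")
```

Under these assumptions the computed gain is close to the 15.5 dB read from the flat part of the simulated spectrum.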
Instead, when a frequency fc2 is chosen close to the upper
instantaneous harmonic frequency (k(f0+Δf)), the two successive
pulses h(t-t'n,fc2) and h(t-t'n+1,fc2) interfere with each other. This
interference can be either constructive or destructive, depending on
the particular value of fc2. As can be seen in figure 1.4, this implies an
oscillating behaviour of the spectrum. In particular, when the two
pulses interfere constructively, the maximum value at the output of the
filter increases and the modulation gain of the obtained spectrum
decreases (e.g. in figure 1.4 it reduces to about 10.4 dB).
Figure 1.4. Spectrum S(fc) of the first harmonic (k=1) of a clock signal with a
triangular modulation (Δf=5MHz, fm=40kHz and RBW=100kHz) [20].
In [20] it is also shown, intuitively, why discontinuous frequency
modulation can provide higher modulation gains than
continuous frequency modulation. To clarify this point,
figure 1.5 shows a schematic illustration of the output of the band-pass
filter considering, unlike figure 1.3, a sawtooth modulating
waveform, that is, a discontinuous waveform (around t=nTm). As
can be seen in the figure, the time distance between two successive
pulses is independent of the particular filter center frequency fc.
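The different pulse spacing of the two profiles can be sketched numerically. The code below computes the time gaps between successive crossings k·f(t)=fc, assuming an ideal triangular profile and an ideal rising sawtooth of unit period with values in [-1,+1] (an assumption of this sketch); v=(fc/k-f0)/Δf denotes the normalized position of the filter center frequency inside the swept band.

```python
def triangular_pulse_gaps(v, fm):
    """Gaps (s) between successive crossings for a triangular profile.

    v is the normalized offset (fc/k - f0)/df, with -1 < v < 1. The short
    gap is the one around the upper turning point of the profile."""
    tm = 1.0 / fm
    short = (1.0 - v) / 2.0 * tm
    return short, tm - short


def sawtooth_pulse_gap(v, fm):
    """A rising sawtooth crosses each level once per period, so the gap
    equals Tm whatever the offset v."""
    return 1.0 / fm


rbw = 100.0e3
pulse_width = 1.0 / rbw           # approximate pulse duration, s

for v in (0.0, 0.9, 0.99):
    short, long_ = triangular_pulse_gaps(v, fm=40.0e3)
    overlap = short < pulse_width
    print(v, short, long_, overlap, sawtooth_pulse_gap(v, fm=80.0e3))
```

For the triangular profile the shorter gap shrinks towards zero as fc approaches the edge of the swept band, so the pulses eventually overlap; for the sawtooth the gap stays equal to Tm.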
Figure 1.6 shows the simulated spectrum of a clock signal with a
sawtooth modulation and the same value of f'(t) as in the
case of figure 1.3.
Figure 1.5. Schematic illustration of the output of the bandpass filter Ib(t,fc)
considering a sawtooth modulation for two different filter center frequencies.
Figure 1.6. Spectrum S(fc) of the first harmonic (k=1) of a clock signal with
sawtooth modulation (Δf=5MHz, fm=80kHz and RBW=100kHz) [20].
The modulation gain increases to about 14.1dB, compared with the
modulation gain obtained by using a triangular modulation (10.4dB).
However, this modulation gain value for the sawtooth modulation
differs from the value predicted by equation (1.10). In fact, a
non-oscillating spectrum with a modulation gain close to
15.5dB is expected from equation (1.10). Therefore, the
frequency discontinuity avoids the constructive interference between
successive pulses but results in an oscillating behaviour
of the spectrum. This oscillation depends on the discontinuity of the
instantaneous modulating frequency, which is not taken
into account in (1.10).
The next section recalls the analytical approximation of the spectrum
of discontinuous frequency-modulated signals derived in [20].
1.1.2 Discontinuous Frequency Modulation
In the previous section, it was shown that the frequency discontinuity
avoids the constructive interference at the output of the band-pass
filter. However, an oscillating behaviour of the spectrum caused
by the discontinuity points has been observed too. This results in a
worsening of the modulation gain. Therefore, in order to minimize the
number of discontinuous frequency points, in [20] the signal of interest
is the one described by (1.4) where V(t) is a continuous and monotonic
function for t ∈ [0,1], with:
$$\begin{cases} V(0) = -1 \\ V(1) = +1 \end{cases} \tag{1.11}$$
Thanks to the constraints of (1.11), the only discontinuous frequency
points are the points n·Tm. Therefore, the output of the band-pass filter
(figure 1.2), that is the convolution of Ik(t) with h(t,fc), can be
written as:
$$I_b(t, f_c) = \sum_{n=-\infty}^{+\infty} \int_{n T_m}^{(n+1) T_m} I_k(\tau)\, h(t - \tau, f_c)\, d\tau \tag{1.12}$$
so, thanks to (1.11), the signal Ib(t,fc) is equal to a summation of
integrals with continuous integrands.
In [20] the method of stationary phase [21]-[24] is used in order to
compute each integral in (1.12). This method requires two conditions:
the first one is RBW<<Bsw, with Bsw the criterion bandwidth of eq. (1.9),
while the second one can be written as:
$$k\, \Delta f \gg f_m \tag{1.13}$$
This condition is always verified in spread-spectrum applications.
Assuming that the modulation function V(t) is monotonic and
considering the previous two conditions, each integral in (1.12) can be
approximated as:
$$\int_{n T_m}^{(n+1) T_m} I_k(\tau)\, h(t - \tau, f_c)\, d\tau \approx \frac{1}{\left[ k f'(t_0) \right]^{1/2}}\, I_k(t_n)\, h(t - t_n, f_c)\, e^{-j\pi/4}\, \Phi\!\left( \Delta t \sqrt{k f'(t_0)} \right) \tag{1.14}$$
where Φ(t) is a function of the well-known Fresnel integrals C(t) and
S(t) (1.15).
In (1.14) Δt is equal to:
$$\Delta t = \min\left[ t_0,\; T_m - t_0 \right] \tag{1.16}$$
By substituting (1.14) in (1.12):
$$I_b(t, f_c) \approx \sum_{n=-\infty}^{+\infty} \frac{1}{\left[ k f'(t_0) \right]^{1/2}}\, I_k(t_n)\, h(t - t_n, f_c)\, e^{-j\pi/4}\, \Phi\!\left( \Delta t \sqrt{k f'(t_0)} \right) \tag{1.17}$$
where Δt, as given by (1.16), is the time distance between the
stationary point and the nearest of the discontinuous
frequency points. The peak detector computes the spectral component
S(fc), that is the maximum value of the filter output max[Ib(t,fc)]. The
spectral component S(fc) is computed for the case in which the pulses
do not overlap each other. Since the pulses h(t-tn,fc) are spaced by Tm
and the duration of each pulse is about 1/RBW, the pulses do not
overlap when fm<<RBW. Therefore, the spectral component can be written as:
$$S(f_c) = \frac{1}{\left[ k f'(t_0) \right]^{1/2}}\, I_{0k}\, \max\left[ h_0(t) \right] \Phi\!\left( \Delta t \sqrt{k f'(t_0)} \right) \tag{1.18}$$
The good agreement between the analytical approximation of (1.18) 
and the simulation results is shown in figure 1.6.
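The effect of the discontinuity on the sawtooth gain can be illustrated with a short numerical sketch. The code assumes that the magnitude of the truncation factor takes the standard form obtained when a stationary-phase integral is cut on one side, |(1/2+C(x)) + j(1/2+S(x))|/√2, with C and S the Fresnel integrals in the π/2 convention; this form is an assumption of the sketch and is not copied from (1.15). A Gaussian RBW filter is also assumed, and the function names are illustrative only.

```python
import math


def fresnel_cs(x_max, step=1.0e-3):
    """Tabulate the Fresnel integrals C(x) and S(x) (pi/2 convention)
    on [0, x_max] with the trapezoidal rule."""
    n = int(round(x_max / step))
    cs, ss = [0.0], [0.0]
    c = s = 0.0
    prev_cos, prev_sin = 1.0, 0.0
    for i in range(1, n + 1):
        x = i * step
        cur_cos = math.cos(0.5 * math.pi * x * x)
        cur_sin = math.sin(0.5 * math.pi * x * x)
        c += 0.5 * step * (prev_cos + cur_cos)
        s += 0.5 * step * (prev_sin + cur_sin)
        cs.append(c)
        ss.append(s)
        prev_cos, prev_sin = cur_cos, cur_sin
    return cs, ss


def edge_factor(c, s):
    """Assumed magnitude of the truncation factor: 0.5 at the
    discontinuity, tending to 1 far away from it."""
    return math.hypot(0.5 + c, 0.5 + s) / math.sqrt(2.0)


cs, ss = fresnel_cs(5.0)
peak = max(edge_factor(c, s) for c, s in zip(cs, ss))

df, fm, rbw, k = 5.0e6, 80.0e3, 100.0e3, 1
sweep_rate = 2.0 * df * fm                                   # sawtooth |f'|
h0_peak = rbw * math.sqrt(math.pi / (2.0 * math.log(2.0)))   # Gaussian filter
ideal_db = 20.0 * math.log10(math.sqrt(k * sweep_rate) / h0_peak)
worst_db = ideal_db - 20.0 * math.log10(peak)
print(round(peak, 3), round(ideal_db, 1), round(worst_db, 1))
```

Under these assumptions the factor overshoots unity by about 17%, which lowers the gain from about 15.5 dB to about 14.1 dB, in line with the sawtooth value quoted for figure 1.6.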
In addition, in [20] the optimal discontinuous frequency modulation
waveform is obtained numerically as the solution of a nonlinear,
first-order boundary value problem with a Dirichlet boundary condition.
Figure 1.7 shows the modulation gain obtained for different
modulation techniques by varying fm. In particular, the modulation
gain is evaluated for Δf = 5MHz and RBW = 100 kHz, and fm is varied
between 10 and 500 kHz.
Figure 1.7. Modulation Gain for different modulations and modulation frequencies
(Δf=5MHz, RBW=100kHz).
As can be seen, the modulation gain increases with the modulation
frequency as long as fm is lower than 100 kHz, in accordance with
(1.18). When fm is higher than 100 kHz, the condition fm<<RBW
is no longer verified. This implies that equation (1.18)
becomes progressively less accurate. For fm>>RBW the spectrum
analyzer filter does not have any effect on the spectrum. In this case,
the Fourier analysis can be used to compute the spectrum of the
frequency-modulated clock signal, and the modulation gain decreases
with fm. Moreover, figure 1.7 shows that for low fm values the
highest modulation gain is achieved by the continuous frequency
optimal modulation [18],[19]. Instead, for high fm values the highest
modulation gain is obtained by the optimal discontinuous modulation
derived in [20]. In fact, by increasing the modulation frequency, the
performance of continuous modulation becomes severely limited by the
pulse superposition effect.
The simulation results of figure 1.7 also show that, for fixed frequency
deviation Δf and RBW, the optimal discontinuous
frequency modulation achieves the maximum modulation gain
value, 16.5dB. This best modulation gain is 0.5dB and 2dB higher
than the highest modulation gains achievable with sawtooth
modulation and with continuous optimal modulation, respectively.
1.2 Spread Spectrum Clock Generator
Modern systems-on-chip integrate several modules (e.g. CPU,
graphics, memories, USB interfaces, I/O interfaces, ...) working at
different frequencies that can be dynamically varied to implement
localized dynamic voltage and frequency scaling. In particular, some
modules require spread spectrum clocking to reduce the EMI level, while
other modules (e.g. USB interfaces) need an unmodulated clock signal
with reduced jitter. The clock generators should fulfil all those
requirements (frequency synthesis and spreading capabilities, low
jitter).
Spread spectrum clock generators (SSCGs) are commonly
implemented by using phase-locked loops (PLLs) [25]-[28],
frequency-locked loops (FLLs) [8] or ad-hoc all-digital techniques
[4],[5],[7],[29].
Unfortunately, the wake-up time of PLL/FLL architectures is limited
by the loop locking time. Moreover, the PLL/FLL bandwidth limitation
makes it difficult to implement triangular or Hershey-kiss modulation
profiles and does not allow discontinuous modulation (e.g.
sawtooth), which results in additional EMI improvements. All-digital
techniques, on the other hand, suffer from very large deterministic
jitter.
1.2.1 PLLs/FLLs architectures
PLL-based SSCG architectures are reported in [25]-[28]. The
all-digital clock generator core presented in [25] is shown in figure
1.8.
Figure 1.8. Simplified block diagram of the All-Digital Clock Generator 
Core proposed in [25]
A time-to-digital converter (TDC) is used as a phase detector in order to
provide a digital word to the digital skew correction (DSC) logic. The
digital phase detector (DPD) evaluates the phase error
between the reference and the oscillator output by using the TDC
output. Spread spectrum clocking is implemented within the PLL
loop. In particular, the SSC is performed as a frequency modulation at
the output of the oscillator. Figure 1.9 shows the block diagram of the
SSC. As can be seen, in the SSC block an arbitrary spreading
operation is performed by adding a digital modulation sequence to the
frequency command word.
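The spreading operation described above can be sketched in a few lines of Python. The word width, the nominal frequency command word and the triangular sequence used here are hypothetical values chosen for the example; they are not taken from [25].

```python
def triangular_sequence(length, depth):
    """One period of an integer triangular sequence spanning
    [-depth, +depth]; length must be even."""
    half = length // 2
    up = [-depth + (2 * depth * i) // half for i in range(half)]
    down = [depth - (2 * depth * i) // half for i in range(half)]
    return up + down


def modulated_fcw(fcw, sequence):
    """Add the modulation sequence to the frequency command word,
    one sample per update of the oscillator control."""
    return [fcw + m for m in sequence]


FCW = 4096                       # hypothetical nominal command word
seq = triangular_sequence(length=16, depth=20)
words = modulated_fcw(FCW, seq)
print(words)
```

Since the sequence has zero mean, the average command word (and hence the average output frequency) is left unchanged, which corresponds to a center-spread modulation; any other sequence could be used in the same way to obtain an arbitrary profile.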
Figure 1.9. SSC block diagram [25].
The architecture proposed in [26] uses a fractional injection-locked
oscillator. Injection locking is a well-known technique that
reduces the output jitter of clock generators.
Figure 1.10. Conventional injection locking method.
Figure 1.10 shows the conventional injection locking technique. In
this method the reference signal is shorted to the output of a delay
cell (DC), named reference-injected DC, in order to realign the output
edge of the delay cell every reference period. The jitter accumulated
by the VCO is, therefore, suppressed since the output edge of the
delay cell is realigned to the correct position. The drawback of
this solution is that the output frequency can be changed only by
integer multiples of the reference frequency.
The solution of [26] overcomes this drawback. Figure
1.11 shows the fractional injection locking technique proposed in [26].
In this architecture, a ring oscillator with a multiphase output is used
in order to allow injection when the output frequency is a fractional
multiple of the reference frequency. To this end, the reference-injected
delay cell is selected according to the fractional value of the
frequency control word. In figure 1.11, as an example, the injected
delay cell changes from DC5 to DC1 in order to obtain a fractional
multiplication factor equal to N+1/5.
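The selection of the injected cell can be illustrated with a minimal sketch. It assumes a ring of five delay cells whose outputs are equally spaced in phase, and it assumes that a multiplication factor N+p/5 makes the output advance by p fifths of a cycle per reference period, so that the injected cell index advances by p positions at every reference edge. The integer part N=8 and the cell ordering are hypothetical.

```python
from fractions import Fraction


def injected_cells(num_cells, step, periods, start):
    """1-based index of the delay cell realigned at each reference edge
    for a multiplication factor N + step/num_cells (the ordering of the
    cells around the ring is an assumption of this sketch)."""
    cell = start
    selected = []
    for _ in range(periods):
        selected.append(cell)
        cell = (cell - 1 + step) % num_cells + 1
    return selected


N = 8                                   # hypothetical integer part
factor = N + Fraction(1, 5)
cells = injected_cells(num_cells=5, step=1, periods=6, start=5)
print(factor, cells)
```

Starting from DC5, the next injected cell is DC1, as in the example of figure 1.11, and the selection repeats every five reference periods, after which the output has completed an integer number of cycles.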
Figure 1.11. Fractional injection locking method of [26].
In [27] spread spectrum clocking is performed by using a digital
PLL with a high-frequency random modulation (RM). Figure 1.12
shows the architecture of [27]. A 3-phase clock is generated by using
a digitally controlled oscillator (DCO). The frequency of the DCO is
controlled by a digital code. The phase error (Ne) is evaluated as the
difference between the output of the phase quantizer and the accumulated
frequency control word. A digital loop filter (DLF) receives the phase
error and provides the DCO control signal. A pseudo-random binary
sequence (PRBS) generator is used to generate the random sequence,
which is added to the DLF output.
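A PRBS generator of this kind is typically built with a linear feedback shift register. The following sketch uses the 7-bit polynomial x^7+x^6+1 and a ±1 LSB dither amplitude; both choices are assumptions made for the example, since the text does not state which sequence or amplitude is used in [27].

```python
def prbs7(seed=0x7F):
    """Fibonacci LFSR for x^7 + x^6 + 1, yielding one bit per step.
    The polynomial is an assumption of this sketch."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        yield bit


def dithered_control(dlf_output, amplitude, bits):
    """Add a +/-amplitude pseudo-random term to each loop-filter sample."""
    return [w + (amplitude if b else -amplitude)
            for w, b in zip(dlf_output, bits)]


gen = prbs7()
bits = [next(gen) for _ in range(254)]
control = dithered_control([100] * 8, amplitude=1, bits=bits)
print(bits[:127] == bits[127:], sum(bits[:127]), control)
```

The sequence repeats every 127 bits and is almost balanced between ones and zeros, so the added term has a nearly zero mean and does not shift the average DCO frequency appreciably.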
Figure 1.12. Random modulation digital PLL architecture proposed in [27].
In [28] a triangular-modulated SSCG with a phase-rotating technique is
proposed. This architecture achieves an improved jitter
?????????????????????????????????????????????????????????????-based 
technique.
The SSCG architecture presented in [8] is realized by using a
frequency-locked loop (FLL) with a memoryless Newton-Raphson
modulation profile. In this work, the profile generator executes the
Newton-Raphson algorithm in order to generate the
optimized nonlinear profile without the need for memory. This results
in a reduction of area and power consumption.
1.2.2 Ad-hoc all-digital technique
SSCGs implemented with ad-hoc all-digital techniques are
presented in [4],[5],[7],[29]. The SSCG architecture proposed in [29]
performs both clock spreading and synthesis. An output
frequency of 400MHz is obtained by parallelizing six generators.
However, this circuit requires a careful design to avoid metastability
and complicated testing due to the use of many clock domains.
In [5] an all-digital SSCG is implemented by using a digitally controlled
delay line with a digital circuit to control it. Figure 1.13 shows a
simple block diagram of this SSCG architecture. Note that the digital
delay line is employed to modulate an input clock signal. In particular,
the digital control circuit increases or decreases the delay applied to
the clock in order to obtain a modulated output signal.
This circuit can perform up and down spread by modulating the reference
frequency with a triangular waveform. However, this circuit does not
support frequency synthesis (output frequency multiplication or
division) and the maximum clock frequency is limited
to 27MHz.
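The principle of modulating a clock through a variable delay can be sketched as follows. If the n-th output edge occurs at n·Tref plus the delay inserted at that cycle, each output period equals Tref plus the change in delay between consecutive cycles. The delay resolution and the control codes below are hypothetical values chosen for illustration and do not describe the circuit of [5].

```python
def output_periods(t_ref, delays):
    """Output periods when edge n occurs at n * t_ref + delays[n]:
    each period is t_ref plus the change of the inserted delay."""
    return [t_ref + (delays[n] - delays[n - 1])
            for n in range(1, len(delays))]


t_ref = 1.0 / 27.0e6      # 27 MHz input clock, the maximum quoted above
step = 10.0e-12           # hypothetical delay-line resolution, s
# Illustrative control codes: their differences ramp up and down, so the
# period deviation follows a triangular profile.
codes = [0, 1, 3, 6, 10, 13, 15, 16, 16, 15, 13, 10, 6, 3, 1, 0]
delays = [c * step for c in codes]
periods = output_periods(t_ref, delays)

mean_period = sum(periods) / len(periods)
print(mean_period, max(periods) - t_ref, min(periods) - t_ref)
```

Because the delay returns to its starting value, the mean output period equals Tref: the delay line can spread the clock frequency around the input frequency but cannot multiply or divide it, consistent with the limitation noted above.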
Figure 1.13. SSCG top-level diagram of [5].
Figure 1.14 shows the SSCG architecture proposed in [4]. As
can be seen, a delay cell array (DCA) is used in order to control the
position of the clock transitions with a triangular modulation profile. In
particular, the delay of each individual delay cell is tailored to achieve
the required period modulation. This architecture does not support
frequency synthesis and operates at 100MHz.
Moreover, the process, voltage and temperature (PVT) variations of
the delay lines are not compensated. This results in a variation of the
output waveform modulation depth with PVT.
Figure 1.14. SSCG with Delay Cell Array of [4].
The all-digital SSCG presented in [7] is based on a fully synchronous
architecture. The circuit is able to perform frequency synthesis and
to generate a clock frequency larger than 1GHz with an arbitrary
modulation profile and a modulation frequency up to 5MHz. The top-level
diagram of the all-digital SSCG of [7] is shown in figure 1.15.
As can be seen, the circuit is realized by using a digital processor and
a delay line block which includes three digitally controlled delay
lines. The RE and FE delay lines are used to generate the modulated
output clock, while the MEAS delay line, closed in a ring
oscillator topology, is employed for the on-line measurement of the
delay line resolution in order to compensate the PVT variations.
Figure 1.15. All-digital SSCG architecture of [7].
The modulator provides the control signals of the delay lines to achieve
the required period modulation. However, this architecture suffers
from a very large deterministic jitter.
1.3 References
[1] K. B. Hardin, J. T. Fessler, and D. R. Bush, “Spread spectrum clock
generation for the reduction of radiated emissions,” in Proc. IEEE Int.
Symp. Electromagnetic Compatibility, Aug. 1994, pp. 227–231.
[2] S. Y. Lin and S. I. Liu, “A 1.5 GHz all-digital spread-spectrum 
clock generator,” IEEE J. Solid-State Circuits, vol. 44, no. 11, pp. 
3111–3119, Nov. 2009.
[3] F. Pareschi, G. Setti, and R. Rovatti, “A 3-GHz serial ATA 
spread-spectrum clock generator employing a chaotic PAM 
modulation,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 
10, pp. 2577–2587, Oct. 2010.
[4] J. Kim, D. G. Kam, P. J. Jun, and J. Kim, “Spread spectrum 
clock generator with delay cell array to reduce electromagnetic 
interference,” IEEE Trans. Electromagn. Compat., vol. 47, no. 4, pp. 
908–920, Nov. 2005.
[5] S. Damphousse, K. Ouici, A. Rizki, and M. Mallinson, “All 
digital spread spectrum clock generator for EMI reduction,” IEEE J. 
Solid-State Circuits, vol. 42, no. 1, pp. 145–150, Jan. 2007.
[6] T. Ebuchi, Y. Komatsu, T. Okamoto, Y. Arima, Y. Yamada, 
K. Sogawa, K. Okamoto, T. Morie, T. Hirata, S. Dosho, and T. 
Yoshikawa, “A 125–1250 MHz process-independent adaptive 
bandwidth spread spectrum clock generator with digital controlled 
self-calibration,” IEEE J. Solid-State Circuits, vol. 44, no. 3, pp. 763–
774, Mar. 2009.
[7] D. De Caro, C. A. Romani, N. Petra, A. G. M. Strollo, and C. 
Parrella, “A 1.27GHz, All-digital spread spectrum clock 
generator/synthesizer in 65nm CMOS,” IEEE J. Solid-State Circuits,
vol. 45, no. 5, pp. 1048–1060, May 2010.
[8] S. Hwang, M. Song, Y. H. Kwak, I. Jung, and C. Kim, “A 3.5 
GHz Spread-Spectrum Clock Generator with a memoryless Newton-
Raphson modulation profile,” IEEE J. Solid-State Circuits, vol. 47, 
no. 5, pp. 1199–1208, May 2012.
[9] F. Lin and D. Y. Chen, “Reduction of power supply EMI 
emission by switching frequency modulation,” IEEE Trans. Power 
Electron., vol. 9, no. 1, pp. 132–137, Jan. 1994.
[10] K. K. Tse, H. S.-H. Chung, S. Y. Huo, and H. C. So, “Analysis 
and spectral characteristics of a spread–spectrum technique for 
conducted EMI suppression,” IEEE Trans. Power Electron., vol. 15, 
no. 2, pp. 399–410, Mar. 2000.
[11] J. Balcells, A. Santolaria, A. Orlandi, D. Gonzalez, and J. 
Gago, “EMI reduction in switched power converters using frequency 
modulation techniques,” IEEE Trans. Electromagn. Compat., vol. 47, 
no. 3, pp. 569–576, Aug. 2005.
[12] K. K. Tse, R. W.-M. Ng, H. S.-H. Chung, and S. Y. R. Hui, 
“An evaluation of the spectral characteristics of switching converters 
with chaotic carrier frequency modulation,” IEEE Trans. Ind. 
Electron., vol. 50, no. 1, pp. 171–182, Feb. 2003.
[13] Radio Frequency Devices, FCC 47 CFR Part 15, 2008.
[14] Information Technology Equipment—Radio Disturbance 
Characteristics—Limits and Methods of Measurement, CISPR 22 
(2003–2004).
[15] American National Standard for Methods of Measurement of 
Radio Noise Emissions From Low Voltage Electrical and Electronic 
Equipment in the Range of 9 kHz to 40 GHz, ANSI C63.4 2003.
[16] American National Standard for Electromagnetic Noise and 
Field Strength Instrumentation, 10 Hz to 40 GHz Specifications, ANSI 
C63.2-1996.
[17] K. Hardin, R. A. Oglesbee, and F. Fisher, “Investigation into 
the interference potential of spread-spectrum clock generation to 
broadband digital communications,” IEEE Trans. Electromagn. 
Compat., vol. 45, no. 1, pp. 10–21, Feb. 2003.
[18] Y. Matsumoto, K. Fujii, and A. Sugiura, “An analytical 
method for determining the optimal modulating waveform for dithered 
clock generation,” IEEE Trans. Electromagn. Compat., vol. 47, no. 3, 
pp. 577–584, Aug. 2005.
[19] Y. Matsumoto, K. Fujii, and A. Sugiura, “Estimating the 
amplitude reduction of clock harmonics due to frequency 
modulation,” IEEE Trans. Electromagn. Compat., vol. 48, no. 4, pp. 
734–741, Nov. 2006.
[20] D. De Caro, “Optimal discontinuous frequency modulation 
for spread-spectrum clocking,” IEEE Trans. Electromagn. Compat., 
vol. 55, no. 5, pp. 891–900, Oct. 2013.
[21] A. Erdelyi, Asymptotic Expansions. New York, USA: Dover, 
1956.
[22] F. W. J. Olver, Asymptotics and Special Functions. London, 
U.K.: Academic, 1974.
[23] B. Friedman, Lectures on Application–Oriented Mathematics.
New York, USA: Wiley, 1991.
[24] C. Chapman, “Time-domain asymptotics and the method of 
stationary phase,” in Proc. Roy. Soc. London A, Math. Phys. Sci., Apr. 
1992, vol. 437, no. 1899, pp. 25–40.
[25] Y.W. Li, et al., “A reconfigurable distributed all-digital clock 
generator core with SSC and skew correction in 22nm high-k tri-gate 
LP CMOS,” ISSCC Dig. Tech. Papers, pp.70-72, Feb. 2012.
[26] P. Park, et al., “An all-digital clock generator using a 
fractionally injection-locked oscillator in 65nm CMOS,” ISSCC Dig. 
Tech. Papers, pp.336-337, Feb. 2012.
[27] N. Da Dalt, P. Pridnig, W. Grollitsch, “An all-digital PLL 
using random modulation for SSC generation in 65nm CMOS,” 
ISSCC Dig. Tech. Papers, pp.252-253, Feb. 2013.
[28] K. H. Cheng, C. L. Hung, and C. H. Chang, “A 0.77 ps RMS 
jitter 6-GHz spread-spectrum clock generator using a compensated 
phase-rotating technique,” IEEE J. Solid-State Circuits, vol. 46, 
no. 5, pp. 1198–1213, May 2011.
[29] D. J. Allen and A. L. Carley, “Free-running ring frequency 
synthesizer,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2006, 
pp. 1502–1511.
 
Chapter 2
All-Digital SSC Generator in 28nm 
CMOS Supporting Discontinuous 
Modulation Profile
The circuit presented in this chapter exploits an all-digital 
architecture which does not require any loop to implement 
frequency synthesis and spreading. This overcomes the limitations 
of PLLs/FLLs in terms of the capability to implement discontinuous 
frequency modulation profiles with high accuracy, large 
modulation frequency and instant recovery time. In addition, the 
developed circuit can be designed with a design flow completely 
based on standard cells, which reduces the design time and 
simplifies porting to new technologies.
This chapter is organized as follows. First, the developed 
architecture is described in detail. The circuit implementation 
details are given in sections 2.1, 2.2 and 2.3. Section 2.4 describes 
the circuit sizing and discusses the major sources of deterministic 
jitter. Finally, section 2.5 reports the on-chip experimental 
measurement results.
The top-level diagram of the developed all-digital Spread Spectrum 
Clock Generator (SSCG) is shown in Fig. 2.1. The system has an 
input clock signal (clk) having a constant period Tclk=1/fclk and 50% 
duty cycle and generates a frequency modulated output clock signal.
The output clock signal is produced by using four 
digitally-controlled delay-lines (DCDL), named Δ0RE, Δ1RE, Δ0FE, 
Δ1FE, and two delay interpolators, digitally controlled by a Modulator.
The DCDL couple Δ0RE, Δ1RE is driven on the rising edge of clk and 
is in charge of generating output clock edges in a timing window of 
length Tclk/2 starting from the input clock rising edge. Similarly, the 
DCDL couple Δ0FE, Δ1FE is in charge of generating output clock 
edges in a timing window starting from the falling edge of the input 
clock signal. Each delay-line is used for one half clock period; the 
remaining half clock period is used as timing margin for the settling 
time of the delay-line control signals (delay-control R and 
delay-control F). Each DCDL has a delay resolution equal to tR 
(about 28 ps in our implementation).
Figure 2.1. SSC Generator architecture.
Offset-DCDLs (Δ1RE and Δ1FE) provide delay values offset by tR/2 
with respect to the standard DCDLs (Δ0RE and Δ0FE). As a 
consequence, each delay interpolator receives two signals whose 
delays differ by tR/2 and is able to interpolate between them with a 
1/16 step. The overall resolution in output clock edge positioning 
(encoded by the ΔRE/tR and ΔFE/tR signals) of the SSCG is therefore 
tR/32 (about 0.9 ps in our implementation, typical corner). The XOR 
gate merges the two waveforms produced by the DCDLs and 
interpolators and generates the output clock signal.
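The resolution arithmetic above can be sketched in a few lines of Python. This is only an illustration: the split of an edge position into a coarse code, a half-step flag and a fine interpolation code (`decompose`) is a hypothetical decomposition introduced here, not the control encoding of the actual circuit, and t_R = 28 ps is the approximate value quoted in the text.

```python
# Illustrative sketch: the coarse/half/fine split is an assumed
# decomposition, not the documented control encoding of the circuit.
T_R_PS = 28.0                       # DCDL resolution t_R quoted in the text (ps)
INTERP_STEPS = 16                   # interpolation factor of each interpolator
STEPS_PER_TR = 2 * INTERP_STEPS     # t_R/2 offset combined with 16 fine steps
EDGE_STEP_PS = T_R_PS / STEPS_PER_TR  # edge positioning step, t_R/32


def decompose(k):
    """Split an edge position k (units of t_R/32) into hypothetical fields."""
    coarse, rem = divmod(k, STEPS_PER_TR)   # number of whole t_R steps
    half, fine = divmod(rem, INTERP_STEPS)  # t_R/2 offset flag, 1/16 steps
    return coarse, half, fine


def edge_delay_ps(coarse, half, fine):
    """Delay (ps) selected by the three fields."""
    return (coarse + half / 2 + fine / STEPS_PER_TR) * T_R_PS


if __name__ == "__main__":
    print("edge step [ps]:", EDGE_STEP_PS)
    print("fields for k=50:", decompose(50))
    print("delay for k=50 [ps]:", edge_delay_ps(*decompose(50)))
```

With t_R of about 28 ps the step evaluates to slightly less than 0.9 ps, in line with the typical-corner figure given above.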
By choosing the number of delay elements of each delay-line so that 
an input clock semi-period is covered, the Modulator is able to 
position output clock edges anywhere on the time axis, with a delay 
resolution equal to tR/32 (see section 2.1). This topology can 
position two output clock edges within one input clock period. 
Therefore, the maximum output clock frequency, named f_out^MAX, is:
$$f_{out}^{MAX} = f_{CLK} \qquad (2.1)$$
Note that the digital inputs Tout, Δf and fm represent the nominal 
output period, the modulation depth and the modulation frequency, 
respectively.
Such a fine resolution makes it possible to generate output clock 
signals with reduced jitter by compensating the delay asymmetries 
(rise/fall asymmetries of the DCDLs, delay asymmetries of the XOR 
gates which provide the output clock signal) among the different 
paths from the input clock to the output clock signal. 
In this work we have verified a very simple compensation approach, 
in which delay asymmetries are evaluated in simulation and 
stored in two simple compensation lookup tables (delay 
compensation R and delay compensation F in figure 2.1). It is worth 
noting that each LUT requires only two input bits and encodes delay 
asymmetries normalized to tR. This makes the asymmetry 
compensation weakly dependent on process, voltage and temperature 
(PVT) operating conditions. Our simulations, on different corners and 
operating conditions, showed that this approach is very effective in 
reducing the output jitter of the circuit. As shown in section 2.5, 
this is also confirmed by jitter measurements on the test chip.
A fifth delay line (named ΔMEAS), closed in a ring-oscillator topology, 
and a Measurement Unit are included to compensate PVT variations. 
The ring-oscillator period is TOSC≈2(ΔMEAS+Δnand), where Δnand is 
the delay of the NAND gate used to start/stop the oscillator. A simple 
counter is employed in the measurement unit to measure the ratio 
TOSC/Tclk. The ratio tR/Tclk is obtained by performing two successive 
measurements of TOSC (TOSC1, TOSC2) for two different values (N1, N2) 
of the control signal ΔMEAS/tR: tR=(TOSC2-TOSC1)/(2(N2-N1)), where 
N2-N1 is a power of 2 to simplify the hardware implementation (see 
section 2.3).
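The two-point measurement can be illustrated with a small numerical sketch. The ideal period model `ring_period` and the numbers used below (a fixed overhead of 100 ps, codes 8 and 40) are assumptions chosen only for the example; the point is that any code-independent delay in the ring cancels in the difference of the two periods.

```python
# Minimal sketch of the two-point t_R estimate, under an idealized
# ring-oscillator model. All numeric values are illustrative assumptions.

def ring_period(n_code, t_r, t_fixed):
    """Ideal period: twice the loop delay (n_code steps plus fixed overhead)."""
    return 2.0 * (n_code * t_r + t_fixed)


def estimate_t_r(t_osc1, t_osc2, n1, n2):
    """t_R = (T_OSC2 - T_OSC1) / (2 * (N2 - N1)), as in the text."""
    return (t_osc2 - t_osc1) / (2 * (n2 - n1))


if __name__ == "__main__":
    t_r_true, t_fixed = 28.0, 100.0   # ps, hypothetical values
    n1, n2 = 8, 40                    # N2 - N1 = 32, a power of two
    t1 = ring_period(n1, t_r_true, t_fixed)
    t2 = ring_period(n2, t_r_true, t_fixed)
    print("estimated t_R [ps]:", estimate_t_r(t1, t2, n1, n2))
```

Choosing N2-N1 as a power of two turns the division into a simple bit shift in hardware, which is the simplification mentioned above.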
The ΔMEAS DCDL can be smaller than the other DCDLs. In our 
implementation the Δ0RE, Δ1RE, Δ0FE and Δ1FE DCDLs use 88 elements, 
while ΔMEAS requires only 40 elements. The value of tR is 
continuously measured and fed to the modulator block (see figure 2.1) 
to track PVT variations. In Fig. 2.1, the signal R0 at the prescaler 
output is asynchronous with respect to the input clock, and hence a 
synchronizer is required. This is the only synchronization point 
between different clock domains in our architecture.
The implemented SSCG supports two different sleep modes. In weak 
sleep mode all sub-modules are shut down, with the exception of the 
ring oscillator and the counter, which constantly track voltage and 
temperature variations. In these conditions the SSCG presents a power 
dissipation of only 390 µW and is able to wake up with a very short 
recovery time (a few nanoseconds), which depends only on the 
pipelining latency of the circuit. In full sleep mode all sub-modules 
are turned off and the recovery time depends on the measurement time 
of the measurement unit (5.5 µs in our implementation). It is worth 
noting that the weak sleep mode is not possible in a PLL/FLL, 
where tracking the voltage and temperature variations requires the 
operation of the whole loop.
The developed all-digital SSCG has been implemented by using only 
standard cells. As shown in the next sections, all the output jitter 
sources are confined within the Delay Line block; therefore a full 
custom layout has been realized for the Delay Line block in order to 
mitigate the asymmetry components. An automatic place & route 
flow is instead used for the Processor block.
2.1 Modulator
The Modulator determines the shape of the modulation, the 
modulation depth and the modulation frequency by computing the 
position of the edges of the output clock signal. Fig. 2.2 shows the 
architecture of the Modulator. This block is composed of two 
subsystems: a Direct Digital Frequency Synthesizer (DDFS) and a 
Modulation Profile block. Both subsystems are synchronized by the 
input clock signal (clk).
Figure 2.2. Architecture of the Modulation profile and Digital Frequency
Synthesizer.
The DDFS receives as input the instantaneous frequency fi of the 
output clock, normalized to the input clock frequency (fCLK), and 
computes, in each clock cycle, the DCDL inputs (INRE, INFE) and 
the delay-control signals (ΔRE/TCLK, ΔFE/TCLK). As shown in Fig. 2.2, 
this is obtained with the help of a finite state machine (Next-Edge 
Computation block).
The signal INRE is high when an output clock edge has to be generated 
between the next rising and falling edges of the input clock signal. 
In this case, the signal ΔRE/tR encodes the delay (normalized to the 
delay line resolution tR) between the input clock rising edge and the 
output clock edge. Similarly, the signal INFE is high when an output 
clock edge has to be generated between the next falling and rising 
edges of the input clock signal. In this case, the signal ΔFE/tR 
encodes the delay (normalized to the delay line resolution tR) between 
the input clock falling edge and the output clock edge. Note that the 
output clock can have a frequency as high as fclk; therefore it is 
possible that both INRE and INFE are high in the same clock cycle. The 
DCDL control signals are scaled to the DCDL resolution tR by using two 
multipliers; the value TCLK/tR is given by the Measurement Unit.
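The behaviour of the Next-Edge Computation block can be sketched with a purely behavioural Python model. This is an assumption-laden illustration rather than the implemented finite state machine: it ignores pipelining latency, uses exact rational arithmetic instead of fixed-point words, keeps the instantaneous frequency constant, and the function name `next_edge_schedule` is hypothetical. Delays are expressed in units of TCLK; multiplying them by TCLK/tR gives the normalized control values described above.

```python
from fractions import Fraction

# Behavioural sketch (not the implemented FSM): one output edge every
# T_out/2, assigned to the rising-edge or falling-edge half-period window.

def next_edge_schedule(fi_over_fclk, n_cycles):
    """Return per-cycle tuples (in_re, d_re, in_fe, d_fe).

    Delays are in units of T_CLK, measured from the start of the window.
    Requires fi_over_fclk <= 1 so that each window holds at most one edge.
    """
    half = Fraction(1, 2)
    edge_spacing = half / Fraction(fi_over_fclk)   # T_out / 2 in T_CLK units
    t_next = Fraction(0)                           # time of the next output edge
    schedule = []
    for k in range(n_cycles):
        in_re = in_fe = False
        d_re = d_fe = None
        if k <= t_next < k + half:                 # rising-edge window
            in_re, d_re = True, t_next - k
            t_next += edge_spacing
        if k + half <= t_next < k + 1:             # falling-edge window
            in_fe, d_fe = True, t_next - (k + half)
            t_next += edge_spacing
        schedule.append((in_re, d_re, in_fe, d_fe))
    return schedule


if __name__ == "__main__":
    for row in next_edge_schedule(Fraction(2, 3), 3):
        print(row)
```

For fi = fCLK the model places one edge in every window, so both INRE and INFE are high in each cycle, which is the limit case noted in the text.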
The Modulation Profile block computes the instantaneous frequency fi 
by adding the desired output frequency (fo) to the instantaneous 
frequency deviation Δfi imposed by the frequency modulation profile.
The meaning of the modulation parameters (Δf and fm) is shown 
in figure 2.3 (considering as an example a linear modulation profile). 
Note that the modulations are realized in down-spreading.
Figure 2.3. Modulation parameter (e.g.  triangular modulation shape).
As shown in figure 2.2, an accumulator counts the elapsed 
time normalized to the modulation period Tm=1/fm. This signal feeds 
the blocks which compute four different modulation profiles. 
The Triangular and Sawtooth profiles are realized by using a 
1’s complementer and the output of the accumulator directly, 
respectively. The Hershey-Kiss and the optimal discontinuous 
frequency modulation [1] are implemented by using a piecewise linear 
approximation with 64 uniform segments. These blocks output the 
instantaneous frequency deviation (Δfi), normalized to Δf. The desired 
modulation can be selected by using the s0 and s1 bits. Finally, a 
multiplier and an adder compute the instantaneous frequency fi. 
Please note that, with respect to the previous all-digital SSCG [2], 
our Modulator can realize the optimal discontinuous frequency 
modulation. Discontinuous frequency modulation improves the 
modulation gain by 1.5 to 2 dB with respect to continuous frequency 
modulations (see section 1.1).
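The accumulator-based generation of the two simplest profiles can be sketched as follows. The word length `nbits`, the normalization of the profile to the range [0, 1) and the sign convention fi = fo - Δf·p (chosen here to model down-spreading) are assumptions made for the example; the Hershey-Kiss and optimal discontinuous profiles, which rely on piecewise linear tables, are not modelled.

```python
# Sketch of sawtooth and triangular profiles derived from a phase
# accumulator. Word length and sign convention are assumptions.

def step_accumulator(acc, increment, nbits):
    """Advance the modulation phase; wraps once per modulation period."""
    return (acc + increment) % (1 << nbits)


def sawtooth(acc, nbits):
    """Accumulator output used directly, scaled to [0, 1)."""
    return acc / (1 << nbits)


def triangular(acc, nbits):
    """Fold the ramp with a 1's complement when the MSB is set."""
    mask = (1 << (nbits - 1)) - 1
    low = acc & mask
    if acc >> (nbits - 1):
        low ^= mask                    # 1's complement of the lower bits
    return low / (1 << (nbits - 1))


def instantaneous_frequency(f_o, delta_f, p):
    """Down-spreading: f_i stays between f_o - delta_f and f_o."""
    return f_o - delta_f * p


if __name__ == "__main__":
    acc = 0
    for _ in range(16):
        print(acc, sawtooth(acc, 4), triangular(acc, 4))
        acc = step_accumulator(acc, 1, 4)
```

The 1's complement makes the second half of the modulation period a mirror image of the first, so the triangular profile comes at the cost of one conditional inversion on top of the sawtooth.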
Table 2.1 reports the exact meaning of the Modulator inputs. 
Please note that there is no need to compute the modulation parameters 
(Δfi/TCLK, fi/fCLK) at gigahertz frequency. The clock of the modulation 
profile block is therefore divided by four to reduce the power 
dissipation. The resolution of TOUT is, in the worst case 
(fCLK=300MHz), equal to 0.41ps. In the worst case (TOUT/TCLK=1024) 
the maximum relative modulation depth Δf/fOUT is 3.1%. Please note 
that a larger modulation depth can be obtained for TOUT<1024TCLK. In 
the worst case (TOUT/TCLK=1) the resolution of the modulation depth 
is 0.20%. Finally, in the worst case (fCLK=300MHz) the maximum 
modulation frequency is 18.75MHz, while the modulation frequency 
resolution is 0.07kHz.
Table 2.1. Meaning and range of Modulator input signals
2.2 Delay Line Block
 
A key element of an all-digital Spread Spectrum Clock Generator is 
the design of the digitally controlled delay line (DCDL), since the INL 
of these components translates into output clock jitter. As introduced 
previously, four DCDLs and two interpolators are used to generate 
the output clock signal.
In the literature, different architectural solutions have been 
developed to implement the DCDL. In many papers the DCDL is realized 
by using a chain of delay cells and a multiplexer to select the desired 
cell output [3]-[8]. The drawback of these solutions is the trade-off 
between the frequency range and the minimum delay (tmin) of the 
DCDL, since the multiplexer delay increases with the number of cells. 
Please note that tmin is a critical design parameter in many 
applications. For example, in an ADDLL/ADPLL tmin determines the 
maximum output frequency of the circuit. This aspect is also critical 
for the all-digital SSCG of [2], where a correct DCDL synchronization 
is obtained only by imposing that tmin is lower than one half of the 
input clock period. Note that the solution presented in [3] obtains a 
reduced tmin by using a tree-based multiplexer topology. However, this 
implies an irregular structure which results in increased 
layout-induced non-linearity.
In [9]-[12] the DCDL consists of a series of equal delay elements (DE). 
Each DE is constructed by using only NAND gates, obtaining a very 
good linearity and a resolution equal to 2tNAND (tNAND being the delay 
of a NAND gate). The minimum delay of the DCDL (tmin) is very low and 
is independent of the number of cells. Moreover, the highly 
regular topology allows a simple layout organization which provides 
very low layout-induced non-linearity. However, this topology suffers 
from glitching, which prevents its use in an SSCG architecture.
The DCDL topology employed in [13] again uses a structure of 
cascaded delay elements. However, differently from [9]-[12], each 
element is constructed by using three-state inverters (TINV), obtaining 
a resolution tR=2tTINV. In addition, [13] proposes the two parallel 
DCDL topology shown in Fig. 2.4. This topology consists of two DCDLs 
connected in parallel, where the delay of one DCDL is offset by 
tR/2=tTINV. This structure, named coarse stage, presents an 
“equivalent” resolution equal to tR/2=tTINV that, in general, is lower 
than the resolution of [9]-[12]. In fact, the pull-up network of a TINV 
requires two series devices whereas a NAND gate uses a single device 
in the pull-up. Therefore, we can expect that tNAND<tTINV<2tNAND.
Figure 2.4. Two parallel DCDL topology proposed in [13] to halve the resolution of 
the coarse stage.
In [2] the DCDL is also realized by using a cascade of equal delay 
elements. In particular, each delay element is constructed by using an 
inverter and an inverting multiplexer. Note that the different delays of 
the inverter and the multiplexer result in a tmin mismatch between odd 
and even control codes. This mismatch results in an increased integral 
non-linearity (INL). Moreover, the multiplexer has a large delay, 
which yields a resolution coarser than that of both NAND-based 
and TINV-based DCDLs. The DCDL presented in [14] is also 
based on a chain of NAND-based delay elements. This topology avoids 
the glitching problem of previous NAND-based solutions [9]-[12] and 
maintains the same resolution (2tNAND) and minimum delay. The 
Delay Line Block of the developed SSCG uses the glitch-free 
NAND-based DCDL proposed in [14]. Note that in our implementation, 
as shown in the next section, the parallel coarse stage approach of 
[13] is extended to the NAND-based DCDL of [14]; this results in a 
resolution equal to tNAND, which is finer than that of the previous 
DCDL topologies. Moreover, as shown in section 2.2.2, this work 
introduces a novel fine stage that provides an even finer resolution. 
Note that the total absolute jitter depends on the delay resolution and 
on the asymmetries; therefore, improving the delay-line resolution and 
reducing the asymmetries has a strong impact on the overall jitter.
2.2.1 NAND-based Digitally Controlled Delay 
Lines (DCDL)
The NAND-based DCDL presented in [14] is shown in Fig. 2.5. This 
DCDL is composed of a cascade of equal delay elements (DE), each 
realized by using four NAND gates (where “A” denotes the fast input) 
and two dummy NAND gates (highlighted in gray) added for load 
balancing. This architecture guarantees the monotonicity of the delay 
chain and also very good linearity, if the DCDL layout is properly 
arranged. Two sets of control bits (Si and Ti) are used for controlling 
the delay of the DCDL. The Si bits encode the control code c by using 
a thermometric code: Si = 0 for i < c and Si = 1 for i ≥ c. The control 
bits Ti encode again c by using a one-cold code: Tc+1 = 0, Ti = 1 for 
i ≠ c+1. In Fig. 2.5, as an example, the state of the control bits is 
shown for In=1 and c=1. According to the chosen control-bit 
encoding, each DE can be in one of three possible states: turn state 
(Si=1, Ti=1), pass state (Si=0, Ti=1) and post-turn state (Si=1, Ti=0). 
In Fig. 2.5 the NAND gates highlighted in gray provide the same load 
(two NAND gates) for all the NAND gates which, therefore, to a first 
order approximation, present the same delay. In this way we can write 
the delay δ, from In to Out, as follows:
$$\delta = 2t_{NAND} + 2t_{NAND} \cdot c \qquad (2.2)$$
where tNAND=(tNAND LH + tNAND HL)/2, while tNAND LH and tNAND HL 
represent the delay of each NAND gate for a low-to-high and 
high-to-low output commutation, respectively.
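The control-bit encoding and the first-order delay model of equation (2.2) can be written down as a short sketch. The helper names (`control_bits`, `de_state`, `dcdl_delay`) and the value tNAND = 14 ps (half of the tR of about 28 ps quoted earlier) are illustrative assumptions.

```python
# Sketch of the S_i / T_i encoding and of the delay model (2.2).
# Helper names and the t_NAND value are illustrative assumptions.

STATE_NAMES = {(1, 1): "turn", (0, 1): "pass", (1, 0): "post-turn"}


def control_bits(c, n_elements):
    """Thermometric S_i and one-cold T_i for control code c."""
    s = [0 if i < c else 1 for i in range(n_elements)]
    t = [0 if i == c + 1 else 1 for i in range(n_elements)]
    return s, t


def de_state(s_i, t_i):
    """Logic state of one delay element."""
    return STATE_NAMES[(s_i, t_i)]


def dcdl_delay(c, t_nand):
    """First-order delay from In to Out: 2*t_NAND + 2*t_NAND*c."""
    return 2 * t_nand + 2 * t_nand * c


if __name__ == "__main__":
    s, t = control_bits(1, 4)
    print([de_state(si, ti) for si, ti in zip(s, t)])
    print("delay [ps]:", dcdl_delay(1, 14.0))
```

For c = 1 the elements are, in order, in pass, turn, post-turn and turn state, which reproduces the rule that a DE in post-turn state is followed by a DE in turn state.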
Figure 2.5. Digitally-Controlled Delay Line topology of [14]
 
Equation (2.2) shows that tR = 2tNAND and tmin = 2tNAND. Please 
note that there is a relationship between the logic states of successive 
DEs. As an example, a DE in post-turn state is always followed by a 
DE in turn state.
Table 2.2. All possible states of a couple of DEs
Table 2.2 shows all the possible logic states of the (i+1)-th DE given 
the logic state of the i-th DE. Glitching is a common problem in 
DCDL design; for example, glitch-free operation is required when 
DCDLs are employed to process clock signals. The NAND-based DCDL 
is glitch-free when the switching of every couple of successive DEs is 
glitch-free in all possible conditions. In particular, a timing 
constraint on the control bits is required to avoid glitching events in 
the DCDL of [14]. Therefore, a proper driving circuit is used to 
generate the correct timing of the control bits Si and Ti.
Figure 2.6 shows the driving circuit presented in [14] which, 
assuming the encoding mentioned before, produces no glitches when 
the delay control code switches. Please note that this driving circuit 
uses two flip-flops to determine the control bits of each DE. So, in 
general, if we have N delay elements then 2N flip-flops are required 
for driving the DCDL. 
In order to reduce the number of flip-flops, an improved driving 
circuit has been realized. The developed driving circuit is able to 
generate the control bits of the NAND-based DCDL avoiding glitch 
events and using one flip-flop for each DE.
For this purpose a new encoding for the control bits is implemented 
(see Table 2.3). This encoding allows each Si signal to be evaluated 
by looking at two successive Ti signals (Ti and Ti-1).
Figure 2.6. Driving Circuit of [14]
 
More precisely, Si can be obtained as the logic OR of the signals 
Ti and Ti-1.
Table 2.3. New encoding for logic states of DE
 
Fig. 2.7a shows the novel driving technique, which halves the number 
of flip-flops of the driving circuitry with respect to the driving 
circuit of Fig. 2.6. Fig. 2.7b shows an alternative driving circuit 
obtained by using the complemented flip-flop outputs (complemented 
Ti signals), one NAND gate and an inverter to compute the Si and Ti 
signals. Unfortunately, with the driving circuits of both Fig. 2.7a and 
Fig. 2.7b a glitch can occur at the output of the DCDL.
As an example, Fig. 2.8 shows a possible glitch case in the 
NAND-based DCDL when the delay control code is increased. 
Referring to the i-th DE, both inputs of the NAND “4” switch. In 
particular, the “A” input of this gate switches from 1 to 0, and this 
switching is driven by the high-to-low (HL) switching of Si+1. The 
other input (“B” input) of the NAND gate switches from 0 to 1, 
because of the HL switching of Si. A glitch (a spurious pulse, since 
both inputs are momentarily high) can occur at the output of this gate 
if the “B” input switches much before the “A” input.
Figure 2.7. New driving circuit topology.
We will assume that no glitch is produced when the difference 
between the arrival time of the “A” input and the arrival time of the 
“B” input is lower than the propagation delay of the gate (tNAND). 
Referring to Fig. 2.8, the arrival times of the “A” input and the “B” 
input are indicated as tA and tB, respectively.
Figure 2.8. Possible glitch event of NAND-based DCDL with the driving circuit of 
Fig 2.7.
In particular, tA=tSi+1HL + 2tNAND, where tSi+1HL represents the 
arrival time of the HL switching of the Si+1 signal, and 
tB=tSiHL + tNAND, where tSiHL represents the arrival time of the HL 
switching of the Si signal. If the driving circuit shown in Fig. 2.7b 
is used, then the arrival times tSiHL and tSi+1HL are both equal to 
tNAND; in this case the time difference is tB – tA = –tNAND, that is, 
the “A” input switches after the “B” input, so the glitch occurs at the 
output of the NAND “4”.
Fig. 2.9 shows a glitch-free design of the driving circuit, where a 
number of gates are inserted in order to guarantee the timing 
constraints on the control bits Si and Ti that avoid the glitch. In fact, 
referring to Fig. 2.9, the arrival time tSiHL=3tNAND and the arrival 
time tSi+1HL=tNAND, therefore tB – tA = tNAND, that is, the “B” input 
switches after the “A” input. In this way the glitching at the 
output of the NAND “4” is avoided and the timing glitch margin is 
equal to 2tNAND.
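The arrival-time bookkeeping of the two driving circuits can be checked numerically. The sketch below simply encodes the relations stated in the text (tA = tSi+1HL + 2tNAND, tB = tSiHL + tNAND, and the no-glitch assumption tA - tB < tNAND) with times expressed in units of tNAND; the function names are hypothetical and only the single glitch case discussed above is covered.

```python
# Numerical check of the glitch condition at NAND "4", with all times
# expressed in units of t_NAND. Only the case discussed in the text.

def arrival_times(t_si_hl, t_si1_hl, t_nand=1):
    """Return (t_A, t_B) from the HL arrival times of S_i and S_(i+1)."""
    t_a = t_si1_hl + 2 * t_nand
    t_b = t_si_hl + t_nand
    return t_a, t_b


def is_glitch_free(t_a, t_b, t_nand=1):
    """No glitch is assumed when t_A - t_B is lower than t_NAND."""
    return (t_a - t_b) < t_nand


if __name__ == "__main__":
    cases = {
        "Fig. 2.7b (S_i and S_(i+1) both at t_NAND)": (1, 1),
        "Fig. 2.9 (S_i at 3 t_NAND, S_(i+1) at t_NAND)": (3, 1),
    }
    for name, (t_si, t_si1) in cases.items():
        t_a, t_b = arrival_times(t_si, t_si1)
        print(name, "t_B - t_A =", t_b - t_a, "glitch-free:", is_glitch_free(t_a, t_b))
```

Delaying Si by two additional gate delays is what moves the “B” transition after the “A” transition in the glitch-free circuit.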
The analysis of all the possible glitch cases is beyond the scope of 
this thesis; however, we have verified that glitch events are avoided 
in all possible cases. To sum up, the circuit of Fig. 2.9, composed of a 
cascade of control elements (CE), exploits the redundancy of the 
encoding of the Si, Ti bits (Si=Ti+Ti-1), which allows a single 
flip-flop to be employed for each CE.
Figure 2.9. Novel glitch free driving circuit topology.
 
In comparison to the solution of [14], which requires two flip-flops 
for each CE, this results in a significant reduction of both area 
occupation and power dissipation.
As mentioned in the previous section, in order to improve the 
resolution of the DCDL presented in [14], we have extended the 
parallel topology proposed in [13] to NAND-based structures. This 
topology consists of two DCDLs connected in parallel, where 
the delay of one DCDL is offset by tR/2. This approach provides an 
“equivalent” resolution equal to tR/2=tNAND.
Figure 2.10. Digitally-Controlled Delay-Line (DCDL) topologies.
Therefore, as shown in Fig. 2.10, an offset NAND-based DCDL 
is realized. In the offset-DCDL, the first DE is modified and two 
offset DEs are added at the beginning of the line. The effect of these 
DEs is to replace the NAND 4 of the first DE of the main DCDL with 
the parallel of NAND 4a and the cascade of NANDs 4b, 4c, 4d. This 
structure results in an overall delay of 2tNAND (half-way between 
3tNAND and tNAND), which corresponds to adding tR/2 to the signal 
propagation path in comparison to the main DCDL. The DE layout 
has been placed and routed manually, by carefully controlling 
parasitics.
As discussed in section 2.1, each delay-line has to be able to 
position the output clock edge within a window with a duration of one 
half of the input clock period (TCLK/2). This requires the following 
constraint:
$$(N - 1) \cdot t_R \geq \frac{T_{CLK}}{2} \qquad (2.3)$$
where N is the number of elements of the delay lines Δ0RE, Δ1RE, 
Δ0FE, Δ1FE. Since the minimum input clock frequency is 300MHz and 
the minimum tR is 21.8ps, the minimum number of elements of the 
delay lines is NMIN=77. In the realized circuit, in order to optimize 
the symmetry of the layout of the delay lines, N is chosen equal to 88. 
This allows the circuit to work down to a frequency of 264MHz.
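The sizing constraint can be evaluated numerically. The sketch below assumes the form (N - 1)·tR ≥ TCLK/2 for constraint (2.3), which is the reading consistent with the 264MHz figure quoted for N = 88; the function names are hypothetical.

```python
# Numerical check of the delay-line sizing constraint, assuming the form
# (N - 1) * t_R >= T_CLK / 2. Function names are illustrative.

def min_input_frequency_hz(n_elements, t_r_s):
    """Lowest f_CLK whose half period is still covered by the line."""
    return 1.0 / (2.0 * (n_elements - 1) * t_r_s)


def covers_half_period(n_elements, t_r_s, f_clk_hz):
    """True when (N - 1) * t_R >= T_CLK / 2."""
    return (n_elements - 1) * t_r_s >= 1.0 / (2.0 * f_clk_hz)


if __name__ == "__main__":
    t_r_min = 21.8e-12                       # minimum t_R from the text (s)
    f_min = min_input_frequency_hz(88, t_r_min)
    print("f_min for N=88 [MHz]:", f_min / 1e6)
    print("covers 300 MHz:", covers_half_period(88, t_r_min, 300e6))
```

The extra elements beyond the strict minimum therefore buy some margin on the lowest usable input clock frequency.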
2.2.2 MUX-based Digital Delay Interpolator
As highlighted at the beginning of this chapter, the design of 
highly linear interpolators and DCDLs is a key point in realizing an 
SSCG with reduced output jitter. In the developed IC a 
novel digital delay interpolator is employed which, differently from 
previously proposed solutions (e.g. [13]), can be realized by using 
only standard cells. The employed circuit topology, with an 
interpolation factor N equal to 16, is shown in Fig. 2.11.
The circuit is composed of sixteen parallel two-to-one MUX cells, 
available from the standard-cell library, and is driven by a 
thermometric code through the control bits F0,....,F15. Let us name x 
the delay control code. If we assume for each multiplexer a simple 
delay model with intrinsic delay ti and rise time tr, the voltage at the 
output node can be written as:
$$V_{OUT} = \frac{V_{DD}}{t_r}\left(t - \frac{x\Delta}{N} - t_i\right) \qquad (2.4)$$
where N is the number of multiplexers and Δ is the delay between the 
two input signals. 
The propagation delay tD of the interpolator can therefore be written 
as:
$$t_D = \frac{x\Delta}{N} + t_i + \frac{t_r}{2} \qquad (2.5)$$
This simple equation shows that the interpolator linearly controls 
the delay, with a delay range corresponding to the delay Δ between 
the two input signals. Clearly, this simple model neglects a number of 
non-ideal effects, such as the asymmetry in propagation delay 
between the two multiplexer inputs (D0 and D1) and the dependence 
of the MUX delay on the logic value of the control bit, which translate 
into a non-linear behaviour.
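The linear delay model of equation (2.5) can be illustrated with a short sketch. The intrinsic delay and rise time used in the example (20 ps and 10 ps) are placeholder values, not figures from the design, and the input spacing of 14 ps corresponds to tR/2 for the tR of about 28 ps quoted earlier.

```python
# Sketch of the ideal interpolator delay model, equation (2.5).
# t_i and t_r below are placeholder values chosen for the example.

def interpolator_delay(x, delta, t_i, t_r, n_mux=16):
    """Ideal delay: x*delta/N + t_i + t_r/2, for control code x in 0..N."""
    return x * delta / n_mux + t_i + t_r / 2


if __name__ == "__main__":
    delta_ps, t_i_ps, t_r_ps = 14.0, 20.0, 10.0
    for x in (0, 1, 8, 16):
        print(x, interpolator_delay(x, delta_ps, t_i_ps, t_r_ps))
```

In this ideal model each unit increase of x adds Δ/N to the delay, so with Δ = tR/2 and N = 16 the step is tR/32.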
One solution to mitigate these effects is to flip the multiplexers 
with respect to each other, as shown in figure 2.11. In this way, 
independently of the control code x, there are always about N/2 
multiplexer control bits at state 0 and N/2 multiplexer control bits at 
state 1. In these conditions the multiplexer asymmetries tend to 
compensate each other.
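The balancing effect of the flipped multiplexers can be verified by counting the select pins at logic 1 for every control code. The sketch assumes that every other multiplexer (the odd-indexed ones) has its data inputs swapped, so that its select pin receives the complemented thermometric bit; this alternating arrangement is an assumption made for illustration.

```python
# Count of select pins at 1 for a thermometric code x, assuming that
# odd-indexed multiplexers are flipped (inputs swapped, select inverted).

def select_pins(x, n_mux=16):
    """Physical select-pin values for control code x (0..n_mux)."""
    thermometric = [1 if j < x else 0 for j in range(n_mux)]
    return [thermometric[j] ^ (j & 1) for j in range(n_mux)]


def pins_high(x, n_mux=16):
    """Number of select pins at logic 1."""
    return sum(select_pins(x, n_mux))


if __name__ == "__main__":
    print([pins_high(x) for x in range(17)])
```

Under this assumption the count stays within one unit of N/2 for every code, whereas without flipping it would sweep from 0 to N.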
Figure 2.11. Architecture of the MUX-based digital delay interpolator used in the 
developed SSCG.
In our implementation the delay interpolator receives from the 
DCDLs two signals whose delays differ by tR/2. In particular, an 
architecture with sixteen multiplexers has been used in order to 
interpolate the delay of these two signals with a 1/16 step. In this 
way the overall resolution in output clock edge positioning of the 
SSCG is equal to tR/32.
2.2.3 Delay Line Block timing
As explained previously, the delay lines Δ0RE and Δ1RE cover a 
timing window which starts from the rising edge of the input clock. 
On the other hand, the delay lines Δ0FE and Δ1FE cover a timing 
window which starts from the falling edge of the input clock. 
Therefore, in the architecture of figure 2.1, the input signals of the 
delay lines Δ0RE and Δ1RE are the outputs of positive-edge flip-flops. 
The input signals of the delay lines Δ0FE and Δ1FE are the outputs of 
negative-edge flip-flops. Note that no dual-edge flip-flop is necessary 
within the circuit. This simplifies the design of the clock tree.
Figure 2.12. Delay Line block detailed timing.
Figure 2.12 shows the detailed timing of the RE delay lines and of the delay interpolator. The timing of the FE delay lines is similar, the only difference being that they are timed on the falling edge of the clock signal. Therefore, without loss of generality, only the RE delay lines and the delay interpolator are considered in the timing analysis.
In figure 2.12 the following definitions are assumed:
• clk_0: clock signal of the control block of the RE delay lines.
• SRE, TRE: control signals of the delay elements of the RE delay lines.
• clk_F: clock signal of the delay interpolator.
• F: control signals of the delay interpolator.
• clk_IN: clock signal of the flip-flop.
• INRE: output signal of the flip-flop, which is the input of the RE delay lines.
• OUTRE: output signal of the delay interpolator.
• ININTP_RE: input signal of the delay interpolator.
• ΔclkIN: timing margin of the clk_IN signal with respect to the clk_0 signal.
• ΔclkF: timing margin of the clk_F signal with respect to the clk_0 signal.
• tSmax: maximum propagation delay through the control block of the delay lines.
• tSmin: minimum propagation delay through the control block of the delay lines.
• tFmax: maximum propagation delay of the interpolator control signals.
• tFmin: minimum propagation delay of the interpolator control signals.
• tINmax: maximum propagation delay of the flip-flop.
• tINmin: minimum propagation delay of the flip-flop.
• tminw.c.: minimum propagation delay of the delay lines and delay interpolator in the worst case.
• tmin_DLw.c.: minimum propagation delay of the delay lines in the worst case.
• tmin_DLb.c.: minimum propagation delay of the delay lines in the best case.
Note that the clock signals are synchronized by the clock tree of the 
top-level of the SSCG.
Let us evaluate the setup/hold time constraints on the control signals of the RE delay lines (S and T signals) and the setup/hold time constraints on the control bits of the delay interpolator (F signals).
The setup time constraint on the control signals SRE and TRE can be written as:
$$\Delta_{clkIN} + t_{INmin} \ge t_{Smax} \qquad (2.6)$$
Therefore:
$$\Delta_{clkIN} \ge t_{Smax} - t_{INmin} \qquad (2.7)$$
The hold time constraint can be written as:
$$\Delta_{clkIN} + t_{INmax} + t_{min\_DLw.c.} + T_{clk}/2 \le T_{CLK} + t_{Smin} \qquad (2.8)$$
Therefore:
$$\Delta_{clkIN} \le T_{clk}/2 - t_{INmax} - t_{min\_DLw.c.} + t_{Smin} \qquad (2.9)$$
By using the equations (2.7) and (2.9) the clock period constraint on the S,T signals can be obtained as:
$$T_{clk}/2 \ge t_{INmax} - t_{INmin} + t_{min\_DLw.c.} + t_{Smax} - t_{Smin} \qquad (2.10)$$
Similarly, the setup time constraint on the control signals of the delay interpolator can be written as:
$$\Delta_{clkIN} + t_{INmin} + t_{min\_DLb.c.} \ge \Delta_{clkF} + t_{Fmax} \qquad (2.11)$$
Therefore:
$$\Delta_{clkF} \le \Delta_{clkIN} + t_{INmin} + t_{min\_DLb.c.} - t_{Fmax} \qquad (2.12)$$
Instead the hold time constraint is given by:
$$\Delta_{clkIN} + t_{INmax} + T_{clk}/2 + t_{minw.c.} \le T_{CLK} + t_{Fmin} + \Delta_{clkF} \qquad (2.13)$$
Therefore:
$$\Delta_{clkF} \ge \Delta_{clkIN} + t_{INmax} + t_{minw.c.} - t_{Fmin} - T_{clk}/2 \qquad (2.14)$$
By using the equations (2.12) and (2.14) the clock period constraint on the FRE signals can be obtained as:
$$T_{clk}/2 \ge t_{INmax} - t_{INmin} + t_{Fmax} - t_{Fmin} + t_{minw.c.} - t_{min\_DLb.c.} \qquad (2.15)$$
An extensive set of simulations has been performed in order to evaluate the propagation delays tFmax, tFmin, tSmax, tSmin, tINmax, tINmin, tminw.c., tmin_DLw.c. and tmin_DLb.c.. In this way, it is possible to determine the constraints on ΔclkIN, ΔclkF and Tclk by using the equations (2.7), (2.12), (2.10) and (2.15). Tab. 2.4 summarizes the timing constraints of the delay line block.
Table 2.4. Timing constraints of the Delay Line Block.
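The way the simulated delays translate into admissible margins can be sketched in Python as follows. This is only an illustration under stated assumptions: all numeric delay values are invented, the class and function names are hypothetical, and the inequality directions are those implied by the setup/hold derivation leading to (2.7), (2.9), (2.10), (2.12), (2.14) and (2.15).

```python
from dataclasses import dataclass


@dataclass
class Delays:
    """Simulated delays in picoseconds (all numeric values used below are hypothetical)."""
    t_in_max: float
    t_in_min: float
    t_s_max: float
    t_s_min: float
    t_f_max: float
    t_f_min: float
    t_min_wc: float      # tminw.c.
    t_min_dl_wc: float   # tmin_DLw.c.
    t_min_dl_bc: float   # tmin_DLb.c.


def clk_in_window(d, t_clk):
    """Allowed range of the clk_IN margin: lower bound (2.7), upper bound (2.9)."""
    low = d.t_s_max - d.t_in_min
    high = t_clk / 2 - d.t_in_max - d.t_min_dl_wc + d.t_s_min
    return low, high


def clk_f_window(d, t_clk, delta_clk_in):
    """Allowed range of the clk_F margin: lower bound (2.14), upper bound (2.12)."""
    low = delta_clk_in + d.t_in_max + d.t_min_wc - d.t_f_min - t_clk / 2
    high = delta_clk_in + d.t_in_min + d.t_min_dl_bc - d.t_f_max
    return low, high


def min_clock_period(d):
    """Smallest clock period satisfying both (2.10) and (2.15)."""
    half_st = d.t_in_max - d.t_in_min + d.t_min_dl_wc + d.t_s_max - d.t_s_min
    half_f = d.t_in_max - d.t_in_min + d.t_f_max - d.t_f_min + d.t_min_wc - d.t_min_dl_bc
    return 2 * max(half_st, half_f)


d = Delays(80, 50, 200, 100, 150, 90, 180, 120, 60)
T_CLK = 1e6 / 1500   # period in ps of a 1.5 GHz clock
print("minimum Tclk [ps]:", min_clock_period(d))
print("clk_IN margin window [ps]:", clk_in_window(d, T_CLK))
print("clk_F margin window [ps]:", clk_f_window(d, T_CLK, delta_clk_in=190.0))
```

A design is feasible when each window is non-empty, that is, when the lower bound does not exceed the upper bound at the target clock period.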
2.3 Measurement Unit
As introduced in section 2.1, the SSCG needs to take into account the process, voltage and temperature (PVT) variations, which modify the delay-line resolution tR. In order to compensate the variations of tR with the PVT operating condition of the circuit, the Measurement Unit continuously measures the ratio TCLK/tR and provides the value of this parameter to the Modulator block.
As shown in Fig. 2.1, the measurement circuit is composed of a Measurement Unit which drives a replica delay-line (ΔMEAS), closed in a ring oscillator topology. Let us explain the measurement operation.
The period of the ring oscillator output (TR) is given by:
$$T_R = 2\left(t_{NAND} + t_{MIN} + \frac{\Delta_{MEAS}}{t_R}\,t_R\right) \qquad (2.16)$$
where tNAND is the delay of the NAND gate and tMIN is the minimum delay through ΔMEAS.
The value of tR can be extracted in the following way. Firstly the delay-line ΔMEAS is driven with the ΔMEAS/tR input equal to N1. The resulting
period of the ring oscillator TR1 is divided by the prescaler with a 
division ratio equal to 4 and the frequency divider with a division ratio 
?????? ??? ???? ??? ???? ??????????? ???? ??? ????? ???????? ???? ??????? ??? ????
frequency divider is used as the enable signal of an up/down counter clocked by the input clock. At the end of one cycle, the counter content M1 will be a measure of the half period of the prescaler output:
$$(M_1 \pm 1)\,T_{CLK} = \frac{d\,T_{R1}}{2} = d\left(t_{NAND} + t_{MIN} + N_1 t_R\right) \qquad (2.17)$$
Afterwards, a second measurement is performed with the ΔMEAS/tR input equal to N2. Therefore the counter content M2 will be given by:
$$(M_2 \pm 1)\,T_{CLK} = d\left(t_{NAND} + t_{MIN} + N_2 t_R\right) \qquad (2.18)$$
By subtracting the two equations (2.17) and (2.18), the two unknowns tNAND and tMIN cancel out of the measurement:
$$M_1 - M_2 = d\,(N_1 - N_2)\,\frac{t_R}{T_{CLK}} \pm 2 \qquad (2.19)$$
Therefore:
$$\frac{t_R}{T_{CLK}} = \frac{M_1 - M_2}{d\,(N_1 - N_2)} \pm \frac{2}{d\,(N_1 - N_2)} \qquad (2.20)$$
The Measurement Unit evaluates the ratio TCLK/tR by using an 
arithmetic unit:
$$\frac{T_{CLK}}{t_R} = \frac{d\,(N_1 - N_2)}{M_1 - M_2} \qquad (2.21)$$
Therefore the maximum measurement error of tR/TCLK is:
$$\varepsilon_{t_R/T_{CLK}} = \frac{2}{d\,(N_1 - N_2)} \qquad (2.22)$$
By expanding in Taylor series and truncating at the first order it is 
possible to evaluate the error on TCLK/tR measurement:
$$\varepsilon_{T_{CLK}/t_R} = \varepsilon_{t_R/T_{CLK}}\left(\frac{T_{CLK}}{t_R}\right)^2 \qquad (2.23)$$
The total measurement time is equal to:
$$T_{meas} = d\,(N_1 + N_2)\,t_R + 2\,d\,(t_{MIN} + t_{NAND}) \qquad (2.24)$$
Please note that by reducing N2 we improve both the measurement error and the measurement time, but we also increase the frequency of the ring oscillator. Therefore, the N2 value is selected as low as possible, taking into account the maximum operating frequency of the prescaler. In our implementation, tR=21.8ps in the fast corner (see the next section) and we have chosen N2=6, obtaining a maximum ring oscillator frequency equal to 3.28GHz, which is the maximum clock frequency for which the prescaler has to be designed. Clearly, the maximum clock frequency of the frequency divider will be given by 3.28GHz/4=821MHz. Moreover, in the slow corner (tR=51.2ps) the maximum ring oscillator frequency is 1.39GHz and the maximum clock frequency of the frequency divider is 1.39GHz/4=349MHz. We have also chosen N1=N2+32 and d=2^16. In this way we have an error:
$$\varepsilon_{T_{CLK}/t_R} = 2^{-20}\left(\frac{T_{CLK}}{t_R}\right)^2 \qquad (2.25)$$
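The two-step measurement can be mimicked with a short behavioural sketch in Python. It is not the implemented logic: the counter is modelled simply as the truncated ratio between the enable time and the clock period, the value used for tNAND + tMIN is invented, and the function names are hypothetical. N1, N2, d and the fast-corner tR are the values quoted in the text.

```python
import math


def counter_value(n_taps, t_r, t_fixed, d, t_clk):
    """Input-clock cycles counted while the enable lasts d*(t_fixed + n_taps*t_r).

    t_fixed stands for tNAND + tMIN; flooring models the count uncertainty.
    """
    return math.floor(d * (t_fixed + n_taps * t_r) / t_clk)


def measured_ratio(t_r, t_clk, t_fixed, n1, n2, d):
    """Estimate TCLK/tR from two counter readings, as in (2.21)."""
    m1 = counter_value(n1, t_r, t_fixed, d, t_clk)
    m2 = counter_value(n2, t_r, t_fixed, d, t_clk)
    return d * (n1 - n2) / (m1 - m2)


# Times in picoseconds. N1, N2, d and tR come from the text; t_fixed is invented.
T_R, T_CLK = 21.8, 1e6 / 300
N2, D = 6, 2 ** 16
N1 = N2 + 32

estimate = measured_ratio(T_R, T_CLK, t_fixed=22.0, n1=N1, n2=N2, d=D)
exact = T_CLK / T_R
bound = 2 / (D * (N1 - N2)) * exact ** 2      # first-order bound, (2.22)-(2.23)
print(f"estimate={estimate:.4f} exact={exact:.4f} bound={bound:.4f}")
```

Because the fixed term appears in both readings, it drops out of the difference M1-M2, so the estimate does not depend on the invented value of t_fixed beyond its effect on rounding.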
Table 2.5. Measurement Unit parameters and performances.
By considering the minimum clock frequency (300MHz) and the minimum delay-line resolution (21.8ps) we can evaluate the error in the worst case conditions:
$$\varepsilon_{T_{CLK}/t_R} = 0.02238 \qquad (2.26)$$
Moreover, with the chosen parameters, by considering the worst speed corner (tR=51.2ps), the total measurement time is:
$$T_{meas} = 161.1\,\mu s \qquad (2.27)$$
Table 2.5 reports the measurement circuit parameters and performances.
2.4 Circuit analysis and sizing
2.4.1 Measurement Circuit Sizing and Limitations
The range of tR values which the measurement circuit is able to handle is determined by the length of the registers storing M1 and M2 and by the number of bits used for the signal TCLK/tR in Fig. 2.1.
In this section the limits imposed by the length of M1 and M2 are discussed. The internal architecture of the Measurement Unit uses a single up/down counter to measure directly the quantity M1-M2. Initially the delay-line ΔMEAS is driven with ΔMEAS/tR = N1, the up/down counter is cleared and is set for up-counting. Afterwards, the delay-line ΔMEAS is driven with ΔMEAS/tR = N2 and down-counting is selected. Since the counter is not reset between the first and the second measurement, its final value is directly equal to M1-M2. The only condition to impose for correct circuit operation is that M1-M2 does not exceed the maximum value representable within the counter. This condition imposes a constraint on the maximum measurable tR. Therefore we can write:
$$t_{RMAX} = T_{CLK}\,\frac{2^{L_{M12}} - 1}{d\,(N_1 - N_2)} \qquad (2.28)$$
where LM12 is the number of bits of the counter which calculates M1-M2. In our case we have chosen LM12=19, consequently:
$$t_{RMAX} = 0.2499 \cdot T_{CLK} \qquad (2.29)$$
In the worst case (fCLK = 1500MHz), the maximum value allowed for tR is 167ps.
Let us now consider the signal TCLK/tR (see Fig. 2.1). In our implementation this signal is composed of 14 bits, with an MSB of weight 2^7 and an LSB of weight 2^-6. The LSB weight results in a quantization error of TCLK/tR given by:
$$\varepsilon_{1,\,T_{CLK}/t_R} = 2^{-7} \qquad (2.30)$$
Table 2.6. Parameter and limitations of the measurement circuit.
This is an additional source of error on the signal TCLK/tR of Fig. 2.1, which adds to the error εTCLK/tR described in section 2.3. The MSB weight of TCLK/tR imposes a limitation on the minimum tR value which can be measured. The minimum possible tR value can be written as:
$$t_{RMIN} = \frac{T_{CLK}}{2^8 - 2^{-6}} \qquad (2.31)$$
In the worst case (fCLK = 300MHz), tRMIN is equal to 13ps. Tab. 2.6 summarizes the parameters and limitations of the measurement circuit.
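The two limits can be checked numerically with a few lines of Python. The helper names are hypothetical; the counter length (19 bits), d = 2^16, N1 = 38, N2 = 6 and the 14-bit format of TCLK/tR (MSB weight 2^7, LSB weight 2^-6) are the values given in the text.

```python
def t_r_max(t_clk, counter_bits, d, n1, n2):
    """Largest measurable tR, from (2.28)."""
    return t_clk * (2 ** counter_bits - 1) / (d * (n1 - n2))


def t_r_min(t_clk, msb_weight_exp=7, lsb_weight_exp=-6):
    """Smallest measurable tR, from (2.31): TCLK divided by the full-scale code."""
    full_scale = 2 ** (msb_weight_exp + 1) - 2 ** lsb_weight_exp
    return t_clk / full_scale


def period_ps(f_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / f_mhz


print(f"tRMAX at 1500 MHz: {t_r_max(period_ps(1500), 19, 2 ** 16, 38, 6):.1f} ps")
print(f"tRMIN at 300 MHz:  {t_r_min(period_ps(300)):.1f} ps")
```

The upper limit is most restrictive at the highest clock frequency and the lower limit at the lowest one, which is why the two worst cases are evaluated at 1500MHz and 300MHz respectively.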
2.4.2 Jitter Analysis and Circuit Sizing  
Let us analyze the jitter sources in our SSCG architecture. There are two main jitter sources: those introduced by the Modulator block and those due to the delay asymmetries of the Delay-Line Block. In this section a theoretical jitter analysis for both blocks is discussed.
Figure 2.2 shows the architecture of the Modulator. Note that when the modulation is turned off, Δf/fCLK is equal to 0 and, consequently, Δfi=0. Therefore, in these conditions, no error is introduced by the Modulation Profile block in Fig. 2.2. Fig. 2.13 shows the portion of the Modulator relevant for the jitter analysis.
Figure 2.13. Portion of the Modulator relevant for the jitter analysis
(without modulation).
The DDPS generates the input signals for the four delay lines (two RE and two FE) and for the delay interpolators. As shown in Fig. 2.13, this is obtained with the help of a Next-Edge Computation block, which is a finite state machine (FSM). The outputs of the Next-Edge Computation block (ΔRE/TCLK and ΔFE/TCLK) drive the delay lines and interpolators through a scaling block. In this way it is possible to compute the position of the output clock edges relative to the input clock period.
The MSB of the signals ΔRE/TCLK and ΔFE/TCLK has a weight of 2^-2, that is, the maximum value of ΔRE/TCLK and ΔFE/TCLK is 1/2. In fact, each clock edge can be positioned in a timing window of one half clock cycle. Let us name 2^-u the weight of the LSB of ΔRE/TCLK and ΔFE/TCLK. In the implementation of Fig. 2.13 the signals ΔRE/TCLK and ΔFE/TCLK are on 12 bits, therefore u=13. The signals ΔRE/TCLK and ΔFE/TCLK are scaled to the resolution tR by using two multipliers and the value TCLK/tR given by the Measurement Unit.
Let us analyze the jitter components of this portion of the Modulator. The first jitter component is due to the quantization of ΔRE/tR and ΔFE/tR. This error corresponds to the error due to the resolution of the delay line and, consequently, is independent of the particular architecture and sizing chosen for the Modulator. This first jitter component, named Jabs_tR, can be easily written as:
$$Jabs_{t_R} = 0.5\,\frac{t_R}{I} \qquad (2.32)$$
where I is the interpolation factor, equal to 32. The second jitter component is due to the errors of the signal TCLK/tR. As discussed in the previous sections, this signal is affected by a measurement error (εTCLK/tR) and a quantization error (ε1TCLK/tR). Note that the maximum value of ΔRE/TCLK and ΔFE/TCLK is 1/2, therefore the jitter due to the errors of TCLK/tR, named Jabs_TCLK/tR, can be written as:
$$Jabs_{T_{CLK}/t_R} = \frac{1}{2}\left(\varepsilon_{T_{CLK}/t_R} + \varepsilon_{1,\,T_{CLK}/t_R}\right) I\,\frac{t_R}{I} \qquad (2.33)$$
According to (2.33) we have:
$$Jabs_{T_{CLK}/t_R} = 0.483\,\frac{t_R}{I} \qquad (2.34)$$
The last source of jitter is due to the quantization of ΔRE/TCLK and ΔFE/TCLK. By looking at Fig. 2.7, this source of jitter can be easily written as:
$$Jabs_{delayTclk} = \frac{1}{2}\,2^{-u}\,\frac{T_{CLK}}{t_R}\,I\,\frac{t_R}{I} \qquad (2.35)$$
By considering the minimum clock frequency (300MHz) and the minimum delay-line resolution (21.8ps) we can evaluate this source of jitter in the worst case conditions. Moreover, in our implementation u is equal to 13, therefore:
$$Jabs_{delayTclk} = 0.299\,\frac{t_R}{I} \qquad (2.36)$$
By summing up the three jitter components, the following upper bound for the jitter is obtained:
$$Jabs_{SSCG} \le 1.282\,\frac{t_R}{I} \qquad (2.37)$$
It is worthwhile to highlight that (2.37) represents an upper bound for the output jitter, since it assumes the independence of the three jitter sources. Therefore the actual jitter can be substantially lower.
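The three coefficients and their sum can be reproduced with the short Python calculation below, which expresses each term in units of tR/I. The variable names are hypothetical; the worst-case measurement error 0.02238, the quantization error 2^-7, u = 13, I = 32 and the worst-case operating point (300MHz, 21.8ps) are the values given in the text.

```python
I_FACTOR = 32                 # interpolation factor I
U = 13                        # LSB weight of the edge-position signals is 2^-U
T_CLK_PS = 1e6 / 300          # period of the 300 MHz clock, in ps
T_R_PS = 21.8                 # minimum delay-line resolution, in ps

eps_meas = 0.02238            # worst-case measurement error of TCLK/tR, (2.26)
eps_quant = 2 ** -7           # quantization error of TCLK/tR, (2.30)

# Each contribution is expressed as a multiple of tR/I.
j_resolution = 0.5                                                 # (2.32)
j_ratio = 0.5 * (eps_meas + eps_quant) * I_FACTOR                  # (2.33)
j_position = 0.5 * 2 ** -U * (T_CLK_PS / T_R_PS) * I_FACTOR        # (2.35)
j_total = j_resolution + j_ratio + j_position                      # (2.37)

print(f"{j_resolution:.3f} + {j_ratio:.3f} + {j_position:.3f} = {j_total:.3f}  [tR/I]")
```

Note that the measurement-error term and the position-quantization term both grow with TCLK/tR, so the 300MHz operating point is the worst case for both.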
Let us now analyze the jitter components in the Delay-Line block. For this purpose the delay lines and interpolators of the Delay-Line block have been simulated for all possible control words. The simulations have been done considering a transistor-level netlist of the Delay-Line block extracted from the layout, with the inclusion of parasitics. Note that the layout of the Delay-Line block has been realized so as to equalize the parasitics, minimizing the differential non-linearity (DNL) of the DCDLs. Table 2.7 reports the post-layout simulation performance of the Delay-Line block for slow, typical and fast corners. The technology is STMicroelectronics 28nm CMOS. The integral non-linearity (INL) values reported in Table 2.7 take into account several jitter sources: the asymmetries of the tmin parameters of the delay lines and interpolators, the linearity error of the delay lines and the asymmetries of the delays of the XOR gates. It can be observed that the jitter introduced by the Delay-Line block is comparable with the jitter introduced by the Modulator.
 
Table 2.7. Delay Line block performances obtained by transistor-level simulations.
Finally, mixed transistor-level simulations have been performed to validate the circuit. Note that in these simulations the Delay-Line block is described at the transistor level (the netlist includes the parasitics extracted from the layout) and the Processor block is described by using a gate-level netlist. The simulation results are reported in Table 2.8.
 
Table 2.8. SSCG absolute jitter obtained by mixed-transistor level simulations.
2.5 On-chip measurement
 
The circuit has been fabricated in STMicroelectronics 28nm CMOS technology and is included in a 420-pin test chip. The IC micrograph and the SSCG layout are shown in Fig. 2.14. DCDL placement has been carried out manually; the regular structure in figure 2.10 corresponds to a regular layout. The remaining parts of the circuit have been implemented by place&route tools. The total silicon area is 0.031 mm².
Figure 2.14. SSCG Layout and Micrograph.
The largest block is the modulator. Note that a BIST logic block has been added in order to ease the experimental verification of the circuit. The maximum clock frequency is 1.5GHz, limited by the digital modulator. In the experimental verification a fast-corner chip has been tested.
2.5.1 Verification of the Measurement Unit 
The verification of the Measurement Unit is the first test performed on the SSCG, since a malfunction of the Measurement Unit compromises the overall circuit functionality. In the realized IC, the signal TCLK/tR, shown in Fig. 2.1, is available at the output of the chip, allowing constant monitoring of the measured DCDL resolution. Fig. 2.15 shows the measurement unit performances. In particular, the figure on the left shows the tR values obtained by varying the supply voltage, for different input clock frequencies. The right figure displays the spread (ΔtR) given by the maximum difference between the tR values measured at each supply voltage. As can be observed, tR is measured with a ΔtR lower than 90fs. This gives strong evidence of the correct operation and performances of the measurement unit.
Figure 2.15. Experimental tR and ΔtR obtained for different supply voltages and clock frequencies.
2.5.2 Verification without modulation 
 
Fig. 2.16 shows the measured jitter of the output clock in the case fCLK = 1.5GHz, fOUT = 490MHz, Vdd=1.0V, obtained by measuring the fast chip when the modulation is turned off. As can be observed, the jitter, measured by using the Tektronix TDSJIT software, reduces from 4.09ps rms to 3.16ps rms by activating the delay asymmetry compensation blocks.
Figure 2.16. Measured Jitter with and without asymmetry compensation (spread 
spectrum OFF). 
 
It is worth highlighting that in the developed circuit, differently from a PLL, the input jitter adds to the jitter introduced by the circuit. In these operating conditions we consider an input clock jitter of 1.64ps rms (which corresponds to the datasheet performance of the employed input clock generator, an Agilent 81134A). This allows the jitter due to the SSCG to be estimated as 2.71ps rms. Tab. 2.9 summarizes the experimental jitter values obtained by using the TDSJIT software in different conditions.
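The text does not state how the SSCG contribution is separated from the input jitter. One common assumption, used in the following Python sketch, is that the two contributions are uncorrelated and therefore add in quadrature; the function name is hypothetical.

```python
import math


def remove_uncorrelated_jitter(total_rms, source_rms):
    """RMS jitter left after removing an uncorrelated source (quadrature subtraction)."""
    return math.sqrt(total_rms ** 2 - source_rms ** 2)


# Measured output jitter with compensation enabled and datasheet input jitter, in ps rms.
sscg_rms = remove_uncorrelated_jitter(3.16, 1.64)
print(f"estimated SSCG jitter: {sscg_rms:.2f} ps rms")
```

Under this assumption the result is close to the estimate quoted above; a small difference in the last digit may come from rounding of the measured values.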
Table 2.9. Experimental output clock jitter obtained by TDSJIT software with 
fCLK=1.5GHz and Vdd=1.0V 
 
Table 2.9 reports the performances of the SSCG measured on the fast chip for fCLK=1.5GHz and Vdd=1.0V, for different output clock frequencies. The data reported in the table confirm the correct circuit operation in all conditions. Please note that for fOUT=fCLK and fOUT=fCLK/2 a very low output jitter is measured. In these conditions, in fact, the output clock is produced by always selecting the same tap of the same delay line. Therefore, the output jitter corresponds only to the jitter of the input clock signal added to the jitter introduced by the clock tree of the SSCG.
 
2.5.3 Verification with modulation
 
The fast SSCG chip has been extensively verified with the modulation turned on. In this section the most significant measurements are reported. Fig. 2.17 shows the experimental instantaneous output frequency of the SSCG obtained with fCLK=1.5GHz, fm=100kHz, fOUT=1GHz and Δf/fOUT=10%. Two different modulation profiles have been considered: Hershey-kiss and sawtooth. The figure shows that the circuit operates correctly, imposing the desired modulation profile in down-spread from the imposed output frequency (1 GHz). These measurements highlight the capability of the SSCG to follow both discontinuous frequency modulations (e.g. sawtooth) and complex modulation profiles (Hershey-kiss).
Figure 2.17. Experimental Modulation Profiles by TDSJIT software (fCLK=1.5GHz, fm=100kHz, fOUT=1GHz, Δf/fOUT=10% and Vdd=1.0V).
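To make the difference between a continuous and a discontinuous profile concrete, the following Python sketch generates the ideal instantaneous frequency for a triangular and a sawtooth down-spread modulation with fOUT=1GHz and 10% depth. It is only an idealized illustration: the function names are hypothetical, the direction of the sawtooth ramp is an assumption, and the Hershey-kiss profile is not modelled.

```python
def sawtooth(phase):
    """Ramp from 0 to 1 over one modulation period, then jump back to 0."""
    return phase


def triangular(phase):
    """Rise from 0 to 1 in the first half period, fall back to 0 in the second."""
    return 1.0 - abs(2.0 * phase - 1.0)


def downspread_frequency(phase, f_out, depth, profile):
    """Instantaneous frequency, always between f_out*(1 - depth) and f_out."""
    return f_out * (1.0 - depth * profile(phase))


def largest_step(trace):
    """Largest frequency change between consecutive samples."""
    return max(abs(b - a) for a, b in zip(trace, trace[1:]))


F_OUT, DEPTH, SAMPLES = 1e9, 0.10, 1000
phases = [(k % SAMPLES) / SAMPLES for k in range(2 * SAMPLES)]   # two modulation periods
saw = [downspread_frequency(p, F_OUT, DEPTH, sawtooth) for p in phases]
tri = [downspread_frequency(p, F_OUT, DEPTH, triangular) for p in phases]

print(f"sawtooth:   largest step {largest_step(saw) / 1e6:.1f} MHz")
print(f"triangular: largest step {largest_step(tri) / 1e6:.1f} MHz")
```

The sawtooth trace shows a jump of almost the full modulation depth once per period, which a loop-based generator with limited bandwidth would struggle to follow, while the triangular trace changes only by small increments.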
The effectiveness in reducing the EMI spectrum is investigated in figure 2.18 and in figure 2.19. Figure 2.18 shows the experimental peak power level of the output spectrum measured by an Agilent PSA E4445A spectrum analyzer in Peak mode when the SSCG is configured to employ the Hershey-kiss modulation (fig. 2.18a), triangular modulation (fig. 2.18b), sawtooth modulation (fig. 2.18c) and optimal discontinuous modulation (fig. 2.18d), with fOUT=1GHz, fm=100kHz, RBW=100kHz and 10% modulation depth. The results of figure 2.18 show that discontinuous frequency modulation can result in an appreciable improvement with respect to continuous frequency modulation.
Fig. 2.19 reports the reduction of the peak power level of the output clock spectrum at 1GHz with 10% modulation depth, for different modulation profiles and modulation frequencies fm. The output spectrum is measured by using the Agilent PSA E4445A spectrum analyzer in Peak mode with a resolution bandwidth (RBW) of 100kHz.
Figure 2.18. Experimental Modulation Gain for different modulation profiles: a) H-kiss modulation, b) Triangular modulation, c) Sawtooth modulation, d) Discontinuous Optimal modulation (fOUT=1GHz, Δf/fOUT=10%, RBW=100kHz, fm=100kHz).
To the best of our knowledge, the measurement results of figure 2.19 represent the first experimental verification of the advantages of frequency-discontinuous modulations. The figure, in fact, shows that the highest modulation gain (27dB) is achieved by using the optimized discontinuous modulation profile, which is slightly more effective than sawtooth; compared to the triangular and Hershey-kiss waveforms the improvement in modulation gain is larger than 2 dB. This behaviour is in accordance with the theoretical analysis done in [1] (see section 1.1). The measured spectrum obtained with the optimal discontinuous modulation for fm=140kHz is also reported in figure 2.19.
As a final test, the capability of the SSCG to switch instantaneously between two frequencies, without clock glitches, is verified. This is shown by the results of Fig. 2.20, which reports the measured instantaneous output frequency when the SSCG is driven to switch from fout=750MHz to fout=500MHz (Hershey-kiss modulation, Δf/fOUT=10%, fm=100kHz).
Figure 2.19. Experimental Modulation Gain for different modulation profiles and modulation frequencies (fOUT=1GHz, Δf/fOUT=10%, RBW=100kHz).
Figure 2.20. Instantaneous output frequency switching from 750MHz to 500MHz (fCLK=1.5GHz, fm=100kHz, Δf/fOUT=10% and Vdd=1.0V).
2.5.4 Comparison with the state of the art  
 
The performances of the IC, considering the fast corner and a supply voltage of 1.0V, are summarized in Table 2.10 and are compared with the previous art. The developed circuit supports discontinuous frequency modulation with an fm that can be as large as 20 MHz and, considering measurements with RBW=100kHz, presents the highest modulation gain, despite the use of a Δf value lower than in previous implementations, which allows achieving very high modulation gains for RBW=1MHz (see [1]). The jitter of the developed circuit, with few exceptions, is comparable with the best PLL implementations. Power and area also compare favourably with published results. Unique features of the proposed SSCG are the weak and standby mode and the very low recovery time from full standby.
Table 2.10. Circuit performances and comparison with the state of the art.
2.6 References 
[1] D. De Caro, "Optimal Discontinuous Frequency Modulation for 
Spread-Spectrum Clocking" Electromagnetic Compatibility, IEEE 
Transactions on , vol.55, no.5, pp.891,900, Oct. 2013.
[2] D. De Caro, C. A. Romani, N. Petra, A.G.M. Strollo, C. Parrella,
"A 1.27 GHz, All-Digital Spread Spectrum Clock 
Generator/Synthesizer in 65 nm CMOS," Solid-State Circuits, 
IEEE Journal of , vol.45, no.5, pp.1048,1060, May 2010.
[3] C.C. Chung, C.Y. Lee, “An All-Digital Phase-Locked Loop for 
High Speed Clock Generation,” IEEE Journal of Solid-State 
Circuits, vol.38, no.2, pp.347-351, Feb. 2003.
[4] P.L. Chen, C.C. Chung, C.Y. Lee, “A Portable Digitally 
Controlled Oscillator Using Novel Varactors,” IEEE Trans. on 
Circuits and Systems-II: Express Briefs, vol.52, no.5, 
pp.233-237,May 2005.
[5] P.L. Chen, C.C. Chung, J.N. Yang, C.Y. Lee, “A Clock Generator 
With Cascaded Dynamic Frequency Counting Loops for Wide 
Multiplication Range Applications,” IEEE Journal of Solid-State 
Circuits, vol.41, no.6, pp.1275-1285, Jun. 2006.
[6] B.M. Moon, Y.J. Park, D.K. Jeong, “Monotonic Wide-Range 
Digitally Controlled Oscillator Compensated for Supply Voltage 
Variation,” IEEE Trans. on Circuits and Systems II: Express 
Briefs, vol.55, no.10, pp.1036-1040, Oct. 2008.
[7] T. Matano, Y. Takai, T. Takahashi, Y. Sakito, Y. Takaishi, H. 
Fujisawa, S. Kubouchi, S. Narui, K. Arai, M. Morino, M. 
Nakamura, S. Miyatake, T. Sekiguchi, K. Koyama, K. Miyazawa, 
"A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a 
slew-rate-controlled output buffer," VLSI Circuits Digest of 
Technical Papers, 2002. Symposium on , vol., no., pp.112,113, 13-
15 June 2002.
[8] S. Damphousse, K. Ouici, A. Rizki, M. Mallinson, "All digital 
spread spectrum clock generator for EMI reduction," Solid-State 
Circuits Conference, 2006. ISSCC 2006. Digest of Technical 
Papers. IEEE International , vol., no., pp.962,971, 6-9 Feb. 2006.
[9] R.J. Yang, S.I. Liu, “A 40–550 MHz Harmonic-Free All Digital 
Delay Locked Loop Using a Variable SAR Algorithm,” IEEE 
Journal of Solid-State Circuits, vol.42, no.2, pp.361-373, Feb. 
2007.
[10] R.J. Yang, S.I. Liu, “A 2.5 GHz All-Digital Delay-Locked Loop 
in 0.13 µm CMOS Technology,” IEEE Journal of Solid-State 
Circuits, vol.42, no.11, pp.2338-2347, Nov. 2007.
[11] S. Kao, B Chen, S. Liu, "A 62.5–625-MHz Anti-Reset All-
Digital Delay-Locked Loop," Circuits and Systems II: Express 
Briefs, IEEE Transactions on , vol.54, no.7, pp.566,570, July 
2007.
[12] L. Wang; L. Liu, H. Chen, "An Implementation of Fast-Locking 
and Wide-Range 11-bit Reversible SAR DLL," Circuits and 
Systems II: Express Briefs, IEEE Transactions on , vol.57, no.6, 
pp.421,425, June 2010.
[13] K.H. Choi, J.B. Shin, J.Y. Sim, H.J. Park, “An Interpolating 
Digitally Controlled Oscillator for a Wide-Range All-Digital
PLL,” IEEE Trans. on Circuits and Systems—I: Regular Papers,
vol.56, no.9, pp.2055-2063, Sept. 2009.
[14] D. De Caro, "Glitch-Free NAND-Based Digitally Controlled 
Delay-Lines," Very Large Scale Integration (VLSI) Systems, 
IEEE Transactions on , vol.21, no.1, pp.55,66, Jan. 2013.
 
Chapter 3
SSC Generator with injection locking 
Digitally Controlled Oscillator (DCO)
In this chapter a novel all-digital SSCG with an injection-locking digitally controlled oscillator (DCO) is presented. The first developed SSCG (Chapter 2) has been redesigned in order to allow the generation of an output clock signal with a frequency higher than the frequency of the input clock signal. In fact, the most important drawback of the previously developed SSCG architecture is the limitation on the maximum output clock frequency: the output clock frequency can be at most as high as the clock frequency of the circuit.
Several modifications and additions have been made in order to include new functionalities and characteristics. The functionalities of this novel circuit are the following:
• Maximum output clock frequency greater than 1.5GHz.
• Multiplication capability (maximum multiplication factor equal to 8).
• Same specifications as the previous SSCG (e.g. minimum output frequency 300MHz, minimum output frequency 1.5MHz, minimum modulation frequency=18MHz).
The present Chapter is organized as follows. The SSCG architecture is 
described in the section 3.1. The circuit implementation details are 
given in section 3.2 where the major sources of deterministic jitter are 
discussed. The section 3.3 describes the output signal jitter and 
70 Chapter 3 – SSC Generator with injection locking DCO
presents the post-layout simulation results. Moreover, in this section
the future developments of the SSCG implementation are delineated.
3.1 Circuit architecture
The circuit architecture of the SSCG with injection locking is shown 
in Figure 3.1.
Figure 3.1. SSC Generator architecture.
The system has an input clock signal (clk) having a constant period Tclk and generates a frequency-modulated output clock waveform. This novel architecture allows clock multiplication to be performed, in order to obtain an output clock frequency much higher than the input clock frequency. To this purpose a novel Delay-Line block is designed by using a DCO-based architecture. In particular, the output clock signal is produced by using four DCO blocks (D0RE, D0FE, D1RE and D1FE) digitally controlled by the Modulator.
All the DCO blocks are employed for both the generation of the output signal and the on-line measurement of the DCO block parameters to compensate PVT variations. In particular, two DCO blocks (e.g. D0RE and D0FE) are in charge of generating the output clock edges (clock generation mode) while the other DCO blocks (e.g. D1RE and D1FE) are in charge of the on-line measurement of the DCO block parameters (measurement mode). When an on-line measurement is completed, the two pairs exchange their roles: the pair that was in measurement mode (e.g. D1RE and D1FE) switches to clock generation mode, while the other pair (e.g. D0RE and D0FE) switches to measurement mode. Therefore, the parameters of the DCO blocks are measured before those same blocks take charge of generating the output clock edges. In this way the PVT compensation is more accurate than in the architecture of the previous SSCG, where a replica delay line was used to compensate the PVT variations (see section 2.1).
The DCO blocks D0RE and D1RE are driven on the rising edge of the input clock and are in charge of generating up to a maximum of four output clock edges in a timing window of length TCLK/2 starting from the input clock rising edge. Similarly, the DCO blocks D0FE and D1FE are in charge of generating up to a maximum of four output clock edges in a timing window starting from the falling edge of the input clock signal. Figure 3.2 shows, as an example, the case in which the DCO blocks D0RE and D0FE are in charge of generating the output clock edges. As can be seen, each DCO block is used for one half clock period; the remaining half clock period is used as timing margin for the settling of the DCO block control signals.
Figure 3.2. Output clock generation.
This topology has the capability to position up to a maximum of eight clock edges within one input clock period. The maximum output clock frequency (fout-MAX) is therefore:
$$f_{out-MAX} = 8 \cdot f_{CLK} \qquad (3.1)$$
Note that the DCO blocks are employed in order to allow the multiplication of the clock frequency, that is, the generation of an output clock signal with a frequency higher than that of the input clock signal. Moreover, as described in the following section, these blocks also allow the injection locking technique to be implemented in order to mitigate the overall jitter of the circuit.
3.1.1 DCO-block
The employed DCO-block architecture is shown in Fig. 3.3.
Figure 3.3. DCO block architecture (clock generation mode).
This architecture has been implemented by using a design flow based on standard cells. All the inputs are synchronized to the SSCG clock (clk) and the block includes only positive-edge flip-flops.
As you can see in figure 3.3, two delay-line units ???? ???? ??? are 
employed. Each delay-line unit is realized by using two NAND-based 
DCDLs where the delay of one DCDL is offset by half resolution and 
a MUX-based digital delay interpolator, as seen in the previous 
Chapter (see section 2.1).
??????????????????????????used in a ring oscillator topology in order to 
implement the clock multiplication, ?????? ???? ?????? ????? ????? ??? ???
employed to compute the correct position of the output clock edges 
generated by the DCO.
Let us analyze the operating principle of the DCO block when it is in 
charge of generating a certain number of output clock edges (clock 
generation mode). As an example, the DCO block of Fig. 3.3 is driven 
on the rising edge of the input clock. The flip-flop input INRE 
toggles (low to high or high to low) when at least one output clock 
edge has to be generated between the next rising edge and the 
following falling edge of the input clock signal. As shown in figure 
3.3, a pulse generator has been implemented by using an inverter and 
an XNOR gate. This pulse generator receives as input the flip-flop 
output and computes the pulse signal PRE. This pulse signal is properly
delayed by the delay line unit Δ1 to produce the set input signal
of the set-reset latch (S_latch). In this way, the output of the 
set-reset latch (en_DCO) has a low to high transition and the DCO is 
enabled to generate the output clock edges. 
Two edge counters have been implemented in order to stop the clock 
edge generation: a rising-edge counter (counter LH) and a 
falling-edge counter (counter HL). Since both the initial state of the 
DCO output and the number of clock edges that need to be generated 
are known, it is possible to choose which counter should be activated 
to stop the clock generation. 
To clarify this point, in figure 3.3, as an example, four output clock 
edges are generated and the initial state of the DCO output is
considered low. In this case, only the falling-edge counter is enabled 
and only the two falling edges are counted. At the end of the count,
the signal RHL becomes high in order to reset the set-reset latch. 
Therefore, the signal en_DCO has a high to low transition and the 
generation of clock edges is stopped.  
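The choice between the two counters can be illustrated with a short Python sketch. It is only a model under stated assumptions: output edges are assumed to alternate starting from the known initial DCO level, and the counter matching the polarity of the last required edge is assumed to be the one enabled. The function name and the return format are hypothetical.

```python
import math

def select_stop_counter(initial_level, n_edges):
    """Return (counter_name, count) used to stop the DCO.

    initial_level: 0 if the DCO output starts low, 1 if it starts high.
    n_edges: total number of output clock edges to generate (>= 1).
    Assumption: edges alternate, so the first edge is rising when the
    output starts low and falling when it starts high.
    """
    if n_edges < 1:
        raise ValueError("at least one edge must be generated")
    first_is_rising = (initial_level == 0)
    # The last edge has the polarity of the first one when n_edges is odd.
    last_is_rising = first_is_rising if n_edges % 2 == 1 else not first_is_rising
    # Number of generated edges that share the polarity of the last edge.
    count = math.ceil(n_edges / 2)
    return ("LH" if last_is_rising else "HL", count)

print(select_stop_counter(0, 4))  # ('HL', 2)
```

For the case discussed in the text (four edges, output initially low) the sketch selects the falling-edge counter with a count of two.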
Figure 3.4. DCO block architecture (measurement mode).
Note that the architecture of figure 3.3 allows the injection locking 
technique to be implemented. Injection locking (see section 1.2.1)
is an effective method to reduce the overall jitter produced by the 
DCO in each clock cycle. In our implementation, the injection locking
method is implemented by using the delay line unit Δ1. In particular, 
the delay of the delay line unit Δ1 is properly chosen in order to 
realign the first generated output clock edge every input clock 
semi-period. In this way it is possible to minimize the jitter 
accumulated by the DCO in the semi-period.
The DCO block architecture in measurement mode configuration is 
shown in Fig. 3.4. In this configuration the two delay line units are 
separately closed in a ring-oscillator topology for the on-line 
measurement of the delay line unit resolution tR and of the 
minimum delay tmin, in order to compensate the process, voltage and 
temperature variations. 
At first the Measurement Unit evaluates the ratio TCLK/tR of the 
delay line unit Δ1 by measuring the ring oscillator output. 
Afterwards, the Measurement Unit evaluates the ratio TCLK/tR and 
the ratio TCLK/tmin of the delay line unit Δ2 (see 
section 3.1.4). 
3.1.2 Edge counter architecture
The edge counter architecture is shown in Fig. 3.5.
Figure 3.5. Edge counter architecture.
The edge counter receives as input the number of output clock edges
that need to be generated (named NLH/HL), the output clock of the 
DCO (named out_dco) and the enable signal (named ELH/HL), and 
produces the reset signal (named RLH/HL) of the SR latch in order to 
disable the DCO oscillation. 
The Register is updated with the SSCG clock (clk) on the basis of the 
required number of output clock edges. Therefore, when the clk signal 
has a low to high transition (or a high to low transition for the DCO 
blocks driven on the falling edge of the input clock signal), the 
register contains the sum of NLH/HL and the 2-bit counter value. The 
2-bit counter and the flip-flop are clocked by the output of the DCO. 
In this way, the output of the 2-bit comparator is high when the 
number of generated output clock edges is equal to NLH/HL-1. 
Moreover, when the last output clock edge is generated, the output of 
the flip-flop RLH/HL has a low to high transition in order to reset 
the SR latch. Therefore the DCO oscillation is stopped.
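The behaviour described above can be sketched with a small Python model. This is an interpretation and not the RTL of the circuit: the class and method names are illustrative, the 2-bit arithmetic is assumed to wrap modulo 4, the flip-flop is assumed to sample the comparator on the same DCO edge that increments the counter, and the required count is assumed to lie between 1 and 4.

```python
class EdgeCounterModel:
    """Behavioural sketch of the edge counter (names are illustrative)."""

    def __init__(self):
        self.counter = 0    # free-running 2-bit counter, clocked by out_dco
        self.register = 0   # target value, loaded on the SSCG clock
        self.reset_out = 0  # flip-flop output (R_LH/HL)

    def load(self, n_edges):
        """On the active clk edge: register = N + counter value (mod 4)."""
        self.register = (self.counter + n_edges) & 0b11
        self.reset_out = 0

    def on_dco_edge(self):
        """On each counted DCO edge: sample the comparator, then count."""
        # The comparator is high once N-1 edges have been counted.
        comparator = self.counter == ((self.register - 1) & 0b11)
        self.reset_out = int(comparator)
        self.counter = (self.counter + 1) & 0b11
        return self.reset_out


ec = EdgeCounterModel()
ec.load(2)                                   # count two edges
print([ec.on_dco_edge() for _ in range(2)])  # [0, 1]
```

In this model the reset output goes high exactly on the N-th counted edge, which is the condition used to reset the SR latch.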
Figure 3.6 shows, as an example, a simulation of the DCO-block in 
clock generation mode, in which four output clock edges are 
generated. In this case the falling-edge counter is enabled in 
order to count two falling edges (NHL=2). When the two falling edges 
have been counted, the counter output signal RHL has a low to high 
transition and the set-reset latch is reset (en_DCO becomes low). 
However, the SR latch needs to be reset before the following 
(the fifth) edge of clk_out. This results in a constraint on the 
maximum output clock frequency. 
As shown in figure 3.6, the post-layout simulation confirms this 
scenario, and the maximum output frequency is equal to 2 GHz (slow 
corner).
Figure 3.6. Edge counter frequency limitation.
3.1.3 Modulator
The architecture of the Modulator block is shown in Figure 3.7.
Figure 3.7. Architecture of the Modulation profile and Digital Frequency 
Synthesizer.
This architecture uses a Direct Digital Frequency Synthesizer (DDFS), 
which receives the instantaneous frequency fi of the output clock 
signal and computes the DCO block inputs INRE and INFE, the number 
of falling/rising output clock edges that need to be generated (NRE,
NFE) and the control signals of the delay line units Δ1 and Δ2 
(Δ1RE/tR, Δ2RE/tR, Δ1FE/tR, Δ2FE/tR) for the DCO-blocks that are
in charge of generating the output clock edges (D0RE, D0FE or D1RE,
D1FE). The control signals of the delay line units Δ1 (Δ1RE/tR, 
Δ1FE/tR) are scaled to the delay line resolution tR by using two 
multipliers and the measurement signal (TCLK/tR) computed by the 
Measurement Unit. The control signals of the delay line units Δ2 
(Δ2RE/tR, Δ2FE/tR) are instead scaled to the delay line resolution tR 
and to the minimum delay of the DCO (tmin) by using two multipliers, 
two adders and the measurement signals (TCLK/tR, tmin/TCLK) computed 
by the Measurement Unit. In fact, the control signals of the delay 
line Δ2 determine the oscillation frequency of the DCO. In particular, 
note that when the output clock frequency is higher than the input 
clock frequency, the DCO generates a certain number of output clock 
edges in one input clock semi-period. The period of these output clock 
edges is:
$$ T_{OUT} = 2 \left( t_{min} + \frac{\Delta_2}{t_R} \cdot t_R \right) \qquad (3.2) $$
where tmin is the minimum delay of the DCO, equal to the sum of the 
minimum delay of the delay line and the delay of the enable latch of 
the DCO (see Fig. 3.3). The output clock frequency therefore depends 
both on the delay line resolution and on tmin. For this reason it is 
necessary to measure also the minimum delay for the delay line unit 
Δ2 in order to compute the correct output clock frequency (see section 
3.1.4). 
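As a worked illustration of (3.2), the following Python sketch inverts the relation to obtain the delay-line control word (expressed in units of tR) needed for a target DCO frequency. The function name is hypothetical and the value used for tmin is an assumed figure chosen only for the example; it is not a measured value from this work.

```python
def dco_control_word(f_out_hz, t_min_s, t_r_s):
    """Delay-line setting (in units of t_R) giving T_OUT = 1/f_out, from (3.2)."""
    half_period = 0.5 / f_out_hz
    if half_period < t_min_s:
        raise ValueError("requested frequency exceeds the limit set by t_min")
    return (half_period - t_min_s) / t_r_s

# Illustrative numbers only: t_min = 50 ps is an assumed value.
word = dco_control_word(f_out_hz=2e9, t_min_s=50e-12, t_r_s=1.6e-12)
print(round(word, 2))
```

The sketch makes explicit why both TCLK/tR and tmin have to be known: an error on either quantity translates directly into an error on the synthesized period.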
The instantaneous frequency fi is computed by the Modulation Profile
block, by adding the desired output frequency (fo) to the 
instantaneous frequency deviation Δfi imposed by the frequency 
modulation profile. Four profiles are implemented in the circuit. 
Triangular and sawtooth profiles are realized by using, respectively, 
a 1's complementer and the output of the accumulator directly. More 
complex modulation laws (Hershey-Kiss and the optimal frequency 
modulation) are implemented by using a piecewise linear approximation 
where the interval [0,1/fm] is divided into 64 uniform segments.
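The two simplest profiles can be sketched in Python as follows. The bit-level arrangement used in the circuit is not detailed here, so this is only the usual construction, given as an assumption: the accumulator output is the sawtooth, and the triangular profile is obtained by bitwise inverting (1's complement) the lower bits whenever the accumulator MSB is set. Function names and the 4-bit width in the example are illustrative.

```python
def sawtooth(acc, bits):
    """Sawtooth profile: the accumulator output is used directly."""
    return acc & ((1 << bits) - 1)

def triangular(acc, bits):
    """Triangular profile via a 1's complementer.

    The MSB of the accumulator selects the ramp direction: when it is set,
    the remaining bits are bitwise inverted (1's complement).
    """
    low_mask = (1 << (bits - 1)) - 1
    low = acc & low_mask
    msb = (acc >> (bits - 1)) & 1
    return (~low & low_mask) if msb else low

# One full modulation period of a 4-bit accumulator.
print([triangular(a, 4) for a in range(16)])
# [0, 1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4, 3, 2, 1, 0]
```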
Table 3.1 reports the exact meaning of the Modulator inputs. 
Note that the clock of the Modulation Profile block is divided by four 
to reduce the power dissipation. In the worst case (fCLK/fOUT = 1024) 
the maximum possible modulation depth Δf/fOUT is 3.1%. Please note 
that a larger modulation depth can be obtained for fOUT < 1024fCLK. In 
the worst case (fOUT/fCLK = 8) the modulation depth resolution is 
1.60%. Finally, in the worst case (fCLK = 300MHz) the maximum 
modulation frequency is 18.75MHz, while the modulation frequency 
resolution is 0.07kHz.
Table 3.1. Meaning and range of Modulator input signals.
3.1.4 Measurement Unit
This circuit is needed to take into account the PVT variations, which 
modify the resolution tR of the delay lines and the minimum delay 
tmin of the DCO. To compensate the variations of tR and of tmin with 
the PVT operating condition of the circuit, the Measurement Unit
continuously measures the ratio TCLK/tR and the ratio TCLK/tmin and 
provides this information to the Modulator block (see Figure 3.1). Let 
us explain the measurement operation. As shown in Figure 3.4,
when the DCO-block is in measurement mode the delay line unit
Δ1 is closed in a ring oscillator topology. The output period (TR) of 
this ring oscillator is given by:
$$ T_R = 2 \left( t_{min} + \frac{\Delta_1}{t_R} \cdot t_R \right) \qquad (3.3) $$
where tmin is the minimum delay through Δ1 plus the delay of the 
multiplexer tMUX (see Fig. 3.4). The value of tR can be extracted in the 
following way. Firstly, the delay line unit Δ1 is used with the Δ/tR 
input equal to N1. The resulting period of the ring oscillator TR1 is 
divided by the prescaler, with a division ratio equal to 4, and by the 
frequency divider, with a division ratio equal to d'. In the following, 
let us name d=4·d'. The output of the frequency divider is used as the 
enable signal of an up/down counter clocked by the input clock. At the 
end of one cycle, the counter content M1 will be a measure of the half 
period of the prescaler output:
$$ (M_1 \pm 1) \cdot T_{CLK} = \frac{d \cdot T_{R1}}{2} = d \cdot \left( t_{min} + N_1 \cdot t_R \right) \qquad (3.4) $$
Afterwards, the delay line unit Δ1 is used with the Δ/tR input equal 
to N2. Therefore the counter content M2 will be given by:
$$ (M_2 \pm 1) \cdot T_{CLK} = d \cdot \left( t_{min} + N_2 \cdot t_R \right) \qquad (3.5) $$
By subtracting the previous two equations, the unknown tmin is 
eliminated from the measurement: 
$$ M_1 - M_2 = d \cdot (N_1 - N_2) \cdot \frac{t_R}{T_{CLK}} \pm 2 \qquad (3.6) $$
Therefore:
$$ \frac{t_R}{T_{CLK}} = \frac{M_1 - M_2}{d \cdot (N_1 - N_2)} \pm \frac{2}{d \cdot (N_1 - N_2)} \qquad (3.7) $$
The Measurement Unit evaluates the ratio TCLK/tR by using an 
arithmetic unit:
$$ \frac{T_{CLK}}{t_R} = \frac{d \cdot (N_1 - N_2)}{M_1 - M_2} \qquad (3.8) $$
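The two-step measurement can be illustrated with a short numerical model in Python. It is a sketch under stated assumptions: the counter contents are modelled as the integer part of d·(tmin + N·tR)/TCLK, the values of d, N1 and N2 are those chosen later in this section, and the value of tmin is an arbitrary figure used only to show that it cancels out. Function names are hypothetical.

```python
import math

def counter_value(d, n, t_min, t_r, t_clk):
    """Counter content for delay setting n: floor(d*(t_min + n*t_r)/t_clk)."""
    return math.floor(d * (t_min + n * t_r) / t_clk)

def estimate_tclk_over_tr(m1, m2, d, n1, n2):
    """Eq. (3.8): T_CLK/t_R = d*(N1 - N2)/(M1 - M2)."""
    return d * (n1 - n2) / (m1 - m2)

# Parameters quoted in this section; t_min is an assumed value.
d, n2 = 2**16, 192
n1 = n2 + 1024
t_clk, t_r, t_min = 1 / 300e6, 1.6e-12, 50e-12

m1 = counter_value(d, n1, t_min, t_r, t_clk)
m2 = counter_value(d, n2, t_min, t_r, t_clk)
est = estimate_tclk_over_tr(m1, m2, d, n1, n2)
true = t_clk / t_r
print(m1 - m2, est, true)
```

In this model the estimated ratio differs from the true one only through the counter quantization, and the error on tR/TCLK stays within the bound 2/(d·(N1-N2)).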
Therefore the maximum measurement error of tR/TCLK is:
$$ \delta_{t_R/T_{CLK}} = \frac{2}{d \cdot (N_1 - N_2)} \qquad (3.9) $$
By expanding in Taylor series and truncating at the first order it is 
possible to evaluate the error on TCLK/tR measurement:
$$ \delta_{T_{CLK}/t_R} \approx \left( \frac{T_{CLK}}{t_R} \right)^2 \cdot \delta_{t_R/T_{CLK}} \qquad (3.10) $$
The total measurement time is equal to:
$$ T_{meas} = d \cdot (N_1 + N_2) \cdot t_R + 2 \cdot d \cdot t_{min} \qquad (3.11) $$
Please note that by reducing N2 we improve the measurement error 
and the measurement time, but we also increase the frequency of the 
ring oscillator. Therefore, the N2 value is selected as low as possible, 
taking into account the maximum operating frequency of the prescaler. 
In our implementation, tR=0.68ps in the fast corner and we have chosen 
N2=192, obtaining a maximum ring oscillator frequency equal to 
3.28GHz (assuming tMUX=0), which is the maximum clock frequency for 
which the prescaler has to be designed. Clearly, the maximum clock 
frequency of the frequency divider will be given by 
3.28GHz/4=821MHz. Moreover, in the slow corner (tR=1.6ps) the 
maximum ring oscillator frequency is 1.39GHz and the maximum 
clock frequency of the frequency divider is 1.39GHz/4=349MHz. We 
have also chosen N1=N2+1024 and d=2^16. In this way we have an 
error:
$$ \delta_{T_{CLK}/t_R} \approx \left( \frac{T_{CLK}}{t_R} \right)^2 \cdot 2^{-25} \qquad (3.12) $$
By considering the minimum clock frequency (300MHz) and the 
minimum delay-line resolution (0.68ps) we can evaluate the error in 
the worst case conditions:
$$ \delta_{T_{CLK}/t_R} \approx 0.71613 \qquad (3.13) $$
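The worst-case figure can be checked numerically with a few lines of Python. The sketch only plugs the values quoted in the text (d = 2^16, N1-N2 = 1024, fCLK = 300MHz, tR = 0.68ps) into (3.9) and (3.10); the variable names are illustrative.

```python
d = 2**16
n1_minus_n2 = 1024
t_clk = 1 / 300e6      # minimum clock frequency: 300 MHz
t_r = 0.68e-12         # minimum delay-line resolution

err_tr_over_tclk = 2 / (d * n1_minus_n2)                    # eq. (3.9)
err_tclk_over_tr = (t_clk / t_r) ** 2 * err_tr_over_tclk    # eq. (3.10)
print(err_tr_over_tclk == 2**-25, round(err_tclk_over_tr, 5))
```

With these values the error on tR/TCLK equals 2^-25 and the resulting error on TCLK/tR is about 0.716, in agreement with (3.13).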
Moreover, with the chosen parameters, by considering the worst speed 
corner (tR=1.6ps), the total measurement time is (assuming tMUX=0):
$$ T_{meas} = 154.4\,\mu s \qquad (3.14) $$
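The measurement time can be cross-checked against (3.11) with the Python sketch below. The value of tmin is not stated explicitly in this section, so here it is inferred, as an assumption, from the quoted slow-corner ring oscillator frequency (1.39GHz at N2=192) using TR = 2·(tmin + N2·tR). Since that frequency is a rounded figure, the result is only expected to approximate (3.14).

```python
def measurement_time(d, n1, n2, t_r, t_min):
    """Eq. (3.11): T_meas = d*(N1 + N2)*t_R + 2*d*t_min."""
    return d * (n1 + n2) * t_r + 2 * d * t_min

d, n2 = 2**16, 192
n1 = n2 + 1024
t_r = 1.6e-12                      # slow corner
f_ring_max = 1.39e9                # ring oscillator frequency quoted for N2
# Assumption: t_min inferred from T_R = 2*(t_min + N2*t_R) at f_ring_max.
t_min = 0.5 / f_ring_max - n2 * t_r
t_meas = measurement_time(d, n1, n2, t_r, t_min)
print(f"t_min = {t_min * 1e12:.1f} ps, T_meas = {t_meas * 1e6:.1f} us")
```

Under this assumption the inferred tmin is roughly 52ps and the total measurement time comes out within about 0.1% of the 154.4µs reported above.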
Table 3.2. Measurement Unit parameters and performances.
Similarly, at the end of this measurement, the Measurement Unit
measures the resolution tR of the delay line unit Δ2. Obviously, the 
same measurement error and the same measurement time are obtained. 
However, as explained in the previous section, it is necessary to 
measure also the minimum delay of the DCO. Therefore, the 
Measurement Unit is able to measure the ratio TCLK/tmin. By summing 
up (3.4) and (3.5) we obtain:
$$ M_1 + M_2 = d \cdot (N_1 + N_2) \cdot \frac{t_R}{T_{CLK}} + 2 \cdot d \cdot \frac{t_{min}}{T_{CLK}} \pm 2 \qquad (3.15) $$
Therefore:
$$ \frac{t_{min}}{T_{CLK}} = \frac{M_1 + M_2}{2 \cdot d} - \frac{N_1 + N_2}{2} \cdot \frac{t_R}{T_{CLK}} \pm \frac{2}{2 \cdot d} \qquad (3.16) $$
The arithmetic unit of the Measurement Unit evaluates the ratio 
TCLK/tmin as:
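$$ \frac{T_{CLK}}{t_{min}} = \left( \frac{M_1 + M_2}{2 \cdot d} - \frac{N_1 + N_2}{2} \cdot \frac{t_R}{T_{CLK}} \right)^{-1} \qquad (3.17) $$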
The maximum measurement error of tmin/TCLK is therefore equal to:
$$ \delta_{t_{min}/T_{CLK}} = \frac{2}{2 \cdot d} + \frac{N_1 + N_2}{2} \cdot \delta_{t_R/T_{CLK}} \qquad (3.18) $$
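The extraction of tmin can be illustrated in the same way as the extraction of tR. The Python sketch below applies (3.16) without its error term to simulated counter contents and compares the residual with the bound of (3.18). It relies on the same modelling assumptions as before: counter contents are the integer part of d·(tmin + N·tR)/TCLK, and the value of tmin is an arbitrary figure chosen for the example. Function names are hypothetical.

```python
import math

def estimate_tmin_over_tclk(m1, m2, d, n1, n2, tr_over_tclk):
    """Eq. (3.16) without the error term."""
    return (m1 + m2) / (2 * d) - (n1 + n2) / 2 * tr_over_tclk

def tmin_error_bound(d, n1, n2):
    """Eq. (3.18), with the t_R/T_CLK error taken from eq. (3.9)."""
    err_tr = 2 / (d * (n1 - n2))
    return 2 / (2 * d) + (n1 + n2) / 2 * err_tr

d, n2 = 2**16, 192
n1 = n2 + 1024
t_clk, t_r, t_min = 1 / 300e6, 1.6e-12, 50e-12   # t_min is an assumed value

m1 = math.floor(d * (t_min + n1 * t_r) / t_clk)
m2 = math.floor(d * (t_min + n2 * t_r) / t_clk)
tr_est = (m1 - m2) / (d * (n1 - n2))              # eq. (3.7), error term dropped
est = estimate_tmin_over_tclk(m1, m2, d, n1, n2, tr_est)
print(est, t_min / t_clk, tmin_error_bound(d, n1, n2))
```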
Table 3.2 reports the measurement circuit parameters and 
performances.
3.2 Circuit analysis and sizing
3.2.1 Measurement Circuit Sizing and 
Limitations 
The range of tR values which the measurement circuit is able to handle 
is determined by the length of the registers storing M1 and M2 and the 
number of bits used for the signal TCLK/tR in Fig. 3.6. 
In this section the limits imposed by the length of M1 and M2 are 
discussed. The internal architecture of the Measurement Unit uses a 
single up/down counter to measure directly the quantity M1-M2.
Initially the delay-line unit ΔMEAS is driven with Δ/tR = N1, the 
up/down counter is cleared and is set for up-counting. Afterward, the 
delay-line ΔMEAS is driven with Δ/tR = N2 and down-counting is 
selected. Without resetting the counter between the first and the 
second measurement, the final value of the counter will be directly 
equal to M1-M2. The only condition to impose for correct circuit 
operation is that M1-M2 is lower than the maximum value representable 
within the counter. This condition imposes a constraint on the maximum 
measurable tR. Therefore we can write:
$$ t_{RMAX} = \frac{2^{L_{M12}} - 1}{d \cdot (N_1 - N_2)} \cdot T_{CLK} \qquad (3.19) $$
where LM12 is the number of bits of the counter which calculates 
M1-M2. In our case we have chosen LM12=23, consequently:
$$ t_{RMAX} \approx 0.125 \cdot T_{CLK} \qquad (3.20) $$
In the worst case (fCLK = 1500MHz), the maximum value allowed for 
tR is 83ps. 
Let us now consider the signal TCLK/tR (see Fig. 3.6). In our 
implementation this signal is composed of 14 bits, with an MSB of 
weight 2^12 and an LSB of weight 2^-1. The LSB weight results in a 
quantization error of TCLK/tR given by:
$$ \delta1_{T_{CLK}/t_R} = 2^{-2} \qquad (3.21) $$
This is an additional source of error on the signal TCLK/tR of Fig. 3.6,
which adds to the error δTCLK/tR described in section 3.1.4. The MSB 
weight of TCLK/tR imposes a limitation on the minimum tR value which 
can be measured. The minimum possible tR value can be written as:
$$ t_{RMIN} = \frac{T_{CLK}}{2^{13} - 2^{-1}} \qquad (3.22) $$
In the worst case (fCLK = 300MHz), tRMIN is equal to 0.41ps. Table 
3.3 summarizes the parameters and limitations of the measurement 
circuit.
Table 3.3. Parameter and limitations of the measurement circuit.
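The two limits discussed in this section can be reproduced with a short Python sketch. It simply evaluates (3.19) and (3.22) with the quoted parameters (LM12 = 23, d = 2^16, N1-N2 = 1024, a 14-bit TCLK/tR signal with MSB weight 2^12 and LSB weight 2^-1); the function names are hypothetical.

```python
def t_r_max(t_clk, l_m12, d, n1_minus_n2):
    """Eq. (3.19): largest t_R for which M1 - M2 fits in the counter."""
    return (2**l_m12 - 1) / (d * n1_minus_n2) * t_clk

def t_r_min(t_clk, msb_exp=12, lsb_exp=-1):
    """Eq. (3.22): smallest t_R for which T_CLK/t_R is representable."""
    max_code = 2 ** (msb_exp + 1) - 2**lsb_exp
    return t_clk / max_code

print(t_r_max(1 / 1500e6, 23, 2**16, 1024) * 1e12)  # in ps, f_CLK = 1500 MHz
print(t_r_min(1 / 300e6) * 1e12)                    # in ps, f_CLK = 300 MHz
```

The first value is about 83ps and the second about 0.41ps, matching the worst-case figures given in the text.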
3.2.2 Jitter analysis and circuit sizing
In this section a theoretical jitter analysis for the Modulator block 
is discussed. 
Figure 3.7 shows the architecture of the Modulator. Note that when 
the modulation is turned off, Δf/fCLK is equal to 0 and, consequently, 
Δfi=0. Therefore, in these conditions, no error is introduced by the 
Modulation Profile block. Figure 3.8 shows the portion of the 
Modulator relevant for the jitter analysis.
Figure 3.8. Portion of the Modulator relevant for the jitter analysis (without 
modulation).
The DDFS generates the input signals for the DCO-blocks. As shown 
in Fig. 3.8, this is obtained with the help of a Next-Edge Computation
block, which is a finite state machine (FSM). The outputs of the 
Next-Edge Computation block (Δ1RE/TCLK, Δ1FE/TCLK, Δ2RE/TCLK and 
Δ2FE/TCLK) drive the delay lines and interpolators of the delay line 
units of the DCO block through a scaling block. 
Note that the MSB of ΔRE/TCLK and ΔFE/TCLK has a weight of 2^-2, that 
is, the maximum value of ΔRE/TCLK and ΔFE/TCLK is 1/2. In fact, each 
clock edge can be positioned in a timing window of one half clock 
cycle. Let us name 2^-u the weight of the LSB of ΔRE/TCLK and 
ΔFE/TCLK. In the implementation of Fig. 3.8, these signals are on 12 
bits, therefore u=13. 
Let us analyze the jitter components of this portion of the Modulator. 
The first jitter component is due to the quantization of Δ1RE/tR, 
Δ1FE/tR, Δ2RE/tR and Δ2FE/tR. This error corresponds to the error due 
to the resolution of the delay line and, consequently, is independent 
of the particular architecture and sizing chosen for the Modulator. 
This first jitter component, named Jabs_tR, can be easily written as:
$$ Jabs_{t_R} = 0.5 \cdot t_R \qquad (3.23) $$
The second jitter component is due to the errors of the signal TCLK/tR.
As discussed in the previous sections, this signal is affected by a 
measurement error (δTCLK/tR) and a quantization error (δ1TCLK/tR). Note 
that the maximum value of Δ1RE/TCLK, Δ1FE/TCLK, Δ2RE/TCLK and 
Δ2FE/TCLK is 1/2; therefore the jitter due to the errors of 
TCLK/tR, named Jabs_TCLK/tR, can be written as:
$$ Jabs_{T_{CLK}/t_R} = \frac{1}{2} \cdot \left( \delta_{T_{CLK}/t_R} + \delta1_{T_{CLK}/t_R} \right) \cdot t_R \qquad (3.24) $$
According to (3.24) we have:
$$ Jabs_{T_{CLK}/t_R} = 0.483 \cdot t_R \qquad (3.25) $$
The last source of jitter is due to the quantization of Δ1RE/TCLK,
Δ1FE/TCLK, Δ2RE/TCLK and Δ2FE/TCLK. By looking at Fig. 3.8, this 
source of jitter can be easily written as:
$$ Jabs_{delayTclk} = \frac{1}{2} \cdot 2^{-u} \cdot \frac{T_{CLK}}{t_R} \cdot t_R \qquad (3.26) $$
By considering the minimum clock frequency (300MHz) and the 
minimum delay-line resolution (0.68ps), we can evaluate this source of 
jitter in the worst case conditions. Moreover, in our implementation u 
is equal to 13, therefore:
$$ Jabs_{delayTclk} = 0.299 \cdot t_R \qquad (3.27) $$
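The individual jitter contributions can be recomputed with the Python sketch below, which evaluates (3.23), (3.24) and (3.26) in units of tR using the worst-case values quoted in the text (fCLK = 300MHz, tR = 0.68ps, u = 13) and the error terms of (3.12) and (3.21). Variable names are illustrative.

```python
t_clk = 1 / 300e6        # worst case: minimum clock frequency
t_r = 0.68e-12           # worst case: minimum delay-line resolution
u = 13                   # LSB weight of the delay signals is 2**-u

delta_meas = (t_clk / t_r) ** 2 * 2**-25   # measurement error, eq. (3.12)
delta_quant = 2**-2                        # quantization error, eq. (3.21)

jabs_tr = 0.5                                    # eq. (3.23), in units of t_R
jabs_ratio = 0.5 * (delta_meas + delta_quant)    # eq. (3.24), in units of t_R
jabs_delay = 0.5 * 2**-u * (t_clk / t_r)         # eq. (3.26), in units of t_R
print(round(jabs_tr, 3), round(jabs_ratio, 3), round(jabs_delay, 3))
```

The second and third values reproduce the coefficients 0.483 and 0.299 of (3.25) and (3.27).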
By summing up the three jitter components, the following upper 
bound for the jitter is obtained:
$$ Jabs_{SSCG} \le 1.379 \cdot \frac{t_R}{I} \qquad (3.28) $$
It is worthwhile to highlight that (3.28) represents an upper 
bound for the output jitter, since it assumes the independence of the 
three jitter sources. Therefore the actual jitter can be substantially 
lower. 
 
 
 
 
 
 
 
 
3.3 Post-layout simulation and future works
The layout of the DCO-block is shown in Figure 3.9.
Figure 3.9. DCO-block layout.
Several simulations have been performed in order to carry out a jitter 
analysis of the DCO-block. This analysis is fundamental for different 
reasons. It gives an indication of the achievable jitter before 
completing the RTL design, and it sets a specification to verify the 
SSCG performances. Moreover, the jitter analysis is the way to obtain 
the asymmetry components that configure the asymmetry compensation 
block within the RTL. Exhaustive mixed-signal simulations of the 
DCO-block are employed, and Matlab scripts are developed for the data 
analysis. Fig. 3.10 shows the jitter analysis results in the worst 
case, obtained by varying the control signals of the delay lines of 
the DCO-block for different numbers of generated output clock edges. 
As shown in figure 3.10, the jitter increases with the number of 
generated output clock edges. 
Figure 3.10. Jitter analysis results in the worst case (typical corner).
The layout of the Delay-Line block is shown in figure 3.11.
Figure 3.11. Delay Line block layout.
Starting from the analyses and the simulation results presented in this 
chapter, several points can be considered fundamental to 
complete the design flow of the SSCG:
• Jitter analysis simulations of the entire SSCG.
• Evaluation of the asymmetry components in order to configure 
the asymmetry compensation blocks within the RTL.
• Final RTL including all the features.
• Place and Route of the SSCG.
• Development of a novel test-chip.
• Test-chip measurements.
 
Conclusions
In the first part of this thesis a prototype of an all-digital 
Spread-Spectrum Clock Generator (SSCG) supporting 
discontinuous frequency modulation has been presented in all its 
aspects: design, simulation, fabrication and measurements. The 
developed circuit is based on an all-digital architecture which does 
not require any loop to implement frequency synthesis and spreading. 
In addition, the developed circuit can be designed by using a design 
flow completely based on standard cells, which simplifies the design 
and the porting to new technologies. The circuit is included in a 
420-pin test chip implemented in ST 28nm CMOS flip-chip technology. 
The experimental measurements show the capability of the developed 
SSCG to implement both discontinuous frequency modulations (e.g. 
sawtooth) and complex modulation profiles (e.g. Hershey-Kiss). 
Moreover, these measurement results represent the first experimental 
verification of the advantages of discontinuous frequency 
modulations. Another advantage of the developed IC is the much larger 
maximum modulation frequency with respect to previous 
implementations, which allows very good modulation gains to be 
achieved for RBW=1MHz.
The final specifications of the first developed SSCG can be 
summarized as follows:
                                             min       max
• circuit clock frequency (fCLK)             300MHz    1500MHz
• circuit output frequency (fOUT)            -         fCLK
• modulation frequency (fm)                  10kHz     20MHz
• coarse/fine delay-line resolution (tR)     0.68ps    1.6ps
• modulation depth (Δf/fOUT)                 0.5%      10%
In the second part of this thesis the first developed SSCG has been 
redesigned in order to allow the generation of an output clock signal 
with a frequency higher than the frequency of the input clock signal.
To this purpose a new delay line block has been designed in order to 
implement the clock frequency multiplication. Furthermore, the 
injection locking technique is implemented by using a novel 
DCO-based architecture in order to improve the jitter performance of 
the circuit. The circuit is realized by using only standard cells and 
is able to generate an output clock frequency higher than 2GHz with a 
maximum multiplication factor equal to 8. However, the design flow 
of this circuit is not yet complete. In fact, a set of simulations to 
evaluate the jitter performance of the entire SSCG must still be 
performed. This jitter analysis will be used to evaluate the asymmetry 
components of the circuit in order to configure the compensation 
blocks within the RTL. Afterwards, a final RTL including all the 
features will be realized. Finally, the place and route of the SSCG 
can be carried out.
Acknowledgments
At the end of my thesis I would like to thank all those people who 
made this thesis possible and an unforgettable experience for me.
First, I would like to express my gratitude to my supervisor Prof. 
Davide De Caro, who offered his continuous advice and 
encouragement throughout my thesis.
I would like to thank all the members of the Electronic Department for 
their support, cooperation and friendship. In particular, I would like to 
thank my friends Pierluigi, Grazia, Michele, Mariangela, and 
Alessandro for the nice time spent together. 
Finally, I would like to thank my parents, my sister and my brother for 
their love, patience and continuous support.
 
