A 1.8-pJ/b, 12.5-25-Gb/s wide range all-digital clock and data recovery circuit by Verbeke, Marijn et al.
IEEE JOURNAL OF SOLID-STATE CIRCUITS 1
A 1.8 pJ/b, 12.5–25 Gb/s Wide Range All-Digital
Clock and Data Recovery Circuit
Marijn Verbeke, Pieter Rombouts, Hannes Ramon, Bart Moeneclaey, Xin Yin, Johan Bauwelinck, and Guy Torfs
Abstract—Recently, there has been a strong drive to replace
established analog circuits for multi-gigabit Clock and Data
Recovery (CDR) by more digital solutions. We focused on PLL-
based All-Digital CDR (AD-CDR) techniques which contain a
Digital Loop Filter (DLF) and a Digital Controlled Oscillator
(DCO) and pushed the digital integration up to a level where
our DLF is entirely synthesized. To enable this, we found that
extensive subsampling can be used to decrease the speed of the
DLF while maintaining a good operation. Additionally, an Inverse
Alexander phase detector and a 5.5-bit resolution DCO complete
the AD-CDR architecture. As a result of the low-complexity and
digital architecture, the AD-CDR occupies a compact active chip
area of 0.050 mm² and consumes only 46 mW at 25 Gb/s. This is
the smallest area and lowest power consumption compared to the
state-of-the-art. In addition, our implementation is highly tunable
due to the synthesized logic and supports a wide operating range
(12.5 Gb/s to 25 Gb/s), which is a significantly larger range compared
to previous work. Finally, thanks to our digital architecture the
power dissipation decreases linearly while moving to the lower
speeds of our operating range. This is in contrast with most prior
work, making our design truly adaptive.
Index Terms—All-Digital Clock and Data Recovery (AD-CDR),
Inverse Alexander Phase Detector, Digital Loop Filter (DLF),
Digital Controlled Oscillator (DCO), subsampling, synthesis.
I. INTRODUCTION
In multi-gigabit data communication links, the data is serially transmitted to the receiver without any accompanying
clock. This clock has to be recovered at the receiver side
in order to sample and process the received data. Therefore,
a Clock and Data Recovery (CDR) circuit is an essential
component in such a high speed receiver, and the design and
the performance of the CDR has a significant influence on the
overall operation of the link [1].
M. Verbeke, H. Ramon, B. Moeneclaey, X. Yin, J. Bauwelinck and
G. Torfs are with the Department of Information Technology (IN-
TEC), IDlab, Ghent University - imec, 9052 Gent, Belgium (e-mail:
{marijn.verbeke, hannes.ramon, bart.moeneclaey, xin.yin, johan.bauwelinck,
guy.torfs}@ugent.be).
P. Rombouts is with the Department of Electronics and Informa-
tion Systems (ELIS), Ghent University, 9052 Gent, Belgium (e-mail:
pieter.rombouts@ugent.be).
This work was supported by the Agency for Innovation by Science and
Technology in Flanders (IWT) and the Hercules project VeRONICa for the
chip fabrication.
This paper is a postprint of a paper submitted to and accepted for publication
in IEEE Journal of Solid-State Circuits. The copy of record is available at
IEEE Xplore: M. Verbeke et al., “A 1.8-pJ/b, 12.5-25-Gb/s Wide Range All-
Digital Clock and Data Recovery Circuit,” in IEEE Journal of Solid-State
Circuits. doi: 10.1109/JSSC.2017.2755690
© 2017 IEEE. Personal use of this material is permitted. Permission from
IEEE must be obtained for all other uses, in any current or future media,
including reprinting/republishing this material for advertising or promotional
purposes, creating new collective works, for resale or redistribution to servers
or lists, or reuse of any copyrighted component of this work in other works.
The need for low cost and high integration mandates that the
CDR should be implemented in a deep-submicron technology.
However, it is hard to achieve high performance for classical
analog CDRs in today’s modern technologies [2]. Therefore,
digital CDRs have become increasingly important for high-
speed data communication. A digital CDR eliminates the need
for a large loop filter capacitor used in classical analog CDRs.
Instead, a digital CDR uses a compact digital loop filter which
can realize large time-constants without any additional cost in
area. Additionally, a digital loop filter is tolerant to process,
voltage and temperature variations and is noise insensitive.
The filter is also easily scalable, portable across CMOS
technologies and highly adaptable. Therefore, a digital CDR
is the optimal choice for a high speed receiver implemented
in a deep-submicron technology and has been a major area of
research interest in recent years [2]–[12].
We focus on a subset of these digital CDRs, i.e. so-called
All-Digital Clock and Data Recovery (AD-CDR) circuits. AD-
CDRs are derived from the first All-Digital Phase Locked
Loop (AD-PLL) introduced in [13] and comprise a phase
detector and a digital controlled oscillator in addition to a
digital loop filter [2]–[4], [14]–[18]. PLL-based CDR circuits
have the advantage over alternative digital friendly CDRs
that they have intrinsically a wide frequency capture range
due to the ability to adapt both phase and frequency [19].
Additionally, they benefit from a wide bandwidth and have
the ability to reject input jitter [20].
The only problem is that the digital loop filter, which
consists of a proportional and an integral path, typically cannot
operate at the tens of Gb/s data rate. In prior work, the speed of
the integral path of the digital loop filter is reduced by using
demultiplexing [2], [18] or subsampling [3], [21]. However,
the proportional path still runs at a high speed and due to
this, these blocks had to be designed and laid out by hand,
largely counteracting the advantages of a digital design which
ultimately should allow automatic synthesis.
There is only one very recent related work [4] where the
digital block is entirely synthesized. To accommodate this
synthesis, the input of the digital loop is heavily demultiplexed
into many parallel lanes but this has disadvantages: a large
number of parallel samplers is needed to process the high-
speed data input and this in turn requires a considerable clock
distribution network. Moreover, the huge number of samples
has to be processed by a complex signal processing block.
This increases the power consumption and chip area: e.g. the
work in [4], which includes a CTLE and a DFE, has an area
which is 10 times larger than our work. Additionally, the power
consumption per bit is more than 75 % higher than our work.
In this work, we use extensive subsampling [22] instead
of demultiplexing to reduce the operating speed of the entire
digital loop filter. This enables us to push the digital integration
up to a level where our digital loop filter is entirely synthesized
without requiring complex signal processing. To demonstrate
the correct operation, we implemented a 25 Gb/s PLL-based
all-digital clock and data recovery circuit. This AD-CDR
features an Inverse Alexander phase detector which is low-
power, simple, fast and accurate. In particular, this phase
detector shows improved performance in simulations over the
conventional Alexander phase detector when subsampling is
used [23]. In this work, we complement the earlier theoretical
work by presenting the first experimental verification of this
Inverse Alexander phase detector. The last building block
of the AD-CDR is a low-resolution digital controlled ring
oscillator. We demonstrate that a resolution as low as 5.5 bit
can be used without degrading the performance of the AD-
CDR.
Thanks to the highly digital architecture, the active die
area is very compact and only occupies 0.050 mm² which is
significantly smaller than competing work [2]–[11]. Moreover,
the power efficiency of the CDR core is 1.8 pJ/b which is also
better than the state-of-the-art [2]–[11]. Additionally, the AD-
CDR is highly adaptable: i.e. the characteristics of the loop
filter can be tuned to satisfy multiple jitter tolerance speci-
fications. Moreover, the operating range can be varied from
12.5 Gb/s to 25 Gb/s, which is the broadest operation range
of any digital CDR that does not use a high-quality, multi-
gigahertz reference clock. Due to the truly digital frequency
adaptable nature, the power consumption decreases linearly
with the operating data rate. This means that when the data rate
is reduced, also the power consumption goes down accordingly
and hence an excellent power efficiency is maintained over the
entire operating range: e.g. at 25 Gb/s the power consumption
is 46 mW while at 12.5 Gb/s this is 23 mW.
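The quoted efficiency follows directly from the power and data-rate figures above. A minimal sketch of that arithmetic is given here; the function name is chosen for illustration only and is not part of the original work.

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ/b. Dividing mW by Gb/s gives 1e-12 J/b, i.e. pJ/b."""
    return power_mw / data_rate_gbps

# Power figures quoted in the text for both ends of the operating range.
for rate_gbps, power_mw in [(25.0, 46.0), (12.5, 23.0)]:
    print(f"{rate_gbps} Gb/s: {energy_per_bit_pj(power_mw, rate_gbps):.2f} pJ/b")
```

Both operating points give the same energy per bit, which is what a power consumption that scales linearly with data rate implies.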
The remainder of the paper is organized as follows. Section
II presents the used AD-CDR architecture. In section III
the detailed circuit implementation in a 40 nm Low Power
CMOS process is discussed. The experimental results of our
12.5 Gb/s to 25 Gb/s AD-CDR circuit are summarized in
Section IV, and Section V concludes the paper.
II. ALL-DIGITAL CLOCK AND DATA RECOVERY
ARCHITECTURE
The overall architecture of our AD-CDR is
shown in Fig. 1. It consists of a Bang-Bang Phase Detector
(BB-PD), a subsampler, a Digital Loop Filter (DLF) and a
Digital Controlled Oscillator (DCO). The BB-PD determines
the phase difference between edges in the input data stream
(Din) and the recovered clock (Clk) signal. When the clock is
leading the input data, an Early signal is generated to decrease
the frequency of the recovered clock. Alternatively, when the
clock is lagging, the BB-PD outputs a Late signal to increase
the frequency of the recovered clock. These Early and Late
signals are subsampled by a factor of N and then filtered by
the Digital Loop Filter (DLF). The resulting signal controls
the DCO such that the phase error is reduced. Note that if
no data transition occurs, the BB-PD cannot determine if the
clock leads or lags the data and therefore does not generate
any signal. Consequently, the DCO is not adjusted.
Fig. 1. AD-CDR architecture
Fig. 2. Basic block diagram of the conventional Alexander PD and Inverse
Alexander PD [23].
A. Bang-Bang Phase Detector
Alexander bang-bang phase detectors are typically used in
high-speed CDR circuits because they provide simplicity in
design, good phase adjustment and can work at high speeds
[24]. Additionally, these BB-PDs have the advantage that the
output is already digital, making this type of Phase Detector
(PD) very suitable to drive the Digital Loop Filter.
Recently, the Inverse Alexander phase detector was
proposed as an improvement over this established and
well-known circuit [23]. An elaborate comparison between
the conventional and Inverse Alexander phase detector is
given below.
1) Comparison of Alexander and Inverse Alexander PD:
The conventional Alexander phase detection is based on three
successive data samples which are sampled at twice the data
clock frequency. In the basic block diagram illustrated by
Fig. 2, this is done by sampling the data both on the rising and
the falling edges of the recovered clock Clk. By monitoring
the differences between the three sampled values, it can be
detected whether a data edge has occurred and if this data
edge occurs before or after the corresponding clock edge. For
the actual phase detection, the 3 successive samples, available
at nodes S0, S1 and S2 are used. To understand the operation,
3 possible waveforms are considered in Fig. 3. First, the ideal
locking condition is shown in Fig. 3(a). In this case, the value
of sample S1 is undefined and in practice due to noise the
PD will randomly produce an Early or a Late pulse. Fig. 3(b)
shows the case where the clock edge leads the data edge
(Early) and Fig. 3(c) shows the case where the clock edge
lags the data edge (Late). In the absence of data transitions
(not shown in the figure), all three samples S0, S1 and S2 are
equal and the xor gates (Fig. 2) will set both the Early and the
Late signals to zero. These relations are summarized as [25]:
Early :S0 ⊕ S1 = 0, S1 ⊕ S2 = 1 → Clk frequency ↓
Late :S0 ⊕ S1 = 1, S1 ⊕ S2 = 0 → Clk frequency ↑
Others :S0 ⊕ S1 = S1 ⊕ S2 → Do not adjust clk
Fig. 3(a) shows that once the CDR has settled, the samples
S0 and S2 correspond to two successive data output (Dout)
samples, while sample S1 occurs at the transition of the data.
Fig. 3. Waveforms for the locking behavior of the Alexander PD: (a) Ideal
locking condition with phase difference ∆φ = 0.5 UI; (b) Early condition;
(c) Late condition.
The proposed Inverse Alexander PD is also shown in Fig. 2
and obviously has the same schematic as the Alexander PD,
but the Early and the Late signal are interchanged, which leads
to an inversion of the sign in the CDR loop:
Early :S0 ⊕ S1 = 1, S1 ⊕ S2 = 0 → Clk frequency ↓
Late :S0 ⊕ S1 = 0, S1 ⊕ S2 = 1 → Clk frequency ↑
Others :S0 ⊕ S1 = S1 ⊕ S2 → Do not adjust clk
The inversion of the sign in the CDR loop causes the CDR to
settle to a different equilibrium point. As shown in Fig. 4(a),
the Inverse Alexander PD will align the rising edges of the
clock signal with the data edges. If the rising edge of the clock
leads (is Early), the first sample, S0, is unequal to the last two
and the clock frequency must decrease (Fig. 4(b)). Vice versa,
if the rising edge of the clock lags (is Late), the last sample, S2
differs from the first two and the clock frequency must increase
(Fig. 4(c)). In lock, the middle sample, S1, corresponds with
the data sample Dout while the other sample moments S0 and
S2 occur at the data transitions.
Fig. 4. Waveforms for locking behaviour of the Inverse Alexander PD: (a)
Ideal locking condition with phase difference ∆φ = 0 UI, (b) Early condition,
(c) Late condition
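The two decision tables for the conventional and the Inverse Alexander PD can be sketched behaviorally as follows. This is an illustrative model only: the function name and the `inverse` flag are assumptions made here, and the simultaneous Early/Late state that arises with duty-cycle distortion is not modeled.

```python
def alexander_pd(s0: int, s1: int, s2: int, inverse: bool = False):
    """Return (early, late) for three consecutive 1-bit samples S0, S1, S2."""
    a = s0 ^ s1  # S0 xor S1
    b = s1 ^ s2  # S1 xor S2
    if a == b:
        # No single edge between S0 and S2: do not adjust the clock.
        return (0, 0)
    if inverse:
        early = int(a == 1 and b == 0)  # Inverse Alexander: S0 differs from S1, S2
    else:
        early = int(a == 0 and b == 1)  # Conventional: S2 differs from S0, S1
    return (early, 1 - early)

# The same sample triplet yields opposite decisions in the two variants.
print(alexander_pd(0, 0, 1))                # conventional
print(alexander_pd(0, 0, 1, inverse=True))  # inverse
```

Swapping the two outputs is the only difference between the variants, which is why the schematic is unchanged while the equilibrium point moves by half a UI.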
2) PD characteristics comparison – Full-rate operation:
The output characteristics of both the conventional and the
Inverse Alexander phase detector are shown in Figs. 5(a)-(b).
Here it is assumed that all waveforms are ideal (as in Figs. 3
and 4). If an edge occurs, either a 1-bit Early or Late pulse
will be generated, which for both phase detectors results in
the well known bang-bang action. For both PDs, there is only
one stable locking point, which corresponds to a phase shift
of half a UI (unit interval) for the conventional and to zero
phase shift for the Inverse Alexander PD (also indicated on
the figure).
However, in practice the waveforms are not ideal and several
imperfections occur such as phase noise on the recovered clock
and non-ideal input data waveforms that exhibit pulse width
jitter and unequal rise and fall times, which translates to duty-
cycle distortion. All these effects affect the behavior of both
PDs. At full rate, the differences between the conventional
and Inverse Alexander PDs are negligible, but when the PD is
subsampled, the difference becomes pronounced. To illustrate
the phenomenon, we will discuss the case of duty-cycle
distortion (DCD).
DCD means that the duration of a logic-0 differs from the
duration of a logic-1 [26]. The notations T0 and T1 are used
to represent respectively the duration of an occurrence of a
single logic-0 and a single logic-1 affected by DCD; where
the sum of T0 and T1 always equals 2 UI. Note that when
T1 equals 1 UI, there is no DCD and when T1 < 0.5 UI,
the DCD is too large to have any useful operation of the
CDR. The reciprocal case, when T1 > T0, is analogous. To
examine the influence of DCD, the output characteristics of
both PDs are determined and shown in Figs. 5(c)-(d) for the
artificial case of a data stream with a single logic-1 data pulse.
This means that there are two consecutive data transitions.
When examining this case, it turns out that apart from the
normal Early and Late cases, two anomalous states occur. The
first anomalous case, shown in Fig. 6(a), occurs around the
locking point of the conventional PD. Here an Early pulse is
immediately followed by a Late pulse. If the phase detector is
operated at full speed, this will be filtered by the lowpass loop
filter and essentially translate in a net null action. The second
anomalous case is shown in Fig. 6(b) and is most relevant for
the Inverse Alexander variant. Both the Early and Late signal
are simultaneously active. This is normally an illegal state, but
in practice most loop filters (e.g. the popular charge pump [25]
and also the digital loop filter used in our prototype) deal with
this situation by interpreting this as a net null action.
Both these anomalous cases occur for phase errors near
the equilibrium locking point and broaden the locking point
into a locking region which is illustrated in Figs. 5(c)-(d).
For the conventional Alexander the locking range corresponds
to the Early immediately followed by Late case, whereas
for the Inverse Alexander the locking range corresponds to
the simultaneous both Early and Late case. Despite this
difference, both cases are almost equivalent when the PDs are
operated at full rate.
3) PD characteristics comparison – Subsampled operation:
When the PD is subsampled only one out of N of the PD
output values is used. When we study the PD characteristics
for the case of ideal waveforms we still obtain the same
result as Figs. 5(a)-(b). However, in the case of DCD, the
simultaneous both Early and Late case remains unchanged
but the Early immediately followed by Late case is altered:
since one of the 2 successive samples will be lost and since
the data are not correlated to the subsampling process, either
Early or Late will be randomly selected as shown in Fig. 7.
This means that a significant amount of excess random jitter
is injected in the loop which will increase the probability
Fig. 5. Simplified (single pulse) PD output characteristics at full rate operation: (a) the Alexander PD for the case of ideal waveforms, (b) the Inverse
Alexander for the case of ideal waveforms, (c) the Alexander PD for the case of duty cycle distortion and (d) the Inverse Alexander PD for the case of duty
cycle distortion.
Fig. 6. PD waveforms for data with DCD corresponding to the anomalous
cases (a) Alexander Early immediately followed by Late (most relevant for
conventional Alexander), and (b) Simultaneous Early and Late (most relevant
for Inverse Alexander)
Fig. 7. Simplified (single pulse) PD output characteristics at subsampled rate
operation: (a) the Alexander PD for the case of duty cycle distortion and (b)
the Inverse Alexander PD for the case of duty cycle distortion.
of bit errors. This problem occurs in the locking region
of the conventional Alexander PD and not for the Inverse
Alexander PD. For this reason the Inverse Alexander PD is
expected to have a greatly improved performance when the
PD is subsampled [23]. Therefore the Inverse Alexander PD
topology, proposed in [23], is chosen to implement the BB-PD.
B. Digital Controlled Oscillator
For the implementation of the DCO in our AD-CDR, a
quarter-rate architecture [27] is used. This means that the DCO
operates at one fourth of the data speed, and provides the
required sample-time resolution in the form of 8 uniformly-
phase-shifted clock phases. This can conveniently be realized
by a 4-stage differential ring oscillator (see Section III-C) and
significantly relaxes the requirements on the clock buffers and
BB-PD circuitry. For a 25 Gb/s data input, this means that the
DCO frequency will be 6.25 GHz. To illustrate the quarter-rate
operation, the waveforms of a ‘1010’ data sequence and the
8 clock phases are shown in Fig. 8 for the case of an Early
clock.
Fig. 8. Waveforms of a ‘1010’ data sequence and the 8 clock phases when the
AD-CDR is Early. The red clock phases correspond to edge-related samples
and the black to data-related samples (as in Fig. 4).
In the ideal locking condition, the even clock phases are
perfectly aligned with the data edges, while the odd clock
phases are in the middle of the data symbol, which is the
ideal sample moment. Per clock period, there are 4 sets of
three consecutive samples and each set of three consecutive
samples can be used by the Inverse Alexander PD operation
to generate an Early/Late signal. In our design, only 1
out of these 4 Early/Late signals is used: i.e. we only
use clock phases Clk0, Clk1 and Clk2 to gather the phase
information (Early/Late). Of course, still all the data need to
be recovered, which can be done by using the odd clock phases
to sample the input data. The net result is that clock phases
Clk4 and Clk6 are not used and that the phase information is
already subsampled by a factor of 4 in the phase detector.
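The quarter-rate clocking described above can be checked with a few lines of arithmetic. The sketch below is illustrative: the function name is an assumption, and the phase offsets are expressed relative to a data edge aligned with Clk0 (the ideal locking condition).

```python
def quarter_rate_clocking(data_rate_hz: float, n_phases: int = 8):
    """Return the DCO frequency and the offset of each clock phase in UI."""
    f_dco_hz = data_rate_hz / 4  # quarter-rate: one DCO period spans 4 UI
    offsets_ui = [k * 4.0 / n_phases for k in range(n_phases)]
    return f_dco_hz, offsets_ui

f_dco_hz, offsets_ui = quarter_rate_clocking(25e9)
print(f"DCO frequency: {f_dco_hz / 1e9} GHz")
print("Phase offsets (UI):", offsets_ui)

# Phase usage as described in the text.
pd_phases = {0, 1, 2}       # samples used for the Early/Late decision
data_phases = {1, 3, 5, 7}  # odd phases sample the middle of each data symbol
used = pd_phases | data_phases
unused = set(range(8)) - used
print("Used phases:", sorted(used), "Unused phases:", sorted(unused))
```

Even phases land on integer UI offsets (data edges) and odd phases on half-UI offsets (symbol centers), so six of the eight phases need a sampler.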
C. Digital Loop Filter
A typical DLF consists of a proportional and integral path
and can be described by the discrete-time transfer function
HDLF (z) given by:
\[ H_{DLF}(z) = K_p \, z^{-D_{Kp}} + K_i \, \frac{z^{-D_{Ki}}}{1 - z^{-1}} \]
where Kp and Ki are the respective gains of the proportional
path and integral path, and DKp and DKi are the correspond-
ing delays. In our implemented DLF, we can adapt both the
proportional and integral gain setting, while the delays are hard
wired. The delay in the proportional path and in the integral
path are respectively DKp = 2 and DKi = 9 digital clock
cycles. Especially the delay in the proportional path should be
limited in order to avoid stability issues, but with the expected
jitter in the CDR loop, this delay (DKp = 2) is low enough
to ensure its stability [28].
Note that this digital loop filter is connected directly follow-
ing the subsampling block (Fig. 1) to allow automatic synthesis
of the entire digital loop filter. Consequently, the proportional
and integral path are equally affected by the subsampling.
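A behavioral sketch of such a loop filter is given below. It assumes the usual proportional-plus-integral form in which each path is delayed by its own number of digital clock cycles; the function name and the gain values are chosen for illustration and do not reflect the synthesized implementation.

```python
def dlf_response(x, kp, ki, d_kp=2, d_ki=9):
    """Filter the sequence x with a delayed proportional path and a delayed
    accumulator (integral) path, and return the output sequence."""
    y = []
    acc = 0
    acc_history = []
    for n, xn in enumerate(x):
        acc += xn                 # running sum of the input
        acc_history.append(acc)
        prop = kp * x[n - d_kp] if n >= d_kp else 0
        integ = ki * acc_history[n - d_ki] if n >= d_ki else 0
        y.append(prop + integ)
    return y

# Impulse response: the proportional term appears after D_Kp = 2 cycles,
# the integral term after D_Ki = 9 cycles and then persists.
impulse = [1] + [0] * 11
print(dlf_response(impulse, kp=4, ki=0.25))
```

The impulse response makes the role of the two delays visible: the proportional path reacts quickly and once, while the integral path reacts later but holds its value.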
D. Subsampling
In the 40 nm Low Power CMOS process used in this work,
the maximal clock speed should not exceed 1.75 GHz to
enable an automated design (synthesis, place and route) of this
DLF. This means that, even with the subsampling by a factor
of 4 that already occurs in our phase detector implementation,
the operating frequency at the output of the BB-PD is still
too high: e.g. if the CDR operates at 25 Gb/s the output of the
BB-PD operates at 6.25 GHz. Hence, this operating frequency
has to be further reduced to facilitate the implementation of
the DLF. Therefore, the output of the BB-PD is additionally
subsampled by a factor of 4. Overall, this means that the DLF
will only receive an output signal of the PD once out of every
N (=16) data periods. In Fig. 1, the subsampling corresponds
to the block ‘↓ N ’.
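The choice of the additional subsampling factor can be verified numerically. The sketch below uses the 1.75 GHz synthesis limit and the two factors of 4 stated in the text; the function and constant names are assumptions made for illustration.

```python
MAX_SYNTH_CLOCK_HZ = 1.75e9  # limit for automated design quoted in the text

def dlf_clock_hz(data_rate_hz: float, pd_factor: int = 4, extra_factor: int = 4) -> float:
    """Clock rate seen by the DLF after both subsampling steps."""
    return data_rate_hz / (pd_factor * extra_factor)

for extra in (1, 2, 4):
    f = dlf_clock_hz(25e9, extra_factor=extra)
    ok = f <= MAX_SYNTH_CLOCK_HZ
    print(f"extra factor {extra}: {f / 1e9:.4f} GHz, synthesizable: {ok}")
```

At 25 Gb/s an extra factor of 2 still leaves the DLF above the limit, so 4 is the smallest power of two that meets it.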
Although a higher level of subsampling would further
reduce the area and the power of the DLF, a higher subsample
factor will not lead to an overall optimal power efficiency. This
is because the CDR should be able to deal with data sequences
where the BB-PD does not receive data edges (and hence
generates neither Early nor Late signals) for many clock cycles.
In the case without subsampling, this occurs if the input
data contains a long sequence of Consecutive Identical Digits
(CIDs). If this happens the output of the phase detector is stuck
at zero and the feedback is broken such that the CDR operates
temporarily in open loop. This means that the oscillator runs
freely and any frequency difference between the input data rate
and the recovered clock frequency will cause a linear increase
or decrease of the phase difference over time. In a prolonged
open-loop situation, this phase drift will exceed a unit interval
causing the AD-CDR to lose its lock, which means that the
CDR operation is disrupted. For an input sequence of ‘k’ CIDs,
the total idle time Tidle is given by:
\[ T_{idle} = \frac{k}{f_{data}} \]
where fdata corresponds to the input data rate.
To tolerate a long idle sequence, the DCO must have a
sufficiently high resolution such that quantization error is
small. This way, the DCO frequency will be closer to the
desired input data frequency. And hence, when the loop
temporarily opens due to an idle sequence, the corresponding
phase drift will remain acceptable.
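The open-loop phase drift can be estimated with a short calculation. The sketch below assumes that k CIDs last k bit periods and that the frequency error stays constant while the loop is open; the 1000 ppm offset and the 0.5 UI budget are hypothetical values chosen for illustration, not measured figures.

```python
def phase_drift_ui(k_cid: int, rel_freq_offset: float) -> float:
    """Phase drift in UI accumulated over k CIDs.

    Drift = frequency error * idle time = (delta * f_data) * (k / f_data),
    which reduces to k * delta, where delta is the relative frequency offset.
    """
    return k_cid * abs(rel_freq_offset)

def max_cid_length(rel_freq_offset: float, drift_budget_ui: float = 0.5) -> int:
    """Largest k for which the drift stays within the given budget."""
    return int(drift_budget_ui / abs(rel_freq_offset))

print(phase_drift_ui(100, 1e-3))  # drift after 100 CIDs at 1000 ppm offset
print(max_cid_length(1e-3))       # tolerable CID length for a 0.5 UI budget
```

A finer DCO resolution lowers the residual frequency offset and therefore extends the tolerable idle sequence proportionally.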
Another effect that lowers the maximum tolerable idle
sequence is given by the random walk process of the phase of
the recovered clock during open-loop operation [29]. Lowering
the phase noise of the DCO will reduce this random walk
process.
In the case of subsampling, the loop filter operates at lower
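frequency and the total idle time Tidle corresponding to a CID
sequence of length k is:
\[ T_{idle} \approx \frac{k}{N} \cdot \frac{N}{f_{data}} = \frac{k}{f_{data}} \]
This means that the idle time due to k CID input bits for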
the case with subsampling is almost equal to the case without
subsampling.
However, regardless of the CIDs in the full rate input data, it
can happen that after subsampling the phase detection output
consists of a long idle sequence of length l (without any Early
or Late pulse). E.g. for the popular PRBS31 test sequence, it
can be shown that for a subsample factor that is a power of 2,
there will always be an idle sequence in the subsampled PD
output of length l = 31.
Now, the corresponding idle time is proportional to the subsample factor N:
\[ T_{idle} = \frac{l \cdot N}{f_{data}} \]
This means that the tolerance to a long idle sequence will
become worse for increasing value of the subsampling factor
N . To maintain adequate robustness to long idle sequences
for an increasing value of the subsampling factor N , the DCO
phase noise and resolution should be improved accordingly.
This indicates that there is a trade-off for the subsample factor
N in the sense that increasing N will decrease the power
consumption of the DLF but increase the required power
consumption in the DCO. From behavioral simulations, we
found that choosing N = 16 is an adequate compromise.
According to simulation, with this setting, our circuit should be
able to tolerate input data streams which after PD subsampling
have an idle (subsampled) sequence length l of over 100.
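The dependence of the idle time on the subsample factor can be made concrete with a short calculation. The sketch below assumes that every idle sample at the subsampled PD output spans N bit periods; the function name is chosen for illustration.

```python
def subsampled_idle(l_idle: int, n_sub: int, data_rate_hz: float):
    """Return (equivalent full-rate bits, idle time in seconds) for an idle
    sequence of l_idle samples at the subsampled PD output."""
    bits = l_idle * n_sub
    return bits, bits / data_rate_hz

# PRBS31 case mentioned in the text (l = 31) for a few subsample factors.
for n_sub in (4, 16, 64):
    bits, t_idle = subsampled_idle(31, n_sub, 25e9)
    print(f"N = {n_sub:2d}: {bits:5d} bit periods, {t_idle * 1e9:.2f} ns open loop")

# Simulated tolerance quoted in the text (l = 100) at N = 16.
print(subsampled_idle(100, 16, 25e9))
```

Each doubling of N doubles the time during which the DCO runs freely, which is the reason its phase noise and resolution have to improve along with N.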
III. CIRCUIT IMPLEMENTATION
The top-level implementation of our AD-CDR is shown in
Fig. 9. In our physical partitioning, we tried to maximally
exploit automated digital tools. Therefore, we pushed part of
the BB-PD after the subsampling such that it could also be
automatically synthesized. The result is that the BB-PD and
the subsampling block are intertwined. The implementation
consists of 6 high-speed samplers followed by a retiming
block, a subsampling block and the (automatically synthe-
sized) ‘Phase Detection Logic’. Additionally, the AD-CDR
comprises an automatically synthesized digital loop filter, a
clock divider and a DCO.
The 6 high-speed samplers are driven each by their own
6.25 GHz clock phase coming from the DCO. 4 samplers out
of the 6 are used to sample the data, while the other 2 samplers
are used to sample the edges. As mentioned above, 2 out of
the 8 uniformly-phase-shifted DCO clock phases are not used.
In the retiming block, all the collected samples (i.e. 4 data
samples and 2 edge samples) are aligned to 1 clock phase.
The retimed samples of the data constitute the recovered data
(the actual CDR output), while the phase information, which
Fig. 9. Block diagram of AD-CDR implementation (speeds are indicated for 25 Gb/s operation). Red is used for edge-related samples and black for
data-related samples (as in Fig. 4).
Fig. 10. Detail of the full custom part of the BB-PD & Subsampling, which contains 6 samplers, a retiming block and a subsampling block
(speeds are indicated for 25 Gb/s operation)
adjusts the CDR to reduce the phase error, is subsampled to
1.56 Gb/s. This phase information is sent to the synthesized
digital block (running at 1.56 GHz) where first, the Phase
Detection Logic calculates the Early and Late signals. These
are then further processed by the DLF which controls the
quarter-rate DCO.
A. BB-PD and Subsampling
The implementation of the BB-PD & Subsampling com-
prises two parts: a full custom designed block and the auto-
matically synthesized Phase Detection logic. A more detailed
view of the full custom block consisting of the high-speed
samplers, the retiming block and the subsampling block is
given in Fig. 10.
1) Sampler: First, the incoming data is sampled with a
high-speed sampler which is implemented as a Sense Am-
plifier based Flip-Flop [30]–[35]. The Sense Amplifier based
Flip-Flop has a fast sense amplifier input with a short capture
window followed by a slower regenerative latch (Fig. 11). This
makes it an ideal choice for a subsampling stage, which needs
to capture the high-speed input data very quickly, but has
relaxed requirements on the clock-to-output delay. The device
sizes of the Sense Amplifier based Flip-Flop shown in Fig. 11
are summarized by Table I.
TABLE I
DEVICE SIZES OF THE SENSE AMPLIFIER FLIP-FLOP SHOWN IN FIG. 11
Transistor        L        W
M0                40 nm    8.4 µm
M1 - M4           40 nm    2.4 µm
M5 - M6           40 nm    4.8 µm
M7 - M10          40 nm    3.6 µm
M11 - M12         40 nm    1.2 µm
M13               40 nm    0.6 µm
M14               40 nm    1.2 µm
M15 - M16         40 nm    1.8 µm
Inverter: PMOS    40 nm    2.4 µm
Inverter: NMOS    40 nm    1.2 µm
Fig. 11. Sampler circuit: Sense Amplifier based Flip-Flop with a fast sense
amplifier input and a slower regenerative latch
2) Retiming: The 6 sampled data signals (4 corresponding
to actual data samples and 2 corresponding to edge samples)
are sent to the retiming block, which aligns the samples to one
clock phase. For this, two types of dynamic flip-flops clocked
with the opposite clock edge are used. The sampled input data
from clock phases zero to three, is retimed by an array of
positive edge triggered dynamic flip-flops of type I (Fig. 12).
This is a standard dynamic flip-flop, shown in Fig. 13(a).
3 of these retimed samples that contain the information of
two edges (Edge0, Edge1) and one intermediate data symbol
(Dout0), are used for the phase alignment but first have to be
subsampled (see below). To relax the timing requirements of
the flip-flops, the sampled input data from clock phases five
and seven is retimed by an array of type II (negative edge
triggered) dynamic flip-flops (Fig. 12). This type is clocked
with the opposite clock edge compared to type I, but an
additional half clock cycle delay is incorporated (Fig. 13(b))
such that all samples are retimed to the same clock edge.
The device sizes of the dynamic flip-flops shown in Fig. 13
are summarized by Table II.
Fig. 12. Retiming circuit consisting of an array of retiming type I (positive edge
triggered) flip-flops and an array of type II (negative edge triggered) flip-flops.




Fig. 13. Flip flops used in Retiming circuit: (a) Type I (positive edge triggered)
dynamic flip-flop and (b) Type II (negative edge triggered) dynamic flip-flop
Fig. 14. Subsampling circuit
3) Subsampling: Before the phase alignment information
can be sent to the digital block, this information has to be
subsampled by a factor of 4 (Fig. 10). The subsampling is
performed in two steps (Fig. 14), where for each step the
clock frequency is first divided by two and secondly applied
as clock signal to an array of three type I dynamic flip-flops.
Because the input data of the flip-flops is twice the speed of
the corresponding clock input, the data is subsampled by a
factor of 2. Overall, the input data is thus subsampled by a
factor of 4 and the clock signal is divided by 4. This divided
clock is used as clock signal for the digital block.
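The two-step subsampling can be modeled as two cascaded decimate-by-2 stages. The sketch below is behavioral only: which of the two input samples each stage retains depends on the divided clock phase in the real circuit and is an assumption here.

```python
def subsample_by_2(samples):
    """Model one stage: a flip-flop clocked at half the input sample rate
    keeps every second sample."""
    return samples[::2]

def two_stage_subsample(samples):
    """Cascade two divide-by-2 stages for an overall factor of 4."""
    return subsample_by_2(subsample_by_2(samples))

# Index each retimed sample by its DCO clock cycle to see which ones survive.
retimed = list(range(16))
print(two_stage_subsample(retimed))

# The clock is divided by the same factor: 6.25 GHz becomes 1.5625 GHz.
print(6.25e9 / 4)
```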
4) Digital phase detection logic: Next to the full custom
blocks, the BB-PD & Subsampler comprises the synthesized
Digital Phase Detection logic. This part is automatically
generated from a Verilog description, which corresponds to
the schematic shown in Fig. 15. It compares the consecutive
samples and determines whether the clock leads or lags the
data, according to the Inverse Alexander operation [23].
TABLE II
DEVICE SIZES OF THE DYNAMIC FLIP-FLOPS SHOWN IN FIG. 13
Transistor      L        W
M1, M5, M9      40 nm    0.6 µm
M2, M6, M10     40 nm    0.6 µm
M3, M7, M11     40 nm    1.2 µm
M4, M8, M12     40 nm    0.6 µm
Fig. 15. Digital phase detection logic
B. Digital Loop Filter
The implementation of the automatically generated DLF is
shown in Fig. 16. The DLF receives an Early/Late signal
from the phase detection logic and this signal is then processed
by a proportional and an integral path. The proportional path
directly amplifies the Early/Late signals with −Kp and Kp,
respectively. To maintain the stability of the AD-CDR, the
delay in this path is minimized and the implementation is
made as simple as possible. To achieve this, Kp is always
an integer and the output is a 7-bit thermometer code. Now,
the proportional path can simply be implemented by selecting
or deselecting ‘Kp’ of the thermometer-coded output bits.
These bits directly drive the fine-tuning input of the DCO
(see section III-C). This configuration allows the gain Kp to
be set between 0 and 7.
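The selection of Kp thermometer-coded bits can be illustrated with a small sketch. It only models the encoding; how the Early and Late words are wired to the varactors is not modeled, and the function names are hypothetical.

```python
def thermometer(value, width):
    """Thermometer code: `value` ones followed by zeros, `width` bits in total."""
    if not 0 <= value <= width:
        raise ValueError("value must lie between 0 and width")
    return [1] * value + [0] * (width - value)


def proportional_word(kp, asserted, width=7):
    """Select Kp of the 7 output bits while the Early (or Late) signal is asserted."""
    return thermometer(kp if asserted else 0, width)


print(proportional_word(5, True))   # [1, 1, 1, 1, 1, 0, 0]
print(proportional_word(5, False))  # [0, 0, 0, 0, 0, 0, 0]
```

Because the gain is applied by gating bits instead of multiplying, the path needs no arithmetic, which keeps its delay low.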
The integral path of the DLF is implemented as a multi-
rate architecture. That is, a Clk/2-domain is created to reduce
the clock speed which facilitates the implementation of the
accumulator. Therefore, the Early/Late signal is demuxed by
a factor of 2. The internal accumulator has a high resolution of
16 bits. This allows the use of a broad range of integral gains
Ki, which can be set to integer powers of 2. However, to
avoid a bulky DCO design, only the 5 most significant bits of
this 16-bit word are converted to a 31-bit thermometer-coded
word which drives the DCO. In contrast to a binary-weighted
coding, this thermometer coding increases the robustness
against parasitic effects and reduces glitches when switching
between states. In total, the DCO is controlled (in standard
operation) by 45 (=7+7+31) bits each driving a unit varactor
which corresponds to a resolution of 5.5 binary-weighted bits.
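A minimal behavioral sketch of the integral path is given below. The saturating accumulator, the mid-scale start value and the step size are assumptions made for illustration; only the 16-bit width, the 5 MSBs and the 31-bit thermometer word are taken from the text. The last lines show one plausible reading of the 5.5-bit figure: 45 unit varactors give 46 distinct levels, and log2(46) ≈ 5.5.

```python
import math

ACC_BITS = 16   # accumulator width (from the text)
MSB_BITS = 5    # bits forwarded to the DCO (from the text)


def accumulate(acc, early_late, step):
    """Add +/- step for a Late/Early decision; saturation is an assumption."""
    acc += early_late * step
    return max(0, min((1 << ACC_BITS) - 1, acc))


def integral_word(acc):
    """Convert the 5 MSBs (0..31) into a 31-bit thermometer-coded word."""
    level = acc >> (ACC_BITS - MSB_BITS)
    return [1] * level + [0] * (31 - level)


acc = 1 << 15                       # hypothetical mid-scale start
for _ in range(4):
    acc = accumulate(acc, +1, step=1 << 9)
print(sum(integral_word(acc)))      # 17 varactors switched on

levels = 7 + 7 + 31 + 1             # 45 unit varactors -> 46 levels
print(round(math.log2(levels), 1))  # 5.5
```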
Furthermore, there are some signals shown in Fig. 16 that
are not used in normal operation: first there is a ‘from FD’
signal, which is used in the calibration process of the DCO
(see section III-D) and which can be activated by the control
Fig. 16. Digital loop filter implementation
(a) (b)
Fig. 17. DCO structure: (a) Ring oscillator and (b) Delay cell
signal ‘Calibration’. Second, there is also a ‘fixed DCO
setting’ signal which is only used for debug purposes and gives
the ability to characterize the DCO separately. This signal is
activated by the control signal ‘DCO Characterization’.
C. Digital Controlled Oscillator
To generate the 8 uniformly-phase-shifted clock phases for
the aggregated 25 Gb/s PD operation, the DCO is imple-
mented as a 4-stage ring oscillator with differential delay cells
(Fig. 17) [29].
The delay cell is shown in Fig. 17(b). Its delay can be tuned
through the tail bias current or through the load network. For
the load, we distinguish a coarse tuning and a fine tuning.
The coarse tuning has 6-bit resolution and is only used during
calibration of the DCO (see section III-D) and is implemented
by switching binary-weighted resistors on or off.
The fine tuning is done by tuning the load varactors. During
normal CDR operation only this fine tuning is used. It is
implemented as follows: the thermometer-coded words from
the DLF (see Fig. 16) switch unit varactors on/off. To reduce
the area of the ring oscillator and achieve a good resolution,
the varactor units are distributed equally over the 4 delay cells.
Per LSB of the fine tuning word, only one varactor is switched.
However, the clock phases of the DCO have to be kept equally
spaced as much as possible. Therefore, the on/off switching
of the varactors is sequenced across the different delay cells:
1. toggle a varactor in the first delay cell, 2. toggle a varactor
in the third delay cell, 3. toggle a varactor in the second delay
cell, 4. toggle a varactor in the fourth delay cell, etc.
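The sequencing rule can be sketched as follows. The code only models the order in which cells receive the next varactor (first, third, second, fourth, and so on); the names are hypothetical.

```python
# Zero-based cell order: first, third, second, fourth delay cell.
CELL_ORDER = [0, 2, 1, 3]


def varactors_per_cell(code):
    """Number of switched-on unit varactors in each delay cell for a fine-tuning code."""
    counts = [0, 0, 0, 0]
    for i in range(code):
        counts[CELL_ORDER[i % 4]] += 1
    return counts


print(varactors_per_cell(6))  # [2, 1, 2, 1]
```

For any code, the counts in the four cells differ by at most one unit varactor, which keeps the load mismatch between the cells, and hence the phase spacing error, small.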
The tuning mechanism through the tail bias current is in
principle not needed, because according to simulation the
entire operating range could be sufficiently covered with the
load tuning alone. However, this tuning was added to achieve
a larger robustness vs. process variations, such that the entire
intended frequency range has sufficient coverage even under
unforeseen process conditions. Here, a 4-bit current control
was implemented on the chip.
D. Calibration of the DCO
Before normal AD-CDR operation, where only the fine-
tuning of the DCO is adapted, the DCO frequency should first
be adjusted to within about ± 30 MHz of the correct quarter-
rate frequency of the data rate (e.g. 6.25 GHz for 25 Gb/s
input data). For this, a coarse tuning of the DCO is performed
in a calibration cycle at startup. This is done through an
automatic frequency control loop which is based on an external
reference clock and counters [2]: The frequency control
loop counts the number of clock cycles of the digital clock
and external reference clock. These numbers are compared
with SPI configured registers and the coarse settings are then
gradually adjusted. This procedure is repeated until the DCO
lies within about ±30 MHz of the correct desired frequency.
The circuit is incorporated in the synthesized digital block.
The power overhead of this calibration procedure is negligible:
the synthesized circuit is only based on simple counters and
comparators and consumes almost no power (approximately
0.75mW).
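The counter-based calibration can be sketched as a simple stepping loop. Everything numerical below is a hypothetical toy model (the linear frequency curve, the 100 MHz reference, the 1000-cycle window and the tuning direction); only the principle of counting cycles during a reference window, comparing with a programmed target and gradually stepping the 6-bit coarse word follows the description above.

```python
REF_HZ = 100_000_000   # hypothetical reference clock
REF_CYCLES = 1000      # hypothetical counting window (in reference cycles)


def toy_dco_hz(coarse):
    """Hypothetical monotonic coarse-tuning curve, for illustration only."""
    return 5_000_000_000 + coarse * 40_000_000


def count_cycles(coarse):
    """Cycles of the DCO-derived clock counted during the reference window."""
    return toy_dco_hz(coarse) * REF_CYCLES // REF_HZ


def calibrate(target_count, tolerance, coarse=0, max_steps=64):
    """Step the 6-bit coarse word until the count is within tolerance of the target."""
    for _ in range(max_steps):
        error = count_cycles(coarse) - target_count
        if abs(error) <= tolerance:
            break
        # Assumption: a larger coarse word gives a higher frequency.
        coarse += -1 if error > 0 else 1
        coarse = max(0, min(63, coarse))
    return coarse


# Target 6.25 GHz within +/- 30 MHz, expressed in counts for this window.
target = 6_250_000_000 * REF_CYCLES // REF_HZ
tol = 30_000_000 * REF_CYCLES // REF_HZ
setting = calibrate(target, tol)
print(setting, toy_dco_hz(setting) / 1e9)  # 31 6.24
```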
IV. EXPERIMENTAL RESULTS
The AD-CDR is fabricated in a 40 nm Low Power CMOS
technology. The low power flavor is not favorable for a high-
speed circuit, but was selected based on the available tape-outs.
Unfortunately, the received samples (all from the same wafer)
were apparently from a slow process corner. This forced us
to increase the DCO supply voltage to 1.15V (instead of the
nominal value of 1.1V). For the BB-PD and synthesized logic
we had to increase the voltage to 1.25V. All the measurements
reported in this section were done with these increased supply
voltages.
A photo of the fabricated chip, together with an annotated
layout view, is shown in Fig. 18. The chip area of the CDR
core is only 0.050 mm2.
To test the fabricated CDR, it was wire bonded on a high-
speed PCB. The input buffers of the CDR and the transmission
lines on the PCB are designed for an input impedance of 50 Ω.
The measurements were performed by directly connecting
the measurement equipment through this PCB to the ESD
protected I/O pads of the CDR.
A. Functional tests
First, basic functional tests were performed on our prototype
at 3 different operating frequencies: 25 Gb/s, 20 Gb/s and
12.5 Gb/s. For this, a 2^31−1 pseudo random bit data sequence
(PRBS31) was applied to the input of our AD-CDR. Note that
with this PRBS31 test sequence, the PD output, after the 16
times subsampling that we have in our circuit, will contain
idle patterns with a length l equal to 31 (see Section II-D).
At 25 Gb/s, the CDR core without input and output buffers
has a power consumption of 46 mW of which 11 mW is
dissipated by the samplers, retiming block and subsampling
block, 4 mW is consumed by the digital block and 31 mW
is used for the DCO. The power dissipation at 20 Gb/s and
12.5 Gb/s is respectively 38 mW and 23 mW.
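The energy per bit at the three tested data rates follows directly from these numbers, since 1 mW divided by 1 Gb/s equals 1 pJ/b. The short calculation below uses only the figures quoted above.

```python
# Measured core power (mW) per data rate (Gb/s), as reported above.
power_mw = {25.0: 46.0, 20.0: 38.0, 12.5: 23.0}

# mW / (Gb/s) = 1e-3 W / 1e9 b/s = 1e-12 J/b = pJ/b
energy_pj_per_bit = {rate: p / rate for rate, p in power_mw.items()}
for rate, e in energy_pj_per_bit.items():
    print(f"{rate} Gb/s: {e:.2f} pJ/b")

# Power breakdown at 25 Gb/s (mW): the three parts add up to the total.
breakdown_mw = {"samplers, retiming, subsampling": 11, "digital block": 4, "DCO": 31}
print(sum(breakdown_mw.values()))  # 46
```

The result is about 1.8 to 1.9 pJ/b at all three rates, which reflects the near-linear scaling of power with data rate.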
Next, a batch of Bit Error Rate (BER) measurements was
performed. The full data stream is available as 4 parallel
channels at quarter-rate, but due to equipment limitations, we
could only do the BER measurement on 1 of the 4 channels
at a time. All the measurements reported below
were done in this configuration. In a typical measurement the
AD-CDR was operated over a time span of 15 minutes and
the bit errors over this time frame were collected. These
measurements consistently resulted in an error-free operation
of the AD-CDR at 20 Gb/s and 12.5 Gb/s. At 25 Gb/s a BER
of 3.5·10^-13 was measured, well below the error correction
capabilities of most applications [36]. In the remainder of
this section, the performance of the DCO, the PD (including
the experimental comparison of the conventional and Inverse
Alexander PD) and the AD-CDR are discussed in more detail.
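To put the 15-minute observation window in perspective, the number of bits checked on a single quarter-rate channel can be estimated as below. The calculation assumes one 15-minute run on one of the four channels; the 95 % confidence bound for an error-free run uses the standard relation BER < −ln(1 − CL)/N, which is not taken from the measurements themselves.

```python
import math


def observed_bits(line_rate_gbps, minutes, channels_observed=1, total_channels=4):
    """Bits checked when only some of the quarter-rate channels are monitored."""
    return line_rate_gbps * 1e9 * channels_observed / total_channels * minutes * 60


def error_free_ber_bound(n_bits, confidence=0.95):
    """Upper bound on the BER when zero errors are seen in n_bits (standard estimate)."""
    return -math.log(1.0 - confidence) / n_bits


for rate in (25.0, 20.0, 12.5):
    n = observed_bits(rate, 15)
    print(f"{rate} Gb/s: {n:.3e} bits, error-free bound {error_free_ber_bound(n):.1e}")

# Expected error count at 25 Gb/s for the reported BER, under the same assumption.
print(observed_bits(25.0, 15) * 3.5e-13)  # roughly 2 errors
```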
B. Digital Controlled Oscillator operation
The DCO can be driven independently from the other
blocks. This allows the DCO to be characterized for different
current, coarse tuning and fine tuning settings.
In Fig. 19 the DCO frequency characteristic is shown.
The x-axis represents the 6-bit resistor coarse tuning word
concatenated with the 5-bit integral path fine tuning word and
results in 2048 possible configurations. The measurement was
repeated over multiple current settings: ranging from current
setting ‘2’ to ‘15’ (for the lowest current settings the results
were not meaningful). Fig. 19 demonstrates that the DCO
Fig. 18. Photo of the implemented chip with annotated layout view

















Fig. 19. Free running frequency of the DCO with (a) an overview of the complete
frequency range and (b) a detail around 6.25 GHz
covers a frequency range from 2.73 GHz to 8.95 GHz which
corresponds to a data rate range from 10.92 Gb/s to 35.8 Gb/s.
A detail of the characteristic around 6.25 GHz, which is
the quarter-rate oscillation frequency for 25 Gb/s input data, is
shown in Fig. 19(b). In this figure the influence of the different
settings is more visible: each color/symbol corresponds to a
different current setting. The different line segments of the same
color have a different coarse tuning value and all frequency
points within a separate line segment have a different fine
tuning value.
The DCO was designed such that for every coarse transition,
the output frequency range would overlap between the two
adjacent settings. If we now focus on e.g. the rightmost (dark
blue) current setting we note that this is the case for some












Fig. 20. The gain of the DCO Kdco at 6.25 GHz




Fig. 21. Supply sensitivity at 6.25 GHz.
coarse transitions. However, for some coarse transitions there is
an undesired frequency gap. This means that for a fixed current
setting some oscillation frequencies cannot be generated by
changing only the coarse and fine tuning settings. This issue
arises from underestimated parasitics. Fortunately, this problem
was anticipated and can be circumvented by using the coarse
current tuning. In this way, the desired frequency range is still
completely covered.
The measured DCO gain KDCO at 6.25 GHz for the differ-
ent current settings is shown in Fig. 20. The figure shows that
KDCO is about 1.7 MHz per LSB for high current settings and
that KDCO increases to 2.3 MHz per LSB for lower current
settings. Clearly, this means that the DCO quantization step is
rather coarse. The measurements reported below are performed
for a current setting equal to 12.
The DCO supply sensitivity at 6.25 GHz is shown in
Fig. 21. Here, the supply sensitivity equals 3.3 GHz/V. Due
to the high supply sensitivity, the phase noise of the DCO
is degraded: e.g. at a frequency offset of 10 MHz from the
carrier the measured phase noise is equal to −95 dBc/Hz (see
dotted line in Fig. 25). In post-layout simulation, however, the
corresponding phase noise was only −110 dBc/Hz at 10 MHz
from the carrier. We attribute this deterioration to supply noise
which leads to excessive phase noise due to the poor supply
sensitivity.
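The impact of this supply sensitivity can be made concrete with a short calculation. The 1 mV ripple is an illustrative assumption; the sensitivity and the DCO gain (for high current settings) are the measured values quoted above.

```python
SUPPLY_SENSITIVITY_HZ_PER_V = 3.3e9   # measured at 6.25 GHz
KDCO_HZ_PER_LSB = 1.7e6               # measured, high current settings

ripple_v = 1e-3                       # hypothetical 1 mV supply disturbance
shift_hz = SUPPLY_SENSITIVITY_HZ_PER_V * ripple_v
shift_lsb = shift_hz / KDCO_HZ_PER_LSB

print(f"{shift_hz / 1e6:.1f} MHz, about {shift_lsb:.1f} LSB")  # 3.3 MHz, about 1.9 LSB
```

Under this assumption, a supply disturbance of only 1 mV shifts the oscillation frequency by almost two fine-tuning steps, which is consistent with supply noise dominating the measured phase noise.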
C. Phase Detector operation
To determine the performance of the PD, the sensitivity of
the samplers is measured. This sensitivity is defined as the
time span in which the input data is sampled correctly by
the samplers. The measurement is performed by applying an











Fig. 22. Sensitivity of the PD with a PRBS7 input data at 25 Gb/s
external quarter-rate clock signal together with the input data
to the AD-CDR. For this measurement, a 2^7−1 pseudo random
bit data sequence (PRBS7) at 25 Gb/s with a rise time of
0.25 UI is applied. The internal DCO is bypassed such that
the data is sampled by the external clock. By sweeping the
time difference between the external clock and the input data,
we could determine the BER for each time difference and the
resulting bathtub curve is shown in Fig. 22. The bathtub curve
indicates that a time span of 18.8 ps out of a data period of
40 ps gives a BER below 10^-12.
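Expressed in unit intervals (UI), this sampling window is computed as follows; the numbers are the ones quoted above.

```python
BIT_RATE = 25e9                  # b/s
ui_ps = 1e12 / BIT_RATE          # one unit interval in ps
window_ps = 18.8                 # measured window with BER below 1e-12

opening_ui = window_ps / ui_ps
print(ui_ps, round(opening_ui, 2))  # 40.0 0.47
```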
D. Experimental comparison of the conventional and Inverse
Alexander PD
To facilitate the experimental comparison between the con-
ventional and Inverse Alexander PD, our prototype circuit was
designed such that it can be configured to operate with the
conventional as well as the Inverse Alexander PD. This is
done by switching the sign of the control loop of the CDR in
the DLF. Furthermore, the subsample factor N can be set to
16 (which is the nominal case) or to 32 (which is a test mode).
For these cases comparative bit error rate (BER) measurements
were performed. A 25 Gb/s PRBS7 was applied to the CDR
and jitter was intentionally applied to the input data stream.
For the jitter, Gaussian pseudo-white noise with a bandwidth
of 80 MHz (= equipment limit) was used. The jitter level
was varied and the CDR was operated over a long time until
a sufficient number of bit errors were collected to obtain a
reasonably accurate estimation of the bit error rate. The results
are summarized in Fig. 23. In the interpretation of the curves
it should be noted that at a high jitter level the CDR starts
to occasionally lose synchronism (due to cycle slips). This
happened in each of the considered configurations, but as the
figure shows, much earlier for the conventional PD than for
the Inverse PD.
From Fig. 23, we can conclude that the BER performance
of both the conventional as well as the Inverse Alexander PD
degrades when the subsample factor increases from N = 16
(nominal value) to N = 32 (test case). For N = 32,
the conventional PD was in fact not functional at all. It is
also obvious from the figure that due to subsampling and
non-idealities, the Inverse Alexander PD greatly outperforms
the conventional Alexander PD: if we compare the BER at
the same jitter level, the improvement is too large to quantify but
definitely above a factor 10^5. If we compare the jitter levels














Fig. 23. Measured bit error rates for the conventional and the Inverse
Alexander phase detector with a PRBS7 input data sequence at 25 Gb/s: (a)
with a subsample factor N = 16 and (b) with a subsample factor N = 32.
(Digital loop filter settings: Kp = 5 and Ki = 2^−7).
where a certain BER occurs, the improvement is about a factor
1.9.
Moreover, the phase noise of the recovered clock is com-
pared between the conventional and Inverse Alexander phase
detector for different subsample factors (Fig. 24). In all cases,
a PRBS31 data sequence at 25 Gb/s was applied to the input
of the CDR and the digital loop filter parameters were held
constant. As predicted in Section II-A, the Inverse Alexander
phase detector will introduce less noise which leads to smaller
phase noise compared to the conventional Alexander phase
detector for the same subsample factor. However, when the
subsample factor is doubled, additional aliasing effects occur,
which increase the in-band phase noise by approximately
3 dB for both the conventional and Inverse Alexander phase
detector.
E. All-Digital Clock and Data Recovery operation
For the final AD-CDR operation measurements, the standard
operation mode (with Inverse Alexander PD and subsample
factor N=16) was again selected.
The closed loop phase noise of the recovered clock for
different gain settings is shown in Fig. 25 next to the phase
noise of the free running oscillator. Here, a PRBS31 data
sequence at 25 Gb/s is applied to the input of the AD-CDR
and the phase noise of the quarter-rate recovered clock is
captured. The figure shows that increasing the proportional
gain Kp increases the bandwidth of the AD-CDR. As the ratio
of the proportional gain Kp and integral gain Ki decreases,












Fig. 24. Phase noise of the recovered clock with a PRBS31 input data
sequence at 25 Gb/s: Comparison between Alexander and Inverse Alexander
phase detector for different subsample factors (i.e. N = 16 and N = 32).
(Digital loop filter settings: Kp = 5 and Ki = 2^−7).












Fig. 25. Phase noise of the recovered clock with a PRBS31 input data
sequence at 25 Gb/s: Sweep Kp
peaking starts to occur. Furthermore, the figure also shows
that outside the loop bandwidth, the phase noise of the closed
loop system approximates the phase noise of the free running
clock. In the time domain, the closed loop phase noise was
measured as 1.455 ps RMS jitter on the recovered clock as
shown in Fig. 26(a). Additionally, the corresponding measured
eye diagram of the recovered data is depicted in Fig. 26(b).
The RMS jitter of the recovered data is approximately 3.71 ps.
The capture range of the AD-CDR was also measured
and is equal to 248 MHz. This corresponds to the tuning
range in normal operation and is sufficiently large to allow
correct operation from an initial calibration that aligns the
DCO frequency within ± 30 MHz of the desired quarter rate
frequency.
Moreover, the jitter tolerance (JTOL) of the AD-CDR is
shown in Fig. 27(a) and Fig. 27(b) for different proportional
gains Kp and integral gains Ki, respectively. On both fig-
ures, the SDH STM-256 jitter tolerance mask and the jitter
(a)
(b)
Fig. 26. Persistence plots of (a) the recovered (differential) clock
(jitter < 1.5 psrms) and (b) the recovered data (jitter ≈ 3.71 psrms)
tolerance of [2] and [4] are added for comparison. These
jitter tolerance curves are measured by applying a PRBS7
input data sequence at 25 Gb/s with sinusoidal jitter. Each
measurement is obtained by increasing the jitter level until
the BER becomes > 10^-12. As shown on the figures, the jitter
tolerance curves can be widely tuned by adapting the digital
loop parameters. E.g. the jitter tolerance can easily be set
such that it satisfies the STM-256 mask and exceeds the jitter
tolerance of [2] and [4]. Please note that for the lower jitter
frequencies, the jitter tolerance is better than indicated on the
figures, since the highest jitter level that our equipment can
generate still leads to a BER that is better than 10^-12.
Finally, a comparison with the state-of-the-art of digital
CDRs is shown in Table III. This summary shows that our
design occupies the smallest area and has the highest power
efficiency. Although the performance of the DCO is modest
and the phase noise and the jitter of the recovered clock are
higher than prior work, only our work and [4] satisfy the
STM-256 jitter tolerance mask as shown in Fig. 27. Finally, apart
from [9] and [11] which have the unattractive requirement that
they need a tunable, high-quality, multi-gigahertz frequency
reference clock, our design has the highest relative frequency
range for digital CDRs.
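The relative frequency range values listed in Table III are consistent with the ratio (fmax − fmin)/fmax applied to the data rate ranges. This definition is inferred from the tabulated numbers and is not stated explicitly; the check below reproduces the tabulated percentages.

```python
def relative_range_percent(f_min, f_max):
    """Relative frequency range, inferred as (f_max - f_min) / f_max in percent."""
    return 100.0 * (f_max - f_min) / f_max


# Data rate ranges in Gb/s, as listed in Table III.
ranges = {
    "[6]": (19.0, 27.0),
    "[9]": (1.0, 16.0),
    "[11]": (6.0, 44.0),
    "[2]": (22.0, 26.5),
    "[4]": (22.5, 32.0),
    "This work": (12.5, 25.0),
}
for name, (lo, hi) in ranges.items():
    print(f"{name}: {relative_range_percent(lo, hi):.2f} %")
```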




















Fig. 27. Jitter tolerance with a PRBS7 input data sequence at 25 Gb/s: (a)
Sweep Kp and (b) Sweep Ki
V. CONCLUSION
We have presented an AD-CDR in 40 nm Low Power
CMOS technology. It can operate in a very wide range of data
speeds (from 12.5 Gb/s to 25 Gb/s). The CDR takes in the
high-speed data, recovers a quarter-rate clock and demultiplexes
the recovered data into 4 parallel data streams. A ring
oscillator generates 8 equally spaced quarter-rate clock phases,
and provides the necessary timing resolution for an Inverse
Alexander phase detector, which captures the recovered data
and sends Early/Late signals to the automatically-synthesized
digital loop filter.
A key enabling element of the presented design is the use
of extensive subsampling together with the Inverse Alexander
phase detector to reduce the operating speed of the synthesized
digital logic and still guarantee good operation of the CDR. By
avoiding parallel structures, this simplifies the design, reduces
the active die area and decreases the power consumption.
The resulting AD-CDR core has an area of 0.050 mm2 and
consumes only 46 mW at 25 Gb/s and 23 mW at 12.5 Gb/s.
The implemented CDR is highly tunable and satisfies the jitter
tolerance specifications for SDH STM-256.
TABLE III
COMPARE DIGITAL CDRS
Parameter | [5] | [6] | [9] | [10] | [11] | [2] | [4] | This work
CMOS Technology [nm] | 28 | 40 | 65 | 28 | 90 | 65 | 28 | 40
Data Rate [Gb/s] | 40 | 19-27 | 1-16 | 28 | 6-44 | 22-26.5 | 22.5-32 | 12.5-25
Relative frequency range [%] | - | 29.6 | 93.8 | - | 86.4 | 17 | 29.7 | 50
Type | Digital CDR | Digital CDR | Digital CDR (∗∗) | Digital CDR | Digital CDR | AD-CDR | AD-CDR | AD-CDR
Oscillator type | LC-VCO + PI | QR-VCO + PI | PI | PI | PI | LC-QDCO | DAC + Ring-VCO | Ring-DCO
Power [mW] | 927 (∗) | 85 (∗) | 89 | 107 | 230 | 218 | 102 | 46
Power eff. [pJ/bit] | 23.2 | 3.1 | 5.5 | 3.8 | 5.7 | 8.2 | 3.2 | 1.8
Area [mm2] | 0.81 | 0.09 | 0.088 | 0.52 | 0.2 | 0.46 | 0.52 | 0.050
Phase noise @ 1 MHz [dBc/Hz] | -105 | -96 | - | - | - | -115 | - | -105
Jitter RCLK [psrms] | 0.170 | 1.66 | - | - | 0.249 | 1.28 | - | 1.46
JTOL @ 10 MHz [UIpp] | 0.3 | 0.5 | 0.4 | 0.3 | 0.35 | 0.16 | 0.35 | 0.6
Satisfies STM-256 JTOL mask | No | No | No | No | No | No | Yes | Yes
Reference clk [GHz] | - | 0.1 | 8-16 | 14 | 6-11 | No | No | No
Demux ratio | 1:64 | 1:2 | 1:16 | 1:4 | 1:16 | 1:64 | 1:32 | 1:4
Equalization | CTLE, 17-tap DFE, 2-tap Transversal Filter, 3-tap Sampled FFE | CTLE, 2-tap DFE | CTLE | CTLE | No | No | CTLE, 1-tap DFE | No
(∗) Power consumption of complete receiver.
(∗∗) Here, the design is described as an AD-CDR. However, according to our definition of a (PLL-based) AD-CDR, this is a digital CDR.
REFERENCES
[1] B. Razavi, “Challenges in the design of high-speed clock and data recovery
circuits,” IEEE Communications Magazine, vol. 40, no. 8, pp. 94–101,
aug 2002.
[2] S.-H. Chu, W. Bae, G.-S. Jeong, S. Jang, S. Kim, J. Joo, G. Kim, and
D.-K. Jeong, “A 22 to 26.5 Gb/s Optical Receiver With All-Digital
Clock and Data Recovery in a 65 nm CMOS Process,” IEEE Journal
of Solid-State Circuits, vol. 50, no. 11, pp. 2603–2612, nov 2015.
[3] T. Lee, Y.-H. Kim, J. Sim, J.-S. Park, and L.-S. Kim, “A 5-Gb/s 2.67-
mW/Gb/s Digital Clock and Data Recovery With Hybrid Dithering
Using a Time-Dithered DeltaSigma Modulator,” IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, vol. 24, no. 4, pp. 1450–
1459, apr 2016.
[4] W. Rahman, D. Yoo, J. Liang, A. Sheikholeslami, H. Tamura,
T. Shibasaki, and H. Yamaguchi, “A 22.5-to-32Gb/s 3.2pJ/b reference-
less baud-rate digital CDR with DFE and CTLE in 28nm CMOS,”
in 2017 IEEE International Solid-State Circuits Conference (ISSCC).
IEEE, feb 2017, pp. 120–121.
[5] R. Navid, E.-H. Chen, M. Hossain, B. Leibowitz, J. Ren, C.-h. A. Chou,
B. Daly, M. Aleksic, B. Su, S. Li, M. Shirasgaonkar, F. Heaton, J. Zerbe,
and J. Eble, “A 40 Gb/s Serial Link Transceiver in 28 nm CMOS
Technology,” IEEE Journal of Solid-State Circuits, vol. 50, no. 4, pp.
814–827, apr 2015.
[6] Z.-H. Hong, Y.-C. Liu, and W.-Z. Chen, “A 3.12 pJ/bit, 19-27 Gbps
Receiver With 2-Tap DFE Embedded Clock and Data Recovery,” IEEE
Journal of Solid-State Circuits, vol. 50, no. 11, pp. 2625–2634, nov
2015.
[7] G. Shu, W. S. Choi, S. Saxena, M. Talegaonkar, T. Anand, A. Elkholy,
A. Elshazly, and P. K. Hanumolu, “A 4-to-10.5 Gb/s Continuous-
Rate Digital Clock and Data Recovery With Automatic Frequency
Acquisition,” IEEE Journal of Solid-State Circuits, vol. 51, no. 2, pp.
428–439, feb 2016.
[8] H. Won, T. Yoon, J. Han, J.-Y. Lee, J.-H. Yoon, T. Kim, J.-S. Lee, S. Lee,
K. Han, J. Lee, J. Park, and H.-M. Bae, “A 0.87 W Transceiver IC for
100 Gigabit Ethernet in 40 nm CMOS,” IEEE Journal of Solid-State
Circuits, vol. 50, no. 2, pp. 399–413, feb 2015.
[9] G. Wu, D. Huang, J. Li, P. Gui, T. Liu, S. Guo, R. Wang, Y. Fan,
S. Chakraborty, and M. Morgan, “A 1–16 Gb/s All-Digital Clock and Data
Recovery With a Wideband High-Linearity Phase Interpolator,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24,
no. 7, pp. 2511–2520, jul 2016.
[10] J. Liang, A. Sheikholeslami, H. Tamura, Y. Ogata, and H. Yamaguchi,
“A 28Gb/s digital CDR with adaptive loop gain for optimum jitter
tolerance,” in 2017 IEEE International Solid-State Circuits Conference
(ISSCC). IEEE, feb 2017, pp. 122–123.
[11] L. Rodoni, G. von Buren, A. Huber, M. Schmatz, and H. Jackel, “A
5.75 to 44 Gb/s Quarter Rate CDR With Data Rate Selection in 90 nm
Bulk CMOS,” IEEE Journal of Solid-State Circuits, vol. 44, no. 7, pp.
1927–1941, jul 2009.
[12] S.-W. Kwon, J.-Y. Lee, J. Lee, K. Han, T. Kim, S. Lee, J.-S. Lee,
T. Yoon, H. Won, J. Park, and H.-M. Bae, “An Automatic Loop
Gain Control Algorithm for Bang-Bang CDRs,” IEEE Transactions on
Circuits and Systems I: Regular Papers, vol. 62, no. 12, pp. 2817–2828,
dec 2015.
[13] R. Staszewski, K. Muhammad, D. Leipold, Chih-Ming Hung, Yo-Chuol
Ho, J. Wallberg, C. Fernando, K. Maggio, R. Staszewski, T. Jung,
Jinseok Koh, S. John, Irene Yuanying Deng, V. Sarda, O. Moreira-
Tamayo, V. Mayega, R. Katz, O. Friedman, O. Eliezer, E. De-Obaldia,
and P. Balsara, “All-digital TX frequency synthesizer and discrete-time
receiver for Bluetooth radio in 130-nm CMOS,” IEEE Journal of Solid-
State Circuits, vol. 39, no. 12, pp. 2278–2291, dec 2004.
[14] Chi-Shuang Oulee and Rong-Jyi Yang, “A 1.25Gbps all-digital clock and
data recovery circuit with binary frequency acquisition,” in APCCAS
2008 - 2008 IEEE Asia Pacific Conference on Circuits and Systems.
IEEE, nov 2008, pp. 680–683.
[15] I.-F. Chen, R.-J. Yang, and S.-I. Liu, “Loop latency reduction technique
for all-digital clock and data recovery circuits,” in 2009 IEEE Asian
Solid-State Circuits Conference. IEEE, nov 2009, pp. 309–312.
[16] N. Tall, N. Dehaese, S. Bourdel, and B. Bonat, “An all-digital clock
and data recovery circuit for low-to-moderate data rate applications,” in
2011 18th IEEE International Conference on Electronics, Circuits, and
Systems. IEEE, dec 2011, pp. 37–40.
[17] Ching-Che Chung, Duo Sheng, and Yang-Di Lin, “An all-digital clock
and data recovery circuit for spread spectrum clocking applications in
65nm CMOS technology,” in 2012 4th Asia Symposium on Quality
Electronic Design (ASQED). IEEE, jul 2012, pp. 91–94.
[18] H. Song, D.-S. Kim, D.-H. Oh, S. Kim, and D.-K. Jeong, “A 1.0–4.0-Gb/s
All-Digital CDR With 1.0-ps Period Resolution DCO and Adaptive Pro-
portional Gain Control,” IEEE Journal of Solid-State Circuits, vol. 46,
no. 2, pp. 424–434, feb 2011.
[19] T. Masuda, R. Shinoda, J. Chatwin, J. Wysocki, K. Uchino, Y. Miyajima,
Y. Ueno, K. Maruko, Z. Zhou, H. Suzuki, and N. Shoji, “A 12 Gb/s
0.9 mW/Gb/s Wide-Bandwidth Injection-Type CDR in 28 nm CMOS
With Reference-Free Frequency Capture,” IEEE Journal of Solid-State
Circuits, vol. 51, no. 12, pp. 3204–3215, dec 2016.
[20] M.-t. Hsieh and G. Sobelman, “Architectures for multi-gigabit wire-
linked clock and data recovery,” IEEE Circuits and Systems Magazine,
vol. 8, no. 4, pp. 45–57, 2008.
[21] M. Talegaonkar, R. Inti, and P. K. Hanumolu, “Digital clock and data
recovery circuit design: Challenges and tradeoffs,” 2011 IEEE Custom
Integrated Circuits Conference (CICC), pp. 1–8, 2011.
[22] C. Van Praet, G. Torfs, Z. Li, X. Yin, D. Suvakovic, H. Chow, X.-Z.
Qiu, and P. Vetter, “10 Gbit/s bit interleaving CDR for low-power PON,”
Electronics Letters, vol. 48, no. 21, p. 1361, 2012.
[23] M. Verbeke, P. Rombouts, X. Yin, and G. Torfs, “Inverse Alexander
phase detector,” Electronics Letters, vol. 52, no. 23, pp. 1908–1910,
nov 2016.
[24] J. Lee, K. Kundert, and B. Razavi, “Analysis and Modeling of Bang-
Bang Clock and Data Recovery Circuits,” IEEE Journal of Solid-State
Circuits, vol. 39, no. 9, pp. 1571–1580, sep 2004.
[25] B. Razavi, Design of Integrated Circuits for Optical Communications,
2nd ed. Hoboken, New Jersey, USA: John Wiley & Sons, Inc., 2012.
[26] M. Marcu, S. Durbha, and S. Gupta, “Duty-cycle distortion and spec-
ifications for jitter test-signal generation,” 2008 IEEE International
Symposium on Electromagnetic Compatibility, pp. 1–4, aug 2008.
[27] A. Vyncke, G. Torfs, C. Van Praet, M. Verbeke, A. Duque, D. Suvakovic,
H. Chow, and X. Yin, “The 40Gbps cascaded bit-interleaving PON,”
Optical Fiber Technology, vol. 26, pp. 108–117, 2015.
[28] M. Verbeke, P. Rombouts, A. Vyncke, and G. Torfs, “Influence of Jitter
on Limit Cycles in Bang-Bang Clock and Data Recovery Circuits,” IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 6,
pp. 1463–1471, jun 2015.
[29] A. Abidi, “Phase Noise and Jitter in CMOS Ring Oscillators,” IEEE
Journal of Solid-State Circuits, vol. 41, no. 8, pp. 1803–1816, aug 2006.
[30] B. Nikolic, V. Oklobdzija, V. Stojanovic, Wenyan Jia, James Kar-Shing
Chiu, and M. Ming-Tak Leung, “Improved sense-amplifier-based flip-
flop: design and measurements,” IEEE Journal of Solid-State Circuits,
vol. 35, no. 6, pp. 876–884, jun 2000.
[31] A. Strollo, D. De Caro, E. Napoli, and N. Petra, “A novel high-speed
sense-amplifier-based flip-flop,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 13, no. 11, pp. 1266–1274, nov 2005.
[32] H. Jeon and Y.-B. Kim, “A CMOS low-power low-offset and high-
speed fully dynamic latched comparator,” in 23rd IEEE International
SOC Conference. IEEE, sep 2010, pp. 285–288.
[33] M. Brandolini, Y. J. Shin, K. Raviprakash, T. Wang, R. Wu, H. M.
Geddada, Y.-J. Ko, Y. Ding, C.-S. Huang, W.-T. Shih, M.-H. Hsieh,
A. W.-T. Chou, T. Li, A. Shrivastava, D. Y.-C. Chen, B. J.-J. Hung,
G. Cusmai, J. Wu, M. M. Zhang, Y. Yao, G. Unruh, A. Venes,
H. S. Huang, and C.-Y. Chen, “A 5 GS/s 150 mW 10 b SHA-Less
Pipelined/SAR Hybrid ADC for Direct-Sampling Systems in 28 nm
CMOS,” IEEE Journal of Solid-State Circuits, vol. 50, no. 12, pp. 2922–
2934, dec 2015.
[34] T. Kobayashi, K. Nogami, T. Shirotori, and Y. Fujimoto, “A current-
controlled latch sense amplifier and a static power-saving input buffer for
low-power architecture,” IEEE Journal of Solid-State Circuits, vol. 28,
no. 4, pp. 523–527, apr 1993.
[35] S. Fateh, P. Schonle, L. Bettini, G. Rovere, L. Benini, and Q. Huang,
“A Reconfigurable 5-to-14 bit SAR ADC for Battery-Powered Medical
Instrumentation,” IEEE Transactions on Circuits and Systems I: Regular
Papers, vol. 62, no. 11, pp. 2685–2694, nov 2015.
[36] X. Yin, M. Verplaetse, R. Lin, J. Van Kerrebrouck, O. Ozolins,
T. De Keulenaer, X. Pang, R. Pierco, R. Vaernewyck, A. Vyncke,
R. Schatz, U. Westergren, G. Jacobsen, S. Popov, J. Chen, G. Torfs,
and J. Bauwelinck, “First demonstration of real-time 100 gbit/s 3-level
duobinary transmission for optical interconnects,” in 42nd European
Conference on Optical Communications, vol. Th.3.B.5, 2016, pp. 28–30.
Marijn Verbeke was born in Eeklo, Belgium, in
1990. He obtained the diploma of M.S. in electri-
cal engineering from Ghent University, Belgium, in
2013. In the same year he joined the Design group of
IDLab, Dept. INTEC at Ghent University where
he is currently working toward a Ph.D. degree. His
technical interests are mixed signal circuit design,
mainly focusing on clock and data recovery.
Pieter Rombouts was born in Leuven, Belgium in
1971. He obtained the Ir. degree in applied physics
and the Dr. degree in electronics from Ghent Uni-
versity in 1994 and 2000 respectively. Since 1994
he has been with the Electronics and Information
Systems Department of Ghent University where he
is a professor of analog electronics since 2005.
His technical interests are signal processing, cir-
cuits and systems theory, analog circuit design and
sensor systems. The main focus of his research has
been on A/D- and D/A-conversion. He has served or
is currently serving as an Associate Editor for IEEE Transactions on Circuits
and Systems-I, IEEE Transactions on Circuits and Systems-II and Electronics
Letters.
Hannes Ramon received the M.S. degree in electri-
cal engineering from Ghent University, Belgium in
2015. From 2015 on, he has been working towards
a Ph.D degree in the Design group of IDLab, Dep.
INTEC at the same university and associated with
imec. His research interests are high-speed, high-
frequency mixed signal designs for (opto-) electronic
communication systems and digital signal process-
ing.
Bart Moeneclaey (M’14) was born in Ghent, Bel-
gium, in 1988. He received the engineering degree
in applied electronics from Ghent University, Ghent,
Belgium, in 2011 where he is currently working
toward the Ph.D. degree. He has been a Research
Assistant in the Design group of IDLab, Dept.
INTEC at Ghent University, since 2011. His research
is focused on amplifier circuit design for high-speed
optical communication systems.
Xin Yin (M’06) received the B.E. and M.Sc. degrees
in electronics engineering from the Fudan Univer-
sity, Shanghai, China, in 1999 and 2002, respec-
tively, and the Ph.D. degree in applied sciences, elec-
tronics from Ghent University, Ghent, Belgium, in
2009. Since 2007, he has worked as a staff researcher
in IMEC and since 2013 he has been a professor in
the INTEC department at Ghent University. He was
and is active in European and International projects
such as PIEMAN, EUROFOS, MARISE, C3PO,
DISCUS, Phoxtrot, MIRAGE, SPIRIT, WIPE, Teraboard,
STREAMS, and the GreenTouch consortium. His current research interests
include high-speed and high-sensitivity opto-electronic circuits and
subsystems, with emphasis on burst-mode receivers and CDR/EDC for optical
access networks, and low-power mixed-signal integrated circuit design for
telecommunication applications. He has authored or co-authored more than
120 journal and conference publications in the field of high-speed electronics
and fiber-optic communication.
In 2014, he led a team including researchers from imec, Bell Labs
USA/Alcatel-Lucent and Orange Labs, which won the GreenTouch 1000x
award in recognition of the invention of the Bi-PON protocol and sustained
leadership. He has been a member of the ECOC technical program committee (TPC)
since 2015.
Johan Bauwelinck (M’02) received a Ph.D. degree
in applied sciences, electronics from Ghent University,
Belgium in 2005. Since Oct. 2009, he has been
a professor in the INTEC department at the same
university and since 2014 he has been leading the IDLab
Design group. His research focuses on high-speed,
high-frequency (opto-) electronic circuits and sys-
tems, and their applications on chip and board level,
including transmitter and receiver analog front-ends
for wireless, wired and fiber-optic communication
or instrumentation systems. He was and is active
in the EU-funded projects GIANT, POWERNET, PIEMAN, EuroFOS, C3-
PO, Mirage, Phoxtrot, Spirit, Flex5Gware, Teraboard, Streams and WIPE
conducting research on advanced electronic integrated circuits for next gen-
eration transport, metro, access, datacenter and radio-over-fiber networks. He
has promoted 18 PhDs and co-authored more than 150 publications and 10
patents in the field of high-speed electronics and fiber-optic communication.
Guy Torfs (M’03) received the engineering degree
in applied electronics and the Ph.D. degree in ap-
plied sciences, electronics from Ghent University,
Belgium in 2007 and 2012 respectively. Since 2011,
he has been with IMEC, associated with Ghent University,
where he became an assistant professor in 2015. In
2014, as part of the Bi-PON and Cascaded Bi-PON
team, he was awarded the GreenTouch 1000x
award. He was co-recipient of a 2015 DesignCon
Best Paper Award in the High-Speed Signal Design
category. His research focuses on high-speed mixed
signal designs for wireless baseband and fiber-optic and backplane commu-
nication systems, including digital signal processing and calibration, analog
equalization circuits and clock and data recovery systems.
