Synchronized Interconnected ADPLLs for Distributed Clock Generation in 65 nm CMOS Technology by Galayko, Dimitri et al.
HAL Id: hal-02318785
https://hal.archives-ouvertes.fr/hal-02318785
Submitted on 17 Oct 2019
Synchronized Interconnected ADPLLs for Distributed
Clock Generation in 65 nm CMOS Technology
Dimitri Galayko, Chuan Shan, Eldar Zianbetov, Mohammad Javidan, Anton
Korniienko, Olivier Billoint, François Anceau, Eric Colinet, Elena Blokhina,
Jérôme Juillard
To cite this version:
Dimitri Galayko, Chuan Shan, Eldar Zianbetov, Mohammad Javidan, Anton Korniienko, et al. Synchronized Interconnected ADPLLs for Distributed Clock Generation in 65 nm CMOS Technology. IEEE Transactions on Circuits and Systems II: Express Briefs, Institute of Electrical and Electronics Engineers, 2019, 66 (10), pp. 1673-1677. doi:10.1109/TCSII.2019.2932029. hal-02318785
1549-7747 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSII.2019.2932029, IEEE
Transactions on Circuits and Systems II: Express Briefs
Dimitri Galayko, Member, IEEE, Chuan Shan, Eldar Zianbetov, Mohammad Javidan, Anton Korniienko,
François Anceau, Olivier Billoint, Éric Colinet, Elena Blokhina, Senior Member, IEEE, and Jérôme Juillard
Abstract—This paper presents an active distributed clock
generator for manycore systems-on-chip consisting of a 10×10
network of coupled all-digital phase-locked loops, achieving less
than 38 ps phase error between neighboring oscillators over
a frequency range of 700-840 MHz at VDD = 1.1 V. The
network is highly robust against VDD variations. The energy cost
of 2.7 µW/MHz per node is about 7 times lower than that of analog
implementations of similar architectures and half that of
conventional H-tree architectures. This is the largest
on-chip all-digital phase-locked loop network implemented to date.
Because the clock generation nodes are linked only locally, the solution is
shown to be scalable. The presented clock generation network
does not require any external reference, except for the start-up
frequency selection: it generates a synchronized signal in fully
autonomous mode and maintains frequency stability within
0.09% over 1700 seconds. Such a network of frequency- and
phase-synchronized oscillators can be used as a source for local
clocking areas.
Keywords—All-Digital Phase-Locked Loops (ADPLLs), ADPLL
Networks, Synchronization, Active Distributed Clocking
I. INTRODUCTION
Coupled oscillators are a recurring theme in micro- and nano-electronics, with a growing number of applications reported in recent years: neuromorphic computing [1], distributed computations [2], ultra-low phase noise frequency generation [3], clocking of peer-to-peer networks and radio-communications [4], [5], distributed frequency generation [6]–[8] and others.
Clock generation [9] remains a key challenge in the implementation of high-performance, reliable Systems-on-Chip (SoCs). Indeed, the saturation of clock frequency growth is strongly related to the difficulty of distributing a clock signal over a large chip and to its energy cost. Because the power consumption increases nonlinearly with the frequency of a clock generator, it severely constrains the synthesis of gigahertz frequencies [10]. Centralized frequency distribution requires chip-wide feedback links for the control of the generated clock. This limits scalability as the size and functionality of SoCs increase.

Dimitri Galayko, Chuan Shan, Eldar Zianbetov, Mohammad Javidan and François Anceau are with Sorbonne Université, LIP6, F-75005, Paris, France.
Anton Korniienko is with Laboratoire Ampère, École Centrale de Lyon, France.
Olivier Billoint and Éric Colinet are with CEA-LETI, Grenoble, France.
Elena Blokhina is with University College Dublin, Ireland.
Jérôme Juillard is with Central Supélec, Gif-sur-Yvette, France.
This work was supported by the HERODOTOS grant number ANR-10-SEGI-014 from the French National Agency of Research (ANR) and by Enterprise Ireland grant JRNET number CF-2018-0872-P.

[Figure 1: diagram of the 10×10 network with external reference REF, nodes N1.1 to N10.10 and detector PFD0. Legend: phase-frequency detector; oscillator and its PI corrector; oscillator whose output is routed out of chip. PFDs routed out of chip: N1.9-N1.10, N1.6-N1.7, N10.9-N10.10.]
Fig. 1. Architecture of the implemented ADPLL network for clock generation. The placement of Phase-Frequency Detectors (PFDs) and Digitally Controlled Oscillators (DCOs) is shown.
In active distributed clocking, clock signals are re-generated for each clock domain, whose size is typically 200-300 thousand gates. Inside each clock domain, the clock signal is distributed by a conventional clock tree of moderate size. Global synchronization between clock domains is achieved by coupling the local clock sources through a network of phase-locked loops (PLLs). Previous implementations of active distributed clocking, such as a 4×4 network of analog coupled PLLs [11], resonant clocking [12] or oscillators coupled by injection through magnetic links [13], were based on analog techniques. Their typical issues are sensitivity to PVT variations, low compatibility with the digital design flow, lack of scalability and difficulty of reconfiguration. All-digital coupled oscillators therefore appear to be a promising alternative. The on-chip network of 4×4 nodes reported in Refs. [6], [14] validated the feasibility of such an approach.
This study presents the design, implementation and measurement of a large network of all-digital PLLs (ADPLLs) for the synchronization of digitally controlled oscillators and the generation of distributed clock signals for SoCs. We describe the implementation and measurement results of a network of 10×10 synchronized oscillators in the 65 nm CMOS technology of ST Microelectronics. The goal of the study is to demonstrate the scalability of this clocking solution with an increased number of oscillators, to verify the feasibility of a large, globally synchronized ADPLL network and to test its performance. The operation of the network in autonomous mode, without an external reference driving the network, is also investigated.

[Figure 2: block diagram of Node x,y (distributed ADPLL). Four PFDs compare the local divided clock with the clocks of the neighbors CLKx,y-1, CLKx+1,y, CLKx-1,y and CLKx,y+1; an error combiner forms the total error, which feeds the digital PI corrector and the DCO. The DCO output CLKx,y (local high-frequency clock) goes to the local clock tree and to a divider (%4) whose output (local divided clock) goes to the PFDs of neighboring nodes; a further divider (%2) leads to the output pads for testing. Inset, digital PFD: bang-bang detector and Vernier TDC, multiplier, two's-complement phase error value.]
Fig. 2. Architecture of one node (a single ADPLL) of the implemented network. It contains up to four phase-frequency detectors (PFDs), a proportional-integral (PI) controller and a digitally controlled oscillator (DCO).
II. SYSTEM ARCHITECTURE
The implemented clock generator is a Cartesian 10×10 network of distributed oscillators (Fig. 1). 180 Phase-Frequency Detectors (PFDs) measure the phase error between each pair of neighboring oscillators. The node ‘N1.1’ is also connected through an additional PFD to an external reference signal, which sets the frequency of the whole network. All the connections of the network are programmable, so that the topology of the network may be reconfigured dynamically. The external reference may be disconnected, and the network may then operate in autonomous mode.
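The detector count follows directly from the grid topology. As a minimal sketch (the function names are illustrative and are not taken from the design), the number of neighbor links in a Cartesian grid, plus the one detector on the reference link, can be counted as follows:

```python
def count_pfds(rows: int, cols: int, with_reference: bool = True) -> int:
    """Count phase-frequency detectors in a Cartesian grid of oscillators.

    One PFD sits on every link between horizontally or vertically adjacent
    nodes; one more optionally links a node to the external reference.
    """
    horizontal_links = rows * (cols - 1)
    vertical_links = cols * (rows - 1)
    return horizontal_links + vertical_links + (1 if with_reference else 0)


def neighbor_count(row: int, col: int, rows: int, cols: int) -> int:
    """Number of grid neighbors of node (row, col), using 1-based indices."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return sum(1 <= r <= rows and 1 <= c <= cols for r, c in candidates)
```

For the 10×10 network this gives 180 neighbor links and 181 detectors in total, and corner, edge and interior nodes have 2, 3 and 4 neighbors respectively.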
Figure 2 presents the structure of one node of the imple-
mented network. The Digitally Controlled Oscillator (DCO)
is a 7-stage ring oscillator with CMOS inverters, whose
frequency is controlled by a matrix of 7 × 9 three-state
inverters, providing 256 frequency steps and occupying a total
area of 50×50 µm2. The chosen DCO architecture is highly
regular and suitable for integration using EDA tools [15].
The choice of the DCO output frequency range of 700-840
MHz is a compromise between the frequency practically useful
for applications (1-2 GHz typical clock frequency in IoT
electronics) and the cost of implementation and testing of a
laboratory prototype.
The distributed synchronization of the oscillators is achieved by an array of digital phase-frequency detectors, each implemented as a combination of a bang-bang (BB) phase detector and a Time-to-Digital Converter (TDC) [6], [14]. The DCOs are controlled by digital Proportional-Integral (PI) controllers. The design contains 100 DCOs, 100 PI controllers (correctors) and 181 PFDs, with areas of 50×50 µm2, 100×70 µm2 and 55×30 µm2 each, respectively.
Each PFD is composed of a bang-bang detector, measuring the sign of the phase error [15], and a 3-bit TDC, measuring the magnitude of the phase error [14]. Overall, the PFD provides a 4-bit signed phase error signal, covering a range of ±80 ps at VDD = 1.1 V. Figure 3 shows transistor-level simulations of the input-output characteristics of the implemented PFD for different VDD voltages. In order to improve the accuracy of synchronization, the TDC employs the Vernier architecture, in which the time step is defined by the difference between the delays of two cells. Compared to Ref. [6], this yields a time resolution of 16 to 20 ps at VDD = 1.1 V for small phase errors, which is finer than the smallest buffer delay (30 ps in 65 nm CMOS). The design is also very robust to VDD variations within a 20% range (i.e., VDD = 1.0-1.2 V). This robustness is explained by the closed-loop architecture inherent in PLL design.
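The sign-plus-magnitude principle of the detector can be illustrated with a short behavioral sketch. It is not the implemented circuit: the uniform step, the encoding (codes from ±1 to ±8, never zero) and the cell delays used in the usage note are assumptions made for illustration, and the real characteristic is the simulated one of Fig. 3.

```python
def vernier_step_ps(slow_cell_delay_ps: float, fast_cell_delay_ps: float) -> float:
    """Time step of a Vernier delay line: difference between the two cell delays."""
    return slow_cell_delay_ps - fast_cell_delay_ps


def pfd_code(time_error_ps: float, step_ps: float, magnitude_bits: int = 3) -> int:
    """Sign-magnitude code for a time error, saturating at the TDC range.

    Assumed encoding (hypothetical): the bang-bang detector always reports a
    sign, so the code is never zero, and the magnitude runs from 1 to
    2**magnitude_bits with a uniform step.
    """
    sign = 1 if time_error_ps >= 0 else -1      # bang-bang part: sign only
    levels = 2 ** magnitude_bits                # number of magnitude levels
    magnitude = min(int(abs(time_error_ps) // step_ps) + 1, levels)
    return sign * magnitude
```

With hypothetical cell delays of 50 ps and 30 ps, `vernier_step_ps(50, 30)` gives a 20 ps step, finer than either cell delay; `pfd_code(5, 20)` then returns 1 and `pfd_code(-25, 20)` returns -2.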
The Proportional-Integral controller (corrector) is a conventional controller receiving a weighted sum of the errors arriving from the neighboring DCOs [6], but with shorter registers and hence a smaller area (100×70 µm2 each). Each controller has four inputs and several programmable features. Firstly, the weights of the inputs (W1-W4 in Fig. 2) can be set to 0, 1, 2 or 4. A zero weight corresponds to the case when a node is disconnected or the connection does not exist (for example, the peripheral nodes have only 2 or 3 neighbors). Secondly, the PI controller has programmable gains for the integral and proportional paths. These gains can be optimized and selected to ensure synchronization. The PI controller is clocked by the local DCO signal divided by 4. The same divided signal is applied to the inputs of the PFDs, thus achieving the coupling between clock domains. The relatively large size of the PI controller is due to these programmable functions, implemented for testing purposes; its area may be further reduced by 30-50%.
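The error combiner and the PI corrector described above can be sketched behaviorally as follows. This is an illustration under stated assumptions, not the implemented register-transfer-level design: the integer gains, the centre code of 128, the 0..255 DCO code range (chosen to match the 256 frequency steps) and the sign convention are all hypothetical.

```python
from dataclasses import dataclass

ALLOWED_WEIGHTS = (0, 1, 2, 4)  # programmable input weights named in the text


@dataclass
class PINode:
    kp: int                        # proportional gain (hypothetical integer value)
    ki: int                        # integral gain (hypothetical integer value)
    weights: tuple = (1, 1, 1, 1)  # W1..W4, one per neighbor PFD
    integrator: int = 0            # accumulated integral term

    def __post_init__(self) -> None:
        if len(self.weights) != 4 or any(w not in ALLOWED_WEIGHTS for w in self.weights):
            raise ValueError("weights must be four values from (0, 1, 2, 4)")

    def total_error(self, pfd_codes) -> int:
        """Error combiner: weighted sum of the four signed PFD codes."""
        return sum(w * e for w, e in zip(self.weights, pfd_codes))

    def step(self, pfd_codes, center_code: int = 128) -> int:
        """One controller update; returns a DCO code clamped to 0..255."""
        error = self.total_error(pfd_codes)
        self.integrator += self.ki * error
        raw = center_code + self.kp * error + self.integrator
        return max(0, min(255, raw))
```

A corner node would use two zero weights, for example `PINode(kp=2, ki=1, weights=(1, 1, 0, 0))`, so that the two missing links do not contribute to the total error.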
In a real application, when the proposed network is used for clock generation, the distance between two nodes will introduce delays associated with the PFDs. Since the PI corrector is a standard synchronous digital circuit, these delays can be accounted for by proper timing design at the register transfer level. In addition, the loss of stability in the network due to delays may be compensated by a proper choice of the PI controller coefficients, as discussed in Sec. III.

[Figure 3: simulated transfer characteristic of the PFD for VDD = 1.0 V, 1.1 V and 1.2 V. Horizontal axis: time error between the input clocks, ps (-120 to 120). Vertical axis: PFD digital output, in units of PFD resolution steps (-8 to 10).]
Fig. 3. Transistor-level simulations: characteristics of the implemented phase-frequency detector for three different VDD. The simulations are performed using the Cadence Spectre circuit simulator.
III. FOUNDATIONS OF PLL NETWORK
SYNCHRONIZATION AND PERFORMANCE
The distributed synchronous clocking approach was sug-
gested in [16]. This study demonstrated that a network of
coupled PLLs was able to provide clock signals to physically
distant parts of a computing system. However, due to particular
features of the phase-frequency detector used, the design
suffered from “mode-locks” (multiple coexisting stable modes
with synchronicity in frequency but not in phase). The first
proof-of-concept of PLL networks was carried out in [11]. The
network in that study was made of 16 distributed oscillators
operating at 1.3 GHz fabricated in 0.35 µm CMOS technology.
The new wave of distributed frequency generation has been
based on all-digital PLLs with successful designs demonstrated
in [6], [17].
There is a deep and rigorous theory underlying the synchronization process in PLL/ADPLL networks. This theory has been developed substantially over recent years, so that it has become possible to treat such complex systems analytically, despite their mixed analog-digital nature and self-sampling operation. The first advance in this regard was presented in [18], where an equivalent linear time-invariant discrete-time system was proposed. In Ref. [19], a design methodology using a convex optimization approach and involving simple linear matrix inequality constraints was developed. Study [20] introduced a novel nonlinear event-driven discrete-time ADPLL model that does not rely on the simplifications typical of ADPLL modelling. The proposed model was then used in [5], [21] to demonstrate the global stability and synchronization of ADPLL networks. Summarizing the recent research, we outline the following:
• The worst-case synchronization error between two neighbors in a network is equal to or less than the sum of the first two resolution steps of the PFD. According to the characteristic shown in Fig. 3, this corresponds to 38 ps at VDD = 1.1 V.
• Several studies have emphasized the possibility of undesirable synchronization modes (mode-locks) in analog PLL networks. The implemented 10×10 oscillator network did not display mode-locks in hundreds of test runs in different configurations with different initial conditions. This is a clear advantage of an ADPLL network over its analog counterpart.
IV. CHIP MEASUREMENT
A photograph of the chip, fabricated in the 65 nm CMOS technology of ST Microelectronics, is shown in Fig. 4. The chip has a single supply for all the blocks of the network. For testing purposes, some signals are routed off-chip, as indicated in Fig. 1. The power consumption of the system as a function of the frequency of the input reference signal at different supply voltages is given in Fig. 5, highlighting the frequency lock-in range of the network for different VDD. The DCO power consumption dominates the overall node consumption of ≈2.7 µW/MHz per node at VDD = 1.1 V.

[Figure 4: chip photograph with dimension labels 1810 and 1820 (units not recoverable); the chip core holds the 10×10 ADPLL network, with one node highlighted.]
Fig. 4. Photograph of the implemented ADPLL network-on-chip.
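The per-node energy figure can be cross-checked against the total consumption listed for this work in Table I (202 mA at 1.1 V and 816 MHz, for 100 nodes). The sketch below is only such an arithmetic check; the function name is illustrative.

```python
def power_per_mhz_per_node_uw(current_ma: float, vdd_v: float,
                              freq_mhz: float, nodes: int) -> float:
    """Return the consumed power in microwatts per MHz per node."""
    total_power_mw = current_ma * vdd_v          # mA x V = mW
    total_power_uw = total_power_mw * 1000.0     # mW -> uW
    return total_power_uw / (freq_mhz * nodes)


# Values taken from the "This work" column of Table I.
this_work = power_per_mhz_per_node_uw(current_ma=202, vdd_v=1.1,
                                      freq_mhz=816, nodes=100)
print(f"{this_work:.2f} uW/MHz per node")
```

The total of about 222 mW spread over 100 nodes at 816 MHz gives roughly 2.7 µW/MHz per node, in agreement with the value quoted in the text.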
The synchronization of the network has been characterized by two methods. The first method is the observation of the digital output of three (out of a total of 181) PFDs, see Fig. 1. Since the PFDs are implemented on-chip, this provides a precise measurement, free of parasitics, of the phase error between two neighboring nodes. Figure 6(a) presents the mean and the root mean square (RMS) of the digital output of the three PFDs versus the supply voltage. The mean error is under 0.3 PFD resolution step (≈6 ps). The RMS of the phase error is close to unity. As shown by the example of the PFD between nodes N1.6 and N1.7 (Figure 6(b) and (c)), the output of the PFD is ±1 during 94% of the time and ±2 during the remaining time. A PFD output of ±1 means that the error is below 38 ps at VDD = 1.1 V and 42 ps at VDD = 1.0 V. Figure 6(d) presents the spectrum of the phase error noise directly measured on routed-off-chip oscillator outputs and the spectrum of the phase noise of oscillator N1.6. The precision of synchronization can be improved by increasing the PFD resolution.

[Figure 5: measured power consumption versus frequency of the input reference signal for several VDD. Annotation: "Saturation due to reached limit of the DCO tuning range".]
Fig. 5. Chip measurement: power consumption as a function of the frequency of the input reference signal for different supply voltages VDD.
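The statistics quoted above for the N1.6-N1.7 detector can be reproduced from a code histogram. The sketch below assumes that the bar labels recovered from the histogram panel of Fig. 6 (0.01, 0.4, 0, 0.54, 0.04) correspond, in order, to the output codes -2 to 2; that mapping is an assumption, and the labels are rounded, so they sum to 0.99 and are renormalized here.

```python
import math

# Assumed mapping of the recovered bar labels to PFD output codes.
histogram = {-2: 0.01, -1: 0.40, 0: 0.00, 1: 0.54, 2: 0.04}


def code_statistics(hist: dict) -> tuple:
    """Return (mean, rms, fraction of samples at +/-1) of a code histogram."""
    total = sum(hist.values())                   # renormalize rounded labels
    mean = sum(code * p for code, p in hist.items()) / total
    rms = math.sqrt(sum(code * code * p for code, p in hist.items()) / total)
    frac_pm1 = (hist.get(1, 0.0) + hist.get(-1, 0.0)) / total
    return mean, rms, frac_pm1


mean, rms, frac_pm1 = code_statistics(histogram)
print(f"mean = {mean:.2f} steps, RMS = {rms:.2f} steps, +/-1 share = {frac_pm1:.0%}")
```

Under this assumption the mean stays below 0.3 resolution step, the RMS is close to one step and the detector sits at ±1 for roughly 94-95% of the samples, which is consistent with the figures given in the text.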
The second method is an off-chip measurement of the outputs of 27 (out of a total of 100) DCOs (see again Fig. 1) after frequency division by 8. The large and variable length of the bonding wires (5 to 7 mm) and of the PCB routing, combined with the high power consumption of the pad ring, makes this method less reliable for characterizing the real on-chip phase error. Figure 7 presents the off-chip characterization of the phase error statistics between the clocks generated on the main diagonal (between nodes (1,1) and (i,i), where i = 1, ..., 10), obtained when the network is fully autonomous, at an output frequency of 720.8 MHz at VDD = 1.1 V. A ‘proportional-to-the-distance’ trend is clearly observed, with a maximal mean error of 943 ps, which is 68% of the clock period. The standard deviation of the error, almost constant and above 300 ps in most cases, is mainly attributed to jitter caused by noise on the pad supply.
The linear scaling of the absolute phase error with the distance between the nodes may be explained as follows. In synchronous mode, an ADPLL or an ADPLL network acts as a linear control system with a transfer function in the z-domain and a constant delay. Once synchronized, the network behaves as a network of linear systems in which the phase error propagates linearly. The total observed phase error therefore depends on the number of nodes along the path under observation.
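The numbers behind the distance trend can be checked with a few lines of arithmetic. The sketch below recomputes the clock period at 720.8 MHz and the share of the period taken by the 943 ps maximal mean error; the per-hop average additionally assumes that this maximum occurs for the farthest diagonal pair, (1,1) and (10,10), which the text does not state, and it includes the off-chip measurement artefacts discussed above.

```python
def clock_period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


def manhattan_distance(a: tuple, b: tuple) -> int:
    """Number of neighbor-to-neighbor hops between two grid nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


period = clock_period_ps(720.8)                 # about 1387 ps
fraction = 943.0 / period                       # share of the clock period
hops = manhattan_distance((1, 1), (10, 10))     # farthest pair on the diagonal
per_hop = 943.0 / hops                          # assumed: maximum at (10,10)
print(f"period = {period:.0f} ps, fraction = {fraction:.0%}, "
      f"hops = {hops}, apparent error per hop = {per_hop:.1f} ps")
```

The ratio comes out at about 68% of the clock period, as stated, and the farthest diagonal pair is 18 hops apart.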
The stability of network operation in autonomous mode over time is demonstrated in Fig. 8. The network is first driven by an external reference signal in order to set the frequency at a desired value (at t = 0). The external signal is then disconnected by reprogramming the network topology. After that, the network is observed for 1700 seconds. The plot demonstrates excellent frequency stability (with less than 0.09% max-to-min noise-like fluctuations) and high robustness of synchronization between neighboring nodes (with a mean error of less than 3 ps, obtained through the routed-off-chip measurement described above).

[Figure 6: four panels, (a) to (d). Recovered panel titles and labels: "Neighboring errors in 3 points on the chip, against core supply voltage VDD" (mean and RMS values for N1.6-N1.7, N1.9-N1.10 and N10.9-N10.10, 524000 samples for each point; horizontal axis: core supply voltage VDD, V, from 1 to 1.2; vertical axis: PFD digital output, in units of PFD resolution steps); "PFD output statistic, N1.6-N1.7 @ 1.1V" (histogram over PFD output codes -2 to 2, bar labels 0.01, 0.4, 0, 0.54, 0.04); "PFD output sample, N1.6-N1.7 @ 1.1V" (code versus time, s). An axis label "External reference frequency used in the experiment, MHz" also appears.]
Fig. 6. Chip measurement: (a) Statistics of the output of the three observable PFDs versus the supply voltage (left plot); (b) Power spectral density (PSD) of selected phase errors and the phase noise of oscillator N1.6; (c) and (d) statistics of the output of a PFD.

[Figure 7: "Phase error between remote nodes: nodes (1,1) and (i,i), off-chip measurements". Horizontal axis: Manhattan distance between network nodes (0 to 20), with node pairs N1.1-N2.2 to N1.1-N10.10; vertical axis: absolute phase error between clocks, ps (100 to 1100). Series: absolute mean phase error; standard deviation of the error. Each point: statistics on 1800 periods.]
Fig. 7. Chip measurement: phase error between nodes (1,1) and (i,i) against the Manhattan distance between the nodes. The measurement was carried out on routed-off-chip DCO signals.
The proposed network design is compared with existing analog active clocking techniques [11], [13], conventional H-tree networks [22] and the authors' previous studies [6] in Table I. The size of the implemented network and the power consumed per node are among the best, except for [13], where a sophisticated inductive coupling is used. Compared to the authors' previous implementation of the ADPLL network, the presented network is significantly larger (100 vs 16 nodes), has 2.7 times lower power consumption per node and per MHz, and has a 3 times smaller area per node. The choice of a relatively low clock frequency (800 MHz) is mainly due to a compromise with regard to the complexity of design and test of the chip in laboratory conditions. A migration of the design to a more recent CMOS technology will naturally lead to an increased clock speed, matching the specifications of modern Systems-on-Chip. High scalability and complete compatibility with the conventional digital design flow are achieved at the price of a larger peak error between neighboring nodes (38 ps) than in state-of-the-art analog solutions (10 ps).

[Figure 8: "Stability of the network in autonomous mode: error between neighboring clocks and the network frequency". Horizontal axis: time elapsed from the switching into autonomous mode, s (-200 to 1800). Vertical axes: output frequency of the network, MHz (758.60 to 761.40); mean phase error between neighboring clocks, ps (-1 to 15), nodes N1.6-N1.7. Regions: initialization with external reference; operation in autonomous mode.]
Fig. 8. Chip measurement: operation in autonomous mode with the PFD-0 link deactivated (see Fig. 1). The phase error is measured on the routed-off-chip DCO signals with statistics calculated over 1800 periods.

TABLE I. COMPARISON OF THE PRESENTED STUDY WITH STATE-OF-THE-ART IMPLEMENTATIONS OF SIMILAR SYSTEMS.
Parameter | Gutnik et al. [11] | Take et al. [13] | Pavlidis et al. [22] | Prev. authors' work [6], [14] | This work
CMOS technology | 350 nm | 180 nm | 180 nm | 65 nm | 65 nm
Network size | 4×4 nodes | 2×64 nodes (3D topology) | 3×16 nodes (3D topology) | 4×4 nodes | 10×10 nodes
Oscillator types | Ring VCO | LC and CMOS ring VCOs | H-tree | Ring DCO | Ring DCO
Frequency range | 1.1-1.3 GHz | 1.1 GHz | 1 GHz | 1100-2380 MHz @ 1.1 V | 700-840 MHz @ 1.1 V
Errors between neighbors | 10 ps peak | ~10 ps | Not available | <60 ps peak | <38 ps peak
Error between distant nodes | N/A | <25 ps, prop. to distance | 32.5 ps | N/A (not relevant for small network) | prop. to distance, <960 ps
Total consumed power | 130 mA @ 3 V | 196 mW | 260 mW @ VDD = 1.5 V | 180 mW @ 1.6 GHz | 202 mA @ 1.1 V @ 816 MHz
Consumed power, µW per MHz per node | 18.75 | 1.4 | 5.4 | 7.3 | 2.7
Area per node | 0.0038 mm2 | 0.04 mm2 | Not available | 0.045 | 0.013 mm2
Implementation | Analog | Analog/digital | Analog | Digital | Digital
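The improvement factors quoted in the text and in the abstract can be recomputed from the per-node rows of Table I. The sketch below is only an arithmetic check; the dictionary keys are illustrative labels, and the area of the previous work is taken as listed (0.045, assumed here to be in mm2 like the other entries).

```python
# "Consumed power, uW per MHz per node" row of Table I.
power_uw_per_mhz = {
    "Gutnik [11]": 18.75,
    "Take [13]": 1.4,
    "Pavlidis [22]": 5.4,
    "Previous work [6],[14]": 7.3,
    "This work": 2.7,
}

# "Area per node" row, for the two ADPLL networks only (mm2 assumed for both).
area_mm2 = {"Previous work [6],[14]": 0.045, "This work": 0.013}


def ratio_to_this_work(row: dict, key: str) -> float:
    """How many times larger the entry `key` is than the 'This work' entry."""
    return row[key] / row["This work"]


for name in ("Gutnik [11]", "Pavlidis [22]", "Previous work [6],[14]"):
    print(f"{name}: {ratio_to_this_work(power_uw_per_mhz, name):.1f}x the energy cost")
print(f"area ratio: {ratio_to_this_work(area_mm2, 'Previous work [6],[14]'):.1f}x")
```

The ratios come out at about 6.9 for the analog PLL network [11], 2.0 for the H-tree [22] and 2.7 for the previous ADPLL network, matching the factors of 7, 2 and 2.7 given in the paper; the area ratio is about 3.5, which the text rounds to 3.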
V. CONCLUSIONS
The paper presented a first implementation of a very large network of coupled all-digital PLLs integrated on a chip in 65 nm CMOS technology. Compared to existing implementations of smaller networks, the quality of the neighbor-to-neighbor synchronization in the presented network is maintained at the same level, under 2 phase-frequency detector steps. This proves the scalability of the proposed architecture and its suitability for global clock generation in large systems-on-chip. The analysis of transistor-level simulations and the chip characterization reinforce the idea that the design of the ADPLL blocks may be optimized to improve the performance of such a network substantially. The accuracy of synchronization may approach that of analog solutions if the resolution of the time-to-digital converter is increased. The size and the power consumption of a node may be improved by using an alternative DCO design, for instance, a current-controlled voltage-controlled oscillator. The ability of the proposed network to operate in autonomous mode, while providing a fully digital control of the network configuration, is an advantage of the proposed clocking technique over previously reported analog architectures.
REFERENCES
[1] J. Torrejon, M. Riou, F. A. Araujo, S. Tsunegi, G. Khalsa, D. Querlioz,
P. Bortolotti, V. Cros, K. Yakushiji, A. Fukushima et al., “Neuromorphic
computing with nanoscale spintronic oscillators,” Nature, vol. 547, no.
7664, p. 428, 2017.
[2] G. Causapruno, F. Riente, G. Turvani, M. Vacca, M. R. Roch, M. Zam-
boni, and M. Graziano, “Reconfigurable systolic array: from architec-
ture to physical design for NML,” IEEE Trans. on Very Large Scale
Integration (VLSI) Systems, vol. 24, no. 11, pp. 3208–3217, 2016.
[3] S. A.-R. Ahmadi-Mehr, M. Tohidian, and R. B. Staszewski, “Analysis
and design of a multi-core oscillator for ultra-low phase noise,” IEEE
Trans. on Circuits and Systems I: Regular Papers, vol. 63, no. 4, pp.
529–539, 2016.
[4] E. Gantsog, A. B. Apsel, and F. Lane, “A quantized pulse coupled
oscillator for slow clocking of peer-to-peer networks,” in 2015 IEEE
Int. Symposium on Circuits and Systems (ISCAS), 2015, pp. 1314–1317.
[5] E. Koskin, D. Galayko, and E. Blokhina, “A concept of synchronous
ADPLL networks in application to small-scale antenna arrays,” IEEE
Access, vol. 6, pp. 18 723–18 730, 2018.
[6] E. Zianbetov, D. Galayko, F. Anceau, M. Javidan, C. Shan, O. Billoint,
A. Korniienko, E. Colinet, G. Scorletti, J. Akre et al., “Distributed
clock generator for synchronous SoC using ADPLL network,” in IEEE
Custom Integrated Circuits Conference (CICC), 2013, pp. 1–4.
[7] R. Islam and M. R. Guthaus, “HCDN: Hybrid-mode clock distribution
networks,” IEEE Trans. on Circuits and Systems I: Regular Papers,
vol. 66, no. 1, pp. 251–262, 2019.
[8] Z. Bai, X. Zhou, R. D. Mason, and G. Allan, “Low-phase noise clock
distribution network using rotary traveling-wave oscillators and built-in
self-test phase tuning technique,” IEEE Trans. on Circuits and Systems
II: Express Briefs, vol. 62, no. 1, pp. 41–45, 2014.
[9] L. Xiu, “Clock technology: The next frontier,” IEEE Circuits and
Systems Magazine, vol. 17, no. 2, pp. 27–46, 2017.
[10] P. E. Ross, “Why CPU frequency stalled,” IEEE Spectrum, vol. 45,
no. 4, pp. 72–72, 2008.
[11] V. Gutnik and A. Chandrakasan, “Active GHz clock network using
distributed PLLs,” IEEE Journal of Solid-State Circuits, vol. 35, no. 11,
pp. 1553–1560, 2000.
[12] F. Mahony, “10 GHz clock distribution using coupled standing-wave
oscillators,” IEEE Int. Solid-State Circuits Conference, 2003.
[13] Y. Take, N. Miura, H. Ishikuro, and T. Kuroda, “3D clock distribution
using vertically/horizontally-coupled resonators,” in IEEE Int. Solid-
State Circuits Conference, 2013, pp. 258–259.
[14] M. Javidan, E. Zianbetov, F. Anceau, D. Galayko, A. Korniienko,
E. Colinet, G. Scorletti, J. Akre, and J. Juillard, “All-digital PLL array
provides reliable distributed clock for SOCs,” in IEEE Int. Symposium
on Circuits and Systems (ISCAS), 2011, pp. 2589–2592.
[15] J. Tierno, A. Rylyakov, and D. Friedman, “A wide power supply range,
wide tuning range, all static CMOS all digital PLL in 65 nm SOI,”
IEEE Journal of Solid-State Circuits, vol. 43, no. 1, pp. 42–51, 2008.
[16] G. Pratt and J. Nguyen, “Distributed synchronous clocking,” IEEE
Trans. on Parallel and Distributed Systems, vol. 6, no. 3, pp. 314–328,
1995.
[17] C. Shan, D. Galayko, F. Anceau, and E. Zianbetov, “A reconfigurable
distributed architecture for clock generation in large many-core SoC,”
in Int. Symposium on Reconfigurable and Communication-Centric
Systems-on-Chip (ReCoSoC), 2014, pp. 1–8.
[18] J.-M. N. Akré, J. Juillard, D. Galayko, and E. Colinet, “Synchronization
analysis of networks of self-sampled all-digital phase-locked loops,”
IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 59, no. 4,
pp. 708–720, 2012.
[19] A. Korniienko, G. Scorletti, E. Colinet, E. Blanco, J. Juillard, and
D. Galayko, “Control law synthesis for distributed multi-agent systems:
Application to active clock distribution networks,” in Proceedings of the
American Control Conference, 2011, pp. 4691–4696.
[20] E. Koskin, E. Blokhina, C. Shan, E. Zianbetov, O. Feely, and
D. Galayko, “Discrete-time modelling and experimental validation of
an all-digital PLL for clock-generating networks,” in IEEE Int. New
Circuits and Systems Conference (NEWCAS), 2016, pp. 1–4.
[21] E. Koskin, D. Galayko, and E. Blokhina, “Averaging techniques for
the analysis of event driven models of All Digital PLLs,” in IEEE Int.
Symposium on Circuits and Systems (ISCAS), 2018, pp. 1–5.
[22] V. F. Pavlidis, I. Savidis, and E. Friedman, “Clock distribution net-
works for 3-D integrated circuits,” in IEEE Custom Integrated Circuits
Conference (CICC), 2008, pp. 651–654.
