Measuring, monitoring, and maintaining timing at large and small scales by Gantsog, Enkhbayasgalan
MEASURING, MONITORING, AND MAINTAINING TIMING AT LARGE AND SMALL 
SCALES 
 
 
 
 
 
 
 
 
A Dissertation 
Presented to the Faculty of the Graduate School 
of Cornell University 
In Partial Fulfillment of the Requirements for the Degree of 
Doctor of Philosophy 
 
 
 
 
 
 
by 
 Enkhbayasgalan Gantsog 
December 2017 
  
  
 
 
 
 
 
 
 
 
 
 
© 2017 Enkhbayasgalan Gantsog  
ALL RIGHTS RESERVED 
 
MEASURING, MONITORING, AND MAINTAINING TIMING AT LARGE AND SMALL 
SCALES 
 
 Enkhbayasgalan Gantsog, Ph.D. 
Cornell University 2017 
 
This thesis explores techniques for measuring, monitoring and maintaining timing 
at small and large scales. At small scales, timing non-idealities of clock signals are of 
interest. As clock speeds rise in modern circuits, non-idealities such as clock 
duty-cycle, clock skew and jitter become proportionally large. Low-power on-chip 
characterization of the clock is therefore important. A stochastic technique for 
on-chip measurement of such non-idealities is introduced. The technique uses a simple 
noisy oscillator to perform random sampling, allows easy integration in a CMOS 
process and is a promising alternative to direct measurement. Theoretical analysis 
proving the accuracy and robustness of the technique is presented. An implementation 
in a 65 nm CMOS process, occupying an active area of 0.015 mm² and consuming 
0.89 mW, achieves a root mean square error of 0.1 ps and 0.31 ps in externally 
referenced and self-referenced jitter measurements, respectively. To the best of our 
knowledge, the stochastic technique is the only fully on-chip jitter measurement 
technique that does not require post-processing to obtain the jitter amplitude.  
At large scales, this work explores a technique to achieve and maintain low-power 
synchronization of a long-range peer-to-peer (P2P) RF system. Once synchronized, 
radio nodes can achieve significant power savings by turning off the RF front-end 
most of the time. Such aggressive duty-cycling allows battery-operated radios to 
communicate directly over long range, enabling a variety of applications, such as IoT 
devices that do not strain the existing infrastructure and communication in natural 
disaster scenarios where infrastructure is unavailable. Existing synchronization 
techniques for narrowband radio do not scale to a large number of nodes and are 
often asymmetric (e.g. they require one central node that consumes high power). To 
solve this problem of scalable, long-range, P2P narrowband radio synchronization, a 
low-power signal processor is presented that uses the pulse coupled oscillator (PCO) 
scheme and detects a syncword with low latency to allow aggressive duty-cycling. The 
signal processor is insensitive to phase and frequency mismatch and compatible with 
commercial RF front ends. It consumes 5.1 µW in 0.01% duty-cycled mode while 
detecting a 63-bit syncword at 1.25 Mbps with BER = 10⁻³ at SNR = 18.4 dB. 
BIOGRAPHICAL SKETCH 
 Enkhbayasgalan Gantsog received the B.S. degree (highest honors) in electrical 
engineering with a minor in business from Lehigh University, Bethlehem, PA, in 
2011. He joined Dr. Alyssa Apsel’s group at Cornell University, Ithaca, NY, in 2012 
to pursue the Ph.D. degree in electrical engineering. 
His research interests include analog and mixed-signal circuit design in high-
precision timing measurements and synchronized RF mesh-network systems. 
 
 
 
 
 
  
 
 
 
 
 
 
 
 
 
 
 
To My Late Parents 
ACKNOWLEDGMENTS 
Like many things in life, hard work alone is not enough to conduct good research 
and obtain a PhD degree. It also takes guidance from mentors, help from colleagues, 
support from friends and family, and a bit of luck. During my time at Cornell, I was 
fortunate enough to meet many people who directly and indirectly helped me reach 
this final point of my PhD career. 
First and foremost, I would like to thank my advisor, Dr. Alyssa Apsel. She has 
taught me how to think creatively, solve open-ended problems, and learn the ins and 
outs of conducting research. She has also been a very supportive and understanding 
advisor. Knowing that my advisor would always back me when things outside of my 
control occur is a great feeling to have. 
I would also like to extend my gratitude to my committee members, Dr. Alyosha 
Molnar and Dr. Edwin Chihchuan Kan. Their insights and comments on my research 
have been very valuable. I only wish I had sought their expertise at an earlier stage of 
my second research project. I am also grateful to Dr. Ehsan Afshari for teaching me 
the basics of analog circuit design and microwave theory and taking the time to sit on 
my Q exam. 
I extend thanks to the past and current members of Apsel group and the other 
circuits groups for providing a fertile environment for doing research. Special thanks 
go to Ivan Bukreyev, who did a tremendous amount of work for the PCO project. 
Without his help, I would not have been able to finish this project. I also thank Dr. 
Xiao Wang for taking so much time out of his busy life to provide interesting and 
stimulating discussions.  
I was very fortunate that my second project was in collaboration with and funded 
by Qualcomm. I thank Frank Lane from Qualcomm (now at Mixcomm, Inc.) for his 
valuable comments and discussion about the research project as well as his mentorship. 
I have many friends in Ithaca and all around the world that I would like to thank 
for making my time at Cornell a positive experience, but it would be impossible to list 
all of their names here. Special thanks to Ana, Ivan Stoev, Oat and Michael (in 
alphabetical order) for being there during the toughest periods of my life.  
Last but not least, I would like to thank my family for their love. The memories of 
my late father’s work ethic and courage inspired me to push myself when I hit a wall. I 
thank my late mother for her unconditional love and support. I am also grateful to my 
sister for her love and encouragement. She is my angel. I thank my brother for 
stepping up to take the responsibility for family obligations back home. Knowing that 
he was there and making the best decisions, I was able to focus on my studies at 
Cornell.  
TABLE OF CONTENTS 
Chapter 1 
1.1 On-chip measurement of clock non-idealities 
1.2 Low-power and scalable synchronization of long-range P2P RF system 
Chapter 2 
2.1 Background 
2.2 Proposed stochastic measurement technique 
2.2.1 Delay and duty-cycle measurement 
2.2.2 Jitter measurement 
2.3 Theory 
2.3.1 Engineering the oscillator cycle jitter to be Gaussian 
2.3.2 From Gaussian jitter to uniform sampling 
2.3.3 Error in uniform sampling and the output error in delay measurement mode 
2.3.4 Theoretical description of the jitter measurement process 
2.3.5 Measurement of clock jitter 
2.3.6 Measurement error due to the internal noise of the circuit and the REF CLK jitter 
2.4 Appendix 
2.4.1 Counter output when the clock jitter has bimodal distribution 
2.4.2 Counter output when the clock jitter is periodic 
Chapter 3 
3.1 Overall architecture 
3.1.1 Jittery oscillator 
3.1.2 Edge detector 
3.1.3 Delay block 
3.2 Measurements 
3.3 Conclusion 
Chapter 4 
4.1 Pulse coupled oscillators 
4.2 PCO for synchronization of IR-UWB radio nodes 
4.3 PCO for synchronization of long-range narrowband RF network 
4.3.1 Investigation of sequences for the syncword 
4.3.2 Range estimate 
4.3.3 PCO network simulation 
4.3.4 Simulation results and discussion 
Chapter 5 
5.1 Overview of the synchronizer block 
5.2 Signal processor 
5.2.1 Correlator 
5.2.2 Peak detector 
5.2.3 Differential detector 
5.2.4 Amplitude detector 
5.2.5 Dual core 
5.3 Digital PCO 
5.3.1 Digital PCO Model 
5.3.2 Network simulation with digital PCO 
5.3.3 Simulation results and discussion 
5.3.4 Digital PCO implementation 
5.4 Sequence generator 
5.5 Measurement results 
5.6 Conclusion 
Chapter 6 REFERENCES 
 
 
  
LIST OF FIGURES 
Figure 2.1. Two input signals and an internal signal are used in the proposed 
technique. The internally generated sampling signal is used to check the state of the 
input signals at its rising edges, which fall at random locations due to high jitter 
(represented in gray). 
Figure 2.2. The rising edges of the sampling signal may fall at random phases of the 
unit period of the input signals. Over many cycles of the sampling signal, its edges 
will cover the whole unit period with fine resolution. Those edges that fall within the 
counting region are counted. The number of counted edges is proportional to the delay 
between the input signals. 
Figure 2.3. The measurement runs for N cycles of the sampling signal. A sampling 
edge is counted if the first input is high while the other is low when the edge arrives. 
This condition defines the counting region. 
Figure 2.4. The expected value of the measurement output has a linear relationship 
with the delay. Assuming 50% duty-cycle in the clock signals, the count goes up to 
$N/2$, corresponding to a delay of half a clock period. 
Figure 2.5. When the duty-cycle of the input signals is not 50%, the counting region is 
reduced. The dotted lines represent the input signals with 50% duty-cycle and the 
corresponding counting region per input signal period. 
Figure 2.6. When the duty-cycle of the input signals is not 50%, the maximum count is 
limited and ambiguity between the count and delay exists for large delays. The dotted 
lines represent the case where input signals have 50% duty-cycle. 
Figure 2.7. The counting region is empty if JIT CLK and REF CLK are jitter free. Any 
jitter event in JIT CLK (represented in gray) causes the counting region to appear. If 
any sampling edge falls within the region, then it is counted. REF CLK blocks half of 
the jitter distribution in JIT CLK as shown. 
Figure 2.8. The sampling edge delay referred to the clock unit period is the phase 
difference between the sampling edge and the last rising edge of the clock. 
Figure 2.9. The pdf of the sampling edge on the clock unit period is the sum of the pdf 
tails of the VCO cycle jitter distribution. The clock period where the sampling signal 
is most likely to occur is denoted m. 
Figure 2.10. (a) The probability density function of the location of the sampling edge 
with respect to the clock unit period. Both plots are obtained by numerically solving 
(2.3) with $\sigma_{Sam}/T_{CLK} = 0.54$ and $t_{\mu,i} = 0$. $A$ is the amplitude of 
the sinusoidal variation on the uniform distribution. (b) The sinusoidal variation of 
$\mathrm{pdf}_{\tau_i}$ becomes negligible as $\sigma_{Sam}$ grows. Hence, when the 
rms jitter of the sampling oscillator is on the order of $T_{CLK}$, $\tau_i$ may be 
assumed to have a uniform distribution. 
Figure 2.11. Jitter terminology used in this thesis. 
Figure 2.12. Clock signal with random (Gaussian) jitter. Its rms/standard deviation 
value is represented as $\sigma_{CLK}$. 
Figure 2.13. Mean and standard deviation of the counter output for the simulations of 6 
GHz (top/red) and 10 GHz (bottom/blue) clocks with random jitter. The simulated 
results (square points) match the theoretical estimates (lines) provided by (2.22) and 
(2.23). 
Figure 2.14. Clock signal with bimodal/Dirac distributed deterministic jitter and small 
random (Gaussian) jitter. The distance between the ideal clock edge and the Dirac 
peaks shown in dotted lines is $\tau_{\delta\delta}$. 
Figure 2.15. The expression in parentheses in (2.33) and its approximations at small 
and large $\gamma$ values (left). The same plot normalized to the expression (right). 
Figure 2.16. Matlab simulation of the counter output when 6 GHz JIT CLK has a 
bimodal/Dirac distributed deterministic jitter and random Gaussian jitter as shown in 
Figure 2.14. (a) When the Dirac peak distance is proportional to the rms jitter, the 
expected value of the count has a linear relationship with $\sigma_{Rand}$ since the 
general shape of the jitter distribution does not change, but it only widens, increasing 
the counting region. (b) With constant $\tau_{\delta\delta}$, the expected value of the 
count has a linear relationship with large $\sigma_{Rand}$ where the random jitter is 
dominant. For small rms jitter, the deterministic jitter is dominant and the expected 
value of the count is constant since $\tau_{\delta\delta}$ is set constant. (c) Similarly, 
with constant $\sigma_{Rand}$, the expected value of the count has a linear relationship 
with large $\tau_{\delta\delta}$ where the deterministic jitter is dominant. For small 
deterministic jitter, the random jitter is dominant and the expected value of the count 
is constant since $\sigma_{Rand}$ is set constant. 
Figure 2.17. Clock signal with single tone periodic jitter (shown in dotted line) and 
random Gaussian jitter (not shown). The convolution of the periodic jitter and the 
Gaussian jitter results in the distribution shown in solid line. The distance between the 
ideal clock edge and the farthest periodic jitter is $\tau_{PJ}$. The random jitter has 
standard deviation $\sigma_{Rand}$. 
Figure 2.18. Matlab simulation of the counter output when 6 GHz JIT CLK has a 
deterministic periodic jitter and random Gaussian jitter as shown in Figure 2.17. 
Similar to Figure 2.16, (a) when periodic peak distance is proportional to the rms jitter, 
the expected value of the count has a linear relationship with $\sigma_{Rand}$. (b) With 
constant $\tau_{PJ}$, the expected value of the count has a linear relationship with 
large $\sigma_{Rand}$ where the random jitter is dominant. For small rms jitter, the 
periodic jitter is dominant and the expected value of the count is constant. (c) 
Similarly, with constant $\sigma_{Rand}$, the expected value of the count has a linear 
relationship with large $\tau_{PJ}$ where the periodic jitter is dominant. For small 
periodic jitter, the random jitter is dominant and the expected value of the count is 
constant. 
Figure 3.1. Simplified block diagram of the proposed technique, which can be used for 
measurement of clock jitter, delay, and duty-cycle. 
Figure 3.2. (a) The delay block consisting of 8 stages of current starved inverters. The 
delay is adjusted by the current starving transistors. (b) High-jitter VCO. 
Figure 3.3. (a) Edge detector block (b) D-flip flop with duty-cycle tuning option. 
Figure 3.4. Placing the adjustable delay block on the sampling signal path to 2nd D-flip 
flop (right) achieves the same result as placing it on the path of the high speed Input 1 
(left). 
Figure 3.5. (a) Die photo. (b) VCO period histogram. Period jitter = 54.5 ps rms. 
Figure 3.6. On-chip delay measurement. 
Figure 3.7. On-chip jitter measurement in (a) external referenced mode and (b) 
self-referenced mode where the clock is delayed by one period. Counter outputs are 
converted to show the estimated rms jitter in ps. 
Figure 4.1. Transient of the synchronization of 3 PCOs 
Figure 4.2. Comparison of various sequences for syncword 
Figure 4.3. Estimate of the transmission range 
Figure 4.4. Topology of the randomly located nodes. Nodes within 100 m are 
connected to each other. 
Figure 4.5. Average number of cycles to synchronize vs. period difference between the 
leader node and the 2nd fastest node 
Figure 4.6. Average number of cycles to synchronize vs. processing delay 
Figure 4.7. Maximum relative jitter vs. period difference between the leader node and 
the 2nd fastest node. The jitter of the leader node is plotted as a reference. 
Figure 5.1. Block Diagram of the system. Synchronizer block (PCO Sync) shown in 
red is designed and fabricated. Transient of PCO phase state is illustrated as an 
example of three node synchronization. 
Figure 5.2. Block diagram of the proposed signal processor. 
Figure 5.3. The correlator consists of sampling cells that store the latest 63 bits of the 
decoded signal. The output of the differential detector, representing the current data 
bit, is connected to 63 cells. Half a bit period later, different cells containing the latest 
63 bits are connected to the output. 
Figure 5.4. The correlator cell consists of a sampling capacitor and switches 
connecting to other blocks. The numbers in circles represent which phases the 
corresponding switches turn on in. In phase (iii), the capacitor may be connected to the 
output ($V_{corr}$) in either positive or negative configuration based on $s_k$. 
Figure 5.5. (Left) The peak detector consists of three comparators and a logic block 
for majority vote for the output. (Right) Dynamic comparator used in the peak 
detector. 
Figure 5.6. (Left) Differential detector diagram and (Right) its sampling cell. 
Figure 5.7. Illustration of digital PCO state function. The state of the main counter 
lasts the same number of reference periods as its count. 
Figure 5.8. Illustration of the difference between design A and design B. 
Figure 5.9. Time of firing of all nodes relative to each time the fastest node fires 
(design A). 
Figure 5.10. Synchronization speed, i.e. the average number of cycles (across 100 
different initial conditions) the leader node takes to synchronize. Large and small 
jitter refer to reference clock jitter of 1 µs and 0.1 ns, respectively. 
Figure 5.11. Synchronization quality measured by the average of the maximum 
relative jitter. With the relaxed requirement, the network is considered synchronized if 
the period of each node is within $T_{ref}$ of the period of the fastest node. 
Figure 5.12. Time of firing of all nodes relative to each time the fastest node fires 
(design B). 
Figure 5.13. Synchronization speed. With fast reference clocks (small $T_{ref}$), the 
discretization error is negligible and the performance of design A and design B 
approaches that of analog oscillator. 
Figure 5.14. Synchronization speed with varying coupling strength 
Figure 5.15. Basic block diagram of the digital PCO 
Figure 5.16. Die and PCB photo 
Figure 5.17. Differential detector: 1st and 2nd waveform – I and Q inputs, 3rd waveform 
– digitized output, 4th waveform – peak detector output. Differential detector correctly 
decodes the baseband signals with carrier frequency offset modulation. The envelope 
wavelength is deliberately shortened to show the bit transitions clearly. 
Figure 5.18. Correlation success rate vs. input amplitude. Blue/bottom waveform has 
fixed $V_{thresh}$. Red/top waveform has amplitude detector turned on. 
Figure 5.19. Correlator output of Core 1 (top) and Core 2 (bottom). When one of the 
cores samples the input signal at bit transitions and skips a pulse (inside red 
rectangles), the other core samples at the center of the bits, producing the correct 
peaks. 
Figure 5.20. BER vs SNR at the chip input. 
Figure 5.21. Setup for BER measurement. 
Figure 5.22. BER in the presence of multipath. Multipath interference of equal 
magnitude does not significantly degrade performance up to distances of 144 m of 
system range. 
Figure 5.23. Large in-band interferers with signal ratios outside of the blue region and 
with offsets between 0.6 and 63 bits can degrade detection. 
Figure 5.24. Digital PCO circuit output – Cadence transient simulation. A coupling 
pulse received at t = 7 µs advances the oscillator state 
Figure 5.25. Transient of digital PCO. 
Figure 5.26. Transient of the sequence generator 
Figure 5.27. Transient of a node locking to a received signal 
Figure 5.28. Wireless synchronization of three nodes 
Figure 5.29. The reset signal of synchronized PCO and the histogram of its rising 
edge. 
Figure 5.30. Duty-cycled operation. System locks with low latency despite long “off” 
cycle and random “on” states. 
 
  
  
 
LIST OF TABLES 
Table 3.1: Comparison with Previous On-Chip Delay Measurement Circuits 
Table 3.2: Comparison with Previous On-Chip Jitter Measurement Circuits 
Table 5.1: Power consumption of synchronizer block 
Table 5.2: Power consumption of a duty-cycled RF front end 
Table 5.3: Comparison with the state-of-the-art wake-up radios. 
  
  
 
 
 
 
 
 
Chapter 1  
INTRODUCTION 
Precise timing is crucial in many applications. It enables high-performance 
computing, fast data rates, efficient time-division duplexing, and extended battery life 
through duty-cycling of the power delivered to circuitry. This thesis explores 
techniques for measuring, monitoring and maintaining timing at small and large 
scales.  
At small scales, timing non-idealities of clock signals such as clock duty-cycle, 
clock skew and jitter are investigated and a stochastic technique for on-chip 
measurement of such non-idealities is introduced. At large scales, this work presents a 
technique to achieve and maintain low-power synchronization of a long-range 
peer-to-peer (P2P) RF system using pulse coupled oscillators (PCO). 
1.1 On-chip measurement of clock non-idealities 
As clock speeds increase in modern high-performance circuits, timing non-
idealities, such as jitter, clock skew and unbalanced duty-cycle, become major 
bottlenecks of performance. On-chip measurement of these non-idealities has become 
increasingly important for debugging, diagnostics, calibration, and monitoring of 
device failure and aging. For example, delay measurement circuits [1]–[9] can be used 
for characterizing phase errors in track-and-hold circuits and phase interpolators. Jitter 
measurement circuits [10]–[16] can be used for characterizing the jitter in all-digital 
phase locked loops and clock-and-data recovery circuits. 
Since the timing errors are small in absolute magnitude but proportionally large in 
high-speed clocks, deterministic measurement of such errors often requires large area, 
high power consumption or heavy off-chip computation. Stochastic measurement 
using a noisy sampler is an attractive solution in this application because a simple 
low-precision circuit consisting of a noisy oscillator, comparators and counters can 
perform high-accuracy measurement by exploiting the inherent averaging effect of the 
random process. 
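The averaging effect can be illustrated with a short Monte Carlo sketch of the delay 
measurement. This is a behavioral model only, not the circuit described in Chapter 3. 
It assumes two ideal 50% duty-cycle clocks and a sampling oscillator whose period has 
Gaussian jitter of 0.54 clock periods rms; the function name, the nominal sampling 
period `t_samp` and the sample count are hypothetical choices made for illustration. 
A sampling edge is counted when the first input is high and the second is low, so the 
fraction of counted edges should approach the delay divided by the clock period. 

```python
import random


def stochastic_delay_estimate(delay, t_clk=1.0, n_samples=200_000,
                              t_samp=7.3, sigma_samp=0.54, seed=0):
    """Estimate the delay between two 50% duty-cycle clocks by random sampling.

    All times are in units of the clock period. t_samp and sigma_samp are
    hypothetical values for the jittery sampling oscillator.
    """
    rng = random.Random(seed)
    t = 0.0
    count = 0
    for _ in range(n_samples):
        # Each sampling period carries Gaussian jitter, so the edge lands at
        # an effectively random phase of the clock period.
        t += rng.gauss(t_samp, sigma_samp * t_clk)
        phase = t % t_clk
        in1_high = phase < t_clk / 2
        # Input 2 is Input 1 shifted later by `delay`.
        in2_high = ((phase - delay) % t_clk) < t_clk / 2
        # Counting region: first input high while the second is still low.
        if in1_high and not in2_high:
            count += 1
    # The counted fraction is proportional to the delay (valid up to t_clk/2).
    return count / n_samples * t_clk


for true_delay in (0.05, 0.10, 0.30):
    est = stochastic_delay_estimate(true_delay)
    print(f"true delay {true_delay:.2f} T_CLK, estimate {est:.4f} T_CLK")
```

Each individual sample is only a one-bit decision, yet the estimate tightens as the 
number of samples grows, which is why a low-precision sampler can still give a 
fine-resolution result. 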
In the first part of this thesis, we present a stochastic technique for on-chip 
measurement of three different timing non-idealities: clock jitter, phase delay and duty 
cycle. This work provides theoretical analysis proving the validity and robustness of 
the technique in addition to demonstrations and measurement results for all three 
measurement types. 
The next two chapters are organized as follows. Section 2.1 reviews the existing 
schemes for on-chip measurement of delay and jitter. Section 2.2 introduces the 
proposed technique, first in the application of delay measurement and then in the 
application of jitter measurement. Section 2.3 presents the theoretical analysis of the 
validity of the technique. Chapter 3 provides details of the circuit implementation of 
the technique (Section 3.1) and describes the measurements and results of the 
fabricated chip (Section 3.2). 
  
 
 
 
 
 
1.2 Low-power and scalable synchronization of long-range P2P RF system  
Events such as Hurricane Sandy occasionally remind us of the fragility of our 
communication infrastructure. Ad-hoc discovery of peers and communication would 
reduce dependence on infrastructure and benefit emergency first-responders, as 
peer-to-peer (P2P) communication is inherently less dependent on existing 
infrastructure. It would also benefit other commercial applications by enabling 
“Internet of Things” (IoT) devices to avoid straining the existing infrastructure and to 
use localized communication without sending geolocation information to the cloud, 
improving privacy.  
Many of the capabilities of P2P networks have been demonstrated and the link 
management problems solved [17]. Yet long-range P2P communication is difficult to 
realize on battery-operated platforms (e.g. mobile), in part because of the high power 
consumption of such communication without aggressive duty-cycling. Duty-cycling 
means turning off the power-hungry RF front-end most of the time and turning it on 
only briefly, at the right time, to transmit or receive data, which yields significant 
power savings. It requires global synchronization and, in the case of an ad-hoc 
network, an “emergent” synchronous network capable of providing system timing 
precise enough to minimize the time the front end stays on. 
Unlike in cellular networks, peer-to-peer nodes cannot rely on strong master-to-
slave asymmetries. Since all peers have limited energy resources, none can be 
subjected to higher power demands of a master node. In addition, asymmetric P2P
systems are typically not scalable due to the complexity that the master node must
handle with an increasing number of slave nodes. Conventional P2P protocols such as
Bluetooth and ZigBee are examples of such asymmetric systems.
One method that has been demonstrated for scalable synchronization of P2P 
networks is the idea of utilizing Pulse Coupled Oscillator (PCO) networks [18], [19]. 
Synchronization of oscillator networks has been rigorously proven to always occur in 
groups of any number of ideal oscillators, and requires only simple connectivity. This 
network was studied in less ideal conditions and shown to be useful in scalable 
wireless ultra-wideband (UWB) networks with pulse rates of 150 kHz [20].  
In various communication schemes, it is a common practice to use a syncword to 
identify the start of a frame and synchronize a channel. This practice reduces 
interference and enables devices to identify other in-network devices without false 
triggers. 
To achieve long-range synchronization, the correlator must be compatible with 
narrowband transceivers. At the same time, the entire system needs to be duty-cycled 
to conserve energy. Addressing both of these challenges requires consideration of 
several additional factors. First, the latency of detecting a syncword should be 
minimized in order to allow aggressive duty-cycling below 1%. To accomplish this, 
the radio must be able to switch on and off quickly and should not require a preamble 
be sent with every sync word. Therefore, the receiver must be insensitive to phase and 
frequency offsets by design. Second, the radio must be able to identify a syncword in 
an asynchronous fashion, without cross-correlation errors. Third, as the range of the
network may be large, the receiver must tolerate a wide dynamic range of inputs 
without time misalignment or false positives. Finally, all of the above must be done 
with a very low power budget to enable energy harvesting or long battery lifetimes.  
In this work, we study the PCO-based synchronization scheme in a narrowband P2P
system and demonstrate a low-power signal processor for low-latency detection of a
programmable syncword in wireless nodes. It is compatible with commercial RF front
ends, and can enable aggressive duty-cycling for power savings in such systems.
Chapter 4 presents the system level analysis of applying the PCO based 
synchronization in a narrowband radio. Different sequences are evaluated for the 
syncword. MATLAB simulation of the P2P network synchronization that takes into 
account the syncword processing delay and digital PCO is performed. Chapter 5 
provides details for the circuit implementation of the baseband signal-
processor/synchronizer block while discussing the factors that degrade the 
synchronization quality, such as carrier phase and frequency mismatch, varying signal 
power and multipath. Finally, the measurements and results of the fabricated chip and
a demonstration of the synchronization of a 3-node system are presented.
Chapter 2  
STOCHASTIC MEASUREMENT OF CLOCK NON-IDEALITIES 
2.1 Background 
Since measuring path delay is increasingly important in today’s high-speed low-
power digital circuits, there has been significant interest in high accuracy delay 
measurement circuits. Previous on-chip delay measurement circuits can be classified
into four main groups.
In ring-oscillator based circuits [1], [2], the delay path is used as part of a ring
oscillator chain and oscillation frequency is measured with and without the delay path.
However, the resolution of the measurement is relatively low. Vernier delay line based circuits
[3], [4] achieve good timing resolution using small delay cells to create a Vernier scale.
However, measuring a long delay path requires a large circuit. Time-to-digital 
converter based measurements [5], [6] typically first convert the time delay to a 
voltage value and then use an ADC or similar circuit to obtain a digital value. For high 
resolution, these designs also have large area overhead. 
Delay measurement circuits that use random sampling have been proposed [7]–[9].
However, all of these designs utilize a sampling oscillator whose frequency is
modulated by a pseudo-random number generator (PRNG). Although the sampling
signal should have a uniform distribution with respect to the input signal transitions for
linear measurement, it is not clear whether a PRNG-based oscillator satisfies this requirement.
The technique presented here instead uses a simpler noisy VCO that generates truly
random jitter, and we provide analysis showing that the sampling edges are drawn from
a uniform distribution.
Several different techniques for on-chip measurement of clock jitter have also been 
reported in literature. Similar to delay measurement, Vernier delay line and time-to-
digital converter based circuits are also used for measuring jitter [10], [11]. A phase 
detector based circuit is proposed in [12]. Liang et al. measure the jitter of random 
data by correlating the phase detector outputs from two CDR paths [13]. Since the 
CDR is an integral part of the jitter measurement, this technique is not suitable for 
non-CDR circuits. Counter based circuits [14], [15] sweep an external clock (or the 
input clock itself) at several delay points to map the cumulative distribution function 
of the jitter. These and all of the previously reported jitter measurement techniques 
require off-chip post processing to obtain the jitter magnitude. The advantage of the 
technique proposed here is that the circuit directly outputs a digital value proportional 
to the jitter magnitude without the need for off-chip processing. This allows the 
opportunity to monitor aging/environmental effects on the clock jitter, automatically 
adjust control parameters for the clocking circuitry by placing the jitter measurement 
output in a feedback loop, and efficiently diagnose chips in volume manufacturing. 
2.2 Proposed stochastic measurement technique 
We present the basic idea of using a noisy stochastic sampler constructed from a 
free running oscillator to measure delay and jitter between two clock signals on-chip.
In Section 2.3, we analyze the system and show that the error in the measurement is 
small. (E.g. the standard deviation of the variability in counter output for a delay equal 
to 0.1% of the input clock period is theoretically 0.005% of the period.) 
 
Figure 2.1. Two input signals and an internal signal are used in the proposed 
technique. The internally generated sampling signal is used to check the state of 
the input signals at its rising edges, which fall at random locations due to high 
jitter (represented in gray).  
 
Figure 2.2. The rising edges of the sampling signal may fall at random phases of
the unit period of the input signals. Over many cycles of the sampling signal, its
edges will cover the whole unit period with fine resolution. Those edges that fall
within the counting region are counted. The number of counted edges is
proportional to the delay between the input signals.
2.2.1 Delay and duty-cycle measurement 
We begin by assuming a general delay measurement setup shown in Figure 2.1. 
The two input signals are assumed to be identical clock signals with a constant delay 
relative to each other. A third signal is generated internally by a free-running on-chip 
oscillator. We assume that this signal has extremely high jitter and runs at a much
lower frequency than the input clock signals. Its jitter is expected to be large enough
that the transitions may fall at random phases relative to the input signals. Over a large
number N of periods of the sampling signal, the transitions are expected to cover the full phase
range of the “unit period” of the input signal with fine resolution (Figure 2.2). The
transitions are then used to sample the state of the input signals at these phases. We 
will not call the sampling signal a clock since it is generated by a free-running 
oscillator and the input signals are actual clock signals.
 
Figure 2.3. The measurement runs for N cycles of the sampling signal. A 
sampling edge is counted if the first input is high while the other is low when the 
edge arrives. This condition defines the counting region.  
The stochastic sampling process can be described in the flow chart shown in
Figure 2.3. At each rising edge of the sampling signal, the detection circuit checks if
Input 1 is high and Input 2 is low. If that is the case, then the edge is counted, giving a
measure of the delay between the input clock signals. After N sampling edges that
sample across the full phase range of the unit period of the input signal, the number of
counted edges is proportional to the delay between the input signals. When the two
inputs are phase matched, the count is ideally 0 since the two inputs assume the same
values at all phases. If either the first or the second input is delayed, the count is non-zero.
Assuming the input clocks have 50% duty-cycle, this count can go up to half N,
which corresponds to a delay of half a period ($T_{CLK}/2$). Increasing the delay beyond
$T_{CLK}/2$ results in overlap with the previous period, reducing the counting region and
thereby causing the count to decrease. The relationship between counter output and the
delay is expected to be as shown in Figure 2.4.
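The counting rule of Figure 2.3 can be illustrated with a short Monte Carlo sketch. It is not the fabricated circuit: it assumes that every sampling edge lands at an independent, uniformly distributed phase of the clock period (the idealized behaviour of the noisy oscillator), and the function name `stochastic_delay_count` and the 100 ps clock period are hypothetical choices made for illustration.

```python
import random


def stochastic_delay_count(delay, t_clk, n_edges, duty=0.5, seed=0):
    """Count sampling edges that find Input 1 high while Input 2 is low.

    Assumption: each sampling edge lands at an independent, uniformly
    distributed phase of the clock period. Input 2 is modelled as Input 1
    delayed by `delay`.
    """
    rng = random.Random(seed)
    high_time = duty * t_clk
    count = 0
    for _ in range(n_edges):
        phase = rng.uniform(0.0, t_clk)                  # where the edge lands
        in1_high = phase < high_time                     # state of Input 1
        in2_high = (phase - delay) % t_clk < high_time   # state of delayed copy
        if in1_high and not in2_high:                    # counting region
            count += 1
    return count


if __name__ == "__main__":
    n = 100_000
    t_clk = 100.0  # ps, illustrative value only
    for delay in (0.0, 1.0, 10.0, 25.0, 50.0, 75.0):
        c = stochastic_delay_count(delay, t_clk, n)
        print(f"delay={delay:5.1f} ps  count={c:6d}  count/N={c / n:.4f}")
```

In this sketch the ratio count/N tracks delay/T_CLK up to half a period and falls again for the 75 ps case, which mirrors the triangular relationship described above.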
 
Figure 2.4. The expected value of the measurement output has a linear relationship
with the delay. Assuming 50% duty-cycle in the clock signals, the count goes up
to $N/2$, corresponding to a delay of half a clock period. (Plot axes: count, from 0 to
$N/2$, versus the delay between Input 1 and Input 2, from $-T_{CLK}/2$ to $T_{CLK}/2$.)

Figure 2.5. When the duty-cycle of the input signals is not 50%, the counting
region is reduced. The dotted lines represent the input signals with 50% duty-
cycle and the corresponding counting region per input signal period.
 
Figure 2.6. When the duty-cycle of the input signals is not 50%, the maximum 
count is limited and ambiguity between the count and delay exists for large 
delays. The dotted lines represent the case where input signals have 50% duty-
cycle.  
When the duty cycle is not 50%, the counting region is reduced (Figure 2.5). 
Therefore, the maximum value of the counter is clipped below $N/2$ and a
corresponding ambiguity arises in the count-delay relationship (Figure 2.6). In order to 
eliminate this ambiguity and maximize the range of measurable delay, the circuit may 
incorporate a calibration knob to vary the clock duty cycle. Duty-cycle measurement 
of a clock is achieved by setting Input 2 low so that the counting region is equal to the 
duty-cycle. 
For a perfectly linear relationship between the counter output and the delay, the 
phases of the sampling edges should ideally be evenly distributed across the unit
period of the input clock. This linear relationship could be accomplished with either an 
extremely fast sampling signal so that a large number of samples can be taken within a 
single period of the input signal, or a slower sampling signal that is perfectly locked to 
the input signals with a constant offset in frequency (i.e. with negligible phase noise) 
so that different phases of the unit period can be sampled each sampling cycle. 
However, the circuit implementations of both of these non-stochastic methods are 
prohibitively costly in terms of power and area. The technique presented in this work
uses a sampling signal where the phase of the sampling edges on the unit period of the
clock signal is randomly drawn from a uniform distribution (Section 2.3.2). After N
cycles, the sampling edges finely and nearly evenly cover the full unit period of the 
clock signal. 
The main importance of the proposed technique is that utilizing a simple sub-
sampling oscillator with large jitter allows the measurement of the delay or jitter of 
clock signals with high accuracy. We present the theoretical foundation of the 
technique in Section 2.3. Since the sampling edges fall at random phases of the unit 
period of the input signal, each measurement results in a slightly different counter 
output. Therefore, the expected value and the variance of the counter output and their 
relationship with the delay between the input signals are of interest. We provide this 
analysis in Section 2.3.3, preceded by the analysis of the probability distribution of the 
sampling edges in Section 2.3.2. 
The on-chip oscillator that generates the sampling signal can be any kind of 
oscillator as long as its cycle jitter is large and Gaussian as discussed in Section 2.3.1.
This work uses a single-ended ring VCO that is designed to be extremely noisy for this 
purpose. 
2.2.2 Jitter measurement 
The delay measurement technique may also be used to measure the jitter of a clock 
signal. We assume that the jittery clock (JIT CLK), whose jitter is to be measured, is 
supplied to Input 1. A reference signal (REF CLK), which can be either an externally 
provided clock or an internally generated copy of JIT CLK, is supplied to Input 2. If 
REF CLK is externally supplied, it should have the same frequency as JIT CLK and 
negligible jitter. 
The jittery clock signal and the reference signal are assumed to be phase matched 
and to have the same duty cycle. The circuit may incorporate a delay block for 
calibrating the phase match in addition to the duty-cycle calibration circuitry. If JIT 
CLK and REF CLK are jitter free, then the count will be 0 since they are phase 
matched. However, the clock jitter results in a random timing difference between the 
edges of the two signals every period (Figure 2.7). When the sampling edges occur 
during such timing differences, they are counted and the jitter is recorded. 
If a copy of JIT CLK is delayed by one period and used as REF CLK, the circuit 
operates in self-referenced mode. In this mode, period jitter is measured as opposed to 
the absolute jitter in externally referenced mode.
 
Figure 2.7. The counting region is empty if JIT CLK and REF CLK are jitter 
free. Any jitter event in JIT CLK (represented in gray) causes the counting 
region to appear. If any sampling edge falls within the region, then it is counted. 
REF CLK blocks half of the jitter distribution in JIT CLK as shown.  
2.3 Theory 
2.3.1 Engineering the oscillator cycle jitter to be Gaussian 
The proposed technique requires the cycle jitter (as defined in Section 2.3.4) of the
oscillator to be large and Gaussian to guarantee a uniform distribution of the sampling 
edges across the unit period of the clock signal. This is achieved with a voltage 
controlled oscillator (VCO) that is extremely noisy. During the design of the VCO, we 
aimed to maximize the white Gaussian noise and minimize flicker noise because 
flicker noise may not be Gaussian. Therefore, thermal and/or shot noise must be 
maximized to be the dominant sources of the sampling signal jitter. The output signal 
of the VCO will accumulate noise from the supply, ground, and substrate. Hence,
good substrate isolation and supply decoupling are assumed to minimize noise that 
may not be white Gaussian.  
As the VCO accumulates random device noise over time to generate the cycle 
jitter, the sum of thermal noise perturbations is Gaussian distributed since thermal 
noise is white and Gaussian, hence independent over time. Similarly, the jitter portion 
due to the accumulation of shot noise is Gaussian. The sum of these two portions of 
different noise sources is also Gaussian as the thermal and shot noises are 
independent. Once the VCO edge transition is used to sample the input signals, the 
VCO starts accumulating device noise again, which is independent of the device noise 
perturbations that contributed to the jitter of the previous edge. Therefore, the cycle 
jitter of the VCO can be assumed to be Gaussian distributed. 
2.3.2 From Gaussian jitter to uniform sampling 
As noted in previous sections, one of the advantages of the proposed technique lies
in that the VCO is designed to operate at a much lower frequency than the clock
signal, but with rms jitter on the order of the clock period, $T_{CLK}$ (Figure 2.8), giving
the effect of random sampling. Here we analyze the distribution of sampling edges
and show that over many cycles this produces an approximation to uniform sampling.
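Before the analysis, the claim can be checked empirically with a small sketch. It accumulates Gaussian-distributed sampling periods, folds each edge onto one clock period, and histograms the resulting phases. The numbers used (a 100 ps clock period, a 2500 ps mean sampling period and 54 ps rms jitter) are assumptions that correspond to a 10 GHz clock sampled at 400 MHz; the function name `edge_phase_histogram` is hypothetical.

```python
import random


def edge_phase_histogram(t_clk, t_sam_mean, sigma_sam, n_edges, n_bins=10, seed=1):
    """Fraction of sampling edges that land in each bin of the clock period.

    Each sampling period is drawn from a Gaussian distribution and the edge
    position is folded onto one clock period with a modulo operation.
    """
    rng = random.Random(seed)
    bins = [0] * n_bins
    tau = 0.0
    for _ in range(n_edges):
        period = rng.gauss(t_sam_mean, sigma_sam)   # jittery sampling period
        tau = (tau + period) % t_clk                # phase within the unit period
        index = min(int(tau / t_clk * n_bins), n_bins - 1)
        bins[index] += 1
    return [b / n_edges for b in bins]


if __name__ == "__main__":
    noisy = edge_phase_histogram(100.0, 2500.0, 54.0, 200_000)
    clean = edge_phase_histogram(100.0, 2500.0, 0.0, 200_000)
    print("with jitter   :", [f"{f:.3f}" for f in noisy])
    print("without jitter:", [f"{f:.3f}" for f in clean])
```

With jitter, every bin receives close to one tenth of the edges. Without jitter, the mean sampling period in this example is an exact multiple of the clock period, so every edge lands in the same bin, which shows why the jitter is essential to the technique.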
 
Figure 2.8. The sampling edge delay referred to the clock unit period is the phase 
difference between the sampling edge and the last rising edge of the clock.  
 
 
Figure 2.9 The pdf of the sampling edge on the clock unit period is the sum of the 
pdf tails of the VCO cycle jitter distribution. The clock period where the 
sampling signal is most likely to occur is denoted m.  
We define $\tau_i$ as the time delay of the $i$th sampling edge with respect to the preceding
rising edge of the clock signal (Figure 2.8). $\tau_i$ takes a value between 0 and $T_{CLK}$,
which denotes the nominal period of the input clock signal. The period of the sampling
signal, $T_{Sam,i}$, is a Gaussian random variable with mean $\overline{T_{Sam}}$ and standard deviation
$\sigma_{Sam}$ due to the large jitter in the sampling signal. Therefore, $\tau_i$ is also a random
variable and it can be defined as
$$\tau_i = \left(\tau_{i-1} + T_{Sam,i}\right) \bmod T_{CLK}. \tag{2.1}$$
While the location of the $i$th sampling edge is randomly drawn from a Gaussian
distribution, the time delay $\tau_i$ has a different probability distribution as it is defined
across the clock signal period where the sampling edge falls. Since the Gaussian
distribution of the sampling clock jitter is designed to be much wider than $T_{CLK}$, the
sampling edge may fall in any of the several clock periods coinciding with the
Gaussian distribution (Figure 2.9). Hence, the probability density of $\tau_i$ is the sum of
the probability density of the sampling edge in each of those clock periods. In other
words, the probability density function (pdf) of $\tau_i$ can be derived by dividing the
Gaussian pdf of the sampling edge into sections of $T_{CLK}$ and summing these sections
as shown in Figure 2.9. Mathematically, the Gaussian pdf of the $i$th sampling edge is
represented as
$$\mathrm{pdf\_Sam}_i(t) = \frac{1}{\sqrt{2\pi\sigma_{Sam}^2}} \exp\left(-\frac{\left(t - t_{\mu,i}\right)^2}{2\sigma_{Sam}^2}\right), \tag{2.2}$$
where $t_{\mu,i} = (\tau_{i-1} + \overline{T_{Sam}}) \bmod T_{CLK}$. Hence, the pdf of $\tau_i$ may be represented as
$$\mathrm{pdf\_}\tau_i(\tau) = \sum_{k=-\infty}^{\infty} \mathrm{pdf\_Sam}_i(\tau + kT_{CLK}) = \sum_{k=-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma_{Sam}^2}} \exp\left(-\frac{\left(\tau + kT_{CLK} - t_{\mu,i}\right)^2}{2\sigma_{Sam}^2}\right), \tag{2.3}$$
where $\tau$ is the time variable defined over the unit period of the clock signal on the
interval $(0, T_{CLK})$.
Although the theoretical calculation of the series in (2.3) is complicated, a
numerical analysis shows (Figure 2.10a) that the resulting pdf of $\tau_i$ is a flat line of
$1/T_{CLK}$ with one cycle of small-amplitude sinusoidal variation. The following
equation has an excellent fit with the numerical data (correlation coefficient $> 1 - 10^{-15}$):
$$\mathrm{pdf\_}\tau_i(\tau) = \frac{1}{T_{CLK}} + A\cos\left(2\pi\frac{\tau - t_{\mu,i}}{T_{CLK}}\right), \tag{2.4}$$
where $A$ is the amplitude of the sinusoidal variation and a function of $T_{CLK}$ and $\sigma_{Sam}$.
When $A$ is minimized, $\mathrm{pdf\_}\tau_i$ becomes a uniform distribution and the location of the
previous sampling edges (e.g. $\tau_{i-1}$) does not matter. Figure 2.10b shows that $A$ gets
smaller with increasing $\sigma_{Sam}/T_{CLK}$. For example, when $\sigma_{Sam}/T_{CLK}$ is larger than 0.62,
$A$ normalized to $1/T_{CLK}$ is smaller than 0.001. Hence, $\tau_i$ can be assumed to have a
uniform distribution when the rms jitter of the sampling signal is on the order of the
clock period, minimizing the sinusoidal variation of the pdf.
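The numerical evaluation of (2.3) can be sketched as follows. The infinite sum is truncated to a finite number of clock periods, and the ripple amplitude $A$ is estimated from the maximum and minimum of the folded pdf. The function names, the truncation limit `k_max` and the grid size are hypothetical choices. For comparison, the script also prints $2\exp(-2\pi^2(\sigma_{Sam}/T_{CLK})^2)$, the first Fourier coefficient of a wrapped Gaussian; this closed form is a textbook result quoted here as an assumption and is not part of the analysis above.

```python
import math


def wrapped_gaussian_pdf(tau, t_clk, sigma, t_mu=0.0, k_max=20):
    """Folded-Gaussian sum of (2.3), truncated to |k| <= k_max."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    total = 0.0
    for k in range(-k_max, k_max + 1):
        x = tau + k * t_clk - t_mu
        total += norm * math.exp(-x * x / (2.0 * sigma ** 2))
    return total


def ripple_amplitude(ratio, n_points=1000):
    """Ripple amplitude A, normalized to 1/T_CLK, for sigma_Sam/T_CLK = ratio."""
    t_clk = 1.0
    values = [wrapped_gaussian_pdf(i * t_clk / n_points, t_clk, ratio * t_clk)
              for i in range(n_points + 1)]
    return (max(values) - min(values)) / 2.0 * t_clk


if __name__ == "__main__":
    for ratio in (0.3, 0.54, 0.7, 0.9):
        numeric = ripple_amplitude(ratio)
        closed_form = 2.0 * math.exp(-2.0 * math.pi ** 2 * ratio ** 2)
        print(f"sigma/T={ratio:.2f}  A(numeric)={numeric:.3e}  A(closed form)={closed_form:.3e}")
```

The amplitude falls off very quickly as the ratio grows, which is the behaviour that Figure 2.10b describes.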
 
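Figure 2.10 (a) The probability density function of the location of the sampling
edge with respect to the clock unit period. Both plots are obtained by numerically
solving (2.3) with $\sigma_{Sam}/T_{CLK} = 0.54$ and $t_{\mu,i} = 0$. $A$ is the amplitude of the
sinusoidal variation on the uniform distribution. (b) The sinusoidal variation of
$\mathrm{pdf\_}\tau_i$ becomes negligible as $\sigma_{Sam}$ grows. Hence, when the rms jitter of the
sampling oscillator is on the order of $T_{CLK}$, $\tau_i$ may be assumed to have a uniform
distribution. (Plot axes: (a) $\mathrm{pdf\_}\tau$ normalized to $1/T_{CLK}$, from 0.98 to 1.02, over a
normalized horizontal axis from 0 to 1; (b) $A$ normalized to $1/T_{CLK}$, from 0.0 to 0.3,
over a normalized horizontal axis from 0.3 to 0.9.)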
The design challenge of generating large enough sampling jitter when the clock is 
too slow is the main limitation of the proposed technique. For applications where the 
clock is very slow, the random edge generator should be designed to have more jitter, 
such as utilizing a noisy differential ring oscillator or injecting external white 
Gaussian noise. 
2.3.3 Error in uniform sampling and the output error in delay measurement 
mode 
The previous section demonstrated that the location of a sampling edge on the unit 
period of the clock signal is randomly drawn from a uniform distribution. Over many 
cycles of the sampling signal, the locations of the sampling edges superimposed onto 
the unit period of the clock signal are randomly spread out, but they are not perfectly 
evenly distributed. If we imagine $\Delta t$ to be the time distance between the adjacent
sampling edges on the unit $T_{CLK}$ period (i.e. not the edges of consecutive sampling
cycles), the $\Delta t$ partitions are not all the same size and they are different for each
measurement. Therefore, the number of sampling edges within the counting region is
slightly different for each delay/jitter measurement. This means that, for a given delay
between input signals, the counter will output slightly different numbers. Conversely,
a counter output from a single measurement can only estimate the delay to be a range
of values. Intuitively, the more sampling edges the measurement uses, the more
accurate the measurement becomes since $\Delta t$ is reduced. In this section, we estimate
the ambiguity in the counter output and calculate how many sampling cycles are
required for the measurement.
One way to characterize the error due to irregular spacing of the sampling edges is
to define a counting region (Figure 2.2) with width $\Delta T$ and quantify how many
sampling edges fall within that partition. If we define the random variable $Y$ to
describe the event where a sampling edge falls within the region $\Delta T$, then $Y$ has a
Bernoulli distribution with success probability of $p = \Delta T / T_{CLK}$ since the sampling
edge has uniform distribution over $T_{CLK}$. After $N$ cycles of the sampling signal, the
number of sampling edges that fall within $\Delta T$ (i.e. the value that the counter outputs) is
a random variable $S$ that is defined by
$$S = Y_1 + Y_2 + \cdots + Y_N. \tag{2.5}$$
Since the sum of independent and identically distributed (i.i.d.) Bernoulli
random variables has a Binomial distribution, $S$ has a Binomial distribution with mean
$$\mu_S = Np = N\frac{\Delta T}{T_{CLK}} \tag{2.6}$$
and variance
$$\sigma_S^2 = Np(1 - p) = N\frac{\Delta T}{T_{CLK}}\left(1 - \frac{\Delta T}{T_{CLK}}\right). \tag{2.7}$$
Equation (2.6) proves that the expected value of the counter output has a linear
relationship with the delay between the input signals as the counting region $\Delta T$ can be
used to represent the delay. Equation (2.7) shows that the delay measurement error is
the largest when the delay is half the clock period ($\Delta T = T_{CLK}/2$) and the error
reduces as the two input signals are more aligned (e.g. as $\Delta T$ approaches 0 or $T_{CLK}$).
Since a Binomial distribution with large $N$ can be approximated by a Gaussian
distribution, 99.7% of the time $S$ will obtain a value within $\pm 3\sigma_S$ of $\mu_S$, and the
measured delay will be within $\pm 3\frac{\sigma_S}{N}T_{CLK}$ of the actual delay 99.7% of the time. In
order to choose a value for $N$, we state that the delay measurement error should be less
than $T_{CLK}/200$ for the delay with the worst error (i.e. $\Delta T = T_{CLK}/2$):
$$6\frac{\sigma_S}{N}T_{CLK} \le \frac{T_{CLK}}{200}. \tag{2.8}$$
Using (2.7), (2.8) becomes
$$6\frac{T_{CLK}}{N}\sqrt{N\frac{T_{CLK}/2}{T_{CLK}}\left(1 - \frac{T_{CLK}/2}{T_{CLK}}\right)} \le \frac{T_{CLK}}{200} \tag{2.9}$$
and after simplification
$$N \ge 360{,}000. \tag{2.10}$$
We chose $N = 360{,}000$. Note that the measurement error is smaller at other delay
values. For example, for a delay equaling 1% of the clock period, the measurement
error is smaller than $T_{CLK}/1005$, 99.7% of the time.
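The arithmetic behind (2.8) to (2.10) can be reproduced with a few lines. This is a sketch under the same Binomial model; the function names `required_edges` and `error_fraction` are hypothetical, and the resolution target is passed as a divisor (200 for $T_{CLK}/200$) so that the worst-case result is computed exactly.

```python
import math


def required_edges(p, divisor, n_sigma=3.0):
    """Smallest N such that the full +/- n_sigma spread of the delay estimate
    stays below T_CLK / divisor.

    Solves 2 * n_sigma * sqrt(N * p * (1 - p)) / N <= 1 / divisor for N,
    where p = delta_T / T_CLK is the probability that an edge is counted.
    """
    width = 2.0 * n_sigma
    return math.ceil(width ** 2 * p * (1.0 - p) * divisor ** 2)


def error_fraction(p, n_edges, n_sigma=3.0):
    """Full +/- n_sigma spread of the delay estimate as a fraction of T_CLK."""
    return 2.0 * n_sigma * math.sqrt(n_edges * p * (1.0 - p)) / n_edges


if __name__ == "__main__":
    n = required_edges(0.5, 200)   # worst case: delay of half a period
    print("required N:", n)        # 360000
    frac = error_fraction(0.01, n) # delay equal to 1% of the period
    print(f"error at 1% delay: T_CLK / {1.0 / frac:.1f}")
```

The second printed value comes out slightly above 1005, in line with the statement that the error at a 1% delay is smaller than $T_{CLK}/1005$.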
2.3.4 Theoretical description of the jitter measurement process 
Jitter can be described as absolute jitter, period jitter and cycle jitter among many
others. Since there are conflicting definitions of the jitter terminology in the literature, we
define our terminology here.
 
Figure 2.11 Jitter terminology used in this thesis. (Figure labels: ideal clock edges at
$(n-1)T_{CLK}$, $nT_{CLK}$ and $(n+1)T_{CLK}$; jittery clock edges $t_{n-1}$, $t_n$ and $t_{n+1}$; edge offsets
$-\tau_a$, $\tau_b$ and $\tau_c$; $J_{absolute} = -\tau_a$, $J_{cycle} = \tau_a + \tau_b$, $J_{period} = 2\tau_a + \tau_b + \tau_c$.)
Denoting the nominal period of the clock $T_{CLK}$ and the transition time of the $n$th
rising edge of the clock $t_n$, we define the absolute jitter as the time difference between
the actual transition time of the edge and its ideal time (Figure 2.11),
$$t_n - nT_{CLK}. \tag{2.11}$$
We define the cycle jitter to be the difference between the transition times of the
adjacent rising edges of the clock compared to the nominal period $T_{CLK}$, i.e.
$$(t_{n+1} - t_n) - T_{CLK}. \tag{2.12}$$
In other words, the cycle jitter is the deviation of the actual period from the nominal
period.
We define the period jitter to be the difference between adjacent periods, i.e.
$$(t_{n+1} - t_n) - (t_n - t_{n-1}) = t_{n+1} - 2t_n + t_{n-1}. \tag{2.13}$$
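The three definitions (2.11) to (2.13) translate directly into code. The sketch below assumes that the rising-edge times are given as a list and that the ideal edge with index $n$ falls at $nT_{CLK}$, starting from $n = 0$; the function names and the sample edge times are hypothetical.

```python
def absolute_jitter(edges, t_clk):
    """(2.11): t_n - n * T_CLK for every rising edge, with n starting at 0."""
    return [t - n * t_clk for n, t in enumerate(edges)]


def cycle_jitter(edges, t_clk):
    """(2.12): (t_{n+1} - t_n) - T_CLK for every pair of adjacent edges."""
    return [(later - earlier) - t_clk
            for earlier, later in zip(edges, edges[1:])]


def period_jitter(edges):
    """(2.13): t_{n+1} - 2 * t_n + t_{n-1} for every interior edge."""
    return [nxt - 2 * cur + prev
            for prev, cur, nxt in zip(edges, edges[1:], edges[2:])]


if __name__ == "__main__":
    t_clk = 100.0                          # ps, illustrative value only
    edges = [0.0, 101.0, 199.0, 300.0]     # made-up rising-edge times in ps
    print(absolute_jitter(edges, t_clk))   # [0.0, 1.0, -1.0, 0.0]
    print(cycle_jitter(edges, t_clk))      # [1.0, -2.0, 1.0]
    print(period_jitter(edges))            # [-3.0, 3.0]
```

In the example, the second edge arrives 1 ps late and the third 1 ps early, so the period between them is 2 ps short and the period jitter around the second edge is -3 ps.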
Since the counting region in the delay measurement mode is deterministic, the 
counter output measures the proportion of the sampling edges that fall within that 
region. However, the counting region in the jitter measurement mode is probabilistic 
due to the clock jitter (Figure 2.7) and is different for each sampling edge. Therefore, 
the previous analysis describing delay measurement does not apply to how the clock 
jitter is measured. A more general way to look at the jitter measurement mode is that 
since the sampling edge may fall at any location on the unit period of the clock signal, 
it can sample any part of the clock jitter probability distribution. Even for a 
measurement with a single sampling edge (i.e. N=1), the count result is still 
representative of the amount of clock jitter because if the count is 1 instead of 0, it is 
more likely that the clock jitter is large rather than small. The result is highly 
unreliable for N=1 with large variance in the output count, but raising N to a large 
number increases the confidence in the measurement accuracy and reduces the random 
error of the count. 
More specifically, the probability that a sampling edge at $\tau$, which is defined in
Section 2.3.2, is counted equals the cdf of the clock jitter at $\tau$ ($p = \mathrm{cdf\_CLK}(\tau)$)
because the sampling edge is counted as long as the clock jitter causes JIT CLK to be
high at $\tau$ while REF CLK is low. Hence, the random variable $X$ describing this event
is Bernoulli distributed, $X \sim \mathrm{Ber}(\mathrm{cdf\_CLK}(\tau))$, where the success, or 1, denotes the
sampling edge being counted.
The mean of $X$, or the probability that a sampling edge at any point on the unit
period of the clock signal is counted, is
$$\begin{aligned}
\mu_X = E[X] &= 1 \cdot \Pr(X = 1) + 0 \cdot \Pr(X = 0) = \Pr(X = 1) \\
&= \int_0^{T_{CLK}} P(X|\tau)\,d\tau = \int_0^{T_{CLK}} \mathrm{cdf\_CLK}(\tau)\,\mathrm{pdf\_}\tau(\tau)\,d\tau \\
&= \int_0^{T_{CLK}} \mathrm{cdf\_CLK}(\tau)\,\frac{1}{T_{CLK}}\,d\tau,
\end{aligned} \tag{2.14}$$
where $\mathrm{pdf\_}\tau(\tau) = 1/T_{CLK}$ describes the uniform distribution of the sampling edge on
the unit period of the clock signal as described in Section 2.3.2. Since the sampling
signal jitter and the clock signal jitter are independent, $P(X|\tau) = \mathrm{cdf\_CLK}(\tau)\,\mathrm{pdf\_}\tau(\tau)$.
Similarly,
$$E[X^2] = 1^2 \cdot \Pr(X = 1) + 0^2 \cdot \Pr(X = 0) = E[X]. \tag{2.15}$$
The standard deviation of $X$ is thus
$$\sigma_X = \sqrt{\mathrm{Var}[X]} = \sqrt{E[X^2] - E[X]^2} = \sqrt{\mu_X(1 - \mu_X)}. \tag{2.16}$$
After $N$ cycles of the sampling signal, the number of counted sampling edges
is a random variable $C$ that is defined by $C = X_1 + X_2 + \cdots + X_N$. Since the sum of
i.i.d. Bernoulli random variables has a Binomial distribution, $C$ has a Binomial distribution
with mean $\mu_C = N\mu_X$ and variance $\sigma_C^2 = N\mu_X(1 - \mu_X)$. For a successful jitter
measurement, the expected value of the counter output $\mu_C$ should be proportional to
the rms jitter of the clock and the random error of the measurement, which is indicated
by
$$\frac{\sigma_C}{N} = \sqrt{\frac{1}{N}\mu_X(1 - \mu_X)}, \tag{2.17}$$
should be low. The exact values of the mean and the standard deviation of the counter
output can be analytically estimated when the clock jitter distribution is known. The
upper bound of (2.17) is reached when $\mu_X = 0.5$, which is a very unlikely case since it implies
extremely large clock jitter, such as a deterministic jitter with peak to peak amplitude
of more than a clock period. Therefore, the standard deviation of the jitter
measurement with $N = 360{,}000$ is much less than
$$\frac{\sigma_C}{N}T_{CLK} \ll \sqrt{\frac{1}{N}\,\frac{1}{2}\left(1 - \frac{1}{2}\right)}\,T_{CLK} = \frac{T_{CLK}}{1200}, \quad N = 360{,}000. \tag{2.18}$$
Hence, even without knowing the exact distribution of the jitter, the measurement 
resolution is estimated to be sub-picosecond. 
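As a rough worked example of the bound in (2.18), assume a 10 GHz clock, i.e. $T_{CLK} = 100$ ps (an illustrative value chosen here, not one prescribed by the derivation):

```latex
\frac{\sigma_C}{N}\,T_{CLK} \ll \frac{T_{CLK}}{1200} = \frac{100\ \mathrm{ps}}{1200} \approx 0.083\ \mathrm{ps}
```

Under the same reasoning, a 1 GHz clock ($T_{CLK} = 1$ ns) would give a bound of about 0.83 ps, so the sub-picosecond estimate holds for clock periods up to roughly 1.2 ns.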
The total jitter of a digital signal is composed of two types of jitter: random jitter 
and deterministic jitter. The random jitter is often assumed to be Gaussian since 
Gaussian noise processes inherent in devices, such as thermal noise and shot noise, are 
the major sources of random jitter. In addition, the accumulation of many different 
independent noise sources will result in a Gaussian random jitter by the central limit 
theorem.
The deterministic jitter may be further classified into data-dependent jitter and 
uncorrelated jitter. The data-dependent jitter is due to inter-symbol interference or 
duty cycle distortion in the data signal. Since this work focuses on the measurement of 
clock jitter, the analysis of data-dependent jitter is not considered. The uncorrelated 
jitter may be due to the coupling from other uncorrelated signals, such as an external 
clock or a periodic signal. 
 
Figure 2.12 Clock signal with random (Gaussian) jitter. Its rms/standard
deviation value is represented as $\sigma_{CLK}$.
2.3.5 Measurement of clock jitter 
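When the clock jitter is Gaussian distributed with 0 mean and standard deviation $\sigma_{CLK}$
(Figure 2.12), the mean and standard deviation of the counter output can be calculated.
Neglecting the falling edge jitter of the clock signals and focusing only on the rising
edge jitter,
$$\begin{aligned}
\mu_X &= \int_{-T_{CLK}}^{0} \mathrm{cdf\_CLK}(\tau)\,\frac{1}{T_{CLK}}\,d\tau
= \int_{-T_{CLK}}^{0} \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{\tau - 0}{\sqrt{2\sigma_{CLK}^2}}\right)\right)\frac{1}{T_{CLK}}\,d\tau \\
&= \frac{1}{2} + \frac{1}{\sqrt{2\pi}\,T_{CLK}}\left(1 - e^{-\frac{T_{CLK}^2}{2\sigma_{CLK}^2}}\right)\sigma_{CLK} + \frac{1}{2}\,\mathrm{erf}\left(-\frac{T_{CLK}}{\sqrt{2\sigma_{CLK}^2}}\right).
\end{aligned} \tag{2.19}$$
Assuming the clock jitter is small compared to its period ($T_{CLK} \gg \sigma_{CLK}$), $\mu_X$ is
approximately
$$\mu_X \approx \frac{1}{\sqrt{2\pi}}\,\frac{\sigma_{CLK}}{T_{CLK}}. \tag{2.20}$$
The falling edge jitter doubles the probability of a VCO edge being counted, hence
$$\mu_X \approx \frac{2}{\sqrt{2\pi}}\,\frac{\sigma_{CLK}}{T_{CLK}}. \tag{2.21}$$
The mean and the standard deviation of the counter output are
$$\mu_C = N\mu_X \approx \frac{2N}{\sqrt{2\pi}}\,\frac{\sigma_{CLK}}{T_{CLK}}, \tag{2.22}$$
$$\sigma_C = \sqrt{N\mu_X(1 - \mu_X)} = \sqrt{N\,\frac{2}{\sqrt{2\pi}}\,\frac{\sigma_{CLK}}{T_{CLK}}\left(1 - \frac{2}{\sqrt{2\pi}}\,\frac{\sigma_{CLK}}{T_{CLK}}\right)}. \tag{2.23}$$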
Equation (2.22) shows that the expected value of the counter output is linear with the 
clock jitter normalized to the period. This result is verified numerically by simulating
the measurement in MATLAB, where a sampling signal (running at 400 MHz with 54 
ps rms jitter) and clock signal with varying amounts of jitter are provided. Figure 2.13 
shows the mean and standard deviation of the counter output for 6 GHz and 10 GHz 
clock simulations. 
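The closed-form results (2.19) to (2.23) can be evaluated directly. The following is a minimal Python sketch, not the MATLAB simulation used for Figure 2.13; the function names are illustrative, and the value $N = 360{,}000$ is borrowed from Section 3.1 as an assumption about the simulated measurement length.

```python
import math

def mu_x_rising_exact(sigma_clk, t_clk):
    """Closed form (2.19): counting probability from rising-edge jitter only."""
    k = t_clk / math.sqrt(2.0 * sigma_clk ** 2)
    return (0.5
            + sigma_clk / (math.sqrt(2.0 * math.pi) * t_clk) * (1.0 - math.exp(-k * k))
            + 0.5 * math.erf(-k))

def mu_x_rising_approx(sigma_clk, t_clk):
    """Small-jitter approximation (2.20)."""
    return sigma_clk / (math.sqrt(2.0 * math.pi) * t_clk)

def counter_stats(n_samples, sigma_clk, t_clk):
    """Mean and standard deviation of the counter output, (2.21) to (2.23)."""
    mu_x = 2.0 / math.sqrt(2.0 * math.pi) * sigma_clk / t_clk   # (2.21)
    mu_c = n_samples * mu_x                                      # (2.22)
    sigma_c = math.sqrt(n_samples * mu_x * (1.0 - mu_x))         # (2.23)
    return mu_c, sigma_c

N = 360_000               # assumed number of sampling edges (Section 3.1)
T_CLK_PS = 1e12 / 6e9     # period of a 6 GHz clock, in ps

for sigma_ps in (1.0, 2.0, 4.0, 6.0):
    mu_c, sigma_c = counter_stats(N, sigma_ps, T_CLK_PS)
    print(f"sigma_CLK = {sigma_ps:.1f} ps: mean count = {mu_c:.0f}, std = {sigma_c:.1f}")
```

Doubling the rms jitter doubles the mean count, which is the linearity stated by (2.22), while the standard deviation grows roughly as the square root of the mean.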
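When the clock jitter has a bimodal distribution consisting of two Gaussian random jitter
distributions of standard deviation $\sigma_{Rand}$, each displaced by a deterministic distance
$\tau_{\delta\delta}$ from the ideal edge (Figure 2.14), it can be shown (see Appendix 2.4.1) that the
expected value of the counter output is
$$\mu_C \approx \frac{2N}{\sqrt{2\pi}}\,\frac{\sigma_{Rand}}{T_{CLK}}\left(\frac{\sqrt{2\pi}}{2}\,\frac{\tau_{\delta\delta}}{\sigma_{Rand}}\operatorname{erf}\left(\frac{1}{\sqrt{2}}\,\frac{\tau_{\delta\delta}}{\sigma_{Rand}}\right)+e^{-\frac{1}{2}\frac{\tau_{\delta\delta}^{2}}{\sigma_{Rand}^{2}}}\right), \tag{2.24}$$
and for periodic jitter with peak deviation $\tau_{PJ}$, it is
$$\mu_C = \frac{2N}{\pi T_{CLK}}\,\tau_{PJ}. \tag{2.25}$$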
 
[Figure 2.13 plots: Count (x1000) and Count Standard Deviation versus RMS Clock Jitter (ps), 0 to 6 ps.]
Figure 2.13 Mean and standard deviation of the counter output for the
simulations of 6 GHz (top/red) and 10 GHz (bottom/blue) clocks with random
jitter. The simulated results (square points) match the theoretical estimates
(lines) provided by (2.22) and (2.23).

Figure 2.14 Clock signal with bimodal/Dirac distributed deterministic jitter and
small random (Gaussian) jitter. The distance between the ideal clock edge and
the Dirac peaks shown in dotted lines is $\tau_{\delta\delta}$.
Equation (2.24) shows that the expected value of the counter output is linear in the
random jitter ($\sigma_{Rand}$) or the deterministic jitter ($\tau_{\delta\delta}$) when either of them dominates.
Similarly, a linear relationship is shown by (2.25) for periodic jitter. The standard
deviation of the counter output for bimodal and periodic jitter is also presented in the
appendix.
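The two limiting regimes of (2.24) and the linear law (2.25) can be checked numerically. The sketch below is an illustration under assumed values ($N = 360{,}000$ and a 6 GHz clock); the function names are hypothetical.

```python
import math

def mu_c_bimodal(n, sigma_rand, tau_dd, t_clk):
    """Expected counter output for dual-Dirac plus Gaussian jitter, (2.24)."""
    ratio = tau_dd / sigma_rand
    shape = (math.sqrt(2.0 * math.pi) / 2.0 * ratio * math.erf(ratio / math.sqrt(2.0))
             + math.exp(-0.5 * ratio ** 2))
    return 2.0 * n / math.sqrt(2.0 * math.pi) * sigma_rand / t_clk * shape

def mu_c_periodic(n, tau_pj, t_clk):
    """Expected counter output for single-tone periodic jitter, (2.25)."""
    return 2.0 * n / (math.pi * t_clk) * tau_pj

N = 360_000
T_CLK_PS = 1e12 / 6e9

# Random jitter dominates: (2.24) reduces to the Gaussian result (2.22).
print(mu_c_bimodal(N, 2.0, 0.0, T_CLK_PS))
# Deterministic jitter dominates: (2.24) approaches N * tau_dd / T_CLK.
print(mu_c_bimodal(N, 1.0, 10.0, T_CLK_PS), N * 10.0 / T_CLK_PS)
# Periodic jitter: linear in the peak deviation.
print(mu_c_periodic(N, 3.0, T_CLK_PS))
```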
2.3.6 Measurement error due to the internal noise of the circuit and the REF 
CLK jitter 
The internal noise of the jitter/delay measurement circuit causes the sampling signal to
sample Input 1 and Input 2 at slightly different times (excluding the deliberate delay
using the delay block). Therefore, the edges of the JIT CLK and REF CLK will
effectively have random relative variations and the counter output will be non-zero
even if the input signals have no jitter. As the measurement circuit is designed to be
isolated from supply, substrate and coupling noise, the internal noise is dominated by
device noise and assumed white Gaussian. Since the input clock is external to the
measurement circuit, the clock jitter and the sampling jitter are independent.
Therefore, the random jitter that the circuit measures has standard deviation
$$\sigma_{Total} = \sqrt{\sigma_{Rand}^{2}+\sigma_{INT}^{2}}, \tag{2.26}$$
where $\sigma_{INT}$ is the sampling jitter due to the internal noise of the measurement circuit.
When the deterministic jitter is negligible, the expected value of the counter
output is
$$\mu_C \approx \frac{2N}{\sqrt{2\pi}}\,\frac{\sqrt{\sigma_{Rand}^{2}+\sigma_{INT}^{2}}}{T_{CLK}}. \tag{2.27}$$
In order to get a linear relationship with the rms clock jitter, the internal jitter must be
minimized. In addition, the internal jitter also causes the counter output to have
slightly more variance ($\sigma_C^{2}$) since $\sigma_C$ increases with $\mu_C$. However, the measurement
accuracy is still much better than the limit obtained in (2.18) because $\mu_C/N$ is still
much smaller than the $\mu_X = 0.5$ assumed for the limit.
When the deterministic jitter is the dominant source of the clock jitter, it needs to
be larger than both the random jitter of the clock and the internal jitter in order
for the measurement to be linear in the deterministic jitter.
If REF CLK has jitter, the circuit measures the "relative" jitter between JIT CLK
and REF CLK. In other words, the circuit measures how much the edges of JIT CLK
vary with respect to the jittery edges of REF CLK. If REF CLK and JIT CLK have
correlated jitter, such as when they come from the same source, the correlated
component of the jitter is ignored by the measurement circuit. Therefore, the
measurement of $\sigma_{INT}$ during calibration does not require the input clock signals to be
jitter free. Input 1 and Input 2 can be shorted, thereby eliminating the clock jitter that
can be measured by the circuit. The measurement results presented in Section 3.2 are
adjusted for $\sigma_{INT}$ using this method.
If REF CLK jitter in externally referenced mode is uncorrelated with JIT CLK
jitter, then REF CLK jitter is indistinguishable from the internal jitter. In other words,
the standard deviations add in quadrature, as in (2.26).
In self-referenced mode, where REF CLK is a one-period-delayed version of JIT
CLK, the period jitter is measured since successive clock edges are compared.
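The compensation for the internal jitter follows from inverting (2.27) and then (2.26). The sketch below illustrates this arithmetic; the function names are hypothetical, the 2 ps clock jitter is an assumed example, and the 0.07 ps internal jitter is the value reported in Section 3.2.

```python
import math

def sigma_total_from_count(mean_count, n_samples, t_clk):
    """Invert (2.27): total rms jitter implied by the mean counter output."""
    return mean_count * math.sqrt(2.0 * math.pi) * t_clk / (2.0 * n_samples)

def remove_internal_jitter(sigma_total, sigma_int):
    """Invert (2.26); clamp at zero in case noise makes the difference negative."""
    return math.sqrt(max(sigma_total ** 2 - sigma_int ** 2, 0.0))

N = 360_000
T_CLK_PS = 1e12 / 6e9
SIGMA_INT_PS = 0.07      # internal jitter from calibration with shorted inputs
SIGMA_RAND_PS = 2.0      # assumed clock jitter for this example

# Forward model (2.27), then the inverse used for compensation.
count = 2.0 * N / math.sqrt(2.0 * math.pi) * math.hypot(SIGMA_RAND_PS, SIGMA_INT_PS) / T_CLK_PS
sigma_total = sigma_total_from_count(count, N, T_CLK_PS)
sigma_rand = remove_internal_jitter(sigma_total, SIGMA_INT_PS)
print(f"total = {sigma_total:.4f} ps, compensated = {sigma_rand:.4f} ps")
```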
2.4 Appendix
2.4.1 Counter output when the clock jitter has bimodal distribution 
In addition to random jitter, deterministic jitter can be a considerable source of 
timing uncertainty in clock signals. The simplest model for a deterministic jitter is the 
dual Dirac model, which assumes a bimodal distribution of jitter as shown in Figure 
2.14. The clock edges are assumed to transition at either of two fixed positions from 
the ideal point due to a deterministic noise source such as coupling from an external 
clock signal. The timing of the edges may further vary due to random jitter. 
The dual Dirac model does not perfectly represent the actual deterministic jitter in 
real life, but it is nonetheless widely accepted in industry [23]. Therefore, we analyze 
in this subsection the quality of the jitter measurement assuming the clock jitter fits the 
dual Dirac model. A more realistic model with periodic jitter is analyzed in Appendix 
2.4.2. 
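Two jitter variables are considered in the model: the standard deviation of the
Gaussian random jitter, $\sigma_{Rand}$, and the deviation of the deterministic Dirac
displacement from the ideal position, $\tau_{\delta\delta}$. The cdf of the jitter is
$$\mathrm{cdf}_{CLK}(\tau) = \frac{1}{4}\left(1+\operatorname{erf}\left(\frac{\tau-\tau_{\delta\delta}}{\sqrt{2\sigma_{Rand}^{2}}}\right)\right)+\frac{1}{4}\left(1+\operatorname{erf}\left(\frac{\tau+\tau_{\delta\delta}}{\sqrt{2\sigma_{Rand}^{2}}}\right)\right). \tag{2.28}$$
Similar to the analysis in Section 2.3.5, the probability that a sampling edge is counted
is
$$\begin{aligned}
\mu_X &= 2\int_{-T_{CLK}}^{0} \mathrm{cdf}_{CLK}(\tau)\,\frac{1}{T_{CLK}}\,d\tau \\
&= 2\int_{-T_{CLK}}^{0}\left(\frac{1}{4}\left(1+\operatorname{erf}\left(\frac{\tau-\tau_{\delta\delta}}{\sqrt{2\sigma_{Rand}^{2}}}\right)\right)+\frac{1}{4}\left(1+\operatorname{erf}\left(\frac{\tau+\tau_{\delta\delta}}{\sqrt{2\sigma_{Rand}^{2}}}\right)\right)\right)\frac{1}{T_{CLK}}\,d\tau \\
&= 1+\frac{1}{2\kappa}\left(2\gamma\operatorname{erf}(\gamma)+\frac{2e^{-\gamma^{2}}}{\sqrt{\pi}}+(\kappa+\gamma)\operatorname{erf}(-\kappa-\gamma)-\frac{e^{-(\kappa+\gamma)^{2}}}{\sqrt{\pi}}-(-\kappa+\gamma)\operatorname{erf}(-\kappa+\gamma)-\frac{e^{-(-\kappa+\gamma)^{2}}}{\sqrt{\pi}}\right),
\end{aligned} \tag{2.29}$$
where
$$\kappa = \frac{1}{\sqrt{2}}\,\frac{T_{CLK}}{\sigma_{Rand}} \tag{2.30}$$
and
$$\gamma = \frac{1}{\sqrt{2}}\,\frac{\tau_{\delta\delta}}{\sigma_{Rand}}. \tag{2.31}$$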
Assuming the clock period is much larger than the jitter (i.e. $T_{CLK} \gg \sigma_{Rand}$ and
$T_{CLK} \gg \tau_{\delta\delta}$), (2.29) becomes
$$\mu_X \approx \frac{1}{\sqrt{\pi}\,\kappa}\left(\sqrt{\pi}\,\gamma\operatorname{erf}(\gamma)+e^{-\gamma^{2}}\right). \tag{2.32}$$
The expected value of the counter output is
$$\mu_C = N\mu_X \approx \frac{N}{\sqrt{\pi}\,\kappa}\left(\sqrt{\pi}\,\gamma\operatorname{erf}(\gamma)+e^{-\gamma^{2}}\right). \tag{2.33}$$
The plot of the expression inside the parentheses in (2.33) with respect to $\gamma$ is shown
in Figure 2.15. If the deterministic jitter is larger than the random jitter (i.e., $\gamma > 1$),
(2.33) becomes
$$\mu_C \approx \frac{N}{\sqrt{\pi}\,\kappa}\,\sqrt{\pi}\,\gamma = \frac{\tau_{\delta\delta}}{T_{CLK}}\,N, \tag{2.34}$$
which means the expected value of the counter output is linear in the deterministic
jitter. For small values of $\gamma \ll 1$, (2.33) can be approximated using a Taylor expansion
as
$$\mu_C \approx \frac{N}{\sqrt{\pi}\,\kappa}\left(1+\gamma^{2}\right) = \frac{N}{\sqrt{\pi}\,T_{CLK}}\left(\sqrt{2}\,\sigma_{Rand}+\gamma\,\tau_{\delta\delta}\right), \tag{2.35}$$
which shows that the counter output is a measure of both random and deterministic
jitter. If the random jitter is the dominating source of the total clock jitter ($\gamma \approx 0$), then
(2.33) matches (2.22) obtained in Section 2.3.5. MATLAB simulations numerically
verifying this result are shown in Figure 2.16.
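As a cross-check of the derivation, the closed form (2.29), its approximation (2.32), and a direct numerical integration of the cdf (2.28) can be compared. This is a small Python sketch with illustrative function names and assumed example values (6 GHz clock, $\sigma_{Rand}$ = 2 ps, $\tau_{\delta\delta}$ = 3 ps); it is separate from the MATLAB simulations of Figure 2.16.

```python
import math

SQRT_PI = math.sqrt(math.pi)

def kappa_gamma(sigma_rand, tau_dd, t_clk):
    kappa = t_clk / (math.sqrt(2.0) * sigma_rand)    # (2.30)
    gamma = tau_dd / (math.sqrt(2.0) * sigma_rand)   # (2.31)
    return kappa, gamma

def mu_x_exact(sigma_rand, tau_dd, t_clk):
    """Closed form (2.29)."""
    k, g = kappa_gamma(sigma_rand, tau_dd, t_clk)
    bracket = (2.0 * g * math.erf(g) + 2.0 * math.exp(-g * g) / SQRT_PI
               + (k + g) * math.erf(-k - g) - math.exp(-(k + g) ** 2) / SQRT_PI
               - (-k + g) * math.erf(-k + g) - math.exp(-(-k + g) ** 2) / SQRT_PI)
    return 1.0 + bracket / (2.0 * k)

def mu_x_approx(sigma_rand, tau_dd, t_clk):
    """Approximation (2.32), valid when T_CLK is much larger than the jitter."""
    k, g = kappa_gamma(sigma_rand, tau_dd, t_clk)
    return (SQRT_PI * g * math.erf(g) + math.exp(-g * g)) / (SQRT_PI * k)

def mu_x_numeric(sigma_rand, tau_dd, t_clk, steps=20_000):
    """Midpoint-rule integration of 2 * cdf(tau) / T_CLK over [-T_CLK, 0]."""
    s = math.sqrt(2.0) * sigma_rand
    def cdf(tau):                                    # (2.28)
        return (0.25 * (1.0 + math.erf((tau - tau_dd) / s))
                + 0.25 * (1.0 + math.erf((tau + tau_dd) / s)))
    h = t_clk / steps
    total = sum(cdf(-t_clk + (i + 0.5) * h) for i in range(steps))
    return 2.0 * total * h / t_clk

T_CLK_PS = 1e12 / 6e9
for name, fn in (("exact", mu_x_exact), ("approx", mu_x_approx), ("numeric", mu_x_numeric)):
    print(name, fn(2.0, 3.0, T_CLK_PS))
```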
 
 
Figure 2.15 The expression in parentheses in (2.33) and its approximations at
small and large $\gamma$ values (left). The same plot normalized to the expression
(right).
2.4.2 Counter output when the clock jitter is periodic 
An example of a clock with periodic deterministic jitter is analyzed in this
subsection. The total clock jitter, as shown in Figure 2.17, is the convolution of
Gaussian random jitter and periodic deterministic jitter. Assuming the periodic jitter is
single tone, its pdf can be modeled as
$$\mathrm{pdf}_{PJ}(\tau) = \begin{cases} \dfrac{1}{\pi\sqrt{\tau_{PJ}^{2}-\tau^{2}}} & |\tau| < \tau_{PJ} \\ 0 & |\tau| \ge \tau_{PJ} \end{cases}, \tag{2.36}$$
where $\tau_{PJ}$ is the peak deviation of the periodic jitter from the ideal location. Since
calculating the convolution mathematically is infeasible, we calculate $\mu_X$ assuming the
random jitter is negligible. The cdf of the periodic jitter is
$$\mathrm{cdf}_{PJ}(\tau) = \begin{cases} 0 & \tau \le -\tau_{PJ} \\ \dfrac{1}{\pi}\left(\arcsin\dfrac{\tau}{\tau_{PJ}}+\dfrac{\pi}{2}\right) & |\tau| < \tau_{PJ} \\ 1 & \tau \ge \tau_{PJ} \end{cases}. \tag{2.37}$$
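Hence, we obtain
$$\begin{aligned}
\mu_X &= 2\int_{-T_{CLK}}^{0} \mathrm{cdf}_{PJ}(\tau)\,\frac{1}{T_{CLK}}\,d\tau \\
&= 2\int_{-\tau_{PJ}}^{0}\left(\frac{1}{\pi}\arcsin\frac{\tau}{\tau_{PJ}}+\frac{1}{2}\right)\frac{1}{T_{CLK}}\,d\tau \\
&= \frac{2}{\pi T_{CLK}}\,\tau_{PJ},
\end{aligned} \tag{2.38}$$
where the lower limit of the second integral is $-\tau_{PJ}$ because the cdf vanishes for
$\tau \le -\tau_{PJ}$ (assuming $\tau_{PJ} < T_{CLK}$).
The expected value of the counter output is
$$\mu_C = N\mu_X = \frac{2N}{\pi T_{CLK}}\,\tau_{PJ} \tag{2.39}$$
and it has a linear relationship with the periodic jitter amplitude. This result is verified
numerically using MATLAB simulations (Figure 2.18). The case when the random
jitter is non-negligible is also shown in Figure 2.18. Similar to the jitter with dual
Dirac distribution, $\mu_C$ is linear in $\sigma_{Rand}$ if the random jitter is much larger than the
deterministic jitter.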
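The closed form (2.38) can be checked by integrating the cdf (2.37) numerically. The following Python sketch uses a simple midpoint rule; the function names and the example values (6 GHz clock, 3 ps peak deviation) are assumptions for illustration only.

```python
import math

def cdf_pj(tau, tau_pj):
    """cdf of single-tone periodic jitter, (2.37)."""
    if tau <= -tau_pj:
        return 0.0
    if tau >= tau_pj:
        return 1.0
    return (math.asin(tau / tau_pj) + math.pi / 2.0) / math.pi

def mu_x_numeric(tau_pj, t_clk, steps=200_000):
    """Midpoint-rule integration of 2 * cdf_PJ(tau) / T_CLK over [-T_CLK, 0]."""
    h = t_clk / steps
    total = 0.0
    for i in range(steps):
        total += cdf_pj(-t_clk + (i + 0.5) * h, tau_pj)
    return 2.0 * total * h / t_clk

def mu_x_closed_form(tau_pj, t_clk):
    """Closed form (2.38)."""
    return 2.0 * tau_pj / (math.pi * t_clk)

T_CLK_PS = 1e12 / 6e9
print(mu_x_numeric(3.0, T_CLK_PS), mu_x_closed_form(3.0, T_CLK_PS))
```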
 
[Figure 2.16 panels (a), (b), (c): Counter Output on the vertical axis, horizontal axis from 0 to 10. Legend: ±3 standard deviations from the mean; theoretical estimates provided by (2.33); the mean of 1000 simulations.]
Figure 2.16 MATLAB simulation of the counter output when the 6 GHz JIT CLK has a
bimodal/Dirac distributed deterministic jitter and random Gaussian jitter as
shown in Figure 2.14. (a) When the Dirac peak distance is proportional to the
rms jitter, the expected value of the count has a linear relationship with $\sigma_{Rand}$ since
the general shape of the jitter distribution does not change, but it only widens,
increasing the counting region. (b) With constant $\tau_{\delta\delta}$, the expected value of the
count has a linear relationship with large $\sigma_{Rand}$ where the random jitter is
dominant. For small rms jitter, the deterministic jitter is dominant and the
expected value of the count is constant since $\tau_{\delta\delta}$ is set constant. (c) Similarly, with
constant $\sigma_{Rand}$, the expected value of the count has a linear relationship with large
$\tau_{\delta\delta}$ where the deterministic jitter is dominant. For small deterministic jitter, the
random jitter is dominant and the expected value of the count is constant since
$\sigma_{Rand}$ is set constant.
 
 
 
 
Figure 2.17 Clock signal with single tone periodic jitter (shown in dotted line) and
random Gaussian jitter (not shown). The convolution of the periodic jitter and
the Gaussian jitter results in the distribution shown in solid line. The distance
between the ideal clock edge and the farthest periodic jitter is $\tau_{PJ}$. The random
jitter has standard deviation $\sigma_{Rand}$.
 
[Figure 2.18 panels (a), (b), (c): Counter Output on the vertical axis, horizontal axis from 0 to 10. Legend: ±3 standard deviations from the mean; theoretical estimates provided by (2.22); theoretical estimates provided by (2.39); linear fit; the mean of 1000 simulations.]
Figure 2.18 MATLAB simulation of the counter output when the 6 GHz JIT CLK has a
deterministic periodic jitter and random Gaussian jitter as shown in Figure 2.17.
Similar to Figure 2.16, (a) when the periodic peak distance is proportional to the rms
jitter, the expected value of the count has a linear relationship with $\sigma_{Rand}$. (b) With
constant $\tau_{PJ}$, the expected value of the count has a linear relationship with large
$\sigma_{Rand}$ where the random jitter is dominant. For small rms jitter, the periodic
jitter is dominant and the expected value of the count is constant. (c) Similarly,
with constant $\sigma_{Rand}$, the expected value of the count has a linear relationship with
large $\tau_{PJ}$ where the periodic jitter is dominant. For small periodic jitter, the
random jitter is dominant and the expected value of the count is constant.
  
  
 
 
 
 
 
Chapter 3  
ON-CHIP TIMING MEASUREMENT CIRCUIT 
The previous chapter provides an analytical treatment of the stochastic 
measurement techniques for timing of clock signals. In this chapter, a circuit solution 
to demonstrate these techniques is introduced. 
3.1 Overall architecture 
A simplified block diagram of a circuit implementation of the proposed technique
is shown in Figure 3.1. The sampling signal is generated by a voltage controlled
oscillator that is designed to have as much Gaussian jitter as possible. The edge
detector block checks whether Input 1 is high and Input 2 is low when the sampling
edge arrives. If they are, the Edge counter is incremented by the falling edge of the
sampling signal. The Reset counter counts every edge of the sampling signal and when
$N$ sampling edges are received, it triggers a register to save the output of the Edge counter
and then resets itself as well as the Edge counter. Since the measurement runs for
$N = 360{,}000$ cycles of the sampling signal, the two counters are 19 bits wide. Although
the implementation of a counter that counts to a value that is a power of 2 (e.g. $N = 2^{19}$)
is simpler and a larger $N$ improves accuracy, the measurement duration and energy
consumption increase with $N$. Therefore, $N$ is kept at 360,000 to obtain the shortest
measurement time with good accuracy. The delay block (Figure 3.2a) is a variable
delay line that can be used in self-referenced jitter measurement mode to delay the
clock signal by one period to generate the reference signal. It is also used to calibrate
the two input signals to have the same phase. A duty cycle calibration circuit is
integrated in the edge detector block, which is further described in Section 3.1.2.
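The counting behaviour described above can be mimicked with a short behavioural model. The Python sketch below is an idealization and not the circuit itself: it assumes jitter-free 50% duty-cycle square waves, sampling instants uniformly distributed over the clock period, and Input 2 equal to Input 1 delayed by a fixed amount. The function and variable names are hypothetical.

```python
import random

def run_measurement(delay, t_clk, n_samples, rng):
    """Count sampling edges that find Input 1 high and Input 2 low."""
    edge_counter = 0
    reset_counter = 0
    while reset_counter < n_samples:
        phase = rng.random() * t_clk                      # random sampling instant
        input1 = phase < t_clk / 2.0                      # Input 1 high in first half period
        input2 = ((phase - delay) % t_clk) < t_clk / 2.0  # Input 2 is Input 1 delayed
        if input1 and not input2:                         # AND gate enables the Edge counter
            edge_counter += 1
        reset_counter += 1                                # Reset counter counts every edge
    return edge_counter

N = 360_000
COUNTER_BITS = N.bit_length()     # smallest width able to hold N

rng = random.Random(1)
T_CLK = 1.0                       # normalized clock period
count = run_measurement(delay=0.25 * T_CLK, t_clk=T_CLK, n_samples=N, rng=rng)
print(f"counter width = {COUNTER_BITS} bits, count = {count}, N * delay / T = {0.25 * N:.0f}")
```

Under these assumptions the count is close to $N$ times the delay normalized to the period, which is the linear delay dependence exploited in delay measurement mode.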
 
Figure 3.1. Simplified block diagram of the proposed technique, which can be
used for measurement of clock jitter, delay, and duty-cycle.
[Blocks shown: Input 1, Input 2, Delay, Edge Detector, VCO, Edge Counter (En, R), Reset Counter, Parallel to Serial, Output.]

(a)                                                           (b)
Figure 3.2. (a) The delay block consisting of 8 stages of current starved inverters.
The delay is adjusted by the current starving transistors. (b) High-jitter VCO.
[Signals labelled: In, Out, Vp, Vn, Vctrl, Vout.]
  
 
 
 
 
 
3.1.1 Jittery oscillator 
As explained in Section 2.3.2, the sampling signal should have as much Gaussian 
jitter as possible. In order to achieve this, several design choices were made for the 
VCO. To minimize unwanted noise, the supply and control voltages are decoupled 
with large capacitors. The substrate is shielded from the other (mainly digital) circuitry 
on the same chip. The VCO is a current-starved, single-ended ring oscillator (Figure 
3.2b). In order to maximize the jitter due to thermal noise, the VCO devices are set to 
be near-minimum size, consistent with the observations reported in [22]: jitter due to 
white noise in CMOS ring oscillator decreases with transistor width and length (below 
an optimum length). The minimum size also results in small current consumption, 
which helps with stabilizing the supply. Large resistors are added at the gates of the 
current starving devices to further increase the jitter using the thermal noise of the 
resistors. The final tweaking of device sizes with simulation was performed to 
maximize thermal noise while keeping the flicker noise corner low since flicker noise 
also increases with decreasing device area. 
The VCO frequency should be set low enough that enough jitter is accumulated to
highly randomize the sampling edges. On the other hand, setting the frequency too
low prolongs the measurement since the measurement takes N VCO cycles. We run
the VCO at 400 MHz, which results in a measurement time of about 0.9 ms.
  
  
 
 
 
 
 
 
(a)                                                           (b) 
Figure 3.3. (a) Edge detector block (b) D-flip flop with duty-cycle tuning option. 
 
Figure 3.4. Placing the adjustable delay block on the sampling signal path to 2nd 
D-flip flop (right) achieves the same result as placing it on the path of the high 
speed Input 1 (left). 
3.1.2 Edge detector 
The edge detector comprises two D-flip flops, an AND gate and a multiplexer
(mux) (Figure 3.3a). The D-flip flops check the state of the inputs when the sampling
(VCO) edge arrives. The AND gate then produces an enable signal for the edge
counter if the inputs are in the correct state. The mux connects Input 1 to the second
D-flip flop in self-referenced jitter measurement mode.
The main source of the internal jitter, $\sigma_{INT}$ as mentioned in Section 2.3.6, is the
variation in the latching time of the flip flops. Therefore, in order to reduce the
variation and speed up the latching time, the first stage of the flip flops is designed to
be a dynamic comparator (Figure 3.3b). The simulated input-referred jitter of the flip
flops is 32.8 fs.
The duty cycle calibration is implemented in the dynamic comparator by adjusting 
the current available through each branch of the dynamic comparator. 
3.1.3 Delay block 
As mentioned in previous sections, adjusting the delay between the two inputs is 
necessary in self-referenced jitter measurement and calibration. Instead of placing a 
delay block in the path of the high-speed input signal, this design incorporates the 
adjustable delay block on the sampling path to the second D-flip flop. This results in 
delaying the sampling of Input 2 and achieves the same results as placing the 
adjustable delay block in the path of Input 1 (Figure 3.4). 
  
 
 
 
 
 
 
(a)                                                           (b) 
Figure 3.5. (a) Die photo. (b) VCO period histogram. Period jitter = 54.5 ps rms. 
3.2 Measurements 
A circuit demonstrating the application of this technique was fabricated in 65nm 
TSMC CMOS process and initial measurements were reported in [21]. Figure 3.5a 
shows the die photo of the chip. The edge detector blocks are positioned sparsely in 
order to match the pitch of the high-speed probe that was used to supply the clock 
signals. When the circuit is integrated into a system, the blocks can be placed in a 
compact area.  
The setup for delay measurement mode includes an Anritsu MP1763C signal
generator, the fabricated chip and a Keysight Infiniium 90000A oscilloscope. The signal
generator is used to create two 6 GHz clocks with adjustable delay in 1 ps steps. The
chip outputs the digital counter value in series using a shift register. The digital
oscilloscope is used to capture the chip output. The VCO is set to run at 400 MHz and
N=360,000. The measurement is repeated 200 times at each delay point in order to
capture the random variation in counter output.
Out of the simulated period jitter of approximately 54 ps rms of the free running 
VCO, about 88% is from the resistors, 10% is from the current-starving devices, 1% is 
from the inverting devices and the other 1% is from the bias mirror devices. Here, the 
percentage is calculated as the ratio of the contributed variance to the total jitter 
variance since the jitter contributions add in variance (instead of standard deviation) as 
they are independent. The period jitter of the VCO is measured to be 54.5 ps rms 
(Figure 3.5b). 
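Because the percentages above are shares of the variance, the rms contribution of each source is the total rms jitter multiplied by the square root of its share. The following Python sketch works through this arithmetic using the simulated figures quoted above; the dictionary keys are only labels.

```python
import math

TOTAL_RMS_PS = 54.0   # simulated period jitter of the free running VCO

# Share of the total jitter variance contributed by each noise source.
variance_share = {
    "resistors": 0.88,
    "current-starving devices": 0.10,
    "inverting devices": 0.01,
    "bias mirror devices": 0.01,
}

# Independent contributions add in variance, so each rms value scales with sqrt(share).
rms_ps = {name: TOTAL_RMS_PS * math.sqrt(share) for name, share in variance_share.items()}
recombined_ps = math.sqrt(sum(value ** 2 for value in rms_ps.values()))

for name, value in rms_ps.items():
    print(f"{name}: {value:.2f} ps rms")
print(f"root-sum-square of contributions: {recombined_ps:.2f} ps rms")
```

The resistors alone account for roughly 50.7 ps rms, so removing every other source would reduce the total jitter only slightly.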
Each chip is calibrated before it is used for measuring the clock jitter and delay. A
clock signal is supplied to one input while the other input is held constant (at either 0
or VDD) so that the counter output represents the duty cycle, since the counting region
then coincides with one phase of the clock signal. The duty-cycle calibration circuitry is
then tuned so that a slight offset from the 50% clock duty cycle and the process mismatch
in the edge detector are compensated. The internal and clock jitter will not affect the
duty cycle measurement since the sampling edges capture both sides of the jitter
distribution when the second input is held constant. A clock with no added jitter is then
supplied to both inputs of the circuit, and the delay block is tuned until the counter
output is minimum. The small but non-zero count corresponds to the effective jitter
due to the noise in the measurement circuit (including the edge detector) and is
measured to be 0.07 ps rms. This internal jitter is compensated for in the measurement
plots shown below in order to show the linear relationship between counter output and
the injected jitter amplitude.
 
[Figure 3.6 plot: Counter Output versus delay, from -0.5T to 0.5T.]
Figure 3.6 On-chip delay measurement.

(a)                                                           (b)
[Figure 3.7 plots: Measured Jitter (ps) versus Injected Jitter (ps), both from 0 to 6 ps.]
Figure 3.7. On-chip jitter measurement in (a) externally referenced mode and (b)
self-referenced mode where the clock is delayed by one period. Counter outputs
are converted to show the estimated rms jitter in ps.
The delay measurement result is shown in Figure 3.6. As expected, the mean
counter output has a linear relationship with the delay between the input
signals. The standard deviation of the measurement at the delay of $T_{CLK}/2$ is 0.14 ps.
The performance of the circuit in delay measurement mode is compared with the
measured results of previous work in Table 3.1.
In externally referenced jitter measurement mode, a Keysight N4903B is used to
supply JIT CLK and REF CLK at 6 GHz. The JIT CLK is a jitter-added version of
REF CLK and the added rms jitter is swept from 0 ps to 6 ps, with the maximum
limited by the equipment. In self-referenced jitter measurement mode, only JIT CLK
is supplied by the equipment and it is connected to both D-flip flops internally on chip,
with the second flip flop being sampled one clock period after the first one. The rest of
the setup is the same as the delay measurement. Figure 3.7 shows the expected value
of the measurement for different rms jitter values in externally referenced and
self-referenced modes when the internal noise of the circuit is compensated.
The root mean square error (RMSE) of the measured data relative to the actual
injected jitter amplitude is calculated to quantify the measurement error instead of
reporting two metrics: the difference between the mean of the measured jitter and the
actual injected jitter, which represents accuracy, and the standard deviation of the
measurements, which represents precision. The RMSE for externally referenced mode
is 0.102 ps and for self-referenced mode, it is 0.308 ps. The circuit consumes 0.89
mW. Table 3.2 compares the previous jitter measurement circuits with this work,
which is the only technique that does not require off-chip post processing to provide
an estimate of the jitter amplitude. Integrating the post processing on-chip would
significantly increase the power and area costs of the other works.
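The RMSE combines both error metrics in a single number, since its square equals the squared mean offset plus the variance of the measurements. The Python sketch below illustrates this with made-up readings; the sample values are hypothetical and are not measured data from the chip.

```python
import math

def rmse(measured, actual):
    """Root mean square error of repeated measurements against the true value."""
    return math.sqrt(sum((m - actual) ** 2 for m in measured) / len(measured))

def offset_and_std(measured, actual):
    """Mean offset from the true value and (population) standard deviation."""
    mean = sum(measured) / len(measured)
    variance = sum((m - mean) ** 2 for m in measured) / len(measured)
    return mean - actual, math.sqrt(variance)

injected_ps = 2.0                               # hypothetical injected rms jitter
measured_ps = [2.10, 1.90, 2.20, 2.00, 2.05]    # hypothetical repeated readings

error = rmse(measured_ps, injected_ps)
offset, std = offset_and_std(measured_ps, injected_ps)
print(f"RMSE = {error:.4f} ps, offset = {offset:.4f} ps, std = {std:.4f} ps")
print(f"sqrt(offset^2 + std^2) = {math.hypot(offset, std):.4f} ps")
```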
3.3 Conclusion 
On-chip measurement of timing non-idealities is increasingly important for
characterizing and improving high performance circuits where the high speed clock
reduces the margin for timing errors. We presented a stochastic technique that can be
used for on-chip measurement of clock jitter, delay and duty-cycle. We provided
theoretical analysis showing that the stochastic measurement, based on a simple noisy
sampler that requires neither timing-accurate nor process-accurate circuitry, can be
accurate and robust. We implemented the technique in a CMOS process and demonstrated the
expected functionalities. The chip consumes 0.89 mW, occupies an active area of
$1.5\cdot10^{4}\ \mu m^{2}$ and achieves measurement error below 0.31 ps. To the best of our
knowledge, it is the only work demonstrated to measure all three of the mentioned
timing non-idealities. It is also the only fully on-chip jitter measurement circuit that
does not require off-chip processing.
There are two ideas that I would have liked to try if I had more time and that have
been left for the future. First, applying the timing measurement circuit in a feedback
loop to adjust control parameters and demonstrating the calibrated/controlled outcome
would have enhanced this work. For example, the proposed circuit could be used to
monitor the output phase of phase interpolators (PIs) or delay locked loops (DLLs).
With the feedback loop, the PIs and DLLs can generate the output signals with higher
phase accuracy.
Second, it would be interesting to explore the application of the proposed circuit
when it is deliberately made to be less precise. Currently, each measurement runs for
$N = 360{,}000$ cycles, taking about 1 ms. Hence, any timing changes on a shorter time
scale are averaged over the measurement period. This limits the application of the
circuit to infrequent calibration or monitoring of slow variations such as aging effects
and temperature change. However, if high precision is not crucial, $N$ can be
significantly reduced with small degradation in precision because the measurement
error scales as $\sqrt{1/N}$, according to (2.9).
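The trade-off between measurement time and precision can be made concrete with a few lines of arithmetic. This Python sketch assumes the 400 MHz VCO frequency from Section 3.1.1 and the square-root scaling cited from (2.9); the reduced values of $N$ are arbitrary examples.

```python
import math

F_VCO_HZ = 400e6      # sampling (VCO) frequency from Section 3.1.1
N_REF = 360_000       # cycles per measurement in the implemented circuit

def measurement_time_s(n_cycles):
    """One measurement lasts N cycles of the sampling signal."""
    return n_cycles / F_VCO_HZ

def relative_error(n_cycles):
    """Measurement error relative to N_REF, assuming error scales as sqrt(1/N)."""
    return math.sqrt(N_REF / n_cycles)

for n in (360_000, 36_000, 3_600):
    print(f"N = {n}: time = {measurement_time_s(n) * 1e6:.0f} us, "
          f"error = {relative_error(n):.2f}x")
```

Under this scaling, a hundredfold reduction of $N$ shortens the measurement to 9 µs while increasing the error only tenfold.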
 
  
  
 
 
Table 3.1: Comparison with Previous On-Chip Delay Measurement Circuits
Work       Test Clock  Measurement   Counter   Measurement     Power    Area      Process
           Speed       Error         Length    Time (µs)                (µm²)
[7]        0.2 GHz     1 ps          2.5·10^5  N/A             N/A      5.9·10^5  180 nm
[8]        0.1 GHz     50 ps (a)     6.6·10^4  N/A             N/A      3.3·10^3  130 nm
[9]        2.5 GHz     ±0.25 ps (b)  5.0·10^9  N/A (long) (b)  N/A      1.0·10^3  32 nm
This Work  6 GHz       0.14 ps       3.6·10^5  900             0.89 mW  1.5·10^4  65 nm

Table 3.2: Comparison with Previous On-Chip Jitter Measurement Circuits
Work       Test Clock  Measurement        Off-Chip    Power     Area      Self        Process
           Speed       Error              Processing            (µm²)     Referenced
[12]       2.5 Gbps    1.56 ps            Yes         N/A       3.5·10^4  No          110 nm
[13]       5 GHz       ≤1 ps              Yes         132.8 mW  3.2·10^5  Yes         65 nm
[14]       2.5 GHz     0.4 ps             Yes         1 mW (a)  3.2·10^3  Yes (c)     130 nm
[15]       3.36 GHz    0.7 ps             Yes         N/A       4.9·10^2  Yes         65 nm
[16]       6 GHz       2.0 ps             Yes         36.48 mW  N/A       No          65 nm
This Work  6 GHz       0.10 ps / 0.31 ps  No          0.89 mW   1.5·10^4  No / Yes    65 nm
a. 1 mW @ 200 MHz clock. Self-referenced version is published separately
  
  
  
 
 
 
 
 
Chapter 4  
PCO BASED SYNCHRONIZATION FOR PEER-TO-PEER NARROWBAND 
RF NETWORK 
While the previous two chapters present a stochastic technique to measure and
monitor timing non-idealities of clock signals at the small scale of integrated circuit
chips, this chapter and the next present a technique to achieve and maintain timing at
the large scale of P2P radio nodes communicating over tens to hundreds of meters. For
the radio nodes in a symmetric P2P network (i.e. all nodes are identical with no
distinctive master node), low-power long-range communication is only possible if the
power-hungry RF front-end is duty-cycled so that the average power of transmission
is much reduced. Aggressive duty-cycling for significant power savings requires the
entire network to be synchronized. This work uses pulse coupled oscillators (PCOs) for
achieving the synchronization and maintaining network timing.
4.1 Pulse coupled oscillators 
Some species of fireflies, notably of the genus Pteroptyx found in Southeast Asia,
are known to flash synchronously. Thousands of these fireflies are able to flash in
perfect unison for their potential mates. From an engineering perspective, this
phenomenon is very interesting because simple creatures that consume a very small
amount of energy and lack high intelligence are able to achieve synchronization at a
massive scale.
 
Figure 4.1. Transient of the synchronization of 3 PCOs 
Mirollo and Strogatz developed a mathematical model of this phenomenon in
their work [18]. The model assumes each firefly has its own relaxation oscillator, or a
body clock, that interacts through impulsive coupling, representing the flashing of the
firefly. The oscillator is assumed to have a monotonically increasing and concave
down phase function with respect to time. When the phase reaches a threshold, the
firefly “fires” a coupling pulse and resets the phase. The coupling pulse, or the flash of
light emitted by the firefly, causes the other oscillators’ phase to advance by a fixed
amount, which is referred to in this thesis as the coupling strength. Since the phase
function is nonlinear, the fixed amount of phase advance translates to a varying amount
of time adjustment in how long the oscillator takes to reach the threshold.
Mirollo and Strogatz proved that if the function S(t) describing the phase variable
over time is smooth, monotonically increasing and concave down (S'>0 and S"<0), a
network of any number of ideal PCOs achieves synchronization. Here, the network
consists of identical PCO nodes and does not require a distinctive master node. Each
node receives coupling pulses, adjusts its own body clock/oscillator, and fires a
coupling pulse when reaching the threshold, which causes the network to self-organize and
synchronize. Hence, it creates a naturally ad-hoc and scalable system.
An example of the synchronization process for three PCOs is shown in Figure 4.1.
Each PCO starts at a random phase and follows the mathematical model described
above. After exchanging several coupling pulses, the system achieves synchronization.
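The synchronization process can be reproduced with a small event-driven simulation. The Python sketch below is not the simulation behind Figure 4.1; it assumes one particular smooth, increasing, concave-down phase function, $S(t) = \ln(1 + (e^{b} - 1)t)/b$ on a normalized period, together with arbitrary values for the concavity parameter `B`, the coupling strength `EPS` and the initial conditions.

```python
import math

B = 3.0      # assumed concavity parameter of the phase function
EPS = 0.1    # assumed coupling strength (fixed phase advance per pulse)

def s_of_t(t):
    """Phase S as a function of normalized time t in [0, 1]."""
    return math.log(1.0 + (math.exp(B) - 1.0) * t) / B

def t_of_s(s):
    """Inverse of s_of_t: the time at which the free-running phase equals s."""
    return (math.exp(B * s) - 1.0) / (math.exp(B) - 1.0)

def firings_until_sync(times, max_firings=10_000):
    """Return the number of firing events until all oscillators fire together."""
    times = list(times)
    n = len(times)
    for firing in range(1, max_firings + 1):
        dt = 1.0 - max(times)                     # time until the next natural firing
        times = [t + dt for t in times]
        fired = {i for i, t in enumerate(times) if t >= 1.0 - 1e-12}
        pulses = len(fired)                       # pulses emitted but not yet applied
        while pulses:
            newly_fired = set()
            for i in range(n):
                if i in fired:
                    continue
                s = s_of_t(times[i]) + EPS * pulses
                if s >= 1.0:
                    newly_fired.add(i)            # pushed over threshold: fires too
                else:
                    times[i] = t_of_s(s)          # phase advance mapped back to time
            fired |= newly_fired
            pulses = len(newly_fired)
        for i in fired:
            times[i] = 0.0                        # reset after firing
        if len(fired) == n:
            return firing
    return None

firings = firings_until_sync([0.1, 0.5, 0.8])
print("synchronized after", firings, "firing events")
```

Oscillators that fire together are reset together and, being identical, stay together afterwards, so the number of distinct groups can only shrink until a single group remains.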
4.2 PCO for synchronization of IR-UWB radio nodes 
The mathematical proof by Mirollo and Strogatz that any number of ideal PCOs
can reach synchronization became the basis for PCO based synchronization of
wireless nodes. A PCO network in less ideal conditions was studied by X. Wang et al.
and shown to be a promising scheme for scalable synchronization of ultra-wideband
impulse radio (IR-UWB) networks [20]. They mapped the parameter space for
physical implementation of IR-UWB radios in CMOS into the mathematical model of
PCO and showed through simulation that robust synchronization is achieved.
Reference [20] presented several design requirements that are necessary for PCO
synchronization in real-world applications:
1. The frequency mismatch between the PCOs should be large enough to
dominate the relative jitter in the period of the oscillators.
It is shown that when the network is synchronized, the node with the
highest PCO frequency resets on its own and drives the other nodes to
reset. The period difference between this leader node and the second
fastest node in the network should be large enough that the second node
does not temporarily become the fastest node due to jitter.
2. The PCO design should include a blackout period longer than twice the
largest propagation delay between two directly connected nodes.
The blackout period is the interval right after a node resets, during
which the node ignores any incoming coupling pulses. When a node resets
on its own and sends out a coupling pulse, which causes a neighboring
node to immediately reset and send a coupling pulse back, the original
node should ignore the “echo” coupling pulse.
3. Coupling strength should be large enough to compensate for the frequency
mismatch of the oscillators.
Otherwise, the fastest node would not be able to drive the rest of the
nodes to immediately reset, allowing the relative jitter of the oscillators to
distort the synchronization quality.
These design requirements are applicable to the synchronization of any RF
networks since propagation delay, oscillator jitter and frequency mismatch are
inherent in real life systems.
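The blackout requirement (item 2 above) can be illustrated with a short calculation. The Python sketch below assumes free-space propagation at the speed of light and ignores any processing delay in the receiver; the 300 m link distance and the function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def min_blackout_s(max_link_distance_m):
    """Lower bound on the blackout period: twice the largest one-way propagation delay."""
    return 2.0 * max_link_distance_m / SPEED_OF_LIGHT_M_PER_S

def accept_pulse(time_since_reset_s, blackout_s):
    """A node ignores coupling pulses that arrive during its blackout period."""
    return time_since_reset_s >= blackout_s

blackout = min_blackout_s(300.0)      # assumed 300 m between directly connected nodes
echo_arrival = blackout               # earliest echo: out and back over the longest link
print(f"blackout must exceed {blackout * 1e6:.2f} us")
print("echo accepted with 10% margin:", accept_pulse(echo_arrival, 1.1 * blackout))
```

In a practical design the bound would also have to cover the receiver processing delay, which this sketch leaves out.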
4.3 PCO for synchronization of long-range narrowband RF network 
While PCO synchronization was demonstrated for UWB networks, UWB radios
have significant limitations in range and tolerance to interferers. Implementing such a
concept for narrowband radios requires complete rethinking of the system/circuits.
Since the range of this application is tens to hundreds of meters, the path loss is
significant and the node is required to detect the received coupling pulse at a much lower
SNR. The received signal also becomes susceptible to interference. In order to
alleviate these issues, the synchronization scheme for a long-range narrowband RF
network must utilize a signature-signal/syncword as the coupling pulse so that the
received signal can be correlated with the expected syncword for more reliable
detection.
The longer range also results in longer propagation delay of the coupling pulse. In
addition, due to the correlation operation, processing delay is added. These delays
affect the synchronization quality.
Since the RF nodes operate in narrowband, phase and frequency mismatch of the
carriers as well as the baseband clocks in the PCO nodes affect the detection of the
coupling pulse. Hence, the PCO synchronization circuitry should be designed to
handle these non-idealities.
While the IR-UWB in [20] operates with pulse rates of 150 kHz, most IoT
applications need to sense their peers at a rate on the order of only 1 Hz [24], in
part to further save power. This requires a PCO circuit design different from that of
[20] as discussed in Section 5.3.
4.3.1 Investigation of sequences for the syncword 
Various kinds of sequences with different properties are used in 
telecommunications, such as Gold codes in CDMA and Zadoff-Chu sequences in LTE. 
Several of these sequences are investigated for choosing the syncword. Coupling 
pulses consisting of each sequence and white Gaussian noise are generated in 
MATLAB. They are then correlated with the noise-free sequence. The correlation is 
repeated 10^8 times to obtain the bit error rate (BER) at a given signal-to-noise ratio 
(SNR). The simulation result is shown in Figure 4.2. While the Zadoff-Chu sequences 
provide the best performance, they also require complex processing as they are analog 
and complex-valued signals. Hence, a 63-bit Kasami sequence, which is binary and real, is 
chosen for its simple and power-efficient implementation (Section 5.2.1). It achieves a 
BER of 10^-5 at 3 dB SNR excluding any non-idealities in processing. An actual circuit 
implementation of the correlation and any coupling pulse processing will degrade this 
performance. 
 
Figure 4.2. Comparison of various sequences for syncword 
[Plot: BER (log scale, 1E-6 to 1E+0) vs. SNR in dB. Curves: Gold - 63, Gold - 31, Kasami - 63, ZC25 Exp-01 - 64, ZC25 - 64, real ZC25 Exp-01 - 64.] 
4.3.2 Range estimate 
Using the SNR value from the previous section, the maximum range between the RF 
nodes for synchronization with the given BER can be estimated assuming the 
applicable path loss model and the specs of the RF front end used.  
The free-space (line-of-sight) path loss in dB at carrier frequency $f$ in Hz over 
distance $d$ in m is 
$$PL = 20 \log d + 20 \log f - 147.55\ \mathrm{dB}. \tag{4.1}$$
According to the IEEE 802.11 Task Group project (IEEE 802.11ah) [25], the outdoor 
device-to-device path loss model for an antenna height of 1.5 m and a carrier frequency of 
900 MHz is 
$$PL = -6.17 + 58.6 \log d. \tag{4.2}$$
Note that this path loss increases much more quickly with distance (slope of 58.6) than 
the path loss of free space (20) or outdoor macro deployment (37.6). 
The signal power at the input of the correlator is  
$$P_S = P_{TX} - PL + A_{RX}, \tag{4.3}$$
where $P_{TX}$ is the transmitted power and $A_{RX}$ is the gain of the RX front end. The noise 
power at the input of the correlator assuming room temperature is 
$$P_n = -174\ \mathrm{dBm/Hz} + 10 \log BW + NF_{RX} + A_{RX}, \tag{4.4}$$
where $BW$ and $NF_{RX}$ are the bandwidth and noise figure of the RX front end, 
respectively. The bandwidth- and input-power-dependent $NF_{RX}$ and $A_{RX}$ can be 
obtained from the spec sheet of the RF front end that is used. Assuming a transmit 
power of 0 dBm and a commercial receiver with a BW of 2 MHz, the SNR at the input 
of the correlator as a function of the transmission distance is plotted in Figure 4.3 in blue. The 
red line marks the 3 dB SNR required for BER = 10^-5 with the 63-bit Kasami sequence; where 
the two lines cross, the estimated transmission range is about 80 m using the path loss model in (4.2).  
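The range estimate above can be sketched numerically. The following is a minimal sketch, assuming the path loss model of (4.2) and the SNR defined as (4.3) minus (4.4), so that the front-end gain cancels. The noise figure of 3 dB is a placeholder chosen for illustration and is not taken from the text; the function names are hypothetical.

```python
import math

def path_loss_db(d_m):
    """Outdoor device-to-device path loss of (4.2) (900 MHz, 1.5 m antennas)."""
    return -6.17 + 58.6 * math.log10(d_m)

def noise_floor_dbm(bw_hz, nf_rx_db):
    """Input-referred noise power of (4.4); the RX gain is left out because it cancels in the SNR."""
    return -174.0 + 10.0 * math.log10(bw_hz) + nf_rx_db

def snr_db(d_m, p_tx_dbm=0.0, bw_hz=2e6, nf_rx_db=3.0):
    """SNR at the correlator input as a function of distance. nf_rx_db is an assumed value."""
    return p_tx_dbm - path_loss_db(d_m) - noise_floor_dbm(bw_hz, nf_rx_db)

def max_range_m(snr_req_db=3.0, p_tx_dbm=0.0, bw_hz=2e6, nf_rx_db=3.0):
    """Distance at which the SNR drops to snr_req_db, obtained by inverting (4.2)."""
    max_pl_db = p_tx_dbm - noise_floor_dbm(bw_hz, nf_rx_db) - snr_req_db
    return 10.0 ** ((max_pl_db + 6.17) / 58.6)

if __name__ == "__main__":
    print(f"estimated range: {max_range_m():.1f} m")
```

With the assumed noise figure, the result falls in the same range as the estimate quoted above; a different front end would shift it according to its actual noise figure.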
 
Figure 4.3. Estimate of the transmission range 
[Plot: SNR (dB) vs. distance (m).] 
4.3.3 PCO network simulation 
Rigorous mathematical analysis of synchronization with the parameters of a real-world 
narrowband system is quite difficult in closed form. Consequently, we conduct a 
numerical simulation in MATLAB for a qualitative analysis. We constructed an event-based 
simulator similar to the one presented in [20], which takes into account 
“frequency mismatch, jitter, propagation delay, variable coupling strengths” as well as 
“arbitrary network connectivity.” The simulator we used for this work calculates the 
phase change using the digital oscillator model (Section 5.3), keeps track of the phase 
of the reference clock (as well as the oscillator) in each node, and includes the 
processing delay of syncword correlation in addition to the propagation delay. The 
simulator also keeps track of the arrival times of propagating CPs. Each event in the 
simulation is either (a) a node reaching its threshold and firing or (b) an in-flight CP 
reaching a destination node. At each simulation step, we elapse the time by the amount 
until the next soonest event. For (a), we reset the firing node’s phase and generate a 
new CP. For (b), we advance the phase of the destination node – if the phase reaches 
threshold, then we also follow the steps in (a). 
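The event loop described above can be sketched as follows. This is a simplified illustration and not the simulator used in this work: it assumes maximum coupling strength (any accepted CP takes the node straight to threshold), a fixed delay per link, and no jitter, and it omits the digital oscillator model and reference clock tracking. The function name and arguments are hypothetical.

```python
import heapq

def simulate_pco(periods, phases0, delay, blackout, t_end):
    """Event-based PCO network with maximum coupling strength.

    periods[i]  : free-running period of node i (s)
    phases0[i]  : initial phase of node i as a fraction of its period
    delay[i][j] : propagation plus processing delay from i to j, or None if not connected
    Returns the list of firing times of each node.
    """
    n = len(periods)
    last_reset = [-(phases0[i] * periods[i]) for i in range(n)]
    epoch = [0] * n                      # invalidates self-fire events made stale by a forced reset
    fires = [[] for _ in range(n)]
    events = []                          # (time, kind, node, epoch); kind 0 = self-fire, 1 = CP arrival
    for i in range(n):
        heapq.heappush(events, (last_reset[i] + periods[i], 0, i, 0))

    def fire(i, t):
        # Event (a): reset the node, schedule its next self-fire, and launch CPs to its neighbors.
        last_reset[i] = t
        epoch[i] += 1
        fires[i].append(t)
        heapq.heappush(events, (t + periods[i], 0, i, epoch[i]))
        for j in range(n):
            if j != i and delay[i][j] is not None:
                heapq.heappush(events, (t + delay[i][j], 1, j, -1))

    while events:
        t, kind, i, ep = heapq.heappop(events)   # jump to the next soonest event
        if t > t_end:
            break
        if kind == 0:
            if ep == epoch[i]:
                fire(i, t)
        elif t - last_reset[i] >= blackout:
            # Event (b): an in-flight CP arrives outside the blackout period.
            fire(i, t)
    return fires

# Two nodes whose periods differ by 40 us, coupled over a link with 52 us total delay.
link = [[None, 52e-6], [52e-6, None]]
fires = simulate_pco([1.0, 1.00004], [0.0, 0.5], link, blackout=0.1, t_end=10.0)
lag = fires[1][-1] - fires[0][-1]        # settles to the link delay once node 0 leads
```

In this two-node case the slower node initially drives the faster one; the faster node then gains 40 µs per cycle until its coupling pulse arrives before the slower node fires on its own, after which the slower node fires one link delay behind the leader.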
The simulator models a network of 20 nodes randomly scattered on a 290 m × 290 
m grid with a coupling range of 100 m and TPCO = 1 s (Figure 4.4). We picked a 
blackout period of 100 ms and the maximum coupling strength, which immediately takes the 
nodes to threshold. The period jitter of the PCOs is set to 4 µs rms. The coupling 
events are modeled as instantaneous. A frequency mismatch of ±50 ppm was chosen 
as a reasonable value for the reference clock. The exact frequency of each reference 
clock is drawn from a uniform distribution over ±50 ppm at the beginning of the 
simulations. Each simulation covers a transient of 100 cycles, which is long enough 
for the system to synchronize. Since the synchronization can depend 
on the initial state of the PCOs, each simulation is repeated 200 times with initial 
conditions for both the reference clock and the digital oscillator randomly chosen from 
a uniform distribution. 
 
 
Figure 4.4. Topology of the randomly located nodes. Nodes within 100 m are 
connected to each other.  
4.3.4 Simulation results and discussion 
The impact of the narrowband-specific non-idealities on the synchronization is 
measured by two attributes: how quickly the network synchronizes and how well it 
maintains synchronization. The speed of synchronization is important because the 
timescale of a PCO cycle is on the order of seconds. Thus, if the system requires many 
cycles to synchronize, there is a long absolute time during which data communication is 
disabled and the front end consumes high power as it is not duty-cycled. A measure of 
synchronization quality is “relative jitter,” the amount by which the phase/period of the 
nodes varies relative to the phase/period of the leader node. Even if the leader has 
large period jitter, the synchronization is robust as long as the other nodes follow the 
phase of the leader. Maintaining synchronization with low relative jitter is important 
because it allows the power-hungry front-end transceiver circuits to turn on and off 
more aggressively with very accurate timing precision and low error rates. 
The speed of synchronization depends on two parameters: the propagation delay $T_{prop}$ 
and $\Delta T_{PCO-1}$, which is the period difference between the fastest node and the second 
fastest node. Intuitively, if the fastest node is forced to reset by the second fastest node 
due to the initial startup condition, then the fastest node lags the slower node by $T_{prop}$. 
Every cycle from then on, the fastest node reduces the lag by $\Delta T_{PCO-1}$ until it leads the 
slower node by $\Delta T_{PCO-1}$. At that point, the fastest (leader) node starts to drive the 
slower node to threshold (i.e. the slower node is locked to the leader). Hence, we 
expect the number of cycles the system takes to synchronize ($N$) to be proportional to 
$T_{prop}$ and inversely proportional to $\Delta T_{PCO-1}$: 
$$N \propto \frac{T_{prop}}{\Delta T_{PCO-1}}. \tag{4.5}$$
The propagation delay $T_{prop}$ is a combination of how long the coupling pulse 
travels between nodes and how long the syncword processing takes, $T_{proc}$. Since the 
correlator has to wait until all of the syncword is received, $T_{proc}$ is the sum of the 
correlation circuit processing time and the duration of the syncword. In the first simulation, 
$\Delta T_{PCO-1}$ is swept and $N$ is recorded. $T_{proc}$ is set to 52 µs, which is representative 
of the delay in the circuit implementation in Chapter 5. Since initial conditions of the 
system, such as the starting phase and location of the oscillators, affect how quickly 
the system synchronizes, Figure 4.5 plots the average of $N$ over 200 iterations with 
different PCO starting phases. Similarly, in the second simulation, $T_{prop}$ is swept and 
$N$ is recorded. $\Delta T_{PCO-1}$ is set to 40 µs out of the PCO period of 1 s. The results are 
shown in Figure 4.6. With $T_{proc} = 52$ µs, the average $N$ is 5. 
 
Figure 4.5. Average number of cycles to synchronize vs. period difference 
between the leader node and the 2nd fastest node  
[Plot legend: 52 us, 70 us.] 
Both figures show results consistent with the linear relationship in (4.5). Again, the 
speed of synchronization depends on the initial conditions and the location of the 
nodes. In addition, we assumed the system reaches synchronization only if the leader 
node drives all the other nodes. During the synchronization process, the nodes may all 
synchronize to a slow node until the fastest node overcomes the lag and becomes the 
leader. Hence, if $2T_{prop}$ is within the tolerance for synchronization quality, the system 
may be considered synchronized in a shorter amount of time. 
 
 
Figure 4.6. Average number of cycles to synchronize vs. processing delay 
[Plot legend: 40 us, 25 us.] 
 
Figure 4.7. Maximum relative jitter vs. period difference between the leader node 
and the 2nd fastest node. The jitter of the leader node is plotted as a reference. 
[Plot legend: Relative jitter, Leader jitter.] 
Once the system is synchronized, the jitter of some nodes may occasionally cause 
them to fire before the coupling pulse from the leader or a leader-driven node arrives. 
However, the jitter must be large enough to overcome the period difference between 
the leader and the slow nodes. Hence, the faster the leader is than the rest, the less 
relative jitter the system experiences. Figure 4.7 shows that the maximum relative jitter is 
reduced for larger $\Delta T_{PCO}$. As long as $\Delta T_{PCO}$ is larger than the jitter of any node in the 
system, high-quality synchronization is maintained. Therefore, the criterion for 
keeping synchronization is  
$$\Delta T_{PCO} \gg \sigma_{PCO}, \tag{4.6}$$
where $\sigma_{PCO}$ is the oscillator jitter of the PCO nodes. 
In a multi-hop network where some nodes are indirectly driven by the leader node 
through some intermediary nodes, the firing lag between slow nodes and the leader 
node is an integer multiple of $T_{prop}$. This should be taken into account when the 
framework for the data transmission scheduling is designed. 
Chapter 5  
SYNCHRONIZER CIRCUIT FOR PEER-TO-PEER NARROWBAND RF 
NETWORK 
In the previous chapter, a technique for emergent synchronization using PCO was 
introduced and its application in narrowband long-range P2P RF network was 
discussed.  This chapter presents the design and implementation of the technique in 
CMOS circuits. The synchronization of a P2P network using the implemented circuits 
is also demonstrated. 
5.1 Overview of the synchronizer block 
This chapter presents a baseband synchronizer block for scalable synchronization 
and aggressive duty cycling of narrowband RF peer-to-peer networks. The 
synchronizer block consists of a signal processor (SP), which detects if a coupling 
pulse containing the syncword is received, and a timing circuit, which implements the 
PCO using digital blocks to adjust local timing and synchronize the node clock to the 
network. For the system level demonstration, each node in the network consists of a 
commercial narrowband RF front end connected to the proposed synchronizer block. 
Figure 5.1 shows the peer-to-peer network, a node in the network, and the 
synchronizer block in addition to an example of the synchronization process. 
5.2 Signal processor 
The goal of the proposed signal processor is to receive a baseband signal from a 
commercial RF front end and detect whether a predetermined syncword is embedded 
in it. Doing so requires correlating the signal with the expected sequence. However, 
practical duty-cycled radios with long off-times must synchronize with 1) low-latency 
sync detection to enable short duty cycles for low power, 2) tolerance to LO mismatch 
and random startup states of the RF front end due to LO drift between “on” cycles, and 3) 
high SNR for long range. Hence, the SP consists of circuit blocks designed to meet 
these challenges. 
 
 
Figure 5.1. Block Diagram of the system. Synchronizer block (PCO Sync) shown 
in red is designed and fabricated. Transient of PCO phase state is illustrated as 
an example of three node synchronization.   
Cross-correlation requires multiplication, summing, subtraction, delaying and 
comparing. While these can be done in the digital back-end, high performance ADCs 
and DSPs consume high power. Dedicated active analog circuits for some of these 
operations, such as banks of multipliers, also consume high power. In order to 
improve the energy efficiency of these operations, we design a passive signal processor 
using only capacitors and switches along with a few comparators. The passive topology 
allows low-power operation, quick on- and off-switching for aggressive duty-cycling, 
and predictable and short latency. 
A block diagram of the signal processing circuit is shown in Figure 5.2. The main 
block is the correlator, which correlates the received signal with the expected 
sequence, programmed by the serial-to-parallel interface (SPI). A Kasami sequence of 
63 bits is chosen for its good correlation performance and ease of implementation. 
The output of the correlator is resolved by the peak detector. Carrier frequency 
mismatch generates low frequency amplitude variation on the demodulated baseband 
signal. The differential detector mitigates this effect and enables detection in the 
presence of carrier offset. The amplitude detector automatically adjusts the peak 
detector threshold based on the received signal amplitude. Each of these blocks is 
discussed in greater detail in the subsections below. 
 
Figure 5.2. Block diagram of the proposed signal processor.   
5.2.1 Correlator 
This work uses a programmable 63-bit Kasami sequence for the syncword, which 
is used as the impulsive coupling for the narrowband system. The correlator block 
performs the cross-correlation of the expected Kasami sequence $s[k]$ and the received 
signal $r[n]$, where $k$ and $n$ represent the $k$-th bit and $n$-th sample, respectively:  
$$(s \star r)[n] = \sum_{k=1}^{63} s[k]\, r[n+k]. \tag{5.1}$$
As shown in (5.1), the correlation operation has three operations: slide, multiply 
and sum. If $r$ contains the sequence and is aligned with $s$, then the correlator generates 
a peak. However, the time point $n$ when the concealed sequence in $r$ starts is 
unknown. In conventional wireless circuits such as Wi-Fi and cellular radio, the 
starting point is estimated in backend digital circuitry, which is avoided in this work. 
This circuit instead emulates 63 multiplications in parallel, each with a different bit of 
$s[k]$, so that the need to detect the phase of the concealed sequence in $r$ is eliminated. 
The correlator stores samples of the signal $r$ in sampling capacitors as $r$ is 
received, which is equivalent to the sliding operation of the correlation. Then, 63 
capacitors containing the latest 63 samples of $r$ are shorted together (Figure 5.3) to 
average the stored charges, thereby performing the summing operation. Depending on 
whether the expected sequence bit $s[k]$ is $1$ or $-1$, the capacitor with the $(n+k)$-th 
sample of $r$ is shorted in positive or negative configuration with the rest of the 63 
capacitors, which emulates the multiplication operation. Notably, only one type of 
capacitor cell is used to perform all the operations of correlation. 
 
 
Figure 5.3. The correlator consists of sampling cells that store the latest 63 bits of 
the decoded signal. The output of the differential detector, representing the 
current data bit, is connected to 63 cells. Half a bit period later, different cells 
containing the latest 63 bits are connected to the output.   
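The slide, multiply and sum steps of (5.1) can be illustrated in software. The sketch below is only a behavioral model, not the switched-capacitor circuit: it averages the products (as charge sharing does) and uses a pseudo-random ±1 sequence as a stand-in for the Kasami sequence. Indices are zero-based here, and all names are hypothetical.

```python
import random

def correlate(s, r, n):
    """Averaged form of (5.1): (1/len(s)) * sum_k s[k] * r[n + k], with zero-based k."""
    return sum(sk * r[n + k] for k, sk in enumerate(s)) / len(s)

def sliding_correlation(s, r):
    """Correlator output for every alignment n at which s fits inside r."""
    return [correlate(s, r, n) for n in range(len(r) - len(s) + 1)]

rng = random.Random(1)
s = [rng.choice((-1, 1)) for _ in range(63)]   # placeholder sequence, NOT a real Kasami code
r = [0] * 20 + s + [0] * 20                    # noise-free received signal with the sequence at offset 20

out = sliding_correlation(s, r)
peak_value = max(out)
peak_index = out.index(peak_value)             # alignment at which the embedded sequence is found
```

Because every misaligned window overlaps the embedded sequence in at most 62 positions, the output reaches 1 only at the true offset, which is the peak the circuit looks for.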
Each sample of the received signal $r$ is used once per bit period for 63 consecutive 
bit periods. On the other hand, shorting capacitors for the summing operation modifies 
their charge, making them unusable for future bit periods. Therefore, each received 
sample of $r$ must be stored on 63 different capacitors. The sampling capacitor 
multiplied with $s[63]$ is used for correlation during the same cycle, which allows it to 
be re-purposed during the next cycle. Similarly, the sampling capacitor multiplied with 
$s[k]$ can be re-purposed every $(63-k)$-th bit period. Thus, as shown on the left of Fig. 
2, the correlator requires $1 + 2 + \cdots + 63 = 2016$ sampling capacitors. 
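As a quick check of the capacitor count, the total is the sum of an arithmetic series, assuming one group of capacitors for each group size from 1 to 63:

```latex
\sum_{j=1}^{63} j = \frac{63 \cdot 64}{2} = 2016 .
```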
The sampling capacitors in the correlator have four phases of operation (Figure 
5.4): (i) storing the samples of $r$, (ii) sharing the stored charges to amplitude detector 
cells, to be discussed below, (iii) shorting 63 capacitors containing the latest 63 
samples to the output node, and (iv) resetting the charges on the shorted capacitors. 
 
Figure 5.4. The correlator cell consists of a sampling capacitor and switches 
connecting to other blocks. The numbers in circles represent which phases the 
corresponding switches turn on in. In phase (iii), the capacitor may be connected 
to the output ($V_{corr}$) in either positive or negative configuration based on $s[k]$.   
5.2.2 Peak detector 
The peak detector consists of three identical dynamic comparators that are 
connected to the output node of the correlator (Figure 5.5). In phase (iii), when 
selected correlator capacitors are connected to the output node, the comparators are 
clocked to compare the correlator output with a threshold. If the comparator outputs 
are different due to random noise, the voting logic selects the majority vote. The peak 
detector outputs 1 if the correlator peak is larger than the threshold:  
$$V_{corr}[n] = \frac{1}{63} \sum_{k=1}^{63} s[k]\, r[n+k] > V_{thresh}. \tag{5.2}$$
The amplitude detector adjusts the charges stored on the correlator capacitors in order 
to make $V_{thresh}$ effectively proportional to the input signal amplitude. 
 
Figure 5.5. (Left) The peak detector consists of three comparators and a logic 
block for majority vote for the output. (Right) Dynamic comparator used in the 
peak detector.   
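The three-comparator majority vote of the peak detector can be modeled behaviorally as below. This is a sketch under simple assumptions: each comparator sees the same correlator output plus its own input-referred error, passed in as the hypothetical `offsets` argument.

```python
def majority_vote(c1, c2, c3):
    """True when at least two of the three comparator decisions are True."""
    return (c1 and c2) or (c1 and c3) or (c2 and c3)

def peak_detect(v_corr, v_thresh, offsets=(0.0, 0.0, 0.0)):
    """Return 1 if the majority of comparators see v_corr above v_thresh, as in (5.2).

    offsets models the per-comparator noise or offset (an assumption for illustration).
    """
    votes = [v_corr + off > v_thresh for off in offsets]
    return int(majority_vote(*votes))
```

For example, `peak_detect(0.6, 0.5, (0.0, -0.2, 0.0))` still returns 1 because only one comparator is pushed below the threshold, whereas two disturbed comparators would flip the decision.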
5.2.3 Differential detector 
For aggressively duty-cycled RF front ends with long off times, there will be phase 
and frequency mismatch between the transmitter carrier (TX LO) and the 
demodulation LO due to drift. This mismatch results in a low-frequency amplitude 
modulation on the received baseband signal, causing the signal to be too small to 
detect and/or flipping the polarity of the binary code embedded in it. The differential 
detector solves this problem (a) by looking at the combination of the I and Q channels so 
that if one of them has a small amplitude due to the envelope, the other has a large 
amplitude; and (b) by comparing two consecutive bits to compute a decoded bit (e.g. if 
the consecutive bits have the same polarity, then the decoded bit is 1 so that the actual 
polarity of the received bits does not matter). 
Assume the baseband signals $X$ and $Y$ are transmitted on the $I$ and $Q$ channels, 
resulting in an RF signal of $X\cos(\omega t) + Y\sin(\omega t)$, where $\omega$ is the carrier frequency 
of the transmitter. When the signal is received at the receiver, it is mixed with carrier 
signals $\cos(\omega t + \Delta\omega t + \phi)$ and $\sin(\omega t + \Delta\omega t + \phi)$ for down-conversion, where $\Delta\omega$ 
and $\phi$ are the frequency and phase mismatch between the transmitter and receiver 
carriers. Hence, the baseband $I$ and $Q$ signals in the receiver are 
$$\begin{cases} I_r = X\cos(\Delta\omega t + \phi) - Y\sin(\Delta\omega t + \phi) \\ Q_r = X\sin(\Delta\omega t + \phi) + Y\cos(\Delta\omega t + \phi) \end{cases}, \tag{5.3}$$
where the amplitude terms are removed for simplification. It can be seen from (5.3) 
that the sinusoidal terms vary from $-1$ to $1$ over time, causing the problems 
mentioned above. Assuming the transmitter sent the same signal on $X$ and $Y$, (5.3) 
becomes 
$$\begin{cases} I_r = \sqrt{2}\, X\cos(\Delta\omega t + \phi') \\ Q_r = \sqrt{2}\, X\sin(\Delta\omega t + \phi') \end{cases}, \tag{5.4}$$
where $\phi' = \phi + \pi/4$. Consider the following transformation for the consecutive 
samples of the received baseband signals:  
$$\begin{aligned} I_r[n]\, I_r[n-1] + Q_r[n]\, Q_r[n-1] &= 2X[n]X[n-1]\left(\cos\Phi\cos(\Phi + \Delta\omega T_{bit}) + \sin\Phi\sin(\Phi + \Delta\omega T_{bit})\right) \\ &= 2X[n]X[n-1]\cos(\Delta\omega T_{bit}), \end{aligned} \tag{5.5}$$
where $\Phi = \Delta\omega t + \phi'$, $T_{bit}$ is the bit period of the Kasami sequence and $n$ refers to the 
$n$-th sample of the baseband signals. The transformation in (5.5) is void of the low-frequency 
amplitude modulation and the sinusoidal term is close to 1 since $\Delta\omega T_{bit}$ is 
small. However, it requires accurate analog multiplication. 
The differential detector performs a variation of the transformation in (5.5) for 
a more power-efficient implementation: 
$$r[n] = I_r[n]\,\mathrm{sign}(I_r[n-1]) + Q_r[n]\,\mathrm{sign}(Q_r[n-1]), \tag{5.6}$$
where $r[n]$ is the output of the differential detector, which is also the input to the 
correlator in Section 5.2.1. More than 99% of the time, the sinusoidal terms on the 
consecutive samples of $I_r$ and $Q_r$ will have the same sign, e.g. 
$\mathrm{sign}(\cos\Phi) = \mathrm{sign}(\cos(\Phi + \Delta\omega T_{bit}))$. In these cases, (5.6) becomes 
$$\begin{aligned} r[n] &= \sqrt{2}\, X[n]\,\mathrm{sign}(X[n-1]) \left(|\cos\Phi| + |\sin\Phi|\right) \\ &\approx \sqrt{2}\, X[n]\,\mathrm{sign}(X[n-1]) \left(1 + (\sqrt{2} - 1)|\sin 2\Phi|\right). \end{aligned} \tag{5.7}$$
The amplitude factor in (5.7) with the sinusoidal term varies only between 1 and 
$\sqrt{2}$. In the other <1% of the time, the amplitude factor varies between ~0.954 and 
$\sqrt{2}$ with conservative assumptions of carrier frequency, mismatch, and signal 
bandwidth. Since the output bit $r[n]$ depends on whether the two consecutive bits of the 
transmitted signal have the same sign, the Kasami sequence should be encoded 
accordingly on the transmitter side so that the decoded output of the differential 
detector is also a Kasami sequence. 
As shown in Figure 5.6, the circuit implementation of (5.6) consists of sampling 
capacitors to sample $I_r$ and $Q_r$ and dynamic comparators with a delay register to 
produce $\mathrm{sign}(X[n-1])$, which determines whether the capacitors containing the $I_r$ and 
$Q_r$ samples are shorted to the output node in positive or negative configuration. 
 
Figure 5.6. (Left) Differential detector diagram and (Right) its sampling cell. 
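The behavior of (5.6) under a carrier offset can be checked with a short numerical sketch. It assumes the received samples follow (5.4) with one sample per bit, a phase that advances by a fixed `dw_tbit` radians per bit, and a simple differential encoder at the transmitter; the encoder, the parameter values and all names are illustrative assumptions rather than the implemented circuit.

```python
import math
import random

def sign(v):
    return 1 if v >= 0 else -1

def differential_encode(bits):
    """Encode +/-1 bits so that x[n] * x[n-1] equals bits[n-1]; x[0] = +1 is a reference bit."""
    x = [1]
    for b in bits:
        x.append(x[-1] * b)
    return x

def receive(x, dw_tbit, phi):
    """I/Q baseband samples per (5.4); the carrier mismatch rotates the phase by dw_tbit per bit."""
    i_r = [math.sqrt(2) * xn * math.cos(n * dw_tbit + phi) for n, xn in enumerate(x)]
    q_r = [math.sqrt(2) * xn * math.sin(n * dw_tbit + phi) for n, xn in enumerate(x)]
    return i_r, q_r

def differential_detect(i_r, q_r):
    """r[n] = I_r[n] sign(I_r[n-1]) + Q_r[n] sign(Q_r[n-1]), as in (5.6)."""
    return [i_r[n] * sign(i_r[n - 1]) + q_r[n] * sign(q_r[n - 1])
            for n in range(1, len(i_r))]

rng = random.Random(0)
bits = [rng.choice((-1, 1)) for _ in range(63)]          # placeholder for the syncword bits
i_r, q_r = receive(differential_encode(bits), dw_tbit=0.001, phi=0.3)
r = differential_detect(i_r, q_r)
decoded = [sign(v) for v in r]                           # matches bits while no zero crossing occurs
```

Over this short span the sine and cosine terms do not change sign, so the decoded bits equal the original bits and the magnitude of each output stays between $\sqrt{2}$ and 2, consistent with the amplitude factor in (5.7).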
5.2.4 Amplitude detector 
Over time, the correlator output produces several peaks proportional to the 
received signal amplitude. While the largest peak occurs when the embedded sequence 
in the received signal aligns with the expected sequence, the correlator produces 
smaller peaks for misaligned sequences even if the received signal has no noise. 
Hence, too high a threshold in the peak detector ($V_{thresh}$) will miss the received 
sequence while too low a threshold produces false positives. Therefore, the threshold 
must be adjusted if the received signal amplitude changes and the correlator peaks 
change proportionally. The amplitude detector achieves this by removing a fixed 
percentage of the charges stored on the correlator capacitors, effectively making the 
threshold proportional to the received signal amplitude. Doing so avoids the need for 
an analog envelope detector and its requirement for fine bandwidth adjustments. Define 
$$V_{thresh} := \alpha V_{average} = \alpha \frac{1}{63} \sum_{k=1}^{63} |r[n+k]| = \alpha \frac{1}{63} \sum_{k=1}^{63} r[n+k]\,\mathrm{sign}(r[n+k]), \tag{5.8}$$
where $\alpha$ is a constant between 0 and 1. Substituting (5.8) into (5.2) and simplifying it 
yields 
$$\frac{1}{63} \sum_{k=1}^{63} r[n+k]\left(s[k] - \alpha \cdot \mathrm{sign}(r[n+k])\right) > 0. \tag{5.9}$$
Defining $\alpha' = (1-\alpha)/(1+\alpha)$ and $r' = r(1+\alpha)$, (5.9) becomes 
$$\frac{1}{63} \sum_{k=1}^{63} s[k]\, r'[n+k]\, \beta > 0, \tag{5.10}$$
where 
$$\beta = \begin{cases} \alpha' < 1 & \text{if } s[k] = \mathrm{sign}(r[n+k]) \\ 1 & \text{if } s[k] \ne \mathrm{sign}(r[n+k]) \end{cases}. \tag{5.11}$$
If we let $r' = r$, (5.10) governs the operation of the correlator, and performs the 
original thresholding function in (5.2) except for $\beta$. Hence, the amplitude detector 
consists of charge-sharing capacitors that are shorted to the correlator capacitors at 
node $p[k]$ (Figure 5.4) in phase (ii) based on the condition in (5.11) so that the 
correlator voltage reduces from $r[n]$ to $\alpha' r[n]$. 
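The equivalence between thresholding with (5.2) and (5.8) and the sign-dependent attenuation of (5.10) and (5.11) can be verified numerically. The sketch below compares the two decisions on one window of samples; the function names and test signals are hypothetical, and the sequence bits are assumed to be ±1.

```python
def sign(v):
    return 1 if v >= 0 else -1

def detect_with_threshold(s, r, alpha):
    """Decision of (5.2) with the amplitude-proportional threshold of (5.8)."""
    n = len(s)
    v_corr = sum(sk * rk for sk, rk in zip(s, r)) / n
    v_thresh = alpha * sum(abs(rk) for rk in r) / n
    return v_corr > v_thresh

def detect_with_attenuation(s, r, alpha):
    """Decision of (5.10): samples whose sign matches s[k] are scaled by alpha', then compared with 0."""
    alpha_p = (1 - alpha) / (1 + alpha)
    total = 0.0
    for sk, rk in zip(s, r):
        beta = alpha_p if sk == sign(rk) else 1.0      # condition (5.11)
        total += sk * (1 + alpha) * rk * beta          # s[k] * r'[n+k] * beta
    return total / len(s) > 0
```

Both functions return the same decision for any window, which is why attenuating only the matching samples lets the comparator use a fixed zero threshold instead of a separate envelope detector.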
5.2.5 Dual core 
Since the transmitter and the receiver are not phase locked, the signal processing 
circuit may sample the received signal at a bit transition, causing errors. Therefore, a 
copy of the circuit, Core 2 (Figure 5.2), is added. It samples the received signals half a 
bit period after Core 1 so that if either one samples at a transition, the other core 
samples at the center of the bit period. The combined output produces 1 if either core 
outputs 1. 
5.3 Digital PCO 
After the SP detects the syncword in the received signal despite phase and 
frequency offset and variations in signal amplitude, it generates a pulse, which is 
coupled into the digital PCO to advance the phase of the local clock. 
As mentioned in Chapter 4, the period of the PCO in this application is on the 
order of a second. Analog oscillators, such as the RC relaxation oscillators that met the 
specifications for UWB PCO synchronization, cannot be used for one second 
timescale due to the very large components necessary to obtain the large time 
constant. An integrated capacitor of this scale has significant parasitics and leakage 
that prohibits correct operation. 
In this work, we propose a digital oscillator circuit that can be integrated in a 
CMOS process as an alternative to the analog relaxation oscillator and investigate the 
impact of the quantization/discretization error inherent to the digital design on the 
synchronization performance of the PCO network. We show through simulation that 
when compared to the analog oscillator case, the trade-off between the 
synchronization speed and synchronization quality is much more pronounced with 
digital PCOs. We find that a well-designed digital PCO can achieve a better 
performance than an analog PCO if the opposing criterion is relaxed. We also find that 
the most important design decision in a digital PCO is how to time the reset after the 
oscillator reaches threshold due to coupling. This critical decision can considerably 
affect the synchronization performance, and is studied analytically in this work before 
the results are applied to our final design. The circuit implementation of the proposed 
digital PCO design is discussed in Section 5.3.4. 
5.3.1 Digital PCO Model 
The fundamental design of the digital oscillator is based upon a modified counter 
driven by a reference clock (a crystal, a MEMS oscillator or a low jitter on-chip 
oscillator) and implements the monotonically-increasing concave-down function 
necessary for the PCO network. Our design consists of a main counter that counts a 
slowing trigger clock, each cycle of which is one reference clock period longer than 
the previous (Figure 5.7). 
 
Figure 5.7. Illustration of digital PCO state function. The state of the main 
counter lasts the same number of reference periods as its count. 
We can show that the normalized output of this counter is 
$$V = \frac{-\sqrt{T_{ref}} + \sqrt{T_{ref} + 8 n T_{ref}}}{-\sqrt{T_{ref}} + \sqrt{T_{ref} + 8 T_{PCO}}}, \tag{5.12}$$
where $T_{ref}$ is the reference clock period, $T_{PCO}$ is the period of the digital oscillator and 
$nT_{ref}$ represents the quantized time. Since $\lim_{T_{ref} \to 0} V = \sqrt{t/T_{PCO}}$, the digital 
oscillator output is an approximation of the $\sqrt{t/T_{PCO}}$ function. Although this function 
satisfies the monotonicity and the concavity conditions, the digital output sometimes 
has S'=0 (and S"=0) due to the discrete approximation. Therefore, it is important to 
study whether the system can synchronize with the non-smooth state function. 
The critical design decision lies in how an oscillator responds to a CP that takes it 
to threshold. Ideally, the oscillator should self-reset $T_{PCO}$ after receiving the CP 
(Figure 5.8). However, due to the discretization, it may only reset at either $t = t_A$ or 
$t = t_B$. Thus, for that cycle, the digital oscillator has an effective period of $T_A = T_{PCO} + \tau$ 
or $T_B = T_{PCO} + \tau - T_{ref}$, respectively, where $\tau$ is the delay between the coupling and the 
following trigger clock. Since $\tau \le T_{ref}$, we see that $T_B \le T_{PCO} \le T_A$. As shown in Section 
5.3.3, network synchronization is affected differently by the two different designs that 
provide $T_A$ and $T_B$, which we call design A and design B respectively. 
 
Figure 5.8. Illustration of the difference between design A and design B. 
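The digital state function and the two reset options can be sketched in a few lines. The model below assumes the main counter increments at the end of each trigger-clock cycle and that the j-th cycle lasts j reference periods, so the count after n reference periods is the largest m with m(m+1)/2 ≤ n. The effective periods assume $T_A = T_{PCO} + \tau$ and $T_B = T_{PCO} + \tau - T_{ref}$. Function names are hypothetical.

```python
import math

def counter_state(n):
    """Main-counter value after n reference periods (largest m with m(m+1)/2 <= n)."""
    return (math.isqrt(8 * n + 1) - 1) // 2

def normalized_state(n, n_total):
    """Counter value normalized to its value at the end of the period (n_total = T_PCO / T_ref)."""
    return counter_state(n) / counter_state(n_total)

def effective_periods(t_pco, t_ref, tau):
    """Effective periods (T_A, T_B) of design A and design B for a coupling delay tau <= t_ref."""
    return t_pco + tau, t_pco + tau - t_ref

if __name__ == "__main__":
    n_total = 10**6                      # e.g. T_PCO = 1 s with T_ref = 1 us
    for n in (10**4, 25 * 10**4, 10**6):
        print(n, normalized_state(n, n_total), math.sqrt(n / n_total))
```

For a fine reference clock the normalized count tracks $\sqrt{t/T_{PCO}}$ closely, while the staircase steps (flat segments of the state function) grow as the reference period increases.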
5.3.2 Network simulation with digital PCO 
We ran the same MATLAB simulation from Section 4.3.3 to assess the impact of 
digital PCO on the synchronization. The digital PCO is different from analog PCOs 
since its phase function is discrete. The simulator calculates the phase change using 
the digital oscillator model and adjusts the period of each cycle with τ. The simulator 
models a network of 20 nodes randomly scattered on a 500 m × 500 m grid with a 
coupling range of 200 m and TPCO = 1 s. In order to highlight the impact of just the 
digital PCO quantization error, the period jitter is set to be a small value of 0.1 ns. 
Simulations with a jitter of 1 µs are also done to illustrate that the network of digital 
PCOs can achieve synchronization with both jitter and the quantization error. When 
the digital oscillator resets either at the end of a cycle or due to coupling, its nominal 
period TPCO is adjusted to include τ, if necessary, as well as the jitter. Simulations are 
run for 100 cycles for each parameter sweep.  
5.3.3 Simulation results and discussion 
An implication of design A is that if a CP takes the fastest node to threshold before 
the network synchronizes, the node will appear to have a longer period 
($T_A = T_{PCO} + \tau \ge T_{PCO}$), thereby hurting the speed of synchronization. A typical example 
of node dynamics of the network for design A is plotted in Figure 5.9, which shows 
the firing times of each node relative to the fastest node every time the fastest node 
fires. We observed that the network can be locked to a slower clock at the beginning 
as the fastest node reduces τ each cycle, eventually leading the synchronization. 
 
 
Figure 5.9. Time of firing of all nodes relative to each time the fastest node fires 
(design A). 
[Plot “Phase Dynamics”: firing time offset (s) vs. firing cycle index.] 
Figure 5.10. Synchronization speed, i.e. the average number of cycles (across 100 
different initial conditions) the leader node takes to synchronize. Large and 
small jitter refer to reference clock jitter of 1 µs and 0.1 ns, respectively. 
[Plot: number of cycles to sync vs. reference clock period, Tref (s). Curves: Design A with Small Jitter, Design A with Large Jitter, Design B with Small Jitter, Design B with Large Jitter.] 
The synchronization time increases linearly (Figure 5.10) with the reference clock 
period. Figure 5.11 (Design A with Small Jitter) shows that design A does not 
introduce any considerable relative jitter to the system. In fact, one advantage of 
design A is that it helps to maintain the synchronization: once synchronized, all 
the slower nodes have even longer effective periods and thus their period jitter is less 
likely to take them out of lock. The orange curve shows an increased relative jitter 
with the introduction of an absolute jitter of 1 µs. However, as the effect of 
discretization increases with Tref, design A reduces the relative jitter by reducing the 
contribution of the absolute jitter even below the analog oscillator case. The 
performance of the analog oscillator is equivalent to that of the digital designs when Tref is 
sufficiently small (Figure 5.11). 
 
 
Figure 5.11. Synchronization quality measured by the average of the maximum 
relative jitter. With the relaxed requirement, the network is considered 
synchronized if the period of each node is within Tref of the period of the fastest 
node. 
[Figure 5.11 plot: x-axis: Reference Clock Period, Tref (s); y-axis: Relative Jitter (s); series: Design A with Relaxed Requirement, Analog Case with Large Jitter, Design A with Small Jitter, Design A with Large Jitter, Design B with Small Jitter, Design B with Large Jitter]
The impact of design B can be more serious because it affects the quality of the 
synchronization. A slow node locked to the leader node appears to have a shorter 
period (TB ≤ TPCO) and if the leader (fastest) node is not fast enough, TB may become 
smaller than the period of the leader node, resulting in the slow node temporarily 
getting out of lock. Once out of lock, the slow node has its nominal TPCO (instead of 
TB) and synchronizes with the leader again. This constant loss of synchronization and 
re-locking can create a relative jitter that does not appear in design A. This effect is 
illustrated in the plot of typical node dynamics in the pseudo-synchronized state 
(Figure 5.12). The relative firing times of many nodes exhibit triangular shapes, which 
means these nodes are getting in and out of lock. This large relative jitter is illustrated 
in Figure 5.11. The synchronization speed of design B appears faster than that of design A 
(Figure 5.10) because design A takes longer to reach synchronization with a much 
stricter relative jitter criterion. However, a design A network achieves synchronization 
faster than design B for the same synchronization quality (Figure 5.13 - Design A with 
Relaxed Requirement). 
 
 
Figure 5.12. Time of firing of all nodes relative to each time the fastest node fires 
(design B). 
[Figure 5.12 plot: "Phase Dynamics"; x-axis: Firing Cycle Index; y-axis: Firing Time Offset (s)]

 
Figure 5.13. Synchronization speed. With fast reference clocks (small Tref), the 
discretization error is negligible and the performance of design A and design B 
approaches that of the analog oscillator. 
[Figure 5.13 plot: x-axis: Reference Clock Period, Tref (s); y-axis: Number of Cycles to Sync; series: Design A with Relaxed Requirement, Design B with Relaxed Requirement, Design A with Minimum Relative Jitter, Analog Case]

Figure 5.14. Synchronization speed with varying coupling strength. 
[Figure 5.14 plot: x-axis: Coupling Strength; y-axis: Number of Cycles to Sync; series: Design A, Design B]
We investigated whether the coupling strength alters the impact of the digital PCO 
on synchronization speed. As shown in Figure 5.14, the synchronization speed is 
constant as long as the coupling strength is large enough. We also performed the same 
simulations with 5 nodes and the results were qualitatively similar. 
In summary, design A is the better choice. First, while design B adds relative jitter 
on the order of 1 µs, design A not only introduces zero relative jitter contribution from 
the discretization, but also reduces the contribution from the period jitter of the 
reference clock, achieving better synchronization quality than even the analog 
oscillator design. This robust synchronization comes at the expense of 
synchronization speed: depending on the system parameters, the network can take 
hundreds of cycles or more to achieve synchronization. Second, if the 
system does not have a strict relative jitter requirement, design A is still the better 
choice: if we can tolerate the same relative jitter as in design B, a design A network 
needs only 2-3 cycles to pseudo-synchronize, which is fewer than 
the analog oscillator and far fewer than the tens of cycles of design B. 
5.3.4 Digital PCO implementation 
To meet the needs of this system, a PCO based on a relaxation oscillator described 
in design A was implemented using standard counters and digital logic. The slowing 
trigger clock is generated using a counter driven by the reference clock (Figure 5.15) 
and a static comparator logic that resets the counter when its count equals that of the 
main counter, which represents the oscillator. The output pulse from the comparator 
serves as the trigger clock signal for the main counter. A main counter that resets at 
count M+1 takes Tref(1 + 2 + ... + M) = Tref·M(M + 1)/2 time to reset, which is within 
MTref of the desired TPCO. To be able to set the oscillator period in steps of Tref, we 
included a dedicated counter in the Trigger Clock Timing block that blocks the reference 
clock for up to MTref at the beginning of each cycle. The blackout counter tracks the blackout 
period. The main counter comprises two identical counters, one tracking the oscillator 
phase and the other tracking the coupling-adjusted oscillator phase so that the PCO 
can switch to the coupling-adjusted counter for instantaneous coupling. The former 
counter resets to 1 and the latter resets to a value equal to the coupling strength. The 
difference between design B and design A is that once a coupling to threshold is 
received, the PCO counters are either reset immediately (design B) or at the next 
reference clock (design A). 
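
As a rough illustration of the counter arithmetic above, the following Python sketch models the trigger counter and the main counter one reference-clock tick at a time. It is a behavioral sketch only, not the fabricated logic: the function names (`main_counter_reset_ticks`, `choose_m_and_blackout`), the trigger counter starting from 0, and the rule used to split a target period into M and a blackout length are assumptions introduced here for illustration.

```python
import math
from typing import Tuple


def main_counter_reset_ticks(m: int, blackout: int = 0) -> int:
    """Count reference-clock ticks until the main counter reaches m + 1."""
    main = 1         # oscillator-phase counter; the text says it resets to 1
    trigger = 0      # slowing trigger-clock counter (assumed to start at 0)
    ticks = blackout  # reference clock is blocked during the blackout period
    while main < m + 1:
        ticks += 1
        trigger += 1
        if trigger == main:  # comparator: trigger count equals main count
            trigger = 0      # comparator output resets the trigger counter
            main += 1        # and clocks the main counter once
    return ticks


def choose_m_and_blackout(period_ticks: int) -> Tuple[int, int]:
    """Pick the largest M with M(M+1)/2 <= period, pad the rest with blackout.

    This split is an assumption for illustration, not the thesis procedure.
    """
    m = (math.isqrt(8 * period_ticks + 1) - 1) // 2
    blackout = period_ticks - m * (m + 1) // 2
    return m, blackout


if __name__ == "__main__":
    m, blackout = choose_m_and_blackout(125)
    print(m, blackout, main_counter_reset_ticks(m, blackout))
```

With this split, the leftover blackout never exceeds M ticks, which is consistent with the statement that the counter alone lands within MTref of the desired TPCO.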
 
Figure 5.15. Basic block diagram of the digital PCO 
[Figure 5.15 block labels: Reference Clock, Trigger Clock Timing, Blackout, comparator (=), Trigger Clock, PCO Main Counters, Received Coupling Pulse, Coupling, Reset, PCO Output Pulse]
5.4 Sequence generator 
The sequence generator block outputs the Kasami sequence when the PCO resets. 
It consists of 63 shift registers, which can be programmed to output any binary 
sequence. The output of this block becomes the baseband input of the commercial TX 
used in the RF node.  
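
The behavior described above can be sketched in Python as a programmable register that is reloaded on each PCO reset and shifted out one bit per baseband clock. This is a minimal behavioral model under stated assumptions, not the circuit: the class name `SequenceGenerator`, its methods, the idle value `None`, and the demo bit pattern are all hypothetical, and the demo pattern is not a real Kasami sequence.

```python
from collections import deque
from typing import Optional, Sequence

SEQUENCE_LENGTH = 63  # number of shift-register stages stated in the text


class SequenceGenerator:
    """Behavioral model of a programmable 63-stage shift-register sequencer."""

    def __init__(self, pattern: Sequence[int]) -> None:
        if len(pattern) != SEQUENCE_LENGTH or any(b not in (0, 1) for b in pattern):
            raise ValueError("pattern must contain exactly 63 bits (0 or 1)")
        self._pattern = tuple(pattern)
        self._stages: deque = deque()  # empty register means the block is idle

    def pco_reset(self) -> None:
        """Reload the register so the stored sequence is shifted out again."""
        self._stages = deque(self._pattern)

    def clock(self) -> Optional[int]:
        """Shift out one bit per baseband clock; return None when idle."""
        return self._stages.popleft() if self._stages else None


# Placeholder pattern for illustration only; it is NOT a real Kasami sequence.
demo_pattern = [1, 0] * 31 + [1]
generator = SequenceGenerator(demo_pattern)
generator.pco_reset()
transmitted = [generator.clock() for _ in range(SEQUENCE_LENGTH)]
```

After 63 clocks the model goes idle until the next PCO reset, mirroring the statement that the sequence is output when the PCO resets.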
5.5 Measurement results 
The synchronizer block was fabricated in a TSMC 65nm CMOS process and 
occupies 3.63 mm2. The P2P RF node setup is shown in Figure 5.16. The control and 
clock signals are generated by an FPGA and delivered to the synchronizer chip 
through a motherboard and daughterboard. The commercial RF TX and RX chips 
(ADRF6701 and ADRF6850) sit on a custom PCB that is connected to the baseband 
daughterboard. In addition to testing each block in the fabricated chip, we 
demonstrated synchronization of a 3-node wireless mesh network in the 915 MHz ISM 
band using the RF node setup in Figure 5.16. 
  
 
 
 
Figure 5.16. Die and PCB photo 
The signal processor consumes 1.11 mW when fully on and 13.19 µW when 
operating at a duty cycle of 0.007% (Table 5.1). This is dominated by leakage, which 
accounts for about 7 µW. 
Table 5.1: Power consumption of synchronizer block 
                   Clock Running   SP Clock Off   Duty-cycled 70 µs/1 s
Signal Processor   1.04 mW         6.19 µW        6.26 µW
PCO                6.93 µW         6.93 µW        6.93 µW
Total              1.11 mW         13.12 µW       13.19 µW
An estimate of the duty-cycled power consumption of a commercial RF front end is 
provided in Table 5.2. The duty-cycling is assumed to be 100 µs/1 s in order to account 
for slow turn-on time. The VCO in the synthesizer is assumed to take 1.5 ms to turn on 
and stabilize. The total power of maintaining the synchronization is 13.2 µW + 
41 µW = 54.2 µW. 
Table 5.2: Power consumption of a duty-cycled RF front end 
RF      Duty-cycled   Notes
TX      10 µW         Duty-cycled 100 µs/1 s
RX      24 µW         Duty-cycle: VCO 1.5 ms/1 s, the rest 100 µs/1 s
Xtals   7 µW          No duty-cycle
Total   41 µW
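
The entries in Tables 5.1 and 5.2 can be cross-checked with a simple time-weighted average. The sketch below assumes that the duty-cycled power is the on-state power weighted by the on-fraction plus the clock-off power weighted by the remainder; that model and the helper name `duty_cycled_power_uw` are assumptions made here, not a description of how the measurements were taken.

```python
def duty_cycled_power_uw(p_on_uw: float, p_off_uw: float,
                         on_time_s: float, period_s: float) -> float:
    """Time-weighted average power (in µW) of a block that is on for on_time_s."""
    duty = on_time_s / period_s
    return duty * p_on_uw + (1.0 - duty) * p_off_uw


# Values taken from Table 5.1 (signal processor rows) and Table 5.2 (totals).
signal_processor_uw = duty_cycled_power_uw(1040.0, 6.19, 70e-6, 1.0)
pco_uw = 6.93                      # always on, so no duty-cycling applied
baseband_uw = signal_processor_uw + pco_uw
rf_front_end_uw = 10.0 + 24.0 + 7.0  # TX + RX + crystals from Table 5.2
total_uw = baseband_uw + rf_front_end_uw

if __name__ == "__main__":
    print(f"signal processor: {signal_processor_uw:.2f} uW")
    print(f"baseband total:   {baseband_uw:.2f} uW")
    print(f"with RF front end: {total_uw:.1f} uW")
```

Under this model the signal processor average is dominated by the clock-off term, since the 70 µs on-time contributes well under 0.1 µW per second of operation.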
The data rate for the input baseband signal is 1.25 Mbps for the measurements 
presented below (Tbit = 0.8 µs). The latency is less than 2Tbit = 1.6 µs. More 
specifically, the time it takes for the peak detector to output the result once the last bit 
of the sequence is received at the processor input is between Tbit and 2Tbit, depending 
on the phase offset between the transmitter and the receiver. 
 
Figure 5.17. Differential detector: 1st and 2nd waveform – I and Q inputs, 3rd 
waveform – digitized output, 4th waveform – peak detector output. Differential 
detector correctly decodes the baseband signals with carrier frequency offset 
modulation. The envelope wavelength is deliberately shortened to show the bit 
transitions clearly. 
Figure 5.17 demonstrates the functionality of the differential detector. The top two 
waveforms are the baseband received signals on I and Q channels with an envelope 
due to the carrier frequency mismatch. The figure shows that the encoded sequence 
embedded in them flips polarity from one eye to the next. Regardless of this polarity 
switch, the sequence is decoded correctly as shown by the third waveform, which is 
the digitized output of the differential detector in Core 1. The decoded sequence is also 
correct when the received signal has small amplitude at the edges of the eye. The 
fourth waveform is the peak detector output confirming that the differential detector 
sent the correct decoded sequence to the correlator. 
 
Figure 5.18. Correlation success rate vs. input amplitude. Blue/bottom waveform 
has fixed Vthresh. Red/top waveform has amplitude detector turned on. 
Measurement of the amplitude detector is presented in Figure 5.18. The red/top 
waveform shows the success rate of the correlation when the amplitude detector is 
turned on. Since the threshold automatically adjusts based on the input amplitude, the 
correlation works for a large range of input voltages. The blue/bottom waveform plots 
the case without the amplitude detector, i.e. with fixed Vthresh. If the input power is 
too small (or too large, which is not shown in the figure), the correlation fails.  
Figure 5.19 shows the dual core design. When there is a dead-zone at the output of 
one core due to it sampling the input signals during the transitions, the other core 
samples at the right phase and produces the correct output. 
 
Figure 5.19. Correlator output of Core 1 (top) and Core 2 (bottom). When one of 
the cores samples the input signal at bit transitions and skips a pulse (inside red 
rectangles), the other core samples at the center of the bits, producing the correct 
peaks. 
  
 
 
 
Figure 5.20.  BER vs SNR at the chip input. 
Figure 5.20 plots correlation BER vs input SNR. Since our signal processor does 
not include an RF front end, the BER is measured against SNR of the baseband. The 
setup for this measurement is shown in Figure 5.21. Assuming a commercial RF 
front-end with NF = 5 dB and 2 MHz bandwidth, the SNR of 5 dB at BER of 10⁻³ 
translates to a sensitivity of −101 dBm. If the PCO nodes are placed outdoors in an 
urban environment at 1.5 m height and set to transmit with 0 dBm power, the path loss 
model in (4.2) applies and the PCO nodes should be within 67.4 m of each other in 
order to synchronize with BER = 10⁻³. 
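
The sensitivity figure can be reproduced with the standard receiver noise budget: thermal noise density plus bandwidth, noise figure, and required SNR. The sketch below assumes the usual room-temperature thermal noise density of about −174 dBm/Hz, which the text does not state explicitly; the function name `sensitivity_dbm` is hypothetical.

```python
import math


def sensitivity_dbm(snr_db: float, nf_db: float, bandwidth_hz: float,
                    noise_density_dbm_hz: float = -174.0) -> float:
    """Receiver sensitivity from the noise floor, noise figure and required SNR.

    The default noise density is the common kT value near 290 K (an assumption
    here; the thesis text only gives NF, bandwidth and SNR).
    """
    noise_floor_dbm = noise_density_dbm_hz + 10.0 * math.log10(bandwidth_hz)
    return noise_floor_dbm + nf_db + snr_db


if __name__ == "__main__":
    # NF = 5 dB, BW = 2 MHz, required SNR = 5 dB at BER = 1e-3 (from the text).
    print(f"{sensitivity_dbm(5.0, 5.0, 2e6):.1f} dBm")
```

The 2 MHz bandwidth contributes about 63 dB, so the budget is roughly −174 + 63 + 5 + 5, in line with the quoted −101 dBm.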
Figure 5.22 shows tolerance to in-band interference from multipath and other 
nodes. For distances within the 120 m range (path delay less than ~400 ns, i.e. half bit 
period) there is negligible degradation of performance from multipath reflection. This 
is due to the fact that the bit period is relatively long and the echo signals still arrive 
within a bit period, which does not degrade the correlation significantly. 
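
The correspondence between echo delay (in bit periods) and path-length difference follows from the bit rate and the speed of light. The short sketch below shows that conversion, assuming free-space propagation at c; the names `BIT_RATE_BPS` and `path_difference_m` are introduced here for illustration.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
BIT_RATE_BPS = 1.25e6  # baseband data rate from the text (Tbit = 0.8 µs)


def path_difference_m(delay_bits: float) -> float:
    """Extra path length travelled by an echo delayed by delay_bits bit periods."""
    delay_s = delay_bits / BIT_RATE_BPS
    return delay_s * SPEED_OF_LIGHT_M_S


if __name__ == "__main__":
    for bits in (0.2, 0.5, 0.6, 1.0):
        print(f"{bits:.1f} bit -> {path_difference_m(bits):.1f} m")
```

Half a bit period (400 ns) comes out at about 120 m, and 0.6 bit periods at about 144 m, matching the distances quoted in the text and in the Figure 5.22 caption.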
 
 
Figure 5.21. Setup for BER measurement. 
[Figure 5.21 block labels: AWG (three instances), Added Noise, differential I and Q inputs, Baseband Processor, PCO. Annotations: "Baseband signal with noise (SNR), random phase, LO frequency mismatch envelope"; "Success if single pulse at the right time. Otherwise error".]

Figure 5.22. BER in the presence of multipath. Multipath interference of equal 
magnitude does not significantly degrade performance up to distances of 144 m of 
system range. 
[Figure 5.22 plot: x-axis: Delay of 2nd sequence arrival in bit-period, 0 to 1.0 (equivalent distance difference, 0 to 240 m); y-axis: Amplitude Ratio of 2nd Sequence to 1st Sequence; contour: BER = 1e-3]

 
Figure 5.23. Large in-band interferers with signal ratios outside of the blue 
region and with offsets between 0.6 and 63 bits can degrade detection. 
[Figure 5.23 plot: x-axis: Sequence offset in bits; y-axis: Amplitude Ratio; contour: BER = 1e-3]
However, if the PCO period difference between the two fastest nodes (i.e. those with 
the shortest PCO periods) in the system is smaller than the syncword length, or <51 ppm 
in our system, their transmitted signals overlap, degrading detection (BER > 1e-3) if the 
amplitude ratio is large (above the blue region in Figure 5.23). This can be resolved by 
ensuring that one node in the network has a 51 ppm faster PCO and drives the nodes to 
synchrony. 
Figure 5.24 shows the transient circuit simulation output of the main counter of the 
digital PCO, showing the monotonically-increasing concave-down curve. Figure 5.25 
shows the transient of the digital PCO, set to reset at count 16 for simplicity. 
The top waveform is the external coupling pulse. Its first pulse is ignored as it falls in 
the blackout period (the 3rd waveform indicates blackout, active low). The second pulse 
causes the PCO (4th waveform) to reset early. The 2nd waveform indicates when the PCO 
starts from 0. 
Figure 5.26 shows the output of the sequence generator (3rd waveform). It can be 
seen that the sequence generator outputs the same sequence as the 1st waveform, 
which is the correct sequence. 
 
Figure 5.24. Digital PCO circuit output – Cadence transient simulation. A 
coupling pulse received at t = 7 µs advances the oscillator state. 
[Figure 5.24 plot: x-axis: Time (s); y-axis: Main Counter Output; annotation: Coupling Pulse Received]

Figure 5.25. Transient of digital PCO.  

Figure 5.26. Transient of the sequence generator 

Figure 5.27. Transient of a node locking to a received signal 
Figure 5.27 shows the functionality of the synchronizer. An RF source is 
configured to (1) turn on, (2) send the carrier signal, and then (3) modulate the baseband 
Kasami sequence to the carrier frequency. The signal from the RF source, 
demodulated down to baseband, is seen as the red/top waveform in the figure. It can 
be seen that the signal processor detects the expected sequence successfully despite phase 
and frequency mismatch and noise. Disabling the 2nd core resulted in a dead-zone in the 
correlator output, which implies that the baseband clock of the RF source also has a 
different frequency from the FPGA-generated baseband clock. The green/bottom 
waveform shows that the PCO is able to lock to the received signal.  
Figure 5.28 shows synchronization of 3 nodes in a mesh network through an RF 
link at 915 MHz with off-the-shelf RX and TX chips. The top waveform shows the 
initial state of the three PCOs at their natural frequencies and phases. The bottom 
shows the synchronized PCOs after the links are activated and coupling between 
nodes proceeds. 
 
Figure 5.28. Wireless synchronization of three nodes 
  
 
 
The synchronization jitter is shown in Figure 5.29. Jitter is measured to be 4e-5% 
of period and does not significantly degrade duty cycle. Most of the jitter is due to the 
baseband clock mismatch, which causes the synchronized edge to drift over half bit 
period (400 ns) as it is sampled by the two SP cores that sample half bit period apart. 
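
As a quick consistency check of the quoted jitter, the half-bit drift can be expressed as a fraction of the PCO period. The PCO period of 1 s used below is an assumption inferred from the 70 µs/1 s duty cycle in Table 5.1; it is not stated alongside the jitter measurement itself.

```latex
\frac{\Delta t}{T_{PCO}}
  = \frac{T_{bit}/2}{T_{PCO}}
  = \frac{400\ \mathrm{ns}}{1\ \mathrm{s}}
  = 4 \times 10^{-7}
  = 4 \times 10^{-5}\,\%
```

Under that assumption the half-bit drift alone accounts for the reported 4e-5% figure.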
 
Figure 5.29. The reset signal of synchronized PCO and the histogram of its rising 
edge. 
  
 
 
 
Figure 5.30. Duty-cycled operation. System locks with low latency despite long 
“off” cycle and random “on” states. 
Finally, once an RF node is synchronized to the network, it duty-cycles the signal 
processing block and the sequence generator block to save power. The digital PCO is 
always on since it provides the timing for the duty-cycling. In this implementation, the 
duty-cycling is accomplished by stopping the clock signals. Figure 5.30 shows that the 
node turns on the clock right before it expects to receive a coupling pulse, receives the 
pulse and advances its PCO phase to threshold/reset, maintaining the network timing. 
  
 
 
Since this architecture is unique in that it supports scalable mesh networks, the 
closest comparisons in performance (sensitivity, SNR, power) are made to 
wake-up radios, as shown in Table 5.3. Estimates of NF, RX power, and BW are based 
upon a commercial radio front end.  
Table 5.3: Comparison with the state-of-the-art wake-up radios. 
            Sensitivity (dBm)   Power (µW)    Modulation   Data rate (kbps)   LO Freq (MHz)   Tech (nm)
This Work   -101 (a)            44.2 (b)      BPSK         1250               915             65
[26]        -97                 99            OOK          10                 2400            65
[27]        -87                 45.5          GFSK         50                 924.4           65
[28]        -70                 44            FSK          200                402             130
[29]        -69                 0.0045 (c)    OOK          0.3                113.5           180
[30]        -56.5               0.236         OOK          8.192              2400            65
[31]        -45                 0.116         OOK          12.5               402             130
* Comparison is to RX only since other designs do not include TX power. 
(a) Estimated from baseband SNR assuming conventional RX with NF = 5 dB and BW = 2 MHz. 
(b) 13.19 µW baseband chip + 31 µW for RX front end duty-cycled at 0.01%, synthesizer assuming 1.5 ms turn-on time, and crystal. 
(c) Assumes clock and input data are synchronized. 
5.6 Conclusion 
In this chapter, we demonstrated a low-power baseband synchronizer block for 
emergent synchronization of a P2P RF network. The architecture supports low-latency 
detection of a programmable syncword in wireless nodes. It is compatible with 
commercial RF front ends, and can enable aggressive duty-cycling for power savings 
in such systems. Our measurement results show a total power consumption of 1.11 
mW while processing a 1.25 Mbps syncword in fully-on mode, stand-by power of 
13.12 µW, a sensitivity of -101 dBm, and latency of less than two bit periods, enabling 
0.007% duty-cycled power consumption of 13.19 µW.  
Although the hardware that synchronizes narrowband P2P radios is developed in 
this thesis, a fully functional wireless P2P network with duty-cycled communication 
requires future work. First, a communication framework that uses the PCO reset signal 
as a trigger to schedule the data communication should be developed. In addition, the 
current duty-cycling algorithm should be expanded to handle turning on the radio 
during data transmission. This should take into account the fact that multi-hop nodes 
may reset Tprop before/after their neighbors.  
Design trade-offs affecting power consumption should be more thoroughly studied 
to improve power savings or increase performance. For example, reducing the 
syncword length reduces Tprop and saves power and area in the signal processing 
block. However, it will degrade BER performance, requiring higher TX power or 
shortened range. If better BER performance is required, the syncword should be 
extended to achieve more correlation gain. However, expanding the proposed analog 
processor to handle a longer sequence would quickly increase the power and area costs 
due to parasitics. Hence, an all-digital architecture for the signal processor should be 
investigated. With an ADC at the input, the exact same architecture for the differential 
detector, correlator and amplitude detector can be implemented in digital logic. 
Alternatively, traditional signal processing may be performed in a DSP. The power and 
other metrics should be evaluated to determine which signal processor implementation 
is better. 
  
  
 
 
Chapter 6 REFERENCES 
[1] X. Wang, M. Tehranipoor and R. Datta, “A novel architecture for on-chip path 
delay measurement,” 2009 Int. Test Conf., Austin, TX, 2009, pp. 1-10. 
[2] A. Jain, A. Veggetti, D. Crippa and P. Rolandi, “On-chip delay measurement 
circuit,” 17th IEEE Eur. Test Symp., Annecy, 2012, pp. 1-6. 
[3] R. Datta, A. Sebastine, A. Raghunathan and J. A. Abraham, “On-chip delay 
measurement for silicon debug”, Proc. Great Lakes Symp. VLSI, Boston, MA, 
2004, pp. 145-148. 
[4] M. C. Tsai, C. H. Cheng and C. M. Yang, "An All-Digital High-Precision Built-In 
Delay Time Measurement Circuit," Proc. 26th IEEE VLSI Test Symp., San Diego, 
CA, 2008, pp. 249-254. 
[5] S. Ghosh, S. Bhunia, A. Raychowdhury and K. Roy, "A Novel Delay Fault Testing 
Methodology Using Low-Overhead Built-In Delay Sensor," in IEEE Trans. 
Comput.-Aided Des. Integr. Circuits Syst., vol. 25, no. 12, pp. 2934-2943, Dec. 
2006. 
[6] K. Kato and S. Choomchuay, “An on-chip delay measurement using adjacency 
testable scan design,” 7th Int. Conf. Inform. Technol. and Elect. Eng., Chiang Mai, 
2015, pp. 508-513. 
[7] S. Maggioni, A. Veggetti, A. Bogliolo and L. Croce “Random sampling for on-
chip characterization of standard-cell propagation delay”, Proc. 4th Int. Symp. 
Quality Electron. Des., 2003, pp. 41-45. 
[8] R. Z. Bhatti, M. Denneau and J. Draper, “Phase measurement and adjustment of 
digital signals using random sampling technique,” 2006 IEEE Int. Symp. Circuits 
and Syst., Island of Kos, 2006, pp. 4. 
  
 
 
[9] M. Mansuri, B. Casper and F. O’Mahony, “An on-die all-digital delay 
measurement circuit with 250fs accuracy,” Symp. VLSI Circuits, Honolulu, HI, 
2012, pp. 98-99. 
[10] T. Hashimoto, H. Yamazaki, A. Muramatsu, T. Sato and A. Inoue, “Time-to-
digital converter with Vernier delay mismatch compensation for high resolution 
on-die clock jitter measurement”, IEEE Symp. VLSI Circuits, Honolulu, HI, 2008, 
pp. 166-167. 
[11] K. Nose, M. Kajita and M. Mizuno, “A 1-ps resolution jitter-measurement 
macro using interpolated jitter oversampling”, IEEE J. Solid-State Circuits, vol. 
41, no. 12, pp. 2911-2920, Dec. 2006. 
[12] M. Ishida, K. Ichiyama, T. Yamaguchi, M. Soma, M. Suda and T. Okayasu, 
“On-chip circuit for measuring data jitter in the time or frequency domain", IEEE 
Radio Frequency Integrated Circuits Symp., Honolulu, HI, 2007, pp. 347-350. 
[13] J. Liang, M. S. Jalali, A. Sheikholeslami, M. Kibune and H. Tamura, "On-Chip 
Measurement of Clock and Data Jitter With Sub-Picosecond Accuracy for 10 Gb/s 
Multilane CDRs", IEEE J. Solid-State Circuits, vol. 50, no. 4, pp. 845-855, Apr. 
2015. 
[14] K. A. Jenkins, A. P. Jose and D. F. Heidel, "An on-chip jitter measurement 
circuit with sub-picosecond resolution", Proc. Eur. Solid-State Circuits Conf., 
2005, pp. 157-160. 
[15] K. Niitsu, M. Sakurai, N. Harigai, T. J. Yamaguchi and H. Kobayashi, "CMOS 
Circuits to Measure Timing Jitter Using a Self-Referenced Clock and a Cascaded 
Time Difference Amplifier With Duty-Cycle Compensation", IEEE J. Solid-State 
Circuits, vol. 47, no. 11, pp. 2701-2710, Nov. 2012. 
  
 
 
[16] J. D. Schaub, F. H. Gebara, T. Y. Nguyen, I. Vo, J. Pena and D. J. Acharyya, 
"On-chip jitter and oscilloscope circuits using an asynchronous sample clock," 
Proc. Eur. Solid-State Circuits Conf., 2008, pp. 126-129. 
[17] X. Wu et al., "FlashLinQ: A Synchronous Distributed Scheduler for Peer-to-
Peer Ad Hoc Networks," in IEEE/ACM Trans. Netw., vol. 21, no. 4, pp. 1215-
1228, Aug. 2013. 
[18] R. E. Mirollo and S. H. Strogatz, “Synchronization of pulse-coupled biological 
oscillators,” SIAM J. Appl. Math, vol. 50, pp. 1645-1662, Dec. 1990. 
[19] Y. W. Hong and A. Scaglione, “A scalable synchronization protocol for large 
scale sensor networks and its applications,” IEEE J. Sel. Areas Commun., pp. 
1085–1099, May 2005. 
[20] X. Wang, R.K. Dokania and A. Apsel, “PCO Based Synchronization for 
Cognitive Duty-Cycled Impulse Radio Sensor Network,” IEEE Sensors J., vol. 11, 
no. 3, pp. 555 - 564, Mar. 2011. 
[21] E. Gantsog, D. Liu and A. B. Apsel, "0.89 mW on-chip jitter-measurement 
circuit for high speed clock with sub-picosecond resolution," in Proc. Eur. Solid-
State Circuits Conf., Sep. 2016, pp. 457-460. 
[22] J. A. McNeill and D. Ricketts, “Low jitter VCO design examples” in The 
Designer's Guide to Jitter in Ring Oscillators. Boston, MA, USA: Springer-Verlag 
US, 2009, ch. 9, sec. 1, pp. 245-246. 
[23] S. H. Hall, H. L. Heck, “Modeling and Budgeting of Timing Jitter and Noise,” 
in Advanced Signal Integrity for High-Speed Digital Designs, Hoboken, NJ, USA: 
John Wiley & Sons, 2009, ch. 13, pp. 549-603. 
  
 
 
[24] M. S. Corson, R. Laroia, J. Li, V. Park, T. Richardson and G. Tsirtsis, "Toward 
proximity-aware internetworking," in IEEE Wireless Commun., vol. 17, no. 6, pp. 
26-33, December 2010. 
[25] IEEE P802.11 Wireless LANs, TGah Channel Model, IEEE 802.11-11/0968r4, 
Mar. 2015. 
[26] C. Salazar, A. Kaiser, A. Cathelin and J. Rabaey, "13.5 A −97dBm-sensitivity 
interferer-resilient 2.4GHz wake-up receiver using dual-IF multi-N-Path 
architecture in 65nm CMOS," 2015 IEEE Int. Solid-State Circuits Conf. – Dig. 
Tech. Papers, San Francisco, CA, 2015, pp. 1-3. 
[27] T. Abe et al., "An ultra-low-power 2-step wake-up receiver for IEEE 
802.15.4g wireless sensor networks," 2014 Symp. VLSI Circuits Dig. Tech. 
Papers, Honolulu, HI, 2014, pp. 1-2. 
[28] J. Pandey, J. Shi and B. Otis, "A 120μW MICS/ISM-band FSK receiver with a 
44μW low-power mode based on injection-locking and 9x frequency 
multiplication," 2011 IEEE Int. Solid-State Circuits Conf., San Francisco, CA, 
2011, pp. 460-462. 
[29] H. Jiang et al., "24.5 A 4.5nW wake-up radio with −69dBm sensitivity," 2017 
IEEE Int. Solid-State Circuits Conf., San Francisco, CA, 2017, pp. 416-417. 
[30] N. E. Roberts et al., "26.8 A 236nW −56.5dBm-sensitivity bluetooth low-
energy wakeup receiver with energy harvesting in 65nm CMOS," 2016 IEEE Int. 
Solid-State Circuits Conf., San Francisco, CA, 2016, pp. 450-451. 
[31] S. Oh, N. E. Roberts and D. D. Wentzloff, "A 116nW multi-band wake-up 
receiver with 31-bit correlator and interference rejection," Proc. IEEE 2013 
Custom Integrated Circuits Conf., San Jose, CA, 2013, pp. 1-4. 
