Automated Tool To Generate Global Clock Distribution For Spine Structure by Tan, Chun Meng
Automated Tool to Generate Global Clock Distribution 
for Spine Structure 
 
 
 
 
 
 
By 
 
 
 
 
Tan Chun Meng 
 
 
 
 
 
A Dissertation submitted for partial fulfilment of the 
requirement for the degree of Master of Science 
 
 
 
 
July  2014 
Acknowledgement 
 
First and foremost, I would like to thank God, the merciful and the compassionate, for 
having made everything possible in this work by giving me the power to believe in 
myself and showing me inner peace. 
 
It is a pleasure to offer my regards and thanks to those who made this thesis possible and 
those who helped me during the research. This dissertation could not have been written 
without their help and support. Firstly, I am sincerely thankful to all my family, 
particularly to my dearest loving mother. Thank you for everything that you have done 
for me; and this thesis is for you. I am also thankful to my supervisor, Dr. Arjuna 
Marzuki, for his support and for giving me the opportunity to work under him. His 
suggestions and constructive comments have improved my thinking, research view, and 
writing skills. Thanks for teaching me to view the problem from different angles.  
 
Last, but by no means least, I would like to thank the Dean and staff of the School of EE, 
USM for their valuable support. In particular, I would like to thank Pn. Jamaliah and Cik 
Rosni, who have helped me during my candidature period in USM.  
Table of Contents 
Acknowledgement
Table of Contents
List of Tables
List of Figures and Illustrations
List of Abbreviations and Nomenclature
Abstrak
Abstract
CHAPTER 1
INTRODUCTION
1.1 Overview
1.2 Problem Statements
1.3 Aim and Objectives
1.4 Research Methodology
1.5 Thesis Organization
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction to Clock in Synchronous System
2.1.1 Transition Time
2.1.2 Duty Cycle
2.1.3 Clock Skew
2.1.4 Clock Jitter
2.1.5 Clock Uncertainty
2.2 A Glance on Various Kinds of Clock Distribution Networks
2.2.1 Binary Tree
2.2.2 The H-Tree
2.2.3 Clock Mesh and Local Meshes
2.2.4 Clock Spine
2.3 CTS Process
2.3.1 Limitation of CTS
2.4 Summary
CHAPTER 3
THE DESIGN AND IMPLEMENTATION OF THE CLOCK SPINE
3.1 Overview of the CTS Design Flow
3.2 Overview of the Clock Spine Custom PnR Design Flow
3.3 Overview of the Automated Clock Spine Generation Design Flow
3.3.1 The Automated Solution Concept to Generate Netlist and TCL Command File
3.3.2 The Automated Solution Concept to Calculate the Coordinates of the Buffers
3.4 Circuit Design
3.5 Layout Implementation
3.6 Common CTS
3.7 Summary
CHAPTER 4
EVALUATION AND DISCUSSION
4.1 Clock Spine Quality of Results
4.2 CTS Quality of Results
4.3 Comparison of the Results
4.4 Summary
CHAPTER 5
CONCLUSION
5.1 Key Learning
5.2 Conclusion
5.3 Future Work
REFERENCES
Appendix A: Clock Spine Generation Script
Appendix B: Clock Spine Netlist
Appendix C: Full Chip Netlist
Appendix D: TCL Command File
 
 
 
 
 
 
 
 
 
 
 
 
List of Tables 
Table 4.1 The Clock Spine Timing Characteristic Taken from the Measurement Points
Table 4.2 The CTS Timing Characteristic Taken from Random Measurement Points
Table 4.3 The Summary of Clock Spine versus CTS Timing Performance Comparison
 
 
 
List of Figures and Illustrations 
Figure 1.1 Typical Synchronous Circuits
Figure 2.1.1 Illustration of Transition Time
Figure 2.1.2 Illustration of Duty Cycle
Figure 2.1.3 Illustration of Clock Skew
Figure 2.1.4 Clock Jitter Effect
Figure 2.1.5 Illustration of Clock Uncertainty
Figure 2.2 Concept of Global, Regional and Local Clock Distribution Network
Figure 2.2.1 Example of Binary Tree
Figure 2.2.2 Example of H-Tree
Figure 2.2.3 Illustration of Clock Mesh
Figure 2.2.4 Clock Spine Structure
Figure 2.3.1 (a) CTS Add In Buffer to Balance Clock Skew
Figure 2.3.1 (b) CTS Remove Buffer to Balance Clock Skew
Figure 3.1 The Design Flow Overview for CTS
Figure 3.2 The Design Flow Overview for Custom Clock Spine
Figure 3.3 The Design Flow Overview for Auto Generated Clock Spine
Figure 3.3.1(a) The Algorithm of Automated Solution
Figure 3.3.1(b) The Pseudo Code of the Automated Solution
Figure 3.3.1 (c) The Code Snippet to Parse the Text for Spine Netlist
Figure 3.3.2 (a) The Algorithm of Automated Solution
Figure 3.3.2 (b) The Illustration of Buffers Coordinates Calculation for Even Number of Buffers in Next Stage
Figure 3.3.2 (c) The Illustration of Buffers Coordinates Calculation for Odd Number of Buffers in Next Stage
Figure 3.3.2 (d) The Induction to Derive the Algorithm to Calculate Y-Coordinates for Odd Number of Buffers in Next Stage Cases
Figure 3.4(a) Overview of the Clock Spine Architecture
Figure 3.4 (b) Schematic Entry of the Clock Spine for Pre-layout Simulation
Figure 3.5 Snapshot of the Clock Spine Layout in Full Chip View
Figure 3.6 Snapshot of the ICC CTS Layout Results in Full Chip View
Figure 4.1(a) The Flow Chart of Clock Spine Performance Evaluation
Figure 4.1(b) The Clock Spine Measurement Points
Figure 4.1(c) The Sample Input Clock Signal to Output Clock Signal of the Clock Spine Distribution Network
Figure 4.2 The Sample Input Clock Signal to Output Clock Signal of the CTS Distribution Network
List of Abbreviations and Nomenclature 
 
Abbreviation    Meaning
IC              Integrated Circuit
SoC             System on Chip
IP              Intellectual Property
OCV             On Chip Variation
CTS             Clock Tree Synthesis
HDL             Hardware Description Language
RTL             Register Transfer Level
ICC             Integrated Circuit Compiler
STA             Static Timing Analysis
FC              Full Chip
ECO             Engineering Change Order
CAD             Computer Aided Design
APR             Auto Place and Route
PnR             Place and Route
 
  
Abstrak 
 
The clock signal is the signal that controls and synchronizes the logic activities as well 
as the register read/write activities in a synchronous circuit. The way the clock 
distribution network is designed is therefore always a priority in integrated circuit 
design. The clock spine scheme is well known for the quality of the signal it distributes. 
The scheme performs well in terms of skew, jitter and OCV. As a result, it has become 
popular, especially in high speed circuits such as CPUs. However, the clock spine scheme 
is not commonly used in SoCs. This is because of the difficulty of designing the scheme 
and the complexity of validating its performance. Many SoC design tools still do not 
support the scheme today. This thesis therefore discusses a methodology for introducing 
and integrating the clock spine scheme and for demonstrating the quality of the 
distributed signal. The discussion covers the know-how and the automation of the 
methodology that can reduce the complexity of designing the clock spine scheme. 
Abstract 
 
The clock is the signal that synchronizes the logic as well as the register read/write 
activities of a synchronous circuit. A reliable clock distribution network is therefore 
always a top priority in IC design. The clock spine is well known for the robustness of 
the clock signal it delivers, and the spine structure has shown good performance in terms 
of skew, jitter and OCV. The scheme is thus popular in high speed circuitry such as CPU 
design. However, the clock spine is not commonly employed in SoC design because of the 
design and validation complexity of the scheme, and many SoC design toolsets still do not 
support it. In this thesis, an automated methodology is introduced and proven to integrate 
a clock spine into a SoC to distribute a high frequency clock signal. This includes the 
know-how and the automation of the methodology to minimize the complexity of designing 
the clock spine.  
 
 
CHAPTER 1   
 
INTRODUCTION 
 
 
1.1 Overview 
Figure 1.1 Typical Synchronous Circuits 
 
Figure 1.1 shows a typical synchronous circuit. A combinational circuit is a group of 
logic gates that together perform a function or computation, such as an adder. Between 
the combinational circuits sits a register bank. The registers are flip-flops employed to 
hold the computed data. 
 
 
 
The flip-flops used are usually edge-triggered. When the clock signal transitions from 
low to high, the registers load the bits output by the combinational circuit, and the 
data is then transferred to the next stage. At a low-to-high clock transition, the data 
computed in Combinational Circuit A is written into Register Bank A. The data in Register 
Bank A then propagates to Combinational Circuit B. 

Combinational Circuit B calculates its output. At the next low-to-high transition, that 
output is written into Register Bank B. At the same time, the content of Register Bank A 
is overwritten by a new set of data computed by Combinational Circuit A. 
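
The register behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not part of the thesis tooling: the functions `comb_a` and `comb_b` are hypothetical stand-ins for Combinational Circuits A and B, and each loop iteration represents one low-to-high clock transition.

```python
def comb_a(x):
    """Hypothetical Combinational Circuit A: a simple adder stage."""
    return x + 1


def comb_b(x):
    """Hypothetical Combinational Circuit B: doubles its input."""
    return x * 2


def simulate(inputs):
    """Apply one rising clock edge per input value and record both register banks."""
    reg_a = None  # Register Bank A, empty before the first edge
    reg_b = None  # Register Bank B
    history = []
    for x in inputs:
        # Between edges the combinational outputs settle, using the values
        # the registers held before this rising edge.
        next_a = comb_a(x)
        next_b = comb_b(reg_a) if reg_a is not None else None
        # Rising edge: both register banks load at the same instant.
        reg_a, reg_b = next_a, next_b
        history.append((reg_a, reg_b))
    return history


if __name__ == "__main__":
    for edge, (a, b) in enumerate(simulate([1, 2, 3]), start=1):
        print(f"edge {edge}: Register Bank A = {a}, Register Bank B = {b}")
```

Note that Register Bank B always holds a result derived from the value Register Bank A held one clock edge earlier, which is the pipelining effect the text describes.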
 
Clock signals in IC design are square wave signals that toggle between two states, high 
and low. A practical chip design has several clock signals oscillating at different 
frequencies and driving different partitions. The full chip is partitioned based on 
functionality and operating frequency. 
 
As the example above shows, the clock is vital to the chip's functionality. The quality 
of the clock signals distributed to the registers directly affects the chip's behavior, 
and a poor clock may even cause the design to malfunction. Whenever a project to design a 
new chip starts, the clock distribution networks have to be redesigned. A very 
significant design effort is therefore channeled into making sure the clock distribution 
networks deliver good quality clock signals. 
 
In SoC design, IP blocks are often reused. Take the multicore processor as an example: 
the same core is duplicated and integrated into a dual-core or quad-core processor. These 
cores run on clock signals of the same frequency, so a global distribution network is 
needed to distribute the clock signal to them. A binary tree, for example, can be 
employed as a global clock distribution network. The clock designer needs to 
re-synthesize the clock tree, taking clock quality issues such as clock skew into 
consideration. The clock tree is synthesized by a process called Clock Tree Synthesis 
(CTS) with the aid of Computer Aided Design (CAD) toolsets. A tree generated by CTS is, 
however, sensitive to On Chip Variation (OCV). 
 
For high speed circuitry such as CPUs, another scheme named the clock spine is commonly 
used. The clock spine is well known to have better skew, jitter and OCV performance than 
other kinds of distribution network schemes. However, this structure is not supported by 
the SoC design toolsets, so designers need to do the place and route task manually. The 
manual work is very time consuming. 
 
To stay competitive in time-to-market, IC design companies wish to bring a qualified 
product to market quickly. In this thesis, a new tool, or approach, to automate the clock 
spine place and route work will be discussed. The proposed solution is able to reduce the 
time and effort required if the clock designer chooses to use a spine as the global clock 
distribution network. 
  
 
 
1.2 Problem Statements 
SoC design is becoming popular across the IC design industry. In SoC design, clock 
signals are important to synchronize the chip's logic activities. Without high quality 
clock signals, the chip might not be able to function as expected, and computation errors 
will lead to malfunction of the design. The major error sources for a clock signal are 
clock skew and jitter. 
 
The clock spine has been commonly used in high speed circuit design such as CPUs. It 
provides great quality of results in skew and jitter reduction, but it has not been 
commonly used in SoC design. The clock spine is still not supported by many SoC design 
toolsets, so it needs to be placed and routed manually. The manual work consumes a great 
amount of design effort. In addition, the parasitic RC of the spine needs to be extracted 
for timing quality verification.  
 
Therefore, in this thesis, a methodology and apparatus will be introduced and proven to 
integrate a clock spine into a SoC to distribute a high frequency clock signal. These 
include the know-how and the automation of the methodology to minimize the complexity of 
designing the clock spine.  
  
 
 
1.3 Aim and Objectives 
The aim of the research is to design, implement, and evaluate a new methodology and 
apparatus for a clock spine distribution network in a SoC. To realize this aim, the 
following objectives are adopted: 
i. To prove the concept of automating the generation of the clock spine as a global 
clock network in IC design. 
ii. To evaluate the performance of the automatically generated clock spine versus 
the CTS tree in terms of signal quality. 
 
 
1.4 Research Methodology  
Overall, the research methodology is divided into three main phases.  

i. Literature review: The literature review starts with the basic concepts of 
clock signals, including the fundamentals needed to understand the sources of 
error in clock signals. Various clock schemes used in the existing IC design 
industry are introduced in brief. The limitations of designing a clock tree by 
CTS, as well as of designing a clock spine with manual effort, are discussed.  

ii. Design and Implementation: A Perl script is used to generate the netlist files 
and the TCL placement command file for the clock spine. The spine's layout is 
then generated using Auto Place and Route (APR) in the Synopsys ICC tool. 
The parasitic RC of the clock spine is simulated using SPICE. 

iii. Evaluation, Benchmarking, and Case study: The simulated clock spine result 
is compared with the Clock Tree Synthesis (CTS) result, and the spine 
performance is benchmarked against it. 
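
To give a feel for the kind of output the generation step in phase ii produces, the following is a minimal sketch, written in Python rather than the Perl used in this work. It is not the script of Appendix A. The cell name `BUFX4`, its pins `A` and `Y`, the instance naming, the evenly spaced placement, and the command name `place_cell_at` are all hypothetical placeholders; in particular, `place_cell_at` is not an ICC command and only stands in for whatever placement command the TCL file would contain.

```python
def spine_netlist(module, n_buffers, cell="BUFX4"):
    """Return structural Verilog for n_buffers buffers driven by one clock input.

    The cell name and pin names (A, Y) are hypothetical placeholders.
    """
    lines = [
        f"module {module} (clk_in, clk_out);",
        "  input clk_in;",
        f"  output [{n_buffers - 1}:0] clk_out;",
    ]
    for i in range(n_buffers):
        lines.append(f"  {cell} spine_buf_{i} (.A(clk_in), .Y(clk_out[{i}]));")
    lines.append("endmodule")
    return "\n".join(lines)


def placement_commands(n_buffers, x, y_start, pitch):
    """Return one placement line per buffer, evenly spaced along the Y axis.

    'place_cell_at' is a made-up placeholder, not a real tool command.
    """
    return [
        f"place_cell_at spine_buf_{i} {x:.2f} {y_start + i * pitch:.2f}"
        for i in range(n_buffers)
    ]


if __name__ == "__main__":
    print(spine_netlist("clock_spine", 3))
    print("\n".join(placement_commands(3, x=100.0, y_start=10.0, pitch=20.0)))
```

Generating both files from the same loop index keeps the instance names in the netlist and in the placement file consistent, which is the main benefit of scripting this step.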
 
 
1.5 Thesis Organization 
This thesis is organized into five chapters. The rest of the thesis is organized as follows. 
Chapter 2 elaborates the key parameters and design specifications of a clock distribution 
design and the state of the art for clock distribution schemes in the SoC industry. At the 
end of the chapter, the key problem statement and interest of the thesis are explained. 
 
Chapter 3 outlines the design flow and implementation of the clock distribution network 
design. This includes the toolset commonly used for CTS in the SoC industry, the ordinary 
way of building a clock spine network, and the proposed automated way to generate the 
clock spine network.  
 
 
 
Chapter 4 compares the quality of results for both the clock spine and the CTS tree (as 
elaborated in Chapter 3). Benchmarking of both clock schemes is elaborated and 
discussed in detail. 
 
Finally, Chapter 5 presents the conclusions of the research and highlights its findings 
and contributions. In addition, the chapter highlights possible future work as a 
continuation of this work.   
 
 
CHAPTER 2   
 
LITERATURE REVIEW 
 
 
In this chapter, the basic concepts of clock design in Integrated Circuits (IC), 
particularly SoC, will be discussed. These include the primary design requirements and 
parameters of interest of a clock signal. The different clock schemes available in the 
industry, with their advantages and disadvantages, will also be discussed.  
 
 
2.1 Introduction to Clock in Synchronous System  
In a fully functional sequential circuit, all signals are expected to propagate and update 
all the memory elements within the expected time frame. The synchronous system is 
introduced to achieve that. In a synchronous system, all signal propagation activities 
are overseen by a “clock”, a periodic signal distributed globally to synchronize the 
signal propagation. 
 
The clock signal window synchronizes the fast signals and the slow signals to make sure 
all the logic states are updated at the same time instant. This is vital to ensure the 
intended logic result is correctly computed. In other words, the fastest clock frequency 
of the system depends on its slowest signal.  
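
The dependence of the maximum clock frequency on the slowest signal can be illustrated with a short sketch. The path delays and the `setup_time_s` margin below are made-up numbers chosen only for illustration, and the model deliberately ignores clock skew, jitter and register clock-to-output delay, which are introduced later in this chapter.

```python
def max_clock_frequency_hz(path_delays_s, setup_time_s=0.0):
    """Return the highest clock frequency at which the slowest path still fits.

    path_delays_s: propagation delays (seconds) of the register-to-register paths.
    setup_time_s:  assumed setup margin of the capturing register (illustrative).
    """
    if not path_delays_s:
        raise ValueError("at least one path delay is required")
    # The clock period must be long enough for the slowest signal to arrive.
    min_period_s = max(path_delays_s) + setup_time_s
    return 1.0 / min_period_s


if __name__ == "__main__":
    delays = [0.4e-9, 0.9e-9, 0.6e-9]  # hypothetical path delays
    f_max = max_clock_frequency_hz(delays, setup_time_s=0.1e-9)
    print(f"maximum clock frequency is about {f_max / 1e9:.2f} GHz")
```

Speeding up the two faster paths in this example would not raise the maximum frequency at all; only the 0.9 ns path matters.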
 
The clock signal performance is judged by the clock frequency, uncertainty, and usage 
overhead. All of these factors determine the functionality and performance of a 
synchronous system (Rusu, 2001). The frequency of the clock determines how many times 
the logic states in the design can be updated within a second, so the higher the clock 
frequency, the faster the system can compute. Since the clock directly affects the 
performance of the whole system, it is usually subject to strict timing constraints. If 
the clock signals do not comply with the timing constraints imposed, the memory elements 
could be updated with wrong data and thus lead to malfunction (Rabaey, Anantha & 
Nikolic, 2003). 
  
 
 
2.1.1 Transition Time 
 
 
  
 
 
 
Figure 2.1.1 Illustration of Transition Time 
 
Transition time refers to the period of time a signal takes to change from one logic 
state to another. In this thesis, the transition of interest is mainly that of the clock 
signal. The transition can be low-to-high (0 to 1) or high-to-low (1 to 0). The most 
common way to measure it is to take the time from 20% of VCC to 80% of VCC for the 
rising edge and, similarly, from 80% to 20% for the falling edge (Rusu, 2001). 
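
The 20% to 80% measurement can be sketched as follows for a sampled rising edge. This is an illustrative helper, not a tool used in this work: the function names and the sample waveform are assumptions, and the crossing instants are found by linear interpolation between samples.

```python
def crossing_time(times, volts, level):
    """Return the first time a rising waveform crosses `level` (linear interpolation)."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 < level <= v1:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def rise_time(times, volts, vcc, low=0.2, high=0.8):
    """Return the time taken to rise from low*VCC to high*VCC."""
    return crossing_time(times, volts, high * vcc) - crossing_time(times, volts, low * vcc)


if __name__ == "__main__":
    # Hypothetical rising edge: a linear ramp from 0 V to 1 V over 100 ps.
    times_ps = [0, 25, 50, 75, 100]
    volts = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(f"20%-80% rise time: {rise_time(times_ps, volts, vcc=1.0):.1f} ps")
```

For this ideal ramp the 20% and 80% levels are crossed at about 20 ps and 80 ps, so the measured transition time is about 60 ps. A falling edge would be handled the same way with the comparison reversed.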
 
Transition time is very important to clock signals. It is one of the main factors that 
limit the maximum frequency the clock signal can reach (Wong, 2002). Besides, the 
transition time of the clock signal also determines the total power consumption and 
radiation interference (Rabaey, Anantha & Nikolic, 2003). A slow transition time limits 
the maximum clock frequency; a fast transition time consumes more power for the sake of 
a faster response and could introduce crosstalk to nearby signals, thus affecting signal 
integrity. 
 
 
 
2.1.2 Duty Cycle 
Duty cycle is the ratio of the positive pulse period to the entire clock period. For 
example, if the clock signal spends exactly equal time high and low, the duty cycle will 
be 0.5 or 50%.  
 
$$\text{Duty Cycle} = \frac{T_{+ve}}{\text{Total Period}} \qquad \text{(Eqn 2.1.2)}$$
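
As a small worked example of the duty cycle ratio, the sketch below computes it both from measured times and from uniformly spaced logic samples taken over one clock period. The function names and the sample values are illustrative assumptions only.

```python
def duty_cycle(t_high, total_period):
    """Return the duty cycle as T+ve divided by the total clock period."""
    if total_period <= 0:
        raise ValueError("total period must be positive")
    if not 0 <= t_high <= total_period:
        raise ValueError("T+ve must lie between 0 and the total period")
    return t_high / total_period


def duty_cycle_from_samples(samples):
    """Estimate the duty cycle from uniformly spaced logic samples of one period."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for s in samples if s) / len(samples)


if __name__ == "__main__":
    print(duty_cycle(t_high=0.5e-9, total_period=1.0e-9))  # equal high and low time
    print(duty_cycle_from_samples([0, 0, 1, 1]))           # half of the samples are high
    print(duty_cycle_from_samples([1, 1, 1, 0]))           # distorted clock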
 
 
 
 
 
 
Figure 2.1.2 Illustration of Duty Cycle 
[Figure: clock waveform with logic levels 0 and 1, annotated with T+ve and Total Period] 
 
 
Ideally, the clock signal would be designed to have a 0.5 duty cycle. In reality, this is 
hard to achieve. In CMOS technology, electron mobility in the NMOS is higher than hole 
mobility in the PMOS, which causes the rise time and fall time to differ.  

This distorts the duty cycle. The size ratio of the PMOS to the NMOS can be adjusted to 
reduce this effect, but an ideal duty cycle is still hard to achieve. Furthermore, 
temperature is a variable of the surrounding environment over which the designers have 
no control, and mobility is affected by temperature (Sze, 2002). 
 
Duty cycle is vital, especially to level-sensitive or edge-sensitive elements. 
  
 
 
2.1.3 Clock Skew 
Clock skew is the difference in the arrival time of the clock signal at sequentially 
adjacent registers (Cadence, 2013). It can be caused by differences in clock path length 
and clock loading.  
 
 
 
 
 
 
 
 
 
 
 
Figure 2.1.3 Illustration of Clock Skew (Rusu, 2001) 
  
The clock skew illustrated in Figure 2.1.3 is given by: 
$$T_{skew} = |T_a - T_b| \qquad \text{(Eqn 2.1.3)}$$
 
In theory, the clock skew should be constant from cycle to cycle (Rabaey, Anantha & 
Nikolic, 2003). Consider two clock paths, ClkA and ClkB, as shown in Figure 2.1.3. Both 
clock paths are generated from the same clock source, Clk. In the ideal situation, both 
clock paths have the same delay and arrive at their destinations at the same time. In a 
real design, however, the two paths can pass through very different environments on the 
way to their destinations. 
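
The skew definition can be applied directly to a set of clock arrival times. The sketch below is illustrative only: the arrival times are made-up values in picoseconds, and the third sink `ClkC` is a hypothetical addition used to show how the pairwise definition extends to more than two clock paths.

```python
from itertools import combinations


def clock_skew(t_a, t_b):
    """Return the skew between two clock arrival times, |Ta - Tb|."""
    return abs(t_a - t_b)


def worst_case_skew(arrivals):
    """Return the largest pairwise skew among the given sinks.

    arrivals: dict mapping sink name to clock arrival time (at least two sinks).
    """
    if len(arrivals) < 2:
        raise ValueError("at least two sinks are required")
    return max(clock_skew(a, b) for a, b in combinations(arrivals.values(), 2))


if __name__ == "__main__":
    arrivals_ps = {"ClkA": 120, "ClkB": 135, "ClkC": 128}  # hypothetical values
    print("ClkA to ClkB skew:", clock_skew(arrivals_ps["ClkA"], arrivals_ps["ClkB"]), "ps")
    print("worst-case skew:", worst_case_skew(arrivals_ps), "ps")
```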
 
The factors below are commonly seen to introduce clock skew (Rusu, 2001): 
1. Physical wire length and load mismatch between the clock paths. 
a. The parasitic RC of a wire varies with its length. The loads driven by 
each clock branch also differ. 
2. Process and power supply variations across the die.  
a. Variations in parameters such as oxide thickness (Tox) and effective 
channel length (Leff) result in different channel lengths and device 
threshold voltages (Vt) across the die. 
3. Temperature variations across the die during operation. 
a. Critical regions and highly active regions in an IC tend to generate 
more heat. The hotter a wire becomes, the higher its resistance and 
thus the slower signals travel through it. 
4. Inductive effects from the activity of surrounding active elements. 
a. Neighboring active components have coupling effects that affect 
the clock signals. 
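
To illustrate the first factor, the sketch below estimates how a wire length and load mismatch turns into skew, using the Elmore delay of a distributed RC wire driving a capacitive load. This is a first-order approximation added for illustration, not the extraction and SPICE flow used in this work. The per-micron resistance and capacitance, the wire lengths and the loads are assumed values, and the driver resistance is ignored.

```python
def wire_elmore_delay(length_um, load_f, r_per_um=0.1, c_per_um=0.2e-15):
    """Return the Elmore delay (seconds) of a distributed RC wire driving a load.

    delay = R_wire * (C_wire / 2 + C_load), with R_wire and C_wire proportional
    to the wire length. The per-micron defaults are illustrative assumptions.
    """
    r_wire = r_per_um * length_um  # total wire resistance in ohms
    c_wire = c_per_um * length_um  # total wire capacitance in farads
    return r_wire * (c_wire / 2.0 + load_f)


if __name__ == "__main__":
    # Hypothetical branches: ClkB is longer and more heavily loaded than ClkA.
    t_a = wire_elmore_delay(length_um=1000, load_f=10e-15)
    t_b = wire_elmore_delay(length_um=1500, load_f=20e-15)
    print(f"ClkA delay: {t_a * 1e12:.1f} ps")
    print(f"ClkB delay: {t_b * 1e12:.1f} ps")
    print(f"skew:       {abs(t_a - t_b) * 1e12:.1f} ps")
```

Because the wire term grows with the square of the length, even a moderate length mismatch between two branches produces a noticeable difference in arrival time under these assumptions.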
  
