Case Studies on Clock Gating and Local Routing for VLSI Clock Mesh by Ramakrishnan, Sundararajan
  
 
 
CASE STUDIES ON CLOCK GATING AND LOCAL ROUTING FOR  
VLSI CLOCK MESH 
 
 
A Thesis 
by 
SUNDARARAJAN RAMAKRISHNAN  
 
 
Submitted to the Office of Graduate Studies of 
Texas A&M University 
in partial fulfillment of the requirements for the degree of  
MASTER OF SCIENCE  
 
 
August 2010 
 
 
Major Subject: Computer Engineering 
 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Case Studies on Clock Gating and Local Routing for  
VLSI Clock Mesh 
Copyright 2010 Sundararajan Ramakrishnan  
  
 
 
 
Approved by: 
Chair of Committee,  Jiang Hu 
Committee Members, S. Gwan Choi 
 Anxiao Jiang  
Head of Department, Costas Georghiades 
 
 
ABSTRACT 
 
Case Studies on Clock Gating and Local Routing for  
VLSI Clock Mesh.  
(August 2010) 
Sundararajan Ramakrishnan, B.E., PSG College of Technology, Coimbatore, India 
Chair of Advisory Committee: Dr. Jiang Hu 
 
 The clock is the key synchronizing element in all synchronous digital
systems. The difference in the clock arrival time between sink points is called the clock
skew. This uncertainty in arrival times limits the operating frequency and might cause
functional errors.
Clock routing techniques can be broadly categorized into ‘balanced tree’
and ‘fixed mesh’ methods. The skew and delay of the balanced tree method are higher
than those of the fixed mesh method. Although a fixed mesh inherently uses more wire
length, the redundancy created by loops in the mesh structure reduces undesired delay
variations. The fixed mesh method uses a single mesh over the entire chip, but it is hard
to introduce clock gating in a single clock mesh. This thesis deals with the introduction
of ‘reconfigurability’ by using control structures like transmission gates between
sub-clock meshes, thus enabling clock gating in a clock mesh. By using the optimum
size of the PMOS and NMOS transistors of the transmission gate (SZF) and the optimum
number of transmission gates between sub-clock meshes (NTG) for a 4x4 reconfigurable
mesh, the average of the maximum skew over all benchmarks is reduced by 18.12%
compared to the clock mesh structure in which no transmission gates are used between
the sub-clock meshes (reconfigurable mesh with NTG = 0).
 Further, the research deals with a ‘modified zero skew method’ to connect
synchronous flip-flops or sink points in the circuit to the clock grids of the clock mesh.
Wire length reduction algorithms can be applied to reduce the wire length used for a local
clock distribution network. The modified version of the ‘zero skew method’ of local clock
routing, which is based on Elmore delay balancing, aims at minimizing the wire length for a
given bounded skew of a CDN using clock mesh and H-tree. The results of the ‘modified zero
skew method’ (HC_MZSK) show an average local wire length reduction of 17.75% over all
ISPD benchmarks compared to the direct connection method. In most of the test cases the
maximum skew of HC_MZSK is also smaller than that of other connection methods such as
direct connections and modified AHHK. Thus, HC_MZSK for local routing reduces both the
wire length and the maximum skew.
ACKNOWLEDGEMENTS 
 
First, I would like to thank my advisor, Dr. Jiang Hu, for his continuous support 
and guidance throughout my thesis. He showed me different ways of approaching a
research problem and the need to be persistent to accomplish any goal. He has always
extended his help during different stages of my thesis and I shall always thank him for
that.
I would also like to thank my committee members, Dr. Gwan Choi and
Dr. Anxiao Jiang, for their support throughout the course of the thesis. I also extend my
gratitude and thanks to all the staff of the Electrical and Computer Engineering 
Department for taking care of the administrative issues.  
Besides my advisors and committee members I would like to extend my gratitude 
to my friends for their continuous encouragement and support during the research.  
Finally, I would like to thank my parents for their unconditional support and 
encouragement to pursue my interests.  
NOMENCLATURE 
 
CDN Clock distribution network 
HC_DC Direct connection method from clock sinks to nearest clock grid 
HC_MZSK Modified zero skew method to form local distribution network  
HC_M_AHHK Modified AHHK method to form local distribution network 
HC_DC_EXT Direct connection after elongation for wire length balancing 
TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
NOMENCLATURE
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
2. INTERCONNECT MODELING AND DELAY ANALYSIS
3. IMPORTANCE OF CLOCK GATING
4. CLOCK DISTRIBUTION USING H-TREE
  4.1 Methodology of Clock Routing in H-Tree
  4.2 Clock Routing in H-Tree with Clock Gating
5. CLOCK DISTRIBUTION USING CLOCK MESH AND H-TREE
  5.1 Methodology of Clock Routing using Clock Mesh and H-Tree
  5.2 Clock Distribution Network Using Reconfigurable Clock Mesh and H-Tree
  5.3 Impact of the Size of the Transmission Gate between the Sub-Clock Meshes
  5.4 Analysis by Changing Number of Transmission Gates between the Sub-Clock Meshes
6. WIRELENGTH MINIMIZATION FOR BOUNDED SKEW FOR CLOCK DISTRIBUTION USING CLOCK MESH AND H-TREE
7. RESULTS AND ANALYSIS
8. CONCLUSION
REFERENCES
APPENDIX
VITA
LIST OF FIGURES

FIGURE
 1 Equivalent Π-model representation of unit length wire with resistance (Ra) and capacitance (Ca)
 2 RC ladder for linear network
 3 Clock gating in a simple sequential logic
 4 H-tree with 1 level
 5 H-tree with 2 levels
 6 H-tree routing with wires of Manhattan distance
 7 Clock routing using H-tree of level 4 for ‘02.in’ benchmark without clock gating
 8 Two level H-tree with level 1 clock gating
 9 Two level H-tree with level 2 clock gating
 10 Simple clock mesh of different levels
 11 Clock distribution network using clock mesh and H-tree
 12 Clock routing using clock mesh of level 5 and H-tree of level 4 for ‘02.in’ without clock gating
 13 Clock distribution network with 2x2 reconfigurable clock mesh of level 3 with H-tree of level 2
 14 Clock distribution network with 4x4 reconfigurable clock mesh of level 3 and H-tree of level 2
 15 Variation of the maximum skew with the change of the size of the transmission gate for 2x2 reconfigurable clock mesh for ‘02.in’ case
 16 Variation of the maximum skew with the change of the size of the transmission gate for 2x2 reconfigurable clock mesh for ‘01.in’ case
 17 Variation of the maximum skew with the change of the size of the transmission gate for 4x4 reconfigurable clock mesh for ‘02.in’ case
 18 Variation of the maximum skew with the change of the size of the transmission gate for 4x4 reconfigurable clock mesh for ‘01.in’ case
 19 Variation of the maximum skew with the change of the number of the transmission gate for 2x2 reconfigurable clock mesh for ‘02.in’ case
 20 Variation of the maximum skew with the change of the number of the transmission gate for 2x2 reconfigurable clock mesh for ‘01.in’ case
 21 Variation of the maximum skew with the change of the number of the transmission gate for 4x4 reconfigurable clock mesh for ‘02.in’ case
 22 Variation of the maximum skew with the change of the number of the transmission gate for 4x4 reconfigurable clock mesh for ‘01.in’ case
 23 Tapping point location between two sub-trees
 24 Enlarged view of sub-block (b4, 9) of the clock mesh for ISPD ‘02.in’ benchmark with local distribution connection using HC_DC method
 25 Enlarged view of sub-block (b4, 9) of the clock mesh for ISPD ‘02.in’ benchmark with local distribution connection using HC_MZSK method
 26 Enlarged view of sub-block (b4, 9) of clock mesh for ISPD ‘02.in’ with local distribution connection using HC_M_AHHK method with c=0
 27 Enlarged view of sub-block (b4, 9) of clock mesh for ISPD ‘02.in’ with local distribution connection using HC_M_AHHK method with c=0.3

LIST OF TABLES

TABLE
 1 Comparison of the maximum skew, average power for clock distribution network formed with only H-tree for different benchmarks of ISPD 2010
 2 Comparison of the maximum skew, average power for H-tree with level 1 clock gating
 3 Comparison of the maximum skew, average power for H-tree with level 2 clock gating
 4 Comparison of the maximum skew, average power for clock mesh with H-tree and single mesh without ‘reconfigurability’ for different ISPD benchmarks
 5 Comparison of the maximum skew, average power for clock mesh with H-tree for 2x2 reconfigurable mesh
 6 Comparison of the maximum skew, average power for clock distribution network with H-tree and 4x4 reconfigurable clock mesh
 7 The variation of the maximum skew, average power with the change in the size of the transmission gate for 2x2 reconfigurable clock mesh for ‘02.in’ case
 8 The variation of the maximum skew, average power with the change in the size of the transmission gate for 4x4 reconfigurable clock mesh for ‘02.in’ case
 9 Variation of the maximum skew and other parameters with the change of number of transmission gates for 2x2 reconfigurable clock mesh for ‘02.in’ case
 10 Variation of the maximum skew and other parameters with the change of number of transmission gates for 4x4 reconfigurable clock mesh for ‘02.in’ case
 11 Variation of the maximum skew, wire length and maximum fall time for the ISPD 2010 benchmarks 02.in, 03.in, 08.in.
 12 Variation of the maximum skew, wire length and maximum fall time for the IBM benchmarks r1, r4, r5
 13 Variation of the maximum skew, wire length and maximum fall time for the ISPD benchmark ‘02.in’ by changing the density of clock mesh level and H-tree level
 14 Variation of the skew, average power of the CDN formed using H-tree and single mesh with and without blockages
 15 Summary of comparative results

1. INTRODUCTION 
 
 Clock design is an important step in the overall chip integration
methodology and often a critical step in the fast design process. The clock is the
key synchronizing element in a synchronous digital system. The clock signal is
distributed from an external pad to all the flip-flops and synchronizing elements through
a clock distribution network (CDN). Global clock distribution networks with a low
difference in arrival time between different clock sinks are required in high
performance microprocessors. The difference in the clock arrival time between sink
points is called the clock skew.
The clock skew is the maximum difference in the delay time from the clock
source to the flip-flops. To build high performance systems, the clock skew should
preferably be less than 5% of the critical path delay time, which is a very tight constraint [1].
Transistor delay used to be the main factor affecting the performance of a system,
but in deep submicron technology, interconnect delay makes up a large part of
the overall delay [2]. Several clock routing techniques have been proposed so far,
which can be broadly categorized into the balanced tree method and the fixed mesh method.
One way to realize the balanced tree method is to use an H-tree, which
helps to achieve small skew [3]. Flip-flops of different sizes at different locations
increase the load at particular H-tree leaf nodes and thus increase the clock skew.
This makes it difficult to keep the H-tree symmetric and to achieve
small skew. In [4], Jackson et al. use the Method of Means and Medians (MMM)
to divide the circuit recursively into two subsets and then connect the subsets at their
centers of mass, calculated from the clock pin locations. This method aims at
reducing the clock skew when the flip-flops are not placed symmetrically. However, it
does not take into account the weight factor due to the capacitance
that each leaf node of the balanced tree is driving.

____________
This thesis follows the style of IEEE Transactions on Computer Aided Design of
Integrated Circuits and Systems.

Further, the clock routing can be balanced for small skew between
two sub-clock trees with a bottom-up approach in which the skew is minimized, as in
the Path Delay Balancing Method [5]. Algorithms such as the clustering-based
algorithm reduce the total wire length used for routing while also targeting
smaller skew [6]. The delay and skew of a balanced tree method such as the H-tree
are higher than those of the fixed mesh method. The fixed mesh method
uses a single mesh over the entire chip driven by a large buffer, which reduces the
clock skew. However, it increases the overall wire length used.
A clock routing method can use both the ‘balanced tree method’ and the clock mesh.
The ‘balanced tree method’ is used at the higher level to distribute the clock to the lower
level, which consists of the fixed mesh. This combines the advantages of both methods:
to provide the clock with small skew, a ‘balanced tree method’ like the H-tree is
used at the top level, and the fixed mesh at the lower level
connects to the sink points in the clock distribution network. This work deals with the
analysis of the global clock distribution using the clock mesh network and its effect on
clock skew after the introduction of ‘reconfigurability’ in the clock mesh. The later part
of the work deals with the analysis of methods to connect the synchronous flip-flops
or sink points in the circuit to the clock grids of the clock mesh in the clock distribution
network.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2. INTERCONNECT MODELING AND DELAY ANALYSIS 
 
The wires must be modeled accurately in order to correctly
evaluate the delay, skew and slew rate of the clock signal from the source terminal to
the sink points [7]. The accuracy of the delay calculation depends heavily on the
accurate modeling of the clock distribution network and on the delay analysis model [8].
We have modeled the wires in the clock distribution network with a distributed Π-model, as
shown in Figure 1.
If ti is the insulator thickness, ε is the dielectric permittivity and w is the line width, then the
capacitance per unit length (neglecting sidewall capacitance) is given by [9]

$$C_a = \frac{\varepsilon\, w}{t_i} \tag{1}$$

Similarly, if ρ is the metal resistivity, then the resistance per unit length is given by [9]

$$R_a = \frac{\rho}{w\, t_i} \tag{2}$$
 
 
 
Figure 1. Equivalent Π− model representation of unit length wire with Resistance (Ra) 
and Capacitance (Ca) 
 
The resistance and capacitance per unit length of the wire are taken from
the benchmark files and the technology file being used. For example, the test cases in the
‘ISPD 2009’ and ‘ISPD 2010’ benchmarks use resistance per unit length and
capacitance per unit length values of 0.0001 Ohm/nm and 0.0002 fF/nm respectively.
For the IBM benchmarks, the resistance per unit length and capacitance per unit
length used are 0.003 Ohm/nm and 0.02 fF/nm respectively.
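
As an illustration of how these per-unit-length values translate into a lumped wire model, the following minimal sketch splits a wire of a given length into equal Π segments, each with half of its capacitance at either end of a series resistance. The names `PiSegment` and `pi_segments` and the choice of the number of segments are assumptions made for this example; they are not taken from the scripts used in this work.

```python
from dataclasses import dataclass

# Per-unit-length values quoted above for the ISPD 2009/2010 benchmarks.
R_PER_NM_OHM = 0.0001  # Ohm/nm
C_PER_NM_FF = 0.0002   # fF/nm


@dataclass
class PiSegment:
    """One lumped Pi section: C/2 at each end, R in series between them."""
    r_ohm: float
    c_near_ff: float
    c_far_ff: float


def pi_segments(length_nm, n_segments=1,
                r_per_nm=R_PER_NM_OHM, c_per_nm=C_PER_NM_FF):
    """Split a wire of length_nm into n_segments equal Pi sections.

    Hypothetical helper for illustration only.
    """
    if length_nm < 0 or n_segments < 1:
        raise ValueError("length must be >= 0 and n_segments >= 1")
    seg_len = length_nm / n_segments
    r = r_per_nm * seg_len  # series resistance of one section
    c = c_per_nm * seg_len  # total capacitance of one section
    return [PiSegment(r, c / 2, c / 2) for _ in range(n_segments)]
```

With the ISPD values, a 100000 nm wire split into four sections gives about 2.5 Ohm and 5 fF per section, i.e. 10 Ohm and 20 fF in total.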
The Elmore delay of a linear network is defined as the first moment of the network
impulse response [10] [11]. For the RC ladder shown in Figure 2, with v_out(t) denoting
the impulse response at the output node, the Elmore delay is given by

$$T_{Elmore} = \int_0^{\infty} t\, v_{out}(t)\, dt = R_1 (C_1 + \dots + C_n) + R_2 (C_2 + \dots + C_n) + \dots + R_n C_n \tag{3}$$
 
 
Figure 2. RC ladder for linear network 
 
The Elmore delay is used to estimate the delay and skew
while performing the local clock mesh routing using the modified zero skew routing
method. However, the actual calculation of delay and skew was done by simulating
the SPICE model of the circuit using HSPICE.
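
The closed-form sum in equation (3) can be evaluated in a single pass from the far end of the ladder towards the source, accumulating the downstream capacitance seen by each resistor. The sketch below is only an illustration of that sum; the function name `elmore_delay` is hypothetical, and the lists are assumed to be ordered from the source (R1, C1) to the sink (Rn, Cn).

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the last node of an RC ladder.

    resistances[i] is the series resistor R_(i+1) and capacitances[i] the
    grounded capacitor C_(i+1), both ordered from source to sink.
    With Ohm and fF as units, the result is in femtoseconds.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitor per resistor")
    delay = 0.0
    downstream_c = 0.0
    # Walk from the sink back to the source so that downstream_c always
    # holds C_i + ... + C_n when resistor R_i is visited.
    for r, c in zip(reversed(resistances), reversed(capacitances)):
        downstream_c += c
        delay += r * downstream_c
    return delay
```

For a two-stage ladder with R = [1, 2] and C = [3, 4], the function returns 1·(3 + 4) + 2·4 = 15.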
 
 
3. IMPORTANCE OF CLOCK GATING 
 
The clock network distributes the clock from the source to the sinks. The clock
distribution network consumes a large percentage of the power in modern microprocessors
[12]. Major power consumption in a circuit may be due to combinational logic whose
values change on each clock edge, or to the flip-flops themselves. The latter
is non-zero even if the inputs and the internal state of the
flip-flops are not changing. Further, the clock buffers in the clock
tree of the design also contribute to the power consumption [13].
Power consumption can be reduced by voltage scaling, by reducing the
load capacitance and by reducing the switching activity. Since the first two depend
mostly on technology, the switching activity can be exploited to reduce power at the
logic design stage [14]. Gated clocks reduce power by suppressing the
switching activity of logic in redundant cycles [15] [16]. Functional units that are not
needed at a particular time can be disabled to reduce power [17]. Clock gating has been used to
reduce power by disabling the clock and thereby disabling value changes on unneeded
functional units [18]. Methods like deterministic clock gating (DCG) rely on the observation
that, for many of the stages in the pipelined architectures of modern processors, the usage of a block
in a circuit is deterministically known a few cycles ahead, so that clock gating can
reduce the power to a larger extent [19]. If ‘enable signals’ are generated based on
the switching activity, they can be used to realize clock gating to reduce power. For
example, in a simple sequential circuit with flip-flops and combinational logic,
clock gating can be introduced by using a simple logic ‘AND’ gate whose inputs are the clock
signal and the enable signal. The output of the AND gate is the new gated clock signal.
Figure 3 shows the clock gating for the simple sequential logic with enable signal.
 
 
Figure 3. Clock gating in a simple sequential logic 
 
When the enable signal is made low, the clock signal fed to the flip-flops
of the sequential logic is disabled, which saves power. Similarly, to
implement clock gating for a group of flip-flops or sequential logic, the group can be
associated with a sub-block, and an enable signal can be used to realize
clock gating for that sub-block. Thus clock gating reduces unnecessary value
changes and hence reduces power consumption. The concept can be further extended to
a clock distribution network using H-tree and clock mesh. Global clock gating can be
introduced in the H-tree by using ‘AND’ gate logic at the leaf nodes of the H-tree, whose
inputs are the clock signal and the corresponding enable signal. Clock gating in a
simple clock distribution network using H-tree turns off the flip-flops or clock sinks
connected to a particular node of the H-tree. A clock mesh is better at reducing
the skew and delay, but introducing clock gating in a single mesh is difficult. The
following sections deal with the introduction of clock gating in a clock distribution
network using H-tree and clock mesh in detail.
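
The effect of AND-based gating on switching activity can be illustrated with a small behavioural model. This is a sketch on sampled 0/1 waveforms, not the HSPICE setup used in this work; `gated_clock` and `count_transitions` are names chosen for this example. It assumes the enable signal only changes while the clock is low, since a plain AND gate can otherwise produce a truncated pulse.

```python
def gated_clock(clk, enable):
    """AND each clock sample with the enable sample at the same time step."""
    return [c & e for c, e in zip(clk, enable)]


def count_transitions(signal):
    """Number of 0->1 and 1->0 changes, a rough proxy for switching activity."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a != b)


# Eight clock periods sampled twice per period (low half, then high half).
clk = [0, 1] * 8
# The sub-block is needed for the first four periods only.
enable = [1] * 8 + [0] * 8
gclk = gated_clock(clk, enable)
```

In this example the free-running clock makes 15 transitions, whereas the gated clock makes 8, because the gated sinks see no edges once the enable signal is low.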
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4. CLOCK DISTRIBUTION USING H-TREE 
 
Minimization of clock skew is the main concern in the clock distribution
network. H-tree structures have been used extensively for clock routing in regular
systolic arrays. A single level consists of two vertical columns and one horizontal row
forming an H-shaped structure, as shown in Figure 4. A k-level H-tree is effective in
achieving a balanced structure, which helps to reduce the clock skew if the clock sinks
are distributed on a 2^k x 2^k array [20]. The H-tree reduces the wire length compared to the
clock mesh. It also helps to achieve a balanced structure at the top level, unlike the
rectilinear Steiner tree structure.
 
 
Figure 4. H-tree with 1 level 
 
In order to achieve smaller skew, a multilevel H-tree can be used to provide the
source signals to the local distribution network or the clock sinks [21]. With a single
level H-tree, the clock signal from the source is distributed to the four quarters of the
chip. Similarly, 2^n x 2^n sub-blocks of the chip can be provided with the clock using an
n-level H-tree. With an increase in the number of levels, the wire length required for the local
distribution network is reduced, which helps to reduce the skew. The H-tree with
2 levels is shown in Figure 5.
 
 
Figure 5. H-tree with 2 levels 
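
The leaf positions of an n-level H-tree can be sketched under the assumption that each level splits its region into four equal quadrants and ends at the centre of each quadrant, so that the 4^n leaves sit at the centres of the 2^n x 2^n sub-blocks. The function name `h_tree_leaves` and the rectangular chip outline are assumptions made for this illustration.

```python
def h_tree_leaves(x_min, y_min, x_max, y_max, levels):
    """Leaf coordinates of an idealized H-tree over a rectangular chip.

    Assumes a perfectly symmetric tree: the leaves lie at the centres of
    the 2^levels x 2^levels equal sub-blocks, listed row by row.
    """
    if levels < 1:
        raise ValueError("levels must be >= 1")
    n = 2 ** levels
    dx = (x_max - x_min) / n  # sub-block width
    dy = (y_max - y_min) / n  # sub-block height
    return [(x_min + (i + 0.5) * dx, y_min + (j + 0.5) * dy)
            for j in range(n) for i in range(n)]
```

For a 4 x 4 chip and one level, the four leaves are (1, 1), (3, 1), (1, 3) and (3, 3); with four levels there are 256 leaves.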
 
4.1. METHODOLOGY OF CLOCK ROUTING IN H-TREE  
The H-tree structure as described can contain many levels, and the leaf nodes of
the final level are used as source points for connecting the clock sinks to the H-tree.
Normally the leaf nodes of the final level H-tree are connected individually to a chain
of buffers whose outputs are then connected to the clock sinks or the local distribution
network. In the rest of the description, a leaf node of the H-tree refers to an H-tree
leaf node which has buffers connected to it. With n levels of H-tree we have
2^n x 2^n sub-blocks. Each clock sink in the chip is connected to the leaf node of the H-tree
which is nearest to it. The Manhattan distance is used to find the nearest leaf node of
the H-tree for a particular clock sink. The Manhattan distance is the distance between two
points measured along axes at right angles. In a plane with p1 at (x1, y1) and p2
at (x2, y2), the Manhattan distance is |x1 - x2| + |y1 - y2|. The length or cost of an edge
in the tree is the Manhattan distance between the two endpoints of the edge, and the total
tree cost or length is the sum of all edge costs or lengths in the tree. The key idea behind
connecting clock sinks to the H-tree is to find, among the 2^n x 2^n
sub-blocks of the H-tree, the leaf node which is nearest to the clock sink. The connection
is made with a wire of that Manhattan length. For instance, consider a sample circuit with
seven sink nodes. The clock routing using H-tree for this circuit would result in the clock
distribution network shown in Figure 6.
 
 
Figure 6. H-tree routing with wires of Manhattan distance  
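
The nearest-leaf assignment described above can be sketched as follows. This is a minimal illustration with a brute-force search over all leaf nodes; the names `manhattan` and `connect_sinks` are hypothetical, and ties are broken in favour of the leaf that appears first in the list.

```python
def manhattan(p, q):
    """Manhattan distance |x1 - x2| + |y1 - y2| between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def connect_sinks(sinks, leaves):
    """Connect each sink to its nearest H-tree leaf node.

    Returns (assignment, total_length), where assignment[k] is the index of
    the leaf chosen for sinks[k] and total_length is the sum of all local
    edge lengths, i.e. the local wire length.
    """
    assignment = []
    total_length = 0
    for sink in sinks:
        best = min(range(len(leaves)),
                   key=lambda k: manhattan(sink, leaves[k]))
        assignment.append(best)
        total_length += manhattan(sink, leaves[best])
    return assignment, total_length
```

For leaves at (1, 1), (3, 1), (1, 3) and (3, 3) and sinks at (0, 0), (4, 1) and (2.5, 3.5), the sinks are assigned to leaves 0, 1 and 3, and the total local wire length is 2 + 1 + 1 = 4.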
In Figure 6, sink 1 with coordinates (x1, y1) is connected to the H-tree node
‘H_node_12’ with coordinates (x2, y2), since ‘H_node_12’ is the leaf node
nearest to it. The corresponding Manhattan distance is |x1 - x2| + |y1 - y2|.
Similarly, the local routing is done for the other sinks by connecting them to the nearest
H-tree node. For example, in the ‘ISPD 2010’ test cases the benchmark ‘02.in’ has 2249 sink
points. For each sink point, the nearest leaf node of the H-tree is
determined, and an HSPICE file is created representing the entire clock routing using the
H-tree alone. A later section deals with clock gating, which can switch particular
sub-blocks of the chip on or off using the H-tree method. Figure 7 shows the clock routing
using the H-tree alone, without clock gating of any sub-blocks, for the ISPD 2010 benchmark ‘02.in’.
In Figure 7 the symbol ‘+’ represents the connection points of all the leaf nodes
of the final level H-tree with local connections from the clock sinks. The clock sinks are
represented by the symbol ‘o’. Here we have used 4 levels of H-tree to make the
connection from the clock sinks to the leaf nodes of the final level H-tree. Although the
H-tree was used to form the SPICE file of the CDN, it is not shown graphically in
Figure 7. Also, the connections from clock sinks to H-tree leaf nodes are drawn as straight
lines here, but the Manhattan distance is used as the wire
length when generating the SPICE netlist file. The equivalent Π-model of the wire is used
for the connection from each clock sink to the nearest
leaf node of the final level H-tree.
The wire length used to connect clock sinks to the H-tree changes if the H-tree level is
changed. This affects the slew rate, the maximum skew and the average power consumed for the
particular configuration of clock routing. The size of the buffers used at the end
of the H-tree also has a large impact on these parameters. In Figure 7, the continuous
white spaces without clock sinks indicate blockages.
 
 
Figure 7. Clock routing using H-tree of level 4 for ‘02.in’ benchmark without clock 
gating 
 
Simulations have been performed for all the ‘ISPD 2010’ benchmarks after
forming the clock distribution network (CDN) using an H-tree of level 4. The clock
frequency used for simulation of the ISPD benchmarks is 1 GHz with a 50% duty cycle. The
45nm technology file from the Predictive Technology Model (PTM) is used to form buffers
within the clock distribution network for each of the ‘ISPD’ benchmarks. The equivalent
Π-model of the wire, with resistance per unit length and capacitance per unit length
of 0.0001 Ohm/nm and 0.0002 fF/nm respectively, was used to model wires of
specific length for the H-tree and local routing in the CDN for the benchmarks. Table 1
shows the variation of the maximum skew, maximum fall time, average power and wire
length for different ISPD benchmarks for the CDN using only the H-tree.
 
Table 1. Comparison of the maximum skew, average power for clock distribution 
network formed with only H-tree for different benchmarks of ISPD 2010. 
 
 
Clock distribution network with only H-tree, without clock gating

Benchmark  Number of  Max skew  Max fall   Average     Wire length  Max latency
           sinks      (ps)      time (ps)  power (mW)  (cm)         (ns)
01.in      1107       36.1      92.49      343.56      31.5         1.35
02.in      2249       52        138.24     483.92      55.97        2.42
03.in      1200       19.27     44.4       214.09      6.47         0.68
04.in      1845       11.87     27.19      219.46      12.74        0.69
05.in      1016       15.77     35.63      207.44      9.58         0.69
06.in      981        31.07     77.4       161.77      4.92         0.44
07.in      1915       25.12     59.66      221.06      9.87         0.69
08.in      1134       25.58     62.66      167.79      6.92         0.44
 
 
4.2. CLOCK ROUTING IN H-TREE WITH CLOCK GATING 
The clock distribution system accounts for a large share (20% to
50%) of the power consumed by a system [12]. Thus, in low power synchronous systems this
power must be reduced subject to constraints such as keeping the
clock skew and clock slew rate within the required bounds. In global clock
distribution using the H-tree, clock gating can be introduced at different levels of the
H-tree. Additional control signals and gates are required to perform clock gating, which
implies that there is a tradeoff between the amount of clock tree gating and the total
power consumption of the clock tree [12]. Activity-driven clock trees have been
previously analyzed for performing clock gating [12]. In our implementation, the
clock distribution network from the H-tree of Figure 6 can be converted to a structure with
level 1 clock gating by introducing the control signals enb1, enb2, enb3, enb4 with
control logic for the four quarters of the chip. The control logic used here is a logic AND
gate with enable signals. The 45nm technology file is used for constructing the control logic
and buffers for the ‘ISPD’ benchmarks. Figure 8 shows an H-tree of two levels with level 1
clock gating.
With level n clock gating there are 2^n x 2^n sub-blocks which can be clock gated 
in an H-tree with n or more levels. For example, in Figure 8 we have two levels of 
H-tree and one level of clock gating. Hence the four quadrants of the chip, controlled 
by the control signals enb1, enb2, enb3, and enb4, can be clock gated. If enb1 is made 
low then the signals ‘Clk out 21’, ‘Clk out 22’, ‘Clk out 23’ and ‘Clk out 24’, as 
well as the ‘Clk out 2’ signals, are made low. All lower level H-trees (if more than 
two H-tree levels are considered) and the clock sinks connected to the above terminals 
or to those lower level H-trees are also switched off. Thus clock gating switches off 
the particular sub-block controlled by the corresponding enable signal.  
The granularity at which particular nodes can be switched off is determined by the 
number of levels of clock gating. In Figure 8 four sub-blocks can be clock gated 
individually. The granularity can be made finer by increasing the number of levels of 
clock gating. Figure 8 shows an H-tree with one level of clock gating, so there are 
2x2 sub-blocks to clock gate (2^n x 2^n sub-blocks for n levels of clock gating). It 
can be referred to as H-tree with 2x2 clock gating.  
The results of simulations performed for the test case ‘02.in’ of the ‘ISPD 2010’ 
benchmark are shown in Table 2. In total, 4 levels of H-tree are used with level 1 
clock gating (2x2 clock gating). 
 
 
Figure 8. Two level H-tree with level 1 clock gating 
From the results it can be seen that the maximum skew is in the range of 50ps, 
while the power decreases appreciably as more sub-blocks are clock gated (switched 
off). Power is still dissipated when none of the sub-blocks are ON, because of the 
buffers and the wire used for the first level of the H-tree. The variation in the 
maximum skew is small as the number of sub-blocks which are switched ON changes. 
Also, using enable signals there is greater flexibility in controlling the number of 
sub-blocks which are turned off. In order to get more flexibility/granularity in 
clock gating, higher levels of clock gating can be used. 
 
Table 2. Comparison of the maximum skew, average power for H-tree with level 1 clock 
gating 
 
Benchmark: 02.in
Clock distribution network: only H-tree with 2x2 clock gating (level = 1)

Number of      Max skew  Max fall   Average     Wire length  Max latency
sub-blocks ON  (ps)      time (ps)  power (mW)  (cm)         (ns)
0              -         -          32.31       55.97        -
1              49.7      138.24     144.43      55.97        2.42
2              49.7      138.24     258.68      55.97        2.42
3              51.1      138.25     373.75      55.97        2.42
4              52        138.24     483.92      55.97        2.42
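The power trend in Table 2 can be checked with a few lines of arithmetic. The sketch below uses only the average power column of Table 2; the helper names `incremental_power` and `saving_percent` are chosen for this illustration.

```python
# Average power (mW) from Table 2, indexed by the number of sub-blocks ON.
power_mw = [32.31, 144.43, 258.68, 373.75, 483.92]


def incremental_power(power):
    """Power added by each additional sub-block that is switched ON."""
    return [round(b - a, 2) for a, b in zip(power, power[1:])]


def saving_percent(power, blocks_on):
    """Percentage of power saved relative to all sub-blocks ON."""
    full = power[-1]
    return 100.0 * (full - power[blocks_on]) / full


steps = incremental_power(power_mw)
print("mW added per sub-block:", steps)
print(f"saving with 2 of 4 sub-blocks ON: {saving_percent(power_mw, 2):.1f}%")
```

Each quadrant adds a similar amount of power (roughly 110 to 115 mW), while the entry for 0 sub-blocks ON is the floor set by the first level of the H-tree.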
 
 
The H-tree with two levels of clock gating is shown in Figure 9. For the two level 
H-tree with two levels of clock gating, the clock gating can be done with finer 
granularity since the number of sub-blocks which can be clock gated increases. Also, 
the size of the sub-block which can be clock gated decreases for a given benchmark 
with the same level of H-tree.   
In Figure 9, if the ‘enb 22’ signal is made low then the local distribution network 
connected to ‘Clk Out 22’ is switched off. Since there are more enable signals to 
control clock gating, the granularity of the clock gating is finer. The level 2 clock 
gating shown in Figure 9 can be referred to as H-tree with 4x4 clock gating. 
Simulations have been performed for the test case ‘02.in’ of the ‘ISPD 2010’ 
benchmark. The results are shown in Table 3.  
 
 
Figure 9. Two level H-tree with level 2 clock gating 
Table 3. Comparison of the maximum skew, average power for H-tree with level 2 clock 
gating 
 
Benchmark: 02.in
Clock distribution network: only H-tree with 4x4 clock gating (level = 2)

Number of      Max skew  Max fall   Average     Wire length  Max latency
sub-blocks ON  (ps)      time (ps)  power (mW)  (cm)         (ns)
0              -         -          80.41       55.97        -
1              48.4      138.24     107.72      55.97        2.42
2              50.2      138.24     128.38      55.97        2.42
3              50.2      138.24     153.38      55.97        2.42
4              50.3      138.24     174.19      55.97        2.42
5              50.3      138.24     199.55      55.97        2.42
6              50.3      138.24     224.53      55.97        2.42
7              50.3      138.24     249.79      55.97        2.42
8              52        138.24     279.63      55.97        2.42
9              38.8      99.58      302.18      55.97        2.4
10             38.8      99.58      329.1       55.97        2.4
11             38.8      99.58      354.28      55.97        2.4
12             52        138.24     379.85      55.97        2.42
13             52        138.24     406.23      55.97        2.42
14             52        138.25     431.25      55.97        2.42
15             52        138.24     451.49      55.97        2.42
16             52        138.24     477         55.97        2.42
 
 
Since there are 16 sub-blocks, Table 3 shows the variation of the maximum skew, 
the maximum fall time and average power with the number of sub-blocks which are 
switched ON in the H-tree. The maximum skew for 9, 10 and 11 sub-blocks ON is less 
than the skew for other values, since the clock sink corresponding to the maximum 
skew is in a sub-block which is switched off for these configurations. The maximum 
fall time also varies for these cases because of the uneven loading due to the clock 
sinks at the leaf nodes of the H-tree for the present configuration. 
For 9, 10 and 11 sub-blocks ON, the maximum fall time is lower since the H-tree leaf 
node with maximum loading is switched off. Thus the maximum fall time for these 
configurations does not include the clock sinks which would otherwise have increased 
the maximum clock fall time. Those clock sinks have been disabled by clock gating. 
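The dip in maximum skew for some configurations follows from how skew is defined: only the sinks that are still clocked contribute. A minimal sketch of that reasoning is given below. The arrival times are hypothetical values chosen for illustration and are not taken from the benchmark; `max_skew` is an illustrative helper name.

```python
def max_skew(arrival_ps, active):
    """Skew = latest minus earliest clock arrival among the active sinks."""
    times = [t for sink, t in arrival_ps.items() if sink in active]
    if not times:
        return None  # no sink is clocked, so skew is undefined
    return max(times) - min(times)


# Hypothetical arrival times (ps); sink "s3" is the slowest one.
arrival_ps = {"s1": 2400.0, "s2": 2410.0, "s3": 2452.0, "s4": 2420.0}

all_on = max_skew(arrival_ps, {"s1", "s2", "s3", "s4"})
s3_gated = max_skew(arrival_ps, {"s1", "s2", "s4"})
print(all_on, s3_gated)
```

Gating the sub-block that contains the worst-case sink removes it from the maximum, so the reported skew drops even though no wire or buffer has changed.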
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5. CLOCK DISTRIBUTION USING CLOCK MESH AND H-TREE 
 
Although the H-tree structure helps to reduce clock skew, it is efficient and 
applicable only when the clock sinks or synchronizing components are identical in size 
and are placed in a symmetric array. The clock mesh consists of horizontal and 
vertical clock grids to which either a local distribution network or the clock sinks 
are directly connected. The Manhattan distance can be used to connect the local clock 
distribution network or clock sinks to the closest clock grid. A row clock grid runs 
from left to right and a column clock grid runs from bottom to top of the mesh. 
A ‘mesh grid node’ is the point where a column is connected to a row [22]. The top, 
bottom, left and right clock grids along with the four mesh grid nodes at the corners 
form a sub-block/sub-grid of the clock mesh. Clock sinks found within the four ‘mesh 
grid nodes’ belong to that particular sub-grid/sub-block. A local distribution network 
can be formed within the sub-grids/sub-blocks of a clock mesh. For example, if a clock 
mesh has 17 rows and 17 columns then the total number of sub-blocks is 16x16. Since 
an H-tree is used to distribute the clock from the source to the clock grid, we can 
assume the clock mesh to consist of an equal number of rows and columns. 
The non-tree clock network has been recognized as a good approach to minimize 
the problems due to variation [23]. A mesh structure is better than a tree structure at 
coping with process variations, since the mesh has more local connections and the 
regenerated signals coming from different local connections help to smooth out the local 
delay variations and yield a smaller clock skew [21]. Thus, clock mesh-based structures 
like clock mesh, clock spines and cross links are more effective in reducing the 
variations in clock skew due to environmental variation effects [24]. Commercial 
processors such as the first generation Itanium [25], IBM Power 4 [26], Sun 
Microsystems Dual Core Sparc V9 [27] and IBM Power 6 [28] use a clock mesh to reduce 
clock skew due to variations and design mismatch. 
Microprocessor designs in many cases dissipate a large fraction of the chip power in 
the global clock distribution network [29] [12]. Also, interconnects have not scaled 
as well as devices, hence it is necessary for microprocessors to locally buffer the 
clock signals to maintain the required slew rate. In order to reduce the power 
dissipated by the clock distribution network, efficient clock gating techniques are 
needed to turn off the local clock distribution networks when they are not being 
used. Most of the power dissipation is due to the gate capacitances of latches or 
clock sinks and the capacitances of the wire used for the local clock distribution. 
However, care must be taken to make sure that the clock skew is not increased by a 
large amount after reducing the power using clock gating techniques.   
Most clock distribution network design techniques aim at reducing the clock skew, 
slew rate, and jitter. Undesired variations in the cycle time are called jitter. 
Jitter reduces performance because some cycles will be shorter than nominal, so the 
chip frequency must be lowered to avoid long path errors [29]. Most clock 
distribution networks are buffered at many levels. If these buffers use relatively 
noisy Vdd and Gnd then much of the clock jitter will come from buffer delay 
variations in the clock distribution network due to power supply noise [30]. In our 
analysis the clock signal is considered to be jitter free. 
The normal clock distribution technique to get small skew consists of using a 
clock distribution network with many levels of H-tree connected to a single clock mesh 
[31]. The clock network consists of buffered trees with the final level of trees 
driving a single common grid covering most of the chip. This combines the advantages 
of both the tree structure and the clock mesh. A clock tree has advantages like low 
power, low latency and minimal usage of wiring tracks. Without the grid, trees must 
often be rerouted whenever the locations of clock pins change or when the load 
capacitances change significantly [31]. Since the clock mesh provides a regular 
structure, routing from a clock sink to the nearest grid remains easy even if the 
position of the clock sink changes. Local routing for clock sinks is therefore simpler 
with the clock mesh. Also, when a single mesh is used, the regenerated clock signals 
from the different parts of the mesh help to improve the clock slew rate and the 
maximum clock skew.  
Thus, in its simplest form the clock distribution network consists of a symmetric 
H-tree driving a single mesh and a local distribution network for the clock sinks 
within each sub-block of the clock mesh. The grid reduces local skew by connecting 
nearby points directly. The tree wires are then tuned to minimize skew over longer 
distances [31]. The size of the buffers used at the leaf nodes of the H-tree driving 
the clock mesh also affects the global skew, local skew and clock slew rate. In our 
work the buffer sizes have been varied to find the optimum value which gives minimum 
slew rate and clock skew for all the benchmarks.  
5.1. METHODOLOGY OF CLOCK ROUTING USING CLOCK MESH AND H-TREE 
 A clock distribution network using a clock mesh has inherent routing redundancies 
which lead to improvement in clock skew and reliability [32]. The variation tolerance 
of a leaf-level mesh is a direct result of its high redundancy, with multiple 
source-to-sink paths for every sink [31]. Thus the clock mesh with local distribution 
network forms the lower level of the clock distribution network, while the upper 
H-tree distributes the signal from the clock input source to the clock mesh.  
For forming the clock mesh and H-tree, the equivalent Π-model of a wire of specific 
length with resistance per unit length (Ra) and capacitance per unit length (Ca), as 
shown in Figure 1, can be used for Spice simulation. The basic structure of the clock 
mesh consists of horizontal rows and vertical columns of clock grids which are 
connected at the ‘mesh grid nodes’. The clock mesh is characterized by its number of 
levels. The clock mesh is said to be of level n if it has 2^n x 2^n sub-blocks/sub-
grids, with (2^n + 1) rows of clock grids and (2^n + 1) columns of clock grids in the 
clock mesh.  
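The level-based definition above can be written as a small helper. This is a sketch for checking the counts only; the function name `mesh_dimensions` and the dictionary keys are chosen for this example.

```python
def mesh_dimensions(level):
    """Counts for a clock mesh of the given level.

    A level-n mesh has 2^n x 2^n sub-blocks bounded by (2^n + 1) row
    grids and (2^n + 1) column grids; every row/column crossing is a
    mesh grid node.
    """
    blocks_per_side = 2 ** level
    lines_per_side = blocks_per_side + 1
    return {
        "sub_blocks": blocks_per_side * blocks_per_side,
        "rows": lines_per_side,
        "columns": lines_per_side,
        "grid_nodes": lines_per_side * lines_per_side,
    }


for n in (1, 2, 3, 5):
    print(n, mesh_dimensions(n))
```

For level 3 this gives 9 rows, 9 columns and 64 (8x8) sub-blocks, and for level 5 it gives 1024 (32x32) sub-blocks.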
For example, clock mesh structures with level 1, level 2 and level 3 are shown 
in Figure 10. A clock mesh of level 1 has 4 sub-blocks (2x2) while a clock mesh of 
level 3 has 2^3 x 2^3 (8x8) sub-blocks. With more levels in the clock mesh, the local 
distribution network required for the clock sinks within the sub-blocks becomes 
smaller, which helps to improve the clock slew rate and hence the clock skew. In order 
to ensure a more uniform distribution of clock signals to the clock grids, the 
connections from the H-tree to the clock mesh are made at ‘mesh grid nodes’ in a 
uniform fashion.  
         
Level =3       level=2        level=1 
Figure 10. Simple clock mesh of different levels 
 
If the clock grid is of level 3 then there are 9 rows and 9 columns of clock grids, 
which are connected at 81 nodes called mesh grid nodes. This structure consists of 
8x8 sub-blocks. The clock sinks within the sub-blocks can be connected to the nearest 
clock grids, which forms the local distribution networks. The clock distribution 
network considered has a regular single clock mesh and a top level H-tree. If the 
H-tree considered is of level n then the clock mesh is considered to be of level 
(n+1), to make uniform connections from the H-tree to the clock mesh; this also helps 
to make the structure reconfigurable when introducing clock gating for the sub-blocks 
in the clock mesh. Each clock sink receives the signal from the nearby lowest level 
H-tree leaf nodes and also the regenerated signals from the other leaf nodes of the 
H-tree. Hence, this structure creates high redundancy, with various paths from the 
source to every sink [22]. Consider Figure 11, with a clock grid of 8x8 sub-blocks 
(level 3) and an H-tree of level 2. The H-tree leaf nodes are connected to the clock 
mesh at some of the ‘mesh grid nodes’, indicated by the black dots/marks in Figure 11. 
The sample circuit is considered to have 4 clock sinks. The local connection for the 
clock sinks within sub-blocks is made using a direct connection from the clock grid 
to the clock sink.  
 
 
Figure 11. Clock distribution network using clock mesh and H-tree 
 
The entire clock distribution network, including the H-tree, clock mesh, and local 
clock sink connections, is formed in Matlab, which generates an HSPICE compatible 
file. The HSPICE file represents the entire clock distribution network along with the 
clock sinks. Simulations were performed to calculate the maximum clock skew, the 
maximum rise/fall time and latency for a given benchmark. The method designed to 
form the clock mesh for the ‘ISPD 2010’ and ‘ISPD 2009’ benchmarks determines the 
optimum number of H-tree and clock mesh levels required to achieve the required clock 
skew for that particular benchmark. Attention is also given to the minimum number of 
clock sinks which must be present within a particular sub-block, so that the variation 
in the load distribution among the sub-blocks is small.  
 
 
Figure 12. Clock routing using clock mesh of level 5 and H-tree of level 4 for ‘02.in’ 
without clock gating 
 
In the ‘ISPD 2010’ test cases the benchmark ‘02.in’ has 2249 sink points. For each 
sink point the nearest clock grid is found along with its distance, and a connection 
is made from that clock sink to the grid. The routed clock network comprises the 
H-tree, the clock mesh and the local distribution network from the clock sinks to the 
clock grid. Figure 12 shows the clock distribution network for the test case ‘02.in’ 
in the ISPD 2010 benchmark. Although the H-tree was used to form the HSPICE netlist, 
it is not shown graphically in Figure 12. 
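The nearest-grid search described above can be sketched as follows. This is not the Matlab code used in the thesis. It assumes a square die of side `side` with the origin at one corner and uniformly spaced grid lines, and `nearest_grid` is a hypothetical helper name.

```python
def nearest_grid(x, y, side, level):
    """Return (orientation, line_index, distance) of the closest grid wire.

    Assumptions: square die spanning [0, side] on both axes and a mesh
    of the given level with uniformly spaced row and column wires.
    """
    n = 2 ** level
    pitch = side / n
    # Index of the closest row wire (horizontal, at y = index * pitch).
    row_idx = min(max(round(y / pitch), 0), n)
    # Index of the closest column wire (vertical, at x = index * pitch).
    col_idx = min(max(round(x / pitch), 0), n)
    d_row = abs(y - row_idx * pitch)
    d_col = abs(x - col_idx * pitch)
    if d_row <= d_col:
        return ("row", row_idx, d_row)
    return ("column", col_idx, d_col)


# A sink at (130, 260) on an 800 x 800 die with a level 3 mesh (pitch 100).
print(nearest_grid(130, 260, 800, 3))
```

The returned distance is the length of the straight stub from the sink to the chosen wire, which is the quantity that would be modelled as a wire segment in the netlist.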
In Figure 12 the symbol ‘+’ represents the connection points of the leaf nodes of 
the final level H-tree to the clock grid, and the local connections from the clock 
sinks to the nearest clock grid are made using direct connections. Clock sinks are 
represented by the symbol ‘o’. Here we have used up to 4 levels of H-tree and a clock 
mesh with 5 levels. Thus there are 32x32 sub-blocks in the clock mesh structure. This 
is a simple clock mesh without reconfigurability. Clock gating can be done in these 
structures by using a reconfigurable clock mesh as described in the following 
sections.  
With the H-tree and single clock mesh forming the base structure of the clock 
distribution network it is not possible to clock gate a single sub-block, since the 
regenerated signals from the neighboring clock grids will still reach the clock 
sinks: a sub-block which has been clock gated is not disconnected from the wires of 
the neighboring clock grids. Therefore, ‘reconfigurability’ must be introduced in the 
clock mesh to enable clock gating while retaining the advantages of both the H-tree 
and the clock mesh. Table 4 shows the variation of the maximum skew, the maximum fall 
time, average power, wire length and the maximum latency for different ISPD 
benchmarks for the clock distribution network using single clock mesh and H-tree. 
Table 4. Comparison of the maximum skew, average power for clock mesh with H-tree 
and single mesh without ‘reconfigurability’ for different ISPD benchmarks 
 
 
Clock distribution network with single mesh and H-tree combined

Benchmark  Max skew  Max fall   Average     Wire length  Max latency
           (ps)      time (ps)  power (mW)  (cm)         (ns)
01.in      11        45.31      424.29      75.29        1.33
02.in      16.9      66.26      542.04      96.84        2.39
03.in      2.52      17.75      227.25      15.24        0.67
04.in      3.93      20.77      238.58      23.86        0.68
05.in      2.85      18.04      229.54      22.92        0.68
06.in      4.11      17.02      174.75      12.75        0.41
07.in      5.75      21.37      235.03      18.91        0.68
08.in      5.35      18.99      183.23      16.27        0.42
 
 
5.2. CLOCK DISTRIBUTION NETWORK USING RECONFIGURABLE CLOCK 
MESH AND H-TREE  
The mesh structure helps to achieve smaller skew compared to clock routing 
using H-tree only. Hence it is used at the lowest level of the clock tree. In the 
clock mesh, when a signal is delayed in relation to the others, it is compensated by 
redundant paths [33]. In low power applications that use an H-tree only structure, 
parts of the clock tree can be turned off by clock gating part of the clock 
distribution network. In a clock distribution network with a single clock mesh and 
multilevel H-tree, clock gating is problematic because the redundancy created by the 
loops causes signals from other parts of the mesh to reach the sub-block which has 
been switched off by clock gating.  
Thus, the structure of the single clock mesh needs to be changed so that it exploits 
the redundancy of the signals in the mesh structure and also isolates and turns off 
portions of the mesh when the clock gating is enabled (switched off) for that 
sub-block/sub-grid of the mesh. Similar to the single clock mesh, the reconfigurable 
clock mesh is supplied with the clock signal from the top level H-tree. The main idea 
behind ‘reconfigurability’ is to introduce control structures that establish the 
connection with neighboring sub-blocks of the clock mesh when the enable signal of 
that sub-block is high (the sub-block is clocked), and that isolate and switch off 
that particular clock grid from the other clock grids when its enable signal is low 
(the sub-block is clock gated). 
A PMOS or an NMOS transistor can be used as the control unit. However, the 
transmission gate has an advantage over using either an NMOS or a PMOS alone. A 
transmission gate consists of an NMOS transistor and a PMOS transistor in parallel, 
with gates controlled by complementary signals. When the transmission gate is ON, 
at least one of the two transistors is ON for any output voltage and hence it conducts 
both ‘0’ and ‘1’ well. If the transmission gate is used as the controlling structure 
between two sub-blocks then it acts as a voltage controlled resistor connecting the 
input and the output.  
The clock mesh of level 3 has 8x8 sub-blocks. With 2x2 ‘reconfigurability’, we have 
4 ‘sub-clock meshes’ separated and controlled by transmission gates. The resulting 
clock distribution network is shown in Figure 13. An H-tree of ‘level 2’ is used to 
distribute the clock signal to the clock mesh. Since the H-tree has clock gating of 
level 1, a 2x2 reconfigurable mesh is formed with the control signals enb1, enb2, 
enb3, enb4. The sub-clock meshes in the four quadrants of the chip area are connected 
by control units (transmission gates) with control signal pairs like 
‘and_enb1_enb2’ and ‘nand_enb1_enb2’, as shown in Figure 13. Extra control logic is 
needed to generate these control signals. A signal pair like ‘and_enb1_enb2’ and 
‘nand_enb1_enb2’ is generated by using logic ‘AND’ and ‘NAND’ gates respectively 
with control inputs ‘enb1’ and ‘enb2’. Similarly, the other signals 
‘and_enb2_enb3’, ‘nand_enb2_enb3’, ‘and_enb3_enb4’, ‘nand_enb3_enb4’, 
‘and_enb4_enb1’, and ‘nand_enb4_enb1’ can also be generated.  
 
 
Figure 13. Clock distribution network with 2x2 Reconfigurable clock mesh of level 3 
with H-tree of level 2 
 
For example, when the sub-block controlled by the ‘enb1’ signal is turned off, the 
control signals corresponding to ‘enb1’, namely ‘and_enb1_enb2’, ‘nand_enb1_enb2’, 
‘and_enb4_enb1’ and ‘nand_enb4_enb1’, will have logic values of 0, 1, 0, 1 
respectively. This switches off the corresponding transmission gates. Thus the 
corresponding ‘sub-clock mesh’ is completely isolated from the other sub-clock meshes. 
When the same sub-clock mesh needs to be turned on, the ‘enb1’ signal is made 
logic 1. The transmission gates to neighboring sub-clock meshes that are also enabled 
are switched ON again, and the ‘sub-clock mesh’ is connected to the other sub-clock 
meshes through the transmission gates. In Figure 13 only an 8x8 clock mesh has been 
considered with 2x2 ‘reconfigurability’, where each sub-clock mesh is connected to 
neighboring sub-clock meshes by only two transmission gates. In later sections the 
effect of changing the number of transmission gates (control units) and the effect of 
changing the size of the PMOS and NMOS of the transmission gate will be discussed.  
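The control logic for the 2x2 reconfigurable mesh can be sketched in a few lines. The signal names follow the text. The `ADJACENT` list, the function names, and the reading that a transmission gate conducts only when its ‘and_’ signal is 1 (and the complementary ‘nand_’ signal is 0) are assumptions made for this illustration.

```python
# Neighbouring quadrant pairs that share transmission gates (from the
# signal names in the text: enb1-enb2, enb2-enb3, enb3-enb4, enb4-enb1).
ADJACENT = [(1, 2), (2, 3), (3, 4), (4, 1)]


def tg_controls(enb):
    """Map enable bits {quadrant: 0 or 1} to AND/NAND control signals."""
    signals = {}
    for a, b in ADJACENT:
        both_on = enb[a] & enb[b]
        signals[f"and_enb{a}_enb{b}"] = both_on
        signals[f"nand_enb{a}_enb{b}"] = 1 - both_on
    return signals


def is_isolated(enb, quadrant):
    """True if every transmission gate touching the quadrant is off."""
    sig = tg_controls(enb)
    return all(
        sig[f"and_enb{a}_enb{b}"] == 0
        for a, b in ADJACENT
        if quadrant in (a, b)
    )


# Quadrant 1 clock gated, the other three quadrants clocked.
enb = {1: 0, 2: 1, 3: 1, 4: 1}
controls = tg_controls(enb)
print(controls)
print(is_isolated(enb, 1))
```

With quadrant 1 gated, the gates on both of its boundaries are off while the gates between quadrants 2, 3 and 4 remain on, so the three active sub-clock meshes still share their redundancy.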
Table 5 shows the comparison of the maximum skew and average power for the clock 
distribution network with a 2x2 reconfigurable clock mesh of level 5 and H-tree of 
level 4 for the test case ‘02.in’ of the ISPD 2010 benchmark. This is the largest 
benchmark in the ISPD 2010 test cases, with the largest number of sinks. It can be 
seen that there is not much variation in skew, unlike the clock routing structure 
with H-tree only. The maximum skew is in the range of 20 ps, which is much less than 
the value for the H-tree structure of similar configuration shown in Table 2. The 
same configuration is used for simulating both structures. The latency of both 
structures is very similar. However, the wire length used for the clock mesh is 
greater than the wire length used for the CDN with H-tree only.  
In Table 5 it can be seen that the power reduces with clock gating according to 
the number of sub-clock meshes being clock gated. For the value of 0 in the column 
‘Number of sub-clock meshes ON’ in Table 5, the power dissipation is due to the first 
level of the H-tree, which consumes power even if all the sub-clock meshes are turned 
off. The results calculated for the other benchmarks showed a very similar trend. 
 
Table 5. Comparison of the maximum skew, average power for clock mesh with H-tree 
for 2x2 reconfigurable mesh 
 
Benchmark: 02.in
Clock distribution network: H-tree and 2x2 reconfigurable mesh with clock gating
(level = 1), SZF = 1 and NTG = 17/side

Number of sub-    Max skew  Max fall   Average     Wire length  Max latency
clock meshes ON   (ps)      time (ps)  power (mW)  (cm)         (ns)
0                 0         0          32.4        98.94        0
1                 20.2      76.75      159.62      98.84        2.46
2                 20.4      76.75      288.49      98.84        2.46
3                 20.3      76.33      417.7       98.84        2.46
4                 20.2      76.3       544.14      98.84        2.46
 
The above design deals with a reconfigurable mesh with four ‘sub-clock meshes’ 
at the four quadrants of the chip area. They are connected to the other sub-clock 
meshes through transmission gates, which control the connections. Hence only four 
sub-clock meshes can be turned on or off based on clock gating signals. The 
granularity of clock gating can be increased by making the clock mesh a 
reconfigurable structure with 16 ‘sub-clock meshes’ instead of only 4 sub-clock 
meshes.  
 
 
Figure 14. Clock distribution network with 4x4 reconfigurable clock mesh of level 3 and 
H-tree of level 2 
A clock distribution network with a 4x4 reconfigurable clock mesh of level 3 and 
H-tree of level 2 can be formed as shown in Figure 14 with 16 sub-clock meshes. 
Although the control logic required for creating the control signals has increased 
exponentially, the clock gating can now be done with finer granularity compared to 
the clock distribution network with 2x2 reconfigurable clock mesh. Here NTG is the 
number of transmission gate connections for each sub-block and SZF is the 
transmission gate size factor, which will be explained in the following sections. 
In Figure 14 the clock mesh structure has 16 sub-clock meshes and 16 enable signals 
are required. The number of transmission gates has also increased. The transmission 
gate control signals are generated from logic ‘AND’ gates and ‘NAND’ gates. For 
example, the control signals corresponding to the sub-clock mesh driven by the H-tree 
output ‘Clk out 11’ are ‘and_enb11_enb14’, ‘nand_enb11_enb14’, ‘and_enb11_enb12’ and 
‘nand_enb11_enb12’. If the ‘enb 11’ signal is at logic 0, then the above control 
signals will switch off the transmission gates and isolate the corresponding 
‘sub-clock mesh’ from the rest of the ‘sub-clock meshes’. Similarly, clock gating can 
be performed for the rest of the ‘sub-clock meshes’ to reduce power consumption.  
The granularity of clock gating in the clock mesh can be increased further from 4x4 
to 8x8, which would give more flexibility in turning on or off particular sub-clock 
meshes of smaller size. For very low power applications, this might be of great use 
to reduce power. Table 6 shows the comparison of the maximum skew and average power 
for the clock distribution network with H-tree and 4x4 reconfigurable clock mesh for 
the test case ‘02.in’ of the ‘ISPD 2010’ benchmark. The H-tree considered is of 
‘level 4’ and the clock mesh considered has level 5 with 32x32 (2^5 x 2^5) 
sub-blocks. From Table 5 and Table 6 it can be inferred that the maximum fall time in 
Table 6 is greater than that of Table 5 by a small percentage. This is because of the 
increase in ‘reconfigurability’ levels and the introduction of more transmission 
gates, instead of direct connection of sub-clock meshes using wires as in a single 
mesh.  
 
Table 6. Comparison of the maximum skew, average power for clock distribution 
network with H-tree and 4x4 reconfigurable clock mesh  
 
Benchmark: 02.in
Clock distribution network: H-tree and 4x4 reconfigurable clock mesh with clock gating
(level = 2), SZF = 1 and NTG = 9/side

Number of sub-blocks/  Max skew  Max fall   Average     Wire length  Max latency
sub-clock meshes ON    (ps)      time (ps)  power (mW)  (cm)         (ns)
0                      0         0          80.55       102.84       0
1                      17.4      77.79      110.53      102.84       2.46
2                      19.2      77.78      137.05      102.84       2.46
3                      20.3      77.79      166.13      102.84       2.46
4                      20.3      77.78      192.77      102.84       2.46
5                      20.3      77.79      222.07      102.84       2.46
6                      20.4      77.79      251.16      102.84       2.46
7                      20.4      77.79      280.44      102.84       2.46
8                      20.3      77.23      312.51      102.84       2.46
9                      16.3      66.22      340.11      102.84       2.45
10                     16.4      66.59      370.63      102.84       2.46
11                     16.4      66.6       399.82      102.84       2.46
12                     20.4      77.22      429.04      102.84       2.46
13                     20.3      77.21      458.99      102.84       2.46
14                     20.4      77.72      488.16      102.84       2.46
15                     20.3      77.2       514.37      102.84       2.46
16                     20.3      77.2       544.84      102.84       2.46
Table 3 gives the results for the clock distribution network using H-tree with 
level 2 clock gating. Comparing Table 3 with Table 6, it can be inferred that the 
maximum skew for the clock distribution network corresponding to Table 6 (about 
20 ps) is much smaller than the maximum skew in Table 3 (about 50 ps). 
Thus the reconfigurable mesh combines the advantages of both the clock mesh and the 
H-tree with the capability of clock gating, and it achieves a small skew. However, the 
average power for the clock distribution network using clock mesh with 4x4 
‘reconfigurability’ is higher than the power for the clock distribution network formed 
using H-tree with level 2 clock gating (544.84 mW versus 477 mW with all 16 sub-blocks 
ON). The following sections deal with the impact of the number of transmission gates 
(control units) between each of the sub-clock meshes and the impact of the size of the 
PMOS and NMOS of the transmission gate.   
 
 5.3. IMPACT OF THE SIZE OF THE TRANSMISSION GATE BETWEEN THE 
SUB-CLOCK MESHES  
The transmission gates between sub-clock meshes in reconfigurable clock 
meshes provide controllability for isolating and clock gating particular sub-clock 
meshes from the others. Studying the impact of the size of the transmission gate gives 
more insight into the requirements for introducing transmission gates between the 
sub-clock meshes. In the transmission gate structure we consider the size of the PMOS 
to be twice the size of the NMOS. In our analysis the size of the PMOS and NMOS is 
varied by changing their widths. 
Analysis is performed for various benchmarks of the ISPD 2010 suite. 
For example, Table 7 shows the comparison of the skew and slew rate for test case 
‘02.in’ in the ISPD 2010 benchmark. The transmission gate size factor (SZF) determines 
the size (width) of the transmission gate. For ISPD benchmarks with 45nm technology, 
the NMOS and PMOS of the transmission gate have widths NMOS SIZE=SZF*4*45nm and 
PMOS SIZE=SZF*8*45nm respectively; for SZF =1 this gives 180nm and 360nm.  
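The sizing rule above is simple enough to state as code. The sketch below only restates the two formulas from the text; the function name `tg_widths` is chosen for this example.

```python
TECH_NM = 45  # feature size of the 45nm technology, in nm


def tg_widths(szf):
    """Return (NMOS width, PMOS width) in nm for a given size factor.

    NMOS = SZF * 4 * 45nm and PMOS = SZF * 8 * 45nm, so the PMOS is
    always twice as wide as the NMOS.
    """
    nmos = szf * 4 * TECH_NM
    pmos = szf * 8 * TECH_NM
    return nmos, pmos


for szf in (0.25, 1, 10, 140):
    n, p = tg_widths(szf)
    print(f"SZF={szf}: NMOS={n} nm, PMOS={p} nm")
```

Over the range of SZF swept in Table 7 (0.25 to 140), the NMOS width therefore runs from 45 nm to 25200 nm.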
 
Table 7. The variation of the maximum skew, average power with the change in the size 
of the transmission gate for 2x2 reconfigurable clock mesh for ’02.in’ case 
 
Benchmark: 02.in
Clock distribution network: H-tree with 2x2 reconfigurable mesh with clock gating
(level = 1), keeping the number of TG connections (NTG) at 17/side for each sub-block
Transmission gate size factor (SZF): NMOS=SZF*4*45nm and PMOS=SZF*8*45nm

SZF    Max skew  Max fall   Average     Wire length  Max latency
       (ps)      time (ps)  power (mW)  (cm)         (ns)
0.25   20.3      76.59      544.13      98.84        2.46
0.5    20.3      76.48      544.13      98.84        2.46
1      20.2      76.3       544.14      98.84        2.46
10     19.5      75.69      544.35      98.84        2.46
20     19        76.63      544.55      98.84        2.46
30     18.8      77.05      544.77      98.84        2.46
40     18.7      77.23      544.93      98.84        2.46
50     18.7      77.37      545.06      98.84        2.46
60     18.7      77.5       545.22      98.84        2.46
80     18.8      77.75      545.46      98.84        2.46
100    19        78         545.7       98.84        2.46
120    19.4      78.25      545.93      98.84        2.46
140    19.8      78.49      546.15      98.84        2.46
The variation in the skew with the size of the transmission gate is given in Table 7 
for the clock distribution network using a 2x2 reconfigurable clock mesh of level 5 
and H-tree of level 4. The corresponding variation of the maximum skew with the 
transmission gate size factor (SZF) is plotted in Figure 15. Similarly, the variation 
of the maximum skew with SZF for the 01.in benchmark with a 2x2 reconfigurable mesh 
is shown in Figure 16. The transmission gate acts as a voltage controlled resistor 
and its resistance decreases with increase in the width of the PMOS and NMOS when the 
transmission gate is in conducting mode. Thus the maximum skew initially decreases as 
the width of the PMOS and NMOS of the transmission gate is increased. 
 
 
Figure 15. Variation of the maximum skew with the change of the size of the 
transmission gate for 2x2 reconfigurable clock mesh for ’02.in’ case 
[Plot for Figure 15: Max Skew (ps) versus transmission gate size factor (SZF)] 
 
Figure 16. Variation of the maximum skew with the change of the size of the 
transmission gate for 2x2 reconfigurable clock mesh for ’01.in’ case 
 
Therefore, as the width of the transmission gate increases, the maximum skew 
decreases until it approaches the smallest skew achievable using the single clock 
mesh. Beyond a certain point, the capacitance added by the larger transmission gates 
increases the load capacitance of the clock distribution network using the 2x2 
reconfigurable mesh, as seen by the H-tree leaf nodes, and the maximum skew rises 
again (in Table 7, for SZF above 60).  
For the ’02.in’ benchmark, each sub-clock mesh is connected to its neighbouring 
sub-clock meshes using 17 transmission gates, whose size is varied. The power also 
increases by a small percentage as the size of the transmission gates increases. 
Therefore, using the optimum transmission gate size helps to achieve a maximum skew 
close to that of the single clock mesh while retaining the capability of 
‘reconfigurability’. 
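As a small illustration of how the optimum SZF can be read from the sweep, the following Python sketch takes the skew and power columns of Table 7 and reports the SZF values with the smallest maximum skew, together with the power increase over the sweep. The helper names are illustrative; the data rows are copied from Table 7.

```python
# Rows copied from Table 7 (02.in, 2x2 reconfigurable mesh, NTG = 17):
# (SZF, max skew in ps, average power in mW)
TABLE_7 = [
    (0.25, 20.3, 544.13), (0.5, 20.3, 544.13), (1, 20.2, 544.14),
    (10, 19.5, 544.35), (20, 19.0, 544.55), (30, 18.8, 544.77),
    (40, 18.7, 544.93), (50, 18.7, 545.06), (60, 18.7, 545.22),
    (80, 18.8, 545.46), (100, 19.0, 545.70), (120, 19.4, 545.93),
    (140, 19.8, 546.15),
]


def min_skew_rows(rows):
    """Return every row whose max skew equals the smallest skew in the sweep."""
    best = min(skew for _, skew, _ in rows)
    return [row for row in rows if row[1] == best]


def power_increase_percent(rows):
    """Percentage rise in average power from the smallest to the largest SZF."""
    first, last = rows[0][2], rows[-1][2]
    return 100.0 * (last - first) / first


if __name__ == "__main__":
    print([szf for szf, _, _ in min_skew_rows(TABLE_7)])
    print(round(power_increase_percent(TABLE_7), 2))
```

The skew minimum is flat over several SZF values, so the smallest of them is the cheapest choice in terms of power.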
[Plot for Figure 16: Max Skew (ps) versus transmission gate size factor (SZF)] 
Table 8. The variation of the maximum skew, average power with the change in the size 
of the transmission gate for 4x4 reconfigurable clock mesh for ’02.in’ case 
 
Benchmark: 02.in 
Clock distribution network: H-tree with 4x4 reconfigurable mesh with clock gating 
(level = 1), keeping the number of TG connections (NTG) at 17/side for each sub-block 
Swept parameter: transmission gate size factor (SZF), with NMOS=SZF*4*45nm and 
PMOS=SZF*8*45nm 

SZF    Max Skew (ps)  Max Fall time (ps)  Average Power (mW)  Wire Length (cm)  Max latency (ns) 
0.25   20.4           77.61               543.68              102.84            2.46 
0.5    20.4           77.47               543.73              102.84            2.46 
1      20.3           77.2                543.84              102.84            2.46 
10     19.6           73.87               545.92              102.84            2.46 
20     19.1           73.28               548.04              102.84            2.46 
30     18.8           73.45               549.85              102.84            2.46 
40     18.8           73.91               551.44              102.84            2.46 
50     18.8           74.34               552.88              102.84            2.46 
60     18.8           74.7                554.28              102.84            2.46 
80     18.8           75.34               556.95              102.84            2.46 
100    19             76.67               559.56              102.84            2.46 
120    19.5           80.67               561.99              102.84            2.46 
140    20.9           84.53               564.34              102.84            2.46 
 
 
Similarly, the entire set of experiments is repeated for the clock distribution 
network using a 4x4 reconfigurable mesh with H-tree. It shows a similar pattern for 
the ’02.in’ test case of the ISPD 2010 benchmark. An H-tree of level 4 and a clock 
mesh of level 5 with 32x32 sub-blocks are used. There are 16 ‘sub-clock meshes’, which 
can be clock gated using the 4x4 reconfigurable clock mesh. The variation of the 
maximum skew with SZF for the 4x4 reconfigurable mesh is shown in Figure 17 and 
Figure 18 for the 02.in and 01.in benchmarks respectively. 
  
Figure 17. Variation of the maximum skew with the change of the size of the 
transmission gate for 4x4 reconfigurable clock mesh for ’02.in’ case 
 
 
 
Figure 18. Variation of the maximum skew with the change of the size of the 
transmission gate for 4x4 reconfigurable clock mesh for ’01.in’ case 
[Plot for Figure 17: Max Skew (ps) versus transmission gate size factor (SZF)] 
[Plot for Figure 18: Max Skew (ps) versus transmission gate size factor (SZF)] 
The sub-clock meshes are connected to the neighbouring sub-clock meshes using 9 
transmission gates, whose size is varied. Table 8 shows the variation of the maximum 
skew and average power with the size of the transmission gate for the 4x4 
reconfigurable clock mesh for the ’02.in’ test case. From Table 8 it can be seen that 
the skew follows the same trend as in Table 7. In the 2x2 and 4x4 reconfigurable 
clock meshes, the number of transmission gates (NTG) used to connect two sub-clock 
meshes is 17 and 9 respectively. 
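The NTG values of 17 and 9 agree with the number of clock grid lines along one side of a sub-clock mesh when a 32x32 sub-block mesh is split 2x2 or 4x4. The Python sketch below makes that arithmetic explicit. It is an inference from the reported numbers, assuming one transmission gate per grid line crossing a shared boundary; the function names are illustrative.

```python
def grid_lines_per_side(sub_blocks_per_axis, partitions_per_axis):
    """Number of clock grid lines along one side of a sub-clock mesh.

    A sub-mesh spanning k sub-blocks along an axis is bounded by k + 1
    grid lines (assumption: the mesh is split into equal parts).
    """
    if sub_blocks_per_axis % partitions_per_axis != 0:
        raise ValueError("mesh must split evenly into sub-clock meshes")
    return sub_blocks_per_axis // partitions_per_axis + 1


def sub_mesh_count(partitions_per_axis):
    """Number of sub-clock meshes in a k x k reconfigurable mesh."""
    return partitions_per_axis * partitions_per_axis


if __name__ == "__main__":
    for k in (2, 4):
        print(k, sub_mesh_count(k), grid_lines_per_side(32, k))
```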
 
5.4. ANALYSIS BY CHANGING NUMBER OF TRANSMISSION GATES 
BETWEEN THE SUB-CLOCK MESHES  
The number of transmission gates connecting one sub-clock mesh to the other 
sub-clock meshes can be varied to observe its effect on the maximum skew. By finding 
the optimum number of transmission gates, the number of gates can be reduced while 
still maintaining a small skew. This reduces the complexity of the logic that 
generates the control signals and the complexity of routing them. The loading due to 
the inherent capacitance of the PMOS and NMOS transistors of the transmission gates is 
also reduced, which helps to achieve better slew for both falling and rising edges.  
Table 9 shows the variation of the clock skew and other parameters with the 
number of transmission gates used between the sub-clock meshes. In Table 9 the 
transmission gate size factor (SZF) is 50, with NMOS=SZF*4*45nm and 
PMOS=SZF*8*45nm.  
Table 9. Variation of the maximum skew and other parameters with the change of 
number of transmission gates for 2x2 reconfigurable clock mesh for ’02.in’ case 
 
Benchmark: 02.in 
Clock distribution network: H-tree and 2x2 reconfigurable mesh with clock gating 
(level = 1), keeping the transmission gate size factor (SZF) at 50 
Swept parameter: number of TG connections (NTG) for each sub-block 

NTG    Max Skew (ps)  Max Fall time (ps)  Average Power (mW)  Wire Length (cm)  Max latency (ns) 
17     18.7           77.37               545.06              98.84             2.46 
9      19             77.04               544.66              98.84             2.46 
5      19.6           76.56               544.38              98.84             2.46 
3      20.3           76.5                544.23              98.84             2.46 
2      20.3           76.31               544.25              98.84             2.46 
1      20.3           76.1                544.16              98.84             2.46 
0      20.4           76.71               544.03              98.84             2.46 
 
 
  
Figure 19. Variation of the maximum skew with the change of the number of the 
transmission gate for 2x2 reconfigurable clock mesh for ’02.in’ case 
[Plot for Figure 19: Max Skew (ps) versus number of TG connections (NTG) between sub-clock meshes] 
 
Figure 20. Variation of the maximum skew with the change of the number of the 
transmission gate for 2x2 reconfigurable clock mesh for ’01.in’ case 
 
From Table 9 it can be inferred that the maximum skew increases as the number of 
transmission gates (NTG) between the sub-clock meshes is reduced. With more 
controlling structures, the maximum skew of the reconfigurable mesh can be brought 
down towards the value achievable using the single clock mesh. The same analysis is 
done for the 4x4 reconfigurable clock mesh. The variation of the maximum skew with 
NTG for the 2x2 reconfigurable mesh is shown in Figure 19 and Figure 20 for the 02.in 
and 01.in benchmarks respectively. Similarly, the variation of the skew with NTG for 
the 4x4 reconfigurable clock mesh is shown in Table 10, and the corresponding plots 
for the 02.in and 01.in benchmarks are shown in Figure 21 and Figure 22.  
[Plot for Figure 20: Max Skew (ps) versus number of TG connections (NTG) between sub-clock meshes] 
Table 10. Variation of the maximum skew and other parameters with the change of 
number of transmission gates for 4x4 reconfigurable clock mesh for ’02.in’ case 
 
Benchmark: 02.in 
Clock distribution network: H-tree with 4x4 reconfigurable mesh with clock gating 
(level = 1), keeping the transmission gate size factor (SZF) at 50 
Swept parameter: number of TG connections (NTG) for each sub-block 

NTG    Max Skew (ps)  Max Fall time (ps)  Average Power (mW)  Wire Length (cm)  Max latency (ns) 
9      18.8           74.34               552.88              102.84            2.46 
5      19             73.75               549.44              102.84            2.46 
3      19.5           74.82               547.34              102.84            2.46 
2      20.3           76.79               546.14              102.84            2.46 
1      20.3           75.72               544.86              102.84            2.46 
0      20.4           77.73               543.66              102.84            2.46 
 
 
  
Figure 21. Variation of the maximum skew with the change of the number of the 
transmission gate for 4x4 reconfigurable clock mesh for ’02.in’ case 
[Plot for Figure 21: Max Skew (ps) versus number of TG connections (NTG) between sub-clock meshes] 
 
Figure 22. Variation of the maximum skew with the change of the number of the 
transmission gate for 4x4 reconfigurable clock mesh for ’01.in’ case 
 
[Plot for Figure 22: Max Skew (ps) versus number of TG connections (NTG) between sub-clock meshes] 

It can be inferred that the clock skew varies with the number of transmission 
gates between the sub-clock meshes. If the optimum values of SZF and NTG are used, 
the skew decreases by a certain amount.  
The optimum values of SZF and NTG are 50 and 17 respectively for the ’02.in’ 
benchmark when an H-tree of level 4 and a clock mesh of level 5 are used for the 2x2 
reconfigurable clock distribution network (CDN). The equivalent SZF (ESZF) is the 
overall width of all transmission gates used along one side of a sub-clock mesh that 
gives a small maximum skew, ESZF = SZF * NTG. For the ’02.in’ benchmark with the 
2x2 reconfigurable clock mesh, the optimum values SZF = 50 and NTG = 17 give 
ESZF = 850. This is equivalent to using either a single transmission gate of size 
ESZF = 850 or 17 transmission gates, each of size SZF = 50. However, using an NTG 
equal to the number of clock grids along any axis in each sub-clock mesh is better, 
since it distributes the transmission gates along the clock grid and may avoid uneven 
loading at some H-tree leaf nodes.  
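The ESZF arithmetic can be checked with a few lines of Python. The function names are illustrative; the relation ESZF = SZF * NTG and the NMOS sizing rule (SZF*4*45nm) come from the text, and the conversion to micrometres is added only for readability.

```python
def equivalent_szf(szf, ntg):
    """ESZF = SZF * NTG: combined size factor of all gates along one side."""
    return szf * ntg


def total_nmos_width_um(szf, ntg, feature_nm=45):
    """Combined NMOS width in micrometres, using NMOS = SZF*4*45nm per gate."""
    return equivalent_szf(szf, ntg) * 4 * feature_nm / 1000


if __name__ == "__main__":
    print(equivalent_szf(50, 17), total_nmos_width_um(50, 17))
    print(equivalent_szf(100, 9), total_nmos_width_um(100, 9))
```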
The tables in the appendix compare the maximum skew for the 2x2 and 4x4 
reconfigurable clock meshes. The maximum skew of the single mesh and the maximum 
skew of the clock mesh without transmission gates (control structures) between the 
sub-clock meshes were simulated for different ISPD benchmarks. It can be seen that if 
the optimum values of SZF and NTG are used for each benchmark, the maximum skew 
attained is smaller than that of the clock mesh without transmission gates between 
the sub-clock meshes and greater than that of the single mesh. In some cases it is 
even smaller than the skew attained using the single mesh. The appendix also compares 
the maximum skew for different NTG values for the ’02.in’ benchmark with SZF = 100. 
It shows that the smallest maximum skew is 18.8 ps, for NTG = 9 and SZF = 100 
(ESZF = 900), which is close to the minimum of 18.7 ps in Table 9 for NTG = 17 and 
SZF = 50 (ESZF = 850). 
It can be inferred that control structures such as transmission gates placed 
between the sub-clock meshes reduce the skew and also help to introduce clock gating 
in the clock mesh. A similar trend is seen if the clock mesh density and the H-tree 
levels are reduced. This indicates that introducing the transmission gates is better 
in reducing the skew and rise/fall time than the structure in which no transmission 
gates are used. 
 
6. WIRELENGTH MINIMIZATION FOR BOUNDED SKEW FOR CLOCK 
DISTRIBUTION USING CLOCK MESH AND H-TREE 
 
In the previous section, different techniques of clock gating in the clock mesh 
were analyzed. This section discusses wire length minimization for the local routing 
within the sub-blocks of the clock mesh. In the methods above, the local connection 
is made directly from each clock sink to the nearest clock grid.  
Comparing clock distribution networks implemented with a clock mesh and with an 
H-tree, the H-tree uses considerably less wire length for clock routing, but the delay 
and skew of the clock mesh are lower than those of the H-tree. The total wire length 
of clock routing using the clock mesh consists of the wire length of the H-tree, the 
clock mesh and the local distribution connections. In the previous sections the local 
connections are made by direct connection. Although the wire length of the clock mesh 
structure itself cannot be reduced, the wire length of the local distribution network 
can be reduced by the following methods. The aim is to keep the skew comparable to 
that of the clock distribution network using direct local connections. In this 
section, a modified zero skew routing method and a modified AHHK method are 
considered, which try to reduce the wire length used for the local distribution 
network in clock routing using a clock mesh, for a given bounded skew.  
Previous work [3] [34] deals with wire length balancing in clock routing to 
minimize the clock delay. However, the real objective should be to reduce clock skew, 
delay and wire length by balancing the loads being driven [35] [36] [37] [38]. 
Previous work [39] presents a conventional methodology called ‘exact zero skew clock 
routing’, which aims at optimizing the timing performance of synchronous circuits. 
The ‘zero skew clock routing’ method finds the location of the new root of a 
merged tree such that the delay from this new root to all leaf nodes is equal, i.e., 
zero skew. This new root is the ‘tapping point’ for zero skew. The zero skew 
algorithm [39] is a recursive bottom-up process. The zero skew merge process finds 
the tapping point [39], as in Figure 23, between two sink points, between the root 
nodes of two sub-trees of local connections, or between a sink point and the root 
node of a sub-tree. 
 
 
Figure 23. Tapping Point location between two sub-trees 
The tapping point is the point between two clock nodes or between two sub-trees 
from which the Elmore delay to the leaf nodes on either side is the same. That is, it 
separates the wire interconnecting the two sub-trees into two parts (not necessarily 
of equal length) of equal Elmore delay [39]. Thus the zero skew point is found by the 
zero skew merge process, which matches the Elmore delays of the two sub-trees as in 
[39]. The other method considered for forming the local routing is the AHHK approach, 
which directly combines the recurrences of Prim’s MST algorithm and Dijkstra’s SPT 
algorithm [40]. 
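The Elmore delay matching behind the tapping point can be sketched in Python as follows. This is a minimal sketch, assuming a connecting wire with uniform resistance and capacitance per unit length and using the closed form obtained by equating the Elmore delays on both sides of the tap. The function names, parameter names and numeric values are illustrative and are not taken from the thesis.

```python
def tapping_point(t1, c1, t2, c2, length, r_unit, c_unit):
    """Fraction x of the wire, measured from sub-tree 1, at which the
    Elmore delays to both sub-trees are equal.

    t1, t2: Elmore delay from each sub-tree root to its leaves
    c1, c2: total capacitance of each sub-tree
    length: length of the wire joining the two roots
    r_unit, c_unit: wire resistance and capacitance per unit length
    """
    r_wire = r_unit * length
    c_wire = c_unit * length
    numerator = t2 - t1 + r_wire * (c2 + c_wire / 2.0)
    denominator = r_wire * (c_wire + c1 + c2)
    return numerator / denominator


def delays_from_tap(x, t1, c1, t2, c2, length, r_unit, c_unit):
    """Elmore delay from the tapping point to the leaves on each side."""
    r1, cw1 = r_unit * x * length, c_unit * x * length
    r2, cw2 = r_unit * (1 - x) * length, c_unit * (1 - x) * length
    d1 = r1 * (cw1 / 2.0 + c1) + t1
    d2 = r2 * (cw2 / 2.0 + c2) + t2
    return d1, d2


if __name__ == "__main__":
    args = (2e-13, 3e-14, 1e-13, 1e-14, 200.0, 0.1, 2e-16)
    x = tapping_point(*args)
    print(x, delays_from_tap(x, *args))
```

A value of x outside the range 0 to 1 means that no point on the wire balances the two delays; handling that case is outside this sketch.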
In the clock mesh method discussed previously, each clock sink is connected 
directly to the nearest clock grid, but the wire length can be reduced by applying a 
variant of the zero skew method or of the AHHK method. The conventional approaches 
to forming local connections, such as the zero skew merging process or the AHHK 
method, cannot be applied directly to a clock mesh. They connect all sinks in the 
circuit bottom-up until all clock sinks and root nodes of the sub-trees are connected 
to a single clock source. In the clock mesh method with an upper-level H-tree, 
however, there are many sub-blocks containing clock sinks, and the requirement is to 
connect the clock sinks and root nodes of the sub-trees to any of the optional points 
on the nearest clock grid. The optional points on the clock mesh serve as clock 
sources for each of the sub-blocks. The conventional routing methods must therefore 
be modified: since there are many optional source points, the one that gives minimum 
skew and wire length must be chosen. 
If a clock sink is very near the clock grid, it can be connected directly to the 
nearest clock grid instead of being merged with other sub-trees. That is, sub-trees 
need not always be formed between clock sinks. This reduces the wire length but might 
increase the clock skew by a small amount. The decision as to which clock sinks are 
connected locally to form the distribution network and which are connected directly 
must be made dynamically. In the modified zero skew method, the Elmore delay model is 
used to estimate the delay from the nearest clock grid to a clock sink, or between 
clock sinks, before forming the sub-trees. The Manhattan distance is used as the 
distance between the root nodes of two sub-trees, between two clock sinks, or between 
a clock sink and the root node of a sub-tree. The primary disadvantage of the Elmore 
delay model is its limited accuracy: it always overestimates the delay [41]. Hence it 
is very difficult to achieve the exact zero skew claimed by the zero skew method [39], 
although a skew close to zero can be achieved. The modified zero skew method 
therefore assumes a skew threshold called ‘SKEW_BOUND’ for each of the benchmarks.  
If the estimated skew is within SKEW_BOUND, the skew limit is considered to be 
met. The local clock distribution is formed from the sub-trees only if the estimated 
skew is within SKEW_BOUND. The following steps describe the modified zero skew 
method of clock routing, which is a variant of the conventional zero skew method. It 
aims at minimizing the wire length for a bounded skew in a clock distribution network 
(CDN) using clock mesh and H-tree. 
STEP 1: The skew bound for the clock distribution network is denoted by SKEW_BOUND, and the sink capacitances of the whole circuit are denoted by Ci, where Ci ϵ {C1, C2…Cn}.  
STEP 2: A CDN with a clock mesh of (n+1) rows and (n+1) columns has a total of 2nx2n sub-blocks, which are indexed by i and j as bi,j , where 1 ≤ i ≤ n and 1 ≤ j ≤ n.  
STEP 3: Find the minimum estimated Elmore delay (MIN_D_CAP) of the wire used to connect a clock sink to the nearest clock grid, over all sink capacitances in the given test case, i.e., MIN_D_CAP = min (D(Ci)). Here D(Ci) is the estimated Elmore delay from the connecting node on the clock grid to the clock sink with capacitance Ci. 
STEP 4: The following steps are repeated for each of the sub-blocks bi,j , where 1 ≤ i ≤ n and 1 ≤ j ≤ n. 
STEP 5: The sink capacitances in a specific sub-block (bi,j) of the total grid are denoted by pij={pij1, pij2 ,….,pijm }  where m<=n. Here m is the total number of clock sinks within the sub-block. 
STEP 6: In the sub-block bi,j , find the center point (centerij). 
STEP 7: Arrange the sinks in the sub-block bi,j in increasing order of distance from (centerij) as p_incxyi,j =sort (pi,j).  
STEP 8: Initialize a variable ‘combine_possible’=0. This variable determines when to stop applying the routing algorithm to the sub-block. 
STEP 9: Form the matrices ‘sk_xxy’ and ‘sk_yxy’ (each of size mxm), which give the x and y coordinates of the ‘zero skew point’ between two clock sinks, between the root nodes of two sub-trees, or between a clock sink and the root node of a sub-tree in the current sub-block (bi,j). Usually, for every two points chosen at a time there are two zero skew points, since the Manhattan distance is used. The zero skew point closer to the clock grid is chosen, which reduces the overall wire length. 
STEP 10: Form the matrix ‘d_node_xyxy’, which gives the total wire length used from the nearest node on the clock grid to the clock sinks. This includes the wire length from the current nodes X and Y to the zero skew point and the distance from the zero skew (merging) point to the nearest clock grid in the sub-block (bi,j). 
STEP 11: Form the matrix ‘tot_ep_el_time_xyxy’, which gives the total estimated Elmore delay from the nearest node on the clock grid to the clock sinks. This includes the Elmore delay from the point on the clock grid to the zero skew point (obtained by connecting the two current nodes X and Y in the sub-block (bi,j)) and the Elmore delay from the zero skew point to the clock sink. 
STEP 12: Form the matrix ‘sk_node_xyxy’ from ‘tot_ep_el_time_xyxy’ by subtracting MIN_D_CAP (the minimum of the estimated Elmore delays of all the sink capacitances) from it. 
STEP 13: Sort the elements of ‘d_node_xyxy’, take the element with the minimum value, and form the connection if its corresponding skew in sk_node_xyxy is less than the skew bound (SKEW_BOUND). 
STEP 14: Remove the nodes X and Y from the list of nodes within the sub-block and replace them by the corresponding zero skew merging point coordinates. The number of nodes not yet associated with any sub-tree in the sub-block is thereby reduced by 1; assign combine_possible=1. If no merging point satisfies the condition (sk_node_xyxy < MIN_D_CAP), assign combine_possible=0. 
STEP 15: If (combine_possible==1) is true, go to step 9 and repeat the steps as long as combine_possible==1; otherwise move to step 16. 
STEP 16: After merging the points in the current sub-block, the same algorithm is repeated for the other sub-blocks (bi,j): go to step 5 and repeat the steps for the sinks in the next sub-block. 
STEP 17: Stop when the algorithm has been applied to all the sub-blocks (bi,j). 
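Step 3 of the procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the mesh is modelled as full-length vertical and horizontal grid lines, the direct connection is a single wire with uniform resistance and capacitance per unit length, and the Elmore delay of that wire driving a sink is r*l*(c*l/2 + C). All function names, parameter names and numeric values are illustrative and are not taken from the thesis.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def nearest_grid_point(sink, grid_xs, grid_ys):
    """Closest point on the mesh, projecting the sink onto the nearest
    vertical (x = const) or horizontal (y = const) grid line."""
    x, y = sink
    nearest_x = min(grid_xs, key=lambda gx: abs(gx - x))
    nearest_y = min(grid_ys, key=lambda gy: abs(gy - y))
    if abs(nearest_x - x) <= abs(nearest_y - y):
        return (nearest_x, y)
    return (x, nearest_y)


def direct_elmore_delay(length, sink_cap, r_unit, c_unit):
    """Elmore delay of a uniform wire of the given length driving sink_cap."""
    return r_unit * length * (c_unit * length / 2.0 + sink_cap)


def min_d_cap(sinks, grid_xs, grid_ys, r_unit, c_unit):
    """MIN_D_CAP: smallest direct-connection Elmore delay over all sinks.

    sinks is a list of (x, y, capacitance) tuples.
    """
    delays = []
    for x, y, cap in sinks:
        tap = nearest_grid_point((x, y), grid_xs, grid_ys)
        wire = manhattan((x, y), tap)
        delays.append(direct_elmore_delay(wire, cap, r_unit, c_unit))
    return min(delays)


if __name__ == "__main__":
    sinks = [(2.0, 5.0, 1.0), (5.0, 9.0, 1.0)]
    print(min_d_cap(sinks, [0.0, 10.0], [0.0, 10.0], 1.0, 1.0))
```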
This modified zero skew method is compared with other methods of local 
connection: direct connection to the clock grid in the clock mesh, the modified AHHK 
method, and direct connection to the clock grid after elongating the wires to balance 
the wire length for all the clock sinks. These methods differ from the modified zero 
skew method only in the local distribution network connection. 
The conventional AHHK approach directly combines the recurrences of Prim’s MST 
algorithm and Dijkstra’s SPT algorithm [40]. The AHHK algorithm iteratively adds the 
edge eij and the sink vi to the tree T, where vi and vj are chosen to minimize  
((c.Lj) + dij) such that vj ϵ T, and vi ϵ V-T                    (6) 
for some choice of 0 ≤ c ≤ 1. Here Lj is the path length from the source to vj in T 
and dij is the distance between vi and vj. When c=0 the algorithm constructs the tree 
with minimum weight, the same as the MST of Prim’s algorithm. As c increases, AHHK 
constructs a tree with increasingly larger weight but lower radius. With c = 1, AHHK 
behaves like Dijkstra’s algorithm. The AHHK algorithm guarantees for every sink that 
li is within a constant factor of d0i. This helps to reduce the average, rather than 
maximum, delay of the clock sinks [40].  
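The conventional single-source recurrence in (6) can be sketched in Python as follows. This is a minimal sketch, assuming Manhattan distances between points and a brute-force search over candidate edges; the function names and the example coordinates are illustrative and are not taken from the thesis.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def ahhk_tree(source, sinks, c):
    """Grow a tree from the source by repeatedly adding the edge (i, j)
    that minimizes c * L_j + d_ij, with j in the tree and i outside it.

    Node 0 is the source; sinks are numbered from 1 in the given order.
    Returns (parent, path_len, total_wire_length).
    """
    nodes = [source] + list(sinks)
    path_len = {0: 0.0}  # L_j: path length from the source to node j
    parent = {}
    remaining = set(range(1, len(nodes)))
    total = 0.0
    while remaining:
        _, i, j = min(
            (c * path_len[j] + manhattan(nodes[i], nodes[j]), i, j)
            for i in remaining
            for j in path_len
        )
        d_ij = manhattan(nodes[i], nodes[j])
        parent[i] = j
        path_len[i] = path_len[j] + d_ij
        total += d_ij
        remaining.remove(i)
    return parent, path_len, total


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0):
        print(c, ahhk_tree((0, 0), [(4, 0), (3, 2)], c))
```

In this small example, c = 0 gives the shorter tree with the longer source-to-sink path, and c = 1 gives the longer tree with the shorter path, which is the trade-off described above.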
The conventional AHHK method cannot be used as such, since it assumes a single 
source and its objective is to minimize expression (6). In our experiments a modified 
version of the AHHK method is used to form the local clock routing within each of the 
sub-blocks. The modification is that the connection can be made to any point on the 
clock grid (assuming multiple source points on the clock grid), unlike conventional 
AHHK [40], which assumes a single source. The condition used for deciding the local 
connection in modified AHHK is  
         (c*( Lj))+ dij ≤ Li such that vj ϵ T, and vi ϵ V-T,          (7) 
If the above condition is true, the connection is made from vi to vj; otherwise 
vi is connected directly to the nearest clock grid. In either case vi is then 
combined with the sub-tree T. The modified AHHK is simulated with various values of 
c (0, 0.3, 0.5, 0.7, 1) for the IBM benchmarks and the ISPD benchmarks, and its 
performance is compared with the modified zero skew method in terms of wire length 
reduction and clock skew reduction.   
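The decision in (7) can be sketched in Python as follows. This is a minimal sketch under an explicit assumption: Li is taken to be the length of the direct connection from vi to the nearest clock grid, which the text does not state outright. The function name and the example numbers are illustrative.

```python
def choose_connection(c, candidates, direct_len):
    """Decide how to connect a sink v_i under condition (7).

    candidates: list of (L_j, d_ij) pairs, one per node v_j already in T
    direct_len: L_i, assumed here to be the direct wire length from v_i
                to the nearest clock grid
    Returns ("tree", index of the chosen v_j) or ("grid", None).
    """
    best = None
    for index, (l_j, d_ij) in enumerate(candidates):
        cost = c * l_j + d_ij
        if cost <= direct_len and (best is None or cost < best[0]):
            best = (cost, index)
    if best is not None:
        return ("tree", best[1])
    return ("grid", None)


if __name__ == "__main__":
    print(choose_connection(0.5, [(4.0, 3.0)], 6.0))
    print(choose_connection(1.0, [(4.0, 3.0)], 6.0))
```

Under this reading, a larger c makes the condition harder to satisfy, so more sinks fall back to the direct connection.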
Table 11 compares the modified zero skew method (HC_MZSK) of local routing with 
the other methods: direct connection (HC_DC) to the clock grid in the clock mesh, the 
modified AHHK method (HC_M_AHHK), and direct connection to the clock grid with the 
wire elongated (HC_DC_EXT) to balance the wire length for all the sinks.  
 
TABLE 11. Variation of the maximum skew, wire length and the maximum fall time for 
the ISPD 2010 benchmarks 02.in, 03.in, 08.in 
 
Method of making          Benchmark  Max Skew  Max Fall   Average     Local wire   Total wire 
local connections                    (ps)      time (ps)  Power (mW)  Length (cm)  Length (cm) 
HC_DC                     02.in      16.9      66.26      542.04      10.59        96.84 
HC_DC_EXT                 02.in      32.1      113.86     609.79      51.98        138.23 
HC_MZSK                   02.in      15        61.56      540.03      9.25         95.5 
HC_M_AHHK with c= 0       02.in      26.1      74.77      547.17      9.02         95.27 
HC_M_AHHK with c= 0.3     02.in      19.6      66.4       547.47      9.14         95.39 
HC_M_AHHK with c= 0.5     02.in      16.4      64.73      547.75      9.31         95.56 
HC_M_AHHK with c= 0.7     02.in      16.5      65.26      548.2       9.62         95.87 
HC_M_AHHK with c= 1       02.in      18.8      70.24      549.53      10.48        96.73 

HC_DC                     03.in      6.16      37.68      108.49      0.87         8.35 
HC_DC_EXT                 03.in      16.48     60.88      127.5       8.29         15.77 
HC_MZSK                   03.in      6.06      37.29      108.15      0.71         8.19 
HC_M_AHHK with c= 0       03.in      10.23     42.27      110.51      0.66         8.14 
HC_M_AHHK with c= 0.3     03.in      9.99      42.25      110.82      0.69         8.17 
HC_M_AHHK with c= 0.5     03.in      10.24     42.42      111.04      0.74         8.22 
HC_M_AHHK with c= 0.7     03.in      10.36     42.71      111.18      0.8          8.29 
HC_M_AHHK with c= 1       03.in      10.57     42.92      111.32      0.87         8.35 

HC_DC                     08.in      14.04     42.03      84.39       1.99         9.66 
HC_DC_EXT                 08.in      39.99     99.54      121.35      17.14        24.82 
HC_MZSK                   08.in      12.79     39.64      82.91       1.38         9.05 
HC_M_AHHK with c= 0       08.in      15.06     41.05      82.35       1.11         8.78 
HC_M_AHHK with c= 0.3     08.in      13.76     39.5       82.51       1.18         8.85 
HC_M_AHHK with c= 0.5     08.in      12.95     38.96      82.77       1.28         8.96 
HC_M_AHHK with c= 0.7     08.in      13.07     39.74      83.22       1.48         9.15 
HC_M_AHHK with c= 1       08.in      13.78     42.03      84.11       1.83         9.51 

For simulation, the 180nm and 45nm technology files from the Predictive 
Technology Model (PTM) were used for the IBM benchmarks and the ISPD benchmarks 
respectively. Table 11 shows that, for the ISPD benchmarks, the modified zero skew 
method (HC_MZSK) reduces the local wire length of the benchmark ‘02.in’ to 9.25 cm 
while still maintaining a small maximum skew of 15 ps. 
Although the modified AHHK method (HC_M_AHHK) performs well in reducing the 
wire length for c=0 and c=0.3, the corresponding maximum skew is very high compared 
to that of the HC_MZSK method, the direct connection (HC_DC) method and the 
HC_M_AHHK method with the other c values (0.5, 0.7, 1). Similarly, for the benchmark 
‘03.in’ the HC_MZSK method has a maximum skew of 6.06 ps, the lowest among all the 
methods, while using less local wire length than the HC_DC method. A similar trend 
follows for the other benchmarks. 
Since the modified AHHK method is based on reducing the wire length and does 
not take into account the Elmore delay and the capacitance of the sinks, a large 
parametric sweep of c is needed to find the value of ‘c’ for which HC_M_AHHK reduces 
the wire length most while still keeping the maximum skew below SKEW_BOUND. The main 
reason for the large skew variation with ‘c’ is that HC_M_AHHK works on the actual 
wire length rather than on the Elmore delay of those wires. Since the modified zero 
skew method is based on the estimated Elmore delay values, it helps to reduce both 
the local wire length and the maximum skew. A similar trend is observed for the IBM 
benchmarks, as shown in Table 12. 
TABLE 12. Variation of the maximum skew, wire length and the maximum fall time for 
the IBM benchmarks r1, r4, r5 
 
Method of making          Benchmark  Max Skew  Max Fall   Average     Local wire   Total wire 
local connections                    (ps)      time (ps)  Power (mW)  Length (um)  Length (um) 
HC_DC                     r1         7.7       73.43      817.56      211.86       3281.01 
HC_DC_EXT                 r1         9.3       78.33      821.62      548.06       3617.21 
HC_MZSK                   r1         7.6       73.16      817.4       198.22       3267.37 
HC_M_AHHK with c= 0       r1         9.7       73.76      817.38      197.05       3266.20 
HC_M_AHHK with c= 0.3     r1         7.6       73.17      817.47      198.38       3267.5 
HC_M_AHHK with c= 0.5     r1         7.7       73.24      817.43      200.80       3269.95 
HC_M_AHHK with c= 0.7     r1         7.8       73.33      817.48      205.64       3274.79 
HC_M_AHHK with c= 1       r1         7.7       73.43      817.56      211.86       3281.01 

HC_DC                     r4         18.2      87.86      2545.7      1229.31      12427.02 
HC_DC_EXT                 r4         23.8      101.18     2567        3703.59      14901.29 
HC_MZSK                   r4         18.1      87.61      2545.2      1174.35      12372.05 
HC_M_AHHK with c= 0       r4         19.4      87.5       2545.3      1166.38      12364.09 
HC_M_AHHK with c= 0.3     r4         19.4      87.51      2545.4      1175.89      12373.6 
HC_M_AHHK with c= 0.5     r4         19.5      87.57      2545.9      1187.50      12385.2 
HC_M_AHHK with c= 0.7     r4         19.1      87.69      2545.8      1204.45      12402.15 
HC_M_AHHK with c= 1       r4         18.2      87.86      2545.7      1229.31      12427.02 

HC_DC                     r5         24.2      107.25     2591.9      2332.57      15048.77 
HC_DC_EXT                 r5         33.2      128.23     2632.2      6886.15      19602.36 
HC_MZSK                   r5         24.1      106.44     2590.7      2168.75      14884.96 
HC_M_AHHK with c= 0       r5         25.2      106.9      2590.5      2143.80      14860.01 
HC_M_AHHK with c= 0.3     r5         23.9      106.36     2590.5      2172.32      14888.52 
HC_M_AHHK with c= 0.5     r5         24.1      106.61     2591.1      2213.19      14929.39 
HC_M_AHHK with c= 0.7     r5         23.9      106.7      2591.4      2263.05      14979.26 
HC_M_AHHK with c= 1       r5         24.3      107.3      2606.1      2332.57      15048.77 

After applying the modified zero skew method, the wire length reduction is small, 
since the density of the clock mesh used is very high (16x16 to 32x32 sub-blocks). 
Hence if the density of the clock mesh is lower than this (a clock mesh with fewer 
than 16x16 sub-blocks), the method might reduce the wire length of the local routing 
by a greater margin.  
 
TABLE 13. Variation of the maximum skew, wire length and maximum fall time for the 
ISPD benchmark ’02.in’ by changing the density of clock mesh level and H-tree level 
 
Benchmark: ISPD '02.in' 

For H-tree level = 4 and clock mesh level = 5 
Method of making          Max Skew  Max Fall   Average     Local wire   Total wire 
local connections         (ps)      time (ps)  Power (mW)  Length (cm)  Length (cm) 
HC_DC                     16.9      66.26      542.04      10.59        96.84 
HC_DC_EXT                 32.1      113.86     609.79      51.98        138.23 
HC_MZSK                   15        61.56      540.03      9.25         95.5 
HC_M_AHHK with c= 0       26.1      74.77      547.17      9.02         95.27 
HC_M_AHHK with c= 0.3     19.6      66.4       547.47      9.14         95.39 
HC_M_AHHK with c= 0.5     16.4      64.73      547.75      9.31         95.56 
HC_M_AHHK with c= 0.7     16.5      65.26      548.2       9.62         95.87 
HC_M_AHHK with c= 1       18.8      70.24      549.53      10.48        96.73 

For H-tree level = 3 and clock mesh level = 4 
Method of making          Max Skew  Max Fall   Average     Local wire   Total wire 
local connections         (ps)      time (ps)  Power (mW)  Length (cm)  Length (cm) 
HC_DC                     66.3      210.17     300.77      20.87        64.32 
HC_DC_EXT                 129.7     440.62     388.7       87.29        130.74 
HC_MZSK                   51.9      177.53     294.43      16.48        59.93 
HC_M_AHHK with c= 0       86.9      223.1      294.63      15.8         59.25 
HC_M_AHHK with c= 0.3     60        182.29     295.45      16.35        59.8 
HC_M_AHHK with c= 0.5     54.7      180.87     296.33      17           60.45 
HC_M_AHHK with c= 0.7     55.3      185.21     297.73      17.96        61.41 
HC_M_AHHK with c= 1       72.2      222.16     301.61      20.71        64.16 

Since the density of the clock mesh is very high, the global routing wire length 
(total wire length) in Table 11 and Table 12 is dominated by the clock mesh wire 
length, which is constant for a particular benchmark and configuration. Hence the 
percentage reduction in total wire length is small. However, the skew stays within 
SKEW_BOUND in all cases after using the above methods for local routing. Hence the 
above methods help to reduce the skew, the slew and the overall wire length by a 
small amount, and the local wire length by a higher percentage. 
Table 13 shows the variation of the maximum skew when the clock mesh density is
decreased from level 5 (32x32 sub-blocks) to level 4 (16x16 sub-blocks) and the H-tree
level from level 4 to level 3. It can be seen that the direct connection made to the
clock grid by elongating (HC_DC_EXT) performs poorly in both configurations, while the
direct connection (HC_DC) does not perform as well as the modified zero skew method
(HC_MZSK) and the modified AHHK method (for some values of ‘c’). It can be inferred
that the HC_MZSK method is consistent in achieving reduced skew and a reasonable wire
length reduction.
Table 13 shows that HC_M_AHHK with c=0 has the minimum wire length usage, but its
maximum clock skew is the highest of all methods except HC_DC_EXT. HC_M_AHHK with
c=0 resembles Prim’s MST for the local distribution network and increases the radius
of each of the clock sinks. For c=0.3 the wire length usage is reasonable, but the clock
skew is still higher than that of the HC_MZSK method for the same configuration. When
the density of both the clock mesh (level = 3) and the H-tree (level = 2) is decreased
further, the maximum slew rate constraint fails for HC_DC, HC_DC_EXT and HC_M_AHHK
with c=1. The skew of the HC_MZSK method is still lower than the skew of the other
methods for that configuration. Results for those configurations have not been shown,
since the maximum skew was higher than the SKEW_BOUND for that particular benchmark.
In the above simulations the blockages in the ISPD benchmark circuits have been
neglected to reduce the complexity of the algorithm that forms the clock distribution
network. However, Table 14 shows that neglecting the blockages changes the maximum
skew very little. Considering blockages decreases the wire length. Hence it has some
impact on the average power and fall time, but does not cause large variations in the
skew.
 
Table 14. Variation of the skew, average power of the CDN formed using H-tree and
single mesh with and without blockages

CDN with H-tree and single mesh, without blockages
Benchmark  Max skew  Max fall   Average     Max latency
           (ps)      time (ps)  power (mW)  (ns)
01.in      10.6      38.66      455.56      1.33
02.in      15.9      55.5       563.39      2.39
03.in      5.8       32.3       116.6       0.63
04.in      11.43     41.97      125.17      0.65
05.in      7.99      32.99      114.48      0.65
06.in      8.22      29.63      88.6        0.39
07.in      14.88     44.28      125.06      0.64
08.in      13.56     38.05      94.76       0.4

CDN with H-tree and single mesh, with blockages
Benchmark  Max skew  Max fall   Average     Max latency
           (ps)      time (ps)  power (mW)  (ns)
01.in      10.4      38.87      434.25      1.33
02.in      16.9      56.02      550.88      2.38
03.in      7.26      33.61      116.27      0.63
04.in      13.11     44.51      122.71      0.65
05.in      10.83     37.17      109.8       0.65
06.in      8.22      29.63      88.6        0.39
07.in      14.88     44.28      125.06      0.64
08.in      13.56     38.05      94.76       0.4
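
The effect of the blockages can be read off Table 14 by differencing the two halves of the table. The following is a minimal sketch of that arithmetic, using only the values listed in Table 14; the dictionary and function names are illustrative and not part of the thesis flow.

```python
# Values copied from Table 14: (max skew ps, max fall time ps, average power mW, max latency ns).
WITHOUT_BLOCKAGES = {
    "01.in": (10.6, 38.66, 455.56, 1.33), "02.in": (15.9, 55.5, 563.39, 2.39),
    "03.in": (5.8, 32.3, 116.6, 0.63),    "04.in": (11.43, 41.97, 125.17, 0.65),
    "05.in": (7.99, 32.99, 114.48, 0.65), "06.in": (8.22, 29.63, 88.6, 0.39),
    "07.in": (14.88, 44.28, 125.06, 0.64), "08.in": (13.56, 38.05, 94.76, 0.4),
}
WITH_BLOCKAGES = {
    "01.in": (10.4, 38.87, 434.25, 1.33), "02.in": (16.9, 56.02, 550.88, 2.38),
    "03.in": (7.26, 33.61, 116.27, 0.63), "04.in": (13.11, 44.51, 122.71, 0.65),
    "05.in": (10.83, 37.17, 109.8, 0.65), "06.in": (8.22, 29.63, 88.6, 0.39),
    "07.in": (14.88, 44.28, 125.06, 0.64), "08.in": (13.56, 38.05, 94.76, 0.4),
}

def deltas(metric_index):
    """Per-benchmark change (with blockages minus without) for one metric column."""
    return {
        name: round(WITH_BLOCKAGES[name][metric_index] - WITHOUT_BLOCKAGES[name][metric_index], 2)
        for name in WITHOUT_BLOCKAGES
    }

skew_delta = deltas(0)    # ps
power_delta = deltas(2)   # mW
worst = max(skew_delta, key=lambda name: abs(skew_delta[name]))
print("largest skew change:", worst, skew_delta[worst], "ps")
print("power changes (mW):", power_delta)
```

On these values the largest skew change is below 3 ps, while the average power never increases when blockages are considered.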
In all the above configurations the clock frequency used for simulation was 1 GHz for
the ISPD benchmarks and 200 MHz for the IBM benchmarks, with a 50% duty cycle. The
sub-block (b4,9) has x axis and y axis limits [lower_left_x lower_left_y upper_right_x
upper_right_y] = [2437500 nm  3500000 nm 3250000 nm 3937500 nm]. Enlarged views of
sub-block (b4,9) of the clock mesh for the ISPD ’02.in’ benchmark, showing the local
distribution connections using the HC_DC method, HC_MZSK, HC_M_AHHK with c=0 and
HC_M_AHHK with c=0.3, are shown in Figure 24, Figure 25, Figure 26 and Figure 27
respectively. The H-tree used is of level 3 and the clock mesh used is of level 4.
Although in all figures the connection from the clock sinks to the clock grid is drawn
as a direct (straight-line) connection, the length of each connection is the actual
Manhattan distance.
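
As a small illustration of the Manhattan distance convention, the sketch below uses the (b4,9) limits quoted above and measures the rectilinear length of a connection from a sink to the nearest edge of the sub-block. It assumes, purely for illustration, that mesh wires run along all four sub-block boundaries; the sink coordinates and function names are hypothetical.

```python
# Sub-block (b4,9) limits from the text, in nm: (x_lo, y_lo, x_hi, y_hi).
B49 = (2437500, 3500000, 3250000, 3937500)

def manhattan(p, q):
    """Rectilinear (Manhattan) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def nearest_grid_point(sink, box):
    """Closest boundary point for a sink inside the box (assumes wires on all four edges)."""
    x, y = sink
    x_lo, y_lo, x_hi, y_hi = box
    candidates = [(x_lo, y), (x_hi, y), (x, y_lo), (x, y_hi)]
    return min(candidates, key=lambda g: manhattan(sink, g))

sink = (2500000, 3700000)  # hypothetical sink location inside (b4,9)
tap = nearest_grid_point(sink, B49)
print("tap point:", tap, "connection length (nm):", manhattan(sink, tap))

# A diagonal-looking edge between two points is still measured rectilinearly.
print(manhattan((0, 0), (3, 4)))  # 7, not the Euclidean 5
```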
 
 
Figure 24. Enlarged view of sub-block (b4,9) of the clock mesh for ISPD ’02.in’ 
benchmark with local distribution connection using HC_DC method  
 
Figure 25. Enlarged view of sub-block (b4,9) of the clock mesh for ISPD ’02.in’ 
benchmark with local distribution connection using HC_MZSK method  
 
 
Figure 26. Enlarged view of sub-block (b4,9) of clock mesh for ISPD ’02.in’ with local 
distribution connection using HC_M_AHHK method with c=0 
 
Figure 27. Enlarged view of sub-block (b4,9) of clock mesh for ISPD ’02.in’ with local
distribution connection using HC_M_AHHK method with c=0.3
 
7. RESULTS AND ANALYSIS 
 
 In this work the effect of the reconfigurable mesh on the clock distribution network
(CDN) has been analyzed for various ISPD benchmarks. Table 1, Table 2 and Table 3
show the comparison of the maximum skew, average power and wire length of the CDN
using H-tree only, while Table 4, Table 5 and Table 6 show the comparison of the same
parameters for the CDN using clock mesh with different sets of parameters. Comparing
the results of Table 1 (CDN with H-tree only, without clock gating) and Table 4 (CDN
with single mesh), we can infer that, averaged over all benchmarks, the total wire
length and the average power increase by 119.99% and 10.57% respectively, while the
average maximum skew achieved using the single mesh CDN is 76.96% less than with the
CDN structure with H-tree only. Also, the maximum fall (transition) time of the signal
using the clock mesh is 55.99% less than using the H-tree alone. This suggests that the
CDN with a single clock mesh is better than the CDN with H-tree alone, at the cost of
an increase in wire length.
 Comparing the results of the CDN using H-tree enabled with clock gating in Table 2
and the CDN with 2x2 reconfigurable clock mesh in Table 5, it can be inferred that the
same level of power reduction is achieved by clock gating in the H-tree and by using
the reconfigurable clock mesh for the same set of parameters. However, the maximum skew
and the maximum fall time using the 2x2 reconfigurable clock mesh are 59.93% and 44.64%
less than those using the H-tree respectively, at the cost of a 76.59% increase in wire
length, for the ISPD ’02.in’ benchmark. Similarly, comparing the results of the 4x4
reconfigurable clock mesh in Table 6 with the results of the CDN using H-tree only for
the same set of parameters, we get 60.15% and 41.90% reductions in the maximum skew and
the maximum fall time by using the 4x4 reconfigurable clock mesh, at the cost of an
83.74% increase in wire length. The primary advantage of the clock mesh is that it
reduces the maximum fall (transition) time and skew, but the wire length is increased.
The reconfigurable clock mesh introduces ‘reconfigurability’ or clock gating to reduce
power dynamically.
The ‘reconfigurability’ is introduced with transmission gates, which act as
voltage-controlled resistors connecting input and output. Table 7 and Table 8 show the
variation of the maximum skew for different values of the transmission gate size factor
(SZF) for the 2x2 and 4x4 reconfigurable clock mesh respectively for the ’02.in’
benchmark. The results show that the maximum skew reduces with increasing SZF up to an
optimum point, after which it starts increasing because the capacitance of the
transmission gates themselves loads the CDN as SZF increases. The optimum value of SZF
can be used for the transmission gates; it is 50 for the ISPD ’02.in’ benchmark.
Similarly, Table 9 and Table 10 show the skew variation for the 2x2 and 4x4
reconfigurable mesh respectively with the number of transmission gate connections (NTG)
after using the optimum value of SZF. It can be seen that when NTG is decreased, the
number of connections between the sub-blocks decreases, leading to an increase in the
maximum skew for both the 4x4 and 2x2 reconfigurable clock mesh. Thus it is better to
use the optimum value of NTG, which gives a small value for the maximum skew and the
minimum fall time. A similar trend follows for the other benchmarks. After using the
optimum values of SZF and NTG, the maximum skew achievable using the reconfigurable
clock mesh will be close to the skew obtained using the single mesh CDN. For the
different ISPD benchmarks the optimum values of SZF and NTG are calculated and listed in
the tables of the Appendix. However, it can be seen from Table 10 and Table 4 for the
’02.in’ benchmark that the maximum skew achievable with SZF = 50 and NTG = 9 is 18.8 ps,
which is not equal to the maximum skew of 16.9 ps with the single mesh CDN. If
transmission gates are not used between the sub-clock meshes, the maximum skew is
20.4 ps. The single mesh thus has a lower skew than both the reconfigurable mesh and
the clock gated mesh without transmission gates, and the reconfigurable mesh lies
between the two. So it is inferred that the reconfigurable mesh provides clock gating
while retaining skew reduction to a certain extent.
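
The ordering between the three structures can be checked against the Appendix values for the 4x4 reconfigurable mesh. The sketch below copies those values and also computes, as an illustrative metric that is not used in the thesis, the fraction of the skew gap between the mesh without transmission gates and the single mesh that the transmission gates close.

```python
# Max skew in ps from the Appendix (4x4 case):
# benchmark -> (single mesh, reconfigurable mesh at optimum SZF/NTG, no transmission gates)
SKEW_4X4 = {
    "01.in": (11, 12.5, 13.4),   "02.in": (16.9, 18.8, 20.4),
    "03.in": (2.52, 2.63, 3.71), "04.in": (3.93, 3.99, 4.67),
    "05.in": (2.85, 3.01, 3.84), "06.in": (4.11, 5.4, 7.06),
    "07.in": (5.75, 6.38, 7.97), "08.in": (5.35, 6.69, 8.54),
}

def gap_recovered(single, reconf, no_tg):
    """Share of the (no-TG minus single-mesh) skew gap closed by the transmission gates."""
    return (no_tg - reconf) / (no_tg - single)

def ordering_holds(table):
    """True if single mesh <= reconfigurable mesh <= no-TG mesh for every benchmark."""
    return all(single <= reconf <= no_tg for single, reconf, no_tg in table.values())

print("single <= reconfigurable <= no TG for all benchmarks:", ordering_holds(SKEW_4X4))
for name, (single, reconf, no_tg) in SKEW_4X4.items():
    print(f"{name}: {100 * gap_recovered(single, reconf, no_tg):.1f}% of the gap recovered")
```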
 In the previous methods the connections were made directly from the clock sinks
to the nearest clock grid in the CDN with clock mesh. The remaining work deals with
the method of connecting the clock sinks to the clock grid with the aim of reducing the
local wire length. Table 11 and Table 12 show the comparison of the maximum skew, the
maximum fall time and other parameters for the modified zero skew method (HC_MZSK) of
local routing with the other methods: direct connection (HC_DC) from the clock sinks to
the clock grid, the modified AHHK (HC_M_AHHK) method (with parameter ‘c’), and the
HC_DC_EXT method, where the direct connection to the clock grid is elongated to balance
the wire length for all the sinks.
Table 11 shows the comparison of results for some of the ISPD 2010 benchmarks
(02.in, 03.in, 08.in), while Table 12 shows the results for some of the IBM benchmarks
(r1, r4, r5). It can be seen from Table 11 that the HC_MZSK method maintains a lower
maximum skew than the other methods. It also has a smaller local wire length than the
HC_DC, HC_DC_EXT and HC_M_AHHK (for some values of ‘c’) methods. For the ISPD benchmark
’02.in’, the wire length reduction using HC_M_AHHK (with c=0) is 2.55% (local wire
length) and 0.24% (global wire length) compared to HC_MZSK (modified zero skew method),
while the increase in the maximum skew is 42.53% for HC_M_AHHK (with c=0) compared to
HC_MZSK. A similar trend follows for the other ISPD test cases. Hence for the ISPD
benchmarks it is better to use the HC_MZSK method to reduce both the maximum skew and
the wire length rather than the HC_M_AHHK or HC_DC method. Also, HC_DC_EXT is based on
wire length balancing rather than on delay balancing like HC_MZSK. Thus it does not
give good performance in all the cases.
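
The percentages quoted for ’02.in’ can be reproduced from the values listed in Table 13 for H-tree level 4 and clock mesh level 5, which appear to be the same configuration. The sketch below is only a check of the arithmetic; it suggests that the quoted figures are normalized by the HC_M_AHHK (c=0) value, which is an inference from the numbers rather than something the text states. The helper name is illustrative.

```python
# '02.in' values from Table 13 (H-tree level 4, clock mesh level 5).
MZSK = {"skew_ps": 15.0, "local_cm": 9.25, "total_cm": 95.5}
AHHK_C0 = {"skew_ps": 26.1, "local_cm": 9.02, "total_cm": 95.27}

def percent_gap(a, b, base):
    """Absolute difference between a and b as a percentage of base."""
    return 100.0 * abs(a - b) / base

local_red = percent_gap(MZSK["local_cm"], AHHK_C0["local_cm"], AHHK_C0["local_cm"])
total_red = percent_gap(MZSK["total_cm"], AHHK_C0["total_cm"], AHHK_C0["total_cm"])
skew_inc = percent_gap(AHHK_C0["skew_ps"], MZSK["skew_ps"], AHHK_C0["skew_ps"])
# The same skew gap expressed relative to the HC_MZSK value, for comparison.
skew_inc_vs_mzsk = percent_gap(AHHK_C0["skew_ps"], MZSK["skew_ps"], MZSK["skew_ps"])

print(round(local_red, 2), round(total_red, 2), round(skew_inc, 2), round(skew_inc_vs_mzsk, 2))
```

Normalized by the HC_MZSK skew instead, the same 11.1 ps gap corresponds to an increase of about 74%, so the choice of base matters when comparing methods.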
In the IBM benchmarks HC_MZSK is not always the method with the minimum skew. For
example, for the r1 test case the maximum skew and the local and global wire lengths
for HC_MZSK and HC_M_AHHK (with c=0) are very similar. For the r5 benchmark, using
HC_M_AHHK with c=0 decreases the local wire length by 1.17% while increasing the skew
by 4.37% compared to the modified zero skew method (HC_MZSK). Since HC_M_AHHK with c=0
reduces to the MST method, it achieves a smaller local wire length than HC_MZSK, but at
a higher skew. Because HC_M_AHHK with c=0 aims at minimum wire length, the skew is
affected. Finding the value of c for HC_M_AHHK that gives both a small skew and a small
wire length is a hard task, since the optimum value of ‘c’ changes for each benchmark.
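
The role of ‘c’ can be illustrated with a minimal sketch of the conventional AHHK (Prim-Dijkstra) construction [40]. This is not the modified HC_M_AHHK used in this work: it grows a tree from a single source over Manhattan distances, and the sink coordinates and function names are hypothetical. At each step it attaches the unconnected node v to the connected node u that minimizes c * pathlength(u) + dist(u, v), so c=0 behaves like Prim’s MST and c=1 like Dijkstra’s shortest path tree.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def ahhk_tree(source, sinks, c):
    """Return (total wire length, radius) of the AHHK tree for trade-off parameter c."""
    nodes = [source] + list(sinks)
    path_len = {0: 0.0}   # source-to-node path length for nodes already in the tree
    parent = {}           # child index -> parent index
    while len(path_len) < len(nodes):
        best = None
        for u in path_len:
            for v in range(len(nodes)):
                if v in path_len:
                    continue
                d = manhattan(nodes[u], nodes[v])
                cost = c * path_len[u] + d
                if best is None or cost < best[0]:
                    best = (cost, u, v, d)
        _, u, v, d = best
        parent[v] = u
        path_len[v] = path_len[u] + d
    wire_length = sum(manhattan(nodes[v], nodes[u]) for v, u in parent.items())
    radius = max(path_len.values())   # longest source-to-sink path
    return wire_length, radius

source = (0, 0)
sinks = [(4, 0), (4, 4), (1, 4)]      # hypothetical sink locations
for c in (0.0, 0.5, 1.0):
    print(c, ahhk_tree(source, sinks, c))
```

On this toy input c=0 gives the shortest tree but the longest source-to-sink path, while larger c shortens the longest path at the cost of extra wire, which mirrors the wire length versus skew trade-off discussed above.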
 
Table 15. Summary of comparative results

Comparing               Comparative summary of results
Table 1 and Table 4     CDN with single mesh gives a smaller maximum skew and maximum
                        fall time, but uses more wire length, compared to CDN with
                        H-tree only.
Table 2 and Table 5     CDN with 2x2 reconfigurable mesh gives a smaller maximum skew and
                        maximum fall time, but uses more wire length, compared to CDN
                        with H-tree enabled with clock gating, for almost the same level
                        of power reduction granularity.
Table 3 and Table 6     CDN with 4x4 reconfigurable mesh gives a smaller maximum skew and
                        maximum fall time, but uses more wire length, compared to CDN
                        with H-tree enabled with clock gating, for almost the same level
                        of power reduction granularity.
Table 7 and Table 8     CDN with 2x2 and 4x4 reconfigurable mesh have an optimum value of
                        SZF = 50 for the ISPD '02.in' benchmark. A similar trend follows
                        for the other benchmarks.
Table 9 and Table 10    CDN with 2x2 and 4x4 reconfigurable mesh have optimum values
                        (considering the routing required for control signals) of
                        NTG = 17 and NTG = 9 respectively for the ISPD '02.in' benchmark.
Table 11 and Table 12   Among the different methods of making local connections, the
                        HC_MZSK method helps to reduce the maximum skew and maximum fall
                        time compared to using HC_DC, HC_DC_EXT and HC_M_AHHK (with
                        different c values). HC_MZSK gives a reasonable decrease in the
                        wire length. HC_M_AHHK with c = 0 has the lowest wire length in
                        most cases, but with a much higher skew.
Table 13                The maximum skew and maximum fall time for the HC_MZSK method
                        are lower than the values obtained using HC_DC, HC_DC_EXT and
                        HC_M_AHHK (with different c values), even after changing the
                        levels (densities) of the clock mesh and H-tree.
Table 14                The variation in the maximum skew, maximum fall time and latency
                        with and without blockages is small, while the wire length and
                        average power do vary between the simulations done on the clock
                        mesh with and without blockages.
 
 
Table 15 gives a comparative summary of the results of the various tables (Table 1
to Table 14) with a brief explanation of the impact on the different parameters.
Although the reduction in the skew and wire length for the IBM benchmarks is not as
high as for the ISPD benchmarks, we still get a slight improvement by using HC_MZSK.
Thus we can conclude that using the HC_MZSK method to make the local connections
reduces the maximum skew and local wire length to a certain extent compared to the
other methods.
 
8. CONCLUSION 
 
The clock distribution network using the clock mesh has inherent routing
redundancies, which lead to improvements in the clock skew and reliability. Because of
that, the clock mesh structure has less skew than the H-tree alone. The leaf level mesh
with a top level tree, like an H-tree, has been effective in reducing the skew
variation [31]. However, it is hard to introduce clock gating in a single mesh, so the
reconfigurable mesh can be used. This work analyzes the impact on the maximum clock
skew, the maximum rise or fall time, the average power and the wire length of different
levels (2x2 and 4x4 reconfigurable clock mesh) of ‘reconfigurability’ or clock gating.
The transmission gate is used as the control structure for sub-blocks in the
reconfigurable clock mesh. Variation in the size (SZF) and number of transmission gates
(NTG) has a noticeable impact on the clock skew, while the impact on the maximum fall
or rise time and on the average power is minimal. The single clock mesh structure for a
particular configuration gives a smaller clock skew and rise/fall time (better slew
rate) than the reconfigurable mesh. The reconfigurable clock mesh performs better than
the clock gated mesh structure with no transmission gates between sub-clock meshes.
Choosing the optimum NTG and SZF values will make the skew and rise time very close to
those of the single clock mesh for a very small increase in the average power.
In the single mesh and reconfigurable mesh the local routing is done from the clock
sinks to the clock grid using direct connections. The local routing methodology can be
improved to reduce the wire length used, and the slew rate can be improved (the maximum
rise/fall time can be reduced), by using the modified zero skew method. The modified
zero skew method (HC_MZSK), a modified version of the conventional zero skew algorithm
for local routing in a clock mesh, is compared with other methods: direct connection
(HC_DC) to the clock grid in the clock mesh, the modified AHHK (HC_M_AHHK) method for
clock mesh, and direct connection after elongating to balance the wire length for all
sinks (HC_DC_EXT). For most of the benchmarks the modified zero skew method (HC_MZSK)
reduces the wire length when compared to the other methods. Although the wire length
for HC_M_AHHK with c=0 is the smallest in most of the benchmarks, it has a very high
skew. So finding the optimal value of the ‘c’ factor is important for the HC_M_AHHK
method. The value of c which gives the optimal configuration varies between benchmarks.
Since the modified zero skew (HC_MZSK) method is based on Elmore delay balancing rather
than on wire length balancing, it tries to achieve minimum skew and a reasonable wire
length reduction.
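
The difference between delay balancing and wire length balancing can be seen in a minimal sketch of the conventional two-sink zero skew merge under the Elmore delay model [39]. This is the textbook merge step, not the modified HC_MZSK procedure of this work, and the electrical values are hypothetical numbers in arbitrary consistent units. A wire of length l with unit resistance r and unit capacitance c driving a load C has Elmore delay r*l*(c*l/2 + C); the tapping point is the split of the connecting wire at which both branch delays match.

```python
def elmore_wire_delay(r, c, length, load_cap):
    """Elmore delay of a uniform wire of the given length driving load_cap."""
    return r * length * (c * length / 2.0 + load_cap)

def zero_skew_split(r, c, length, t1, cap1, t2, cap2):
    """Fraction x of `length` placed on the branch to subtree 1 so that both delays match.

    t1, t2 are the existing subtree delays; cap1, cap2 their load capacitances.
    A result outside [0, 1] would require wire elongation, which is not handled here.
    """
    numerator = (t2 - t1) + r * length * (cap2 + c * length / 2.0)
    denominator = r * length * (c * length + cap1 + cap2)
    return numerator / denominator

r, c = 0.1, 0.2            # hypothetical unit resistance and unit capacitance
length = 100.0             # hypothetical distance between the two sinks
cap1, cap2 = 50.0, 10.0    # hypothetical sink capacitances (sink 1 is heavier)

x = zero_skew_split(r, c, length, 0.0, cap1, 0.0, cap2)
d1 = elmore_wire_delay(r, c, x * length, cap1)
d2 = elmore_wire_delay(r, c, (1.0 - x) * length, cap2)

# Wire length balancing would tap at the midpoint instead.
m1 = elmore_wire_delay(r, c, 0.5 * length, cap1)
m2 = elmore_wire_delay(r, c, 0.5 * length, cap2)
print("delay-balanced split:", x, "skew:", abs(d1 - d2))
print("midpoint split: 0.5 skew:", abs(m1 - m2))
```

With unequal sink loads the delay-balanced tapping point moves toward the heavier sink, whereas the midpoint leaves a residual skew even though both branches have equal length.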
The HC_MZSK method reduces the local wire length by a small margin and the maximum
skew by a larger percentage when compared to HC_DC (direct connection method). The chip
size, resistance per unit length and capacitance per unit length are also very high.
Thus the clock mesh density needed to achieve a skew in the range of 3 ps to 20 ps is
very high. The number of sub-blocks of the clock mesh used for most of the benchmarks
was either (16x16) or (32x32) to achieve the required skew. Because of the high mesh
density, the overall reduction in wire length obtained from the local wire length is
small in most of the cases. It is still better to use the HC_MZSK method for clock mesh
local routing to get improvements in clock skew, rise or fall time and wire length in
most of the benchmarks.
 
REFERENCES 
 
[1] H. Sato, A. Onozawa, and H. Matsuda, “A balanced-mesh clock routing 
technique using circuit partitioning,” in Proc. of the European Conf. on Design 
and Test, Paris, France, 1996, pp. 237-243. 
[2] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI. Menlo Park, 
CA: Addison-Wesley, 1990.  
[3] H. Bakoglu, J. Walker, and J. Meindl, “A symmetric clock distribution tree and 
optimized high speed interconnections for reduced clock skew in ULSI and WSI 
circuits,” in Proc. IEEE Int. Conf. Computer Design, Port Chester, NY, 1986, pp. 
118–122. 
[4] M.A.B. Jackson, A. Srinivasan and E.S. Kuh, “Clock routing for high 
performance ICs,” in Proc. ACM/IEEE Design Automation Conf., Orlando, FL, 
1990, pp. 573-579. 
[5] F. Minami and M. Takano, “Clock tree synthesis based on RC delay balancing,” 
in Proc. of IEEE Custom Integrated Circuits Conf, Boston, MA, 1992, pp. 1–28. 
[6] M. Edahiro, “A clustering-based optimization algorithm in zero-skew routings,” 
in Proc. of Design Automation Conf., Dallas, TX, 1993, pp. 612-616. 
[7] S. Dhar, M. Franklin, and D. Wann, “Reduction of clock delays in VLSI 
structures,” in Proc. of IEEE Int. Conf. on Computer Design, Port Chester, 1984, 
pp. 778–783. 
[8] J. Fishburn, “Shaping a VLSI wire to minimize Elmore delay,” in Proc. of the 
European Conf. on Design and Test, Paris, France, 1997, pp. 244-251. 
[9] Q. Zhu and W. Dai, “High-speed clock network sizing optimization based on 
distributed RC and lossy RLC interconnect models,” IEEE Trans. on Computer 
Aided Design of Integrated Circuits and Systems, vol. 15, pp. 1106–1118, 1996. 
[10] W. Elmore, “The transient response of damped linear networks with particular 
regard to wideband amplifiers,” Journal of Applied Physics, vol. 19, no. 1, pp. 
55-63, 1948. 
[11] J. Vlach, J. Barby, A. Vannelli, T. Talkhan, and C. Shi, “Group delay as an 
estimate of delay in logic,” IEEE Trans. on Computer-Aided Design of 
Integrated Circuits and Systems, vol. 10, no. 7, pp. 949–953, 1991. 
[12] A. Farrahi, C. Chen, A. Srivastava, G. Téllez, and M. Sarrafzadeh, 
“Activity-driven clock design,” IEEE Trans. on Computer Aided Design of Integrated 
Circuits and Systems, vol. 20, no. 6, pp. 705–714, 2001. 
[13] V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, and F. Baez, “Reducing 
power in high performance microprocessors,” in Proc. of the 35th Annual Design 
Automation Conf., San Francisco, CA, 1998, pp. 732-737. 
[14] A. Hemani, T. Meincke, S. Kumar, A. Postula, T. Olsson, P. Nilsson, J. Oberg, P. 
Ellervee, and D. Lundqvist, “Lowering power consumption in clock by using 
globally asynchronous locally synchronous design style,” in  Proc. of the 36th 
Annual Design Automation Conf., New Orleans, LA, 1999, pp. 873–878. 
[15] L. Benini and G. De Micheli, “Transformation and synthesis of FSMs for low-
power gated-clock implementation,” in Proc. of Int. Symp. on Low Power 
Design, Dana Point, CA, 1995, pp. 21-26. 
[16] L. Benini, G. D. Micheli, E. Macii, M. Poncino, and R. Scarsi, “Symbolic 
synthesis of clock-gating logic for power optimization of synchronous 
controllers,” ACM Trans. Des. Autom. Electron. Syst., vol. 4, no. 4, pp. 
351–375, 1999. 
[17] F. Emnett and M. Biegel, “Power reduction through RTL clock gating,” in The 
Synopsys Users Group (SNUG), San Jose, CA, 2000. 
[18] D. Brooks and M. Martonosi, “Value-based clock gating and operation packing: 
dynamic strategies for improving processor power and performance,” ACM 
Trans. on Computer Systems (TOCS), vol. 18, no. 2, pp. 89–126, 2000. 
[19] H. Li, S. Bhunia, Y. Chen, T. Vijaykumar, and K. Roy, “Deterministic clock 
gating for microprocessor power reduction,” in Proc. of the 9th Int. Symp. on 
High-Performance Computer Architecture, Anaheim, CA, 2003, pp. 113-122. 
[20] M. Franklin and D. Wann, “Asynchronous and clocked control structures for 
VLSI based interconnection networks,” in Proc. of the 9th Annual Symp. on 
Computer Architecture, Austin, TX, 1982, pp. 50–59. 
[21] M. Mori, H. Chen, B. Yao, and C. Cheng, “A multiple level network approach 
for clock skew minimization with process variations,” in Proc. of the Asia and 
South Pacific Design Automation Conf., Yokohama, Japan, 2004, pp. 263–268. 
[22] A. Rajaram and D. Pan, “MeshWorks: an efficient framework for planning, 
synthesis and optimization of clock mesh networks,” in Proc. of the Asia and 
South Pacific Design Automation Conf., Seoul, Korea, 2008, pp. 250–257. 
[23] J. Hu, A. Rajaram, and R. Mahapatra, “Reducing clock skew variability via cross 
links,” in Proc. of the ACM/IEEE Design Automation Conf., San Diego, CA, 
2004, pp. 18–23. 
[24] G. Venkataraman, N. Jayakumar, J. Hu, P. Li, S. Khatri, A. Rajaram, P. 
McGuinness, and C. Alpert, “Practical techniques to reduce skew and its 
variations in buffered clock networks,” in IEEE/ACM Int. Conf. on Computer-
Aided Design, San Jose, CA,  2005, pp. 592–596. 
[25] S. Tam, S. Rusu, U. Desai, R. Kim, J. Zhang, and I. Young, “Clock generation 
and distribution for the first IA-64 microprocessor,” IEEE Journal of Solid-State 
Circuits, vol. 35, no. 11, pp. 1545-1552, 2000. 
[26] P. Restle, C. Carter, J. Eckhardt, B. Krauter, B. McCredie, K. Jenkins, A. Weger, 
and A. Mule, “The clock distribution of the Power4 microprocessor,” in IEEE 
Int. Solid-State Circuits Conf., San Francisco, CA, 2002, vol. 1, pp. 144–145. 
[27] J. Hart, K. Lee, D. Chen, L. Cheng, C. Chou, A. Dixit, D. Greenley, G. Gruber, 
K. Ho, J. Hsu, et al.,  “Implementation of a fourth-generation 1.8-GHz dual-core 
SPARC V9 microprocessor,” IEEE Journal of Solid-State Circuits, vol. 41, no. 1, 
pp. 210–217, 2006. 
[28] R. Berridge, R. Averill III, A. Barish, M. Bowen, P. Camporese, J. DiLullo, P. 
Dudley, J. Keinert, D. Lewis, R. Morel, et al., “IBM POWER6 microprocessor 
physical design and design methodology,” IBM Journal of Research and 
Development, vol. 51, no. 6, pp. 685–714, 2007. 
[29] D. Dobberpuhl, R. Witek, R. Allmon, R. Anglin, D. Bertucci, S. Britton, L. 
Chao, R. Conrad, D. Dever, B. Gieseke, et al., “A 200-MHz 64-b dual-issue 
CMOS microprocessor,” IEEE Journal of Solid-State Circuits, vol. 27, no. 11, 
pp. 1555–1567, 1992. 
[30] J. Warnock, J. Keaty, J. Petrovick, J. Clabes, C. Kircher, B. Krauter, P. Restle, B. 
Zoric, and C. Anderson, “The circuit and physical design  of the POWER4 
microprocessor,” IBM Journal of Research and Development, vol. 46, no. 1, pp. 
27–51, 2002. 
[31] P. Restle, T. McNamara, D. Webber, P. Camporese, K. Eng, K. Jenkins, D. 
Allen, M. Rohn, M. Quaranta, D. Boerstler, et al., “A clock distribution network 
for microprocessors,” IEEE Journal of Solid-State Circuits, vol. 36, no. 5, pp. 
792–799, 2001. 
[32] X. Ye, M. Zhao, R. Panda, P. Li, and J. Hu, “Accelerating clock mesh simulation 
using matrix-level macromodels and dynamic time step rounding,” in Proc. of 
9th Int. Symp. on Quality Electronic Design, San Jose, CA, 2008, pp. 627–632. 
[33] G. Wilke, R. Fonseca, C. Mezzomo, and R. Reis, “A novel scheme to reduce 
short-circuit power in mesh-based clock architectures,” in Proc. of 21st Annual 
Symp. on Integrated Circuits and System Design, Gramado, Brazil, 2008, pp. 
117–122. 
[34] K. Boese and A. Kahng, “Zero-skew clock routing trees with minimum wire 
length,” in Proc. of the IEEE Int. Conf. on ASIC, Los Angeles, CA, 1992, pp. 
1.1.1-1.1.5. 
[35] H. Su and S. Sapatnekar, “Hybrid structured clock network construction,” in 
Proc. of Int. Conf. on Computer-aided Design, San Jose, CA, 2001, pp. 333-336. 
[36] T. Chao, J. Ho, and Y. Hsu, “Zero skew clock net routing,” in Proc. of the 29th 
Design Automation Conf., Anaheim, CA, 1992, pp. 518-523. 
[37] N. Chou and C. Cheng, “Wire length and delay minimization in general clock net 
routing,” in Proc. of the Int. Conf. on Computer-Aided Design, Santa Clara, CA, 
1993, pp. 552–555. 
[38] J. Cong, A. Kahng, C. Koh, and C. Tsao, “Bounded-skew clock and Steiner 
routing under Elmore delay,” in Proc. of the IEEE/ACM Int. Conf. on Computer-
aided Design, San Jose, CA, 1995, pp. 66–71. 
[39] R. Tsay, “An exact zero-skew clock routing algorithm,” IEEE Trans. on 
Computer-aided Design of Integrated Circuits and Systems, vol. 12, no. 2, pp. 
242–249, 1993. 
[40] C. Alpert, T. Hu, J. Huang, and A. Kahng, “A direct combination of the Prim and 
Dijkstra constructions for improved performance-driven global routing,” in IEEE 
Int. Symp. on Circuits and Systems, Chicago, IL, 1993, pp. 1869–1872. 
[41] R. Gupta, B. Tutuianu, and L. Pileggi, “The Elmore delay as a bound for RC 
trees with generalized  input signals,” IEEE Trans. on Computer-aided Design of 
Integrated Circuits and Systems, vol. 16, no. 1, pp. 95–104, 1997.  
APPENDIX 
 
The maximum skew results obtained for the 2x2 and 4x4 reconfigurable clock mesh are
compared with the maximum skew for the single mesh and the maximum skew of the clock
mesh without transmission gates (control structures) between the sub-clock meshes
(NTG = 0). They are presented below for the different ISPD 2010 benchmarks.
The variation of the maximum skew with NTG is also shown for the ’02.in’ benchmark
with SZF = 100, although the optimum for the same configuration is SZF = 50 with
NTG = 17.
 
Clock distribution network: 2x2 reconfigurable clock mesh
(clock mesh of level 5 and H-tree of level 4)

Benchmark  Max skew with  Optimum max skew with  Optimum  Optimum  Max skew with no TG
           single mesh    2x2 reconfigurable     SZF      NTG      between the sub-clock
           (ps)           clock mesh (ps)                          meshes (ps)
01.in      11             11.4                   50       17       11.6
02.in      16.9           18.7                   50       17       20.4
03.in      2.52           2.57                   100      17       2.62
04.in      3.93           3.98                   100      17       4.12
05.in      2.85           3.03                   80       17       3.4
06.in      4.11           5.34                   60       17       5.97
07.in      5.75           5.84                   40       17       5.91
08.in      5.35           5.72                   30       17       5.97
 
 
 
 
Clock distribution network: 4x4 reconfigurable clock mesh
(clock mesh of level 5 and H-tree of level 4)

Benchmark  Max skew with  Optimum max skew with  Optimum  Optimum  Max skew without
           single mesh    4x4 reconfigurable     SZF      NTG      transmission gates
           (ps)           clock mesh (ps)                          between the sub-clock
                                                                   meshes (ps)
01.in      11             12.5                   50       9        13.4
02.in      16.9           18.8                   50       9        20.4
03.in      2.52           2.63                   80       9        3.71
04.in      3.93           3.99                   60       9        4.67
05.in      2.85           3.01                   40       9        3.84
06.in      4.11           5.4                    60       9        7.06
07.in      5.75           6.38                   60       9        7.97
08.in      5.35           6.69                   60       9        8.54
 
 
Benchmark: 02.in
Clock distribution network: H-tree (of level 4) with 2x2 reconfigurable mesh (of level 5)
with clock gating (level = 1), keeping the transmission gate size factor (SZF) value
as 100

Number of TG connections  Max skew  Max fall   Average     Wire length  Max latency
(NTG) for each sub-block  (ps)      time (ps)  power (mW)  (cm)         (ns)
17                        19        78         545.7       98.84        2.46
9                         18.8      77.47      545.08      98.84        2.46
5                         19.4      77.13      544.66      98.84        2.46
3                         20.2      76.92      544.41      98.84        2.46
2                         20.3      76.7       544.38      98.84        2.46
1                         20.3      76.2       544.22      98.84        2.46
0                         20.4      76.71      544.03      98.84        2.46
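
The NTG sweep above can be summarized with a few lines of arithmetic. The following sketch copies the listed values for ’02.in’ at SZF = 100, picks the NTG with the smallest maximum skew, and compares how much the skew and the average power move across the sweep. The variable names are illustrative.

```python
# (NTG, max skew ps, max fall time ps, average power mW) for '02.in', 2x2 mesh, SZF = 100.
SWEEP = [
    (17, 19.0, 78.0, 545.7),
    (9, 18.8, 77.47, 545.08),
    (5, 19.4, 77.13, 544.66),
    (3, 20.2, 76.92, 544.41),
    (2, 20.3, 76.7, 544.38),
    (1, 20.3, 76.2, 544.22),
    (0, 20.4, 76.71, 544.03),
]

best_ntg, best_skew, _, _ = min(SWEEP, key=lambda row: row[1])
skew_spread = max(row[1] for row in SWEEP) - min(row[1] for row in SWEEP)
power_spread = max(row[3] for row in SWEEP) - min(row[3] for row in SWEEP)

print("NTG with the smallest max skew:", best_ntg, "at", best_skew, "ps")
print("skew spread (ps):", round(skew_spread, 2))
print("power spread (mW):", round(power_spread, 2))
```

In this sweep the skew varies by 1.6 ps across NTG values while the average power varies by less than 2 mW out of roughly 545 mW.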
 
VITA 
 
Sundararajan Ramakrishnan received his Bachelor of Engineering degree in 
electronics and communication engineering from PSG College of Technology, 
Coimbatore, India in 2008. He joined the computer engineering program at Texas A&M 
University in September 2008 and graduated with a Master of Science degree in August 
2010. His research interests include VLSI CAD algorithms for physical design 
automation, logic design, ASIC design and verification. 
 He may be reached at the Department of Electrical and Computer Engineering, 
Texas A&M University, College Station, TX 77843-3128. His email is 
ramsrajan@tamu.edu. 
 
