












Dissertation Title
 
Optimization of Clock Gating



























June 2012
Power consumption has become a major concern for the reliability of
semiconductor products, especially with the rapid spread of portable
devices such as smartphones in recent years. A major source of power
consumption is the clock tree, which may account for 45% of the system
power, and clock gating is a widely used technique to reduce this portion
of the power dissipation. The basic idea of clock gating is to reduce the
dynamic power consumption of registers by selectively switching off
unnecessary clock signals to the registers, depending on a control signal,
without violating functional correctness. With proper control signals,
clock gating may lead to a considerable power reduction for the overall
system.
Since the clock gating logic itself consumes chip area and power, it is
imperative to minimize the number of inserted clock gating cells and their
switching activity for power optimization. Commercial tools support clock
gating as a power optimization feature, based on the guard signals
described in the HDL and on the minimum number of registers per inserted
clock gating cell, which is specified as a synthesis option (the
structural method). However, this approach requires manual identification
of the proper control signals and of the proper grouping of registers to
be gated, which is hard and labor-intensive work for the designer.
Automatic clock gating generation and optimization is therefore necessary.
In this dissertation, we focus on the optimization of clock gating logic
based on switching activity analysis. This includes the extraction of
clock gating control candidates from internal signals of the original
design and the selection of the optimum control signals, considering the
sharing of a clock gating cell among multiple registers for power and area
optimization. An optimization method for single-stage clock gating logic,
which reduces the dynamic power of registers, is proposed first. It is
then extended to multi-stage clock gating in order to also reduce the
power dissipation of the clock gating cells. The proposed method supports
automatic clock gating generation in combination with a widely used
commercial tool for real-life applications.
This dissertation consists of 5 chapters, organized as follows.
Chapter 1 [Introduction] summarizes power reduction methods in current
LSI design, together with the background and related work on the clock
gating technique. Based on the previous research, we show the basic idea
of the proposed clock gating optimization method for power and area
reduction, which can be applied to both single-stage and cascaded
multi-stage clock gating. The organization of the dissertation is also
described in this chapter.
Chapter 2 [Preliminaries] gives a detailed introduction to clock gating
techniques, such as latch-free and latch-based clock gating, enhanced
clock gating, multi-stage clock gating and hierarchical clock gating. The
Binary Decision Diagram (BDD) is also introduced for logic function
manipulation. It is the basis of the proposed methods, where it is used to
check whether the clock gating condition is satisfied and to compute the
1-probability (the switching activity of the gated registers) of each
clock gating control candidate for the minimum cost computation.
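
As a minimal sketch of the 1-probability computation mentioned above, the following Python function applies Shannon expansion, which is the same recursion a BDD traversal performs node by node. It is not the dissertation's implementation. It assumes statistically independent inputs, it does not build a reduced BDD (so it is exponential in the number of variables), and the name `one_probability` and the example signals `a` and `b` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

def one_probability(
    f: Callable[[Dict[str, bool]], bool],
    variables: List[str],
    probs: Dict[str, float],
    assignment: Optional[Dict[str, bool]] = None,
) -> float:
    """Probability that f evaluates to 1, by Shannon expansion.

    Assumption: inputs are independent, and probs[x] is the
    probability that input x is 1.
    """
    assignment = dict(assignment or {})
    if not variables:
        # All variables are fixed: f is a constant 0 or 1.
        return 1.0 if f(assignment) else 0.0
    x, rest = variables[0], variables[1:]
    p = probs[x]
    # P(f) = p * P(f | x=1) + (1 - p) * P(f | x=0)
    hi = one_probability(f, rest, probs, {**assignment, x: True})
    lo = one_probability(f, rest, probs, {**assignment, x: False})
    return p * hi + (1.0 - p) * lo

# Hypothetical control candidate: the clock is passed when a is 1 and b is 0.
enable = lambda v: v["a"] and not v["b"]
p_enable = one_probability(enable, ["a", "b"], {"a": 0.5, "b": 0.2})
```

A real BDD package would share identical subgraphs and cache the result per node, which keeps the computation linear in the BDD size rather than exponential in the number of inputs.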
Chapter 3 [Switching Activity Based Single-Stage Clock Gating] discusses
our switching activity based single-stage optimization algorithm using
BDDs. In order to deal with the trade-off between the power savings of the
gated registers and the power penalty of the synthesized clock gating
logic, we newly formalize the control signal selection phase, considering
the sharing of a clock gating control among multiple registers so as to
minimize the number of inserted clock gating cells. A coefficient α is
introduced to measure the cost of a clock gating cell, which depends on
the technology library. α is the ratio of the power consumption of a clock
gating cell to that of a flip-flop, and is measured as 0.6 to 0.8. We
devise a switching activity based evaluation method for dynamic power
consumption, and in experiments using a commercial tool we confirm that
our evaluation follows the same trend as the actual power consumption
after layout. We develop BDD-based methods by adding a mechanism that
computes the minimum cost path in the BDD, which corresponds to the
optimum power reduction of the circuit, and that reports the path
information needed for clock gating control signal insertion, given the
input probabilities. Control candidate pruning is also introduced to
effectively speed up the method.
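
The trade-off described above can be illustrated with a small, hedged Python sketch. The cost model is an assumption made for illustration, not the dissertation's exact formulation: power is counted in units of one always-clocked flip-flop, each inserted clock gating cell costs `alpha`, and a gated register costs the 1-probability of its control signal. The exhaustive search stands in for the BDD-based minimum cost path computation, and the names `select_controls`, `en_a` and `en_b` are hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Tuple

UNGATED = None  # marker for "leave this register ungated"

def select_controls(
    candidates: Dict[str, Dict[str, float]],
    alpha: float = 0.7,
) -> Tuple[float, Dict[str, Optional[str]]]:
    """Pick one control (or none) per register to minimize the assumed cost.

    candidates[reg][ctrl] is the 1-probability of control `ctrl`, i.e. the
    fraction of cycles in which `reg` is still clocked when gated by `ctrl`.
    Assumed cost: alpha per distinct gating cell + clock activity per register.
    """
    regs = sorted(candidates)
    # Every register may also stay ungated, which costs 1.0 and needs no cell.
    options = [[(UNGATED, 1.0)] + sorted(candidates[r].items()) for r in regs]
    best_cost, best_choice = float("inf"), {}
    for combo in product(*options):
        # Registers choosing the same control share one gating cell.
        cells = {ctrl for ctrl, _ in combo if ctrl is not UNGATED}
        cost = alpha * len(cells) + sum(p for _, p in combo)
        if cost < best_cost:
            best_cost = cost
            best_choice = {r: ctrl for r, (ctrl, _) in zip(regs, combo)}
    return best_cost, best_choice

# Hypothetical example: r0 could use a slightly better private control en_a,
# but sharing en_b with r1 and r2 avoids paying for a second gating cell.
cost, choice = select_controls({
    "r0": {"en_a": 0.2, "en_b": 0.3},
    "r1": {"en_b": 0.3},
    "r2": {"en_b": 0.35},
})
```

In this example all three registers end up on `en_b`, because the extra cell for `en_a` would cost more than the 0.1 of clock activity it saves. Exhaustive enumeration is only practical for a handful of registers, which is why a pruned, structured search is needed on real circuits.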
With the proposed method, a power reduction of 19.1% to 71.9% has been
obtained on counter circuits after layout, and a cost reduction of 2.3% to
18.0% on ISCAS89 and Opencore benchmark circuits. An improvement of about
2% over previous research has been achieved. With control candidate
pruning, 69% of the candidates have been pruned on the benchmark circuits.
Chapter 4 [Automatic Optimization of Multi-Stage Clock Gating Logic]
presents an automatic multi-stage clock gating optimization method based
on an Integer Linear Programming (ILP) formulation.
In single-stage clock gating, the clock gating cell itself consumes power
in proportion to α (0.6 to 0.8 relative to a flip-flop). With cascaded
multi-stage clock gating, unnecessary clock pulses to clock gating cells
can be blocked by other clock gating cells at earlier cascaded stages, so
that the switching activity of the clock gating cells is reduced.
Commercial tools can insert multi-stage clock gating, but only according
to the structure of the guard signals described by the designer. We
therefore extend the single-stage method and propose an automatic
multi-stage clock gating method.
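
The benefit of cascading can be sketched numerically. The following Python fragment uses an assumed cost model, not the dissertation's ILP objective: power is counted in units of one always-clocked flip-flop, an always-clocked gating cell costs `alpha`, and a second-stage cell whose clock is passed by a first-stage cell with probability `parent_prob` costs `alpha * parent_prob`. The function names and the example numbers are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[int, float]  # (number of registers, 1-probability of their control)

def single_stage_cost(groups: List[Group], alpha: float = 0.7) -> float:
    """Each group has its own gating cell, and every cell is always clocked."""
    return sum(alpha + n * p for n, p in groups)

def two_stage_cost(groups: List[Group], parent_prob: float,
                   alpha: float = 0.7) -> float:
    """One first-stage cell feeds the clock to all second-stage cells.

    Assumption: a group's control can only be 1 when the parent control is 1,
    so its 1-probability cannot exceed parent_prob.
    """
    for _, p in groups:
        if p > parent_prob:
            raise ValueError("child control must imply the parent control")
    # The parent cell is always clocked; children see the clock only with
    # probability parent_prob, which scales their own switching power.
    return alpha + sum(alpha * parent_prob + n * p for n, p in groups)

# Hypothetical example: two groups of four registers under a common parent.
groups = [(4, 0.1), (4, 0.2)]
flat = single_stage_cost(groups)          # two always-clocked cells
cascaded = two_stage_cost(groups, 0.25)   # one parent cell, two gated cells
```

Under these assumptions the cascaded structure is cheaper here, although it uses three cells instead of two, because the second-stage cells toggle in only a quarter of the cycles. With a high parent probability or very few child cells the extra cell would not pay off, which is the kind of decision an optimizer has to make per design.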
In this chapter, an automatic multi-stage clock gating optimization
method using an ILP formulation is proposed and discussed. The method
includes the extraction of combinations of clock gating control
candidates, the construction of constraints in LP format, and the
selection of the optimum control signals at the cascaded clock gating
stages, considering the sharing of a clock gating control among multiple
registers and clock gating cells. We find that any multi-stage control
signal is also a single-stage control signal, and that any combination of
signals can therefore be selected from the single-stage candidates. We
also develop an automated clock gating tool that adds the guard conditions
at the cascaded stages to the structural Verilog and determines the
optimum minimum_bitwidth value. The result is translated into multi-stage
clock gating logic by commercial EDA tools, following the standard
synthesis and layout procedures for real-life applications.
Post-layout power estimation was performed with Synopsys NanoSim on 8
benchmark circuits (ISCAS89, Opencore and interface circuits) and on a Low
Density Parity Check (LDPC) decoder (6.6K gates, 212 flip-flops). On
average, an actual power reduction of 35% has been achieved compared with
the original designs, and a 31% improvement over the structural gating
approach has been obtained. The CPU time for the optimum multi-stage
control selection using a commercial ILP solver (IBM CPLEX) is several
seconds for up to 25K variables in LP format. In addition to the actual
power reduction, an area reduction of up to 30% has also been obtained
compared with the original designs without clock gating, due to the
reduction of the multiplexers that control the register banks. By
replacing these multiplexers with clock gating logic shared by those
registers, the corresponding multiplexer area is eliminated. No setup or
hold timing violations and no skew violations were observed after
implementing multi-stage clock gating.
Chapter 5 [Conclusion] summarizes the proposals and draws the conclusions
of this dissertation. Future work on the system-level application of
multi-stage clock gating, in accordance with the newest semiconductor
process technology, is also discussed.
