In this paper we describe the development of a modeling and simulation [...]
[...] different frequencies. In the microprocessor, global clock distribution [...]

Proof. We [...] that the SCDV base pipeline is run at a voltage V [...]
[...] dependent instructions. The memory flow consists of issuing loads to the data cache and forwarding data to dependent instructions. Introducing high latencies in any of these three crucial flows will have an impact on the processor's performance.
The level 1 instruction cache and the branch predictor taken together are a good candidate for one such block [...] to the
Dynamic Clock and Voltage Management
Experimental Results
To assess the performance and power of our proposed GALS processor [...] Spec95 and Mediabench benchmarks. To this end we have performed two [...]
Figures 5, 6 and 7 show the relative average power and total energy consumption of the MCSV processor, normalized against the total energy and average power consumption of the synchronous processor respectively. With the removal of the global clock, we get in some cases up to 5% reduction in the total energy, whereas in some cases the power savings due to global clock grids are offset by the extra power consumption due to the longer recovery pipeline and due to slower [...] updates. This is because the addition of asynchronous communication channels leads to an increase in the effective length of the pipeline. This increase in pipeline length in the MCSV processor also leads to 17% higher speculative execution on average across the benchmarks we tested. Similarly, the average number of in-flight instructions in the pipeline is higher in the [...]
In the case of gcc, perl, fpppp and epic, MCDV and MCDV-P are either comparable to, or up to 8% better than, their synchronous counterpart. Overall, an average power reduction of 22% is achieved at 12% loss in performance.
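The trade-off between the 22% power reduction and the 12% performance loss can be translated into a rough energy estimate using E = P x t. The sketch below is only an illustration: the text does not say how "loss in performance" is measured, so two readings are shown side by side. Both are assumptions, and the function name `relative_energy` is hypothetical. The resulting numbers are not results reported in this paper.

```python
def relative_energy(power_ratio: float, time_ratio: float) -> float:
    """Energy relative to the baseline, using E = P * t."""
    return power_ratio * time_ratio


# 22% average power reduction, taken from the text above.
power_ratio = 1.0 - 0.22

# Assumption A: "12% loss in performance" means throughput falls to 0.88x,
# so execution time grows by a factor of 1 / 0.88.
time_ratio_a = 1.0 / (1.0 - 0.12)

# Assumption B: "12% loss in performance" means execution time grows by 12%.
time_ratio_b = 1.0 + 0.12

energy_a = relative_energy(power_ratio, time_ratio_a)
energy_b = relative_energy(power_ratio, time_ratio_b)

print(f"Relative energy (assumption A): {energy_a:.3f}")
print(f"Relative energy (assumption B): {energy_b:.3f}")
```

Under either reading, the energy saving is noticeably smaller than the power saving, because the longer execution time partly cancels the lower average power.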
Finally, we show the breakdown of average power values for a few benchmarks considered in the base (fully synchronous) and GALS (MCSV) cases. As can be seen in Figure 8, the total clock power is reduced due to the elimination of the global clock. However, due to increased execution time and higher speculation, the power consumed by the front-end increases slightly, as does the power consumption for the D-cache and execution core.
6 Conclusion
In this paper, we have used a power/performance modeling and analysis framework for high-end superscalar out-of-order processors using multiple clocks, and possibly multiple, dynamically adjustable voltages to
