This paper presents a design and optimization technique for the Multiple Restricted Multiplication problem [1]. This refers to a situation where a single variable is multiplied by several coefficients which, while not constant, are drawn from a finite set of constants that change with time. The approach exploits dedicated registers in FPGA architecture for further time-step-based optimization over previous approaches [1, 2]. It is also combined with an effective technique, based on high-level power modelling, for power optimization. The problem is formulated as an integer linear program to find minimum-cost solutions. The new approach results in up to 22% area saving compared to the optimal non-register approach in [1], and 80% of all results also show 21%-48% power savings.
1. INTRODUCTION AND BACKGROUND
In the past, there has been some work on a multiplication problem where, for each multiplier, one multiplicand consists of a set of coefficients which change with time. Such an operation can be considered as time-multiplexed multiplication [1, 2, 3, 4]. A type of this problem, referred to as Multiple Restricted Multiplication (MRM), was proposed in [1]. MRM refers to a situation where a single variable is multiplied by several coefficients which, while not constant, are drawn from a relatively small set of values. Such a situation commonly appears in folded implementations of FIR filters [5] and in polynomial evaluation [6, 7]. Fig. 1(a) illustrates a general structure that is a straightforward approach, using ROMs and multipliers, to implementing this multiplication. The structure in Fig. 1(a) contains several multipliers with a common input $x$, and sets of constant multiplicands labelled $\{c_{11}, c_{12}, \ldots, c_{1T}\}$, $\{c_{21}, c_{22}, \ldots, c_{2T}\}$, $\ldots$, $\{c_{C1}, c_{C2}, \ldots, c_{CT}\}$, where $C$ and $T$ are the number of sets (or outputs) and the number of coefficients in each set, respectively. The first subscript refers to the spatial index and the second to the time index, i.e. $c_{it}$ is the value of multiplicand $i$ at time $t$. In [1], it was shown that the MRM problem can be addressed by extending the basic unit of operation from an addition, as used in Multiple Constant Multiplication (MCM) [8], to the adder/multiplexer combination shown in Fig. 1(b). As an example, assume we have a single variable input $x$ multiplied by two sets of coefficients {165, 132, 32} and {40, 32, 8}.

The work on this paper was supported by a Royal Society grant (2004/R1).
An optimized implementation of this MRM problem can be seen in Figs. 1(c)-(e), and is described below.
Fig. 1(c) is recognizable as a standard MCM solution, containing two addition nodes and generating the two values 165x and 40x. Figs. 1(d) and (e) illustrate how the same Data Flow Graph (DFG) structure can be used to obtain the remaining coefficients by selecting the behaviour of the nodes from the possibilities shown in Fig. 1(b).
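The example can be made concrete with a short sketch. One two-node structure consistent with the description of Fig. 1(c) computes 5x = x + 4x at node 1 and 165x = 5x + 160x at node 2, with 40x taken from node 1 through a fixed left shift of 3. The shift amounts and the names `node` and `evaluate` are assumptions for illustration, since the figure itself is not reproduced here. Routing and shifts stay fixed for every time step; only each node's selection (a, b or a + b) changes.

```python
# Hypothetical two-node time-multiplexed DFG for the MRM example
# {165, 132, 32} and {40, 32, 8}. Values are coefficients of x.


def node(op: str, a: int, b: int) -> int:
    """Adder/multiplexer node: output a, b or a + b depending on the select."""
    if op == "a":
        return a
    if op == "b":
        return b
    if op == "a+b":
        return a + b
    raise ValueError(f"unknown operation {op!r}")


def evaluate(ops: tuple[str, str], x: int = 1) -> tuple[int, int]:
    """Evaluate the fixed-topology DFG for one time step."""
    op1, op2 = ops
    # Node 1: a = x, b = x << 2 (assumed shifts, fixed across time steps).
    n1 = node(op1, x, x << 2)
    # Node 2: a = n1, b = n1 << 5.
    n2 = node(op2, n1, n1 << 5)
    # Output 1 is node 2; output 2 is node 1 shifted left by 3 (a wired shift).
    return n2, n1 << 3


# Per-time-step select values: the only signals allowed to change with time.
schedule = [("a+b", "a+b"), ("b", "a+b"), ("a", "b")]
outputs = [evaluate(ops) for ops in schedule]
print(outputs)  # [(165, 40), (132, 32), (32, 8)]
```

Both outputs reuse the same two nodes at every step, which is the resource sharing that the adder/multiplexer node makes possible.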
Recent work on time-multiplexed multiplication related to MRM has appeared in the literature [2, 4]. These approaches essentially work by constructing such multiplication as a combination of adder/subtractor/multiplexer nodes. Their most important feature is that, for Xilinx-based implementations, implementing these nodes requires the same resources as an addition. This provides an efficient design in terms of area. In [4], it was shown that the adder/subtractor-multiplexer combination can be configured as up to 63 different cells.
These are applied in a design methodology for implementing multipliers with a limited range of coefficients. The work in [2] has begun to address this problem using the type of computational node demonstrated in [4]. Due to the growing demand for low-power electronic applications, power consumption has become a critical issue. As a result, in this paper we present a design and optimization technique for the MRM problem that extends previous approaches to exploit the dedicated registers in a Virtex slice, and introduces a technique to optimize power consumption.
For a given set of coefficients, it is possible to have more than one feasible solution with the same area. Although they are equivalent in area terms, the solutions differ in the way the functions of each node are performed as well as in their graph topologies. Changing a node from one function to another causes signal transitions, or switching activity, and this activity differs between solutions. From the power consumption point of view, it is advantageous to minimize the switching. The proposed approach introduces a high-level power macro-model which is a function of the number of adder/multiplexer nodes and of the switchings at output nodes caused by changes of node function. Empirical evidence shows that these parameters have a considerable impact on power optimization. The parameters are used to form a cost function that guides the optimization process when it is formulated as an Integer Linear Program (ILP) [9].
This paper therefore makes the following novel contributions:
1. To our knowledge, the first algorithm dealing with time-multiplexed multiplication that exploits dedicated registers for optimized FPGA-based implementation.
2. The first quantification of the power savings possible in time-multiplexed multiplication.
3. Introduction of power macro-modelling for the MRM problem.
4. An extended formulation of the minimum-area MRM problem and its power cost as an ILP.
The rest of this paper is organized as follows: Section 2 demonstrates an efficient implementation of the adder/multiplexer exploiting the dedicated register on the Xilinx Virtex family of FPGAs for MRM solutions. Section 3 presents a general DFG representation for the MRM problem, and its formulation as an ILP is given in Section 4. An approach for power optimization is described in Section 5. Section 6 collects results and concludes the paper.
2. AN EFFICIENT IMPLEMENTATION OF ADDER/MULTIPLEXER WITH DEDICATED REGISTER

Node Functions of Adder/Multiplexer with Register for the MRM Problem
It can be seen that the 4-input LUT can be configured as various functions. If we allow the dedicated register output (the q variable in Fig. 2(b)) to be an input to those functions, more configurations become possible than in previous work [1]. This increases the opportunity for resource sharing, leading to a reduction in hardware resources. Since the LUT has 4 inputs, we distinguish two conditions: either 2 variables and 2 selectors, allowing switching between up to 4 functions, or 3 variables and 1 selector, allowing switching between up to 2 functions. These are shown in Fig. 2(c).
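The LUT input budget described above can be expressed as a simple check: the distinct data variables read by a set of functions, plus the select lines needed to choose among them, must not exceed 4. The sketch below is a hypothetical helper, not the authors' enumeration. It assumes per-bit inputs (the carry chain is not counted), treats 2q as reading only q, and tests only this necessary condition, so it may accept sets that Table 1 excludes.

```python
from itertools import combinations
from math import ceil, log2

# Data inputs read by each of the seven node functions (q = register output).
FUNCTION_INPUTS = {
    "a": {"a"},
    "b": {"b"},
    "a+b": {"a", "b"},
    "a+q": {"a", "q"},
    "b+q": {"b", "q"},
    "2q": {"q"},
    "q": {"q"},
}

LUT_INPUTS = 4


def selectors_needed(n_functions: int) -> int:
    """Select lines required to choose among n_functions options."""
    return 0 if n_functions <= 1 else ceil(log2(n_functions))


def fits_in_lut(functions: set[str]) -> bool:
    """Necessary condition: variables plus select lines fit in a 4-input LUT."""
    variables = set().union(*(FUNCTION_INPUTS[f] for f in functions))
    return len(variables) + selectors_needed(len(functions)) <= LUT_INPUTS


# Every set of two or more functions that passes the budget check.
candidates = [
    set(combo)
    for size in range(2, len(FUNCTION_INPUTS) + 1)
    for combo in combinations(FUNCTION_INPUTS, size)
    if fits_in_lut(set(combo))
]
```

For example, {a, b, a + b} uses 2 variables and 2 selectors, matching the first condition, while {a + q, b + q} uses 3 variables and 1 selector, matching the second.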
Our approach focuses on 7 common functions: a, b, a + b, a + q, b + q, 2q and q. The possible configurations are classified into three types (Type 1, 2 and 3) and listed in columns by type in Table 1.

Table 1: Listing of all possible configurations by type

For example, there is only one configuration of Type 1, which allows the selector to switch between the operations a, b and a + b. For Type 2, there are two possible configurations, with the options shown in Table 1. The number of outputs, which we shall denote C, corresponds to the number of sets of time-varying coefficients. For T time steps, we require T repetitions of the structure overall; all signals that control each corresponding shifter and model multiplexer are tied together. This ensures that the shifting and routing for all graphs (all T) are the same. The only select line allowed to change with time is the select line internal to each adder/multiplexer node, which can be changed to achieve the desired output values, as illustrated in Table 1.
3. REPRESENTING GENERAL DFGS FOR THE MRM PROBLEM
Fig. 3(b) shows a portion of the general structure and the detail of a computational node. To perform the function that corresponds to the register, the output of the $i$th node of the model in the previous time step $t-1$ ($x_{i,t-1}$) is an input to the $i$th node of the model at time step $t$. All notation introduced here is explained in detail in the ILP formulation of Sections 4 and 5.
4. TRANSFORMATION INTO ILP FORMULATION
An instance of the problem is encoded as a $T \times C$ matrix, where $T$, the number of rows, corresponds to the number of time steps and $C$ is the number of columns representing outputs. As the number of computational nodes corresponds to the area, finding the minimum-node solution gives the area-optimal solution for the $T \times C$ problem. In this section, a set of ILP constraints is presented which is feasible iff the problem can be solved using a fixed number, $N$, of nodes.
There are three main components: the model multiplexer, the shifter and the adder/multiplexer with register, as shown in the proposed model in Fig. 3. The model consists of integer and binary variables. In (1), the binary decision variables $m_{i,r}$ represent the selection of input $x_{r,t}$ to the model multiplexer for node $i$ at time step $t$, where $r \in \{0, 1, \ldots, i-1\}$. In (2), the variables $q_{i,k}$ represent the degree of shifting: $q_{i,k} = 1$ means that $a_{i,t}$ is the value $c_{i,t}$ shifted left by $k$ bits, where $k \in \{0, 1, \ldots, B-1\}$ and $B$ is the number of bits used. Finally, in (3), the binary variables $o_{i,t,p}$ represent which of the seven operations listed in Table 1 is to be performed at node $i$ during time step $t$, where $p \in \{0, 1, \ldots, 6\}$.
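The three components can be read as a per-time-step simulation of the model. The sketch below is a hypothetical evaluator, not the authors' tool. It assumes that option 0 is $a_{i,t}$ (the seven options following the order of the Section 2 function list), that the right input $b_{i,t}$ comes from a second, unshifted multiplexer selected by the $n_{i,r}$ variables that appear in (24)-(27), that values are tracked as coefficients of $x$ (so node 0, the input, holds 1), and that every register holds 0 before the first time step.

```python
# Output of each option p: a = a_{i,t}, b = b_{i,t}, reg = x_{i,t-1}.
OPTIONS = [
    lambda a, b, reg: a,          # option 0 (assumed to be a_{i,t})
    lambda a, b, reg: b,          # option 1
    lambda a, b, reg: a + b,      # option 2
    lambda a, b, reg: a + reg,    # option 3
    lambda a, b, reg: b + reg,    # option 4
    lambda a, b, reg: 2 * reg,    # option 5
    lambda a, b, reg: reg,        # option 6
]


def simulate(left_src, right_src, shift, ops):
    """Return one list [x_{0,t}, x_{1,t}, ..., x_{N,t}] per time step t.

    left_src[i-1]:  r with m_{i,r} = 1 (model multiplexer)
    right_src[i-1]: r with n_{i,r} = 1 (assumed right-input multiplexer)
    shift[i-1]:     k with q_{i,k} = 1 (shifter)
    ops[t][i-1]:    p with o_{i,t,p} = 1 (adder/multiplexer with register)
    """
    n_nodes = len(left_src)
    prev = [0] * (n_nodes + 1)          # assumed register contents before t = 1
    history = []
    for step_ops in ops:
        x = [1] + [0] * n_nodes         # x_{0,t} = 1 stands for the input x
        for i in range(1, n_nodes + 1):
            c = x[left_src[i - 1]]              # model multiplexer output c_{i,t}
            a = c << shift[i - 1]               # shifter: a_{i,t} = 2^k c_{i,t}
            b = x[right_src[i - 1]]             # unshifted right input b_{i,t}
            x[i] = OPTIONS[step_ops[i - 1]](a, b, prev[i])
        history.append(x)
        prev = x                        # registers capture this step's outputs
    return history


# One node with a = x << 2 and b = x: 5x, then 2 * 5x = 10x, then 4x + 10x = 14x.
print(simulate([0], [0], [2], [[2], [5], [3]]))  # [[1, 5], [1, 10], [1, 14]]
```

Options 3 to 6 are what the dedicated register adds: a coefficient can be built from the node's own value in the previous time step without an extra node.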
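1. Model multiplexer function
$$c_{i,t} = x_{r,t} \quad \text{if } m_{i,r} = 1 \qquad (1)$$
2. Shifter function
$$a_{i,t} = 2^{k} c_{i,t} \quad \text{if } q_{i,k} = 1 \qquad (2)$$
3. Adder/multiplexer with register function
$$x_{i,t} = \begin{cases}
a_{i,t} & \text{if } o_{i,t,0} = 1 \text{ (option 0)} \\
b_{i,t} & \text{if } o_{i,t,1} = 1 \text{ (option 1)} \\
a_{i,t} + b_{i,t} & \text{if } o_{i,t,2} = 1 \text{ (option 2)} \\
a_{i,t} + x_{i,t-1} & \text{if } o_{i,t,3} = 1 \text{ (option 3)} \\
b_{i,t} + x_{i,t-1} & \text{if } o_{i,t,4} = 1 \text{ (option 4)} \\
2x_{i,t-1} & \text{if } o_{i,t,5} = 1 \text{ (option 5)} \\
x_{i,t-1} & \text{if } o_{i,t,6} = 1 \text{ (option 6)}
\end{cases} \qquad (3)$$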
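4. One source node for each input
$$\sum_{r=0}^{i-1} m_{i,r} = 1 \quad (4), \qquad \sum_{r=0}^{i-1} n_{i,r} = 1 \quad (5), \qquad \forall i$$
5. One shift amount per node
$$\sum_{k=0}^{B-1} q_{i,k} = 1 \quad \forall i \qquad (6)$$
6. One operation per time step
$$\sum_{p=0}^{6} o_{i,t,p} = 1 \quad \forall i, \forall t \qquad (7)$$
7. Table 1 type constraints (described in text), $\forall i, \forall t$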
Constraints (4) and (5) state that each multiplexer must select exactly one input, (6) states that the shifter must shift by exactly one value of $k$, and (7) states that only one operation can be performed at any one time step. However, the total number of operations of each computational node must also conform to the configurations shown in Table 1. This requires (8)-(12). Binary variables $f_{i,p}$ represent the functions that node $i$ performs:
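A candidate assignment can be screened against these one-hot constraints before it is handed to a solver. The following is a hypothetical feasibility checker, not an ILP model: it assumes $m$ and $n$ are the left- and right-input multiplexer variables ($n$ appears in (24)-(27)), covers only (4)-(7), and does not model the Table 1 type constraints (8)-(12).

```python
def is_one_hot(bits) -> bool:
    """True if bits is a 0/1 vector with exactly one entry equal to 1."""
    return all(v in (0, 1) for v in bits) and sum(bits) == 1


def check_onehot_constraints(m, n, q, o) -> list[str]:
    """Return the violated constraints; an empty list means (4)-(7) hold.

    m[i][r], n[i][r]: source selection of node i's two multiplexers, (4) and (5)
    q[i][k]:          shift selection of node i, (6)
    o[i][t][p]:       operation of node i at time step t, (7)
    Node indices here start at 0 for the first computational node.
    """
    violations = []
    for i in range(len(m)):
        if not is_one_hot(m[i]):
            violations.append(f"(4) node {i}")
        if not is_one_hot(n[i]):
            violations.append(f"(5) node {i}")
        if not is_one_hot(q[i]):
            violations.append(f"(6) node {i}")
        for t, ops in enumerate(o[i]):
            if not is_one_hot(ops):
                violations.append(f"(7) node {i}, step {t}")
    return violations
```

In the ILP itself these conditions are the linear equalities above; the checker only makes their meaning explicit for a given 0/1 assignment.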
5. POWER OPTIMIZATION
In this section, a technique for power optimization is presented. A power model is proposed and used as a guideline for constructing ILP constraints to optimize power consumption.
Power Dissipation in FPGAs and Macro-modelling
The total power in a CMOS circuit design can be divided into static and dynamic power. Static power dissipation depends on the physical properties of the devices, mainly occurring as a result of leakage currents, and is therefore outside the designer's control for FPGAs.
Occasionally, a set of coefficients can be computed by several different MRM graph topologies. As an example, Figs. 4(a) and (b) show two such topologies for the coefficients $[12, 7]^T$. Although Figs. 4(a) and (b) have the same area (the same number of nodes), their signal transitions, or switching activity, differ because their nodes change function differently. In Fig. 4(a), changing the function at node 1 produces a signal switching at its output, which then propagates to nodes 2 and 3 and generates further signal switchings. In Fig. 4(b), switching happens primarily at the output of node 3. Denoting the number of nodes by $N$ and the number of signal switchings by $S$, in (a) $N = 3$ and $S = 3$, and in (b) $N = 3$ and $S = 1$. Thus, the power model is given by (15).
In this work, we focus on the sum of the logic power and the signal power. We propose (15) as a suitable functional form of the power model. The term $N^2$ is used to model the fact that with a large area, proportional to $N$, the capacitance on the routes also increases in proportion to $N$. To investigate the factors $k_1$ and $k_2$ in (15), many random structures were generated and synthesized for the Xilinx Virtex-II XC2V1000-4 device [10]. The routed models were analysed with the XPower tool [10] using value-change-dump files. A least-squares fitting approach was then used.
Fig. 5 shows a scatter plot of the actual power against the model power of (15) with $k_1 = 0.079$ and $k_2 = 0.1414$. The fit in Fig. 5 has a worst-case error of 0.14% (average 0.04%).
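The least-squares step can be sketched with the normal equations for two factors. Because the exact expression of (15) is not reproduced in this section, the sketch assumes, purely for illustration, a two-term form $P \approx k_1 f_1 + k_2 f_2$ with no constant term, where $f_1$ would be $N^2$ and $f_2$ a switching-related term whose exact form in (15) may differ. The function names are hypothetical, and no measured data is included.

```python
def fit_two_factors(f1, f2, power):
    """Least-squares fit of power ~ k1*f1 + k2*f2 (no intercept).

    Solves the 2x2 normal equations
        [s11 s12] [k1]   [s1p]
        [s12 s22] [k2] = [s2p]
    by Cramer's rule.
    """
    s11 = sum(u * u for u in f1)
    s22 = sum(v * v for v in f2)
    s12 = sum(u * v for u, v in zip(f1, f2))
    s1p = sum(u * p for u, p in zip(f1, power))
    s2p = sum(v * p for v, p in zip(f2, power))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are collinear; k1 and k2 are not identifiable")
    k1 = (s22 * s1p - s12 * s2p) / det
    k2 = (s11 * s2p - s12 * s1p) / det
    return k1, k2


def relative_errors(actual, predicted):
    """Worst-case and average relative error of a fitted model."""
    errs = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return max(errs), sum(errs) / len(errs)
```

In use, `f1` would hold $N^2$ for each synthesized structure, `f2` its switching term, and `power` the XPower estimates; `relative_errors` then gives figures comparable to the worst-case and average errors quoted above.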
ILP Modelling for Signal Switching Reduction
The power model shows that a reduction in the number of nodes and in the signal switchings can lead to a reduction in power dissipation. In this subsection, the signal-switching-based power consumption is incorporated into the ILP formulation presented in Section 4.
A binary variable $F_{i,w}$ represents whether there is a function switching of the $i$th node from time step $w$ to $w+1$; it is obtained from (16), where $p \in \{0, 1, \ldots, 6\}$ and $w \in \{1, 2, \ldots, T-1\}$.
The variable $F$ in (17) represents the total number of function switchings, and is used as the objective function in the ILP.
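The link between the operation variables and the switching count can be sketched directly. In the sketch below (the function name is hypothetical), each node's operation at each time step is given as its option index $p$; the one-hot vectors $o_{i,w,p}$ are rebuilt from it, and $F_{i,w}$ is set to the smallest binary value allowed by $F_{i,w} \ge o_{i,w,p} - o_{i,w+1,p}$ for every $p$, which is the value an ILP minimizing the total would choose. The total is taken as the plain sum of all $F_{i,w}$.

```python
def function_switching(ops, n_ops: int = 7):
    """ops[i][t] is the option index p that node i performs at time step t.

    Returns (F_iw, F): F_iw[i][w] is 1 when node i changes function between
    steps w and w + 1, and F is the total number of function switchings.
    """
    F_iw = []
    for node_ops in ops:
        row = []
        for w in range(len(node_ops) - 1):
            o_now = [1 if p == node_ops[w] else 0 for p in range(n_ops)]
            o_next = [1 if p == node_ops[w + 1] else 0 for p in range(n_ops)]
            # Smallest binary F_{i,w} with F_{i,w} >= o_{i,w,p} - o_{i,w+1,p} for all p.
            row.append(max(0, max(a - b for a, b in zip(o_now, o_next))))
        F_iw.append(row)
    return F_iw, sum(map(sum, F_iw))
```

Because exactly one $o_{i,w,p}$ is 1 at each step, the bound reaches 1 precisely when the option at step $w$ is no longer selected at step $w+1$, i.e. when the node changes function.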
However, the structures in Figs. 4(a) and (b) show that, although they have the same number of function switchings, they differ in the propagation caused by the signal switchings; therefore, extra constraints are required to model this propagation.
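1. Function switching
$$F_{i,w} \ge o_{i,w,p} - o_{i,w+1,p} \quad \forall i, \forall w, \forall p \qquad (16)$$
2. Total number of the function switchings
$$F = \sum_{i=1}^{N} \sum_{w=1}^{T-1} F_{i,w} \qquad (17)$$
3. Constraints for function switchings
$$F_{i,w} \ge F_{h,w} + L_{i,h,w} - 1 \qquad (18)$$
$$F_{i,w} \ge F_{h,w} + R_{i,h,w} - 1 \qquad (19)$$
$$F_{i,w} \ge F_{h,w} + AL_{i,h,w} - 1 \qquad (20)$$
$$F_{i,w} \ge F_{h,w} + AR_{i,h,w} - 1 \qquad (21)$$
4. Node functions of left input connection
$$L_{i,h,w} \ge o_{i,w,0} + m_{i,h} - 1 \qquad (22)$$
$$L_{i,h,w} \ge o_{i,w,3} + m_{i,h} - 1 \qquad (23)$$
5. Node functions of right input connection
$$R_{i,h,w} \ge o_{i,w,1} + n_{i,h} - 1 \qquad (24)$$
$$R_{i,h,w} \ge o_{i,w,4} + n_{i,h} - 1 \qquad (25)$$
6. Node functions of both input connections
$$AL_{i,h,w} \ge o_{i,w,2} + m_{i,h} - 1 \qquad (26)$$
$$AR_{i,h,w} \ge o_{i,w,2} + n_{i,h} - 1 \qquad (27)$$
where all these constraints hold for all $i > 1$.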
The additional constraints (18)-(21) are based on the fact that when, at time step $w$, there is a solution from (22)-(27) in which at least one input of node $i$ connects to the output of node $h$, then
$$F_{i,w} \ge F_{h,w} \qquad (28)$$
Referring to Fig. 3(b), in (22)-(23), the binary variables $L_{i,h,w}$ represent the $i$th node functions (options 0 and 3 of (3)) that require a left input value obtained from a previous $h$th node, where $h \in \{1, 2, \ldots, i-1\}$. In (24)-(25), the variables $R_{i,h,w}$ represent the $i$th node functions (options 1 and 4 of (3)) that require a right input value obtained from a previous $h$th node. The variables $AL_{i,h,w}$ and $AR_{i,h,w}$ in (26)-(27) represent the $i$th node function (option 2 of (3)) that requires both left and right inputs, each connected to the output of an $h$th node. When the value of $L_{i,h,w}$, $R_{i,h,w}$, $AL_{i,h,w}$ or $AR_{i,h,w}$ is 1, it activates (18), (19), (20) or (21), respectively, so that the required condition (28) is satisfied.
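The effect of the propagation constraints can be illustrated outside the ILP. The sketch below (names hypothetical) takes, for one transition $w$, each node's own function switch and the list of earlier nodes whose outputs it reads, and applies $F_{i,w} \ge F_{h,w}$ in index order, which is valid because every source satisfies $h < i$. The three-node topology in the example is an assumption chosen to reproduce the switching counts quoted for Fig. 4 ($S = 3$ and $S = 1$); it is not the figure itself.

```python
def propagate_switching(own_switch, reads):
    """Smallest switching flags consistent with the function switching bound and (28).

    own_switch[i]: 1 if node i changes its own function at this transition
    reads[i]:      indices h < i of nodes whose outputs feed node i
                   (L, R, AL or AR equal to 1 for the pair (i, h))
    Nodes are indexed from 0 here and processed in index (topological) order.
    """
    switched = list(own_switch)
    for i, sources in enumerate(reads):
        for h in sources:
            # (28): if node i reads node h, then F_{i,w} >= F_{h,w}.
            switched[i] = max(switched[i], switched[h])
    return switched


# Hypothetical topology: the second node reads the first, the third reads both.
reads = [[], [0], [0, 1]]
print(sum(propagate_switching([1, 0, 0], reads)))  # 3: switch at the first node spreads
print(sum(propagate_switching([0, 0, 1], reads)))  # 1: switch stays at the last node
```

This mirrors the argument for Fig. 4: a function change near the input of the graph is more costly than one near the output, even when both graphs have the same number of nodes and function switchings.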
6. RESULTS AND CONCLUSION
The synthesis results for average area and power targeting the Xilinx Virtex-II XC2V1000-4 device [10] are given in Table 2 and Table 3, respectively. All approaches were tested using 10 sets of randomly generated 4-bit coefficients. All ILP models were solved using the MOSEK optimization software [12].
