In this study, three multiplier-blocks generated by different algorithms are analyzed for their power consumption, measured via transition counts, based on their implementation on a Xilinx Virtex device. The high-level Glitch-Path method, which estimates the relative number of transitions occurring at the outputs of the adders, has been refined for more accurate estimation, and a new method, GP Score, is proposed. Several design issues regarding ways of reducing the transitions are discussed.
INTRODUCTION
Multiplier-blocks based on the primitive operations (add, subtract and shift) lead to better utilization of resources in the parallel implementation of digital filters [1]. When the FIR filter structure shown in Figure 1 has integer coefficients, a multiplier-block can be realized to generate the products of the input x[n] and the coefficients, leading to less area and lower power consumption in the filter. The algorithm proposed by Bull and Horrocks (BH) [1] and its modified version (BHM) [2] look for the closest match to the target value from a range of multiples of all fundamentals, add it to the graph, and continue to do so until every coefficient value is formed. The Reduced Adder Graph (RAG-n) algorithm [3] performs an exhaustive search over all possible structures that can be formed with one adder and then continues in the same way until the targets are formed. The cost criterion of most algorithms is the number of adders in the graph [2] [3]. The RAG-n algorithm gives the graph with the fewest adders. However, it has been shown that fewer adders do not always imply less power [4]. This happens when the logic-depth, the longest path (measured in edges) from the input to any node in the graph, is larger in comparison. The BHM algorithm performs better than RAG-n for long word-length coefficients. A new algorithm, C1, proposed in [5], aims to reduce the logic-depth of the multiplier-block by choosing the shallowest graph among the candidates, even if this results in some additional adders.
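The two cost measures discussed above, adder count and logic-depth, can be read directly from an adder graph. The following is a minimal sketch, not the authors' tool: the `Adder` record, the `build` helper and the two-adder example graph are illustrative assumptions. Each node forms its partial product from two earlier products scaled by signed power-of-two edge values, and the filter input is treated as the node with value 1 and depth 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adder:
    """One adder/subtractor node: product = in1 * edge1 + in2 * edge2."""
    in1: int    # partial product on the first input (1 means the filter input)
    edge1: int  # signed power-of-two edge value (shift; negative means subtract)
    in2: int    # partial product on the second input
    edge2: int  # signed power-of-two edge value


def build(adders):
    """Return {partial product: logic-depth} for adders in topological order."""
    depth = {1: 0}  # assumption: the input node carries product 1 at depth 0
    for a in adders:
        product = a.in1 * a.edge1 + a.in2 * a.edge2
        # Logic-depth is the longest path, in edges, back to the input.
        depth[product] = 1 + max(depth[a.in1], depth[a.in2])
    return depth


# Hypothetical graph: 5 = 1*4 + 1*1, then 35 = 5*8 + 5*(-1).
graph = [Adder(1, 4, 1, 1), Adder(5, 8, 5, -1)]
depths = build(graph)
print("adders:", len(graph), "depths:", depths)  # adders: 2 depths: {1: 0, 5: 1, 35: 2}
```

The adder count is simply the number of nodes, while the logic-depth of the block is the maximum value in `depths`.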
In digital CMOS circuits the major source of power dissipation is the transitions at the circuit nodes, formulated as follows [6]:

$$P = \alpha \, C_L \, V_{dd}^{2} \, f_{clk} \qquad (1)$$

where $\alpha$ is the node transition activity factor (the average number of times the node makes a transition in one clock period), $C_L$ is the load capacitance, $V_{dd}$ is the supply voltage of the circuit and $f_{clk}$ is the clock frequency of the circuit. It can easily be inferred that the number of transitions taking place in two circuits is an acceptable measure for comparing their relative power consumption, provided that the numbers of nodes are not significantly different [6].
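As a small worked illustration of why transition counts can stand in for power, the sketch below evaluates the standard CMOS dynamic power relation, P = alpha * C_L * V_dd^2 * f_clk, for two nodes that differ only in activity. The function names and the operating point (load capacitance, supply voltage, clock frequency) are hypothetical values chosen for the example, not figures from the experiments.

```python
def dynamic_power(alpha, c_load, v_dd, f_clk):
    """Dynamic power of one node in watts: alpha * C_L * V_dd^2 * f_clk."""
    return alpha * c_load * v_dd ** 2 * f_clk


def relative_power(transitions_a, transitions_b):
    """Power ratio of two designs that share C_L, V_dd and f_clk."""
    return transitions_a / transitions_b


# Hypothetical operating point, for illustration only.
C_LOAD = 10e-15   # farads
V_DD = 2.5        # volts
F_CLK = 50e6      # hertz

p_low = dynamic_power(0.1, C_LOAD, V_DD, F_CLK)
p_high = dynamic_power(0.2, C_LOAD, V_DD, F_CLK)

# With everything but the activity held equal, the power ratio reduces
# to the ratio of transition activities.
print(p_high / p_low, relative_power(2000, 1000))
```

Because the capacitance, voltage and frequency terms cancel in the ratio, only the transition counts need to be measured when two implementations on the same device are compared.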
Glitch-Path (GP) count, a high-level tool to estimate the transition activity at the output nodes of the adders, was proposed in [7]. It relies on the fact that transitions generated at an adder output produce more transitions in the next adder stage when there is no pipelining. The GP count for a node is formulated as follows:

$$GP_i = GP_{\mathrm{input\_1}} + GP_{\mathrm{input\_2}} + 1 \qquad (2)$$

where $GP_{\mathrm{input\_1}}$ and $GP_{\mathrm{input\_2}}$ are the GP counts at the inputs of the $i$-th node (adder).
The total number of GPs in a graph is then defined as:

$$GP_{\mathrm{total}} = \sum_{i} GP_i \qquad (3)$$

where the sum is taken over all adders in the graph.
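The GP count can be sketched in a few lines: each adder contributes one glitch path plus the glitch paths arriving at its two inputs, and the per-adder counts are summed for the whole graph. The `gp_counts` helper, the node names and the two small graphs are illustrative assumptions, and the filter input is assumed to have a GP count of 0.

```python
def gp_counts(adders):
    """Return {node: GP count} for (output, input_1, input_2) in topological order."""
    gp = {"x": 0}  # assumption: the filter input carries no glitch path
    for out, in1, in2 in adders:
        gp[out] = gp[in1] + gp[in2] + 1
    return gp


def gp_total(adders):
    """Sum of the GP counts of all adders (the input node is excluded)."""
    gp = gp_counts(adders)
    return sum(gp[out] for out, _, _ in adders)


# Hypothetical graph A: 35 is formed from 17 and the input.
graph_a = [("17", "x", "x"), ("35", "17", "x")]
# Hypothetical graph B: 35 is formed from 5 on both inputs.
graph_b = [("5", "x", "x"), ("35", "5", "5")]

print(gp_counts(graph_a)["35"], gp_total(graph_a))  # 2 3
print(gp_counts(graph_b)["35"], gp_total(graph_b))  # 3 4
```

Both graphs have the same adder count and the same logic-depth, yet the GP count differs because graph B feeds an adder output to both inputs of the second adder.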
In this paper, a new method for comparative power estimation, GP Score, is proposed based on the GP idea. Three multiplier-blocks generated by the C1, BHM and RAG-n algorithms were implemented to measure their transition counts and to verify the GP Score. Section 2 describes the calculation of the GP Score. Implementation details and timing simulations are explained in Section 3. The results of the experimental work and several design issues in the low-power design of multiplier-blocks are discussed in Section 4. Section 5 concludes the paper.
GP SCORE
The GP idea suggests that an adder produces one GP plus the number of GPs arriving at its inputs. This idea, as we reported previously [7], did not take into account the number of bits in an adder. A better and more representative way of predicting glitch generation is therefore to consider the number of adder cells deployed in the actual implementation.
Bearing these assumptions in mind, the procedure for calculating the GP score is as follows:
1) For an adder or subtractor, calculate the word-length of the output, n, by:
[…]
where x is the partial product and B is the number of bits of the input. The floor function rounds the number to the nearest integer towards minus infinity.
2) Calculate the maximum number of zeros, e, that are padded to the end of the inputs (if any) by:
[…]
where v1, v2 are the edge values.
3) If the operation is a subtraction, the actual adder length, r, is:
$$r = n$$
If the operation is an addition, the adder length is:
$$r = n - e$$
This is because adder cells having one of their inputs permanently connected to zero will not be implemented.
4) If the shift value of one of the inputs is larger than the other input's word-length, the propagation of glitches along the carry chain will decrease. Therefore, the adder length, r, is modified to be:
[…]
for the subtractors.
For logic-depth values greater than 1, the GP score coming from the previous adders should be considered as the sum of the GP scores for the carry and sum outputs. Therefore a scaling factor of 0.67, which has been empirically calculated from real experimental transition figures, should be applied to the GP scores that affect the next adder.
As the logic-depth increases, the number of transitions generated by each adder cell also increases. This has also been observed in the real timing data. Therefore another empirically derived coefficient […] is applied to the adder length, where k is the logic-depth and a is a constant with value 0.55.
The resulting overall GP score formula from equations (4), (5) and (6) is:
which is a refined form of equation (2) that considers the details of the particular adder or subtractor. This formula operates only at the top level, on the multiplier-block, and does not include or need any low-level data, which is not available to multiplier-block designers. It can easily be incorporated into the design data for comparing relative power consumption. Note that this formula has been derived and used for 2's complement number representation only. It does not function properly for other number formats, for which alternative formulas need to be derived.
IMPLEMENTATION & SIMULATION
… [5] have been designed using VHDL. The filter is a Remez design of order 24 with 12-bit coefficients. For the multiplier-block implementation, even values are halved until an odd fundamental is found. The absolute values of these coefficients are generated by the multiplier-block and negated by exchanging the adder in the delay line with a subtractor. Table 1 shows the structure of the multiplier-block generated by the BHM algorithm. Each row is an adder, and the details of the inputs to that adder are given in the second column. The last column gives the logic-depth up to that adder. The edge values are implemented by hardwired shifting of the input. If the edge value is negative, a subtractor is generated instead of an adder. Bold partial products are the fundamentals of the coefficients. All adders and subtractors used were of ripple-carry type with a length optimized for the particular product.
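The coefficient preparation used for the multiplier-block, taking absolute values and halving even values until an odd fundamental is found, can be sketched as follows. The `odd_fundamental` helper and the coefficient list are hypothetical and are not the coefficients of the filter used in the experiments.

```python
def odd_fundamental(coefficient):
    """Return (odd fundamental, number of halvings, is_negative) for an integer."""
    magnitude = abs(coefficient)
    if magnitude == 0:
        return 0, 0, False  # a zero coefficient needs no multiplier-block output
    shift = 0
    while magnitude % 2 == 0:
        magnitude //= 2  # halving is a hardwired right shift of the coefficient
        shift += 1
    return magnitude, shift, coefficient < 0


# Hypothetical integer coefficients, for illustration only.
coefficients = [-70, 327, 12, 35]
reduced = [odd_fundamental(c) for c in coefficients]

# Distinct odd fundamentals the multiplier-block has to generate.
fundamentals = sorted({f for f, _, _ in reduced if f > 1})
print(reduced)
print(fundamentals)  # [3, 35, 327]
```

The recorded shift is restored by wiring, and the recorded sign is handled by using a subtractor in place of the adder in the delay line, so neither costs an extra adder in the block.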
The VHDL implementation of the resulting filter has been hierarchically synthesized using the Leonardo Spectrum software [8]. Designs have been optimized for delay. They were implemented on the Xilinx Virtex FPGA device, model BG432-4, with the Alliance tool [9]. The placer effort has been set to 2 and timing data was produced after the actual routing for back-annotated simulations.
Timing simulations were performed with the Modelsim simulator with 1 ps precision [10]. The filters are excited with 512 uniformly distributed 8-bit random numbers using 2's complement representation.
The transitions occurring at the sum and carry outputs of each adder in the multiplier-blocks were counted.
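The stimulus and the counting step can be approximated in software as a rough sketch. The code below draws 512 uniformly distributed 8-bit 2's complement samples and counts bit toggles between successive output words of one product. It is a zero-delay model, so it sees only functional transitions and none of the glitches that the back-annotated timing simulation captures; the function name, the seed, the product 35 and the 14-bit output width are assumptions made for the example.

```python
import random


def count_transitions(words, width):
    """Count bit toggles between successive words, viewed as width-bit 2's complement."""
    mask = (1 << width) - 1
    total = 0
    for prev, curr in zip(words, words[1:]):
        # XOR marks the bits that changed; masking keeps the low `width` bits.
        total += bin((prev ^ curr) & mask).count("1")
    return total


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.randint(-128, 127) for _ in range(512)]  # 8-bit 2's complement range

# Output of a hypothetical multiplier-block node with partial product 35.
products = [35 * s for s in samples]
toggles = count_transitions(products, width=14)
print("functional transitions:", toggles)
```

Comparing such a zero-delay count with the count from a timing simulation gives a feel for how much of the activity is due to glitching rather than to the function being computed.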
Table 1 (surviving rows only).
Partial product | Inputs | Logic-depth
327 | 41×8 + 1×(−1) | 3
3395 | 747×1 + 331×8 | 5
RESULTS
The results are presented in Table 2. According to the transition figures gathered from the timing simulations, the C1 design has the least transition activity, whereas the RAG-n design has two times more transitions than the others, despite its adder-count figure. It is clearly seen that the logic-depth and GP counts are well correlated with the number of transitions. Our new GP Score metric came out to be the best indicator of transition activity among all measures when the individual ratios to the transitions are considered. Figure 4 shows the normalized ratios of GP score and GP count to the actual transitions for all the adders in the three multiplier-blocks, where 1 represents perfect estimation. The first 20 adders are from the BHM design and the next 19 adders are from the C1 design. The rightmost point on the graph shows an adder with logic-depth 9 from the RAG-n design. The standard deviation of the ratios came out as 0.07 for GP score and 0.21 for GP count. Maximum estimation errors for these designs are 60% for GP count and 20% for GP score. It is clear from the graph that the GP Score ratio can be taken as a good indication of the transition figures for any adder in any design that uses carry-ripple adders with no pipelining.
Table 3 shows the details of the implementation of an adder and a subtractor for product 35 in two different designs. The adder is from the BHM design. One of its inputs comes from the output of the adder for product 17 and the other is connected to the input of the filter. The subtractor is from the C1 design and is also used for product 35. Both of its inputs are connected to the output of the adder for product 5. As seen from the table, the transition activities are significantly different even though the logic-depths are the same. One reason for this is the difference in the number of actually implemented adder/subtractor cells in the designs. Subtractors almost always have more bits implemented than adders. Another reason is the amount of transition activity occurring at the inputs of the adder/subtractor cells. Both of these facts are covered by the idea of the GP score, and the GP Score is in good correlation with the transition activity, as seen from Table 3.
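The comparison of an estimator against measured transitions, as summarized by the standard deviation and the maximum error of the normalized ratios, can be sketched as below. This is an illustration under stated assumptions: the ratios are normalized by dividing by their mean, the population standard deviation is used, and the estimate and transition values are hypothetical, not the data behind Figure 4.

```python
from statistics import fmean, pstdev


def normalized_ratios(estimates, transitions):
    """Ratio of estimate to measured transitions per adder, scaled so the mean is 1."""
    ratios = [e / t for e, t in zip(estimates, transitions)]
    mean_ratio = fmean(ratios)
    return [r / mean_ratio for r in ratios]


def summarize(estimates, transitions):
    """Return (standard deviation, maximum absolute deviation from 1)."""
    norm = normalized_ratios(estimates, transitions)
    return pstdev(norm), max(abs(r - 1.0) for r in norm)


# Hypothetical per-adder estimates and measured transition counts.
gp_score = [12.0, 25.0, 41.0, 58.0]
measured = [1150, 2600, 4000, 6100]

spread, worst = summarize(gp_score, measured)
print(f"std = {spread:.3f}, max error = {worst:.1%}")
```

A ratio of exactly 1 for every adder would mean the estimator tracks the measured transitions perfectly up to a common scale factor, which is all that a relative power comparison requires.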
CONCLUSION
A new high-level method called GP Score has been proposed and tested. Three multiplier-blocks generated by the RAG-n, BHM and C1 algorithms were implemented on a Xilinx Virtex device and their transition activity was observed. The C1 design was found to be slightly better than BHM. The RAG-n design had the most transition activity even though it has the fewest adders. Our novel GP Score metric was found to be a good indicator of the transition activity of the adders, with a 20% maximum estimation error compared to the 60% error of GP count. Future work will focus on the power estimation of multiplier-blocks with carry-save adders.
