This paper presents an error compensation method for fixed-width group canonic signed digit (GCSD) multipliers that receive a W-bit input and generate a W-bit product. To compensate efficiently for the truncation error, the encoded signals of the GCSD multiplier are used to generate the error compensation bias. Synopsys simulations show that the proposed method reduces power consumption by up to 84% and area by up to 78% compared with fixed-width modified Booth multipliers.
key words: fixed-width, GCSD multiplier, quantization error, digital arithmetic
Introduction
In some DSP applications, such as FFTs and pulse-shaping filters, multiplications are performed with only a few predetermined coefficients, which are applied in a periodic order. In these applications, the multipliers must be programmable. When a few coefficients share a multiplier, modified Booth encoding, which halves the number of partial products, is generally used. To further reduce the number of partial products, the group canonic signed digit (GCSD) multiplier was recently proposed in [1], based on a variation of canonic signed digit (CSD) encoding [2] and a partial product sharing algorithm. This multiplier yields an efficient design when the multiplications involve only a few predetermined coefficients (e.g., in FFTs).
In many multimedia and digital signal processing (DSP) applications, it is desirable to maintain the fixed-width property through multiplication to avoid rapid growth of the word size. For example, the (2W − 1)-bit product of two W-bit operands is quantized to W bits by eliminating the (W − 1) least-significant bits (LSBs). In practice, fixed-width multipliers can be designed based on the Baugh-Wooley, modified Booth, and CSD algorithms. In typical fixed-width multipliers, the adder cells required to compute the (W − 1) LSBs are eliminated, and error compensation biases are introduced into the retained adder cells.
† Corresponding author. The author is with Korea Association of Aids to Navigation (KAAN), IT Castle2 12F, Kasan Geumcheon-ku, Seoul, 153-768, Korea.
To reduce the truncation error, various error compensation methods for fixed-width multipliers have been proposed [3]-[9]. Their error compensation biases can be classified into constant biases [3] and adaptive biases [4]-[9]. A constant bias is independent of the truncated partial product bits and is fixed for a given input word size, whereas an adaptive bias is determined by each input. Thus, adaptive bias methods achieve better accuracy than the constant bias method.
In this paper, we propose an error compensation method for low-error fixed-width GCSD multipliers. To compensate efficiently for the truncation error with reduced hardware complexity, the encoded signals of the GCSD multiplier are used to generate the error compensation bias.
This paper is organized as follows. After a brief review of the GCSD multiplier in Sect. 2, we propose an error compensation bias design method for GCSD multipliers in Sect. 3. In Sect. 4, two application examples of the proposed fixed-width multiplier design method are presented. Finally, Sect. 5 concludes the paper.
GCSD Multiplier
Figure 1 shows the N-point radix-2^4 single-path delay feedback (SDF) FFT architecture [10]. In the first and third multiplication blocks, the three coefficients cos(π/8), cos(π/4), and sin(π/8) are multiplied by a complex input signal X in a periodic order as
$$Y = X \cdot C, \quad C \in \{\cos(\pi/8), \cos(\pi/4), \sin(\pi/8)\}. \quad (1)$$
In general, the multiplications in (1) can be implemented using a programmable multiplier such as the modified Booth multiplier. If the coefficient word length is $N_w$, the modified Booth algorithm generates $N_w/2$ partial products.
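The $N_w/2$ count comes from radix-4 (modified) Booth recoding, which maps each overlapping triplet of coefficient bits to one digit in {−2, −1, 0, 1, 2}, and each digit selects one partial product (0, ±X, or ±2X). The Python sketch below illustrates this recoding, assuming a two's-complement coefficient of even width; the function name and the quantization of cos(π/8) to 7568 (= round(cos(π/8)·2^13)) are our assumptions, not taken from [1].

```python
def booth_radix4_digits(value: int, width: int) -> list[int]:
    """Radix-4 (modified) Booth digits of a `width`-bit two's-complement integer.

    Returns digits d_k in {-2, -1, 0, 1, 2}, least significant first, with
    value == sum(d_k * 4**k). Each digit selects one partial product.
    """
    if width % 2:
        raise ValueError("width must be even")
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in `width` bits")
    bits = value & ((1 << width) - 1)  # two's-complement bit pattern

    def bit(i: int) -> int:
        return 0 if i < 0 else (bits >> i) & 1  # implicit b_{-1} = 0

    # Digit k is formed from the overlapping triplet (b_{2k+1}, b_{2k}, b_{2k-1}).
    return [-2 * bit(2 * k + 1) + bit(2 * k) + bit(2 * k - 1)
            for k in range(width // 2)]


# A 14-bit coefficient always yields 7 Booth digits, i.e., N_w/2 partial products.
digits = booth_radix4_digits(7568, 14)  # cos(pi/8) scaled by 2**13 (our quantization)
print(len(digits), digits)
```

The digit count is fixed by the word length, independent of how many digits happen to be zero, which is why the grouping algorithm below can do better for a small coefficient set.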
To further reduce the number of partial products, the following coefficient grouping algorithm can be used [1]: starting from the first column, a group is defined such that each row in the group contains at most one nonzero digit. A group should contain as many columns as possible so that the number of groups is minimized.
Copyright © 2010 The Institute of Electronics, Information and Communication Engineers
By applying the grouping algorithm to the three coefficients in (1) with $N_w = 14$, the CSD coefficient table with five groups shown in Table 1 is obtained. Each group $G_i$ generates a corresponding partial product $P_i$. Thus, the multiplication result $Y$ is obtained as
$$Y = P_4 + P_3 + P_2 + P_1 + P_0. \quad (2)$$
In Table 1, the grouping algorithm requires only five partial products, whereas modified Booth encoding requires seven ($= N_w/2$). Thus, the grouping algorithm reduces the number of partial products by two, which can lead to lower power consumption and higher speed. Because CSD coding does not allow consecutive nonzero digits, every group formed by the grouping algorithm contains at least two columns (except possibly the last one). Thus, the number of partial products generated by the grouping algorithm is always less than or equal to that of modified Booth encoding.
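The grouping rule can be sketched in Python as follows. `to_csd` computes the CSD (non-adjacent) form and `group_columns` applies the greedy left-to-right rule; both names, and the quantization of the coefficients to 14 digits with 13 fractional bits, are our assumptions. Because any contiguous part of a valid group is itself valid, the greedy choice of the longest valid group at each step minimizes the number of groups. The sketch does not reproduce Table 1, so the group count it reports for the FFT coefficients may differ from five; it does, however, respect the bound of $N_w/2$ groups.

```python
import math


def to_csd(n: int, width: int) -> list[int]:
    """CSD (non-adjacent form) digits of integer n, most significant first."""
    digits = []                      # built least significant first
    while n != 0:
        if n % 2:                    # odd: pick +1 or -1 so that (n - d) % 4 == 0
            d = 2 - (n % 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    if len(digits) > width:
        raise ValueError("width too small for the CSD form of n")
    return [0] * (width - len(digits)) + digits[::-1]


def group_columns(rows: list[list[int]]) -> list[list[int]]:
    """Greedy grouping: extend the current group column by column while every
    row keeps at most one nonzero digit inside the group."""
    groups, current = [], []
    used = [False] * len(rows)       # row already has a nonzero digit in the group
    for col in range(len(rows[0])):
        nz = [row[col] != 0 for row in rows]
        if any(u and z for u, z in zip(used, nz)):
            groups.append(current)   # this column would break the rule: close group
            current, used = [], [False] * len(rows)
        current.append(col)
        used = [u or z for u, z in zip(used, nz)]
    if current:
        groups.append(current)
    return groups


N_W = 14
coeffs = [math.cos(math.pi / 8), math.cos(math.pi / 4), math.sin(math.pi / 8)]
rows = [to_csd(round(c * 2 ** (N_W - 1)), N_W) for c in coeffs]
groups = group_columns(rows)
print(len(groups), groups)
```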
If the nonzero digits of two groups are in the same locations, as in $G_4$ and $G_1$ in Table 1, the two groups can share partial product (PP) generation circuits. The sign difference in the first rows of $G_4$ and $G_1$ can be handled later by additional complementing circuits. For any row of a group that contains only 0's, the corresponding PP is 0. In this case, the zero digits in the row can be changed to nonzero digits so that the PP generation circuits can be shared, since the output can easily be forced back to 0 by a control signal. Using the partial product sharing algorithm in [1], a new representation of each group in Table 1 can be obtained with control signals, as shown in Table 2, where $S_i$, $N_i$, and $Z_i$ are the shift, negation, and zero control signals, respectively.
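The role of the control signals can be illustrated with a toy example. In the Python sketch below, each coefficient row is reduced, per group, to a left-shift amount (the value selected by the shift signal), a negation flag N, and a zero flag Z, and a single shared generator produces the partial product from them. The encoding, the function names, and the toy coefficients are hypothetical illustrations rather than the circuit of [1]; in particular, negation is written as an arithmetic minus instead of bit inversion.

```python
def control_signals(row: list[int], group: list[int]) -> tuple[int, int, int]:
    """(M, N, Z) for one coefficient row in one group (hypothetical encoding).

    M: left shift relative to the group's last (least significant) column,
    N: 1 if the row's nonzero digit is negative, Z: 0 if the row is all zero.
    """
    nonzero = [col for col in group if row[col] != 0]
    if not nonzero:
        return 0, 0, 0               # Z = 0 forces the shared PP to zero
    col = nonzero[0]                 # at most one nonzero digit per group
    return group[-1] - col, int(row[col] < 0), 1


def shared_pp(x: int, m: int, n: int, z: int) -> int:
    """Shared PP generator: shift by M, negate if N = 1, zero if Z = 0."""
    if not z:
        return 0
    p = x << m
    return -p if n else p


def gcsd_multiply(x: int, row: list[int], groups: list[list[int]]) -> int:
    """Y = sum of the group PPs, each aligned to its group's last column."""
    width = len(row)
    y = 0
    for g in groups:
        m, n, z = control_signals(row, g)
        y += shared_pp(x, m, n, z) << (width - 1 - g[-1])
    return y


# Toy CSD rows (MSB first) and a valid grouping; these are not Table 1.
rows = [[1, 0, 0, -1, 0, 1],    # 32 - 4 + 1 = 29
        [0, 0, 1, 0, 0, -1],    # 8 - 1 = 7
        [0, 0, 0, 0, 0, 0]]     # 0
groups = [[0, 1, 2], [3, 4], [5]]
for row in rows:
    print([control_signals(row, g) for g in groups], gcsd_multiply(-3, row, groups))
```

Because only the (M, N, Z) triples differ between coefficients, a small table of these triples per group is all the multiplier needs to store.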
In the conventional approach, the coefficient look-up table (LUT) has 14 columns if the coefficient word length is 14. In GCSD multipliers, encoded values are stored instead of binary coefficients. In [11], an LUT reduction method for modified Booth encoded coefficients was proposed. With a similar approach, the number of columns of Table 2 can be reduced to 7; thus, in this case, the LUT size is reduced by 50% compared with conventional approaches. Synopsys simulations in [1] show that the GCSD method reduces area, power consumption, and propagation delay by 41%, 45%, and 12%, respectively, compared with the conventional modified Booth multiplier.
Fixed-Width GCSD Multiplier
For the coefficients in Table 1 with $N_w = 14$, the corresponding partial product array of the GCSD multiplier is shown in Fig. 2. The partial product array can be divided into MP and LP, the more significant and less significant parts, respectively, as shown in Fig. 2. If $S_{MP}$ and $S_{LP}$ denote the sums of the elements in MP and LP, respectively, the (2W − 1)-bit ideal product $P_I$ can be expressed as
$$P_I = S_{MP} + S_{LP}. \quad (3)$$
In typical fixed-width multipliers, the adder cells required for $S_{LP}$ are omitted, and appropriate biases are introduced into the retained adder cells based on a probabilistic estimation. Thus, the W-bit quantized product $P_Q$ can be expressed as
$$P_Q = S_{MP} + \sigma, \quad (4)$$
where σ denotes the error compensation bias. Note that σ approximates the carry signals propagated from LP to MP. To generate the error compensation bias more efficiently, the truncated bits in LP can be further divided into $LP_{major}$ and $LP_{minor}$ according to their effect on the truncation error. Then, $S_{LP}$ can be expressed as $S_{LP} = S_{LP_{major}} + S_{LP_{minor}}$.
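The decomposition of the ideal product into MP and LP sums can be checked at bit level with the Python sketch below. It writes every CSD partial product of x · c as a 2W-bit two's-complement pattern, sums the bits column by column into $S_{MP}$ (columns at or above W − 1) and $S_{LP}$ (the W − 1 lower columns), and shows that adding the exact carry ⌊S_LP / 2^{W−1}⌋ to the MP part reproduces the post-truncated product, while adding the rounded carry reproduces true rounding. The word length W = 8, the toy coefficient 29, the function name, and the modulo-2^{2W} handling of signs are our simplifying assumptions; a fixed-width multiplier never computes $S_{LP}$ and instead approximates this carry with σ.

```python
def split_mp_lp(x: int, csd_lsb_first: list[int], w: int) -> tuple[int, int]:
    """Column sums S_MP and S_LP of the partial-product bits of x * c.

    Each nonzero digit d at position k gives the partial product d * x * 2**k,
    written as a 2W-bit two's-complement pattern (so sums are modulo 2**(2W)).
    Columns >= W-1 belong to MP; the W-1 lower columns belong to LP.
    """
    mod = 1 << (2 * w)
    s_mp = s_lp = 0
    for k, d in enumerate(csd_lsb_first):
        if d == 0:
            continue
        pp = ((d * x) << k) % mod
        for j in range(2 * w):
            if (pp >> j) & 1:
                if j >= w - 1:
                    s_mp += 1 << j
                else:
                    s_lp += 1 << j
    return s_mp, s_lp


W = 8
c_digits = [1, 0, -1, 0, 0, 1]       # CSD of 29, least significant digit first (toy)
c = sum(d << k for k, d in enumerate(c_digits))
x = -77
mask = (1 << (W + 1)) - 1            # compare modulo 2**(W+1), i.e., 2**(2W) / 2**(W-1)
s_mp, s_lp = split_mp_lp(x, c_digits, W)
sigma_exact = s_lp >> (W - 1)                      # exact carry from LP into MP
sigma_round = (s_lp + (1 << (W - 2))) >> (W - 1)   # rounded carry (true rounding)
print((((s_mp >> (W - 1)) + sigma_exact) & mask) == (((x * c) >> (W - 1)) & mask))
print((((s_mp >> (W - 1)) + sigma_round) & mask)
      == (((x * c + (1 << (W - 2))) >> (W - 1)) & mask))
```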
As an example, in Fig. 2, $S_{LP_{major}}$ and $S_{LP_{minor}}$ can be expressed as
where $p_{i,j}$ denotes the jth partial product bit of partial product $P_i$. Theoretically, the most accurate error compensation bias is obtained by true rounding as
$$\sigma = [S_{LP}]_r,$$
where $[t]_r$ denotes the rounding operation of t. Let $2^{-L_i}$ be the weight of the LSB of partial product $P_i$, and let $M_i$ be the required number of shift-left bit positions for $P_i$. For example, for $G_0$ in Table 1, coefficient sin(π/8) has a nonzero digit in the last column of $G_0$, so $M_0$ is defined to be 0 for sin(π/8) because no shift operation is required. On the other hand, cos(π/8) has a nonzero digit in the second column of $G_0$, so $M_0$ is defined to be 4 for cos(π/8) because a left shift by 4 bit positions is required. Using Table 1, Table 2, and Fig. 2, the possible values of $LP_{minor}(P_i)$ can be obtained depending on the control signals, as shown in Fig. 3. If $N_i = 1$, the input signals are negated, as can be seen in Fig. 3. When $Z_i = 0$, the partial product bits are changed to 0.
Assume that each bit $x_i$ of input X is uniformly distributed over {0, 1}. Then, the expected value of $x_i$ is
$$E[x_i] = \frac{1}{2}. \quad (8)$$
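Under this assumption, expectations of truncated sums are linear: every input bit that lands in a truncated field contributes half its weight. The Python sketch below checks this by exhaustive enumeration for a left-shifted, non-negated partial product. It illustrates the uniform-bit assumption only and is not a reconstruction of the closed form for the expected value of $S_{LP_{minor}}(P_i)$; the names `w`, `m`, and `cut` are ours.

```python
from fractions import Fraction


def expected_truncated_sum(w: int, m: int, cut: int) -> Fraction:
    """E[value of the bits of (x << m) below column `cut`], x uniform over w-bit patterns."""
    field = (1 << cut) - 1
    total = sum((x << m) & field for x in range(1 << w))
    return Fraction(total, 1 << w)


def half_weight_rule(w: int, m: int, cut: int) -> Fraction:
    """Sum of E[x_j] * 2**(j + m) = 2**(j + m) / 2 over input bits that land below `cut`."""
    return sum((Fraction(1 << (j + m), 2) for j in range(w) if j + m < cut), Fraction(0))


print(expected_truncated_sum(6, 2, 5), half_weight_rule(6, 2, 5))  # prints: 14 14
```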
Then, it can be shown that the expected value of $S_{LP_{minor}}(P_i)$ can be computed as
where
The error compensation bias of the fixed-width modified Booth multiplier in [8] is defined as follows:
where $C_E[t]$ and $C_A[t]$ represent the exact and approximate carry values of t, respectively. In (11), $C_A\{S_{LP_{minor}}\}$ is the approximate carry value ($a_{carry}$) propagated from $LP_{minor}$ to $LP_{major}$. In [8], the expected value of $S_{LP_{minor}}(P_i)$ is identified as
where $y_i$ corresponds to the $Z_i$ signal in this paper. The approximate carry signal in [8] is defined as the rounded value of $E[S_{LP_{minor}}]$. For a given $N_w$, the approximate carry value $a_{carry}$ of the fixed-width modified Booth multiplier is computed as [8]
Thus, (11) can be rewritten as
where $\lfloor t \rfloor$ denotes the largest integer less than or equal to t [8]. In Fig. 3, the control signals $\{L_0, M_0, N_0, Z_0\}$ take the values {12, 4, 0, 1}, {12, 5, 0, 1}, and {12, 0, 1, 1}, depending on whether cos(π/8), cos(π/4), or sin(π/8) is selected. By (9), $E[S_{LP_{minor}}(P_0)]$ can then be computed for each of the three cases; for sin(π/8), it equals $2^{-1}(1 + 2^{-12})$.
Let the partial products included in $LP_{minor}$ be $P_K, P_{K-1}, \ldots, P_0$. Since the LSB of $P_i$ always has a larger weight than that of $P_{i-1}$ (i.e., $L_i < L_{i-1}$) under any coefficient conditions, the following relation holds:
Thus, using (15) and the CSD property, the maximum value of $E[S_{LP_{minor}}]$ can be obtained as
The carry signals generated from $LP_{minor}$ are determined by the integer part of the terms inside the parentheses in (16).
can have an effect on the carry signals generated from $LP_{minor}$ to $LP_{major}$, but
) has no effect on the carry signals.
In addition, for partial product $P_K$, when $M_K$ is greater than or equal to $L_K$, the partial product bits inside $LP_{minor}$ are all 0's. Thus, when $P_K$ is negated (i.e., $N_K = 1$), these bits are inverted to all 1's, and adding the 1 required by two's-complement negation propagates a carry signal to $LP_{major}$.
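This carry behavior can be checked exhaustively. The Python sketch below models the $LP_{minor}$ field of $P_K$ as a k-bit value that is negated by inverting it and adding 1 at its LSB; this negation model and the function name are our assumptions. The carry leaving the field is 1 exactly when the field was all zeros, which is the case $M_K \ge L_K$.

```python
def negation_carry(field_bits: int, k: int) -> int:
    """Carry out of a k-bit field when it is negated as (bitwise NOT) + 1.

    Assumes the +1 of the two's-complement negation is injected at the
    field's LSB (a hypothetical model of negating P_K inside LP_minor).
    """
    mask = (1 << k) - 1
    return ((~field_bits & mask) + 1) >> k


# An all-zero field (only shifted-in zeros) always produces a carry.
print(negation_carry(0b0000, 4), negation_carry(0b0100, 4))  # prints: 1 0
```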
Based on these observations, $E[S_{LP_{minor}}]$ can be easily computed as
In this paper, the rounded value of $E[S_{LP_{minor}}]$ is defined as the approximate carry value propagated from $LP_{minor}$ to $LP_{major}$. For example, Fig. 3(d) shows that $L_3$ (= 2) is larger than $M_3$ (= 0 or 1). Thus, by (17), the approximate carry value is determined as
where $\lceil t \rceil$ denotes the smallest integer greater than or equal to t. In general, (19) can be implemented as shown in Fig. 4(a). However, since the number of nonzero $Z_i$ signals in Table 2 is either 3 or 4, $a_{carry}$ equals 2 for all three coefficients. Thus, in this case, no additional hardware is required to generate the approximate carry signals, as shown in Fig. 4(b). The proposed error compensation bias is computed using $S_{LP_{major}}$ and $a_{carry}$. When the number of nonzero $Z_i$ signals is odd, the rounding error in the computation of $a_{carry}$ can be large. To alleviate this problem, we propose the following error compensation bias for fixed-width GCSD multipliers:
If $N_{NZPP}$ is odd, then
where $N_{NZPP}$ is the number of nonzero $Z_i$ signals.
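The effect of odd $N_{NZPP}$ can be seen with a simplified model. The Python sketch below assumes that each nonzero partial product contributes about 1/2 to $E[S_{LP_{minor}}]$ (a simplification of the exact expectation, not the formula above) and rounds $N_{NZPP}/2$ with ties rounded up, which equals $\lceil N_{NZPP}/2 \rceil$. It shows that 3 and 4 nonzero $Z_i$ signals both give $a_{carry} = 2$, and that the rounding error is 1/2 whenever $N_{NZPP}$ is odd and 0 otherwise. The function name is ours.

```python
import math
from fractions import Fraction


def approx_carry(z_signals: list[int]) -> int:
    """a_carry as the rounded value (ties up) of N_NZPP / 2.

    Assumption: each nonzero partial product contributes about 1/2 to
    E[S_LPminor], as the uniform-bit model suggests; N_NZPP = sum(Z_i).
    """
    n_nzpp = sum(z_signals)
    return math.floor(Fraction(n_nzpp, 2) + Fraction(1, 2))


for z in ([1, 1, 1, 0, 0], [1, 1, 1, 1, 0]):   # 3 or 4 nonzero Z_i signals
    n = sum(z)
    print(n, approx_carry(z), Fraction(n, 2) - approx_carry(z))  # N_NZPP, a_carry, error
```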
In GCSD multipliers, the number of different coefficients in a group is assumed to be small, so the coefficient selection signals (i.e., the address) need only a few bits. Using this property, the approximate carry signals can be generated from the address bits. For example, if the word length $N_w$ of the coefficients in Table 1 is 12, the approximate carry value can be obtained as follows:
From (21), the following expression can be obtained using a Karnaugh map:
In general, when the number of coefficients is small, the implementation based on address bits requires less area than the implementation based on coefficient control signals.
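A Karnaugh map is a manual procedure; for the two or three address bits typical of GCSD coefficient selection, a minimum sum-of-products can also be found by brute force. The Python sketch below performs an exhaustive cover search over all product terms (a substitute for the Karnaugh-map step, minimizing the number of terms). The per-address $a_{carry}$ values in it are hypothetical placeholders, not the values of (21), and the function names are ours.

```python
from itertools import combinations, product

Cube = tuple  # one entry per address bit: 0, 1, or None (literal absent)


def covers(cube: Cube, point: tuple) -> bool:
    return all(lit is None or lit == b for lit, b in zip(cube, point))


def minimize(n: int, on_set: set, dont_care: set = frozenset()) -> list:
    """Minimum number of product terms covering on_set and avoiding the off-set.

    Exhaustive search over all 3**n cubes: practical only for a few address bits.
    """
    points = list(product((0, 1), repeat=n))
    off_set = [p for p in points if p not in on_set and p not in dont_care]
    valid = [c for c in product((0, 1, None), repeat=n)
             if not any(covers(c, p) for p in off_set)]
    valid.sort(key=lambda c: sum(lit is not None for lit in c))  # larger cubes first
    for k in range(len(valid) + 1):
        for combo in combinations(valid, k):
            if all(any(covers(c, m) for c in combo) for m in on_set):
                return list(combo)
    return []


def to_expr(cover: list, names: list) -> str:
    """Render a cover as a Boolean expression such as '~a0 | a1 & a0'."""
    terms = [" & ".join(nm if lit else "~" + nm
                        for nm, lit in zip(names, c) if lit is not None) or "1"
             for c in cover]
    return " | ".join(terms) or "0"


# Hypothetical 2-bit a_carry values per address (a1, a0); address 11 is unused.
a_carry = {(0, 0): 2, (0, 1): 1, (1, 0): 2}
unused = {(1, 1)}
for bit in (1, 0):
    on = {addr for addr, v in a_carry.items() if (v >> bit) & 1}
    print(f"a_carry[{bit}] =", to_expr(minimize(2, on, unused), ["a1", "a0"]))
```

Treating unused addresses as don't-cares is what lets the search return single-literal terms here, mirroring how unused Karnaugh-map cells are exploited.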
Performance Comparisons
To evaluate the performance of the proposed fixed-width GCSD multiplier, we compute the maximum absolute error $\varepsilon_{max}$, the average absolute error $\varepsilon_{avg}$, and the error variance $\varepsilon_{var}$ over all $2^W$ possible input values of X as follows, where $\varepsilon = P_I - P_Q$ and $\bar{\varepsilon}$ is its mean:
$$\varepsilon_{max} = \max_X |\varepsilon|, \quad \varepsilon_{avg} = 2^{-W} \sum_X |\varepsilon|, \quad \varepsilon_{var} = 2^{-W} \sum_X (\varepsilon - \bar{\varepsilon})^2.$$
Table 3 shows the simulation results of the fixed-width GCSD multiplier for input word size W = 14. Let $M_{true}$ and $M_{post}$ denote the fixed-width multipliers based on true rounding and post-truncation, respectively. To compute $M_{true}$ and $M_{post}$, all partial product bits are used during multiplication, and the final product is obtained by rounding or truncating the (W − 1) least significant bits of the exact (2W − 1)-bit result. Table 4 compares the Synopsys simulation results obtained with MagnaChip 0.18-μm CMOS technology. Compared with the fixed-width modified Booth multiplier, the proposed fixed-width GCSD multiplier reduces the average error by about 10%. In addition, it reduces area, power consumption, and propagation delay by 29%, 36%, and 9%, respectively.
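The reference rows $M_{true}$ and $M_{post}$ can be reproduced by exhaustive simulation. The Python sketch below computes the maximum absolute error, the average absolute error, and the error variance over all $2^W$ inputs for one coefficient. It assumes two's-complement fractional operands, errors measured against the exact product in units of the output LSB, and a population variance; these conventions, the coefficient quantization, and the function name are our assumptions, so the numbers need not match Table 3. Evaluating a fixed-width design would replace the computation of `q` with its retained MP sum plus bias.

```python
import math
import statistics
from fractions import Fraction


def error_metrics(c: int, w: int, mode: str):
    """(eps_max, eps_avg, eps_var) of a W-bit reference product over all 2**W inputs.

    x and c are W-bit two's-complement integers (W-1 fractional bits each), so
    the exact product x * c has 2(W-1) fractional bits and the W-bit result drops
    its W-1 LSBs. Errors eps = exact - quantized are in units of the output LSB.
    mode: "post" (post-truncation, M_post) or "true" (true rounding, M_true).
    """
    if w < 2:
        raise ValueError("w must be at least 2")
    shift = w - 1
    errors = []
    for x in range(-(1 << (w - 1)), 1 << (w - 1)):
        p = x * c
        if mode == "post":
            q = p >> shift                              # floor: drop the W-1 LSBs
        elif mode == "true":
            q = (p + (1 << (shift - 1))) >> shift       # round half up
        else:
            raise ValueError(f"unknown mode {mode!r}")
        errors.append(Fraction(p, 1 << shift) - q)
    abs_err = [abs(e) for e in errors]
    return max(abs_err), sum(abs_err) / len(abs_err), statistics.pvariance(errors)


W = 14
c = round(math.cos(math.pi / 8) * 2 ** (W - 1))   # our quantization of cos(pi/8)
for mode in ("post", "true"):
    e_max, e_avg, e_var = error_metrics(c, W, mode)
    print(mode, float(e_max), float(e_avg), float(e_var))
```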
As another example, the proposed algorithm is applied to the following coefficients used in the pulse-shaping filter design for CDMA [11]:
$a_1a_0 = 00$: 1111111010, $a_1a_0 = 01$: 1111111000, $a_1a_0 = 10$: 1111110111. (25)
Figure 5 shows the partial product array of the corresponding GCSD multiplier. Table 5 compares the error performances of several methods, and Table 6 compares the Synopsys simulation results. In this case, although the error performances of the proposed method and the method in [8] are almost the same, the proposed multiplier reduces area, power consumption, and propagation delay by 78%, 84%, and 53%, respectively.
Table 3 Comparison of the error performances for FFT applications.
Table 4 Synopsys simulation results for FFT applications.
Fig. 5 Partial product array of the GCSD multiplier for pulse-shaping filters.
Table 5 Comparison of the error performances for pulse-shaping filters.
Conclusions
In this paper, an efficient error compensation method for fixed-width GCSD multipliers is proposed. To compute the error compensation bias more accurately, the encoded signals from the GCSD multiplier are used for the bias generation.
The simulation results show that the proposed method significantly reduces area, power consumption, and propagation delay compared with fixed-width modified Booth multipliers.
