Abstract-This brief develops a general methodology for designing a lower-error two's-complement fixed-width multiplier that receives two n-bit numbers and produces an n-bit product. By properly choosing the generalized index, we derive a better error-compensation bias to reduce the truncation error and then construct a lower-error fixed-width multiplier that is area-efficient for VLSI implementation. Finally, we apply the proposed fixed-width multiplier to a digital FIR filter and show that its performance is better than that obtained with other fixed-width multipliers.
I. INTRODUCTION
Low-error, small-area, and high-speed multipliers are among the most important processing elements for digital signal processing (DSP) applications [1], such as digital filters [2], [3] and Moving Picture Experts Group (MPEG) coding. Multipliers based on the Baugh-Wooley algorithm [4], [5] produce a 2n-bit output from an n-bit multiplier and an n-bit multiplicand. For some practical applications, however, only an n-bit product is required, which may be obtained by directly truncating the n least significant bits and preserving the n most significant bits. The significant error that this fixed-width operation introduces is undesirable for many DSP applications. To reduce the truncation error, Kidambi et al. [6] proposed a bias-compensation structure derived from the statistics of carry propagation, but this structure does not adapt the bias to the variety of input signals. Next, Jou et al. [7] provided a carry-generating circuit that reduces the truncation error by means of the J-K index. However, two problems have not been discussed before: how to choose proper indices, and whether other lower-error multipliers exist. This brief proposes a general methodology for designing a lower-error two's-complement fixed-width multiplier. In addition, the new multiplier has the same area ratio as the J-K multiplier under a reasonable assumption.

This brief is organized as follows. In Section II, we propose a better error-compensation bias that reduces the truncation error by properly choosing the generalized index and the binary thresholding, and we then construct a simple lower-error fixed-width multiplier. In Section III, we show that this compensation bias still holds for large width n. The performance comparison in terms of maximum error, average error, variance of the error, and area ratio is discussed in Section IV. In Section V, we apply the proposed lower-error fixed-width multiplier to a low-pass FIR digital filter [8] and show that the performance is better than that obtained with other fixed-width multipliers. Finally, Section VI concludes this brief.
II. DESIGN OF FIXED-WIDTH MULTIPLIERS
Consider two two's-complement integer operands. An n-bit multiplicand X and an n-bit multiplier Y can be represented, respectively, by

$$X = -x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i, \qquad Y = -y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} y_j 2^j.$$

Their 2n-bit product P = XY can be written as

$$P = x_{n-1} y_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} x_i y_j 2^{i+j} + 2^{n-1} \sum_{i=0}^{n-2} \overline{x_{n-1} y_i}\, 2^i + 2^{n-1} \sum_{i=0}^{n-2} \overline{x_i y_{n-1}}\, 2^i - 2^{2n-1} + 2^n. \tag{3}$$

Equation (3) is the famous Baugh-Wooley array multiplier [4], [5]; this algorithm combines partial products with the same weighting factor and places them in the same column. Fig. 1 shows the subproduct array for 8 × 8 multiplication. According to (3), an 8 × 8 standard multiplier structure can be obtained as shown in Fig. 2(a), in which the main symbolic cells are depicted in Fig. 2(b) and the cells A, ND, HA, and FA denote an AND gate, a NAND gate, a half adder, and a full adder, respectively. By partitioning the subproducts into two sections, (3) can be rewritten as follows:

$$P = MP + LP \tag{4}$$

where $P_i \in \{0, 1\}$, $MP = \sum_{i=n}^{2n-1} P_i 2^i$ is the most-significant section, and $LP = \sum_{i=0}^{n-1} P_i 2^i$ is the least-significant section, shown in the upper right triangular area of Fig. 2(a). It is well known that the simplest fixed-width multiplier directly truncates the LP section, but this approach leads to the largest truncation error. Kidambi et al. [6] therefore provided a constant-bias method, which was derived from the carry-propagation probability of LP. The truncated multiplier presented in [6] yields the approximate n-bit fixed-width product $P_{K-G-A}$; that is

$$P_{K-G-A} = MP + \sigma_{K-G-A} \times 2^n \tag{5}$$
where $\sigma_{K-G-A}$ represents the error-compensation bias, which depends on the width n. For a given width n, the bias $\sigma_{K-G-A}$ is a constant under a uniform probability distribution of the input bits. Although this approach compensates better than the simplest truncated multiplier, the bias cannot be adaptively adjusted for different input signals, so the truncation error is still large. Jou et al. [7] presented another way to analyze the error compensation and suggested a truncated multiplier that yields the fixed-width product $P_{J-K}$ given in (6) and (7). Although the bias in (6) and (7) performs better than the bias $\sigma_{K-G-A}$, a generalized methodology that further reduces the truncation error is still desirable, because two problems are not discussed in [7]: how to choose proper indices, and whether other lower-error multipliers exist.
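The Baugh-Wooley form and the MP/LP partition discussed above can be checked numerically. The Python sketch below is illustrative only. It assumes the common Baugh-Wooley arrangement in which the sign-row and sign-column subproducts are NANDed and the constants $2^n - 2^{2n-1}$ are added, and it treats MP and LP as the sums of the array subproducts of weight at least $2^n$ and below $2^n$, respectively. All function names are hypothetical.

```python
def bit(v, i):
    """Return bit i of the non-negative integer v."""
    return (v >> i) & 1

def to_signed(pattern, n):
    """Interpret an n-bit pattern as a two's-complement integer."""
    return pattern - (1 << n) if bit(pattern, n - 1) else pattern

def baugh_wooley_sections(xb, yb, n):
    """Return (MP, LP) for the n-bit patterns xb and yb (assumed array layout)."""
    mp = (1 << n) - (1 << (2 * n - 1))         # constant correction terms
    lp = 0
    for i in range(n):
        for j in range(n):
            p = bit(xb, i) & bit(yb, j)
            if (i == n - 1) != (j == n - 1):   # exactly one sign bit involved
                p ^= 1                         # NAND cell instead of AND cell
            if i + j >= n:
                mp += p << (i + j)             # most-significant section
            else:
                lp += p << (i + j)             # least-significant section
    return mp, lp

def check(n):
    """Exhaustively confirm that MP + LP equals the signed product."""
    for xb in range(1 << n):
        for yb in range(1 << n):
            mp, lp = baugh_wooley_sections(xb, yb, n)
            assert mp + lp == to_signed(xb, n) * to_signed(yb, n)
            assert mp % (1 << n) == 0 and lp >= 0
    return True

print(check(6))
```

Under this reading, a directly truncated multiplier outputs MP alone, so its error for a given input pair is exactly the LP value returned by `baugh_wooley_sections`.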
It is known that the most accurate truncated product is theoretically given by

$$P_{Standard} = MP + \sigma_{Temp} \times 2^n \tag{8}$$

with $\sigma_{Temp}$ given by (9), where $[t]_r$ represents the rounding integer for t. It should be emphasized that $\sigma_{Temp}$ is an ideal error-compensation term, and it is infeasible to implement the truncated fixed-width multiplier without an acceptable approximation of it. From (9), it is observed that $\sigma_{Temp}$ is mainly affected by $\overline{x_{n-1} y_0} + x_{n-2} y_1 + \cdots + x_1 y_{n-2} + \overline{x_0 y_{n-1}}$, because these terms carry the largest weight. Now, let us define the main error-compensation term $E_{main}$ and the remaining error-compensation term $E_{remain}$, respectively, as
Thus, we can rewrite (9) 
In (15), the first term in the bracket is referred to as the coarse-adjustment term and the second term $[K]_r$ is referred to as the fine-adjustment term. The coarse-adjustment term can be realized with a simple circuit once the index is decided. The fine-adjustment term, on the other hand, can be approximated by the expected value of the rounding operation after analyzing the statistics. In the spirit of designing a simple and realizable error-compensation circuit, we propose two types of binary thresholding of the index for bias estimation, described as follows.
Type 1: See (17). Type 2: See (18).
where $K_1$, $K_2$, $K_3$, and $K_4$ are, respectively, the averages of K over the inputs satisfying $\theta_{index} = 0$, $\theta_{index} > 0$, $\theta_{index} < n$, and $\theta_{index} = n$.
Next, to achieve high-accuracy compensation, the choice of the generalized index $\theta_{index}$ must be investigated. By exhaustive search, we can find some good generalized indices for small width n ($n \le 12$). For large width n, because of the high computational load, we have to use a statistical method to verify the error-compensation equations obtained with these better indices. Note that $\theta_{J-K}$ in Type 1 thresholding is a special case of the generalized index $\theta_{index}$, obtained by choosing $q_0 = q_1 = \cdots = q_{n-1} = 0$. To evaluate the resulting performance for given inputs X and Y, we use the absolute error $\varepsilon$ between the standard multiplier and a truncated multiplier, the average error $\bar{\varepsilon}$, and the variance of the error. That is, $\varepsilon = |P_{Standard} - P_{Truncated}|$, $\bar{\varepsilon} = E\{\varepsilon\}$, and the variance of the error is $E\{(\varepsilon - \bar{\varepsilon})^2\}$,
where $P_{Standard}$ and $P_{Truncated}$ represent the output value of the standard multiplier and the output value of the truncated multiplier under test, respectively, and $E\{\cdot\}$ is the expectation operator. Given the index $\theta_{index}(q_{n-1}, q_{n-2}, \ldots, q_0)$ in (13), in the following development we call it the Qth index, where $Q \triangleq q_{n-1} \times 2^{n-1} + q_{n-2} \times 2^{n-2} + \cdots + q_0 \times 2^0$.
Note that Q ranges from 0 to $2^n - 1$; for example, $\theta_{index}(100001)$ denotes the 33rd index for n = 6.
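The numbering of indices described above is a plain binary-to-integer conversion. A minimal Python sketch follows; the function names `index_number` and `index_bits` are hypothetical and not taken from the brief.

```python
def index_number(q_bits):
    """Map the tuple (q_{n-1}, ..., q_0) to Q = sum of q_i * 2**i."""
    q = 0
    for b in q_bits:            # most significant bit first
        q = (q << 1) | b
    return q

def index_bits(Q, n):
    """Inverse mapping: return (q_{n-1}, ..., q_0) for index number Q."""
    return tuple((Q >> i) & 1 for i in range(n - 1, -1, -1))

print(index_number((1, 0, 0, 0, 0, 1)))   # 33
print(index_bits(129, 8))                 # (1, 0, 0, 0, 0, 0, 0, 1)
```

For a width-n multiplier there are $2^n$ candidate indices, which is why a full search is practical only for small n.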
By full-search simulation for n = 6, we obtain the values of $K_1$ and $K_2$ shown in Fig. 3 for all possible indices. In order to design a simply realizable error-compensation circuit, we choose the indices which satisfy [...] To achieve lower error in terms of average error and variance of error, we direct our attention to Type 2 thresholding. Type 2 is a newly proposed thresholding and, by exhaustive-search simulation, it is found to be an excellent structure with feasible implementation and better performance. We obtain the values of $K_3$ and $K_4$, as shown in Fig. 4, for n = 6. For a simple and feasible compensation circuit, the 33rd index is one of the choices, since its $K_3$ and $K_4$ are as close as possible to the integers 1 and 0, respectively. Following the above procedure, we run the full-search simulation in Type 2 thresholding for wordlengths n from 4 to 12. After simulating the different widths n, we observe that the specific index $\theta_{Q=2^{n-1}+1}$ achieves better performance, as described in detail in Section IV. The chosen index satisfies $[K_3]_r = 1$ and $[K_4]_r = 0$ for each width n. Hence, the simply realizable error-compensation structure with the lower truncation error for Type 2 thresholding is described by (23), where $\theta_{Q=2^{n-1}+1} = \overline{x_{n-1} y_0} + x_{n-2} y_1 + \cdots + x_1 y_{n-2} + \overline{x_0 y_{n-1}}$. Equation (23) has been completely simulated for $n \le 12$ and can be mapped to a new structure. The proposed 8 × 8 lower-error fixed-width multiplier with the 129th index is depicted in Fig. 5.
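The kind of exhaustive tabulation described above can be sketched in Python. This is an illustration under stated assumptions, not the K statistic of the brief, whose defining equation (15) is not reproduced in this text. The sketch assumes that bit $q_i$ of Q complements the subproduct $x_i y_{n-1-i}$, that the least-significant section is the sum of the Baugh-Wooley subproducts of weight below $2^n$, and that $[t]_r$ rounds halves upward. It reports the average ideal compensation for each value of the index. All function names are hypothetical.

```python
from collections import defaultdict

def bit(v, i):
    return (v >> i) & 1

def generalized_index(xb, yb, n, Q):
    """Sum of the weight-2**(n-1) subproducts, each complemented when q_i = 1
    (assumed pairing of q_i with x_i * y_{n-1-i})."""
    return sum((bit(xb, i) & bit(yb, n - 1 - i)) ^ bit(Q, i) for i in range(n))

def lp_array(xb, yb, n):
    """Sum of the Baugh-Wooley subproducts of weight below 2**n."""
    total = 0
    for i in range(n):
        for j in range(n - i):                 # i + j <= n - 1
            p = bit(xb, i) & bit(yb, j)
            if i == n - 1 or j == n - 1:       # sign row or sign column
                p ^= 1                         # NAND cell
            total += p << (i + j)
    return total

def ideal_bias(xb, yb, n):
    """Rounded value of LP / 2**n (round half up, an assumption)."""
    return (lp_array(xb, yb, n) + (1 << (n - 1))) >> n

def average_bias_by_index(n, Q):
    """Average ideal bias over all inputs, grouped by the index value."""
    sums, counts = defaultdict(int), defaultdict(int)
    for xb in range(1 << n):
        for yb in range(1 << n):
            t = generalized_index(xb, yb, n, Q)
            sums[t] += ideal_bias(xb, yb, n)
            counts[t] += 1
    return {t: sums[t] / counts[t] for t in sorted(counts)}

n = 6
for Q in (0, (1 << (n - 1)) + 1):              # all-zero index and index 33
    print(Q, average_bias_by_index(n, Q))
```

An index whose groups have averages close to a simple function of the index value lends itself to a small compensation circuit, which is the motivation for searching over Q.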
III. FIXED-WIDTH MULTIPLIER WITH LARGE WIDTH
It is known that the $(2^{n-1} + 1)$th index in Type 2 thresholding can be expressed as $\theta_{Q=2^{n-1}+1} = \overline{x_{n-1} y_0} + x_{n-2} y_1 + \cdots + x_1 y_{n-2} + \overline{x_0 y_{n-1}}$. By computer simulation, we find that this index achieves better performance for small width n. It is difficult to confirm by simulation that the index also performs better for large width n, since the exhaustive simulation takes significant computation time. In this section, we show that the index $\theta_{Q=2^{n-1}+1}$ is also suitable for designing the fixed-width multiplier with large width n; that is, we show that $[K_3]_r = 1$ and $[K_4]_r = 0$ for large width n. When analyzing the fine adjustment in (15), we encounter the problem that $[K]_r$ depends on the input signals through $E_{main}$, $E_{remain}$, $\theta_{Q=2^{n-1}+1}$, and two other terms. Herein, the input bits are assumed to be uniformly distributed, so we approximate $(1/2)E_{main}$, $(1/2)E_{remain}$, and the other terms by the expected output values of the corresponding logic functions. Two cases are considered: $\theta_{Q=2^{n-1}+1} < n$ and $\theta_{Q=2^{n-1}+1} = n$.
Note that $E\{x_i y_j\} = 1/4$ and $E\{\overline{x_i y_j}\} = 3/4$, since the input bits are assumed to be uniformly distributed. Similarly, we can obtain (26) from (11) as [...] Therefore, the index $\theta_{Q=2^{n-1}+1}$ is suitable for implementing the fixed-width multiplier with large width n.
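The two expected values quoted above follow from enumerating the four equally likely input pairs. The Python sketch below does this with exact fractions. The helper `expected_index` additionally assumes that the index with $Q = 2^{n-1}+1$ consists of n - 2 AND terms and two NAND terms, and uses linearity of expectation. The names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected_value(logic_fn, num_inputs):
    """Expected output of a logic function under uniform, independent input bits."""
    outcomes = [logic_fn(*bits) for bits in product((0, 1), repeat=num_inputs)]
    return Fraction(sum(outcomes), len(outcomes))

e_and = expected_value(lambda x, y: x & y, 2)          # E{x_i y_j}
e_nand = expected_value(lambda x, y: 1 - (x & y), 2)   # E{complement of x_i y_j}

def expected_index(n):
    """Expected value of an index made of n - 2 AND terms and 2 NAND terms
    (assumed composition of the index with Q = 2**(n-1) + 1)."""
    return (n - 2) * e_and + 2 * e_nand

print(e_and, e_nand)        # 1/4 3/4
print(expected_index(8))    # 3
```

The same enumeration extends to any small logic function, which is how expected values of the remaining compensation terms can be estimated without a full simulation over all operand pairs.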
It is obvious that a fixed-width multiplier is more accurate if the average error $\bar{\varepsilon}$, the variance of the error, and the maximum error $\varepsilon_{max}$ are smaller. Tables I-III show the simulated results for the various fixed-width multipliers of different width n. The K-G-A structure [6] is the truncated multiplier with a constant compensation bias that depends only on the width of the multiplier, the J-K structure is the fixed-width multiplier devised by Jou et al. [7], and the proposed structure is our fixed-width multiplier of Type 2 thresholding with the index $\theta_{Q=2^{n-1}+1}$. The comparison results show that our proposed fixed-width multiplier is more accurate than the others. This performance is achieved because we derive a better error-compensation bias to reduce the effect of the truncation error.
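The three accuracy figures used in this comparison can be computed by exhaustive simulation for small n. The Python sketch below is a hedged illustration. It assumes that the reference product is MP plus the rounded LP section, that LP is the sum of the Baugh-Wooley subproducts of weight below $2^n$, and that halves round upward. The two example biases (zero, and a constant of one) are placeholders and are not the K-G-A, J-K, or proposed compensation circuits. All names are hypothetical.

```python
def bit(v, i):
    return (v >> i) & 1

def lp_array(xb, yb, n):
    """Sum of the Baugh-Wooley subproducts of weight below 2**n."""
    total = 0
    for i in range(n):
        for j in range(n - i):                 # i + j <= n - 1
            p = bit(xb, i) & bit(yb, j)
            if i == n - 1 or j == n - 1:       # sign row or sign column
                p ^= 1                         # NAND cell
            total += p << (i + j)
    return total

def ideal_bias(xb, yb, n):
    """Rounded value of LP / 2**n (round half up, an assumption)."""
    return (lp_array(xb, yb, n) + (1 << (n - 1))) >> n

def error_statistics(n, bias_fn):
    """Return (maximum, average, variance) of |P_Standard - P_Truncated|.
    MP cancels in the difference, so only the two biases are compared."""
    errors = []
    for xb in range(1 << n):
        for yb in range(1 << n):
            diff = ideal_bias(xb, yb, n) - bias_fn(xb, yb, n)
            errors.append(abs(diff) << n)      # scale back to product units
    mean = sum(errors) / len(errors)
    variance = sum((e - mean) ** 2 for e in errors) / len(errors)
    return max(errors), mean, variance

direct = error_statistics(6, lambda xb, yb, n: 0)      # direct truncation
constant = error_statistics(6, lambda xb, yb, n: 1)    # placeholder constant bias
print(direct)
print(constant)
```

Any candidate compensation circuit can be evaluated the same way by passing a function that returns its bias for a given operand pair.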
where subscripts denote the corresponding fixed-width multipliers. The area ratio is defined as follows:
Substituting (36)-(39) into (40) with the two parameters set to 0.09 and 0.45, we evaluate the area ratio and tabulate the results in Table IV. The area ratios in Table IV show that our proposed multiplier is area-efficient, since it occupies close to half the area of the standard multiplier.
V. DSP APPLICATION OF FIXED-WIDTH MULTIPLIERS
In this section, we apply the proposed fixed-width multiplier to the 35-tap FIR filter as shown in Fig. 6 for speech processing. The behavior of a digital FIR filter can be represented as follows:
where $O^i$ denotes the output sequence at the ith discrete time and the superscript i is the time index. First, for practical consideration [8], the maximum input voice data and the maximum filter coefficient in two's complement are normalized to the same value, 127, with 8-bit quantization. In the experimental simulation, the temporary output is accumulated using 32 bits. Finally, the outputs $O^i$ are obtained by scaling the accumulated values. For convenience in comparing the various fixed-width multipliers, we take 1000 samples for the consonant part and vowel part of "Chicken," as shown in Fig. 7. We are concerned with whether the waveform filtered with our proposed fixed-width multiplier is accurate, so a correct standard output is required. We use the error-free output as the standard against which the accuracy of the fixed-width multipliers is compared. Fig. 8 shows the standard filtering output signals, and Figs. 9-11 show the output signals of the 35-tap low-pass FIR filter using a variety of fixed-width multipliers. With the constant-bias K-G-A multiplier, Fig. 9 shows larger average error and variance of the error in the consonant part. Fig. 10 is obtained with the J-K multiplier and shows better performance than Fig. 9. However, compared with the standard output, the output signals in Fig. 10 still have a large average error and variance of the error. The smaller average error and variance of the error, especially for the consonant part, are obtained with our proposed fixed-width multiplier, as shown in Fig. 11.

Fig. 9. Output signals using K-G-As' structure [6].
Fig. 10. Output signals using J-Ks' structure [7].
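The experimental setup described above can be mimicked with a short Python sketch. It is illustrative only. It assumes the usual FIR convolution form, in which each output is the sum of coefficient times delayed input, uses a hypothetical triangular 35-tap coefficient set and a synthetic sine input in place of the speech data, and uses a stand-in multiplier that simply drops the eight low bits of each product. None of these choices reproduce the filter, the data, or the multiplier structures of the brief.

```python
import math

def exact_multiply(a, b):
    """Full-precision product used for the error-free reference."""
    return a * b

def truncated_multiply(a, b, n=8):
    """Stand-in fixed-width multiplier: drop the n low product bits,
    then restore the scale so results are comparable with exact_multiply."""
    return ((a * b) >> n) << n

def fir_filter(samples, coeffs, multiply):
    """Direct-form FIR filter with a 32-bit two's-complement accumulator."""
    outputs = []
    for i in range(len(samples)):
        acc = 0
        for j, c in enumerate(coeffs):
            if i - j >= 0:
                acc += multiply(c, samples[i - j])
        acc = (acc + (1 << 31)) % (1 << 32) - (1 << 31)   # wrap to 32 bits
        outputs.append(acc)
    return outputs

# Hypothetical 35-tap low-pass coefficients, peak normalized to 127.
coeffs = [round(127 * (1 - abs(k - 17) / 18)) for k in range(35)]
# Synthetic 8-bit input signal with peak 127 (not the speech data).
samples = [round(127 * math.sin(2 * math.pi * i / 50)) for i in range(1000)]

reference = fir_filter(samples, coeffs, exact_multiply)
approx = fir_filter(samples, coeffs, truncated_multiply)
mean_abs_error = sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)
print(mean_abs_error)
```

Swapping `truncated_multiply` for a model of another fixed-width multiplier allows the same comparison against the error-free reference.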
VI. CONCLUSION
This brief develops a general methodology for designing a lower-error two's-complement fixed-width multiplier. By properly choosing the generalized index, we derive a better error-compensation bias to reduce the truncation error and then construct a lower-error fixed-width multiplier, which is area-efficient for VLSI realization. Finally, we apply the proposed fixed-width multiplier to a digital FIR filter for a speech-processing application and show that its performance for the consonant part is better than that obtained with other fixed-width multipliers. Interested readers can study other binary thresholdings with generalized indices and use different operators, such as ceiling [9] or flooring operators, to devise other useful and realizable fixed-width multipliers.

Fig. 11. Output signals using the proposed structure.
