In this paper, we propose two 2's-complement fixed-width Booth multipliers that generate an n-bit product from an n-bit multiplicand and an n-bit multiplier. Compared with previous designs, our multipliers have smaller truncation error, less area, and a shorter critical-path delay. A four-step approach is adopted to search for the best error-compensation bias when designing a multiplier suitable for VLSI implementation. Finally, we demonstrate the capability of our designs by embedding them in a speech signal processor. Simulation results indicate that the new designs surpass the previous fixed-width Booth multiplier in the precision of the product. An average error reduction of 65-84% compared with a direct-truncation fixed-width multiplier is achieved by adding only a few logic gates.
key words: digital signal processing, fixed-width Booth multiplier, VLSI
Introduction
In digital signal processing (DSP) applications (e.g., digital filters and wavelet transformers), it is desirable for the width of arithmetic data to remain fixed throughout the entire computation. To achieve this goal, a fixed-width multiplier [1]-[5] that receives an n-bit multiplier and an n-bit multiplicand and produces an n-bit output is essential. In practice, fixed-width multipliers are based on the CSD algorithm [6], the Baugh-Wooley algorithm or the Booth algorithm. For decades, Baugh-Wooley fixed-width multipliers have been widely studied. In [8], King and Swartzlander proposed a fixed-width multiplier by analyzing its adaptive error-compensation bias. We generalized the low-error fixed-width multiplier via indexing and binary thresholding in [3], [4]. In [5] we applied the binary thresholding algorithm to the compensating circuit of fixed-width Booth multipliers with small width. In Sect. 3.3 of this paper, a statistical technique is used to verify that the compensating bias is equally effective for large widths. We have also optimized the compensating circuit, which shortens the critical path by about one half-adder (HA) or full-adder (FA) delay.
The proposed scheme is based on keeping the n + w most significant columns of the partial products intact, where w is a nonnegative integer between 0 and n−1. When w is larger, the error is smaller but more gates are required in the compensating circuit. Conversely, when w is smaller, the error is larger but fewer gates are needed in the compensating circuit. This algorithm therefore allows users to adjust the value of w according to their needs when designing the compensating circuit, and is thus called adaptive.
† The author is with the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan.
a) E-mail: dennis@lion.ee.ntu.edu.tw DOI: 10.1093/ietfec/e90-a.6.1180
Because an area-time efficient fixed-width multiplier cannot be achieved using the Baugh-Wooley algorithm, most researchers have turned their attention to the fixed-width Booth algorithm. The Booth multiplier is widely used in ASIC-oriented products because of its higher computing speed and smaller area. Booth encoding has two advantages: a) only about half of the partial products are needed during the computation, that is, the number of partial products is reduced by a factor of 2; b) the delay on the critical path is less than that of the Baugh-Wooley multiplier. Moreover, in a fixed-width Booth multiplier, further area can be saved by truncating the n least significant columns and preserving the n most significant columns of the partial products. However, unless the resulting errors are compensated, truncation introduces significant errors. In [1], a means of reducing the truncation errors was proposed. Unfortunately, it lacks systematic analysis and error estimation. Therefore, our goal is to introduce a systematic design methodology for low-error, area-time efficient Booth multipliers by a) using an error-compensating bias according to a new binary threshold; b) simulating the K value and error performance of the proposed error-compensating bias with various indices; c) selecting the index that results in the lowest error and satisfies an identical K value for a limited width n; and finally d) constructing a low-error Booth multiplier. For a bounded truncation error, this methodology allows the additional error-compensating circuit to be realized easily, with little area overhead from the bias-generation circuit. This leads to high-speed, area-time efficient multiplication suitable for modern high-performance VLSI and SOC (system on a chip) applications in which area, speed, throughput, and power consumption are critical parameters.
Modified Booth Multiplier
Consider the multiplication of two 2's-complement integers, an n-bit multiplicand A and an n-bit multiplier B, as
Copyright © 2007 The Institute of Electronics, Information and Communication Engineers
where b_{−1} = 0. The bracketed term in (2) takes a value in {−2, −1, 0, 1, 2}. Each recoded value selects an operation on the multiplicand A, and the results of all stages are added to generate the product. Substituting (2) into (1), we obtain
where
. Triplet scanning proceeds from b_{−1} to the MSB with a one-bit overlap, so only ⌊(n + 2)/2⌋ − 1 partial-product rows need to be computed for signed numbers, and ⌊(n + 2)/2⌋ rows for unsigned numbers. To simplify the representation of each partial product, we define the notation
where S_{i,j} represents the j-th bit product of the i-th row. In conventional 2's-complement Booth arithmetic, partial-product sign extension is required at each stage, but the extended sign bits incur a large power and area overhead. The sign S in an 8 by 8 multiplier can be expressed as
Figure 1 shows the subproduct array of the Booth multiplier for 8 × 8 multiplication according to Eqs. (4) and (5).
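The recoding described in this section can be sketched in a few lines of Python. The sketch assumes the standard radix-4 form of the bracketed term, −2·b_{2i+1} + b_{2i} + b_{2i−1} with b_{−1} = 0, and an even width n; the function names are illustrative and are not taken from the paper.

```python
def booth_radix4_digits(b, n):
    """Recode an n-bit two's-complement multiplier into n/2 digits in {-2, ..., 2}."""
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even width n")
    digits = []
    prev = 0  # b_{-1} = 0
    for i in range(n // 2):
        lo = (b >> (2 * i)) & 1        # b_{2i}; Python shifts are arithmetic,
        hi = (b >> (2 * i + 1)) & 1    # so negative b yields two's-complement bits
        digits.append(-2 * hi + lo + prev)
        prev = hi                      # one-bit overlap between triplets
    return digits


def booth_multiply(a, b, n):
    """Sum the n/2 partial products d_i * A * 4^i."""
    return sum(d * a * 4 ** i for i, d in enumerate(booth_radix4_digits(b, n)))
```

For n = 8, `booth_radix4_digits(9, 8)` gives four digits, one per partial-product row, which matches the row count ⌊(n + 2)/2⌋ − 1 = 4 for signed operands.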
Design of Fixed-Width Booth Multiplier
The 2n-bit product of the n by n 2's-complement multiplication can be divided into two sections:
It is known that the most accurate truncated product is given by
where [•]_r denotes • rounded to the nearest integer. Referring to Fig. 2 and assuming n = 8 without loss of generality, Eq. (8) can be written as
From Eq. (9), it can be observed that σ_{Temp} is mainly affected by S_{3,1} + S_{2,3} + S_{1,5} + S_{0,7} because these terms carry the most significant weight. Thus, it is convenient to define the main-error compensation term E_{main} and the remain-error compensation term E_{remain} as
From Eqs. (10) and (11), we can rewrite Eq. (8) as
It must be emphasized that σ_{Temp} is the most accurate error-compensation bias (also called the true rounding approach). This true rounding approach clearly results in a larger area (i.e., higher cost) than the direct-truncated Booth multiplier. Conversely, the smallest-area fixed-width two's-complement multiplier is obtained by truncating the LP section directly, but doing so results in a large truncation error. Before proposing a new systematic methodology, we define some useful terminology. To explore the influence of the index in the proposed binary thresholding, we first define a generalized index. Here the generalized index for 8 by 8 multipliers is defined as
where n + w is the number of columns kept in the subproduct, the binary parameters q_{3−w}, q_{2−w}, ..., q_0 ∈ {0, 1}, and the operator
in which T̄ is the binary complement of T. Furthermore, θ_{index,w}(q_3, q_2, q_1, q_0) is referred to as θ_{Q,w}, where
Note that Q ranges from 0 to 2^{(n/2)−w} − 1; for example, the 9th index is θ_{index,w=0}(1, 0, 0, 1), which is denoted as θ_{Q=9,w=0}.
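The indexing convention can be illustrated with a small Python sketch. It relies on two assumptions that the text only implies: Q is the integer whose binary digits are (q_3, q_2, q_1, q_0), read most significant first, and the operator leaves a bit unchanged when its q parameter is 0 and complements it when the parameter is 1. Both are consistent with θ_{index,w=0}(1, 0, 0, 1) = θ_{Q=9,w=0} and with θ_{Q=0,w=0} being the plain sum S_{3,1} + S_{2,3} + S_{1,5} + S_{0,7}. The function names are hypothetical.

```python
def index_number(q):
    """Read the binary parameters (q_3, ..., q_0) as an integer Q, MSB first."""
    value = 0
    for bit in q:
        value = (value << 1) | bit
    return value


def generalized_index(s_bits, q):
    """Sum the selected partial-product bits, complementing each one whose q is 1.

    s_bits holds (S_{3,1}, S_{2,3}, S_{1,5}, S_{0,7}) for n = 8 and w = 0.
    The complement-on-1 rule is an assumption of this sketch.
    """
    return sum(s ^ qi for s, qi in zip(s_bits, q))
```

With this convention, `index_number((1, 0, 0, 1))` is 9, and `generalized_index(bits, (0, 0, 0, 0))` reduces to the uncomplemented sum of the four bits.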
Realizable Error-Compensation Bias by Keeping n Most Significant Columns (w=0)
The concept of binary thresholding multipliers and methods for choosing their indices have been discussed in [5]. In this work, we propose a generalized methodology to further develop an adaptive multiplier with smaller truncation error. We can rewrite Eq. (12) as
where
In Eq. (16), the first term is referred to as the rough-adjustment term and the second term, [K]_r, is referred to as the fine-adjustment term. The rough-adjustment term can be easily realized in hardware once the index is determined. The fine-adjustment term, on the other hand, can be approximated by its expected value after rounding, obtained by analyzing the statistics [3]. To design a simple and realizable error-compensation circuit, we propose two types of binary thresholding for the error-compensation bias and impose one restriction on the value of K under w = 0. Both types of binary thresholding of the generalized index θ_{Q,w=0} are described as follows:
• Type 1:
• Type 2:
where K_1, K_2, K_3 and K_4 are the average values of K that satisfy θ_{Q,w=0} = 0, θ_{Q,w=0} > 0, θ_{Q,w=0} < n/2 and θ_{Q,w=0} = n/2, respectively. In this subsection, the values of K are restricted so that the chosen indices satisfy [K_i]_r ∈ {−1, 0, 1} for i = 1, 2, 3 and 4. Next, to achieve high compensation accuracy, we investigate the values of K and the error performance of the generalized index θ_{Q,w=0}. Note that Type 1 binary thresholding is a special case of the generalized index θ_{Q,w=0} with q_0 = q_1 = ... = q_{n−1} = 0. After performing an exhaustive search simulation on Type 1 binary thresholding for n = 8, we obtained the optimal values of K_1 and K_2 for all possible indices. However, the two indices θ_{Q=0,w=0} and θ_{Q=2^{n/2}−1,w=0} in Type 1 binary thresholding lead to a nonconstant K_1 for n ≤ 16. This phenomenon can be observed by computer simulation and verified by statistical techniques [3] similar to those described in Sect. 3.3. Moreover, Type 2 was proven to outperform Type 1 for array multipliers with w = 0, so we do not pursue further approximations for Type 1. Figure 3 illustrates the values of K_3 and K_4 obtained through exhaustive search simulation for n = 8. Corresponding to the rounded values of K_3 and K_4, we also simulated the average values, shown in Fig. 4. Considering the goal of smaller error and the restriction on K, we find that the θ_{Q=0,w=0} and θ_{Q=2^{n/2}−1,w=0} indices perform better (see Fig. 4).
Following the above procedures, we can simulate the values of K_3 and K_4 and the error performance for n from 4 to 16 in Type 2 binary thresholding. After the full-search simulation, we observe that the two specific indices θ_{Q=0,w=0} and θ_{Q=2^{n/2}−1,w=0} still achieve better performance, with the chosen indices satisfying [K_3]_r = −1 and [K_4]_r = 0 for n ≤ 16. Hence, the simple error-compensation biases with smaller truncation error in Type 2 binary thresholding are described as σ_{Type2,Q=0,w=0}
where θ_{Q=0,w=0} = S_{3,1} + S_{2,3} + S_{1,5} + S_{0,7} and σ_{Type2,Q=2^{n/2}−1,w=0}
where θ_{Q=0,w=0} = S_{3,1} + S_{2,3} + S_{1,5} + S_{0,7}. Equation (20) has been completely simulated for n ≤ 16 and can be mapped to a new structure.
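The kind of exhaustive tabulation used in this subsection can be sketched as follows. The sketch does not reproduce the exact definition of K. It only groups the ideal (true-rounding) carry out of the truncated columns by the value of θ_{Q=0,w=0} for all 8-bit operand pairs. The partial-product model is an assumption: each row is an (n + 1)-bit pattern of ±A or ±2A, negative rows are formed by one's complement plus a correction bit at the row's least significant column, and that correction bit is counted among the truncated columns. All function names are hypothetical.

```python
from collections import defaultdict

N = 8  # word length of the running example


def booth_digits(b, n=N):
    """Radix-4 Booth digits of an n-bit two's-complement multiplier (b_{-1} = 0)."""
    digits, prev = [], 0
    for i in range(n // 2):
        lo, hi = (b >> (2 * i)) & 1, (b >> (2 * i + 1)) & 1
        digits.append(-2 * hi + lo + prev)
        prev = hi
    return digits


def row_pattern(a, digit, n=N):
    """Return the (n + 1)-bit row pattern and its negation-correction bit."""
    if digit == 0:
        return 0, 0
    mask = (1 << (n + 1)) - 1
    pattern = (abs(digit) * a) & mask      # +A or +2A in n + 1 bits
    if digit < 0:
        return pattern ^ mask, 1           # one's complement, corrected by a +1 bit
    return pattern, 0


def tabulate_by_theta(n=N):
    """Mean rounded carry out of the truncated columns, grouped by theta_{Q=0,w=0}."""
    sums, counts = defaultdict(int), defaultdict(int)
    half = 1 << (n - 1)
    for b in range(-half, half):
        digits = booth_digits(b, n)
        for a in range(-half, half):
            lp = theta = 0
            for i, d in enumerate(digits):
                pattern, neg = row_pattern(a, d, n)
                top = n - 1 - 2 * i        # row bit that falls in column n - 1
                low = pattern & ((1 << (top + 1)) - 1)
                lp += (low + neg) << (2 * i)
                theta += (pattern >> top) & 1
            sums[theta] += (lp + half) >> n    # round-half-up of lp / 2^n
            counts[theta] += 1
    means = {t: sums[t] / counts[t] for t in sorted(counts)}
    return means, dict(counts)
```

Calling `tabulate_by_theta()` returns, for each observed θ, the average ideal bias and the number of operand pairs that produce it; a thresholding rule of the kind proposed here replaces each group by a small constant.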
Realizable Error-Compensation Bias by Keeping More than n Columns (w ≥ 1)
A lower truncation error can be obtained if more columns, i.e., the n + w most significant columns, are kept in hardware. However, the area cost then increases. Since the reduction and rounding errors are not of the same weight, Eq. (12) can be rewritten as
where ⌊•⌋ denotes the largest integer less than or equal to •.
Equation (23) treats the reduction and rounding errors separately. In the same way as before, to design a realizable error-compensation bias, the two types of binary thresholding for the error-compensation bias can be changed to
where K_1, K_2, K_3 and K_4 are defined as in Eqs. (18) and (19) except that w = 1. The restriction on K can be modified to [K_i]_r ∈ {0, 1, 2^{w−1} − 1, 2^{w−1}} for i = 1, 2, 3 and 4. Since the simulation procedures are identical to those described in Sect. 3.1, we present only the analysis and design for w = 1. In Type 1 binary thresholding, an exhaustive search finds one good index, as shown in Fig. 5. We observe that the specific index θ_{Q=0,w=1} achieves the best error performance given that [K_1]_r = 1 and [K_2]_r = 0, as shown in Fig. 6. For Type 2 binary thresholding, on the other hand, all the average errors are larger than those resulting from the best index in Type 1 thresholding, θ_{Q=0,w=1}. Therefore, we do not need to discuss Type 2. This completes the second step. As a consequence, a new smaller-error fixed-width Booth multiplier under w = 1 can be described and simplified as:
where θ_{Q=0,w=1} = S_{3,0} + S_{2,2} + S_{1,4} + S_{0,6}. In the third step, Eq. (29) can be mapped to a new structure as shown in Fig. 7. Note that the error-compensation circuit needs only three basic gates. For other values of w, K can be evaluated with the same procedures.
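The trade-off behind keeping n + w columns can be checked numerically with a short sketch. It is a simplification, not the hardware array of Fig. 1: each partial product is modeled as the fully sign-extended value d_i · A · 4^i, every bit below column n − w is dropped, and no compensation bias is added. The function names are hypothetical.

```python
def booth_digits(b, n):
    """Radix-4 Booth digits of an n-bit two's-complement multiplier (b_{-1} = 0)."""
    digits, prev = [], 0
    for i in range(n // 2):
        lo, hi = (b >> (2 * i)) & 1, (b >> (2 * i + 1)) & 1
        digits.append(-2 * hi + lo + prev)
        prev = hi
    return digits


def mean_truncation_error(n, w):
    """Mean uncompensated error, in output LSBs, when rows are cut below column n - w."""
    k = n - w                                   # number of discarded columns
    half = 1 << (n - 1)
    total = 0
    for b in range(-half, half):
        digits = booth_digits(b, n)
        for a in range(-half, half):
            kept = sum((((d * a) << (2 * i)) >> k) << k
                       for i, d in enumerate(digits))
            total += a * b - kept               # never negative: >> rounds toward -infinity
    return total / (1 << (2 * n)) / (1 << n)
```

Evaluating `mean_truncation_error(8, w)` for w = 0, 1, 2 shows the uncompensated error shrinking as more columns are kept, which is the error that the bias circuit for each w has to correct.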
Low-Error Fixed-Width Booth Multipliers with Large Width n
By full-search simulation, we find that θ_{Q=0,w=0} and θ_{Q=2^{n/2}−1,w=0} in Type 2 binary thresholding achieve better performance for small width n. Similarly, we observe that θ_{Q=0,w=1} in Type 1 binary thresholding leads to better results. It is difficult to simulate the performance for large width n because exhaustive simulation takes a great deal of computation time. In this section, we show that these indices can be adopted in designing fixed-width Booth multipliers for larger width n. That is, we verify [
• Case 1: θ_{Q=0,w=1} = 0
Note that E[S_{i,j}] = 3/8 and E[S̄_{i,j}] = 5/8 since the probability distribution of the input bits is assumed to be uniform. In this case, θ_{Q=0,w=1} = 0 is met only when S_{3,0} = S_{2,2} = S_{1,4} = S_{0,6} = 0. That is, Case 1 is a conditional probability case, and thus we deduce Eq. (30) from Eq. (23) as
Note that K_1 is proportional to 63n/256, so this error-compensation circuit is difficult to design for n ≥ 4. We observe that Case 1 is a minor case among all input combinations and that the rounded value of K_1 is always equal to or greater than one. To design a simple error-compensation circuit based on these two facts, we adopt a constant to approximate the value of K_1. Equation (30) can be approximately set to
Substituting Eq. (33) into Eq. (26), we obtain σ_{Type1,Q=0,w=1}
By combining Eqs. (32) and (34), we obtain Eq. (28) and simplify it as Eq. (29). The chosen index θ_{Q=0,w=1} is therefore suitable for the error-compensation bias for large width n. Similarly, for w ≥ 2, the statistical verification techniques and constant approximation can be applied. We can conclude that the error-compensation bias is
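The expectation E[S_{i,j}] = 3/8 used in this section can be checked by enumeration. The sketch below assumes a common bit-level row generator that the text does not spell out: a zero Booth digit gives a 0 bit, a digit of ±1 selects a_j, a digit of ±2 selects a_{j−1}, and a negative digit complements the selected bit. All five input bits are taken as independent and uniform, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def booth_digit(b_hi, b_mid, b_lo):
    """Radix-4 Booth digit of the triplet (b_{2i+1}, b_{2i}, b_{2i-1})."""
    return -2 * b_hi + b_mid + b_lo


def partial_product_bit(digit, a_j, a_jm1):
    """Bit S_{i,j} under the assumed select-and-complement row generator."""
    if digit == 0:
        return 0
    selected = a_j if abs(digit) == 1 else a_jm1
    return selected ^ 1 if digit < 0 else selected


def expected_partial_product_bit():
    """Exact E[S_{i,j}] over all equally likely input combinations."""
    cases = list(product((0, 1), repeat=5))
    ones = sum(partial_product_bit(booth_digit(h, m, l), a_j, a_jm1)
               for h, m, l, a_j, a_jm1 in cases)
    return Fraction(ones, len(cases))
```

Six of the eight triplets give a nonzero digit, and for each of them the selected bit is 1 half of the time, so the function returns 6/8 × 1/2 = 3/8; the complemented bit therefore has expectation 5/8.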
Performance and Area Comparisons
The new structure achieves better error performance than other multipliers. Compared with J-T-T's multiplier in [1], our multiplier (Type 2, w=0) achieves a lower average error. Compared with C-L-P's multiplier in [2], our multiplier (Type 1, w=2) has fewer gates and a shorter critical path and delay time, as listed in Table 1. The simulation results of the fixed-width Booth multipliers for various widths n are listed in Table 2. Our proposed fixed-width Booth multipliers use Type 2 thresholding with the index θ_{Q=0,w=0} and Type 1 thresholding with the index θ_{Q=0,w=2}. The proposed fixed-width Booth multiplier is clearly more accurate than the others. The improved performance is achieved by applying a better error-compensation bias to reduce the effect of truncation error. In addition, the new Booth multiplier saves considerable chip area while achieving a much smaller average error. Most importantly, the gate count and the critical delay of the proposed structure are better than those of J-T-T's and C-L-P's multipliers. Both comparisons are based on the Booth multiplier architecture that we designed, using our compensation circuit (Type 1, w=2) and the (ACGP 2) circuit provided by C-L-P, respectively. Thus, the proposed fixed-width 8 × 8 Booth multiplier shown in Fig. 8 combines area-time efficiency with excellent error performance.
Table 1 Comparison results of area and critical delay among various Booth multipliers and [3], [4] for n = 8.
DSP Application of Fixed-Width Booth Multipliers
In this section, we apply the proposed fixed-width multiplier to a 35-tap FIR filter for speech processing. First, for practical reasons [7], the maximum input voice data and filter coefficients in two's complement are normalized with 8-bit quantization. In the simulation, the temporary output is an accumulated value using 32 bits. Finally, the outputs are obtained by scaling the accumulated values. For convenience of comparison among various fixed-width Booth multipliers, we take 1000 samples each from the consonant part and the vowel part of "Chicken." Our concern here is whether the waveform filtered with the proposed fixed-width Booth multiplier is accurate, with the full-precision output serving as the control group. The filtered output signals are produced by the 35-tap low-pass FIR filter with different fixed-width Booth multipliers. From the comparison of four fixed-width Booth multipliers in the speech processing application in Fig. 9, we observe that the Type 1 Booth multiplier with θ_{Q=0,w=2} satisfies our requirement in both the consonant and the vowel parts.
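The experimental setup can be sketched as a filter loop with a pluggable multiplier. Everything specific in this sketch is an assumption: the paper does not list its filter coefficients, so the taps below come from a Hamming-windowed sinc with a hypothetical cutoff, and the multiplier shown is plain direct truncation, which stands in for any of the fixed-width designs compared in Fig. 9.

```python
import math


def quantize_q7(x):
    """Quantize a value in [-1, 1) to an 8-bit two's-complement integer."""
    return max(-128, min(127, round(x * 128)))


def lowpass_taps(num_taps=35, cutoff=0.25):
    """Hypothetical Hamming-windowed sinc low-pass taps, quantized to 8 bits."""
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(quantize_q7(ideal * window))
    return taps


def truncating_multiply(a, b, n=8):
    """Direct-truncation fixed-width product: keep the upper n bits of a * b."""
    return (a * b) >> n


def fir(samples, taps, multiply):
    """Direct-form FIR filter that accumulates fixed-width products."""
    out = []
    for t in range(len(samples)):
        acc = 0                                 # stands in for the 32-bit accumulator
        for k, h in enumerate(taps):
            if t - k >= 0:
                acc += multiply(samples[t - k], h)
        out.append(acc)
    return out
```

Passing a different `multiply` function, for example one that adds an error-compensation bias before truncation, and comparing the result against the full-precision sum of `a * b` terms scaled by 2^{-8} reproduces the structure of the comparison described above.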
Conclusions
This paper proposed a new methodology for designing low-error two's-complement fixed-width Booth multipliers. By properly choosing the generalized index, we derive a better error-compensation bias than previous works, which reduces the maximum error, the average error and the variance of the error, and thus improves the truncation error. Furthermore, these error-compensation biases can be easily implemented. They are well suited to VLSI digital signal processing applications where accuracy, area, and speed are crucial. In addition, these Booth multipliers can be easily applied to computing engines such as digital filters and wavelet transformers. Finally, we successfully applied the proposed fixed-width Booth multiplier to a digital FIR filter for a speech processing application. It has been shown that the performance of our multiplier for the consonant part is far better than that of existing fixed-width Booth multipliers. Future work includes the study of other binary thresholding schemes with our generalized index and the restriction on K, in order to design more useful and realizable fixed-width Booth multipliers.
