In this brief, a probabilistic estimation bias (PEB) circuit for a fixed-width two's complement Booth multiplier is proposed.
I. INTRODUCTION
FIXED-WIDTH multipliers generate an output with the same width as the input. They are widely used in digital signal processing systems, such as the discrete cosine transform (DCT), finite-impulse-response (FIR) filters, and the fast Fourier transform (FFT). However, a computation error is introduced if the least significant (LS) half of the product is directly truncated. To reduce this error, many compensation techniques have been presented for array multipliers, and there is an apparent tradeoff between accuracy and hardware complexity. Recently, a growing number of compensation works have focused on reducing the truncation error of the Booth multiplier. Jou et al. presented statistical and linear regression analysis to reduce the hardware complexity; however, the truncation error was only partly reduced because the estimation information taken from the truncated part is limited. Song et al. determined the estimation threshold by using statistical analysis. Huang et al. presented a self-compensation approach using a conditional mean derived from exhaustive simulation. Nevertheless, such time-consuming exhaustive simulations and heuristic compensation strategies may introduce curve-fitting errors. Heuristic compensation bias circuits can reduce the error further by using more inputs from the encoder; however, these circuits incur more hardware overhead.
This brief proposes a probabilistic estimation bias (PEB) method for reducing the truncation error in a fixed-width Booth multiplier. The PEB formula is derived from a probabilistic analysis of the partial product array after the Booth encoder. A low-error and area-efficient PEB circuit is then obtained through a simple and systematic procedure, so the time-consuming exhaustive simulation and the heuristic design of the compensation circuit are avoided. Our simulation results validate the hardware efficiency and low error of the approach.
II. FIXED-WIDTH BOOTH MULTIPLIER
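Modified Booth encoding is widely used to reduce the number of partial products. Two L-bit inputs X and Y and the 2L-bit standard product SP (without truncation error) can be expressed in two's complement representation as follows:

$$X = -x_{L-1}2^{L-1} + \sum_{i=0}^{L-2} x_i 2^{i}, \qquad Y = -y_{L-1}2^{L-1} + \sum_{i=0}^{L-2} y_i 2^{i}, \qquad SP = X \cdot Y$$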
The modified Booth encoder maps three concatenated inputs y_{2i+1}, y_{2i}, and y_{2i−1} into the digit y'_i, as tabulated in Table I, where P{y'_i} denotes the probability of y'_i. After encoding, the partial product array has Q = L/2 rows for an even word length L. The corresponding partial products, expressed in terms of the bits of X, are tabulated in Table II, where the last column n_i denotes the sign of each partial product.
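As an illustration of Table I, the sketch below applies the standard radix-4 recoding y'_i = −2y_{2i+1} + y_{2i} + y_{2i−1} (with y_{−1} = 0) and computes P{y'_i} under the same assumption of uniformly distributed, independent input bits. The function names are hypothetical, and the recoding rule is the textbook one rather than a transcription of Table I; the final check confirms that the digits reconstruct the two's complement value of Y.

```python
from fractions import Fraction
from itertools import product


def booth_digit(y_hi: int, y_mid: int, y_lo: int) -> int:
    """Modified (radix-4) Booth recoding of (y_{2i+1}, y_{2i}, y_{2i-1})."""
    return -2 * y_hi + y_mid + y_lo


def booth_digits(y: int, L: int) -> list[int]:
    """Digits y'_0 .. y'_{Q-1} (Q = L/2) of an L-bit two's complement pattern y."""
    assert L % 2 == 0 and 0 <= y < (1 << L)
    digits = []
    for i in range(L // 2):
        y_lo = (y >> (2 * i - 1)) & 1 if i > 0 else 0  # y_{-1} is tied to 0
        digits.append(booth_digit((y >> (2 * i + 1)) & 1, (y >> (2 * i)) & 1, y_lo))
    return digits


def to_signed(v: int, L: int) -> int:
    """Interpret an L-bit pattern as a two's complement integer."""
    return v - (1 << L) if (v >> (L - 1)) & 1 else v


def digit_distribution(first_row: bool = False) -> dict[int, Fraction]:
    """P{y'_i} for uniform, independent input bits; in the first row y_{-1} = 0."""
    lo_values = (0,) if first_row else (0, 1)
    dist: dict[int, Fraction] = {}
    for y_hi, y_mid, y_lo in product((0, 1), (0, 1), lo_values):
        d = booth_digit(y_hi, y_mid, y_lo)
        dist[d] = dist.get(d, Fraction(0)) + Fraction(1, 4 * len(lo_values))
    return dist


if __name__ == "__main__":
    for d, p in sorted(digit_distribution().items()):
        print(f"P{{y'_i = {d:+d}}} = {p}")
    # The recoded digits must reproduce the signed value of Y: sum_i y'_i * 4**i == Y.
    L = 10
    assert all(
        sum(d * 4**i for i, d in enumerate(booth_digits(y, L))) == to_signed(y, L)
        for y in range(1 << L)
    )
```

Under these assumptions, P{y'_i = 0} = P{y'_i = ±1} = 1/4 and P{y'_i = ±2} = 1/8 for i ≠ 0. In the first row, the tie y_{−1} = 0 leaves only 0, +1, −1, and −2, each with probability 1/4, which is why P_{0,0} and n_0 need separate treatment.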
An example of a 10 × 10 fixed-width Booth multiplier with the Booth encoder is displayed in Fig. 1. The partial product array can be divided into two parts: the main part (MP), which includes the ten most significant columns (MSCs), and the truncation part (TP), which includes the ten LS columns (LSCs). The SP can be rewritten as follows:

$$SP = MP + TP$$
In fixed-width multiplication, TP is estimated, and the quantized product QP can be defined as

$$QP = MP + \sigma \cdot 2^{L}$$
where σ, the estimation bias (EB) from TP, can be further decomposed into the TPMajor (the MSC of TP) and TPminor (the remaining LSCs of TP) parts as

$$\sigma = \mathrm{Round}\left(TP_{\mathrm{Major}} + TP_{\mathrm{minor}}\right)$$

$$TP_{\mathrm{minor}} = TP_{m1} + TP_{m2} \qquad (6)$$

where Round(k) rounds k to the nearest integer, and TPMajor and TPminor are expressed in units of 2^L, the weight of the LSC of MP. As Fig. 1 shows, TPMajor contributes more to the EB σ than TPminor does; therefore, to reduce the truncation error, σ is obtained by calculating TPMajor exactly and estimating TPminor. For this estimation, the expected values of all elements in TPminor, including the n_i, are derived. First, we derive the expected values (probabilities of being one) of all elements in TPminor except P_{0,0} and n_0. Taking column P_{0,i} (i ≠ 0) in Table II as an example, we sum up the expected values of the nonzero terms in the third, fourth, and sixth rows. When the third row (y'_i = 1) is considered, the expected value of x_0 is 1/2 because each input bit is assumed to be uniformly distributed. Tracing back to Table I for the probabilities with which these rows occur then gives E[P_{0,i}] = 3/8 for i ≠ 0. In the first row, y_{−1} = 0, so y'_0 takes the values 0, 1, −1, and −2, each with probability 1/4. The expected value E[P_{0,0}] can therefore be derived as follows:

$$E[P_{0,0}] = \tfrac{1}{2}P\{y'_0 = 1\} + \tfrac{1}{2}P\{y'_0 = -1\} + 1 \cdot P\{y'_0 = -2\} = \tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{4} = \tfrac{1}{2}$$
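Similarly, E[n_0] = P{y'_0 = −1} + P{y'_0 = −2} = 1/2 as well. Hence, the expected values of all elements (including n_i) in TPminor are obtained as follows:

$$E[P_{0,0}] = E[n_0] = \tfrac{1}{2}$$

$$E[P_{j,i}] = E[n_i] = \tfrac{3}{8} \quad \text{for all other } P_{j,i} \text{ and } n_i \text{ in } TP_{\mathrm{minor}}$$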
III. PROPOSED PEB
Based on (9) and (10), the PEB formula is derived. The proposed PEB circuit is then implemented through systematic steps that provide a simple and extendable solution for long fixed-width (L ≥ 16) Booth multipliers.
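The expected values that the PEB formula builds on can be cross-checked with exact rational arithmetic. The sketch below is a minimal check under assumptions not spelled out in the text: Table II is taken to follow the usual rule (y'_i = ±1 selects x_j, y'_i = ±2 selects x_{j−1} with x_{−1} = 0, a negative digit inverts the selected bit and sets n_i = 1, and y'_i = 0 gives an all-zero row with n_i = 0), P_{j,i} is placed in column j + 2i and n_i in column 2i, and TPminor (columns 0 to L − 2) is expressed in units of 2^L, the weight of the LSC of MP. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def digit_distribution(first_row: bool) -> dict[int, Fraction]:
    """P{y'_i}: y'_i = -2*y_{2i+1} + y_{2i} + y_{2i-1}, uniform bits, y_{-1} = 0 in row 0."""
    lo_values = (0,) if first_row else (0, 1)
    dist: dict[int, Fraction] = {}
    for y_hi, y_mid, y_lo in product((0, 1), (0, 1), lo_values):
        d = -2 * y_hi + y_mid + y_lo
        dist[d] = dist.get(d, Fraction(0)) + Fraction(1, 4 * len(lo_values))
    return dist


def expected_pp_bit(j: int, i: int) -> Fraction:
    """E[P_{j,i}] under the assumed partial-product rule."""
    e = Fraction(0)
    for d, p in digit_distribution(first_row=(i == 0)).items():
        if d == 0:
            continue  # zero row: every bit is 0
        if abs(d) == 2 and j == 0:
            # the shifted-in bit x_{-1} = 0 is inverted to 1 only for y'_i = -2
            e += p * (1 if d < 0 else 0)
        else:
            e += p * Fraction(1, 2)  # x_j or x_{j-1}, possibly inverted
    return e


def expected_sign_bit(i: int) -> Fraction:
    """E[n_i] = P{y'_i < 0}."""
    return sum(
        (p for d, p in digit_distribution(first_row=(i == 0)).items() if d < 0),
        Fraction(0),
    )


def expected_tp_minor(L: int) -> Fraction:
    """E[TPminor] in units of 2**L (the LSC weight of MP); columns 0 .. L-2."""
    total = Fraction(0)
    for i in range(L // 2):
        total += expected_sign_bit(i) * 2 ** (2 * i)  # n_i lies in column 2i
        for j in range(L - 1 - 2 * i):  # P_{j,i} lies in column j + 2i <= L - 2
            total += expected_pp_bit(j, i) * 2 ** (j + 2 * i)
    return total / 2**L


if __name__ == "__main__":
    for L in (8, 10, 12, 16, 32):
        closed_form = Fraction(3 * L, 32) + Fraction(1, 2 ** (L + 2))
        assert expected_tp_minor(L) == closed_form
        print(L, float(expected_tp_minor(L)))
```

With these assumptions the sketch reproduces E[P_{0,0}] = E[n_0] = 1/2, gives 3/8 for every other element, and matches E[TPminor] = 3L/32 + 2^{−2((L/2)+1)} exactly for every even L tested.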
A. Proposed PEB Formula
To simplify the derivation, we divide TPminor into two groups, TPm1 and TPm2, as displayed in Fig. 1(b). Group TPm1 includes the columns containing n_i and is expressed in (11), where Q = L/2. Substituting (7) and (8) into (11), the expected value of TPm1 simplifies to (12). Similarly, the remaining group TPm2 and its expected value are given in (13). Combining (12) and (13), the expected value of TPminor is

$$E[TP_{\mathrm{minor}}] = \frac{3L}{32} + 2^{-2((L/2)+1)}$$

where the last term, 2^{−2((L/2)+1)}, can be neglected because it is much smaller than the first term, 3L/32, particularly for large L. As a result, the expected value of TPminor can be estimated as

$$E[TP_{\mathrm{minor}}] \approx \frac{3L}{32} = A + b$$

where A and b are the integer and fractional parts of 3L/32, respectively. B is set to 1 if b ≥ 0.5; otherwise, B = 0. Representative values of 3L/32 are:

L        8      10      12      16     32
3L/32    0.75   0.9375  1.125   1.5    3

Substituting this estimate into (4), we obtain the PEB formula as follows:
B. Proposed PEB Circuit Using the Systematic Procedure
The PEB formula (16) can easily be realized with full adders (FAs) and half adders (HAs). The PEB circuit is obtained through the following systematic steps:
1) Find the integer A and the bit B by calculating the PEB in (15).
2) Generate A estimation carries (ec_0, ..., ec_{A−1}), and add them to the LSC of MP.
3) Sum up bit B and the elements of the set {TPMajor} = {P_{L−1,0}, P_{L−3,1}, ..., P_{1,Q−1}} with an FA/HA tree to produce the remaining estimation carries (ec_i), which are added to the LSC of MP, and a sum (for rounding). The detailed procedure is as follows:
a) Add bit B and the set {TPMajor} in carry-save form [16], repeatedly adding the resulting sums to produce the ec_i until only one sum is left.
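The procedure above can be modeled behaviorally, not at gate level, by exhaustive simulation for a small L. This sketch assumes the usual partial-product rule (±1 selects x_j, ±2 selects x_{j−1}, a negative digit inverts the bits and sets n_i = 1, with P_{j,i} in column j + 2i and n_i in column 2i). Because only sub-step a) of step 3) is listed, it further assumes that the single leftover sum acts as a round-half-up bit added to the LSC of MP, so that the whole compensation collapses to σ = A + ⌈(B + m)/2⌉, where m is the number of ones in {TPMajor}. A direct-truncated (DT) multiplier is modeled as σ = 0. All helper names are hypothetical.

```python
import math
from fractions import Fraction


def booth_tp(x: int, y: int, L: int) -> tuple[int, int]:
    """Hypothetical helper: returns (tp, m), where tp is the integer value of all
    partial-product bits in columns 0..L-1 (TP) and m is the number of ones in
    column L-1 (the set {TPMajor}), for L-bit input patterns x and y."""
    tp, m = 0, 0
    for i in range(L // 2):
        y_lo = (y >> (2 * i - 1)) & 1 if i > 0 else 0  # y_{-1} = 0
        d = -2 * ((y >> (2 * i + 1)) & 1) + ((y >> (2 * i)) & 1) + y_lo
        if d == 0:
            continue  # zero row and n_i = 0
        mag = x if abs(d) == 1 else x << 1  # low L bits of |y'_i| * X
        neg = 1 if d < 0 else 0
        tp += neg << (2 * i)  # n_i sits in column 2i
        for j in range(L - 2 * i):  # bit P_{j,i} sits in column j + 2i <= L - 1
            bit = ((mag >> j) & 1) ^ neg  # inverted for negative digits
            tp += bit << (j + 2 * i)
            if j + 2 * i == L - 1:
                m += bit
    return tp, m


def peb_constants(L: int) -> tuple[int, int]:
    """A = integer part of 3L/32; B = 1 iff the fractional part b >= 1/2."""
    est = Fraction(3 * L, 32)
    A = math.floor(est)
    return A, int(est - A >= Fraction(1, 2))


def to_signed(v: int, L: int) -> int:
    """Interpret an L-bit pattern as a two's complement integer."""
    return v - (1 << L) if (v >> (L - 1)) & 1 else v


def exhaustive_errors(L: int) -> tuple[list[int], list[int], Fraction]:
    """Integer errors QP - SP for PEB and DT over all input pairs, and the mean of
    TPminor in units of 2**L."""
    A, B = peb_constants(L)
    peb, dt, tp_minor_sum = [], [], 0
    for x in range(1 << L):
        for y in range(1 << L):
            sp = to_signed(x, L) * to_signed(y, L)
            tp, m = booth_tp(x, y, L)
            assert (sp - tp) % (1 << L) == 0  # MP holds only multiples of 2**L
            mp = (sp - tp) >> L  # MP in units of 2**L
            sigma = A + (B + m + 1) // 2  # assumed PEB: A + ceil((B + m) / 2)
            peb.append(((mp + sigma) << L) - sp)
            dt.append((mp << L) - sp)  # direct truncation: sigma = 0
            tp_minor_sum += tp - (m << (L - 1))
    return peb, dt, Fraction(tp_minor_sum, 1 << (3 * L))


if __name__ == "__main__":
    L = 8
    peb, dt, mean_minor = exhaustive_errors(L)
    scale = 1 << L  # report errors in units of the output LSB 2**L
    print("mean TPminor:", float(mean_minor), "  3L/32:", 3 * L / 32)
    print("PEB mean |error|:", sum(map(abs, peb)) / len(peb) / scale)
    print("DT  mean |error|:", sum(map(abs, dt)) / len(dt) / scale)
```

Besides comparing mean absolute errors, the run checks that SP − TP is always a multiple of 2^L and that the exhaustive mean of TPminor equals 3L/32 + 2^{−2((L/2)+1)}. It is a sanity check of the idea, not a reproduction of Tables IV or V.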
IV. PERFORMANCE COMPARISONS

A. Fixed-Width Booth Multiplier
In Table IV, Cadence System-on-Chip (SoC) Encounter is used with the Taiwan Semiconductor Manufacturing Company (TSMC) 0.18-μm standard cell library to implement all the listed circuits; the area (in square micrometers) and power consumption (in milliwatts) are normalized to those of the post-truncated Booth multiplier, as shown in parentheses. The accuracy is evaluated in terms of the absolute average error |ε'|, the maximum error ε_M, the mean square error ε_ms, the average error ε', and the variance of the absolute error ε_v, defined as
where Avg{·}, |N|, Max{·}, and Var{·} denote the average operation, the absolute value of N, the maximum operation, and the variance operation, respectively. Table V shows the error comparisons of existing fixed-width Booth multipliers for various lengths L, where the numbers in parentheses are the truncation errors of direct-truncated (DT) multipliers, which are defined in (17). Compared with [9] and [14], our proposed PEB circuit provides the smallest truncation errors, except for the average error, with the same or only 1% more hardware overhead. It is also worth noting that the designs of [10] and [15] outperform [9], [14], and our PEB circuit in these error metrics, at the cost of more hardware. In general, a tradeoff exists between hardware overhead and accuracy in these compensation circuits; the larger hardware overheads of [10] and [15] come from their bias generation circuits and encoders. Because our compensation bias is derived from a theoretical deduction, our PEB circuit can easily be extended to higher-accuracy fixed-width multiplication by using more information from TPminor, at the penalty of more area. Unlike previous compensation circuits for Booth multipliers, our PEB circuit requires neither exhaustive simulation nor heuristic bias circuit design.

TABLE V. Comparisons of the absolute average error |ε'|, maximum error ε_M, mean square error ε_ms, average error ε', and variance of absolute error ε_v.

Fig. 4. Core layout and characteristics of the DCT core using the proposed PEB circuit.
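The five accuracy metrics can be computed from a list of per-input errors. This is a sketch under assumptions: each error is taken as e = QP − SP, already normalized (for example, by 2^L), ε_M is taken as Max{|e|}, and Var{·} is the population variance because an exhaustive simulation covers every input pair. The keys of the returned dictionary are hypothetical names for |ε'|, ε_M, ε_ms, ε', and ε_v.

```python
from statistics import fmean, pvariance


def error_metrics(errors: list[float]) -> dict[str, float]:
    """Accuracy metrics for errors e = QP - SP, assumed already normalized
    (e.g., divided by the output LSB weight 2**L)."""
    abs_err = [abs(e) for e in errors]
    return {
        "abs_avg": fmean(abs_err),                    # |eps'| = Avg{|e|}
        "max": max(abs_err),                          # eps_M = Max{|e|} (assumed on |e|)
        "mean_square": fmean(e * e for e in errors),  # eps_ms = Avg{e^2}
        "avg": fmean(errors),                         # eps' = Avg{e}
        "var_abs": pvariance(abs_err),                # eps_v = Var{|e|}, population variance
    }


if __name__ == "__main__":
    print(error_metrics([-1.0, 0.5, 0.5, 0.0]))
```

Applied to errors collected from an exhaustive simulation, these give the five quantities compared in Table V.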
B. Application Example: DCT
To demonstrate the accuracy in a real application, the proposed low-error PEB circuit is applied to an 8 × 8 2-D DCT [17]. The test image "Lena" is 512 × 512 pixels, with each pixel represented by 8-bit, 256-gray-level data. The accuracy of the DCT core is evaluated by the peak signal-to-noise ratio (PSNR). The PSNR and the synthesized area are compared in Table VI. Compared with the DCT core using standard Booth multipliers, the DCT core using the proposed PEB circuit reduces the area by 23% at a PSNR penalty of 4 dB. On the other hand, the PSNR of the DCT core using the proposed PEB circuit is more than 17 dB higher than that of the DT approach, with only 2% more hardware overhead. To implement the DCT with the proposed PEB circuit on a chip, we use Synopsys Design Compiler to synthesize the register-transfer-level design and Cadence SoC Encounter to run placement and routing. Fig. 4 shows the layout view and the characteristics of the architecture. Implemented in a 1.8-V TSMC 0.18-μm 1P6M CMOS process, the proposed DCT core operates at a clock rate of 55 MHz, and the core size is 501 μm × 508 μm.
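The brief reports PSNR without restating its definition; the conventional form for 8-bit images is PSNR = 10 log10(255² / MSE). The sketch below assumes that definition and compares an original image with its reconstruction after the DCT path; how the reconstruction is produced (forward and inverse transforms, quantization) is outside this sketch, and the function name is hypothetical.

```python
import math


def psnr(reference: list[list[int]], decoded: list[list[int]], peak: int = 255) -> float:
    """PSNR in dB between two equally sized grayscale images (8-bit data: peak = 255)."""
    if len(reference) != len(decoded) or any(
        len(r) != len(d) for r, d in zip(reference, decoded)
    ):
        raise ValueError("images must have the same size")
    sq_err = sum((a - b) ** 2 for r, d in zip(reference, decoded) for a, b in zip(r, d))
    count = sum(len(r) for r in reference)
    if sq_err == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak**2 / (sq_err / count))
```

For a 512 × 512 image such as "Lena", the two arguments would simply be 512 rows of 512 gray levels each.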
V. CONCLUSION
In this brief, we have applied probabilistic analysis to the truncated two's complement fixed-width Booth multiplier and derived the PEB formula. A simple and systematic procedure has then been presented to design the compensation circuit based on the PEB formula and the probabilistic analysis. Compared with existing works, the proposed method provides a smaller area and smaller truncation errors. Our PEB circuit does not need exhaustive simulations or heuristic compensation strategies, which tend to introduce curve-fitting errors and require prohibitive, exponentially growing simulation time. Furthermore, in the DCT application, the proposed PEB Booth multiplier improves the PSNR by 17 dB with only a 2% area penalty compared with the DT method. In future work, our PEB circuit can be extended to higher-accuracy fixed-width multiplication by using more inputs from TPminor at the cost of more hardware overhead.
