Introduction
Many applications, especially in signal processing, are inherently tolerant of imprecise calculation, so power consumption rather than accuracy becomes the critical design concern. When the full-width result does not need to be retained, two designs are commonly adopted: the post-truncated (P-T) method and the direct-truncated (D-T) method.
Whereas P-T calculates the full result accurately and then truncates it, D-T uses an estimation method to obtain the result [1]. Although D-T produces a larger truncation error than P-T, it significantly reduces area, power consumption, and delay, which makes it more attractive for modern integrated circuit (IC) design. Because of the substantial hardware reduction, the dynamic power dissipation is also reduced proportionally without resorting to sophisticated power-reduction techniques [2].
Numerical calculation includes fixed-point computation [1]-[8] and floating-point computation, as in the MMSE algorithm [9]-[10]. Much research has been carried out on error-compensated circuits in order to improve the accuracy of the D-T approach. These works can be grouped into Baugh-Wooley (BW) multipliers and Booth multipliers. The Booth multiplier has two notable advantages: 1) it reduces the number of partial products to about half of the original; 2) its critical-path delay is shorter than that of the Baugh-Wooley multiplier (as reported in the work on adaptive low-error fixed-width Booth multipliers). These advantages make this encoding technique widely used in ASIC-oriented products.
Several works have proposed estimation methods that achieve higher accuracy [1]-[8]. A constant-correction truncation (CCT) scheme has been proposed, which has a simple structure and is easy to implement; however, it also has significant problems [3]. First, the resulting product has a non-zero DC component. Second, limiting the maximum error to less than one LSB of the data path means that the area and power savings cannot be maximized. To address these problems, a data-dependent variable-correction truncation (VCT) scheme has been presented, which uses information from the partial product bits of the column adjacent to the truncated least significant bit (LSB) [4]. A pseudo-carry compensated truncation (PCT) scheme has then been demonstrated, which uses an adaptive pseudo-carry compensation [5]. An accuracy of 99.4% for a 16 × 16 multiplication has even been shown using an 8 × 8 multiplier [6]. The performance of different computing methods such as Wallace, Dadda, and 4:2 compressor (Compr. 4:2) designs has also been compared in both accurate and approximate settings [7]. Finally, a new approximate Wallace tree multiplier (AWTM) has been presented, which obtained a mean accuracy of 99.85% to 99.965% [8].
This brief puts forward a novel compensation algorithm based on probability statistics (PS). It counts the number of 1s in the binary multiplicand X and uses this count to obtain the probability of a 1 in X. Although the PS method has lower accuracy than the MLCP method, its circuit area, timing delay, and power consumption are all superior to those of the MLCP method. Overall, in the tradeoff between accuracy and circuit area, the PS method performs well within the accuracy tolerance.
The rest of this paper is organized as follows. Section II introduces Booth encoding, Booth decoding, and the method for saving sign-extension bits. The novel algorithm and its theoretical basis are given in Section III. Section IV compares accuracy, area, and power, and further demonstrates the performance of the multiplier with the proposed compensation circuit. Finally, Section V concludes the paper.
Booth Multiplier Implementation
Modified Booth encoding is widely used in ASIC-oriented products because of its reduction in the number of partial products [1] .
Radix-4 Booth Encoding
The n-bit product P can be expressed in two's complement representation as follows:
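For reference, the standard radix-4 (modified Booth) recoding of a two's complement operand can be written with the symbols used in this section. This is the textbook identity for the full product before truncation, assuming an even width n and y_{-1} = 0; it is offered as a sketch and may differ in notation from the brief's own equation.

```latex
\begin{align}
Y &= -y_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} y_i\,2^{i}
   = \sum_{i=0}^{n/2-1} YE_i\,2^{2i}, \\
YE_i &= -2\,y_{2i+1} + y_{2i} + y_{2i-1}, \qquad y_{-1} = 0, \\
P &= X \cdot Y = \sum_{i=0}^{n/2-1} \left( YE_i \cdot X \right) 2^{2i}.
\end{align}
```

Each digit YE_i takes a value in {-2, -1, 0, +1, +2}, so every partial product is a shifted, possibly negated copy of X.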
In this encoding process, when n is even, the binary multiplicator Y is shifted left by 1 bit and the top digit is then repeated twice in the two front bits of Y; when n is odd, Y is shifted left by 1 bit and the top digit is repeated only once. Assume that the multiplicator Y has n bits, where n is an even integer. The number of partial products of P is then clearly reduced from n to (n/2) + 1. Table 1 lists the three concatenated inputs y_{2i+1}, y_{2i}, and y_{2i-1} that the Booth encoder maps to YE_i. Table 2 shows the partial products with the corresponding YE_i for a 12-bit Booth encoder. In addition, because the top digit is repeated twice when Y is encoded, the result of this encoded part must be +0 or -0, so its partial product must be 0 and the bottom row of partial products can be omitted.
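The encoding step described above can be sketched in Python. This is a minimal illustration, not the brief's hardware: the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, the width n is assumed even, and the redundant top digit (always ±0) is already omitted, so n/2 digits are produced.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Return the radix-4 Booth digits YE_i (least significant first)
    of an n-bit two's complement value y. Assumes n is even."""
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even width n")
    if not -(1 << (n - 1)) <= y < (1 << (n - 1)):
        raise ValueError("y does not fit in n bits")
    bits = y & ((1 << n) - 1)  # two's complement bit pattern of y

    def bit(k: int) -> int:
        # y_{-1} is defined as 0 (the appended bit after the left shift)
        return 0 if k < 0 else (bits >> k) & 1

    # YE_i = -2*y_{2i+1} + y_{2i} + y_{2i-1}, each in {-2, -1, 0, 1, 2}
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(n // 2)]


def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum the n/2 partial products YE_i * X * 4^i (full, untruncated product)."""
    return sum(d * x * (1 << (2 * i))
               for i, d in enumerate(booth_radix4_digits(y, n)))
```

For a 12-bit multiplicator the list has six digits, matching the n/2 partial product rows discussed in the text.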
Booth Decoding
As shown in Table 2, n+1 bits are reserved for storing each row of partial products. After encoding, the partial product array with an even width n contains n/2 rows. If the result of Booth encoding of the multiplicator Y is positive, we define S = 1; otherwise, S = 0. Table 2 shows an example of how E and S are defined when n = 12.
PoS(ISCC 2017)057

A Novel Signed Truncated Booth Multiplier based on Probability
Yingran Tan
Fig. 1 (a) shows the partial product array in a Booth multiplier when n = 12. Each partial product row needs n+1 bits, and the extended sign bits introduced by Booth encoding can be simplified into a combination of Es and 1s.
Saving Method of Bit Extension
E is defined as follows. If the multiplicand and the result of Booth encoding of the multiplicator have different sign bits, or if the result of Booth encoding is -0, then E = 0. If they have the same sign bit, or if the result of Booth encoding is +0, then E = 1.
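The rule for E can be written as a small Python function. This is an illustrative sketch only: the name `e_bit` is hypothetical, and it assumes the usual Booth table convention that the triplet 000 encodes +0 and 111 encodes -0, which is not spelled out in the text.

```python
def e_bit(x_sign: int, y_hi: int, y_mid: int, y_lo: int) -> int:
    """Return E for one partial product row.

    x_sign is the sign bit of the multiplicand (1 = negative).
    (y_hi, y_mid, y_lo) are the Booth inputs (y_{2i+1}, y_{2i}, y_{2i-1}).
    """
    digit = -2 * y_hi + y_mid + y_lo  # Booth digit YE_i
    if digit == 0:
        # Assumed convention: 000 is +0 (E = 1), 111 is -0 (E = 0)
        return 0 if y_hi else 1
    digit_sign = 1 if digit < 0 else 0
    # Same signs give E = 1, different signs give E = 0
    return 1 if digit_sign == x_sign else 0
```

In effect, E is the complement of the sign of the partial product row, which is what allows the sign-extension bits to be replaced by a fixed pattern of Es and 1s.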
New Truncated Multiplier
Estimation Method based on Probability
The left red frame of Fig. 1 (b) marks the partial products that are calculated precisely using the Wallace-tree algorithm in this design. The right side of Fig. 1, by contrast, is estimated rather than calculated, using the method introduced below, in order to obtain the contribution of the right part to the left part. The probability of a 1 in the binary multiplicand X is defined as proX, as given by the formula below:
Assume that 0 and 1 occur with the same probability in the binary multiplicator Y, and define the probability of a 1 in Y as proY, that is, proY = 0.5. In the multi-bit case, such as p_n to p_k, if the first k bits are to be stored, the probability that a column produces a carry into the next more significant column can be defined as 0.5 × proX × proY (a carry arises only when two partial product bits equal to 1 meet). Each bit therefore has to be multiplied by its correlation-weighted index, which is a power of 2.
The complete calculation requires an estimate of the number of partial products in each column and of the contribution of each column position to the carry. For the 12-bit multiplier, the result is 5.2917 × proP.
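The probability-based estimate can be illustrated with a short Python sketch. Several points are assumptions rather than statements from the brief: proX is taken as the fraction of 1 bits in X, proP is taken as proX × proY (consistent with proP = 0.5 when every bit of X is 1), and the names `pro_x` and `estimated_carry` are hypothetical.

```python
PRO_Y = 0.5                 # 0 and 1 assumed equally likely in Y
CARRY_WEIGHT_12BIT = 5.2917  # theoretical weight quoted for n = 12


def pro_x(x: int, n: int = 12) -> float:
    """Assumed definition: fraction of 1 bits in the n-bit pattern of X."""
    ones = bin(x & ((1 << n) - 1)).count("1")
    return ones / n


def estimated_carry(x: int, n: int = 12,
                    weight: float = CARRY_WEIGHT_12BIT) -> float:
    """Estimated carry from the truncated columns: weight * proP,
    with proP assumed to be proX * proY."""
    pro_p = pro_x(x, n) * PRO_Y
    return weight * pro_p
```

With the hardware-friendly weight 5.25 and X = 0xFFF, this sketch gives 2.625, the extreme value used later when sizing the stored result.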
Hardware Implementation
Two Approximation Methods
Considering the hardware design, two approximation methods have been adopted in this multiplier.
First, for the carry estimation, MATLAB was used to simulate the error rate of this multiplier. The simulated value differs from the theoretical value: when the weight applied to the probability of X lies between 5.25 and 5.375, the error rate reaches its minimum of 1.6358.
In view of the hardware structure, 5.25·proP is used in place of 5.2917·proP. This amounts to adding three terms: the multiplicand X shifted left by two bits, the original multiplicand X, and the multiplicand X shifted right by two bits.
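The shift-and-add decomposition relies on 5.25 = 4 + 1 + 0.25. A minimal Python sketch follows; the function name `times_5_25_shift_add` is hypothetical, and integer shifts stand in for the hardware wiring.

```python
def times_5_25_shift_add(v: int) -> int:
    """Approximate 5.25 * v using two shifts and two additions.

    (v << 2) contributes 4*v, v contributes 1*v, and (v >> 2)
    contributes 0.25*v with the fractional part discarded.
    """
    return (v << 2) + v + (v >> 2)
```

Because only the right shift discards bits, the result equals 5.25·v rounded down to an integer, and it is exact whenever v is a multiple of 4.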
Second, the minimum number of bits needed to store the binary value of proP has to be determined. In the extreme case where all bits of X are 1, proP = 0.5 and the carry evaluates to 2.625, which is 10.101 in binary.
However, only the integer part is needed, so it is sufficient to keep two bits of the result. There are only three cases for n_x (the number of 1s in X, with n_x = 0, 1, ..., 12): when n_x ranges from 0 to 5, proP = 00; when it ranges from 6 to 10, proP = 01; and when it ranges from 11 to 12, proP = 10.
Both approximations also save area because less storage is required.
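The three-way mapping from n_x to the stored two-bit value can be sketched as a lookup in Python. The thresholds are taken directly from the text; the names `count_ones` and `carry_code` are hypothetical.

```python
def count_ones(x: int, n: int = 12) -> int:
    """Number of 1 bits (n_x) in the n-bit pattern of X."""
    return bin(x & ((1 << n) - 1)).count("1")


def carry_code(n_x: int) -> int:
    """Two-bit stored value for a 12-bit X, using the stated ranges:
    0..5 -> 00, 6..10 -> 01, 11..12 -> 10."""
    if not 0 <= n_x <= 12:
        raise ValueError("n_x must be between 0 and 12 for a 12-bit X")
    if n_x <= 5:
        return 0b00
    if n_x <= 10:
        return 0b01
    return 0b10
```

For example, `carry_code(count_ones(0xFFF))` selects the largest code, 0b10, since all twelve bits of X are 1.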
Other Area-Saving Measures
In addition to the methods mentioned above, a MUX-13 was first used to implement the divider for generating proX; however, it required too much area. This approach was therefore abandoned, and several OR gates are used instead.
Comparisons and Discussion
This section presents a comparison of the accuracy, area cost, power consumption and computation delay of several Booth multipliers.
Accuracy
In this brief, the average absolute error |ε| is defined as:
Table 3: Comparison of |ε| for various methods.
Table 3 shows |ε| for D-T, the proposed PS estimation method, and the previous MLCP work [1]. |ε| is the most crucial metric for comparing the accuracy of multipliers. Table 3 shows that the proposed truncated Booth multiplier sacrifices some accuracy compared with the other multipliers.
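One way to evaluate an average absolute error such as |ε| is an exhaustive sweep over all signed input pairs. The sketch below is an assumption-laden illustration: it takes |ε| to be the mean of |exact - approximate| expressed in units of a chosen LSB weight, which may not match the brief's exact normalization, and the name `average_absolute_error` is hypothetical.

```python
from itertools import product
from typing import Callable


def average_absolute_error(exact_fn: Callable[[int, int], int],
                           approx_fn: Callable[[int, int], int],
                           n: int,
                           lsb_weight: int = 1) -> float:
    """Mean of |exact - approx| over all n-bit signed (x, y) pairs,
    divided by lsb_weight (assumed normalization)."""
    lo, hi = -(1 << (n - 1)), 1 << (n - 1)
    total = 0
    count = 0
    for x, y in product(range(lo, hi), repeat=2):
        total += abs(exact_fn(x, y) - approx_fn(x, y))
        count += 1
    return total / (count * lsb_weight)
```

A call such as `average_absolute_error(lambda x, y: x * y, lambda x, y: ((x * y) >> 4) << 4, n=4, lsb_weight=16)` would compare an exact 4-bit product with one whose four low bits are dropped. An exhaustive sweep grows as 4^n, so for wide operands random sampling would be the practical substitute.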
The results show an accuracy sacrifice of 1.640, which certainly influences the subsequent stages of a calculation. Moreover, over many iterations the error can compound multiplicatively. However, as long as the number of iterations is limited, the error rate remains under control and acceptable, especially in modern error-tolerant electronic circuit design.
Table 5: Power consumption performance.
Circuit Performance
The area cost and the computation delay are critical in Booth multiplier designs. Table 4 lists the area and delay of the proposed truncated Booth multiplier and previous designs with various L and W values, for the fastest configuration and for the delay referred to in [1]. The area and delay figures were obtained by synthesizing the RTL design with the Synopsys Design Compiler and a TSMC 65-nm CMOS standard cell library.
All the multipliers are implemented using the Wallace tree architecture in Fig. 4, each with its own compensation circuit. The reference multiplier is synthesized directly by the DC tool. Even compared with this reference, when the system can tolerate the accuracy sacrifice, the improvement in delay and area obtained in exchange is substantial. The MLCP methods used exhaustive simulation to design the compensation circuit and therefore required a long simulation time to establish it [1]. The proposed method greatly reduces the simulation time. Although one design outperformed the other circuits in |ε|, it also has the largest circuit area and delay [1].
Power Consumption Performance
Power consumption is also a major criterion in Booth multiplier designs. Table 5 reports the minimum area and the power consumption of the proposed estimator and previous designs.
The design of a compensation circuit generally involves a tradeoff between accuracy and area. Within the tolerated accuracy range, the novel truncated Booth method greatly reduces power consumption, calculation delay, and circuit area. Therefore, the proposed Booth multiplier can meet the demands of both throughput and miniaturization.
Conclusion
In this paper, a novel truncated Booth multiplier is proposed for error-tolerant digital signal processing. Approximate computing appears to be a highly promising way to reduce power, area, and delay. At the cost of an accuracy sacrifice of 1.6358, the proposed design reduces area by 11%-13% relative to the multiplier synthesized directly by the Synopsys Design Compiler and by 51%-64% relative to the MLCP method, reduces delay by at least 2%, and reduces power consumption by 36%-39% relative to the Design Compiler multiplier and by 48%-64% relative to the MLCP method, which is a substantial improvement.
