Abstract-In this brief, two designs of low-error fixed-width sign-magnitude parallel multipliers and two's-complement parallel multipliers for digital signal processing applications are presented. Given two n-bit inputs, the fixed-width multipliers generate n-bit (instead of 2n-bit) products with low product error, but use only about half the area and less delay when compared with a standard parallel multiplier. In them, cost-effective carry-generating circuits are designed, respectively, to make the products generated more accurately and quickly. Applying the same approach, a low-error reduced-width multiplier with output bit-width between n and 2n has also been designed. Experimental results show that the proposed fixed-width and reduced-width multipliers have lower error than the other fixed-width multipliers compared and are still cost-effective. Due to these properties, they are very suitable for use in many multimedia and digital signal processing applications such as digital filtering, arithmetic coding, wavelet transformation, echo cancellation, etc.
I. INTRODUCTION
In general, the multiplier is one of the core components in multimedia and digital signal processing (DSP) chips because it dominates these chips' performance and area. To get a higher speed, parallel multipliers are usually adopted at the expense of high area complexity. In the past, many parallel multiplication algorithms (architectures) (e.g., [1]-[3]) have been proposed to reduce the chip area and increase the speed of the multipliers. This brief proposes another approach to significantly reduce the chip area of the parallel multipliers without sacrificing performance. The approach is based on the fact that the multiplication operations used in many multimedia and DSP applications [4], [5] usually have the special fixed-width property. That is, their input data and output product have the same bit width.
One can obtain a fixed-width multiplier by directly omitting about half the adder cells of the conventional parallel multiplier, but a significant error would be introduced in the resulting product, and this is undesirable for many fixed-width multimedia and DSP applications. Kidambi et al. [6] gave a simple method, i.e., adding a constant bias to the retained cells to reduce the error of the fixed-width multiplier. However, its product error is still large. In this brief, we propose low-error fixed-width multipliers for both the sign-magnitude and two's-complement formats. In them, efficient carry-generating circuits, which feed the revising information to the carry inputs of the retained adder cells, are designed to significantly reduce the product error. In addition, the proposed fixed-width multipliers still use only about half the area of the standard parallel multiplier and have less delay. Our design strategy for fixed-width multipliers has also been applied to design the reduced-width multiplier, which is useful in some special applications such as wavelet transformation [7], [9]. Experimental results show that the proposed fixed-width and reduced-width multipliers have lower error than the other fixed-width multipliers compared and are still cost-effective.
The design method and architectures of the proposed fixed-width multipliers are introduced in Section II. In addition, this design strategy is extended to design the reduced-width multipliers. In Section III, the error and area comparisons of the proposed fixed-width multipliers with other fixed-width multipliers are reported.
II. DESIGN OF FIXED-WIDTH PARALLEL MULTIPLIERS
This section first describes the method used to design the low-error fixed-width sign-magnitude parallel multiplier and proposes the multiplier's architecture. Then, similar ideas are applied to the design of the fixed-width two's-complement and reduced-width parallel multipliers.
A. Fixed-Width Sign-Magnitude Multiplier
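Considering the multiplication of two n-bit inputs X and Y, a standard multiplier performs the following operations to obtain the 2n-bit product P:
$$P = XY = \left(\sum_{i=0}^{n-1} x_i 2^i\right)\left(\sum_{j=0}^{n-1} y_j 2^j\right) = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} x_i y_j 2^{i+j} = \sum_{i=0}^{2n-1} P_i 2^i = MP + LP \quad (1)$$
where $x_i$ and $y_j$ are the input bits, $P_i$ is the $i$th product bit, $MP = \sum_{i=n}^{2n-1} P_i 2^i$ is the most-significant part of the product, and $LP = \sum_{i=0}^{n-1} P_i 2^i$ is the least-significant part.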
Fig. 1 also shows the sections generating MP and LP for n = 6; in the shaded region of Fig. 1, the circuit producing LP is also denoted as LP, and the column (column circuit) generating $P_i$ is also denoted as $P_i$. A fixed-width multiplier can be obtained directly by removing the shaded region and feeding each carry input of the remaining part with 0 to form a circuit denoted as $MP_0$, which is different from the circuit used to generate MP. This saves about half the area of the circuit. However, a significant error is introduced in the fixed-width product.
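The size of the truncation error can be checked numerically. The following Python sketch is an illustration only, not the authors' circuit: it treats X and Y as unsigned magnitudes and assumes that the truncated array outputs exactly the sum of the retained bit-products $x_i y_j$ with $i + j \ge n$. The helper names `bit`, `exact_mp`, and `truncated_mp0` are hypothetical.

```python
def bit(value, k):
    """Return bit k of a non-negative integer."""
    return (value >> k) & 1


def exact_mp(x, y, n):
    """MP: the n most-significant bits of the exact product, at their original weight."""
    return ((x * y) >> n) << n


def truncated_mp0(x, y, n):
    """MP_0 (assumed model): sum of the retained bit-products, carry inputs fed with 0."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= n:  # only columns n .. 2n-1 are kept
                total += (bit(x, i) & bit(y, j)) << (i + j)
    return total


n = 6
x = y = 2**n - 1  # all input bits set: the largest loss of carries
print(exact_mp(x, y, n) - truncated_mp0(x, y, n))  # prints 320, i.e. (n - 1) * 2**n
```

Under this model the difference is always a multiple of $2^n$, because only the carries out of column $P_{n-1}$ are lost.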
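To design a low-error fixed-width multiplier, we first analyze the source of errors generated by $MP_0$, and then derive a small carry-generating circuit Cg to feed each carry input of $MP_0$ to reduce errors effectively. Let $\varepsilon$ denote the difference between the two products produced by the standard multiplier and circuit $MP_0$; it is caused by the carries generated from column circuit $P_{n-1}$ (i.e., the column circuit generating $P_5$ in Fig. 1 for n = 6) in circuit LP. Let $\alpha_i$ denote the sum of carries generated from column circuit $P_i$; then $\varepsilon = \alpha_{n-1} \times 2^n$. Since $0 \le \alpha_{n-1} \le n - 1$, then $0 \le \varepsilon \le (n - 1) \times 2^n$. According to (1) and the multiplier architecture shown in Fig. 1, we have
$$\alpha_{n-1} = \left\lfloor \frac{1}{2}\left(x_{n-1}y_0 + x_{n-2}y_1 + \cdots + x_0 y_{n-1}\right) + \frac{1}{4}\left(x_{n-2}y_0 + \cdots + x_0 y_{n-2}\right) + \cdots + \frac{1}{2^n} x_0 y_0 \right\rfloor = \left\lfloor \frac{\beta}{2} + \lambda \right\rfloor \quad (5)$$
where
$$\beta = \sum_{i+j=n-1} x_i y_j, \qquad \lambda = \sum_{m=2}^{n} \frac{1}{2^m} \sum_{i+j=n-m} x_i y_j.$$
Equation (5) shows that $\alpha_{n-1}$ is dominated by the bit-products $x_i y_j$, where $i + j = n - 1$, since they have the largest weight ($2^{n-1}$ in LP). A low-cost Cg can be designed easily if a simple relationship between $\alpha_{n-1}$ and $\beta$ is found. The following Lemmas and Theorems first find the relationship between $\beta$ and $\lambda$, and then that between $\alpha_{n-1}$ and $\beta$.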
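The decomposition of the lost carries into a dominant column term and a remainder can be verified exhaustively for small n. The sketch below assumes unsigned magnitudes and uses the symbols $\beta$ (sum of the bit-products in column $n-1$) and $\lambda$ (weighted sum of the bit-products in the lower columns); the function names are hypothetical and exact rational arithmetic is used so that the floor is not affected by rounding.

```python
from fractions import Fraction
from math import floor


def bit(value, k):
    return (value >> k) & 1


def column_sum(x, y, c):
    """Sum of the bit-products x_i * y_j with i + j == c."""
    return sum(bit(x, i) & bit(y, c - i) for i in range(c + 1))


def beta(x, y, n):
    """Number of bit-products in column n-1 that equal 1."""
    return column_sum(x, y, n - 1)


def lam(x, y, n):
    """Weighted sum of the lower columns: column n-m carries weight 1/2**m."""
    return sum(Fraction(column_sum(x, y, n - m), 2**m) for m in range(2, n + 1))


def carry_sum(x, y, n):
    """Exact sum of carries leaving column n-1 of the bit-product array."""
    low = sum(column_sum(x, y, c) << c for c in range(n))
    return low >> n


n = 5
assert all(
    carry_sum(x, y, n) == floor(Fraction(beta(x, y, n), 2) + lam(x, y, n))
    for x in range(2**n) for y in range(2**n)
)
```

The check passes because the low part of the array, divided by $2^n$, is exactly $\beta/2 + \lambda$.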
Let X and Y be the two n-bit input operands of a multiplier; then the bits of X and Y, denoted as $x_i$ and $y_i$ for $0 \le i \le n - 1$, are called input bits. In the following derivation, we assume that each input bit equals 1 with probability 0.5, and the probability that bit-product $x_i y_j$ of the multiplier equals 1 is denoted as $p_1(x_i y_j)$.
Lemma 1: For a given $\beta$ (the number of bit-products $x_i y_j$ with $i + j = n - 1$ that equal 1), there are at most $n + \beta$ and at least $2\beta$ bits in X and Y equal to 1.
Proof: See the Appendix.
Lemma 2: Given a $\beta$, the probability of any input bit of the multiplier equal to 1 is $(n + 3\beta)/4n$, and $p_1(x_i y_j) = (n + 3\beta)^2/16n^2$.
The curves of $p_1(x_i y_j)$ for different bit-widths n, denoted as $p_{1n}$, versus $\beta$ are drawn in Fig. 2. We can see that $p_{1n}$ is approximately proportional to $\beta$. For simplification, a straight line is found to approximate the curve of $p_{1n}$ by using linear regression analysis. The straight line found is $\beta/n$, plotted on the ordinate. The straight lines of $\beta/n$ versus $\beta$ for different n's are also drawn in Fig. 2 for comparison. In addition, the average relative errors between $\beta/n$ and $(n + 3\beta)^2/16n^2$ for some given n's are listed in Table I. The results show that $\beta/n$ is indeed a close approximation to $p_{1n}$. Therefore, we get
$$p_1(x_i y_j) = \frac{(n + 3\beta)^2}{16n^2} \approx \frac{\beta}{n}. \quad (7)$$
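The two expressions being compared can be tabulated directly. The sketch below evaluates the Lemma 2 probability and the straight line $\beta/n$, and computes a mean relative error over $\beta = 0, \ldots, n$. That averaging range and the function names are assumptions made for illustration; the paper's Table I may use a different definition.

```python
def p1_lemma2(n, beta):
    """Probability from Lemma 2 that a bit-product equals 1."""
    return (n + 3 * beta) ** 2 / (16 * n * n)


def p1_line(n, beta):
    """Straight-line approximation beta / n."""
    return beta / n


def mean_relative_error(n):
    """Mean of |line - exact| / exact over beta = 0..n (assumed averaging range)."""
    errors = [
        abs(p1_line(n, b) - p1_lemma2(n, b)) / p1_lemma2(n, b)
        for b in range(n + 1)
    ]
    return sum(errors) / len(errors)


for n in (8, 16, 32):
    print(n, mean_relative_error(n))
```

Both expressions equal 1 at $\beta = n$; they differ most at $\beta = 0$, where the line gives 0 and Lemma 2 gives 1/16.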
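Theorem 1: For a given $\beta$, we have that
$$\lambda \approx \frac{\beta}{2} - \frac{\beta}{n}. \quad (8)$$
Proof: See the Appendix.
Theorem 2: For a given $\beta$, we have that
$$\alpha_{n-1} \approx \begin{cases} 0, & \text{if } \beta = 0 \\ \beta - 1, & \text{if } \beta > 0. \end{cases} \quad (9)$$
This follows by substituting (8) into (5): $\alpha_{n-1} \approx \lfloor \beta - \beta/n \rfloor = \beta + \lfloor -\beta/n \rfloor$, and $\lfloor -\beta/n \rfloor$ is 0 for $\beta = 0$ and $-1$ for $0 < \beta \le n$.
Theorem 2 gives us a guideline for designing a carry-generating circuit Cg. The designed Cg has n inputs and $n - 1$ outputs, and consists of $n - 2$ AO cells and one 2-input AND gate. Each AO cell contains one 2-input AND gate and one 2-input OR gate (see Fig. 3). A 6-bit Cg is shown in the shaded area of Fig. 3. When each input of the n-bit Cg is fed with a proper bit-product $x_i y_j$, where $i + j = n - 1$, it generates $\alpha_{n-1}$ outputs equal to 1 and the other $n - \alpha_{n-1} - 1$ outputs equal to 0, with $\alpha_{n-1}$ given by (9). That is, it satisfies (9). The following theorem gives a proof.
Theorem 3: When the n-bit Cg is fed with proper bit-products $x_i y_j$ for $i + j = n - 1$, it generates $\alpha_{n-1}$ outputs equal to 1 and the other outputs equal to 0.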
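The behaviour required of Cg can be illustrated with a small gate-level model. Fig. 3 is not reproduced here, so the wiring below is an assumption: a chain in which each stage ANDs the next input with the running OR of the earlier inputs. It uses $n - 1$ AND gates and $n - 2$ useful OR gates, which is consistent with the stated $n - 2$ AO cells plus one AND gate, but it is only one circuit with the required behaviour. The name `cg_outputs` is hypothetical.

```python
from itertools import product


def cg_outputs(inputs):
    """Assumed model of an n-input, (n-1)-output carry-generating circuit."""
    outputs = []
    seen_one = inputs[0]                  # running OR of the inputs examined so far
    for value in inputs[1:]:
        outputs.append(seen_one & value)  # AND gate: this 1 follows an earlier 1
        seen_one |= value                 # OR gate (its last result is never used)
    return outputs


# Exhaustive check for a 6-bit Cg: the number of 1 outputs is beta - 1, or 0 if beta is 0.
for bits in product((0, 1), repeat=6):
    beta = sum(bits)
    assert sum(cg_outputs(bits)) == max(beta - 1, 0)
```

Every input equal to 1 except the first one encountered produces one output equal to 1, which gives $\beta - 1$ ones for $\beta > 0$.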
Feeding each carry input of circuit $MP_0$ with the outputs of the proposed circuit Cg properly forms a new low-error fixed-width multiplier. Fig. 3 shows an example of the low-error fixed-width sign-magnitude multiplier for n = 6. The proposed fixed-width multiplier is faster than the standard parallel multiplier, and analyses of its area and product errors are given in Section III.
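A behavioural sketch of the resulting multiplier is given below. It is not a gate-level description: it assumes unsigned magnitudes, models the truncated array as the sum of the retained bit-products, and assumes that Cg contributes an estimated carry of $\beta - 1$ when $\beta > 0$ and 0 otherwise, where $\beta$ is the number of bit-products $x_i y_j$ with $i + j = n - 1$ that equal 1. All function names are hypothetical.

```python
def bit(value, k):
    return (value >> k) & 1


def retained_sum(x, y, n):
    """Truncated array: bit-products in columns n .. 2n-1 only."""
    return sum(
        (bit(x, i) & bit(y, j)) << (i + j)
        for i in range(n) for j in range(n) if i + j >= n
    )


def beta(x, y, n):
    """Number of bit-products in column n-1 that equal 1."""
    return sum(bit(x, i) & bit(y, n - 1 - i) for i in range(n))


def low_error_product(x, y, n):
    """Retained sum plus the estimated carries, weighted by 2**n (assumed model)."""
    b = beta(x, y, n)
    estimated_carries = b - 1 if b > 0 else 0
    return retained_sum(x, y, n) + (estimated_carries << n)


n = 6
x = y = 2**n - 1
print(low_error_product(x, y, n), ((x * y) >> n) << n)  # prints 3968 3968
```

For the all-ones input the estimate recovers all five lost carries, so the fixed-width product equals MP in this case.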
B. Fixed-Width Two's-Complement Multiplier
Considering the multiplication of two n-bit inputs X and Y, a standard two's-complement multiplier performs the following operations to obtain a 2n-bit product P [2]:
To design a low-error fixed-width two's-complement parallel multiplier, we adopt an approach similar to that described in Section II-A. Let the carry-generating circuit of the two's-complement parallel multiplier be denoted as Cg. According to (10) and the multiplier architecture shown in Fig. 4, the sum of carries generated from column circuit $P_{n-1}$ in circuit LP, also denoted as $\alpha_{n-1}$, is
Moreover, Theorem 1 states that $\lambda \approx \beta/2 - \beta/n$. Therefore, (13) can be rewritten as
Since $\beta \le n$, we get that $\lfloor -\beta/n \rfloor = 0$ if $\beta = 0$ and $\lfloor -\beta/n \rfloor = -1$ if $\beta > 0$. Then, we get the following approximate relationship:
According to (15), an n-bit Cg is designed that consists of $n - 2$ OR cells, each containing a 2-input OR gate, and one 2-input NOR gate. An example of a 6-bit Cg is given in the shaded area of Fig. 5. Moreover, Fig. 5 shows the low-error fixed-width two's-complement multiplier for n = 6.
C. Reduced-Width Multiplier
Given two n-bit inputs, a multiplier which generates an m-bit product, where n < m < 2n, is called a 2n-to-m reduced-width multiplier. It is useful in some special applications such as wavelet transform [9] and data coding in which the encoder generates m-bit data, so that only the m most-significant bits of a 2n-bit product are required. In addition, the reduced-width multiplier can further reduce the product error of the fixed-width multiplier described above.
Again, our design strategy for the fixed-width multiplier is applied to the design of a low-error reduced-width multiplier. A 2n-to-m reduced-width sign-magnitude multiplier can be obtained by eliminating the adder cells needed to produce the $2n - m$ least-significant bits (LSB's) of the product and then by adding a $(2n - m)$-bit carry-generating circuit Cg to feed revised data to the carry inputs of the retained adder cells. Since the input bit-products of column circuits $P_{n-2}, P_{n-3}, \ldots, P_1$ in the two's-complement and the sign-magnitude multipliers are identical (see Figs. 1 and 4 for n = 6), the design of a 2n-to-m reduced-width two's-complement multiplier is the same as the design of a 2n-to-m reduced-width sign-magnitude multiplier. An example of a 12-to-7 reduced-width two's-complement multiplier is shown in Fig. 6.
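The same idea can be sketched behaviourally for a 2n-to-m reduced-width multiplier. The model below assumes unsigned magnitudes and assumes that the $(2n - m)$-bit Cg applies the same estimate as in the fixed-width case to the highest discarded column, that is, a carry of $b - 1$ when the number $b$ of bit-products equal to 1 in column $2n - m - 1$ is positive, and 0 otherwise. The function names are hypothetical.

```python
def bit(value, k):
    return (value >> k) & 1


def reduced_width_product(x, y, n, m):
    """Assumed behavioural model of a 2n-to-m reduced-width multiplier (n < m < 2n)."""
    k = 2 * n - m  # number of discarded least-significant bits
    kept = sum(
        (bit(x, i) & bit(y, j)) << (i + j)
        for i in range(n) for j in range(n) if i + j >= k
    )
    # Bit-products in the highest discarded column, i + j == k - 1.
    b = sum(bit(x, i) & bit(y, k - 1 - i) for i in range(k))
    estimated_carries = b - 1 if b > 0 else 0
    return kept + (estimated_carries << k)


def exact_reduced(x, y, n, m):
    """The m most-significant bits of the exact 2n-bit product, at their original weight."""
    k = 2 * n - m
    return ((x * y) >> k) << k


n, m = 6, 7
x, y = 45, 51
print(exact_reduced(x, y, n, m), reduced_width_product(x, y, n, m))
```

Fewer columns are discarded than in the fixed-width case, so fewer carries have to be estimated.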
III. ERROR AND AREA COMPARISONS
We take the standard multiplier $M_S$, the proposed fixed-width multiplier $M_F$, the fixed-width multiplier $MP_0$ defined in Section II-A, the fixed-width multiplier $M_1$ proposed in [6], the fixed-width multiplier $M_2$, which consists of circuit $MP_0$ and column circuit $P_{n-1}$ with each carry input fed with 0, and the proposed 2n-to-(n+1) reduced-width multiplier $M_R$ for comparisons.
To compare the accuracy of the different fixed-width multipliers, we calculate and compare the maximal absolute error, the average error, and the relative error of their products. Let FP be the product of an n-bit fixed-width multiplier; then its maximal absolute error $\varepsilon_M$ is the maximum value of $|MP - FP|$ over all input pairs, and its average error $\bar{\varepsilon}$ is $\left(\sum |MP - FP|\right)/2^{2n}$, where $\sum |MP - FP|$ is the sum of $|MP - FP|$ over all input pairs. In addition, the relative error $\varepsilon_R$ of a fixed-width multiplier is defined as $|MP - FP|/MP$.
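These error measures can be computed exhaustively for small n. The sketch below is an illustration under stated assumptions, not a reproduction of the paper's tables: operands are unsigned magnitudes, `truncated` models the circuit with carry inputs fed with 0, and `compensated` assumes an estimated carry of $b - 1$ (0 when $b = 0$), where $b$ counts the bit-products equal to 1 in column $n - 1$. The relative-error test is written as `err > threshold * mp` so that input pairs with MP = 0 need no division; that handling is also an assumption.

```python
def bit(value, k):
    return (value >> k) & 1


def truncated(x, y, n):
    """Retained bit-products only (i + j >= n), carry inputs fed with 0."""
    return sum((bit(x, i) & bit(y, j)) << (i + j)
               for i in range(n) for j in range(n) if i + j >= n)


def compensated(x, y, n):
    """Truncated sum plus an assumed carry estimate of b - 1 (0 when b is 0)."""
    b = sum(bit(x, i) & bit(y, n - 1 - i) for i in range(n))
    return truncated(x, y, n) + (max(b - 1, 0) << n)


def error_metrics(multiplier, n, threshold=0.01):
    """Return (maximal absolute error, average error, percentage above threshold)."""
    max_err = total_err = exceed = 0
    for x in range(2**n):
        for y in range(2**n):
            mp = ((x * y) >> n) << n
            err = abs(mp - multiplier(x, y, n))
            max_err = max(max_err, err)
            total_err += err
            if err > threshold * mp:  # relative error above the threshold
                exceed += 1
    pairs = 4**n
    return max_err, total_err / pairs, 100.0 * exceed / pairs


for name, mult in (("truncated", truncated), ("compensated", compensated)):
    print(name, error_metrics(mult, 6))
```

Under this model the compensated product is never further from MP than the truncated one for any input pair.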
Let $P_{\varepsilon_R}$ denote the percentage of input pairs whose $\varepsilon_R$ is larger than 0.01, as shown at the bottom of the page. Then, the ratio of the $P_{\varepsilon_R}$'s of the other fixed-width multipliers to that of our fixed-width multiplier $M_F$ is denoted as $\Re(0.01)$. Table III. The area of the proposed fixed-width multiplier is nearly half the area of a standard multiplier and is less than the area of $M_2$. In addition, the area of $M_R$ is also listed in Table III. The results show that the area of $M_R$ is also competitive with the standard parallel multiplier.
For a more practical error comparison, the different multipliers are applied to wavelet transformation [7], [9] and then the quality of the reconstructed images is compared. Four 512 × 512 images of Lena, F16, Face, and Bear are picked for this experiment, and the quality comparison among different multipliers is based on root mean square error (RMSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR). A smaller RMSE and a larger SNR and PSNR represent better quality of the reconstructed images. The one-resolution-level wavelet operations performed by the different multipliers and the quality comparison are reported in Table IV. In Table IV, $M_{R1}$ and $M_{R2}$ denote the 18-to-10 and 18-to-11 reduced-width multipliers, respectively. The results show that the proposed fixed-width multiplier $M_F$ obtains better quality than the other fixed-width multipliers, and the quality can be further improved by using $M_{R1}$ and $M_{R2}$.
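The three quality measures can be sketched as follows. The paper does not state its exact formulas, so the commonly used definitions are assumed here: RMSE over all pixels, SNR as the ratio of signal energy to error energy in decibels, and PSNR relative to a peak value of 255 for 8-bit images. Images are passed as flat sequences of pixel values, and the function names are hypothetical.

```python
import math


def rmse(original, reconstructed):
    """Root mean square error between two equally sized pixel sequences."""
    squared = [(a - b) ** 2 for a, b in zip(original, reconstructed)]
    return math.sqrt(sum(squared) / len(squared))


def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB (assumed definition: signal energy / error energy)."""
    signal = sum(a * a for a in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return 10 * math.log10(signal / noise)


def psnr_db(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB (assumed peak of 255 for 8-bit pixels)."""
    return 20 * math.log10(peak / rmse(original, reconstructed))


original = [10, 10, 200, 200]
reconstructed = [12, 8, 202, 198]
print(rmse(original, reconstructed))  # prints 2.0
print(snr_db(original, reconstructed), psnr_db(original, reconstructed))
```

The SNR and PSNR functions are undefined for identical images, since the error energy is then zero.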
$$P_{\varepsilon_R} = \frac{\text{the number of input pairs whose } \varepsilon_R \text{ is larger than } 0.01}{\text{the number of all input pairs}}$$
TABLE II  THE COMPARISON RESULTS OF ERRORS FOR DIFFERENT FIXED-WIDTH MULTIPLIERS
TABLE III  AREA RATIO FOR DIFFERENT FIXED-WIDTH MULTIPLIERS
TABLE IV  THE QUALITY COMPARISON OF ONE-LEVEL WAVELET TRANSFORM USING DIFFERENT MULTIPLIERS
IV. CONCLUSION
The designs of low-error fixed-width sign-magnitude and two's-complement multipliers have been presented. By using this type of multiplier, the chip area can be significantly reduced, and a slight speed improvement is also obtained. In addition, the design strategy has been applied to designing a reduced-width multiplier, which has lower product error than a fixed-width multiplier and still maintains low area complexity. These multipliers are useful in fixed-width data path architectures for multimedia and DSP applications where a uniform or reduced word width is usually required.
APPENDIX
Proof of Lemma 1: If $x_i y_j = 1$, then both input bits $x_i$ and $y_j$ are 1. If $x_i y_j = 0$, then only one of $x_i$ and $y_j$ is 1, or both are 0.
Moreover, if there are $\beta$ bit-products $x_i y_j$ for $i + j = n - 1$ equal to 1, then there are $n - \beta$ bit-products $x_i y_j$ for $i + j = n - 1$ equal to 0. Therefore, at most $2\beta + (n - \beta) = n + \beta$ bits and at least $2\beta$ bits among the input bits are 1.
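The bounds of Lemma 1 can be confirmed by brute force for small n. In the sketch below, `beta` counts the bit-products $x_i y_j$ with $i + j = n - 1$ that equal 1 and `ones` counts the input bits equal to 1; both helper names are hypothetical.

```python
def ones(value):
    """Number of bits equal to 1 in a non-negative integer."""
    return bin(value).count("1")


def beta(x, y, n):
    """Number of bit-products x_i * y_j with i + j == n - 1 that equal 1."""
    return sum(((x >> i) & 1) & ((y >> (n - 1 - i)) & 1) for i in range(n))


def lemma1_holds(n):
    """Check 2*beta <= (ones in X and Y) <= n + beta for every input pair."""
    for x in range(2**n):
        for y in range(2**n):
            b = beta(x, y, n)
            if not 2 * b <= ones(x) + ones(y) <= n + b:
                return False
    return True


print(lemma1_holds(4))  # prints True
```

The check relies on the fact that each input bit appears in exactly one of the n bit-products of column $n - 1$.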
Proof of Lemma 2: By Lemma 1, $[(n + \beta) + 2\beta]/2 = (n + 3\beta)/2$ bits of the 2n input bits are 1 on average. Therefore, the probability of any input bit equal to 1 is $[(n + 3\beta)/2]/2n = (n + 3\beta)/4n$. Moreover, the probability of the bit-product $x_i y_j$ equal to 1 is $[(n + 3\beta)/4n]^2 = (n + 3\beta)^2/16n^2$.
Proof of Theorem 1:
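The number of bit-product terms in $\lambda$ whose coefficients are equal to $1/2^m$ is $n - m + 1$, and (7) states that $p_1(x_i y_j) \approx \beta/n$. Thus
$$\lambda \approx \sum_{m=2}^{n} \frac{n - m + 1}{2^m} \cdot \frac{\beta}{n} = \frac{\beta}{n}\left(\frac{n - 2}{2} + \frac{1}{2^n}\right) = \frac{\beta}{2} - \frac{\beta}{n} + \frac{\beta}{n 2^n} \approx \frac{\beta}{2} - \frac{\beta}{n}.$$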
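The weighted count used in this proof has a simple closed form, which can be checked with exact rational arithmetic. The sketch below sums $(n - m + 1)/2^m$ for $m = 2, \ldots, n$ and compares it with $(n - 2)/2 + 1/2^n$; multiplying by $\beta/n$ then gives $\beta/2 - \beta/n$ plus a remainder of $\beta/(n 2^n)$. The function names are hypothetical.

```python
from fractions import Fraction


def weight_sum(n):
    """Sum over m = 2..n of (number of terms with coefficient 1/2**m) / 2**m."""
    return sum(Fraction(n - m + 1, 2**m) for m in range(2, n + 1))


def closed_form(n):
    """Closed form (n - 2)/2 + 1/2**n of the same sum."""
    return Fraction(n - 2, 2) + Fraction(1, 2**n)


assert all(weight_sum(n) == closed_form(n) for n in range(2, 33))

# Remainder left after approximating lambda by beta/2 - beta/n, for n = 8 and beta = 4.
n, beta = 8, 4
estimate = weight_sum(n) * Fraction(beta, n)
print(estimate - (Fraction(beta, 2) - Fraction(beta, n)))  # prints 1/512
```

The remainder shrinks as $1/2^n$, which is why it is dropped in the approximation.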
Proof of Theorem 3:
Referring to Fig. 3 
The inputs of Cg must satisfy one of the following two cases. Thus, (9) is also satisfied.
I. INTRODUCTION
Radio frequency electronics is increasingly taking advantage of the advances in CMOS technology. These advances include the development of a 0.1-μm CMOS technology [1]. This particular CMOS technology has demonstrated active devices with $f_t$'s (i.e., the unity current gain cutoff frequency) exceeding 100 GHz and minimum noise figures less than 0.5 dB at 2 GHz [1]. The more commercially available 0.5-μm CMOS technologies have demonstrated $f_t$'s of 20 GHz and minimum noise figures of 1.6 dB at 2 GHz [2]. There have also been several reports within the literature of front-end RF designs (i.e., low noise amplifiers (LNA's) and mixers) within a CMOS environment (e.g., [3]-[7]). A majority of the LNA designs have demonstrated a relatively high noise figure in comparison to the expected minimum [3]-[6]. We attribute this to the fact that the input MOS device was not conjugately matched for minimum noise. For the LNA design in [7], the minimum noise figure was achieved by using exterior impedance tuners to match for both noise and power. Though this method was effective in matching the device for both noise and power, it would have a relatively low level of integration; i.e., it requires many off-chip components. In [5], a mathematical description of matching for both noise and power for a MOSFET was provided. There it was shown that if one includes the effects of induced gate noise, conjugate noise matching is not straightforward because the optimum noise reactance does not equal the input reactance of the device. In this paper, we further examine how induced gate noise influences conjugate noise matching. Furthermore, we will outline a simple method that can be used to simultaneously match for both power and noise for a MOS transistor.
II. INDUCED GATE NOISE
Induced gate noise is assumed to be one of the main components of noise within the intrinsic portion of a MOS transistor. The other noise sources include the drain channel noise and the thermal noise due to the resistance of the gate material; Fig. 1(a) illustrates the physical location of these noise sources with respect to an approximate lumped network model of a MOS transistor. The thermal noise resistance due to the resistance of the gate material has been shown to be equal to the total resistance of the gate length divided by three. In the network
