Abstract: Multiplication by constants can be efficiently realized using shifts, additions, and subtractions. In this work we consider how to select a fixed-point value for a real-valued, rational, or floating-point coefficient to obtain a low-complexity realization. It is shown that this process, denoted addition aware quantization, can often find coefficients that have as low complexity as the rounded value but a smaller approximation error, by searching among coefficients with a longer wordlength.
I. INTRODUCTION
In many DSP algorithms, multiplier coefficients are either floating-point numbers (e.g., from filter design algorithms), rational numbers (e.g., 1/3), or irrational real numbers. However, when implementing digital signal processing (DSP) algorithms, fixed-point computations are often preferred over floating-point due to lower complexity and power consumption. The conversion from floating-point, rational, or real-valued numbers to fixed-point can be seen as quantization of an infinitely long fixed-point representation. To avoid lengthy repetition, we will in the following use floating-point, rational, and real numbers interchangeably to denote numbers that cannot be exactly represented using a fixed-point representation.
Typically, one distinguishes between quantization of the data and quantization of the multiplier coefficients. Data quantization leads to round-off noise, which is usually modeled as an additive error signal characterized as a stochastic process whose properties depend on the type of quantization used. Coefficient quantization, on the other hand, leads to a static deviation from the ideal transfer function. Data quantization is also often performed within the algorithm implementation to reduce the wordlength of the computations. This is especially required for recursive algorithms, as the wordlength would otherwise grow indefinitely.
In this work we consider multiplication by a constant fixed-point number $c_{\mathrm{fix}}$ approximating a number $c$ that cannot be exactly represented with the same number of bits (or possibly not at all). For ease of presentation we will, without loss of generality, assume that $0 < c < 1$. Using $W_f$ fractional bits and proper rounding, the approximation error, $\varepsilon$, is bounded as
$$\varepsilon = |c - c_{\mathrm{fix}}| \le 2^{-(W_f+1)}. \tag{1}$$
Throughout this work it is assumed that an approximation specification of $W_f$ fractional bits, as in (1), should be met, although other measures can be dealt with in a similar way.
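An unsigned fractional fixed-point coefficient, $c_{\mathrm{fix}}$, represented using $W_f$ fractional bits can be written as
$$c_{\mathrm{fix}} = \sum_{i=1}^{W_f} b_i 2^{-i} \tag{2}$$
where $b_i \in \{0, 1\}$. Now, assume that a multiplication with a data value, $x$, is performed. The result is
$$y = c_{\mathrm{fix}}\, x = \sum_{i=1}^{W_f} b_i \left(2^{-i} x\right). \tag{3}$$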
The multiplication can then be performed as a sum in which the input is shifted and multiplied by either 0 or 1, once for each bit of $c_{\mathrm{fix}}$. In total there are $W_f - 1$ additions to compute the result. Note that for bit-parallel computation the shifts can be hardwired and, hence, no logic cells are required for shifting. If the coefficient is known in advance, the multiplication by 0 or 1 can be simplified to either 0 or $x$. Terms with $b_i = 0$ do not contribute to the sum. Therefore, the number of additions is one less than the number of nonzero bits of $c_{\mathrm{fix}}$.
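The counting argument can be made concrete with a small Python sketch. The helper `shift_and_add` below is hypothetical (not from the paper); it assumes integer input data and returns the product scaled by $2^{W_f}$, so that every shift is an exact left shift.

```python
def shift_and_add(x, bits):
    """Compute x * c_fix, with c_fix = sum(bits[i-1] * 2**-i) for i = 1..W_f.

    To stay in exact integer arithmetic the product is returned scaled by
    2**W_f, i.e., y = x * c_fix * 2**W_f, together with the number of
    additions that a shift-and-add realization uses.
    """
    w_f = len(bits)
    y = None
    additions = 0
    for i, b in enumerate(bits, start=1):
        if b == 0:
            continue                  # zero bits contribute nothing and cost nothing
        term = x << (w_f - i)         # 2**-i * x, scaled by 2**W_f (hardwired shift)
        if y is None:
            y = term                  # first nonzero term: no adder needed
        else:
            y += term
            additions += 1            # every further nonzero bit costs one adder
    return (0 if y is None else y), additions
```

For example, with $c_{\mathrm{fix}} = 0.1011_2 = 11/16$ and $x = 5$, the three nonzero bits give two additions and the scaled result $55 = 5 \cdot 11$.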
Using a signed-digit (SD) representation we have $b_i \in \{-1, 0, 1\}$. Hence, each bit is now a ternary digit. As the coefficients of a constant coefficient multiplier are not represented explicitly as inputs to the multiplier, introducing a third alternative for each position does not increase the complexity. Instead, some of the additions are simply replaced by subtractions. As a subtraction has about the same complexity as an addition, for simplicity we will throughout this work refer to both as additions. The potential benefit of using an SD representation is that it is often possible to find a representation with fewer nonzero positions than the binary representation. An SD representation with the smallest possible number of nonzero digits is referred to as a minimum signed-digit (MSD) representation. One MSD representation of special interest is the canonic signed-digit (CSD) representation, in which no two consecutive digits are nonzero, i.e., $b_i b_{i+1} = 0$. For each coefficient there are several possible SD representations, and there may also be several MSD representations. However, the CSD representation is unique (hence, the name canonic), so once a CSD representation is found we know that it is also an MSD representation, and the minimum number of nonzero positions is established. The average number of nonzero positions in a CSD representation is asymptotically $W_f/3$, compared with $W_f/2$ for binary, while the maximum number of nonzero positions is $\lceil (W_f+1)/2 \rceil$ for CSD compared with $W_f$ for binary. Hence, the number of additions is on average reduced by using an MSD/CSD representation instead of binary.
1070-9908/$26.00 © 2009 IEEE
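The CSD representation of an integer coincides with its non-adjacent form, which can be computed digit by digit from the least significant end. The sketch below uses a hypothetical helper, `csd_digits`, and applies to a fractional coefficient through the integer $c_{\mathrm{fix}} 2^{W_f}$.

```python
def csd_digits(n):
    """Return the canonic signed-digit (CSD) digits of an integer n >= 0,
    least significant digit first, each digit in {-1, 0, 1}."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # n mod 4 == 1 -> +1, n mod 4 == 3 -> -1
            n -= d            # makes n divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits
```

For instance, $15/16 = 0.1111_2$ has four nonzero binary digits, while `csd_digits(15)` returns `[-1, 0, 0, 0, 1]`, i.e., $15/16 = 1 - 2^{-4}$ with two nonzero digits, so the shift-and-add realization needs one subtraction instead of three additions.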
Over the years, several algorithms have been proposed to design DSP algorithms with few nonzero SD terms, often referred to as sum-of-powers-of-two (SOPOT) or signed-power-of-two (SPT) terms. Examples include specific digital filters [1], [2] and transforms [3], [4], as well as general DSP algorithms [5], [6]. The resulting realization is often called multiplierless, as general multiplications are replaced by shifts and additions. In [2] the statistical properties of SD representations are investigated for multiplier coefficients. There have also been investigations on using SD representations with a low number of nonzero digits for data [7].
Despite the fact that the CSD representation is minimal, it is still possible to find constant multiplication realizations using fewer additions than a straightforward shift-and-add realization based on CSD [8]-[10]. In [8] an optimal approach was introduced, and it was shown that all constant multiplications with coefficients of up to 12 bits can be realized using at most four additions. In [9] that approach was simplified, and it was shown that at most five additions are required for coefficients of up to 19 bits. In addition to the optimal approach, a heuristic was also introduced in [9], based on the idea that it is sometimes worthwhile to increase the number of nonzero signed-digit terms to reduce the number of additions. All signed-digit representations of a coefficient can be generated as in [11]. Finally, in [10] an efficient heuristic was proposed, based on the heuristic in [9], allowing low-complexity multiplication with arbitrary wordlength. In terms of theoretical results, it has been shown that the maximum number of additions grows as $O(W/\log W)$, where $W$ is the coefficient wordlength [12], [13], while at least $\lceil \log_2 S \rceil$ additions are required, where $S$ denotes the number of nonzero SD terms of the coefficient in an MSD representation [9], [14].
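To relate these bounds to the CSD discussion, the sketch below computes, for an integer coefficient $n$, the lower bound $\lceil \log_2 S \rceil$ with $S$ the number of nonzero CSD digits (CSD being an MSD representation), together with the $S - 1$ additions of a plain CSD shift-and-add realization. The helper names are hypothetical; the true optimum from [8], [9] lies between these two numbers and requires the search algorithms cited above.

```python
def csd_nonzeros(n):
    """Number of nonzero digits in the CSD (non-adjacent form) of n >= 0."""
    count = 0
    while n:
        if n & 1:
            n -= 2 - (n & 3)   # choose digit +1 or -1 so the next digit is 0
            count += 1
        n >>= 1
    return count


def addition_bounds(n):
    """Return (lower, upper) bounds on the additions needed to multiply by n."""
    s = csd_nonzeros(n)
    lower = (s - 1).bit_length() if s > 0 else 0   # equals ceil(log2(s)) for s >= 1
    upper = max(s - 1, 0)                           # plain CSD shift-and-add
    return lower, upper
```

For $n = 2396745 = \mathrm{round}(2^{24}/7)$, whose binary pattern $001001\ldots001_2$ has eight nonzero digits, the bounds are 3 and 7.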
The discussion in this paper is based on carry-propagation addition, i.e., addition of two numbers to yield a single result. A similar approach can be used for other types of addition, e.g., high-speed redundant carry-save addition, in which case the constant multiplication structures in [15] should be used instead. It should also be noted that the number of bits involved in each addition, and, hence, the number of full adder cells required, differs between the additions [9]. Furthermore, the number of cascaded additions may also be of interest. This can be taken into account during the search process described in this paper by simply adopting a different cost measure when selecting the best solution. The presented results focus on the number of additions only.
II. ADDITION AWARE QUANTIZATION
If the allowed wordlength is increased by $k$ fractional bits, the approximation error, $\varepsilon$, can be guaranteed to meet $\varepsilon \le 2^{-(W_f+k+1)}$. However, another way of using the additional fractional bits is to note that there are exactly $2^k$ different coefficients representable with $W_f + k$ fractional bits for which $\varepsilon \le 2^{-(W_f+1)}$, including the one obtained by rounding to $W_f$ fractional bits. The basic idea in this work is to search these coefficients and select the one that has the smallest approximation error for the allowed complexity. The allowed complexity is typically assumed to be the same number of additions as required by the coefficient rounded to $W_f$ fractional bits. We refer to this scheme as addition aware quantization. In some cases it is even possible to find valid representations that require a lower complexity than the coefficient rounded to $W_f$ fractional bits. This is further illustrated in Section III.
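A minimal sketch of the search, under one loud assumption: the hypothetical helper `csd_cost` uses the addition count of a plain CSD shift-and-add realization as a stand-in cost, whereas the paper uses the optimal costs of [9] and the heuristic of [10]. With this proxy the search can only pick coefficients that tie or beat the rounded one in CSD cost; an optimal cost function would be needed to reproduce the reported results. The coefficient $c$ is passed as a `fractions.Fraction` so that all error comparisons are exact.

```python
import math
from fractions import Fraction


def csd_cost(n):
    """Proxy cost (assumption): additions of a plain CSD shift-and-add realization."""
    nonzeros = 0
    while n:
        if n & 1:
            n -= 2 - (n & 3)   # pick digit +1 or -1 so the next digit becomes 0
            nonzeros += 1
        n >>= 1
    return max(nonzeros - 1, 0)


def addition_aware_quantize(c, w_f, k):
    """Search the coefficients with w_f + k fractional bits that meet the w_f-bit
    error bound and return (m, w_f + k, cost, error) for the one with the smallest
    error among those no more expensive than the w_f-bit rounded value."""
    scale = 2 ** (w_f + k)
    bound = Fraction(1, 2 ** (w_f + 1))          # specification (1)
    rounded = round(c * 2 ** w_f) * 2 ** k       # W_f-bit rounding on the finer grid
    budget = csd_cost(rounded)                   # allowed complexity
    lo = max(0, math.ceil((c - bound) * scale))
    hi = math.floor((c + bound) * scale)
    best = None
    for m in range(lo, hi + 1):
        cost = csd_cost(m)
        if cost > budget:
            continue                             # too expensive, skip
        err = abs(c - Fraction(m, scale))
        if best is None or err < best[3]:
            best = (m, w_f + k, cost, err)
    return best
```

With $c = 1/7$, $W_f = 23$, and $k = 1$, the call `addition_aware_quantize(Fraction(1, 7), 23, 1)` returns $m = 2396745$ on a 24-bit grid with error $1/(7 \cdot 2^{24})$, six times smaller than the error of the 23-bit rounded value, at the same proxy cost.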
To further illustrate the fact that there are $2^k$ different solutions, consider Fig. 1(a), where the possible alternatives for $W_f$ fractional bits are illustrated. Clearly, there is only one value, denoted $c_{W_f}$, that meets the requirement. Now, increasing the resolution by one bit gives the case in Fig. 1(b), where an additional possible solution is available. That it here happens to have a smaller approximation error is not crucial; what matters is that there is a second, alternative, approximation. Finally, the general case with $k$ extra fractional bits is illustrated in Fig. 1(c).
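There will be $k_-$ extra coefficients where $i\,2^{-(W_f+k)}$, $i = 1, \dots, k_-$, are subtracted from the $W_f$ fractional bits approximation, $c_{W_f}$; see Fig. 1(c). Similarly, there will be $k_+$ extra coefficients where $i\,2^{-(W_f+k)}$, $i = 1, \dots, k_+$, are added to $c_{W_f}$. The two cases below differ in whether $c_{W_f}$ lies below or above the exact value $c$ (cf. Fig. 1(a)). If $c_{W_f} \le c$, then
$$k_- = \left\lfloor 2^{k-1} - (c - c_{W_f})\,2^{W_f+k} \right\rfloor \tag{4}$$
otherwise we have
$$k_+ = \left\lfloor 2^{k-1} - (c_{W_f} - c)\,2^{W_f+k} \right\rfloor \tag{5}$$
where the other term can be determined by $k_- + k_+ = 2^k - 1$.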
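The counts $k_-$ and $k_+$ can be checked numerically. The sketch below (hypothetical helper `candidate_offsets`, requiring $k \ge 1$) computes both sides directly from the signed offset $e = (c - c_{W_f})\,2^{W_f+k}$, measured in steps of the finer grid: a candidate $c_{W_f} + i\,2^{-(W_f+k)}$ meets the $W_f$-bit bound exactly when $|e - i| \le 2^{k-1}$.

```python
import math
from fractions import Fraction


def candidate_offsets(c, w_f, k):
    """Return (k_minus, k_plus, candidates) for k >= 1 extra fractional bits."""
    if k < 1:
        raise ValueError("k must be at least 1")
    c_wf = Fraction(round(c * 2 ** w_f), 2 ** w_f)   # rounded to W_f fractional bits
    step = Fraction(1, 2 ** (w_f + k))               # resolution of the finer grid
    e = (c - c_wf) / step                            # signed offset in fine-grid steps
    half = 2 ** (k - 1)                              # bound 2**-(W_f+1) in fine-grid steps
    k_plus = math.floor(half + e)     # largest i with |e - i| <= half
    k_minus = math.floor(half - e)    # largest i with |e + i| <= half
    candidates = [c_wf + i * step for i in range(-k_minus, k_plus + 1)]
    return k_minus, k_plus, candidates
```

For $c = 1/7$, $W_f = 4$, and $k = 3$, the rounded value is $c_{W_f} = 1/8 \le c$, and the function returns $k_- = 1$ and $k_+ = 6$, i.e., $2^3 = 8$ candidates in total.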
III. DESIGN EXAMPLES
In this section, we provide a number of examples illustrating the concept and results of addition aware quantization. The design examples also illustrate various ways of applying the addition aware quantization concept. For the addition costs we use the optimal results in [9] for coefficients of up to 19 bits. For longer wordlengths the heuristic in [10] is used. The number of correct fractional bits, CFB, is defined as
$$\mathrm{CFB} = -\log_2(\varepsilon) - 1 \tag{6}$$
so that a coefficient meeting (1) with $W_f$ fractional bits has $\mathrm{CFB} \ge W_f$. Clearly, there is a tradeoff between the number of extra bits searched, which determines the offline computational complexity, and the obtainable results, which determine the online computational complexity. It should be noted that eventually all newly introduced coefficients will have such a large number of nonzero positions that it is not possible to find realizations with the required number of additions [9], [11]. However, it is not yet known whether such a bound exists in terms of the number of fractional bits.
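A one-line helper for the CFB measure, assuming it is computed as $-\log_2(\varepsilon) - 1$ so that meeting (1) with $W_f$ fractional bits corresponds to a CFB of at least $W_f$; the function name is hypothetical.

```python
import math


def correct_fractional_bits(c, approx):
    """Correct fractional bits of `approx` as an approximation of `c`,
    assuming CFB = -log2(|c - approx|) - 1 (consistent with the bound (1))."""
    err = abs(c - approx)
    if err == 0:
        return math.inf     # exact representation: unbounded precision
    return -math.log2(err) - 1
```

Rounding $1/3$ to two fractional bits gives $0.25$, with error $1/12$ and $\mathrm{CFB} = \log_2 6 \approx 2.58$, which meets the two-bit specification.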
A. Rational Numbers
Multiplication by rational numbers (or division by integers) occurs frequently in some DSP algorithms. As many rational numbers have a repeating base-2 representation, it is possible, when the wordlength suitably matches the length of the repeating pattern, to use fewer additions than with a shorter wordlength. The results of this are illustrated in Fig. 2. This also provides a good example of how increasing the wordlength can sometimes decrease the addition complexity: the multiplication by 1/7 requires five additions when rounded to 23 fractional bits, but only three additions when rounded to 24 fractional bits.
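The three-addition count for 24 fractional bits can be reproduced by noting that $\mathrm{round}(2^{24}/7) = 2396745 = 001001\ldots001_2$ factors as $(1 + 2^3)(1 + 2^6)(1 + 2^{12})$, so each addition doubles the number of repeated `001` blocks. The sketch below is one such realization, not necessarily the one behind Fig. 2; it returns the product scaled by $2^{24}$.

```python
def times_one_seventh_24(x):
    """Return x * 2396745 = x * round(2**24 / 7) using three additions.

    The result divided by 2**24 approximates x / 7 with 24 fractional bits.
    """
    a = x + (x << 3)      # x * 0b1001: '001' pattern repeated 2 times (1st addition)
    b = a + (a << 6)      # repeated 4 times                           (2nd addition)
    return b + (b << 12)  # repeated 8 times = 24 bits                 (3rd addition)
```

The approximation error of $2396745/2^{24}$ is $1/(7 \cdot 2^{24})$, well within the 24-bit bound $2^{-25}$.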
B. Trigonometric Constants
Trigonometric constants occur in, e.g., FFTs, DCTs, and Goertzel filters [3], [4], [16]. Furthermore, several other commonly used constants are special cases of trigonometric constants. Here, we consider the best obtainable approximation using three additions. The results are shown in Table I for a number of different trigonometric constants found in the literature.
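A brute-force sketch of a fixed-wordlength search for the best approximation within an addition budget. As with the other sketches here, the cost is a stand-in: the additions of a plain CSD shift-and-add realization (`csd_cost`, hypothetical), not the optimal costs of [9]. Since optimal realizations can need fewer additions than CSD, this proxy can only under-estimate the precision attainable with three additions, and it does not reproduce Table I.

```python
import math


def csd_cost(n):
    """Proxy cost (assumption): additions of a plain CSD shift-and-add realization."""
    nonzeros = 0
    while n:
        if n & 1:
            n -= 2 - (n & 3)
            nonzeros += 1
        n >>= 1
    return max(nonzeros - 1, 0)


def best_within_budget(c, w, max_additions):
    """Exhaustively find the w-bit fraction m / 2**w (0 <= m <= 2**w) closest to c
    whose proxy cost is at most `max_additions`. Returns (m, error)."""
    best_m, best_err = None, math.inf
    for m in range(2 ** w + 1):
        if csd_cost(m) <= max_additions:
            err = abs(c - m / 2 ** w)
            if err < best_err:
                best_m, best_err = m, err
    return best_m, best_err
```

With $w = 4$ every coefficient fits the budget, and `best_within_budget(math.cos(math.pi / 8), 4, 3)` simply returns the rounded value $m = 15$; longer wordlengths are where the budget starts to bind.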
As can be seen, the proposed methodology sometimes increases the precision for a given complexity. However, coefficients with the same complexity but higher precision do not always exist, as illustrated for some of the coefficients. With the proposed method, this can be verified within the searched range.
C. CORDIC Scale Factor Compensation
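The CORDIC algorithm is a method to compute certain trigonometric and hyperbolic elementary functions using shift-and-add iterations. After $n$ iterations, the result is scaled by the gain
$$K = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}} \tag{7}$$
for trigonometric operations and
$$K_h = \prod_{i=1}^{n-h} \sqrt{1 - 2^{-2i}} \prod_{j=2}^{h+1} \sqrt{1 - 2^{-(3^j - 1)}} \tag{8}$$
for hyperbolic operations$^1$, cf. [17].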
We consider compensation of the asymptotic gain factors, i.e., multiplication by the inverses $1/K$ and $1/K_h$ of the trigonometric and hyperbolic gains in (7) and (8) as $n \to \infty$. The results are given in Table II and show the best possible approximations using a given number of additions. The results for five additions are obtained with the heuristic from [10] and, hence, cannot be guaranteed to be optimal. As a final note, $1/K_h \approx 1.2075$ is almost two times larger than $1/K \approx 0.6073$. Hence, for the same number of correct fractional bits, the total number of correct bits is one more for $1/K_h$.
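The two asymptotic factors can be evaluated numerically. The sketch below assumes the standard CORDIC gain products: $K = \prod_{i=0}^{n-1}\sqrt{1 + 2^{-2i}}$ for the trigonometric mode, and for the hyperbolic mode the product of $\sqrt{1 - 2^{-2i}}$ over the iterations $i = 1, 2, \ldots$, where the iterations $i = (3^j - 1)/2$ (i.e., 4, 13, 40, ...) are executed twice. The function name and the choice $n = 60$ (long enough for double precision) are assumptions.

```python
import math


def cordic_inverse_gains(n=60):
    """Return (1/K, 1/K_h) for n CORDIC iterations (approaching the asymptotic values)."""
    # Trigonometric gain: iterations i = 0, ..., n-1.
    k_trig = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(n))
    # h = number of repeated hyperbolic iterations: largest h with 3^(h+1) + 2h - 1 <= 2n.
    h = 0
    while 3 ** (h + 2) + 2 * (h + 1) - 1 <= 2 * n:
        h += 1
    # Hyperbolic gain: iterations i = 1, ..., n-h, plus repeats at i = (3^j - 1)/2.
    k_hyp = math.prod(math.sqrt(1 - 2.0 ** (-2 * i)) for i in range(1, n - h + 1))
    k_hyp *= math.prod(math.sqrt(1 - 2.0 ** (-2 * ((3 ** j - 1) // 2)))
                       for j in range(2, h + 2))
    return 1 / k_trig, 1 / k_hyp
```

The results, $1/K \approx 0.60725$ and $1/K_h \approx 1.20750$, confirm that the hyperbolic factor is almost twice the trigonometric one and therefore has one integer bit.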
D. Joint Optimization of Several Factors
Sometimes a cascade of two or more constant multiplications is used. Then, the approximation errors of the individual multiplications accumulate. While this may lead to cancellation of approximation errors having opposite signs, it may also result in a total approximation error that is larger than the individual approximation errors. The straightforward way of handling this is to increase the wordlengths of the individual multiplications until the total error meets the specification. Addition aware quantization provides a better way of obtaining this accuracy increase. This is illustrated using a reconfigurable double constant multiplier for certain types of FFT algorithms, as proposed in [18]. The multiplier structure is shown in Fig. 3; it can multiply a single input by any of the coefficient pairs using only constant multiplications with $\cos(\pi/8)$ and $\sin(\pi/8)$, where the multiplication by $\cos(\pi/8) + \sin(\pi/8)$ is obtained by adding the two products. The approximation error for the multiplication by $\cos(\pi/8) + \sin(\pi/8)$ is therefore $\varepsilon_{\cos} + \varepsilon_{\sin}$, where $\varepsilon_{\cos}$ and $\varepsilon_{\sin}$ are the signed approximation errors of the two individual multiplications. Hence, it is possible that even though the multiplications by $\cos(\pi/8)$ and $\sin(\pi/8)$ are correct to $W_f$ fractional bits, the multiplication by their sum is only correct$^2$ to $W_f - 1$ fractional bits. To reduce the approximation error to the required level we use addition aware quantization. For each precision requirement we select, among the solutions with the smallest addition count, the one with the smallest maximum approximation error. For comparison, we also use a straightforward scheme based on increasing the number of fractional bits and rounding, as discussed above. The results in terms of approximation error are shown in Fig. 4(a), where it can be seen that for seven out of the 14 considered precisions, the rounded version actually violates the precision requirement for the multiplication by $\cos(\pi/8) + \sin(\pi/8)$. The results in terms of the required number of additions are shown in Fig. 4(b).

$^1$ The factor $h$ in (8) is defined as the largest integer such that $3^{h+1} + 2h - 1 \le 2n$. In practice this means that certain iteration angles, those with $i = (3^j - 1)/2$, are used twice to obtain convergence [17].
Here, it can be seen that the proposed method in rare cases even decreases the number of additions. More additions are sometimes required because in these cases the rounded version does not meet the specification (compare Fig. 4(a)). A benefit of the addition aware quantization scheme that is manifested in this example is the ability to select coefficient values whose approximation errors have signs and magnitudes that cancel each other.
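A sketch of the joint selection for the pair $\cos(\pi/8)$, $\sin(\pi/8)$, under stated assumptions: all three errors (cosine, sine, and their sum) must meet the $W_f$-bit bound $2^{-(W_f+1)}$; the cost is the hypothetical CSD proxy `csd_cost`, summed over the two multiplications (the extra addition forming the sum is the same for all candidates and is ignored); and double-precision values stand in for the exact constants. Pairs are compared first by cost and then by the largest of the three errors, mirroring the selection rule described above.

```python
import math
from fractions import Fraction


def csd_cost(n):
    """Proxy cost (assumption): additions of a plain CSD shift-and-add realization of n >= 0."""
    nonzeros = 0
    while n:
        if n & 1:
            n -= 2 - (n & 3)
            nonzeros += 1
        n >>= 1
    return max(nonzeros - 1, 0)


def joint_quantize(w_f, k):
    """Choose (m_cos, m_sin) on a (w_f + k)-bit grid such that cos(pi/8), sin(pi/8)
    and their sum all meet the w_f-bit error bound 2**-(w_f + 1).

    Returns ((total_cost, max_error), m_cos, m_sin), or None if no pair is feasible.
    """
    c = Fraction(math.cos(math.pi / 8))   # double-precision stand-ins for the constants
    s = Fraction(math.sin(math.pi / 8))
    scale = 2 ** (w_f + k)
    bound = Fraction(1, 2 ** (w_f + 1))

    def window(v):
        # All grid points m / scale with |v - m / scale| <= bound.
        return range(math.ceil((v - bound) * scale), math.floor((v + bound) * scale) + 1)

    best = None
    for m_cos in window(c):
        e_cos = Fraction(m_cos, scale) - c
        for m_sin in window(s):
            e_sin = Fraction(m_sin, scale) - s
            if abs(e_cos + e_sin) > bound:
                continue   # the summed product would violate the specification
            key = (csd_cost(m_cos) + csd_cost(m_sin),
                   max(abs(e_cos), abs(e_sin), abs(e_cos + e_sin)))
            if best is None or key < best[0]:
                best = (key, m_cos, m_sin)
    return best
```

For $k \ge 1$ a feasible pair always exists, since the nearest points of the finer grid have errors of at most $2^{-(W_f+k+1)}$ each, so their sum stays within $2^{-(W_f+1)}$. The search visits about $4^k$ pairs and is therefore only practical for small $k$.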
IV. CONCLUSION
In this work we have proposed addition aware quantization as a way to find fixed-point coefficients suitable for shift-and-add realization of the corresponding multiplication. By searching nearby coefficients it is often possible to find values that either have a smaller approximation error with the same addition count or, in some cases, a smaller addition count while still meeting the error specification. Several examples illustrated the usefulness and the properties of the method.

$^2$ $\sin(\pi/8) + \cos(\pi/8) \approx 1.3065629648763 > 1$.
