This paper presents a novel power-aware fractional bit-width optimization scheme for the floating-point to fixed-point transformation of digital signal processing (DSP) algorithms. The scheme guarantees accuracy at the output and saves power in multipliers. A Quantization-Operation-Error (QOE) model is used to construct the worst-case quantization error propagation. Based on QOE, a power reduction technique is proposed that dynamically reduces switching activity in multipliers whenever the worst case does not occur. Results of four case studies demonstrate that the power reduction technique, which is nearly free, saves 1.65% to 2.14% of system power.
Introduction
Power consumption has become a primary design criterion for modern DSP systems. The majority of low-power design techniques and analyses in electronic design target low levels of abstraction, such as the transistor level, gate level and Register Transfer Level (RTL). However, the most effective power reductions often stem from the system level [1].
High-precision computation is wasteful for most algorithms, and significant hardware reductions are possible. Bit-widths can be optimized to achieve the desired performance at an efficient implementation cost: higher speed, smaller area and lower power. This process, called floating-point to fixed-point transformation, can directly reduce power at the system level.
There are mainly two kinds of methods for bit-width optimization: simulation-based [2, 3, 4] and analytical [5, 6]. Simulation-based methods use extensive simulation to search for the bit-widths. Analytical methods deploy interval analysis and error models to analyze signals' ranges and precisions. Many computer arithmetic and scientific applications restrict the maximum absolute error bound at the output. Simulation-based methods cannot guarantee that the results stay within the error constraint for every input, so analytical methods are used for this kind of accuracy-guaranteed problem.
The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 presents our power-aware accuracy-guaranteed fractional bit-width optimization scheme. Section 4 gives results and comparisons of four case studies. Conclusions are summarized in Section 5.
Related work
Fang et al. [5] use Affine Arithmetic (AA), which considers the correlations among signals, to model range and precision analyses. It performs much better than Interval Arithmetic, but a signal's range and precision are solved in one single affine expression, which may limit the optimization.
Lee et al. [6] develop an approach called MiniBit, which also uses AA but treats the range and precision problems separately. MiniBit guarantees output accuracy while minimizing area cost. Power is not considered in MiniBit.
Mallik et al. [4] propose algorithms for trading off error constraint with power and area. They employ SystemC to accelerate their simulations and use a safety factor to tighten the error constraint for more convincing results. However, accuracy at output is not guaranteed in their work.
Proposed scheme

Background
Wordlength
A fixed-point signal's wordlength (WL) is composed of an integer part (IWL, including the sign bit for signed arithmetic), which prevents overflow, and a fraction part (FWL, or fractional bit-width), which sustains output accuracy.
Our scheme focuses on output accuracy, which concerns only the optimization of signals' FWLs.
Signals' IWLs can be derived with a method such as the Range Analysis in [6].
After a signal's FWL is determined, the signal is quantized. There are mainly two types of quantization: truncation and round to nearest, which respectively cause a maximum error of $2^{-FWL}$ and $2^{-FWL-1}$ to the signal. Truncation is cheaper to implement than round to nearest. From now on, $(W_x, I_x, F_x)$ is used to represent signal $x$'s (WL, IWL, FWL), and truncation is considered the default quantization type.
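The two error bounds can be checked numerically. The following is a minimal sketch, assuming truncation rounds toward minus infinity (as in two's complement hardware); the helper names `truncate` and `round_nearest` are hypothetical and chosen only for illustration.

```python
import math

def truncate(x: float, fwl: int) -> float:
    """Quantize x to fwl fractional bits by truncation (assumed: toward -inf)."""
    scale = 2 ** fwl
    return math.floor(x * scale) / scale

def round_nearest(x: float, fwl: int) -> float:
    """Quantize x to fwl fractional bits by rounding to the nearest level."""
    scale = 2 ** fwl
    return math.floor(x * scale + 0.5) / scale

fwl = 4
samples = [i / 1000.0 - 1.0 for i in range(2001)]  # grid over [-1, 1]

# Largest observed error of each quantization type on the grid.
max_trunc_err = max(x - truncate(x, fwl) for x in samples)
max_round_err = max(abs(x - round_nearest(x, fwl)) for x in samples)

print(max_trunc_err < 2 ** -fwl)          # truncation error stays below 2^-FWL
print(max_round_err <= 2 ** -(fwl + 1))   # rounding error stays within 2^-(FWL+1)
```

Truncation only drops bits, whereas round to nearest needs an extra addition before the bits are dropped, which is one reason truncation is the cheaper choice.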
Affine Arithmetic
Affine arithmetic (AA) [7] was developed as a refinement of range analysis. It not only keeps track of signals' intervals, but also preserves the correlations among them. Using AA, the quantized version $x_q$ of signal $x$ is:
where $\varepsilon_x \in [0, 1]$ represents the independent uncertainty of $x$, which propagates through the dataflow and contributes to the uncertainties of intermediate signals and the output.
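To illustrate why preserving correlations matters, the sketch below compares plain interval arithmetic with a minimal affine form on the difference of a quantization error with itself. The `Affine` class and the `interval_sub` helper are hypothetical names introduced for this illustration only, and noise symbols are taken in [0, 1] as in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Affine:
    """Affine form: center + sum(coef * eps), with each eps in [0, 1]."""
    center: float
    terms: dict = field(default_factory=dict)  # noise symbol name -> coefficient

    def __sub__(self, other: "Affine") -> "Affine":
        # Coefficients of shared noise symbols are combined, so correlation is kept.
        terms = dict(self.terms)
        for name, coef in other.terms.items():
            terms[name] = terms.get(name, 0.0) - coef
        return Affine(self.center - other.center, terms)

    def interval(self):
        # With eps in [0, 1], a negative coefficient lowers the minimum
        # and a positive coefficient raises the maximum.
        lo = self.center + sum(min(c, 0.0) for c in self.terms.values())
        hi = self.center + sum(max(c, 0.0) for c in self.terms.values())
        return lo, hi

def interval_sub(a, b):
    """Plain interval subtraction: ignores any correlation between a and b."""
    return (a[0] - b[1], a[1] - b[0])

# Quantization error of a signal with 4 fractional bits: 2^-4 * eps_x.
err_x = Affine(0.0, {"eps_x": 2 ** -4})

print((err_x - err_x).interval())                         # affine result
print(interval_sub(err_x.interval(), err_x.interval()))   # interval result
```

The affine form recognizes that both operands share the same noise symbol and returns a zero-width range, while interval arithmetic widens the range to twice the error bound.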
Power
Power consumption in a CMOS digital system consists of dynamic, short-circuit and leakage power. Short-circuit power is due to short-circuit current conducting directly from the supply to ground, and leakage power is primarily determined by fabrication technology. They account for a very small portion of the total power consumed in a system and are rather low-level issues. We are interested in higher levels of abstraction, so only dynamic power is considered.
$$P_{dynamic} = \alpha C_L V_{DD}^2 f_{clk}$$
where $\alpha$ is the switching activity parameter, $C_L$ is the load capacitance, $V_{DD}$ is the operating voltage and $f_{clk}$ is the clock frequency. From a high-level view, power can be saved through switching activity reduction, which minimizes the number of operations in computation.
In this paper, we consider that the maximum absolute error $\Delta_{output}$, which is constrained at the final output, is less than or equal to 1. Through interval analysis, it is easy to obtain the absolute error $\Delta_x$ of any input or intermediate signal $x$:
Accuracy guaranteed FWLs optimization
Module definition
From Eq. (1), signal $x$'s quantization error satisfies $0 \le Q_x \le 1$ and $F_x \ge 0$.
Error propagation and QOE
In Figure 1 , for "op = ±", c = a ± b.
where
Quantization-Operation-Error (QOE) is proposed to formulate the error introduced at a module's output by the quantization of its inputs and the subsequent operation.
For "op = ×", c = ab. From Eq.(2) and because xy ≤ (
where $QOE_{a \times b} = (|b| + 1)Q_a + (|a| + 1)Q_b$. From Eqs. (3) and (4), we can derive that the absolute error at the algorithm's output is an absolute linear summation of the inputs' quantization errors and all dataflow modules' QOEs.
The approximations in Eq. (4) enlarge the error expression, which results in more conservative FWLs and a larger area cost, but they separate the signals' quantization errors from each other. That makes the FWL search much easier and the power reduction possible.
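The multiplication bound can be sanity-checked numerically. The sketch below assumes that each quantized operand equals the exact operand minus its quantization error, with $0 \le Q \le 1$; the function names are hypothetical. One way to see why the bound holds under this assumption is that the cross term $Q_a Q_b$ never exceeds $Q_a + Q_b$ when both errors lie in [0, 1].

```python
import random

def qoe_mul(a: float, b: float, q_a: float, q_b: float) -> float:
    """QOE of a multiplier module: (|b| + 1) * Q_a + (|a| + 1) * Q_b."""
    return (abs(b) + 1.0) * q_a + (abs(a) + 1.0) * q_b

def product_error(a: float, b: float, q_a: float, q_b: float) -> float:
    """Actual error of the product when both operands are quantized.

    Assumption: quantized operand = exact operand - quantization error.
    """
    return abs(a * b - (a - q_a) * (b - q_b))

random.seed(0)
worst_slack = float("inf")
for _ in range(10000):
    a, b = random.uniform(-8, 8), random.uniform(-8, 8)
    q_a, q_b = random.uniform(0, 1), random.uniform(0, 1)
    slack = qoe_mul(a, b, q_a, q_b) - product_error(a, b, q_a, q_b)
    worst_slack = min(worst_slack, slack)

# A non-negative slack means the QOE bound was never exceeded in this run.
print(worst_slack >= 0)
```

The slack is the price of the approximation: the bound is looser than the true error, but each quantization error appears in its own term.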
FWLs Search Algorithm
To guarantee accuracy at the output, the worst case must be considered, in which the maximum absolute values of all coefficients and quantization errors are taken; for example, $\max|a|$, $\max|b|$, $\max Q_a$ and $\max Q_b$ are taken in Eq. (4). As in [6], the area of $x \pm y$ is modeled as $\max(I_x + F_x, I_y + F_y)$ and the area of $x \times y$ as $(I_x + F_x)(I_y + F_y)$. The total area of an algorithm is the sum of all modules' areas. After the signals' IWLs are determined (Section 3.1.1), the total area is a function of the combination of all signals' FWLs.
When the error constraint err_spec at the output is provided, our objective is to find the FWL combination that makes $\max \Delta_{output} \le$ err_spec while minimizing the total area.
We can build a 2-D max QOE Look Up Table (LUT) and an AREA LUT with respect to the $(F_a, F_b)$ combination for every module, which respectively store the module's maximum QOE (multiplied by coefficients as in Eq. (4)) and its area. Starting from the minimum FWLs, each greedy search step applies the most efficient FWL change among all modules, that is, the one that decreases the error most per unit of area increase. This moves the result very quickly towards err_spec. After err_spec is fulfilled, we can further reduce area with the least error increase, exploiting the gap between $\max \Delta_{output}$ and err_spec to find a lower-area solution. This greedy search algorithm quickly finds an area-efficient FWL combination.
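The two phases of the greedy search can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this paper: only multiplier modules are modeled, the output error is taken as the plain sum of the modules' maximum QOEs (unit coefficients), the maximum quantization error of a signal is $2^{-F}$, IWLs are at least 1, and the QOE and area values are computed on the fly instead of being read from LUTs. All names (`MulModule`, `greedy_fwl_search`, `max_fwl`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MulModule:
    ia: int        # IWL of input a (assumed >= 1)
    ib: int        # IWL of input b (assumed >= 1)
    max_a: float   # max |a|
    max_b: float   # max |b|

def max_qoe(m, fa, fb):
    # Worst-case QOE of a x b with max Q_a = 2^-fa and max Q_b = 2^-fb.
    return (m.max_b + 1) * 2.0 ** -fa + (m.max_a + 1) * 2.0 ** -fb

def area(m, fa, fb):
    return (m.ia + fa) * (m.ib + fb)

def total_error(modules, fwls):
    return sum(max_qoe(m, fa, fb) for m, (fa, fb) in zip(modules, fwls))

def total_area(modules, fwls):
    return sum(area(m, fa, fb) for m, (fa, fb) in zip(modules, fwls))

def greedy_fwl_search(modules, err_spec, max_fwl=32):
    fwls = [[0, 0] for _ in modules]  # start from the minimum FWLs

    # Phase 1: add the bit with the largest error decrease per unit area.
    while total_error(modules, fwls) > err_spec:
        best = None
        for i, m in enumerate(modules):
            for k in (0, 1):
                if fwls[i][k] >= max_fwl:
                    continue
                trial = list(fwls[i])
                trial[k] += 1
                d_err = max_qoe(m, *fwls[i]) - max_qoe(m, *trial)
                d_area = area(m, *trial) - area(m, *fwls[i])
                gain = d_err / max(d_area, 1)  # guard against zero area change
                if best is None or gain > best[0]:
                    best = (gain, i, k)
        if best is None:
            raise ValueError("err_spec cannot be met within max_fwl bits")
        fwls[best[1]][best[2]] += 1

    # Phase 2: remove bits, smallest error increase first, while staying in spec.
    improved = True
    while improved:
        improved = False
        candidates = []
        for i, m in enumerate(modules):
            for k in (0, 1):
                if fwls[i][k] == 0:
                    continue
                trial = list(fwls[i])
                trial[k] -= 1
                candidates.append((max_qoe(m, *trial) - max_qoe(m, *fwls[i]), i, k))
        for _, i, k in sorted(candidates):
            fwls[i][k] -= 1
            if total_error(modules, fwls) <= err_spec:
                improved = True
                break
            fwls[i][k] += 1  # revert: this removal would violate err_spec
    return [tuple(f) for f in fwls]

modules = [MulModule(2, 2, 1.5, 3.0), MulModule(3, 1, 4.0, 0.5)]
err_spec = 2.0 ** -8
result = greedy_fwl_search(modules, err_spec)
print(result, total_error(modules, result), total_area(modules, result))
```

Every state kept by the search is checked against err_spec with worst-case values, so the returned combination meets the constraint by construction; like any greedy method, it is not guaranteed to reach the global area minimum.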
Power reduction technique
Multipliers are the major sources of power consumption in typical DSP applications. Based on QOE in Section 3.2.2, we can reduce multipliers' power by decreasing their switching activities without sacrificing the required error constraint.
From Eq. (4), $QOE_{a \times b} = (|b| + 1)Q_a + (|a| + 1)Q_b$, where $\max|b|$ and $\max|a|$ are taken for the worst case. So, the quantization error that one multiplicand can tolerate is inversely proportional to the other multiplicand's absolute value plus 1. We can dynamically relax a multiplicand's quantization error when the other one's value does not reach its peak. Take $a$'s $Q_a$ in module $a \times b$ as the example:
where $F_{a\_t} = \log_2 \frac{\max|b|}{|b|}$ denotes the number of fractional bits truncated from $F_a$ because of the variation of $b$'s value.
One thing to note is that $F_a$ in Eq. (5) may be negative, which means that the resulting effective least significant bit can lie above the binary point. Just as in Eq. (5), the dynamically truncated fractional bits of $b$ are $F_{b\_t} = \log_2 \frac{\max|a|}{|a|}$. Figure 2 gives a circuit design example of Eq. (5) for a signed multiplier; the unsigned multiplier is simpler. However, it is more accurate to model a signal's distribution as a Gaussian process [8] (Figure 3). To achieve the best implementation efficiency, the bits of $a_q$ connected to the circuit in Figure 2 can depend on the statistics of $b$. For example, for a distribution of $b$ with a larger $E(F_{a\_t})$, more bits of $a_q$ should be connected to the circuit.
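The dynamic truncation count can be illustrated with a short sketch. It assumes that the logarithm is rounded down to an integer number of bits and that a hypothetical cap `max_drop` is applied when $b$ is zero; neither detail is stated in the text. The check covers only the $|b| Q_a$ term of the QOE and is not a proof that the full worst-case bound is preserved.

```python
import math

def dynamic_truncated_bits(b: float, max_abs_b: float, max_drop: int) -> int:
    """Fractional bits of a that can be dropped for the current value of b.

    Assumptions: floor(log2(max|b| / |b|)) is used as the bit count, and
    max_drop caps the result (it also covers the b == 0 case).
    """
    if b == 0:
        return max_drop
    return min(math.floor(math.log2(max_abs_b / abs(b))), max_drop)

fa = 12          # FWL of a found for the worst case
max_abs_b = 4.0  # peak magnitude of b

for b in (4.0, 2.5, 1.0, 0.3):
    fa_t = dynamic_truncated_bits(b, max_abs_b, max_drop=fa)
    relaxed_term = abs(b) * 2.0 ** -(fa - fa_t)   # |b| * relaxed max Q_a
    worst_term = max_abs_b * 2.0 ** -fa           # max|b| * worst-case max Q_a
    print(b, fa_t, relaxed_term <= worst_term)
```

The smaller the current magnitude of $b$, the more low-order bits of $a_q$ can be masked, which is what lowers the switching activity inside the multiplier.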
Han et al. [9] have studied the truncation of multiplicands in multipliers. For one of the most commonly used multipliers, the Wallace multiplier, reducing the multiplicands' least significant bits via a truncation mask reduces the multiplier's power nearly linearly with the truncated wordlength. Figure 4 gives the experimental results for a 16 × 16 Wallace multiplier implemented on a Xilinx Virtex-4 XC4VLX100-11 FPGA. "(WL, 16), From Figure 4 we can see that the proposed scheme can efficiently reduce multipliers' power without sacrificing the error constraint at the output.
Case Studies and Comparisons
Four applications are studied: degree-four polynomial approximation (D4PA), RGB to YCbCr color space conversion (RGB2YCbCr), 2 × 2 matrix multiplication (2 × 2 MM) using Strassen's algorithm and the 8 × 8 Discrete Cosine Transform (8 × 8 DCT). The error constraint at the output is set to $2^{-16}$. All inputs are modeled as Gaussian distributions. Multipliers use the Wallace tree algorithm. The target device is a Xilinx Virtex-4 XC4VLX100-11 FPGA and the clock frequency is 1 MHz. The Xilinx tool "XPower" is used to estimate dynamic power accurately after the design is synthesized, placed and routed by ISE 9.1i. The area and power of the applications, without (w/o) and with (w) the power reduction technique described in Section 3.3, are compared in Table 1.
Table 1: Case studies and comparisons
Conclusions
A novel power-aware accuracy-guaranteed fractional bit-width optimization scheme for the floating-point to fixed-point transformation of DSP algorithms is presented in this paper. A Quantization-Operation-Error (QOE) model is used to construct the worst-case quantization error propagation. Based on QOE, a power reduction technique is proposed that dynamically reduces switching activity in multipliers without sacrificing the required accuracy at the output. The power saving is nearly free.
