High-level synthesis is a method for automatically generating an RT-level hardware description from a high-level language such as C, and it is used in recent digital circuit design. Floating-point to fixed-point conversion with bit-length optimization is one of the key issues for area and speed optimization in high-level synthesis. However, the conversion is a tedious task for designers. This paper introduces an automatic bit-length optimization method for floating-point to fixed-point conversion in high-level synthesis. The method estimates computational errors statically and formalizes the optimization as a non-linear programming (NLP) problem. Applying an NLP technique improves the balance between computational accuracy and total hardware cost. Various constraints, such as unit sharing and the maximum bit-length of function units, can also be modeled easily. Experimental results show that our method is faster than typical approaches and reduces the hardware area.
Introduction
High-level hardware synthesis is a key technology for designing complex systems with a short time-to-market, and several industrial tools have been released [1]-[3]. RT-level hardware descriptions are generated automatically from a high-level language such as C or extended C. However, modern multimedia applications (MPEG, MP3, etc.) are typically designed with full-precision floating-point arithmetic. Such code must therefore be redesigned as a hardware-oriented algorithm in which fixed-point arithmetic replaces floating-point arithmetic. Many studies have been reported that aim to reduce the time needed for floating-point to fixed-point conversion.
In the FRIDGE project [4], [5], a conversion framework is proposed that checks the correctness of the specified bit-lengths of variables by simulation. Although static analysis is partially used to speed up the checking process, the designer still has to specify the word length of each register and function unit, which is a tedious task.
A translator from C programs with floating-point operations to DSP programs with only integer operations was developed by Kim et al. and Kum et al. [6]. In the translation, several properties of DSPs, such as the fixed word length of units and the operation set, are used to reduce the search space handled by simulated annealing. These properties are too restrictive for ASIC/FPGA design. Kim and Sung also proposed a numerical analysis method for IDCT architectures [7], where the bit-lengths of coefficients and adder units are calculated using a variance matrix. The method in [8], which is applied to digital filter design, also uses numerical analysis. It propagates and analyzes errors on the Z transfer functions of IIR and FIR filters. These methods target application-specific problems, and a more general framework is required.
†† The author is with the Graduate School of Informatics, Kyoto University, Kyoto-shi, 606-8501 Japan.
††† The author is with the Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma-shi, 630-0101 Japan.
a) E-mail: nobuhiro doi@fuji.waseda.jp
DOI: 10.1093/ietfec/e89-a.12.3427
Static analysis methods [9]-[11] are practical and applicable to general problems. They statically compute the maximum error of each operation in a program and estimate the word lengths of function units that guarantee the specified accuracy. Methods of this type perform a one-pass estimation using program analysis, so their processing time is far shorter than that of simulation-based approaches. George A. Constantinides et al. present an approach to the word-length allocation and optimization problem for linear digital signal processing systems in [9]. In the optimization, the word lengths of fixed-point operation units are determined using a mixed integer programming technique. They also propose an enhanced method [10] that is applicable to non-linear systems, in which an approximation technique based on Taylor expansion is used for non-linear structures. In [11], a static error analysis technique for the code generation of DSP applications is introduced. The smart interval method Affine Arithmetic [12] is used for the error analysis instead of the traditional Interval Arithmetic [13].
We have been developing a bit-length optimization system based on static analysis [14], [15]. In this paper, we propose a refined optimization algorithm. Its main feature is the formalization as a non-linear problem. The computational error caused by the conversion from floating-point to fixed-point operations is estimated statically, and an NLP technique is applied for area and speed optimization. With NLP-based optimization, a near-optimum solution that guarantees the specified accuracy is computed in a short time, and additional constraints can be appended easily.
The remainder of the paper is organized as follows. Section 2 gives an overview of our compiler, and Sect. 3 presents fixed-point arithmetic. Section 4 describes our error model and the error propagation method based on affine arithmetic. Section 5 explains the bit-length optimization method with the non-linear programming formulation. Section 6 shows the implementation and evaluation, and Sect. 7 concludes the paper.
Copyright © 2006 The Institute of Electronics, Information and Communication Engineers
Bit-Length Optimization Flow
The optimization flow of the proposed compiler is shown in Fig. 1. The input is written in SystemC [16]. SystemC defines custom integer types (sc_int) and fixed-point types (sc_fixed), so it is well suited to describing the original algorithm.
The compiler receives an original algorithm that includes both char/int and float/double data types, and performs lexical, syntactic and semantic analysis. After this preprocessing, a Control/Data Flow Graph (CDFG) is constructed. The CDFG consists of basic blocks, and each block contains a partially ordered set of operations and control flows to other basic blocks. The next two steps analyze the value ranges and the computational errors; both general 16/32-bit integer operations and floating-point operations are analyzed. The result of the analysis is used in the following step, "Formulation for the non-linear programming." A general non-linear solver can be applied to solve the problem, and the optimized bit-lengths of variables and function units are computed. However, the solver returns real values, so an extra rounding operation is applied to obtain integer results. After these steps, a fixed-point algorithm or RT-level code is generated.
The bit-length optimization problem is summarized as follows:
Inputs:
• Control/Data Flow Graph
• Hardware specification that gives the known bit-lengths of input/output variables and operation units, the acceptable error, the maximum hardware area and so on.
Outputs:
• Optimized bit-lengths of variables and function units that enable accurate computation while satisfying the area constraints.
Fixed-Point Arithmetic
A program for image/sound processing usually includes floating-point operations. Since floating-point operation units require many transistors and much power, their use is highly limited in mobile applications. In many cases, these costly operations are converted to low-cost fixed-point operations for hardware implementation, even though computational errors are introduced. The fixed-point format is the same as the integer representation except for the binary point, so its operations can be executed on integer units. The typical format of fixed-point arithmetic is specified by a 3-tuple $\langle sign, wl, iwl \rangle$, where
• sign: 2's complement representation,
• wl: word length (the number of bits),
• iwl: integer word length.
In addition, the word length of the fractional part is denoted by $fwl$ ($wl = iwl + fwl$) (Fig. 2). When converting one fixed-point data type instance $\langle s_1, wl_1, iwl_1 \rangle$ into a different one $\langle s_2, wl_2, iwl_2 \rangle$, the bit-length of the fractional part must be cut down in some cases. In the following, Truncation is used as the default policy since it is simpler to implement (i.e. it requires less hardware) than Round-off.
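The effect of cutting the fractional part by truncation can be illustrated with a minimal Python sketch. The function names are hypothetical and the value 0.1684 is simply the constant used later in the example; the sketch assumes that truncation rounds toward minus infinity, as dropping low-order bits of a 2's complement number does.

```python
import math


def truncate_to_fixed(value: float, fwl: int) -> float:
    """Truncate `value` to `fwl` fractional bits (round toward minus infinity)."""
    scale = 1 << fwl  # 2**fwl
    return math.floor(value * scale) / scale


def truncation_error(value: float, fwl: int) -> float:
    """Error introduced by truncation; it lies in [0, 2**-fwl)."""
    return value - truncate_to_fixed(value, fwl)


if __name__ == "__main__":
    for fwl in (4, 8, 12):
        fixed = truncate_to_fixed(0.1684, fwl)
        print(fwl, fixed, truncation_error(0.1684, fwl))
```

Because the error is never negative under this policy, it can be modeled with a one-sided range, which is cheaper in hardware than Round-off but gives a biased error.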
Error Model Based on Affine Arithmetic
Value ranges and computational errors are closely related to the quality of the optimization, so they should be analyzed accurately. We use a static analysis method based on Affine Arithmetic [12], which is a smart interval method. Each value is represented as a range called an affine form. We modified the basic model for our formalization.
In this section, we describe the basics of affine arithmetic, and then explain our error model and error propagation method.
Affine Arithmetic
In a typical high-level synthesis system, Interval Arithmetic (IA) [13] is used for static analysis. It is useful for range analysis, but it overestimates ranges, especially for correlated variables. Affine Arithmetic [12] is a refined model that overcomes this problem of IA. In affine arithmetic, the uncertainty of a variable $x$ is represented as a range called the affine form $\hat{x}$:
$$\hat{x} = x_0 + x_1 \varepsilon_1 + x_2 \varepsilon_2 + \cdots + x_n \varepsilon_n, \qquad \varepsilon_i \in [-1, 1],$$
where $x_0$ is the central value of $\hat{x}$, and $\varepsilon_i$ are uncertainty terms weighted by $x_i$. The magnitude of the weight $x_i$ depends on other variables. These uncertainty terms may cancel out in an operation, so the result of range analysis is tighter than that of IA.
Operations on affine forms such as $\hat{x} \pm \hat{y}$, $a \pm \hat{x}$ and $a\hat{x}$ can be defined easily, and their results are still affine forms. The result of a multiplication, however, is no longer an affine form. In this case, an approximated affine form is used as the result.
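The cancellation of shared uncertainty terms can be sketched in a few lines of Python. The class below is an illustrative assumption, not the data structure of the proposed compiler; it uses the standard convention that every uncertainty term ranges over [-1, 1] and implements only the affine operations mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class AffineForm:
    center: float
    terms: dict = field(default_factory=dict)  # uncertainty symbol -> weight

    def __add__(self, other: "AffineForm") -> "AffineForm":
        terms = dict(self.terms)
        for symbol, weight in other.terms.items():
            terms[symbol] = terms.get(symbol, 0.0) + weight
        return AffineForm(self.center + other.center, terms)

    def __neg__(self) -> "AffineForm":
        return AffineForm(-self.center, {s: -w for s, w in self.terms.items()})

    def __sub__(self, other: "AffineForm") -> "AffineForm":
        return self + (-other)

    def scale(self, a: float) -> "AffineForm":
        """Multiply by a constant; the result is still an affine form."""
        return AffineForm(a * self.center, {s: a * w for s, w in self.terms.items()})

    def interval(self) -> tuple:
        """Enclosing interval, with every uncertainty term in [-1, 1]."""
        radius = sum(abs(w) for w in self.terms.values())
        return (self.center - radius, self.center + radius)


if __name__ == "__main__":
    x = AffineForm(10.0, {"e1": 2.0})  # x lies in [8, 12]
    print((x - x).interval())          # shared term e1 cancels exactly
```

With plain interval arithmetic, [8, 12] - [8, 12] would give [-4, 4], because the correlation between the two operands is lost.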
Error Modeling Using Affine Arithmetic
Errors on variables or operations are represented with affine forms. We describe the error models for constants, variables and operations in this section. Note that we modify the range of an uncertainty term $\varepsilon$ from [−1, 1] to [0, 1] because Truncation is used as the default rounding policy for the hardware implementation.
-Constants -
In the high-level language, real values are represented in floating-point format (float/double), which provides enough accuracy for practical applications. However, they are mapped to fixed-point variables in the hardware implementation. As described in Sect. 3, the fractional word length is not infinite, so fixed-point numbers have a quantization error.
Let $A$ be a fixed-point value and $\dot{L}_A$ be its fractional word length. The quantization error of $A$ takes the range $[0, 2^{-\dot{L}_A}]$, so the range ($\hat{A}$) and error ($\Delta\hat{A}$) of value $A$ are represented with affine forms as follows:
$$\hat{A} = A_0 + \boldsymbol{2^{-\dot{L}_A}\varepsilon_{E_A}}, \qquad \Delta\hat{A} = 2^{-\dot{L}_A}\varepsilon_{E_A},$$
where $A_0$ is the central value and $\varepsilon_{E_A}$ is the uncertainty term for the quantization error. The part of the expression written in bold font is the error of $A$.
-Variables -
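For the representation of variables, an uncertainty term for the variable range is appended. The range ($\hat{X}$) and error ($\Delta\hat{X}$) of variable $X$ are represented as follows:
$$\hat{X} = X_0 + X_1\varepsilon_X + \boldsymbol{X_2 2^{-\dot{L}_X}\varepsilon_{E_X}}, \qquad \Delta\hat{X} = X_2 2^{-\dot{L}_X}\varepsilon_{E_X},$$
where $X_1\varepsilon_X$ indicates the range of the variable, and $X_2 2^{-\dot{L}_X}\varepsilon_{E_X}$ indicates that of the quantization error.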
-Addition / Subtraction -
The range and error for addition/subtraction are derived from the affine forms of the variables.
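As a hedged sketch of what this derivation looks like for $Z = X \pm Y$, the affine forms of the two operands can be combined term by term. The symbols follow the variable model of this section; the sketch assumes that the result keeps enough fractional bits that no new truncation term is needed, which may differ from the exact form used by the authors.

```latex
\hat{Z} = \hat{X} \pm \hat{Y}
        = (X_0 \pm Y_0) + X_1\varepsilon_X \pm Y_1\varepsilon_Y
          + \left( X_2 2^{-\dot{L}_X}\varepsilon_{E_X} \pm Y_2 2^{-\dot{L}_Y}\varepsilon_{E_Y} \right),
\qquad
\Delta\hat{Z} = X_2 2^{-\dot{L}_X}\varepsilon_{E_X} \pm Y_2 2^{-\dot{L}_Y}\varepsilon_{E_Y}.
```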
-Multiplication -
The result of a multiplication is no longer an affine form, so we introduce a new independent component. The range of the multiplication result can be described as follows:
The affine form has a second-order term $\varepsilon_X \varepsilon_Y$. This term can be replaced with a new independent component $\varepsilon_Z$ because the
-Division / Square root -
The range and error for division and square root cannot be derived directly from the right-hand side, so an approximation is introduced [12]. For division, the operation is converted to a multiplication by the reciprocal of the divisor. The Chebyshev approximation technique is useful for estimating the range of the reciprocal. This approximation technique is also applicable to the square root. The implementation of the approximation method is future work.
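Since the text leaves the implementation as future work, the following is only an illustration of the standard Chebyshev (minimax) affine approximation of the reciprocal on a positive interval, as used in the affine arithmetic literature. The function name and the returned triple are assumptions made for this sketch.

```python
import math


def chebyshev_reciprocal(a: float, b: float) -> tuple:
    """Return (alpha, zeta, delta) such that
    |1/x - (alpha * x + zeta)| <= delta for every x in [a, b], with 0 < a <= b."""
    if not 0 < a <= b:
        raise ValueError("the interval must be strictly positive")
    alpha = -1.0 / (a * b)          # slope of the secant through (a, 1/a) and (b, 1/b)
    d_max = 1.0 / a + 1.0 / b       # value of 1/x - alpha*x at both end points
    d_min = 2.0 / math.sqrt(a * b)  # its minimum, reached at x = sqrt(a*b)
    zeta = 0.5 * (d_max + d_min)    # centre the line between the two extremes
    delta = 0.5 * (d_max - d_min)   # remaining approximation error
    return alpha, zeta, delta


if __name__ == "__main__":
    print(chebyshev_reciprocal(1.0, 4.0))
```

In an affine-arithmetic setting, `delta` would become the weight of a new independent uncertainty term attached to the approximated reciprocal.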
Error Propagation
Errors of source variables are propagated to the destination variables through the operations. Each operation in a data-flow graph can be described in SSA (Static Single Assignment) form. One operation is described as $Z = X \bullet Y$, where $\bullet$ is an arithmetic operation. The range and error of the operation result are estimated based on the error model described in Sect. 4.2.
In general, a source program includes control structures such as loops, so the following rules are applied in the error propagation.
-Branch -
For branch operations (i.e. if-then-else), we cannot decide at compile time which part will be taken. Therefore, both the then part and the else part are analyzed, and a bit-length sufficient for accurate computation in both parts is prepared. For the propagation, the following conversion technique is introduced:
Let $\dot{L}_{X_{if}}$ and $\dot{L}_{X_{else}}$ be the fractional word lengths of variable $X$ in each branch part. In this case, the fractional word length of $X$ should be equal to $\mathrm{MAX}(\dot{L}_{X_{if}}, \dot{L}_{X_{else}})$. The MAX function cannot be handled in NLP, so these two parameters are replaced by one parameter $\dot{L}_Z$ as follows:
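One plausible reading of this replacement, given here as an assumption rather than as the authors' exact formulas, is that the single parameter is substituted for both branch parameters in every error expression. An equivalent way to state it to a solver is a pair of inequality constraints: since the objective minimizes word length, the new parameter settles at the larger of the two requirements.

```latex
\dot{L}_{X_{if}} \;\rightarrow\; \dot{L}_Z, \qquad \dot{L}_{X_{else}} \;\rightarrow\; \dot{L}_Z
\qquad\text{or, equivalently,}\qquad
\dot{L}_Z \ge \dot{L}_{X_{if}}, \quad \dot{L}_Z \ge \dot{L}_{X_{else}}.
```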
-Loop -
Loop structures are categorized into the following three types:
1. The number of loop iterations is known.
2. The number of loop iterations is not known, but the maximum number of loop iterations is known.
3. There is no information on the number of loop iterations.
For loop structures of type 1 or 2, we apply our method after loop unfolding. Several preprocessors are available that can unfold loops. For loop structures of type 3, we need to find properties (fixed point properties) that are not changed by applying the loop body. The estimation of programs that include type-3 loops is future work.
Formalization with NLP Technique
This section presents the bit-length optimization method through its formalization with a non-linear programming technique. An overview of the optimization flow is given in Fig. 1. Computational errors are analyzed using the affine model as described in Sect. 4, and the total amount of error on the output variables is estimated. Then, conditional functions are constructed from the errors and the hardware specification, such as the acceptable error and the maximum bit-length. The bit-length optimization problem under these constraints is solved by applying a general non-linear solver.
In this section, we first show a motivating example, and then describe the formalization with the non-linear programming technique. After that, an extra rounding operation is applied to find integer results. We also discuss the construction of conditional functions and how easily they can be added.
Motivating Example
We use the following equation as an example to explain the behavior of the optimization algorithm. The example program contains the variables (tmp3, tmp4, tmp5, tmp6). The program also includes three real constants defined as double-precision floating-point numbers. These constants must be converted to fixed-point constants for hardware implementation. Our algorithm finds the minimum fractional word length of each variable required for accurate computation.
The following problem specifications are given:
• Each input variable (red, green, blue) is an 8-bit integer and takes values between 0 and 255. These variables are integers and have no errors.
• The output variable (Cr) is a fixed-point value and takes values between 0 and 255. The error in Cr should be within ±0.5.
Value Range Analysis and Error Analysis
The first step of the optimization is value range analysis and error analysis. These analyses are performed statically based on affine arithmetic. For example, the range and error of tmp3 in the source program are computed as follows:
Here, $C_0$ is a real constant (0.1684), and $\dot{L}_{C_0}$ and $\dot{L}_{tmp3}$ are the fractional word lengths of $C_0$ and tmp3. The quantization error of tmp3 ($2^{-\dot{L}_{tmp3}} \cdot \varepsilon_{tmp3}$) is newly introduced. In the same way, $\widehat{tmp4}$, $\widehat{tmp5}$ and $\widehat{tmp6}$ are computed as follows:
where the real constant 0.5 can be converted to a fixed-point value without quantization error. Finally, the range and error of the output variable Cr are derived. The error of Cr ($= \Delta\widehat{Cr}$) is represented with an affine form:
The fractional word length of tmp6 is totally dependent on those of tmp3 and tmp4, so it has no effect on the computational error.
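To make the error analysis of a single operation concrete, the sketch below bounds the worst-case error of tmp3 and checks the bound against a bit-accurate simulation. It assumes, as the quoted range of tmp3 suggests but the text does not state explicitly, that tmp3 = C0 × red with an error-free integer input red in [0, 255]; all function names are hypothetical.

```python
import math

C0 = 0.1684  # real constant quoted in the text


def tmp3_error_bound(l_c0: int, l_tmp3: int, red_max: int = 255) -> float:
    """Worst-case error of tmp3 = C0 * red under the truncation model
    (every uncertainty term set to its maximum value 1)."""
    constant_error = red_max * 2.0 ** -l_c0  # error of C0, scaled by the largest red
    new_truncation = 2.0 ** -l_tmp3          # quantization error introduced by tmp3
    return constant_error + new_truncation


def tmp3_fixed(red: int, l_c0: int, l_tmp3: int) -> float:
    """Bit-accurate model: truncate C0, multiply, then truncate the product."""
    c0_fixed = math.floor(C0 * 2 ** l_c0) / 2 ** l_c0
    return math.floor(c0_fixed * red * 2 ** l_tmp3) / 2 ** l_tmp3


if __name__ == "__main__":
    bound = tmp3_error_bound(10, 3)
    worst = max(C0 * r - tmp3_fixed(r, 10, 3) for r in range(256))
    print(bound, worst)  # the simulated worst case stays below the static bound
```

The static bound is pessimistic compared with the simulated worst case, which is the price paid for a guarantee that holds without enumerating inputs.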
Formalization for Non-linear Problems
After the analysis, the range and error of each primary output are expressed in affine form. The expression can be considered as a polynomial in which the fractional word lengths $\dot{L}_i$ ($i = 0, 1, \cdots, n-1$) are unknown parameters. A non-linear programming technique is used to determine $\dot{L}_i$ under the constraints.
Let $O_j$ ($j = 0, 1, \cdots, m-1$) be the primary outputs of a module, and let $Lim(O_j)$ be the acceptable error specified by the designer. It is required that $\Delta O_j$ does not exceed its acceptable error $Lim(O_j)$. The estimated error is described as follows:
Then, the following conditional expressions are drawn:
They are regarded as the conditional functions of the non-linear problem. The optimization goal is the minimization of hardware cost, so the objective function is described as follows:
where $L_i$ is the word length of each variable, i.e. the sum of the fractional word length ($\dot{L}_i$) and the integer word length ($\ddot{L}_i$). The integer word length $\ddot{L}_i$ is determined easily from the value range of the variable.
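In generic form, the resulting problem can be sketched as below. This is only an outline under the assumption that the worst-case error of each output is a weighted sum of terms $2^{-\dot{L}_i}$; the weights $a_{j,i}$ and the function name $Cost$ are placeholders that do not appear in the text.

```latex
\begin{aligned}
\text{minimize}\quad   & Cost(L_0, L_1, \ldots, L_{n-1}), \qquad L_i = \dot{L}_i + \ddot{L}_i \\
\text{subject to}\quad & \max \left| \Delta O_j \right| \;=\; \sum_{i} \left| a_{j,i} \right| 2^{-\dot{L}_i} \;\le\; Lim(O_j),
                         \qquad j = 0, 1, \ldots, m-1
\end{aligned}
```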
The cost function should represent the actual hardware cost, but it can be replaced with an approximate function. One possible definition is as follows:
• Addition / Subtraction: The hardware cost is proportional to the maximum word length of the source variables.
• Multiplication: The hardware cost is proportional to the product of the source variables' word lengths.
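The approximate cost model above can be written down directly. In this sketch the proportionality constants are assumed to be 1 for both operation types, and the function names and the tuple encoding of operations are hypothetical.

```python
def operation_cost(op: str, wl_x: int, wl_y: int) -> int:
    """Approximate hardware cost of one two-operand operation."""
    if op in ("+", "-"):
        return max(wl_x, wl_y)  # adder width follows the wider operand
    if op == "*":
        return wl_x * wl_y      # array multiplier grows with the product
    raise ValueError(f"unsupported operation: {op}")


def total_cost(operations) -> int:
    """Sum the cost over a list of (op, wl_x, wl_y) tuples."""
    return sum(operation_cost(op, wl_x, wl_y) for op, wl_x, wl_y in operations)


if __name__ == "__main__":
    print(total_cost([("*", 8, 12), ("+", 12, 10)]))
```

A real flow would weight the two operation types differently, since a multiplier bit is far more expensive than an adder bit.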
Recalling the example and its specification, the error of the primary output $\Delta Cr$ should lie in the range [−0.5, 0.5], so the conditional function is −0.5 < $\Delta Cr$ < 0.5.
There are seven fixed-point constants/variables, so the objective function is
where the integer word length of each constant/variable is computed from its range. For example, the integer word length of tmp3 ($\ddot{L}_{tmp3}$) is 7 bits, since the range is $[0, 0.1684 \cdot 255]$.
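The derivation of an integer word length from a value range can be sketched as follows. The function name is hypothetical, and the sketch assumes that one sign bit is added to the magnitude bits, which is consistent with the 7-bit figure quoted for tmp3; for ranges whose lower bound is exactly a negative power of two, this rule is conservative by one bit.

```python
import math


def integer_word_length(lo: float, hi: float, signed: bool = True) -> int:
    """Bits needed for the integer part of any value in [lo, hi]."""
    magnitude = math.floor(max(abs(lo), abs(hi)))  # largest integer part
    bits = magnitude.bit_length()                  # bits for the magnitude
    return bits + 1 if signed else bits            # optional sign bit


if __name__ == "__main__":
    # tmp3 ranges over [0, 0.1684 * 255], i.e. up to about 42.9
    print(integer_word_length(0.0, 0.1684 * 255))
```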
Application of NLP Solver
The bit-length optimization problem is formalized as a non-linear problem as described in Sect. 5.3. A general non-linear solver is applied to solve the problem; we use the SQP (Sequential Quadratic Programming) method [17], [18]. It is an algorithm for minimizing a differentiable real function $f$ subject to inequality and equality constraints. The gradients of the constraints are used to guide the exploration.
The optimization result for the example program is shown in Table 1, where the cost function described in Sect. 5.3 is used as the optimization goal.
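To show why the relaxed problem yields real-valued word lengths, the sketch below solves a deliberately simplified instance in closed form. It is not SQP: it assumes that the objective is the plain sum of fractional word lengths and that the error bound is a sum of terms w_i · 2^(-L_i) with positive weights, in which case the KKT conditions force every term to contribute the same share of the acceptable error. The function names and the weights are assumptions.

```python
import math


def relaxed_fractional_lengths(weights, limit):
    """Minimize sum(L_i) subject to sum(w_i * 2**-L_i) <= limit over real L_i.
    At the optimum every term equals limit / n, so L_i = log2(n * w_i / limit)."""
    n = len(weights)
    return [math.log2(n * w / limit) for w in weights]


def error_bound(weights, lengths):
    """Worst-case error for the given fractional word lengths."""
    return sum(w * 2.0 ** -l for w, l in zip(weights, lengths))


if __name__ == "__main__":
    weights = [255.0, 1.0]  # e.g. a constant scaled by red, and a truncation term
    lengths = relaxed_fractional_lengths(weights, 0.5)
    print(lengths, error_bound(weights, lengths))
```

For general cost functions and several output constraints no such closed form exists, which is where an iterative solver such as SQP is needed.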
Getting Integer Result
In general, the results of a non-linear problem are real values (e.g. Table 1), so these values are re-evaluated to find integer results for the hardware implementation. The following re-evaluation methods are available:
• Rounding-up: All real values are rounded up. This approach is simple, and the result always guarantees the computational accuracy.
• Step-by-Step Search: Sets of fractional word lengths around the optimization result are evaluated, and the result that guarantees the computational accuracy and occupies the least area is selected. This approach may find a better result than the previous approach, but it takes much more time.
Table 2 shows the result of the rounding-up approach. The result of the other approach, shown in Table 3, is also feasible and achieves a smaller area.
Table 3: Optimization result with 'Step-by-step search.'
The fractional word lengths of {tmp3, tmp4, tmp5, tmp6} are reduced from 6 bits to 3 bits when "Step-by-step" search is applied. In the "Rounding-up" strategy, all real values are rounded up systematically, even if a value is nearly equal to an integer. However, rounding up one value (e.g. 10.16 → 11) reduces the quantization error. In such cases, other real values can be cut down as long as the total computational error does not exceed the acceptable error. The "Step-by-step" strategy tries to find such a result.
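The two re-evaluation strategies can be compared on a toy instance. The sketch below is an assumption-laden illustration: the error model (a weighted sum of 2^(-L_i)), the cost (the sum of fractional bits), the search radius and all names are hypothetical, and the exhaustive neighborhood enumeration stands in for whatever search order the actual tool uses.

```python
import itertools
import math


def error_bound(weights, lengths):
    """Worst-case error for integer or real fractional word lengths."""
    return sum(w * 2.0 ** -l for w, l in zip(weights, lengths))


def round_up(real_lengths):
    """'Rounding-up': always feasible because the error shrinks as bits grow."""
    return [math.ceil(l) for l in real_lengths]


def step_by_step(weights, real_lengths, limit, radius=1):
    """'Step-by-step': try integer points around the real solution and keep
    the feasible one with the fewest total fractional bits."""
    best = round_up(real_lengths)
    ranges = []
    for l in real_lengths:
        base = math.floor(l)
        ranges.append(range(max(0, base - radius), base + radius + 2))
    for candidate in itertools.product(*ranges):
        feasible = error_bound(weights, candidate) < limit
        if feasible and sum(candidate) < sum(best):
            best = list(candidate)
    return best


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0]
    real = [math.log2(5.0)] * 3  # relaxed optimum for an error limit of 0.6
    print(round_up(real), step_by_step(weights, real, 0.6))
```

In this instance rounding up spends 9 fractional bits, while the neighborhood search finds a feasible assignment with 8 by giving one variable an extra bit and cutting the other two. The enumeration grows exponentially with the number of variables, which matches the remark that this strategy takes much more time.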
Additional Conditional Functions
In the optimization using non-linear programming technique, the conditional functions for computational accuracies are specified as inequality constraints. Various constraints can be specified easily in the same way.
-Case-1: Maximum word length specification -
In some cases, the maximum word length of operation units is limited (e.g. 16 bits or 32 bits), and all operations must be bound to these units. To find a feasible result, the designer simply appends conditional functions as follows:
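A constraint of this kind could take the following shape, given as a sketch only. The symbol $W_{max}$ for the unit width is a placeholder introduced here, and the constraint is assumed to apply to every variable $i$ that is bound to a width-limited unit.

```latex
L_i = \dot{L}_i + \ddot{L}_i \;\le\; W_{max}, \qquad \text{e.g. } W_{max} = 16 \text{ or } 32 .
```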
Scheduling should be considered for operation unit sharing. The implementation of these techniques is future work.
Computational Complexity
The computational complexity can be divided into two parts. One is the computation time to generate the non-linear formulas from the Control/Data Flow Graph, and the other is the time to solve the non-linear problem.
In the non-linear formula generation, we assume a sequence of operations with two operands each. We have shown a systematic method to generate non-linear conditional formulas from each arithmetic operation. Note that the number of generated conditional formulas is proportional to the number of operations. Let $n$ be the number of operations; then the computational complexity is $O(n)$. Also note that the number of variables in the non-linear problem is $O(n)$.
The computational complexity of solving the non-linear problem depends on the algorithm. We use SQP, in which QP is solved repeatedly. QP is NP-hard in general [19]. Therefore, SQP can also be regarded as NP-hard in the worst case. However, the computation time for typical problems is not large, as shown in our experiments.
Implementation and Evaluation

Implementation Detail
We have implemented the optimization algorithm in C++ (5000 lines) using the SystemC library [16], the ANTLR parser library [20] and the DONLP2 library [18]. We then applied it to two sample programs: color space conversion (the example program) and an FIR filter. The experiments were executed on a Linux PC with a 2.4 GHz Pentium 4 and 1 Gbyte of memory.
We compared the results of the proposed algorithm with those of three other approaches: 'Manual optimization,' 'GA based optimization' and 'Previous algorithm.' In 'Manual optimization,' the designer analyzes the program and uses this knowledge to reduce the fractional word lengths. The designer checks the correctness of the optimization result by exhaustive simulation. Since exhaustive simulation is impractical for a large program, the input patterns are limited. 'GA based optimization' is optimization based on a genetic algorithm. After the 'Value Range Analysis & Error Analysis,' the genetic algorithm is applied to find a feasible set of fractional word lengths. In the algorithm, a gene is modeled as a set of fractional word lengths $\{\dot{L}_0, \dot{L}_1, \cdots, \dot{L}_{n-1}\}$, and the fitness function represents the hardware area. The results are generated by iterative execution, and the best one is selected as the final result. 'Previous algorithm' is the one we proposed in [15]. In that algorithm, static analysis is applied, and some heuristics are used to find the fractional word lengths.
In the experiments, we give the acceptable errors for the primary outputs as constraint functions, and define the objective function for the proposed method as follows:
$$\text{minimize} \; \sum_{i=0}^{n-1} L_i$$
Although the sum of the variables' word lengths is not equal to the actual hardware cost, it can be regarded as a measure of it.
Application to Color Space Conversion
The optimization result for color space conversion is shown in Table 4. Note that only the fractional word lengths are summarized in the table.
The result shows that the proposed algorithm achieves fractional word length optimization under the constraints. The result of the proposed algorithm is comparable to that of 'Manual optimization.' Only $\dot{L}_{C_0}$ obtained by 'Manual optimization' is smaller than the others, because that approach is not a static one.
Application to Various Programs
The proposed algorithm is applied to two sample programs: color space conversion and an FIR filter. The results are shown in Table 5, where '#var' is the number of floating-point variables and constants declared in the source program, '#Bits' is the sum of the fractional word lengths, 'Time' is the CPU time for optimization, and '*' denotes that the correctness of the result is not guaranteed because of the limited simulation patterns. Since the number of combinations of input values for the simulation blows up, the number of input patterns is limited to 1,000,000. For YCrCb, we apply our algorithm with the constraint −0.5 < output error < 0.5, and for FIR with the constraint −1.0 < output error < 1.0.
From the results, we can conclude that our algorithm successfully optimizes the word lengths. The total bit lengths obtained by 'Manual optimization' and by the proposed algorithm are nearly the same. Furthermore, our algorithm guarantees correctness and is faster than the other approaches.
The results of the proposed algorithm are worse than those of 'Manual optimization' for several variables in Table 4. Note, however, that 'Manual optimization' does not guarantee correctness in the worst case; it only tests the equality of a program using the float type and one using the fixed type with respect to 1,000,000 random patterns. Also note that 'Manual optimization' is effective for programs with a small number of variables but is hard to apply to programs with a large number of variables.
Conclusion
We have described an automatic bit-length optimization method for floating-point to fixed-point conversion in high-level synthesis. The method estimates computational errors statically based on the smart interval method Affine Arithmetic. The minimization of the fractional word lengths is formulated as a non-linear problem and solved using an SQP method. We also discussed the construction of various constraint functions. Our algorithm does not require simulation on huge data sets, is fast, and guarantees worst-case accuracy.
We have implemented the optimization algorithm and applied it to two sample programs. From the results, we found that the method is useful for the estimation of fractional word lengths.
We have shown a basic idea for the estimation of the fractional word length. Further refinement of the algorithm is needed.
One important problem is the unification of scheduling, unit sharing and bit-length optimization. They are closely related to each other and have great effects on the performance of synthesized circuits. Constraints such as maximum area and latency should be included as well.
The evaluation of the optimization result is another important topic. The optimization result of our algorithm should be compared with the exact optimum to evaluate how well the algorithm achieves bit-length optimization.
