Abstract - The word length of Functional Units (FUs) has a great impact on design costs. This paper addresses the problem of choosing a different word length for each FU while considering circuit area and power consumption. A high-level synthesis tool is used to minimize the circuit area and power consumption by selecting an optimal word length for each FU in the system. Our results demonstrate that by customizing word lengths to non-standard sizes, savings can be made in the overall area and power without losing accuracy.
I. INTRODUCTION
One of the problems in implementing signal processing algorithms on digital hardware is choosing an appropriate word length for arithmetic units. Traditionally this problem is solved by making a worst-case assumption and choosing a single word length for all arithmetic units. Using a word-length less than this worst-case assumption at different points in the algorithm would, however, save both area and power.
Several pieces of work have focused on finding an optimal word length for the algorithm as a first step and then designing or optimizing the system within that constraint [2]. In that approach, word length is not considered in the subsequent optimization process. In other studies, in which word length has been considered during optimization [9], signals have been categorized into a few groups to constrain the word length in all functional blocks, or only one objective has been considered in addition to digital noise [1, 8].
High Level Synthesis (HLS) has been considered a key factor in reducing the distance between the initial specification and the target design [3]. Because of the variety of possible applications, domain-specific HLS tools are needed to achieve an optimal solution.
In this work, we present a multi-objective optimization method that optimizes circuit area and power consumption by choosing an optimum word length for each FU. Cost functions for area, power consumption and digital noise are discussed in Section II, Section III gives a short description of the implemented design tools, and Section IV is devoted to the GA method that has been applied. Results are presented in Section V.
II. COST FUNCTION
From a high-level synthesis point of view, both the total area and the power consumption of a system can be divided into three parts: datapaths, controllers and interconnections. Since this work focuses on word-length optimization, area and power costs are considered as functions of the functional unit word lengths. Depending on the implementation methodology, word lengths have different impacts on each part (datapath, controller and interconnections) of the metrics. In our method, changing the word length does not change the controller area, so the controller area is treated as a constant in the cost function. In addition, since this methodology is bus oriented, interconnection costs are only marginally affected by word length compared to the changes in the datapath. Accordingly, our cost models assume that the datapath costs are variable and the others are constant, as in Equation (1).
Here F is the cost function and W is the set of word lengths for the functional units. In the following subsections, we present a brief description of the cost models for circuit area, power consumption and digital noise.
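One way to write the decomposition described above is sketched here. The symbols F_D, F_C and F_B (datapath, controller and interconnection contributions) and the vector notation for W are assumed for illustration and are not taken from the original equation:

```latex
F(W) = F_D(W) + F_C + F_B, \qquad W = (w_1, w_2, \dots, w_M)
```

Under this reading only F_D varies with the word lengths, so minimizing F over W is equivalent to minimizing the datapath term alone.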
A. Area Cost Function
Since the area of the controller (A_C) does not change with word length and the interconnection area (A_B) depends on it only slightly, the total area of the datapath is evaluated by adding up the areas of the sub-blocks and FUs. As an approximation, the area of building blocks such as sequential multipliers, adders, registers, buffers and switches can be assumed to be proportional to word length, while the area of a combinational multiplier can be modeled by a second-order relationship with its word length. Design implementation results confirm this assumption, as depicted in Figure (1). Equation (2) gives the area cost function for a system.
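The area model described above (linear in word length for most blocks, second order for a combinational multiplier, plus constant controller and bus terms) can be sketched as follows. This is a minimal illustration, not the paper's Equation (2): the block names and all coefficients are placeholder assumptions rather than measured values.

```python
# Illustrative area model. All coefficients are hypothetical placeholders.
LINEAR_BLOCKS = {            # assumed area units per bit
    "adder": 1.0,
    "register": 0.8,
    "seq_multiplier": 3.0,
}
COMB_MULT_COEFF = 0.5        # assumed area units per bit squared


def block_area(kind: str, w: int) -> float:
    """Area of one building block as a function of its word length w."""
    if w <= 0:
        raise ValueError("word length must be positive")
    if kind == "comb_multiplier":
        return COMB_MULT_COEFF * w * w   # second-order dependence on w
    return LINEAR_BLOCKS[kind] * w       # proportional to w


def total_area(blocks, area_controller=0.0, area_bus=0.0):
    """Constant controller and bus areas plus the sum of datapath block areas.

    blocks is an iterable of (kind, word_length) pairs.
    """
    datapath = sum(block_area(kind, w) for kind, w in blocks)
    return area_controller + area_bus + datapath
```

For example, `total_area([("adder", 16), ("comb_multiplier", 8)], 10.0, 5.0)` adds 16 and 32 area units of datapath to 15 units of constant overhead.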
B. Power consumption Cost Function
Since changing the word length of the FUs does not affect controller activity or structure, the power consumption of the controllers (P_C) is a fixed term in the estimated power consumption. In addition, because a bus-oriented approach is used in this study, the interconnection power consumption (P_B) depends only on the maximum word length in the shared bus; therefore, ignoring the dependency of P_B on W is acceptable at this level of abstraction. Equation (3) shows the general model of power consumption with these approximations. A set of designs was used to evaluate the dependency of functional unit power on word length, and the results are presented in Figure (2). This figure shows the average power consumption of basic cells, with random input data, as a function of word length. In these simulations, the Nominal Low Leakage ST 0.12 µm technology file is used. From this, we can see that power consumption can be modelled as a linear function of the word length.
Power consumption comprises static and dynamic parts; accordingly, the power of each FU is the sum of its static and dynamic components, as in Equation (4).
Here P_k is the power consumption of the k-th FU. In general, dynamic and static power consumption are data dependent [5], but in this study, to estimate power consumption in the optimization procedure, static power consumption is considered proportional to the total power, as in Equation (5).
k is the leakage power factor. Simulations verify this assumption for basic blocks for different word lengths.
Another assumption used to reduce the evaluation complexity is a time slot approximation [10]. In this approximation, the total power consumption of a functional unit is calculated in two parts: activation time slots and standby time slots. During functional operation, power consumption is the sum of dynamic and static power, whereas in standby only the leakage power is taken into account. Based on this approximation, the total power consumption for each functional unit is given in Equation (6).
Here F is the average power consumption of the system, P_k is the average power consumption of the k-th functional unit, t_k is its activation time and T is the total system operation time.
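The time slot approximation can be illustrated with a short sketch. It assumes that each FU draws its full average power while active and only a fixed fraction of it (the leakage factor) while in standby, and that the controller and bus terms are constants. The function and parameter names are hypothetical, and the exact form of Equation (6) in the original may differ.

```python
def linear_power(w, slope, offset=0.0):
    """Average active power of a basic cell, modelled as linear in word length.

    slope and offset are assumed fitting parameters, not measured values.
    """
    return slope * w + offset


def average_power(units, total_time, p_controller=0.0, p_bus=0.0):
    """Average system power under the time slot approximation.

    units is an iterable of (p_active, t_active, leak_factor) tuples:
      p_active    -- average power of the FU while active (dynamic + static)
      t_active    -- total activation time of the FU
      leak_factor -- assumed ratio of static (leakage) power to p_active
    """
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    energy = 0.0
    for p_active, t_active, leak_factor in units:
        if not 0 <= t_active <= total_time:
            raise ValueError("activation time must lie in [0, total_time]")
        # Active slots: dynamic plus static power.
        energy += p_active * t_active
        # Standby slots: leakage power only.
        energy += leak_factor * p_active * (total_time - t_active)
    return energy / total_time + p_controller + p_bus
```

A unit that is active for half of the run with a leakage factor of 0.1 therefore contributes 55% of its active power to the average.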
C. Digital Noise Cost Function
In practice, digital signal processing systems can only offer a finite number of binary digits to represent the signals to be processed. Fitting real values into these limited containers causes effects which can be categorized in several different ways. From a mathematical point of view, using a limited number of bits to represent a real number always means adding or removing indeterminate information at the input, which is usually considered as an error or noise. To model this effect in our tool and to evaluate its impact, two things are needed: first, a noise model for computational errors, and second, a model of noise propagation. A number of models of digital noise have been proposed in [7].
To provide a noise propagation model, recall that many DSP algorithms can be considered as Linear Time Invariant (LTI) systems. This assumption allows us to use superposition of independent noise sources to compute the noise effect on the system output [2], [6]. The effects of noise sources on the output can be approximated using Equation (7).
k can be found from Equation (9) 
Here n_1 is the word length of the present arithmetic unit, n_2 is the word length of the next arithmetic unit and p is the position of the decimal point.
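The two ingredients named above, a per-source noise model and LTI propagation by superposition, can be sketched as follows. The truncation variance used here is a model commonly seen in the word-length optimization literature; that it matches the paper's Equation (9), and the scaling convention assumed for p, are assumptions. The propagation step assumes independent white noise sources, each scaled by the energy of the impulse response from its injection point to the output.

```python
def truncation_noise_variance(n1: int, n2: int, p: int) -> float:
    """Variance of the error injected when truncating from n1 to n2 bits.

    Assumed model: sigma^2 = 2^(2p) / 12 * (2^(-2*n2) - 2^(-2*n1)),
    where p scales the signal range. No noise is injected when n2 == n1.
    """
    if n2 > n1:
        raise ValueError("n2 must not exceed n1 for truncation")
    return (2.0 ** (2 * p)) / 12.0 * (2.0 ** (-2 * n2) - 2.0 ** (-2 * n1))


def output_noise_variance(sources) -> float:
    """Superpose independent noise sources at the output of an LTI system.

    sources is an iterable of (variance, impulse_response) pairs, where
    impulse_response is the (truncated) response from the point where the
    noise is injected to the system output.
    """
    total = 0.0
    for variance, impulse_response in sources:
        gain = sum(h * h for h in impulse_response)  # squared L2 norm
        total += variance * gain
    return total
```

For instance, a unit-variance source seen through the response `[1.0, 0.5]` contributes a variance of 1.25 at the output.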
III. IMPLEMENTATION
The system design methodology starts from a hierarchical specification of the target system and is based on three parts: the functional unit database, the target architecture and the synthesizer-optimizer. The target architecture is built on a partitioned shared bus with a distributed controller, which makes the target design flexible enough to match a variety of DSP algorithms as well as modular and manageable for the synthesizer and optimizer [1]. From a synthesis point of view, on the other hand, this target architecture is a restriction, in that it forces the synthesizer to map every design to a pre-defined structure; this narrows the feasible solution space, to the benefit of the optimizer.
The functional unit database is a library of functions and sub-systems. There are four kinds of sub-system in our method: algorithm executors, interfaces, memories and controllers, each of which may contain further functional units and/or sub-systems. In addition to implementation information, this database provides the information required by the design optimizer cost functions, including area, accuracy, delay and power consumption.
The synthesizer's input is a high-level specification of the algorithm in the form of difference equations. There is a pre-defined hierarchical architecture to which the target system must be mapped. The starting specification of the system and the final implementation are both represented by a digraph. A set of library files is used to produce Intermediate Code (ICD) files, which are a more compact form of the initial specification of the target system. The library files contain the basic blocks of the system and their cost relationships (noise, area, power and delay) as functions of word length. These cost parameters can be used in a cost evaluation program after scheduling, allocation and binding to optimize the design.
IV. OPTIMIZATION
A GA is used in this study for design optimization. The genetic operators are taken from the standard GA procedure, which includes roulette-wheel selection, crossover, mutation [4] and the introduction of new, randomly generated genes. Rates for crossover, mutation and imported genes are chosen as shown in Table (1). In Table (1), M is the number of FUs, p(x) is a randomly generated value and K_1, K_2, K_3, K_4, K_5 are constants that depend on M and on the number of iterations of the algorithm.
According to the target architecture, one word length (w) has to be assigned to each functional unit. Therefore, we define a vector of word lengths for the FUs in the datapaths, as in Equation (10), and this vector is used as the gene in the GA optimization algorithm. An optimization problem must then be solved, with multiple objectives and constraints taken into consideration. A standard technique for multi-objective optimization is to minimize a positively weighted convex sum of the objectives, as shown in Equation (11).
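The procedure described above can be sketched as a small GA over word-length vectors. This is an illustration under stated assumptions, not the authors' implementation: the cost models are toy placeholders (linear area and power, noise falling with word length), the rates and bounds are arbitrary rather than those of Table (1), and keeping the best gene in each generation (elitism) is an addition made here for stability.

```python
import random

W_MIN, W_MAX = 4, 32  # assumed bounds on word length


def cost(w, weights=(1.0, 1.0, 1.0)):
    """Positively weighted sum of toy area, power and noise costs."""
    area = float(sum(w))                              # placeholder: linear in w
    power = 0.5 * sum(w)                              # placeholder: linear in w
    noise = 1e6 * sum(2.0 ** (-2 * wi) for wi in w)   # placeholder: falls with w
    a, b, c = weights
    return a * area + b * power + c * noise


def roulette(pop, costs, rng):
    """Roulette-wheel selection: lower cost gets a larger slice of the wheel."""
    fitness = [1.0 / (1.0 + c) for c in costs]
    return rng.choices(pop, weights=fitness, k=1)[0]


def optimise(m, generations=60, pop_size=30, p_mut=0.1, n_new=2, seed=0):
    """Search for a low-cost vector of m word lengths, one per FU."""
    rng = random.Random(seed)

    def random_gene():
        return [rng.randint(W_MIN, W_MAX) for _ in range(m)]

    pop = [random_gene() for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        costs = [cost(g) for g in pop]
        children = [best[:]]                          # elitism (assumed)
        while len(children) < pop_size - n_new:
            a = roulette(pop, costs, rng)
            b = roulette(pop, costs, rng)
            cut = rng.randint(1, m - 1) if m > 1 else 0
            child = a[:cut] + b[cut:]                 # single-point crossover
            child = [rng.randint(W_MIN, W_MAX) if rng.random() < p_mut else wi
                     for wi in child]                 # per-position mutation
            children.append(child)
        # Import brand new randomly produced genes.
        children += [random_gene() for _ in range(n_new)]
        pop = children
        best = min(pop + [best], key=cost)
    return best, cost(best)
```

Calling `optimise(4)` returns a vector of four word lengths and its cost; with the toy models above the search is pulled away from both very short word lengths (high noise) and very long ones (high area and power). Changing `weights` shifts the trade-off between the three objectives.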
V. RESULTS
Four case studies were implemented in ST 1.2 µm technology. Design I is an order-10 difference equation, Design II is an order-18 difference equation, Design III is a filter (FIR-25) and Design IV is a 4x4 DCT.
In most practical implementations, there are known constraints which must be satisfied, and the other costs must therefore be optimized with respect to them. Comparison of the results in Figures (1) and (2) and Equation (11) suggests that by freezing one of the costs and taking it as a design constraint during optimization, it is possible to achieve the same required objective with minimum costs for the other two. To illustrate this, a set of constrained optimizations was performed with constrained accuracy. Table (2) provides the results of these design optimizations.
Several examples are given in Table (2) for each design. First, all the FUs in the design were assigned a fixed word length. Four basic cases (W = 8, 16, 24 and 32) were implemented and their design costs (area, power consumption and digital noise) were calculated as reference values. In the second step, three optimization approaches were applied to each design in each case. Each optimization froze one of the costs and optimized the other two. In all cases design costs are reduced by our methodology; however, the size of the improvement depends on the design and on the accuracy constraints.
VI. CONCLUSIONS
This study presents a methodology for implementing DSP algorithms which uses models of power consumption, circuit area and output noise and their relationship to word length. Investigation of basic designs shows a considerable improvement in costs when optimizations are employed.
