A Symbolic Noise Analysis Approach to Word-Length Optimization in DSP Hardware by Ahmadi, Arash & Zwolinski, Mark
Arash Ahmadi, Mark Zwolinski
Electronic System Design Group
School of Electronics and Computer Science, University of Southampton
{aa03r,mz}@ecs.soton.ac.uk
1-4244-0797-4/07/$20.00 © 2007 IEEE

Abstract: This paper addresses the problem of choosing different word-lengths for each functional unit in fixed-point implementations of DSP algorithms. A symbolic-noise analysis method is introduced for high-level synthesis of DSP algorithms in digital hardware, together with a vector evaluated genetic algorithm for multiple objective optimization. The ability of this method to combine word-length optimization with high-level synthesis parameters and costs to minimize the overall design cost is demonstrated by example designs.

I. INTRODUCTION

The main objective of High Level Synthesis (HLS) is to find the optimal design in terms of area, latency, throughput, and power consumption. Data Word-Length (WL) is one of the parameters that influences these metrics. In custom hardware implementations there is freedom for the WL to be chosen optimally for different points of the hardware. Despite the simplicity of the idea, designers face difficulties choosing the best WL in complicated systems, and 50% of the design time may be spent on WL determination [1].

Optimization approaches based on Linear Programming (LP) have execution times that increase exponentially with design complexity. In general, WL optimization is an NP-hard problem [2], making exact methods impractical in the case of real designs.

The objective of this work is to introduce a new method of WL optimization. In this approach, a Symbolic Noise Analysis (SNA) method is used to analyze the computational error at every point of the hardware, without restrictive assumptions about the statistical model of the signals. This model is applied to a Multi-Objective Optimization (MOO) method to find the minimal WL at each point in the hardware implementation of Digital Signal Processing (DSP) algorithms.

The paper is organized as follows: section II provides a review of related work; the proposed computational error model is presented in section III; section IV gives a very brief review of the cost functions and the implementation of the synthesizer; and synthesis results are reported in section V.

II. BACKGROUND

In [3] a heuristic WL optimization method is introduced to trade off system area against Signal Quantization-Noise Ratio (SQNR). This is a stimuli-based method which utilizes a reference floating point computation of the algorithm while the final WL optimization is conducted using the synthesized hardware models. In [4], a combined method of static and dynamic analysis is proposed which employs an interval propagation [...] method for precision bit-width optimization. Nayak et al. in [5] present a compiler that takes high-level signal processing algorithms described in MATLAB and generates optimized hardware in which data range optimization is performed by a data range propagation technique. Their results show significant reductions in hardware costs. In [6] and [7] methods based on analytical digital noise analysis are proposed which are more suitable for Linear Time Invariant (LTI) systems; however, these methods are extended to WL optimization for nonlinear systems in [8] and [9]. These methods exploit the fact that the fixed-point implementation of an algorithm is a weak perturbation of its high precision specification.

Several works report applications of symbolic analysis in computational error analysis. Basic implementations of this method are known as Interval Arithmetic (IA) and Affine Arithmetic (AA), which perform a symbolic error analysis on the algorithm [10]. In this method, dependencies of the noise sources are taken into account in a parametric representation of the error at different points in the Data Flow Graph (DFG). In [11] Lee et al. implemented an AA-based method which divides the problem into two parts, range analysis and precision evaluation. The former gives the integer part of the data whereas the latter provides the fractional part of the numbers at every point of the DFG. A similar study is reported by Pu and Ha in [12], which applies AA with a different heuristic. In the latter work, inspired by [13], the first and second moments of the output noise are approximated from the symbolic representation of the output noise by applying the central limit theorem.

Our work introduces a static analysis which is a combination of the symbolic data range analysis and noise analysis, called SNA. This accuracy evaluation method is combined with a multi-objective optimization method in which the objectives are circuit area, latency, power consumption and digital noise, all integrated in a Vector Evaluated Genetic Algorithm (VEGA). The contributions of this work are: 1) Merging noise based analysis with symbolic range analysis to characterize the computational noise analytically as well as statistically; 2) Correcting the round-off noise model for multiple WL implementations in shared hardware designs; 3) Introducing a GA method for WL which integrates the HLS with WL optimization for a variety of nonlinear applications; and 4) Combining WL with area, power consumption and delay in a multi-objective optimization and design method.
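The difference between IA and AA mentioned in the review above can be made concrete with a small sketch. The classes `Interval` and `Affine` below are hypothetical illustrations written for this purpose; they are not taken from [10] or from the tool described in this paper. The sketch only shows why a parametric (noise-symbol) representation keeps track of dependencies that plain intervals lose.

```python
class Interval:
    """Closed interval [lo, hi] with naive interval subtraction."""

    def __init__(self, lo, hi):
        self.lo, self.hi = float(lo), float(hi)

    def __sub__(self, other):
        # IA treats the operands as unrelated, even if they are the same signal.
        return Interval(self.lo - other.hi, self.hi - other.lo)


class Affine:
    """Affine form x0 + sum(c_i * e_i), each noise symbol e_i in [-1, +1]."""

    def __init__(self, center, terms=None):
        self.center = float(center)
        self.terms = dict(terms or {})  # noise-symbol name -> coefficient

    def __sub__(self, other):
        terms = dict(self.terms)
        for name, coeff in other.terms.items():
            # Shared symbols cancel; this is how dependencies are preserved.
            terms[name] = terms.get(name, 0.0) - coeff
        return Affine(self.center - other.center, terms)

    def to_interval(self):
        radius = sum(abs(c) for c in self.terms.values())
        return Interval(self.center - radius, self.center + radius)


x_ia = Interval(2.0, 4.0)
x_aa = Affine(3.0, {"e1": 1.0})  # the same range [2, 4], written as 3 + 1*e1

ia_diff = x_ia - x_ia                  # IA result for x - x
aa_diff = (x_aa - x_aa).to_interval()  # AA result for x - x
print((ia_diff.lo, ia_diff.hi), (aa_diff.lo, aa_diff.hi))
```

For the expression x - x, the interval version reports the range [-2, 2], while the affine version reports exactly 0 because both operands share the noise symbol `e1`.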
III. WORD-LENGTH: CAUSES AND EFFECTS

In the digital representation of data, reducing the data bit-width has a direct effect on the accuracy, which is construed as computational error or noise. From this viewpoint, WL optimization methods can be categorized as error range analysis or noise analysis. The former approach considers how the maximum/minimum values of the signals propagate through the system from inputs to the output(s). Accordingly, the result of the analysis is the range of the output error. Several methods have been introduced in this category, such as IA [1], AA [11] and the Taylor Model [14]. These sub-categories differ in their range representation and approximation. In the noise analysis approach, on the other hand, the outcome of the accuracy reduction is represented as a random process, which is also called computational noise. Different characteristics of the computational noise have been investigated so far, and it is commonly assumed to be a Wide-Sense Stationary (WSS) signal [15]. Inspired by analogue signal processing, most of the existing works utilize the Signal-to-Noise Ratio (SNR) error criterion as the accuracy cost.

In our proposed method a partially known quantity x is represented in SNA form as in Equation (1).

$$\hat{x} = T_N(E), \tag{1}$$

where $T_N(\cdot)$ is a polynomial of order N with M known coefficients $(x_1, x_2, \ldots, x_M)$; and E is an array as in Equation (2).

$$E = [x, \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m], \tag{2}$$

where the $\varepsilon_i$ are symbolic representations [...].

This model, called algebraic representation [14], covers a wide range of nonlinear relationships which can be expressed as an algebraic relation. By eliminating x from E in Equation (2), Equation (1) is reduced to a Taylor Model. Furthermore, the AA representation can be obtained with a first order Taylor Model as in Equation (3).

$$\hat{x} = x_0 + \sum_{i=1}^{m} x_i \varepsilon_i, \tag{3}$$

where $x_0$ is the original value, $\hat{x}$ is the rounded value, $x_i \in \mathbb{R}$ are constants and $-1 \le \varepsilon_i \le +1$ are noise symbols.

As in the AA analogy, noise symbols are random variables in the range [-1, +1]. Every noise symbol has a known Source (S) in the computation DFG and a known Probability Density Function (PDF). Accordingly, in this study, any noise symbol is defined by two other symbols, $\varepsilon_i = (S, P)$, in which S represents the noise source and P indicates the PDF type. Extending the symbol variables $\varepsilon_i$ into two symbols provides more information about the noise at every point of the system; however, it necessitates more computational effort during the optimization process.

Fig. 1. Mapping a multiple-WL DFG to shared resource hardware.

In the proposed method, modeling the effects of WL manipulation takes place in two basic steps: the first is a noise symbol for the computational errors of every operation node in the DFG, and the second is propagation of the noise symbols through the DFG. The noise model presented in [2] is the commonly accepted model in the multiple WL paradigm. This model is then embodied in the form of affine symbol variables in [11]. Equation (4) gives the variance ($\sigma_k^2$) of the noise.

$$\sigma_k^2 = \frac{2^{2p}}{12}\left(2^{-2n_2} - 2^{-2n_1}\right), \tag{4}$$

where p represents the decimal point position, $n_2$ represents the required WL and $n_1$ represents the available WL for data representation ($n_1 > n_2$). According to this model, the values of noise sources are specified by the WL of the current FU and its preceding (parent) node(s). Despite its clarity, this model can mislead the optimization search in some cases, especially in stochastic search methods.

Figure (1) shows the maximum required WL in a sample DFG assuming 8-bit input data. The intermediate WLs are calculated based on the type of the operations and the WLs of the input signals. Since the maximum required WL propagates through the DFG, the WL at each point in the DFG is a function of the parent nodes of that point. Accordingly, a noise source in the model in [2] is also dependent on all its preceding nodes. This example shows that in noise source evaluation by Equation (4), the WL for every node in the DFG must be calculated from data range propagation through all its preceding parent nodes. Especially in stochastic search methods or HLS integrated methods, this data range analysis must be repeated at every iteration.

To provide a noise propagation model, it must be recalled that many DSP algorithms can be considered (or approximated [9]) as LTI systems. This assumption is very useful in simplifying the propagation of the noise symbols through the DFG; however, it does not hold in all applications. In our method, noise propagation through the DFG is evaluated by polynomial algebra. Accordingly, by a Taylor approximation of the nonlinear operations, it is possible to formulate the noise symbols in the output. These rules and methods are explained comprehensively in related works such as [11] and [14]. Another problem regarding noise symbol propagation is noise symbol combination. Unlike the AA method, here noise symbols are random variables with a known PDF, thus they can be merged to form new noise symbols. This is useful especially when the number of symbols increases explosively
in iterative algorithms. The expectation and variance of a sum of independent random variables such as $\varepsilon_i$ in Equation (3) can be calculated by:

$$E\left(\sum_{i=1}^{m} x_i \varepsilon_i\right) = \sum_{i=1}^{m} x_i E(\varepsilon_i), \tag{5}$$

$$\mathrm{Var}\left(\sum_{i=1}^{m} x_i \varepsilon_i\right) = \sum_{i=1}^{m} x_i^2 \, \mathrm{Var}(\varepsilon_i) + 2 \sum_{i<j} x_i x_j \, \mathrm{Cov}(\varepsilon_i, \varepsilon_j), \tag{6}$$

where E(.), Var and Cov stand for expectation, variance and covariance respectively. In addition, based on the Central Limit Theorem, if the symbolic noises in Equation (3) are merged, the distribution of the replacement symbolic noise is approximately normal for large m.

IV. OPTIMIZATION METHOD

From the HLS viewpoint, the costs of the design can be divided into three parts: those of datapaths, controllers and interconnections. Since WL is the optimization parameter, its effect must be evaluated on each part individually. The controller and interconnection parts are not dependent on the WL and so they can be considered as constant values in the cost function. It is shown in [16] and [2] that accuracy, area and power consumption costs are dramatically dependent on the WL, and execution delay is a function of the WL in the case of sequential FUs. Therefore, the cost model is as given in Equation (7),

$$F_{Total}(X) = F_C + F_I + F_D, \tag{7}$$

where F represents the cost function, X is the set of synthesis parameters including the WL array for the functional units (FUs), and the indices C, I and D stand for Controller, Interconnection and Datapath respectively. All the relations and values are derived from basic cells in the ST 1.2 µm technology using the Synopsys tools as in [17] and [16].

The implemented design method starts from a high-level specification of the system and produces a set of synthesizable RTL-VHDL files. This tool is based on a target architecture with a multiple-shared bus. This target structure restricts the implementation space but in return reduces the search time dramatically [16]. The utilized genetic operators (including weighted roulette wheel, crossovers and mutation [18]) are extracted from a standard GA procedure for variable length, integer array genomes. The synthesizer then employs an elite-preserving VEGA optimization algorithm with a fitness function of a weighted Chebyshev combination of the basic design costs (area, delay, energy and noise) to find the optimal points in the constrained feasible space [18].

In our experience, genetic search does not converge in a reasonable time for complicated designs (more than 50 nodes in the DFG) because of the size of the feasible space ($> 50^{32}$). Thus a biased generation of the individuals is employed to speed up the GA optimization. Accordingly, before the optimization search, a design with uniformly chosen WL whose costs are closest to the constraint values is found and used as the bias point; the GA search for optimal points is then performed around this preliminary solution. All our optimization results are achieved in reasonable execution times using this biased search method on an AMD-Opteron CPU.

V. RESULTS

Four case studies were implemented in ST 1.2 µm technology using the proposed method and tools. Design I is an order-18 difference equation, Design II is a Filter (FIR-25), Design III is an 8-point FFT and Design IV is a DCT 4x4.

Since practical implementations have pre-defined constraints which must be satisfied, and the other costs must therefore be optimized with respect to them, an exhaustive set of synthesis optimizations is performed to show the dependency of the design costs on WL as a synthesis parameter, along with the other classical synthesis parameters such as binding, allocation and scheduling.

Table (I) provides the results of design optimizations with fixed WL. This table gives the basic costs (design area, power consumption, delay and digital noise variance in the output node) for different assumptions of uniform WL (W=8, 16, 24 and 32) in all the design points. This set of information is used as the basis for comparison with other optimization results and also as constraints for them.

In the second step, constrained optimizations are applied to the designs, considering WL as a synthesis parameter. Since there are four different costs in this study, four different cases of constraints are considered. Table (II) shows the synthesis results for the same systems where design area is constrained. The constraint values for the area cost function in Table (II) are the area cost results in Table (I).

Similarly, Tables (III), (IV) and (V) show synthesis results with optimization constraints for energy consumption, output noise and latency respectively. Again, the constraint values for each column and row of these tables can be found in the corresponding column and row of Table (I). In these tables A, E, N and D stand for: area cost (in µm²), energy consumption cost (in µWatt/Hz), digital noise variance in the output and latency cost of the design (number of clock cycles) respectively.

VI. CONCLUSION

This study presents a new method for minimizing the hardware implementation cost of DSP algorithms by optimizing the word-length of the data in each functional unit. Symbolic noise analysis is used in combination with models of power consumption, circuit area and delay. Results from four example designs demonstrate a considerable saving in costs when these optimizations are applied.

2007 IEEE International Symposium on Integrated Circuits (ISIC-2007)

REFERENCES

[1] H. Keding, M. Willems, M. Coors, and H. Meyr, "FRIDGE: A fixed-point design and simulation environment," in DATE'98, 1998, pp. 429-435.
[2] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, Synthesis and Optimization of DSP Algorithms. Kluwer Academic Publishers, 2004.
[3] K.-I. Kum and W. Sung, "Combined word-length optimization and high-level synthesis of digital signal processing systems," IEEE Trans. on CAD, vol. 20, no. 8, pp. 921-930, 2001.
[4] R. Cmar, L. Rijnders, P. Schaumont, S. Vernalde, and I. Bolsens, "A methodology and design environment for DSP ASIC fixed point refinement," in DATE'99, 1999, p. 56.
[5] A. Nayak, M. Haldar, A. Choudhary, and P. Banerjee, "Precision and error analysis of MATLAB applications during automated hardware synthesis for FPGAs," in DATE'01, 2001, pp. 722-728.
[6] G. Caffarena, G. A. Constantinides, P. Y. Cheung, C. Carreras, and O. Nieto-Taladriz, "Optimal combined word-length allocation and architectural synthesis of digital signal processing circuits," IEEE Trans. on Circuits and Systems II: Express Briefs, vol. 53, no. 5, pp. 339-343, 2006.
[7] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, "Optimum and heuristic synthesis of multiple word-length architectures," IEEE Trans. on VLSI, vol. 13, no. 1, pp. 39-57, 2005.
[8] C. Shi and R. W. Brodersen, "Automated fixed-point data-type optimization tool for signal processing and communication systems," in DAC'04, 2004, pp. 478-483.
[9] G. A. Constantinides, "Perturbation analysis for word-length optimization," in FCCM03, 2003, pp. 81-90.
[10] J. Stolfi and L. de Figueiredo, Self-Validated Numerical Methods and Applications. Institute for Pure and Applied Mathematics (IMPA), 1997.
[11] D.-U. Lee, A. A. Gaffar, R. C. C. Cheung, O. Mencer, W. Luk, and G. A. Constantinides, "Accuracy-guaranteed bit-width optimization," IEEE Trans. on CAD, vol. 25, no. 10, pp. 1990-2000, 2006.
[12] Y. Pu and Y. Ha, "An automated, efficient and static bit-width optimization methodology towards maximum bit-width-to-error tradeoff with affine arithmetic model," in ASP-DAC'06, 2006, pp. 886-891.
[13] C. F. Fang, R. A. Rutenbar, and T. Chen, "Fast, accurate static analysis for fixed-point finite-precision effects in DSP designs," in ICCAD'03, 2003, pp. 275-282.
[14] N. S. Nedialkov, V. Kreinovich, and S. A. Starks, "Interval arithmetic, affine arithmetic, taylor series methods: Why, what next," Numerical Algorithms, vol. 37, no. 1-4, pp. 325-336, 2004.
[15] A. B. Sripad and D. L. Snyder, "A necessary and sufficient condition for quantization errors to be uniform and white," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 25, no. 5, pp. 442-448, 1977.
[16] A. Ahmadi and M. Zwolinski, "Word-length oriented multiobjective optimization of area and power consumption in dsp algorithm implementation," pp. 614-617, May 2006.
[17] "Area word-length trade off in dsp algorithm implementation and optimization," pp. 16/1-6, 2005.
[18] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley and Sons, 2001.

TABLE I
DIFFERENT FIXED-UNIFORM WL FOR DESIGNS

Designs     Cost  Uniform fixed WL for all the FUs
                  W=8       W=16      W=24      W=32
Design I    A     4152      8304      12456     16608
            E     5672.73   20779.2   45753.7   80602.9
            N     1.03E-2   4.04E-5   1.58E-7   6.16E-10
            D     168       311       454       598
Design II   A     22184     44368     66552     88736
            E     7143.11   26929.2   59584.1   105061
            N     1.28E-2   5.09E-5   1.99E-7   7.62E-10
            D     58        82        106       130
Design III  A     14456     44368     89736     119648
            E     9631.03   35813.4   78909     138029
            N     2.95E-2   1.15E-4   4.50E-7   1.76E-9
            D     100       110       121       145
Design IV   A     29912     111344    174744    222688
            E     18256.5   71076.6   156085    273138
            N     3.26E-2   1.27E-4   4.97E-7   1.94E-9
            D     121       130       152       178

TABLE II
AREA CONSTRAINED SYNTHESIS

Designs     Cost  Area costs are constrained as in Table (I)
                  #1        #2        #3        #4
Design I    E     4478.08   18350.9   42218.3   75706.9
            N     1.03E-2   4.04E-5   1.09E-7   6.16E-10
            D     150       293       436       580
Design II   E     6295.61   25751.8   58066.5   101421
            N     1.15E-2   4.80E-5   1.77E-7   6.82E-10
            D     53        79        102       126
Design III  E     9095.21   34298.4   77773.5   136270
            N     2.01E-2   5.68E-5   3.14E-7   1.05E-9
            D     100       107       113       137
Design IV   E     17341.5   70962.8   152568    271351
            N     2.62E-2   1.22E-4   4.71E-7   1.63E-9
            D     119       126       143       168

TABLE III
ENERGY CONSTRAINED SYNTHESIS

Designs     Cost  Energy costs are constrained as in Table (I)
                  #1        #2        #3        #4
Design I    A     3633      7785      11937     16483
            N     1.03E-2   4.04E-5   1.58E-7   4.27E-10
            D     150       293       436       580
Design II   A     21987     44243     66105     87967
            N     1.14E-2   4.29E-5   1.66E-7   6.92E-10
            D     55        78        102       127
Design III  A     14456     44118     82724     118754
            N     2.73E-2   6.31E-5   2.46E-7   1.40E-9
            D     100       106       119       135
Design IV   A     27086     100468    164315    222438
            N     2.79E-2   1.15E-4   4.69E-7   1.86E-9
            D     119       126       144       169

TABLE IV
NOISE CONSTRAINED SYNTHESIS

Designs     Cost  Noise costs are constrained as in Table (I)
                  #1        #2        #3        #4
Design I    A     3633      7785      11937     16089
            E     4478.08   18350.9   42091.6   75706.9
            D     150       293       436       580
Design II   A     20646     43349     64495     87323
            E     6074.11   24996     55273.2   100628
            D     53        79        101       126
Design III  A     12649     41595     88645     116178
            E     7572.32   31676.3   76467.4   128521
            D     99        107       114       135
Design IV   A     26889     100915    163027    219987
            E     16856.9   69308.4   147610    265048
            D     119       126       145       168

TABLE V
LATENCY CONSTRAINED SYNTHESIS

Designs     Cost  Delay costs are constrained as in Table (I)
                  #1        #2        #3        #4
Design I    A     3633      7785      11937     16089
            E     4478.08   18350.9   42091.6   75706.9
            N     1.03E-2   4.04E-5   1.58E-7   6.16E-10
Design II   A     21665     44243     65783     88736
            E     6936.68   26541.3   57530     104750
            N     1.20E-2   4.46E-5   1.87E-7   6.96E-10
Design III  A     14331     36909     79951     119201
            E     8879.05   30696     73861.1   136681
            N     2.81E-2   9.87E-5   3.29E-7   1.32E-9
Design IV   A     26245     106908    165353    221919
            E     16355.9   70709.7   151477    269987
            N     2.06E-2   1.10E-4   4.58E-7   1.84E-9
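As a reading aid for the tables, the short sketch below computes the relative energy saving of Design I under the area constraint (Table II) with respect to the uniform-WL baseline (Table I). It assumes that columns #1 to #4 of Table II correspond to the W=8, 16, 24 and 32 columns of Table I, as the text on constraint values suggests; the helper name `relative_saving` is hypothetical and not part of the authors' tool.

```python
# Energy cost E of Design I, copied from Table I (uniform WL) and
# Table II (area-constrained synthesis); columns W=8, 16, 24, 32.
uniform_energy = [5672.73, 20779.2, 45753.7, 80602.9]
constrained_energy = [4478.08, 18350.9, 42218.3, 75706.9]


def relative_saving(baseline, optimized):
    """Fractional reduction of a cost relative to its baseline value."""
    return (baseline - optimized) / baseline


savings = [
    relative_saving(u, c) for u, c in zip(uniform_energy, constrained_energy)
]
for wl, s in zip((8, 16, 24, 32), savings):
    print(f"W={wl}: {100 * s:.1f}% lower energy")
```

Under this assumption the energy reduction for Design I ranges from roughly 21% at W=8 to roughly 6% at W=32; the same calculation can be repeated for the other designs and cost rows.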