Optimum wordlength allocation by Constantinides, GA et al.
Optimum Wordlength Allocation
George A. Constantinides, Peter Y.K. Cheung, Wayne Luk
Department of Electrical and Electronic Engineering, Imperial College, London SW7 2BT.
Department of Computing, Imperial College, London SW7 2BZ.
Abstract
This paper presents an approach to the wordlength al-
location and optimization problem for linear digital signal
processing systems implemented in Field-Programmable
Gate Arrays. The proposed technique guarantees an op-
timum set of wordlengths for each internal variable, al-
lowing the user to trade-off implementation area for er-
ror at system outputs. Optimality is guaranteed through
modelling as a mixed integer linear program, constructed
through novel techniques for the linearization of error and
area constraints. Optimum results in this field are valuable
since they can be used to assess the effectiveness of heuris-
tic wordlength optimization techniques. It is demonstrated
that one such previously published heuristic reaches within
0.7% of the optimum area over a range of benchmark problems.
1 Introduction
This paper addresses the problem of hardware synthesis
from an initial, infinite precision, specification of a digi-
tal signal processing (DSP) algorithm. DSP algorithm de-
velopment is usually initially performed without regard to
finite precision effects, whereas for Field-Programmable
Gate Array (FPGA) implementation, finite precision ef-
fects are often of critical importance.
It has been argued elsewhere [1] that often the most ef-
ficient FPGA implementation of an algorithm is one which
supports a wide variety of finite precision representations,
so that the best representation can be used for each internal
variable. The accuracy observable at the outputs of a DSP
system is a function of the wordlengths used to represent
all intermediate variables in the algorithm. However accu-
racy is less sensitive to some variables than to others, as is
implementation area.
The contribution of this paper is to present a technique
for optimum wordlength allocation, for the case where the
DSP algorithm to be synthesized is a linear, time-invariant
(LTI) system [2]. Existing methods for wordlength alloca-
tion are heuristic by nature and thus it is difficult to mea-
sure the quality of the solutions produced by these meth-
ods. It is this uncertainty that has motivated the work
set forth in this paper: to enable accurate characterization
of the effectiveness of wordlength optimization techniques
with respect to optimum solutions.
The wordlength optimization techniques of interest are
those which allow a user-controlled trade-off between im-
plementation area and signal quality at the DSP system
outputs, such as those described by [1, 3, 4, 5].
The Mixed Integer Linear Programming (MILP) tech-
nique described in this paper has been applied to several
small benchmark circuits, and the results compared to the
heuristic presented in [1]. Modelling as a MILP permits
the use of industrial-strength MILP solvers such as Bon-
saiG [6]. Although MILP solution time can render the syn-
thesis of large circuits intractable, optimal results even on
small circuits are valuable as benchmarks with which to
compare heuristic optimization procedures. For this pur-
pose the optimal specifications have been made available
for public download for anyone interested in comparing
new or existing wordlength optimization techniques.
Although the construction of the MILP is described in
detail in this paper, no complete example MILP is given for
space reasons. Several examples can be found at:
http://infoeng.ee.ic.ac.uk/gac1/OptimumWL
This paper has the following structure. Section 2 de-
scribes the relevant literature in wordlength optimization,
before the computation model and associated high-level
area models are described in Sections 4 and 5. A brief re-
view of the proposed noise model for LTI systems is then
provided in Section 6, before the construction of the pro-
posed MILP is given in Section 7. Results from application
of the model to benchmark circuits are given in Section 8, before
concluding the paper in Section 9.
2 Background
In [7] it has been demonstrated that a simplified version
of the problem addressed in this paper is NP-hard. There
are, however, several published approaches to wordlength
optimization. Those offering an area / signal quality trade-
Proceedings of the 10 th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’02) 
1082-3409/02 $17.00 © 2002 IEEE 
off are all of an heuristic nature [1, 3, 5] or do not sup-
port different fractional precision for different internal vari-
ables [4].
Benedetti and Perona [8] have proposed an analytic
method for wordlength optimization based on interval
arithmetic. The authors propose a ‘multi-interval’ ap-
proach, and demonstrate that the addition, subtraction,
multiplication and division of the proposed intervals re-
sult in similar intervals, which may be propagated through
a loop-free data-flow graph in order to estimate the
wordlength required for a computation without losing any
precision.
The Bitwise Project [9] propose a similar compiler-
based technique based on propagating the ranges of inte-
ger variables backwards and forwards through data-flow
graphs. Their focus is on removing unwanted most-
significant bits (MSBs) – no technique is proposed for re-
moving unwanted least-significant bits (LSBs).
The MATCH Project [4] also use compiler-based prop-
agation through data-flow graphs, except they allow vari-
ables with a fractional component. All signals in their
model must have equal fractional precision – the authors
propose an analytic worst-case error model in order to es-
timate the required number of fractional bits.
Wadekar and Parker [3] have also proposed a methodol-
ogy for wordlength optimization. Like [4], their technique
also allows controlled worst-case error at system outputs,
however they allow each intermediate variable to take a
wordlength appropriate to the sensitivity of the output er-
rors to quantization errors on that particular variable. A
Genetic Algorithm is used to perform the optimization,
and Taylor series are used to evaluate an estimate of the
worst-case error at a system output for any given internal
wordlengths.
Cmar et al. [10] have developed a wordlength optimiza-
tion system which uses a combination of range propaga-
tion and simulation with known input vectors to limit the
wordlengths of internal variables. An heuristic algorithm
is applied whereby the wordlength is decided based on the
value of an empirically derived scaling of the error stan-
dard deviation for each signal under simulation. The idea
behind such an approach is that it sets an upper-bound on
the wordlength of each variable, beyond which the least
significant bits will be drowned in quantization or external
noise. No additional mechanism is proposed to automate
the tradeoff of system area against error.
Kum and Sung [5] have proposed several wordlength
optimization techniques to trade-off system area against
system error, some of which have been incorporated in
the Cadence Signal Processing Worksystem [11]. These
techniques are heuristics based on bit-true simulation of
the design under various internal wordlengths. Some similar
simulation-based work has been reported by Leong et al. [12].
Table 1: Degrees of nodes in a computation graph
type      id(v)   od(v)
INPORT    0       1
OUTPORT   1       0
ADD       2       1
DELAY     1       1
GAIN      1       1
FORK      1       ≥ 2
In a previous paper [1] we present an optimization
heuristic based on analytic average-case error analysis of
LTI systems. Results of between 6% and 45% area reduc-
tion were achieved by our heuristic compared to the use of
the optimum uniform wordlength design. However until
this paper it has been impossible to effectively judge the
quality of the solutions achieved due to the lack of an opti-
mum comparative benchmark.
3 Notation
In this paper, the following notation is used.
For a directed graph $G(V, E)$, pred($e$) and succ($e$) indicate
the predecessor and successor nodes of an edge $e \in E$.
od($v$) denotes the out-degree and id($v$) denotes the in-degree
of a node $v \in V$. For a node $v \in V$ with id($v$) = 1,
in($v$) denotes the signal driving node $v$. Similarly for a
node $v \in V$ with od($v$) = 1, out($v$) denotes the signal
driven by node $v$.
Set subtraction is indicated by the operator $\setminus$.
For a function $f : X \to Y$, $f(X' \subseteq X) \subseteq Y$ denotes
the subset $\{y \in Y \mid \exists x \in X' : f(x) = y\}$.
4 Computation Model
A computation graph $G(V, S)$ is the formal representation
of an algorithm. $V$ is a set of graph nodes, each
representing an atomic computation or input/output port,
and $S \subseteq V \times V$ is a set of directed edges representing the
data flow. An element of $S$ is referred to as a signal. The
set $S$ must satisfy the constraints on indegree and outdegree
given in Table 1. We partition the set $V$ into subsets
$V = V_G \cup V_I \cup V_O \cup V_A \cup V_F \cup V_D$, representing the set
of gain nodes, input nodes, output nodes, adder nodes, fork
nodes and delay nodes respectively.
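The degree constraints of Table 1 can be checked mechanically. The following is a minimal sketch, assuming a graph is stored as a dictionary of node types plus a list of (source, destination) signal pairs; the names `DEGREE_RULES` and `check_degrees`, and the example graph, are illustrative and do not come from the paper.

```python
from collections import Counter

# Allowed (in-degree, out-degree) combinations per node type, after Table 1.
# A FORK has in-degree 1 and out-degree of at least 2.
DEGREE_RULES = {
    "INPORT": lambda i, o: i == 0 and o == 1,
    "OUTPORT": lambda i, o: i == 1 and o == 0,
    "ADD": lambda i, o: i == 2 and o == 1,
    "DELAY": lambda i, o: i == 1 and o == 1,
    "GAIN": lambda i, o: i == 1 and o == 1,
    "FORK": lambda i, o: i == 1 and o >= 2,
}


def check_degrees(node_types, signals):
    """Return the nodes whose in/out degree violates Table 1.

    node_types: dict mapping node name -> type string
    signals:    iterable of (source, destination) pairs, i.e. the set S
    """
    signals = list(signals)
    indeg = Counter(dst for _, dst in signals)
    outdeg = Counter(src for src, _ in signals)
    return [v for v, t in node_types.items()
            if not DEGREE_RULES[t](indeg[v], outdeg[v])]


# Hypothetical two-tap filter: x -> fork -> (gain, delay -> gain) -> add -> y
nodes = {"x": "INPORT", "f": "FORK", "g1": "GAIN", "d": "DELAY",
         "g2": "GAIN", "a": "ADD", "y": "OUTPORT"}
signals = [("x", "f"), ("f", "g1"), ("f", "d"), ("d", "g2"),
           ("g1", "a"), ("g2", "a"), ("a", "y")]
violations = check_degrees(nodes, signals)
```

An empty result means every node satisfies Table 1; removing a signal makes both of its end nodes show up in the returned list.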
Figure 1: The graphical representation of a computation
graph: (a) some nodes in a computation graph (ADD, GAIN,
DELAY, FORK); (b) an example computation graph with input
x[t] and output y[t]
A graphical representation of a simple computation
graph is shown in Fig. 1. Adders, constant coefficient mul-
tipliers and unit sample delays are represented using dif-
ferent shapes. Edges are represented by arrows indicating
the direction of data flow. Fork nodes are implicit in the
branching of arrows. Inport and outport nodes are also im-
plicitly represented, and may be labelled with the input and
output names, x[t] and y[t] respectively in this example.
The algorithms described by computation graphs will
be implemented using a multiple wordlength architecture,
as introduced in [1]. This scheme will be briefly reviewed
in order to aid the understanding of the remainder of this
paper.
In FPGAs, it is well known that a fixed-point implemen-
tation is generally more efficient than a floating-point im-
plementation for most DSP algorithms with low dynamic
range [13]. Each signal in a multiple wordlength architec-
ture is allowed to take a distinct wordlength and scaling,
appropriate to the internal variable represented by the signal.
Fig. 2(a) shows the meaning of these two parameters:
$n_j$ is the number of bits in the representation of the signal
(excluding the sign bit), and $p_j$ is the displacement of
the binary point from the LSB side of the sign bit towards
the word LSB.
During the design stage, each wordlength is chosen individually
to minimize logic usage while satisfying constraints on
round-off or truncation error. The contribution of this paper is
to perform this design optimally.
Figure 2: The Multiple-Wordlength Paradigm: (a) Signal
Parameters: ‘s’ indicates the sign bit (b) A multiple
wordlength architecture
5 Area Models
In order to formulate the error-constrained area mini-
mization problem, it is necessary to construct high-level
models of the area consumption of each type of node. Only
adders, gains, and delays are considered to consume area
resources on the FPGA; the remaining nodes are simply
wiring or input/output constructs.
Model formulation has proceeded by defining a param-
eterized high-level area model from knowledge of the in-
ternal architecture of a component. The model parameters
have then been tuned to the Xilinx Virtex series of FPGAs
through the synthesis of many sample library elements us-
ing coregen and least-squares fitting to the theoretical
model. Although the values of the model parameters pre-
sented are specific to Xilinx Virtex, the models themselves
are general and can easily be re-tuned to alternative FPGA
families and manufacturers or for ASIC implementation.
5.1 Adders
Usually adders are implemented in FPGAs as ripple-
carry designs, since fast carry chains are provided in mod-
ern FPGA architectures [14]. Multiple wordlength im-
plementations of adders can be conceptually quite com-
plex due to the alignment required for signals of differ-
ent wordlength or with different scaling (binary point loca-
tion). Fig. 3 illustrates the adder types found in practice in
multiple wordlength implementations [15]. The inputs of
these adders have been arranged so that binary-point alignment
requires left shifting of input b.
Figure 3: Multiple wordlength adder types, panels (a) to (d),
each showing inputs a and b and output o
Even when the inputs to the adders have been so ar-
ranged, there are still four distinct cases as illustrated in
Figs. 3(a)–(d). In Figs. 3(a) and (c), input b’s most signif-
icant bit (MSB) extends beyond that of input a, whereas
the opposite is true in Figs. 3(b) and (d). The remaining
distinction concerns the output wordlength. In Figs. 3(a)
and (b) the output is drawn entirely from the overlapping
portions of signals a and b. By contrast the outputs in
Figs. 3(c) and (d) draw a portion of their value from sig-
nal a alone – this portion is implementation cost-free.
The core integer adder used to implement such multiple
wordlength adders will consist of a total of up to
$\max(n_a - s, n_b) + 2$ single-bit adders if all MSBs of the
result are required. However not all these adders will have
equal cost, because those not driving a portion of the out-
put signal, illustrated in Figs. 3(a) and (b), require carry
logic but no sum logic. In Figs. 3(c) and (d) there are no
such cases. Also it is important to note that not all possible
MSBs of the summation may be required by the output. A
total of m bits may not be required due to signal scaling
information [1].
The output is drawn entirely from the overlap between
signals a and b if $n_o + m \le \max(n_a - s, n_b) + 1$. Thus the
overall area of an adder can be modelled as (1).
$$A = \begin{cases} k_1 (n_o + 1) + k_2 [\max(n_a - s, n_b) - m - n_o + 1], & \text{if } n_o + m \le \max(n_a - s, n_b) + 1 \\ k_1 [\max(n_a - s, n_b) - m + 2], & \text{otherwise} \end{cases} \tag{1}$$
For a Xilinx Virtex implementation our experiments
suggest $k_1 \approx 1.0$ LUTs and $k_2 \approx 0.5$ LUTs.
5.2 Gains
Area estimation for constant coefficient multipliers is
significantly more problematic. A constant coefficient
multiplier can be implemented in FPGAs as a series of ad-
ditions and subtractions, through a recoding scheme such
as the classic Booth technique [16]. This implementation
style causes the area consumption to be highly dependent
on coefficient value. Although an ideal area model would
account for a recoding-based implementation, this cur-
rently remains unimplemented. Instead we propose to use
a ‘coefficient blind’ area model, which has been demon-
strated in practice to provide good results [1, 15, 17]. The
placed-and-routed area results attainable with the present
implementation also provide an upper bound for those
attainable by a more sophisticated model.
For the remainder of this section, we consider a gain
node with input signal i, output signal o and a coefficient
of wordlength CW.
The number of additions required to implement a constant
coefficient multiplier is assumed to rise proportionally
with the coefficient wordlength. Each of these will be
a $(n_o + 1)$-bit addition. However, a total of $n_i + \mathrm{CW} - n_o$
additions along the edge of the multiplier array may not
require their sum circuitry, as with the adder case. Note
that this area model is equally valid with full-adder based
array multipliers and standard Wallace or Dadda-tree [18]
multiplier implementations.
$$A = k_3 \mathrm{CW} (n_o + 1) + k_4 (n_i + \mathrm{CW} - n_o) \tag{2}$$
For a Xilinx Virtex implementation, our experiments
over a wide range of coefficient values and wordlengths
suggest values of $k_3 \approx 0.60$ LUTs and $k_4 \approx -0.85$ LUTs.
5.3 Delays
The area of a unit sample delay with input i, implemented
as a register, is simply expressed as (3). For Xilinx
Virtex implementation, our experiments suggest $k_5 \approx 1.0$
LUTs.
$$A = k_5 (n_i + 1) \tag{3}$$
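The three area models (1) to (3) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the constants are the Virtex values quoted above, and the sign of $k_4$ is read here as negative, which is an assumption about the extracted text rather than a confirmed value.

```python
# Fitted constants in LUTs (Xilinx Virtex). K4 is taken as negative here;
# that sign is an assumption made for this sketch.
K1, K2, K3, K4, K5 = 1.0, 0.5, 0.60, -0.85, 1.0


def adder_area(n_a, n_b, n_o, s, m):
    """Adder area, equation (1): b is the input shifted left by s bits,
    m is the number of MSBs known to carry no information."""
    overlap = max(n_a - s, n_b)
    if n_o + m <= overlap + 1:
        # Output drawn entirely from the overlap: some bits need carry only.
        return K1 * (n_o + 1) + K2 * (overlap - m - n_o + 1)
    return K1 * (overlap - m + 2)


def gain_area(n_i, n_o, cw):
    """Coefficient-blind constant multiplier area, equation (2)."""
    return K3 * cw * (n_o + 1) + K4 * (n_i + cw - n_o)


def delay_area(n_i):
    """Unit sample delay (register) area, equation (3)."""
    return K5 * (n_i + 1)
```

For instance, `adder_area(8, 8, 4, 0, 1)` evaluates the first branch of (1), and the two branches agree on the boundary `n_o + m == max(n_a - s, n_b) + 1`.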
6 Noise Model
As shown in [1], since the systems of interest for this
work have the LTI property, an analytic model based
on [19] can be used to estimate the error at each system
output. The variance of the error injected by each truncation
of a signal from $n_1$ bits to $n_2$ bits is given by (4). If the
transfer function from this point to the system output of
interest is given in the z-domain as $H(z)$, then the resulting
error variance at the output is $\sigma^2 L_2^2\{H(z)\}$, where $L_2\{\cdot\}$
denotes the well-known $L_2$-norm, included as (5) for
completeness. ($Z^{-1}\{\cdot\}$ represents the inverse z-transform).
$$\sigma^2 = \frac{1}{12} 2^{2p} \left(2^{-2n_2} - 2^{-2n_1}\right) \tag{4}$$
$$L_2\{H(z)\} = \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} |H(e^{j\theta})|^2 \, d\theta \right)^{1/2} = \left( \sum_{n=0}^{\infty} \left|Z^{-1}\{H(z)\}[n]\right|^2 \right)^{1/2} \tag{5}$$
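As an illustration of (4) and (5), the sketch below computes the output error variance caused by one truncation, using the time-domain form of the $L_2$-norm on a finite impulse response. The function names are hypothetical, and the 1/12 factor is an assumption made here to match the bound $12E$ used later in (11).

```python
def truncation_variance(p, n1, n2):
    """Variance injected by truncating a signal of scaling p from n1 to n2
    bits, following (4). The 1/12 factor is assumed in this sketch."""
    return 2.0 ** (2 * p) * (2.0 ** (-2 * n2) - 2.0 ** (-2 * n1)) / 12.0


def l2_norm_squared(impulse_response):
    """Squared L2-norm via the time-domain sum in (5), for a finite
    (FIR) impulse response given as a sequence of samples."""
    return sum(abs(h) ** 2 for h in impulse_response)


def output_error_variance(p, n1, n2, impulse_response):
    """Error variance at the output: sigma^2 * L2^2{H(z)}."""
    return truncation_variance(p, n1, n2) * l2_norm_squared(impulse_response)


# Example: truncation from 8 to 4 bits feeding a two-tap averaging path.
example = output_error_variance(0, 8, 4, [0.5, 0.5])
```

For an IIR transfer function the sum in (5) is infinite, so a practical implementation would truncate the impulse response or use a closed-form evaluation instead.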
A contribution of this paper is to demonstrate how to
linearize these error models and hence incorporate them
within a MILP model for the entire optimization problem.
7 MILP Model
The MILP formulation presented relies on some knowl-
edge of integer linear programming. An excellent tutorial
is given by Garfinkel and Nemhauser [20] on this topic.
The proposed MILP model contains several variables,
which may be classified as: integer signal wordlengths and
integer signal wordlengths before quantization; binary
auxiliary signal wordlengths and binary auxiliary signal
wordlengths before quantization; binary decision variables;
real adder costs; and real fork node errors.
Note that only adders, gains, and delays cost area re-
sources (forks are considered free). However adders have
an inherently complex area model and thus while gains and
delays are included directly in the objective function, the
cost of each adder $v \in V_A$ is represented by a distinct
variable $A_v$.
We are now in a position to formulate an area-based ob-
jective function for the MILP model (6), where CW(v) rep-
resents the coefficient wordlength of gain node v.
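$$\min: \sum_{v \in V_A} A_v + \sum_{v \in V_G} \left\{ (k_3 \mathrm{CW}(v) + k_4)\, n_{\mathrm{in}(v)} - k_4\, n_{\mathrm{out}(v)} + (k_3 + k_4)\, \mathrm{CW}(v) \right\} + \sum_{v \in V_D} k_5\, n_{\mathrm{in}(v)} \tag{6}$$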
Constraints on quantization error propagation are much
harder to cast in linear form due to the exponentiation
shown in Section 6. In order to overcome this nonlinearity,
we propose to use additional binary variables $n_{s,b}$, one for
each possible wordlength $b$ that a signal $s$ could take. This is
expressed in (7), and (8) ensures that each signal can only
have a single wordlength value. Here $\setminus$ is used to denote
set subtraction. Note that in order to apply this technique,
it is necessary to know upper-bound wordlengths $\hat{n}_s$ for
each $s \in S$. Techniques to derive these will be discussed
in Section 7.1. Note that signals which drive fork nodes are
not considered in this way, as fork node error models are
considered separately (see Section 7.3).
$$\forall s \in S \setminus \mathrm{pred}(V_F), \quad n_s - \sum_{b=1}^{\hat{n}_s} b \cdot n_{s,b} = 0 \tag{7}$$
$$\forall s \in S \setminus \mathrm{pred}(V_F), \quad \sum_{b=1}^{\hat{n}_s} n_{s,b} = 1 \tag{8}$$
Using these binary variables it is possible to re-cast
expressions of the form $2^{-2n_j}$, which appear in error
constraints (see Section 6), into linear form as
$\sum_{b=1}^{\hat{n}_j} 2^{-2b} n_{j,b}$.
Similarly it is necessary to linearize the exponentials in
wordlengths before quantization (9)–(10).
$$\forall s \in S \setminus \mathrm{pred}(V_F) \setminus \mathrm{succ}(V_F), \quad n^q_s - \sum_{b=1}^{\hat{n}^q_s} b \cdot n^q_{s,b} = 0 \tag{9}$$
$$\forall s \in S \setminus \mathrm{pred}(V_F) \setminus \mathrm{succ}(V_F), \quad \sum_{b=1}^{\hat{n}^q_s} n^q_{s,b} = 1 \tag{10}$$
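The effect of this one-hot encoding can be verified numerically. The sketch below, with hypothetical helper names, builds the binary variables for a chosen wordlength and confirms that the linear sum reproduces the exponential exactly while satisfying (7) and (8).

```python
def one_hot(n, n_hat):
    """Binary variables n_{s,b}, b = 1..n_hat, selecting wordlength n."""
    return [1 if b == n else 0 for b in range(1, n_hat + 1)]


def linearized_exponential(bits):
    """Linear stand-in for 2^(-2n): sum over b of 2^(-2b) * n_{s,b}."""
    return sum(2.0 ** (-2 * b) * x for b, x in enumerate(bits, start=1))


n_hat = 12  # illustrative upper bound on the wordlength
for n in range(1, n_hat + 1):
    bits = one_hot(n, n_hat)
    # (8): exactly one binary variable is set.
    assert sum(bits) == 1
    # (7): the integer wordlength is recovered from the binaries.
    assert sum(b * x for b, x in enumerate(bits, start=1)) == n
    # The linear expression equals the original exponential.
    assert linearized_exponential(bits) == 2.0 ** (-2 * n)
```

The cost of the technique is visible here: each signal contributes one binary variable per candidate wordlength, which is why tight upper bounds matter.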
For each system output, we propose to use an error constraint
of the form given in (11). For simplicity of explanation,
only single-output systems are considered in this paper;
however, the technique is general and
our software can optimize multiple-input, multiple-output
(MIMO) systems. $E$ represents a user-defined bound on the
error power at the system output, and hence on the signal
quality.
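$$\sum_{v \in V_F} E_v + \sum_{s \in S \setminus \mathrm{pred}(V_F) \setminus \mathrm{succ}(V_F) \setminus \mathrm{succ}(V_I)} 2^{2p_s} L_2^2\{H_s(z)\} \left( \sum_{b=1}^{\hat{n}_s} 2^{-2b} n_{s,b} - \sum_{b=1}^{\hat{n}^q_s} 2^{-2b} n^q_{s,b} \right) + \sum_{s \in \mathrm{succ}(V_I)} 2^{2p_s} L_2^2\{H_s(z)\} \left( \sum_{b=1}^{\hat{n}_s} 2^{-2b} n_{s,b} - 2^{-2n^q_s} \right) \le 12E \tag{11}$$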
Note that those signals driven by system inputs are con-
sidered separately, since there is no need for Boolean vari-
ables representing the pre-quantization wordlength of a
variable, as this parameter is defined by the system
environment. Place-holders $E_v$ are used for the contribution
from fork nodes; these will be defined by separate
constraints in Section 7.3.
7.1 Wordlength Bounds
Upper bounds on the wordlength of each signal, before
and after quantization, are required by the MILP model in
order to have a bounded number of binary variables corre-
sponding to the possible wordlengths of a signal.
Our bounding procedure proceeds in three stages: per-
form an heuristic wordlength optimization on the computa-
tion graph [1]; use the resulting area as an upper-bound on
the area of each gain block within the system, and hence
on the input wordlength of each gain block; ‘condition’
the graph, following the procedure of [1]. The intuition
is that typically the bulk of the area consumed in a DSP
implementation comes from multipliers. Thus reasonable
upper-bounds are achievable by ensuring that the cost of
each single multiplier cannot be greater than the heuristi-
cally achieved cost for the entire implementation.
Of course this only bounds the wordlength of signals
which drive gain blocks. In addition, the wordlength
of signals driven by primary outputs is bounded by the
externally-defined precision of these outputs. Together this
information can be propagated through the computation
graph, resulting in upper bounds for all signals under the
condition that any closed loop must contain a gain block.
In the remainder of this paper, we denote by $\hat{n}_j$ the
so-derived upper bound on the wordlength of signal $j \in S$
and by $\hat{n}^q_j$ the upper bound on the wordlength of the same
signal before LSB truncation.
7.2 Adders
It is necessary to express the area model of Section 5.1
as a set of constraints in the MILP. Also a set of constraints
describing how the wordlength at an adder output varies
with the input wordlengths is required.
7.2.1 Area model
In the objective function, the area for each adder $v \in V_A$
was modelled by a single variable $A_v$. It will be demonstrated
in this section how this area can be expressed in
linear form.
Let us define $\beta_v$ for an adder $v \in V_A$ with input signals
a and b (12), where the inputs ‘a’ and ‘b’ are chosen
to match with Fig. 3 so that it is b which needs to be
left-shifted for alignment purposes. $s_v$ is also illustrated
in Fig. 3, and models the number of bits by which input b
must be shifted.
$$\beta_v = \max(n_a - s_v, n_b) \tag{12}$$
We may then express the area of an adder as (13). Signal
o is the output signal for the adder and $m_v$ models
the number of MSBs of the addition which are known
through scaling to contain no information, as described
in [1] and illustrated in Fig. 3. This value is independent
of the wordlengths, and for an adder can be expressed as
$m_v = \max(p_a, p_b) + 1 - p_o$.
$$A_v = \begin{cases} k_1 (n_o + 1) + k_2 [\beta_v - m_v - n_o + 1], & n_o + m_v \le \beta_v + 1 \\ k_1 [\beta_v - m_v + 2], & \text{otherwise} \end{cases} \tag{13}$$
The non-linearities due to the max operator in (12) and
the decision in (13) must be linearized for the MILP model.
This is achieved through the introduction of four binary
decision variables $\delta_{v1}$, $\delta_{v2}$, $\delta_{v3}$ and $\delta_{v4}$ for each
adder $v \in V_A$.
For the remainder of this section, we consider a general
adder with inputs i and j and output o, to distinguish from
the more specific case considered above, where input b was
used to denote the left-shifted input to an adder. In order to
model (12), if $p_j \le p_i$ then (14)–(17) are included in the
MILP. Otherwise (18)–(21) are included in the MILP.
$$n_i - n_j + p_j - p_i < \delta_{v1} (\hat{n}_i + p_j - p_i) \tag{14}$$
$$\beta_v - n_j + p_j - p_i \ge (1 - \delta_{v1})(-\hat{n}_j - p_i + p_j) \tag{15}$$
$$n_i - n_j + p_j - p_i \ge \delta_{v2} (-\hat{n}_j + p_j - p_i) \tag{16}$$
$$\beta_v - n_i \ge (1 - \delta_{v2})(-\hat{n}_i) \tag{17}$$
$$n_j - n_i + p_i - p_j < \delta_{v1} (\hat{n}_j - p_j + p_i) \tag{18}$$
$$\beta_v - n_i + p_i - p_j \ge (1 - \delta_{v1})(1 - \hat{n}_i - p_j + p_i) \tag{19}$$
$$n_j - n_i + p_i - p_j \ge \delta_{v2} (-\hat{n}_i + p_i - p_j) \tag{20}$$
$$\beta_v - n_j \ge (1 - \delta_{v2})(-\hat{n}_j) \tag{21}$$
Note that $\beta_v$ and $A_v$ are only bounded from below by the
constraints given. Inequalities are used in order to allow
disjunctions and thus implications, for example selecting
$\delta_{v1} = 0$ in (14) gives $n_i - n_j + p_j - p_i < 0$, whereas
selecting $\delta_{v1} = 1$ in (15) gives $\beta_v - n_j + p_j - p_i \ge 0$.
Allowing $\delta_{v1}$ as an optimization variable results in
$n_i \ge n_j - p_j + p_i \Rightarrow \beta_v \ge n_j - p_j + p_i$.
Equality of $A_v$ is guaranteed through
its positive coefficient in the objective function.
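The disjunctive behaviour of (14)–(17) can be checked by brute force on small instances. The sketch below is a reading of those constraints for the case $p_j \le p_i$, with the lost relation symbols taken as strict "less than" in (14) and "greater than or equal" elsewhere; the function names and the test ranges are hypothetical.

```python
from itertools import product


def feasible(beta, n_i, n_j, p_i, p_j, nh_i, nh_j):
    """True if some binary pair (d1, d2) satisfies (14)-(17), p_j <= p_i.
    nh_i and nh_j are the upper bounds on the wordlengths n_i and n_j."""
    diff = n_i - n_j + p_j - p_i
    for d1, d2 in product((0, 1), repeat=2):
        c14 = diff < d1 * (nh_i + p_j - p_i)
        c15 = beta - n_j + p_j - p_i >= (1 - d1) * (-nh_j - p_i + p_j)
        c16 = diff >= d2 * (-nh_j + p_j - p_i)
        c17 = beta - n_i >= (1 - d2) * (-nh_i)
        if c14 and c15 and c16 and c17:
            return True
    return False


def smallest_beta(n_i, n_j, p_i, p_j, nh_i, nh_j):
    """Smallest non-negative integer beta admitted by (14)-(17)."""
    limit = nh_i + nh_j + abs(p_i - p_j)
    return next(b for b in range(limit + 1)
                if feasible(b, n_i, n_j, p_i, p_j, nh_i, nh_j))


# Exhaustive check on a small grid: the lower bound on beta switches
# between the two implications described in the text.
NH = 6
for n_i, n_j in product(range(1, NH + 1), repeat=2):
    for p_i in range(4):
        for p_j in range(p_i + 1):
            if n_i - p_i >= n_j - p_j:
                expected = n_j - p_j + p_i
            else:
                expected = n_i
            assert smallest_beta(n_i, n_j, p_i, p_j, NH, NH) == expected
```

The check only confirms the lower bound; in the full MILP it is the objective function that drives the variables down onto that bound.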
In order to model (13), (22)–(25) are included in the
MILP. These terms model the choice in (13) as a pair of im-
plications, in an identical manner to that described above.
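$$n_o - \beta_v + m_v - 1 \ge \delta_{v3} (m_v - \hat{\beta}_v) \tag{22}$$
$$A_v + (k_2 - k_1) n_o - k_2 \beta_v + k_2 (m_v - 1) - k_1 \ge (1 - \delta_{v3}) \left[ (k_2 - k_1) \hat{n}_o - k_2 \hat{\beta}_v + k_2 (m_v - 1) - k_1 \right] \tag{23}$$
$$n_o - \beta_v + m_v - 1 < \delta_{v4} (\hat{n}_o + m_v - 2) \tag{24}$$
$$A_v + k_1 (m_v - \beta_v - 2) \ge (1 - \delta_{v4})\, k_1 (m_v - \hat{\beta}_v - 2) \tag{25}$$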
7.2.2 Output Wordlength
The pre-quantization output wordlength of an adder with
inputs i and j and output o is given by
$n^q_o = \max(n_i - p_i, n_j - p_j) + p_o$. We may express this as (26)–(27), since
before-quantization wordlengths only appear with negative
coefficient in the error, so the error constraints can be relied
upon to reduce $n^q_o$ to achieve equality.
$$n^q_o \ge n_i - p_i + p_o \tag{26}$$
$$n^q_o \ge n_j - p_j + p_o \tag{27}$$
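Returning to the adder area constraints (22)–(25) of Section 7.2.1, the sketch below checks on small instances that the smallest area they admit coincides with the piecewise model (13). It is a reading of the constraints under stated assumptions: (22), (23) and (25) are taken as "greater than or equal", (24) as strict, and the ranges, bounds and function names are hypothetical.

```python
from itertools import product


def piecewise_area(n_o, beta, m, k1=1.0, k2=0.5):
    """Adder area as the piecewise model (13)."""
    if n_o + m <= beta + 1:
        return k1 * (n_o + 1) + k2 * (beta - m - n_o + 1)
    return k1 * (beta - m + 2)


def min_adder_area(n_o, beta, m, nh_o, beta_hat, k1=1.0, k2=0.5):
    """Smallest A_v >= 0 admitted by (22)-(25) over binary (d3, d4)."""
    t = n_o - beta + m - 1
    best = None
    for d3, d4 in product((0, 1), repeat=2):
        if not t >= d3 * (m - beta_hat):        # (22)
            continue
        if not t < d4 * (nh_o + m - 2):         # (24)
            continue
        # (23) and (25) rearranged as lower bounds on A_v.
        lb23 = ((k1 - k2) * n_o + k2 * beta - k2 * (m - 1) + k1
                + (1 - d3) * ((k2 - k1) * nh_o - k2 * beta_hat
                              + k2 * (m - 1) - k1))
        lb25 = (k1 * (beta - m + 2)
                + (1 - d4) * k1 * (m - beta_hat - 2))
        area = max(lb23, lb25, 0.0)
        best = area if best is None else min(best, area)
    return best


NH_O = BETA_HAT = 8  # illustrative upper bounds
for n_o, beta, m in product(range(1, 9), range(2, 9), range(0, 3)):
    got = min_adder_area(n_o, beta, m, NH_O, BETA_HAT)
    assert abs(got - piecewise_area(n_o, beta, m)) < 1e-12
```

The grid deliberately starts at a value of 2 for the max term, since the trivial bound in (24) is only strict for values above 1 when the output is at its upper bound.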
7.3 Forks
As demonstrated in [15], fork nodes can lead to unusual
error behaviour due to the different possible orderings of
wordlength at their outputs, which are required in order to
guarantee freedom from statistical correlation and hence
an accurate error model. Fig. 4 illustrates the six different
possible configurations of a 3-way fork with outputs $n_1$,
$n_2$ and $n_3$. For example, the top left figure corresponds to
$n_1 \ge n_2 \ge n_3$ and the bottom right to $n_3 \ge n_2 \ge n_1$.
Each of the ‘Q’ blocks is a truncation of least-significant
bits in a signal. The z-domain transfer function from the
truncation error injected, to the system output, is shown
underneath the relevant ‘Q’ block.
In order for the MILP to fully model this behaviour it is
necessary to consider each of the possible orderings. Let
$\sigma_v$ be a w-tuple, representing an order $(a, b, \ldots, f)$ on a
w-way fork node $v \in V_F$ with input signal i. Thus, for
example, $\sigma_v(2)$ is the second largest signal width. We may
express the error resulting from truncation of those signals
driven by node v as (28), with one constraint per possible
$\sigma_v$, a total of $w!$. Here $\wedge$ represents Boolean conjunction.
$$\bigwedge_{r=1}^{w-1} \left( n_{\sigma_v(r)} \ge n_{\sigma_v(r+1)} \right) \Rightarrow E_v = 2^{2p_i} \left\{ \sum_{r=1}^{w-1} L_2^2\left\{ \sum_{h=1}^{w-r} H_{\sigma_v(h)} \right\} \left( 2^{-2n_{\sigma_v(r+1)}} - 2^{-2n_{\sigma_v(r)}} \right) + L_2^2\left\{ \sum_{h=1}^{w} H_{\sigma_v(h)} \right\} \left( 2^{-2n_{\sigma_v(w)}} - 2^{-2n^q_i} \right) \right\} \tag{28}$$
Figure 4: Possible output permutations in a 3-way fork
(six panels, each a cascade of truncations Q1, Q2, Q3 to
wordlengths n1, n2, n3, labelled with the transfer function
seen by each truncation error, for example H1 + H2 + H3,
H2 + H3 and H3 in the top-left panel)
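The need for one constraint per ordering can be illustrated by enumerating the orderings and testing which of them is consistent with a given set of output wordlengths. This is a minimal sketch with a hypothetical function name; it only evaluates the ordering condition, taken here as non-increasing wordlength along the order, and does not compute the error itself.

```python
from itertools import permutations


def active_orderings(widths):
    """Return the orderings of a w-way fork's outputs along which the
    wordlengths are non-increasing. Outputs are indexed from 0."""
    w = len(widths)
    return [sigma for sigma in permutations(range(w))
            if all(widths[sigma[r]] >= widths[sigma[r + 1]]
                   for r in range(w - 1))]


# A 3-way fork has 3! = 6 candidate orderings, as in Fig. 4.
all_orderings = list(permutations(range(3)))

# Distinct wordlengths select exactly one ordering...
distinct = active_orderings([8, 6, 4])
# ...while ties make more than one ordering consistent.
tied = active_orderings([5, 5, 3])
```

Since the number of orderings is w!, the number of such constraints grows combinatorially with the fanout of the fork.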
Applying DeMorgan’s theorem and linearizing the resulting
disjunction gives (29)–(33). Each exponential is
then further linearized through the procedure described in
Section 7. The $\epsilon$ and $\eta$ variables in (29)–(33) are additional
binary decision variables and the right-hand side of each
inequality consists of a trivial bound on the left-hand side,
multiplied by a decision variable. At least one inequality is
non-trivial, a property ensured by (33).
$$n_{\sigma_v(1)} - n_{\sigma_v(2)} < \epsilon_{v,\sigma_v(1),\sigma_v(2)}\, \hat{n}_{\sigma_v(1)} \tag{29}$$
$$n_{\sigma_v(2)} - n_{\sigma_v(3)} < \epsilon_{v,\sigma_v(2),\sigma_v(3)}\, \hat{n}_{\sigma_v(2)} \tag{30}$$
$$\ldots$$
$$n_{\sigma_v(w-1)} - n_{\sigma_v(w)} < \epsilon_{v,\sigma_v(w-1),\sigma_v(w)}\, \hat{n}_{\sigma_v(w-1)} \tag{31}$$
$$E_v - 2^{2p_i} \left( \sum_{r=1}^{w-1} L_2^2\left\{ \sum_{h=1}^{w-r} H_{\sigma_v(h)} \right\} \left( 2^{-2n_{\sigma_v(r+1)}} - 2^{-2n_{\sigma_v(r)}} \right) + L_2^2\left\{ \sum_{h=1}^{w} H_{\sigma_v(h)} \right\} \left( 2^{-2n_{\sigma_v(w)}} - 2^{-2n^q_i} \right) \right) \ge -\eta_v\, 2^{2(p_i - 1)} \sum_{r=0}^{w-1} L_2^2\left\{ \sum_{h=1}^{w-r} H_{\sigma_v(h)} \right\} \tag{32}$$
$$\sum_{r=1}^{w-1} \epsilon_{v,\sigma_v(r),\sigma_v(r+1)} + \eta_v \le w - 1 \tag{33}$$
It is not necessary to explicitly consider quantization of
the input signal to a fork node, since the above constraints
use the pre-quantization wordlength of the fork input $n^q_i$.
It is necessary, however, to guarantee that the input signal
provides enough wordlength for the largest of its outputs (34).
$$n_i \ge n_a, \quad n_i \ge n_b, \quad \ldots, \quad n_i \ge n_f \tag{34}$$
7.4 Gains
In contrast to adders and fork nodes, gain nodes are
straightforward. The area of a gain node has already
been modelled in the objective function (Section 7). The
only remaining constraint required is to model the
pre-quantization output wordlength of a gain $v \in V_G$ with
input signal a, output signal o and coefficient of wordlength
CW(v) and scaling SC(v) (35). This constraint is already
in linear form.
$$n^q_o = n_a + \mathrm{CW}(v) - p_a - \mathrm{SC}(v) + p_o \tag{35}$$
7.5 Delays
Delay nodes also have a simple relationship between
their input wordlength and their output wordlength before
quantization, shown in (36) for the case of a delay node
with input i and output o.
$$n^q_o = n_i \tag{36}$$
7.6 MILP Summary
A MILP model for the wordlength optimization prob-
lem has been proposed. It remains to quantify the number
of variables (37) and constraints (38) present in the model.
Note that the number of constraints given does not include
integrality constraints, the unit upper bounds on Boolean
variables, or the trivial fork constraints in (34) which do
not form part of the optimization problem.
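$$\mathrm{vars} = \sum_{s \in S \setminus \mathrm{pred}(V_F)} (\hat{n}_s + 1) + \sum_{s \in S \setminus \mathrm{pred}(V_F) \setminus \mathrm{succ}(V_F)} (\hat{n}^q_s + 1) + |V_F| + 6|V_A| + \sum_{v \in V_F} \mathrm{od}(v)(\mathrm{od}(v) - 1) \{1 + (\mathrm{od}(v) - 2)!\} \tag{37}$$
$$\mathrm{cons} = 2|S \setminus \mathrm{pred}(V_F)| + 2|S \setminus \mathrm{pred}(V_F) \setminus \mathrm{succ}(V_F)| + 1 + 10|V_A| + \sum_{v \in V_F} \mathrm{od}(v)(\mathrm{od}(v) - 1) \{1 + 2(\mathrm{od}(v) - 2)!\} + |V_G| + |V_D| \tag{38}$$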
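The counts in (37) and (38) are easy to evaluate for a candidate graph before attempting to solve it. The sketch below transcribes the two expressions; the function name, argument layout and example numbers are hypothetical.

```python
from math import factorial


def milp_size(n_hat, n_hat_q, num_adders, num_gains, num_delays,
              fork_outdegrees):
    """Number of variables (37) and constraints (38) in the MILP.

    n_hat:           upper bounds for signals in S minus pred(V_F)
    n_hat_q:         pre-quantization upper bounds for signals in
                     S minus pred(V_F) minus succ(V_F)
    fork_outdegrees: out-degree od(v) of each fork node
    """
    variables = (sum(n + 1 for n in n_hat)
                 + sum(n + 1 for n in n_hat_q)
                 + len(fork_outdegrees)
                 + 6 * num_adders
                 + sum(d * (d - 1) * (1 + factorial(d - 2))
                       for d in fork_outdegrees))
    constraints = (2 * len(n_hat)
                   + 2 * len(n_hat_q)
                   + 1
                   + 10 * num_adders
                   + sum(d * (d - 1) * (1 + 2 * factorial(d - 2))
                         for d in fork_outdegrees)
                   + num_gains + num_delays)
    return variables, constraints


# Illustrative numbers only: five signals bounded at 8 bits, three of
# which also need pre-quantization variables, and one 2-way fork.
size = milp_size([8] * 5, [8] * 3, num_adders=1, num_gains=2,
                 num_delays=1, fork_outdegrees=[2])
```

Raising a fork's out-degree in this example shows how quickly the factorial term comes to dominate both counts.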
It can be seen that so long as the number of large-fanout
fork nodes is limited, the number of constraints in the
MILP model grows approximately linearly in the number
of nodes and signals. Under the same conditions the
number of variables can grow up to quadratically with the
number of signals because the upper bounds on each signal
wordlength will vary approximately linearly with the
number of large area-consuming nodes. Both parameters
are dominated by any large-fanout fork nodes, since the
number of $\epsilon$ variables and their associated constraints grow
combinatorially in the fanout.
8 Results
Fig. 5 illustrates area-error tradeoff curves for both a
second and a third order linear phase FIR filter [2]. For
the second order filter, results for both 4-bit and 8-bit in-
puts are given. For the third order filter, only results for a
4-bit input have been obtained. Three curves are present in
each plot: the optimum uniform wordlength implementa-
tion, the heuristically derived multiple wordlength imple-
mentation from [1], and the optimum multiple wordlength
implementation achieved by solving the MILP presented
in this paper.
The results clearly illustrate the high-quality solutions
achievable by the heuristic: its area is on average only 0.7%,
and at most 3.9%, worse than the optimum result.
An optimum wordlength allocation for an RGB to
YCrCb converter described in [1] with 4-bit inputs has also
been performed. This result shows an optimal cost of 78.61
LUTs, equal to the result achieved by the heuristic
presented in [1].
[Figure 5 plot omitted. Curves shown: optimum, heuristic [FCCM01], and uniform wordlength implementations for the 2nd order filter with 4-bit input, the 3rd order filter with 4-bit input, and the 2nd order filter with 8-bit input. Horizontal axis ticks span 10^-8 to 10^-1; vertical axis ticks span 0 to 140.]
Figure 5: Area / Error tradeoffs compared for a 2nd and
3rd order FIR filter
Fig. 6 illustrates the structure [21] and optimum
wordlengths of the RGB to YCrCb converter for 4-bit
inputs (of range 112), 4-bit coefficients, and with
an error-free Y, whereas a bounded error power of up
to $10^{-2}$ has been allowed for Cr and Cb. We believe
such optimum results, even for small circuits, to
be highly valuable as a benchmark against which many
new and existing heuristics [1, 3, 4, 5, 8, 9, 10] may
be compared. For this reason we are making several
optimum wordlength benchmarks publicly available at
http://infoeng.ee.ic.ac.uk/gac1/optimumWL.
The BonsaiG MILP solver [6] was used to solve the
MILP models: execution time ranged from 2 seconds to 6
minutes on an AMD Athlon 1.2 GHz with 512 MB RAM.
This compares to less than 0.2 seconds for the heuristic
solutions on the same machine. The scale of MILP that can
be solved is limited by both excessive run-time and
numerical instabilities in the MILP solver.
9 Conclusion
This paper presents an approach to construct a mixed
integer linear program (MILP) from an error-constrained
area optimization problem, in order to perform wordlength
allocation. High-level area models of parameterizable li-
brary blocks have been proposed and fitted to a Xilinx Vir-
tex implementation. These form the basis of the objective
function for the optimization, which is performed subject
to user-specified constraints on output signal quality.
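[Figure 6 diagram omitted. It shows a signal flow graph with inputs r, g, b, outputs Y, Cr, Cb, and four adders, with each signal annotated by its wordlength (values between 4 and 11 bits).]
Figure 6: Optimal wordlength allocations for the ITU RGB
to YCrCb converter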
Results indicate that our previously proposed heuristic
[1] reaches the optimum in most cases and, on average,
deviates by only 0.7% from the optimum area.
Our current and future work is concentrating on
wordlength optimizations of nonlinear DSP algorithms,
and on including other models such as power consumption
into the optimization procedure.
References
[1] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “The
multiple wordlength paradigm,” in Proc. IEEE Sym-
posium on Field Programmable Custom Computing Ma-
chines, Rohnert Park, CA, April–May 2001.
[2] S. K. Mitra, Digital Signal Processing, McGraw-Hill, New
York, 1998.
[3] S. A. Wadekar and A. C. Parker, “Accuracy sensitive word-
length selection for algorithm optimization,” in Proc. Inter-
national Conference on Computer Design, Austin, Texas,
October 1998, pp. 54–61.
[4] A. Nayak, M. Haldar, A. Choudhary, and P. Banerjee, “Pre-
cision and error analysis of MATLAB applications during
automated hardware synthesis for FPGAs,” in Proc. Design
Automation and Test in Europe, Munich, Germany, 2001,
pp. 722–728.
[5] K.-I. Kum and W. Sung, “Combined word-length optimiza-
tion and high-level synthesis of digital signal processing
systems,” IEEE Trans. Computer Aided Design, vol. 20,
no. 8, pp. 921–930, August 2001.
[6] L. Hafer, “Bonsaig,” http://www.cs.sfu.ca/
lou/BonsaiG.
[7] G. A. Constantinides and G. J. Woeginger, “The complexity
of multiple wordlength assignment,” Applied Mathematics
Letters, vol. 15, no. 2, pp. 137–140, February 2001.
[8] A. Benedetti and P. Perona, “Bit-width optimization for
configurable DSP’s by multi-interval analysis,” in Proc.
34th Asilomar Conference on Signals, Systems and Com-
puters, 2000.
[9] M. Stephenson, J. Babb, and S. Amarasinghe, “Bitwidth
analysis with application to silicon compilation,” in Proc.
SIGPLAN Programming Language Design and Implemen-
tation, Vancouver, British Columbia, June 2000.
[10] R. Cmar, L. Rijnders, P. Schaumont, S. Vernalde, and
I. Bolsens, “A methodology and design environment for
DSP ASIC fixed point refinement,” in Proc. Design
Automation and Test in Europe, München, 1999.
[11] “Signal processing worksystem,”
http://www.cadence.com/eda solutions/
sld spdv 13 index.html.
[12] M. P. Leong, M. Y. Yeung, C. W. Fu, P. A. Heng, and
P. H. W. Leong, “Automatic floating to fixed point trans-
lation and its application to post-rendering 3D warping,”
in Proc. IEEE Symposium on Field-Programmable Custom
Computing Machines, 1999, pp. 240–248.
[13] W. B. Ligon, S. McMillan, G. Monn, F. Stivers, and
K. Underwood, “A re-evaluation of the practicality of
floating-point operations on FPGAs,” in Proc. IEEE Sym-
posium on FPGAs for Custom Computing Machines, 1998.
[14] Xilinx, Inc., San Jose, Field Programmable Gate Arrays,
1998.
[15] G. A. Constantinides, High Level Synthesis and Word
Length Optimization of Digital Signal Processing Systems,
Ph.D. thesis, University of London, 2001.
[16] A. D. Booth, “A signed binary multiplication technique,”
Quarterly J. Mechan. Appl. Math., vol. 4, no. 2, pp. 236–
240, 1951.
[17] G. A. Constantinides, P. Y. K. Cheung, and W. Luk,
“Roundoff-noise shaping in filter design,” in Proc. IEEE
International Symposium on Circuits and Systems, May –
June 2000.
[18] B. Parhami, Computer Arithmetic: Algorithms and Hard-
ware Designs, Oxford University Press, Oxford, U.K.,
2000.
[19] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “Trun-
cation noise in fixed-point SFGs,” IEE Electronics Letters,
vol. 35, no. 23, pp. 2012–2014, November 1999.
[20] R. S. Garfinkel and G. L. Nemhauser, Integer Program-
ming, John Wiley and sons, New York, 1972.
[21] B. L. Evans, “Raster image processing on the TMS320C7X
VLIW DSP,”
http://www.ece.utexas.edu/bevans/hp-dsp-
seminar/07 C6xImage2/sld001.htm.
