Optimal combined word-length allocation and architectural synthesis of digital signal processing circuits by Caffarena, G et al.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 53, NO. 5, MAY 2006 339
Optimal Combined Word-Length Allocation
and Architectural Synthesis of Digital
Signal Processing Circuits
Gabriel Caffarena, Student Member, IEEE, George A. Constantinides, Member, IEEE,
Peter Y. K. Cheung, Senior Member, IEEE, Carlos Carreras, and Octavio Nieto-Taladriz
Abstract—In this brief, we address the combined application of word-length allocation and architectural synthesis of linear time-invariant digital signal processing systems. These two design tasks are traditionally performed sequentially, which reduces the overall design complexity but ignores forward and backward dependencies that may lead to cost reductions. Mixed integer linear programming is used to formulate the combined problem, and the results are compared to those of the traditional two-step approach.
Index Terms—Architectural synthesis, digital signal processing,
fixed-point arithmetic, word-length allocation.
I. INTRODUCTION
This brief addresses the problem of hardware synthesis of digital signal processing (DSP) algorithms under both error and latency constraints. Programmable logic devices are chosen as the target architecture.
The multiple word-length implementation of DSP algorithms [1] has been an active research field in recent years. The traditional uniform word-length design approach, inherited from microprocessor-oriented design, has been revisited, and algorithms for both word-length allocation [2]–[6] and architectural synthesis [4], [7], [8] have been adapted to the more efficient multiple word-length design style. However, little research has been carried out on the combined application of both design tasks. In [4], a three-step methodology is presented: approximate word-length allocation, architectural synthesis, and accurate word-length allocation of the resulting architecture. That approach is pioneering work in combining word-length allocation and architectural synthesis, but it does not report the area savings obtained with respect to the traditional approach.
Mixed integer linear programming (MILP) is used to formulate the problem. Its main advantage is that it allows an assessment of the suitability of applying these two design tasks simultaneously, and the results presented in this brief can be used to evaluate future heuristic algorithms.
Manuscript received January 28, 2005; revised July 29, 2005. This work was supported in part by the Spanish Ministry of Science and Technology under Research Project TIC2003-09061-C03-02 and by the Engineering and Physical Sciences Research Council, U.K. This paper was recommended by Associate Editor C.-T. Lin.
G. Caffarena, C. Carreras, and O. Nieto-Taladriz are with the Departamento de Ingeniería Electrónica, Universidad Politécnica de Madrid, Madrid 28040, Spain (e-mail: gabriel@die.upm.es; carreras@die.upm.es; nieto@die.upm.es).
G. A. Constantinides and P. Y. K. Cheung are with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2BT, U.K. (e-mail: g.constantinides@ic.ac.uk; p.cheung@ic.ac.uk).
Digital Object Identifier 10.1109/TCSII.2005.862175
Fig. 1. Fixed-point format model: n is the signal word-length and p indicates the position of the binary point with respect to the sign bit s.
The main contribution of this brief is an optimal analysis of the simultaneous application of word-length allocation and architectural synthesis. This approach is compared to the sequential application of optimal algorithms for word-length allocation and architectural synthesis, and area savings of up to 13% are reported.
The rest of this brief is organized as follows. Section II introduces the main concepts involved in the combined approach. Section III presents the MILP formulation of the problem. Section IV analyzes the results, and conclusions are drawn in Section V.
II. COMBINED WORD-LENGTH ALLOCATION AND
ARCHITECTURAL SYNTHESIS
A. Combined Approach
The combined application of the word-length allocation and architectural synthesis tasks takes as its starting point a computation graph $G(V, S)$, a maximum latency $\lambda$, and a maximum noise variance at the output $E_o$. $G$ is a formal representation of the algorithm, where $V$ is the set of graph nodes representing operations, and $S$ is the set of directed edges representing the signals that determine the data flow. We consider $V$ to be composed of gains, additions, unit delays, forks (branching nodes), and input and output nodes. Signals are in two's-complement fixed-point format, defined by the pair $(n, p)$, where $n$ is the word-length of the signal not including the sign bit, and $p$ is the scaling of the signal, which represents the displacement of the binary point from the sign bit (see Fig. 1).
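As an illustration of the $(n, p)$ format (a sketch of ours, not code from the brief), the Python function below truncates a real value to that format. Under the convention above, the least significant bit weighs $2^{p-n}$ and the representable range is $[-2^{p}, 2^{p} - 2^{p-n}]$. Truncation toward minus infinity and raising an error on overflow are assumptions, and `quantize_trunc` is a hypothetical name.

```python
import math


def quantize_trunc(x: float, n: int, p: int) -> float:
    """Truncate x to the two's-complement fixed-point format (n, p).

    n: word-length excluding the sign bit.
    p: displacement of the binary point from the sign bit (scaling).
    Assumption: truncation (rounding toward minus infinity), error on overflow.
    """
    lsb = 2.0 ** (p - n)                  # weight of the least significant bit
    q = math.floor(x / lsb) * lsb         # drop the bits below the LSB
    lo, hi = -(2.0 ** p), 2.0 ** p - lsb  # representable range
    if not lo <= q <= hi:
        raise OverflowError(f"{x} does not fit in format (n={n}, p={p})")
    return q


# 3 bits after the binary point, binary point right after the sign bit (p = 0)
print(quantize_trunc(0.8, 3, 0))   # 0.75
print(quantize_trunc(-0.3, 3, 0))  # -0.375
```

Increasing $p$ by one with $n$ fixed doubles the range and the LSB weight together, which is why scaling is fixed first and only $n$ is left to the optimizer.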
Operations are to be implemented on resources from a set $R$. The aim of the combined approach is to find the word-lengths $n$, the time step at which each operation is executed (scheduling), the types and number of resources forming $R$ (resource allocation), and the binding between operations and resources (resource binding) that comply with both the latency and the error constraints while achieving minimum area.
1057-7130/$20.00 © 2006 IEEE
Fig. 2. Fork model. (a) 2-way fork. (b) Cascade model with sorted outputs. (c) Noise at the output due to quantizations Q1 and Q2.
Standard notation is used in this brief for the range of a function. $|A|$ represents the cardinality of set $A$, $\wedge$ denotes logical AND, $\vee$ logical OR, and $\setminus$ set subtraction. The set of input signals driving node $v$ is expressed as $in(v)$ and the set of output signals driven by $v$ as $out(v)$. The upper bound on a variable $x$ is represented as $\hat{x}$.
B. Scaling
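The scaling of the signals can be computed before the optimization process starts. We choose an analytical approach based on the computation of the $L_1$-norm for each signal. Given the input peak value $M$, the scaling $p_j$ of a signal $j$ is determined by (1), where $L_1\{H_j\}$ is the $L_1$-norm (the sum of the absolute values of the impulse response) and $H_j(z)$ is the transfer function from the input to signal $j$:
$$p_j = \left\lfloor \log_2\left( M \cdot L_1\{H_j\} \right) \right\rfloor + 1 \qquad (1)$$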
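The sketch below evaluates the scaling rule $p = \lfloor \log_2(M \cdot L_1\{H\}) \rfloor + 1$, which is how we read (1), for a rational transfer function given by its numerator and denominator coefficients. It approximates $L_1\{H\}$ by summing a truncated impulse response; this truncation is an assumption (exact for FIR paths, an approximation for IIR paths), and the function names are hypothetical.

```python
import math


def impulse_response(b, a, length):
    """First `length` samples of the impulse response of H(z) = B(z)/A(z), a[0] == 1."""
    h = []
    for k in range(length):
        x_k = b[k] if k < len(b) else 0.0  # the input is a unit impulse
        y = x_k - sum(a[i] * h[k - i] for i in range(1, len(a)) if k - i >= 0)
        h.append(y)
    return h


def l1_norm(h):
    """L1-norm of a (truncated) impulse response."""
    return sum(abs(v) for v in h)


def scaling(M, h):
    """Smallest p such that |signal| < 2**p whenever |input| <= M."""
    return math.floor(math.log2(M * l1_norm(h))) + 1


# FIR path h = [0.5, 0.25, -0.25]: L1 = 1, so a unit peak input needs p = 1
print(scaling(1.0, impulse_response([0.5, 0.25, -0.25], [1.0], 3)))  # 1
```

The `+ 1` matters when $M \cdot L_1\{H\}$ is an exact power of two: a peak of exactly 1.0 does not fit in $[-1, 1)$, so one more integer bit is allocated.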
C. Noise Model
We adopt the quantization error model presented in [9]. The error introduced by the quantization of a signal $j$ from $n^q_j$ bits to $n_j$ bits is modeled by the injection of uniformly distributed white noise with the variance $\sigma_j^2$ given in (2). The variance of the noise contribution at the output is $\sigma_j^2 L_2^2\{G_j\}$, where $L_2\{G_j\}$ is the $L_2$-norm and $G_j(z)$ is the transfer function from signal $j$ to the output:
$$\sigma_j^2 = \frac{2^{2p_j}}{12}\left(2^{-2n_j} - 2^{-2n^q_j}\right) \qquad (2)$$
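The following sketch evaluates the truncation-noise variance $\frac{2^{2p}}{12}(2^{-2n} - 2^{-2n^q})$, our reading of (2), and the resulting output contribution $\sigma^2 L_2^2\{G\}$. It approximates the $L_2$-norm from a truncated impulse response of $G$ (an assumption). Passing `float('inf')` as the pre-quantization word-length models the quantization of an infinite-precision value; all names are hypothetical.

```python
def trunc_noise_var(p: int, n: int, nq: float) -> float:
    """Variance of the noise injected when truncating from nq to n bits."""
    if n >= nq:
        return 0.0  # no bits are discarded, so no noise is injected
    return (2.0 ** (2 * p) / 12.0) * (2.0 ** (-2 * n) - 2.0 ** (-2 * nq))


def l2_norm_sq(g) -> float:
    """Squared L2-norm of a (truncated) impulse response."""
    return sum(v * v for v in g)


def output_noise(p: int, n: int, nq: float, g) -> float:
    """Contribution of one quantizer to the output noise variance."""
    return trunc_noise_var(p, n, nq) * l2_norm_sq(g)


# Truncating a p = 0 signal from 5 to 3 bits, seen through G(z) = 1 + 0.5 z^-1
print(output_noise(0, 3, 5, [1.0, 0.5]))
```

Summing such terms over all quantized signals and comparing the total with the output noise budget is the essence of the error constraint.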
As stated in [5], the error introduced by forks requires special treatment. Fig. 2 shows a 2-way fork with quantized outputs. First, the outputs must be sorted in descending word-length order to take into account the correlation between them and the input of the fork [Fig. 2(b)]. The quantization noise injected by $Q_1$ traverses both outputs, so its contribution at the system output depends on both transfer functions $G_1$ and $G_2$ [Fig. 2(c)]. The noise injected by $Q_2$ only traverses the second output, so only $G_2$ shapes its contribution to the output noise. The error that a $k$-way fork with input $i$ introduces can be expressed as in (3), where the $k$-tuple $(j_1, \ldots, j_k)$ expresses the order of the outputs [5] and $n_{j_0} = n_i$:
$$E_f = \sum_{m=1}^{k} \frac{2^{2p_i}}{12}\left(2^{-2n_{j_m}} - 2^{-2n_{j_{m-1}}}\right) L_2^2\Big\{\sum_{l=m}^{k} G_{j_l}\Big\} \qquad (3)$$
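To show the effect of the ordering, the sketch below computes the output noise of a $k$-way fork under the cascade model of Fig. 2(b), as we read it: outputs are sorted by descending word-length, each quantizer $Q_m$ truncates from the previous word-length in the chain, and its noise reaches every later output, so the impulse responses of those branches are summed before taking the $L_2$-norm. Impulse responses are truncated (an assumption), and `fork_noise` is a hypothetical name.

```python
def fork_noise(p, n_in, out_wl, out_g):
    """Output noise variance of a k-way fork under the sorted cascade model.

    p: scaling of the fork input; n_in: word-length of the fork input.
    out_wl[j]: word-length of output j; out_g[j]: truncated impulse response
    from output j to the system output.
    """
    if n_in < max(out_wl):
        raise ValueError("fork outputs cannot be wider than the fork input")
    order = sorted(range(len(out_wl)), key=lambda j: out_wl[j], reverse=True)
    total, prev = 0.0, n_in
    for m, j in enumerate(order):
        var = (2.0 ** (2 * p) / 12.0) * (2.0 ** (-2 * out_wl[j]) - 2.0 ** (-2 * prev))
        downstream = order[m:]  # outputs traversed by the noise of this quantizer
        length = max(len(out_g[d]) for d in downstream)
        summed = [sum(out_g[d][k] if k < len(out_g[d]) else 0.0 for d in downstream)
                  for k in range(length)]
        total += var * sum(v * v for v in summed)  # correlated: add, then square
        prev = out_wl[j]
    return total


print(fork_noise(0, 6, [4, 2], [[1.0], [1.0]]))
```

Listing the outputs in a different order gives the same result, because the function sorts them first; treating the two quantizers as independent would instead underestimate the noise of $Q_1$.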
D. Architectural Synthesis
The data flow of a single iteration of the algorithm is expressed by means of the sequencing graph $G_S(O, E)$ extracted from the computation graph. $O$ is the set of operations and $E$ is the set of edges specifying the precedence relations among operations. This graph is used for scheduling.
In our approach, we assume operations with a latency of one cycle. Each operation $o$ can be executed during the time interval $T_o$ defined by (4), where $\mathbb{N}$ denotes the set of nonnegative integers, $t^{\mathrm{asap}}_o$ is the execution time of operation $o$ in the as-soon-as-possible (ASAP) schedule, and $t^{\mathrm{alap}}_o(\lambda)$ is its execution time in the as-late-as-possible (ALAP) schedule for a total of $\lambda$ time steps (the maximum latency). The set $T$ of all possible execution times is given by (5):
$$T_o = \{\, t \in \mathbb{N} : t^{\mathrm{asap}}_o \le t \le t^{\mathrm{alap}}_o(\lambda) \,\} \qquad (4)$$
$$T = \bigcup_{o \in O} T_o \qquad (5)$$
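The time windows in (4) can be computed with two longest-path passes over the sequencing graph. The Python sketch below is our illustration; it numbers time steps from 0 and represents the graph as a hypothetical `preds` dictionary mapping each operation to its predecessors.

```python
def asap(preds):
    """ASAP step of each operation (1-cycle operations, steps start at 0)."""
    t = {}

    def visit(o):
        if o not in t:
            t[o] = max((visit(q) + 1 for q in preds[o]), default=0)
        return t[o]

    for o in preds:
        visit(o)
    return t


def alap(preds, lam):
    """ALAP step of each operation for a total of lam time steps."""
    succs = {o: [] for o in preds}
    for o, ps in preds.items():
        for q in ps:
            succs[q].append(o)
    t = {}

    def visit(o):
        if o not in t:
            t[o] = min((visit(s) - 1 for s in succs[o]), default=lam - 1)
        return t[o]

    for o in preds:
        visit(o)
    return t


def time_windows(preds, lam):
    """Window of feasible steps per operation; an empty list means lam is too small."""
    s, l = asap(preds), alap(preds, lam)
    return {o: list(range(s[o], l[o] + 1)) for o in preds}


# Two gains feeding one addition, scheduled within lam = 3 steps
preds = {"g1": [], "g2": [], "a1": ["g1", "g2"]}
windows = time_windows(preds, 3)
print(windows)                                 # {'g1': [0, 1], 'g2': [0, 1], 'a1': [1, 2]}
print(sorted(set().union(*windows.values())))  # [0, 1, 2]
```

The last line is the union of all windows, i.e., the set of possible execution times. Larger latencies widen every window, which enlarges the MILP but gives the solver more freedom to share multipliers.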
The set of resources $R = R_M \cup R_A \cup R_D$ is divided into multipliers ($R_M$), adders ($R_A$), and registers ($R_D$), which implement gains, additions, and delays, respectively. Multiplexing logic and the memory used to store intermediate values are not considered among the resources.
We express the compatibility between an operation, or a set of operations, and resources with the function $C$, where $C(o) \subseteq R$ is the set of resources that can implement operation $o$.
Targeting programmable logic devices, we regard only multipliers as shareable, since the necessary multiplexing logic is often negligible compared to the area of a multiplier, a situation that does not apply to adders or registers. For instance, the ratios between the area of a LUT-based 16 × 16-bit multiplier and the areas of a 16-bit 2-input multiplexer and a 16-bit 4-input multiplexer are 17.25 and 8.625, respectively, for Virtex-2 and Virtex-4 devices (using Xilinx ISE v7.1). Thus, there is a dedicated resource for each addition and each delay: restricted to additions and to delays, the compatibility function is one-to-one, with $|R_A| = |V_A|$ and $|R_D| = |V_D|$, where $V_A$ and $V_D$ are the sets of additions and delays.
Multipliers have one input devoted to coefficients, whose word-length is equal to a system-wide constant $n_c$. The other input is assigned to the input signal of the gains and must have a word-length greater than or equal to the maximum word-length of the inputs of the gains bound to the resource.
An upper bound on the number of multipliers can be estimated from the number of multipliers needed to implement the ASAP schedule. Initially, all gains can be implemented on all multipliers; therefore, $C(o) = R_M$ for every gain $o$, where $R_M$ is the set of multipliers.
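The bound and the initial compatibility sets can be sketched as follows, assuming ASAP steps are already known (for example, from a longest-path pass). The bound is the largest number of gains that the ASAP schedule places in a single step, since each of them needs its own multiplier there. The multiplier names `M0`, `M1`, ... and the function names are hypothetical.

```python
from collections import Counter


def multiplier_upper_bound(asap_step, gains):
    """Largest number of gains the ASAP schedule executes in the same step."""
    per_step = Counter(asap_step[g] for g in gains)
    return max(per_step.values(), default=0)


def initial_compatibility(gains, n_mult):
    """Before optimization every gain may use every multiplier."""
    multipliers = [f"M{i}" for i in range(n_mult)]  # hypothetical resource names
    return {g: set(multipliers) for g in gains}


asap_step = {"g1": 0, "g2": 0, "g3": 1, "a1": 1}
bound = multiplier_upper_bound(asap_step, ["g1", "g2", "g3"])
print(bound)                                                       # 2
print(initial_compatibility(["g1", "g2", "g3"], bound)["g3"] == {"M0", "M1"})  # True
```

Because ASAP packs operations as early as possible, it is a natural source of a safe bound; the optimizer may then leave some of these multipliers unused.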
E. Area Models
The cost of an adder bound to an addition with inputs $a$ and $b$ and output $o$ is given by (6), which is derived from the model in [5]. A ripple-carry adder is assumed. The input signals must comply with the following: one input is shifted a number of bits from the least significant bit of the other, and its scaling must be greater than or equal to a given value (see Fig. 3 for an example). Equation (6) requires two auxiliary quantities. The first is the number of overlapping bits between $a$ and $b$ with sign extension (7), which determines the width of a safe adder. The second is the number of bits that are not required at the output due to scaling (8). The area of an optimized adder is equal to the area of the safe adder minus these nonrequired bits (6). Note that the max operation in (7) can be expressed as a disjunction, and that the quantity in (8) is a constant.
(6)
Fig. 3. Example of configuration of addition signals.
if
otherwise
(7)
(8)
The cost of a register bound to a delay with input $j$ is given by the straightforward equation
(9)
Equation (10) gives the cost of a multiplier bound to a subset of gains, as a function of the word-lengths of their inputs:
(10)
III. MILP FORMULATION
This section assumes some knowledge of integer linear programming [10]. The variables used in the MILP model are: binary scheduling and resource binding variables $x_{otr}$; integer signal word-lengths $n_j$; integer signal word-lengths before quantization $n^q_j$; binary auxiliary signal word-lengths $b_{jk}$; binary auxiliary signal word-lengths before quantization $b^q_{jk}$; binary decision variables; integer adder costs $A_r$; integer auxiliary variables; and real fork-node error variables $E_f$.
In the following subsections, we present the formulation of
the MILP model.
A. Objective Function
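The objective function is the sum of the areas of all resources (adders, registers, and multipliers) and is given by (11), where $A_r$ is the area of resource $r$. The cost of the adders is linearized in the constraints section according to (6):
$$\min \; \sum_{r \in R_A} A_r + \sum_{r \in R_D} A_r + \sum_{r \in R_M} A_r \qquad (11)$$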
B. Architectural Synthesis Constraints
Here, we introduce the constraints related to scheduling, resource allocation, and resource binding. Equation (12) defines the binary variables $x_{otr}$ that steer the constraints in this subsection:
$$x_{otr} = \begin{cases} 1, & \text{if operation } o \text{ is scheduled at time step } t \text{ on resource } r \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$
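Equation (13) is the binding constraint, which ensures that each operation is executed on exactly one resource. The next constraint (14) states that a resource does not implement more than one operation at a time. Note that there is no need to apply (13) and (14) to operations with dedicated resources. The precedence constraints (15) ensure that operations obey the dependencies in the sequencing graph:
$$\sum_{t \in T_o} \sum_{r \in C(o)} x_{otr} = 1 \quad \forall o \in O \qquad (13)$$
$$\sum_{o \in O :\, r \in C(o),\ t \in T_o} x_{otr} \le 1 \quad \forall r \in R,\ t \in T \qquad (14)$$
$$\sum_{t \in T_{o_2}} \sum_{r \in C(o_2)} t\, x_{o_2 t r} - \sum_{t \in T_{o_1}} \sum_{r \in C(o_1)} t\, x_{o_1 t r} \ge 1 \quad \forall (o_1, o_2) \in E \qquad (15)$$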
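The binding, resource-conflict, and precedence requirements can also be read as checks on a candidate solution. The sketch below is such a checker, not the MILP encoding itself: it takes the set of triples (operation, step, resource) whose binary variable is 1, assumes every operation appears (dedicated-resource operations with their own resource), and applies the one-binding rule only to operations on shareable resources. All names are hypothetical.

```python
from collections import Counter


def check_schedule(x, edges, shared):
    """Check a candidate solution for binding, conflicts, and precedence.

    x: set of (operation, step, resource) triples whose variable is 1.
    edges: precedence pairs (o1, o2) of the sequencing graph.
    shared: operations mapped to shareable resources.
    """
    # Binding: every operation on a shareable resource is bound exactly once
    times_bound = Counter(o for (o, _, _) in x)
    if any(times_bound[o] != 1 for o in shared):
        return False
    # Conflicts: a resource executes at most one operation per time step
    if any(c > 1 for c in Counter((t, r) for (_, t, r) in x).values()):
        return False
    # Precedence: with 1-cycle operations, a successor starts at least one step later
    step = {o: t for (o, t, _) in x}
    return all(step[o2] - step[o1] >= 1 for (o1, o2) in edges)


edges = [("g1", "a1"), ("g2", "a1")]
ok = {("g1", 0, "M0"), ("g2", 1, "M0"), ("a1", 2, "A0")}
clash = {("g1", 0, "M0"), ("g2", 0, "M0"), ("a1", 1, "A0")}
print(check_schedule(ok, edges, {"g1", "g2"}))     # True
print(check_schedule(clash, edges, {"g1", "g2"}))  # False
```

In the `ok` solution both gains share multiplier `M0` in consecutive steps, which is exactly the kind of sharing the combined formulation trades against word-lengths.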
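Finally, (16) expresses the resource compatibility constraints, which guarantee that a resource bound to several operations is compatible with all of them. Again, only multipliers are considered: the input devoted to signals must have a word-length $n_r$ at least as large as the word-length of the input of each gain bound to it. The summation $\sum_{t \in T_o} x_{otr}$ is equal to 1 if operation $o$ is bound to resource $r$; otherwise the upper bound $\hat{n}_{in(o)}$ relaxes the constraint:
$$n_r \ge n_{in(o)} - \hat{n}_{in(o)}\Big(1 - \sum_{t \in T_o} x_{otr}\Big) \quad \forall \text{ gains } o,\ r \in C(o) \qquad (16)$$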
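The sketch below shows what the compatibility requirement means for a fixed binding: each multiplier's signal input must be as wide as the widest gain input bound to it. It also evaluates a big-M form of the compatibility constraint, as we read it, in which the upper bound on the gain input word-length switches the constraint off when the gain is not bound to the resource. Names are hypothetical.

```python
def multiplier_widths(binding, gain_input_wl):
    """Smallest signal-input word-length of each multiplier for a given binding."""
    widths = {}
    for gain, r in binding.items():
        widths[r] = max(widths.get(r, 0), gain_input_wl[gain])
    return widths


def compat_ok(n_r, n_in, n_hat, bound):
    """Big-M check: with bound = 0 the right-hand side is n_in - n_hat <= 0
    (since n_in <= n_hat), so the constraint is inactive."""
    return n_r >= n_in - n_hat * (1 - bound)


binding = {"g1": "M0", "g2": "M0", "g3": "M1"}
wl = {"g1": 9, "g2": 12, "g3": 7}
print(multiplier_widths(binding, wl))  # {'M0': 12, 'M1': 7}
```

Binding `g1` next to the wider `g2` forces `M0` to 12 bits, so increasing the word-length of `g1` up to 12 costs no multiplier area; this is one of the cross-dependencies the combined approach can exploit.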
Note that although only multipliers are considered for sharing here, the notation can easily be extended to include more resources that can be shared (dividers, adders, etc.) or to map more than one type of operation to the same resource (e.g., gains and multiplications bound to multipliers).
C. Adder Cost
The linearization of the adder cost is based on the model from [5]. Constraints (17)–(20) cast (7) using binary decision variables together with trivial bounds on the left-hand side of the equations. Equation (6) is directly implemented using constraint (21).
(17)
(18)
(19)
(20)
(21)
D. Word-Length Allocation Constraints
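Here, we present the constraints related to the estimation of the noise at the output of the system [5]. The error constraint is given by (22) and is divided into two summations, the first dealing with the forks' signals and the second dealing with the remaining signals in $S$ (see Section II-C):
$$\sum_{f \in V_F} E_f + \sum_{j \in S \setminus S_F} \frac{2^{2p_j}}{12}\left(2^{-2n_j} - 2^{-2n^q_j}\right) L_2^2\{G_j\} \le E_o \qquad (22)$$
where $V_F$ is the set of forks, $S_F$ is the set of fork output signals, whose noise is accounted for by the fork error variables $E_f$, and $E_o$ is the maximum noise variance allowed at the output.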
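Note that nonconstant powers of two must be linearized. Each term $2^{-2n_j}$ is replaced by $\sum_{k} 2^{-2k} b_{jk}$, where $b_{jk}$ are binary auxiliary variables associated with signal $j$ by (23) and (24). For the sake of simplicity, we leave all nonconstant powers of two as they are throughout the text:
$$\sum_{k=0}^{\hat{n}_j} b_{jk} = 1 \qquad (23)$$
$$n_j = \sum_{k=0}^{\hat{n}_j} k\, b_{jk} \qquad (24)$$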
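The substitution works because exactly one binary variable is set per signal, so a nonlinear function of $n_j$ becomes a fixed linear combination of the binaries. The sketch below checks this numerically for $2^{-2n}$ under a one-hot encoding, which is our reading of the role of (23) and (24); the function names are hypothetical.

```python
def one_hot(n, n_max):
    """Binary auxiliary variables b_k (k = 0..n_max) encoding word-length n."""
    return [1 if k == n else 0 for k in range(n_max + 1)]


def linear_pow(b):
    """Linear substitute for 2**(-2n): sum over k of 2**(-2k) * b_k."""
    return sum(2.0 ** (-2 * k) * bk for k, bk in enumerate(b))


n_max = 8
for n in range(n_max + 1):
    b = one_hot(n, n_max)
    assert sum(b) == 1                                 # exactly one b_k is set
    assert sum(k * bk for k, bk in enumerate(b)) == n  # b recovers n
    assert linear_pow(b) == 2.0 ** (-2 * n)            # exact substitution
print("linearization holds for n = 0 ..", n_max)
```

The price is one binary variable per candidate word-length per signal, which is why tight upper bounds on word-lengths directly shrink the model.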
The noise $E_f$ introduced by a fork $f$ is expressed by constraints (25)–(28), which come from applying De Morgan's theorem to (3) and linearizing the resulting disjunction. Two binary variables are introduced for this purpose. These constraints are repeated for each possible ordering of the outputs of a fork.
(25)
(26)
(27)
(28)
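Since the word-lengths are decision variables, the solver cannot know in advance which ordering of the fork outputs is valid, so a copy of the constraints is written for each of the $k!$ orderings. The sketch below only illustrates which orderings are consistent with a given set of output word-lengths; it is not the encoding of (25)–(28), and the function name is hypothetical.

```python
from itertools import permutations
from math import factorial


def consistent_orderings(out_wl):
    """k-tuples of fork outputs whose word-lengths are non-increasing,
    i.e., the orderings for which the sorted cascade model applies."""
    k = len(out_wl)
    return [perm for perm in permutations(range(k))
            if all(out_wl[perm[m]] >= out_wl[perm[m + 1]] for m in range(k - 1))]


out_wl = [4, 6, 4]
print(factorial(len(out_wl)))        # 6
print(consistent_orderings(out_wl))  # [(1, 0, 2), (1, 2, 0)]
```

Ties produce several consistent orderings; they yield the same noise value, because the quantizer between two equal word-lengths injects no noise.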
E. Conditioning Constraints
This last set of constraints computes the word-lengths before quantization, taking into account scaling and word-length propagation information [5].
Given an addition with inputs $a$ and $b$ and output $o$, the word-length of its output before quantization is $n^q_o = \max(n_a + p_o - p_a,\; n_b + p_o - p_b)$, an expression that is linearized through the following:
(29)
(30)
Delays with input $i$ and output $o$ are conditioned through
$$n^q_o = n_i \qquad (31)$$
Regarding forks, the outputs do not require conditioning, but the fork input must comply with the following:
(32)
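The conditioning of a gain with input $i$ and output $o$ is expressed by constraint
$$n^q_o = n_i + n_c + p_o - p_i - p_c \qquad (33)$$
where $p_c$ is the scaling of the coefficient associated with the gain and $n_c$ is the coefficient word-length.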
Finally, the following equation:
$$n_j \le n^q_j \qquad (34)$$
indicates that signals must be truncated to a word-length smaller than or equal to their pre-quantization word-length.
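The propagation rules can be sketched directly from the $(n, p)$ convention: a signal's least significant bit sits at position $p - n$, and the pre-quantization word-length is the distance from the output's binary-point position $p_o$ down to the lowest LSB that the exact result can produce. The code implements the addition expression stated above, an unchanged word-length for delays, and, as our reading of the gain constraint, the full-precision product $n_i + n_c + p_o - p_i - p_c$. Function names are hypothetical.

```python
def nq_add(na, pa, nb, pb, po):
    """Addition: exact sum of (na, pa) and (nb, pb) with output scaling po."""
    return max(na + po - pa, nb + po - pb)


def nq_gain(ni, pi, nc, pc, po):
    """Gain: full-precision product of the (ni, pi) input and (nc, pc) coefficient."""
    return ni + nc + po - pi - pc


def nq_delay(ni):
    """Unit delay: the word-length passes through unchanged."""
    return ni


def truncation_ok(n, nq):
    """A signal can only be truncated, never extended."""
    return n <= nq


print(nq_add(7, 1, 5, 0, 2))    # 8: lowest LSB at 2**-6, binary point at position 2
print(nq_gain(7, 1, 3, 0, 1))   # 10
print(truncation_ok(9, nq_add(7, 1, 5, 0, 2)))  # False
```

Each assignment of $n_j$ then only has to respect the last check, and the gap $n^q_j - n_j$ is what drives the injected noise.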
F. Bounds on Word-Length of Variables
Bounds on word-lengths are estimated using an adaptation of the procedure presented in [5]: 1) use a heuristic algorithm to allocate word-lengths and calculate the area $A_G$ due to gains; 2) assign to each gain input the word-length that makes its area as large as $A_G$; 3) set all gain inputs to the maximum word-length of all gain inputs; and 4) condition the graph.
IV. RESULTS
An MILP solver [11] was used to find the optimal solutions for a set of FIR and IIR filters. The filter coefficients were obtained using the fdatool tool from MATLAB 6.5 [12]. The FIR filters were implemented using the transposed direct-form symmetric FIR structure. The FIR test set comprises a second-order filter with 8-bit inputs and 4-bit coefficients, a third-order filter with 8-bit inputs and 8-bit coefficients, and a fourth-order filter with 4-bit inputs and 4-bit coefficients.
The IIR filter was a second-order filter with 4-bit inputs, a gain, and 4-bit coefficients, implemented using the transposed direct form II. The filters were tested under different latencies, and for each latency two solutions were computed: one for the sequential approach, in which the error-constrained problem was solved first and its solution was fed to the latency-constrained problem, and another for the combined approach.
The comparison results are given in Table I as the percentage of area reduction achieved by the combined approach with respect to the sequential approach. The number of lookup tables (LUTs) required by each approach is also provided (sequential/combined). The area savings range from 0% to 13.16% and are due to an optimal exploration of the dependencies among word-lengths, resources, and error variance. Empty cells indicate that the MILP solver did not find a solution in practical time (less than 12 hours).
TABLE I
AREA REDUCTION (%) OBTAINED BY THE COMBINED APPROACH
As an example, Table II shows the word-lengths, including the sign bit, assigned to the gains, multipliers, adders, and registers of one of the FIR filters under different error/latency conditions (see Table I, third row). The first row gives the area saving and states the error/latency condition. The remaining rows show the word-lengths, with two values given whenever the sequential results differ from the combined results. In the first case, the area of the multiplier is reduced while the area of the adders increases slightly. In the second case, the area of the registers and adders is reduced thanks to an increase in the word-lengths of the gains. Finally, the third case shows that an increase in the word-lengths of the gains enables a reduction in the area of the adders.
TABLE II
DETAILED WORD-LENGTH DISTRIBUTION FOR FIR
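Assuming the percentages in Table I are measured relative to the sequential result, as the text suggests, each entry follows from the two LUT counts as sketched below. The LUT counts in the example are hypothetical and not taken from Table I.

```python
def area_reduction(seq_luts: int, comb_luts: int) -> float:
    """Percentage of area saved by the combined approach w.r.t. the sequential one."""
    return 100.0 * (seq_luts - comb_luts) / seq_luts


# Hypothetical LUT counts, not taken from Table I
print(area_reduction(200, 180))  # 10.0
```

A 0% entry simply means that both approaches reached the same LUT count for that error/latency condition.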
The execution times needed to solve the MILP problems range from several seconds to several hours (for the IIR filter).
V. CONCLUSION
In this brief, we have presented a novel MILP formulation for the combined error- and latency-constrained area-minimization problem, applicable to linear time-invariant DSP algorithms. This optimal model can be used to assess the quality of future heuristic methods that address the problem. The formulation can easily be extended to more complex resource binding [8], [13] that supports multiple-latency and pipelined resources, operation chaining, etc. The approach can also be applied to ASIC implementations.
Results show the advantage of combining these well-known design tasks: area savings of up to 13% are reported.
REFERENCES
[1] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “The multiple wordlength paradigm,” in Proc. IEEE Symp. Field-Programmable Custom Computing Machines, Rohnert Park, CA, 2001, pp. 51–60.
[2] R. Cmar, L. Rijnders, P. Schaumont, S. Vernalde, and I. Bolsens, “A methodology and design environment for DSP ASIC fixed point refinement,” in Proc. Design Automation Test Eur., Munich, Germany, 1999, pp. 271–276.
[3] C. Carreras, J. A. Lopez, and O. Nieto-Taladriz, “Bit-width selection for data-path implementations,” in Proc. Int. Symp. System Synthesis, San Jose, CA, 1999, pp. 114–119.
[4] K.-I. Kum and W. Sung, “Combined word-length optimization and high-level synthesis of digital signal processing systems,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 20, no. 8, pp. 921–930, Aug. 2001.
[5] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “Wordlength optimization for linear digital signal processing,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 22, pp. 1432–1442, Oct. 2003.
[6] G. Caffarena, A. Fernandez, C. Carreras, and O. Nieto-Taladriz, “Fixed-point refinement of OFDM-based adaptive equalizers: A heuristic approach,” in Proc. Eur. Signal Processing Conf., Vienna, Austria, 2004, pp. 1353–1356.
[7] J.-I. Choi, H.-S. Jun, and S.-Y. Hwang, “Efficient hardware optimization algorithm for fixed point digital signal processing ASIC design,” Electron. Lett., vol. 32, pp. 992–994, 1996.
[8] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “Optimal datapath allocation for multiple-wordlength systems,” Electron. Lett., vol. 36, pp. 1508–1509, 2000.
[9] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “Truncation noise in fixed-point SFG’s,” Electron. Lett., vol. 35, no. 23, pp. 2012–2014, 1999.
[10] R. S. Garfinkel and G. L. Nemhauser, Integer Programming. New York: Wiley, 1972.
[11] MOSEK ApS (2004). [Online]. Available: http://www.mosek.com
[12] Using FDATool with the Filter Design Toolbox. Natick, MA: The MathWorks Inc., 2004.
[13] B. Landwehr, P. Marwedel, and R. Dömer, “OSCAR: Optimum simultaneous scheduling, allocation and resource binding based on integer programming,” in Proc. Design Automation Test Eur., Grenoble, France, 1994, pp. 90–95.
