POFGEN: a design automation system for VLSI digital filters with invariant transfer function by Wacey, G & Bull, D R
Wacey, G., & Bull, D. R. (1993). POFGEN: a design automation system for VLSI digital filters with invariant transfer function. 631-634.
DOI: 10.1109/ISCAS.1993.393800
POFGEN: A Design Automation System for VLSI Digital Filters 
with Invariant Transfer Function 
G. Wacey and D.R. Bull 
Dept. Electrical and Electronic Engineering, University of Bristol, 
Queens Building, University Walk, Bristol BS8 1TR, UK
Abstract - This paper describes the structure, methodology and potential of a new design automation tool, POFGEN, for the generation of fixed-function VLSI digital filters. The system accepts input data in the form of a coefficient vector and uses this to form a pipelined, multiplier-free architecture by employing primitive operator graph synthesis methods. The output from POFGEN is available either as a structure diagram or as an HDL file for direct ASIC generation.
I. INTRODUCTION 
As a consequence of cost reductions in both computer hardware and circuit fabrication, coupled with a demand for shorter design cycles, the requirement for tools which automate the IC design process has increased. This is especially noticeable in the area of real-time signal processing, where complex algorithms requiring high computational rates must be efficiently integrated. A number of packages have been reported for the design and compilation of application-specific DSP functions. These include FIRST [1], Cathedral [2] and FIRGEN [3]. FIRST and Cathedral I are restricted to bit-serial architectures with fixed cell libraries. More recently, increasing integration levels coupled with the quest for higher-bandwidth processing have promoted the development of bit-parallel architectural synthesis tools (Cathedral II-IV and FIRGEN). These systems perform varying degrees of optimisation at the arithmetic level, with some facilitating the design of a broad range of DSP functions.
This paper introduces a new design automation system (POFGEN) which incorporates specification and architectural synthesis tools tailored to the implementation of FIR digital filters, in both bit-serial and bit-parallel formats, for fixed-function applications. The approach adopted embodies the primitive operator filter (POF) design methodology [4], in which the multiply-accumulate array is replaced by a structure based on a single directed graph employing only primitive operations (addition, subtraction and power-of-two multiplication). The graph is formed by encouraging the reuse of internal vertices while preserving the specified transfer function with no loss of coefficient accuracy. It offers a significant reduction in arithmetic complexity, typically, for larger filters, replacing each multiplier by a single adder or subtractor.
0-7803-1254-6/93 $03.00 © 1993 IEEE
II. SYSTEM OVERVIEW
Figure 1 illustrates the structure of POFGEN. Filter coefficients may be input either interactively or in the form of files generated by standard filter design packages. Multiple files representing a filter bank can also be accepted; these are combined internally to form a single file containing all unique values [5].
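As an illustration of the merging step, the following Python sketch forms the union of several coefficient vectors so that each distinct value need only be synthesised once. The function `merge_filter_bank` and the two-filter bank are hypothetical, and POFGEN's file format and any sign or scaling handling are not described here.

```python
from typing import Iterable, List


def merge_filter_bank(banks: Iterable[Iterable[int]]) -> List[int]:
    """Combine the coefficient vectors of a filter bank into one sorted
    list of unique values, each of which is then synthesised only once."""
    unique = set()
    for coeffs in banks:
        unique.update(coeffs)
    return sorted(unique)


# Hypothetical two-filter bank (values chosen for illustration only).
bank = [[1, 4, 9, 15], [9, 15, 27, 38]]
print(merge_filter_bank(bank))  # [1, 4, 9, 15, 27, 38]
```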
The transposed-form (one-to-many mapping) primitive operator graph is generated initially, using one of four classes of graph synthesis algorithm, characterised according to the primitive operators permitted. This is then used to form a logical graph, in which precedence relationships between vertices are assigned. This allows an initial register assignment to be performed which ensures correct synchronisation of signals at each graph vertex. Such an assignment is, however, generally inefficient, and a process of graph reduction must be performed. At any time after logical graph formation the graph may be transposed to yield the direct-form structure. Either graph type may be optimised for bit-serial or bit-parallel arithmetic.

[Fig. 1. POFGEN structure (block diagram; surviving labels: input / define coefficient, model, diagram)]

Authorized licensed use limited to: UNIVERSITY OF BRISTOL. Downloaded on February 10, 2009 at 09:48 from IEEE Xplore.  Restrictions apply.
Once a graph has been generated, internal data-path wordlengths can be assigned, the design simulated and a measure of circuit implementation complexity generated. The effects of signal rounding within the graph body may be assessed at this stage and simulated to give signal-to-quantisation-noise ratios. The final pipelined architecture may be viewed in tabular or structural form and, for output purposes, the delay elements and the accumulate and folding additions (the latter used in symmetrical linear-phase filters) may be appended to the graph data structure to form a complete filter. Finally, the filter may be output in the form of an HDL file or a structure diagram.
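The following Python sketch indicates how a signal-to-quantisation-noise ratio might be estimated by simulation. It is an illustration under stated assumptions rather than POFGEN's procedure: rounding is applied to each tap product of a direct FIR loop (not to the internal graph vertices), the helper names are hypothetical, and the input is random 8-bit data.

```python
import math
import random
from typing import List, Sequence


def round_lsbs(value: int, q: int) -> int:
    """Round an integer to the nearest multiple of 2**q (halves round up)."""
    if q == 0:
        return value
    return ((value + (1 << (q - 1))) >> q) << q


def fir(h: Sequence[int], x: Sequence[int], q: int = 0) -> List[int]:
    """Direct FIR evaluation; every tap product h[i] * x[n - i] is rounded
    to a multiple of 2**q before accumulation (q = 0 means exact)."""
    y = []
    for n in range(len(x)):
        acc = 0
        for i, hi in enumerate(h):
            if n - i >= 0:
                acc += round_lsbs(hi * x[n - i], q)
        y.append(acc)
    return y


def sqnr_db(reference: Sequence[int], test: Sequence[int]) -> float:
    """Signal-to-quantisation-noise ratio of `test` against `reference`, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


random.seed(0)
x = [random.randint(-128, 127) for _ in range(1000)]  # 8-bit test input
h = [1, 4, 9, 15, 27, 38]                              # example coefficients
exact = fir(h, x)
for q in (0, 2, 4):
    print(f"q={q}: SQNR = {sqnr_db(exact, fir(h, x, q)):.1f} dB")
```

Dropping more least significant bits (larger `q`) lowers the ratio, which is the trade-off a wordlength assignment has to balance.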
III. DIRECTED-GRAPH SYNTHESIS METHODS
A. The Primitive Operator Graph
The relationship between an input signal, x[n], an output signal, y[n], the signals at internal graph vertices, w[n], and graph output vertices, u[n], for the transposed-form filter structure is given by the following equations:

$$
w[n] = g_1^{T}\,w[n] + h^{T}x[n], \qquad
u[n] = g_2^{T}\,w[n], \qquad
y[n] = \sum_{i=0}^{N} u_i[n-i] \tag{1}
$$

where $g_1$ and $g_2$ are edge gain matrices, $h$ is a row vector which maps x[n] onto $w_1[n]$, and N is the filter order.
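Equation (1) can be checked numerically. In the minimal Python sketch below (function names hypothetical), the tap outputs u_i[n] = h_i x[n] are formed for each sample, as the primitive operator graph would form them, and a delay-and-accumulate chain then realises y[n]; the result is compared against direct convolution.

```python
from typing import List, Sequence


def transposed_fir(h: Sequence[int], x: Sequence[int]) -> List[int]:
    """Transposed-form FIR following equation (1): all tap outputs
    u_i[n] = h_i * x[n] are formed at once, then a delay/accumulate chain
    produces y[n] = sum_i u_i[n - i]."""
    order = len(h) - 1                 # N, the filter order
    state = [0] * order                # state[k] = partial sum held in delay k
    y = []
    for xn in x:
        u = [hi * xn for hi in h]      # outputs of the (multiplier-free) graph
        y.append(u[0] + (state[0] if order else 0))
        # Clock the chain: delay k receives u[k+1] plus the old contents of delay k+1.
        for k in range(order):
            state[k] = u[k + 1] + (state[k + 1] if k + 1 < order else 0)
    return y


def direct_fir(h: Sequence[int], x: Sequence[int]) -> List[int]:
    """Reference convolution y[n] = sum_i h_i * x[n - i]."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n >= i) for n in range(len(x))]


h = [1, 4, 9, 15, 27, 38]
x = [3, -1, 0, 7, 2, 0, 0, -5]
assert transposed_fir(h, x) == direct_fir(h, x)
print(transposed_fir(h, [1, 0, 0, 0, 0, 0]))  # impulse response: [1, 4, 9, 15, 27, 38]
```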
The specified coefficient vector is used to synthesise a transposed-form graph, employing one of four available algorithms (addition only, addition/subtraction, addition/shift and addition/subtraction/shift). The most efficient algorithms for the majority of applications are the so-called shift-biased algorithms [4]. These attempt to minimise the number of adders/subtractors in the graph by allowing graph edges to assume gains equal to any non-negative power of two.
Prior to algorithm execution the coefficient file is modified by dividing each element by the largest power of two, $2^l$, which maintains an integer result. The initial graph is then synthesised using these new values, and the remaining output edge gains are compensated in order to restore the original multiplier value. This pre-shifting generates a graph with fewer vertices, decreasing the internal communications overhead and reducing the average vertex data-path width.
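A short Python sketch of the pre-shifting step follows. It assumes that each coefficient is divided by its own largest power-of-two factor, an interpretation consistent with the output gains $2^2$ (on h[1] = 4) and $2^1$ (on h[5] = 38) in the worked example that follows; the function name `pre_shift` is hypothetical.

```python
from typing import List, Tuple


def pre_shift(coeffs: List[int]) -> List[Tuple[int, int]]:
    """Split each coefficient c into (c / 2**l, l), where 2**l is the largest
    power of two that leaves an integer. Zero is returned as (0, 0)."""
    result = []
    for c in coeffs:
        if c == 0:
            result.append((0, 0))
            continue
        l = (abs(c) & -abs(c)).bit_length() - 1   # count of trailing zero bits
        result.append((c >> l, l))                # exact division by 2**l
    return result


h = [1, 4, 9, 15, 27, 38]
split = pre_shift(h)
print(split)    # [(1, 0), (1, 2), (9, 0), (15, 0), (27, 0), (19, 1)]
targets = sorted({v for v, _ in split if v != 0})
print(targets)  # [1, 9, 15, 19, 27]: the values the graph must synthesise
# The output edge gains 2**l restore the original coefficients.
assert [v << l for v, l in split] == h
```

Here 4 collapses onto the input vertex itself, so no adder is spent on it.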
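Consider an example filter with scaled integer coefficients {1, 4, 9, 15, 27, 38}. The gain matrices after graph formation are as follows:

$$
g_1 = \begin{bmatrix}
0 & 1+2^3 & 1+2^1 & 0 & 0 & 2^4 \\
0 & 0 & 0 & 1 & 1+2^1 & 0 \\
0 & 0 & 0 & 2^1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},
\qquad
g_2 = \begin{bmatrix}
1 & 2^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 2^1
\end{bmatrix}
$$

Row i, column j of $g_1$ holds the gain from vertex i to vertex j, a sum such as $1+2^3$ denoting two parallel edges. With vertex 1 carrying the input, the six vertices take the values 1, 9, 3, 15, 27 and 19, which $g_2$ maps onto the six coefficients.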
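The matrices can be checked by evaluating equation (1) with a unit input. The Python sketch below (helper names hypothetical) assumes that row i, column j holds the gain from vertex i to vertex j, that vertices are numbered in topological order, and that h = [1, 0, ..., 0], so that vertex 1 carries x[n]. Entry (3, 4) of $g_1$ is read as $2^1$, the value needed for vertex 4 to equal 15.

```python
from typing import List

# Edge gain matrices of the example (row = source vertex, column = destination).
G1 = [
    [0, 1 + 2**3, 1 + 2**1, 0, 0, 2**4],
    [0, 0, 0, 1, 1 + 2**1, 0],
    [0, 0, 0, 2**1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
G2 = [
    [1, 2**2, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 2**1],
]


def vertex_values(g1: List[List[int]], x: int = 1) -> List[int]:
    """Solve w = g1^T w + h^T x for an acyclic, topologically numbered graph."""
    n = len(g1)
    w = [0] * n
    w[0] = x                               # vertex 1 is the filter input
    for k in range(1, n):
        w[k] = sum(g1[j][k] * w[j] for j in range(k))
    return w


def outputs(g2: List[List[int]], w: List[int]) -> List[int]:
    """u = g2^T w: each output is a weighted sum of vertex values."""
    return [sum(g2[j][i] * w[j] for j in range(len(w))) for i in range(len(g2[0]))]


w = vertex_values(G1)
print(w)               # [1, 9, 3, 15, 27, 19]
print(outputs(G2, w))  # [1, 4, 9, 15, 27, 38]
```

Each of vertices 2 to 6 is a single two-input adder, so five adders realise the six coefficients.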
B. The Logical Graph 
The edge gain matrices $g_1$ and $g_2$ are used to form a logical graph which provides a vehicle for the timing and pipelining of the filter. Each vertex in the graph is assigned a precedence value, where the precedence level of vertex k, fed from vertices i and j, is given by:

$$ \Lambda_k = \max(\Lambda_i, \Lambda_j) + 1 \tag{2} $$

Here $\Lambda_i$ and $\Lambda_j$ are the precedence levels of vertices i and j respectively. The number of precedence levels present in a graph is thus $\Lambda = \max_i(\Lambda_i)$.
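A minimal sketch of equation (2) in Python, applied to the example $g_1$ (only the zero/non-zero pattern matters here). Placing the input vertex at level 0 and numbering vertices in topological order are assumptions; `precedence_levels` is a hypothetical name.

```python
from typing import List

# g_1 of the example (row = source vertex, column = destination vertex).
G1 = [
    [0, 1 + 2**3, 1 + 2**1, 0, 0, 2**4],
    [0, 0, 0, 1, 1 + 2**1, 0],
    [0, 0, 0, 2**1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def precedence_levels(g1: List[List[int]]) -> List[int]:
    """Equation (2): level of a vertex = max level of its predecessors + 1.
    A vertex with no predecessors (the input) is placed at level 0."""
    n = len(g1)
    level = [0] * n
    for k in range(n):                     # topological order assumed
        preds = [j for j in range(n) if g1[j][k] != 0]
        if preds:
            level[k] = max(level[j] for j in preds) + 1
    return level


levels = precedence_levels(G1)
print(levels)       # [0, 1, 1, 2, 2, 2]
print(max(levels))  # 2 precedence levels
```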
The precedence graph is fully pipelined by placing a padding register on every edge, at every precedence level. The delay, $d_{ij}$, required on edge ij for correct data alignment and the corresponding number of padding registers, $r_{ij}$, can be computed from the edge gain matrices and are given by equations (3) and (4) respectively.

$$ d_{ij} = \Lambda_j - \Lambda_i \tag{3} $$
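The Python sketch below applies equation (3) to the edges of the example graph. The padding-register count r_ij = d_ij - 1 stands in for equation (4) and is an assumption (it supposes that every adder already has its own output register); under it, only the edge from vertex 1 to vertex 6 needs padding, which appears consistent with the single register driving `a9` in the model-code listing of Section V. Names are hypothetical.

```python
from typing import Dict, List, Tuple

# Edges (source, destination) of the example graph, 1-based vertex numbers, and
# the precedence levels given by equation (2) with the input at level 0.
EDGES: List[Tuple[int, int]] = [(1, 2), (1, 3), (1, 6), (2, 4), (2, 5), (3, 4), (3, 6)]
LEVEL: Dict[int, int] = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}


def edge_delays(edges: List[Tuple[int, int]],
                level: Dict[int, int]) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Return {(i, j): (d_ij, r_ij)}: d_ij from equation (3) and an ASSUMED
    padding count r_ij = d_ij - 1 (each adder taken to own an output register)."""
    return {(i, j): (level[j] - level[i], level[j] - level[i] - 1) for i, j in edges}


delays = edge_delays(EDGES, LEVEL)
print(delays[(1, 6)])                      # (2, 1)
print(sum(r for _, r in delays.values()))  # 1
```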
C. Direct Form Transposition
Thus far only the transposed-form structure has been considered. POFGEN, however, allows a graph to be transposed to yield the direct-form (many-to-one mapping) structure. This too can be pipelined and register-reduced as described below.
Graph transposition is achieved by reversing all edge directions, exchanging branch vertices and adder vertices as appropriate, and interchanging the outputs with the input. Direct transposition generally gives rise to a structure with an excessive number of precedence levels. This is because vertices in the initial graph are frequently reused (i.e. have a high out-degree), resulting in a long sequence of adders in the direct form. By identifying the associated edges and combining them using a tree structure, a graph with fewer precedence levels results. If, during tree formation, the edges with the largest gains are combined first, shift elimination, as described below, can be applied more effectively.
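To illustrate why tree combination helps, the Python sketch below (names hypothetical) reverses the out-going edges of vertex 1 in the example graph, reading the gains $1+2^3$ and $1+2^1$ as two parallel edges each and including the edges to outputs $u_0$ and $u_1$. It then compares the number of two-input adders in series when the resulting seven inputs are summed as a chain or as a balanced tree. The gain ordering used during tree formation is not modelled.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str, int]  # (source, destination, gain)


def transpose(edges: List[Edge]) -> List[Edge]:
    """Reverse every edge: branch points become summing points and vice versa."""
    return [(dst, src, gain) for src, dst, gain in edges]


def adders_in_series(fan_in: int, tree: bool) -> int:
    """Two-input adders on the longest path when summing fan_in signals."""
    if fan_in <= 1:
        return 0
    return (fan_in - 1).bit_length() if tree else fan_in - 1  # ceil(log2) vs chain


# Out-going edges of vertex 1 in the example graph.
edges = [("v1", "v2", 1), ("v1", "v2", 8), ("v1", "v3", 1), ("v1", "v3", 2),
         ("v1", "v6", 16), ("v1", "u0", 1), ("v1", "u1", 4)]
fan_in = Counter(dst for _, dst, _ in transpose(edges))["v1"]
print(fan_in, adders_in_series(fan_in, tree=False), adders_in_series(fan_in, tree=True))  # 7 6 3
```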
IV. REDUCTION TECHNIQUES
A. Topological Techniques
The fully pipelined graph formed as described above is generally suboptimal. It is therefore often desirable to reduce circuit complexity by eliminating any redundant padding registers. Initially, duplicate paths and their associated padding registers are removed, as shown in figure 2. These occur where a vertex has a number of parallel edges emerging from it, each having associated padding registers. This overhead may be eliminated if a single shared path is used, with delay values modified as follows (m and n being the points at which the shared path from vertex i branches to vertices j and k respectively):

$$ r_{mj} = r_{nk} = 0; \quad r_{mn} = r_{ik} - r_{ij}; \quad r_{im} = r_{ij} \tag{5} $$

Therefore $\Lambda_m = \Lambda_j$ and $\Lambda_n = \Lambda_k$.
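The same idea extends to any number of parallel edges: sorting the destinations by padding count gives the tap positions on one shared register chain. The Python sketch below (function name hypothetical) returns the registers added between successive taps. The example assumes vertex 1 feeds vertex 6 through one padding register and the h[0] output through two, as the Fig. 5 listing appears to show (`a9` followed by the `h0` register).

```python
from typing import Dict, List, Tuple


def share_paths(padding: Dict[str, int]) -> Tuple[List[Tuple[str, int]], int, int]:
    """Replace separate padding chains on parallel edges leaving one vertex by a
    single tapped chain. Returns (segments, registers_before, registers_after),
    where each segment is (destination, registers added since the previous tap)."""
    segments: List[Tuple[str, int]] = []
    previous = 0
    for dest, r in sorted(padding.items(), key=lambda item: item[1]):
        segments.append((dest, r - previous))
        previous = r
    return segments, sum(padding.values()), previous


# Vertex 1 feeding vertex 6 (1 padding register) and output h[0] (2 registers).
print(share_paths({"v6": 1, "h0": 2}))  # ([('v6', 1), ('h0', 1)], 3, 2)
```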
A second elimination technique involves translating addition/subtraction vertices to higher precedence levels, as demonstrated in figure 3, where $r_{jq} = 0$ is assumed; the resulting register values and precedence levels are given by equations (6) and (7).
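The exact register equations for vertex translation are not reproduced here. As a hedged illustration only, the Python sketch below applies the generic retiming rule (a standard technique, not necessarily the authors' formulation): moving a vertex m levels later removes m registers from each input edge and adds m to each output edge, which is only possible when every input edge carries at least m padding registers. Names are hypothetical.

```python
from typing import List, Tuple


def translate_vertex(r_in: List[int], r_out: List[int], m: int) -> Tuple[List[int], List[int]]:
    """Move a vertex m precedence levels later by shifting m padding
    registers from every input edge onto every output edge."""
    if m > min(r_in):
        raise ValueError("an input edge has fewer than m padding registers")
    return [r - m for r in r_in], [r + m for r in r_out]


# Two-input adder with 2 and 1 padding registers on its inputs, one output edge.
before_in, before_out = [2, 1], [0]
after_in, after_out = translate_vertex(before_in, before_out, 1)
saved = sum(before_in + before_out) - sum(after_in + after_out)
print(after_in, after_out, saved)  # [1, 0] [1] 1
```

For a two-input vertex with a single output edge, each level of translation saves one register.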
A final reduction technique, illustrated in figure 4 and characterised by equation (8), involves the identification of all adder/subtractor vertices with shifts on both inputs (i.e. for vertex k, both $g_{ik} > 1$ and $g_{jk} > 1$). The gain common to both inputs can be factorised and repositioned at the vertex output. This reduces the data-path width of the vertex in a parallel system, or reduces the number of shift-register bits required in a serial realisation. With $s_{ik} = \log_2 g_{ik}$ and $s_{jk} = \log_2 g_{jk}$ denoting the input shifts,

$$ s'_{ik} = s_{ik} - \min(s_{ik}, s_{jk}); \qquad s'_{jk} = s_{jk} - \min(s_{ik}, s_{jk}) \tag{8} $$
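A minimal Python sketch of the shift-elimination rule (names hypothetical), taking the shift common to both inputs as the new output shift of vertex k, as the text describes, and checking that the vertex output is unchanged:

```python
from typing import Tuple


def eliminate_common_shift(s_ik: int, s_jk: int) -> Tuple[int, int, int]:
    """Strip the shift common to both inputs of vertex k.
    Returns (s'_ik, s'_jk, output_shift); the common shift moves to the output."""
    common = min(s_ik, s_jk)
    return s_ik - common, s_jk - common, common


def vertex_output(a: int, b: int, s_a: int, s_b: int, s_out: int = 0) -> int:
    """Adder with input shifts s_a, s_b and output shift s_out."""
    return ((a << s_a) + (b << s_b)) << s_out


s1, s2, s_out = eliminate_common_shift(3, 1)
print(s1, s2, s_out)  # 2 0 1
# The vertex computes the same value, but its adder now sees narrower operands.
assert vertex_output(5, 3, 3, 1) == vertex_output(5, 3, s1, s2, s_out)
```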
It should be noted that, for bit-serial implementations, shifts are realised as one-bit registers and, as such, can double as padding registers, provided that correct vertex timing and edge gains are maintained.
B. Timing Analysis Based Techniques
The processing delay caused by each processing element is a function of the wordlength at the associated graph vertex, the delay characteristics of the cell components used and the capacitive loading due to tracking. Ignoring the effects of the latter, the delay, $d_a$, for a single adder comprising k+1 4-bit look-ahead-carry adder blocks is given by equation (9), where $t_1$, $t_2$, $t_3$ and $t_4$ represent the worst-case delays for input-to-output, carry-in-to-output, carry-in-to-carry-out and input-to-carry-out transitions respectively. The total delay, $d_t$, for a path through $\Lambda$ consecutive adders is given by equation (10). Assuming $t_1 > t_2 > t_4 > t_3$ and $t_1 + t_3 < t_2 + t_4$, then

$$
d_t = \begin{cases}
\Lambda t_2 + \Lambda t_4 + (K - \Lambda)\,t_3, & K \ge \Lambda \\
(\Lambda - K)\,t_1 + K t_2 + K t_4, & K \le \Lambda
\end{cases} \tag{10}
$$

where K is the largest value of k associated with any of the $\Lambda$ adders in the chain. Using the above equations and incorporating additional delays due to capacitive loading, each path through the graph can be optimally load balanced and any redundant pipeline registers removed.

[Fig. 2. Duplicate path elimination]

[Fig. 4. Shift elimination]
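Equation (10) can be written as a Python function, together with a helper that finds how many cascaded adders fit within a clock period. Both are sketches: the timing values are hypothetical (arbitrary units chosen to satisfy $t_1 > t_2 > t_4 > t_3$ and $t_1 + t_3 < t_2 + t_4$), capacitive loading is ignored, and `adders_per_stage` is not a POFGEN routine.

```python
def chain_delay(n_adders: int, K: int, t1: float, t2: float, t3: float, t4: float) -> float:
    """Equation (10): worst-case delay through n_adders (Lambda) cascaded adders,
    K being the largest block index k of any adder in the chain."""
    if K >= n_adders:
        return n_adders * (t2 + t4) + (K - n_adders) * t3
    return (n_adders - K) * t1 + K * (t2 + t4)


def adders_per_stage(K: int, clock: float, t1: float, t2: float, t3: float, t4: float) -> int:
    """Largest number of cascaded adders whose delay fits in one clock period.
    Assumes positive delays with t2 + t4 > t3, so the delay grows with n."""
    n = 0
    while chain_delay(n + 1, K, t1, t2, t3, t4) <= clock:
        n += 1
    return n


# Hypothetical delays in arbitrary units: t1 > t2 > t4 > t3 and t1 + t3 < t2 + t4.
T1, T2, T3, T4 = 20, 15, 4, 10
print(chain_delay(2, 3, T1, T2, T3, T4))        # 54
print(chain_delay(4, 2, T1, T2, T3, T4))        # 90
print(adders_per_stage(3, 80, T1, T2, T3, T4))  # 3
```

The two branches agree at K = Λ, so the delay is continuous as the chain lengthens; a pipeline register is then only needed where the accumulated delay would exceed the clock period.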
V. OUTPUT OPTIONS
A. Structural Diagram
The fully reduced graph can be displayed in the form of a table or a structural diagram. An example of the latter, for a fully pipelined version of the filter specified in Section III-A, is given in figure 5.
B. Hardware Description Language
The structure can be converted either to vendor-independent VHDL [6] or to a vendor-specific HDL at the gate level (currently only ES2 'model' code is supported). A series of POF parameterised parts [7] has been generated, and these are used to form the HDL text file. The HDL file can take one of two forms, depending on whether a fixed or variable input wordlength is specified. The latter produces a parameterised HDL file, allowing flexibility during simulation and allowing ASICs with differing input wordlengths to be fabricated from the same design. The internal data wordlength of the filter can be assigned a maximum, critical or user-defined limit. The latter is set by the designer to any desired value. The maximum is determined by allowing the wordlength to grow by one bit (excluding shifts) for each adder/subtractor present in the graph. The critical case, which depends on the coefficient distribution, is given by equation (11), where $B_{in}$ is the input wordlength and ceil(.) returns the least integer greater than or equal to its argument.
An example model code file corresponding to the structure in figure 5 is given below. This is based on the assumptions of critical-case wordlength growth and 8-bit input data.
Include "parallellib.inc"
Part test [clk,xn(0:7),r] -> h0(0:7),h2(0:11),h3(0:11),h4(0:13),
h5(0:12)
Signal a1(0:7),a3(0:11),a4(0:9),a9(0:7)
vlpr (8) [clk,xn(0:7),r] -> a1(0:7)
pspadder (8,8,12,0,3,1) [clk,a1(0:7),a1(0:7),r] -> a3(0:11)
pspadder (8,8,10,0,1,1) [clk,a1(0:7),a1(0:7),r] -> a4(0:9)
pspadder (12,10,12,0,1,1) [clk,a3(0:11),a4(0:9),r] -> h3(0:11)
pspadder (12,12,14,0,1,1) [clk,a3(0:11),a3(0:11),r] -> h4(0:13)
pspadder (8,10,13,4,0,1) [clk,a9(0:7),a4(0:9),r] -> h5(0:12)
vlpr (8) [clk,a1(0:7),r] -> a9(0:7)
vlpr (8) [clk,a9(0:7),r] -> h0(0:7)
vlpr (12) [clk,a3(0:11),r] -> h2(0:11)
End
End Of File
[Fig. 5. Example pipelined POF structure. Legend: pipeline register; Sx, shift by x bits (an S2 shift appears in the diagram). Outputs: h[0]=1, h[1]=4, h[2]=9, h[3]=15, h[4]=27, h[5]=38.]
VI. CONCLUSIONS 
This paper has outlined the methodology underlying the POFGEN design automation package, together with its potential for applications requiring high-throughput, fixed-function FIR filters. The system, as described, is now fully operational and has been used successfully in the production of a filter bank for a 64-channel sub-band coder for video data compression. This system has now been fabricated on a single gate array. Work is continuing on the development of a full VHDL interface and on the incorporation of enhanced optimisation techniques.
ACKNOWLEDGEMENT 
The authors would like to thank Sony Broadcast and Communications, the SERC and Dave Horrocks (UWCC) for their support and assistance.
REFERENCES 
[1] Murray, A.F. and Denyer, P.B., 'A CMOS Design Strategy for Bit-Serial Signal Processing', IEEE Journal of Solid-State Circuits, Vol. SC-20, No. 3, June 1985, pp. 746-753.
[2] De Man, H. et al., 'Architecture Driven Synthesis Techniques for VLSI Implementation of DSP Algorithms', Proc. IEEE, Vol. 78, No. 2, Feb. 1990, pp. 319-335.
[3] Jain, R., Yang, P.T. and Yoshino, T., 'FIRGEN: A Computer-Aided Design System for High Performance FIR Filter Integrated Circuits', IEEE Trans. on Signal Processing, Vol. 39, No. 7, July 1991, pp. 1655-1668.
[4] Bull, D.R. and Horrocks, D.H., 'Primitive Operator Digital Filters', IEE Proc. Part G, June 1990, pp. 401-412.
[5] Bull, D.R., Wacey, G., Stone, J.J. and Soloff, J.M., 'A Compound Primitive Operator Approach to the Realisation of Video Sub-Band Filter Banks', Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Minneapolis, USA, April 1993.
[6] 'IEEE Standard VHDL Language Reference Manual', IEEE Std 1076, Second Printing, April 1989.
[7] Wacey, G., 'A VLSI Implementation of Primitive Operator Digital Filters', MSc Dissertation, Univ. of Wales Col. of Cardiff, 1990.
