A compound primitive operator approach to the realisation of video sub-band filter banks by Bull, DR et al.
                          Bull, D. R., Wacey, G., Stone, J. J., & Solof, J. M. (1993). A compound
primitive operator approach to the realisation of video sub-band filter banks.
In Unknown. (Vol. 1, pp. 405 - 408). Institute of Electrical and Electronics
Engineers (IEEE). 10.1109/ICASSP.1993.319141
Peer reviewed version
Link to published version (if available):
10.1109/ICASSP.1993.319141
A COMPOUND PRIMITIVE OPERATOR APPROACH TO THE REALISATION OF VIDEO
SUB-BAND FILTER BANKS
David R. Bull*, Graham Wacey*, John J. Stone+ and Jon M. Soloff+
*Dept. Electrical and Electronic Eng., University of Bristol, Bristol BS8 1TR, UK
+Sony Broadcast and Communications, Priestley Road, Basingstoke, Hants. RG24 9.P, UK
ABSTRACT
This paper presents a new video sub-band filtering architecture
appropriate for VLSI implementation. The system employs a
reduced complexity multiply-accumulate structure realised
using an extension to the primitive operator graph synthesis
technique. This, in conjunction with a data multiplexing
regime which implements a two channel QMF on a single FIR
structure, has facilitated the fabrication of a sixty four
sub-band coder / decoder on a single gate array. The circuitry is
reconfigurable, allowing vertical and horizontal filtering in
analysis or synthesis mode. The paper describes the
conceptual development of the new approach and presents
novel architectural features associated with its
implementation. Also included are complexity comparisons
with conventional approaches.
1. INTRODUCTION
Image decorrelation using sub-band filtering methods has been
the subject of increased interest in recent years [1,2,3,4].
Since the image artifacts produced can be controlled by the
choice of filters and architecture, a performance superior to that
obtainable with the DCT is possible.
Figure 1: One dimensional sub-band structure for n=3 (analysis stages 1, 2, 3 followed by synthesis stages 3, 2, 1)
Sub-band filtering relies on the Quadrature Mirror Filter
(QMF) structure [1]. In its simplest form, the two channel
QMF can be used to separate an input signal into low and
high-pass sub-bands such that the overall data rate remains
constant after decimation. Furthermore, in the absence of
quantisation, it is possible to perfectly reconstruct the original
input signal from the low and high pass sub-bands [1,3]. The
two channel QMF can be applied in subsequent stages to yield
multiple sub-bands. For example, a flat decomposition yields
2^n sub-bands, where n is the number of stages, and this is
shown in figure 1 for n = 3. For images, the QMF can be
applied to the rows and then to the columns of the image
yielding multiple sub-bands in the two dimensional spatial
frequency plane.
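The two channel split, the constant data rate and the perfect reconstruction property can be illustrated with a minimal sketch. It assumes the two-tap Haar filter pair, chosen only because it is the shortest pair with exact reconstruction; it is not the filter set used in this paper, and all function names are illustrative.

```python
import math

def analyse(x):
    """Split an even-length sequence into decimated low and high bands (Haar pair)."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    high = [s * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return low, high

def synthesise(low, high):
    """Rebuild the full-rate sequence from the two half-rate bands."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for lo, hi in zip(low, high):
        x.append(s * (lo + hi))
        x.append(s * (lo - hi))
    return x

def flat_decompose(x, stages):
    """Apply the two channel split to every band at each stage (2**stages bands)."""
    bands = [x]
    for _ in range(stages):
        bands = [half for band in bands for half in analyse(band)]
    return bands
```

With 16 input samples and three stages, `flat_decompose` returns eight bands of two samples each, so the total number of samples is unchanged.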
The filter architecture presented here derives its efficiency
from two independent techniques. The first utilises a data
multiplexed QMF (DMQMF) architecture to efficiently
implement the two-channel QMF as a single finite impulse
response (FIR) structure which allows constant data rate
processing throughout all filtering stages. The second
technique focuses on the multiplier array as a major influence
on both implementation complexity and power consumption.
Available techniques for complexity reduction range from the
elimination of redundant multiplier rows through canonical
coefficient recoding [5], to distributed arithmetic [6] and
residue number system based methods. An alternative
approach, which eliminates the requirement for explicit
multiplication operations by realising the coefficient vector -
data vector inner product in the form of an optimised directed
graph, is the primitive operator filter (POF) technique [7].
POF has, in the past, only been applied to situations requiring
a single invariant filter response. However, an extension to
the method is developed here which facilitates its use in
reconfigurable filter bank designs.
2. THE DATA MULTIPLEXED QUADRATURE
MIRROR FILTER
The data multiplexed quadrature mirror filter (DMQMF)
architecture is based on the standard FIR structure as shown in
figure 2. The output of an odd-symmetrical N-tap filter, y[n],
is given by equation 1, where M=(N-1)/2 and D is the amount of
inter-tap delay in multiples of the original sampling period,
1/fs.
0-7803-0946-4/93 $3.00 © 1993 IEEE
Authorized licensed use limited to: UNIVERSITY OF BRISTOL. Downloaded on February 9, 2009 at 10:31 from IEEE Xplore.  Restrictions apply.
Figure 2: Data multiplexed QMF structure (input signal x[n+MD]; mode select)
$$ y[n] = x[n]\,c[0] + \sum_{m=1}^{M} \bigl( x[n+mD] + x[n-mD] \bigr)\, c[m] \qquad (1) $$
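Equation 1 exploits coefficient symmetry: the two samples that share a coefficient are added first, so only M+1 multiplications are needed per output. The sketch below compares this folded form with a direct (2M+1)-tap evaluation. Samples outside the input sequence are taken as zero, which is an assumption made here for brevity, and the function names are illustrative.

```python
def sample(x, i):
    """Return x[i], treating samples outside the sequence as zero (an assumption)."""
    return x[i] if 0 <= i < len(x) else 0

def folded_fir(x, c, D=1):
    """Equation 1: c holds c[0..M]; D is the inter-tap delay in samples."""
    M = len(c) - 1
    y = []
    for n in range(len(x)):
        acc = sample(x, n) * c[0]
        for m in range(1, M + 1):
            # Folding adder first, then one multiplication per coefficient pair.
            acc += (sample(x, n + m * D) + sample(x, n - m * D)) * c[m]
        y.append(acc)
    return y

def direct_fir(x, c, D=1):
    """Reference: full symmetric filter with 2M+1 multiplications per output."""
    M = len(c) - 1
    return [sum(c[abs(m)] * sample(x, n - m * D) for m in range(-M, M + 1))
            for n in range(len(x))]
```

Both functions return identical sequences for any D, which is what allows the same hardware to be reused at later stages by changing only the inter-tap delay.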
Since each stage of the analysis QMF filtering process is
followed by a factor of two decimation, half of the samples in
the output sequence are redundant. If the decimation
operations are performed out of phase then both analysis
half-rate filters can be combined to form a single full-rate filter
with low-pass / high-pass coefficient multiplexing. Similarly,
the synthesis half-rate filters can also be combined to form a
single full-rate filter, this time with the synthesis low-pass and
high-pass coefficients interleaved. A more efficient hardware
realisation of the filter bank can thus be devised which
generates only non-redundant output samples.
For the first stage of analysis in the DMQMF, D=1. A selector
(S) is used to switch the coefficients c[m] between low-pass
and high-pass vectors in such a way that the filtered samples,
y[n], alternate between low and high-pass values. Thus, when
generating output sample n, the filter coefficients c[m] are
given by:
$$ c[m] = \begin{cases} l[m], & 0 \le m \le (N_l - 1)/2, \quad \text{for } n \text{ even} \\ h[m], & 0 \le m \le (N_h - 1)/2, \quad \text{for } n \text{ odd} \end{cases} $$
where l[m] and h[m] are the low-pass and high-pass filter
coefficients, respectively of length N_l and N_h. The output
sequence y[n] thus consists of a data multiplexed sequence of
low and high pass samples represented by (L,H,L,H,L,H,L,H).
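The first-stage behaviour can be checked with a short sketch: a single full-rate filter whose coefficient set is switched on alternate outputs produces exactly the samples that two separate filters would deliver after out-of-phase decimation. The coefficient values used in any call are hypothetical, zero padding at the sequence edges is an assumption, and the function names are illustrative.

```python
def sample(x, i):
    """Return x[i], treating samples outside the sequence as zero (an assumption)."""
    return x[i] if 0 <= i < len(x) else 0

def symmetric_output(x, c, n, D=1):
    """One output of a symmetric FIR; c holds the folded coefficients c[0..M]."""
    acc = sample(x, n) * c[0]
    for m in range(1, len(c)):
        acc += (sample(x, n + m * D) + sample(x, n - m * D)) * c[m]
    return acc

def dmqmf_stage1(x, l, h):
    """Single full-rate filter: low-pass set for n even, high-pass set for n odd."""
    return [symmetric_output(x, l if n % 2 == 0 else h, n) for n in range(len(x))]

def separate_filters(x, l, h):
    """Reference: two full-rate filters, then decimation by two, out of phase."""
    low_full = [symmetric_output(x, l, n) for n in range(len(x))]
    high_full = [symmetric_output(x, h, n) for n in range(len(x))]
    return low_full[0::2], high_full[1::2]
```

The even-indexed outputs of `dmqmf_stage1` equal the decimated low-pass stream and the odd-indexed outputs equal the decimated high-pass stream, so no redundant sample is ever computed.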
In the second stage of DMQMF, D=2. Each delay block thus
represents two delay units and the coefficient selector S
switches at fs/4. The low-pass filter is therefore applied to the
first two samples and the high-pass filter is then applied to the
next two samples such that y[n] now takes the form:
(LL,HL,LH,HH,LL,HL,LH,HH). In a similar way D=4 for the
third stage resulting in an output sequence of
(LLL,HLL,LHL,HHL,LLH,HLH,LHH,HHH).
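The band ordering at each stage follows directly from the selector timing. The sketch below tracks only the band label of each output sample, assuming D doubles at every stage and that the selector holds each coefficient set for D consecutive outputs; the label for the newest stage is appended on the right, matching the sequences quoted above. The function name is illustrative.

```python
def stage_labels(stages, length=8):
    """Band label of each output sample after the given number of DMQMF stages."""
    labels = [""] * length
    for s in range(stages):
        D = 2 ** s  # inter-tap delay: 1, 2, 4, ...
        # The selector holds each coefficient set for D consecutive outputs.
        labels = [labels[n] + ("L" if (n // D) % 2 == 0 else "H")
                  for n in range(length)]
    return labels
```

Because the inter-tap delay equals D, every tap of the filter at output n sees samples whose indices differ from n by a multiple of D, so only samples belonging to the same band of the previous stage are combined.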
The function of the synthesis process is to add together the
interpolated low-pass and high-pass filtered signals into a
single data stream. This is done using the same FIR structure
as shown in figure 2 with the coefficient selector S selecting
between two sets of filter coefficients for alternate output
samples. However, in this case, one of the sets comprises odd
low-pass coefficients and even high-pass coefficients and the
other set comprises odd high-pass coefficients and even
low-pass coefficients. By arranging the coefficients in this manner,
the addition of the low-pass and high-pass interpolated signals
can be implemented within the filter itself. The coefficients
are switched in such a way that low-pass input samples are
always multiplied by low-pass coefficients and high-pass input
samples are always multiplied by high-pass coefficients.
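The coefficient interleaving used for synthesis can be sketched as follows. The example assumes a causal tap index k = 0..N-1, synthesis filters of equal length, a multiplexed input with low-pass samples at even positions and high-pass samples at odd positions, and zero padding at the edges; none of these details is specified above, and the names and any coefficient values are illustrative.

```python
def sample(u, i):
    """Return u[i], treating samples outside the sequence as zero (an assumption)."""
    return u[i] if 0 <= i < len(u) else 0

def synthesis_single_fir(u, gl, gh):
    """u is the multiplexed stream (low samples at even n, high at odd n)."""
    y = []
    for n in range(len(u)):
        acc = 0
        for k in range(len(gl)):
            # Input u[n - k] is a low-pass sample when n - k is even.
            coeff = gl[k] if (n - k) % 2 == 0 else gh[k]
            acc += coeff * sample(u, n - k)
        y.append(acc)
    return y

def synthesis_two_filters(u, gl, gh):
    """Reference: zero-stuffed low and high streams filtered separately, then added."""
    low = [v if n % 2 == 0 else 0 for n, v in enumerate(u)]
    high = [v if n % 2 == 1 else 0 for n, v in enumerate(u)]
    return [sum(gl[k] * sample(low, n - k) + gh[k] * sample(high, n - k)
                for k in range(len(gl)))
            for n in range(len(u))]
```

For even n the selected set is (gl[0], gh[1], gl[2], gh[3], ...) and for odd n it is (gh[0], gl[1], gh[2], gl[3], ...), which are the two interleaved coefficient sets described in the text.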
For the sub-band filtering of some images, it is preferable to
use different filters for the vertical and horizontal dimensions.
Therefore, four filtering modes can be defined: horizontal
analysis (HA), vertical analysis (VA), vertical synthesis (VS)
and horizontal synthesis (HS), these being selected using the
multiplexers in figure 2. Each mode comprises two filters (HP
and LP) giving a total of eight coefficient sets. Table 1 shows
a typical example of eight coefficient sets represented to 15 bit
precision and derived using a method described in [8].
Implementation of successive banks of DMQMF can be
achieved through multiple concatenation of the circuit shown
in figure 2, using an appropriate value of D at each stage.
3. A COMPOUND PRIMITIVE OPERATOR
MULTIPLIER-ACCUMULATOR REALISATION
3.1 Compound Primitive Operator Graph Synthesis
The POF approach [7] exploits redundancy present in the
coefficient vector-data vector inner product computation to
yield an optimised multiplier-free replacement for the
conventional multiplier-accumulator array. In its basic form,
the technique could be employed to generate eight independent
primitive operator graphs, one for each sub-band filter type
required. This however yields little or no saving over general
purpose multipliers with switchable coefficients.
Table 1 - Filter Coefficient Vectors
Figure 3: Pipelined primitive operator graph (legend: pipeline register; redundant register; bit parallel adder; sX = shift by x bits)
The POF technique can be adapted to synthesise a single graph
embodying coefficient vertices for all eight filters. Just as the
conventional primitive operator graph exploits vertex reuse
within a single filter, so the adapted method extends this to
allow reuse across multiple filters. In the case of the DMQMF,
depending on the QMF coefficients selected, a high degree of
commonality can exist between vectors (see [8] and table 1).
This characteristic is exploited by the new approach.
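The benefit of sharing vertices across filters can be illustrated with a crude proxy. The sketch below is not the POF synthesis algorithm of [7]: it simply counts the distinct odd multiples of the input that a set of integer coefficients requires, on the assumption that sign changes and powers of two cost only a negation or a shift. The coefficient sets are hypothetical and are not the values of table 1.

```python
def odd_part(c):
    """Strip sign and powers of two, which cost only a negation or a shift."""
    c = abs(c)
    while c and c % 2 == 0:
        c //= 2
    return c

def vertices_needed(filters):
    """Distinct non-trivial odd multiples of the input required by a set of filters."""
    return {odd_part(c) for coeffs in filters for c in coeffs} - {0, 1}

# Hypothetical coefficient sets, not the values of table 1.
filters = [[12, 5, -3], [24, 10, 3], [5, -6, 7]]

# One graph per filter: vertices cannot be shared between graphs.
separate = sum(len(vertices_needed([f])) for f in filters)

# One compound graph: a vertex is built once and reused by every filter.
compound = len(vertices_needed(filters))
```

For these example sets the separate graphs need seven such vertices in total while the compound graph needs three, which is the kind of commonality the compound approach relies on.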
3.2 Implementation details
The set of unique coefficients from table 1 was employed to
synthesise a compound primitive operator graph using the
POFGEN design package [9]. The resulting graph when fully
pipelined at the word level is shown in figure 3. The
complexity of this structure can however be further reduced
when wordlength and timing rules are considered. These
facilitate reductions in internal data path widths and the
number of pipeline stages respectively.
With knowledge of the input data wordlength together with
individual coefficient values, the maximum internal signal
wordlength can be determined. An upper bound on the output
wordlength, B_out, can be computed using equation 2, where
B_in is the input wordlength and ceil(.) returns the least integer
greater than or equal to its argument.
With an initial wordlength of 13-bits (after the folding addition
operation), an upper limit of 28-bits, dominated by the HP.HS
filter, results. The internal data path width of the graph has
thus been allowed to increase to this value prior to MSB
truncation.
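A wordlength bound of this kind can be sketched as below. The form used, B_out = B_in + ceil(log2(sum of |c[m]|)), is the common worst-case gain bound and is an assumption here; it may differ in detail from equation 2. The coefficient values are hypothetical integers chosen only to exercise the function.

```python
import math

def output_wordlength_bound(b_in, coeffs):
    """Assumed bound: B_out = B_in + ceil(log2(sum of |c|))."""
    gain = sum(abs(c) for c in coeffs)
    return b_in + math.ceil(math.log2(gain))

# Hypothetical 15-bit integer coefficients applied to 13-bit folded samples.
example = output_wordlength_bound(13, [16384, 8000, -5000, 3000])
```

The bound grows with the sum of coefficient magnitudes, so the filter with the largest such sum sets the internal data path width for the whole graph.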
The processing delay caused by each processing element is a
function of the wordlength at the associated graph vertex, the
delay characteristics of the cell components used and the
capacitative loading due to tracking. Ignoring the effects of the
latter, the delay, d_a, for a single adder comprising k+1 4-bit
look-ahead-carry adder blocks is given by equation 3,
$$ d_a = t_1 + t_2 + (k-1)\,t_3 \qquad (3) $$
where:
t_1 := worst case delay for an input to output transition.
t_2 := worst case delay for a carry in to output transition.
t_3 := worst case delay for a carry in to carry out transition.
t_4 := worst case delay for an input to carry out transition.
The total delay, d_t, for a path through A consecutive adders is
given by equation 4. Assuming t_1 > t_2 > t_4 > t_3 and
t_1 + t_3 < t_2 + t_4 then,
$$ d_t = \begin{cases} A\,t_2 + A\,t_4 + (K-A)\,t_3, & K \ge A \\ (A-K)\,t_1 + K\,t_2 + K\,t_4, & K \le A \end{cases} \qquad (4) $$
where K is the largest value of k associated with any of the A
adders in the chain. Using the above equations and
incorporating additional delays due to capacitative loading,
each path through the graph can be optimally load balanced
and any redundant pipeline registers removed. The result of
this exercise has been to reduce the POF structure from eight
to only three pipeline stages as indicated in figure 3.
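Equation 4 can be evaluated directly when deciding where pipeline registers are needed. The sketch below encodes the two cases as read here; the timing values used in any call are hypothetical numbers chosen to satisfy the stated ordering assumptions, not data for a particular cell library.

```python
def chain_delay(A, K, t1, t2, t3, t4):
    """Equation 4: delay through A consecutive adders, K as defined in the text."""
    if K >= A:
        # Enough blocks for the worst path to climb one block per adder.
        return A * t2 + A * t4 + (K - A) * t3
    # More adders than blocks: the remaining adders are crossed input to output.
    return (A - K) * t1 + K * t2 + K * t4

# Hypothetical timings (arbitrary units) with t1 > t2 > t4 > t3 and t1 + t3 < t2 + t4.
T1, T2, T3, T4 = 50, 40, 10, 30
```

The two cases agree when K = A, and comparing `chain_delay` for each path against the clock period indicates which registers between adders are redundant.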
The POF structure requires a control unit to configure data
paths according to filtering task. Table 2 gives example
connections required to route folding adder output signals
x_m[n] (equation 5), for each filter, to the correct POF weighted
path.
$$ x_m[n] = x[n + mD] + x[n - mD] \qquad (5) $$
It can be observed that each graph input vertex need only be
switched between two possible sources: x_m[n] (for fixed m) or
signal ground. Each switch can thus be realised with minimal
overhead and controlled by signals derived from a three bit
filter identifier code.
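The two-source switch at each graph input vertex can be sketched as follows. The set of filter codes that enable a vertex is hypothetical, since it depends on the connection table, and zero padding at the sequence edges is an assumption; the function names are illustrative.

```python
def folding_adder(x, n, m, D):
    """Equation 5: x_m[n] = x[n + mD] + x[n - mD], zero outside the sequence."""
    def get(i):
        return x[i] if 0 <= i < len(x) else 0
    return get(n + m * D) + get(n - m * D)

def vertex_input(x, n, D, m, enabled_filters, filter_code):
    """A vertex with fixed m sees either x_m[n] or signal ground (zero)."""
    if filter_code in enabled_filters:
        return folding_adder(x, n, m, D)
    return 0
```

A three bit `filter_code` distinguishes the eight filters, and each vertex needs only a single enable signal decoded from it.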
TABLE 2: Signal connections to selected graph input
vertices for all filters.
The overall sub-band filtering ASIC architecture is indicated
in figure 4.
4. IMPLEMENTATION EFFICIENCY
Area comparisons between a compiled cell multiplier and the
POF approach are given below, based on the following
assumptions (BC = basic cell, i.e. one p-n transistor pair):
(I) 48 BCs per 4-bits of addition
    20 BCs per 4-bits of register
    2 BCs per bit of data switch
    1 BC per invertor
    4 BCs per exclusive-OR gate
    (2m+8)*(4n+11) BCs for a compiled cell multiplier
    with m bit multiplicand and n bit multiplier
(II) Only multiples of 4-bit adder blocks are used
(III) An input wordlength of 12-bits is used
Figure 4: Three stage filter architecture (DATA IN to DATA OUT)
It can be seen from table 3 that the compiled cell multiplier
results in an additional 24861 basic cells when compared with
the compound POF solution, representing an overall increase
in basic cell count of approximately 68%. In practice however,
silicon area will be utilised less efficiently by the POF
approach due to routing complexity. Typical area utilisation
adjustment factors are 0.8 for compiled cell layouts and 0.7 for
normally routed (POF) layouts. Taking these factors into
account, comparative complexity values can be derived which
result in the compiled cell approach having a complexity 47%
higher than that for the POF.
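The quoted percentages can be reproduced from the three-stage totals (36483 basic cells for the compound POF and 61344 for the compiled cell approach). The sketch assumes the utilisation adjustment is applied by dividing each cell count by its utilisation factor; the variable names are illustrative.

```python
pof_cells = 36483        # compound POF, total for 3 stages
compiled_cells = 61344   # compiled cell multiplier, total for 3 stages

extra = compiled_cells - pof_cells
raw_increase = 100.0 * extra / pof_cells

# Dividing by the utilisation factor converts cell count to occupied area.
pof_area = pof_cells / 0.7
compiled_area = compiled_cells / 0.8
adjusted_increase = 100.0 * (compiled_area / pof_area - 1.0)
```

The raw increase comes to about 68% and the utilisation-adjusted increase to about 47%, in line with the figures given in the text.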
Table 3
COMPOUND POF            BC COUNT    COMPILED                BC COUNT
Data multiplexing       494         ... and multiplexing    ...
Multiplier/adder tree   9924        Multipliers             14266
Control logic           32          Adder tree              3100
Total for one stage     12161       Total for one stage     20448
Total for 3 stages      36483       Total for 3 stages      61344
5. CONCLUSIONS
This paper has presented a new approach to the
implementation of video sub-band filter banks. The system
provides a reduced complexity solution through the use of a
data multiplexing regime in conjunction with a primitive
operator realisation of the filter multiplier accumulator. These
techniques together have facilitated the fabrication of a system,
on a single gate array, capable of operation in both analysis
and synthesis modes for image decomposition into 64
sub-bands. The system described has been combined with
quantisation, entropy encoding, field and line stores and rate
control hardware to provide a complete, 2 bits per pixel
'perfect reconstruction' compression / decompression system
for NTSC composite video signals [4].
ACKNOWLEDGEMENTS
The filter coefficients as shown in Table 1 are published with
the kind permission of J. H. Wilkinson (SBC, U.K.)
REFERENCES
[1] Vaidyanathan, P.P., "Quadrature mirror filter banks,
M-band extensions and perfect reconstruction techniques",
IEEE ASSP Magazine, Vol. 4, pp 4-22, 1987.
[2] Woods, J. W. and O'Neil, S. D., "Subband Coding of
Images", IEEE Trans. ASSP, Vol. ASSP-34, No. 5, pp 1278-1288,
1986.
[3] Le Gall, D., and Tabatabai, A., "Sub-band Coding of
Digital Images Using Symmetric Short Kernel Filters and
Arithmetic Coding Techniques", Proc. ICASSP-88, Glasgow,
UK, pp 761-764, 1988.
[4] Hurley, T.R. and Stone, J.J., "Sub-band Coding of
Composite Video For Data Compression In A Solid State
Recorder", Proc IEE 4th International Conference on Image
Processing and its Applications, Maastricht, Netherlands, pp
465-469, 1992.
[5] Peled, A., "On the Hardware Implementation of Digital
Signal Processors", IEEE Trans. ASSP, Vol. ASSP-24, No. 1,
pp 76-86, 1976.
[6] Peled, A. and Liu, B., "A New Hardware Realisation of
Digital Filters", IEEE Trans. ASSP, Vol. ASSP-22, No. 6,
pp 456-462, 1974.
[7] Bull, D. R. and Horrocks, D. H., "Primitive Operator
Digital Filters", IEE Proc. Part G, Vol. 138, No. 3, pp 401-412,
1991.
[8] Wilkinson, J. W., "Wavelet Transform in a Digital
Video Tape Recorder", IEE Colloquium on Applications of
Wavelet Transforms in Image Processing, London, UK, 1993.
[9] Wacey, G. and Bull, D. R., "Architectural Synthesis of
Digital Filters for ASIC Implementation", Proc IEE Saraga
Colloquium on Digital and Analogue Filters, London, UK,
pp 11/1-11/6, 1991.
