VLSI implementation of a massively parallel wavelet based zerotree coder for the intelligent pixel array by Alagoda, Geoffrey N.
Edith Cowan University 
Research Online 
Theses: Doctorates and Masters Theses 
1-1-2001 
VLSI implementation of a massively parallel wavelet based 
zerotree coder for the intelligent pixel array 
Geoffrey N. Alagoda 
Edith Cowan University 
Recommended Citation 
Alagoda, G. N. (2001). VLSI implementation of a massively parallel wavelet based zerotree coder for the 
intelligent pixel array. https://ro.ecu.edu.au/theses/1078 
This Thesis is posted at Research Online. 
https://ro.ecu.edu.au/theses/1078 
Edith Cowan University 
  
Copyright Warning 
  
 
  
You may print or download ONE copy of this document for the purpose 
of your own research or study. 
 
The University does not authorize you to copy, communicate or 
otherwise make available electronically to any other person any 
copyright material contained on this site. 
 
You are reminded of the following: 
 
 Copyright owners are entitled to take legal action against persons 
who infringe their copyright. 
 
 A reproduction of material that is protected by copyright may be a 
copyright infringement. Where the reproduction of such material is 
done without attribution of authorship, with false attribution of 
authorship or the authorship is treated in a derogatory manner, 
this may be a breach of the author’s moral rights contained in Part 
IX of the Copyright Act 1968 (Cth). 
 
 Courts have the power to impose a wide range of civil and criminal 
sanctions for infringement of copyright, infringement of moral 
rights and other offences under the Copyright Act 1968 (Cth). 
Higher penalties may apply, and higher damages may be awarded, 
for offences and infringements involving the conversion of material 
into digital or electronic form.
VLSI Implementation of a Massively Parallel 
Wavelet Based Zerotree Coder for the 
Intelligent Pixel Array 
By 
GEOFFREY NISHANTHA ALAGODA
A Thesis Submitted in Partial Fulfilment of the Requirements for the 
Degree of Doctor of Philosophy 
at the 
School of Engineering & Mathematics
Edith Cowan University 
Principal Supervisor: Professor Kamran Eshraghian 
Associate Supervisor: Dr. Alexander Rassau 
Nov 2001 
To my fiancée Vanisha,
my father Gamini,
my mother Deanna,
and my brother Dale
Contents
Use of Thesis .......... IX
Declaration .......... X
Acknowledgements .......... XI
Abstract .......... XII
Publications .......... XIV
List of Figures .......... XV
List of Tables .......... XIX
1 Mobile Multimedia Communications Systems .......... 1
1.1. Introduction .......... 1
1.2. Motivation of Thesis .......... 4
1.2.1. Mobile Multimedia Communications Industry .......... 4
1.2.2. Novel Research Opportunity .......... 6
1.3. Scope of Thesis .......... 7
1.3.1. Scope of Chapter 2 .......... 7
1.3.2.
1.3.3.
1.3.4.
1.3.5.
1.3.6. Scope of Chapter 7 .......... 9
2 Image / Video Coding Techniques .......... 10
2.1. Introduction .......... 10
2.2. Image / Video Coding Primitives .......... 13
2.2.1.
2.2.2.
2.2.3. Image / Video Resolutions .......... 15
2.2.4. Test Images & Sequences .......... 16
2.3. Motion Analysis (Stage A) .......... 18
2.3.1. Motion Estimation .......... 18
2.3.2. Motion Compensation .......... 20
2.3.3. Motion Prediction .......... 20
2.4. Discrete Cosine Transform (Stage B) .......... 20
2.4.1. Matrix Based 2D DCT Computation .......... 25
2.5. Discrete Wavelet Transform (Stage B) .......... 26
2.5.1. Filter Based Wavelet Decomposition .......... 28
2.5.2. Triangular Binary Wavelet .......... 32
2.5.3. Pyramidal vs. Nucleic Wavelet Blocks .......... 35
2.6. Quantisation (Stage C) .......... 36
2.6.1. DCT Quantisation .......... 39
2.6.2. Subband Quantisation .......... 40
2.7. Reconstructed Image Quality .......... 40
2.7.1. Peak Signal to Noise Ratio (PSNR) .......... 41
2.7.2. Visual Quality .......... 41
2.8. Entropy Coding (Stage D) .......... 43
2.8.1. ZigZag & Run Length Coding .......... 43
3 ZeroTree Coding .......... 49
3.1. Introduction .......... 49
3.2. The EZW Algorithm .......... 50
3.2.1. Significance of Coefficients .......... 52
3.2.2. Positional Coding of Coefficients .......... 54
3.2.3. Successive Approximation Quantisation (SAQ) .......... 58
3.2.4. Significance Reordering .......... 60
3.2.5. EZW Encoding / Decoding Process .......... 61
3.3. ZTE Algorithm .......... 67
3.3.1. Subband Quantisation .......... 66
3.3.2. Positional Information Coding .......... 69
3.3.3.
3.3.4.
3.5. Performance Results .......... 80
3.6. Conclusions .......... 81
4 Intelligent Pixel Paradigm .......... 82
4.1. Introduction .......... 82
4.2. Conventional Video Coding Systems .......... 83
4.3. The IPA Video Coding System .......... 85
4.4. Intelligent Pixel Structure .......... 87
4.4.1. IP Capture Component .......... 89
4.4.2. IP Processing Component .......... 90
4.4.3. IP Display Component .......... 91
4.5. IPA Interconnects .......... 93
4.6. Array Mapping of Video Codec .......... 95
4.6.1. Motion Compensation Codec .......... 97
4.6.2. Scale Control Architecture .......... 99
4.6.3. High Pass / Low Pass Pixel Identification .......... 102
4.6.4. Triangular Wavelet Transform Codec .......... 103
4.6.5.
4.6.6.
5 Massively Parallel Zerotree Codec for the IPA .......... 115
5.1. Introduction .......... 115
5.2. Parallel Significance Tree Propagation .......... 116
5.2.1. Nucleic Blocks and Scale Selection .......... 116
5.2.2. Tree Interconnections and Propagation .......... 119
5.2.3. Pixel-wise Tree Architecture .......... 122
5.3. Embedded Zerotree Wavelet Codec .......... 123
5.3.1. Coefficient Reorganisation .......... 123
5.3.2. Significance Identification Architecture .......... 124
5.3.3. Pixel Self-Classification and Enabling .......... 127
5.3.4. Pixel Latching & Bypass Architecture .......... 130
5.3.5. Significance Tree Signal Generation .......... 133
5.3.6. Data Extraction/Load Architecture .......... 134
5.3.7. Array Edge Buffer Design .......... 138
5.3.8. The Encode Control Sequence .......... 142
5.3.9. The Decode Control Sequence
5.4. Zerotree Entropy Coder
5.4.1. Coefficient Quantisation
5.4.2. Significance Identification Architecture .......... 146
5.4.3. Pixel Self-Classification Architecture .......... 147
5.4.4. Pixel Enable Architecture .......... 149
5.4.5. Symbol Latching & Bypass Architecture .......... 150
5.4.6. Significance Tree Generation .......... 152
5.4.7. Data Extraction/Load Architecture .......... 152
5.4.8. Array Edge Buffer Design .......... 153
5.4.9. The Encode Control Sequence .......... 154
5.4.10. The Decode Control Sequence .......... 154
5.5.
Hardware and Control Complexity .......... 160
5.6. Final Codec Selected For Implementation .......... 160
5.7. Conclusion .......... 162
6 The ZTE Codec Realised In VLSI .......... 163
6.1. Introduction .......... 163
6.2. IP100P Prototype Configuration .......... 164
6.3. The Technology .......... 164
6.4. IP100P Floor Plan .......... 166
6.4.1. IP100P Size & Layout Considerations .......... 166
6.4.2. Power Distribution & Metal Allocation .......... 167
6.4.3. Pixel Floor-Plan .......... 169
6.4.4. NBLOCK Arrangement .......... 170
6.4.5. Control Signal Routing & Buffer Placement .......... 171
6.5. IP100P Primitive Component Layouts & Simulations .......... 172
6.5.1. Basic Primitives .......... 172
6.5.2. Buffer Requirements .......... 179
6.6. ZTE Post Layout Simulations .......... 185
6.6.1. Coefficient Significance Detection .......... 187
6.6.2. Pixel Pass-through Mode .......... 188
6.6.3. Wavelet Coefficient Output .......... 189
6.6.4. DNT Symbol Generation .......... 190
6.6.5. ZTR Symbol Generation .......... 191
6.6.6. VZT Symbol Generation .......... 192
6.6.7. VAL Symbol Generation .......... 193
6.6.8. DNT Decode Operation .......... 194
6.6.9. ZTR Decode Operation .......... 195
6.6.10. VZT & VAL Decode Operation .......... 196
6.7. Pixel Components (Layout) .......... 198
6.8. NBLOCK Layout .......... 199
6.9. IP100P Full Array .......... 204
6.9.1. IP100P Array General Specifications .......... 206
6.9.2. IP100P Simulated Array Power Consumption .......... 206
6.9.3. IP100P Prototype Testing .......... 207
6.10. Conclusion .......... 207
7 Conclusions & Further Research .......... 208
7.1. Contributions of this Thesis .......... 208
7.2. Conclusion .......... 209
7.3. Future Research Opportunities .......... 210
7.3.1. Colour Video Processing .......... 210
7.3.2. Standards Development .......... 210
7.3.3. Increased Resolution .......... 210
Bibliography .......... 212
Appendix A .......... 220
Appendix B .......... 230
USE OF THESIS 
 
 
The Use of Thesis statement is not included in this version of the thesis. 
Declaration 
I certify that this thesis does not incorporate without acknowledgement any material
previously submitted for a degree or diploma in any institution of higher education; and to
the best of my knowledge and belief it does not contain any material previously published
or written by another person except where due reference is made in the text.
Signature. Date.
Acknowledgements 
I would like to express my gratitude to the following people at Edith Cowan University.
Prof. Kamran Eshraghian, my principal supervisor, for his invaluable guidance, support
and encouragement throughout this study. Kamran always inspires enthusiasm in those
that he works with. This thesis would have never been completed without the financial
support that he procured.
Dr. Alexander Rassau, my secondary supervisor, for his time, patience, encouragement
and effort correcting this thesis. Alex's supervision was vital in the timely completion of
the thesis through his insightful knowledge of the topic.
Dr. Stefan Lachowicz, who provided the initial supervision, for his suggestions, time and
encouragement.
Andrew Ehrhardt, David Lucas, Joe Austin-Crowe, Vishalakshi Ramakonar and
Edward Gluszak, for their input, comradeship, and encouragement.
I would also like to express my gratitude to the following institutions.
School of Engineering, Edith Cowan University for all the financial and equipment
support provided during the course of this study. Intelligent Pixels Inc. for financial and
software tools support, and the Australian Research Council (ARC), for providing the
numerous grants that supported this study and the research team.
Family support was without question one of the most vital factors influencing the
completion of the research. They motivated and encouraged me unfailingly, so thanks go
to my father Gamini, my mother Deanna and my brother Dale.
One person who was always there to pull me through the tough times, and always had faith
in my ability to achieve this goal, was my fiancée Vanisha Nair. My deepest gratitude goes
to her.
ABSTRACT 
In the span of a few years, mobile multimedia communication has rapidly become a
significant area of research and development constantly challenging boundaries on a variety
of technological fronts. Mobile video communications in particular encompasses a number
of technical hurdles that generally steer technological advancements towards devices that
are low in complexity, low in power usage yet perform the given task efficiently. Devices
of this nature have been made available through the use of massively parallel processing
arrays such as the Intelligent Pixel Processing Array. The Intelligent Pixel Processing Array
is a novel concept that integrates a parallel image capture mechanism, a parallel processing
component and a parallel display component into a single chip solution geared toward
mobile communications environments, be it a PDA based system or the video
communicator wristwatch portrayed in "Dick Tracy" episodes. This thesis details work
performed to provide an efficient, low power, low complexity solution surrounding the
massively parallel implementation of a zerotree entropy codec for the Intelligent Pixel
Array.
Bandwidth limitations that surround many of today's communications channels have
forced investigations into viable and efficient compression algorithms for application in
mobile video communications. However, the complexity presented by most modern hybrid
image / video compression algorithms typically reserves them for pure software based
implementations. Recent trends in the field indicate that a hybrid video compression
algorithm involving a discrete wavelet transform coupled together with a zerotree entropy
coder provides an effective compromise between compression efficiency and complexity.
However, most solutions still remain purely software based.
Zerotree coding is an efficient lossless entropy coding technique designed for image and
video compression. Its strength lies in its ability to generate single symbol representations
for large areas of transformed coefficients. However, the computational complexity the
zerotree coder presents generally reserves it to a purely software based solution typically
requiring hardware that is less efficient in power, such as DSPs. The low-power usage of
the Intelligent Pixel Processing Array therefore becomes an attractive alternative. The
solution to the problem of efficiently performing zerotree coding in a massively parallel
pixel based architecture is thus provided herein. A novel pixel-wise zerotree self-classification
technique is used to convert the sequential zerotree codec to its parallel
equivalent. Two zerotree coding algorithms were investigated, the Embedded Zerotree
Wavelet (EZW) and the Zerotree Entropy Coder (ZTE), with the latter being chosen for
the final design, due to its suitability for implementation within a parallel processing
environment.
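As a rough illustration of how a zerotree lets one symbol stand in for a whole region of coefficients, the sketch below tests whether a coefficient and all of its descendants are insignificant against a threshold. It assumes a fully decomposed square wavelet pyramid stored in the usual quadtree layout, where the coefficient at (r, c) has children at (2r, 2c), (2r, 2c+1), (2r+1, 2c) and (2r+1, 2c+1). The function names, the 4 x 4 coefficient values and the threshold are hypothetical and are not taken from the thesis, and this is a sequential reference sketch rather than the parallel per-pixel architecture developed here.

```python
def children(r, c, size):
    """Quadtree children of (r, c) in a fully decomposed size x size pyramid.

    Assumed layout: the single lowest-frequency coefficient sits at (0, 0),
    so its own position is dropped from the candidate list.
    """
    candidates = [(2 * r + dr, 2 * c + dc) for dr in (0, 1) for dc in (0, 1)]
    return [(i, j) for i, j in candidates
            if (i, j) != (r, c) and i < size and j < size]


def is_zerotree(coeffs, r, c, threshold):
    """True if (r, c) and every descendant are insignificant for threshold."""
    if abs(coeffs[r][c]) >= threshold:
        return False
    return all(is_zerotree(coeffs, i, j, threshold)
               for i, j in children(r, c, len(coeffs)))


# Hypothetical 4 x 4 coefficient array (two decomposition levels).
COEFFS = [
    [57, -3,  2,  1],
    [ 4, 20,  0, -1],
    [ 1,  2,  3,  0],
    [ 0, -2,  1, 18],
]
THRESHOLD = 16  # hypothetical significance threshold

if __name__ == "__main__":
    for pos in [(0, 1), (1, 0), (1, 1)]:
        print(pos, is_zerotree(COEFFS, *pos, THRESHOLD))
```

In this example the coefficients at (0, 1) and (1, 0) each head a tree of five insignificant values, so a coder could emit one symbol per tree instead of five, while (1, 1) is itself significant and must be coded explicitly.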
The Intelligent Pixel Array is composed of an N x M matrix of pixel based processing
elements. The aim of the work performed here is for the development of a QCIF sized
processing array containing (176 x 144) identical pixel elements. The successful
implementation of this concept within a maximum possible die size of around 15 mm x 15
mm forces the pixel pitch to revolve around the size of 80 µm x 80 µm. This area is then
subdivided into areas for the capture and ADC, forward and inverse motion compensation
processing, forward and inverse wavelet transform processing, forward and inverse
zerotree coding and image display. Therefore the zerotree solution provided by this thesis
overcomes two essential problems: 1. A pixel based parallel processing division of the
predominantly sequential zerotree coding algorithm and 2. The integration of the zerotree
codec into the shared pixel pitch of 80 x 80 µm, using 0.25 µm technology.
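The link between die size and pixel pitch can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (176 x 144 pixels, an 80 µm pitch and a die edge of about 15 mm); the helper names are hypothetical, and treating the die limit as exactly 15 000 µm is a simplifying assumption.

```python
QCIF_COLS, QCIF_ROWS = 176, 144   # array size stated in the abstract
PITCH_UM = 80                     # pixel pitch in micrometres
MAX_DIE_UM = 15_000               # assumed die edge limit (about 15 mm)


def array_extent_um(cols, rows, pitch_um):
    """Width and height of the pixel array in micrometres."""
    return cols * pitch_um, rows * pitch_um


def max_pitch_um(cols, rows, die_um):
    """Largest square pitch, in whole micrometres, that fits a square die."""
    return die_um // max(cols, rows)


width_um, height_um = array_extent_um(QCIF_COLS, QCIF_ROWS, PITCH_UM)

if __name__ == "__main__":
    print("array size (mm):", width_um / 1000, "x", height_um / 1000)
    print("largest pitch that fits (um):",
          max_pitch_um(QCIF_COLS, QCIF_ROWS, MAX_DIE_UM))
```

Under these assumptions the array occupies about 14.08 mm x 11.52 mm, and the largest whole-micrometre pitch that would still fit along the longer side is 85 µm, so an 80 µm pitch leaves a little under 1 mm of the die edge for everything outside the array.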
The prototype designed as a result of this work implements a 32 x 32 pixel array, consisting
of approximately 1 million transistors and consuming less than 35 mW of power at a low
operating frequency of 100 kHz, while still coding at 25 frames per second. The fully
scalable nature of this architecture allows for any arbitrary sized implementation of this
array, with minimal or no modification to each pixel element.
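The prototype figures imply a per-frame clock budget and a rough per-pixel transistor count, which the sketch below works out. It uses only the approximate values quoted above; the variable names are hypothetical, and the per-pixel figure is an upper estimate because the quoted total presumably also covers circuitry outside the pixels.

```python
PROTOTYPE_SIDE = 32        # 32 x 32 pixel prototype array
TRANSISTORS = 1_000_000    # approximate total quoted in the abstract
CLOCK_HZ = 100_000         # 100 kHz operating frequency
FRAME_RATE = 25            # frames per second

pixels = PROTOTYPE_SIDE ** 2                    # number of pixel elements
cycles_per_frame = CLOCK_HZ // FRAME_RATE       # clock cycles available per frame
transistors_per_pixel = TRANSISTORS / pixels    # rough upper estimate

if __name__ == "__main__":
    print("pixels:", pixels)
    print("clock cycles per frame:", cycles_per_frame)
    print("transistors per pixel (approx.):", round(transistors_per_pixel))
```

On these numbers each frame has 4000 clock cycles available regardless of array size, which is consistent with the claim that the per-pixel design scales to larger arrays with little or no modification.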
Following the first introductory chapter, this thesis proceeds onto a description of current
image / video compression methodologies. A full background to, and comparison of, the
two zerotree coding techniques is provided next. The subsequent chapter introduces the
concept of the massively parallel Intelligent Pixel Array and prior work performed on such
a device. A parallel hardware architecture is then presented for each zerotree coding
technique, and the more suitable technique is chosen. Next, physical layouts and hardware
simulations are presented for the chosen solution implemented in 0.25 µm CMOS
technology. Finally, conclusions and directions towards future work in this area are
provided.
Publications 
The following is a list of papers published during the course of this thesis development.
A. M. Rassau, G. Alagoda, D. Lucas, J. Austin-Crowe, K. Eshraghian (1999) - "Massively
Parallel Intelligent-Pixel Implementation of a Zerotree Entropy Video Codec for
Multimedia Communications", VLSI: Systems On A Chip, Kluwer Academic Publishers,
Portugal, 89-100.
A. M. Rassau, G. Alagoda, K. Eshraghian (1999) - "Massively Parallel Wavelet Based
Video Codec For An Intelligent-Pixel Mobile Multimedia Communicator", Fifth
International Symposium on Signal Processing and its Applications, Queensland, 793-795.
H. N. Cheung, G. Alagoda, K. Eshraghian and L. Ang (1999) - "Smart Pixel VLSI
Architecture For Embedded Zerotree Wavelet Coding", Fifth International Symposium on
Signal Processing and its Applications, Queensland, 693-695.
A. M. Rassau, R. Mavaddat, G. Alagoda and K. Eshraghian (1999) - "System Analysis of
An Intelligent Pixel Mobile Multimedia Communicator", Fifth International Symposium on
Signal Processing and its Applications, Queensland, 801-803.
G. Alagoda, A. M. Rassau and K. Eshraghian (2001) - "A Massively Parallel Per-Pixel
Based Zerotree Processing Architecture for Real-Time Video Compression", to be published
in SPIE's International Symposium on Microelectronics and Micro-Electro-Mechanical Systems,
Adelaide.
Figures 
Figure 1-1: 30 Multimedia Communications Concept ..................................................... 2 
Figure 1-2: Video Compression Components,, ................................................................. 3 
Figure 1-3: World-wide mobile phone users .................................................................... 5 
Figure 1-4: Task Allocation Structure ............................................................................... 6 
Figure 2-1: Natural Image (Jenny) .................................................................................. 11 
Figure 2-2: Generated Image ............................................................................. , ............ 11 
Figure 2-3: Image /Video Coding Stages ....................................................................... 13 
F'lgure 2-4: Luminance & Chrominance sample positioning .......................................... 15 
Figure 2-5: Lena Y & RGB Images ................................................................................ 17 
Figure 2-6: The 'Jenny' Sequence .................................................................................. 17 
Figure 2-7: The 'Salesman' Sequence .............................................................................. 18 
Figure 2-8: Motion Estimation ........................................................................................ 19 
Figure 2-9: Lena's Eye .................................................................................................... 21 
Figure 2-10: 16 x 16 & 2D DCT Transform of eye ........................................................ 21 
Figure 2-11: "Raw" Lena image @ 1 bit per pixel ......................................................... 22 
Figure 2-12: Transformed Lena image @ 0.69 bits per pixel ......................................... 22 
Figure 2-13: 'Lena' DCT (JPEG) @ 0.2 bits per pixel .................................................... 24 
Figure 2-14: Forward 2D DCT Algorithm ...................................................................... 25 
Figure 2-15: Inverse 2D DCT Algorithm ........................................................................ 25 
Figure 2-16: 'Lena' image in multi-resolution subbands ................................................. 27 
Figure 2-17: DWT of 'Lena' image @ 0.19 bpp .............................................................. 28 
Figure 2-18: DWT subband decomposition via filters .................................................... 29 
Figure 2-19: 2D DWT subband subdivision ................................................................... 30 
Figure 2-20: Subband filtering on an image ................................................................... 31 
Figure 2-21: Effect of precision on filter performance ................................................... 32 
Figure 2-22: 'Lena' image @ 0.2 bpp .............................................................................. 33 
Figure 2-23: High & low pass triangular filter coefficients ............................................ 33 
Figure 2-24: Forward triangular DWT algorithm ........................................................... 34 
Figure 2-25: Inverse triangular DWT algorithm ............................................................. 35 
Figure 2-26: Pyramidal vs. Nucleic coefficient arrangement ......................................... 35 
Figure 2-27: Related coefficient distance in pyramidal scheme ..................................... 37 
Figure 2-28: Related coefficient distance in nucleic scheme .......................................... 37 
Figure 2-29: DCT Significance Map ............................................................................... 38 
Figure 2-30: DCT Quantisation ...................................................................................... 39 
Figure 2-31: Subband Quantisation ................................................................................ 40 
Figure 2-32: 0.21 bpp Image with 30 dB PSNR ............................................................. 42 
Figure 2-33: 0.19 bpp Image with 30 dB PSNR ............................................................. 42 
Figure 2-34: Zigzag Coding Technique .......................................................................... 44 
Figure 2-35: Run Length Coding Example ..................................................................... 44 
Figure 2-36: Arithmetic Coding Example ....................................................................... 46 
Figure 2-37: Image / Video Encoding ............................................................................. 47 
Figure 2-38: Image / Video Decoding ............................................................................ 47 
Figure 3-1: Test Image .................................................................................................... 51 
Figure 3-2: 3 Scale DWT of Test Image ......................................................................... 51 
Figure 3-3: Main EZW Components (Encoder & Decoder) ........................................... 52 
Figure 3-4: Significance Iterations .................................................................................. 53 
Figure 3-5: Significant coefficient count for varying threshold values .......................... 54 
Figure 3-6: EZW Relations Trees ................................................................................... 55 
Figure 3-7: Symbol Identification Flowchart (Encode) .................................................. 57 
Figure 3-8: Coefficient Identification Flowchart (Decode) ............................................ 57 
Figure 3-9: Scanning Order - Subbands & Coefficients ................................................. 58 
Figure 3-10: SAQ Refinement process ........................................................................... 60 
Figure 3-11: Example of an 8 x 8, 3-scale wavelet transform ........................................ 62 
Figure 3-12: Example EZW Coefficient Tree ................................................................. 63 
Figure 3-13: Example EZW Symbol Tree ...................................................................... 63 
Figure 3-14: Example EZW Symbol Decode ................................................................. 65 
Figure 3-15: Example EZW after 1 Pass ......................................................................... 65 
Figure 3-16: Main ZTE Components (Encoder & Decoder) .......................................... 67 
Figure 3-17: Quantisation Vector for Subbands ............................................................. 69 
Figure 3-18: [2 4 16 16] Quantised Coefficients ............................................................ 71 
Figure 3-19: Example ZTE Significance Map ................................................................ 72 
Figure 3-20: Example ZTE Significance Tree ................................................................ 72 
Figure 3-21: Example ZTE Symbols .............................................................................. 73 
Figure 3-22: Example ZTE Symbol Decode ................................................................... 74 
Figure 3-23: Example ZTE Coefficient Value Decode ................................................... 75 
Figure 3-24: Coefficient Map ......................................................................................... 76 
Figure 3-25: DFS Tree Search ........................................................................................ 77 
Figure 3-26: DFS Coefficient Arrangement ................................................................... 77 
Figure 3-27: BFS Tree Search ......................................................................................... 78 
Figure 3-28: BFS Coefficient Arrangement .................................................................... 79 
Figure 3-29: ZTE vs. EZW Performance comparison for the Lena image ..................... 80 
Figure 3-30: ZTE vs. EZW Performance comparison for the Jenny Image ................... 81 
Figure 4-1: Conventional Video Communication Methodology .................................... 84 
Figure 4-2: 30 Chip I/O .................................................................................................. 86 
Figure 4-3: IPA Video Communications System ........................................................... 86 
Figure 4-4: QCIF IPA and Pixel Close-up ...................................................................... 87 
Figure 4-5: IP Top View ................................................................................................ 88 
Figure 4-6: IP Cross-sectional View ............................................................................... 88 
Figure 4-7: Capture Flowchart ........................................................................................ 89 
Figure 4-8: Pixel Processing Flowchart .......................................................................... 90 
Figure 4-9: IP Processing Architecture ........................................................................... 91 
Figure 4-10: Real Image Formation for Capture ............................................................. 92 
Figure 4-11: Virtual Image Formation for Display ......................................................... 92 
Figure 4-12: Hypercube Interconnect Arrangement ....................................................... 93 
Figure 4-13: IPA Interconnect Arrangement .................................................................. 93 
Figure 4-14: IPA Data Flow Directions .......................................................................... 94 
Figure 4-15: Proposed IPA Video Codec ....................................................................... 96 
Figure 4-16: System-On-Chip Architecture .................................................................... 97 
Figure 4-17: Frame Difference & Summation ................................................................ 98 
Figure 4-18: Frame Difference Components .................................................................. 99 
Figure 4-19: VEnable & HEnable Control Lines .......................................................... 100 
Figure 4-20: Activation of HH1 subband via VEnable & HEnable .............................. 100 
Figure 4-21: Pixel Bypass Architecture ........................................................................ 101 
Figure 4-22: Low/High pass pixel identification architecture ...................................... 102 
Figure 4-23: High / Low pass grid pattern .................................................................... 103 
Figure 4-24: Row/Column 1D Transforms for 2D ....................................................... 104 
Figure 4-25: Wavelet Transform Architecture .............................................................. 104 
Figure 4-26: Quantisation Architecture ........................................................................ 106 
Figure 4-27: Summation Unit Schematic ...................................................................... 108 
Figure 4-28: Register RO Schematic ............................................................................. 110 
Figure 4-29: Motion Compensation Unit ...................................................................... 112 
Figure 4-30: Mode Selection Circuitry ......................................................................... 113 
Figure 5-1: NBLOCK Coefficient Tree Representation ............................................... 117 
Figure 5-2: Scales vs. PSNR for Different Images ....................................................... 118 
Figure 5-3: Performance for Varying Bit-counts & Scales ........................................... 118 
Figure 5-4: 3 Scale NBLOCK & Wavelet Tree ............................................................ 119 
Figure 5-5: Single Branch of Five Signal Relation Tree ............................................... 121 
Figure 5-6: NBLOCK Signal Route Map ..................................................................... 121 
Figure 5-7: Pixel-wise Tree Architecture ...................................................................... 122 
Figure 5-8: Significance Identification Architecture ..................................................... 125 
Figure 5-9: Significance Identification Schematics ...................................................... 126 
Figure 5-10: EZW Classification Schematics ............................................................... 129 
Figure 5-11: EZW Pixel Enable Schematics ................................................................. 129 
Figure 5-12: Symbol Latch and Bypass Schematic ...................................................... 131 
Figure 5-13: Per-pixel Significance Tree Generation ................................................... 134 
Figure 5-14: Subband Encode Decode Order ............................................................... 135 
Figure 5-15: Distributed Pixel Counting ....................................................................... 136 
Figure 5-16: Per-pixel Counting Architecture .............................................................. 137 
Figure 5-17: Edge Encode Buffer Architecture ............................................................ 139 
Figure 5-18: Edge Decode Buffer Architecture ............................................................ 140 
Figure 5-19: Buffer-Array Interfacing .......................................................................... 142 
Figure 5-20: ZTE Significance Identification Architecture .......................................... 147 
Figure 5-21: ZTE Classification Architecture ............................................................... 148 
Figure 5-22: ZTE Pixel Enable Architecture ................................................................ 149 
Figure 5-23: ZTE Latch & Bypass Architecture ........................................................... 151 
Figure 5-24: ZTE Significance Tree Generation .......................................................... 152 
Figure 5-25: Jenny Sequence at 64Kbps (5fps) ............................................................ 158 
Figure 5-26: Salesman Sequence at 64 Kbps (5fps) ..................................................... 158 
Figure 5-27: Jenny Sequence at 250 Kbps (25fps) ....................................................... 159 
Figure 5-28: Salesman Sequence at 250 Kbps (25fps) ................................................. 159 
Figure 5-29: Final ZTE VLSI Schematic ...................................................................... 161 
Figure 5-30: Final VLSI WT and MC Schematics ....................................................... 162 
Figure 6-1: IP100P Prototype I/O Configuration .......................................................... 164 
Figure 6-2: Ring Oscillation for UMC 0.25µm Process ............................................... 166 
Figure 6-3: IP100P Proposed Dimensions .................................................................... 167 
Figure 6-4: Power Distribution in IP100P Prototype .................................................... 168 
Figure 6-5: Pixel Floor-Plan ......................................................................................... 170 
Figure 6-6: Array & NBLOCK Floor Plan ................................................................... 171 
Figure 6-7: Inverter Schematics & Layout ................................................................... 172 
Figure 6-8: Inverter Operation ...................................................................................... 173 
Figure 6-9: Transmission Gate Schematic & Layout .................................................... 174 
Figure 6-10: Transmission Gate Operation ................................................................... 174 
Figure 6-11: XOR Schematics & Layout ...................................................................... 175 
Figure 6-12: Exclusive-OR Operation .......................................................................... 176 
Figure 6-13: DFF Schematic ......................................................................................... 177 
Figure 6-14: DFF Layout .............................................................................................. 177 
Figure 6-15: DFF Operation .......................................................................................... 178 
Figure 6-16: Multiplexer Schematic & Layout ............................................................. 178 
Figure 6-17: Multiplexer Operation .............................................................................. 179 
Figure 6-18: 3 Stage Buffer layout ................................................................................ 181 
Figure 6-19: 4-Stage Buffer .......................................................................................... 182 
Figure 6-20: 2-Stage Buffer .......................................................................................... 182 
Figure 6-21: Pad to Buffer Connections ....................................................................... 183 
Figure 6-22: 4-Stage Buffer Operation ......................................................................... 184 
Figure 6-23: 3-Stage Buffer Operation ......................................................................... 184 
Figure 6-24: ZTE Coding Full Layout Circuitry ........................................................... 186 
Figure 6-25: Layout for Zero Identification .................................................................. 187 
Figure 6-26: ZID Non Zero Detection .......................................................................... 187 
Figure 6-27: ZID Non High Detection .......................................................................... 188 
Figure 6-28: Pixel Pass-thru Mode ............................................................................... 189 
Figure 6-29: WCDATAOUT Mode .............................................................................. 190 
Figure 6-30: DNT Detection, Latch & Shift Out .......................................................... 191 
Figure 6-31: ZTR Detection, Latch & Shift Out ........................................................... 192 
Figure 6-32: VZT Detection, Latch & Shift Out ........................................................... 193 
Figure 6-33: VAL Detection, Latch & Shift Out .......................................................... 194 
Figure 6-34: DNT Decode Mode .................................................................................. 195 
Figure 6-35: ZTR Decode Mode ................................................................................... 196 
Figure 6-36: VZT Decode ............................................................................................. 197 
Figure 6-37: VAL Decode ............................................................................................ 197 
Figure 6-38: Full Pixel Layout ...................................................................................... 198 
Figure 6-39: 8x8 NBLOCK Layout .............................................................................. 199 
Figure 6-40: ZTE NBLOCK Status Signal Routes ....................................................... 200 
Figure 6-41: Pixel 1,1 ZTE Routes ............................................................................... 201 
Figure 6-42: Pixel 1,2 ZTE Routes ............................................................................... 201 
Figure 6-43: Pixel 2,1 ZTE Routes ............................................................................... 202 
Figure 6-44: Pixel 2,2 ZTE Routes ............................................................................... 202 
Figure 6-45: ZTE CIN Connected ................................................................................. 203 
Figure 6-46: ZTE CIN Not Connected .......................................................................... 204 
Figure 6-47: IP100P Array Layout ............................................................................... 205 
Figure 6-48: IP100P Micrograph .................................................................................. 205 
Tables 
Table 2-1: Image Video Compression Standards ........................................................... 13 
Table 2-2: Common Video Resolutions .......................................................................... 16 
Table 2-3: List of common wavelet filter coefficients ..................................................... 31 
Table 3-1: EZW Example First Pass ............................................................................... 63 
Table 4-1: IP Mux Select Direction ................................................................................ 95 
Table 4-2: VEnable & HEnable list for subband activation ......................................... 101 
Table 4-3: Register RO I/O Ports .................................................................................. 110 
Table 4-4: RO Reversing Function ............................................................................... 111 
Table 4-5: Motion Compensation Signals .................................................................... 113 
Table 5-1: Pixel Relation Signals .................................................................................. 120 
Table 5-2: Negative 2's Complement Convert Mode .................................................... 124 
Table 5-3: RO Reverse Configuration ........................................................................... 124 
Table 5-4: Significance Identification Circuit I/O ........................................................ 127 
Table 5-5: EZW Self-Classification Symbols ............................................................... 128 
Table 5-6: I/O Ports for Symbol Latch ......................................................................... 131 
Table 5-7: Edge Encode Buffer I/O Ports ..................................................................... 140 
Table 5-8: Edge Decode Buffer I/O Ports ..................................................................... 141 
Table 5-9: Encode Cycle ............................................................................................... 143 
Table 5-10: Decode Cycle ............................................................................................. 144 
Table 5-11: Quantisation Select .................................................................................... 146 
Table 5-12: ZTE Self-Classification Symbols .............................................................. 147 
Table 5-13: Latch & Bypass I/O Ports .......................................................................... 151 
Table 5-14: ZTE Encode Cycle ..................................................................................... 155 
Table 5-15: ZTE Decode Cycle .................................................................................... 156 
Table 5-16: EZW vs. ZTE Complexity Comparison .................................................... 160 
Table 6-1: Primitive Gate Dimensions .......................................................................... 173 
Table 6-2: Ideal Stage Ratio of e ................................................................................... 180 
Table 6-3: Selected Stage Ratio .................................................................................... 181 
Table 6-4: ZID Module Signals .................................................................................... 185 
Table 6-5: ZTE Module Signals .................................................................................... 185 
Table 6-6: DNT Symbol Status Voltages ..................................................................... 191 
Table 6-7: ZTR Symbol Status Voltages ...................................................................... 192 
Table 6-8: VZT Status Signals ...................................................................................... 193 
Table 6-9: VAL Status Signals ..................................................................................... 194 
Chapter 1 
MOBILE MULTIMEDIA COMMUNICATIONS 
SYSTEMS 
1.1. Introduction 
Humankind's never-ending desire for more information and more reliable services has been 
a powerful driving force behind nearly all advancements in communication technology to 
date. Communication systems, from their humble beginnings of message scrolls, carrier 
pigeons, smoke clouds etc., have evolved significantly through the advent of new 
technologies throughout time. Advancements in electrical and particularly electronic 
technologies have revolutionised the means of providing communications today. 
Traditionally, communications systems generally comprised a single form of media, such as 
voice or written letters. Since the advent of television, which supported two forms of 
media (audio and video), demand for devices capable of transmitting multiple forms of 
media has increased. The increasing complexity and decreasing size of electronic 
components, particularly in the area of digital signal processing, coupled with 
advancements in wireless communication standards, have introduced the concept of 
compact mobile communications devices, such as mobile phones. Recent mobile devices 
and communications standards allow for communication via multiple media, such as voice, 
text and pictures. This leads us to the present day, where mobile multimedia devices that 
can support voice, text, faxes, web browsing, high resolution pictures and full motion 
video appear just around the corner. Figure 1-1 illustrates a third generation (3G) mobile 
multimedia communications concept as proposed by Nokia. 
Figure 1-1: 3G Multimedia Communications Concept 
Integrating full motion video capability into a mobile multimedia communications device 
presents a number of significant challenges. These include standards development, 
network infrastructure development, high compression efficiency and low power operation 
of circuitry. This thesis does not attempt to solve all of these problems, as doing so defines 
too large a scope. Instead, this thesis explores the hardware (VLSI) development 
of an efficient video compression codec targeted at applications in multimedia 
communication devices. 
Transmission of 'raw' or uncompressed video, even of a simple black image, unfortunately 
consumes a large amount of bandwidth. For instance, a greyscale video 
sequence of 352 pixels by 288 pixels at 25 frames a second (video quality) 
consumes approximately 20 megabits per second, the equivalent of approximately 6758 
simultaneous voice transmissions. Even though technological advances are constantly 
increasing the capacity of modern communication channels, bandwidth is generally not 
cheap. Images and video in 'raw' format tend to contain a significant amount of 
redundancy, be it in the form of large uniformly coloured areas, low frequency spatial 
gradients or transmission of similar content from frame to frame. Video compression 
algorithms today, such as MPEG-1/2/4, H.261/H.263 and MJPEG2000, minimise not only 
the spatial redundancy within frames but also the similarities found between consecutive 
frames, thereby allowing for low bandwidth representations of the original video 
sequence. The techniques employed to do so generally dictate the compression efficiency 
of the codec. 
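The bandwidth figure quoted above can be checked with a short calculation. The sketch below assumes 8 bits per greyscale pixel, which the text does not state explicitly, and the function name is illustrative only.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int = 8) -> int:
    """Bit rate of uncompressed video in bits per second."""
    return width * height * bits_per_pixel * fps


# 352 x 288 greyscale at 25 frames per second (8 bits per pixel is an assumption).
cif_bps = raw_bitrate_bps(352, 288, 25)
cif_mbps = cif_bps / 1e6

# Per-channel voice rate implied by the quoted figure of 6758 voice transmissions.
implied_voice_bps = cif_bps / 6758
```

Under these assumptions the raw rate is 20,275,200 bits per second, consistent with the approximately 20 megabits per second quoted, and the figure of 6758 corresponds to roughly 3 kbit/s per voice channel. That per-channel rate is inferred here and is not stated in the text.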
A typical video compression system is composed of the components illustrated in Figure 
1-2. Example algorithms belonging to each component category, particularly those employed 
in this thesis, are listed below each relevant component. The highlighted (grey) components 
constitute the major focus of this thesis. 
[Figure: block diagram. Image -> Motion Compensation (Frame Differencing) -> Energy Remap (Triangular Wavelet Transform) -> Quantise (Successive Approximation) -> Data Code & Reorder (EZW, ZTE) -> Entropy Coder (Arithmetic Coder) -> Compressed Stream] 
Figure 1-2: Video Compression Components 
In December 1993, Jerome M. Shapiro introduced an image compression scheme based 
on the wavelet transform and a novel coefficient mapping technique, which he named the 
Embedded Zerotree Wavelet (EZW). In September 1996, Stephen Martucci and Iraj 
Sodagar proposed a modified version of the EZW, which was targeted more towards 
low bit-rate video coding and was named the ZeroTree Entropy coder (ZTE). Both 
proposed algorithms provided an efficient technique to map coefficients in multiple 
wavelet subbands into trees which could be represented using a single symbol, thus 
exploiting the self-similarities between subbands, which are inherent to coefficients 
resulting from a wavelet transform. The major differences between the two include: 
1. Different symbol sets - The EZW uses a 4-symbol set, while the ZTE uses a 
simpler 3-symbol set. 
2. Quantisation method - The EZW contains an embedded quantisation 
algorithm while the ZTE uses an external subband quantisation process. 
3. Number of passes - The number of passes varies significantly, as the EZW uses 
multiple passes while the ZTE uses only two. 
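The idea of representing a whole tree of coefficients with a single symbol can be sketched as follows. This is a minimal illustration, not the architecture developed in this thesis: it assumes a square, pyramidally arranged coefficient array of side at least 2, decomposed until the lowest subband is a single coefficient at position (0, 0), and it uses the usual quadtree parent-child relation with a single significance threshold. The function names and the 4 x 4 sample values are hypothetical.

```python
from typing import List, Tuple

Coeffs = List[List[int]]


def children(r: int, c: int, size: int) -> List[Tuple[int, int]]:
    """Child positions of (r, c) in a size x size pyramidal layout (size >= 2)."""
    if (r, c) == (0, 0):
        # Assumption: the single lowest-band coefficient is the parent of
        # the three coarsest detail coefficients.
        return [(0, 1), (1, 0), (1, 1)]
    if 2 * r >= size or 2 * c >= size:
        return []  # finest scale: no descendants
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def is_zerotree(coeffs: Coeffs, r: int, c: int, threshold: int) -> bool:
    """True if (r, c) and every descendant have magnitude below the threshold."""
    if abs(coeffs[r][c]) >= threshold:
        return False
    return all(is_zerotree(coeffs, cr, cc, threshold)
               for cr, cc in children(r, c, len(coeffs)))


# Hypothetical 2-scale transform of a 4 x 4 block.
sample = [
    [34, -20, 3, 1],
    [2, 1, -2, 0],
    [1, 0, 0, 1],
    [-1, 2, 1, 0],
]
lh_is_zerotree = is_zerotree(sample, 1, 0, 16)  # small coefficient, small descendants
hl_is_zerotree = is_zerotree(sample, 0, 1, 16)  # -20 is significant at threshold 16
```

When a coefficient is a zerotree root, the coefficient and all of its descendants can be signalled with one symbol, which is where the coding gain of both the EZW and the ZTE comes from.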
Although the EZW coder was initially proposed for image compression, its usefulness is 
not limited to this and, like the ZTE coder, it can also be employed for compression of 
video sequences. In this thesis, designs for both video codecs are presented, particularly with 
regard to the hardware realisation of a device suitable for mobile multimedia applications. 
1.2. Motivation of Thesis 
The motivations behind this thesis are derived from two perspectives: 
1.2.1. Mobile Multimedia Communications Industry 
"Watching films and concerts anywhere 
By 2005, we will be downloading films, live concerts, and games through the air, 
enabling us to access the entertainment of our choice anytime and anywhere. We will 
carry a new type of slim-line digital communications device to enable us to relax and 
work more efficiently on the move. Pay-per-access services will mushroom, and the 
film and music industries will increasingly use mobile multimedia services to 
test-market products." 
A future prediction by a company called Roke, owned by Siemens, 
http://www.roke.co.uk/news/archive/science_fiction.htm 
Comments such as this, although light-hearted, predict some of the future 
possibilities for mobile multimedia communications, and humankind's impending need 
to embrace these concepts in daily life. Compact mobile communications equipment 
was once only a gleam in the eye of the science fiction writer (such as Gene 
Roddenberry, creator of Captain Kirk's communicator), yet today there are 
an estimated 553 million mobile customers globally. Figure 1-3 illustrates the growth 
of the mobile phone user group since 1992, as taken from 
http://home.intekom.com/cellular/. 
[Figure: line chart of world-wide mobile phone users against year, 1990 to 2002] 
Figure 1-3: World-wide mobile phone users 
In addition to this, SMS messaging, another form of media, has taken off since its 
introduction, with nearly 15 billion messages exchanged globally per month 
(December 2000), and usage is still increasing. More information is available at 
http://uk.gsmbox.com/news/mobile_news/all/30904.gsmbox 
These figures indicate that a significant number of global communities have not only 
fully embraced mobile multimedia communication technology but are also willing to 
expand usage to other forms of communication media if the opportunity exists. 
Communication via video, therefore, has the potential to fully blossom in tomorrow's 
market, and as such significantly motivates research into this area. 
1.2.2. Novel Research Opportunity 
Mobile multimedia communications applications tend to demand products that 
consume few development resources (e.g. high bandwidth interfacing), are of compact 
size and consume minimal power. To target these demands, the designs in this thesis have 
adopted a more unified capture, display and processing approach than that used in 
conventional modular systems. This unified approach is advantageous 
because, as far as the end product designer is concerned, it virtually eliminates the 
high-bandwidth interfacing required between the capture, processing and display 
components, and as such potentially reduces the associated development cost in 
comparison to a modular system. In addition to this, a massively parallel processing 
approach has also been adopted. The concurrency inherent to parallel processing 
arrays enables them to provide the same functionality as a single processor system, yet at 
a reduced clock frequency. This reduction in clock speed has the potential to reduce 
the power requirements of the device, rendering it more suitable for mobile multimedia 
products. 
Both of the design choices made above present some difficult, yet interesting,
challenges, particularly when considering implementation in VLSI devices. Several
groups were set up to produce solutions to these problems and were organised into the
structure illustrated in Figure 1-4.
[Organisation chart: Intelligent Pixel Array Project, divided into Optical Systems
(University of Cambridge), Capture & ADC (ECU Team A), Parallel Processing Array
(ECU Team B) and Audio (University of Las Palmas), with a lower level containing
Parallel Wavelet And Motion Compensation, Parallel Zerotree Design, and
Buffer & Control Design & Integration]
Figure 1-4: Task Allocation Structure
The research motive for this thesis relates to the design and realisation of a fully
integrated massively parallel zerotree codec (the highlighted block) for the Intelligent
Pixel Array Processor. Some of the challenges this thesis aims to undertake include:
1. Conversion of the serial processor based zerotree algorithms to massively
parallel pixel based algorithms.
2. Facilitate a highly parallel zerotree significant search mechanism in VLSI
hardware.
3. Design architectures for both EZW and ZTE algorithms.
4. Design self-classification mechanisms.
5. Design array load and unload mechanisms.
6. Design scale and status based pixel activation mechanisms.
7. Integration of both codecs into a parallel wavelet transform architecture.
8. Compare the performance and complexity issues of both EZW and ZTE
codecs.
9. Produce a fully custom layout of the selected codec in 0.25 micron VLSI
technology.
1.3. Scope of Thesis
The aim of this thesis is to develop an Intelligent Pixel Array based massively parallel
zerotree video coding architecture and its full custom layout. Two zerotree (EZW and
ZTE) algorithms are investigated and a choice is made based on the coding performance
and implementation feasibility of both algorithms. The thesis is composed of seven
chapters including this one; the following is a breakdown of the scope to be covered by
each.
1.3.1. Scope of Chapter 2 
Chapter 2 is an introductory chapter which presents several concepts related to images,
video and video compression. The concepts introduced include explanations of
images, video, image sampling, image/video compression, test images used and codec
components such as motion analysis and compensation (Block Matching Motion
Estimation, frame differencing), image transforms (wavelets, Discrete Cosine
Transform), image quantisation (subband), image/video coding schemes (zigzag,
run-length) and entropy coding (Variable Length Coding, arithmetic). This chapter
presents the basics for usage in other chapters.
1.3.2. Scope of Chapter 3
Chapter 3 presents an in-depth investigation into the two image/video coding zerotree
algorithms, Embedded Zerotree Wavelet (EZW) and ZeroTree Entropy (ZTE). This
chapter covers the concepts of zerotree symbol generation, significance identification
of wavelet coefficients, subband quantisation, successive approximation, zerotree
codec components, tree search algorithms and the image coding performance of both
algorithms. The chapter introduces the terminology used in chapters to follow.
1.3.3. Scope of Chapter 4 
Chapter 4 firstly introduces the Intelligent Pixel Array and Intelligent Pixel paradigms
and their application to communications systems. Secondly it presents the modified
architectures designed to support both the wavelet transform component and the
zerotree codec. The chapter specifically covers the concepts of pixel-wise capture,
display and processing infrastructure, scale control, high-pass/low-pass pixel
identification structure, pixel communication, the primary register component, motion
compensation components and pixel mode architecture. The component architectures
suggested here are used in conjunction with the zerotree architectures to constitute the
full video codec.
1.3.4. Scope of Chapter 5
Chapter 5 presents the full parallel hardware architectures for both the EZW and ZTE
algorithms. A comparison, in terms of compression efficiency and hardware
complexity, is then presented to select a particular codec for prototype implementation.
The chapter covers the areas of NBLOCK format, the parallel search interconnections,
significance identification, symbol generation, data load and unload architectures and
edge buffer specifications. Both codec efficiencies are tested on two image sequences
and a hardware comparison is also made. Finally, the better hardware design is
selected.
1.3.5. Scope of Chapter 6 
Chapter 6 presents the full custom layout for the ZTE architecture in a 0.25µm UMC
CMOS process, for the IP!OOP prototype. A simulated power calculation is then
made. The chapter covers the areas of specifics relating to the design technology, the
basic gates used, the buffer drivers, the zerotree components, power distribution, pin
distribution and the entire array layout. HSpice simulations are made on the completed
pixel design to obtain the necessary power measures.
1.3.6. Scope of Chapter 7
Chapter 7 presents a conclusion to the thesis and also presents some future research
options for continuation of the work.
Chapter 2
IMAGE/VIDEO CODING TECHNIQUES
"A picture is worth a thousand words." ¹
Fred Barnard (circa 1920).
2.1. Introduction
A viewpoint on this slogan suggests that a description of a detailed scene quite often
necessitates the use of a significantly large number of words. Words, which are limited in
individually descriptive content, when assembled together as a part of the whole, can
describe a natural scene in all its detail. Similarly, a digital representation of a detailed
image requires a large number of its primary detail components, namely bits. A typical
greyscale image (PAL CIF) of 352 x 288 pixels, with 8 bits per pixel, requires 792
kilobits to be represented in full. Images with larger resolutions and/or more colours
(HDTV, DVD) require significantly more space to be fully represented. Given an
environment with a large to infinite bandwidth, the transmission of these images for
communications purposes presents few obstacles. However, most real-world
communications applications are limited in available bandwidth, and as such a compromise
between the detail levels and the data rate (bps) is applicable to virtually all visual
communications systems. For instance, most modern modems with data rates in the range
15 - 56 kbps, ISDN channels that use 64 kbps, and implementations of the new 384
kbps 3G [2] standards are still well under the required bandwidth for 25 frame per second
QCIF (176 x 144) video communications streams, which consume around 5 Mbps. To
circumvent the lack of bandwidth in most communication channels, video communications
systems in general eliminate redundant information from the stream while attempting to
represent the unique detail content with an acceptable compromise between image quality
and bit rate. A system that performs in this way can be considered an efficient video
compression system for a particular type of video sequence, since the efficiency of a video
coder is directly related to factors such as video content, required video quality, frames per
second, coder / decoder complexity etc.
¹ According to a paper [1] published by Alan F. Blackwell, a picture is actually worth 84.1 words.
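The raw sizes quoted in the introduction can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, 8-bit greyscale is assumed for the QCIF stream, and one kilobit is taken as 1024 bits, which is the convention that reproduces the 792 kilobit figure.

```python
# Raw (uncompressed) sizes and bit rates for the formats quoted in the text.
# Assumptions: 8-bit greyscale samples, and 1 kilobit = 1024 bits.

def frame_bits(width, height, bits_per_pixel):
    """Number of bits needed to store one uncompressed frame."""
    return width * height * bits_per_pixel


def stream_bps(width, height, bits_per_pixel, fps):
    """Bit rate of an uncompressed video stream in bits per second."""
    return frame_bits(width, height, bits_per_pixel) * fps


cif_bits = frame_bits(352, 288, 8)        # one CIF greyscale frame
cif_kilobits = cif_bits / 1024            # 792.0 kilobits
qcif_bps = stream_bps(176, 144, 8, 25)    # 5068800 bps, roughly 5 Mbps
ratio_vs_isdn = qcif_bps / 64_000         # about 79 times a 64 kbps ISDN channel

print(cif_kilobits, qcif_bps, ratio_vs_isdn)
```

Under these assumptions an uncompressed QCIF stream needs roughly 79 times the capacity of a single 64 kbps ISDN channel, which is the gap a video codec has to close.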
Video is composed of one of two basic types of images; natural image (Figure 2-1) or 
animated sequences be it computer generated or hand drawn (Figure 2-2). 
Figure 2-1: Natural Image (Jenny) 
Figure 2-2: Generated Image 
Clearly Figure 2-1, typical of most photographs or video camera sequences, consists of
more colours and colour gradients than Figure 2-2, and as such it tends to compress better
with an energy remapping coder such as Joint Picture Experts Group (JPEG) [3] [4],
Wavelets or Set Partitioning In Hierarchical Trees (SPIHT). In comparison Figure 2-2,
typical of generated art or bi-level images from faxes, clearly contains more solid colour
areas and sharper edges, and as such it is better compressed with coders such as JBIG or
GIF. See [5] for more details on the comparisons. Generated art tends to exhibit greater
high frequency content and also contains large single valued image areas, which are typically
and efficiently coded by entropy coding techniques such as run-length coding, variable
length coding, arithmetic coding etc. Natural image compression algorithms in comparison
tend to be more complex in nature than their counterparts, because they employ transforms and
quantisation schemes in addition to the entropy codecs [6]. For all intents and purposes
the transform and quantisation stage attempts to represent the image in a "generated art"
manner to be efficiently coded by an entropy coder. The complexity of a natural image
codec is generally a reflection of its efficiency as shown in [7], where the more complex
SPIHT codec outperforms the generic JPEG codec. This thesis will focus on a natural
image compression scheme, as the targeted application, namely mobile multimedia
communication, depends heavily on a codec that performs efficient natural video
compression for low-bit rate communications.
Video can be considered as a set of similar images that vary in content at a specific rate in
time (e.g. 25 fps for PAL). Therefore a typical video compression technique provides a
means of efficiently coding temporal redundancy between successive images in addition to
a general image-coding algorithm. Some major hybrid video/image compression standards
available today, as presented in [5], are listed in Table 2-1.
Hybrid video coding techniques today tend to employ at least four key compression stages,
as indicated in Figure 2-3, with each stage uniquely contributing to the compression
process. The following section will cover some aspects of each of the stages.
Table 2-1: Image / Video Compression Standards
Standard | Creating Body | Target Application | Type | Year
CCITT G3 | ITU-T | Facsimile | Image | 1980
CCITT G4 | ITU-T | Facsimile | Image | 1984
GIF | Compuserve | Generic Indexed Pictures | Image | 1987/89
H.261 | ITU-T | Video over ISDN (64 Kbps) | Video | 1990
JPEG | Joint Picture Experts Group (ISO, ITU-T, IEC) | Generic Natural Pictures | Image | 1992
MPEG-1 | Motion Picture Experts Group (ISO, IEC) | Video On CD | Video | 1992
JBIG | Joint Bi-level Image Group (ISO, ITU-T, IEC) | Binary Images | Image | 1993
J.81 | ETSI | Video Distribution Over Public Networks | Video | 1994
MPEG-2 / H.262 | Motion Picture Experts Group (ISO, ITU-T, IEC) | Generic Interlaced Video | Video | 1995
H.263 / G.723.1 | ITU-T | Video Over POTS | Video | 1996
MPEG-4 | Motion Picture Experts Group (ISO, ITU-T, IEC) | Object Based Video / Images | Image / Video | 1998
MPEG-7 | Motion Picture Experts Group (ISO, ITU-T, IEC) | Multimedia Content Description Interface | Misc | 2001
[Block diagram: Uncompressed Video -> Stage A -> Stage B -> Stage C -> Stage D ->
Compressed Video, with groupings labelled Image Codec and Video Codec]
Figure 2-3: Image / Video Coding Stages
Stage A- Temporal Redundancy Compensation 
Stage B - Image Energy Remap or Transform 
Stage C - Quantisation or Lossy Compression 
Stage D - Entropy Coding or Non-lossy Compression 
2.2. Image / Video Coding Primitives
2.2.1. RGB - YUV Colour Space 
Images captured, typically for compression and display, are represented in one of two 
standards, either RGB (Red Green Blue) or YUV (Luminance and Chrominance) [8]. 
The RGB technique is often used for reproduction or capture, while the YUV is more 
often used for processing and transmission. Images captured in RGB typically use 24 
bits per pixel with an 8-bit value for each colour component in the pixel, resulting in 
around 16 million possible colour levels per pixel. The main disadvantage of the RGB 
scheme is the lack of a common intensity component, which restricts the processing on 
the image to processing performed on each individual colour. In comparison the YUV 
scheme employs one luminance component (Y) and two chrominance components
(UV) to represent the image. The luminance component can be realised as a greyscale
representation of the image, which when supplemented by the two chrominance
components, represents the original colour image. The fundamental advantage of this
representation originates from human perception of a colour image, where defects in
the detail levels in the intensity component affect the visual stimulus significantly more
in comparison to defects in colour [9]. Therefore, by maintaining the intensity
component to a higher degree of accuracy, a compromise can be made on the detail
levels of the chrominance components. The three components are represented using 8
bits per pixel per component; however the sampling scheme is adjusted to remove
some redundancy at this first stage. Typical conversion matrices for translation between
RGB and YUV, as taken from [8], can be seen in (Eq. 2.1) and (Eq. 2.2) (after level shifting
U and V by subtracting 128).
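\[
\begin{bmatrix} Y \\ U \\ V \end{bmatrix} =
\begin{bmatrix}
0.299 & 0.587 & 0.114 \\
-0.1687 & -0.3313 & 0.5 \\
0.5 & -0.4187 & -0.0813
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]
(Eq. 2.1)
\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 1.402 \\
1 & -0.34414 & -0.71414 \\
1 & 1.772 & 0
\end{bmatrix}
\begin{bmatrix} Y \\ U \\ V \end{bmatrix}
\]
(Eq. 2.2)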
Codecs that use the YUV standard readily allow for greyscale images as well as colour,
and as such this standard is adopted for the rest of this document.
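As an illustration of the colour space translation, the following sketch applies a forward and an inverse matrix to a single pixel. The coefficient values are the ones commonly used for JPEG/JFIF style conversion and are assumed here to correspond to (Eq. 2.1) and (Eq. 2.2); the function names and the choice to store U and V with a +128 offset are illustrative assumptions.

```python
# RGB <-> YUV conversion for one pixel, using the widely used JFIF style
# coefficients (assumed to match Eq. 2.1 and Eq. 2.2).

FORWARD = (
    (0.299, 0.587, 0.114),
    (-0.1687, -0.3313, 0.5),
    (0.5, -0.4187, -0.0813),
)
INVERSE = (
    (1.0, 0.0, 1.402),
    (1.0, -0.34414, -0.71414),
    (1.0, 1.772, 0.0),
)


def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix by a 3-element vector."""
    return tuple(sum(a * b for a, b in zip(row, vector)) for row in matrix)


def rgb_to_yuv(r, g, b):
    """Return (Y, U, V) with U and V offset by 128 for unsigned storage."""
    y, u, v = mat_vec(FORWARD, (r, g, b))
    return y, u + 128.0, v + 128.0


def yuv_to_rgb(y, u, v):
    """Level shift U and V by subtracting 128, then apply the inverse matrix."""
    return mat_vec(INVERSE, (y, u - 128.0, v - 128.0))


yuv = rgb_to_yuv(200, 50, 25)
rgb = yuv_to_rgb(*yuv)
print(yuv, rgb)
```

Because the published coefficients are rounded, the round trip is only approximate; for the pixel above the recovered values differ from the originals by a small fraction of one grey level.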
2.2.2. YUV Image Sampling 
Since the human eye is less sensitive to errors in colour than to errors in intensity, the
first basic compression mechanism is usually applied here. This is achieved by
sub-sampling the colour components further. The usual sampling ratio between Y:U:V as
employed in standards such as MPEG, H.261 etc. is fixed at 4:1:1 (a 4:2:2 scheme is
also available where only the horizontal chrominance resolution is reduced). In this
scheme each of the chrominance components is sampled at half the sampling rate of
the luminance component in each direction, resulting in a four-fold decrease in the
number of pixels in each of the two colour components, in comparison to the RGB scheme.
The 4:1:1 sampling pattern pictured in Figure 2-4 corresponds to an equivalent 12 bits
per pixel, which "appears" very close to the 24-bit per pixel RGB scheme. However each
sample in each colour component is still converted to an 8-bit value.
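The 12 bits per pixel figure follows from counting samples: one 8-bit luminance sample per pixel plus two 8-bit chrominance samples shared by every 2 x 2 group of pixels. A minimal sketch is given below; forming each chrominance sample by averaging the 2 x 2 neighbourhood is an assumption made for illustration, as the text does not state how the sub-sampled value is derived, and the function names are hypothetical.

```python
# Chrominance sub-sampling by two in each direction, plus the resulting
# average number of bits per pixel.

def subsample_2x2(plane):
    """Average each 2x2 neighbourhood into one sample (assumed method)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def bits_per_pixel(width, height, bits=8):
    """Average bits per pixel for one full Y plane and two sub-sampled planes."""
    luma_samples = width * height
    chroma_samples = 2 * (width // 2) * (height // 2)
    return (luma_samples + chroma_samples) * bits / (width * height)


u_plane = [[100, 104, 50, 50],
           [96, 100, 50, 50]]
print(subsample_2x2(u_plane))      # [[100.0, 50.0]]
print(bits_per_pixel(176, 144))    # 12.0
```

For a QCIF PAL frame this gives chrominance planes of 88 x 72 samples, in agreement with the UV column of Table 2-2.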
[Diagram: luminance sample grid with Chrominance (U & V) Pixels marked]
Figure 2-4: Luminance & Chrominance sample positioning
2.2.3. Image / Video Resolutions
The sizes of images that are captured vary dramatically in resolution from one
application to another, usually ranging anywhere from 32x32 to any arbitrary
resolution, with each dimension, bar the occasional exception, being a multiple of 8.
Influenced by television systems, video codec resolutions have, however, adopted a
number of default resolution standards. As referenced in [5] [8], the most common
formats are listed in Table 2-2. The size of the image influences the computational
complexity of a particular algorithm, and in practice larger images compress to a better
ratio than smaller images.
Table 2-2: Common Video Resolutions
Format | Y | UV | Usage
QCIF - NTSC | 176 x 120 | 88 x 60 | Teleconferencing
QCIF - PAL | 176 x 144 | 88 x 72 | Teleconferencing
CIF - NTSC | 352 x 240 | 176 x 120 | Teleconferencing / Video
CIF - PAL | 352 x 288 | 176 x 144 | Teleconferencing / Video
SIF - 525 | 720 x 480 | 360 x 480 | Movies, High Res. Video
SIF - 625 | 720 x 576 | 360 x 576 | Movies, High Res. Video
The two main standards used for teleconferencing and video communications 
applications are, the Common Intermediate Format (CIF) and the Quarter CIF 
(QCIF) . Since the applications focus of this document slots into the category of video 
communication the image resolution of QCIF or CIF with PAL are used for codec 
testing purposes. 
2.2.4. Test Images & Sequences 
As a test image, the 'Lena' (sometimes spelled 'Lenna') image is one of the most
commonly used images in the available literature. The image exhibits both sharp
cornered textures and gradual changes in texture, while providing a human face as a
test image. The image is square and generally found in resolutions ranging from 1024 x 
1024 down to around 32 x 32. The image used here is 256 x 256 x 8 bits for the 
greyscale and 256 x 256 x 24 bits for the colour, as this resolution somewhat resembles 
the QCIF standard. The greyscale image and its colour version can be seen in Figure 
2-5. 
As far as video sequences are concerned, two different sequences are used in this
document: one is the proprietary 'Jenny' sequence and the other is the standard 'Salesman'
sequence.
Figure 2-5: Lena Y & RGB Images 
The 'Jenny' sequence was filmed on a CCD camera, within the post graduate 
engineering labs at Edith Cowan University. It portrays a woman, Jenny, speaking as if
over a video phone. The camera was hand held to provide a sequence that emulated a 
mobile video phone device and as such is susceptible to hand held tremors. The clip 
was shot at 25 fps PAL with CIF (352 x 288) resolution and is composed of 24-bit 
colour. The sequence is 62 seconds long and is composed of 1557 frames. The 
sequence was later down sampled to QCIF (176 x 144) and converted to 6-bit greyscale 
to suit the application in this document. The colour and greyscale versions of the first 
frame in the Jenny sequence are presented in Figure 2-6. 
Figure 2-6: The 'Jenny' Sequence 
The 'Salesman' sequence is a professionally shot fixed camera sequence of a salesman 
busy plying his trade. The sequence contains a large number of high detail and 
contrasting objects, e.g. books, which remain stationary throughout the sequence. The 
only moving component is the salesman. This is a standard sequence used by many 
video compression algorithm developers to test their codecs. The original sequence 
was captured at 25 fps and at a resolution of 360 x 288 pixels. The sequence is in 8-bit 
greyscale and is 449 frames (18 seconds) long. For test purposes in this application, the 
sequence was then sub sampled to QCIF (176 x 144) resolution, using 6-bit greyscale. 
Figure 2-7 illustrates the first frame of the salesman sequence.
Figure 2-7: The 'Salesman' Sequence 
2.3. Motion Analysis (Stage A) 
Motion estimation, prediction and compensation, collectively known as motion analysis, 
are performed on video sequences to minimise the temporal redundancy associated with 
successive images which contain a small spatial variance between frames [10]. A common
example video clip, which tends to have minimal temporal variance, typically portrays
an individual reading the news against a stationary background. The only movement in such
cases tends to be limited to the individual's face, eyes, lips and hands. Continuous
retransmission of frames from such a sequence generally results in the excessive use of
bandwidth, as the successive frames will only contain minimal changes from the first.
Therefore, motion analysis is employed to code the changing areas only. There are three 
concepts surrounding the area of motion analysis, as extrapolated from [8] [11], which have 
the following meanings. 
2.3.1. Motion Estimation 
Motion Estimation is generally considered as the process of detecting movement 
between two consecutive frames in a sequence, and calculating a motion vector to 
denote its displacement. MPEG [12] and H.263 [13] standards employ this technique 
to perform block motion estimation. The motion vector, u, is typically calculated for a
16 by 16 pixel block and has both X and Y components. If n = (N_x, N_y) represents the
pixel at the head of a 16 by 16 block at time t with intensity I_t, and at time t-1 the head
pixel was at o = (O_x, O_y), then the vector u is generated in such a manner as to minimise the
error between the block intensities I_t and I_{t-1}. This way the new block intensity at
position n can be generated from the contents at o, shifted to position o + u.
Figure 2-8 illustrates the use of these vectors [15].
[Diagram: Block from Previous Image displaced by Vector u to Block in Current Image]
Figure 2-8: Motion Estimation
A function calculating the sum of absolute differences (SAD) (Eq. 2.3) between two
blocks of data is classically used to calculate the error between a block in the current
frame and a block in the previous frame. For each block in the current frame this
function is evaluated on several blocks in the previous frame while searching for the
displacement that gives the minimal error. The search distance is usually set to 16 pixels
in either direction from the origin; however, standards such as MPEG-2 [14] have
provisions for longer and non-integral search distances to support variable sized blocks [16].
\[
SAD = \sum \left| N - O \right|
\]
(Eq. 2.3)
Where N and O represent the current block and previous block respectively.
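The search described above can be sketched as an exhaustive (full search) block matcher. This is a minimal illustration rather than the method of any particular standard: frames are assumed to be lists of rows of integer intensities, vectors are expressed in (row, column) order, candidates that fall outside the previous frame are simply skipped, and all function names are hypothetical.

```python
# Full-search block matching using the sum of absolute differences (SAD).

def sad(cur, prev, cy, cx, py, px, size):
    """SAD between the block at (cy, cx) in cur and the block at (py, px) in prev."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - prev[py + dy][px + dx])
    return total


def best_vector(cur, prev, cy, cx, size=16, search=16):
    """Return (u, error) where the previous block at n - u best matches the block at n."""
    height, width = len(prev), len(prev[0])
    best_err, best_u = None, None
    for uy in range(-search, search + 1):
        for ux in range(-search, search + 1):
            py, px = cy - uy, cx - ux          # o = n - u
            if py < 0 or px < 0 or py + size > height or px + size > width:
                continue                       # candidate lies outside the frame
            err = sad(cur, prev, cy, cx, py, px, size)
            if best_err is None or err < best_err:
                best_err, best_u = err, (uy, ux)
    return best_u, best_err


def demo_frames():
    """Two 8x8 frames in which a 2x2 patch moves from (2, 2) to (3, 4)."""
    prev = [[0] * 8 for _ in range(8)]
    cur = [[0] * 8 for _ in range(8)]
    patch = [[10, 20], [30, 40]]
    for dy in range(2):
        for dx in range(2):
            prev[2 + dy][2 + dx] = patch[dy][dx]
            cur[3 + dy][4 + dx] = patch[dy][dx]
    return cur, prev


cur, prev = demo_frames()
vector, error = best_vector(cur, prev, 3, 4, size=2, search=2)
print(vector, error)   # (1, 2) 0
```

The cost of the exhaustive search grows with the square of the search distance, which is one reason simpler schemes such as frame differencing are attractive in hardware.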
2.3.2. Motion Compensation 
Since the motion estimation process only searches for the minimum SAD value below a
retransmit threshold, the selected SAD value is not necessarily zero.
This implies that the motion estimation process locates the closest match and not
necessarily an identical match. To compensate for this a difference block, N - O, is transmitted.
This process is termed motion compensation [17]. Frame differencing is an extreme case
where motion compensation is performed on the entire frame from one frame to the
next, and hence is a simpler hardware alternative to full motion analysis.
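A minimal sketch of the difference block and its use at the decoder is given below. Blocks are assumed to be lists of rows of integer intensities and the function names are hypothetical; frame differencing corresponds to applying the same two functions to whole frames with no motion search.

```python
# Motion compensation: the encoder sends the difference N - O, and the
# decoder adds it back to its copy of the matched previous block O.

def difference_block(current, matched_previous):
    """Element-wise N - O for two equally sized blocks (or whole frames)."""
    return [
        [n - o for n, o in zip(n_row, o_row)]
        for n_row, o_row in zip(current, matched_previous)
    ]


def reconstruct(matched_previous, residual):
    """Decoder side: O + (N - O) recovers the current block N."""
    return [
        [o + d for o, d in zip(o_row, d_row)]
        for o_row, d_row in zip(matched_previous, residual)
    ]


previous_block = [[10, 12], [14, 16]]
current_block = [[11, 12], [13, 16]]
residual = difference_block(current_block, previous_block)
print(residual)                                # [[1, 0], [-1, 0]]
print(reconstruct(previous_block, residual))   # [[11, 12], [13, 16]]
```

When the match is close, most residual values are zero or small, which is what makes the difference block cheaper to code than the block itself.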
2.3.3. Motion Prediction 
Used in standards such as MPEG (IBP frame system) [12], motion prediction is
performed at both the encoder and decoder. Here the encoder attempts to track the
predictions made by the decoder and supply compensated blocks to cover the
difference between that made by the decoder and the real image. This technique allows
for better compression, but increases the memory capacity required and the
complexity of the coding / decoding scheme.
When performing motion analysis three different frame types are employed; Intra, Inter 
and Predicted frames. The Intra-frames are fully encoded frames that provide a base for 
coding Inter-frames and Predicted-frames, and as such require a larger bandwidth than the 
other two. The Inter-frames and Predicted-frames contain the motion estimation 
components. Intra-frames are regularly injected to 'refresh' any deviation caused by Inter 
or Predicted frames within a compressed sequence. In this manner optimal results are 
attained.
2.4. Discrete Cosine Transform (Stage B)
The Discrete Cosine Transform (DCT), first developed in [18], is the most commonly used
2D orthogonal image transform present today. Standards such as JPEG, MPEG 1/2/4,
H.261/3 etc. employ this transform to relocate the spatial energy in the image into a few
coefficients. This transform behaves in a manner approximating a real-valued Fourier
Transform, packing the significant coefficients into the lower frequency areas of the
transformed spatial positions [19]. This also has the effect of de-correlating the
dependencies between adjacent pixels in the time domain. Therefore, an image containing 
numerous sharp texture edges produces a DCT with a high number of larger magnitude 
coefficients throughout the spectrum. If the number of high magnitude coefficients were 
to be reduced or quantised then the inverse transform redistributes this loss over the entire
image, eliminating the high frequency sharp edges and generating a blurred image.
The lowest frequency coefficient of a DCT is termed the DC coefficient as it represents the
average image level. This value is generally an order of magnitude larger than the next
largest coefficient and is normally left un-quantised, as quantising it degrades the
reconstructed image significantly. Figure 2-10 illustrates a typical DCT performed on a
16 x 16 image of Lena's eye (Figure 2-9), showing the compaction of energy into the low
frequency spectrum. Note that in Figure 2-10 the DC coefficient was reduced by a factor of
10 to allow for display.
Figure 2-9: Lena's Eye
[Surface plot of the 16 x 16 DCT coefficient values]
Figure 2-10: 16 x 16 & 2D DCT Transform of eye
To illustrate the effect of quantisation on a transformed image, Figure 2-11 shows an
image that has been quantised at the extreme level in the time domain to generate a bi-level
image at 1 bit per pixel, while in Figure 2-12 the image has been quantised at an even
lower 0.69 bits per pixel after undergoing a DCT and still remains near perfect.
Figure 2-11: "Raw" Lena image @ 1 bit per pixel
Figure 2-12: Transformed Lena image @ 0.69 bits per pixel
The standard forward and inverse 2D DCT can be mathematically expressed as in (Eq. 2.4) 
and (Eq. 2.5) respectively [21]. 
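\[
F(u,v) = \frac{1}{4} C(u) C(v) \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f(i,j)
\cos\left(\frac{(2i+1)u\pi}{2M}\right) \cos\left(\frac{(2j+1)v\pi}{2N}\right),
\quad 0 \le u \le M-1,\; 0 \le v \le N-1
\]
\[
C(x) = \begin{cases} 1/\sqrt{2} & x = 0 \\ 1 & \text{otherwise} \end{cases}
\]
(Eq. 2.4)
\[
f(i,j) = \frac{1}{4} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} C(u) C(v) F(u,v)
\cos\left(\frac{(2i+1)u\pi}{2M}\right) \cos\left(\frac{(2j+1)v\pi}{2N}\right),
\quad 0 \le i \le M-1,\; 0 \le j \le N-1
\]
(Eq. 2.5)
Where
F(u, v) = the transformed coefficient.
f(i, j) = the image values.
M x N = the size of the block.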
However in most cases, by exploiting the orthogonal nature of the DCT, a 2D DCT can be
realised by performing multiple single dimensional transforms on each of the rows and
columns. Furthermore, to ease the computational burden that the DCT convolution poses, an
integer DCT evaluation has been presented in [20]. The forward and inverse single
dimensional transforms are realised by evaluating (Eq. 2.6) and (Eq. 2.7) for N
samples [18] [21].
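\[
F(u) = C(u) \sum_{i=0}^{N-1} f(i) \cos\left(\frac{(2i+1)u\pi}{2N}\right),
\quad 0 \le u \le N-1
\]
(Eq. 2.6)
\[
f(i) = \sum_{u=0}^{N-1} C(u) F(u) \cos\left(\frac{(2i+1)u\pi}{2N}\right),
\quad 0 \le i \le N-1
\]
(Eq. 2.7)
Where,
\[
C(x) = \begin{cases} \sqrt{1/N} & x = 0 \\ \sqrt{2/N} & 1 \le x \le N-1 \end{cases}
\]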
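The row and column decomposition can be sketched directly from the single dimensional transforms. The code below assumes the orthonormal scaling C(0) = sqrt(1/N) and C(x) = sqrt(2/N) otherwise, which for 8 x 8 blocks is equivalent to the 1/4 factor of the 2D form; it is a plain floating point illustration with hypothetical function names, not the integer evaluation of [20].

```python
import math

# Separable 2D DCT built from 1D transforms on rows, then on columns.
# Assumes the orthonormal scaling C(0) = sqrt(1/N), C(x) = sqrt(2/N) otherwise.

def scale(x, n):
    return math.sqrt(1.0 / n) if x == 0 else math.sqrt(2.0 / n)


def dct_1d(f):
    """Forward 1D DCT of a list of N samples."""
    n = len(f)
    return [
        scale(u, n) * sum(f[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                          for i in range(n))
        for u in range(n)
    ]


def idct_1d(coeffs):
    """Inverse 1D DCT; the scale factor sits inside the sum over u."""
    n = len(coeffs)
    return [
        sum(scale(u, n) * coeffs[u] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
            for u in range(n))
        for i in range(n)
    ]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def dct_2d(block):
    """Transform every row, then every column of the row-transformed block."""
    rows_done = [dct_1d(row) for row in block]
    return transpose([dct_1d(col) for col in transpose(rows_done)])


def idct_2d(coeffs):
    rows_done = [idct_1d(row) for row in coeffs]
    return transpose([idct_1d(col) for col in transpose(rows_done)])


flat = [[10.0] * 8 for _ in range(8)]
flat_coeffs = dct_2d(flat)
print(round(flat_coeffs[0][0], 6))   # 80.0, every other coefficient is ~0
```

For a uniform block all of the energy lands in the DC coefficient, which under this scaling equals the average level multiplied by 8 for an 8 x 8 block.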
Standards such as H.261, H.263 and MPEG 1, 2 & 4, which actively use the DCT as part of the
video processing algorithm, segment the image into a number of smaller 8 x 8 time domain
blocks before these blocks are transformed [13]. The advantage here is the reduction in
complexity and register resources compared with those needed to perform a full image
transform, especially in hardware. In addition the transform can usually be performed in
parallel on all the blocks in the image. The disadvantage of this technique is that under the
higher levels of compression needed for lower bandwidth usage, it suffers from blocking artefacts.
Figure 2-13: 'Lena' DCT (JPEG) @ 0.2 bits per pixel
Figure 2-13 illustrates the 'Lena' image after it has undergone an 8 x 8 block transform and 
quantisation to represent 0.2 bits per pixel (bpp). The blocking artefacts are caused as a 
result of the spectral truncation that occurs when quantisation is applied to the frequency
components of a block. This affects the spatial image by driving all the pixels in a block to
the same colour level; hence two adjacent blocks that have different "average" frequencies
will contrast against each other resulting in a blocking effect.
A full image transform will avert the blocking artefacts; however, since the DCT is not
ideally suited for full image transforms, due to the hardware complexity that such an
implementation requires, simpler techniques such as wavelets and zerotrees are generally
used to perform full image transforms [7] [22].
2.4.1. Matrix Based 2D DCT Computation
The 2D DCT can be computed, for parallel implementations, employing a matrix
based approach. If R_v is a vector from 0 to (R-1) and C_v is a vector from 0 to (C-1),
where R and C represent the row and column width of the block to be transformed,
and CM is an R x C matrix containing the cosine values generated by (Eq. 2.8),
then computing the statements in Figure 2-14 results in generation of a 2D DCT for
that block, where F is f transformed. The algorithm for the inverse is as in Figure
2-15.
(Eq. 2.8)
1. CM1 = CM(R,C)
2. CM2 = CM(C,R)
3. F = CM2 x (CM1 x f)
Figure 2-14: Forward 2D DCT Algorithm
1. CM1 = CM(R,C)
2. CM2 = CM(C,R)
3. CM1(1,:) = 0.5 x (CM1(1,:))
4. CM2(1,:) = 0.5 x (CM2(1,:))
5. f = CM2 x (CM1 x F)
Figure 2-15: Inverse 2D DCT Algorithm
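A matrix formulation of this kind can be sketched as two matrix products. Since the cosine matrix definition of (Eq. 2.8) is not reproduced here, the sketch assumes the standard orthonormal DCT-II matrix, for which the inverse is simply the transpose; the scaling therefore differs from the 0.5 row scaling used in Figure 2-15, and the function names are hypothetical.

```python
import math

# Matrix based 2D DCT: F = C_R x f x transpose(C_C), where C_n is the
# orthonormal DCT-II matrix of size n (an assumed stand-in for Eq. 2.8).

def dct_matrix(n):
    """n x n matrix with entry [u][i] = c(u) * cos((2i + 1) u pi / 2n)."""
    return [
        [
            (math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
            * math.cos((2 * i + 1) * u * math.pi / (2 * n))
            for i in range(n)
        ]
        for u in range(n)
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dct_2d_matrix(f):
    """Forward transform of an R x C block."""
    c_r, c_c = dct_matrix(len(f)), dct_matrix(len(f[0]))
    return matmul(matmul(c_r, f), transpose(c_c))


def idct_2d_matrix(coeffs):
    """Inverse transform; valid because the assumed matrices are orthonormal."""
    c_r, c_c = dct_matrix(len(coeffs)), dct_matrix(len(coeffs[0]))
    return matmul(matmul(transpose(c_r), coeffs), c_c)


block = [[2.0] * 8 for _ in range(4)]          # a 4 x 8 uniform block
coeffs = dct_2d_matrix(block)
restored = idct_2d_matrix(coeffs)
print(round(coeffs[0][0], 4))                  # 11.3137, i.e. 2 * sqrt(4 * 8)
```

Each output coefficient is an independent dot product, which is why this form maps naturally onto parallel hardware.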
2.5. Discrete Wavelet Transform (Stage B)
Since its first theoretical formulation by J. Morlet, A. Grossmann [26] and Y. Meyer [27],
wavelet theory has been applied with effective results to a variety of signal processing
applications in the fields of data compression, image compression, image processing,
time-frequency spectral estimation etc. [25]. Daubechies [28] and Mallat [29] conducted the
pioneering work bridging the gap between the wholly mathematical wavelet and its digital
signal processing counterpart. A predominant feature of wavelets and wavelet analysis, by
which it facilitates digital signal processing, lies in its ability to effectively represent a
signal in its time-frequency equivalent [30]. This time-frequency conversion property of
the DWT remaps a signal's transient components to a position on the time-frequency
plane, which represents the predominant frequency of that component at the particular
time of its occurrence. Hence each coefficient in the transform is determined by taking an
inner product between the input function and a suitably chosen wavelet basis function.
This value then represents, in some sense, the degree of similarity between the input
function and that particular basis function [31]. If the basis functions are orthogonal (or
orthonormal), then an inner product taken between two distinct basis functions is zero,
indicating that these are all completely dissimilar. Therefore, if the input image is composed of
components that are similar to one, or a few, of the basis functions, then all but one, or a
few, of the coefficients will result in relatively small values.
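The inner product view can be made concrete with a very small orthonormal basis. The sketch below uses the four-sample Haar basis purely as an illustrative assumption (the text does not tie the argument to a particular wavelet), and the names are hypothetical.

```python
import math

# Wavelet coefficients as inner products with an orthonormal basis.
# The 4-sample Haar basis is used here only as a convenient example.

R2 = math.sqrt(2.0)
HAAR_BASIS = [
    [0.5, 0.5, 0.5, 0.5],            # overall average
    [0.5, 0.5, -0.5, -0.5],          # coarse difference between halves
    [1 / R2, -1 / R2, 0.0, 0.0],     # fine detail, first half
    [0.0, 0.0, 1 / R2, -1 / R2],     # fine detail, second half
]


def inner(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def analyse(signal):
    """One coefficient per basis function: its similarity to the signal."""
    return [inner(signal, basis) for basis in HAAR_BASIS]


def synthesise(coeffs):
    """Rebuild the signal as a weighted sum of the basis functions."""
    return [sum(c * basis[i] for c, basis in zip(coeffs, HAAR_BASIS))
            for i in range(4)]


signal = [3.0, 3.0, -3.0, -3.0]      # shaped like the second basis function
coeffs = analyse(signal)
print(coeffs)                        # only the second coefficient is non-zero
print(synthesise(coeffs))            # recovers the original signal
```

Because the example signal is a scaled copy of one basis function, a single coefficient carries all of its energy, which is the property a coder exploits.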
The application of wavelets to images for compression purposes stems from the analysis of
typical images, which tend to indicate that even though areas of large spatial activity are
easily identified, extensive areas of low spatial variance or substantially uniform areas can
be identified as well. Since the rapidity of change in spatial variance can be expressed in
terms of spatial frequency components, [32] suggested that typical images have strong
low-frequency components that necessitate preservation. Given that a DWT represents spatial
components in time-frequency space, when applied to an image this results in the
separation of the vertical and horizontal spatial frequency components into
multi-resolution subbands. Figure 2-16 depicts the 'Lena' image in multiple subbands each at a
different resolution. An excellent comparison between different wavelet based subband
transforms is presented in [33].
Figure 2-16: 'Lena' image in multi-resolution subbands 
When this concept is applied to a 2D image and a suitable quantisation and entropy coding
scheme is applied on the generated subbands, the result has been shown [22] to outperform
most DCT based compression systems, especially at low bandwidths. At lower
compression levels, however, where quantisation methods above 1 bpp are used, [34] has
shown that DCT based schemes with the same source coder result in improved image
quality. The main tradeoffs between the very common DCT based systems and the
wavelet based systems lie between the complexity of implementation and efficiency.
Wavelet based schemes often provide a more efficient compression scheme at low cost in
complexity.
To obtain an efficient compression rate, once the image is transformed into the form of
Figure 2-16, a psycho-visual quantisation scheme is applied in varying degrees to the
different subbands, with the aim of minimising the possible significant coefficient count.
Figure 2-17 shows the 'Lena' image quantised at 0.19 bpp using a DWT method and a
relative entropy encoder. It exhibits an obvious improvement in visual quality over the 0.2
bpp DCT image in Figure 2-13.
Figure 2-17: DWT of 'Lena' image @ 0.19 bpp
2.5.1. Filter Based Wavelet Decomposition
The purely mathematical approach to performing a subband decomposition of a 2D
image in theory involves the computation of the inner products of a basis mother
wavelet and components of the image [35]. This however, in terms of computational
complexity, is quantifiably comparable to performing a standard tried and proven
DCT, if not higher. Fortunately a technique introduced by [36] has shown that 2D
wavelet decomposition is easily approximated by solely performing a selected set of
simple filter functions on the image. This fast DWT technique realises the subband
decomposition by iteratively performing a two-band subdivision of the low-frequency
image components into high and low pass frequency segments. Since the greatest
frequency component in each filtered sequence is bounded to half that of the original, both
filtered streams can be sub-sampled by two without being subjected to aliasing effects.
Figure 2-18 illustrates a typical subband decomposition scheme for the iterative
realisation of a one-dimensional DWT by employing high-pass and low-pass filters and
coefficient down-sampling.
[Block diagram: the input f passes through a chain of LPF stages; at each stage an HPF branch produces the outputs g1, g2 and g3]
Figure 2-18: DWT subband decomposition via filters
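The iterative two-band split of Figure 2-18 can be sketched as follows. This is a minimal illustration that assumes the simplest possible filter pair (Haar pairwise average and difference); the thesis does not prescribe these filters here, and the function names `two_band_split` and `dwt_1d` are hypothetical.

```python
def two_band_split(x):
    """Filter and sub-sample by two: returns (low-pass, high-pass) halves.

    Assumes Haar averaging/differencing filters and an even-length input.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    pairs = list(zip(x[0::2], x[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def dwt_1d(x, scales):
    """Repeatedly split the low-pass branch, collecting one high-pass band per scale."""
    low = list(x)
    details = []
    for _ in range(scales):
        low, high = two_band_split(low)
        details.append(high)
    return low, details


approx, details = dwt_1d([4, 2, 6, 8], scales=2)
print(approx, details)
```

Only the low-pass output is fed to the next stage, so each iteration halves the amount of data still to be processed.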
The maximum number of iterations, I, that can be performed is related to the size of the
data vector being transformed and is described by (Eq. 2.9), where N is the number of
elements in the array, which is not odd and is exactly divisible by 2^I, and floor(x) is a
function that returns x truncated to 0 decimal places.
I = \mathrm{floor}(\log_2(N))
(Eq. 2.9)
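A short sketch of (Eq. 2.9) is given below. The function names are hypothetical; `usable_scales` is an added helper, not part of the thesis, that checks the stated divisibility condition by counting how many times N can be halved exactly.

```python
def max_scales(n):
    """floor(log2(n)) for a positive integer n, computed without floating point."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n.bit_length() - 1


def usable_scales(n):
    """Largest I such that n is exactly divisible by 2**I."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


print(max_scales(256), usable_scales(256))
print(max_scales(48), usable_scales(48))
```

For a power-of-two length the two values agree; for a length such as 48 the divisibility condition is the tighter limit.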
Literature tends to suggest that the use of three, four or five iterations or scales 
provides a good balance between performance and complexity [37]. Reconstruction of 
the image is just the reapplication of similar filters followed by an up-sampling and 
interpolation process conducted on both sets of data that are then re-combined via a 
summation process. 
The key to performing an effective DWT and its inverse is heavily dependent on the
quality of selected transform filters [39]. Most modern wavelet transform techniques
employ Quadrature Mirror Filters (QMF) for the two-band filters, as aliasing caused by
the forward transform while using such filters is cancelled in the inverse transform,
hence providing near perfect reconstruction [38]. Furthermore, by choosing an
orthogonal set of filters, a 2D transform is easily realised by alternately performing
multiple single dimensional transforms on the rows and columns. The subband
subdivision that results from a 2D wavelet transform is illustrated in Figure 2-19. The
like colours in the figure represent the frequency components, increasing in spatial
frequency as the size increases, for the horizontal, vertical and the diagonal. Figure
2-20 illustrates the first two decompositions of an image using a 2D transform. An
interesting attribute of the DWT is that it can be easily performed on an entire image
unlike the DCT, because the filtering process restricts the bit size of the resultant
coefficients to within the vicinity of the sizes of the image values. Imperfections,
however, exist in the lack of ability to represent the real numbered filter coefficients
precisely in different implementation scenarios. Some common filters used, listed in
decreasing order of compression efficiency and complexity [40], include the Villasenor
9-7 filter, the Daubechies 8 tap filter, the Villasenor 2-6 binary filter, the triangular
binary filter and the Haar binary filter. These filters belong to the category of QMFs,
and Table 2-3 lists the standard coefficients used to evaluate these filters.
[Diagram: Original Image -> First Stage -> Second Stage -> Third Stage]
Figure 2-19: 2D DWT subband subdivision
[Block diagram: the image passes through LPF / HPF filters followed by subsampling, first in one dimension and then in the other, producing the LL, LH, HL and HH subbands]
Figure 2-20: Subband filtering on an image
Table 2-3: List of common wavelet filter coefficients

NON-INTEGER WAVELETS
Coefficient   Daubechies-8 (H0)   Villasenor-9/7 (H0)   Villasenor-9/7 (G0)
W(0)           0.230378            0.037828             -0.064539
W(1)           0.714847           -0.023849             -0.040689
W(2)           0.630881           -0.110624              0.418092
W(3)          -0.027984            0.377402              0.788486
W(4)          -0.187035            0.852699              0.418092
W(5)           0.030841            0.377402             -0.040689
W(6)           0.032883           -0.110624             -0.064539
W(7)          -0.010597           -0.023849
W(8)                               0.037828

BINARY (INTEGER) WAVELETS
Coefficient   Two-Six/√2 (H0)   Two-Six/√2 (G0)   Triangular (H0)   Triangular (G0)   Haar * √2 (H0)
W(0)          1/2               -1/16             0                 1/2               1
W(1)          1/2               1/16              1                 1                 1
W(2)                            1/2               0                 1/2
W(3)                            1/2
W(4)                            1/16
W(5)                            -1/16
[Plot: PSNR against wavelet coefficient precision (8 to 30 bits) for the Daubechies-8, Villasenor-9/7, Two-Six, Triangular and Haar filters]
Figure 2-21: Effect of precision on filter performance
Figure 2-21 is a depiction of the performance of some of the more common DWTs
where the register precision has been limited in the number of bits. The quick
convergence to near perfection of the binary filters (namely Triangular and Haar)
should be noted.
2.5.2. Triangular Binary Wavelet 
Typically, realising a DWT requires the application of a convolution between the signal
and the filter coefficients. For larger multi-tap, non-integer filters such as Villasenor,
the computational complexity generally limits the method primarily to software. The
research performed in [44] suggests that the triangular binary wavelet is not only simple
in nature for hardware implementation but also performs sufficiently as a transform
filter for the DWT.
Figure 2-22 illustrates the 'Lena' image quantised to 0.2 bits per pixel using (a)
Villasenor filters and (b) triangular filters. As is apparent, the variance in visual
quality of the images is very subtle, with the Villasenor filters
accommodating slightly higher frequency components.
Figure 2-22: 'Lena' image @ 0.2 bpp: (a) Villasenor, (b) Triangular
The simplicity, in terms of implementation, of the triangular wavelet transform is
attributed to three key factors:
[Plots of the H0 and G0 filter coefficients: magnitude against sample position (x-1, x, x+1)]
Figure 2-23: High & low pass triangular filter coefficients
1. The low-pass coefficients are directly copied. Figure 2-23 indicates that the
low-pass coefficients are generated from the H0 = [0 1 0] filter. Since the only
multiplier is a factor of one, the generated coefficients are literally a
sub-sampled version of the original coefficients (or image values). Therefore the
low-pass component consists of each alternate coefficient from the original
set.
2. High-pass coefficients only require shifts and summations. Figure 2-23
also shows that the high-pass filter, G0 = [0.5 1 0.5], consists of two divisions
and two summations. The 0.5-filter coefficients are implemented by simple
binary right-shifts of the image coefficients, while the 1-filter coefficients are
realised by a simple copy. The resultant G0 is attained by summing these three
components together.
3. 2D performed via multiple 1D transforms. Due to its orthogonal nature, the
2D transform is easily realised by performing several single dimensional
transforms on the rows and columns of the image independently. These 1D
transforms can be conducted in parallel in a single direction (either rows or
columns) at a time.
The algorithm for the forward and inverse 2D triangular DWT can be seen in Figure 2-24 
and Figure 2-25, respectively. 
For S = 1 to scales
    X = select all low pass coefficients in scale
    First perform on columns
        Low = all odd index valued column coefficients in all rows
        High = all even index valued column coefficients in all rows
        T1 = left column shift all Low values
        T1 = T1 + T1(Last Column)
        High = ((2*High) - T1 - Low)/2
        Store Low and High in output
    Perform Rows
        Low = all odd index valued row coefficients in all columns
        High = all even index valued row coefficients in all columns
        T1 = left row shift all Low values
        T1 = T1 + T1(Last Row)
        High = ((2*High) - T1 - Low)/2
        Store Low and High in output
End for
Figure 2-24: Forward triangular DWT algorithm
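The one-dimensional core of the forward algorithm, together with the inverse obtained by reversing the sign of the correction term, can be sketched as below. This is an illustrative reading of the pseudo-code rather than the thesis implementation: the function names are hypothetical, indices are 0-based (so the "odd" samples of the figure are `x[0::2]`), and the boundary step `T1 + T1(Last Column)` is assumed to mean that the last Low sample is repeated to fill the position vacated by the left shift.

```python
def tri_forward_1d(x):
    """One scale of the triangular transform on a single row or column.

    Low samples are copied; each High sample has the mean of its two
    neighbouring Low samples subtracted: High = ((2*High) - T1 - Low) / 2.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    low = list(x[0::2])
    high = list(x[1::2])
    t1 = low[1:] + low[-1:]  # left shift, last sample repeated (assumption)
    detail = [(2 * h - a - b) / 2 for h, a, b in zip(high, t1, low)]
    return low, detail


def tri_inverse_1d(low, detail):
    """Undo tri_forward_1d: High = ((2*High) + T1 + Low) / 2, then interleave."""
    t1 = low[1:] + low[-1:]
    high = [(2 * d + a + b) / 2 for d, a, b in zip(detail, t1, low)]
    out = []
    for l, h in zip(low, high):
        out.extend([l, h])
    return out


ramp = [0, 1, 2, 3, 4, 5, 6, 7]
low, detail = tri_forward_1d(ramp)
print(low, detail)
print(tri_inverse_1d(low, detail))
```

On a linear ramp every interior high-pass value is zero, since each sample equals the mean of its neighbours; only the boundary sample leaves a residual. Applying the same routine to every row and then every column gives one 2D scale.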
For S = 1 to scales
    X = select all low pass coefficients in scale
    First perform on columns
        Low = all odd index valued column coefficients in all rows
        High = all even index valued column coefficients in all rows
        T1 = left column shift all Low values
        T1 = T1 + T1(Last Column)
        High = ((2*High) + T1 + Low)/2
        Store Low and High in output
    Perform Rows
        Low = all odd index valued row coefficients in all columns
        High = all even index valued row coefficients in all columns
        T1 = left row shift all Low values
        T1 = T1 + T1(Last Row)
        High = ((2*High) + T1 + Low)/2
        Store Low and High in output
End for
Figure 2-25: Inverse triangular DWT algorithm
2.5.3. Pyramidal vs. Nucleic Wavelet Blocks
Coefficients generated when evaluating a DWT are typically organised in one of two
representation formats: pyramidal as in [22] or nucleic block based as in [23] [24].
Figure 2-26 contrasts the two for a 3-scale transform
conducted on an 8 x 8 block. As can be observed, the positioning of the resultant
coefficients varies quite significantly.
Pyramidal                      Nucleic
 1  2  3  4  5  6  7  8         1  5  3  6  2  7  4  8
 9 10 11 12 13 14 15 16        33 37 34 38 35 39 36 40
17 18 19 20 21 22 23 24        17 13 19 14 18 15 20 16
25 26 27 28 29 30 31 32        41 45 42 46 43 47 44 48
33 34 35 36 37 38 39 40         9 21 11 22 10 23 12 24
41 42 43 44 45 46 47 48        49 53 50 54 51 55 52 56
49 50 51 52 53 54 55 56        25 29 27 30 26 31 28 32
57 58 59 60 61 62 63 64        57 61 58 62 59 63 60 64
Figure 2-26: Pyramidal vs. Nucleic coefficient arrangement
The pyramidal format stems mainly from literature and more traditional techniques
employed to perform the DWT [41]. It arranges the coefficients in a manner that is
easy to view and access, with related coefficients being scattered throughout the block
or coefficient map. The nucleic scheme originates as a by-product of performing the
DWT in a parallel array based manner, and its main advantage is that all related
coefficients belonging to a particular set of spatial frequencies are grouped together in a
single nucleic block, whereas the pyramidal scheme can distribute these contents across
the entire image coefficient map. Therefore, the nucleic block based approach is more
appealing for hardware implementation where long paths present problems.
Figure 2-27 represents a 16 x 16 pixel image block, transformed using three scales and
arranged into the pyramidal format. As can be noted, a large number of
related coefficients in different subbands span nearly half the width of the array. In
comparison Figure 2-28 represents a 16 x 16 block illustrating the nucleic block
scheme. A significant number of connections remain quite short in distance; however,
a few connections can travel the maximum distance of 2^N pixels, where N represents
the number of scales. The example connections provided in the two figures illustrate
the savings in connection distance that can occur if nucleic blocks are employed.
2.6. Quantisation (Stage C)
The coefficient quantisation stage introduces the primary lossy component in current
standard image/video compression codecs. The pre-calculated loss in the number of bits used
to represent each of the coefficients has resulted in the term 'lossy compression' being
identified with this stage. Hence, coefficients that have been quantised generally approach
the value of zero or are represented with fewer range gradations. Considering an
analogue range represented by 8-bit coefficient values, it is easily identified that
representing these values with 6-bit values reduces the number of range divisions from 256
to 64. Therefore representing the range 0-255 by 6 bits of precision results in the
coefficients being grouped into multiples of 4 levels (e.g. 0, 4, 8, 12, ..., 252), taken from the
original step size. A typical quantisation equation, especially in relation to hardware
implementation, can be seen in (Eq. 2.10), where C is the original coefficient, C' is the
quantised coefficient, N the original number of bits used to represent the coefficient and
M the new number of bits used to represent the coefficient.
C' = \mathrm{floor}(C \times 2^{(M-N)})
(Eq. 2.10)
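For integer coefficients and M no greater than N, the multiplication by 2^(M-N) followed by a floor reduces to an arithmetic right shift, which is what makes this form attractive in hardware. The sketch below assumes exactly that case; the function names `quantise` and `reconstruct` are hypothetical.

```python
def quantise(c, n_bits, m_bits):
    """floor(c * 2**(m_bits - n_bits)) for integer c, done as a right shift.

    Assumes m_bits <= n_bits. Python's >> floors negative values as well.
    """
    return c >> (n_bits - m_bits)


def reconstruct(c_q, n_bits, m_bits):
    """Map a quantised value back to the lower edge of its original range."""
    return c_q << (n_bits - m_bits)


levels = sorted({reconstruct(quantise(c, 8, 6), 8, 6) for c in range(256)})
print(len(levels), levels[:4], levels[-1])
```

Going from 8 to 6 bits leaves 64 distinct levels spaced 4 apart, matching the example in the text.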
 1  1  2  2  3  4  3  4  5  6  7  8  5  6  7  8
 1  1  2  2 11 12 11 12 13 14 15 16 13 14 15 16
 9  9 10 10  3  4  3  4 21 22 23 24 21 22 23 24
 9  9 10 10 11 12 11 12 29 30 31 32 29 30 31 32
17 18 17 18 19 20 19 20  5  6  7  8  5  6  7  8
25 26 25 26 27 28 27 28 13 14 15 16 13 14 15 16
17 18 17 18 19 20 19 20 21 22 23 24 21 22 23 24
25 26 25 26 27 28 27 28 29 30 31 32 29 30 31 32
33 34 35 36 33 34 35 36 37 38 39 40 37 38 39 40
41 42 43 44 41 42 43 44 45 46 47 48 45 46 47 48
49 50 51 52 49 50 51 52 53 54 55 56 53 54 55 56
57 58 59 60 57 58 59 60 61 62 63 64 61 62 63 64
33 34 35 36 33 34 35 36 37 38 39 40 37 38 39 40
41 42 43 44 41 42 43 44 45 46 47 48 45 46 47 48
49 50 51 52 49 50 51 52 53 54 55 56 53 54 55 56
57 58 59 60 57 58 59 60 61 62 63 64 61 62 63 64
Figure 2-27: Related coefficient distance in pyramidal scheme
 1  5  3  6  2  7  4  8  1  5  3  6  2  7  4  8
33 37 34 38 35 39 36 40 33 37 34 38 35 39 36 40
17 13 19 14 18 15 20 16 17 13 19 14 18 15 20 16
41 45 42 46 43 47 44 48 41 45 42 46 43 47 44 48
 9 21 11 22 10 23 12 24  9 21 11 22 10 23 12 24
49 53 50 54 51 55 52 56 49 53 50 54 51 55 52 56
25 29 27 30 26 31 28 32 25 29 27 30 26 31 28 32
57 61 58 62 59 63 60 64 57 61 58 62 59 63 60 64
 1  5  3  6  2  7  4  8  1  5  3  6  2  7  4  8
33 37 34 38 35 39 36 40 33 37 34 38 35 39 36 40
17 13 19 14 18 15 20 16 17 13 19 14 18 15 20 16
41 45 42 46 43 47 44 48 41 45 42 46 43 47 44 48
 9 21 11 22 10 23 12 24  9 21 11 22 10 23 12 24
49 53 50 54 51 55 52 56 49 53 50 54 51 55 52 56
25 29 27 30 26 31 28 32 25 29 27 30 26 31 28 32
57 61 58 62 59 63 60 64 57 61 58 62 59 63 60 64
Figure 2-28: Related coefficient distance in nucleic scheme
Since uniform quantisation over all coefficients does not necessarily yield an
efficient quantising codec, three key factors influence the choice of quantisation scheme:
1. Coefficient magnitudes resulting from a particular transform - Typically 
the quantisation magnitude, in terms of its coefficients, is proportional to the 
average magnitude of the transform coefficients. For instance, since 
magnitudes resulting from an 8x8 DCT can increase up to 64 times the original 
pixel values, a much larger quantisation coefficient is applied when compared to 
wavelets which only double in value. 
2. Coefficient significance, in reference to its corresponding spatial 
frequency - Depending on the delocalisation algorithm employed to perform 
the transform, varying quantisation levels are applied to coefficients 
representing different spatial frequencies. Typically, coefficients representing 
less significant spatial frequencies are quantised more than those of more 
significance. For instance the DC coefficient in a DCT is quantised relatively 
less in comparison to the rest. Figure 2-29 represents the pattern of a typical
DCT significance coefficient map with the more significant coefficients located
closer to the top left corner (brighter) and less significant coefficients (darker)
arranged close to the bottom right. The brighter components are quantised
less in comparison to the darker components typically following a psycho-visual 
quantisation matrix. 
[Significance map: DC coefficient at the top left corner]
Figure 2-29: DCT Significance Map
3. The accommodation of a particular quantisation scheme in a selected
algorithm - Commonly the quantisation scheme chosen is designed to fully
integrate with the encoding algorithm as it directly influences the final bit-count for
a particular image or transform. Inadequacies in the quantisation scheme
generally result in the reduction of features or efficiency of the coder. For
instance, the quantisation scheme typically used with a DCT is not necessarily
optimal for a wavelet-based coder. In addition, features such as multi-resolution
and scalable video, which are more suited to wavelet based systems due to their
subband hierarchy, require special considerations like successive approximation
to improve their efficiency.
2.6.1. DCT Quantisation
The standards H.261, H.263 and MPEG 1, 2 & 4, which employ some form of the DCT in
the base video coding algorithm, use predetermined tables of quantisation values for
the coefficients resulting from the DCT. Since the block size of a DCT is fixed at 8 x 8
coefficients, a fully researched quantisation value-set from a selected list is applied
individually to all the coefficients, and an index to the list is supplied to the decoder to
enable correct reconstruction.
[Block diagram: Coefficients -> Quantisation -> Quantised Coefficients; a Selection from the Quantisation Matrix List feeds the Quantisation block and produces the Index]
Figure 2-30: DCT Quantisation
Figure 2-30 illustrates the quantisation process as carried out for each 8 x 8 coefficient 
block. Although the standards do not dictate the selection system it is generally based 
on the average energy of each block. Recent algorithms such as MPEG 2 and 4 also 
allow for the inclusion of a user defined quantisation matrix if required. 
2.6.2. Subband Quantisation 
[Subband map with the quantisation matrices Q1 - Q4 assigned to the subbands]
Figure 2-31: Subband Quantisation
Subband quantisation is a relatively newer method when compared to the more 
traditional uniform / non-uniform quantisation techniques applied on static sized 
frequency blocks [22]. Its existence is generally attributed to the advent of wavelet 
based image decomposition techniques, and as such it is generally closely coupled with 
the DWT when used for image / video compression. Since the DWT performs a 
logarithmic subdivision of the image into multi-resolution subbands, each subband can 
be quantised with a different uniform or non-uniform quantisation matrix. However, 
the main hurdle that needs to be overcome is the development of a
prioritisation scheme that efficiently allocates the bit budget across all the subbands. In
Figure 2-31 the quantisation matrices Q1 - Q4 can be derived in order of importance.
[35] has shown that the lowest subbands tend to require more precision in general as
compared to the higher frequency subbands. The bit budget can therefore be
allocated accordingly. A further, more analytical, improvement suggested in [22] involves applying
successive approximation to each of the coefficients in each of the bands. This
not only allows for an efficient allocation of bits to the more important subbands but
also allows for features such as progressive video coding and precise rate control. 
Chapter 3 tackles the concepts of successive approximation more thoroughly. 
2.7. Reconstructed Image Quality
Since quantisation generally degrades the quality of the reconstructed or decoded image, it 
becomes vital that this be quantitatively or qualitatively expressed. This aids in the 
comparison of different compression systems and their effect on different image types. 
2.7.1. Peak Signal to Noise Ratio (PSNR) 
The PSNR has become the adopted standard measure employed to compare image
quality between different quantisation and coding systems. The PSNR is a quantitative
measure and is calculated by evaluating (Eq. 2.11), which takes into account the original
and the reconstructed image. It is generally accepted in the field that a PSNR of 25 dB
or higher is of acceptable image quality.
PSNR = 10 \log_{10}\left( \frac{\sum I^2}{\sum (R - O)^2} \right) dB
(Eq. 2.11)
Where I represents an N x M block of peak signal values (i.e. 2^{number of bits} - 1), R is an
N x M block for the reconstructed image, and O is an N x M block for the original
image. N x M represents the number of pixels in an image.
PSNR is a computational method for comparing image quality; however, simple scaling
factors in the reconstructed image, which can be deemed acceptable after visual
appraisal, easily defeat this PSNR test.
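A minimal sketch of (Eq. 2.11) follows, assuming images stored as lists of rows of integer pixel values and a peak value of 2^bits - 1 at every pixel. The function name `psnr` and its `bits` parameter are hypothetical.

```python
import math


def psnr(original, reconstructed, bits=8):
    """PSNR in dB: 10*log10(sum(I^2) / sum((R - O)^2)) with I the peak value."""
    peak = 2 ** bits - 1
    peak_energy = 0
    error_energy = 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            peak_energy += peak * peak
            error_energy += (r - o) ** 2
    if error_energy == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak_energy / error_energy)


original = [[10, 20], [30, 40]]
shifted = [[11, 21], [31, 41]]  # every pixel off by one grey level
print(psnr(original, shifted))
```

Because every pixel contributes the same peak term, this is equivalent to the common form 10 log10(peak^2 / MSE).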
2.7.2. Visual Quality 
Even though the PSNR measure is a reasonable estimate, a full visual appraisal is also 
often performed on reconstructed images. Artefacts such as image scaling easily 
mislead the PSNR, while blocking artefacts which affect the PSNR less may be 
considered visually unacceptable. Figure 2-32 and Figure 2-33 both show images 
quantised to a PSNR of 30 dB. However one might consider Figure 2-33 to be more 
visually pleasing. 
Figure 2-32: 0.21 bpp Image with 30 dB PSNR
Figure 2-33: 0.19 bpp Image with 30 dB PSNR
2.8. Entropy Coding (Stage D)
Section 2.6 introduced a lossy component in the general image compression algorithm.
The lossy component typically reduces the number of bits required to represent a
coefficient or a set of coefficients. However, it does not provide a mechanism for the
elimination of redundant data usually found inherent to the images or video. For instance,
large areas that are the same in colour or texture typically produce a set of coefficients with
a few larger in value and the majority very close to zero. Entropy coding is based on the
loss-less coding of these large expanses of similar valued coefficients. Generally the more
efficient codecs (MPEG 4, wavelet based codecs etc.) tend to employ an adaptive method
to identify and code redundant coefficient areas [42] [43]. Some entropy coding techniques
are outlined below.
2.8.1. ZigZag & Run Length Coding 
Zigzag and run-length coding [8] originate from a typical block-based image or video
codec such as JPEG, H.261, H.263, MPEG 1 [12], MPEG 2, etc., which generally use
the DCT transform and related quantisation. The process has two main purposes:
1. To convert a 2D block into a 1D vector of coefficients for ease of transmission.
2. To minimise the redundancy of repeated coefficients within the vector.
Figure 2-34 is a typical zigzag-coding map employed to code a typical 8 x 8 DCT
coefficient block. It exploits the key features of a DCT and associated quantisation
mechanism to encode coefficients where the high frequency coefficients are
significantly quantised or zero, therefore producing a coefficient list that contains
large DC values at the beginning and smaller or zero coefficients towards the end.
This enables a simple redundancy coding technique, i.e. run-length coding, to be applied
to group like high frequency components within the block together.
Figure 2-34: Zigzag Coding Technique 
Run-length coding is a technique that generates a symbol to represent a larger run of
one particular type of coefficient followed by another. For example, in Figure 2-35
different symbols can be allocated for the following runs.
SymbA = 2221
SymbB = 00002
SymbC = 00001
| 110 | -40 | 58 | 10 | 2 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | ... |
Figure 2-35: Run Length Coding Example 
The symbols employed are typically statistically chosen by investigation of a large 
volume of images, and are typically fixed for a particular standard [8] [11]. Finally these 
symbols can then be Huffman or arithmetic coded to eliminate nearly all redundancy. 
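The two steps described in this section can be sketched as follows. The zigzag order generated here is the conventional anti-diagonal scan used by JPEG-style codecs, and the run-length stage is simplified to plain (value, run length) pairs rather than the statistically chosen symbols of a real standard; both function names are hypothetical.

```python
def zigzag_scan(block):
    """Read a square block along anti-diagonals, alternating direction."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        out.extend(block[r][s - r] for r in rows)
    return out


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


block = [
    [9, 4, 1, 0],
    [4, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
vector = zigzag_scan(block)
print(vector)
print(run_length_encode(vector))
```

With the significant values concentrated in the top left corner, the scan pushes the zeros to the tail of the vector, where a single run covers them.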
2.8.2. Zerotree Coding Overview 
This form of entropy coding is fully covered in Chapter 3, as this thesis is based on the 
implementation of such a codec within a specific parallel architecture. The zerotree 
coding technique is generally performed on full image transforms such as the Discrete 
Wavelet Transform, and as such exploits the self-similarity or relations between 
subbands resulting from a DWT. These relations are then used to efficiently code large 
areas of insignificant coefficients with a few symbols. The quantisation schemes used
also tie in very closely with the zerotree coding and are generally chosen to
accommodate some key features.
2.8.3. VLC Coding
Variable Length Coding is a form of loss-less coding that codes symbols with a
statistically chosen variable bit-length code. This type of final entropy coding is
typically used in standards such as JPEG, MPEG and H.261/3, which have a
statistically defined set of variable length codes to accommodate any variations in the
image. VLC coding, especially in the mentioned situations, has two simple flaws:
1. The minimum code generated is of integer bit-length (i.e. 1 bit per symbol).
2. It does not adapt to any image characteristics.
The latter is easily solved by applying an adaptive algorithm such as Adaptive Huffman
Coding [46] [47] [48], which attempts to 'customise' the coding statistics to the image or
sequence at hand. However, the complexity and memory requirements increase
significantly.
The generation of at least one bit per symbol is a flaw inherent to VLC coding techniques and
as such is not easily overcome. Typically the more complex yet efficient Arithmetic
Coding technique is employed to overcome this problem.
2.8.4. Arithmetic Coding 
Arithmetic coding is a statistical coding technique that attempts to represent source data
with minimal entropy [49]. It is not a table lookup based approach like VLC /
Huffman coding, therefore it does not require each symbol to be represented by an
integer number of bits. Also, more than one symbol can be represented in less than
one bit. It is this feature that makes this a better alternative to variable length coding as
it approaches the Shannon [50] entropy bound.
The algorithm is based on an infinite range of real values ranging from 0 to 1. An
example best illustrates the operation of the coder. Taking a 4 symbol system, with
symbols A, B, C and EOT having probabilities of occurrence 0.4, 0.3, 0.2 and 0.1
respectively, transmitted in the sequence A C B EOT, Figure 2-36
describes the operation.
[Diagram: the range 0.0 - 1.0 is divided into EOT (0.0 - 0.1), C (0.1 - 0.3), B (0.3 - 0.6) and A (0.6 - 1.0) and subdivided again for each coded symbol; the value transmitted is taken from the final range]
Figure 2-36: Arithmetic Coding Example 
The first A limits the range to between 0.6 and 1.0, then each of the other symbols alters
the range according to (Eq. 2.12).
Low_N = Low_O + Range_O \times Q_L
High_N = Low_O + Range_O \times (Q_L + P_N)
(Eq. 2.12)
Where Q_L is the lower accumulated probability and P_N is the newer probability. Finally
a value between 0.664 and 0.6664 is chosen and sent.
The decoder initially starts off with 0 and 1 as the range. Since the number
received is greater than 0.6 it knows the first symbol is an A; it then mimics the
encoder until an EOT is extracted. This signifies the end of the symbol stream.
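The worked example can be reproduced with the short sketch below, which applies (Eq. 2.12) using exact fractions so that the range boundaries are not blurred by floating-point rounding. It is an idealised illustration with unlimited precision, not a practical coder; the `MODEL` table layout and the function names are assumptions.

```python
from fractions import Fraction

# symbol -> (lower accumulated probability Q_L, probability P), as in the example
MODEL = {
    "EOT": (Fraction(0), Fraction(1, 10)),
    "C": (Fraction(1, 10), Fraction(2, 10)),
    "B": (Fraction(3, 10), Fraction(3, 10)),
    "A": (Fraction(6, 10), Fraction(4, 10)),
}


def encode(symbols):
    """Return the final (low, high) range after narrowing once per symbol."""
    low, rng = Fraction(0), Fraction(1)
    for s in symbols:
        q, p = MODEL[s]
        low = low + rng * q  # Low_N = Low_O + Range_O * Q_L
        rng = rng * p        # High_N - Low_N = Range_O * P_N
    return low, low + rng


def decode(value):
    """Mimic the encoder, picking the sub-range holding value, until EOT."""
    out = []
    low, rng = Fraction(0), Fraction(1)
    while not out or out[-1] != "EOT":
        for s, (q, p) in MODEL.items():
            lo = low + rng * q
            if lo <= value < lo + rng * p:
                out.append(s)
                low, rng = lo, rng * p
                break
        else:
            raise ValueError("value lies outside the coding range")
    return out


low, high = encode(["A", "C", "B", "EOT"])
print(float(low), float(high))
print(decode((low + high) / 2))
```

Any value inside the final range decodes to the same symbol sequence; the midpoint is used here only for convenience.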
2.9. Image / Video Coding System
With these basic blocks in mind a general video / image compression codec can be 
realised. Figure 2-37 is a block diagram representing a typical video / image encoder while 
Figure 2-38 illustrates a block diagram of the Decoder. This thesis is based on the 
development of a zerotree coding technique that can be modelled into such a system but 
with consideration for implementation in a massively parallel environment. A survey of 
some VLSI architectures is presented in [72]. 
[Block diagram, Image Coding: Image or Sequence, Motion Analysis, Motion Prediction, Block / Image Fwd. Transform, Block / Image Inv. Transform, Quantisation, Quantisation^-1, 1 or 2 Phase Entropy Coding, 1 or 2 Phase Entropy Decoding, Compressed Stream]
Figure 2-37: Image / Video Encoding
[Block diagram, Image Decoding: Compressed Stream, 1 or 2 Phase Entropy Decoding, Quantisation^-1, Block / Image Inv. Transform, Motion Prediction, Image or Sequence]
Figure 2-38: Image / Video Decoding
2.10. Conclusion 
This chapter has presented a number of fundamental principles surrounding modern
image and video compression methods. The concepts introduced include explanations
of images, video, image sampling, image/video compression, test images used and
codec components such as motion analysis and compensation (Block Matching
Motion Estimation, frame differencing), image transforms (wavelets, Discrete Cosine
Transform), image quantisation (subband), image/video coding schemes (zigzag,
run-length) and entropy coding (Variable Length Coding, arithmetic). This chapter is
intended as an introduction and a base for the chapters to follow.
Chapter 3
ZEROTREE CODING
"My designation is Seven of Nine, Tertiary Adjunct of Unimatrix Zero-One, but you
may call me Seven of Nine."
Borg drone 7 of 9 introducing herself, Star Trek: Voyager
3.1. Introduction 
In 1993 Jerome M. Shapiro [22] introduced a simple, yet remarkably effective image-coding
algorithm based on a wavelet transform. He combined the wavelet transform with a novel
representation of the transformed coefficients and termed this scheme zerotree coding.
Zerotree coding, analogous to the Borg drone's request, can represent a larger piece of
information in a more succinct manner. This hierarchical technique provided a means to
represent the coefficients in terms of their magnitude, position and significance. The
Embedded Zerotree Wavelet (EZW) coding technique exploited the hierarchical nature of
the wavelet transform to categorise coefficients of similar spatial position, but at different
frequencies, into trees that were based on the significance of the coefficients. Since its
original acceptance many variations of this technique have been applied to image and video
coding schemes alike, each providing some form of improvement over the standard coder
[51] [52] [53]. In 1996 S. A. Martucci and I. Sodagar [54] also introduced a version of the
zerotree coder based on a modified set of symbols that provided more uniform results
compared to the EZW coder. It was identified as being more suitable for constant bit-rate
video coding, particularly for low bit-rate channels [55]. This technique, termed ZeroTree
Entropy (ZTE) coding, became the standard adopted for the texture compression
subsystem in MPEG 4. This thesis reports the work carried out to implement a zerotree
coder in a massively parallel architecture to suit the Intelligent Pixel paradigm described in
Chapter 4.
3.2. The EZW Algorithm 
The effectiveness of the EZW algorithm is directly influenced by the mechanism used to
generate the coefficients, and as such the DWT has become the proven choice. This is
because the spatial image content is arranged in the form of a hierarchical tree, which is a
crucial requirement for the EZW. The DWT subdivides the spatial image content into
time-frequency subbands that are easily hierarchically arranged. As a result, image
components that contain higher spatial change are represented in the higher frequency
subbands (HL1 - HH3, see Figure 3-1 and Figure 3-2) while the "smooth" changing
areas are represented in the lower frequency bands (LL). The number of scales determines
the number of segmentation bands that subdivide the spatial change frequency range (0 - J)
in an image. To illustrate this refer to Figure 3-1 and Figure 3-2. Figure 3-1 contains a
typical DWT subband decomposition map and a test image featuring sharp high contrast
horizontal and vertical lines and an area of gradual spatial change in the middle. Figure 3-2
is the 3-scale DWT subband decomposition of the test image using a triangular wavelet
filter. An obvious characteristic can be identified with the LL band, in that it resembles a
sub-sampled version of the original image. The other bands fall into one of three
categories: the horizontal frequency meta-tree, the vertical frequency meta-tree and the
diagonal frequency meta-tree. The three trees, depending on the subband, show varying
degrees of frequency components caused by the high contrast changes in the image. It is
easily noticed that the gradually changing component, the box in the middle, is almost fully
represented in the LL band. However, the most important observation made, in terms of
the EZW algorithm, is that the content of every subband is in some way related to a
component in the LL band. This implies that coefficients in different bands are possibly
related to each other via a tree structure.
Figure 3-1: Test Image
Figure 3-2: 3 Scale DWT of Test Image
The novelty, in terms of coding, exhibited by the EZW can be attributed to three major 
coding mechanisms within the algorithm. Figure 3-3 is a block diagram of the complete 
EZW coding-decoding process, highlighting the main components.
[Encoder: DWT, 1. Positional Coding, 2. SAQ, 3. Significance Reorder, Arithmetic Coder.
Decoder: Arithmetic Decoder, 3. Significance Reorder^-1, 1. Positional Decoding, 2. SAQ^-1, DWT^-1]
Figure 3-3: Main EZW Components (Encoder & Decoder)
3.2.1. Significance of Coefficients 
Given a particular bit budget the EZW algorithm always attempts to represent an input 
image to as high a standard as achievable within that bit constraint. To accomplish 
this, a search to locate the more significant coefficients within the transformed image is 
performed. The significance of a coefficient is determined by testing |C| ≥ T,
where C is the coefficient and T is the current threshold value, which is initialised with
the result of (Eq. 3.1).
$$T = 2^{\lfloor \log_2(\mathrm{MAX}(|C(x,y)|)) \rfloor}$$
(Eq. 3.1)
where MAX(x) is the maximum value of matrix x and C(x,y) is the set of coefficients
resulting from performing a DWT on an image.
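As a minimal sketch of (Eq. 3.1), assuming the DWT coefficients are held as integers in a plain list of rows, the initial threshold can be computed as below. The function name initial_threshold is illustrative only and does not come from the thesis.

```python
def initial_threshold(coefficients):
    """Return 2**floor(log2(max |C|)) for a 2D list of integer coefficients."""
    largest = max(abs(c) for row in coefficients for c in row)
    if largest == 0:
        return 0  # an all-zero map has no significant coefficients
    # For a positive integer, bit_length() - 1 equals floor(log2(value)).
    return 1 << (int(largest).bit_length() - 1)
```

For a map whose largest magnitude is 63, this gives a threshold of 32, which matches the worked example in Section 3.2.5.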
Since the initial threshold value only identifies the most significant coefficients, and 
since these coefficients only represent a small subset of the DWT coefficients, the 
threshold is readjusted and reapplied in another iteration to identify a greater number 
of significant coefficients. This significance identification process is iterated until the 
threshold reaches a value that generates too many coefficients or is zero. With each
subsequent iteration the threshold value is halved, i.e. T(n+1) = T(n) / 2. The 6 images in
Figure 3-4 show 6 iterations of the significance identification mechanism, where the
white 'dots' represent significant coefficients that have been identified. The images 
show results of threshold values 2048, 1024, 256, 128, 16 and 1 respectively. The 
values in italics indicate the number of significant coefficients found in that iteration. 
[Panels and significant coefficient counts: (a) 476, (b) 905, (c) 1050, (d) 2169, (e) 19241, (f) 142362]
Figure 3-4: Significance Iterations
Generally lower threshold values identify a larger number of significant coefficients, 
which implies that a suitable compromise is often needed. Figure 3-5 illustrates this 
trend for the coefficient image used for Figure 3-4. The cumulative total indicates the 
total number of significant coefficients detected. 
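The trend described above can be reproduced with a short sketch. It assumes integer coefficients in a list of rows and counts a coefficient in the pass where its magnitude first reaches the threshold. The function name and the returned tuple layout are illustrative assumptions.

```python
def significance_counts(coefficients):
    """Return (threshold, newly significant, cumulative) for each halving of T."""
    magnitudes = [abs(c) for row in coefficients for c in row]
    largest = max(magnitudes)
    results = []
    if largest == 0:
        return results
    threshold = 1 << (largest.bit_length() - 1)
    cumulative = 0
    while threshold >= 1:
        # A coefficient first becomes significant when T <= |C| < 2T.
        new = sum(1 for m in magnitudes if threshold <= m < 2 * threshold)
        cumulative += new
        results.append((threshold, new, cumulative))
        threshold //= 2
    return results
```

The cumulative column can only grow as the threshold falls, which is why a compromise threshold is usually needed.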
[Plot: significant coefficient count against threshold size in bits (log T / log 2);
series: Total Coefficients, Coefficients per Iteration]
Figure 3-5: Significant coefficient count for varying threshold values
3.2.2. Positional Coding of Coefficients 
Previously it has been noted that the key factor that allows for effective zerotree coding
stems from the DWT, where coefficients in all higher bands have relations to a
coefficient in the LL band. For instance Figure 3-2 clearly shows the relation between
the LL band and the HH3 band, which contains the diagonal high frequency
components of the LL band, but at a greater resolution. The useful implication here is
that the LL band forms the top node of a set of "relation trees" that spans the entire
transformed image. Figure 3-6 shows a typical "relation tree" that results for each
coefficient (or pixel) in the LL band.
An important implication of these "relation trees" ties in with the decaying spectrum
[22] nature of coefficients typical of wavelet transforms, in that the existence of a
significant coefficient in a higher frequency band depends heavily on the existence of
an equal or more significant coefficient in a related lower frequency band. This, then,
justifies the use of zerotrees, which imply that a coefficient in a high frequency band is
likely to be zero if a related coefficient in the low frequency band is zero. In practice
exceptions to this exist and are dealt with differently within the zerotree algorithm.
[Diagram labels: LH Bands, HL Bands, HH Bands; Scale 0, Scale 1, Scale 2, Scale 3]
Figure 3-6: EZW Relations Trees
In general each parent node is related to four different sub-nodes ( children) belonging 
to the immediate higher frequency subband. The only exceptions to this are the 
coefficients located in the LL band, which have only three sub-nodes, and coefficients 
located in the last scale-3 subband, which have no sub-nodes. With the exception of
coefficients in the LL band, all of the remaining coefficients have exactly one parent 
node. Coefficients in the LL band have no parent nodes. 
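The parent-child rules above can be sketched as a coordinate lookup. This assumes the usual square, dyadic memory layout in which the LL band sits in the top-left corner and each finer scale doubles the band size. That layout, the function name and its parameters are assumptions for illustration, not details taken from the thesis.

```python
def children(row, col, size, scales):
    """Return child coordinates of (row, col) in a size x size dyadic DWT layout."""
    ll = size >> scales  # side length of the LL band
    if row < ll and col < ll:
        # LL coefficients have three children, one in each coarsest detail band.
        return [(row, col + ll), (row + ll, col), (row + ll, col + ll)]
    if row >= size // 2 or col >= size // 2:
        return []  # coefficients in the finest-scale subbands have no children
    # Every other coefficient has a 2 x 2 block of children one scale finer.
    return [(2 * row, 2 * col), (2 * row, 2 * col + 1),
            (2 * row + 1, 2 * col), (2 * row + 1, 2 * col + 1)]
```

For an 8 x 8, 3-scale transform, children(0, 0, 8, 3) returns the three coarsest detail coefficients, while any coordinate in the lower or right half of the map returns an empty list.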
When encoding, the EZW algorithm takes advantage of these "relation trees" to 
describe the spatial coordinates of insignificant coefficients in high frequency 
subbands. A special symbol, termed a zerotree symbol, is allocated to describe a set of
insignificant children coefficients that have a parent that is also insignificant. The 
advantage here is the ability to represent entire "trees" of insignificant coefficients with 
one symbol and still decode the positions of these insignificant coefficients correctly. 
This is similar to the run-length mechanism employed in typical DCT compression
systems, where large runs of zeros are represented by count values. However,
the improvement in the EZW algorithm lies in its ability to define large sequences of
zeros directly in two-dimensional space. A more generalised 2D hierarchical
coefficient tree partitioning approach to image compression, termed "Set Partitioning
in Hierarchical Trees (SPIHT)", is investigated in [7]. This algorithm is an
improvement on the EZW algorithm but is significantly more complex in terms of the
symbol tree searches employed.
In order to fully categorise all possible occurrences of coefficient significance patterns,
the EZW algorithm depends on four different symbols, which are listed below.
1. POS - Positive coefficient symbol - This symbol defines coefficients that are
significant and positive in value. This symbol conveys no information about the
children coefficients and as such they may or may not be significant.
2. NEG - Negative coefficient symbol - This symbol defines coefficients that are
significant and negative in value. This symbol conveys no information about the
children coefficients and as such they may or may not be significant.
3. ZTR - Zerotree Root - This symbol defines coefficients that are both
insignificant and have children coefficients that are insignificant. The compression
efficiency of the EZW algorithm is generally proportional to the number
of such symbols.
4. IZO - Isolated Zero - This symbol defines coefficients that are insignificant yet
have significant children coefficients. These symbols are used to categorise trees
that do not follow the decaying spectrum phenomenon, as well as to define previously
significant parents that have less significant descendants.
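The four symbols can be summarised as a small decision function. This is a sketch only: it assumes the caller supplies the coefficient together with all of its descendants, and that coefficients found significant in an earlier pass have already been replaced by zero. The function name is illustrative.

```python
def classify(coefficient, descendants, threshold):
    """Return the EZW symbol for one coefficient at the given threshold."""
    if abs(coefficient) >= threshold:
        # Significant: only the sign decides between POS and NEG.
        return "POS" if coefficient > 0 else "NEG"
    if any(abs(d) >= threshold for d in descendants):
        return "IZO"  # insignificant, but something below it is significant
    return "ZTR"      # insignificant, and so is the whole tree below it
```

Using values from the worked example in Section 3.2.5 with a threshold of 32, the coefficient 14 with descendants -1, 47, -3 and 2 is classified as IZO because of the 47.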
Figure 3-7 shows a flowchart representation of the symbol identification algorithm
that is performed for each coefficient during an encode cycle. The decode process uses
the flowchart in Figure 3-8 to identify the coefficients with their respective symbols.
Figure 3-7: Symbol Identification Flowchart (Encode)
Figure 3-8: Coefficient Identification Flowchart (Decode)
Once the coefficient symbols have been identified, a special scanning technique is
applied to order the symbols for correct transmission and decoding. Figure 3-9
illustrates the scanning order, firstly for the subbands and then for coefficients within
the subbands. This transmission pattern facilitates the correct decoding of large 2D
zerotrees, as a single symbol can indicate which coefficients still require
symbols from a stream that is ordered in this manner.
Figure 3-9: Scanning Order - Subbands & Coefficients
The strict compliance with the zigzag nature of the coefficient scanning within a single 
subband is not really necessary, as a simple raster-scan technique will also suffice. 
However, for ease of sequential software implementation when searching for 
significant coefficients, the "zigzag" technique is typically used. 
3.2.3. Successive Approximation Quantisation (SAQ) 
Successive Approximation Quantisation is a method used to quantise coefficient
magnitudes (or introduce lossy compression) in a progressive manner. The coefficients
are initially represented with a coarse quantised power of 2, and then refined
progressively over time to approach the original. The incorporation of this scheme into
the EZW algorithm provides a fully embedded approach to the quantisation of the
coefficients, thus supplementing the positional coding mechanism with an efficient
magnitude representation technique as well. Two reasons justify the use of such a
technique over the flat quantisation model used in many typical compression schemes
such as JPEG, MPEG and H.261:
1. Progressive image/video coding - This is a technique whereby the detail levels
of an image undergoing transmission are increased over time. This means that
decoding of an image can start before the full content of its equivalent compressed
stream has been received. Therefore, at any one time the image decoded is a
representation of the original image, but at a level of detail proportional to the
amount of compressed data received. This technique is further improved by
applying significance ordering as described in Section 3.2.4.
2. Simple yet precise rate control over the coding - A by-product of the SAQ scheme
is that it allows for simple bit-wise truncation of the bit-stream. If a bit-budget for a
particular communications channel is defined, then the coding of a particular image
can be stopped at precisely that bit-count in full confidence that a near optimal
image decode for that bit-rate is possible. Providing significance reordering
(Section 3.2.4) further improves this mechanism.
The SAQ mechanism is always initialised at the first significance identification phase
(Section 3.2.1). This phase then defines the initial quantisation level. Once the
significant coefficients have been identified, more detail relating to these coefficients
can be coded into the transmission stream. The operation is generally based on the
segmentation of the coefficient plane (matrix) into bit-planes, which are then
systematically ordered and transmitted.
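The bit-plane segmentation mentioned above can be illustrated with a few lines of code. The sketch assumes non-negative integer magnitudes (signs being carried separately by the POS and NEG symbols) and a known bit depth. The function name is an illustrative assumption.

```python
def bit_planes(magnitudes, bits):
    """Split non-negative integer magnitudes into bit-planes, most significant first."""
    planes = []
    for bit in range(bits - 1, -1, -1):
        # Each plane holds one binary digit from every magnitude.
        planes.append([(m >> bit) & 1 for m in magnitudes])
    return planes
```

For the magnitudes 63, 34, 49 and 10 at 6 bits, the first plane marks which values reach 32, and each later plane adds one more bit of precision to every value.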
Decoding can begin anytime after the first quantisation level for the image has been
received; subsequent refinement data can then be used to improve the image quality.
Figure 3-10 is composed of a set of images representing a single image being
progressively decoded under a SAQ scheme. A clear improvement in the image is seen
(top to bottom and left to right) as more refinement passes are made. If all the passes
are transmitted then this scheme will incur no loss in image quality. For this case,
however, the bit-count, when compared to an uncompressed image, actually increases
by at least a factor of (B + 2)/b, where the largest coefficient can be represented in B
bits and the largest image magnitude can be represented by b bits. This is an
approximation, as the effect of the positional coding scheme is highly dependent on the
image selected. For the 'Lena' image this calculation indicates that the bit-count
increases by a factor close to 1.87, which implies a size of approximately 3.9 Mbits if
all passes are performed.
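The quoted size can be checked with a short calculation. It assumes the 'Lena' test image is 512 x 512 pixels at 8 bits per pixel and that 1 Mbit is taken as 10^6 bits; neither value is stated in this section.

```latex
\begin{align*}
\text{uncompressed size} &= 512 \times 512 \times 8 = 2\,097\,152 \text{ bits} \approx 2.1 \text{ Mbits} \\
\text{all passes transmitted} &\approx 1.87 \times 2.1 \text{ Mbits} \approx 3.9 \text{ Mbits}
\end{align*}
```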
Figure 3-10: SAQ Refinement process. 
The EZW algorithm simply employs a single binary (0, 1) digit to represent each 
coefficient refinement symbol. The symbols used are 
0 - For the coefficient C < T.
1 - For the coefficient C ≥ T.
where T represents the current threshold. Since only two types of symbols are used
and since the probability of occurrence is heavily dependent on the image
characteristics, performing arithmetic coding may have no effect, and as such the
resulting bit-stream entropy may be increased [22].
3.2.4. Significance Reordering 
A useful feature inherent to the EZW algorithm is its ability to provide a data
bit-stream that can be easily truncated at any moment, yet represent an image as close to
the original as possible within a given bit-budget. This feature is mainly attributed to
the SAQ process. However, by reordering the coefficients in a manner where the
refinements for the most significant decoded coefficients are transmitted before those
lesser in significance, a "best" representation of the original image is possible within the
given bit-budget. The EZW algorithm incorporates this mechanism into its coding
technique to fully enhance the positional coding and SAQ mechanisms. For example,
if in a pass three coefficients were decoded, i.e. 43, 55 and 61, then during the
refinement phase the refinement bits arrive in reverse order, as the coefficient 61
receives the first refinement bit, followed by the coefficient 55 and finally 43. In this
manner, if the bit-stream was truncated mid-stream of the refinement pass, the most
significant coefficients, which have a greater bearing on the decoded image, receive the
most refinement bits. Although this is a very useful technique it poses a rather
significant hurdle in terms of hardware implementation, as the coefficient list has to be
constantly re-sorted. This feature is typically omitted in most hardware-only EZW
codecs due to the complex sorting structures required.
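The reordering in the example above amounts to a sort by decoded magnitude. The sketch below assumes that ties keep their original scan order; that tie-breaking rule and the function name are assumptions, as the thesis does not state how equal magnitudes are ordered.

```python
def refinement_order(decoded_values):
    """Return indices of decoded coefficients, largest magnitude first."""
    # sorted() is stable, so equal magnitudes keep their original scan order.
    return sorted(range(len(decoded_values)),
                  key=lambda i: abs(decoded_values[i]), reverse=True)
```

For the decoded values 43, 55 and 61 this yields the index order 2, 1, 0, so 61 is refined first. The need to repeat this sort after every pass is the hardware cost referred to above.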
Multi-resolution image/video coding - This is a technique which facilitates the
transmission of a single coded image or video sequence to any number of possible
targets with varying bit-budgets relating to the available bandwidth constraints. For
instance, a single high quality image coded for a T1 internet connection can have its
bit-stream truncated to suit a modem. A lower resolution image will be generated at the
modem end, yet it maintains the "best" possible representation of the original image
given this bit budget.
3.2.5. EZW Encoding/ Decoding Process 
An example illustrating the EZW encoding and decoding algorithm, as extracted from
[22], is now presented to aid in the explanation. Only the preliminary passes are
performed for this illustration. The stages within the algorithm are described below.
1. Perform Wavelet Transform - An example coefficient map resulting from a 3-scale
DWT conducted on an 8 x 8 image is depicted in Figure 3-11. This example
has an anomalous high frequency coefficient with a value of 47, which is relatively
high when compared to its parent coefficients. The maximum magnitude is 63.
63 -34 49 10 7 13 -12 7 
-31 23 14 -13 3 4 6 -1 
15 14 3 -12 5 -7 3 9
-9 -7 -14 8 4 -2 3 2 
-5 9 -1 47 4 6 -2 2 
3 0 -3 2 3 -2 0 4 
2 -3 6 -4 3 6 3 6 
5 11 5 6 0 3 -4 4 
Figure 3-11: Example of an 8 x 8, 3-scale wavelet transform 
2. Set Initial Threshold Value - The largest coefficient magnitude is 63, therefore it
will require at least 6 bits to be fully represented. An initial threshold value of
2^(6-1) = 32 is chosen, as any power greater than 5 results in a threshold too large
for the coefficients.
3. Identify Significant Coefficients - A search is then performed to identify all
significant coefficients by comparing them to the threshold. All coefficients
conforming to |C| ≥ T, where T = 32, are now identified as significant.
4. Generate Coefficient Symbols (Dominant Pass) - The coefficient values 63,
-34, 49 and 47, which have magnitudes greater than the threshold, generate the
symbols POS, NEG, POS and POS respectively. Table 3-1 lists the set of symbols
generated for the first significant pass for the entire array.
The coefficient -31 in the LL and coefficient 14 in the HL band are both clearly
insignificant. However, since coefficient 14 has a significant descendant, an IZO
symbol is generated, and since coefficient -31 has an IZO as a descendant it too
generates an IZO symbol. The coefficient 23 is neither significant nor does it have
any significant descendants, which results in the generation of a single ZTR symbol
for the entire tree. If any parent coefficient is allocated a symbol other than a ZTR
then all of its immediate descendants will be allocated symbols. All other
coefficients are not assigned symbols and are not transmitted. Figure 3-12 and
Figure 3-13 show the resultant trees for the coefficients and symbols
respectively, where Z = ZTR, I = IZO, P = POS and N = NEG.
Table 3-1: EZW Example First Pass 
Subband | Coefficient Value | Symbol
LL      |  63 | POS
LL      | -34 | NEG
LL      | -31 | IZO
LL      |  23 | ZTR
HL      |  49 | POS
HL      |  10 | ZTR
HL      |  14 | ZTR
HL      | -13 | ZTR
LH      |  15 | ZTR
LH      |  14 | IZO
LH      |  -9 | ZTR
LH      |  -7 | ZTR
LH2     |   7 | ZTR
LH2     |  13 | ZTR
LH2     |   3 | ZTR
LH2     |   4 | ZTR
HL2     |  -1 | ZTR
HL2     |  47 | POS
HL2     |  -3 | ZTR
HL2     |  -2 | ZTR
Figure 3-12: Example EZW Coefficient Tree 
Figure 3-13: Example EZW Symbol Tree 
These symbols are then scanned according to Section 3.2.2, where the coefficients
are first ordered in terms of subbands and then by "zigzag" grouping of children. The
transmitted contents follow the order in Table 3-1.
5. Transmit Refinement Bits (Subordinate Pass) - Since the four coefficients 63,
-34, 49 and 47 are identified as significant, a further refinement bit can be sent in
this pass. This is performed by evaluating (Eq. 3.2), where T is the current
threshold and C_sig represents any previously identified significant coefficient
magnitude.
$$(C_{sig} - T) > \frac{T}{2}$$
(Eq. 3.2)
If this condition is met then a binary (1) is produced for the symbol, otherwise a
binary (0) is produced. In this case the output symbols are 1, 0, 1 and 0 as 31 > 16,
2 < 16, 17 > 16 and 15 < 16 respectively. These bits are then reordered according
to decoding priority, where coefficients that are decoded to higher values receive
refinement bits from the start of the stream. Implementation of this requires that a
decoder accompany the encoding process, hence adding to its complexity. The
decoding stages will be presented next.
6. Decode Symbols - Since the symbols are arranged in a subband hierarchy, the
first symbols to arrive or be decoded belong to the LL band. These symbols define
the initial significant coefficients in the LL band. In turn they offer the decoder
information about which coefficients are significant in the next most significant
band, so as to allocate the next incoming symbols appropriately. In this example
the first symbol is a POS, which indicates that the LL coefficient is a positive
coefficient in the interval [32, 64). Hence a value of 48 is chosen. This
initial symbol also suggests that the next three symbols that arrive belong to its
immediate descendants, as it is not a ZTR. Therefore, when the symbols NEG,
IZO and ZTR arrive they are allocated to coefficients in the LH, HL and HH
bands respectively. The reconstructed values for these coefficients become -48, 0
and 0. The ZTR in the HH band indicates that there will be no more symbols for
the HH range of subbands in this pass. The NEG and IZO symbols suggest that
the next eight symbols to arrive belong to the LH1 and HL1 subbands
respectively. Figure 3-14 illustrates this process, where the light grey boxes indicate
the positions of the next coefficients to arrive. The dark grey boxes drawn in the
last tree represent coefficients that are anticipating refinement bits as described in
the next section. Figure 3-15 firstly reiterates the band structure for reference and
then shows the final decoded coefficients after the first dominant pass.
Figure 3-14: Example EZW Symbol Decode
Figure 3-15: Example EZW after 1 Pass
7. Refinement Decode - Once the symbol pass identifies the coefficients to be
refined, this process involves the application of the incoming ordered stream
of refinement bits to those coefficients. Since all the decoded coefficients so far
are in the same range, the decoder expects the stream to be organised in terms of
the subbands. Therefore it is ordered as LL, LH, LH1 and HL2. The incoming
bit-stream of 1, 0, 1 and 0 refines the coefficients to 56, -40, 56 and 40 respectively.
These values are calculated by evaluating (Eq. 3.3), where C_sig = Current
significant symbol, T = Current threshold (32) and R = Current refinement bit for
the coefficient.
$$C_{sig} = \begin{cases} C_{sig} + \frac{T}{4} & \text{if } R = 1 \\ C_{sig} - \frac{T}{4} & \text{if } R = 0 \end{cases}$$
(Eq. 3.3)
The newly refined coefficients are then ordered in terms of significance for the next
refinement decode. As a result the coefficients 56, 40, 56 and 40 become ordered
as 56, 56, 40, 40, which corresponds to the original values of 63, 49, 34 and 47
respectively.
8. Recalculate Threshold & Reiterate - The threshold is now halved to result in
the value 16 and the algorithm is repeated from step 3 but with the new threshold.
This time, however, previously defined significant coefficients are treated as zero (0),
which can only be assigned the symbols ZTR or IZO. The new threshold then
identifies two new coefficients, namely -31 in the HL band and 23 in the HH band,
which are coded as NEG and POS symbols. The LL band is now coded as an
IZO, but the LH band is coded as a ZTR as it has no new significant coefficients.
For the refinement pass the new data is ordered in a manner where data for all the
previous coefficients, as arranged in the last section, are placed ahead of the new
coefficients. The entire decode process is then repeated.
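The first subordinate pass of the example (steps 5 to 7) can be traced with the sketch below. It follows (Eq. 3.2) on the encoder side and (Eq. 3.3) on the decoder side, applied to magnitudes with the sign restored afterwards. The function names and the mid-interval starting value of 1.5T are illustrative assumptions consistent with the value 48 quoted in step 6.

```python
def refinement_bit(magnitude, threshold):
    """Encoder, (Eq. 3.2): 1 if the magnitude exceeds T by more than T/2."""
    return 1 if (magnitude - threshold) > threshold / 2 else 0

def initial_reconstruction(symbol, threshold):
    """Decoder: place a newly significant coefficient mid-way in [T, 2T)."""
    value = threshold + threshold // 2
    return value if symbol == "POS" else -value

def refine(value, bit, threshold):
    """Decoder, (Eq. 3.3): move the magnitude up or down by T/4."""
    step = threshold // 4
    magnitude = abs(value) + step if bit == 1 else abs(value) - step
    return magnitude if value > 0 else -magnitude

threshold = 32
originals = [63, -34, 49, 47]  # the four significant coefficients of the first pass
bits = [refinement_bit(abs(c), threshold) for c in originals]
decoded = [initial_reconstruction("POS" if c > 0 else "NEG", threshold)
           for c in originals]
refined = [refine(v, b, threshold) for v, b in zip(decoded, bits)]
```

Running the sketch gives the bits 1, 0, 1, 0 and the refined values 56, -40, 56 and 40, in agreement with steps 5 and 7.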
3.3. ZTE Algorithm 
In 1996 Stephen A. Martucci, Iraj Sodagar, Tihao Chiang and Ya-Qin Zhang [55]
introduced a highly efficient zerotree wavelet based video coder for use in low bandwidth 
environments. The ZeroTree Entropy coder (ZTE) was originally submitted as a possible 
alternative for the primary video communications mechanism within the MPEG-4 
standard. However, due to an abundance of available DCT based processors it was not 
considered for this purpose and was later chosen for the compression of textures within 
still images in the MPEG-4 standard [56]. 
The ZTE algorithm, like its predecessor (EZW), is based on the mapping of coefficients
resulting from a DWT. It also exploits the self-similarity, or relations between the high and
lower frequency subbands, to efficiently code large 2D areas of insignificant coefficients 
with symbols. The primary difference between the two relates to the way quantisation is 
performed. While the EZW performs a SAQ on the wavelet coefficients, the ZTE applies 
a single external quantisation process prior to coding the symbols, thus not supporting true 
embedded quantisation or multi-resolution coding. However, a significant advantage of 
this scheme is that it simply requires two passes, one for the symbols and the other for the 
coefficients. In addition a simple pseudo multi-resolution technique can be implemented 
by reordering the symbols and coefficients in an appropriate subband hierarchy. Martucci
[55] also showed that this codec performed better for low bit-rate video communication,
with the added advantage of a more constant quality-rate.
The functional block diagram for a typical ZTE algorithm can be seen in Figure 3-16. 
[Encoder: DWT, 1. Subband Quantisation, 2. Positional Coding, Arithmetic Coder.
Decoder: Arithmetic Decoder, 2. Positional Decoding, 1. Subband Quantisation^-1, DWT^-1]
Figure 3-16: Main ZTE Components (Encoder & Decoder)
A description of aspects relating to the ZTE coding/decoding process is presented next.
3.3.1. Subband Quantisation
Since the ZTE algorithm does not include a fully embedded quantisation scheme, it
depends on an external process to provide zero coefficients so as to build zerotrees.
Typically a predefined quantisation matrix, either in the form of singular values for
subbands or a fully qualified subband-sized quantisation array, is applied to each
subband. The lossy-compression component of the ZTE algorithm occurs here and
as such it directly controls the number of significant coefficients. The aim is to
quantise as many high frequency coefficients as possible to a zero value, which is typically
performed by quantising these high frequency coefficients with much larger values than
those used for low frequency coefficients. If an image has little or no high frequency
content then the three largest bands, taking up 3/4 of the image resolution, can be
quantised to near zero. The ZTE algorithm relies on the existence of a significant
number of grouped zero coefficients to operate efficiently.
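The effect of a per-subband step size can be sketched as follows. The sketch assumes a uniform quantiser that truncates toward zero, which is one common choice and not necessarily the one used in [55]. The function names are illustrative.

```python
def quantise_band(coefficients, step):
    """Quantise one subband with a single step size, truncating toward zero."""
    return [int(c / step) for c in coefficients]

def zero_fraction(levels):
    """Return the fraction of quantised levels that are zero."""
    return sum(1 for q in levels if q == 0) / len(levels)
```

With a small step such as 4, the low frequency values 63, -34, 49 and 10 all survive, whereas a step of 16 applied to small high frequency values such as 7, 13, -12 and 7 sets every one of them to zero, which is what allows zerotrees to form.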
Without an embedded quantisation scheme this technique lacks the simple rate control
mechanism (i.e. the bit-rate control) exhibited by the EZW algorithm. The rate control
scheme here is dependent on two factors:
1. The characteristics of the image - Generally, the lower the high frequency
content an image possesses, the lower the resulting bit-rate is. For example, in an
extreme situation a single colour image, such as a black screen generated during a
scene change, generates coefficients which only reside in the lowest frequency
band. Since the image received is typically uncontrolled, the main quantisation
mechanism focuses on the next factor.
2. The quantisation vector - This is a set of values which are applied to each
subband respectively. The coefficients selected in this vector directly influence the
final bit-rate. Therefore, the difficulty lies in optimising this quantisation vector to
best suit the channel bandwidth. Since the wavelet transform generally observes a
decaying spectrum trend [22] across the subbands from low to high frequency, it is
exploited to form a quantisation vector increasing in quantisation value. Figure
3-17 illustrates the quantisation vector and its application to relative subbands on
the transformed image. The lighter colours represent larger quantisation
coefficients in comparison to the darker colours. As can be seen, Q2 and Q3 are
of equal importance (similar colour) yet Q4 is less important than both of them,
therefore Q2 and Q3 are quantised less compared to Q4.
Quantisation Vector: [Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10]
Figure 3-17: Quantisation Vector assignment for Subbands
The magnitudes chosen for Q1 ... Q10 depend on the DWT employed. Multi-tap
filter based DWT algorithms (e.g. Villasenor, Daubechies 9-3) generally produce
coefficients that are larger in magnitude than most 2/3-tap filter based DWT
algorithms (e.g. Triangular, Haar). An adaptive approach to the selection of these Q
values generally presents an optimal solution. The technique used in [55] operates
on the principle that an average bit-budget, derived from the previous image in a
sequence, is used to curb the bit-budget, and ultimately the quantisation vector, for
the current image. Therefore, to achieve a targeted bit-rate the average bit-budget
is appropriately altered. Once derived, this average is further subdivided to allocate
all subbands an individual bit-budget. Based on the previous quantisation
coefficient, Q, and the current bit-budget, a new Q value is generated for each of
the subbands. By applying linear regression and extracting the coefficients of a first
order autoregressive model, the quantisation weighting Q1 ... Q10 can be calculated.
Generally the initial frame is processed using a default setting and corrected over
time.
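As an illustration of how a quantisation vector might be applied, the following Python sketch quantises a sparse coefficient map with one Q value per scale, in the style of the four-entry vector [2 4 16 16] used in Section 3.3.3. It is a minimal sketch under stated assumptions: the function names, the truncate-towards-zero rounding rule and the input values are all hypothetical, and the adaptive bit-budget scheme of the cited technique is not modelled.

```python
def scale_index(row, col, ll_size=1):
    """Index into the Q vector: 0 for the LL band, then 1, 2, ... towards finer scales."""
    return (max(row, col) // ll_size).bit_length()


def quantise(value, q):
    """Truncate towards zero to a multiple of q (an assumed rounding rule)."""
    return int(value / q) * q


def quantise_map(coeffs, q_vector, ll_size=1):
    """Quantise a sparse {(row, col): value} map with one Q value per scale."""
    return {
        (r, c): quantise(v, q_vector[scale_index(r, c, ll_size)])
        for (r, c), v in coeffs.items()
    }


# Hypothetical wavelet coefficients for an 8x8, three-scale decomposition.
example = {(0, 0): 63, (0, 1): -34, (1, 0): -31, (1, 1): 23,
           (0, 2): 49, (0, 3): 10, (4, 3): 47}
quantised = quantise_map(example, q_vector=[2, 4, 16, 16])
print(quantised)
```

Small coefficients in the finer scales fall to zero under the larger Q values, which is what later allows zero trees to form.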
3.3.2. Positional Information Coding 
The tree generation technique closely resembles that used in the EZW in that it also 
exploits the self-similarity between the higher and lower frequency subbands. 
Therefore the same parent-child relationships are still used. Three major differences,
however, contrast the two:
1. The ZTE only generates one set of symbols - during image encoding only a
single set of symbols is ever generated. These symbols rely on the quantisation
mechanism to provide the necessary zero coefficients. Unlike the EZW, the ZTE
does not search for a set of different threshold values; instead it defines coefficients
as either significant (<> 0) or not (= 0). Then trees are formed to account for the
non-significant (quantised to zero) coefficients. Since a SAQ process is not
performed, the significant coefficients are encoded whole after the preceding
symbols have been identified. Distinctions between positive and negative
significant coefficients are not made either. Therefore, an extra bit is embedded
into the coefficient value to indicate the sign.
2. Two pass mechanism per image - The lack of a SAQ mechanism enables the
encoding and decoding process to be completed in just two passes: one for the
symbols and the other for the coefficients.
3. Only 3 different symbols are used - The ZTE scheme uses 3 different symbol
types to identify the contents of a DWT. This smaller sized symbol set has been
proven to perform better for low bit-rate video.
To account for all possible cases the following three symbols are used:
1. ZTR - Zero Tree Root - These symbols define coefficients that are not significant,
have significant parents, yet contain no significant descendants. The more ZTR
symbols generated the better the compression.
2. VZT - Valued Zero Tree root - These symbols identify coefficients that are
significant yet contain no significant descendants. This type of symbol is used to
facilitate low bit-rate video. All significant coefficients in the highest frequency
subband are allocated this symbol.
3. VAL - A Value - Coefficients are identified as VAL when they
contain significant descendants. As such an insignificant coefficient can be deemed
a VAL if it contains a significant descendant. In this case, where the decaying
spectrum phenomenon is violated, a zero coefficient value is transmitted for this
coefficient.
All other coefficients are zero and are not transmitted.
3.3.3. Encoding Process 
62 -32 48 0 0 0 0 0 
-28 20 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 32 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
Figure 3-18: [2 4 16 16] Quantised Coefficients
To illustrate the operation of the encoding process the example wavelet coefficients in 
Figure 3-11 are reused. After applying a subband quantisation vector of [2 4 16 16], for 
each of the Q values [Q1 ... Q4], on the wavelet coefficients, the resultant pre-encode
map is presented in Figure 3-18. The following steps are then performed. 
1. Identify Significant Coefficients - Each quantised coefficient in the coefficient
map is iteratively searched for significance determination. A coefficient is deemed
significant if its magnitude is greater than zero. The significant coefficients for this
example are shown in Figure 3-19.
2. Identify Parent / Child Relationships - The significance map is then arranged
in a tree structure identifying all parent and child relationships inherent to
hierarchical wavelet decompositions. The tree structure for this example, also
containing the significant coefficients, is presented in Figure 3-20.
s s s 0 0 0 0 0 
s s 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 s 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
Figure 3-19: Example ZTE Significance Map 
Figure 3-20: Example ZTE Significance Tree 
3. Generate Symbols - Each relevant coefficient in the map is assigned a symbol,
otherwise it is not transmitted. Significant coefficients that have no significant
descendants are assigned the VZT or valued zero tree root symbol. All other
significant coefficients are assigned as VAL. Insignificant coefficients with
significant descendants are assigned the symbol VAL with coefficient magnitude 0.
All insignificant coefficients that belong to a significant parent yet contain no
significant descendants are assigned the ZTR symbol. Figure 3-21 illustrates the
resultant symbol tree for this example (the following shortened forms are used: V =
VAL, VZ = VZT and Z = ZTR). The areas without any symbol allocation are not
transmitted.
Figure 3-21: Example ZTE Symbols 
4. Transmit Symbols - During this stage the generated symbols are ordered into
subband hierarchies and transmitted. The arranged symbols for this example are
[VAL] [VAL VAL VZT] [VZT ZTR ZTR ZTR ZTR VAL ZTR ZTR] [ZTR
VZT ZTR ZTR], where the [] are used to indicate subband separation.
5. Transmit Coefficients - Once the symbols have been transmitted the coefficients
are ordered into subbands and transmitted. For this example they are [62] [-32 -28
20] [48 0] and [32].
This step concludes the two-pass ZTE coding process for a particular image.
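The five steps above can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used in this thesis: the function names, the level-by-level ordering and the raster order of children within each parent are assumptions, chosen so that the output can be compared with the symbol and coefficient streams listed in steps 4 and 5.

```python
VAL, VZT, ZTR = "VAL", "VZT", "ZTR"


def children(r, c, n):
    """Children of (r, c) in an n x n wavelet coefficient map."""
    if (r, c) == (0, 0):                          # the LL root has only three children
        kids = [(0, 1), (1, 0), (1, 1)]
    else:
        kids = [(2 * r, 2 * c), (2 * r, 2 * c + 1),
                (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]
    return [(a, b) for a, b in kids if a < n and b < n]


def has_significant_descendant(m, r, c):
    """True if any coefficient below (r, c) in the tree is non-zero."""
    return any(m[a][b] != 0 or has_significant_descendant(m, a, b)
               for a, b in children(r, c, len(m)))


def zte_encode(m):
    """Return (symbol groups, coefficient groups), one group per tree level."""
    symbols, values = [], []
    level = [(0, 0)]
    while level:
        syms, vals, next_level = [], [], []
        for r, c in level:
            if has_significant_descendant(m, r, c):
                sym = VAL                         # a value is sent even when it is 0
            elif m[r][c] != 0:
                sym = VZT
            else:
                sym = ZTR
            syms.append(sym)
            if sym != ZTR:
                vals.append(m[r][c])
            if sym == VAL:                        # only VAL nodes expose their children
                next_level += children(r, c, len(m))
        symbols.append(syms)
        values.append(vals)
        level = next_level
    return symbols, values


# Quantised coefficients of Figure 3-18 (all unlisted positions are zero).
quantised = [[0] * 8 for _ in range(8)]
for (r, c), v in {(0, 0): 62, (0, 1): -32, (0, 2): 48,
                  (1, 0): -28, (1, 1): 20, (4, 3): 32}.items():
    quantised[r][c] = v

symbols, values = zte_encode(quantised)
print(symbols)
print(values)
```

Coefficients below a VZT or ZTR node are never visited, which is how the areas without any symbol allocation drop out of the stream.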
3.3.4. Decoding Process 
The decoding process, like that performed in the EZW, commences once symbols for
the lowest subband have been received. Since these symbols are hierarchically
arranged, symbols from a lower frequency subband are used to allocate incoming
symbols to the next higher frequency subband. A step-wise description of the
decoding process with input vectors [VAL] [VAL VAL VZT] [VZT ZTR ZTR ZTR
ZTR VAL ZTR ZTR] [ZTR VZT ZTR ZTR] for the symbols, and [62] [-32 -28 20]
[48 0] [32] for the coefficients follows.
1. Decode Symbols - Once a set of symbols relating to a subband is received, these
are used to determine if any other symbols need to be decoded. In this example
the first received symbol is a VAL, which indicates that this coefficient requires a
value and that its immediate descendants require symbols. Therefore, these
descendants are assigned the symbols VAL, VAL and VZT. The VZT symbol
indicates that this coefficient's descendant tree contains no significant coefficients to
expect symbols for. This process is then repeated for each subband. Figure 3-22
illustrates this process (the following shortened forms are used: V = VAL, VZ =
VZT and Z = ZTR). The bold symbols identify the newly allocated symbols.
[Figure: four successive 8x8 symbol maps, in which positions marked ? are replaced by V, VZ, Z or 0 as each group of symbols is decoded]
Figure 3-22: Example ZTE Symbol Decode
2. Decode Values - In this phase coefficients that were assigned significant symbols
in the previous stage are allocated coefficient values from the stream. As per the
symbols these are arranged in a hierarchical subband stream. With this example the
first decoded coefficient, the LL coefficient, receives the value 62. The next set of
subband coefficients receives values -32, -28 and 20. In this manner the whole
image is reconstructed. Figure 3-23 depicts the coefficient value decode process.
62 ? ? 0 0 0 0 0    62 -32 ? 0 0 0 0 0
? ? 0 0 0 0 0 0    -28 20 0 0 0 0 0 0
0 ? 0 0 0 0 0 0    0 ? 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 ? 0 0 0 0    0 0 0 ? 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0

62 -32 48 0 0 0 0 0    62 -32 48 0 0 0 0 0
-28 20 0 0 0 0 0 0    -28 20 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 ? 0 0 0 0    0 0 0 32 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
Figure 3-23: Example ZTE Coefficient Value Decode
Completion of this process cycle results in a single image decode.
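The two decode passes can be sketched in Python as below. This is a minimal sketch under assumptions: the helper names are hypothetical, and the children of each VAL symbol are assumed to arrive in raster order, one group of symbols per tree level, matching the input vectors quoted above.

```python
def children(r, c, n):
    """Children of (r, c) in an n x n wavelet coefficient map."""
    if (r, c) == (0, 0):                          # the LL root has only three children
        kids = [(0, 1), (1, 0), (1, 1)]
    else:
        kids = [(2 * r, 2 * c), (2 * r, 2 * c + 1),
                (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]
    return [(a, b) for a, b in kids if a < n and b < n]


def decode_symbols(symbol_groups, n):
    """Pass 1: place each received symbol at its coefficient position."""
    placed = []                                   # (row, col, symbol) in stream order
    level = [(0, 0)]
    for syms in symbol_groups:
        next_level = []
        for (r, c), sym in zip(level, syms):
            placed.append((r, c, sym))
            if sym == "VAL":                      # only VAL announces child symbols
                next_level += children(r, c, n)
        level = next_level
    return placed


def decode_values(placed, value_groups, n):
    """Pass 2: hand out the coefficient stream to the VAL and VZT positions."""
    stream = iter(v for group in value_groups for v in group)
    m = [[0] * n for _ in range(n)]               # everything else stays zero
    for r, c, sym in placed:
        if sym != "ZTR":
            m[r][c] = next(stream)
    return m


symbol_groups = [["VAL"], ["VAL", "VAL", "VZT"],
                 ["VZT", "ZTR", "ZTR", "ZTR", "ZTR", "VAL", "ZTR", "ZTR"],
                 ["ZTR", "VZT", "ZTR", "ZTR"]]
value_groups = [[62], [-32, -28, 20], [48, 0], [32]]

decoded = decode_values(decode_symbols(symbol_groups, 8), value_groups, 8)
for row in decoded:
    print(row)
```

Because the decoder rebuilds the tree from the symbols alone, no positional information needs to accompany the coefficient values.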
3.4. Search Techniques 
The parent-child significance determination process, especially when considering 
implementation, deserves particular attention as it dictates the memory structure and 
processing pattern required to perform the identification. Two typical techniques for such 
a search are described below. 
3.4.1. Depth First Search 
The Depth First Search (DFS) [57] is a significance identification search that scans a
dependence tree in a bottom-up manner, selecting a single sub-tree at a time. It 
requires that the coefficient data be organised into sub-trees, with the highest 
frequency subbands located at the outer fringes of the main tree, and the lowest 
frequency subbands located towards the trunk of the main tree. Once a sub-tree has 
been scanned the next sub-tree is selected for scanning, until all the sub-trees are 
scanned, whereupon the main trunk linking all the sub-trees is scanned. Figure 3-24 is 
a typical wavelet coefficient map that allocates a unique label for each coefficient. The 
significance dependence tree for this map is shown in Figure 3-25. The dark curved 
arrows and the numbers beside each indicate the searching order. For a single 
processor system this method results in a vertical tree-wise search performed from one 
horizontal extreme to the other. 
ia ib ja jb ma mb na nb
ic id jc jd mc md nc nd
ka kb la lb oa ob pa pb
kc kd lc ld oc od pc pd
Figure 3-24: Coefficient Map
A typical realisation of the DFS technique requires the use of a single linear bank of
RAM, arranged into trees as in Figure 3-26, and a few variables to retain significance
information. A pointer keeps track of the coefficient currently being processed; the
coefficients are organised into groups of four. Each group supplies significance
information to its parent. When a parent is analysed, its significance information is
stored for the next level parent in an external variable for that scale. In this manner the
entire image can be processed by a single sequential processor system. However, this
zerotree search is more suited to a parallel processing system where each tree is
processed individually, as the trees have very little dependence on each other. When all
sub-processors have completed the allocated task, a main processor can then collate the
significance information.
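A bottom-up, one-sub-tree-at-a-time scan of this kind can be sketched in Python with a post-order recursion. The sketch is illustrative only: it works directly on a 2D coefficient map rather than the linear RAM arrangement of Figure 3-26, the function names are hypothetical, and the test data are the quantised coefficients of Figure 3-18.

```python
def children(r, c, n):
    """Children of (r, c) in an n x n wavelet coefficient map."""
    if (r, c) == (0, 0):                          # lowest scale: three dependants
        kids = [(0, 1), (1, 0), (1, 1)]
    else:
        kids = [(2 * r, 2 * c), (2 * r, 2 * c + 1),
                (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]
    return [(a, b) for a, b in kids if a < n and b < n]


def dfs_scan(m):
    """Return (has_significant_descendant, visit_order) for every coefficient."""
    n = len(m)
    has_sig_desc, order = {}, []

    def visit(r, c):
        found = False
        for cr, cc in children(r, c, n):
            subtree_significant = visit(cr, cc)   # finish this sub-tree first
            found = found or subtree_significant
        has_sig_desc[(r, c)] = found
        order.append((r, c))                      # a parent is recorded after its children
        return found or m[r][c] != 0

    visit(0, 0)
    return has_sig_desc, order


quantised = [[0] * 8 for _ in range(8)]
for (r, c), v in {(0, 0): 62, (0, 1): -32, (0, 2): 48,
                  (1, 0): -28, (1, 1): 20, (4, 3): 32}.items():
    quantised[r][c] = v

flags, order = dfs_scan(quantised)
print(order[:5], order[-1])
print(flags[(2, 1)], flags[(0, 2)])
```

Each call to `visit` touches only one sub-tree, so the three sub-trees hanging off the root could in principle be handed to separate processors, with the root collating their results.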
Figure 3-25: DFS Tree Search
Figure 3-26: DFS Coefficient Arrangement
3.4.2. Breadth First Search 
The Breadth First Search (BFS) [57] algorithm, when compared to the DFS algorithm,
operates by first scanning an ordered tree horizontally before proceeding to the next
vertical level. The BFS is predominantly suited to single sequential processor based
systems. Figure 3-27 illustrates the processing pattern required to navigate an example
tree performing the BFS. The horizontal search pattern can be seen.
Figure 3-27: BFS Tree Search
The BFS technique, particularly in reference to zerotree significance identification, is
typically realised by the use of a single bank of RAM, which is coupled together with
two index pointers. The coefficients are first arranged in the manner depicted in
Figure 3-28. One of the two pointers (P1) is initialised at coefficient 'ea', the start of
the block of memory, while the other (P2) is initialised to 'ba', the start of the next
level.
Figure 3-28: BFS Coefficient Arrangement
Coefficients found at location P1 are tested for significance and are used to establish 
the dependence tree for the coefficient at location P2. For each increment in pointer 
P2, P1 is incremented four times, therefore linking each parent to a set of four 
dependants. When P1 reaches 'ba', P2 would have reached 'ab' and hence the parent-
child relationship is correctly matched for the next higher level. The only exception to 
this is the lowest scale 'aa', which is related to only three dependants and as such P1 is 
only incremented three times when P2 points to 'aa'. 
The main advantage of the BFS technique is that it facilitates the use of a single 
processor to perform an efficient encode. 
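The two-pointer scheme can be sketched in Python as below. This is a sketch under assumptions, not the thesis implementation: the memory layout is generated from the parent-child relationship (finest level first, siblings adjacent), which is assumed to correspond to the arrangement of Figure 3-28, and the function names are hypothetical. P1 walks the children in blocks of four while P2 walks the parents, with the three-dependant exception applied at the lowest scale.

```python
def children(r, c, n):
    """Children of (r, c) in an n x n wavelet coefficient map."""
    if (r, c) == (0, 0):                          # lowest scale: three dependants
        kids = [(0, 1), (1, 0), (1, 1)]
    else:
        kids = [(2 * r, 2 * c), (2 * r, 2 * c + 1),
                (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]
    return [(a, b) for a, b in kids if a < n and b < n]


def bfs_layout(n):
    """Positions ordered finest level first, plus the index of the first parent."""
    levels = [[(0, 0)]]
    while True:
        nxt = [k for p in levels[-1] for k in children(*p, n)]
        if not nxt:
            break
        levels.append(nxt)
    layout = [p for level in reversed(levels) for p in level]
    return layout, len(levels[-1])


def bfs_scan(m):
    """Map each position to True if it has a significant descendant."""
    layout, first_parent = bfs_layout(len(m))
    coeff = [m[r][c] for r, c in layout]          # the single linear bank of RAM
    subtree_sig = [v != 0 for v in coeff]         # leaves: own significance only
    has_sig_desc = [False] * len(coeff)
    p1, p2, last = 0, first_parent, len(coeff) - 1
    while p2 <= last:
        fan_out = 3 if p2 == last else 4          # the root has only three dependants
        has_sig_desc[p2] = any(subtree_sig[i] for i in range(p1, p1 + fan_out))
        subtree_sig[p2] = has_sig_desc[p2] or coeff[p2] != 0
        p1 += fan_out                             # P1 advances past one block of children
        p2 += 1                                   # P2 advances to the next parent
    return dict(zip(layout, has_sig_desc))


quantised = [[0] * 8 for _ in range(8)]
for (r, c), v in {(0, 0): 62, (0, 1): -32, (0, 2): 48,
                  (1, 0): -28, (1, 1): 20, (4, 3): 32}.items():
    quantised[r][c] = v

result = bfs_scan(quantised)
print(result[(2, 1)], result[(0, 2)], result[(0, 0)])
```

Since P1 always trails P2, every child block has been finalised before its parent is tested, so one forward sweep over the memory is sufficient.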
3.5. Performance Results 
To test the suitability of both algorithms, two images were selected as inputs: the 512x512
pixel 'Lena' image to represent common images, and the 176x144 pixel 'Jenny' image, which
is more typical of a mobile multimedia communications image. Both
coding methods employ a triangular wavelet filter, an EZW or ZTE scheme and an
arithmetic coder. The performance comparison is attained by compressing each image to a
variety of bit-rates, while ascertaining the Peak Signal to Noise Ratio (PSNR) for each such rate.
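The exact PSNR formula used for these measurements is not stated here; the Python sketch below assumes the usual definition for 8-bit greyscale images, PSNR = 10 log10(peak^2 / MSE) with a peak value of 255. The function name and the tiny test images are hypothetical.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized greyscale images (lists of rows)."""
    squared_error = sum((a - b) ** 2
                        for row_a, row_b in zip(original, decoded)
                        for a, b in zip(row_a, row_b))
    count = sum(len(row) for row in original)
    mse = squared_error / count
    if mse == 0:
        return math.inf                           # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Two-pixel example with an error of one grey level per pixel (MSE = 1).
print(round(psnr([[10, 20]], [[11, 19]]), 2))
```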
[Plot: Lena Image; vertical axis PSNR (dB), horizontal axis BITCOUNT (bits, x 10^5)]
Figure 3-29: ZTE vs. EZW Performance comparison for the Lena image
Figure 3-29 displays the performance curves for both ZTE and EZW algorithms when 
applied on the Lena image. A marginal improvement in the PSNR for the ZTE algorithm 
especially at lower bit-rates can be observed. In Figure 3-30, the performance graph for the 
'Jenny' image, this improvement is also visible. 
[Plot: Jenny Image; vertical axis PSNR (dB), horizontal axis BITCOUNT (bits, x 10^4)]
Figure 3-30: ZTE vs. EZW Performance comparison for the Jenny Image
3.6. Conclusions
In this chapter, investigations into both the EZW and ZTE entropy coding algorithms 
have been made. Both algorithms use the self-similarities inherent to subbands resulting
from a wavelet transform to identify trees of insignificant coefficients. Although both 
techniques are similar in nature, the differences that separate them make for key 
consideration especially for hardware implementation purposes. The EZW algorithm has 
advantages in terms of precise rate control through the use of multiple symbol and 
refinement data passes. In comparison, the dual-pass nature of the ZTE, however, 
improves the coding speed and simplicity of implementation, particularly in reference to 
hardware implementation. The marginal improvement in performance exhibited by the 
ZTE algorithm should also be noted. As neither algorithm has a clear advantage over the 
other, the task of selecting the most appropriate for parallel hardware implementation is 
reserved for Chapter 5, where hardware issues of both algorithms are examined. 
Chapter 4
INTELLIGENT PIXEL PARADIGM 
"A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a 
ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort 
the dying, take orders, give orders, cooperate, act alone, solve equations, analyse a new 
problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die 
gallantly. Specialization is for insects."
Lazarus Long, "Time Enough for Love" (R.A. Heinlein)
4.1. Introduction 
The Intelligent Pixel Paradigm (IP), first introduced by Professor Kamran Eshraghian in
1998 [58], is a novel OPTO-VLSI architecture targeted towards low-power multimedia
communications systems. In comparison to conventional multimedia communications
equipment, this paradigm attempts to amalgamate traditionally modular multimedia system
components, i.e. capture, processing and display, into a single device. Although applicable to
a variety of systems, this paradigm was initially considered ideal for video
compression/decompression systems. In this case, the IP paradigm introduces a further
novel component that suggests that the multimedia system be realised by a parallel,
independent, pixel-wise processor array termed the Intelligent Pixel Array (IPA). The
potential advantage of this technique is twofold. Firstly, it allows for array size scalability
to any arbitrary size (i.e. the array size is independent of the processing architecture) with
minimal increase in clock frequencies. Secondly, employing a massively parallel processing
architecture allows for the use of very low clock speeds, providing the same processing
ability at a lower cost in required power. The major challenge posed by this paradigm is
also twofold: firstly, the challenges posed in designing a traditionally sequential processor
based algorithm in a massively parallel environment; secondly, the limitations posed by
VLSI design technologies such as the maximum die-size, minimum feature size and
production yields (i.e. larger designs increase the possibility of producing failures).
This thesis focuses on the development of a massively parallel pixel-wise zerotree coding/
decoding algorithm to suit the novel Intelligent Pixel Array architecture for the purpose of
multimedia video communications.
4.2. Conventional Video Coding Systems 
Traditionally, multimedia video communications systems are composed of five primary
components.
1. CCD (Charge Coupled Device), CMOS or other video camera - for image
capture at a specified rate (video).
2. ADC (Analogue to Digital Converter) and Encoding - Converts an
analogue video signal to digital before applying Digital Signal Processing (DSP)
techniques to compress the size requirements of the captured video.
3. Transmitter/Receiver - This component handles the communications
between two multimedia video communications devices within a specified
bandwidth limitation and standard (i.e. 3G, GSM etc.).
4. Decoder and DAC (Digital to Analogue Converter) - Expands a
compressed video stream to original bandwidth, before converting this to an
appropriate analogue signal for display.
5. CRT, LCD, LED or other display device - for the display of decoded
images.
Conventional multimedia solutions [72] generally isolate these components into separate
devices (e.g. Microsoft's ASF streaming video format).
[Block diagram: Capture (CCD / CMOS), DSP (Wavelet / Zerotree, DCT), TX/RX (GSM, 3G), DSP (Wavelet / Zerotree, DCT), Display (LCD / LED, CRT); legend: Low Bandwidth Connections, High Bandwidth Connections]
Figure 4-1: Conventional Video Communication Methodology
Figure 4-1 illustrates a typical multimedia communications system resembling that found in 
most PC based solutions. The main disadvantage of this system results from the existence 
of some high bandwidth interconnections between devices. This requires high speed 
interfacing between, for instance, the camera and DSP device. For example a QCIF image, 
with size 176 x 144, with an 8-bit colour depth, operating at 25 frames per second requires 
a bandwidth of approximately 5 Mbps. Although present technologies accommodate bit-
rates of this order, as the resolution and colour-depth are increased, most systems, unless 
specially designed, perform inadequately. Generally in order to minimise the amount of 
data generated and exchanged between these components an analogue signalling technique 
is also employed (e.g. Composite, SVIDEO etc.). 
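The QCIF figure quoted above follows from a simple product, sketched here in Python. The function name is hypothetical, and the CIF line is an added illustration (not a figure from this thesis) of how quickly the raw rate grows with resolution.

```python
def raw_video_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit-rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


qcif_bps = raw_video_bit_rate(176, 144, 8, 25)    # 5,068,800 bps, i.e. about 5 Mbps
cif_bps = raw_video_bit_rate(352, 288, 8, 25)     # four times the QCIF pixel count

print(qcif_bps / 1e6, "Mbps for QCIF")
print(cif_bps / 1e6, "Mbps for CIF")
```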
The Digital Signal Processor (DSP) generally performs a transformational task (DCT, 
Wavelet etc.) on the raw data to produce a compressed, low-bandwidth stream which is 
then transmitted. However most PC based systems cannot adequately provide a high 
bandwidth interface while maintaining real-time compression of the raw data, especially at 
higher resolutions and colour depths. Here pure hardware solutions (e.g. Matrox RT2500)
that generate compressed streams tend to perform more efficiently.
Once compressed, the low bandwidth stream, targeted for a particular transmission
channel, is distributed. The decoding process receives a compressed data stream from the
network, which is then decoded by the DSP before producing the decoded frames on the
display device. Since the PC already has a mechanism to handle large volume data outputs
(i.e. CRT via RGB cables) this is easily performed. However, the decode process generally
demands a high CPU utilisation and is further hindered if the video adapter does not
support overlays and motion compensation.
Most PC based systems tend to already accommodate some methods for high-speed
interconnects and compression. However, when considering the development of a mobile
video communications device, these systems have to be redesigned to suit the application
at hand and be integrated into the device. Considerable development time and effort can
be spared here if a single chip device had the ability to perform these tasks and provide a
compressed stream for transmission. Devices that exhibit low power consumption
characteristics will prove to be of particular use.
The scope for the IPA, therefore, can be summarised into the development of a massively
parallel, low power, video capture, compression and display system implemented on a
single chip.
4.3. The IPA Video Coding System
The Intelligent Pixel Array (IPA) is a concept that merges optical and VLSI technologies to
provide a singular OPTO-VLSI device with the ability to provide the full functionality of
conventional video communications systems. The "Divide and Conquer" principle has
always been a particularly advantageous approach for the simplified realisation of any
complex system; however, in this case the merits of a more unified approach are exploited.
Two major advantages of such a unified approach, especially in terms of realisation within
a single VLSI device, are
1. The development of a single system on chip solution - for multimedia
communications devices, simple single chip solutions provide an avenue for
product size compaction and development cost reduction, for instance, for
use in hand held units, possibly even wristwatches.
2. Minimisation of interfacing between third party modules - resources
utilised in the development of a working interface between the typically high
bandwidth capture and display components with processing components can
be allocated for other needs.
To comply with this unified strategy, the IPA design attempts to incorporate the three
components of image capture, processing and display into one 3D OPTO-VLSI device.
The 3D component is derived from the raw data I/O handling mechanism, which
receives / transmits raw data in a plane perpendicular to the processing array as seen in
Figure 4-2. The proposed video communications system supported by the IPA is
illustrated in Figure 4-3.
[Diagram: High Bandwidth Raw Data Input / Output in 3rd dimension (e.g. Image Capture / Display); Low Bandwidth Control & Stream]
Figure 4-2: 3D Chip I/O
[Diagram: two IP Arrays, each with Capture, Processing and Display, exchanging Image 1 and Image 2 over Low Bandwidth Interfacing]
Figure 4-3: IPA Video Communications System
In order to accommodate a novel 3D I/O approach, the array design employs a pixel-wise
capture mechanism, directly coupled to a massively parallel, pixel-wise processing array and
a pixel-wise display element. The array configuration for a proposed QCIF sized device
with a close-up of the visual pixel pattern is illustrated in Figure 4-4. The dark lines are
caused by the inter pixel routing paths, while the dark squares at the top left corner of each
pixel are caused by the capture photodiode. The reflective metal surrounded by these dark
artefacts constitutes the display component. The processing remains hidden under this
display metal.
[Panels: IPA; Pixel Close-up]
Figure 4-4: QCIF IPA and Pixel Close-up
4.4. Intelligent Pixel Structure 
To accommodate parallel per-pixel capture, display and processing, the IPA is constructed
with an array of M x N (for QCIF this is 176 Columns x 144 Rows) individual pixels
arranged in a grid pattern. Each of the pixels, although differing in spatial position, shares
identical capture, processing and display characteristics with every other pixel. The only
difference between pixels lies in the pixel routing interconnects, which are covered in Chapter 5.
Figure 4-5 and Figure 4-6 illustrate the top-view and cross-sectional view of each pixel
respectively. Using this structure, the image is captured via the photodiode, processed by
the 'hidden' internal circuitry and displayed via the LCD component.
[Diagram labels: Pixel Interconnections, ADC, Processing & Driver Circuitry, Top Metal Mirror for LCD Driver, Liquid Crystal & Lens]
Figure 4-5: IP Top View
[Diagram labels: Incident Light, Capture Lenslet, PD, Top Metal Mirror Driver, LCD, Reflected & Modulated Light, Display Lenslet, Liquid Crystal, Support Posts]
Figure 4-6: IP Cross-sectional View
4.4.1. IP Capture Component 
In order to capture an image, the intensity of the light falling on the photodiode is
measured. For all intents and purposes the photodiode can be viewed as a light intensity
dependent current source; therefore, by allowing the photodiode to charge a fixed size
capacitor and measuring the time taken, an intensity measure for the incident light may
be ascertained. The photodiode itself is designed as a two stage device which
maximises the carrier generation (increased capture visibility) and minimises the dark
currents (average leakage currents). The detailed design specifications for the proposed
photodiodes and the associated analogue to digital converters (ADC) are presented in
[59], and are subject to intellectual property constraints; therefore, they are considered
outside the scope of this thesis.
The ADC is basically designed to convert the analogue charge time of the capacitor
into an equivalent digital representation. Once reset, a counter, clocked at an
appropriate frequency, reaches a certain limit before the capacitor is fully discharged;
the value within this counter at that precise moment represents the intensity measure
(in this case the counter is 6 bits wide). The conversion is performed on each pixel simultaneously,
when the array is switched into its initial capture mode. Once captured, the resultant
digital values are retained within all the pixels throughout the other processing stages
until it is required to capture the next frame. This takes place exactly 40 ms (for 25 fps)
after the last capture. Figure 4-7 shows the basic steps performed by the capture
component within one capture, process and display cycle performed by the array.
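The time-to-digital idea behind the capture component can be sketched with a simple behavioural model in Python. Everything numeric here is a hypothetical assumption (capacitance, threshold voltage, clock frequency and photocurrents); the actual photodiode and ADC design is covered in [59] and is not reproduced. The model treats the photodiode as an ideal current source charging a fixed capacitor, with a 6-bit counter recording the number of clock ticks taken.

```python
def capture_count(photo_current_a, capacitance_f=1e-12, threshold_v=1.0,
                  clock_hz=1e6, counter_bits=6):
    """Clock ticks counted while the capacitor charges to the threshold voltage."""
    max_count = (1 << counter_bits) - 1           # 63 for a 6-bit counter
    if photo_current_a <= 0:
        return max_count                          # no light: the counter saturates
    charge_time_s = capacitance_f * threshold_v / photo_current_a   # t = C * V / I
    return min(int(charge_time_s * clock_hz), max_count)


# Brighter light means a larger photocurrent and therefore a smaller count.
for current in (1e-6, 1e-7, 2e-8, 1e-12):
    print(current, capture_count(current))
```

In this model the count is inversely related to intensity, so a practical design would invert or remap the latched value before treating it as a pixel brightness.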
Reset Capacitor -> Perform ADC Measurement -> Latch ADC Measurement
Figure 4-7: Capture Flowchart
4.4.2. IP Processing Component 
Each pixel contains an identical processing component that is controlled via a single set
of external control lines. A pixel, when in processing mode, operates through three 
basic modes, namely Local Stream Encode, Local Stream Decode and Foreign 
Stream Decode. In local stream encode mode the processor performs the primary 
task of encoding a raw image for transmission. In local stream decode mode the 
processor reconstructs the image as would be reconstructed at another such device. 
This allows for accurate motion analysis determination for the next frame. In the final 
stage, the foreign stream decode mode, the processor decodes the stream received 
from another such device in preparation for display. 
These three basic modes further sub-divide into 15 sub-modes, which are based on the 
processing components associated with the video codec. The flowchart illustrating the 
order of these 15 sub-modes and their positions within the three basic modes are 
presented in Figure 4-8. 
[Flowchart of the 15 sub-modes, in order: (1) Motion Analysis, (2) Forward Wavelet, (3) Quantisation, (4) Inverse Quantisation, (5) Inverse Wavelet, (6) Motion Comp., (7) Forward Wavelet, (8) Quantisation, (9) Zerotree Coding, (10) Stream Extraction, (11) Stream Load, (12) Zerotree Decoding, (13) Inverse Quantisation, (14) Inverse Wavelet, (15) Motion Comp.; grouped under Local Stream Encode, Local Stream Decode and Foreign Stream Decode]
Figure 4-8: Pixel Processing Flowchart
[Block diagram labels: Control Bus; inputs From Top, From Right, From Left and From Bottom; MUX; Main Register; Arithmetic Unit; Storage Registers; Scale Control unit; ZTE unit; Output]
Figure 4-9: IP Processing Architecture
Several reusable components are employed to perform these 15 processing tasks. 
These components include multiplexers, registers, an arithmetic unit, a scale control 
unit, a zerotree symbol generation unit and several multiplexer units. Figure 4-9 
illustrates the relation of these blocks with a single Intelligent Pixel. The operational 
details of these modes within this architecture are presented in the following sections. 
4.4.3. IP Display Component 
In 1997 [60] introduced a new ferro-electric liquid crystal on silicon spatial light
modulator (SLM) which presented a novel possibility for the display mechanism in the
IPA. This SLM utilises the top metal layer within the target technology for two
purposes: firstly to reflect the incident light, and secondly to drive the liquid crystal
deposited above it. In order to represent a grey-scale pixel value the SLM uses a time-
division multiplexing scheme. This technique allows for the brightness of a pixel to be
related back to the coefficient value held within the pixel [61].
A significant hurdle that demands attention lies in the lens arrangement proposed for
the IPA. This complexity results from the union of the capture component and the
display component on the same plane. A lens arrangement, as illustrated in Figure 4-10
and Figure 4-11, is proposed. This allows for both the display and capture of images
simultaneously.
Figure 4-10: Real Image Formation for Capture
Figure 4-11: Virtual Image Formation for Display
Research regarding this lens technology is currently being conducted at the 
University of Cambridge and, due to intellectual property constraints, full details 
relevant to this section are considered out of scope for this thesis. 
4.5. IPA Interconnects 
To be useful, all parallel processing environments require that their processing elements (PEs), 
such as the Intelligent Pixel (IP), be interconnected to other PEs for information exchange. 
For instance the hypercube architecture (as employed by C-Cube systems) suggests that 
each PE be connected to every other PE in the system [62], see Figure 4-12. Although this 
provides a flexible and high throughput data pipe between two pixels, the associated 
routing requirements tend to render this scheme impractical, especially in the case of a 
QCIF (176 x 144) pixel array. 
Figure 4-12: Hypercube Interconnect Arrangement 
The IPA, instead, accommodates an interconnect scheme that allows each pixel to 
interconnect with four of its nearest neighbouring pixels in a lattice arrangement similar to 
that used in [63]. See Figure 4-13. 
Figure 4-13: IPA Interconnect Arrangement 
The communication direction of each pixel is then controlled via two global control signals 
that traverse to each pixel in the array. Therefore, at any particular instant data exchange 
takes place in only one direction (left, right, up or down) throughout the entire array. 
The direction depends on the levels of the two signals RC (Row/Column selection) and 
LR (Left/Right selection). Figure 4-14 illustrates the four directions used and their 
corresponding signal levels. 
[Figure 4-14 panels: one per combination of RC and LR (RC=0, LR=0; RC=0, LR=1; RC=1, LR=0;
RC=1, LR=1), each showing the data flow direction between PEs; RC=0, LR=0 is labelled
"Data Flow Left to Right"]
Figure 4-14: IPA Data Flow Directions 
The direction control lines directly control a 4-input multiplexer within each pixel. This 
multiplexer then selects one input from the four outputs of the surrounding pixels: left, 
right, top or bottom. See Figure 4-9, which illustrates the 4-input multiplexer. Table 4-1 
displays the direction selection table for each 4-input multiplexer and the overall array data 
flow direction. 
Table 4-1: IP Mux Select Direction 
RC   LR   Mux Input Selected   Array Data Flow
0    0    From Left            Right
0    1    From Right           Left
1    0    From Up              Down
1    1    From Down            Up
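
The selection in Table 4-1 can be sketched in software as follows. This is a minimal illustration only, written in Python: the names SOURCE_OFFSET and step, and the value assumed to enter at the array edge, are hypothetical and not part of the thesis design.

```python
# (RC, LR) -> (row offset, column offset) of the neighbour whose output is selected.
SOURCE_OFFSET = {
    (0, 0): (0, -1),   # From Left,  data flows right
    (0, 1): (0, 1),    # From Right, data flows left
    (1, 0): (-1, 0),   # From Up,    data flows down
    (1, 1): (1, 0),    # From Down,  data flows up
}

def step(grid, rc, lr, edge=0):
    """Return the array after every pixel takes the value of its selected neighbour.

    `edge` is an assumed value presented to pixels on the array boundary.
    """
    d_row, d_col = SOURCE_OFFSET[(rc, lr)]
    rows, cols = len(grid), len(grid[0])
    result = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            src_r, src_c = r + d_row, c + d_col
            inside = 0 <= src_r < rows and 0 <= src_c < cols
            new_row.append(grid[src_r][src_c] if inside else edge)
        result.append(new_row)
    return result

pixels = [[1, 2],
          [3, 4]]
shifted_right = step(pixels, rc=0, lr=0)   # [[0, 1], [0, 3]]
shifted_up = step(pixels, rc=1, lr=1)      # [[3, 4], [0, 0]]
```

Because RC and LR are global, a single call models the whole array moving data in one direction, which mirrors the restriction described above.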
The reduction in interconnection routing complexity generally limits the application of this 
system to algorithms that are orthogonal in nature, requiring only immediate neighbouring 
pixel values. Taking this into account, a motion compensation, wavelet transform and 
quantisation scheme which is suited to this architecture is presented in [24] [44]. However, 
to perform the Zerotree coding algorithm a modified interconnection scheme becomes 
necessary, and is covered in Chapter 5. Algorithms such as arithmetic coding, which are 
predominantly sequential and require more versatile interconnections, are typically not 
suited to this environment and as such are conducted outside of the pixel array. 
4.6. Array Mapping of Video Codec 
Chapter 2 and Chapter 3 introduce several components relevant to different video coding 
schemes. This section proposes a novel video coding algorithm suited for implementation 
within an Intelligent Pixel Array Processor. Figure 4-15 illustrates a block diagram of the 
proposed codec for the IPA Processor. The extra dotted Zerotree component is included 
for the case where EZW is used over ZTE (see Chapter 5 for more information). 
Furthermore, two areas on the diagram have been defined, one for implementation within 
the array and the other for implementation outside the array, yet located on the same chip. 
[Figure 4-15 block labels. Intelligent Pixel Array based: Image, Frame Difference, Forward
Wavelet, Quantisation, Zerotree Coding, Previous Local Frame, Zerotree Decoding, Inverse
Quantisation, Inverse Wavelet, Frame Summation, Previous Foreign Frame, Received Image.
Outside array: Arithmetic Coding, Stream Package & Transmit, Stream Out, Stream In,
Receive & Stream Parsing, Arithmetic Decoding.]
Figure 4-15: Proposed IPA Video Codec 
The proposed System-On-Chip (SOC) view is presented in Figure 4-16. The main features 
of the proposed SOC include: 
• Capture, processing & display 
• 25 frames per second PAL QCIF video compression 
• Frame differencing motion compensation 
• 3-scale triangular wavelet image decompositions 
• Subband based quantisation 
• ZTE or EZW entropy compression 
• Arithmetic coding 
• Stream assembly/disassembly for transmission 
• Control circuitry 
• Low clocking frequencies (100-200 kHz) to minimise power usage 
• Integrated audio coder (research conducted at the University of Las Palmas) 
[Figure 4-16 block labels: Stream In; Receive Buffer, Stream Parse, Arithmetic Decode,
Array Interface Buffer; QCIF IPA, 176 x 144 Pixels; Array Interface Buffer, Arithmetic Coder,
Stream Encode, Transmit Buffer; Stream Out]
Figure 4-16: System-On-Chip Architecture 
4.6.1. Motion Compensation Codec 
The proposed IPA motion compensation codec employs frame differencing to 
minimise temporal redundancy between frames. Although this technique is not as 
sophisticated as the block matching motion estimation (BMME) technique, it 
represents a trade-off between hardware complexity and the provision of minimal 
motion compensation ability. Techniques such as BMME and other more complex 
motion compensation systems require the use of several extra registers and 
components (or external RAM) on a per-pixel basis. Unfortunately, when employing a 
0.25 micron CMOS technology, the required extra space is considered impractical, 
especially for a massively parallel QCIF sized array. These complex techniques are 
better implemented on systems with large memory reserves. 
A block diagram illustrating the operation of the proposed frame differencing/summation 
technique is presented in Figure 4-17. This technique exploits the trait that most video 
communication streams tend to be composed of frames that have large areas of 
similarity between adjacent frames. Therefore, when a previous frame is subtracted 
from the current, the resulting differences tend to be small, which generally improves 
the compression performance. 
Quantisation typically introduces some error into the system. Since the decoder 
only receives the 'error introduced' image and not the original image held at the sending 
device, the previous image at the decoder contains some errors, and the motion 
compensation scheme must take this into account when subtracting the previous image. 
This is overcome by storing and using the contents of the quantised frame, instead of 
the 'raw' previous frame, at the encoder. Since this quantised frame is also available at 
the decoding end, the decoder adds the incoming frame to the previous quantised frame 
to generate the new frame. In this manner, the decoded image 'tracks' the quantisation 
error and generates frames consistent with the encoder, for display. 
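
The tracking behaviour described above can be illustrated for a single pixel over time. The sketch below is in Python and rests on stated assumptions: the helper names are hypothetical, and the quantiser is a simple truncation applied directly to the difference, standing in for the full wavelet and quantisation chain of the proposed codec.

```python
def quantise(value, shift):
    """Stand-in quantiser: drop the `shift` least significant bits (assumed)."""
    return (value >> shift) << shift

def encode(samples, shift):
    """Frame differencing at the encoder, referenced to the quantised previous frame."""
    previous = 0          # content of the local frame store register
    differences = []
    for sample in samples:
        q = quantise(sample - previous, shift)
        differences.append(q)
        previous += q     # store what the decoder will reconstruct, not the raw sample
    return differences

def decode(differences):
    """Frame summation at the decoder."""
    previous = 0          # content of the foreign frame store register
    frames = []
    for q in differences:
        previous += q
        frames.append(previous)
    return frames

original = [100, 103, 110, 90]       # one pixel across four frames
received = decode(encode(original, shift=2))
```

Because the encoder subtracts the same reconstructed value the decoder holds, the reconstruction error at each frame stays below the quantiser step (4 in this example) rather than accumulating.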
[Figure 4-17 block labels: Image; Previous Image; Frame Store Reg.]
Figure 4-17: Frame Difference & Summation 
With all pixels performing this process in parallel, a frame difference of the entire image 
can be accomplished in 8-10 clock cycles, depending on the data precision chosen. 
Frame differencing is easily performed within the proposed massively parallel hardware 
architecture by the use of two storage registers, selected to match the precision of the 
input image, and one higher precision accumulator register, all distributed on a 
per-pixel basis. Each pixel also requires an arithmetic unit capable of performing both 
addition and subtraction. Since serial adders are smaller in size and subtraction is 
equivalent to negative addition, one such serial adder is employed on a per-pixel basis. 
One of the two storage registers is employed to store the previous quantised frame 
content of the locally captured image, while the other is used to store the previous 
quantised frame content of the foreign frame, hence facilitating both encode and 
decode. Figure 4-18 illustrates the architecture required by each pixel to perform frame 
differencing and summation. In this case the system is configured to perform 
summation of a recently decoded value with the previous foreign frame to develop a 
new foreign previous frame for storage. Two other modes that can be performed 
include the ability to store a summated local frame, and to provide the motion 
difference to the accumulator register. 
[Figure 4-18 block labels: Foreign Reg.; Local Reg.; Accumulator Reg.]
Figure 4-18: Frame Difference Components 
4.6.2. Scale Control Architecture 
For various purposes, such as wavelet transform and zerotree data load/extraction 
processes, a need exists to establish a scale based pixel enabling mechanism. Such a 
feature allows for the enablement of all pixels belonging to exactly one scale or 
sub-band throughout the whole array. The novel technique proposed here is heavily 
dependent on the positioning of sub-band related coefficients within a nucleic block, 
and involves the activation of a pixel via intersecting horizontal/vertical control lines. 
The design for an N x N nucleic block requires N horizontal (HEnable) control lines, 
with each horizontal line propagating to all pixels in a row, and N vertical lines 
(VEnable), each propagating to all pixels in a column. Figure 4-19 (A) shows the 
required HEnable and VEnable signals for a 3-scale nucleic block. Figure 4-19 (B) also 
shows the required control signals for a 16 x 16 pixel array, which consists of four 
3-scale nucleic blocks. It is observed that the same number of external control lines (8 + 
8 = 16) can be used to drive the HEnable and VEnable lines in both figures. Similarly, 
given a 3-scale transform, any arbitrary sized array may be controlled via 8 vertical and 
8 horizontal control lines. Generally, for a transform with n scales, $2^n$ external control 
lines are required for the horizontal and $2^n$ for the vertical, resulting in a total of 
$2 \times 2^n$ control lines. 
[Figure 4-19 labels: HEnable Lines; VEnable Lines 1 to 8; numbered pixels; panels (A) and (B)]
Figure 4-19: VEnable & HEnable Control Lines 
In order to select a subband the relevant VEnable and HEnable signals are set high. 
Figure 4-20 illustrates the principle behind subband selection (HH3) through activation 
combinations of VEnable and HEnable. 
[Figure 4-20 labels: Nucleic and Pyramidal layouts; enable-line levels 0 1 0 1 0 1 0 1;
subbands LH3, HL3, HH3]
Figure 4-20: Activation of HH3 subband via VEnable & HEnable 
A full list of useful VEnable / HEnable activation patterns for a 3-scale nucleic block 
based array, of any arbitrary size, is presented in Table 4-2. Using this, specific 
subbands in a particular scale can be operated on. The last four entries in the table 
relate to wavelet transform operations, where more than one subband is active at one 
time. 
Table 4-2: VEnable & HEnable list for subband activation. 
Subband    Enable-line patterns (VEnable and HEnable)
LL         10000000   10000000
LH1        00001000   10000000
HL1        10000000   00001000
HH1        00001000   00001000
LH2        00100010   10001000
HL2        10001000   00100010
HH2        00100010   00100010
LH3        01010101   10101010
HL3        10101010   01010101
HH3        01010101   01010101
All        11111111   11111111
0 to 2     10101010   10101010
0 to 1     10001000   10001000
0          10000000   10000000
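
As a rough illustration of how the patterns in Table 4-2 select pixels, the Python sketch below tiles a pair of 8-bit patterns over an array of arbitrary size and enables a pixel only where both of its lines are high. The function names are hypothetical, the leftmost character of a pattern is assumed to drive line 1, and no assumption is made here about which table column drives rows and which drives columns: the examples use entries whose two patterns are identical.

```python
def control_lines(scales):
    """External control lines for an n-scale transform: 2^n horizontal plus 2^n vertical."""
    return 2 * 2 ** scales

def enable_map(row_pattern, col_pattern, rows, cols):
    """True where a pixel sits on a high row line and a high column line.

    The same external lines repeat every len(pattern) pixels, so the array
    size does not change the number of control lines.
    """
    n = len(row_pattern)
    return [
        [row_pattern[r % n] == "1" and col_pattern[c % n] == "1" for c in range(cols)]
        for r in range(rows)
    ]

def count_enabled(mask):
    return sum(cell for row in mask for cell in row)

hh3 = enable_map("01010101", "01010101", 16, 16)   # HH3 entry of Table 4-2
ll = enable_map("10000000", "10000000", 16, 16)    # LL entry of Table 4-2
```

For the 16 x 16 array of Figure 4-19 (B), the HH3 patterns enable 64 pixels (16 in each of the four nucleic blocks) and the LL patterns enable 4 pixels, one per nucleic block, using the same 16 external lines.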
[Figure 4-21 block labels: Control Signals Including VEnable and HEnable; Pixel Enable
Circuitry; Processing Circuitry; Out to 4 Surrounding Pixels]
Figure 4-21: Pixel Bypass Architecture 
A disabled pixel (i.e. when the HEnable / VEnable lines to that pixel are low) functions as 
a pass-through unit for data. If data arrives from the left, right, top or bottom, the data 
is repeated at the right, left, bottom or top respectively. The bypass architecture is 
presented in Figure 4-21. In this mode nearly all functions of the pixel are disabled. 
The internal signal GEnable is employed to indicate the activation status of a pixel; if 
GEnable is logic high then the pixel is enabled, otherwise it is disabled. GEnable is derived 
directly from VEnable, HEnable and other zerotree components described in Chapter 5. 
If either VEnable or HEnable is set low, then GEnable is forced low. 
4.6.3. High Pass / Low Pass Pixel Identification 
In order to perform the wavelet transform, each pixel requires information identifying 
it as a high-pass or low-pass pixel. To accommodate this each pixel is equipped with 
two pairs of signals. Signals HPRin and HPRout are used for row-wise identification, 
while signals HPCin and HPCout are used for column-wise identification. These signals 
cascade from one pixel to the other in either a row or column direction. Figure 4-22 
illustrates the connection pattern for the first few pixels in an array. Logic zero is 
asserted at the top and left edges to initialise the high/low pass identification process. 
The output signals for the row and column are derived from the input signals based on 
the activation status of each pixel. Each output is produced by a switchable inverter, 
which inverts the input signal in an enabled pixel. Inactive pixels do not invert 
and reproduce the input signal at the output. 
[Figure 4-22 legend: 1 = HPCin, 2 = HPCout, 3 = HPRin, 4 = HPRout; logic 0 asserted at the
top and left edges]
Figure 4-22: Low/High pass pixel identification architecture 
If all pixels are enabled (i.e. scale 0-2), the array becomes a grid pattern of low and 
high pass pixels. If every alternate pixel is enabled (i.e. scale 0-1), then the grid pattern 
is found in these alternate pixels. Figure 4-23 illustrates this pattern for when all pixels 
are activated and for when alternate pixels are activated. The first digit of the two-digit 
number in each pixel indicates the column-wise inverting pattern, while the second 
digit indicates the row-wise inverting pattern. 
[Figure 4-23 panels: Scale 0 - 2 Active; Scale 0 - 1 Active (with deactivated pixels marked)]
Figure 4-23: High/Low pass grid pattern 
Once the transform direction (row or column) is selected, this technique 
provides each pixel with its high/low pass identification. Although presented for one 
nucleic block of a 3-scale transform, this holds true for all image sizes and nucleic block 
sizes. 
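
The cascade of switchable inverters along one row (or column) can be modelled in a few lines. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that a pixel reads its identity from its input signal (0 for low-pass, 1 for high-pass), which is consistent with logic zero being asserted at the array edge.

```python
def pass_flags(enabled):
    """Return the high-pass flag seen by each pixel along one row or column.

    `enabled` lists the activation status of consecutive pixels. Disabled
    pixels receive None and simply forward the signal unchanged.
    """
    flags = []
    signal = 0                      # logic zero asserted at the array edge
    for is_enabled in enabled:
        flags.append(signal if is_enabled else None)
        if is_enabled:
            signal ^= 1             # switchable inverter acts in enabled pixels only
    return flags

all_active = pass_flags([1, 1, 1, 1, 1, 1, 1, 1])
alternate_active = pass_flags([1, 0, 1, 0, 1, 0, 1, 0])
```

With all pixels enabled the flags alternate 0, 1, 0, 1 along the line; with every alternate pixel enabled the same alternation appears on the enabled pixels only, matching the two patterns described for Figure 4-23.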
4.6.4. Triangular Wavelet Transform Codec 
The thesis in [44] presents a novel architecture for performing the triangular wavelet 
transform and its inverse, given a massively parallel processing architecture such as the 
IPA, unlike [45], which represents the majority of SIMD implementations. Since a 2D 
triangular wavelet transform (forward and inverse) is orthogonal in nature, it is 
generally implemented by realising two single dimensional transforms, one for the rows 
and the other for the columns. The IPA implementation of the transform exploits this 
to perform the 2D full image transform in a row/column manner as illustrated in 
Figure 4-24. However, when performing either the row or column transforms, they are 
conducted in parallel for the entire array, reducing the required clock cycles to around 
150-200 for a 3-scale full image transform. 
Row-wise Transform Column-wise Transform 
Figure 4-24: Row/Column 1D Transforms for 2D 
Furthermore, since the triangular wavelet algorithm (Section 2.5.2) is realised with 
simple shifts (multiply/divide by 2) and summations, hardware implementation is easily 
performed via a single shift register, an arithmetic unit, some pixel enable logic and a 
high/low-pass selector on a per-pixel basis. The shift register provides for coefficient 
storage and multiplication. The arithmetic unit provides an addition/subtraction 
facility. The pixel enable logic allows for scale selection while the high/low-pass 
selector determines which pixels belong to a scale's high-pass/low-pass coefficient 
components. Figure 4-25 depicts the per-pixel architecture required to perform a 
multi-scale wavelet transform and inverse. 
[Figure 4-25 block labels: Shift Register; Reg. Enable; Data Out; Adder; Adjacent Pixel Value;
Pixel Active Signal; Incident High/Low Pass Select Signal; Scale Based Pixel Activate Control;
High/Low Pass Selector; High-Pass Pixel Signal; High/Low Pass Select Signal For Next Pixel;
Scale Control Signals]
Figure 4-25: Wavelet Transform Architecture 
The control structure for a forward transform, as taken from Section 2.5.2, is as 
follows. 
1. Enable all pixels - Set the scale control signals to activate all pixels. 
2. Multiply all high-pass pixel coefficients by 2 - Since each pixel is assigned 
as low/high pass (from Section 4.6.3), the multiplication is easily limited to the 
high-pass pixels. Typically, multiplication by two is performed via left shifts on 
the coefficient register; however, in this case since the register R0 is already 
capable of performing right shifts, it is beneficial, in terms of minimising 
hardware complexity, to perform this using right shifts. This is accomplished 
by allowing R0 the ability to cycle, which is performed P-1 times, where P 
represents the precision of register R0 (in this case 9). Overflow is a non-issue 
if the chosen precision for the wavelet coefficient is limited to P-3 (in this case 6) 
bits and 2's complement numbering is maintained. 
3. Subtract low-pass coefficient from left - Register R0, in all low-pass pixels, 
during the wavelet transform is fixed to rotate. In comparison the high-pass 
pixels are routed to receive data from the accumulator component. The control 
lines LR and RC are first set to '00' to accept data from the left. The 
accumulator module is then forced into subtract mode via setting a control 
signal S to '0', and a carry bit is preset. The clock into R0 is then enabled via 
the control signal EnR0. Nine clock cycles are then applied to attain the 
subtracted value in R0 in all high-pass pixels. 
4. Subtract low-pass coefficient from right - This is similar to the previous 
stage, with the exception that LR and RC are now set to '10'. 
5. Divide contents of R0 by 2 - R0 in each high-pass pixel is then divided by 
two. This is performed by setting EnR0 to '0', holding the most significant bit 
of R0 constant and right-shifting R0 by one. 
6. Perform in vertical direction - Here processes 2 to 5 are repeated, except for 
the control signals LR and RC, which are initially set to '01' and then '11' for 
processes 3 and 4. 
7. Perform other scales - The other scale transforms are performed by repeating 
processes 2 to 6. However, during each iteration the array is sub-sampled in 
terms of the number of active pixels. VEnable and HEnable are established 
according to Table 4-2 to select the low-pass pixels from a previous stage. 
The forward wavelet transform results in the generation of multiple nucleic blocks 
throughout the array. In the case of a 32 x 32 pixel array, using a three scale transform, 
4 x 4 nucleic blocks of size 8 x 8 pixels are generated. During the inverse transform 
the image is developed across the array, instead. The inverse transform is performed 
by repeating processes 1 to 7, with the exception that additions are performed instead 
of subtractions in processes 3 and 4. 
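
Processes 2 to 5 can be sketched for a single row at a single scale as follows. This Python fragment is a simplified model rather than the thesis implementation: the function names are hypothetical, odd positions are assumed to be the high-pass pixels, the last high-pass pixel is assumed to reuse its left neighbour when no right neighbour exists, and the divide by two is modelled as an arithmetic right shift.

```python
def forward_row(values):
    """One row-wise pass: high = (2*high - left_low - right_low) / 2."""
    out = list(values)
    n = len(out)
    for i in range(1, n, 2):                          # high-pass pixels
        left = out[i - 1]
        right = out[i + 1] if i + 1 < n else left     # assumed edge rule
        out[i] = (2 * out[i] - left - right) >> 1     # processes 2, 3, 4 then 5
    return out

def inverse_row(coeffs):
    """Same sequence with additions in place of subtractions."""
    out = list(coeffs)
    n = len(out)
    for i in range(1, n, 2):
        left = out[i - 1]
        right = out[i + 1] if i + 1 < n else left
        out[i] = (2 * out[i] + left + right) >> 1
    return out

row = [10, 12, 14, 13, 12, 20, 28, 30]
coeffs = forward_row(row)          # [10, 0, 14, 0, 12, 0, 28, 2]
restored = inverse_row(coeffs)     # equals row for this input
```

Low-pass pixels are left untouched by these steps, so every high-pass pixel can be updated in parallel. In this model the round trip is exact whenever the two neighbouring low-pass values sum to an even number; otherwise the truncating shift can leave the restored value one below the original.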
4.6.5. Coefficient Quantisation and Inverse 
Quantisation is only a necessity for coefficients used in the zerotree entropy coder 
(ZTE), because the embedded zerotree wavelet (EZW) algorithm harbours an 
embedded quantisation mechanism within its coding algorithm. The ZTE quantisation 
mechanism is simply implemented by a sub-band based coefficient truncation 
technique. This generally limits the possible quantisation step sizes to 
integer powers of two. However, when considering the reduction in per-pixel 
circuitry posed by a simple right-shift truncation, in comparison to division, the 
trade-off is easily justified. The proposed quantisation architecture is presented in 
Figure 4-26. 
[Figure 4-26 block labels: Enable; Right Shifting Register R0; R0 Enable]
Figure 4-26: Quantisation Architecture 
The quantisation process relies on the rightward bit shift of coefficients in register R0. 
However, to perform correct 2's complement quantisation the MSB is held constant 
by disabling the clocking of the MSB. By enforcing that register R0 rotates only when 
a pixel is active, entire sub-bands can be made active and quantised to a particular level. 
The number of bits truncated (T), when quantising to a particular level, is directly 
proportional to the number of clock cycles issued. 
The inverse quantisation process is, in principle, performed by reversing the direction 
of the shift register. However, since it is costly to implement a bidirectional shift 
register, the inverse is performed via right-shifts. If the rotate feature of the register is 
enabled and the MSB is also enabled to rotate, then performing B - T clock cycles, where 
B represents the number of bits in register R0 (in this case 9) and T represents the number 
of quantisation clock cycles, results in the inverse quantisation of the coefficient. 
Examining an example: if R0 contained 0 01001011b (75) and two bits are truncated, 
the resultant becomes 0 00100101b (37) after the first clock cycle, and 0 00010010b 
(18) after the second. During the inverse phase the register is cycled seven (9-2) times 
in the following manner: to 0 00001001b, to 1 00000100b, to 0 10000010b, to 
0 01000001b, to 1 00100000b, to 0 10010000b and finally to 0 01001000b (72). 
Once the quantisation is completed the ZTE encode stage takes place, as covered in 
Chapter 5. 
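
The worked example above can be reproduced with a small register model. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the 9-bit register is represented as an unsigned integer whose top bit is the sign bit.

```python
B = 9  # width of register R0 in bits, sign bit included

def quantise(r0, t):
    """Shift right t times while the MSB is held constant (its clock is disabled)."""
    sign = r0 & (1 << (B - 1))
    for _ in range(t):
        r0 = sign | (r0 >> 1)
    return r0

def inverse_quantise(r0, t):
    """Rotate right B - t times with the MSB included in the rotation."""
    for _ in range(B - t):
        r0 = (r0 >> 1) | ((r0 & 1) << (B - 1))
    return r0

quantised = quantise(0b001001011, 2)         # 75 with two bits truncated
restored = inverse_quantise(quantised, 2)    # seven right rotations
```

Rotating a 9-bit register right by B - T positions is equivalent to rotating it left by T positions, which is why a right-shift-only register can undo the truncation. For the example in the text, `quantised` is 18 and `restored` is 72.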
4.6.6. Stream Codec 
The zerotree coding stage precedes the arithmetic coding stage. Design of an 
arithmetic coding architecture is considered beyond the scope of this thesis; however, a 
suitable design is presented in [64]. Optimally, the arithmetic coder should be designed 
to be adaptive, with at least two statistical tables to represent the symbol and 
coefficient set of the chosen zerotree coder. Since a typical arithmetic coder is 
sequential in nature, it may be placed outside the parallel array yet remain on the same 
chip. Once arithmetic coding has been performed the stream is ready to be packed and 
transmitted. For ideal performance at least two FIFO based I/O buffers are required 
to maintain the data rates. Conventional buffer designs such as those generated via the 
XILINX buffer generator may be used. 
4.7. Primary Component Schematics 
This section presents a series of components in schematic form, necessary for the 
implementation of the proposed wavelet based zerotree codec. Components presented 
herein are designed to integrate to either the EZW or the ZTE codecs. As such, 
optimisations for a particular implementation have been omitted. The designs have been 
verified in VHDL and have been shown to adequately perform the required tasks. Tools 
such as Synopsys, Cadence and VHDLSimili have been employed for the verification of 
these architectures. 
The designs contain two types of I/O representations: I/O ports drawn with either 
continuous or dashed lines. The latter represents signals generated and used within 
individual pixels, while the former indicates control signals originating externally to pixels. 
Furthermore, blue I/O ports represent data, whereas red I/O ports represent 
control lines. 
4.7.1. Summation Unit 
The schematic representation of the summation unit or accumulator is presented in 
Figure 4-27. 
[Figure 4-27 signal labels: CLK; Latch; R0 Out; ConVa; Sout; R0/MC In; Ext/MC Out; S]
ConVa  S  EnR0a  Mode         Clk Cycles
0      0  0      Carry Set    1
0      0  1      Subtract     Data Dep.
0      1  0      Carry Clear  1
0      1  1      Add          Data Dep.
1      0  0      Carry Set    1
1      0  1      Convert      Data Dep.
1      1  0
1      1  1      [illegible]
Figure 4-27: Summation Unit Schematic 
The summation unit is capable of performing the following tasks. 
1. Bitwise Addition - This is performed by firstly clearing the latch, then setting 
control signal S to high and clocking for one clock cycle. Once EnR0a is set high, 
bit-wise data arriving at EXT/MCOUT and R0out are summated to produce a 
bit-wise result at R0/MCin when clocked. 
2. Bitwise Difference - This is performed by firstly setting the latch, then setting 
control signal S to low and clocking for one clock cycle. This sets up the circuit to 
perform 2's complement differencing, or subtraction by negative addition. Once 
EnR0a is set high and the circuit is clocked, data from the two inputs are 
differenced according to R0/MCin = R0out + (-EXT/MCOUT). 
3. Bitwise 2's Complement Conversion - This mode is used to convert the 
contents arriving at R0out into 2's complement mode and vice versa. For this to 
occur S must be set low for one clock cycle in order to set the latch. Then both 
EnR0a and ConVa are set high and the circuit is clocked. 
EnR0a and ConVa are both derived signals, which are activated by other parts of the 
architecture. 
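
The addition and difference modes can be sketched as a bit-serial adder whose carry latch is either cleared or preset. The Python model below is illustrative only: the function name and the 9-bit default width are assumptions, and operands are treated as 2's complement bit patterns processed least significant bit first.

```python
def serial_add_sub(r0, ext, bits=9, subtract=False):
    """Bit-serial R0 + EXT, or R0 + (-EXT) when subtract is True.

    Subtraction inverts each EXT bit and presets the carry latch to 1,
    which is the usual 2's complement negative addition.
    """
    carry = 1 if subtract else 0        # latch set for difference, cleared for addition
    result = 0
    for i in range(bits):               # one bit per clock cycle, LSB first
        a = (r0 >> i) & 1
        b = (ext >> i) & 1
        if subtract:
            b ^= 1
        result |= (a ^ b ^ carry) << i
        carry = (a & b) | (carry & (a ^ b))
    return result

total = serial_add_sub(20, 7)                       # 27
difference = serial_add_sub(20, 7, subtract=True)   # 13
negative = serial_add_sub(7, 20, subtract=True)     # 499, the 9-bit pattern of -13
```

A single full adder and one latch therefore suffice for both operations, at the cost of one clock cycle per bit of precision.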
4.7.2. The Register R0 
The schematics for register R0 are illustrated in Figure 4-28. Descriptions for the I/O 
ports used are also presented in Table 4-3. 
[Figure 4-28 labels: R0Ct1; R0Ct2; EnR0; register modes Cycle, EXT Load, Convert, Adder Load]
Figure 4-28: Register R0 Schematic 
Table 4-3: Register R0 I/O Ports 
Signal    Dir     Type     Origin    Description
CLK       In      Clock    External  Clock signal
EnR0      In      Control  External  R0 Clock Enable
HPRin     In      Control  External  Row High-Pass Selector Input
R0Ct1     In      Control  External  Mode Select 1
R0Ct2     In      Control  External  Mode Select 2
RC        In      Control  External  (Row) Column Selector
REV       In      Control  External  Reverse R0
ClkMSBa   In      Control  Internal  R0 MSB Clock Enable
ConVa     Out     Control  Internal  Generated Convert Signal
EnR0a     Out     Control  Internal  Generated Pass Based R0 Enable
GEnable   In      Control  Internal  Pixel Enabled Signal
LP        Local   Control  Internal  Low-Pass Identification
WT        In      Control  Internal  Wavelet Transform Mode Select
Above     In/Out  Data     External  Column High-Pass Selector Input
Adder     In      Data     Internal  Data From Summation Unit
EXT       In      Data     External  Data From Other Pixels
R0out     Out     Data     Internal  Data To Other Components
R4out     In      Data     Internal  Data From Zerotree
R5out     In      Data     Internal  Data From Zerotree
SigE      In      Data     Internal  Data From Zerotree ID
Sign      Out     Data     Internal  Data to Zerotree
ZR01      In      Data     Internal  Data from Zerotree
Register R0 is composed of 9 edge-triggered D-type flip-flop registers, several input 
selection multiplexers and various control logic, which describe the behaviour of the 
register. The behaviour and precision of register R0 depends on which mode the pixel 
currently resides in (as a result of the mode the array is placed in). If the pixel is in 
wavelet mode (WT is set logical high) then the precision of R0 is increased to 9 bits 
from the 8 bits used in any other case. This one bit increase guarantees the 
non-occurrence of an overflow during the wavelet transform. Also in this mode, register 
R0 in enabled high-pass pixels is automatically clocked while the enabled low-pass 
pixels are forced to cycle. In light of this, the two main operation modes of the register 
R0 are as follows. 
1. Shifting - The primary function of register R0 is to shift its contents right on a 
clock cycle. The input to the most significant bit, however, can arrive from either 
another pixel (EXT), the summation unit (Adder), the LSB of R0 (R0out) or the 
zerotree components (SigE, R4out or R5out). The significant amount of control is 
used to facilitate the multiple sources and some other features used for different 
operations in other processing stages. 
2. Reverse - When the Rev signal is set high, the register swaps R0_0 and R0_1 with 
R0_6 and R0_7 and performs a cycling function, all within one clock cycle. By 
applying one swap for every four shifts the contents of register R0 are reversed. 
Table 4-4 shows an example of how the contents of R0 are reversed within nine 
clock cycles. The reversing function is used for the EZW codec and is described 
further in Chapter 5. The reverse function should only be activated in pixel shift 
(SH = 1) mode. 
Table 4-4: R0 Reversing Function 
Rev   Register contents
1     1 2 3 4 5 6 7 8
0     1 8 7 3 4 5 6 2
0     2 1 8 7 3 4 5 6
0     6 2 1 8 7 3 4 5
1     5 6 2 1 8 7 3 4
0     5 4 3 2 1 8 7 6
0     6 5 4 3 2 1 8 7
0     7 6 5 4 3 2 1 8
0     8 7 6 5 4 3 2 1
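
The sequence in Table 4-4 can be reproduced with a short model. The Python sketch below is a hedged reading of the table rather than the circuit itself: the function name is hypothetical, the register is represented as a list of eight labelled positions, and the swap is assumed to exchange the two outer pairs end for end before the usual one-position cycle.

```python
def clock(contents, rev):
    """Advance the 8 labelled positions by one clock cycle.

    With rev high the two outer pairs are exchanged end for end (assumed
    interpretation of the swap) before the one-position cycle to the right.
    """
    if rev:
        contents = [contents[7], contents[6]] + contents[2:6] + [contents[1], contents[0]]
    return [contents[-1]] + contents[:-1]

state = [1, 2, 3, 4, 5, 6, 7, 8]
history = [state]
for rev in [1, 0, 0, 0, 1, 0, 0, 0]:     # one swap for every four shifts
    state = clock(state, rev)
    history.append(state)
```

Under these assumptions `history` steps through the same register contents as the rows of Table 4-4 and ends with the contents fully reversed.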
In addition to these operating modes register R0 contains other features to 
accommodate various requirements requested by different components in the 
processing stage. The features include the ability to aid in the conversion of a 2's 
complement value into sign magnitude representation (via R0Ct1 and R0Ct2), hold the 
most significant bit (R0_8, sign bit) constant (via ClkMSBa), behave differently 
depending on high/low pass pixel status (via LP and EnR0) and extract/load 
sign/magnitude values from the zerotree components (via SigE, R0out, R4out, R5out 
and ZR01). 
4.7.3. Motion Compensation Unit 
The motion compensation unit is based on the storage of two 6-bit pixel values and the 
redirection of these values to different processing components. The schematics for the 
motion compensation unit are provided in Figure 4-29. It can accept data from two 
sources, R0out (register R0) or Adder (accumulator unit), and produces resultant data via 
MCout. The two registers, R1 and R2, are both composed of edge-triggered D-type 
flip-flops (DFFs) interconnected in a manner to allow right shifts of data. In this 
implementation, the previous frames (foreign and local) are represented by 6-bit pixel 
values; therefore, these registers are designed to contain 6-bit values (i.e. 6 DFFs). 
[Figure 4-29 labels: R1 - Foreign; R2 - Local; LR1245]
Figure 4-29: Motion Compensation Unit 
Table 4-5 lists all signals used in the motion compensation unit and their respective 
descriptions. The signal LR has been reused to select between the inputs R0out and 
Adder, as the pixel interconnection direction is not required during this simple motion 
compensation stage. For any clocking to occur in motion compensation mode, the 
internal control signal MC must be held high. The signal Strmout determines which of 
R1 (previous foreign pixel storage) or R2 (previous local pixel storage) is activated. R1 
is activated when Strmout is low, else R2. 
Table 4-5: Motion Compensation Signals 
Signal    Type     Description
CLK       Clock    Clock Signal
LR        Control  Selects Load from R0 or Adder
LR1245    Control  Selects Register Load/Rotate Mode
Strmout   Control  Selects R1 or R2 Clocking and Output
MC        Control  Activates Motion Compensation Unit
4.7.4. Mode Selection Circuitry 
To switch the pixel, and ultimately the array, into specific operation modes,
special mode selection circuitry is required. The mode selection circuitry is primarily
dependent on two global control signals, M1 and M2, which propagate to all pixels.
Depending on the values of M1 and M2, the pixel can operate in one of four modes:
Motion Compensation Mode (MC = 1), Wavelet Mode (WT = 1),
Zerotree Mode (ZT = 1) and Data Shift Mode (SH = 1). Figure 4-30 illustrates the
schematics for the proposed mode selection circuitry.
M1 | M2 | Mode | Description
0 | 0 | SH | Shifting Mode
0 | 1 | MC | Motion Compensation Mode
1 | 0 | WT | Wavelet Transform Mode
1 | 1 | ZT | Zerotree Codec Mode
Figure 4-30: Mode Selection Circuitry
The selected mode applies to all pixels in the array and enables special processing
circuitry relevant to the particular mode.
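The mode selection described above amounts to a 2-to-4 one-hot decode. The following is a minimal behavioural sketch in Python, not the thesis schematic. It assumes that the truth table accompanying Figure 4-30 lists the (M1, M2) combinations 00, 01, 10 and 11 against SH, MC, WT and ZT in that order; only the SH = 00 assignment is confirmed elsewhere in the text. The function name `decode_mode` is hypothetical.

```python
def decode_mode(m1, m2):
    """Return one-hot mode flags for the global select signals M1 and M2.

    Assumed ordering (only SH = 00 is confirmed by the text):
    00 -> SH, 01 -> MC, 10 -> WT, 11 -> ZT.
    """
    if m1 not in (0, 1) or m2 not in (0, 1):
        raise ValueError("M1 and M2 must each be 0 or 1")
    index = (m1 << 1) | m2
    names = ("SH", "MC", "WT", "ZT")
    # Exactly one flag is high for any input combination.
    return {name: int(position == index) for position, name in enumerate(names)}


if __name__ == "__main__":
    for m1 in (0, 1):
        for m2 in (0, 1):
            print(m1, m2, decode_mode(m1, m2))
```

Because the flags are one-hot, circuitry that is enabled by ZT is guaranteed to be idle whenever SH, MC or WT is selected.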
4.8. Conclusion 
In this chapter, the main design aspects of a novel massively parallel pixel-wise processing
array, with the potential to capture, compress, decompress and display real-time video in a
single 3D OPTO-VLSI device, have been explored. An optics plane, oriented
perpendicularly to the processing circuitry and electronic interfacing, has been shown to
interface high bandwidth data for video capture and display. In comparison, the 2D VLSI
processing plane has been shown to interface low bandwidth compressed data to and from
another such device (or relevant software). The Intelligent Processing Array (IPA) has
been shown to be composed of N x M (in the case of a QCIF image there are 176 x 144
pixels) processing elements (PEs) termed Intelligent Pixels (IPs). Each IP has the
capability to perform light capture, via a photodiode/ADC, perform processing, via
VLSI circuitry, and display a pixel value, via an LCD based SLM. The VLSI circuitry,
situated underneath the display, has been designed to perform frame differencing,
triangular forward/inverse wavelet transform and sub-band based quantisation
mechanisms, all on a per-pixel basis. Furthermore, these VLSI architectures also possess
the ability to interface with an integrated zerotree codec, such as the EZW or ZTE
architectures, presented in Chapter 5. This chapter fully details these architectures and
associated processing components, which have been verified for functionality.
Chapter 5
MASSIVELY PARALLEL ZEROTREE CODEC FOR
THE IPA
"The significant problems we face cannot be solved at the same level of thinking we were
at when we created them."
Albert Einstein (1879-1955)
5.1. Introduction
Recent times have introduced a number of hardware designs/implementations of zerotree
algorithms such as those described in Chapter 3. Some of the more prominent designs are
presented in [65] [66] [67]. Essentially, these designs are intended for operation on SIMD
processors, and as such a massively parallel approach has not yet been investigated. The
work presented in this chapter illustrates the design of a massively parallel self-classifying
array of pixel-wise processing elements, which as a whole perform the duties of a zerotree
algorithm for the purpose of image/video compression. Two zerotree codec designs are
presented, firstly the EZW and secondly the ZTE algorithm. A choice is then made
for the selection of the most suited algorithm for VLSI implementation within the IPA
parallel processing array prototype. Tackled herein are problems relating to parallel
implementations of hierarchical zerotree propagation, pixel based self-classification of
zerotree symbols and coefficients, integration of a zerotree codec with the wavelet transform,
parallel data load/extract for a varying number of pixels and array-wise replication of
processors for arbitrary sized array generation. The architecture and schematics presented
herein have been verified through VHDL simulations conducted in Synopsys and/or
VHDLSimili.
5.2. Parallel Significance Tree Propagation 
An interesting challenge that surfaces when considering the design of VLSI architectures
for zerotree coding relates to the method chosen to implement the significance
propagation tree between related parent and descendant pixels. As discussed in Chapter 3,
zerotree algorithms require significance information sharing between parent and
descendant pixels that belong to similar spatial positions but are located within the multiple
scales of a wavelet transform. Typically, the wavelet coefficients are organised by scale into
a memory bank, which is then traversed in multiple passes, identifying significant
coefficients and then coding relevant symbols for those coefficients. This technique is
ideal for conventional SIMD processors. The organisation of the memory bank depends
on the search technique selected, from a typical choice between Depth First Search and
Breadth First Search, which are described in Chapter 3. To introduce an increase in
processing speed, some solutions adopt a MISD approach, which subdivides the wavelet
coefficient map into three distinct sets of data, to which individual processors are allocated
to perform parallel processing on the main tree. Extending this concept significantly, this
section proposes a parallel search scheme that allows pixel-wise search of the entire tree in
a MIMD approach.
5.2.1. Nucleic Blocks and Scale Selection 
As introduced in Chapter 2, a nucleic block (NBLOCK) consists of all related spatial
coefficients resulting from the wavelet transform of an image. Figure 5-1 (a) shows the
array-wise distribution of coefficients generated by a typical SISD wavelet transform
technique, while Figure 5-1 (b) illustrates the NBLOCK equivalent. This indicates that
an NBLOCK contains exactly one wavelet tree, and as such forms one zerotree
dependence tree. Furthermore, it implies that each of these trees can be processed
independently of the others and, therefore, in parallel. An NBLOCK is a perfectly
square entity with dimensions that are purely governed by the number of scales used in
the wavelet transform that generated it. If $Sc$ represents the number of wavelet scales
and the size of the NBLOCK is represented by $L = K^2$, then the dimension $K$ is
defined as $K = 2^{Sc}$. Therefore, a three scale wavelet transform will generate an 8 x 8
NBLOCK.
(a) Image-wise tree dispersion (b) NBLOCK tree dispersion
Figure 5-1: NBLOCK Coefficient Tree Representation
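As a quick check of the relation between the number of scales and the NBLOCK size, the short Python sketch below evaluates K = 2^Sc and L = K^2 and counts how many NBLOCKs tile a frame. The function names are hypothetical, and the requirement that the frame dimensions be exact multiples of K is an assumption made for this illustration.

```python
def nblock_dimensions(scales):
    """Return (K, L): the NBLOCK side length K = 2**Sc and its pixel count L = K**2."""
    if scales < 1:
        raise ValueError("at least one wavelet scale is required")
    k = 2 ** scales
    return k, k * k


def nblocks_in_frame(width, height, scales):
    """Count the NBLOCKs that tile a frame (assumes exact tiling)."""
    k, _ = nblock_dimensions(scales)
    if width % k or height % k:
        raise ValueError("frame dimensions must be multiples of K")
    return (width // k) * (height // k)


if __name__ == "__main__":
    print(nblock_dimensions(3))           # 8 x 8 block, 64 pixels
    print(nblocks_in_frame(176, 144, 3))  # QCIF frame with 3 scales
```

For a QCIF frame (176 x 144) and three scales this gives 22 x 18 = 396 independent NBLOCKs, each holding one wavelet tree.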
The proposed parallel scheme is based on each pixel receiving the significance status of 
all its relations instantaneously and as they are determined. In order for this to occur 
in parallel across all pixels in the array, fixed interconnections are required between the 
parents and their descendants. This is described in the next section. An effect that 
results is the enforcement of fixed sized NBLOCKs pertaining to a particular 
implementation, as the interconnections are hardwired and not dynamically allocated. 
This also imposes a restriction on the number of scale levels for the wavelet transform 
as this must comply with that chosen for the zerotree component. Literature suggests
[22] that typical image compression algorithms employ 3 - 6 scale transforms.
However, as the number of scales increases, so do the number of interconnections
and the routing complexity of those interconnections within an NBLOCK. Therefore, a
compromise is required. Figure 5-2 illustrates a series of performance curves for
different images coded at 64K. Each of the images is compressed with a 1 - 8 scale 
wavelet transform and zerotree coder. As is clearly seen, a 3 scale and an 8 scale codec 
show very little difference in compression performance. This suggests that a 3 scale 
system while providing the least complex interconnection pattern also provides 
adequate compression performance. Figure 5-3 extends this to display the effect of a 
varying bit-count, and shows that with the exception of very low bit-counts a 3 scale 
system performs quite adequately. Although 3 scale systems are described hereafter,
the principles apply to all other scaled systems, with the exception of the parallel
interconnection pattern.
[Plot: PSNR against number of scales (1 - 8) for the images Couple, Crowd, Lax, Lena, Man, Woman1 and Woman2]
Figure 5-2: Scales vs. PSNR for Different Images
[Surface plot: PSNR against number of scales and bit-count]
Figure 5-3: Performance for Varying Bit-counts & Scales
5.2.2. Tree Interconnections and Propagation 
Once identified, the number of chosen scales (in this case 3) dictates the size and 
interconnection pattern of the NBLOCK. As previously discussed, a 3 scale 
decomposition results in the generation of an 8 x 8 pixel NBLOCK, which in turn is 
composed of wavelet trees with interconnections spanning across 64 pixels. Figure 5-4 
shows the resultant map of a 3 scale NBLOCK and the relationship tree structure
derived from it.
Figure 5-4: 3 Scale NBLOCK & Wavelet Tree
The zerotree significance search pattern for both the EZW and the ZTE operates solely
within this structure. Pixels/coefficients one level lower in the image component
frequency plane than the current pixel are termed the parent pixels (e.g.
pixel 1 is the parent of pixels 5, 33 and 37). Pixels that share a common parent are
termed sibling pixels to each other (e.g. pixels 3, 35, 39 and 7 are sibling pixels). Finally,
pixels that are of a higher image frequency component than the current pixel are
termed descendant or children pixels (e.g. pixels 3, 35, 39 and 7 are children pixels
belonging to pixel 5).
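The parent/child examples quoted above are consistent with an interleaved NBLOCK layout in which pixels are numbered 1 to 64 in row-major order and a coefficient of scale s sits at an offset of 2^(s-1) within a stride of 2^s. The Python sketch below is an illustration built on that assumption: the layout rule is inferred from the pixel numbers given in the text and is not taken from Figure 5-4, and the function names are hypothetical.

```python
def trailing_zeros(x, cap):
    """Count the trailing zero bits of x; return cap when x is zero."""
    if x == 0:
        return cap
    count = 0
    while x % 2 == 0:
        x //= 2
        count += 1
    return count


def children(pixel, scales=3):
    """Return the 1-based pixel numbers of the immediate children of `pixel`.

    Assumes row-major numbering and an interleaved (in-place) coefficient
    layout inside a 2**scales square NBLOCK.
    """
    k = 2 ** scales
    if not 1 <= pixel <= k * k:
        raise ValueError("pixel number outside the NBLOCK")
    row, col = divmod(pixel - 1, k)

    def number(r, c):
        return r * k + c + 1

    if row == 0 and col == 0:
        # The single low-pass pixel parents the three coarsest detail pixels.
        half = k // 2
        return [number(0, half), number(half, 0), number(half, half)]

    level = min(trailing_zeros(row, scales), trailing_zeros(col, scales)) + 1
    if level == 1:
        return []  # highest frequency pixels have no descendants

    stride = 2 ** level
    half, quarter = stride // 2, stride // 4
    d_row = 1 if row % stride else 0  # vertical detail component present
    d_col = 1 if col % stride else 0  # horizontal detail component present
    base_row, base_col = row - d_row * half, col - d_col * half
    return sorted(
        number(base_row + a * half + d_row * quarter,
               base_col + b * half + d_col * quarter)
        for a in (0, 1) for b in (0, 1)
    )


if __name__ == "__main__":
    for p in (1, 5, 3, 2):
        print(p, children(p))
```

Under this assumed layout every pixel other than pixel 1 appears as the child of exactly one parent, so the 64 pixels form a single tree, matching the statement that an NBLOCK contains exactly one wavelet tree.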
Each NBLOCK always contains exactly one low-pass image component (pixel 1) and
several intermediate image components (e.g. pixels 3, 35, 39, 7 etc.), but is largely
composed of high-pass image components at the fringes of the tree (e.g. pixels 2, 18,
20, 4 etc.). The single low-pass pixel has no parent pixels, only descendant pixels;
therefore this pixel is always transmitted. The intermediate pixels have both
parent pixels and descendant pixels and can therefore take different transmission
statuses depending on both the parent and children pixel significance statuses. The
high-pass pixels have no descendant pixels and as such depend only on their current
status and the significance status of their immediate parents.
Generalising this for all pixels, it is clear from Chapter 3 that each parent pixel
requires significance information from four children and each child requires
significance information from its immediate parent. In hardware this can be achieved
via 10 signals between each pixel, its parent and its children. Unfortunately, this places
a heavy interconnection routing burden inside the NBLOCK. However, this can be
reduced to just five signals per pixel by simplifying the content of the required signals. A
parent pixel does not need information pertaining to which of its children pixels are
significant, only that one or more of them are. Therefore, a single significance signal
from the parent is sufficient for all of its children. Table 5-1 lists the five required
signals and their respective descriptions.
Table 5-1: Pixel Relation Signals
Signal | Dir | Description
Pin | IN | Significance indication from parent
Cin | IN | Significance indication from children
Sbin | IN | Significance indication from siblings
Pout | OUT | Significance indication to children
Sbout | OUT | Significance indication to sibling
The Sbin and Sbout signals exist as a result of the simplification process, whereby
significance data from one sibling is passed to the next in line and so on until the last
sibling in the chain passes the amalgamated significance signal to the group's parent
pixel. Figure 5-5 illustrates the five signal based interconnection pattern between the
children and the parent pixels. Since the highest frequency component pixels have no
descendants, the Cin signals for those pixels are connected to logic zero. Also the first
pixel (e.g. 2, 3, 34, 38, 6 etc.) in a set of sibling pixels receives a zero on the Sbin signal,
as Sbin is a one-way signal and the pixel has no related sibling to the left of it.
[Legend: data from children to parent via siblings (Cin to Sbin to Sbout); data from parent to children (Pout to Pin)]
Figure 5-5: Single Branch of Five Signal Relation Tree
Pixel 1 receives logic 1 on its Pin signal, which indicates that this pixel has to be
transmitted regardless of its state.
Figure 5-6: NBLOCK Signal Route Map 
Figure 5-6 illustrates the interconnection pattern between pixels belonging to one of
the three sub-trees found within an NBLOCK. If all three trees were mapped into this
NBLOCK, it is easily observed that the more congested routing patterns exist within
small areas (5 pixels) within the NBLOCK, which correspond to the connections
between high image frequency pixels.
5.2.3. Pixel-wise Tree Architecture 
The pixel-wise tree architecture is presented in Figure 5-7. The signal Sbout is
generated through simple combinational logic, while the Pout signal is based on the
pixel operational mode (encode/decode) and the particular zerotree coding algorithm
used (EZW/ZTE). The Sbout signal is set to logic 1 if Cin, Sbin or Pixel Significance
(Sig signal) is set to logic 1. The Pout signal, in encode mode, is generally dependent on
the Cin signal; however, depending on the algorithm used, Pout can also depend on the
status of the Pixel Significance (Sig). In decode mode the Pout signal is set to logic 1
only on the existence of special symbols that indicate child significance. The primary
reason for this stems from the condition that, in decode mode, low frequency pixels
dictate the significance of higher frequency pixels, resulting in a fully hierarchical
decode approach.
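The Sbout rule stated above (logic 1 if Cin, Sbin or Sig is 1) and the sibling chain of Section 5.2.2 can be sketched behaviourally as follows. This is a minimal Python illustration, not the schematic of Figure 5-7; the function names and the representation of a sibling group as a list of (Cin, Sig) pairs are assumptions made for the example. The Pout logic is omitted because it depends on the mode and algorithm.

```python
def sbout(cin, sbin, sig):
    """Combinational rule from the text: Sbout = Cin OR Sbin OR Sig."""
    return int(bool(cin or sbin or sig))


def parent_cin(siblings):
    """Amalgamate a sibling chain into the Cin seen by the group's parent.

    `siblings` is a sequence of (cin, sig) pairs in chain order. The first
    sibling has its Sbin tied to logic 0, as described in the text.
    """
    carry = 0
    for cin, sig in siblings:
        carry = sbout(cin, carry, sig)
    return carry


if __name__ == "__main__":
    quiet = [(0, 0)] * 4
    one_significant = [(0, 0), (0, 1), (0, 0), (0, 0)]
    print(parent_cin(quiet), parent_cin(one_significant))
```

Since the chain is a running OR, the parent learns only that at least one child (or a descendant of a child, through that child's own Cin) is significant, which is all the simplified five-signal scheme requires.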
[Block diagram labels: Sbin, Pixel Significance, Propagation Logic, Sbout, Significance Decode, Symbol Status (Sig), Encode / Decode Selection, Pout]
Figure 5-7: Pixel-wise Tree Architecture
The tree generation logic maintains a steady parent-child significance state that is based 
on the significance of the pixels, which presents all other relevant pixels with enough 
information to generate a zerotree based symbol when necessary. 
5.3. Embedded Zerotree Wavelet Codec 
The EZW coder, described in Chapter 3, is a two-phase coder that iteratively alternates
between the symbol generation phase and the coefficient refinement phase, gradually coding
the wavelet coefficients to full precision during transmission. The hardware coder in turn is
also designed to alternate between these two phases of generating symbols and successive
approximation based refinement bits, both for transmission. Symbol generation is
performed via a concurrent pixel-wise coefficient self-classification mechanism which
spans the entire image-size array of pixels organised into multiple NBLOCKs. The parallel
NBLOCK based significance propagation tree described in Section 5.2 provides all required
significance information (from parent to children) for a particular pixel in an NBLOCK.
The refinement bits, which result from successive approximation, are generated by
bit-planing all the significant coefficients by means of right shifting coefficient registers. This
process, coupled with a significance identification mechanism, a symbol/refinement data
extraction mechanism, two array interface buffers and a symbol based pixel activation
mechanism, constitutes the entire hardware architecture for the EZW codec. The VHDL
code for a single pixel with the EZW component is provided in Appendix A.
5.3.1. Coefficient Reorganisation 
Once executed, the wavelet architecture described in Chapter 4 results in the
generation of coefficients which are presented in an unfavourable orientation for use
with the EZW architecture. At worst, the situation is compounded by two problems
that require measures to compensate.
1. Coefficient is 2's Complement Negative - In this case the leading ones in the
coefficient interfere with the significance identification system. To overcome this
problem the negative coefficients are converted from 2's complement to sign and
magnitude representation after the wavelet transform, but before progressing to the
zerotree mode. Therefore, the shifting mode (M1 = 0 and M2 = 0) is used to
perform this task. Given the amalgamation of circuits in Figure 4-27, Figure 4-28
and Figure 4-30, this task can be performed by establishing the control signals in
Table 5-2 for a total of 9 clock cycles. When the most
significant bit R0_8 is logic 0 (i.e. a positive coefficient) the conversion circuitry is
disabled and a bypass is established. Also, in order to maintain the sign in bit R0_8
for the zerotree architecture, R0_8 is not clocked. At the end of the 9 clock cycles
any negative coefficient in RO will have been converted to its positive magnitude, with the
MSB holding the sign.
Table 5-2: Negative 2's Complement Convert Mode 
2. Right Shift Presents LSB First - Since RO is configured to perform only right
shifts and the coefficients are aligned with the LSB in bit position R0_0, it becomes
necessary to reverse the bit-positioning of the coefficient in register RO. This is
accomplished by performing the algorithm described in Section 4.7.2 within
register RO. Table 5-3 lists the control configuration required to perform the bit-
position reverse of RO within 8 clock cycles. This phase is carried through in the
pixel shift mode (M1 = 0 and M2 = 0) for all pixels in the array.
Table 5-3: RO Reverse Configuration
0 | 0 | X | 0 | X | X | 0 | 0 | X | 1 | 0 | X | Do Swap
0 | 0 | X | 0 | X | X | 0 | 0 | X | 0 | 0 | X | 3 Shift Half
0 | 0 | X | 0 | X | X | 0 | 0 | X | 1 | 0 | X | Do Swap
0 | 0 | X | 0 | X | X | 0 | 0 | X | 0 | 0 | X | 3 Shift Half
Once EZW decoding has been performed these two steps are repeated in reverse order 
to re-establish the original wavelet coefficients. 
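The net effect of the two reorganisation steps can be illustrated at word level. The Python sketch below assumes a 9-bit register word with the sign in bit 8 (R0_8) and an 8-bit magnitude in bits 0 to 7, and assumes that the bit reversal acts on the 8 magnitude bits only. The bit-serial variant uses the common "copy up to and including the first 1, then invert" negation scheme purely as an illustration; it is not claimed to be the circuit of Figure 4-27 or the swap/shift-half procedure of Section 4.7.2. All function names are hypothetical.

```python
SIGN_BIT = 1 << 8  # assumed position of the sign bit (R0_8)
MAG_MASK = 0xFF    # assumed magnitude field, bits 0..7


def to_sign_magnitude(word9):
    """Convert a 9-bit 2's complement word to sign bit + 8-bit magnitude."""
    if not 0 <= word9 < 512:
        raise ValueError("expected a 9-bit word")
    if word9 & SIGN_BIT:
        # Negate the low 8 bits and keep the sign bit unchanged.
        return SIGN_BIT | ((-word9) & MAG_MASK)
    return word9  # positive values bypass the conversion


def to_sign_magnitude_serial(word9):
    """Same conversion, processed one bit per step from the LSB upwards."""
    sign = (word9 >> 8) & 1
    seen_one = 0
    magnitude = 0
    for i in range(8):
        bit = (word9 >> i) & 1
        # Bits after the first 1 are inverted, but only for negative words.
        magnitude |= (bit ^ (seen_one & sign)) << i
        seen_one |= bit
    return (sign << 8) | magnitude


def reverse_magnitude(word9):
    """Reverse the order of the 8 magnitude bits, holding the sign bit."""
    sign = word9 & SIGN_BIT
    magnitude = word9 & MAG_MASK
    reversed_bits = 0
    for i in range(8):
        reversed_bits |= ((magnitude >> i) & 1) << (7 - i)
    return sign | reversed_bits


if __name__ == "__main__":
    minus_100 = 512 - 100  # 9-bit 2's complement encoding of -100
    converted = to_sign_magnitude(minus_100)
    print(bin(converted), bin(reverse_magnitude(converted)))
```

Note that the most negative 9-bit value (-256) has no 8-bit magnitude and maps to a "negative zero" in this sketch. Applying `reverse_magnitude` twice restores the original word, which mirrors the remark that the steps are repeated in reverse order after decoding.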
5.3.2. Significance Identification Architecture 
Once a pixel coefficient has been correctly oriented, the zerotree significance 
identification for the EZW encode phase can commence. As described in Chapter 3 
the EZW algorithm uses a threshold level to determine if a coefficient is significant. In
the hardware implementation this threshold value is fixed at $2^7 = 128$ and the
coefficient, in each iteration, is multiplied by two (left shifted) until it
reaches the threshold level or until 8 shifts are made (i.e. the bit precision). As a result
a coefficient is never permanently deemed insignificant, except in the case of a zero
coefficient. For example, if the value 100 (01100100b) is chosen as the coefficient, then
in the first iteration, since the MSB is zero, this value is deemed insignificant.
However, in the next iteration the value will become 200 (11001000b) and will be
deemed significant.
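The fixed-threshold scheme can be expressed as a short Python sketch that reports the iteration in which an 8-bit magnitude first reaches the threshold of 128. It is a behavioural illustration only; the function name and the convention that iterations are counted from 1 are assumptions.

```python
THRESHOLD = 1 << 7  # fixed threshold of 128, i.e. the MSB of an 8-bit value


def first_significant_iteration(magnitude):
    """Return the iteration (1..8) in which the doubled coefficient first
    reaches the threshold, or None for a zero coefficient."""
    if not 0 <= magnitude < 256:
        raise ValueError("expected an 8-bit magnitude")
    value = magnitude
    for iteration in range(1, 9):
        if value & THRESHOLD:
            return iteration
        value = (value << 1) & 0xFF  # multiply by two within 8 bits
    return None


if __name__ == "__main__":
    for m in (100, 200, 1, 0):
        print(m, first_significant_iteration(m))
```

For the example in the text, the value 100 is insignificant in iteration 1 and becomes significant in iteration 2, once it has been doubled to 200.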
The encode stage of the EZW significance identification architecture requires the use 
of a shifting register, such as RO, a memory cell to keep track of previous insignificant 
status and some control logic. Figure 5-8 is a block diagram illustrating the 
components associated with the EZW significance identification. 
[Block diagram labels: Right Shifting 8-bit Register RO; Internal / External Control Signals; Control Logic; Significance Detect (SigE); Significance Status (Sig); Decoded Symbol]
Figure 5-8: Significance Identification Architecture
The significance store, once reset in encode mode, is only set when the LSB (now 
carrying the MSB of the coefficient) of RO receives a logic 1 after shifting. The 
moment the store is set, the coefficient in RO is considered significant and this state is 
reflected through the Significance Detect signal SigE to all other components requiring 
this signal. In the following clock cycle this signal is switched to logic low and the Sig
(Significance Status) signal is latched to logic high. This indicates that the coefficient
has been found significant previously, in reference to the current iteration. This fulfils
the requirement dictated by the EZW algorithm that once a coefficient has been found
significant, it is considered insignificant in subsequent iterations. This state is altered
only when the reset signal is issued. In encode mode, the SigE signal is used to
determine the symbols, while the Sig signal is used to determine the refinement bits
required for EZW successive approximation. The SigE signal is also linked to the
significance tree generation architecture to propagate the significance of the originating
pixel to both its parent and children.
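The encode-mode behaviour of the significance store described above (SigE pulses for one cycle when the first logic 1 arrives, after which Sig stays latched until reset) can be modelled as a small state machine. The Python class below is a behavioural sketch under those stated rules, not a transcription of Figure 5-9; the class and method names are hypothetical.

```python
class SignificanceStore:
    """Behavioural model of the encode-mode significance store."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Clear both status signals, as done before zerotree mode begins."""
        self.sig_e = 0  # Significance Detect: high for one cycle only
        self.sig = 0    # Significance Status: latched until reset

    def clock(self, shifted_bit):
        """Advance one clock cycle with the bit presented by register RO."""
        if self.sig_e:
            # The cycle after detection: drop SigE and latch Sig.
            self.sig_e, self.sig = 0, 1
        elif not self.sig and shifted_bit:
            # First logic 1 seen: the coefficient has just become significant.
            self.sig_e = 1
        return self.sig_e, self.sig


if __name__ == "__main__":
    store = SignificanceStore()
    print([store.clock(bit) for bit in (0, 1, 1, 0)])
```

In this model a coefficient can raise SigE only once between resets, which corresponds to the requirement that a previously significant coefficient is treated as insignificant for symbol generation in later iterations.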
The EZW decode mode also relies on a similar architecture to identify significant 
pixels. This therefore, allows for the use of the same architecture, albeit, one that 
switches inputs depending on the operating mode. The decode mode also employs the 
two signals SigE and Sig for the same purpose as in the encode mode. However, the 
significance is determined on the reception of an appropriate significant symbol rather 
than the contents of register RO. From Chapter 3, it is known that the symbols POS
and NEG are the only two that can be considered significant symbols, as they initiate
the refinement passes. Once a pixel has received one of these two symbols, the pixel
can never receive either a POS or a NEG again for the duration of the decode cycle.
Given this encode and decode requirement, a possible schematic for the significance
identification architecture is presented in Figure 5-9. Table 5-4 briefly describes the
I/O signals that propagate to and originate from the significance identification
schematic.
Figure 5-9: Significance Identification Schematics
In order for the circuit to function as intended, it is required that register R3 be 
initialised to logic 0. This is accomplished by issuing one or more clock cycles when 
the circuit is not in zerotree mode (i.e. ZT = 0). This has been designed with the 
knowledge that the pixel will be performing at least one clock cycle in the wavelet 
mode (WT= 1) or the Motion Compensation mode (MC= 1) or the Shift mode (SH 
= 1) before the zerotree mode commences. In these cases the ZT signal is set low. 
Table 5-4: Significance Identification Circuit I/O
Signal | Dir | Type | Description
Clk | In | Control | Global Clock Signal
EnROa | In | Control | Enable R3 Clock On RO Shift
FZT | In | Control | Enable R3 Clock On Zerotree Formation
GEnable | In | Control | Global Enable
Strmout | In | Control | Encode / Decode Select (1 = Encode)
ZT | In | Control | Zerotree Mode Select
ROout | In | Data | Bits Originating From Register RO
EXTa | In | Data | Significant Symbol In
SigE | Out | Status | Initial Significance Detected
Once register R3 has been initialised to 0, zerotree encoding or decoding can
commence by setting signal ZT = 1 (this is controlled by the mode selection circuitry
described in Section 4.7.4). The Strmout signal, also used in the motion compensation
unit, is reused to dictate whether the circuit is in encode (Strmout = 1) or decode
(Strmout = 0) mode.
In encode mode, the LSB from register RO is supplied via the ROout input. The EnROa
signal is used to enable the clock to register R3 when register RO is cycling the coefficient.
Similarly, in decode mode, the signal FZT takes the role of the signal EnROa and input
R4out takes the role of input ROout.
5.3.3. Pixel Self-Classification and Enabling 
Given a pixel's significance status, its sign and the significance status of its parents and
children, a pixel is able to classify itself. The self-classification occurs for all
pixels in the entire array in parallel. A pixel can be classified into one of four states or
be completely disabled. Table 5-5 lists all these possible states. Since there are four
states, two bits (signals ZS1 and ZS2) are used to represent the symbols.
Table 5-5: EZW Self-Classification Symbols
Pin | Sign | Cin | SigE | Description | Symbol | ZS1 ZS2
1 | 0 | 0 | 0 | Zerotree Root | ZTR | 00
1 | 0 | 0 | 1 | Positive Significance | POS | 10
1 | 0 | 1 | 0 | Isolated Zero | IZO | 01
1 | 0 | 1 | 1 | Positive Significance | POS | 10
1 | 1 | 0 | 0 | Zerotree Root | ZTR | 00
1 | 1 | 0 | 1 | Negative Significance | NEG | 11
1 | 1 | 1 | 0 | Isolated Zero | IZO | 01
1 | 1 | 1 | 1 | Negative Significance | NEG | 11
0 | X | X | X | Pixel Disabled | ZTR | 00
A ZTR symbol is generated by a pixel which has a significant parent or sibling, is not
currently significant and contains no descendant pixels that are significant. This
symbol results in the deactivation of the Pout signal (the signal propagating to all
descendants to confirm the significance of the parent) to logic 0.
A POS symbol is generated by a pixel which contains a significant positive coefficient
during the current iteration. The significance statuses of a pixel's descendants have no
bearing on the generation of this symbol. However, this symbol implies that a pixel's 
immediate descendants should generate their own symbols; therefore, it results in the 
activation of the Pout signal to logic 1. 
A NEG symbol is generated by a pixel which contains a significant negative coefficient
during the current iteration. The significance statuses of a pixel's descendants have no
bearing on the generation of this symbol. However, this symbol implies that a pixel's 
immediate descendants should generate their own symbols; therefore, it results in the 
activation of the Pout signal to logic 1. 
An IZO symbol is generated by a pixel which is insignificant in the current iteration 
but contains at least one significant descendant. This symbol results in the activation 
of the Pout signal to logic 1. 
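The classification rules of Table 5-5 and the accompanying Pout behaviour can be summarised in a few lines of Python. This is a behavioural sketch of the truth table, not the gate-level schematic of Figure 5-10; the function names are hypothetical, and the Pout rule shown covers only the encode-mode behaviour described in the four paragraphs above.

```python
def classify(pin, sign, cin, sig_e):
    """Return (symbol, ZS1, ZS2) for one pixel according to Table 5-5."""
    if not pin:
        return ("ZTR", 0, 0)  # pixel disabled: symbol bits read as 00
    if sig_e:
        # A newly significant coefficient; the sign selects NEG or POS.
        return ("NEG", 1, 1) if sign else ("POS", 1, 0)
    if cin:
        return ("IZO", 0, 1)  # insignificant, but a descendant is significant
    return ("ZTR", 0, 0)      # insignificant with no significant descendants


def pout_for(symbol):
    """Encode-mode Pout: only a zerotree root silences its descendants."""
    return 0 if symbol == "ZTR" else 1


if __name__ == "__main__":
    for inputs in ((1, 0, 0, 1), (1, 1, 1, 1), (1, 0, 1, 0), (1, 1, 0, 0)):
        symbol, zs1, zs2 = classify(*inputs)
        print(inputs, symbol, zs1, zs2, pout_for(symbol))
```

Reading the table this way shows that ZS1 simply follows SigE, while ZS2 carries the sign for significant pixels and Cin for insignificant ones.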
Given these conditions, Figure 5-10 presents a possible schematic for the generation of
the two symbol bits. The additional signal, Strmout, is employed to clear ZS1 and ZS2
to logic 0, which is a requirement during decode mode (i.e. Strmout = 0), as no symbols
are generated in decode mode, only received.
Figure 5-10: EZW Classification Schematics 
Although the schematic is capable of generating all four classification symbols, it is not
capable of providing a no-transmission state. When the status signal Pin is low, this
indicates that the parent is a zerotree and that all descendants belonging to that parent
should be excluded from being able to transmit a symbol. This is overcome by having
a global pixel enable/disable mechanism (a new GEnable signal) that has ties with the
scale control architecture described in Section 4.6.2. In accordance with the scale
control architecture, a pixel is enabled for processing when both the VEnable and
HEnable signals arriving at a pixel are set high. This scheme is now modified so that a
pixel is enabled only when all three signals, VEnable, HEnable and a new signal named
ZTEnable, are set high. The ZTEnable signal only influences the pixel enablement
during the zerotree phase. The resultant GEnable signal, which is derived from these
three signals, propagates to all components requiring this global enable signal.
Figure 5-11: EZW Pixel Enable Schematics
Figure 5-11 shows the pixel enable schematics employed within each pixel. It is clear
to see that when the pixel is not in zerotree mode (ZT = 0), ZTEnable = 1, which
excludes the zerotree circuitry from interfering with other modes. In zerotree mode,
the ZTEnable signal is directly controlled by a further two signals, the EnROa signal and
the output from the multiplexer. Figure 4-28 shows that the generated GEnable signal
influences the clocking mechanism of register RO, which implies that when a pixel is
disabled, register RO will not be clocked. In the significance identification phase, if a
pixel is currently found insignificant, then the pixel will be disabled. However, for the
next iteration, register RO has to be enabled for right-shifting; therefore, the signal
EnROa is arranged to temporarily allow pixel enablement in the zerotree mode.
The Pin and Sig signals also influence the status of the ZTEnable signal via the
multiplexer. The S signal, originally used for summation select in the adder (Section
4.7.1), is reused to select between the symbol generation mode and the successive
approximation refinement bit generation mode, as the adder is not used concurrently
with the zerotree mode. In both encode and decode modes, if a parent pixel is
significant, then the EZW algorithm dictates that that parent's immediate descendants
are required to generate or receive a symbol. Therefore, the Pin signal is employed to
control pixel activation when in symbol generation/receive mode. A pixel generates or
receives a refinement bit only when it has been identified as significant previously and
therefore, the Sig signal is used to control the pixel enablement in refinement mode.
The enablement of a pixel in any zerotree mode implies that it contains or awaits a
valid symbol or refinement bit and is ready for transmission or reception.
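The enable logic described above can be sketched as two small Python functions. Several details are assumptions rather than facts taken from Figure 5-11: that EnROa and the multiplexer output are combined with a logical OR, and that S = 0 selects the symbol mode (Pin) while S = 1 selects the refinement mode (Sig), the latter polarity being inferred from the S settings quoted in Section 5.3.4. The function names are hypothetical.

```python
def zt_enable(zt, en_r0a, s, pin, sig):
    """Zerotree contribution to the pixel enable (assumed OR combination)."""
    if not zt:
        return 1  # outside zerotree mode the zerotree circuitry never blocks
    mux_out = sig if s else pin  # assumed: S = 1 refinement, S = 0 symbol
    return int(bool(en_r0a or mux_out))


def g_enable(v_enable, h_enable, zt, en_r0a, s, pin, sig):
    """Global pixel enable: all three enables must be high."""
    return int(bool(v_enable and h_enable
                    and zt_enable(zt, en_r0a, s, pin, sig)))


if __name__ == "__main__":
    # Symbol mode: a pixel whose parent is a zerotree (Pin = 0) is disabled.
    print(g_enable(1, 1, zt=1, en_r0a=0, s=0, pin=0, sig=0))
    # Refinement mode: only previously significant pixels take part.
    print(g_enable(1, 1, zt=1, en_r0a=0, s=1, pin=0, sig=1))
```

The EnROa override in this sketch reflects the need to keep register RO shifting in the next iteration even when the pixel would otherwise be disabled.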
5.3.4. Pixel Latching & Bypass Architecture
Two registers (R4 and R5) are allocated to each pixel for the purpose of symbol
capture from the symbol generation circuit or from buffers external to the array. In
addition to this, bypass and control mechanisms are also built in to aid in the zerotree
extraction and load of refinement bits and coefficient signs. The schematic for the
proposed architecture is presented in Figure 5-12 and descriptions of the associated
I/O ports are listed in Table 5-6. For any data to be loaded into these registers, the
signals GEnable and EnROa are required to contain the values of logic 1 and 0
respectively. Loading is performed by clocking the circuit in this state.
Figure 5-12: Symbol Latch and Bypass Schematic
Table 5-6: I/O Ports for Symbol Latch
Signal | Dir | Type | Orig/Dest | Description
CLK | In | Clock | External | Global Clock Signal
ClkMSB | In | Control | External | Force Enable RO MSB Clocking
FZT | In | Control | External | Form Zerotrees
LR1245 | In | Control | External | Select Load for R4 & R5
S | In | Control | External | Symbol / Refinement Select
Strmout | In | Control | External | Encode / Decode Select
ClkMSBa | Out | Control | Internal | Enable RO MSB Clocking
EnROa | In | Control | Internal | Register RO Clock Enable / ZT Bypass
GEnable | In | Control | Internal | Pixel-wise Enable
ZT | In | Control | Internal | Zerotree Mode Select
EXT | In | Data | External | Data From Surrounding Pixels
EXTa | Out | Data | Internal | Data For Decode Significance Detect
Pixout | Out | Data | External | Data To Surrounding Pixels
ROout | In | Data | Internal | Data From LSB of Register RO
SigE | In | Data/Cont | Internal | Significance Detect Signal
ZROl | Out | Control | Internal | Place Refinement Into RO
ZS1 | In | Data | Internal | Symbol bit 1
ZS2 | In | Data | Internal | Symbol bit 2
ZS3 | Out | Data/Cont | Internal | Parent Significance During Decode
Registers R4 and R5 receive data from one of three sources.
1. Signals ZS/ and ZSZ - These signals originate ftom the symbol classification 
drcuitty (Section 5.3.3) and, in encode mode, contain a symbol pertaining to the 
classification of that pixel. In decode mode, these signals are fixed at zero (i.e. 
ZTR symbol) which ii; used to clear the registers R4 and RS. Loading of signals 
ZS1 and ZS2 can be selected by setting the LR1245 signal to O and signal S ro 0. 
2. Ex1emal Pixel Data Load (EXT) - Data from the surrounding four pixels can 
be ]Ollded by setting the signal LR1245 to 1. This mode ii; used for symbol and 
refinement bit extraction/load from/to array. 
3. Load From LSD Qf Register RO - In order to extract a refinement bit from the 
array in encode mode, it is necessary to load R4 with data from the MSB of register 
RO. To accomplish this, the four signals ZT, FZT, SltmoNI, LR1245 and S require 
to be set to logic 1, 0, 1, 0 and 1 respectively. The FZT signal is used to 
engage/disengage this mode. 
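The source selection described in the three items above can be summarised with a small behavioural sketch. This is an illustration only and not the circuit of Figure 5-12: the function name r4_r5_load_source and the returned labels are hypothetical, and any control combination that the text does not describe is reported as None rather than guessed.

```python
from typing import Optional


def r4_r5_load_source(zt: int, fzt: int, strmout: int,
                      lr1245: int, s: int) -> Optional[str]:
    """Return which described source feeds registers R4/R5 (hypothetical model)."""
    if lr1245 == 1:
        # Source 2: data from the surrounding pixels.
        return "EXT"
    if s == 0:
        # Source 1: symbol bits ZS1/ZS2 (fixed at zero in decode mode).
        return "ZS1/ZS2"
    if (zt, fzt, strmout) == (1, 0, 1):
        # Source 3: refinement bit taken from register R0 in encode mode.
        return "R0"
    # Combination not described in the text.
    return None
```

The ordering of the checks is an assumption made for the sketch; the text gives only the three settings themselves.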
In addition to this function the architecture has three bypass modes.
1. Pixel Bypass Mode - When a pixel is disabled, with GEnable set low, the pixel
acts as a link joining adjacent pixels on both sides together (i.e. EXT is connected
to Pixout). In this mode no data reaches the pixel or is generated by the pixel. The
two series inverters allow for signal boost on long distance bypasses if conventional
transmission gate based multiplexers are used.
2. Register R4 & R5 Bypass - For wavelet functionality, when register R0 is
activated (signal EnR0a is set high), registers R4 and R5 are disconnected and
the LSB of register R0 is connected to Pixout instead. While EnR0a is high, clocking of
registers R4 and R5 is also blocked.
3. Bypass Register R5 - When extracting and loading refinement bits, which are one
bit in length, it becomes necessary to utilise one-bit registers. For this reason
register R5 can be bypassed by setting the S signal to 1. In addition, if the circuit is
in decode mode (Strmout = 0), then the ZR01 signal is employed to feed
refinement bits to register R0.
Finally, the proposed architecture performs two other functions.
1. Enable MSB Clock of R0 When Significant - In addition to being able to force
the signal ClkMSBa high via ClkMSB, in decode mode (Strmout = 0), the
architecture allows for the enablement of Register R0's MSB clock when the signal
SigE becomes high. This operation is used to load the sign into R0_B when a
significant symbol is decoded.
2. Parent Decode Symbol - When in decode mode, a parent pixel, via the Pout
signal, informs its descendants to prepare for symbols when it receives a significant
symbol (POS, NEG or IZO). The Pout signal is derived from the ZS3 signal. The
binary representations of POS, NEG, IZO and ZTR are 10, 11, 01 and 00
respectively. The ZS3 signal should only be low when the symbol is a ZTR (00b),
which is the same as saying that the ZS3 signal should be low only when the inverses of
both registers R4 and R5 result in 11b, hence the usage of a NAND gate.
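As a check of the NAND reasoning, the following minimal sketch evaluates ZS3 for all four symbols using the binary codes quoted in the text. The helper names nand and zs3 are introduced here for illustration, and the assignment of the two symbol bits to R4 and R5 is an assumption (the result does not depend on it, because the function is symmetric).

```python
# Symbol codes as quoted in the text: POS=10, NEG=11, IZO=01, ZTR=00.
SYMBOLS = {"POS": 0b10, "NEG": 0b11, "IZO": 0b01, "ZTR": 0b00}


def nand(a: int, b: int) -> int:
    """Two-input NAND on single bits."""
    return 1 - (a & b)


def zs3(r4: int, r5: int) -> int:
    """ZS3 is low only when the inverses of R4 and R5 are both 1 (symbol 00b)."""
    return nand(1 - r4, 1 - r5)


def zs3_for_symbol(name: str) -> int:
    code = SYMBOLS[name]
    r4 = (code >> 1) & 1   # assumed bit placement, not stated in the text
    r5 = code & 1
    return zs3(r4, r5)
```

A NAND of the inverted register outputs is logically the same as an OR of the register outputs, which is why ZS3 is high for every symbol except ZTR.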
5.3.5. Significance Tree Signal Generation 
The per-pixel architecture required to populate the significance tree is chiefly governed
by the zerotree algorithm. There are two outputs, namely the Sbout and Pout signals, which
are associated with the generation of the EZW significance tree. The Sbout signal, or
sibling out signal, is derived from the three signals Cin, Sbin and SigE and is switched
high if any of these signals are high. Both Cin and SigE, when high, generate significant
symbols (POS, NEG or IZO) and as such require transmitting. This is conveyed via
Sbout to the parent. The Sbin signal simply acts as a linking signal that provides the
collective significance of all siblings prior to the current one.
The Pout signal, however, is dependent on which operational mode the pixel is in. In
encode mode the Pout signal is driven high when either of the two signals Cin and SigE
is high. The Cin signal presents significance information
about the children to the parent. If a child is significant, then the parent is required to
enable all children pixels and prepare them for symbol transmission. Similarly, since
the symbols POS and NEG, derived from the high status of the SigE signal, do not
convey the significance information of the descendants, the parent is required to force
the children to generate a symbol. In decode mode, the Pout signal is set high when the
parent's received symbol corresponds to one of POS, NEG or IZO. Therefore, when
the signal Strmout is set low (for decode) the Pout signal assumes the content of ZS3,
which is derived from the symbols according to Section 5.3.4.
The proposed architecture to perform this function is presented in Figure 5-13.
Figure 5-13: Per-pixel Significance Tree Generation 
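The behaviour of the two outputs can be expressed as a short sketch, assuming single-bit inputs named after the signals in the text. The function names are hypothetical and the code models only the logic described in this section, not the gate-level circuit of Figure 5-13.

```python
def sbout(cin: int, sbin: int, sige: int) -> int:
    """Sibling-out: high if the pixel, its descendants or any earlier sibling is significant."""
    return cin | sbin | sige


def pout(strmout: int, cin: int, sige: int, zs3: int) -> int:
    """Parent-out: OR of Cin and SigE in encode (Strmout = 1), ZS3 in decode (Strmout = 0)."""
    if strmout == 1:
        return cin | sige
    return zs3
```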
5.3.6. Data Extraction/Load Architecture 
Two phases of operation that require some architectural consideration are the 
extraction and load of compressed zerotree data from/ to the array, and ultimately each 
pixel. Data extraction is performed when the pixel contains symbols or refinement bits 
to be transmitted, while the load is performed to populate the array with the same data 
prior to decoding. 
To be able to correctly decode encoded streams, both encode and decode processes are 
conducted in a subband based hierarchical manner, which also provides a mechanism 
for scalable video resolution selection. A 3-scale image decomposition system consists 
of 10 subbands to be transmitted as presented in Figure 5-14. The 
transmission/ reception subband order follows that of LL, LH1, HL1, HH1, LH2, HL2, 
HH2, LH3, HL3 and HH3. Each of these subbands is selected by applying the
appropriate control signals to the VEnable and HEnable signal lines as described in
Section 4.6.2. Once a particular subband has been enabled, the pixel data for that
subband is raster scanned and extracted at the bottom of the array via a series of
downward shifts. During decode, the raster scanning mechanism is activated at the top
edge of the array, yet the data is still shifted in a downward direction into the array.
Scale 0: LL
Scale 1: LH1, HL1, HH1
Scale 2: LH2, HL2, HH2
Scale 3: LH3, HL3, HH3
Figure 5-14: Subband Encode Decode Order 
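The transmission order can be generated programmatically. The sketch below is illustrative only; the function name subband_order and the scales parameter are assumptions introduced here, and the result for three scales reproduces the order listed in the text.

```python
def subband_order(scales: int = 3) -> list:
    """Return the subband transmission/reception order, coarsest band first."""
    order = ["LL"]
    for k in range(1, scales + 1):
        # Within each scale the order is LH, HL, HH, as listed in the text.
        order += [f"LH{k}", f"HL{k}", f"HH{k}"]
    return order
```

For a decomposition with N scales the list holds 3N + 1 entries, which gives the 10 subbands of the 3-scale system.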
In encode mode, pixels resulting from a disabling scale control sequence or coefficient 
insignificance switch into bypass mode, therefore the array is effectively reduced to 
only pixels that require transmission. Extraction of data from the array in this state is 
the most efficient, in terms of clock cycles. However, it introduces a significant 
problem in pixel position synchronisation as the number of enabled pixels present per 
column is unknown to the control mechanism. In decode mode, a count of active 
pixels must be performed to enable correct decoding. One technique that may be used 
is to introduce a series of counters at the top and bottom of the array. However, these 
present a significant increase in complexity in the control architecture and the array-edge
components. The proposed alternative is to provide a distributed counting
mechanism within each pixel by allowing each pixel to add a link to a chain of bits that 
can be shifted in parallel to the data. To realise this architecture each pixel is outfitted 
with an additional register and some control logic. The control logic allows for the 
register orientation to be switched between its function as a component of a large 
column-wise shift-register or as a pixel-wise single cell storage element. 
Using this architecture, all active pixels place logic 1 in the storage cell, while 
deactivated pixels simply bypass the cell completely. The net effect that occurs is the 
generation of a number of column-wise shift-registers, which are composed of the 
storage cells of active pixels only, and all containing logic 1. Figure 5-15 illustrates an 
example of this counting method for an 8 by 8 NBLOCK with the HH3 band selected 
for transmission. 
Figure 5-15: Distributed Pixel Counting
[Figure annotations: the counter shift register in enabled pixels places a 1 in the chain; the counter shift register in disabled pixels is in counter bypass mode; zeros are inserted at the array edge; data is extracted for all edge components that contain a 1; pixels belonging to the same subband yet containing insignificant coefficients are deactivated.]
Although the entire HH3 subband is selected, four pixels within this subband are 
deactivated, as would happen if the pixels were not carrying any significant coefficients. 
At the top edge of the array, logic 0s are fed into the shift register; these are the only 0s to
traverse the system. When these 0s propagate to the bottom edge of the array via the
large shift register, this indicates that the column has exhausted all of its significant
coefficients and, as such, expectation of any data from that column should be halted.
In this light, out of the four active columns in Figure 5-15, the two left-hand columns 
will contain data for four clock iterations, while the two right-hand columns will 
contain data for only two clock iterations. This ensures the extraction of exactly 12 
sets of data. Using this self-counting system any non-symmetrically positioned data 
array is efficiently extracted. 
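The self-counting behaviour of one column can be illustrated with a small software model. This is a behavioural sketch under simplifying assumptions: one data item per active pixel, a Python list standing in for the counter chain, and a hypothetical function name extract_column. It is not a description of the register-level design.

```python
def extract_column(active, payload):
    """Model one column during encode.

    active  -- 0/1 flags, top to bottom; disabled pixels are bypassed
    payload -- one data item per pixel, top to bottom
    Returns the items seen at the bottom edge, in arrival order.
    """
    # Only active pixels contribute a link (logic 1) to the counter chain.
    chain = [1 for a in active if a]
    data = [p for a, p in zip(active, payload) if a]
    out = []
    # The bottom edge accepts data for as long as it reads a 1 from the chain.
    while chain and chain[-1] == 1:
        out.append(data.pop())   # bottom-most active pixel leaves the array
        chain.pop()              # counter chain shifts down by one
        chain.insert(0, 0)       # a 0 is fed in at the top edge
    return out
```

With four columns holding 4, 4, 2 and 2 active pixels, as in the example of Figure 5-15, the model yields 4 + 4 + 2 + 2 = 12 extracted items, and each column stops as soon as the first 0 reaches its bottom edge.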
The same system is employed in the decoding process, with the exception of the shift 
register, which now propagates upwards instead of downwards. Also the end of 
column zeros are inserted at the bottom edge instead of at the top. When a zero 
propagates to the top, via the shift register, it indicates that the column has no more 
free pixels requiring coefficients, and the entire column is disabled via that column's 
VEnable signal. In decode mode, since the parent pixels, residing in the previous scale, 
dictate which children are to be significant, each pixel uses this knowledge to generate 
logic 1 in the shift register, indicating that it is expecting data. 
To alleviate the burden of routing an additional set of wires in each column, the 
proposed scheme can be integrated into the low / high pass pixel select system 
described in Section 4.6.3. Incorporating this, the per-pixel architecture for the 
proposed counting system is presented in Figure 5-16. 
Figure 5-16: Per-pixel Counting Architecture 
Initially, when the circuit is not in zerotree mode (ZT = 0), it is disabled, and acts the
part of two XOR gates, which invert the Above and HPRin signals when GEnable (i.e. the
pixel is active) is held high. This complies with the wavelet transform, which requires a
low/high pass pixel selection mechanism as described in Section 4.6.3. When in
zerotree mode the functions of the Above and Below signals change to become pixel
interconnections. The functions of HPRin and HPROut do not change. To support
both encode and decode, the signals Above and Below are required to be bidirectional.
In encode mode the shift-register is configured to accept data from Above and release
data to Below. In decode mode the opposite is true.
Investigating the encode phase, when loading the symbols or refinement bits into
register R4 or R5, the GEnable signal (determined by the significance tree) is also loaded
into register R6 by setting the signals LR1245 and FZT to 0 and 1 respectively. Once
this value is in register R6, setting signals LR1245, LR and RC (from Section 4.5) to 1,
0 and 1 respectively will result in the interconnection of two array-wise shift-registers, one
for the symbol/refinement bits and the other for the counting circuitry. When extracting
symbols, the counter is shifted once for every two-bit symbol extracted, while when
extracting refinement bits, the counter is shifted once for each bit extracted.
The operational principle for the decode phase is identical, except for an exchange
between the functions of Above and Below.
5.3.7. Array Edge Buffer Design 
In order to implement the proposed encode/decode phases, the per-pixel extract/load
circuitry needs to be supplemented with an array-edge buffer at both the top and
bottom of the array. The design of these buffers is considered beyond the scope of
this thesis as several other timing issues have to be investigated; hence it has been
allocated as a separate research topic and conducted as another student project.
However, the requirement specifications for the encode and decode buffers which
relate to the zerotree codec are specified herein.
The proposed placement of the encode buffer is at the bottom of the array and it
requires cells equalling the number of pixels in the bottom row of the array. Each cell
is connected individually to a corresponding pixel by two connections: the first, a
unidirectional connection linking the pixel data output to the buffer cell data input, and
the second, a bidirectional link connecting the Bottom signal of a last row pixel and its
corresponding signal (named here as Statin) in the buffer cell. In symbol extraction
mode (Encode), all buffer cells should be ready to accept a two-bit symbol from the
pixel above it and a single bit from the Statin signal, when the array has finished
generating symbols. Once this data is received it can be serially removed from the
buffer, at a higher clock speed, and transmitted. Only symbols that are accompanied
by logic 1 in the Statin link when the first symbol bit arrives in the buffer are removed.
All cells containing a zero are placed in bypass mode. The proposed encode buffer
(Bottom) architecture is presented in Figure 5-17 and an I/O port description is
provided in Table 5-7.
Figure 5-17: Edge Encode Buffer Architecture 
The registers BR1 and BR2 are alternatively used to shift symbol or refinement bits
depending on the status of SR_SEL (SR_SEL = 1 sets symbol mode). These registers
should be able to be clocked at the array frequency (100 kHz) or a much higher buffer
extract frequency (~1-5 MHz) depending on whether data is being shifted from the array
or from the adjacent cell respectively. The register BR3 is used to capture the Statin
signal and provide a bypass mechanism for the cell extract counter and data stream if
Statin is logic 0, when extracting data from the array. This register is clocked at the
array's operating frequency.
Table 5-7: Edge Encode Buffer I/O Ports
Signal    Type     Description
A_CLK     Clock    Array Clock
BR4Set    Control  Set Register BR4 To Logic 1
BR_CLK    Clock    High Speed Bit Removal Clock
PC_SEL    Control  Cell/Pixel Data Input Select
RC_Cnt    Control  Remove Cell Counter In
SC_CLK    Clock    Symbol/Cell Switchable Clock
SR_SEL    Control  Symbol/Refinement Mode Select
Statin    Control  Array Column Count In
CCntOut   Control  Cell Counter Output
LCout     Data     Left Cell Data Output
TPixout   Data     Data From Pixel Above
CellOut   Data     Data To Next Cell On Right
The BR4 register forms part of an active cell counter. This register is set to logic 1
before high speed cell data extraction is performed. If the cell contains valid data (BR3
= 1) the logic 1 in the BR4 register is added to the list of active cells; otherwise it is bypassed
and remains invisible. Logic 0 is fed in via the RC_Cnt signal at the leftmost cell
before shifting begins. When this 0 reaches the rightmost cell, cell data extraction is
complete. This is very similar to the active pixel extraction mechanism described in
Section 5.3.6. The proposed edge decode buffer architecture is presented in Figure
5-18 and the corresponding I/O port descriptions are presented in Table 5-8.
Figure 5-18: Edge Decode Buffer Architecture 
Table 5-8: Edge Decode Buffer I/O Ports
Signal     Type     Description
A_CLK      Clock    Array Clock
BR_CLK     Clock    High Speed Bit Removal Clock
RC_Cnt     Control  Remove Cell Counter In
SC_CLK     Clock    Symbol/Cell Switchable Clock
SCVEnable  Control  VEnable Signal from Scale Control
SR_SEL     Control  Symbol/Refinement Mode Select
Statin     Control  Array Column Count In
VE_SEL     Control  Vertical Enable Select
VEnable    Data     Column Enable
CCntOut    Control  Cell Counter Output
LCout      Data     Left Cell Data Output
BPixin     Data     Data To Pixel Below
CellOut    Data     Data To Next Cell On Right
The edge decode buffer is similar in nature to the encode buffer. However, the Statin
signal belonging to a buffer cell is connected to the Above signal on a pixel at the top of 
the array and the BPixin signal connects to the From_Top input of that same pixel. The 
cell array is fed data from the left side of the buffer; however, the counter bit-shift 
propagation occurs from right to left. This is to provide the maintenance of data in 
correct order and supply a count to the receive decoder. Figure 5-19 illustrates the 
interfacing between the array and the two buffers. 
The registers BR1 and BR2 are used to shift symbol/refinement bits along the buffer
and into the array if it is clocked. Registers BR3 and BR4 play an identical role to that in
the encode buffer, except that the counting bit shifts are carried out from right to left along
the cell array. The decode buffer also needs the ability to disable a column's pixels as they
are filled with data. This is achieved by controlling the VEnable signal once the array
counter bits have been established (i.e. by setting VE_SEL to 1). When BR3 receives a zero
this immediately disables pixels in that column. They can be reactivated for scale based
processing by setting VE_SEL to logic 0; when this occurs, VEnable resembles the
original scale control setting. As with the encoding buffer, two clock frequencies are
employed: the slower to load data into the pixel array and the high speed clock to load
data into the buffer.
[Figure labels: Data In; Decode Buffer; Pixel Array; Encode Buffer; Data Out; Array Column Data; Array Column Counter Bits (Statin); Flow of Data & Buffer Counter Bits]
Figure 5-19: Buffer-Array Interfacing
5.3.8. The Encode Control Sequence 
Control settings for the EZW full array encode cycle are presented in Table 5-9. The
scale control steps (subband selection) are derived according to Section 4.6.2.
5.3.9. The Decode Control Sequence 
The complete array-wise decode cycle is performed by following the control settings in 
Table 5-10. The scale control steps (subband selection) are derived according to
Section 4.6.2.
Table 5-9: Encode Cycle
[The 13 control values are listed in the table's original column order; the rotated column headings are not legible in this copy.]
Phase  Description                      Control settings           Clock Cyc
1      Set Carry
2      2's Comp Convert
3      Reverse Phase 1                  0 0 0 0 0 0 0 1 0 0 1 0 0  1
4      Reverse Shift                    0 0 0 0 0 0 0 1 0 0 0 0 0  3
5      Reverse Phase 2                  0 0 0 0 0 0 0 1 0 0 1 0 0  1
6      Reverse Shift                    0 0 0 0 0 0 0 1 0 0 0 0 0  3
7      Latch Symbols                    1 1 0 0 0 0 0 0 1 0 0 0 0  1
8      Load R6                          1 1 0 0 0 0 0 0 1 0 0 1 0  1
9      Select Lowest Frequency Subband
10     Shift Symbol LSB                 1 1 0 0 0 1 0 0 1 1 0 0 0  1
11     Shift LSB & Counter              1 1 0 0 0 1 0 0 1 1 0 1 0  1
12     Repeat Phases 10 & 11 until last pixel in all columns have a 0 for Below
13     Repeat Phases 10, 11 & 12 for each subband
14     Select Entire Array
15     Set R6 & Ref bit                 1 1 0 0 0 1 1 0 1 0 0 1 0  1
16     Select Lowest Frequency Subband
17     Shift LSB & Counter              1 1 0 0 0 1 1 0 1 1 0 1 0  1
18     Repeat Phase 15 until last pixel in all columns have a 0 for Below
19     Repeat Phases 16 & 17 for each subband
20     Select Entire Array
21     Shift R0 for Next Bit            1 1 0 0 0 1 0 1 1 0 0 0 0  1
22     Repeat 7 to 21 until mode change or R0 has done 8 shifts
Table 5-10: Decode Cycle
[The 13 control values are listed in the table's original column order; the rotated column headings are not legible in this copy.]
Phase  Description                      Control settings           Clock Cyc
1      Clear R3                         0 0 0 0 0 0 0 0 0 0 0 0 0  1
2      Clear R4 & R5                    1 1 0 0 0 1 0 0 0 0 0 0 0  1
3      Enable Lowest Frequency Subband
4      Load R6                          1 1 0 0 0 1 0 0 0 0 0 1 0  1
5      Load Symbol LSB                  1 1 0 0 0 1 0 0 0 1 0 0 0  1
6      Load Symb MSB & R6               1 1 0 0 0 1 0 0 0 1 0 1 0  1
7      Repeat Phases 5 & 6 until 0 is received at the top pixel via Above
8      Repeat Phase 4 to 7 for each subband
9      Enable All Pixels
10     Load R0 Sign                     1 1 0 0 0 1 0 1 0 0 0 0 1  1
11     Enable Lowest Frequency Subband
12     Load R6 for Ref Bit              1 1 0 0 0 1 1 0 0 0 0 1 0  1
13     Load Ref Bit                     1 1 0 0 0 1 1 0 0 1 0 1 0  1
14     Repeat Phase 13 until 0 is received at the top pixel via Above
15     Repeat Phase 13 to 14 for each subband
16     Enable All Pixels
17     Load R0 Ref Bit                  1 1 0 0 0 1 1 1 0 0 0 0 0  1
18     Repeat 3 to 17 until mode change or R0 has done 8 shifts
19     Reverse Phase 1                  0 0 0 0 0 0 0 1 0 0 1 0 0  1
20     Reverse Shift                    0 0 0 0 0 0 0 1 0 0 0 0 0  3
21     Reverse Phase 2                  0 0 0 0 0 0 0 1 0 0 1 0 0  1
22     Reverse Shift                    0 0 0 0 0 0 0 1 0 0 0 0 0  3
23     Set Carry                        0 0 1 0 0 0 0 0 0 0 0 0 0  1
24     2's Comp Convert                 0 0 1 0 0 0 0 1 0 0 0 0 0  8
5.4. Zerotree Entropy Coder 
The ZTE coder, unlike the EZW, is a two-pass coder that performs a single symbols pass and a single
coefficients pass before transmitting the contents of the array. The hardware coder,
therefore, performs only two passes per encode of the entire array. As with the EZW
coder, symbols for all pixels in the array are generated via a concurrent pixel-wise
coefficient self-classification architecture. Highly parallel significance tree propagation
techniques are employed to gather significance data from all pixels within a single
NBLOCK (as described in Section 5.2) to enable the generation of symbols. The symbols
are supplemented with a set of coefficients which, unlike the EZW, do not require
special preparation. The coefficients, however, do require a quantisation phase, because
the ZTE does not support successive approximation. This process, coupled with a
significance identification mechanism, a symbol/coefficient data extraction mechanism,
two array interface buffers and a symbol based pixel activation mechanism, constitutes the
entire hardware architecture for the ZTE codec. The VHDL code for a single pixel with
the ZTE component is provided in Appendix B.
5.4.1. Coefficient Quantisation 
As mentioned previously (Section 3.3.1), the ZTE algorithm requires the use of a
subband quantisation component. The ZTE algorithm relies quite heavily on the
generation of grouped zero (or -1) coefficients. The hardware implementation of the
subband quantisation scheme requires the ability to right-shift the coefficient (register
R0), hold the coefficient MSB and select individual subbands. The architecture
proposed in Chapter 4 exceeds the specifications required to perform this function.
To perform quantisation, firstly a subband is selected, then that subband is
right-shifted appropriately to match the required quantisation. Since right-shifts are used, the
possible quantisation factors are powers of two. Table 5-11 lists the required
control sequence for performing quantisation once a subband has been selected. The
value X represents the number of shifts required; for instance, if the subband is to be
quantised by a factor of 4 then X = log2 4 = 2.
Table 5-11: Quantisation Select 
Negative coefficients which are truncated in this fashion never reach zero; instead they
reach the value of -1. This is because two's complement truncation results in the
generation of a long string of ones, which represents the value -1. This condition needs
to be accounted for.
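The effect of shift-based quantisation, including the -1 floor for negative coefficients, can be reproduced with a short sketch. It relies on the fact that the Python >> operator performs an arithmetic (sign-preserving) right shift on integers, which mirrors holding the MSB while shifting. The function name quantise is hypothetical.

```python
def quantise(coeff: int, factor: int) -> int:
    """Quantise by a power-of-two factor using X = log2(factor) right shifts."""
    if factor < 1 or factor & (factor - 1):
        raise ValueError("factor must be a power of two")
    x = factor.bit_length() - 1   # number of shifts, e.g. factor 4 gives X = 2
    return coeff >> x             # arithmetic shift: the sign bit is held
```

For example, quantising 3 by a factor of 4 gives 0, while quantising -3 by the same factor gives -1 rather than 0, which is the condition the text refers to.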
5.4.2. Significance Identification Architecture 
Unlike the complex scheme used by the EZW algorithm, the ZTE algorithm only 
requires a simple significance identification architecture. This is because it only 
requires knowledge of zero (or -1) coefficients. To account for the case of negative 
numbers being truncated to negative one, any truncated negative one coefficients are 
considered insignificant also. Therefore, the search for insignificant coefficients 
requires the evaluation of both conditions; the coefficient is zero or the coefficient is 
negative one. Negative one in 8-bit 2's complement binary is represented as
11111111b, while zero is represented by 00000000b. Therefore, the condition
becomes a case of determining whether all bits of the coefficient are the same or not. This is
easily accomplished by an XOR logic gate. If the XOR triggers as the contents of the
coefficient are cycled through the XOR gate, then the coefficient can be deemed
significant, as it is neither zero nor negative one. The value resulting from the XOR
can be stored as the significance status. Figure 5-20 illustrates the proposed
architecture. The register R0 described in Section 4.7.2 is slightly modified to provide
the two extra outputs R0_0out and R0_1out. The latch is reset only when the pixel exits
the zerotree processing mode (ZT = 0). Therefore, the significance status is
maintained for the duration of the zerotree process.
Figure 5-20: ZTE Significance Identification Architecture 
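A minimal software model of this test is sketched below. It assumes that the two register outputs feeding the XOR are adjacent bits of the coefficient and that the latch holds a 1 once any XOR result is 1; both points are assumptions made for illustration, and the function name is hypothetical.

```python
def is_significant(coeff: int, width: int = 8) -> int:
    """Return 1 unless all bits of the two's complement coefficient are equal."""
    # Masking a Python int bit by bit yields its two's complement representation.
    bits = [(coeff >> i) & 1 for i in range(width)]
    latch = 0
    for i in range(width - 1):
        # XOR of two neighbouring bits; the latch remembers any mismatch.
        latch |= bits[i] ^ bits[i + 1]
    return latch
```

Both 0 (00000000b) and -1 (11111111b) leave the latch at 0, while any other 8-bit value sets it.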
5.4.3. Pixel Self-Classification Architecture 
Once each pixel has completed self-significance identification, this
information is propagated to all relevant pixels via the significance tree. When each
pixel receives this significance data, which occurs concurrently throughout the entire
array within one clock cycle, the pixel is able to classify itself. The ZTE algorithm
employs three symbols (ZTR, VZT and VAL) for the generation of the significance
tree, which is generated only once for the entire compression cycle of the image. Table
5-12 lists these symbols, the generation conditions and two-bit (ZS1, ZS2)
representations.
Table 5-12: ZTE Self-Classification Symbols
Pin  Cin  Sig  Symbol                Abbreviation  (ZS1, ZS2)
0    0    0    Do Not Transmit       DNT           00
0    0    1    Impossible
0    1    0    Impossible
0    1    1    Impossible
1    0    0    Zerotree Root         ZTR           01
1    0    1    Valued Zerotree Root  VZT           10
1    1    0    Value                 VAL           11
1    1    1    Value                 VAL           11
The following is a definition for the generation of the symbols.
A DNT symbol is used to indicate a Do Not Transmit state within a pixel and,
therefore, prevent any symbol transmission from this pixel to the receiver. This extra
symbol is not considered a waste of bit-allocations as three symbols require two bits in
any case. The DNT symbol is generated in a pixel when all three, the parent, the
children and the pixel itself, are insignificant. In cases where the pixel or its
descendants are significant the parent (Pin) signal can never be logic 0. Therefore these
conditions are considered impossible. In this case the Pout signal is held low and a
coefficient is not expected from this pixel.
A ZTR symbol is generated by a pixel which is not significant, has no significant
descendants but contains a significant parent (or sibling signalling via the parent). In
this case the Pout signal is held low and a coefficient is not expected of this pixel.
A VZT symbol is generated by a pixel which is significant, yet contains no significant
descendants. Pixels generating these symbols are expected to also transmit coefficients.
In this case the Pout signal is held high during encode.
A VAL symbol is generated by a pixel that contains one or more significant
descendants. Pixels generating these symbols are expected to provide a coefficient
even if the coefficient is zero, as dictated by the ZTE algorithm.
The two symbols VAL & VZT represent the key difference between the ZTE symbols
and EZW symbols (POS, NEG & IZO).
Given these conditions a possible architecture for the generation of these symbol bits 
(ZS1 and ZS2) is presented in Figure 5-21. 
Figure 5-21: ZTE Classification Architecture 
The Strmout signal is employed to clear the outputs during the decoding phase (Strmout 
= 0). The ZS3 signal is used for pixel disabling and is placed here to reduce required 
components. 
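The symbol definitions of this section can be collected into a small classification sketch. The inputs pin, cin and sig stand for the parent, descendant and self significance flags; the names and the function classify are illustrative assumptions, and the two-bit codes are those listed in Table 5-12. The sketch models encode behaviour only and is not the gate-level circuit of Figure 5-21.

```python
# Two-bit codes as listed in Table 5-12.
CODES = {"DNT": "00", "ZTR": "01", "VZT": "10", "VAL": "11"}


def classify(pin: int, cin: int, sig: int) -> str:
    """Return the ZTE symbol name for one pixel (hypothetical helper)."""
    if pin == 0:
        if cin or sig:
            # A significant pixel or descendant always drives the parent signal high.
            raise ValueError("impossible combination: Pin = 0 with Cin or Sig high")
        return "DNT"
    if cin:
        return "VAL"   # one or more significant descendants
    return "VZT" if sig else "ZTR"
```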
5.4.4. Pixel Enable Architecture 
In addition to the scale control signals VEnable and HEnable the ZTE architecture also 
requires a means of disabling a pixel when it contains no data to transmit (i.e. assign 
GEnable to zero). The proposed pixel disable mechanism employed by ZTE 
architecture resembles that of the EZW because during the symbol encode and decode 
cycles the Pin signal controls the deactivation (Pin = 0) of pixels otherwise only 
controlled by the scale control architecture. This is because if a parent is insignificant 
then there is no reason for a child to transmit a symbol. When encoding and decoding 
the coefficient itself, the pixel disable, due to the zerotree component, is governed by 
the existence of a VAL or VZT symbol. These in turn, during the encode cycle, are 
generated via the logical OR operation performed on signals Cin and Sig. Since this
function is performed previously in the symbol generation logic, the signal ZS3 is 
employed to deliver the result here. Given this specification, the GEnable signal is 
derived by the architecture presented in Figure 5-22. 
Figure 5-22: ZTE Pixel Enable Architecture 
In zerotree mode (ZT = 1) the pixel is enabled if both VEnable and HEnable are high
and one of the following conditions is true.
1. The Pin signal is high in both symbol encode and decode mode. This implies that
the parent is significant or has significant descendants. Hence the pixel should be
prepared to transmit or receive symbols.
2. The ZS3 signal is high when encoding coefficients. This implies that the pixel is a
Value or a Valued Zerotree Root, therefore requiring coefficient transmission.
3. Both the signal Pin and the contents of register R4 are high, when performing
coefficient decode. This implies a pixel's parent is significant and the recently
received symbol is a Value or Valued Zerotree Root. Therefore, this pixel is
enabled to receive a coefficient.
4. Finally, when EnR0a is high during symbol encode mode (S = 0). This allows
cycling of R0 in zerotree mode before any significance statuses have been
determined. During refinement passes (S = 1), however, this is disabled so that
only significant pixels are held enabled.
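The four enable conditions can be combined into one expression, as in the sketch below. Several points are assumptions made here for illustration and are not stated in the text: S = 0 is taken to mean the symbol phase and S = 1 the coefficient phase, Strmout = 1 is taken to mean encode, and the fourth condition is gated by S alone. The function name genable is hypothetical and only zerotree mode (ZT = 1) is modelled.

```python
def genable(venable: int, henable: int, pin: int, zs3: int,
            r4: int, enr0a: int, s: int, strmout: int) -> int:
    """Pixel enable in zerotree mode, following conditions 1 to 4 (sketch only)."""
    symbol_phase = (s == 0)      # assumption: S selects symbol vs coefficient phase
    encoding = (strmout == 1)    # assumption: Strmout = 1 means encode
    cond1 = symbol_phase and pin == 1
    cond2 = (not symbol_phase) and encoding and zs3 == 1
    cond3 = (not symbol_phase) and (not encoding) and pin == 1 and r4 == 1
    cond4 = symbol_phase and enr0a == 1
    scale_ok = venable == 1 and henable == 1
    return int(scale_ok and (cond1 or cond2 or cond3 or cond4))
```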
5.4.5. Symbol Latching & Bypass Architecture
The pixel latching and bypass architecture provides for the capture and transport of
symbols and coefficients to and from the pixel. Since the ZTE, like the EZW
algorithm, is based on the transport of two-bit symbols, two additional registers (R4
and R5) are required. These registers only clock when the pixel is in zerotree symbol
mode, with GEnable high and EnR0a low. Also, several bypass mechanisms are
proposed for the transport of coefficients and symbols via the same output. The
proposed architecture is presented in Figure 5-23 and a description of associated I/O
ports is presented in Table 5-13.
Registers R4 and R5 can receive data from one of several sources when in zerotree
mode. These are listed as follows.
1. Signals ZS1 & ZS2 - These signals originate from the symbol generation circuitry,
providing a symbol in encode mode (Strmout = 1), and a zero value in decode mode
(Strmout = 0). The signals are captured when LR1245 is set to logic 0 and the
circuit is clocked. This function is used for encoding.
2. The External Data Load Signal EXT - When LR1245 is set to logic 1, data
originating at the EXT signal is loaded into register R4 and data at register R4 is
loaded into register R5. The circuit acts as a large grouped (2-bit groups) shift register
when several pixels are connected together.
Figure 5-23: ZTE Latch & Bypass Architecture 
Table 5-13: Latch & Bypass I/O Ports
Signal    Dir  Type       Orig/Dest  Description
CLK       In   Clock      External   Global Clock Signal
LR1245    In   Control    External   Select Load for R4 & R5
EnR0a     In   Control    Internal   Register R0 Clock Enable / ZT Bypass
GEnable   In   Control    Internal   Pixel-wise Enable
ZT        In   Control    Internal   Zerotree Mode Select
EXT       In   Data       External   Data From Surrounding Pixels
Pixout    Out  Data       External   Data To Surrounding Pixels
R0out     In   Data       Internal   Data From LSB of Register R0
R4out     Out  Data       Internal   Data For Pixel Enable in Decode
ZS1       In   Data       Internal   Symbol bit 1
ZS2       In   Data       Internal   Symbol bit 2
ZS4       Out  Data/Cont  Internal   Parent Significance During Decode
The architecture also consists of the following bypass modes. 
1. Pixel Bypass Mode - If the GEnable signal is set low, then all data originating at 
EXT will be routed to Pixout. In this mode the pixel acts as a transparent pixel to 
its immediate neighbours. 
2. Symbol Bypass Mode - When EnR0a is set high the pixel enters this mode. In
this mode the data from register R0 is routed to Pixout. This mode is used to
transmit data from register R0.
Finally the architecture also provides a signal, ZS4, which is used in the decode phase.
This signal is held high when registers R4 and R5 contain a VAL symbol, indicating
that its immediate children should expect symbols.
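The register loading and bypass rules described in this section can be summarised as a small behavioural model. The Python sketch below is an illustration only and not the thesis schematic: the class and method names are hypothetical, the ZT mode select is omitted, and three details are assumptions made for the sketch, namely that ZS1 and ZS2 load into R4 and R5 respectively, that Pixout is driven from R5 when no bypass is active, and that pixel bypass takes priority over symbol bypass.

```python
from dataclasses import dataclass


@dataclass
class LatchBypassModel:
    """Behavioural sketch of the per-pixel R4/R5 latch and bypass logic."""
    r4: int = 0
    r5: int = 0

    def clock(self, genable, enr0a, lr1245, ext, zs1, zs2):
        """Apply one clock edge; R4/R5 only clock with GEnable high and EnR0a low."""
        if not genable or enr0a:
            return
        if lr1245:
            # Shift mode: EXT -> R4 -> R5 (tuple assignment uses the old R4).
            self.r4, self.r5 = ext, self.r4
        else:
            # Capture mode: latch the generated symbol bits
            # (assumed mapping: ZS1 -> R4, ZS2 -> R5).
            self.r4, self.r5 = zs1, zs2

    def pixout(self, genable, enr0a, ext, r0out):
        """Combinational output seen by the neighbouring pixel."""
        if not genable:
            return ext      # pixel bypass: the pixel is transparent
        if enr0a:
            return r0out    # symbol bypass: R0 drives the output
        return self.r5      # assumed default: end of the R4/R5 chain

    @property
    def zs4(self):
        """High when R4 and R5 hold the VAL symbol (11b)."""
        return int(self.r4 == 1 and self.r5 == 1)
```

Chaining several such objects, with each pixel's `pixout` feeding the next pixel's `ext`, would reproduce the grouped 2-bit shift register behaviour described for LR1245 = 1.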
5.4.6. Significance Tree Generation 
The proposed per-pixel ZTE significance tree generation architecture is presented in 
Figure 5-24. The sibling out signal (Sbout), used during encode, is held high if a pixel's
left sibling is significant (Sbin signal), if the pixel itself is significant (S~ signal), or if it has
significant descendants (Cin signal). This signal is not used during decode.
During encode (Strmout = 1) the Pout signal is held high if a pixel has significant
descendants, as this implies the generation of a VAL symbol. During decode (Strmout
= 0) the Pout signal is held high if the contents of registers R4 and R5 indicate a VAL
(11b) symbol. This signal is provided by the ZS4 signal. The Pout signal allows the
descendants of a particular pixel to expect the arrival or transmission of a symbol and
hence self-activate.
(Signals shown: Sbin, Sbout, Cin, ZS4, Strmout, Pout.)
Figure 5-24: ZTE Significance Tree Generation
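The significance tree logic described above reduces to two small Boolean functions. The following Python sketch is a reading of the text rather than a transcription of Figure 5-24; the function and argument names are hypothetical, and the pixel's own significance input is simply called `s_pixel` here.

```python
def sibling_out(sb_in, s_pixel, c_in):
    """Sbout: high if the left sibling, the pixel itself, or any
    descendant is significant (used during encode only)."""
    return int(bool(sb_in or s_pixel or c_in))


def parent_out(strmout, c_in, zs4):
    """Pout: during encode (Strmout = 1) follow the descendant
    significance input; during decode (Strmout = 0) follow ZS4,
    which flags a VAL symbol held in R4 and R5."""
    return int(bool(c_in if strmout else zs4))
```

In this form Pout acts as a 2-way multiplexer selected by Strmout, which is consistent with the small gate count quoted for tree generation.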
5.4.7. Data Extraction/Load Architecture 
The data extraction and load architecture proposed for the ZTE is identical to that
proposed for the EZW in Section 5.3.6. The only difference lies in the control
sequence. Instead of iteratively alternating between the symbol and refinement passes,
the control structure here performs a single symbol pass and a single coefficient pass.
Initially, the scale control lines are set to enable only the lowest subband. Then in an
encode cycle, when capturing the symbols into R4 and R5, the FZT signal is also activated
for one clock cycle, so as to load GEnable into register R6. Then the signal
LR1245 is activated to link all the R6 registers into one array sized column shift
register. Similarly the RC and LR signals are set to logic 1 and 0 respectively to
connect the main pixel interconnects in a vertical shift direction from top to bottom.
Then, for every symbol shifted (two clock cycles) the FZT signal is activated for one
clock cycle to shift the counter bit. The arrival of a zero at the bottom pixel's Below
output indicates that the column contains no more symbols to send. When all columns
produce a zero the next subband can be activated and the symbol shift repeated for it.
When all symbols have been extracted the S signal is activated for coefficient extraction
mode. Again, the lowest subband of pixels is first activated. Then for one clock cycle
the signals LR1245 and FZT are set to logic 0 and 1 respectively to load GEnable into
R6. The symbols do not reload as S is held high. The signals LR1245 and FZT are
then set to logic 1 and logic 0 respectively, and EnR0 (not EnR0a, which is a derived
signal) is set to logic 1. For every eight shifts (8 clock cycles) of the register R0, the
FZT signal is held high for one, so as to shift the counter bit. As with the symbols,
coefficient shifts stop when the bottom pixel produces a zero on its Below signal.
The decode mode also follows that of the EZW. However, only two passes are made,
one for the symbols and the other for the coefficients. During the symbol pass the
FZT signal is only activated for one in two clock cycles, while the same signal is
activated for one in eight clock cycles during the coefficient pass.
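The FZT timing described above (one pulse per two-cycle symbol, one pulse per eight-cycle coefficient) can be sketched as a simple schedule generator. This is a minimal illustration in Python; the function name is hypothetical, and placing the pulse on the last cycle of each group is an assumption, chosen to match the order of the extract phases in the control sequence tables.

```python
def fzt_schedule(n_items, cycles_per_item):
    """Return the FZT level for each clock cycle of a shift pass.

    n_items         -- number of symbols or coefficients shifted out
    cycles_per_item -- 2 for a symbol pass, 8 for a coefficient pass
    """
    schedule = []
    for _ in range(n_items):
        # FZT stays low while the data bits shift, then pulses once
        # to advance the R6 counter bit (assumed: on the final cycle).
        schedule.extend([0] * (cycles_per_item - 1) + [1])
    return schedule


symbol_pass = fzt_schedule(n_items=3, cycles_per_item=2)
coefficient_pass = fzt_schedule(n_items=3, cycles_per_item=8)
```

For three items the symbol pass takes 6 clock cycles and the coefficient pass 24, each containing exactly three FZT pulses.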
5.4.8. Array Edge Buffer Design 
Since the data extraction and load architecture contains virtually no difference between
the ZTE and EZW algorithms, the edge buffer designs also follow suit. There are
only two differences between the buffer designs.
1. Register BR4 is extended to 8 bits - Since the ZTE architecture can produce an
8-bit coefficient value instead of a single refinement bit as generated by the EZW
architecture, register BR4 is extended to handle 8 bits instead. The register BR4
then becomes an 8-bit shift register.
2. The control structure is modified to handle 8-bit cycles - Since the pixel array
control structure is modified to handle 8-bit coefficients, the buffer control
structures also have to be modified to do the same. In addition the control
structure has to be modified to handle only two passes instead of several alternate
passes in the EZW architecture.
These changes only present a simple set of modifications and as such do not require an
explicitly defined architecture.
5.4.9. The Encode Control Sequence 
The full ZTE encode cycle for either an array or single pixel structure is presented in
Table 5-14. The variable X is used to represent the number of quantisation cycles. It
is dependent on the user selectable subband quantisation mechanism and the chosen
quantisation coefficients, such as that described in Sections 2.6.2, 3.3.1 and 4.6.5.
5.4.10. The Decode Control Sequence
The full ZTE decode cycle for either an array or single pixel structure is presented in
Table 5-15. The variable Y is used to represent the number of de-quantisation cycles.
It is dependent on the user selectable subband quantisation mechanism and the chosen
quantisation coefficients, such as that described in Sections 2.6.2, 3.3.1 and 4.6.5.
Table 5-14: ZTE Encode Cycle
Phase  Description            Control signal states       Clock cycles
1      Select a Subband For Quantisation
2      Quantise Subband       1 0 0 0 0 0 0 1 0 0 0 0 0   X
3      Repeat Phases 1 & 2 Until all Relevant Subbands Have Been Quantised
4      Enable Entire Array
5      Significance Detect    1 1 0 0 0 1 0 1 1 0 0 0 0   8
6      Enable Lowest Frequency Subband
7      Latch Symbols & R6     1 1 0 0 0 1 0 0 1 0 0 1 0   1
8      Extract R5             1 1 0 0 0 1 0 0 1 1 0 0 0   1
9      Extract R5 & R6        1 1 0 0 0 1 0 0 1 1 0 1 0
10     Repeat Phases 8 & 9 Until Bottom Pixel Displays 0 in Below
11     Repeat Phases 7 - 10 For all Subbands
12     Enable Lowest Frequency Subband
13     Latch R6 for Coeff     1 1 0 0 0 1 1 0 1 0 0 1 0   1
14     Ex. R0 7 LSBs          1 1 0 0 0 1 1 1 1 1 0 0 0   7
15     Ex. R0 MSB & R6        1 1 0 0 0 1 1 1 1 1 0 1 0   1
16     Repeat Phases 14 & 15 Until Bottom Pixel Displays 0 in Below
17     Repeat Phases 13 - 16 For all Subbands
Table 5-15: ZTE Decode Cycle
Phase  Description            Control signal states       Clock cycles
2      Latch R6               1 1 0 0 0 1 0 0 0 0 0 1 0   1
3      Load R5                1 1 0 0 0 1 0 0 0 1 0 0 0
4      Load R5 & R6           1 1 0 0 0 1 0 0 0 1 0 1 0   1
5      Repeat Phases 3 & 4 Until Top Pixel Displays 0 in Above
6      Repeat Phases 2 - 5 For all Subbands
7      Enable Lowest Frequency Subband
8      Latch R6 for Coeff     1 1 0 0 0 1 1 0 1 0 0 1 0   1
9      Ld. R0 7 LSBs          1 1 0 0 0 1 1 1 0 1 0 0 0   7
10     Ld. R0 MSB & R6        1 1 0 0 0 1 1 1 0 1 0 1 0   1
11     Repeat Phases 9 & 10 Until Top Pixel Displays 0 in Above
12     Repeat Phases 8 - 11 For all Subbands
13     Select a Subband For De-Quantisation
14     De-Quantise Subband    1 0 0 0 0 0 0 1 0 0 0 0 1   Y
15     Repeat Phases 13 & 14 Until all Relevant Subbands are De-Quantised
5.5. Codec Comparison 
To aid in the decision to select a single codec for hardware prototyping, the EZW and
ZTE codecs are compared in this section.
5.5.1. Video Compression Performance
To provide a fair comparison, both zerotree codecs were tested using the same Motion
Compensation, Triangular Wavelet Transform and Arithmetic Coder components
(Chapter 4). Two sequences, the proprietary Jenny sequence and the standard
Salesman sequence, were compared at two different bit rates, 64 Kbps (ISDN) and 250
Kbps (3G). More information on these two video sequences can be found in Section
2.2.4.
At 64 Kbps, both video sequences were sub-sampled from 25 fps to 5 fps to maintain
a low bit-rate. Figure 5-25 illustrates the compression performance of the two
algorithms on the Jenny sequence at 64 Kbps and at 5 fps. Figure 5-26 illustrates the
same comparison using the Salesman sequence. The Jenny sequence shows
near identical performance between the two algorithms, although a slight decrease in
the uniformity of the PSNR in the EZW is noted. The ZTE performs slightly better
when compressing the Salesman sequence, and also produces a more uniform curve.
At 250 Kbps (3G), both video sequences were sampled at 25 fps. Figure 5-27
illustrates the performance for the Jenny sequence, while Figure 5-28 illustrates that for the Salesman
sequence. The Jenny sequence shows somewhat similar performance between
the two codecs, while the Salesman sequence favours the EZW.
This suggests that at high bit-rates, for the sequence without a moving camera
(Salesman), the EZW performs better, yet at low bit-rates the ZTE outperforms the
EZW. For a moving camera sequence (Jenny) the performance difference between the
two is marginal at both bit-rates. However, in all cases the ZTE seems to deliver the
most uniform performance characteristic.
(Plot: PSNR against Frames.)
Figure 5-25: Jenny Sequence at 64 Kbps (5 fps)
Figure 5-26: Salesman Sequence at 64 Kbps (5 fps)
(Plot: PSNR against Frames.)
Figure 5-27: Jenny Sequence at 250 Kbps (25 fps)
(Plot: PSNR against Frames.)
Figure 5-28: Salesman Sequence at 250 Kbps (25 fps)
5.5.2. Hardware and Control Complexity 
Comparing the two schemes, it is clear that the ZTE scheme is far simpler
in terms of both hardware and control complexity. Unlike the ZTE, the EZW scheme requires the
ability to perform two's complement conversion, the reversing of the coefficient, the
generation of multiple symbols per single pixel, complex sign feedback and complex
refinement decode to register R0. Table 5-16 lists a
comparative count of 'standard' gates between the ZTE and EZW architectures. The
ZTE is clearly the simpler design. With 23 additional gates per pixel (59 - 36), in a QCIF sized array the number of
additional gates required for EZW approaches 23 x 176 x 144 = 582,912 gates.
Furthermore, the extra control line Rev used by the EZW algorithm is not required for
the ZTE.
Table 5-16: EZW vs. ZTE Complexity Comparison
Component                   EZW  ZTE
Register R0 Multiplexers    12   5
Significance ID Gates       12   2
Self-Classification Gates   6    6
Pixel Enable Gates          4    8
Load & Extract Gates        22   13
Tree Generation Gates       3    2
Total                       59   36
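The totals in Table 5-16 and the QCIF gate overhead quoted in the text can be checked with a few lines of arithmetic. The Python sketch below simply re-adds the per-component counts from the table; the dictionary and variable names are illustrative only.

```python
# Per-pixel gate counts from Table 5-16 as (EZW, ZTE) pairs.
GATE_COUNTS = {
    "Register R0 Multiplexers": (12, 5),
    "Significance ID Gates": (12, 2),
    "Self-Classification Gates": (6, 6),
    "Pixel Enable Gates": (4, 8),
    "Load & Extract Gates": (22, 13),
    "Tree Generation Gates": (3, 2),
}

ezw_total = sum(ezw for ezw, _ in GATE_COUNTS.values())
zte_total = sum(zte for _, zte in GATE_COUNTS.values())
extra_per_pixel = ezw_total - zte_total

# A QCIF array has 176 x 144 pixels.
QCIF_PIXELS = 176 * 144
extra_gates_qcif = extra_per_pixel * QCIF_PIXELS
```

The check gives 59 and 36 gates per pixel, a difference of 23, and 582,912 additional gates over a QCIF array, in agreement with the figures above.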
Given these two comparisons, the ZTE architecture presents itself as an ideal candidate for 
VLSI implementation. Therefore, the first VLSI prototype, the IP100P, includes the ZTE 
codec as detailed in Chapter 6. The codec was modified to exclude the extraction / load 
counter circuitry to aid with testability and due to tight deadlines. 
5.6. Final Codec Selected For Implementation 
The IP100P is the first prototype designed to test the functionality and power 
characteristics of such a low clock speed massively parallel compression codec. The design 
is a 32 x 32 array, which includes motion differencing, the wavelet transform, quantisation 
and ZTE coding. Since the author had not redesigned the wavelet transform circuitry at
the time of chip design, the author was forced to alter the ZTE design to allow it to
integrate into the existing control structure present at the time. Although the control
signals were modified to better suit both components, the functionality of the circuit
remains the same. Furthermore, since complementary CMOS VLSI circuitry optimisations
favour the use of NAND and NOR gates over the traditional AND and OR gates, the 
ZTE architecture was further modified to include this optimisation. The symbol bits were 
also inverted to aid in this optimisation. The final ZTE design is presented in Figure 5-29 
and the final wavelet transform / motion compensation architecture is presented in Figure 
5-30. Some control signals such as ZT, WT and MC were modified to directly drive the 
individual components via the signals SjmIO and En_R1R2. These provide equivalent 
functionality to M1 and M2 in the proposed design. 
Figure 5-29: Final ZTE VLSI Schematic
Figure 5-30: Final VLSI WT and MC Schematics
5.7. Conclusion
This chapter has presented two novel massively parallel architectures for zerotree coding:
the Embedded Zerotree Wavelet coder and the ZeroTree Entropy coder. The designs are
based on per-pixel self-classification of significant wavelet coefficients. Integration into the
wavelet architecture, presented in Chapter 4, has also been presented. Each zerotree coder
has been designed as a set of components which include significance identification,
self-classification, pixel enable, extract/load, pixel-wise tree generation and extract/load
counting architectures. A description of the functional requirements for the array-edge
buffers has also been provided. Finally, the decision to use the ZTE architecture in
the design of the IP100P prototype has been established, due to limited space
requirements, tight timelines and lower complexity. As a result, the modified ZTE
architecture used in the IP100P prototype and the Wavelet Transform / Motion
Compensation architecture have also been shown.
Chapter 6
THE ZTE CODEC REALISED IN VLSI
"Neo, no one has ever done anything like this." - Trinity
"That's why it's going to work." - Neo
Matrix: The Movie
6.1. Introduction 
In this chapter the feasibility of VLSI hardware implementation of a novel massively
parallel Zerotree Entropy Coder, as introduced in the previous chapter, is explored. The
work conducted herein contributes to the VLSI hardware realisation of the project code
named IP100. The IP100 project, which originated as an idea in 1995, is currently
composed of three fundamental streams of research, segmented by function. They are:
1. The IP100C - The capture component researched and developed at Edith Cowan
University and University of Ulm.
2. The IP100D - The display component researched and developed at the University
of Cambridge.
3. The IP100P - The processing component researched and developed at Edith
Cowan University and University of Las Palmas de Gran Canaria.
The author was primarily responsible for the development and implementation of the ZTE
components belonging to the IP100P prototype. As such this chapter highlights the key
ZTE components and the other related support components developed for the IP100P by
the author.
6.2. IP100P Prototype Configuration
The IP100P's primary function is to demonstrate the design and implementation feasibility
of a novel pixel based video compression approach in current VLSI technology. As such,
and to aid testing, the chip design is limited to the implementation of the array
only. Components such as array edge buffers, arithmetic coders, stream coders and control
logic are omitted from this design. Figure 6-1 illustrates the I/O and control pin
allocation strategy for the realisation of a 32 x 32 IP prototype array. 32 switchable Input /
Output columns allow for array load and unload while 18 control lines provide for a
versatile external control mechanism.
(Pin groups shown: 10 control lines; 7 I/O lines; 7 control lines; 18 I/O lines; 7 I/O lines + 1 control line; VDD and GND supply pins.)
Figure 6-1: IP100P Prototype I/O Configuration
This I/O pattern was chosen to suit the per-side pin-count of a JLCC 84 pin chip carrier.
6.3. The Technology 
The technology employed for the realisation of the IP100P and its sister prototype, the
IP100C, presents the following features.
1. Foundry: UMC (United Microelectronics Corporation) - Chosen because of
ease of access via Europractice and because of financial and design support merits.
2. Minimum feature size: 0.25 micron - At this minimum size an array
approximating to 16 x 16 mm² easily supports a QCIF image with adequate yield.
The minimum transistor size is W = 0.3 µm, L = 0.24 µm.
3. Power Supply Configuration: 2.5V & 3.3V - The pads require a 3.3V (VCC)
I/O signal, however the internal circuits operate on 2.5V (VDD). A binary 'off' is
represented by a 0V signal, while a binary 'on' is represented by a 2.5V signal.
4. Well Configuration: Twin Well - This process employs a Twin-Well process with
the substrate acting as the well for the NMOS transistors and a P+ implant applied
inside an N-Well for the PMOS transistors.
5. Metal Layers: Five (5) - This caters for the use of four metal layers for routing
and interconnects and one layer (5) for the polished mirror driver. Metal 5 was not
used as a functional layer for the IP100P design.
6. Poly-silicon Layers: Two (2) - Although two poly layers are present the design
utilised only one poly layer. The second is typically used for the generation of
poly-silicon capacitors.
7. VIA Configuration: Supports Stacked VIAs - A highly useful configuration for
circuit compaction and power distribution. Vias can be stacked from the lowest
diffusion contact through all metal layers to the highest metal layer (5).
8. 45° Manhattan Rules - The technology allows for 45° bent gates and routing for
compactness. Although 90° routing is allowed it is not a valid option for the gates.
9. Ring Oscillator - A nine inverter ring oscillator constructed from the supplied
level 49 BSIM3 NMOS, PMOS HSpice transistor models is presented in Figure
6-2. The oscillating frequency for 9 gates approximates to 1 GHz, which implies
that the maximum operating frequency of a single minimum size gate can be
calculated from 9 x (1 / 1 ns) = 9 GHz. This is also confirmed by evaluating ring
oscillators with 3, 5, and 7 inverters. Therefore the propagation delay for each
cascading minimum size gate is 1 / 9 GHz = 0.111 ns. As a result if 60,000 gates are
connected in series then at 100 kHz the propagation delay applied by these gates
will result in significant clock skew. The average power consumption for these 9
gates operating at 9 GHz is approximately 500 µW.
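To put the clock skew remark in item 9 into numbers, the accumulated delay of a long gate chain can be compared with the clock period. The Python sketch below takes the 0.111 ns per-gate delay quoted above at face value and uses the 60,000 gate and 100 kHz figures from the text; the variable names are illustrative only.

```python
# Figures quoted in the text.
GATE_FREQ_HZ = 9e9        # implied single-gate operating frequency
N_GATES = 60_000          # gates connected in series
CLOCK_HZ = 100e3          # target array clock

gate_delay_ns = 1e9 / GATE_FREQ_HZ                 # about 0.111 ns per gate
chain_delay_us = N_GATES * gate_delay_ns / 1e3     # accumulated delay in µs
clock_period_us = 1e6 / CLOCK_HZ                   # 10 µs at 100 kHz
skew_fraction = chain_delay_us / clock_period_us   # share of one clock period
```

Under these assumptions the chain accumulates roughly 6.7 µs of delay, about two thirds of the 10 µs clock period, which illustrates why the skew is described as significant.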
(Plots: instantaneous power and output voltage "out" against Time (µs), 0 to 0.01 µs. Power (Max/Mean/Min) = (1428.9625 / 496.943 / 383.7375) µW.)
Figure 6-2: Ring Oscillation for UMC 0.25 µm Process
The full custom layout work and simulation of the IP100P prototype in UMC 0.25 micron
technology was completed in approximately four months, with the use of tools including
Cadence (Virtuoso, Dracula, Analog Artist), HSpice, Electric, Synopsys and VHDLSimili.
6.4. IP100P Floor Plan
The 32 x 32 array is comprised of 1024 individual pixel processors, each with identical content,
which allowed hierarchical instance based layouts to be employed for the IP100P
realisation. This section will cover some of the top-level layout considerations associated
with the layout of the IP100P.
6.4.1. IP100P Size & Layout Considerations
The size of the full chip is easily derived from the size of a QCIF (176 x 144 pixel)
image, as this is the targeted size for the final product. If a size restriction of 16 x 16
mm² is to be maintained due to cost, yield and technology factors, and a further 3 mm
in each dimension is reserved for peripheral components, a resultant 13 x 13 mm² area
remains for the actual pixel array. The larger row size of 176 pixels therefore dictates
that a pixel pitch of 80 x 80 µm² be allocated for each pixel. Applying this pixel-pitch to
a 32 x 32 array results in an array size of 2560 x 2560 µm². Therefore the IP100P
design, including pads and buffers, is set to be in excess of 3.4 x 3.4 mm².
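The prototype dimensions quoted above follow from simple arithmetic on the pixel pitch. The Python sketch below reproduces the 2560 µm array side and gives a rough estimate of the full die side; the per-edge margins (about 0.4 mm of pads and about 0.1 mm of routing and buffers) are read from the approximate labels of Figure 6-3 and are assumptions of this sketch, so the result is only an estimate.

```python
PIXEL_PITCH_UM = 80       # pixel pitch from the text
ARRAY_PIXELS = 32         # prototype array is 32 x 32 pixels

# Approximate per-edge margins (assumed from the Figure 6-3 labels).
PAD_MARGIN_UM = 400
ROUTING_MARGIN_UM = 100

array_side_um = ARRAY_PIXELS * PIXEL_PITCH_UM
chip_side_um = array_side_um + 2 * (PAD_MARGIN_UM + ROUTING_MARGIN_UM)
```

This yields an array side of 2560 µm and a die side of roughly 3.56 mm, consistent with the statement that the design exceeds 3.4 x 3.4 mm².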
(Dimensions shown: 3.446 mm full prototype; approximately 0.4 mm pads; approximately 0.1 mm routing & buffers; pixel array.)
Figure 6-3: IP100P Proposed Dimensions
6.4.2. Power Distribution & Metal Allocation
The power distribution strategy used in a design is generally related to the number of
metal layers available. A five metal process was selected as it offered a good
compromise between the required reflectivity of the top metal layer and the required
resolution for use as the LCD SLM. In addition, when selecting the number of metal
layers, the photodetector at the substrate level demands some special consideration.
Since the photodetector area is to be left uncovered for capture of light, having many
metal layers requires a highly refined cut creation process, increasing the chance of
creating defective photodetectors. Furthermore, if the depth of the well in which the
photodetector resides is not kept to a minimum the shadows cast by the top metal
layers can interfere with the capture ability of the photodetector. Taking this into
account along with the fact that metal five is reserved for the SLM, the allocation of
the available metallisation layers is presented below.
1. Metal-1 - Used in the design of primitive components and some higher level 
routing when possible. 
2. Metal-2 - Used for interconnects between primitive components.
3. Metal-3 - Primarily used to distribute VDD to all the devices. It is also used to 
supplement the routing in metal-2 to areas where there is no viable access through 
metal-2. The power is distributed by means of horizontal bands across the array to 
major power distribution nodes in each pixel. Metal-3 is also used to distribute 
some ZTE Status signals within NBLOCKS. 
4. Metal-4 - Primarily used to provide Ground (GND) to all the devices via pixel 
width vertical rails. A gap in between each pixel is reserved for control line routing 
in metal-4 also. 
5. Metal-5 - Reserved for the pixel mirror driver. Also used for logo and pads. 
(Labels shown: ground supply pads; ground on Metal 4; VDD on Metal 3; VDD supply pads.)
Figure 6-4: Power Distribution in IP100P Prototype
Figure 6-4 illustrates the power and ground distribution methodology for the supply of
power throughout the array. Pads placed at the vertices of the array provide for the
delivery of external power into the array. A second potential of 3.3V is required for
pad interfacing and is supplied with each block of control or I/O pads through the use
of a VCC pad.
6.4.3. Pixel Floor-Plan 
The area for each pixel sub-divides into 7 ({1} - {7}) distinct zones, as shown in
Figure 6-5. These areas are reserved for the following functions.
1. Photodiode (15 x 15 µm²) - This area is reserved for the implementation of a
photodiode device in future versions.
2. Analogue to Digital Converter (15 x 60 µm²) - This area is reserved for the
implementation of an Analogue to Digital Converter to accompany the photodiode
in future versions.
3. Wavelet Coding & Motion Differencing (50 x 75 µm²) - This area is used for
the realisation of the wavelet coding and motion compensation circuitry. Some
basic components in this area include one 9-bit shift register, two 6-bit shift registers,
one 4-way multiplexer, one serial adder, nine 2-way multiplexers, a single clock distribution
driver and miscellaneous logic components.
4. ZTE Coding Components (10 x 75 µm²) - This area is used for the
implementation of the Zerotree Entropy Coder and Decoder. Basic components
include three 1-bit clearable shift registers, five 2-way multiplexers, 13 miscellaneous
gates, ZTE significance routing circuitry and reserved space for a future mirror
driver.
5. Horizontal Routing Lines (5 x 80 µm²) - Six 0.40 µm metal-1 horizontal routes
interlaced with six 0.40 µm metal-3 routes. This area is isolated from VDD
distribution via metal-3; ground is distributed over this area on metal-4.
6. Vertical Routing Lines (80 x 5 µm²) - Six 0.40 µm metal-2 vertical routes
interlaced with six 0.40 µm metal-4 routes. This area is isolated from ground
distribution via metal-4; VDD is distributed over this area via metal-3.
Furthermore other metal-3 routes may be utilised where VDD distribution is not
present (i.e. ZTE Status lines).
7. Vertical Centre Routes (80 x 5 µm²) - This zone is kept free of metal-2 when
possible so as to allow signal distribution via the centre of the pixel.
With these floor plans the layout of the NBLOCK and hence the array can commence. 
(Zones shown: {2} area reserved for ADC, 15 x 60 µm; {3} area reserved for wavelet & motion compensation, 50 x 75 µm; {4} area reserved for ZTE, 10 x 75 µm; {5} area reserved for 6 x Metal-1 & 6 x Metal-3 routes, 5 x 80 µm.)
Figure 6-5: Pixel Floor-Plan
6.4.4. NBLOCK Arrangement
Nucleic Blocks (NBLOCKS) provide the smallest individual scalable component in the 
array. A useful result of the approach is the novel ability to scale the size of the array 
to suit any integer multiple of NBLOCKS. For a 3 scale DWT the size of the ZTE 
dependence tree results in an NBLOCK size of 8 x 8 pixels. Routing, especially with 
regards to the ZTE Status components, is unique within an 8 x 8 block of pixels. In 
the case of the IP100P prototype, a set of 4 x 4 NBLOCKS are used to construct an
array of 32 x 32 pixels, as illustrated in Figure 6-6. 
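The relationship between the number of DWT scales, the NBLOCK size and the array size can be illustrated with a short calculation. The Python sketch below assumes that the NBLOCK side equals 2 raised to the number of scales, which matches the 8 x 8 block quoted for a 3 scale DWT but is otherwise an assumption of this sketch; the function names are hypothetical.

```python
def nblock_side(dwt_scales):
    """Side length in pixels of one NBLOCK (assumed to be 2**scales)."""
    return 2 ** dwt_scales


def nblocks_per_side(array_side, dwt_scales):
    """Number of NBLOCKs along one side of a square array."""
    side = nblock_side(dwt_scales)
    if array_side % side != 0:
        raise ValueError("array side must be an integer multiple of the NBLOCK side")
    return array_side // side


prototype_blocks = nblocks_per_side(32, 3)   # IP100P prototype array
qcif_blocks = (nblocks_per_side(176, 3), nblocks_per_side(144, 3))
```

For the prototype this gives 4 NBLOCKs per side, and a QCIF array of 176 x 144 pixels would tile as 22 x 18 NBLOCKs under the same assumption.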
Figure 6-6: Array & NBLOCK Floor Plan
Each NBLOCK contains the significance dependence routing required for a single 
ZTE tree, for which status signals are routed via any free route paths that remain after 
control routing has been established. The routing can be accomplished using 2 - 5 
vertical and horizontal routes per pixel. 
The IP100P prototype design, after control routing is placed, provides for 5 free
vertical and 3 free horizontal routes. These routes are utilised to perform the ZTE 
significance routing. 
6.4.5. Control Signal Routing & Buffer Placement 
All control signals are routed via either a set of horizontal or vertical control lines. 
Generally all control lines propagate to all pixels in the array, with the exception of the 
scale control lines that only propagate to appropriate pixels belonging to a particular 
scale. The buffers that drive these global control lines are distributed at the borders of 
the IP100P layout. To accommodate two different transistor drive loads, two
inverter buffer stage designs are used, which are presented in the next sections. These
buffers are attached to the appropriate horizontal or vertical control lines for signal
delivery to the pixels.
6.5. IP100P Primitive Component Layouts & Simulations
The ZTE component included in the IP100P prototype is constructed of a number of
essential primitive components that require tight packing into the allotted space. Although
the use of standard cell design for layout is the norm, the severe space restrictions imposed
by a pixel-pitch of 80 x 80 µm² virtually eliminate this approach as a practical option. The
disadvantage is that these fully custom designed primitives require some form of
functional verification. This section is therefore dedicated to the verification of the
functional aspects of these primitive components. It should be noted that the primitive
designs used herein are also employed in the wavelet component and as such were designed
and verified as a group effort. The designs investigated here are adopted from [68].
6.5.1. Basic Primitives 
A typical schematic representation for a CMOS inverter together with its equivalent 
minimum width layout in UMC 0.25 micron technology is presented in Figure 6-7. 
The operation of the inverter and the power consumed when driving a typical single 
minimum size gate is presented in Figure 6-8. The dimensions in terms of width and 
height for this inverter and other basic logic gates are presented in Table 6-1. 
(a) (b)
Figure 6-7: Inverter Schematics & Layout
(Plots: instantaneous power and input/output voltages against Time (ns), 0 to 2 ns. Power (Max/Mean/Min) = (164.2375 / 46.0554 / 1.3432e-005) µW.)
Figure 6-8: Inverter Operation
Table 6-1: Primitive Gate Dimensions
Gate   Width (µm)  Height (µm)
INV    3.12        3.3
NAND   4.92        3.3
NOR    3.88        3.3
AND    3.44        3.3
OR     4.47        3.3
The functionality of other simple logic gates such as NAND, NOR, AND and OR
was also verified in this technology, although the results are not presented here.
A typical schematic and layout for a standard transmission gate with substrate
polarisation is illustrated in Figure 6-9. The gate dimensions are 4.20 µm and 3.12 µm
for the height and width respectively. The gate acts as a link when the complement of Sel is grounded and
a VDD signal is applied to Sel. When the reverse is applied both MOSFETs fail to
reach the threshold voltage VT, irrespective of the input or output levels. Figure 6-10
graphs the operation of this transmission gate. Before the Sel signal is set at 2.5 ns, an
unknown condition is observed indicating a typical disconnected state for a
transmission gate.
~ VD9>-l j_ 
Sel In* Out 
Sel 
VOO! 
~ 
(a) (b) 
Figure 6-9 Transmission Gate Schematic & layout 
[Plot: Power (Max/Mean/Min) = (2.3375 / 0.22804 / 1.3711e-007) uW, instantaneous power at vddin; voltage traces, including sel, versus Time (ns) over 0 to 6 ns]
Figure 6-10: Transmission Gate Operation 
The Exclusive OR gate, in the interest of saving space, has been implemented in a less
stable manner than is typical in logic design. It operates on the principle of a
switchable inverter, as can be seen from the schematic in Figure 6-11. If B is
at zero (0) potential then the PN transistor pair becomes disconnected and the
contents of A are switched to the output, as the transmission gate now becomes active.
If B is at 2.5 V then the transmission gate enters a disconnected state while the PN
transistor pair is activated by the inverter. Hence the inverse of the contents of A is
propagated to the output. Caution needs to be applied with such circuits, as the input B
provides power for a pair of transistors. Therefore the input B should be generated
from a strong driving source and be switched at a slower rate. [69] [70]
[Figure panels: (a) schematic and (b) layout, with terminals A, B and Out]
Figure 6-11: XOR Schematics & Layout 
This gate has a width of 5.92 µm and a height of 4.23 µm. 
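The switchable-inverter behaviour described for the XOR gate can be summarised with a small behavioural model. This is only a logic-level sketch (it ignores drive strength and timing), and the function name is a hypothetical choice made for illustration.

```python
def switchable_inverter_xor(a: int, b: int) -> int:
    """Logic-level model of the XOR cell: B selects pass-through or inversion of A."""
    if b == 0:
        # B low: the transmission gate conducts and the P/N pair is disconnected,
        # so A is passed straight to the output.
        return a
    # B high: the transmission gate is off and the P/N pair (powered from B)
    # drives the inverse of A to the output.
    return 1 - a


# The two modes together reproduce the XOR truth table.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, switchable_inverter_xor(a, b))
```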
The ZTE component relies on three storage elements for the two purposes of symbol
storage and pixel enable latching. The register used here is a classic design from [68]. It
is a positive-edge-triggered D-type flip-flop with an inverted clear line, as seen in Figure
6-13. The layout for this register can be seen in Figure 6-14, with dimensions of 8.80 x
9.07 µm for the height and width respectively. The operation of this DFF is verified in
Figure 6-15, where two such register elements are abutted together. When CLK is at
0 V, gates 1 and 4 are active and gates 2 and 3 are disconnected. As a result the first stage
becomes disconnected from the second, while the second stage maintains the data at
DOUT. When CLK is at VDD (2.5 V), gates 1 and 4 are disconnected and the signal latched
in the first stage is transferred to DOUT. The interesting event occurs when CLK rises
from 0 V to VDD.
[Plot: Power (Max/Mean/Min) = (506.6498 / 30.7216 / 4.5411e-005) uW, instantaneous power at vddin; voltage traces versus Time (ns) over 0 to 15 ns]
Figure 6-12: Exclusive-OR Operation 
Here the signal at DIN is latched and passed out to DOUT via a 4-gate delay. If two
elements are cascaded then, due to the delay of 4 gates, the previous DOUT signal of
register element 1 is latched into element 2 before this DOUT changes. When a
downward transition of CLK occurs, stage 2 of the element latches the signal at C
and stage 1 is set up to DIN. This process results in the generation of a single clock
cycle delay. The main advantage of this register is a reduction of at least 10
transistors when compared to a standard DFF. This register only requires 20
transistors, which can result in a saving of up to 3.8 million transistors in a QCIF array.
However, care must be taken to provide a "clean" clock signal, as this register
implementation only provides a small margin (2-5 ns) for clock skew between CLK and
its complement. Therefore a separate clock buffer is employed in the pixel to provide this
signal. Another matter that requires some consideration is the set-up of data prior to
clocking, which needs to lead the clock signal by approximately 2 ns.
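The single clock cycle delay obtained from two abutted registers can be illustrated with a cycle-level sketch. It models only the logical behaviour (first stage follows DIN while CLK is low, DOUT updates on the rising edge); gate delays, skew and set-up margins are not modelled, and the class and function names are hypothetical.

```python
class EdgeTriggeredDFF:
    """Two-stage register: stage 1 follows DIN while CLK is low,
    stage 2 takes the stage-1 value on the rising edge of CLK."""

    def __init__(self):
        self.stage1 = 0    # value held by the first stage
        self.dout = 0      # value held by the second stage
        self.prev_clk = 0  # previous CLK level, used to detect the rising edge

    def step(self, clk, din, clear_bar=1):
        if clear_bar == 0:
            # Active-low clear forces both stages to 0.
            self.stage1 = self.dout = 0
        elif clk == 1 and self.prev_clk == 0:
            # Rising edge: the value latched in stage 1 appears at DOUT.
            self.dout = self.stage1
        elif clk == 0:
            # CLK low: stage 1 is set up to DIN while stage 2 holds DOUT.
            self.stage1 = din
        self.prev_clk = clk
        return self.dout


def shift_through_two(bits):
    """Clock a bit sequence through two cascaded registers."""
    first, second = EdgeTriggeredDFF(), EdgeTriggeredDFF()
    q1, q2 = [], []
    for bit in bits:
        for clk in (0, 1):  # one low phase followed by one high phase
            first_before = first.dout  # DOUT of element 1 before it changes
            first.step(clk, bit)
            second.step(clk, first_before)
        q1.append(first.dout)
        q2.append(second.dout)
    return q1, q2
```

Feeding a short bit pattern through `shift_through_two` shows the second register reproducing the first register's output one clock cycle later.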
[Figure: schematic with terminals DIN, Clear, DOUT, CLK and the complement of CLK]
Figure 6-13: DFF Schematic
[Figure: layout, with the CLR_BAR line labelled]
Figure 6-14: DFF Layout 
[Plot: Power (Max/Mean/Min) = (660.37 / 0.49378 / 2.7411e-005) uW, instantaneous power at vddin; voltage traces versus Time (us) over 0 to 15 us]
Figure 6-15: DFF Operation
A typical multiplexer schematic and associated layout is shown in Figure 6-16. 
[Figure panels: (a) schematic and (b) layout, with terminals Sel, Out and VDD]
Figure 6-16: Multiplexer Schematic & Layout
The design uses two transmission gates connected inversely to allow either input A
or input B to be switched to the output based on the signal at Sel. The layout for this
design fills an area 6.92 µm wide by 4.00 µm high, and its operation can be verified from
Figure 6-17. At 4 ns the multiplexer switches from in1 to in2.
[Plot: Power (Max/Mean/Min) = (113.1788 / 4.0093 / 1.5308e-005) uW, instantaneous power at vddin; voltage traces, including in1, versus Time (ns) over 0 to 10 ns]
Figure 6-17: Multiplexer Operation 
When in information extraction or feed-in mode, the ZTE and wavelet components
both rely on a 4-input multiplexer to be switched so as to be connected to the next
adjacent pixel. Such a four-input multiplexer simply combines three 2-input
multiplexers in a two-stage circuit. The first stage typically employs two 2-input
multiplexers to select two outputs from four inputs. The second stage then selects
between these two outputs. The functionality of such a device has also been verified,
although it is not presented here.
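The two-stage construction of the 4-input multiplexer can be sketched at the logic level as follows. The function names, the ordering of the inputs and the convention that a high select line picks the second input are assumptions made for illustration; the text does not specify them.

```python
def mux2(a, b, sel):
    """2-input multiplexer: passes a when sel is 0, b when sel is 1 (assumed polarity)."""
    return b if sel else a


def mux4(inputs, sel0, sel1):
    """4-input multiplexer built from three 2-input multiplexers in two stages."""
    # Stage 1: two multiplexers reduce four inputs to two candidates.
    first = mux2(inputs[0], inputs[1], sel0)
    second = mux2(inputs[2], inputs[3], sel0)
    # Stage 2: a third multiplexer selects between the two candidates.
    return mux2(first, second, sel1)


# Each select combination routes exactly one of the four inputs to the output.
for sel1 in (0, 1):
    for sel0 in (0, 1):
        print(sel1, sel0, mux4(["n0", "n1", "n2", "n3"], sel0, sel1))
```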
6.5.2. Buffer Requirements 
The IP100P employs several buffers located on the edge of the pixel array to drive the
control signal busses in a row-based manner. The ZTE component requires the use of
six of these control lines and hence six related buffers. Buffer design typically hinges on
the selection of suitable values for both the number of stages N and the area of the final
drive transistor A0, typically in comparison to the area of a minimum size gate [71]. The
values for A0 are typically governed by the ratio CL/CG, where CL is the load
capacitance and CG is the capacitance of a minimum size gate. A minimum size gate,
as calculated according to [69] [71], results in a CG of 2.6961 x 10^-15 farads. Also, the
minimum inverter delay calculated from the ring oscillator approximates to 0.111 ns.
Therefore, given an optimal stage ratio A of e (approximately 2.718) [71], Table 6-2 lists
the number of stages N required to support this optimal stage ratio, and also the relevant
minimal intrinsic delay incurred by a buffer designed with these characteristics. Since N is
inversely proportional to ln(A) (Eq. 6.1), the number of stages N approaches values that
are practically difficult to realise, especially when desiring to minimise chip area.
Therefore Table 6-3 presents a set of stage ratios (A) and stage numbers (N) which
introduces a trade-off between the delay time and ease of design (i.e. buffers that
contain 3 or 4 stages with ratios of 4). These buffer drivers are used in the IP100P
prototype and, although the delay is clearly increased, it still remains
within acceptable tolerances for clocking at 100 kHz.
Table 6-2: Ideal Stage Ratio of e
Signal    Per-Pixel  Pixels   Array Total  Lumped  Total Array  Drv Cap     Opt Num      Buffer Del   Intrinsic
          Cap (F)    Per Row  Cap (F)      Fan In  Load CL (F)  Size Ratio  of Stages N  Ratio        Delay (ns)
CLK       9.46E-15   32       3.02628E-13  2       4.75178E-13  176.25      5            14.06        [illegible]
CFZTR     5.00E-14   32       1.60045E-12  5       2.03182E-12  753.62      7            18.01        [illegible]
CSYMIO    7.79E-14   32       2.49367E-12  5       2.92504E-12  1084.92     7            19.00        [illegible]
CLOADR1   4.02E-14   32       1.28481E-12  4       1.62991E-12  604.54      6            [illegible]  1.934
HENABLE   2.44E-14   32       7.8232E-13   2       9.54870E-13  354.17      6            15.96        1.773
VENABLE   2.44E-14   32       7.7968E-13   2       9.52230E-13  353.19      6            15.95        [illegible]
N = \frac{\ln(C_L / C_G)}{\ln(A)}    (Eq. 6.1)
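As a worked illustration of Eq. 6.1, the sketch below evaluates the ideal stage count for load values quoted in Table 6-2, and the intrinsic delay of a buffer as N x A x td, the form used later in this section for the 2-stage buffer. The function names are hypothetical, and rounding the ideal stage count to the nearest integer is an assumption that happens to agree with the legible table entries.

```python
import math

C_G = 2.6961e-15  # capacitance of a minimum size gate in farads (value from the text)
T_D = 0.111       # minimum inverter delay in ns (ring-oscillator value from the text)


def ideal_stage_count(c_load, stage_ratio=math.e, c_gate=C_G):
    """Eq. 6.1: N = ln(CL / CG) / ln(A), returned as a real number."""
    return math.log(c_load / c_gate) / math.log(stage_ratio)


def intrinsic_delay_ns(stages, stage_ratio, t_d=T_D):
    """Intrinsic buffer delay taken as N * A * td."""
    return stages * stage_ratio * t_d


# Total array loads for CLK and CSYMIO as listed in Table 6-2.
for name, c_load in (("CLK", 4.75178e-13), ("CSYMIO", 2.92504e-12)):
    n = ideal_stage_count(c_load)
    print(name, round(n, 2), round(n))

# Selected design points with a stage ratio of 4.
for stages in (2, 3, 4):
    print(stages, round(intrinsic_delay_ns(stages, 4), 3))
```

With the rounded td of 0.111 ns the delays come out marginally below the figures quoted in the text, which appear to have been computed with an unrounded td.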
Table 6-3: Selected Stage Ratio
[Column headings and signal names lost in extraction; one row retains the load ratio 1084.92]
           4    4    16    1.778    256
1084.92    3    4    12    1.333    64
           4    4    16    1.778    256
           3    4    12    1.333    64
           3    4    12    1.333    64
Therefore, two buffers can be designed: one with four stages and the other with three
stages (inverting). Figure 6-18 and Figure 6-19 show the layouts for the 3-stage and 4-stage
buffers respectively.
Figure 6-18: 3-Stage Buffer Layout
The 3-stage buffer is 10.09 x 5.520 µm in size, while the 4-stage buffer is 9.98 x 12.7 µm,
for the height and width respectively.
Since each buffer drives a row of 32 pixels and since there are 32 rows, a further
buffering stage is required to drive this stage from the pad signal. In the case of the
three-stage buffer, another 3-stage buffer is used to drive the 32 secondary buffers,
therefore providing a non-inverting signal to the pixels. The rows with 4-stage buffers,
on the other hand, are driven by a third, 2-stage buffer. This 2-stage buffer, whose layout
is shown in Figure 6-20, has an intrinsic delay of (2)(4)(td) = 0.889 ns and a total delay of
0.889(2) = 1.778 ns. It is 9.91 µm high and 3.750 µm wide.
Figure 6-19: 4-Stage Buffer
Figure 6-20: 2-Stage Buffer
Figure 6-21 illustrates the technique implemented when connecting the pad to the 
internal buffers. Here a 2-stage buffer is connected via a bus to 32 4-stage buffers for 
the control signal CLOADR1R2R4R5. 
Figure 6-21: Pad to Buffer Connections 
Figure 6-22 and Figure 6-23 show results from simulations conducted on the 4-stage
and 3-stage buffers, each driving an approximate load of 1100 transistors. The "cleaner"
operation of the 4-stage buffer is clearly observed. Even though the 2-stage buffer's
performance is significantly worse than that of the 4-stage buffer, the 2-stage buffer is
sufficient to drive most control lines that are only altered at microsecond boundaries.
Hence the significant size reduction that the 2-stage buffer presents allows for
better space management.
[Plot: instantaneous power at vddin and voltage traces versus Time (ns) over 0 to 8 ns]
Figure 6-22: 4-Stage Buffer Operation 
[Plot: Power (Max/Mean/Min) = (4960.6 / 451.0967 / 7.9897e-005) uW, instantaneous power at vddin; voltage traces, including in1, versus Time (ns) over 0 to 12 ns]
Figure 6-23: 3-Stage Buffer Operation
6.6. ZTE Post Layout Simulations 
This section details the simulations performed to verify the correct operational
functionality of the ZTE component in the IP100P. Due to the difficulties associated with
performing simulations on the entire array (~900,000+ transistors) without the use of
appropriate tools such as Star-Hspice, the operational functionality of a single pixel is
simulated. The simulations are organised into two sections: the ZID part, which simulates
an eight-bit register combined with a significance identification module, and a ZTE part,
which performs all of the other ZTE symbol generation and extraction processes. Control
and status signals attached to each of these components are presented in Table 6-4 and
Table 6-5. The layout for the ZTE is shown in Figure 6-24. It comprises 170
transistors and occupies an area of 32 x 22 µm + 18 x 9 µm = 866 µm². The area without
circuitry is reserved for routing via Metal-2 (Metal-2 Free Zone).
Table 6-4: ZID Module Signals
Signal         Direction  Description
Ina            IN         Register LSB Value
Inb            IN         Register Bit1 Value
GENABLE        IN         Clear Signal
ZEROID (out)   OUT        Zero NOT Detected Signal
MID            NA         Signal From XOR
Rin            IN         Data in Register R0
Table 6-5: ZTE Module Signals
Signal       Direction  Description
CLK          IN         General Clock In Signal
CSYMIO       IN         Generate ZTE Symbols Control Signal
CFZTR        IN         Form Zerotree Control Signal
CLR1R2R4R5   IN         Shift & Load Registers Control Signal
HENABLE      IN         Horizontal Pixel Enable Signal
VENABLE      IN         Vertical Pixel Enable Signal
ZEROID       IN         Zero NOT Detected Signal
DATAIN       IN         Incoming Data From Adjacent Pixel
WCDATAOUT    IN         Incoming Wavelet Coefficient
GENABLE      OUT        Pixel Disabled Signal & Clear
DATAOUT      OUT        Data For Next Adjacent Pixel
SSIN         IN         Sibling Significance In Signal
CSIN         IN         Child Significance In Signal
PSIN         IN         Parent Significance In Signal
SSOUT        OUT        Sibling Significance Out Signal
PSOUT        OUT        Parent Significance Out Signal
[Figure: layout with labelled regions Registers, Decode and Encode]
Figure 6-24: ZTE Coding Full Layout Circuitry
6.6.1. Coefficient Significance Detection
During the ZTE encode stage, the initial action performed is the generation of a ZID
signal from each wavelet coefficient. This is accomplished via the ZID circuitry; see
Figure 6-25 for the layout. This circuit, when connected to the two least significant
bits of a shifting register, generates a high signal if any one or more of the bits differ
from the rest (i.e. 00000000 and 11111111 are considered zero). The two different
conditions are simulated in Figure 6-26 and Figure 6-27, using this circuit and an 8-bit
register.
Figure 6-25: Layout for Zero Identification
[Plot: Power (Max/Mean/Min) = (1462.715 / 1.8123 / 0.069681) uW, instantaneous power at vddin; voltage traces rin, out, mid, inb, ina, genable and clk versus Time (us) over 0 to 20 us]
Figure 6-26: ZID Non Zero Detection 
[Plot: Power (Max/Mean/Min) = (1462.29 / 1.9588 / 0.0012518) uW, instantaneous power at vddin; voltage traces out, mid, inb, ina, genable and clk versus Time (us) over 0 to 20 us]
Figure 6-27: ZID Non High Detection 
The ZEROID (out on graph) signal is cleared by switching the GENABLE signal low. The 
GENABLE signal must be cleared for each significance identification cycle. 
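The significance identification described in this section can be sketched at the bit level. The model below XORs the two least significant bits (Ina and Inb in Table 6-4) as the register shifts and latches any difference into ZEROID. The shift direction, the zero fill and the function name are assumptions made for illustration; starting with ZEROID at 0 stands in for clearing it with GENABLE before each cycle.

```python
def zid_detect(bits):
    """Return 1 if the 8 register bits are not all identical, else 0.

    bits is a list of 0/1 values, most significant bit first.
    """
    register = list(bits)
    zeroid = 0  # cleared at the start of each significance identification cycle
    for _ in range(len(register) - 1):
        ina, inb = register[-1], register[-2]  # LSB and bit 1 of the register
        mid = ina ^ inb                        # XOR output (MID)
        zeroid |= mid                          # latch any detected difference
        register = [0] + register[:-1]         # shift towards the LSB (fill is arbitrary)
    return zeroid


print(zid_detect([0, 0, 0, 0, 0, 0, 0, 0]))
print(zid_detect([1, 1, 1, 1, 1, 1, 1, 1]))
print(zid_detect([0, 0, 0, 1, 0, 1, 1, 0]))
```

Comparing adjacent bits over seven shifts is sufficient because all eight bits are equal exactly when every adjacent pair is equal.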
6.6.2. Pixel Pass-through Mode 
In pass-through mode the data arriving at DATAIN should be replicated at DATAOUT.
This mode is activated by disabling the entire pixel via the control signals VENABLE and
HENABLE. Figure 6-28, which simulates the pixel in pass-thru mode, shows
that DATAIN is copied directly to DATAOUT. In this mode the generated GENABLE
signal disables the entire pixel (ZTE + Wavelet) and resets all relevant registers.
[Plot: Power (Max/Mean/Min) = (122.1095 / 0.13062 / 0.00036972) uW, instantaneous power at vddin; voltage traces, including venable, versus Time (us) over 0 to 9 us]
Figure 6-28: Pixel Pass-thru Mode 
6.6.3. Wavelet Coefficient Output 
This mode is used to extract coefficients from the wavelet component to a
neighbouring pixel. When the control signal CSYMIO is held low and the pixel is
activated via GENABLE, the contents arriving at WCDATAOUT are routed to DATAOUT.
Figure 6-29 shows this process, and also highlights the activation of GENABLE only
when both HENABLE and VENABLE are set high.
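The routing of DATAOUT in pass-through and wavelet coefficient output modes can be summarised with a small logic-level sketch. The function names and the `symbol_out` argument (standing in for whatever the symbol registers present when CSYMIO is high) are hypothetical; only the two modes described above are taken from the text.

```python
def genable(henable, venable):
    """GENABLE is active only when both HENABLE and VENABLE are high."""
    return henable & venable


def pixel_dataout(henable, venable, csymio, datain, wcdataout, symbol_out=0):
    """Select what a pixel drives onto DATAOUT (logic-level sketch)."""
    if not genable(henable, venable):
        # Pixel disabled: pass-through mode, DATAIN is copied to DATAOUT.
        return datain
    if not csymio:
        # Pixel enabled with CSYMIO low: the wavelet coefficient bit is output.
        return wcdataout
    # Pixel enabled with CSYMIO high: symbol shifting (placeholder value only).
    return symbol_out


print(pixel_dataout(0, 1, 0, datain=1, wcdataout=0))  # pass-through
print(pixel_dataout(1, 1, 0, datain=1, wcdataout=0))  # coefficient output
```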
[Plot: Power (Max/Mean/Min) = (627.0425 / 0.21034 / 0.00037006) uW, instantaneous power at vddin; voltage traces, including dataout, versus Time (us) over 0 to 9 us]
Figure 6-29: WCDATAOUT Mode
6.6.4. DNT Symbol Generation
As with all symbol generation, the DNT symbol is based on a two-part process
(Symbols & Coefficients) (Section 5.4.3). This two-part process is verified in the
following sections for each operation mode. Since the decode mode and the coefficient
encode part follow a similar control timing scheme, they are grouped together in the
symbol decode sections following the current symbol generation sections. During the
encode process, when the ZTE status signals assume the values in Table 6-6, a two-bit
Do Not Transmit (DNT) symbol (2.5, 2.5 V) is generated and latched into R4 and R5.
The timing and simulation for this state is presented in Figure 6-30. The signal
CLR1R2R4R5 initially routes the symbols generated from the ZTE status signals to R4
and R5, which latch this data when the clock enabling signal CSYMIO is set high.
When CSYMIO becomes active again, the symbols are shifted out while symbols from
the adjacent pixel are received at DATAIN. This procedure is applied for the
generation of all other symbols.
Table 6-6: DNT Symbol Status Voltages
Signal    Value (V)
SSIN      0
CSIN      0
PSIN      0
ZEROID    0
[Plot: Power (Max/Mean/Min) = (967.3675 / 1.2195 / 1.3036e-005) uW, instantaneous power at vddin; voltage traces versus Time (us) over 0 to 9 us]
Figure 6-30: DNT Detection, Latch & Shift Out 
6.6.5. ZTR Symbol Generation 
The ZeroTree Root (ZTR) symbol, signal level (0, 2.5 V), results when ZTE status
signals corresponding to those of Table 6-7 are detected. The timing and simulation for
the generation, latching and extraction are presented in Figure 6-31.
Table 6-7: ZTR Symbol Status Voltages
Signal    Value (V)
SSIN      0 or 2.5
CSIN      0
PSIN      2.5
ZEROID    0
[Plot: Power (Max/Mean/Min) = (992.9 / 1.0893 / 0.0003029) uW, instantaneous power at vddin; voltage traces versus Time (us) over 0 to 9 us]
Figure 6-31: ZTR Detection, Latch & Shift Out 
6.6.6. VZT Symbol Generation 
The Valued Zero Tree root (VZT) symbol, with signal level (0, 2.5 V), is generated when
the ZTE status signals resemble those of Table 6-8. The timing and simulation for the
detection, latch and extraction are seen in Figure 6-32.
Table 6-8: VZT Status Signals
Signal    Value (V)
SSIN      0 or 2.5
CSIN      0
PSIN      2.5
ZEROID    2.5
[Plot: Power (Max/Mean/Min) = (906.51 / 0.60392 / 3.9911e-005) uW, instantaneous power at vddin; voltage traces versus Time (us) over 0 to 9 us]
Figure 6-32: VZT Detection, Latch & Shift Out 
6.6.7. VAL Symbol Generation 
This symbol, the VALue (VAL) symbol with signal levels (0, 0 V), is generated when the
ZTE status signals equate to those in Table 6-9. Timing and simulation results for the
generation of this symbol are presented in Figure 6-33.
Table 6-9: VAL Status Signals
Signal    Value (V)
SSIN      0 or 2.5
CSIN      2.5
PSIN      2.5
ZEROID    0 or 2.5
[Plot: Power (Max/Mean/Min) = (670.2075 / 0.42482 / 0.00012635) uW, instantaneous power at vddin; voltage traces versus Time (us) over 0 to 9 us]
Figure 6-33: VAL Detection, Latch & Shift Out 
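The status conditions in Table 6-6 to Table 6-9 can be collected into a single lookup. The sketch below treats 2.5 V as logic 1 and 0 V as logic 0 and returns only the symbol name; the function name is hypothetical, and returning None for combinations that the tables do not list is an assumption, since the text does not describe those cases.

```python
def encode_symbol(ssin, csin, psin, zeroid):
    """Map ZTE status signals (0 or 1) to a symbol name per Tables 6-6 to 6-9."""
    if psin and csin:
        # Table 6-9: SSIN and ZEROID may take either value.
        return "VAL"
    if psin and not csin:
        # Tables 6-7 and 6-8 differ only in ZEROID; SSIN may take either value.
        return "VZT" if zeroid else "ZTR"
    if not (ssin or csin or psin or zeroid):
        # Table 6-6: all four status signals at 0 V.
        return "DNT"
    return None  # combination not covered by the tables


for status in ((0, 0, 0, 0), (1, 0, 1, 0), (0, 0, 1, 1), (0, 1, 1, 0)):
    print(status, encode_symbol(*status))
```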
6.6.8. DNT Decode Operation
In the decode phase a DNT-valued pixel is identified by its immediate parent. Here, if
the PSIN signal is at 2.5 V, the pixel is disabled and enters pass-thru mode, in which
it never expects a symbol or a wavelet coefficient. In encode mode, once the DNT
symbol has been extracted, the pixel enters this pass-thru mode to avoid transmitting
the wavelet coefficient. The control signalling for this mode of operation requires that,
initially, all pixels be set to the symbol VAL (0 V, 0 V), which is achieved by disabling
the GENABLE signal via HENABLE and VENABLE. Following this, if the CFZTR signal
is held high while the PSIN signal remains high, the pixel will enter pass-thru mode.
The operational simulation corresponding to this mode is displayed in Figure 6-34.
The DNT decode mode is unique in that it relies solely on its parent to supply
activation information, hence not requiring a symbol itself.
[Plot: Power (Max/Mean/Min) = (853.7875 / 0.51517 / 0.00011367) uW, instantaneous power at vddin; voltage traces versus Time (us) over 0 to 9 us]
Figure 6-34: DNT Decode Mode
6.6.9. ZTR Decode Operation
If a pixel's parent is significant (PSIN = 0 V) then it should expect a symbol. In the case
where this symbol is a ZTR (0 V, 2.5 V), it implies that a wavelet coefficient does not
follow; hence the pixel deactivates itself (GENABLE is set low). The operation of this
mode is presented in Figure 6-35.
[Plot: Power (Max/Mean/Min) = (800.235 / 1.799 / 0.00037006) uW, instantaneous power at vddin; voltage traces, including dataout and clk, versus Time (us) over 0 to 9 us]
Figure 6-35: ZTR Decode Mode 
6.6.10. VZT & VAL Decode Operation
Decode of a VZT or VAL symbol is activated only when a pixel's parent is deemed
significant (PSIN = 0 V). Both VZT and VAL decode require a symbol and a coefficient
to be captured from the stream; therefore the GENABLE signal must remain high after
the symbols are captured. When GENABLE returns to active, any coefficients passed
to this pixel will propagate to the wavelet component. Figure 6-36 and Figure 6-37
illustrate the decode operation for a VZT symbol and a VAL symbol respectively.
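The decode behaviour described in Sections 6.6.8 to 6.6.10 can be summarised as a small decision function. This is a behavioural sketch only: the function name and the returned field names are hypothetical, and the PSIN polarity follows the decode-mode description (PSIN high means the pixel is disabled, PSIN at 0 V means the parent is significant).

```python
from typing import Optional


def decode_action(psin_high: bool, symbol: Optional[str] = None) -> dict:
    """Describe what a pixel does in decode mode for a given PSIN level and symbol."""
    if psin_high:
        # DNT decode: the parent alone disables the pixel; no symbol is expected.
        return {"pass_through": True, "expects_symbol": False,
                "expects_coefficient": False, "genable": 0}
    if symbol == "ZTR":
        # A symbol is captured but no coefficient follows; the pixel deactivates.
        return {"pass_through": False, "expects_symbol": True,
                "expects_coefficient": False, "genable": 0}
    if symbol in ("VZT", "VAL"):
        # A symbol and a coefficient are captured; GENABLE stays high.
        return {"pass_through": False, "expects_symbol": True,
                "expects_coefficient": True, "genable": 1}
    raise ValueError("a ZTR, VZT or VAL symbol is required when PSIN is low")


print(decode_action(True))
print(decode_action(False, "ZTR"))
print(decode_action(False, "VAL"))
```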
[Plot: Power (Max/Mean/Min) = (942.25 / 1.7845 / 0.00037006) uW, instantaneous power at vddin; voltage traces versus Time (us) over 0 to 9 us]
Figure 6-36: VZT Decode 
[Plot: Power (Max/Mean/Min) = (670.2125 / 0.54001 / 0.00011523) uW, instantaneous power at vddin; voltage traces versus Time (us) over 0 to 9 us]
Figure 6-37: VAL Decode 
6.7. Pixel Components (Layout)
The full pixel layout identifying the relevant sections is presented in Figure 6-38. This 
layout allows for the abutting of pixels in either direction. VDD is abutted horizontally 
while GND is abutted vertically. 
[Layout figure. Labelled regions: 10µm Metal 2 Free Zone (Metal 2 Routes), Wavelet Register (R0),
ZTE Component, 5µm Routing Space, Control & ZTE, ZTE Routes, Inter-pixel]
Figure 6-38: Full Pixel Layout
6.8. NBLOCK Layout
The NBLOCK is the smallest replicable component within the IP Array, which is capable 
of performing a 3-scale image encode / decode in this case. The inter-pixel ZTE status 
signal routing for each pixel within the NBLOCK is unique, yet identical to the same pixel 
in another NBLOCK. The full layout constructed with abutting pixels for the NBLOCK is 
shown in Figure 6-39. The corresponding routing schematic, indicating the Metal-2 paths
used for vertical ZTE status line routing, is shown in Figure 6-40.
Figure 6-39: 8x8 NBLOCK Layout 
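The relationship between the number of wavelet scales and the NBLOCK size can be sketched as follows. This is a minimal illustration in Python, assuming that the NBLOCK side is 2 raised to the number of scales (which matches the 8 x 8 block for 3 scales stated above) and using the 32 x 32 resolution of the prototype array; the function names are hypothetical and do not come from the thesis.

```python
def nblock_side(scales: int) -> int:
    """Side length of the smallest replicable block (assumed to be 2**scales)."""
    return 2 ** scales


def nblocks_in_array(width: int, height: int, scales: int) -> int:
    """Number of whole NBLOCKs that tile a width x height pixel array."""
    side = nblock_side(scales)
    if width % side or height % side:
        raise ValueError("array dimensions must be multiples of the NBLOCK side")
    return (width // side) * (height // side)


side = nblock_side(3)                   # 3-scale encode / decode
count = nblocks_in_array(32, 32, 3)     # 32 x 32 prototype array
print(side, count)
```

Under these assumptions a 32 x 32 array is tiled by a 4 x 4 grid of identical NBLOCKs, which is why the routing pattern only has to be designed once per block.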
[Routing schematic: 8 x 8 grid of pixels labelled 1-1 to 8-8, with the ZTE status signal routes drawn between them]
Figure 6-40: ZTE NBLOCK Status Signal Routes 
In addition, to illustrate the routing mechanism used for the first block of four
pixels in an NBLOCK, Figure 6-41, Figure 6-42, Figure 6-43 and Figure 6-44 are
presented. The blue (Metal 2) lines correspond to the vertical routing paths, while
the brown (Metal 3) lines correspond to the horizontal routing paths. Connections between
the two are by means of vias. These figures illustrate the unique pixel-wise ZTE status
signal connections for pixels (1,1), (1,2), (2,1) and (2,2) in an 8 x 8 NBLOCK.
Figure 6-41: Pixel 1,1 ZTE Routes 
Figure 6-42: Pixel 1,2 ZTE Routes 
Figure 6-43: Pixel 2,1 ZTE Routes 
Figure 6-44: Pixel 2,2 ZTE Routes 
A number of pixels, in the context of an NBLOCK, require that special connections be
made to VDD or GND. For instance, the lowest scale parent (pixel (1,1)) requires that its
PSIN signal be connected to VDD for encode and to GND for decode. This is
accomplished by isolating the two sections and connecting the two areas to the relevant fixed
statuses, for each such pixel in the NBLOCK. In addition, pixels at the highest scales do
not connect to the CIN status route: because they have no descendants, their internal CIN
is connected to GND. This pixel-wide route line is then used for other ZTE status signals
for other pixels. Figure 6-45 shows a pixel that has its CIN Route Line connected to the
internal circuitry, while Figure 6-46 shows a highest scale pixel that has its internal CIN
connected to ground, freeing the CIN Route Line for other pixel status routes.
[Layout detail. Labels: No GND Connection; CIN Route Line; CIN Route Connected to Circuit]
Figure 6-45: ZTE CIN Connected 
[Layout detail. Labels: CIN Route Line; Circuit CIN Connected to GND; CIN Route Not Connected to Circuit]
Figure 6-46: ZTE CIN Not Connected 
6.9. IP100P Full Array
The layout and micrograph of the IP100P prototype are presented in Figure 6-47 and
Figure 6-48 respectively. After fabrication and bonding via Europractice, the chips were
returned in April 2001. Operational verification of the chip is still in progress [70].
Figure 6-47: IP100P Array Layout
Figure 6-48: IP100P Micrograph
6.9.1. IP100P Array General Specifications
Dimensions: 3.446 x 3.446 mm² (unpackaged) and 3.1 x 3.1 cm² (packaged)
Active Pin-Count: Gt Pins
Resolution: 32 x 32 pixels
Power Requirements: VDD (2.5V), VCC (3.3V) & GND
Operational Frequency: 100 - 200 kHz
6.9.2. IP100P Simulated Array Power Consumption
Simulations of each pixel mode have resulted in the following power extremes, defined
per pixel per mode.
Max Peak Power: 1462.29µW (ZID One Detection Phase)
Max Average Power: 1.799µW (ZTR Symbol Decode)
Min Idle Power: 1.3036 x 10^-5 µW (DNT Detect, Latch & Shift Out)
Given these figures, for a 32 by 32 array in the worst case, when every pixel is searching for a
non-significant coefficient value, the peak power requirement can reach a maximum of
32 x 32 x 1462.29µW = 1498mW, but only for periods of (1-2)ns.
The worst case maximum average power for a 32 x 32 array is 32 x 32 x 1.799µW =
1.843mW, which may be held for periods of 10µs if the entire array is decoding ZTR
symbols at once. However, the maximum number of pixels that can perform this
operation at one time, due to subband division, is 3 x 16 x 16 (High frequency
Subbands) = 768. Therefore the maximum average power is reduced to 1.38mW.
The minimum idle power dissipation for a 32 x 32 array is 32 x 32 x 1.3036 x 10^-5 µW =
0.0133µW.
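The array totals above are simple products of the per-pixel figures, and can be rechecked with a few lines of Python. This is only a sketch: the constant names are hypothetical, and the idle-power exponent is taken as 10^-5, an assumption inferred from the quoted array total of 0.0133µW. The products land within about 0.1% of the quoted totals; the small differences presumably come from rounding of the per-pixel values.

```python
# Per-pixel figures quoted in Section 6.9.2 (microwatts).
PEAK_UW = 1462.29      # max peak power per pixel
AVG_UW = 1.799         # max average power per pixel (ZTR symbol decode)
IDLE_UW = 1.3036e-5    # min idle power per pixel (exponent assumed, see lead-in)

PIXELS = 32 * 32           # full prototype array
HF_PIXELS = 3 * 16 * 16    # pixels in the three high-frequency subbands

peak_mw = PIXELS * PEAK_UW / 1000.0       # worst-case peak, milliwatts
avg_all_mw = PIXELS * AVG_UW / 1000.0     # every pixel decoding ZTR at once
avg_hf_mw = HF_PIXELS * AVG_UW / 1000.0   # limited by subband division
idle_uw = PIXELS * IDLE_UW                # whole-array idle, microwatts

print(f"peak          : {peak_mw:.1f} mW")
print(f"average (all) : {avg_all_mw:.3f} mW")
print(f"average (HF)  : {avg_hf_mw:.2f} mW")
print(f"idle          : {idle_uw:.4f} uW")
```

The peak figure is roughly 1.5 W but lasts only nanoseconds, so the average values are the more relevant ones for supply and thermal budgeting.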
6.9.3. IP100P Prototype Testing
Once the IP100P prototypes had been fabricated and returned, appropriate test-beds
were constructed to test the functionality of the device. The tests are currently being
conducted by another group member progressing towards his research topic [70].
Initially, due to a voltage conversion issue with the test rig, the device exhibited severe
data corruption when more than one pixel was accessed. After further assessment, the test rig
was deemed unsuitable for testing the IP100P chip and the testing was
postponed until the installation of the new Agilent-HP tester was completed. Currently
the chip's full functionality is being verified via the new tester. Preliminary results
indicate the full operation of the device. This work is fully documented in [70].
6.10. Conclusion 
This chapter has demonstrated the feasibility of realising the novel parallel ZTE compression
algorithm described in Chapter 4 in 0.25 micron UMC technology. Given the current
maximum die-size coupled with the size of a QCIF array, this chapter has also shown an
implementation of the ZTE that is restricted to a pixel-pitch of 80 x 80µm. As a result, full
custom layout has been employed, with all primitive components verified for functionality
within the UMC technology. The operation of the ZTE component as a pixel-wise whole
has also been verified with the use of the Level 49 BSIM 3 HSpice transistor model supplied
by the foundry.
The significant advantage of this parallel implementation is that it performs the processing
needs of a 32 x 32 x 100,000 Hz = 102.4 MHz ZTE instruction processor with a maximum
average power of approximately 1.843mW. Furthermore, this implementation is easily
scaled to accommodate increased resolution (i.e. QCIF, CIF, etc.), finer technology
(0.13µm, 0.1µm, etc.), any form of wavelet that can be implemented (i.e. Daubechies, 2,6,
etc.) and any number of wavelet scales (4, 5, etc.).
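The equivalent serial instruction rate quoted above is the pixel count multiplied by the per-pixel clock. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the QCIF line assumes the standard 176 x 144 QCIF frame size together with the same 100 kHz clock, which is an extrapolation rather than a figure from this chapter.

```python
def equivalent_serial_rate_hz(width: int, height: int, pixel_clock_hz: int) -> int:
    """Instruction rate a single serial processor would need to match the array."""
    return width * height * pixel_clock_hz


prototype_hz = equivalent_serial_rate_hz(32, 32, 100_000)   # prototype array
qcif_hz = equivalent_serial_rate_hz(176, 144, 100_000)      # assumed QCIF size

print(prototype_hz / 1e6, "MHz")
print(qcif_hz / 1e9, "GHz")
```

The point of the comparison is that the array reaches this aggregate rate while every individual pixel processor runs at only 100 kHz.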
Chapter 7
CONCLUSIONS & FURTHER RESEARCH
"I think there is a world market for maybe five computers,"
Thomas Watson (1874-1956), Chairman of IBM, 1943
7.1. Contributions of this Thesis 
In the interest of producing a viable product for use in mobile multimedia
communications, particularly in video, this thesis has presented a novel massively parallel
architecture for video compression. Initially, general aspects relating to a video
compression system have been described, particularly the topics of image representation
and sampling, motion compensation, transform coding, coefficient coding and entropy
coding. This is followed by an in-depth analysis of the two main zerotree coding
algorithms, EZW and ZTE, and associated searching algorithms. Both algorithms were
shown to perform similarly and considered as ideal candidates for implementation in a
video compression device. The architecture for this video compression device, the IPA, is
then introduced. In order to minimise high-bandwidth component interfacing in a
developed end product as well as to minimise power usage, this thesis has presented
architectures for a unified image / video capture, processing and display device, the IPA.
This device exploits the advantages of massively parallel processing to reduce the clock
frequencies required for real-time processing. Design issues for such a 3D OPTO-VLSI
device have then been discussed, particularly in reference to the parallel implementation of
the wavelet and motion compensation architectures, the scale control mechanisms,
high-pass / low-pass pixel selection mechanisms and control elements. These architectures were
then modified to include necessary components to support the integration of both
aforementioned zerotree architectures.
Detailed descriptions of the massively parallel architectures developed for both the
Embedded Zerotree Wavelet (EZW) and the ZeroTree Entropy (ZTE) coding algorithms
have been provided. Solutions to a number of problems, including issues relating to
sequential to parallel processor conversion of the algorithms, highly parallel significance
searches, per-pixel based pixel self-classification and enablement, array load / unload
schemes, parallel to serial to parallel edge array conversion, and integration into the wavelet
transform architecture, have been given. Furthermore, a comparison between the two
zerotree architectures, in terms of coding efficiency and hardware complexity when relating
to an area limited VLSI device design, has been performed in order to select the most
appropriate for prototype implementation. The ZTE algorithm, due to its hardware and
control simplicity, was chosen and a full custom layout produced. Issues relating to power,
pads and control line distribution have been discussed in addition to the full definition of
the circuit in 0.25 micron UMC twin-tub technology. Simulation results for power
utilisation have then been provided.
7.2. Conclusion 
This thesis has presented the complete design of two novel massively parallel zerotree
architectures for implementation within the Intelligent Pixel Array paradigm. In particular
the ZTE has been chosen for full custom design testing in the IP100P prototype to prove
the feasibility of such a product. The advantage of the proposed systems stems from the
parallel nature of the design, which allows for the coding of 25 fps video sequences using a
clock frequency in the region of 100 kHz, consuming less than 35mW of power when
processing a QCIF image. Although the technology chosen for this implementation limits
the hardware complexity of the design somewhat, the architectures and principles conveyed
in this thesis represent an important stepping stone for not only future mobile multimedia
communications equipment, but also an interesting area of research.
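The frame-rate and power figures quoted above can be put side by side in a short Python sketch. It shows the clock cycles available per frame at 25 fps and 100 kHz, and one way a QCIF figure under 35mW could arise from the per-pixel maximum average power of 1.799µW given in Section 6.9.2. The QCIF size of 176 x 144 and the three-quarters high-frequency fraction (carried over from the 3 x 16 x 16 of 32 x 32 subband split) are assumptions made here for illustration; the thesis does not state how the 35mW bound was derived.

```python
CLOCK_HZ = 100_000          # clock frequency quoted in the text
FRAME_RATE = 25             # frames per second quoted in the text
AVG_UW_PER_PIXEL = 1.799    # max average power per pixel, Section 6.9.2

QCIF_PIXELS = 176 * 144     # standard QCIF frame size (assumed)
HF_FRACTION = 3 / 4         # share of pixels in high-frequency subbands (assumed)

cycles_per_frame = CLOCK_HZ // FRAME_RATE
all_pixels_mw = QCIF_PIXELS * AVG_UW_PER_PIXEL / 1000.0
hf_only_mw = QCIF_PIXELS * HF_FRACTION * AVG_UW_PER_PIXEL / 1000.0

print("clock cycles per frame:", cycles_per_frame)
print(f"all pixels active    : {all_pixels_mw:.1f} mW")
print(f"HF subbands only     : {hf_only_mw:.1f} mW")
```

Under these assumptions only the subband-limited case stays below 35mW, which suggests, but does not confirm, that the quoted bound takes the subband division into account.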
7.3. Future Research Opportunities 
This area of research presents a number of additional challenges that can form the basis of
future research topics. Some of these are listed below.
7.3.1. Colour Video Processing
This is a significant area of research spanning the regimes of capture, processing and
display components. Colour image processing in particular introduces a number of
parallel architectural challenges including RGB to YUV conversion, sub-sampling of
colour components, register reuse and device compaction. The fundamental difficulty
with colour processing relates to the available space on chip and, therefore, within a
pixel, as three sets of data (Y, U and V) have to be stored, processed and exchanged.
New smaller feature size VLSI process technology holds the key to this area.
7.3.2. Standards Development
At present the codec employs a proprietary streaming format for the exchange of video
information. However, for the product to be widely applicable either a new standard
has to be developed or the present stream has to be modified to suit an existing
standard such as MJPEG2000, which provides for the use of a wavelet-zerotree based
algorithm for video compression. This, although not a particularly exciting area of
research, is considered a must for successful product deployment in multimedia
communications.
7.3.3. Increased Resolution
In future versions, technology permitting, an opportunity exists to increase the pixel
capture, display and processing resolution to that of CIF, DVD or even HDTV. As
with the increased pixel complexity introduced with colour, this is heavily dependent
on the minimum feature size of the process technology at hand.
Although the suggestions made above are directly related to image and video coding,
the parallel processing array presents other novel opportunities for devices such as
optical switches, multiplexers, hologram production etc., which may be of great use in
tomorrow's world.
Bibliography 
[1] A. F. Blackwell (1996) - "Correction: A Picture is Worth 84.1 Words", Proc. of the
First ESP Student Workshop, pp. 15-22.
[2] ARIB (1999) - "Codec for Circuit Switched Multimedia Telephony Service;
General Description, 3G TS 26.110 V3.0.1 Technical Specification", 3rd Generation
Partnership Project.
[3] W. Pennebaker and J. Mitchell (1992) - "JPEG Still Image Compression
Standard", Van Nostrand Reinhold, New York.
[4] G. K. Wallace (1991) - "The JPEG Still Picture Compression Standard", Comm.
ACM Vol 34 No. 4, pp. 31-44.
[5] K. R. Rao (1998) - "Image and Video Coding Technology and Standards", Course
Notes, University of Western Australia.
[6] O. Egger, P. Fleury, T. Ebrahimi and M. Kunt (1999) - "High-Performance
Compression of Visual Information - A Tutorial Review - Part 1: Still Pictures",
Proceedings of the IEEE Vol 87 No. 6, pp. 976-1008.
[7] A. Said and W. A. Pearlman (1996) - "A New, Fast, and Efficient Image Codec
Based on Set Partitioning in Hierarchical Trees", IEEE Trans. on Circuits and
Systems for Video Technology, Vol 6, No 3, pp. 243-250.
[8] A. C. Hung (1993) - "PVRG-P64 Codec 1.1", Program Documentation, Stanford
University.
[9] D. Chen and A. C. Bovik (1990) - "Visual Pattern Image Coding", IEEE Trans.
on Communication Vol. 38 No. 12, pp. 2137-2145.
[10] M. C. Chen and A. N. Willson (1997) - "A Spatial and Temporal Motion Vector
Coding Algorithm for Low Bit-Rate Video Coding", Proc. of ICIP'97 Vol. 2, pp.
791-794.
[11] CCITT (1990) - "Recommendation H.261: Video Codec for Audiovisual
Services at P x 64 kbit/s", Line Transmission on Non-Telephone Signals, Geneva.
[12] MPEG-1 (1993) - "Coding of Moving Pictures and Associated Audio for Digital
Storage Media Up to About 1.5 Mbit/s", Tech. Rep., ISO/IEC IS 11172.
[13] ITU-T Rec. H.263 (1995) - "Video Coding for Low Bit-Rate Communications".
[14] MPEG-2 (1994) - "Generic Coding of Moving Pictures and Associated Audio",
Tech. Rep., ISO/IEC DIS 13818.
[15] C. Auyeung, J. Kosmach, M. Orchard, and T. Kalafatis (1992) - "Overlapped
Block Motion Estimation", Proc. of ICVCIP'92, pp. 561-572.
[16] M. H. Chan, Y. B. Yu, and A. G. Constantinides (1990) - "Variable Size Block
Matching Motion Compensation with Applications to Video Coding", IEE
Proceedings Vol 137 Pt. I No. 4, pp. 205-212.
[17] M. T. Orchard and G. J. Sullivan (1994) - "Overlapped Block Motion
Compensation: An Estimation-Theoretic Approach", IEEE Trans. on Image
Processing Vol 3 No. 5, pp. 693-699.
[18] N. Ahmed, T. Natarajan and K. R. Rao (1974) - "Discrete Cosine Transform",
IEEE Trans. on Computing, pp. 90-93.
[19] R. J. Clarke (1985) - "Transform Coding of Images", Academic Press, New York.
[20] M. Costa and K. Tong (1994) - "A Simplified Integer Cosine Transform and Its
Application in Image Compression", TDA Progress Report 42-119, pp. 129-139.
[21] Y. H. Chan and W. C. Siu (1994) - "Short Communication, An Approach to
Subband DCT Image Coding", Journal of Visual Communication and Image
Representation, pp. 95-106.
[22] J. M. Shapiro (1993) - "Embedded Image Coding Using Zerotrees of Wavelet
Coefficients", IEEE Trans. on Signal Processing Vol 41 No. 12, pp. 3445-3462.
[23] A. M. Rassau, G. Alagoda, D. Lucas, J. Austin-Crowe, K. Eshraghian (1999) -
"Massively Parallel Intelligent-Pixel Implementation of a Zerotree Entropy Video
Codec for Multimedia Communications", VLSI: Systems on a Chip, Kluwer
Academic Publishers, Portugal, 89-100.
[24] A. M. Rassau, G. Alagoda, K. Eshraghian (1999) - "Massively Parallel Wavelet
Based Video Codec For An Intelligent-Pixel Mobile Multimedia Communicator",
Fifth International Symposium on Signal Processing and its Applications, Queensland,
793-795.
[25] P. M. Bentley and J. T. E. McDonnell (1994) - "Wavelet Transforms: An
Introduction", Electronics and Communication Engineering Journal, pp. 175-194.
[26] A. Grossman and J. Morlet (1984) - "Decomposition of Hardy Functions Into
Square Integrable Wavelets of Constant Shape", SIAM Journal of Math. Vol 15, pp.
723-736.
[27] Y. Meyer (1989) - "Wavelets and Operators, Analysis at Urbana vol.1 Edited by
E. Berkson, N. T. Peck and J. Uhl", Lecture Notes Series, London Math. Society.
[28] I. Daubechies (1988) - "Orthonormal Bases of Compactly Supported Wavelets",
Commun. on Pure and Applied Mathematics Vol XLI, pp. 909-996.
[29] S. G. Mallat (1989) - "A Theory for Multi-resolution Signal Decomposition: The
Wavelet Representation", IEEE Trans. Pattern Analysis and Machine Intell. Vol 11,
pp. 674-693.
[30] A. Graps (1995) - "An Introduction to Wavelets", IEEE Computational Science and
Engineering Vol 2 No. 2.
[31] K. Ramchandran, M. Vetterli and C. Herley - "Wavelets, Subband Coding
and Best Bases", Proc. of IEEE Vol 84 No. 4.
[32] R. J. Clarke (1996) - "Digital Compression of Still Images and Video", Academic
Press Ltd., London.
[33] E. P. Simoncelli and E. H. Adelson (1990) - "Subband Coding: Chapter 4
Subband Transforms edited by J. Woods", Kluwer Academic Press, pp. 143-192.
[34] Z. Xiong, O. Guleryuz and M. T. Orchard (1996) - "A DCT-based Embedded
Image Coder", IEEE Signal Processing Letters Vol. 3 No. 11, pp. 289-290.
[35] O. Rioul and M. Vetterli (1991) - "Wavelets and Signal Processing", IEEE SP
Magazine, pp. 14-38.
[36] S. G. Mallat (1989) - "Multiresolution Approximations and Wavelet
Orthonormal Bases in L²(R)", Trans. of the American Math. Society Vol. 315 No. 1,
pp. 69-87.
[37] R. A. Gopinath and C. S. Burrus (1992) - "Wavelet Transforms and Filter
Banks", Wavelets: A Tutorial in Theory and Applications, C. K. Chui, ed., Academic
Press, San Diego.
[38] H. Caglar, Y. Liu and A. N. Akansu (1993) - "Optimal PR-QMF Design for
Subband Image Coding", Proc. of Journal of Visual Communication and Image
Representation Vol 4 No. 3, Academic Press Inc., pp. 242-253.
[39] G. Strang and T. Nguyen (1996) - "Wavelets and Filter Banks", Wellesley-
Cambridge Press.
[40] J. D. Villasenor, B. Belzer, and J. Liao (1995) - "Wavelet Filter Evaluation for
Image Compression", IEEE Trans. on Image Processing Vol. 4, No. 8, pp. 1053-
1060.
[41] J. Y. Tham, S. Ranganath and A. A. Kassim (1998) - "Highly Scalable Wavelet-
Based Video Codec for Very Low Bit-Rate Environment", IEEE Journal on
Selected Areas in Communication Vol 16 No. 1, pp. 12-27.
[42] D. Marpe and H. L. Cycon (1999) - "Very Low Bit-Rate Video Coding Using
Wavelet-Based Techniques", IEEE Trans. on Circuits and Systems for Video
Technology Vol. 9 No. 1, pp. 85-94.
[43] Y. H. Kim and J. Modestino (1992) - "Adaptive Entropy Coded Subband
Coding of Images", IEEE Trans. Image Processing Vol. 1, pp. 31-48.
[44] A. M. Rassau (1999) - "Massively Parallel Wavelet Based Video Codec for an
Intelligent-Pixel Mobile Multimedia Communicator", PhD. Thesis, Department of
Cybernetics, University of Reading, England.
[45] G. Knowles (1990) - "VLSI Architecture for the Discrete Wavelet Transform",
Electronics Letters Vol 26, pp. 1184-1185.
[46] D. A. Huffman (1952) - "A Method for the Construction of Minimum
Redundancy Codes", Proc. of the Institute of Radio Engineers Vol 40, pp. 1098-1101.
[47] F. Halsall (1995) - "Data Communications, Computer Networks and Open
Systems (Fourth Edition)", Addison-Wesley Publishing Company Inc., USA,
pp. 139-145.
[48] D. S. Hirschberg and D. A. Lelewer (1990) - "Efficient Decoding of Prefix
Codes", Comm. ACM Vol 33 No. 4, pp. 449-459.
[49] I. H. Witten, R. M. Neal and J. G. Cleary (1987) - "Arithmetic Coding for Data
Compression", Computing Practices; Communications of the ACM Vol 30 No. 6, pp.
520-540.
[50] C. E. Shannon (1948) - "A Mathematical Theory of Communication", Bell
Systems Tech. Journal Vol 27, pp. 379-423, 623-656.
[51] H. Man, F. Kossentini and M. T. J. Smith (1997) - "Robust EZW Image Coding
for Noisy Channels", Proc. of IEEE Signal Processing Letters Vol. 4 No. 8, pp. 227-
229.
[52] J. M. Zhong, C. H. Leung and Y. Y. Tang (2000) - "An Improved Embedded
Zerotree Wavelet Image Coding Method Based on Coefficient Partitioning Using
Morphological Operation", Journal of Pattern Recognition and Artificial Intelligence Vol
14 No. 6, pp. 795-807.
[53] E. Kang, T. Tanaka and S. Ko (1999) - "Improved Embedded Zerotree Wavelet
Coder", Electronics Letters Vol 35 No. 9, pp. 705-706.
[54] S. A. Martucci, I. Sodagar (1996) - "Zerotree Entropy Encoding of Wavelet
Coefficients for Very Low Bit-Rate Video", Proc. of the IEEE Int. Conf. Image
Processing, Switzerland.
[55] S. A. Martucci, I. Sodagar, T. Chiang and Y. Zhang (1997) - "A Zerotree Wavelet
Video Coder", IEEE Trans. on Circuits and Systems for Video Technology Vol 7 No. 1,
pp. 109-118.
[56] A. Kaup (1999) - "Object-based Texture Coding of Moving Video in MPEG4",
Proc. IEEE Trans. on Circuits and Systems for Video Technology Vol 9, pp. 5-15.
[57] T. Cormen, C. Leiserson and R. Rivest (1994) - "Introduction to Algorithms"
(14th printing), The MIT Press.
[58] A. M. Rassau, K. Eshraghian, H. Cheung, S. W. Lachowicz, T. C. B. Yu, W. A.
Crossland and T. D. Wilkinson (1998) - "Smart Pixel Implementation of a 2-D
Parallel Nucleic Wavelet Transform for Mobile Multimedia Communications",
Proc. of the Design, Automation and Test in Europe Conference.
[59] C. Spizig (2000) - "Analog-to-Digital Converter Array Implementation for the
Camera-On-A-CMOS Chip", Masters thesis, University of Ulm and Edith Cowan
University.
[60] W. A. Crossland, T. D. Wilkinson, T. M. Coker, T. C. B. Yu and M. Stanley
(1997) - "The Fast Bitplane SLM: A New Ferroelectric Liquid Crystal on Silicon
Spatial Light Modulator Designed for High Yield and Low Cost
Manufacturability", OSA TOPS Vol 14 Spatial Light Modulators, pp. 102-106.
[61] N. Collings, W. A. Crossland, P. J. Ayliffe, D. G. Vass, and I. Underwood (1989)
- "Evolutionary Development of Advanced Liquid Crystal Spatial Light
Modulators", Applied Optics Vol. 28 No. 22.
[62] L. Díaz de Cerio, A. González and M. Valero-García (1996) - "Communication
Pipelining in Hypercubes", Parallel Processing Letters Vol. 6 No. 4, pp. 507-523.
[63] S. Y. Lee and J. K. Aggarwal (1986) - "Parallel 2-D Convolution on a Mesh
Connected Array Processor", Proc. of IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, pp. 299-304.
[64] L. Wall, K. Ferens and W. Kinser (1993) - "Real-Time Dynamic Arithmetic
Coding for Low Bit-Rate Channels", Proc. IEEE Communications, Computers &
Power Conference, pp. 381-391.
[65] H. N. Cheung, G. Alagoda, K. Eshraghian and L. Ang (1999) - "Smart Pixel
VLSI Architecture For Embedded Zerotree Wavelet Coding", Fifth International
Symposium on Signal Processing and its Applications, Queensland, 693-695.
[66] J. Bae and V. K. Prasanna (1995) - "A Fast and Area-efficient VLSI Architecture
for Embedded Image Coding", Proc. of International Conference on Image Processing,
Vol 3, pp. 452-455.
[67] L. Ang, H. N. Cheung, and K. Eshraghian (1998) - "VLSI Architecture for
Significance Map Coding of Embedded Zerotree Wavelet Coefficients", Proc. of
APCCAS'98, pp. 627-630.
[68] K. Eshraghian (1991) - "CMOS & BiCMOS VLSI Design Analog & Digital",
Intensive Summer Course at the Swiss Federal Institute of Technology Lausanne
Electronics Laboratories.
[69] D. P. Pucknell and K. Eshraghian (1994) - "Basic VLSI Design Third Edition",
Silicon Systems Engineering Series, Prentice Hall, Australia.
[70] D. Lucas (2001-2002) - "Primitives and Design of the Intelligent Pixel
Multimedia Communicator", Ph.D. Thesis, Edith Cowan University, Still in progress.
[71] N. H. E. Weste and K. Eshraghian (1993) - "Principles of CMOS VLSI Design,
A Systems Perspective, Second Edition", Addison-Wesley Publishing Company.
[72] P. Pirsch, N. Demassieux and W. Gehrke (1995) - "VLSI Architectures for
Video Compression - A Survey", Proc. of the IEEE Vol 83 No. 2, pp. 220-246.
Appendix A: EZW VHDL Listings 
-- IP 1 Bit Register
-- G.Alagoda 2001
-- Rev. 001
library ieee;
use ieee.std_logic_1164.all;
ENTITY REG1 IS
PORT(
    DI       : in std_logic;
    CLR, CLK : in std_logic;
    DO, DOB  : out std_logic
);
END REG1;
ARCHITECTURE STRUCT OF REG1 IS
signal A, B, C, D, E, F : std_logic;
BEGIN
A <= DI when (CLK = '0') else B;
B <= (not C);
C <= (A nand CLR);
D <= C when (CLK = '1') else F;
E <= (not D);
F <= (E nand CLR);
DO <= E;
DOB <= F;
END STRUCT;
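The behaviour of the 1-bit register can be illustrated with a small gate-level model. The sketch below is written in Python rather than VHDL so that it can be run without a simulator, and it follows one reading of the REG1 listing above: a master latch (A, B, C) that is transparent while CLK is low, a slave latch (D, E, F) that is transparent while CLK is high, and an active-low CLR. The assignment of DO to E and DOB to F, and the helper name settle, are assumptions made for this illustration.

```python
def settle(state, di, clk, clr, max_passes=20):
    """Re-evaluate the six gate equations until no signal changes."""
    a, b, c, d, e, f = state
    for _ in range(max_passes):
        prev = (a, b, c, d, e, f)
        a = di if clk == 0 else b    # master latch: follows DI while CLK is low
        b = 1 - c                    # B <= not C
        c = 1 - (a & clr)            # C <= A nand CLR
        d = c if clk == 1 else f     # slave latch: follows C while CLK is high
        e = 1 - d                    # E <= not D
        f = 1 - (e & clr)            # F <= E nand CLR
        if (a, b, c, d, e, f) == prev:
            return prev
    raise RuntimeError("signals did not settle")


# (DI, CLK, CLR) stimulus: clear, present a 1, clock it in, then clock in a 0.
stimulus = [
    (0, 0, 0),   # CLR low: asynchronous clear
    (1, 0, 1),   # DI = 1 while CLK is low: output must not change yet
    (1, 1, 1),   # rising edge: the 1 appears on DO
    (0, 1, 1),   # DI changes while CLK is high: ignored
    (0, 0, 1),   # CLK low again: DO holds
    (0, 1, 1),   # next rising edge: the 0 appears on DO
]

state = (0, 0, 0, 0, 0, 0)
trace = []
for di, clk, clr in stimulus:
    state = settle(state, di, clk, clr)
    trace.append(state[4])           # DO is taken from E

print(trace)
```

Read this way, the cell behaves as a rising-edge-triggered flip-flop with an asynchronous clear, which is the behaviour the shift-register chains in the later components would rely on.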
-- IP R0 Register Component
-- G.Alagoda 2001
-- Rev. 001
library ieee;
use ieee.std_logic_1164.all;
ENTITY MAINREG IS
PORT(
    CLK, EnR0, ClkMSB, R0Ct1, R0Ct2 : in std_logic;   -- Global Control In
    HPRin, Above, RC                : in std_logic;   -- Other Mixed Signals
    GEnable, INT, Rev, SigE, ZR01   : in std_logic;   -- Internal Control In
    Adder, EXT, R4out               : in std_logic;   -- Data Paths In
    ConvA, EnR0a                    : out std_logic;  -- Internal Control Out
    R0out, Sign                     : out std_logic   -- Data Paths Out
);
END MAINREG;
ARCHITECTURE STRUCT OF MAINREG IS
COMPONENT REG1 IS
PORT(
    DI       : in std_logic;
    CLR, CLK : in std_logic;
    DO, DOB  : out std_logic
);
END COMPONENT;
signal MXOV           : std_logic_vector (12 downto 1);
signal ROUTV          : std_logic_vector (8 downto 0);
signal ROUTBV         : std_logic_vector (8 downto 0);
signal T1, T2, T3, T4 : std_logic;                     -- Temp signals
signal CLR            : std_logic := '1';
signal LP             : std_logic;
BEGIN
-- Add reg elements
L0 : REG1 port map (DI => MXOV(10), CLR => CLR, CLK => T1, DO => ROUTV(0), DOB => ROUTBV(0));
L1 : REG1 port map (DI => ROUTV(2), CLR => CLR, CLK => T1, DO => ROUTV(1), DOB => ROUTBV(1));
L2 : REG1 port map (DI => ROUTV(3), CLR => CLR, CLK => T1, DO => ROUTV(2), DOB => ROUTBV(2));
L3 : REG1 port map (DI => ROUTV(4), CLR => CLR, CLK => T1, DO => ROUTV(3), DOB => ROUTBV(3));
L4 : REG1 port map (DI => ROUTV(5), CLR => CLR, CLK => T1, DO => ROUTV(4), DOB => ROUTBV(4));
L5 : REG1 port map (DI => MXOV(9), CLR => CLR, CLK => T1, DO => ROUTV(5), DOB => ROUTBV(5));
L6 : REG1 port map (DI => MXOV(8), CLR => CLR, CLK => T1, DO => ROUTV(6), DOB => ROUTBV(6));
L7 : REG1 port map (DI => MXOV(7), CLR => CLR, CLK => T1, DO => ROUTV(7), DOB => ROUTBV(7));
L8 : REG1 port map (DI => MXOV(5), CLR => CLR, CLK => T2, DO => ROUTV(8), DOB => ROUTBV(8));
-- Clocks
T1 <= (CLK and GEnable and T4);
T2 <= (CLK and ClkMSB and GEnable);
-- Muxes
MXOV(1) <= MXOV(2) when (LP = '1') else Adder;
MXOV(2) <= R4out when (ZR01 = '1') else ROUTV(0);
MXOV(3) <= EXT when (R0Ct1 = '0') else MXOV(11);
MXOV(4) <= MXOV(2) when ((R0Ct1 nor R0Ct2) = '1') else MXOV(1);
MXOV(5) <= R4out when (SigE = '1') else MXOV(4);
MXOV(6) <= ROUTV(8) when (INT = '1') else MXOV(5);
MXOV(7) <= ROUTV(1) when (Rev = '1') else MXOV(3);
MXOV(8) <= MXOV(2) when (Rev = '1') else ROUTV(7);
MXOV(9) <= ROUTV(3) when (Rev = '1') else ROUTV(6);
MXOV(10) <= ROUTV(5) when (Rev = '1') else ROUTV(1);
MXOV(11) <= SigE when (SigE = '1') else MXOV(6);
MXOV(12) <= Above when (RC = '1') else HPRin;
-- General Crap
T3 <= (R0Ct2 nor (not R0Ct1));
ConvA <= T3;
R0out <= ROUTV(0);
Sign <= ROUTV(8);
T4 <= EnR0 or (INT and MXOV(12));
EnR0a <= T4;
LP <= ((INT and (not MXOV(12))) and (not T3)) or ((not T3) nor (ROUTV(0)));
END STRUCT;
-- IP Adder Component
-- G.Alagoda 2001
-- Rev. 001
library ieee;
use ieee.std_logic_1164.all;
ENTITY ADDER IS
PORT(
    CLK             : in std_logic;
    R0, MCEXT       : in std_logic;
    Sout            : out std_logic;
    S, EnR0a, ConvA : in std_logic
);
END ADDER;
ARCHITECTURE STRUCT OF ADDER IS
COMPONENT REG1 IS
PORT(
    DI       : in std_logic;
    CLR, CLK : in std_logic;
    DO, DOB  : out std_logic
);
END COMPONENT;
signal HI     : std_logic;
signal Cout   : std_logic;
signal Cin    : std_logic := '0';
signal A1, A2 : std_logic;
signal CLR    : std_logic := '1';
signal DOB    : std_logic;
BEGIN
-- Add a single reg element
L0 : REG1 port map (DI => HI, CLR => CLR, CLK => CLK, DO => Cin, DOB => DOB);
-- Add
A1 <= (R0 xor ConvA);
A2 <= ((MCEXT xor S) nor ConvA);
Sout <= (A1 xor A2) when Cin = '0' else (A1 xnor A2);
Cout <= (A1 and A2) or (Cin and A1) or (Cin and A2);
HI <= Cout when EnR0a = '1' else ((not S) or ConvA);
END STRUCT;
-- IP Motion Compensation Component
-- G.Alagoda 2001
-- Rev. 001
library ieee;
use ieee.std_logic_1164.all;
ENTITY MCM IS
PORT(
    CLK, LR, LR1245, Strmout : in std_logic;   -- Global Control In
    MC                       : in std_logic;   -- Internal Control In
    Adder, R0out             : in std_logic;   -- Data Paths In
    MCout                    : out std_logic   -- Data Paths Out
);
END MCM;
ARCHITECTURE STRUCT OF MCM IS
COMPONENT REG1 IS
PORT(
    DI       : in std_logic;
    CLR, CLK : in std_logic;
    DO, DOB  : out std_logic
);
END COMPONENT;
signal CLR        : std_logic := '1';
signal MX         : std_logic_vector (3 downto 0);
signal R10        : std_logic_vector (5 downto 0);
signal R20        : std_logic_vector (5 downto 0);
signal R10B       : std_logic_vector (5 downto 0);
signal R20B       : std_logic_vector (5 downto 0);
signal T1, T2, T3 : std_logic;                      -- Temp signals
BEGIN
-- Add reg1 elements
L10 : REG1 port map (DI => R10(1), CLR => CLR, CLK => T1, DO => R10(0), DOB => R10B(0));
L11 : REG1 port map (DI => R10(2), CLR => CLR, CLK => T1, DO => R10(1), DOB => R10B(1));
L12 : REG1 port map (DI => R10(3), CLR => CLR, CLK => T1, DO => R10(2), DOB => R10B(2));
L13 : REG1 port map (DI => R10(4), CLR => CLR, CLK => T1, DO => R10(3), DOB => R10B(3));
L14 : REG1 port map (DI => R10(5), CLR => CLR, CLK => T1, DO => R10(4), DOB => R10B(4));
L15 : REG1 port map (DI => MX(2), CLR => CLR, CLK => T1, DO => R10(5), DOB => R10B(5));
-- Add reg2 elements
L20 : REG1 port map (DI => R20(1), CLR => CLR, CLK => T2, DO => R20(0), DOB => R20B(0));
L21 : REG1 port map (DI => R20(2), CLR => CLR, CLK => T2, DO => R20(1), DOB => R20B(1));
L22 : REG1 port map (DI => R20(3), CLR => CLR, CLK => T2, DO => R20(2), DOB => R20B(2));
L23 : REG1 port map (DI => R20(4), CLR => CLR, CLK => T2, DO => R20(3), DOB => R20B(3));
L24 : REG1 port map (DI => R20(5), CLR => CLR, CLK => T2, DO => R20(4), DOB => R20B(4));
L25 : REG1 port map (DI => MX(1), CLR => CLR, CLK => T2, DO => R20(5), DOB => R20B(5));
-- Clocks
T3 <= (CLK and MC);
T1 <= (T3 and (not Strmout));
T2 <= (T3 and Strmout);
-- Muxes
MX(3) <= Adder when (LR = '1') else R0out;
MX(2) <= MX(3) when (LR1245 = '1') else R10(0);
MX(1) <= MX(3) when (LR1245 = '1') else R20(0);
MX(0) <= R20(0) when (Strmout = '1') else R10(0);
-- General Crap
MCout <= MX(0);
END STRUCT;
-- IP Significance Identification Component
-- For the EZW
-- G.Alagoda 2001
-- Rev. 001
library ieee;
use ieee.std_logic_1164.all;
ENTITY SIDM IS
PORT(
    CLK, Strmout, FZT                  : in std_logic;   -- Global Control In
    ZT, EnR0a                          : in std_logic;   -- Internal Control In
    Cin, Sbin, R0out, R4out, ZS3, ZR01 : in std_logic;   -- Data Paths In
    SigE, Sig, ZS1, ZS2, Sbout, Pout   : out std_logic   -- Data Paths Out
);
END SIDM;
ARCHITECTURE STRUCT OF SIDM IS
COMPONENT REG1 IS
PORT(
    DI       : in std_logic;
    CLR, CLK : in std_logic;
    DO, DOB  : out std_logic
);
END COMPONENT;
signal CLR                    : std_logic := '1';
signal R30, R30B              : std_logic;
signal T1, T2, T3, T4, T5, T6 : std_logic;          -- Temp signals
BEGIN
-- Add reg element
L1 : REG1 port map (DI => T1, CLR => CLR, CLK => T2, DO => R30, DOB => R30B);
-- Clocks
T2 <= (CLK and (ZT nand ((EnR0a or ((not Strmout) and FZT)) nand R30B)));
-- General Crap
SigE <= R30B and T1;
Sig <= R30;
T1 <= (ZT and R0out and Strmout) or (ZT and R4out and (not Strmout));
T3 <= R30B and T1;
ZS1 <= T3 and Strmout;
ZS2 <= (Strmout and ((Cin nand (not T3)) nand (ZR01 nand T3)));
T4 <= Cin or Sbin;
Sbout <= T4 or T3;
Pout <= T4 when (Strmout = '1') else ZS3;
END STRUCT;
-- IP EZW Component
-- G.Alagoda 2001
-- Rev. 001
library ieee;
use ieee.std_logic_1164.all;
ENTITY EZWM IS
PORT(
    CLK, LR1245, Strmout, FZT, HPRin, VEnable, HEnable, S : in  std_logic;  -- Global Control In
    EnROa, ZT                                             : in  std_logic;  -- Internal Control In
    Pin, Sbin, Cin, ZRO3, ROout, EXT                      : in  std_logic;  -- Data Paths In
    GEnable, HPROut                                       : out std_logic;  -- Internal Control Out
    Sbout, Pout, Pixout, SigE, ZRO1                       : out std_logic;  -- Data Paths Out
    Abovei, Belowi                                        : in  std_logic;  -- Unidirectional ??
    Aboveo, Belowo                                        : out std_logic   -- Unidirectional ??
);
END EZWM;
ARCHITECTURE STRUCT OF EZWM IS
COMPONENT REG1 IS
PORT(
    DI       : in  std_logic;
    CLR, CLK : in  std_logic;
    DO, DOB  : out std_logic
);
END COMPONENT;
COMPONENT SIDM IS
PORT(
    CLK, Strmout, FZT                  : in  std_logic;  -- Global Control In
    ZT, EnROa                          : in  std_logic;  -- Internal Control In
    Cin, Sbin, ROout, R4out, ZS3, ZRO1 : in  std_logic;  -- Data Paths In
    SigE, Sig, ZS1, ZS2, Sbout, Pout   : out std_logic   -- Data Paths Out
);
END COMPONENT;
signal MXOV                                       : std_logic_vector (1 to 10);
signal R4out, R5out, R6out, R4outb, R5outb, R6outb : std_logic;
signal T1, T2, T3, T4                             : std_logic;  -- Temp signals
signal CLR                                        : std_logic := '1';
signal Sig                                        : std_logic;
signal ZS1, ZS2, ZS3                              : std_logic;
BEGIN
-- Add sig Component
SIDM1 : SIDM port map (CLK => CLK, Strmout => Strmout, FZT => FZT, ZT => ZT, EnROa => EnROa,
                       Cin => Cin, Sbin => Sbin, ROout => ROout, R4out => R4out, ZS3 => ZS3, ZRO1 => ZRO3,
                       SigE => SigE, Sig => Sig, ZS1 => ZS1, ZS2 => ZS2, Sbout => Sbout, Pout => Pout);
-- Add reg elements
R4 : REG1 port map (DI => MXOV(2), CLR => CLR, CLK => T1, DO => R4out, DOB => R4outb);
R5 : REG1 port map (DI => MXOV(3), CLR => CLR, CLK => T1, DO => R5out, DOB => R5outb);
R6 : REG1 port map (DI => MXOV(5), CLR => CLR, CLK => T2, DO => R6out, DOB => R6outb);
-- Clocks
T1 <= T3 and CLK and (EnROa nor (not ZT));
T2 <= (FZT and CLK);
-- Muxes
MXOV(1) <= ROout when ((T4 and (not FZT) and Strmout) = '1') else (EXT);
MXOV(2) <= MXOV(1) when (LR1245 = '1') else (ZS1);
MXOV(3) <= R4out when (LR1245 = '1') else (ZS2);
MXOV(4) <= MXOV(3) when (T4 = '1') else (R5out);
MXOV(5) <= ROout when (EnROa = '1') else MXOV(4);
MXOV(6) <= (MXOV(5)) when (T3 = '1') else (EXT);
MXOV(7) <= (Sig) when (S = '1') else (Pin);
MXOV(8) <= (Abovei) when ((Strmout or (not ZT)) = '1') else (Belowi);
MXOV(9) <= (MXOV(8)) when (LR1245 = '1') else (T3);
MXOV(10) <= (R6out) when ((T3 and ZT) = '1') else (T3 and MXOV(9));
T4 <= S and ZT;
HPROut <= (HPRin xor T3);
T3 <= VEnable and HEnable and ((not ZT) or MXOV(7) or (ZT and EnROa));
GEnable <= T3;
Pixout <= not (not MXOV(6));
Aboveo <= (not (not MXOV(10))) when ((Strmout or (not ZT)) = '0') else Abovei;
Belowo <= (not (not MXOV(10))) when ((Strmout or (not ZT)) = '1') else Belowi;
ZS3 <= (R4outb nand R5outb);
ZRO1 <= (T4 and (not Strmout));
END STRUCT;
-- IP EZW Full Pixel
-- G.Alagoda 2001
-- Rev. 001
library ieee;
use ieee.std_logic_1164.all;
ENTITY IP IS
PORT(
    CLK, LR1245, Strmout, FZT, S                         : in  std_logic;  -- Global Control In
    EnRO, LR, RC, M1, M2, ClkMSB, ROCt1, ROCt2, Rev      : in  std_logic;  -- Global Control In
    Pin, Sbin, Cin                                       : in  std_logic;  -- Data Paths In
    Lin, Rin, Uin, Bin                                   : in  std_logic;  -- Data Paths In
    Abovei, Belowi, HPRin, VEnable, HEnable              : in  std_logic;  -- Unidirectional ??
    HPROut                                               : out std_logic;  -- External Control Out
    Sbout, Pout, Pixout                                  : out std_logic;  -- Data Paths Out
    Aboveo, Belowo                                       : out std_logic   -- Unidirectional ??
);
END IP;
ARCHITECTURE STRUCT OF IP IS
COMPONENT MCM IS
PORT(
    CLK, LR, LR1245, Strmout : in  std_logic;  -- Global Control In
    MC                       : in  std_logic;  -- Internal Control In
    Adder, ROout             : in  std_logic;  -- Data Paths In
    MCout                    : out std_logic   -- Data Paths Out
);
END COMPONENT;
COMPONENT ADDER IS
PORT(
    CLK, RO, MCEXT   : in  std_logic;
    Sout             : out std_logic;
    S, EnROa, ConVa  : in  std_logic
);
END COMPONENT;
COMPONENT MAINREG IS
PORT(
    CLK, EnRO, ClkMSB, ROCt1, ROCt2 : in  std_logic;  -- Global Control In
    HPRin, Above, RC                : in  std_logic;  -- other mixed signals
    GEnable, WT, Rev, SigE, ZRO1    : in  std_logic;  -- Internal Control In
    Adder, EXT, R4out               : in  std_logic;  -- Data Paths In
    ConVa, EnROa                    : out std_logic;  -- Internal Control Out
    ROout, Sign                     : out std_logic   -- Data Paths Out
);
END COMPONENT;
COMPONENT EZWM IS
PORT(
    CLK, LR1245, Strmout, FZT, HPRin, VEnable, HEnable, S : in  std_logic;  -- Global Control In
    EnROa, ZT                                             : in  std_logic;  -- Internal Control In
    Pin, Sbin, Cin, ZRO3, ROout, EXT                      : in  std_logic;  -- Data Paths In
    GEnable, HPROut                                       : out std_logic;  -- Internal Control Out
    Sbout, Pout, Pixout, SigE, ZRO1                       : out std_logic;  -- Data Paths Out
    Abovei, Belowi                                        : in  std_logic;  -- Unidirectional ??
    Aboveo, Belowo                                        : out std_logic   -- Unidirectional ??
);
END COMPONENT;
signal EnROa, SigE, ZRO1, ZRO2, Sign            : std_logic;
signal GEnable, WT, MC, ZT, SH, ConVa           : std_logic;
signal ROout, ADDER1, EXT, MCout, R4out, R5out  : std_logic;
signal M4                                       : std_logic_vector (1 to 3);  -- Temp signals
signal Sig, HPMUX, MCEXT                        : std_logic;
BEGIN
-- Component Port Maps (Sheesh)
ADD1 : ADDER port map (CLK => CLK, RO => ROout, MCEXT => MCEXT, Sout => ADDER1, S => S, EnROa => EnROa,
                       ConVa => ConVa);
MCM1 : MCM port map (CLK => CLK, LR => LR, LR1245 => LR1245, Strmout => Strmout,
                     MC => MC, Adder => ADDER1, ROout => ROout, MCout => MCout);
MRM1 : MAINREG port map (CLK, EnRO, ClkMSB, ROCt1, ROCt2, HPRin, Abovei, RC, GEnable, WT, Rev, SigE,
                         ZRO1, ADDER1, EXT, R4out, ConVa, EnROa, ROout, Sign);
EZW1 : EZWM port map (CLK, LR1245, Strmout, FZT, HPRin, VEnable, HEnable, S,
                      EnROa, ZT, Pin, Sbin, Cin, Sign, ROout, EXT, GEnable, HPROut,
                      Sbout, Pout, Pixout, SigE, ZRO1, Abovei, Belowi,
                      Aboveo, Belowo);
-- Muxes
M4(1) <= Rin when (LR = '1') else Lin;
M4(2) <= Bin when (LR = '1') else Uin;
M4(3) <= M4(2) when (RC = '1') else M4(1);
EXT <= '0' when ((SH = '1') and (ClkMSB = '1')) else M4(3);
HPMUX <= Abovei when (RC = '1') else HPRin;
MCEXT <= MCout when (MC = '1') else EXT;
-- Logic Me Baby
SH <= '1' when ((M1 = '0') and (M2 = '0')) else '0';
MC <= '1' when ((M1 = '0') and (M2 = '1')) else '0';
WT <= '1' when ((M1 = '1') and (M2 = '0')) else '0';
ZT <= '1' when ((M1 = '1') and (M2 = '1')) else '0';
END STRUCT;
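The mode decode and input multiplexing at the end of the IP architecture can be checked against a small behavioural model. The following is a minimal sketch in Python, not part of the original listings; the function names `decode_mode` and `ext_input` are hypothetical, and the model assumes the reading of the listing in which M1 and M2 select exactly one of SH, MC, WT and ZT.

```python
def decode_mode(m1: int, m2: int) -> dict:
    """One-hot decode of the two mode bits (assumed reading of the listing)."""
    return {
        "SH": int(m1 == 0 and m2 == 0),
        "MC": int(m1 == 0 and m2 == 1),
        "WT": int(m1 == 1 and m2 == 0),
        "ZT": int(m1 == 1 and m2 == 1),
    }


def ext_input(lin, rin, uin, bin_, lr, rc, sh, clk_msb):
    """Model of the M4 mux chain feeding EXT.

    LR picks right/bottom over left/top, RC picks the column (top/bottom)
    pair over the row (left/right) pair, and EXT is forced to 0 while both
    SH and ClkMSB are high.
    """
    m4_1 = rin if lr else lin      # M4(1)
    m4_2 = bin_ if lr else uin     # M4(2)
    m4_3 = m4_2 if rc else m4_1    # M4(3)
    return 0 if (sh and clk_msb) else m4_3


if __name__ == "__main__":
    for m1 in (0, 1):
        for m2 in (0, 1):
            print(m1, m2, decode_mode(m1, m2))
```

Because the four decoded signals are mutually exclusive, exactly one mode is active for any M1/M2 combination, which is what the test bench relies on when it steps through the modes.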
-- IP Test Bed
-- Simple Decode
-- G.Alagoda
library ieee;
use ieee.std_logic_1164.all;
use std.textio.all;
use ieee.std_logic_textio.all;
ENTITY IP_TB_EZWDEC is
End IP_TB_EZWDEC;
ARCHITECTURE STIM of IP_TB_EZWDEC IS
-- Time Periods
constant PERIOD : time := 1000 ns;
-- Add Some Components
COMPONENT IP IS
PORT(
    CLK, LR1245, Strmout, FZT, S                         : in  std_logic;  -- Global Control In
    EnRO, LR, RC, M1, M2, ClkMSB, ROCt1, ROCt2, Rev      : in  std_logic;  -- Global Control In
    Pin, Sbin, Cin                                       : in  std_logic;  -- Data Paths In
    Lin, Rin, Uin, Bin                                   : in  std_logic;  -- Data Paths In
    Abovei, Belowi, HPRin, VEnable, HEnable              : in  std_logic;  -- Unidirectional ??
    HPROut                                               : out std_logic;  -- External Control Out
    Sbout, Pout, Pixout                                  : out std_logic;  -- Data Paths Out
    Aboveo, Belowo                                       : out std_logic   -- Unidirectional ??
);
END COMPONENT;
-- Signal Definitions
signal CLK   : std_logic := '0';
signal Cnt   : integer := 0;
signal Fin   : std_logic := '0';
signal Tdata : std_logic_vector (8 downto 1);
signal LR1245, Strmout, FZT, HPRin, VEnable, HEnable, S : std_logic := '0';  -- Global Control In
signal EnRO, LR, RC, M1, M2, ClkMSB, ROCt1, ROCt2, Rev  : std_logic;         -- Internal Control In
signal Pin, Sbin, Cin                                   : std_logic;         -- Data Paths In
signal Lin, Rin, Uin, Bin                               : std_logic;
signal HPROut, Sbout, Pout, Pixout                      : std_logic;         -- Internal Control Out, Data Paths Out
signal Abovei, Belowi                                   : std_logic;         -- Unidirectional ??
signal Aboveo, Belowo                                   : std_logic;         -- Unidirectional ??
--signal Idx1 : integer := 8;
signal OB : std_logic_vector (16 downto 1) := "0000000000000000";
shared variable Idx1, Idx2 : integer := 0;
procedure WP (variable Mult : integer) is
BEGIN
    wait for PERIOD*Mult;
END WP;
BEGIN
-- Do Port Map (Sheesh)
IP01 : IP port map (CLK, LR1245, Strmout, FZT, S, EnRO, LR, RC, M1, M2, ClkMSB, ROCt1,
                    ROCt2, Rev,
                    Pin, Sbin, Cin, Lin, Rin, Uin, Bin, Abovei, Belowi, HPRin, VEnable,
                    HEnable,
                    HPROut, Sbout, Pout, Pixout,
                    Aboveo, Belowo);
-- Set Clock and other concurrent crap
CLK <= ((not CLK) and (not Fin)) after (PERIOD/2);
Cnt <= Cnt + 1 after PERIOD when (Fin = '0');
T1 : process begin
-- Do Test process Here
-- Clear significance Info
-- Test Load and Subtract
-- clear all three regs with zeros
VEnable <= '1'; HEnable <= '1';
LR1245 <= '0';
Strmout <= '0';
FZT <= '0';
S <= '0';
EnRO <= '1';
ClkMSB <= '1';
Rev <= '0';
LR <= '0'; RC <= '0';
M1 <= '0'; M2 <= '0';
ROCt1 <= '0'; ROCt2 <= '1';
Pin <= '0'; Cin <= '0'; Sbin <= '0'; Lin <= '0'; Rin <= '0'; Uin <= '0'; Bin <= '0';
Abovei <= '0'; Belowi <= '0';
Wait for PERIOD;
M1 <= '1';
M2 <= '0';
ClkMSB <= '0';
Wait for PERIOD * 8;
M1 <= '0';
M2 <= '1';
LR1245 <= '1';
Wait for PERIOD * 6;
Strmout <= '1';
Wait for PERIOD * 6;
VEnable <= '1'; HEnable <= '1';
LR1245 <= '0';
Strmout <= '0';
FZT <= '0';
S <= '0';
EnRO <= '0';
ClkMSB <= '0';
Rev <= '0';
LR <= '0'; RC <= '1';
M1 <= '1'; M2 <= '1';
ROCt1 <= '0'; ROCt2 <= '1';
Pin <= '1'; Cin <= '0'; Sbin <= '0';
Lin <= '0'; Rin <= '0'; Uin <= '0'; Bin <= '0';
Abovei <= '0'; Belowi <= '0';
Wait for PERIOD;
VEnable <= '1'; HEnable <= '1';
LR1245 <= '0';
Strmout <= '0';
FZT <= '1';
S <= '0';
EnRO <= '0';
ClkMSB <= '0';
Rev <= '0';
LR <= '0'; RC <= '1';
M1 <= '1'; M2 <= '1';
ROCt1 <= '0'; ROCt2 <= '1';
Pin <= '1'; Cin <= '0'; Sbin <= '0';
Lin <= '0'; Rin <= '0'; Uin <= '0'; Bin <= '0';
Abovei <= '0'; Belowi <= '0';
Wait for PERIOD;
VEnable <= '1'; HEnable <= '1';
LR1245 <= '1';
Strmout <= '0';
FZT <= '0';
S <= '0';
EnRO <= '0';
ClkMSB <= '0';
Rev <= '0';
LR <= '0'; RC <= '1';
M1 <= '1'; M2 <= '1';
ROCt1 <= '0'; ROCt2 <= '1';
Pin <= '1'; Cin <= '0'; Sbin <= '0';
Lin <= '0'; Rin <= '0'; Uin <= '0'; Bin <= '0';
Abovei <= '0'; Belowi <= '0';
Wait for PERIOD;
VEnable <= '1'; HEnable <= '1';
LR1245 <= '1';
Strmout <= '0';
FZT <= '1';
S <= '0';
EnRO <= '0';
ClkMSB <= '0';
Rev <= '0';
LR <= '0'; RC <= '1';
M1 <= '1'; M2 <= '1';
ROCt1 <= '0'; ROCt2 <= '1';
Pin <= '1'; Cin <= '0'; Sbin <= '0';
Lin <= '0'; Rin <= '0'; Uin <= '1'; Bin <= '0';
Abovei <= '0'; Belowi <= '0';
Wait for PERIOD;
-- Test Process Ends Here
Fin <= '1';
wait;
end process T1;
END STIM;
Appendix B: ZTE VHDL Listings 
-- IP ZTE STRUCTURAL DEFINITION
-- Author : G.Alagoda
-- Version 003a (001 was Crap)
-- For single Pixel. Test only
library ieee;
use ieee.std_logic_1164.all;
entity PIXEL is
port (Clk, Rst       : in  std_logic;
      RC             : in  std_logic;
      LR             : in  std_logic;
      LoadRO         : in  std_logic;
      EnRO           : in  std_logic;
      EnR1R2         : in  std_logic;
      LoadR1R2_RZSym : in  std_logic;
      Sub            : in  std_logic;
      Hld_msb        : in  std_logic;
      CycleRO        : in  std_logic;
      AddC           : in  std_logic;
      Streamout      : in  std_logic;
      HP_Col_in      : in  std_logic;
      HP_Row_in      : in  std_logic;
      HP_Col_out     : out std_logic;
      HP_Row_out     : out std_logic;
      VEnable        : in  std_logic;
      HEnable        : in  std_logic;
      SymIO          : in  std_logic;
      FZtr           : in  std_logic;
      left_in        : in  std_logic;
      right_in       : in  std_logic;
      top_in         : in  std_logic;
      bottom_in      : in  std_logic;
      data_out       : out std_logic;
      Sib_in         : in  std_logic;
      Cld_in         : in  std_logic;
      Par_in         : in  std_logic;
      Sib_out        : out std_logic;
      Par_out        : out std_logic
);
end PIXEL;
architecture STRUCTURAL of PIXEL is
-- Register set
signal RO    : std_logic_vector (0 to 7);  -- 8-bit Register0
signal RO_1  : std_logic;                  -- 1-bit Register0 appendage
signal R1    : std_logic_vector (0 to 5);  -- 6-bit Register1
signal R2    : std_logic_vector (0 to 5);  -- 6-bit Register2
signal RZSym : std_logic_vector (0 to 1);  -- 2-bit Register, For B0 & B1
signal RZSig : std_logic;
signal R3    : std_logic;                  -- 1-bit Register3 Carry Store
signal R6    : std_logic;                  -- 1-bit Register6 for FZer
-- Muxes
signal M1, M2, M3, M4, M5, M6, M7, M8, M9A, M9B, M9C, M10, M11, M12, M13, M14 : std_logic;
-- De Muxes
signal DM1_0, DM1_1 : std_logic;
-- Adder
signal s_out, c_out, C_In, T_Add : std_logic;
-- Global Enable
signal GEnable : std_logic;
-- Zerotree Identification
signal ZID : std_logic;
begin
REG0 : process (Rst, Clk)
begin
    if Rst = '1' then
        RO <= "00000000";
        RO_1 <= '0';
    elsif (Clk = '1' and Clk'event) then
        if ((not EnRO) and (M8 xor LR)) = '1' then
            RO_1 <= '0';
        end if;
        if (((M8 xor LR) nand (not EnRO)) and GEnable and (not SymIO)) = '1' then
            RO_1 <= RO(7);
            if Hld_msb = '1' then
                RO(1 to 7) <= RO(0 to 6);
            else
                RO(0 to 7) <= M6 & RO(0 to 6);
            end if;
        end if;
    end if;
end process REG0;
REG1 : process (Rst, Clk)
begin
    if Rst = '1' then
        R1 <= "000000";
    elsif (Clk = '1' and Clk'event) then
        if DM1_1 = '1' then
            R1 <= M2 & R1(0 to 4);
        end if;
    end if;
end process REG1;
REG2 : process (Rst, Clk)
begin
    if Rst = '1' then
        R2 <= "000000";
    elsif (Clk = '1' and Clk'event) then
        if DM1_0 = '1' then
            R2 <= M1 & R2(0 to 4);
        end if;
    end if;
end process REG2;
REG3 : process (Rst, Clk, GEnable)
begin
    if Rst = '1' then
        R3 <= '0';
    elsif (Clk = '1' and Clk'event) then
        R3 <= c_out;
        if (GEnable) = '0' then
            R3 <= '0';
        end if;
    end if;
end process REG3;
REG_SYM : process (Rst, Clk, GEnable, HEnable, VEnable)
begin
    if Rst = '1' then
        RZSym(0) <= '0';
        RZSym(1) <= '0';
    else
        if (Clk = '1' and Clk'event) then
            if (GEnable and SymIO) = '1' then
                RZSym(0) <= M10;
                RZSym(1) <= M11;
            end if;
        end if;
        if (VEnable nor HEnable) = '1' then
            RZSym(0) <= '1';
            RZSym(1) <= '1';
        end if;
    end if;
end process REG_SYM;
REG6 : process (Clk, GEnable)
begin
    if (VEnable nor HEnable) = '1' then
        R6 <= '1';
    elsif (Clk = '1' and Clk'event) then
        if FZtr = '1' then
            R6 <= (FZtr nand (Par_in or (RZSym(0) nor RZSym(1)) or
                  ((not RZSym(0)) and RZSym(1) and (not SymIO))));
        end if;
    end if;
end process REG6;
-- Muxes
M1 <= s_out when LoadR1R2_RZSym = '1' else R2(5);
M2 <= s_out when LoadR1R2_RZSym = '1' else R1(5);
M3 <= R1(5) when Streamout = '1' else R2(5);
M4 <= s_out when ((M8 nor EnR1R2) nor CycleRO) = '1' else M5;
M5 <= RO(7) when LoadRO = '1' else RO_1;
M6 <= M9C when LoadRO = '1' else M4;
M7 <= (M3 and LR) when EnR1R2 = '1' else M9C;
M8 <= HP_Col_In when RC = '1' else HP_Row_In;
M9A <= right_in when LR = '1' else left_in;
M9B <= bottom_in when LR = '1' else top_in;
M9C <= M9B when RC = '1' else M9A;
-- Ztr = Id '0' because Al screwed up diagram
M10 <= (ZID or Cld_in) when LoadR1R2_RZSym = '1' else M9C;
M11 <= (Cld_in or (Par_in and (not ZID))) when LoadR1R2_RZSym = '1' else RZSym(0);
M12 <= M5 when SymIO = '0' else RZSym(0);
M13 <= M12 when GEnable = '1' else M9C;
M14 <= (Cld_in or ZID) when FZtr = '0' else (RZSym(0) nand RZSym(1));
-- De Muxes
DM1_0 <= (EnR1R2 and LR) when Streamout = '0' else '0';
DM1_1 <= (EnR1R2 and LR) when Streamout = '1' else '0';
-- Output Signal Definition
Sib_out <= (Sib_in or ZID or Cld_in);
Par_out <= M14;
Data_out <= M13;
HP_Col_Out <= (HP_Col_In xor GEnable);
HP_Row_Out <= (HP_Row_In xor GEnable);
-- Latches & Internal Signals
GEnable <= (R6 and VEnable and HEnable and not(FZtr));
RZSig <= (RZSig or (RO(6) xor RO(7))) and (not (not GEnable));
ZID <= RZSig;
-- Serial Adder
C_In <= R3 or AddC;
T_Add <= (M7 xor Sub);
S_Out <= (M5 xor T_Add) xor C_In;
C_Out <= (T_Add and C_In) or (T_Add and M5) or (M5 and C_In);
end STRUCTURAL;
configuration CFG_PIXEL_STRUCTURAL of PIXEL is
for STRUCTURAL
end for;
end CFG_PIXEL_STRUCTURAL;
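The serial adder at the end of the PIXEL architecture processes one bit per clock, least significant bit first, with the carry held in R3 between clocks. The following Python sketch models that datapath under stated assumptions: the function name `serial_add_sub` is hypothetical, the operand arriving on M7 is called `b`, the register bit on M5 is called `a`, and AddC is assumed to be pulsed only on the first bit of a subtraction (as the test bench does). It models only the arithmetic, not the register rotation or the enable logic around it.

```python
def serial_add_sub(a: int, b: int, width: int = 8, sub: bool = False) -> int:
    """Bit-serial a + b (or a - b) modulo 2**width, LSB first."""
    carry = 0          # models R3, assumed cleared before the operation
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        add_c = 1 if (sub and i == 0) else 0   # AddC pulse on the first bit
        c_in = carry | add_c                   # C_In  <= R3 or AddC
        t_add = b_bit ^ int(sub)               # T_Add <= M7 xor Sub
        s_out = (a_bit ^ t_add) ^ c_in         # S_Out <= (M5 xor T_Add) xor C_In
        # C_Out is the majority of the three adder inputs
        carry = (t_add & c_in) | (t_add & a_bit) | (a_bit & c_in)
        result |= s_out << i
    return result


if __name__ == "__main__":
    print(serial_add_sub(3, 3))               # plain addition
    print(serial_add_sub(14, 15, sub=True))   # two's-complement subtraction
```

Inverting each incoming bit with Sub and injecting a one through AddC on the first cycle is the usual two's-complement trick, so the same full adder serves both the forward (subtract) and inverse (add) wavelet steps.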
-- Test The Entire ZTE Pixel In one shot Ver 001 --
-- Authors : A.Rassau, G.Alagoda
library ieee;
use ieee.std_logic_1164.all;
entity IPTest is
end IPTest;
architecture STIMULUS of IPTest is
-- Time Setting
constant PERIOD : time := 20 ns;
-- Component Defn
component PIXEL
port (
      Clk, Rst       : in  std_logic;
      RC             : in  std_logic;
      LR             : in  std_logic;
      LoadRO         : in  std_logic;
      EnRO           : in  std_logic;
      EnR1R2         : in  std_logic;
      LoadR1R2_RZSym : in  std_logic;
      Sub            : in  std_logic;
      Hld_msb        : in  std_logic;
      CycleRO        : in  std_logic;
      AddC           : in  std_logic;
      Streamout      : in  std_logic;
      HP_Col_in      : in  std_logic;
      HP_Row_in      : in  std_logic;
      HP_Col_out     : out std_logic;
      HP_Row_out     : out std_logic;
      VEnable        : in  std_logic;
      HEnable        : in  std_logic;
      SymIO          : in  std_logic;
      FZTR           : in  std_logic;
      left_in        : in  std_logic;
      right_in       : in  std_logic;
      top_in         : in  std_logic;
      bottom_in      : in  std_logic;
      data_out       : out std_logic;
      Sib_in         : in  std_logic;
      Cld_in         : in  std_logic;
      Par_in         : in  std_logic;
      Sib_out        : out std_logic;
      Par_out        : out std_logic;
      RO             : inout std_logic_vector (0 to 7);
      RO_1           : inout std_logic;
      R1             : inout std_logic_vector (0 to 5);
      R2             : inout std_logic_vector (0 to 5);
      RZSym          : inout std_logic_vector (0 to 1);
      RZSig          : inout std_logic
);
end component;
-- Testbench signal Definition
signal Clk            : std_logic := '0';
signal Rst            : std_logic;
signal RC             : std_logic;
signal LR             : std_logic;
signal LoadRO         : std_logic;
signal EnRO           : std_logic;
signal EnR1R2         : std_logic;
signal LoadR1R2_RZSym : std_logic;
signal Sub            : std_logic;
signal Hld_msb        : std_logic;
signal CycleRO        : std_logic;
signal AddC           : std_logic;
signal Streamout      : std_logic;
signal HP_Col_in      : std_logic;
signal HP_Row_in      : std_logic;
signal HP_Col_out     : std_logic;
signal HP_Row_out     : std_logic;
signal VEnable        : std_logic;
signal HEnable        : std_logic;
signal SymIO          : std_logic;
signal FZTR           : std_logic;
signal left_in        : std_logic;
signal right_in       : std_logic;
signal top_in         : std_logic;
signal bottom_in      : std_logic;
signal data_out       : std_logic;
signal Sib_in         : std_logic;
signal Cld_in         : std_logic;
signal Par_in         : std_logic;
signal Sib_out        : std_logic;
signal Par_out        : std_logic;
signal RO             : std_logic_vector (0 to 7);  -- 8-bit Register0
signal RO_1           : std_logic;                  -- 1-bit Register0 Appendage
signal R1             : std_logic_vector (0 to 5);  -- 6-bit Register1
signal R2             : std_logic_vector (0 to 5);  -- 6-bit Register2
signal RZSym          : std_logic_vector (0 to 1);  -- 2-bit Register, For B0 & B1
signal RZSig          : std_logic;
signal Cnt : integer := 0;
begin
IP : Pixel port map (Clk => Clk, Rst => Rst, RC => RC, LR => LR, LoadRO => LoadRO,
                     EnRO => EnRO, EnR1R2 => EnR1R2,
                     LoadR1R2_RZSym => LoadR1R2_RZSym, Sub => Sub, Hld_msb => Hld_msb,
                     CycleRO => CycleRO, AddC => AddC,
                     Streamout => Streamout,
                     HP_Col_in => HP_Col_in, HP_Row_in => HP_Row_in, HP_Col_out => HP_Col_out,
                     HP_Row_out => HP_Row_out,
                     VEnable => VEnable, HEnable => HEnable, SymIO => SymIO,
                     FZTR => FZTR,
                     left_in => left_in, right_in => right_in, top_in => top_in,
                     bottom_in => bottom_in,
                     data_out => data_out,
                     Sib_in => Sib_in, Cld_in => Cld_in, Par_in => Par_in,
                     Sib_out => Sib_out, Par_out => Par_out,
                     RO => RO, RO_1 => RO_1, R1 => R1, R2 => R2, RZSym => RZSym,
                     RZSig => RZSig
);
-- Concurrent setting for the clock
Clk <= (not Clk) after (PERIOD/2);
Cnt <= Cnt + 1 after PERIOD;
STIM : process
begin
Rst <= '1';
VEnable <= '0';
HEnable <= '0';
wait for (PERIOD);
Rst <= '0';
VEnable <= '1';
HEnable <= '1';
RC <= '0';
LR <= '0';
LoadRO <= '0';
EnRO <= '0';
EnR1R2 <= '0';
LoadR1R2_RZSym <= '0';
Sub <= '0';
Hld_msb <= '0';
CycleRO <= '0';
AddC <= '0';
Streamout <= '0';
SymIO <= '0';
FZTR <= '0';
left_in <= '0';
right_in <= '0';
top_in <= '0';
bottom_in <= '0';
Sib_in <= '0';
Cld_in <= '0';
Par_in <= '0';
HP_Col_in <= '1';
HP_Row_in <= '1';
-- Phase 01 - Pixel Load Mode
-- Load RO = "00001110" (14 dec)
'!'1 
<• 'I' I 
<• '0' I 
.. •o•, 
-- Load Data for R1 into RO
left_in <= '0'; wait for PERIOD;
left_in <= '1'; wait for PERIOD;
left_in <= '1'; wait for PERIOD;
left_in <= '1'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
assert (RO = "00001110")
    report "Load failed"
    severity failure;
-- Phase 02 - Motion Difference Mode
-- RO = "00001110" (14 dec)
-- R1 = "000010" (00 dec) for test later
EnRO <= '1';
CycleRO <= '1';
LoadRO <= '0';
Hld_msb <= '0';
wait for PERIOD;
CycleRO <= '0';
EnR1R2 <= '1';
LoadR1R2_RZSym <= '0';
Streamout <= '1';
Sub <= '1';
LR <= '1';
AddC <= '1';
wait for PERIOD;
AddC <= '0';
wait for PERIOD;
wait for PERIOD;
wait for PERIOD;
wait for PERIOD;
wait for PERIOD;
~ 'O' I 
wait for PERIOD;
wait for PERIOD;
assert (RO = "00001110") report "Motion Difference Fucked" severity failure;
assert R1 = "000000" report "R1 loaded with crap from Motion Diff" severity failure;
-- Phase 03 - Forward Wavelet Mode
-- Left   = "00001111" (15 dec)
-- Right  = "00001101" (13 dec)
-- Top    = "00001010" (10 dec)
-- Bottom = "00010000" (16 dec)
HP_Col_in <= '1';
HP_Row_in <= '1';
EnR1R2 <= '0';
LoadRO <= '0';
Hld_msb <= '0';
-- Shift Right LP Pixels
EnRO <= '0';
LR <= '0';
RC <= '0';
CycleRO <= '1';
wait for PERIOD;
-- Subtract from Left
EnRO <= '1';
CycleRO <= '0';
RC <= '0';
LR <= '0';
Sub <= '1';
AddC <= '1';
left_in <= '1'; wait for PERIOD;
AddC <= '0';
left_in <= '1'; wait for PERIOD;
left_in <= '1'; wait for PERIOD;
left_in <= '1'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
assert (RO = "00000110" and RO_1 = '1')
    report "Sub from Left Buggered"
    severity failure;
-- Reset Carry Bit
HEnable <= '0';
VEnable <= '0';
wait for PERIOD;
HEnable <= '1';
VEnable <= '1';
-- Subtract from Right
RC <= '0';
LR <= '1';
AddC <= '1';
right_in <= '1'; wait for PERIOD;
AddC <= '0';
right_in <= '0'; wait for PERIOD;
right_in <= '1'; wait for PERIOD;
right_in <= '1'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
assert (RO = "00000000" and RO_1 = '0')
    report "Sub from Right Buggered"
    severity failure;
-- Rotate 7 All
CycleRO <= '1';
wait for PERIOD*7;
-- Rotate 1 HP
EnRO <= '0';
LR <= '1';
wait for PERIOD;
-- Shift Right LP Pixels
RC <= '1';
LR <= '0';
wait for PERIOD;
-- Subtract from Above
EnRO <= '1'; CycleRO <= '0';
RC <= '1';
LR <= '0';
Sub <= '1';
AddC <= '1';
top_in <= '0'; wait for PERIOD;
AddC <= '0';
top_in <= '1'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '1'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
assert (RO = "11111011" and RO_1 = '0')
    report "Sub from Top Buggered"
    severity failure;
-- Reset Carry Bit
HEnable <= '0';
VEnable <= '0';
wait for PERIOD;
HEnable <= '1';
VEnable <= '1';
-- Subtract from Below
RC <= '1';
LR <= '1';
Sub <= '1';
AddC <= '1';
bottom_in <= '0'; wait for PERIOD;
AddC <= '0';
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '1'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
assert (RO = "11110011" and RO_1 = '0')
    report "Sub from Bottom Buggered"
    severity failure;
-- Phase 04 - Quantisation & Inverse (by Div 4)
EnR1R2 <= '0';
LoadR1R2_RZSym <= '0';
Streamout <= '0';
Sub <= '0';
EnRO <= '1';
CycleRO <= '1';
LoadRO <= '0';
Hld_msb <= '1';
wait for PERIOD•), 
Hld_msb <= '0';
... it for PERIOD•(•· ll, 
assert (RO = "11110011" and RO_1 = '1')
    report "Quantisation Farked"
    severity error;
-- Phase 05 - Significance Bit Generation
HEnable <= '0';
VEnable <= '0';
wait for PERIOD;
HEnable <= '1';
VEnable <= '1';
CycleRO <= '1';
LoadRO <= '0';
EnRO <= '1';
wolt !or PSRIOD••, 
assert (RZSig = '1')
    report "Significance screwed"
    severity error;
-- Phase 06a - Symbol Generation
~~B: 
Por_in 
Sym!O <• 
LcadRlR2_RSSyn, <• 
wait !or PERIODJ 
•o•, 
•o•' 
'I' 1 
.,., 
'l'' 
assert (RZSym = "10")
    report "Symbol shafted"
    severity error;
-- Phase 07a - Symbol Extraction.
LoadR1R2_RZSym <= '0';
-- Form Zerotrees
FZtr <= '1';
wait for PERIOD;
FZtr <= '0';
RC <= '0';
LR <= '0';
left_in <= data_out;
wait for PERIOD;
left_in <= data_out;
wait for PERIOD;
left_in <= data_out;
assert (RZSym = "10")
    report "Symbol Reload rogered"
    severity error;
-- Phase 07b - Coefficient Extraction
-- Form Zerotrees
Par_in <= '0';
FZtr <= '1';
wait for PERIOD;
FZtr <= '0';
RC <= '0';
LR <= '0';
SymIO <= '0';
LoadRO <= '1';
CycleRO <= '0';
left_in <= '1'; wait for PERIOD;
left_in <= '1'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
assert (RO = "00000011")
    report "Coefficient Reload ruined"
    severity error;
-- Phase 08 - Inverse Wavelet Transform
HP_Col_In <= '1';
HP_Row_In <= '1';
EnR1R2 <= '0';
LoadRO <= '0';
Hld_msb <= '0';
-- Shift Right LP Pixels
EnRO <= '0';
LR <= '0';
RC <= '0';
CycleRO <= '1';
wait for PERIOD;
-- Add from Left
EnRO <= '1';
CycleRO <= '0';
RC <= '0';
LR <= '0';
left_in <= '1'; wait for PERIOD;
left_in <= '1'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
left_in <= '0'; wait for PERIOD;
assert (RO = "00000100" and RO_1 = '1')
    report "Sum from Left bollocksed"
    severity error;
-- Reset Carry Bit
HEnable <= '0';
VEnable <= '0';
wait for PERIOD;
VEnable <= '1';
HEnable <= '1';
-- Add from Right
RC <= '0';
LR <= '1';
right_in <= '1'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
right_in <= '0'; wait for PERIOD;
assert (RO = "00000101" and RO_1 = '0')
    report "Sum from Right bollocksed"
    severity error;
-- Rotate 7 All
CycleRO <= '1';
wait for PERIOD*7;
-- Rotate 1 HP
EnRO <= '0';
LR <= '1';
wait for PERIOD;
-- Shift Right LP Pixels
RC <= '1';
LR <= '0';
wait for PERIOD;
-- Add from Above
EnRO <= '1';
CycleRO <= '0';
RC <= '1';
LR <= '0';
top_in <= '0'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '1'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
top_in <= '0'; wait for PERIOD;
assert (RO = "00001100" and RO_1 = '0')
    report "Sum from Top bollocksed"
    severity error;
-- Reset Carry Bit
VEnable <= '0';
HEnable <= '0';
wait for PERIOD;
VEnable <= '1';
HEnable <= '1';
-- Add from Below
RC <= '1';
LR <= '1';
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '1'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
bottom_in <= '0'; wait for PERIOD;
assert (RO = "00001101" and RO_1 = '0')
    report "Sum from Bottom bollocksed"
    severity error;
-- Phase 09 - Motion Summation Mode
EnRO <= '1';
CycleRO <= '1';
LoadRO <= '0';
Hld_msb <= '0';
wait for PERIOD;
CycleRO <= '0';
EnR1R2 <= '1';
LoadR1R2_RZSym <= '1';
Streamout <= '1';
LR <= '1';
wait for PERIOD*4;
LR <= '0';
wait for PERIOD*2;
assert (R1 = "011010")
    report "Motion Summation crapped"
    severity error;
assert false
    report "Completed Successfully"
    severity error;
wait;
end process STIM;
end STIMULUS;
configuration CFG_IPTEST_STIMULUS of IPTEST is
for stimulus
    for IP : pixel
        use configuration WORK.CFG_PIXEL_STRUCTURAL;
    end for;
end for;
end CFG_IPTEST_STIMULUS;
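The test bench drives each neighbour input one bit per clock period, least significant bit first; the Phase 01 load of 14 appears as the sequence 0, 1, 1, 1, 0, 0, 0, 0 on left_in. The sketch below, in Python and not part of the original listings, shows one way such stimulus lines could be generated rather than typed by hand. The helper names `serial_bits`, `bits_to_value` and `stimulus_lines` are hypothetical.

```python
def serial_bits(value: int, width: int = 8) -> list:
    """Bits of value in the order the test bench presents them (LSB first)."""
    return [(value >> i) & 1 for i in range(width)]


def bits_to_value(bits) -> int:
    """Inverse of serial_bits: rebuild the integer from an LSB-first stream."""
    return sum(bit << i for i, bit in enumerate(bits))


def stimulus_lines(signal: str, value: int, width: int = 8) -> list:
    """VHDL stimulus text, one assignment and one clock wait per bit."""
    return [f"{signal} <= '{bit}'; wait for PERIOD;"
            for bit in serial_bits(value, width)]


if __name__ == "__main__":
    for line in stimulus_lines("left_in", 14):
        print(line)
```

The add and subtract phases in the listing clock nine bits rather than eight, so a caller reproducing those phases would pass `width=9`; that choice is left to the caller because it depends on the register rotation, which this helper does not model.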
