Depth-first search embedded wavelet algorithm for hardware implementation by Ang, Li-Minn
Edith Cowan University 
Research Online 
Theses: Doctorates and Masters Theses 
1-1-2001 
Recommended Citation 
Ang, L. (2001). Depth-first search embedded wavelet algorithm for hardware implementation. 
https://ro.ecu.edu.au/theses/1047 
This Thesis is posted at Research Online. 
https://ro.ecu.edu.au/theses/1047 
Depth-First Search Embedded Wavelet 
Algorithm for Hardware Implementation 
by 
Ang Li-minn, B. E. (Hons.) 
Thesis submitted for the degree of 
Doctor of Philosophy 
School of Engineering and Mathematics 
Edith Cowan University 
November 2001 
Abstract 
The emerging technology of image communication over wireless transmission 
channels requires several new challenges to be simultaneously met at the algorithm and 
architecture levels. At the algorithm level, desirable features include high coding 
performance, bit stream scalability, robustness to transmission errors and suitability for 
content-based coding schemes. At the architecture level, we require efficient 
architectures for construction of portable devices with small size and low power 
consumption. An important question is whether a single coding algorithm can be 
designed to meet these diverse requirements. Recently, researchers working on improving 
different features have converged on a set of coding schemes commonly known as 
embedded wavelet algorithms. Currently, these algorithms achieve the highest coding 
performance reported in the literature. In addition, embedded wavelet algorithms have 
the natural feature of being able to meet a target bit rate precisely. Furthermore, work 
on improving the algorithm robustness has shown much promise. The potential of 
embedded wavelet techniques has been acknowledged by their inclusion in the new 
JPEG2000 and MPEG-4 image and video coding standards. 
The consideration now is whether the algorithms can be efficiently implemented in 
hardware. Whereas much work has been accomplished at the algorithm level, the same 
cannot be said at the architecture level. The disparity between the algorithm level and 
the architecture level is surprising considering that we need both levels to construct 
portable multimedia devices. Unlike hardware architectures in general, the complexity 
of embedded wavelet architectures does not lie only in their computational and storage 
requirements. The additional complexity lies in designing tree searching schemes 
which can be efficiently implemented in hardware. Our focus is to design suitable tree 
searching schemes for hardware implementation. There are basically two approaches 
for tree searching architectures: iterative and noniterative schemes. The noniterative 
scheme simplifies the tree searching but suffers from the possibility of misprediction. 
In this thesis, we return to the original concept of embedded wavelet algorithms using 
iterative tree searching. The significance of this work is to embrace iterative tree 
searching schemes wholeheartedly and to develop schemes which can be efficiently 
implemented in hardware. The iterative tree searching schemes do not suffer from 
misprediction. 
The approach taken in this thesis is to adopt a top-down design methodology 
beginning from algorithm investigation and ending at hardware implementation. We 
begin by looking at the variations in embedded wavelet algorithms. We propose tree 
searching schemes based on the depth-first search (DFS). The DFS gives several 
advantages for hardware implementation. The DFS algorithms have low storage 
requirements and simple computations, and allow for efficient addressing of the nodes in 
the tree structure. The coding performances of the DFS variants are only slightly lower 
than those of the original algorithms. Furthermore, the DFS allows for novel bit stream (BS) 
architectures where the coefficient magnitude bits are not stored and are processed as 
they flow through the architectures resulting in fast architectures with low complexity. 
We present a DFS BS embedded wavelet system which can be switched to perform 
encoding and decoding on the same device. The coding system is suitable for 
implementation on DSP processors, field programmable logic devices or VLSI and is a 
promising solution for video coding applications in general and for portable video 
communicators in particular. The final part of the thesis discusses the applications of 
the DFS algorithms for robust coding and content-based coding applications. 
Statement of Originality 
I declare that this thesis contains no material which has been accepted for the award 
of any other degree or diploma in any university. To the best of my knowledge and 
belief, this thesis contains no material previously published or written by any other 
person, except where due reference is given in the text of the thesis. 
I consent to this thesis being made available for photocopying or loan. 
SIGNED: 
DATE: 
Acknowledgments 
I would like to express my gratitude to the following people at Edith Cowan 
University. 
Prof. Kamran Eshraghian, my supervisor, for his guidance and support throughout 
this research project. His comments and criticism on the draft of the thesis are greatly 
appreciated. 
My supervisor Dr. Hon Nin Cheung, for his time and advice on this project and the 
preparation of this thesis. His constructive feedback and suggestions are very much 
appreciated. 
Prof. Hans-Jorg Pfleiderer, visiting professor to the School of Engineering and 
Mathematics in 1996, for several discussions on transform architectures and bit stream 
approaches. 
Dr. Morteza Biglari for his help in the configuration and use of the design tools. 
The assistance provided by staff and postgraduates in this department is much 
appreciated. 
I would like to thank my parents and grandma for their unceasing love always. 
List of Author's Related Publications 
[1] K. Eshraghian, S. Lachowicz, G. Alagoda and L. Ang, "Architectural mappings 
for multimedia smart-pixel arrays," in Proc. IEEE Int. Workshop Design, Test and 
Applications, Dubrovnik, Croatia, pp. 33-35, Jun. 1998. 
[2] L. Ang, H. N. Cheung and K. Eshraghian, "VLSI architecture for significance 
map coding of embedded zerotree wavelet coefficients," in Proc. IEEE Asia 
Pacific Conf. Circuits and Systems, Chiangmai, Thailand, pp. 627-630, Nov. 
1998. 
[3] L. Ang, H. N. Cheung and K. Eshraghian, "VLSI architecture for embedded 
zerotree wavelet coding," in Proc. 5th Int. Symp. DSP for Communication Systems, 
Perth, Australia, pp. 128-133, Feb. 1999. 
[4] L. Ang, H. N. Cheung and K. Eshraghian, "VLSI decoder architecture for 
embedded zerotree wavelet algorithm," in Proc. IEEE Int. Symp. Circuits and 
Systems, Orlando, U.S.A., pp. 141-144, May 1999. 
[5] L. Ang, H. N. Cheung and K. Eshraghian, "EZW algorithm using depth-first 
representation of the wavelet zerotree," in Proc. 5th Int. Symp. Signal Processing 
and its Applications, Brisbane, Australia, pp. 75-78, Aug. 1999. 
[6] L. Ang, H. N. Cheung and K. Eshraghian, "Robust image compression using the 
depth-first search on the wavelet zerotree," in Proc. 5th Int. Symp. Signal 
Processing and its Applications, Brisbane, Australia, pp. 797-800, Aug. 1999. 
[7] H. N. Cheung, G. Alagoda, K. Eshraghian and L. Ang, "Smart-pixel VLSI 
architecture for embedded zerotree wavelet coding," in Proc. 5th Int. Symp. Signal 
Processing and its Applications, Brisbane, Australia, pp. 693-696, Aug. 1999. 
[8] L. Ang, H. N. Cheung and K. Eshraghian, "VLSI architecture for very high 
resolution scalable video coding using the virtual zerotree," in Proc. IEEE 
Workshop Signal Processing Systems, Taipei, Taiwan, Oct. 1999. 
[9] L. Ang, H. N. Cheung and K. Eshraghian, "Comparison of winner-take-all 
motion compensation schemes for embedded wavelet coding," in Proc. 6th Int. 
Conf. Neural Information Processing, Perth, Australia, pp. 390-394, Nov. 1999. 
[10] H. N. Cheung, L. Ang and G. Alagoda, "Bit stream architecture for the 
implementation of the improved embedded zerotree wavelet algorithm," in Proc. 
Second Int. Conf. Information, Communications & Signal Processing, Singapore, 
Dec. 1999. 
[11] H. N. Cheung and L. Ang, "Analysis of embedded zerotree wavelet algorithms for 
robust image compression," in Proc. Second Int. Conf. Information, 
Communications & Signal Processing, Singapore, Dec. 1999. 
[12] H. N. Cheung, L. Ang and K. Eshraghian, "Embedded zerotree wavelet processor 
for mobile video communicator," in Proc. IEEE Int. Symp. Intelligent Signal 
Processing and Communication Systems, Phuket, Thailand, Dec. 1999. 
[13] H. N. Cheung, L. Ang and K. Eshraghian, "Parallel architecture for the 
implementation of the embedded zerotree wavelet algorithm," in Proc. 5th 
Australasian Computer Architecture Conf., Canberra, Australia, Jan. 2000. 
[14] L. Ang, H. N. Cheung and K. Eshraghian, "A dataflow-oriented VLSI architecture 
for a modified SPIHT algorithm using depth-first search bit stream processing," in 
Proc. IEEE Int. Symp. Circuits and Systems, Geneva, Switzerland, pp. 291-294, 
May 2000. 
[15] H. N. Cheung and L. Ang, "Application of the EZW algorithm to content-based 
image compression," to be presented at IEEE Int. Symp. Intelligent Signal 
Processing and Communication Systems, Honolulu, Hawaii, Nov. 2000. 
Contents 
1 Introduction
1.1 Significance of this Work
1.2 Contribution of this Thesis
2 Embedded Wavelet Algorithms for Hardware Implementation
2.1 Introduction
2.2 Embedded Wavelet Algorithms and its Variations
2.2.1 Coefficient Tree Structure
2.2.2 Tree Searching (TS) Scheme
2.2.3 Tree Coding Scheme
2.2.4 Generation and Transmission of Symbol Stream
2.3 Features of Depth-First Search (DFS) for Hardware Implementation
2.3.1 Partitioning of Coefficient Tree into Subtrees
2.3.2 Propagation of Significance Symbols
2.3.3 MSIG Quantization
2.3.4 Bit Stream Processing
2.4 Simulation Results
2.4.1 Comparison of TS Strategies
2.4.2 Comparison of DFS Configurations
2.4.3 Comparison of MAP Coding Schemes
2.5 Conclusions
3 Architectures for the DFS Embedded Wavelet Algorithms
3.1 Introduction
3.2 Bit Stream (BS) Architecture
3.2.1 Depth-First Search Bit Stream Input
3.2.2 Encoder Architecture
3.2.2.1 First Stage Processor
3.2.2.2 Second Stage Processor
3.2.3 Decoder Architecture
3.2.4 Advantages of DFS BS Architecture
3.2.4.1 Simple Comparison Operations
3.2.4.2 Simple Arithmetic Operations
3.2.4.3 Minimal Storage Architecture
3.2.4.4 Just-In-Time Processing
3.2.4.5 Architecture Scalability
3.3 Parallel Architecture
3.3.1 Parallel Implementation of the DFS BS EZW Algorithm
3.3.2 Output Buffering Requirements
3.3.3 Different Output Schemes
3.4 Conclusions
4 Hardware Implementation of the DFS BS SPIHT System
4.1 Introduction
4.2 DFS SPIHT Algorithm
4.3 Memory Bank Implementation
4.3.1 DWT/IDWT Processor
4.3.2 DWT-to-DFS/DFS-to-DWT Converter
4.3.3 TS/ITS Processor
4.4 Smart Pixel VLSI Implementation
4.4.1 Structure of a Smart Pixel
4.4.2 Additional Circuitry in Smart Pixel
4.4.3 DWT Interface and TS Processor
4.5 Simulation Results
4.6 Conclusions
5 Applications of the DFS Embedded Wavelet Algorithms
5.1 Introduction
5.2 Robust Coding
5.2.1 Error Characteristics of the DFS EZW Algorithm
5.2.2 Robust DFS EZW Models
5.2.3 Simulation Results
5.3 Content-Based Coding
5.3.1 Content-Based EZW Encoding
5.3.2 DFS BS EZW Content-Based Implementation
5.3.3 Simulation Results
5.4 Conclusions
6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
Bibliography
List of Figures 
2.1 Block diagram of an embedded wavelet algorithm
2.2 Tree searching module
2.3 Ancestor-descendant relationship of subbands in EZW algorithm
2.4 EZW coefficient subtree
2.5 Ancestor-descendant relationship of subbands in SPIHT algorithm
2.6 SPIHT coefficient subtree
2.7 Subband scanning schemes
2.8 BFS traversal of coefficient tree containing N subtrees
2.9 DFS traversal of coefficient tree containing N subtrees
2.10 Example of coefficient subtree
2.11 Mixed MAP and SAQ symbol stream
2.12 DFS EZW algorithm
2.13 DFS partitioning of coefficient tree into subtrees
2.14 DFS propagation order
2.15 Lena test image
2.16 Barbara test image
2.17 Simulation results for TS strategies for Lena image
2.18 Simulation results for TS strategies for Barbara image
2.19 Simulation results for DFS Configurations for Lena image
2.20 Simulation results for DFS Configurations for Barbara image
2.21 Simulation results for DFS traversal with EZW MAP coding
2.22 Simulation results for DFS traversal with Improved EZW MAP coding
2.23 Simulation results for DFS traversal with SPIHT MAP coding
2.24 Test images
2.25 Simulation results for DFS SPIHT MAP coding
3.1 Subband arrangement and example of coefficients
3.2 Coefficient tree example from EZW
3.3 DFS representation showing subband orientations
3.4 DFS input bit stream
3.5 DFS BS EZW encoder architecture
3.6 First Stage Processor architecture
3.7 Example of a SIG bit plane
3.8 Flowchart for First Stage Processing
3.9 Example of a ZTR bit plane
3.10 Flowchart for Second Stage Processing
3.11 Bit streams before and after channel coding
3.12 Approaches for the DFS BS EZW decoder architecture
3.13 Calculation of SIG symbol using bit stream approach
3.14 Calculation of approximated coefficient value
3.15 DFS BS EZW processor with channel rate control
3.16 Scalability of bit stream architecture
3.17 Structure of the parallel DFS BS EZW architecture
3.18 Structure of the heuristic-guided parallel architecture
4.1 Variations for EW algorithms
4.2 SPIHT algorithm
4.3 DFS SPIHT algorithm
4.4 Memory bank DFS BS SPIHT structure
4.5 Coefficient tree with MSIG node
4.6 Image pixels and DWT coefficients
4.7 Memory bank DWT arrangement for encoding
4.8 Memory bank IDWT arrangement for decoding
4.9 Memory bank DWT arrangement for four subtrees
4.10 Handshaking mechanism between First and Second Stage processors
4.11 Smart pixel structure
4.12 SP DFS BS SPIHT system
4.13 Coding component in a SP
4.14 DFS BS TS processor and SP interface
4.15 Schematic of DFS BS SPIHT system
4.16 Cell area for DFS BS SPIHT configurations
4.17 Clock cycle latency for DFS BS SPIHT configurations
4.18 Breakdown of cell area showing combinational and sequential areas
4.19 Cell area for different widths of coefficient memory bank
5.1 Lena image at 1.0 bpp with 23 SAQ bit errors
5.2 Lena image at 1.0 bpp with 22 POS↔NEG bit errors
5.3 Selected region of an image and its position in the subbands
5.4 Content-based encoding and decoding of image using the EZW algorithm
5.5 Three-dimensional representation of sign-magnitude binary format
5.6 Content-based decoded Lena images at 0.1 bpp
5.7 Content-based decoded Lena images at 0.05 bpp
6.1 DFS BS EW system with image segmentation and error protection modules
List of Tables 
2.1 MAP symbols for coding of coefficient tree
2.2 Effect on tree search using SDF and SGDF symbols
2.3 Example of MAP coding symbols for a threshold value of 32
2.4 MAP coding scheme for EZW algorithm
2.5 MAP coding scheme for Improved EZW algorithm
2.6 DFS Configurations
2.7 Comparison of DFS SPIHT and WVQ algorithms
3.1 Assignment of MAP symbols for EZW coefficient nodes
3.2 Assignment of TSDF bits
3.3 Assignment of TNZF bits
4.1 MAP symbols for DFS SPIHT and SPIHT algorithms
4.2 Data signals for DFS BS SPIHT system
5.1 Change of a MAP symbol to another symbol by a one-bit error
5.2 Error detection of the DFS EZW
5.3 Bit errors in SAQ symbols without error detection
5.4 Bit errors in POS↔NEG symbols without error detection
5.5 Overall PSNRs of content-based encoded Lena images at 0.1 bpp
5.6 Overall PSNRs of content-based encoded Lena images at 0.05 bpp
Glossary of Terms and Abbreviations 
AOMSIG    All offspring marked significant 
BFS       Breadth-first search 
BS        Bit stream 
DFS       Depth-first search 
DWT       Discrete wavelet transform 
EBCOT     Embedded block coding with optimized truncation algorithm 
EW        Embedded wavelet 
EZW       Embedded zerotree wavelet algorithm 
IZ        Isolated zero 
LIP       List of insignificant pixels 
LIS       List of insignificant sets 
LSP       List of significant pixels 
MAP       Significance map 
MSIG      Marked significant coefficient 
NEG       Negative coefficient 
NZF       Not zerotree root flag 
POS       Positive coefficient 
PSNR      Peak signal-to-noise ratio 
SAQ       Successive approximation quantization 
SCIWT     Significance checking in wavelet trees algorithm 
SDF       Significant descendant flag 
SGDF      Significant grand descendant flag 
SIG       Significant coefficient flag 
SIGN      Coefficient sign 
SPIHT     Set partitioning in hierarchical trees algorithm 
SP        Smart pixel 
SQ        Scalar quantization 
TS        Tree search 
VQ        Vector quantization 
WVQ       Wavelet vector quantization algorithm 
ZTR       Zerotree root 
Chapter 1 
Introduction 
The emerging technology of image communication over wireless transmission 
channels requires several new challenges to be simultaneously met at the algorithm and 
architecture levels. At the algorithm level, desirable features include high coding 
performance, bit stream scalability, robustness to transmission errors and suitability for 
content-based coding schemes. The first feature, high coding performance, is essential 
to satisfy the limited bandwidth of wireless channels, a requirement which is not currently 
met by conventional standards such as JPEG, MPEG and H.263. Bit stream scalability includes 
rate scalability and spatial scalability. Rate scalability is a useful feature where the 
number of bits to be transmitted can be dynamically varied to suit the channel 
bandwidth at a particular time. Spatial scalability provides the option for multimedia 
applications with different screen resolutions. The third desirable feature is robustness 
of the transmitted bit stream over the wireless channel. Wireless communication 
channels suffer from burst errors in which a large number of consecutive bits are lost or 
corrupted by the channel fading effect. The fourth desirable feature is suitability for 
content-based coding schemes, which are an important component in the new MPEG-4 
standard. At the architecture level, we require efficient architectures for construction of 
portable devices with small size and low power consumption. 
An important question is whether a single coding algorithm can be designed to meet 
these diverse requirements. In previous coding schemes, researchers adopted a 
divide-and-conquer approach in which a single feature was targeted for improvement, 
often at the expense of other features. The JPEG, MPEG and H.263 standards 
suffer from limited scalability features. Other schemes such as vector quantization 
(VQ) emphasize coding performance but require high complexity architectures. 
Recently, researchers working on improving different features have converged on a set 
of coding schemes commonly known as embedded wavelet algorithms [16], [17], [18], 
[19]. Currently, these algorithms achieve the highest coding performance reported in the 
literature. In addition, embedded wavelet algorithms have the natural feature of being 
able to meet a target bit rate precisely. Furthermore, work on improving the algorithm 
robustness has shown much promise [26], [27], [28], [29]. In this thesis, we will also 
demonstrate the simplicity of incorporating content-based coding into embedded 
wavelet algorithms. The potential of embedded wavelet techniques has been 
acknowledged by their inclusion in the new JPEG2000 and MPEG-4 image and video 
coding standards [32]. 
The consideration now is whether the algorithms can be efficiently implemented in 
hardware. When first reported in 1993, the Embedded Zerotree Wavelet (EZW) 
algorithm [16] created much interest in the research community. Up till then, the 
coding performance of image coding algorithms, which were based mostly on VQ, was 
improved at the expense of increased complexity in the number and size of the 
codebooks required. The EZW algorithm reversed this trend in that it achieves high 
coding performance using simple scalar quantization (SQ) techniques. It achieves this 
because it utilizes a form of image redundancy which was not being used in other 
coding schemes. The algorithm reorganizes the wavelet coefficients in the subbands 
into a tree structure which it then efficiently encodes for transmission. The coefficients 
in the lowest frequency subband form the root nodes and the coefficients in the highest 
frequency subbands form the leaf nodes. Specifically, the algorithm introduced the 
concept of a zerotree. The idea is that if a coefficient in the tree is insignificant with 
respect to a threshold value, then there is a high probability that the descendants of the 
coefficient would also be insignificant. In such a case, a special symbol known as a 
zerotree root (ZTR) is encoded to indicate to the decoder that there are no more 
significant coefficients and the search can move on to another branch in the tree. 
Given that the EZW algorithm uses simple SQ instead of the more complex VQ 
schemes, it would be natural to assume that it is suitable for hardware implementation 
and there would be several architectures reported in the literature. However, a search of 
the literature reveals that this is not the case. Whereas much work has been 
accomplished at the algorithm level, the same cannot be said at the architecture level. 
The disparity between the algorithm level and the architecture level is surprising 
considering that we need both levels to construct portable multimedia devices. A 
significant contribution at both the algorithm and architecture levels based on wavelet 
VQ (WVQ) has been reported very recently [33]. In this work, the authors shed some 
light on the complexities involved in designing embedded wavelet architectures. They 
postulate that embedded wavelet architectures require iterative methods to decide 
zerotrees. Unlike hardware architectures in general, the complexity in embedded
wavelet architectures does not only lie in its computational and storage requirements. 
The additional complexity lies in designing tree searching schemes which can be 
efficiently implemented in hardware. The WVQ algorithm simplifies the tree searching 
requirements by using a noniterative method to decide zerotrees. The search begins at 
the tree root and descends to lower levels of the tree structure only if the current node is 
significant. There is a possibility of misprediction when the current node is 
insignificant but there is a significant descendant node. The noniterative search scheme
will miss this.
1.1 Significance of this Work
Embedded wavelet algorithms satisfy many desirable features for wireless image 
communication. The barrier impeding their potential is the lack of tree searching schemes which
can be efficiently implemented in hardware. The objective of this thesis is to remove 
this barrier by proposing efficient embedded wavelet architectures. Our focus is to 
design suitable tree searching schemes for hardware implementation. There are 
basically two approaches for tree searching architectures: noniterative [33] or iterative
schemes [34], [35], [36], [37]. The iterative schemes which have been reported make
use of memory banks to store the wavelet coefficients where all the coefficients can be
accessed to establish the ancestor-descendant relationships in the tree structure. The
state-of-the-art for the noniterative scheme is the WVQ algorithm [33]. The
noniterative scheme simplifies the tree searching but suffers from the possibility of
misprediction. In this thesis, we return to the original concept of embedded wavelet 
algorithms using iterative tree searching. The significance of this work is to embrace 
iterative tree searching schemes wholeheartedly and to develop schemes which can be 
efficiently implemented in hardware. The iterative tree searching schemes do not suffer
from misprediction. 
We observe that whereas researchers have developed several hardware techniques 
such as parallel processing and pipelining to handle computational complexity, 
comparatively fewer techniques have been developed to handle complex data structures 
such as lists, trees and graphs. In contrast, software techniques to handle these data 
structures have been extensively developed. As the complexity of hardware systems
increases, data structures become essential tools to organize data for processing and
storage. Although this thesis is focused particularly on designing tree searching
architectures for embedded wavelet algorithms, we hope that the design process would
also provide some general insight into how to implement complex data structures in
hardware.
The approach taken in this thesis is to adopt a top-down design methodology
beginning from algorithm investigation and ending at hardware implementation. At
each stage in the design process, we evaluate the options available and select the most
suitable path for the next stage. To guide the direction of this thesis, we follow closely
the original EZW algorithm and the Set Partitioning in Hierarchical Trees (SPIHT)
algorithm [17]. The SPIHT algorithm is well known and is considered to be one of the
best image coders in the literature.
1.2 Contribution of this Thesis
This thesis consists of six chapters. Other than the first and last chapters which are 
the Introduction and Conclusion chapters, the remaining four chapters contain 
contributions corresponding to the different design levels of algorithm, architecture,
implementation and application. 
In chapter two, we outline the variations and options available for embedded wavelet 
algorithms and discuss the variations which are suitable for hardware implementation. 
We present a variation based on depth-first search (DFS) processing which has reduced 
storage and simpler processing for hardware implementation. The coding performances
of the DFS variants are only slightly lower than those of the original embedded wavelet
algorithms; they are higher than that of the WVQ algorithm at high bit rates and
comparable to it at lower bit rates.
Chapter three presents new architectures to implement embedded wavelet algorithms
based on the DFS representation of the coefficient tree. We consider two issues. The
first issue is the simplicity of the architectures for hardware implementation and the
second issue is the flexibility of the architectures to implement variations in the
algorithms. For the first issue of simplicity, we present novel bit stream (BS)
architectures which process the bits as they flow through the processor. For the
second issue of flexibility, we present parallel DFS BS architectures which provide
options to output in different formats to achieve higher coding performances.
Chapter four presents hardware approaches for the implementation of the DFS BS
SPIHT system. The input into the system during encoding is a stream of image pixels
and the output is an encoded bit stream ready for transmission. The system produces
the reconstructed image data from the received bit stream during decoding. We present
a hardware implementation of the DFS BS SPIHT system which can be switched to
perform encoding and decoding on the same device.
Chapter five presents applications using the DFS embedded wavelet algorithms. We 
will discuss two applications for the DFS algorithms. The first application is the use of
the DFS EZW algorithm for robust coding and the second application is for
content-based coding using the DFS implementation of the Improved EZW algorithm.
Chapter 2 
Embedded Wavelet Algorithms for Hardware Implementation
2.1 Introduction
A study into the different variations available for embedded wavelet (EW)
algorithms may provide two benefits. Firstly, it may allow the development of an
algorithm variation which has a higher coding performance than existing algorithms.
Secondly, a variation of the algorithm may be found to be suitable for hardware
implementation. The study of EW algorithms which are suitable for hardware
implementation lags behind the study to improve the performance of the algorithms,
which is very far advanced. With the introduction of the JPEG2000 and MPEG-4
coding standards, there has been increasing interest in implementing EW algorithms in
hardware. Previous approaches to simplify EW algorithms for hardware
implementation have looked primarily at reducing the storage requirements [40], [41],
[42], [43]. Other factors which are desirable for hardware implementation are simpler
processing and efficient addressing of wavelet coefficients in the tree structure. The
first two factors of lower storage requirements and simpler processing apply to
hardware architectures in general. The third factor of efficient addressing of nodes in
the tree structure is unique to the implementation of tree-based coding schemes such as
EW algorithms.
This chapter is organized as follows. In Section 2.2, we outline the variations and
options available for EW algorithms. In Section 2.3, we present a tree searching (TS)
variation based on depth-first search (DFS) processing which allows for efficient
addressing of coefficient nodes in the tree structure. In addition, the DFS variants
require less storage than previous approaches and allow for simpler processing. We
perform simulations to compare the coding performances of the DFS variants with their
original EW algorithms in Section 2.4 and present conclusions in Section 2.5.
2.2 Embedded Wavelet Algorithms and their Variations
Figure 2.1 shows a block diagram of an EW algorithm. The discrete wavelet
transform (DWT) transforms the image pixels into their corresponding wavelet
coefficients. These wavelet coefficients are passed into the TS module to generate a
symbol stream. The generated symbol stream is then passed into an arithmetic coder to 
obtain a compressed bit stream which is transmitted. 
[Figure: image pixels -> Discrete Wavelet Transform -> wavelet coefficients -> Tree Searching -> symbol stream -> Arithmetic Coder -> bit stream]
Figure 2.1 Block diagram of an embedded wavelet algorithm
The symbol stream is generated from multiple passes of coding the coefficients 
where a threshold value is associated with each pass. Initially, the threshold value is set
to half the maximum value of the coefficients and is further halved for each subsequent 
pass. The number of passes to be performed depends on the target bit rate to be met. 
Two types of symbols are contained in the symbol stream: 
• Significance map (MAP) symbols which encode the positions and signs of the 
significant coefficients in the tree structure and 
• Successive-approximation quantization (SAQ) symbols which encode the
approximate magnitudes of the significant coefficients in decreasing order of
importance.
A coefficient is defined as significant if the magnitude of the coefficient value is greater
than or equal to the threshold value. The order of generation of the symbol stream
alternates between MAP and SAQ symbols as shown in Figure 2.2.
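The pass structure described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names `thresholds` and `is_significant` and the sample coefficient values are hypothetical and are not taken from the thesis.

```python
def thresholds(coeffs, passes):
    """Yield one threshold per pass: half the maximum magnitude, then halved each pass."""
    t = max(abs(c) for c in coeffs) / 2
    for _ in range(passes):
        yield t
        t /= 2


def is_significant(c, t):
    """A coefficient is significant if its magnitude is >= the threshold."""
    return abs(c) >= t


# Hypothetical coefficient values, used only to exercise the two functions.
coeffs = [-31, 14, 47, -9, 5]
schedule = list(thresholds(coeffs, 3))
significant_per_pass = [[c for c in coeffs if is_significant(c, t)] for t in schedule]
```

Each additional pass lowers the threshold, so the set of significant coefficients can only grow from one pass to the next.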
[Figure: wavelet coefficients -> Tree Searching -> symbol stream of alternating MAP and SAQ symbols for Pass 1, Pass 2, ...]
Figure 2.2 Tree searching module 
The feature common in all EW algorithms is that the algorithms establish a tree
structure amongst the coefficients in the subbands. The algorithms then use the tree
structure to efficiently encode the MAP. But the algorithms differ in:
• The structure of the coefficient tree used;
• The strategy to search the coefficient tree for significant coefficients;
• The scheme to code the coefficient tree and the
• Generation and transmission of the symbol stream so that symbols which are more
significant are transmitted ahead of less significant symbols.
The fourth difference of generation and transmission of the symbol stream applies not
only to MAP coding but also to SAQ coding where the SAQ symbols can be reordered
for transmission as in the EZW algorithm.
2.2.1 Coefficient Tree Structure
The first variation available for EW algorithms is the structure of the coefficient tree.
Several EW algorithms use the tree structure in the EZW algorithm. Figure 2.3 shows a
three scale DWT decomposition and the tree structure for the EZW algorithm. From the
lowest frequency subband, the coefficients begin an ancestor-descendant hierarchy of
coefficients descending all the way to the highest frequency subbands. Each parent
coefficient in the tree structure has four children coefficients except at the lowest
frequency subband where each parent coefficient has three children coefficients. The
coefficient tree structure contains a number of subtrees equal to the number of
coefficients in the lowest frequency subband.
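The parent-child rule above can be sketched as an index calculation. This is a hedged illustration, not the addressing scheme of the thesis: it assumes a square image, the common subband layout with the lowest frequency subband in the top-left corner, and the hypothetical function names `ezw_children` and `subtree_size`.

```python
def ezw_children(r, c, size, scales):
    """Children of coefficient (r, c) in a size x size DWT with `scales` levels.

    Assumed layout: lowest frequency subband in the top-left corner.
    """
    ll = size >> scales                      # side length of the lowest frequency subband
    if r < ll and c < ll:                    # root node: one child in each of HL, LH, HH
        return [(r, c + ll), (r + ll, c), (r + ll, c + ll)]
    if 2 * r >= size or 2 * c >= size:       # highest frequency subbands: leaf nodes
        return []
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def subtree_size(r, c, size, scales):
    """Number of nodes in the subtree rooted at (r, c), including the root itself."""
    return 1 + sum(subtree_size(cr, cc, size, scales)
                   for cr, cc in ezw_children(r, c, size, scales))
```

For an 8 x 8 image with three scales there is a single root whose subtree covers all 64 coefficients; for a 16 x 16 image with two scales there are 16 roots, matching the statement that the number of subtrees equals the number of coefficients in the lowest frequency subband.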
[Figure: three scale DWT subband layout showing the LL, HL, LH and HH subbands and their ancestor-descendant links]
Figure 2.3 Ancestor-descendant relationship of subbands in EZW algorithm 
Figure 2.4 shows the structure of a coefficient subtree. The subtree is ordered in 
several ways: 
• Firstly, the tree is ordered from the lowest frequency subbands at the root node to
the highest frequency subbands at the leaf nodes;
• Secondly, with the exception of the lowest frequency subband, the three branches of
the tree are ordered according to their frequency subbands and
• Thirdly, the root node links the three frequency branches according to their spatial
position in the subbands. The frequency branches are arranged in the HL, LH and
HH order.
[Figure: subtree rooted at a lowest frequency coefficient, with HL, LH and HH branches descending through the level 2 nodes to the level 1 leaf nodes]
Figure 2.4 EZW coefficient subtree 
A variation of the coefficient tree structure is that used in the SPIHT algorithm. One
notable difference between the tree structure in the SPIHT algorithm and the tree
structure in the EZW algorithm is in the linking of the root nodes in the lowest
frequency subband. In the EZW tree structure, each coefficient in the lowest frequency
subband is linked to three children nodes in the next higher frequency subbands. In the
SPIHT tree structure, each coefficient in the lowest frequency subband is either linked
to four children nodes or to none. Figure 2.5 shows the ancestor-descendant
relationship for the SPIHT algorithm. The root node marked by the asterisk has no
descendants and is not linked to the three children nodes.
Figure 2.5 Ancestor-descendant relationship of subbands in SPIHT algorithm 
Figure 2.6 shows the coefficient subtree for the SPIHT algorithm. A point to note is 
that the tree structure is for a two scale DWT decomposition. The inconsistent 
treatment of the root nodes in the SPIHT tree structure increases the complexity for 
hardware implementation. 
Figure 2.6 SPIHT coefficient subtree 
2.2.2 Tree Searching Scheme 
A second variation amongst EW algorithms is in the scheme to search the coefficient 
tree for significant coefficients. Figure 2.7 shows the different schemes which can be 
used to scan the wavelet subbands. The simplest scheme for hardware implementation 
is the raster scan. The EZW algorithm uses the raster scan scheme. The scan begins 
from the lowest frequency subband and after all the coefficients in the subband have 
been scanned proceeds to the next higher frequency subband. A subband scanning 
scheme with slightly more complexity is the boustrophedon scan, from the Greek for
"how an ox plows a field". This scan is similar to the raster scan except that the scan
order for every other row is reversed. The third subband scanning scheme which can be 
used is the Morton scan. This scanning scheme is similar to the raster scan except that 
the coefficients in the subbands are scanned in a fractal-like manner. The final subband 
scanning scheme with the highest complexity for hardware implementation is the Peano
scan. This scan combines the row reversal of the boustrophedon scan with the fractal 
manner of the Morton scan. 
[Figure panels: (a) Raster scan, (b) Boustrophedon scan, (c) Morton scan, (d) Peano scan]
Figure 2.7 Subband scanning schemes 
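Three of the four scan orders can be sketched for a single n x n subband as below. This is an illustrative sketch under assumptions: the function names are hypothetical, the Morton order assumes n is a power of two and visits the column before the row within each 2 x 2 block, and the Peano scan is omitted.

```python
def raster(n):
    """Row by row, left to right."""
    return [(r, c) for r in range(n) for c in range(n)]


def boustrophedon(n):
    """Like the raster scan, but every other row is scanned right to left."""
    order = []
    for r in range(n):
        cols = range(n) if r % 2 == 0 else range(n - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def morton(n):
    """Fractal-like (Z-order) scan: de-interleave the bits of the scan index."""
    order = []
    for k in range(n * n):
        r = c = 0
        for b in range(n.bit_length()):
            c |= ((k >> (2 * b)) & 1) << b        # even bits give the column
            r |= ((k >> (2 * b + 1)) & 1) << b    # odd bits give the row
        order.append((r, c))
    return order
```

All three functions visit the same set of positions; only the order differs, which is what determines the addressing logic needed in hardware.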
The four subband scanning schemes in Figure 2.7 are similar in that the schemes
scan all the coefficients in the lowest frequency subband first before scanning the
coefficients in the higher frequency subbands. The root nodes in the coefficient tree
structure are scanned first. The subband scanning schemes are similar to the
breadth-first search (BFS). BFS traversal scans the tree structure at different levels beginning
from the root nodes of the tree and ending at the leaf nodes. The scanning order for the
tree structure in Figure 2.4 is LL3, HL3, LH3, HH3, HL2, HL2, HL2, HL2, LH2, ..., HH1,
HH1, HH1, HH1. For a coefficient tree with more than one subtree, BFS scans all the
nodes on the same level before it scans the nodes in another level. Figure 2.8 shows the
BFS traversal of a coefficient tree containing N subtrees.
[Figure: Subtree 1, Subtree 2, ..., Subtree N, traversed level by level across all subtrees]
Figure 2.8 BFS traversal of coefficient tree containing N subtrees 
In contrast to the subband scanning schemes and the BFS, depth-first search (DFS)
traversal scans the tree structure beginning from the root node and descends all the way
to the leaf nodes before it backtracks up a level. The scanning order for the tree
structure in Figure 2.4 is LL3, HL3, HL2, HL1, HL1, HL1, HL1, HL2, ..., HH2, HH1,
HH1, HH1, HH1. For a coefficient tree with more than one subtree, DFS scans all the
nodes in a subtree before it scans the nodes in another subtree. The DFS performs a
natural partitioning of the coefficient tree into a number of subtrees which can be
processed independently. Figure 2.9 shows the DFS traversal of a coefficient tree
containing N subtrees.
[Figure: Subtree 1, Subtree 2, ..., Subtree N, each traversed completely before the next subtree]
Figure 2.9 DFS traversal of coefficient tree containing N subtrees
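The two fixed traversal orders can be compared on a small model of the EZW subtree. The sketch below is illustrative only: the dictionary-based tree and the names `build_subtree`, `bfs` and `dfs` are assumptions made for this example, not structures defined in the thesis.

```python
from collections import deque
from itertools import count


def build_subtree(levels=3):
    """Model of one EZW subtree: an LL root with HL, LH and HH branches."""
    tree, label, ids = {}, {}, count()

    def new(name):
        i = next(ids)
        tree[i], label[i] = [], name
        return i

    def grow(parent, band, level):
        node = new(f"{band}{level}")
        tree[parent].append(node)
        if level > 1:                      # level 1 nodes are leaves
            for _ in range(4):
                grow(node, band, level - 1)

    root = new(f"LL{levels}")
    for band in ("HL", "LH", "HH"):
        grow(root, band, levels)
    return tree, label, root


def bfs(tree, label, root):
    """Breadth-first: all nodes of one level before any node of the next."""
    order, queue = [], deque([root])
    while queue:
        i = queue.popleft()
        order.append(label[i])
        queue.extend(tree[i])
    return order


def dfs(tree, label, root):
    """Depth-first: descend to the leaves before backtracking up a level."""
    order, stack = [], [root]
    while stack:
        i = stack.pop()
        order.append(label[i])
        stack.extend(reversed(tree[i]))
    return order


tree, label, root = build_subtree(3)
bfs_order = bfs(tree, label, root)
dfs_order = dfs(tree, label, root)
```

The BFS needs a queue that can hold a whole level of the tree, whereas the DFS stack only ever holds the unvisited siblings along the current path.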
The search orders for the BFS and DFS tree traversal schemes are fixed and the
traversal follows a predetermined order. An example of a TS strategy where the
traversal is not fixed is the search strategy used in the SPIHT algorithm. In this
algorithm, three ordered lists are maintained to store the significance information. The
three lists are the list of insignificant sets (LIS), list of insignificant pixels (LIP) and list
of significant pixels (LSP). The search order of the coefficient tree is according to the
set partitioning strategy. The coefficients are processed in the order indicated in the
LIS.
2.2.3 Tree Coding Scheme
Another variation for EW algorithms is the scheme used for MAP coding of the
coefficient tree. SAQ coding is similar in all EW algorithms. For the coding of the
MAP in each pass, each node in the coefficient tree is associated with a set of binary
symbols as shown in Table 2.1. The MAP symbols are divided into two categories:
coding symbols and processing symbols.
Table 2.1 MAP symbols for coding of coefficient tree
Category   | Symbol | Description
Coding     | SIGN   | Sign bit of the coefficient
Coding     | SIG    | Significant Coefficient Flag - indicate if this coefficient position is significant
Coding     | SDF    | Significant Descendant Flag - indicate if this coefficient position has any significant descendant coefficient
Coding     | SGDF   | Significant Grand Descendant Flag - indicate if this coefficient position has any significant descendant coefficient with the exception of its immediate children
Processing | MSIG   | Marked Significant Coefficient Flag - indicate if this coefficient has already been found to be significant in a previous pass
Processing | AOMSIG | All Offspring Marked Significant Coefficient Flag - indicate that all the descendants of the coefficient are already MSIG nodes
MAP coding symbols are the only symbols sent to the decoder. The SDF and SGDF 
symbols are only sent to the decoder in cases when the decoder cannot infer their 
values. For example, the SDF symbol is not sent for leaf nodes when the decoder can 
infer that SDF = 0. Similarly, the SGDF symbol is not sent for leaf nodes and their
parent nodes as the decoder can infer that SGDF = 0. The SDF and SGDF symbols
code the significance of the descendants. In effect, these symbols are used to tell the
decoder how to search the coefficient tree. Table 2.2 shows the effect on the tree search
using these symbols.
Table 2.2 Effect on tree search using SDF and SGDF symbols
SDF | SGDF | Effect on tree search
0   | X    | Stop search after current node
1   | 0    | Stop search after children nodes
1   | 1    | Continue search
Figure 2.10 shows an example of a coefficient subtree and Table 2.3 shows the MAP
coding symbols for a threshold value of 32. In the table, the root coefficient -31 is less
than the threshold value in magnitude and the SIG symbol is set to 0. The coefficient value is
negative which is indicated by the SIGN symbol of 1. The coefficient has at least one
significant descendant coefficient and the SDF symbol is set to 1. The coefficient has a
significant grand descendant and the SGDF symbol is set to 1.
Figure 2.10 Example of coefficient subtree
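Table 2.3 Example of MAP coding symbols for a threshold value of 32
Coefficient | SIG | SIGN | SDF | SGDF
-31         | 0   | 1    | 1   | 1
15          | 0   | 0    | 0   | 0
-5          | 0   | 1    | 0   | 0
9           | 0   | 0    | 0   | 0
3           | 0   | 0    | 0   | 0
0           | 0   | 0    | 0   | 0
14          | 0   | 0    | 1   | 0
-1          | 0   | 1    | 0   | 0
47          | 1   | 0    | 0   | 0
-3          | 0   | 1    | 0   | 0
2           | 0   | 0    | 0   | 0
-9          | 0   | 1    | 0   | 0
2           | 0   | 0    | 0   | 0
-3          | 0   | 1    | 0   | 0
5           | 0   | 0    | 0   | 0
11          | 0   | 0    | 0   | 0
-7          | 0   | 1    | 0   | 0
6           | 0   | 0    | 0   | 0
-4          | 0   | 1    | 0   | 0
5           | 0   | 0    | 0   | 0
6           | 0   | 0    | 0   | 0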
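The symbol values in Table 2.3 can be reproduced directly from the definitions in Table 2.1. The sketch below rests on one assumption that should be read with care: the 21 rows of the table are taken to list a root, its four children and their four leaf children each in depth-first order, which is consistent with the tabulated SDF and SGDF values but is an inference, since the subtree figure itself is not reproduced here. The name `map_symbols` is hypothetical.

```python
# Coefficients in the row order of Table 2.3 (assumed to be depth-first order).
ROWS = [-31, 15, -5, 9, 3, 0, 14, -1, 47, -3, 2, -9, 2, -3, 5, 11, -7, 6, -4, 5, 6]


def map_symbols(rows, threshold):
    """Return (coefficient, SIG, SIGN, SDF, SGDF) for each row."""

    def sig(c):
        return int(abs(c) >= threshold)

    def sign(c):
        return int(c < 0)

    root, rest = rows[0], rows[1:]
    # Each group holds one child of the root followed by its four leaf children.
    groups = [rest[k:k + 5] for k in range(0, len(rest), 5)]
    child_sdf = [int(any(sig(x) for x in g[1:])) for g in groups]
    root_sdf = int(any(sig(g[0]) for g in groups) or any(child_sdf))
    root_sgdf = int(any(child_sdf))          # significance below the immediate children
    out = [(root, sig(root), sign(root), root_sdf, root_sgdf)]
    for g, sdf in zip(groups, child_sdf):
        out.append((g[0], sig(g[0]), sign(g[0]), sdf, 0))
        out.extend((leaf, sig(leaf), sign(leaf), 0, 0) for leaf in g[1:])
    return out


table = map_symbols(ROWS, 32)
```

Only the coefficient 47 is significant at this threshold, so the only nodes with SDF = 1 are its parent (14) and the root, and only the root has SGDF = 1.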
MAP processing symbols are not sent to the decoder but are only kept at the encoder
and decoder for processing. The MSIG symbol denotes which coefficients have already
been found to be significant in previous passes. Other than the MSIG symbol, an
additional symbol which can be kept at the encoder and decoder for processing is the
AOMSIG symbol which is used to indicate that all the descendants of the coefficient are
already MSIG nodes. When AOMSIG = 1, the encoder does not need to send the SDF
and SGDF symbols as the decoder can infer that SDF = 0 and SGDF = 0.
The MAP symbols in Table 2.1 can be used to implement tree coding schemes for a
number of EW algorithms. For example, the MAP coding representation for the EZW
algorithm is shown in Table 2.4. The symbol values in italics denote that this
information is used for encoding the symbols only and is not part of the symbols. It is
"assumed" and therefore it is not sent. In the table, five MAP symbols are used to
represent the various information of the coefficient tree in the current pass. In terms of
the search, the isolated zero (IZ) symbol explicitly indicates that the search should
continue down to lower levels from this coefficient while the zerotree root (ZTR)
symbol indicates that the search should stop at this coefficient level.
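Table 2.4 MAP coding scheme for EZW algorithm
         | Symbol | SIG | SIGN | SDF | SGDF
MSIG = 0 | POS    | 1   | 0    | 1   | 1
MSIG = 0 | NEG    | 1   | 1    | 1   | 1
MSIG = 0 | IZ     | 0   | X    | 1   | 1
MSIG = 0 | ZTR    | 0   | X    | 0   | 0
MSIG = 0 |        | 0   | X    | 0   | 0
MSIG = 1 | MSIG   | 0   | X    | 1   | 1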
For the positive (POS) and negative (NEG) symbols, the original EZW algorithm 
assumes that if the current coefficient is significant, then there would probably be 
significant coefficients which are descendants of the current coefficient at lower levels. 
Based on this assumption, the search will continue to lower levels. However, if there 
are no significant descendant coefficients, additional ZTR symbols would be required at
lower levels to stop the search. The information to indicate whether the search should
continue from a node to lower levels is given by the SDF symbol. In the case of the IZ
symbol, SDF = 1, while for the ZTR symbol, SDF = 0. For MSIG coefficients, the
original EZW algorithm continues the search to lower levels.
To incorporate explicitly into the POS and NEG symbols the information on whether the
search should continue to lower levels from the corresponding coefficient node, we
attach the SDF to these two symbols. Shown in Table 2.5 are the modified POS and
NEG symbols together with other modifications that improve the performance of the
EZW algorithm. Some of the modifications are similar to those reported in [19] but we
describe the modifications in relation to the MAP coding representation.
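Table 2.5 MAP coding scheme for Improved EZW algorithm
         | Symbol   | SIG | SIGN | SDF | SGDF
MSIG = 0 | POS_SD   | 1   | 0    | 1   | 1
MSIG = 0 | POS_ZT   | 1   | 0    | 0   | 0
MSIG = 0 | POS_LEAF | 1   | 0    | 0   | 0
MSIG = 0 | NEG_SD   | 1   | 1    | 1   | 1
MSIG = 0 | NEG_ZT   | 1   | 1    | 0   | 0
MSIG = 0 | NEG_LEAF | 1   | 1    | 0   | 0
MSIG = 0 | IZ       | 0   | X    | 1   | 1
MSIG = 0 | ZTR      | 0   | X    | 0   | 0
MSIG = 0 | ZTR_LEAF | 0   | X    | 0   | 0
MSIG = 1 | IZ_MSIG  | 0   | X    | 1   | 1
MSIG = 1 | ZTR_MSIG | 0   | X    | 0   | 0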
While the first two bits, SIG and SIGN, are the same as those in Table 2.4 for the
POS and NEG symbols, the differences are in the additional SDF bit. For the POS_SD
(Positive Significant Coefficient with Significant Descendant), the SDF has a value of 1,
while for the POS_ZT (Positive Significant Coefficient and Zerotree Root), the value is
0. The binary representations of these two symbols are 101 and 100 respectively. The
other symbol POS_LEAF (Positive Significant at Leaf Node) is for the case that the
coefficient is at the lowest level of the tree, i.e. a leaf node, where there are no
descendants. In this case, SDF is always 0 and is not sent, and the binary representation
for this symbol is 10. The corresponding symbols for negative significant coefficients
are shown in the table.
The other symbols are for the cases where the coefficients are not significant or the
coefficients have been found significant in the previous passes. In the latter case, SIG is
always 0. The symbols IZ and ZTR have the same meanings as those shown in Table
2.4. The new symbol IZ_MSIG indicates that the coefficient has been found significant
in a previous pass. Therefore the SIG bit is not sent since we do not need to send any
more significance information about this coefficient. Similarly, there is no need to send
the SIG bit for ZTR_MSIG. The ZTR_LEAF symbol in Table 2.5 indicates a
non-significant coefficient at a leaf node where only the SIG = 0 needs to be sent. When a
coefficient at a leaf node is found significant, it will have MSIG set to 1 and SIG to 0 in
all subsequent passes. Together with the fact that SDF is always 0 for a leaf node, a
MSIG leaf node is virtually deleted from the coefficient tree for future passes of the
search. Finally, a leaf coefficient will never have
an IZ symbol.
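The bit patterns described for the Improved EZW symbols can be summarised in a short function. This is a sketch of one reading of the text: the representations 101, 100 and 10 are stated above, while the patterns for the remaining symbols are derived here from the rules that inferable bits are not sent, and should be treated as an assumption. The name `improved_ezw_bits` is hypothetical.

```python
def improved_ezw_bits(sig, sign, sdf, leaf, msig):
    """Bits sent for one node; bits the decoder can infer are omitted."""
    if msig:
        # SIG is known to be 0; a marked leaf sends nothing at all.
        return "" if leaf else str(sdf)
    bits = str(sig)
    if sig:
        bits += str(sign)          # the sign is only meaningful for significant nodes
    if not leaf:
        bits += str(sdf)           # SDF is always 0 at a leaf and is not sent
    return bits
```

For example, `improved_ezw_bits(1, 0, 1, False, False)` gives the POS_SD pattern and `improved_ezw_bits(0, 0, 0, True, False)` gives the single bit sent for ZTR_LEAF.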
Two kinds of modifications have been made to the basic EZW algorithm. The first
one is to attach an additional bit, i.e. SDF, to the significant coefficient symbols. This
will increase the number of bits to code a significant coefficient by 1 in some cases, but
will decrease the number of ZTR symbols if the corresponding coefficient has no
significant descendants. Whether this will lead to an improved performance of the
EZW algorithm depends on the characteristics of the image being coded [19]. The other
kind of modifications are certain to reduce the number of bits required to code an image
by not sending the information the decoder already has. The MAP coding
representation for the Improved EZW algorithm can be further improved by explicitly
sending the SGDF bit. The SGDF bit is only sent if SDF = 1. This scheme would be
similar to the MAP coding scheme for the SPIHT algorithm.
2.2.4 Generation and Transmission of Symbol Stream 
The fourth variation available for EW algorithms is the generation and transmission 
of the symbol stream. For high performance, symbols which are more significant 
should be transmitted ahead of symbols which are less significant. In each coding pass,
the SAQ symbols should be transmitted ahead of the corresponding MAP symbols. In
the EZW and SPIHT algorithms, coefficients which have been found to be significant
(MSIG) are taken out of the coefficient tree and put into a separate queue. During each
pass, the MAP symbols are generated from traversing the coefficient tree
and the SAQ symbols are generated from scanning the MSIG coefficient queue. The
order of generation of the symbol stream alternates between MAP and SAQ symbols.
A variation for the generation of the symbol stream would be to leave MSIG
coefficients in the coefficient tree. When traversing the coefficient tree, a MAP symbol
is generated when MSIG = 0 and a SAQ symbol is generated when MSIG = 1. The
symbol stream would be a mixture of MAP and SAQ symbols as shown in Figure 2.11.
Another option available for EW algorithms is to reorder the SAQ symbols before 
transmission. The SAQ symbols are reordered in descending order based on the 
reconstructed magnitude of the wavelet coefficients at the decoder as in the EZW 
algorithm. 
[Figure: wavelet coefficients -> Tree Searching -> symbol stream of mixed MAP + SAQ symbols for Pass 1, Pass 2, ...]
Figure 2.11 Mixed MAP and SAQ symbol stream
2.3 Features of DFS for Hardware Implementation 
In this section, we present a variation of EW algorithms for hardware 
implementation based on depth-first search (DFS) processing. The features of the DFS 
EW algorithms are: 
1. The depth-first tree searching strategy is used;
2. Coefficients which have been found to be significant are not taken out of the 
coefficient tree but are marked significant (MSIG); 
3. The symbol stream is a mixture of MAP and SAQ symbols; 
4. The SAQ symbols are not reordered for transmission; 
5. The wavelet coefficients are processed using the sign-magnitude representation and
6. The threshold values are powers of two. 
Figure 2.12 shows the DFS variant for the EZW algorithm. The DFS variants for the
Improved EZW and SPIHT algorithms have similar structure. The wavelet coefficients
are in the sign-magnitude representation where the sign bits are stored in SIGN and the
magnitude bits are stored in MAG. The number of magnitude bits used to represent the
coefficients is given by n and each tree node is represented by (i). The symbol LEAF(i)
indicates if the node is a leaf in the coefficient tree. The DFS algorithms have
advantages for hardware implementation. They have reduced storage and simpler
computation, and allow for efficient addressing of the nodes in the coefficient tree.
DFS EZW Algorithm:
1) Initialization: Output n and set MSIG(i) to 0 for all tree nodes
2) Coding Pass:
   For each node (i) in the coefficient tree do:
      If MSIG(i) = 1 then
         • Output MAG(i,n)
      Else
         • Output SIG(i)
         If SIG(i) = 1 then
            • Output SIGN(i)
            • Set MSIG(i) = 1
         Else
            If LEAF(i) = 0 then
               If AOMSIG(i) = 0 then
                  • Output SDF(i)
               End If
            End If
         End If
      End If
   End For
3) Quantization-Step Update: decrement n by 1 and go to Step 2
Figure 2.12 DFS EZW algorithm
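The coding pass of Figure 2.12 can be transcribed almost line for line into runnable form. The sketch below is hedged in several ways: the tree is a plain dictionary, the names `descendants`, `dfs_order` and `coding_pass` are hypothetical, pass n is taken to use the threshold 2**n, SDF is taken to mean that some not yet marked descendant reaches that threshold, and every node is visited in depth-first order, so the pruning of the search below a node with SDF = 0 is not modelled.

```python
def descendants(tree, i):
    """All descendants of node i in depth-first order; tree maps node -> children."""
    out, stack = [], list(reversed(tree.get(i, [])))
    while stack:
        j = stack.pop()
        out.append(j)
        stack.extend(reversed(tree.get(j, [])))
    return out


def dfs_order(tree, root):
    return [root] + descendants(tree, root)


def coding_pass(tree, root, coeff, msig, n):
    """One pass at bit plane n; returns a list of (symbol, node, bit) and updates msig."""
    out = []
    for i in dfs_order(tree, root):
        mag = abs(coeff[i])
        if msig[i]:
            out.append(("MAG", i, (mag >> n) & 1))         # SAQ symbol: next magnitude bit
            continue
        sig = int(mag >= (1 << n))
        out.append(("SIG", i, sig))
        if sig:
            out.append(("SIGN", i, int(coeff[i] < 0)))
            msig[i] = True
        else:
            desc = descendants(tree, i)
            if desc and not all(msig[d] for d in desc):    # LEAF = 0 and AOMSIG = 0
                sdf = int(any(not msig[d] and abs(coeff[d]) >= (1 << n) for d in desc))
                out.append(("SDF", i, sdf))
    return out


# Hypothetical five-node tree used only to exercise the pass.
tree = {0: [1, 2], 1: [3, 4]}
coeff = {0: -31, 1: 14, 2: 9, 3: 47, 4: -3}
msig = {i: False for i in coeff}
pass1 = coding_pass(tree, 0, coeff, msig, 5)   # threshold 32
pass2 = coding_pass(tree, 0, coeff, msig, 4)   # threshold 16
```

In the second pass the node holding 47 is already marked, so it contributes a MAG bit in the middle of the MAP symbols, which is the mixed symbol stream of feature 3.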
2.3.1 Partitioning of Coefficient Tree into Subtrees 
Feature (1) is that the depth-first tree searching strategy is used. The DFS performs a
natural partitioning of the coefficient tree into subtrees which can be processed
independently. Figure 2.13 shows the storage arrangements for the BFS and DFS tree
traversal schemes. The BFS arrangement is shown in Figure 2.13(a). The root nodes in
the coefficient tree are located at level 0 and the leaf nodes are located at level 2. The
BFS arrangement stores the coefficient tree according to its different levels. By
scanning the storage in a sequential manner, all the root nodes at level 0 will be
processed before the nodes at the next level are processed. Figure 2.13(b) shows the
DFS arrangement. The DFS performs a natural partitioning of the coefficient tree into a
number of subtrees which are arranged one after another in the storage. The storage
requirements can be reduced to the storage required for a single subtree. After one
subtree has been processed, the storage can be reused for processing the next subtree.
[Figure: (a) BFS storage arranged by level 0, level 1, level 2; (b) DFS storage arranged by subtree 1, subtree 2, each holding its own levels]
Figure 2.13 DFS partitioning of coefficient tree into subtrees
2.3.2 Propagation of Significance Symbol
One complexity for processing of the coefficient tree is the determination of the SDF
and SGDF significance symbols. The DFS simplifies the propagation of the
significance symbols in the tree by propagating the significance symbols to the parent
node as soon as the necessary children nodes have been searched. Figure 2.14 shows
the propagation order of the significance symbols using the DFS. The propagation
order begins from the leaf nodes and ends at the root node. The DFS can begin from
the HH1 children nodes and as soon as the four children nodes have been searched, the
significance symbols are propagated to the HH2 parent node. After the HH2, LH2 and
HL2 nodes have been searched, the symbol is propagated to the LL2 node. The
significance symbols for the LL2 root node can be immediately determined after the
determination of the significance symbols for the HH2, LH2 and HL2 branch nodes.
[Figure: propagation order under the DFS, running from the level 1 leaf nodes (HL1, LH1, HH1) through the level 2 nodes (HL2, LH2, HH2) to the root]
Figure 2.14 DFS propagation order
2 .3 .3  MS IG Quanti zat ion 
In EW algorithms, the coding of the MAP symbols is generated from traversing the
coefficient tree and the SAQ symbols are generated from scanning the MSIG coefficient
queue. In each coding pass, coefficients which have been found to be significant are
transferred from the coefficient tree to the MSIG coefficient queue. This would require
variable storage schemes or a worst-case of allocating the maximum storage required
for both the coefficient tree and MSIG coefficient queue. Feature (2) removes the need
for a separate storage for MSIG coefficients. The MSIG symbol is used to keep track of
which coefficients have been found to be significant. Similar approaches to reduce
storage requirements for coefficients which have been found to be significant have been
used in [40], [41], [42]. These approaches use the set partitioning scheme for the search
strategy. The DFS approach requires less storage because it is a fixed traversal scheme.
For a 512x512 grayscale image, the set partitioning approach requires 320K of storage
while the DFS approach requires 256K of storage.
2.3.4 Bit Stream Processing 
Feature (3) is to transmit the symbols as they are generated to avoid having to store
the symbols before they are transmitted. Feature (4) omits the high processing cost of
reordering the SAQ symbols for transmission. Features (5) and (6) use the advantages
of bit stream processing using powers of two threshold values. They take advantage of
the fact that the wavelet coefficients are in their sign-magnitude representation. One
requirement during encoding is to determine the SIG symbol. In BS processing, this
can be directly obtained from the magnitude representation of the coefficient without
having to perform any comparison operations. For example, in the DFS EZW algorithm
in Figure 2.12, SIG(i) can be simply replaced with MAG(i,n). In decoding, one
requirement is the calculation of the approximated coefficient magnitudes after the end
of transmission. Using bit stream processing, this can be easily achieved by treating the
MSIG symbol as if it is the last SAQ symbol received from the encoder. This has the
effect of appending a '1' to the approximated magnitudes of MSIG nodes. For nodes
which are not MSIG, the values of the approximated magnitudes remain 0. The
approximated magnitudes can be determined without having to perform any arithmetic
operations.
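The two bit-level shortcuts described above can be illustrated with a minimal Python sketch. The function names and the convention that bit planes are numbered from 6 (MSB) down to 0 are assumptions made for the example; the decoder rule follows our reading of the text, namely that the MSIG flag is placed in the next lower bit position after the last received plane.

```python
def sig_bit(magnitude, n):
    """SIG at bit plane n read directly from the magnitude (threshold 2**n)."""
    return (magnitude >> n) & 1

def approximate(magnitude, last_plane):
    """Decoder-side magnitude once planes 6..last_plane have been received.

    Assumed reading of the text: a node that became significant (MSIG) gets a
    '1' appended below the last received bit; other nodes stay at 0.
    """
    received = (magnitude >> last_plane) << last_plane   # bits the decoder knows
    msig = 1 if received else 0                          # node already significant?
    if last_plane == 0:
        return received                                  # nothing left to append
    return received | (msig << (last_plane - 1))

# A coefficient that is not yet MSIG at plane n has all higher bits equal to 0,
# so comparing with the threshold 2**n gives the same answer as reading bit n.
same = all((m >= 2 ** n) == bool(sig_bit(m, n))
           for n in range(7) for m in range(2 ** (n + 1)))
```

For the magnitude 47 (0101111), stopping after plane 3 leaves the decoder with 0101 followed by the appended 1, that is 0101100 = 44, without any arithmetic beyond shifting and OR-ing bits.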
2.4 Simulation Results 
2.4.1 Comparison of TS Strategies 
For an initial investigation, the BFS and DFS tree searching strategies were
simulated using the Lena and Barbara 512x512 images and the peak signal-to-noise
ratio (PSNR) (dB) for the images were calculated. Figure 2.15 and Figure 2.16 show
the Lena and Barbara test images respectively. The simulations were written to follow
the EZW algorithm as closely as possible and only the tree searching strategies were
changed. We used the wavelet filters and adaptive arithmetic coder used in the EZW
algorithm as described in [46] and [47] respectively.
Figure 2.17 and Figure 2.18 show the coding performances which were obtained for
the Lena and Barbara images respectively for various bits per pixel (bpp). The
performance for the EZW algorithm is shown for comparison. The BFS gave similar
results to EZW. This is expected considering that the two search strategies are similar.
The DFS results in a slight decrease in performance when compared with EZW and
BFS. For the Lena image at 0.25 bpp, EZW gives a PSNR of 33.17 dB. The PSNR
obtained using BFS and DFS were 33.28 dB and 32.81 dB respectively. There is a
reduction of about 0.5 dB using DFS.
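For reference, the PSNR figures quoted here can be computed as in the short Python sketch below. It uses the standard definition of PSNR; the peak value of 255 is an assumption for 8-bit grayscale images, and the thesis does not state which exact formula its simulations used.

```python
import math

def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally long sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# Toy example: every pixel is off by one grey level, so the MSE is 1.
value = psnr([10, 20, 30, 40], [11, 19, 31, 39])
```

Because the scale is logarithmic, the reported gap of about 0.5 dB between EZW and DFS corresponds to a mean squared error that is roughly 12 percent higher.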
Figure 2.15 Lena test image
Figure 2.16 Barbara test image
[Plot: PSNR (dB) versus rate (bpp) for BFS, DFS and EZW]
Figure 2.17 Simulation results for TS strategies for Lena image
[Plot: PSNR (dB) versus rate (bpp) for BFS, DFS and EZW]
Figure 2.18 Simulation results for TS strategies for Barbara image
2.4.2 Comparison of DFS Configurations
We then performed simulations on the following simplified configurations of the
EZW algorithm using the DFS tree searching strategy as shown in Table 2.6. The four
configurations were applied to the Lena and Barbara images. Configurations 1 and 2
gave similar results, as did Configurations 3 and 4, showing that reordering of SAQ
symbols does not significantly affect the performance. A similar conclusion about SAQ
reordering has been reported in [48] for the raster scan used in the EZW algorithm.
Figure 2.19 and Figure 2.20 show the coding performances versus bit rates for
Configurations 1 and 4 for the Lena and Barbara images respectively. The average
difference between the PSNRs produced by Configurations 1 and 4 is 2.2 dB showing
that arithmetic coding does give a significant improvement in the DFS EZW algorithm.
Table 2.6 DFS Configurations
Configuration  Description
1              This configuration consists of the DFS arrangement. The SAQ symbols are
               not reordered and the output bit stream is not arithmetically coded.
2              It consists of the DFS arrangement and the SAQ symbols are reordered. It
               is expected that the performance of this configuration will be the same
               as that of Configuration 1 if all the information resulting from a pass
               of the scanning of the coefficient tree is completely transmitted to the
               decoder. This configuration produces better results if the transmission
               is truncated during the sending of the SAQ.
3              This configuration consists of the DFS arrangement and the arithmetic
               coder (AC). This configuration is expected to perform better than
               Configuration 1 due to the further compaction of the output bit stream
               by the AC.
4              It consists of all the components of the EZW algorithm using DFS. This
               configuration will produce the best performance compared to the other
               configurations.
[Plot: PSNR (dB) versus bpp for Configuration 1 and Configuration 4]
Figure 2.19 Simulation results for DFS Configurations for Lena image
[Plot: PSNR (dB) versus bpp for Configuration 1 and Configuration 4]
Figure 2.20 Simulation results for DFS Configurations for Barbara image
2.4.3 Comparison of MAP Coding Schemes 
We next performed simulations to evaluate the performance of the different MAP
coding schemes. We used the DFS in these simulations. We used the MAP coding
schemes for the EZW, Improved EZW and SPIHT algorithms. The simulations were
performed without arithmetic coding and without SAQ reordering. Figure 2.21, Figure
2.22 and Figure 2.23 show the results which were obtained for the EZW, Improved
EZW and SPIHT MAP coding schemes respectively. The performances for the original
EZW, Improved EZW and SPIHT algorithms are shown for comparison. The
performance of the Improved EZW used the results of the Significance Checking in
Wavelet Trees (SCIWT) algorithm reported in [19]. Six scales of subband
decomposition using the 9/7 biorthogonal filter [49] were used and the integer part of
the wavelet coefficients were converted to their sign-magnitude representation.
The EZW coefficient tree structure was used for the DFS EZW and the DFS
Improved EZW coding schemes. The SPIHT coefficient tree structure was used for the
DFS SPIHT coding scheme. The SDF symbol was not explicitly sent for MSIG nodes
for the DFS EZW coding scheme. The SDF symbol was explicitly sent for MSIG nodes
for the DFS Improved EZW and DFS SPIHT coding schemes. The AOMSIG symbol
was used in all cases. For the various bit rates, the MSIG symbol was treated as the last
received SAQ symbol after the end of transmission.
[Plots: PSNR (dB) versus rate (bpp) for EZW and DFS with EZW MAP coding; (a) Lena image, (b) Barbara image]
Figure 2.21 Simulation results for DFS traversal with EZW MAP coding
[Plots: PSNR (dB) versus rate (bpp) for SCIWT and DFS with Improved EZW MAP coding; (a) Lena image, (b) Barbara image]
Figure 2.22 Simulation results for DFS traversal with Improved EZW MAP coding
[Plots: PSNR (dB) versus rate (bpp) for SPIHT and DFS with SPIHT MAP coding; (a) Lena image, (b) Barbara image]
Figure 2.23 Simulation results for DFS traversal with SPIHT MAP coding
The results show that even without using arithmetic coding, the performance of the
DFS Improved EZW and DFS SPIHT algorithms are only slightly lower than their
respective EW algorithms. There is a significant decrease in performance for the DFS
EZW algorithm without arithmetic coding. The average decreases in performance are
2.4 dB, 1.0 dB and 0.9 dB for the DFS EZW, DFS Improved EZW and DFS SPIHT
algorithms respectively. Other than the DFS EZW algorithm, the performance of the
DFS EW algorithms are only about one dB less than their full algorithms. The
performance of the DFS SPIHT algorithm is almost comparable to the performance of
the full EZW algorithm.
Finally, we performed simulations to investigate the DFS SPIHT algorithm further.
For these simulations we replaced the SPIHT tree structure with the EZW tree structure
and did not use the AOMSIG symbol. The AOMSIG symbol was found to increase
performance only for bit rates above 1 bpp. We used the test images as shown in Figure
2.24 and averaged the PSNR for the five images. Figure 2.25 shows the results which
were obtained. The performance of the SPIHT algorithm with and without arithmetic
coding (AC) are shown for comparison. The average decreases in performance for the
DFS SPIHT algorithm compared with the SPIHT algorithm without arithmetic coding and
with arithmetic coding are 0.6 dB and 1.0 dB respectively.
Figure 2.24 Test images
[Plot: PSNR (dB) versus rate (bpp) for DFS SPIHT, SPIHT without AC and SPIHT with AC]
Figure 2.25 Simulation results for DFS SPIHT MAP coding
2.5 Conclusions 
In this chapter, we have outlined the variations and options for EW algorithms and 
discussed the variations which are suitable for hardware implementation. EW 
algorithms differ in the structure of the coefficient tree used, the strategy to search the
coefficient tree for significant coefficients, the scheme to code the coefficient tree and
the generation and transmission of the symbol stream so that symbols which are more 
significant are transmitted ahead of less significant symbols. For the coding of the 
coefficient tree, we have introduced a set of MAP symbols which is applicable for a 
number of EW algorithms. 
We have presented a variation of EW algorithms for hardware implementation based 
on DFS processing. The DFS EW algorithms have lower storage requirements than
previous approaches, simpler computations and allow for efficient addressing of the
nodes in the coefficient tree. The partitioning of the coefficient tree into subtrees, the
depth-first search propagation of the significance symbols, MSIG quantization and bit
stream processing give these advantages for hardware implementation. Simulations
show that even without using arithmetic coding, the performance of the DFS Improved
EZW and DFS SPIHT variants are only slightly lower than their complete algorithms.
The performance of the DFS SPIHT variant is almost comparable to the performance of
the complete EZW algorithm. Comparisons with the WVQ algorithm [33] show that
the performance of the DFS SPIHT algorithm is higher than the WVQ algorithm at high
bit rates and is comparable to the WVQ algorithm at lower bit rates. Table 2.7 shows a
summary of the DFS SPIHT and WVQ algorithms. The DFS EW variants offer
algorithms which are suitable for implementation in hardware without significant
decreases in the performance of the coding system.
Table 2.7 Comparison of DFS SPIHT and WVQ algorithms 
DFS SPIHT WVQ 
Tree searching scheme iterative noniterative 
Quantization method scalar vector 
As a final note, we would like to say that the DFS approach has been developed with
some foreknowledge of hardware architectures which would be suitable. In particular,
we have in mind the bit stream architectures which we will discuss in the next chapter.
The feature of these architectures is that the coefficient bits are not stored in the
processor and are processed as they flow through. This results in fast architectures
which have low complexity for hardware implementation. The DFS EW algorithms are
distinctive from previous approaches for hardware implementation because of their
corresponding bit stream architectures which are readily available for implementation.
Chapter 3
Architectures for the DFS Embedded Wavelet Algorithms
3.1 Introduction
Embedded wavelet (EW) algorithms vary from one another in the structure of the
coefficient tree, the strategy to search the coefficient tree for significant coefficients, the
scheme to code the coefficient tree and the generation and transmission of the symbol
stream so that symbols which are more significant are transmitted ahead of less
significant symbols. Amongst the variations, the tree searching (TS) strategy and the
tree coding scheme can be considered to be the two variations which give each EW
algorithm its particular characteristic. One variation for TS is the depth-first search
(DFS). The DFS simplifies the hardware implementation of EW algorithms. The DFS:
• Performs a natural partitioning of the coefficient tree into subtrees which can be
processed independently and minimizes the storage requirements by processing one
subtree at a time;
• Provides an efficient scheme to determine ancestor-descendant relationships in the
subtrees by propagating the significance symbols to the ancestor node as soon as the
necessary descendant nodes have been searched.
The processing requirements can be further simplified by using the sign-magnitude
binary representation of the coefficients in the coding process. The use of the binary
representation in the coding process translates into using threshold values which are
powers of two for significance map (MAP) coding. In addition, successive-approximation
quantization (SAQ) of significant coefficients is simplified to magnitude
(MAG) coding where the SAQ symbols can be obtained directly from the magnitude
representation of the coefficients.
Previous approaches to implement EW algorithms in hardware have been mainly by
making use of memory banks to store the wavelet coefficients where all the coefficients
can be accessed to establish the ancestor-descendant relationships [34], [35], [36], [37].
The disadvantage of the memory bank approach is that it requires the coefficients to be
stored in the TS processor prior to processing. In contrast to the all memory bank
approach, we propose a new architecture to implement the tree searching. In this
approach, the wavelet coefficients of an image are transferred from the two-dimensional
discrete wavelet transform (DWT) processor to the TS processor in a bit stream which
can be processed immediately. For clarity of discussion, we will look at the bit stream
(BS) architecture in relation to the implementation of the DFS EZW algorithm. The
approach can be extended for the implementation of other DFS algorithms. A second
issue we will discuss is the flexibility of the DFS BS architectures to implement
variations in EW algorithms. In particular, we will look at how the DFS BS
architectures can be modified to implement tree searching strategies which are similar in
principle to the set partitioning scheme used in the SPIHT algorithm. For the
discussion, we will make use of a parallel DFS BS EZW architecture.
This chapter is organized as follows. In Section 3.2, we discuss the DFS bit stream
architecture for the implementation of EW algorithms. The DFS BS architecture results
in simple and fast implementations of the algorithms. We discuss the DFS BS parallel
architecture in Section 3.3. The parallel approach results in an even faster architecture
at the cost of an increase in hardware complexity. In addition, the parallel architecture
provides options to output for different tree searching formats to achieve higher coding
performance. Conclusions are presented in Section 3.4.
3.2 Bit Stream Architecture
3.2.1 Depth-First Search Bit Stream Input
Figure 3.1(a) shows the subband arrangement for a three scale DWT decomposition
and Figure 3.1(b) shows an example of the wavelet coefficients used in EZW [16].
(a) Subband arrangement: LL3, HL3, LH3 and HH3 in the top-left corner, surrounded by
HL2, LH2 and HH2, and then by HL1, LH1 and HH1
(b) Wavelet coefficients:
 63  -34   49   10    7   13  -12    7
-31   23   14  -13    3    4    6   -1
 15   14    3  -12    5   -7    3    9
 -9   -7  -14    8    4   -2    3    2
 -5    9   -1   47    4    6   -2    2
  3    0   -3    2    3   -2    0    4
  2   -3    6   -4    3    6    3    6
  5   11    5    6    0    3   -4    4
Figure 3.1 Subband arrangement and example of coefficients
Figure 3.2 shows the tree representation of the coefficients and Figure 3.3 shows the
order of the wavelet subbands which will be searched by the DFS. The DFS partitions
the coefficient tree into a number of subtrees each of which has one root node. The
number of subtrees is equal to the number of coefficients in the lowest frequency
subband. Each subtree can be further partitioned into its four subband orientations. The
lists of coefficients in the four orientations are concatenated in the order LL, HL, LH
and HH to a single list representing the entire subtree.
[Tree diagram: root coefficient with HL branch, LH branch and HH branch]
Figure 3.2 Coefficient tree example from EZW
Level   Subband orientation
        LL    HL    LH    HH
0       63
1            -34   -31    23
2             49    15     3
3              7    -5     4
3             13     9     6
3              3     3     3
3              4     0    -2
2             10    14   -12
3            -12    -1    -2
3              7    47     2
3              6    -3     0
3             -1     2     4
2             14    -9   -14
3              5     2     3
3             -7    -3     6
3              4     5     0
3             -2    11     3
2            -13    -7     8
3              3     6     3
3              9    -4     6
3              3     5    -4
3              2     6     4
Figure 3.3 DFS representation showing subband orientations
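The DFS list of Figure 3.3 can be regenerated from the coefficient array of Figure 3.1(b) with a short Python sketch. The helper name `dfs_branch` is hypothetical, and the parent-child rule used here (the children of the coefficient at row r, column c sit at rows 2r and 2r+1, columns 2c and 2c+1) is the usual EZW convention, assumed rather than quoted from the text.

```python
# Wavelet coefficients of Figure 3.1(b), three-scale decomposition of an 8x8 block.
COEFFS = [
    [63, -34, 49, 10, 7, 13, -12, 7],
    [-31, 23, 14, -13, 3, 4, 6, -1],
    [15, 14, 3, -12, 5, -7, 3, 9],
    [-9, -7, -14, 8, 4, -2, 3, 2],
    [-5, 9, -1, 47, 4, 6, -2, 2],
    [3, 0, -3, 2, 3, -2, 0, 4],
    [2, -3, 6, -4, 3, 6, 3, 6],
    [5, 11, 5, 6, 0, 3, -4, 4],
]

def dfs_branch(row, col, size=8):
    """List one orientation branch depth-first: a parent, then each child's branch."""
    out = [COEFFS[row][col]]
    if 2 * row + 1 < size and 2 * col + 1 < size:      # node has four children
        for dr in (0, 1):
            for dc in (0, 1):
                out += dfs_branch(2 * row + dr, 2 * col + dc, size)
    return out

hl = dfs_branch(0, 1)     # HL branch rooted at -34
lh = dfs_branch(1, 0)     # LH branch rooted at -31
hh = dfs_branch(1, 1)     # HH branch rooted at 23
subtree = [COEFFS[0][0]] + hl + lh + hh   # concatenated in the order LL, HL, LH, HH
```

Each branch holds 21 coefficients (1 + 4 + 16), so the single subtree of this example is a list of 64 entries.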
Figure 3.4 shows the binary representation of the coefficient tree in Figure 3.2 in the
DFS format and the arrangement of the bit stream into the DFS BS EZW processor.
The coefficients are represented using the sign-magnitude representation. The
coefficient magnitudes are represented using seven bits. The MSB is represented as b6
and the LSB is represented as b0. The sign bits are represented as b7. All the sign bits
for the coefficients are shifted out first followed by the MSBs and ending with the
LSBs. The bits are input beginning from the leaf coefficients and ending at the root
coefficient of the tree.
Level   b7 (sign)   b6 b5 b4 b3 b2 b1 b0
0       0           0  1  1  1  1  1  1
1       1           0  0  1  1  1  1  1
2       0           0  0  0  1  1  1  1
3       1           0  0  0  0  1  0  1
3       0           0  0  0  1  0  0  1
3       0           0  0  0  0  0  1  1
3       0           0  0  0  0  0  0  0
2       0           0  0  0  1  1  1  0
3       1           0  0  0  0  0  0  1
3       0           0  1  0  1  1  1  1
3       1           0  0  0  0  0  1  1
3       0           0  0  0  0  0  1  0
2       1           0  0  0  1  0  0  1
3       0           0  0  0  0  0  1  0
3       1           0  0  0  0  0  1  1
3       0           0  0  0  0  1  0  1
3       0           0  0  0  1  0  1  1
2       1           0  0  0  0  1  1  1
3       0           0  0  0  0  1  1  0
3       1           0  0  0  0  1  0  0
3       0           0  0  0  0  1  0  1
3       0           0  0  0  0  1  1  0
(The bits listed are for the root at level 0 and one branch at levels 1 to 3; the figure
labels the neighbouring layers as HL coefficient bits and HH coefficient bits. The bit
planes b0 ... b6, b7 are input to the DFS BS EZW Processor, which outputs the MAP + SAQ
bit stream.)
Figure 3.4 DFS input bit stream
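The input ordering described in this section can be sketched in Python as follows. The function names are hypothetical, and the code assumes that "beginning from the leaf coefficients and ending at the root coefficient" means each bit plane is sent in the reverse of the listed DFS order; the thesis does not spell this detail out.

```python
def to_sign_magnitude(value, magnitude_bits=7):
    """Return (sign bit b7, [b6, ..., b0]) for an integer coefficient."""
    sign = 1 if value < 0 else 0
    mag = abs(value)
    bits = [(mag >> k) & 1 for k in range(magnitude_bits - 1, -1, -1)]
    return sign, bits

def bit_planes(dfs_list, magnitude_bits=7):
    """Sign plane first, then magnitude planes from MSB to LSB.

    Within each plane the bits run leaf-first and root-last (assumed to be the
    reverse of the DFS listing).
    """
    coded = [to_sign_magnitude(v, magnitude_bits) for v in reversed(dfs_list)]
    planes = [("b7", [sign for sign, _ in coded])]
    for k in range(magnitude_bits):
        name = f"b{magnitude_bits - 1 - k}"
        planes.append((name, [bits[k] for _, bits in coded]))
    return planes

# First four entries of the branch listed in Figure 3.4: root, level 1, level 2, first leaf.
planes = bit_planes([63, -31, 15, -5])
```

Because the planes arrive from the most significant to the least significant bit, the processor can treat each plane as one coding pass with a power-of-two threshold.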
3.2.2 Encoder Architecture
Figure 3.5 shows the DFS BS EZW encoder architecture showing the flow and
storage of information for a coefficient tree with N subtrees. Each subtree has NC
coefficient nodes.
[Block diagram: the magnitude bit stream enters the First Stage DFS BS EZW Processor,
followed by the Second Stage DFS BS EZW Processor; the SIGN and MSIG storage has
N x NC entries and the SIG and SDF storage has NC entries; the SAQ and MAP outputs
pass through a MUX to form the output bit stream]
Figure 3.5 DFS BS EZW encoder architecture
The sign bits are input first into the processor and stored. An option would be to not
store the sign bits and to transmit them to the decoder immediately. This has the
drawback of sending sign bits for coefficients which would not be significant for the
target bit rate and lead to a reduction in coding performance. For a high target bit rate
when many coefficients would be significant, this may be a viable option. But for a
lower target bit rate, this would cause a significant decrease in coding performance. In
our architecture, we use a general scheme where the sign bits are stored in the
processor.
Whereas the sign bits have to be stored in the processor, the magnitude bits can be
discarded once they are processed. Two stages are required for every pass of the
magnitude bits at the same bit plane. The two stages correspond to scanning the
coefficient tree twice for each threshold value. The First Stage scans the coefficient tree
from the leaf nodes up to the root nodes and the Second Stage scans the coefficient tree
from the root nodes down to the leaf nodes. The First Stage involves processing the
magnitude bits as they arrive at the processor. The corresponding coefficient is
determined to be significant if the bit arriving at the processor is 1; otherwise it is not
significant. Using the MAP coding scheme, this is represented by the SIG symbol. A
coefficient is found significant only once and this is represented by the MSIG symbol.
A MSIG coefficient is treated as having a zero magnitude by the other coefficients. At
the same time, the SDF symbol for each coefficient in the current pass is set if the
coefficient has at least a significant descendant.
Another function performed in the First Stage is to generate and to output the SAQ
bit for each coefficient which has been marked significant in the previous passes. In the
bit stream architecture, the SAQ bit is just the bit itself for the MSIG coefficient. The
outcome after the First Stage DFS BS EZW processor is that the SIG and SDF MAP
bits have been generated for non MSIG coefficients and the SAQ bits have been
generated for MSIG coefficients.
The Second Stage involves scanning the information produced in the First Stage of
processing to output the MAP symbols. The MAP symbol assignment for the EZW
coefficient nodes is shown in Table 3.1. The symbol POS has the binary representation
of 10, NEG of 11, IZ of 01 and ZTR of 00 where the first bit is the SIG bit.
Table 3.1 Assignment of MAP symbols for EZW coefficient nodes
Symbol assigned    SIGN   SIG   SDF
POS                0      1     X
NEG                1      1     X
IZ                 X      0     1
ZTR if NZF = 1     X      0     0
The ZTR symbol is only output if the coefficient ancestor is not assigned a ZTR
symbol. This condition is indicated by a flag, Not Zerotree Root Flag (NZF) which is
set to 1 if the ancestor of the coefficient is not a ZTR symbol. Coefficients marked
significant will not have MAP symbols in the subsequent passes. Each entry in the
storage in the processor for each coefficient position has the following fields:
SIGN SIG SDF MSIG NZF
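The symbol assignment of Table 3.1 can be expressed as a small Python function. The function name and the convention of returning `None` when nothing is output are assumptions made for this sketch; the bit patterns follow the table and the accompanying text.

```python
def map_symbol(sign, sig, sdf, msig, nzf):
    """Return (symbol name, two-bit code) for one coefficient, or None if no MAP output.

    The first bit of each code is the SIG bit, as stated for Table 3.1.
    """
    if msig:                      # already significant: no MAP symbol in later passes
        return None
    if sig:
        return ("NEG", "11") if sign else ("POS", "10")
    if sdf:
        return ("IZ", "01")       # insignificant, but has a significant descendant
    return ("ZTR", "00") if nzf else None   # ZTR only if the ancestor is not a ZTR
```

For example, an insignificant coefficient without significant descendants produces a ZTR symbol only when NZF is 1; when its ancestor is already a zerotree root, nothing is output for it.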
Due to the depth-first nature, NZF does not need to be kept for a coefficient
throughout the search but only a single NZF field is required for each level in the
coefficient tree. Similarly, the SIG and SDF fields do not need to be kept for all N x NC
coefficients but only for the NC coefficients of a single subtree. For a coefficient tree
containing N subtrees where each subtree has NC coefficient nodes, the total storage
required is 2 x N x NC bits for storing the SIGN and MSIG bits for all the coefficients
and 2 x NC bits for storing the SIG and SDF bits for the subtree being processed.
Further reduction in storage may be possible by making use of the storage in the DWT
processor for SIGN, SIG and MSIG for the subtrees which are not being processed
currently.
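As a worked example of this storage estimate, the sketch below evaluates 2 x N x NC + 2 x NC for two cases. The function name is hypothetical. The first case is the 8x8 example of Figure 3.1 (one subtree of 64 coefficients); the second is an illustrative assumption of a 512x512 image with six decomposition scales, giving 64 subtrees of 4096 coefficients each, and is not a figure reported in the thesis.

```python
def storage_bits(n_subtrees, nc):
    """SIGN and MSIG for every coefficient, SIG and SDF for one subtree only."""
    return 2 * n_subtrees * nc + 2 * nc

small = storage_bits(1, 64)          # 8x8 example: a single subtree
large = storage_bits(64, 4096)       # assumed 512x512 image, six scales
all_fields = 4 * 64 * 4096           # if SIG and SDF were kept for every coefficient
saving = all_fields - large
```

Under these assumptions the subtree-at-a-time scheme needs 532480 bits instead of 1048576 bits, which is the effect of keeping SIG and SDF for NC coefficients rather than N x NC.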
3.2.2.1 First Stage Processor 
Central to the First Stage is the generation and storage of the SIG and SDF MAP bits
for the coefficients. If at each clock cycle i, one magnitude bit MAG is input into the
processor then the MAP bits can be determined by the following equations:
SIG(i) = MAG(i) AND NOT(MSIG(i)) (3.1)
SDF(i) = SIG(i-1) OR SIG(i-d-1) OR SIG(i-2d-1) OR SIG(i-3d-1)
OR SDF(i-1) OR SDF(i-d-1) OR SDF(i-2d-1) OR SDF(i-3d-1) (3.2)
where d is the number of bits which have to be delayed from a child coefficient to its
next sibling coefficient and is given by:
$d = \frac{4^{l_{max} - l_{cur}} - 1}{3}$ (3.3)
The labels lmax and lcur denote the maximum level and the current level in the
coefficient tree respectively. The values of d are 21, 5 and 1 for levels 0, 1 and 2
respectively. Equation (3.1) shows that the SIG bit is set to the input magnitude bit for
nodes which are not MSIG. For MSIG nodes, the SIG bit is set to 0. Equation (3.2)
shows that the SDF of a coefficient is determined by the SIG and SDF bits of its
children. For example, the SDF for a coefficient node at level 2 is determined by the
SDF and SIG bits of its children nodes at level 3. Figure 3.6 shows the architecture for
the First Stage Processor. The LEVEL SELECT circuit selects the current level of the
incoming magnitude bit. For each subtree, the circuit generates the DFS sequence
11(3), 11(3), 11(3), 11(3), 10(2), 11(3), 11(3), 11(3), 11(3), 10(2), 11(3), 11(3), 11(3),
11(3), 10(2), 11(3), 11(3), 11(3), 11(3), 10(2), 01(1), ..., 01(1), 00(0) corresponding to
the current levels shown inside parentheses. The circuit can be implemented using a
two-bit counter for each level. The SDF values for leaf nodes corresponding to the level
sequence 11 will always be 0.
[Circuit diagram: MAG input, LEVEL SELECT circuit with selector values 00, 01, 10 and 11,
one-bit storage elements and OR gates producing the SIG and SDF outputs]
Figure 3.6 First Stage Processor architecture
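Equation (3.3) and the LEVEL SELECT sequence can be checked with a short Python sketch. The function names are hypothetical; the sketch assumes a tree with maximum level 3 and bits arriving children-first, as in the sequence quoted above.

```python
def delay(l_cur, l_max=3):
    """Equation (3.3): d = (4**(l_max - l_cur) - 1) / 3, always an integer."""
    return (4 ** (l_max - l_cur) - 1) // 3

def level_sequence(l_max=3, level=1):
    """Levels in arrival order (children before their parent) for one branch."""
    if level == l_max:
        return [level]
    seq = []
    for _ in range(4):                       # four children per node
        seq += level_sequence(l_max, level + 1)
    return seq + [level]

branch = level_sequence()                    # one orientation branch, 21 entries
subtree = 3 * branch + [0]                   # HL, LH and HH branches, then the root
codes = [format(level, "02b") for level in subtree]   # two-bit LEVEL SELECT values
```

The delay d for a node at level l equals the number of nodes in the branch of one of its children, which is why the sibling terms in Equation (3.2) are spaced d bits apart.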
Figure 3.6 shows that several one-bit storage and OR gates are required in the
architecture. In practice, in order to assign a value to SDF, only a single bit at each
level is needed to keep the temporary value for the level above. Let these bits be
TSDF(l). The function of these bits is to keep the temporary SDF for each level.
Initially, these bits are set to zeros. It is assigned to SDF(l-1) and reset after the
coefficients in a subtree at level l are processed during the search. Figure 3.7 shows an
example of a SIG bit plane and the two TSDF bits required for processing.
[Tree diagram: SIG bit plane for a three-level subtree with the TSDF(0) and TSDF(1) bits]
Figure 3.7 Example of a SIG bit plane
Table 3.2 Assignment of TSDF bits
i    MAG   LEVEL SELECT   TSDF(0)   TSDF(1)   SIG   SDF
0    0     10             0         0         0     0
1    0     10             0         0         0     0
2    0     10             0         0         0     0
3    0     10             0         0         0     0
4    0     01             0         0         0     0
5    0     10             0         0         0     0
6    0     10             0         0         0     0
7    0     10             0         0         0     0
8    0     10             0         0         0     0
9    0     01             0         0         0     0
10   0     10             0         0         0     0
11   0     10             0         0         0     0
12   1     10             0         1         1     0
13   0     10             0         1         0     0
14   0     01             1         0         0     1
15   0     10             1         0         0     0
16   0     10             1         0         0     0
17   0     10             1         0         0     0
18   0     10             1         0         0     0
19   0     01             1         0         0     0
20   0     00             0         0         0     1
Table 3.2 shows the assignment of the TSDF bits. Figure 3.8 shows the flowchart to
accomplish the First Stage Processing. Note that TSDF(0) is not immediately assigned
the value of 1 for the incoming magnitude bit at the 13th clock cycle (i = 12). It is only
assigned the value of 1 at the 15th clock cycle (i = 14) when the level above is being
processed. In this way, the SDF bits are propagated from the leaves to the root node. If
any node below the root has a SIG value of 1, TSDF(0) will be set to 1; else TSDF(0)
will remain 0. TSDF(0) is reset after the root node has been processed whereas
TSDF(1) is reset for processing a new branch.
SIG = MAG AND NOT(MSIG)
If LEVEL SELECT = 10 then
• SDF = 0
• TSDF(1) = TSDF(1) OR SIG
Else If LEVEL SELECT = 01 then
• SDF = TSDF(1)
• TSDF(0) = TSDF(0) OR TSDF(1) OR SIG
• TSDF(1) = 0
Else
• SDF = TSDF(0)
• TSDF(0) = 0
End If
Figure 3.8 Flowchart for First Stage Processing
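The flowchart of Figure 3.8 can be simulated in a few lines of Python. The function name and the list-based inputs are assumptions for this sketch; the three-level LEVEL SELECT coding (10 for leaf nodes, 01 for the level above, 00 for the root) and the input values follow Table 3.2, with all MSIG bits taken as 0.

```python
def first_stage(mag_bits, level_select, msig=None):
    """Return one (SIG, SDF, TSDF(0), TSDF(1)) tuple per clock cycle."""
    msig = msig or [0] * len(mag_bits)
    tsdf = [0, 0]                              # TSDF(0), TSDF(1), initially zero
    rows = []
    for mag, level, m in zip(mag_bits, level_select, msig):
        sig = mag & (1 - m)                    # Equation (3.1)
        if level == "10":                      # leaf node
            sdf = 0
            tsdf[1] |= sig
        elif level == "01":                    # parent of four leaves
            sdf = tsdf[1]
            tsdf[0] |= tsdf[1] | sig
            tsdf[1] = 0
        else:                                  # "00": root of the subtree
            sdf = tsdf[0]
            tsdf[0] = 0
        rows.append((sig, sdf, tsdf[0], tsdf[1]))
    return rows

LEVELS = (["10"] * 4 + ["01"]) * 4 + ["00"]    # leaf-first order of Table 3.2
MAG = [0] * 12 + [1] + [0] * 8                 # a single 1 at i = 12
rows = first_stage(MAG, LEVELS)
sdf_column = [row[1] for row in rows]
```

The simulation shows the delayed propagation noted in the text: the significant leaf at i = 12 sets TSDF(1) first, its parent picks up SDF = 1 at i = 14, and the root does so at i = 20.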
3.2.2.2 Second Stage Processor 
The Second Stage is to scan from the top of the storage in the processor to the
bottom. Two flags TNZF(l) (Temporary NZF) need to be kept at each level to indicate
if a coefficient parent is not a ZTR node. Only one bit is needed at each level since it is
a DFS. Figure 3.9 shows an example of a ZTR bit plane and the two TNZF bits
required for processing. A value of 1 indicates that the node is a ZTR node. Table 3.3
shows the assignment of the TNZF bits. Figure 3.10 shows the flowchart to accomplish
the Second Stage Processing.
TNZPl l l  
i 
0 
l 
2 
3 
4 
5 
6 
7 
8 
9 
10 
1 1  
1 2  
1 3  
1 4  
1 5  
1 6  
1 7  
1 8  
1 9  
20 
7 
04 
5 4 I U  1 2  1 3  I �  1 5  
Figure 3.9 Example of a ZTR bi t p lane 
TabJe 3.3 A ignment of TNZF bi t 
ZTR LEVEL TNZF 1 )  TNZF(2) SELECT 
0 00 l l 
1 0 1  l 0 
l 1 0  l 0 
l 1 0  l 0 
1 1 0  1 0 
I 1 0  1 0 
0 0 1  1 1 
1 1 0  1 1 
0 1 0  1 1 
1 1 0  1 1 
1 1 0  1 1 
1 0 1  I 0 
1 1 0  l 0 
1 1 0  1 0 
1 1 0  1 0 
l 1 0  1 0 
I 0 1  1 0 
1 1 0  I 0 
1 1 0  1 0 
1 1 0  1 0 
l 1 0  1 0 
5 1  
lev.l 0 
1 1  I �  1 9  20 
NZF 
1 
1 
0 
0 
0 
0 
1 
1 
l 
1 
1 
1 
0 
0 
0 
0 
1 
0 
0 
0 
0 
[Figure 3.10 flowchart. Input ZTR. If LEVEL SELECT = 00: NZF = 1 and TNZF(1) = NOT(ZTR).
Else if LEVEL SELECT = 01: NZF = TNZF(1) and TNZF(2) = NOT(ZTR). Otherwise: NZF = TNZF(2).]
Figure 3.10 Flowchart for Second Stage Processing
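The Second Stage rules can be checked against Table 3.3 with a short sketch. The Python below is an illustration under assumptions, not the hardware implementation: the function name `second_stage`, the dictionary holding the two TNZF flags and their initial values are hypothetical, and the ZTR and LEVEL SELECT columns are the values as read from Table 3.3.

```python
def second_stage(ztr, levels):
    """Return the NZF bit for each node of a subtree scanned top to bottom."""
    tnzf = {1: 0, 2: 0}           # initial values are an assumption
    nzf = []
    for z, level in zip(ztr, levels):
        if level == 0:            # LEVEL SELECT = 00 (root)
            nzf.append(1)
            tnzf[1] = 1 - z       # TNZF(1) = NOT(ZTR)
        elif level == 1:          # LEVEL SELECT = 01
            nzf.append(tnzf[1])
            tnzf[2] = 1 - z       # TNZF(2) = NOT(ZTR)
        else:                     # LEVEL SELECT = 10 (leaf)
            nzf.append(tnzf[2])
    return nzf

# ZTR and LEVEL SELECT columns as read from Table 3.3 (21 nodes in DFS order).
ZTR    = [0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
LEVELS = [0, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2]
NZF = second_stage(ZTR, LEVELS)
```

The resulting `NZF` list reproduces the NZF column of Table 3.3: a node outputs its MAP bits only when its parent is not a ZTR node.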
3.2.3 Decoder Architecture 
In this section, we discuss the decoder architecture for the bit stream approach. In
particular, we discuss a complication that arises because channel coding modifies the
bit stream. In the DFS BS EZW architecture
shown in Figure 3.5, the SAQ bits are output by the First Stage Processor and the MAP
bits are output by the Second Stage Processor. The MAP and SAQ bits form the output
bit stream. The SAQ bits are only output for coefficients which are MSIG whereas the
MAP bits are only output for coefficients which are not MSIG. Also, ZTR MAP bits
are only output if the NZF flag is 1. This means that there are clock cycles for which no
MAP or SAQ bits are output. Figure 3.11(a) shows an example of the output bit stream
from the encoder where • represents the no-output case. During channel coding, the
channel will compact together the bits for transmission to be sent to the decoder. Figure
3.11(b) shows the input bit stream to the decoder.
1 1 1 • • 0 1 • • 1 • • 1 0 • 0 1
(a) Uncompacted output bit stream from the encoder
1 1 1 0 1 1 1 0 0 1
(b) Compacted input bit stream into the decoder [figure labels: MAP bits, SAQ bits]
Figure 3.11 Bit streams before and after channel coding
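The effect of compaction on the example of Figure 3.11 can be illustrated with a few lines of Python. This is a sketch only: `compact` and the use of `None` for the no-output cycles are hypothetical choices, and the bit values are those read from Figure 3.11(a).

```python
NO_OUTPUT = None  # stands for the "•" (no-output) clock cycles

def compact(stream):
    """Drop the no-output cycles, as the channel is described as doing."""
    return [b for b in stream if b is not NO_OUTPUT]

_ = NO_OUTPUT
uncompacted = [1, 1, 1, _, _, 0, 1, _, _, 1, _, _, 1, 0, _, 0, 1]
compacted = compact(uncompacted)
```

The compacted list carries no record of where the no-output cycles were, which is why the decoder cannot rely on bit arrival times alone.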
The architecture to decode the output bit stream in Figure 3.11(a) back into the
coefficient bit stream is simpler than the encoder architecture because for each bit plane
only one stage of processing is required. The processing corresponds to scanning the
coefficient tree from the root nodes down to the leaf nodes. The complexity arises
because of the compaction of the bit stream due to channel coding. In the bit stream
architecture, bits which arrive have to be processed immediately. If the bits are not
processed immediately, they have to be stored until they are ready to be processed. Due
to the compaction performed on the bit stream by the channel coder, the bits would
arrive at the decoder earlier than the decoder expects.
We propose two approaches to handle the discrepancies in timing between the
transmission channel and the decoder. The first approach is to buffer the incoming bits
to regain the uncompacted output bit stream as in Figure 3.11(a). This approach
leads to a simple decoder architecture at the cost of an additional input buffer.
The second approach is to utilize a more complex decoding scheme where
each bit which is output from the decoder is accompanied by an address which informs
the wavelet processor of the position of the coefficient in the tree. This approach would
require a decoder with more complexity but would not require the input buffer. In
addition, this approach leads to a faster decoder architecture. Figure 3.12(a) and Figure
3.12(b) show the two approaches for the DFS BS EZW decoder architecture.
[Figure 3.12(a): Compacted bit stream -> Input buffer -> Uncompacted bit stream -> decoder -> Decoded coefficient stream]
(a) Simple decoder architecture
[Figure 3.12(b): Compacted bit stream -> Complex DFS BS EZW Decoder -> Decoded coefficient stream and Coefficient address]
(b) Complex decoder architecture
Figure 3.12 Approaches for the DFS BS EZW decoder architecture
3.2.4 Advantages of DFS BS Architecture 
The bit stream approach of processing the bits as they arrive at the processor leads to 
simple architectures with minimal memory requirements. An additional advantage is
easier interfacing to the DWT processor and the transmission channel. 
3.2.4.1 Simple Comparison Operations
The determination of the SIG symbol can be obtained directly from the magnitude
representation of the coefficient without having to perform any comparison operations.
Figure 3.13 shows the determination of the SIG symbol for the coefficient 63 using a
threshold of 32. Figure 3.13(a) shows the comparator approach used to determine
coefficient significance in the architectures in [34], [37] and Figure 3.13(b) shows
the bit stream approach. Using the bit stream approach, the SIG symbol is simply given
by the current magnitude bit. The difference between the two approaches is that the bit
stream approach uses threshold values which are powers of two.
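[Figure 3.13(a): Magnitude 0 0 1 1 1 1 1 1 and Threshold enter a Comparator, which outputs SIG = 1]
(a) Comparator approach
[Figure 3.13(b): Magnitude 0 0 (1) 1 1 1 1 1; the current bit, shown in parentheses, is SIG]
(b) Bit stream approach
Figure 3.13 Calculation of SIG symbol using bit stream approach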
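The equivalence of the two approaches for power-of-two thresholds can be illustrated as follows. The Python sketch is only an illustration; the function names are hypothetical and the bit plane index is assumed to count from 0 at the LSB.

```python
def sig_comparator(magnitude, threshold):
    """Comparator approach: compare the whole magnitude with the threshold."""
    return int(magnitude >= threshold)

def sig_bit_stream(magnitude, plane):
    """Bit stream approach: SIG is the magnitude bit of the current plane."""
    return (magnitude >> plane) & 1

# Coefficient 63 with threshold 32 = 2**5, as in Figure 3.13.
assert sig_comparator(63, 32) == sig_bit_stream(63, 5) == 1
```

The two functions agree for every magnitude below twice the threshold, that is, for every coefficient that was not already significant at a higher bit plane.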
3.2.4.2 Simple Arithmetic Operations
One requirement in decoding is the calculation of the approximated coefficient value
after the end of transmission. Figure 3.14 shows the calculation of the coefficient 63
after three threshold values. When the decoder receives the POS symbol, it decodes the
sign bit and the MSB and sets the MSIG bit to 1, but does not insert it in position yet.
For the thresholds of 16 and 8, the decoder receives the two SAQ '1' bits. If
transmission ends here, the decoder then inserts the MSIG bit into the next available bit
position before decoding the coefficient value. Zeros are inserted into the remaining bit
positions. The decoded value is 0111100 which is +60 in sign-magnitude
representation. The decoded value can be obtained without having to perform any
arithmetic operations.
Threshold   Coded symbol   Decoded value
   32           POS            +32
   16           '1'            +48
    8           '1'            +56
[Decoded bits column of the figure: received bits followed by the inserted MSIG bit; final bits 0 1 1 1 1 0 0]
Figure 3.14 Calculation of approximated coefficient value
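The bit insertion described above can be sketched in Python. This is a software illustration under assumptions: `approximate_value` is a hypothetical name, the received magnitude bits are assumed to be listed from the most significant bit plane downwards, and the integer conversion at the end only serves to display the result (the hardware obtains the value directly from the bit positions).

```python
def approximate_value(sign_bit, received_bits, num_mag_bits):
    """Rebuild a sign-magnitude value from the magnitude bits received so far.

    received_bits: magnitude bits, most significant bit plane first.
    """
    bits = list(received_bits)
    if any(bits) and len(bits) < num_mag_bits:
        bits.append(1)                          # insert the held MSIG bit
    bits += [0] * (num_mag_bits - len(bits))    # zero-fill the remaining positions
    magnitude = int("".join(map(str, bits)), 2)
    return -magnitude if sign_bit else magnitude

# Coefficient 63 after thresholds 32, 16 and 8 (six magnitude bits assumed).
value = approximate_value(0, [1, 1, 1], 6)
```

For the example of Figure 3.14, `value` is 60, in agreement with the decoded bit pattern 0111100.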
3.2.4.3 Minimal Storage Architecture
The bit stream architecture operates on individual bits at a time. Each bit in the
coefficient binary representation is treated as a separate piece of information. The
MSBs are processed and discarded before the LSBs are even input into the architecture.
This feature of processing individual bits differentiates the bit stream approach from
other architectures which operate on the entire coefficient binary representation [34],
[37]. The bit stream approach also results in an architecture with minimal storage
requirements. The same storage for processing the MSBs can be reused for processing
the LSBs.
3.2.4.4 Just-In-Time Processing 
The bit stream architecture gives the flexibility of processing as many or as few bits
as needed to suit a transmission bit rate. For example, Figure 3.15 shows a DFS BS EZW
processor with a channel rate control. When the target channel rate is met, the DFS BS
EZW processor is informed and future incoming bits are not processed. The processor
does not process any further if the channel cannot accommodate the generated bits. In
this way, no processing is wasted. For comparison, the other approach would be to
process all incoming bits and discard the generated bits which cannot be accommodated
by the transmission channel.
[Figure 3.15: input bits b0 ... b6, b7 -> DFS BS EZW Processor -> Bit stream, with a Channel rate control block connected to the processor]
Figure 3.15 DFS BS EZW processor with channel rate control
3.2.4.5 Architecture Scalability 
DWT processors have been designed to suit different image sizes. Another
advantage of the bit stream architecture is that it can be used to handle images of
varying dimensions provided that the same number of subband decompositions is
used. A 16x16 image with three scales of subband decomposition would give four 8x8
subtrees whereas a 32x32 image with three scales of subband decomposition would give
16 8x8 subtrees as shown in Figure 3.16. As far as the architecture is concerned, the
only difference between processing the two images is that the input bit stream of the
larger image is four times longer than the input bit stream of the smaller image.
[Figure 3.16: 4 subtrees -> DFS BS Processor; 16 subtrees -> DFS BS Processor]
Figure 3.16 Scalability of bit stream architecture
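The subtree counts quoted above follow from simple arithmetic, sketched below in Python. The function name `subtree_layout` is hypothetical, and the sketch assumes that every coefficient of the lowest frequency band is the root of one subtree.

```python
def subtree_layout(width, height, scales):
    """Return (number of subtrees, coefficients per subtree)."""
    step = 2 ** scales
    if width % step or height % step:
        raise ValueError("image dimensions must be divisible by 2**scales")
    num_subtrees = (width // step) * (height // step)  # one per lowest-band coefficient
    coeffs_per_subtree = step * step                   # 8x8 = 64 for three scales
    return num_subtrees, coeffs_per_subtree
```

Under these assumptions, `subtree_layout(16, 16, 3)` gives 4 subtrees and `subtree_layout(32, 32, 3)` gives 16, each of 64 coefficients, so only the length of the input bit stream changes.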
3.3 Parallel Architecture
This section presents a parallel architecture for the implementation of the EZW
algorithm based on the DFS BS architecture. Using the depth-first search of the wavelet
coefficient tree, the wavelet coefficients in the coefficient tree are first partitioned into
independent subtrees. In the case of full parallelism, each of the subtrees is processed
by an independent processor. The output from each processor is then multiplexed back
into a single output bit stream. While the output bit stream from each subtree processor
is in the DFS format, the overall multiplexed output bit stream represents the search of
the subtrees in parallel. The use of the DFS BS structure also makes it possible for
partial parallelism where a subtree processor can process two or more subtrees in
sequence. This provides flexibility for the design of the overall processor optimally to
match the rate of the overall input bit stream. A subtree processor can be easily
modified to perform any improved MAP coding scheme and the multiplexer for the
output bit stream from the processors can be modified to produce the format of
different TS strategies. In particular, we will look at how the multiplexer can be
modified to obtain the outputs from the parallel processors in such a way as to
implement a tree searching strategy which is similar in principle to the set partitioning
search strategy used in the SPIHT algorithm.
3.3.1 Parallel Implementation of the DFS BS EZW Algorithm
The maximum parallelism which can be achieved is when the number of processors
is equal to the number of subtrees. The number of parallel DFS BS EZW processors
can be lowered. The simplest way to reduce the number of processors is to allocate
a number of subtrees in sequence to a single processor. If there is only one processor, we return to
the sequential DFS BS EZW architecture. For ease of implementation, we assume that
the number of subtrees allocated to each processor is equal. Additional control circuitry is
required if some processors process fewer subtrees than other processors. There is no
difference in the architecture or implementation of a DFS BS EZW subtree processor
for processing one single subtree or two or more subtrees in sequence.
Let N be the number of subtrees and M the number of processors. Each processor
then processes N / M subtrees. If N / M = 1, then we have maximum
parallelism. On the other hand, if N / M = N, i.e. M = 1, we have the sequential DFS BS
EZW architecture. The number of temporary storage words for the information kept in
the overall parallel processor is N x NC where NC is the number of coefficients in a
subtree. This number is independent of the number of subtree processors for the storage
of SIGN and MSIG. Once the subtrees are partitioned and allocated to the processors,
the processing of the subtrees by each processor is exactly the same as in the case of the
sequential architecture. An important aspect of the parallel processing of the subtrees is
how to multiplex the output from each subtree back into a single bit stream.
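The allocation of N subtrees to M processors and the storage count N x NC can be sketched as follows. The Python is illustrative only: the function names are hypothetical, and the assignment of consecutive subtree indices to a processor is one possible choice under the equal-allocation assumption stated above.

```python
def allocate_subtrees(n_subtrees, m_processors):
    """Assign subtree indices to processors, N / M consecutive subtrees each."""
    if n_subtrees % m_processors:
        raise ValueError("this sketch assumes N is a multiple of M")
    per_proc = n_subtrees // m_processors
    return [list(range(p * per_proc, (p + 1) * per_proc))
            for p in range(m_processors)]

def storage_words(n_subtrees, nc):
    """Temporary storage words for SIGN and MSIG: N x NC, independent of M."""
    return n_subtrees * nc
```

With N = 16, choosing M = 16 gives one subtree per processor (maximum parallelism) and M = 1 gives the sequential architecture, while `storage_words(16, 64)` is the same in both cases.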
Let SAQik be the output SAQ symbol and MAPik the output MAP symbol of the ith
subtree during the kth clock cycle in the SAQ pass and the MAP pass respectively. The
SAQ pass and MAP pass correspond to First Stage and Second Stage processing
respectively. The basic operation to multiplex the output bits from the subtrees is to
take the output symbols from each subtree in a round robin fashion. Due to the
scanning of the coefficient positions one by one in the SAQ pass to generate the SDF
bits and in the same process to output the SAQ bits, there are clock cycles that do not
produce an output bit from a subtree.
Another complication is to indicate if the output from a subtree is valid so that it can
be read and the output bit can be collected. In the DFS BS architecture, this information
is readily available. When a coefficient position is scanned during the SAQ pass, the
MSIGik bit is read. If MSIGik = 1, then it indicates that the output from the parallel
processor is valid and is to be collected for this subtree. In each clock cycle, the parallel
processors generate a maximum of N bits at their outputs. The multiplexer needs to
read each of the MSIGik bits and collect the output bits for those outputs where MSIGik
= 1. In the MAP pass, a similar approach can be used to collect the MAPik symbols. In
this case, the multiplexer uses the NZFik bit which can be output together with the MAP
symbols. If NZFik = 1, the output from the processor processing the ith subtree is valid.
In this case, two bits are collected from the output for the MAP symbols of this subtree.
Figure 3.17 shows a schematic diagram of the overall structure of the parallel DFS BS
EZW architecture.
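The round robin collection with validity bits can be sketched in Python. This is a software illustration with hypothetical names: `outputs[i][k]` stands for the symbol of the ith subtree in the kth clock cycle, and `valid[i][k]` stands for MSIGik in the SAQ pass or NZFik in the MAP pass.

```python
def multiplex(outputs, valid):
    """Round-robin collection of valid symbols from the subtree processors.

    outputs[i][k]: symbol from subtree i in clock cycle k.
    valid[i][k]:   1 if that symbol is to be collected, else 0.
    """
    stream = []
    n_cycles = len(outputs[0])
    for k in range(n_cycles):              # one clock cycle at a time
        for i in range(len(outputs)):      # visit the subtrees in turn
            if valid[i][k]:
                stream.append(outputs[i][k])
    return stream
```

For instance, with two subtrees producing `["a0", "a1", "a2"]` and `["b0", "b1", "b2"]` and validity bits `[1, 0, 1]` and `[0, 0, 1]`, the collected stream is `["a0", "a2", "b2"]`; the second clock cycle contributes nothing, which is the situation the output buffer has to absorb.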
[Figure 3.17: subtree processors -> DFS bit streams -> Multiplexer -> Transmit]
Figure 3.17 Structure of the parallel DFS BS EZW architecture
3.3.2 Output Buffering Requirements 
Since a subtree does not always output a symbol during a clock cycle, a buffer at the
output of the multiplexer is required to overcome the asynchronous nature of the overall
output bit stream if synchronization of the output bit stream is required for the next
stage of processing. The length of the buffer depends on the statistics of how the EZW
symbols are generated. A handshaking circuit between the multiplexer and each of the
subtree processors can be used to prevent the buffer from overflowing. A subtree
processor only outputs its symbol to the multiplexer if the buffer is not full. While
handshaking can be used to prevent the buffer from overflowing, underflow, where
there is no output from any of the subtrees, poses a more serious problem. In the
extreme case where a blank image or a blank frame of video is input to the EZW
system, there will be no significant coefficients in every scan of the coefficient tree. As
a result, there will be only N MAP symbols, i.e. ZTRs, generated during each scan of the
coefficient tree. A blank video frame is not unusual in the case of video coding.
While in the First Stage of processing by a subtree processor every node in the
subtree needs to be searched to establish the SDF bits and to output the SAQ bits, in the
Second Stage a more intelligent scheme can be used to avoid scanning each of the
nodes. Equipped with the knowledge of the ancestor-descendant relationship, the
search can skip to a node that has a symbol to output. This will inevitably make the
addressing scheme more complex. Another way to ease the underflow problem is to
perform another kind of parallel processing. In this case, a subtree processor can be
modified to accept two subtrees. While the processor is performing the First Stage of
one subtree, it simultaneously performs the Second Stage for the other subtree. In
addition, we can also combine the outputting of the SAQ and MAP symbols in the
Second Stage. These modifications will reduce the output buffer requirements but will
increase the complexity of a subtree processor.
3.3.3 Different Output Schemes 
In the previous chapter, we found that the DFS causes a slight decrease in the
coding performance when compared to the raster scan method used in the original EZW
algorithm. During the outputting of the MAP symbols in the Second Stage, instead of
scanning the storage linearly from top to bottom using the DFS scanning scheme, an
address generator can be incorporated to generate the corresponding raster scan address
in the DFS storage. In this way, the order of outputting MAP symbols can be generated
according to the raster scan scheme. For the SAQ symbols, if reordering is performed,
the order in which they are output from the subtree processors is not significant. Since
the original EZW algorithm was reported, there have been several improvements
proposed to increase its coding performance. The proposed improvements make use of
modified and additional MAP symbols that have different lengths of binary
representations ranging from one to three bits. To implement the improved algorithms,
the multiplexer can be modified to read different numbers of bits from the subtree
processors with an increase in its implementation complexity.
Another more complex modification to the multiplexer can be in the way it collects
the output bits from the subtree processors. Instead of a round robin fashion, the
collection can be guided by some heuristics resulting in schemes that are similar in
principle to the set partitioning search strategy used in the SPIHT algorithm. One
possible heuristic would be to collect more bits from subtree processors with a higher
number of significant coefficients as this may imply that the information contained in
these subtrees may be more likely to be significant than the information contained in
other subtrees. To further increase the options for bit collection, each subtree processor
can be further split into three orientation processors corresponding to the three higher
frequency orientation branches of a subtree. Figure 3.18 shows the structure of the
heuristic-guided parallel architecture with the HL, LH and HH orientation processors.
This structure allows for different heuristics to guide the collection of bits for the
various frequency orientation branches. A simple way to deal with the LL root
coefficient is to always imply that SDF = 1 so that the search will descend to the three
orientation branches. In the same way, each orientation processor can be further split
into another three sub-branch processors if more collection options are required. The
DFS allows the partitioning of a coefficient tree into as many or as few sub-branches as
required by the application.
[Figure 3.18: Subtree 1, Subtree 2, ..., Subtree N processors, each split into orientation processors, feed bit streams to a Multiplexer under a Heuristics control block; the multiplexer output goes to Transmit]
Figure 3.18 Structure of the heuristic-guided parallel architecture
3.4 Conclusions
In this chapter, we have presented new architectures for the implementation of EW
algorithms based on the DFS representation of the coefficient tree. The DFS performs a
natural partitioning of the coefficient tree into subtrees which can be processed
independently or in parallel and provides an efficient scheme to determine
ancestor-descendant relationships in the tree structure. We have presented two new architectures
and focused the discussion on the issues of hardware simplicity and flexibility to
implement algorithm variations.
For the first issue of simplicity, we have presented the DFS BS EZW architecture
which is fast and uses minimal storage and simple processing. The feature of the bit
stream architecture is that the coefficient magnitudes are not kept in the processor but
are discarded after use. By the DFS method, it is possible to process the coefficient bits
without direct involvement of its ancestors or descendants. The architecture uses two
stages of processing corresponding to two scans of the coefficient tree. A counter is
used to keep track of which level is being searched, simple logic circuits are used for
various functions and another counter is used to generate the address to access the
information in the storage. While the SIGN and MSIG bits need to be kept for each
node in the coefficient tree, the information in SIG and SDF is only used for one scan of
a subtree and their storage can be reused for processing of a new subtree.
For applications which only require hardware encoding capability such as digital
cameras and satellite imagers, the DFS BS encoder together with a suitable DWT is
sufficient to provide a hardware encoder solution. For applications which require both
hardware encoding and decoding capabilities such as video coding, a DFS BS decoder
is required as well. We have discussed the decoder architecture and have identified a
complexity for the architecture due to the compaction of the bit stream by channel
coding. We have proposed two approaches for the decoder. The first approach is to use
an input buffer to restore the uncompacted bit stream. The issue here is the size of
the input buffer required. If the required buffer size is small then this approach looks
promising. The size of the buffer would be large if the number of ZTR symbols
received is high but could be smaller if the number of ZTR symbols received is lower.
The optimal size of the input buffer can be determined by statistical simulations. The
second approach is to use a more complex scheme to perform the decoding where each
bit which is output from the decoder is accompanied by an address which informs the
wavelet processor of the position of the coefficient in the tree. This approach would be
suitable for architectures where the DWT and TS processors share the same storage and
processing resources.
For the second issue of flexibility to implement algorithm variations, we have
focused the discussion on the parallel DFS BS EZW architecture. The architecture
makes use of the DFS BS EZW processor to process a subtree or a concatenation of
several subtrees to produce intermediate information in the DFS format in parallel. The
proposed architecture is independent of the subtree processor in that another processor
implementing other MAP coding schemes can be used instead. Once the intermediate
information has been generated in the First Stage, in the Second Stage, the MAP
symbols can be output in the DFS format or another format by using a more complex
multiplexer. Rather than the linear addressing scheme used in the basic DFS subtree
processor, a generator to produce the addressing scheme corresponding to a different
search method can be incorporated. The proposed parallel architecture provides a
flexible scheme for different degrees of parallelism and can be easily modified to
implement the Improved EZW algorithm or algorithms similar to the SPIHT algorithm.
When multiplexing the output from each subtree processor back to a single bit 
stream, the problem of overflowing the output buffer is addressed by the use of 
handshaking. A more serious problem is the underflow of the output buffer where there 
is very low output from a pass of the coefficient tree which will lead to an asynchronous 
output situation in the overall output bit stream. The underflow situation can be eased 
by the use of more complex addressing schemes in the Second Stage of processing to 
produce the output symbols. 
The sequential and parallel DFS BS approaches give a diversity of implementation
options and coding performances to meet the requirements for different applications.
For example, a sequential DFS BS architecture may be sufficient to meet the required
performance of a videophone application. For applications which require high
performance such as HDTV, the parallel DFS BS architecture gives faster processing
and higher performance due to better output collection methods. Parallel processing is
used not only to increase the speed of processing but also to achieve higher
performance.
We have identified a number of complexities which can be solved by having a single
structure for the DWT and TS processors where storage and processing are shared. The
SIGN and MSIG bits required for encoding can be stored in the DWT processor. The
input and output buffering requirements for the TS processor can be minimized by using
a scheme whereby the DWT processor communicates with the TS processor for each bit
which is input to or output from the processor. In the next chapter, we will discuss
different hardware approaches for implementing the joint DWT-TS system.
Chapter 4 
Hardware Implementation of the DFS BS SPIHT System 
4.1 Introduction
The hardware implementation of embedded wavelet (EW) algorithms would require
two sets of decisions to be made. The first decision is to select the algorithm variation
to be implemented and the second decision is regarding the hardware platform for
implementation. There may also be a need to consider the suitability of the hardware
for the selected algorithm variation. Certain variations in the algorithm may incur much
more implementation complexity than other variations because of constraints specific to
the hardware technology. Figure 4.1 shows the variations in EW algorithms for
hardware implementation. The variations are divided into the four categories of tree
structure, search strategy, coding scheme and symbol stream. We have already
discussed these variations in Chapter 2. In this chapter, we select a particular variation
for hardware implementation. The variation we have selected is highlighted in the
figure.
Tree structure   Search strategy    Coding scheme   Symbol stream
EZW              DFS                EZW             Alternating
                 BFS                Improved EZW
SPIHT            Set partitioning   SPIHT           Mixed
Figure 4.1 Variations for EW algorithms
From the discussion in the previous chapters, we have found that the EZW tree
structure and the DFS tree searching strategy are suitable for hardware implementation.
The former because it treats the root nodes as any other tree node and has a more
suitable ancestor-descendant relationship in the tree structure. The latter because it
allows for the bit stream architecture where the coefficient magnitude bits do not have
to be stored and are processed as they flow through the processor. We have selected the
SPIHT coding scheme because it gives the highest coding performance and the mixed
symbol stream because it minimizes the input and output buffering requirements.
The SPIHT algorithm can be described as consisting of discrete wavelet transform
(DWT), tree searching (TS) and arithmetic coder (AC) modules. As such, the
implementation of the full SPIHT algorithm may require up to three separate processors
to perform the functions of the DWT, TS and AC. An advantage of the SPIHT
algorithm over the EZW algorithm is that the output bit stream of the former is compact
and arithmetic coding does not significantly increase the algorithm performance. By
not requiring the AC module, the implementation of the SPIHT algorithm requires two
processors. One processor performs the DWT/IDWT and another processor performs
the TS/ITS. The DWT module is an essential component in wavelet coding systems.
Several hardware architectures have been reported for the DWT for implementation in
VLSI [50], [51], [52], field programmable logic devices (FPLDs) [55] and DSP
processors [56]. The TS module is specific to architectures for embedded wavelet
coding. The TS architectures can be broadly classified according to the way they perform
searching of the wavelet trees: breadth-first search or depth-first search (DFS).
We investigate the feasibility of a single chip device to implement the DFS BS
SPIHT system. A single chip solution for the DFS BS SPIHT system is a very
promising solution which has low cost and high performance for video coding
applications in general and for portable video communicators in particular. The work
presented in this chapter gives three advantages for the hardware implementation of
video coding systems. The three advantages correspond to the three design levels of
algorithm, architecture and implementation. At the algorithm level, the DFS SPIHT
algorithm gives several advantages over DCT-based coding schemes used in the current
JPEG [57] and MPEG [58] coding standards. In addition to high coding performance
which is very close to that of the original SPIHT algorithm, the DFS SPIHT algorithm
maintains the natural features of EW algorithms like bit stream scalability and the
ability to meet a target bit rate exactly. At the architecture level, the DFS SPIHT
algorithm can be easily implemented using low complexity bit stream architectures.
In this chapter, we will look at the advantages of the DFS BS SPIHT architecture for
implementation. We will discuss how the bit stream TS/ITS processor can be
efficiently combined with a DWT/IDWT processor to form a single architectural
structure for hardware implementation. We will discuss two approaches for hardware
implementation. The first approach uses a conventional memory bank approach and the
second approach uses a futuristic smart pixel (SP) VLSI approach. We apply the bit
stream DFS BS SPIHT EW processor for the memory bank approach and the parallel
DFS BS SPIHT EW architecture for the SP approach. For the various approaches, we
will consider two implementation issues. The first issue is the interfacing of the
DWT/IDWT processor with the TS/ITS processor. The second issue is the distribution
of the processing and storage requirements amongst the DWT/IDWT processor and the
TS/ITS processor to achieve an optimal system.
We present a memory bank implementation of the DFS BS SPIHT system which can
be switched to perform encoding and decoding on the same device. The input into the
system during encoding is a stream of image pixels and the output is an encoded bit
stream ready for transmission. The system produces the reconstructed image data from
the received bit stream during decoding. The proposed system uses two memory banks
for processing: coefficient memory bank and subtree memory bank. To reduce storage
requirements, the coefficient memory bank is used at different stages of processing to
store the image pixels or wavelet coefficients. In the encoding stage, the coefficient
memory bank contains the image pixels initially. After the DWT is performed, the
memory bank contains the wavelet coefficients. During decoding, the coefficient
memory bank stores the reconstructed wavelet coefficients and after the IDWT contains
the decoded image pixels.
This chapter is organized as follows. In Section 4.2, we discuss the DFS SPIHT
algorithm in relation to the original SPIHT algorithm. We discuss the memory bank
and SP VLSI approaches for hardware implementation in Section 4.3 and Section 4.4
respectively. In Section 4.5, we present a memory bank implementation of the DFS BS
SPIHT system using SYNOPSYS simulation and synthesis tools. We give conclusions
in Section 4.6.
4.2 DFS SPIHT Algorithm
The SPIHT algorithm maintains three ordered lists to store the significance
information. The three lists are the list of insignificant sets (LIS), list of insignificant
pixels (LIP) and list of significant pixels (LSP). The SPIHT algorithm as reported in
[17] is shown in Figure 4.2. The algorithm uses the following functions to indicate the
significance of the coefficients or its descendants:
• Sn(i,j) : significance of the coefficient;
• Sn(D(i,j)) : significance of the descendants of the coefficient;
• Sn(L(i,j)) : significance of the descendants of the coefficient with the exception of
its immediate children
where (i,j) is a coordinate in the LIS, LIP or LSP. The SPIHT algorithm poses two
complexities for implementation:
1. The search order of the coefficient tree is according to the set partitioning strategy.
This requires the coefficients to be processed in the order indicated in the LIS;
2. The size of the LIP would decrease from a maximum and the size of the LSP would
increase from a minimum.
Complexity (1) causes irregularity in processing the coefficient tree where the order
of processing is not fixed but depends on the LIS. Complexity (2) would require either
variable storage schemes or a worst-case allocation of the maximum memory required
for both LIP and LSP.
To overcome the two complexities, we propose the following modifications:
1. The depth-first search is used instead of the set partitioning search strategy. This
permits a regular processing of the coefficient tree according to the DFS traversal
order. Moreover, since the order of processing is fixed, the LIS need not be kept,
thus reducing the memory requirements.
2. Coefficients which have been found to be significant are not taken out of the tree but
are marked significant (MSIG). The LIP and LSP are kept together in the tree
structure and the MSIG symbol indicates if the coefficient is in the LIP or LSP.
In the architecture, each node of the coefficient tree is associated with a set of binary
symbols. Table 4.1 shows the MAP symbols used in the DFS SPIHT algorithm and the
corresponding SPIHT significance functions. An additional symbol which is used in the
DFS SPIHT algorithm is MSIG.
Table 4.1 MAP symbols for DFS SPIHT and SPIHT algorithms
DFS SPIHT                                   SPIHT
SIGN                                        Sign
SIG (Significant Coefficient Flag)          S_n(i,j)
SDF (Significant Descendant Flag)           S_n(D(i,j))
SGDF (Significant Grand Descendant Flag)    S_n(L(i,j))
MSIG (Marked Significant Flag)              -
Figure 4.3 shows the DFS SPIHT algorithm. The wavelet coefficients are in the
sign-magnitude representation where the sign bits are stored in SIGN and the magnitude
bits are stored in MAG. The number of magnitude bits used to represent the coefficient
is given by n and each tree node is represented by (i). The symbol LEAF(i) indicates if
the node is a leaf in the coefficient tree. The output from the algorithm contains a
mixture of MAP and MAG symbols. To separate the MAP and MAG symbols for
transmission would require additional storage. To avoid the additional storage, we
propose a third modification:
3. The MAP and MAG symbols are transmitted as they are generated.
Figure 4.3 shows that by using the DFS and bit stream processing, the implementation
of the DFS SPIHT algorithm is simplified when compared to the original SPIHT
algorithm.
SPIHT Algorithm:
1) Initialization:
   output n = ⌊log2(max_(i,j) |c_i,j|)⌋;
   set the LSP as an empty list, and add the coordinates (i,j) ∈ H to the LIP, and
   only those with descendants also to the LIS, as type A entries.
2) Sorting Pass:
   2.1) for each entry (i,j) in the LIP do:
        2.1.1) output S_n(i,j);
        2.1.2) if S_n(i,j) = 1 then move (i,j) to the LSP and output the sign of c_i,j;
   2.2) for each entry (i,j) in the LIS do:
        2.2.1) if the entry is of type A then output S_n(D(i,j));
               • if S_n(D(i,j)) = 1 then for each (k,l) ∈ O(i,j) do:
                    • output S_n(k,l);
                    • if S_n(k,l) = 1 then add (k,l) to the LSP and
                      output the sign of c_k,l;
                    • if S_n(k,l) = 0 then add (k,l) to the end of the LIP;
               • if L(i,j) ≠ ∅ then move (i,j) to the end of the LIS, as an entry
                 of type B, and go to Step 2.2.2); otherwise, remove entry (i,j)
                 from the LIS;
        2.2.2) if the entry is of type B then
               • output S_n(L(i,j));
               • if S_n(L(i,j)) = 1 then
                    • add each (k,l) ∈ O(i,j) to the end of the LIS
                      as an entry of type A;
                    • remove (i,j) from the LIS.
3) Refinement Pass:
   for each entry (i,j) in the LSP, except those included in the last sorting pass
   (i.e., with same n), output the nth most significant bit of |c_i,j|;
4) Quantization-Step Update:
   decrement n by 1 and go to Step 2.
Figure 4.2 SPIHT algorithm
DFS SPIHT Algorithm:
1) Initialization:
   output n and set MSIG(i) to 0 for all tree nodes
2) Coding and Refinement Pass:
   2.1) for each node (i) in the coefficient tree do:
        2.1.1) if MSIG(i) = 1 then output MAG(i,n)
        2.1.2) if MSIG(i) = 0 then
               • output SIG(i)
               • if SIG(i) = 1 then output SIGN(i) and set MSIG(i) = 1
               • if LEAF(i) = 0 then output SDF(i) and SGDF(i) if SDF(i) = 1
3) Quantization-Step Update:
   decrement n by 1 and go to Step 2
Figure 4.3 DFS SPIHT algorithm
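The steps of Figure 4.3 can be illustrated with a short Python sketch. It is a literal
reading of the figure under stated assumptions, not the hardware design. The tree is
held in a hypothetical `children` dictionary, SDF and SGDF use the SPIHT significance
test of Table 4.1 (a descendant magnitude of at least 2^n), SIGN = 1 is taken to mean
a negative coefficient, and every node is visited in each pass, so the skipping of
branches with SDF = 0 is not modelled. The names `dfs_order`, `descendants` and
`dfs_spiht_encode` are illustrative only.

```python
def dfs_order(children, root):
    """Return the node identifiers in depth-first (pre-order) sequence."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so that the first child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return order


def descendants(children, node):
    """Return all nodes below `node` (children, grandchildren, and so on)."""
    found, stack = [], list(children.get(node, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children.get(current, []))
    return found


def dfs_spiht_encode(coeff, children, root, passes):
    """Emit (symbol, value) pairs following steps 1) to 3) of Figure 4.3."""
    mag = {i: abs(c) for i, c in coeff.items()}
    n = max(mag.values()).bit_length() - 1   # floor(log2(max |c|)); -1 if all zero
    msig = {i: 0 for i in coeff}             # step 1: MSIG(i) = 0 for all nodes
    out = [("n", n)]
    order = dfs_order(children, root)
    for _ in range(passes):
        if n < 0:
            break
        threshold = 1 << n
        for i in order:
            if msig[i]:                      # step 2.1.1: refinement bit
                out.append(("MAG", (mag[i] >> n) & 1))
                continue
            sig = (mag[i] >> n) & 1          # step 2.1.2: node not yet marked
            out.append(("SIG", sig))
            if sig:
                out.append(("SIGN", 1 if coeff[i] < 0 else 0))
                msig[i] = 1
            kids = children.get(i, [])
            if kids:                         # LEAF(i) = 0
                below = descendants(children, i)
                sdf = int(any(mag[d] >= threshold for d in below))
                out.append(("SDF", sdf))
                if sdf:
                    grand = [d for d in below if d not in kids]
                    sgdf = int(any(mag[d] >= threshold for d in grand))
                    out.append(("SGDF", sgdf))
        n -= 1                               # step 3: next bit plane
    return out


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3, 4]}            # hypothetical five-node tree
    values = {0: 5, 1: -2, 2: 0, 3: 1, 4: 0}
    for symbol in dfs_spiht_encode(values, tree, root=0, passes=2):
        print(symbol)
```

Because no list is reordered between passes, the same `order` is reused for every bit
plane, which is the regularity that modification 1 is intended to provide.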
4.3 Memory Bank Implementation
Figure 4.4 shows the memory bank DFS BS SPIHT structure. The encoder and
decoder consist of four modules and two memory banks. The memory banks are used
at different stages to perform encoding and decoding. During encoding, the coefficient
memory bank initially contains the image pixels. The DWT processor operates on the
image pixels and transforms them into their corresponding wavelet coefficients. The
coefficients are stored in their sign-magnitude representation. The SIGN, MAG and
MSIG bits for a subtree are shifted out of the memory bank in decreasing order of their
bit planes. The DWT-to-DFS converter converts the MAG bits and their corresponding
SIGN and MSIG bits into the DFS format for input into the First Stage TS encoder.
The converter can be implemented using a look-up table in ROM. The First Stage TS
encoder calculates the SIG, SDF and SGDF bits and stores the SIG, SIGN, SDF, SGDF
and MSIG bits in the subtree memory bank. The Second Stage TS encoder reads the
bits out from the subtree memory bank and outputs the MAP for non MSIG bits and
outputs the SAQ for MSIG bits. During decoding, the transmitted bit stream is input
into the Second Stage TS decoder which decodes the MAP and SAQ bits into the
corresponding SIGN and MAG bits. The decoded SIGN and MAG bits are converted
from the DFS format back to the DWT format by the DFS-to-DWT converter and
stored in the coefficient memory bank. The IDWT processor then operates on the
coefficients to restore the image pixels.
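The sign-magnitude storage and the bit-plane ordering described above can be sketched
as follows. This is an illustrative Python model rather than the memory bank circuit.
The function names, the convention that SIGN = 1 marks a negative coefficient, and the
example width are assumptions.

```python
def to_sign_magnitude(value, width):
    """Split a signed coefficient into (SIGN, MAG) with `width` magnitude bits."""
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    if magnitude >= (1 << width):
        raise ValueError("magnitude does not fit in the given width")
    return sign, magnitude


def bit_planes(coefficients, width):
    """Yield (plane, MAG bits) from the most to the least significant plane."""
    magnitudes = [to_sign_magnitude(c, width)[1] for c in coefficients]
    for plane in range(width - 1, -1, -1):
        yield plane, [(m >> plane) & 1 for m in magnitudes]


if __name__ == "__main__":
    subtree = [5, -2, 0]                      # hypothetical coefficients
    print([to_sign_magnitude(c, 3)[0] for c in subtree])
    for plane, bits in bit_planes(subtree, 3):
        print(plane, bits)
```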
[Figure 4.4: block diagram. Encoder: DWT processor, coefficient memory bank,
DWT-to-DFS converter, First Stage TS encoder, subtree memory bank, Second Stage TS
encoder, transmitted bit stream. Decoder: Second Stage TS decoder, DFS-to-DWT
converter, coefficient memory bank, IDWT processor, First Stage TS decoder, subtree
memory bank.]
Figure 4.4 Memory bank DFS BS SPIHT structure
The First Stage TS decoder and the subtree memory bank are not essential for the
decoding process. They are used to simplify the processing requirements for the
Second Stage TS decoder. One requirement for the Second Stage decoder is to
determine if coefficients are already MSIG. MAP bits and SAQ bits are decoded from
the transmitted bit stream for non-MSIG coefficients and MSIG coefficients
respectively. Figure 4.5 shows a coefficient tree with a MSIG node. The condition
SDF = 0 implies that there are no MAP bits to be decoded for the coefficients.
However, this does not mean that there are no SAQ bits to be decoded. The decoder
would still have to decode SAQ bits for MSIG coefficients even though their ancestor
nodes have SDF = 0. In order to do this, the Second Stage decoder would have to scan
the coefficient memory bank linearly to determine MSIG coefficients.
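[Figure 4.5: coefficient tree in which an ancestor node is labelled SDF = 0 and a
node below it is labelled as an MSIG node.]
Figure 4.5 Coefficient tree with MSIG node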
The purpose of the First Stage decoder is to read the coefficient memory bank and
use the subtree memory bank to store the MSIG information in a tree structure similar to
the structure used in the encoding process. In this case, we use the SIG and SDF bits to
determine the MSIG tree. MSIG coefficients have SIG = 1. The condition SDF = 0
implies that there are no MSIG descendant nodes in the branch. The Second Stage
decoder searches the tree to determine MSIG coefficients. Using the subtree memory
bank to store the MSIG tree does not increase the hardware complexity significantly
because the memory storage is required for encoding. It does however simplify the
processing requirements for the Second Stage decoder for determining MSIG
coefficients.
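The role of the MSIG tree can be sketched in Python as below. This is a minimal model
under assumptions, not the decoder circuit: the tree is a hypothetical `children`
dictionary, the First Stage is modelled as a bottom-up pass that sets SIG = 1 on MSIG
nodes and SDF = 1 on nodes with an MSIG descendant, and the Second Stage is modelled
as a top-down search that skips any branch whose SDF is 0.

```python
def build_msig_tree(children, root, msig_nodes):
    """First Stage model: derive SIG and SDF flags bottom-up from the MSIG set."""
    sig, sdf = {}, {}

    def visit(node):
        has_msig_below = False
        for child in children.get(node, []):
            # `|=` does not short-circuit, so every child is always visited.
            has_msig_below |= visit(child)
        sig[node] = 1 if node in msig_nodes else 0
        sdf[node] = 1 if has_msig_below else 0
        return has_msig_below or node in msig_nodes

    visit(root)
    return sig, sdf


def find_msig(children, root, sig, sdf):
    """Second Stage model: top-down search that prunes branches with SDF = 0."""
    found, visited, stack = [], [], [root]
    while stack:
        node = stack.pop()
        visited.append(node)
        if sig[node]:
            found.append(node)
        if sdf[node]:
            stack.extend(reversed(children.get(node, [])))
    return found, visited


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}   # hypothetical seven-node tree
    sig, sdf = build_msig_tree(tree, 0, {4})
    print(find_msig(tree, 0, sig, sdf))
```

In this example the children of node 2 are never visited, whereas a linear scan of the
coefficient memory bank would have to examine every location.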
4.3.1 DWT/IDWT Processor
Figure 4.6 shows an 8x8 image pixel array and the DWT coefficients for three scales
of subband decomposition. The subbands arise from separable application of row and
column filters at each scale. In each row and column filtering, low-pass and high-pass
filters are applied. The outputs of the low-pass and high-pass filters are decimated by
two to give the same number of coefficients as image pixels. The filtering process is
applied repeatedly on the lowest frequency subband until the required number of scales
is reached.
[Figure 4.6: left, the 8x8 array of image pixels I(1,1) to I(8,8); right, the
corresponding 8x8 array of LL, HL, LH and HH subband coefficients after the DWT.]
Figure 4.6 Image pixels and DWT coefficients
Figure 4.7 shows the arrangement of the image pixels in the coefficient memory
bank for encoding. The image rows are arranged consecutively in the memory bank.
At each stage, the coefficients pointed to by a set of memory pointers are filtered. We
illustrate the scheme using three-tap filters. The scheme can be extended for higher tap
filters by increasing the number of memory pointers. We define three memory pointers
Back, Current and Front as shown in Figure 4.7(a).
[Figure 4.7: memory bank contents during encoding. (a) Image pixels I(1,1) to I(8,8)
stored row by row, with the Back, Current and Front pointers. (b) Alternating L and H
coefficients after the row DWT, with the pointers set for column filtering. The
remaining panels show the LL, HL, LH and HH coefficients after the column DWT, ending
with the final wavelet coefficient arrangement in (d).]
Figure 4.7 Memory bank DWT arrangement for encoding
To perform the row filtering, the memory pointers are set to consecutive locations
and moved one location at a time down the memory bank. The Back and Front memory
pointers differ from the Current pointer by an Offset depending on the current scale.
For the Sth scale of subband decomposition, the Offset is given by 2^(S-1). The memory
pointers are given by:
Back = Current - Offset (4.1)
Front = Current + Offset (4.2)
The low-pass and high-pass coefficients are stored in alternating memory locations as
shown in Figure 4.7(b).
To perform the column filtering, the memory pointers are initialized as shown in
Figure 4.7(b). The pointers are given by:
Back = Current - Offset x Image_Col (4.3)
Front = Current + Offset x Image_Col (4.4)
where Image_Col is the number of columns in the image. For filtering at the row and
column boundaries, symmetric extension of the image is applied. For filtering at the
beginning of rows or columns, the Back pointer is set to Front and for filtering at the
end of rows or columns, the Front pointer is set to Back. For each scale of subband
decomposition, only the lowest frequency subband is further decomposed. After the
last coefficient in a column is filtered, the Current memory pointer is set to skip the
higher frequency subbands which are not required for further filtering. The memory
pointer is set by:
Current = Current + (Offset - 1) x Image_Col (4.5)
The filtering process is repeated for the number of scales of decomposition required.
After the filtering, the memory bank contains the wavelet coefficients in the
arrangement as shown in Figure 4.7(d).
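The pointer arithmetic of equations (4.1) to (4.5) can be written as a small Python
sketch. It only computes addresses and does not apply any filter. The function names
and the `stride` parameter are assumptions made for illustration: a stride of 1 gives
the row pointers of (4.1) and (4.2), and a stride of Image_Col gives the column
pointers of (4.3) and (4.4).

```python
def filter_pointers(current, offset, stride=1, at_start=False, at_end=False):
    """Return (Back, Current, Front) for a three-tap filter.

    stride = 1 for row filtering, stride = Image_Col for column filtering.
    Symmetric extension: Back is set to Front at the beginning of a row or
    column, and Front is set to Back at the end.
    """
    back = current - offset * stride
    front = current + offset * stride
    if at_start:
        back = front
    elif at_end:
        front = back
    return back, current, front


def skip_high_bands(current, offset, image_col):
    """Equation (4.5): move Current past subbands needing no further filtering."""
    return current + (offset - 1) * image_col


if __name__ == "__main__":
    print(filter_pointers(5, 1))                    # interior of a row
    print(filter_pointers(0, 1, at_start=True))     # first location of a row
    print(filter_pointers(16, 2, stride=8))         # column filtering, Image_Col = 8
    print(skip_high_bands(40, 2, 8))
```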
The reverse process is performed for decoding. Initially, the memory bank contains
the wavelet coefficients and after decoding it contains the decoded image pixels. Figure
4.8 shows the IDWT arrangement for the decoding process. The IDWT processing
alternates between column filtering and row filtering for each scale. The operations
required for the DWT encoding and IDWT decoding processes are similar. The
difference is to use analysis filters for the DWT decomposition and synthesis filters for
the IDWT reconstruction.
[Figure 4.8: memory bank contents during decoding, from the wavelet coefficient
arrangement (LL, HL, LH, HH), through the column IDWT and the row IDWT (alternating
L and H coefficients), to the image pixels I(1,1) to I(8,8).]
Figure 4.8 Memory bank IDWT arrangement for decoding
4.3.2 DWT-to-DFS/DFS-to-DWT Converter 
For input into the TS encoder and decoder, the DWT format in the coefficient
memory bank has to be converted into the DFS format. These conversions can be
performed using look-up tables (LUT) implemented in ROM. The number of entries in
the LUT is determined by the size of the subtree. Figure 4.9 shows the memory bank
DWT arrangement for four subtrees.
[Figure 4.9: memory bank listing of LL, HL, LH and HH coefficients with the rows
belonging to Subtree 1, Subtree 2, Subtree 3 and Subtree 4 marked.]
Figure 4.9 Memory bank DWT arrangement for four subtrees
The coefficients for the first row of subtree 1 are stored in the first eight consecutive
memory locations and the coefficients for the first row of subtree 2 are stored in the next
eight consecutive locations. The last eight locations store the coefficients for the last
row of subtree 4. The rows for each subtree are not located in consecutive locations but
are dispersed throughout the memory bank. One approach to bring the subtree rows
into consecutive locations is to perform a rearrangement of the coefficients in the
memory bank. This approach would require additional processing to be performed. A
better approach would be to leave the coefficients in the memory bank and use a more
intelligent addressing scheme to bring the coefficients for each subtree together. This
approach takes advantage of the fact that although the subtree rows are dispersed
throughout the memory bank, the number of memory locations from one subtree row to
the next is always the same for a particular image size.
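The constant spacing between subtree rows can be illustrated with a short Python
sketch. It assumes, consistently with the four-subtree example above, that the bank
is addressed from 0 in row-major order, that each subtree occupies an 8x8 block, and
that subtrees are counted from left to right and then from top to bottom. The
function name and the 0-based subtree index are hypothetical.

```python
def subtree_row_starts(subtree, image_cols, size=8):
    """Return the start address of each row of one subtree.

    `subtree` is 0-based. Consecutive rows of a subtree are always
    `image_cols` locations apart, whatever the subtree.
    """
    blocks_per_row = image_cols // size
    block_row, block_col = divmod(subtree, blocks_per_row)
    base = block_row * size * image_cols + block_col * size
    return [base + row * image_cols for row in range(size)]


if __name__ == "__main__":
    # 16 columns gives four 8x8 subtrees in a 16x16 bank.
    for index in range(4):
        print(index + 1, subtree_row_starts(index, image_cols=16))
```

With 16 columns, the first row of subtree 1 starts at address 0, the first row of
subtree 2 at address 8, and the last row of subtree 4 at address 248, so that it
fills the last eight locations of a 256-location bank.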
4.3.3 TS/ITS Processor 
The architectures for the TS and ITS processors have been described in Chapter 3.
In this section, we focus our discussion on the handshaking mechanism between the
First Stage and Second Stage processors to access the subtree memory bank. The
handshaking mechanism between the First Stage and Second Stage processors is similar
to the producer-consumer problem with a difference. The difference is that the First
Stage processes from the leaves of the tree to the roots whereas the Second Stage
processes from the roots to the leaves. This requires a LIFO memory storage.
We define two signals VALID and LIFO_SEL to perform the handshaking between
the processors. The VALID signal indicates to the Second Stage processor that there is
an item of data which is ready for processing. The VALID signal is set to 0 initially and
is subsequently set to 1 after the First Stage processor has finished processing the first
subtree. The LIFO_SEL signal indicates the direction to scan the memory bank. When
LIFO_SEL = 0, the memory bank is scanned from the bottom to the top of the memory
bank and when LIFO_SEL = 1, the memory bank is scanned from the top to the bottom
of the memory bank. The LIFO_SEL signal is toggled after each subtree has been
processed. Figure 4.10 shows the handshaking mechanism between the First Stage and
Second Stage processors using the VALID and LIFO_SEL signals.
[Figure 4.10: First Stage TS encoder and Second Stage TS encoder connected through
the subtree memory bank. (a) VALID = 0 and LIFO_SEL = 0. (b) VALID = 1 and
LIFO_SEL = 1.]
Figure 4.10 Handshaking mechanism between First and Second Stage processors
Initially, the VALID signal is set to 0 and the Second Stage processor does not
operate. The LIFO_SEL signal is set to 0 and the First Stage processor begins storing
data from the bottom of the memory bank as shown in Figure 4.10(a). When the data
reaches the top of the memory bank, the VALID signal is set to 1 and the LIFO_SEL
signal is toggled to 1 as shown in Figure 4.10(b). This indicates to the processors to
store and read the data beginning from the top of the memory bank. For the next
subtree, the LIFO_SEL signal is toggled back to 0.
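The VALID and LIFO_SEL handshaking can be modelled with a short Python sketch. It is
a behavioural illustration under assumptions, not the circuit: the bottom of the bank
is taken to be address 0, each location is read by the Second Stage immediately before
the First Stage overwrites it, and the function names are hypothetical.

```python
def scan_addresses(size, lifo_sel):
    """Address order for one subtree.

    LIFO_SEL = 0 scans from the bottom (address 0) to the top,
    LIFO_SEL = 1 scans from the top to the bottom.
    """
    return range(size) if lifo_sel == 0 else range(size - 1, -1, -1)


def simulate_handshake(subtrees):
    """Pass equally sized subtrees through one shared memory bank.

    Returns, for each subtree, the order in which the Second Stage reads it.
    """
    size = len(subtrees[0])
    bank = [None] * size
    valid, lifo_sel = 0, 0
    read_out = []
    # One extra iteration lets the Second Stage drain the last subtree.
    for k in range(len(subtrees) + 1):
        incoming = iter(subtrees[k]) if k < len(subtrees) else None
        drained = []
        for address in scan_addresses(size, lifo_sel):
            if valid:                       # Second Stage reads the old item
                drained.append(bank[address])
            if incoming is not None:        # First Stage then stores a new one
                bank[address] = next(incoming)
        if valid:
            read_out.append(drained)
        valid = 1                           # set after the first subtree
        lifo_sel ^= 1                       # toggled after every subtree
    return read_out


if __name__ == "__main__":
    print(simulate_handshake([["a", "b", "c"], ["d", "e", "f"]]))
```

Each subtree is read in the reverse of the order in which it was written, so a single
bank behaves as a LIFO for successive subtrees without any data being moved.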
4.4 Smart Pixel VLSI Implementation
In an image or video coding system, the capture, digitizing and coding of an image
are normally performed by different modules. Recently, a smart pixel (SP) approach
has been proposed to perform the capture and digitizing of an image in the same device
[59]. In addition, a two-dimensional DWT is also embedded in the device. The
proposed device will also have the capability to perform a two-dimensional IDWT and
to display the resulting image. Our objective is to design a TS processor which can be
merged into the proposed SP array and which will perform coding and possibly
decoding using as many components as possible by switching the processor to encode
or decode as required by the SP array. We will mainly describe the encoding part of the
processor.
4.4.1 Structure of a Smart Pixel 
The SP architecture is an array processor architecture which combines image capture
and processing on a single chip. An architecture to perform the DWT within the SP
array has been reported [59]. The implementation of a two-dimensional DWT consists
of an array of smart pixels. Except for those at the edge of the array, each smart pixel is
connected vertically and horizontally to its four nearest neighbours creating a mesh
structure of processing elements. Figure 4.11 shows the structure of a smart pixel for
encoding. The sensor is responsible for capturing the light at this pixel position. After
the light intensity is converted to digital form and stored in the shift registers, the DWT
circuitry in conjunction with those in the other pixels perform a DWT on the captured
image. Figure 4.12 shows a block diagram showing the components of a complete SP
DFS BS SPIHT system for image coding. The interface circuit shifts out the
information of the coefficients in the DWT format to that of the DFS bit stream format
required by the TS processor. The success of the combined system depends on the use
of the storage in the SP so as to minimize storage requirements in the TS processor and
on shifting as much processing as possible from the TS processor to the SP array. This
would require additional circuitry to be incorporated into each smart pixel to help in
performing the coding.
[Figure 4.11: block diagram of a smart pixel with sensor, A/D circuitry, storage shift
registers and DWT circuitry.]
Figure 4.11 Smart pixel structure
[Figure 4.12: encoder with DWT-to-DFS interface and TS processor, decoder with
DFS-to-DWT interface and ITS processor, linked by the bit stream.]
Figure 4.12 SP DFS BS SPIHT system
4.4.2 Additional Circuitry in Smart Pixel
Figure 4.13 is a block diagram showing the inside of a SP with the additional circuit
for the generation of MSIG and the control for shifting out the MAG, SIGN and MSIG
bits. Register MAG_BITS stores the magnitude of the resulting coefficient after a
DWT has been performed on the captured image using the DWT part of the SP, while
SIGN stores the sign bit. A shift register is considered to be connected through the
smart pixels in each column for the purpose of shifting out the coefficient information.
Each pixel occupies three bits of the shift register. The first bit is always the current
MSIG bit.
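[Figure 4.13: block diagram of the coding component in a SP, showing the column shift
register with its connections from the SP above and to the SP below, the SHIFT_OUT
control, and the SIGN, MAG_BITS and MSIG registers.]
Figure 4.13 Coding component in a SP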
The other part of the additional circuit is for the calculation and updating of MSIG at
the end of each pass of the coefficient tree. The MSIG bit is updated according to the
following equation:
MSIG_p = MAG_1 OR MAG_2 OR ... OR MAG_p (4.6)
where MAG_p is the magnitude bit of the coefficient in the pth pass of the tree with
MAG_1 being the most significant bit of the coefficient. The updating circuit consists of
an OR gate with the current MSIG and MAG as the inputs. The result is latched and is
written into MSIG at the end of the current pass. The contents of the register
MAG_BITS are also shifted to the left. A global control signal, SHIFT_OUT, is used to
indicate the different events involved in the shifting of the three bits for the coefficients,
the updating of MSIG and the left shifting of the bits in MAG_BITS. The actual
shifting of the bits is controlled by the global clock signal. A local circuit in each SP
can be designed to determine the events by counting the occurrences of the change of
SHIFT_OUT.
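The update rule of equation (4.6) and the left shift of MAG_BITS can be sketched in
Python as below. This is a behavioural model under assumptions, not the pixel circuit.
The class name is hypothetical, and the order of the second and third bits shifted out
(SIGN, then the current MAG bit) is assumed, since only the position of MSIG as the
first bit is stated above.

```python
class SmartPixelCoder:
    """Behavioural model of the MSIG, SIGN and MAG_BITS registers of one SP."""

    def __init__(self, coefficient, width):
        self.width = width
        self.mask = (1 << width) - 1
        self.sign = 1 if coefficient < 0 else 0      # assumed: 1 means negative
        self.mag_bits = abs(coefficient) & self.mask
        self.msig = 0

    def current_mag(self):
        """The MAG bit of the current pass is the leftmost bit of MAG_BITS."""
        return (self.mag_bits >> (self.width - 1)) & 1

    def shift_out(self):
        """The three bits this pixel places on the column shift register."""
        return self.msig, self.sign, self.current_mag()

    def end_of_pass(self):
        """OR the current MAG bit into MSIG, then shift MAG_BITS left."""
        self.msig |= self.current_mag()
        self.mag_bits = (self.mag_bits << 1) & self.mask


if __name__ == "__main__":
    pixel = SmartPixelCoder(-5, width=4)             # magnitude 0101
    for _ in range(4):
        print(pixel.shift_out())
        pixel.end_of_pass()
```

Updating MSIG with a single OR per pass gives the same value as the full expression in
(4.6), because the OR of the first p magnitude bits equals the OR of the previous
MSIG with MAG_p.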
4.4.3 DWT Interface and TS Processor 
Figure 4.14 shows a block diagram of the SP interface and the TS processor. Also
shown in the figure is the arrangement of the coefficients in the SP array for one subtree
as a result of a three scale DWT decomposition. The arrangement of the coefficients in
the array is according to the nucleic scheme [59]. By the nucleic scheme, a subtree is
always located in the SP array in this manner. Each subtree is divided into four
quadrants. A column of a quadrant is shifted out first followed by another column and
so on. When a column is shifted, those bits of the same column in the quadrants above
will be shifted to the lower quadrants. The signal SHIFT_OUT controls the shifting of
a column. Three modulo four counters are used to track which quadrant the shifted out
information belongs to as well as the coordinates of the original positions of the
coefficients. This information is needed by the orientation demultiplexer that
distributes the shifted out information to the orientation branch processors as well as by
the First Stage TS processor.
The original DFS algorithm searches a subtree in such a way that the coefficients
belonging to one orientation are searched completely before the search scans the part of
the subtree belonging to another orientation. Due to the nature of shifting out of the bits
of the coefficients in the SP array, information for the three orientations arrives at the
TS processor in parallel. Therefore, the DFS is modified further to break down a
subtree into three orientation branches and a root node. Each of the three orientation
branches is searched in the depth-first manner but the search is performed in a pseudo
parallel way in the First Stage of the processing. The search of the three orientation
branches will finally merge back to the root node. After the First Stage, all the
information about a subtree is stored inside the processor for the Second Stage of
processing. The temporary storage for the root node and the three orientation branches
will be reconfigured back into a single storage structure as defined by the DFS
representation. This storage structure is then scanned from top to bottom to generate the
MAP and SAQ symbols according to the DFS.
[Figure 4.14: SP array arrangement of one subtree (HL and LH quadrants marked) with
quadrant, column and row counters, feeding the First Stage processor, temporary
storage and Second Stage processor; Enable input and Bit stream output.]
Figure 4.14 DFS BS TS processor and SP interface
4.5 Simulation Results
In this section, we present simulation results for a memory bank implementation of
[...] [63] and Design Compiler [64] to determine its architecture complexity. The
synthesized circuit was verified using the test data in [16]. We used the LSI_10K
technology library supplied with SYNOPSYS with a clock period of 100 ns. The
encoder was synthesized using a total cell area of 5609 gates. Next, we simulated and
synthesized the DFS BS SPIHT system for [...] image pixels. For the DWT and
IDWT, we used three scales of subband decomposition and reconstruction using the
[...] filtering can be implemented using shifts and adds. We used 11 bits for the width
of the coefficient memory bank. The DFS BS SPIHT system was synthesized using a
total cell area of 61531 gates. Figure 4.15 shows a top level schematic of the system.
[Figure 4.15: top level block of the DFS BS SPIHT system with the signals
Encode_Decode, Image_Pixels, Bit_Stream, Ready_for_Input and Output_Valid.]
Figure 4.15 Schematic of DFS BS SPIHT system
Table 4.2 shows the characteristics of the data signals. The Encode_Decode input
signal determines if the device is set to perform encoding or decoding. The signal is set
to 0 for encoding and 1 for decoding. During encoding, Image_Pixels contains the input
data to be encoded and Bit_Stream contains the output bit stream to be transmitted.
During decoding, Bit_Stream contains the input bit stream to be decoded and
Image_Pixels contains the output reconstructed data. The Output_Valid handshaking
[...] transmission channel and is always valid and Ready_for_Input is set to 1.
Data signal        Input/Output     Data width
Encode_Decode      Input            1 bit
Image_Pixels       Bidirectional    [...] bits
Bit_Stream         Bidirectional    1 bit
Ready_for_Input    Input            1 bit
Output_Valid       Output           1 bit
We then extended the simulations for different image sizes. For these simulations,
we used the Alcatel Mietec 0.7µ CMOS technology library [66] with a clock period of
50 ns. Figure 4.16 and Figure 4.17 show the total cell area and total clock cycle latency
to implement the DFS BS SPIHT system for a resource-driven and timing-driven
configuration respectively. The resource-driven configuration is synthesized to
minimize cell area by the sharing of common resources. The resource-driven
configuration for 32x32 image pixels was synthesized using a total cell area of 157491
gates with a clock cycle latency of 116 cycles. The timing-driven configuration is
synthesized to minimize clock cycle latency. The timing-driven configuration for
32x32 image pixels was synthesized using a total cell area of 177965 gates with a clock
cycle latency of 65 cycles. For a 13% increase in cell area, the clock cycle latency of
the timing-driven configuration is almost half that required by the resource-driven
configuration. Further explorations in hardware synthesis can be carried out to obtain a
joint optimization between the cell area and clock cycle latency.
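The area and latency comparison quoted above for 32x32 image pixels can be checked
with a few lines of Python. The variable names are illustrative, and the figures are
the gate counts and cycle counts stated in the text.

```python
# Synthesis results quoted in the text for 32x32 image pixels.
resource_driven = {"area_gates": 157491, "latency_cycles": 116}
timing_driven = {"area_gates": 177965, "latency_cycles": 65}

area_increase = (
    timing_driven["area_gates"] - resource_driven["area_gates"]
) / resource_driven["area_gates"]
latency_ratio = timing_driven["latency_cycles"] / resource_driven["latency_cycles"]

print(f"cell area increase: {area_increase:.1%}")   # 13.0%
print(f"latency ratio: {latency_ratio:.2f}")        # 0.56
```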
[Figure 4.16: plot of cell area against row length (8 to 32) for the resource-driven
and timing-driven configurations.]
Figure 4.16 Cell area for DFS BS SPIHT configurations
[Figure 4.17: plot of clock cycle latency against row length (8 to 32) for the
resource-driven and timing-driven configurations.]
Figure 4.17 Clock cycle latency for DFS BS SPIHT configurations
Figure 4.18 shows the breakdown of the cell area in terms of combinational and
sequential areas for the resource-driven configuration. The figure shows that the
combinational area increases almost as much as the sequential area as the image size
increases. This is because SYNOPSYS synthesizes memory to static
RAM which is implemented using flip-flops and multiplexers in its cell library. A large 
memory bank would require a correspondingly large multiplexer to access the memory 
locations. The cell area would be very much reduced by using a RAM component from 
an IC vendor. 
[Figure 4.18: plot of combinational, sequential and total cell area against row
length.]
Figure 4.18 Breakdown of cell area showing combinational and sequential areas
Figure 4.19 further shows the total cell area for different bit widths of the coefficient 
memory bank. The figure shows that the cell area does not increase significantly for 
larger coefficient bit widths. The increases in the cell areas are due to the additional 
flip-flops which are required for storing the increased precision in the coefficient 
magnitudes. 
[Figure 4.19: plot of cell area against row length for coefficient memory bank widths
of 11 bits, 13 bits and 15 bits.]
Figure 4.19 Cell area for different widths of coefficient memory bank
4.6 Conclusions 
We have presented two approaches for the hardware implementation of the DFS BS 
SPIHT system. The first approach uses a memory bank approach and the second 
approach uses a smart pixel (SP) VLSI approach. We have discussed two 
implementation issues. The first issue is the interfacing of the DWT/IDWT processor 
with the TS/ITS processor. We have looked at the interfacing of the processors for the 
memory bank and SP approaches. For the memory bank approach, the DWT coefficient 
arrangement can be converted to the DFS format by using a look-up table (LUT). The 
number of entries in the LUT is determined by the size of the subtree. For the SP 
approach, a different interfacing scheme is used. Due to the nature of the shifting out of 
the bits in the SP array, the original DFS algorithm is modified so that the DFS is 
performed on the three orientation branches of the subtree in the First Stage processing. 
The second issue is the distribution of the processing and storage requirements 
amongst the DWT/IDWT and TS/ITS processors to achieve an optimal system. To 
reduce storage requirements, the coefficient memory bank is used at different stages of 
processing to store image pixels or wavelet coefficients. One requirement in the system 
is the generation of the MSIG information. The flexibility of the bit stream approach 
enables the generation of the MSIG bits before the encoding process. For the SP 
approach, the MSIG bits can be directly generated inside each pixel. We have also 
discussed the similarity of processing for the TS encoder and decoder. Both the encoder 
and decoder use two stages of processing. The First Stage scans from the bottom to the 
top of the tree whereas the Second Stage scans from the top to the bottom of the tree. 
Although the First Stage is not essential for decoding, it is advantageous to perform the 
First Stage scanning to store the MSIG tree structure in the subtree memory bank to 
simplify the Second Stage decoding. This does not increase the hardware complexity 
significantly because the memory storage is required for encoding. 
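The two stages of processing can be sketched in software as follows. This is a simplified model, not the hardware design: the tree is given as a dictionary of child lists, a single threshold stands in for one coding pass, and the per-node flag `sig` plays the role of the stored significance information. All names are hypothetical.

```python
def first_stage(magnitudes, children, threshold, node=0, sig=None):
    """Bottom-up scan: sig[n] is True if any descendant of n is significant."""
    if sig is None:
        sig = {}
    has_significant_descendant = False
    for child in children.get(node, []):
        first_stage(magnitudes, children, threshold, child, sig)
        if magnitudes[child] >= threshold or sig[child]:
            has_significant_descendant = True
    sig[node] = has_significant_descendant
    return sig


def second_stage(magnitudes, children, threshold, sig, node=0, out=None):
    """Top-down scan: visit a subtree only if it holds a significant node."""
    if out is None:
        out = []
    out.append((node, magnitudes[node] >= threshold))
    if sig[node]:
        for child in children.get(node, []):
            second_stage(magnitudes, children, threshold, sig, child, out)
    return out
```

Because the Second Stage only reads the flags prepared by the First Stage, the same stored structure can serve both the encoder and the decoder in this model.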
Finally, we have presented a memory bank implementation of the DFS BS SPIHT 
system. The similarity of the encoder and decoder structures enables the construction of
a coding system which can be switched to perform encoding and decoding on the same 
device. The flexibility to perform switching between encoding and decoding makes the 
coding system very suitable for "encode-decode" applications such as video 
communicators. The DFS BS SPIHT system can also be applied to "encode-only" 
applications such as digital cameras and satellite imagers. We have performed 
simulations to determine the complexity of the system for different configurations, 
image sizes and coefficient bit widths. The DFS BS SPIHT system is suitable for 
implementation on DSP processors, FPLDs or VLSI. 
Chapter 5 
Applications of the DFS Embedded Wavelet Algorithms 
5.1 Introduction
Features which are desirable in compression algorithms are high coding 
performance, the scalability of the coded bit stream, the robustness of the bit stream 
towards storage or transmission errors, the suitability for content-based coding and
rate-control schemes and low complexity for hardware implementation. Embedded wavelet
(EW) algorithms give the highest coding performances amongst image coding
algorithms in addition to their natural characteristics for bit stream scalability and
rate-control [67]. In the previous chapters, we have discussed the DFS EW algorithms
which have low complexity for hardware implementation while maintaining the natural 
advantages of EW coding. 
In this chapter, we will look at the applications of the DFS EW algorithms for robust 
and content-based coding. The DFS algorithms give high coding performance even 
without arithmetic coding and the reordering of the SAQ. The omissions of arithmetic 
coding and SAQ reordering in the DFS algorithms were intended to simplify the 
hardware implementation. An additional benefit of omitting these modules is that the 
bit stream has robustness towards the occurrence of bit errors. We investigate the 
robustness of the DFS EZW algorithm. The second application we will look at is for 
content-based coding for videoconference and videophone applications. In such 
applications, the normal scene involves a human face occupying the centre of the image 
and a background. Content-based coding is an important part of the MPEG-4 coding 
standard. Each frame of an input video signal is first segmented into a video object
plane (VOP). We assume the location of the face VOP and discuss the implementation
of the content-based image compression using the DFS implementation of the Improved
EZW algorithm.
5.2 Robust Coding 
For image or video transmission over a noisy transmission channel, there are two
kinds of noise if a lossy compression technique is used to compress the image or video
before it is transmitted. The first kind of noise is caused by the compression itself
which only encodes the image to an approximated image to be sent through the channel.
The other kind of noise is due to the occurrence of errors to the information being
transmitted in the noisy channel. Traditionally, these two kinds of noise are handled
separately in the source coding and channel coding stages where the coding in each
stage is optimized according to the corresponding criteria. Another approach is the joint
source and channel coding technique where attempts have been made to obtain an
optimum tradeoff between source coding accuracy and channel error protection under
the constraint of a fixed transmission bandwidth [68].
During the transmission of the image information in the embedded bit stream using
the EZW algorithm, decoding can be stopped at any point and an approximated image
can be reconstructed based on the information received so far. This characteristic can
also be used in the case that if an error has occurred to the bit stream, then decoding can
be terminated and an approximated image can be reconstructed with less than the
expected resolution. In order to minimize the propagation of errors, the embedded bit
stream can be partitioned into a number of independent bit streams. The partitioned bit
streams can be sent through different transmission channels. An error that occurs in one
of the bit streams will not affect the others. The original EZW algorithm will not tolerate
any error in any one of the bit streams and it will have to stop decoding the bit stream
that has been affected by an error.
An error occurring in a bit position of the EZW encoded bit stream will lead to
propagation of errors to other parts of the bit stream. Various methods have been
proposed to overcome the propagation of errors. In the work reported in [26], the
coefficient tree structure is partitioned into independent subtrees and codes for these
subtrees are sent through different channels. In this way, an error occurring in one of
the subtrees will not affect the integrity of the other subtrees. In addition, the proposed
method can decode the information already received prior to the occurrence of the error
to recover as much information as possible. Other methods [27], [29] use similar
strategies to isolate the affected part of the codes and decode those not affected.
In this section, we will discuss the error propagation characteristics of each EZW
module and investigate different ways to modify the original EZW algorithm including
the deletions of some of the modules so as to arrive at modified algorithms that are
more robust to channel errors. We propose modifications to the original algorithm that
will detect serious transmission errors with additional information and will ignore
transmission errors that only cause local effects.
5.2.1 Error Characteristics of the DFS EZW Algorithm
With the arithmetic coder (AC) in the EZW system, an error occurring in any part of
the bit stream will force the AC in the decoder side of the system to go out of
synchronization with the AC in the encoder. The EZW algorithm performs a raster scan
of the coefficients to establish the ancestor-descendant relationship in the coefficient
tree structure. In each pass of the scan, this relationship is represented in the MAP bit
stream. An error occurring in a bit position may corrupt the information about the rest
of the tree. From that point onwards, the decoder will not be able to decode any further
useful information. The decoder is still able to detect the occurrence of an error by the
AC [26]. It may still use information already received to construct a decoded image of
less resolution. In the following discussion, we assume that the arithmetic coder is
optional so as to study the effects of an error on the bit stream being transmitted.
Using the DFS representation, the coefficient tree structure is naturally partitioned
into a number of independent subtrees. These subtrees are encoded separately during
transmission. An error occurring during the transmission of the encoded information in
a pass will only affect the information in a subtree if different physical or logical
channels are used to transmit the information of the subtrees. Similarly, the decoder
will be able to use already received information to construct image information from the
part of the subtree. Normally, an error occurring in a bit position in the SAQ bit stream
will only affect the corresponding coefficient, i.e. the value of the coefficients may be
increased or decreased. In the EZW system, although an error occurring in SAQ will
not affect the decoded structure of the coefficient tree, it will affect more than one
coefficient because of the SAQ reordering. An error to an approximate bit will change
the value of that coefficient. While the encoder is still using the correct value to order
the approximated bits, the decoder will use the erroneous bit to order the decoded
coefficient values in the decoder. As a result, approximated values will be added to the
wrong coefficients.
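The local effect of a bit error in a non-reordered bit stream can be illustrated with a small sketch. It is a simplified fixed-order bit-plane model rather than the SAQ of the EZW algorithm: every coefficient contributes one bit per plane in a fixed position, so a flipped bit can be traced back to exactly one coefficient. The helper names and the sample magnitudes are hypothetical.

```python
def to_bit_planes(magnitudes, num_planes):
    """Split magnitudes into bit planes, most significant plane first."""
    return [[(m >> p) & 1 for m in magnitudes]
            for p in range(num_planes - 1, -1, -1)]


def from_bit_planes(planes):
    """Rebuild magnitudes from bit planes (most significant plane first)."""
    values = [0] * len(planes[0])
    for plane in planes:
        values = [(v << 1) | b for v, b in zip(values, plane)]
    return values


magnitudes = [13, 6, 9, 2]
planes = to_bit_planes(magnitudes, 4)
planes[2][1] ^= 1  # one bit error in a lower-order bit of coefficient 1
decoded = from_bit_planes(planes)
# Indices of the coefficients whose decoded value differs from the original.
changed = [i for i, (a, b) in enumerate(zip(magnitudes, decoded)) if a != b]
```

In this model only coefficient 1 is altered by the flipped bit. Once the bit order depends on previously decoded values, as with SAQ reordering, this one-to-one correspondence no longer holds.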
5.2.2 Robust DFS EZW Models
The underlying modification made to the EZW algorithm to improve its robustness is
to use the DFS to partition the coefficient tree into independent subtrees where the MAP
and SAQ bits can be sent through different physical or logical channels. If arithmetic
coding is performed on the overall bit stream containing both MAP and SAQ, decoding
will stop at or close to the error point for this subtree. However, this error will not
affect the other subtrees. If arithmetic coding is applied only to the MAP and no
reordering is done on the SAQ, then an error occurring in the SAQ part of the bit stream
will only affect the corresponding coefficient. In this case, the bit errors can be
tolerated and decoding does not need to be stopped.
On the other hand, if only the SAQ is arithmetically coded, an error occurring in
either part of the bit stream may lead to termination of decoding. However, we will
discuss the situation that an error to the MAP part may be tolerated if there is no
structural change to the tree structure. If no arithmetic coding is used, then decoding
can continue if the error does not lead to the structural change of the MAP. Table 5.1
summarizes all the cases where an error to a symbol in the MAP changes the symbol to
another symbol and their effects. Of the nine cases, the change of a POS symbol to a
NEG symbol and vice versa will not be detected without the AC. The effect of this
error is to change the sign of the coefficient in the decoder and this change of sign will
lead to noise being introduced into the decoded image. Errors leading to the other cases
can be detected if the length of the MAP is known. Assuming the length of the MAP is
known to the decoder, an error causing another symbol to change to the ZTR symbol
and vice versa would be detected within the current MAP. A more serious case, as far
as error detection is concerned, is that an error causes the change of the symbols POS or
NEG to IZ and vice versa. In this case, the error will not be detected until the decoding
of the next MAP. In order to detect this error effectively, the decoder also needs to
know the length of the SAQ. Table 5.2 summarizes the detection of errors and the
severity if undetected. The use of the AC to detect any error is not listed in the table
except when the error can only be detected by it.
Table 5.1 Change of a MAP symbol to another symbol by a one-bit error
MAP symbol   Corrupted symbol   Effect on the decoder
POS          NEG                The sign of one coefficient is changed to negative
POS          IZ                 The corresponding SAQ is one bit shorter
POS          ZTR                The length of MAP is different
NEG          POS                The sign of one coefficient is changed to positive
NEG          IZ                 The corresponding SAQ is one bit shorter
NEG          ZTR                The length of MAP is different
IZ           POS                The corresponding SAQ becomes a bit longer
IZ           NEG                The corresponding SAQ becomes a bit longer
IZ           ZTR                The length of MAP is different
ZTR          POS, NEG or IZ     The length of MAP is different
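The cases of Table 5.1 follow a simple rule, which the sketch below encodes as a function. The function name and the wording of the returned strings are illustrative only; the classification itself is taken from the table.

```python
SYMBOLS = ("POS", "NEG", "IZ", "ZTR")


def map_error_effect(original, corrupted):
    """Return the effect on the decoder when one MAP symbol turns into another."""
    if original not in SYMBOLS or corrupted not in SYMBOLS:
        raise ValueError("unknown MAP symbol")
    if original == corrupted:
        return "no error"
    if "ZTR" in (original, corrupted):
        # Any change to or from ZTR alters the structure of the MAP.
        return "length of MAP is different"
    if corrupted == "IZ":
        # A significant symbol became IZ: one SAQ bit is no longer expected.
        return "corresponding SAQ is one bit shorter"
    if original == "IZ":
        # IZ became a significant symbol: one extra SAQ bit is expected.
        return "corresponding SAQ is one bit longer"
    return "sign of one coefficient is changed"
```

Ordering the checks from the most to the least destructive case mirrors the discussion above: changes involving ZTR are tested first, then those involving IZ, and the POS/NEG exchange is what remains.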
Table 5.2 Error detection of the DFS EZW
An error to                Detected by                         Severity if undetected
Reordered SAQ              Arithmetic coder only               Most coefficients in a subtree
POS <-> NEG                Arithmetic coder only               The sign of one coefficient
Non-reordered SAQ          Arithmetic coder only               The magnitude of one coefficient
MAP symbol involving IZ    Decoder even without AC, but        Changing the lengths of SAQs in subsequent passes
                           with the lengths of MAP and SAQ     and information on one significant coefficient
MAP symbol involving ZTR   Decoder even without AC, but        Major corruption to the structure of the tree and
                           with the lengths of MAP and SAQ     information on significant coefficients
From Table 5.2, if the approximated bits in the SAQ are reordered, then an AC could
be used to detect any error occurring to it. If arithmetic coding is not used, the SAQ
should not be reordered. The most destructive cases involve the symbol ZTR whether it
is the original symbol or the corrupted symbol. The net effect of this error is that the
information following this error becomes almost random. In this case, decoding of the
corresponding subtree should be stopped. The less destructive cases involve the IZ
symbol. In these cases, the structure of the MAP is not changed, except that the length
of SAQ is one bit different. If the decoder can identify that such an error has occurred,
it can still decode the MAPs in subsequent passes but it has to discard all the future
SAQs. Decoding a MAP can only extract the most significant bits of the newly found
significant coefficients.
5.2.3 Simulation Results
In this section, we perform simulations to investigate the robustness and tolerance of
the DFS EZW algorithm to channel noise. In the simulations, we did not use the AC
and SAQ reordering so as to test the tolerance of the algorithm to the POS <-> NEG
errors and the occurrence of errors to the non-reordered SAQ, i.e. these errors are not
detected. The coefficient tree structure was partitioned into independent subtrees using
the DFS representation. Each subtree is coded independently as in [26] and the
embedded bit stream for each subtree is assumed to be sent through separate physical or
virtual transmission channels. Table 5.3 shows the PSNR obtained when one random
bit error was substituted for each 1000 SAQ symbols for different bit rates (bpp). The
table shows the total number of SAQ symbols and the total number of SAQ bit errors at
different bit rates. As expected from Table 5.2, since each SAQ bit error affects only
one coefficient, the PSNR is not much affected by the SAQ bit errors. Figure 5.1 shows
the Lena image at 1.0 bpp with 23 SAQ bit errors. A PSNR of 34.39 dB was obtained.
Table 5.3 Bit errors in SAQ symbols without error detection
bpp    Number of SAQ   Number of SAQ   PSNR (dB)           PSNR (dB)
       symbols         bit errors      without bit error   with bit error
0.25   5511            4               28.25               28.24
0.5    11227           10              30.89               30.88
1.0    23493           23              34.39               34.39
Figure 5.1 Lena image at 1.0 bpp with 23 SAQ bit errors: PSNR = 34.39 dB
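For reference, the PSNR values quoted in this section are assumed here to follow the usual definition for 8-bit images, 10 log10(255^2 / MSE). The sketch below computes it for two equal-length sequences of pixel values; the function name and the default peak value of 255 are assumptions made for this example.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two pixel sequences."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

Because the measure depends only on the mean squared error, a handful of isolated coefficient errors spread over a 512x512 image changes it very little.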
Similarly, Table 5.4 shows the PSNR obtained when one random bit error was
substituted for each 1000 POS <-> NEG symbols for different bit rates. The random bit
error changed the symbol POS to the symbol NEG and vice versa. The table shows the
total number of POS/NEG symbols and the total number of POS <-> NEG bit errors at
different bit rates. A change of the POS symbol to the NEG symbol and vice versa
changes the sign of the coefficient, so noise will be introduced into the decoded image.
However, the results show that at higher bit rates, the noise in the image due to each
POS <-> NEG bit error is much reduced. Figure 5.2 shows the Lena image at 1.0 bpp
with 22 POS <-> NEG bit errors. A PSNR of 34.26 dB was obtained.
Table 5.4 Bit errors in POS <-> NEG symbols without error detection
bpp    Number of          Number of            PSNR (dB)           PSNR (dB)
       POS/NEG symbols    POS <-> NEG errors   without bit error   with bit error
0.25   4154               3                    28.25               28.18
0.5    9727               10                   30.89               30.61
1.0    24027              22                   34.39               34.26
Figure 5.2 Lena image at 1.0 bpp with 22 POS <-> NEG bit errors: PSNR = 34.26 dB
5.3 Content-Based Coding
In applications where the purpose of the image or video is to provide face-to-face
interaction at different ends of the communication channel, more emphasis should be
placed on the human face in the image and less on the background. In videoconference
and videophone applications, the normal scene involves a human face occupying the
centre of the image and a background. While the background serves the purpose of
showing the context of the overall picture, its details are less important and viewers are
normally less interested in its contents. A complete elimination of the background may
not be acceptable. Methods have been proposed to allocate more resources to the region
of an image which contains the human face of interest [69], [70]. These methods are
based on analyzing the given image and on locating the face region.
In this section, we propose to use the EZW algorithm to accommodate content-based
encoding of different regions of an image. Assuming the information about where the
region of interest in an image is available, the proposed method applies a scaling factor
to the coefficients in the selected region in all the subbands of the resulting wavelet
decomposition. The coefficients are then encoded and transmitted to the other end of
the communication channel. Before the decoded approximated coefficients are inverse
wavelet transformed, they are multiplied by the inverse of the corresponding scaling
factor. The advantages of the proposed method are that the encoded bit stream does not
lose its embedded and scalable features and there will not be serious blocking artifacts
at the boundaries of the scaled up and the background regions. The proposed method
can be extended to video coding schemes which are based on the EZW algorithm.
5.3.1 Content-Based EZW Encoding
Using the EZW algorithm, to place more emphasis on the region of interest is to
increase the significance of the coefficients in that region. The most straightforward
way to increase the significance of the coefficients is to multiply those coefficients with
a scaling factor before the coefficients are subjected to the scanning by the EZW
algorithm. In Figure 5.3(a), once the region is determined in the input image, the region
is then converted into the corresponding regions in all the subbands. Figure 5.3(b)
shows the selected region of the image in all the subbands for a three scale wavelet
decomposition of an image.
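The conversion of an image region into the corresponding subband regions can be sketched as below. The sketch assumes a square image, a dyadic decomposition stored in the common layout where the scale-s subbands occupy the three outer quadrants of the top-left block of side size / 2^(s-1), and it ignores the extra spread caused by the filter length. The subband names and the function `region_in_subbands` are hypothetical.

```python
def region_in_subbands(top, left, height, width, size, scales=3):
    """Map an image rectangle to (name, top, left, height, width) per subband."""
    regions = []
    for s in range(1, scales + 1):
        band = size >> s                           # side length of a scale-s subband
        t, l = top >> s, left >> s                 # region origin inside the subband
        b = (top + height + (1 << s) - 1) >> s     # bottom edge, rounded up
        r = (left + width + (1 << s) - 1) >> s     # right edge, rounded up
        h, w = b - t, r - l
        regions.append((f"HL{s}", t, band + l, h, w))
        regions.append((f"LH{s}", band + t, l, h, w))
        regions.append((f"HH{s}", band + t, band + l, h, w))
    # The coarsest approximation subband shares the geometry of the last scale.
    regions.append((f"LL{scales}", t, l, h, w))
    return regions
```

Each halving of the resolution halves the coordinates of the region, so a three scale decomposition yields ten rectangles: three per scale plus one in the coarsest approximation subband.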
(a) Selected region
(b) Selected region in the subbands
Figure 5.3 Selected region of an image and its position in the subbands
[...] application. The location and size of the face region are then transmitted. In the case of
[...] once the region is located and is tracked by both the encoder and decoder.
[Flowchart: two-dimensional wavelet transform decomposition; scaling of face region
coefficients; encoding of coefficients by the EZW algorithm; transmission; decoding of
symbols by the EZW algorithm; descaling of face region coefficients; inverse
two-dimensional wavelet transform; approximated image]
Figure 5.4 Content-based encoding and decoding of image using the EZW algorithm
The image is then subjected to a two-dimensional wavelet decomposition. After the
coefficients in the selected region are multiplied by the scaling factor, they are
processed by the EZW algorithm as if they were normal coefficients from a wavelet
decomposition. There is no change to the EZW algorithm itself and the algorithm
produces the EZW symbols according to the modified values of the coefficients. At the
decoder end, the EZW symbols are decoded. Before the decoded values of the
coefficients are inverse transformed back into the spatial domain, the coefficients in the
selected region of the image in all the subbands are multiplied by the inverse of the
scaling factor.
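The effect of the scaling and inverse scaling steps can be illustrated with a toy model. The EZW coder is replaced here by a crude truncation to multiples of a step size, which stands in for stopping the embedded decoder early; this substitution, the function names and the sample values are assumptions made for the example.

```python
def truncate(value, step):
    """Keep only whole multiples of step (stand-in for early decoder stop)."""
    sign = -1 if value < 0 else 1
    return sign * (abs(value) // step) * step


def code_with_region(coeffs, mask, factor, step):
    """Scale the selected coefficients, truncate all, then undo the scaling."""
    scaled = [c * factor if m else c for c, m in zip(coeffs, mask)]
    decoded = [truncate(c, step) for c in scaled]
    return [c / factor if m else c for c, m in zip(decoded, mask)]
```

With `coeffs = [13, 13, -7]`, `mask = [True, False, True]`, a factor of 4 and a step of 16, the selected coefficients are recovered as 12.0 and -4.0 while the unselected coefficient of the same original magnitude is lost entirely, which is the intended shift of resources towards the selected region.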
In the scheme shown in Figure 5.4, due to the fact that there has been an artificial
increase in the significance of the coefficients in the selected region, there may be an
increase in the number of ZTR symbols in the first few passes of the EZW algorithm.
The number of these introduced ZTR symbols depends on the size of the scaling factor.
Let the maximum value of the magnitudes of the coefficients in the non-selected region
be $C_{\max}$, the maximum value of the magnitudes of the original coefficients in the
selected region be $C'_{\max}$ and the scaling factor be $S$. After the coefficients in the
selected region are multiplied by the scaling factor $S$, the maximum value is
$S \cdot C'_{\max}$. If $S \cdot C'_{\max} > C_{\max}$, then the threshold value of the EZW
algorithm is equal to $S \cdot C'_{\max} / 2^N$ where $N$ is the $N$th pass of the
coefficients. Let $T = S \cdot C'_{\max} / 2^N$. During the $N$th pass, if $T > C_{\max}$,
then all the subtrees belonging to the non-selected region have no significant
coefficients and each of these subtrees is encoded by a ZTR symbol. In general, the
number of ZTR symbols is increased if $S \cdot C'_{\max} > C_{\max}$.
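The threshold argument can be checked numerically. The sketch below counts the initial passes in which the threshold T = S * C'max / 2^N still exceeds Cmax, i.e. the passes in which every subtree of the non-selected region is coded as ZTR. The function name and the sample values are hypothetical.

```python
def passes_with_only_ztr(s, c_sel_max, c_bg_max):
    """Count passes N = 1, 2, ... whose threshold S*C'max / 2**N exceeds Cmax."""
    if s <= 0 or c_sel_max <= 0 or c_bg_max <= 0:
        raise ValueError("all arguments must be positive")
    count = 0
    n = 1
    while s * c_sel_max / 2 ** n > c_bg_max:
        count += 1
        n += 1
    return count
```

For example, with C'max = 100 and Cmax = 120, a scaling factor of 16 gives thresholds 800, 400 and 200 above Cmax, so the first three passes spend ZTR symbols on the non-selected region, whereas a scaling factor of 1 gives none.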
5.3.2 DFS BS EZW Content-Based Implementation 
Figure 5.5(a) shows a three-dimensional representation of the wavelet coefficients in
sign-magnitude binary format. The sign bits of the coefficients are processed separately
according to the EZW algorithm. In this binary representation of the magnitudes of the
coefficients, each bit plane of the subtrees is scanned in successive passes according to
the bit stream (BS) version of the DFS EZW algorithm starting from the most
significant bit plane. If the scaling factor is a power of two, the multiplying of the
selected area in all the subbands by the scaling factor is equivalent to shifting the
subtrees forward by the base-2 logarithm of the scaling factor.
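The equivalence between a power-of-two scaling factor and a bit-plane shift can be written down directly. The helper below is a minimal sketch with a hypothetical name; it operates on a list of magnitudes, the signs being handled separately as described above.

```python
def shift_subtree(magnitudes, scale):
    """Multiply magnitudes by a power-of-two scale using a left shift."""
    shift = scale.bit_length() - 1          # base-2 logarithm of the scale
    if scale <= 0 or (1 << shift) != scale:
        raise ValueError("scale must be a positive power of two")
    return [m << shift for m in magnitudes]
```

A scaling factor of 4 therefore moves every bit of the selected subtrees forward by two bit planes, and no multiplier is needed.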
(a) Subtree bit planes
(b) Shifted bit planes
Figure 5.5 Three-dimensional representation of sign-magnitude binary format
Shown in Figure 5.5(b) is the case where the subtrees for a selected region of the
image have been shifted forward. Scanning of the subtrees starts from the outmost bit
plane where only the bits for the selected region are present. This arrangement avoids
the generation of the ZTR symbols as discussed in Section 5.3.1 in the case of the
general EZW algorithm. The other feature of the DFS BS EZW implementation of
content-based image encoding is that the shifting forward of the subtrees can be
performed even after the scanning of the subtrees has started. This occurs in
applications where the user at the decoder end of the transmission may request more 
information for a selected region to be transmitted. Using the DFS BS EZW 
implementation, the request is translated to just shifting forward of those subtrees 
corresponding to the selected region in a new pass of the next bit plane. 
5.3.3 Simulation Results 
In this section, the DFS BS implementation of the Improved EZW algorithm is used 
in simulating the performance of the content-based encoding of an image on the 
512x512 Lena image. Using the face region selected by a rectangle as shown in Figure
5.3(a), Figure 5.6 shows the resulting images using scaling factors as shown in Table
5.5 together with the overall PSNRs. Decoding of the image stopped at the bit rate of
0.1 bpp. As seen from the figures, the decoded image without content-based encoding
loses sharpness more or less uniformly throughout the image. 
In the decoded images with scaling factors greater than one, the sharpness of the face
region improves with increasing scaling factor, while the sharpness of the background
and the overall PSNR deteriorate. Although
an abrupt window is used to select the face region, there is no obvious blocking effect at 
the boundary between the selected region and the non-selected region. A 'blocking' 
effect becomes obvious when the selected region is much clearer than the non-selected 
region. 
Table 5.5 Overall PSNRs of content-based encoded Lena images at 0.1 bpp
Scaling factor   1       4       16
PSNR (dB)        28.48   26.55   23.51
(a) Scaling factor is 1 at 0.1 bpp
(b) Scaling factor is 4 at 0.1 bpp
(c) Scaling factor is 16 at 0.1 bpp
Figure 5.6 Content-based decoded Lena images at 0.1 bpp
To show the performance of the proposed method at very low bit rates, decoding of
the image stopped at 0.05 bpp. Figure 5.7 shows the resulting images and Table 5.6 the
corresponding PSNRs versus the scaling factor. Without content-based encoding, every
part of the image is equally affected by the low bit rates used, especially the face region
of the image as shown in Figure 5.7(a). The sharpness of the face region is improved by
using a scaling factor greater than one as shown in Figure 5.7(b) and Figure 5.7(c) with
scaling factors equal to 4 and 16 respectively. The face region is even sharper in Figure
5.7(c) than that in Figure 5.6(a) where there is no content-based encoding with double
the bit rate. From these simulations, a scaling factor of 16 increases the contrast of
sharpness between the selected region and the background where it becomes almost
unrecognizable.
Table 5.6 Overall PSNRs of content-based encoded Lena images at 0.05 bpp
Scaling factor   1       4       16
PSNR (dB)        26.32   24.78   21.39
(a) Scaling factor is 1 at 0.05 bpp
(b) Scaling factor is 4 at 0.05 bpp
(c) Scaling factor is 16 at 0.05 bpp
Figure 5.7 Content-based decoded Lena images at 0.05 bpp
5.4 Conclusions
In this chapter, we have discussed two applications for the DFS EW algorithms. The
first application we have looked at is for robust coding. We have looked into the error
characteristics of the EZW algorithm. The original EZW algorithm using arithmetic
coding can be used to detect all the errors to the transmitted bit stream. Decoding will
stop and information already received by the decoder can be used to reconstruct an
approximated image with less resolution. By partitioning the coefficient tree structure
into independent subtrees using the DFS representation, the effect of an error occurring
in a subtree is confined to that subtree only. The original EZW algorithm does not
tolerate any error in the bit stream because of the AC. We have listed three other
different cases to apply the AC to part of the bit stream. Each of the cases offers
different error detection capability and error tolerance.
We have also looked into the severity of error occurrence to different parts of the bit
stream and have found that the reordering of the approximate bits in SAQ will lead to
more error propagation if not detected. For the MAP, we have analyzed the different
cases when an error occurs to each of the four MAP symbols. It has been suggested that
decoding does not need to be stopped if the error does not lead to structural change to
the transmitted information about the subtrees. We have found that the effects of bit
errors range from affecting only a fraction of the magnitude of a coefficient, the sign of
a coefficient, the length of the SAQ only, and the most serious case, when both the
lengths of the MAP and SAQ are affected.
Simulations have been performed to look at the actual behaviours of the various
modifications to the EZW algorithm. The first modification was the DFS partitioning
with no AC and no SAQ reordering. This modified configuration has two
characteristics. It detects the major errors that will lead to error propagation and
tolerates those errors that will only cause localised errors. However, without the AC,
we need to assume that the lengths of both the MAP and SAQ will be available to the
decoder and they are not subject to errors. Further analysis is required to decide if this
information should be incorporated into the source coding or be provided by the channel
coding. The use of this additional information on the lengths of MAP and SAQ may
enable us to identify the errors involving the IZ and ZTR symbols.
The second application we have looked at is for content-based coding. We have
presented a new method to implement the approach where a selected region of an image
is to be encoded with more resources. The method is based on the DFS representation
of the EZW algorithm and makes use of the bit stream implementation. The approach is
modular in the sense that the additional feature of content-based encoding and decoding
is performed by a separate function while the modules of the original DFS BS EZW
structure are unchanged. The DFS BS subtree representation of the wavelet coefficients
also makes it possible to implement scaling factors which are powers of two by
simply shifting the subtrees to increase the significance of the corresponding
coefficients at the encoder. At the decoder, the subtrees are shifted in the reverse
direction before the coefficients are inverse wavelet transformed. This simple shifting
of the subtrees is useful for interactive selection of regions in an image to be given more
resources even when some of the bit planes have been transmitted. The other advantage
of the proposed method is its adaptability in that the sharpness of the background as
well as the selected region can be improved with higher bit rates which can be
determined during encoding or decoding.
Simulation results have confirmed that the proposed method gives a much clearer 
image in the selected region at the expense of losing sharpness in the background region
as well as a drop in the overall PSNR. However, the method does not produce any 
obvious blocking artifacts apart from an increase in contrast of the sharpness between 
the selected region and the background with a large scaling factor. The contrast in 
sharpness can be improved by using multiple regions around the target region with 
progressively decreasing scaling factors. 
Chapter 6 
Conclusions and Future Work 
6.1 Conclusions
We began this thesis by asking the question whether a single coding algorithm can 
be designed to meet the diverse requirements for image communication over wireless 
transmission channels. Embedded wavelet (EW) algorithms show much potential to 
meet the requirements of high coding performance, bit stream scalability and robustness 
to transmission errors. Other desirable features are low complexity architectures and 
suitability for content-based coding schemes. The barrier impeding the potential of EW 
algorithms are tree searching (TS) schemes which can be efficiently implemented in 
hardware. There are two approaches for TS architectures: iterative or noniterative. The 
noniterative architectures suffer from misprediction. The complexity in the 
implementation of iterative TS arch itectures i s  that i t  requires keeping track of the 
significance of the descendant nodes. 
I n  this work, we focused on iterative TS schemes and used a top-down design 
methodology beginning from the investigatio n  of the original EW algorithms. We 
looked at variations in EW algorithms and i demified four variations i n  which the 
algorithms differ from one another. The algorithms differ in  the tree structure, the 
search strategy, the coding scheme and the output scheme. One search strategy is the 
depth-first search (DFS). The DFS gives several advantages for hardware 
implementation. The DFS algorithms have low storage requirements, simple 
computations and allows for efficient addressing of the nodes in the tree structure. We 
117 
performed simulations to compare the DFS variants with their original algorithms and 
found that the coding performance is only slightly lower than the original algorithms. 
Furthermore, the performance of the DFS SPIHT variant without arithmetic coding is 
almost comparable to the performance of the complete EZW algorithm with arithmetic 
coding. A comparison with the WVQ algorithm [33] shows that the performance of the 
DFS SPIHT algorithm is higher than the WVQ algorithm at high bit rates and is 
comparable to the WVQ algorithm at lower bit rates. 
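The addressing and storage advantages of the DFS can be illustrated with a minimal Python sketch. It is an illustration only and not the architecture developed in this thesis: it assumes the common dyadic parent-child rule for a square coefficient array, and the names `children` and `dfs_order` are hypothetical.

```python
def children(r, c, size):
    """Children of node (r, c) in a size x size coefficient array.

    Assumption: the usual dyadic rule (2r, 2c) ... (2r+1, 2c+1). The LL root
    (0, 0) and finest-scale nodes are treated as having no children here.
    """
    if (r, c) == (0, 0) or 2 * r >= size or 2 * c >= size:
        return []
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def dfs_order(root, size):
    """Visit one subtree depth-first; return the visit order and peak stack size."""
    order, stack, peak = [], [root], 1
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(children(*node, size)))
        peak = max(peak, len(stack))
    return order, peak


order, peak = dfs_order((0, 1), 8)
print(len(order), peak)
```

Child addresses follow from the parent address by a shift and an increment, and the pending-node stack stays much smaller than the subtree it traverses.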
At the architecture level, the DFS allows for novel bit stream (BS) TS architectures 
where the magnitude bits are not stored and are processed as they flow through the 
architecture, resulting in fast architectures with minimal storage and low complexity. 
Two stages are required in the processing. In the First Stage, the coefficient tree is 
scanned from the bottom to the top of the tree and the significance information is 
propagated from the leaves to the roots. The Second Stage scans the tree from the top 
to the bottom of the tree and performs the coding. The DFS performs a natural 
partitioning of the coefficient tree into subtrees which can be processed in parallel for 
faster processing or to meet a target bit rate. Parallel processing can also be used to 
lower the clock frequency to reduce the overall power consumption of the architecture. 
Another advantage of the parallel DFS architecture is that it provides options to output 
in different formats to increase coding performance. 
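The two-stage processing described above can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the bit stream architecture itself: it works on stored integer coefficients for a single threshold, the output tuples stand in for the actual EZW or SPIHT symbols, and the names `children`, `propagate` and `code_pass` are hypothetical.

```python
def children(r, c, size):
    """Dyadic children of (r, c); assumed rule, LL root (0, 0) excluded."""
    if (r, c) == (0, 0) or 2 * r >= size or 2 * c >= size:
        return []
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def propagate(coeffs, node, desc_max):
    """First stage: bottom-up pass.

    desc_max[node] becomes the largest magnitude among all descendants,
    so significance information moves from the leaves towards the root.
    """
    best = 0
    for ch in children(*node, len(coeffs)):
        propagate(coeffs, ch, desc_max)
        best = max(best, abs(coeffs[ch[0]][ch[1]]), desc_max[ch])
    desc_max[node] = best


def code_pass(coeffs, node, desc_max, threshold, out):
    """Second stage: top-down pass.

    Emits (node, coefficient significant, descendants significant) and
    skips any subtree whose descendants are all insignificant.
    """
    r, c = node
    sig = int(abs(coeffs[r][c]) >= threshold)
    dsig = int(desc_max[node] >= threshold)
    out.append((node, sig, dsig))
    if dsig:
        for ch in children(r, c, len(coeffs)):
            code_pass(coeffs, ch, desc_max, threshold, out)


coeffs = [[0, 3, 1, 9],
          [0, 0, 0, 2],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
desc_max, symbols = {}, []
propagate(coeffs, (0, 1), desc_max)
code_pass(coeffs, (0, 1), desc_max, 8, symbols)
print(symbols)
```

Because each subtree is processed independently of the others, the same two passes could be run on several subtrees at once, which is the partitioning referred to above.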
The DWT/IDWT and TS/ITS processors can be combined to form a single 
architectural structure for implementation. The processing and storage requirements are 
distributed amongst the processors to achieve an optimal system. The similarity of the 
encoder and decoder structures enables the construction of a coding system which can be 
switched to perform encoding and decoding on the same device. We have performed 
simulations to determine the complexity of the system for different configurations, 
image sizes and coefficient bit widths. The DFS BS SPIHT system is suitable for 
implementation on DSP processors, field programmable logic devices or VLSI and is a 
promising solution for real-time video coding applications in general and for portable 
video communicators in particular. 
We have looked at two applications for the DFS EW algorithms. The DFS 
algorithms gain robustness to transmission errors by not performing SAQ reordering 
and arithmetic coding. The DFS algorithms can accommodate content-based coding by 
shifting the subtree bitplanes before the encoding process. 
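The bitplane shifting used for content-based coding can be sketched as below. This is a minimal illustration assuming integer coefficients and a known list of node positions belonging to the target subtree; the shift amount, the node list and the function names `shift_subtree` and `unshift_subtree` are assumptions made for the example.

```python
def shift_subtree(coeffs, nodes, shift):
    """Return a copy of coeffs with the listed nodes moved up by `shift` bitplanes.

    The magnitude is left-shifted and the sign kept, so these coefficients
    become significant at earlier (higher) bitplanes during encoding.
    """
    out = [row[:] for row in coeffs]
    for r, c in nodes:
        mag = abs(out[r][c]) << shift
        out[r][c] = mag if out[r][c] >= 0 else -mag
    return out


def unshift_subtree(coeffs, nodes, shift):
    """Decoder-side inverse: move the listed nodes back down by `shift` bitplanes."""
    out = [row[:] for row in coeffs]
    for r, c in nodes:
        mag = abs(out[r][c]) >> shift
        out[r][c] = mag if out[r][c] >= 0 else -mag
    return out


coeffs = [[5, -3],
          [2, 7]]
region = [(0, 1), (1, 1)]          # hypothetical target-region nodes
boosted = shift_subtree(coeffs, region, 2)
print(boosted)
print(unshift_subtree(boosted, region, 2) == coeffs)
```

In an embedded bit stream the shifted coefficients are reached first, so truncating the stream early costs the target region less than the background.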
6.2 Future Work 
The work presented in this thesis can be extended in the following ways. Firstly, the 
DFS algorithms and architectures can be extended to accommodate motion 
compensation schemes for coding of colour video images [71] using newer biorthogonal 
wavelet filters [72]. We have performed initial work on applying a winner-take-all 
artificial neural network for motion compensation schemes for the DFS algorithms [9]. 
Initial simulations show the potential of this approach. Secondly, the DFS EW 
algorithms with their corresponding bit stream architectures form a good foundation for 
video communicators. The robustness and content-based coding schemes could form 
the front-end and the back-end of the DFS EW system as shown in Figure 6.1. The 
front-end module performs the image segmentation for the system. The back-end error 
protection module protects the MAP bits for transmission. The SAQ bits do not have to 
be protected for transmission. 
[Block diagram: Image Segmentation; DFS BS Embedded Wavelet System; MAP bits; Error Protection; SAQ bits] 
Figure 6.1 DFS BS EW system with image segmentation and error protection modules 
Thirdly, the DFS traversal scheme can be extended to accommodate the searching of 
non-regular tree structures such as those resulting from decomposition using the discrete 
wavelet packet transform. An EW algorithm which uses non-regular tree structures is 
the Space-Frequency Quantization (SFQ) algorithm [18]. Fourthly, the work in this 
thesis can be improved using techniques used in the Embedded Block Coding with 
Optimized Truncation (EBCOT) algorithm [73] which is used in the JPEG2000 
standard. The EBCOT algorithm adds the feature of spatial scalability to EW 
algorithms. 
Bibliography 
[1] K. Eshraghian, S. Lachowicz, G. Alagoda and L. Ang, "Architectural mapping 
for multimedia smart-pixel arrays," in Proc. IEEE Int. Workshop Design, Test and 
Applications, Dubrovnik, Croatia, pp. 33-35, Jun. 1998. 
[2] L. Ang, H. N. Cheung and K. Eshraghian, "VLSI architecture for significance 
map coding of embedded zerotree wavelet coefficients," in Proc. IEEE Asia 
Pacific Conf. Circuits and Systems, Chiangmai, Thailand, pp. 627-630, Nov. 
1998. 
[3] L. Ang, H. N. Cheung and K. Eshraghian, "VLSI architecture for embedded 
zerotree wavelet coding," in Proc. 5th Int. Symp. DSP for Communication Systems, 
Perth, Australia, pp. 128-133, Feb. 1999. 
[4] L. Ang, H. N. Cheung and K. Eshraghian, "VLSI decoder architecture for 
embedded zerotree wavelet algorithm," in Proc. IEEE Int. Symp. Circuits and 
Systems, Orlando, U.S.A, pp. 141-144, May 1999. 
[5] L. Ang, H. N. Cheung and K. Eshraghian, "EZW algorithm using depth-first 
representation of the wavelet zerotree," in Proc. 5th Int. Symp. Signal Processing 
and its Applications, Brisbane, Australia, pp. 75-78, Aug. 1999. 
[6] L. Ang, H. N. Cheung and K. Eshraghian, "Robust image compression using the 
depth-first search on the wavelet zerotree," in Proc. 5th Int. Symp. Signal 
Processing and its Applications, Brisbane, Australia, pp. 797-800, Aug. 1999. 
[7] H. N. Cheung, G. Alagoda, K. Eshraghian and L. Ang, "Smart-pixel VLSI 
architecture for embedded zerotree wavelet coding," in Proc. 5th Int. Symp. Signal 
Processing and its Applications, Brisbane, Australia, pp. 693-696, Aug. 1999. 
[8] L. Ang, H. N. Cheung and K. Eshraghian, "VLSI architecture for very high 
resolution scalable video coding using the virtual zerotree," in Proc. IEEE 
Workshop Signal Processing Systems, Taipei, Taiwan, Oct. 1999. 
[9] L. Ang, H. N. Cheung and K. Eshraghian, "Comparison of winner-take-all 
motion compensation schemes for embedded wavelet coding," in Proc. 6th Int. 
Conf. Neural Information Processing, Perth, Australia, pp. 390-394, Nov. 1999. 
[10] H. N. Cheung, L. Ang and G. Alagoda, "Bit stream architecture for the 
implementation of the improved embedded zerotree wavelet algorithm," in Proc. 
Second Int. Conf. Information, Communications & Signal Processing, Singapore, 
Dec. 1999. 
[11] H. N. Cheung and L. Ang, "Analysis of embedded zerotree wavelet algorithms for 
robust image compression," in Proc. Second Int. Conf. Information, 
Communications & Signal Processing, Singapore, Dec. 1999. 
[12] H. N. Cheung, L. Ang and K. Eshraghian, "Embedded zerotree wavelet processor 
for mobile video communicator," in Proc. IEEE Int. Symp. Intelligent Signal 
Processing and Communication Systems, Phuket, Thailand, Dec. 1999. 
[13] H. N. Cheung, L. Ang and K. Eshraghian, "Parallel architecture for the 
implementation of the embedded zerotree wavelet algorithm," in Proc. 5th 
Australasian Computer Architecture Conf., Canberra, Australia, Jan. 2000. 
[14] L. Ang, H. N. Cheung and K. Eshraghian, "A dataflow-oriented VLSI architecture 
for a modified SPIHT algorithm using depth-first search bit stream processing," in 
Proc. IEEE Int. Symp. Circuits and Systems, Geneva, Switzerland, pp. 291-294, 
May 2000. 
[15] H. N. Cheung and L. Ang, "Application of the EZW algorithm to content-based 
image compression," to be presented at IEEE Int. Symp. Intelligent Signal 
Processing and Communication Systems, Honolulu, Hawaii, Nov. 2000. 
[16] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," 
IEEE Trans. Signal Processing, vol. 41, pp. 3445-3462, Dec. 1993. 
[17] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set 
partitioning in hierarchical trees," IEEE Trans. Circuits and Systems for Video 
Technology, vol. 6, pp. 243-250, Jun. 1996. 
[18] K. Ramchandran, Z. Xiong and M. T. Orchard, "Space-frequency quantization for 
wavelet image coding," IEEE Trans. Image Processing, vol. 6, no. 5, pp. 677-693, 
May 1997. 
[19] J. M. Zhong, C. H. Leung and Y. Y. Tang, "An improved embedded zerotree 
wavelet image compression algorithm based on significance checking in wavelet 
trees," in Proc. IEEE Int. Conf. Systems, Man, and Cybernetics, pp. 4567-4571, 
Oct. 1998. 
[20] S. A. Martucci, I. Sodagar, T. Chiang and Y. Zhang, "A zerotree wavelet video 
coder," IEEE Trans. Circuits and Systems for Video Technology, vol. 7, pp. 109-118, 
Feb. 1997. 
[21] Q. Wang and M. Ghanbari, "Scalable coding of very high resolution video using 
the virtual zerotree," IEEE Trans. Circuits and Systems for Video Technology, 
vol. 7, pp. 719-727, Oct. 1997. 
[22] I. Sodagar, H.-J. Lee, P. Hatrack and B.-B. Chai, "Multi-scale zerotree entropy 
coding," in Proc. IEEE Int. Symp. Circuits and Systems, Geneva, Switzerland, pp. 
311-314, May 2000. 
[23] P. N. Topiwala (Ed.), Wavelet image and video compression, Kluwer Academic 
Publishers, Jul. 1998. 
[24] G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, 
Oct. 1996. 
[25] M. Vetterli, Wavelets and subband coding, Prentice Hall, Apr. 1995. 
[26] C. D. Creusere, "A new method of robust image compression based on the 
embedded zerotree wavelet algorithm," IEEE Trans. Image Processing, vol. 6, pp. 
1436-1442, Oct. 1997. 
[27] H. Man, F. Kossentini and M. J. T. Smith, "A class of EZW image coders for 
noisy channels," in Proc. Int. Conf. Image Processing, vol. 3, pp. 90-93, Oct. 
1997. 
[28] H. Man, F. Kossentini and M. J. Smith, "A channel error robust EZW image 
coding technique," in Proc. 2nd Erlangen Symposium on Advances in Digital 
Image Communication, Apr. 1997. 
[29] P. G. Sherwood and K. Zeger, "Progressive image coding for noisy channels," 
IEEE Signal Processing Letters, vol. 4, pp. 189-191, Jul. 1997. 
[30] P. G. Sherwood and K. Zeger, "Progressive image coding on noisy channels," in 
Proc. Data Compression Conf., pp. 72-81, Mar. 1997. 
[31] T.-C. Yang, S. Kumar and C.-C. J. Kuo, "Low-overhead error-resilient bit-plane 
image coding," in Proc. IEEE Int. Symp. Circuits and Systems, Orlando, U.S.A, 
pp. 50-53, May 1999. 
[32] Y. Q. Shi and H. Sun, Image and video compression for multimedia engineering, 
CRC Press, Dec. 1999. 
[33] S.-K. Paek and L.-S. Kim, "A real-time wavelet vector quantization algorithm 
and its VLSI architecture," IEEE Trans. Circuits and Systems for Video 
Technology, vol. 10, no. 3, pp. 475-489, Apr. 2000. 
[34] J. Bae and V. K. Prasanna, "A fast and area-efficient VLSI architecture for 
embedded image coding," in Proc. Int. Conf. Image Processing, vol. 3, pp. 452-455, 
Oct. 1995. 
[35] M. Schwarzenberg, M. Traeber, M. Scholles and R. Schuffny, "A VLSI chip for 
wavelet image compression," in Proc. IEEE Int. Symp. Circuits and Systems, 
Orlando, pp. 271-274, May 1999. 
[36] R. Y. Omaki, G. Fujita, T. Onoye and I. Shirakawa, "Architecture of embedded 
zerotree wavelet based real-time video coder," in Proc. IEEE Int. Conf. 
ASIC/SOC, pp. 137-141, Sep. 1999. 
[37] J. Singh, A. Antoniou and D. J. Shpak, "Hardware implementation of a wavelet 
based image compression coder," in Proc. IEEE Symp. Advances in Digital 
Filtering and Signal Processing, pp. 169-173, Jun. 1998. 
[38] J. Vega-Pineda, M. A. Suriano, V. M. Villalva, S. D. Cabrera and Y.-C. Chang, 
"A VLSI array processor with embedded scalability for hierarchical image 
compression," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 4, pp. 168-171, 
May 1996. 
[39] V. Bhaskaran and K. Konstantinides, "Image and video compression standards: 
algorithms and architectures," Kluwer Academic Publishers, Jun. 1997. 
[40] W.-K. Lin and N. Burgess, "Listless zerotree coding for color images," 
in Proc. 32nd Asilomar Conf. Signals, Systems and Computers, CA, USA, 
Nov. 1998. 
[41] W.-K. Lin and N. Burgess, "Low memory color image zerotree coding," in Proc. 
Information, Decision and Control, pp. 91-95, Adelaide, Australia, Feb. 1999. 
[42] W.-K. Lin, B. W.-H. Ng, N. Burgess and A. Bouzerdoum, "Reduced memory 
zerotree coding algorithm for hardware implementation," in Proc. IEEE Int. Conf. 
Multimedia Computing and System, Florence, Italy, Jun. 1999. 
[43] C.-Y. Su and B.-F. Wu, "Image coding based on embedded recursive zerotree," in 
Proc. Int. Sym. Multimedia Information Processing, Taipei, Taiwan, Dec. 1997. 
[44] W.-K. Lin and N. Burgess, "A low memory video compression algorithm for 
hardware implementation," in Proc. Int. Workshop Multimedia Signal Processing, 
Copenhagen, Denmark, Sep. 1999. 
[45] W.-K. Lin and N. Burgess, "3D listless zerotree coding for low bit rate video," in 
Proc. Int. Conf. Image Processing, Kobe, Japan, Oct. 1999. 
[46] E. H. Adelson, E. P. Simoncelli and R. Hingorani, "Orthogonal pyramid 
transforms for image coding," in Proc. SPIE, vol. 845, Oct. 1987, pp. 50-58. 
[47] I. H. Witten, R. Neal and J. G. Cleary, "Arithmetic coding for data compression," 
Comm. ACM, vol. 30, pp. 520-540, Jun. 1987. 
[48] V. R. Algazi and R. R. Estes, "Analysis based coding of image transform and 
subband coefficients," in Proc. SPIE, vol. 2564, 1995, pp. 11-21. 
[49] M. Antonini, M. Barlaud, P. Mathieu and I. Daubechies, "Image coding using 
wavelet transform," IEEE Trans. Image Processing, vol. 1, pp. 205-220, Apr. 
1992. 
[50] K. K. Parhi and T. Nishitani, "VLSI architectures for discrete wavelet 
transforms," IEEE Trans. VLSI Systems, pp. 191-202, Jun. 1993. 
[51] A. Grzeszczak, M. K. Mandal and S. Panchanathan, "VLSI implementation of 
discrete wavelet transform," IEEE Trans. VLSI Systems, vol. 4, pp. 421-439, Dec. 
1996. 
[52] M. Vishwanath, R. M. Owens and M. J. Irwin, "VLSI architectures for the 
discrete wavelet transform," IEEE Trans. Circuits and Systems, vol. 42, pp. 305-316, 
May 1995. 
[53] C. Yu and S. J. Chen, "Design of an efficient VLSI architecture for 2-D discrete 
wavelet transforms," IEEE Trans. Consumer Electronic, vol. 45, pp. 135-140, Feb. 
1999. 
[54] C. Chakrabarti, "A DWT-based encoder architecture for symmetrically extended 
images," in Proc. IEEE Int. Symp. Circuits and Systems, Orlando, U.S.A, pp. 123-126, 
May 1999. 
[55] J. Ritter and P. Molitor, "A partitioned wavelet-based approach for image 
compression using FPGA's," in Proc. IEEE Custom Integrated Circuits Conf., 
Orlando, pp. 547-550, May 2000. 
[56] K. Haapala, P. Kolinummi, T. Hamalainen and J. Saarinen, "Parallel DSP 
implementation of wavelet transform in image compression," in Proc. IEEE Int. 
Symp. Circuits and Systems, Geneva, pp. 89-92, May 2000. 
[57] G. K. Wallace, "The JPEG Still Picture Compression Standard," Comm. ACM, 
vol. 34, pp. 30-44, Apr. 1991. 
[58] J. L. Mitchell, W. B. Pennebaker, C. E. Fogg and D. J. LeGall, MPEG video 
compression standard, Chapman and Hall, 1996. 
[59] A. M. Rassau, K. Eshraghian, T. C. B. Yu, H. N. Cheung, S. W. Lachowicz, W. A. 
Crossland and T. D. Wilkinson, "Smart pixel implementation of a 2-D parallel 
nucleic wavelet transform for mobile multimedia communications," in Proc. 
Design, Automation and Test Conf., pp. 191-195, Feb. 1998. 
[60] P. J. Ashenden, The designer's guide to VHDL, Morgan Kaufmann Publishers, 
Oct. 1999. 
[61] S. Sjoholm and L. Lindh, VHDL for designers, Prentice Hall, Dec. 1996. 
[62] M. J. Smith, Application-specific integrated circuits, Addison Wesley Longman, 
Nov. 1996. 
[63] D. Knapp, Behavioral synthesis: digital system design using the Synopsys 
Behavioral Compiler, Prentice Hall, May 1996. 
[64] P. Kurup and T. Abbasi, Logic synthesis using Synopsys, Kluwer Academic 
Publishers, Jan. 1995. 
[65] E. H. Adelson and E. P. Simoncelli, "Subband image coding with three-tap 
pyramids," in Proc. Picture Coding Symposium, 1990. 
[66] MTC-22000 CMOS 0.7µ standard cell family, Alcatel Microelectronics. 
[67] P.-Y. Cheng, J. Li and C.-C. J. Kuo, "Rate control for an embedded wavelet 
video coder," IEEE Trans. Circuits and Systems for Video Technology, vol. 7, pp. 
696-702, Aug. 1997. 
[68] M. J. Ruf and J. W. Modestino, "Operational rate-distortion performance for joint 
source and channel coding of images," IEEE Trans. Image Processing, vol. 8, pp. 
305-320, Mar. 1999. 
[69] D. Chai and K. N. Ngan, "Face segmentation using skin color map in videophone 
applications," IEEE Trans. Circuits and Systems for Video Technology, vol. 9, pp. 
551-564, Jun. 1999. 
[70] D. Chai and K. N. Ngan, "Content-based bit allocation and rate control for 
classical MC-DCT video coding systems," in Proc. IEEE Int. Workshop 
Intelligent Signal Processing and Communication Systems, Melbourne, Australia, 
vol. 2, pp. 601-605, Nov. 1998. 
[71] Y.-Q. Zhang and S. Zafar, "Motion-compensated wavelet transform coding for 
color video compression," IEEE Trans. Circuits and Systems for Video 
Technology, vol. 2, pp. 285-296, Sep. 1992. 
[72] H. J. Kim and C. C. Li, "Lossless and lossy image compression using 
biorthogonal wavelet transforms with multiplierless operations," IEEE Trans. 
Circuits and Systems: Analog and Digital Signal Processing, vol. 45, pp. 1113-1118, 
Aug. 1998. 
[73] D. Taubman, "High performance scalable image compression with EBCOT," 
IEEE Trans. Image Processing, vol. 9, no. 7, pp. 1158-1170, Jul. 2000. 
