Pipelined implementation of Jpeg image compression using Hdl by Toomu, Arun Kumar Reddy
UNLV Retrospective Theses & Dissertations 
1-1-2008 
Pipelined implementation of Jpeg image compression using Hdl 
Arun Kumar Reddy Toomu 
University of Nevada, Las Vegas 
Repository Citation 
Toomu, Arun Kumar Reddy, "Pipelined implementation of Jpeg image compression using Hdl" (2008). 
UNLV Retrospective Theses & Dissertations. 2387. 
https://digitalscholarship.unlv.edu/rtds/2387 
This Thesis is protected by copyright and/or related rights. It has been brought to you by Digital Scholarship@UNLV 
with permission from the rights-holder(s). You are free to use this Thesis in any way that is permitted by the 
copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from 
the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/
or on the work itself. 
 
This Thesis has been accepted for inclusion in UNLV Retrospective Theses & Dissertations by an authorized 
administrator of Digital Scholarship@UNLV. For more information, please contact digitalscholarship@unlv.edu. 
PIPELINED IMPLEMENTATION OF JPEG IMAGE COMPRESSION
USING HDL
by
Arun Kumar Reddy Toomu
Bachelor of Technology
J.N.T University, Hyderabad, India
2006
A thesis submitted in partial fulfillment
of the requirements for the
Master of Science Degree in Electrical Engineering 
Department of Electrical and Computer Engineering 
Howard R. Hughes College of Engineering
Graduate College 
University of Nevada, Las Vegas 
August 2008
UMI Number: 1460544
UMI Microform 1460544 
Copyright 2009 by ProQuest LLC.
All rights reserved. This microform edition is protected against 
unauthorized copying under Title 17, United States Code.
ProQuest LLC 
789 E. Eisenhower Parkway 
P.O. Box 1346
Ann Arbor, MI 48106-1346
Thesis Approval
The Graduate College
University of Nevada, Las Vegas
JULY 18, 2008
The Thesis prepared by
ARUN TOOMU
Entitled
PIPELINED IMPLEMENTATION OF JPEG COMPRESSION USING HDL
is approved in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
Examination Committee Member
Examination Committee Member
Graduate College Faculty Representative
Examination Committee Chair
Dean of the Graduate College
ABSTRACT
Pipelined Implementation of JPEG Image Compression using VHDL
by
Arun Kumar Reddy Toomu
Dr. Henry Selvaraj, Examination Committee Chair
Professor of Electrical and Computer Engineering
University of Nevada, Las Vegas
This thesis presents the architecture and design of a JPEG compressor for color images using VHDL. The system consists of several major parts: a color space converter, a downsampler, a 2-D DCT module, a quantizer, a zigzag scanner and an entropy coder. The color space conversion transforms RGB colors into the YCbCr color coding. The downsampling operation reduces the sampling rate of the color information (Cb and Cr). The 2-D DCT transforms the pixel data from the spatial domain to the frequency domain. The quantization operation reduces the precision of the DCT coefficients, eliminating most high-frequency components and small-amplitude coefficients of the cosine expansion. Finally, the entropy coder uses run-length encoding (RLE), Huffman coding, variable length coding (VLC) and differential coding to decrease the number of bits used to represent the image. JPEG compression is lossy, since the downsampling and quantization operations are irreversible, but the losses can be controlled to keep the necessary image quality.
Architectures for these parts were designed and described in VHDL. The results were observed using the Active-HDL simulator, and the code was synthesized using Xilinx ISE for a Virtex-4 FPGA. This pipelined architecture has a minimum latency of 187 clock cycles.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
1.1 Background
1.2 Thesis Outline
CHAPTER 2 THEORETICAL BACKGROUND
2.1 Data Compression Basics
2.2 Data Compression Techniques
2.2.1 Lossless vs. Lossy Compression
2.2.2 Predictive vs. Transform Coding
2.2.3 Subband Coding
2.3 Lossless Compression
2.4 Lossy Compression Techniques
2.4.1 Subband Coding
2.4.2 Transform Coding
2.4.2.1 Discrete Cosine Transform (DCT) Based Coding
2.4.2.2 Lapped Transforms (LT) Based Coding
2.4.2.3 Discrete Wavelet Transform (DWT) Based Coding
2.4.2.3.1 JPEG 2000
CHAPTER 3 ARCHITECTURE
3.1 Outline of JPEG
3.2 Architectures of JPEG
3.2.1 Color Conversion
3.2.2 Discrete Cosine Transform
3.2.3 Quantization
3.2.4 Zigzag Scanning
3.2.5 Entropy Coder
3.2.5.1 Differential Coder
3.2.5.2 Run Length Encoder
3.2.5.3 Size Calculator
3.2.5.4 Variable Length Coder
3.2.5.5 Huffman Encoder
3.2.5.6 Preassembler
3.2.5.7 Assembler
CHAPTER 4 RESULTS AND DISCUSSION
4.1 Simulation Waveforms
4.1.1 Color Conversion
4.1.2 DCT
4.1.3 Quantization
4.1.4 Zigzag Scanning
4.1.5 Differential Coder
4.1.6 Run Length Encoder
4.1.7 Size Calculator
4.1.8 VLC Coder
4.1.9 Huffman Encoder
4.1.10 Preassembler
4.1.11 Assembler
4.1.12 JPEG
4.2 Synthesis Report
CHAPTER 5 CONCLUSION
REFERENCES
VITA
LIST OF FIGURES
Figure 2.1 Block Diagram of SBC
Figure 2.2 2-D DCT using Vector Processing
Figure 2.3 Level-3 dyadic DWT scheme used for Image Compression
Figure 2.4 General block diagram of the JPEG 2000 encoder
Figure 3.1 The JPEG Baseline Encoder
Figure 3.2 1-D DCT Implementation
Figure 3.3 2-D DCT Implementation
Figure 3.4 Quantization Architecture
Figure 3.5 Zigzag Scanning
Figure 3.6 Entropy Encoder
Figure 3.7 Pipelined Architecture for Entropy Coder
Figure 3.8 Differential Coder
Figure 3.9 Run Length Encoder
Figure 3.10 Huffman Coder Architecture
Figure 3.11 Preassembler Architecture
Figure 3.12 Assembler Architecture
Figure 4.1 Simulation of Color Conversion
Figure 4.2 Simulation of DCT
Figure 4.3 Simulation of Quantizer
Figure 4.4 Simulation of Zigzag Scanner
Figure 4.5 Simulation of Differential Coder
Figure 4.6 Simulation of Run Length Encoder
Figure 4.7 Simulation of Size Calculator
Figure 4.8 Simulation of VLC Coder
Figure 4.9 Simulation of Huffman Coder for DC Components
Figure 4.10 Simulation of Huffman Coder for AC Components
Figure 4.11 Simulation of Preassembler
Figure 4.12 Simulation of Assembler
Figure 4.13 Simulation of JPEG Encoder
Figure 4.14 RTL Schematic of JPEG Encoder
Figure 4.15 Design Summary of JPEG for Virtex-4 FPGA
ACKNOWLEDGEMENTS
It is a great pleasure for me to acknowledge the people who helped me during the course of my thesis work. My special thanks go to my advisor, Dr. Henry Selvaraj, who guided me in the right direction. I would especially like to acknowledge Dr. Emma Regentova and Dr. Yahia Baghzouz for serving as committee members and Dr. Laxmi Gewali for serving as the Graduate College representative. I would like to thank my friends, who gave me moral support in achieving this. I would also like to acknowledge the people at Aldec for supporting the project "Real time system on Chip" # 2368-254-50YH.
CHAPTER 1 
INTRODUCTION
The transition from magnetic film-based image representation to digital representation has been motivated primarily by the ease of working with digital data and the better representation of the image that it allows. Over the years, the need for image compression has grown steadily, and it is now recognized as an enabling technology. For example, image compression has been, and continues to be, crucial to the growth of multimedia computing. It is also the natural technology for handling the increased spatial resolutions of today's imaging sensors and evolving broadcast television standards. Furthermore, image compression plays a crucial role in many important and diverse applications, including videoconferencing, remote sensing, document and medical imaging, facsimile transmission (FAX), and the control of remotely piloted vehicles in military, space, and hazardous waste control applications. In short, an ever-expanding number of applications depend on the efficient manipulation, storage, and transmission of binary, gray-scale, or color images. One notable area of application that is strongly driving R&D in image compression is the enormous growth in the use of the Internet and mobile communication devices, which has revolutionized the way people communicate and exchange information. Efficient delivery of digital information (e.g. images) on these devices is imperative, and different methods to achieve it have been proposed. A digital image requires a large amount of storage space and a large bandwidth for transmission; in mobile devices this is a problem because the storage can be used up and the bandwidth saturated rapidly. One solution is to find a representation that uses less information to represent digital images, and this need is what gave rise to image compression in the field of video and digital images. Image compression addresses the problem of reducing the amount of data required to represent a digital image, and it works by removing redundant information from the image [1]. Lossy image compression saves a great deal of storage space at the cost of some image quality. Images can also be compressed with lossless techniques (e.g. run length coding, Huffman coding), but the compression ratio is small. Lossless techniques are therefore mainly used in areas such as medical and military applications, where highly accurate data is needed.
JPEG image compression is a lossy compression technique based on transform coding. Image compression techniques of this kind exploit two factors. The first is that most of the useful content of an image changes relatively slowly across the image. By transforming the image content into the frequency domain, the data can be represented as frequency components; the low-frequency components usually carry much more of the image information than the high-frequency components, so high-frequency components can be discarded to compress the image. The second is a limitation of the human visual system (HVS): the human eye is less sensitive to high-frequency components than to low-frequency ones.
1.1 Background
Digital image compression is a very popular research topic in the field of multimedia processing. The main objective of this research is to develop an architecture for JPEG compression that gives good visual quality and speed. The compression scheme is implemented in VHDL, a hardware description language (Verilog is a common alternative). A hardware implementation speeds up image and video processing compared with software.
1.2 Thesis Outline
This thesis is organized into five chapters. Chapter 1 gives the introduction. Chapter 2 gives an overview of image compression and a classification of compression schemes. It discusses different compression methods such as subband coding, discrete cosine transform (DCT), lapped transform (LT) and discrete wavelet transform (DWT) based coding. Chapter 3 describes the architecture of JPEG image compression and its implementation. Chapter 4 gives simulation results for the different modules of the JPEG compressor and a discussion of the results. Chapter 5 presents the conclusions of the thesis and some future work.
CHAPTER 2
THEORETICAL BACKGROUND
2.1 Data Compression Basics
Data compression is the reduction or elimination of redundancy in data representation in order to save storage and communication costs. It relies on the fact that image information, by its very nature, is not random but exhibits order and has some form of structure. If this order and structure can be extracted, the essence of the information can often be represented and transmitted using fewer bits than would be needed for the original. The original, or a close approximation of it, can then be reconstructed at the receiving end. A common characteristic of most images is that neighboring pixels are correlated and therefore contain redundant information. Image, video and audio signals can be compressed through two mechanisms: redundancy reduction and irrelevancy reduction.
Redundancy reduction: exploits the statistical properties of an image to remove redundant data.
Irrelevancy reduction: much of the data in an image may be irrelevant to a human observer, so that data can be omitted.
In general, three types of redundancy can be identified:
• Spatial redundancy, or correlation between neighboring pixel values.
• Spectral redundancy, or correlation between different color planes or spectral bands.
• Temporal redundancy, or correlation between adjacent frames in a sequence of images (in video applications).
Image compression research aims at reducing the number of bits needed to represent an image by removing the spatial and spectral redundancies as much as possible. Since this thesis focuses only on still image compression, temporal redundancy is not considered.
2.2 Data Compression Techniques
2.2.1 Lossless vs. Lossy Compression
In lossless compression schemes, the reconstructed image after compression is digitally identical to the original image. However, lossless compression can only achieve a modest amount of compression. Lossy schemes, on the other hand, are capable of achieving much higher compression, and under normal viewing conditions often no visible loss is perceived (visually lossless). Lossy compression schemes in use include differential pulse code modulation (DPCM), pulse code modulation (PCM), vector quantization (VQ), and transform and subband coding. An image reconstructed after lossy compression contains degradation relative to the original, often because the compression scheme discards non-redundant information as well as redundant information.
2.2.2 Predictive vs. Transform Coding
In predictive coding, information already sent or available is used to predict future values, and only the difference (the residual between the actual and the predicted value) is coded. This removes the redundancy between successive pixels. Since this is done in the image (spatial) domain, predictive coding is relatively simple to implement and is readily adapted to local image characteristics. Differential pulse code modulation (DPCM) is one particular example of predictive coding. Transform coding, on the other hand, first transforms the image from its spatial-domain representation to a different representation using a well-known transform such as the DCT, the DWT or a lapped transform, and then codes the transformed values (coefficients). This method provides greater data compression than predictive methods, because these transforms have an energy compaction property: they pack most of the energy of an entire image or video frame into a small number of coefficients.
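A minimal sketch of predictive coding, assuming the simplest predictor (the previous pixel in the same row, with 0 predicted for the first pixel) and no quantization of the residual, so this particular variant is lossless; lossy DPCM would additionally quantize each residual inside the prediction loop. The pixel values are hypothetical. For a smooth row the residuals are small numbers clustered around zero, which an entropy coder can represent with fewer bits than the original values. JPEG applies the same idea to the DC coefficients of consecutive blocks.

```python
def dpcm_encode(row):
    """Replace each pixel by its difference from the previous pixel (first pixel predicted as 0)."""
    residuals = []
    prediction = 0
    for pixel in row:
        residuals.append(pixel - prediction)
        prediction = pixel          # the next pixel is predicted from this one
    return residuals

def dpcm_decode(residuals):
    """Rebuild the pixels by adding each residual to the running prediction."""
    row = []
    prediction = 0
    for residual in residuals:
        pixel = prediction + residual
        row.append(pixel)
        prediction = pixel
    return row

row = [100, 102, 103, 103, 105, 110, 111, 111]   # hypothetical smooth scan line
residuals = dpcm_encode(row)
print(residuals)                      # [100, 2, 1, 0, 2, 5, 1, 0]
assert dpcm_decode(residuals) == row  # lossless: no residual was quantized
```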
2.2.3 Subband Coding
In subband coding, the signal (here, an image) is split into several frequency subbands. Each subband is then coded with a coder and bit rate accurately matched to the statistics of that subband.
2.3 Lossless Compression
In lossless compression schemes, the reconstructed image after compression is numerically identical to the original image. However, lossless compression can only achieve a modest amount of compression [1].
Lossless methods are also called entropy-coding schemes, since no information content is lost during compression. This type of compression is used where no loss of information is tolerated, for example in the compression of text, database records, spreadsheets and word-processing files, and in medical and military imaging. Typical compression ratios for lossless data compression are around 3:1.
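As a minimal illustration of a lossless scheme, the sketch below run-length codes a hypothetical scan line into (value, run length) pairs and decodes it again. The decoded data are identical to the input, which is the defining property of lossless compression. Practical codecs usually follow such a stage with an entropy coder such as Huffman coding, which is not shown here.

```python
def rle_encode(data):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, length) for value, length in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    data = []
    for value, length in runs:
        data.extend([value] * length)
    return data

scan_line = [255, 255, 255, 255, 0, 0, 255, 255, 255]   # hypothetical binary-image row
runs = rle_encode(scan_line)
print(runs)                            # [(255, 4), (0, 2), (255, 3)]
assert rle_decode(runs) == scan_line   # reconstruction is exact
```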
2.4 Lossy Compression Techniques
In lossy compression, the reconstructed image is an approximation of the original image. Lossy compression is generally used for images, video and sound, where a certain amount of information loss can be tolerated. JPEG image compression is one example of lossy compression. With JPEG compression, one can decide how much loss to introduce and make a trade-off between file size and image quality. Depending on the fidelity required, compression ratios of up to 100:1 can be obtained.
The JPEG committee has created many standards since it was formed in 1986. ISO had actually started work on this three years earlier, in April 1983, in an attempt to find methods to add photo-quality graphics to the text terminals of the time. The 'Joint' that the 'J' in JPEG stands for refers to the merger of several groups in an attempt to share and develop their experience: JPEG is a collaboration between three international standards organizations, the International Telegraph and Telephone Consultative Committee (CCITT), the International Organization for Standardization (ISO), and the International Electrotechnical Commission (IEC).
The formal name of the standard that most people refer to as 'JPEG' is ISO/IEC IS 10918-1 | ITU-T Recommendation T.81, as the document was published both by ISO, through its national standards bodies, and by CCITT, now called ITU-T. IS 10918 has four parts:
Part 1 - the basic JPEG standard, which defines many options and alternatives for the coding of still images of photographic quality
Part 2 - sets rules and checks for making sure software conforms to Part 1
Part 3 - set up to add a set of extensions to improve the standard, including the SPIFF file format
Part 4 - defines methods for registering some of the parameters used to extend JPEG [1].
JPEG has defined an international standard for the coding and compression of continuous-tone still images. The primary aim of the JPEG standard is to propose an image compression algorithm that is generic and application independent and that aids VLSI implementation of data compression. To meet the needs of different applications, the JPEG standard includes two basic compression methods, each with various modes of operation: a DCT (Discrete Cosine Transform) based method for lossy compression and a predictive method for lossless compression. The Baseline DCT method is the most widely implemented JPEG method.
The compression ratio of the image is given by:
$$\text{Compression ratio} = \frac{\text{Source coder input data size}}{\text{Source coder output data size}} \tag{2.1}$$
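As a quick worked example of (2.1), the sketch below assumes a hypothetical 512 × 512 RGB image stored at 24 bits per pixel and a hypothetical compressed size of 48 KiB; neither figure comes from this thesis. It also expresses the result as bits per pixel, a common alternative measure of compression.

```python
def compression_ratio(input_size, output_size):
    """Equation (2.1): source coder input size divided by source coder output size."""
    return input_size / output_size

def bits_per_pixel(output_bytes, width, height):
    """Average number of coded bits spent on each pixel."""
    return output_bytes * 8 / (width * height)

# Hypothetical example: 512 x 512 pixels, 3 bytes (24 bits) per pixel.
raw_bytes = 512 * 512 * 3          # 786432 bytes
compressed_bytes = 48 * 1024       # 49152 bytes (assumed encoder output)

print(compression_ratio(raw_bytes, compressed_bytes))   # 16.0
print(bits_per_pixel(compressed_bytes, 512, 512))       # 1.5
```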
The most widely used lossy compression techniques are:
(i) Subband Coding
(ii) Transform Coding
2.4.1 Subband Coding
The fundamental concept behind subband coding (SBC) is to split the frequency band of a signal (an image in our case) into several frequency subbands (subband signals), and then to code each subband using a coder and bit rate accurately matched to the statistics of that band. SBC was used extensively first in speech coding [10, 13] and later in image coding [14] because of its inherent advantages, such as variable bit assignment among the subbands and confinement of coding errors within the subbands.
The simplest way to encode audio signals is pulse code modulation (PCM), which is used on music CDs, DAT recordings, and so on. PCM produces a high-quality signal, but at a high bit rate (over 700 kbit/s for one channel of CD audio: 44,100 samples/s × 16 bits = 705.6 kbit/s). To reduce the bandwidth, mu-law encoding can be used. This is like PCM on a logarithmic scale, and its effect is to add noise that is roughly proportional to the signal strength. Sun's au format for sound files is a popular example of mu-law encoding. Using 8-bit mu-law encoding at the same sampling rate, the bit rate drops to about 350 kbit/s, half that of 16-bit PCM.
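The logarithmic scale mentioned above can be sketched with the continuous mu-law companding curve F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ), with μ = 255 as used by ITU-T G.711. This is an illustration rather than the exact segmented G.711 codec: it assumes samples normalized to [-1, 1], and `quantize_8bit`, `linear_8bit` and `mu_law_8bit` are hypothetical helpers. Comparing mu-law with plain 8-bit quantization shows that quiet samples are reproduced much more accurately by mu-law while loud samples get a larger error, i.e. the noise grows with the signal level.

```python
import math

MU = 255  # companding constant of mu-law (ITU-T G.711)

def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] onto a logarithmic scale, still in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Exact inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

def quantize_8bit(v):
    """Hypothetical uniform quantizer: round a value in [-1, 1] to one of 256 levels."""
    level = round((v + 1) / 2 * 255)
    return level / 255 * 2 - 1

def linear_8bit(x):
    """Plain 8-bit PCM: quantize the sample directly."""
    return quantize_8bit(x)

def mu_law_8bit(x):
    """8-bit mu-law: compress, quantize, then expand at the decoder."""
    return mu_law_expand(quantize_8bit(mu_law_compress(x)))

for sample in (0.01, 0.1, 0.9):
    print(f"x={sample}: linear error={abs(linear_8bit(sample) - sample):.5f}, "
          f"mu-law error={abs(mu_law_8bit(sample) - sample):.5f}")
```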
[Figure 2.1: encoder: digital audio signal → time/frequency mapping → quantizer and coding → frame packing → audio bit stream, with a psychoacoustic model controlling the quantizer; decoder: audio bit stream → frame unpacking → reconstruction → frequency/time mapping → digital audio signal]
Figure 2.1 Block Diagram of SBC
Figure 2.1 shows a general subband coder. First, a time-frequency mapping (a filter bank, an FFT, or a similar transform) decomposes the input signal into subbands. The psychoacoustic model examines these subbands as well as the original signal and determines masking thresholds using psychoacoustic information. Using these masking thresholds, each of the subband samples is quantized and encoded so as to keep the quantization noise below the masking threshold. Finally, the encoded bits are packed into frames and sent over the communication channel.
At the decoder end, the frames are unpacked, the subband samples are decoded, and a frequency-time mapping turns them back into a single output audio signal.
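The analysis and synthesis stages of Figure 2.1 can be illustrated with the simplest possible two-band filter bank. The sketch below assumes Haar (average and difference) filters with decimation by 2, in place of the longer filter banks used in practical subband coders, and omits the psychoacoustic model and the quantizer. For a smooth input the high band carries very little energy, so few bits need to be allocated to it, and without quantization the synthesis stage reconstructs the input exactly.

```python
def analysis(x):
    """Split an even-length signal into decimated low and high subbands (Haar filters)."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

def synthesis(low, high):
    """Upsample and recombine the two subbands (inverse of analysis)."""
    x = []
    for l, h in zip(low, high):
        x.extend([l + h, l - h])
    return x

def energy(band):
    return sum(v * v for v in band)

signal = [10, 12, 14, 15, 15, 14, 12, 10]   # hypothetical smooth input
low, high = analysis(signal)
print(low)                        # [11.0, 14.5, 14.5, 11.0]
print(high)                       # [-1.0, -0.5, 0.5, 1.0]
print(energy(low), energy(high))  # 662.5 2.5
assert synthesis(low, high) == signal
```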
Disadvantages of Subband Coding
• One of the major problems with subband coding is the bit allocation problem: deciding the number of bits assigned to each individual subband to obtain the best performance. One approach is to compute an optimal bit allocation for each quantized subband output individually, but this is mostly valid at higher bit rates of approximately 1 bit/sample or more.
• It is difficult to determine the optimal coding system for low bit rate applications.
• If the overall bit rate changes, the optimal bit allocation changes, which requires the entire coding process to be repeated.
• As the filters are not ideal, it is not possible to perfectly decorrelate all the frequency subbands, and there is slight overlap between adjacent subbands.
• It is very difficult to use subband coding for motion-compensated video, because motion compensation is hard to perform on decimated frequency subbands.
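The optimal bit allocation mentioned in the first bullet is commonly given, under high-rate assumptions and for equal-size subbands, by b_k = R + ½ log2(σ_k² / G), where R is the average number of bits per sample and G is the geometric mean of the subband variances σ_k². The sketch below applies this rule to four hypothetical subband variances. It reproduces two of the listed problems: the whole allocation has to be recomputed when R changes, and at a low average rate a subband receives a negative number of bits, which is why the rule only holds at roughly 1 bit/sample or more.

```python
import math

def optimal_bit_allocation(variances, avg_bits):
    """High-rate rule b_k = R + 0.5*log2(var_k / geometric mean); equal-size subbands assumed."""
    log_geo_mean = sum(math.log2(v) for v in variances) / len(variances)
    return [avg_bits + 0.5 * (math.log2(v) - log_geo_mean) for v in variances]

variances = [16.0, 4.0, 1.0, 0.25]   # hypothetical subband variances, low to high band

print(optimal_bit_allocation(variances, 2.0))   # [3.5, 2.5, 1.5, 0.5]
print(optimal_bit_allocation(variances, 0.5))   # [2.0, 1.0, 0.0, -1.0]  negative: rule breaks down
```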
2.4.2 Transform Coding
Transform coding converts the data from one set of values (e.g. pixel values) to another (transform coefficients) using a mathematical transform, and then codes the coefficients.
Different types of transform coding techniques are:
(a) Discrete Cosine Transform (DCT) based coding
(b) Lapped Transform (LT) based coding
(c) Discrete Wavelet Transform (DWT) based coding.
2.4.2.1 Discrete Cosine Transform (DCT) Based Coding
The discrete cosine transform (DCT) translates image information from the spatial domain to the frequency domain so that it can be represented in a more compact form. The DCT is closely related to the discrete Fourier transform, but it uses only real-valued cosine basis functions.
A simple analogy illustrates how the DCT works. Consider an unsorted list of 15 numbers between 0 and 4: (2, 3, 1, 4, 2, 2, 0, 1, 4, 1, 0, 1, 4, 0, 0). The transformation involves two steps: first the list is sorted, and second the occurrences of each number are counted, giving (4, 4, 3, 1, 3) for the values 0, 1, 2, 3 and 4. Through this transformation we lose the spatial information (where each number was) but capture the frequency information (how often each number occurs).
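The counting step of this analogy can be checked directly. The sketch below tallies how often each value from 0 to 4 occurs in the list from the text; the positions of the values are discarded, just as the analogy describes.

```python
from collections import Counter

values = [2, 3, 1, 4, 2, 2, 0, 1, 4, 1, 0, 1, 4, 0, 0]   # the list from the text
counts = Counter(values)
histogram = tuple(counts[v] for v in range(5))           # occurrences of 0, 1, 2, 3, 4
print(histogram)   # (4, 4, 3, 1, 3)
```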
Neighboring pixels within an image are highly correlated, so a transform is needed that exploits this correlation and represents the information with fewer bits. The DCT has been shown to be near-optimal for a large class of images in energy compaction and decorrelation. (The Karhunen-Loève transform (KLT) is the optimal transform, but it is not used because it is difficult to implement in practice: its basis functions depend on the statistics of each image [7].) The DCT decomposes the signal into spatial frequencies, which allows further processing techniques to reduce the precision of the DCT coefficients consistently with the human visual system (HVS) model.
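Energy compaction can be illustrated with a 1-D orthonormal N-point DCT, the same cosine basis used for the row and column transforms of the 2-D DCT, applied to a single row of pixels. The row values are hypothetical and no level shift is applied. Because the transform is orthonormal, the total energy is preserved, and for this smooth row almost all of it lands in the first (DC) coefficient.

```python
import math

def dct_1d(x):
    """Orthonormal N-point DCT-II: X[k] = K * sum_i x[i] cos((2i+1) k pi / 2N)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                                  for i in range(n)))
    return coeffs

row = [52, 55, 61, 66, 70, 61, 64, 73]        # hypothetical 8-pixel row
coeffs = dct_1d(row)
energy = [c * c for c in coeffs]

print([round(c, 1) for c in coeffs])
print(f"energy in X[0]: {energy[0] / sum(energy):.1%}")   # energy in X[0]: 98.9%
```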
DCT-based coding is a lossy compression scheme in which an N × N image block is transformed from the spatial domain to the DCT domain. (The DCT itself is invertible; the loss comes from the quantization that follows it.) The DCT converts the input image into spatial frequency components called DCT coefficients, arranged so that the low-frequency components appear in the top-left corner of the DCT matrix and the higher-frequency components toward the bottom-right. Since the human visual system is less sensitive to high-frequency components than to low-frequency components, the coefficients can be further processed, by quantization, to represent the data with fewer bits.
Advantages of DCT
• The DCT is near-optimal in energy compaction for highly correlated image data
• Efficient to compute and widely accepted
• Parallel processing capability
• Less complex than other transforms such as the KLT
• The DCT can be applied block by block.
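Mathematical equations of DCT
The 2-D DCT of an M x N block $x_{mn}$ is given as

$$X_{pq} = \alpha_p\,\alpha_q \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} x_{mn}\cos\frac{\pi(2m+1)p}{2M}\cos\frac{\pi(2n+1)q}{2N}, \qquad 0 \le p \le M-1,\; 0 \le q \le N-1 \qquad (2.2)$$

where $\alpha_p = 1/\sqrt{M}$ for $p = 0$ and $\alpha_p = \sqrt{2/M}$ for $p \neq 0$, and $\alpha_q = 1/\sqrt{N}$ for $q = 0$ and $\alpha_q = \sqrt{2/N}$ for $q \neq 0$.

First the 1-D DCT of the rows is calculated and then the 1-D DCT of the columns. The above equation is divided into row and column parts as follows:

$$C_{\text{row},\text{col}} = K\cos\frac{\pi\,(2\cdot\text{col}+1)\cdot\text{row}}{2M}, \qquad K = \begin{cases}1/\sqrt{M}, & \text{row} = 0\\ \sqrt{2/M}, & \text{row} \neq 0\end{cases} \qquad (2.3)$$

$$C'_{\text{row},\text{col}} = K\cos\frac{\pi\,(2\cdot\text{row}+1)\cdot\text{col}}{2N}, \qquad K = \begin{cases}1/\sqrt{N}, & \text{col} = 0\\ \sqrt{2/N}, & \text{col} \neq 0\end{cases} \qquad (2.4)$$

For the 8X8 blocks used in JPEG, M = N = 8.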
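As a floating-point reference model (a sketch for illustration, not the fixed-point hardware of Chapter 3), the Python code below builds the orthonormal 8-point DCT matrix of Equation (2.3), computes the 2-D DCT in the separable form C·X·C', and checks the energy-compaction property on a smooth test block. The block values and function names are hypothetical.

```python
import math

def dct_matrix(n=8):
    """Orthonormal n-point DCT matrix: entry [row][col] follows Eq. (2.3) with M = n."""
    matrix = []
    for row in range(n):
        k = math.sqrt(1.0 / n) if row == 0 else math.sqrt(2.0 / n)
        matrix.append([k * math.cos((2 * col + 1) * row * math.pi / (2 * n))
                       for col in range(n)])
    return matrix

def matmul(a, b):
    """Plain-Python matrix product a . b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2(block):
    """2-D DCT of a square block as C . X . C' (separable row/column form)."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

# Hypothetical smooth test block: every row is the same gentle ramp 100, 104, ..., 128.
block = [[100 + 4 * col for col in range(8)] for _ in range(8)]
coeffs = dct2(block)

total_energy = sum(v * v for row in coeffs for v in row)
low_energy = sum(coeffs[p][q] ** 2 for p in range(2) for q in range(2))
print(f"DC = {coeffs[0][0]:.1f}, low-frequency share = {low_energy / total_energy:.4f}")
```

Because the transform is orthonormal, the total energy is preserved, and for this highly correlated block the 2x2 low-frequency corner holds more than 99.9% of it.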
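For the 8X8 blocks, a one-dimensional DCT/IDCT, followed by an internal buffer memory, followed by a second one-dimensional DCT is used to perform the 2-D DCT. This row-column decomposition reduces the computational complexity of the 2-D DCT: computing each of the 64 coefficients directly from Equation (2.2) takes 64 multiplications (4096 per block), whereas two passes of eight 8-point 1-D DCTs take 2 x 8 x 64 = 1024 multiplications, even before fast algorithms are applied.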
Figure 2.2 2-D DCT using Vector Processing (YCbCr signals → 1-D DCT → RAM buffer → 1-D DCT → 2-D DCT coefficients)
Disadvantages of DCT
• In JPEG, the image is divided into non-overlapping 8X8 blocks and an 8x8-point 2-D DCT is applied to each block to obtain the transformed coefficients. This process exploits only the spatial correlation between pixels within a block, not the correlation between blocks.
• The second disadvantage is blocking artifacts: discontinuities at the block boundaries (caused by the use of 8X8 blocks) that result from reconstruction mismatches at low bit rates.
2.4.2.2 Lapped Transforms (LT) Based Coding
The lapped transform was developed to solve the problem of blocking effects in DCT-based coding schemes. Instead of non-overlapping 2-D blocks, it uses spatially overlapping 2-D blocks of the image. One special type of lapped transform is the lapped orthogonal transform (LOT).
Advantages of lapped transform
• No need to use non-overlapping block-based coding.
• Coding efficiency can be improved by taking inter-block spatial correlation into account.
• Blocking artifacts are eliminated.
• Pre- and post-filters can be constructed in modular cascaded stages, to minimize hardware/software modifications.
Lapped transforms reduce blocking effects, but other artifacts, such as ringing around the edges of blocks, appear because of their longer basis functions. The LOT is an extension of the DCT, but its added complexity outweighs its advantages, so it is less popular for image compression [2].
2.4.2.3 Discrete Wavelet Transform (DWT) Based Coding
The discrete wavelet transform (DWT) is one of the more recent coding techniques used instead of the DCT. Its main advantage over the DCT is that the image does not need to be divided into non-overlapping blocks. Because of their inherent multiresolution nature, wavelet coding schemes are especially suitable for applications where scalability and tolerable degradation are important. After JPEG, the JPEG committee released a new image coding standard, JPEG 2000, which is based on the DWT.
The Fourier transform (on which the DCT is based) represents a signal as a sum of sine and cosine functions. This reveals the frequency spectrum of the signal, but not when and where those frequencies are present. To overcome this problem, the signal must be represented in both the frequency and the time domain, which is what the wavelet transform does.
To obtain a joint time-frequency representation, the signal of interest is cut into several parts that are analyzed separately, which gives more information about the signal. In the wavelet transform, a fully scalable modulated window solves this signal-cutting problem. The window is shifted along the signal and the spectrum is calculated at every position. In other words, the wavelet transform convolves the input signal with particular instances of the wavelet (window) at various time scales and positions, and this process is repeated many times, with a different window scale in each new cycle. The result is a representation of the signal in time and frequency at several different resolutions [3].
Performing these convolutions at every position and every characteristic scale is called the continuous wavelet transform (CWT). By Nyquist's theorem, the highest frequency that can be modeled with discrete signal data is half the sampling frequency, so in the worst case the transform has to be computed at every other sample point [4].
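The continuous wavelet transform is generally expressed as [4]:

$$\gamma(s,\tau) = \frac{1}{\sqrt{s}}\int_{-\infty}^{\infty} f(t)\,\psi^{*}\!\left(\frac{t-\tau}{s}\right)dt \qquad (2.5)$$

where $f(t)$ is the signal, $\psi$ is the mother wavelet, $s$ is the scale and $\tau$ is the translation.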
In the CWT, the signal is analyzed using a set of basis functions that are related to each other by simple scaling and translation. In the DWT, digital filtering techniques are used to obtain the time-scale representation: the signal to be analyzed is passed through filters with different cutoff frequencies at different levels [12].
Figure 2.3 Level-3 dyadic DWT scheme used for Image Compression [5]
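To make the filter-bank view concrete, the following Python sketch performs a dyadic decomposition in which only the low-pass band is split again at each level, as in the three-level scheme of Figure 2.3. It assumes the Haar filter pair for simplicity; JPEG 2000 itself uses the 5/3 (reversible) and 9/7 (irreversible) wavelet filters, and for images the step would be applied to the rows and then to the columns. The signal and function names are hypothetical.

```python
import math

def haar_level(signal):
    """One DWT level: Haar low-pass/high-pass filtering followed by downsampling by 2."""
    scale = 1 / math.sqrt(2)
    approx = [scale * (signal[2 * i] + signal[2 * i + 1]) for i in range(len(signal) // 2)]
    detail = [scale * (signal[2 * i] - signal[2 * i + 1]) for i in range(len(signal) // 2)]
    return approx, detail

def dyadic_dwt(signal, levels):
    """Dyadic decomposition: only the low-pass (approximation) band is split again."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_level(approx)
        details.append(detail)
    return approx, details

x = [10, 12, 14, 16, 40, 42, 44, 46]   # hypothetical 8-sample signal
approx, details = dyadic_dwt(x, 3)
print("approximation:", approx)
print("details per level:", details)
```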
2.4.2.3.1 JPEG 2000
JPEG 2000, the new standard for still image coding, better addresses the problems of earlier still image compression methods. It offers a wide range of functionalities, such as lossless and lossy coding, embedded lossy-to-lossless coding, progression by resolution and quality, high compression efficiency, error resilience and region-of-interest (ROI) coding. Comparative results have shown that JPEG 2000 is indeed superior to established image compression standards [5].
In JPEG 2000 compression, the image is first preprocessed by tiling, i.e. partitioning the original image into non-overlapping blocks (tiles). Tile components are decomposed into several decomposition levels using a separable wavelet transform; the coefficients are then scalar quantized, and each block is entropy encoded. The EBCOT (Embedded Block Coding with Optimized Truncation) process is used for entropy coding.
Figure 2.4 General block diagram of the JPEG 2000 Encoder [11] (blocks shown include code-block partition, arithmetic coding, layer formation, rate control and formatting)
Advantages of JPEG 2000
• JPEG 2000 offers higher image quality than JPEG.
• In JPEG 2000 compression, the compressor can choose the image quality, maximum resolution and losses.
• JPEG 2000 can provide both lossless and lossy compression in the same compression engine.
Disadvantages of DWT
• The cost of computing the DWT is much higher than that of the DCT; the complexity of calculating the DWT depends on the length of the wavelet filter.
• Larger DWT basis functions or wavelet filters produce blurring and ringing noise near edge regions in images or video frames.
CHAPTER 3 
ARCHITECTURE
3.1 Outline of JPEG
The basic model of JPEG is shown below.
Figure 3.1 JPEG Baseline Encoder (input image → RGB to YCbCr conversion → forward DCT → quantization, with quantization tables → differential coder and run length encoder → Huffman coding, with Huffman tables → output bitstream)
The Joint Photographic Experts Group proposed the JPEG compression standard [6]. The encoder model transforms the input image into a form suitable for further processing, and the entropy encoder then compresses the output of the encoder model.
The different modes of JPEG are
• Lossless Coding
• Sequential Coding
• Progressive Coding
• Hierarchical Coding
In Lossless Coding the image can be reconstructed exactly after decoding. This mode uses methods such as differential coding, Huffman coding and arithmetic coding.
In Sequential Coding, image blocks are scanned sequentially from top to bottom and left to right; Baseline Coding is an example of sequential coding. In Progressive Coding, image blocks are processed sequentially, but coding is completed in multiple scans: the first scan yields the full image without full detail, and the detail is provided in successive scans. In Hierarchical Coding, each image component is encoded as a sequence of frames. The first frame is usually a low-resolution version of the original image, and subsequent frames are differential frames between the original and the reconstructed reference image [7].
Based on these modes, there are four distinct processes for JPEG image compression:
• Baseline process
• Extended DCT-based process
• Lossless process
• Hierarchical process
Both the baseline and the extended DCT-based processes use the DCT in the encoding process, but for entropy coding the baseline process uses Huffman coding, while the extended DCT-based process uses Huffman or arithmetic coding. The lossless process uses predictive or sequential methods for encoding and Huffman or arithmetic coding for entropy coding. The hierarchical process uses either the DCT-based or the lossless process for encoding and the same entropy coding methods as the other processes.
This thesis discusses the implementation of baseline JPEG image compression.
Baseline JPEG is the most widely used JPEG compression mode. Baseline mode is simple and is based on the sequential mode, i.e. the image is scanned from left to right and top to bottom. The image is divided into non-overlapping 8X8 blocks of 8-bit samples, and the DCT, quantization and entropy encoding steps are performed on each block.
The JPEG baseline can be divided into five main parts: color space conversion, downsampling, 2-D DCT, quantization and entropy encoding. The color space conversion converts the image from RGB to YCbCr (a luminance component Y and two chrominance components Cb and Cr). The luminance component contains the grayscale image and the chrominance components contain the color information. Downsampling reduces the sampling rate of the color information (Cb, Cr). The 2-D DCT transforms the image information from the spatial domain to the frequency domain. Quantization eliminates most high-frequency components and represents the low-frequency components with fewer bits. JPEG uses predefined quantization tables for eliminating the high-frequency components; the selection of the quantization tables is critical, since it affects both compression efficiency and image quality. After quantization, the DCT coefficients are arranged in zigzag order so that the low-frequency components come first and the high-frequency components last; this maps each 8X8 block to a 1X64 vector. Finally, entropy coding is applied. It uses differential coding for the DC components and Run Length Encoding for the AC components. Location (0, 0) of each block i contains the DC coefficient, denoted DC_i. Since adjacent blocks are likely to have similar average energy levels, only the difference between the current and previous DC coefficients is sent; this is known as Differential Pulse Code Modulation (DPCM). The 1X64 vectors contain many zeros, so Run Length Encoding represents them as (run length, value) pairs to reduce the number of bits: only the non-zero values are sent, each together with the count of zeros preceding it.
After that, Variable Length Coding (VLC) and Huffman coding are applied to represent the data with fewer bits [6]. In the VLC coder, the amplitude is represented with its first significant bit as the most significant bit. For each run-length pair there is a variable-length Huffman code, which the Huffman encoder uses to perform the compression; the Huffman codes are stored in tables. In the JPEG compression process, downsampling and quantization are irreversible, but the losses can be controlled according to the required image quality [8].
3.2 Architectures of JPEG
3.2.1 Color Conversion
Color space conversion is the first operation in a JPEG compressor if the input images are in the RGB color space. Although the JPEG algorithm itself is independent of the color space, since it processes each color component independently, a change of color space improves the compression ratio significantly. This is because the Human Visual System (HVS) is less sensitive to some characteristics of the image, and because RGB is not an efficient representation of real-world images: all three RGB components need equal bandwidth to generate colors, and they are highly correlated. RGB is also not well suited to image processing. For example, to change the intensity of a pixel, all three color components must be read and processed; if the intensity is directly accessible, processing is faster. The appropriate color representation for JPEG compression is YCbCr, where Y is the luminance component and Cb and Cr are the two chrominance components. The luminance component contains the image information (grayscale) and the chrominance components contain the color information: Cb carries information relative to blue and Cr information relative to red. In the 8-bit studio-range representation of ITU-R BT.601, Y spans 16-235 (Cb and Cr span 16-240); with the full-range equations below, which match to three decimals those used in JFIF files, all three components span 0-255.
The following equations are used to convert RGB to YCbCr:

$$\begin{aligned}
Y_{ij} &= 0.299R_{ij} + 0.587G_{ij} + 0.114B_{ij}\\
Cb_{ij} &= -0.169R_{ij} - 0.331G_{ij} + 0.5B_{ij}\\
Cr_{ij} &= 0.5R_{ij} - 0.419G_{ij} - 0.081B_{ij}
\end{aligned}$$
The source image is partitioned into non-overlapping 2-D blocks of 8x8, which are scanned sequentially from left to right and top to bottom. The nominal range of the luminance component is 0 to 1, and the nominal range of the chrominance components is -0.5 to 0.5. To make the range of the chrominance components equal to that of luminance, 128 is added to the Cb and Cr components [17].
This color conversion architecture is based on the simplified models provided by Xilinx Corporation, which use only four multipliers [17]. The architecture has a latency of six clock cycles and operates at a frequency of 285 MHz.
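The equations above, and one way of reaching the four-multiplier count mentioned for the Xilinx model, can be compared in a short Python sketch. The four-multiplier arrangement (Y = K_R(R-G) + K_B(B-G) + G, with Cb and Cr computed as scaled differences B-Y and R-Y) is an assumption for illustration and is not taken from [17]; it reproduces the three-decimal coefficients above to within their rounding.

```python
KR, KB = 0.299, 0.114   # luminance weights of R and B from the equations above

def rgb_to_ycbcr(r, g, b):
    """Direct form: nine constant multiplications, 128 added to Cb and Cr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.419 * g - 0.081 * b + 128
    return y, cb, cr

def rgb_to_ycbcr_4mult(r, g, b):
    """Assumed four-multiplier form: Y from two products, Cb and Cr from one each."""
    y = KR * (r - g) + KB * (b - g) + g        # multipliers 1 and 2
    cb = (b - y) * (0.5 / (1 - KB)) + 128      # multiplier 3 (constant ~0.5643)
    cr = (r - y) * (0.5 / (1 - KR)) + 128      # multiplier 4 (constant ~0.7133)
    return y, cb, cr

for rgb in [(255, 0, 0), (0, 255, 0), (0, 0, 255), (200, 120, 30)]:
    direct = rgb_to_ycbcr(*rgb)
    reduced = rgb_to_ycbcr_4mult(*rgb)
    print(rgb, [round(v, 2) for v in direct], [round(v, 2) for v in reduced])
```

In hardware the constants would be fixed-point values; the sketch keeps floating point to show only that the rearranged form computes the same conversion.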
3.2.2 Discrete Cosine Transform (DCT)
The Discrete Cosine Transform (DCT) is the transform stage of a lossy compression scheme in which an N x N image block is transformed from the spatial domain to the DCT domain. The DCT decomposes the signal into frequency components, which are called DCT coefficients. The lower-frequency DCT coefficients appear toward the upper-left corner and the higher-frequency DCT coefficients toward the lower-right corner of the DCT matrix. The Human Visual System is less sensitive to high-frequency components, so these components can be quantized more coarsely.
The DCT is implemented with vector processing using four parallel multipliers. The output Y of the 8 X 8 DCT for input X is given by Y = C·X·C', where C is the matrix of cosine basis functions and C' is its transpose [18]. Using row-column decomposition, Y can be computed with 1-D DCT transforms as

$$Y = C\,Z, \qquad Z = X\,C' \qquad (2.6)$$
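The mathematical equation for the DCT is given as

$$X_{pq} = \alpha_p\,\alpha_q \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} x_{mn}\cos\frac{\pi(2m+1)p}{2M}\cos\frac{\pi(2n+1)q}{2N} \qquad (2.7)$$

where $\alpha_p = 1/\sqrt{M}$ for $p = 0$ and $\sqrt{2/M}$ otherwise, and $\alpha_q = 1/\sqrt{N}$ for $q = 0$ and $\sqrt{2/N}$ otherwise (M = N = 8 here). The 1-D DCT is performed first for the rows and then for the columns; it is obtained by separating Equation (2.7) into row and column parts.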
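where C and C' are calculated as follows. Their entries are the cosine values of Equations (2.3) and (2.4) for M = N = 8, scaled by 2^16 and rounded to integers:

$$C = \begin{bmatrix}
23170 & 23170 & 23170 & 23170 & 23170 & 23170 & 23170 & 23170\\
32138 & 27246 & 18205 & 6393 & -6393 & -18205 & -27246 & -32138\\
30274 & 12540 & -12540 & -30274 & -30274 & -12540 & 12540 & 30274\\
27246 & -6393 & -32138 & -18205 & 18205 & 32138 & 6393 & -27246\\
23170 & -23170 & -23170 & 23170 & 23170 & -23170 & -23170 & 23170\\
18205 & -32138 & 6393 & 27246 & -27246 & -6393 & 32138 & -18205\\
12540 & -30274 & 30274 & -12540 & -12540 & 30274 & -30274 & 12540\\
6393 & -18205 & 27246 & -32138 & 32138 & -27246 & 18205 & -6393
\end{bmatrix}$$

$$C' = \begin{bmatrix}
23170 & 32138 & 30274 & 27246 & 23170 & 18205 & 12540 & 6393\\
23170 & 27246 & 12540 & -6393 & -23170 & -32138 & -30274 & -18205\\
23170 & 18205 & -12540 & -32138 & -23170 & 6393 & 30274 & 27246\\
23170 & 6393 & -30274 & -18205 & 23170 & 27246 & -12540 & -32138\\
23170 & -6393 & -30274 & 18205 & 23170 & -27246 & -12540 & 32138\\
23170 & -18205 & -12540 & 32138 & -23170 & -6393 & 30274 & -27246\\
23170 & -27246 & 12540 & 6393 & -23170 & 32138 & -30274 & 18205\\
23170 & -32138 & 30274 & -27246 & 23170 & -18205 & 12540 & -6393
\end{bmatrix}$$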
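The intermediate value Z = X·C' can be calculated as follows, where

$$X = \begin{bmatrix}
x_{00} & x_{01} & x_{02} & x_{03} & x_{04} & x_{05} & x_{06} & x_{07}\\
x_{10} & x_{11} & x_{12} & x_{13} & x_{14} & x_{15} & x_{16} & x_{17}\\
x_{20} & x_{21} & x_{22} & x_{23} & x_{24} & x_{25} & x_{26} & x_{27}\\
x_{30} & x_{31} & x_{32} & x_{33} & x_{34} & x_{35} & x_{36} & x_{37}\\
x_{40} & x_{41} & x_{42} & x_{43} & x_{44} & x_{45} & x_{46} & x_{47}\\
x_{50} & x_{51} & x_{52} & x_{53} & x_{54} & x_{55} & x_{56} & x_{57}\\
x_{60} & x_{61} & x_{62} & x_{63} & x_{64} & x_{65} & x_{66} & x_{67}\\
x_{70} & x_{71} & x_{72} & x_{73} & x_{74} & x_{75} & x_{76} & x_{77}
\end{bmatrix}$$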
Each element Z(k, j) is the product of row k of X and column j of C'. Because the even rows of C are symmetric and the odd rows are antisymmetric, the sums and differences of mirrored inputs can be formed first, so each coefficient needs at most four multiplications. For row k of X (k = 0, 1, ..., 7; k = 0 gives the first row of Z):

$$\begin{aligned}
Z(k,0) &= 23170\,(x_{k0}+x_{k1}+x_{k2}+x_{k3}+x_{k4}+x_{k5}+x_{k6}+x_{k7})\\
Z(k,1) &= 32138\,x_{k0}+27246\,x_{k1}+18205\,x_{k2}+6393\,x_{k3}-6393\,x_{k4}-18205\,x_{k5}-27246\,x_{k6}-32138\,x_{k7}\\
&= 32138\,(x_{k0}-x_{k7})+27246\,(x_{k1}-x_{k6})+18205\,(x_{k2}-x_{k5})+6393\,(x_{k3}-x_{k4})\\
Z(k,2) &= 30274\,(x_{k0}+x_{k7})+12540\,(x_{k1}+x_{k6})-12540\,(x_{k2}+x_{k5})-30274\,(x_{k3}+x_{k4})\\
Z(k,3) &= 27246\,(x_{k0}-x_{k7})-6393\,(x_{k1}-x_{k6})-32138\,(x_{k2}-x_{k5})-18205\,(x_{k3}-x_{k4})\\
Z(k,4) &= 23170\,(x_{k0}+x_{k7})-23170\,(x_{k1}+x_{k6})-23170\,(x_{k2}+x_{k5})+23170\,(x_{k3}+x_{k4})\\
Z(k,5) &= 18205\,(x_{k0}-x_{k7})-32138\,(x_{k1}-x_{k6})+6393\,(x_{k2}-x_{k5})+27246\,(x_{k3}-x_{k4})\\
Z(k,6) &= 12540\,(x_{k0}+x_{k7})-30274\,(x_{k1}+x_{k6})+30274\,(x_{k2}+x_{k5})-12540\,(x_{k3}+x_{k4})\\
Z(k,7) &= 6393\,(x_{k0}-x_{k7})-18205\,(x_{k1}-x_{k6})+27246\,(x_{k2}-x_{k5})-32138\,(x_{k3}-x_{k4})
\end{aligned}$$
The 2-D DCT is then calculated from Y = C·Z, where Z is the 1-D DCT matrix of the input X and C is the matrix of cosine coefficients.
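The sum/difference structure can be checked numerically. The Python sketch below, a model for illustration rather than the HDL design, computes one row of Z = X·C' both by plain multiply-accumulate with the integer matrix C and with the butterfly form above, and confirms that the two agree exactly. The input row and function names are hypothetical.

```python
import math

# Integer DCT constants from the matrix C above: round(32768 * cos(k*pi/16)).
C1, C2, C3, C4, C5, C6, C7 = 32138, 30274, 27246, 23170, 18205, 12540, 6393

C_INT = [
    [C4, C4, C4, C4, C4, C4, C4, C4],
    [C1, C3, C5, C7, -C7, -C5, -C3, -C1],
    [C2, C6, -C6, -C2, -C2, -C6, C6, C2],
    [C3, -C7, -C1, -C5, C5, C1, C7, -C3],
    [C4, -C4, -C4, C4, C4, -C4, -C4, C4],
    [C5, -C1, C7, C3, -C3, -C7, C1, -C5],
    [C6, -C2, C2, -C6, -C6, C2, -C2, C6],
    [C7, -C5, C3, -C1, C1, -C3, C5, -C7],
]

def row_dct_matrix(x):
    """Z(k, j) = sum_n x[n] * C[j][n]: one row of Z = X . C' by multiply-accumulate."""
    return [sum(C_INT[j][n] * x[n] for n in range(8)) for j in range(8)]

def row_dct_butterfly(x):
    """Same row using the sum/difference form: at most four products per output."""
    s = [x[i] + x[7 - i] for i in range(4)]   # x0+x7, x1+x6, x2+x5, x3+x4
    d = [x[i] - x[7 - i] for i in range(4)]   # x0-x7, x1-x6, x2-x5, x3-x4
    return [
        C4 * (s[0] + s[1] + s[2] + s[3]),
        C1 * d[0] + C3 * d[1] + C5 * d[2] + C7 * d[3],
        C2 * s[0] + C6 * s[1] - C6 * s[2] - C2 * s[3],
        C3 * d[0] - C7 * d[1] - C1 * d[2] - C5 * d[3],
        C4 * (s[0] - s[1] - s[2] + s[3]),
        C5 * d[0] - C1 * d[1] + C7 * d[2] + C3 * d[3],
        C6 * s[0] - C2 * s[1] + C2 * s[2] - C6 * s[3],
        C7 * d[0] - C5 * d[1] + C3 * d[2] - C1 * d[3],
    ]

def orthonormal(k, n):
    """Floating-point orthonormal DCT basis value, for checking the 2**16 scaling."""
    a = math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)
    return a * math.cos((2 * n + 1) * k * math.pi / 16)

row = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical input row
print(row_dct_butterfly(row) == row_dct_matrix(row))   # True
```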
Figure 3.2 1-D DCT Implementation [18]
The block diagram in Figure 3.2 is used to implement the 1-D DCT. The first 1-D DCT values are calculated and stored in a RAM, and the second 1-D DCT is performed on the values stored in the RAM. The 8X8 inputs are loaded into the adder/subtractor, whose outputs are fed to the multiplier. The multiplier takes constant coefficients from the ROM on its second input. The multiplier outputs are given to an adder, which performs the additions and produces the 1-D DCT coefficients; these are stored in a transpose buffer (RAM). A toggle flip-flop controls the addition and subtraction operations.
Figure 3.3 2-D DCT Implementation [18]
The values stored in the transpose buffer are read column by column and fed as input to the second 1-D DCT. Its outputs are the 2-D DCT coefficients, which are used as inputs to the quantizer for further processing.
3.2.3 Quantization
The quantization process reduces the number of bits used to represent the DCT coefficients. Since the human eye is less sensitive to high-frequency components than to low-frequency components, the quantization factors are larger for high-frequency components than for low-frequency components.
The quantization operation is an integer division of the 2-D DCT coefficients by predefined values, which are stored in tables called quantization tables. Baseline JPEG uses two quantization tables: one for the luminance component (Y) and one for the chrominance components (Cb and Cr). The optimum values in the quantization tables depend on the application, but the JPEG standard suggests typical tables that give good efficiency for any application [9]. Quantization eliminates the coefficients that are least perceptible to the human eye.
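A minimal Python sketch of the quantization step divides each coefficient by the corresponding entry of the luminance table suggested in Annex K of the JPEG standard and rounds the result. It uses plain division (no scaling factor and no shift-add hardware), and the coefficient values are hypothetical.

```python
import math

# Luminance quantization table suggested by the JPEG standard (Annex K, Table K.1).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def round_half_away(x):
    """Round to nearest integer, halves away from zero (Python's round() rounds halves to even)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def quantize(coeffs, table):
    """Divide each 2-D DCT coefficient by its table entry and round."""
    return [[round_half_away(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]

# Hypothetical DCT output: a strong DC term, one low-frequency AC term, one small high-frequency term.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 912.0, -100.0, 30.0
q = quantize(coeffs, Q_LUMA)
print(q[0][0], q[0][1], q[7][7])  # 57 -9 0
```

The small high-frequency term is divided by a large table entry and disappears, which is exactly the effect that produces the long runs of zeros exploited later by the entropy coder.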
Figure 3.4 Quantization Architecture [9]
The quantization architecture is shown in Figure 3.4. It uses two ROMs to store the quantization tables for the luminance and chrominance components. For the multiplication, barrel shifters controlled by the quantization values stored in the ROMs are used; this reduces the number of clock cycles required for the multiplication. For each element of the 8X8 block, a specific constant from the quantization table is used for the division [9].
The quantization tables used for JPEG compression, presented below, are the tables proposed by the JPEG-92 standard. Exactly the same quantization tables are used for compression and reconstruction. A scaling factor is used to obtain the desired compression level: the scaling factors that follow the 2-D DCT are multiplied by the quantization values, and the products are stored in ROM [9].
The quantization operation is given by

$$C_{q,ij} = \operatorname{round}\!\left(C_{ij} \times \frac{1}{Q_{ij} \times F_{c,ij}}\right), \qquad 0 \le i, j \le 7$$

where
$C_{q,ij}$: quantized coefficient
$C_{ij}$: coefficient of the 2-D DCT
$Q_{ij}$: quantization constant
$F_{c,ij}$: scaling factor
The quantization values for luminance ($Q_{Y,ij}$) and chrominance ($Q_{C,ij}$) and the scaling factors ($F_{c,ij}$) are given below:

$$Q_{Y} = \begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61\\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55\\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56\\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62\\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77\\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92\\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101\\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}$$

$$Q_{C} = \begin{bmatrix}
17 & 18 & 24 & 47 & 99 & 99 & 99 & 99\\
18 & 21 & 26 & 66 & 99 & 99 & 99 & 99\\
24 & 26 & 56 & 99 & 99 & 99 & 99 & 99\\
47 & 66 & 99 & 99 & 99 & 99 & 99 & 99\\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99\\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99\\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99\\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99
\end{bmatrix}$$

$$F_{c} = \begin{bmatrix}
8.00 & 11.10 & 10.45 & 9.41 & 8.00 & 6.29 & 4.33 & 2.21\\
11.10 & 15.39 & 14.50 & 13.05 & 11.10 & 8.72 & 6.01 & 3.07\\
10.45 & 14.50 & 13.66 & 12.29 & 10.45 & 8.21 & 5.66 & 2.88\\
9.41 & 13.05 & 12.29 & 11.06 & 9.41 & 7.39 & 5.09 & 2.60\\
8.00 & 11.10 & 10.45 & 9.41 & 8.00 & 6.29 & 4.33 & 2.21\\
6.29 & 8.72 & 8.21 & 7.39 & 6.29 & 4.94 & 3.40 & 1.73\\
4.33 & 6.01 & 5.66 & 5.09 & 4.33 & 3.40 & 2.34 & 1.20\\
2.21 & 3.07 & 2.88 & 2.60 & 2.21 & 1.73 & 1.20 & 0.\ldots
\end{bmatrix}$$
The basic operation of the quantizer is multiplying $C_{ij}$ by $1/(Q_{ij} \times F_{c,ij})$. Instead of storing the quantization matrix, the quantization architecture stores in ROM the controls of the four dislocated ones, i.e. the displacement that each of the four barrel shifters must apply. Each barrel shifter uses a three-bit control for its displacement, a total of 12 bits: BS1 uses the three most significant bits and BS4 the three least significant bits. The input to the quantizer is 15 bits and the output is 10 bits, so quantization reduces the number of bits that represent the data. The quantization operation is carried out in a three-stage pipeline. In the first clock cycle the quantizer takes the input, the barrel shifters BS1, BS2, BS3 and BS4 apply the corresponding displacements, and the shifted inputs are added in adders A and B; in the next clock cycle adder C adds the results of adders A and B; and in the following clock cycle the output is available.
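The shift-and-add idea can be sketched in Python: the reciprocal 1/(Q × Fc) is approximated by a sum of at most four powers of two, and the product is then formed from right-shifted copies of the input, in the way the four barrel shifters and adders A, B and C combine their results. The greedy choice of shifts is an assumption for illustration; it ignores the per-shifter ranges of Table 3.1, and because each shifted term is truncated, the result can fall slightly below the exact quotient. The function names are hypothetical.

```python
def shift_controls(q_times_f, max_terms=4, max_shift=15):
    """Greedily pick up to four right-shift amounts s_k with sum(2**-s_k) <= 1/(Q*Fc).

    Hypothetical helper: the ROM described in the text stores 3-bit controls per
    shifter (Table 3.1); this sketch works with the shift amounts directly.
    """
    remainder = 1.0 / q_times_f
    shifts = []
    for _ in range(max_terms):
        s = next((s for s in range(max_shift + 1) if 2.0 ** -s <= remainder), None)
        if s is None:          # remaining error is below the smallest available term
            break
        shifts.append(s)
        remainder -= 2.0 ** -s
    return shifts

def shift_add_quantize(c, shifts):
    """Approximate c / (Q*Fc) as the sum of right-shifted copies of c (BS1..BS4 -> adders)."""
    return sum(c >> s for s in shifts)   # >> on negative ints floors, like an arithmetic shift

shifts = shift_controls(11 * 11.10)      # Q = 11, Fc = 11.10 (position (0,1) of the tables)
print(shifts, shift_add_quantize(16000, shifts), round(16000 / (11 * 11.10), 2))
```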
Table 3.1 Barrel Shifter Controls (displacement selected by each 3-bit control; X = don't care)
Control   BS1   BS2    BS3    BS4
000       6     zero   zero   zero
001       7     9      10     11
010       8     10     11     12
011       9     11     12     13
100       10    12     13     14
101       11    13     14     15
110       X     14     15     15
111       X     15     15     15
In the quantizer architecture the control words are stored column by column, so the quantizer inputs (the outputs of the DCT) must also be read column by column.
3.2.4 Zig Zag Scanning
Quantized DCT coefficients have zeros in the high-frequency region of the image block, i.e. the bottom-right corner of the block. To group as many zeros as possible, the block is scanned in such a way that the zeros accumulate at the end. Zigzag scanning is therefore very useful for further processing of the image; it maps the 2-D 8X8 block to a 1X64 vector of coefficients.
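The scan order itself can be generated rather than stored. The Python sketch below sorts the 64 positions by anti-diagonal and alternates the direction on each diagonal, which reproduces the standard JPEG zigzag sequence; the function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG zigzag order.

    Positions are visited anti-diagonal by anti-diagonal (row + col = d);
    odd diagonals run down-left (row increasing), even diagonals run up-right.
    """
    cells = [(i, j) for i in range(n) for j in range(n)]
    return sorted(cells, key=lambda ij: (ij[0] + ij[1],
                                         ij[0] if (ij[0] + ij[1]) % 2 else -ij[0]))

def zigzag_scan(block):
    """Map a 2-D n x n block to a 1-D list of n*n coefficients."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

# Label each position by its raster index to show the visiting order.
raster = [[8 * i + j for j in range(8)] for i in range(8)]
print(zigzag_scan(raster)[:10])   # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```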
Figure 3.5 Zig Zag Scanning
3.2.5 Entropy Encoder
Figure 3.6 Entropy Encoder [7] (DC path: size calculation, DC Huffman coder and VLC coder producing the DC code, size | amplitude; AC path: RLE coder, size calculation, AC Huffman coder with AC Huffman tables and VLC coder producing the AC code, run/size | amplitude)
The last stage of JPEG compression is entropy encoding. This block improves overall compression efficiency by performing lossless coding on the quantized DCT coefficients. The entropy encoder receives 10-bit inputs after quantization and produces the compressed JPEG output as 32-bit words, asynchronously. The input of the entropy encoder architecture is synchronous and its output is asynchronous. This is mainly due to two reasons: first, the output of the Run Length Encoder (RLE) is asynchronous, and this propagates through the architecture; second, the Huffman codes have different lengths.
After quantization, the resulting matrix has a large number of zeros, which are read in zigzag order to lengthen the runs of zeros. The entropy encoder uses differential coding, Run Length Encoding (RLE), Variable Length Coding (VLC) and Huffman encoding to reduce the number of bits used to represent the image after JPEG compression [6,9,10]. Color and grayscale images follow the same steps for entropy coding, but the differential coder and Huffman coder are different for color and grayscale images. In entropy coding, the DC and AC components are processed separately. The component at position (0, 0) (first line, first column) of the 8X8 matrix is called the DC component and the remaining 63 components are the AC components. The first operation is the differential coder for the DC components and the Run Length Coder (RLE) for the AC components.
The DC components of successive 8X8 windows in an image are highly correlated, so differential encoding transmits only the difference between the actual DC component and the DC component of the previous matrix. The differential code is coded by the VLC coder, and it is also used by the size calculator to compute the number of significant bits generated by the VLC coder. Thus the VLC encoder produces the amplitude and the size calculator produces the size, which is given as input to the Huffman coder. The Huffman coder uses Huffman tables stored in ROMs (one for luminance and one for chrominance DC components) to produce its outputs. The values generated by the Huffman coder and the VLC coder are concatenated to obtain the JPEG DC code [7].
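The DC path can be sketched in Python for a single luminance block. The Huffman codes are the standard luminance DC table from Annex K of the JPEG specification; the helper names are hypothetical, and strings of '0'/'1' stand in for the hardware registers.

```python
# Luminance DC Huffman codes from the JPEG standard (Annex K, Table K.3), indexed by size 0..11.
DC_LUMA_CODES = ["00", "010", "011", "100", "101", "110",
                 "1110", "11110", "111110", "1111110", "11111110", "111111110"]

def size_of(value):
    """Size category: number of significant bits of |value| (0 for value == 0)."""
    return abs(value).bit_length()

def amplitude_bits(value, size):
    """VLC amplitude: `size` low bits; a negative value is sent as the one's complement of |value|."""
    if size == 0:
        return ""
    if value < 0:
        value += (1 << size) - 1
    return format(value, "0{}b".format(size))

def encode_dc(current_dc, previous_dc):
    """Differential coder + size calculator + Huffman lookup + concatenation."""
    diff = current_dc - previous_dc
    size = size_of(diff)
    return DC_LUMA_CODES[size] + amplitude_bits(diff, size)

print(encode_dc(45, 40))   # diff 5, size 3: "100" + "101" -> 100101
```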
To process the AC components, the first step is counting the number of zeros before each non-zero coefficient, which is done by the Run Length Encoder (RLE). The RLE compresses the input stream by representing consecutive zeros by their run length. It counts zeros until a non-zero value arrives or the maximum zero count is reached, so its output is a [Run Length, Amplitude] pair. The non-zero values are passed through the VLC encoder to obtain the amplitude, and are also given to the size calculator to calculate the size. Run and Size are concatenated and Huffman coded; the Huffman encoder takes its codes from predefined Huffman tables stored in the FPGA's internal ROMs. The VLC amplitude, Coefficient Size, Huffman Code and Huffman Size are given to the pre-assembler, which concatenates the Huffman code and the VLC amplitude to form a variable-length code that is passed to the assembler for further processing. The number of significant bits in the pre-assembler output is the sum of the Coefficient Size and the Huffman Size. In the assembler stage, these codes are assembled into 32-bit words to output the compressed JPEG bit stream.
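The last two steps, forming the Run/Size symbol and packing codes into 32-bit words, can be sketched as follows. The class and function names are hypothetical; the sketch pads the final word with 1-bits (an assumption modeled on JPEG's padding of the last byte with 1-bits) and omits the 0xFF byte stuffing that a complete JPEG stream also requires.

```python
def ac_symbol(run, size):
    """Concatenate the 4-bit Run and 4-bit Size into the 8-bit symbol that is Huffman coded."""
    return (run << 4) | size

class BitAssembler:
    """Pack variable-length codes (strings of '0'/'1') into 32-bit words, MSB first."""

    def __init__(self):
        self.buffer = 0
        self.nbits = 0
        self.words = []

    def put(self, code):
        for bit in code:
            self.buffer = (self.buffer << 1) | int(bit)
            self.nbits += 1
            if self.nbits == 32:            # a full word is ready
                self.words.append(self.buffer)
                self.buffer, self.nbits = 0, 0

    def flush(self):
        """Pad the last partial word with 1-bits (assumption) and return all words."""
        if self.nbits:
            pad = 32 - self.nbits
            self.words.append((self.buffer << pad) | ((1 << pad) - 1))
            self.buffer, self.nbits = 0, 0
        return self.words

asm = BitAssembler()
asm.put("100101")                 # e.g. a DC code: Huffman code "100" + amplitude "101"
print([hex(w) for w in asm.flush()], hex(ac_symbol(15, 0)))   # ['0x97ffffff'] 0xf0
```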
The pipelined architecture of the entropy encoder is given below.
Figure 3.7 Pipelined Architecture for Entropy coder [20] (differential coder, RLE coder, size calculation, VLC coder, Huffman coder and assembler, separated by pipeline registers such as Run, AmpACDC and Coefficient; output: JPEG)
In the pipelined architecture, intermediate registers are used to synchronize the different operations.
3.2.5.1 Differential Coder
Differential coding is the first operation in entropy encoding and is used only for DC components. The differential coder performs a simple subtraction between the DC component of the current matrix and the DC component of the previous matrix of the same color component. The result is called Amplitude DC.
Figure 3.8 Differential Coder [20]
The differential coder for color images is presented in Figure 3.8. It consists of three 10-bit registers that store the previous DC coefficient of each color matrix (Y, Cb, Cr) and one 10-bit adder that performs the subtraction. A register is written when the ACDC signal is active high, which indicates that the matrix value is DC; the YCbCr signal selects which register is written. The YCbCr signal also drives a multiplexer that selects the adder input from the luminance or chrominance registers. When the rst signal is low, the DC coefficient input is written into the register selected by ACDC and YCbCr, and the same input is given to the adder. The 10-bit adder performs the subtraction and produces the output, which is processed by the VLC encoder in the next clock cycle [20].
Table 3.2 Selection of Component
YCbCr   ACDC   Y register   Cb register   Cr register
00      0      Yes          No            No
01      0      No           Yes           No
10      0      No           No            Yes
X       1      No           No            No
The YCbCr signal controls the multiplexer that provides the input to the subtractor.
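A behavioral Python model of the differential coder keeps one previous-DC register per color component and updates only the register selected by the YCbCr input. It assumes the registers start at zero (the JPEG DC predictor is initialized to 0 at the start of a scan) and does not model the 10-bit word width; the names are hypothetical.

```python
class DifferentialCoder:
    """Previous-DC registers for Y, Cb and Cr plus one subtractor (cf. Fig. 3.8, Table 3.2)."""

    def __init__(self):
        # Assumption: registers reset to 0, as the JPEG DC predictor starts at 0 in each scan.
        self.prev = {"Y": 0, "Cb": 0, "Cr": 0}

    def code(self, component, dc):
        """Return Amplitude DC for this block and update the selected register."""
        amplitude_dc = dc - self.prev[component]   # the subtraction
        self.prev[component] = dc                  # write only the register selected by YCbCr
        return amplitude_dc

coder = DifferentialCoder()
blocks = [("Y", 50), ("Cb", 10), ("Y", 55), ("Cr", -3), ("Cb", 8)]
print([coder.code(c, dc) for c, dc in blocks])   # [50, 10, 5, -3, -2]
```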
3.2.5.2 Run Length Encoder
The Run Length Encoder (RLE) counts the number of zeros in the AC components. The RLE is the same for gray and color images. Its architecture is presented below.
Figure 3.9 Run Length Encoder [20]
The AmpACDC, Run and OKrle registers shown in the architecture are the same registers used for pipelining in the global entropy coder architecture. The output of the RLE coder is asynchronous, whereas its input is synchronous. While the RLE is counting zeros, the OKrle signal goes low and there is no valid output. When a non-zero input occurs, the RLE coder stops counting zeros and updates its outputs with a new pair of Run and AC amplitude; the OKrle flag indicates when new valid outputs are available. The JPEG standard imposes two restrictions on the RLE coder [20].
The first restriction is that the maximum value of Run is 15, so the zero counter has 4 bits. When a sequence contains more than 15 zeros, the zero counter is restarted and the output is sent as a 15/0 Run/Amplitude pair, which indicates 15 consecutive zeros followed by a zero (16 zeros in total). The counter in the RLE coder enforces this restriction: when it reaches 15 zeros followed by a zero, it automatically outputs 15 for Run and zero for AmpACDC. The second restriction concerns the last input of the block, which is marked by the Last signal: if that value is zero, the output register is reset, forming the pair 0/0 (end of block). In normal operation, when neither restriction applies, the counter counts the zeros, and while it is counting the OKrle signal is low, indicating that the output is not ready. When a non-zero input occurs, the OKrle signal goes high to indicate a valid output, the ACC signal gives the number of zeros to the Run register, and the amplitude is sent to the AmpACDC register [9].
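The two RLE restrictions can be expressed in a short Python sketch. It follows the usual JPEG convention, in which a 15/0 pair is emitted only when a non-zero coefficient follows a run of more than 15 zeros, and trailing zeros collapse into a single 0/0 pair; the hardware counter described above may behave differently for long trailing runs. The function name and the test block are hypothetical.

```python
def run_length_encode(ac):
    """(Run, Amplitude) pairs for the 63 zigzag-ordered AC coefficients of one block.

    (15, 0) stands for a run of 16 zeros; (0, 0) marks the end of the block when the
    remaining coefficients are all zero.
    """
    pairs = []
    run = 0
    for value in ac:
        if value == 0:
            run += 1
            continue
        while run > 15:                 # Run is limited to 4 bits
            pairs.append((15, 0))
            run -= 16
        pairs.append((run, value))
        run = 0
    if run > 0:                         # trailing zeros: emit end-of-block
        pairs.append((0, 0))
    return pairs

ac = [5, 0, 0, -3] + [0] * 16 + [7] + [0] * 42    # hypothetical block, 63 values
print(run_length_encode(ac))   # [(0, 5), (2, -3), (15, 0), (0, 7), (0, 0)]
```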
The differential coder and RLE coder must operate in perfect synchronism so that their outputs can be used by the other components of the entropy coder at the same pipeline stages. The DC and AC amplitudes generated by the differential and RLE coders pass through a multiplexer controlled by the ACDC signal, which selects the correct output for the rest of the architecture.
3.2.5.3 Size Calculator
The DC and AC amplitudes are applied to the size calculator, which determines the number of significant bits of the AmpACDC value. The size is calculated using the table proposed by the JPEG standard [6].
The size calculation table is given below.
Table 3.3 Size Calculation table
Value                            Size
0                                0 (0000)
-1, 1                            1 (0001)
-3, -2, 2, 3                     2 (0010)
-7 ... -4, 4 ... 7               3 (0011)
-15 ... -8, 8 ... 15             4 (0100)
-31 ... -16, 16 ... 31           5 (0101)
-63 ... -32, 32 ... 63           6 (0110)
-127 ... -64, 64 ... 127         7 (0111)
-255 ... -128, 128 ... 255       8 (1000)
-511 ... -256, 256 ... 511       9 (1001)
-1023 ... -512, 512 ... 1023     10 (1010)
From this table, a 4-bit coefficient size is generated, which controls the VLC coder architecture and is also given as input to the Huffman coder and pre-assembler architectures. The amplitude from the differential coder or the Run Length Encoder is the input to the size calculator, which determines the coefficient size.
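Table 3.3 can be modeled in Python as a chain of magnitude comparisons, which is also one way a hardware size calculator could be built; for the values in the table the result equals the bit length of |value|. This is a sketch with hypothetical names, not the HDL of this thesis.

```python
# Largest |value| in each size category of Table 3.3 (sizes 0..10).
SIZE_LIMITS = [0, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023]

def coefficient_size(value):
    """Return the Table 3.3 size of an amplitude by comparing |value| with each range limit."""
    magnitude = abs(value)
    for size, limit in enumerate(SIZE_LIMITS):
        if magnitude <= limit:
            return size
    raise ValueError("amplitude outside the range covered by Table 3.3")

for v in (0, -1, 3, -4, 100, -512):
    print(v, format(coefficient_size(v), "04b"))   # 4-bit coefficient size
```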
3.2.5.4 Variable Length Coder
The Variable Length Coder (VLC) identifies which of the 10 bits of AmpACDC are significant, in order to discard the non-significant bits, including the sign bit. Negative numbers must be represented in one's complement to be VLC coded, and the input of the VLC coder has a controller that discards the sign bit. The sign interpretation is also inverted: a code that starts with zero is negative and a code that starts with one is positive.
The number o f shifts to left to each Coefficient Size value is given as.
Table 3.4 VLC Architecture Shifts
Coefficient Size    Number of shifts to left
0                   10
1                   9
2                   8
3                   7
4                   6
5                   5
6                   4
7                   3
8                   2
9                   1
The VLC encoder uses a barrel shifter controlled by the Coefficient Size computed by the Size calculator. The barrel shifter shifts the AmpACDC value to the left so that its first significant bit becomes the most significant bit of the word, discarding the sign bit. The result is called the VLC Amplitude. The output of the VLC coder is 9 bits wide and is therefore not yet really variable length; the assembly stages that follow discard the insignificant bits and generate the variable length codes.
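A Python sketch of the shift described by Table 3.4, assuming a 10-bit AmpACDC word and one's-complement negatives as stated above. It returns the 10-bit shifted word, which is the form used in the simulation examples of Section 4.1.8. Because the sizes in Table 3.4 stop at 9, the least significant bit of that word is always 0, and dropping it presumably gives the 9-bit VLC Amplitude mentioned in the text. The name vlc_amplitude is hypothetical.

```python
WORD_BITS = 10  # AmpACDC width assumed from the text


def vlc_amplitude(amp, size):
    """Left-justify the `size` significant bits of a signed amplitude.

    Negative values are first put in one's complement, so their significant
    bits begin with 0. The barrel shift of Table 3.4 (10 - size places)
    pushes the sign bit and the other leading bits out of the 10-bit word.
    """
    mask = (1 << WORD_BITS) - 1
    word = amp & mask if amp >= 0 else ~(-amp) & mask  # one's complement for negatives
    return (word << (WORD_BITS - size)) & mask
```

For the first example of Section 4.1.8, vlc_amplitude(0b0000000011, 0b0010) gives 0b1100000000. For a negative input such as -2 with size 2, the top two bits become 01, which is the JPEG bit pattern for -2.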
3.2.5.5 Huffman Encoder:
The Coefficient Size (for DC coefficients) and the concatenation of Coefficient Size and Run (for AC coefficients) are Huffman coded. The architecture presented below uses the static Huffman tables proposed by the JPEG 92 standard. Huffman coding achieves compression by assigning short code words to input symbols of high probability and long code words to symbols of low probability; for a given source probability distribution it produces an optimal prefix code [7]. Using the standard tables simplifies the hardware but lowers the compression ratio [7]. The Huffman coder architecture designed for color images is shown in Figure 3.10.
[Figure: Huffman tables in ROM1 to ROM4 (DC/AC, luminance/chrominance), selected by YCbCr and ACDC; outputs Huffman Code and Huffman Size]
Figure 3.10 Huffman Coder Architecture [20]
The architecture presented above uses four ROM memories to store the Huffman tables used to code color images: one for DC Luminance, one for AC Luminance, one for DC Chrominance and one for AC Chrominance components. The tables store both the Huffman code and the Huffman size. The size of each Huffman code could be computed by a size calculator, but since these sizes are known in advance they are stored directly in the static tables, which eliminates that delay. The output of each ROM is therefore the Huffman code followed by the Huffman size, and the values to be Huffman coded are used as addresses into these memories. The number of words and the bit widths used to represent the Huffman codes were optimized: the DC tables use 12 memory positions with 4 address bits (Size), and the AC tables use 176 memory positions with 8 address bits (Run & Size). The DC Luminance table uses 9 bits for Huffman codes and 4 bits for Huffman size; the DC Chrominance table uses 11 bits for Huffman codes and 4 bits for Huffman size; the AC Luminance and Chrominance tables use 16 bits for Huffman codes and 5 bits for Huffman size. Two multiplexers select which of the four memories is connected to the output: the YCbCr signal selects Luminance (Y) or Chrominance (CbCr), and the ACDC signal selects the DC or AC component. The Huffman code and Huffman Size are then applied to the Pre-assembler for further processing.
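The contents of the two DC ROMs can be reproduced from the DC luminance and chrominance tables of ITU-T T.81 Annex K [6], using the canonical code assignment of Annex C. The Python sketch below does that and returns the code left-aligned in 16 bits, the form seen in the simulation outputs of Section 4.1.9 (C000, E000). It is an illustration, not the ROM initialisation used in the VHDL: the names build_rom and huffman_dc are hypothetical, any non-zero YCbCr value is treated as chrominance, and the AC tables (ROM3, ROM4) would be built the same way from their own, much longer lists, which are not reproduced here.

```python
# Number of codes of each length 1..16 (BITS) for the DC tables of
# ITU-T T.81 Annex K; the symbols are the size categories 0..11.
DC_LUMA_BITS = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
DC_CHROMA_BITS = [0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
DC_SYMBOLS = list(range(12))


def build_rom(bits, symbols):
    """Assign canonical Huffman codes: symbol -> (code, code length)."""
    rom = {}
    code = 0
    k = 0
    for length in range(1, 17):
        for _ in range(bits[length - 1]):
            rom[symbols[k]] = (code, length)
            code += 1
            k += 1
        code <<= 1  # codes of the next length continue one bit longer
    return rom


ROM1 = build_rom(DC_LUMA_BITS, DC_SYMBOLS)    # DC luminance
ROM2 = build_rom(DC_CHROMA_BITS, DC_SYMBOLS)  # DC chrominance


def huffman_dc(size, ycbcr):
    """Return (Huffman code left-aligned in 16 bits, Huffman size) for a DC size."""
    code, length = (ROM1 if ycbcr == 0 else ROM2)[size]
    return code << (16 - length), length
```

The longest DC codes come out at 9 bits for luminance and 11 bits for chrominance, which agrees with the ROM widths given above.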
3.2.5.6 Preassembler
The Pre-assembler architecture receives four inputs generated by the previously explained blocks: the VLC Amplitude from the VLC coder, the Coefficient Size from the Size Calculator, and the Huffman Size and Huffman code from the Huffman Coder. It generates two outputs, Amplitude and Size, which are used by the Assembler architecture.
[Figure: Pre-assembler with barrel shifter BSA; inputs Huffman Size, Coefficient Size and zero padding ('0's)]
Figure 3.11 Preassembler Architecture [20]
The VLC Amplitude bits are shifted to the right by a barrel shifter (BSA) controlled by the Huffman Size. The shifted bits are then combined with the Huffman code by a logical OR. The Huffman code is padded on the right with zeros, which act as a mask when it is ORed with the shifted VLC amplitude, so the OR preserves only the significant bits and makes the code variable length. Assembling the Huffman code and the VLC code produces the 25-bit Pre-assembler output Amplitude. The sum of the Huffman Size and the Coefficient Size gives the number of significant bits in the Amplitude output, which is provided as the Size output.
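The shift-and-OR step can be written in a few lines of Python. The widths are assumptions drawn from this chapter: a 16-bit Huffman code word, left-aligned and zero-padded; a 9-bit left-aligned VLC Amplitude; and a 25-bit Amplitude output. The name preassemble is hypothetical.

```python
def preassemble(huff_code, huff_size, vlc_amp, coef_size):
    """Combine a Huffman code and a VLC amplitude into one 25-bit word.

    huff_code: 16-bit word, code left-aligned and zero-padded on the right.
    vlc_amp:   9-bit word, significant bits left-aligned (VLC coder output).
    Returns (amplitude, size): the 25-bit word and its number of significant bits.
    """
    upper = huff_code << 9                 # Huffman code occupies bits 24..9
    lower = (vlc_amp << 16) >> huff_size   # BSA: VLC bits start right after the code
    amplitude = upper | lower              # the zero padding acts as the OR mask
    return amplitude, huff_size + coef_size
```

With the values used in Section 4.1.10 (huff_code = FC00, huff_size = 6, VLCamp = 150, coef_size = 5), it returns amplitude 1FD4000 and size 11 (B).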
3.2.5.7 Assembler
The final assembly of JPEG words is carried out by the Assembler architecture, considering only the significant bits of the Amplitude input. The Size input generated by the Pre-assembler indicates how many of the 25 Amplitude bits are significant.
[Figure: Assembler with barrel shifter BSB; outputs JPEG word and OK]
Figure 3.12 Assembler Architecture [20]
The Assembler architecture consists of one barrel shifter (BSB), controlled by the accumulated Size values, and a logical OR that assembles the significant bits of successive inputs. The assembly of the words is controlled by an adder that accumulates the sizes of the incoming Amplitudes and stores the result in the ACC register.
The Assembler uses two 32-bit registers to assemble JPEG words. The High register stores the 32 most significant bits of the barrel shifter (BSB) output. When it has collected 32 bits, it sends them out as the JPEG word and the OK register signals that the output is valid. The Low register stores the overflow bits when the barrel shifter (BSB) output has more than 32 significant bits; these overflow bits are moved back to the High register when a new JPEG word starts to be assembled. The maximum size of the Amplitude input is 25 bits and the largest displacement of the barrel shifter is 31 bits, so the barrel shifter output must be 56 bits wide. Of these 56 bits, the 32 most significant bits are used in the OR operation whose result is stored in the High register, and the remaining 24 bits are stored in the Low register. The OK register indicates that new valid data is ready. The OK signal also controls the multiplexer that selects the input of the OR operation: if a new JPEG word is ready, it takes the value from the Low register; otherwise it feeds back the High register. The assembler is enabled only when the RLE coder generates valid outputs. This is controlled by the Okrlel signal, which is the RLE coder's valid-output signal delayed by one clock cycle. In this way the effect of the asynchronous results produced by the RLE coder is eliminated [20].
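The packing performed by the High and Low registers can be checked against a bit-level Python model. This is a sketch, not a register-transfer description: it keeps pending bits in one unbounded integer instead of modelling the High/Low register pair and the barrel shifter BSB, and it assumes the widths stated above (25-bit Amplitude, 32-bit words). The name assemble is hypothetical.

```python
AMP_BITS = 25   # width of the Pre-assembler Amplitude output
WORD_BITS = 32  # width of one JPEG word (High register)


def assemble(pairs):
    """Pack (size, amplitude) pairs into 32-bit JPEG words.

    Only the `size` most significant bits of each 25-bit amplitude are kept.
    Returns (words, pending, count): the completed words, plus the bits still
    waiting for the next word (the role of the Low register) and their number.
    """
    words = []
    pending = 0  # bits not yet emitted, oldest bit most significant
    count = 0    # number of valid bits in `pending` (cf. the ACC register)
    for size, amplitude in pairs:
        significant = amplitude >> (AMP_BITS - size)
        pending = (pending << size) | significant
        count += size
        while count >= WORD_BITS:           # OK: a full word is ready
            count -= WORD_BITS
            words.append(pending >> count)  # the 32 oldest bits
            pending &= (1 << count) - 1     # keep only the overflow bits
    return words, pending, count
```

For example, the (size, amplitude) pairs (07, 19C0000), (05, 1E00000), (0D, 1E5D000), (0F, 175D400), (04, 1000000) and (04, 1600000) produce the word CFEF2EDD and leave 16 bits (758B) pending for the next word.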
CHAPTER-4
RESULTS AND DISCUSSIONS
4.1 Simulation Waveforms
4.1.1 Color Conversion
[Simulation waveforms: Clock, ClockEnable, inputs R, G, B and outputs Y, Cb, Cr]
Fig 4.1 Simulation of Color Conversion
The inputs of the color conversion module are R, G, B, clk and clken, and its outputs are Y, Cb and Cr.
For the inputs R = FF, G = FF, B = FF, the outputs are Y = EB, Cb = 80 and Cr = 80.
The latency of the color conversion module is five clock cycles.
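The result reported for a white input (Y = EB, Cb = Cr = 80) is what the ITU-R BT.601 studio-range equations give, so a floating-point reference model along those lines may be useful for checking other simulation values. This is an assumption: the module works in fixed point with its own coefficients, so its output may differ from this sketch in the last bit for some inputs. The name rgb_to_ycbcr is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """8-bit R, G, B -> 8-bit Y, Cb, Cr with ITU-R BT.601 studio-range equations.

    Floating-point reference only; results are rounded and clipped to 0..255.
    """
    y = 16 + (65.738 * r + 129.057 * g + 25.064 * b) / 256
    cb = 128 + (-37.945 * r - 74.494 * g + 112.439 * b) / 256
    cr = 128 + (112.439 * r - 94.154 * g - 18.285 * b) / 256
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))
```

rgb_to_ycbcr(0xFF, 0xFF, 0xFF) returns (0xEB, 0x80, 0x80), the values reported above.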
4.1.2 Discrete Cosine Transform (DCT)
[Simulation waveforms: RST, CLK, xin, z_out, rdy_out, dct_2d]
Fig 4.2 Simulation of DCT
The DCT input matrix is:
Xin =
FF F8 E8 E9 A9 A5 C5 DD
DD DD DD DD DD DD DD DD
DD DD DD DD DD DD DD DD
DD DD DD DD DD DD DD DD
DD DD DD DD DD DD DD DD
DD DD DD DD DD DD DD DD
DD DD DD DD DD DD DD DD
DD DD DD DD DD DD DD DD
The 2-D DCT coefficients are obtained by computing the 1-D DCT first on the rows and then on the columns.
The first 1-D DCT output appears on the 14th clock cycle, and these outputs are stored in a transpose buffer. The second 1-D DCT is applied to the outputs of the transpose buffer.
The 2-D DCT coefficients start on the 95th clock cycle: the DC coefficient is 1746, and one AC coefficient follows on every subsequent clock cycle.
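The row-column decomposition described above can be illustrated with a floating-point Python sketch using the orthonormal DCT-II. The hardware works in fixed point with its own scaling, so its output values (such as the DC value 1746 above) may differ numerically from this model; the sketch only shows that two 1-D passes with a transpose in between give the 2-D DCT. The names dct_1d, transpose and dct_2d are hypothetical.

```python
import math

N = 8  # block size


def dct_1d(x):
    """Orthonormal N-point DCT-II of a sequence of length N."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                               for n in range(N)))
    return out


def transpose(m):
    """Swap rows and columns (the role of the transpose buffer)."""
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """Row-column 2-D DCT: rows first, then columns via the transpose."""
    row_pass = [dct_1d(row) for row in block]                # first 1-D DCT
    col_pass = [dct_1d(col) for col in transpose(row_pass)]  # second 1-D DCT
    return transpose(col_pass)  # indexed [vertical freq][horizontal freq]
```

Computing the DCT this way needs two 8-point transforms per row or column instead of one 64-term sum per coefficient, which is the motivation for the separable hardware structure.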
4.1.3 Quantizer
[Simulation waveforms: reset, clk, load, YCbCr, data, quant_in, quant_out, rdyquant, bs1_out to bs4_out, adder_a to adder_c, adder_inc]
Fig 4.3 Simulation of Quantizer
The quantizer divides each input DCT coefficient by a predefined number so that the data can be represented with fewer bits.
The quantizer inputs are YCbCr, load (the rdy signal from the DCT) and quant_in, and its outputs are quant_out and rdyquant.
YCbCr selects the predefined value from the luminance or chrominance component ROM.
For the inputs quant_in = 673F (110011100111111) and YCbCr = 0 (Luminance), the first predefined value is data = 200 (001000000000), and the output is quant_out = 033 (0000110011).
For the inputs quant_in = 19CE (0001100111001110) and YCbCr = 0 (Luminance), the next predefined value is data = 249 (001001001001), and the output is quant_out = 04D (0001001101).
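Dividing by an arbitrary quantizer step is expensive in hardware, so a common technique is to store a scaled reciprocal of each step and replace the division by a multiplication and a shift. The Python sketch below illustrates that technique only. The 16 fraction bits, the rounding of the stored reciprocal, the truncation toward zero and the names reciprocal and quantize are all assumptions, not the word widths or rounding of this module.

```python
FRAC_BITS = 16  # assumed fraction bits of the stored reciprocal


def reciprocal(q):
    """Table entry for step q: 2**FRAC_BITS / q rounded to the nearest integer."""
    return (2 ** FRAC_BITS + q // 2) // q


def quantize(coef, q):
    """Approximate coef / q, truncated toward zero, with a multiply and a shift."""
    magnitude = (abs(coef) * reciprocal(q)) >> FRAC_BITS
    return magnitude if coef >= 0 else -magnitude
```

For coefficients in the range -2048 to 2047 the result never differs from exact truncated division by more than one, and it is exact when q is a power of two.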
4.1.4 Zig Zag Scanning
[Simulation waveforms: clk, rdy_in, qdct_in, scan_mem, cnt64, toggle_mem, qdct_in_reg1, qdct_in_reg2, zigzag_out, rdy_out]
Fig 4.4 Simulation of Zig Zag Scanner
The inputs of the Zig Zag Scanner are clk, rdy_in and qdct_in, and its outputs are zigzag_out and rdy_out.
When the quantizer output is ready, rdy_in goes high and the Zig Zag Scanner starts loading the quantized DCT coefficients into ROM1; loading continues for 64 clock cycles. After these 64 clock cycles, the scanner outputs the coefficients in zigzag order by reading them from ROM1 at the addresses given by scan_mem.
For input addresses 00, 01, 02, 03 and so on, the zigzag output addresses are 00, 01, 08 and so on.
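The address sequence read through scan_mem follows the standard zigzag path along the anti-diagonals of the 8x8 block. A Python sketch that generates the same sequence (0, 1, 8, 16, 9, 2, ... in decimal) may make the pattern explicit. In hardware such a sequence is typically stored as a constant table rather than computed; the names zigzag_order and zigzag_scan are hypothetical.

```python
def zigzag_order(n=8):
    """Raster addresses (row * n + col) of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        for r in rows:
            order.append(r * n + (s - r))
    return order


def zigzag_scan(block):
    """Read a 64-entry raster-order block in zigzag order."""
    return [block[addr] for addr in zigzag_order()]
```

zigzag_order() starts 0, 1, 8, 16, 9, 2, 3, 10, 17, 24 and ends at 63, visiting each of the 64 addresses once.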
4.1.5 Differential coder
[Simulation waveforms: reset, clk, rdyquant1, ACDC, YCbCr, DC_comp, DC_amp, Y, Cb, Cr]
Fig 4.5 Simulation of Differential Coder
The Differential coder computes the difference between the current and the previous DC components.
Its inputs are clk, reset, YCbCr, ACDC, rdyquant1 and DC_comp, and its output is DC_amp.
When rdyquant1 is high and ACDC is low, the Differential coder starts processing.
If YCbCr is '00' the output is Y, if it is '01' the output is Cb, and if it is '10' the output is Cr.
The latency of the Differential coder is zero.
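JPEG predicts each component's DC value from the previous DC value of the same component, which is why the coder distinguishes Y, Cb and Cr. The Python sketch below keeps one predictor per component, selected by a YCbCr code as above. The reset value of 0 for each predictor and the function name differential_code are assumptions.

```python
def differential_code(dc_stream):
    """DC differences, one predictor per color component.

    dc_stream: (ycbcr, dc) pairs with ycbcr = 0 (Y), 1 (Cb) or 2 (Cr),
    i.e. the '00', '01', '10' encoding described above.
    """
    previous = [0, 0, 0]  # assumed reset value of each predictor
    diffs = []
    for ycbcr, dc in dc_stream:
        diffs.append(dc - previous[ycbcr])
        previous[ycbcr] = dc
    return diffs
```

For example, the stream (Y, 100), (Cb, 20), (Cr, -5), (Y, 110), (Cb, 20), (Cr, 0) gives the differences 100, 20, -5, 10, 0, 5.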
4.1.6 Run Length Encoder
[Simulation waveforms: reset, clk, last, ACDC, AC_comp, ACC, ampACDC, Okrlein, Run]
Fig 4.6 Simulation of Run Length Encoder
The RLE coder counts the number of zeros preceding each non-zero input.
The RLE coder is active when the ACDC signal is high.
Here the input sequence is AC_comp = 199, 0, 0, 0, 55, 0, 87, 23, 8F, 0, 3.
The output (Runin, ampACDCin) is [(0, 199), (3, 55), (1, 87), (0, 23), (0, 8F), (1, 3)].
Okrlein goes high when a non-zero output occurs; when Okrlein is high, the outputs are valid.
The latency of the RLE coder depends on the occurrence of non-zero inputs.
4.1.7 Size Calculator
[Simulation waveforms: clk, ampACDC, size]
Fig 4.7 Simulation of Size Calculator
The Size Calculator computes the size of the input amplitude, i.e. the number of significant bits in the signal.
Its input is ampACDC from the Differential coder or the RLE coder.
For input 01D (0000011101) the output size is 5;
03B (0000111011) gives size 6;
007 (0000000111) gives size 3;
0FF (0011111111) gives size 8;
000 (0000000000) gives size 0.
The latency of the Size Calculator is one clock cycle.
4.1.8 VLC Coder
[Simulation waveforms: clk, coef_size, ampACDC, VLCamp]
Fig 4.8 Simulation of VLC Coder
The VLC coder identifies the significant bits and shifts them to the most significant bit positions. Its inputs are coef_size and ampACDC: coef_size comes from the Size calculator and gives the number of significant bits, and ampACDC comes from the Differential coder or the RLE coder.
For (coef_size, ampACDC) = (0010, 0000000011) the output is 1100000000;
for (0101, 0000010011) the output is 1001100000;
for (1001, 0101001101) the output is 1010011010.
The latency of the VLC coder is zero.
4.1.9 Huffman Coder
[Simulation waveforms: reset, clk, read_en, YCbCr1, ACDC, size, run, huff_code, huff_size, ROM1_data to ROM4_data]
Fig 4.9 Simulation of Huffman Coder for DC Components
For DC components
The Huffman coder inputs are size (from the Size calculator) and run (from the RLE coder), and its outputs are huff_code (Huffman code) and huff_size (Huffman size).
For DC components (ACDC = 0) the output is taken from ROM1 or ROM2; the YCbCr signal selects which of the two.
For the inputs size = 5 (0101), run = 0 (0000), ACDC = 0, YCbCr = 0, the outputs are Huffman code = C000 (1100000000000000) and Huffman size = 03 (00011).
For the inputs size = 6 (0110), run = 0 (0000), ACDC = 0, YCbCr = 0, the outputs are Huffman code = E000 (1110000000000000) and Huffman size = 04 (00100).
[Simulation waveforms: reset, clk, read_en, YCbCr1, ACDC, size, run, huff_code, huff_size, ROM1_data to ROM4_data]
Fig 4.10 Simulation of Huffman Coder for AC Components
For AC components (ACDC = 1)
For AC components the outputs are taken from ROM3 or ROM4; the YCbCr signal selects which of the two.
For the inputs size = 6 (0110), run = 5 (0101), ACDC = 1, YCbCr = 1, the output is taken from ROM4: Huffman code = FFA2 (1111111110100010) and Huffman size = 10 (10000).
For the inputs size = A (1010), run = 4 (0100), ACDC = 1, YCbCr = 1, the outputs are Huffman code = FF9E (1111111110011110) and Huffman size = 10 (10000).
The latency of the Huffman coder is six clock cycles.
4.1.10 Preassembler
[Simulation waveforms: VLCamp, huff_size, huff_code, intermediate shift values, bs_in, bs_out, size, amplitude]
Fig 4.11 Simulation of Preassembler
The Pre-assembler concatenates the significant bits of the Huffman code and the VLC amplitude.
Its inputs are coef_size (from the Size calculator), VLCamp (from the VLC coder), and huff_size and huff_code (from the Huffman coder).
For the inputs coef_size = 5 (0101), VLCamp = 150 (101010000), huff_size = 6 (00110) and huff_code = FC00 (1111110000000000),
the outputs are size = B (01011) and amplitude = 1FD4000 (1111111010100000000000000).
4.1.11 Assembler
[Simulation waveforms: reset, clk, Okrlel, load, size, amplitude, low, high, JPEG word, OK, ACC_in, OK_in, cout, bs_out]
Fig 4.12 Simulation of Assembler
The Assembler packs the amplitude inputs into 32-bit words and outputs them as the JPEG bit stream.
The inputs to the Assembler are size and amplitude (from the Pre-assembler) and Okrlel (from the RLE coder); its outputs are JPEG word and OK.
For the inputs (size, amplitude) = (07, 19C0000), (05, 1E00000), (0D, 1E5D000), (0F, 175D400), (04, 1000000), (04, 1600000), the output JPEG word is CFEF2EDD.
ACC_in accumulates all the sizes; when it reaches 32 bits or more, the OK signal goes high and the JPEG word is a valid output.
4.1.12 JPEG Encoder
[Simulation waveforms of the complete encoder: clock and control signals (clk, clken, YCbCr, ACDC, last, read_en, ACDC1, load), RGB inputs, YCbCr outputs, and the intermediate signals of every stage up to JPEG word and OK]
Fig 4.13 Simulation of JPEG Encoder
The encoder module reads its inputs from a text file, and these inputs are applied to the Color Conversion module. At the end of the 6th clock cycle the Color Conversion module produces its output, which is applied to the DCT module. After the 2-D DCT operation, the output is available at the end of the 100th clock cycle and is applied to the Quantizer module. The quantized DCT coefficients are stored in a ROM memory of the Zig Zag coder. After all 64 coefficients are stored, the Zig Zag scanner sends its output in zigzag order (172nd clock cycle), and these outputs are applied to the Differential Coder, Run Length Coder, Size Calculator, VLC Coder, Huffman Coder, Preassembler and Assembler modules. The JPEG word is obtained at the end of the 192nd clock cycle.
4.2 Synthesis Report
Fig 4.14 RTL Schematic of JPEG Encoder
Design Summary
Synthesis Report for Xilinx Virtex 4 FPGA
Device Utilization Summary
Logic Utilization                                  Used     Available   Utilization
Total Number Slice Registers                       2,692    10,944      24%
  Number used as Flip Flops                        2,668
  Number used as Latches                           24
Number of 4 input LUTs                             2,592    10,944      23%
Logic Distribution
Number of occupied Slices                          2,346    5,472       42%
  Number of Slices containing only related logic   2,346    2,346       100%
  Number of Slices containing unrelated logic      0        2,346       0%
Total Number of 4 input LUTs                       2,708    10,944      24%
  Number used as logic                             2,592
  Number used as a route-thru                      67
  Number used as Shift registers                   49
Number of bonded IOBs                              43       320         13%
Number of BUFG/BUFGCTRLs                           2        32          6%
  Number used as BUFGs                             2
  Number used as BUFGCTRLs                         0
Number of FIFO16/RAMB16s                           3        36          8%
  Number used as FIFO16s                           0
  Number used as RAMB16s                           3
Fig 4.15 Design Summary of JPEG for Virtex-4 FPGA
Timing Report
Speed Grade: -11
Minimum period: 7.445 ns (Maximum Frequency: 134.318 MHz)
Minimum input arrival time before clock: 7.703 ns
Maximum output required time after clock: 4.221 ns
CHAPTER 5 
CONCLUSIONS
This thesis presents a pipelined implementation of the baseline JPEG image compression architecture for color images. Optimized architectures for modules such as the DCT, quantization, differential coder, run length encoder and Huffman encoder were explained and coded in VHDL (a hardware description language) using the Active-HDL simulator. The simulation results show that the architecture has a minimum latency of 187 clock cycles for an 8x8 pixel image.
The VHDL code was synthesized using Xilinx ISE 9.1. The architecture requires 2,708 4-input LUTs and runs at a maximum frequency of 134.318 MHz on a Xilinx Virtex-4 FPGA.
BIBLIOGRAPHY
[1] "Home site of the JPEG and JBIG committees," www.JPEG.org.
[2] T. D. Tran, J. Liang and C. Tu, "Lapped Transform via Time-Domain Pre- and Post-Filtering."
[3] http://pagesperso-orange.fr/polyvalens/clemens/wavelets/wavelets.html#section1
[4] http://www.thepolygoners.com/tutorials/dwavelet/DWTTut.html
[5] A. N. Skodras and T. Ebrahimi, "JPEG2000 Image Coding System Theory and Applications," School of Science and Technology - Computer Science; École Polytechnique Fédérale de Lausanne (EPFL).
[6] The International Telegraph and Telephone Consultative Committee (CCITT), "Information Technology - Digital Compression and Coding of Continuous-Tone Still Images - Requirements and Guidelines," Rec. T.81, 1992.
[7] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards, 2nd ed., Kluwer Academic Publishers, USA, 1999.
[8] J. Miano, Compressed Image File Formats: JPEG, PNG, GIF, XBM, BMP, Addison Wesley Longman Inc., USA, 1999.
[9] L. Agostini and S. Bampi, "Integrated Digital Architecture for JPEG Image Compression," European Conference on Circuit Theory and Design, vol. III, pp. 181-184, 2001.
[10] W. Pennebaker and J. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, USA, 1992.
[11] M. Kurosaki, A. Ikeda, K. Munadi and H. Kiya, "Packet Analyzer for JPEG2000 Code streams and its VHDL model," Department of Electrical Eng., Tokyo Metropolitan Univ., Japan.
[12] http://mtg.upf.edu/~xserra/cursos/TDP/referencies/Park-DWT.pdf
[13] C. S. Kumar, "Comments on 'Subband coding of images'," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 36, no. 7, pp. 1089-1090, Jul. 1988.
[14] S. Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, SunSoft Press, Prentice Hall.
[15] L. Agostini, "Design of Architectures for JPEG Image Compression" (in Portuguese), Master's dissertation, Federal University of Rio Grande do Sul, Informatics Institute, Graduate Program in Computer Science, Porto Alegre, Brazil.
[16] M. Kovac and N. Ranganathan, "JAGUAR: A Fully Pipelined VLSI Architecture for JPEG Image Compression Standard," Proceedings of the IEEE, vol. 83, pp. 247-258, 1995.
[17] http://www.xilinx.com/support/documentation/application_notes/xapp930.pdf
[18] http://www.xilinx.com/support/documentation/application_notes/xapp610.pdf
[19] S.-M. Lei and M.-T. Sun, "An entropy coding system for digital HDTV applications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 1, no. 1, pp. 147-155, March 1991.
[20] L. V. Agostini, I. S. Silva and S. Bampi, "Pipelined entropy coders for JPEG compression," Proceedings of the 15th Symposium on Integrated Circuits and Systems Design, pp. 203-208, 2002.
VITA
Graduate College
University of Nevada, Las Vegas
Arun Kumar Reddy Toomu
Local Address:
4248 Grove Circle, apt 3
Las Vegas, NV 89119
Degree:
Bachelor of Technology in Electrical and Computer Engineering, 2006
JNT University, Hyderabad, India
Thesis title: Pipelined Implementation of JPEG Image Compression using VHDL
Thesis Examination Committee:
Chairperson, Dr. Henry Selvaraj, Ph.D.
Committee member, Dr. Emma Regentova, Ph.D.
Committee member, Dr. Yahia Baghzouz, Ph.D.
Graduate College Faculty Representative, Dr. Laxmi Gewali, Ph.D.
