UNLV Retrospective Theses & Dissertations
1-1-2006

Analysis of runtime re-configuration systems
Utthaman Thirunavukkarasu
University of Nevada, Las Vegas

Follow this and additional works at: https://digitalscholarship.unlv.edu/rtds

Repository Citation
Thirunavukkarasu, Utthaman, "Analysis of runtime re-configuration systems" (2006). UNLV Retrospective
Theses & Dissertations. 1988.
http://dx.doi.org/10.25669/x09a-y9cb

This Thesis is protected by copyright and/or related rights. It has been brought to you by Digital Scholarship@UNLV
with permission from the rights-holder(s). You are free to use this Thesis in any way that is permitted by the
copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from
the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/
or on the work itself.
This Thesis has been accepted for inclusion in UNLV Retrospective Theses & Dissertations by an authorized
administrator of Digital Scholarship@UNLV. For more information, please contact digitalscholarship@unlv.edu.

ANALYSIS OF RUNTIME RE-CONFIGURATION SYSTEMS

by

Utthaman Thirunavukkarasu
Bachelor of Computer Science & Engineering
Madras University
1999

A thesis submitted in partial fulfillment
of the requirements for the

Master of Science Degree in Electrical and Computer Engineering
Department of Electrical and Computer Engineering
Howard R. Hughes College of Engineering

Graduate College
University of Nevada Las Vegas
May 2006

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 1436801

INFORMATION TO USERS

The quality of this reproduction is dependent upon the quality of the copy
submitted. Broken or indistinct print, colored or poor quality illustrations and
photographs, print bleed-through, substandard margins, and improper
alignment can adversely affect reproduction.
In the unlikely event that the author did not send a complete manuscript
and there are missing pages, these will be noted. Also, if unauthorized
copyright material had to be removed, a note will indicate the deletion.

UMI Microform 1436801
Copyright 2006 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346


Thesis Approval

The Graduate College
University of Nevada, Las Vegas

December 19, 2005

The Thesis prepared by
Utthaman Thirunavukkarasu

Entitled
"Analysis of Runtime Reconfiguration Systems"

is approved in partial fulfillment of the requirements for the degree of
Master of Science in Electrical Engineering

Examination Committee Chair
Dean of the Graduate College
Examination Committee Member
Examination Committee Member
Graduate College Faculty Representative

ABSTRACT

Analysis of Runtime Re-Configuration Systems
by
Utthaman Thirunavukkarasu
Dr. Henry Selvaraj, Examination Committee Chair
Professor of Electrical and Computer Engineering
University of Nevada, Las Vegas
In recent years Programmable Logic Devices (PLDs), and in particular Field
Programmable Gate Arrays (FPGAs), have seen a tremendous increase in sales and
applications in the area of embedded systems. The main advantage of FPGAs is the
flexibility that they offer a designer in reconfiguring the hardware. The flexibility
achieved through re-configuration of FPGAs usually incurs an overhead of extra
execution time, data memory and power dissipation.
FPGAs provide an ideal template for run-time reconfigurable (RTR) designs. Only
recently have RTR-enabling design tools that bypass the traditional synthesis and
bitstream generation process for FPGAs become available; JBits is one of them. With
run-time reconfiguration of FPGAs, we can perform partial reconfiguration, which allows
reconfiguration of one part of an FPGA while another part is executing some functional
computation. The partial reconfiguration of a function can be performed earlier than the
time when the function is actually needed. Such configuration pre-fetch can hide the
reconfiguration overhead more effectively.
This thesis will implement a reconfigurable system and study the effect of runtime
re-configuration using Verilog and a new Java-based tool, JBits. This work will provide
pointers to high-level synthesis tools targeting runtime re-configuration.

TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
1.1 Field Programmable Gate Arrays and Reconfigurable Computing
1.2 Run-Time Reconfigurable Computing
1.3 JBits
1.4 JPEG 2000
1.5 FPGAs and Image Processing
1.6 Motivation and Contribution of the thesis
1.7 Organization
CHAPTER 2 BACKGROUND
2.1 Run Time Reconfiguration
2.2 Stages in the JPEG 2000 Algorithm
2.3 MQ Encoder
CHAPTER 3 TOOLS USED
3.1 VERILOG Language
3.2 Memec Insight Virtex-II MB
3.3 Xilinx ISE 5.0
3.4 JBits 3.0
CHAPTER 4 METHODOLOGY
4.1 VERILOG Implementation
4.2 Synthesis and Power Estimation of the Design
4.3 JBits and XHWIF
CHAPTER 5 RESULTS AND DISCUSSIONS
CHAPTER 6 CONCLUSION
REFERENCES
VITA

LIST OF FIGURES
Figure 1.1 Traditional FPGA Design Flow
Figure 1.2 JBits Design Flow
Figure 2.1 Stages in JPEG2000 Encoding Process
Figure 2.2 Basic structure of a generic Q-coder
Figure 2.3 Pseudo code of Q-coder
Figure 2.4 The interval in the unit interval [0, 1] divides into LPS subinterval and MPS subinterval respectively
Figure 2.5 Probability Estimation
Figure 2.6 An example of probability estimation for an LPS followed by a sequence of MPS Symbols
Figure 2.7 Probability Estimation Table
Figure 3.1 Block diagram of Memec Insight Virtex-II MB Development Kit Board
Figure 3.2 JBits Environment
Figure 3.3 Static Design Flow
Figure 3.4 RTR Design Flow
Figure 3.5 RTR Execution Model
Figure 3.6 Snapshot of the BoardScope tool
Figure 4.1 Design Flow for Power calculation and FPGA configuration
Figure 5.1 Simulation waveforms of the verilog_file_1.v code
Figure 5.2 Floorplan for verilog_file_1 design on XC2V1000 FPGA
Figure 5.3 Routing Congestion for verilog_file_1.v design on XC2V1000 FPGA
Figure 5.4 Floorplan of the verilog_file_1.v design on XCV1000 FPGA
Figure 5.5 Routing Congestion for verilog_file_1.v design on XCV1000 FPGA
Figure 5.6 Design view of verilog_file_1.v on XCV1000 FPGA using BoardScope
Figure 5.7 Design view of verilog_file_2.v on XCV1000 FPGA using BoardScope

LIST OF TABLES
Table 4.1 Arithmetic Encoder Main Data Inputs/Outputs
Table 4.2 Arithmetic Encoder Tasks
Table 4.3 Main Data path Registers C
Table 4.4 Outputs of the Arithmetic Encoder VERILOG Entity
Table 5.1 Resource Utilization of the designs on XC2V1000 FPGA
Table 5.2 Dynamic Power consumption of the designs on XC2V1000 FPGA
Table 5.3 Resource Utilization of the designs on XCV1000 FPGA
Table 5.4 Dynamic Power consumption of the designs on XCV1000 FPGA

ACKNOWLEDGEMENTS
I would like to acknowledge the immense assistance and moral support provided by
my advisor, Dr. Henry Selvaraj, during the course of my master’s program at the
University of Nevada, Las Vegas. The guidance provided by him in steering this project
from concept to completion has been invaluable.
I would like to thank Dr. Muthukumar Venkatesan for all his direct and indirect
support throughout this investigation. It is also important to mention the moral support of
my immediate family, including my parents and my brother, who were available for me at
all times. But for their constant motivation, it would have been impossible for me to get
this far in life.
I would like to acknowledge Gopinath Balakrishnan, Balaji Anandsagar
and all my friends for their help and advice, which have always been an invaluable source
of motivation for me.
Finally, I would like to thank the Department of Electrical and Computer Engineering
at the University of Nevada, Las Vegas for giving me an opportunity to pursue my
master’s degree.

CHAPTER 1

INTRODUCTION
Decades ago, electrical engineers had to use multiple dedicated chips to perform logic
functions. Later, these chips were consolidated into general devices that could be
customized for specific Boolean operations. As transistors became smaller, these devices
became more powerful and their range of operations became greater. FPGAs represent
the most recent and most powerful of these customizable parts, and their complexity has
risen to exceed that of regular computer processors. However, computer scientists are yet
to recognize the capabilities of FPGA design, which remains the sole province of
electrical engineers. FPGAs provide a number of unique advantages; one such advantage
is Run-Time Reconfigurable processing. To appreciate these qualities, we need to
see how FPGAs fit into a computational spectrum that ranges from fixed-structure to
flexible-structure processing, and from programmable to configurable, when compared to
Digital Signal Processors (DSPs). Digital Signal Processors are designed to perform
high-speed math and give more flexibility to the user. However, this flexibility is
achieved at a cost of higher complexity and programming difficulty. All general-purpose
processors have a fixed instruction set and, no matter how flexible they are, the processors
have to work within that instruction set (i.e. you can customize these processors, but you
can’t change the instruction set in any way). Since the instruction set is determined by the
processor’s internal structure, these processors can be called fixed-structure processors,
and the customization of such a processor is called programming. FPGAs are
different from these processors: with an FPGA, you have the option of using
processors with flexible structures which can be reconfigured.

1.1 Field Programmable Gate Arrays and Reconfigurable Computing
FPGA is short for Field-Programmable Gate Array, a type of logic chip that can be
reconfigured. Generally speaking, it is a piece of hardware that can be rewired to perform
any logic function. FPGAs consist of thousands of separate, independent computational
elements, called logic elements. The process of implementing a design on an FPGA is called
configuration. When you configure an FPGA, you’re changing the structure and behavior
of the logic elements’ internal connections and setting parameters that control their logical
functions. An FPGA’s logic elements are too small to be useful by themselves, but by
configuring and connecting them, you can use the FPGA to perform a fully customized
computation. Of course, configuring hundreds of logic elements individually is
prohibitively difficult, but modern CAD tools make it easy for users to
configure an FPGA. Most digital designers create designs using one of two languages:
VHDL and Verilog. Then, they use specialized software to convert these designs into
bitstreams. This conversion consists of synthesis, which extracts the logical
characteristics from a design, and implementation, which builds a bitstream according to
these characteristics. The bitstream file is then loaded onto an FPGA, which thereby becomes the
hardware implementation of the logic function. Any FPGA processing can be called
reconfigurable computing because you can always reset an FPGA back to its blank state
and configure it with a new bitstream. For example, you can initially configure a Xilinx
Virtex-II FPGA to perform GPS tracking in a cell phone. Then, once the position has
been acquired, you can reset the FPGA and reconfigure it to function as a voice decoder.
Regular FPGA development is a straightforward process, and many sources can be found
that explain it in greater depth.

[Figure 1.1: Traditional FPGA Design Flow. Stages shown: HDL Design (VHDL, Verilog), Place and Route, Verification, Bitstream, FPGA Configuration.]
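To make the idea of configuration concrete, the following minimal sketch models a single logic element as a 4-input look-up table (LUT) whose behavior is fixed entirely by 16 configuration bits. The class name `Lut4`, the helper `table_for` and the bit ordering are hypothetical and chosen only for illustration; real logic elements also contain flip-flops and routing multiplexers, which are omitted here.

```python
class Lut4:
    """Toy model of a 4-input look-up table (hypothetical, for illustration)."""

    def __init__(self, truth_table):
        self.configure(truth_table)

    def configure(self, truth_table):
        # "Configuration" here is nothing more than loading 16 truth-table bits.
        if not 0 <= truth_table < (1 << 16):
            raise ValueError("a 4-input LUT holds exactly 16 configuration bits")
        self.truth_table = truth_table

    def evaluate(self, a, b, c, d):
        # The four inputs form an address that selects one configuration bit.
        address = (d << 3) | (c << 2) | (b << 1) | a
        return (self.truth_table >> address) & 1


def table_for(function):
    """Build the 16-bit truth table for any Boolean function of four inputs."""
    bits = 0
    for address in range(16):
        a, b, c, d = [(address >> i) & 1 for i in range(4)]
        if function(a, b, c, d):
            bits |= 1 << address
    return bits


# Configure the element as a 4-input AND gate...
lut = Lut4(table_for(lambda a, b, c, d: a & b & c & d))
and_result = lut.evaluate(1, 1, 1, 1)

# ...then reconfigure the same element as a 4-input XOR (parity) gate.
lut.configure(table_for(lambda a, b, c, d: a ^ b ^ c ^ d))
xor_result = lut.evaluate(1, 1, 1, 1)
```

The same object computes two different functions depending only on the bits loaded into it, which is the sense in which an FPGA is "rewired" by a bitstream.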

1.2 Run-Time Reconfigurable Computing
Run-time or dynamic reconfiguration of circuits has recently become viable with the
introduction of SRAM-based dynamically reconfigurable Field Programmable Gate
Arrays (FPGAs). Run-time reconfiguration (RTR) or dynamic reconfiguration (DR),
also called on-the-fly reconfiguration or in-circuit reconfiguration, is a type of
reconfiguration which allows modification of a system configuration during its normal
operation without resetting the FPGA. Most traditional FPGAs are configured with full
bitstreams. That is, after the device has been reset, the bitstream sets each connection and
parameter within each logic element. However, with newer-generation FPGAs like the
Xilinx Virtex family, you can perform partial reconfiguration, in which a partial
bitstream affects only part of the device. In this case, you don’t need to reset the FPGA to load
the other part of the bitstream since it can be done during run-time. This is a very
important capability since configuring during run-time allows you to gradually adjust an
FPGA’s structure to better reach a goal. For example, in signal processing, you can
dynamically alter the coefficients of a digital filter to improve the signal-to-noise ratio, which
in turn improves the signal quality. While dynamic reconfiguration has always been
possible in all Xilinx SRAM-based parts, very little has been done to provide software
support for this capability. In general, the design flow has been limited to static circuit
design tools, with schematic capture or Hardware Description Language (HDL) front ends.
In addition, the method used to produce configuration data from these circuits was
based on automatic placement and routing technology developed originally for
production of printed circuit boards. This approach relied on solving known
NP-complete problems and was necessarily slow and non-deterministic. The placement
algorithms usually provided a physical implementation of the circuit which bore little
resemblance to the logical circuit. This made the task of locating items for
reconfiguration difficult. To support dynamic reconfiguration of an SRAM-based FPGA,
we require a tool which is as fast as possible and which provides physical information about the
circuit for reconfiguration. One such tool is JBits. A brief description of JBits is given in
the next section.
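The difference between full and partial reconfiguration can be illustrated with a toy model in which configuration memory is a list of frames and a partial bitstream carries only the frames that differ. The frame granularity, the function names and the data layout below are assumptions made for this sketch; they do not reproduce the actual Virtex bitstream format or any JBits call.

```python
def make_partial_bitstream(current, target):
    """Return {frame_index: new_frame} for the frames that differ (toy model)."""
    if len(current) != len(target):
        raise ValueError("both configurations must cover the same device")
    return {i: new for i, (old, new) in enumerate(zip(current, target)) if old != new}


def apply_partial_bitstream(config_memory, partial):
    """Overwrite only the listed frames; every other frame is left untouched."""
    for index, frame in partial.items():
        config_memory[index] = frame
    return config_memory


# Hypothetical device with 8 configuration frames of 4 bytes each.
filter_v1 = [bytes([i] * 4) for i in range(8)]
filter_v2 = list(filter_v1)
filter_v2[5] = bytes([0xAA] * 4)  # assume the filter coefficients live in frame 5

partial = make_partial_bitstream(filter_v1, filter_v2)
device = apply_partial_bitstream(list(filter_v1), partial)

frames_written_full = len(filter_v2)   # a full bitstream rewrites every frame
frames_written_partial = len(partial)  # the partial bitstream rewrites only one
```

In this model the circuits held in the seven unchanged frames are never disturbed, which mirrors the property that the rest of the device can keep operating while one region is updated.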

1.3 JBits
The JBits™ software is a set of Java™ classes which provide an Application
Programming Interface (API) to access the Xilinx FPGA bitstream. The interface
operates on either bitstreams generated by Xilinx design tools, or on bitstreams read
back from actual hardware. This permits all configurable resources like look-up tables,
routing and the flip-flops in the FPGA to be individually configured under software
control. The API has been used to construct complete circuits and to modify existing
circuits. In addition, the object-oriented support in the Java programming language has
permitted a small library of parameterisable, object-oriented macro circuits or Cores to be
implemented. Finally, this API may be used as a base to construct other tools. This
includes traditional design tools for performing tasks such as circuit placement and
routing, as well as application-specific tools to perform more narrowly defined tasks.

1.4 JPEG 2000
The Joint Photographic Experts Group (JPEG) standard has been in use for more than
a decade now. It has proved a valuable tool during all these years, but it cannot fulfill the
advanced requirements of today. Today’s digital imagery is extremely demanding, not
only from the quality point of view, but also from the image size aspect. Current image
sizes cover orders of magnitude, ranging from web logos of less than 100 Kbits to
high-quality scanned images of approximately 40 Gbits. With the continual
expansion of multimedia and Internet applications, the needs and requirements of the
technologies used have grown and evolved, and this in turn led to the development of a new
image compression standard called JPEG 2000. The JPEG 2000 standard, finalized in
2001, defines a new image-coding scheme using state-of-the-art compression techniques
based on wavelet technology.

[Figure 1.2: JBits Design Flow. Blocks shown: JBits Libraries, User Java Code, Java Compiler, Executable, Reconfigurable H/W.]

The JPEG 2000 international standard represents advances in image compression
technology where the image coding system is optimized not only for efficiency, but also
for scalability and interoperability in network and mobile environments. Digital imaging
has become an integral part of the Internet, and JPEG 2000 is a powerful new tool that
provides powerful capabilities for designers and users of networked image applications. The
JPEG 2000 standard provides a set of features that are of importance to many high-end
and emerging applications by taking advantage of new technologies. The major
difference between JPEG and JPEG 2000 is that the latter uses the Discrete Wavelet
Transform instead of the Discrete Cosine Transform used in the JPEG standard. The markets and
applications better served by the JPEG 2000 standard are Internet, color facsimile,
printing, scanning (consumer and prepress), digital photography, remote sensing, mobile,
medical imagery, digital libraries/archives, and E-commerce. Each application area
imposes some requirements that the standard, up to a certain degree, should fulfill.
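As a small worked illustration of the wavelet transform, the sketch below applies one level of a one-dimensional reversible 5/3 lifting transform (the integer wavelet that JPEG 2000 uses for lossless coding) to a short row of samples and then inverts it. The function names, the restriction to even-length inputs and the simple mirrored boundary handling are simplifications assumed for this sketch; it is not a conforming codec.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform on an even-length list."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch only handles even-length inputs")
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: high-pass = odd sample minus the mean of its even neighbours.
    high = [odd[n] - (even[n] + even[min(n + 1, half - 1)]) // 2 for n in range(half)]
    # Update step: low-pass = even sample plus a quarter of the neighbouring details.
    low = [even[n] + (high[max(n - 1, 0)] + high[n] + 2) // 4 for n in range(half)]
    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly by running the lifting steps in reverse order."""
    half = len(low)
    even = [low[n] - (high[max(n - 1, 0)] + high[n] + 2) // 4 for n in range(half)]
    odd = [high[n] + (even[n] + even[min(n + 1, half - 1)]) // 2 for n in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


row = [10, 12, 14, 13, 9, 8, 8, 8]   # made-up pixel values
low, high = dwt53_forward(row)
restored = dwt53_inverse(low, high)  # integer lifting makes the round trip exact
```

Because every lifting step uses only integer additions and floor divisions, the inverse recovers the input bit for bit, and a constant row produces an all-zero high-pass band.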
The JPEG 2000 algorithm is large and complex. Given the limited scope and time
available for this thesis, implementing most of the standard in hardware is not a realistic
goal, so it was decided to implement one section of JPEG 2000 in hardware. Selecting the
section of JPEG 2000 to implement in hardware is therefore an important choice.
The selection was made so that the greatest possible performance gain is obtained,
given that only a subset of the algorithm processing will be carried out in hardware. After
reviewing previous work and studying the operation of JPEG 2000, the
Arithmetic Entropy Encoder stage of the JPEG 2000 encoding standard was chosen for
hardware implementation. In this thesis the Arithmetic Encoder stage of JPEG 2000 is
implemented on an FPGA and is reconfigured during run-time to explore the advantages and
overheads of run-time reconfiguration over static reconfiguration.

1.5 FPGAs and Image Processing
The use of reconfigurable field-programmable gate arrays (FPGAs) for imaging
applications shows considerable promise to fill the gap that often occurs when Digital
Signal Processor (DSP) chips fail to meet performance specifications. Although DSP
chips can process data at high speeds, their architectures can inhibit overall system
performance in real-time imaging. The rate of operations can be increased when they are
performed in dedicated hardware, such as special-purpose imaging devices or FPGAs,
which provide the architecture necessary to implement real-time image processing
products successfully and cost-effectively. For many fixed applications, non-SRAM-based
(antifuse or flash-based) FPGAs provide the raw speed to accomplish standard
high-speed functions. However, in applications where algorithms are continuously
changing and compute operations must be modified, only SRAM-based FPGAs give
enough flexibility. The addition of reconfigurable FPGAs as a flexible hardware facility
enables DSP chips to perform optimally. The benefits primarily stem from optimizing the
hardware for the algorithms or from the use of reconfigurable hardware to enhance the product
architecture. The use of JBits for RTR requires a separate “host” processor and Java
Virtual Machine (JVM) to execute the Java classes that perform bitstream
reconfigurations. It was desirable, therefore, to have the image processing application
benefit from the required PC-FPGA shared processing environment. Image processing
applications have been shown to benefit from shared processing environments, in which
the FPGA is utilized as a co-processor [16, 17]. This concept can be extended to utilize
RTR for the core, in which the “host” process defines a specialized circuit coprocessor
instance to accelerate computation of the current image-processing task. Designing
specialized circuitry is useful in image encoding applications, where the performance of
the encoding algorithm is dependent on the image itself.

1.6 Motivation and Contribution of the thesis
For the design of an RTR application to be justifiable, it should exhibit clear
advantages over a similar, static circuit. Although a number of static
ASIC-based Arithmetic Encoder architectures have been explored, they offer little in the
way of operation customization. Advantages that can be exploited through RTR include
circuit speed increases through decreased latency or increased clock frequency, and
decreased resource consumption when compared to the static implementation
counterpart. The purpose of this thesis is to explore the advantages of RTR, estimate the
overheads incurred during RTR in terms of power and resources, and suggest
methods to minimize them.
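The trade-off between resource savings and reconfiguration overhead can be made concrete with a back-of-the-envelope model. All numbers below (slice counts, execution times and the reconfiguration time) are invented purely for illustration and are not measurements from this work; the function names and the two-region prefetch assumption are likewise hypothetical.

```python
def static_cost(stages):
    """All stages resident at once: areas add up, no reconfiguration stalls."""
    area = sum(s["slices"] for s in stages)
    time = sum(s["exec_ms"] for s in stages)
    return area, time


def rtr_cost(stages, reconfig_ms, prefetch=False):
    """Time-multiplexed stages on one region, or on two regions with prefetch."""
    execs = [s["exec_ms"] for s in stages]
    sizes = sorted((s["slices"] for s in stages), reverse=True)
    if not prefetch:
        # One region: every stage after the first waits for a full reconfiguration.
        area = sizes[0]
        stall = reconfig_ms * (len(stages) - 1)
    else:
        # Two regions (assumed sized for the two largest stages): the next stage
        # loads while the current one runs, so only the part of the load that
        # outlasts the current execution is exposed.
        area = sum(sizes[:2])
        stall = sum(max(0, reconfig_ms - e) for e in execs[:-1])
    return area, sum(execs) + stall


# Made-up three-stage pipeline.
stages = [
    {"slices": 400, "exec_ms": 30},
    {"slices": 250, "exec_ms": 5},
    {"slices": 300, "exec_ms": 20},
]

static = static_cost(stages)
rtr_plain = rtr_cost(stages, reconfig_ms=10)
rtr_prefetch = rtr_cost(stages, reconfig_ms=10, prefetch=True)
```

Under these assumed numbers the plain RTR design needs the least area but pays the full reconfiguration time between stages, while configuration prefetch recovers most of that time in exchange for a second region.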
The main contribution of this thesis is the study and implementation of RTR as applied
to the Arithmetic Encoder of JPEG 2000. This thesis presents a working Verilog
implementation of the arithmetic encoder stage of the algorithm. The code has been
tested using a test bench to check whether it conformed to the JPEG 2000 standard set by
the Final Draft Committee of the JPEG 2000 standard. The Verilog code was simulated and
the performance of the design was estimated. The Verilog code successfully passed all
simulation tests as a working implementation of the JPEG 2000 arithmetic encoding stage.
Since JBits 3.0 supports Virtex-II FPGAs, the Memec Insight Virtex-II MB
Development Kit was eventually identified as an appropriate available product that
fulfilled the necessary requirements. With a million gates, its Virtex-II FPGA gives
more flexibility for partial reconfiguration and is therefore well suited for RTR.
At this point, the Verilog modules
were synthesized and programmed into the FPGA. Once the FPGA was configured, JBits
was used to reconfigure the hardware at run-time and the results were analysed; they are
discussed in the results chapter of the thesis.

1.7 Organization
As explained in this introductory chapter, this thesis presents the results of RTR when
applied to the Arithmetic Encoder of JPEG 2000. As background, Chapter 2 discusses
previous work that is related to this thesis, covering earlier work on Run-Time
Reconfiguration and the existing JPEG 2000 image processing algorithm. Chapter 3 discusses
the tools selected for RTR and justifies the selection of the Virtex-II FPGA for the
experiment. Chapter 4 explains the complete experimental process: the Verilog code,
simulation waveforms, synthesis and implementation of the code, and the JBits
environment in which RTR of the FPGA is carried out.
Chapter 5 discusses the results, and Chapter 6 concludes the thesis with
consideration given to future developments that could arise out of the work presented in
this thesis.

CH APTER 2

BA CK G RO U N D
2.1 Run Time R econfiguration
2.1.1 Introduction
Nowadays many emerging applications in communication, computation and
consumer electronics demand that their functionality stays flexible after the system has
been manufactured. Such flexibility is required in order to cope with changing user
requirements, improvements in system features, changing protocol and data-coding
standards, demands for support of a variety of different user applications, etc. Until
recently, FPGAs were used only for prototyping ASIC designs and for low-volume
production, mostly because of their low speed, high cost per unit and high power
consumption. However, thanks to improvements in FPGA technology, soaring
non-recurring engineering (NRE) costs and shortening time-to-market requirements, there is an
increasing interest in using FPGAs instead of ASICs for embedded systems design [1].
Most applications running on FPGA-based systems are implemented using a single
configuration per FPGA [10]. This means that the functionality of the circuit does not
change while the application is running. Such an application can be referred to as being
Compile-Time Reconfigurable or Static-Time Reconfigurable, because the entire
configuration is determined at compile time and does not change throughout system
operation. Another implementation strategy is to implement an application with multiple
configurations per FPGA [12], [11], [8]. In this scenario the application is divided into
time-exclusive operations that need not, or cannot, operate concurrently. Each operation
is implemented as a distinct configuration which can be downloaded into the FPGA as
necessary at run time during application operation. This approach is referred to as Run-Time
Reconfiguration (RTR) or Dynamic Reconfiguration. Dynamic Reconfiguration can
be achieved in two different ways: dynamic external reconfiguration and embedded
reconfiguration. Dynamic external reconfiguration implies that an active array may be
partially reconfigured by an external device such as a personal computer, while ensuring
the correct operation of those active circuits that are not being changed. Embedded
reconfiguration extends the concept of dynamic reconfigurability by assuming that specific
circuits on the array are used to control the reconfiguration of other parts of the FPGA.
Clearly the integrity of the control circuits must be guaranteed during reconfiguration, so
by definition embedded reconfiguration is a specialized form of dynamic reconfiguration
[13]. As a programmable platform, a dynamically reconfigurable architecture only makes
sense when it provides a better solution than the alternatives, e.g., superscalar processors
and DSPs, in terms of performance, cost, power and development effort. Considering the
ever-increasing performance of processors, a good design methodology is essential to
the success of this approach. A new class of cores called run-time parameterisable (RTP)
cores has been introduced in [9]. RTP cores allow a single core to be computed and customized
at run time. For example, an adder core can be produced, and then parameterized at run
time for different operand widths. An innovation of this approach consists in considering
the RTP cores as a specific example of a reconfigurable core, placed on the
programmable device in a dynamic manner to respond to the changing computational
demands of the application. The problem with this methodology is that the RTP cores are targeted
only at a single device family, and there is no information about the communication
channels between RTP cores or about how they solve the physical reconfiguration problem.
There is a lot of research being conducted on design methodology and environments
for Run-Time Reconfiguration. The set of tools and associated methodologies is most
commonly required to address the following issues:
• Automatic or manual partitioning of a conventional design,
• Specification of the dynamic constraints,
• Verification of the dynamic implementation through dynamic simulations at major steps
of the design flow,
• Automatic generation of the configuration controller core for HDL implementation,
• Dynamic floor planning management and guidelines for modular back-end
implementation.
2.1.2 Design Partitioning
One of the most important tasks in dynamic reconfiguration is design partitioning.
There are basically two types of partitioning: 1) temporal and 2) spatial. If a
function is partitioned into a set of operations that are executed sequentially in time, it is
known as temporal partitioning. As the name suggests, partitioning a function in space is
called spatial partitioning. General-purpose microprocessors provide a silicon medium
that can be configured to solve any computation task, and the computation is mostly temporal,
or in other words serial. Reconfigurable devices compute a
function by configuring functional units and interconnecting them in space. This provides
parallelism in computation. Superscalar and VLIW microprocessors exploit some low-level
parallelism, but not to the extent that reconfigurable logic devices do. Although reconfigurable
logic provides parallelism in computing, the lack of serial computation means that it exhibits
scalability problems, i.e. an application larger than the capacity of a reconfigurable
computer cannot be mapped without scaling the available hardware resources [22], but a
judicious temporal partitioning can avoid oversizing the resources needed.
Run-Time Reconfiguration tries to take advantage of the serialism and parallelism of a
design by splitting larger designs into temporally exclusive collections, so-called
configurations, of smaller sub-problems that are loaded onto FPGAs dynamically during
the application’s run time. Dynamically reconfigurable computing consists of the
successive execution of a sequence of algorithms on the same device. The objective is to
swap different algorithms on the same hardware structure by reconfiguring the FPGA
array in hardware several times in a constrained time and with a defined partitioning and
scheduling [42, 43]. So when we are talking about RTR we are usually concerned with
both temporal and spatial partitioning of the design, i.e. a step that partitions the design into
time-exclusive spatial segments and groups these sub-functions or hardware objects into
FPGA configurations [19, 20]. Several research efforts have been made to partition and
map a computational task onto spatially interconnected reconfigurable processing
elements [17] [18]. The objective is to exploit the parallelism in the application by
mapping it to the spatially interconnected elements. Splash [17], PAM [18] and other
computing environments have demonstrated in practice performance gains equivalent
to those of supercomputers. Another important issue in run-time reconfiguration is the control
of FPGA reconfiguration and the generation of communication channels between
hardware objects in arbitrary configurations [21]. This is obtained by proper sequencing
of the partitions. One common way of sequencing the configurations is the use of a data flow graph.
An application consists of some potential for spatial computation and some
restrictions that require temporal computation. If an application can be represented as a
data flow graph (DFG) with sub-tasks as nodes and their dependencies as edges, all the
nodes at the same level in the graph constitute potential candidates for spatial computation,
i.e., they exhibit spatial flexibility. The nodes connected by edges should be executed
sequentially, i.e., by temporal computation.
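
The level-based view of a DFG described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited tools: the function name dfg_levels, the node names and the edge list are hypothetical, and the graph is assumed to be acyclic. Each node is given the level of its deepest predecessor plus one; nodes that share a level are candidates for spatial (parallel) mapping, while successive levels are executed temporally.

```python
from collections import defaultdict


def dfg_levels(nodes, edges):
    """Group the nodes of an acyclic DFG by level (hypothetical helper).

    A node's level is the length of the longest dependency chain leading
    to it, so all nodes in one level are mutually independent.
    """
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    level = {}

    def visit(node):
        # Source nodes get level 0; others sit one level below their
        # deepest predecessor.
        if node not in level:
            level[node] = 1 + max((visit(p) for p in preds[node]), default=-1)
        return level[node]

    for node in nodes:
        visit(node)

    stages = defaultdict(list)
    for node in nodes:
        stages[level[node]].append(node)
    return [stages[k] for k in sorted(stages)]


# Illustrative graph: a and b feed c; c and d feed e.
stages = dfg_levels(["a", "b", "c", "d", "e"],
                    [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")])
print(stages)
```

Under these assumptions each inner list could be mapped to one configuration, and the outer list gives the order in which the configurations would be loaded.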
2.1.3 Design Issues and JBits
Most FPGA designs follow the traditional ASIC design flow, confining the
reconfigurability to load time. However, run-time or dynamic reconfiguration is of
special interest among the research community because it provides a performance/cost
advantage over load-time configuration [2]. In past years, there has been only a little
research addressing the design issues of dynamically reconfigurable architectures. Hauser
et al. presented the Garp architecture [3] and its compiler [4]. The Garp architecture
combines configurable hardware with a standard MIPS processor on the same die. The
specially designed features allow hardware to be reconfigured in microseconds. The Garp
compiler can find instruction-level parallelism (ILP) in C code, and directly compile
selected loops to the reconfigurable array. However, if only ILP is to be exploited, the
system is unlikely to outperform VLIW and superscalar processors regarding speed.
Kaul et al. proposed the SPARCS framework [5]. In SPARCS, a high-level synthesis tool is
employed to estimate resources and latency. An Integer Linear Programming (ILP) model
is formulated to solve spatial and temporal partitioning problems. This flow is complete
and well-defined. But it is still based on traditional hardware design, and complexity is
even increased due to dynamic reconfiguration. Hutchings et al. proposed the JHDL tool to
handle run-time reconfiguration design [6]. It provides high-level language support to
express dynamic reconfiguration and a dual simulation/execution environment.
Nevertheless, JHDL suffers from a structural design approach, which makes it too low-level
to be used in large designs. JBits is another tool that supports run-time
reconfiguration. Xilinx’s JBits Software Development Kit (SDK) [23][24] gives system
designers the ability to directly configure and reconfigure the Virtex™ family of FPGAs
using standard Java software development tools. While JBits does provide access to the
FPGA hardware at the lowest levels, it is possible to use JBits to design circuits at a
higher level of abstraction using the Run-Time Parameterizable (RTP) Core library [26].
Recent enhancements to the RTP Core specification provide support for high-level
abstractions for placement, routing and variable granularity. In addition, output of static
netlists to formats such as EDIF is supported [25].

2.2 Stages in the JPEG 2000 Algorithm
This section provides a very basic explanation of the internal stages of the JPEG 2000
algorithm. The JPEG 2000 algorithm will be presented from the point of view of the
encoding process. Unless otherwise specified, the information in this section has been
sourced from [28] and [29]. The JPEG 2000 Final Committee Draft (FCD) divides the
algorithm into six sections. This is shown in Figure 2.1.


[JPEG 2000 block diagram: Input Image -> Preprocessing -> Intercomponent Transform -> Wavelet Transform -> Quantization -> Coefficient Bit Modelling -> Arithmetic Entropy Encoding -> Data Ordering -> Encoded Data]

Fig. 2.1: Stages in JPEG 2000 Encoding Process

An image input to JPEG 2000 for compression first undergoes some basic
preprocessing. Here the input image samples are level-shifted so that they have a
“nominal dynamic range that is approximately centered about zero” [28]. A digital image
can contain multiple components. For example, a color photo is often specified in terms
of its red, green and blue parts. Each of these is considered a separate image component.
JPEG 2000 optionally allows for an inter-component transform to be applied to the image
after it has been level-shifted. This transform helps de-correlate the separate components
of multi-component images. No other part of the algorithm relates different components
to one another. At this point in the process the image undergoes a Discrete Wavelet
Transform (DWT). In contrast to the Discrete Cosine Transform (DCT) used by JPEG,
the DWT is the source of a number of the superior features of JPEG 2000. For example,
the DWT deals with image discontinuities far better than the DCT [28]. As a result,
JPEG 2000 displays none of the artifacts that were present in JPEG versions of images
with sharp discontinuities. Additionally, different variations of the DWT can be applied
depending on whether lossy or lossless compression is desired. The output of the DWT
process is a set of transform coefficients. After the DWT has been applied, the transform
coefficients undergo quantization. For lossy compression, quantization is one of the parts
of the algorithm where information is lost [28]. Once the transform coefficients have been
quantized, JPEG 2000 specifies a coefficient encoding procedure, referred to here as
“coefficient bit modeling”. The quantized values are arranged into rectangular arrays
called code-blocks. The coefficient data in these code-blocks is considered to be a series
of bit-planes. (A bit-plane refers to “all the bits of the same magnitude in all coefficients
or samples” [29].) Each code-block is encoded one bit-plane at a time, with three
consecutive coding passes being used per bit-plane. These three coding passes are called
the ‘significance propagation’, ‘magnitude refinement’ and ‘cleanup’ passes respectively
[29]. Each bit-plane encoding pass generates a series of output binary symbols. The
JPEG 2000 standard allows these sets of symbols to be passed through an arithmetic
entropy encoder. This encoder compresses the symbols to further reduce the amount of
data to be placed in the output file. The specific arithmetic encoder used in this stage is
known as an “MQ Encoder”. The MQ encoder in JPEG 2000 is also compatible with the
arithmetic encoder used in the JBIG2 compression standard for bi-level (e.g. black and
white) images [38]. After bit-plane and arithmetic encoding, the coding pass data is
“packaged into data units called packets, in a process referred to as packetization” [39].
This data ordering stage of the algorithm arranges the final JPEG 2000 compressed
output, referred to as the “code stream”. The standard allows for different arrangement
orders of packets within the code stream. The order in which packets are arranged affects
the way in which the image can be progressively recovered. For example, one ordering
arrangement allows the compressed image to be progressively decoded with increasing
fidelity. A different ordering arrangement allows progressive decoding with increasing
resolution. The resulting code stream containing the JPEG 2000 compressed version of
the original input image can optionally be wrapped in the JP2 file format, also specified
in the standard. The file format allows extra information about the image and its
interpretation to be included with the data. For example, the code stream itself does not
specify the color system used by a multiple-component image (e.g. RGB). That
information can be specified in the JP2 file format.
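
The notion of a bit-plane used in the coefficient bit modeling stage can be made concrete with a small Python sketch. It is only an illustration and not part of the thesis implementation: the function name bit_planes is hypothetical, the code-block is flattened to a one-dimensional list for brevity, and the coefficient signs are assumed to be handled separately from the magnitudes.

```python
def bit_planes(coeffs, nbits):
    """Split coefficient magnitudes into bit-planes (hypothetical helper).

    Returns nbits lists, most significant plane first. Plane k holds
    bit k of every magnitude, i.e. all bits of the same significance.
    """
    return [[(abs(c) >> k) & 1 for c in coeffs]
            for k in range(nbits - 1, -1, -1)]


# Four quantized coefficients with 3-bit magnitudes: 5, 3, 0 and 6.
for plane in bit_planes([5, -3, 0, 6], 3):
    print(plane)
```

In the encoder each such plane would then be visited by the three coding passes, from the most significant plane downwards.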

2.3 MQ Encoder
2.3.1 Algorithm of Q-coder
A new adaptive binary arithmetic coding system, the Q-coder, was developed at IBM
Research. It is characterized by multiplier-free approximation, renormalization-driven
probability estimation and bit stuffing. Its coding conventions allow optimal hardware
and optimal software implementations. It also incorporates a new probability-estimation
technique which provides an extremely simple and robust mechanism for adaptive
estimation of probabilities during the coding process.
This section presents the algorithm and conventions of the Q-coder. First, Section 2.3.2
discusses the coding conventions that lead to optimal hardware and software implementations. Second,
Section 2.3.3 covers fixed-precision operation, multiplier-free approximation, bit stuffing,
and the estimation of probabilities by a new technique which uses only the interval
renormalization that is a necessary part of the finite-precision arithmetic coding process.
Dynamic probability estimation makes the Q-coder an adaptive binary arithmetic coder.


2.3.2 Coding convention of Q-coder
The basic structure of a compression/decompression system is shown in Figure 2.2.
The compression process is divided into three basic parts: a model which converts
uncompressed data into binary decisions, a probability estimator, and an arithmetic coder.
The dashed boxes enclose the parts of the system comprising the Q-coder. The model is
outside the scope of this section. The model in the encoder uses the uncompressed data to
determine the state S (used to determine where the probability estimate Qe for that state
is stored) and YN, the binary (yes/no) decision that is to be encoded. These are the inputs
to the Q-coder. This model is called statistical probability modeling. The compressed
data are output one byte at a time and may be transmitted to a decoder immediately or
stored for future use. The Q-decoder is similar to the encoder.

[Figure 2.2 block labels: Uncompressed data, Model, YN, Probability Estimator, MPS, Arithmetic Coder, Uncompressed data]

Figure 2.2: Basic structure of a generic Q-coder

The code string approaches the final code from below. It is initialized to zero
and always points to the base of the arithmetic-coding interval. This interval is
subdivided into an LPS and an MPS subinterval, with the LPS beneath the MPS. The relative size of
each subinterval is determined by the estimated LPS probability Qe and the
estimated MPS probability Pe, which is equal to 1 - Qe. The following is a simplified
pseudo code. Let A denote the present interval on the number line. Let C, the code
string, point to the base of that interval.
The coding process for a single symbol is as follows:

Receive YN
If MPS is encoded
    C = C + Qe
    A = Pe × A ≈ A - Qe
Else (LPS is encoded)
    A = Qe
End
If A < 0.75
    Renormalize A and C;
    Update Qe
End

Figure 2.3: Pseudo code of Q-coder
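
The pseudo code of Figure 2.3 can be sketched in Python as follows. This is a simplified illustration rather than the Verilog implementation of this thesis: the function name encode_symbol is hypothetical, A and C are held as floating-point numbers instead of fixed-width registers, the "Update Qe" step is left out so that Qe is supplied by the caller, and byte output and bit stuffing are ignored.

```python
def encode_symbol(a, c, qe, is_mps):
    """One coding step of the simplified Q-coder pseudo code (sketch).

    a  : current interval size, kept in [0.75, 1.5)
    c  : code string, pointing at the base of the interval
    qe : current LPS probability estimate (not updated here)
    Returns the new (a, c) and the number of renormalization shifts.
    """
    if is_mps:
        c += qe      # the MPS subinterval lies above the LPS subinterval
        a -= qe      # A * Pe is approximated by A - Qe
    else:
        a = qe       # A * Qe is approximated by Qe; the base is unchanged

    shifts = 0
    while a < 0.75:  # renormalize A and C identically by doubling
        a *= 2
        c *= 2
        shifts += 1
    return a, c, shifts


print(encode_symbol(1.0, 0.0, 0.25, True))   # MPS: no renormalization needed
print(encode_symbol(1.0, 0.0, 0.25, False))  # LPS: always renormalizes
```

With these values the MPS step leaves A at exactly 0.75, so no shift occurs, whereas the LPS step sets A to Qe and needs two doublings.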


[Figure 2.4 diagram label: LPS interval = Qe]

Figure 2.4: The unit interval [0, 1] divided into the LPS subinterval and the
MPS subinterval

According to the pseudo code, renormalization is triggered when the A
register drops below 0.75. Thus, the normalized range is maintained in the interval 0.75
to 1.5. This keeps A centered around 1 so that the arithmetic approximations are
reasonably good. For ease of implementation in hardware, it is suggested that the test for
renormalization be done on the most significant bit of A. Most of
the symbols are of MPS sense and fewer symbols are of LPS sense, but renormalization occurs
following both the MPS (occasionally) and the LPS (always). Renormalization following
the LPS always occurs since Qe, which is always smaller than 0.5, is assigned to A. As Qe
becomes smaller, the sequence of leading zeros becomes longer. So the computation for the
LPS sense consists of shift operations without interval subtraction. The algorithm of the
Q-coder is suitable for hardware implementation because the interval subtraction and code
string addition can be done in parallel.


2.3.3 The Characteristics of Q-coder
■ Fixed-precision:
Arithmetic coders usually avoid the increasing-precision problem by using fixed-precision
arithmetic. Implementation in fixed-precision arithmetic requires that a choice
be made for the fixed-precision representation of the interval. So, a renormalization rule
must be devised which maintains the interval size. Both the code string and interval size
must be renormalized identically; otherwise the identification of the code string as a pointer to
the current interval will be lost. Efficiency of the hardware and software implementations
suggests that renormalization be done by using a shift-left logical operation.
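
A fixed-precision renormalization by left shifts can be sketched as below. The sketch assumes the scaling in which X’1000’ represents 0.75, which is the convention this chapter gives for the Qe table; the function name renormalize and the constant name A_MIN are hypothetical, and the width of the code register is left unbounded for simplicity.

```python
A_MIN = 0x1000  # integer value representing 0.75 (assumed scaling)


def renormalize(a, c):
    """Shift A and C left together until A is back at 0.75 or above.

    Because A stays below 1.5 (0x2000), testing a < A_MIN is the same
    as testing that the most significant bit of A (bit 12) is zero.
    """
    if a <= 0:
        raise ValueError("interval register must be positive")
    shifts = 0
    while a < A_MIN:
        a <<= 1   # identical shifts keep C pointing
        c <<= 1   # at the base of the interval
        shifts += 1
    return a, c, shifts


a, c, n = renormalize(0x0AC1, 0x0123)
print(hex(a), hex(c), n)
```

Each shift moves one further bit of the code string towards the output, which is why the number of shifts is returned alongside the registers.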
■ Multiplier-Free Approximation:
One final practical problem needs to be resolved. In general, arithmetic coding
requires a multiply operation to scale the interval after each coding operation. Generally,
multiplication is a costly operation in both hardware and software implementations. An
early implementation of adaptive binary arithmetic coding avoids multiplication [13].
However, the Q-coder uses an even simpler approximation to avoid the multiply. If
renormalizations are used to keep the current interval, A, in the range 0.75 ≤ A < 1.5, the
multiplications required to subdivide the interval can be approximated as follows:
A × Qe ≈ Qe
A × Pe = A × (1 - Qe) ≈ A - Qe
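
The effect of the approximation can be checked numerically with a short Python sketch. The two function names are hypothetical and the values of A and Qe are chosen only for illustration. In both versions the two subintervals still add up to A, so the approximation only moves the boundary between the LPS and MPS subintervals.

```python
def subdivide_exact(a, qe):
    """Exact subinterval sizes (LPS, MPS), using multiplications."""
    return a * qe, a * (1 - qe)


def subdivide_approx(a, qe):
    """Multiplier-free sizes: A x Qe ~ Qe and A x Pe ~ A - Qe."""
    return qe, a - qe


for a in (0.75, 1.0, 1.5):
    print(a, subdivide_exact(a, 0.25), subdivide_approx(a, 0.25))
```

The two results coincide at A = 1 and differ most at the ends of the range, which is consistent with keeping A centered around 1.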
■ Probability Estimation:
Adaptive arithmetic coding requires that the probability be re-estimated periodically.
Dynamic probability estimation is a very important concept; it was developed in earlier
arithmetic coding implementations [42] [43]. The probability estimation technique used in
the Q-coder differs from the earlier techniques in that the estimates are revised only
during the interval renormalization that is required in the arithmetic coder. Estimating
only at renormalizations is very important for efficient software implementations, because the
inner loop of the coder is then minimized. Since each renormalization produces at least
one compressed-data bit, the instruction cycles spent on the estimation process are related
to the compressed-data code string length.
The dynamic probability estimation can be defined as a finite-state machine, that is, a
table of Qe values and associated next states for each type of renormalization, together with an MPS
exchange flag for reversing the MPS and LPS senses.

[Figure 2.5 diagram: finite-state machine with MPS and LPS transitions between Qe states, including the MPS exchange at Kex]

Figure 2.5: Probability Estimation

Figure 2.5 shows the actual finite-state machine used to estimate the probabilities.
The leftmost section illustrates the exchange of MPS and LPS definitions at Qe = 0.5. As
the LPS goes from Kex to Kex - 1, the LPS and MPS senses are reversed, as indicated by
the asterisk. The center section shows the region where the finite-state machine changes from
a single-state jump on LPS to a double-state jump. Some parts of the finite-state machine
require a jump of more than one state in order to balance the movement to larger or
smaller Qe indices following the renormalization. The rightmost section shows the
diagram for the smallest values of Qe.

Figure 2.6: An example of probability estimation for an LPS followed
by a sequence of MPS symbols

Figure 2.6 illustrates the sequencing of the probability estimator for an LPS followed
by a sequence of MPSs. The ordinate shows the interval (A-register) value, and the abscissa
shows the discrete allowed values of Qe. Solid lines indicate changes to the interval
resulting from coding operations; dashed lines represent changes resulting from
renormalizations. The initial LPS renormalization (marked with an asterisk) causes a
transition to a known A-register value and a known state in the finite-state machine. As
MPSs are coded, the interval (A-register) drops below 0.75. At that point a transition is
made towards a smaller Qe, and the interval is renormalized by doubling until it is
greater than 0.75. In most cases only one doubling is needed; the pair of doublings
shown at Qe = 0.32831 is the exception rather than the rule.
The Qe values for each Qe index are chosen to have mostly bit values of 0, except
that the last bit is always a 1. This strategy simplifies the hardware implementation in the
sense that each bit value of 1 (except the last) requires additional wiring and circuits.

[Figure 2.7 table: contents not recoverable from the scanned source. As described in the text, each row gives the Qe value in binary and in decimal form, the “Decr LPS” and “Incr MPS” index changes, and the “MPSexch” flag.]

Figure 2.7: Probability Estimation Table

Figure 2.7 gives the information necessary to perform the probability estimation. For
each Qe value in the table, an LPS renormalization causes the Qe index to be decremented by the
amount indicated in the “Decr LPS” column. MPS renormalizations increment the Qe
index by 1 unless the Qe index has reached the maximum value, as shown in the “Incr
MPS” column. An LPS from the top entry causes an exchange in the sense of the MPS
and LPS, as indicated by the “MPSexch flag”. The Qe values as binary bits are followed
by the decimal form. The decimal fraction 0.75 was chosen to correspond to X’1000’; this
determines the scaling in converting the first column to the last column.
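
The table-driven estimator described above can be sketched as a small finite-state machine in Python. The table below is a made-up four-entry example and does NOT reproduce the values of Figure 2.7; the names TOY_TABLE and update_estimate are likewise hypothetical. Only the update rules follow the text: an MPS renormalization moves towards smaller Qe, an LPS renormalization moves back by the listed decrement, and an LPS at the top entry exchanges the MPS and LPS senses.

```python
# Columns: (Qe, decr_lps, incr_mps, mps_exch). Illustrative values only.
TOY_TABLE = [
    (0.5,    0, 1, True),   # top entry: an LPS here swaps the senses
    (0.25,   1, 1, False),
    (0.125,  1, 1, False),
    (0.0625, 2, 0, False),  # last entry: index cannot grow further
]


def update_estimate(index, mps, after_mps):
    """Return the new (index, mps) after one renormalization.

    index     : current row of the table
    mps       : current sense of the more probable symbol (0 or 1)
    after_mps : True if the renormalization followed an MPS, else LPS
    """
    _qe, decr_lps, incr_mps, mps_exch = TOY_TABLE[index]
    if after_mps:
        return index + incr_mps, mps
    if mps_exch:
        return index, 1 - mps
    return index - decr_lps, mps


state = (0, 0)
for after_mps in (True, True, False, False, False):
    state = update_estimate(*state, after_mps)
    print(state)
```

Because the function is only called at renormalization time, the estimator adds no work to the common case in which an MPS is coded without renormalization.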
■ Bit stuffing in the Q-coder:
Another problem to be resolved in fixed-precision arithmetic is the carry
propagation problem, which arises in binary arithmetic coding. It is possible to generate a
code string with a consecutive sequence of 1-bits of arbitrary length. If a bit is added to
the least significant bit of this sequence, a carry will propagate until a 0 is encountered.
Langdon and Rissanen [42] resolved this problem by “bit stuffing”. If a sequence of 1-bits
of a predefined maximum length is detected, an extra 0-bit is stuffed into the code
string. The stuffed 0-bit acts as a carry trap for any carry from future coding operations.
The decoder, after detecting this same sequence, removes the stuffed bit, adding any
carry contained in it to the code string remainder. The Q-coder follows this general
scheme, but with the additional constraints that the string of 1-bits is eight bits in length
and is byte-aligned.
The general principle of bit stuffing is reviewed in [42]. Carry propagation can only
occur through a sequence of consecutive 1-bits. In principle, the sequence of 1-bits can
be as long as the code string itself. To avoid arbitrary propagation of the carry, sequences
of 1s are interrupted periodically by inserting or stuffing 0-bits. In the Q-coder the bit
stuffing is done only on byte boundaries. One zero is stuffed into the high-order bit of the
byte immediately following any byte which is all 1s (X’FF’); this 0-bit is the receiver for
any carry which might subsequently occur. The diagram below shows the 12 “x” bits to
the right of the binary point aligned with the fractional bits in A, the four “s” spacer bits,
and the 8 “b” bits which will be output as a byte when the flag bit, which starts in bit
position 24, is shifted out of the register. A carry, which will sometimes be shifted into
bit 24, is added to the preceding byte. If the 8 bits which will be output are X’FF’, the
stuffed bit is inserted ahead of the high-order seven “b” bits of the following byte, so that a carry
from subsequent compressed data is trapped there and cannot propagate into the X’FF’ byte.
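
The byte-aligned stuffing rule can be illustrated with the following Python sketch. It is a simplification and not the register-level scheme described above: the function names stuff_bits and unstuff_bits are hypothetical, the input is a plain list of already finalized code bits, the last byte is padded with zeros, and the carry that a real decoder would pick up from the stuffed bit is not modelled.

```python
def stuff_bits(bits):
    """Pack bits into bytes, stuffing a 0 after every X'FF' byte.

    The byte following an all-ones byte carries only 7 data bits; its
    high-order bit is the stuffed 0 that would trap a later carry.
    """
    out = []
    i = 0
    prev_ff = False
    while i < len(bits):
        n = 7 if prev_ff else 8
        chunk = bits[i:i + n]
        chunk = chunk + [0] * (n - len(chunk))  # zero-pad the final byte
        value = 0
        for b in chunk:
            value = (value << 1) | b
        out.append(value)
        prev_ff = value == 0xFF
        i += n
    return bytes(out)


def unstuff_bits(data):
    """Inverse of stuff_bits: drop the stuffed bit after each X'FF'."""
    bits = []
    prev_ff = False
    for byte in data:
        n = 7 if prev_ff else 8
        bits.extend((byte >> k) & 1 for k in range(n - 1, -1, -1))
        prev_ff = byte == 0xFF
    return bits


code_bits = [1] * 16
packed = stuff_bits(code_bits)
print(packed.hex())
print(unstuff_bits(packed)[:len(code_bits)] == code_bits)
```

Sixteen consecutive 1-bits are spread over three bytes here, because the second byte gives up its high-order position to the stuffed 0.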


CHAPTER 3

TOOLS USED
3.1 VERILOG Language
The implementation of the JPEG 2000 arithmetic encoder was written in Verilog.
The code was required to conform to the JPEG 2000 standard for the arithmetic
entropy encoding stage. Verilog is one of the major Hardware Description Languages
(HDLs) used by hardware designers in industry and academia. The Verilog language provides the digital designer
with a means of describing a digital system at a wide range of levels of abstraction and, at
the same time, provides access to computer-aided design tools to aid in the design
process at these levels. One of the main reasons for the popularity of Verilog is that its
syntax is very similar to that of the C programming language, which makes it easy for engineers to understand
since they usually have prior experience working with C. Verilog is widely used for
creating designs that can be programmed into FPGA devices. It is a general and versatile
language that can be used to design a digital system and is widely compatible with
industry tools. Using Verilog allows as much of the design as possible to be portable
to other synthesis tools, other FPGA cards or even other FPGA architectures. Previous
experience had already been obtained in the use of Verilog for programming FPGA
devices. This experience was of invaluable help for the implementation section of this
project.

3.2 Memec Insight Virtex-II MB
The Memec Insight Virtex-II MB Development Kit provides a complete solution for
developing designs and applications based on the Xilinx® Virtex-II FPGA family. The
Virtex-II MB system board utilizes a Xilinx Virtex-II device of more than one million gates
(XC2V1000-4FG456C) in the 456-ball fine-pitch ball grid array package. The high gate
density and large number of user I/Os allow complete system solutions to be
implemented in the high-performance FPGA. The system board includes a 2M x 16
Double Data Rate (DDR) memory, two clock sources, an RS-232 port, and additional user
support circuits. A Low Voltage Differential Signaling (LVDS) interface is provided
with 16-bit transmit and 16-bit receive ports, plus clock, status and control signals for
each. The Virtex-II FPGA family has the advanced features needed to fit the most
demanding, high-performance applications. The Memec Design Virtex-II MB
Development Kit provides an excellent platform to explore these features. The board
provides an ideal platform for reconfigurable computing applications with its usage of
high-density FPGA devices that provide a template for reprogrammable logic circuits. It
also provides a unique test platform since a host computer can interact with the
programmable elements through onboard memory. One important reason for selecting
this board is that JBits 3.0 is compatible with Xilinx Virtex-II FPGAs and can be used to
program the XC2V1000-4FG456C chip. The XHWIF interface which comes with the
JBits 3.0 SDK provided by Xilinx supports the XC2V1000 chip and can be used to
communicate with an FPGA board.

[Figure 3.1 block labels: Virtex-II FPGA XC2V1000, 2M x 16 DDR SDRAM, SelectMAP and Slave Serial configuration, JTAG port, RS-232, 7-segment display, DIP switches, push buttons, I/O connectors, P160 slot, 16-bit LVDS RX, 16-bit LVDS TX, clocks (2), reset circuit, voltage regulators (1.5 V, 2.5 V, 3.3 V)]

Figure 3.1: Block diagram of Memec Insight Virtex-II MB Development Kit Board

3.3 Xilinx ISE 5.0
Synthesis is one of the most essential steps in the design methodology. It takes the
conceptual Hardware Description Language (HDL) design definition and generates the
logical or physical representation for the targeted silicon device. Design implementation
is the process of translating, mapping, placing, routing and generating a BIT file for the
design.


Synthesis tools are required to produce highly optimized results with a fast compile
and quick turnaround time. To meet this requirement, the synthesis engine needs to be
tightly integrated with the physical implementation tool and have the ability to
proactively meet the design timing requirements by driving the placement within the
physical device. In addition, cross probing between the physical design report and the
HDL design code further shortens the turnaround time. Xilinx's Integrated
Software Environment (ISE) provides support for today's most popular methods of
design capture, including HDL and schematic entry. ISE even allows designs that contain
a mixture of VHDL and Verilog, integration of IP cores, as well as robust support for
reuse of your own IP. The complex design process is streamlined and completed in less
time, reducing the overall project costs, with easy-to-use graphical interfaces and
powerful problem-solving technology such as Architecture Wizards, PACE floor
planning, Project Navigator and more. With these rich features, plus the mixture of
design entry capabilities, ISE provides some of the easiest-to-use programmable logic
design tools available today.
Once synthesis and implementation are done, configuring the programmable logic device is
the last step in the design methodology. A bit stream is generated from the physical place
and route information and is transferred through cables to the target device. iMPACT, included
with all ISE configurations, is a robust configuration tool that automatically takes care of
everything from bit stream generation to the device download. A single interface
provides support for both parallel port JTAG cables, and USB and RS-232 serial port
MultiLINX cables. In this thesis we use Xilinx ISE 5 to synthesize and configure the


FPGA. There is no specific technical reason for choosing it, although the availability of the
tool and its compatibility with the Virtex-II FPGA played a significant role in its selection.

3.4 JBits 3.0
The JBits API is a set of Java classes that provide an Application Program Interface
(API) into the Xilinx Virtex-II FPGA family bit stream. This API may be used to
construct digital designs and parameterisable cores that can be executed on Xilinx
Virtex-II FPGA devices. The API provides the lowest-level interface to the Virtex-II
architecture, and thus it can also serve as a base on which to construct traditional circuit
placement and routing tools, as well as application-specific tools that perform more narrowly
defined tasks. Figure 3.2 shows a component view of the JBits environment. The following
sections discuss each component in greater detail.

[Figure 3.2 block labels: JBits Library, Wire Database, BoardScope Debugger, Remote Hardware, FPGA Hardware]

Figure 3.2: JBits Environment


JBits operates either on bit streams generated by Xilinx ISE design tools or on bit
streams read back from actual hardware. This provides the capability of designing
and dynamically modifying circuits for Xilinx Virtex-II series FPGA devices.
This capability is achieved by providing access to all the resources of a Virtex-II device.
The API gives access to the Look-Up Tables (LUTs) inside a Configurable Logic Block
(CLB) and to the routing resources of the Virtex-II device. The device architecture is
represented in JBits as a two-dimensional array of CLBs.
Each CLB is referenced by a row and column, and all configurable resources in the
selected CLB may be set or probed. Control of all routing resources adjacent to the
selected CLB is made available in the API. JBits code is written in Java. The JBits API
provides a high-level language approach to developing reconfigurable systems, including
run-time reconfiguration (RTR). Though the JBits API is an enabling technology for RTR, the
API can also be used to produce traditional static design bit stream files for Virtex-II FPGAs.
The design flow followed is different from that of the traditional CAD tools.
Unlike the conventional design flow, which uses HDL or schematic design entry, a design
is specified in a Java program using the JBits API. The application takes a bit stream file
as an input and extracts the device configuration data. This data is modified according to
the design specified using the JBits API and is output to a bit stream file that will be used
to configure the Virtex-II hardware. Once downloaded to the Virtex-II hardware, the
design can be debugged using the XHWIF interface. This static design flow is
illustrated in Figure 3.3.


[Figure 3.3 block labels: Bitstream from Xilinx ISE tool; Design Entry; Design Implementation; Design Execution; JBits API; Design APP; Bitstream modified by JBits; Virtex-II Hardware]

Figure 3.3: Static Design Flow

The real power of JBits is its use in the development of Java Run-Time
Reconfiguration (RTR) applications. In this flow, the circuits can be configured on the fly
by executing a Java application that communicates with the circuit board containing the
Virtex-II device. This is made possible by using JBits to specify the design and the
XHWIF API to download the design within the same Java application. This design
flow is illustrated in Figure 3.4. Again, the bit stream input to the Java application can be a
null bit stream or a bit stream for an existing design.


[Figure 3.4 block labels: Bitstream from Xilinx ISE tool; Design Entry and Implementation; Design APP; Design Verification and Execution; Virtex-II Hardware]

Figure 3.4: RTR Design Flow

The execution model for this flow is illustrated in Figure 3.5. In this case, the host computer
executing the JBits application uses the XHWIF interface API to communicate with the
Virtex-II reconfigurable hardware. For example, in a typical PC computing environment,
the host microprocessor executes the JBits application and configures the Virtex-II
reconfigurable hardware, located in a PCI slot or connected by a JTAG cable, using the
XHWIF API. This enables run-time configuration and reconfiguration of the Virtex-II
device.
3.4.1 Accessing the CLB resources:
A resource is defined as a configurable element inside the FPGA. This may include the
CLB inputs (F LUT inputs, G LUT inputs, BX, BY, ...), routing (singles, hexes, ...),
clocks, etc. The JBits API gives the user the ability to configure a CLB directly. The CLB
functional elements are represented by classes in the Java packages given below.


com.xilinx.JBits.Virtex2.Bits.Logic.Center and com.xilinx.JBits.Virtex2.Wires.Center.
For example, consider the Java class com.xilinx.JBits.Wires.Center.BX0. This class
abstracts the Slice 0 BX input of the CLB.

[Figure 3.5 block labels: Bitstreams; JBits Application; XHWIF; Virtex-II Hardware; Host Microprocessor; Interface Slot]

Figure 3.5: RTR Execution Model

Consider the following code. Let us select Slice 0 BY to be the Slice 0 BX input of CLB (0,0).
    import com.xilinx.JBits.Virtex2.Bits.Wires.Center;
    JBits jbits = new JBits(Device.getDevice(deviceType));
    jbits.read("myBitstream.bit");
    int clbRow = 0; // Row 0
    int clbCol = 0; // Column 0


LUTs in CLBs can be used to implement logic functions. In JBits, Virtex CLB LUTs
are defined in the com.xilinx.JBits.Virtex.Bits.Logic.Center.LUT class. The configuration
data can be written to and read from the LUTs using the setCLBBits() and
getCLBBits() methods respectively. The CONTENTS field can be considered a 2D array
indexed first by the required slice (0, 1, 2, 3) and then by the F or G constants defined in
the class. Thus, to access the F LUT of slice 1, you would use LUT.CONTENTS[1][LUT.F].
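The two-level indexing described above (slice first, then F or G) can be pictured with a small stand-alone model. The sketch below is not JBits code: the class LutContentsSketch, its SLICES, F and G constants, and the truthTable helper are hypothetical names chosen for illustration. It assumes one 16-bit truth table per 4-input LUT, and it does not model the bit ordering or any inversion that the real device applies.

```java
import java.util.function.IntPredicate;

/** Toy model of LUT contents indexed by slice and then by F or G (not the JBits API). */
public class LutContentsSketch {
    public static final int SLICES = 4; // slices 0..3, as in the text
    public static final int F = 0;      // hypothetical index for the F LUT
    public static final int G = 1;      // hypothetical index for the G LUT

    // contents[slice][F or G] holds a 16-bit truth table for a 4-input LUT.
    private final int[][] contents = new int[SLICES][2];

    /** Builds a 16-bit truth table: bit i is set when fn accepts input pattern i. */
    public static int truthTable(IntPredicate fn) {
        int bits = 0;
        for (int inputs = 0; inputs < 16; inputs++) {
            if (fn.test(inputs)) {
                bits |= 1 << inputs;
            }
        }
        return bits;
    }

    /** Stores a truth table, keeping only the low 16 bits. */
    public void set(int slice, int lut, int bits) {
        contents[slice][lut] = bits & 0xFFFF;
    }

    public int get(int slice, int lut) {
        return contents[slice][lut];
    }

    /** Evaluates the stored LUT for one 4-bit input pattern. */
    public boolean evaluate(int slice, int lut, int inputs) {
        return ((contents[slice][lut] >> (inputs & 0xF)) & 1) == 1;
    }

    public static void main(String[] args) {
        LutContentsSketch clb = new LutContentsSketch();
        // Program the F LUT of slice 1 as a 4-input AND gate.
        clb.set(1, F, truthTable(in -> in == 0xF));
        System.out.printf("slice 1 F LUT = 0x%04X%n", clb.get(1, F));
    }
}
```

In this model a 4-input AND sets only the bit for input pattern 1111, so the stored table has a single high bit; other functions are obtained by passing a different predicate to truthTable.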
3.4.2 JBits Wire Database:
JBits has a set of classes that provide an API for obtaining connectivity information.
The information about the connectivity is collectively known as the JBits wire database.
These classes contain information about connectivity within a tile, between tiles, and the bits
associated with each connection. Full support for every tile and resource in the Virtex-II
FPGA is available.
The JBits wire database represents the connectivity of a Xilinx FPGA with Java
objects. The database is represented in a device-generic manner and can be used for any
member of the Virtex-II family of FPGAs. Information can be obtained about sources,
sinks, and bit stream settings using the API. The database uses a tile-based approach,
which uses one object to describe the connectivity of a single tile type (intra-tile routing graph)
and a method to create the connectivity between tiles (inter-tile routing graph). This
leads to a single representation of any device in a family of FPGAs. Some important
definitions are:
Wire - A template of a physical wire in a tile. It includes multiplexer input and output
connectivity information (intra-tile routing graph). It also includes information about the


location of bits in the bit stream that are set to modify the connectivity.
Pin - An instantiation of a Wire template that is specific to a tile in the device.
Segment - A collection of Pins that are directly connected together across tiles (inter-tile
routing graph).
Some important packages of the wire database are:
package com.xilinx.JBits.ArchIndependent: This package includes architecture-independent
implementations of the following classes: Wire, Segment, Pin, Lookup table.
package com.xilinx.JBits.Virtex2: This package contains the architecture-specific classes
that provide a main entry into the database.
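The relationship between the three definitions above (a Wire as a per-tile template, a Pin as that template placed at one tile, and a Segment as a group of Pins joined across tiles) can be sketched with a few plain Java classes. Everything here is hypothetical and for illustration only: the class names mirror the vocabulary of the text but are not the classes of the JBits wire database, and the wire names are invented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the Wire / Pin / Segment vocabulary (not the JBits wire database). */
public class WireModelSketch {

    /** A Wire is a per-tile-type template: a name plus the wires that can drive it. */
    public static final class Wire {
        final String name;
        final List<String> sources = new ArrayList<>(); // intra-tile multiplexer inputs
        Wire(String name) { this.name = name; }
    }

    /** A Pin is a Wire template instantiated at one tile (row, col) of the device. */
    public static final class Pin {
        final Wire wire;
        final int row;
        final int col;
        Pin(Wire wire, int row, int col) { this.wire = wire; this.row = row; this.col = col; }
        @Override public String toString() { return wire.name + "@(" + row + "," + col + ")"; }
    }

    /** A Segment groups the Pins that belong to one conductor spanning several tiles. */
    public static final class Segment {
        final List<Pin> pins = new ArrayList<>();
        boolean contains(int row, int col) {
            for (Pin p : pins) {
                if (p.row == row && p.col == col) return true;
            }
            return false;
        }
    }

    /** Builds the segment for a single-length wire leaving tile (row, col) eastwards. */
    public static Segment eastSingle(int row, int col) {
        Wire begin = new Wire("SINGLE_EAST_BEG"); // invented wire names
        Wire end = new Wire("SINGLE_EAST_END");
        Segment s = new Segment();
        s.pins.add(new Pin(begin, row, col));
        s.pins.add(new Pin(end, row, col + 1));
        return s;
    }

    public static void main(String[] args) {
        Segment s = eastSingle(3, 5);
        System.out.println(s.pins);
    }
}
```

Because the two Wire templates are shared by every tile of the same type, only the Pins carry coordinates, which is the property that lets one description serve every device in a family.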
3.4.3 XHWIF:
The XHWIF API provides various methods to describe an FPGA-based board and to
send data on and off the board. It includes methods for reading and writing bit streams to
FPGAs, and methods for describing the kind and number of FPGAs on the board. Also
included are methods for incrementing the on-board clock, and for reading from and
writing to on-board memories, if they are available. The interface standardizes the way
applications communicate with hardware, so that an application using the same interface
can communicate with a variety of boards. All of the hardware-specific information is hidden
inside a class that implements the XHWIF interface. Using the Java programming
language's native methods, XHWIF can communicate with hardware directly, through
libraries or through a device driver. The advantage of this approach is that new hardware
can be quickly and easily supported. All that is required is that an XHWIF interface for
the new hardware be implemented. Once this interface exists, all software which uses the
XHWIF interface will run on the new hardware, without modification and without
recompilation. The XHWIF interface can be used to communicate with an FPGA board


once the XHWIF API is ported to that particular board. Once the API is ported to a
particular board, the various tools and applications that use the XHWIF API to perform
hardware board operations can run on the newly supported piece of hardware
without any recompilation. Finally, the XHWIF package also provides a client/server
program. This can be used to connect to hardware placed remotely on a network.
Moreover, an application that uses the XHWIF API can run on local and remote hardware
without modification. The XHWIF API provides a generic set of methods to interface
with FPGA hardware. These method calls can be used to communicate with a board
once the board interface is obtained. The first and foremost step in communicating with a
board using the XHWIF API is to obtain the interface specific to that board. The
XHWIF.Get() method is used for this purpose. The example code shows how to obtain
the board interface for a fictitious piece of hardware named "MyBoard". The board name can be
passed as a command-line parameter, but it is hard-coded here for illustration purposes.
Other operations are performed only after connecting to the board.
    String boardName = "MyBoard";
    XHWIF board = XHWIF.Get(boardName);
    /* Get the remote host name (if we have one) */
    String remoteHostName = XHWIF.GetRemoteHostName(boardName);
    /* Get the remote port number (if we have one) */
    int port = XHWIF.GetPort(boardName);
    /* Connect to the hardware */
    int result = board.connect(remoteHostName, port);


Once connected, information about the type and number of devices on the board can
be obtained. This information can be used to instantiate an appropriate JBits object and
read in the bit stream to be downloaded onto the board. Once connected, XHWIF
provides methods for resetting and configuring the FPGA device, clocking the design,
reading back the bit stream, and accessing the on-board memory.
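The idea that all board-specific detail sits behind one interface, so that applications run unchanged on any supported board, can be sketched in plain Java. The code below is an illustration under stated assumptions, not the XHWIF API: the Board interface, the SimulatedBoard class, the downloadAndVerify helper and the convention that 0 means success are all hypothetical.

```java
import java.util.Arrays;

/** Sketch of a board-abstraction interface in the spirit of XHWIF (not the real API). */
public class BoardInterfaceSketch {

    /** Operations an application needs; each board supplies its own implementation. */
    public interface Board {
        int connect(String host, int port);   // 0 on success, by assumption
        int reset();
        int setConfiguration(int device, byte[] data);
        byte[] getConfiguration(int device);
        int clockStep(int count);
        int disconnect();
    }

    /** In-memory stand-in for real hardware, useful for testing application logic. */
    public static final class SimulatedBoard implements Board {
        private boolean connected;
        private byte[] config = new byte[0];
        private long cycles;

        @Override public int connect(String host, int port) { connected = true; return 0; }
        @Override public int reset() { cycles = 0; return connected ? 0 : -1; }
        @Override public int setConfiguration(int device, byte[] data) {
            if (!connected) return -1;
            config = data.clone();
            return 0;
        }
        @Override public byte[] getConfiguration(int device) { return config.clone(); }
        @Override public int clockStep(int count) {
            if (!connected) return -1;
            cycles += count;
            return 0;
        }
        @Override public int disconnect() { connected = false; return 0; }
        public long cycles() { return cycles; }
    }

    /** Application code depends only on the Board interface, not on a concrete board. */
    public static boolean downloadAndVerify(Board board, byte[] bitstream) {
        if (board.connect("localhost", 0) != 0) return false;
        board.reset();
        boolean ok = board.setConfiguration(0, bitstream) == 0
                && Arrays.equals(board.getConfiguration(0), bitstream);
        board.disconnect();
        return ok;
    }

    public static void main(String[] args) {
        byte[] bitstream = {0x12, 0x34, 0x56};
        System.out.println(downloadAndVerify(new SimulatedBoard(), bitstream));
    }
}
```

Supporting a new board in this model means writing one more class that implements Board; downloadAndVerify and any other application code stay as they are.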
3.4.4 BoardScope
BoardScope is a graphical FPGA debugging environment that operates at the bit
stream level. Communication with hardware takes place using an underlying XHWIF
layer. This allows BoardScope to use any hardware platform that has been ported to
XHWIF. The BoardScope environment displays FPGA read-back information in a
graphical context. This information includes CLB flip-flop states, LUT configurations,
BRAM data, and IOB register states. The debugging process is started with a connection
to a supported FPGA hardware platform, either locally or remotely using a network
connection. After connecting to the desired hardware, bit streams can be loaded onto the
device. Debugging in BoardScope allows the user to switch between hardware platforms.
Figure 3.6 provides a screen shot of the BoardScope debugging environment, along with
captions that explain its features. The debugging environment features different graphical
views in which the operation of the FPGA hardware is shown in different contexts.
Possible views include State, Core, Power, and Routing Density. As an example, the
State view provides a graphical display of read-back state information. The main grid
representing the CLB layout shows the state information of all four flip-flops within a
single grid square. Clicking on a CLB grid square causes the look-up table configuration


to be displayed in the graphical CLB viewer. These views create a robust debugging
environment for RTR applications.

[Figure 3.6 callouts: clock step and readback; zoom; CLB state information (logic 0, logic 1); look-up table values; command window]

Figure 3.6: Snapshot of the BoardScope tool


CHAPTER 4

METHODOLOGY
4.1 VERILOG Implementation
This section discusses the implementation of the arithmetic entropy encoder in the
JPEG2000 standard in VERILOG. Once written in VERILOG, the encoder design can
be synthesized and programmed into the FPGA on the Memec Insight Virtex-II MB
board. The background issues related to the implementation are discussed first. Following
this, the design methodology for the VERILOG code is presented. The implementation
process involved several steps. These included an examination of the JPEG2000
standard, the design of the encoder data path and the design of an overall controller for
the encoder.
4.1.1 Arithmetic Entropy Encoder
As outlined in Section 2.2, the coefficient bit modelling stage of JPEG2000
compression involves encoding the quantized coefficients produced by the wavelet
transform. Coefficient bit modelling involves a series of 'passes' for each code block
bit-plane: the significance, refinement and cleanup passes. Each of these produces a series
of output binary symbols. It is these symbols which can then be compressed further by
passing them to the arithmetic entropy encoder (also just referred to as the arithmetic
encoder). The standard additionally allows an arithmetic-coding


bypass mode (also known as 'lazy' mode [28]), whereby the arithmetic encoding stage is
skipped for some coding passes [28] [29]. The bypass mode can be used to reduce the
computational complexity of the compression process [28]. This feature of the standard
reinforces the profiling results, which show that the arithmetic encoding stage is
computationally expensive and therefore well suited to hardware processing. The specific
type of arithmetic encoder used in the JPEG2000 standard is known as the 'MQ coder' [28].
4.1.2 Examination of the JPEG2000 Specification
The first step in the development process was to analyze thoroughly the JPEG2000
specification for the arithmetic entropy encoder. Analyzing the standard led to a good
knowledge of the internal workings of the arithmetic encoder as well as to initial ideas
about how a hardware implementation could be designed. The JPEG2000 standard
explains the basic data inputs and outputs of the encoder, as shown in Table 4.1. This table
gives a good description of the information passed to and from the encoder. The specific
binary inputs and outputs used to transmit the information will be covered later.
4.1.3 Data Path Design
After examining the JPEG2000 encoder implementation, work on the VERILOG
implementation began. The initial task was to design the data path
for the encoder. Firstly, the key registers in the data path were identified. Nearly all of
these registers corresponded to variables in the JPEG2000 specification. These main
registers are shown in Table 4.3.


Table 4.1: Arithmetic Encoder Main Data Inputs/Outputs

Data Input/Output   Description
Input Bit           The symbols produced by the coefficient bit modeling stage are
                    encoded one bit at a time.
Context             Extra information, passed in addition to the actual input bits, which is
                    required for compression is known as a 'context'. A context can be
                    specified with two pieces of information:
                    - An Index. This index points to an entry in a large look-up table of
                      constants used in the compression process. This table is Table C-2 in
                      the JPEG2000 FCD [29], also referred to in the VERILOG code as the
                      'Qe-value table'.
                    - A More-Probable Symbol (MPS). At any stage, the encoder considers
                      one of the binary digits 0 or 1 to be the more probable 'symbol' to be
                      encoded. The complementary binary symbol (0 or 1) is consequently
                      the Less-Probable Symbol (LPS).
Output Byte         Stream of compressed bytes of data produced by the entropy encoder
                    at its output.

Table 4.2: Arithmetic Encoder Tasks

Encoder Task   Description
Initialize     Clears the internal state of the encoder so that it is ready to start
               encoding bits.
Set Context    Sets the current encoder context that will be used to encode
               subsequent bits. The current context is part of the encoder's internal
               state.
Encode Bit     Encodes the next bit of input data. An Encode Bit task is usually,
               but not always, preceded by a Set Context task. A consequence of the
               Encode Bit task may be that the encoder outputs another byte of
               compressed data. However, this generally happens only after multiple
               Encode Bit tasks.
Flush          Signals the end of a stream of input bits. The Flush task generally
               causes the encoder to output multiple concluding bytes of compressed
               data. JPEG2000 supports two variations of termination for the Flush
               task.


The arithmetic encoding algorithm often involves operating on these data registers and
placing the results back into the same registers. All the arithmetic operations to be
performed on the registers were listed after examining the algorithm completely. Some
registers, such as the K and MPS registers, could be updated by only a few different
operations. Others, in particular the C register, could be updated by any one of a
substantial set of arithmetic calculations. Once the registers and the operations which acted
on them had been identified, the VERILOG code for the data path was written to
implement this computational functionality.

Table 4.3: Datapath Registers

Register   Bits   Description
C          32     Code register. Compressed data bytes are initially generated
                  in certain bits of this register.
A          32     Interval register for arithmetic encoding.
CT         5      Count register. Used to count the number of times the A and C
                  registers are bit-shifted under certain conditions.
MPS        1      Current More Probable Symbol.
Index      6      Current index into the look-up table.
B          8      Output byte buffer. The next compressed byte to be output
                  comes from this register.
Temp C     32     Temporary C register.
K          5      Used for the predictable termination variant of the Flush task.

Control signals were created to select the operation that was performed on a given
register on the next rising clock edge. During synthesis, large multiplexers would be
generated at the inputs of registers that could be updated by a great number of different
arithmetic operations. As the multiplexer size increases, so does its propagation delay. Of
all the registers in the encoder, the C register is associated with the largest number of
operations. As mentioned briefly in Section 2.3, the arithmetic entropy encoder also


requires the use of a large look-up table containing constants. These constants relate to a
probability estimation process that the encoder performs [29]; any change in the look-up
table results in a change of compression ratio. A grey-scale image can be compressed more
efficiently if a different look-up table is used, since we are dealing only with 256 shades of
grey. At this point it should be made clear that two different arithmetic encoder designs
were created, verilog_file_1.v and verilog_file_2.v. These two files differ only in their
probability look-up table.
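The register update scheme described above, in which control signals select which operation's result is latched on the next rising clock edge, can be modelled in a few lines of Java. This is only a behavioural sketch: the Op codes, the drive/clock method names and the choice of operations are hypothetical and are not taken from the VERILOG source, which is not reproduced in the text.

```java
/** Toy model of a datapath register whose next value is chosen by a control signal. */
public class RegisterMuxSketch {

    /** Hypothetical operation-select codes; the real design's signals are not given in the text. */
    public enum Op { HOLD, CLEAR, ADD, SHIFT_LEFT }

    private final int widthBits;
    private long value;          // current register contents
    private Op select = Op.HOLD; // control signal driven by the controller
    private long operand;

    public RegisterMuxSketch(int widthBits) {
        this.widthBits = widthBits;
    }

    /** The controller drives the select lines (and an operand) before the clock edge. */
    public void drive(Op op, long operand) {
        this.select = op;
        this.operand = operand;
    }

    /** Rising clock edge: the multiplexer output is latched, truncated to the register width. */
    public void clock() {
        long next;
        switch (select) {
            case CLEAR:      next = 0; break;
            case ADD:        next = value + operand; break;
            case SHIFT_LEFT: next = value << 1; break;
            default:         next = value; break;
        }
        long mask = (1L << widthBits) - 1;
        value = next & mask;
        select = Op.HOLD; // control signals fall back to "hold" unless driven again
    }

    public long value() { return value; }

    public static void main(String[] args) {
        RegisterMuxSketch ct = new RegisterMuxSketch(5); // CT is 5 bits wide in Table 4.3
        ct.drive(Op.ADD, 12);
        ct.clock();
        ct.drive(Op.SHIFT_LEFT, 0);
        ct.clock();
        System.out.println(ct.value());
    }
}
```

Each case in the switch corresponds to one multiplexer input, so a register with many possible operations, such as C, ends up with the widest multiplexer and the longest propagation delay.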
4.1.4 Designing the Controller
Once the data path was in place, the controller for the VERILOG arithmetic entropy
encoder was designed. The controller was responsible for translating the four encoder
tasks in Table 4.2 into a series of operations on the data path registers. The JPEG2000
specification clearly states all the operations that must be performed on each register in
order to produce correct compressed data. The JPEG2000 standard makes heavy use of
flowcharts to specify this internal behavior. A significant advantage of the hardware
implementation over the software implementation is the ability of the hardware to
perform parallel processing. Operating on two or more registers in the same clock cycle
allows faster performance to be achieved than if only one register can be modified at a
time. Designing the controller involved analyzing the flowcharts in the standard to
determine which operations could be performed in parallel and which could be safely
executed only in sequence. Additionally, it was required that the VERILOG design be
written as synthesizable code. The primary requirement of the VERILOG implementation
was one of compliance. A synthesis tool such as Xilinx ISE had to be able to compile
the VERILOG arithmetic encoder so that the Virtex-II FPGA could be programmed with the


design. Once the code for the arithmetic encoder was written, a test bench was written
(also in the VERILOG language) to test the VERILOG encoder with test data. The test
bench was simulated using Aldec's Active-HDL tool. This test was passed
successfully: the output produced by the VERILOG implementation matched exactly the
expected output stream of bytes.

Table 4.4: Outputs of the Arithmetic Encoder VERILOG Entity

Output    Size   Description
boutreg   8      Bytes of compressed data are output from this port.
encrd     1      Equals '1' when the arithmetic encoder is idle and ready to
                 perform another task.
bready    1      Usually '0', this signal is set to '1' for one clock cycle when
                 a new byte of compressed data appears on boutreg.
cout      32     These five signals (cout, aout, ctout, mpsout, indout) are provided
aout      32     as outputs in order that the internal state of the arithmetic encoder
ctout     5      can be observed at any point in time. These outputs are directly
mpsout    1      connected to the associated registers in the encoder data path.
indout    6

4.2 Synthesis and Power Estimation of the Design
The Xilinx Integrated Software Environment (ISE) 5, along with Xilinx Power
(XPower), is used for design synthesis, configuration and estimation of power. Xilinx
ISE 5 uses the steps shown in Figure 4.1 to generate the files required for XPower to
compute power. Once tested for its correctness, the code had to be synthesized. Xilinx's
ISE 5.0 synthesis and implementation tool was used to synthesize the code. During
synthesis, we create a new project and specify the device as the Virtex-II XC2V1000 with
package FG456, where FG stands for the fine-pitch ball grid array package type and 456
stands for the number of package pins.


The first step is design entry; here we include the Verilog code describing the
encoder. Once the design has been entered it is synthesized, during which the Verilog
code is converted to a netlist file which is passed on as input to the next step. After
synthesis, design implementation is run, which converts the logical design into a
physical file format that can be downloaded to the selected target device. The implementation
process consists of three steps: Translate, Map, and Place and Route. The Translate
process merges all of the input netlists and design constraints and outputs a Xilinx native
generic database (NGD) file, which describes the logical design reduced to Xilinx
primitives. The Map process maps the logic defined by an NGD file into FPGA
elements, such as CLBs and IOBs. The output design is a native circuit description
(NCD) file that physically represents the design mapped to the components in the Xilinx
FPGA. An ASCII Physical Constraints File (PCF) is also produced by MAP. The PCF
contains timing constraints that are used by the power estimation tool XPower to identify
clock nets. It also provides temperature and voltage information, if these constraints have
been set in the User Constraints File (UCF). The Place and Route process takes a
mapped NCD file, places and routes the design, and produces an NCD file that is used as
input for bit stream generation. The next step is the Generate Programming File process,
which produces a bit stream for Xilinx device configuration.
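The hand-off of files between the stages described above can be summarised as a small data model. The sketch below is only a restatement of the text in Java: the Stage enum and its field names are hypothetical, the file labels are the ones named in the paragraph, and no Xilinx tool is invoked.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the implementation flow as an ordered list of stages and the files they hand on. */
public class ImplementationFlowSketch {

    public enum Stage {
        SYNTHESIS("Verilog", "netlist"),
        TRANSLATE("netlist", "NGD"),
        MAP("NGD", "NCD"),              // MAP also writes a PCF file
        PLACE_AND_ROUTE("NCD", "NCD"),
        GENERATE_PROGRAMMING_FILE("NCD", "BIT");

        public final String input;
        public final String output;

        Stage(String input, String output) {
            this.input = input;
            this.output = output;
        }
    }

    /** Returns true when every stage consumes what the previous stage produced. */
    public static boolean isChained() {
        Stage[] stages = Stage.values();
        for (int i = 1; i < stages.length; i++) {
            if (!stages[i].input.equals(stages[i - 1].output)) {
                return false;
            }
        }
        return true;
    }

    /** Lists the file types produced along the way, in order. */
    public static List<String> artifacts() {
        List<String> out = new ArrayList<>();
        for (Stage s : Stage.values()) {
            out.add(s.output);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(artifacts() + " chained=" + isChained());
    }
}
```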


[Figure 4.1 block labels: Verilog Design Entry; Synthesis; Map; Place and Route; Timing Simulation; NCD & PCF; VCD; XPower; iMPACT; Download to FPGA]

Figure 4.1: Design Flow for Power Calculation and FPGA Configuration

After the design is completely routed, the device must be configured so that it can execute
the desired function. The generated bit stream is used to configure the FPGA using iMPACT. The
design was checked for syntax errors, and the post-implementation verification was done
using the floorplan. The design passed all the verifications. Once the design has been
synthesized and implemented, we calculate the power using Xilinx's XPower tool.
The timing simulations done in Active-HDL are verified using the waveform editor.
The waveforms are exported as Value Change Dump (VCD) files. In the last step,
XPower uses the files generated in the previous steps to estimate the power consumed by the
design. XPower calculates the power in the design by summing up the power consumed


by each element. The power consumed by each switching element in the design is given
by the equation

$$P = C \cdot V^{2} \cdot E \cdot F \qquad (4.1)$$

where P represents the power in mW, C represents the capacitance in Farads, V
represents the voltage in Volts, E represents the switching activity (average number of
transitions per clock cycle), and F represents the frequency in Hz.
The capacitance is determined from the NCD file. The voltage is a fixed value for a
specific device and it is set by default in the XPower interface. F * E gives the activity
rate of the signal in the design. The activity rate is defined as the rate at which a net or
logic element switches. For dynamic power calculation, activity rates are expressed as a
frequency, which is the most variable element in Equation 4.1. The activity rates are set
in the VCD file generated from the timing simulation in Active-HDL.
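As a worked illustration of Equation 4.1, the sketch below evaluates the formula for one element and sums it over several. The numbers are made up for the example (a 10 pF net, a 1.5 V supply, 0.25 transitions per cycle and a 50 MHz clock) and are not XPower results; the class and method names are hypothetical. With capacitance in farads, voltage in volts and frequency in hertz the product comes out in watts, so the sketch multiplies by 1000 to report milliwatts.

```java
/** Worked example of Equation 4.1 with made-up numbers (not XPower output). */
public class SwitchingPowerSketch {

    /** Returns C * V^2 * E * F in watts when C is in farads, V in volts and F in hertz. */
    public static double switchingPowerWatts(double capacitanceF, double voltageV,
                                             double activity, double frequencyHz) {
        return capacitanceF * voltageV * voltageV * activity * frequencyHz;
    }

    /** Sums the contribution of several elements; each row is {C, V, E, F}. */
    public static double totalPowerWatts(double[][] elements) {
        double total = 0.0;
        for (double[] e : elements) {
            total += switchingPowerWatts(e[0], e[1], e[2], e[3]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed values: 10 pF net, 1.5 V supply, 0.25 transitions per cycle, 50 MHz clock.
        double watts = switchingPowerWatts(10e-12, 1.5, 0.25, 50e6);
        System.out.printf("%.6f mW%n", watts * 1000.0);
    }
}
```

For these assumed values the single element dissipates roughly 0.28 mW; doubling either the activity or the frequency doubles the result, while doubling the voltage quadruples it.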

4.3 JBits and XHWIF
Once we had a working model of the arithmetic encoder synthesized and downloaded
onto the FPGA, JBits was used to reconfigure the FPGA dynamically. Because little
documentation is available on the latest version of JBits, considerable effort went into
studying the API to understand its facilities. The first and major step in using JBits is the
porting of the Xilinx Hardware Interface (XHWIF) onto the board. The most challenging
aspect of porting to new hardware in the XHWIF environment is the use of the Java
programming language. The
current method used by Java to provide support for non-Java implementations is called
the native method. A Java native method is simply a function declared in a Java
program and tagged with the reserved keyword native. This means that no Java code


will be supplied for this function. Instead, Java will search some user-supplied library for
the code which implements this "native" function. This approach is the technique used
by Java to interface to other languages such as C, which in turn is used to interface to
device drivers, XC2V1000 hardware or other non-Java software.
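A minimal, stand-alone illustration of the native keyword follows. It is a sketch under stated assumptions rather than the thesis code: the class NativeMethodSketch, the method boardConnect and the library name "myboard" are placeholders. The example relies only on standard Java behaviour: System.loadLibrary throws UnsatisfiedLinkError when the library cannot be found, and so does a call to a native method that has no linked implementation.

```java
/** Sketch of a Java class that declares a native method (hypothetical names throughout). */
public class NativeMethodSketch {

    // Declared in Java, implemented in a C library; there is no Java body.
    public native int boardConnect();

    /** Tries to load the native library; returns false when it cannot be found. */
    public static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    /** Calls the native method, reporting -1 if no implementation has been linked. */
    public int connectOrFail() {
        try {
            return boardConnect();
        } catch (UnsatisfiedLinkError e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // "myboard" is a placeholder library name; on most systems it will not exist.
        boolean loaded = tryLoad("myboard");
        System.out.println("library loaded: " + loaded);
        System.out.println("connect result: " + new NativeMethodSketch().connectOrFail());
    }
}
```

The class compiles and loads without the C library being present; the missing implementation only surfaces when the library is loaded or the native method is actually called.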
Porting XHWIF can basically be divided into two parts. The first part deals with the
board description code, which tells XHWIF which FPGA devices and packages are used
by the hardware; the second part deals with the Java native methods and the creation of
the library. Accordingly, we created xc2v1000.java, a board-specific Java file. This
file contains the basic information about the board: the FPGA device on the board and
its package type (FG456 in our case), together with their descriptions. Once
xc2v1000.java is written, it is compiled using the javac xc2v1000.java command to obtain the
class file. This class file is stored in the JBits3.0/com/xilinx/XHWIF/Boards directory.
Once these constants are defined, you may compile this file to produce a Java class
file. Note that your compiler may require that other XHWIF classes be available on the
class path. See your compiler documentation if it complains about missing classes. Once
you have produced this class file, it should be moved to the XHWIF/Boards directory.
Once the board has been described and the XHWIF interface compiled into a class
file, the Java native methods must be implemented. These native methods are the
low-level, system-dependent functions which provide access to the hardware. The
following native methods must be implemented to provide full XHWIF support:
public native int xc2v1000Connect();
public native int xc2v1000Disconnect();
public native int xc2v1000GetSystemInfo(int data[], int length);
public native int xc2v1000Reset();
public native int xc2v1000SetClockFrequency(float frequency);
public native int xc2v1000ClockOn();
public native int xc2v1000ClockOff();
public native int xc2v1000ClockStep(int count);
public native int xc2v1000GetConfiguration(int device, byte data[], int length);
public native int xc2v1000SetConfiguration(int device, byte data[], int length);
public native int xc2v1000GetRAM(int address, byte data[], int length);
public native int xc2v1000SetRAM(int address, byte data[], int length);
Using the javah xc2v1000 command, we produced the xc2v1000.h C header files, which
describe the C functions that are called by the Java native methods. Once these header
files were produced, we wrote xc2v1000.c, the C code that implements these functions. The
actual code implementing the hardware functionality is placed in a single C file
containing "generic" functions that are referenced by the JNI layer. Files generated:
• xc2v1000.h: The xc2v1000.h file is simply a small file containing the function
  prototypes for the xc2v1000.c file.
• xc2v1000.c: This file contains all of the functions which must be implemented to
  produce the C library.
• xc2v1000JNI.h: This is the header file that is automatically generated by the javah
  utility from the Sun JDK.
• xc2v1000JNI.c: This provides the bridge between the automatically generated
  interface and the more generic functions in xc2v1000.c. This file can easily be
  produced by taking the function prototypes from xc2v1000JNI.h.
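
The prototypes that javah writes into the generated header follow the JNI naming rule: the C symbol is "Java_", then the mangled fully qualified class name, then an underscore and the mangled method name. The Java sketch below reproduces that rule for non-overloaded methods. The package name com.xilinx.XHWIF.Boards is inferred from the directory named in the text and is an assumption, as are the class and helper names in the sketch.

```java
// Sketch only: reproduces the JNI rule for naming the C function behind a native method.
// The package name used in main is inferred from the directory in the text (assumption).
final class JniNameSketch {

    /** Escapes one name following the JNI mangling rules. */
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                out.append('_');                              // package separator
            } else if (c == '_') {
                out.append("_1");                             // literal underscore
            } else if (c == ';') {
                out.append("_2");
            } else if (c == '[') {
                out.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                out.append(c);
            } else {
                out.append(String.format("_0%04x", (int) c)); // any other Unicode character
            }
        }
        return out.toString();
    }

    /** Builds the C symbol for a non-overloaded native method. */
    static String cSymbol(String qualifiedClassName, String methodName) {
        return "Java_" + mangle(qualifiedClassName) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(cSymbol("com.xilinx.XHWIF.Boards.xc2v1000", "xc2v1000Connect"));
    }
}
```

Under these assumptions, the bridge file would define one C function per native method with the symbol produced this way, and each would forward to the corresponding generic function in xc2v1000.c.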


All the files are copied to the com/xilinx/XHWIF/Boards directory, where they are
compiled to produce the libraries required by the Java native methods. The xc2v1000.dll
library file produced is copied to the Java 1.4.2/bin directory so that it can be accessed
by any JBits program related to the board. In summary, the port consists of four steps.
First, create a subclass in the com.xilinx.XHWIF package that describes the details of
the new hardware platform; here this class is called "xc2v1000". Second, compile this
interface, xc2v1000.java, into an xc2v1000.class file and place the class file in the
"XHWIF/Boards/" directory. Third, produce the Java native method interface for the
xc2v1000 class by running Sun's javah. Finally, implement the Java native method
interfaces in C, compile them into one or more libraries, and place these libraries in an
appropriate directory (typically one in the search path).
Once XHWIF was ported, code was written using the JBits API to read the bitstream
from the board and load a new, modified bitstream onto the board. During the load stage
we make sure that the clock has been stopped. The new bitstreams can also be loaded using
BoardScope.
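
The read-back and reload sequence described above can be sketched as follows. BoardLink and InMemoryBoard are hypothetical stand-ins whose method names mirror the native methods listed earlier; they are not the JBits or XHWIF API, and the in-memory board exists only so that the sequence can be exercised without hardware.

```java
import java.util.Arrays;

// Sketch only: BoardLink and InMemoryBoard are hypothetical stand-ins that mirror the
// native method names in the text. They are not the JBits or XHWIF API.
final class ReloadSketch {

    /** Stops the clock, reads back the current bitstream, loads the new one, restarts the clock. */
    static byte[] swap(BoardLink board, int device, int oldLength, byte[] newBitstream) {
        board.clockOff();                                   // clock must be stopped during the load
        byte[] previous = new byte[oldLength];
        board.getConfiguration(device, previous, oldLength);
        int status = board.setConfiguration(device, newBitstream, newBitstream.length);
        board.clockOn();
        if (status != 0) {
            throw new IllegalStateException("load failed with status " + status);
        }
        return previous;
    }

    public static void main(String[] args) {
        InMemoryBoard board = new InMemoryBoard(new byte[] {1, 2, 3});
        byte[] old = swap(board, 0, 3, new byte[] {9, 8, 7, 6});
        System.out.println("previous configuration: " + Arrays.toString(old));
    }
}

interface BoardLink {
    int clockOff();
    int clockOn();
    int getConfiguration(int device, byte[] data, int length);
    int setConfiguration(int device, byte[] data, int length);
}

final class InMemoryBoard implements BoardLink {
    private byte[] config;
    private boolean clockRunning = true;

    InMemoryBoard(byte[] initial) { config = initial.clone(); }

    boolean isClockRunning() { return clockRunning; }

    public int clockOff() { clockRunning = false; return 0; }

    public int clockOn() { clockRunning = true; return 0; }

    public int getConfiguration(int device, byte[] data, int length) {
        System.arraycopy(config, 0, data, 0, length);
        return 0;
    }

    public int setConfiguration(int device, byte[] data, int length) {
        if (clockRunning) return -1;                        // refuse to load while the clock runs
        config = Arrays.copyOf(data, length);
        return 0;
    }
}
```

The stand-in board rejects a load while its clock is running, which makes the ordering requirement from the text explicit in the sketch.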


CHAPTER 5

RESULTS AND DISCUSSIONS
As mentioned earlier, the arithmetic entropy encoder used in JPEG 2000 is known as
an ‘MQ encoder’ and is also used in the JBIG2 compression standard. While the
JPEG 2000 Final Committee Draft (FCD) [29] contains no test data for the arithmetic
encoder, such data is found in the JBIG2 Final Committee Draft [38]. As the arithmetic
encoders used in JPEG 2000 and JBIG2 are compatible, the test data in the JBIG2 FCD
can be used to test the Verilog encoder developed in this project. The test data
sequence from JBIG2 consists of a series of input bits that are fed to the encoder, as well
as the sequence of output bytes that a correct implementation should produce. One small
difference between the arithmetic encoder specifications in the two standards should be
noted: the ‘Flush’ task in the JBIG2 FCD results in the constant bytes 0xFF and 0xAC
being output at the conclusion of the ‘Flush’ procedure, whereas the JPEG 2000
specification does not result in these two concluding bytes being produced.
Consequently, when testing the Verilog encoder implementation, the sequence of
expected output from the JBIG2 FCD was adjusted to remove the trailing 0xFF and
0xAC bytes. This was the only modification made to the test data. A test bench was
written (also in Verilog) to test the Verilog encoder with the JBIG2 test data. The test
bench was simulated in Active-HDL. This test was passed successfully: the output
produced by the Verilog implementation matched exactly the expected output stream of
bytes obtained from the JBIG2 FCD. Both versions of the code have been tested, and
their simulation waveforms are nearly the same.
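
The adjustment made to the expected output can be expressed as a small check. The following Java sketch strips a trailing 0xFF 0xAC pair before comparing; the class and method names are hypothetical, and the byte values in main are placeholders rather than the JBIG2 FCD test vector.

```java
import java.util.Arrays;

// Sketch only: class and method names are hypothetical, and the byte values in main
// are placeholders, not the JBIG2 FCD test vector.
final class FlushTrailerSketch {

    /** Returns a copy of the expected output without a trailing 0xFF 0xAC pair, if present. */
    static byte[] stripFlushTrailer(byte[] expected) {
        int n = expected.length;
        if (n >= 2 && (expected[n - 2] & 0xFF) == 0xFF && (expected[n - 1] & 0xFF) == 0xAC) {
            return Arrays.copyOf(expected, n - 2);
        }
        return expected.clone();
    }

    /** True when the encoder output equals the adjusted expected sequence. */
    static boolean matches(byte[] encoderOutput, byte[] jbig2Expected) {
        return Arrays.equals(encoderOutput, stripFlushTrailer(jbig2Expected));
    }

    public static void main(String[] args) {
        byte[] expected = {0x12, 0x34, (byte) 0xFF, (byte) 0xAC}; // placeholder data
        byte[] produced = {0x12, 0x34};                           // placeholder data
        System.out.println("match: " + matches(produced, expected));
    }
}
```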

Figure 5.1: Simulation waveforms of the verilog_file_1.v code

Once the code was tested and simulated, we exported the .vcd file for power
calculation. The Verilog implementations of the arithmetic encoder were synthesized
successfully using the Xilinx ISE 5 software. The place and route procedure was also
performed in order to generate timing information. For place and route, the FPGA
targeted was a Virtex-II xc2v1000 device of speed grade -5, as found on the board. The
maximum clock frequency reported by Xilinx ISE 5 was 73.43 MHz for the Verilog_file_1
encoder and 81.3 MHz for the Verilog_file_2 encoder design. Once the designs were
placed and routed, we verified them using the floorplan, which gives a better picture of
the resources utilized by the designs and of all the input and output ports used.

Table 5.1: Resource utilization of the designs on XC2V1000 FPGA

                               Verilog_file_1    Verilog_file_2
Number of Slices                    490               444
Number of Slice Flip Flops          235               214
Number of 4 input LUTs              861               823
Number of Board IOBs                100               100

The floorplan also gives us an idea of the routing resources used; all the wires and
CLBs used are concentrated in a single location, which indicates the modularity of the
design. We also see that the design occupies less than half of the resources available.
Since the two versions of the code are essentially the same except for the lookup table,
we have not provided the floorplan for the second one. The similarity of the designs is
evident from their synthesis reports, which are tabulated in Tables 5.1 and 5.2; there is
only a small difference in the resource and power consumption of the designs. The design
verilog_file_1.v consumes more resources and more dynamic power than verilog_file_2.v,
since the lookup table used in this design has more entries than the one used in
verilog_file_2.v.
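
To put the slice counts of Table 5.1 in proportion, the sketch below computes device utilization as a percentage. The capacity of 5,120 slices for the XC2V1000 is an assumption taken from the Virtex-II data sheet rather than from this text, and the class and method names are hypothetical.

```java
// Sketch only: the slice capacity is an assumption from the Virtex-II data sheet,
// not a figure stated in this text. Names are hypothetical.
final class UtilizationSketch {

    static final int XC2V1000_SLICES = 5120; // assumed device capacity

    /** Percentage of a resource that is in use. */
    static double percentUsed(int used, int available) {
        if (available <= 0) {
            throw new IllegalArgumentException("available must be positive");
        }
        return 100.0 * used / available;
    }

    public static void main(String[] args) {
        System.out.println("Verilog_file_1 slices (%): " + percentUsed(490, XC2V1000_SLICES));
        System.out.println("Verilog_file_2 slices (%): " + percentUsed(444, XC2V1000_SLICES));
    }
}
```

Under that assumption both designs use roughly a tenth of the slices, which is consistent with the statement that the design occupies less than half of the available resources.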
Once the resource and power data were tabulated, the JBits XHWIF was created for the
Virtex-II xc2v1000 board. Although the library files were created successfully, the
interface unfortunately could not be ported to the board, since the Xilinx Hardware
Interface for JBits 3.0 was missing some directory files for some boards, which were
still under development.

Table 5.2: Dynamic power consumption of the designs on XC2V1000 FPGA

                               Verilog_file_1    Verilog_file_2
Dynamic Power (mW)                 30.12             26.67
Current (mA)                       12.5              10.51

Figure 5.2: Floorplan for verilog_file_1 design on XC2V1000 FPGA

Since we were unable to connect to the board using XHWIF on JBits 3.0, we reverted to
the previous version, JBits 2.8, since it includes the BoardScope GUI and the Virtex
Device Simulator, which acts as a virtual board. JBits 2.8 is not compatible with
Virtex-II FPGAs but is compatible with Virtex FPGAs, so we selected the xcv1000 board
for our test, since it has nearly the same number of gates as the xc2v1000.

Figure 5.3: Routing congestion for verilog_file_1.v design on xc2v1000 FPGA

Table 5.3: Resource utilization of the designs on XCV1000 FPGA

                               Verilog_file_1    Verilog_file_2
Number of Slices                    486               449
Number of Slice Flip Flops          235               214
Number of 4 input LUTs              844               819
Number of Board IOBs                100               100

Since the board was changed, we repeated the synthesis, power analysis, and floorplan
generation for the new Xilinx board. The data and simulation results are given in
Tables 5.3 and 5.4.


Table 5.4: Dynamic power consumption of the designs on XCV1000 FPGA

                               Verilog_file_1    Verilog_file_2
Dynamic Power (mW)                 61.51             65.67
Current (mA)                       24.61             26.67

Figure 5.4: Floorplan of the verilog_file_1.v design on xcv1000 FPGA

Compared to the Virtex-II board, the Virtex board consumes more power, and the
maximum clock frequency at which it operates is lower than that of the Virtex-II board.
Since our main aim is to dynamically reconfigure the device, we used BoardScope to
connect to the new board through the Virtex Device Simulator, which acts as a virtual
board. The bitstreams were loaded onto the virtual board using BoardScope, and the
routing density was captured using the Routing Density Viewer. We see that the results
are similar to the floorplan obtained through Xilinx ISE 5.

Figure 5.5: Routing congestion for the design verilog_file_1.v on xcv1000 FPGA


Figure 5.6: Design view of verilog_file_1.v on xcv1000 FPGA using BoardScope

Figure 5.7: Design view of verilog_file_2.v on xcv1000 FPGA using BoardScope


Since the board is virtual, we could not obtain the power results as planned for the
thesis. It is possible to get the approximate power consumption using the BoardScope
power viewer, but it is an exhaustive process: we would have to probe all the CLBs
present in the FPGA to get the number of flip-flops toggling every clock cycle, manually
calculate the power by multiplying each toggle rate by a fixed value, and add these power
values to get the total power. Since the resulting power value would not have the
accuracy we would expect after such a tedious job of probing nearly 250 flip-flops, the
power calculation was not carried out. The API code written to swap the designs could
not be tested for lack of a board, but with the Virtex Device Simulator in conjunction
with BoardScope we could simulate the loading of the bitstreams. We first loaded the
bitstream of the first design onto the virtual hardware and then, without resetting the
device, replaced it with the second design and observed the result using the Routing
Density Viewer. The screenshots clearly show that when an existing design is replaced by
another one, the new design is placed in a different location, not where the first one
existed. This shows that we do incur an area overhead during runtime reconfiguration of
the device. Even though this is not dynamic reconfiguration as we know it, it is very
close to it, since only the hardware is missing from the whole environment. A successful
port of the XHWIF interface to the board should make it easier to reconfigure the board
at runtime.
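
The manual estimate described above, multiplying each probed toggle rate by a fixed value and adding the results, could be scripted along the following lines. This is a minimal sketch: the class and method names are hypothetical, and the per-toggle constant and the units are placeholders, since the text does not state the fixed value.

```java
// Sketch only: names are hypothetical, and the per-toggle constant is a placeholder
// because the text does not state the fixed value or its units.
final class TogglePowerSketch {

    /** Sums toggleRate * powerPerToggle over all probed flip-flops. */
    static double estimateTotalPower(double[] toggleRates, double powerPerToggle) {
        double total = 0.0;
        for (double rate : toggleRates) {
            if (rate < 0) {
                throw new IllegalArgumentException("toggle rate must not be negative");
            }
            total += rate * powerPerToggle;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] probedRates = {0.5, 0.25, 1.0}; // placeholder toggle rates per clock cycle
        System.out.println("estimated total: " + estimateTotalPower(probedRates, 2.0));
    }
}
```

The arithmetic is trivial; the cost identified in the text lies in gathering a toggle rate for every flip-flop by hand.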


CHAPTER 6

CONCLUSION
An implementation of the arithmetic encoder was written as a Verilog module. This
module was extensively tested in simulation using a test bench. The Verilog encoder
passed all tests as a working implementation of the JPEG 2000 standard. The module was
successfully synthesized and used to configure a Virtex-II FPGA. All timing information
and resource and power utilization results were calculated successfully.

A new tool for runtime reconfiguration was tested and used to reconfigure a virtual
Virtex FPGA dynamically. Even though this was a complete replacement of the design on
virtual hardware, it was done without resetting the FPGA, which is dynamic in nature.
We could not connect to the board, since Xilinx's Hardware Interface could not be
ported to the specified board. Even though JBits provides access to all the resources in
an FPGA, it is most useful and effective when a separate RTP core is written for a design
rather than when a bitstream generated by traditional HDL tools is used, since with an
RTP core the resources used and their locations on the FPGA are known and user defined.
This leads us to suggest future work in which an RTP core for the arithmetic encoder is
created and reconfigured using JBits, and used as a co-processor in JPEG 2000 image
processing.


REFERENCES
[1] S. Davis. Total Cost of Ownership: Xilinx FPGAs vs. Traditional ASIC Solutions.
Xilinx, Inc., Feb. 2000.
[2] S. Govindarajan, I. Ouaiss, M. Kaul, V. Srinivasan, and R. Vemuri. An effective
design approach for dynamically reconfigurable architectures. In Proc. IEEE Symp. on
Field-Programmable Custom Computing Machines, 1998.
[3] J. R. Hauser and J. Wawrzynek. Garp: A MIPS processor with a reconfigurable
coprocessor. In Proc. IEEE Symp. on Field-Programmable Custom Computing Machines,
pp. 12-21, 1998.
[4] T. J. Callahan and J. Wawrzynek. Instruction-level parallelism for reconfigurable
computing. In Proc. International Workshop on Field Programmable Logic, 1998.
[5] M. Kaul, R. Vemuri, S. Govindarajan, and I. Ouaiss. An automated temporal
partitioning and loop fission approach for FPGA based reconfigurable synthesis of DSP
applications. In Proc. Design Automation Conference, pp. 616-622, 1999.
[6] P. Bellows and B. Hutchings. JHDL - an HDL for reconfigurable systems. In Proc. IEEE
Symp. on Field-Programmable Custom Computing Machines, 1998.
[7] B. Blodget, S. McMillan, and P. Lysaght. A lightweight approach for embedded
reconfiguration of FPGAs. 1991.
[8] P. French and R. W. Taylor. A self-reconfiguring processor, pages 50-59. Proceedings
of IEEE Workshop on FPGAs for Custom Computing Machines, 1993.
[9] S. Guccione and D. Levi. Run-time parameterizable cores, pages 215-222. IEEE
Symposium on Field Programmable Logic and Application, 1999.
[10] D. T. Hoang. Searching genetic databases on Splash2, pages 185-191. Proceedings of
IEEE Workshop on FPGAs for Custom Computing Machines, D. A. Buell and K. L. Pocek, 1993.
[11] P. Lysaught, J. Stockwood, J. Law, and D. Girma. Artificial neural network
implementation on a fine-grained FPGA, 1994.
[12] D. Ross, O. Vellacott, and M. Turner. An FPGA-based hardware accelerator for
image processing. More FPGAs: Proceedings of the 1993 International Workshop on
Field-Programmable Logic and Applications, pages 299-306, 1993.
[13] S. Tapp. Configuration quick start guidelines. XAPP151, July 2003.
[14] ASIC/FPGA Design Section Head, MBDA France; Gerard Habay, Technical Manager,
Deltatec; Alain Rachet, Senior ASIC/FPGA Designer, MBDA France. Managing Partial Dynamic
Reconfiguration in Virtex-II Pro FPGAs.
[15] Bingfeng Mei, Serge Vernalde, Hugo De Man, and Rudy Lauwereins. Design and
Optimization of Dynamically Reconfigurable Embedded Systems. IMEC vzw, Kapeldreef 75,
B-3001, Leuven, Belgium; Department of Electrical Engineering, K. U. Leuven, B-3001,
Leuven, Belgium.
[16] Fabrizio Ferrandi, Marco D. Santambrogio, and Donatella Sciuto. A Design
Methodology for Dynamic Reconfiguration: The Caronte Architecture. Politecnico di Milano,
Milano, Italy.
[17] Duncan A. Buell, Jeffrey M. Arnold, and Walter J. Kleinfelder. Splash2, FPGAs in
Custom Computing Machine. IEEE Computer Society Press, 1996.
[18] J. Vuillemin, P. Bertin, D. Roncin, M. Shand, H. Touati, and P. Boucard.
Programmable Active Memories: Reconfigurable Systems Come of Age. IEEE
Transactions on VLSI, 4(1), March 1996.
[19] M. Kaul and R. Vemuri. Optimal temporal partitioning and synthesis for
reconfigurable architectures. In Design, Automation and Test in Europe, pp. 389-396,
1998.
[20] K. Purna and D. Bhatia. Temporal partitioning and scheduling data flow graphs
for reconfigurable computers. IEEE Transactions on Computers, 48:579-590, 1999.
[21] M. Eisenring and M. Platzner. Synthesis of interfaces and communication in
reconfigurable embedded systems. IEE Proceedings - Computers and Digital
Techniques, 147:159-165, 2000.
[22] Quickturn Design Systems, www.quickturn.com.
[23] Xilinx, Inc. JBits SDK, February 2001, www.xilinx.com/products/software/jbits/
[24] Steven A. Guccione, Delon Levi, and Prasanna Sundararajan. JBits: A Java-based
Interface for Reconfigurable Computing. 2nd Annual Military and Aerospace
Applications of Programmable Devices and Technologies Conference (MAPLD ’99),
Sept. 1999.
[25] Cameron Patterson and Steven A. Guccione. JBits Design Abstractions. Xilinx,
Inc., 2100 Logic Drive, San Jose, CA, 95124 (USA).
{Cameron.Patterson,Steven.Guccione}@xilinx.com
[26] Steven A. Guccione and Delon Levi. Run-time parameterizable cores. In Patrick
Lysaght, James Irvine, and Reiner W. Hartenstein, editors, Field-Programmable Logic
and Applications, pages 215-222. Springer-Verlag, Berlin, August/September 1999.
Proceedings of the 9th International Workshop on Field-Programmable Logic and
Applications, FPL 1999. Lecture Notes in Computer Science 1673.
[27] Karthikeya M. Gajjala Purna and Dinesh Bhatia. Partitioning in Time: A Paradigm for
Reconfigurable Computing. Design Automation Laboratory, University of Cincinnati,
Cincinnati, OH 45221-0030. {kgajjala,dinesh}@ececs.uc.edu
[28] M. D. Adams, “The JPEG-2000 Still Image Compression Standard”, M. D. Adams
(author of Jasper software), URL: http://www.ece.ubc.ca/~mdadams/papers/jpeg20002001-06-30.ps.gz, (Last accessed 22 August 2001).
[29] ISO 15444-1 Final Committee Draft (FCD) Version 1.0, Information Technology -
JPEG 2000 Image Coding System, International Standards Organisation, March 2000.
URL: http://www.jpeg.org/public/fcd15444-1.pdf, (Last accessed 3 October 2001).
[30] James Brennan. FPGA Coprocessing in a JPEG 2000 Implementation. School
of Information Technology and Electrical Engineering, University of Queensland.
[31] Jonathan B. Ballagh. An FPGA-based Run-time Reconfigurable 2-D Discrete Wavelet
Transform Core. Virginia Polytechnic Institute and State University.
[32] Huakai Zhang and Jason Fritts. EBCOT coprocessing architecture for JPEG2000.
Dept. of Computer Science and Engineering, Washington University, One Brookings
Drive, St. Louis, MO, USA 63130.
[33] Steve Guccione, Delon Levi, and Prasanna Sundararajan. JBits: Java based interface
for reconfigurable computing.
[34] Matthew Scarpino. The hoplite guide to run-time reconfigurable computing.
Revision 1.0 - 8.13.04.
[35] Delon Levi and Steven A. Guccione. BoardScope: A Debug Tool for Reconfigurable
Systems.
[36] Eric Keller. JRoute: A Run-Time Routing API for FPGA Hardware.
[37] Jonathan Ballagh, Peter Athanas, and Eric Keller. Java Debug Hardware Models using
JBits. Virginia Tech, 340 Whittemore Hall, Blacksburg, VA 24061.
[38] ISO 14492 Final Committee Draft (FCD), Information Technology - Coded
Representation Of Picture And Audio Information - Lossy/Lossless Coding Of Bi-Level
Images (JBIG2), International Standards Organization, July 1999. Available at
http://www.jpeg.org/public/fcd14492.pdf, (Last accessed 3 October 2001).
[39] H. L. Resnikoff and R. O. Wells Jr., Wavelet Analysis - The Scalable Structure of
Information, Springer, New York, 1998.
[40] http://www.xilinx.com
[41] http://www.Memec.com - Semiconductor Distributor - Programmable Logic,
Analog, ASSP.htm


VITA

Graduate College
University of Nevada, Las Vegas
Local Address:
4223 Grove Circle, Apt # 3
Las Vegas, Nevada 89119
Degrees:
Bachelor of Engineering, Education, 1999
University of Madras, Chennai, India
Master of Science in Electrical Engineering, 2005
University of Nevada, Las Vegas
Special Honors and Awards:
Member of Tau Beta Pi
Thesis title: Analysis of Runtime Re-Configuration in FPGAs
Thesis Examination Committee:
Chairperson, Dr. Henry Selvaraj, Ph. D.
Committee Member, Dr. Muthukumar Venkatesan, Ph. D.
Committee Member, Dr. Sharam Latifi, Ph. D.
Graduate Faculty Representative, Dr. Laxmi Gaewali, Ph. D.


