VHDL behavioral description of Discrete Cosine Transform in image compression. by Deng, An-Te
Calhoun: The NPS Institutional Archive
Theses and Dissertations Thesis Collection
1991-09

















Thesis Advisor: Chin-Hwa Lee




SECURITY CLASSIFICATION OF THIS PAGE
REPORT DOCUMENTATION PAGE
1a REPORT SECURITY CLASSIFICATION
Unclassified
1b RESTRICTIVE MARKINGS
2a SECURITY CLASSIFICATION AUTHORITY
2b DECLASSIFICATION/DOWNGRADING SCHEDULE
3 DISTRIBUTION/AVAILABILITY OF REPORT
Approved for public release; distribution is unlimited.
4 PERFORMING ORGANIZATION REPORT NUMBER(S) 5 MONITORING ORGANIZATION REPORT NUMBER(S)





7a NAME OF MONITORING ORGANIZATION
Naval Postgraduate School
6c ADDRESS (City, State and ZIP Code)
Monterey, CA 93943-5000
7b ADDRESS (City, State, and ZIP Code)
Monterey, CA 93943-5000




9 PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER
8c ADDRESS (City, State, and ZIP Code) 10 SOURCE OF FUNDING NUMBERS
Program Element No.   Project No.   Work Unit Accession Number
11 TITLE (Include Security Classification)
VHDL Behavioral Description of Discrete Cosine Transform in Digital Image Compression
12 PERSONAL AUTHOR(S) Deng, An-Te













18 SUBJECT TERMS (continue on reverse if necessary and identify by block number)
Image Compression; Discrete Cosine Transform; VHSIC Hardware Description Language;
Top-Down design;
19 ABSTRACT (continue on reverse if necessary and identify by block number)
This thesis describes a VHSIC Hardware Description Language (VHDL) simulation of a hardware 8x8 Discrete Cosine Transform (DCT) which
can be applied to image compression. A Top-Down Design approach is taken in the study, a discussion of DCT theory is presented, along with a
description of the 1-D DCT circuit architecture and its simulation in VHDL. Results of the 2-D DCT simulation are included for two simple test
patterns and verified by hand calculation, demonstrating the validity of the simulation. Shortcomings found in the simulation are described,
together with suggestions for correcting them. In the future, the VHDL description of the 8 x 8 image block 2-D DCT can be further developed into
structural and gate-level description, after which hardware circuit implementation can occur.
20 DISTRIBUTION/AVAILABILITY OF ABSTRACT
UNCLASSIFIED/UNLIMITED   SAME AS REPORT   DTIC USERS
22a NAME OF RESPONSIBLE INDIVIDUAL
Chin-Hwa Lee
21 ABSTRACT SECURITY CLASSIFICATION
Unclassified




DD FORM 1473, 84 MAR 83 APR edition may be used until exhausted
All other editions are obsolete
SECURITY CLASSIFICATION OF THIS PAGE
Unclassified
Approved for public release; distribution is unlimited.
VHDL Behavioral Description




Lt. Col, Republic of China Army
B.S., Chung Cheng Institute of Technology, 1976
Submitted in partial fulfillment
of the requirements for the degree of
MASTER OF SCIENCE IN SYSTEM ENGINEERING
from the
ABSTRACT
This thesis describes a VHSIC Hardware Description Language (VHDL) simulation of
a hardware 8x8 Discrete Cosine Transform (DCT) which can be applied to image
compression. A Top-Down Design approach is taken in the study. A discussion of DCT theory
is presented, along with a description of the 1-D DCT circuit architecture and its simulation in
VHDL. Results of the 2-D DCT simulation are included for two simple test patterns and verified
by hand calculation, demonstrating the validity of the simulation. Shortcomings found in the
simulation are described, together with suggestions for correcting them. In the future, the VHDL
description of the 8 x 8 image block 2-D DCT can be further developed into structural and
gate-level descriptions, after which hardware circuit implementation can occur.




  A. LITERATURE BACKGROUND
  B. OBJECTIVE
  C. RATIONALE FOR USING VHDL TO DESCRIBE THE CIRCUIT
  D. OVERVIEW OF THE THESIS
II. BASIC DISCRETE COSINE TRANSFORM THEORY
  A. DISCRETE COSINE TRANSFORM IN IMAGE COMPRESSION
    1. Rationale for using Discrete Cosine Transform
    2. Formulae of the Discrete Cosine Transform
  B. ALGORITHM FOR 8 BY 8 IMAGE DISCRETE COSINE TRANSFORM
    1. Methodology of 2-D DCT
    2. Principle of distributed arithmetic
    3. Methodology for forming the ROM storage
    4. Exploiting the symmetry in DCT to save storage in ROM
III. A STRUCTURAL ARCHITECTURE FOR THE 1-D DCT
  A. 8X8 IMAGE BLOCK 1-D DCT CIRCUIT ARCHITECTURE
  B. TRANSPOSE RAM ARCHITECTURE
IV. VHDL BEHAVIORAL DESCRIPTION OF THE 1-D DCT COMPONENT
  A. BLOCK DIAGRAM DESCRIPTION
  B. BI-TO-DI AND DI-TO-BI VHDL PACKAGE
  C. CLOCK GENERATOR MODULE (CLOCKGE)
  D. PARALLEL SHIFT REGISTER MODEL (LOAD)
  E. SHIFT-TWO-REGISTER MODEL (SHIFT)
  F. 2-BIT ADDER/SUBTRACTOR MODEL (ADDSUB)
  G. SHIFT REGISTER MODEL (REG)
  H. READ ONLY MEMORY MODEL (ROM)
  I. SHIFT RIGHT 1-BIT REGISTER MODEL (SHI1)
  J. ADDER/SUBTRACTOR-G MODEL (ADDG)
  K. SHIFT REGISTER-H MODEL (REG_H)
  L. 16-BIT ADDER_I MODEL (ADD_I)
  M. SHIFT RIGHT 2-BIT REGISTER MODEL (SHI_2)
  N. PARALLEL LOAD SERIAL SHIFT REGISTER MODEL (RESULT)
  O. TEST BENCH
V. SIMULATION OUTPUT ANALYSIS AND EXPERIENCE
  A. FORMATION OF ROM STORAGE VALUES
  B. SIMULATION AND TESTING IMAGE PATTERN (I)
  C. SIMULATION AND TEST OF IMAGE PATTERN (II)

    1. Input Data Sequential Order error
    2. Formation of 2-bit Adder in VHDL source code
    3. No Timing control in Add_i Model
    4. "Set" control in Test Bench
    5. Signals cannot be used as variables in VHDL
    6. Preventing Negative Zero occurrences in Pack1
VI. CONCLUSION
APPENDIX A. 12-BIT 1-D DCT VHDL SOURCE CODES
APPENDIX B. 16-BIT 1-D DCT VHDL SOURCE CODE
APPENDIX C. MATLAB PROGRAM OF DECIMAL-BINARY CONVERSION
APPENDIX D. STRUCTURAL 1-D DCT HAND CALCULATION
APPENDIX E. FORMATION OF 2-BIT ADDER
  A. TWO-BIT ADDER TRUTH TABLE
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST
LIST OF TABLES
Table I: Multiplication Coefficients
Table II: 8x8 image pixel values of pattern (I)
Table III: 1-D DCT spectral coefficients of pattern (I) in VHDL simulation
Table IV: 1-D DCT coefficients of pattern (I) using Spider Subroutine
Table V: Transposed 1-D DCT coefficients of pattern (I) in VHDL simulation
Table VI: 2-D DCT spectral coefficients of pattern (I) in VHDL simulation
Table VII: Table V in integer values
Table VIII: 2-D DCT spectral coefficients of pattern (I) using Spider Subroutine
Table IX: 2-D DCT coefficients of pattern (I) using direct calculation
Table X: 16-bit binary number representation of table (V)
Table XI: Serial 2-bit addition/subtraction output
Table XII: 2-D DCT coefficients of pattern (I) using manual calculation
Table XIII: 8x8 image block pixel values of pattern (II)
Table XIV: 1-D DCT coefficients of pattern (II) using VHDL simulation
Table XV: 2-D DCT coefficients of pattern (II) using VHDL simulation
Table XVI: Pattern (II) 1-D DCT coefficients using Spider Subroutine
Table XVII: 2-D DCT coefficients of pattern (II) using floating point calculation
Table XVIII: Equivalent decimal numbers of table (VI)
Table XIX: Equivalent decimal numbers of table (XII)
Table XX: Truth table of 2-bit adder
Table XXI: (Table XX) continued
LIST OF FIGURES
Fig. 1 2-D DCT Block Diagram
Fig. 2 Architecture of 1-D DCT
Fig. 3 1-D DCT block diagram
Fig. 4 clock_ge block diagram
Fig. 5 Serial load parallel shift register block diagram
Fig. 6 Shift two register block diagram
Fig. 7 2-bit add/sub block diagram
Fig. 8 "adsu" flow chart
Fig. 9 shift register (reg) block diagram
Fig. 10 ROM block diagram
Fig. 11 Shi1 register block diagram
Fig. 12 Addg block diagram
Fig. 13 Shift registerg block diagram
Fig. 14 16-bit add_i block diagram
Fig. 15 Shift right 2-bit register block diagram
Fig. 16 Parallel shift serial output register block diagram
Fig. 17 Block diagram of Test Bench
Fig. 18 Pattern (I) 8x8 image block
Fig. 19 U0 hand calculation
Fig. 20 V1 hand calculation
Fig. 21 V3 hand calculation
Fig. 22 U4 hand calculation
Fig. 23 V5 hand calculation
Fig. 24 V7 hand calculation
Fig. 25 Karnaugh map reduction
ACKNOWLEDGEMENTS
Many of the ideas in this thesis are based on the experience of my advisor, Dr.
Chin-Hwa Lee, who has labored with me through the chapters. Many thanks go to Dr.
Lee for his patience and valuable advice. I am also very grateful to Dr. Roberto Cristi
for his comments on my thesis.
It has been a pleasure sharing with Dr. Pat Pauley, who not only has been a
supporting force, but also has proofread my thesis in no time during her busiest hours.
I owe special thanks to those who have suffered with me through the writing
process — my family: my wife Vicky, my daughter Bobo and my son Joshua.
Do you not know? Have you not heard? The Lord is the everlasting God, the
Creator of the ends of the earth. He will not grow tired or weary, and his
understanding no one can fathom. He gives strength to the weary. Even youths
grow tired and weary; and young men stumble and fall; but those who hope in the
Lord will renew their strength. They will soar on wings like eagles; they will run





I. INTRODUCTION
A. LITERATURE BACKGROUND
This thesis is developed from the paper "An 8 x 8 Discrete Cosine
Transform Chip with Pixel Rate Clock" by D'Luna, L. J. [Ref. 1]. The original paper
introduced the algorithm and implementation of the one-dimensional (1-D) as well as the
two-dimensional (2-D) Discrete Cosine Transform (DCT), in which the principle of distributed
arithmetic is used. A hardware circuit architecture was implemented according to the
algorithm introduced.
Another very important aspect discussed in this thesis is the implementation of a
"Top-Down Design" concept that uses the Very High Speed Integrated Circuit (VHSIC)
Hardware Description Language (VHDL) [Ref. 4-8] as a tool. "Top-Down Design" first
describes the given algorithm in a high-level language. After the algorithm is described,
the structural architecture is described. Finally, this structural description is developed
into a hardware circuit. VHDL facilitates the algorithm description, the structural
description, and the hardware circuit simulation.
B. OBJECTIVE
The purpose of this thesis is to describe, in the VHSIC Hardware Description
Language (VHDL), the behavior of the architecture that implements the algorithm
mentioned above. The description was simulated on a workstation in order to analyze its
characteristics. In the process of describing the behavior of this structural architecture,
complicated hardware circuits are developed as behavioral models. This is usually the first
step in a "Top-Down Design" task. The objective is to use a DCT implementation as an
example to study the "Top-Down Design" methodology.
C. RATIONALE FOR USING VHDL TO DESCRIBE THE CIRCUIT
In the past, VHSIC design was dominated by bottom-up design methodologies
where hardware circuit details were established and produced before the system was
constructed [Ref. 4]. This methodology is very useful in dealing with small circuits.
However, when the system gets complicated, bottom-up design methodology is more
difficult to handle. In this work, a high-level, top-down design approach is taken.
Initially, a description of the algorithm is written. Later on, a detailed architecture is
described. Both are done in VHDL. VHDL is a hierarchical hardware description language
which supports mixed-level simulation. This thesis shows the beginning steps of a
"Top-Down Design" approach. The 8x8 image block DCT algorithm was implemented as
a behavioral model and a structural model. VHDL was used here to accomplish the initial
design of the 1-D Discrete Cosine Transform implementation.
D. OVERVIEW OF THE THESIS
There are six chapters in this thesis. The first chapter is an introduction to the
literature background, the objective, and the reasons for using the VHDL. Chapter II
introduces the algorithm of Discrete Cosine Transform and the principle of distributed
arithmetic. Chapter III examines the components of the structural architecture. Chapter
IV gives the actual VHDL behavioral description of the components, their circuit
block diagrams, and their connections. Chapter V analyzes the simulation results and reports
some experience with design problems. The last chapter is the conclusion.
II. BASIC DISCRETE COSINE TRANSFORM THEORY
A. DISCRETE COSINE TRANSFORM IN IMAGE COMPRESSION
1. Rationale for using Discrete Cosine Transform
Image transmission or storage usually deals with a large amount of digital
data. There are usually 512 x 512 pixels in a monochrome picture. If one pixel needs
8 bits to represent its information, transmitting a monochrome picture means that more
than two megabits (512 x 512 x 8 = 2,097,152) of digital data need to be transmitted.
There are many ways to do coding, compressing huge amounts of data to reduce the
transmission bandwidth and the amount of storage space required. Among these methods,
transform domain compression is an effective way to eliminate the redundant information
in images, since image data are usually highly correlated.
Image transformation is used to extract a small number of significant
coefficient values from the original image, by mapping the image data onto a two-
dimensional spectrum. Each coefficient in the transform domain represents some amount
of energy of the spectral component. The original spatial image can then be recovered
back from these coefficients, since each image has its own specific spectral pattern. After
the transformation, there are only a few coded values required to describe the original
image. Consequently, it is possible to save bits during transmission and storage.
The Fourier transform algorithm has been applied to image processing for a
long time, since it possesses many desirable analytic properties. However, it has two major
drawbacks. First, the computation of the Fourier transform involves complex numbers
rather than real numbers. Second, the spectrum energy decreases only slowly as frequency
increases. This slow decay of the spectrum is a very significant
disadvantage in image coding.
The Discrete Cosine Transform (DCT) has the advantage of involving only
real number computations. It is well suited for image data compression. Consequently,
the two-dimensional cosine transform of 8x8 image blocks has been adopted in a draft
international standard (JPEG) [Ref. 1]. This thesis concentrates on studying the
Discrete Cosine Transform and building a circuit for 8 x 8 image blocks.
2. Formulae of the Discrete Cosine Transform
The general formula of a one-dimensional Discrete Cosine Transform (1-D DCT) is given
by the relation
$$Z_k = \sum_{i=0}^{N-1} X_i C_{ik} \tag{1}$$
where $Z_k$ is the transform of $X_i$, $C_{ik}$ is the forward transformation kernel, and i and k
range from 0 to N - 1. The inverse transform of the 1-D DCT is given by the relation
$$X_i = \sum_{k=0}^{N-1} Z_k h_{ik} \tag{2}$$
where $h_{ik}$ is the inverse transformation kernel. The characteristic of the transform is
determined by its transformation kernel properties.






















where $Z_k$, k = 0, 1, 2, ... , N - 1, is the 1-D DCT of $X_i$.
The inverse kernel is of the same form as Eq. (3) and (4), so that the inverse
DCT is expressed by the equation
$$X_i = \frac{1}{\sqrt{N}} Z_0 + \sqrt{\frac{2}{N}} \sum_{k=1}^{N-1} Z_k \cos\frac{(2i+1)k\pi}{2N} \tag{7}$$
where i = 0, 1, 2, ... , N - 1.
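The transform pair can be checked numerically. The following is a minimal sketch, assuming the orthonormal kernel that Eq. (7) implies (1/sqrt(N) for k = 0 and sqrt(2/N) times the cosine term otherwise). The function names and the pixel values are hypothetical and are not taken from the thesis.

```python
import math

def dct_kernel(n):
    """Kernel C[m][k]: 1/sqrt(N) for k = 0, sqrt(2/N)*cos((2m+1)k*pi/(2N)) otherwise."""
    scale = math.sqrt(2.0 / n)
    return [[1.0 / math.sqrt(n) if k == 0
             else scale * math.cos((2 * m + 1) * k * math.pi / (2 * n))
             for k in range(n)]
            for m in range(n)]

def forward_dct(x):
    """Z_k = sum over i of X_i * C_ik, the general form of Eq. (1)."""
    n = len(x)
    c = dct_kernel(n)
    return [sum(x[i] * c[i][k] for i in range(n)) for k in range(n)]

def inverse_dct(z):
    """X_i = Z_0/sqrt(N) + sqrt(2/N) * sum over k >= 1 of Z_k * cos(...), as in Eq. (7)."""
    n = len(z)
    scale = math.sqrt(2.0 / n)
    return [z[0] / math.sqrt(n)
            + scale * sum(z[k] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                          for k in range(1, n))
            for i in range(n)]

pixels = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical 8-pixel column
coeffs = forward_dct(pixels)
restored = inverse_dct(coeffs)              # should reproduce the pixels
```

With this normalization the inverse recovers the input up to floating-point rounding, and the first coefficient is the pixel sum divided by sqrt(N).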
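The two-dimensional forward DCT kernel is given as
$$C_{ij00} = \frac{1}{N} \tag{8}$$
$$C_{ijkl} = \frac{2}{N}\left[\cos\frac{(2i+1)k\pi}{2N}\right]\left[\cos\frac{(2j+1)l\pi}{2N}\right] \tag{9}$$
where i, j = 0, 1, ... , N - 1, and k, l = 1, 2, ... , N - 1. The inverse kernel is also of
the same form, and the two-dimensional DCT is given by
$$Z_{00} = \frac{1}{N}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} X_{ij} \tag{10}$$
$$Z_{kl} = \frac{2}{N}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} X_{ij}\left[\cos\frac{(2i+1)k\pi}{2N}\right]\left[\cos\frac{(2j+1)l\pi}{2N}\right] \tag{11}$$
where k, l = 1, 2, ... , N - 1, and
$$X_{ij} = \frac{1}{N} Z_{00} + \frac{2}{N}\sum_{k=1}^{N-1}\sum_{l=1}^{N-1} Z_{kl}\left[\cos\frac{(2i+1)k\pi}{2N}\right]\left[\cos\frac{(2j+1)l\pi}{2N}\right] \tag{12}$$
where i, j = 0, 1, ... , N - 1.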
It can be seen from Eqs. (3), (4), (8), and (9) that the DCT transformation kernels are
separable. Therefore, the two-dimensional forward or inverse transformation can
be computed by applying two one-dimensional DCT operations successively.
B. ALGORITHM FOR 8 BY 8 IMAGE DISCRETE COSINE TRANSFORM
1. Methodology of 2-D DCT
Let $X_{ij}$ denote an image pixel value, which is an n-bit number. The indices
i and j represent the row and column location of the pixel, respectively. The N x N

$$Z_{kl} = \frac{2}{N}\sum_{j=0}^{N-1}\sum_{i=0}^{N-1} X_{ij}\cos\frac{(2i+1)k\pi}{2N}\cos\frac{(2j+1)l\pi}{2N}, \qquad k, l = 1, 2, \ldots, N-1. \tag{14}$$
$Z_{kl}$ is the spectral coefficient corresponding to the k-th horizontal frequency and l-th vertical
frequency. In matrix notation, the inner summation is equivalent to a 1-D DCT
computation on the columns of X. The outer summation is equivalent to a 1-D DCT
computation on the rows of the inner summation results. C can be used to represent the
N x N DCT matrix. Its columns are the 1-D DCT basis vectors, with elements $C_{mk}$ (1-D DCT
kernels), where
$$C_{m0} = \frac{1}{\sqrt{N}}, \qquad m = 0, 1, 2, \ldots, N-1 \tag{15}$$
$$C_{mk} = \sqrt{\frac{2}{N}}\cos\frac{(2m+1)k\pi}{2N} \tag{16}$$
with m = 0, 1, 2, ... , N-1; k = 1, 2, ... , N-1. Because the kernels of the DCT
transformation can be separated, the 2-D matrix Z of 2-D DCT coefficients can be
represented as
$$Z = [X^t C]^t C = C^t X C. \tag{17}$$































Fig. 1 2-D DCT Block Diagram
The N x N block of image X is input column by column first, and the 1-D DCT
computation is done. This computation is carried out as shown inside the square brackets of
Eq. (17) for the j-th column (for j = 0, 1, ... , N - 1). The resulting N x N matrix
is then transposed for the second, row by row, 1-D DCT computation. This transpose is
described by the term outside the square brackets in Eq. (17). After the
transposition, the same 1-D DCT computation involving the same transform matrix C is
carried out again. The transpose step takes care of the column to row change
of the data. The key operations involved here are the matrix transpose and the 1-D DCT
computation.
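The two-pass procedure can be sketched as follows, assuming the orthonormal 1-D kernel of Eqs. (15) and (16). The helper names (dct_matrix, transpose, matmul) and the two test blocks are hypothetical. The sketch only illustrates that a column pass, a transpose, and a second identical pass reproduce the matrix product of Eq. (17).

```python
import math

def dct_matrix(n):
    """N x N matrix C with elements C[m][k] as in Eqs. (15) and (16)."""
    return [[1.0 / math.sqrt(n) if k == 0
             else math.sqrt(2.0 / n) * math.cos((2 * m + 1) * k * math.pi / (2 * n))
             for k in range(n)]
            for m in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]

def dct_2d_two_pass(x):
    """[X^t C]^t C: 1-D DCT of every column, transpose, same 1-D DCT again."""
    c = dct_matrix(len(x))
    first_pass = matmul(transpose(x), c)      # row j holds the 1-D DCT of column j of X
    return matmul(transpose(first_pass), c)   # transpose step, then the second pass

def dct_2d_matrix(x):
    """C^t X C computed directly, for comparison."""
    c = dct_matrix(len(x))
    return matmul(matmul(transpose(c), x), c)

flat = [[1] * 8 for _ in range(8)]                        # constant block
ramp = [[i + 8 * j for j in range(8)] for i in range(8)]  # arbitrary test block
z_flat = dct_2d_two_pass(flat)
z_ramp_a = dct_2d_two_pass(ramp)
z_ramp_b = dct_2d_matrix(ramp)
```

For the constant block all of the energy lands in the coefficient at position (0, 0), which is the expected behavior of a DCT on a flat image.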
2. Principle of distributed arithmetic
The implementation of the 1-D DCT studied here is based on the principle
of distributed arithmetic. Using this principle, it is possible to implement the "bit
calculation" in the chip design. "Bit multiplication" is simply carried out by using the
input data bit pattern to address a Read Only Memory and by summing up all the results
to obtain the "transposed spectral values". If $Y_i$ ($Y_i = \{y_{im}\}$, m = 0, 1, ... , N - 1) is
the image pixel value represented by a row vector, then its 1-D DCT is
$$Z_{ik} = \sum_{m=0}^{N-1} y_{im} C_{mk}, \qquad k = 0, 1, \ldots, N-1. \tag{18}$$
Now the input data $y_{im}$ can be represented in 2's complement notation with
p bits as
$$y_{im} = -y_{im}^{(p-1)} 2^{p-1} + \sum_{q=0}^{p-2} y_{im}^{(q)} 2^{q} \tag{19}$$
where $y_{im}^{(q)}$ is the q-th bit of the incoming image pixel value $y_{im}$, which has a value of
either 0 or 1, and $2^q$ is the binary weight of the q-th bit. For example, if the input data is a 2's
complement 8-bit pattern, then
$$y_{im} = -y_{im}^{(7)} 2^7 + y_{im}^{(0)} 2^0 + y_{im}^{(1)} 2^1 + y_{im}^{(2)} 2^2 + y_{im}^{(3)} 2^3 + y_{im}^{(4)} 2^4 + y_{im}^{(5)} 2^5 + y_{im}^{(6)} 2^6 .$$
Substituting Eq. (19) into Eq. (18) gives
$$Z_{ik} = -\sum_{m=0}^{N-1} C_{mk}\, y_{im}^{(p-1)} 2^{p-1} + \sum_{q=0}^{p-2}\sum_{m=0}^{N-1} C_{mk}\, y_{im}^{(q)} 2^{q} \tag{20}$$
$$Z_{ik} = -F_k(C_k, Y_i^{(p-1)})\, 2^{p-1} + \sum_{q=0}^{p-2} F_k(C_k, Y_i^{(q)})\, 2^{q} \tag{21}$$
where $F_k$ is a function of the vectors $C_k$ and $Y_i^{(q)}$ and is represented as
$$F_k(C_k, Y_i^{(q)}) = \sum_{m=0}^{N-1} C_{mk}\, y_{im}^{(q)} \qquad \text{for } q = 0, 1, 2, \ldots, p-1. \tag{22}$$
Its expanded form can be shown as
$$F_k(C_k, Y_i^{(q)}) = C_{0k}\, y_{i0}^{(q)} + C_{1k}\, y_{i1}^{(q)} + \cdots + C_{N-1,k}\, y_{i,N-1}^{(q)} \tag{23}$$
where q = 0, 1, ... , p-1.
3. Methodology for forming the ROM storage
In Eq. (23), $C_{mk}$ are the elements of the 1-D DCT basis (kernel) vectors used as multiplication
coefficients. They are converted from decimal numbers to the 2's complement notation
used in this thesis. $y_{im}^{(q)}$ are the bit patterns, represented in 2's complement form, of the
N data points $y_{im}$. Because the basis vectors are fixed value coefficients and $F_k$ are
functions of the basis vectors and the binary bit patterns, the values of $F_k$ (with a fixed
k) for all possible N-bit patterns ($y_{im}^{(q)}$, m = 0, 1, 2, ... , N - 1) can be calculated and
stored in Read Only Memory (ROM) according to Eq. (22) and Eq. (23). The N-bit
pattern changes with time according to the incoming data $y_{im}^{(q)}$ (m = 0, 1, 2, ... , N-1).
This bit pattern forms an address to access the ROM and extract the corresponding
$F_k(C_k, Y_i^{(q)})$ value.
From Eq. (20) and Eq. (21), the corresponding 1-D DCT spectral coefficient
$Z_{ik}$ can be computed by shifting and adding the $F_k$ values stored in the ROM. In Eq.
(21), $F_k$ is a function of the corresponding basis column vector $C_k$ for k = 0, 1, 2, ...,
N-1, so $F_k$ differs as k varies. The incoming data vector $Y_i$ is the same
for the multiplication coefficients involved for all values of k. It is therefore possible to build up
N separate memory banks of multiplication coefficients and compute the N 1-D DCT
spectral coefficients $Z_{ik}$ (k = 0, 1, 2, ... , N-1) in parallel.
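The ROM-based evaluation of Eqs. (21) and (22) can be illustrated with a short sketch. It assumes floating-point ROM words (the thesis stores 16-bit 2's complement words) and the orthonormal kernel. The function names and the data vector are hypothetical.

```python
import math

def dct_column(n, k):
    """Basis column vector C_k with elements C_mk (assumed orthonormal kernel)."""
    if k == 0:
        return [1.0 / math.sqrt(n)] * n
    return [math.sqrt(2.0 / n) * math.cos((2 * m + 1) * k * math.pi / (2 * n))
            for m in range(n)]

def build_rom(column):
    """ROM word at each address: sum of C_mk over the set bits m of the address, Eq. (22)."""
    n = len(column)
    return [sum(column[m] for m in range(n) if (address >> m) & 1)
            for address in range(1 << n)]

def bit_plane_address(data, q):
    """Collect bit q of every data point into one N-bit ROM address."""
    return sum(((value >> q) & 1) << m for m, value in enumerate(data))

def distributed_dot(rom, data, p):
    """Eq. (21): add the ROM words weighted by 2^q and subtract the sign-bit word."""
    total = 0.0
    for q in range(p):
        term = rom[bit_plane_address(data, q)] * (1 << q)
        total += -term if q == p - 1 else term
    return total

data = [5, -3, 100, -128, 127, 0, -1, 42]   # hypothetical 8-bit 2's complement inputs
column = dct_column(8, 3)
rom = build_rom(column)                      # 2^8 = 256 words for one coefficient
z_da = distributed_dot(rom, data, 8)
z_direct = sum(y * c for y, c in zip(data, column))
```

No multiplier appears in distributed_dot; the only operations are table lookups, shifts by the bit weight, and additions, which is the point of the technique.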
4. Exploiting the symmetry in DCT to save storage in ROM
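Here, 8x8 image blocks are used, so N = 8. Each ROM address is formed from one bit of
each of the 8 incoming data points, so the address has 8 bits.
This means $2^8 = 256$ possible bit patterns will be formed into addresses. There would be
256 corresponding multiplication coefficient sums stored in the ROM for each of the 8
DCT spectral coefficients. However, advantage can be taken of the symmetry in the DCT
basis vectors. It can be shown that
$$C_{mk} = C_{N-1-m,k}, \qquad k = 0, 2, \ldots, N-2 \ (k \text{ even}),$$
where $C_{mk}$ is defined by Eq. (15) and Eq. (16). And the following can be proven,
$$C_{mk} = -C_{N-1-m,k}, \qquad k = 1, 3, \ldots, N-1 \ (k \text{ odd}).$$
Hence, Eq. (18) can be reduced to
$$Z_{ik} = \sum_{m=0}^{N/2-1} (y_{im} + y_{i,N-1-m})\, C_{mk}, \qquad \text{where } k = 0, 2, \ldots, N-2 \ (k \text{ even}) \tag{28}$$
$$Z_{ik} = \sum_{m=0}^{N/2-1} (y_{im} - y_{i,N-1-m})\, C_{mk}, \qquad \text{where } k = 1, 3, \ldots, N-1 \ (k \text{ odd}). \tag{29}$$
Equations (22) and (23) then can be reduced to
$$F_k(C_k, Y_i^{(q)}) = \sum_{m=0}^{N/2-1} C_{mk}\, (y_{im} + y_{i,N-1-m})^{(q)}, \qquad \text{where } k = 0, 2, 4, \ldots, N-2 \tag{30}$$
$$F_k(C_k, Y_i^{(q)}) = \sum_{m=0}^{N/2-1} C_{mk}\, (y_{im} - y_{i,N-1-m})^{(q)}, \qquad \text{where } k = 1, 3, 5, \ldots, N-1. \tag{31}$$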
From the above equations, it is possible to add or subtract the incoming data
points before memory access and reduce the number of address bits for the ROM from
N to N/2. The total number of bit patterns is now only $2^{N/2} = 2^4 = 16$. Only a 16-word
ROM is necessary for each of the 8 DCT coefficients, and therefore a total of 16 x 8
= 128 ROM words is required. This saving of ROM storage is significant compared to
the cost of using adders and subtracters in a different architecture. Since only
one particular bit pattern (those bits which have the same binary weight) is allowed
to address the ROM at a time, and the bit pattern changes according to the serially incoming
data, the addition and subtraction can be done in a bit-serial fashion. This advantage is
exploited in the chip implementation discussed in the next chapter.
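The storage saving can be illustrated with a sketch that folds the mirrored inputs before the ROM lookup. It assumes the orthonormal kernel and floating-point ROM words, and it performs the pre-addition on whole words rather than bit-serially as the chip does. The names and input values are hypothetical.

```python
import math

N = 8

def kernel(m, k):
    """Element C_mk of the assumed orthonormal 1-D DCT kernel."""
    if k == 0:
        return 1.0 / math.sqrt(N)
    return math.sqrt(2.0 / N) * math.cos((2 * m + 1) * k * math.pi / (2 * N))

def build_half_rom(k):
    """16-word ROM: sums of C_mk over m = 0..3 selected by the 4 address bits."""
    half = N // 2
    return [sum(kernel(m, k) for m in range(half) if (address >> m) & 1)
            for address in range(1 << half)]

def folded_coefficient(data, k, p):
    """Z_ik from the folded inputs: add mirrored pairs for even k, subtract for odd k."""
    half = N // 2
    sign = 1 if k % 2 == 0 else -1
    folded = [data[m] + sign * data[N - 1 - m] for m in range(half)]
    width = p + 1                      # a sum or difference of p-bit values needs p + 1 bits
    rom = build_half_rom(k)
    total = 0.0
    for q in range(width):
        address = sum(((value >> q) & 1) << m for m, value in enumerate(folded))
        term = rom[address] * (1 << q)
        total += -term if q == width - 1 else term   # the top bit carries the sign
    return total

data = [12, -7, 90, -128, 127, 3, -1, 64]   # hypothetical 8-bit 2's complement inputs
results = [folded_coefficient(data, k, 8) for k in range(N)]
direct = [sum(data[m] * kernel(m, k) for m in range(N)) for k in range(N)]
```

Each coefficient now needs only a 16-word table, at the price of one adder or subtracter per mirrored pair in front of the ROM.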
III. A STRUCTURAL ARCHITECTURE FOR THE 1-D DCT
A. 8X8 IMAGE BLOCK 1-D DCT CIRCUIT ARCHITECTURE
The 1-D DCT architecture studied previously is shown in Fig. 2 [Ref. 1].
Fig. 2 Architecture of 1-D DCT
There are 8 slices parallel to each other, corresponding to the 8 DCT coefficients which are
computed concurrently. First, 12-bit pixels AI(11:0) are put column by column into the
"serial-in-parallel-out" shift register (A). This sequence needs 8 clock cycles to complete.
After the 8th clock, the shift registers output the data into the "parallel load 2-bit serial
shift register" (B) at once. This is completed at the 9th clock cycle. At the same time, the
serial-in-parallel-out shift registers also get their new incoming data. The data stored in
the B shift register has to be added or subtracted according to Eqs. (30) and (31) in order
to reduce the ROM storage. In order to make Eqs. (30) and (31) more understandable,
they are expanded as below
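$$\sum_{m=0}^{N/2-1} C_{mk}\,(y_{im} + y_{i,N-1-m})^{(q)}$$
$$\begin{array}{lllll}
m = 0 & m = 1 & m = 2 & m = 3 & \\
= C_{00}(y_{i0}+y_{i7})^{(q)} & +\,C_{10}(y_{i1}+y_{i6})^{(q)} & +\,C_{20}(y_{i2}+y_{i5})^{(q)} & +\,C_{30}(y_{i3}+y_{i4})^{(q)} & k = 0 \\
= C_{02}(y_{i0}+y_{i7})^{(q)} & +\,C_{12}(y_{i1}+y_{i6})^{(q)} & +\,C_{22}(y_{i2}+y_{i5})^{(q)} & +\,C_{32}(y_{i3}+y_{i4})^{(q)} & k = 2 \\
= C_{04}(y_{i0}+y_{i7})^{(q)} & +\,C_{14}(y_{i1}+y_{i6})^{(q)} & +\,C_{24}(y_{i2}+y_{i5})^{(q)} & +\,C_{34}(y_{i3}+y_{i4})^{(q)} & k = 4 \\
= C_{06}(y_{i0}+y_{i7})^{(q)} & +\,C_{16}(y_{i1}+y_{i6})^{(q)} & +\,C_{26}(y_{i2}+y_{i5})^{(q)} & +\,C_{36}(y_{i3}+y_{i4})^{(q)} & k = 6
\end{array}$$
$$\sum_{m=0}^{N/2-1} C_{mk}\,(y_{im} - y_{i,N-1-m})^{(q)}$$
$$\begin{array}{lllll}
m = 0 & m = 1 & m = 2 & m = 3 & \\
= C_{01}(y_{i0}-y_{i7})^{(q)} & +\,C_{11}(y_{i1}-y_{i6})^{(q)} & +\,C_{21}(y_{i2}-y_{i5})^{(q)} & +\,C_{31}(y_{i3}-y_{i4})^{(q)} & k = 1 \\
= C_{03}(y_{i0}-y_{i7})^{(q)} & +\,C_{13}(y_{i1}-y_{i6})^{(q)} & +\,C_{23}(y_{i2}-y_{i5})^{(q)} & +\,C_{33}(y_{i3}-y_{i4})^{(q)} & k = 3 \\
= C_{05}(y_{i0}-y_{i7})^{(q)} & +\,C_{15}(y_{i1}-y_{i6})^{(q)} & +\,C_{25}(y_{i2}-y_{i5})^{(q)} & +\,C_{35}(y_{i3}-y_{i4})^{(q)} & k = 5 \\
= C_{07}(y_{i0}-y_{i7})^{(q)} & +\,C_{17}(y_{i1}-y_{i6})^{(q)} & +\,C_{27}(y_{i2}-y_{i5})^{(q)} & +\,C_{37}(y_{i3}-y_{i4})^{(q)} & k = 7
\end{array}$$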
The numbers above the expanded equations represent the index m, and the numbers on
the right side are the index k. $C_{mk}$ are the multiplication coefficients. Whether the bits
are added or subtracted is determined by whether k is an even or odd number.
Registers B must be emptied in less than 8 clock cycles in order to receive new
data coming from registers A. Each datum is 12 bits in length. If a single bit at a time is
shifted out of registers B, it will take 12 clock cycles to empty the registers. This will cause
a collision during the addition and subtraction of the data. There are two ways to solve this
problem: either to clock register B twice as fast or to shift out data 2 bits at a time, which
empties the registers in 6 clock cycles. The
latter alternative has been chosen for reasons of design convenience and simpler system
considerations. The shifted 2-bit data are added or subtracted in the "2-bit
adder/subtractor" C. The outputs are stored in the shift registers D, which split the least
significant bit and the most significant bit (binary weight q = 0 and q = 1) into two output
lines.
Next comes the question as to where the output data of the adders and subtracters
should go to address the ROM, and how the values in the ROM should be arranged. The
expanded equations above show that the adder outputs, designated
U0(0:3) and U1(0:3) (refer to Fig. 2), are the 4-bit patterns formed by the sums
of the paired data bits $y_{im}^{(q)}$; q = 0 represents the LSB and q = 1 represents the MSB of each
2-bit group in Eqs. (20) and (21). U0(0:3) and U1(0:3) should be multiplied by the coefficients $C_{mk}$,
where k = 0, 2, 4, 6. All the difference outputs V0(3:0) and V1(3:0) should
be multiplied by the coefficients $C_{mk}$, where k = 1, 3, 5, 7. As a result, the bit patterns
output by the four adders and subtractors form a 4-bit address to access the corresponding
accumulated sum of the coefficients $C_{mk}$, k = 0, 1, ... , 7, which are stored in ROM E.
This step accomplishes the 1-D DCT coefficient multiplication. The output of the
ROM is first latched in register F, and then adder/subtractor G calculates the sum of
the "2-bit" spectral coefficient values according to Eq. (21). The LSB (q = 0) values are
shifted to the right one position and added to the q = 1 values. This addition
continues until the last bit pattern (the 12th) of the incoming column data. According to Eq.
(19), the incoming data are represented in 2's complement notation, so the most
significant bit's value should be subtracted from all the previous summations. This is
done by switching the add/sub control line of G to subtraction at the clock cycle of the
last bit pattern for each column of data.
The 2-bit sum or difference results of G are stored in register H and then sent to
the accumulator formed by I and J. The accumulator consists of one "16-bit adder" and a "shift
right 2-bit register". The value stored in ROM E is a 16-bit word. The 16-bit adder I
adds the previous value, shifted right by 2 bits (output of J), to the incoming value (output of
H). The resulting value is then output to register J for the 2-bit right shift. This process
accomplishes the computation of Eq. (21) as index q varies from 0 to p-1 in 2-bit
increments. One thing has to be noted with caution: the initial value in the shift right
2-bit register for every incoming column of data should be zero. Otherwise, the previous
column values would accumulate. To avoid this, the shift right 2-bit register is cleared
at the beginning of the accumulation of every column group.
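The data flow through G, H, I and J can be modeled in a few lines. This is an idealized sketch: it uses floating-point values with no 16-bit truncation, it assumes the orthonormal kernel, and the input words stand for already added (or subtracted) 12-bit values. All names and numbers are hypothetical. Under these assumptions the accumulator ends up holding the coefficient scaled by 2^-(p-1).

```python
import math

def kernel(m, k, n=8):
    """Element C_mk of the assumed orthonormal 1-D DCT kernel."""
    if k == 0:
        return 1.0 / math.sqrt(n)
    return math.sqrt(2.0 / n) * math.cos((2 * m + 1) * k * math.pi / (2 * n))

def build_half_rom(k):
    """16-word ROM E for coefficient k (floating-point words in this sketch)."""
    return [sum(kernel(m, k) for m in range(4) if (address >> m) & 1)
            for address in range(16)]

def accumulate_two_bits(rom, words, p):
    """Process the p-bit words 2 bits per step; p is assumed to be even."""
    def address(q):
        return sum(((w >> q) & 1) << m for m, w in enumerate(words))
    acc = 0.0                                   # register J is cleared for every column
    for q in range(0, p, 2):
        low = rom[address(q)]                   # ROM word for the q-th bit (LSB of the pair)
        high = rom[address(q + 1)]              # ROM word for the (q+1)-th bit (MSB of the pair)
        last = (q + 2 == p)                     # last pair holds the 2's complement sign bit
        g = (-high if last else high) + low / 2.0   # adder/subtractor G
        acc = acc / 4.0 + g                     # adder I plus the 2-bit right shift in J
    return acc

words = [100, -77, 2047, -2048]                 # hypothetical 12-bit folded inputs
rom = build_half_rom(2)
scaled = accumulate_two_bits(rom, words, 12)
expected = sum(w * kernel(m, 2) for m, w in enumerate(words)) / 2 ** 11
```

Six iterations cover the 12 bit positions, which is why the registers can be emptied within the 8 clock cycles available per column.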
After 8 clock cycles, the accumulated values are parallel loaded into register K.
Similar to register A but in the reverse direction, register K puts out the 1-D DCT
spectral coefficients column by column. These 1-D DCT coefficients are then transposed
by the transpose RAM (TRAM) according to Eq. (17). The transpose RAM is described
in the next section. After the transpose RAM, the 1-D DCT coefficients are input
again into the same 1-D DCT architecture. The only difference now is that the registers A
and B have to be expanded from 12 bits to 16 bits for the second transform.
B. TRANSPOSE RAM ARCHITECTURE
According to Eq. (17), the purpose of the "transpose RAM" is to change the columns
of the 8 x 8 1-D DCT coefficient block into rows, and the rows into columns. The
coefficient values are generated by the 1-D DCT architecture column by column.
These values are written into a RAM while the transposed values are read out.
Therefore, the transpose RAM must be capable of accepting the incoming 1-D DCT
values and delivering the transposed values in the same cycle. How can this be done?
The coefficient values come out of the 1-D DCT architecture in serial order;
coefficients 0, 1, 2, ... , 7 of the first column of the 8 x 8 block come first, then
coefficients 0, 1, ... , 7 of the second column, the third column, and so on. This
order forms a long stream of coefficients 0, 1, ... , 63 for each 8x8 image block. After
storing them in the RAM, the coefficients must be read out in groups of 8 values in the
order of 0, 8, 16,..., 56; 1, 9, 17,..., 57; 2, 10, 18,..., 58; 3, 11, 19,..., 59; 4, 12,
20,..., 60; 5, 13, 21,..., 61; 6, 14, 22,..., 62; 7, 15, 23,..., 63 to achieve the transpose
operation. In the same cycle, just after a transposed value of the first block is read out,
a coefficient value of the second block can be written into that location. It is just
like reading block 1_0 (first 8x8 block, position 0) and writing block 2_0 (second 8 x
8 block, position 0), reading block 1_8 and writing block 2_1, reading block 1_16 and
writing block 2_2, and so on. In order to achieve the transpose of the second block, the
sequence for reading out block 2 must be in the order of 0, 1, 2, ... , 63. While
the coefficients of block 2 are read out, the third block coefficients are written into the same
locations just after the read. The order is just like reading block 2_0 and writing block
3_0, reading block 2_1 and writing block 3_1, reading block 2_2 and writing block 3_2,
and so on. Notice that the address order is 0, 1, 2, ... , 63 first, then 0, 8, 16, ..., and
then again the sequential order 0, 1, 2, ... , 63, and so on.
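The alternating address order can be checked with a small behavioral model. This is a sketch under the assumption of a single 64-word memory in which every location is read and then overwritten in the same cycle. The function name and the test data are hypothetical.

```python
def transpose_ram(blocks):
    """Stream 64-value blocks through one 64-word RAM; return each block transposed."""
    ram = [None] * 64
    sequential = list(range(64))                          # 0, 1, 2, ..., 63
    strided = [8 * (t % 8) + t // 8 for t in range(64)]   # 0, 8, 16, ..., 56, 1, 9, ...
    outputs = []
    # One extra pass with no incoming data drains the last block.
    for index, incoming in enumerate(list(blocks) + [None]):
        order = sequential if index % 2 == 0 else strided
        read_out = []
        for step, address in enumerate(order):
            read_out.append(ram[address])                 # read the old word first
            ram[address] = None if incoming is None else incoming[step]  # then overwrite it
        if index > 0:                                     # the very first pass reads an empty RAM
            outputs.append(read_out)
    return outputs

# Three hypothetical blocks; value b*100 + n marks coefficient n of block b.
blocks = [[b * 100 + n for n in range(64)] for b in range(3)]
transposed = transpose_ram(blocks)
```

Because transposing twice restores the original order, alternating between the sequential and the stride-8 address sequence is enough; no second memory is needed.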
As shown before, the structural architecture design is based on the principle of
distributed arithmetic, and it is data-path oriented. The methodology to describe this
architecture in VHDL and to simulate it on a computer is discussed in the next chapter.
IV. VHDL BEHAVIORAL DESCRIPTION OF THE 1-D DCT COMPONENT

































A. BLOCK DIAGRAM DESCRIPTION
Fig. 3 1-D DCT block diagram
The block diagram of the 1-D DCT shown in Fig. 3 can be described in models
using VHDL. The block diagram shown here includes the 1-D DCT system discussed in
Chapter III and the additional clock generators, delay lines, control line, package 1, and
test bench. There are minor differences between this diagram and the architecture
described in the previous chapter. When this system is simulated in VHDL, a signal
flow latency occurs. Therefore, a delay line is
necessary to change the clock triggering time and solve this latency problem.
Additionally, the architecture in the previous chapter does not make it clear when to
switch the add/sub control of G to complete the summation of 2's complement
values. It is shown here that the control line generating this control bit is triggered by
the delayed clock.
From the modeling point of view, it is rather complicated to build up a 16-bit adder
in VHDL following the usual arithmetic logic. The easiest approach is to convert the
16-bit binary coefficient values into integers and then do the addition or
subtraction on integers. After the integer addition or subtraction, the integers are simply
converted back to binary values. This conversion task is accomplished by the procedures in
package 1. A VHDL package is a collection of functions and procedures. Of course,
some overflow/underflow situations are expected to occur during these conversions. One
last thing to note in Fig. 3 is that the test bench module controls all the signal flow,
the input data, and the output data, and it also simulates the whole design.
B. BI-TO-DI AND DI-TO-BI VHDL PACKAGE
Package 1 in VHDL is shown below.
package pack1 is                      -- Package declaration
  procedure bi_to_in                  -- Procedure 1 changes 16-bit binary into integer
    (variable x : bit_vector(15 downto 0);
     variable y : out integer);
  procedure in_to_bi                  -- Procedure 2 changes integer into binary
    (variable m : in integer;
     variable n : out bit_vector(15 downto 0));
end pack1;
package body pack1 is                 -- Package body declaration
  procedure bi_to_in                  -- First procedure that changes bits to integer
    (variable x : bit_vector(15 downto 0);
     variable y : out integer) is
    variable sum : integer := 0;
    variable p : bit_vector(15 downto 0);
  begin
    p := x;
    if p(15) = '1' then               -- Change negative value to positive
      for i in 0 to 14 loop
        if p(i) = '1' then
          for i in 0 to 13 loop




      for k in 0 to 14 loop           -- Integer conversion
        if p(k) = '1' then
          sum := sum + 2**k;
        end if;
      end loop;
      y := -sum;                      -- Convert back to negative value
    else
      for l in 0 to 14 loop           -- Positive value conversion
        if p(l) = '1' then
          sum := sum + 2**l;
        end if;
      end loop;
      y := sum;
    end if;
  end bi_to_in;                       -- end of procedure 1
  procedure in_to_bi                  -- Second procedure that changes integer to bits
    (variable m : in integer;
     variable n : out bit_vector(15 downto 0)) is
    variable temp_a : integer := 0;
    variable temp_b : integer := 0;
    variable w : bit_vector(15 downto 0);
  begin
    if m < 0 then
      temp_a := -m;                   -- Take the absolute value of negative values
    else
      temp_a := m;
    end if;
    for i in 14 downto 0 loop         -- Binary conversion
      temp_b := temp_a/(2**i);
      temp_a := temp_a rem (2**i);






    if m > 0 then
      w(15) := '0';                   -- Assign positive sign bit
    else
      w(15) := '1';                   -- Assign negative sign bit
      for k in 0 to 14 loop
        if w(k) = '1' then
          for k in 0 to 13 loop       -- Invert negative bits to 2's complement





    if w(14) = '0' and w(13) = '0' and w(12) = '0' and w(11) = '0'
       and w(10) = '0' and w(9) = '0' and w(8) = '0' and w(7) = '0'
       and w(6) = '0' and w(5) = '0' and w(4) = '0' and w(3) = '0'
       and w(2) = '0' and w(1) = '0' and w(0) = '0'
    then
      w(15) := '0';                   -- Avoid negative zero
    end if;
    n := w;
  end in_to_bi;                       -- end of procedure 2
end pack1;                            -- end of package body
The VHDL package used in the simulation is similar to a subroutine in any other
high-level language in that it holds specific shared operations. The difference here is
that several different procedures or functions can be gathered together in one
package. Here, pack1 consists of two procedures, bi_to_in and in_to_bi. The bi_to_in
procedure converts 16-bit binary numbers (represented in 2's complement notation) into
positive or negative integers. The in_to_bi procedure converts positive or negative integers
back to 2's complement 16-bit binary numbers. Note that in the 2's complement number
system used here, there are only 16 bits including one sign bit. In overflow situations,
the digits that overflow are truncated.
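The two conversions can be cross-checked outside the simulator with a short Python sketch. It is not the thesis's VHDL: the function names are borrowed from pack1 for readability, the bit strings are written MSB first, and overflow is modeled by simply keeping the low 16 bits, which is an assumption about how the truncation behaves.

```python
WIDTH = 16  # word length used throughout the design


def bi_to_in(bits: str) -> int:
    """Interpret a 16-character bit string (MSB first) as a signed integer."""
    if len(bits) != WIDTH or set(bits) - {"0", "1"}:
        raise ValueError("expected 16 characters of '0' or '1'")
    value = int(bits, 2)
    # A set sign bit means the pattern encodes value - 2**16.
    return value - (1 << WIDTH) if bits[0] == "1" else value


def in_to_bi(value: int) -> str:
    """Encode a signed integer as 16-bit 2's complement, dropping overflow bits."""
    return format(value & ((1 << WIDTH) - 1), "016b")
```

For example, `in_to_bi(2896)` gives the bit pattern of the hex word 0B50, and `bi_to_in(in_to_bi(-132))` returns -132.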
C. CLOCK GENERATOR MODULE (CLOCK_GE)
The block diagram of the "clock_ge" is shown in Figure 4.
The interface connection (port map in
VHDL) has also been shown. This tells
how the circuit can be connected to the




Fig. 4 clock_ge block diagram
clk.vhd is shown below,




architecture clk_ctl of clock_ge is   -- Architecture declaration
begin
  process(CLCK)                       -- Process declaration
    variable I : integer := 0;
  begin                               -- Process begin
    CLCK <= not CLCK after 5 ns;      -- Switching clock generation
    I := I + 1;
    assert I <= 80                    -- Assertion terminates the infinite process
      report "job done"
      severity Error;
  end process;                        -- End of process
end clk_ctl;                          -- End of architecture
The source code contains a sensitivity signal "CLCK", which provides the clock
for all the circuits. The initial value of CLCK is "0", and its value changes to "1" after
5 ns. Since a process in VHDL is basically an infinite loop, an "assert" statement
is used to terminate the process. By incrementing the counter "I", the job
is terminated after 80 iterations.











Fig. 5 Serial load parallel shift register block diagram
Figure 5 shows the detailed block diagram of the parallel shift register (LOAD).
The source code in VHDL is shown below
entity LOAD is
  port (AI : in bit_vector(15 downto 0);
        B0,B1,B2,B3,B4,B5,B6,B7 : out bit_vector(15 downto 0);
        CLK : in bit);
end LOAD;
architecture BEH of LOAD is
  type shift is array (0 to 7) of bit_vector(15 downto 0);
begin
  process
    variable A : shift;
    variable I,count : integer := 0;
  begin
    wait until CLK'event and CLK = '1';   -- Clock controls the timing
    for count in 0 to 7 loop
      wait until CLK'event and CLK = '1';




      if (count = 7) and (CLK'event and CLK = '1') then   -- Output data
        B0 <= A(7);
        B1 <= A(6);
        B2 <= A(5);
        B3 <= A(4);
        B4 <= A(3);
        B5 <= A(2);
        B6 <= A(1);
        B7 <= A(0);
      end if;
    end loop;
    wait on AI,CLK;   -- Process activated when sensitivity signal changes
  end process;
end BEH;
The 16-bit input data arrive at AI column by column, at a rate controlled by the
test bench. Note that the first datum to appear is the 8th pixel value
of the first column. In other words, the incoming data arrive in the order 7, 6,
5, ..., 0. In this order, the data are pushed down into the correct positions, and the 1-D
DCT can be done correctly. After the 1-D DCT computation in Figure 3, the
corresponding spectral coefficients are put back in the correct order, i.e., 0, 1, 2, ...,
7. The "LOAD" module outputs the data in parallel to the second circuit, "SHIFT", after
eight clock cycles (count = 7). After that, it processes a new column of data.




























Fig. 6 Shift two register block diagram
The block diagram for SHIFT is shown in Figure 6. It contains a second clock
generator with three delay gates. Because the incoming pixel values pass through the
parallel shift register (LOAD), which causes a delay of one clock cycle, it is necessary
to compensate for this latency by delaying the clock that triggers the shift-two register
(SHIFT). Another clock, which runs twice as fast as ck, is used to clock the
original clock through the delay line. The VHDL source code of this faster clock
is similar to the previously discussed clock generator, except that it switches
twice as fast. The assertion count for termination is therefore twice as large. The delay
line consists of shift registers. The VHDL source code of DELAY and the shift register
is as follows.
entity delay is
  port(a : bit; b : out bit; CLK : bit);   -- Normal clock coming in from port a
end delay;
architecture beh of delay is
begin
  process
    variable x : bit;
  begin
    wait until CLK'event and CLK = '1';    -- Faster clock controls timing
    x := a;                                -- Shifting the incoming clock
    b <= x;




  port(bi0,bi1,bi2,bi3,bi4,bi5,bi6,bi7 : in bit_vector(15 downto 0);
       bo0,bo1,bo2,bo3,bo4,bo5,bo6,bo7 : out bit_vector(1 downto 0);
       CLK : in bit);              -- Port declaration, eight inputs and outputs
end shift;
architecture beh of shift is
begin
  process
    variable I : integer := 0;     -- counter as well as index
  begin
    for r in 0 to 7 loop
      wait until CLK'event and CLK = '1';
      bo0(0) <= bi0(I);            -- "q" = 0 binary weight
      bo0(1) <= bi0(I+1);          -- "q" = 1 binary weight
      bo1(0) <= bi1(I);
      bo1(1) <= bi1(I+1);
      bo2(0) <= bi2(I);
      bo2(1) <= bi2(I+1);
      bo3(0) <= bi3(I);
      bo3(1) <= bi3(I+1);
      bo4(0) <= bi4(I);
      bo4(1) <= bi4(I+1);
      bo5(0) <= bi5(I);
      bo5(1) <= bi5(I+1);
      bo6(0) <= bi6(I);
      bo6(1) <= bi6(I+1);
      bo7(0) <= bi7(I);
      bo7(1) <= bi7(I+1);
      I := I + 2;                  -- increment of two
    end loop;
    I := 0;                        -- reset the counter for next column of data
    wait on CLK,bi0,bi1,bi2,bi3,bi4,bi5,bi6,bi7;   -- wait for new data
  end process;
end beh;
The data enter the shift register as 16-bit words and leave as 2-bit words.
Note that the counter "I" is used as an index into each data word. Therefore, a
reset (I := 0) is necessary after each column of words is done. Otherwise, the index
would run out of range, giving a run-time error in the VHDL simulation.
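The slicing performed by SHIFT can be illustrated with a small Python sketch. It is a behavioral stand-in rather than the VHDL model: the function name `to_two_bit_slices` is hypothetical, a word is represented as an unsigned integer, and the loop variable `i` plays the role of the counter "I".

```python
def to_two_bit_slices(word: int) -> list[tuple[int, int]]:
    """Return eight (q0, q1) bit pairs of a 16-bit word, least significant pair first."""
    if not 0 <= word < 1 << 16:
        raise ValueError("word must fit in 16 bits")
    slices = []
    for i in range(0, 16, 2):          # i = 0, 2, ..., 14, like the counter I
        q0 = (word >> i) & 1           # bit I,   binary weight q = 0
        q1 = (word >> (i + 1)) & 1     # bit I+1, binary weight q = 1
        slices.append((q0, q1))
    return slices
```

Because the range stops at 14, the index never leaves the word, which is the condition the reset of "I" guarantees in the VHDL process.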
F. 2-BIT ADDER/SUBTRACTOR MODEL (ADDSUB)
Fig. 7 2-bit add/sub block diagram
Fig. 8 "adsu" flow chart
The 2-bit adder/subtractor module is shown in Figure 7. The "adsu" VHDL source
code is shown in Appendix A. A simple flow chart in Figure 8 shows the behavior
described in VHDL. Eight 2-bit words are input into this circuit. It is necessary to do
the "serial" 2-bit addition or subtraction according to the expanded Eqs. (30) and (31).
Since the incoming data are presented in 2's complement notation, 2's complement
addition or subtraction should be used. In addition, the 2-bit serial operation must take
into account carries generated previously. In other words, the first 2-bit
addition/subtraction might generate a carry, and this carry must be passed on to the
next 2-bit add/sub computation. The simplest way to solve this problem is to use a 2-bit
adder accompanied by a register holding the carry bit for the next addition/subtraction.
For subtraction, it is necessary to convert the subtrahend into 2's complement
notation and then use the same 2-bit adder to accomplish the computation. What has been
done here is to convert the subtrahend into 1's complement first and then add "1"
at the very first subtraction. The incoming subtrahend is simply converted into 1's
complement notation, and the adder takes care of the "1" addition. In this way, the serial
subtraction is accomplished. There are four 2-bit adders and four 2-bit subtractors in the
source code. The "cr" bit sets the adder carry to zero at the beginning, and the "st" bit
sets the subtractor carry to "1". Later on, the adder/subtractor takes care of the carry
by itself. For convenience of notation, the two incoming 2-bit data and the carry
bit have been combined into a 5-bit word, and the addition is done in the 2-bit adder
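The carry handling described above can be sketched in Python as follows. This is an illustrative model, not the "adsu" source from Appendix A: the function name `serial_add_sub` is hypothetical, and the result slices are reassembled into one 16-bit word only so that the outcome is easy to check, whereas the circuit passes the slices on serially.

```python
def serial_add_sub(a: int, b: int, subtract: bool = False) -> int:
    """Add or subtract two 16-bit words two bits per step, LSB slice first."""
    carry = 1 if subtract else 0       # "st" presets the carry to 1, "cr" to 0
    result = 0
    for step in range(8):              # eight 2-bit slices per 16-bit word
        a2 = (a >> (2 * step)) & 0b11
        b2 = (b >> (2 * step)) & 0b11
        if subtract:
            b2 ^= 0b11                 # 1's complement of the subtrahend slice
        total = a2 + b2 + carry        # 2-bit adder with carry-in
        result |= (total & 0b11) << (2 * step)
        carry = total >> 2             # carry register feeds the next slice
    return result                      # carry out of the last slice is dropped
```

With the first-column values of pattern I, `serial_add_sub(0x0B50, 0x05A8)` returns 0x10F8, and the same call with `subtract=True` returns 0x05A8.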





The shift register block diagram is shown in Figure 9. Signals are input from port a
and output to port b. The shift register




























Fig. 9 shift register (reg) block diagram
entity reg is
  port(a0,a1,a2,a3,a4,a5,a6,a7 : bit_vector(1 downto 0);       -- input port
       b0,b1,b2,b3,b4,b5,b6,b7 : out bit_vector(1 downto 0);   -- output port
       CLK : bit);
end reg;
architecture beh of reg is
begin
  process
    variable d0,d1,d2,d3,d4,d5,d6,d7 : bit_vector(1 downto 0);
  begin







    d7 := a7;
    wait until CLK'event and CLK = '1';   -- Clock control
    b0 <= d0;                             -- shift the variable to output signal
    b1 <= d1;
    b2 <= d2;
    b3 <= d3;
    b4 <= d4;
    b5 <= d5;
    b6 <= d6;




This circuit is the simplest one. The only effect of this code is to use a signal
assignment statement to simulate a signal buffer causing a latency period of one clock
cycle. The "wait until CLK'event and CLK = '1';" statement activates the timing
control. The "wait on CLK" statement activates the process's operation whenever the
clock changes its state.
H. READ ONLY MEMORY MODEL (ROM)
Figure 10 shows the read only memory block diagram. The VHDL source code
is included in Appendix A. Eight 2-bit words are input to this block, and sixteen
16-bit words corresponding to the 1-D DCT multiplication coefficients are read out.
The outputs of the four adders, taken at binary weights q = 0 and q = 1, form two 4-bit
address buses that access the corresponding ROM multiplication coefficients. The same
holds for the subtractors. There are sixteen individual ROMs with
sixteen different values stored in each. Why there are sixteen ROMs, and why
there are sixteen different values stored in each, is discussed in detail in later sections.
Note that in the address assignment part of the source code, the order of the addresses
starts from e0, e1, e2, e3 and ends with e7, e6, e5, e4. This is also explained
in the later discussion. The values stored in the individual ROMs have been
converted from the sums of coefficients to 16-bit 2's complement binary values.
The coefficient values are calculated according to Eq. (15) and Eq. (16).
Fig. 10 ROM block diagram
I. SHIFT RIGHT 1-BIT REGISTER MODEL (SHI1)
Figure 11 shows the shift right 1-bit register block diagram. Its VHDL source code
is included in Appendix A. The shift right 1-bit register receives sixteen 16-bit words and
performs the right shift operation on eight of them. It outputs the resultant sixteen 16-bit
words to the next circuit. The only difference between the input and the output values
is that the odd numbered 16-bit words have been shifted right by 1 bit position. At the same




Fig. 11 Shi1 register block diagram
a proper bit ("0" or "1", depending on whether it has a positive or negative value) to
properly extend the binary 2's complement number.
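The sign extension mentioned above can be illustrated with a short Python sketch. It is a stand-in for the behavior rather than the SHI1 source in Appendix A; the function name `shift_right_signed` is hypothetical, and words are represented as unsigned 16-bit integers.

```python
def shift_right_signed(word: int, places: int = 1) -> int:
    """Shift a 16-bit 2's complement word right, replicating the sign bit."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    sign = word >> 15                  # 1 for a negative word, 0 for a positive one
    shifted = word >> places
    if sign:
        # Fill the vacated high positions with 1s so the value stays negative.
        shifted |= (0xFFFF << (16 - places)) & 0xFFFF
    return shifted
```

A positive word such as 0x2D41 becomes 0x16A0, while the negative word 0xFF7C (-132) becomes 0xFFBE (-66).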
J. ADDER/SUBTRACTOR-G MODEL (ADDG)
Figure 12 shows the addg block diagram. It includes one control circuit and five
delay gates. The control circuit enables the add_g to do addition or subtraction. The
purpose of the delay line is to compensate for signal latency. To activate the add/subtract








Fig. 12 Addg block diagram
The addg VHDL source code as well as the control and the delay VHDL source
code are shown below.
entity control is
  port(CLK : bit; ct : out bit);
end control;
architecture beh of control is   -- control
begin
  process
    variable i : integer := 0;
  begin
    wait until CLK'event and CLK = '1';   -- Clock triggers the circuit
    if i = 7 then
      ct <= '1';




i := i + 1;
if i = 8 then




entity delay10 is
  port(a : bit; b : out bit; CLK : bit);
end delay10;
architecture beh of delay10 is   -- delay
begin
  process
    variable x : bit;
  begin
    wait until CLK'event and CLK = '1';
    x := a;
    b <= x;
    wait on CLK, a;
  end process;
end beh;




       bit_vector(15 downto 0);                                  -- input port
       b1,b2,b3,b4,b5,b6,b7,b8 : out bit_vector(15 downto 0);    -- output port
       CLK,as : bit);
end add_g;




    n1,n2,n3,n4,n5,n6,n7,n8 : bit_vector(15 downto 0);
    variable y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15,y16,
             m1,m2,m3,m4,m5,m6,m7,m8 : integer := 0;
  begin
    wait until CLK'event and CLK = '1';
    x1 := a1; x2 := a2; x3 := a3; x4 := a4;            -- input values
    x5 := a5; x6 := a6; x7 := a7; x8 := a8;
    x9 := a9; x10 := a10; x11 := a11; x12 := a12;
    x13 := a13; x14 := a14; x15 := a15; x16 := a16;






    if as = '0' then
      m1 := y1 + y2; m2 := y3 + y4; m3 := y5 + y6; m4 := y7 + y8;
      m5 := y9 + y10; m6 := y11 + y12; m7 := y13 + y14; m8 := y15 + y16;
    else   -- Control gives the subtraction instruction
      m1 := y1 - y2; m2 := y3 - y4; m3 := y5 - y6; m4 := y7 - y8;
      m5 := y9 - y10; m6 := y11 - y12; m7 := y13 - y14; m8 := y15 - y16;
    end if;
    -- Procedure call to do binary conversion
    in_to_bi(m1,n1); in_to_bi(m2,n2); in_to_bi(m3,n3); in_to_bi(m4,n4);
    in_to_bi(m5,n5); in_to_bi(m6,n6); in_to_bi(m7,n7); in_to_bi(m8,n8);
    b1 <= n1; b2 <= n2; b3 <= n3; b4 <= n4;




The control is triggered by the clock and generates the control bit "ct".
On the 8th clock period "ct" becomes "1"; otherwise it equals "0". The
delay is also triggered by the clock. It receives one bit and outputs the same bit one clock
cycle later.
Add_g has sixteen 16-bit word inputs and eight 16-bit word outputs. It performs
16-bit addition or subtraction. As discussed previously, it is rather complicated to build
a 16-bit adder/subtractor with a VHDL structural approach. The easiest way is to
convert the 16-bit binary words into integers. To do so, "use work.pack1.all" has to be
declared at the beginning of the entity in order to call the "bi_to_in" procedure in
pack1. "Work" represents the working library, and "pack1.all" makes all the
declarations in package pack1 visible. After the conversion of binary values to integer
values, addition or subtraction is done according to the control input "as". The results
then are converted




The reg_h block diagram is shown in Figure 13. It functions just like "reg", except
that "reg" handles 2-bit words and "reg_h" handles 16-bit words. The VHDL source codes
are the same except for the declaration of the length of the bit vectors.
Fig. 13 Shift register_g block diagram
L. 16-BIT ADDER_I MODEL (ADD_I)
Figure 14 shows the block diagram of the 16-bit adder (ADD_I). ADD_I and
ADD_G are basically the same, except that ADD_I has neither the "as" control bit nor
the "if" statement in the VHDL source code to do the subtraction. Another big difference
is that ADD_I is not triggered by the clock. It adds up the two 16-bit inputs with no delay.
It also does integer addition with the procedures in pack1. The two inputs come from
REG_H and from the feedback output of SHI_2, which shifts the result to the right by
2 bits. This is shown in Figure 2. The VHDL source code for ADD_I is shown below.








Fig. 14 16-bit addi block diagram
end add_i;




    n1,n2,n3,n4,n5,n6,n7,n8 : bit_vector(15 downto 0);
    variable y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15,y16,
             m1,m2,m3,m4,m5,m6,m7,m8 : integer := 0;
  begin
    x1 := a1; x2 := a2; x3 := a3; x4 := a4;
    x5 := a5; x6 := a6; x7 := a7; x8 := a8;
    x9 := a9; x10 := a10; x11 := a11; x12 := a12;









    m1 := y1 + y2; m2 := y3 + y4; m3 := y5 + y6; m4 := y7 + y8;
    m5 := y9 + y10; m6 := y11 + y12; m7 := y13 + y14; m8 := y15 + y16;
    in_to_bi(m1,n1); in_to_bi(m2,n2); in_to_bi(m3,n3); in_to_bi(m4,n4);
    in_to_bi(m5,n5); in_to_bi(m6,n6); in_to_bi(m7,n7); in_to_bi(m8,n8);
    b1 <= n1; b2 <= n2; b3 <= n3; b4 <= n4;



















































Fig. 15 Shift right 2-bit register block diagram
The shift right 2-bit register (shi_2) block diagram is shown in Figure 15. It
includes another clock generator, running twice as fast, that triggers the delay unit which
delays the normal clock by one period. It also has a clear line (clr) from the test bench
that clears the register every eight clock cycles. The VHDL source code of SHI_2 is
shown in Appendix B.
The SHI_2 model has eight 16-bit word inputs from ADD_I and sixteen 16-bit
word outputs. The sign bit of each input value is checked, and SHI_2 shifts
the data 2 bits to the right in proper 2's complement representation. There are eight
blocks in the SHI_2 module. The results are updated and fed back to the ADD_I module
to be added to the incoming data values. In every 8th clock cycle, the results
are shifted in parallel to the "parallel load serial shift" register (RESULT). During the same
cycle, the shift right 2-bit results are cleared, and SHI_2 is ready for the next column
operation.
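The feedback loop formed by ADD_I and SHI_2 can be sketched in Python for one of the eight blocks. This is a simplified model under stated assumptions: the names `asr2` and `accumulate` are hypothetical, the adder is taken to wrap around at 16 bits, and the add/subtract decision made upstream in ADD_G on the 8th clock is not modeled.

```python
MASK = 0xFFFF


def asr2(word: int) -> int:
    """Arithmetic right shift by 2 on a 16-bit 2's complement word."""
    signed = word - 0x10000 if word & 0x8000 else word
    return (signed >> 2) & MASK


def accumulate(clocked_words: list[int]) -> int:
    """Feed the clocked words of one column through the add / shift-right-2 loop."""
    feedback = 0                            # cleared by "clr" before a new column
    for word in clocked_words:
        total = (word + feedback) & MASK    # ADD_I: sum of new word and feedback
        feedback = asr2(total)              # SHI_2: shift right 2, keep the sign
    return feedback                         # value handed to RESULT after the last clock
```

Each pass divides the running sum by four, so earlier clocked words carry a smaller binary weight in the final value than later ones.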
N. PARALLEL LOAD SERIAL SHIFT REGISTER MODEL (RESULT)
The block diagram of the parallel load serial shift register (RESULT) is shown in
Figure 16. There are eight inputs from SHI_2; RESULT puts out only one value at a
time. The VHDL source code of RESULT is shown below,
entity result is
  port(a1,a2,a3,a4,a5,a6,a7,a8 : bit_vector(15 downto 0);
       k : out bit_vector(15 downto 0); CLK : bit);
end result;
architecture beh of result is
  type r is array (0 to 7) of bit_vector(15 downto 0);
begin
  process
    variable x : r;
  begin
    x(0) := a1; x(1) := a2; x(2) := a3; x(3) := a4;
    x(4) := a5; x(5) := a6; x(6) := a7; x(7) := a8;
    for i in 0 to 7 loop











Fig. 16 Parallel shift serial output register block diagram





Eight 16-bit words are input into RESULT every 8th clock cycle. They are pushed
out one value at a time at every clock period. After all eight values have been output,
new values are fed in again for the next cycle.
O. TEST BENCH
The test bench block diagram is shown in Figure 17. It includes all the
intermediate signals, the control signals, and the input and output signals. The VHDL
source code for the test bench is shown in Appendix B. All the components used in the
system are declared and instantiated, and the signals used for the simulation are declared
as well. A configuration statement binds all the components to the test system. The input
pixel values are fed into the system through "di", and the system is simulated. The results
of the simulation are collected by signal "p". A table of the simulation results "p" is
generated and analyzed to see whether the design is functioning correctly.
Fig. 17 Block diagram of Test Bench
V. SIMULATION OUTPUT ANALYSIS AND EXPERIENCE
A. FORMATION OF ROM STORAGE VALUES
As discussed before, there is only a sixteen-word ROM for each multiplication
coefficient due to the symmetry in the DCT. The coefficients can be calculated according
to Eq. (15) and Eq. (16).

Table I: Multiplication Coefficients

k even (U)
          m = 0           m = 1           m = 2           m = 3
          A = Yi0+Yi7     B = Yi1+Yi6     C = Yi2+Yi5     D = Yi3+Yi4
k = 0     0.3535533905    0.3535533905    0.3535533905    0.3535533905
k = 2     0.4619397662    0.1913417161   -0.1913417161   -0.4619397662
k = 4     0.3535533905   -0.3535533905   -0.3535533905    0.3535533905
k = 6     0.1913417161   -0.4619397662    0.4619397662   -0.1913417161

k odd (V)
          m = 0           m = 1           m = 2           m = 3
          A = Yi0-Yi7     B = Yi1-Yi6     C = Yi2-Yi5     D = Yi3-Yi4
k = 1     0.4903926402    0.4157348061    0.2777851165    0.0975451610
k = 3     0.4157348061   -0.0975451610   -0.4903926402   -0.2777851165
k = 5     0.2777851165   -0.4903926402    0.0975451610    0.4157348061
k = 7     0.0975451610   -0.2777851165    0.4157348061   -0.4903926402

Since N = 8, the expanded forms of Eq. (30) and Eq. (31) can be derived as in Table
I after substituting the proper index (m, k). The labels U0, U2, ..., V7 are included in
the table for better understanding. Labels A, B, C, D stand for bit patterns. For example,
if A = 1, B = 0, C = 1, D = 1, then the values in columns 1, 3, and 4 are
summed to get the corresponding multiplication coefficient sum stored in the ROM.
The bit pattern in the circuit has two weighted groups (the LSB group, q = 0, and the
MSB group, q = 1). The coefficient values for these two groups are exactly the same.
Therefore, there are only 8 x 16 = 128 different coefficient sums stored in ROM.
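The way the sixteen sums of one ROM follow from a row of Table I can be sketched in Python. This is an illustration only: the function names are hypothetical, the coefficient formula is the standard 8-point DCT basis that reproduces the table entries, and treating A as the most significant address bit is an assumption about the address ordering.

```python
import math


def coefficient(k: int, m: int) -> float:
    """Table I entry for output index k and column m (m = 0..3)."""
    scale = 1 / math.sqrt(8) if k == 0 else 0.5
    return scale * math.cos((2 * m + 1) * k * math.pi / 16)


def rom_contents(k: int) -> list[float]:
    """Sixteen coefficient sums for one ROM, indexed by the 4-bit pattern ABCD."""
    sums = []
    for address in range(16):
        # Assumed ordering: A is the most significant address bit, D the least.
        bits = [(address >> (3 - m)) & 1 for m in range(4)]
        sums.append(sum(coefficient(k, m) for m in range(4) if bits[m]))
    return sums
```

For the example in the text (A = 1, B = 0, C = 1, D = 1), `rom_contents(0)[0b1011]` is the sum of columns 1, 3, and 4 of the k = 0 row.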
One important point must be stressed: the values stored in the ROM are not
decimal numbers but binary numbers. How can these summed decimal numbers be
converted into binary numbers? Upon inspection of Table I, it is noted that the largest
possible sum is not greater than 2 and the smallest possible sum
is not less than -2. As stated before, the number system used here is 16-bit 2's
complement. Therefore, one sign bit, one integer bit, and fourteen fraction bits are
chosen to represent the binary numbers stored in the ROM. All the decimal coefficients
calculated for the specific bit patterns A, B, C, D have to be converted into
binary 2's complement 16-bit numbers. This conversion is carried out with the
help of a small Matlab program listed in Appendix C. The actual values stored
in the ROM are shown in the ROM VHDL source code.
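The number format can be illustrated with a short Python sketch. It is not the Matlab program of Appendix C, which is not reproduced here; the function names are hypothetical, and rounding to the nearest integer is an assumption, since the rounding rule used for the ROM values is not stated in this chapter.

```python
FRACTION_BITS = 14


def to_q1_14(value: float) -> int:
    """Encode a value as a 16-bit word: 1 sign bit, 1 integer bit, 14 fraction bits."""
    scaled = round(value * (1 << FRACTION_BITS))   # rounding mode is an assumption
    if not -(1 << 15) <= scaled < (1 << 15):
        raise ValueError("value does not fit in the 16-bit format")
    return scaled & 0xFFFF                          # 2's complement bit pattern


def as_bits(word: int) -> str:
    """Render a 16-bit word as a bit string, MSB first."""
    return format(word, "016b")
```

For instance, the sum of the four k = 0 coefficients, about 1.414213562, encodes to "0101101010000010".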
B. SIMULATION AND TESTING IMAGE PATTERN (I)
The first image pattern used is shown in Figure 18. It is a two-dimensional





Fig. 18 Pattern (I) 8x8 image block
levels. Therefore, the pixel value of each point in this image can be represented by the
following formula
$$f(x, y) = \frac{\cos(2\pi f_x x + 2\pi f_y y) + 1}{2} \times 128 \qquad (32)$$
where $f_x = 1/4$ and $f_y = 0$.
After substituting the corresponding index (x, y) in Figure 18 into Eq. (32), the pixel
values of this 8x8 image block are as shown in Table II. The 12-bit binary
representations of decimal numbers 128 and 64 are "000010000000" and
"000001000000". Converting the values in Table II into 12-bit binary numbers and feeding
them column by column into the 1-D DCT VHDL model yields the corresponding 1-D
DCT spectral coefficients (in hex) listed in Table III. The same decimal values in Table II
have also been put into a 1-D DCT subroutine from an image processing
library called spider. The result is shown in Table IV.

Table II: 8x8 image block pixel values of pattern (I)
y = 7    128    64     0    64   128    64     0    64
y = 6    128    64     0    64   128    64     0    64
y = 5    128    64     0    64   128    64     0    64
y = 4    128    64     0    64   128    64     0    64
y = 3    128    64     0    64   128    64     0    64
y = 2    128    64     0    64   128    64     0    64
y = 1    128    64     0    64   128    64     0    64
y = 0    128    64     0    64   128    64     0    64
       x = 0 x = 1 x = 2 x = 3 x = 4 x = 5 x = 6 x = 7

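The pixel values of the test pattern can be regenerated from Eq. (32) with a few lines of Python. The function name `pattern` and the row-major `[y][x]` layout are illustrative choices, not part of the thesis; rounding is used only to absorb floating point error in the cosine.

```python
import math


def pattern(fx: float, fy: float, size: int = 8) -> list[list[int]]:
    """Rows indexed [y][x] of f(x, y) = (cos(2*pi*(fx*x + fy*y)) + 1) / 2 * 128."""
    return [
        [round((math.cos(2 * math.pi * (fx * x + fy * y)) + 1) / 2 * 128)
         for x in range(size)]
        for y in range(size)
    ]


pattern_1 = pattern(fx=0.25, fy=0.0)   # pattern (I): variation along x only
```

Every row of `pattern_1` is 128, 64, 0, 64, 128, 64, 0, 64, since the pattern does not depend on y.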
Table III: 1-D DCT spectral coefficients of Pattern (I) in VHDL simulation
0B50 05A8 0000 05A8 0B50 05A8 0000 05A8
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000

Due to time limitations, no attempt was made to carry out the transpose of the 1-D
DCT coefficients in the VHDL behavioral models; a manual transpose was done
instead. The transposed 1-D DCT coefficients of pattern (I) in the VHDL simulation are
shown in Table V. The values in Table V are converted again into binary numbers and
input column by column into the 16-bit 1-D DCT VHDL model to accomplish the 2-D
DCT operation. The 2-D DCT spectral coefficients, transposed back, from
the VHDL simulation are shown in Table VI.

Table V: Transposed 1-D DCT coefficients of pattern (I) in VHDL simulation
0B50 0000 0000 0000 0000 0000 0000 0000
05A8 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
05A8 0000 0000 0000 0000 0000 0000 0000
0B50 0000 0000 0000 0000 0000 0000 0000
05A8 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
05A8 0000 0000 0000 0000 0000 0000 0000

Table VI: 2-D DCT spectral coefficients of pattern (I) in VHDL simulation
01F7 005F 0000 00C4 00FF FF7C 0000 FFED
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000

The 1-D DCT operations in the VHDL simulation are based on integer calculation.
In order to prove that the 1-D DCT VHDL simulation result is correct, the values in
Table V are converted into integers and are shown in Table VII. The values in Table VII
are again calculated column by column using the spider 1-D DCT subroutine. The
resulting 2-D DCT spectral coefficients are transposed and shown in Table VIII.







Table VIII: 2-D DCT spectral coefficients of pattern (I) using Spider Subroutine
4095.5 768.54 0 1573.0 2047.7 -1051.0 0 -152.8

To ensure that the 1-D DCT structural calculation in the VHDL simulation is
correct, a direct 1-D DCT calculation on a calculator is also carried out based on
Eq. (15) and Eq. (16). Equations (33) and (34) show the calculation for k = 0
and k = 1.
$$C(0) = \frac{1}{\sqrt{8}}\,(2896 + 1448 + 0 + 1448 + 2896 + 1448 + 0 + 1448) \qquad (33)$$
$$C(1) = \sqrt{\frac{2}{8}}\,\Bigl(2896\cos\frac{\pi}{16} + 1448\cos\frac{3\pi}{16} + 0 + 1448\cos\frac{7\pi}{16} + 2896\cos\frac{9\pi}{16} + 1448\cos\frac{11\pi}{16} + 0 + 1448\cos\frac{15\pi}{16}\Bigr) \qquad (34)$$
The results using this approach are listed in Table IX. Note that the results of Table VIII
and Table IX are very close.

Table IX: 2-D DCT coefficients of pattern (I) using direct calculation
4095.5 768.59 0 1573.0 2047.7 -1051.0 0 -152.8
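The direct calculation can be repeated with a short Python sketch of the orthonormal 8-point DCT. The function name `dct_1d` is illustrative, and the scaling (1/sqrt(N) for k = 0 and sqrt(2/N) otherwise) is assumed to match Eq. (15) and Eq. (16); it does reproduce the multiplication coefficients of Table I.

```python
import math


def dct_1d(samples: list[float]) -> list[float]:
    """N-point DCT: C(k) = scale(k) * sum of x(m) * cos((2m + 1) * k * pi / (2N))."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            x * math.cos((2 * m + 1) * k * math.pi / (2 * n))
            for m, x in enumerate(samples)))
    return out


column = [2896, 1448, 0, 1448, 2896, 1448, 0, 1448]   # integer values of the first column
coefficients = dct_1d(column)
```

The coefficients for k = 2 and k = 6 vanish for this column, and the remaining six agree with the tabulated values to within a few tenths.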
It is also necessary to trace the operation in the VHDL structural models shown in
Figure 2. To understand the structural operation and calculation of the 1-D DCT in the
VHDL simulation in more detail, a manual derivation and calculation are carried out for









the purpose. First, the values in Table V need to be converted into binary numbers, which
are shown in Table X. It is clear that only one column of Table X is nonzero. Therefore,
only one column of the 1-D DCT needs computation. The values in the first
column are input into the 1-D DCT VHDL model, which yields the serial 2-bit
addition/subtraction results shown in Table XI.

     Slice (0+7)   00 01 00 00 11 11 10 00   A
 U   Slice (1+6)   00 00 01 01 10 10 10 00   B
     Slice (2+5)   00 00 01 01 10 10 10 00   C
     Slice (3+4)   00 01 00 00 11 11 10 00   D
     Slice (3-4)   11 11 01 10 01 01 10 00   D
 V   Slice (2-5)   11 11 01 10 01 01 10 00   C
     Slice (1-6)   00 00 10 01 10 10 10 00   B
     Slice (0-7)   00 00 10 01 10 10 10 00   A

The first column in Table XI shows how the 2-bit addition/subtraction is done. The first
row on the top represents the clock cycle. The rows in the upper half (U) correspond
to even values of "k", and the rows in the lower half (V) correspond to odd values
of "k". Each half column has four bits, forming a bus to address the corresponding
ROM coefficients. For example, at the first clock cycle, there are two 4-bit buses. The
four least significant bits (LSB) form an "ABCD" bus equal to "0000", which addresses
the "U00" (refer to Fig. 2) ROM value. This yields the value "0000000000000000" as
output. The MSBs of the first clock cycle address the "U01" ROM value, which also
reads out "0000000000000000". It is then added to the 1-bit right-shifted value of "U00".
This result is stored in REG_H and then right-shifted 2 bits in the SHI_2 register. The
first clocked 2-bit right-shifted word is then fed back to ADD_I and added to the second
clocked result "0101101010000010". The procedure for getting this second clocked result
is the same as that for getting the first clocked result. The sum of the first 2-bit
right-shifted number and the second clocked result "0101101010000010" is then shifted
right 2 bits, yielding "0001011010100000". This value is then added to the third clocked
result "0111000100100010", yielding "1000011111000010". This process goes on
serially until the 8th clock cycle is reached. The addressed ROM output for the MSB
of the 8th clock cycle, "0000000000000000", is subtracted from the 1-bit right-shifted
addressed ROM value of the LSB of the 8th clock cycle, "0000000000000000". This result
is then added to the value accumulated over the previous 7 clocks, yielding
"0000011111111111". This final result is then right-shifted 2 bits, yielding
"0000000111111111", and output as the first 2-D DCT coefficient of the first
column. The 2-D DCT coefficients of the 8x8 image block of pattern I obtained by
structural manual calculation are shown in Table XII. The detailed calculation procedure
is listed in Appendix D. Note that the sum of the value accumulated over the first two
clocks and the third clocked result generates an overflow. This overflow will eventually
generate a negative value when right-shifted 2 bits. This is an inherent drawback of using
16-bit integer arithmetic.

Table XII: 2-D DCT coefficients of pattern (I) using manual calculation
0000000111111111 0000000001011111 0000000011000100 0000000011111111 1111111111111100 1111111111101100
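The overflow noted above can be reproduced numerically with the second and third clocked results quoted in the trace. The Python sketch below assumes a 16-bit adder that discards the carry out and a sign-extending 2-bit right shift; the helper name `to_signed` is hypothetical.

```python
MASK = 0xFFFF


def to_signed(word: int) -> int:
    """Read a 16-bit pattern as a 2's complement integer."""
    return word - 0x10000 if word & 0x8000 else word


second = int("0101101010000010", 2)    # second clocked result from the trace
third = int("0111000100100010", 2)     # third clocked result from the trace

shifted = second >> 2                  # positive word, so a plain shift suffices
total = (shifted + third) & MASK       # 16-bit sum; the true sum exceeds 32767
after_shift = (to_signed(total) >> 2) & MASK   # sign bit is replicated by the shift
```

Here `total` has its sign bit set even though both operands are positive, so the following right shift fills the top bits with ones and the accumulated value is read as negative.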
C. SIMULATION AND TEST OF IMAGE PATTERN (II)
Image pattern II is equal to image pattern I rotated by 45°. The following formula
was used to calculate each pixel value.
$$f(x, y) = \frac{\cos\bigl(2\pi(\tfrac{1}{4})x + 2\pi(\tfrac{1}{4})y\bigr) + 1}{2} \times 128 \qquad (35)$$

Table XIII: 8x8 image block pixel values of pattern (II)
7     64   128    64     0    64   128    64     0
6      0    64   128    64     0    64   128    64
5     64     0    64   128    64     0    64   128
4    128    64     0    64   128    64     0    64
3     64   128    64     0    64   128    64     0
2      0    64   128    64     0    64   128    64
1     64     0    64   128    64     0    64   128
0    128    64     0    64   128    64     0    64
       0     1     2     3     4     5     6     7

The 8x8 image block pixel values of pattern II, in decimal, are
shown in Table XIII. The 2-D DCT of pattern II has been calculated in two ways:
VHDL simulation and the spider subroutine. For the VHDL simulation, Table XIII is
converted into binary numbers and input column by column into the VHDL 1-D DCT
test bench. Its 1-D DCT coefficients are shown in Table XIV. For the 2-D DCT, the values
in Table XIV are then transposed manually, and the results are input into the 16-bit
VHDL 1-D DCT test bench. The 2-D DCT spectral coefficients for pattern II in the VHDL
simulation are listed in Table XV.

Table XIV: 1-D DCT coefficients of pattern (II) using VHDL simulation
7   016A 016A 016A 016A 016A 016A 016A 016A
6   FFBB 0043 0043 FFBB FFBB 0043 0043 FFBB
5   0000 0000 0000 0000 0000 0000 0000 0000
4   FF74 008A 008A FF74 FF74 008A 008A FF74
3   00B5 00B5 FF4A FF4A 00B5 00B5 FF4A FF4A
2   005D FFA2 FFA2 005D 005D FFA2 FFA2 005D
1   0000 0000 0000 0000 0000 0000 0000 0000
0   000D FFF2 FFF2 000D 000D FFF2 FFF2 000D
    0    1    2    3    4    5    6    7

Table XV: 2-D DCT coefficients of pattern (II) using VHDL simulation
7   005F 0000 0000 0000 0000 0000 0000 0000
6   FFFF 0000 0000 0000 FFE7 0000 0000 0000
5   0000 0000 0000 0000 0000 0000 0000 0000
4   FFFF 0000 0000 0000 FFCE 0000 0000 0000
3   F555 0017 0000 0031 0000 FFDE 0000 FFFC
2   FFFF 0000 0000 0000 0020 0000 0000 0000
1   0000 0000 0000 0000 0000 0000 0000 0000
0   FFFF 0000 0000 0000 0005 0000 0000 0000
    0    1    2    3    4    5    6    7
Table XVI: Pattern II 1-D DCT coefficients using Spider Subroutine
7   181.0   181.0   181.0   181.0   181.0   181.0   181.0   181.0
...
0   6.757  -6.757  -6.757   6.757   6.757  -6.757  -6.757   6.757
        0       1       2       3       4       5       6       7
The 1-D DCT subroutine in Spider is used to double-check the VHDL simulation
result. The values in Table XIII are processed column by column, and the result is listed
in Table XVI. This result is compared with that of Table XIV for verification.
A 2-D DCT floating-point calculation is also used to check the VHDL simulation.
Again, for the sake of comparison, the values in Table XIV are taken and converted from
hexadecimal into integers. After this conversion, the values are transposed and processed
by the 1-D DCT Spider subroutine column by column. The results are shown in
Table XVII.
D. RESULT ANALYSIS
Four methods were used to verify the accuracy of the structural 1-D DCT in the VHDL
simulation. Comparing Tables VI, VIII, IX, and XII, the similarities among them are
obvious. Tables VIII and IX are almost the same, while Tables VI and XII need to be
converted into decimal numbers for ease of comparison. Table VI is first converted into
16-bit binary values; the definition of the 16-bit binary number system (1 sign bit,
1 integer bit, and 14 fraction bits) is then used to convert the binary words into decimal
numbers.
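The conversion from a 16-bit word to a decimal number can be sketched as follows. This is only an illustration under the stated format (1 sign bit, 1 integer bit, 14 fraction bits, two's complement); the entity name and the helper functions to_int and q14_to_real are hypothetical and do not appear in the thesis packages. The sample word is one of the ROM entries listed in the appendix.

```vhdl
-- Illustrative sketch only; names are assumptions, not thesis code.
entity q14_demo is
end q14_demo;

architecture sim of q14_demo is
  -- Interpret a 16-bit two's-complement word as a signed integer.
  function to_int(w : bit_vector(15 downto 0)) return integer is
    variable v : integer := 0;
  begin
    for k in 14 downto 0 loop
      v := v * 2;
      if w(k) = '1' then
        v := v + 1;
      end if;
    end loop;
    if w(15) = '1' then
      v := v - 32768;  -- weight of the sign bit
    end if;
    return v;
  end to_int;

  -- Scale by 2**(-14): 14 of the 16 bits are fraction bits.
  function q14_to_real(w : bit_vector(15 downto 0)) return real is
  begin
    return real(to_int(w)) / 16384.0;
  end q14_to_real;
begin
  process
  begin
    -- "0001011010100000" is 5792, i.e. 5792 / 16384 = 0.353515625.
    assert to_int("0001011010100000") = 5792
      report "integer value mismatch" severity failure;
    assert abs(q14_to_real("0001011010100000") - 0.353515625) < 1.0e-9
      report "scaled value mismatch" severity failure;
    -- All ones is -1 in two's complement.
    assert to_int(X"FFFF") = -1
      report "sign handling mismatch" severity failure;
    wait;
  end process;
end sim;
```

The value 0.3535 is close to cos(pi/4)/2 = 0.35355, which suggests that the ROM words store the scaled cosine sums truncated to 14 fraction bits.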
The scale factor, which accounts for the number of times the value has been
right-shifted, is 2^17 here. The equivalent integer values of Table VI and Table XII are
shown in Tables XVIII and XIX. Most of the values are similar to those in Tables VIII
and IX, with a few differences. Two reasons explain these differences. First, the 16-bit
binary number representation is limited: fractional parts smaller than 2^-14 are
truncated. This causes small differences between Tables VI, XII and Tables XVIII, XIX.
The second reason is overflow. The accumulated sum of the coefficients may be greater
than the largest number that the 16-bit binary number system can represent. This
overflow causes larger differences between Tables VI, XII and Tables XVIII, XIX.
Table XVIII: Equivalent decimal numbers of Table (VI)
4024 760 1568 2040 -1056 -152
Table XIX: Equivalent decimal numbers of Table (XII)
4088 760 1568 2040 -1056 -160
A way was found to indicate the overflow condition. The check can be made in
ADD_G and ADD_I by adding the following VHDL source code right after the
integer-to-binary conversion.
if ((x1(15) = '1' and x2(15) = '1' and n1(15) = '0') or
    (x1(15) = '0' and x2(15) = '0' and n1(15) = '1')) then
  over1 <= '1';
end if;
if ((x3(15) = '1' and x4(15) = '1' and n2(15) = '0') or
    (x3(15) = '0' and x4(15) = '0' and n2(15) = '1')) then
  over2 <= '1';
end if;
-- over3 through over7 are checked in the same way
if ((x15(15) = '1' and x16(15) = '1' and n8(15) = '0') or
    (x15(15) = '0' and x16(15) = '0' and n8(15) = '1')) then
  over8 <= '1';
end if;
Of course, a dedicated signal must also be declared in the port declaration in order to
notify the test bench of this overflow condition. The VHDL source code for the port
declaration is shown below.
port(...; b1,b2,b3,b4,b5,b6,b7,b8 : out bit_vector(15 downto 0);
     over1,over2,over3,over4,over5,over6,over7,over8 : out bit;
     CLK : bit);
In addition to the port modification, the port of the test bench component also needs
to be modified. The last step in signaling the overflow condition is to declare signals
and enable the "port map" to receive the overflow signals coming from ADD_G and
ADD_I. The VHDL source code is shown below.
signal ov1,ov2,ov3,ov4,ov5,ov6,ov7,ov8 : bit;
g : add_g port map(f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,
    g1,g2,g3,g4,g5,g6,g7,g8,ck,qo,ov1,ov2,ov3,ov4,ov5,ov6,ov7,ov8);
i : add_i port map(h1,r1,h2,r2,h3,r3,h4,r4,h5,r5,h6,r6,h7,r7,h8,r8,
    i1,i2,i3,i4,i5,i6,i7,i8,ck,ov1,ov2,ov3,ov4,ov5,ov6,ov7,ov8);
Whenever an overflow bit "ov#" changes to '1', it indicates that the corresponding pixel
value has experienced overflow.
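The sign-bit test used for the overflow flags can be exercised in isolation. The following is a minimal self-checking sketch, assuming 16-bit two's-complement operands; the entity name, the function name overflow, and the test operands are hypothetical and are not part of ADD_G or ADD_I.

```vhdl
-- Illustrative sketch only; names and operands are assumptions.
entity ovf_demo is
end ovf_demo;

architecture sim of ovf_demo is
  -- Returns '1' when both operands share a sign and the sum has the
  -- opposite sign, which is the condition tested for over1 .. over8.
  function overflow(x1, x2, n1 : bit_vector(15 downto 0)) return bit is
  begin
    if (x1(15) = '1' and x2(15) = '1' and n1(15) = '0') or
       (x1(15) = '0' and x2(15) = '0' and n1(15) = '1') then
      return '1';
    else
      return '0';
    end if;
  end overflow;
begin
  process
  begin
    -- 0x4000 + 0x4000 = 0x8000: two positive operands, negative sum.
    assert overflow(X"4000", X"4000", X"8000") = '1'
      report "positive overflow not detected" severity failure;
    -- 0x8000 + 0x8000 wraps to 0x0000: two negative operands, positive sum.
    assert overflow(X"8000", X"8000", X"0000") = '1'
      report "negative overflow not detected" severity failure;
    -- 0x0001 + 0xFFFF = 0x0000: operands of mixed sign cannot overflow.
    assert overflow(X"0001", X"FFFF", X"0000") = '0'
      report "false overflow reported" severity failure;
    wait;
  end process;
end sim;
```

Only the three sign bits are inspected, so the check adds very little logic to each adder.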
E. EXPERIENCE
The experience gained in this work is listed below.
1. Input Data Sequential Order error
The pixels fed to the parallel shift register were assumed to arrive in the order
7, 6, ..., 0. According to the transposed sequence, however, the actual input data arrive
in the order 0, 1, 2, ..., 7. An error therefore occurs if the sequence of the transposed
data is not reversed. This would mean adding another reversing circuit between the
transpose circuit and the input "load" circuit, but adding an extra circuit is rather
complicated. The easiest way to solve this problem is to input the data in the order
0, 1, ..., 7 and switch the subtrahend connections (0-7, 1-6, 2-5, 3-4) in the 2-bit
adder/subtractor circuit. In this way, both the input data and the output data are always
in the order 0, 1, 2, ..., 7, and no extra circuit is needed.
2. Formation of 2-bit Adder in VHDL source code
The interface of a 2-bit adder has five inputs (two for the augend, two for the
addend, and one for the carry-in) and three outputs (two for the sum and one for the
carry-out). A truth table covering all possible input combinations can therefore be made.
With five inputs there are 2^5 = 32 combinations. After the 8 x 32 truth table is built,
Karnaugh map reduction can be used to minimize the Boolean expressions. These
Boolean expressions are what is used in the VHDL source code. A detailed example is
listed in Appendix E.
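The reduced Boolean expressions can be checked against ordinary integer addition over all 32 input combinations. The sketch below copies the three expressions from the "adsu" listing in the appendix; the test entity and the assignment of roles to the inputs (c(0) as carry-in, c(2) c(1) as one operand, c(4) c(3) as the other) are assumptions made for this illustration.

```vhdl
-- Illustrative check only; the input roles are an assumption.
entity add2_check is
end add2_check;

architecture sim of add2_check is
begin
  process
    variable c    : bit_vector(4 downto 0);
    variable d    : bit_vector(2 downto 0);
    variable got  : integer;
    variable want : integer;
  begin
    for n in 0 to 31 loop
      -- Unpack n into the five input bits c(4) .. c(0).
      for k in 0 to 4 loop
        if (n / 2**k) mod 2 = 1 then
          c(k) := '1';
        else
          c(k) := '0';
        end if;
      end loop;
      -- Sum bit 0, sum bit 1 and carry-out, as in the "adsu" listing.
      d(0) := (c(1) and (not c(3)) and (not c(0)))
           or (not c(1) and c(3) and (not c(0)))
           or (not c(1) and (not c(3)) and c(0))
           or (c(1) and c(3) and c(0));
      d(1) := (not c(2) and not c(1) and c(4) and not c(0))
           or (not c(2) and c(4) and not c(3) and not c(0))
           or (c(2) and not c(4) and not c(3) and not c(0))
           or (c(2) and not c(1) and not c(4) and not c(0))
           or (not c(2) and c(1) and not c(4) and c(3))
           or (c(2) and c(1) and c(3) and c(4))
           or (not c(1) and not c(2) and c(4) and not c(3))
           or (not c(1) and c(2) and not c(3) and not c(4))
           or (c(1) and not c(2) and not c(4) and c(0))
           or (not c(2) and not c(4) and c(3) and c(0))
           or (c(2) and c(3) and c(4) and c(0))
           or (c(2) and c(1) and c(4) and c(0));
      d(2) := (c(1) and c(2) and c(3))
           or (c(1) and c(3) and c(4))
           or (c(1) and c(2) and c(0))
           or (c(2) and c(3) and c(0))
           or (c(3) and c(4) and c(0))
           or (c(2) and c(4))
           or (c(1) and c(4) and c(0));
      -- Pack the three output bits into an integer 0 .. 7.
      got := 0;
      for k in 0 to 2 loop
        if d(k) = '1' then
          got := got + 2**k;
        end if;
      end loop;
      -- Reference result: carry-in + first operand + second operand.
      want := (n mod 2)
            + ((n / 2) mod 2) + 2 * ((n / 4) mod 2)
            + ((n / 8) mod 2) + 2 * ((n / 16) mod 2);
      assert got = want
        report "2-bit adder mismatch" severity failure;
    end loop;
    wait;
  end process;
end sim;
```

An exhaustive loop of this kind is cheap for five inputs and catches a mistyped product term immediately, which is harder to spot by reading the expressions.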
3. No Timing Control in ADD_I Model
Almost every circuit needs a clock to trigger and control its sequential
process. ADD_I is a special adder circuit without a triggering clock. As mentioned
earlier, the accumulator of the serial bit results consists of ADD_I and SHI_2. ADD_I
adds the incoming clocked result to the latest accumulated result right after it has been
right-shifted by 2 bits. If both circuits were triggered by the clock, there would be a
delay of one clock cycle between ADD_I and SHI_2. In other words, ADD_I would add
the incoming clocked result to the right-shifted accumulated result from one clock cycle
earlier rather than the latest one. This would cause an error in the output coefficients.
The way to remove this one-cycle delay between ADD_I and SHI_2 is to allow only one
clocked trigger in the accumulator. An alternative that was considered is to let the clock
trigger ADD_I instead of SHI_2. However, experiment shows that this cannot be done:
SHI_2 has to be cleared on every 8th clock cycle, and this clearing needs a counter to
determine the exact time. In addition, SHI_2 has to output the correct accumulated
result every 8th clock period. Both requirements need a clock to control the timing. This
is why ADD_I was chosen not to be triggered by the clock.
4. "Set" control in Test Bench
Oddly, the "set" control in the test bench does not take the value '1' at the
beginning of the simulation. The function of "set" is to initialize all of the subtractors'
carries in "adsu" to '1' in order to perform the subtraction. This initialization is
performed only once; afterwards the carry of the subtractor propagates by itself. That is
to say, the carry is a variable in "adsu". This carry variable is first initialized by "set"
and would be affected by "set" again at later times if the handling of the signal "set"
were not modified. Fortunately, "set" has to change only once, from '0' to '1', at the
beginning of the simulation. Therefore, an "event" test is used so that "set" acts as a
sensitivity signal. Since "set" changes only once, it has no further influence on the carry
variable. In addition, the time at which "set" changes its state is very important: the
clock is '0' at the beginning of the simulation and changes its state to '1' after 5 ns. If
"set" changes its state at any time other than 5 ns, the subtraction result will be wrong.
Only when "set" changes its state at 5 ns will the result of the subtraction be correct.
5. Signals cannot be used as variables in VHDL
In solving the problem mentioned in the previous section, an attempt was made
to use the "set" signal directly as a variable within the process. This yields a syntax
error during compilation of the source code.
6. Preventing Negative Zero occurrences in Pack1
A block of source code was added to pack1, at the end of "in_to_bi", when
negative zeros were found during the simulation. When these negative zeros arrive at the
input of shi_2, they generate very large negative numbers and cause an error at the
output. This unwanted situation was handled by adding source code that checks for
negative zeros at the end of the integer-to-binary conversion procedure. Although this
extra check works, it implies that an extra circuit must be added, which is not a goal of
circuit design. A close inspection of the in_to_bi source code revealed a very small
mistake. Before the bit stream is converted into 2's complement code, the sign of the
integer is checked in order to assign the correct sign bit "w(15)" of the converted binary
number. It was found that "w(15) := '0'" is assigned only when "m > '0'". All other
values are assigned "w(15) := '1'". This is how negative zeros are generated. Had the
source code "m > '0'" been changed to "m >= '0'", the extra negative-zero checking
code would not be necessary.
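The effect of the sign test on the value zero can be illustrated with a simplified conversion function. The sketch below is not the in_to_bi procedure from pack1, whose body is not reproduced here: the function signature, the boolean parameter strict, and the way the low 15 bits are formed are assumptions. With strict set, the sign test is "greater than zero" as described above; otherwise it is "greater than or equal to zero".

```vhdl
-- Illustrative sketch only; this is not the thesis in_to_bi procedure.
entity negzero_demo is
end negzero_demo;

architecture sim of negzero_demo is
  -- Integer to 16-bit two's-complement conversion with a selectable sign test.
  function in_to_bi(m : integer; strict : boolean) return bit_vector is
    variable w : bit_vector(15 downto 0) := (others => '0');
    variable t : integer;
  begin
    if (strict and m > 0) or ((not strict) and m >= 0) then
      w(15) := '0';
      t := m;
    else
      w(15) := '1';
      t := m + 32768;  -- low 15 bits of the two's-complement code
    end if;
    for k in 0 to 14 loop
      if (t / 2**k) mod 2 = 1 then
        w(k) := '1';
      end if;
    end loop;
    return w;
  end in_to_bi;
begin
  process
  begin
    -- With the strict test, zero gets a sign bit of '1': a "negative zero"
    -- that reads as -32768 in two's complement.
    assert in_to_bi(0, true) = X"8000"
      report "expected negative zero" severity failure;
    -- With the relaxed test, zero converts to all zeros.
    assert in_to_bi(0, false) = X"0000"
      report "zero not converted correctly" severity failure;
    -- Other values are unaffected by the choice of test.
    assert in_to_bi(362, false) = X"016A"
      report "positive value mismatch" severity failure;
    assert in_to_bi(-69, false) = X"FFBB"
      report "negative value mismatch" severity failure;
    wait;
  end process;
end sim;
```

The values 016A and FFBB are taken from Table XIV; they correspond to 362 and -69 in decimal.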
VI. CONCLUSION
The main objectives of this thesis, namely using VHDL to describe a structural 1-D
DCT architecture for an 8 x 8 image block and simulating it on a workstation, have been
reached. The basic theory of the 1-D DCT, the principle of distributed arithmetic, and
the actual hardware architecture have been made clearer through the VHDL simulation.
Above all, experience in using VHDL to describe an algorithm and in simulating the
VHDL description has been gained. Although becoming familiar with the language and
its simulation was time-consuming, the benefits of signal tracing and timing modeling
have been demonstrated in this thesis. VHDL is itself a portable form of documentation
and a hierarchical language. Therefore, the work in this thesis can be adopted in other,
more complicated designs.
Although the integer VHDL simulation result is not as precise as a floating-point
calculation, the resulting 1-D DCT energy spectrum is already good enough to recover
the original image block. Besides, absolute accuracy is not important for image
compression; it is the relative value between pixel points that matters. Another point
worth mentioning is that the approach in this thesis has an advantage in calculation
speed, since the hardware for floating-point calculation is much more complicated than
that for integer calculation.
One very important module was not described: the transpose module. The transpose
module can be connected to the test bench to make the 2-D DCT simulation automatic.
The simulation done here is only the initial part of the "top-down design" process.
The algorithm of an 8 x 8 image block 2-D DCT was implemented as a VHDL behavioral
description. This behavioral description can be further developed into gate-level
descriptions. Once the gate level is reached, the hardware circuit implementation can be
realized.





architecture clkctl of clockge is
begin
process(CLCK)
variable I : integer := 0;
begin
CLCK <= not CLCK after 5 ns;
I := I + 1;





Serial load parallel shift register
entity LOAD is
port (AI : in bit_vector(11 downto 0); B0,B1,B2,B3,B4,B5,B6,B7 : out bit_vector(11
downto 0); CLK : in bit);
end LOAD;
architecture BEH of LOAD is
type shift is array (0 to 7) of bit_vector(11 downto 0);
begin
process
variable A : shift;
variable I,count : integer := 0;
begin
wait until CLK'event and CLK = '1';
for count in 0 to 7 loop
wait until CLK'event and CLK = '1';




if (count = 7) and (CLK'event and CLK = '1') then
B0 <= A(7);
B1 <= A(6);
B2 <= A(5);
B3 <= A(4);
B4 <= A(3);
B5 <= A(2);
B6 <= A(1);






Twice faster clock generator
entity clock is
port(CLK : inout bit := '1');
end clock;
architecture beh of clock is
begin
process(CLK)
variable I : integer := 0;
begin
CLK <= not CLK after 2.5 ns;
I := I + 1;







port(a : bit; b : out bit; CLK : bit);
end delay10;
architecture beh of delay10 is
begin
process
variable x : bit;
begin






Parallel shift out 2-bit register
entity shift is
port(bi0,bi1,bi2,bi3,bi4,bi5,bi6,bi7 : in bit_vector(11 downto 0);
bo0,bo1,bo2,bo3,bo4,bo5,bo6,bo7 : out bit_vector(1 downto 0);
CLK : in bit);
end shift;
architecture beh of shift is
begin
process
variable I : integer := 0;
begin
wait for 90 ns;
for r in 0 to 5 loop
wait until CLK'event and CLK = '1';
bo0(0) <= bi0(I);
bo0(1) <= bi0(I+1);
bo1(0) <= bi1(I);
bo1(1) <= bi1(I+1);
bo2(0) <= bi2(I);
bo2(1) <= bi2(I+1);
bo3(0) <= bi3(I);
bo3(1) <= bi3(I+1);
bo4(0) <= bi4(I);
bo4(1) <= bi4(I+1);
bo5(0) <= bi5(I);
bo5(1) <= bi5(I+1);
bo6(0) <= bi6(I);
bo6(1) <= bi6(I+1);
bo7(0) <= bi7(I);
bo7(1) <= bi7(I+1);
















architecture beh of adsu is
begin
process
variable c1,c2,c3,c4,c5,c6,c7,c8 : bit_vector(4 downto 0);
variable d1,d2,d3,d4,d5,d6,d7,d8 : bit_vector(2 downto 0);
variable e1,e2,e3,e4,e5,e6,e7,e8 : bit;
begin
wait until CLK'event and CLK = '1';
if cr'event then
e1 := cr; e2 := cr; e3 := cr; e4 := cr;
end if;
if st'event then







d1(0) := (c1(1) and (not c1(3)) and (not c1(0)))
or (not c1(1) and c1(3) and (not c1(0)))
or (not c1(1) and (not c1(3)) and c1(0))
or (c1(1) and c1(3) and c1(0));
d1(1) := (not c1(2) and not c1(1) and c1(4) and not c1(0))
or (not c1(2) and c1(4) and not c1(3) and not c1(0))
or (c1(2) and not c1(4) and not c1(3) and not c1(0))
or (c1(2) and not c1(1) and not c1(4) and not c1(0))
or (not c1(2) and c1(1) and not c1(4) and c1(3))
or (c1(2) and c1(1) and c1(3) and c1(4))
or (not c1(1) and not c1(2) and c1(4) and not c1(3))
or (not c1(1) and c1(2) and not c1(3) and not c1(4))
or (c1(1) and not c1(2) and not c1(4) and c1(0))
or (not c1(2) and not c1(4) and c1(3) and c1(0))
or (c1(2) and c1(3) and c1(4) and c1(0))
or (c1(2) and c1(1) and c1(4) and c1(0));
d1(2) := (c1(1) and c1(2) and c1(3))
or (c1(1) and c1(3) and c1(4))
or (c1(1) and c1(2) and c1(0))
or (c1(2) and c1(3) and c1(0))
or (c1(3) and c1(4) and c1(0))
or (c1(2) and c1(4))










d2(0) := (c2(1) and (not c2(3)) and (not c2(0)))
or (not c2(1) and c2(3) and (not c2(0)))
or (not c2(1) and (not c2(3)) and c2(0))
or (c2(1) and c2(3) and c2(0));
d2(1) := (not c2(2) and not c2(1) and c2(4) and not c2(0))
or (not c2(2) and c2(4) and not c2(3) and not c2(0))
or (c2(2) and not c2(4) and not c2(3) and not c2(0))
or (c2(2) and not c2(1) and not c2(4) and not c2(0))
or (not c2(2) and c2(1) and not c2(4) and c2(3))
or (c2(2) and c2(1) and c2(3) and c2(4))
or (not c2(1) and not c2(2) and c2(4) and not c2(3))
or (not c2(1) and c2(2) and not c2(3) and not c2(4))
or (c2(1) and not c2(2) and not c2(4) and c2(0))
or (not c2(2) and not c2(4) and c2(3) and c2(0))
or (c2(2) and c2(3) and c2(4) and c2(0))
or (c2(2) and c2(1) and c2(4) and c2(0));
d2(2) := (c2(1) and c2(2) and c2(3))
or (c2(1) and c2(3) and c2(4))
or (c2(1) and c2(2) and c2(0))
or (c2(2) and c2(3) and c2(0))
or (c2(3) and c2(4) and c2(0))
or (c2(2) and c2(4))
or (c2(1) and c2(4) and c2(0));
b1(0) <= d2(0);







d3(0) := (c3(1) and (not c3(3)) and (not c3(0)))
or (not c3(1) and c3(3) and (not c3(0)))
or (not c3(1) and (not c3(3)) and c3(0))
or (c3(1) and c3(3) and c3(0));
d3(1) := (not c3(2) and not c3(1) and c3(4) and not c3(0))
or (not c3(2) and c3(4) and not c3(3) and not c3(0))
or (c3(2) and not c3(4) and not c3(3) and not c3(0))
or (c3(2) and not c3(1) and not c3(4) and not c3(0))
or (not c3(2) and c3(1) and not c3(4) and c3(3))
or (c3(2) and c3(1) and c3(3) and c3(4))
or (not c3(1) and not c3(2) and c3(4) and not c3(3))
or (not c3(1) and c3(2) and not c3(3) and not c3(4))
or (c3(1) and not c3(2) and not c3(4) and c3(0))
or (not c3(2) and not c3(4) and c3(3) and c3(0))
or (c3(2) and c3(3) and c3(4) and c3(0))
or (c3(2) and c3(1) and c3(4) and c3(0));
d3(2) := (c3(1) and c3(2) and c3(3))
or (c3(1) and c3(3) and c3(4))
or (c3(1) and c3(2) and c3(0))
or (c3(2) and c3(3) and c3(0))
or (c3(3) and c3(4) and c3(0))
or (c3(2) and c3(4))
or (c3(1) and c3(4) and c3(0));
b2(0) <= d3(0);







d4(0) := (c4(1) and (not c4(3)) and (not c4(0)))
or (not c4(1) and c4(3) and (not c4(0)))
or (not c4(1) and (not c4(3)) and c4(0))
or (c4(1) and c4(3) and c4(0));
d4(1) := (not c4(2) and not c4(1) and c4(4) and not c4(0))
or (not c4(2) and c4(4) and not c4(3) and not c4(0))
or (c4(2) and not c4(4) and not c4(3) and not c4(0))
or (c4(2) and not c4(1) and not c4(4) and not c4(0))
or (not c4(2) and c4(1) and not c4(4) and c4(3))
or (c4(2) and c4(1) and c4(3) and c4(4))
or (not c4(1) and not c4(2) and c4(4) and not c4(3))
or (not c4(1) and c4(2) and not c4(3) and not c4(4))
or (c4(1) and not c4(2) and not c4(4) and c4(0))
or (not c4(2) and not c4(4) and c4(3) and c4(0))
or (c4(2) and c4(3) and c4(4) and c4(0))
or (c4(2) and c4(1) and c4(4) and c4(0));
d4(2) := (c4(1) and c4(2) and c4(3))
or (c4(1) and c4(3) and c4(4))
or (c4(1) and c4(2) and c4(0))
or (c4(2) and c4(3) and c4(0))
or (c4(3) and c4(4) and c4(0))
or (c4(2) and c4(4))
or (c4(1) and c4(4) and c4(0));
b3(0) <= d4(0);





c5(3) := not a4(0);
c5(4) := not a4(1);
d5(0) := (c5(1) and (not c5(3)) and (not c5(0)))
or (not c5(1) and c5(3) and (not c5(0)))
or (not c5(1) and (not c5(3)) and c5(0))
or (c5(1) and c5(3) and c5(0));
d5(1) := (not c5(2) and not c5(1) and c5(4) and not c5(0))
or (not c5(2) and c5(4) and not c5(3) and not c5(0))
or (c5(2) and not c5(4) and not c5(3) and not c5(0))
or (c5(2) and not c5(1) and not c5(4) and not c5(0))
or (not c5(2) and c5(1) and not c5(4) and c5(3))
or (c5(2) and c5(1) and c5(3) and c5(4))
or (not c5(1) and not c5(2) and c5(4) and not c5(3))
or (not c5(1) and c5(2) and not c5(3) and not c5(4))
or (c5(1) and not c5(2) and not c5(4) and c5(0))
or (not c5(2) and not c5(4) and c5(3) and c5(0))
or (c5(2) and c5(3) and c5(4) and c5(0))
or (c5(2) and c5(1) and c5(4) and c5(0));
d5(2) := (c5(1) and c5(2) and c5(3))
or (c5(1) and c5(3) and c5(4))
or (c5(1) and c5(2) and c5(0))
or (c5(2) and c5(3) and c5(0))
or (c5(3) and c5(4) and c5(0))
or (c5(2) and c5(4))
or (c5(1) and c5(4) and c5(0));
b4(0) <= d5(0);





c6(3) := not a5(0);
c6(4) := not a5(1);
d6(0) := (c6(1) and (not c6(3)) and (not c6(0)))
or (not c6(1) and c6(3) and (not c6(0)))
or (not c6(1) and (not c6(3)) and c6(0))
or (c6(1) and c6(3) and c6(0));
d6(1) := (not c6(2) and not c6(1) and c6(4) and not c6(0))
or (not c6(2) and c6(4) and not c6(3) and not c6(0))
or (c6(2) and not c6(4) and not c6(3) and not c6(0))
or (c6(2) and not c6(1) and not c6(4) and not c6(0))
or (not c6(2) and c6(1) and not c6(4) and c6(3))
or (c6(2) and c6(1) and c6(3) and c6(4))
or (not c6(1) and not c6(2) and c6(4) and not c6(3))
or (not c6(1) and c6(2) and not c6(3) and not c6(4))
or (c6(1) and not c6(2) and not c6(4) and c6(0))
or (not c6(2) and not c6(4) and c6(3) and c6(0))
or (c6(2) and c6(3) and c6(4) and c6(0))
or (c6(2) and c6(1) and c6(4) and c6(0));
d6(2) := (c6(1) and c6(2) and c6(3))
or (c6(1) and c6(3) and c6(4))
or (c6(1) and c6(2) and c6(0))
or (c6(2) and c6(3) and c6(0))
or (c6(3) and c6(4) and c6(0))
or (c6(2) and c6(4))
or (c6(1) and c6(4) and c6(0));
b5(0) <= d6(0);







d7(0) := (c7(1) and (not c7(3)) and (not c7(0)))
or (not c7(1) and c7(3) and (not c7(0)))
or (not c7(1) and (not c7(3)) and c7(0))
or (c7(1) and c7(3) and c7(0));
d7(1) := (not c7(2) and not c7(1) and c7(4) and not c7(0))
or (not c7(2) and c7(4) and not c7(3) and not c7(0))
or (c7(2) and not c7(4) and not c7(3) and not c7(0))
or (c7(2) and not c7(1) and not c7(4) and not c7(0))
or (not c7(2) and c7(1) and not c7(4) and c7(3))
or (c7(2) and c7(1) and c7(3) and c7(4))
or (not c7(1) and not c7(2) and c7(4) and not c7(3))
or (not c7(1) and c7(2) and not c7(3) and not c7(4))
or (c7(1) and not c7(2) and not c7(4) and c7(0))
or (not c7(2) and not c7(4) and c7(3) and c7(0))
or (c7(2) and c7(3) and c7(4) and c7(0))
or (c7(2) and c7(1) and c7(4) and c7(0));
d7(2) := (c7(1) and c7(2) and c7(3))
or (c7(1) and c7(3) and c7(4))
or (c7(1) and c7(2) and c7(0))
or (c7(2) and c7(3) and c7(0))
or (c7(3) and c7(4) and c7(0))
or (c7(2) and c7(4))
or (c7(1) and c7(4) and c7(0));
b6(0) <= d7(0);






c8(4) := not a7(1);
d8(0) := (c8(1) and (not c8(3)) and (not c8(0)))
or (not c8(1) and c8(3) and (not c8(0)))
or (not c8(1) and (not c8(3)) and c8(0))
or (c8(1) and c8(3) and c8(0));
d8(1) := (not c8(2) and not c8(1) and c8(4) and not c8(0))
or (not c8(2) and c8(4) and not c8(3) and not c8(0))
or (c8(2) and not c8(4) and not c8(3) and not c8(0))
or (c8(2) and not c8(1) and not c8(4) and not c8(0))
or (not c8(2) and c8(1) and not c8(4) and c8(3))
or (c8(2) and c8(1) and c8(3) and c8(4))
or (not c8(1) and not c8(2) and c8(4) and not c8(3))
or (not c8(1) and c8(2) and not c8(3) and not c8(4))
or (c8(1) and not c8(2) and not c8(4) and c8(0))
or (not c8(2) and not c8(4) and c8(3) and c8(0))
or (c8(2) and c8(3) and c8(4) and c8(0))
or (c8(2) and c8(1) and c8(4) and c8(0));
d8(2) := (c8(1) and c8(2) and c8(3))
or (c8(1) and c8(3) and c8(4))
or (c8(1) and c8(2) and c8(0))
or (c8(2) and c8(3) and c8(0))
or (c8(3) and c8(4) and c8(0))
or (c8(2) and c8(4))
or (c8(1) and c8(4) and c8(0));
b7(0) <= d8(0);







port(a0,a1,a2,a3,a4,a5,a6,a7 : bit_vector(1 downto 0);
b0,b1,b2,b3,b4,b5,b6,b7 : out bit_vector(1 downto 0);
CLK : bit);
end reg;
architecture beh of reg is
begin
process











b0 <= d0;
b1 <= d1;
b2 <= d2;
b3 <= d3;
b4 <= d4;
b5 <= d5;









port(e0,e1,e2,e3,e4,e5,e6,e7 : bit_vector(1 downto 0);
b10,b11,b20,b21,b30,b31,b40,b41,b50,b51,b60,b61,b70,b71,b80,b81
: out bit_vector(15 downto 0);
CLK : bit);
end rom;
architecture beh of rom is
begin
process
variable a10,a11,a20,a21,a30,a31,a40,a41,a50,a51,a60,a61,a70,a71,
a80,a81 : bit_vector(3 downto 0);
begin
wait until CLK'event and CLK = '1';
a10(3) := e0(0); a10(2) := e1(0); a10(1) := e2(0); a10(0) := e3(0);
a11(3) := e0(1); a11(2) := e1(1); a11(1) := e2(1); a11(0) := e3(1);
a20(3) := e7(0); a20(2) := e6(0); a20(1) := e5(0); a20(0) := e4(0);
a21(3) := e7(1); a21(2) := e6(1); a21(1) := e5(1); a21(0) := e4(1);
a30(3) := e0(0); a30(2) := e1(0); a30(1) := e2(0); a30(0) := e3(0);
a31(3) := e0(1); a31(2) := e1(1); a31(1) := e2(1); a31(0) := e3(1);
a40(3) := e7(0); a40(2) := e6(0); a40(1) := e5(0); a40(0) := e4(0);
a41(3) := e7(1); a41(2) := e6(1); a41(1) := e5(1); a41(0) := e4(1);
a50(3) := e0(0); a50(2) := e1(0); a50(1) := e2(0); a50(0) := e3(0);
a51(3) := e0(1); a51(2) := e1(1); a51(1) := e2(1); a51(0) := e3(1);
a60(3) := e7(0); a60(2) := e6(0); a60(1) := e5(0); a60(0) := e4(0);
a61(3) := e7(1); a61(2) := e6(1); a61(1) := e5(1); a61(0) := e4(1);
a70(3) := e0(0); a70(2) := e1(0); a70(1) := e2(0); a70(0) := e3(0);
a71(3) := e0(1); a71(2) := e1(1); a71(1) := e2(1); a71(0) := e3(1);
a80(3) := e7(0); a80(2) := e6(0); a80(1) := e5(0); a80(0) := e4(0);
a81(3) := e7(1); a81(2) := e6(1); a81(1) := e5(1); a81(0) := e4(1);
case a10 is
when "0000" => b10 <= "0000000000000000";
when "0001" => b10 <= "0001011010100000";
when "0010" => b10 <= "0001011010100000";
when "0011" => b10 <= "0010110101000001";
when "0100" => b10 <= "0001011010100000";
when "0101" => b10 <= "0010110101000001";
when "0110" => b10 <= "0010110101000001";
when "0111" => b10 <= "0100001111100001";
when "1000" => b10 <= "0001011010100000";
when "1001" => b10 <= "0010110101000001";
when "1010" => b10 <= "0010110101000001";
when "1011" => b10 <= "0100001111100001";
when "1100" => b10 <= "0010110101000001";
when "1101" => b10 <= "0100001111100001";
when "1110" => b10 <= "0100001111100001";
when "1111" => b10 <= "0101101010000010";
end case;
case a11 is
when "0000" => b11 <= "0000000000000000";
when "0001" => b11 <= "0001011010100000";
when "0010" => b11 <= "0001011010100000";
when "0011" => b11 <= "0010110101000001";
when "0100" => b11 <= "0001011010100000";
when "0101" => b11 <= "0010110101000001";
when "0110" => b11 <= "0010110101000001";
when "0111" => b11 <= "0100001111100001";
when "1000" => b11 <= "0001011010100000";
when "1001" => b11 <= "0010110101000001";
when "1010" => b11 <= "0010110101000001";
when "1011" => b11 <= "0100001111100001";
when "1100" => b11 <= "0010110101000001";
when "1101" => b11 <= "0100001111100001";
when "1110" => b11 <= "0100001111100001";
when "1111" => b11 <= "0101101010000010";
end case;
case a20 is
when "0000" => b20 <= "0000000000000000";
when "0001" => b20 <= "0000011000111110";
when "0010" => b20 <= "0001000111000111";
when "0011" => b20 <= "0001100000000101";
when "0100" => b20 <= "0001101010011011";
when "0101" => b20 <= "0010000011011001";
when "0110" => b20 <= "0010110001100010";
when "0111" => b20 <= "0011001010100000";
when "1000" => b20 <= "0001111101100010";
when "1001" => b20 <= "0010010110100000";
when "1010" => b20 <= "0011000100101001";
when "1011" => b20 <= "0011011101101000";
when "1100" => b20 <= "0011100111111101";
when "1101" => b20 <= "0100000000111100";
when "1110" => b20 <= "0100101111000101";
when "1111" => b20 <= "0101001000000011";
end case;
case a21 is
when "0000" = > b21 < = "0000000000000000";
when "0001" = > b21 < = "0000011000111110";
when "0010" = > b21 < = "0001000111000111";
when "0011" = > b21 < = "0001100000000101";
when "0100" = > b21 < = "0001101010011011";
when "0101" = > b21 < = "0010000011011001";
when "0110" = > b21 < = "0010110001100010";
when "0111" = > b21 < = "0011001010100000";
when "1000" = > b21 < = "0001111101100010";
when "1001" = > b21 < = "0010010110100000";
when "1010" = > b21 < = "0011000100101001";
when "1011" = > b21 < = "0011011101101000";
when "1100" = > b21 < = "0011100111111101";
when "1101" = > b21 < = "0100000000111100";
when "1110" = > b21 < = "0100101111000101";
when "1111" = > b21 < = "0101001000000011";
end case;
case a30 is
when "0000" = > b30 < = "0000000000000000";
when "0001" = > b30 < = "1110001001110000";
when "0010" = > b30 < = "1111001111000010";
when "0011" = > b30 < = "1101011000110001";
when "0100" = > b30 < = "0000110000111110";
when "0101" = > b30 < = "1110111010101111";
when "0110" = > b30 < = "0000000000000000";
when "0111" = > b30 < = "1110001001110000";
when "1000" = > b30 < = "0001110110010000";
when "1001" = > b30 < = "0000000000000000";
when "1010" = > b30 < = "0001000101010001";
when "1011" = > b30 < = "1111001111000010";
when "1100" = > b30 < = "0010100111001111";
when "1101" = > b30 < = "0000110000111110";
when "1110" = > b30 < = "0001110110010000";







































































when "0000" => b40 <= "0000000000000000";
when "0001" => b40 <= "1110111000111001";
when "0010" => b40 <= "1110000010011110";
when "0011" => b40 <= "1100111011010111";
when "0100" => b40 <= "1111100111000010";
when "0101" => b40 <= "1110011111111011";
when "0110" => b40 <= "1101101001100000";
when "0111" => b40 <= "1100100010011000";
when "1000" => b40 <= "0001101010011011";
when "1001" => b40 <= "0000100011010100";
when "1010" => b40 <= "1111101100111001";
when "1011" => b40 <= "1110100101110010";
when "1100" => b40 <= "0001010001011101";
when "1101" => b40 <= "0000001010010101";
when "1110" => b40 <= "1111010011111011";





when "0000" => b41 <= "0000000000000000";
when "0001" => b41 <= "1110111000111001";
when "0010" => b41 <= "1110000010011110";
when "0011" => b41 <= "1100111011010111";
when "0100" => b41 <= "1111100111000010";
when "0101" => b41 <= "1110011111111011";
when "0110" => b41 <= "1101101001100000";
when "0111" => b41 <= "1100100010011000";
when "1000" => b41 <= "0001101010011011";
when "1001" => b41 <= "0000100011010100";
when "1010" => b41 <= "1111101100111001";
when "1011" => b41 <= "1110100101110010";
when "1100" => b41 <= "0001010001011101";
when "1101" => b41 <= "0000001010010101";
when "1110" => b41 <= "1111010011111011";
when "1111" => b41 <= "1110001100110100";
end case;
case a50 is
when "0000" = > b50 < = "0000000000000000";
when "0001" = > b50 < = "0001011010100000";
when "0010" = > b50 < = "1110100101100000";
when "0011" = > b50 < = "0000000000000000";
when "0100" = > b50 < = "1110100101100000";
when "0101" = > b50 < = "0000000000000000";
when "0110" = > b50 < = "1101001010111111";
when "0111" = > b50 < = "1110100101100000";
when "1000" = > b50 < = "0001011010100000";
when "1001" = > b50 < = "0010110101000001";
when "1010" = > b50 < = "0000000000000000";
When "1011" = > b50 < = "0001011010100000";
When "1100" = > b50 < = "0000000000000000";
When "1101" = > b50 < = "0001011010100000";
When "1110" = > b50 < = "1110100101100000";
When "1111" => b50 <= "0000000000000000";
end case;
case a51 is
when "0000" = > b51 < = "0000000000000000";
when "0001" = > b51 < = "0001011010100000";
when "0010" = > b51 < = "1110100101100000";
when "0011" = > b51 < = "0000000000000000";
when "0100" = > b51 < = "1110100101100000";
when "0101" = > b51 < = "0000000000000000";
when "0110" = > b51 < = "1101001010111111";
when "0111" = > b51 < = "1110100101100000";
when "1000" = > b51 < = "0001011010100000";
when "1001" = > b51 < = "0010110101000001";
when "1010" = > b51 < = "0000000000000000";
When "1011" = > b51 < = "0001011010100000";
When "1100" = > b51 < = "0000000000000000";
When "1101" = > b51 < = "0001011010100000";
When "1110" = > b51 < = "1110100101100000";
When "1111" => b51 <= "0000000000000000";
end case;
case a60 is
when "0000" = > b60 < = "0000000000000000";
when "0001" = > b60 < = "0001101010011011";
when "0010" = > b60 < = "0000011000111110";
when "0011" = > b60 < = "0010000011011001";
when "0100" = > b60 < = "1110000010011110";
when "0101" = > b60 < = "1111101100111001";
when "0110" = > b60 < = "1110011011011100";
when "0111" = > b60 < = "0000000101110110";
when "1000" = > b60 < = "0001000111000111";
when "1001" = > b60 < = "0010110001100010";
when "1010" = > b60 < = "0001100000000101";
When "1011" = > b60 < = "0011001010100000";
When "1100" = > b60 < = "1111001001100101";
When "1101" = > b60 < = "0000110100000000";
When "1110" = > b60 < = "1111100010100011";
When "1111" = > b60 < = "0001001100111110";
end case;
case a61 is
when "0000" => b61 <= "0000000000000000";
when "0001" => b61 <= "0001101010011011";
when "0010" => b61 <= "0000011000111110";
when "0011" => b61 <= "0010000011011001";
when "0100" => b61 <= "1110000010011110";
when "0101" => b61 <= "1111101100111001";
when "0110" => b61 <= "1110011011011100";
when "0111" => b61 <= "0000000101110110";
when "1000" => b61 <= "0001000111000111";
when "1001" => b61 <= "0010110001100010";
when "1010" => b61 <= "0001100000000101";
when "1011" => b61 <= "0011001010100000";
when "1100" => b61 <= "1111001001100101";
when "1101" => b61 <= "0000110100000000";
when "1110" => b61 <= "1111100010100011";
when "1111" => b61 <= "0001001100111110";
end case;
case a70 is
when "0000" => b70 <= "0000000000000000";
when "0001" => b70 <= "1111001111000010";
when "0010" => b70 <= "0001110110010000";
when "0011" => b70 <= "0001000101010001";
when "0100" => b70 <= "1110001001110000";
when "0101" => b70 <= "1101011000110001";
when "0110" => b70 <= "0000000000000000";
when "0111" => b70 <= "1111001111000010";
when "1000" => b70 <= "0000110000111110";
when "1001" => b70 <= "0000000000000000";
when "1010" => b70 <= "0010100111001111";
when "1011" => b70 <= "0001110110010000";
when "1100" => b70 <= "1110111010101111";
when "1101" => b70 <= "1110001001110000";
when "1110" => b70 <= "0000110000111110";
when "1111" => b70 <= "0000000000000000";
end case;
case a71 is
when "0000" => b71 <= "0000000000000000";
when "0001" => b71 <= "1111001111000010";
when "0010" => b71 <= "0001110110010000";
when "0011" => b71 <= "0001000101010001";
when "0100" => b71 <= "1110001001110000";
when "0101" => b71 <= "1101011000110001";
when "0110" => b71 <= "0000000000000000";
when "0111" => b71 <= "1111001111000010";
when "1000" => b71 <= "0000110000111110";
when "1001" => b71 <= "0000000000000000";
when "1010" => b71 <= "0010100111001111";
when "1011" => b71 <= "0001110110010000";
when "1100" => b71 <= "1110111010101111";
when "1101" => b71 <= "1110001001110000";
when "1110" => b71 <= "0000110000111110";




when "0000" => b80 <= "0000000000000000";
when "0001" => b80 <= "1110000010011110";
when "0010" => b80 <= "0001101010011011";
when "0011" => b80 <= "1111101100111001";
when "0100" => b80 <= "1110111000111001";
when "0101" => b80 <= "1100111011010111";
when "0110" => b80 <= "0000100011010100";
when "0111" => b80 <= "1110100101110010";
when "1000" => b80 <= "0000011000111110";
when "1001" => b80 <= "1110011011011100";
when "1010" => b80 <= "0010000011011001";
when "1011" => b80 <= "0000000101110110";
when "1100" => b80 <= "1111010001110111";
when "1101" => b80 <= "1101010100010101";
when "1110" => b80 <= "0000111100010010";
when "1111" => b80 <= "1110111110110000";
end case;
case a81 is
when "0000" => b81 <= "0000000000000000";
when "0001" => b81 <= "1110000010011110";
when "0010" => b81 <= "0001101010011011";
when "0011" => b81 <= "1111101100111001";
when "0100" => b81 <= "1110111000111001";
when "0101" => b81 <= "1100111011010111";
when "0110" => b81 <= "0000100011010100";
when "0111" => b81 <= "1110100101110010";
when "1000" => b81 <= "0000011000111110";
when "1001" => b81 <= "1110011011011100";
when "1010" => b81 <= "0010000011011001";
when "1011" => b81 <= "0000000101110110";
when "1100" => b81 <= "1111010001110111";
when "1101" => b81 <= "1101010100010101";
when "1110" => b81 <= "0000111100010010";









bit_vector(15 downto 0);
b10,b11,b20,b21,b30,b31,b40,b41,b50,b51,b60,b61,b70,b71,b80,b81:
out bit_vector(15 downto 0);
CLK : bit);
end shi_1;
architecture beh of shi_1 is
begin
process
variable a1,a2,a3,a4,a5,a6,a7,a8 : bit_vector(15 downto 0);
begin
wait until CLK'event and CLK = '1';
if f1(15) = '0' then
a1(15) := '0';
else
a1(15) := '1';
end if;
a1(14) := f1(15);
a1(13) := f1(14);
a1(12) := f1(13);
a1(11) := f1(12);
a1(10) := f1(11);
a1(9) := f1(10);
a1(8) := f1(9);
a1(7) := f1(8);
a1(6) := f1(7);
a1(5) := f1(6);
a1(4) := f1(5);
a1(3) := f1(4);
a1(2) := f1(3);
a1(1) := f1(2);
a1(0) := f1(1);
b10 <= a1;
b11 <= f2;
if f3(15) = '0' then
a2(15) := '0';
else



































































































































































































































































b80 <= a8;






procedure bi_to_in -- change 16 bits (1 sign, 1 integer and 14 fraction) into real
(variable x : bit_vector(15 downto 0);
variable y : out integer);
procedure in_to_bi -- change real into binary (1 sign, 1 integer, 14 fractions)
(variable m : in integer;
variable n : out bit_vector(15 downto 0));
end pack1;
package body pack1 is
procedure bi_to_in
(variable x : bit_vector(15 downto 0);
variable y : out integer) is
variable sum : integer := 0;
variable p : bit_vector(15 downto 0);
begin
p := x;
if p(15) = '1' then
for i in 0 to 14 loop
if p(i) = '1' then
for i in 0 to 13 loop




for k in 0 to 14 loop
if p(k) = '1' then





for l in 0 to 14 loop
if p(l) = '1' then







(variable m : in integer;
variable n : out bit_vector(15 downto 0)) is
variable temp_a : integer := 0;
variable temp_b : integer := 0;
variable w : bit_vector(15 downto 0);
begin





for i in 14 downto 0 loop
temp_b := temp_a/(2**i);
temp_a := temp_a rem (2**i);
if (temp_b = 1) then
w(i) := '1';
else








for k in 0 to 14 loop
if w(k) = '1' then
for k in 0 to 13 loop





-- prevent negative zero from occurring.
if w(14) = '0' and w(13) = '0' and w(12) = '0' and w(11) = '0' and
w(10) = '0' and
w(9) = '0' and w(8) = '0' and w(7) = '0' and w(6) = '0' and w(5) = '0' and











b1,b2,b3,b4,b5,b6,b7,b8 : out bit_vector(15 downto 0);
CLK,as : bit);
end add_g;




n1,n2,n3,n4,n5,n6,n7,n8 : bit_vector(15 downto 0);
variable y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15,y16,
m1,m2,m3,m4,m5,m6,m7,m8 : integer := 0;
begin
wait until CLK'event and CLK = '1';
x1 := a1; x2 := a2; x3 := a3; x4 := a4;
x5 := a5; x6 := a6; x7 := a7; x8 := a8;
x9 := a9; x10 := a10; x11 := a11; x12 := a12;
x13 := a13; x14 := a14; x15 := a15; x16 := a16;
bi_to_in(x1,y1); bi_to_in(x2,y2); bi_to_in(x3,y3); bi_to_in(x4,y4);
bi_to_in(x5,y5); bi_to_in(x6,y6); bi_to_in(x7,y7); bi_to_in(x8,y8);





if as = '0' then
m1 := y1 + y2; m2 := y3 + y4; m3 := y5 + y6; m4 := y7 + y8;
m5 := y9 + y10; m6 := y11 + y12; m7 := y13 + y14; m8 := y15 + y16;
else
m1 := y1 - y2; m2 := y3 - y4; m3 := y5 - y6; m4 := y7 - y8;
m5 := y9 - y10; m6 := y11 - y12; m7 := y13 - y14; m8 := y15 - y16;
end if;
in_to_bi(m1,n1); in_to_bi(m2,n2); in_to_bi(m3,n3); in_to_bi(m4,n4);
in_to_bi(m5,n5); in_to_bi(m6,n6); in_to_bi(m7,n7); in_to_bi(m8,n8);
b1 <= n1; b2 <= n2; b3 <= n3; b4 <= n4;






port(a0,a1,a2,a3,a4,a5,a6,a7 : bit_vector(15 downto 0);
b0,b1,b2,b3,b4,b5,b6,b7 : out bit_vector(15 downto 0);
CLK : bit);
end reg_h;
architecture beh of reg_h is
begin
process










wait until CLK'event and CLK = '1';
b0 <= d0;
b1 <= d1;
b2 <= d2;
b3 <= d3;
b4 <= d4;
b5 <= d5;
b6 <= d6;









b1,b2,b3,b4,b5,b6,b7,b8 : out bit_vector(15 downto 0);
CLK : bit);
end add_i;




n1,n2,n3,n4,n5,n6,n7,n8 : bit_vector(15 downto 0);
variable y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15,y16,
m1,m2,m3,m4,m5,m6,m7,m8 : integer := 0;
begin
x1 := a1; x2 := a2; x3 := a3; x4 := a4;
x5 := a5; x6 := a6; x7 := a7; x8 := a8;
x9 := a9; x10 := a10; x11 := a11; x12 := a12;
x13 := a13; x14 := a14; x15 := a15; x16 := a16;
bi_to_in(x1,y1); bi_to_in(x2,y2); bi_to_in(x3,y3); bi_to_in(x4,y4);
bi_to_in(x5,y5); bi_to_in(x6,y6); bi_to_in(x7,y7); bi_to_in(x8,y8);




m1 := y1 + y2; m2 := y3 + y4; m3 := y5 + y6; m4 := y7 + y8;
m5 := y9 + y10; m6 := y11 + y12; m7 := y13 + y14; m8 := y15 + y16;
in_to_bi(m1,n1); in_to_bi(m2,n2); in_to_bi(m3,n3); in_to_bi(m4,n4);
in_to_bi(m5,n5); in_to_bi(m6,n6); in_to_bi(m7,n7); in_to_bi(m8,n8);
b1 <= n1; b2 <= n2; b3 <= n3; b4 <= n4;




Shift right 2-bit register
entity shi_2 is
port(a1,a2,a3,a4,a5,a6,a7,a8 : bit_vector(15 downto 0);
sr1,sr2,sr3,sr4,sr5,sr6,sr7,sr8,b1,b2,b3,b4,b5,b6,b7,b8 :
out bit_vector(15 downto 0); clr : bit_vector(15 downto 0);
CLK : bit);
end shi_2;






variable i : integer := 0;
begin
wait until CLK'event and CLK = '1';
x1 := a1; x2 := a2; x3 := a3; x4 := a4;
x5 := a5; x6 := a6; x7 := a7; x8 := a8;
if x1(15) = '0' then
y1(13) := x1(15); y1(12) := x1(14); y1(11) := x1(13);
y1(10) := x1(12); y1(9) := x1(11); y1(8) := x1(10);
y1(7) := x1(9); y1(6) := x1(8); y1(5) := x1(7);
y1(4) := x1(6); y1(3) := x1(5); y1(2) := x1(4);
y1(1) := x1(3); y1(0) := x1(2); y1(14) := '0';
y1(15) := '0';
else
y1(13) := x1(15); y1(12) := x1(14); y1(11) := x1(13);
y1(10) := x1(12); y1(9) := x1(11); y1(8) := x1(10);
y1(7) := x1(9); y1(6) := x1(8); y1(5) := x1(7);
y1(4) := x1(6); y1(3) := x1(5); y1(2) := x1(4);
y1(1) := x1(3); y1(0) := x1(2); y1(14) := '1';
y1(15) := '1';
end if;
if x2(15) = '0' then
y2(13) := x2(15); y2(12) := x2(14); y2(11) := x2(13);
y2(10) := x2(12); y2(9) := x2(11); y2(8) := x2(10);
y2(7) := x2(9); y2(6) := x2(8); y2(5) := x2(7);
y2(4) := x2(6); y2(3) := x2(5); y2(2) := x2(4);
y2(1) := x2(3); y2(0) := x2(2); y2(14) := '0';
y2(15) := '0';
else
y2(13) := x2(15); y2(12) := x2(14); y2(11) := x2(13);
y2(10) := x2(12); y2(9) := x2(11); y2(8) := x2(10);
y2(7) := x2(9); y2(6) := x2(8); y2(5) := x2(7);
y2(4) := x2(6); y2(3) := x2(5); y2(2) := x2(4);
y2(1) := x2(3); y2(0) := x2(2); y2(14) := '1';
y2(15) := '1';
end if;
if x3(15) = '0' then
y3(13) := x3(15); y3(12) := x3(14); y3(11) := x3(13);
y3(10) := x3(12); y3(9) := x3(11); y3(8) := x3(10);
y3(7) := x3(9); y3(6) := x3(8); y3(5) := x3(7);
y3(4) := x3(6); y3(3) := x3(5); y3(2) := x3(4);
y3(1) := x3(3); y3(0) := x3(2); y3(14) := '0';
y3(15) := '0';
else
y3(13) := x3(15); y3(12) := x3(14); y3(11) := x3(13);
y3(10) := x3(12); y3(9) := x3(11); y3(8) := x3(10);
y3(7) := x3(9); y3(6) := x3(8); y3(5) := x3(7);
y3(4) := x3(6); y3(3) := x3(5); y3(2) := x3(4);
y3(1) := x3(3); y3(0) := x3(2); y3(14) := '1';
y3(15) := '1';
end if;
if x4(15) = '0' then
y4(13) := x4(15); y4(12) := x4(14); y4(11) := x4(13);
y4(10) := x4(12); y4(9) := x4(11); y4(8) := x4(10);
y4(7) := x4(9); y4(6) := x4(8); y4(5) := x4(7);
y4(4) := x4(6); y4(3) := x4(5); y4(2) := x4(4);
y4(1) := x4(3); y4(0) := x4(2); y4(14) := '0';
y4(15) := '0';
else
y4(13) := x4(15); y4(12) := x4(14); y4(11) := x4(13);
y4(10) := x4(12); y4(9) := x4(11); y4(8) := x4(10);
y4(7) := x4(9); y4(6) := x4(8); y4(5) := x4(7);
y4(4) := x4(6); y4(3) := x4(5); y4(2) := x4(4);
y4(1) := x4(3); y4(0) := x4(2); y4(14) := '1';
y4(15) := '1';
end if;
if x5(15) = '0' then
y5(13) := x5(15); y5(12) := x5(14); y5(11) := x5(13);
y5(10) := x5(12); y5(9) := x5(11); y5(8) := x5(10);
y5(7) := x5(9); y5(6) := x5(8); y5(5) := x5(7);
y5(4) := x5(6); y5(3) := x5(5); y5(2) := x5(4);
y5(1) := x5(3); y5(0) := x5(2); y5(14) := '0';
y5(15) := '0';
else
y5(13) := x5(15); y5(12) := x5(14); y5(11) := x5(13);
y5(10) := x5(12); y5(9) := x5(11); y5(8) := x5(10);
y5(7) := x5(9); y5(6) := x5(8); y5(5) := x5(7);
y5(4) := x5(6); y5(3) := x5(5); y5(2) := x5(4);
y5(1) := x5(3); y5(0) := x5(2); y5(14) := '1';
y5(15) := '1';
end if;
if x6(15) = '0' then
y6(13) := x6(15); y6(12) := x6(14); y6(11) := x6(13);
y6(10) := x6(12); y6(9) := x6(11); y6(8) := x6(10);
y6(7) := x6(9); y6(6) := x6(8); y6(5) := x6(7);
y6(4) := x6(6); y6(3) := x6(5); y6(2) := x6(4);
y6(1) := x6(3); y6(0) := x6(2); y6(14) := '0';
y6(15) := '0';
else
y6(13) := x6(15); y6(12) := x6(14); y6(11) := x6(13);
y6(10) := x6(12); y6(9) := x6(11); y6(8) := x6(10);
y6(7) := x6(9); y6(6) := x6(8); y6(5) := x6(7);
y6(4) := x6(6); y6(3) := x6(5); y6(2) := x6(4);
y6(1) := x6(3); y6(0) := x6(2); y6(14) := '1';
y6(15) := '1';
end if;
if x7(15) = '0' then
y7(13) := x7(15); y7(12) := x7(14); y7(11) := x7(13);
y7(10) := x7(12); y7(9) := x7(11); y7(8) := x7(10);
y7(7) := x7(9); y7(6) := x7(8); y7(5) := x7(7);
y7(4) := x7(6); y7(3) := x7(5); y7(2) := x7(4);
y7(1) := x7(3); y7(0) := x7(2); y7(14) := '0';
y7(15) := '0';
else
y7(13) := x7(15); y7(12) := x7(14); y7(11) := x7(13);
y7(10) := x7(12); y7(9) := x7(11); y7(8) := x7(10);
y7(7) := x7(9); y7(6) := x7(8); y7(5) := x7(7);
y7(4) := x7(6); y7(3) := x7(5); y7(2) := x7(4);
y7(1) := x7(3); y7(0) := x7(2); y7(14) := '1';
y7(15) := '1';
end if;
if x8(15) = '0' then
y8(13) := x8(15); y8(12) := x8(14); y8(11) := x8(13);
y8(10) := x8(12); y8(9) := x8(11); y8(8) := x8(10);
y8(7) := x8(9); y8(6) := x8(8); y8(5) := x8(7);
y8(4) := x8(6); y8(3) := x8(5); y8(2) := x8(4);
y8(1) := x8(3); y8(0) := x8(2); y8(14) := '0';
y8(15) := '0';
else
y8(13) := x8(15); y8(12) := x8(14); y8(11) := x8(13);
y8(10) := x8(12); y8(9) := x8(11); y8(8) := x8(10);
y8(7) := x8(9); y8(6) := x8(8); y8(5) := x8(7);
y8(4) := x8(6); y8(3) := x8(5); y8(2) := x8(4);
y8(1) := x8(3); y8(0) := x8(2); y8(14) := '1';
y8(15) := '1';
end if;
sr1 <= y1; sr2 <= y2; sr3 <= y3; sr4 <= y4;
sr5 <= y5; sr6 <= y6; sr7 <= y7; sr8 <= y8;
i := i+1;
if i = 6 then
b1 <= y1; b2 <= y2; b3 <= y3; b4 <= y4;
b5 <= y5; b6 <= y6; b7 <= y7; b8 <= y8;
x1 := clr; x2 := clr; x3 := clr; x4 := clr;
x5 := clr; x6 := clr; x7 := clr; x8 := clr;
sr1 <= clr; sr2 <= clr; sr3 <= clr; sr4 <= clr;








port(a1,a2,a3,a4,a5,a6,a7,a8 : bit_vector(15 downto 0);
k : out bit_vector(15 downto 0); CLK : bit);
end result;
architecture beh of result is




variable x : r;
begin
x(0) := a1; x(1) := a2; x(2) := a3; x(3) := a4;
x(4) := a5; x(5) := a6; x(6) := a7; x(7) := a8;
for i in 0 to 7 loop
wait until CLK'event and CLK = '1';







entity test is end test;
architecture str of test is
component clock_ge port(CLCK : inout bit);
end component;
component clock port(CLK : inout bit);
end component;
component control port(CLK : bit; ct : out bit);
end component;
component LOAD port(AI : in bit_vector(11 downto 0);
B0,B1,B2,B3,B4,B5,B6,B7 : out bit_vector(11 downto 0);
CLK : in bit);
end component;
component shift
port(bi0,bi1,bi2,bi3,bi4,bi5,bi6,bi7 : in bit_vector(11 downto 0);
bo0,bo1,bo2,bo3,bo4,bo5,bo6,bo7 : out bit_vector(1 downto 0);
CLK : in bit);
end component;
component adsu
port(a0,a1,a2,a3,a4,a5,a6,a7 : bit_vector(1 downto 0);




port(a0,a1,a2,a3,a4,a5,a6,a7 : bit_vector(1 downto 0);





port(e0,e1,e2,e3,e4,e5,e6,e7 : bit_vector(1 downto 0);
b10,b11,b20,b21,b30,b31,b40,b41,b50,b51,b60,b61,b70,b71,b80,b81:











port(a: bit; b: out bit; CLK: bit);
end component;
component delay2
port(a: bit; b: out bit; CLK: bit);
end component;
component delay3
port(a: bit; b: out bit; CLK: bit);
end component;
component delay4
port(a: bit; b: out bit; CLK: bit);
end component;
component delay5
port(a: bit; b: out bit; CLK: bit);
end component;
component delay6
port(a: bit; b: out bit; CLK: bit);
end component;
component delay7
port(a: bit; b: out bit; CLK: bit);
end component;
component delay8
port(a: bit; b: out bit; CLK: bit);
end component;
component delay9
port(a: bit; b: out bit; CLK: bit);
end component;
component delay10










port(a0,a1,a2,a3,a4,a5,a6,a7 : bit_vector(15 downto 0);







out bit_vector(15 downto 0); CLK : bit);
end component;
component shi_2
port(a1,a2,a3,a4,a5,a6,a7,a8 : bit_vector(15 downto 0);
sr1,sr2,sr3,sr4,sr5,sr6,sr7,sr8,b1,b2,b3,b4,b5,b6,b7,b8 :




port(a1,a2,a3,a4,a5,a6,a7,a8 : bit_vector(15 downto 0);
k : out bit_vector(15 downto 0); CLK : bit);
end component;
for C : clock_ge use entity work.clock_ge(clk_ctl);
for ad : clock use entity work.clock(beh);
for a : control use entity work.control(beh);
for L : LOAD use entity work.LOAD(BEH);
for S : shift use entity work.shift(beh);
for D : adsu use entity work.adsu(beh);
for r : reg use entity work.reg(beh);
for o : rom use entity work.rom(beh);
for s_1 : shi_1 use entity work.shi_1(beh);
for b : delay1 use entity work.delay1(beh);








for dely3 : delay3 use entity work.delay3(beh);
for dely4 : delay4 use entity work.delay4(beh);
for dely5 : delay5 use entity work.delay5(beh);
for dely6 : delay6 use entity work.delay6(beh);
for dely7 : delay7 use entity work.delay7(beh);
for dely8 : delay8 use entity work.delay8(beh);
for dely9 : delay9 use entity work.delay9(beh);
for dely10 : delay10 use entity work.delay10(beh);
for g : add_g use entity work.add_g(beh);
for h : reg_h use entity work.reg_h(beh);
for i : add_i use entity work.add_i(beh);
for j : shi_2 use entity work.shi_2(beh);
for t : result use entity work.result(beh);
signal di : bit_vector(11 downto 0);
signal ck : bit;
signal clck : bit;
signal go : bit;
signal io : bit;
signal ho : bit;
signal te : bit;
signal de : bit;
signal ab : bit;
signal cd : bit;
signal ef : bit;
signal gh : bit;
signal ij : bit;
signal kl : bit;
signal d0,d1,d2,d3,d4,d5,d6,d7 : bit_vector(11 downto 0);
signal so0,so1,so2,so3,so4,so5,so6,so7 : bit_vector(1 downto 0);
signal co0,co1,co2,co3,co4,co5,co6,co7 : bit_vector(1 downto 0);
signal do0,do1,do2,do3,do4,do5,do6,do7 : bit_vector(1 downto 0);
signal clr : bit := '0';






signal g1,g2,g3,g4,g5,g6,g7,g8 : bit_vector(15 downto 0);
signal h1,h2,h3,h4,h5,h6,h7,h8 : bit_vector(15 downto 0);
signal i1,i2,i3,i4,i5,i6,i7,i8 : bit_vector(15 downto 0);
signal j1,j2,j3,j4,j5,j6,j7,j8 : bit_vector(15 downto 0);
signal r1,r2,r3,r4,r5,r6,r7,r8 : bit_vector(15 downto 0);
signal cr : bit_vector(15 downto 0) := "0000000000000000";
signal p : bit_vector(15 downto 0);
begin
C : clock_ge port map(ck);
ad : clock port map(clck);
a : control port map(ck,go);
b : delay1 port map(go,io,ck);
e : delay2 port map(ck,ho,clck);
dely3 : delay3 port map(ho,te,clck);
dely4 : delay4 port map(te,de,clck);
dely5 : delay5 port map(de,ab,clck);
dely6 : delay6 port map(ab,cd,clck);
dely7 : delay7 port map(cd,ef,clck);
dely8 : delay8 port map(ef,gh,clck);
dely9 : delay9 port map(gh,ij,clck);
dely10 : delay10 port map(ij,kl,clck);
L : LOAD port map(di,d0,d1,d2,d3,d4,d5,d6,d7,ck);
S : shift port map(d0,d1,d2,d3,d4,d5,d6,d7,
so0,so1,so2,so3,so4,so5,so6,so7,ck);
D : adsu port map(so0,so1,so2,so3,so4,so5,so6,so7,
co0,co1,co2,co3,co4,co5,co6,co7,
ck,clr,set);
r : reg port map(co0,co1,co2,co3,co4,co5,co6,co7,
do0,do1,do2,do3,do4,do5,do6,do7,
ck);
o : rom port map(do0,do1,do2,do3,do4,do5,do6,do7,
e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14,
e15,e16,ck);
s_1 : shi_1 port map(e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14,e15,e16,
f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,
ck);
g : add_g port map(f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,
g1,g2,g3,g4,g5,g6,g7,g8,ck,io);
h : reg_h port map(g1,g2,g3,g4,g5,g6,g7,g8,h1,h2,h3,h4,h5,h6,h7,h8,ck);
i : add_i port map(h1,r1,h2,r2,h3,r3,h4,r4,h5,r5,h6,r6,h7,r7,h8,r8,
i1,i2,i3,i4,i5,i6,i7,i8,ck);
j : shi_2 port map(i1,i2,i3,i4,i5,i6,i7,i8,r1,r2,r3,r4,r5,r6,r7,r8,
j1,j2,j3,j4,j5,j6,j7,j8,cr,kl);
t : result port map(j1,j2,j3,j4,j5,j6,j7,j8,p,ck);
set <= '1' after 5 ns;
di <= "000101101010" after 7 ns,
"000000000000" after 17 ns,
"000101101010" after 27 ns,
"001011010100" after 37 ns,
"000101101010" after 47 ns,
"000000000000" after 57 ns,
"000101101010" after 67 ns,
"001011010100" after 77 ns;
end str;
APPENDIX B. 16-BIT 1-D DCT VHDL SOURCE CODE
Shift right 2-bit register
entity shi_2 is
port(a1,a2,a3,a4,a5,a6,a7,a8 : bit_vector(15 downto 0);
sr1,sr2,sr3,sr4,sr5,sr6,sr7,sr8,b1,b2,b3,b4,b5,b6,b7,b8 :
out bit_vector(15 downto 0); clr : bit_vector(15 downto 0);
CLK : bit);
end shi_2;






variable i : integer := 0;
begin
wait until CLK'event and CLK = '1';
x1 := a1; x2 := a2; x3 := a3; x4 := a4;
x5 := a5; x6 := a6; x7 := a7; x8 := a8;
if x1(15) = '0' then
y1(13) := x1(15); y1(12) := x1(14); y1(11) := x1(13);
y1(10) := x1(12); y1(9) := x1(11); y1(8) := x1(10);
y1(7) := x1(9); y1(6) := x1(8); y1(5) := x1(7);
y1(4) := x1(6); y1(3) := x1(5); y1(2) := x1(4);
y1(1) := x1(3); y1(0) := x1(2); y1(14) := '0';
y1(15) := '0';
else
y1(13) := x1(15); y1(12) := x1(14); y1(11) := x1(13);
y1(10) := x1(12); y1(9) := x1(11); y1(8) := x1(10);
y1(7) := x1(9); y1(6) := x1(8); y1(5) := x1(7);
y1(4) := x1(6); y1(3) := x1(5); y1(2) := x1(4);
y1(1) := x1(3); y1(0) := x1(2); y1(14) := '1';
y1(15) := '1';
end if;
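if x2(15) = '0' then
y2(13) := x2(15); y2(12) := x2(14); y2(11) := x2(13);
y2(10) := x2(12); y2(9) := x2(11); y2(8) := x2(10);
y2(7) := x2(9); y2(6) := x2(8); y2(5) := x2(7);
y2(4) := x2(6); y2(3) := x2(5); y2(2) := x2(4);
y2(1) := x2(3); y2(0) := x2(2); y2(14) := '0';
y2(15) := '0';
else
y2(13) := x2(15); y2(12) := x2(14); y2(11) := x2(13);
y2(10) := x2(12); y2(9) := x2(11); y2(8) := x2(10);
y2(7) := x2(9); y2(6) := x2(8); y2(5) := x2(7);
y2(4) := x2(6); y2(3) := x2(5); y2(2) := x2(4);
y2(1) := x2(3); y2(0) := x2(2); y2(14) := '1';
y2(15) := '1';
end if;
if x3(15) = '0' then
y3(13) := x3(15); y3(12) := x3(14); y3(11) := x3(13);
y3(10) := x3(12); y3(9) := x3(11); y3(8) := x3(10);
y3(7) := x3(9); y3(6) := x3(8); y3(5) := x3(7);
y3(4) := x3(6); y3(3) := x3(5); y3(2) := x3(4);
y3(1) := x3(3); y3(0) := x3(2); y3(14) := '0';
y3(15) := '0';
else
y3(13) := x3(15); y3(12) := x3(14); y3(11) := x3(13);
y3(10) := x3(12); y3(9) := x3(11); y3(8) := x3(10);
y3(7) := x3(9); y3(6) := x3(8); y3(5) := x3(7);
y3(4) := x3(6); y3(3) := x3(5); y3(2) := x3(4);
y3(1) := x3(3); y3(0) := x3(2); y3(14) := '1';
y3(15) := '1';
end if;
if x4(15) = '0' then
y4(13) := x4(15); y4(12) := x4(14); y4(11) := x4(13);
y4(10) := x4(12); y4(9) := x4(11); y4(8) := x4(10);
y4(7) := x4(9); y4(6) := x4(8); y4(5) := x4(7);
y4(4) := x4(6); y4(3) := x4(5); y4(2) := x4(4);
y4(1) := x4(3); y4(0) := x4(2); y4(14) := '0';
y4(15) := '0';
else
y4(13) := x4(15); y4(12) := x4(14); y4(11) := x4(13);
y4(10) := x4(12); y4(9) := x4(11); y4(8) := x4(10);
y4(7) := x4(9); y4(6) := x4(8); y4(5) := x4(7);
y4(4) := x4(6); y4(3) := x4(5); y4(2) := x4(4);
y4(1) := x4(3); y4(0) := x4(2); y4(14) := '1';
y4(15) := '1';
end if;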
if x5(15) = '0' then
y5(13) := x5(15); y5(12) := x5(14); y5(11) := x5(13);
y5(10) := x5(12); y5(9) := x5(11); y5(8) := x5(10);
y5(7) := x5(9); y5(6) := x5(8); y5(5) := x5(7);
y5(4) := x5(6); y5(3) := x5(5); y5(2) := x5(4);
y5(1) := x5(3); y5(0) := x5(2); y5(14) := '0';
y5(15) := '0';
else
y5(13) := x5(15); y5(12) := x5(14); y5(11) := x5(13);
y5(10) := x5(12); y5(9) := x5(11); y5(8) := x5(10);
y5(7) := x5(9); y5(6) := x5(8); y5(5) := x5(7);
y5(4) := x5(6); y5(3) := x5(5); y5(2) := x5(4);
y5(1) := x5(3); y5(0) := x5(2); y5(14) := '1';
y5(15) := '1';
end if;
if x6(15) = '0' then
y6(13) := x6(15); y6(12) := x6(14); y6(11) := x6(13);
y6(10) := x6(12); y6(9) := x6(11); y6(8) := x6(10);
y6(7) := x6(9); y6(6) := x6(8); y6(5) := x6(7);
y6(4) := x6(6); y6(3) := x6(5); y6(2) := x6(4);
y6(1) := x6(3); y6(0) := x6(2); y6(14) := '0';
y6(15) := '0';
else
y6(13) := x6(15); y6(12) := x6(14); y6(11) := x6(13);
y6(10) := x6(12); y6(9) := x6(11); y6(8) := x6(10);
y6(7) := x6(9); y6(6) := x6(8); y6(5) := x6(7);
y6(4) := x6(6); y6(3) := x6(5); y6(2) := x6(4);
y6(1) := x6(3); y6(0) := x6(2); y6(14) := '1';
y6(15) := '1';
end if;
if x7(15) = '0' then
y7(13) := x7(15); y7(12) := x7(14); y7(11) := x7(13);
y7(10) := x7(12); y7(9) := x7(11); y7(8) := x7(10);
y7(7) := x7(9); y7(6) := x7(8); y7(5) := x7(7);
y7(4) := x7(6); y7(3) := x7(5); y7(2) := x7(4);
y7(1) := x7(3); y7(0) := x7(2); y7(14) := '0';
y7(15) := '0';
else
y7(13) := x7(15); y7(12) := x7(14); y7(11) := x7(13);
y7(10) := x7(12); y7(9) := x7(11); y7(8) := x7(10);
y7(7) := x7(9); y7(6) := x7(8); y7(5) := x7(7);
y7(4) := x7(6); y7(3) := x7(5); y7(2) := x7(4);
y7(1) := x7(3); y7(0) := x7(2); y7(14) := '1';
y7(15) := '1';
end if;
if x8(15) = '0' then
y8(13) := x8(15); y8(12) := x8(14); y8(11) := x8(13);
y8(10) := x8(12); y8(9) := x8(11); y8(8) := x8(10);
y8(7) := x8(9); y8(6) := x8(8); y8(5) := x8(7);
y8(4) := x8(6); y8(3) := x8(5); y8(2) := x8(4);
y8(1) := x8(3); y8(0) := x8(2); y8(14) := '0';
y8(15) := '0';
else
y8(13) := x8(15); y8(12) := x8(14); y8(11) := x8(13);
y8(10) := x8(12); y8(9) := x8(11); y8(8) := x8(10);
y8(7) := x8(9); y8(6) := x8(8); y8(5) := x8(7);
y8(4) := x8(6); y8(3) := x8(5); y8(2) := x8(4);
y8(1) := x8(3); y8(0) := x8(2); y8(14) := '1';
y8(15) := '1';
end if;
sr1 <= y1; sr2 <= y2; sr3 <= y3; sr4 <= y4;
sr5 <= y5; sr6 <= y6; sr7 <= y7; sr8 <= y8;
i := i+1;
if i = 8 then
b1 <= y1; b2 <= y2; b3 <= y3; b4 <= y4;
b5 <= y5; b6 <= y6; b7 <= y7; b8 <= y8;
x1 := clr; x2 := clr; x3 := clr; x4 := clr;
x5 := clr; x6 := clr; x7 := clr; x8 := clr;
sr1 <= clr; sr2 <= clr; sr3 <= clr; sr4 <= clr;







use work.pack1.all;
entity test is end test;
architecture str of test is
component clock_ge port(CLCK : inout bit);
end component;
component clock port(CLK : inout bit);
end component;
component control port(CLK : bit; ct : out bit);
end component;
component LOAD port(AI : in bit_vector(15 downto 0);
B0,B1,B2,B3,B4,B5,B6,B7 : out bit_vector(15 downto 0);
CLK : in bit);
end component;
component shift
port(bi0,bi1,bi2,bi3,bi4,bi5,bi6,bi7 : in bit_vector(15 downto 0);
bo0,bo1,bo2,bo3,bo4,bo5,bo6,bo7 : out bit_vector(1 downto 0);
CLK : in bit);
end component;
component adsu
port(a0,a1,a2,a3,a4,a5,a6,a7 : bit_vector(1 downto 0);




port(a0,a1,a2,a3,a4,a5,a6,a7 : bit_vector(1 downto 0);




port(e0,e1,e2,e3,e4,e5,e6,e7 : bit_vector(1 downto 0);
b10,b11,b20,b21,b30,b31,b40,b41,b50,b51,b60,b61,b70,b71,b80,b81:











port(a: bit; b: out bit; CLK: bit);
end component;
component delay2
port(a: bit; b: out bit; CLK: bit);
end component;
component delay3
port(a: bit; b: out bit; CLK: bit);
end component;
component delay4
port(a: bit; b: out bit; CLK: bit);
end component;
component delay15
port(a: bit; b: out bit; CLK: bit);
end component;
component delay16
port(a: bit; b: out bit; CLK: bit);
end component;
component delay17
port(a: bit; b: out bit; CLK: bit);
end component;
component delay18









port(a0,a1,a2,a3,a4,a5,a6,a7 : bit_vector(15 downto 0);







out bit_vector(15 downto 0); CLK : bit);
end component;
component shi_2
port(a1,a2,a3,a4,a5,a6,a7,a8 : bit_vector(15 downto 0);
sr1,sr2,sr3,sr4,sr5,sr6,sr7,sr8,b1,b2,b3,b4,b5,b6,b7,b8 :




port(a1,a2,a3,a4,a5,a6,a7,a8 : bit_vector(15 downto 0);
k : out bit_vector(15 downto 0); CLK : bit);
end component;
for C : clock_ge use entity work.clock_ge(clk_ctl);
for ad : clock use entity work.clock(beh);
for a : control use entity work.control(beh);
for L : LOAD use entity work.LOAD(BEH);
for S : shift use entity work.shift(beh);
for D : adsu use entity work.adsu(beh);
for r : reg use entity work.reg(beh);
for o : rom use entity work.rom(beh);
for s_1 : shi_1 use entity work.shi_1(beh);
for b : delay1 use entity work.delay1(beh);
for e : delay2 use entity work.delay2(beh);
for dely3 : delay3 use entity work.delay3(beh);
for dely4 : delay4 use entity work.delay4(beh);
for dely15 : delay15 use entity work.delay15(beh);
for dely16 : delay16 use entity work.delay16(beh);
for dely17 : delay17 use entity work.delay17(beh);
for dely18 : delay18 use entity work.delay18(beh);
for g : add_g use entity work.add_g(beh);
for h : reg_h use entity work.reg_h(beh);
for i : add_i use entity work.add_i(beh);
for j : shi_2 use entity work.shi_2(beh);
for t : result use entity work.result(beh);
signal di : bit_vector(15 downto 0);
signal ck : bit;
signal clck : bit;
signal go : bit;
signal io : bit;
signal ho : bit;
signal te : bit;
signal de : bit;
signal op,qr,st,eo,ko,mo,qo,ro,so,uo : bit;
signal d0,d1,d2,d3,d4,d5,d6,d7 : bit_vector(15 downto 0);
signal so0,so1,so2,so3,so4,so5,so6,so7 : bit_vector(1 downto 0);
signal co0,co1,co2,co3,co4,co5,co6,co7 : bit_vector(1 downto 0);
signal do0,do1,do2,do3,do4,do5,do6,do7 : bit_vector(1 downto 0);
signal clr : bit := '0';





bit_vector(15 downto 0);
signal g1,g2,g3,g4,g5,g6,g7,g8 : bit_vector(15 downto 0);
signal h1,h2,h3,h4,h5,h6,h7,h8 : bit_vector(15 downto 0);
signal i1,i2,i3,i4,i5,i6,i7,i8 : bit_vector(15 downto 0);
signal j1,j2,j3,j4,j5,j6,j7,j8 : bit_vector(15 downto 0);
signal r1,r2,r3,r4,r5,r6,r7,r8 : bit_vector(15 downto 0);
signal cr : bit_vector(15 downto 0) := "0000000000000000";
signal p : bit_vector(15 downto 0);
begin
C : clock_ge port map(ck);
ad : clock port map(clck);
a : control port map(ck,go);
b : delay1 port map(go,io,ck);
e : delay2 port map(ck,ho,clck);
dely3 : delay3 port map(ho,te,clck);
dely4 : delay4 port map(te,de,clck);
dely15 : delay15 port map(io,eo,ck);
dely16 : delay16 port map(eo,ko,ck);
dely17 : delay17 port map(ko,mo,ck);
dely18 : delay18 port map(mo,qo,ck);
L : LOAD port map(di,d0,d1,d2,d3,d4,d5,d6,d7,ck);
S : shift port map(d0,d1,d2,d3,d4,d5,d6,d7,
so0,so1,so2,so3,so4,so5,so6,so7,de);
D : adsu port map(so0,so1,so2,so3,so4,so5,so6,so7,
co0,co1,co2,co3,co4,co5,co6,co7,
ck,clr,set);
r : reg port map(co0,co1,co2,co3,co4,co5,co6,co7,
do0,do1,do2,do3,do4,do5,do6,do7,
ck);
o : rom port map(do0,do1,do2,do3,do4,do5,do6,do7,
e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14,
e15,e16,ck);
s_1 : shi_1 port map(e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14,e15,e16,
f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,
ck);
g : add_g port map(f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,
g1,g2,g3,g4,g5,g6,g7,g8,ck,qo);
h : reg_h port map(g1,g2,g3,g4,g5,g6,g7,g8,h1,h2,h3,h4,h5,h6,h7,h8,ck);
i : add_i port map(h1,r1,h2,r2,h3,r3,h4,r4,h5,r5,h6,r6,h7,r7,h8,r8,
i1,i2,i3,i4,i5,i6,i7,i8,ck);
j : shi_2 port map(i1,i2,i3,i4,i5,i6,i7,i8,r1,r2,r3,r4,r5,r6,r7,r8,
j1,j2,j3,j4,j5,j6,j7,j8,cr,ho);
t : result port map(j1,j2,j3,j4,j5,j6,j7,j8,p,ck);
set <= '1' after 5 ns;
di <= "0000010110101000" after 7 ns,
"0000000000000000" after 17 ns,
"0000010110101000" after 27 ns,
"0000101101010000" after 37 ns,
"0000010110101000" after 47 ns,
"0000000000000000" after 57 ns,
"0000010110101000" after 67 ns,
"0000101101010000" after 77 ns;
end str;
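The shi_2 listings assign every output bit individually. As a cross-check of what those assignments compute, the following is a minimal sketch of the same two-bit arithmetic right shift written with array slices. The names asr2 and asr2_check are hypothetical and do not appear in the thesis code; the two test vectors are chosen here for illustration only.

```vhdl
-- Sketch only: a slice-based equivalent of the bit-by-bit shift in shi_2.
-- asr2 and asr2_check are illustrative names, not part of the thesis code.
entity asr2_check is
end asr2_check;

architecture sim of asr2_check is
  -- Shift a 16-bit word right by two places, replicating the sign bit.
  function asr2(x : bit_vector(15 downto 0)) return bit_vector is
    variable y : bit_vector(15 downto 0);
  begin
    y(13 downto 0) := x(15 downto 2);  -- bits 15..2 move down to 13..0
    y(15) := x(15);                    -- sign extension, upper bit
    y(14) := x(15);                    -- sign extension, next bit
    return y;
  end asr2;
begin
  process
  begin
    -- Positive input: the two new upper bits are '0'.
    assert asr2("0100000000000100") = "0001000000000001"
      report "positive case failed" severity failure;
    -- Negative input: the two new upper bits are '1'.
    assert asr2("1000000000000011") = "1110000000000000"
      report "negative case failed" severity failure;
    wait;  -- suspend forever after one pass
  end process;
end sim;
```

The slice form makes the intent visible in three assignments, while the expanded form in the listings keeps each bit explicit, which may be closer to how the register is wired in hardware.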
APPENDIX C. MATLAB PROGRAM OF DECIMAL-BINARY CONVERSION
while 1
x(1,16) = 0;












for k = 1:15;
if y > 1,
x(i) = fix(y);




y = 2 * y;



































































































































































































































































Fig. 24 V7 hand calculation
APPENDIX E. FORMATION OF 2-BIT ADDER
A. TWO BIT ADDER TRUTH TABLE
Table XX Truth table of 2-bit adder
Table XXI (Table XX) continued
















The two-bit adder has five inputs and three outputs. A_1, A_0, B_1, and B_0 represent the
operand inputs and C_i represents the carry in. q_1 and q_0 represent the sum outputs and
C_o represents the carry out. After the truth table is set up, the output expressions can be
reduced with a Karnaugh map.
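The rows of the truth table follow directly from binary addition of the two operands and the carry in. The sketch below enumerates all 32 input combinations and reports the three output bits for each. It is an illustration under assumptions: the entity name adder2_table and the loop variables are hypothetical, the column order is chosen here rather than taken from the thesis tables, and a VHDL-93 simulator is assumed for the 'image attribute.

```vhdl
-- Sketch only: list the 32 rows of a 2-bit adder truth table.
-- adder2_table, a, b, ci and s are illustrative names.
entity adder2_table is
end adder2_table;

architecture sim of adder2_table is
begin
  process
    variable s : integer;
  begin
    for a in 0 to 3 loop            -- a  = 2*A1 + A0
      for b in 0 to 3 loop          -- b  = 2*B1 + B0
        for ci in 0 to 1 loop       -- ci = carry in
          s := a + b + ci;          -- 3-bit result, 0 to 7
          report "A1A0=" & integer'image(a / 2) & integer'image(a mod 2) &
                 " B1B0=" & integer'image(b / 2) & integer'image(b mod 2) &
                 " Ci=" & integer'image(ci) &
                 " | Co=" & integer'image(s / 4) &
                 " q1=" & integer'image((s / 2) mod 2) &
                 " q0=" & integer'image(s mod 2)
            severity note;
        end loop;
      end loop;
    end loop;
    wait;  -- suspend forever after one pass
  end process;
end sim;
```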









Fig. 25 Karnaugh map reduction
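Karnaugh map reduction gives the reduced Boolean expressions.
\[
\begin{aligned}
q_1 ={}& \bar{A}_1 A_0 \bar{B}_1 B_0 + \bar{A}_1 A_0 \bar{B}_1 C_i + \bar{A}_1 \bar{B}_1 B_0 C_i
       + A_1 A_0 B_1 B_0 + A_1 A_0 B_1 C_i + A_1 B_1 B_0 C_i \\
     & + \bar{A}_1 \bar{A}_0 B_1 \bar{B}_0 + \bar{A}_1 \bar{A}_0 B_1 \bar{C}_i + \bar{A}_1 B_1 \bar{B}_0 \bar{C}_i
       + A_1 \bar{A}_0 \bar{B}_1 \bar{B}_0 + A_1 \bar{A}_0 \bar{B}_1 \bar{C}_i + A_1 \bar{B}_1 \bar{B}_0 \bar{C}_i
\end{aligned}
\tag{36}
\]
\[
q_0 = \bar{A}_0 \bar{B}_0 C_i + \bar{A}_0 B_0 \bar{C}_i + A_0 \bar{B}_0 \bar{C}_i + A_0 B_0 C_i
\tag{37}
\]
\[
C_o = A_1 A_0 B_0 + A_0 B_1 B_0 + A_1 A_0 C_i + A_1 B_0 C_i + B_1 B_0 C_i + A_1 B_1 + A_0 B_1 C_i
\tag{38}
\]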
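Sum-of-products expressions for a 2-bit adder can be checked exhaustively against integer addition, since there are only 32 input combinations. The following sketch does this for one set of minimal forms: four product terms for q0, seven for the carry out, and twelve for q1. The entity name adder2_sop_check, the helper bit_of, and the variable names are hypothetical, and a VHDL-93 simulator is assumed for the 'image attribute.

```vhdl
-- Sketch only: verify sum-of-products forms of a 2-bit adder by enumeration.
-- adder2_sop_check and bit_of are illustrative names.
entity adder2_sop_check is
end adder2_sop_check;

architecture sim of adder2_sop_check is
  -- Bit k of a non-negative integer n, returned as type bit.
  function bit_of(n : integer; k : integer) return bit is
  begin
    if (n / 2**k) mod 2 = 1 then
      return '1';
    else
      return '0';
    end if;
  end bit_of;
begin
  process
    variable a1, a0, b1, b0, ci : bit;
    variable q1, q0, co : bit;
    variable expected, got : integer;
  begin
    for n in 0 to 31 loop
      a1 := bit_of(n, 4); a0 := bit_of(n, 3);
      b1 := bit_of(n, 2); b0 := bit_of(n, 1);
      ci := bit_of(n, 0);
      -- q0: four minterms of A0 xor B0 xor Ci
      q0 := (not a0 and not b0 and ci) or (not a0 and b0 and not ci) or
            (a0 and not b0 and not ci) or (a0 and b0 and ci);
      -- carry out: seven product terms
      co := (a1 and a0 and b0) or (a0 and b1 and b0) or (a1 and a0 and ci) or
            (a1 and b0 and ci) or (b1 and b0 and ci) or (a1 and b1) or
            (a0 and b1 and ci);
      -- q1: twelve product terms
      q1 := (not a1 and a0 and not b1 and b0) or (not a1 and a0 and not b1 and ci) or
            (not a1 and not b1 and b0 and ci) or (a1 and a0 and b1 and b0) or
            (a1 and a0 and b1 and ci) or (a1 and b1 and b0 and ci) or
            (not a1 and not a0 and b1 and not b0) or
            (not a1 and not a0 and b1 and not ci) or
            (not a1 and b1 and not b0 and not ci) or
            (a1 and not a0 and not b1 and not b0) or
            (a1 and not a0 and not b1 and not ci) or
            (a1 and not b1 and not b0 and not ci);
      -- Reference value from ordinary integer addition.
      expected := 2*bit'pos(a1) + bit'pos(a0) + 2*bit'pos(b1) + bit'pos(b0)
                  + bit'pos(ci);
      got := 4*bit'pos(co) + 2*bit'pos(q1) + bit'pos(q0);
      assert got = expected
        report "mismatch at input code " & integer'image(n) severity failure;
    end loop;
    wait;  -- suspend forever after one pass
  end process;
end sim;
```

If any product term is wrong, the assertion stops the simulation at the first input code where the outputs disagree with the arithmetic sum.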
LIST OF REFERENCES
1. L. J. D'Luna, An 8 x 8 Discrete Cosine Transform Chip with Pixel Rate Clocks,
IEEE TH0303-8/90/0000, p7-5.1 - p7-5.4.
2. Herbert Taub, Digital Circuits and Microprocessors, p58 - p81.
3. R. C. Gonzalez and P. Wintz, Digital Image Processing, p121 - p122.
4. Ernest Meyer, VHDL Opens the Road to Top-Down Design, Computer Design,
Feb. 1, 1989, p57 - p62.
5. Lipsett, Schaefer, and Ussery, VHDL: Hardware Description and Design, Kluwer
Academic Publishers, 1989.
6. James R. Armstrong, Chip-Level Modeling with VHDL, Prentice Hall, 1989.
7. David L. Barton, A First Course in VHDL, Design Automation Guide, 1988.
8. IEEE Standard VHDL Language Reference Manual, Std 1076-1987, Institute of
Electrical and Electronics Engineers, March 1988.
9. 386-MATLAB User's Guide, The MathWorks, Inc.
INITIAL DISTRIBUTION LIST
1. Defense Technical Information Center
Cameron Station
Alexandria, VA 22304-6145
2. Library, Code 52
Naval Postgraduate School
Monterey, CA 93943-5002
3. Department Chairman, Code 3A
Department of Electronic Warfare
Naval Postgraduate School
Monterey, CA 93943-5000
4. Professor Chin-Hwa Lee, Code EC/Le
Naval Postgraduate School
Monterey, CA 93943-5000







7. Library of Chung-Shan Institute of Technology
P.O. Box 1, Long-Tan
Tao-Yuan, Taiwan
Republic of China
8. Library of Chung-Cheng Institute of Technology
Tahsi, Tao-Yuan, Taiwan
Republic of China
















